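I used sqoop to import a MySQL table into Hive; the resulting Hive table is stored in parquet format.
I then needed to join this table with another table that I created myself, as shown below.
The two CREATE TABLE statements (statement 2 adds STORED AS parquet):
Statement 1: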
CREATE TABLE `tmp.t_position_name_data_times_greate300_positions`(
`id` string,
`title` string,
`company_name` string,
`work_city` string,
`company_id` string,
`education_request` string,
`work_year_request` string,
`position_description` string);
Statement 2:
CREATE TABLE `tmp.t_position_name_data_times_greate300_positions`(
`id` string,
`title` string,
`company_name` string,
`work_city` string,
`company_id` string,
`education_request` string,
`work_year_request` string,
`position_description` string) STORED AS parquet;
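If the table is created with statement 1, the storage format defaults to Text. In that case, when a field contains line-break characters such as \r or \n, select * on the table itself shows no broken rows, but joining it with the first, sqoop-imported table produces rows that are split at those line breaks.
If the table is created with statement 2, the storage format is parquet, which is columnar; records are not delimited by newline characters, so the rows are not split.
After creating the table each way, show create table returns:
Method 1: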
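If the table has to stay in Text format, one possible workaround is to strip the line-break characters while loading it. The following is only a sketch under assumptions: tmp.sqoop_positions is a hypothetical name standing in for the sqoop-imported parquet table, it is assumed to have the same columns, and only position_description is assumed to contain line breaks.

```sql
-- Hypothetical source table: tmp.sqoop_positions (not from the original post).
-- Replace runs of CR, LF, or tab in the free-text column with a single space
-- before writing into the Text-format table created by statement 1.
INSERT OVERWRITE TABLE tmp.t_position_name_data_times_greate300_positions
SELECT
  id,
  title,
  company_name,
  work_city,
  company_id,
  education_request,
  work_year_request,
  regexp_replace(position_description, '[\\r\\n\\t]+', ' ') AS position_description
FROM tmp.sqoop_positions;
```

This changes the stored text, so it is only appropriate when the original line breaks do not need to be preserved; otherwise statement 2 (STORED AS parquet) is the simpler choice.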
CREATE TABLE `tmp.t_position_name_data_times_greate300_positions`(
`id` string,
`title` string,
`company_name` string,
`work_city` string,
`company_id` string,
`education_request` string,
`work_year_request` string,
`position_description` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'hdfs://nameservice1/hy/data/hive/warehouse/tmp.db/t_position_name_data_times_greate300_positions'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'numRows'='100',
'rawDataSize'='78685',
'totalSize'='78785',
'transient_lastDdlTime'='1589166294')
Method 2:
CREATE TABLE `tmp.t_position_name_data_times_greate300_positions`(
`id` string,
`title` string,
`company_name` string,
`work_city` string,
`company_id` string,
`education_request` string,
`work_year_request` string,
`position_description` string)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
'hdfs://nameservice1/hy/data/hive/warehouse/tmp.db/t_position_name_data_times_greate300_positions'
TBLPROPERTIES (
'COLUMN_STATS_ACCURATE'='true',
'numFiles'='1',
'numRows'='100',
'rawDataSize'='800',
'totalSize'='77365',
'transient_lastDdlTime'='1589167094')
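As a quicker check than reading the full show create table output, the sketch below shows the storage format with DESCRIBE FORMATTED and then looks for values that contain line breaks. It assumes it is run against the parquet version of the table (statement 2), where the values are read back intact, and that position_description is the column most likely to contain them.

```sql
-- Shows the SerDe, InputFormat and OutputFormat without the full DDL.
DESCRIBE FORMATTED tmp.t_position_name_data_times_greate300_positions;

-- Lists ids whose description contains a carriage return or a line feed;
-- these are the rows that a Text-format table would be expected to split.
SELECT id
FROM tmp.t_position_name_data_times_greate300_positions
WHERE position_description RLIKE '[\\r\\n]';
```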