Hive: line breaks appear in fields when joining tables

I used Sqoop to import a MySQL table into Hive, where the resulting table is stored in Parquet format.

I then needed to join this table with another table that I created myself, as shown below.

Two alternative CREATE TABLE statements (statement 2 only adds STORED AS parquet):

Statement 1:
 CREATE TABLE `tmp.t_position_name_data_times_greate300_positions`(
  `id` string, 
  `title` string, 
  `company_name` string, 
  `work_city` string, 
  `company_id` string, 
  `education_request` string, 
  `work_year_request` string, 
  `position_description` string);


Statement 2:

 CREATE TABLE `tmp.t_position_name_data_times_greate300_positions`(
  `id` string, 
  `title` string, 
  `company_name` string, 
  `work_city` string, 
  `company_id` string, 
  `education_request` string, 
  `work_year_request` string, 
  `position_description` string) STORED AS parquet;

 

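If the table is created with statement 1, the storage format defaults to Text. In that case, when a field contains line-break or control characters such as \r, \n or \t, select * on the table did not show broken rows, but joining it with the first table (the one imported by Sqoop) did: field values were split across lines.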
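
Before changing the table format, it can help to confirm that the source data really contains embedded line breaks. The query below is a minimal sketch: `src_db.t_position` is a hypothetical name standing in for the Sqoop-imported Parquet table, which the post does not name, and the column is assumed to match `position_description` above.

```sql
-- Hypothetical source table name; replace with the Sqoop-imported table.
-- RLIKE takes a Java regex; '\\r' and '\\n' match carriage return and line feed.
SELECT id,
       length(position_description) AS description_length
FROM src_db.t_position
WHERE position_description RLIKE '[\\r\\n]'
LIMIT 20;
```

Any row returned here carries a line break inside the field, and it is such rows that a newline-delimited text table cannot store as a single record.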

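If the table is created with statement 2, it is stored as Parquet. Parquet is a binary, columnar format and does not use newlines to delimit records, so line breaks inside a field stay inside that field and the rows are not split.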
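
If the table has to stay in Text format, one possible workaround (not something the original setup used) is to strip line breaks while loading the data. This is a sketch under assumptions: `src_db.t_position` is again a hypothetical name for the Sqoop-imported table, and its columns are assumed to match the target table.

```sql
-- Replace runs of CR/LF inside the free-text column with a single space
-- so that each record fits on one line of the text file.
INSERT OVERWRITE TABLE tmp.t_position_name_data_times_greate300_positions
SELECT id,
       title,
       company_name,
       work_city,
       company_id,
       education_request,
       work_year_request,
       regexp_replace(position_description, '[\\r\\n]+', ' ') AS position_description
FROM src_db.t_position;  -- hypothetical source table
```

This changes the stored text, so statement 2 (Parquet) remains the simpler choice when the original content must be preserved exactly.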

After creating the table each way, show create table returns the following.

Method 1 (statement 1):

CREATE TABLE `tmp.t_position_name_data_times_greate300_positions`(
  `id` string, 
  `title` string, 
  `company_name` string, 
  `work_city` string, 
  `company_id` string, 
  `education_request` string, 
  `work_year_request` string, 
  `position_description` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://nameservice1/hy/data/hive/warehouse/tmp.db/t_position_name_data_times_greate300_positions'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true', 
  'numFiles'='1', 
  'numRows'='100', 
  'rawDataSize'='78685', 
  'totalSize'='78785', 
  'transient_lastDdlTime'='1589166294')

Method 2 (statement 2):

CREATE TABLE `tmp.t_position_name_data_times_greate300_positions`(
  `id` string, 
  `title` string, 
  `company_name` string, 
  `work_city` string, 
  `company_id` string, 
  `education_request` string, 
  `work_year_request` string, 
  `position_description` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'hdfs://nameservice1/hy/data/hive/warehouse/tmp.db/t_position_name_data_times_greate300_positions'
TBLPROPERTIES (
  'COLUMN_STATS_ACCURATE'='true', 
  'numFiles'='1', 
  'numRows'='100', 
  'rawDataSize'='800', 
  'totalSize'='77365', 
  'transient_lastDdlTime'='1589167094')
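
To check only the storage format without reading the full DDL, `DESCRIBE FORMATTED` is a shorter alternative. A minimal sketch using the table from this post:

```sql
-- In the output, look at the "SerDe Library", "InputFormat" and "OutputFormat" rows:
-- LazySimpleSerDe / TextInputFormat indicates a Text table,
-- ParquetHiveSerDe / MapredParquetInputFormat indicates a Parquet table.
DESCRIBE FORMATTED tmp.t_position_name_data_times_greate300_positions;
```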

 
