We have an HDFS path, listed with hadoop fs -du -h /user/portal/ODM/push/pushcatch_data_collect/
The path contains one subdirectory per day:
284.2 K /user/portal/ODM/push/pushcatch_data_collect/2018-12-18
158.8 K /user/portal/ODM/push/pushcatch_data_collect/2018-12-19
There are two ways to create the table:
1. Include the partition column (usually a time partition) in the CREATE TABLE statement; after the table is created, each partition must be added manually:
alter table tmp.test1 add if not exists partition (day = 20181201) location '/user/portal/ODM/push/pushcatch_data_collect/2018-12-18';
2. Skip the time partition and point the table's LOCATION directly at a daily directory.
Both approaches can read the data.
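For method 1, the full DDL would look roughly like the sketch below. The column list is assumed for illustration (the note only shows the ALTER statement for tmp.test1); the storage formats match the LZO settings used later in this note.

```sql
-- Sketch of method 1: an external table declared with a day partition.
-- The column list here is a placeholder assumption.
CREATE EXTERNAL TABLE IF NOT EXISTS tmp.test1 (
  body STRING
)
PARTITIONED BY (day STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
  'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';

-- Then map each daily HDFS directory onto a partition:
ALTER TABLE tmp.test1 ADD IF NOT EXISTS
  PARTITION (day = 20181218)
  LOCATION '/user/portal/ODM/push/pushcatch_data_collect/2018-12-18';
```

DROP PARTITION / ADD PARTITION only touch Hive metadata for an external table; the files under the daily directories are left alone.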
Note how the time filter is written (insertTime in 'yyyy-MM-dd ...' form is reshaped into 'yyyyMMdd' to match the partition value):
concat(substr(insertTime,1,4),substr(insertTime,6,2),substr(insertTime,9,2)) = '${ts}'
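Hive's substr(s, pos, len) is 1-based, which makes the offsets above easy to get wrong. A quick Python check of the same slicing (the sample insertTime value is hypothetical) confirms the transform:

```python
def hive_ts(insert_time: str) -> str:
    """Mimic Hive's concat(substr(t,1,4), substr(t,6,2), substr(t,9,2)).

    Hive substr is 1-based, so substr(t,1,4) == t[0:4],
    substr(t,6,2) == t[5:7], and substr(t,9,2) == t[8:10].
    """
    return insert_time[0:4] + insert_time[5:7] + insert_time[8:10]

# Hypothetical insertTime value in 'yyyy-MM-dd HH:mm:ss' form:
print(hive_ts("2018-12-18 09:30:00"))  # 20181218
```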
###############
The source data is LZO-compressed, so the CREATE TABLE statement must declare the matching input/output formats:
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
###############
CREATE EXTERNAL TABLE IF NOT EXISTS tmp.test__3(
body STRING COMMENT '',
dt string,
gogo string
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
'/user/portal/ODM/push/pushcatch_data_collect/2018-12-18';
Also note that Hive CREATE TABLE does not overwrite an existing table: if table A already exists and you re-run CREATE TABLE A with a changed schema or location, the statement appears to succeed, but queries still hit the original definition of A. Drop the table first, then re-create it.
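A sketch of the safe re-create pattern, using the table from above:

```sql
-- For an EXTERNAL table, DROP removes only the Hive metadata;
-- the files under the HDFS LOCATION are untouched.
DROP TABLE IF EXISTS tmp.test__3;
-- ...then re-run the CREATE EXTERNAL TABLE statement with the
-- new schema or LOCATION.
```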