Table DDL:
CREATE EXTERNAL TABLE `app.table1`(
.....
)
PARTITIONED BY (
`dt` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT -- used at query time: the table must be read with the LZO input format!!
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
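To confirm that the table really carries these formats, you can inspect its metadata; a quick check, assuming a standard Hive CLI or Beeline session:

```sql
-- Shows the SerDe, InputFormat and OutputFormat recorded for the table.
DESCRIBE FORMATTED app.table1;
-- The output should list:
--   InputFormat:  com.hadoop.mapred.DeprecatedLzoTextInputFormat
--   OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
```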
The table's storage format is LZO, so queries read LZO files; writes must therefore also produce LZO-format files!!
Write (INSERT) statement:
set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
-- The three settings above are essential: they make Hive write the table's data files in LZO format!!
set hive.exec.parallel=true;
set hive.auto.convert.join=true;
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;
SET hive.exec.max.created.files=655350;
set mapreduce.input.fileinputformat.split.maxsize=256000000;
set mapreduce.input.fileinputformat.split.minsize.per.rack=256000000;
set mapreduce.input.fileinputformat.split.minsize.per.node=256000000;
set hive.hadoop.supports.splittable.combineinputformat=true;
use app;
insert overwrite table app.table1 partition(dt='""" + yesterday_str + """')
select
....
from
......
Without those three settings, the data files are written as plain text, and queries through the LZO input format return no results!!!
Reads go through INPUT and are governed by the table's INPUTFORMAT;
writes go through OUTPUT and are governed by its OUTPUTFORMAT.
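The `yesterday_str` placeholder in the INSERT above suggests the whole write step is driven from a Python script that splices yesterday's date into a triple-quoted HQL string. A minimal sketch of that pattern, assuming a `yyyy-MM-dd` partition format (the date format and the abbreviated SELECT body are assumptions, not from the source):

```python
from datetime import date, timedelta

# Yesterday's date as the dt partition value, e.g. "2024-01-31"
# (yyyy-MM-dd is an assumption; match your table's convention).
yesterday_str = (date.today() - timedelta(days=1)).strftime("%Y-%m-%d")

# Assemble the HQL as in the note: the three compression settings first,
# then the INSERT OVERWRITE (the SELECT body is abbreviated with "...").
hql = """
set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
use app;
insert overwrite table app.table1 partition(dt='""" + yesterday_str + """')
select
...
"""
```

The script would then hand `hql` to Hive (e.g. `hive -e "$hql"` or a client library); the key point is that the three `set` lines travel with every run of the INSERT, so the partition's files always come out LZO-compressed.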