Table DDL:
CREATE EXTERNAL TABLE `app.table1`(
.....
)
PARTITIONED BY (
`dt` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT -- used at query time: reads must go through the LZO input format!
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
The table's storage format is LZO: queries read LZO files, so writes must also produce LZO-format files!
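Since LzopCodec writes files with a `.lzo` extension, a quick look at the partition directory (e.g. via `hadoop fs -ls`) tells you whether the data was actually written in the right format. A minimal sketch, where the helper name and file names are hypothetical, not from the source:

```python
# Hypothetical helper: given data-file names from a partition directory
# (e.g. the output of `hadoop fs -ls`), check that they look like LZO output.
# LzopCodec writes files ending in .lzo; plain-text output has no extension.
def looks_like_lzo(filenames):
    data = [f for f in filenames if not f.startswith('_')]  # skip _SUCCESS etc.
    return bool(data) and all(f.endswith('.lzo') for f in data)

print(looks_like_lzo(['000000_0.lzo', '000001_0.lzo', '_SUCCESS']))  # True
print(looks_like_lzo(['000000_0']))  # plain-text file names -> False
```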
Write (INSERT) statement:
set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
-- The three settings above are essential: they make the table's data files LZO-format!
set hive.exec.parallel=true;
set hive.auto.convert.join=true;
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;
SET hive.exec.max.created.files=655350;
set mapreduce.input.fileinputformat.split.maxsize=256000000;
set mapreduce.input.fileinputformat.split.minsize.per.rack=256000000;
set mapreduce.input.fileinputformat.split.minsize.per.node=256000000;
set hive.hadoop.supports.splittable.combineinputformat=true;
use app;
insert overwrite table app.table1 partition(dt='""" + yesterday_str+ """')
select
....
from
......
If the first three `set` statements are omitted, the data files are written as plain text, and queries through the LZO input format return no results!
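The `yesterday_str` placeholder suggests this HQL lives inside a Python string. A minimal sketch of how such a script might build the partition value and the query, keeping the three compression settings in the same session as the INSERT; the variable name comes from the source, but the yyyy-mm-dd date format is an assumption:

```python
import datetime

# Yesterday's date as the dt partition value; the yyyy-mm-dd format is an
# assumption -- use whatever format the dt partition actually stores.
yesterday_str = (datetime.date.today()
                 - datetime.timedelta(days=1)).strftime('%Y-%m-%d')

# The compression settings must be issued in the same session as the INSERT,
# so they are concatenated into one script, exactly as in the source.
hql = """
set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
insert overwrite table app.table1 partition(dt='""" + yesterday_str + """')
select ...
"""
```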
Reads go through the INPUT side and are governed by INPUTFORMAT;
writes go through the OUTPUT side and are governed by OUTPUTFORMAT.
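To confirm which formats govern a table's reads and writes, the DDL can be inspected directly with standard Hive statements (the table name is from the source):

```sql
-- SHOW CREATE TABLE prints the STORED AS INPUTFORMAT / OUTPUTFORMAT clauses,
-- which govern reads and writes respectively.
SHOW CREATE TABLE app.table1;

-- DESCRIBE FORMATTED shows the same InputFormat/OutputFormat fields.
DESCRIBE FORMATTED app.table1;
```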