Table DDL:
CREATE EXTERNAL TABLE `app.table1`(
.....
)
PARTITIONED BY (
`dt` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS INPUTFORMAT -- used at query time: reads must go through the LZO input format!
'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
The table's storage format is LZO: queries read LZO files, so writes must also produce LZO-format files!
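Since LzopCodec writes files with a `.lzo` extension, a quick look at the partition directory (e.g. via `hadoop fs -ls`) tells you whether the data was actually written in the right format. A minimal sketch, where the helper name and file names are hypothetical, not from the source:

```python
# Hypothetical helper: given data-file names from a partition directory
# (e.g. the output of `hadoop fs -ls`), check that they look like LZO output.
# LzopCodec writes files ending in .lzo; plain-text output has no extension.
def looks_like_lzo(filenames):
    data = [f for f in filenames if not f.startswith('_')]  # skip _SUCCESS etc.
    return bool(data) and all(f.endswith('.lzo') for f in data)

print(looks_like_lzo(['000000_0.lzo', '000001_0.lzo', '_SUCCESS']))  # True
print(looks_like_lzo(['000000_0']))  # plain-text file names -> False
```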
Write (INSERT) statement:
set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
-- The three settings above are essential: they make the table's data files LZO-format!
set hive.exec.parallel=true;
set hive.auto.convert.join=true;
SET hive.exec.max.dynamic.partitions=100000;
SET hive.exec.max.dynamic.partitions.pernode=100000;
SET hive.exec.max.created.files=655350;
set mapreduce.input.fileinputformat.split.maxsize=256000000;
set mapreduce.input.fileinputformat.split.minsize.per.rack=256000000;
set mapreduce.input.fileinputformat.split.minsize.per.node=256000000;
set hive.hadoop.supports.splittable.combineinputformat=true;
use app;
insert overwrite table app.table1 partition(dt='""" + yesterday_str+ """')
select
....
from
......
If the first three `set` statements are omitted, the data files are written as plain text, and queries through the LZO input format return no results!
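The `yesterday_str` placeholder suggests this HQL lives inside a Python string. A minimal sketch of how such a script might build the partition value and the query, keeping the three compression settings in the same session as the INSERT; the variable name comes from the source, but the yyyy-mm-dd date format is an assumption:

```python
import datetime

# Yesterday's date as the dt partition value; the yyyy-mm-dd format is an
# assumption -- use whatever format the dt partition actually stores.
yesterday_str = (datetime.date.today()
                 - datetime.timedelta(days=1)).strftime('%Y-%m-%d')

# The compression settings must be issued in the same session as the INSERT,
# so they are concatenated into one script, exactly as in the source.
hql = """
set mapred.output.compress=true;
set hive.exec.compress.output=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
insert overwrite table app.table1 partition(dt='""" + yesterday_str + """')
select ...
"""
```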
Reads go through the INPUT side and are governed by INPUTFORMAT;
writes go through the OUTPUT side and are governed by OUTPUTFORMAT.
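To confirm which formats govern a table's reads and writes, the DDL can be inspected directly with standard Hive statements (the table name is from the source):

```sql
-- SHOW CREATE TABLE prints the STORED AS INPUTFORMAT / OUTPUTFORMAT clauses,
-- which govern reads and writes respectively.
SHOW CREATE TABLE app.table1;

-- DESCRIBE FORMATTED shows the same InputFormat/OutputFormat fields.
DESCRIBE FORMATTED app.table1;
```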