How to Dynamically Partition a Hive Table by a Column

Original

baigp

2019-06-11 06:38

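When storing data in Hive, you usually need to partition it. Suppose data arrives from Kafka and each day's data goes into its own partition (daily partitioning). The partition should then be chosen dynamically from a column in the data, rather than naively writing the data to a temporary directory and loading it into one specific partition afterwards, which is static partitioning.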
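For contrast, a static-partition load names the partition value explicitly in the statement, so each day needs its own statement. A minimal sketch, assuming the t_test_p_source and t_test_p_target tables that are created in the steps of this post:

```sql
-- Static partitioning: the partition value is hard-coded in the PARTITION clause,
-- so the partition column is not selected and one statement loads one partition.
insert into table t_test_p_target partition (birthday = '2018-01-01')
select id, name
from t_test_p_source
where birthday = '2018-01-01';
```

Loading the five distinct days in the sample data this way would take five such statements; dynamic partitioning replaces them with one.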

The core of dynamic partitioning is changing two configuration settings.

The steps for Hive dynamic partitioning are as follows:

1. Create a source table to simulate the data source and insert some rows

create table t_test_p_source (
    id string,
    name string,
    birthday string
) 
row format delimited fields terminated by '\t'
stored as textfile;

insert into t_test_p_source values ('a1', 'zhangsan', '2018-01-01');
insert into t_test_p_source values ('a2', 'lisi', '2018-01-02');
insert into t_test_p_source values ('a3', 'zhangsan', '2018-01-03');
insert into t_test_p_source values ('a4', 'wangwu', '2018-01-04');
insert into t_test_p_source values ('a5', 'sanzang', '2018-01-05');
insert into t_test_p_source values ('a6', 'zhangsan2', '2018-01-01');

2. Create a partitioned table (partitioned by the birthday column)

create table t_test_p_target (
    id string,
    name string
)
partitioned by (birthday string)
row format delimited fields terminated by '\t'
stored as textfile;

3. Insert data into the partitioned table

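-- Enable dynamic partitioning. The default is false before Hive 0.9.0 and true from 0.9.0 on; setting it explicitly is harmless.
SET hive.exec.dynamic.partition=true;
-- Dynamic partition mode. The default is strict, which requires at least one static partition column; nonstrict allows every partition column to be dynamic.
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The dynamic partition column (birthday) must be the last column in the select list.
insert into table t_test_p_target partition (birthday) select id, name, birthday from t_test_p_source;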
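To illustrate what strict mode expects, the sketch below uses a hypothetical two-level table, t_test_p_target2, partitioned by a made-up country column plus birthday. Neither that table nor the 'cn' value is part of the original setup. Because the leading partition column is given a static value, this kind of insert should be accepted even with hive.exec.dynamic.partition.mode=strict, as long as hive.exec.dynamic.partition is still true:

```sql
-- Hypothetical table for illustration only (not part of the original steps).
create table t_test_p_target2 (
    id string,
    name string
)
partitioned by (country string, birthday string)
row format delimited fields terminated by '\t'
stored as textfile;

-- country is static, birthday is dynamic. Static partition columns must come
-- before dynamic ones in the PARTITION clause, and the dynamic column is
-- supplied as the last column of the SELECT.
insert into table t_test_p_target2 partition (country = 'cn', birthday)
select id, name, birthday
from t_test_p_source;
```

Mixing static and dynamic columns like this guards against a single statement creating a large number of partitions by accident, which is the reason strict mode exists.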

4. Verify that the data was partitioned dynamically

The 2018-01-01 partition contains only 2 rows. Next, look at the partition directories on HDFS.
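The checks described here could be run roughly as follows from the Hive CLI. The HDFS path is an assumption: it presumes the table lives in the default database under the default warehouse directory, /user/hive/warehouse, so adjust it if hive.metastore.warehouse.dir is set differently.

```sql
-- List the partitions created by the dynamic insert.
show partitions t_test_p_target;

-- Rows in a single partition; with the sample data, 2018-01-01 holds ids a1 and a6.
select * from t_test_p_target where birthday = '2018-01-01';

-- Inspect the partition directories on HDFS (path is an assumption, see above).
dfs -ls /user/hive/warehouse/t_test_p_target;
```

Each partition appears as a subdirectory named after the partition column and its value, for example birthday=2018-01-01.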

At this point, Hive dynamic partitioning is complete.

Original link: https://www.cnblogs.com/jsnr-tdyd/p/9946788.html
