Bug: new column values are NULL after adding a column to a Hive partitioned table

Original

men子烦高

2018-09-02 03:33

Keywords: hive, partition, add column

Hive JIRA: https://issues.apache.org/jira/browse/HIVE-6131

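While investigating Hive version issues recently, I found that on Hive 1.1.0 and Hive 1.2.1, after a column is added to a partitioned table, the new column's values come back as NULL.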

A search online turned up two fixes:

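1. Update the CD_ID column of the SDS table in the Hive metastore. The reason: after the table schema is altered, the CD_ID recorded in SDS for the table changes, but the SDS rows for the table's existing partitions still carry the table's old CD_ID.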

2. Drop the affected partition and recreate it.

Neither approach suits us: approach 1 edits the metastore database directly, which is risky, and approach 2 may lose data.
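The CD_ID mismatch described in approach 1 can be inspected without modifying anything. The query below is a minimal read-only sketch, assuming a MySQL-backed metastore with the standard schema (DBS, TBLS, SDS, PARTITIONS); it is run against the metastore database itself, not through Hive, and the database and table names are the ones used in the test later in this post.

```sql
-- Read-only check, run in the metastore database (assumed MySQL).
-- Compares the CD_ID the table points to with the CD_ID of each partition.
SELECT p.PART_NAME,
       ts.CD_ID AS table_cd_id,
       ps.CD_ID AS partition_cd_id
FROM TBLS t
JOIN DBS d        ON d.DB_ID  = t.DB_ID
JOIN SDS ts       ON ts.SD_ID = t.SD_ID   -- storage descriptor of the table
JOIN PARTITIONS p ON p.TBL_ID = t.TBL_ID
JOIN SDS ps       ON ps.SD_ID = p.SD_ID   -- storage descriptor of each partition
WHERE d.NAME = 'testtmp'
  AND t.TBL_NAME = 'denglg';
```

If the explanation above holds, partitions created before the schema change would show a partition_cd_id that differs from table_cd_id.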

The task from my lead was to find some other workaround.

Testing revealed the following pattern. The conclusion first:

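After a column is added to a partitioned table, inserting data into the table falls into two cases:
1. The partition existed before the schema change
2. The partition did not exist before the schema change
In the second case the bug does not occur.
In the first case,
running alter table denglg add columns(c3 string); leaves the new column NULL when the partition is queried,
so you also need to run alter table denglg partition(step='1') add columns(c3 string); [assuming step='1' is the only existing partition]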
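When more than one partition predates the schema change, the same partition-level statement would presumably have to be repeated for each of them. A minimal sketch, assuming the two partitions step='1' and step='2' from the test below and that the table lives in the testtmp database:

```sql
USE testtmp;

-- List the partitions; those created before the table-level ALTER need the fix.
SHOW PARTITIONS denglg;

-- Repeat the partition-level ALTER once per pre-existing partition.
ALTER TABLE denglg PARTITION (step='1') ADD COLUMNS (c3 string);
ALTER TABLE denglg PARTITION (step='2') ADD COLUMNS (c3 string);
```

Partitions created after the table-level ALTER (case 2) should not need this statement.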

The detailed test follows, for reference.

1. Create a partitioned table and insert data into two partitions

CREATE TABLE testtmp.denglg(c1 string, c2 string) PARTITIONED BY (step string);
insert into table testtmp.denglg partition(step='1') select '1','2' from default.dual;
insert into table testtmp.denglg partition(step='2') select '11','22' from default.dual;
hive> select * from denglg where step='1';
OK
1 2 1
hive> select * from denglg where step='2';
OK
11 22 2

2. Add column c3

alter table denglg add columns(c3 string);

3. Insert data into three partitions
insert into table testtmp.denglg partition(step='1') select '1','2','3' from default.dual;
insert into table testtmp.denglg partition(step='2') select '11','22','33' from default.dual;
insert into table testtmp.denglg partition(step='3') select '111','222','333' from default.dual;
hive> select * from denglg where step='1';
OK
1 2 NULL 1
1 2 NULL 1
Time taken: 0.122 seconds, Fetched: 2 row(s)
hive> select * from denglg where step='2';
OK
11 22 NULL 2
11 22 NULL 2
Time taken: 0.075 seconds, Fetched: 2 row(s)
hive> select * from denglg where step='3';
OK
111 222 333 3
Time taken: 0.077 seconds, Fetched: 1 row(s)
Partition step='3', which was created after the schema change, is unaffected.
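One way to see why the partitions behave differently is to compare the table-level schema with the schema recorded for each partition. This is a sketch only, assuming DESCRIBE with a PARTITION clause reports the partition's own column list on these Hive versions; no output is shown because it was not part of the original test.

```sql
USE testtmp;

-- Table-level schema.
DESCRIBE denglg;

-- Partition-level schema: a partition that predates the ALTER
-- versus one created after it.
DESCRIBE denglg PARTITION (step='1');
DESCRIBE denglg PARTITION (step='3');
```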
4. Run alter table denglg partition(step='1') add columns(c3 string);
hive> select * from denglg where step='1';
OK
1 2 NULL 1
1 2 3 1
Time taken: 0.728 seconds, Fetched: 2 row(s)
hive> select * from denglg where step='2';
OK
11 22 NULL 2
11 22 NULL 2
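This confirms the conclusion above: in step='1', the row inserted after the schema change now shows c3 = 3 (the older row has no c3 data, so it stays NULL), while step='2', whose partition was not altered, still returns NULL for both rows.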
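A related option that was not tested in this post: the Hive DDL documentation describes a CASCADE clause for ADD COLUMNS, listed as available from Hive 1.1.0, which applies the column change to the metadata of all existing partitions as well as the table. A minimal sketch, assuming the column has not already been added at the table level:

```sql
USE testtmp;

-- CASCADE propagates the new column to every existing partition's metadata;
-- RESTRICT, the default, changes only the table metadata.
ALTER TABLE denglg ADD COLUMNS (c3 string) CASCADE;
```

If the column was already added without CASCADE, this statement would likely fail because c3 already exists, and the partition-level ALTER shown in step 4 remains the route verified here.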
