Fixing the NULL Column When Loading a Local File into a Hive Table

Original

du_qi

2020-06-27 11:48

Here is an example.

Suppose a local file has two columns per line, with the following content:

0000000026310400 F
0000000029858520 F
0000000042620180 F
0000000044783820 F
0000000045771260 F

Create a Hive table with the following statement:

create table if not exists new_table(id string, lable string);

Load the local file into new_table with the following statement:

load data local inpath '~/new_file' overwrite into table new_table;

Querying new_table shows an unexpected NULL column:

hive> select id, lable from new_table;
0000000026310400 F NULL
0000000029858520 F NULL
0000000042620180 F NULL
0000000044783820 F NULL
0000000045771260 F NULL

Now query only the first field:

hive> select id from new_table;
0000000026310400 F
0000000029858520 F
0000000042620180 F
0000000044783820 F
0000000045771260 F

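The query results show that the two columns on each line of the local file were loaded into the table as a single field. Because the table defines two fields, the second one has no data and is therefore NULL in every row.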

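The cause is that the create table statement did not specify a row format, so Hive used its default SerDe (serializer/deserializer), whose default field delimiter is '\001' (Ctrl-A). The columns in the local file new_file are separated by '\t', so Hive found no field delimiter on any line and treated the two columns as one field.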
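The effect can be reproduced outside Hive with a small Python sketch. It is only a simplified model of how a delimited-text row is split into columns, not Hive's actual SerDe code; the function name parse_row and its padding and truncation behaviour are assumptions made for illustration.

```python
def parse_row(line, num_columns, delim="\x01"):
    """Split one text line into num_columns fields (simplified model).

    Missing fields are padded with None (standing in for NULL) and
    extra fields are dropped.
    """
    fields = line.rstrip("\n").split(delim)
    fields = fields[:num_columns]
    fields += [None] * (num_columns - len(fields))
    return fields


line = "0000000026310400\tF\n"

# Default delimiter '\001': the tab is not treated as a separator, so the
# whole line lands in the first column and the second column is None.
print(parse_row(line, 2))

# Tab delimiter: the line splits into the two expected columns.
print(parse_row(line, 2, delim="\t"))
```

With the default delimiter the first column holds the entire line, which matches what select id from new_table returned above.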

To get rid of the NULL column, change the field delimiter of new_table with the following statement:

hive> alter table new_table set SERDEPROPERTIES('field.delim'='\t');

Query again:

hive> select id, lable from new_table;
0000000026310400 F
0000000029858520 F
0000000042620180 F
0000000044783820 F
0000000045771260 F

hive> select id from new_table;
0000000026310400
0000000029858520
0000000042620180
0000000044783820
0000000045771260

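The table contents are now correct.

To avoid this problem in the first place, specify the correct field delimiter when defining the table, for example: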

create table if not exists new_table(id string, lable string) row format delimited fields terminated by '\t';
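Before writing the create table statement, it can help to confirm which delimiter the local file actually uses. The following is a rough sketch, not part of the original workflow: the helper guess_delimiter and its candidate list are hypothetical, and it simply looks for a character that occurs the same non-zero number of times on every line.

```python
def guess_delimiter(lines, candidates=("\t", ",", "\x01", " ")):
    """Return the first candidate that appears the same non-zero number
    of times on every line, or None if no candidate qualifies."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if not rows:
        return None
    for delim in candidates:
        counts = {row.count(delim) for row in rows}
        if len(counts) == 1 and counts.pop() > 0:
            return delim
    return None


# Sample rows shaped like new_file, assuming tab-separated columns.
sample = [
    "0000000026310400\tF\n",
    "0000000029858520\tF\n",
]
print(repr(guess_delimiter(sample)))
```

In practice the lines would come from the file itself, for example by opening it and passing the first few lines to the function; the result then goes into the fields terminated by clause.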
