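1. Problem description
After importing a MySQL table into a Hive table with Sqoop, every column of the Hive table shows NULL.
The Hive table was created with the field delimiter “\u001B” (the ESC character), and Sqoop was told to use the same character as its field delimiter.
Running show create table test_hive_delimiter returns the following CREATE statement: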
0: jdbc:hive2://localhost:10000/> show create table test_hive_delimiter;
...
INFO : OK
+----------------------------------------------------+--+
| createtab_stmt |
+----------------------------------------------------+--+
| CREATE EXTERNAL TABLE `test_hive_delimiter`( |
| `id` int, |
| `name` string, |
| `address` string) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| WITH SERDEPROPERTIES ( |
| 'field.delim'='\u0015', |
| 'serialization.format'='\u0015') |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://ip-172-31-6-148.fayson.com:8020/fayson/test_hive_delimiter' |
| TBLPROPERTIES ( |
| 'COLUMN_STATS_ACCURATE'='false', |
| 'numFiles'='0', |
| 'numRows'='-1', |
| 'rawDataSize'='-1', |
| 'totalSize'='0', |
| 'transient_lastDdlTime'='1504705887') |
+----------------------------------------------------+--+
22 rows selected (0.084 seconds)
0: jdbc:hive2://localhost:10000/>
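The original CREATE statement specified “\u001B” as the delimiter, but show create table test_hive_delimiter reports “\u0015”. The delimiter was changed when the table was created.
2. Reproducing the problem
1. Create the Hive table test_hive_delimiter with “\u001B” as the delimiter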
create external table test_hive_delimiter
(
id int,
name string,
address string
)
row format delimited fields terminated by '\u001B'
stored as textfile location '/fayson/test_hive_delimiter';
2. Use Sqoop to import the MySQL table test into the Hive table test_hive_delimiter
[root@ip-172-31-6-148 ~]# sqoop import --connect jdbc:mysql://ip-172-31-6-148.fayson.com:3306/fayson -username root -password 123456 --table test -m 1 --hive-import --fields-terminated-by "\0x001B" --target-dir /fayson/test_hive_delimiter --hive-table test_hive_delimiter
The import succeeds:
[root@ip-172-31-6-148 ~]# hadoop fs -ls /fayson/test_hive_delimiter
Found 2 items
-rw-r--r-- 3 fayson supergroup 0 2017-09-06 13:46 /fayson/test_hive_delimiter/_SUCCESS
-rwxr-xr-x 3 fayson supergroup 56 2017-09-06 13:46 /fayson/test_hive_delimiter/part-m-00000
[root@ip-172-31-6-148 ~]# hadoop fs -ls /fayson/test_hive_delimiter/part-m-00000
-rwxr-xr-x 3 fayson supergroup 56 2017-09-06 13:46 /fayson/test_hive_delimiter/part-m-00000
[root@ip-172-31-6-148 ~]#
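To confirm which delimiter Sqoop actually wrote, it can help to look at the raw bytes of the data file rather than rely on Hive. The sketch below is only an illustration: `dump_bytes` is a hypothetical helper name, and the sample line is reconstructed from the rows shown in the query result later in this article, not copied from the cluster. With `od -c`, an ESC byte (0x1B) is displayed as `033`.

```shell
#!/bin/sh
# dump_bytes: print stdin character by character; non-printable bytes
# are shown by od as three-digit octal values.
dump_bytes() {
    od -An -c
}

# Local stand-in for one line of part-m-00000 (ESC written as the octal escape \033).
printf '1\033fayson\033guangdong\n' | dump_bytes

# On the cluster, the same helper could be fed from HDFS (path taken from the listing above):
#   hadoop fs -cat /fayson/test_hive_delimiter/part-m-00000 | dump_bytes
```

If `033` shows up between the fields, the data file itself is fine and the problem lies in how the table definition stores the delimiter.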
3. Query the data in test_hive_delimiter
[root@ip-172-31-6-148 ~]# beeline
Beeline version 1.1.0-cdh5.12.1 by Apache Hive
beeline> !connect jdbc:hive2://localhost:10000/;principal=hive/[email protected]
...
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://localhost:10000/> select * from test_hive_delimiter;
...
INFO : OK
+-------------------------+---------------------------+------------------------------+--+
| test_hive_delimiter.id | test_hive_delimiter.name | test_hive_delimiter.address |
+-------------------------+---------------------------+------------------------------+--+
| NULL | NULL | NULL |
| NULL | NULL | NULL |
| NULL | NULL | NULL |
+-------------------------+---------------------------+------------------------------+--+
3 rows selected (0.287 seconds)
0: jdbc:hive2://localhost:10000/>
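4. The CREATE statement that Hive reports for this table is the one shown in section 1, with “\u0015” as the delimiter instead of “\u001B”.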
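3. Solution
The delimiter “\u001B” is written in hexadecimal, whereas Hive expects delimiter escapes in octal. On this Hive version (1.1.0-cdh5.12.1) the hexadecimal escape is reinterpreted by Hive, which is why a table created with “\u001B” ends up reporting “\u0015” as its delimiter.
To keep the delimiter in the data files unchanged, convert the hexadecimal delimiter to its octal form first and create the Hive table with that.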
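The value “\u0015” is not arbitrary. One reading that fits the observed output, offered here as an assumption and not something stated in this article or checked against the Hive source, is that each of the four digits was read as a hex digit but weighted by decimal place values (1000, 100, 10, 1). For 001B that gives 0*1000 + 0*100 + 1*10 + 11*1 = 21, which is 0x15. The sketch below only reproduces that arithmetic; `guess_reported_code` is a hypothetical name.

```shell
#!/bin/sh
# guess_reported_code: ASSUMPTION, not confirmed Hive behaviour.
# Reads four hex digits and weights them 1000/100/10/1 instead of 4096/256/16/1,
# then prints the resulting code point in \uXXXX form.
guess_reported_code() {
    digits=$1
    code=0
    for weight in 1000 100 10 1; do
        first=${digits%"${digits#?}"}   # first remaining character
        digits=${digits#?}              # drop it
        code=$((code + 0x$first * weight))
    done
    printf '\\u%04X\n' "$code"
}

guess_reported_code 001B   # prints \u0015, matching the show create table output
guess_reported_code 0001   # prints \u0001, an escape that would come through unchanged
```

Under this reading, only escapes whose decimal-weighted value differs from the real hex value would be altered, which would fit the note at the end of this article that some delimiters are affected and others are not.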
1. Convert the hexadecimal delimiter to octal
“\u001B” is “\033” in octal. An online converter is available at http://tool.lu/hexconvert/
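The conversion can also be done locally instead of with an online tool. A minimal sketch, assuming a POSIX shell; `hex_to_octal_escape` is a hypothetical helper name.

```shell
#!/bin/sh
# hex_to_octal_escape: turn a hex code point such as 001B into a
# backslash-plus-three-octal-digits escape usable in the CREATE statement.
hex_to_octal_escape() {
    printf '\\%03o\n' "$((0x$1))"
}

hex_to_octal_escape 001B   # prints \033
hex_to_octal_escape 001C   # prints \034
```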
2. Change the CREATE statement to use the octal “\033” as the delimiter
create external table test_hive_delimiter
(
id int,
name string,
address string
)
row format delimited fields terminated by '\033'
stored as textfile location '/fayson/test_hive_delimiter';
Check the CREATE statement with show create table test_hive_delimiter:
0: jdbc:hive2://localhost:10000/> show create table test_hive_delimiter;
...
INFO : OK
+----------------------------------------------------+--+
| createtab_stmt |
+----------------------------------------------------+--+
| CREATE EXTERNAL TABLE `test_hive_delimiter`( |
| `id` int, |
| `name` string, |
| `address` string) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| WITH SERDEPROPERTIES ( |
| 'field.delim'='\u001B', |
| 'serialization.format'='\u001B') |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'hdfs://ip-172-31-6-148.fayson.com:8020/fayson/test_hive_delimiter' |
| TBLPROPERTIES ( |
| 'COLUMN_STATS_ACCURATE'='false', |
| 'numFiles'='0', |
| 'numRows'='-1', |
| 'rawDataSize'='-1', |
| 'totalSize'='0', |
| 'transient_lastDdlTime'='1504707693') |
+----------------------------------------------------+--+
22 rows selected (0.079 seconds)
0: jdbc:hive2://localhost:10000/>
3. Query the data in test_hive_delimiter
0: jdbc:hive2://localhost:10000/> select * from test_hive_delimiter;
...
INFO : OK
+-------------------------+---------------------------+------------------------------+--+
| test_hive_delimiter.id | test_hive_delimiter.name | test_hive_delimiter.address |
+-------------------------+---------------------------+------------------------------+--+
| 1 | fayson | guangdong |
| 2 | zhangsan | shenzheng |
| 3 | lisi | shanghai |
+-------------------------+---------------------------+------------------------------+--+
3 rows selected (0.107 seconds)
0: jdbc:hive2://localhost:10000/>
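Creating the table with the octal “\033” instead of the hexadecimal “\u001B” solves the problem.
4. Notes
- Take care when using hexadecimal delimiters in a Hive CREATE statement: some of them are altered (for example 001B and 001C).
- For why the hexadecimal delimiter is written “\0x001B” rather than “\u001B” on the Sqoop command line, see the Sqoop documentation: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_file_formats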
Original article. Reposting is welcome; please credit the WeChat official account Hadoop實操.