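How to Load a TXT File into an HDInsight Hive Table
I once worked on a small project that required importing TXT files compiled by a customer into a database and then producing reports filtered on specific conditions. For example, the table schema was: time, name, meeting, level, and one requirement was to count how many people attended meetings during a given period.
There are many ways to move a txt file into a database, such as SSIS, or writing a program with Entity Framework that reads the file and writes its contents to the database. This post shows how to load a txt file into an HDI Hive table with HDInsight, which can meet the same customer requirements.
Upload hivetable.txt to the HDI head node.
SSH to the HDInsight head node you created and check the file contents: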
sshuser@hn0-hdites:~$ cat hivetable.txt
linlin,123,male
brian,345,male
lin,567,female
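Before loading, it can help to confirm that every line really has three comma-separated fields, because the table defined later in this post expects exactly name, id and sex. The following is a minimal sketch, not part of the original workflow; the function name check_fields is hypothetical.

```shell
# Hypothetical helper: print every line of FILE that does not have exactly
# EXPECTED comma-separated fields, and return non-zero if any such line exists.
# Usage: check_fields FILE EXPECTED
check_fields() {
    file=$1
    expected=$2
    awk -F',' -v n="$expected" '
        # Report the line number, the field count found, and the line itself.
        NF != n { printf "line %d: %d field(s): %s\n", NR, NF, $0; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' "$file"
}
```

On the head node this could be called as: check_fields hivetable.txt 3 && echo "all lines have 3 fields". Empty lines are reported too, since they contain zero fields.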
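Copy the txt file to HDFS storage:
hdfs dfs -copyFromLocal hivetable.txt wasb://<container>@hditest.blob.core.windows.net/hive/
Note: hditest.blob.core.windows.net is the Azure Blob storage account, and <container> stands for the name of your blob container.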
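A wasb:// address is made of a container name, a storage account name and a path inside the container. The sketch below only illustrates that layout; the function name wasb_uri and the container name mycontainer are hypothetical and do not come from the original post.

```shell
# Hypothetical helper: assemble a wasb:// address from its three parts.
# Usage: wasb_uri CONTAINER ACCOUNT PATH
wasb_uri() {
    container=$1
    account=$2
    path=${3#/}   # drop one leading slash so the host is followed by a single '/'
    printf 'wasb://%s@%s.blob.core.windows.net/%s\n' "$container" "$account" "$path"
}

# Possible use on the head node (mycontainer is an assumed container name):
# hdfs dfs -copyFromLocal hivetable.txt "$(wasb_uri mycontainer hditest hive/)"
```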
Connect to Hive with Beeline:
beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http'
Create a table whose structure matches the TXT file:
CREATE TABLE hiveexample (
name string,
id int,
sex string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
0: jdbc:hive2://headnodehost:10001/> SHOW CREATE TABLE hiveexample;
+-------------------------------------------------------------------------------------------------------------------------------+--+
| createtab_stmt |
+-------------------------------------------------------------------------------------------------------------------------------+--+
| CREATE TABLE `hiveexample`( |
| `name` string, |
| `id` int, |
| `sex` string) |
| ROW FORMAT DELIMITED |
| FIELDS TERMINATED BY ',' |
| LINES TERMINATED BY '\n' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'wasb://<container>@hditest.blob.core.windows.net/hive/warehouse/hiveexample' |
| TBLPROPERTIES ( |
| 'numFiles'='1', |
| 'numRows'='0', |
| 'rawDataSize'='0', |
| 'totalSize'='49', |
| 'transient_lastDdlTime'='1570359102') |
+-------------------------------------------------------------------------------------------------------------------------------+--+
19 rows selected (0.656 seconds)
Confirm that the table has been created:
0: jdbc:hive2://headnodehost:10001/> show tables;
+------------------+--+
| tab_name |
+------------------+--+
| hiveexample |
+------------------+--+
Load the file hivetable.txt stored on HDFS into the table hiveexample:
LOAD DATA INPATH '/hive/hivetable.txt' OVERWRITE INTO TABLE hiveexample;
View the table contents:
0: jdbc:hive2://headnodehost:10001/> select * from hiveexample;
+-------------------+-----------------+------------------+--+
| hiveexample.name | hiveexample.id | hiveexample.sex |
+-------------------+-----------------+------------------+--+
| linlin | 123 | male |
| brian | 345 | male |
| lin | 567 | female |
+-------------------+-----------------+------------------+--+
0: jdbc:hive2://headnodehost:10001/> select * from hiveexample where sex = 'male';
+-------------------+-----------------+------------------+--+
| hiveexample.name | hiveexample.id | hiveexample.sex |
+-------------------+-----------------+------------------+--+
| linlin | 123 | male |
| brian | 345 | male |
+-------------------+-----------------+------------------+--+
2 rows selected (0.615 seconds)
You can now work with the table hiveexample using SQL.
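The statements used in this post can also be collected into a single file and run non-interactively. The following is a minimal sketch under some assumptions: the function name write_hql and the file name load_hiveexample.hql are hypothetical, IF NOT EXISTS is added so the script can be rerun, and the final GROUP BY query is only an illustration of the kind of report mentioned at the start.

```shell
# Hypothetical helper: write the HiveQL steps from this post into one file.
# Usage: write_hql OUTFILE
write_hql() {
    cat > "$1" <<'EOF'
-- Create the table only if it is not there yet (IF NOT EXISTS is an addition).
CREATE TABLE IF NOT EXISTS hiveexample (
  name string,
  id int,
  sex string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

-- Move the uploaded file into the table, replacing any existing rows.
LOAD DATA INPATH '/hive/hivetable.txt' OVERWRITE INTO TABLE hiveexample;

-- Illustrative report: number of rows for each value of sex.
SELECT sex, COUNT(*) AS people FROM hiveexample GROUP BY sex;
EOF
}

# On the head node, the file could then be run with Beeline's -f option:
# write_hql load_hiveexample.hql
# beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http' -f load_hiveexample.hql
```

Note that LOAD DATA INPATH without LOCAL moves the source file into the table's directory, so /hive/hivetable.txt has to be uploaded again before the script is rerun.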
Personally, I find Hive more flexible and convenient for this kind of task, and it is well worth trying.