Importing Data from MySQL into HBase and Hive with Sqoop (repost)

Sometimes you need to load the full MySQL dataset into Hive or HBase. Sqoop is a convenient tool for this and is relatively fast. Incremental MySQL changes are synchronized in real time by other means.

Part 1: Syncing MySQL to HBase

Import command:
sqoop import --connect jdbc:mysql://xxx.xxx.xxx.xxx:3306/database --table tablename --hbase-table hbasetablename --column-family family --hbase-row-key ID --hbase-create-table --username 'root' -P
Parameter descriptions:
--connect: JDBC connection string for the database
--username: database user name
-P: prompt for the password interactively
--table: source table name
-m: number of parallel map tasks used for the import; if not specified, 4 map tasks are launched by default
--split-by: the column used to partition the data among the map tasks during a parallel import; it is best to choose a column that splits the data relatively evenly, such as a creation timestamp or an auto-increment ID (see the example after this list)
--hbase-table: name of the HBase table that receives the data
--hbase-create-table: create the target table in HBase if it does not already exist
--column-family: column family name; all columns of the source table are written into this column family
--hbase-row-key: if not specified, the primary key of the source table is used as the HBase row key. You can specify a single column, or a composite row key; for a composite key, wrap the columns in double quotes and separate them with commas
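As an illustration of -m and --split-by, here is a minimal sketch of an import with the parallelism set explicitly; the table name orders, the split column id, and the column family info are placeholders, not names from the original article:
# orders, id and info are placeholder names for illustration only
sqoop import --connect jdbc:mysql://xxx.xxx.xxx.xxx:3306/database --table orders --split-by id -m 8 --hbase-table orders --column-family info --hbase-row-key id --hbase-create-table --username 'root' -P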

Partial log from running the command:

[hdfs@slave1 ~]$ sqoop import --connect jdbc:mysql://xxx.xxx.xxx.xxx:3306/database --table tablename --hbase-table hbasetablename --column-family family --hbase-row-key ID --hbase-create-table --username 'root' -P
Warning: /soft/bigdata/clouderamanager/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
17/04/28 15:54:37 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.10.0
Enter password: 
17/04/28 15:54:44 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
17/04/28 15:54:44 INFO tool.CodeGenTool: Beginning code generation
Fri Apr 28 15:54:44 CST 2017 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
17/04/28 15:54:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `COMP_DICT` AS t LIMIT 1
17/04/28 15:54:45 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `COMP_DICT` AS t LIMIT 1
17/04/28 15:54:45 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /soft/bigdata/clouderamanager/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-hdfs/compile/f5c3b693ffb26b66c554308ad32b2880/COMP_DICT.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
17/04/28 15:54:47 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hdfs/compile/f5c3b693ffb26b66c554308ad32b2880/COMP_DICT.jar
……
17/04/28 15:54:53 INFO mapreduce.Job: The url to track the job: http://master2:8088/proxy/application_1491881598805_0027/
17/04/28 15:54:53 INFO mapreduce.Job: Running job: job_1491881598805_0027
17/04/28 15:54:59 INFO mapreduce.Job: Job job_1491881598805_0027 running in uber mode : false
17/04/28 15:54:59 INFO mapreduce.Job:  map 0% reduce 0%
17/04/28 15:55:05 INFO mapreduce.Job:  map 20% reduce 0%
17/04/28 15:55:06 INFO mapreduce.Job:  map 60% reduce 0%
17/04/28 15:55:09 INFO mapreduce.Job:  map 100% reduce 0%
17/04/28 15:55:10 INFO mapreduce.Job: Job job_1491881598805_0027 completed successfully
17/04/28 15:55:10 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=925010
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=665
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=5
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters 
        Launched map tasks=5
        Other local map tasks=5
        Total time spent by all maps in occupied slots (ms)=25663
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=25663
        Total vcore-seconds taken by all map tasks=25663
        Total megabyte-seconds taken by all map tasks=26278912
    Map-Reduce Framework
        Map input records=10353
        Map output records=10353
        Input split bytes=665
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=586
        CPU time spent (ms)=17940
        Physical memory (bytes) snapshot=1619959808
        Virtual memory (bytes) snapshot=14046998528
        Total committed heap usage (bytes)=1686634496
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0
17/04/28 15:55:10 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 20.0424 seconds (0 bytes/sec)
17/04/28 15:55:10 INFO mapreduce.ImportJobBase: Retrieved 10353 records.

PS: If you also want the source table's primary key to be stored as a regular cell (i.e., as a column inside the column family) in addition to serving as the row key, set the following property:
sqoop.hbase.add.row.key=true

sqoop import -D sqoop.hbase.add.row.key=true --connect jdbc:mysql://xxx.xxx.xxx.xxx:3306/database --table tablename --hbase-table hbasetablename --column-family family --hbase-row-key "etl_date,APPLY_ID" --hbase-create-table --username 'root' -P
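To spot-check what landed in HBase, you can scan a few rows from the HBase shell (a minimal sketch, assuming the hbase shell is on the path and using the placeholder table name from the commands above):
# hbasetablename is the placeholder table name used above
echo "scan 'hbasetablename', {LIMIT => 5}" | hbase shell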

Part 2: Syncing MySQL to Hive

1. Create a Hive table with the same structure as the MySQL table:

sqoop create-hive-table --connect jdbc:mysql://xxx.xxx.xxx.xxx:3306/shiro --table UserInfo --hive-database shiro --hive-table userinfo --username root --password xxxxxx --fields-terminated-by "\0001" --lines-terminated-by "\n";
Parameter descriptions:
--fields-terminated-by "\0001" sets the delimiter between columns. "\0001" is the ASCII character with code 1 (Ctrl-A), which is also Hive's default field delimiter, whereas Sqoop's default field delimiter is ','
--lines-terminated-by "\n" sets the delimiter between rows; here it is the newline character, which is also the default;
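Assuming the create-hive-table command above has run, the chosen delimiters can be confirmed in the generated table definition (a minimal sketch using the shiro.userinfo names from the command above):
hive -e "SHOW CREATE TABLE shiro.userinfo"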

2. Import the MySQL data into Hive

sqoop import --connect jdbc:mysql://xxx.xxx.xxx.xxx:3306/testSqoop --table dydata --hive-database testsqoop --hive-import --hive-table dydata --username root --password xxxxxx --fields-terminated-by "\0001";
Parameter descriptions:
-m 2 means the job is executed by two map tasks (optional; not included in the command above)
--fields-terminated-by "\0001" must match the delimiter used when the Hive table was created;
--hive-import must be specified; without it the data will not be imported into Hive
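One way to confirm the import is to compare row counts on both sides (a minimal sketch using the testSqoop/testsqoop names from the command above; the mysql client invocation and its credentials are assumptions):
hive -e "SELECT COUNT(*) FROM testsqoop.dydata"
mysql -h xxx.xxx.xxx.xxx -u root -p -e "SELECT COUNT(*) FROM testSqoop.dydata"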



Author: 獻給記性不好的自己
Link: https://www.jianshu.com/p/929934b5e9b8
Source: Jianshu
Copyright belongs to the author; for reproduction in any form, please contact the author for authorization and cite the source.