Kettle and Hadoop (7): Running Sqoop Jobs

Contents

I. Sqoop export

II. Sqoop import


I. Sqoop export

1. Create a job that exports an HDFS file to a MySQL database.
(1) Open PDI and create a new job, as shown in Figure 1.

Figure 1

(2) Edit the 'Sqoop export' job entry, as shown in Figure 2.

Figure 2

        Notes:

Figure 3

  • sales_order is the target table in MySQL; its structure matches the part-m-00000 file, as shown in Figure 4.

Figure 4
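Behind the dialog, the 'Sqoop export' job entry ultimately builds an ordinary Sqoop command line. As a rough illustration of the options involved (the JDBC URL, credentials, and HDFS path below are hypothetical placeholders, not values taken from the dialog), the equivalent command could be assembled like this:

```python
# Sketch: assemble the sqoop export command line that roughly corresponds
# to the PDI 'Sqoop export' job entry. Host, credentials, and the export
# directory are hypothetical placeholders -- substitute your own values.
def build_sqoop_export_cmd(jdbc_url, user, password, table, export_dir):
    return [
        "sqoop", "export",
        "--connect", jdbc_url,       # JDBC URL of the target MySQL database
        "--username", user,
        "--password", password,      # -P (interactive prompt) is safer, as the log warns
        "--table", table,            # target MySQL table
        "--export-dir", export_dir,  # HDFS directory holding the part-m-* files
    ]

cmd = build_sqoop_export_cmd(
    "jdbc:mysql://manager:3306/testdb",  # hypothetical database
    "root", "secret",
    "sales_order",
    "/user/root/sales_order_src",        # hypothetical source directory
)
print(" ".join(cmd))
```

Note the warning that shows up in the execution log below ("Setting your password on the command-line is insecure. Consider using -P instead."): in a production setup, prefer a password file or interactive prompt over a plain-text password.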

2. Save and run the job. The log is shown below.

2020/06/08 09:28:40 - Spoon - Starting job...
2020/06/08 09:28:40 - sqoop_export - Start of job execution
2020/06/08 09:28:40 - sqoop_export - Starting entry [Sqoop export]
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - fs.default.name is deprecated. Instead, use fs.defaultFS
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Running Sqoop version: 1.4.7-cdh6.1.0
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Setting your password on the command-line is insecure. Consider using -P instead.
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Preparing to use a MySQL streaming resultset.
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Beginning code generation
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Executing SQL statement: SELECT t.* FROM `sales_order` AS t LIMIT 1
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - Executing SQL statement: SELECT t.* FROM `sales_order` AS t LIMIT 1
2020/06/08 09:28:40 - Sqoop export - 2020/06/08 09:28:40 - $HADOOP_MAPRED_HOME is not set
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Writing jar file: /tmp/sqoop-root/compile/7c09fc949771ba78a584cc07c1cd6b3f/sales_order.jar
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Beginning export of sales_order
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - SQOOP_HOME is unset. May not be able to find all job dependencies.
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Connecting to ResourceManager at manager/172.16.1.124:8032
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Disabling Erasure Coding for path: /user/root/.staging/job_1591323999364_0046
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Total input files to process : 1
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Total input files to process : 1
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - number of splits:4
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - fs.default.name is deprecated. Instead, use fs.defaultFS
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Submitting tokens for job: job_1591323999364_0046
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Executing with tokens: []
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Submitted application application_1591323999364_0046
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - The url to track the job: http://manager:8088/proxy/application_1591323999364_0046/
2020/06/08 09:28:41 - Sqoop export - 2020/06/08 09:28:41 - Running job: job_1591323999364_0046
2020/06/08 09:28:51 - Sqoop export - 2020/06/08 09:28:51 - Job job_1591323999364_0046 running in uber mode : false
2020/06/08 09:28:51 - Sqoop export - 2020/06/08 09:28:51 -  map 0% reduce 0%
2020/06/08 09:28:58 - Sqoop export - 2020/06/08 09:28:58 -  map 75% reduce 0%
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 -  map 100% reduce 0%
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 - Job job_1591323999364_0046 completed successfully
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 - Counters: 34
2020/06/08 09:28:59 - Sqoop export -     File System Counters
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of bytes read=0
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of bytes written=968088
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of read operations=0
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of large read operations=0
2020/06/08 09:28:59 - Sqoop export -         FILE: Number of write operations=0
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of bytes read=15868
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of bytes written=0
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of read operations=16
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of large read operations=0
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of write operations=0
2020/06/08 09:28:59 - Sqoop export -         HDFS: Number of bytes read erasure-coded=0
2020/06/08 09:28:59 - Sqoop export -     Job Counters 
2020/06/08 09:28:59 - Sqoop export -         Launched map tasks=4
2020/06/08 09:28:59 - Sqoop export -         Data-local map tasks=2
2020/06/08 09:28:59 - Sqoop export -         Rack-local map tasks=2
2020/06/08 09:28:59 - Sqoop export -         Total time spent by all maps in occupied slots (ms)=18912
2020/06/08 09:28:59 - Sqoop export -         Total time spent by all reduces in occupied slots (ms)=0
2020/06/08 09:28:59 - Sqoop export -         Total time spent by all map tasks (ms)=18912
2020/06/08 09:28:59 - Sqoop export -         Total vcore-milliseconds taken by all map tasks=18912
2020/06/08 09:28:59 - Sqoop export -         Total megabyte-milliseconds taken by all map tasks=19365888
2020/06/08 09:28:59 - Sqoop export -     Map-Reduce Framework
2020/06/08 09:28:59 - Sqoop export -         Map input records=102
2020/06/08 09:28:59 - Sqoop export -         Map output records=102
2020/06/08 09:28:59 - Sqoop export -         Input split bytes=556
2020/06/08 09:28:59 - Sqoop export -         Spilled Records=0
2020/06/08 09:28:59 - Sqoop export -         Failed Shuffles=0
2020/06/08 09:28:59 - Sqoop export -         Merged Map outputs=0
2020/06/08 09:28:59 - Sqoop export -         GC time elapsed (ms)=351
2020/06/08 09:28:59 - Sqoop export -         CPU time spent (ms)=5660
2020/06/08 09:28:59 - Sqoop export -         Physical memory (bytes) snapshot=857292800
2020/06/08 09:28:59 - Sqoop export -         Virtual memory (bytes) snapshot=10344894464
2020/06/08 09:28:59 - Sqoop export -         Total committed heap usage (bytes)=739770368
2020/06/08 09:28:59 - Sqoop export -         Peak Map Physical memory (bytes)=228241408
2020/06/08 09:28:59 - Sqoop export -         Peak Map Virtual memory (bytes)=2586439680
2020/06/08 09:28:59 - Sqoop export -     File Input Format Counters 
2020/06/08 09:28:59 - Sqoop export -         Bytes Read=0
2020/06/08 09:28:59 - Sqoop export -     File Output Format Counters 
2020/06/08 09:28:59 - Sqoop export -         Bytes Written=0
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 - Transferred 15.4961 KB in 18.7657 seconds (845.5859 bytes/sec)
2020/06/08 09:28:59 - Sqoop export - 2020/06/08 09:28:59 - Exported 102 records.
2020/06/08 09:28:59 - sqoop_export - Finished job entry [Sqoop export] (result=[true])
2020/06/08 09:28:59 - sqoop_export - Job execution finished
2020/06/08 09:28:59 - Spoon - Job has ended.

        As the log shows, the job executed successfully. A file named sales_order.java is generated in the job's directory.

3. Check the MySQL table; the result is shown in Figure 5.

Figure 5

        As Figure 5 shows, the Sqoop export job entry exported the contents of the HDFS file into the MySQL table.
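When this job runs unattended, the success summary in the PDI log can also be checked programmatically instead of by eye. A minimal sketch that extracts the record count from Sqoop's "Exported N records." line:

```python
import re

def exported_count(log_text):
    """Return the record count from Sqoop's 'Exported N records.' summary,
    or None when the line is absent (i.e. the export did not complete)."""
    m = re.search(r"Exported (\d+) records\.", log_text)
    return int(m.group(1)) if m else None

line = "2020/06/08 09:28:59 - Sqoop export - Exported 102 records."
print(exported_count(line))  # -> 102
```

The same pattern with "Retrieved (\d+) records\." works for the import log in the next section.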
 

II. Sqoop import

1. Create a job that imports MySQL table data into HDFS.
(1) Open PDI and create a new job, as shown in Figure 6.

Figure 6

(2) Edit the 'Sqoop import' job entry, as shown in Figure 7.

Figure 7

        Notes:

  • /user/root/sales_order is the target directory in HDFS; it must not already exist, or the import will fail.
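As with the export, the job entry boils down to a Sqoop command line. A rough sketch of the equivalent command (the JDBC URL and credentials are hypothetical placeholders; the split column matches the order_number column that appears in the BoundingValsQuery in the log below):

```python
# Sketch: assemble the sqoop import command line that roughly corresponds
# to the PDI 'Sqoop import' job entry. Host and credentials are
# hypothetical placeholders -- substitute your own values.
def build_sqoop_import_cmd(jdbc_url, user, password, table, target_dir, split_by):
    return [
        "sqoop", "import",
        "--connect", jdbc_url,      # JDBC URL of the source MySQL database
        "--username", user,
        "--password", password,     # again, prefer -P or a password file
        "--table", table,           # source MySQL table
        "--target-dir", target_dir, # HDFS target directory; must not exist yet
        "--split-by", split_by,     # column Sqoop uses to partition the query
    ]

cmd = build_sqoop_import_cmd(
    "jdbc:mysql://manager:3306/testdb",  # hypothetical database
    "root", "secret",
    "sales_order",
    "/user/root/sales_order",
    "order_number",
)
print(" ".join(cmd))
```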

2. Save and run the job. The log is shown below.

2020/06/08 10:00:52 - Spoon - Starting job...
2020/06/08 10:00:52 - sqoop_import - Start of job execution
2020/06/08 10:00:52 - sqoop_import - Starting entry [Sqoop import]
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - fs.default.name is deprecated. Instead, use fs.defaultFS
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Running Sqoop version: 1.4.7-cdh6.1.0
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Setting your password on the command-line is insecure. Consider using -P instead.
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Preparing to use a MySQL streaming resultset.
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Beginning code generation
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Executing SQL statement: SELECT t.* FROM `sales_order` AS t LIMIT 1
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Executing SQL statement: SELECT t.* FROM `sales_order` AS t LIMIT 1
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - $HADOOP_MAPRED_HOME is not set
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Writing jar file: /tmp/sqoop-root/compile/7c09fc949771ba78a584cc07c1cd6b3f/sales_order.jar
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Setting zero DATETIME behavior to convertToNull (mysql)
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - Beginning import of sales_order
2020/06/08 10:00:52 - Sqoop import - 2020/06/08 10:00:52 - SQOOP_HOME is unset. May not be able to find all job dependencies.
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Connecting to ResourceManager at manager/172.16.1.124:8032
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Disabling Erasure Coding for path: /user/root/.staging/job_1591323999364_0048
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Using read commited transaction isolation
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - BoundingValsQuery: SELECT MIN(`order_number`), MAX(`order_number`) FROM `sales_order`
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Split size: 25; Num splits: 4 from: 101 to: 202
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - number of splits:4
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - fs.default.name is deprecated. Instead, use fs.defaultFS
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Submitting tokens for job: job_1591323999364_0048
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Executing with tokens: []
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Submitted application application_1591323999364_0048
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - The url to track the job: http://manager:8088/proxy/application_1591323999364_0048/
2020/06/08 10:00:53 - Sqoop import - 2020/06/08 10:00:53 - Running job: job_1591323999364_0048
2020/06/08 10:01:02 - Sqoop import - 2020/06/08 10:01:02 - Job job_1591323999364_0048 running in uber mode : false
2020/06/08 10:01:02 - Sqoop import - 2020/06/08 10:01:02 -  map 0% reduce 0%
2020/06/08 10:01:09 - Sqoop import - 2020/06/08 10:01:09 -  map 75% reduce 0%
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 -  map 100% reduce 0%
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 - Job job_1591323999364_0048 completed successfully
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 - Counters: 33
2020/06/08 10:01:11 - Sqoop import -     File System Counters
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of bytes read=0
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of bytes written=969236
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of read operations=0
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of large read operations=0
2020/06/08 10:01:11 - Sqoop import -         FILE: Number of write operations=0
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of bytes read=481
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of bytes written=6120
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of read operations=24
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of large read operations=0
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of write operations=8
2020/06/08 10:01:11 - Sqoop import -         HDFS: Number of bytes read erasure-coded=0
2020/06/08 10:01:11 - Sqoop import -     Job Counters 
2020/06/08 10:01:11 - Sqoop import -         Launched map tasks=4
2020/06/08 10:01:11 - Sqoop import -         Other local map tasks=4
2020/06/08 10:01:11 - Sqoop import -         Total time spent by all maps in occupied slots (ms)=19048
2020/06/08 10:01:11 - Sqoop import -         Total time spent by all reduces in occupied slots (ms)=0
2020/06/08 10:01:11 - Sqoop import -         Total time spent by all map tasks (ms)=19048
2020/06/08 10:01:11 - Sqoop import -         Total vcore-milliseconds taken by all map tasks=19048
2020/06/08 10:01:11 - Sqoop import -         Total megabyte-milliseconds taken by all map tasks=19505152
2020/06/08 10:01:11 - Sqoop import -     Map-Reduce Framework
2020/06/08 10:01:11 - Sqoop import -         Map input records=102
2020/06/08 10:01:11 - Sqoop import -         Map output records=102
2020/06/08 10:01:11 - Sqoop import -         Input split bytes=481
2020/06/08 10:01:11 - Sqoop import -         Spilled Records=0
2020/06/08 10:01:11 - Sqoop import -         Failed Shuffles=0
2020/06/08 10:01:11 - Sqoop import -         Merged Map outputs=0
2020/06/08 10:01:11 - Sqoop import -         GC time elapsed (ms)=500
2020/06/08 10:01:11 - Sqoop import -         CPU time spent (ms)=5680
2020/06/08 10:01:11 - Sqoop import -         Physical memory (bytes) snapshot=783245312
2020/06/08 10:01:11 - Sqoop import -         Virtual memory (bytes) snapshot=10357563392
2020/06/08 10:01:11 - Sqoop import -         Total committed heap usage (bytes)=659554304
2020/06/08 10:01:11 - Sqoop import -         Peak Map Physical memory (bytes)=202842112
2020/06/08 10:01:11 - Sqoop import -         Peak Map Virtual memory (bytes)=2591666176
2020/06/08 10:01:11 - Sqoop import -     File Input Format Counters 
2020/06/08 10:01:11 - Sqoop import -         Bytes Read=0
2020/06/08 10:01:11 - Sqoop import -     File Output Format Counters 
2020/06/08 10:01:11 - Sqoop import -         Bytes Written=6120
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 - Transferred 5.9766 KB in 18.733 seconds (326.697 bytes/sec)
2020/06/08 10:01:11 - Sqoop import - 2020/06/08 10:01:11 - Retrieved 102 records.
2020/06/08 10:01:11 - sqoop_import - Finished job entry [Sqoop import] (result=[true])
2020/06/08 10:01:11 - sqoop_import - Job execution finished
2020/06/08 10:01:11 - Spoon - Job has ended.

        As the log shows, the job executed successfully. A file named sales_order.java is generated in the job's directory.

3. Check the HDFS target directory; the result is shown in Figure 8.

Figure 8

        As Figure 8 shows, the Sqoop import job entry imported the MySQL table data into HDFS.
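The import log's line "Split size: 25; Num splits: 4 from: 101 to: 202" comes from Sqoop's integer splitter: it queries MIN and MAX of the split column (the BoundingValsQuery in the log) and carves that range into roughly equal sub-ranges, one per map task. A simplified sketch of the calculation (Sqoop's actual IntegerSplitter handles additional edge cases such as uneven remainders and NULLs):

```python
def integer_splits(min_val, max_val, num_splits):
    """Simplified version of Sqoop's integer split calculation:
    carve the inclusive range [min_val, max_val] into contiguous
    sub-ranges of roughly (max - min) / num_splits values each."""
    split_size = (max_val - min_val) // num_splits  # 25 for 101..202 over 4 splits
    splits, lo = [], min_val
    while lo <= max_val:
        hi = min(lo + split_size, max_val)
        splits.append((lo, hi))
        lo = hi + 1
    return split_size, splits

size, ranges = integer_splits(101, 202, 4)
print(size, ranges)  # -> 25 [(101, 126), (127, 152), (153, 178), (179, 202)]
```

This matches the four map tasks and the 102 retrieved records (order_number 101 through 202) reported in the log above.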
