Introduction to Sqoop Basic Syntax and Detailed Import and Export Steps

Sqoop parameters

-m specify the number of map tasks to launch; the default is 4
--delete-target-dir delete the target directory if it already exists
--mapreduce-job-name specify the name of the MapReduce job
--target-dir import into the specified HDFS directory
--fields-terminated-by specify the delimiter between fields
--null-string for string-typed columns, replace NULL values with the specified string
--null-non-string for non-string columns, replace NULL values with the specified string
--columns import only selected columns of a table
--where import only rows matching a condition
--query import with a SQL query; when --query is used, --table and --columns cannot be used
--options-file read the options from a file
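As a quick sketch of how several of these flags combine (it uses the hadoop01 host, sqoopdb database, and user table that are set up later in this post; /sqoopdb/query_demo is just a placeholder directory). Note that a --query import must include the literal token $CONDITIONS in its WHERE clause so Sqoop can inject its split predicates:

sqoop import \
  --connect jdbc:mysql://hadoop01:3306/sqoopdb \
  --username root --password root \
  --query 'SELECT ID, ACCOUNT FROM user WHERE ID > 1 AND $CONDITIONS' \
  --target-dir /sqoopdb/query_demo \
  --delete-target-dir \
  --fields-terminated-by '\t' \
  --null-string '\\N' \
  --null-non-string '\\N' \
  -m 1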

>>>>>Exporting data from HDFS or Hive to MySQL
--table name of the MySQL table to export into
--input-fields-terminated-by delimiter of the files on HDFS; the default is a comma
--export-dir HDFS directory containing the data to export
--columns specify the columns to export
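A minimal export sketch combining these flags (it assumes the user1 MySQL table used in the export example at the end of this post already exists, and that the files under /sqoopdb/ are comma-delimited, as produced by the import below):

sqoop export \
  --connect jdbc:mysql://hadoop01:3306/sqoopdb \
  --username root --password root \
  --table user1 \
  --columns 'ID,ACCOUNT,PASSWD' \
  --export-dir /sqoopdb/ \
  --input-fields-terminated-by ',' \
  -m 1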

>>>Importing data into Hive
--create-hive-table create the target table; errors out if it already exists
--hive-database specify the Hive database
--hive-import import into Hive (without this flag the data goes to HDFS)
--hive-overwrite overwrite existing data
--hive-table specify the Hive table name; if not given, the imported table's name is used
--hive-partition-key specify the partition column of the Hive table
--hive-partition-value specify the partition value for the import
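A sketch of a partitioned Hive import with these flags; the user_partitioned table name, the dt partition column, and the partition value are placeholders, not part of the runs below:

sqoop import \
  --connect jdbc:mysql://hadoop01:3306/sqoopdb \
  --username root --password root \
  --table user \
  --hive-import \
  --hive-database default \
  --hive-table user_partitioned \
  --hive-partition-key dt \
  --hive-partition-value '2020-05-20' \
  -m 1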

Creating the table in MySQL

# List the databases in MySQL
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hivedb             |
| mysql              |
| performance_schema |
+--------------------+
4 rows in set (0.11 sec)
# Create a database
mysql> create database sqoopdb;
Query OK, 1 row affected (0.04 sec)
# Check that the database was created
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hivedb             |
| mysql              |
| performance_schema |
| sqoopdb            |
+--------------------+
5 rows in set (0.00 sec)
# Switch to the database (i.e. the one in which to create the table)
mysql> use sqoopdb;
# The 'Database changed' message means tables can now be created in it
Database changed
# Create the table (keep the table name's casing consistent to avoid problems later; learned this the hard way)
mysql> create table user( ID INT(4) NOT NULL AUTO_INCREMENT, ACCOUNT VARCHAR(255) DEFAULT NULL, PASSWD VARCHAR(255) DEFAULT NULL, PRIMARY KEY(ID) );
Query OK, 0 rows affected (0.15 sec)
# Insert data
mysql> INSERT INTO user VALUES("1","admin",'admin');
Query OK, 1 row affected (0.03 sec)

mysql> INSERT INTO user VALUES("2","root",'root');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO user VALUES("3","zfx",'zfx');
Query OK, 1 row affected (0.00 sec)
# View the inserted rows
mysql> select * from user;
+----+---------+--------+
| ID | ACCOUNT | PASSWD |
+----+---------+--------+
|  1 | admin   | admin  |
|  2 | root    | root   |
|  3 | zfx     | zfx    |
+----+---------+--------+
3 rows in set (0.00 sec)

Viewing detailed help for the sqoop command

[root@hadoop01 ~]$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

# As this output suggests, run 'sqoop help COMMAND' for detailed help on a specific command, as shown below.
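For example, to list every flag the import tool accepts:

sqoop help import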

Listing the databases in the local VM's MySQL instance

# List the databases in the local VM's MySQL instance
[root@hadoop01 ~]# sqoop list-databases --connect jdbc:mysql://hadoop01:3306/ --username root --password root
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/05/20 09:26:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/05/20 09:26:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/05/20 09:26:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hivedb
mysql
performance_schema
sqoopdb
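The eval tool from the help listing above is a handy companion for this kind of connectivity check: it runs an arbitrary SQL statement and prints the result. A sketch against the same server:

sqoop eval --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root --query 'SELECT COUNT(*) FROM user'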

Listing the tables in a database of the local VM's MySQL instance

[root@hadoop01 ~]# sqoop list-tables --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/05/20 09:27:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/05/20 09:27:13 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/05/20 09:27:13 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
user

Importing from MySQL into HDFS and viewing the data

Sqoop syntax:

sqoop import
--connect jdbc:mysql://ip:3306/databasename  # the JDBC URL; databasename is the database (in MySQL or Oracle) to read from
--table tablename  # the table to read from that database
--username root  # username
--password 123456  # password
--target-dir /path  # the HDFS directory the table is imported into (note: a directory)
--fields-terminated-by '\t'  # field delimiter for the imported data; comma by default
--lines-terminated-by '\n'  # line delimiter for the imported data
-m 1  # number of concurrent map tasks; if unset, 4 map tasks are launched by default and a column must be designated to split the work among them
--where 'condition'  # import only the rows matching the condition, i.e. a subset of the table
--incremental append  # incremental import
--check-column column_id  # the reference column for incremental imports
--last-value num  # the last value of column_id imported by the previous run
--null-string ''  # when an imported string field is NULL, replace it with the given characters
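For the incremental flags in particular, here is a sketch of a follow-up run that appends only rows of user whose ID is greater than the last value already imported (3, per the table created above). Append mode writes new files next to the existing ones, so --delete-target-dir must not be combined with it:

sqoop import \
  --connect jdbc:mysql://hadoop01:3306/sqoopdb \
  --username root --password root \
  --table user \
  --target-dir /sqoopdb/ \
  --incremental append \
  --check-column ID \
  --last-value 3 \
  -m 1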

Command breakdown

 sqoop import  # import
 --connect jdbc:mysql://hadoop01:3306/sqoopdb  # connect to MySQL
 --username root  # MySQL user
 --password root  # MySQL password
 --table user  # the table to read from the database
 --columns 'id ,ACCOUNT,PASSWD'  # import selected columns of the MySQL table (here, all of them)
 -m 1  # number of map tasks to launch
 --target-dir '/sqoopdb/'  # target HDFS directory

Hands-on:
Goal: import the MySQL table user into the /sqoopdb directory on HDFS, then view the imported data with HDFS commands.

 sqoop import --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root --table user --columns 'id ,ACCOUNT,PASSWD' -m 1 --target-dir '/sqoopdb/' 
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/05/20 09:39:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/05/20 09:39:34 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/05/20 09:39:34 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/05/20 09:39:34 INFO tool.CodeGenTool: Beginning code generation
20/05/20 09:39:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user` AS t LIMIT 1
20/05/20 09:39:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user` AS t LIMIT 1
20/05/20 09:39:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/app/hadoop
Note: /tmp/sqoop-root/compile/5ab33079a88c16d4be68a133b8c67593/user.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
20/05/20 09:39:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/5ab33079a88c16d4be68a133b8c67593/user.jar
20/05/20 09:39:40 WARN manager.MySQLManager: It looks like you are importing from mysql.
20/05/20 09:39:40 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
20/05/20 09:39:40 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
20/05/20 09:39:40 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
20/05/20 09:39:40 INFO mapreduce.ImportJobBase: Beginning import of user
20/05/20 09:39:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/05/20 09:39:42 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/05/20 09:39:42 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.40.128:8032
20/05/20 09:39:49 INFO db.DBInputFormat: Using read commited transaction isolation
20/05/20 09:39:49 INFO mapreduce.JobSubmitter: number of splits:1
20/05/20 09:39:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1589936728054_0001
20/05/20 09:39:50 INFO impl.YarnClientImpl: Submitted application application_1589936728054_0001
20/05/20 09:39:50 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1589936728054_0001/
20/05/20 09:39:50 INFO mapreduce.Job: Running job: job_1589936728054_0001
20/05/20 09:40:03 INFO mapreduce.Job: Job job_1589936728054_0001 running in uber mode : false
20/05/20 09:40:03 INFO mapreduce.Job:  map 0% reduce 0%
20/05/20 09:40:34 INFO mapreduce.Job:  map 100% reduce 0%
20/05/20 09:40:36 INFO mapreduce.Job: Job job_1589936728054_0001 completed successfully
20/05/20 09:40:36 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=124476
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=36
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=24069
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=24069
		Total vcore-seconds taken by all map tasks=24069
		Total megabyte-seconds taken by all map tasks=24646656
	Map-Reduce Framework
		Map input records=3
		Map output records=3
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=52
		CPU time spent (ms)=610
		Physical memory (bytes) snapshot=84381696
		Virtual memory (bytes) snapshot=2078760960
		Total committed heap usage (bytes)=16961536
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=36
20/05/20 09:40:36 INFO mapreduce.ImportJobBase: Transferred 36 bytes in 54.4531 seconds (0.6611 bytes/sec)
20/05/20 09:40:36 INFO mapreduce.ImportJobBase: Retrieved 3 records.

Viewing the data with HDFS commands

hadoop fs -ls lists the current user's home directory
hadoop fs -ls / lists the HDFS root directory
hdfs dfs -cat (or -text) prints a file's contents
[root@hadoop01 ~]# hdfs dfs -ls /sqoopdb/
Found 2 items
-rw-r--r--   3 root supergroup          0 2020-05-20 09:40 /sqoopdb/_SUCCESS
-rw-r--r--   3 root supergroup         36 2020-05-20 09:40 /sqoopdb/part-m-00000
[root@hadoop01 ~]# hdfs dfs -ls /sqoopdb/part-m-00000
-rw-r--r--   3 root supergroup         36 2020-05-20 09:40 /sqoopdb/part-m-00000
# This is the data imported into HDFS
[root@hadoop01 ~]# hdfs dfs -cat /sqoopdb/part-m-00000
1,admin,admin
2,root,root
3,zfx,zfx
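Note that simply re-running the same import should fail, because the /sqoopdb/ output directory now exists. The --delete-target-dir flag from the parameter list at the top handles exactly this case, e.g.:

sqoop import --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root --table user -m 1 --target-dir '/sqoopdb/' --delete-target-dir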

Importing from MySQL into HBase and viewing the data

Command breakdown

sqoop import  # import data
--connect jdbc:mysql://hadoop01:3306/sqoopdb  # connect to MySQL
--username root  # MySQL user
--password root  # MySQL password
--table MYUSER  # the MySQL table to import into HBase
--columns 'ID ,ACCOUNT,PASSWD'  # import selected columns of the MySQL table
-m 1  # number of map tasks to launch
--hbase-table user  # the name of the table in HBase
--hbase-row-key ID  # the column to use as the HBase row key (e.g. the MySQL primary key)
--column-family info  # the HBase column family

Hands-on:
Goal:
Import the MySQL table MYUSER into the HBase table user (which must be created in HBase beforehand; see the sketch below), then inspect it in HBase.
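If the target table does not exist yet, create it in the HBase shell first; a minimal sketch with the info column family used below:

hbase(main):001:0> create 'user', 'info'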

[root@hadoop01 ~]# sqoop import --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root --table MYUSER --columns 'ID ,ACCOUNT,PASSWD' -m 1 --hbase-table user --hbase-row-key ID --column-family info 
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
................
20/05/20 18:51:50 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 167.842 seconds (0 bytes/sec)
20/05/20 18:51:50 INFO mapreduce.ImportJobBase: Retrieved 2 records.

Viewing the data with HBase commands

# List the tables in HBase
hbase(main):003:0> list
TABLE                                                                                                                                                                      
scores                                                                                                                                                                     
user                                                                                                                                                                       
2 row(s) in 6.7600 seconds

=> ["scores", "user"]
# View the data imported from MySQL into HBase via Sqoop
hbase(main):004:0> scan 'user'
ROW                                         COLUMN+CELL                                                                                                                    
 1                                          column=info:ACCOUNT, timestamp=1589971903320, value=admin                                                                      
 1                                          column=info:PASSWD, timestamp=1589971903320, value=admin                                                                       
 2                                          column=info:ACCOUNT, timestamp=1589971903320, value=ROOT                                                                       
 2                                          column=info:PASSWD, timestamp=1589971903320, value=ROOT                                                                        
2 row(s) in 5.8630 seconds
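To fetch a single row instead of scanning the whole table, get works the same way (a sketch; the prompt number will differ):

hbase(main):005:0> get 'user', '1'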

Importing from MySQL into Hive and viewing the data

Command breakdown

 sqoop import  # import data
 --hive-import  # import into Hive
 --connect jdbc:mysql://hadoop01:3306/sqoopdb  # connect to MySQL
 --username root  # MySQL user
 --password root  # MySQL password
 --table MYUSER  # the MySQL table to import into Hive
 --columns 'ID,ACCOUNT,PASSWD'  # import selected columns of the MySQL table
 -m 1  # number of map tasks to launch
 --hive-table myuser  # the name of the Hive table; created automatically if it does not exist

Hands-on:
Goal: import the MySQL table MYUSER into the Hive table myuser, then view it with Hive commands.

[root@hadoop01 ~]# sqoop import --hive-import --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root --table MYUSER --columns'ID,ACCOUNT,PASSWD' -m 1 --hive-table myuser 
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/05/21 12:31:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/05/21 12:31:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: --columnsID,ACCOUNT,PASSWD
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: -m
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: 1
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-table
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: USER

Try --help for usage instructions.

# The parse errors above come from the missing space between --columns and its value ('--columnsID,ACCOUNT,PASSWD' is read as one unknown argument, which derails everything after it); the corrected command follows.
[root@hadoop01 ~]# sqoop import --hive-import --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root --table MYUSER --columns 'ID,ACCOUNT,PASSWD' -m 1 --hive-table USER 
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.

20/05/21 12:33:27 INFO mapreduce.ImportJobBase: Transferred 26 bytes in 36.2284 seconds (0.7177 bytes/sec)
20/05/21 12:33:27 INFO mapreduce.ImportJobBase: Retrieved 2 records.
20/05/21 12:33:27 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table MYUSER
20/05/21 12:33:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `MYUSER` AS t LIMIT 1
20/05/21 12:33:27 INFO hive.HiveImport: Loading uploaded data into Hive
20/05/21 12:33:45 INFO hive.HiveImport: 
20/05/21 12:33:45 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/lib/hive-common-1.2.1.jar!/hive-log4j.properties
.......
20/05/21 12:33:58 INFO hive.HiveImport: OK
20/05/21 12:33:58 INFO hive.HiveImport: Time taken: 2.803 seconds
20/05/21 12:33:58 INFO hive.HiveImport: Loading data to table default.user
20/05/21 12:33:59 INFO hive.HiveImport: Table default.user stats: [numFiles=1, totalSize=26]
20/05/21 12:33:59 INFO hive.HiveImport: OK
20/05/21 12:33:59 INFO hive.HiveImport: Time taken: 1.299 seconds
20/05/21 12:34:00 INFO hive.HiveImport: Hive import complete.
20/05/21 12:34:00 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.

Viewing the data imported from MySQL

# List the tables in Hive
hive> show tables;
OK
mysql1
myuser
stu2
student
user
Time taken: 0.035 seconds, Fetched: 5 row(s)
# View the rows
hive> select * from myuser;
OK
1	admin	admin
2	ROOT	ROOT
Time taken: 0.74 seconds, Fetched: 2 row(s)
hive> 

Exporting data from HDFS/Hive/HBase to MySQL with Sqoop

sqoop export  # export
--connect jdbc:mysql://hadoop01:3306/sqoopdb  # connect to MySQL
--username root  # MySQL user
--password root  # MySQL password
-m 1  # number of map tasks to launch
--table user1  # the MySQL table the HDFS data is exported into (must be created beforehand)
--export-dir /sqoop/user  # HDFS directory containing the data to export
sqoop export --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root -m 1 --table user1 --export-dir /sqoop/user
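After the export finishes, the rows should be visible back in MySQL (assuming user1 was created with the same columns as user):

mysql> select * from user1;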