Introduction to Basic Sqoop Syntax and Detailed Import/Export Steps

Sqoop parameters

-m  Number of map tasks to launch; the default is 4
--delete-target-dir  Delete the target directory if it already exists
--mapreduce-job-name  Name to give the MapReduce job
--target-dir  HDFS directory to import into
--fields-terminated-by  Separator to use between fields
--null-string  Replacement written when a string-typed field is NULL
--null-non-string  Replacement written when a non-string-typed field is NULL
--columns  Import only some of the table's columns
--where  Import only rows matching a condition
--query  Import the result of a SQL statement; --query cannot be combined with --table or --columns
--options-file  Read the command's arguments from a file
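
A hedged sketch combining several of the flags above into one import (the host, database, table, and paths follow the examples later in this article; -P prompts for the password instead of putting it on the command line):

sqoop import \
  --connect jdbc:mysql://hadoop01:3306/sqoopdb \
  --username root -P \
  --table user \
  --where 'ID > 1' \
  --delete-target-dir \
  --target-dir /sqoopdb/user \
  --fields-terminated-by '\t' \
  --null-string '\\N' \
  --null-non-string '\\N' \
  -m 1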

>>>>> Exporting HDFS data to MySQL, or Hive data to MySQL
--table  Name of the table to export into
--input-fields-terminated-by  Field separator used by the files on HDFS; the default is a comma
--export-dir  HDFS directory containing the data to export
--columns  Columns to export
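
A hedged export sketch using these flags (the table name and directory are placeholders, not from a real run):

sqoop export \
  --connect jdbc:mysql://hadoop01:3306/sqoopdb \
  --username root -P \
  --table user1 \
  --columns 'ID,ACCOUNT,PASSWD' \
  --export-dir /sqoopdb \
  --input-fields-terminated-by ',' \
  -m 1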

>>> Importing data into Hive
--create-hive-table  Create the target table; errors out if it already exists
--hive-database  Hive database to import into
--hive-import  Import into Hive (without this flag the data lands in HDFS)
--hive-overwrite  Overwrite existing data in the Hive table
--hive-table  Name of the Hive table; defaults to the name of the imported table
--hive-partition-key  Partition column of the Hive table
--hive-partition-value  Partition value to import into
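
A hedged sketch of a partitioned Hive import built from the flags above (the table user_part, the partition key dt, and the partition value are assumptions for illustration):

sqoop import \
  --connect jdbc:mysql://hadoop01:3306/sqoopdb \
  --username root -P \
  --table user \
  --hive-import \
  --hive-database default \
  --hive-table user_part \
  --hive-overwrite \
  --hive-partition-key dt \
  --hive-partition-value '2020-05-20' \
  -m 1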

Creating a table in MySQL

#List the databases in MySQL
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hivedb             |
| mysql              |
| performance_schema |
+--------------------+
4 rows in set (0.11 sec)
#Create a database
mysql> create database sqoopdb;
Query OK, 1 row affected (0.04 sec)
#Verify the newly created database
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hivedb             |
| mysql              |
| performance_schema |
| sqoopdb            |
+--------------------+
5 rows in set (0.00 sec)
#Switch to the database (i.e. the one the table will be created in)
mysql> use sqoopdb;
#The "Database changed" message means tables can now be created in it
Database changed
#Create the table (keep the table name's casing consistent to avoid problems later; learned from experience)
mysql> create table user( ID INT(4) NOT NULL AUTO_INCREMENT, ACCOUNT VARCHAR(255) DEFAULT NULL, PASSWD VARCHAR(255) DEFAULT NULL, PRIMARY KEY(ID) );
Query OK, 0 rows affected (0.15 sec)
#Insert data
mysql> INSERT INTO user VALUES("1","admin",'admin');
Query OK, 1 row affected (0.03 sec)

mysql> INSERT INTO user VALUES("2","root",'root');
Query OK, 1 row affected (0.00 sec)

mysql> INSERT INTO user VALUES("3","zfx",'zfx');
Query OK, 1 row affected (0.00 sec)
#View the inserted rows
mysql> select * from user;
+----+---------+--------+
| ID | ACCOUNT | PASSWD |
+----+---------+--------+
|  1 | admin   | admin  |
|  2 | root    | root   |
|  3 | zfx     | zfx    |
+----+---------+--------+
3 rows in set (0.00 sec)

Viewing detailed help for the sqoop command

[root@hadoop01 ~]$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import datasets from a mainframe server to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

# As the output notes, run 'sqoop help COMMAND' to get detailed help for a specific command

Listing the databases in the local VM's MySQL instance

[root@hadoop01 ~]# sqoop list-databases --connect jdbc:mysql://hadoop01:3306/ --username root --password root
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/05/20 09:26:35 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/05/20 09:26:35 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/05/20 09:26:35 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
hivedb
mysql
performance_schema
sqoopdb
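
The WARN line in the output recommends not passing the password on the command line; two hedged alternatives (the password-file path here is an assumption):

sqoop list-databases --connect jdbc:mysql://hadoop01:3306/ --username root -P
sqoop list-databases --connect jdbc:mysql://hadoop01:3306/ --username root --password-file file:///root/.mysql.pwd

-P prompts for the password interactively; --password-file reads it from a file that should be readable only by its owner.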

Listing the tables in a database on the local VM's MySQL instance

[root@hadoop01 ~]# sqoop list-tables --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/05/20 09:27:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/05/20 09:27:13 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/05/20 09:27:13 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
user

Importing from MySQL into HDFS and viewing the data

Sqoop syntax:

sqoop import  
--connect jdbc:mysql://ip:3306/databasename  #JDBC URL; databasename is the database (in MySQL or Oracle) to read from
--table tablename  #table in that database to read
--username root  #user name
--password 123456  #password
--target-dir /path  #HDFS directory the table is imported into (note: a directory)
--fields-terminated-by '\t'  #field separator for the imported data; the default is a comma
--lines-terminated-by '\n'  #line separator for the imported data
-m 1  #number of concurrent map tasks; if unset, 4 map tasks run by default, and a column must then be specified to split the work among them
--where '<condition>'  #import only the rows matching the condition, i.e. a subset of the table
--incremental append  #incremental import
--check-column column_id  #reference column for the incremental import
--last-value num  #last value of column_id from the previous import
--null-string ''  #replacement used when an imported string field is NULL
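
A hedged sketch of an incremental run using the last three flags above: only rows whose ID exceeds the value recorded by the previous import are pulled (3 is an assumed last value):

sqoop import \
  --connect jdbc:mysql://hadoop01:3306/sqoopdb \
  --username root -P \
  --table user \
  --target-dir /sqoopdb/user \
  --incremental append \
  --check-column ID \
  --last-value 3 \
  -m 1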

Command breakdown

 sqoop import #import
 --connect jdbc:mysql://hadoop01:3306/sqoopdb #connect to MySQL
 --username root #MySQL user
 --password root #MySQL password
 --table user #table in the database to read
 --columns 'id ,ACCOUNT,PASSWD' #import selected columns of the MySQL table (all of them here)
 -m 1 #number of map tasks to launch
 --target-dir '/sqoopdb/' #import into this HDFS directory

Hands-on:
Goal: import the MySQL table user into the HDFS directory /sqoopdb, then view the imported data with hdfs commands

 sqoop import --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root --table user --columns 'id ,ACCOUNT,PASSWD' -m 1 --target-dir '/sqoopdb/' 
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/05/20 09:39:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/05/20 09:39:34 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/05/20 09:39:34 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/05/20 09:39:34 INFO tool.CodeGenTool: Beginning code generation
20/05/20 09:39:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user` AS t LIMIT 1
20/05/20 09:39:34 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user` AS t LIMIT 1
20/05/20 09:39:34 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/app/hadoop
Note: /tmp/sqoop-root/compile/5ab33079a88c16d4be68a133b8c67593/user.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
20/05/20 09:39:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/5ab33079a88c16d4be68a133b8c67593/user.jar
20/05/20 09:39:40 WARN manager.MySQLManager: It looks like you are importing from mysql.
20/05/20 09:39:40 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
20/05/20 09:39:40 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
20/05/20 09:39:40 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
20/05/20 09:39:40 INFO mapreduce.ImportJobBase: Beginning import of user
20/05/20 09:39:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/05/20 09:39:42 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/05/20 09:39:42 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.40.128:8032
20/05/20 09:39:49 INFO db.DBInputFormat: Using read commited transaction isolation
20/05/20 09:39:49 INFO mapreduce.JobSubmitter: number of splits:1
20/05/20 09:39:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1589936728054_0001
20/05/20 09:39:50 INFO impl.YarnClientImpl: Submitted application application_1589936728054_0001
20/05/20 09:39:50 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1589936728054_0001/
20/05/20 09:39:50 INFO mapreduce.Job: Running job: job_1589936728054_0001
20/05/20 09:40:03 INFO mapreduce.Job: Job job_1589936728054_0001 running in uber mode : false
20/05/20 09:40:03 INFO mapreduce.Job:  map 0% reduce 0%
20/05/20 09:40:34 INFO mapreduce.Job:  map 100% reduce 0%
20/05/20 09:40:36 INFO mapreduce.Job: Job job_1589936728054_0001 completed successfully
20/05/20 09:40:36 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=124476
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=87
		HDFS: Number of bytes written=36
		HDFS: Number of read operations=4
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Launched map tasks=1
		Other local map tasks=1
		Total time spent by all maps in occupied slots (ms)=24069
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=24069
		Total vcore-seconds taken by all map tasks=24069
		Total megabyte-seconds taken by all map tasks=24646656
	Map-Reduce Framework
		Map input records=3
		Map output records=3
		Input split bytes=87
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=52
		CPU time spent (ms)=610
		Physical memory (bytes) snapshot=84381696
		Virtual memory (bytes) snapshot=2078760960
		Total committed heap usage (bytes)=16961536
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=36
20/05/20 09:40:36 INFO mapreduce.ImportJobBase: Transferred 36 bytes in 54.4531 seconds (0.6611 bytes/sec)
20/05/20 09:40:36 INFO mapreduce.ImportJobBase: Retrieved 3 records.

Viewing the data with hdfs commands

hadoop fs -ls  lists the current user's home directory
hadoop fs -ls /  lists the HDFS root directory
hdfs dfs -cat or hdfs dfs -text  displays file contents
[root@hadoop01 ~]# hdfs dfs -ls /sqoopdb/
Found 2 items
-rw-r--r--   3 root supergroup          0 2020-05-20 09:40 /sqoopdb/_SUCCESS
-rw-r--r--   3 root supergroup         36 2020-05-20 09:40 /sqoopdb/part-m-00000
[root@hadoop01 ~]# hdfs dfs -ls /sqoopdb//sqoopdb/part-m-00000
ls: `/sqoopdb//sqoopdb/part-m-00000': No such file or directory
[root@hadoop01 ~]# hdfs dfs -ls /sqoopdb/part-m-00000
-rw-r--r--   3 root supergroup         36 2020-05-20 09:40 /sqoopdb/part-m-00000
#This is the data imported into HDFS
[root@hadoop01 ~]# hdfs dfs -cat /sqoopdb/part-m-00000
1,admin,admin
2,root,root
3,zfx,zfx
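
--query was introduced in the parameter list but is not exercised in this walkthrough; a hedged sketch of a free-form query import (the query must contain the literal $CONDITIONS token, and --split-by is required when more than one map task is used):

sqoop import \
  --connect jdbc:mysql://hadoop01:3306/sqoopdb \
  --username root -P \
  --query 'SELECT ID, ACCOUNT FROM user WHERE ID > 1 AND $CONDITIONS' \
  --target-dir /sqoopdb/user_query \
  --split-by ID \
  -m 2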

Importing from MySQL into HBase and viewing the data

Command breakdown

sqoop import #import data
--connect jdbc:mysql://hadoop01:3306/sqoopdb #connect to MySQL
--username root #MySQL user
--password root #MySQL password
--table MYUSER  #MySQL table to import into HBase
--columns 'ID ,ACCOUNT,PASSWD' #import selected columns of the MySQL table
-m 1 #number of map tasks to launch
--hbase-table user #name of the target HBase table
--hbase-row-key ID #column used as the HBase row key, typically the MySQL primary key
--column-family info #HBase column family to write the data into

Hands-on:
Goal:
Import the MySQL table MYUSER into the HBase table user (which must be created in HBase ahead of time; see the sketch below), then view it from the HBase shell
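
Since the target table has to exist first, a minimal sketch of creating it with the info column family from the hbase shell:

hbase(main):001:0> create 'user', 'info'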

[root@hadoop01 ~]# sqoop import --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root --table MYUSER --columns 'ID ,ACCOUNT,PASSWD' -m 1 --hbase-table user --hbase-row-key ID --column-family info 
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
................
20/05/20 18:51:50 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 167.842 seconds (0 bytes/sec)
20/05/20 18:51:50 INFO mapreduce.ImportJobBase: Retrieved 2 records.

Viewing the data with HBase shell commands

#List the tables in HBase
hbase(main):003:0> list
TABLE                                                                                                                                                                      
scores                                                                                                                                                                     
user                                                                                                                                                                       
2 row(s) in 6.7600 seconds

=> ["scores", "user"]
#Scan the data imported into HBase from MySQL via Sqoop
hbase(main):004:0> scan 'user'
ROW                                         COLUMN+CELL                                                                                                                    
 1                                          column=info:ACCOUNT, timestamp=1589971903320, value=admin                                                                      
 1                                          column=info:PASSWD, timestamp=1589971903320, value=admin                                                                       
 2                                          column=info:ACCOUNT, timestamp=1589971903320, value=ROOT                                                                       
 2                                          column=info:PASSWD, timestamp=1589971903320, value=ROOT                                                                        
2 row(s) in 5.8630 seconds
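
As a quick spot check, a single row can also be fetched by its row key (output omitted here):

hbase(main):005:0> get 'user', '1'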

Importing from MySQL into Hive and viewing the data

Command breakdown

 sqoop import #import data
 --hive-import #import into Hive
 --connect jdbc:mysql://hadoop01:3306/sqoopdb #connect to MySQL
 --username root #MySQL user
 --password root #MySQL password
 --table MYUSER  #MySQL table to import into Hive
 --columns 'ID,ACCOUNT,PASSWD' #import selected columns (note the space before the quoted list; see the failed run below)
 -m 1 #number of map tasks to launch
 --hive-table myuser #name of the Hive table; created automatically if it does not exist

Hands-on
Goal: import the MySQL table MYUSER into the Hive table myuser and view it with Hive commands

[root@hadoop01 ~]# sqoop import --hive-import --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root --table MYUSER --columns'ID,ACCOUNT,PASSWD' -m 1 --hive-table myuser 
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/05/21 12:31:02 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/05/21 12:31:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Error parsing arguments for import:
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: --columnsID,ACCOUNT,PASSWD
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: -m
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: 1
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: --hive-table
20/05/21 12:31:02 ERROR tool.BaseSqoopTool: Unrecognized argument: USER

Try --help for usage instructions.
#The parse errors above were caused by the missing space between --columns and its quoted value; with the space added the import runs:
[root@hadoop01 ~]# sqoop import --hive-import --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root --table MYUSER --columns 'ID,ACCOUNT,PASSWD' -m 1 --hive-table USER
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.

20/05/21 12:33:27 INFO mapreduce.ImportJobBase: Transferred 26 bytes in 36.2284 seconds (0.7177 bytes/sec)
20/05/21 12:33:27 INFO mapreduce.ImportJobBase: Retrieved 2 records.
20/05/21 12:33:27 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table MYUSER
20/05/21 12:33:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `MYUSER` AS t LIMIT 1
20/05/21 12:33:27 INFO hive.HiveImport: Loading uploaded data into Hive
20/05/21 12:33:45 INFO hive.HiveImport: 
20/05/21 12:33:45 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/opt/app/sqoop-1.4.7.bin__hadoop-2.6.0/lib/hive-common-1.2.1.jar!/hive-log4j.properties
.......
20/05/21 12:33:58 INFO hive.HiveImport: OK
20/05/21 12:33:58 INFO hive.HiveImport: Time taken: 2.803 seconds
20/05/21 12:33:58 INFO hive.HiveImport: Loading data to table default.user
20/05/21 12:33:59 INFO hive.HiveImport: Table default.user stats: [numFiles=1, totalSize=26]
20/05/21 12:33:59 INFO hive.HiveImport: OK
20/05/21 12:33:59 INFO hive.HiveImport: Time taken: 1.299 seconds
20/05/21 12:34:00 INFO hive.HiveImport: Hive import complete.
20/05/21 12:34:00 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.

Viewing the data imported from MySQL

#List the tables in Hive
hive> show tables;
OK
mysql1
myuser
stu2
student
user
Time taken: 0.035 seconds, Fetched: 5 row(s)
#View the row data
hive> select * from myuser;
OK
1	admin	admin
2	ROOT	ROOT
Time taken: 0.74 seconds, Fetched: 2 row(s)
hive> 

Exporting data from HDFS/Hive/HBase to MySQL with Sqoop

sqoop export #export
--connect jdbc:mysql://hadoop01:3306/sqoopdb #connect to MySQL
--username root #MySQL user
--password root #MySQL password
-m 1 #number of map tasks to launch
--table user1 #MySQL table the HDFS data is exported into (must be created in advance)
--export-dir /sqoop/user #HDFS directory holding the data to export
sqoop export --connect jdbc:mysql://hadoop01:3306/sqoopdb --username root --password root -m 1 --table user1 --export-dir /sqoop/user
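
The table user1 must exist in MySQL before the export runs; a hedged sketch, assuming user1 shares the schema of the user table created earlier, plus a verification query for after the export completes:

mysql> CREATE TABLE user1 LIKE user;
mysql> SELECT * FROM user1;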