Reading the user guide on the official site is an effective way to solve most problems: http://sqoop.apache.org/
Part 1: Sqoop 1 (sqoop-1.4.6.bin_hadoop-2.0.4-alpha.tar.gz) installation and configuration
1. Download and extract the tarball.
2. Edit the configuration file:
cd $SQOOP_HOME/conf
mv sqoop-env-template.sh sqoop-env.sh
Open sqoop-env.sh and edit the following lines:
export HADOOP_COMMON_HOME=/home/hadoop/apps/hadoop-2.6.1/
export HADOOP_MAPRED_HOME=/home/hadoop/apps/hadoop-2.6.1/
export HIVE_HOME=/home/hadoop/apps/hive-1.2.1
3. Configure environment variables:
These environment variables must be set, or Sqoop will fail with an error:
vi /etc/profile
export SQOOP_HOME=/usr/lib/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
source /etc/profile
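After sourcing /etc/profile, a quick sanity check confirms the variables took effect. This is a sketch assuming the /usr/lib/sqoop path used above; adjust it to your installation:

```shell
# Sanity check that SQOOP_HOME is set and its bin directory is on PATH.
# /usr/lib/sqoop mirrors the example above; change it for your setup.
SQOOP_HOME=/usr/lib/sqoop
PATH="$PATH:$SQOOP_HOME/bin"

case ":$PATH:" in
  *":$SQOOP_HOME/bin:"*) echo "PATH OK" ;;
  *)                     echo "PATH is missing $SQOOP_HOME/bin" ;;
esac
```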
4. Download the MySQL JDBC driver (mysql-connector-java):
wget http://central.maven.org/maven2/mysql/mysql-connector-java/8.0.11/mysql-connector-java-8.0.11.jar
Move mysql-connector-java-8.0.11.jar into Sqoop's lib directory:
mv mysql-connector-java-8.0.11.jar $SQOOP_HOME/lib
5. Verify the installation:
The following command prints the Sqoop version:
sqoop version
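With the driver in place, a Sqoop 1 import can be run from the command line. The sketch below only assembles and prints the command so its shape is visible; the host, database, table, and target directory are hypothetical placeholders, not values from this setup:

```shell
# Build a typical Sqoop 1 import command. All connection details below are
# placeholders for illustration -- substitute your own before running it.
SQOOP_IMPORT="sqoop import \
  --connect jdbc:mysql://192.168.101.9:3306/testdb \
  --username root -P \
  --table emp \
  --target-dir /data/emp \
  --num-mappers 1"

echo "$SQOOP_IMPORT"
```

The -P flag prompts for the password interactively, which keeps it out of the shell history.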
Part 2: Sqoop 2 (sqoop-1.99.7-bin-hadoop200.tar.gz) installation and configuration
1. Download and extract the tarball.
2. Create two auxiliary directories:
mkdir /home/hadoop/sqoop/sqoop-1.99.7-bin-hadoop200/extra
mkdir /home/hadoop/sqoop/sqoop-1.99.7-bin-hadoop200/logs
3. Configure environment variables:
vi /etc/profile
export SQOOP_HOME=/home/hadoop/sqoop/sqoop-1.99.7-bin-hadoop200
export PATH=$PATH:$SQOOP_HOME/bin
export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/extra
export CATALINA_BASE=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs/
source /etc/profile
4. Edit the Sqoop configuration file:
cd /home/hadoop/sqoop/sqoop-1.99.7-bin-hadoop200/conf
vi sqoop.properties
Set the Hadoop configuration directory path:
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop/hadoop-2.7.3/etc/hadoop
Then, still in the conf directory, add a catalina.properties file pointing at the local Hadoop jars, as shown below:
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/usr/local/hadoop/hadoop-2.7.3/share/hadoop/common/*.jar,/usr/local/hadoop/hadoop-2.7.3/share/hadoop/common/lib/*.jar,/usr/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/*.jar,/usr/local/hadoop/hadoop-2.7.3/share/hadoop/hdfs/lib/*.jar,/usr/local/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/*.jar,/usr/local/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/lib/*.jar,/usr/local/hadoop/hadoop-2.7.3/share/hadoop/tools/lib/*.jar,/usr/local/hadoop/hadoop-2.7.3/share/hadoop/yarn/*.jar,/usr/local/hadoop/hadoop-2.7.3/share/hadoop/yarn/lib/*.jar,/usr/local/hadoop/hadoop-2.7.3/share/hadoop/httpfs/tomcat/lib/*.jar
Note: replace every occurrence of the /usr/local/hadoop/hadoop-2.7.3 path above with your actual Hadoop installation path; a regular-expression search and replace (for example in Notepad++) makes this quick.
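The same substitution can also be scripted with sed. The snippet below demonstrates it on one sample line; the NEW path is an assumed example, and for the real file you would point the sed expression at conf/catalina.properties instead:

```shell
# Replace the tutorial's Hadoop path with your own installation path.
OLD=/usr/local/hadoop/hadoop-2.7.3
NEW=/home/hadoop/apps/hadoop-2.7.3   # assumed path; use your real Hadoop home

# Demonstrate on one sample line from common.loader:
line="common.loader=\${catalina.base}/lib,$OLD/share/hadoop/common/*.jar"
echo "$line" | sed "s|$OLD|$NEW|g"

# For the real file, run instead:
#   sed -i "s|$OLD|$NEW|g" conf/catalina.properties
```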
5. Sqoop 2 requires a change to $HADOOP_HOME/etc/hadoop/core-site.xml; otherwise start job fails with the error below:
This is one of Sqoop 2's more annoying pitfalls, because the error message has little to do with the actual cause:
Exception: org.apache.sqoop.common.SqoopException Message: GENERIC_HDFS_CONNECTOR_0007:Invalid input/output directory - Unexpected exception
Append the following to core-site.xml (root in the property names must match the user that runs the Sqoop 2 server; replace it if you run the server as a different user), then restart HDFS so the change takes effect:
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
6. Start the server and the client:
sqoop.sh server start
sqoop.sh client
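Before creating links, it is worth confirming that the client can actually reach the server. In the 1.99.7 shell this looks roughly like the following (localhost and port 12000 are the defaults; adjust for your server):

```
sqoop:000> set server --host localhost --port 12000 --webapp sqoop
sqoop:000> show version --all
```

If show version --all prints the server version as well as the client version, the connection is working; if only client information appears, the server is not reachable.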
7. Using Sqoop 2:
Once the connection to the server is working, show connector lists the available connectors.
1. Create links to each source or target store:
1) Create an HDFS link
sqoop:000> create link --connector hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs-link ## the link name; choose any name you like
HDFS cluster
URI: hdfs://192.168.101.11:9000 ## the HDFS URI; see fs.defaultFS in Hadoop's core-site.xml
Conf directory: /home/hadoop/hadoop-2.6.5/etc/hadoop ## the Hadoop configuration directory
Additional configs::
There are currently 0 values in the map:
entry# ## just press Enter here
New link was successfully created with validation status OK and name hdfs-link
sqoop:000> show link
+-----------+----------------+---------+
| Name | Connector Name | Enabled |
+-----------+----------------+---------+
| hdfs-link | hdfs-connector | true |
+-----------+----------------+---------+
2) Create an Oracle link (the Oracle JDBC driver jar must be visible to the server; the SQOOP_SERVER_EXTRA_LIB directory created earlier is the place for it)
sqoop:000> create link --connector generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: oracle-link ## the link name; choose any name you like
Database connection
Driver class: oracle.jdbc.driver.OracleDriver ## the Oracle JDBC driver class
Connection String: jdbc:oracle:thin:@192.168.101.9:1521:orcl ## the Oracle connection string
Username: scott ## the Oracle username
Password: ***** ## the password
Fetch Size: ## press Enter here and for the prompts below to accept the defaults
Connection Properties:
There are currently 0 values in the map:
entry#
SQL Dialect
Identifier enclose: ## the SQL identifier delimiter; enter a single space here to avoid errors
New link was successfully created with validation status OK and name oracle-link
sqoop:000> show link
+-------------+------------------------+---------+
| Name | Connector Name | Enabled |
+-------------+------------------------+---------+
| oracle-link | generic-jdbc-connector | true |
| hdfs-link | hdfs-connector | true |
+-------------+------------------------+---------+
3) Create a job that imports data from Oracle into HDFS
sqoop:000> create job -f oracle-link -t hdfs-link
Creating job for links with from name oracle-link and to name hdfs-link
Please fill following values to create new job object
Name: oracle2hdfs ## the job name
Database source
Schema name: scott ## the Oracle schema, which is the same as the username
Table name: emp ## the table to import
SQL statement: ## optional SQL; leave empty to import the whole table
Column names:
There are currently 0 values in the list:
element#
Partition column: empno ## a column used to split work among mappers; a primary key or timestamp column is a good choice
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
Target configuration
Override null value:
Null value:
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 0 ## the HDFS file format; plain text (TEXT_FILE) is fine
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0 ## default: no compression
Custom codec:
Output directory: /data ## the target HDFS directory; for a full import it must be empty
Append mode: ## append mode; the default is a full import
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name oracle2hdfs
sqoop:000>
4) Create a job that exports data from HDFS into Oracle
sqoop:000> create job -f hdfs-link -t oracle-link
Creating job for links with from name hdfs-link and to name oracle-link
Please fill following values to create new job object
Name: hdfs2oracle
Input configuration
Input directory: /data ## the HDFS source directory
Override null value:
Null value:
Incremental import
Incremental type:
0 : NONE
1 : NEW_FILES
Choose: 0 ## the default 0 is fine; together with the Incremental import section above it controls incremental loading
Last imported date:
Database target
Schema name: scott ## the Oracle user
Table name: emp2 ## the target Oracle table; create it with the matching structure beforehand
Column names:
There are currently 0 values in the list:
element#
Staging table:
Clear stage table:
Throttling resources
Extractors:
Loaders:
Classpath configuration
Extra mapper jars:
There are currently 0 values in the list:
element#
New job was successfully created with validation status OK and name hdfs2oracle
sqoop:000>
sqoop:000> show job
+----+-------------+--------------------------------------+--------------------------------------+---------+
| Id | Name | From Connector | To Connector | Enabled |
+----+-------------+--------------------------------------+--------------------------------------+---------+
| 8 | oracle2hdfs | oracle-link (generic-jdbc-connector) | hdfs-link (hdfs-connector) | true |
| 9 | hdfs2oracle | hdfs-link (hdfs-connector) | oracle-link (generic-jdbc-connector) | true |
+----+-------------+--------------------------------------+--------------------------------------+---------+
sqoop:000>
5) Run a job
start job -name oracle2hdfs
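Job progress can be checked, and a running job stopped, from the same shell. A sketch using the job created above:

```
sqoop:000> status job -name oracle2hdfs
sqoop:000> stop job -name oracle2hdfs
```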
6) Update a job
update job --name hdfs-to-mysql-001
During testing, this method was used to load a table of just over two million rows from HDFS into MySQL; only about half the rows made it across, and the rest were silently lost. Sqoop 2 does not seem as usable as Sqoop 1.
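A half-loaded table like this is easy to catch with a row-count comparison after each run. The sketch below only prints the two commands to compare (the HDFS path and MySQL table are illustrative placeholders):

```shell
# Compare the row count on each side after a transfer. The commands are
# printed rather than executed here; the path and table are placeholders.
HDFS_COUNT_CMD='hdfs dfs -cat /data/* | wc -l'
MYSQL_COUNT_CMD='mysql -u root -p -e "SELECT COUNT(*) FROM testdb.emp2"'

echo "$HDFS_COUNT_CMD"
echo "$MYSQL_COUNT_CMD"
```

If the two counts differ, rerun the job or investigate before relying on the data.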
Note: see the official documentation for other usage.