Sqoop is the bridge between relational databases and Hadoop's storage systems: it can move data in both directions between a variety of relational sources and HDFS, Hive, and HBase. Typically the relational tables live in a backup copy of the production environment and must be imported every day. When the daily volume is small, a full-table import is fine; for larger volumes, Sqoop also provides an incremental import mechanism.
Below are the commonly used Sqoop commands, followed by their arguments:
| No. | Command | Class | Description |
| --- | --- | --- | --- |
| 1 | import | ImportTool | Import data from a relational database (from a table or a query) into HDFS |
| 2 | export | ExportTool | Export data from HDFS into a relational database |
| 3 | codegen | CodeGenTool | Generate Java classes for a database table and package them into a jar |
| 4 | create-hive-table | CreateHiveTableTool | Create a Hive table |
| 5 | eval | EvalSqlTool | Run a SQL statement and show its results |
| 6 | import-all-tables | ImportAllTablesTool | Import all tables of a database into HDFS |
| 7 | job | JobTool | Work with saved jobs |
| 8 | list-databases | ListDatabasesTool | List all database names |
| 9 | list-tables | ListTablesTool | List all tables in a database |
| 10 | merge | MergeTool | Merge two datasets in HDFS (e.g. successive incremental imports) |
| 11 | metastore | MetastoreTool | Run a shared metadata repository for saved jobs |
| 12 | help | HelpTool | Show help |
| 13 | version | VersionTool | Show the version |
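As a quick illustration of the most common of these commands, here is a minimal sketch of a full-table import into HDFS. The credentials, table name, and HDFS path are placeholders (only the JDBC URL reuses the example from the table below), not values from the original article:

```bash
# Minimal full-table import sketch; sqoop_user/sqoop_pass, the "orders"
# table and the target directory are assumed, illustrative values.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user \
  --password sqoop_pass \
  --table orders \
  --target-dir /user/hadoop/orders \
  -m 1
```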
Next come the general-purpose Sqoop arguments, and then the arguments specific to each of the 13 commands above. The general-purpose arguments fall into several groups: Common arguments, Incremental import arguments, Output line formatting arguments, Input parsing arguments, Hive arguments, HBase arguments, and Generic Hadoop command-line arguments.
1. Common arguments: general-purpose options, mainly for the relational database connection.
| No. | Argument | Description | Example |
| --- | --- | --- | --- |
| 1 | --connect | JDBC URL of the relational database | jdbc:mysql://localhost/sqoop_datas |
| 2 | --connection-manager | Connection manager class; rarely needed | |
| 3 | --driver | JDBC driver class | |
| 4 | --hadoop-home | Hadoop home directory | /home/hadoop |
| 5 | --help | Show help information | |
| 6 | --password | Password for the relational database | |
| 7 | --username | Username for the relational database | |
| 8 | --verbose | Print more information (effectively lowers the log level) | This flag takes no value |
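A minimal sketch of the common connection arguments in use, here on a list-tables call; the credentials are assumed values and the MySQL driver class is shown only to illustrate --driver:

```bash
# List the tables in the source database; --driver is usually optional
# when the JDBC URL already identifies the database type.
sqoop list-tables \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --driver com.mysql.jdbc.Driver \
  --username sqoop_user \
  --password sqoop_pass \
  --verbose
```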
Import control arguments:
| Argument | Description |
| --- | --- |
| --append | Append data to an existing dataset in HDFS |
| --as-avrodatafile | Imports data to Avro Data Files |
| --as-sequencefile | Imports data to SequenceFiles |
| --as-textfile | Imports data as plain text (default) |
| --boundary-query <statement> | Boundary query to use for creating splits |
| --columns <col,col,col…> | Columns to import from table |
| --direct | Use direct import fast path |
| --direct-split-size <n> | Split the input stream every n bytes when importing in direct mode |
| --inline-lob-limit <n> | Set the maximum size for an inline LOB |
| -m,--num-mappers <n> | Use n map tasks to import in parallel |
| -e,--query <statement> | Import the results of statement |
| --split-by <column-name> | Column of the table used to split work units |
| --table <table-name> | Table to read |
| --target-dir <dir> | HDFS destination dir |
| --warehouse-dir <dir> | HDFS parent for table destination |
| --where <where clause> | WHERE clause to use during import |
| -z,--compress | Enable compression |
| --compression-codec <c> | Use Hadoop codec (default gzip) |
| --null-string <null-string> | The string to be written for a null value for string columns |
| --null-non-string <null-string> | The string to be written for a null value for non-string columns |
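A sketch combining several of the import-control arguments above; the table, column names, WHERE clause, and paths are illustrative placeholders:

```bash
# Import a filtered subset of columns, split across 4 mappers and
# compressed; all identifiers here are assumed example values.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password sqoop_pass \
  --table orders \
  --columns "id,amount,created_at" \
  --where "created_at >= '2013-06-01'" \
  --split-by id \
  --num-mappers 4 \
  --as-textfile \
  --compress \
  --target-dir /user/hadoop/orders_20130601
```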
Incremental import arguments:
| Argument | Description |
| --- | --- |
| --check-column (col) | Specifies the column to be examined when determining which rows to import |
| --incremental (mode) | Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified |
| --last-value (value) | Specifies the maximum value of the check column from the previous import |
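A sketch of an append-mode incremental import: only rows whose check column exceeds the previous run's --last-value are pulled in. The table name, check column, and last value are assumptions:

```bash
# Daily incremental run: fetch only rows with id > 10000, the maximum
# id seen by the previous import.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password sqoop_pass \
  --table orders \
  --target-dir /user/hadoop/orders \
  --incremental append \
  --check-column id \
  --last-value 10000
```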
Output line formatting arguments:
| Argument | Description |
| --- | --- |
| --enclosed-by <char> | Sets a required field enclosing character |
| --escaped-by <char> | Sets the escape character |
| --fields-terminated-by <char> | Sets the field separator character |
| --lines-terminated-by <char> | Sets the end-of-line character |
| --mysql-delimiters | Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' |
| --optionally-enclosed-by <char> | Sets a field enclosing character |
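A sketch that overrides the default output delimiters, producing tab-separated records; the table name and target path are placeholders:

```bash
# Write tab-separated, newline-terminated records, with backslash as the
# escape character and optional double-quote enclosing.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password sqoop_pass \
  --table orders \
  --target-dir /user/hadoop/orders_tsv \
  --fields-terminated-by '\t' \
  --lines-terminated-by '\n' \
  --escaped-by '\\' \
  --optionally-enclosed-by '\"'
```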
Hive arguments:
| Argument | Description |
| --- | --- |
| --hive-home <dir> | Override $HIVE_HOME |
| --hive-import | Import tables into Hive (uses Hive's default delimiters if none are set) |
| --hive-overwrite | Overwrite existing data in the Hive table |
| --create-hive-table | If set, the job will fail if the target Hive table exists. By default this property is false |
| --hive-table <table-name> | Sets the table name to use when importing to Hive |
| --hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive |
| --hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive |
| --hive-partition-key | Name of the Hive field on which the imported data is partitioned |
| --hive-partition-value <v> | String value that serves as the partition key for the data imported into Hive in this job |
| --map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns |
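A sketch of importing a table straight into Hive; the Hive table name is a placeholder, and --hive-drop-import-delims is included only to show the delimiter-cleanup option:

```bash
# Import into a Hive table, replacing any existing data and stripping
# characters that would break Hive's default line format.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password sqoop_pass \
  --table orders \
  --hive-import \
  --hive-table orders \
  --hive-overwrite \
  --hive-drop-import-delims
```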
HBase arguments:
| Argument | Description |
| --- | --- |
| --column-family <family> | Sets the target column family for the import |
| --hbase-create-table | If specified, create missing HBase tables |
| --hbase-row-key <col> | Specifies which input column to use as the row key |
| --hbase-table <table-name> | Specifies an HBase table to use as the target instead of HDFS |
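A sketch of importing into HBase instead of HDFS; the HBase table name, column family, and row-key column are placeholders:

```bash
# Import rows into the HBase table "orders", keyed by the "id" column,
# creating the table if it does not yet exist.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password sqoop_pass \
  --table orders \
  --hbase-table orders \
  --column-family cf \
  --hbase-row-key id \
  --hbase-create-table
```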
Code generation arguments:
| Argument | Description |
| --- | --- |
| --bindir <dir> | Output directory for compiled objects |
| --class-name <name> | Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class |
| --jar-file <file> | Disable code generation; use specified jar |
| --outdir <dir> | Output directory for generated code |
| --package-name <name> | Put auto-generated classes in this package |
| --map-column-java <m> | Override default mapping from SQL type to Java type for configured columns |
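A sketch of generating the ORM class for a table with sqoop codegen; the class name and output directories are assumptions:

```bash
# Generate and compile the Java record class for the "orders" table,
# writing sources to --outdir and the compiled jar/classes to --bindir.
sqoop codegen \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password sqoop_pass \
  --table orders \
  --class-name com.example.sqoop.Orders \
  --outdir /tmp/sqoop-gen/src \
  --bindir /tmp/sqoop-gen
```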