Sqoop Usage and Introduction

Sqoop is the bridge between relational databases and Hadoop storage in a Hadoop environment; it supports importing and exporting data between a variety of relational data sources and Hive, HDFS, and HBase. In a typical setup, the relational tables live in a backup copy of the production environment and need to be imported every day. Depending on the daily data volume, Sqoop can import an entire table: when the amount of data produced per day is small, a full-table import is acceptable, but Sqoop also provides an incremental import mechanism.
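
For example, a minimal daily full-table import from MySQL into HDFS might look like the sketch below; the connection URL, database name sqoop_datas, table name user_info, credentials, and target directory are all placeholder values for illustration:

  # illustrative full-table import of one MySQL table into HDFS
  # (connection string, credentials, table, and target dir are placeholders)
  sqoop import \
    --connect jdbc:mysql://localhost/sqoop_datas \
    --username sqoop_user \
    --password sqoop_pass \
    --table user_info \
    --target-dir /user/hadoop/user_info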

Below are several commonly used Sqoop commands, along with some of their parameters:

 

No.  Command             Class                  Description
1    import              ImportTool             Import data from a relational database (a table or a query) into HDFS
2    export              ExportTool             Export data from HDFS into a relational database
3    codegen             CodeGenTool            Generate Java code for a database table and package it into a jar
4    create-hive-table   CreateHiveTableTool    Create a Hive table
5    eval                EvalSqlTool            Execute a SQL statement and view its result
6    import-all-tables   ImportAllTablesTool    Import all tables of a database into HDFS
7    job                 JobTool                Work with saved jobs
8    list-databases      ListDatabasesTool      List all database names
9    list-tables         ListTablesTool         List all tables in a database
10   merge               MergeTool              Merge the results of incremental imports
11   metastore           MetastoreTool          Run a standalone Sqoop metastore
12   help                HelpTool               Show help information
13   version             VersionTool            Show version information
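
As a quick illustration of a few of these commands, the sketches below use a hypothetical MySQL server with the same placeholder database, table, and credentials as above; the actual output depends on your environment:

  # list the databases on the MySQL server (placeholder host and credentials)
  sqoop list-databases --connect jdbc:mysql://localhost/ --username sqoop_user --password sqoop_pass

  # run an ad-hoc SQL statement and print its result (eval)
  sqoop eval --connect jdbc:mysql://localhost/sqoop_datas --username sqoop_user --password sqoop_pass \
    --query "SELECT COUNT(*) FROM user_info"

  # export data from an HDFS directory back into an existing MySQL table
  sqoop export --connect jdbc:mysql://localhost/sqoop_datas --username sqoop_user --password sqoop_pass \
    --table user_info_backup --export-dir /user/hadoop/user_info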

 

Next we list Sqoop's general arguments; after that, each of the 13 commands above has its own arguments. The general arguments fall into the following groups:

Common arguments
Incremental import arguments
Output line formatting arguments
Input parsing arguments
Hive arguments
HBase arguments
Generic Hadoop command-line arguments

 

1. Common arguments: general parameters, mainly covering the connection to the relational database

No.  Argument              Description                                                       Example
1    --connect             JDBC connect string for the relational database                   jdbc:mysql://localhost/sqoop_datas
2    --connection-manager  Connection manager class to use; rarely needed
3    --driver              JDBC driver class to use
4    --hadoop-home         Hadoop home directory                                             /home/hadoop
5    --help                Print usage instructions
6    --password            Password for connecting to the relational database
7    --username            Username for connecting to the relational database
8    --verbose             Print more information while working (lowers the log threshold)  takes no value
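
Putting the common arguments together, a connection might look like the following sketch (hostname, database, and credentials are placeholders; --driver is only needed when Sqoop cannot infer the driver class from the URL):

  # placeholder host, database, and credentials
  sqoop list-tables \
    --connect jdbc:mysql://db-backup-host/sqoop_datas \
    --driver com.mysql.jdbc.Driver \
    --username sqoop_user \
    --password sqoop_pass \
    --verbose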

 

Import control arguments:

Argument                          Description
--append                          Append data to an existing dataset in HDFS
--as-avrodatafile                 Imports data to Avro Data Files
--as-sequencefile                 Imports data to SequenceFiles
--as-textfile                     Imports data as plain text (default)
--boundary-query <statement>      Boundary query to use for creating splits
--columns <col,col,col…>          Columns to import from table
--direct                          Use direct import fast path
--direct-split-size <n>           Split the input stream every n bytes when importing in direct mode
--inline-lob-limit <n>            Set the maximum size for an inline LOB
-m,--num-mappers <n>              Use n map tasks to import in parallel
-e,--query <statement>            Import the results of statement
--split-by <column-name>          Column of the table used to split work units
--table <table-name>              Table to read
--target-dir <dir>                HDFS destination dir
--warehouse-dir <dir>             HDFS parent for table destination
--where <where clause>            WHERE clause to use during import
-z,--compress                     Enable compression
--compression-codec <c>           Use Hadoop codec (default gzip)
--null-string <null-string>       The string to be written for a null value for string columns
--null-non-string <null-string>   The string to be written for a null value for non-string columns
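
A more selective import that combines several of these control arguments might look like the following sketch (table, column, and directory names are illustrative):

  # placeholder database, table, columns, and target directory
  sqoop import \
    --connect jdbc:mysql://localhost/sqoop_datas \
    --username sqoop_user --password sqoop_pass \
    --table user_info \
    --columns "id,name,created_at" \
    --where "created_at >= '2013-06-01'" \
    --split-by id \
    -m 4 \
    -z \
    --target-dir /user/hadoop/user_info_201306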

 

 

Incremental import arguments:

Argument                Description
--check-column (col)    Specifies the column to be examined when determining which rows to import
--incremental (mode)    Specifies how Sqoop determines which rows are new; legal values for mode are append and lastmodified
--last-value (value)    Specifies the maximum value of the check column from the previous import
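
For the daily incremental loads mentioned at the beginning, an append-mode import might look like the sketch below; the check column id and the last value are illustrative, and in practice the last value is usually tracked automatically by a saved sqoop job:

  # placeholder check column and last value
  sqoop import \
    --connect jdbc:mysql://localhost/sqoop_datas \
    --username sqoop_user --password sqoop_pass \
    --table user_info \
    --incremental append \
    --check-column id \
    --last-value 100000 \
    --target-dir /user/hadoop/user_info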

 

 

Output line formatting arguments:

Argument                          Description
--enclosed-by <char>              Sets a required field enclosing character
--escaped-by <char>               Sets the escape character
--fields-terminated-by <char>     Sets the field separator character
--lines-terminated-by <char>      Sets the end-of-line character
--mysql-delimiters                Uses MySQL's default delimiter set: fields: ,  lines: \n  escaped-by: \  optionally-enclosed-by: '
--optionally-enclosed-by <char>   Sets a field enclosing character
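
For example, to write tab-separated text with explicit escaping and quoting, the formatting arguments can be added to an import as in this sketch (all values are illustrative):

  # placeholder table and target directory; delimiters chosen for illustration
  sqoop import \
    --connect jdbc:mysql://localhost/sqoop_datas \
    --username sqoop_user --password sqoop_pass \
    --table user_info \
    --fields-terminated-by '\t' \
    --lines-terminated-by '\n' \
    --escaped-by '\\' \
    --optionally-enclosed-by '\"' \
    --target-dir /user/hadoop/user_info_tsv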

 

 

Hive arguments:

Argument                     Description
--hive-home <dir>            Override $HIVE_HOME
--hive-import                Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite             Overwrite existing data in the Hive table
--create-hive-table          If set, the job fails if the target Hive table already exists; by default this property is false
--hive-table <table-name>    Sets the table name to use when importing into Hive
--hive-drop-import-delims    Drops \n, \r, and \01 from string fields when importing to Hive
--hive-delims-replacement    Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
--hive-partition-key         Name of the Hive field on which the imported data is partitioned (sharded)
--hive-partition-value <v>   String value to use as the partition key for the data imported into Hive in this job
--map-column-hive <map>      Override default mapping from SQL type to Hive type for configured columns
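
A direct import into Hive that combines a few of these arguments might look like the following sketch (the Hive table name ods_user_info is a placeholder):

  # placeholder source table and Hive table name
  sqoop import \
    --connect jdbc:mysql://localhost/sqoop_datas \
    --username sqoop_user --password sqoop_pass \
    --table user_info \
    --hive-import \
    --hive-table ods_user_info \
    --hive-overwrite \
    --hive-drop-import-delims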

 

 

HBase arguments:

Argument                     Description
--column-family <family>     Sets the target column family for the import
--hbase-create-table         If specified, create missing HBase tables
--hbase-row-key <col>        Specifies which input column to use as the row key
--hbase-table <table-name>   Specifies an HBase table to use as the target instead of HDFS
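
Likewise, importing straight into HBase instead of HDFS might look like this sketch (the HBase table h_user_info and column family info are placeholders):

  # placeholder HBase table, column family, and row key column
  sqoop import \
    --connect jdbc:mysql://localhost/sqoop_datas \
    --username sqoop_user --password sqoop_pass \
    --table user_info \
    --hbase-table h_user_info \
    --column-family info \
    --hbase-row-key id \
    --hbase-create-table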

 

 

Code generation arguments:

Argument                Description
--bindir <dir>          Output directory for compiled objects
--class-name <name>     Sets the generated class name; overrides --package-name. When combined with --jar-file, sets the input class
--jar-file <file>       Disable code generation; use the specified jar
--outdir <dir>          Output directory for generated code
--package-name <name>   Put auto-generated classes in this package
--map-column-java <m>   Override default mapping from SQL type to Java type for configured columns
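
These arguments are most often used with the codegen command, which regenerates the Java class (and jar) that Sqoop uses to serialize a table's records; a sketch with placeholder names and paths:

  # placeholder table, class name, and output directories
  sqoop codegen \
    --connect jdbc:mysql://localhost/sqoop_datas \
    --username sqoop_user --password sqoop_pass \
    --table user_info \
    --class-name com.example.UserInfo \
    --outdir /tmp/sqoop_src \
    --bindir /tmp/sqoop_classes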



