Sqoop is the bridge between relational databases and Hadoop's storage systems: it can move data in both directions between a variety of relational sources and HDFS, Hive, and HBase. Typically the relational tables live in a backup copy of the production environment and must be imported every day. When the daily volume is small, a full-table import is fine; for larger volumes, Sqoop also provides an incremental import mechanism.
Below are the commonly used Sqoop commands, followed by their arguments:
| No. | Command | Class | Description |
| --- | --- | --- | --- |
| 1 | import | ImportTool | Import data from a relational database (from a table or a query) into HDFS |
| 2 | export | ExportTool | Export data from HDFS into a relational database |
| 3 | codegen | CodeGenTool | Generate Java classes for a database table and package them into a jar |
| 4 | create-hive-table | CreateHiveTableTool | Create a Hive table |
| 5 | eval | EvalSqlTool | Run a SQL statement and show its results |
| 6 | import-all-tables | ImportAllTablesTool | Import all tables of a database into HDFS |
| 7 | job | JobTool | Work with saved jobs |
| 8 | list-databases | ListDatabasesTool | List all database names |
| 9 | list-tables | ListTablesTool | List all tables in a database |
| 10 | merge | MergeTool | Merge two datasets in HDFS (e.g. successive incremental imports) |
| 11 | metastore | MetastoreTool | Run a shared metadata repository for saved jobs |
| 12 | help | HelpTool | Show help |
| 13 | version | VersionTool | Show the version |
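As a quick illustration of the most common of these commands, here is a minimal sketch of a full-table import into HDFS. The credentials, table name, and HDFS path are placeholders (only the JDBC URL reuses the example from the table below), not values from the original article:

```bash
# Minimal full-table import sketch; sqoop_user/sqoop_pass, the "orders"
# table and the target directory are assumed, illustrative values.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user \
  --password sqoop_pass \
  --table orders \
  --target-dir /user/hadoop/orders \
  -m 1
```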
Next come the general-purpose Sqoop arguments, and then the arguments specific to each of the 13 commands above. The general-purpose arguments fall into several groups: Common arguments, Incremental import arguments, Output line formatting arguments, Input parsing arguments, Hive arguments, HBase arguments, and Generic Hadoop command-line arguments.
1. Common arguments: general-purpose options, mainly for the relational database connection.
| No. | Argument | Description | Example |
| --- | --- | --- | --- |
| 1 | --connect | JDBC URL of the relational database | jdbc:mysql://localhost/sqoop_datas |
| 2 | --connection-manager | Connection manager class; rarely needed | |
| 3 | --driver | JDBC driver class | |
| 4 | --hadoop-home | Hadoop home directory | /home/hadoop |
| 5 | --help | Show help information | |
| 6 | --password | Password for the relational database | |
| 7 | --username | Username for the relational database | |
| 8 | --verbose | Print more information (effectively lowers the log level) | This flag takes no value |
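A minimal sketch of the common connection arguments in use, here on a list-tables call; the credentials are assumed values and the MySQL driver class is shown only to illustrate --driver:

```bash
# List the tables in the source database; --driver is usually optional
# when the JDBC URL already identifies the database type.
sqoop list-tables \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --driver com.mysql.jdbc.Driver \
  --username sqoop_user \
  --password sqoop_pass \
  --verbose
```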
Import control arguments:
| Argument | Description |
| --- | --- |
| --append | Append data to an existing dataset in HDFS |
| --as-avrodatafile | Imports data to Avro Data Files |
| --as-sequencefile | Imports data to SequenceFiles |
| --as-textfile | Imports data as plain text (default) |
| --boundary-query <statement> | Boundary query to use for creating splits |
| --columns <col,col,col…> | Columns to import from table |
| --direct | Use direct import fast path |
| --direct-split-size <n> | Split the input stream every n bytes when importing in direct mode |
| --inline-lob-limit <n> | Set the maximum size for an inline LOB |
| -m,--num-mappers <n> | Use n map tasks to import in parallel |
| -e,--query <statement> | Import the results of statement |
| --split-by <column-name> | Column of the table used to split work units |
| --table <table-name> | Table to read |
| --target-dir <dir> | HDFS destination dir |
| --warehouse-dir <dir> | HDFS parent for table destination |
| --where <where clause> | WHERE clause to use during import |
| -z,--compress | Enable compression |
| --compression-codec <c> | Use Hadoop codec (default gzip) |
| --null-string <null-string> | The string to be written for a null value for string columns |
| --null-non-string <null-string> | The string to be written for a null value for non-string columns |
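A sketch combining several of the import-control arguments above; the table, column names, WHERE clause, and paths are illustrative placeholders:

```bash
# Import a filtered subset of columns, split across 4 mappers and
# compressed; all identifiers here are assumed example values.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password sqoop_pass \
  --table orders \
  --columns "id,amount,created_at" \
  --where "created_at >= '2013-06-01'" \
  --split-by id \
  --num-mappers 4 \
  --as-textfile \
  --compress \
  --target-dir /user/hadoop/orders_20130601
```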
Incremental import arguments:
| Argument | Description |
| --- | --- |
| --check-column (col) | Specifies the column to be examined when determining which rows to import |
| --incremental (mode) | Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified |
| --last-value (value) | Specifies the maximum value of the check column from the previous import |
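A sketch of an append-mode incremental import: only rows whose check column exceeds the previous run's --last-value are pulled in. The table name, check column, and last value are assumptions:

```bash
# Daily incremental run: fetch only rows with id > 10000, the maximum
# id seen by the previous import.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password sqoop_pass \
  --table orders \
  --target-dir /user/hadoop/orders \
  --incremental append \
  --check-column id \
  --last-value 10000
```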
Output line formatting arguments:
| Argument | Description |
| --- | --- |
| --enclosed-by <char> | Sets a required field enclosing character |
| --escaped-by <char> | Sets the escape character |
| --fields-terminated-by <char> | Sets the field separator character |
| --lines-terminated-by <char> | Sets the end-of-line character |
| --mysql-delimiters | Uses MySQL's default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' |
| --optionally-enclosed-by <char> | Sets a field enclosing character |
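A sketch that overrides the default output delimiters, producing tab-separated records; the table name and target path are placeholders:

```bash
# Write tab-separated, newline-terminated records, with backslash as the
# escape character and optional double-quote enclosing.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password sqoop_pass \
  --table orders \
  --target-dir /user/hadoop/orders_tsv \
  --fields-terminated-by '\t' \
  --lines-terminated-by '\n' \
  --escaped-by '\\' \
  --optionally-enclosed-by '\"'
```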
Hive arguments:
| Argument | Description |
| --- | --- |
| --hive-home <dir> | Override $HIVE_HOME |
| --hive-import | Import tables into Hive (uses Hive's default delimiters if none are set) |
| --hive-overwrite | Overwrite existing data in the Hive table |
| --create-hive-table | If set, the job will fail if the target Hive table exists. By default this property is false |
| --hive-table <table-name> | Sets the table name to use when importing to Hive |
| --hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive |
| --hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive |
| --hive-partition-key | Name of the Hive field on which the imported data is partitioned |
| --hive-partition-value <v> | String value that serves as the partition key for the data imported into Hive in this job |
| --map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns |
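A sketch of importing a table straight into Hive; the Hive table name is a placeholder, and --hive-drop-import-delims is included only to show the delimiter-cleanup option:

```bash
# Import into a Hive table, replacing any existing data and stripping
# characters that would break Hive's default line format.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password sqoop_pass \
  --table orders \
  --hive-import \
  --hive-table orders \
  --hive-overwrite \
  --hive-drop-import-delims
```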
HBase arguments:
| Argument | Description |
| --- | --- |
| --column-family <family> | Sets the target column family for the import |
| --hbase-create-table | If specified, create missing HBase tables |
| --hbase-row-key <col> | Specifies which input column to use as the row key |
| --hbase-table <table-name> | Specifies an HBase table to use as the target instead of HDFS |
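A sketch of importing into HBase instead of HDFS; the HBase table name, column family, and row-key column are placeholders:

```bash
# Import rows into the HBase table "orders", keyed by the "id" column,
# creating the table if it does not yet exist.
sqoop import \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password sqoop_pass \
  --table orders \
  --hbase-table orders \
  --column-family cf \
  --hbase-row-key id \
  --hbase-create-table
```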
Code generation arguments:
| Argument | Description |
| --- | --- |
| --bindir <dir> | Output directory for compiled objects |
| --class-name <name> | Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class |
| --jar-file <file> | Disable code generation; use specified jar |
| --outdir <dir> | Output directory for generated code |
| --package-name <name> | Put auto-generated classes in this package |
| --map-column-java <m> | Override default mapping from SQL type to Java type for configured columns |
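A sketch of generating the ORM class for a table with sqoop codegen; the class name and output directories are assumptions:

```bash
# Generate and compile the Java record class for the "orders" table,
# writing sources to --outdir and the compiled jar/classes to --bindir.
sqoop codegen \
  --connect jdbc:mysql://localhost/sqoop_datas \
  --username sqoop_user --password sqoop_pass \
  --table orders \
  --class-name com.example.sqoop.Orders \
  --outdir /tmp/sqoop-gen/src \
  --bindir /tmp/sqoop-gen
```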