Sqoop Chinese Documentation: User Guide, Part 1

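A note before the translation

This is my first time translating technical documentation, so there are bound to be mistakes and awkward passages. Please point them out and I will fix them right away.

I think reading English documentation happens in three stages: (1) understanding the English, (2) understanding what the text is describing, and (3) actually trying it hands-on. This translation aims to reach the second stage.

The translation is only about 90% complete, but the main features are covered clearly, which should be enough for most use cases.

If you understand this document thoroughly, you will not need to read other Sqoop material.

Note: text in red marks places where I had doubts or that need flagging.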

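Sqoop official documentation
1. Introduction
    Sqoop is a tool designed to transfer data between Hadoop and relational databases. Sqoop can import data from an RDBMS (such as MySQL or Oracle) into HDFS, let you transform it with Hadoop MapReduce, and then export the data back into the RDBMS.
    Sqoop automates most of this process, relying on the database schema to describe the data types. Sqoop uses MapReduce to import and export the data, which provides parallel operation as well as fault tolerance.
    This document describes how to use Sqoop to move data between databases and Hadoop, and provides reference information for the Sqoop command-line tools.
    This document is intended for:
           1. System and application programmers
           2. System administrators
           3. Database administrators
           4. Data analysts
           5. Data engineers
2. Supported version: Sqoop v1.4.2
3. Sqoop release
   Sqoop is compatible with Hadoop 0.21 and Cloudera’s Distribution of Hadoop version 3.
4. Prerequisites
   a) Basic computer technology and terminology
   b) Familiarity with command-line interfaces such as bash
   c) Relational database management systems
   d) Familiarity with the purpose and basic operation of Hadoop
   Environment setup: Hadoop must be installed and configured.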
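   This document assumes you are using a Linux or Linux-like environment. If you are using Windows, you may be able to use cygwin to accomplish most of the tasks. If you are using Mac OS X, you should see few (if any) compatibility errors. Sqoop is predominantly operated and tested on Linux.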
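5. Basic Usage
   With Sqoop, you can import data from a relational database into HDFS. The input to the import process is a database table, which Sqoop reads row by row into HDFS. The output is a set of files containing a copy of the imported table. Because the import runs in parallel, the output is spread across multiple files. These files may be delimited text files (for example, with commas or tabs separating each field), or binary Avro or SequenceFiles containing serialized record data.
   A by-product of the import process is a generated Java class that encapsulates one row of the imported table. Sqoop uses this class during the import itself, and its source code is also provided to you for use in subsequent MapReduce processing. The class can serialize and deserialize data to and from the SequenceFile format, and it can also parse the delimited-text form of a record. These abilities let you quickly develop MapReduce applications that use the HDFS-stored records in your processing pipeline. You are also free to parse the delimited record data yourself, using any other tools you prefer.
   After processing the imported records (for example, with MapReduce or Hive), you can export the result data back to the relational database. Sqoop's export process reads a set of delimited text files from HDFS in parallel, parses them into records, and inserts them as new rows in a target database table (for example, in Oracle or MySQL), for consumption by external applications or users.
   Sqoop includes some commands that let you inspect the database you are working with. For example, you can list the available database schemas (with the sqoop-list-databases tool) and the tables within a schema (with the sqoop-list-tables tool). Sqoop also includes a basic SQL execution shell (the sqoop-eval tool).
   Most aspects of import, code generation, and export can be customized. You can control the specific row range or columns imported. You can specify particular delimiters and escape characters for the file-based representation of the data, as well as the file format used. You can also control the class or package names used in generated code. The following sections explain how to specify these and other arguments to Sqoop.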
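As a rough illustration of the customizations described above, the following sketch assembles an import command that restricts the columns and rows, sets a tab field delimiter, and names the generated class. The connection string, user, table, column names, filter, and class name are all placeholders chosen for illustration. The command is printed rather than executed, so the sketch runs without a Sqoop installation or a database.

```shell
#!/usr/bin/env bash
# Placeholder values throughout: adjust the JDBC URL, user, table,
# columns, filter, and class name to match your own database.
import_cmd=(
  sqoop import
  --connect jdbc:mysql://localhost/db
  --username foo
  --table TEST
  --columns "id,name"            # import only these columns
  --where "id > 100"             # import only the matching rows
  --fields-terminated-by '\t'    # tab-separated text output
  --class-name com.example.TestRecord   # name of the generated Java class
)

# Print the command, shell-quoted, instead of running it.
printf '%q ' "${import_cmd[@]}"
printf '\n'
```

To actually run the import, replace the two printf lines with "${import_cmd[@]}" once Sqoop and the database are available.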
Original post: http://www.dl234.com/blog/?p=33

5.Basic Usage (translator's notes)

A by-product of the import process is a generated Java class which can encapsulate one row of the imported table.
boundary query: rendered in Chinese as 邊界查詢.
--verbose: prints more information while working, including debug-level log output.

6.Sqoop Tools

Sqoop is a collection of related tools. To use Sqoop, you specify the tool you want to use and the arguments that control the tool.

If Sqoop is compiled from its own source, you can run Sqoop without a formal installation process by running the bin/sqoop program. Users of a packaged deployment of Sqoop (such as an RPM shipped with Apache Bigtop) will see this program installed as /usr/bin/sqoop. The remainder of this documentation will refer to this program as sqoop. For example:

$ sqoop tool-name [tool-arguments]

Note: the $ character represents the shell prompt. It is not part of the command you type.

Sqoop ships with a help tool. To display a list of all available tools, type the following command:

$ sqoop help
usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

You can display help for a specific tool by entering: sqoop help (tool-name); for example, sqoop help import.

You can also add the --help argument to any command: sqoop import --help.

6.1.Using Command Aliases

In addition to typing the sqoop (toolname) syntax, you can use alias scripts that specify the sqoop-(toolname) syntax. For example, the scripts sqoop-import, sqoop-export, etc. each select a specific tool.
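The relationship between an alias script and the tool it selects can be sketched as a simple name mapping. The helper below is hypothetical (it is not part of Sqoop); it only shows that the part of the alias after the sqoop- prefix is the tool name.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with Sqoop: translate an alias script
# name such as "sqoop-import" into the equivalent "sqoop <tool>" form.
alias_to_command() {
  local alias_name="$1"
  case "$alias_name" in
    sqoop-*)
      # Strip the "sqoop-" prefix; what remains is the tool name.
      printf 'sqoop %s\n' "${alias_name#sqoop-}"
      ;;
    *)
      printf 'not a sqoop alias: %s\n' "$alias_name" >&2
      return 1
      ;;
  esac
}

alias_to_command sqoop-import   # prints "sqoop import"
alias_to_command sqoop-export   # prints "sqoop export"
```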

6.2.Controlling the Hadoop Installation

You invoke Sqoop through the program launch capability provided by Hadoop. The sqoop command-line program is a wrapper which runs the bin/hadoop script shipped with Hadoop. If you have multiple installations of Hadoop present on your machine, you can select the Hadoop installation by setting the $HADOOP_COMMON_HOME and $HADOOP_MAPRED_HOME environment variables.

For example:

$ HADOOP_COMMON_HOME=/path/to/some/hadoop \
  HADOOP_MAPRED_HOME=/path/to/some/hadoop-mapreduce \
  sqoop import --arguments...

or:

$ export HADOOP_COMMON_HOME=/some/path/to/hadoop
$ export HADOOP_MAPRED_HOME=/some/path/to/hadoop-mapreduce
$ sqoop import --arguments...

If either of these variables is not set, Sqoop will fall back to $HADOOP_HOME. If it is not set either, Sqoop will use the default installation locations for Apache Bigtop, /usr/lib/hadoop and /usr/lib/hadoop-mapreduce, respectively.

The active Hadoop configuration is loaded from $HADOOP_HOME/conf/, unless the $HADOOP_CONF_DIR environment variable is set.
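The lookup order described above can be sketched with nested shell parameter expansion. This is an illustration of the documented fallback only, not Sqoop's actual launcher code, and the function names are made up for this example. Note that the ${VAR:-default} form also treats an empty variable as unset, which is an assumption beyond what the text states.

```shell
#!/usr/bin/env bash
# Illustrative only: mirrors the documented fallback order, not Sqoop's script.
# The specific variable wins, then $HADOOP_HOME, then the Apache Bigtop default.
resolve_hadoop_common_home() {
  printf '%s\n' "${HADOOP_COMMON_HOME:-${HADOOP_HOME:-/usr/lib/hadoop}}"
}

resolve_hadoop_mapred_home() {
  printf '%s\n' "${HADOOP_MAPRED_HOME:-${HADOOP_HOME:-/usr/lib/hadoop-mapreduce}}"
}

# Show the three cases in subshells so the caller's environment is untouched.
( unset HADOOP_COMMON_HOME HADOOP_HOME; resolve_hadoop_common_home )
( unset HADOOP_COMMON_HOME; HADOOP_HOME=/opt/hadoop; resolve_hadoop_common_home )
( HADOOP_COMMON_HOME=/path/to/some/hadoop; resolve_hadoop_common_home )
```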

6.3.Using Generic and Specific Arguments

To control the operation of each Sqoop tool, you use generic and specific arguments.

For example:

$ sqoop help import
usage: sqoop import [GENERIC-ARGS] [TOOL-ARGS]

Common arguments:
   --connect <jdbc-uri>     Specify JDBC connect string
   --connect-manager <jdbc-uri>     Specify connection manager class to use
   --driver <class-name>    Manually specify JDBC driver class to use
   --hadoop-mapred-home <dir>+      Override $HADOOP_MAPRED_HOME
   --help                   Print usage instructions
-P                          Read password from console
   --password <password>    Set authentication password
   --username <username>    Set authentication username
   --verbose                Print more information while working
   --hadoop-home <dir>+     Deprecated. Override $HADOOP_HOME

[...]

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

You must supply the generic arguments -conf, -D, and so on after the tool name but before any tool-specific arguments (such as --connect). Note that generic Hadoop arguments are preceded by a single dash character (-), whereas tool-specific arguments start with two dashes (--), unless they are single character arguments such as -P.

The -conf, -D, -fs and -jt arguments control the configuration and Hadoop server settings. For example, -D mapred.job.name=<job_name> can be used to set the name of the MR job that Sqoop launches. If it is not specified, the name defaults to the jar name for the job, which is derived from the table name used.

(Translator's note: for what each of these settings actually does, consult the Hadoop documentation.)
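The ordering rule (tool name, then generic Hadoop arguments, then tool-specific arguments) can be sketched by assembling the command in three parts. The job name, JDBC URL, user, and table below are placeholders. The command is printed instead of executed so the sketch runs without Sqoop installed.

```shell
#!/usr/bin/env bash
# Placeholder values: the job name, JDBC URL, user, and table are examples only.
tool=import
generic_args=(-D mapred.job.name=nightly_test_import)   # single-dash Hadoop options
tool_args=(--connect jdbc:mysql://localhost/db --username foo --table TEST)

# Tool name first, generic Hadoop arguments next, tool-specific arguments last.
cmd=(sqoop "$tool" "${generic_args[@]}" "${tool_args[@]}")

# Print the assembled command rather than running it.
printf '%s\n' "${cmd[*]}"
```

Keeping the generic and tool-specific arguments in separate arrays makes it hard to interleave them by accident when a script builds the command incrementally.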

The -files, -libjars, and -archives arguments are not typically used with Sqoop, but they are included as part of Hadoop’s internal argument-parsing system.


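6.4.Using Options Files to Pass Arguments

(Translator's note: this section is wordy. The point is that you can put arguments in a file and pass that file to Sqoop; the examples below make it clear once you try them.)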

When using Sqoop, the command line options that do not change from invocation to invocation can be put in an options file for convenience. An options file is a text file where each line identifies an option in the order that it appears otherwise on the command line. Option files allow specifying a single option on multiple lines by using the back-slash character at the end of intermediate lines. Also supported are comments within option files that begin with the hash character. Comments must be specified on a new line and may not be mixed with option text. All comments and empty lines are ignored when option files are expanded. Unless options appear as quoted strings, any leading or trailing spaces are ignored. Quoted strings if used must not extend beyond the line on which they are specified.

Option files can be specified anywhere in the command line as long as the options within them follow the otherwise prescribed rules of options ordering. For instance, regardless of where the options are loaded from, they must follow the ordering such that generic options appear first, tool specific options next, finally followed by options that are intended to be passed to child programs.

To specify an options file, simply create an options file in a convenient location and pass it to the command line via the --options-file argument.

Whenever an options file is specified, it is expanded on the command line before the tool is invoked. You can specify more than one options file within the same invocation if needed.

For example, the following Sqoop invocation for import can be specified alternatively as shown below:

$ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST

$ sqoop --options-file /users/homer/work/import.txt --table TEST

where the options file /users/homer/work/import.txt contains the following:

import
--connect
jdbc:mysql://localhost/db
--username
foo

The options file can have empty lines and comments for readability purposes. So the above example would work exactly the same if the options file /users/homer/work/import.txt contained the following:

#
# Options file for Sqoop import
#

# Specifies the tool being invoked
import

# Connect parameter and value
--connect
jdbc:mysql://localhost/db

# Username parameter and value
--username
foo

#
# Remaining options should be specified in the command line.
#
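To make the expansion rules concrete, here is a simplified sketch of how such a file could be flattened into command-line arguments. It is not Sqoop's parser: the function name is made up, and it handles only comment lines, empty lines, and leading or trailing whitespace. Backslash continuation and quoted strings are deliberately left out.

```shell
#!/usr/bin/env bash
# Simplified illustration, not Sqoop's implementation. Skips comment and
# empty lines and trims surrounding whitespace; it does NOT handle
# backslash continuation lines or quoted strings.
expand_options_file() {
  local file="$1" line
  while IFS= read -r line || [ -n "$line" ]; do
    line="${line#"${line%%[![:space:]]*}"}"   # trim leading whitespace
    line="${line%"${line##*[![:space:]]}"}"   # trim trailing whitespace
    if [ -z "$line" ]; then continue; fi      # ignore empty lines
    case "$line" in
      \#*) continue ;;                        # ignore comment lines
    esac
    printf '%s\n' "$line"
  done < "$file"
}

# Recreate the commented options file from the example in a temporary file.
opts_file="$(mktemp)"
cat > "$opts_file" <<'EOF'
#
# Options file for Sqoop import
#

# Specifies the tool being invoked
import

# Connect parameter and value
--connect
jdbc:mysql://localhost/db

# Username parameter and value
--username
foo
EOF

# Join the surviving lines with spaces and append the remaining option.
expanded="$(expand_options_file "$opts_file" | paste -sd ' ' -)"
printf 'sqoop %s --table TEST\n' "$expanded"
rm -f "$opts_file"
```

The printed line matches the single-line invocation shown earlier in this section: sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST.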

6.5.Using Tools

The following sections will describe each tool’s operation. The tools are listed in the most likely order you will find them useful.
