01 - DataX installation and basic usage

Reference links:

Official DataX GitHub repository: https://github.com/alibaba/DataX

1, Installation and usage

1.1, Download link

http://datax-opensource.oss-cn-hangzhou.aliyuncs.com/datax.tar.gz

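1.2, Usage (prebuilt DataX toolkit, not compiled from source)
  • After downloading, extract the archive to a local directory, change into the bin directory, and run a synchronization job: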

    $ cd  {YOUR_DATAX_HOME}/bin
    $ python datax.py {YOUR_JOB.json}
    
  • Self-check script:

    $ python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_DATAX_HOME}/job/job.json
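
The two commands above share one shape: a Python interpreter, the datax.py launcher under bin, and a job file. A minimal sketch of building and running that command from Python follows. The helper names datax_command and run_job and the /opt/datax path are assumptions made for illustration; they are not part of DataX.

```python
import subprocess
from pathlib import Path


def datax_command(datax_home, job_file, python="python"):
    """Build the argument list for: python {YOUR_DATAX_HOME}/bin/datax.py {YOUR_JOB.json}."""
    launcher = Path(datax_home) / "bin" / "datax.py"
    return [python, str(launcher), str(job_file)]


def run_job(datax_home, job_file, python="python"):
    """Run the job and return the exit code of the launcher process (hypothetical helper)."""
    return subprocess.run(datax_command(datax_home, job_file, python)).returncode


if __name__ == "__main__":
    # Assumed install location; replace it with your own DataX directory.
    home = Path("/opt/datax")
    # Only prints the command; call run_job(...) to actually launch it.
    print(datax_command(home, home / "job" / "job.json"))
```

Passing the arguments as a list avoids shell quoting problems when the DataX directory contains spaces.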
    
1.3, Configuration example: read data from stream and print it to the console
1.3.1, Step 1, find the configuration file template (JSON format)

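The configuration template can be viewed with the command: python datax.py -r {YOUR_READER} -w {YOUR_WRITER}

The templates can also be downloaded from the official GitHub repository, which additionally explains each field in detail.

For example, python datax.py -r streamreader -w streamwriter prints the following: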

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.


Please refer to the streamreader document:
     https://github.com/alibaba/DataX/blob/master/streamreader/doc/streamreader.md 

Please refer to the streamwriter document:
     https://github.com/alibaba/DataX/blob/master/streamwriter/doc/streamwriter.md 
 
Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json 
to run the job.

{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "streamreader", 
                    "parameter": {
                        "column": [], 
                        "sliceRecordCount": ""
                    }
                }, 
                "writer": {
                    "name": "streamwriter", 
                    "parameter": {
                        "encoding": "", 
                        "print": true
                    }
                }
            }
        ], 
        "setting": {
            "speed": {
                "channel": ""
            }
        }
    }
}
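
The printed output mixes banner text with the JSON template, so the template has to be separated from the banner before it can be saved as a file. One possible way to do that is sketched below. The function name extract_template is hypothetical, and the sketch assumes the template starts on the first line that contains only an opening brace, as in the output shown above.

```python
import json


def extract_template(output):
    """Return the JSON template found in the text printed by `datax.py -r ... -w ...`."""
    lines = output.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == "{":  # first line that opens the JSON object
            text = "\n".join(lines[i:]).lstrip()
            # raw_decode parses one JSON value and ignores any text after it.
            template, _ = json.JSONDecoder().raw_decode(text)
            return template
    raise ValueError("no JSON template found in output")


# Shortened sample of the output above, used only to exercise the function.
sample = """Please save the following configuration as a json file and  use
     python {DATAX_HOME}/bin/datax.py {JSON_FILE_NAME}.json
to run the job.

{
    "job": {
        "content": [],
        "setting": {"speed": {"channel": ""}}
    }
}
"""

if __name__ == "__main__":
    print(json.dumps(extract_template(sample), indent=4))
```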
1.3.2, Step 2, customize the configuration file based on the template
#stream2stream.json
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "streamreader",
          "parameter": {
            "sliceRecordCount": 10,
            "column": [
              {
                "type": "long",
                "value": "10"
              },
              {
                "type": "string",
                "value": "hello,你好,世界-DataX"
              }
            ]
          }
        },
        "writer": {
          "name": "streamwriter",
          "parameter": {
            "encoding": "UTF-8",
            "print": true
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 5
      }
    }
  }
}
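
The same configuration can also be generated from a script, which helps when sliceRecordCount or channel need to vary between runs. The following is a sketch only: make_stream_job and write_job are hypothetical helper names, and the field values are copied from the example above.

```python
import json


def make_stream_job(record_count=10, channel=5):
    """Build the stream-to-stream job from the example as a Python dict."""
    return {
        "job": {
            "content": [
                {
                    "reader": {
                        "name": "streamreader",
                        "parameter": {
                            "sliceRecordCount": record_count,
                            "column": [
                                {"type": "long", "value": "10"},
                                {"type": "string", "value": "hello,你好,世界-DataX"},
                            ],
                        },
                    },
                    "writer": {
                        "name": "streamwriter",
                        "parameter": {"encoding": "UTF-8", "print": True},
                    },
                }
            ],
            "setting": {"speed": {"channel": channel}},
        }
    }


def write_job(path, job):
    """Write the job dict to a UTF-8 JSON file."""
    # ensure_ascii=False keeps the non-ASCII sample string readable in the file.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(job, f, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    write_job("stream2stream.json", make_stream_job())
```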
1.3.3, Step 3, start DataX and run it with the JSON configuration file
  • Running the command below makes streamwriter print to the console the data that streamreader generates in memory:
$ python ../bin/datax.py stream2stream.json
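
Before launching, it can be useful to confirm that the job file still has the fields filled in during step 2, since the raw template leaves several of them blank. The sketch below is a hypothetical pre-flight check. It mirrors the example in this post (reader name, writer name, and a positive integer channel) and is not DataX's own validation, which may accept other settings.

```python
import json


def check_job(config):
    """Return a list of problems found in a job dict; an empty list means none were found."""
    problems = []
    job = config.get("job", {})
    content = job.get("content") or []
    if not content:
        problems.append("job.content is empty")
    for i, item in enumerate(content):
        for side in ("reader", "writer"):
            if not item.get(side, {}).get("name"):
                problems.append(f"job.content[{i}].{side}.name is missing")
    # Assumption: the job sets speed.channel, as the example in this post does.
    channel = job.get("setting", {}).get("speed", {}).get("channel")
    if not isinstance(channel, int) or channel < 1:
        problems.append("job.setting.speed.channel should be a positive integer")
    return problems


# An unfilled configuration, similar to the blank template from step 1.
example = json.loads('{"job": {"content": [], "setting": {"speed": {"channel": ""}}}}')

if __name__ == "__main__":
    for problem in check_job(example):
        print(problem)
```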