1. Download DataX
2. Configuration
For convenience, set DATAX_HOME in /etc/profile and add the bin directory to PATH.
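The /etc/profile additions might look like the following (the install path /opt/module/datax is an assumption; adjust it to wherever DataX was actually unpacked):

```shell
# Assumed DataX install location -- change to match your environment
export DATAX_HOME=/opt/module/datax
# Put DataX's bin directory on PATH so datax.py can be run from anywhere
export PATH=$PATH:$DATAX_HOME/bin
```

After editing, run `source /etc/profile` so the current shell picks up the new variables.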
3. Test
python $DATAX_HOME/bin/datax.py {YOUR_JOB.json}
e.g.:
cd $DATAX_HOME
python ./bin/datax.py ./job/job.json
4.1 MySQL -> Hive
Create a user table in the exam database in MySQL:
CREATE TABLE user (
user_id bigint(11) NOT NULL,
user_name varchar(25) DEFAULT NULL,
trade_time datetime DEFAULT NULL,
PRIMARY KEY (user_id)
) ENGINE=InnoDB AUTO_INCREMENT=0 DEFAULT CHARSET=utf8;
Create a u2 table in Hive's ali.db, specifying ORC as the storage format:
create table if not exists u2(
id int,
name string,
trade_time string
)
row format delimited fields terminated by '\t'
stored as orc
;
Create a mysqltohive.json file under ./job/.
Note: for the Hive u2 table to be able to read the loaded data, the hdfswriter fileType here must match the storage format used when creating u2 (ORC in this case).
Its contents are as follows:
{
"job": {
"content": [
{
"reader": {
"name": "mysqlreader",
"parameter": {
"column": [
"user_id",
"user_name",
"trade_time"
],
"connection": [
{
"jdbcUrl": ["jdbc:mysql://hadoop111:3306/exam"],
"table": ["user"]
}
],
"password": "root",
"username": "root"
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"defaultFS": "hdfs://hadoop111:9000",
"fileType": "orc",
"path": "/user/hive/warehouse/ali.db/u2",
"fileName": "m2h01",
"column": [
{
"name": "id",
"type": "INT"
},
{
"name": "name",
"type": "STRING"
},
{
"name": "trade_time",
"type": "DATE"
}
],
"writeMode": "append",
"fieldDelimiter": "\t",
}
}
}
],
"setting": {
"speed": {
"channel": "1"
}
}
}
}
Execute:
python $DATAX_HOME/bin/datax.py $DATAX_HOME/job/mysqltohive.json
- The field names in hdfswriter must be the same as the field names of the u2 table created in Hive.
- The field types must also correspond; see the table below.
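Once the job finishes, the load can be spot-checked from the Hive CLI (a minimal check against the ali.u2 table defined above; it assumes the DataX job has already run):

```sql
-- Verify that rows landed in the ORC table written by hdfswriter
use ali;
select * from u2 limit 10;
select count(*) from u2;
```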