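Using Transformers in DataX
I recommend reading the DataX source code. It is less complicated than you might expect.
The official repository also contains some sample code: https://github.com/alibaba/DataX/tree/master/core/src/main/java/com/alibaba/datax/core/transport/transformer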
Detailed walkthrough
The sync job configuration is as follows (connection details and host names are masked):
{
  "content": [
    {
      "reader": {
        "name": "hivereader",
        "parameter": {
          "column": [
            ...
            {
              "name": "stat_date",
              "type": "string"
            },
            ...
          ],
          "hive2Url": "jdbc:hive2://xxxxxxx:xxxx/test_dev_2",
          "modifyUserName": "xxxxxxx",
          "partition": "ds=20190424",
          "table": "xxx",
          "username": "hive"
        }
      },
      "writer": {
        "name": "adswriter",
        "parameter": {
          "column": [
            ...
            "stat_date"
            ...
          ],
          "modifyUserName": "xxxxxxxxxxxxxx",
          "partition": "id=20190424",
          "partitionKey": [
            "id"
          ],
          "partitionValue": [
            "20190424"
          ],
          "password": "******************************",
          "schema": "xxxx",
          "table": "xxxx",
          "url": "xxxxxxx:xxxxx",
          "username": "xxxxxxxxxxxxxx",
          "writeMode": "insert"
        }
      }
    }
  ],
  "setting": {
    "errorLimit": {
      "record": 0
    },
    "speed": {
      "channel": 5,
      "throttle": false
    }
  }
}
The configuration after adding the transformer:
{
  "content": [
    {
      "reader": {
        "name": "hivereader",
        "parameter": {
          "column": [
            ...
          ],
          "hive2Url": "jdbc:hive2://xxxxxxx:xxxx/test_dev_2",
          "modifyUserName": "xxxxxxxxxxxxxx",
          "partition": "ds=20190424",
          "table": "xxx",
          "username": "hive"
        }
      },
      "transformer": [
        {
          "name": "dx_groovy",
          "parameter": {
            "code": "return record",
            "extraPackage": [
              "import groovy.json.JsonSlurper;"
            ]
          }
        }
      ],
      "writer": {
        "name": "adswriter",
        "parameter": {
          "column": [
            ...
          ],
          "modifyUserName": "xxxxxxxxxxx",
          "partition": "id=20190424",
          "partitionKey": [
            "id"
          ],
          "partitionValue": [
            "20190424"
          ],
          "password": "******************************",
          "schema": "xxx",
          "table": "xxx",
          "url": "xxxxxxxxxxxxxxxxxxxxx:xxxxxxx",
          "username": "xxxxxxx",
          "writeMode": "insert"
        }
      }
    }
  ],
  "setting": {
    "errorLimit": {
      "record": 0
    },
    "speed": {
      "channel": 5,
      "throttle": false
    }
  }
}
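Usage notes
"name": the name of the transformer as registered in DataX. For the Groovy transformer the value is fixed: dx_groovy
"parameter": the transformer's parameters
"code": the logic to apply to the data of the table being synced. You can prototype it in IDEA or Eclipse by extending the Transformer class and overriding the evaluate method, which gives you the record object. The contents of code must not contain arbitrary line breaks, and the transformer block as a whole must be valid JSON. Declare variables with def; types are converted automatically.
"extraPackage": importing third-party jars is not supported. Only packages that are already available to DataX can be used.
Again, I recommend downloading the DataX source and having a look.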
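Since code has to fit on a single line inside a JSON string, it can be easier to write the Groovy script over several lines and flatten it before pasting it into the job file. The Java sketch below shows one possible way to do that. GroovyCodeFlattener and toJsonString are hypothetical names and not part of DataX. The sketch assumes every statement ends with a semicolon and that the script contains no // line comments, because joining the lines would otherwise change its meaning.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class GroovyCodeFlattener {

    /** Joins a multi-line script into one line and quotes it as a JSON string. */
    static String toJsonString(String script) {
        // Split on any line break, trim each line, drop empty lines, join with spaces.
        String oneLine = Arrays.stream(script.split("\\R"))
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.joining(" "));

        // Escape the characters that would break a JSON string literal.
        StringBuilder out = new StringBuilder("\"");
        for (char c : oneLine.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\t': out.append("\\t");  break;
                default:   out.append(c);
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        String script = String.join("\n",
                "Column column = record.getColumn(1);",
                "def str = column.asString();",
                "record.setColumn(1, new StringColumn('AA' + str));",
                "return record");
        // Prints the script as a single quoted line, ready to use as the value of "code".
        System.out.println(toJsonString(script));
    }
}
```

Using single quotes for Groovy string literals, as the examples in this post do, keeps the number of escaped characters in the JSON to a minimum.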
Prepending characters to a field value
{
  "name": "dx_groovy",
  "parameter": {
    "code": "Column column = record.getColumn(1);def str = column.asString();def sb = new StringBuffer(str);def header = sb.insert(0,'AA');def strHeader = header.toString();record.setColumn(1, new StringColumn(strHeader));return record",
    "extraPackage": [
      "import groovy.json.JsonSlurper;"
    ]
  }
}
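The prepend logic can be tried outside DataX in plain Java, because the script relies on the standard java.lang string-buffer API. This is a minimal sketch, assuming that column.asString() may return null for a NULL column (worth confirming in the DataX source). PrependSketch and prepend are illustrative names, and passing null through unchanged is only one possible policy.

```java
final class PrependSketch {

    /** Returns prefix + value. A null value is passed through unchanged (assumed policy). */
    static String prepend(String value, String prefix) {
        if (value == null) {
            // new StringBuilder((String) null) would throw a NullPointerException.
            return null;
        }
        return new StringBuilder(value).insert(0, prefix).toString();
    }

    public static void main(String[] args) {
        System.out.println(prepend("20190424", "AA")); // AA20190424
        System.out.println(prepend(null, "AA"));       // null
    }
}
```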
Appending characters to a field value
{
  "name": "dx_groovy",
  "parameter": {
    "code": "Column column = record.getColumn(1);def str = column.asString();def sb = new StringBuffer(str);def tail = sb.append('ZZ');def strTail = tail.toString();record.setColumn(1, new StringColumn(strTail));return record",
    "extraPackage": [
      "import groovy.json.JsonSlurper;"
    ]
  }
}
Inserting characters in the middle of a field value
{
  "name": "dx_groovy",
  "parameter": {
    "code": "Column column = record.getColumn(1);def str = column.asString();def sb = new StringBuffer(str);def mid = sb.insert(2,'A');def strMid = mid.toString();record.setColumn(1, new StringColumn(strMid));return record",
    "extraPackage": [
      "import groovy.json.JsonSlurper;"
    ]
  }
}
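One thing to watch with a fixed offset such as 2: insert throws a StringIndexOutOfBoundsException when the offset is larger than the length of the value, and with "record": 0 as the error limit a single short value could then fail the job. The Java sketch below shows one way to guard against that by clamping the offset. InsertSketch and insertAt are hypothetical names, and clamping is an assumed policy; skipping the record or leaving the value untouched may suit your data better.

```java
final class InsertSketch {

    /** Inserts text at offset, or at the end when the value is shorter than offset. */
    static String insertAt(String value, int offset, String text) {
        // Clamp so that a short value does not raise StringIndexOutOfBoundsException.
        // Negative offsets are not handled here and would still throw.
        int safeOffset = Math.min(offset, value.length());
        return new StringBuilder(value).insert(safeOffset, text).toString();
    }

    public static void main(String[] args) {
        System.out.println(insertAt("abcd", 2, "A")); // abAcd
        System.out.println(insertAt("a", 2, "A"));    // aA
    }
}
```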
Replacing characters in a field value
{
  "name": "dx_groovy",
  "parameter": {
    "code": "Column column = record.getColumn(1);def str = column.asString();def newStr=null;if(str.contains('BJ')){newStr=str.replaceAll('BJ', '北京');record.setColumn(1, new StringColumn(newStr));};return record",
    "extraPackage": [
      "import groovy.json.JsonSlurper;"
    ]
  }
}
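Keep in mind that replaceAll treats its first argument as a regular expression. That makes no difference for 'BJ', but a search string containing characters such as . or * would match more than intended, and a $ in the replacement is read as a group reference. For plain text substitution, java.lang.String also offers replace, which matches literally. The sketch below contrasts the two in plain Java; ReplaceSketch and replaceLiteral are illustrative names.

```java
final class ReplaceSketch {

    /** Replaces every literal occurrence of target, with no regex interpretation. */
    static String replaceLiteral(String value, String target, String replacement) {
        return value.replace(target, replacement);
    }

    public static void main(String[] args) {
        // As a regex, "." matches any character, so every character is replaced.
        System.out.println("1.5".replaceAll(".", ","));           // ,,,
        // The literal variant only touches the actual dot.
        System.out.println(replaceLiteral("1.5", ".", ","));      // 1,5
        System.out.println(replaceLiteral("BJ-001", "BJ", "Beijing")); // Beijing-001
    }
}
```

With a literal replace the contains check in the script also becomes optional, since a value without a match is returned unchanged.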
Resetting a field value to zero
{
  "name": "dx_groovy",
  "parameter": {
    "code": "Column column = record.getColumn(1);def str = column.asString();str='0';record.setColumn(1, new StringColumn(str));return record",
    "extraPackage": [
      "import groovy.json.JsonSlurper;"
    ]
  }
}
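The script reads the old value only to overwrite it, so the effect is simply "put '0' into column 1". The Java sketch below models that effect on a plain list, which stands in for a DataX record and is not a DataX class. It assumes column indices start at zero, so index 1 addresses the second column of the reader's column list; ZeroFieldSketch and zeroColumn are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ZeroFieldSketch {

    /** Returns a copy of the row with the column at index overwritten by "0". */
    static List<String> zeroColumn(List<String> row, int index) {
        List<String> copy = new ArrayList<>(row);
        copy.set(index, "0"); // the previous value is ignored, as in the script
        return copy;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("id-1", "42", "x");
        System.out.println(zeroColumn(row, 1)); // [id-1, 0, x]
    }
}
```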