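1. How it works
Incremental synchronization (including re-syncing a single row after it has been modified) is built on three pieces: DataX's synchronization capability, datax-web's incremental feature, and the merge behavior of ClickHouse's ReplacingMergeTree engine, which collapses duplicate rows when the parts of a partition are merged.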
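2. Prerequisites
The DataX build used here has no clickhousewriter plugin, so download the plugin and place it in the writer directory under plugin.
clickhousewriter plugin download (uploaded to CSDN for easier downloading)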
3. Implementation
1. Use the incremental feature of datax-web
Timestamp parameters:
-DlastTime='%s' -DcurrentTime='%s'
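As a rough illustration of how these parameters might flow through a run, the sketch below fills the two `%s` placeholders with the previous and current sync times and then substitutes them into a reader filter using the `${name}` variable syntax of a DataX job file. The epoch-second values, the `where` text with `FROM_UNIXTIME` (which assumes a MySQL source), and the helper names `build_params` and `render_where` are all assumptions made for illustration; datax-web performs the real substitution itself.

```python
from string import Template

# Parameter template mirroring the line above; %s is filled twice, in order.
PARAM_TEMPLATE = "-DlastTime='%s' -DcurrentTime='%s'"

# Hypothetical reader filter (assumes a MySQL source and epoch-second values).
WHERE_TEMPLATE = Template(
    "update_time_stamp >= FROM_UNIXTIME(${lastTime}) "
    "and update_time_stamp < FROM_UNIXTIME(${currentTime})"
)


def build_params(last_time: int, current_time: int) -> str:
    """Fill the two %s placeholders with the bounds of the sync window."""
    return PARAM_TEMPLATE % (last_time, current_time)


def render_where(last_time: int, current_time: int) -> str:
    """Mimic the ${name} substitution applied to the job JSON."""
    return WHERE_TEMPLATE.substitute(lastTime=last_time, currentTime=current_time)


if __name__ == "__main__":
    print(build_params(1700000000, 1700003600))
    print(render_where(1700000000, 1700003600))
```

The filter is written as a half-open window (`>=` the last time, `<` the current time), so that consecutive runs neither overlap nor skip rows on the boundary.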
2. Incrementally sync data to ClickHouse, including updates to individual rows
https://clickhouse.com/docs/zh/engines/table-engines/mergetree-family/replacingmergetree/
DROP TABLE IF EXISTS tms.datax_test_incr;
CREATE TABLE tms.datax_test_incr (
id BIGINT,
name VARCHAR(100),
update_time_stamp TIMESTAMP
) ENGINE = ReplacingMergeTree(update_time_stamp)
PARTITION BY toYYYYMM(update_time_stamp)
PRIMARY KEY id
ORDER BY id;
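To make the effect of this table definition more concrete, here is a small Python simulation of what a full merge is expected to leave behind: among rows that share the same sorting key (`id`) inside one partition, only the row with the largest version (`update_time_stamp`) survives. This is only a sketch of the documented semantics, not ClickHouse code; the `Row` type and `merge_replacing` function are hypothetical names introduced for the example.

```python
from datetime import datetime
from typing import NamedTuple


class Row(NamedTuple):
    id: int
    name: str
    update_time_stamp: datetime


def partition_of(row: Row) -> int:
    """Same value that toYYYYMM(update_time_stamp) would produce."""
    ts = row.update_time_stamp
    return ts.year * 100 + ts.month


def merge_replacing(rows: list[Row]) -> list[Row]:
    """Simulate a full merge: per (partition, id) keep the newest version."""
    latest: dict[tuple[int, int], Row] = {}
    for row in rows:
        key = (partition_of(row), row.id)
        kept = latest.get(key)
        # On equal versions the later-inserted row wins, hence >=.
        if kept is None or row.update_time_stamp >= kept.update_time_stamp:
            latest[key] = row
    return sorted(latest.values(), key=lambda r: (partition_of(r), r.id))


if __name__ == "__main__":
    rows = [
        Row(1, "a", datetime(2024, 1, 1)),
        Row(1, "b", datetime(2024, 1, 15)),  # replaces "a" within 202401
        Row(2, "c", datetime(2024, 1, 2)),
        Row(1, "d", datetime(2024, 2, 1)),   # different month, different partition
    ]
    for row in merge_replacing(rows):
        print(row)
```

Note that in this simulation the row for `id = 1` updated in February stays alongside the January one, because merges run within a partition and the partition key here is derived from `update_time_stamp`. Whether that matters depends on how the data is queried afterwards.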
-- Manually merge the partitions
optimize table datax_test_incr final;
-- Deduplicate at query time
select * from datax_test_incr final;
datax-web configuration
"postSql": [
"optimize table datax_test_incr final"
],
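For context, the following sketch shows where a `postSql` entry like the one above might sit inside the writer section of a job file. Only `postSql` comes from this post; the other keys (`username`, `password`, `column`, `connection`, `jdbcUrl`, `table`) follow the layout commonly used by DataX relational writers and are assumptions here, as are the placeholder credentials and URL. The plugin you downloaded may expect different parameters, so check its own documentation.

```python
import json


def build_writer(table: str, jdbc_url: str, columns: list[str]) -> dict:
    """Build a hypothetical clickhousewriter section with a post-load merge."""
    return {
        "name": "clickhousewriter",
        "parameter": {
            "username": "default",  # placeholder credentials (assumption)
            "password": "",
            "column": columns,
            "connection": [{"jdbcUrl": jdbc_url, "table": [table]}],
            # Runs after the load finishes, forcing a merge so that
            # older versions of updated rows are collapsed.
            "postSql": [f"optimize table {table} final"],
        },
    }


if __name__ == "__main__":
    writer = build_writer(
        table="datax_test_incr",
        jdbc_url="jdbc:clickhouse://127.0.0.1:8123/tms",  # placeholder URL
        columns=["id", "name", "update_time_stamp"],
    )
    print(json.dumps(writer, indent=2))
```

Running `optimize ... final` after every load is simple but can become expensive on large tables, so it is worth checking how long the merge takes as the data grows.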
4. Result after implementation