Real-Time Sync of MySQL Data to Kudu via MQ (with Sample Code)

Contents

  1. Background
  2. Requirements Analysis
  3. Project Implementation
  4. Case Implementation

 

I: Background

    In recent years, with the rise of big data, companies have placed more and more value on their data. Broadly speaking, data falls into two categories: historical (offline) data and real-time streaming data. Processing real-time data is often more complex than processing offline data and makes harsher demands on machine resources, but the collection, processing, and use of real-time data have distinctive applications in today's internet industry. I work in e-commerce, so take e-commerce as the example: the Double 11 live dashboard and the many business teams that need to see live figures all depend on it, and real-time data is far more timely than offline data. This article uses real-time synchronization of MySQL data into Kudu as an example to outline a complete solution for a real-time system. Of course, there is no fixed technology stack; the choice should be driven by your company's current business.

 

II: Requirements Analysis

    To sync MySQL data into Kudu in real time, the rough idea is: capture the MySQL binlog in real time, parse it, transport it through an MQ, and have a consumer write the data into Kudu as it arrives. This breaks down into the following steps:

  1. Monitor and capture the binlog in real time
  2. Parse the binlog and publish it to the MQ as a producer
  3. Consume the data as a consumer
  4. Restructure the data and write it into Kudu
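The four steps above can be sketched as a producer/consumer handoff. In the sketch below the MQ is stood in for by a `BlockingQueue` and the parsed binlog event by a plain string; both are simplifications for illustration only, since the real pipeline uses Kafka/RocketMQ and structured messages:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class PipelineSketch {
    // Stand-in for the MQ sitting between the binlog parser (producer)
    // and the Kudu writer (consumer).
    static final BlockingQueue<String> mq = new ArrayBlockingQueue<>(1024);
    // Stand-in for the Kudu table.
    static final List<String> kudu = new ArrayList<>();

    // Step 2: the parsed binlog event is published to the MQ.
    static boolean produce(String parsedBinlogEvent) {
        return mq.offer(parsedBinlogEvent);
    }

    // Steps 3 and 4: consume one event and "write" it to Kudu.
    static void consumeOne() {
        String event = mq.poll();
        if (event != null) {
            kudu.add(event); // real code would upsert/delete through the Kudu client
        }
    }

    public static void main(String[] args) {
        produce("{\"eventType\":\"INSERT\",\"dbName\":\"dmall_pos_sale\"}");
        consumeOne();
        System.out.println(kudu.size()); // 1
    }
}
```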

 

III: Project Implementation

 

    The first step is monitoring and capturing the binlog in real time. We use canal, Alibaba's open-source framework written in pure Java; for a detailed introduction, see the canal introduction blog post. In short, canal is a complete framework responsible for monitoring the binlog in real time and for collecting, parsing, filtering, and storing its events. We built a lightweight wrapper around canal (DataHub) to make it easier to use; the flow diagram is as follows:

As everyone knows, the binlog stores binary data, and building directly on that binary format is quite complex and the result hard to maintain. The advantage of canal is that it takes responsibility for parsing the binary data and produces well-structured output, roughly as shown below:

Here is a piece of mock data I captured, mainly to help you understand the format after parsing:

[INFO] 2019-02-25 14:38:40.069 [DafkaConsumerThread0-bigdata_dmall_pos_sale][LOG_KUDU_JOB]com.dmall.data.pos.PosOrderBinlogHandler:124 - msg :
{"batchId":"42|0000000002","dbName":"dmall_pos_sale","ddl":false,"ddlSql":"","eventType":"INSERT","executionTime":1550817636000,
 "logicTableName":"sale","partitionKey":"sale","realTableName":"sale","rowData":[{"afterColumns":[
  {"key":true,"name":"id","null":false,"type":"int(10) unsigned","updated":true,"value":"1"},
  {"key":false,"name":"group_no","null":false,"type":"varchar(30)","updated":true,"value":"10"},
  {"key":false,"name":"region_no","null":false,"type":"varchar(10)","updated":true,"value":"300"},
  {"key":false,"name":"org_no","null":false,"type":"varchar(10)","updated":true,"value":"2013"},
  {"key":false,"name":"pos_id","null":false,"type":"smallint(6)","updated":true,"value":"1"},
  {"key":false,"name":"sale_dt","null":false,"type":"datetime","updated":true,"value":"2019-02-22 14:24:20"},
  {"key":false,"name":"sale_id","null":false,"type":"int(11)","updated":true,"value":"4"},
  {"key":false,"name":"total_amt","null":false,"type":"decimal(12,4)","updated":true,"value":"1.3000"},
  {"key":false,"name":"total_discount","null":false,"type":"decimal(12,2)","updated":true,"value":"0.0"},
  {"key":false,"name":"mem_need_score","null":false,"type":"decimal(12,4)","updated":true,"value":"0.0"},
  {"key":false,"name":"mem_ecard_no","null":false,"type":"varchar(60)","updated":true,"value":""},
  {"key":false,"name":"mem_user_id","null":false,"type":"varchar(60)","updated":true,"value":""},
  {"key":false,"name":"mem_code","null":false,"type":"varchar(20)","updated":true,"value":""},
  {"key":false,"name":"mem_card_channel","null":false,"type":"varchar(30)","updated":true,"value":""},
  {"key":false,"name":"mem_input_code","null":false,"type":"varchar(60)","updated":true,"value":""},
  {"key":false,"name":"mem_input_type","null":false,"type":"int(11)","updated":true,"value":"0"},
  {"key":false,"name":"mem_card_level","null":false,"type":"varchar(30)","updated":true,"value":""},
  {"key":false,"name":"mem_score_flag","null":false,"type":"int(11)","updated":true,"value":"0"},
  {"key":false,"name":"eorder_id","null":false,"type":"varchar(30)","updated":true,"value":""},
  {"key":false,"name":"eorder_status","null":false,"type":"varchar(30)","updated":true,"value":"False"},
  {"key":false,"name":"business_id","null":false,"type":"varchar(30)","updated":true,"value":"1"},
  {"key":false,"name":"coupon_temple_no","null":false,"type":"varchar(128)","updated":true,"value":""},
  {"key":false,"name":"merch_input_dur","null":false,"type":"int(11)","updated":true,"value":"436"},
  {"key":false,"name":"trans_totl_dur","null":false,"type":"int(11)","updated":true,"value":"442"},
  {"key":false,"name":"cashier_type","null":false,"type":"int(11)","updated":true,"value":"0"},
  {"key":false,"name":"cashier_id","null":false,"type":"int(11)","updated":true,"value":"0"},
  {"key":false,"name":"cashier_no","null":false,"type":"varchar(30)","updated":true,"value":"00000000"},
  {"key":false,"name":"qr_code","null":false,"type":"varchar(128)","updated":true,"value":""},
  {"key":false,"name":"upload_flag","null":false,"type":"int(11)","updated":true,"value":"0"},
  {"key":false,"name":"upload_msg","null":false,"type":"varchar(60)","updated":true,"value":""}
 ],"beforeColumns":[]}]}
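Before deserializing a whole message, it can be handy to peek at a few top-level fields of a log line like the one above. The sketch below does this with a regular expression; it is only an illustration (class and method names are made up) — the project itself deserializes the full message into an EventBatchModel with fastjson, as the case code in part four shows:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BinlogMsgPeek {
    // Extracts the string value of a top-level field such as "dbName" or "eventType".
    static String field(String msg, String name) {
        Matcher m = Pattern.compile("\"" + name + "\":\"([^\"]*)\"").matcher(msg);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String msg = "{\"batchId\":\"42|0000000002\",\"dbName\":\"dmall_pos_sale\","
                   + "\"ddl\":false,\"eventType\":\"INSERT\",\"logicTableName\":\"sale\"}";
        System.out.println(field(msg, "dbName"));    // dmall_pos_sale
        System.out.println(field(msg, "eventType")); // INSERT
    }
}
```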

With real-time binlog capture and parsing in place, the next step is hooking up an MQ for real-time consumption. There are many MQs on the market; our company uses Kafka and Alibaba's open-source RocketMQ. The rough architecture is as follows:

Some readers may wonder why we hooked up two MQs (AWS has since been abandoned): Kafka and RocketMQ suit different scenarios.

1. Kafka characteristics:

1) Kafka's high throughput rests on batched sends, and batching carries a risk of losing data.
2) A partition has multiple replicas; the synchronous replication model requires that a write returns only after every replica has succeeded.
3) Kafka prioritizes throughput and does little to optimize latency. Internally it handles replicated writes with heavy asynchronous processing, so latency is unstable, especially with multiple replicas.
4) A topic is written partition by partition; many partitions on one broker turn sequential writes into random writes, sacrificing write performance.

Given how our business teams use the messaging system and Kafka's characteristics, insisting on Kafka to deliver the low latency they require is unrealistic, and newer Kafka releases show no obvious latency improvements. We therefore considered introducing another MQ as the core component of DMG.

 

2. Why RocketMQ

All data is stored in a single commit log: fully sequential writes, random reads.

Master-slave synchronization: synchronous write, asynchronous disk flush.

Disk access is serialized, avoiding disk contention; adding more queues does not drive up IOWAIT.

Implemented in Java, which makes secondary development easy.

 

3. RocketMQ feature highlights

1) Compared with a Kafka cluster, the NameServer replaces ZooKeeper as the service-discovery component.

2) Offsets live on the broker; committing the offset of a successfully consumed message is done asynchronously by a schedule thread.

3) A topic has write queues and read queues (akin to Kafka partitions): write queues spread the write load, read queues widen read concurrency. Read queues draw their data from write queues; the two counts are usually equal, but they need not be.

4) All messages of every topic written to one broker go sequentially into a single file, which is then indexed by topic and queue ID: writes stay sequential, at the cost of more expensive reads.

5) Messages that fail on the consumer side are sent back to the broker and written to an automatically created topic named %RETRY%{consume_group_name} for retry. If a message is retried more than 5 times, it is sent back to the broker again and written to an automatically created dead-letter topic named %DLQ%{consume_group_name}; dead-letter messages can no longer be consumed, but they can be searched and resent.

6) One of RocketMQ's highlights: every message can be queried.

7) Broker groups scale horizontally; adding a broker group raises throughput and lowers write latency at the same time.
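Point 5 implies a simple naming convention for these auxiliary topics: RocketMQ derives them from the consumer group name by prefixing. The helper below just illustrates that convention (the class and method names are made up):

```java
public class RocketMqTopics {
    // RocketMQ derives the per-consumer-group retry topic by prefixing the group name.
    static String retryTopic(String consumerGroup) {
        return "%RETRY%" + consumerGroup;
    }

    // Likewise for the dead-letter topic that holds messages exhausting their retries.
    static String dlqTopic(String consumerGroup) {
        return "%DLQ%" + consumerGroup;
    }

    public static void main(String[] args) {
        System.out.println(retryTopic("kudu_sync_group")); // %RETRY%kudu_sync_group
        System.out.println(dlqTopic("kudu_sync_group"));   // %DLQ%kudu_sync_group
    }
}
```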

That is roughly the entire project flow. If you need more detail on any of the related components or frameworks, please look them up yourself.

 

IV: Case Implementation

    With all this talk of architecture and frameworks, it is time to implement the requirement: act as a consumer, read the data from Kafka, and write it into Kudu in real time. The data here is what DataHub produces after parsing, with the structure given above. The concrete case code follows:

package com.dmall.data.mysql2kudu;

import com.alibaba.fastjson.JSONObject;
import com.dmall.data.kudu.KuduAgentClient;
import com.dmall.data.kudu.KuduColumn;
import com.dmall.data.kudu.KuduRow;
import com.dmall.data.mq.core.BaseDataHub;
import com.dmall.datahub.clientmodel.prototype.ColumnModel;
import com.dmall.datahub.clientmodel.prototype.EventBatchModel;
import com.dmall.datahub.clientmodel.prototype.RowDataModel;
import org.apache.kudu.Type;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.InitializingBean;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Properties;

public class Mysql2KuduHandler extends BaseDataHub implements InitializingBean {
    private static String MYSQL_DB_NAME = null;
    private static String MYSQL_TABLE_NAME = null;
    private static final Properties prop = new Properties();
    static BufferedReader br = null;
    static KuduAgentClient agent = null;
    static Logger log = null;

    static {
        log = LoggerFactory.getLogger("LOG_KUDU_JOB_TEST");
        String masterHost = "idc-10-248-3-71.ddw.dmall.com:7051,idc-10-248-3-72.ddw.dmall.com:7051,idc-10-248-3-73.ddw.dmall.com:7051";
        agent = new KuduAgentClient(masterHost);
        try {
            br = new BufferedReader(new FileReader(new File(".").getAbsolutePath().replace(".", "") + "table.properties"));
            prop.load(br);
            Iterator<String> it = prop.stringPropertyNames().iterator();
            while (it.hasNext()) {
                String key = it.next();
                String value = prop.getProperty(key);
                if (key.equals("MYSQL_DB_NAME")) {
                    MYSQL_DB_NAME = value;
                } else if (key.equals("MYSQL_TABLE_NAME")) {
                    MYSQL_TABLE_NAME = value;
                }
                log.info("the key is:{} and value is:{}", key, value);
            }
            br.close();
        } catch (Exception e) {
            log.error("error while loading table.properties in static initializer", e);
        }
    }

    @Override
    protected void dataHub(EventBatchModel eventBatchModel) throws Exception {
        log.info("msg : {}", JSONObject.toJSONString(eventBatchModel));
//        Thread.sleep(1000);
        if (eventBatchModel.getDbName().equals(MYSQL_DB_NAME)) {
            if (eventBatchModel.getLogicTableName().equals(MYSQL_TABLE_NAME)) {
                RowDataModel[] rowDatas = eventBatchModel.getRowData();
                log.info("need rowDatas : {}", JSONObject.toJSONString(rowDatas));
                for (RowDataModel rowModel : rowDatas) {
                    // process one row of change data coming from MySQL
                    ColumnModel[] rowAft = rowModel.getAfterColumns();
                    List<KuduColumn> row = new ArrayList<>();
                    for (ColumnModel column : rowAft) {
                        Type column_kudu_type = null;
                        Object kudu_colValue = null;
                        String colName = column.getName();
                        String colValue = column.getValue();
                        String colType = column.getType();
                        Boolean isKey = column.isKey();
                        // normalize colType to a base type name such as INT, VARCHAR, DOUBLE, FLOAT
                        if (colType.startsWith("int")) {
                            colType = "INT";
                        } else if (colType.startsWith("tinyint")) {
                            colType = "TINYINT";
                        } else if (colType.startsWith("smallint")) {
                            colType = "SMALLINT";
                        } else if (colType.startsWith("bigint")) {
                            colType = "BIGINT";
                        } else if (colType.startsWith("varchar")) {
                            colType = "VARCHAR";
                        } else if (colType.startsWith("double")) {
                            colType = "DOUBLE";
                        } else if (colType.startsWith("float")) {
                            colType = "FLOAT";
                        } else if (colType.startsWith("decimal")) {
                            colType = "DECIMAL";
                        } else if (colType.startsWith("datetime")) {
                            colType = "DATETIME";
                        }
                        switch (colType) {
                            case "INT":
                                column_kudu_type = Type.INT32;
                                kudu_colValue = Integer.parseInt(colValue);
                                break;
                            case "TINYINT":
                                // MySQL TINYINT maps to Kudu INT8
                                column_kudu_type = Type.INT8;
                                kudu_colValue = Integer.parseInt(colValue);
                                break;
                            case "SMALLINT":
                                // MySQL SMALLINT maps to Kudu INT16
                                column_kudu_type = Type.INT16;
                                kudu_colValue = Integer.parseInt(colValue);
                                break;
                            case "BIGINT":
                                column_kudu_type = Type.INT64;
                                kudu_colValue = Long.parseLong(colValue);
                                break;
                            case "VARCHAR":
                                column_kudu_type = Type.STRING;
                                kudu_colValue = colValue;
                                break;
                            case "FLOAT":
                                column_kudu_type = Type.FLOAT;
                                kudu_colValue = Float.parseFloat(colValue);
                                break;
                            case "DOUBLE":
                                column_kudu_type = Type.DOUBLE;
                                kudu_colValue = Double.parseDouble(colValue);
                                break;
                            case "DECIMAL":
                                // MySQL DECIMAL is always mapped to Kudu DOUBLE here
                                column_kudu_type = Type.DOUBLE;
                                kudu_colValue = Double.parseDouble(colValue);
                                break;
                            case "DATETIME":
                                // MySQL DATETIME is kept as a Kudu STRING here
                                column_kudu_type = Type.STRING;
                                kudu_colValue = String.valueOf(colValue);
                                break;
                            default:
                                break;
                        }
//                        System.out.println("colName is:"+colName+" and colValue is:"+kudu_colValue+" and column_kudu_type is:"+column_kudu_type+" and isKey is:"+isKey);
//                        write the key and value into the Kudu table
                        log.info("colName is:{} and colValue is:{} and column_kudu_type is:{} and isKey:{}", colName, kudu_colValue, column_kudu_type, isKey);
                        KuduColumn c01 = new KuduColumn();
                        if (isKey) {
                            c01.setColumnName(colName).setColumnValue(kudu_colValue).setColumnType(column_kudu_type).setPrimaryKey(true).setUpdate(true).setNullEnble(false);
                        } else {
                            c01.setColumnName(colName).setColumnValue(kudu_colValue).setColumnType(column_kudu_type).setPrimaryKey(false).setUpdate(true);
                        }
                        row.add(c01);
                    }
                    KuduRow myrows01 = new KuduRow();
                    myrows01.setRows(row);
                    log.info("row before write: {}", myrows01);
                    if(eventBatchModel.getEventType().equals("INSERT")||eventBatchModel.getEventType().equals("UPDATE")){
                        agent.upsert("impala::"+MYSQL_DB_NAME+"."+MYSQL_TABLE_NAME, agent.getKdClient(), myrows01);
                    }else if(eventBatchModel.getEventType().equals("DELETE")){
                        agent.delete("impala::"+MYSQL_DB_NAME+"."+MYSQL_TABLE_NAME, agent.getKdClient(), myrows01);
                    }
                }
            }
        }
    }
    @Override
    public void afterPropertiesSet() throws Exception {

    }
}

The code is only meant to convey an approach; it is not directly usable as-is.
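For reference, the type-mapping switch in the handler can be isolated into a small helper that is easy to unit-test. The sketch below (class and method names are made up) maps a raw MySQL column type string to the Kudu type name the handler uses, with the same simplifications: DECIMAL flattened to DOUBLE, DATETIME kept as a STRING:

```java
public class MysqlToKuduType {
    // Maps a raw MySQL column type, e.g. "int(10) unsigned" or "varchar(30)",
    // to the name of the Kudu type the handler writes with.
    static String kuduType(String mysqlType) {
        String t = mysqlType.toLowerCase();
        if (t.startsWith("tinyint"))  return "INT8";
        if (t.startsWith("smallint")) return "INT16";
        if (t.startsWith("bigint"))   return "INT64";
        if (t.startsWith("int"))      return "INT32";
        if (t.startsWith("varchar"))  return "STRING";
        if (t.startsWith("float"))    return "FLOAT";
        if (t.startsWith("double"))   return "DOUBLE";
        if (t.startsWith("decimal"))  return "DOUBLE"; // DECIMAL flattened to DOUBLE, as in the handler
        if (t.startsWith("datetime")) return "STRING"; // DATETIME kept as a STRING, as in the handler
        return null; // unmapped types are skipped by the handler's switch
    }

    public static void main(String[] args) {
        System.out.println(kuduType("int(10) unsigned")); // INT32
        System.out.println(kuduType("decimal(12,4)"));    // DOUBLE
    }
}
```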

That's all for this post. This is original work; please credit the source when republishing!

Thanks!
