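The Real-Time Data Warehouse Journey of Cross-Border Payment Platform XTransfer: Only Deep Participation in Open Source Keeps You from Being Left Behind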

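{"type":"doc","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Over the past two years the COVID-19 pandemic has hit almost every industry hard, but for cross-border e-commerce it has brought more opportunity than risk, and the cross-border payment sector has drawn growing attention as a result. The pandemic pushed a large volume of B2B foreign trade online. Compared with B2C, B2B cross-border payment scenarios are more complex, because the underlying business is itself more complex, with long transaction cycles and many participating parties."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At the same time, payment businesses inherently demand timely and accurate risk identification, so data collection, processing, and computation have to be built into every business node, and data assets have to be maintained proactively. "},{"type":"text","marks":[{"type":"strong"}],"text":"Real-time data warehouses and real-time model engines are currently the best-suited infrastructure for these problems."},{"type":"text","text":" InfoQ recently spoke with XTransfer (上海奪暢網絡技術有限公司), a platform that provides B2B cross-border payment and risk control services to micro, small, and medium-sized foreign trade enterprises, and learned through an interview how it built its own real-time data warehouse, the thinking behind that work, and how it participates deeply in open source."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Interviewees: Liu Yanfang (劉豔芳), co-founder and CTO of XTransfer; Kang Wei (康偉), technical expert at XTransfer"}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Three stages of technical development"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Founded in 2017, XTransfer focuses on B2B cross-border payments. It provides foreign-trade collection services to micro, small, and medium-sized enterprises engaged in cross-border B2B e-commerce exports, along with risk control services that address trade risk."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Since its founding, XTransfer's technical development can be divided roughly into three stages:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Stage 1 (July 2017 to October 2018): In the startup phase the priority was to build the foundational platform and get the business flow working end to end, making sure the platform could support business growth. During this period "},{"type":"text","marks":[{"type":"strong"}],"text":"the company completed the build-out of its entire foundational platform"},{"type":"text","text":"."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Stage 2 (June 2018 to July 2021): Improving efficiency with big data and related technologies. Internally, for example, algorithmic models were used to speed up risk review and reduce the share of work orders reviewed manually. Machine learning techniques such as OCR (optical character recognition) were also used to help customers with some data processing. The company began launching new products in this stage."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Stage 3 (July 2021 to present): Developing the platform in depth, moving from a single-point financial services platform toward a more intelligent, more digital one. For example, this year the company launched a foreign trade CRM and distilled a new product model to help micro, small, and medium-sized foreign trade enterprises go digital. The focus of this stage is on delivering the capabilities the company has built up internally to those enterprises."}]}]}
]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"XTransfer's technology platform and many of its services are built on cloud infrastructure. It currently relies mainly on Alibaba Cloud, but also uses other cloud services, including Huawei Cloud, Tencent Cloud, and Baidu Cloud. “Once a company reaches a certain stage, it will certainly adopt a multi-cloud approach, whether for cost or for stability. At this stage our main consideration is not cost but building better products and a better business,” said Liu Yanfang."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The R&D organization is currently split into roughly two parts: business function teams covering front end, back end, testing, and operations, and a big data and algorithm team. For both the business platform and the data platform, development starts with selecting open source frameworks and then relies mainly on in-house work. Technology selection takes stability as the starting point, while also ensuring the platform's security and data accuracy."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Building the real-time data warehouse"}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"A self-built solution is more flexible"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Compared with ToC scenarios, ToB businesses handle far less data. Liu Yanfang pointed out that most popular solutions on the market handle large-scale data computation and processing well, but fall short of financial-grade requirements for security, stability, and accuracy."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Cross-border payment combined with B2B payment and settlement involves a very long business chain. From inquiry to final deal it covers logistics terms and payment terms, and risk has to be controlled at every node. Regulation of cross-border fund transactions is also tightening: when foreign trade enterprises collect payments and move funds, the process is subject to strict anti-money-laundering risk management by financial institutions and regulators. All of these factors raise the bar for the security and accuracy of XTransfer's data processing."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/3c\/3c8f2a22b41a5256b0fb64367d955785.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":10}}],"text":"Architecture of the big data risk control infrastructure built by XTransfer"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"According to the company, XTransfer has deployed three data centers worldwide and built its own real-time data warehouse, so that data can be collected, processed, and computed effectively across the entire cross-border B2B business chain while meeting requirements for high security, low latency, and high accuracy."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Startups (those with fewer than one million users) typically want a real-time data warehouse that is cheap to deploy and operate, easy to develop on, flexible and simple in architecture, and usable out of the box, so that business needs can be met with little investment of people and time."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although many vendors are promoting real-time data warehouse solutions, Liu Yanfang believes that most of them target 2C marketing scenarios and cannot fully support the complex scenarios of 2B cross-border payment or its customization needs. A self-built solution is more flexible and can integrate more frameworks and technologies to fit specific business scenarios."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"XTransfer therefore built on its own data warehouse and, according to its OLAP requirements, supports flexible connections to multiple OLAP databases, such as ClickHouse and Doris, to meet different needs."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Design approach"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"XTransfer's approach to building the real-time data warehouse is to start from open source projects and add secondary development on top. The data platform currently uses a Lambda architecture, running stream processing and batch processing side by side, and is evolving toward a unified stream-batch data warehouse:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the early days of the business, the team used a batch architecture, and the data warehouse ran at T+1 latency (analysis results for data generated today are available only tomorrow). As the business grew, more frequent job scheduling improved batch latency to the hourly level."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the business entered a phase of rapid growth and real-time requirements kept rising, the team adopted a stream processing architecture, bringing data processing latency down to seconds."}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To synchronize large volumes of data into the offline data warehouse, the team uses CDC (Change Data Capture) plus Merge to load data into the offline warehouse's ODS layer. The overall flow is: take a one-time snapshot to load the existing data into ODS, then each day merge the existing data with that day's incremental changes to reconstruct the current state."}]}]}
]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Technology selection"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For dimensional modeling, XTransfer chose a star schema and uses a layered design for the real-time data warehouse. The layered architecture is shown in the figure below:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/14\/14f2f39ecb67b68b3efac4b66bf44ac5.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Technology selection usually brings out two camps: one wants to build from scratch, the other wants to adopt a mature solution directly. XTransfer was no exception. Its answer was to bring the teams together for in-depth discussion and research, analyze the critical path, and decide which parts to build in-house, which to adopt as-is, and which to adopt and then extend and refine."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Specifically, XTransfer chose the open source stream processing framework Flink as the compute engine for the real-time data warehouse, because it offers high throughput, low latency, and high performance, and is mature with an active community. Risk control needs to capture the full set of data, and CDC (Change Data Capture, which captures insert, update, and delete operations on database tables) was the approach preferred internally. The team therefore adopted Flink CDC Connectors, a set of source connectors for Flink and the core component of Flink CDC. These connectors read existing historical data and incremental change data from databases such as MySQL, PostgreSQL, Oracle, and MongoDB."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For real-time data warehouse storage, the team chose Kafka and uses Kafka compacted topics to store data for the ODS, DWD, and DWS layers. The upsert-kafka connector reads data from and writes data to Kafka topics in upsert fashion. As a source, the upsert-kafka connector produces a changelog stream in which each record represents an update or delete event. As a sink, it can consume a changelog stream: it writes INSERT\/UPDATE data as normal Kafka messages and writes DELETE data as Kafka messages with a null value, indicating that the message for the corresponding key has been deleted."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For the OLAP engine, the team weighed XTransfer's R&D resources, business requirements, and usage scenarios, and chose Apache Doris for the following reasons:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It follows the ROLAP model, which keeps models simple and highly reusable, improves development efficiency, and reduces redundancy and storage;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It supports both offline batch import and real-time import, with transactional and idempotent imports;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It uses partitioning and bucketing and supports multiple indexing techniques, providing PB-scale storage and analysis;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It uses columnar storage and compression to improve query performance;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is compatible with the MySQL protocol, making it simple and easy to use;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is easier to operate, with a built-in distributed protocol, online dynamic scaling of the cluster, and automatic recovery of failed nodes;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is a one-stop analytics solution that works out of the box with only a small R&D investment."}]}]}
]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Deep participation in open source"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Benefiting from open source and giving back to the community wherever it can is a principle the XTransfer technical team follows. The Flink CDC Connectors mentioned above is itself an independent open source project. The Flink community recently released Flink CDC 2.1, which focuses on the performance and production stability of the MySQL CDC connector and adds an Oracle CDC connector and a MongoDB CDC connector. The MongoDB CDC connector was contributed by XTransfer technical expert Sun Jiabao (孫家寶)."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The MongoDB CDC connector can read both full historical data and incremental change data from a MongoDB database. With Flink's integration capabilities, users can easily synchronize MongoDB data in real time to any downstream storage that Flink supports."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/60\/60077121612799ea908412b057a80bfc.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Throughout the data capture process, users do not need to learn MongoDB's replication mechanism and internals, which greatly simplifies the workflow and lowers the barrier to entry. MongoDB CDC also supports two startup modes: the default initial mode first synchronizes the existing data in a table and then its incremental changes, while latest-offset mode synchronizes only incremental changes starting from the current point in time."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MongoDB CDC also provides a rich set of configuration and tuning parameters, which in production can greatly improve the performance and stability of the real-time pipeline."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kang Wei said that XTransfer's big data team is also following work on Flink Connector, the Flink Table API, and Flink + Iceberg. Going forward, XTransfer will keep optimizing MongoDB CDC, for example by increasing parallelism and improving support for synchronizing large tables. In discussions with other developers in the community, the team also found strong demand for a MongoDB Sink Connector, and is working on that as well."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Issues the XTransfer team has reported to the community and resolved so far:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"blockquote","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"FLINK-6573 Flink MongoDB Connector"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"FLINK-21172 “canal-json format include es field”"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"FLINK-21949 Support collect to array aggregate function"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DBZ-3966 JsonTableChangeSerializer support serialization for defaultValue"}]}
]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Issues the team is currently following:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"blockquote","content":[
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"FLINK-22793 HybridSource Table Implementation"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ICEBERG-1639 Flink: write the CDC records into apache iceberg tables."}]}
]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Closing thoughts"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the conversation, InfoQ learned that XTransfer's decision to build its own real-time data warehouse is closely tied to the company's philosophy of “long-term patience”: “We have to build long-term solutions and avoid repeatedly tearing things down and starting over.” Liu Yanfang elaborated: “Our current solution addresses the specific problems we face, and we fully understand the core technology inside it. It is also in line with where the community is heading. We benefit from the community's development and can contribute back to it, and that is what keeps us from being left behind.”"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As for how to measure “not being left behind,” the yardstick is community contribution. Liu Yanfang asks two things of the team: first, keep up with changes in the community; second, “be able to contribute to the community.” Being able to contribute means being accepted by the community and being on a track that develops in step with it."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Overall, XTransfer is trying to participate deeply in the community in a more pragmatic way and to keep pace with it. “Right now we may not have that much capacity. Unlike a big company, we cannot contribute an entire module in one go, but we are able to make differentiated contributions. We will keep exploring and create more possibilities.”"}]}
]}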
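To make the CDC + Merge flow and the upsert-kafka delete semantics described in the article more concrete, here is a minimal Python sketch. It is not XTransfer's or Flink's implementation: the function name `merge_changelog`, the row layout, and the sample keys are hypothetical, chosen only for illustration. It assumes changes arrive in order as (primary key, row) pairs, with a `None` row standing in for a null-value Kafka message (a delete).

```python
from typing import Dict, Iterable, Optional, Tuple

Row = Dict[str, object]
# One change event: (primary key, new row), or (primary key, None) for a delete.
Change = Tuple[str, Optional[Row]]


def merge_changelog(snapshot: Dict[str, Row], changes: Iterable[Change]) -> Dict[str, Row]:
    """Apply an ordered changelog to a snapshot and return the merged table."""
    merged = dict(snapshot)  # copy so the input snapshot is left untouched
    for key, row in changes:
        if row is None:
            merged.pop(key, None)  # null value: drop the key if it exists
        else:
            merged[key] = row  # insert or update: the latest value per key wins
    return merged


# Hypothetical data: an existing snapshot plus one day's incremental changes.
snapshot = {"o1": {"amount": 100}, "o2": {"amount": 250}}
changes = [
    ("o1", {"amount": 120}),  # update an existing row
    ("o3", {"amount": 75}),   # insert a new row
    ("o2", None),             # delete a row
]
result = merge_changelog(snapshot, changes)
```

Keeping only the latest value per key is also the idea behind storing a changelog in a compacted Kafka topic: the current state can be rebuilt from the surviving records.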