Real-Time Data Warehouse Transformation: Hudi on Flink in Practice at SF Express

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article describes SF Express's practical experience with making data warehouse data real-time, database CDC, and Hudi on Flink, along with lessons from turning that work into a product. It is organized into the following parts:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"SF Express business overview"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Hudi on Flink"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Productization support"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Future plans"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1 SF Express Business"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.1 Big Data Applications at SF Express"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Let's start with a panoramic view of SF Express's big data business."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/03\/0396c6ef42d17ef45ce6f9ee4a32c282.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Big data platform: the foundation in the middle is the big data platform, which SF Express built in-house on top of open source components. Closely related are big data analytics and artificial intelligence. SF Express has a very strong ground force, namely its offline couriers and transport vehicles, and it relies on AI and big data analytics to help manage them and improve overall efficiency."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Blockchain: SF Express connects with many customers and merchants. Merchants first need assurance that shipments are trustworthy before goods can be traded and exchanged. This mostly involves brand merchants, and SF Express also offers traceability and evidence-preservation services."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"IoT: as mentioned above, because SF Express has a large ground force, there is correspondingly a lot of data to collect. Some of our parcels carry sensors, and vehicles have sensors too, such as on-board cameras. Couriers wear wristbands that report location and the employee's health status, which lets us take appropriate care measures. In addition, some work sites have both forklifts and sorting equipment, which need the big data platform to coordinate them, so IoT is used relatively widely."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Smart supply chain and smart logistics: these two areas are mostly about using big data to support business decisions. For example, we have many business (B-side) customers, and questions such as how much stock to hold in each warehouse and how to coordinate and transfer inventory between warehouses are handled by smart logistics."}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below shows part of our IoT practice:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/49\/49d334b9a87992efb3b91cb18417019c.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the figure shows, logistics involves many stages: order placement, courier pickup, sorting, ground transport and transfer, and so on. The annotations in red mark the applications where we combine IoT with big data, and most of them are built on Flink."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.2 SF Express Big Data Technology Matrix"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below gives an overview of SF Express's current overall big data architecture:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/f5\/f5e1adea99c092d854057a5737a78f40.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Data integration layer: at the bottom is the data integration layer. For historical reasons, SF Express runs many data storage engines, such as Oracle, MySQL, and MongoDB, and some of them will continue to be supported. The IoT devices at the lower right are relatively new and mainly collect data including plain text, network databases, images, audio, and video."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Data storage and computing: for real-time processing, Flink is currently the most widely used engine at SF Express. Storm is not shown in the figure because we are migrating away from it. For message middleware we mainly use Kafka. The storage systems on the right are more varied, because different scenarios call for different approaches; for example, data analysis needs a high-performance engine such as ClickHouse. The data warehouse and offline computing are still fairly traditional, based mainly on Hive combined with Spark, and we are currently combining Flink and Hudi to make offline workloads real-time."}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Data products: our first priority is to lower the barrier to entry so that internal developers and users can get started more easily. Mastering this many components is very costly for our colleagues, and a lack of standardization adds high communication, maintenance, and operations costs, so productization and standardization are a must."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.3 Data Collection at SF Technology"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/6f\/6fd29b0cbe1f75e2c1846f623627214e.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure above gives an overview of our overall big data collection. Data collection currently covers microservice applications: some data is sent directly to Kafka, while other data is written to logs. We built our own log collection tool for those logs; it is similar to Flume but more lightweight, and it provides no data loss, no duplication, remote updates, and rate limiting. We also use Flink to write data from Kafka to HDFS in Hudi format. This is covered in detail below."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.4 SF Express Data Application Architecture"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/wechat\/images\/9b\/9b382f2048fd362f15dfd8122725bbd2.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}