Building the Youdao Premium Courses Data Middle Platform on Apache Doris | Open Source Case Library

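{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article describes how the architecture of the Youdao Premium Courses data middle platform evolved, and how Doris, an MPP analytical database, has supported a steadily growing business volume and helped the business put its data to use. It takes "},{"type":"text","marks":[{"type":"strong"}],"text":"our experience selecting a real-time data warehouse"},{"type":"text","text":" as its starting point, then focuses on the problems we ran into while using Doris and the adjustments and optimizations the Doris engineering team made in response."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"1. Background"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1.1 Business Scenarios"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Driven by business requirements, the data layer of Youdao Premium Courses is currently split into two parts: offline and real-time."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The offline system mainly handles event-tracking data and computes it on a schedule in batch jobs."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Real-time streaming data comes mainly from the data streams that the business systems produce in real time and from database change logs. Processing it has to account for data accuracy, timeliness, and ordering over time, which makes the processing very complex."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Building on its real-time computing capability, the Youdao Premium Courses data middle platform team is mainly responsible for real-time data processing within the overall data architecture, and it also provides a real-time data synchronization service for the downstream offline data warehouse."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The main user roles served by the data middle platform, and their data needs, are as follows:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/uploader.shimo.im\/f\/94kKZtOp8zg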
9O3Li.png!thumbnail","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Operations staff, strategy staff, and team leads mainly look at the overall state of students, querying real-time aggregates by course from the data middle platform"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Tutors and sales staff mainly care about the various real-time detail records of the students they serve"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Quality control mainly reviews overall data across the course, teacher, and tutor dimensions through T+1 offline reports"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data analysts run interactive analysis on the data that the data middle platform syncs to the offline data warehouse on a T+1 basis"}]}]}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1.2 
Early System Architecture and Pain Points of the Data Middle Platform"}]},{"type":"image","attrs":{"src":"https:\/\/uploader.shimo.im\/f\/UoOWIWWvQNhqRRXy.png!thumbnail","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the figure above shows, real-time data storage in the 1.0 architecture of the data middle platform relied mainly on Elasticsearch, and we ran into the following problems:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Aggregation queries were inefficient"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"The data compression ratio was low"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Joins across multiple indexes were not supported, so in our business design we had to build many large wide tables to work around it"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Standard SQL was not supported, which raised the cost of querying"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"2. Real-Time Data Warehouse Selection"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With these pain points in mind, we began evaluating real-time data warehouses. At the time we evaluated Doris, ClickHouse, TiDB+TiFlash, Druid, and Kylin."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"
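| OLAP engine | Advantages | Disadvantages |
| --- | --- | --- |
| Doris | 1. Compatible with the MySQL protocol; 2. Supports Online Schema Change; 3. Supports updates; 4. Automated cluster scale-out and scale-in; 5. Supports time-based partitioning and separation of hot and cold data | 1. Open-sourced relatively late and currently still in incubation |
| ClickHouse | 1. Strong single-node performance; 2. Vectorized engine; 3. High data compression ratio | 1. Does not support standard SQL; 2. No automatic Rebalance when the cluster is scaled out or in; 3. Weak support for updates; 4. Relatively high operations cost |
| TiDB+TiFlash | 1. Compatible with the MySQL protocol; 2. Vectorized engine; 3. Easy synchronization between business data and analytical data (internal Raft replication) | 1. TiFlash is not open source; 2. Few companies have adopted it in production; 3. Architecture mainly targets TP workloads |
| Druid | 1. Time-based partitioning, with fast queries on aggregated data; 2. Supports separation of hot and cold data | 1. Does not support storing detail data; 2. Does not support standard SQL |
| Kylin | 1. Supports standard SQL queries; 2. Supports pre-aggregation; 3. Well-developed community | 1. Many dependencies; 2. Weak support for detail queries; 3. High resource consumption |"}}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}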