Real-Time Multidimensional Analysis with Apache Kylin at Autohome

Recently, the Apache Kylin 5th anniversary online celebration came to a successful close. Di Xingxing, head of the real-time computing platform at Autohome, presented Autohome's Kylin upgrade history and its practice in real-time multidimensional analysis, and closed with what the team looks forward to in Kylin 4.0.

![](https://static001.infoq.cn/resource/image/49/1f/49879e7f1630d05e8d182bcd517dba1f.png)

[**Full conference video**](https://www.bilibili.com/video/BV1EV411t7eP?from=search&seid=7421771643899774600)

The following is a transcript of Di Xingxing's talk.

## The Evolution of Apache Kylin at Autohome

With data volumes and end users growing steadily in recent years, in 2016 we laid out our requirements for selecting a multidimensional analysis engine:

- Deliver second-level or sub-second query response over hundreds of billions, or even trillions, of records.
- Sustain relatively high query throughput and high availability. At the time, Autohome's multidimensional analysis was based mainly on offline data.

### Kylin's Upgrade History at Autohome

![](https://static001.infoq.cn/resource/image/a2/8c/a25ff8d7d89e1227637ce6fff3af618c.png)

Major Kylin milestones over the past five years

> **Early 2016: Autohome officially adopted Kylin 1.5**
>
> At this point there were 10+ Cubes, mainly supporting a small amount of data analysis within the department and serving as a technical validation.
>
> **2017: Kylin 1.6 went live internally and gradually took on production workloads**
>
> The total reached 100+ Cubes, and at the end of 2017 Kylin backed Autohome's strategic commercial data products.
>
> **2018: Upgraded to Kylin 2.2, with the Spark engine for builds and a T+1 backup of the HBase cluster**
>
> In the extreme case of an HBase cluster failure, user queries can be routed to the standby cluster to keep the business running.
>
> **2019: Upgraded to Kylin 2.6 and rolled out across multiple BUs**
>
> The Cube count reached 400+. The company also developed an in-house BI product that can use Kylin as its query engine.
>
> **2020: Upgraded Kylin to 3.1, mainly for real-time multidimensional analysis**
>
> ![](https://static001.infoq.cn/resource/image/yy/f6/yy5ef78005520f15df3c35709b3238f6.png)

**The cluster currently holds roughly 500+ Cubes, 20,000+ Segments, and about 150,000 HBase Regions, with around 300 TB of storage. It serves more than 200,000 queries per day, and 95% of queries return within 2 seconds.**

## Kylin for Real-Time Multidimensional Analysis

**1) Kylin Real-time OLAP architecture**

Kylin has offered real-time analysis since version 3.0, mainly by introducing two roles: the Coordinator and the Receiver. The Coordinator handles coordination: assignment, start, stop, and similar tasks. The Receiver is the node that does the actual computation. Kylin also introduces the Replica Set abstraction: a group of Receivers, two in this example, is treated as one Replica Set. The Receivers in a Replica Set do the same work: they compute over the same data and serve the same queries. When one Receiver goes down, queries are unaffected, which amounts to high availability at the Receiver level.

![](https://static001.infoq.cn/resource/image/75/0e/75596259cd4e8b7d0ed7eace2e85f90e.png)

**2) How does Kylin support unified batch and stream processing?**

**Next, a word on unified batch and stream processing, a concept that is quite popular at the moment.** Kylin supports a Lambda mode for real-time Cubes: we model once in the web UI and define the Cube as Lambda mode, and its real-time data is then consumed from Kafka to serve real-time queries. **Once the offline data is ready, building the day's offline data overwrites the real-time results. In other words, Kylin can automatically correct the results of real-time computation, which neatly resolves the mismatch between offline and real-time development stacks.** For example, using Flink for real-time computation and Hive for offline computation means two development stacks, possibly with two different developers. With Kylin, a single developer can do all of it: ordinary modeling in the Kylin UI is enough to get unified batch and stream multidimensional analysis. **This reduces wasted effort and also guarantees that metric definitions stay fully consistent.**

**3) Challenges Autohome has faced with real-time workloads**

Here are three problems Autohome currently faces on the real-time side.

- First, Kylin's Receiver nodes differ from the earlier Job and Query nodes: a Receiver handles both real-time computation and queries, and it queries data on local disk, so its load is heavy. As real-time workloads keep growing, we may need to maintain a fairly large Receiver cluster, and the maintenance cost becomes a major problem.
- Second, a Receiver does not serve just one Cube; a single Receiver process may handle computation for several Cubes. If one Cube's data volume suddenly surges, or one Cube's data has problems, the stability of the whole Receiver process can suffer, so isolation is not very good either.
- Finally, elastic scaling: when the data volume of a single topic surges, how do we scale out quickly, and once traffic drops, can we scale back in quickly?

  Previously all of this had to be adjusted by hand, which raised the maintenance cost accordingly.

**4) The Receiver on Kubernetes solution**

**With Kylin currently moving toward cloud native, integrating with K8s was a natural way to address these problems.**

We introduced a Kubernetes Resource Manager role internally. It interacts with the K8s cluster to request the required resources dynamically.

The diagram below shows the overall architecture of the K8s integration we are building. Our changes are concentrated in the first few steps; the later steps, including queries, are essentially unchanged.

![](https://static001.infoq.cn/resource/image/00/d9/007ec4a72557f177b9a4db7byyf857d9.png)

Inside the Coordinator we introduced a component, the Kubernetes Resource Manager. After the user finishes modeling, at Enable Cube time, the configuration determines whether the Cube needs its own dedicated Receiver cluster (Streaming Cluster). If it does, the Coordinator takes the configured parallelism, replica count, CPU, memory, and other basic parameters and, through the Kubernetes Resource Manager, interacts with K8s to dynamically create the Receiver cluster the Cube needs.

There is a mapping involved here. One Kylin Replica Set corresponds to one K8s StatefulSet, and each StatefulSet contains as many Receiver instances as the replica count. In this example the parallelism is 2 and the replica count is 2, so there are 2 StatefulSets with 2 Receiver instances each. Once the Kubernetes Resource Manager has started all Receiver instances successfully, each Receiver registers itself in ZK and writes the Cube name, declaring which Cube it belongs to.

When the Coordinator sees that all Receivers required by the Cube have started, it creates the Replica Sets, performs the assign operation, and sets the cluster state to ready. Finally it sends REST requests to all of the Cube's Receivers, telling them to start consuming data. That completes the Enable Cube process.

To recap: integrating with K8s solves the elastic scaling problem described earlier. We no longer need to maintain one large Streaming Cluster. Streaming Clusters can be Cube-level, with each Cube mapped to its own dedicated Streaming Cluster, so resources are isolated between Cubes and can be scaled dynamically with little effort. For example, if a user wants more resources and raises the parallelism, the Coordinator detects the scale-out event, dynamically creates another StatefulSet and Kylin Replica Set, reassigns, and completes the scale-out.

**This architecture is already running end to end in our pre-production environment. Next we will bring it online for some internal workloads and, once it is stable, roll it out gradually across business lines.**

## Looking Ahead to Kylin 4.0

Finally, our outlook on Kylin 4.0. Kylin 4.0 is mainly heading in the cloud-native direction, with two major improvements:

- **Full-stack Spark, with queries no longer depending on Hadoop components.** This is the foundation for cloud native. Using the Spark engine for queries has an extra benefit: with Calcite, queries ran on a single node, and Spark resolves that pain point well.
- **Kylin on Parquet: columnar storage gives high I/O efficiency and also provides a degree of query stability.** HBase is a stateful store, whereas Parquet is just a file format, so the query path becomes more lightweight. Dropping HBase also removes the extra operational cost of maintaining it.

In short, we very much look forward to the Kylin 4.0 GA release.

**About the author**:

Di Xingxing is head of the real-time computing platform at Autohome. Di has long worked on platform development for real-time computing and data analysis, and is committed to providing the company with large-scale, efficient, and stable computing and query services.

**This article is republished from the WeChat official account apachekylin (ID: ApacheKylin).**

**Original link**:

[Real-Time Multidimensional Analysis with Apache Kylin at Autohome](https://mp.weixin.qq.com/s?__biz=MzAwODE3ODU5MA==&mid=2653082037&idx=1&sn=26f7afa97b1351c6bf364c829e1360b6&chksm=80a4af44b7d32652d331914023e3e6e1b655dfe744bb5c5fa6aa36baed046b7eed13b73ef7c2&token=1845978438&lang=zh_CN#rd)
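
The mapping described in the Receiver on Kubernetes part of the talk (one Kylin Replica Set per K8s StatefulSet, with the replica count deciding how many Receivers each StatefulSet runs) can be illustrated with a small planning sketch. This is not Autohome's or Kylin's code: the class `StreamingClusterSpec`, the functions `plan_stateful_sets` and `plan_scale_up`, the `<cube>-rs-<index>` naming scheme, and the cube name and resource values are all assumptions made for illustration, and the sketch only computes a plan as plain dictionaries rather than calling any Kubernetes API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StreamingClusterSpec:
    """Per-Cube settings mentioned in the talk; field names are hypothetical."""
    cube: str
    parallelism: int  # number of Replica Sets, i.e. number of StatefulSets
    replicas: int     # Receiver instances inside each Replica Set
    cpu: str
    memory: str


def plan_stateful_sets(spec):
    """Describe one StatefulSet per Kylin Replica Set for the given Cube."""
    return [
        {
            "name": f"{spec.cube}-rs-{index}",  # hypothetical naming scheme
            "replicas": spec.replicas,
            "resources": {"cpu": spec.cpu, "memory": spec.memory},
            "labels": {"cube": spec.cube, "replica-set": str(index)},
        }
        for index in range(spec.parallelism)
    ]


def plan_scale_up(current, desired):
    """Return only the StatefulSets that are missing from the current plan."""
    existing = {s["name"] for s in plan_stateful_sets(current)}
    return [s for s in plan_stateful_sets(desired) if s["name"] not in existing]


# The example from the talk: parallelism 2 and replica count 2.
spec = StreamingClusterSpec(cube="pageviews", parallelism=2, replicas=2,
                            cpu="4", memory="16Gi")
plan = plan_stateful_sets(spec)
total_receivers = sum(s["replicas"] for s in plan)

# The user raises parallelism to 3; one more StatefulSet has to be created.
added = plan_scale_up(spec, replace(spec, parallelism=3))

print([s["name"] for s in plan], total_receivers)
print([s["name"] for s in added])
```

With parallelism 2 and replica count 2 the plan contains two StatefulSets and four Receivers in total, matching the example in the talk. Raising the parallelism to 3 yields a single additional StatefulSet, which mirrors the scale-out step where the Coordinator creates one more StatefulSet and Replica Set.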