Kylin 在攜程的實踐(下)

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"案例分享"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"離線分析案例"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/00\/0e\/0002yydc075eca19f172cc15758be50e.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"攜程之前使用的是 OpenTSDB+Hive。採用 Kylin 前,先從 Hive 先生成聚合表,然後導入 HBase,通過 OpenTSDB 去分析,現在積累了接近百億的數據,隨着數據的增長,老的方案已經無法滿足業務需求了,而且同步數據成本高,OpenTSDB 沒辦法支持精準去重響應時間也很差。"},{"type":"text","marks":[{"type":"strong"}],"text":"用了 Kylin 之後,現在的業務規模已經可以支撐上百億了,目前已經配有 200 個左右的線上活躍的 Cube。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"實時分析案例"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/ea\/c2\/ea9c60b75aa83d02yy277f41721029c2.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個是去年 3、4 月份用戶提的新需求。Kylin 現在是上圖所示的 Streaming-Cube 的架構,Kylin 接入的是攜程的 Hermes,Hermes 是 Kafka 的一個封裝。我們現在支持原生 Kafka 接入和 Hermes 接入,底層沿用 MR,因爲我們測試過Spark,其實很多的場景上和 MR 相當,效果不是特別明顯。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/db\/b3\/dbd5fef7fb6d498fb2cd576dfd5caab3.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這部分主要是用於度假預訂狀態告警,度假團隊需要去分析用戶預訂的情況,準確實時地發送給客服人員任何預訂失敗等錯誤狀況,所以這塊對於數據構建落地的時間敏感度比較高。目前,通過一系列優化,Streaming 的構建基本保持在5分鐘左右,可以滿足一部分業務的需求。但是,"},{"type":"text","marks":[{"type":"strong"}],"text":"更大的挑戰是達到一分鐘以內,也就是說秒級構建,所以對於我們來說Streaming-realtime 會是一個值得嘗試的方向。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"展望"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"攜程針對 Kylin 主要有兩方面的展望。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1 支持自動構建Cube"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這塊我們目前在調研,通過分析應用採集的元數據、SQL特徵,可以自動地爲用戶構建 Cube,爲用戶節約 Kylin 的學習成本,同時減少重複查詢對於 MPP 的壓力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2 Real-time Streaming的調研和落地"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了能夠更加豐富 Kylin 的使用場景,我們打算對 eBay 爲 Kylin 貢獻的"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzAwODE3ODU5MA==&mid=2653079081&idx=1&sn=7a5088406e01f6bfa62ffb02d7254d2f&scene=21#wechat_redirect","title":"","type":null},"content":[{"type":"text","text":"實時流處理技術"}]},{"type":"text","text":"做進一步調研和落地工作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Q&A"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q:演講中提到的構建的 Cube 有 20 個指標,這種情況下去重,是精準去重還是近似去重?有多少個指標呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A:用戶配的是精確。精確去重指標不會太多。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q:演講中提到20個維度的響應時間是亞秒級,有 20 個維度。請問你們做了哪些優化的工作來達到如此快的響應時間?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A:我們構建的時候,對於這種維度多的情況,"},{"type":"text","marks":[{"type":"strong"}],"text":"建議當用戶採取了以下 3 種措施來優化查詢:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"使用Mandatory Dimension;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"實現分佈式緩存;"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"配置高基維度的時候,會建議他們把高基維度往前移,這樣會更高效地命中 Cube,並減小掃描的數據範圍)。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q:配了 20 個維度,最終產生的 Cube 單日有多大?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A:最大的 Cube 日產生 13 T 的數據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q:剛剛提到的監控方案是你們自主研發的,還是有開源的方案可以用?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A:監控是我們自主研發的。我們接入了公司已經成熟的監控平臺,避免反覆造輪子。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q:分享裏提到的實時5 分鐘構建一次,我理解是採用批操作,並不是真正的流,而是把流幾分鐘拆成一個批次。是嗎?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A:對的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q:前面講到底層用的 MR,沒用 Spark,因爲覺得時間上並沒有什麼節省。這個是 Spark 本身的原因,還是因爲你們的任務還不是很大的量?因爲每次 Spark 啓任務的時間和MR相比有差別?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A:離線這塊目前可以達到要求,所以還沒有轉成Spark。我們在實時這塊用 Spark 的過程中,就是像你說的,每次提交任務就很慢,達不到要求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q:是因爲頻繁提交的問題?不是因爲它本身?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A:對,不是因爲它本身。我們也在調研如何避免每個構建過程都啓動一次 driver。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q:在我之前的應用場景裏,有一個維度特別的高基維,每天增量就很大,我們查詢機制裏這個維度是必選的。比如說是人的工號,裏面放了很多人,然後我們要去預計算,如果說這個維度非常高,數據量會非常大,這種情況下你們會採取什麼辦法呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A:高基字段可以設置下 shard by。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q:攜程每天預計算的集羣大概是有多大?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A:離線集羣是 2 臺物理機,每臺 100 多 G 的物理機,查詢節點放了 4 臺虛機。實時這塊,因爲用戶量目前不多,所以都是建在虛機上,所以內存也不大。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q:在維度特別大,數據量又很大的情況下,剪枝的話,Cuboid 大概會控制在多少?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A:維度特別大的情況,我們最多是 4096 個 Cuboid。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"本文轉載自公衆號apachekylin(ID:ApacheKylin)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"原文鏈接"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzAwODE3ODU5MA==&mid=2653079116&idx=1&sn=cd1e06a78d4a07955d85dbba4089c542&chksm=80a4b8bdb7d331aba0fdb00d16168e6c74098a14b153c1a5d78b2000c47625cb2091714f1e4d&scene=21#wechat_redirect","title":"","type":null},"content":[{"type":"text","text":"Kylin 在攜程的實踐(下)"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章