# Kylin 在攜程的實踐（下）

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"案例分享"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"離線分析案例"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/00\/0e\/0002yydc075eca19f172cc15758be50e.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"攜程之前使用的是 OpenTSDB+Hive。採用 Kylin 前，先從 Hive 先生成聚合表，然後導入 HBase，通過 OpenTSDB 去分析，現在積累了接近百億的數據，隨着數據的增長，老的方案已經無法滿足業務需求了，而且同步數據成本高，OpenTSDB 沒辦法支持精準去重響應時間也很差。"},{"type":"text","marks":[{"type":"strong"}],"text":"用了 Kylin 之後，現在的業務規模已經可以支撐上百億了，目前已經配有 200 個左右的線上活躍的 Cube。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"實時分析案例"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/ea\/c2\/ea9c60b75aa83d02yy277f41721029c2.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個是去年 3、4 月份用戶提的新需求。Kylin 現在是上圖所示的 Streaming-Cube 的架構，Kylin 接入的是攜程的 Hermes，Hermes 是 Kafka 的一個封裝。我們現在支持原生 Kafka 接入和 Hermes 接入，底層沿用 MR，因爲我們測試過Spark，其實很多的場景上和 MR 相當，效果不是特別明顯。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/db\/b3\/dbd5fef7fb6d498fb2cd576dfd5caab3.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這部分主要是用於度假預訂狀態告警，度假團隊需要去分析用戶預訂的情況，準確實時地發送給客服人員任何預訂失敗等錯誤狀況，所以這塊對於數據構建落地的時間敏感度比較高。目前，通過一系列優化，Streaming 的構建基本保持在5分鐘左右，可以滿足一部分業務的需求。但是，"},{"type":"text","marks":[{"type":"strong"}],"text":"更大的挑戰是達到一分鐘以內，也就是說秒級構建，所以對於我們來說Streaming-realtime 會是一個值得嘗試的方向。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"展望"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"攜程針對 Kylin 主要有兩方面的展望。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1 支持自動構建Cube"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這塊我們目前在調研，通過分析應用採集的元數據、SQL特徵，可以自動地爲用戶構建 Cube，爲用戶節約 Kylin 的學習成本，同時減少重複查詢對於 MPP 的壓力。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2 Real-time Streaming的調研和落地"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了能夠更加豐富 Kylin 的使用場景，我們打算對 eBay 爲 Kylin 貢獻的"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzAwODE3ODU5MA==&mid=2653079081&idx=1&sn=7a5088406e01f6bfa62ffb02d7254d2f&scene=21#wechat_redirect","title":"","type":null},"content":[{"type":"text","text":"實時流處理技術"}]},{"type":"text","text":"做進一步調研和落地工作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Q&A"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q：演講中提到的構建的 Cube 有 20 個指標，這種情況下去重，是精準去重還是近似去重？有多少個指標呢？"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A：用戶配的是精確。精確去重指標不會太多。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q：演講中提到20個維度的響應時間是亞秒級，有 20 個維度。請問你們做了哪些優化的工作來達到如此快的響應時間？"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A：我們構建的時候，對於這種維度多的情況，"},{"type":"text","marks":[{"type":"strong"}],"text":"建議當用戶採取了以下 3 種措施來優化查詢："}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"使用Mandatory Dimension；"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"實現分佈式緩存；"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"配置高基維度的時候，會建議他們把高基維度往前移，這樣會更高效地命中 Cube，並減小掃描的數據範圍）。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q：配了 20 個維度，最終產生的 Cube 單日有多大？"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A：最大的 Cube 日產生 13 T 的數據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q：剛剛提到的監控方案是你們自主研發的，還是有開源的方案可以用？"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A：監控是我們自主研發的。我們接入了公司已經成熟的監控平臺，避免反覆造輪子。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q：分享裏提到的實時5 分鐘構建一次，我理解是採用批操作，並不是真正的流，而是把流幾分鐘拆成一個批次。是嗎？"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A：對的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q：前面講到底層用的 MR，沒用 Spark，因爲覺得時間上並沒有什麼節省。這個是 Spark 本身的原因，還是因爲你們的任務還不是很大的量？因爲每次 Spark 啓任務的時間和MR相比有差別？"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A：離線這塊目前可以達到要求，所以還沒有轉成Spark。我們在實時這塊用 Spark 的過程中，就是像你說的，每次提交任務就很慢，達不到要求。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q：是因爲頻繁提交的問題？不是因爲它本身？"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A：對，不是因爲它本身。我們也在調研如何避免每個構建過程都啓動一次 driver。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q：在我之前的應用場景裏，有一個維度特別的高基維，每天增量就很大，我們查詢機制裏這個維度是必選的。比如說是人的工號，裏面放了很多人，然後我們要去預計算，如果說這個維度非常高，數據量會非常大，這種情況下你們會採取什麼辦法呢？"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A：高基字段可以設置下 shard by。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q：攜程每天預計算的集羣大概是有多大？"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A：離線集羣是 2 臺物理機，每臺 100 多 G 的物理機，查詢節點放了 4 臺虛機。實時這塊，因爲用戶量目前不多，所以都是建在虛機上，所以內存也不大。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Q：在維度特別大，數據量又很大的情況下，剪枝的話，Cuboid 大概會控制在多少？"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A：維度特別大的情況，我們最多是 4096 個 Cuboid。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"本文轉載自公衆號apachekylin（ID：ApacheKylin）。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"原文鏈接"},{"type":"text","text":"："}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzAwODE3ODU5MA==&mid=2653079116&idx=1&sn=cd1e06a78d4a07955d85dbba4089c542&chksm=80a4b8bdb7d331aba0fdb00d16168e6c74098a14b153c1a5d78b2000c47625cb2091714f1e4d&scene=21#wechat_redirect","title":"","type":null},"content":[{"type":"text","text":"Kylin 在攜程的實踐（下）"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}