Running Kylin on AWS: Cloud Operations in Practice | Inside OLX Group's Global Data Infrastructure

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the first Apache Software Foundation top-level project whose contribution was led by Chinese developers, Apache Kylin has kept an active open source community both in China and abroad. In October 2019, Apache Kylin and OLX Group, a large European cross-border e-commerce company, co-hosted a Kylin Meetup in Berlin, Germany, which was very well received."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At the recent Apache Kylin 5th anniversary celebration, OLX Group won the Best Application Award, and we once again invited Senior Data Engineer Mateusz Jerzyk to represent the company and share the role Apache Kylin plays in OLX Group's global data infrastructure."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The following is a translated transcript of the talk."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hello everyone. I am very happy to share with you today how we use Apache Kylin at OLX Group."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First I will briefly introduce OLX Group, then show the role Kylin plays in our global data services infrastructure, and finally share some of our use cases. Along the way, I will also highlight some of the challenges we ran into when using Kylin and building Cubes, and what we learned."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/05\/1f\/05923ff5a6717ed6761710fec06c971f.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"OLX Group 
Overview"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/3c\/98\/3ca0d2e1b44efcf441864588e5910698.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"OLX Group is part of Prosus, a global internet group and one of the largest technology investors in the world. Prosus has invested in companies such as Tencent, Delivery Hero, and Udemy."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/52\/6c\/522510f9bec3aa393da1e2d09f6d276c.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"OLX Group operates leading platforms for buying, selling, and exchanging products and services, with more than 20 brands worldwide. The business currently covers more than 30 countries and regions, with over 35 offices around the world. OLX has more than 7,500 employees, over a thousand of whom work in product and technology. Every month, 350 million people use our platforms to buy, sell, or exchange goods and services, and user visits generate more than 4 billion events per day on average."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/f3\/40\/f3e2b305f0b44939a80a80b310be4b40.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At OLX 
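Group, we believe in the power of data. The data we collect every day informs our business decisions. We build dashboards, machine learning models, and more to support decision-making."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"The Role of Kylin in Our Data Architecture"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, I will describe the role Kylin plays in our data infrastructure, starting with how data flows through OLX. First, we use a set of internal tools to collect data from product databases and devices. All of this data is stored in a data lake, which serves as our data storage area."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is worth noting that although we have built a data lake, only a small number of people inside the company can access it, and it fully complies with data protection regulations."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/53\/3e\/537079f1eacfc8c587b3e55f6020243e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After the data has been collected and governed, every team at OLX can request the data it needs in a dedicated, slimmed-down data store, called a repository. This gives us full control over how our data is used."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, we use the capabilities of Odyn as an operational data hub for data processing. Users can schedule their own ETL jobs and\/or other workloads and store the results back in the repository. This processed data is then ready to be loaded into the acceleration layer for analytical queries: Apache 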
Kylin."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/6b\/98\/6b875c11b7be0ebd07d32998a2cyy698.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Now let us focus on the role Kylin plays in OLX Group's data architecture. Here you can see how our Apache Kylin platform is set up. "},{"type":"text","marks":[{"type":"strong"}],"text":"We use Kubernetes to deploy Apache Kylin, Spark, and Hive."},{"type":"text","text":" It is worth mentioning that, to get Apache Hive running on Kubernetes, OLX Group uses Apache Spark as part of the engine. We also use Amazon EMR to host the HBase cluster for Apache Kylin together with Hadoop HDFS, and we back the data up to S3. The architecture also includes an automatic restore process that can rebuild the entire environment from S3 whenever a crash is detected in the deployment. OLX Group uses Okta for SAML federated authentication at user login and OpenLDAP for user authorization. We store Tomcat sessions in Memcached so that deployments incur zero downtime. Hive metastore data is stored in Amazon Aurora."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The result is an Apache Kylin deployment that is fully integrated with the rest of OLX's data infrastructure. Analysts and non-technical users get a consistent, fully monitored, stable, and scalable cross-team environment in which they can build cubes and use Apache Kylin with ease. We also provide tailor-made daily HBase backups and automatic restore for Apache Kylin."}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Kylin 
in Practice"}]},{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/8c\/85\/8cb9a3eed9d85c88373ff77bf02eed85.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Next, I will share some use cases and the challenges we faced when using Kylin. As mentioned earlier, we use data in many places. "},{"type":"text","marks":[{"type":"strong"}],"text":"The first challenge was building a set of dashboards for our global reporting."},{"type":"text","text":" Our goal was for them to answer queries with sub-second latency while remaining flexible enough to compute non-additive measures under any given set of filters. They also had to work with Tableau, our main visualization tool."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Another specific challenge was building a self-service analytics platform."},{"type":"text","text":" Unlike with dashboards, in a self-service tool we cannot really predict exactly how users will combine measures and dimensions, which means we do not know in advance which queries a Cube must serve. The goal for these Cubes is therefore greater flexibility, and in exchange we can accept slower response times in edge cases."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, I would like to share some numbers. As of November 2020, we had 39 Cubes in production serving Tableau, used by more than 300 analysts. They had run nearly 400,000 analytical queries, returned more than 500 billion rows, and scanned more than 500 TB of data."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the Author"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Mateusz Jerzyk is a Senior Data Engineer on the Data Infrastructure team at OLX Group. In 2019 he helped organize the Berlin Apache Kylin 
Meetup."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"This article is republished from the WeChat official account ApacheKylin (ID: apachekylin)."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original article"},{"type":"text","text":":"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzAwODE3ODU5MA==&mid=2653082203&idx=1&sn=0874c323a8052213b0ac3bebd2c48d6e&chksm=80a4acaab7d325bcb65d8e122aa471ac58e69e26cb97f874451130cf8eb6dfcde498d11c4611&token=1020922772&lang=zh_CN#rd","title":"","type":null},"content":[{"type":"text","text":"Running Kylin on AWS: Cloud Operations in Practice | Inside OLX Group's Global Data Infrastructure"}]}]}]}