A Brief Comparison of Apache Kylin and ClickHouse

Original

2021-01-11 10:03

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Apache Kylin and ClickHouse are both popular big-data OLAP engines. Kylin was originally developed at eBay's China R&D center, then open-sourced in 2014 and donated to the Apache Software Foundation. Thanks to its sub-second query latency and very high query concurrency, it has been adopted by many large companies, including Meituan, DiDi, Ctrip, Beike Zhaofang, Tencent, and 58.com."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ClickHouse, which has drawn a great deal of attention in the OLAP field over the past two years, was developed by the Russian search giant Yandex and open-sourced in 2016. Typical users include well-known companies such as ByteDance, Sina, and Tencent."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"How do these two OLAP engines differ, what are their respective strengths, and how should you choose between them? "},{"type":"text","marks":[{"type":"strong"}],"text":"This article compares the two engines in terms of technical principles, storage structure, optimization methods, and best-fit scenarios, to offer some reference points for technology selection."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"01 Technical Principles"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On technical principles, we compare the two mainly in terms of "},{"type":"text","marks":[{"type":"strong"}],"text":"architecture"},{"type":"text","text":" and "},{"type":"text","marks":[{"type":"strong"}],"text":"ecosystem"},{"type":"text","text":"."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.1 Architecture"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Kylin is a Hadoop-based MOLAP (Multi-dimensional OLAP) technology whose core is the OLAP Cube"},{"type":"text","text":". Unlike traditional MOLAP technology, Kylin runs on Hadoop, a powerful and highly scalable platform, so it can handle massive data volumes (TB to PB). It loads the precomputed multi-dimensional Cube (built with MapReduce or Spark) into HBase, a low-latency distributed database, which enables sub-second query responses. The more recent Kylin 4 uses "},{"type":"text","marks":[{"type":"strong"}],"text":"Spark + Parquet"},{"type":"text","text":" in place of HBase, which further simplifies the architecture. Because most of the aggregation is already done in offline jobs (Cube builds), a SQL query does not need to touch the raw data. Instead, it uses the index to locate the aggregated results and performs a second-stage computation on them, which can be hundreds or even thousands of times faster than scanning the raw data. Because CPU usage is low, Kylin can sustain high concurrency, which makes it especially suitable for multi-user, interactive analysis such as self-service analytics and fixed reports."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"ClickHouse is a distributed ROLAP (Relational OLAP) analytics engine based on an MPP architecture"},{"type":"text","text":". All nodes are peers, each responsible for processing part of the data (shared nothing). It implements a vectorized execution engine and uses a log-structured merge tree design, sparse indexes, and CPU SIMD (Single Instruction, Multiple Data) instructions to make full use of the hardware and compute efficiently. As a result, when ClickHouse processes large data volumes it can usually push the CPU to its performance limit."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1.2 Ecosystem"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kylin is written in Java and is fully integrated into the Hadoop ecosystem. It uses HDFS for distributed storage; the compute engine can be MapReduce, Spark, or Flink, and the storage engine can be HBase or Parquet (combined with Spark). Source data can come from Hive, Kafka, RDBMS, and others, and multi-node coordination relies on ZooKeeper. Kylin is compatible with Hive metadata and supports only SELECT queries; schema changes and the like must be made in Hive and then synchronized to Kylin. Modeling and similar operations are done in the Web UI, job scheduling goes through the REST API, and job progress can be viewed in the Web UI."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ClickHouse is written in C++ and forms a self-contained system with few third-party dependencies. It supports fairly complete DDL and DML, and most operations can be done from the command line with SQL. Distributed clusters rely on ZooKeeper for coordination, while a single node does not need ZooKeeper. Most configuration is done by editing configuration files."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"02 Storage"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kylin stores data in HBase or Parquet from the Hadoop ecosystem. It relies on the HBase rowkey index or the Parquet row-group sparse index to speed up queries, and uses HBase Region Servers or Spark executors for distributed parallel computation. ClickHouse manages its own data storage; its storage features include MergeTree as the main storage structure, compressed data blocks, and sparse indexes. The two storage engines are compared in detail below."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.1 Kylin's Storage Structure"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kylin computes the multi-dimensional Cube data through pre-aggregation. At query time it dynamically selects the best-matching Cuboid (similar to a materialized view) based on the query conditions, which greatly reduces CPU work and I/O reads."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/6a\/bb\/6a5f5b0de0cb3e3f2ee61af11c678bbb.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"During the Cube build, Kylin encodes and compresses dimension values, for example with dictionary encoding, to minimize storage. Because both the storage engine and the build engine in Kylin are pluggable, the storage structure differs between storage engines."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"HBase storage"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With HBase as the storage engine, each dimension is encoded during precomputation so that dimension values have a fixed length. When the HFile is generated, the dimensions in the result are concatenated into the rowkey and the aggregated values become the value. The order of the dimensions determines the rowkey design and directly affects query efficiency."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/c1\/ab\/c10bba682822bcd88b54a3918dc6c3ab.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/d5\/98\/d5c01ebbf85a04caa42638a565ed3f98.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Parquet storage engine"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With Parquet as the storage format, dimension values and aggregated values are stored directly, with no encoding or rowkey concatenation. Before writing Parquet, the compute engine sorts the results by dimension; the earlier a dimension appears in the sort order, the more efficient filtering on it is. The number of shards within a partition and the number of row groups in each Parquet file also affect query efficiency."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2.2 ClickHouse's Storage Structure"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When creating a table, ClickHouse generally asks the user to specify a partition column. It uses data compression and purely columnar storage: with MergeTree, each column is stored separately and compressed in blocks."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/48\/a2\/486bc75900ed9ff0a583e4456d51c0a2.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Data is always written to disk as parts, and once certain conditions are met ClickHouse merges these parts periodically in background threads."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/a2\/f9\/a29c2805780b64cb7a07b3cacf5850f9.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the data volume keeps growing, ClickHouse merges the data within each partition directory to improve scan efficiency."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ClickHouse also provides a sparse index over the data blocks. When processing a query, it can use the sparse index to reduce the amount of data scanned and thereby speed up the query."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/cd\/b6\/cd0f6dc367a461dee570c1e1a2a968b6.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"03 Optimization Methods"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kylin and ClickHouse are both big-data processing systems. As data volumes keep growing, the right optimizations often pay off disproportionately, greatly reducing query response time and storage footprint. Because the two have different compute and storage systems, their optimization approaches differ as well. The following subsections look at the optimization methods for each."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.1 Kylin Optimization Methods"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Kylin's core principle is precomputation"},{"type":"text","text":". As described in Section 01, Kylin uses Apache Spark or MapReduce as the compute engine, HBase or Parquet for storage, and Apache Calcite for SQL parsing and post-computation. "},{"type":"text","marks":[{"type":"strong"}],"text":"Kylin's key technique is a set of optimization methods that address dimension explosion and excessive data scanning"},{"type":"text","text":". These include aggregation groups, joint dimensions, derived dimensions, dimension table snapshots, rowkey ordering, and shard-by columns."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Aggregation groups: prune the Cube through aggregation groups to avoid precomputing unnecessary combinations;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Joint dimensions: group dimensions that usually appear together, to avoid unnecessary precomputation;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Derived dimensions: mark dimensions that can be computed from other dimensions (for example, year, month, and day can be computed from a date) as derived, to avoid unnecessary precomputation;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dimension table snapshots: keep the snapshot in memory and compute on the fly, to reduce storage usage;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dictionary encoding: reduce storage usage;"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Rowkey encoding and shard-by columns: speed up queries by reducing the number of rows scanned."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/a6\/b0\/a6d9e4e2714yy9a2a5a353c37e79b9b0.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3.2 ClickHouse Optimization Methods"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The most common optimization for MPP systems is splitting data across databases and tables. Similarly, "},{"type":"text","marks":[{"type":"strong"}],"text":"the most common ClickHouse optimizations are partitioning and sharding, and ClickHouse also offers some specialized engines"},{"type":"text","text":". In summary, the optimization methods include:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Use a flat table instead of multi-table joins, to avoid expensive join operations and data shuffling"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Choose sensible partition keys, sorting keys, and secondary indexes to reduce data scanning"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Build a distributed ClickHouse cluster and add shards and replicas to increase compute resources"}]}]},
{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Combine materialized views with precomputation-oriented engines such as SummingMergeTree and AggregatingMergeTree where appropriate"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As performance and concurrency requirements rise, machine resource consumption rises too. The official ClickHouse documentation recommends keeping the number of concurrent queries at no more than 100. When higher concurrency is needed, some of ClickHouse's special engines can be used to reduce resource consumption."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The most commonly used special engines are SummingMergeTree and AggregatingMergeTree. Both are derived from MergeTree and essentially precompute the data that queries need and store it in ClickHouse, which further reduces resource consumption at query time."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In principle, SummingMergeTree and AggregatingMergeTree work much like Kylin's Cube. However, when there are many dimensions, maintaining a large number of materialized views is impractical and carries a high management cost. Unlike ClickHouse, Kylin provides a set of simple, direct optimization methods to avoid dimension explosion."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As shown above, both ClickHouse and Kylin provide ways to reduce storage usage and the number of rows scanned at query time. It is generally accepted that, with appropriate tuning, both can meet business needs at large data volumes. ClickHouse computes on the fly with MPP while Kylin precomputes; because they take different technical routes, their best-fit scenarios differ as well."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"04 Best-Fit Scenarios"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Because Kylin uses precomputation, it suits aggregate queries with fixed patterns, for example where the join, group by, and where clauses in the SQL follow fairly stable patterns. The larger the data volume, the more pronounced Kylin's advantage. In particular, "},{"type":"text","marks":[{"type":"strong"}],"text":"Kylin has an especially large advantage for count distinct, Top N, and Percentile, and is widely used for dashboards, reports of all kinds, large-screen displays, traffic statistics, and user behavior analysis"},{"type":"text","text":". Meituan, Jiguang, Beike Zhaofang, and others have built their data service platforms on Kylin, serving millions to tens of millions of queries per day, with most queries completing within 2 to 3 seconds. For high-concurrency scenarios like these there is hardly a better alternative."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Because of its MPP architecture, ClickHouse has strong on-the-fly compute capability; it fits cases where queries are flexible or detail-level queries are needed and concurrency is modest"},{"type":"text","text":". Typical scenarios include user-tag filtering over very many columns with arbitrary combinations of where conditions, and complex ad hoc queries at low concurrency. If the data volume and query load are large, a distributed ClickHouse cluster has to be deployed, which makes operations considerably more challenging."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If some queries are very flexible but run infrequently, computing on the fly saves resources: because the query volume is low, the overall cost can still be reasonable even if each query consumes a lot of compute. If queries follow fixed patterns and the query volume is high, Kylin is the better fit: the results are computed once with substantial resources and stored, and that upfront cost is amortized across every query, which makes it the most economical option."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"05 Summary"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This article compared Kylin and ClickHouse in terms of technical principles, storage structure, optimization methods, and best-fit scenarios."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Technical principles"},{"type":"text","text":": ClickHouse uses an MPP, shared-nothing architecture. Queries are flexible, and installation, deployment, and operation are simple, but because data is stored locally, scaling out and operations are relatively troublesome. Kylin uses MOLAP precomputation on Hadoop with a shared-storage architecture that separates compute from storage (especially with Parquet storage); it is better suited to relatively fixed query patterns over very large data volumes. Being based on Hadoop, it integrates easily with existing big-data platforms and scales horizontally (especially after moving from HBase to Spark + Parquet)."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Storage structure"},{"type":"text","text":": ClickHouse stores detail data, featuring the MergeTree storage structure and sparse indexes, and aggregate tables can be created on top of the detail data to improve performance. Kylin uses pre-aggregation with HBase or Parquet storage; its materialized views are transparent to queries, and aggregate queries are very efficient, but detail queries are not supported."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Optimization methods"},{"type":"text","text":": ClickHouse offers partitioning, sharding, and secondary indexes, while Kylin uses aggregation groups, joint dimensions, derived dimensions, hierarchy dimensions, and rowkey ordering."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Best-fit scenarios"},{"type":"text","text":": ClickHouse is generally suited to flexible queries over data on the order of hundreds of millions to several billion (larger volumes are supported too, but cluster operations become harder). Kylin is better suited to relatively fixed query patterns over data on the order of several billion to more than ten billion."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The figure below summarizes the comparison across several aspects:"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"image","attrs":{"src":"https:\/\/static001.infoq.cn\/resource\/image\/a7\/10\/a7ea7da802d48b5d844f19b7f48fec10.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Overall, Kylin and ClickHouse each have their own areas and scenarios of use. No single analytics engine fits every scenario in modern data analysis. Companies need to choose the right tool for the specific problem based on their own business scenarios. We hope this article helps with making a suitable technology choice."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"About the author"},{"type":"text","text":":"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Zhou Yao, Solutions Architect at Kyligence; contributor to Apache Kylin and Apache Superset."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"This article is reposted from the WeChat official account apachekylin (ID: ApacheKylin)."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Original link"},{"type":"text","text":":"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s?__biz=MzAwODE3ODU5MA==&mid=2653081811&idx=1&sn=a30d9f66cedaa8b466fd56202e9ac1b3&chksm=80a4ae22b7d327345b635cbca42fb865a13e98166e47b3b8c075ee8eb23fddb68c27bb42be03&token=1340822333&lang=zh_CN#rd","title":"","type":null},"content":[{"type":"text","text":"A Brief Comparison of Apache Kylin and ClickHouse"}]}]}]}