ClickHouse in Practice

{"type":"doc","content":[{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在數據量日益增長的當下,傳統數據庫的查詢性能已滿足不了我們的業務需求。而Clickhouse在OLAP領域的快速崛起引起了我們的注意,於是我們引入Clickhouse並不斷優化系統性能,提供高可用集羣環境。本文主要講述如何通過Clickhouse結合大數據生態來定製一套完善的數據分析方案、如何打造完備的運維管理平臺以降低維護成本,並結合具體案例說明Clickhouse的實踐過程。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Clickhouse簡介","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 爲什麼選擇Clickhouse","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前企業用戶行爲日誌每天百億量級,雖然經過數倉的分層以及數據彙總層通用維度指標的預計算,但有些個性化的分析場景還是需要直接編寫程序或sql查詢,這種情況下hive sql和spark sql的查詢性能已無法滿足用戶需求,我們迫切的需要一個OLAP引擎來支持快速的即席查詢。","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BI存儲庫主要採用的是Infobright,在千萬量級能很快的響應BI的查詢請求,但隨着時間推移和業務的發展,Infobright的併發量與查詢瓶頸日益凸顯,我們嘗試將大數據量級的表導入TiDB、Hbase、ES等存儲庫,雖然對查詢有一定的提速,但是也存在着相應的問題(後續章節會詳細介紹),這時我們考慮到Clickhouse。","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Clickhouse社區活躍度高、版本迭代非常快,幾乎幾天到十幾天更新一個小版本,我們非常看好它以後的發展。","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 
ClickHouse Features","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ClickHouse is a column-oriented database management system open-sourced by the Russian company Yandex in 2016. It emerged as a dark horse in the OLAP field and has won favor in the industry for its very high performance.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Strengths:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Linear scalability and high reliability based on shards and replicas","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Columnar storage: values in a column share one data type, which gives better compression","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"High hardware utilization: sequential I/O improves disk drive efficiency","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A vectorized engine and SIMD improve CPU utilization, and large queries run in parallel across cores and nodes","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Limitations:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"No transaction support; deletes and updates are asynchronous","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Not suited to high-concurrency scenarios","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph"
,"attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Clickhouse建設","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 整體架構","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/96/963b7b0384666eca473bbbaa1c8663b5.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們依據數據的流向將Clickhouse的應用架構劃分爲4個層級。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"數據接入層","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"提供了數據導入相關的服務及功能,按照數據的量級和特性我們抽象出三種Clickhouse導入數據的方式。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"方式一:數倉應用層小表導入這類數據量級相對較小,且分佈在不同的數據源如hdfs、es、hbase等,這時我們提供基於DataX自研的TaskPlus數據流轉+調度平臺導入數據,單分區數據無併發寫入,多分區數據小併發寫入,且能和線上任務形成依賴關係,確保導入程序的可靠性。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"方式二:離線多維明細寬表導入這類數據一般是彙總層的明細數據或者是用戶基於Hadoop生產的大量級數據,我們基於Spark開發了一個導入工具包,用戶可以根據配置直接拉取hdfs或者hive上的數據到clickhouse,同時還能基於配置sql對數據進行ETL處理,工具包會根據配置集羣的節點數以及Clickhou
se集羣負載情況(merges、processes)對local表進行高併發的寫入,達到快速導數的目的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"方式三:實時多維明細寬表導入實時數據接入場景比較固定,我們封裝了通用的ClickhouseSink,將app、pc、m三端每日百億級的數據通過Flink接入clickhouse,ClickhouseSink也提供了batchSize(單次導入數據量)及batchTime(單次導入時間間隔)供用戶選擇。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"數據存儲層","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據存儲層這裏我們採用雙副本機制來保證數據的高可靠,同時用nginx代理clickhouse集羣,通過域名的方式進行讀寫操作,實現了數據均衡及高可靠寫入,且對於域名的響應時間及流量有對應的實時監控,一旦響應速度出現波動或異常我們能在第一時間收到報警通知。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"nginx_one_replication:代理集羣一半節點即一個完整副本,常用於寫操作,在每次提交數據時由nginx均衡路由到對應的shard表,當某一個節點出現異常導致寫入失敗時,nginx會暫時剔除異常節點並報警,然後另選一臺節點重新寫入。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"nginx_two_replication:代理集羣所有節點,一般用作查詢和無副本表數據寫入,同時也會有對於異常節點的剔除和報警機制。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"數據服務層","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":
"對外:將集羣查詢統一封裝爲scf服務(RPC),供外部調用。","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對內:提供了客戶端工具直接供分析師及開發人員使用。","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"數據應用層","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"埋點系統:對接實時clickhouse集羣,提供秒級別的OLAP查詢功能。","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"用戶分析平臺:通過標籤篩選的方式,從用戶訪問總集合中根據特定的用戶行爲捕獲所需用戶集。","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BI:提供數據應用層的可視化展示,對接單分片多副本Clickhouse集羣,可橫向擴展。","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Clickhouse運維管理平臺","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在Clickhouse的使用過程中我們對常見的運維操作如:增刪節點、用戶管理、版本升降級等封裝了一系列的指令腳本,再結合業務同學使用過程中的一些訴求開發了Clickhouse管理平臺,該平臺集管理、運維、監控爲一體,旨在讓用戶更方便、快捷的使用Clickhouse服務,降低運維成本,提高工作效率。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1b/1b3b786b02f5e91a0a296ab6a83f52df.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,
"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"配置文件結構","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f3/f371044c63ea2d41f5549d159338ae0c.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"users.xml默認的users.xml可分爲三個部分用戶設置users:主要配置用戶信息如賬號、密碼、訪問ip等及對應的權限映射配額設置quotas:用於追蹤和限制用戶一段時間內的資源使用參數權限profiles:讀寫權限、內存、線程等大多數參數配置爲了統一管理權限我們在users.xml預定義了對應權限及資源的quotas及profiles,例如default_profile、readwrite_profile、readonly_profile等,新增用戶無需單獨配置quotas及profiles,直接關聯預定義好的配置即可users.d/xxx.xml按不同的用戶屬性設置user配置,每一個xml對應一組用戶,每個用戶關聯users.xml中的不同權限quotas及profilesusers_copy/xxx.xml每次有變更用戶操作時備份指定屬性的xml,方便回滾metrika.xml默認情況下包含集羣的配置、zookeeper的配置、macros的配置,當有集羣節點變動時通常需要將修改後的配置文件同步整個集羣,而macros是每個服務器獨有的配置,如果不拆解很容易造成配置覆蓋,引起macros混亂丟失數據,所以我們在metrika.xml中只保留每臺服務器通用的配置信息,而將獨立的配置拆解出去conf.d/xxx.xml保存每臺服務器獨立的配置,如macros.xmlconfig_copy/xxx.xml存放每次修改主配置時的備份文件,方便回滾","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"元數據管理","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"維護各個Clickhosue集羣的元數據信息,包含表的元數據信息及Clickhouse服務狀態信息,給用戶更直觀的元數據管理體驗,主要有如下功能:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":
"text","text":"查詢指定集羣和庫表信息,同時展示該表的狀態:只讀 or 讀寫。","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"查看錶的元數據信息 行數、磁盤佔用、原始大小、更新時間、分區信息等。","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"設定數據生命週期,基於分區數對數據進行清理操作。","attrs":{}}]}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a7/a743ef588dd1d56840dc90b65ec39095.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"自動化運維","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"用戶管理","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於我們基於nginx代理的方式對Clickhouse進行均衡讀寫,同時Clickhouse的配置也是可以熱修改的,所以在用戶管理及資源控制方面我們直接通過web平臺對Clickhosue配置文件進行修改操作。通過web平臺展示users.xml中對應權限的profiles 和 
quotas,運維人員只需根據用戶屬性選擇對應的配置填寫對應的用戶名及自動生成的密文密碼即可,不會影響已配置好的權限及資源,同時每次xml操作都會提前備份文件,在xml修改異常時可隨時回滾。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/90/90a2d65aaf6fe0be7bf6b220c30d3128.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"集羣操作","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"clickhosue管理平臺的核心模塊,依託於運維作業平臺 API封裝了一系列的運維腳本,覆蓋了集羣管理的常用操作。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"clickhouse服務的啓動、停止、重啓","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"clickhouse的安裝、卸載、故障節點替換","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"升級/降級指定Clickhouse版本","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"動態上下線指定節點","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"元數據維護 
(cluster_name, metrika, macros)","attrs":{}}]}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1e/1ebe7e5f8db032f79ab1229c7a71c48e.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The overall workflow is shown here using the addition of a new node as an example:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d6/d642cfaa5a5aae5afaf5d4f09a8952d3.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The core steps are dispatching the install job and generating the corresponding configuration.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dispatching the install job: the ClickHouse platform calls the operations job platform service to distribute a predefined script to the specified nodes for execution, passing in the configuration parameters the user chose.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/34/345bf255da318ee8a8bf78bbe7e291db.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"tex
t","text":"生成配置文件:通常情況下我們會在一個物理集羣分別建立單副本集羣和雙副本集羣,在爲新節點生成配置文件時由clickhouse平臺從元數據模塊獲取到新增節點的集羣信息,動態生成新增節點的macros與metrika配置,然後將metrika.xml同步到所有集羣。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7a/7a0e7d82ec0d39255ad914cfc4beed66.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"監控與報警","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.硬件指標監控","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"硬件指標監控主要指clickhouse服務節點的負載、內存、磁盤IO、網卡流量等,這裏我們依託於monitor監控平臺來配置各種指標,當監控指標達到一定閾值後觸發報警。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.集羣指標監控","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們在Clickhouse管理平臺中集成了grafana,採用Prometheus採集clickhosue集羣信息在grafana做展現,一般的監控指標有top排名(慢查詢、內存佔用、查詢失敗 
)、QPS、讀寫壓力、HTTP&TCP連接數、zookeeper狀態等,當這些指標出現異常時通過alertmanager插件配置的規則觸發報警。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ef/efd1273bc935baf4b4d5a486a8577aaa.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.流量指標監控","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"目前所有對於clickhouse的讀寫請求都是通過域名代理的方式,通過域名的各項指標能精準且實時的反映出用戶最原始的讀寫請求,當域名響應時間波動較大或者響應失敗時我們能在第一時間收到報警並查看原始請求。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Clickhouse應用","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. 
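BI Query Engine","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Core Requirements","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Before adopting ClickHouse, the BI stores included Infobright, HBase, ES, and Druid, with Infobright as the main one. Infobright performs well below the ten-million-row scale but struggles with tables that span long time ranges or hold large volumes. We usually put such data in ES and HBase, which speeds up queries but makes the system more complex because it must adapt to different data sources. Analysts also want to work with tables directly, and storing data in ES and HBase raises their learning cost. Our core requirements were therefore:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(1) High query performance at large data volumes;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(2) Low BI adaptation cost;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"(3) SQL support that is simple and easy to use.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Selection Comparison","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Based on these requirements, we compared our existing Infobright with TiDB, Doris, and ClickHouse as follows.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Item | Infobright | TiDB | Doris | ClickHouse","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BI adaptation cost | - | Low | Low | Medium","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Learning and usage cost | - | Low | Low | Low","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Million-row query (1 million rows) | 84ms | 24ms | 25ms | 41ms","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Ten-million-row query (10 million rows) | 1330ms | 332ms | 130ms | 71ms","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Hundred-million-row query (110 million rows) | 57000ms | 16151ms | 3200ms | 401ms","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"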
number":0,"align":null,"origin":null},"content":[{"type":"text","text":"總體來看Clickhouse的查詢性能略高於Doris,而TiDB在千萬量級以上性能下降明顯,且對於大數據量級下Clickhouse相比Infobright性能提升巨大,所以最終我們選擇了Clikhouse作爲BI的存儲查詢引擎。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 集羣構建","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在評估了目前Infobright中的數據量級和Clickhouse的併發限制之後,我們決定使用單分片 多副本的方式來構建Clickhouse集羣,理由如下:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BI對接數倉應用層數據,總體來說量級較小,同時clickhouse有着高效的數據壓縮比,採用單節點能存儲當前BI的全量數據,且能滿足未來幾年的數據存儲需求。","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Clickhouse默認併發數爲100,採用單分片每個節點都擁有全量數據,當qps過高時可橫向增加節點來增大併發數。","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"clickhouse對Distributed 表的join支持較差,單分片不走網絡,能提高join查詢速度。","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"服務器配置:CPU:16 × 2 
cores、內存:192GB、磁盤:21TB,整體的架構圖如下所示:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/2a/2a30dbe3123d9232d9db71b76b9fdee4.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在寫數據時由taskplus對其中的一臺節點寫入,如果該節點異常可切換到其他副本節點寫入,由寫入副本自動同步其他副本。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"查詢同樣用nginx代理三臺節點,由於是單分片集羣所以查詢視圖表和本地表效果是一樣的,不過視圖表會自動路由健康副本,所以這裏還是選擇查詢視圖表。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在通過Taskplus將BI的數據源切換到Clickhouse後對於大量級查詢性能提升明顯。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"tp99由1184ms變爲739ms","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大於1秒的查詢總量日均減少4.5倍","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"大於1秒的查詢總耗時日均降低6.5倍","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}],"attrs":{}}],"attrs":{}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/
75/75fde390c45301c7ed08564a8ce9b565.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. Problems and Optimization","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Before ClickHouse the average BI response time was 187.93 ms; afterwards it was 84.58 ms, about 2.2 times faster overall. Even so, the ClickHouse daily monitoring email still showed some slow queries. The cause was that application-layer tables are partitioned by the date field stat_date by default, and some tables are very small yet have many partitions. One product retention table, for example, has 5564 rows in 851 date partitions, about 6.5 rows per day. Below are the timings of a routine group by count query on that table.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Item | ClickHouse, date-partitioned (cold query) | ClickHouse, date-partitioned (hot query) | ClickHouse, unpartitioned (hot query) | Infobright","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"query1 | 2000ms | 220ms | 16ms | 8ms","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"This shows that ClickHouse SELECT performance is poor on tables with many partitions, which the official documentation also notes:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"> A merge only works for data parts that have the same value for the partitioning expression. This means you shouldn’t make overly granular partitions (more than about a thousand partitions). 
Otherwise, the SELECT query performs poorly because of an unreasonably large number of files in the file system and open file descriptors.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/39/39c4e35f1e072d7d35fcc2916cf84d9d.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Real-Time Data Warehouse","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Layered Architecture","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With daily user behavior data on the order of ten billion records, traditional offline analysis can no longer meet business needs, so we built a real-time data warehouse on data from the three clients. The overall layered architecture is as follows:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a4/a48c03fe74dcdd2076c9f686d4548655.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here ClickHouse serves as the real-time OLAP query engine with second-level latency. When the common real-time dimensional metrics in the DWS layer do not meet a user's needs, the user can write SQL against ClickHouse to query real-time data directly, which greatly lowers the barrier to querying real-time data.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. 
Data Input and Output","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/66/668b855c82af534605a0fc1e5a5591a3.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the input side, we join user behavior data with dimension tables in real time and write it to Kafka, and then Flink writes it to ClickHouse over JDBC. To keep real-time queries stable we use a two-replica layout, with nginx proxying one complete replica, and write directly to the domain name. The program also retries on failure: when a node cannot be written to, it tries another shard, so that every record is written to ClickHouse.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the output side, nginx likewise proxies the whole cluster and connects to the client tool and the SCF service. The client tool serves developers and analysts, and SCF provides query services externally.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"3. 
Data Products","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The event tracking system is one we built specifically for managing tracking events. Its main functions are:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"numberedlist","attrs":{"start":null,"normalizeStart":1},"content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","text":"Event registration and validation: recording and validating newly launched tracking events;","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":2,"align":null,"origin":null},"content":[{"type":"text","text":"Requirement management: monitoring the cycle and tracking the status of requirements for new and changed tracking events;","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":3,"align":null,"origin":null},"content":[{"type":"text","text":"Multidimensional event analysis: aggregating reported events across multiple dimensions so users can drill down and locate problems;","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":4,"align":null,"origin":null},"content":[{"type":"text","text":"Metrics and dashboards: aggregating one or more events across dimensions according to defined rules, with the resulting statistics configurable directly on dashboards;","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":5,"align":null,"origin":null},"content":[{"type":"text","text":"Event testing: collecting test events in real time, then validating their format and parsing them.","attrs":{}}]}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1b/1bda1f5cdc566bd15f736cfcd7bf628f.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Before ClickHouse, the event tracking system used MapReduce precomputation to aggregate the event metrics users configured and wrote the results to HBase. Users only ever queried precomputed results, so responses were very fast, but this brought some problems:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Poor timeliness: newly reported or modified events appeared only at T+1, and changing an event's dimensions required rerunning historical data.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A single model that is hard to extend: traffic statistics were computed only for the event model, and supporting any other analysis model required developing a separate computation model.","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/11/110132a1382130ba772ecb87541dfa01.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Given this, we convert the rules users configure in the event tracking system directly into SQL and query the real-time multidimensional detail data in ClickHouse. We also tuned the index structure of the real-time detail table for the system's usage patterns. ClickHouse's query performance lets real-time event statistics respond within seconds, so results are available as soon as they are configured, and dimensions and metrics can be changed freely, which greatly improves the user experience. Because statistics are computed with SQL directly on detail data, the statistical model is highly extensible and supports product iteration faster.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Comparison | Timeliness | Time granularity | Computation | Extensibility","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Without ClickHouse | T+1 | Day | MapReduce precomputation | Low","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With ClickHouse | Seconds | Minute | Real-time computation | High","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/33/33a124fa1eacf637aea94930f28c726c.png","alt":"Image","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"pa
ragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Common Issues","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Data Writes","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Do not write data for multiple partitions in a single batch;","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Increase background_pool_size appropriately for the server's configuration to raise the number of merge threads (default: 16);","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Monitor the system.merges and system.processes tables so that write pressure is detected and alerted on early, before the service crashes;","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Do not create too many indexes. For large, highly concurrent writes, consider pre-sorting the data by the table's index before writing to reduce merge pressure;","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Do not write to Distributed tables. Write directly to the local tables through a proxy such as nginx or chproxy, which also allows balanced writes and bringing nodes online or offline dynamically through configuration.","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"JOIN Operations","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Whatever the join type, the small table must be on the right side; use LEFT or RIGHT to adjust the join direction;","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragra
ph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Enable predicate pushdown:","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Apply operations that greatly reduce data volume, such as WHERE, GROUP BY, and DISTINCT, before the join (evaluate based on how much they actually reduce the data).","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Common Parameters","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"max_execution_time sets the maximum time for a single query: 600s;","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"max_memory_usage sets the maximum memory a single query may use on a single server; set it to 50% of total memory;","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"max_bytes_before_external_group_by sets the threshold at which GROUP BY spills to external storage; set it to max_memory_usage/2;","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"max_memory_usage_for_all_queries 
sets the maximum memory used by all queries on a single server; set it to 80%-90% of total memory so that the ClickHouse service cannot consume so many resources that the server hangs.","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Summary and Outlook","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ClickHouse is currently used mainly for data products, user profiling, and BI. It takes in on the order of ten billion records per day and serves on the order of a million queries per day, providing efficient query service on an ongoing basis. Going forward, we will strengthen our ClickHouse deployment in the following two areas:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"1. Improve the ClickHouse management platform to safeguard service stability:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When a node is removed, a Rebalance script currently rewrites that node's data to the other nodes, and queries can return inconsistent results while it runs. We hope to provide a more efficient and transparent Rebalance approach.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finer-grained access control and management. The latest releases already implement this (Role and Privileges); we plan to try the feature and integrate it into the ClickHouse management platform.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Consistency guarantees for real-time writes to ClickHouse.","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"2. Optimize ClickHouse performance and expand its use cases:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":
"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Optimizing complex queries in ClickHouse at the hundred-billion-row scale.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Extending the tracking system's ClickHouse-based statistical models to cover access paths, intervals, distributions, and more.","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"About the authors:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"楊迪 (Yang Di), senior data development engineer, Analysis and Decision Support Department, 58同城; 楊琛 (Yang Chen), senior data development engineer, Analysis and Decision Support Department, 58同城; 曹德嵩 (Cao Desong), principal data development engineer, Analysis and Decision Support Department, 58同城.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Original article: ","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/09R_gmHdTSY_QAvmftX4og","title":""},"content":[{"type":"text","text":"ClickHouse的實踐之路","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}