Data Architecture: Thoughts on Hot/Cold Data Separation in Practice

Series:

[Data Architecture: Concepts and Hot/Cold Data Separation](https://xie.infoq.cn/article/8cb17c785305b9228bc92fea1)

# 1 Overview

The previous article, [Data Architecture: Concepts and Hot/Cold Data Separation](https://xie.infoq.cn/article/8cb17c785305b9228bc92fea1), introduced the concept and purpose of data architecture and raised the problem of separating hot and cold data. This is not a new idea; companies began putting it into practice long ago. Microsoft Azure offers hot and cool tiers for Blob storage, and Alibaba Cloud offers OTS (Table Store). Both are intended to provide hot/cold storage at the cloud-service level. Even with these tools available, how to implement hot/cold separation well still deserves careful thought.

# 2 Core Issues and Cases of Hot/Cold Separation

## 2.1 Key Issues

Back to the topic. Whatever hot/cold storage scheme we choose, even a cloud-hosted one, it still has to sit on some storage medium, and the concrete implementation of hot/cold separation depends directly on that choice. For example, at its simplest, migrating data from hot storage to cold storage takes two steps: (1) write the data to cold storage; (2) delete it from hot storage. How well the deletion step behaves depends heavily on the database chosen.

### 2.1.1 Deleting Large Volumes of Data

Inserting and deleting large volumes of data, especially on large indexed tables, can severely degrade database read/write performance. Moreover, deleting rows does not necessarily free their space right away. In some databases the space is only truly reclaimed after a reorganization pass (for example, `OPTIMIZE TABLE` on a MySQL InnoDB table). This causes a serious problem: without that reorganization, the old data still physically occupies disk space, which can eventually exhaust the disk.

### 2.1.2 Queries That Touch Both Hot and Cold Data

This can be thought of as implementing a routing middle layer: a rule layer has to decide when to query hot data and when to query cold data. Ideally, hot and cold data are queried separately, and cold-data queries are one or more orders of magnitude less frequent (as a share of all queries). When that holds, the split is a reasonable one.

## 2.2 Case Studies

Next, using cases from a few publicly available articles, we look at how hot/cold separation is implemented on different storage solutions and try to analyze what is reasonable and what is not.

### 2.2.1 MySQL

#### 2.2.1.1 Case Overview

[\[Database\] Notes on splitting a MySQL database (hot/cold separation)](https://blog.csdn.net/java_zhangshuai/article/details/80698688)

The case implements the separation by splitting the database: two databases are created, a production database that holds hot data and a history database that holds cold data. The architecture described in the article is shown below:

![MySQL production/history database architecture](https://static001.geekbang.org/infoq/59/59b51d65ef870a6c24a078c41b624e51.png)

#### 2.2.1.2 Data Migration

Migration is usually implemented as a scheduled job, which means the boundary between hot and cold data tends to be drawn at day granularity. It can be made finer if the business requires.

To avoid disturbing regular business, this non-core work should run during off-peak hours: the migration runs in the early morning each day and finishes before the next peak of requests arrives. The job itself needs particular care. Some jobs may depend on the migration having finished, which introduces inter-job dependencies, failure retries, and so on. To guarantee data integrity and consistency, it is also best to verify the migrated data, so that no data is lost and no erroneous data is produced.

#### 2.2.1.3 Queries Across Multiple Data Sources

"Multiple data sources" here means a query that touches both hot and cold data. As described earlier, this ideally should not happen, but in real business it is often unavoidable. It requires that: 1) the system supports queries across the hot and cold databases; 2) when cold-data queries are markedly slower than hot-data queries, query latency is kept as low as possible, ideally with a smaller share of long-tail slow queries. Achieving this means combining a caching strategy with functional limits on query patterns and query ranges, and guiding users and making trade-offs within each specific business.

### 2.2.2 Elasticsearch

[Elasticsearch hot/cold separation: principles and practice](https://elasticsearch.cn/article/13566)

#### 2.2.2.1 Heterogeneous Nodes

Similar to the MySQL hot/cold deployment, Elasticsearch also splits hot and cold data across two groups of nodes (here within a single cluster), but the emphasis is on node heterogeneity. This is in fact a necessary precondition. Simply put, the hot store focuses on real-time read/write capacity and must guarantee performance, and it only needs enough space for the hot data. The cold store must guarantee storage capacity and consistency and can sacrifice some read/write performance; because it holds large volumes of historical data, it needs far more space than the hot store.

"Some nodes are high-performance and store hot data, while others are lower-performance, high-capacity nodes that store cold data. This guarantees the performance of hot data on the one hand and the storage of cold data on the other, while lowering storage costs. This is the basic idea of the Elasticsearch hot/cold architecture."

#### 2.2.2.2 Tagging Nodes as Hot or Cold

Tag each node by adding a setting to its elasticsearch.yml file:

```
node.attr.{attribute}: {value}
```

Here `attribute` is any user-defined attribute name and `value` is this node's value for that attribute. For hot/cold separation, for example, the following settings can be used:

```yaml
# elasticsearch.yml on each hot node
node.attr.temperature: hot

# elasticsearch.yml on each cold node (tagged "warm", as in the referenced article)
node.attr.temperature: warm
```

#### 2.2.2.3 Hot and Cold Index Settings

With hot and cold data separated, and since, as mentioned above, the two serve different scenarios, their indexes can also be designed for their respective scenarios rather than kept identical.

Note the difference from primary/replica databases: hot and cold databases require the same table/collection structure, but their indexes may differ, whereas a replica mirrors its primary, indexes included.

#### 2.2.2.4 Index Lifecycle

Starting with version 6.6, Elasticsearch provides index lifecycle management (ILM), which can be configured through the API or the Kibana UI. Combining ILM with the hot/cold architecture lets us manage index data dynamically.

Quoting the description from [Elasticsearch hot/cold separation: principles and practice](https://elasticsearch.cn/article/13566):

*The index lifecycle is divided into four phases: `Hot phase`, `Warm phase`, `Cold phase`, and `Delete phase`.*

- *Hot phase: in this phase, the index's document count, size, and age determine whether the rollover API is called to roll the index over; for details see [indices-rollover-index](https://www.elastic.co/guide/en/elasticsearch/reference/6.8/indices-rollover-index.html). As it is not closely related to this article, it is not covered further here.*
- *Warm phase: once an index has been rolled over in the hot phase, it enters the warm phase. Indexes entering this phase are set to read-only, and you can set the attribute the index should use; for hot/cold separation, choose `temperature: warm` here. You can also run forceMerge, shrink, and similar operations on the index; see the official documentation for both.*

![](https://static001.geekbang.org/infoq/72/72daa42144b4c8530b37b882551b8b28.png)

- *Cold phase: an index can be configured to enter the cold phase some time after rollover, and an attribute can be set in this phase as well. As the hot/cold architecture shows, the temperature attributes are extensible: beyond hot and warm, you can define more tiers such as hot, warm, cold, and freeze. For a three-tier hot/cold setup, specify `temperature: cold` here. This phase also supports freezing the index; see the official documentation for details.*
- *Delete phase: an index can be configured to enter the delete phase some time after rollover; indexes in this phase are deleted automatically.*

# Summary

This article analyzed several hot/cold separation implementations and collected some problems and solutions. Using the MySQL and Elasticsearch implementations, it showed what hot/cold separation has in common across storage solutions and where it differs. Back to fundamentals: **the design ultimately depends on the specific business requirements**. Going forward, the approach still needs to be validated in practice, backed by enough business scenarios and data volume, to confirm its feasibility and surface potential problems, and then continuously refined.
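The two-step migration from section 2.1 (write to cold storage, then delete from hot storage), run as the nightly job described in section 2.2.1.2, can be sketched in Python. This is a minimal sketch under stated assumptions, not the referenced case's implementation: two in-memory SQLite databases stand in for the MySQL production and history databases. The `orders` table, the `created_at` cutoff column, the batch size, and the functions `open_stores` and `migrate_cold_rows` are all hypothetical. Rows move in small primary-key batches so that no single DELETE hammers a large indexed table (section 2.1.1). Each batch is verified before its hot copy is deleted. The hot table carries an extra secondary index that the cold table lacks (section 2.2.2.3).

```python
import sqlite3

# Hypothetical table; the hot (production) and cold (history) stores share it.
SCHEMA = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    amount INTEGER NOT NULL,
    created_at TEXT NOT NULL  -- ISO date such as '2020-01-31'
)
"""
COLUMNS = "id, customer, amount, created_at"


def open_stores():
    """Open two separate databases standing in for production and history."""
    hot = sqlite3.connect(":memory:")
    cold = sqlite3.connect(":memory:")
    for conn in (hot, cold):
        conn.execute(SCHEMA)
    # Same table structure in both stores; only the hot store carries a
    # secondary index for its frequent per-customer lookups.
    hot.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
    return hot, cold


def migrate_cold_rows(hot, cold, cutoff, batch_size=500):
    """Move rows with created_at < cutoff from hot to cold; return the count."""
    moved, last_id = 0, 0
    while True:
        rows = hot.execute(
            f"SELECT {COLUMNS} FROM orders WHERE created_at < ? AND id > ? "
            "ORDER BY id LIMIT ?",
            (cutoff, last_id, batch_size),
        ).fetchall()
        if not rows:
            return moved
        ids = [row[0] for row in rows]
        marks = ",".join("?" * len(ids))

        # Step 1: write the batch to cold storage. INSERT OR IGNORE makes a
        # rerun safe if an earlier run stopped between step 1 and step 2.
        with cold:
            cold.executemany(
                f"INSERT OR IGNORE INTO orders ({COLUMNS}) VALUES (?, ?, ?, ?)", rows
            )

        # Consistency check: the cold copy must match the hot rows exactly.
        copied = cold.execute(
            f"SELECT {COLUMNS} FROM orders WHERE id IN ({marks}) ORDER BY id", ids
        ).fetchall()
        if copied != rows:
            raise RuntimeError(f"verification failed for ids {ids[0]}..{ids[-1]}")

        # Step 2: delete from hot storage, one small transaction per batch,
        # so no single DELETE holds locks on the big indexed table for long.
        with hot:
            hot.execute(f"DELETE FROM orders WHERE id IN ({marks})", ids)
        moved += len(rows)
        last_id = ids[-1]


if __name__ == "__main__":
    hot, cold = open_stores()
    with hot:
        hot.executemany(
            f"INSERT INTO orders ({COLUMNS}) VALUES (?, ?, ?, ?)",
            [(i, f"c{i % 3}", i * 10, "2020-01-%02d" % (i % 28 + 1)) for i in range(1, 101)],
        )
    print("moved:", migrate_cold_rows(hot, cold, cutoff="2020-01-15", batch_size=7))
    print("left in hot:", hot.execute("SELECT COUNT(*) FROM orders").fetchone()[0])
```

With the sample data, 55 of the 100 rows fall before the cutoff and are moved. Against real MySQL the two stores are separate servers, so there is no shared transaction. The verify-then-delete order and the idempotent insert are what let a failed nightly run simply be retried. In MySQL the equivalent of `INSERT OR IGNORE` would be `INSERT IGNORE` or `INSERT ... ON DUPLICATE KEY UPDATE`.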
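The rule layer described in sections 2.1.2 and 2.2.1.3 can be sketched as a small router. From a query's time range it decides whether to hit the hot store, the cold store, or both, and concatenates the results. This is an illustrative sketch only. The `HotColdRouter` class, the 30-day hot window, and the 365-day cap on cold-range queries are hypothetical. Each store is modeled as a plain function over a half-open date range rather than a real database client. The cap shows one way to "limit the query range" on the slower store.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, List, Tuple

# A store is modeled as a function (start, end) -> rows over [start, end).
# In practice it would wrap a query against the production or history database.
Store = Callable[[date, date], list]


@dataclass
class HotColdRouter:
    hot: Store
    cold: Store
    hot_days: int = 30             # hypothetical: the last 30 days stay hot
    max_cold_span_days: int = 365  # hypothetical cap on cold-range queries

    def boundary(self, today: date) -> date:
        """First day that still lives in the hot store."""
        return today - timedelta(days=self.hot_days)

    def plan(self, start: date, end: date, today: date) -> List[Tuple[str, date, date]]:
        """Split [start, end) into the sub-ranges each store must serve."""
        if start >= end:
            return []
        cut = self.boundary(today)
        parts: List[Tuple[str, date, date]] = []
        if start < cut:  # some of the range is cold
            cold_end = min(end, cut)
            # Limit the query range on the slow store.
            if (cold_end - start).days > self.max_cold_span_days:
                raise ValueError("cold range too wide; narrow the query")
            parts.append(("cold", start, cold_end))
        if end > cut:  # some of the range is hot
            parts.append(("hot", max(start, cut), end))
        return parts

    def query(self, start: date, end: date, today: date) -> list:
        """Run each sub-query against its store and concatenate the rows."""
        rows: list = []
        for name, sub_start, sub_end in self.plan(start, end, today):
            store = self.hot if name == "hot" else self.cold
            rows.extend(store(sub_start, sub_end))
        return rows
```

For example, with `today = date(2020, 3, 1)` the boundary is 2020-01-31. A query over January and February is split into a cold part `[2020-01-01, 2020-01-31)` and a hot part `[2020-01-31, 2020-03-01)`. In a real deployment the boundary has to agree with the cutoff of the last *completed* migration run. Otherwise, rows that are past the boundary but not yet migrated would be missed by both sub-queries.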
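Once nodes are tagged with `node.attr.temperature` as in section 2.2.2.2, indices are pinned to a tier through the index-level shard allocation filter `index.routing.allocation.require.temperature`. The sketch below builds two request bodies in Python. The first is for the legacy index template API (`PUT _template/<name>`, as in 6.x/7.x); it places new indices on hot nodes and attaches an ILM policy. The second is a settings update (`PUT <index>/_settings`) that moves an existing index to warm nodes by hand. The cluster address, the index pattern `logs-*`, the alias `logs`, and the policy name `hot_warm_cold` are hypothetical. The `put_json` helper uses only the standard library, so no Elasticsearch client is assumed.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical cluster address


def hot_template(pattern="logs-*", alias="logs", policy="hot_warm_cold"):
    """Body for PUT _template/<name>: new indices start on hot nodes."""
    return {
        "index_patterns": [pattern],
        "settings": {
            # Shards may only be allocated to nodes with node.attr.temperature: hot
            "index.routing.allocation.require.temperature": "hot",
            # Hand the index to an ILM policy that rolls over through this alias
            "index.lifecycle.name": policy,
            "index.lifecycle.rollover_alias": alias,
        },
    }


def move_to_tier(tier):
    """Body for PUT <index>/_settings: relocate an existing index's shards."""
    return {"index.routing.allocation.require.temperature": tier}


def put_json(path, body):
    """Send a JSON body with HTTP PUT and return the parsed response."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Against a running cluster, usage would look like `put_json("/_template/logs_hot", hot_template())`, and later `put_json("/logs-000001/_settings", move_to_tier("warm"))` to move one index by hand. For rollover to work, the first index (e.g. `logs-000001`) must also be created with the `logs` alias marked as its write index. Changing the `require` filter is exactly what ILM's `allocate` action automates.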
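The four phases quoted in section 2.2.2.4 correspond to the body of an ILM policy for `PUT _ilm/policy/<name>`. This sketch builds such a body in Python. The thresholds (rollover at 1 day or 50 GB, warm after 7 days, cold after 30, delete after 90) are hypothetical defaults, not values from the referenced article. The `allocate` actions use the `temperature` node attribute from section 2.2.2.2. The `freeze` action matches the 6.x/7.x documentation the article points to but was deprecated in later releases, so check the documentation for your version.

```python
import json


def hot_warm_cold_policy(rollover_age="1d", rollover_size="50gb",
                         warm_after="7d", cold_after="30d", delete_after="90d"):
    """Body for PUT _ilm/policy/<name> covering hot, warm, cold and delete."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # Roll over to a new index by age or size (via the write alias)
                        "rollover": {"max_age": rollover_age, "max_size": rollover_size}
                    }
                },
                "warm": {
                    "min_age": warm_after,  # counted from rollover
                    "actions": {
                        "readonly": {},
                        # Move shards to nodes tagged node.attr.temperature: warm
                        "allocate": {"require": {"temperature": "warm"}},
                        "forcemerge": {"max_num_segments": 1},
                        "shrink": {"number_of_shards": 1},
                    },
                },
                "cold": {
                    "min_age": cold_after,
                    "actions": {
                        # Third tier: nodes tagged node.attr.temperature: cold
                        "allocate": {"require": {"temperature": "cold"}},
                        "freeze": {},
                    },
                },
                "delete": {
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


if __name__ == "__main__":
    # Paste the output after `PUT _ilm/policy/hot_warm_cold` in Kibana Dev Tools.
    print(json.dumps(hot_warm_cold_policy(), indent=2))
```

This sketch assumes a three-tier cluster. In a two-tier hot/warm cluster like the one in section 2.2.2.2, no node carries `temperature: cold`, so the cold-phase `allocate` action could not complete. In that case drop it, or point it at `warm`.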