The Evolution of the Multi-Datacenter Architecture for ByteDance's 100,000-Node HDFS Cluster

2021-07-20 07:03

Component	Multi-datacenter scheme
ZooKeeper	A ZK ensemble consists of 5 servers spread across 3 datacenters in the ratio A:B:C = 2:2:1
BookKeeper	A BK cluster typically consists of 14 servers spread across 2 datacenters in a 1:1 ratio
DanceNN	A NameService contains 5 DanceNN instances spread across 2 datacenters in a 3:2 ratio, operating as 1 active + 4 standby

In terms of implementation, the key piece is the datacenter write policy for the DanceNN editlog: if the editlog cannot be kept in sync during an active/standby switchover, the service becomes unavailable. Thanks to BookKeeper's datacenter-aware data placement, DanceNN can build its dual-datacenter disaster recovery scheme on the following policies.

- Under normal conditions, the editlog is stored in BookKeeper with 4 replicas, split 1:1 across the two datacenters (2 replicas in each).
- In a disaster scenario, DanceNN can quickly switch to single-datacenter mode. The editlog is still stored with 4 replicas, but the placement policy changes to single-datacenter storage, and historical editlog entries can still be consumed normally.

#### Auxiliary systems

The sections above cover the core design of the HDFS dual-datacenter scheme. In practice, rolling out such a scheme takes more than architectural iteration; it also needs a set of auxiliary systems to support it, including:

- Balancer: must be aware of datacenter placement
- Mover: must ensure that the actual data placement satisfies the multi-datacenter policy
- Operations system
- Under the federation architecture, failover must remain efficient across multiple nameservices
- Operational runbooks: anticipate likely failures in advance and make the response executable from the operations system
- A smooth transition plan for business workloads that keeps disruption to a minimum

For reasons of space, this article does not go into detail on these; interested readers are welcome to reach out to discuss.

### Multi-datacenter

The HDFS multi-datacenter architecture extends the dual-datacenter architecture. Its direct motivation was a shortage of datacenter resources: in 2020, for example, datacenter B had almost no resource supply, while the company's new primary datacenter C had relatively ample resources. We first tried to serve datacenter C as an independent cluster, but the data lineage between business workloads proved too complex and the migration cost too high. We therefore chose to extend the dual-datacenter scheme to multiple datacenters. The scheme had to meet the following requirements:

- Use cross-datacenter bandwidth sensibly
- Remain compatible with the existing dual-datacenter scheme
- Keep migration cost as low as possible
- Meet ByteDance's datacenter-level disaster recovery standard

The final design is:

- The data placement policy supports multiple datacenters while remaining compatible with the existing dual-datacenter placement policy.
- The NameNode disaster recovery policy is unchanged, because under the multi-datacenter architecture HDFS still only guarantees tolerance of a failure confined to a single datacenter.

The auxiliary systems were adjusted accordingly. Although HDFS itself can place data across more than two datacenters, in offline scenarios users may only choose 2 datacenters for storage, for example A/B, B/C, or A/C. This operational choice was made after weighing stability, sensible bandwidth usage, and efficient resource utilization; the core goal remains to keep the business running smoothly. In hindsight, it has proven to be the right choice.

## Conclusion

Based on our admittedly incomplete survey, ByteDance's HDFS multi-datacenter architecture follows its own distinctive path in the industry. The main reason is that the company's rapid business growth and its approach to datacenter construction are themselves unusual, and these factors drove HDFS to evolve in its own way. The results have met expectations. In 2020, for example, datacenter C was put to full use, which kept the business stable even though datacenter B had no resource supply. During the 2021 Spring Festival Gala campaign, the architecture provided multi-datacenter disaster recovery guarantees for nearline services such as ByteMQ and streaming CheckPoint.

Finally, the HDFS multi-datacenter architecture is still evolving. In the medium to long term, more new datacenters may well come online, which poses further challenges: the assumptions underlying the original multi-datacenter scheme will no longer hold. The HDFS team has therefore already started iterating on the relevant features. Stay tuned!

This article is reprinted from: ByteDance Technical Team (ID: toutiaotechblog)
Original link: https://mp.weixin.qq.com/s/_AumiAx9wbpOZCGMdf0Arw
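The two editlog placement modes described in the DanceNN discussion (4 replicas split 1:1 across two datacenters in normal operation, and 4 replicas in one datacenter during failover) can be sketched roughly as below. This is only an illustration under stated assumptions, not the BookKeeper or DanceNN API: the `Bookie` class, the `place_editlog_replicas` function, and the server names are all hypothetical, and replica selection is simplified to taking the first available servers in each datacenter.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Bookie:
    """Hypothetical model of a storage server tagged with its datacenter."""
    name: str
    datacenter: str


def place_editlog_replicas(
    bookies: List[Bookie],
    replicas: int = 4,
    single_dc: Optional[str] = None,
) -> List[Bookie]:
    """Pick `replicas` servers for one editlog segment (hypothetical helper).

    Normal mode (single_dc is None): split replicas evenly across 2 datacenters.
    Failover mode: place every replica in the datacenter named by `single_dc`.
    """
    # Group the candidate servers by datacenter.
    by_dc: Dict[str, List[Bookie]] = {}
    for bookie in bookies:
        by_dc.setdefault(bookie.datacenter, []).append(bookie)

    if single_dc is not None:
        candidates = by_dc.get(single_dc, [])
        if len(candidates) < replicas:
            raise ValueError(f"datacenter {single_dc} has too few servers")
        return candidates[:replicas]

    if len(by_dc) != 2 or replicas % 2 != 0:
        raise ValueError("normal mode expects 2 datacenters and an even replica count")

    per_dc = replicas // 2
    chosen: List[Bookie] = []
    for dc in sorted(by_dc):
        if len(by_dc[dc]) < per_dc:
            raise ValueError(f"datacenter {dc} has too few servers")
        chosen.extend(by_dc[dc][:per_dc])
    return chosen


# 14 servers split 1:1 across datacenters A and B, matching the table's BK cluster.
BOOKIES = [Bookie(f"bk-{dc.lower()}{i}", dc) for dc in ("A", "B") for i in range(7)]

normal = place_editlog_replicas(BOOKIES)
failover = place_editlog_replicas(BOOKIES, single_dc="A")

print(Counter(b.datacenter for b in normal))    # Counter({'A': 2, 'B': 2})
print(Counter(b.datacenter for b in failover))  # Counter({'A': 4})
```

The replica count stays at 4 in both modes; only the datacenter constraint changes, which is why a switch to single-datacenter mode does not require a different replication factor.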