字節跳動10萬節點HDFS集羣多機房架構演進之路

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"背景"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"現狀"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HDFS 全稱是 Hadoop Distributed File System,其本身是 Apache Hadoop 項目的一個模塊,作爲大數據存儲的基石提供高吞吐的海量數據存儲能力。自從 2006 年 4 月份發佈以來,HDFS 目前依然有着非常廣泛的應用,以字節跳動爲例,隨着公司業務的高速發展,目前 HDFS 服務的規模已經到達“雙 10”的級別:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"單集羣節點 10 萬臺級別"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"單集羣數據量達到 10EB 級別"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"主要使用場景包括"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"離線"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"OLAP 查詢引擎存儲底座,包括 Hive\/ClickHouse\/Presto 等場景"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"機器學習離線訓練數據"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"近線"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ByteMQ"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"流式任務 Checkpoint"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"業界很多公司在維護 HDFS 服務時,採用的都是小集羣模式,即生產上部署多個隔離獨立的 HDFS 集羣滿足業務的不同需求。字節跳動採用的是橫跨多個機房的聯邦大集羣部署模式,即 HDFS 只有一個集羣,這個集羣有多個 nameservice,但是底層的 DN 是橫跨 A\/B\/C 3 個機房的 ,由於社區版 HDFS 沒有機房感知相關的支持,因此字節跳動 HDFS 團隊在這個功能上做了專門的設計和實現,本文會介紹這部分的工作。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"動機"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"業務的迅猛發展和業務場景的多樣性給 HDFS 帶來了很大的挑戰,這裏列幾個比較有代表性的問題:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如何在容量上滿足業務的發展需求"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如何滿足近線場景對低延遲的需求"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如何滿足關鍵業務的機房級別容災需求"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如何高效運維如此超大規模的集羣"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"要回答這些問題需要 HDFS 從多個方向迭代優化,例如 DanceNN 的上線、運維平臺的建設等,本文不會介紹字節跳動 HDFS 所有的演進方案,而是聚焦在 HDFS 多機房架構的演進策略上,它直接回答了上面提到的兩個問題,即:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如何在容量上滿足業務的發展需求:數據如何合理地在多個機房之間存放以便能通過其他機房的資源進行快速擴容?"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如何滿足關鍵業務的容災需求:系統如何滿足核心業務機房級別的容災需求?"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"社區版架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"字節跳動的 HDFS 技術脫胎於 Apache 社區的 HDFS,爲了方便大家理解內部版本的技術發展歷程,本小節我們將先了解一下社區 HDFS 的架構。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/2a\/2a13090e28532842d7ca5f262a6f4c31.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖(1) Apache 社區 HDFS 架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從圖(1) 可以看出,社區 HDFS 從架構上劃分可以分爲 3 部分:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Client:訪問 HDFS 的 client,主要通過 HDFS SDK 和 HDFS 進行交互,HDFS SDK 的實現比較重,很多 IO 處理邏輯都是在 SDK 實現,因此這裏單獨列爲架構的一部分。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"元數據管理:即 NameNode,負責集羣的元數據管理,包括目錄樹和數據塊的位置信息。爲了解決元數據膨脹問題,社區提供了 Federation 的功能,引入了 NameService 的概念,簡單地說,每一個 NameService 提供一個 NameSpace,爲了保證 NameNode 的高可用,一個 NameService 包含多個 NameNode 節點(一般是 2 個),這些 NameNode 節點以一主多備的模式工作。Federation 功能跟多機房架構並沒有必要的關聯,因此接下來討論我們將不會涉及 Federation\/NameService 等概念。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據管理:即 DataNode,負責存放用戶的實際數據,前面提到 NameNode 一個功能是管理數據塊的位置信息,在具體實現上,NameNode 不會持久化這些塊的信息,而是靠 DataNode 主動彙報來維護。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"到目前爲止,HDFS 集羣的多機房架構相關的方案基本都是元數據層完成的,因此接下來我們的討論將會聚焦在元數據部分。在本文剩餘篇幅裏,除非特別聲明,否則相關術語都是指字節跳動版的 HDFS。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"字節版架構"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/d9\/d97c710493ad60ca2c1ab7c3dcd4e873.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖(2) 字節跳動 HDFS 架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"注:由於 BookKeeper 自身的架構設計,NameNode(DanceNN)實際上是需要通過 ZooKeeper 去發現 BookKeeper 的 endpoint 信息的,這裏爲了方便理解,沒有把這部分通信關係畫出來。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對比圖(1) 和 圖(2), 我們可以發現,字節跳動的 HDFS 依然保留了社區 HDFS 的核心架構,同時也加入了一些特有的功能,包括:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DanceNN,即字節跳動用 C++重新實現的 NameNode,協議上基本兼容社區版的 NameNode。除非特別說明,否則後面出現 DanceNN、NameNode 均指代 DanceNN。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"NNProxy,即 NameNode Proxy,爲 Federation 功能提供統一的 Namespace,由於跟多機房架構直接關係不大,這裏不再詳細展開。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"BookKeeper, 即 Apache BookKeeper,其作用是跟社區的 JournaNode 是一樣的,就是爲 Active 和 Standby NameNode 提供一個共享的 editlog 存儲方案,這是實現 NameNode 的 HA 方法的基礎。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"值得一提的是,BookKeeper 本身提供了機房級別的保存配置策略,這是 HDFS 多機房容災方案的基礎,這個特性確保了 HDFS NameNode 提供跨機房容災能力,後面我們將繼續深入討論。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"演進"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"雙機房"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面提到當前 HDFS 的大集羣是橫跨 A\/B\/C 的多機房模式,具體的演進順序是 A -> A,B -> A,B,C ,現在也保持了直接擴展到更多機房的能力。本小節將着重介紹 A -> A,B 的雙機房演進過程,後面的多機房架構的設計思想主要還是雙機房架構的擴展。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"數據放置"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/c0\/c044b473fd1cec7f86418cf79d631146.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖(3) 字節跳動 HDFS 雙機房 DataNode 結構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HDFS 雙機房數據放置方案在設計上總結起來可以描述如下:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A\/B 機房的 DN 直接跨機房組成一個雙機房集羣,向相同的 NameNode 彙報。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每一個文件在寫的時候會保證每個機房至少存在一個副本,數據實時寫到兩個機房。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"每個 Client 在讀取文件的時候,優先讀取本機房的副本,避免產生大量的跨機房讀帶寬。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個設計的好處就是存儲層對上層應用屏蔽了集羣細節,計算資源可以直接無感分配。該設計結合離線數據一寫多讀的特點,充分考慮跨機房帶寬的合理使用。"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於寫帶寬一般不會有突發,機房間的離線帶寬可以支撐同步寫的需求,因此數據可以兩個機房同步放置至少一個副本。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"離線查詢容易有大的突發請求,因此需要確保常規狀態下沒有突發的跨機房讀帶寬。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在實現上關鍵是 DanceNN 加入了機房的感知能力,DanceNN 在 client 進行數據操作時加入對機房拓撲的識別,由於 DanceNN 對外的協議沒有改動,因此上層應用不需要做感知改動。"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"容災設計"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面介紹了雙機房架構裏數據放置的設計,它解決了容量擴展的問題,但是並沒有解決機房級別的容災問題,儘管 NameNode 以一主多備的形式實現了高可用,但是所有 NameNode 還是放在一個機房,在字節跳動基礎架構的容災體系裏,是需要做到機房級別的容災。由於 HDFS 的數據已經實現了多機房數據副本的同步寫入,爲了達成容災的目標,只需要把元數據也演進到雙機房架構即可實現機房級別的容災。前面我們所說的 HDFS 的元數據組件其實包含了兩部分,即 NameNode 和 NameNode Proxy(NNProxy),由於 NNProxy 是無狀態的轉發服務,因此元數據的多機房架構我們只需要關注在 NameNode 設計上。"}]},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/24\/24fd531fab3625e416b517965c01be34.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","text":"圖(4) 字節跳動 HDFS NameNode 體系"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從圖(4) 可以看出 NameNode 包含了 3 個關鍵模塊:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Apache ZooKeeper,爲 Apache BookKeeper 提供元數據服務。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Apache BookKeeper,爲 NameNode 的高可用方案提供 EditLog 共享存儲方案。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"DanceNN,即字節跳動自研的高性能 NameNode 實現。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這 3 者構成一個分層的單向依賴關係鏈, DanceNN -> BookKeeper -> ZooKeeper,因此這 3 者可以獨立完成雙機房的容災方案,最終在整體上呈現一個雙機房容災的 NameNode 元數據服務。"}]},{"type":"embedcomp","attrs":{"type":"table","data":{"content":"
組件多機房方案
ZooKeeper一個 ZK ensemble 由 5 臺 server 組成,這 5 臺 server 分佈在 3 個機房,分佈比例爲 A:B:C = 2:2:1
BookKeeper一個 BK cluster 通常由 14 臺 server 組成,分佈在 2 個機房,分佈比例爲 1:1
DanceNN一個 NameService 包含 5 個 DanceNN,這個 5 個 DanceNN 分佈在 2 個機房,分佈比例爲 3:2,工作模式爲 1 active + 4standby"}}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在實現上,這裏面的關鍵就是 DanceNN 的 editlog 機房寫策略,因爲 DanceNN 在做主備切換的時候,如果 editlog 沒法保持同步,那麼服務是不可用的,得益於 BookKeeper 的機房感知的數據放置功能,DanceNN 可以通過這些策略來完成雙機房容災方案。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"常態下,editlog 會以 4 個副本存放到 BookKeeper 上,這 4 個副本的機房分佈比例爲 1:1。"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"容災場景下,DanceNN 可以快速切換成單機房模式,editlog 依然以 4 個副本存放,但是存儲策略變爲單機房存儲,歷史的 editlog 也能正常消費。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"旁路系統"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面已經介紹完了 HDFS 雙機房方案的主體設計,但是事實上一個方案的推進落地除了架構上的迭代演進之外,還需要一系列的旁路系統來配合支持,包括:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Balancer:需要感知機房放置"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Mover:需要保證數據的實際放置滿足多機房策略"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"運維繫統"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在 federation 架構下,多個 nameservice 需要保證切主的效率"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"運維操作預案:提前預判相關可能的故障,並且能在運維繫統上執行"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"業務的平穩過渡方案,儘可能少地減少對業務干擾"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"限於篇幅,本文不會對這些展開細節描述,感興趣的同學可以再交流。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"多機房"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"HDFS 多機房架構是對雙機房架構的擴展,其研發直接動機是機房的資源供應短缺問題,例如 2020 年 B 機房幾乎就沒有資源供應,但是在公司新的主機房 C 卻有較爲充裕的資源。一開始我們是嘗試將 C 機房作爲一個獨立的集羣提供服務,但是發現業務的血緣關係太過複雜,遷移成本太高,因此選擇了基於雙機房機房擴展到多機房的方法,該方案需要滿足這些需求:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"合理使用跨機房帶寬"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"兼容已有的雙機房方案"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"遷移成本儘可能小"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"符合字節跳動的機房級別容災標準"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最終的設計方案爲:"}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據放置策略支持多機房,同時兼容已有的雙機房放置策略"}]}]},{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"NameNode 的容災方案策略不變,因爲在多機房架構下,HDFS 依然只保證一個機房範圍的故障容災"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"相應的旁路系統也做相應的調整,儘管 HDFS 底層提供了數據放置在多個機房的策略,但是在離線場景中,用戶只能選擇 2 個機房存放,例如 A\/B, B\/C,A\/B,這個運營上的策略選擇是綜合考慮了穩定性、帶寬使用的合理性以及資源的合理利用之後確定的,核心目標還是保障業務的平穩發展,從後續實踐下來看,這個策略是一個非常正確的選擇。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"結語"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"根據我們的不完全調研,字節跳動 HDFS 的多機房架構在業界中是有自己獨特的路線,這個中原因主要還是公司業務高速發展和機房建設方向在業界中也是獨樹一幟的,這些因素驅動 HDFS 進行自己獨特迭代演進,從結果來看是達到預期,例如 2020 年 C 機房的充分使用,在 B 機房沒有資源供應的情況下依然保障了業務的平穩;2021 春晚活動,爲近線業務例如 ByteMQ、流式 CheckPoint 等提供了多機房的容災策略保障。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"最後,HDFS 的多機房架構依然在持續迭代,中長期來看,不排除有更多新機房的出現,這些都給 HDFS 多機房架構提出更多的挑戰,原來多機房方案的基礎條件不再具備,因此 HDFS 團隊已經開啓相關功能的迭代,敬請期待!"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本文轉載自:字節跳動技術團隊(ID:toutiaotechblog)"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"原文鏈接:"},{"type":"link","attrs":{"href":"https:\/\/mp.weixin.qq.com\/s\/_AumiAx9wbpOZCGMdf0Arw","title":"xxx","type":null},"content":[{"type":"text","text":"字節跳動10萬節點HDFS集羣多機房架構演進之路"}]}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章