深度解讀:Kafka放棄ZooKeeper,消息系統興起二次革命

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"作者 | Tina"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"採訪嘉賓 | 韓欣、王國璋"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"“我對該版本感到非常興奮,但我們的業務特性決定了我們不能停機升級...”"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"3月30日,Kafka背後的企業Confluent發佈博客表示,在即將發佈的2.8版本里,用戶可在完全不需要ZooKeeper的情況下運行Kafka,該版本將"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"依賴於ZooKeeper的控制器改造成了基於Kafka Raft的Quorm控制器。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在之前的版本中,如果沒有"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"ZooKeeper"},{"type":"text","text":",Kafka將無法運行。"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"但"},{"type":"text","text":"管理部署兩個不同的系統不僅讓運維複雜度翻倍,還讓Kafka變得沉重,進而限制了Kafka在輕量環境下的應用,同時"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"ZooKeeper的分區特性也限制了"},{"type":"text","text":"Kafka的"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"承載能力"},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"從2019年起,Confluent就開始策劃更換掉"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"ZooKeeper。這是"},{"type":"text","text":"一項相當大的工程,經過九個多月的開發,KIP-500 代碼的早期訪問已經提交到trunk中。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"第一次,用戶可以在沒有 ZooKeeper 的情況下運行 Kafka。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"這是一次架構上的重大升級,讓一向“重量級”的Kafka從此變得簡單了起來。輕量級的單進程部署可以作爲 ActiveMQ 或 RabbitMQ 等的替代方案,同時也適合於邊緣場景和使用輕量級硬件的場景。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"爲什麼要拋棄使用了十年的ZooKeeper"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"ZooKeeper是Hadoop的一個子項目,一般用來管理較大規模、結構複雜的服務器集羣,具有自己的配置文件語法、管理工具和部署模式。Kafka最初由LinkedIn開發,隨後於2011年初開源,2014年由主創人員組建企業Confluent。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Broker是Kafka集羣的骨幹,負責從生產者(producer)到消費者(consumer)的接收、存儲和發送消息。在當前架構下,Kafka進程在啓動的時候需要往ZooKeeper集羣中註冊一些信息,比如BrokerId,並組建集羣。ZooKeeper 爲Kafka提供了可靠的元數據存儲,比如Topic\/"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"分區的元數據、Broker 數據、ACL信息等等"},{"type":"text","text":"。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"同時"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"ZooKeeper"},{"type":"text","text":"充當Kafka的領導者,以更新集羣中的拓撲更改;根據"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"ZooKeeper"},{"type":"text","text":"提供的通知,生產者和消費者發現整個Kafka集羣中是否存在任何新Broker或Broker失敗。大多數的運維操作,比如說擴容、分區遷移等等,都需要和"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"ZooKeeper"},{"type":"text","text":"交互。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"也就是說,Kafka代碼庫中有很大一部分是負責實現在集羣中多個 Broker 之間分配分區(即日誌)、分配領導權、處理故障等分佈式系統的功能。而早已經過業界廣泛使用和驗證過的ZooKeeper 是分佈式代碼工作的關鍵部分。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"假設沒有ZooKeeper的話,Kafka甚至無法啓動進程。騰訊雲中間件-微服務產品中心技術總監韓欣對InfoQ說,“在以前的版本中,"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"ZooKeeper可以說是Kafka集羣的靈魂。”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"但嚴重依賴ZooKeeper,也給Kafka帶來了掣肘。Kafka 一路發展過來,繞不開的兩個話題就是集羣運維的複雜度以及單集羣可承載的分區規模,韓欣表示,比如騰訊雲 Kafka 維護了上萬節點的 Kafka 集羣,主要遇到的問題也還是這兩個。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"首先從集羣運維的角度來看,Kafka 本身就是一個分佈式系統。但它又依賴另一個開源的分佈式系統,而這個系統又是Kafka 系統本身的核心。這就要求集羣的研發和維護人員需要同時瞭解這兩個開源系統,需要對其運行原理以及日常的運維(比如參數配置、擴縮容、監控告警等)都有足夠的瞭解和運營經驗。否則在集羣出現問題的時候無法恢復,是不可接受的。所以,ZooKeeper的存在增加了運維的成本。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"其次從集羣規模的角度來看,限制 Kafka 集羣規模的一個核心指標就是集羣可承載的分區數。集羣的分區數對集羣的影響主要有兩點:ZooKeeper上存儲的元數據量和控制器變動效率。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Kafka集羣依賴於一個單一的Controller節點來處理絕大多數的ZooKeeper讀寫和運維操作,並在本地緩存所有ZooKeeper上的元數據。"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"分區數增加,ZooKeeper上需要存儲的元數據就會增加,從而加大ZooKeeper的負載,給ZooKeeper集羣帶來壓力,可能導致Watch的延時或丟失。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"當Controller節點出現變動時,需要進行Leader切換、Controller節點重新選舉等行爲,分區數越多需要進行越多的ZooKeeper操作:"},{"type":"text","text":"比如當一個Kafka節點關閉的時候,Controller需要通過寫ZooKeeper將這個節點的所有Leader分區遷移到其他節點;"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"新的Controller節點啓動時,"},{"type":"text","text":"首先需要將所有ZooKeeper上的元數據讀進本地緩存"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":",分區越多,數據量越多,故障恢復耗時也就越長。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Kafka 單集羣可承載的分區數量對於一些業務來說,又特別重要。韓欣舉例補充道,“騰訊雲 Kafka 主要爲公有云用戶以及公司內部業務提供服務。我們遇到了很多需要支持百萬分區的用戶,比如騰訊雲Serverless、騰訊雲的CLS日誌服務、雲上的一些客戶等,他們面臨的場景是一個客戶需要一個topic來進行業務邏輯處理,當用戶量達到百萬千萬量級的情況下,topic帶來的膨脹是非常恐怖的。在當前架構下,Kafka 單集羣無法穩定承載百萬分區穩定運行。這也是我對新的KIP-500版本感到非常興奮的原因。”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"去除ZooKeeper後的Kafka"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https:\/\/static001.geekbang.org\/infoq\/55\/559144e8a57729a28dcc5ba34b11077e.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"爲了改善Kafka,去年起Confluent就開始重寫"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"ZooKeeper功能,將這部分代碼集成到了Kafka內部。他們"},{"type":"text","text":"將新版本稱爲“ Kafka on Kafka”,意思是將元數據存儲在Kafka本身,而不是存儲ZooKeeper這樣的外部系統中。Quorum 控制器使用新的 KRaft 協議來確保元數據在仲裁中被精確地複製。這個協議在很多方面與 ZooKeeper 的 ZAB 協議和 Raft 相似。這意味着,仲裁控制器在成爲活動狀態之前不需要從 ZooKeeper 加載狀態。當領導權發生變化時,新的活動控制器已經在內存中擁有所有提交的元數據記錄。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"去除ZooKeeper後,Kafka集羣的運維複雜性直接減半。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在架構改進之前,一個最小的分佈式Kafka集羣也需要六個異構的節點:三個ZooKeeper節點,三個Kafka節點。而一個最簡單的Quickstart演示也需要先啓動一個ZooKeeper進程,然後再啓動一個Kafka進程。在新的KIP-500版本中,一個分佈式Kafka集羣只需要三個節點,而Quickstart演示只需要一個Kafka進程就可以。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"改進後同時提高了集羣的延展性(scalability),大大增加了 Kafka 單集羣可承載的分區數量。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在此之前,元數據管理一直是集羣範圍限制的主要瓶頸。特別是在集羣規模比較大的時候,如果出現"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"Controller節點失敗涉及到的選舉、L"},{"type":"text","text":"eader分區遷移"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":",以及"},{"type":"text","text":"將所有ZooKeeper的元數據讀進本地緩存的操作,所有這些操作都會受限於單個Controller的讀寫帶寬。因此"},{"type":"text","marks":[{"type":"strong"}],"text":"一"},{"type":"text","text":"個Kafka集羣可以管理的分區總數也會受限於這單個Controller的效率。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Confluent流數據部門首席工程師王國璋解釋道:“在KIP-500中,我們用一個Quorum Controller來代替和ZooKeeper交互的單個Controller,這個Quorum裏面的每個Controller節點都會通過Raft機制來備份所有元數據,而其中的Leader在寫入新的元數據時,也可以藉由批量寫入(batch writes)Raft日誌來提高效率。我們的"},{"type":"link","attrs":{"href":"https:\/\/t.co\/SNQNtJ4xqb?amp=1","title":null,"type":null},"content":[{"type":"text","text":"實驗表明"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",在一個可以管理兩百萬個分區的集羣中,Quorum Controller的遷移過程可以從幾分鐘縮小至三十秒。”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"升級是否需要停機?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"減少依賴、擴大單集羣承載能力,"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"這肯定是一個很積極的改變方向。雖然目前的版本還未經過大流量檢驗,可能存在穩定性問題,這也是讓廣大開發者擔心的一個方面。但從長期意義上來講,KIP-500對社區和用戶都是一個很大的福音。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"當版本穩定之後,最終大家就會涉及到“升級”的工作。但如何升級,卻成了一個新的問題,"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"在很多Kafka的使用場景中,是不允許業務停機的。"},{"type":"text","text":"韓欣拿騰訊雲的業務舉例說,“"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"微信安全業務的消息管道就使用了騰訊雲的Kafka,假設發生停機,那麼一些自動化的安全業務就會受到影響,從而"},{"type":"text","text":"嚴重影響客戶體驗。"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"“"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}},{"type":"strong"}],"text":"就我們的經驗而言,停機升級,在騰訊雲上是一個非常敏感的詞"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"。"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}},{"type":"strong"}],"text":"騰訊雲 Kafka 至今爲止已經運營了六七年,服務了內外部幾百家大客戶,還未發生過一次停機升級的情況"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"。如果需要停機才能升級,那麼對客戶的業務肯定會有影響的,影響的範圍取決於客戶業務的重要性。從雲服務的角度來看,任何客戶的業務的可持續性都是非常重要的、不可以被影響的。”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"對於Confluent來講,提供不需要停機的平滑升級方案是一件非常有必要的事情。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"據王國璋介紹,“目前的設計方案是,在2.8版本之後的3.0版本會是一個特殊的搭橋版本(bridge release),在這個版本中,Quorum Controller會和老版本的基於ZooKeeper的Controller共存,而在之後的版本我們纔會去掉舊的Controller模塊。”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"對於用戶而言,這意味着如果想要從2.8版本以下升級到3.0以後的某一個版本,比如說3.1,則需要藉由3.0版本實現兩次“跳躍”,也就是說先在線平滑升級到3.0,然後再一次在線平滑升級到3.1。並且在整個過程中,Kafka服務器端都可以和各種低版本的客戶端進行交互,而不需要強制客戶端的升級。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"而像騰訊這樣的企業,也會"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#333333","name":"user"}}],"text":"持續採用灰度的方式來進行業務的升級驗證,韓欣說,“一般不會在存量集羣上做大規模升級操作,而是會採用新建集羣的方式,讓一些有迫切訴求的業務先切量進行灰度驗證,在保證線上業務穩定運行的情況下,逐步擴展新集羣的規模,這樣逐步將業務升級和遷移到新的架構上去。”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"消息系統掀起二次革命?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Apache Kafka出現之後,很快擊敗其它的消息系統,成爲最主流的應用。從2011年啓動,經過十年發展,得到大規模應用之後,爲什麼現在又決定用“Raft協議”替換ZooKeeper呢?對此,王國璋回覆InfoQ說,Raft是近年來很火的共識算法,但在Kafka設計之初(2011年),不僅僅是Raft方案,就連一個成熟通用的共識機制(consensus)代碼庫也不存在。當時最直接的設計方案就是基於ZooKeeper這樣一個高可用的同步服務項目。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在這十年裏,Hadoop生態中的不少軟件都在被逐漸拋棄。如今,作爲Hadoop生態中的一員,ZooKeeper也開始過時了?韓欣給予了否認意見,“每一種架構或者軟件都有其適合的應用場景,我不認可過時這個詞。”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"從技術的角度來看,歷史的車輪在不斷向前滾動,學術界和工業界的理論基礎一直在不斷進化,技術也要適應不斷革新的業務不停去演進。不否認有一些軟件會被一些新的軟件所替代,或者說一些新的軟件會更適合某些場景。比如流計算領域,Storm、Spark、Filnk的演進。但是合適的組件總會出現在合適的地方,這就是架構師和研發人員的工作和責任。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Kafka發展至今,雖然其體系結構不斷被改進,比如引入自動縮放、多租戶等功能,來滿足用戶發展的需求,但針對這次大的改進,且還存在需要驗證的現狀,網友在HackerNews上提出了一個靈魂發問:“如果現在還要設計一個"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"新系統"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",那麼是什麼理由選擇Kafka而不是Pulsar?”"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"Confluent的科林·麥凱布(Colin McCabe)迴應這個爭議說,起碼去掉ZooKeeper對Pulsar來說是一個艱鉅的挑戰。Kafka去除ZooKeeper依賴是個很大的賣點,意味着Kafka只有一個組件Broker,而Pulsar則需要ZooKeeper、Bookie、Broker(或者proxy)等多個組件。但也正因爲 抽離出一層存儲層(Bookie),使得後起之秀的Pulsar在架構上天然具有了“計算存儲分離”的優勢。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"總的來說,在企業加速上雲的背景下,無論是Kafka還是Pulsar,消息系統必須是要適應雲原生的大趨勢的,實現計算和存儲分離的功能也是Kafka下一步的策略。Confluent在另一個KIP-405版本中,實現了一個分層式的"},{"type":"link","attrs":{"href":"https:\/\/cwiki.apache.org\/confluence\/display\/KAFKA\/KIP-405%3A+Kafka+Tiered+Storage","title":null,"type":null},"content":[{"type":"text","text":"存儲模式"}],"marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}]},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",利用在雲架構下多種存儲介質的實際情況,將計算層和存儲層分離,將“冷數據”和“熱數據”分離,使得Kafka的分區擴容、縮容、遷移等等操作更加高效和低耗,同時也使Kafka可以在理論上長時間保留數據流。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":"在發展趨勢上,雲原生的出現對消息系統的影響是比較大的,比如容器化和大規模雲盤,爲原本在單集羣性能和堆積限制方面存在上限問題的Kafka,在突破資源瓶頸這裏帶來了新的思路。Broker的容器化,堆積的消息用大規模的雲盤,再加上KRaft去掉了ZooKeeper給Kafka帶來的元數據管理方面的限制,是否是給Kafka帶來了二次的消息系統革命?容器化的消息系統是否會帶來運營運維方面更多的自動化的能力?Serverless是否是消息系統的未來趨勢?雲和雲原生的發展,給傳統的消息系統安上了新的翅膀,帶來了新的想象空間。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"採訪嘉賓:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"韓欣"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",騰訊雲中間件-微服務產品中心技術總監,微服務平臺 TSF、消息隊列 CKafka \/ TDMQ、微服務觀測平臺 TSW 等中間件產品的負責人。負責中間件相關產品的規劃,架構和落地實施,有超過十三年的研發架構經驗。目前關注在雲計算中間件相關領域,致力於整合 PaaS 技術資源,構建基於微服務的技術中臺,爲企業的數字化轉型提供基礎支持。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}},{"type":"strong"}],"text":"王國璋"},{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":",現就職於Confluent,任流數據部門首席工程師。Apache Kafka項目管理委員會成員(PMC),Kafka Streams作者。分別於復旦大學計算機系和美國康奈爾大學計算機系取得學士和博士學位,主要研究方向爲數據庫管理和分佈式數據系統。此前曾就職於 LinkedIn 數據架構組任高級工程師,主要負責實時數據處理平臺,包括 Apache Kafka 和 Apache Samza 系統的開發與維護。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#494949","name":"user"}}],"text":" "}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章