Kafka vs. RocketMQ: A Performance Comparison

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"During Double Eleven (Singles' Day), with the same hardware resources invested, a single topic on our Kafka-based log cluster reached several million TPS, while the core business cluster built on RocketMQ reached only a few hundred thousand TPS for the whole cluster. This observation prompted me to think about the performance differences between the two.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Note: TPS is only one of many performance metrics, and technology selection should weigh many factors. This article does not intend to spend much ink on choosing a message middleware; it focuses on analyzing the design ideas behind the performance of the two.","attrs":{}}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1. File Layout","attrs":{}}]},{"type":"horizontalrule","attrs":{}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.1 Kafka File Layout","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At a high level, Kafka lays out its files as shown in the figure below:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/93/9385b346ae66c58ed46f0caf720a9a7b.png","alt":null,"title":"在這裏插入圖片描述","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the figure shows, the main characteristics of Kafka's file layout are as follows:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Files are organized by topic and partition. Each topic can have multiple partitions, each partition has its own directory, and partitions are replicated: every partition of a topic has a Leader and Followers. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"Kafka's replica assignment places the Leader and Followers of a given partition on different machines, and it tries to spread partition Leaders evenly across brokers","attrs":{}},{"type":"text","text":"; if the distribution becomes unbalanced at runtime, you can run a command to rebalance manually. The Leader handles reads and writes for a partition, while Followers only back up the data.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Load balancing in Kafka depends mainly on how partition Leaders are distributed.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The partition Leader serves reads and writes while Followers synchronize data. If the broker hosting a partition Leader goes down, a failover is triggered and a new Leader is elected from the remaining Followers. The data inflow path is shown in the figure below:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b1/b1607f071e249ca435742e8dde5395e2.png","alt":null,"title":"在這裏插入圖片描述","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When a partition Leader receives a send request from a client, should it respond as soon as the Leader has written the message, or wait until its Followers have written it too? This choice is critical because it directly affects producer latency, so Kafka provides the acks parameter to select the strategy:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"acks = 0: the producer does not wait for any acknowledgment from the broker.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"acks = 1: the Leader responds once it has written the message to its own log, without waiting for Followers.","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"acks = -1 (all)","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With acks = -1, success is returned to the client only after the Leader and all in-sync Followers have received and stored the message.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.2 RocketMQ File Layout","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ's file layout is shown in the figure below:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/81/816ce2c219e25bc835ea5fe8df5465f3.png","alt":null,"title":"在這裏插入圖片描述","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Messages for all topics are written to the commitlog files, and the consume queue files (ConsumeQueue) are then built from the commitlog. Consume queues are organized as /topic/{queue}. From the cluster's point of view, it looks like this:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9b/9b3cf21a5739e496235c40169644d7a2.png","alt":null,"title":"在這裏插入圖片描述","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"By default RocketMQ uses master-slave replication. RocketMQ 4.5 introduced a multi-replica mechanism, but ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"the replication granularity is the commitlog file","attrs":{}},{"type":"text","text":". In the figure above, different master nodes hold completely different data (the data is sharded), while each master and its slaves hold the same data.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"1.3 File Layout Comparison","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka lays out files by topic/partition, with one physical directory per partition, and achieves ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"sequential writes at the partition-file level","attrs":{}},{"type":"text","text":". If a Kafka cluster has hundreds or thousands of topics, each with hundreds of partitions, I/O under highly concurrent writes becomes scattered and effectively behaves like random I/O. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"In other words, as the number of topics and partitions grows, Kafka's write performance first rises and then falls","attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ, by contrast, pursues strictly sequential writes: all messages, regardless of topic, are appended sequentially to the commitlog, so sequentiality is not affected as the number of topics and partitions grows. In my experience, however, on a physical machine with SSDs, writing to a single file cannot make full use of the disk's I/O capacity.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Besides the difference in sequential disk writes, the two layouts also differ in granularity. Expanding the partitions of a Kafka topic involves moving partitions between brokers, which makes scaling a heavy operation. RocketMQ stores data in the commitlog, so scaling out moves no data and only affects new data; RocketMQ's operational cost is therefore lower than Kafka's.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Finally, Kafka's acks parameter is comparable to RocketMQ's synchronous and asynchronous replication.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"acks = 1 corresponds to RocketMQ's asynchronous replication, acks = -1 corresponds to RocketMQ's synchronous replication, and acks = 0 corresponds to RocketMQ's oneway send mode.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2. Data Write Path","attrs":{}}]},{"type":"horizontalrule","attrs":{}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.1 How Kafka Writes Messages","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka writes messages with FileChannel, as the code screenshot below shows:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/59/596eeee35cfb8c2a61792062be7a4db7.png","alt":null,"title":"在這裏插入圖片描述","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"It also uses the transferTo method when writing messages","attrs":{}},{"type":"text","text":". According to material available online, true zero copy for network reads and writes in NIO requires calling FileChannel's transferTo or transferFrom method, which internally relies on the sendfile system call.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.2 How RocketMQ Writes Messages","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RocketMQ supports two write modes, memory mapping and FileChannel, as shown in the figure below:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/a0/a0004266ef3406afe25499c947da252a.png","alt":null,"title":"在這裏插入圖片描述","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"2.3 Write Path Comparison","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Although both RocketMQ and Kafka support writing through FileChannel, RocketMQ's FileChannel path does not call transferTo. It first calls write and then flushes to disk periodically, as the code screenshot below shows:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f7/f7a1f3f3699b53b7e32e38f14529988c.png","alt":null,"title":"在這裏插入圖片描述","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Why doesn't RocketMQ call transferTo? I believe it is because RocketMQ has to assemble the message format on the Broker: the request is decoded from the network, copied into heap memory, processed, and only then persisted to disk.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The material I found online leans toward this view: compared with memory mapping, the sendfile system call involves one extra copy from the user buffer to the kernel buffer, yet for writes larger than 64 KB sendfile often performs better, possibly because sendfile works on blocks of memory.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3. Message Sending","attrs":{}}]},{"type":"horizontalrule","attrs":{}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.1 Kafka's Send Mechanism","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Kafka producer client uses a double-ended queue and applies batching. Its send mechanism is shown in the figure below:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/eb/eb0e37ff4cd87d3300720bd39e788f49.png","alt":null,"title":"在這裏插入圖片描述","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When a client sends a message through the Kafka producer, the message is first placed in a double-ended queue. Each element of the queue is a ProducerBatch, which represents one send batch; its maximum size is controlled by batch.size, 16 KB by default. A separate Sender thread takes batches from the queue and sends them to the Kafka cluster batch by batch. The linger.ms parameter controls the Sender thread's sending behavior.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To raise send throughput, linger.ms decides what the Sender thread does when a batch in the buffer has not yet reached batch.size: send immediately or wait a while. A value of 0 means send immediately; a value greater than 0 means the Sender thread waits up to that many milliseconds before sending to the broker. ","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"A larger linger.ms increases response time but improves throughput, somewhat like Nagle's algorithm in TCP","attrs":{}},{"type":"text","text":".","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When the Kafka producer writes into a ProducerBatch, it already lays out the data according to the message storage format, so the server can write it to file directly.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.2 RocketMQ's Send Mechanism","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"On the client side, a RocketMQ send mainly selects a queue with a routing algorithm and then sends the message to the server. The server arranges the message into the storage format and then persists it.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"3.3 Send Comparison","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"One notable advantage Kafka has over RocketMQ in sending is that the message format is assembled on the client. This relieves CPU pressure on the Broker, because the clients take on that work in a distributed fashion. Architecturally, the difference resembles that between shardingjdbc and MyCat.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Another feature of the Kafka producer is its double-ended buffer queue; Kafka pursues batching everywhere. The clear benefit is higher send throughput, but the cost is longer response time and a possibility of message loss: the send call returns once the message has been appended to the buffer, so if the producer process exits abnormally, the buffered messages are lost.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Setting linger.ms = 0 in Kafka is comparable to how RocketMQ sends messages.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"However, with batch.size and linger.ms Kafka can be tuned per scenario, which makes it more flexible than RocketMQ.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For a log cluster, for example, batch.size and linger.ms are usually increased to take full advantage of batched sending and raise throughput; for workloads that are sensitive to response time, linger.ms can be reduced accordingly.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4. Summary","attrs":{}}]},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From the comparison above, Kafka's overall performance is indeed better than RocketMQ's. When choosing a message middleware, however, we should consider functionality as well as performance: RocketMQ, for example, offers rich message query capabilities, transactional messages, consumption retry, and scheduled messages.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In my view, Kafka is usually the choice for big data and stream processing scenarios, and RocketMQ for business processing.","attrs":{}}]},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#F5222D","name":"red"}}],"text":"Original writing takes effort; your likes and shares are the greatest encouragement to the author. For more column articles, follow 『中間件興趣圈』 (Middleware Interest Circle)","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/b8/b8ac0640119474dfe9d9a037fca19ad2.png","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#F5222D","name":"red"}}],"text":"Column link","attrs":{}},{"type":"text","text":": ","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/6Zh0trQbF2LemaYWiFUP8Q","title":""},"content":[{"type":"text","text":"https://mp.weixin.qq.com/s/6Zh0trQbF2LemaYWiFUP8Q","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
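The interplay between batch.size and linger.ms described in section 3.1 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Kafka's actual client code: the names `ToyAccumulator`, `Batch`, `append`, and `drain_ready` are hypothetical, time is passed in explicitly as `now_ms` so the behavior is deterministic, and a batch is treated as ready when it is full, when a newer batch has been queued behind it, or when it has lingered for at least linger.ms.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Batch:
    """One send batch (a stand-in for the ProducerBatch the article mentions)."""
    created_ms: int
    records: list = field(default_factory=list)
    size_bytes: int = 0


class ToyAccumulator:
    """Hypothetical, simplified model of a producer-side batching buffer."""

    def __init__(self, batch_size: int = 16384, linger_ms: int = 0):
        self.batch_size = batch_size  # plays the role of batch.size (bytes)
        self.linger_ms = linger_ms    # plays the role of linger.ms
        self.batches = deque()        # double-ended queue of open batches

    def append(self, record: bytes, now_ms: int) -> None:
        """Add a record to the tail batch, opening a new batch if it would overflow."""
        last = self.batches[-1] if self.batches else None
        if last is None or last.size_bytes + len(record) > self.batch_size:
            last = Batch(created_ms=now_ms)
            self.batches.append(last)
        last.records.append(record)
        last.size_bytes += len(record)

    def drain_ready(self, now_ms: int) -> list:
        """Pop batches from the head that are full or have waited long enough."""
        ready = []
        while self.batches:
            head = self.batches[0]
            # A batch counts as full if it reached the size limit or if a
            # newer batch already exists behind it in the queue.
            full = head.size_bytes >= self.batch_size or len(self.batches) > 1
            expired = now_ms - head.created_ms >= self.linger_ms
            if not (full or expired):
                break
            ready.append(self.batches.popleft())
        return ready
```

With `linger_ms=0` every call to `drain_ready` hands over whatever has been buffered, which mirrors the low-latency setting; a larger `linger_ms` lets more records pile into one batch before it is sent, trading response time for throughput.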