Kafka是批量發送消息的?

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"        Kafka作爲高可用的分佈式消息系統,由多個broker組成的Kafka集羣在物理層將消息分佈在不同broker的不同partition上,支持多個producer和多個consumer,producer可以將topic的消息分佈到不同broker的不同partition上,consumer也可以消費集羣中多個partition。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/9f/9fd66b065a61075eaf5da33b34dd462c.jpeg","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"    多個consumer通過coordinator(Consumer和Broker的協調者)可以加入到一個邏輯的Consumer Group中,一個partition只允許被一個Consumer Group中的一個consumer所消費,在一個Consumer Group中一個consumer可以消費多個partition,不同的consumer消費的partition一定不會重複,不同Consumer Group都會消費所有的partition,也就是說一個Consumer Group中的consumer對partition是互斥的,而不同Consumer Group之間是共享的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":16}},{"type":"strong","attrs":{}}],"text":"消費組與分區平衡","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"       消費端的平衡主要包含JoinGroup和SyncGroup兩部分:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當啓動一個consumer時,向group.id發起join group請求","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Coordinator根據請求選出Consumer Group內的leader,並通知group內的各consumer ","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Consumer leader重新爲每個consumer重新分配partition ","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"分配完畢後,coordinator通過SyncGroup請求把最新分配情況同步給每個consumer","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"      ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"   Consumer會週期性的發送heartbeat到coordinator(協調者),如果協調者在session.timeout.ms以內沒有收到consumer的heartbeat就會認定該consumer已退出,它所訂閱的partition會分配到同一Consumer Group內的其它consumer上。該機制叫rebalance,只是針對於group的協調。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/aa/aaa0de06b838ef724acc6d5599805469.jpeg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"   JoinGroup請求的主要作用是將Group Consumer訂閱的信息發送給leader進行分配partiton。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/71/71ab9f5f8dc2d1e2a5b5b57bb9513a34.jpeg","alt":null,"title":"","style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"    SyncGroup請求的主要作用是讓協調者把leader制定的分配方案發送給group內各consumer,當所有成員都成功接收到分配方案後,消費者組進入到Stable狀態開始正常的消費工作。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":" 以上機制會存在一個問題:","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"        由於消費者消費消息的最小單元是partition,每個partition的消費進度offset是保存在consumer中的,在進行rebalance後partition會由group內的其它consumer接管,而接管的consumer如何知道該分區的offset?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"      在kafka中可以以Consumer Group級別保存各consumer所消費分區的offset,這些信息可以保存在ZK或topic中,一旦group內發生了rebalance之後,其它接管的consumer就可以讀取該分區的offset。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0a/0af606eeeee0ab0bcae931182a26c704.jpeg","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"     Broker在啓動時,都會創建和開啓相應的coordinator組件,每次rebalance時group下所有的consumer都需要參與,如果group內的consumer過多會導致這個流程的效率非常低下,幸好rebalance不容易觸發,只在以下幾種條件觸發:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"group內成員數量發生變化","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"topic數量發生變化","attrs":{}}]}],"attrs":{}},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"topic的分區數發生變化","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":16}},{"type":"strong","attrs":{}}],"text":"主要配置","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"group.id ","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"表示該consumer想要加入到哪個group中。默認值是“”。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"heartbeat.interval.ms ","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"心跳間隔。心跳是在consumer與coordinator之間進行的。心跳是確定consumer存活、加入或者退出group的方式。這個值必須設置的小於session.timeout.ms,因爲當consumer由於某種原因不能發heartbeat到coordinator時,並且時間超過session.timeout.ms時,就會認爲該consumer已退出導致rebalance。通常要低於session.timeout.ms的1/3。默認值:3000(3s)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"session.timeout.ms ","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Consumer與coordinator的session 過期時間。這個值必須設置在group.min.session.timeout.ms 與 group.max.session.timeout.ms之間。默認值:10000(10s)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"enable.auto.commit ","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Consumer 在commit offset時有兩種模式:自動提交、手動提交。默認值:true。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"auto.commit.interval.ms","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"自動提交間隔。默認值:5000(5s)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"auto.offset.reset ","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"設置broker在發現kafka在沒有初始offset,或者當前的offset是一個不存在的值時的處理方式。有以下四種: ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"            earliest:自動重置到最早的offset。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"            latest:看上去重置到最晚的offset(默認)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"            none:如果更早的offset也沒有的話,就拋出異常給consumer 。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"connections.max.idle.ms ","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"consumer到broker的連接空閒超時時間 默認值是:540000(9min)","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":16}},{"type":"strong","attrs":{}}],"text":"拉取消息","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"    在Kafka partition中,每個消息都有唯一標識offset,每個consumer group中consumer依次從partition讀取數據,Consumer循環不斷的向broker發起fetch請求,每次從partition只拉取fetch.max.bytes大小的數據,每次執行max.poll.records條數據,然後consumer調用commitSync/commitAsync方法commit,將批量消息中最大的offset保留在coordinator中。如果超過max.poll.interval.ms沒有調用poll就會認爲這個consumer失敗,導致觸發rebalance的機制。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"      由於消息的offset是由consumer管理,更新offset的方式稱爲commit。Consumer提供了兩種模式:","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"自動方式(默認)","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 當enable.auto.commit設置爲true時,consumer在發起fetch後每隔auto.commit.interval.ms 默認(5S)commit一次offset,如果在fetch後處理消息之後還未commit offset,發生了rebalance機制,接管的Consumer無法知道最新的offset而導致重複消費問題。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"手動提交","attrs":{}}]}],"attrs":{}}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 當enable.auto.commit設置爲false時,需要consumer調用同步commitSync來主動提交offset,爲了避免重複消費應當在消息處理完之後才commit offset,即使處理過程中發生rebalance也只有處理過的消息會重複消費。由於大量手動同步提交等待最終結果會導致阻塞,所以有異步commitAsync方式commit offset,但是commit提交失敗後不會重試,因爲如果有多個異步提交,會導致offset被覆蓋。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":16}},{"type":"strong","attrs":{}}],"text":"多線程消費","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"       每個consumer可開啓多線程消費加大消費吞吐量,線程數量最好與分區數保持一致。topic的一個分區只能被同一個consumer group下的一個consumer線程來消費,但一consumer線程可以消費多個分區的數據,所以可加大consumer消費的線程數量來並行消費。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":16}}],"text":"更多文章請加入公衆號","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e9/e907e7f7f6deb51b782e7173adc49b56.jpeg","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章