Let's Talk Kafka: A Producer Source Code Walkthrough

## 1. Introduction

In the previous posts we covered Kafka's basic architecture and cluster setup; starting with this one, we dig into the source code. The version analyzed here is Kafka 2.7.0, whose client is implemented in Java and whose server is implemented in Scala. Since the client is the first part a user touches, we start there, beginning with the Producer. Today we walk through the Producer source.

## 2. Using the Producer

First, a code sample showing how KafkaProducer is used to send messages to Kafka. The sample first writes the KafkaProducer configuration into a Properties object (each entry is explained in the comments), then constructs a KafkaProducer from that Properties object, and finally sends messages through send(), covering both the synchronous and the asynchronous case.

![Producer usage example](https://static001.geekbang.org/infoq/c6/c6e3ee4c18769bc139ca0d6b91e6fce6.png)

As the code shows, Kafka offers a very concise API; using it takes just two steps:

- Initialize a KafkaProducer instance
- Call the send interface to send data
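The usage snippet itself survives only as a screenshot above. As a rough sketch of those two steps (the property values here are illustrative choices, not the post's exact ones, and the commented-out lines assume the kafka-clients library and a reachable broker):

```java
import java.util.Properties;

class ProducerConfigSketch {
    static Properties producerProps() {
        Properties props = new Properties();
        // Broker to bootstrap from (illustrative address)
        props.put("bootstrap.servers", "localhost:9092");
        // Serializers for record keys and values
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Wait for all in-sync replicas to acknowledge the write
        props.put("acks", "all");
        // Batch size in bytes (the 16 KB default discussed later in this post)
        props.put("batch.size", "16384");
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps();
        System.out.println(props.getProperty("acks"));
        // With kafka-clients on the classpath, sending would look like:
        // KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        // producer.send(new ProducerRecord<>("my-topic", "key", "value")).get();   // synchronous
        // producer.send(new ProducerRecord<>("my-topic", "key", "value"),
        //               (metadata, e) -> { /* asynchronous callback */ });
        // producer.close();
    }
}
```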
This article is organized around those two steps: initializing the KafkaProducer instance, and how the send interface actually delivers data.

## 3. KafkaProducer Instantiation

Having seen the basic usage, let's look at the constructor's core logic:

```java
public KafkaProducer(Properties properties) {
    this(Utils.propsToMap(properties), (Serializer)null, (Serializer)null, (ProducerMetadata)null, (KafkaClient)null, (ProducerInterceptors)null, Time.SYSTEM);
}
```

## 4. The Message Send Path

Users send data through producer.send(); start with the send() interface itself:

```java
// Send data to a topic asynchronously
public Future<RecordMetadata> send(ProducerRecord<K, V> record) {
    return this.send(record, (Callback) null);
}

// Send data to a topic asynchronously, invoking the callback once the send is acknowledged
public Future<RecordMetadata> send(ProducerRecord<K, V> record, Callback callback) {
    ProducerRecord<K, V> interceptedRecord = this.interceptors.onSend(record);
    return this.doSend(interceptedRecord, callback);
}
```

The actual sending is ultimately done by the Producer's doSend() method.
**4.1 Interceptors**

The call first passes through the interceptor collection, ProducerInterceptors, whose onSend method iterates over each interceptor's onSend. Interceptors exist to transform or enrich records; Kafka ships no default implementation, so to use the feature you must implement the interface yourself.

**4.1.1 Interceptor core logic**

![ProducerInterceptor interface](https://static001.geekbang.org/infoq/1a/1ae3840eb2b1422d40230cf015c2f2ff.png)

The ProducerInterceptor interface has three methods:

- `onSend(ProducerRecord var1)`: wrapped inside KafkaProducer.send(), so it runs on the user's main thread. It is guaranteed to be called before the message is serialized and its partition computed. You may do anything to the record here, but it is best not to modify the record's topic or partition, or the target-partition computation will be affected.
- `onAcknowledgement(RecordMetadata var1, Exception var2)`: called when the message is acknowledged or when the send fails, and always before the producer's own callback logic fires. It runs on the producer's I/O thread, so keep the logic light, or you will drag down the producer's send throughput.
- `close()`: closes the interceptor, mainly to perform resource cleanup.

Interceptors may run on multiple threads, so implementations must ensure their own thread safety. Also, if multiple interceptors are configured, the producer invokes them in the configured order, and merely catches any exception each one may throw and writes it to the error log rather than propagating it upward.
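That iterate-and-swallow behavior can be sketched with plain Java (the interface and class names here are stand-ins, not Kafka's actual types):

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Stand-in for ProducerInterceptor#onSend: transforms a record (here just a String)
interface SendInterceptor extends UnaryOperator<String> {}

// Stand-in for ProducerInterceptors: runs the chain in order
class InterceptorChain {
    private final List<SendInterceptor> interceptors;

    InterceptorChain(List<SendInterceptor> interceptors) {
        this.interceptors = interceptors;
    }

    // Apply each interceptor's onSend in order; log (do not rethrow) anything
    // an interceptor throws, and continue with the record as transformed so far.
    String onSend(String record) {
        String current = record;
        for (SendInterceptor i : interceptors) {
            try {
                current = i.apply(current);
            } catch (Exception e) {
                System.err.println("Error executing interceptor onSend: " + e);
            }
        }
        return current;
    }
}
```

A throwing interceptor therefore never breaks the send; the record simply passes through unmodified by that interceptor.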
**4.2 The Producer's doSend implementation**

Here is the concrete implementation of doSend():

![doSend implementation](https://static001.geekbang.org/infoq/66/66158a3c18336d09395cee81a019c441.png)

Inside doSend(), sending one record breaks down into five steps:

- Make sure metadata for the target topic is available (it is available when the partition's leader exists and, if authorization is enabled, the client has the required permissions); if the topic's metadata is missing, fetch it first;
- Serialize the record's key and value;
- Determine the partition the record should be sent to (it can be specified explicitly, or computed by an algorithm);
- Append the record to the accumulator, where it is buffered first;
- If, after the append, the corresponding RecordBatch has reached batch.size (or its remaining space cannot fit the next record), wake the sender thread to transmit the data.

That, in five points, is the send path; the sections below analyze each part in detail.
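The five steps above can be condensed into a plain-Java skeleton (every type and constant here is a deliberately simplified stand-in for the real client classes, and step 1, the metadata fetch, is elided):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;

class DoSendSketch {
    static final int BATCH_SIZE = 16;                // stand-in for batch.size (bytes)
    final Deque<byte[]> accumulator = new ArrayDeque<>();
    int currentBatchBytes = 0;
    boolean senderWoken = false;

    void doSend(String key, String value, int numPartitions) {
        // step 2: serialize key and value
        byte[] serializedKey = key.getBytes(StandardCharsets.UTF_8);
        byte[] serializedValue = value.getBytes(StandardCharsets.UTF_8);
        // step 3: pick a partition (simplified hash-based choice)
        int partition = Math.abs(key.hashCode()) % numPartitions;
        // step 4: buffer the record in the accumulator
        accumulator.add(serializedValue);
        currentBatchBytes += serializedKey.length + serializedValue.length;
        // step 5: if the batch is full, wake the sender thread
        if (currentBatchBytes >= BATCH_SIZE) {
            senderWoken = true;
        }
    }
}
```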
## 5. The Send Steps in Detail

**5.1 Fetching the topic's metadata**

The Producer obtains a topic's metadata through the waitOnMetadata() method; that part is covered in the next article.

**5.2 Serializing the key and value**

The Producer serializes the record's key and value, and the Consumer deserializes them on the other side. The serializers and deserializers Kafka provides out of the box are shown below:

![Built-in serializers](https://static001.geekbang.org/infoq/05/05f3c92669e840a599a62ec9530d3525.png)

You can of course supply your own serializer implementation, but in most cases the ones Kafka ships are sufficient.
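As a rough illustration of what such a serializer pair does, mirroring the behavior of Kafka's StringSerializer (UTF-8 bytes) and IntegerSerializer (4 big-endian bytes), though these stand-in methods are not the real classes:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class SerdeSketch {
    // Kafka's StringSerializer encodes the string as UTF-8 bytes
    static byte[] serializeString(String s) {
        return s == null ? null : s.getBytes(StandardCharsets.UTF_8);
    }

    static String deserializeString(byte[] b) {
        return b == null ? null : new String(b, StandardCharsets.UTF_8);
    }

    // Kafka's IntegerSerializer encodes the int as 4 big-endian bytes
    static byte[] serializeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    static int deserializeInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }
}
```

Whatever serializer the producer is configured with, the consumer must be configured with the matching deserializer, or the round trip breaks.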
**5.3 Determining the record's partition**

The partition value is determined in one of three ways:

- If a partition is specified on the record, that value is used directly;
- If no partition is specified but a key is present, the partition is the key's hash modulo the topic's partition count;
- If neither a partition nor a key is present, older clients used round-robin: the first call generates a random integer (incremented on each subsequent call), and that value modulo the number of available partitions gives the partition. As the code below shows, since the sticky partitioner became the default, keyless records instead reuse one cached partition per topic.

Here is the implementation:

```java
// If the record carries a partition value, return it directly; otherwise ask the
// configured partitioner to compute one (KafkaProducer.class)
private int partition(ProducerRecord record, byte[] serializedKey, byte[] serializedValue, Cluster cluster) {
    Integer partition = record.partition();
    return partition != null ? partition : this.partitioner.partition(record.topic(), record.key(), serializedKey, record.value(), serializedValue, cluster);
}
```

The default partitioner is org.apache.kafka.clients.producer.internals.DefaultPartitioner; you can also plug in your own partitioning strategy. The default strategy looks like this:

```java
public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster) {
    return this.partition(topic, key, keyBytes, value, valueBytes, cluster, cluster.partitionsForTopic(topic).size());
}

public int partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster, int numPartitions) {
    // No key: fall back to the sticky partition cache; with a key: murmur2 hash mod partition count
    return keyBytes == null
        ? this.stickyPartitionCache.partition(topic, cluster)
        : Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
}
```
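The three-way decision can be sketched with plain Java (names are illustrative; `Arrays.hashCode` stands in for Kafka's murmur2, and the sticky cache is reduced to a map):

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

class PartitionerSketch {
    // One "sticky" partition per topic, reused until a new one is chosen
    private final Map<String, Integer> stickyCache = new ConcurrentHashMap<>();

    int partition(String topic, Integer explicitPartition, byte[] keyBytes, int numPartitions) {
        if (explicitPartition != null) {
            return explicitPartition;                     // case 1: caller chose the partition
        }
        if (keyBytes != null) {
            // case 2: hash the key (Kafka uses murmur2; Arrays.hashCode is a stand-in)
            return toPositive(Arrays.hashCode(keyBytes)) % numPartitions;
        }
        // case 3: sticky - pick once per topic, reuse for subsequent keyless records
        return stickyCache.computeIfAbsent(topic,
                t -> ThreadLocalRandom.current().nextInt(numPartitions));
    }

    // Same trick as Utils.toPositive: clear the sign bit
    static int toPositive(int n) {
        return n & 0x7fffffff;
    }
}
```

Keyed records therefore always land on the same partition (preserving per-key ordering), while keyless records stick to one partition, which lets batches fill faster.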
![DefaultPartitioner](https://static001.geekbang.org/infoq/f0/f01d5150cbe4ef0a79efbab4751e4e65.png)

The heart of this default algorithm is the sticky partition cache.

**5.4 Appending the record to the RecordAccumulator**

Before diving into RecordAccumulator, look at this diagram first; it gives a bird's-eye view of the whole send flow.

![Send flow overview](https://static001.geekbang.org/infoq/dd/ddd7c037008e15ee84d48484a8f03bf8.png)

The RecordAccumulator plays the role of a buffer, 32 MB by default.

In the Kafka producer, messages are not sent to the broker one at a time; multiple messages are grouped into a ProducerBatch, which the Sender then ships in one go. Note that batch.size is not a message count ("send once N messages have gathered") but a size in bytes, 16 KB by default, tunable for your workload.
The core data structure in RecordAccumulator is:

```java
private final ConcurrentMap<TopicPartition, Deque<ProducerBatch>> batches;
```

It is a ConcurrentMap whose key is a TopicPartition (one partition of one topic) and whose value is a double-ended queue of ProducerBatch objects waiting for the Sender thread to ship them to the broker. In pictures:

![batches map](https://static001.geekbang.org/infoq/3c/3c8e069695a0413b7a971750357bcfb6.png)

![append logic](https://static001.geekbang.org/infoq/f9/f994c8fb79ad31bb80300a241c6d94e6.png)

Does the code above raise a question for you? Why is the memory allocation not inside the synchronized block, forcing the second synchronized block to tryAppend again?

Because by the time the allocation finishes, another thread may already have created a ProducerBatch, which would make the freshly allocated memory redundant.

And what would go wrong if the allocation were inside the synchronized block?

Memory allocation can block until space becomes available; doing it inside the block would hold the Deque's lock the whole time, so no other thread could perform any thread-safe, synchronized operation on that queue.

Next, the tryAppend() method, which is comparatively simple:

![tryAppend](https://static001.geekbang.org/infoq/9a/9a406d4878a8ae42f05459b1aca451f0.png)

The code above, illustrated:

![append illustrated](https://static001.geekbang.org/infoq/a0/a09c923ee8d8208dca340b01cdd68f83.png)
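The allocate-outside-the-lock, append-twice pattern can be sketched like this (a deliberately tiny model with a records-per-batch cap instead of a byte budget; none of these are Kafka's actual classes):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class AccumulatorSketch {
    static final int BATCH_CAPACITY = 2;               // records per batch (toy value)

    static class Batch {
        final List<String> records = new ArrayList<>();

        // Returns false when the batch is full, mirroring tryAppend returning null
        boolean tryAppend(String record) {
            if (records.size() >= BATCH_CAPACITY) return false;
            records.add(record);
            return true;
        }
    }

    final Deque<Batch> deque = new ArrayDeque<>();

    void append(String record) {
        synchronized (deque) {                         // first attempt under the lock
            Batch last = deque.peekLast();
            if (last != null && last.tryAppend(record)) return;
        }
        // Allocate OUTSIDE the lock: a real allocation may block waiting for
        // buffer memory, and we must not hold the deque's lock while it does.
        Batch newBatch = new Batch();
        synchronized (deque) {                         // second attempt under the lock
            Batch last = deque.peekLast();
            // Another thread may have created a batch meanwhile - try it first,
            // and only enqueue our freshly allocated batch if that fails.
            if (last != null && last.tryAppend(record)) return;
            newBatch.tryAppend(record);
            deque.addLast(newBatch);
        }
    }
}
```

The second tryAppend is exactly the "someone else may have beaten us to it" check discussed above; without it, every contended append would waste a batch allocation.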
**5.5 Waking the sender thread to ship ProducerBatches**

Once the record has been written, if a ProducerBatch now satisfies the send conditions (typically, when a queue holds multiple batches, the earliest ones added are certainly sendable), the sender thread is woken up to transmit the ProducerBatch.

The sender thread processes ProducerBatches in its run() method, implemented as follows:

![Sender run()](https://static001.geekbang.org/infoq/96/96972909f724e8010c416eb2e30792c4.png)

![Sender run() continued](https://static001.geekbang.org/infoq/3d/3d4b5611ca9e6822aeb7c8369ec6863e.png)

The core of run() is org.apache.kafka.clients.producer.internals.Sender#sendProducerData.

Here pollTimeout means "block at most this long until at least one registered channel is ready"; a return value of 0 means it is time to go.

![sendProducerData](https://static001.geekbang.org/infoq/f4/f4e2cfdccbd04f16999195730cee6a7c.png)

Following the trail into org.apache.kafka.clients.producer.internals.RecordAccumulator#ready:

![ready](https://static001.geekbang.org/infoq/d7/d79b4338ee9cb07938439f59e4e6c65b.png)

Finally, look at org.apache.kafka.clients.producer.internals.RecordAccumulator#drain, which pulls the data to send out of the accumulator buffer, at most max.request.size bytes in one go.

![drain](https://static001.geekbang.org/infoq/48/487460e6ac6b3ba1fb63e2500c1c2317.png)
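The essence of the ready() decision, a batch may be sent once it is full or has lingered long enough, can be condensed to the following (simplified; the real check also considers retry backoff, exhausted buffer memory, and whether the producer is flushing or closing):

```java
class ReadyCheckSketch {
    // A partition's batches are sendable when the head batch is full
    // (or a second batch exists behind it), or when the head batch has
    // waited at least linger.ms since its creation.
    static boolean ready(boolean batchFull, long waitedMs, long lingerMs) {
        boolean expired = waitedMs >= lingerMs;
        return batchFull || expired;
    }
}
```

This is why raising linger.ms trades latency for larger batches: a non-full batch simply is not "ready" until the linger window elapses.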
## 6. Summary

Finally, to leave you with a big-picture view of the Kafka producer's architecture, see the diagram below:

![Producer architecture](https://static001.geekbang.org/infoq/47/47dd5e2fe34182a872d124e3e45d1d43.png)

In brief:

- new KafkaProducer() starts a background thread, KafkaThread (the actual running logic is Sender; KafkaThread is just a wrapper around it), which scans the RecordAccumulator for messages.
- Calling KafkaProducer.send() really just stores the message in the RecordAccumulator, that is, into a map (ConcurrentMap<TopicPartition, Deque<ProducerBatch>>). The message is recorded into a record batch (same topic + same partition = same batch), and all messages in that batch are sent to the same topic and partition.
- When the background thread finds messages in the RecordAccumulator, it sends them to the Kafka cluster (not the moment a message arrives, but once the messages are ready).
- If the send succeeds (the message was written to Kafka), a RecordMetadata object is returned, carrying the topic, the partition, and the record's offset within the partition.
- If the write fails, an error comes back; the producer retries the send on error (if retries are allowed, the message is stored back into the RecordAccumulator), and after several failed attempts it gives up and reports the error.

That wraps up this walk through the Kafka producer source. The next article covers metadata in detail, along with how the producer keeps its metadata up to date. Stay tuned!