\n * Implement {@link org.apache.kafka.common.ClusterResourceListener} to receive cluster metadata once it's available. Please see the class documentation for ClusterResourceListener for more information.\n *\n * @param \n * This class will get producer config properties via \n * Exceptions thrown by ProducerInterceptor methods will be caught, logged, but not propagated further. As a result, if\n * the user configures the interceptor with the wrong key and value type parameters, the producer will not throw an exception,\n * just log the errors.\n * \n * ProducerInterceptor callbacks may be called from multiple threads. Interceptor implementation must ensure thread-safety, if needed.\n * \n * Implement {@link org.apache.kafka.common.ClusterResourceListener} to receive cluster metadata once it's available. Please see the class documentation for ClusterResourceListener for more information.\n */\npublic interface ProducerInterceptor \n * This method is allowed to modify the record, in which case, the new record will be returned. The implication of modifying\n * key/value is that partition assignment (if not specified in ProducerRecord) will be done based on modified key/value,\n * not key/value from the client. Consequently, key and value transformation done in onSend() needs to be consistent:\n * same key and value should mutate to the same (modified) key and value. Otherwise, log compaction would not work\n * as expected.\n * \n * Similarly, it is up to interceptor implementation to ensure that correct topic/partition is returned in ProducerRecord.\n * Most often, it should be the same topic/partition from 'record'.\n * \n * Any exception thrown by this method will be caught by the caller and logged, but not propagated further.\n * \n * Since the producer may run multiple interceptors, a particular interceptor's onSend() callback will be called in the order\n * specified by {@link org.apache.kafka.clients.producer.ProducerConfig#INTERCEPTOR_CLASSES_CONFIG}. 
The first interceptor\n * in the list gets the record passed from the client, the following interceptor will be passed the record returned by the\n * previous interceptor, and so on. Since interceptors are allowed to modify records, interceptors may potentially get\n * the record already modified by other interceptors. However, building a pipeline of mutable interceptors that depend on the output\n * of the previous interceptor is discouraged, because of potential side-effects caused by interceptors potentially failing to\n * modify the record and throwing an exception. If one of the interceptors in the list throws an exception from onSend(), the exception\n * is caught, logged, and the next interceptor is called with the record returned by the last successful interceptor in the list,\n * or otherwise the client.\n *\n * @param record the record from client or the record returned by the previous interceptor in the chain of interceptors.\n * @return producer record to send to topic/partition\n */\n public ProducerRecord \n * This method is generally called just before the user callback is called, and in additional cases when \n * Any exception thrown by this method will be ignored by the caller.\n * \n * This method will generally execute in the background I/O thread, so the implementation should be reasonably fast.\n * Otherwise, sending of messages from other threads could be delayed.\n *\n * @param metadata The metadata for the record that was sent (i.e. the partition and offset).\n * If an error occurred, metadata will contain only valid topic and maybe\n * partition. If partition is not given in ProducerRecord and an error occurs\n * before partition gets assigned, then partition will be set to RecordMetadata.NO_PARTITION.\n * The metadata may be null if the client passed null record to\n * {@link org.apache.kafka.clients.producer.KafkaProducer#send(ProducerRecord)}.\n * @param exception The exception thrown during processing of this record. 
Null if no error occurred.\n */\n public void onAcknowledgement(RecordMetadata metadata, Exception exception);\n\n /**\n * This is called when interceptor is closed\n */\n public void close();\n}"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Before a message is serialized and its partition is computed, KafkaProducer calls the interceptor's onSend() method so that the message can be customized. In general, avoid modifying the topic, key, or partition of the ProducerRecord; if you must, make sure the consequences are well understood, or the result may differ from what you expect. For example, changing the key affects not only partition assignment but also log compaction on the broker side."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"KafkaProducer calls the interceptor's onAcknowledgement() method when a message has been acknowledged or when sending it fails, before the user-supplied Callback runs. This method runs in the producer's I/O thread, so its logic should be kept as simple as possible; otherwise it slows down message sending."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka supports chaining multiple interceptors; they are invoked in the order listed in the interceptor.classes configuration."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"How It Works"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Overall Architecture"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/50/50310cdd7a5e8872e5fecbd1e6684579.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The producer client runs as two cooperating threads: the main thread and the Sender thread."}]},
{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Main thread"}]}]}]},
{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"In the main thread, KafkaProducer creates messages, which pass through any configured interceptors, the serializer, and the partitioner before being buffered in the message accumulator (RecordAccumulator, also called the message collector). Internally, the RecordAccumulator keeps a double-ended queue for each partition whose elements are ProducerBatch objects, i.e. a Deque<ProducerBatch>."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The RecordAccumulator buffers messages so that the Sender thread can send them in batches, which reduces network overhead and improves performance."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The size of this buffer is set by the producer parameter "},{"type":"codeinline","content":[{"type":"text","text":"buffer.memory"}]},{"type":"text","text":", 32MB by default. If the application produces messages faster than they can be delivered to the server, the buffer runs out of space. A call to KafkaProducer.send() then blocks until space is freed, and throws an exception if no space becomes available within "},{"type":"codeinline","content":[{"type":"text","text":"max.block.ms"}]},{"type":"text","text":", whose default is 60000, i.e. 60 seconds."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The RecordAccumulator also contains a BufferPool, which reuses ByteBuffers so that buffer memory is used efficiently. However, the BufferPool only manages ByteBuffers of one specific size; buffers of any other size are not pooled. That size is set by the "},{"type":"codeinline","content":[{"type":"text","text":"batch.size"}]},{"type":"text","text":" parameter, whose default is 16384 bytes, i.e. 16KB. Increasing batch.size moderately lets each batch hold more messages."}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sender thread"}]}]}]},
{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Sender thread fetches messages from the RecordAccumulator and sends them to Kafka."}]},
{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"After fetching buffered messages from the RecordAccumulator, the Sender converts them from the <Partition, Deque<ProducerBatch>> form into the <Node, List<ProducerBatch>> form, where Node is a broker in the Kafka cluster. Before a request is sent from the Sender thread to Kafka, it is also stored in InFlightRequests, whose contents have the form Map<NodeId, Deque<Request>>. Its main job is to track requests that have been sent but have not yet received a response (NodeId is a String holding the node's id)."}]},
{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InFlightRequests also provides a number of management methods, and a configuration parameter caps the number of outstanding requests per connection (that is, per connection between the client and a Node). The parameter is "},{"type":"codeinline","content":[{"type":"text","text":"max.in.flight.requests.per.connection"}]},{"type":"text","text":", whose default is 5: each connection can have at most 5 unanswered requests, and once that limit is reached no more requests can be sent on the connection until one of the outstanding requests receives a response. By comparing the size of a node's Deque<Request> with this parameter, the producer can tell whether too many unanswered requests have piled up on that Node. If so, the Node is probably heavily loaded or has network problems, and sending it more requests would increase the chance of request timeouts."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Metadata"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"leastLoadedNode"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The leastLoadedNode is the Node with the lowest load among all Nodes. Load is measured by the number of unacknowledged requests each Node has in InFlightRequests: the more unacknowledged requests, the higher the load."}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/25/255fc66765032082a4546fbbc0ce9333.png","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As the figure shows, node2 has the lowest load."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Therefore node2 is the leastLoadedNode. Sending requests to it lets them go out as soon as possible and avoids delays caused by network congestion or similar problems."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The leastLoadedNode concept is used in several places, such as "},{"type":"text","marks":[{"type":"strong"}],"text":"metadata requests"},{"type":"text","text":" and consumer group protocol interactions."}]},{"t
ype":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"ProducerRecordconfigure()
method, including clientId assigned\n * by KafkaProducer if not specified in the producer config. The interceptor implementation needs to be aware that it will be\n * sharing producer config namespace with other interceptors and serializers, and ensure that there are no conflicts.\n * KafkaProducer.send()
\n * throws an exception.\n *
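The chaining rule from the ProducerInterceptor documentation above (interceptors run in their configured order, and an exception thrown by one onSend() is caught and logged while the next interceptor receives the record returned by the last successful one) can be modeled in plain Java. This is a minimal sketch that does not depend on kafka-clients: SendInterceptor and InterceptorChain are hypothetical names, records are plain strings, and "logging" is simply collecting error messages in a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for ProducerInterceptor; only the onSend() chaining rule is modeled.
interface SendInterceptor<R> {
    R onSend(R record);
}

// Hypothetical model of how a list of interceptors is applied to one record.
class InterceptorChain<R> {
    private final List<SendInterceptor<R>> interceptors;
    // Stands in for the producer's log: messages of exceptions that were swallowed.
    final List<String> errors = new ArrayList<>();

    InterceptorChain(List<SendInterceptor<R>> interceptors) {
        this.interceptors = interceptors;
    }

    R onSend(R record) {
        R current = record; // the first interceptor receives the client's record
        for (SendInterceptor<R> interceptor : interceptors) {
            try {
                // each interceptor receives the output of the last successful one
                current = interceptor.onSend(current);
            } catch (RuntimeException e) {
                // not propagated: remember the error and keep the previous result
                errors.add(e.getMessage());
            }
        }
        return current;
    }
}

class InterceptorChainDemo {
    public static void main(String[] args) {
        List<SendInterceptor<String>> list = Arrays.asList(
                v -> "prefix1-" + v,
                v -> { throw new IllegalStateException("interceptor 2 failed"); },
                v -> v.toUpperCase());
        InterceptorChain<String> chain = new InterceptorChain<>(list);
        System.out.println(chain.onSend("hello")); // PREFIX1-HELLO
        System.out.println(chain.errors);          // [interceptor 2 failed]
    }
}
```

The failing second interceptor does not stop the chain; the third one simply sees the first one's output, which is why the documentation discourages building pipelines where each interceptor depends on the previous one's mutation.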
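The buffer.memory and max.block.ms behavior described above (send() blocks while the accumulator is full and fails once max.block.ms has elapsed) can be approximated with a counting semaphore. This is a toy model, not Kafka's BufferPool: ToyBufferMemory and BufferFullTimeout are hypothetical names, one permit stands for one byte, and neither the real client's exception type nor the pooling of batch.size buffers is modeled.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical unchecked exception standing in for the error send() raises after max.block.ms.
class BufferFullTimeout extends RuntimeException {
    BufferFullTimeout(String message) { super(message); }
}

// Toy model of buffer.memory + max.block.ms: one semaphore permit represents one byte.
class ToyBufferMemory {
    private final Semaphore free;
    private final long maxBlockMs;

    ToyBufferMemory(int bufferMemoryBytes, long maxBlockMs) {
        this.free = new Semaphore(bufferMemoryBytes);
        this.maxBlockMs = maxBlockMs;
    }

    // Called before a record is appended: waits for space, but no longer than maxBlockMs.
    void allocate(int size) {
        try {
            if (!free.tryAcquire(size, maxBlockMs, TimeUnit.MILLISECONDS)) {
                throw new BufferFullTimeout("could not allocate " + size + " bytes within " + maxBlockMs + " ms");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for buffer space", e);
        }
    }

    // Called once a batch has been sent, making its memory available again.
    void deallocate(int size) {
        free.release(size);
    }

    int availableBytes() {
        return free.availablePermits();
    }
}
```

If another thread calls deallocate() while allocate() is waiting, the waiting call can proceed, which mirrors the Sender freeing memory after a batch completes.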
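The two layouts described above, one Deque of batches per partition on the main-thread side and batches regrouped per broker on the Sender side, can be sketched as follows. Batch, ToyAccumulator, and the partition-to-leader map are hypothetical simplifications: a batch is closed when the next record would push it past batch.size, and drain() empties every deque at once, whereas the real Sender drains incrementally under additional size and readiness checks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for ProducerBatch.
class Batch {
    final int partition;
    final List<byte[]> records = new ArrayList<>();
    int sizeInBytes;

    Batch(int partition) { this.partition = partition; }
}

class ToyAccumulator {
    private final int batchSize; // plays the role of batch.size
    // Main-thread view: partition -> Deque<Batch>
    private final Map<Integer, Deque<Batch>> batches = new HashMap<>();

    ToyAccumulator(int batchSize) { this.batchSize = batchSize; }

    // Appends to the newest batch of the partition, opening a new batch if the record would not fit.
    void append(int partition, byte[] value) {
        Deque<Batch> deque = batches.computeIfAbsent(partition, p -> new ArrayDeque<>());
        Batch last = deque.peekLast();
        if (last == null || last.sizeInBytes + value.length > batchSize) {
            last = new Batch(partition);
            deque.addLast(last);
        }
        last.records.add(value);
        last.sizeInBytes += value.length;
    }

    // Sender view: regroups <partition, Deque<Batch>> into <node, List<Batch>> by partition leader.
    Map<String, List<Batch>> drain(Map<Integer, String> leaderOf) {
        Map<String, List<Batch>> byNode = new HashMap<>();
        for (Map.Entry<Integer, Deque<Batch>> entry : batches.entrySet()) {
            String node = leaderOf.get(entry.getKey());
            Deque<Batch> deque = entry.getValue();
            while (!deque.isEmpty()) {
                byNode.computeIfAbsent(node, n -> new ArrayList<>()).add(deque.pollFirst());
            }
        }
        return byNode;
    }
}
```

Grouping by node is what lets the Sender build one request per broker instead of one per partition.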
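The per-connection cap enforced through InFlightRequests can be sketched as a Map<String, Deque<String>> in which each node's deque is checked against max.in.flight.requests.per.connection before another request is added. ToyInFlightRequests is a hypothetical class, not Kafka's internal one: requests are plain strings, and a response simply completes the oldest outstanding request for that node.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of InFlightRequests: Map<NodeId, Deque<Request>> with a per-connection cap.
class ToyInFlightRequests {
    private final int maxInFlightPerConnection; // max.in.flight.requests.per.connection (default 5)
    private final Map<String, Deque<String>> requests = new HashMap<>();

    ToyInFlightRequests(int maxInFlightPerConnection) {
        this.maxInFlightPerConnection = maxInFlightPerConnection;
    }

    // True if the node has fewer unanswered requests than the cap.
    boolean canSendMore(String nodeId) {
        return inFlightCount(nodeId) < maxInFlightPerConnection;
    }

    // Records a request that has been sent but not yet answered.
    void add(String nodeId, String request) {
        if (!canSendMore(nodeId)) {
            throw new IllegalStateException("too many unanswered requests to node " + nodeId);
        }
        requests.computeIfAbsent(nodeId, id -> new ArrayDeque<>()).addLast(request);
    }

    // A response arrived: remove and return the oldest outstanding request for the node.
    String completeNext(String nodeId) {
        Deque<String> queue = requests.get(nodeId);
        if (queue == null || queue.isEmpty()) {
            throw new IllegalStateException("no in-flight request for node " + nodeId);
        }
        return queue.pollFirst();
    }

    int inFlightCount(String nodeId) {
        Deque<String> queue = requests.get(nodeId);
        return queue == null ? 0 : queue.size();
    }
}
```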
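Choosing the leastLoadedNode then amounts to picking the node with the fewest in-flight requests. This sketch assumes the per-node counts are already known (the numbers in the usage below are illustrative) and breaks ties in favor of the node listed first; LeastLoadedNode.pick is a hypothetical helper, and the real NetworkClient also takes connection state into account.

```java
import java.util.List;
import java.util.Map;

class LeastLoadedNode {
    // Returns the node with the fewest in-flight requests; ties go to the node listed first.
    // Nodes missing from the map are treated as having no in-flight requests.
    static String pick(List<String> nodes, Map<String, Integer> inFlightCounts) {
        String best = null;
        int bestLoad = Integer.MAX_VALUE;
        for (String node : nodes) {
            int load = inFlightCounts.getOrDefault(node, 0);
            if (load < bestLoad) {
                best = node;
                bestLoad = load;
            }
        }
        return best; // null only if the node list is empty
    }
}
```

With counts of 3, 1 and 2 for node1, node2 and node3, pick() returns node2, matching the situation in the figure.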
【Kafka】Producer Client Summary (Java)
{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Basic Usage"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Instantiating a KafkaProducer"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A simple producer looks like this:"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"import java.util.Properties;\n\nimport org.apache.kafka.clients.producer.KafkaProducer;\nimport org.apache.kafka.clients.producer.ProducerRecord;\n\npublic class KafkaProducerDemo {\n\n    private static final String brokerlist = \"10.128.123.250:9092\";\n\n    private static final String topic = \"topic-demo\";\n\n    public static void main(String[] args) {\n        Properties props = initConfig();\n        KafkaProducer<String, String> producer = new KafkaProducer<>(props);\n        ProducerRecord<String, String> record = new ProducerRecord<>(topic, \"hello, Kafka !\");\n        try {\n            producer.send(record);\n        } catch (Exception e) {\n            e.printStackTrace();\n        } finally {\n            // close() blocks until previously sent records complete, so no sleep is needed\n            producer.close();\n        }\n    }\n\n    public static Properties initConfig() {\n        Properties props = new Properties();\n        props.put(\"bootstrap.servers\", brokerlist);\n        props.put(\"key.serializer\", \"org.apache.kafka.common.serialization.StringSerializer\");\n        props.put(\"value.serializer\", \"org.apache.kafka.common.serialization.StringSerializer\");\n        return props;\n    }\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"With plain string keys like these, a typo in a property name is easy to miss. To catch such mistakes at compile time, use the constants defined in ProducerConfig instead:"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"public static Properties initConfig() {\n    Properties props = new Properties();\n    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerlist);\n    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());\n    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());\n    return props;\n
}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The "},{"type":"codeinline","content":[{"type":"text","text":"bootstrap.servers"}]},{"type":"text","text":" setting does not need to list every broker, because the producer discovers the other brokers from the ones it is given. It is still advisable to list at least two brokers, so that the producer can still connect to the Kafka cluster if one of them goes down."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"KafkaProducer has several constructors. If key.serializer and value.serializer are not set in the configuration, the serializers can be passed to the constructor directly:"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"KafkaProducer<String, String> producer = new KafkaProducer<>(props, new StringSerializer(), new StringSerializer());"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"KafkaProducer is thread-safe: a single instance can be shared across threads, which is generally faster than using multiple instances."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"From the official documentation:"}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The producer is "},{"type":"text","marks":[{"type":"italic"}],"text":"thread safe"},{"type":"text","text":" and sharing a single producer instance across threads will generally be faster than having multiple instances."}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Creating a ProducerRecord"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"public ProducerRecord(String topic, V value)\npublic ProducerRecord(String topic, K key, V value)\npublic ProducerRecord(String topic, Integer partition, K key, V value)\npublic ProducerRecord(String topic, Integer partition, Long timestamp, K key, V value, Iterable<Header> headers)\npublic ProducerRecord(String topic, Integer partition, Long timestamp, K key, V value)\npublic ProducerRecord(String topic, Integer 
partition, K key, V value, Iterable<Header> headers)"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If a partition is specified, the record is sent to that partition."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If no partition is specified but a key is, the partitioner chooses the partition from the key; the default partitioner hashes the serialized key with murmur2 and takes the result modulo the number of partitions."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If neither a partition nor a key is specified, records are spread across the partitions: round-robin in clients before Kafka 2.4, and from 2.4 onward with the sticky partitioner, which keeps sending to one partition until the current batch is completed and then moves to another."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If both a partition and a key are specified, the record goes to the specified partition and the key plays no part in choosing it."}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Send Modes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"There are three main ways to send messages: fire-and-forget, synchronous (sync), and asynchronous (async)."}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Fire-and-forget"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The producer does not check whether the message arrives and ignores the result of the send."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"It is essentially an asynchronous send: it offers the highest throughput but the lowest reliability."}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"KafkaProducer<String, String> producer = new KafkaProducer<>(props);\nProducerRecord<String, String> record = new ProducerRecord<>(topic, \"hello, Kafka4 !\");\ntry {\n\tproducer.send(record);\n} catch (Exception e) 
{\n\te.printStackTrace();\n}"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Synchronous"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The send() method returns a Future; calling get() on it blocks until Kafka responds, that is, until the message has been sent successfully or an exception occurs. Any exception must be caught and handled by the calling code."}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"KafkaProducer<String, String> producer = new KafkaProducer<>(props);\nProducerRecord<String, String> record = new ProducerRecord<>(topic, \"hello, Kafka !\");\ntry {\n\tFuture<RecordMetadata> future = producer.send(record);\n\tRecordMetadata metadata = future.get();\n} catch (Exception e) {\n\te.printStackTrace();\n}"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Asynchronous callback"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Pass a callback to the "},{"type":"codeinline","content":[{"type":"text","text":"send()"}]},{"type":"text","text":" method by supplying a "},{"type":"codeinline","content":[{"type":"text","text":"Callback"}]},{"type":"text","text":" implementation:"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"producer.send(record, new Callback() {\n    @Override\n    public void onCompletion(RecordMetadata metadata, Exception exception) {\n        if (null != exception) {\n            exception.printStackTrace();\n        } else {\n            System.out.println(metadata.topic() + \"-\" + metadata.partition() + \":\" + metadata.offset());\n        }\n    
}\n});"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Within a single partition, callbacks are invoked in send order: if record1 is sent before record2, callback1 is called before callback2."}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Retries"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"KafkaProducer can encounter two kinds of exceptions: retriable and non-retriable. Common retriable exceptions include "},{"type":"codeinline","content":[{"type":"text","text":"NetworkException"}]},{"type":"text","text":", "},{"type":"codeinline","content":[{"type":"text","text":"LeaderNotAvailableException"}]},{"type":"text","text":", "},{"type":"codeinline","content":[{"type":"text","text":"UnknownTopicOrPartitionException"}]},{"type":"text","text":", "},{"type":"codeinline","content":[{"type":"text","text":"NotEnoughReplicasException"}]},{"type":"text","text":", and "},{"type":"codeinline","content":[{"type":"text","text":"NotCoordinatorException"}]},{"type":"text","text":". For example, "},{"type":"codeinline","content":[{"type":"text","text":"NetworkException"}]},{"type":"text","text":" indicates a network error, possibly caused by a transient network fault, which a retry can resolve. Likewise, "},{"type":"codeinline","content":[{"type":"text","text":"LeaderNotAvailableException"}]},{"type":"text","text":" means the partition's leader replica is unavailable; this usually happens after the old leader goes offline and before a new leader has been elected, and a retry usually recovers from it. Non-retriable exceptions, such as "},{"type":"codeinline","content":[{"type":"text","text":"RecordTooLargeException"}]},{"type":"text","text":", which indicates that the message is too large, are never retried: KafkaProducer throws them immediately."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For retriable exceptions, if the retries parameter is set, no exception is thrown as long as the problem clears up within the allowed number of retries. The default value of retries is 0 in older clients; since Kafka 2.1 it defaults to Integer.MAX_VALUE and the total time spent is bounded by delivery.timeout.ms instead. It can be configured as follows:"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"props.put(ProducerConfig.RETRIES_CONFIG, 
10);"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Serializers"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A producer must use a serializer to convert messages into byte arrays before they can be sent to Kafka over the network; a consumer uses a deserializer to convert the byte arrays received from Kafka back into the corresponding objects."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Besides the built-in "},{"type":"codeinline","content":[{"type":"text","text":"org.apache.kafka.common.serialization.StringSerializer"}]},{"type":"text","text":", Kafka also provides serializers for "},{"type":"codeinline","content":[{"type":"text","text":"ByteArray"}]},{"type":"text","text":", "},{"type":"codeinline","content":[{"type":"text","text":"ByteBuffer"}]},{"type":"text","text":", "},{"type":"codeinline","content":[{"type":"text","text":"Bytes"}]},{"type":"text","text":", "},{"type":"codeinline","content":[{"type":"text","text":"Double"}]},{"type":"text","text":", "},{"type":"codeinline","content":[{"type":"text","text":"Integer"}]},{"type":"text","text":", and "},{"type":"codeinline","content":[{"type":"text","text":"Long"}]},{"type":"text","text":". All of them implement the "},{"type":"codeinline","content":[{"type":"text","text":"org.apache.kafka.common.serialization.Serializer"}]},{"type":"text","text":" interface, which has three methods:"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"/**\n * An interface for converting objects to bytes.\n *\n * A class that implements this interface is expected to have a constructor with no parameter.\n *
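The partition-selection rules listed above for ProducerRecord can be condensed into a single function. This ToyPartitioner is a hypothetical sketch, not Kafka's DefaultPartitioner: it uses Arrays.hashCode as a stand-in for the murmur2 hash, so it will not pick the same partitions as a real producer, and it models keyless records with the older round-robin behavior rather than the sticky partitioner.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical partitioner following the documented rules.
// Arrays.hashCode stands in for murmur2; keyless records use plain round-robin.
class ToyPartitioner {
    private final AtomicInteger roundRobin = new AtomicInteger(0);

    int partition(Integer partition, String key, int numPartitions) {
        if (partition != null) {
            return partition; // an explicit partition always wins; the key is not consulted
        }
        if (key != null) {
            byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
            // same key -> same partition, as long as the partition count does not change
            return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
        }
        // no partition and no key: rotate through the partitions
        return (roundRobin.getAndIncrement() & 0x7fffffff) % numPartitions;
    }
}
```

Masking with 0x7fffffff keeps the hash non-negative before the modulo, which is the same reason Kafka applies a positive-conversion step to the murmur2 result.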
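The retry behavior described above (retriable errors are retried up to retries times, non-retriable ones fail at once) can be sketched as a small loop. RetriableError and ToyRetrier are hypothetical names standing in for Kafka's RetriableException hierarchy and the producer's internal retry logic; the real producer also waits retry.backoff.ms between attempts and, in newer clients, stops once delivery.timeout.ms expires.

```java
import java.util.function.Supplier;

// Hypothetical marker for errors worth retrying (Kafka uses a RetriableException hierarchy).
class RetriableError extends RuntimeException {
    RetriableError(String message) { super(message); }
}

class ToyRetrier {
    // Makes one attempt plus up to `retries` further attempts for RetriableError;
    // any other exception (the non-retriable case) propagates immediately.
    static <T> T sendWithRetries(Supplier<T> attempt, int retries) {
        int retriesLeft = retries;
        while (true) {
            try {
                return attempt.get();
            } catch (RetriableError e) {
                if (retriesLeft <= 0) {
                    throw e; // retries exhausted: surface the last error
                }
                retriesLeft--;
                // the real producer would wait retry.backoff.ms here
            }
        }
    }
}
```

With retries set to 2, a permanently failing retriable send is attempted three times in total before the error reaches the caller.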
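Kafka's Serializer interface has three methods: configure(), serialize(), and close(). The sketch below mirrors that shape in a hypothetical SimpleSerializer interface so it compiles without kafka-clients, and implements it for a hypothetical Company type using an assumed length-prefixed encoding. In a real project the class would implement org.apache.kafka.common.serialization.Serializer instead and be registered through value.serializer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical mirror of Kafka's Serializer shape: configure(), serialize(), close().
interface SimpleSerializer<T> extends AutoCloseable {
    default void configure(Map<String, ?> configs, boolean isKey) {}
    byte[] serialize(String topic, T data);
    @Override
    default void close() {}
}

// Hypothetical domain object.
class Company {
    final String name;
    final String address;

    Company(String name, String address) {
        this.name = name;
        this.address = address;
    }
}

// Assumed wire format: [nameLength][nameBytes][addressLength][addressBytes]; null fields encode as length 0.
class CompanySerializer implements SimpleSerializer<Company> {
    @Override
    public byte[] serialize(String topic, Company data) {
        if (data == null) {
            return null; // a null value stays null, as with Kafka's built-in serializers
        }
        byte[] name = data.name == null ? new byte[0] : data.name.getBytes(StandardCharsets.UTF_8);
        byte[] address = data.address == null ? new byte[0] : data.address.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(4 + name.length + 4 + address.length);
        buffer.putInt(name.length).put(name).putInt(address.length).put(address);
        return buffer.array();
    }
}
```

The matching deserializer would read the two length prefixes back in the same order, which is why producer and consumer must agree on the format.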