【Kafka】生產者客戶端小結(java)

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"基本用法"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"實例化KafkaProducer"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一個簡單的生產端代碼如下:"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"public class KafkaProducerDemo {\n\n private static final String brokerlist = \"10.128.123.250:9092\";\n\n private static final String topic = \"topic-demo\";\n\n public static void main(String[] args) {\n Properties props = initConfig();\n KafkaProducer producer = new KafkaProducer<>(props);\n ProducerRecord record = new ProducerRecord<>(topic, \"hello, Kafka !\");\n try {\n producer.send(record);\n Thread.sleep(500L);\n } catch (Exception e) {\n e.printStackTrace();\n }\n }\n\n public static Properties initConfig() {\n Properties props = new Properties();\n props.put(\"bootstrap.servers\", brokerlist);\n props.put(\"key.serializer\", \"org.apache.kafka.common.serialization.StringSerializer\");\n props.put(\"value.serializer\", \"org.apache.kafka.common.serialization.StringSerializer\");\n return props;\n }\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"上面初始化Kafka配置的代碼,爲防止字符串的變量因爲書寫錯誤造成不能及時發現,可使用如下進行優化"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":" public static Properties initConfig() {\n Properties props = new Properties();\n props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerlist);\n props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());\n props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());\n return props;\n }"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中的"},{"type":"codeinline","content":[{"type":"text","text":"bootstrap.servers"}]},{"type":"text","text":"不必配置所有的broker地址,生產者會從給定的broker裏查找到其他broker的信息。不過建議至少要設置兩個以上的broker 地址信息,當其中任意一個宕機時,生產者仍然可以連接到 Kafka集羣上。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"KafkaProducer源碼中有多個構造函數,如果在創建KafkaProducer時沒有設置key.serializer和value.serializer,那麼也可以直接通過構造函數傳入"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"KafkaProducer producer = new KafkaProducer<>(props, new StringSerializer(), new StringSerializer());"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"KafkaProducer是線程安全的,可以在多個線程中共享單個KafkaProducer實例,比使用多實例更快。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"官網文檔描述:"}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The producer is "},{"type":"text","marks":[{"type":"italic"}],"text":"thread safe"},{"type":"text","text":" and sharing a single producer instance across threads will generally be faster than having multiple instances."}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"創建ProducerRecord"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"public ProducerRecord(String topic, V value)\npublic ProducerRecord(String topic, K key, V value) \npublic ProducerRecord(String topic, Integer partition, K key, V value)\npublic ProducerRecord(String topic, Integer partition, Long timestamp, K key, V value, Iterable
headers) \npublic ProducerRecord(String topic, Integer partition, Long timestamp, K key, V value)\npublic ProducerRecord(String topic, Integer partition, K key, V value, Iterable
headers)"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"若指定了partition,則發送至指定的partition."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"若沒指定partition,但指定了key,則根據key和分區規則指定partition。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"若既沒有指定partition也沒有指定key,則round-robin模式發送到每個partition。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"若既指定了partition又指定了key,則根據partition參數發送到指定partition,key不起作用。"}]}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"發送模式"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"發送消息主要有三種模式:發後即忘(fire-and-forget)、同步(sync)及異步(async)"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"發後即忘"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"不關心發送的消息是否到達,對返回結果不作任何處理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"本質上是一種異步發送,性能最高,但可靠性最差。"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"KafkaProducer producer = new KafkaProducer<>(props);\nProducerRecord record = new ProducerRecord<>(topic, \"hello, Kafka4 !\");\ntry {\n\tproducer.send(record);\n} catch (Exception e) {\n\te.printStackTrace();\n}"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"同步"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在執行send()方法返回Future對象,並調用了get()方法來阻塞等待Kafka的響應,直到消息發送成功,或者發生異常。如果發生異常,那麼就需要捕獲異常並交由外層邏輯處理。"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"KafkaProducer producer = new KafkaProducer<>(props);\nProducerRecord record = new ProducerRecord<>(topic, \"hello, Kafka !\");\ntry {\n\tFuture future = producer.send(record);\n RecordMetadata metadata = future.get();\n} catch (Exception e) {\n\te.printStackTrace();\n}"}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"異步回調"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在"},{"type":"codeinline","content":[{"type":"text","text":"send()"}]},{"type":"text","text":"方法裏指定一個"},{"type":"codeinline","content":[{"type":"text","text":"Callback"}]},{"type":"text","text":"回調函數"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"producer.send(record, new Callback() {\n @Override\n public void onCompletion(RecordMetadata metadata, Exception exception) {\n if (null != exception) {\n exception.printStackTrace();\n } else {\n System.out.println(metadata.topic() + \"-\" + metadata.partition() + \":\" + metadata.offset());\n }\n }\n});"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於同一個分區,假設record1比record2先發送,那麼callback1也會在callback2前先調用。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"發送重試"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"KafkaProducer中一般會發生兩種類型的異常:可重試的異常和不可重試的異常。常見的可重試異常有:"},{"type":"codeinline","content":[{"type":"text","text":"NetworkException"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"LeaderNotAvailableException"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"UnknownTopicOrPartitionException"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"NotEnoughReplicasException"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"NotCoordinatorException"}]},{"type":"text","text":" 等。比如"},{"type":"codeinline","content":[{"type":"text","text":"NetworkException"}]},{"type":"text","text":" 表示網絡異常,這個有可能是由於網絡瞬時故障而導致的異常,可以通過重試解決;又比如"},{"type":"codeinline","content":[{"type":"text","text":"LeaderNotAvailableException"}]},{"type":"text","text":"表示分區的leader副本不可用,這個異常通常發生在leader副本下線而新的 leader 副本選舉完成之前,重試之後可以重新恢復。不可重試的異常,比如"},{"type":"codeinline","content":[{"type":"text","text":"RecordTooLargeException"}]},{"type":"text","text":"異常,暗示了所發送的消息太大,KafkaProducer對此不會進行任何重試,直接拋出異常。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"對於可重試的異常,如果配置了 retries 參數,那麼只要在規定的重試次數內自行恢復了,就不會拋出異常。retries參數的默認值爲0,配置方式參考如下"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"props.put(ProducerConfig.RETRIES_CONFIG, 10);"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"序列化器"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"消息的生產者需要使用序列化器將消息轉換爲字節數組才能通過網絡發送給kafka,而消費者則使用反序列化器將從kafka接收到的字節數組轉換成相應的對象。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"除了自帶的"},{"type":"codeinline","content":[{"type":"text","text":"org.apache.kafka.common.serialization.StringSerializer"}]},{"type":"text","text":"外,還有"},{"type":"codeinline","content":[{"type":"text","text":"ByteArray"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"ByteBuffer"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"Bytes"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"Double"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"Integer"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"Long"}]},{"type":"text","text":"這幾種類型,它們都實現了"},{"type":"codeinline","content":[{"type":"text","text":"org.apache.kafka.common.serialization.Serializer"}]},{"type":"text","text":"接口,此接口有3個方法"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"/**\n * An interface for converting objects to bytes.\n *\n * A class that implements this interface is expected to have a constructor with no parameter.\n *

\n * Implement {@link org.apache.kafka.common.ClusterResourceListener} to receive cluster metadata once it's available. Please see the class documentation for ClusterResourceListener for more information.\n *\n * @param Type to be serialized from.\n */\npublic interface Serializer extends Closeable {\n\n /**\n * Configure this class.\n * @param configs configs in key/value pairs\n * @param isKey whether is for key or value\n */\n void configure(Map configs, boolean isKey);\n\n /**\n * Convert {@code data} into a byte array.\n *\n * @param topic topic associated with data\n * @param data typed data\n * @return serialized bytes\n */\n byte[] serialize(String topic, T data);\n\n /**\n * Close this serializer.\n *\n * This method must be idempotent as it may be called multiple times.\n */\n @Override\n void close();\n}"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"configure 方法用於配置當前類,在創建KafkaProducer的時候會被調用。例如在"},{"type":"codeinline","content":[{"type":"text","text":"StringSerializer"}]},{"type":"text","text":"的實現中,讀取配置的編碼格式"},{"type":"codeinline","content":[{"type":"text","text":"key.serializer.encoding"}]},{"type":"text","text":"、"},{"type":"codeinline","content":[{"type":"text","text":"value.serializer.encoding"}]},{"type":"text","text":"和"},{"type":"codeinline","content":[{"type":"text","text":"serializer.encoding"}]},{"type":"text","text":",如果都沒有配置,則默認使用\"UTF-8\"。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"serialize 方法非常直觀,就是將String類型轉爲byte[]類型。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"close 一般都是空的實現。源碼註解提到該方法必須是冪等的,因爲可能被調用多次。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"需要注意的是,生產者使用的序列化器必須與消費者使用的反序列化器一一對應,否則無法解析出想要的數據。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們也可以使用Avro、JSON、Thrift、ProtoBuf和Protostuff等通用的序列化工具來實現序列化器,或自定義的序列化器。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"分區器"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果發送消息的ProducerRecord中已經指定了partition,則無需使用分區器了,因爲partition已經指定了需要發送到的分區號。只有在沒有指定partition時,分區器纔會起作用。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka中提供了默認的分區器"},{"type":"codeinline","content":[{"type":"text","text":"org.apache.kafka.clients.producer.internals.DefaultPartitioner"}]},{"type":"text","text":",實現了"},{"type":"codeinline","content":[{"type":"text","text":"org.apache.kafka.clients.producer.Partitioner"}]},{"type":"text","text":"接口。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"codeinline","content":[{"type":"text","text":"Partitioner"}]},{"type":"text","text":"還有一個父接口,通過實現"},{"type":"codeinline","content":[{"type":"text","text":"configure"}]},{"type":"text","text":"方法,可通過配置進行一些初始化配置工作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過"},{"type":"codeinline","content":[{"type":"text","text":"partition"}]},{"type":"text","text":"方法則可實現分區邏輯。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在默認分區器中,如果key爲空,則會計算得到僅爲可用分區的分區號中任意一個。如果不爲空,可能得到所有分區中的分區號中任意一個。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"攔截器"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"攔截器有兩種,生產者攔截器和消費者攔截器,這裏僅說明生產者攔截器。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"生產者攔截器既可以用來在消息發送前做一些準備工作,比如按照某個規則過濾不符合要求的消息、修改消息的內容等,也可以用來在發送回調邏輯前做一些定製化的需求,比如統計類工作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"ProducerInterceptor接口中包含3個方法:"}]},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"/**\n * A plugin interface that allows you to intercept (and possibly mutate) the records received by the producer before\n * they are published to the Kafka cluster.\n *

\n * This class will get producer config properties via configure() method, including clientId assigned\n * by KafkaProducer if not specified in the producer config. The interceptor implementation needs to be aware that it will be\n * sharing producer config namespace with other interceptors and serializers, and ensure that there are no conflicts.\n *

\n * Exceptions thrown by ProducerInterceptor methods will be caught, logged, but not propagated further. As a result, if\n * the user configures the interceptor with the wrong key and value type parameters, the producer will not throw an exception,\n * just log the errors.\n *

\n * ProducerInterceptor callbacks may be called from multiple threads. Interceptor implementation must ensure thread-safety, if needed.\n *

\n * Implement {@link org.apache.kafka.common.ClusterResourceListener} to receive cluster metadata once it's available. Please see the class documentation for ClusterResourceListener for more information.\n */\npublic interface ProducerInterceptor extends Configurable {\n /**\n * This is called from {@link org.apache.kafka.clients.producer.KafkaProducer#send(ProducerRecord)} and\n * {@link org.apache.kafka.clients.producer.KafkaProducer#send(ProducerRecord, Callback)} methods, before key and value\n * get serialized and partition is assigned (if partition is not specified in ProducerRecord).\n *

\n * This method is allowed to modify the record, in which case, the new record will be returned. The implication of modifying\n * key/value is that partition assignment (if not specified in ProducerRecord) will be done based on modified key/value,\n * not key/value from the client. Consequently, key and value transformation done in onSend() needs to be consistent:\n * same key and value should mutate to the same (modified) key and value. Otherwise, log compaction would not work\n * as expected.\n *

\n * Similarly, it is up to interceptor implementation to ensure that correct topic/partition is returned in ProducerRecord.\n * Most often, it should be the same topic/partition from 'record'.\n *

\n * Any exception thrown by this method will be caught by the caller and logged, but not propagated further.\n *

\n * Since the producer may run multiple interceptors, a particular interceptor's onSend() callback will be called in the order\n * specified by {@link org.apache.kafka.clients.producer.ProducerConfig#INTERCEPTOR_CLASSES_CONFIG}. The first interceptor\n * in the list gets the record passed from the client, the following interceptor will be passed the record returned by the\n * previous interceptor, and so on. Since interceptors are allowed to modify records, interceptors may potentially get\n * the record already modified by other interceptors. However, building a pipeline of mutable interceptors that depend on the output\n * of the previous interceptor is discouraged, because of potential side-effects caused by interceptors potentially failing to\n * modify the record and throwing an exception. If one of the interceptors in the list throws an exception from onSend(), the exception\n * is caught, logged, and the next interceptor is called with the record returned by the last successful interceptor in the list,\n * or otherwise the client.\n *\n * @param record the record from client or the record returned by the previous interceptor in the chain of interceptors.\n * @return producer record to send to topic/partition\n */\n public ProducerRecord onSend(ProducerRecord record);\n\n /**\n * This method is called when the record sent to the server has been acknowledged, or when sending the record fails before\n * it gets sent to the server.\n *

\n * This method is generally called just before the user callback is called, and in additional cases when KafkaProducer.send()\n * throws an exception.\n *

\n * Any exception thrown by this method will be ignored by the caller.\n *

\n * This method will generally execute in the background I/O thread, so the implementation should be reasonably fast.\n * Otherwise, sending of messages from other threads could be delayed.\n *\n * @param metadata The metadata for the record that was sent (i.e. the partition and offset).\n * If an error occurred, metadata will contain only valid topic and maybe\n * partition. If partition is not given in ProducerRecord and an error occurs\n * before partition gets assigned, then partition will be set to RecordMetadata.NO_PARTITION.\n * The metadata may be null if the client passed null record to\n * {@link org.apache.kafka.clients.producer.KafkaProducer#send(ProducerRecord)}.\n * @param exception The exception thrown during processing of this record. Null if no error occurred.\n */\n public void onAcknowledgement(RecordMetadata metadata, Exception exception);\n\n /**\n * This is called when interceptor is closed\n */\n public void close();\n}"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"KafkaProducer在將消息序列化和計算分區之前會調用生產者攔截器的onSend()方法來對消息進行相應的定製化操作。一般來說最好不要修改消息 ProducerRecord 的 topic、key 和partition 等信息,如果要修改,則需確保對其有準確的判斷,否則會與預想的效果出現偏差。比如修改key不僅會影響分區的計算,同樣會影響broker端日誌壓縮(Log Compaction)的功能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"KafkaProducer 會在消息被應答(Acknowledgement)之前或消息發送失敗時調用生產者攔截器的onAcknowledgement()方法,優先於用戶設定的 Callback 之前執行。這個方法運行在Producer的 I/O 線程中,所以這個方法中實現的代碼邏輯越簡單越好,否則會影響消息的發送速度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka可支持鏈式的多個攔截器。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"原理分析"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"整體架構"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/50/50310cdd7a5e8872e5fecbd1e6684579.png","alt":null,"title":null,"style":null,"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"整個生產者客戶端由兩個線程協調運行。"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"主線程"}]}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在主線程中由KafkaProducer創建消息,然後通過可能的攔截器、序列化器和分區器的作用之後緩存到消息累加器(RecordAccumulator,也稱爲消息收集器)中。在RecordAccumulator 的內部爲每個分區都維護了一個雙端隊列,隊列中的內容就是ProducerBatch,即 Deque<ProducerBatch>。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RecordAccumulator用於緩存消息,以便Sender線程能批量發送,進而減少網絡傳輸的資源消耗進而提升性能。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"RecordAccumulator緩存的大小可通過生產者客戶端參數"},{"type":"codeinline","content":[{"type":"text","text":"buffer.memory"}]},{"type":"text","text":"進行配置,默認32MB。如果生產者發送消息的速度超過發送到服務器的速度,則會導致生產者空間不足,這個時候KafkaProducer的send()方法調用要麼被阻塞,要麼拋出異常,這個取決於參數"},{"type":"codeinline","content":[{"type":"text","text":"max.block.ms"}]},{"type":"text","text":"的配置,此參數的默認值爲60000,即60秒。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在RecordAccumulator的內部還有一個BufferPool,它主要用來實現ByteBuffer的複用,以實現緩存的高效利用。不過BufferPool只針對特定大小的ByteBuffer進行管理,而其他大小的ByteBuffer不會緩存進BufferPool中,這個特定的大小由"},{"type":"codeinline","content":[{"type":"text","text":"batch.size"}]},{"type":"text","text":"參數來指定,默認值爲16384B,即16KB。我們可以適當地調大batch.size參數以便多緩存一些消息。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sender線程"}]}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sender 線程負責從RecordAccumulator中獲取消息並將其發送到Kafka中。"}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Sender 從 RecordAccumulator 中獲取緩存的消息之後,會進一步將原本<分區,Deque<ProducerBatch>>的保存形式轉變成<Node,List< ProducerBatch>的形式,其中Node表示Kafka集羣的broker節點。請求在從Sender線程發往Kafka之前還會保存到InFlightRequests中,InFlightRequests保存對象的具體形式爲 Map<NodeId,Deque<Request>>,它的主要作用是緩存了已經發出去但還沒有收到響應的請求(NodeId 是一個 String 類型,表示節點的 id 編號)。"}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"與此同時,InFlightRequests還提供了許多管理類的方法,並且通過配置參數還可以限制每個連接(也就是客戶端與Node之間的連接)最多緩存的請求數。這個配置參數爲"},{"type":"codeinline","content":[{"type":"text","text":"max.in.flight.requests.per.connection"}]},{"type":"text","text":",默認值爲 5,即每個連接最多隻能緩存 5 個未響應的請求,超過該數值之後就不能再向這個連接發送更多的請求了,除非有緩存的請求收到了響應(Response)。通過比較Deque<Request>的size與這個參數的大小來判斷對應的Node中是否已經堆積了很多未響應的消息,如果真是如此,那麼說明這個 Node 節點負載較大或網絡連接有問題,再繼續向其發送請求會增大請求超時的可能。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"元數據"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"leastLoadedNode"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"即所有Node中負載最小的那個,這裏的負載最小是通過每個Node在InFlightRequests中還未確認的請求決定的,未確認的請求越多則認爲負載越大。"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/25/255fc66765032082a4546fbbc0ce9333.png","alt":null,"title":"","style":[{"key":"width","value":"50%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如圖中所示,負載最小的是節點node2。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因此node2則是leastLoadedNode,如果選擇他進行消息發送可以使它能夠儘快發出,避免因網絡擁塞等異常而影響整體的進度。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"leastLoadedNode的概念可以用於多個應用場合,比如"},{"type":"text","marks":[{"type":"strong"}],"text":"元數據請求"},{"type":"text","text":"、消費者組播協議的交互。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"codeblock","attrs":{"lang":"java"},"content":[{"type":"text","text":"ProducerRecord record = new ProducerRecord<>(topic, \"hello, Kafka !\");"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當我們創建這樣一條消息,我們只知道主題和消息內容,對其他信息一無所知。KafkaProducer要將此消息追加到指定主題的某個分區所對應的leader副本之前,首先需要知道主題的分區數量,然後經過計算得出(或者直接指定)目標分區,之後KafkaProducer需要知道目標分區的leader副本所在的broker 節點的地址、端口等信息才能建立連接,最終才能將消息發送到 Kafka,在這一過程中所需要的信息都屬於元數據信息。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面介紹了"},{"type":"codeinline","content":[{"type":"text","text":"bootstrap.servers"}]},{"type":"text","text":"參數只需要配置部分broker節點的地址即可,不需要配置所有broker節點的地址,因爲客戶端可以自己發現其他broker節點的地址,這一過程也屬於元數據相關的更新操作。與此同時,分區數量及leader副本的分佈都會動態地變化,客戶端也需要動態地捕捉這些變化。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"元數據是指Kafka集羣的元數據,這些元數據具體記錄了集羣中有哪些主題,這些主題有哪些分區,每個分區的leader副本分配在哪個節點上,follower副本分配在哪些節點上,哪些副本在AR、ISR等集合中,集羣中有哪些節點,控制器節點又是哪一個等信息。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當客戶端中沒有需要使用的元數據信息時,比如沒有指定的主題信息,或者超過"},{"type":"codeinline","content":[{"type":"text","text":"metadata.max.age.ms"}]},{"type":"text","text":" 時間沒有更新元數據都會引起元數據的更新操作。客戶端參數"},{"type":"codeinline","content":[{"type":"text","text":"metadata.max.age.ms"}]},{"type":"text","text":"的默認值爲300000,即5分鐘。元數據的更新操作是在客戶端內部進行的,對客戶端的外部使用者不可見。當需要更新元數據時,會先挑選出"},{"type":"codeinline","content":[{"type":"text","text":"leastLoadedNode"}]},{"type":"text","text":",然後向這個Node發送MetadataRequest請求來獲取具體的元數據信息。這個更新操作是由Sender線程發起的,在創建完MetadataRequest之後同樣會存入InFlightRequests,之後的步驟就和發送消息時的類似。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"元數據雖然由Sender線程負責更新,但是主線程也需要讀取這些信息,這裏的數據同步通過synchronized和final關鍵字來保障。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"主要參數"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"acks"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"指定分區中必須要有多少個副本收到這條消息,之後生產者纔會認爲這條消息是成功寫入的。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"acks 是生產者客戶端中一個非常重要的參數,它涉及消息的可靠性和吞吐量之間的權衡。acks參數有3種類型的值(都是字符串類型)。"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"acks=1(默認)"}]}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"生產者發送消息之後,只要分區的leader副本成功寫入消息,那麼它就會收到來自服務端的成功響應。"}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果消息無法寫入leader副本,比如在leader 副本崩潰、重新選舉新的leader 副本的過程中,那麼生產者就會收到一個錯誤的響應,爲了避免消息丟失,生產者可以選擇重發消息。"}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果消息寫入leader副本並返回成功響應給生產者,且在被其他follower副本拉取之前leader副本崩潰,那麼此時消息還是會丟失,因爲新選舉的leader副本中並沒有這條對應的消息。"}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"acks設置爲1,是消息可靠性和吞吐量之間的折中方案。"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"acks=0"}]}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"生產者發送消息之後不需要等待任何服務端的響應。如果在消息從發送到寫入Kafka的過程中出現某些異常,導致Kafka並沒有收到這條消息,那麼生產者也無從得知,消息也就丟失了。在其他配置環境相同的情況下,"},{"type":"text","marks":[{"type":"strong"}],"text":"acks 設置爲 0 可以達到最大的吞吐量"},{"type":"text","text":"。"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"acks=-1或acks=all"}]}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"生產者在消息發送之後,需要等待ISR中的所有副本都成功寫入消息之後才能夠收到來自服務端的成功響應。在其他配置環境相同的情況下,acks 設置爲-1(all)可以達到最強的可靠性。但這並不意味着消息就一定可靠,因爲ISR中可能只有leader副本,這樣就退化成了acks=1的情況。要獲得更高的消息可靠性需要配合 "},{"type":"codeinline","content":[{"type":"text","text":"min.insync.replicas"}]},{"type":"text","text":" 等參數的聯動。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"max.request.size"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"限制生產者客戶端能發送的消息的最大值,默認值爲 1048576B,即 1MB。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"不建議盲目地增大這個參數的配置值,尤其是在對Kafka整體脈絡沒有足夠把控的時候。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"因爲這個參數還涉及一些其他參數的聯動,比如broker端的"},{"type":"codeinline","content":[{"type":"text","text":"message.max.bytes"}]},{"type":"text","text":"參數,如果配置錯誤可能會引起一些不必要的異常。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"比如將broker端的"},{"type":"codeinline","content":[{"type":"text","text":"message.max.bytes"}]},{"type":"text","text":"參數配置爲10,而max.request.size參數配置爲20,那麼當我們發送一條大小爲15B的消息時,生產者客戶端就會報出如下的異常"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"org.apache.kafka.common.errors.RecordTooLargeException: The request included a message larger than the max message size the server will accept."}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"retries和retry.backoff.ms"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"retries參數用來配置生產者重試的次數,默認值爲0,該參數用於設置在發生"},{"type":"text","marks":[{"type":"strong"}],"text":"可重試異常"},{"type":"text","text":"的時候進行重試的次數。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"retry.backoff.ms用來設定兩次重試之間的時間間隔,避免無效的頻繁重試,默認值爲100。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"關於有序消息"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka 可以保證同一個分區中的消息是有序的。如果生產者按照一定的順序發送消息,那麼這些消息也會順序地寫入分區,進而消費者也可以按照同樣的順序消費它們。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果將"},{"type":"codeinline","content":[{"type":"text","text":"acks"}]},{"type":"text","text":"參數配置爲非零值,並且"},{"type":"codeinline","content":[{"type":"text","text":"max.in.flight.requests.per.connection"}]},{"type":"text","text":"參數配置爲大於1的值,那麼就會出現錯序的現象:如果第一批次消息寫入失敗,而第二批次消息寫入成功,那麼生產者會重試發送第一批次的消息,此時如果第一批次的消息寫入成功,那麼這兩個批次的消息就出現了錯序。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在需要保證消息順序的場合建議把參數max.in.flight.requests.per.connection配置爲1,而不是把acks配置爲0,不過這樣也會影響整體的吞吐。"}]}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"compression.type"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"指定消息的壓縮方式,默認值爲“none”,即默認情況下,消息不會被壓縮。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"該參數還可以配置爲“gzip”“snappy”和“lz4”。對消息進行壓縮可以極大地減少網絡傳輸量、降低網絡I/O,從而提高整體的性能。消息壓縮是一種使用時間換空間的優化方式,如果對時延有一定的要求,則不推薦對消息進行壓縮。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"connections.max.idle.ms"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"指定在多久之後關閉閒置的連接,默認值是540000(ms),即9分鐘。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"linger.ms"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"指定生產者發送 ProducerBatch 之前等待更多消息(ProducerRecord)加入ProducerBatch 的時間,默認值爲 0。生產者客戶端會在 ProducerBatch 被填滿或等待時間超過linger.ms 值時發送出去。增大這個參數的值會增加消息的延遲,但是同時能提升一定的吞吐量。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"batch.size"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"即上面提到的ProducerBatch一個批次發送消息的大小。在多個消息發送到同一個分區時,生產者將消息打包在一起,以減少網絡開銷和請求交互,從而提升性能。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"receive.buffer.bytes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"設置Socket接收消息緩衝區(SO_RECBUF)的大小,默認值爲32768(B),即32KB。如果設置爲-1,則使用操作系統的默認值。如果Producer與Kafka處於不同的機房,則可以適地調大這個參數值,因爲跨數據中心的網絡一般都有比較高的延遲和比較低的帶寬。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"send.buffer.bytes"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"設置Socket發送消息緩衝區(SO_SNDBUF)的大小,默認值爲131072(B),即128KB。與receive.buffer.bytes參數一樣,如果設置爲-1,則使用操作系統的默認值。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"request.timeout.ms"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"配置Producer等待請求響應的最長時間,默認值爲30000(ms)。請求超時之後可以選擇進行重試。"}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"注意:"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這個參數需要比broker端參數"},{"type":"codeinline","content":[{"type":"text","text":"replica.lag.time.max.ms"}]},{"type":"text","text":" 的值要大,這樣可以減少因客戶端重試而引起的消息重複的概率。"}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}

發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章