```java
/**
 * This class will get producer config properties via configure() method, including clientId assigned
 * by KafkaProducer if not specified in the producer config. The interceptor implementation needs to be aware that it will be
 * sharing producer config namespace with other interceptors and serializers, and ensure that there are no conflicts.
 *
 * Exceptions thrown by ProducerInterceptor methods will be caught, logged, but not propagated further. As a result, if
 * the user configures the interceptor with the wrong key and value type parameters, the producer will not throw an exception,
 * just log the errors.
 *
 * ProducerInterceptor callbacks may be called from multiple threads. Interceptor implementation must ensure thread-safety, if needed.
 *
 * Implement {@link org.apache.kafka.common.ClusterResourceListener} to receive cluster metadata once it's available. Please see the class documentation for ClusterResourceListener for more information.
 */
public interface ProducerInterceptor<K, V> extends Configurable {
    /**
     * This method is allowed to modify the record, in which case, the new record will be returned. The implication of modifying
     * key/value is that partition assignment (if not specified in ProducerRecord) will be done based on modified key/value,
     * not key/value from the client. Consequently, key and value transformation done in onSend() needs to be consistent:
     * same key and value should mutate to the same (modified) key and value. Otherwise, log compaction would not work
     * as expected.
     *
     * Similarly, it is up to interceptor implementation to ensure that correct topic/partition is returned in ProducerRecord.
     * Most often, it should be the same topic/partition from 'record'.
     *
     * Any exception thrown by this method will be caught by the caller and logged, but not propagated further.
     *
     * Since the producer may run multiple interceptors, a particular interceptor's onSend() callback will be called in the order
     * specified by {@link org.apache.kafka.clients.producer.ProducerConfig#INTERCEPTOR_CLASSES_CONFIG}. The first interceptor
     * in the list gets the record passed from the client, the following interceptor will be passed the record returned by the
     * previous interceptor, and so on. Since interceptors are allowed to modify records, interceptors may potentially get
     * the record already modified by other interceptors. However, building a pipeline of mutable interceptors that depend on the output
     * of the previous interceptor is discouraged, because of potential side-effects caused by interceptors potentially failing to
     * modify the record and throwing an exception. If one of the interceptors in the list throws an exception from onSend(), the exception
     * is caught, logged, and the next interceptor is called with the record returned by the last successful interceptor in the list,
     * or otherwise the client.
     *
     * @param record the record from client or the record returned by the previous interceptor in the chain of interceptors.
     * @return producer record to send to topic/partition
     */
    public ProducerRecord<K, V> onSend(ProducerRecord<K, V> record);

    /**
     * This method is generally called just before the user callback is called, and in additional cases when KafkaProducer.send()
     * throws an exception.
     *
     * Any exception thrown by this method will be ignored by the caller.
     *
     * This method will generally execute in the background I/O thread, so the implementation should be reasonably fast.
     * Otherwise, sending of messages from other threads could be delayed.
     *
     * @param metadata The metadata for the record that was sent (i.e. the partition and offset).
     *                 If an error occurred, metadata will contain only valid topic and maybe
     *                 partition. If partition is not given in ProducerRecord and an error occurs
     *                 before partition gets assigned, then partition will be set to RecordMetadata.NO_PARTITION.
     *                 The metadata may be null if the client passed null record to
     *                 {@link org.apache.kafka.clients.producer.KafkaProducer#send(ProducerRecord)}.
     * @param exception The exception thrown during processing of this record. Null if no error occurred.
     */
    public void onAcknowledgement(RecordMetadata metadata, Exception exception);

    /**
     * This is called when interceptor is closed
     */
    public void close();
}
```

KafkaProducer calls the interceptor's `onSend()` method before it serializes the message and computes its partition, so the interceptor can customize the message at that point. In general, it is best not to modify the `topic`, `key`, or `partition` of the `ProducerRecord`. If you must, make sure the change is carefully reasoned, otherwise the result may differ from what you expect. For example, modifying the key affects not only how the partition is computed but also broker-side log compaction.

KafkaProducer calls the interceptor's `onAcknowledgement()` method once a message has been acknowledged, or when sending it fails, and it runs before the user-specified `Callback`. This method runs in the producer's I/O thread, so keep its logic as simple as possible; otherwise it will slow down message sending.

Kafka supports a chain of multiple interceptors. They are invoked in the order in which they are listed in `interceptor.classes` (`ProducerConfig.INTERCEPTOR_CLASSES_CONFIG`).

## How It Works

### Overall Architecture

![Overall architecture of the producer client](https://static001.geekbang.org/infoq/50/50310cdd7a5e8872e5fecbd1e6684579.png)

The producer client runs as two cooperating threads.

- **Main thread**

  In the main thread, KafkaProducer creates messages. Each message passes through any configured interceptors, the serializer, and the partitioner, and is then buffered in the RecordAccumulator (also called the message collector). Internally, RecordAccumulator maintains a double-ended queue for each partition, and the elements of that queue are ProducerBatch objects, i.e. `Deque<ProducerBatch>`.

> RecordAccumulator buffers messages so that the Sender thread can send them in batches. This reduces network overhead and improves performance.
>
> The size of the RecordAccumulator buffer is set by the producer parameter `buffer.memory`, which defaults to 32 MB. If the producer creates messages faster than they can be sent to the server, the buffer runs out of space. `KafkaProducer.send()` then blocks for up to `max.block.ms` (default 60000, i.e. 60 seconds), and throws an exception if space still has not been freed.
>
> RecordAccumulator also contains a BufferPool, which reuses ByteBuffer objects to make efficient use of the buffer. However, BufferPool manages only ByteBuffers of one specific size, and ByteBuffers of other sizes are not pooled. That size is set by `batch.size`, which defaults to 16384 bytes (16 KB). Increasing `batch.size` moderately lets each batch hold more messages.

- **Sender thread**

  The Sender thread takes messages from RecordAccumulator and sends them to Kafka.

  After fetching the buffered messages, the Sender converts them from the `<partition, Deque<ProducerBatch>>` form into the `<Node, List<ProducerBatch>>` form, where Node is a broker in the Kafka cluster. Before a request leaves the Sender thread for Kafka, it is also stored in InFlightRequests, which has the form `Map<NodeId, Deque<Request>>`. Its main purpose is to track requests that have been sent but have not yet received a response (NodeId is a String holding the node's id).

  InFlightRequests also provides a number of management methods. A configuration parameter, `max.in.flight.requests.per.connection`, limits how many requests each connection (that is, each connection between the client and a Node) may have outstanding. The default is 5, so each connection can have at most 5 unanswered requests. Once that limit is reached, no more requests are sent on that connection until one of the outstanding requests receives a response. The client compares the size of `Deque<Request>` with this parameter to tell whether a Node has accumulated many unanswered requests. If it has, the Node is probably heavily loaded or has a network problem, and sending it more requests would increase the chance of request timeouts.

### Metadata

**leastLoadedNode**

leastLoadedNode is the Node with the lowest load among all Nodes. Load is measured by the number of unacknowledged requests each Node has in InFlightRequests: the more unacknowledged requests, the higher the load.

![leastLoadedNode example](https://static001.geekbang.org/infoq/25/255fc66765032082a4546fbbc0ce9333.png)

As the figure shows, node2 has the lowest load, so node2 is the leastLoadedNode. Sending to it lets the request go out as soon as possible and keeps problems such as network congestion from holding up overall progress.

The leastLoadedNode concept is used in several places, such as **metadata requests** and consumer group protocol interactions.
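The InFlightRequests bookkeeping and leastLoadedNode selection described above can be modeled with plain Java collections. This is a simplified sketch, not Kafka's implementation. The class name `InFlightModel` and its methods are hypothetical, and requests are plain strings. The real client (`NetworkClient`) also takes connection state into account when choosing the least-loaded node.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of InFlightRequests (Map<NodeId, Deque<Request>>).
// Requests are plain strings here; Kafka's real classes carry much more state.
final class InFlightModel {
    // Plays the role of max.in.flight.requests.per.connection
    private final int maxInFlightPerConnection;
    private final Map<String, Deque<String>> requests = new HashMap<>();

    InFlightModel(int maxInFlightPerConnection) {
        this.maxInFlightPerConnection = maxInFlightPerConnection;
    }

    // A request was sent to nodeId and is now waiting for a response
    void add(String nodeId, String request) {
        requests.computeIfAbsent(nodeId, k -> new ArrayDeque<>()).addFirst(request);
    }

    // A response arrived: the oldest outstanding request for nodeId completes
    String complete(String nodeId) {
        Deque<String> queue = requests.get(nodeId);
        return queue == null ? null : queue.pollLast();
    }

    int inFlightCount(String nodeId) {
        Deque<String> queue = requests.get(nodeId);
        return queue == null ? 0 : queue.size();
    }

    // Compare the deque size with the per-connection limit
    boolean canSendMore(String nodeId) {
        return inFlightCount(nodeId) < maxInFlightPerConnection;
    }

    // leastLoadedNode: the node with the fewest unacknowledged requests
    // (ties go to the node that appears first in nodeIds)
    String leastLoadedNode(List<String> nodeIds) {
        String best = null;
        int bestCount = Integer.MAX_VALUE;
        for (String nodeId : nodeIds) {
            int count = inFlightCount(nodeId);
            if (count < bestCount) {
                best = nodeId;
                bestCount = count;
            }
        }
        return best;
    }
}
```

Suppose there are three outstanding requests on `node0`, two on `node1` and one on `node2`. Then `leastLoadedNode(Arrays.asList("node0", "node1", "node2"))` returns `"node2"`. Once a node holds five outstanding requests (with a limit of 5), `canSendMore` returns false for it until `complete` is called for that node.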
[Kafka] Producer Client Summary (Java)
## Basic Usage

### Instantiating KafkaProducer

A simple producer looks like this:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KafkaProducerDemo {

    private static final String brokerlist = "10.128.123.250:9092";

    private static final String topic = "topic-demo";

    public static void main(String[] args) {
        Properties props = initConfig();
        // try-with-resources calls close(), which waits for buffered records to be sent,
        // so no Thread.sleep() is needed to wait for the asynchronous send
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record = new ProducerRecord<>(topic, "hello, Kafka !");
            producer.send(record);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static Properties initConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", brokerlist);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }
}
```

In the configuration code above, a typo in one of the string keys would not be caught early. To avoid this, use the constants defined in `ProducerConfig` instead:

```java
// Requires: org.apache.kafka.clients.producer.ProducerConfig, org.apache.kafka.common.serialization.StringSerializer
public static Properties initConfig() {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, brokerlist);
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    return props;
}
```

`bootstrap.servers` does not need to list every broker address, because the producer discovers the other brokers from the ones given. It is still recommended to list at least two brokers, so that the producer can still connect to the Kafka cluster if one of them goes down.

KafkaProducer has several constructors. If `key.serializer` and `value.serializer` are not set in the configuration, the serializers can be passed directly to the constructor:

```java
KafkaProducer<String, String> producer = new KafkaProducer<>(props, new StringSerializer(), new StringSerializer());
```

KafkaProducer is thread-safe. A single instance can be shared across multiple threads, which is generally faster than using multiple instances.

The official documentation states:

> The producer is *thread safe* and sharing a single producer instance across threads will generally be faster than having multiple instances.

### Creating a ProducerRecord

```java
public ProducerRecord(String topic, V value)
public ProducerRecord(String topic, K key, V value)
public ProducerRecord(String topic, Integer partition, K key, V value)
public ProducerRecord(String topic, Integer partition, Long timestamp, K key, V value, Iterable<Header> headers)
public ProducerRecord(String topic, Integer partition, Long timestamp, K key, V value)
public ProducerRecord(String topic, Integer partition, K key, V value, Iterable<Header> headers)
```

- If a partition is specified, the record is sent to that partition.
- If no partition is specified but a key is, the partition is computed from the key by the partitioning rule (by default, a murmur2 hash of the serialized key modulo the number of partitions).
- If neither a partition nor a key is specified, records are spread across the partitions round-robin. This is the behavior of the client versions this article describes; since Kafka 2.4 the default partitioner uses "sticky" partitioning instead (KIP-480).
- If both a partition and a key are specified, the record is sent to the specified partition, and the key has no effect on partitioning.

### Send Modes

There are three main ways to send messages: fire-and-forget, synchronous (sync), and asynchronous (async).

#### Fire-and-Forget

The producer does not check whether the message arrives and does nothing with the result.

This is essentially asynchronous sending. It offers the highest performance but the weakest reliability.

```java
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
ProducerRecord<String, String> record = new ProducerRecord<>(topic, "hello, Kafka4 !");
try {
    producer.send(record);
} catch (Exception e) {
    e.printStackTrace();
}
```

#### Synchronous

`send()` returns a `Future`. Calling its `get()` method blocks until Kafka responds, that is, until the message is sent successfully or an exception occurs. If an exception occurs, catch it and let the outer logic handle it.

```java
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
ProducerRecord<String, String> record = new ProducerRecord<>(topic, "hello, Kafka !");
try {
    Future<RecordMetadata> future = producer.send(record);
    RecordMetadata metadata = future.get();
} catch (Exception e) {
    e.printStackTrace();
}
```

#### Asynchronous Callback

Pass a `Callback` to the `send()` method:

```java
producer.send(record, new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception exception) {
        if (null != exception) {
            exception.printStackTrace();
        } else {
            System.out.println(metadata.topic() + "-" + metadata.partition() + ":" + metadata.offset());
        }
    }
});
```

Within the same partition, if record1 is sent before record2, callback1 is invoked before callback2.

### Retries

KafkaProducer generally encounters two kinds of exceptions: retriable and non-retriable. Common retriable exceptions include `NetworkException`, `LeaderNotAvailableException`, `UnknownTopicOrPartitionException`, `NotEnoughReplicasException`, and `NotCoordinatorException`.

- `NetworkException` indicates a network error. It may be caused by a transient network failure that a retry can resolve.
- `LeaderNotAvailableException` means the partition's leader replica is unavailable. It usually occurs after the leader goes offline and before a new leader has been elected, and sending succeeds again after a retry.

Non-retriable exceptions are not retried. For example, `RecordTooLargeException` indicates that the message being sent is too large, and KafkaProducer throws it directly.

For retriable exceptions, if the `retries` parameter is configured, no exception is thrown as long as the error clears up within the configured number of retries. In the client version this article is based on, `retries` defaults to 0. Since Kafka 2.1 the default is `Integer.MAX_VALUE`, and `delivery.timeout.ms` bounds the total time spent on a send. It can be configured as follows:

```java
props.put(ProducerConfig.RETRIES_CONFIG, 10);
```

### Serializers

A producer must use a serializer to convert messages into byte arrays before they can be sent to Kafka over the network. A consumer uses a deserializer to convert the byte arrays received from Kafka back into the corresponding objects.

Besides the built-in `org.apache.kafka.common.serialization.StringSerializer`, Kafka provides serializers for `ByteArray`, `ByteBuffer`, `Bytes`, `Double`, `Integer`, and `Long`. All of them implement the `org.apache.kafka.common.serialization.Serializer` interface, which has three methods:

```java
/**
 * An interface for converting objects to bytes.
 *
 * A class that implements this interface is expected to have a constructor with no parameter.
 * <p>
 * Implement {@link org.apache.kafka.common.ClusterResourceListener} to receive cluster metadata once it's available. Please see the class documentation for ClusterResourceListener for more information.
 *
 * @param <T> Type to be serialized from.
 */
public interface Serializer<T> extends Closeable {

    // Configures the serializer; isKey tells whether it is used for keys or values
    void configure(Map<String, ?> configs, boolean isKey);

    // Converts data for the given topic into a byte array
    byte[] serialize(String topic, T data);

    // Closes the serializer
    @Override
    void close();
}
```

Since Kafka 2.1, `configure()` and `close()` have empty default implementations, and an overload `serialize(String topic, Headers headers, T data)` has been added.
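The sketch below shows how a custom serializer might implement these three methods. It encodes a hypothetical `Company` value (name and address) as length-prefixed UTF-8 strings using `java.nio.ByteBuffer`. So that it compiles without the Kafka client jar, it declares a hypothetical local interface `SimpleSerializer<T>` with the same three method signatures. In a real project the class would implement `org.apache.kafka.common.serialization.Serializer<Company>` and be registered through `value.serializer`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical stand-in with the same three methods as Kafka's Serializer<T>,
// declared locally so the sketch compiles without the Kafka client jar
interface SimpleSerializer<T> {
    void configure(Map<String, ?> configs, boolean isKey);

    byte[] serialize(String topic, T data);

    void close();
}

// Hypothetical value type used only in this sketch
final class Company {
    final String name;
    final String address;

    Company(String name, String address) {
        this.name = name;
        this.address = address;
    }
}

final class CompanySerializer implements SimpleSerializer<Company> {
    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // Nothing to configure in this sketch
    }

    // Layout: [int nameLength][name bytes][int addressLength][address bytes], UTF-8.
    // A null field is written as an empty string.
    @Override
    public byte[] serialize(String topic, Company data) {
        if (data == null) {
            return null;
        }
        byte[] name = data.name == null ? new byte[0] : data.name.getBytes(StandardCharsets.UTF_8);
        byte[] address = data.address == null ? new byte[0] : data.address.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(4 + name.length + 4 + address.length);
        buffer.putInt(name.length);
        buffer.put(name);
        buffer.putInt(address.length);
        buffer.put(address);
        return buffer.array();
    }

    @Override
    public void close() {
        // No resources to release
    }

    // Inverse of serialize(), so the round trip can be checked
    static Company deserialize(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        byte[] name = new byte[buffer.getInt()];
        buffer.get(name);
        byte[] address = new byte[buffer.getInt()];
        buffer.get(address);
        return new Company(new String(name, StandardCharsets.UTF_8), new String(address, StandardCharsets.UTF_8));
    }
}
```

Returning `null` for a `null` value matches what the built-in serializers such as `StringSerializer` do. The length prefixes let a matching deserializer split the two fields without needing a delimiter.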