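A producer is an application that creates messages and publishes them to a broker.
A producer connects to any live node and requests metadata about the leaders of a topic's partitions. With this information the producer can send each message directly to the lead broker of the target partition.
For efficiency, a producer can publish messages in batches, but only in asynchronous mode. In asynchronous mode the producer is configured with either `queue.time` or `batch.size` (named `queue.buffering.max.ms` and `batch.num.messages` in the 0.8 producer used in this article), so that messages are published in a batch after a fixed amount of time or a fixed number of messages. Messages accumulate on the producer side and are then sent to the broker in a single request. Asynchronous mode therefore carries a risk of message loss: if the producer crashes, any messages buffered in memory that have not yet been published are lost.
For an asynchronous producer, a callback can be used to register handlers that catch errors.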
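As a rough illustration of the batching settings described above, the following sketch builds the properties for an asynchronous producer. It uses only `java.util.Properties`, so it runs without the Kafka jars. The property names `producer.type`, `batch.num.messages`, and `queue.buffering.max.ms` are the 0.8 producer configuration names; the class name `AsyncProducerSettings` and the values 100 and 2000 are assumptions made up for this example.

```java
import java.util.Properties;

// Hypothetical helper (not part of Kafka): collects the settings that
// switch the 0.8 producer into asynchronous, batched sending.
final class AsyncProducerSettings {

    static Properties build(String brokerList, int batchSize, long maxWaitMs) {
        Properties props = new Properties();
        props.put("metadata.broker.list", brokerList);
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Send on a background thread instead of blocking the caller.
        props.put("producer.type", "async");
        // Flush once this many messages have accumulated ...
        props.put("batch.num.messages", Integer.toString(batchSize));
        // ... or once messages have been buffered this long, whichever comes first.
        props.put("queue.buffering.max.ms", Long.toString(maxWaitMs));
        return props;
    }

    public static void main(String[] args) {
        // Example values only; tune them for the actual workload.
        Properties props = build("localhost:9092,localhost:9093", 100, 2000L);
        System.out.println(props.getProperty("producer.type"));
    }
}
```

The resulting `Properties` object would then be passed to `new ProducerConfig(props)` in the same way as in the examples later in this article. Larger batches and longer waits improve throughput, but they also increase the number of messages that can be lost if the producer crashes.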
Java producer API
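- Producer
  Kafka provides the class `kafka.javaapi.producer.Producer` (`class Producer<K, V>`) for creating messages for one or more topics, optionally specifying the partition for each message. `K` and `V` are the types of the partition key and the message value, respectively.
- KeyedMessage
  The constructor of the class `kafka.producer.KeyedMessage` takes the topic name, the partition key, and the message value: `class KeyedMessage[K, V](val topic: String, val key: K, val message: V)`
- ProducerConfig
  The class `kafka.producer.ProducerConfig` wraps the properties needed to connect to the brokers, such as the broker list, the partitioner class, the message serializer class, and the partition key.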
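The producer API wraps the underlying implementations of both the synchronous and the asynchronous producer; which one is used depends on the `producer.type` setting. For example, in asynchronous mode `kafka.producer.Producer` buffers the data before it is serialized and sent. Internally, an instance of `kafka.producer.async.ProducerSendThread` dequeues each batch of messages, and `kafka.producer.async.EventHandler` serializes and sends the data. A custom handler can also be plugged in through the `event.handler` property.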
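The queue-then-batch flow described above can be pictured with a small simulation. This is not Kafka's code: `AsyncQueueSketch` and its `BatchHandler` interface are hypothetical names invented here, and the sketch uses only JDK collections to show callers enqueuing messages while a sender drains them in batches and hands each batch to a handler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified stand-in for the asynchronous flow: send() only enqueues,
// drain() plays the role of the send thread and passes batches to a handler.
final class AsyncQueueSketch {

    // Hypothetical interface, loosely modelled on the role of an event handler.
    interface BatchHandler {
        void handle(List<String> batch);
    }

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<String>();
    private final int batchSize;

    AsyncQueueSketch(int batchSize) {
        this.batchSize = batchSize;
    }

    // Returns immediately; the message only sits in memory until drained.
    void send(String message) {
        queue.add(message);
    }

    // Messages counted here would be lost if the process crashed now.
    int pending() {
        return queue.size();
    }

    // Drains everything queued, at most batchSize messages per batch,
    // and returns the number of batches handed to the handler.
    int drain(BatchHandler handler) {
        int batches = 0;
        List<String> batch = new ArrayList<String>();
        while (queue.drainTo(batch, batchSize) > 0) {
            handler.handle(new ArrayList<String>(batch));
            batch.clear();
            batches++;
        }
        return batches;
    }
}
```

With a batch size of 2 and five queued messages, `drain` hands over batches of 2, 2, and 1 messages. Everything reported by `pending()` before a drain exists only in memory, which is exactly the data at risk in asynchronous mode.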
A simple Java producer
Next, we write a class `SimpleProducer` that creates messages for a given topic and uses the default partitioning.
1. Import the following classes:
import java.util.Date;
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
2. Define the properties:
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092, localhost:9093, localhost:9094");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("request.required.acks", "1");
ProducerConfig config = new ProducerConfig(props);
Producer<String, String> producer = new Producer<String, String>(config);
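A look at the properties used in the code:
- `metadata.broker.list`: the brokers the producer connects to, in the format `[<node:port>, <node:port>]`. The Kafka producer automatically determines the lead broker for the topic and connects to the correct broker when it publishes a message.
- `serializer.class`: the class used to serialize a message when it is prepared for sending. This example uses the string encoder shipped with Kafka. By default the same serializer class is used for both the key and the message. A custom serializer can be implemented by extending `kafka.serializer.Encoder`, and a separate encoder for keys can be configured through `key.serializer.class`.
- `request.required.acks`: tells the broker to send an acknowledgment to the producer when a message is received. The value `1` means the acknowledgment is sent as soon as the lead replica has received the message.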
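Because a malformed broker list only shows up as a connection failure at run time, it can help to sanity-check the `node:port` entries before building the producer. The sketch below is a hypothetical pre-flight check, not the way Kafka itself parses `metadata.broker.list`; the class name `BrokerListSketch` is an assumption, and only JDK classes are used.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Kafka): checks that every entry of a
// comma-separated broker list has the form host:port.
final class BrokerListSketch {

    static List<InetSocketAddress> parse(String brokerList) {
        List<InetSocketAddress> brokers = new ArrayList<InetSocketAddress>();
        for (String entry : brokerList.split(",")) {
            String hostPort = entry.trim();
            int colon = hostPort.lastIndexOf(':');
            if (colon <= 0 || colon == hostPort.length() - 1) {
                throw new IllegalArgumentException(
                        "Expected host:port but got '" + hostPort + "'");
            }
            String host = hostPort.substring(0, colon);
            int port = Integer.parseInt(hostPort.substring(colon + 1));
            // createUnresolved avoids a DNS lookup; only the format is checked.
            brokers.add(InetSocketAddress.createUnresolved(host, port));
        }
        return brokers;
    }

    public static void main(String[] args) {
        for (InetSocketAddress broker
                : parse("localhost:9092, localhost:9093, localhost:9094")) {
            System.out.println(broker.getHostString() + " on port " + broker.getPort());
        }
    }
}
```

An entry without a port, such as `localhost`, is rejected with an `IllegalArgumentException`, which surfaces the mistake earlier than a failed metadata request would.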
3. Build the message and send it:
String runtime = new Date().toString();
String msg = "Message Publishing Time - " + runtime;
KeyedMessage<String, String> data = new KeyedMessage<String, String>(topic, msg);
producer.send(data);
The complete code:
package kafka.examples.producer;

import java.util.Date;
import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class SimpleProducer {
    private final Producer<String, String> producer;

    public SimpleProducer() {
        Properties props = new Properties();
        // Set the broker list for requesting metadata to find the lead broker
        props.put("metadata.broker.list",
                "192.168.146.132:9092, 192.168.146.132:9093, 192.168.146.132:9094");
        // This specifies the serializer class for messages (and, by default, for keys)
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // 1 means the producer receives an acknowledgment once the lead replica
        // has received the data. This option provides better durability as the
        // client waits until the server acknowledges the request as successful.
        props.put("request.required.acks", "1");
        ProducerConfig config = new ProducerConfig(props);
        producer = new Producer<String, String>(config);
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                    "Please provide topic name and Message count as arguments");
        }
        // Topic name and the message count to be published is passed from the
        // command line
        String topic = args[0];
        int messageCount = Integer.parseInt(args[1]);
        System.out.println("Topic Name - " + topic);
        System.out.println("Message Count - " + messageCount);
        SimpleProducer simpleProducer = new SimpleProducer();
        simpleProducer.publishMessage(topic, messageCount);
    }

    private void publishMessage(String topic, int messageCount) {
        for (int mCount = 0; mCount < messageCount; mCount++) {
            String runtime = new Date().toString();
            String msg = "Message Publishing Time - " + runtime;
            System.out.println(msg);
            // Creates a KeyedMessage instance (no key, so default partitioning applies)
            KeyedMessage<String, String> data =
                    new KeyedMessage<String, String>(topic, msg);
            // Publish the message
            producer.send(data);
        }
        // Close producer connection with broker.
        producer.close();
    }
}
Before running the code above, make sure a topic named `kafkatopic` has been created:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic kafkatopic
Add an environment variable `KAFKA_LIB` that points to Kafka's lib folder, and add the jars in that folder to the `classpath`.
Compile the code:
javac -d . kafka/examples/producer/SimpleProducer.java
Run the program. `SimpleProducer` takes two arguments, the topic name and the number of messages:
java kafka.examples.producer.SimpleProducer kafkatopic 10
A consumer can then be started to receive the messages:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic kafkatopic
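A Java producer with custom partitioning
The previous example is a very simple producer for a multi-broker cluster that does not explicitly choose a partition for its messages. Next, we write one with custom message partitioning. The scenario is capturing and publishing log messages for website visits from different IP addresses. Each log message contains the timestamp of the visit, the name of the website, and the IP address of the visitor.
1. Import the following classes: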
import java.util.Date;
import java.util.Properties;
import java.util.Random;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
2. Define the properties:
Properties props = new Properties();
props.put("metadata.broker.list", "localhost:9092, localhost:9093, localhost:9094");
props.put("serializer.class", "kafka.serializer.StringEncoder");
props.put("partitioner.class", "kafka.examples.producer.SimplePartitioner");
props.put("request.required.acks", "1");
ProducerConfig config = new ProducerConfig(props);
Producer<String, String> producer = new Producer<String, String>(config);
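The `partitioner.class` property names the class that decides which partition of the topic a message is sent to. If it is not set, the default partitioner is used, which is based on the hash of the key.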
3. Implement the partitioner class
Write a custom partitioner class `SimplePartitioner` that implements Kafka's `Partitioner` interface:
package kafka.examples.producer;

import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;

public class SimplePartitioner implements Partitioner {

    // Kafka instantiates the partitioner with the producer properties,
    // so this constructor is required even though it does nothing here.
    public SimplePartitioner(VerifiableProperties props) {
    }

    /*
     * The method takes the key, which in this case is the IP address.
     * It finds the last octet and does a modulo operation on the number
     * of partitions defined within Kafka for the topic.
     *
     * @see kafka.producer.Partitioner#partition(java.lang.Object, int)
     */
    public int partition(Object key, int a_numPartitions) {
        int partition = 0;
        String partitionKey = (String) key;
        int offset = partitionKey.lastIndexOf('.');
        if (offset > 0) {
            partition = Integer.parseInt(partitionKey.substring(offset + 1))
                    % a_numPartitions;
        }
        return partition;
    }
}
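To see what the last-octet rule does with concrete addresses, the arithmetic can be tried in isolation. The sketch below repeats the same calculation without any Kafka dependency; the class name `LastOctetPartitionSketch` and the sample addresses are made up for illustration.

```java
// Standalone version of the last-octet rule used by SimplePartitioner,
// written without Kafka classes so the arithmetic can be run directly.
final class LastOctetPartitionSketch {

    static int partitionFor(String ip, int numPartitions) {
        int offset = ip.lastIndexOf('.');
        if (offset <= 0) {
            // No dot found: fall back to partition 0, as SimplePartitioner does.
            return 0;
        }
        // The last octet modulo the partition count selects the partition.
        return Integer.parseInt(ip.substring(offset + 1)) % numPartitions;
    }

    public static void main(String[] args) {
        String[] ips = {"192.168.14.7", "192.168.14.10", "192.168.14.254"};
        for (String ip : ips) {
            System.out.println(ip + " -> partition " + partitionFor(ip, 5));
        }
    }
}
```

With five partitions, `192.168.14.7` maps to partition 2, `192.168.14.10` to partition 0, and `192.168.14.254` to partition 4. All messages from one IP address therefore land in the same partition, which preserves their relative order.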
4. Build the message and send it
The complete code:
package kafka.examples.producer;

import java.util.Date;
import java.util.Properties;
import java.util.Random;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class CustomPartitionProducer {
    private final Producer<String, String> producer;

    public CustomPartitionProducer() {
        Properties props = new Properties();
        // Set the broker list for requesting metadata to find the lead broker
        props.put("metadata.broker.list",
                "192.168.146.132:9092, 192.168.146.132:9093, 192.168.146.132:9094");
        // This specifies the serializer class for messages (and, by default, for keys)
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Defines the class to be used for determining the partition
        // in the topic where the message needs to be sent.
        props.put("partitioner.class", "kafka.examples.producer.SimplePartitioner");
        // 1 means the producer receives an acknowledgment once the lead replica
        // has received the data. This option provides better durability as the
        // client waits until the server acknowledges the request as successful.
        props.put("request.required.acks", "1");
        ProducerConfig config = new ProducerConfig(props);
        producer = new Producer<String, String>(config);
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException(
                    "Please provide topic name and Message count as arguments");
        }
        // Topic name and the message count to be published is passed from the
        // command line
        String topic = args[0];
        int messageCount = Integer.parseInt(args[1]);
        System.out.println("Topic Name - " + topic);
        System.out.println("Message Count - " + messageCount);
        CustomPartitionProducer customProducer = new CustomPartitionProducer();
        customProducer.publishMessage(topic, messageCount);
    }

    private void publishMessage(String topic, int messageCount) {
        Random random = new Random();
        for (int mCount = 0; mCount < messageCount; mCount++) {
            String clientIP = "192.168.14." + random.nextInt(255);
            String accessTime = new Date().toString();
            String message = accessTime + ",kafka.apache.org," + clientIP;
            System.out.println(message);
            // Creates a KeyedMessage instance with the client IP as partition key
            KeyedMessage<String, String> data =
                    new KeyedMessage<String, String>(topic, clientIP, message);
            // Publish the message
            producer.send(data);
        }
        // Close producer connection with broker.
        producer.close();
    }
}
Before running the code above, make sure a topic named `website-hits` has been created:
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 3 --partitions 5 --topic website-hits
Compile the code:
javac -d . kafka/examples/producer/SimplePartitioner.java
javac -d . kafka/examples/producer/CustomPartitionProducer.java
Run the program:
java kafka.examples.producer.CustomPartitionProducer website-hits 100
Run a consumer to receive the messages:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic website-hits
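Producer properties
- `metadata.broker.list`: used by the producer to fetch metadata (topics, partitions, replicas). The format is `host1:port1,host2:port2`.
- `serializer.class`: the serializer class for messages. The default is `kafka.serializer.DefaultEncoder`.
- `producer.type`: whether messages are sent synchronously or asynchronously. Valid values are `async` and `sync`; the default is `sync`.
- `request.required.acks`: whether the broker sends an acknowledgment to the producer when a produce request completes. The default is `0`. With `0` the producer does not wait for an acknowledgment from the broker, which gives the lowest latency but the weakest durability. With `1` the producer receives an acknowledgment once the lead replica has received the data, which improves durability because the client waits until the server confirms the request. With `-1` the producer receives an acknowledgment only after all in-sync replicas have received the data, which gives the best durability.
- `key.serializer.class`: the serializer class for keys. The default is `${serializer.class}`, that is, the same class as for messages.
- `partitioner.class`: the class that partitions messages within a topic. The default is `kafka.producer.DefaultPartitioner`, which is based on the hash of the key.
- `compression.codec`: the compression format for the data sent by the producer. Valid values are `none`, `gzip`, and `snappy`; the default is `none`.
- `batch.num.messages`: the number of messages sent in one batch in asynchronous mode. The default is 200. The producer waits until either this many messages are ready or `queue.buffering.max.ms` has elapsed before sending.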
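The defaults listed above can be made visible in code by layering user settings over them. Kafka applies these defaults on its own, so the helper below is purely illustrative: `ProducerDefaultsSketch` is a hypothetical class, it covers only the properties from the list above, and it relies on nothing beyond `java.util.Properties`.

```java
import java.util.Properties;

// Hypothetical helper (not part of Kafka): shows the effective producer
// settings after the documented defaults are combined with user overrides.
final class ProducerDefaultsSketch {

    static Properties withDefaults(Properties overrides) {
        Properties defaults = new Properties();
        defaults.setProperty("serializer.class", "kafka.serializer.DefaultEncoder");
        defaults.setProperty("producer.type", "sync");
        defaults.setProperty("request.required.acks", "0");
        defaults.setProperty("partitioner.class", "kafka.producer.DefaultPartitioner");
        defaults.setProperty("compression.codec", "none");
        defaults.setProperty("batch.num.messages", "200");

        // Lookups fall through to the defaults unless an override is present.
        Properties effective = new Properties(defaults);
        for (String name : overrides.stringPropertyNames()) {
            effective.setProperty(name, overrides.getProperty(name));
        }
        // key.serializer.class falls back to serializer.class when not set.
        if (effective.getProperty("key.serializer.class") == null) {
            effective.setProperty("key.serializer.class",
                    effective.getProperty("serializer.class"));
        }
        return effective;
    }

    public static void main(String[] args) {
        Properties overrides = new Properties();
        overrides.setProperty("serializer.class", "kafka.serializer.StringEncoder");
        Properties effective = withDefaults(overrides);
        System.out.println(effective.getProperty("producer.type"));
        System.out.println(effective.getProperty("key.serializer.class"));
    }
}
```

Overriding only `serializer.class` leaves the producer in `sync` mode with `request.required.acks` at `0`, and the key serializer follows the message serializer. This makes it easy to spot that the examples in this article raise the acknowledgment level explicitly.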