I. Producer API
1. Message-sending flow
Kafka's producer sends messages asynchronously. Two threads are involved in the send path — the main thread and the Sender thread — plus one shared buffer, the RecordAccumulator. The main thread appends messages to the RecordAccumulator, and the Sender thread continuously pulls messages from it and sends them to the Kafka brokers.
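This two-thread handoff can be sketched with a plain BlockingQueue standing in for the RecordAccumulator — a simplified model of the design, not the client's actual internals:

```java
import java.util.concurrent.*;

public class MiniAccumulator {
    public static void main(String[] args) throws Exception {
        // Shared buffer: the main thread appends, the Sender thread drains.
        BlockingQueue<String> accumulator = new LinkedBlockingQueue<>();
        ExecutorService sender = Executors.newSingleThreadExecutor();
        CountDownLatch done = new CountDownLatch(3);
        // Background "Sender" thread: pull records and (pretend to) send them.
        sender.submit(() -> {
            try {
                while (done.getCount() > 0) {
                    String record = accumulator.poll(1, TimeUnit.SECONDS);
                    if (record != null) {
                        System.out.println("sent " + record);
                        done.countDown();
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // "Main" thread: append records to the shared buffer and return immediately.
        for (int i = 0; i < 3; i++) {
            accumulator.put("record-" + i);
        }
        done.await();
        sender.shutdown();
    }
}
```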
2. Asynchronous send
Maven dependency:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>2.3.0</version>
</dependency>
API without a callback:
Properties properties = new Properties();
properties.put("bootstrap.servers", "192.168.10.110:9092,192.168.10.132:9092,192.168.10.177:9092");
properties.put("acks", "all");
properties.put("retries", 3);
// batch size in bytes
properties.put("batch.size", 16384);
// max time to wait before sending a batch
properties.put("linger.ms", 1);
// total producer buffer memory in bytes
properties.put("buffer.memory", 33554432);
properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(properties);
for (int i = 0; i < 10; i++) {
    kafkaProducer.send(new ProducerRecord<>("topic", "demo -- " + i));
}
kafkaProducer.close();
API with a callback:
Properties properties = new Properties();
properties.put("bootstrap.servers", "192.168.10.110:9092,192.168.10.132:9092,192.168.10.177:9092");
properties.put("acks", "all");
properties.put("retries", 3);
properties.put("batch.size", 16384);
properties.put("linger.ms", 1);
properties.put("buffer.memory", 33554432);
properties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
properties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

KafkaProducer<String, String> kafkaProducer = new KafkaProducer<>(properties);
for (int i = 0; i < 10; i++) {
    kafkaProducer.send(new ProducerRecord<>("topic", 0, "demo", "callback - " + i), new Callback() {
        @Override
        public void onCompletion(RecordMetadata recordMetadata, Exception e) {
            if (e == null) {
                System.out.println(recordMetadata.partition() + "--" + recordMetadata.offset());
            } else {
                e.printStackTrace();
            }
        }
    });
}
kafkaProducer.close();
Partition selection: if no partition is specified but the record has a key, the key is hashed to choose the partition.
You can also plug in a custom partitioner by implementing the Partitioner interface and setting partitioner.class in the properties.
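The default partitioner hashes the key (with murmur2) and maps it into the partition count. A dependency-free sketch of that routing rule — String.hashCode() is used here purely for illustration, not the hash Kafka actually uses:

```java
public class KeyPartitioning {
    // Simplified stand-in for the key -> partition rule: hash the key and
    // map it into [0, numPartitions). Kafka itself uses murmur2 with a
    // positive mask; String.hashCode() keeps this example self-contained.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        // The same key always lands on the same partition.
        System.out.println(partitionFor("demo", 3) == partitionFor("demo", 3));
        // The result is always a valid partition index.
        int p = partitionFor("demo", 3);
        System.out.println(p >= 0 && p < 3);
    }
}
```

A real Partitioner implementation would apply logic like this inside its partition() method, using the partition count from the Cluster metadata argument.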
3. Synchronous send
send() returns a Future<RecordMetadata>; calling get() on it blocks until the send completes, turning the asynchronous call into a synchronous one.
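The blocking comes from java.util.concurrent.Future itself; the same pattern, with a plain executor standing in for the producer:

```java
import java.util.concurrent.*;

public class SyncSendSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // producer.send(record) returns a Future<RecordMetadata>;
        // get() blocks until the result is available, which is what
        // makes the send effectively synchronous.
        Future<String> pending = pool.submit(() -> "metadata");
        String result = pending.get(); // blocks here until completion
        System.out.println(result);
        pool.shutdown();
    }
}
```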
4、自定義攔截器
實現ProducerInterceptor接口,然後在properties中添加interceptor.classes,value是一個list
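A classic interceptor example prepends a timestamp to each record's value. The transform itself, sketched without the Kafka types — in a real ProducerInterceptor<String, String>, onSend() would return a new ProducerRecord carrying this modified value (the other interface methods are onAcknowledgement, close, and configure):

```java
public class TimestampTransform {
    // What a timestamp interceptor's onSend() does to the record value.
    static String onSendValue(long timestampMs, String value) {
        return timestampMs + "," + value;
    }

    public static void main(String[] args) {
        System.out.println(onSendValue(1000L, "hello"));
    }
}
```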
II. Consumer API
1. Automatic offset commit
Properties properties = new Properties();
properties.put("bootstrap.servers", "192.168.10.110:9092,192.168.10.132:9092,192.168.10.177:9092");
properties.put("group.id", "2");
properties.put("enable.auto.commit", "true");
properties.put("auto.commit.interval.ms", "1000");
properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
// where to start reading when there is no valid committed offset
properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

KafkaConsumer<String, String> kafkaConsumer = new KafkaConsumer<>(properties);
kafkaConsumer.subscribe(Arrays.asList("topic"));
while (true) {
    ConsumerRecords<String, String> records = kafkaConsumer.poll(Duration.ofSeconds(1));
    System.err.println(records.count());
    for (ConsumerRecord<String, String> record : records) {
        System.err.println(record.key() + " - " + record.value() + " - " + record.partition() + " - " + record.offset());
    }
}
auto.offset.reset only takes effect when the group has no usable committed offset — either the consumer group is new, or the previously committed offset points at data that has already expired.
2. Manual offset commit
Set enable.auto.commit to false, then commit explicitly: consumer.commitSync() commits synchronously (blocking until it succeeds), and consumer.commitAsync() commits asynchronously (optionally with a callback).
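The key semantics to keep in mind: the committed offset is the position to resume from, i.e. the last processed offset + 1. A minimal model of a process-then-commit loop, in plain Java with no broker:

```java
import java.util.*;

public class CommitModel {
    public static void main(String[] args) {
        // partition -> offset to resume from after a restart
        Map<Integer, Long> committed = new HashMap<>();
        // Pretend poll() returned these {partition, offset} pairs.
        long[][] batch = { {0, 5}, {0, 6}, {1, 3} };
        for (long[] rec : batch) {
            // ... process the record, then record the NEXT offset to read.
            committed.put((int) rec[0], rec[1] + 1);
        }
        // Partition 0 last saw offset 6 -> resume at 7; partition 1 -> 4.
        System.out.println(committed.get(0) + " " + committed.get(1));
    }
}
```

commitSync()/commitAsync() commit exactly this "last offset + 1" position for every partition in the batch.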
3. Custom offset storage
Before Kafka 0.9, offsets were stored in ZooKeeper; since 0.9 they are stored by default in an internal Kafka topic. Beyond these two options, Kafka also lets you store offsets anywhere you choose.
Maintaining offsets yourself is fairly involved, because you must account for consumer rebalances.
When a new consumer joins the group, an existing consumer leaves, or the partitions of a subscribed topic change, the partitions are reassigned across the group; this reassignment process is called a Rebalance.
Custom offset storage relies on a ConsumerRebalanceListener.
A common approach is to commit the processed data and its offsets in a single transaction in a relational database, then call seek() on restart to resume from the stored positions.
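The pattern, modeled without the Kafka classes — in the real API these would be ConsumerRebalanceListener.onPartitionsRevoked()/onPartitionsAssigned() and consumer.seek(), with a real database in place of the map:

```java
import java.util.*;

public class OffsetStoreSketch {
    // Stand-in for a database table keyed by (group, topic, partition).
    static final Map<Integer, Long> db = new HashMap<>();

    // onPartitionsRevoked: persist the current position of each owned
    // partition — in practice, in the same transaction as the processed data.
    static void onRevoked(Map<Integer, Long> positions) {
        db.putAll(positions);
    }

    // onPartitionsAssigned: look up each partition's saved offset to seek() to.
    static long seekPositionFor(int partition) {
        return db.getOrDefault(partition, 0L); // 0 = nothing saved, start from the beginning
    }

    public static void main(String[] args) {
        // Rebalance takes partitions 0 and 1 away: save their positions.
        onRevoked(Map.of(0, 42L, 1, 7L));
        // After reassignment, resume exactly where processing stopped.
        System.out.println(seekPositionFor(0) + " " + seekPositionFor(1) + " " + seekPositionFor(2));
    }
}
```

Because the offsets commit atomically with the processed results, a crash between processing and committing cannot produce the duplicate or lost messages that separate offset storage allows.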