We will explore how to write stream processing applications using Spring Cloud Stream and Kafka Streams.
The Spring Cloud Stream Horsham release (3.0.0) makes some changes to the way applications use Apache Kafka: it ships binders for both Kafka and Kafka Streams.
1. How many kinds of Kafka binders does Spring Cloud Stream provide?
This is a common point of confusion: if I want to write an application on top of Apache Kafka, which binder should I use? Spring Cloud Stream provides two binders for Kafka: spring-cloud-stream-binder-kafka and spring-cloud-stream-binder-kafka-streams.
Use spring-cloud-stream-binder-kafka for standard event-driven applications that work with plain Kafka producers and consumers. Use spring-cloud-stream-binder-kafka-streams to develop stream processing applications with the Kafka Streams library.
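For reference, the Kafka Streams binder can be added with a Maven dependency along the following lines (a sketch; the version is typically managed by the spring-cloud-dependencies BOM, so it is omitted here):

```xml
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-stream-binder-kafka-streams</artifactId>
</dependency>
```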
Note: in this article we focus on the Kafka Streams binder.
This article is mainly about the integration between Spring Cloud Stream and Kafka Streams, not about the details of Kafka Streams itself. To build a good Kafka Streams-based stream processing application, a deeper understanding of the Kafka Streams library is strongly recommended. Here we only cover basic usage of the library; a more in-depth look at Kafka Streams will follow in a later article.
2. Creating a Spring Cloud Stream Kafka Streams application
Essentially, every Spring Cloud Stream application is a Spring Boot application. To create a new project, go to Spring Initializr and create a new project, selecting Cloud Stream and Spring for Apache Kafka Streams as dependencies. This generates a project containing all the components needed to start developing the application. Below is a screenshot of Initializr with the basic dependencies selected.
3. Writing a simple example (WordCount)
This is a very basic but fully functional Kafka Streams application, written using the functional style of Spring Cloud Stream.
3.1 Create SpringCloudKafkaStreamsExampleApplication
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.time.Duration;
import java.util.Arrays;
import java.util.Date;
import java.util.function.Function;

@SpringBootApplication
public class SpringCloudKafkaStreamsExampleApplication {

    public static void main(String[] args) {
        SpringApplication.run(SpringCloudKafkaStreamsExampleApplication.class, args);
    }

    // @Configuration is required so that the nested process() bean is registered.
    @Configuration
    public static class WordCountProcessorApplication {

        public static final String INPUT_TOPIC = "words";
        public static final String OUTPUT_TOPIC = "counts";
        public static final int WINDOW_SIZE_MS = 30000;

        @Bean
        public Function<KStream<Bytes, String>, KStream<Bytes, WordCount>> process() {
            return input -> input
                    .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
                    .map((key, value) -> new KeyValue<>(value, value))
                    .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                    .windowedBy(TimeWindows.of(Duration.ofMillis(WINDOW_SIZE_MS)))
                    .count(Materialized.as("WordCounts-1"))
                    .toStream()
                    .map((key, value) -> new KeyValue<>(null,
                            new WordCount(key.key(), value, new Date(key.window().start()), new Date(key.window().end()))));
        }
    }

    static class WordCount {

        private String word;
        private long count;
        private Date start;
        private Date end;

        @Override
        public String toString() {
            final StringBuffer sb = new StringBuffer("WordCount{");
            sb.append("word='").append(word).append('\'');
            sb.append(", count=").append(count);
            sb.append(", start=").append(start);
            sb.append(", end=").append(end);
            sb.append('}');
            return sb.toString();
        }

        WordCount() {
        }

        WordCount(String word, long count, Date start, Date end) {
            this.word = word;
            this.count = count;
            this.start = start;
            this.end = end;
        }

        public String getWord() {
            return word;
        }

        public void setWord(String word) {
            this.word = word;
        }

        public long getCount() {
            return count;
        }

        public void setCount(long count) {
            this.count = count;
        }

        public Date getStart() {
            return start;
        }

        public void setStart(Date start) {
            this.start = start;
        }

        public Date getEnd() {
            return end;
        }

        public void setEnd(Date end) {
            this.end = end;
        }
    }
}
As the code above shows, this is a very simple word-count application that only writes its results to the console, yet it is a fully functional Kafka Streams application. At the outer level, the @SpringBootApplication annotation marks this as a bootable application. We then provide a java.util.function.Function bean whose lambda expression encapsulates the word-count logic, taking a KStream<Bytes, String> as input and producing a KStream<Bytes, WordCount> as output.
Note: the time window is 30 s, i.e. word occurrences are counted once every 30 seconds.
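The tokenization step at the heart of the topology (lowercase the line, then split on non-word characters, as in the flatMapValues call) can be tried out in plain Java. TokenizeDemo is an illustrative name, not part of the application:

```java
import java.util.Arrays;
import java.util.List;

public class TokenizeDemo {

    // Mirrors the flatMapValues step: lowercase the value, split on \W+.
    static List<String> tokenize(String line) {
        return Arrays.asList(line.toLowerCase().split("\\W+"));
    }

    public static void main(String[] args) {
        // Punctuation is treated as a separator and dropped.
        System.out.println(tokenize("Hello, World!")); // [hello, world]
    }
}
```

Each token then becomes both key and value via the map step, so groupByKey groups identical words together before the windowed count.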
3.2 The application.yml configuration file
spring.cloud.stream:
  bindings:
    process-in-0:
      destination: words
    process-out-0:
      destination: counts
  kafka:
    streams:
      binder:
        applicationId: hello-word-count-sample
        configuration:
          commit.interval.ms: 100
          default:
            key.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
            value.serde: org.apache.kafka.common.serialization.Serdes$StringSerde

# Enable metrics
management:
  endpoint:
    health:
      show-details: ALWAYS
  endpoints:
    web:
      exposure:
        include: metrics,health

# Enable debug logging for the Spring Kafka config
logging:
  level:
    org.springframework.kafka.config: debug

spring:
  cloud:
    stream:
      kafka:
        binder:
          brokers: localhost:9092
          min-partition-count: 5
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
    consumer:
      group-id: counts-group
      enable-auto-commit: true
      auto-commit-interval: 1000
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer

server:
  port: 8096
spring.cloud.stream.bindings.process-in-0.destination: the source topic, e.g. words
spring.cloud.stream.bindings.process-out-0.destination: the result topic, e.g. counts
spring.cloud.stream.kafka.streams.binder.applicationId: the application id, e.g. hello-word-count-sample
spring.cloud.stream.kafka.binder.brokers: the Kafka broker address, e.g. localhost:9092
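The binding names themselves are derived from the function bean name, following the Spring Cloud Stream 3.0 convention <functionName>-in-<index> / <functionName>-out-<index>; the bean named process therefore binds process-in-0 and process-out-0. A small sketch of the naming rule (BindingNameDemo is illustrative, not part of the framework):

```java
public class BindingNameDemo {

    // Builds a binding name per the <functionName>-in-<index>/-out-<index> convention.
    static String bindingName(String functionName, boolean input, int index) {
        return functionName + (input ? "-in-" : "-out-") + index;
    }

    public static void main(String[] args) {
        System.out.println(bindingName("process", true, 0));  // process-in-0
        System.out.println(bindingName("process", false, 0)); // process-out-0
    }
}
```

Renaming the bean would change the binding keys that must appear under spring.cloud.stream.bindings.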
3.3 KafkaConsumer: consuming the WordCount results
import com.alibaba.fastjson.JSON;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

/**
 * Consumes the data in the counts topic. For demonstration purposes we simply
 * log it to the console; later it could be persisted to a database instead.
 */
@Component
@Slf4j
public class KafkaConsumer {

    @KafkaListener(topics = "counts")
    public void listen(ConsumerRecord<?, ?> record) throws Exception {
        String value = (String) record.value();
        log.info("partition = {}, topic = {}, offset = {}, value = {}", record.partition(), record.topic(), record.offset(), value);
        WordCount wordCount = JSON.parseObject(value, WordCount.class);
        log.info(wordCount.toString());
    }
}
At this point we can start the Kafka broker and test the application.
4. Testing
4.1 Start the ZooKeeper service
bin\windows\zookeeper-server-start.bat config/zookeeper.properties
4.2 Start the Apache Kafka service
bin\windows\kafka-server-start.bat config/server.properties
4.3 使用kafka-consule-producer命令在控制檯發送字符串
4.4 Check the test results
2019-12-09 18:15:03.183 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : partition = 0,topic = counts, offset = 5446, value = {"word":"beijing","count":1,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:03.244 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : WordCount{word='beijing', count=1, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
2019-12-09 18:15:03.244 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : partition = 0,topic = counts, offset = 5447, value = {"word":"lidong","count":1,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:03.244 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : WordCount{word='lidong', count=1, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
2019-12-09 18:15:03.245 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : partition = 0,topic = counts, offset = 5448, value = {"word":"aihuhui","count":1,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:03.245 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : WordCount{word='aihuhui', count=1, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
2019-12-09 18:15:03.245 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : partition = 0,topic = counts, offset = 5449, value = {"word":"xd","count":1,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:03.245 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : WordCount{word='xd', count=1, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
2019-12-09 18:15:07.213 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : partition = 0,topic = counts, offset = 5450, value = {"word":"beijing","count":2,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:07.214 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : WordCount{word='beijing', count=2, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
2019-12-09 18:15:07.214 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : partition = 0,topic = counts, offset = 5451, value = {"word":"lidong","count":2,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:07.214 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : WordCount{word='lidong', count=2, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
2019-12-09 18:15:09.638 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : partition = 0,topic = counts, offset = 5452, value = {"word":"huge","count":1,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:09.639 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : WordCount{word='huge', count=1, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
2019-12-09 18:15:11.691 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : partition = 0,topic = counts, offset = 5453, value = {"word":"ghhe","count":1,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:11.692 INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer : WordCount{word='ghhe', count=1, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
The log above shows the counting results.
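The start/end timestamps in the log follow from the epoch-aligned tumbling windows produced by TimeWindows.of. As a sketch (WindowBoundsDemo is a hypothetical helper, not part of the application), the window boundaries can be reproduced with simple arithmetic:

```java
public class WindowBoundsDemo {

    static final long WINDOW_SIZE_MS = 30_000L;

    // Tumbling windows are aligned to the epoch: the window containing a
    // record starts at the timestamp rounded down to a multiple of the size.
    static long windowStart(long timestampMs) {
        return timestampMs - timestampMs % WINDOW_SIZE_MS;
    }

    public static void main(String[] args) {
        long ts = 1575886503183L; // a timestamp from the log above
        System.out.println(windowStart(ts));                  // 1575886500000
        System.out.println(windowStart(ts) + WINDOW_SIZE_MS); // 1575886530000
    }
}
```

This matches the start (1575886500000) and end (1575886530000) fields seen in the JSON payloads.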
5. Summary
In this article we gave a quick introduction to writing stream processing applications with Kafka Streams using the functional programming support of Spring Cloud Stream. We saw that the binder takes care of much of the underlying configuration, letting us focus on the business logic. In the next article we will explore the Apache Kafka Streams programming model further, to see how Spring Cloud Stream and Kafka Streams can be used to build even better stream processing applications.
Source code for this article: GitHub link