Stream Processing with Spring Cloud Stream and Apache Kafka Streams, Part 1: A WordCount Introduction

In this article we explore how to write stream-processing applications using Spring Cloud Stream and Kafka Streams.

The Spring Cloud Stream Horsham release (3.0.0) changed a few things about how applications work with Apache Kafka; applications can use either the Kafka binder or the Kafka Streams binder.

1. Which types of Kafka binders does Spring Cloud Stream provide?

This is a common source of confusion: if I want to write an application on top of Apache Kafka, which binder should I use? Spring Cloud Stream provides two binders for Kafka: spring-cloud-stream-binder-kafka and spring-cloud-stream-binder-kafka-streams.

  • spring-cloud-stream-binder-kafka is for standard event-driven applications that use plain Kafka producers and consumers.
  • spring-cloud-stream-binder-kafka-streams is for developing stream-processing applications with the Kafka Streams library (Maven coordinates for both are sketched just below).
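
For reference, this is roughly what the two dependencies look like in a Maven pom.xml. This is only a sketch: in practice the version should come from the Spring Cloud BOM rather than being hard-coded.

<!-- For standard event-driven apps using plain Kafka producers/consumers -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-stream-binder-kafka</artifactId>
</dependency>

<!-- For Kafka Streams stream-processing apps (the one used in this article) -->
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-stream-binder-kafka-streams</artifactId>
</dependency>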

Note: in this article we focus on the Kafka Streams binder.

This article focuses on the integration between Spring Cloud Stream and Kafka Streams rather than on the details of Kafka Streams itself. To build a good stream-processing application on Kafka Streams, a deeper understanding of the Kafka Streams library is strongly recommended. Here we only cover simple usage of the library; a more in-depth look at Kafka Streams is left for a later article.

2. Creating a Spring Cloud Stream Kafka Streams application

At their core, all Spring Cloud Stream applications are Spring Boot applications. To create a new project, go to the Spring Initializr and create a new project, selecting Cloud Stream and Spring for Apache Kafka Streams as dependencies. This generates a project containing everything you need to start developing the application. Below is a screenshot of the Initializr with these basic dependencies selected.
[Screenshot: Spring Initializr with the Cloud Stream and Spring for Apache Kafka Streams dependencies selected]

3. Writing a simple example (WordCount)

The following is a very basic yet fully functional Kafka Streams application, written using Spring Cloud Stream's functional programming model.

3.1 Create the SpringCloudKafkaStreamsExampleApplication class

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

import java.time.Duration;
import java.util.Arrays;
import java.util.Date;
import java.util.function.Function;

@SpringBootApplication
public class SpringCloudKafkaStreamsExampleApplication {


	public static void main(String[] args) {
		SpringApplication.run(SpringCloudKafkaStreamsExampleApplication.class, args);
	}

	// Nested class that declares the stream-processing function as a bean;
	// Spring picks up the @Bean method when the application starts.
	public static class WordCountProcessorApplication {

		public static final String INPUT_TOPIC = "words";
		public static final String OUTPUT_TOPIC = "counts";
		public static final int WINDOW_SIZE_MS = 30000;

		@Bean
		public Function<KStream<Bytes, String>, KStream<Bytes, WordCount>> process() {

			return input -> input
					// Split each line into lowercase words.
					.flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
					// Re-key each record by the word itself so it can be grouped on.
					.map((key, value) -> new KeyValue<>(value, value))
					.groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
					// Count occurrences of each word per 30-second tumbling window,
					// materialized in a state store named "WordCounts-1".
					.windowedBy(TimeWindows.of(Duration.ofMillis(WINDOW_SIZE_MS)))
					.count(Materialized.as("WordCounts-1"))
					// Emit the windowed counts downstream as WordCount records.
					.toStream()
					.map((key, value) -> new KeyValue<>(null, new WordCount(key.key(), value, new Date(key.window().start()), new Date(key.window().end()))));
		}
	}

	static class WordCount {

		private String word;

		private long count;

		private Date start;

		private Date end;

		@Override
		public String toString() {
			final StringBuffer sb = new StringBuffer("WordCount{");
			sb.append("word='").append(word).append('\'');
			sb.append(", count=").append(count);
			sb.append(", start=").append(start);
			sb.append(", end=").append(end);
			sb.append('}');
			return sb.toString();
		}

		WordCount() {

		}

		WordCount(String word, long count, Date start, Date end) {
			this.word = word;
			this.count = count;
			this.start = start;
			this.end = end;
		}

		public String getWord() {
			return word;
		}

		public void setWord(String word) {
			this.word = word;
		}

		public long getCount() {
			return count;
		}

		public void setCount(long count) {
			this.count = count;
		}

		public Date getStart() {
			return start;
		}

		public void setStart(Date start) {
			this.start = start;
		}

		public Date getEnd() {
			return end;
		}

		public void setEnd(Date end) {
			this.end = end;
		}
	}

}

As the code above shows, this is a very simple word-count application, yet it is a fully functional Kafka Streams application; the aggregated results are written to the output topic (and printed to the console by the consumer in section 3.3). On the outside, the @SpringBootApplication annotation marks this as a bootable Spring Boot application. We then provide a bean of type java.util.function.Function whose lambda expression encapsulates the word-count logic, taking a KStream<Bytes, String> as its input and a KStream<Bytes, WordCount> as its output.

Note: the time window is 30 s, i.e. word occurrences are counted once per 30-second window.
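
The TimeWindows.of(...) call used above creates tumbling (non-overlapping) windows. As a small sketch of the Kafka Streams 2.x windowing API (not specific to this article's code), the fragment below contrasts tumbling with hopping windows:

import java.time.Duration;
import org.apache.kafka.streams.kstream.TimeWindows;

// Tumbling: back-to-back, non-overlapping 30-second windows (what process() uses).
TimeWindows tumbling = TimeWindows.of(Duration.ofSeconds(30));

// Hopping: 30-second windows that advance every 10 seconds, so a single
// record can fall into several overlapping windows.
TimeWindows hopping = TimeWindows.of(Duration.ofSeconds(30)).advanceBy(Duration.ofSeconds(10));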

3.2 The application.yml configuration file

spring.cloud.stream:
  bindings:
    process-in-0:
      destination: words
    process-out-0:
      destination: counts
  kafka:
    streams:
      binder:
        applicationId: hello-word-count-sample
        configuration:
          commit.interval.ms: 100
          default.key.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
          default.value.serde: org.apache.kafka.common.serialization.Serdes$StringSerde
#Enable metrics
management:
  endpoint:
    health:
      show-details: ALWAYS
  endpoints:
    web:
      exposure:
        include: metrics,health
#Enable logging to debug for spring kafka config
logging:
  level:
    org.springframework.kafka.config: debug
spring:
  cloud:
    stream:
      kafka:
        binder:
          brokers: localhost:9092
          min-partition-count: 5
  kafka:
    bootstrap-servers: localhost:9092
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.StringSerializer
    consumer:
      group-id: counts-group
      enable-auto-commit: true
      auto-commit-interval: 1000
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: org.apache.kafka.common.serialization.StringDeserializer
server:
  port: 8096


  • spring.cloud.stream.bindings.process-in-0.destination: the source-data topic, e.g. words
  • spring.cloud.stream.bindings.process-out-0.destination: the result topic, e.g. counts
  • spring.cloud.stream.kafka.streams.binder.applicationId: the application id, e.g. hello-word-count-sample
  • spring.cloud.stream.kafka.binder.brokers: the Kafka broker address, e.g. localhost:9092

The binding names process-in-0 and process-out-0 follow Spring Cloud Stream 3.0's functional convention: the name of the function bean (process) plus -in-/-out- and the argument index. A quick health check is shown below.
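
Because the management section of application.yml exposes the health and metrics actuator endpoints, you can sanity-check the running application over HTTP (standard Spring Boot 2 actuator paths; port 8096 comes from server.port above):

curl http://localhost:8096/actuator/health
curl http://localhost:8096/actuator/metrics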
3.3 KafkaConsumer: consuming the WordCount results

import com.alibaba.fastjson.JSON;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

/**
 * Consumes records from the counts topic. For demonstration we simply log
 * them to the console; in a real application they could be persisted to a database.
 */
@Component
@Slf4j
public class KafkaConsumer {

    @KafkaListener(topics = "counts")
    public void listen(ConsumerRecord<?, ?> record) throws Exception {
        String value = (String) record.value();
        log.info("partition = {},topic = {}, offset = {}, value = {}",record.partition(), record.topic(), record.offset(), value);
        WordCount wordCount = JSON.parseObject(value, WordCount.class);
        log.info(wordCount.toString());
    }
}
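
The consumer above uses fastjson for JSON parsing and Lombok's @Slf4j for logging, neither of which comes from the Initializr selections earlier, so these dependencies must be added manually. Note that the WordCount type referenced here must expose the same fields as the inner WordCount class above (for example, a top-level copy in the consumer's package) so that fastjson can bind the payload. A sketch of the extra Maven dependencies (the fastjson version is an assumption, not taken from this article):

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.62</version>
</dependency>
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>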

At this point we can start a Kafka broker and test the application.

4. Testing the application

4.1 Start the ZooKeeper service

bin\windows\zookeeper-server-start.bat config/zookeeper.properties


4.2 Start the Apache Kafka service

bin\windows\kafka-server-start.bat config/server.properties

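Optionally, create the input topic up front; by default the binder will also provision it automatically. The command below uses the ZooKeeper-based kafka-topics syntax of the Kafka 2.x CLI current when this article was written, with the partition count matching min-partition-count above:

bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 1 --partitions 5 --topic words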

4.3 Send test strings from the console with kafka-console-producer
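
The command below (standard Kafka 2.x console-producer syntax) starts the producer; type a line of words and press Enter to publish it to the words topic:

bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic words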


4.4 Check the test results

2019-12-09 18:15:03.183  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : partition = 0,topic = counts, offset = 5446, value = {"word":"beijing","count":1,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:03.244  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : WordCount{word='beijing', count=1, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
2019-12-09 18:15:03.244  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : partition = 0,topic = counts, offset = 5447, value = {"word":"lidong","count":1,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:03.244  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : WordCount{word='lidong', count=1, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
2019-12-09 18:15:03.245  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : partition = 0,topic = counts, offset = 5448, value = {"word":"aihuhui","count":1,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:03.245  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : WordCount{word='aihuhui', count=1, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
2019-12-09 18:15:03.245  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : partition = 0,topic = counts, offset = 5449, value = {"word":"xd","count":1,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:03.245  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : WordCount{word='xd', count=1, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
2019-12-09 18:15:07.213  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : partition = 0,topic = counts, offset = 5450, value = {"word":"beijing","count":2,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:07.214  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : WordCount{word='beijing', count=2, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
2019-12-09 18:15:07.214  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : partition = 0,topic = counts, offset = 5451, value = {"word":"lidong","count":2,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:07.214  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : WordCount{word='lidong', count=2, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
2019-12-09 18:15:09.638  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : partition = 0,topic = counts, offset = 5452, value = {"word":"huge","count":1,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:09.639  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : WordCount{word='huge', count=1, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}
2019-12-09 18:15:11.691  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : partition = 0,topic = counts, offset = 5453, value = {"word":"ghhe","count":1,"start":1575886500000,"end":1575886530000}
2019-12-09 18:15:11.692  INFO 10008 --- [ntainer#0-0-C-1] com.lidongexample.KafkaConsumer          : WordCount{word='ghhe', count=1, start=Mon Dec 09 18:15:00 CST 2019, end=Mon Dec 09 18:15:30 CST 2019}

The log output above shows the windowed word-count results.

5. Summary

In this article, we took a quick look at how to write a stream-processing application on Kafka Streams using Spring Cloud Stream's functional programming support. We saw that the binder handles much of the underlying configuration, letting us focus on the business logic. In the next article, we will explore the Apache Kafka Streams programming model further and see how to build richer stream-processing applications with Spring Cloud Stream and Kafka Streams.

Source code for the example in this article: GitHub repository.
