1. Background
- Kafka is used in many scenarios, such as log aggregation, messaging systems, and big-data stream processing;
- Our project happens to need a message middleware plus Spark stream processing, so it is well worth understanding Kafka's principles and runtime mechanics in depth;
2. Analysis
- First, be clear about Kafka's deployment (physical) structure: a Kafka setup consists of three parts, the Kafka middleware (installed and deployed on its own), producers, and consumers;
- The Kafka middleware supports clustered deployment; producers can be deployed inside whichever business module generates messages, and consumers inside whichever module consumes them, so the three parts are fully independent of each other;
- Since I am on Windows 10, I will build a fairly minimal Kafka cluster on it, verify it from the command line, and then verify it again from a Spring Boot project;
- A Kafka (version 2.12-2.3.0) cluster needs ZooKeeper; to keep things simple, the ZooKeeper instance bundled with Kafka is used;
3. Steps
- Download the package kafka_2.12-2.3.0.tgz from the official site and unpack it;
- Make 3 copies of the unpacked directory, named kafka_1, kafka_2, and kafka_3;
- Create a directory kafka_slaves next to kafka_1, and inside it create two subdirectories, data and logs;
- Under both data and logs, create three subdirectories named 1, 2, and 3;
- Under each data/(1-3)/ directory, create a file named myid whose content is that directory's own name (1-3);
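The directory layout above can also be created programmatically instead of by hand. A minimal sketch (the class name is hypothetical; the root path is an assumption, adjust it to your install location):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SlaveLayout {
    // Creates kafka_slaves/{data,logs}/{1,2,3} and writes each server id
    // into data/<n>/myid, matching the layout described above.
    public static void create(Path root) throws IOException {
        for (int id = 1; id <= 3; id++) {
            Path data = root.resolve("data").resolve(String.valueOf(id));
            Path logs = root.resolve("logs").resolve(String.valueOf(id));
            Files.createDirectories(data);
            Files.createDirectories(logs);
            // myid must contain only the numeric id matching server.<id> in zookeeper.properties
            Files.write(data.resolve("myid"), String.valueOf(id).getBytes());
        }
    }
}
```

Call `SlaveLayout.create(Paths.get("G:\\software\\java\\kafka_slaves"))` once before editing the properties files.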
- Edit config/server.properties in each of kafka_(1-3); using kafka_1 as the example:
# matches the myid value
broker.id=1
# log path; for kafka_2 and kafka_3 the last directory level is 2 and 3 respectively
log.dirs=G:\\software\\java\\kafka_slaves\\logs\\1
# this machine's non-loopback IP plus port; the port for kafka_2 and kafka_3 increases by 1 each
listeners=PLAINTEXT://192.168.7.197:9091
# the zookeeper setting is identical for all three
zookeeper.connect=localhost:2181,localhost:2182,localhost:2183
- Edit config/zookeeper.properties in each of kafka_(1-3); using kafka_1 as the example:
dataDir=G:\\software\\java\\kafka_slaves\\data\\1
# zookeeper client port; for kafka_2 and kafka_3 it increases by 1 each
clientPort=2181
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:4888:5888
server.3=127.0.0.1:6888:7888
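Only a handful of values differ between the three copies; everything else in the two properties files is identical. A small sketch that derives the differing values from the broker number (the class name is hypothetical; the ports and paths mirror the settings above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BrokerProps {
    // Derives the per-copy settings for kafka_1..kafka_3: broker id, the
    // log/data directory suffix, and the ports that increase by 1 per broker.
    public static Map<String, String> forBroker(int id) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("broker.id", String.valueOf(id));                          // server.properties
        p.put("log.dirs", "G:\\\\software\\\\java\\\\kafka_slaves\\\\logs\\\\" + id);
        p.put("listeners", "PLAINTEXT://192.168.7.197:" + (9090 + id));  // 9091, 9092, 9093
        p.put("dataDir", "G:\\\\software\\\\java\\\\kafka_slaves\\\\data\\\\" + id);  // zookeeper.properties
        p.put("clientPort", String.valueOf(2180 + id));                  // 2181, 2182, 2183
        return p;
    }
}
```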
- Start all three ZooKeeper instances (one administrator CMD window for each):
G:\software\java\kafka_1\bin\windows\zookeeper-server-start.bat G:\software\java\kafka_1\config\zookeeper.properties
G:\software\java\kafka_2\bin\windows\zookeeper-server-start.bat G:\software\java\kafka_2\config\zookeeper.properties
G:\software\java\kafka_3\bin\windows\zookeeper-server-start.bat G:\software\java\kafka_3\config\zookeeper.properties
- Start all three Kafka brokers (one administrator CMD window for each):
G:\software\java\kafka_1\bin\windows\kafka-server-start.bat G:\software\java\kafka_1\config\server.properties
G:\software\java\kafka_2\bin\windows\kafka-server-start.bat G:\software\java\kafka_2\config\server.properties
G:\software\java\kafka_3\bin\windows\kafka-server-start.bat G:\software\java\kafka_3\config\server.properties
- The full configuration files are available on GitHub;
4. Verification
- Create a topic:
G:\software\java\kafka_1\bin\windows\kafka-topics.bat --create --zookeeper localhost:2181 --replication-factor 2 --partitions 3 --topic test
- Describe the topic:
G:\software\java\kafka_1\bin\windows\kafka-topics.bat --describe --zookeeper localhost:2181
- Produce messages:
G:\software\java\kafka_1\bin\windows\kafka-console-producer.bat --broker-list localhost:9091 --topic test
- Consume messages:
G:\software\java\kafka_3\bin\windows\kafka-console-consumer.bat --bootstrap-server 192.168.7.197:9091 --from-beginning --topic test
- Screenshot of the result:
- Producer code:
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.stereotype.Service;
import org.springframework.util.concurrent.ListenableFuture;
import org.springframework.util.concurrent.ListenableFutureCallback;

@Service
public class KafkaProducer
{
    private static final Logger LOGGER = LogManager.getLogger(KafkaProducer.class);

    @Autowired
    private KafkaTemplate<String, String> kafkaTemplate;

    @Value("${kafka.topic.topic-name}")
    private String defaultTopic;

    /**
     * Send a message to the default topic.
     *
     * @param msg the message payload
     */
    public void send(String msg)
    {
        String topic = defaultTopic;
        LOGGER.info("current default topic:{}", topic);
        this.send(topic, msg);
    }

    /**
     * Send a message to the given topic.
     *
     * @param topic the target topic
     * @param msg the message payload
     */
    public void send(String topic, String msg)
    {
        LOGGER.info("current topic:{},msg:{}", topic, msg);
        ListenableFuture<SendResult<String, String>> future = kafkaTemplate.send(topic, msg);
        future.addCallback(new ListenableFutureCallback<SendResult<String, String>>()
        {
            @Override
            public void onFailure(Throwable ex)
            {
                LOGGER.error("failed to send [{}] msg:{}.", topic, msg, ex);
            }

            @Override
            public void onSuccess(SendResult<String, String> result)
            {
                LOGGER.info("send msg success:{}", result);
            }
        });
    }
}
- Consumer code:
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumer
{
    private static final Logger LOGGER = LogManager.getLogger(KafkaConsumer.class);

    @KafkaListener(topics = {"test-topic"}, groupId = "test-group")
    public void process(ConsumerRecord<String, String> msg)
    {
        LOGGER.info("receive msg:{}", msg.value());
    }
}
- Kafka configuration (application.properties):
### Kafka consumer
# default consumer group id
spring.kafka.consumer.group-id=test-group
# earliest: if a partition has a committed offset, resume from it; otherwise consume from the beginning
# latest: if a partition has a committed offset, resume from it; otherwise consume only newly produced records
# none: resume from the committed offset on every partition; throw an exception if any partition has no committed offset
spring.kafka.consumer.auto-offset-reset=earliest
spring.kafka.consumer.enable-auto-commit=true
spring.kafka.consumer.auto-commit-interval=100
# key/value deserializers
spring.kafka.consumer.key-deserializer=org.apache.kafka.common.serialization.StringDeserializer
spring.kafka.consumer.value-deserializer=org.apache.kafka.common.serialization.StringDeserializer
### Kafka producer
# key/value serializers
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
spring.kafka.producer.value-serializer=org.apache.kafka.common.serialization.StringSerializer
# producer batch size, in bytes
spring.kafka.producer.batch-size=65536
# producer buffer memory, in bytes
spring.kafka.producer.buffer-memory=524288
### Global configuration
# broker list (matches the listeners configured above)
spring.kafka.bootstrap-servers=192.168.7.197:9091,192.168.7.197:9092,192.168.7.197:9093
kafka.topic.topic-name=test-topic
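The three auto-offset-reset policies described in the comments can be modeled in a few lines. This is an illustrative sketch of the semantics for a single partition, not Kafka's actual implementation (the class and method names are hypothetical):

```java
import java.util.OptionalLong;

public class OffsetReset {
    // Returns the offset a consumer starts from on one partition, given the
    // committed offset (empty if none) and the auto.offset.reset policy.
    public static long startOffset(OptionalLong committed, long beginning, long end, String policy) {
        if (committed.isPresent()) {
            return committed.getAsLong();        // every policy resumes from a committed offset
        }
        switch (policy) {
            case "earliest": return beginning;   // no commit: start from the oldest record
            case "latest":   return end;         // no commit: consume only newly produced records
            case "none":     throw new IllegalStateException("no committed offset for partition");
            default:         throw new IllegalArgumentException("unknown policy: " + policy);
        }
    }
}
```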
- Controller entry point:
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ProducerController
{
    @Autowired
    private KafkaProducer producerService;

    // e.g. GET /kafka/send?msg=hello
    @GetMapping("/kafka/send")
    public String send(String msg)
    {
        producerService.send(msg);
        return "ok";
    }
}
- Check 1: produce a test-topic message from the Java code; the command-line consumer receives it:
- Check 2: produce a test-topic message from the command line; the Java consumer receives it:
- The complete Java code is available on GitHub;
5. Summary
- Having heard of Kafka and being able to use it are two entirely different things. Kafka also differs from third-party libraries such as Spring: the Kafka brokers must be installed and deployed independently, while the Java side only needs to pull in the producer or consumer client;
- Commands differ between old and new Kafka versions, especially kafka-console-consumer.bat --bootstrap-server 192.168.7.197:9091; the old form was kafka-console-consumer.bat --zookeeper 127.0.0.1:2181,127.0.0.1:2182, which is not the same thing at all, so cross-check references and learn to tell the two apart;