[Solved] Kafka error: Bootstrap broker x.x.x.x:9092 (id: -1 rack: null) disconnected

Reproducing the problem

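Recently I built a Kafka-based feature for a customer (what the feature does is beside the point here). It connected without problems to the Kafka instance on my local VM, so I deployed it to a test server to verify it. Below is the log: the consumer subscribes to the topic successfully, and then the warning appears: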

2023-05-09 10:22:23 [localhost-startStop-1] INFO  org.apache.kafka.clients.consumer.ConsumerConfig - ConsumerConfig values: 
	allow.auto.create.topics = true
	auto.commit.interval.ms = 5000
	auto.offset.reset = earliest
	bootstrap.servers = [10.39.48.113:9092]
	check.crcs = true
	client.dns.lookup = use_all_dns_ips
	client.id = consumer-enn-jiuqi-1
	client.rack = 
	connections.max.idle.ms = 540000
	default.api.timeout.ms = 60000
	enable.auto.commit = false
	exclude.internal.topics = true
	fetch.max.bytes = 52428800
	fetch.max.wait.ms = 500
	fetch.min.bytes = 1
	group.id = enn-jiuqi
	group.instance.id = null
	heartbeat.interval.ms = 3000
	interceptor.classes = []
	internal.leave.group.on.close = true
	internal.throw.on.fetch.stable.offset.unsupported = false
	isolation.level = read_uncommitted
	key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
	max.partition.fetch.bytes = 1048576
	max.poll.interval.ms = 300000
	max.poll.records = 500
	metadata.max.age.ms = 300000
	metric.reporters = []
	metrics.num.samples = 2
	metrics.recording.level = INFO
	metrics.sample.window.ms = 30000
	partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor, class org.apache.kafka.clients.consumer.CooperativeStickyAssignor]
	receive.buffer.bytes = 65536
	reconnect.backoff.max.ms = 1000
	reconnect.backoff.ms = 50
	request.timeout.ms = 30000
	retry.backoff.ms = 100
	sasl.client.callback.handler.class = null
	sasl.jaas.config = null
	sasl.kerberos.kinit.cmd = /usr/bin/kinit
	sasl.kerberos.min.time.before.relogin = 60000
	sasl.kerberos.service.name = null
	sasl.kerberos.ticket.renew.jitter = 0.05
	sasl.kerberos.ticket.renew.window.factor = 0.8
	sasl.login.callback.handler.class = null
	sasl.login.class = null
	sasl.login.refresh.buffer.seconds = 300
	sasl.login.refresh.min.period.seconds = 60
	sasl.login.refresh.window.factor = 0.8
	sasl.login.refresh.window.jitter = 0.05
	sasl.mechanism = GSSAPI
	security.protocol = PLAINTEXT
	security.providers = null
	send.buffer.bytes = 131072
	session.timeout.ms = 45000
	socket.connection.setup.timeout.max.ms = 30000
	socket.connection.setup.timeout.ms = 10000
	ssl.cipher.suites = null
	ssl.enabled.protocols = [TLSv1.2]
	ssl.endpoint.identification.algorithm = https
	ssl.engine.factory.class = null
	ssl.key.password = null
	ssl.keymanager.algorithm = SunX509
	ssl.keystore.certificate.chain = null
	ssl.keystore.key = null
	ssl.keystore.location = null
	ssl.keystore.password = null
	ssl.keystore.type = JKS
	ssl.protocol = TLSv1.2
	ssl.provider = null
	ssl.secure.random.implementation = null
	ssl.trustmanager.algorithm = PKIX
	ssl.truststore.certificates = null
	ssl.truststore.location = null
	ssl.truststore.password = null
	ssl.truststore.type = JKS
	value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer

2023-05-09 10:22:23 [localhost-startStop-1] INFO  org.apache.kafka.common.utils.AppInfoParser - Kafka version: 3.0.1
2023-05-09 10:22:23 [localhost-startStop-1] INFO  org.apache.kafka.common.utils.AppInfoParser - Kafka commitId: 8e30984f43e64d8b
2023-05-09 10:22:23 [localhost-startStop-1] INFO  org.apache.kafka.common.utils.AppInfoParser - Kafka startTimeMs: 1683598943212
2023-05-09 10:22:23 [localhost-startStop-1] INFO  org.apache.kafka.clients.consumer.KafkaConsumer - [Consumer clientId=consumer-test-1, groupId=test-group] Subscribed to topic(s): sync_user


2023-05-09 10:23:50 [org.springframework.kafka.KafkaListenerEndpointContainer#0-0-C-1] WARN  org.apache.kafka.clients.NetworkClient - [Consumer clientId=consumer-test-1, groupId=test-group] Bootstrap broker 10.39.48.113:9092 (id: -1 rack: null) disconnected

Note the warning at the end of the log. I have included only one occurrence here; in the real log it repeats many times.

Analyzing the problem

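The customer's integration contact sent a message and asked whether we had consumed it. I checked the log and the screen was full of WARN lines reading Bootstrap broker 10.39.48.113:9092 (id: -1 rack: null) disconnected. Quite an eye-opener.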

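I asked the Google Bard chatbot what could cause this:

Bard suggested a network problem. Other blog posts I checked said it can also be caused by an authentication mechanism being enabled, for example when the broker expects SASL or SSL but the client connects with PLAINTEXT.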
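One quick way to check the authentication theory is to look at what the client actually logged. The sketch below is a hypothetical helper, not part of Kafka or Spring. It parses a `ConsumerConfig values:` dump like the one above into a dict, so you can see that this client ran with `security.protocol = PLAINTEXT` and `sasl.jaas.config = null`, meaning it was not configured for SASL or SSL. It assumes one `key = value` setting per line, as in this log.

```python
from typing import Dict, Optional

# Settings that decide whether the client authenticates/encrypts at all.
# Note: sasl.mechanism only takes effect when security.protocol is SASL_PLAINTEXT or SASL_SSL.
AUTH_KEYS = ("security.protocol", "sasl.mechanism", "sasl.jaas.config")


def parse_consumer_config(log_text: str) -> Dict[str, str]:
    """Collect `key = value` lines from a logged ConsumerConfig dump (hypothetical helper)."""
    config: Dict[str, str] = {}
    for line in log_text.splitlines():
        key, sep, value = line.partition("=")
        key = key.strip()
        # Keep only lines shaped like a setting: a single dotted token before the first '='.
        if sep and key and "." in key and not any(ch.isspace() for ch in key):
            config[key] = value.strip()
    return config


def auth_settings(config: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Pick out the authentication-related settings (None if a key is absent)."""
    return {k: config.get(k) for k in AUTH_KEYS}


def client_uses_plaintext(config: Dict[str, str]) -> bool:
    """True when the client neither encrypts nor authenticates (Kafka's default protocol)."""
    return config.get("security.protocol", "PLAINTEXT").upper() == "PLAINTEXT"
```

This only tells you what the client sends. You still have to compare it with the broker's listener settings (`listeners` and `listener.security.protocol.map` in the broker's `server.properties`) before you rule authentication in or out.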

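Pinging the broker worked, but telnet to its port 9092 could not connect. In other words, the network between this test server and the Kafka broker let ping (ICMP) through but blocked TCP connections to the broker's port!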
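A successful ping only proves that ICMP gets through. A Kafka client needs a TCP connection to the broker port. Below is a minimal sketch of the telnet check in Python, using only the standard library. It attempts a TCP handshake and reports one of three results:

- open: the handshake completed.
- refused: the host, or a rejecting firewall, answered with a reset.
- timeout: nothing came back at all, which is typical of a firewall silently dropping the SYN.

The function name `probe_tcp` is illustrative.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP handshake, roughly what `telnet host port` does.

    Returns "open", "refused" or "timeout"; other socket errors are
    returned as "error: <message>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # three-way handshake completed
    except ConnectionRefusedError:
        return "refused"  # got a reset: port closed or actively rejected
    except (socket.timeout, TimeoutError):
        return "timeout"  # no reply at all: consistent with SYN packets being dropped
    except OSError as exc:
        return f"error: {exc}"  # e.g. no route to host, DNS failure
```

For the broker in this post the call would be `probe_tcp("10.39.48.113", 9092)`. The SYN-without-reply situation found in the packet capture corresponds to the "timeout" result.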

Solution

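I checked the iptables chains on the test server and found nothing wrong. I then captured packets with tcpdump and analyzed them in Wireshark: the test server's SYN packets to the broker never got any response. Together with the customer's network engineer, we traced this to a company network policy blocking the traffic. That resolved the issue, and I hope it gives readers a useful line of investigation.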
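The capture can be taken with something like `tcpdump -nn -i any host 10.39.48.113 and port 9092`; the interface and filter are assumptions, so adjust them to your setup. You can also check a text capture without opening Wireshark. The sketch below counts SYNs sent to the broker and the SYN-ACK or RST replies coming back from it. It assumes tcpdump's default one-line IPv4 output with `-nn` (`IP src.port > dst.port: Flags [S], ...`). In that output, `[S]` is a SYN, `[S.]` a SYN-ACK, and `[R]`/`[R.]` a reset. The helper name `summarize_handshakes` is hypothetical.

```python
import re
from typing import Dict

# Matches tcpdump -nn IPv4 lines such as:
#   10:30:00.000001 IP 10.0.0.5.51234 > 10.39.48.113.9092: Flags [S], seq 1, ...
LINE = re.compile(
    r"IP (?P<src>\d+\.\d+\.\d+\.\d+)\.(?P<sport>\d+) > "
    r"(?P<dst>\d+\.\d+\.\d+\.\d+)\.(?P<dport>\d+): Flags \[(?P<flags>[^\]]*)\]"
)


def summarize_handshakes(capture: str, broker: str, port: int) -> Dict[str, int]:
    """Count SYNs to the broker and SYN-ACK / RST replies from it."""
    counts = {"syn_sent": 0, "syn_ack_received": 0, "rst_received": 0}
    for line in capture.splitlines():
        m = LINE.search(line)
        if not m:
            continue  # not a TCP/IPv4 line we understand
        flags = m["flags"]
        to_broker = m["dst"] == broker and int(m["dport"]) == port
        from_broker = m["src"] == broker and int(m["sport"]) == port
        if to_broker and flags == "S":
            counts["syn_sent"] += 1  # client opening a connection
        elif from_broker and flags == "S.":
            counts["syn_ack_received"] += 1  # broker accepted the handshake
        elif from_broker and "R" in flags:
            counts["rst_received"] += 1  # broker side reset the connection
    return counts
```

The pattern this post describes is syn_sent growing while the other two counters stay at zero. That means the SYNs leave the test server and nothing comes back. It points at something on the network path dropping them rather than at Kafka itself.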
