Kafka: an unclosed transaction stops consumers from receiving messages
Background
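I recently ran into a problem with a shared topic that many applications read from and write to. From a certain point in time, every consumer of the topic using the read_committed isolation level stopped receiving messages. I checked the logs of a few applications at random: the producers reported no errors and were still sending messages and committing transactions normally, and the consumers reported no errors either. When I inspected the topic with a read_uncommitted consumer in an operations tool, I found that the LSO (last stable offset) was stuck at one offset while the LEO (log end offset) kept increasing.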
Cause
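- A producer had transactions enabled. Because of a bug, when the program exited abnormally neither commit nor abort was executed, so the transaction was never closed.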
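- Besides being closed actively with commit or abort, a transaction can also be closed passively when it times out, in which case the broker aborts it.
Broker-side upper limit on transaction timeouts: transaction.max.timeout.ms, default 15 minutes
Producer-side transaction timeout: transaction.timeout.ms, default 1 minute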
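- However, we had set the transaction timeout very high: 5 hours (this only works if the broker's transaction.max.timeout.ms is raised as well, since a producer whose transaction.timeout.ms exceeds it is rejected with InvalidTxnTimeoutException). An unclosed transaction therefore waits 5 hours before it is closed passively, and for those 5 hours none of the later messages in the affected partition can be consumed. Note that even though nothing can be consumed, other producers with different transactional IDs can still commit transactions to the topic; those committed messages simply cannot be consumed either. This is governed by the last stable offset (LSO), defined as the "smallest offset of any open transaction": read_committed consumers can only consume offsets below the LSO.
- Why can't the messages of transactions committed later be consumed? Because Kafka guarantees ordering within a partition. If an earlier transaction is still open and the consumer reads the later messages first, committing the earlier transaction afterwards would be pointless: the consumer cannot move its offset back to read it, and doing so would break the ordering. So the earlier transaction must end, whether by an explicit commit or abort or by a timeout abort, before any later message can be consumed.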
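To make the LSO rule concrete, here is a toy model of a single partition written for illustration; PartitionLogModel and its methods are hypothetical, not Kafka APIs. It assumes a single replica (so the high watermark equals the log end offset) and treats every commit or abort as one control-marker record. A read_committed consumer would only receive offsets below lastStableOffset().

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of one partition, only to illustrate the LSO rule (not Kafka's code).
 * Assumes a single replica (high watermark == log end offset) and
 * one control-marker record per commit or abort.
 */
final class PartitionLogModel {
    // producerId -> offset of the first record in that producer's open transaction
    private final Map<Long, Long> openTxnFirstOffset = new HashMap<>();
    private long logEndOffset = 0; // offset the next record will receive

    /** Appends one record; a negative producerId means non-transactional. Returns its offset. */
    long append(long producerId) {
        long offset = logEndOffset++;
        if (producerId >= 0) {
            // The first record of a transaction opens it.
            openTxnFirstOffset.putIfAbsent(producerId, offset);
        }
        return offset;
    }

    /** Commit or abort: writes a control marker and closes the producer's transaction. */
    long endTransaction(long producerId) {
        openTxnFirstOffset.remove(producerId);
        return logEndOffset++;
    }

    long logEndOffset() {
        return logEndOffset;
    }

    /** Smallest first offset of any open transaction, or the log end offset if none is open. */
    long lastStableOffset() {
        long lso = logEndOffset;
        for (long first : openTxnFirstOffset.values()) {
            lso = Math.min(lso, first);
        }
        return lso;
    }
}
```

In this model a transaction left open at some offset pins the LSO there, no matter how many later transactions from other producers commit; the LSO only moves once that transaction ends.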
Reproduction and further tests
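I created two transactional producers, tx-producer-1 and tx-producer-2; one non-transactional producer, nontx-producer; one consumer with isolation level (isolation.level) read_committed, rc-consumer; and one with isolation level read_uncommitted, runc-consumer.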
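- nontx-producer sends one message per second
- tx-producer-1 commits one transaction per second, each containing 2 messages
- tx-producer-2 commits one transaction per second, each containing 2 messages; however, in its third transaction it sends 2 messages and then, instead of committing, sleeps for a long time
- rc-consumer and runc-consumer consume continuously and print the messages they receive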
Each producer and consumer runs in its own thread, independently of the others. The Kafka version is Kafka_2.13-3.3.1, with the default configuration.
Does producer.close() close the transaction immediately? Yes
Conclusion: yes
Code:
for (int i = 1000; i < 2000; i += 2) {
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
log.info("[tx-producer-2]sent tx msg-" + i);
producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
log.info("[tx-producer-2]sent tx msg-" + (i + 1));
// Hang the transaction for a while, then close the producer without committing
if (i == 1004) {
log.warn("tx-producer-2 hanging tx");
TimeUnit.SECONDS.sleep(5);
log.warn("tx-producer-2 close");
producer.close();
return;
}
producer.commitTransaction();
} catch (Exception e) {
producer.abortTransaction();
log.warn("tx-producer-2 abort tx");
}
}
Procedure: simply run the application in IDEA.
Application log:
2022-11-17 09:56:39.757 WARN 2288 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : tx-producer-2 close
2022-11-17 09:56:39.757 INFO 2288 --- [pool-5-thread-1] o.a.k.clients.producer.KafkaProducer : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
2022-11-17 09:56:39.757 DEBUG 2288 --- [r-tx-producer-2] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Beginning shutdown of Kafka producer I/O thread, sending remaining records.
2022-11-17 09:56:39.757 INFO 2288 --- [r-tx-producer-2] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Aborting incomplete transaction due to shutdown
2022-11-17 09:56:39.758 DEBUG 2288 --- [r-tx-producer-2] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Transition from state IN_TRANSACTION to ABORTING_TRANSACTION
2022-11-17 09:56:39.758 DEBUG 2288 --- [r-tx-producer-2] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Enqueuing transactional request EndTxnRequestData(transactionalId='tx-producer-2', producerId=2, producerEpoch=0, committed=false)
2022-11-17 09:56:39.758 DEBUG 2288 --- [r-tx-producer-2] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Sending transactional request EndTxnRequestData(transactionalId='tx-producer-2', producerId=2, producerEpoch=0, committed=false) to node ship:9092 (id: 0 rack: null) with correlation ID 40
2022-11-17 09:56:39.758 DEBUG 2288 --- [r-tx-producer-2] org.apache.kafka.clients.NetworkClient : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Sending END_TXN request with header RequestHeader(apiKey=END_TXN, apiVersion=3, clientId=producer-tx-producer-2, correlationId=40) and timeout 30000 to node 0: EndTxnRequestData(transactionalId='tx-producer-2', producerId=2, producerEpoch=0, committed=false)
2022-11-17 09:56:39.761 DEBUG 2288 --- [r-tx-producer-2] org.apache.kafka.clients.NetworkClient : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Received END_TXN response from node 0 for request with header RequestHeader(apiKey=END_TXN, apiVersion=3, clientId=producer-tx-producer-2, correlationId=40): EndTxnResponseData(throttleTimeMs=0, errorCode=0)
2022-11-17 09:56:39.761 DEBUG 2288 --- [r-tx-producer-2] o.a.k.c.p.internals.TransactionManager : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Transition from state ABORTING_TRANSACTION to READY
2022-11-17 09:56:39.762 DEBUG 2288 --- [r-tx-producer-2] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Shutdown of Kafka producer I/O thread has completed.
2022-11-17 09:56:39.762 INFO 2288 --- [pool-5-thread-1] org.apache.kafka.common.metrics.Metrics : Metrics scheduler closed
2022-11-17 09:56:39.762 INFO 2288 --- [pool-5-thread-1] org.apache.kafka.common.metrics.Metrics : Closing reporter org.apache.kafka.common.metrics.JmxReporter
2022-11-17 09:56:39.762 INFO 2288 --- [pool-5-thread-1] org.apache.kafka.common.metrics.Metrics : Metrics reporters closed
2022-11-17 09:56:39.763 INFO 2288 --- [pool-5-thread-1] o.a.kafka.common.utils.AppInfoParser : App info kafka.producer for producer-tx-producer-2 unregistered
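The line "Aborting incomplete transaction due to shutdown", followed by an END_TXN request with committed=false that succeeds (errorCode=0), shows that producer.close() aborts the unfinished transaction.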
kafka server.log:
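Only the transaction initialization entries appear. There is no "Completed rollback ... due to timeout" entry, because the producer aborted the transaction itself instead of the broker timing it out.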
[2022-11-17 09:15:03,988] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 2 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 09:15:03,995] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-1 with producerId 1 and producer epoch 0 on partition __transaction_state-1 (kafka.coordinator.transaction.TransactionCoordinator)
Does exiting the application process close the transaction immediately? No
Conclusion: no
Code:
for (int i = 1000; i < 2000; i += 2) {
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
log.info("[tx-producer-2]sent tx msg-" + i);
producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
log.info("[tx-producer-2]sent tx msg-" + (i + 1));
// Hang the transaction
if (i == 1004) {
log.warn("tx-producer-2 hanging tx");
TimeUnit.MINUTES.sleep(60); // long sleep
}
producer.commitTransaction();
} catch (Exception e) {
producer.abortTransaction();
log.warn("tx-producer-2 abort tx");
}
}
Procedure: run the application in IDEA and click Stop while it is sleeping.
Application log:
Started MateApp in 1.702 seconds (JVM running for 2.123)
2022-11-16 21:05:58.582 INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : runc pool...
2022-11-16 21:05:58.582 INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : rc pool...
2022-11-16 21:05:58.609 INFO 14572 --- [pool-3-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : [nontx-producer]sent msg-0
2022-11-16 21:05:59.104 INFO 14572 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : [tx-producer-2]sent tx msg-1000
2022-11-16 21:06:00.192 INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[rc] received: msg-1000
2022-11-16 21:06:00.192 INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[rc] received: msg-1001
...
...
2022-11-16 21:06:02.160 WARN 14572 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : tx-producer-2 hanging tx
2022-11-16 21:06:02.254 INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : runc pool...
2022-11-16 21:06:02.256 INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[rc] received: msg-2004
2022-11-16 21:06:02.256 INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[rc] received: msg-2005
2022-11-16 21:06:02.256 INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : rc pool...
2022-11-16 21:06:02.290 INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[read_uncommitted] received: msg-2006
2022-11-16 21:06:02.290 INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[read_uncommitted] received: msg-2007
...
...
2022-11-16 21:06:08.265 INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[read_uncommitted] received: msg-2018
2022-11-16 21:06:08.265 INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : consumer[read_uncommitted] received: msg-2019
2022-11-16 21:06:08.265 INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : runc pool...
2022-11-16 21:06:08.292 INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : rc pool...
Process finished with exit code 130
Once the transaction hangs, consumer[rc] receives no more messages; only consumer[read_uncommitted] still does.
kafka server.log:
[2022-11-16 21:06:00,359] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 2 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)
...
[2022-11-16 21:07:03,861] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-2 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
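So tx-producer-2's transaction was rolled back automatically about one minute after it started (the default transaction.timeout.ms; the timeout is counted from the start of the transaction, around 21:06:02, rather than from initTransactions() at 21:06:00). The application was stopped at 21:06:13 and the transaction was rolled back at 21:07:03, so stopping the application did not roll the transaction back immediately.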
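The timing can be checked with simple arithmetic. This sketch assumes, to the best of my knowledge of the coordinator's behavior, that the timeout counts from the start of the transaction and that expired transactions are aborted by a periodic sweep controlled by the broker setting transaction.abort.timed.out.transaction.cleanup.interval.ms (10 seconds by default). TxnExpiryWindow is a hypothetical helper; taking the "hanging tx" log line at about 21:06:02 as the approximate start, the observed rollback at 21:07:03 falls inside the predicted window.

```java
import java.time.Duration;
import java.time.LocalTime;

/**
 * Window in which the broker should roll back an abandoned transaction.
 * Assumptions: the timeout counts from the transaction's start, and expired
 * transactions are aborted by a periodic sweep
 * (transaction.abort.timed.out.transaction.cleanup.interval.ms, 10 s by default).
 */
final class TxnExpiryWindow {
    /** The transaction cannot be rolled back for timeout before this time. */
    static LocalTime earliestRollback(LocalTime txnStart, Duration txnTimeout) {
        return txnStart.plus(txnTimeout);
    }

    /** At the latest one sweep interval after expiry, ignoring broker load. */
    static LocalTime latestRollback(LocalTime txnStart, Duration txnTimeout, Duration sweepInterval) {
        return earliestRollback(txnStart, txnTimeout).plus(sweepInterval);
    }

    /** True if an observed rollback time is explained by the timeout alone. */
    static boolean consistentWith(LocalTime observed, LocalTime txnStart,
                                  Duration txnTimeout, Duration sweepInterval) {
        return !observed.isBefore(earliestRollback(txnStart, txnTimeout))
                && !observed.isAfter(latestRollback(txnStart, txnTimeout, sweepInterval));
    }
}
```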
Does System.exit close the transaction immediately? No
Conclusion: no
Code:
for (int i = 1000; i < 2000; i += 2) {
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
log.info("[tx-producer-2]sent tx msg-" + i);
producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
log.info("[tx-producer-2]sent tx msg-" + (i + 1));
// Simulate a crash: exit the JVM while the transaction is still open
if (i == 1004) {
log.warn("tx-producer-2 system exit");
System.exit(-1);
}
producer.commitTransaction();
} catch (Exception e) {
producer.abortTransaction();
log.warn("tx-producer-2 abort tx");
}
}
Application log:
...
2022-11-17 10:18:49.195 WARN 10736 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : tx-producer-2 system exit
...
Process finished with exit code -1
kafka server.log
[2022-11-17 10:18:46,106] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-1 with producerId 1 and producer epoch 0 on partition __transaction_state-1 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:18:46,176] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 2 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:19:52,492] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-1 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:19:52,496] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-2 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
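The transactions were rolled back at 10:19:52, about 63 seconds after System.exit(-1) at 10:18:49, which matches the 1-minute timeout rather than an immediate abort. tx-producer-1's open transaction was also rolled back on timeout, because System.exit terminates the whole JVM, including tx-producer-1's thread. Running the test again: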
# application log
2022-11-17 10:27:00.366 WARN 20524 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : tx-producer-2 system exit
# kafka server log
[2022-11-17 10:26:53,406] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-1 with producerId 2 and producer epoch 0 on partition __transaction_state-1 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:26:53,406] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 1 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:28:02,428] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-1 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:28:02,432] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-2 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
This time the gap is again about one minute (10:27:00 to 10:28:02).
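One mitigation, not tested in this article, is to register a JVM shutdown hook that closes the producer, so that System.exit or a SIGINT/SIGTERM goes through the same abort path as the producer.close() test. Shutdown hooks do not run on kill -9 or a JVM crash, so the broker timeout remains the last resort. The sketch takes any AutoCloseable (KafkaProducer implements Closeable); ShutdownClose is a hypothetical name. Since the default close() waits indefinitely (timeoutMillis = 9223372036854775807 in the log above), a bounded close such as producer.close(Duration.ofSeconds(5)) inside the hook is probably safer.

```java
/**
 * Registers a JVM shutdown hook that closes the given resource, e.g. a
 * KafkaProducer (which implements Closeable). Hooks run on System.exit and on
 * SIGINT/SIGTERM, but not on kill -9 or a JVM crash.
 */
final class ShutdownClose {
    static Thread closeOnShutdown(AutoCloseable resource) {
        Thread hook = new Thread(() -> {
            try {
                resource.close();
            } catch (Exception e) {
                // Too late to recover; the broker-side timeout still applies.
                System.err.println("close on shutdown failed: " + e);
            }
        }, "close-on-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

Usage, assuming a transactional producer has been created: `ShutdownClose.closeOnShutdown(producer);` right after `producer.initTransactions()`.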
Does Thread.stop close the transaction immediately? No
Conclusion: no
Code:
for (int i = 1000; i < 2000; i += 2) {
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
log.info("[tx-producer-2]sent tx msg-" + i);
producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
log.info("[tx-producer-2]sent tx msg-" + (i + 1));
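// Stop the producer thread in the middle of the transaction
if (i == 1004) {
// note: the label says tx-producer-1, but this loop runs tx-producer-2
log.info("tx-producer-1 stop");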
Thread.currentThread().stop();
}
producer.commitTransaction();
} catch (Exception e) {
producer.abortTransaction();
log.warn("tx-producer-2 abort tx");
}
}
Application log:
2022-11-17 10:45:22.239 INFO 7504 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest : tx-producer-1 stop
kafka server.log:
[2022-11-17 10:45:15,340] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 2 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:46:22,035] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-2 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
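The broker rolled the transaction back one minute after the producer thread was stopped. Thread.stop() makes the thread throw ThreadDeath, which is an Error rather than an Exception, so the catch (Exception e) block never calls abortTransaction(). Note also that Thread.stop() is deprecated and, from JDK 20 on, throws UnsupportedOperationException instead of stopping the thread, so this test only reproduces on older JDKs.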
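The control flow behind this result can be reproduced without Kafka. In the hypothetical class below, a plain Error stands in for ThreadDeath, and the strings "commit" and "abort" mark where the test loop would call commitTransaction() and abortTransaction(). It shows that catch (Exception e) lets an Error escape without aborting, whereas aborting in a finally block whenever the commit did not happen also covers Errors. A real version would still need the fatal-error handling shown in the KafkaProducer javadoc (and in the test code's ProducerFencedException branch), where the producer must be closed rather than aborted.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Control flow behind the Thread.stop() result, without Kafka.
 * "commit"/"abort" mark where the test loop calls commitTransaction()/abortTransaction();
 * a plain Error stands in for ThreadDeath.
 */
final class ErrorVsException {

    /** Pattern from the test loop: abort only inside catch (Exception e). */
    static List<String> abortInCatch(Throwable failure) {
        List<String> calls = new ArrayList<>();
        try {
            try {
                throwIfPresent(failure);
                calls.add("commit");
            } catch (Exception e) {
                calls.add("abort");
            }
        } catch (Error e) {
            calls.add("propagated"); // the Error skipped the catch block entirely
        }
        return calls;
    }

    /** Alternative: abort in finally unless the commit succeeded, so Errors are covered too. */
    static List<String> abortInFinally(Throwable failure) {
        List<String> calls = new ArrayList<>();
        boolean committed = false;
        try {
            try {
                throwIfPresent(failure);
                calls.add("commit");
                committed = true;
            } finally {
                if (!committed) {
                    calls.add("abort");
                }
            }
        } catch (RuntimeException | Error e) {
            calls.add("propagated");
        }
        return calls;
    }

    /** Throws the given unchecked failure; null means the transaction body succeeds. */
    private static void throwIfPresent(Throwable failure) {
        if (failure instanceof RuntimeException) {
            throw (RuntimeException) failure;
        }
        if (failure instanceof Error) {
            throw (Error) failure;
        }
    }
}
```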
Summary
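- transaction.timeout.ms (producer), transaction.max.timeout.ms (broker) and request.timeout.ms (broker) are especially important and must not be set too large.
- Exception handling must be complete, so that commit, abort and close are guaranteed to run.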
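The constraint between the two transaction timeouts, and the rough length of the stall one forgotten transaction can cause, fit in a small check. TxnTimeoutCheck is hypothetical; it encodes the documented rule that a producer whose transaction.timeout.ms exceeds the broker's transaction.max.timeout.ms is rejected with InvalidTxnTimeoutException, plus an assumed 10-second abort sweep (the default of the broker's transaction.abort.timed.out.transaction.cleanup.interval.ms). With the 5-hour setting from the incident, read_committed consumers of the affected partition could stall for roughly 5 hours.

```java
import java.time.Duration;

/**
 * Relationship between the producer and broker transaction timeouts.
 * Documented rule: if transaction.timeout.ms exceeds the broker's
 * transaction.max.timeout.ms, the producer is rejected with InvalidTxnTimeoutException.
 * The abort sweep interval is an assumption (broker default of 10 s).
 */
final class TxnTimeoutCheck {
    static final Duration PRODUCER_TIMEOUT_DEFAULT = Duration.ofMinutes(1); // transaction.timeout.ms
    static final Duration BROKER_MAX_DEFAULT = Duration.ofMinutes(15);      // transaction.max.timeout.ms
    static final Duration ABORT_SWEEP_DEFAULT = Duration.ofSeconds(10);     // assumed sweep interval

    /** Whether the broker accepts a producer configured with this timeout. */
    static boolean accepted(Duration producerTimeout, Duration brokerMax) {
        return producerTimeout.compareTo(brokerMax) <= 0;
    }

    /** Rough upper bound on how long one abandoned transaction blocks read_committed consumers. */
    static Duration worstCaseStall(Duration producerTimeout) {
        return producerTimeout.plus(ABORT_SWEEP_DEFAULT);
    }
}
```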
How to detect hanging transactions
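This consumption problem took me a long time to track down. Once I knew the cause, the natural next question was whether unclosed transactions can be detected at all. KIP-664 was discussing exactly this at the time of writing.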
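Until that feature ships, a workaround is to create a read_uncommitted consumer on the same topic partition and compare its end offset with that of a read_committed consumer. For a read_uncommitted consumer the end offset is the high watermark; for a read_committed consumer it is the LSO. If the read_uncommitted end offset keeps increasing while the read_committed end offset stays unchanged for a long time, the transaction starting at that offset has not been committed: the record at the LSO is the first record of the earliest open transaction.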
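The detection rule can be sketched as a small state machine. In a real monitor the two numbers would come from KafkaConsumer.endOffsets(...), which returns the LSO when isolation.level=read_committed and the high watermark when isolation.level=read_uncommitted. HangingTxnDetector and its threshold are hypothetical; the threshold should be longer than the longest legitimate transaction, because every open transaction holds the LSO back briefly.

```java
/**
 * Samples (LSO, high watermark) of one partition at a fixed interval and flags
 * the partition when the LSO has not moved for minStuckSamples consecutive
 * samples while the high watermark kept growing.
 */
final class HangingTxnDetector {
    private final int minStuckSamples;
    private long lastLso = -1;
    private long highWatermarkAtLsoChange = -1;
    private int stuckSamples = 0;

    HangingTxnDetector(int minStuckSamples) {
        this.minStuckSamples = minStuckSamples;
    }

    /** Feeds one sample; returns true if an open transaction at offset lso looks stuck. */
    boolean observe(long lso, long highWatermark) {
        if (lso != lastLso) {
            // LSO moved: transactions are completing, restart the count.
            lastLso = lso;
            highWatermarkAtLsoChange = highWatermark;
            stuckSamples = 0;
            return false;
        }
        stuckSamples++;
        // An idle partition (no new records) also has a constant LSO; require growth.
        return stuckSamples >= minStuckSamples && highWatermark > highWatermarkAtLsoChange;
    }
}
```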
How to close a hanging transaction
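In general, if an application exits abnormally, its uncommitted transaction can end only one way: it is rolled back, either by the producer or automatically by the broker once the timeout expires.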
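Sometimes a crashed application cannot be brought back up quickly, so the transactions it opened cannot be closed actively and can only wait for the timeout. If the timeout is long, other applications suffer badly. In that case, use the method from the previous section to detect the hanging transaction. The record at the LSO is the first record of the earliest unclosed transaction; its content tells you which application sent it, and from that you can work out the application's transactional ID. Then create a producer with that ID and call producer.initTransactions(); the "old" transaction is then closed.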
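Why does initTransactions() with the same transactional.id end the old transaction? Roughly, the coordinator bumps the producer epoch for that ID, aborts the transaction the previous epoch left open, and fences requests that still carry the old epoch. The toy TxnCoordinatorModel below is hypothetical and captures only those three effects, not the coordinator's real state machine. With the real client the rescue is just a KafkaProducer configured with the hung transactional.id, a call to initTransactions() and then close(), as in the commented-out block of the test code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of epoch fencing for one coordinator (illustration only).
 * initTransactions() bumps the epoch and aborts any transaction left open by an
 * older epoch; requests carrying a stale epoch are rejected.
 */
final class TxnCoordinatorModel {
    private final Map<String, Integer> epochs = new HashMap<>();
    private final Map<String, Boolean> ongoing = new HashMap<>();

    /** Returns the new epoch; any leftover transaction for this ID is aborted. */
    int initTransactions(String txId) {
        ongoing.put(txId, false); // abort the transaction the previous epoch left open
        return epochs.merge(txId, 0, (old, ignored) -> old + 1);
    }

    void begin(String txId, int epoch) {
        checkEpoch(txId, epoch);
        ongoing.put(txId, true);
    }

    void commit(String txId, int epoch) {
        checkEpoch(txId, epoch);
        ongoing.put(txId, false);
    }

    boolean hasOngoing(String txId) {
        return ongoing.getOrDefault(txId, false);
    }

    private void checkEpoch(String txId, int epoch) {
        Integer current = epochs.get(txId);
        if (current == null || epoch != current) {
            // A real client would see something like ProducerFencedException here.
            throw new IllegalStateException("fenced: epoch " + epoch + " is not current for " + txId);
        }
    }
}
```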
References
- KIP-664: Provide tooling to detect and abort hanging transactions (Apache Kafka wiki)
- KAFKA-12671: Out of order processing with a transactional producer can lead to a stuck LastStableOffset (Apache JIRA)
- KAFKA-5880: Transactional producer and read committed consumer causes consumer to stuck (Apache JIRA)
- Is it possible to force abort a Kafka transaction? (Stack Overflow)
Appendix: test code
package cn.whu.wy.kafkamate.service;
import cn.whu.wy.kafkamate.bean.TopicInfo;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
/**
* Kafka transaction test: the effect of unclosed transactions on consumers
*
* @author WangYong
* Date 2022/11/15
* Time 22:05
*/
@Service
@Slf4j
public class KafkaTxTest implements InitializingBean {
private final String bootstrapServers = "192.168.136.128:9092";
@Autowired
private TopicManager topicManager;
@Override
public void afterPropertiesSet() throws Exception {
final String testTopic = "test-tx";
TopicInfo topicInfo = new TopicInfo(testTopic, 1, (short) 1);
topicManager.createTopics(Set.of(topicInfo));
Executors.newSingleThreadExecutor().execute(() -> {
try (Consumer<String, String> consumer = genRcConsumer("rc-consumer")) {
consumer.subscribe(Collections.singleton(testTopic));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
records.forEach(r -> log.info("consumer[rc] received: {}", r.value()));
log.info("rc pool...");
}
}
});
Executors.newSingleThreadExecutor().execute(() -> {
try (Consumer<String, String> consumer = genRuncConsumer("runc-consumer")) {
consumer.subscribe(Collections.singleton(testTopic));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
records.forEach(r -> log.info("consumer[read_uncommitted] received: {}", r.value()));
log.info("runc pool...");
}
}
});
// Send 1 message per second, 10 messages in total
Executors.newSingleThreadExecutor().execute(() -> {
try (Producer<String, String> producer = genProducer()) {
for (int i = 0; i < 10; i++) {
producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
TimeUnit.SECONDS.sleep(1);
log.info("[nontx-producer]sent msg-" + i);
}
} catch (Exception e) {
throw new RuntimeException(e);
}
});
// One transaction per second, 2 messages per transaction: 10 transactions, 20 messages
Executors.newSingleThreadExecutor().execute(() -> {
Producer<String, String> producer = genTxProducer("tx-producer-1");
producer.initTransactions();
for (int i = 2000; i < 2020; i += 2) {
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
log.info("[tx-producer-1]sent tx msg-" + i);
producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
log.info("[tx-producer-1]sent tx msg-" + (i + 1));
TimeUnit.SECONDS.sleep(1);
producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
// We can't recover from these exceptions, so our only option is to close the producer and exit.
producer.close();
log.error("tx-producer-1 closed", e);
break; // the producer is closed, so leave the loop instead of calling beginTransaction() on it
} catch (KafkaException e) {
// For all other exceptions, just abort the transaction and try again.
producer.abortTransaction();
log.warn("tx-producer-1 abort tx");
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
});
// One transaction per second, 2 messages per transaction
// Hang after 2 transactions: 4 messages committed, 2 left uncommitted
Executors.newSingleThreadExecutor().execute(() -> {
Producer<String, String> producer = genTxProducer("tx-producer-2");
producer.initTransactions();
for (int i = 1000; i < 2000; i += 2) {
try {
producer.beginTransaction();
producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
log.info("[tx-producer-2]sent tx msg-" + i);
producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
log.info("[tx-producer-2]sent tx msg-" + (i + 1));
TimeUnit.SECONDS.sleep(1);
// Hang the transaction
if (i == 1004) {
// log.warn("tx-producer-2 hanging tx");
TimeUnit.SECONDS.sleep(5);
// log.warn("tx-producer-2 system exit");
// System.exit(-1);
// TimeUnit.MINUTES.sleep(60);
log.info("tx-producer-1 stop"); // label kept as in the logs above; this is tx-producer-2's thread
Thread.currentThread().stop();
}
producer.commitTransaction();
} catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
// We can't recover from these exceptions, so our only option is to close the producer and exit.
producer.close();
log.error("tx-producer-2 closed", e);
break; // the producer is closed, so leave the loop instead of calling beginTransaction() on it
} catch (KafkaException e) {
// For all other exceptions, just abort the transaction and try again.
producer.abortTransaction();
log.warn("tx-producer-2 abort tx");
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
log.info("thread tx-producer-2 exit");
});
// After 10 seconds, create a producer with the hanging thread's transactional ID (tx-producer-2) to force-end the hung transaction
// Executors.newSingleThreadExecutor().execute(() -> {
// try {
// TimeUnit.SECONDS.sleep(10);
// } catch (InterruptedException e) {
// throw new RuntimeException(e);
// }
// Producer<String, String> producer = genTxProducer("tx-producer-2");
// log.info("tx-producer-2 init again");
// producer.initTransactions();
// producer.close();
// });
}
Producer<String, String> genTxProducer(String txId) {
Properties props = new Properties();
props.put("bootstrap.servers", bootstrapServers);
props.put("linger.ms", 1);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("transactional.id", txId);
// props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, 10000); // 10000 ms = 10 s; needs import org.apache.kafka.clients.producer.ProducerConfig
return new KafkaProducer<>(props);
}
Producer<String, String> genProducer() {
Properties props = new Properties();
props.put("bootstrap.servers", bootstrapServers);
props.put("linger.ms", 1);
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
return new KafkaProducer<>(props);
}
Consumer<String, String> genRuncConsumer(String gid) {
Properties props = new Properties();
props.setProperty("bootstrap.servers", bootstrapServers);
props.setProperty("group.id", gid);
props.setProperty("enable.auto.commit", "true");
props.setProperty("auto.commit.interval.ms", "1000");
props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
return new KafkaConsumer<>(props);
}
Consumer<String, String> genRcConsumer(String gid) {
Properties props = new Properties();
props.setProperty("bootstrap.servers", bootstrapServers);
props.setProperty("group.id", gid);
props.setProperty("enable.auto.commit", "true");
props.setProperty("auto.commit.interval.ms", "1000");
props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.setProperty(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
return new KafkaConsumer<>(props);
}
}