Hanging Kafka transaction causes consumers to get stuck

An unclosed Kafka transaction prevents consumers from consuming messages.

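Background

I recently ran into a problem: there is a shared topic that many applications read from and write to. From a certain point in time, every consumer of that topic (isolation level read_committed) stopped receiving messages. A random sample of application logs showed no producer errors (producers could still send messages and commit transactions) and no consumer errors either. Using a read_uncommitted consumer in our operations tool to inspect the topic, I found that the LSO (last stable offset) was stuck at a fixed offset while the LEO (log end offset) kept increasing.

Cause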

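  1. A producer opened a transaction, and because of a bug in the program, neither commit nor abort was reached when it exited abnormally. As a result the transaction was never closed.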

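  2. Besides being closed actively through an explicit commit or abort, a transaction can also be closed passively by timing out.

    Broker-side upper bound on the transaction timeout: transaction.max.timeout.ms, default 15 minutes

    Producer-side transaction timeout: transaction.timeout.ms, default 1 minute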

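  3. However, we had set the transaction timeout to a very large value, 5 hours (which also means the broker's transaction.max.timeout.ms had been raised above its 15-minute default). An unclosed transaction therefore waits 5 hours before it is closed passively, and during those 5 hours none of the subsequent messages can be consumed. Note that producers with other transactional IDs can still commit transactions to the topic during this time, but the messages they commit cannot be consumed either. This is controlled by the LastStableOffset (LSO). The LSO is defined as the "smallest offset of any open transaction", that is, the lowest first offset among all transactions that are still open. Only offsets below the LSO are visible to read_committed consumers.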

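  4. Why can't the messages of later, already committed transactions be consumed? Because Kafka guarantees ordering within a partition. If a consumer read the later messages while an earlier transaction was still open, committing that earlier transaction afterwards would be pointless: the consumer cannot rewind its offset to read it, and if it did rewind, the order would be broken. The earlier transaction therefore has to end, whether by an active commit/abort or by a passive abort on timeout, before any later messages can be consumed.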
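
To make the LSO rule above concrete, here is a minimal, self-contained model of a single partition. It is not Kafka code: the class LsoModel and its methods are hypothetical names chosen for illustration, and the model ignores replication and the high watermark. It only tracks the first offset of every open transaction and reports the smallest one as the LSO.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one partition log (hypothetical, for illustration only). */
final class LsoModel {
    private long logEndOffset = 0;
    // First offset of each transaction that is still open, keyed by transactional id.
    private final Map<String, Long> openTxFirstOffset = new HashMap<>();

    /** Appends one record; txId == null means a non-transactional record. */
    long append(String txId) {
        long offset = logEndOffset++;
        if (txId != null) {
            openTxFirstOffset.putIfAbsent(txId, offset);
        }
        return offset;
    }

    /** Appends a commit/abort marker (it occupies an offset) and closes the transaction. */
    long end(String txId) {
        long offset = logEndOffset++;
        openTxFirstOffset.remove(txId);
        return offset;
    }

    long logEndOffset() {
        return logEndOffset;
    }

    /** LSO: smallest first offset of any open transaction, or the LEO if none is open. */
    long lastStableOffset() {
        return openTxFirstOffset.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(logEndOffset);
    }
}
```

Replaying the incident in this model: if tx-producer-2 writes offsets 0 and 1 and then hangs, tx-producer-1 writes offsets 2 and 3 and commits (marker at offset 4), and a non-transactional record lands at offset 5, the model reports an LSO of 0 and an LEO of 6, so a read_committed consumer sees nothing. Once tx-producer-2's transaction ends (marker at offset 6), the LSO jumps to 7.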

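Reproduction and Extended Tests

Create two transactional producers, tx-producer-1 and tx-producer-2; one non-transactional producer, nontx-producer; one consumer with isolation level (isolation.level) read_committed, rc-consumer; and one consumer with isolation level read_uncommitted, runc-consumer.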

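  • nontx-producer sends one message per second
  • tx-producer-1 commits one transaction per second, two messages per transaction
  • tx-producer-2 commits one transaction per second, two messages per transaction; in its third transaction, however, it sends the two messages and then, instead of committing, sleeps for a long time
  • rc-consumer and runc-consumer consume continuously and print every message they receive

Each producer and consumer runs in its own thread, independently of the others. The Kafka version used is kafka_2.13-3.3.1.

Kafka runs with its default configuration.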

Does producer.close() close the transaction immediately? Yes

Conclusion: yes.

Code:

for (int i = 1000; i < 2000; i += 2) {
    try {
        producer.beginTransaction();

        producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
        log.info("[tx-producer-2]sent tx msg-" + i);
        producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
        log.info("[tx-producer-2]sent tx msg-" + (i + 1));

        // Hang the transaction
        if (i == 1004) {
            log.warn("tx-producer-2 hanging tx");
            TimeUnit.SECONDS.sleep(5);
            log.warn("tx-producer-2 close");
            producer.close();
            return;
        }

        producer.commitTransaction();
    } catch (Exception e) {
        producer.abortTransaction();
        log.warn("tx-producer-2 abort tx");
    }
}

Procedure: simply run the application in IDEA.

Application log:

2022-11-17 09:56:39.757  WARN 2288 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : tx-producer-2 close
2022-11-17 09:56:39.757  INFO 2288 --- [pool-5-thread-1] o.a.k.clients.producer.KafkaProducer     : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
2022-11-17 09:56:39.757 DEBUG 2288 --- [r-tx-producer-2] o.a.k.clients.producer.internals.Sender  : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Beginning shutdown of Kafka producer I/O thread, sending remaining records.
2022-11-17 09:56:39.757  INFO 2288 --- [r-tx-producer-2] o.a.k.clients.producer.internals.Sender  : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Aborting incomplete transaction due to shutdown
2022-11-17 09:56:39.758 DEBUG 2288 --- [r-tx-producer-2] o.a.k.c.p.internals.TransactionManager   : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Transition from state IN_TRANSACTION to ABORTING_TRANSACTION
2022-11-17 09:56:39.758 DEBUG 2288 --- [r-tx-producer-2] o.a.k.c.p.internals.TransactionManager   : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Enqueuing transactional request EndTxnRequestData(transactionalId='tx-producer-2', producerId=2, producerEpoch=0, committed=false)
2022-11-17 09:56:39.758 DEBUG 2288 --- [r-tx-producer-2] o.a.k.clients.producer.internals.Sender  : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Sending transactional request EndTxnRequestData(transactionalId='tx-producer-2', producerId=2, producerEpoch=0, committed=false) to node ship:9092 (id: 0 rack: null) with correlation ID 40
2022-11-17 09:56:39.758 DEBUG 2288 --- [r-tx-producer-2] org.apache.kafka.clients.NetworkClient   : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Sending END_TXN request with header RequestHeader(apiKey=END_TXN, apiVersion=3, clientId=producer-tx-producer-2, correlationId=40) and timeout 30000 to node 0: EndTxnRequestData(transactionalId='tx-producer-2', producerId=2, producerEpoch=0, committed=false)
2022-11-17 09:56:39.761 DEBUG 2288 --- [r-tx-producer-2] org.apache.kafka.clients.NetworkClient   : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Received END_TXN response from node 0 for request with header RequestHeader(apiKey=END_TXN, apiVersion=3, clientId=producer-tx-producer-2, correlationId=40): EndTxnResponseData(throttleTimeMs=0, errorCode=0)
2022-11-17 09:56:39.761 DEBUG 2288 --- [r-tx-producer-2] o.a.k.c.p.internals.TransactionManager   : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Transition from state ABORTING_TRANSACTION to READY
2022-11-17 09:56:39.762 DEBUG 2288 --- [r-tx-producer-2] o.a.k.clients.producer.internals.Sender  : [Producer clientId=producer-tx-producer-2, transactionalId=tx-producer-2] Shutdown of Kafka producer I/O thread has completed.
2022-11-17 09:56:39.762  INFO 2288 --- [pool-5-thread-1] org.apache.kafka.common.metrics.Metrics  : Metrics scheduler closed
2022-11-17 09:56:39.762  INFO 2288 --- [pool-5-thread-1] org.apache.kafka.common.metrics.Metrics  : Closing reporter org.apache.kafka.common.metrics.JmxReporter
2022-11-17 09:56:39.762  INFO 2288 --- [pool-5-thread-1] org.apache.kafka.common.metrics.Metrics  : Metrics reporters closed
2022-11-17 09:56:39.763  INFO 2288 --- [pool-5-thread-1] o.a.kafka.common.utils.AppInfoParser     : App info kafka.producer for producer-tx-producer-2 unregistered

The line "Aborting incomplete transaction due to shutdown" shows that producer.close() aborts the unfinished transaction.

kafka server.log:

Only the transaction initialization entries appear; there is no entry for the end of the transaction.

[2022-11-17 09:15:03,988] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 2 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 09:15:03,995] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-1 with producerId 1 and producer epoch 0 on partition __transaction_state-1 (kafka.coordinator.transaction.TransactionCoordinator)

Does the transaction close immediately when the application process exits? No

Conclusion: no.

Code:

for (int i = 1000; i < 2000; i += 2) {
    try {
        producer.beginTransaction();

        producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
        log.info("[tx-producer-2]sent tx msg-" + i);
        producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
        log.info("[tx-producer-2]sent tx msg-" + (i + 1));

        // Hang the transaction
        if (i == 1004) {
            log.warn("tx-producer-2 hanging tx");
            TimeUnit.MINUTES.sleep(60); // sleep for a long time
        }

        producer.commitTransaction();
    } catch (Exception e) {
        producer.abortTransaction();
        log.warn("tx-producer-2 abort tx");
    }
}

Procedure: run the application in IDEA and click Stop while it is sleeping.

Application log:

Started MateApp in 1.702 seconds (JVM running for 2.123)
2022-11-16 21:05:58.582  INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : runc pool...
2022-11-16 21:05:58.582  INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : rc pool...
2022-11-16 21:05:58.609  INFO 14572 --- [pool-3-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : [nontx-producer]sent msg-0
2022-11-16 21:05:59.104  INFO 14572 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : [tx-producer-2]sent tx msg-1000
2022-11-16 21:06:00.192  INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : consumer[rc] received: msg-1000
2022-11-16 21:06:00.192  INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : consumer[rc] received: msg-1001
...
...
2022-11-16 21:06:02.160  WARN 14572 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : tx-producer-2 hanging tx
2022-11-16 21:06:02.254  INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : runc pool...
2022-11-16 21:06:02.256  INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : consumer[rc] received: msg-2004
2022-11-16 21:06:02.256  INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : consumer[rc] received: msg-2005
2022-11-16 21:06:02.256  INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : rc pool...
2022-11-16 21:06:02.290  INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : consumer[read_uncommitted] received: msg-2006
2022-11-16 21:06:02.290  INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : consumer[read_uncommitted] received: msg-2007
...
...
2022-11-16 21:06:08.265  INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : consumer[read_uncommitted] received: msg-2018
2022-11-16 21:06:08.265  INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : consumer[read_uncommitted] received: msg-2019
2022-11-16 21:06:08.265  INFO 14572 --- [pool-2-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : runc pool...
2022-11-16 21:06:08.292  INFO 14572 --- [pool-1-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : rc pool...

Process finished with exit code 130

After the transaction hung, consumer[rc] never received another message; only consumer[read_uncommitted] kept receiving.

kafka server.log:

[2022-11-16 21:06:00,359] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 2 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)
...
[2022-11-16 21:07:03,861] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-2 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)

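As the log shows, tx-producer-2's transaction was rolled back automatically about one minute after initialization, which matches the default transaction.timeout.ms of 1 minute. The application stopped at 21:06:13 and the transaction was rolled back at 21:07:03, so the transaction was not rolled back at the moment the application stopped.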

Does System.exit close the transaction immediately? No

Conclusion: no.

Code:

for (int i = 1000; i < 2000; i += 2) {
    try {
        producer.beginTransaction();

        producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
        log.info("[tx-producer-2]sent tx msg-" + i);
        producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
        log.info("[tx-producer-2]sent tx msg-" + (i + 1));

        // Hang the transaction
        if (i == 1004) {
            log.warn("tx-producer-2 system exit");
            System.exit(-1);
        }

        producer.commitTransaction();
    } catch (Exception e) {
        producer.abortTransaction();
        log.warn("tx-producer-2 abort tx");
    }
}

Application log:

...
2022-11-17 10:18:49.195  WARN 10736 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : tx-producer-2 system exit
...
Process finished with exit code -1

kafka server.log:

[2022-11-17 10:18:46,106] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-1 with producerId 1 and producer epoch 0 on partition __transaction_state-1 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:18:46,176] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 2 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)

[2022-11-17 10:19:52,492] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-1 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:19:52,496] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-2 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)

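At first glance the rollback seems to follow System.exit(-1) within seconds, but the timestamps show otherwise: the exit is logged at 10:18:49 and the rollback at 10:19:52, about one minute apart. A second run to confirm: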

# Application log
2022-11-17 10:27:00.366  WARN 20524 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : tx-producer-2 system exit

# Kafka server log
[2022-11-17 10:26:53,406] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-1 with producerId 2 and producer epoch 0 on partition __transaction_state-1 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:26:53,406] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 1 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)

[2022-11-17 10:28:02,428] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-1 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:28:02,432] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-2 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)

This time the gap is about one minute (exit at 10:27:00, rollback at 10:28:02).
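
Since producer.close() aborts an open transaction while a bare exit does not, one option is to register a JVM shutdown hook that closes the producer. The sketch below is an assumption, not something measured in these experiments: CloseOnShutdown is a hypothetical helper name, and it accepts any AutoCloseable so it stays free of Kafka dependencies (a Kafka Producer is Closeable and could be passed in directly). Shutdown hooks run on System.exit and on SIGINT/SIGTERM, but not on SIGKILL or Runtime.halt, and they do not help with a stopped thread inside a JVM that keeps running.

```java
/** Hypothetical helper: closes a resource when the JVM shuts down in an orderly way. */
final class CloseOnShutdown {
    private CloseOnShutdown() {
    }

    /** Registers a shutdown hook that closes the resource and returns the hook thread. */
    static Thread register(AutoCloseable resource) {
        Thread hook = new Thread(() -> {
            try {
                resource.close();
            } catch (Exception e) {
                // Nothing more can be done during shutdown; report and move on.
                System.err.println("close failed during shutdown: " + e);
            }
        }, "close-on-shutdown");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```

In the test code this would be called once, right after producer.initTransactions(), as CloseOnShutdown.register(producer). Whether the abort reaches the broker before the process dies still depends on how the process is terminated, so the transaction timeout remains the last line of defence.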

Does Thread.stop close the transaction immediately? No

Conclusion: no.

Code:

for (int i = 1000; i < 2000; i += 2) {
    try {
        producer.beginTransaction();

        producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
        log.info("[tx-producer-2]sent tx msg-" + i);
        producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
        log.info("[tx-producer-2]sent tx msg-" + (i + 1));

        // Hang the transaction
        if (i == 1004) {
            log.info("tx-producer-1 stop");
            Thread.currentThread().stop();
        }

        producer.commitTransaction();
    } catch (Exception e) {
        producer.abortTransaction();
        log.warn("tx-producer-2 abort tx");
    }
}

Application log:

2022-11-17 10:45:22.239  INFO 7504 --- [pool-5-thread-1] cn.whu.wy.kafkamate.service.KafkaTxTest  : tx-producer-1 stop

kafka server.log:

[2022-11-17 10:45:15,340] INFO [TransactionCoordinator id=0] Initialized transactionalId tx-producer-2 with producerId 2 and producer epoch 0 on partition __transaction_state-2 (kafka.coordinator.transaction.TransactionCoordinator)
[2022-11-17 10:46:22,035] INFO [TransactionCoordinator id=0] Completed rollback of ongoing transaction for transactionalId tx-producer-2 due to timeout (kafka.coordinator.transaction.TransactionCoordinator)

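One minute after the producer thread was stopped, the broker rolled the transaction back.

Summary

  1. The three parameters transaction.timeout.ms (producer), transaction.max.timeout.ms (broker), and request.timeout.ms (broker) are especially important and must not be set too large.
  2. Exception handling has to be thorough, so that commit, abort, and close are guaranteed to be reached.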
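
One way to make sure abort is always reached is to funnel every transaction through a single helper. The following is a minimal sketch under stated assumptions, not the Kafka client's own API: TxOps is a hypothetical interface standing in for the four producer methods used in this article, and TxRunner is a hypothetical helper. It catches Throwable rather than Exception, so an Error raised inside the transaction body also triggers the abort.

```java
/** Hypothetical stand-in for the producer methods used in this article. */
interface TxOps extends AutoCloseable {
    void beginTransaction();

    void commitTransaction();

    void abortTransaction();

    @Override
    void close();
}

final class TxRunner {
    private TxRunner() {
    }

    /** Runs body inside one transaction; on any failure it aborts and rethrows. */
    static void inTransaction(TxOps producer, Runnable body) {
        producer.beginTransaction();
        try {
            body.run();
            producer.commitTransaction();
        } catch (Throwable t) {
            try {
                producer.abortTransaction();
            } catch (RuntimeException abortFailure) {
                // Abort itself failed (for example the producer is unusable):
                // closing the producer is the remaining way to release it.
                t.addSuppressed(abortFailure);
                producer.close();
            }
            throw t;
        }
    }
}
```

With the real client, a thin adapter from KafkaProducer to TxOps would be needed, and the sends would go inside body. The sketch does not cover a process that is killed outright; in that case only the transaction timeout ends the transaction.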

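How to Detect a Hanging Transaction

I spent a long time troubleshooting this consumption stall. Once the cause was clear, the natural next question was whether there is any way to detect unclosed transactions. KIP-664 deals with exactly this problem.

Where that tooling is not available, the feature can be approximated in other ways. Create a read_uncommitted consumer and a read_committed consumer on the same topic partition and compare their end offsets. If the read_uncommitted end offset keeps increasing while the read_committed end offset has not moved for a long time, the transaction whose first message sits at the read_committed end offset has not been closed. For a read_committed consumer, the end offset is the LSO.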
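
The comparison described above can be sketched as a small, dependency-free detector. StuckLsoDetector is a hypothetical class written for this article; it assumes the caller samples the two end offsets periodically, for example with KafkaConsumer.endOffsets on a read_committed consumer (which returns the LSO) and on a read_uncommitted consumer. The threshold is an assumption too and should be longer than the longest legitimate transaction.

```java
/** Hypothetical detector: flags an LSO that stays put while newer records exist. */
final class StuckLsoDetector {
    private final long thresholdMillis;
    private long lastLso = Long.MIN_VALUE;
    private long lastLsoChangeMillis;

    StuckLsoDetector(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    /**
     * Feeds one sample for a single partition.
     *
     * @param nowMillis      time of the sample
     * @param committedEnd   end offset seen by a read_committed consumer (the LSO)
     * @param uncommittedEnd end offset seen by a read_uncommitted consumer
     * @return true if records exist beyond an LSO that has not moved for thresholdMillis
     */
    boolean observe(long nowMillis, long committedEnd, long uncommittedEnd) {
        if (committedEnd != lastLso) {
            // The LSO moved (or this is the first sample): restart the clock.
            lastLso = committedEnd;
            lastLsoChangeMillis = nowMillis;
            return false;
        }
        boolean recordsBeyondLso = uncommittedEnd > committedEnd;
        boolean stuckLongEnough = nowMillis - lastLsoChangeMillis >= thresholdMillis;
        return recordsBeyondLso && stuckLongEnough;
    }
}
```

One detector instance is needed per partition. When observe returns true, the offset passed as committedEnd is where the first open transaction starts, which is the message to inspect.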

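How to Close a Hanging Transaction

Generally, once an application has exited abnormally, its uncommitted transaction has only one possible outcome: rollback, initiated either by the producer or, after the timeout expires, automatically by the broker.

Sometimes the application has crashed and cannot be brought back up any time soon. The transaction it opened cannot be closed actively and can only wait for the timeout. If the timeout is long, other applications are badly affected. In that case, use the method from the previous section to find the hanging transaction. The message at offset LSO is the first message of the first unclosed transaction. Its content tells you which application sent it, and from there you can work out that application's transactional ID. Then create a producer with that ID and call producer.initTransactions(); the "old" transaction is then closed (aborted).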

Reference

  1. KIP-664: Provide tooling to detect and abort hanging transactions - Apache Kafka - Apache Software Foundation
  2. [KAFKA-12671] Out of order processing with a transactional producer can lead to a stuck LastStableOffset - ASF JIRA (apache.org)
  3. [KAFKA-5880] Transactional producer and read committed consumer causes consumer to stuck - ASF JIRA (apache.org)
  4. Is it possible to force abort a Kafka transaction? - Stack Overflow

Appendix: Test Code

package cn.whu.wy.kafkamate.service;

import cn.whu.wy.kafkamate.bean.TopicInfo;
import lombok.extern.slf4j.Slf4j;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.errors.AuthorizationException;
import org.apache.kafka.common.errors.OutOfOrderSequenceException;
import org.apache.kafka.common.errors.ProducerFencedException;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/**
 * Kafka transaction test: examines how an unclosed transaction affects consumers.
 *
 * @author WangYong
 * Date 2022/11/15
 * Time 22:05
 */
@Service
@Slf4j
public class KafkaTxTest implements InitializingBean {

    private final String bootstrapServers = "192.168.136.128:9092";

    @Autowired
    private TopicManager topicManager;


    @Override
    public void afterPropertiesSet() throws Exception {
        final String testTopic = "test-tx";

        TopicInfo topicInfo = new TopicInfo(testTopic, 1, (short) 1);
        topicManager.createTopics(Set.of(topicInfo));

        Executors.newSingleThreadExecutor().execute(() -> {
            try (Consumer<String, String> consumer = genRcConsumer("rc-consumer")) {
                consumer.subscribe(Collections.singleton(testTopic));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                    records.forEach(r -> log.info("consumer[rc] received: {}", r.value()));
                    log.info("rc pool...");
                }
            }
        });

        Executors.newSingleThreadExecutor().execute(() -> {
            try (Consumer<String, String> consumer = genRuncConsumer("runc-consumer")) {
                consumer.subscribe(Collections.singleton("test-tx"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                    records.forEach(r -> log.info("consumer[read_uncommitted] received: {}", r.value()));
                    log.info("runc pool...");
                }
            }
        });


        // Non-transactional producer: one message per second, 10 messages in total
        Executors.newSingleThreadExecutor().execute(() -> {
            try (Producer<String, String> producer = genProducer()) {
                for (int i = 0; i < 10; i++) {
                    producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
                    TimeUnit.SECONDS.sleep(1);
                    log.info("[nontx-producer]sent msg-" + i);
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });

        // One transaction per second, 2 messages per transaction: 10 transactions, 20 messages
        Executors.newSingleThreadExecutor().execute(() -> {
            Producer<String, String> producer = genTxProducer("tx-producer-1");
            producer.initTransactions();

            for (int i = 2000; i < 2020; i += 2) {
                try {
                    producer.beginTransaction();

                    producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
                    log.info("[tx-producer-1]sent tx msg-" + i);
                    producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
                    log.info("[tx-producer-1]sent tx msg-" + (i + 1));

                    TimeUnit.SECONDS.sleep(1);
                    producer.commitTransaction();
                } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
                    // We can't recover from these exceptions, so our only option is to close the producer and exit.
                    producer.close();
                    log.error("tx-producer-1 closed", e);
                    return; // the producer is closed, so stop the loop instead of reusing it
                } catch (KafkaException e) {
                    // For all other exceptions, just abort the transaction and try again.
                    producer.abortTransaction();
                    log.warn("tx-producer-1 abort tx");
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }
        });

        // One transaction per second, 2 messages per transaction.
        // Hangs after two committed transactions: 4 messages committed, 2 left uncommitted.
        Executors.newSingleThreadExecutor().execute(() -> {
            Producer<String, String> producer = genTxProducer("tx-producer-2");
            producer.initTransactions();

            for (int i = 1000; i < 2000; i += 2) {
                try {
                    producer.beginTransaction();
                    producer.send(new ProducerRecord<>(testTopic, "msg-" + i));
                    log.info("[tx-producer-2]sent tx msg-" + i);

                    producer.send(new ProducerRecord<>(testTopic, "msg-" + (i + 1)));
                    log.info("[tx-producer-2]sent tx msg-" + (i + 1));

                    TimeUnit.SECONDS.sleep(1);


                    // Hang the transaction
                    if (i == 1004) {
//                        log.warn("tx-producer-2 hanging tx");
                        TimeUnit.SECONDS.sleep(5);
//                        log.warn("tx-producer-2 system exit");
//                        System.exit(-1);
//                        TimeUnit.MINUTES.sleep(60);
                        log.info("tx-producer-1 stop");
                        Thread.currentThread().stop();
                    }

                    producer.commitTransaction();
                } catch (ProducerFencedException | OutOfOrderSequenceException | AuthorizationException e) {
                    // We can't recover from these exceptions, so our only option is to close the producer and exit.
                    producer.close();
                    log.error("tx-producer-2 closed", e);
                    return; // the producer is closed, so stop the loop instead of reusing it
                } catch (KafkaException e) {
                    // For all other exceptions, just abort the transaction and try again.
                    producer.abortTransaction();
                    log.warn("tx-producer-2 abort tx");
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }
            log.info("thread tx-producer-2 exit");
        });


        // After 10 seconds, create a producer with the transactional id of the hung thread
        // (tx-producer-2) to force the previously hung transaction to end
//        Executors.newSingleThreadExecutor().execute(() -> {
//            try {
//                TimeUnit.SECONDS.sleep(10);
//            } catch (InterruptedException e) {
//                throw new RuntimeException(e);
//            }
//            Producer<String, String> producer = genTxProducer("tx-producer-2");
//            log.info("tx-producer-2 init again");
//            producer.initTransactions();
//        });

    }


    Producer<String, String> genTxProducer(String txId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("linger.ms", 1);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("transactional.id", txId);
//        props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, 10000); // 10000 ms = 10 s
        return new KafkaProducer<>(props);
    }

    Producer<String, String> genProducer() {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("linger.ms", 1);
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return new KafkaProducer<>(props);
    }

    Consumer<String, String> genRuncConsumer(String gid) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", gid);
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "1000");
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return new KafkaConsumer<>(props);
    }

    Consumer<String, String> genRcConsumer(String gid) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", gid);
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "1000");
        props.setProperty("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.setProperty(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        return new KafkaConsumer<>(props);
    }

}
