Kafka's auto.offset.reset explained and tested


1. Values and definitions

auto.offset.reset has three possible values:

  • latest (default)
  • earliest
  • none

All three share one common rule:
for a given consumer group, if an offset has already been committed, consumption resumes from that committed offset.

In other words, once a consumer group has consumed from the topic, it no longer matters what auto.offset.reset is set to; every restart picks up from the group's latest committed offset and continues from there.


They differ as follows:

  • latest (default): for a given consumer group with no committed offset, only messages produced after the consumer connects to the topic are consumed

That is, if the topic already holds historical messages and a brand-new consumer group is started with auto.offset.reset=latest, those existing messages cannot be consumed. Keep the group running, and once new messages arrive on the topic, only those new messages are consumed. And once anything is consumed, an offset is committed (with the default enable.auto.commit=true).
If the group then goes offline unexpectedly while messages keep arriving, and later comes back online, it resumes from the offset it had when it went offline; at that point the common rule above applies.

  • earliest: for a given consumer group with no committed offset, consumption starts from the beginning of the topic

That is, if the topic holds historical messages and a brand-new consumer group is started with auto.offset.reset=earliest, it consumes from the very beginning; this is where it differs from latest.
Once the group has consumed from the topic, it has a committed offset, so even with auto.offset.reset=earliest still set, restarting the group behaves exactly like latest; again the common rule applies.

  • none: for a given consumer group with no committed offset, an exception is thrown

This value is rarely needed in production.


2. Create a brand-new topic

./kafka-topics.sh --bootstrap-server 127.0.0.1:9092 --topic TestOffsetResetTopic --partitions 1 --replication-factor 1 --create
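To confirm the topic came out as expected before producing anything, a describe call can be run against the same broker (a quick sketch, nothing beyond the standard tool):

```shell
# Should report 1 partition and replication factor 1 for the new topic
./kafka-topics.sh --bootstrap-server 127.0.0.1:9092 \
  --topic TestOffsetResetTopic --describe
```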

3. Send messages to the new topic

For ease of testing, send 5 messages with Java code:

import java.text.MessageFormat;
import java.time.LocalDateTime;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TestProducer {

    public static void main(String[] args) throws InterruptedException {
        Properties properties = new Properties();
        properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.123.124:9092");
        properties.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        properties.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

        String topic = "TestOffsetResetTopic";

        for (int i = 0; i < 5; i++) {
            String value = "message_" + i + "_" + LocalDateTime.now();
            System.out.println("Send value: " + value);
            producer.send(new ProducerRecord<>(topic, value), (metadata, exception) -> {
                if (exception == null) {
                    String str = MessageFormat.format("Send success! topic: {0}, partition: {1}, offset: {2}", metadata.topic(), metadata.partition(), metadata.offset());
                    System.out.println(str);
                }
            });
            Thread.sleep(500);
        }

        producer.close();
    }
}

Messages sent successfully:

Send value: message_0_2022-09-16T18:26:15.943749600
Send success! topic: TestOffsetResetTopic, partition: 0, offset: 0
Send value: message_1_2022-09-16T18:26:17.066608900
Send success! topic: TestOffsetResetTopic, partition: 0, offset: 1
Send value: message_2_2022-09-16T18:26:17.568667200
Send success! topic: TestOffsetResetTopic, partition: 0, offset: 2
Send value: message_3_2022-09-16T18:26:18.069093600
Send success! topic: TestOffsetResetTopic, partition: 0, offset: 3
Send value: message_4_2022-09-16T18:26:18.583288100
Send success! topic: TestOffsetResetTopic, partition: 0, offset: 4

The topic TestOffsetResetTopic now holds 5 messages, and no consumer group has consumed from it yet, i.e., there are no committed offsets.


4. Testing latest

The topic already holds 5 historical messages; now start a consumer:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TestConsumerLatest {

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.123.124:9092");
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // specify the consumer group
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, "group1");
        // set auto.offset.reset
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");

        String topic = "TestOffsetResetTopic";
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
        consumer.subscribe(Collections.singletonList(topic));

        // poll and consume records
        while (true) {
            ConsumerRecords<String, String> consumerRecords = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {
                System.out.println(consumerRecord);
            }
        }

    }
}

As described above, the 5 existing historical messages are not consumed; the consumer produces no output. Keep the consumer running.

Start TestProducer again and send 5 more messages. These newly produced messages, beginning at offset 5, are consumed:

ConsumerRecord(topic = TestOffsetResetTopic, partition = 0, leaderEpoch = 0, offset = 5, CreateTime = 1663329725731, serialized key size = -1, serialized value size = 39, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = message_0_2022-09-16T20:02:05.523581500)
ConsumerRecord(topic = TestOffsetResetTopic, partition = 0, leaderEpoch = 0, offset = 6, CreateTime = 1663329726251, serialized key size = -1, serialized value size = 39, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = message_1_2022-09-16T20:02:06.251399400)
ConsumerRecord(topic = TestOffsetResetTopic, partition = 0, leaderEpoch = 0, offset = 7, CreateTime = 1663329726764, serialized key size = -1, serialized value size = 39, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = message_2_2022-09-16T20:02:06.764186200)
ConsumerRecord(topic = TestOffsetResetTopic, partition = 0, leaderEpoch = 0, offset = 8, CreateTime = 1663329727264, serialized key size = -1, serialized value size = 39, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = message_3_2022-09-16T20:02:07.264268500)
ConsumerRecord(topic = TestOffsetResetTopic, partition = 0, leaderEpoch = 0, offset = 9, CreateTime = 1663329727778, serialized key size = -1, serialized value size = 39, headers = RecordHeaders(headers = [], isReadOnly = false), key = null, value = message_4_2022-09-16T20:02:07.778469700)

The consumer group has now consumed up to offset 9 on this topic. Stop the consumer (take it offline), run TestProducer to send 5 more messages, then start TestConsumerLatest again: it resumes right after the previous offset, i.e., from offset 10. This is the common rule in action.

If nothing shows up during testing, wait a while longer; the machine is probably just slow...
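Rather than inferring the committed offset from the consumer's output, it can be checked directly with the kafka-consumer-groups tool (a sketch; assumes the same broker address as in the Java code):

```shell
# Show group1's committed offset (CURRENT-OFFSET) and lag per partition
./kafka-consumer-groups.sh --bootstrap-server 192.168.123.124:9092 \
  --describe --group group1
```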


5. Testing earliest

Create a new test consumer with auto.offset.reset=earliest. Note that the group id is a new one, group2, which means a brand-new consumer group for this topic.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TestConsumerEarliest {

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.123.124:9092");
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // specify the consumer group
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, "group2");
        // set auto.offset.reset
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        String topic = "TestOffsetResetTopic";
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
        consumer.subscribe(Collections.singletonList(topic));

        // poll and consume records
        while (true) {
            ConsumerRecords<String, String> consumerRecords = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {
                System.out.println(consumerRecord);
            }
        }

    }
}

On the first run, all 10 existing messages (the initial 5 plus the 5 sent in the previous test) are consumed. After that, group2 has a committed offset for this topic: no matter how the consumer is stopped and restarted, as long as the group id stays the same, it always resumes from the latest committed offset.


6. Testing none

Create a new test consumer with auto.offset.reset=none. Note that the group id is a new one, group3, which means a brand-new consumer group for this topic.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class TestConsumerNone {

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.123.124:9092");
        properties.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        properties.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // specify the consumer group
        properties.put(ConsumerConfig.GROUP_ID_CONFIG, "group3");
        // set auto.offset.reset
        properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "none");

        String topic = "TestOffsetResetTopic";
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(properties);
        consumer.subscribe(Collections.singletonList(topic));

        // poll and consume records
        while (true) {
            ConsumerRecords<String, String> consumerRecords = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> consumerRecord : consumerRecords) {
                System.out.println(consumerRecord);
            }
        }

    }
}

On startup, the program errors out: this is a brand-new consumer group for the topic, and with auto.offset.reset=none specified, an exception is thrown and the program exits.

Exception in thread "main" org.apache.kafka.clients.consumer.NoOffsetForPartitionException: Undefined offset with no reset policy for partitions: [TestOffsetResetTopic-0]
    at org.apache.kafka.clients.consumer.internals.SubscriptionState.resetInitializingPositions(SubscriptionState.java:706)
    at org.apache.kafka.clients.consumer.KafkaConsumer.updateFetchPositions(KafkaConsumer.java:2434)
    at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1266)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1231)
    at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1211)
    at kakfa.TestConsumerNone.main(TestConsumerNone.java:31)

7. Summary

  • If the topic already has historical messages and those messages need to be consumed, you must use a consumer group that has never consumed before and set auto.offset.reset=earliest; only then is the historical data consumed, after which offsets are committed. Once an offset exists, earliest and latest behave identically.
  • If the topic has no historical messages, or the historical messages do not need to be processed, the default latest is sufficient.
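If the history needs to be re-consumed without inventing a new group id each time, an existing group's offsets can also be rewound with the kafka-consumer-groups tool (a sketch; the group must have no active members when this runs):

```shell
# Rewind group2 to the beginning of the topic; --execute applies the change
# (use --dry-run instead to preview the new offsets first)
./kafka-consumer-groups.sh --bootstrap-server 192.168.123.124:9092 \
  --group group2 --topic TestOffsetResetTopic \
  --reset-offsets --to-earliest --execute
```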
