Calling poll immediately after consumer.seek may return no records

Problem

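A while ago I was writing a Kafka tool that had to seek to different offsets frequently and then poll for records.

The usual way to poll looks like this: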

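consumer.subscribe(Collections.singletonList("topic")); // subscribe takes a Collection (or a Pattern), not a bare String
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
    // do something with records
}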

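My requirement was different: poll only once and fetch just the single record at the current offset.

That is when I ran into this problem: polling immediately after seek returned no records. Retrying a few times eventually returned the record. What causes this?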
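To have a single poll return at most one record, you can set the standard consumer setting max.poll.records to 1. The sketch below only builds the Properties object. The broker address, the group id, the String deserializers, and disabling auto-commit are placeholder assumptions for a read-only inspection tool. The original tool may not have used these settings.

```java
import java.util.Properties;

public class SingleRecordConsumerConfig {
    // Builds consumer properties for a "seek, then read one record" tool.
    // bootstrapServers and groupId are placeholders supplied by the caller (assumptions).
    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("group.id", groupId);
        // Return at most one record per poll() call
        props.put("max.poll.records", "1");
        // Assumption: an inspection tool should not move the group's committed offsets
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        Properties p = build("localhost:9092", "seek-tool");
        System.out.println("max.poll.records = " + p.getProperty("max.poll.records"));
    }
}
```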

Differences between the old and new poll methods

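Older Kafka clients have a poll method with the signature poll(long timeout). Newer ones have poll(Duration timeout). With the old poll(long), a single call is enough to get records. The difference between the two is explained in a Kafka KIP: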

KIP-266: Fix consumer indefinite blocking behavior

The key parts are quoted below:

Consumer#poll

The pre-existing variant poll(long timeout) would block indefinitely for metadata updates if they were needed, then it would issue a fetch and poll for timeout ms for new records. The initial indefinite metadata block caused applications to become stuck when the brokers became unavailable. The existence of the timeout parameter made the indefinite block especially unintuitive.

We will add a new method poll(Duration timeout) with the semantics:

  1. iff a metadata update is needed:
    1. send (asynchronous) metadata requests
    2. poll for metadata responses (counts against timeout)
      • if no response within timeout, return an empty collection immediately
  2. if there is fetch data available, return it immediately
  3. if there is no fetch request in flight, send fetch requests
  4. poll for fetch responses (counts against timeout)
    • if no response within timeout, return an empty collection (leaving async fetch request for the next poll)
    • if we get a response, return the response

We will deprecate the original method, poll(long timeout), and we will not change its semantics, so it remains:

  1. iff a metadata update is needed:
    1. send (asynchronous) metadata requests
    2. poll for metadata responses indefinitely until we get it
  2. if there is fetch data available, return it immediately
  3. if there is no fetch request in flight, send fetch requests
  4. poll for fetch responses (counts against timeout)
    • if no response within timeout, return an empty collection (leaving async fetch request for the next poll)
    • if we get a response, return the response

One notable usage is prohibited by the new poll: previously, you could call poll(0) to block for metadata updates, for example to initialize the client, supposedly without fetching records. Note, though, that this behavior is not according to any contract, and there is no guarantee that poll(0) won't return records the first time it's called. Therefore, it has always been unsafe to ignore the response.

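In short, poll(long timeout) blocks indefinitely until the metadata update completes, and that wait does not count against timeout. As a result, by the time it fetches, the consumer can get records. poll(Duration timeout) never blocks like that. It returns after at most timeout, whether or not it fetched anything. Records are missing because, at that moment, no partitions have been assigned to the consumer yet. It can only consume once the update has completed. For a consumer that uses subscribe(), this update includes joining the consumer group and receiving a partition assignment, which may take longer than a short poll timeout.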
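The behaviour described above can be reproduced without a broker. The following is a toy model, not Kafka's implementation. Partition assignment and the fetch round trip are simulated with delayed CompletableFutures, and the names PollSemanticsModel and pollLegacy are hypothetical. The poll(Duration) variant charges both waits against one deadline and leaves an unfinished fetch in flight for the next call, as KIP-266 describes. The legacy variant waits for the assignment without a bound and applies the timeout only to the fetch.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Toy model of the two poll variants; NOT Kafka code. Delays stand in for the group join and the fetch round trip.
public class PollSemanticsModel {
    private final CompletableFuture<Void> assignment;      // completes when partitions are "assigned"
    private final long fetchLatencyMs;                       // simulated fetch round trip
    private CompletableFuture<List<String>> inFlightFetch;   // survives across polls, like an async fetch request

    PollSemanticsModel(long assignmentDelayMs, long fetchLatencyMs) {
        this.assignment = delayed(assignmentDelayMs, null);
        this.fetchLatencyMs = fetchLatencyMs;
    }

    private static <T> CompletableFuture<T> delayed(long ms, T value) {
        return CompletableFuture.supplyAsync(() -> value,
                CompletableFuture.delayedExecutor(ms, TimeUnit.MILLISECONDS));
    }

    // Waits for f until the deadline; returns false if it is still pending.
    private static boolean await(CompletableFuture<?> f, long deadlineNanos)
            throws InterruptedException, ExecutionException {
        long remaining = deadlineNanos - System.nanoTime();
        if (remaining <= 0) {
            return f.isDone();
        }
        try {
            f.get(remaining, TimeUnit.NANOSECONDS);
            return true;
        } catch (TimeoutException e) {
            return false;
        }
    }

    // poll(Duration)-style: the assignment wait and the fetch wait share one deadline.
    List<String> poll(Duration timeout) throws InterruptedException, ExecutionException {
        long deadline = System.nanoTime() + timeout.toNanos();
        if (!await(assignment, deadline)) {
            return Collections.emptyList();                  // no partitions assigned yet
        }
        if (inFlightFetch == null) {
            inFlightFetch = delayed(fetchLatencyMs, List.of("record@offset"));
        }
        if (!await(inFlightFetch, deadline)) {
            return Collections.emptyList();                  // fetch left in flight for the next poll
        }
        List<String> records = inFlightFetch.get();
        inFlightFetch = null;
        return records;
    }

    // poll(long)-style: unbounded wait for assignment, then the timeout applies to the fetch only.
    List<String> pollLegacy(long timeoutMs) throws InterruptedException, ExecutionException {
        assignment.get();
        return poll(Duration.ofMillis(timeoutMs));
    }

    public static void main(String[] args) throws Exception {
        // Assignment completes after ~1 s; a fetch takes ~20 ms.
        PollSemanticsModel consumer = new PollSemanticsModel(1000, 20);
        List<String> records = consumer.poll(Duration.ofMillis(100));
        System.out.println("first poll(100 ms): " + records);  // empty: not assigned yet
        int attempts = 1;
        while (records.isEmpty()) {                           // "retry a few times and it works"
            records = consumer.poll(Duration.ofMillis(100));
            attempts++;
        }
        System.out.println("got " + records + " after " + attempts + " polls");

        PollSemanticsModel legacy = new PollSemanticsModel(1000, 20);
        System.out.println("pollLegacy(500): " + legacy.pollLegacy(500)); // blocks ~1 s, then returns the record
    }
}
```

In this model, the first bounded poll returns empty and later retries succeed once the simulated assignment completes. The legacy variant returns the record on its first call, which matches the symptom described at the start.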

I also answered this question on StackOverflow: java - Kafka Cluster sometimes returns no records during seek and poll - Stack Overflow

Solution

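So how can this be fixed?

Option 1: use poll(long). This solves the problem bluntly, but it is not recommended. poll(long) is annotated @Deprecated and will be removed sooner or later.

Option 2: use a loop to make sure the consumer has been assigned partitions: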

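// poll until the group has assigned partitions to this consumer; records returned here are ignored
while (consumer.assignment().isEmpty()) {
    consumer.poll(Duration.ofMillis(100));
}
// seek only after assignment: seeking a partition that is not assigned throws IllegalStateException
consumer.seek(partition, offset); // partition: the TopicPartition to read
records = consumer.poll(Duration.ofMillis(100));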
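The loop in option 2 has no upper bound. If the brokers are unreachable, it spins forever, which is exactly the kind of indefinite blocking KIP-266 set out to remove. Below is a minimal sketch of a bounded alternative, using a hypothetical helper named pollUntil that is not part of the Kafka API. It is written against plain Supplier/Predicate so it compiles without the Kafka client, and main uses a stand-in supplier rather than a real consumer.

```java
import java.time.Duration;
import java.util.function.Predicate;
import java.util.function.Supplier;

public final class PollUntil {
    private PollUntil() {}

    // Calls pollOnce repeatedly until done accepts the result or the overall timeout expires.
    // Returns the last result; the caller must check it, because the deadline may expire first.
    static <T> T pollUntil(Supplier<T> pollOnce, Predicate<T> done, Duration timeout) {
        long deadline = System.nanoTime() + timeout.toNanos();
        T result = pollOnce.get();
        while (!done.test(result) && System.nanoTime() - deadline < 0) { // overflow-safe comparison
            result = pollOnce.get();
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for consumer.poll(): returns an empty batch twice, then a record.
        int[] calls = {0};
        String result = pollUntil(() -> ++calls[0] < 3 ? "" : "record", s -> !s.isEmpty(), Duration.ofSeconds(1));
        System.out.println(result + " after " + calls[0] + " calls"); // prints "record after 3 calls"
    }
}
```

With a real KafkaConsumer, the same helper could wrap both waits. Before the seek, use `pollUntil(() -> consumer.poll(Duration.ofMillis(100)), r -> !consumer.assignment().isEmpty(), Duration.ofSeconds(10))`. After the seek, use `pollUntil(() -> consumer.poll(Duration.ofMillis(100)), r -> !r.isEmpty(), Duration.ofSeconds(5))`. Even with partitions assigned, a single 100 ms poll can return before the fetch response arrives.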

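Option 3: register a ConsumerRebalanceListener (passed to subscribe(topics, listener)) and put the poll(Duration) logic in its onPartitionsAssigned callback. This also guarantees that the consumer polls only after it has been assigned partitions.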
