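Hello everyone, this is a blog about continuing to learn in pursuit of a dream. This series records what I have learned about Kafka and my hands-on experience with it, and I hope it helps. As before, the table of contents is organized as questions and answers, somewhat like a mock interview.
Preface
In this post we walk through the overall architecture of the Kafka consumer client. By overall architecture I mean the design of the consumer's core paths: initialization, message consumption, offset commit, and heartbeat.
Tips: the source code below is based on kafka-1.1.0.
Initialization
As the name suggests, let's start with the constructor of the KafkaConsumer class.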
private KafkaConsumer(ConsumerConfig config,
Deserializer<K> keyDeserializer,
Deserializer<V> valueDeserializer) {
try {
// Basic configuration items
// ...
List<ConsumerInterceptor<K, V>> interceptorList = (List) (new ConsumerConfig(userProvidedConfigs, false)).getConfiguredInstances(ConsumerConfig.INTERCEPTOR_CLASSES_CONFIG,
ConsumerInterceptor.class);
// Initialize the interceptor and deserializer modules
this.interceptors = new ConsumerInterceptors<>(interceptorList);
if (keyDeserializer == null) {
this.keyDeserializer = config.getConfiguredInstance(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
Deserializer.class);
this.keyDeserializer.configure(config.originals(), true);
} else {
config.ignore(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG);
this.keyDeserializer = keyDeserializer;
}
if (valueDeserializer == null) {
this.valueDeserializer = config.getConfiguredInstance(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
Deserializer.class);
this.valueDeserializer.configure(config.originals(), false);
} else {
config.ignore(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG);
this.valueDeserializer = valueDeserializer;
}
ClusterResourceListeners clusterResourceListeners = configureClusterResourceListeners(keyDeserializer, valueDeserializer, reporters, interceptorList);
this.metadata = new Metadata(retryBackoffMs, config.getLong(ConsumerConfig.METADATA_MAX_AGE_CONFIG),
true, false, clusterResourceListeners);
List<InetSocketAddress> addresses = ClientUtils.parseAndValidateAddresses(config.getList(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG));
this.metadata.update(Cluster.bootstrap(addresses), Collections.<String>emptySet(), 0);
String metricGrpPrefix = "consumer";
ConsumerMetrics metricsRegistry = new ConsumerMetrics(metricsTags.keySet(), "consumer");
ChannelBuilder channelBuilder = ClientUtils.createChannelBuilder(config);
IsolationLevel isolationLevel = IsolationLevel.valueOf(
config.getString(ConsumerConfig.ISOLATION_LEVEL_CONFIG).toUpperCase(Locale.ROOT));
Sensor throttleTimeSensor = Fetcher.throttleTimeSensor(metrics, metricsRegistry.fetcherMetrics);
int heartbeatIntervalMs = config.getInt(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG);
// Network module
NetworkClient netClient = new NetworkClient(
new Selector(config.getLong(ConsumerConfig.CONNECTIONS_MAX_IDLE_MS_CONFIG), metrics, time, metricGrpPrefix, channelBuilder, logContext),
this.metadata,
clientId,
100, // a fixed large enough value will suffice for max in-flight requests
config.getLong(ConsumerConfig.RECONNECT_BACKOFF_MS_CONFIG),
config.getLong(ConsumerConfig.RECONNECT_BACKOFF_MAX_MS_CONFIG),
config.getInt(ConsumerConfig.SEND_BUFFER_CONFIG),
config.getInt(ConsumerConfig.RECEIVE_BUFFER_CONFIG),
config.getInt(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG),
time,
true,
new ApiVersions(),
throttleTimeSensor,
logContext);
// Wrap the shared network module in ConsumerNetworkClient
this.client = new ConsumerNetworkClient(
logContext,
netClient,
metadata,
time,
retryBackoffMs,
config.getInt(ConsumerConfig.REQUEST_TIMEOUT_MS_CONFIG),
heartbeatIntervalMs); //Will avoid blocking an extended period of time to prevent heartbeat thread starvation
OffsetResetStrategy offsetResetStrategy = OffsetResetStrategy.valueOf(config.getString(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG).toUpperCase(Locale.ROOT));
this.subscriptions = new SubscriptionState(offsetResetStrategy);
this.assignors = config.getConfiguredInstances(
ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
PartitionAssignor.class);
// Coordinator component
this.coordinator = new ConsumerCoordinator(logContext,
this.client,
groupId,
config.getInt(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG), // defaults to 5 minutes
config.getInt(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG), // defaults to 10 s
heartbeatIntervalMs,
assignors,
this.metadata,
this.subscriptions,
metrics,
metricGrpPrefix,
this.time,
retryBackoffMs,
config.getBoolean(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG),
config.getInt(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG),
this.interceptors,
config.getBoolean(ConsumerConfig.EXCLUDE_INTERNAL_TOPICS_CONFIG),
config.getBoolean(ConsumerConfig.LEAVE_GROUP_ON_CLOSE_CONFIG));
// Message fetching component
this.fetcher = new Fetcher<>(
logContext,
this.client,
config.getInt(ConsumerConfig.FETCH_MIN_BYTES_CONFIG),
config.getInt(ConsumerConfig.FETCH_MAX_BYTES_CONFIG),
config.getInt(ConsumerConfig.FETCH_MAX_WAIT_MS_CONFIG),
config.getInt(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG),
config.getInt(ConsumerConfig.MAX_POLL_RECORDS_CONFIG),
config.getBoolean(ConsumerConfig.CHECK_CRCS_CONFIG),
this.keyDeserializer,
this.valueDeserializer,
this.metadata,
this.subscriptions,
metrics,
metricsRegistry.fetcherMetrics,
this.time,
this.retryBackoffMs,
this.requestTimeoutMs,
isolationLevel);
config.logUnused();
AppInfoParser.registerAppInfo(JMX_PREFIX, clientId, metrics);
log.debug("Kafka consumer initialized");
} catch (Throwable t) {
// call close methods if internal objects are already constructed
// this is to prevent resource leak. see KAFKA-2121
close(0, true);
// now propagate the exception
throw new KafkaException("Failed to construct kafka consumer", t);
}
}
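The code above is fairly long, but it boils down to five modules:
- Basic parameter setup, for example clientId, groupId, and so on
- Initialization of the interceptor and deserializer modules (interceptors/deserializer)
- Initialization of the network module (NetworkClient)
- Initialization of the coordinator module (ConsumerCoordinator)
- Initialization of the message fetching component (Fetcher)
For this section it is enough to have a rough idea of what happens during initialization, which components are involved, and what each one is responsible for.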
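To make the "basic parameters" step more concrete, here is a minimal sketch of the settings such a constructor reads, written with plain java.util.Properties so that it runs without the Kafka client on the classpath. The broker address, group name, and client id are placeholder values assumed for illustration; the numeric values are the defaults as I understand them for this version.

```java
import java.util.Properties;

final class ConsumerConfigSketch {
    // Builds the plain key/value settings that a KafkaConsumer constructor would read.
    // Address, group, and client id below are placeholders (assumptions), not real endpoints.
    static Properties baseConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("group.id", "demo-group");                // placeholder consumer group
        props.put("client.id", "demo-client");              // placeholder client id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "true");            // auto-commit on
        props.put("auto.commit.interval.ms", "5000");       // commit every 5 s
        props.put("max.poll.interval.ms", "300000");        // 5 minutes between polls
        props.put("session.timeout.ms", "10000");           // 10 s session timeout
        props.put("heartbeat.interval.ms", "3000");         // 3 s between heartbeats
        return props;
    }
}
```

With the Kafka client available, a Properties object like this would typically be passed to new KafkaConsumer<>(props).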
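Message Consumption
Subscription
After the KafkaConsumer is initialized and before consuming any messages, we need to subscribe to the target topics. There are three ways to do so:
consumer.subscribe(Collections.singletonList(this.topic)); // most common: just pass a list of topics
consumer.assign(partitions); // pin the consumer to a specific set of partitions; consumer-group semantics no longer apply
consumer.subscribe(pattern, callback); // match topics by regular expression, with a callback invoked on rebalance
Of these three, the first is without doubt the most common: as the name suggests, you simply pass a list of topics.
The second gives up consumer-group semantics, so it is not bound by the group's partition assignment rules and the number of consumers can exceed the number of partitions. The trade-off is that you lose the failover that rebalancing provides. In addition, when consuming through assign, checking the group's progress with kafka-consumer-groups.sh can be somewhat problematic.
The third is rarely used and mostly applies to special scenarios. It lets you match the topic list flexibly; beyond that there is nothing special about it.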
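As a rough illustration of the third mode, the sketch below shows how a regular expression could be matched against the topic names known from cluster metadata. It is a standalone simulation with made-up topic names, under the assumption that a topic is selected when the pattern matches its whole name; it is not the client's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class PatternSubscriptionSketch {
    // Returns the topics whose full name matches the pattern, in input order.
    static List<String> matchTopics(Pattern pattern, List<String> clusterTopics) {
        List<String> matched = new ArrayList<>();
        for (String topic : clusterTopics) {
            if (pattern.matcher(topic).matches()) {   // whole-name match, not a substring search
                matched.add(topic);
            }
        }
        return matched;
    }
}
```

Because the match is recomputed whenever metadata changes, newly created topics that fit the pattern can be picked up without touching the consumer code.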
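Consuming
Many people have questions about how Kafka consumption works; the two most common are:
- Do you really write a while(true) loop to consume? Yes, in most cases that is fine; for something a bit more graceful, replace true with a control flag.
- How should the timeout in poll(timeout) be set? First we need to understand what timeout means. Put simply, it is the maximum time this poll call blocks when none of the assigned partitions has unconsumed messages. While the client waits, the outstanding FETCH request is parked on the broker (the broker-side wait itself is bounded by fetch.max.wait.ms). If a producer writes to one of those partitions in the meantime, the data is returned right away; otherwise poll returns an empty collection once the timeout elapses and the loop moves on to the next poll.
Once the meaning is clear, it is also clear that this parameter has little effect on our consumption logic; it only affects how long an idle poll waits, so 100 ms to 1000 ms is usually fine. One caveat: with many downstream consumers, say hundreds or thousands of instances, all configured with a very short timeout of around 10 ms, the broker's timing-wheel threads get busy, and you see a puzzling symptom: no producer is writing data, yet broker CPU sits at 20% to 40%. If you investigate with the usual busy-CPU techniques, you eventually trace it to the threads that process the delayedConsume delayed tasks.
The point of this real-world case is that the parameter matters little to the client but does matter somewhat to the broker, so in most cases 500 ms or 1000 ms is fine.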
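The "control flag instead of while(true)" idea can be sketched as follows. RecordSource is a hypothetical stand-in for the consumer (it is not a Kafka type), so the example runs on its own; in real code the loop body would call consumer.poll with a timeout in the range discussed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

final class PollLoopSketch {
    // Hypothetical stand-in for the consumer: returns a possibly empty batch within timeoutMs.
    interface RecordSource {
        List<String> poll(long timeoutMs);
    }

    private final AtomicBoolean running = new AtomicBoolean(true);
    private final List<String> processed = new ArrayList<>();

    // Replaces while(true): the loop exits once shutdown() flips the flag.
    void run(RecordSource source, long pollTimeoutMs) {
        while (running.get()) {
            for (String record : source.poll(pollTimeoutMs)) {
                processed.add(record);   // "process" the record
            }
        }
    }

    // Safe to call from another thread, for example a JVM shutdown hook.
    void shutdown() {
        running.set(false);
    }

    List<String> processed() {
        return processed;
    }
}
```

The flag is only checked between polls, so shutdown latency is bounded by the poll timeout plus the time spent processing one batch.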
With those two key questions covered, let's look at the corresponding source code:
@Override
public ConsumerRecords<K, V> poll(long timeout) {
acquireAndEnsureOpen();
try {
if (timeout < 0)
throw new IllegalArgumentException("Timeout must not be negative");
if (this.subscriptions.hasNoSubscriptionOrUserAssignment())
throw new IllegalStateException("Consumer is not subscribed to any topics or assigned any partitions");
// poll for new data until the timeout expires
long start = time.milliseconds();
long remaining = timeout;
do {
Map<TopicPartition, List<ConsumerRecord<K, V>>> records = pollOnce(remaining);
if (!records.isEmpty()) {
// before returning the fetched records, we can send off the next round of fetches
// and avoid block waiting for their responses to enable pipelining while the user
// is handling the fetched records.
//
// NOTE: since the consumed position has already been updated, we must not allow
// wakeups or any other errors to be triggered prior to returning the fetched records.
if (fetcher.sendFetches() > 0 || client.hasPendingRequests())
client.pollNoWakeup();
// once records are available, run the interceptor logic
return this.interceptors.onConsume(new ConsumerRecords<>(records));
}
long elapsed = time.milliseconds() - start;
remaining = timeout - elapsed;
} while (remaining > 0);
return ConsumerRecords.empty();
} finally {
release();
}
}
As you can see, the core function is pollOnce(remaining):
/**
* Do one round of polling. In addition to checking for new data, this does any needed offset commits
* (if auto-commit is enabled), and offset resets (if an offset reset policy is defined).
* @param timeout The maximum time to block in the underlying call to {@link ConsumerNetworkClient#poll(long)}.
* @return The fetched records (may be empty)
*/
private Map<TopicPartition, List<ConsumerRecord<K, V>>> pollOnce(long timeout) {
client.maybeTriggerWakeup();
long startMs = time.milliseconds();
// activate the coordinator component
coordinator.poll(startMs, timeout);
// Lookup positions of assigned partitions
boolean hasAllFetchPositions = updateFetchPositions();
// if data is already available in the buffer, return it immediately
Map<TopicPartition, List<ConsumerRecord<K, V>>> records = fetcher.fetchedRecords();
if (!records.isEmpty())
return records;
// the buffer is empty: send new FETCH requests (won't resend pending fetches)
fetcher.sendFetches();
long nowMs = time.milliseconds();
long remainingTimeMs = Math.max(0, timeout - (nowMs - startMs));
long pollTimeout = Math.min(coordinator.timeToNextPoll(nowMs), remainingTimeMs);
// We do not want to be stuck blocking in poll if we are missing some positions
// since the offset lookup may be backing off after a failure
if (!hasAllFetchPositions && pollTimeout > retryBackoffMs)
pollTimeout = retryBackoffMs;
// the requests were prepared above; client.poll now performs the network I/O
client.poll(pollTimeout, nowMs, new PollCondition() {
// decides whether poll should keep blocking: returns true while no fetch has completed yet
@Override
public boolean shouldBlock() {
// since a fetch might be completed by the background thread, we need this poll condition
// to ensure that we do not block unnecessarily in poll()
return !fetcher.hasCompletedFetches();
}
});
// after the long poll, we should check whether the group needs to rebalance
// prior to returning data so that the group can stabilize faster
if (coordinator.needRejoin())
return Collections.emptyMap();
// take from the buffer again
return fetcher.fetchedRecords();
}
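This function does three main things; let's go through them one by one:
- Activate the coordinator
- Take data from the buffer and return it
- If the buffer has no data, send a Fetch request, store the response in the buffer, and finally take data from the buffer and return it.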
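The buffer-first order described above can be sketched with a small standalone simulation. Broker is a hypothetical interface standing in for the network round trip, and the real client does this asynchronously, so treat it as an illustration of the ordering only.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class CacheFirstFetchSketch {
    // Hypothetical stand-in for the broker: returns the records of one fetch response.
    interface Broker {
        List<String> fetch();
    }

    private final Deque<String> buffer = new ArrayDeque<>();
    private final Broker broker;
    int fetchRequestsSent = 0;

    CacheFirstFetchSketch(Broker broker) {
        this.broker = broker;
    }

    // Mirrors the order in pollOnce: drain the buffer first; only when it is empty,
    // send a fetch, store the response in the buffer, then drain again.
    List<String> pollOnce(int maxRecords) {
        List<String> records = drain(maxRecords);
        if (!records.isEmpty()) {
            return records;
        }
        fetchRequestsSent++;
        buffer.addAll(broker.fetch());
        return drain(maxRecords);
    }

    // Removes up to maxRecords records from the head of the buffer.
    private List<String> drain(int maxRecords) {
        List<String> out = new ArrayList<>();
        while (out.size() < maxRecords && !buffer.isEmpty()) {
            out.add(buffer.poll());
        }
        return out;
    }
}
```

Note how a second call can be served entirely from the buffer without a new request, which is why one fetch response may feed several poll calls.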
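Activating the coordinator is a critical step.
From the code below we can see that it performs several key actions:
- Locate the coordinator, which amounts to sending a FindCoordinator request
- Activate the consumer group, mainly by starting the heartbeat thread and sending a JoinGroup request
- Finally, check whether an auto-commit is due
This function packs in a lot of information, so let's organize it a little to make it easier to remember:
1. The heartbeat thread is not started at initialization but when the coordinator module is activated during the first poll. This makes sense, because heartbeats are exchanged with the coordinator node.
2. Metadata is updated after the group has been activated, so the moment the consumer connects to the relevant nodes is also the first poll.
3. Auto-commit is implemented at the entry of pollOnce (inside coordinator.poll), so one of the first things each poll does is check whether the previous consumption progress should be auto-committed.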
/**
* Poll for coordinator events. This ensures that the coordinator is known and that the consumer
* has joined the group (if it is using group management). This also handles periodic offset commits
* if they are enabled.
*
* @param now current time in milliseconds
*/
public void poll(long now, long remainingMs) {
// invoke the callbacks of offset commits that have completed
invokeCompletedOffsetCommitCallbacks();
// if not in assign mode, follow the consumer-group path
if (subscriptions.partitionsAutoAssigned()) {
// make sure the coordinator is known
if (coordinatorUnknown()) {
ensureCoordinatorReady();
now = time.milliseconds();
}
// partition or subscription information has changed, so a REJOIN is needed
if (needRejoin()) {
// due to a race condition between the initial metadata fetch and the initial rebalance,
// we need to ensure that the metadata is fresh before joining initially. This ensures
// that we have matched the pattern against the cluster's topics at least once before joining.
// with a pattern subscription, make sure the metadata is fresh first
if (subscriptions.hasPatternSubscription())
client.ensureFreshMetadata();
// 1. Re-check the coordinator
// 2. Start the heartbeat thread
// 3. Send the JOIN_GROUP request
ensureActiveGroup();
now = time.milliseconds();
}
// check the state of the heartbeat thread
pollHeartbeat(now);
} else {
// For manually assigned partitions, if there are no ready nodes, await metadata.
// If connections to all nodes fail, wakeups triggered while attempting to send fetch
// requests result in polls returning immediately, causing a tight loop of polls. Without
// the wakeup, poll() with no channels would block for the timeout, delaying re-connection.
// awaitMetadataUpdate() initiates new connections with configured backoff and avoids the busy loop.
// When group management is used, metadata wait is already performed for this scenario as
// coordinator is unknown, hence this check is not required.
if (metadata.updateRequested() && !client.hasReadyNodes()) {
boolean metadataUpdated = client.awaitMetadataUpdate(remainingMs);
if (!metadataUpdated && !client.hasReadyNodes())
return;
now = time.milliseconds();
}
}
// check whether an auto-commit is due
maybeAutoCommitOffsetsAsync(now);
}
protected synchronized boolean ensureCoordinatorReady(long startTimeMs, long timeoutMs) {
long remainingMs = timeoutMs;
while (coordinatorUnknown()) {
RequestFuture<Void> future = lookupCoordinator();
client.poll(future, remainingMs);
if (future.failed()) {
if (future.isRetriable()) {
remainingMs = timeoutMs - (time.milliseconds() - startTimeMs);
if (remainingMs <= 0)
break;
log.debug("Coordinator discovery failed, refreshing metadata");
client.awaitMetadataUpdate(remainingMs);
} else
throw future.exception();
} else if (coordinator != null && client.connectionFailed(coordinator)) {
// we found the coordinator, but the connection has failed, so mark
// it dead and backoff before retrying discovery
markCoordinatorUnknown();
time.sleep(retryBackoffMs);
}
remainingMs = timeoutMs - (time.milliseconds() - startTimeMs);
if (remainingMs <= 0)
break;
}
return !coordinatorUnknown();
}
public void ensureActiveGroup() {
// always ensure that the coordinator is ready because we may have been disconnected
// when sending heartbeats and does not necessarily require us to rejoin the group.
ensureCoordinatorReady();
startHeartbeatThreadIfNeeded();
joinGroupIfNeeded();
}
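Next, let's see how messages are taken from the buffer. From the source we can see that the data is buffered in a queue, ConcurrentLinkedQueue<CompletedFetch> completedFetches, in which each CompletedFetch holds the fetched data of one partition.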
/**
* Return the fetched records, empty the record buffer and update the consumed position.
*
* NOTE: returning empty records guarantees the consumed position are NOT updated.
*
* @return The fetched records per partition
* @throws OffsetOutOfRangeException If there is OffsetOutOfRange error in fetchResponse and
* the defaultResetPolicy is NONE
*/
public Map<TopicPartition, List<ConsumerRecord<K, V>>> fetchedRecords() {
Map<TopicPartition, List<ConsumerRecord<K, V>>> fetched = new HashMap<>();
int recordsRemaining = maxPollRecords;
try {
// return at most max.poll.records records
while (recordsRemaining > 0) {
// take the next completedFetch from completedFetches
if (nextInLineRecords == null || nextInLineRecords.isFetched) {
CompletedFetch completedFetch = completedFetches.peek();
// nothing left in the queue: break
if (completedFetch == null) break;
nextInLineRecords = parseCompletedFetch(completedFetch);
completedFetches.poll();
} else {
// convert the completedFetch into records
List<ConsumerRecord<K, V>> records = fetchRecords(nextInLineRecords, recordsRemaining);
TopicPartition partition = nextInLineRecords.partition;
if (!records.isEmpty()) {
List<ConsumerRecord<K, V>> currentRecords = fetched.get(partition);
if (currentRecords == null) {
fetched.put(partition, records);
} else {
// this case shouldn't usually happen because we only send one fetch at a time per partition,
// but it might conceivably happen in some rare cases (such as partition leader changes).
// we have to copy to a new list because the old one may be immutable
List<ConsumerRecord<K, V>> newRecords = new ArrayList<>(records.size() + currentRecords.size());
newRecords.addAll(currentRecords);
newRecords.addAll(records);
fetched.put(partition, newRecords);
}
recordsRemaining -= records.size();
}
}
}
} catch (KafkaException e) {
if (fetched.isEmpty())
throw e;
}
return fetched;
}
private List<ConsumerRecord<K, V>> fetchRecords(PartitionRecords partitionRecords, int maxRecords) {
if (!subscriptions.isAssigned(partitionRecords.partition)) {
// this can happen when a rebalance happened before fetched records are returned to the consumer's poll call
log.debug("Not returning fetched records for partition {} since it is no longer assigned",
partitionRecords.partition);
} else if (!subscriptions.isFetchable(partitionRecords.partition)) {
// this can happen when a partition is paused before fetched records are returned to the consumer's
// poll call or if the offset is being reset
log.debug("Not returning fetched records for assigned partition {} since it is no longer fetchable",
partitionRecords.partition);
} else {
long position = subscriptions.position(partitionRecords.partition);
// check that the nextFetchOffset held in memory equals the starting offset of the data just fetched
if (partitionRecords.nextFetchOffset == position) {
List<ConsumerRecord<K, V>> partRecords = partitionRecords.fetchRecords(maxRecords);
long nextOffset = partitionRecords.nextFetchOffset;
log.trace("Returning fetched records at offset {} for assigned partition {} and update " +
"position to {}", position, partitionRecords.partition, nextOffset);
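// update this partition's consumed position to nextOffset
// after this fetch, the position read from partitionState.position is therefore always
// the offset of the last returned record + 1
// so a manual async/sync commit issued before the batch has been processed commits the whole batch, which can lose messages
// auto-commit, by contrast, only runs after the previous batch has been handed out and (if processed synchronously) consumed,
// so it can only cause duplicate consumption, not message loss
// if records are processed asynchronously, there is a risk of both duplicates and message loss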
subscriptions.position(partitionRecords.partition, nextOffset);
Long partitionLag = subscriptions.partitionLag(partitionRecords.partition, isolationLevel);
if (partitionLag != null)
this.sensors.recordPartitionLag(partitionRecords.partition, partitionLag);
return partRecords;
} else {
// these records aren't next in line based on the last consumed position, ignore them
// they must be from an obsolete request
log.debug("Ignoring fetched records for {} at offset {} since the current position is {}",
partitionRecords.partition, partitionRecords.nextFetchOffset, position);
}
}
partitionRecords.drain();
return emptyList();
}
private static class CompletedFetch {
private final TopicPartition partition;
private final long fetchedOffset;
private final FetchResponse.PartitionData partitionData;
private final FetchResponseMetricAggregator metricAggregator;
private final short responseVersion;
private CompletedFetch(TopicPartition partition,
long fetchedOffset,
FetchResponse.PartitionData partitionData,
FetchResponseMetricAggregator metricAggregator,
short responseVersion) {
this.partition = partition;
this.fetchedOffset = fetchedOffset;
this.partitionData = partitionData;
this.metricAggregator = metricAggregator;
this.responseVersion = responseVersion;
}
}
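The two mechanisms in fetchedRecords and fetchRecords, the max.poll.records cap and the "position = last returned offset + 1" bookkeeping, can be sketched together in a simplified simulation. Batch, the string partition names, and the field names are assumptions made for illustration; real record parsing, pausing, and reassignment checks are left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

final class FetchedRecordsSketch {
    // Simplified stand-in for one completed fetch: a partition plus `count` consecutive offsets.
    static final class Batch {
        final String partition;
        final long firstOffset;
        final int count;
        int consumed = 0;

        Batch(String partition, long firstOffset, int count) {
            this.partition = partition;
            this.firstOffset = firstOffset;
            this.count = count;
        }

        long nextFetchOffset() { return firstOffset + consumed; }
        boolean exhausted() { return consumed == count; }
    }

    final Queue<Batch> completedFetches = new ArrayDeque<>();
    final Map<String, Long> positions = new HashMap<>();   // next offset to hand out, per partition
    private Batch nextInLine;

    // Returns at most maxPollRecords offsets, grouped by partition, and advances positions.
    Map<String, List<Long>> fetchedRecords(int maxPollRecords) {
        Map<String, List<Long>> fetched = new LinkedHashMap<>();
        int remaining = maxPollRecords;
        while (remaining > 0) {
            if (nextInLine == null || nextInLine.exhausted()) {
                nextInLine = completedFetches.poll();
                if (nextInLine == null) break;               // buffer is empty
                continue;
            }
            Batch batch = nextInLine;
            long position = positions.getOrDefault(batch.partition, batch.firstOffset);
            if (batch.nextFetchOffset() != position) {
                batch.consumed = batch.count;                // stale data: drop the whole batch
                continue;
            }
            int take = Math.min(remaining, batch.count - batch.consumed);
            List<Long> offsets = fetched.computeIfAbsent(batch.partition, p -> new ArrayList<>());
            for (int i = 0; i < take; i++) {
                offsets.add(position + i);
            }
            batch.consumed += take;
            positions.put(batch.partition, batch.nextFetchOffset());   // last returned offset + 1
            remaining -= take;
        }
        return fetched;
    }
}
```

A batch that is only partly drained stays "next in line", so the following call continues from where the previous one stopped.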
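Now let's look at how fetched data ends up in completedFetches. It mainly comes down to two things:
1. Find the Node that each request should be sent to.
2. Send the request, wrap the returned data, and enqueue it into completedFetches.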
public int sendFetches() {
Map<Node, FetchSessionHandler.FetchRequestData> fetchRequestMap = prepareFetchRequests();
for (Map.Entry<Node, FetchSessionHandler.FetchRequestData> entry : fetchRequestMap.entrySet()) {
final Node fetchTarget = entry.getKey();
final FetchSessionHandler.FetchRequestData data = entry.getValue();
final FetchRequest.Builder request = FetchRequest.Builder
.forConsumer(this.maxWaitMs, this.minBytes, data.toSend())
.isolationLevel(isolationLevel)
.setMaxBytes(this.maxBytes)
.metadata(data.metadata())
.toForget(data.toForget());
if (log.isDebugEnabled()) {
log.debug("Sending {} {} to broker {}", isolationLevel, data.toString(), fetchTarget);
}
// fetch data from the target node through the network client,
// then put the result into completedFetches
client.send(fetchTarget, request)
.addListener(new RequestFutureListener<ClientResponse>() {
@Override
public void onSuccess(ClientResponse resp) {
FetchResponse response = (FetchResponse) resp.responseBody();
FetchSessionHandler handler = sessionHandlers.get(fetchTarget.id());
if (handler == null) {
log.error("Unable to find FetchSessionHandler for node {}. Ignoring fetch response.",
fetchTarget.id());
return;
}
if (!handler.handleResponse(response)) {
return;
}
Set<TopicPartition> partitions = new HashSet<>(response.responseData().keySet());
FetchResponseMetricAggregator metricAggregator = new FetchResponseMetricAggregator(sensors, partitions);
// on success, buffer the data in completedFetches
for (Map.Entry<TopicPartition, FetchResponse.PartitionData> entry : response.responseData().entrySet()) {
TopicPartition partition = entry.getKey();
long fetchOffset = data.sessionPartitions().get(partition).fetchOffset;
FetchResponse.PartitionData fetchData = entry.getValue();
log.debug("Fetch {} at offset {} for partition {} returned fetch data {}",
isolationLevel, fetchOffset, partition, fetchData);
completedFetches.add(new CompletedFetch(partition, fetchOffset, fetchData, metricAggregator,
resp.requestHeader().apiVersion()));
}
sensors.fetchLatency.record(resp.requestLatencyMs());
}
@Override
public void onFailure(RuntimeException e) {
FetchSessionHandler handler = sessionHandlers.get(fetchTarget.id());
if (handler != null) {
handler.handleError(e);
}
}
});
}
return fetchRequestMap.size();
}
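At this point the consumption logic is mostly covered. To summarize:
1. Activate the coordinator, activate the consumer group, start the heartbeat thread, and check whether an auto-commit is due.
2. Try to take data from the completedFetches buffer queue; if there is none, send a fetch request.
3. Wrap the data returned by the fetch request into the completedFetches queue, then take from the queue again and return the message collection.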
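Offset Commit
After we finish processing messages, how does the broker find out? The answer is the offset commit.
There are three ways to commit:
- Auto-commit: enabled with enable.auto.commit=true, which is the default; offsets are committed every 5 s.
- Asynchronous commit: send an OffsetCommit request immediately without waiting for the response.
- Synchronous commit: send an OffsetCommit request immediately and wait synchronously for the response.
Let's look at the corresponding source code.
Auto-commit
Three functions are involved, the most important being allConsumed, which decides exactly which offset gets committed.
The code shows that what gets committed is the latest position of each partition, and that position is set in the fetch path above (you can look back at it): it is the nextOffset assigned in fetchRecords. So auto-commit checks on every poll whether the commit condition is met, and if it is, commits the latest offsets of the previously fetched batch.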
public void maybeAutoCommitOffsetsAsync(long now) {
if (autoCommitEnabled && now >= nextAutoCommitDeadline) {
this.nextAutoCommitDeadline = now + autoCommitIntervalMs;
doAutoCommitOffsetsAsync();
}
}
private void doAutoCommitOffsetsAsync() {
// collect the consumed offsets
Map<TopicPartition, OffsetAndMetadata> allConsumedOffsets = subscriptions.allConsumed();
log.debug("Sending asynchronous auto-commit of offsets {}", allConsumedOffsets);
commitOffsetsAsync(allConsumedOffsets, new OffsetCommitCallback() {
@Override
public void onComplete(Map<TopicPartition, OffsetAndMetadata> offsets, Exception exception) {
if (exception != null) {
if (exception instanceof RetriableException) {
log.debug("Asynchronous auto-commit of offsets {} failed due to retriable error: {}", offsets,
exception);
nextAutoCommitDeadline = Math.min(time.milliseconds() + retryBackoffMs, nextAutoCommitDeadline);
} else {
log.warn("Asynchronous auto-commit of offsets {} failed: {}", offsets, exception.getMessage());
}
} else {
log.debug("Completed asynchronous auto-commit of offsets {}", offsets);
}
}
});
}
public Map<TopicPartition, OffsetAndMetadata> allConsumed() {
Map<TopicPartition, OffsetAndMetadata> allConsumed = new HashMap<>();
// read the position from each partition state, i.e. the offset right after the last record
// handed out for that partition by the previous poll
for (PartitionStates.PartitionState<TopicPartitionState> state : assignment.partitionStates()) {
if (state.value().hasValidPosition())
allConsumed.put(state.topicPartition(), new OffsetAndMetadata(state.value().position));
}
return allConsumed;
}
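The deadline check in maybeAutoCommitOffsetsAsync can be reduced to a few lines. The sketch below is a single-partition simulation with an injected clock value; initializing the first deadline to "now + interval" is my assumption about the constructor, and the actual commit request is replaced by a field assignment.

```java
final class AutoCommitSketch {
    private final long intervalMs;
    private long nextDeadlineMs;
    long position = 0;      // next offset to fetch; advanced when records are handed out
    long committed = -1;    // last committed offset; -1 means nothing committed yet

    AutoCommitSketch(long nowMs, long intervalMs) {
        this.intervalMs = intervalMs;
        this.nextDeadlineMs = nowMs + intervalMs;   // assumed initial deadline
    }

    // Called at the start of each poll: commits the current position once the deadline has passed.
    boolean maybeAutoCommit(long nowMs) {
        if (nowMs < nextDeadlineMs) {
            return false;
        }
        nextDeadlineMs = nowMs + intervalMs;
        committed = position;   // stands in for the asynchronous OffsetCommit request
        return true;
    }
}
```

Since the check only runs inside poll, a consumer that polls rarely also commits rarely, regardless of the configured interval.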
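Asynchronous/Synchronous Commit
Why put these two together? Because apart from whether they wait synchronously, there is basically no difference.
From the code below we can see that the default calls (those without an explicit offsets map) also rely on allConsumed, so each call commits the latest fetched position of the currently assigned partitions. Do not overuse the default forms of these two functions.
Both functions can also commit at the granularity of individual offsets through their other overloads. That improves reliability, but throughput drops sharply; have a look at the overloads if you are interested.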
@Override
public void commitAsync(OffsetCommitCallback callback) {
acquireAndEnsureOpen();
try {
commitAsync(subscriptions.allConsumed(), callback);
} finally {
release();
}
}
@Override
public void commitSync() {
acquireAndEnsureOpen();
try {
coordinator.commitOffsetsSync(subscriptions.allConsumed(), Long.MAX_VALUE);
} finally {
release();
}
}
private void doCommitOffsetsAsync(final Map<TopicPartition, OffsetAndMetadata> offsets, final OffsetCommitCallback callback) {
// send the commit request
RequestFuture<Void> future = sendOffsetCommitRequest(offsets);
final OffsetCommitCallback cb = callback == null ? defaultOffsetCommitCallback : callback;
future.addListener(new RequestFutureListener<Void>() {
@Override
public void onSuccess(Void value) {
// interceptors can also act after a successful commit
if (interceptors != null)
interceptors.onCommit(offsets);
// queue the commit completion so its callback can be invoked later
completedOffsetCommits.add(new OffsetCommitCompletion(cb, offsets, null));
}
@Override
public void onFailure(RuntimeException e) {
Exception commitException = e;
// retriable exception
if (e instanceof RetriableException)
commitException = new RetriableCommitFailedException(e);
// queue the failed completion as well
completedOffsetCommits.add(new OffsetCommitCompletion(cb, offsets, commitException));
}
});
}
public boolean commitOffsetsSync(Map<TopicPartition, OffsetAndMetadata> offsets, long timeoutMs) {
invokeCompletedOffsetCommitCallbacks();
if (offsets.isEmpty())
return true;
long now = time.milliseconds();
long startMs = now;
long remainingMs = timeoutMs;
do {
if (coordinatorUnknown()) {
if (!ensureCoordinatorReady(now, remainingMs))
return false;
remainingMs = timeoutMs - (time.milliseconds() - startMs);
}
RequestFuture<Void> future = sendOffsetCommitRequest(offsets);
client.poll(future, remainingMs);
// We may have had in-flight offset commits when the synchronous commit began. If so, ensure that
// the corresponding callbacks are invoked prior to returning in order to preserve the order that
// the offset commits were applied.
invokeCompletedOffsetCommitCallbacks();
if (future.succeeded()) {
if (interceptors != null)
interceptors.onCommit(offsets);
return true;
}
if (future.failed() && !future.isRetriable())
throw future.exception();
time.sleep(retryBackoffMs);
now = time.milliseconds();
remainingMs = timeoutMs - (now - startMs);
} while (remainingMs > 0);
return false;
}
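Finally, after reading the code, some readers will still ask: which commit mode should I choose?
It really depends on business needs; otherwise Kafka would not offer so many options.
If the business can tolerate duplicate messages, auto-commit is sufficient.
If the business must not lose messages on the consumer side, use offset-level synchronous commits.
If the business wants fairly high reliability with acceptable throughput, the default asynchronous commit is an option.
...
And so on; in essence it is a trade-off between reliability and throughput.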
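For the offset-level option, the main detail is which number to commit: the offset of the next record to read, that is, the last processed offset plus one. The sketch below tracks that per partition using plain strings and longs; with the real client the map would be a Map<TopicPartition, OffsetAndMetadata> passed to the commitSync or commitAsync overload that accepts offsets. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class OffsetsToCommitSketch {
    private final Map<String, Long> toCommit = new HashMap<>();

    // Record that the message at `offset` in `partition` has been fully processed.
    // The value kept is offset + 1, and it never moves backwards.
    void markProcessed(String partition, long offset) {
        toCommit.merge(partition, offset + 1, Long::max);
    }

    // Copy of the offsets that would be sent in the next commit.
    Map<String, Long> snapshot() {
        return new HashMap<>(toCommit);
    }
}
```

Committing only what has actually been processed is what turns a crash into duplicate delivery instead of message loss.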
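Heartbeat
Finally, let's look at the client's heartbeat.
As we saw above, the heartbeat is only started on the first poll; let's look at the details.
The key point is that the heartbeat thread is a daemon thread. Daemon status by itself does not lower a thread's scheduling priority, but when the machine's CPU is heavily loaded the thread may still fail to get CPU time, which interrupts heartbeats and gets the consumer kicked out of the group.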
private synchronized void startHeartbeatThreadIfNeeded() {
if (heartbeatThread == null) {
heartbeatThread = new HeartbeatThread();
heartbeatThread.start();
}
}
public static final String HEARTBEAT_THREAD_PREFIX = "kafka-coordinator-heartbeat-thread";
private class HeartbeatThread extends KafkaThread {
private boolean enabled = false;
private boolean closed = false;
private AtomicReference<RuntimeException> failed = new AtomicReference<>(null);
private HeartbeatThread() {
super(HEARTBEAT_THREAD_PREFIX + (groupId.isEmpty() ? "" : " | " + groupId), true);
}
// the two methods below belong to the parent class KafkaThread
public KafkaThread(final String name, boolean daemon) {
super(name);
configureThread(name, daemon);
}
private void configureThread(final String name, boolean daemon) {
setDaemon(daemon);
setUncaughtExceptionHandler(new UncaughtExceptionHandler() {
public void uncaughtException(Thread t, Throwable e) {
log.error("Uncaught exception in thread '{}':", name, e);
}
});
}
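What setDaemon(true) does and does not change can be checked with the plain JDK. The helper below is a minimal sketch in the spirit of KafkaThread(name, true), not a copy of it: the new thread is marked as a daemon, while its priority is simply inherited from the creating thread.

```java
final class DaemonThreadSketch {
    // Creates (but does not start) a daemon thread with the given name and task.
    static Thread newDaemon(String name, Runnable task) {
        Thread t = new Thread(task, name);
        t.setDaemon(true);   // a daemon thread does not keep the JVM alive on its own
        return t;
    }
}
```

So a starved heartbeat thread is a consequence of overall CPU pressure (or a long stop-the-world pause), not of a lowered priority.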
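Now let's see what the heartbeat thread does:
- First, it checks whether the heartbeat session has timed out; if so, it marks the coordinator as unknown, and a rebalance starts at the next poll.
- Next, it checks whether the interval between two polls exceeds max.poll.interval.ms, which defaults to 5 minutes. If so, it proactively sends a LeaveGroup request, and the next poll triggers a rebalance.
- Then it checks whether the heartbeat interval has elapsed; if not, it waits for the configured retry backoff, 100 ms.
- Finally, when the time has come, it sends the heartbeat request and registers the corresponding listener.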
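The order of these checks can be written as a pure decision function. This is a simplified sketch: State and its field names are assumptions, and the real Heartbeat class tracks a few more timestamps (for example session resets), but the precedence of the branches follows the run() loop.

```java
final class HeartbeatDecisionSketch {
    enum Action { FIND_COORDINATOR, MARK_COORDINATOR_UNKNOWN, LEAVE_GROUP, WAIT_BACKOFF, SEND_HEARTBEAT }

    // Simplified timing state; field names are illustrative, not Kafka's.
    static final class State {
        boolean coordinatorUnknown;
        long lastHeartbeatReceivedMs;
        long lastHeartbeatSentMs;
        long lastPollMs;
    }

    // Evaluates the checks in the same order as the heartbeat loop.
    static Action decide(State s, long nowMs, long sessionTimeoutMs,
                         long maxPollIntervalMs, long heartbeatIntervalMs) {
        if (s.coordinatorUnknown) {
            return Action.FIND_COORDINATOR;
        }
        if (nowMs - s.lastHeartbeatReceivedMs > sessionTimeoutMs) {
            return Action.MARK_COORDINATOR_UNKNOWN;   // session timed out
        }
        if (nowMs - s.lastPollMs > maxPollIntervalMs) {
            return Action.LEAVE_GROUP;                // foreground thread stalled between polls
        }
        if (nowMs - s.lastHeartbeatSentMs < heartbeatIntervalMs) {
            return Action.WAIT_BACKOFF;               // not yet time for the next heartbeat
        }
        return Action.SEND_HEARTBEAT;
    }
}
```

Keeping the decision free of I/O like this also makes the timing rules easy to unit test with fixed clock values.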
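Beyond the flow above, the most important part is how HeartbeatResponseHandler handles the response:
- If there is no error, the response is processed normally.
- If the coordinator is unavailable or this node is not the coordinator, the coordinator is marked unknown so that it is looked up again (FindCoordinator).
- If a rebalance is in progress, the handler marks that a rejoin is needed, so a JoinGroup request is sent to rejoin the group.
- If the generation is illegal, the consumer has been offline for a while and the other members have already completed a rebalance; the generation is reset and the consumer rejoins the group.
- If the current memberId is invalid, the generation is likewise reset and the consumer rejoins the group.
- Any other error is raised to the caller.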
@Override
public void run() {
try {
log.debug("Heartbeat thread started");
while (true) {
synchronized (AbstractCoordinator.this) {
if (closed)
return;
if (!enabled) {
AbstractCoordinator.this.wait();
continue;
}
if (state != MemberState.STABLE) {
// the group is not stable (perhaps because we left the group or because the coordinator
// kicked us out), so disable heartbeats and wait for the main thread to rejoin.
disable();
continue;
}
client.pollNoWakeup();
long now = time.milliseconds();
if (coordinatorUnknown()) {
if (findCoordinatorFuture != null || lookupCoordinator().failed())
// the immediate future check ensures that we backoff properly in the case that no
// brokers are available to connect to.
AbstractCoordinator.this.wait(retryBackoffMs);
} else if (heartbeat.sessionTimeoutExpired(now)) {
// the session timeout has expired without seeing a successful heartbeat, so we should
// probably make sure the coordinator is still healthy.
markCoordinatorUnknown();
} else if (heartbeat.pollTimeoutExpired(now)) {
// the poll timeout has expired, which means that the foreground thread has stalled
// in between calls to poll(), so we explicitly leave the group.
maybeLeaveGroup();
} else if (!heartbeat.shouldHeartbeat(now)) {
// poll again after waiting for the retry backoff in case the heartbeat failed or the
// coordinator disconnected
AbstractCoordinator.this.wait(retryBackoffMs);
} else {
heartbeat.sentHeartbeat(now);
sendHeartbeatRequest().addListener(new RequestFutureListener<Void>() {
@Override
public void onSuccess(Void value) {
synchronized (AbstractCoordinator.this) {
heartbeat.receiveHeartbeat(time.milliseconds());
}
}
@Override
public void onFailure(RuntimeException e) {
synchronized (AbstractCoordinator.this) {
if (e instanceof RebalanceInProgressException) {
// it is valid to continue heartbeating while the group is rebalancing. This
// ensures that the coordinator keeps the member in the group for as long
// as the duration of the rebalance timeout. If we stop sending heartbeats,
// however, then the session timeout may expire before we can rejoin.
heartbeat.receiveHeartbeat(time.milliseconds());
} else {
heartbeat.failHeartbeat();
// wake up the thread if it's sleeping to reschedule the heartbeat
AbstractCoordinator.this.notify();
}
}
}
});
}
}
}
} catch (AuthenticationException e) {
log.error("An authentication error occurred in the heartbeat thread", e);
this.failed.set(e);
} catch (GroupAuthorizationException e) {
log.error("A group authorization error occurred in the heartbeat thread", e);
this.failed.set(e);
} catch (InterruptedException | InterruptException e) {
Thread.interrupted();
log.error("Unexpected interrupt received in heartbeat thread", e);
this.failed.set(new RuntimeException(e));
} catch (Throwable e) {
log.error("Heartbeat thread failed due to unexpected error", e);
if (e instanceof RuntimeException)
this.failed.set((RuntimeException) e);
else
this.failed.set(new RuntimeException(e));
} finally {
log.debug("Heartbeat thread has closed");
}
}
}
// visible for testing
synchronized RequestFuture<Void> sendHeartbeatRequest() {
log.debug("Sending Heartbeat request to coordinator {}", coordinator);
HeartbeatRequest.Builder requestBuilder =
new HeartbeatRequest.Builder(this.groupId, this.generation.generationId, this.generation.memberId);
// pass in the handler to process the response and run the external callbacks
return client.send(coordinator, requestBuilder)
.compose(new HeartbeatResponseHandler());
}
private class HeartbeatResponseHandler extends CoordinatorResponseHandler<HeartbeatResponse, Void> {
@Override
public void handle(HeartbeatResponse heartbeatResponse, RequestFuture<Void> future) {
sensors.heartbeatLatency.record(response.requestLatencyMs());
Errors error = heartbeatResponse.error();
if (error == Errors.NONE) {
log.debug("Received successful Heartbeat response");
future.complete(null);
} else if (error == Errors.COORDINATOR_NOT_AVAILABLE
|| error == Errors.NOT_COORDINATOR) {
log.debug("Attempt to heartbeat since coordinator {} is either not started or not valid.",
coordinator());
markCoordinatorUnknown();
future.raise(error);
} else if (error == Errors.REBALANCE_IN_PROGRESS) {
log.debug("Attempt to heartbeat failed since group is rebalancing");
requestRejoin();
future.raise(Errors.REBALANCE_IN_PROGRESS);
} else if (error == Errors.ILLEGAL_GENERATION) {
log.debug("Attempt to heartbeat failed since generation {} is not current", generation.generationId);
resetGeneration();
future.raise(Errors.ILLEGAL_GENERATION);
} else if (error == Errors.UNKNOWN_MEMBER_ID) {
log.debug("Attempt to heartbeat failed for since member id {} is not valid.", generation.memberId);
resetGeneration();
future.raise(Errors.UNKNOWN_MEMBER_ID);
} else if (error == Errors.GROUP_AUTHORIZATION_FAILED) {
future.raise(new GroupAuthorizationException(groupId));
} else {
future.raise(new KafkaException("Unexpected error in heartbeat response: " + error.message()));
}
}
}
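The branching in HeartbeatResponseHandler is essentially a lookup from error code to reaction. The standalone sketch below restates it with two local enums; ErrorCode and Reaction are names invented for this example and only mirror the branches shown in the handler above.

```java
final class HeartbeatErrorSketch {
    enum ErrorCode {
        NONE, COORDINATOR_NOT_AVAILABLE, NOT_COORDINATOR, REBALANCE_IN_PROGRESS,
        ILLEGAL_GENERATION, UNKNOWN_MEMBER_ID, GROUP_AUTHORIZATION_FAILED, OTHER
    }

    enum Reaction { COMPLETE, MARK_COORDINATOR_UNKNOWN, REQUEST_REJOIN, RESET_GENERATION, FAIL }

    // Maps a heartbeat response error to the client-side reaction.
    static Reaction react(ErrorCode error) {
        switch (error) {
            case NONE:
                return Reaction.COMPLETE;
            case COORDINATOR_NOT_AVAILABLE:
            case NOT_COORDINATOR:
                return Reaction.MARK_COORDINATOR_UNKNOWN;   // rediscover the coordinator
            case REBALANCE_IN_PROGRESS:
                return Reaction.REQUEST_REJOIN;             // rejoin on the next poll
            case ILLEGAL_GENERATION:
            case UNKNOWN_MEMBER_ID:
                return Reaction.RESET_GENERATION;           // start over as a new member
            default:
                return Reaction.FAIL;                       // surfaced to the caller as an exception
        }
    }
}
```

Every reaction except COMPLETE also fails the request future, which is how the heartbeat thread's onFailure listener gets to see the error.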
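Summary
The above walks through the code of the consumer's core paths, but it is essentially a read-through of the code without summary notes or diagrams, so this post is only half finished. I will fill in the rest when I have time. If anything here is unclear, feel free to leave a comment ~^_^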