Flink Kafka Source & Sink Source Code Analysis

Wu Peng, Flink Chinese Community


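Abstract: Based on Flink 1.9.0 and Kafka 2.3, this article analyzes the source code of the Flink Kafka source and sink. It is organized in two parts:

1. Flink-kafka-source source code analysis

* Process overview
* Offset committing without checkpointing
* Offset committing with checkpointing
* Consuming from a specified offset

2. Flink-kafka-sink source code analysis

* Initialization
* Task execution
* Summary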


1. Flink-kafka-source source code analysis

Process overview

A Kafka source is typically created in Flink as follows:


StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// KafkaEventSchema is a user-defined class that parses the record fields
env.addSource(new FlinkKafkaConsumer<>("foo", new KafkaEventSchema(), properties));

With Kafka's own KafkaConsumer API, by contrast, a topic is consumed through the poll method:


KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.poll(Duration.ofMillis(100));

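The following sections analyze how these two flows are connected.

Initialization

When env.addSource is executed, a StreamSource object is created: final StreamSource<OUT, ?> sourceOperator = new StreamSource<>(function);. Here function is the FlinkKafkaConsumer object that was passed in. The StreamSource constructor hands this object to the userFunction field of its parent class AbstractUdfStreamOperator, as the source shows: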

■ StreamSource.java


public StreamSource(SRC sourceFunction) {
    super(sourceFunction);
    this.chainingStrategy = ChainingStrategy.HEAD;
}

■ AbstractUdfStreamOperator.java


public AbstractUdfStreamOperator(F userFunction) {
   this.userFunction = requireNonNull(userFunction);
   checkUdfCheckpointingPreconditions();
}

Task execution

After the task starts, it calls the performDefaultAction() method of SourceStreamTask, which starts a thread via sourceThread.start();. Part of the source:


private final LegacySourceFunctionThread sourceThread;

@Override
protected void performDefaultAction(ActionContext context) throws Exception {
    sourceThread.start();
}

In the run method of LegacySourceFunctionThread, a call to headOperator.run eventually invokes the run method of StreamSource. Part of the source:


public void run(final Object lockingObject,
                final StreamStatusMaintainer streamStatusMaintainer,
                final Output<StreamRecord<OUT>> collector,
                final OperatorChain<?, ?> operatorChain) throws Exception {

  // part of the code omitted
  this.ctx = StreamSourceContexts.getSourceContext(
    timeCharacteristic,
    getProcessingTimeService(),
    lockingObject,
    streamStatusMaintainer,
    collector,
    watermarkInterval,
    -1);

  try {
    userFunction.run(ctx);
    // part of the code omitted
  } finally {
    // make sure that the context is closed in any case
    ctx.close();
    if (latencyEmitter != null) {
      latencyEmitter.close();
    }
  }
}

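The key line here is userFunction.run(ctx);. This userFunction is the FlinkKafkaConsumer object passed in during initialization, so the call actually invokes the run method of FlinkKafkaConsumer, whose implementation lives in its parent class FlinkKafkaConsumerBase. At this point the actual Kafka consumption phase begins.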
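
The delegation chain described so far can be pictured with a small stand-alone model. This is only a sketch: MiniSourceFunction, MiniStreamSource and DelegationDemo are hypothetical names, not Flink classes. They mimic how an operator stores the user function at construction time and later simply calls its run method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical stand-in for a source function: it emits records into a collector.
interface MiniSourceFunction<T> {
    void run(Consumer<T> collector);
}

// Hypothetical stand-in for StreamSource: it only stores the user function
// and delegates to it when the task runs.
class MiniStreamSource<T> {
    private final MiniSourceFunction<T> userFunction;

    MiniStreamSource(MiniSourceFunction<T> userFunction) {
        this.userFunction = Objects.requireNonNull(userFunction);
    }

    void run(Consumer<T> collector) {
        // The operator itself produces nothing; the user function does the work.
        userFunction.run(collector);
    }
}

class DelegationDemo {
    // Builds an operator around a tiny user function and collects what it emits.
    static List<String> runOnce() {
        List<String> out = new ArrayList<>();
        MiniStreamSource<String> operator = new MiniStreamSource<String>(collector -> {
            collector.accept("record-1");
            collector.accept("record-2");
        });
        operator.run(out::add);
        return out;
    }
}
```

Calling DelegationDemo.runOnce() returns the two records emitted by the user function, which is the same shape as StreamSource#run handing control to FlinkKafkaConsumerBase#run.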

Kafka consumption phase

FlinkKafkaConsumerBase#run creates a KafkaFetcher object and eventually calls kafkaFetcher.runFetchLoop(). A fragment of that method:


/** The thread that runs the actual KafkaConsumer and hand the record batches to this fetcher. */
private final KafkaConsumerThread consumerThread;

@Override
public void runFetchLoop() throws Exception {
  try {
    final Handover handover = this.handover;

    // kick off the actual Kafka consumer
    consumerThread.start();

    // part of the code omitted
}

As shown, this starts a KafkaConsumerThread. Next comes KafkaConsumerThread#run; only part of the method is listed below, see KafkaConsumerThread.java for the complete code.


@Override
public void run() {
  // early exit check
  if (!running) {
    return;
  }
  // This method initializes the KafkaConsumer and guarantees it is torn down properly.
  // This is important, because the consumer has multi-threading issues,
  // including concurrent 'close()' calls.
  try {
    this.consumer = getConsumer(kafkaProperties);
  } catch (Throwable t) {
    handover.reportError(t);
    return;
  }
  try {

    // main fetch loop
    while (running) {
      try {
        if (records == null) {
          try {
            records = consumer.poll(pollTimeout);
          } catch (WakeupException we) {
            continue;
          }
        }
      }
      // end main fetch loop
    }
  } catch (Throwable t) {
    handover.reportError(t);
  } finally {
    handover.close();
    try {
      consumer.close();
    } catch (Throwable t) {
      log.warn("Error while closing Kafka consumer", t);
    }
  }
}

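This finally reaches the code that actually fetches data from Kafka: records = consumer.poll(pollTimeout);. Because KafkaConsumer is not thread-safe, each thread has to create its own KafkaConsumer object, which is what this.consumer = getConsumer(kafkaProperties); does.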


KafkaConsumer<byte[], byte[]> getConsumer(Properties kafkaProperties) {
  return new KafkaConsumer<>(kafkaProperties);
}

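Summary: this section covered only the key flow by which Flink consumes Kafka data. The following sections describe in more detail how FlinkKafkaConsumer manages offsets in the AT_LEAST_ONCE and EXACTLY_ONCE scenarios.

Offset committing without checkpointing

The most important part of consuming a Kafka topic is offset management. For Kafka's own offset commit mechanism, see the official Kafka website.

The Flink Kafka source has three offset commit modes: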


public enum OffsetCommitMode {

   /** Completely disable offset committing. */
   DISABLED,

   /** Commit offsets back to Kafka only when checkpoints are completed. */
   ON_CHECKPOINTS,

   /** Commit offsets periodically back to Kafka, using the auto commit functionality of internal Kafka clients. */
   KAFKA_PERIODIC;
}

Initializing offsetCommitMode

offsetCommitMode is initialized in the FlinkKafkaConsumerBase#open method:


// determine the offset commit mode
this.offsetCommitMode = OffsetCommitModes.fromConfiguration(
                getIsAutoCommitEnabled(),
                enableCommitOnCheckpoints,
        ((StreamingRuntimeContext)getRuntimeContext()).isCheckpointingEnabled());

The method getIsAutoCommitEnabled() is implemented as follows:


protected boolean getIsAutoCommitEnabled() {
   return getBoolean(properties, ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, true) &&
      PropertiesUtil.getLong(properties, ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, 5000) > 0;
}

* This method returns true only when enable.auto.commit=true and auto.commit.interval.ms>0.
* The variable enableCommitOnCheckpoints defaults to true; call setCommitOffsetsOnCheckpoints to change it.
* isCheckpointingEnabled returns true only when the code calls env.enableCheckpointing.
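
The first condition can be tried out without any Kafka dependency. The sketch below re-implements the check with java.util.Properties only; AutoCommitCheck and its props helper are hypothetical names, the property keys are written as plain strings, and the defaults (true and 5000 ms) are the ones visible in the snippet above.

```java
import java.util.Properties;

class AutoCommitCheck {
    // Mirrors the logic of getIsAutoCommitEnabled() using only java.util.Properties.
    // An unset key falls back to the default shown in the original snippet.
    static boolean isAutoCommitEnabled(Properties props) {
        boolean enabled = Boolean.parseBoolean(props.getProperty("enable.auto.commit", "true"));
        long intervalMs = Long.parseLong(props.getProperty("auto.commit.interval.ms", "5000"));
        return enabled && intervalMs > 0;
    }

    // Hypothetical helper: builds Properties from alternating key/value arguments.
    static Properties props(String... keyValues) {
        Properties p = new Properties();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            p.setProperty(keyValues[i], keyValues[i + 1]);
        }
        return p;
    }
}
```

With no properties set the check passes because of the defaults; setting enable.auto.commit to false, or the interval to 0, makes it fail.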

The actual commit mode is returned by the following code:


/**
 * Determine the offset commit mode using several configuration values.
 *
 * @param enableAutoCommit whether or not auto committing is enabled in the provided Kafka properties.
 * @param enableCommitOnCheckpoint whether or not committing on checkpoints is enabled.
 * @param enableCheckpointing whether or not checkpoint is enabled for the consumer.
 *
 * @return the offset commit mode to use, based on the configuration values.
 */
public static OffsetCommitMode fromConfiguration(
      boolean enableAutoCommit,
      boolean enableCommitOnCheckpoint,
      boolean enableCheckpointing) {

   if (enableCheckpointing) {
      // if checkpointing is enabled, the mode depends only on whether committing on checkpoints is enabled
      return (enableCommitOnCheckpoint) ? OffsetCommitMode.ON_CHECKPOINTS : OffsetCommitMode.DISABLED;
   } else {
      // else, the mode depends only on whether auto committing is enabled in the provided Kafka properties
      return (enableAutoCommit) ? OffsetCommitMode.KAFKA_PERIODIC : OffsetCommitMode.DISABLED;
   }
}

Leaving checkpointing aside for now, only (enableAutoCommit) ? OffsetCommitMode.KAFKA_PERIODIC : OffsetCommitMode.DISABLED; is relevant.

In other words, if the client sets enable.auto.commit=true (with auto.commit.interval.ms>0), the mode is KAFKA_PERIODIC; otherwise it is DISABLED.
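
To see all combinations at once, the decision logic can be copied into a stand-alone class and enumerated. This is a sketch for illustration: OffsetCommitModeDemo and its table method are hypothetical, while the body of fromConfiguration follows the source shown above.

```java
class OffsetCommitModeDemo {
    enum Mode { DISABLED, ON_CHECKPOINTS, KAFKA_PERIODIC }

    // Same decision logic as OffsetCommitModes.fromConfiguration.
    static Mode fromConfiguration(boolean enableAutoCommit,
                                  boolean enableCommitOnCheckpoint,
                                  boolean enableCheckpointing) {
        if (enableCheckpointing) {
            return enableCommitOnCheckpoint ? Mode.ON_CHECKPOINTS : Mode.DISABLED;
        }
        return enableAutoCommit ? Mode.KAFKA_PERIODIC : Mode.DISABLED;
    }

    // Renders all eight input combinations, one per line.
    static String table() {
        StringBuilder sb = new StringBuilder();
        boolean[] values = {true, false};
        for (boolean checkpointing : values) {
            for (boolean commitOnCheckpoints : values) {
                for (boolean autoCommit : values) {
                    sb.append("checkpointing=").append(checkpointing)
                      .append(", commitOnCheckpoints=").append(commitOnCheckpoints)
                      .append(", autoCommit=").append(autoCommit)
                      .append(" -> ")
                      .append(fromConfiguration(autoCommit, commitOnCheckpoints, checkpointing))
                      .append('\n');
                }
            }
        }
        return sb.toString();
    }
}
```

Printing OffsetCommitModeDemo.table() makes one point easy to spot: as soon as checkpointing is on, the auto commit setting no longer influences the result.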

Committing offsets

■ Automatic commit

This approach relies entirely on Kafka's own behavior. It only requires setting the parameters as follows:


Properties properties = new Properties();
properties.put("enable.auto.commit", "true");
properties.setProperty("auto.commit.interval.ms", "1000");
new FlinkKafkaConsumer<>("foo", new KafkaEventSchema(), properties)

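■ Non-automatic commit

From the analysis above, if enable.auto.commit=false, offsetCommitMode is DISABLED.

The official Kafka documentation states that when enable.auto.commit=false, offsets must be committed manually, that is, by calling consumer.commitSync();.

In Flink, however, consumer.commitSync(); is not called when checkpointing is off. Once auto commit is turned off, Kafka no longer knows how far the current consumer group has consumed.

This can be confirmed in two ways:

Source code
KafkaConsumerThread#run does contain a commit call (consumer.commitAsync), but it is executed only when commitOffsetsAndCallback != null, and that variable is non-null only when checkpointing is enabled. It is analyzed in detail later in this article.
Testing
a. Consume __consumer_offsets and check whether any offsets are being committed.
b. Restart the program: data that was already consumed is consumed again.

Summary: this section described how the Flink Kafka source commits offsets without checkpointing. The next section focuses on the commit flow with checkpointing.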

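Offset committing with checkpointing

The previous section covered how offsets are committed when checkpointing is not enabled. This section focuses on how the Flink Kafka consumer commits offsets once checkpointing is enabled.

Initializing offsetCommitMode

As shown above, once env.enableCheckpointing has been called, offsetCommitMode is ON_CHECKPOINTS (given the default enableCommitOnCheckpoints=true), and Kafka's auto commit is forcibly turned off by the method below. The offsetCommitMode value matters: many later decisions depend on it.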


/**
 * Make sure that auto commit is disabled when our offset commit mode is ON_CHECKPOINTS.
 * This overwrites whatever setting the user configured in the properties.
 * @param properties - Kafka configuration properties to be adjusted
 * @param offsetCommitMode offset commit mode
 */
static void adjustAutoCommitConfig(Properties properties, OffsetCommitMode offsetCommitMode) {
   if (offsetCommitMode == OffsetCommitMode.ON_CHECKPOINTS || offsetCommitMode == OffsetCommitMode.DISABLED) {
      properties.setProperty(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
   }
}
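
The effect of this override on a user's configuration can be reproduced with plain java.util.Properties. The sketch below is an illustration only: AutoCommitOverrideDemo and effectiveAutoCommit are hypothetical names, and the property key is written out as the string enable.auto.commit.

```java
import java.util.Properties;

class AutoCommitOverrideDemo {
    enum Mode { DISABLED, ON_CHECKPOINTS, KAFKA_PERIODIC }

    // Same rule as adjustAutoCommitConfig: for ON_CHECKPOINTS and DISABLED,
    // the user's auto commit setting is overwritten with "false".
    static void adjustAutoCommitConfig(Properties properties, Mode mode) {
        if (mode == Mode.ON_CHECKPOINTS || mode == Mode.DISABLED) {
            properties.setProperty("enable.auto.commit", "false");
        }
    }

    // Hypothetical helper: returns the value the Kafka client would finally see.
    static String effectiveAutoCommit(String userValue, Mode mode) {
        Properties p = new Properties();
        p.setProperty("enable.auto.commit", userValue);
        adjustAutoCommitConfig(p, mode);
        return p.getProperty("enable.auto.commit");
    }
}
```

Even if the user sets enable.auto.commit to true, effectiveAutoCommit("true", Mode.ON_CHECKPOINTS) yields false, while KAFKA_PERIODIC leaves the user's value untouched.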

Saving offsets

When a checkpoint is taken, FlinkKafkaConsumerBase#snapshotState is called, and pendingOffsetsToCommit stores the offsets to be committed.


if (offsetCommitMode == OffsetCommitMode.ON_CHECKPOINTS) {
   // the map cannot be asynchronously updated, because only one checkpoint call can happen
   // on this function at a time: either snapshotState() or notifyCheckpointComplete()
   pendingOffsetsToCommit.put(context.getCheckpointId(), currentOffsets);
}

The following variable is also saved as part of the checkpoint so that it can be used on recovery.


/** Accessor for state in the operator state backend. */
private transient ListState<Tuple2<KafkaTopicPartition, Long>> unionOffsetStates;

The snapshotState method saves the offsets into it as well:


for (Map.Entry<KafkaTopicPartition, Long> subscribedPartition : subscribedPartitionsToStartOffsets.entrySet()) {
    unionOffsetStates.add(Tuple2.of(subscribedPartition.getKey(), subscribedPartition.getValue()));
}

Committing offsets

After a checkpoint completes, the task calls notifyCheckpointComplete. When offsetCommitMode == OffsetCommitMode.ON_CHECKPOINTS, it calls fetcher.commitInternalOffsetsToKafka(offsets, offsetCommitCallback);. In KafkaFetcher#doCommitInternalOffsetsToKafka, the offsets to commit are then stored, via consumerThread.setOffsetsToCommit(offsetsToCommit, commitCallback);, in the nextOffsetsToCommit member variable of KafkaConsumerThread.java.

This guarantees that whenever there are offsets to commit, the code below executes consumer.commitAsync, which completes the manual commit of offsets to Kafka.


final Tuple2<Map<TopicPartition, OffsetAndMetadata>, KafkaCommitCallback> commitOffsetsAndCallback = nextOffsetsToCommit.getAndSet(null);

if (commitOffsetsAndCallback != null) {
  log.debug("Sending async offset commit request to Kafka broker");

  // also record that a commit is already in progress
  // the order here matters! first set the flag, then send the commit command.
  commitInProgress = true;
  consumer.commitAsync(commitOffsetsAndCallback.f0, new CommitCallback(commitOffsetsAndCallback.f1));
}
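
The whole path from snapshot to commit can be summarized in a simplified, single-threaded model. This is a sketch under stated assumptions, not Flink's code: CheckpointCommitModel is a hypothetical class, partitions are represented by plain strings, the "broker" is a list, and dropping entries of older checkpoints on completion is an assumption of this model.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicReference;

class CheckpointCommitModel {
    // checkpointId -> offsets captured when that checkpoint was taken
    private final TreeMap<Long, Map<String, Long>> pendingOffsetsToCommit = new TreeMap<>();
    // single-slot mailbox that the (simulated) consumer loop polls
    private final AtomicReference<Map<String, Long>> nextOffsetsToCommit = new AtomicReference<>();
    // what the simulated broker has received so far
    final List<Map<String, Long>> committed = new ArrayList<>();

    // Step 1: remember the offsets that belong to this checkpoint.
    void snapshotState(long checkpointId, Map<String, Long> currentOffsets) {
        pendingOffsetsToCommit.put(checkpointId, new HashMap<>(currentOffsets));
    }

    // Step 2: once the checkpoint is complete, hand its offsets to the consumer loop.
    void notifyCheckpointComplete(long checkpointId) {
        Map<String, Long> offsets = pendingOffsetsToCommit.get(checkpointId);
        if (offsets == null) {
            return; // unknown checkpoint, nothing to commit
        }
        // Assumption of this sketch: this entry and all older ones are no longer needed.
        pendingOffsetsToCommit.headMap(checkpointId, true).clear();
        nextOffsetsToCommit.set(offsets);
    }

    // Step 3: one loop iteration; commits only if something is waiting in the mailbox.
    boolean consumerLoopIteration() {
        Map<String, Long> toCommit = nextOffsetsToCommit.getAndSet(null);
        if (toCommit == null) {
            return false;
        }
        committed.add(toCommit);
        return true;
    }

    int pendingCount() {
        return pendingOffsetsToCommit.size();
    }
}
```

The getAndSet(null) call is the important detail: it takes the waiting offsets and empties the slot in one atomic step, so the same offsets are never committed twice and a loop iteration without a completed checkpoint commits nothing.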

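Summary: this section described how the Flink Kafka source commits offsets with checkpointing. The next section describes how the consumer determines the offset to read from.

Consuming from a specified offset

Startup modes

The Flink Kafka source offers the following five modes for specifying the start offset: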


public enum StartupMode {

   /** Start from committed offsets in ZK / Kafka brokers of a specific consumer group (default). */
   GROUP_OFFSETS(KafkaTopicPartitionStateSentinel.GROUP_OFFSET),

   /** Start from the earliest offset possible. */
   EARLIEST(KafkaTopicPartitionStateSentinel.EARLIEST_OFFSET),

   /** Start from the latest offset. */
   LATEST(KafkaTopicPartitionStateSentinel.LATEST_OFFSET),

   /**
    * Start from user-supplied timestamp for each partition.
    * Since this mode will have specific offsets to start with, we do not need a sentinel value;
    * using Long.MIN_VALUE as a placeholder.
    */
   TIMESTAMP(Long.MIN_VALUE),

   /**
    * Start from user-supplied specific offsets for each partition.
    * Since this mode will have specific offsets to start with, we do not need a sentinel value;
    * using Long.MIN_VALUE as a placeholder.
    */
   SPECIFIC_OFFSETS(Long.MIN_VALUE);
}

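The default is GROUP_OFFSETS, which starts from the offset last committed for the group id. Each enum value carries a long that is a negative number, and depending on the mode, the offset of every partition is initialized to that negative number. The other modes behave like their Kafka counterparts and are not repeated here.

Specifying the offset

Only the default GROUP_OFFSETS mode is discussed here, and all of the analysis below is based on it. It is still necessary to distinguish whether checkpointing is enabled. Before starting, several important variables need to be explained:

subscribedPartitionsToStartOffsets

a. Class: FlinkKafkaConsumerBase.java
b. Definition: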


/** The set of topic partitions that the source will read, with their initial offsets to start reading from. */
private Map<KafkaTopicPartition, Long> subscribedPartitionsToSt

c. Description: holds every partition of the subscribed topics together with its initial offset.

subscribedPartitionStates

a. Class: AbstractFetcher.java
b. Definition:


/** All partitions (and their state) that this fetcher is subscribed to. */
private final List<KafkaTopicPartitionState<KPH>> subscribedPar

c. Description: holds detailed information, such as offsets, for every subscribed partition, for example:


/** The offset within the Kafka partition that we already processed. */
private volatile long offset;
/** The offset of the Kafka partition that has been committed. */
private volatile long committedOffset;

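These values are updated each time data is consumed. This variable is very important: the offsets and related information saved during a checkpoint all come from it. It is initialized as follows: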


// initialize subscribed partition states with seed partitions
this.subscribedPartitionStates = createPartitionStateHolders(
  seedPartitionsWithInitialOffsets,
  timestampWatermarkMode,
  watermarksPeriodic,
  watermarksPunctuated,
  userCodeClassLoader);

After records are consumed, the corresponding offset is updated mainly in the while loop of KafkaFetcher#runFetchLoop, through the call emitRecord(value, partition, record.offset(), record);.

restoredState

a. Class: FlinkKafkaConsumerBase.java
b. Definition:


/**
     * The offsets to restore to, if the consumer restores state from a checkpoint.
     *
     * <p>This map will be populated by the {@link #initializeState(FunctionInitializationContext)} method.
     *
     * <p>Using a sorted map as the ordering is important when using restored state
     * to seed the partition discoverer.
     */
private transient volatile TreeMap<KafkaTopicPartition, Long> restoredState;

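c. Description: if a checkpoint path to restore from is specified, the start offsets are read from this variable at startup instead of using the StartupMode enum value as the initial offset.

unionOffsetStates

a. Class: FlinkKafkaConsumerBase.java
b. Definition: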


/** Accessor for state in the operator state backend. */
private transient ListState<Tuple2<KafkaTopicPartition, Long>> unionOffsetStates;

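c. Description: holds what the checkpoint persists, for example the offset already consumed for each partition.

■ Without checkpointing

When checkpointing is not enabled, consumption relies entirely on Kafka's own mechanism.

■ With checkpointing

Once checkpointing is enabled, offsets and related information are persisted so that they can be used on recovery. After a job restart, however, the checkpoint result may be unreadable for some reason, for example because the checkpoint files were lost or no restore path was specified. Two cases therefore have to be distinguished.

Case 1: the checkpoint content cannot be read

subscribedPartitionsToStartOffsets initializes the start offset of every partition to -915623761773L, a value that indicates GROUP_OFFSETS mode.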


default:
   for (KafkaTopicPartition seedPartition : allPartitions) {
      subscribedPartitionsToStartOffsets.put(seedPartition, startupMode.getStateSentinel());
   }

Before the first poll, the key method that determines the read position is KafkaConsumerThread#reassignPartitions. A fragment:


for (KafkaTopicPartitionState<TopicPartition> newPartitionState : newPartitions) {
  if (newPartitionState.getOffset() == KafkaTopicPartitionStateSentinel.EARLIEST_OFFSET) {
    consumerTmp.seekToBeginning(Collections.singletonList(newPartitionState.getKafkaPartitionHandle()));
    newPartitionState.setOffset(consumerTmp.position(newPartitionState.getKafkaPartitionHandle()) - 1);
  } else if (newPartitionState.getOffset() == KafkaTopicPartitionStateSentinel.LATEST_OFFSET) {
    consumerTmp.seekToEnd(Collections.singletonList(newPartitionState.getKafkaPartitionHandle()));
    newPartitionState.setOffset(consumerTmp.position(newPartitionState.getKafkaPartitionHandle()) - 1);
  } else if (newPartitionState.getOffset() == KafkaTopicPartitionStateSentinel.GROUP_OFFSET) {
    // the KafkaConsumer by default will automatically seek the consumer position
    // to the committed group offset, so we do not need to do it.
    newPartitionState.setOffset(consumerTmp.position(newPartitionState.getKafkaPartitionHandle()) - 1);
  } else {
    consumerTmp.seek(newPartitionState.getKafkaPartitionHandle(), newPartitionState.getOffset() + 1);
  }
}

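Because the mode is GROUP_OFFSETS, newPartitionState.setOffset(consumerTmp.position(newPartitionState.getKafkaPartitionHandle()) - 1); is called. Note that the state has to store the offset of the last successfully consumed record, whereas position returns the offset of the next record to consume, hence the subtraction of 1. The state is updated here so that a checkpoint can pick up the correct offset.

In this case the result of the previous checkpoint cannot be read, so consumption still relies on Kafka's own mechanism, that is, on what is recorded in __consumer_offsets.

Case 2: the checkpoint can be read

In this case the initial offsets in subscribedPartitionsToStartOffsets are the values restored from the checkpoint, so the branch that KafkaConsumerThread#reassignPartitions actually takes is: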


consumerTmp.seek(newPartitionState.getKafkaPartitionHandle(), newPartitionState.getOffset() + 1);

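The reason for adding 1 is the same as above: the state stores the offset of the last successfully consumed record, so adding 1 gives the offset at which consumption should now start.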
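
The minus 1 and plus 1 conversions in both cases can be checked with a small model of a single partition. This is a hedged sketch: StartOffsetModel and Partition are hypothetical, GROUP_OFFSET uses the value quoted in the text, and the EARLIEST_OFFSET and LATEST_OFFSET constants are placeholder negatives chosen for this example rather than Flink's real sentinel values.

```java
class StartOffsetModel {
    // GROUP_OFFSET is the value quoted in the text; the other two are
    // placeholders for this sketch, not the real sentinel values.
    static final long EARLIEST_OFFSET = -1L;
    static final long LATEST_OFFSET = -2L;
    static final long GROUP_OFFSET = -915623761773L;

    // Simplified view of one partition as seen by a consumer.
    static final class Partition {
        final long beginning;      // first available offset
        final long end;            // offset the next written record will get
        final long committedNext;  // next offset to read according to the consumer group

        Partition(long beginning, long end, long committedNext) {
            this.beginning = beginning;
            this.end = end;
            this.committedNext = committedNext;
        }
    }

    // Returns the position (next offset to read) for a given state value.
    static long resolvePosition(long stateOffset, Partition p) {
        if (stateOffset == EARLIEST_OFFSET) {
            return p.beginning;
        } else if (stateOffset == LATEST_OFFSET) {
            return p.end;
        } else if (stateOffset == GROUP_OFFSET) {
            return p.committedNext;
        }
        // Restored from a checkpoint: the state holds the last processed offset.
        return stateOffset + 1;
    }

    // What is written back into the partition state: the last processed offset.
    static long toStateOffset(long position) {
        return position - 1;
    }
}
```

For a partition whose group position is 42, case 1 resolves to position 42 and stores 41 in the state; if that 41 is later restored from a checkpoint, case 2 seeks to 41 + 1 = 42 again, so no record is skipped or read twice at the boundary.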

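Summary: this section described how the start offset is determined when the program starts. The next part analyzes the source code of the Flink Kafka sink.

2. Flink-kafka-sink source code analysis

Initialization

A Kafka sink is typically added as follows: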


input.addSink(
   new FlinkKafkaProducer<>(
      "bar",
      new KafkaSerializationSchemaImpl(),
      properties,
      FlinkKafkaProducer.Semantic.AT_LEAST_ONCE)).name("Example Sink");

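When input.addSink is executed, a StreamSink object is created: StreamSink<T> sinkOperator = new StreamSink<>(clean(sinkFunction));. Here sinkFunction is the FlinkKafkaProducer object that was passed in. The StreamSink constructor hands this object to the userFunction field of its parent class AbstractUdfStreamOperator, as the source shows: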

■ StreamSink.java


public StreamSink(SinkFunction<IN> sinkFunction) {
  super(sinkFunction);
  chainingStrategy = ChainingStrategy.ALWAYS;
}

■ AbstractUdfStreamOperator.java


public AbstractUdfStreamOperator(F userFunction) {
   this.userFunction = requireNonNull(userFunction);
   checkUdfCheckpointingPreconditions();
}

Task execution

StreamSink calls the following method to send data:


@Override
public void processElement(StreamRecord<IN> element) throws Exception {
   sinkContext.element = element;
   userFunction.invoke(element.getValue(), sinkContext);
}

In other words, the method actually called is FlinkKafkaProducer#invoke. The FlinkKafkaProducer constructor requires a FlinkKafkaProducer.Semantic to be specified:


public enum Semantic {
   EXACTLY_ONCE,
   AT_LEAST_ONCE,
   NONE
}

The overall flow of sending data to Kafka is described below for each of the three semantics.

■ Semantic.NONE

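This mode performs no additional work and relies entirely on the behavior of the Kafka producer itself: after FlinkKafkaProducer#invoke sends a record, Flink no longer checks whether Kafka has received it correctly.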


transaction.producer.send(record, callback);

■ Semantic.AT_LEAST_ONCE

With this semantic, in addition to the send flow described above, if checkpointing is enabled, FlinkKafkaProducer#snapshotState first executes the parent class's snapshotState method, which eventually executes FlinkKafkaProducer#preCommit.


@Override
protected void preCommit(FlinkKafkaProducer.KafkaTransactionState transaction) throws FlinkKafkaException {
   switch (semantic) {
      case EXACTLY_ONCE:
      case AT_LEAST_ONCE:
         flush(transaction);
         break;
      case NONE:
         break;
      default:
         throw new UnsupportedOperationException("Not implemented semantic");
   }
   checkErroneous();
}

AT_LEAST_ONCE executes the flush method, which in turn executes:


transaction.producer.flush();

This immediately sends the records passed to send to the Kafka brokers. For details, see the KafkaProducer API: http://kafka.apache.org/23/javadoc/index.html?org/apache/kafka/clients/producer/KafkaProducer.html

flush()
Invoking this method makes all buffered records immediately available to send (even if linger.ms is greater than 0) and blocks on the completion of the requests associated with these records.
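
Why flushing at checkpoint time matters for AT_LEAST_ONCE can be illustrated with a toy producer that buffers records until they are flushed. This is a simplified sketch, not the Kafka client: FlushOnCheckpointModel is a hypothetical class, and "acknowledged" simply means the record has left the buffer.

```java
import java.util.ArrayList;
import java.util.List;

class FlushOnCheckpointModel {
    enum Semantic { EXACTLY_ONCE, AT_LEAST_ONCE, NONE }

    private final Semantic semantic;
    private final List<String> buffered = new ArrayList<>();     // sent, not yet acknowledged
    private final List<String> acknowledged = new ArrayList<>(); // confirmed by the simulated broker

    FlushOnCheckpointModel(Semantic semantic) {
        this.semantic = semantic;
    }

    // Corresponds to invoke(): the record is only handed to the producer's buffer.
    void invoke(String record) {
        buffered.add(record);
    }

    // Corresponds to preCommit(): called while a checkpoint is being taken.
    void preCommit() {
        switch (semantic) {
            case EXACTLY_ONCE:
            case AT_LEAST_ONCE:
                flush();
                break;
            case NONE:
                break;
        }
    }

    // Blocks (conceptually) until everything buffered has been acknowledged.
    private void flush() {
        acknowledged.addAll(buffered);
        buffered.clear();
    }

    int unacknowledgedCount() {
        return buffered.size();
    }

    int acknowledgedCount() {
        return acknowledged.size();
    }
}
```

With AT_LEAST_ONCE, no record is still in flight when the checkpoint completes, so a failure after the checkpoint cannot lose data that the checkpoint already counts as processed. With NONE, records may still sit in the buffer at that moment.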

■ Semantic.EXACTLY_ONCE

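The EXACTLY_ONCE semantic also executes send and flush, but it additionally enables the Kafka producer's transaction mechanism. The source of beginTransaction in FlinkKafkaProducer is shown below; a transaction is actually started only in EXACTLY_ONCE mode.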


@Override
protected FlinkKafkaProducer.KafkaTransactionState beginTransaction() throws FlinkKafkaException {
   switch (semantic) {
      case EXACTLY_ONCE:
         FlinkKafkaInternalProducer<byte[], byte[]> producer = createTransactionalProducer();
         producer.beginTransaction();
         return new FlinkKafkaProducer.KafkaTransactionState(producer.getTransactionalId(), producer);
      case AT_LEAST_ONCE:
      case NONE:
         // Do not create new producer on each beginTransaction() if it is not necessary
         final FlinkKafkaProducer.KafkaTransactionState currentTransaction = currentTransaction();
         if (currentTransaction != null && currentTransaction.producer != null) {
            return new FlinkKafkaProducer.KafkaTransactionState(currentTransaction.producer);
         }
         return new FlinkKafkaProducer.KafkaTransactionState(initNonTransactionalProducer(true));
      default:
         throw new UnsupportedOperationException("Not implemented semantic");
   }
}

Another difference from AT_LEAST_ONCE is that, at checkpoint time, transaction-related information is saved to the variable nextTransactionalIdHintState, and what it stores is persisted as part of the checkpoint.


if (getRuntimeContext().getIndexOfThisSubtask() == 0 && semantic == FlinkKafkaProducer.Semantic.EXACTLY_ONCE) {
   checkState(nextTransactionalIdHint != null, "nextTransactionalIdHint must be set for EXACTLY_ONCE");
   long nextFreeTransactionalId = nextTransactionalIdHint.nextFreeTransactionalId;

   // If we scaled up, some (unknown) subtask must have created new transactional ids from scratch. In that
   // case we adjust nextFreeTransactionalId by the range of transactionalIds that could be used for this
   // scaling up.
   if (getRuntimeContext().getNumberOfParallelSubtasks() > nextTransactionalIdHint.lastParallelism) {
      nextFreeTransactionalId += getRuntimeContext().getNumberOfParallelSubtasks() * kafkaProducersPoolSize;
   }

   nextTransactionalIdHintState.add(new FlinkKafkaProducer.NextTransactionalIdHint(
      getRuntimeContext().getNumberOfParallelSubtasks(),
      nextFreeTransactionalId));
}
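
The adjustment of nextFreeTransactionalId on scale-up is plain arithmetic and can be isolated as a pure function. The sketch below is for illustration: TransactionalIdHintDemo and nextFreeIdToStore are hypothetical names, and the numbers used in the example afterwards are made up.

```java
class TransactionalIdHintDemo {
    // Same adjustment as in the snippet above, expressed as a pure function:
    // when the job was scaled up, skip a whole range of ids that other
    // subtasks may have started using.
    static long nextFreeIdToStore(long nextFreeTransactionalId,
                                  int lastParallelism,
                                  int currentParallelism,
                                  int kafkaProducersPoolSize) {
        if (currentParallelism > lastParallelism) {
            nextFreeTransactionalId += (long) currentParallelism * kafkaProducersPoolSize;
        }
        return nextFreeTransactionalId;
    }
}
```

For example, with a stored hint of 10, a previous parallelism of 2, a current parallelism of 4 and a pool size of 5, the value written to state is 10 + 4 * 5 = 30. If the parallelism is unchanged or reduced, the hint stays at 10.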

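Summary: this section described the basic implementation of the Flink Kafka producer. A follow-up will describe in detail how Flink achieves end-to-end exactly-once semantics in combination with Kafka.

About the author:

Wu Peng is a senior engineer at AsiaInfo Technologies and an Apache Flink contributor. He previously worked at ZTE, IBM, and Huawei, and is currently responsible for the development of the real-time stream processing engine product at AsiaInfo Technologies.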
