An Incomplete Guide to Kafka Transactions

What Is a Kafka Transaction

A producer writes messages to multiple topics, and the writes either all succeed or all fail together.

Why Transactions Are Needed

Messaging systems offer three delivery semantics:

  1. At most once
  2. At least once
  3. Exactly once

To achieve exactly-once semantics, Kafka has to introduce transactions. Consider the flow in the figure below:

The application consumes messages from an upstream topic, processes them, and sends the results to a downstream topic, while also saving its processing progress to the __consumer_offsets topic. The writes to these two topics are one atomic operation.

[Figure: the read-process-write cycle, with the output write and the offset commit made atomic]

Without transactions, messages may be duplicated or lost when any of the following happens:

  • A broker can fail
  • The producer-to-broker RPC can fail
  • The client can fail

Zombie instances

In a distributed system, a primary node may be kicked out by the consensus protocol after a transient network failure, recover a moment later, and try to do the same work as the current primary. That former primary is a zombie instance.

Zombie fencing

We solve the problem of zombie instances by requiring that each transactional producer be assigned a unique identifier called the transactional.id. This is used to identify the same producer instance across process restarts.

The transactional.id survives process restarts. In other words, when the primary node goes down, a standby node may pick up the primary's transactional.id and carry on. That is what it means for the transactional id to be "consistent across producer sessions".

Code

// Register the transactional.id with the coordinator (once, at startup)
producer.initTransactions()

// Read and validate deposits
validatedDeposits = validate(consumer.poll(0))

// Send validated deposits & commit offsets atomically
producer.beginTransaction()
producer.send(validatedDeposits)
producer.sendOffsetsToTransaction(offsets(consumer))
producer.commitTransaction()
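For a fuller picture, here is a self-contained sketch of the same read-process-write loop using the Java client. The broker address, topic names (deposits-raw, deposits-validated), group id, and transactional.id are all hypothetical, and "validation" is reduced to a pass-through:

import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.TopicPartition;

import java.time.Duration;
import java.util.*;

public class TransactionalValidator {
    public static void main(String[] args) {
        Properties pp = new Properties();
        pp.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        pp.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "deposit-validator-0");
        pp.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
               "org.apache.kafka.common.serialization.StringSerializer");
        pp.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
               "org.apache.kafka.common.serialization.StringSerializer");

        Properties cp = new Properties();
        cp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        cp.put(ConsumerConfig.GROUP_ID_CONFIG, "deposit-validators");
        cp.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        cp.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // offsets travel inside the transaction
        cp.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
               "org.apache.kafka.common.serialization.StringDeserializer");
        cp.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
               "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(pp);
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp)) {
            consumer.subscribe(Collections.singletonList("deposits-raw"));
            producer.initTransactions(); // registers the transactional.id, fences older sessions

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                if (records.isEmpty()) continue;

                producer.beginTransaction();
                try {
                    Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                    for (ConsumerRecord<String, String> r : records) {
                        // "validation" is a pass-through in this sketch
                        producer.send(new ProducerRecord<>("deposits-validated", r.key(), r.value()));
                        offsets.put(new TopicPartition(r.topic(), r.partition()),
                                    new OffsetAndMetadata(r.offset() + 1));
                    }
                    // The offset commit joins the transaction: output + progress, atomically
                    producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                    producer.commitTransaction();
                } catch (Exception e) {
                    // For fatal errors (e.g. ProducerFencedException) close instead of retrying
                    producer.abortTransaction();
                }
            }
        }
    }
}

Note that setting transactional.id implicitly enables idempotence, and enable.auto.commit must be off because offsets are committed through the transaction instead of by the consumer.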

How Transactions Work


The Transaction Coordinator and Transaction Log

The transaction coordinator is a module running inside every Kafka broker. The transaction log is an internal Kafka topic. Each coordinator owns some subset of the partitions in the transaction log, i.e., the partitions for which its broker is the leader.

Every transactional.id is mapped to a specific partition of the transaction log through a simple hashing function. This means that exactly one coordinator owns a given transactional.id.

In short: every broker runs a transaction coordinator, and the transaction log is an internal topic.
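The mapping itself is easy to picture. A minimal sketch, assuming the transaction log is the __transaction_state topic with its default 50 partitions; the broker's actual code does the equivalent of abs(hashCode) modulo the partition count:

public final class CoordinatorLookup {
    // Which partition of the transaction log (and thus which coordinator)
    // owns a given transactional.id
    static int partitionFor(String transactionalId, int transactionLogPartitions) {
        return Math.abs(transactionalId.hashCode() % transactionLogPartitions);
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("deposit-validator-0", 50));
    }
}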

Data flow

The data flow has four phases:

  1. The producer and the transaction coordinator interact

    1. The producer registers its transactional.id with the coordinator via the initTransactions API. At this point, the coordinator closes any pending transactions with the same transactional.id and bumps the epoch so that zombies are fenced out
    2. The first time the producer sends to a partition within a transaction, that partition is registered with the coordinator
    3. When the application calls commit or abort, the client sends a request to the coordinator, and the coordinator begins the two-phase commit
  2. The coordinator and the transaction log interact

    The coordinator is the only component that reads and writes the transaction log.

  3. The producer writes data to the target partitions

  4. The coordinator and the target partitions interact

    When the producer commits or aborts, the coordinator carries out the two-phase commit (a simplified state-machine sketch follows after this list):

    1. Update the transaction state to "prepare_commit" in memory and persist it to the transaction log
    2. Write commit markers to the topic-partitions involved in the transaction
    3. Update the transaction state to "complete"
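A simplified model of the coordinator-side states described above may help. The state names follow what the coordinator persists; the real implementation also tracks epochs, timeouts, the set of registered partitions, and the abort-side states:

enum TxnState { EMPTY, ONGOING, PREPARE_COMMIT, COMPLETE_COMMIT }

final class CoordinatorSketch {
    private TxnState state = TxnState.EMPTY;

    void addPartition(String topicPartition) {
        state = TxnState.ONGOING;           // phase 1.2: partition registered
    }

    void commit() {
        state = TxnState.PREPARE_COMMIT;    // 4.1: persisted to the transaction log
        writeCommitMarkers();               // 4.2: markers to every involved partition
        state = TxnState.COMPLETE_COMMIT;   // 4.3: transaction is done
    }

    private void writeCommitMarkers() { /* broker-internal WriteTxnMarkers request */ }
}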

How to pick a transactional.id

The transactional.id plays a major role in fencing out zombies. But maintaining an identifier that is consistent across producer sessions and also fences out zombies properly is a bit tricky.

The key to fencing out zombies properly is to ensure that the input topics and partitions in the read-process-write cycle are always the same for a given transactional.id. If this isn't true, then it is possible for some messages to leak through the fencing provided by transactions.

For instance, in a distributed stream processing application, suppose topic-partition tp0 was originally processed by transactional.id T0. If, at some point later, it could be mapped to another producer with transactional.id T1, there would be no fencing between T0 and T1. So it is possible for messages from tp0 to be reprocessed, violating the exactly-once processing guarantee.

Practically, one would either have to store the mapping between input partitions and transactional.ids in an external store, or have some static encoding of it. Kafka Streams opts for the latter approach to solve this problem.
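A minimal sketch of the static-encoding idea: derive the transactional.id deterministically from the input partition, so whichever process takes over partition N always fences the previous owner of partition N. The naming scheme below is illustrative only; Kafka Streams builds its ids from the application id and task id:

// Illustrative naming scheme, not the Kafka Streams format
static String transactionalIdFor(String applicationId, String inputTopic, int partition) {
    return applicationId + "-" + inputTopic + "-" + partition;
}

// transactionalIdFor("deposit-validator", "deposits-raw", 3)
//   -> "deposit-validator-deposits-raw-3"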

How transactions perform, and how to tune them

Performance of the transactional producer

Transactions cause only moderate write amplification. The additional writes are due to:

  1. additional RPCs to register the partitions with the coordinator
  2. transaction markers written to each partition touched by the transaction
  3. state changes written to the transaction log

The smaller the commit batch and the shorter the commit interval, the higher the overhead. For 1 KB messages committed every 100 ms, throughput drops by only about 3%. The trade-off is that larger batches (longer intervals) add end-to-end latency.

For consumers, transactions add almost no performance cost.

Idempotent producer vs. transactional producer

  • An idempotent producer's PID is not persisted; it changes after a restart.
  • A transactional producer is identified by its transactional.id, which stays the same across restarts. The transactional.id's main job is to let the producer keep the same PID after a restart.
  • Although the transactional.id survives a restart, every new producer session calls initTransactions() again, which bumps the epoch. If the old producer never fully died and comes back trying to commit a transaction with its stale epoch, the request fails, as sketched below.
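From the zombie's point of view, the failure looks like this (a sketch using the Java client): its next transactional call fails with ProducerFencedException, and the only safe reaction is to close the producer rather than retry:

import org.apache.kafka.common.errors.ProducerFencedException;

// Inside the old (zombie) producer's loop:
try {
    producer.beginTransaction();
    producer.send(record);
    producer.commitTransaction();   // stale epoch -> rejected by the coordinator
} catch (ProducerFencedException fenced) {
    // A newer session with the same transactional.id bumped the epoch;
    // this instance must stop, not retry.
    producer.close();
}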

Can a transactional consumer read plain, non-transactional messages

With isolation.level set to read_committed, a consumer sees only committed transactional messages, plus all non-transactional messages.

Note that non-transactional messages sent by an upstream producer that does not use transactions at all are also visible to a read_committed consumer.

How to implement a cross-data-source transaction: mixing Kafka and a DB

Consider the following logic:

[Figure: consume from the upstream topic; send result 1 to the downstream topic, write result 2 to the DB, and commit offsets to __consumer_offsets]

The application consumes a message from the upstream topic, sends result 1 to the downstream topic, writes result 2 to a database, and also commits its consumption progress to the __consumer_offsets topic. How can these three writes be made one transaction? Kafka transactions do not span data sources. The official documentation says:

When writing to an external system, the limitation is in the need to coordinate the consumer's position with what is actually stored as output. The classic way of achieving this would be to introduce a two-phase commit between the storage of the consumer position and the storage of the consumers output. But this can be handled more simply and generally by letting the consumer store its offset in the same place as its output.

—— Apache Kafka Documentation - 4.6 Message Delivery Semantics

The main restriction with Transactions is they only work in situations where both the input comes from Kafka and the output is written to a Kafka topic. If you are calling an external service (e.g., via HTTP), updating a database, writing to stdout, or anything other than writing to and from the Kafka broker, transactional guarantees won’t apply and calls can be duplicated. This is exactly how a transactional database works: the transaction works only within the confines of the database, but because Kafka is often used to link systems it can be a cause of confusion. Put another way, Kafka’s transactions are not inter-system transactions such as those provided by technologies that implement XA.

—— What Can't Transactions Do?

The official advice, "letting the consumer store its offset in the same place as its output", means keeping all outputs in a single data source: either write everything to the database (storing the consumer offsets there too) and rely on the database's transactions, or write everything to Kafka.

Following that idea, here is one possible design:

[Figure: result 2 also goes to Kafka; a dedicated persister thread consumes it and writes it to the DB together with its offset]

Result 2 is also written to Kafka, so result 1 + result 2 + the consumer offsets become a single Kafka transaction, which gives exactly-once consumption. A separate persister thread then consumes result 2 from Kafka and writes it to the database. When writing to the database, store the consumer offset in the same database transaction, so the database's own transaction guarantees exactly-once persistence.
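A minimal sketch of the persister's database step, assuming JDBC and a hypothetical schema with a results table and a kafka_offsets table keyed by (topic, part):

import org.apache.kafka.clients.consumer.ConsumerRecord;

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical schema:
//   results(id, payload)
//   kafka_offsets(topic, part, next_offset)  -- primary key (topic, part)
void persist(Connection conn, ConsumerRecord<String, String> rec) throws SQLException {
    conn.setAutoCommit(false);
    try (PreparedStatement insert = conn.prepareStatement(
             "INSERT INTO results(id, payload) VALUES (?, ?)");
         PreparedStatement offset = conn.prepareStatement(
             "UPDATE kafka_offsets SET next_offset = ? WHERE topic = ? AND part = ?")) {
        insert.setString(1, rec.key());
        insert.setString(2, rec.value());
        insert.executeUpdate();

        offset.setLong(1, rec.offset() + 1);  // where to resume after a crash
        offset.setString(2, rec.topic());
        offset.setInt(3, rec.partition());
        offset.executeUpdate();

        conn.commit();                         // row and offset land atomically
    } catch (SQLException e) {
        conn.rollback();
        throw e;
    }
}

On restart, the persister reads next_offset back from the database and seeks the consumer there, instead of trusting offsets stored in Kafka.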

References

  1. Kafka: The Definitive Guide (2nd Edition)
  2. Apache Kafka Documentation
  3. Transactions in Apache Kafka | Confluent
  4. Building Systems Using Transactions in Apache Kafka® (confluent.io)
  5. Exactly-once Semantics is Possible: Here's How Apache Kafka Does it (confluent.io)