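TiDB Binlog Source Code Reading Series (4): Introduction to the Pump Server

Author: satoru

The previous article described how TiDB sends binlogs to Pump through the Pump client. This article continues with the implementation of the Pump server. The corresponding source code is mainly in the pump/server.go file of the TiDB Binlog repository.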

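Starting the Pump Server

Server startup is implemented mainly by two functions: NewServer and (*Server).Start.

NewServer creates a Server instance from the configuration passed in and initializes the fields the Server needs in order to run. Some of the important fields are briefly described below:

metrics: a MetricClient, used to push metrics to the Prometheus Pushgateway periodically.
clusterID: every TiDB cluster has an ID. Services connected to the same TiDB cluster can use this ID to tell whether another service belongs to the same cluster.
pdCli: the PD client, used for service registration and discovery and for obtaining the Timestamp Oracle.
tiStore: used to connect to the TiDB storage engine. Here it is mainly used to query transaction-related information (the description of the corresponding interface in TiDB explains what it can do).
storage: Pump's storage implementation. Binlogs sent from TiDB are saved through it. The next article covers it in detail.

Once the Server has been initialized, the service can be started with (*Server).Start. To avoid losing binlogs, before it starts accepting binlog writes it registers the current Server with PD and makes sure that all running Drainers have observed the newly added Pump node. Besides starting the external services, this step also starts some auxiliary mechanisms that Pump needs in order to operate normally. They are described in more detail below.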

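Pump Server API

Pump Server exposes some services through gRPC. These interfaces are defined in tipb/pump.pb.go and consist of two methods: WriteBinlog and PullBinlogs.

WriteBinlog

As the name suggests, this is the interface for writing binlogs. It is what the Pump client in the previous article calls. The request sent by the client has the following format: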

type WriteBinlogReq struct {
  // The identifier of tidb-cluster, which is given at tidb startup.
  // Must specify the clusterID for each binlog to write.
  ClusterID uint64 `protobuf:"varint,1,opt,name=clusterID,proto3" json:"clusterID,omitempty"`
  // Payload bytes can be decoded back to binlog struct by the protobuf.
  Payload []byte `protobuf:"bytes,2,opt,name=payload,proto3" json:"payload,omitempty"`
}

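Payload is a binlog serialized with Protobuf. The main flow of WriteBinlog is to parse the Payload in the request into a binlog instance and then call storage.WriteBinlog to save it. storage.WriteBinlog persists the binlog and sorts binlogs by start TS / commit TS. The detailed implementation will be discussed in the next article.

PullBinlogs

PullBinlogs is the interface provided for Drainer, used to fetch binlogs in order. It is a streaming interface: after sending a request, the client gets a stream from which it can keep reading binlogs. The request has the following format: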

type PullBinlogReq struct {
  // Specifies which clusterID of binlog to pull.
  ClusterID uint64 `protobuf:"varint,1,opt,name=clusterID,proto3" json:"clusterID,omitempty"`
  // The position from which the binlog will be sent.
  StartFrom Pos `protobuf:"bytes,2,opt,name=startFrom" json:"startFrom"`
}

// Binlogs are stored in a number of sequential files in a directory.
// The Pos describes the position of a binlog.
type Pos struct {
  // The suffix of binlog file, like .000001 .000002
  Suffix uint64 `protobuf:"varint,1,opt,name=suffix,proto3" json:"suffix,omitempty"`
  // The binlog offset in a file.
  Offset int64 `protobuf:"varint,2,opt,name=offset,proto3" json:"offset,omitempty"`
}

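As the field name StartFrom suggests, this request specifies the point in time from which Drainer wants to start syncing binlogs. Although Pos has two fields, Suffix and Offset, only Offset is currently effective. It is used as a commit TS, meaning that only binlogs after this time are pulled.

The main flow of PullBinlogs is to call storage.PullCommitBinlogs to obtain a channel that yields serialized binlogs, and then send these binlogs to the client one by one through stream.Send.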
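The channel-to-stream forwarding described above can be sketched roughly as follows. This is a minimal illustration, not the code in pump/server.go: the binlogSender interface, the forwardBinlogs function, and the recorder type are hypothetical names, and the real gRPC stream and storage types are replaced by simple stand-ins.

```go
package main

import (
	"context"
	"fmt"
)

// binlogSender is a hypothetical stand-in for the server side of the gRPC
// stream; only the Send method matters for this sketch.
type binlogSender interface {
	Send(payload []byte) error
}

// forwardBinlogs sends each serialized binlog from the channel to the stream,
// in order, until the channel is closed, ctx is cancelled, or Send fails.
func forwardBinlogs(ctx context.Context, binlogs <-chan []byte, stream binlogSender) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case payload, ok := <-binlogs:
			if !ok {
				return nil // the producer closed the channel: nothing more to send
			}
			if err := stream.Send(payload); err != nil {
				return fmt.Errorf("send binlog: %w", err)
			}
		}
	}
}

// recorder is a fake stream that remembers what was sent.
type recorder struct{ sent []string }

func (r *recorder) Send(payload []byte) error {
	r.sent = append(r.sent, string(payload))
	return nil
}

// runPullDemo feeds three payloads through forwardBinlogs and returns what
// the fake stream received.
func runPullDemo() []string {
	binlogs := make(chan []byte, 3)
	for _, s := range []string{"b1", "b2", "b3"} {
		binlogs <- []byte(s)
	}
	close(binlogs)

	rec := &recorder{}
	if err := forwardBinlogs(context.Background(), binlogs, rec); err != nil {
		panic(err)
	}
	return rec.sent
}

func main() {
	fmt.Println(runPullDemo()) // prints [b1 b2 b3]
}
```

Because a single goroutine reads the channel and calls Send, the client receives the binlogs in the same order in which storage produced them.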

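Auxiliary Mechanisms

As mentioned above, Pump needs some auxiliary mechanisms in order to operate normally. This section introduces them one by one.

fake binlog

The article "TiDB-Binlog Architecture Evolution and Implementation Principles" describes the fake binlog mechanism as follows:

"Pump periodically (every three seconds by default) writes a binlog with empty data to local storage. Before generating this binlog, it obtains a tso from PD and uses it as both the start_ts and the commit_ts of the binlog. We call this kind of binlog a fake binlog.
... Drainer merge-sorts the binlogs in the way shown above and advances the synchronization position. The following situation may arise: a Pump, for some special reason, receives no binlog data for a long time, so the merge sort in Drainer cannot continue, just as when walking on two legs we cannot move forward if one leg does not move. We use the fake binlog mechanism mentioned in the Pump section to avoid this problem: Pump generates a fake binlog at a specified interval, so even if some Pumps never have data written to them, the merge sort is guaranteed to keep moving forward."

genForwardBinlog implements this mechanism. It contains a timed loop that checks at a fixed interval (3 seconds by default, configurable through the gen-binlog-interval option) whether any new binlog has been written. If not, it calls writeFakeBinlog to write a fake binlog.

Whether a new binlog has been written is determined through the variable lastWriteBinlogUnixNano, which is set to the current time on every new write.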
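One way to implement this kind of idle check is to remember the value of the timestamp variable seen at the previous check and compare it with the current value. The sketch below illustrates that idea under stated assumptions: fakeBinlogChecker, onWrite, and check are hypothetical names, and the counter stands in for the call to writeFakeBinlog. The actual loop in Pump may differ in its details.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// fakeBinlogChecker is a hypothetical, simplified model of the periodic check.
type fakeBinlogChecker struct {
	lastWriteUnixNano int64 // set on every real write, accessed atomically
	seenAtLastCheck   int64 // value observed by the previous check
	fakeWritten       int   // number of fake binlogs "written" so far
}

// onWrite is what the write path would call for every real binlog.
func (c *fakeBinlogChecker) onWrite(now time.Time) {
	atomic.StoreInt64(&c.lastWriteUnixNano, now.UnixNano())
}

// check runs once per interval. If the timestamp has not changed since the
// previous check, no real binlog arrived, so a fake one is needed.
func (c *fakeBinlogChecker) check() bool {
	current := atomic.LoadInt64(&c.lastWriteUnixNano)
	idle := current == c.seenAtLastCheck
	if idle {
		c.fakeWritten++ // stand-in for writing a fake binlog
	}
	c.seenAtLastCheck = current
	return idle
}

// runFakeBinlogDemo simulates three intervals with one real write in between.
func runFakeBinlogDemo() []bool {
	c := &fakeBinlogChecker{}
	results := []bool{c.check()} // no writes yet: fake binlog needed
	c.onWrite(time.Unix(1, 0))
	results = append(results, c.check()) // a write happened: no fake binlog
	results = append(results, c.check()) // idle again: fake binlog needed
	return results
}

func main() {
	fmt.Println(runFakeBinlogDemo()) // prints [true false true]
}
```

In a real server, check would be driven by a timer inside a loop that also watches the server context, so that the loop exits on shutdown.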

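Garbage Collection

Because storage capacity is limited, Pump obviously cannot keep received binlogs indefinitely, so a GC (Garbage Collection) mechanism is needed to clean up binlogs that are no longer useful and free space. gcBinlogFile is responsible for scheduling GC. Two values affect GC scheduling:

gcInterval: controls how often the GC check runs. It is currently hard-coded to 1 hour.
gcDuration: how long binlogs are retained. Each GC check computes a GC time point from the current time and gcDuration, and binlogs older than that point are garbage collected.

In the gcBinlogFile loop, a select watches for three cases: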

select {
case <-s.ctx.Done():
  log.Info("gcBinlogFile exit")
  return
case <-s.triggerGC:
  log.Info("trigger gc now")
case <-time.After(gcInterval):
}

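The three cases correspond to server exit, an externally triggered GC, and the periodic check. When the server exits, the loop returns immediately. In the other two cases execution continues: the GC time point is computed and passed to storage.GC for execution.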
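The computation of the GC time point amounts to subtracting the retention time from the current time. A minimal sketch, assuming a hypothetical helper named gcPoint and an illustrative retention of 7 days (the actual value comes from Pump's configuration):

```go
package main

import (
	"fmt"
	"time"
)

// gcPoint returns the cutoff time: binlogs older than this are eligible
// for garbage collection. The name is hypothetical.
func gcPoint(now time.Time, gcDuration time.Duration) time.Time {
	return now.Add(-gcDuration)
}

// expired reports whether a binlog written at binlogTime falls before the cutoff.
func expired(binlogTime, cutoff time.Time) bool {
	return binlogTime.Before(cutoff)
}

// gcDemo computes the cutoff for a fixed "now" and an assumed 7-day retention.
func gcDemo() string {
	now := time.Date(2019, 6, 8, 12, 0, 0, 0, time.UTC)
	cutoff := gcPoint(now, 7*24*time.Hour)
	return cutoff.Format(time.RFC3339)
}

func main() {
	fmt.Println(gcDemo()) // prints 2019-06-01T12:00:00Z
}
```

Both the timer case and the manual trigger case lead to the same computation, which is why the select only needs to distinguish "exit" from "run a GC check now".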

Heartbeat

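The heartbeat mechanism periodically (every two seconds by default) sends the latest state of the Server to PD. It is implemented by (*pumpNode).HeartBeat. The state is a JSON-encoded Status instance that mainly records information such as NodeID and MaxCommitTS.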
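To make the shape of such a heartbeat payload concrete, here is a small sketch of JSON-encoding a status record. The struct, its State field, the JSON key names, and the sample values are assumptions made for illustration. Only NodeID and MaxCommitTS are named in the text, and the real Status type may contain more fields and use different keys.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// status is a hypothetical, reduced version of the node status record.
type status struct {
	NodeID      string `json:"nodeId"`
	State       string `json:"state"`
	MaxCommitTS int64  `json:"maxCommitTS"`
}

// encodeStatus serializes the status the way a heartbeat might before
// sending it to PD.
func encodeStatus(s status) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

// heartbeatDemo encodes one sample status.
func heartbeatDemo() string {
	out, err := encodeStatus(status{NodeID: "My-Host:8240", State: "online", MaxCommitTS: 42})
	if err != nil {
		panic(err)
	}
	return out
}

func main() {
	fmt.Println(heartbeatDemo())
	// prints {"nodeId":"My-Host:8240","state":"online","maxCommitTS":42}
}
```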

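HTTP API Implementation

Pump Server exposes some APIs over HTTP, mainly for operations and maintenance.

| Path | Handler | Description |
| :---------| :----------| :----------|
| GET /status | Status | Returns the status of all Pump nodes. |
| PUT /state/{nodeID}/{action} | ApplyAction | Supports two actions, pause and close, which pause and shut down the server respectively. The server that receives the request verifies that the nodeID specified by the user matches its own nodeID, to guard against mistakes. |
| GET /drainers | AllDrainers | Returns the status of all Drainers that can be discovered through the current PD service. Typically used during debugging to confirm that Pump can discover Drainers as expected. |
| GET /debug/binlog/{ts} | BinlogByTS | Looks up a binlog by the specified timestamp. If the result is a Prewrite binlog, MVCC-related information is output as well. |
| POST /debug/gc/trigger | TriggerGC | Manually triggers a GC. If a GC is already running, the request is ignored. |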
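The nodeID safeguard of ApplyAction can be sketched as a small validation step that runs before any state change. The function name validateAction and the error messages are hypothetical; the sketch only shows the two checks described in the table, a nodeID match and a supported action.

```go
package main

import "fmt"

// validateAction rejects requests aimed at another node and actions other
// than pause and close. The name and error texts are illustrative.
func validateAction(selfNodeID, nodeID, action string) error {
	if nodeID != selfNodeID {
		return fmt.Errorf("invalid nodeID %q, this node is %q", nodeID, selfNodeID)
	}
	switch action {
	case "pause", "close":
		return nil
	default:
		return fmt.Errorf("invalid action %q", action)
	}
}

func main() {
	fmt.Println(validateAction("My-Host:8240", "My-Host:8240", "close") == nil)    // prints true
	fmt.Println(validateAction("My-Host:8240", "Other-Host:8240", "close") == nil) // prints false
	fmt.Println(validateAction("My-Host:8240", "My-Host:8240", "restart") == nil)  // prints false
}
```

Checking the nodeID on the receiving side means a request sent to the wrong address fails instead of pausing or closing an unintended node.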

Taking a Pump Server Offline

Taking a Pump server offline is usually initiated with a binlogctl command, for example:

bin/binlogctl -pd-urls=localhost:2379 -cmd offline-pump -node-id=My-Host:8240

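binlogctl first uses the nodeID to find the specified node among the Pump nodes discovered through PD, and then calls the interface mentioned in the previous section, PUT /state/{nodeID}/close.

On the Server side, when ApplyAction receives close, it sets the node state to Closing (the Heartbeat routine periodically pushes this kind of state to PD) and then starts a separate goroutine to call Close. Close first calls cancel, which uses the context to send a shutdown signal to the cooperating goroutines. These are mainly the goroutines that run the auxiliary mechanisms described above. For example, genForwardBinlog is designed to exit when the context is cancelled: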

for {
  select {
  case <-s.ctx.Done():
    log.Info("genFakeBinlog exit")
    return
  // ... (the remaining cases are omitted in this excerpt)
  }
}

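Close uses a waitGroup to wait for all of these goroutines to exit. At this point Pump can still serve PullBinlogs normally, but writes have stopped. On the next line, Close calls commitStatus. Since the node state is now Closing, the corresponding branch calls waitSafeToOffline to make sure that all binlogs written so far have been read by every Drainer. waitSafeToOffline first writes a fake binlog to storage. Because writes have already stopped, this is certain to be the last binlog of this Pump. It then checks in a loop, at a fixed interval, how far each Drainer has read, until that time is greater than the CommitTS of the fake binlog.

Once waitSafeToOffline has finished waiting, the gRPC service can be shut down and the other resources released.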
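The polling loop in waitSafeToOffline can be sketched as below. This is an illustration under assumptions, not the actual implementation: waitAllDrainers, allDrainersPassed, and the fetch callback (which stands in for querying the Drainers' progress through PD) are hypothetical names, and treating an empty Drainer list as safe is a choice made for this sketch.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// allDrainersPassed reports whether every drainer has read beyond fakeCommitTS.
// With no drainers at all it returns true (an assumption of this sketch).
func allDrainersPassed(drainerMaxCommitTS []int64, fakeCommitTS int64) bool {
	for _, ts := range drainerMaxCommitTS {
		if ts <= fakeCommitTS {
			return false
		}
	}
	return true
}

// waitAllDrainers polls fetch at the given interval until every drainer has
// passed fakeCommitTS or ctx is cancelled.
func waitAllDrainers(ctx context.Context, fakeCommitTS int64, interval time.Duration, fetch func() []int64) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		if allDrainersPassed(fetch(), fakeCommitTS) {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
		}
	}
}

// waitDemo simulates a drainer that catches up on the second poll and
// returns the number of polls that were needed.
func waitDemo() int {
	polls := 0
	progress := [][]int64{{90, 120}, {101, 120}}
	fetch := func() []int64 {
		ts := progress[polls]
		polls++
		return ts
	}
	if err := waitAllDrainers(context.Background(), 100, time.Millisecond, fetch); err != nil {
		return -1
	}
	return polls
}

func main() {
	fmt.Println(waitDemo()) // prints 2
}
```

The comparison is strict: a Drainer whose progress equals the fake binlog's CommitTS is still treated as not having passed it, matching the "greater than" condition in the text.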

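Summary

This article introduced the startup of the Pump server, the implementation of its gRPC API, the design of its auxiliary mechanisms, and the process of taking the service offline. We hope it gives you a clearer line of thought when reading the source code. The storage entity came up several times above. The logic for storing and querying binlogs is mainly encapsulated in that module, and it will be covered in detail in the next article.

Original article: https://pingcap.com/blog-cn/tidb-binlog-source-code-reading-4/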
