Analyzing and Resolving Write Conflicts in a TiDB DM Replication Task

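The Problem

We currently replicate data from RDS MySQL to TiDB through five DM tasks. None of them is a shard-merge task, the schemas and tables they replicate do not overlap, safe-mode is not explicitly enabled in any of them, and each runs 16 Syncer threads. Apart from the DM tasks, there is almost no other write traffic.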

After replication started, the TiDB / KV Errors panel in Grafana showed continuous write conflicts.

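At the same time, AlertManager raised a large number of alerts about transaction retries.

We tried stopping the DM tasks one at a time and found that the write conflicts disappeared as soon as the task with the heaviest traffic (40+ tables, several thousand QPS) was stopped.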

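Analysis

The TiDB Server log frequently prints prewrite encounters lock, which indicates lock conflicts in the prewrite phase. Note also that DM Syncer writes with optimistic transactions.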

[2021/02/08 15:47:40.112 +08:00] [INFO] [2pc.go:822] ["prewrite encounters lock"] [conn=0] [lock="key: {metaKey=true, key=DB:820, field=TID:864}, primary: {metaKey=true, key=DB:820, field=TID:864}, txnStartTS: 422778099492716553, lockForUpdateTS:0, ttl: 3001, type: Put"]
[2021/02/08 15:47:40.114 +08:00] [WARN] [txn.go:66] [RunInNewTxn] ["retry txn"=422778099492716555] ["original txn"=422778099492716555] [error="[kv:9007]Write conflict, txnStartTS=422778099492716555, conflictStartTS=422778099492716553, conflictCommitTS=422778099492716557, key={metaKey=true, key=DB:820, field=TID:864} primary={metaKey=true, key=DB:820, field=TID:864} [try again later]"]
[2021/02/08 15:47:40.116 +08:00] [INFO] [2pc.go:1336] ["2PC clean up done"] [txnStartTS=422778099492716555]
[2021/02/08 15:47:40.126 +08:00] [WARN] [txn.go:66] [RunInNewTxn] ["retry txn"=422778099492716568] ["original txn"=422778099492716568] [error="[kv:9007]Write conflict, txnStartTS=422778099492716568, conflictStartTS=422778099492716567, conflictCommitTS=422778099492716570, key={metaKey=true, key=DB:820, field=TID:862} primary={metaKey=true, key=DB:820, field=TID:862} [try again later]"]
[2021/02/08 15:47:40.127 +08:00] [INFO] [2pc.go:1336] ["2PC clean up done"] [txnStartTS=422778099492716568]
[2021/02/08 15:47:40.305 +08:00] [INFO] [2pc.go:822] ["prewrite encounters lock"] [conn=0] [lock="key: {metaKey=true, key=DB:820, field=TID:862}, primary: {metaKey=true, key=DB:820, field=TID:862}, txnStartTS: 422778099532038211, lockForUpdateTS:0, ttl: 3001, type: Put"]
[2021/02/08 15:47:40.309 +08:00] [WARN] [txn.go:66] [RunInNewTxn] ["retry txn"=422778099532038213] ["original txn"=422778099532038213] [error="[kv:9007]Write conflict, txnStartTS=422778099532038213, conflictStartTS=422778099532038211, conflictCommitTS=422778099545145351, key={metaKey=true, key=DB:820, field=TID:862} primary={metaKey=true, key=DB:820, field=TID:862} [try again later]"]
[2021/02/08 15:47:40.311 +08:00] [INFO] [2pc.go:1336] ["2PC clean up done"] [txnStartTS=422778099532038213]
[2021/02/08 15:47:40.365 +08:00] [INFO] [2pc.go:822] ["prewrite encounters lock"] [conn=0] [lock="key: {metaKey=true, key=DB:820, field=TID:862}, primary: {metaKey=true, key=DB:820, field=TID:862}, txnStartTS: 422778099558252548, lockForUpdateTS:0, ttl: 3001, type: Put"]
[2021/02/08 15:47:40.367 +08:00] [WARN] [txn.go:66] [RunInNewTxn] ["retry txn"=422778099558252549] ["original txn"=422778099558252549] [error="[kv:9007]Write conflict, txnStartTS=422778099558252549, conflictStartTS=422778099558252548, conflictCommitTS=422778099558252551, key={metaKey=true, key=DB:820, field=TID:862} primary={metaKey=true, key=DB:820, field=TID:862} [try again later]"]
[2021/02/08 15:47:40.368 +08:00] [INFO] [2pc.go:1336] ["2PC clean up done"] [txnStartTS=422778099558252549]
[2021/02/08 15:47:41.514 +08:00] [INFO] [2pc.go:822] ["prewrite encounters lock"] [conn=0] [lock="key: {metaKey=true, key=DB:820, field=TID:862}, primary: {metaKey=true, key=DB:820, field=TID:862}, txnStartTS: 422778099859718155, lockForUpdateTS:0, ttl: 3001, type: Put"]
[2021/02/08 15:47:41.516 +08:00] [WARN] [txn.go:66] [RunInNewTxn] ["retry txn"=422778099859718158] ["original txn"=422778099859718158] [error="[kv:9007]Write conflict, txnStartTS=422778099859718158, conflictStartTS=422778099859718155, conflictCommitTS=422778099859718160, key={metaKey=true, key=DB:820, field=TID:862} primary={metaKey=true, key=DB:820, field=TID:862} [try again later]"]
[2021/02/08 15:47:41.517 +08:00] [INFO] [2pc.go:1336] ["2PC clean up done"] [txnStartTS=422778099859718158]

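However, the TiKV logs contained nothing related to write conflicts (almost everything there was about not leader). The section "Troubleshoot Write Conflicts in Optimistic Transactions" in the official documentation did not help either: the log lines above carry none of the useful fields such as tableID, indexID, or handle, so the conflicting data and primary key cannot be located from them.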
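When the log contains many such lines, tallying conflicts per metadata key shows quickly which key is hot. The following is a minimal sketch, assuming the log format shown above; the regular expression, the function name countConflicts, and the shortened sample lines are illustrative and not part of TiDB.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// conflictKeyRe captures the meta key (DB:<id>) and field (TID:<id>) of the
// first key={...} that follows "Write conflict" in a TiDB log line.
var conflictKeyRe = regexp.MustCompile(
	`Write conflict.*?key=\{metaKey=true, key=(DB:\d+), field=(TID:\d+)\}`)

// sampleLines are abbreviated copies of the log lines quoted above.
var sampleLines = []string{
	`[INFO] [2pc.go:822] ["prewrite encounters lock"] [lock="key: {metaKey=true, key=DB:820, field=TID:864}"]`,
	`[WARN] [txn.go:66] [RunInNewTxn] [error="[kv:9007]Write conflict, txnStartTS=1, key={metaKey=true, key=DB:820, field=TID:864} primary={metaKey=true, key=DB:820, field=TID:864}"]`,
	`[WARN] [txn.go:66] [RunInNewTxn] [error="[kv:9007]Write conflict, txnStartTS=2, key={metaKey=true, key=DB:820, field=TID:862} primary={metaKey=true, key=DB:820, field=TID:862}"]`,
	`[WARN] [txn.go:66] [RunInNewTxn] [error="[kv:9007]Write conflict, txnStartTS=3, key={metaKey=true, key=DB:820, field=TID:862} primary={metaKey=true, key=DB:820, field=TID:862}"]`,
}

// countConflicts returns the number of "Write conflict" errors per
// "DB:<id>/TID:<id>" pair; lines without such an error are ignored.
func countConflicts(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		m := conflictKeyRe.FindStringSubmatch(line)
		if m == nil {
			continue
		}
		counts[m[1]+"/"+m[2]]++
	}
	return counts
}

func main() {
	counts := countConflicts(sampleLines)
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable output order
	for _, k := range keys {
		fmt.Printf("%s: %d\n", k, counts[k])
	}
}
```

In practice the lines would be read from the tidb.log file rather than from a hard-coded slice.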

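So where are log entries of the form key={metaKey=true, key=DB:820, field=TID:862} produced? Since the documentation does not answer that, we go straight to the source code. Part of store/tikv/snapshot.go reads as follows.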

func newWriteConflictError(conflict *pb.WriteConflict) error {
    var buf bytes.Buffer
    prettyWriteKey(&buf, conflict.Key)
    buf.WriteString(" primary=")
    prettyWriteKey(&buf, conflict.Primary)
    return kv.ErrWriteConflict.FastGenByArgs(conflict.StartTs, conflict.ConflictTs, conflict.ConflictCommitTs, buf.String())
}

func prettyWriteKey(buf *bytes.Buffer, key []byte) {
    tableID, indexID, indexValues, err := tablecodec.DecodeIndexKey(key)
    if err == nil {
        _, err1 := fmt.Fprintf(buf, "{tableID=%d, indexID=%d, indexValues={", tableID, indexID)
        // ...
        return
    }

    tableID, handle, err := tablecodec.DecodeRecordKey(key)
    if err == nil {
        _, err3 := fmt.Fprintf(buf, "{tableID=%d, handle=%d}", tableID, handle)
        // ...
        return
    }

    mKey, mField, err := tablecodec.DecodeMetaKey(key)
    if err == nil {
        _, err3 := fmt.Fprintf(buf, "{metaKey=true, key=%s, field=%s}", string(mKey), string(mField))
        // ...
        return
    }
    // ...
}

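As shown, when a write conflict occurs, prettyWriteKey() prints the conflicting key, and an entry with metaKey=true naturally means that a metadata key is in conflict. tablecodec.DecodeMetaKey() itself reveals little about the metadata, so we move on to meta/meta.go, whose comment happens to describe the metadata layout.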

Meta structure:
    NextGlobalID -> int64
    SchemaVersion -> int64
    DBs -> {
        DB:1 -> db meta data []byte
        DB:2 -> db meta data []byte
    }
    DB:1 -> {
        Table:1 -> table meta data []byte
        Table:2 -> table meta data []byte
        TID:1 -> int64
        TID:2 -> int64
    }

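Running curl [tidb_addr]:10080/db-table/[TID] returns the table name and database name for a given TID (which is the same as the tableID). The table with TID 862 is a write-heavy business table, but ordinary writes alone should not produce write conflicts this often, so the problem has to lie inside the metadata for that table.

Next, look further down at the metadata-related fields.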
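The same lookup can be scripted when several TIDs need to be resolved. This is a minimal sketch that issues the request from the curl command above and prints the raw response; the helper names dbTableURL and fetchTableInfo and the address 127.0.0.1:10080 are placeholders chosen for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// dbTableURL builds the status-port URL used by the curl command above.
func dbTableURL(statusAddr string, tableID int64) string {
	return fmt.Sprintf("http://%s/db-table/%d", statusAddr, tableID)
}

// fetchTableInfo returns the raw response body for the given table ID.
// The body is not parsed here, so no response schema is assumed.
func fetchTableInfo(statusAddr string, tableID int64) (string, error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(dbTableURL(statusAddr, tableID))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s: %s", resp.Status, body)
	}
	return string(body), nil
}

func main() {
	// Placeholder address; replace with a real TiDB status address.
	body, err := fetchTableInfo("127.0.0.1:10080", 862)
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	fmt.Println(body)
}
```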

var (
    mMetaPrefix       = []byte("m")
    mNextGlobalIDKey  = []byte("NextGlobalID")
    mSchemaVersionKey = []byte("SchemaVersionKey")
    mDBs              = []byte("DBs")
    mDBPrefix         = "DB"
    mTablePrefix      = "Table"
    mSequencePrefix   = "SID"
    mSeqCyclePrefix   = "SequenceCycle"
    mTableIDPrefix    = "TID"
    mRandomIDPrefix   = "TARID"
    mBootstrapKey     = []byte("BootstrapKey")
    mSchemaDiffPrefix = "Diff"
)

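Fields such as mTableIDPrefix and mRandomIDPrefix suggest that the table metadata keeps track of the current auto-generated IDs. Moving on to meta/autoid/autoid.go, we can see that the auto-ID allocators (implementations of the Allocator interface) come in the following four types, which line up with the metadata definitions above.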

const (
    // RowIDAllocType indicates the allocator is used to allocate row id.
    RowIDAllocType AllocatorType = iota
    // AutoIncrementType indicates the allocator is used to allocate auto increment value.
    AutoIncrementType
    // AutoRandomType indicates the allocator is used to allocate auto-shard id.
    AutoRandomType
    // SequenceType indicates the allocator is used to allocate sequence value.
    SequenceType
)

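Tracing down from generateAutoIDByAllocType(), the function that generates auto IDs, shows that TiDB handles RowIDAllocType and AutoIncrementType in the same way. In other words, both row IDs and auto-increment IDs are maintained in the value of the metadata key prefixed with TID.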

func generateAutoIDByAllocType(m *meta.Meta, dbID, tableID, step int64, allocType AllocatorType) (int64, error) {
    switch allocType {
    case RowIDAllocType, AutoIncrementType:
        return m.GenAutoTableID(dbID, tableID, step)
    case AutoRandomType:
        return m.GenAutoRandomID(dbID, tableID, step)
    case SequenceType:
        return m.GenSequenceValue(dbID, tableID, step)
    default:
        return 0, ErrInvalidAllocatorType.GenWithStackByArgs()
    }
}

// GenAutoTableID adds step to the auto ID of the table and returns the sum.
func (m *Meta) GenAutoTableID(dbID, tableID, step int64) (int64, error) {
    // Check if DB exists.
    dbKey := m.dbKey(dbID)
    if err := m.checkDBExists(dbKey); err != nil {
        return 0, errors.Trace(err)
    }
    // Check if table exists.
    tableKey := m.tableKey(tableID)
    if err := m.checkTableExists(dbKey, tableKey); err != nil {
        return 0, errors.Trace(err)
    }
    return m.txn.HInc(dbKey, m.autoTableIDKey(tableID), step)
}

func (m *Meta) autoTableIDKey(tableID int64) []byte {
    return []byte(fmt.Sprintf("%s:%d", mTableIDPrefix, tableID))
}
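To make the key layout concrete, here is a toy model of the metadata hash, with a nested map standing in for TiKV. It only mirrors the naming seen in the log (key=DB:820, field=TID:862) and the HInc call above; the metaHash type, the helper names, and the step value are assumptions made for illustration.

```go
package main

import "fmt"

// metaHash is a toy stand-in for the metadata hash: key -> field -> value.
type metaHash map[string]map[string]int64

// dbKey mirrors the "DB:<id>" key seen in the log.
func dbKey(dbID int64) string { return fmt.Sprintf("DB:%d", dbID) }

// autoTableIDField mirrors autoTableIDKey(): "TID:<tableID>".
func autoTableIDField(tableID int64) string {
	return fmt.Sprintf("TID:%d", tableID)
}

// hInc adds step to one field of one hash key and returns the new value,
// which is the role HInc plays in GenAutoTableID.
func (h metaHash) hInc(key, field string, step int64) int64 {
	if h[key] == nil {
		h[key] = make(map[string]int64)
	}
	h[key][field] += step
	return h[key][field]
}

func main() {
	meta := metaHash{}
	const step = 1000 // illustrative step, not a TiDB default

	// Two ID-range allocations for table 862 in database 820.
	first := meta.hInc(dbKey(820), autoTableIDField(862), step)
	second := meta.hInc(dbKey(820), autoTableIDField(862), step)

	fmt.Println(dbKey(820), autoTableIDField(862), first, second)
}
```

Every range allocation for the same table lands on the same key and field, which is why a single table can turn one metadata entry into a hot spot.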

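The schema of the table with TID 862 defines its primary key as bigint(20) NOT NULL AUTO_INCREMENT, so we strongly suspected that the auto-increment ID of this table was causing the write conflicts.

DM replication tasks insert data with the INSERT INTO VALUES(...) syntax, so we turn to insertRows() in executor/insert_common.go, which handles this kind of SQL statement.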

// insertRows processes `insert|replace into values ()` or `insert|replace into set x=y`
func insertRows(ctx context.Context, base insertCommon) (err error) {
    e := base.insertCommon()
    // ...
    e.lazyFillAutoID = true
    // ...
    for i, list := range e.Lists {
        e.rowCount++
        var row []types.Datum
        row, err = evalRowFunc(ctx, list, i)
        if err != nil {
            return err
        }
        rows = append(rows, row)
        if batchInsert && e.rowCount%uint64(batchSize) == 0 {
            // ...
            // Before batch insert, fill the batch allocated autoIDs.
            rows, err = e.lazyAdjustAutoIncrementDatum(ctx, rows)
            if err != nil {
                return err
            }
            // ...
        }
    }
    // ...
}

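According to the comment, lazyAdjustAutoIncrementDatum() fills in the auto IDs for the current batch. Note that it first tries to read the value supplied for the auto-ID column in the inserted row. If that value is neither NULL nor 0, it is used directly, but Table.RebaseAutoID() is also called to reset the starting point of the auto ID based on that value. RebaseAutoID() in turn calls the Rebase() method of each Allocator.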

func (e *InsertValues) lazyAdjustAutoIncrementDatum(ctx context.Context, rows [][]types.Datum) ([][]types.Datum, error) {
    // ...
    for processedIdx := 0; processedIdx < rowCount; processedIdx++ {
        autoDatum := rows[processedIdx][idx]

        var err error
        var recordID int64
        if !autoDatum.IsNull() {
            recordID, err = getAutoRecordID(autoDatum, &col.FieldType, true)
            if err != nil {
                return nil, err
            }
        }
        // Use the value if it's not null and not 0.
        if recordID != 0 {
            err = e.Table.RebaseAutoID(e.ctx, recordID, true, autoid.RowIDAllocType)
            if err != nil {
                return nil, err
            }
            e.ctx.GetSessionVars().StmtCtx.InsertID = uint64(recordID)
            retryInfo.AddAutoIncrementID(recordID)
            continue
        }
        // ...
    }
    // ...
}

func (alloc *allocator) Rebase(tableID, requiredBase int64, allocIDs bool) error {
    if tableID == 0 {
        return errInvalidTableID.GenWithStack("Invalid tableID")
    }

    alloc.mu.Lock()
    defer alloc.mu.Unlock()

    if alloc.isUnsigned {
        return alloc.rebase4Unsigned(tableID, uint64(requiredBase), allocIDs)
    }
    return alloc.rebase4Signed(tableID, requiredBase, allocIDs)
}

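Both the signed and the unsigned rebase call kv.RunInNewTxn() (which, as you may have noticed, appears in the TiDB log above) to start a new transaction that tries to adjust the auto-ID range.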

func (alloc *allocator) rebase4Unsigned(tableID int64, requiredBase uint64, allocIDs bool) error {
    // ...
    err := kv.RunInNewTxn(context.Background(), alloc.store, true, func(ctx context.Context, txn kv.Transaction) error {
        m := meta.NewMeta(txn)
        currentEnd, err1 := getAutoIDByAllocType(m, alloc.dbID, tableID, alloc.allocType)
        if err1 != nil {
            return err1
        }
        uCurrentEnd := uint64(currentEnd)
        if allocIDs {
            newBase = mathutil.MaxUint64(uCurrentEnd, requiredBase)
            newEnd = mathutil.MinUint64(math.MaxUint64-uint64(alloc.step), newBase) + uint64(alloc.step)
        } else {
            if uCurrentEnd >= requiredBase {
                newBase = uCurrentEnd
                newEnd = uCurrentEnd
                return nil
            }
            newBase = requiredBase
            newEnd = requiredBase
        }
        _, err1 = generateAutoIDByAllocType(m, alloc.dbID, tableID, int64(newEnd-uCurrentEnd), alloc.allocType)
        return err1
    })
    // ...
}
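The interaction between an optimistic transaction and a retry wrapper can be illustrated with a toy model. The sketch below is not TiDB's implementation: toyStore, commitInc, and runWithRetry are hypothetical names, and the conflict rule (a commit fails if the key was committed after the transaction's start timestamp) is a simplification of what the txnStartTS and conflictCommitTS fields in the log describe.

```go
package main

import (
	"errors"
	"fmt"
)

var errWriteConflict = errors.New("write conflict")

// toyStore remembers, per key, the value and the timestamp of the last commit.
type toyStore struct {
	clock      uint64
	lastCommit map[string]uint64
	values     map[string]int64
}

func newToyStore() *toyStore {
	return &toyStore{lastCommit: map[string]uint64{}, values: map[string]int64{}}
}

// nextTS hands out strictly increasing timestamps.
func (s *toyStore) nextTS() uint64 {
	s.clock++
	return s.clock
}

// commitInc applies an increment unless someone committed after startTS.
func (s *toyStore) commitInc(key string, startTS uint64, delta int64) error {
	if s.lastCommit[key] > startTS {
		return errWriteConflict
	}
	s.values[key] += delta
	s.lastCommit[key] = s.nextTS()
	return nil
}

// runWithRetry reruns fn with a fresh start timestamp after each conflict
// and reports how many attempts were made.
func runWithRetry(s *toyStore, maxAttempts int, fn func(startTS uint64) error) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = fn(s.nextTS())
		if !errors.Is(err, errWriteConflict) {
			return attempt, err
		}
	}
	return maxAttempts, err
}

// demo lets a competing writer commit between our start and our commit once.
func demo() (int, int64) {
	s := newToyStore()
	const key = "DB:820/TID:862"
	interfered := false
	attempts, _ := runWithRetry(s, 3, func(startTS uint64) error {
		if !interfered {
			interfered = true
			_ = s.commitInc(key, s.nextTS(), 10) // competing transaction
		}
		return s.commitInc(key, startTS, 10)
	})
	return attempts, s.values[key]
}

func main() {
	attempts, value := demo()
	fmt.Println("attempts:", attempts, "value:", value)
}
```

Each retry succeeds eventually in this model, but every failed attempt is wasted work, and that wasted work is what shows up as "retry txn" warnings when many writers target the same key.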

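At this point the answer is all but obvious. We asked the business side how they write to this table, and they replied that their inserts explicitly specify the value of the auto-increment column. TiDB maintains auto-increment IDs in cached segments (see the explanation of AUTO_INCREMENT in the official documentation), so explicitly inserted values are very likely to cause the allocated ID range to be rebased frequently. On top of that, our DM target is three TiDB Servers behind an LB component, so multiple TiDB instances also compete for auto-increment ID segments, which makes the write conflicts even worse.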
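The effect of explicit IDs on the shared metadata field can be explored with a small simulation. This is a deliberately simplified model, not TiDB code: each instance caches a [base, end] segment, a rebase that the cached segment cannot satisfy writes the shared field, and rows are spread round-robin across instances. The step size, the ID sequences, and all names are assumptions.

```go
package main

import "fmt"

// sharedMeta stands in for the single TID:<tableID> metadata field.
type sharedMeta struct {
	end    int64 // end of the last segment handed out
	writes int   // number of transactions that modified the field
}

// toyAllocator is one TiDB instance's cached ID segment.
type toyAllocator struct {
	meta      *sharedMeta
	base, end int64
	step      int64
}

// rebase ensures future IDs start after required. Only the slow path,
// where required lies beyond the cached segment, touches the shared field.
func (a *toyAllocator) rebase(required int64) {
	if required <= a.base {
		return // already past this value
	}
	if required <= a.end {
		a.base = required // satisfied from the cached segment
		return
	}
	newBase := a.meta.end
	if required > newBase {
		newBase = required
	}
	a.meta.end = newBase + a.step
	a.meta.writes++
	a.base, a.end = newBase, a.meta.end
}

// makeIDs returns gap, 2*gap, ..., n*gap.
func makeIDs(n int, gap int64) []int64 {
	ids := make([]int64, n)
	for i := range ids {
		ids[i] = int64(i+1) * gap
	}
	return ids
}

// simulate feeds explicit IDs round-robin to the instances and returns
// how many times the shared metadata field was written.
func simulate(instances int, step int64, ids []int64) int {
	meta := &sharedMeta{}
	allocs := make([]*toyAllocator, instances)
	for i := range allocs {
		allocs[i] = &toyAllocator{meta: meta, step: step}
	}
	for i, id := range ids {
		allocs[i%instances].rebase(id)
	}
	return meta.writes
}

func main() {
	fmt.Println("dense IDs: ", simulate(3, 10, makeIDs(30, 1)))
	fmt.Println("sparse IDs:", simulate(3, 10, makeIDs(30, 100)))
}
```

In this model, densely increasing IDs reach the shared field only occasionally, while IDs that jump past the cached segment reach it on every insert, and each such write is a candidate for the conflicts seen in the log. How closely a real workload resembles either case depends on the actual ID values and on the allocator step, which this sketch does not try to reproduce.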

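Solution

The blunt fix is to require the business side to stop specifying IDs, but that is costly, so we tried removing the AUTO_INCREMENT attribute from the table's primary key column instead. First set the system variable: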

SET SESSION tidb_allow_remove_auto_inc = 1;

Then run the ALTER TABLE statement:

ALTER TABLE warehouse_db_new.warehouse_sku
MODIFY sku_id bigint(20) NOT NULL COMMENT 'SKU ID';

After the statement completed, write conflicts dropped noticeably. Problem solved.

The End

Chinese New Year is only a few days away. Happy Spring Festival in advance, everyone~
