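Summary: This article walks through the execution flow of INSERT IGNORE in PolarDB-X, which differs depending on whether the target table has a global secondary index (GSI).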

Author: 潛璟

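The previous article in this source-code series covered the execution flow of INSERT. INSERT IGNORE differs from INSERT in that each inserted value must be checked for Unique Key conflicts, and conflicting values are ignored. This article therefore goes on to describe the execution flow of INSERT IGNORE in PolarDB-X, which changes depending on whether the target table has a GSI.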

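Pushdown execution
If the target consists of a primary table only, with no GSI, the INSERT IGNORE can simply be sent to the corresponding physical tables, and the DN ignores the conflicting values on its own. In this case the execution of INSERT IGNORE is essentially the same as that of INSERT; see the previous article in this series.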

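Logical execution
When the table has a GSI, INSERT IGNORE cannot simply be pushed down separately to the physical tables of the primary table and of the GSI, because the primary table and the GSI may then become inconsistent. For example: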

create table t1 (a int primary key, b int, global index g1(b) dbpartition by hash(b)) dbpartition by hash(a);
insert ignore into t1 values (1,1),(1,2);

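The two inserted rows map to the same physical table in the primary table (same a) but to different physical tables in the GSI (different b). If INSERT IGNORE is pushed down directly, only (1,1) is inserted into the primary table, because (1,2) hits a primary key conflict, whereas both (1,1) and (1,2) are inserted into the GSI. The GSI then holds one more row than the primary table.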
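The inconsistency can be reproduced with a toy model. The sketch below is only an illustration: the class name, the modulo-based sharding, and the shard count of 2 are assumptions made for the example and are not how PolarDB-X hashes rows. Each shard enforces primary key uniqueness only among the rows it stores, which is what a direct pushdown of INSERT IGNORE would rely on.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of pushing INSERT IGNORE down to shards (hypothetical, for illustration only). */
final class NaivePushdownSketch {
    /**
     * Routes each row (a, b) to a shard by the column at shardColumn and applies
     * INSERT IGNORE semantics per shard: a row is skipped only when the same
     * primary key (column 0) already exists in that same shard.
     * Returns the number of rows actually stored.
     */
    static int insertIgnore(List<int[]> rows, int shardColumn, int shardCount) {
        Map<Integer, Set<Integer>> pksPerShard = new HashMap<>();
        int inserted = 0;
        for (int[] row : rows) {
            // Assumption: modulo stands in for the real hash function.
            int shard = Math.floorMod(row[shardColumn], shardCount);
            if (pksPerShard.computeIfAbsent(shard, k -> new HashSet<>()).add(row[0])) {
                inserted++;
            }
        }
        return inserted;
    }

    public static void main(String[] args) {
        List<int[]> values = Arrays.asList(new int[] {1, 1}, new int[] {1, 2});
        // Primary table is sharded by a (column 0): both rows meet in one shard.
        System.out.println("primary table rows: " + insertIgnore(values, 0, 2)); // 1
        // GSI g1 is sharded by b (column 1): the rows never meet, so both are stored.
        System.out.println("GSI rows: " + insertIgnore(values, 1, 2)); // 2
    }
}
```

The primary table keeps one row while the GSI keeps two, which is the mismatch described above.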

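One solution is to first SELECT the rows that might conflict, based on the Unique Key values in the insert values, and bring them back to the CN. The CN then identifies the conflicting insert values and drops them.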

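The simplest way to run these SELECTs is to send all of them to the primary table. However, the primary table may not have the corresponding Unique Key (it may be defined only on a GSI), in which case the SELECT turns into a full table scan and hurts performance. So, in the optimizer phase, we decide whether each SELECT should be sent to the primary table or to a GSI according to where the Unique Key is defined. The code is located at: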

com.alibaba.polardbx.optimizer.core.planner.rule.OptimizeLogicalInsertRule#groupUkByTable

protected Map<String, List<List<String>>> groupUkByTable(LogicalInsertIgnore insertIgnore,
                                                             ExecutionContext executionContext) {
        // Find which tables (the primary table and which GSIs) contain each Unique Key
        Map<Integer, List<String>> ukAllTableMap = new HashMap<>();
        for (int i = 0; i < uniqueKeys.size(); i++) {
            List<String> uniqueKey = uniqueKeys.get(i);
            for (Map.Entry<String, Map<String, Set<String>>> e : writableTableUkMap.entrySet()) {
                String currentTableName = e.getKey().toUpperCase();
                Map<String, Set<String>> currentUniqueKeys = e.getValue();
                boolean found = false;
                for (Set<String> currentUniqueKey : currentUniqueKeys.values()) {
                    if (currentUniqueKey.size() != uniqueKey.size()) {
                        continue;
                    }
                    boolean match = currentUniqueKey.containsAll(uniqueKey);
                    if (match) {
                        found = true;
                        break;
                    }
                }
                if (found) {
                    ukAllTableMap.computeIfAbsent(i, k -> new ArrayList<>()).add(currentTableName);
                }
            }
        }

        // Decide which table the SELECT for each Unique Key is issued against
        for (Map.Entry<Integer, List<String>> e : ukAllTableMap.entrySet()) {
            List<String> tableNames = e.getValue();

            if (tableNames.contains(primaryTableName.toUpperCase())) {
                tableUkMap.computeIfAbsent(primaryTableName.toUpperCase(), k -> new ArrayList<>())
                    .add(uniqueKeys.get(e.getKey()));
            } else {
                final boolean onlyNonPublicGsi =
                    tableNames.stream().noneMatch(tn -> GlobalIndexMeta.isPublished(executionContext, sm.getTable(tn)));

                boolean found = false;
                for (String tableName : tableNames) {
                    if (!onlyNonPublicGsi && GlobalIndexMeta.isPublished(executionContext, sm.getTable(tableName))) {
                        tableUkMap.computeIfAbsent(tableName, k -> new ArrayList<>()).add(uniqueKeys.get(e.getKey()));
                        found = true;
                        break;
                    } else if (onlyNonPublicGsi && GlobalIndexMeta.canWrite(executionContext, sm.getTable(tableName))) {
                        tableUkMap.computeIfAbsent(tableName, k -> new ArrayList<>()).add(uniqueKeys.get(e.getKey()));
                        found = true;
                        break;
                    }
                }
            }
        }

        return tableUkMap;
    }

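At execution time, INSERT IGNORE is handled in LogicalInsertIgnoreHandler. Execution first enters the getDuplicatedValues function, which issues SELECT statements to find rows already in the table that conflict on a Unique Key. The select list of each SELECT is set to (value_index, uk_index, pk), where value_index and uk_index are both constants.

As an example, suppose we have the table: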

CREATE TABLE `t` (
    `id` int(11) NOT NULL,
    `a` int(11) NOT NULL,
    `b` int(11) NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE GLOBAL KEY `g_i_a` (`a`) COVERING (`id`) DBPARTITION BY HASH(`a`)
) DBPARTITION BY HASH(`id`)

and an INSERT IGNORE statement:

INSERT IGNORE INTO t VALUES (1,2,3),(2,3,4),(3,4,5);

Suppose that during execution PolarDB-X numbers the Unique Keys as

0: id
1: g_i_a

and numbers the values inserted by the INSERT IGNORE statement as

0: (1,2,3)
1: (2,3,4)
2: (3,4,5)

Then the SELECT on the GSI that is built for the unique key g_i_a of (2,3,4) is:

Query on the GSI

SELECT 1 as `value_index`, 1 as `uk_index`, `id`
FROM `g_i_a_xxxx`
WHERE `a` in (3);

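If the table already contains (5,3,6), this SELECT returns (1,1,5). In addition, because the SELECTs for different Unique Keys return rows in the same format, we UNION the SELECTs that target the same physical database and send them together, which returns several results at once and reduces the number of round trips between CN and DN. Whenever some Unique Key has a duplicate, value_index and uk_index tell us which row of the insert values is duplicated, and on which Unique Key.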
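A minimal sketch of how such probe statements could be assembled and combined is shown below. It is not the PolarDB-X implementation: the class and method names are hypothetical, values are inlined into the SQL text only for readability (real statements would use bound parameters), only single-column Unique Keys are handled, and the physical table name `t_xxxx` is a placeholder in the same spirit as `g_i_a_xxxx`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Builds conflict-probe SELECTs tagged with value_index and uk_index (hypothetical sketch). */
final class ProbeSqlSketch {
    /** One probe: constants identify the insert row and the Unique Key; the PK column is returned. */
    static String probe(int valueIndex, int ukIndex, String pkColumn,
                        String physicalTable, String ukColumn, long ukValue) {
        return String.format(Locale.ROOT,
            "SELECT %d AS `value_index`, %d AS `uk_index`, `%s` FROM `%s` WHERE `%s` IN (%d)",
            valueIndex, ukIndex, pkColumn, physicalTable, ukColumn, ukValue);
    }

    /** All probes share one result shape, so they can be sent as a single UNION statement. */
    static String union(List<String> probes) {
        return String.join(" UNION ", probes);
    }

    public static void main(String[] args) {
        // Probes for insert value 1, i.e. (2,3,4): uk 0 is the primary key, uk 1 is g_i_a.
        List<String> probes = new ArrayList<>();
        probes.add(probe(1, 0, "id", "t_xxxx", "id", 2));
        probes.add(probe(1, 1, "id", "g_i_a_xxxx", "a", 3));
        System.out.println(union(probes));
    }
}
```

Because every returned row carries its own (value_index, uk_index) pair, the CN can attribute each hit to a specific insert row and key even after the statements are merged.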

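Once all results have been returned, we deduplicate the data. The conflicting values found in the previous step are put into a SET, and then every row of the insert values is scanned in order: a row that duplicates something in the SET is skipped; otherwise the row is added to the SET as well, because the insert values may also conflict with each other. After deduplication we have all the values that do not conflict, and inserting them into the table completes the execution of the INSERT IGNORE.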
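The deduplication pass can be sketched as follows. This is a simplified illustration under stated assumptions: the class and method names are hypothetical, each probe hit is reduced to a {value_index, uk_index} pair, and keys are compared with plain `equals`, whereas the real getDeduplicatedParams works on parameter maps and column metadata.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Order-preserving deduplication of insert values against probe hits (hypothetical sketch). */
final class DedupSketch {
    /** A key is the Unique Key index followed by that key's column values. */
    static List<Object> key(List<Object> row, int ukIndex, int[] columns) {
        List<Object> k = new ArrayList<>();
        k.add(ukIndex);
        for (int c : columns) {
            k.add(row.get(c));
        }
        return k;
    }

    /**
     * rows: the insert values; ukColumns.get(i): column positions of Unique Key i;
     * probeHits: {value_index, uk_index} pairs returned by the conflict probes.
     */
    static List<List<Object>> deduplicate(List<List<Object>> rows, List<int[]> ukColumns,
                                          List<int[]> probeHits) {
        Set<List<Object>> seen = new HashSet<>();
        for (int[] hit : probeHits) {
            seen.add(key(rows.get(hit[0]), hit[1], ukColumns.get(hit[1])));
        }
        List<List<Object>> kept = new ArrayList<>();
        for (List<Object> row : rows) {
            List<List<Object>> keys = new ArrayList<>();
            for (int uk = 0; uk < ukColumns.size(); uk++) {
                keys.add(key(row, uk, ukColumns.get(uk)));
            }
            boolean duplicated = keys.stream().anyMatch(seen::contains);
            if (!duplicated) {
                // Later rows must also be checked against this one.
                seen.addAll(keys);
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList(1, 2, 3), Arrays.<Object>asList(2, 3, 4), Arrays.<Object>asList(3, 4, 5));
        List<int[]> ukColumns = Arrays.asList(new int[] {0}, new int[] {1}); // 0: id, 1: g_i_a(a)
        List<int[]> hits = Arrays.asList(new int[] {1, 1});                  // (2,3,4) conflicts on g_i_a
        System.out.println(deduplicate(rows, ukColumns, hits)); // [[1, 2, 3], [3, 4, 5]]
    }
}
```

With the running example, where the table already contains (5,3,6), the row (2,3,4) is dropped and the other two rows remain to be inserted.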

The execution flow of logical execution:

com.alibaba.polardbx.repo.mysql.handler.LogicalInsertIgnoreHandler#doExecute

protected int doExecute(LogicalInsert insert, ExecutionContext executionContext,
                            LogicalInsert.HandlerParams handlerParams) {
        // ...

        try {
            Map<String, List<List<String>>> ukGroupByTable = insertIgnore.getUkGroupByTable();
            List<Map<Integer, ParameterContext>> deduplicated;
            List<List<Object>> duplicateValues;
            // Fetch the rows already in the table that conflict on a Unique Key
            duplicateValues = getDuplicatedValues(insertIgnore, LockMode.SHARED_LOCK, executionContext, ukGroupByTable,
                (rowCount) -> memoryAllocator.allocateReservedMemory(
                    MemoryEstimator.calcSelectValuesMemCost(rowCount, selectRowType)), selectRowType, true,
                handlerParams);

            final List<Map<Integer, ParameterContext>> batchParameters =
                executionContext.getParams().getBatchParameters();

            // Using the result of the previous step, drop the conflicting values from the INSERT IGNORE
            deduplicated = getDeduplicatedParams(insertIgnore.getUkColumnMetas(), insertIgnore.getBeforeUkMapping(),
                insertIgnore.getAfterUkMapping(), RelUtils.getRelInput(insertIgnore), duplicateValues,
                batchParameters, executionContext);

            if (!deduplicated.isEmpty()) {
                insertEc.setParams(new Parameters(deduplicated));
            } else {
                // All duplicated
                return affectRows;
            }

            // Execute the INSERT
            try {
                if (gsiConcurrentWrite) {
                    affectRows = concurrentExecute(insertIgnore, insertEc);
                } else {
                    affectRows = sequentialExecute(insertIgnore, insertEc);
                }
            } catch (Throwable e) {
                handleException(executionContext, e, GeneralUtil.isNotEmpty(insertIgnore.getGsiInsertWriters()));
            }
        } finally {
            selectValuesPool.destroy();
        }
        return affectRows;
    }

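RETURNING optimization
The logical execution described in the previous section guarantees correctness, but it also means that one INSERT IGNORE statement needs at least two round trips between CN and DN (first the SELECT, then the INSERT), which hurts the performance of INSERT IGNORE.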

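The DN now supports the RETURNING optimization of AliSQL, which returns the successfully inserted values once an INSERT IGNORE finishes on the DN. PolarDB-X uses this feature to optimize INSERT IGNORE further: the INSERT IGNORE is pushed down directly; if every value is returned as successfully inserted on both the primary table and the GSI, there was no conflict among the insert values and the statement is complete; otherwise the extra rows that were inserted are deleted.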

During execution, the CN first sends a physical INSERT IGNORE statement with RETURNING to the DN, for example:

call dbms_trans.returning("a", "insert ignore into t1_xxxx values(1,1)");

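The returned column is the primary key, which identifies which rows of the inserted batch were actually inserted; t1_xxxx is one physical table of the logical table t1. After all INSERT IGNORE statements on the primary table and the GSI have finished, we compute the intersection of the successfully inserted values on the primary table and the GSI as the final result, and then delete the extra rows. This part of the code is in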

com.alibaba.polardbx.repo.mysql.handler.LogicalInsertIgnoreHandler#getRowsToBeRemoved

private Map<String, List<List<Object>>> getRowsToBeRemoved(String tableName,
                                                               Map<String, List<List<Object>>> tableInsertedValues,
                                                               List<Integer> beforePkMapping,
                                                               List<ColumnMeta> pkColumnMetas) {
        final Map<String, Set<GroupKey>> tableInsertedPks = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        final Map<String, List<Pair<GroupKey, List<Object>>>> tablePkRows =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        tableInsertedValues.forEach((tn, insertedValues) -> {
            final Set<GroupKey> insertedPks = new TreeSet<>();
            final List<Pair<GroupKey, List<Object>>> pkRows = new ArrayList<>();
            for (List<Object> inserted : insertedValues) {
                final Object[] groupKeys = beforePkMapping.stream().map(inserted::get).toArray();
                final GroupKey pk = new GroupKey(groupKeys, pkColumnMetas);
                insertedPks.add(pk);
                pkRows.add(Pair.of(pk, inserted));
            }
            tableInsertedPks.put(tn, insertedPks);
            tablePkRows.put(tn, pkRows);
        });

        // Get intersect of inserted values
        final Set<GroupKey> distinctPks = new TreeSet<>();
        for (GroupKey pk : tableInsertedPks.get(tableName)) {
            if (tableInsertedPks.values().stream().allMatch(pks -> pks.contains(pk))) {
                distinctPks.add(pk);
            }
        }

        // Remove values which not exists in at least one insert results
        final Map<String, List<List<Object>>> tableDeletePks = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        tablePkRows.forEach((tn, pkRows) -> {
            final List<List<Object>> deletePks = new ArrayList<>();
            pkRows.forEach(pkRow -> {
                if (!distinctPks.contains(pkRow.getKey())) {
                    deletePks.add(pkRow.getValue());
                }
            });
            if (!deletePks.isEmpty()) {
                tableDeletePks.put(tn, deletePks);
            }
        });
        return tableDeletePks;
    }

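Compared with the "pessimistic execution" of the logical execution in the previous section, INSERT IGNORE with the RETURNING optimization amounts to "optimistic execution". If the insert values do not conflict, one INSERT IGNORE statement needs only a single round trip between CN and DN. If there are conflicts, DELETE statements must be sent to remove the extra rows from the primary table or the GSI, so two round trips are needed. Even in the conflicting case, then, the number of round trips between CN and DN does not exceed that of logical execution. For this reason, when direct pushdown is not possible, INSERT IGNORE uses the RETURNING optimization by default.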

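The RETURNING optimization does have some restrictions. For example, it cannot be used when the insert values contain duplicate primary keys, because it is then impossible to tell which row was inserted and which row needs to be deleted; see the condition checks in the code for details. When the RETURNING optimization cannot be used, the system automatically falls back to the logical execution of the previous section to guarantee correctness.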
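The duplicate primary key restriction can be illustrated with a small pre-check. This is only a sketch of the idea, not the body of noDuplicateValues: the class and method names are hypothetical, and values are compared with plain `equals` rather than with type-aware keys built from column metadata.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Checks whether a batch of insert values is free of duplicate primary keys (hypothetical sketch). */
final class DuplicatePkCheckSketch {
    /** pkColumns holds the positions of the primary key columns within each row. */
    static boolean noDuplicatePrimaryKeys(List<List<Object>> rows, int[] pkColumns) {
        Set<List<Object>> seen = new HashSet<>();
        for (List<Object> row : rows) {
            List<Object> pk = new ArrayList<>();
            for (int c : pkColumns) {
                pk.add(row.get(c));
            }
            if (!seen.add(pk)) {
                // Two rows share a primary key: the returned key alone could not
                // tell which of them was inserted, so RETURNING must not be used.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Object>> batch = Arrays.asList(
            Arrays.<Object>asList(1, 1), Arrays.<Object>asList(1, 2));
        System.out.println(noDuplicatePrimaryKeys(batch, new int[] {0})); // false
    }
}
```

For `insert ignore into t1 values (1,1),(1,2)` the check fails, so such a statement would take the logical execution path instead.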

The execution flow with the RETURNING optimization:

com.alibaba.polardbx.repo.mysql.handler.LogicalInsertIgnoreHandler#doExecute

protected int doExecute(LogicalInsert insert, ExecutionContext executionContext,
                            LogicalInsert.HandlerParams handlerParams) {
        // ...

        // Check whether the RETURNING optimization can be used
        boolean canUseReturning =
            executorContext.getStorageInfoManager().supportsReturning() && executionContext.getParamManager()
                .getBoolean(ConnectionParams.DML_USE_RETURNING) && allDnUseXDataSource && gsiCanUseReturning
                && !isBroadcast && !ComplexTaskPlanUtils.canWrite(tableMeta);

        if (canUseReturning) {
            canUseReturning = noDuplicateValues(insertIgnore, insertEc);
        }

        if (canUseReturning) {
            // Execute INSERT IGNORE and collect the returned rows
            final List<RelNode> allPhyPlan =
                new ArrayList<>(replaceSeqAndBuildPhyPlan(insertIgnore, insertEc, handlerParams));
            getPhysicalPlanForGsi(insertIgnore.getGsiInsertIgnoreWriters(), insertEc, allPhyPlan);
            final Map<String, List<List<Object>>> tableInsertedValues =
                executeAndGetReturning(executionContext, allPhyPlan, insertIgnore, insertEc, memoryAllocator,
                    selectRowType);

            // ...

            // Generate the DELETE
            final boolean removeAllInserted =
                targetTableNames.stream().anyMatch(tn -> !tableInsertedValues.containsKey(tn));

            if (removeAllInserted) {
                affectedRows -=
                    removeInserted(insertIgnore, schemaName, tableName, isBroadcast, insertEc, tableInsertedValues);
                if (returnIgnored) {
                    ignoredRows = totalRows;
                }
            } else {
                final List<Integer> beforePkMapping = insertIgnore.getBeforePkMapping();
                final List<ColumnMeta> pkColumnMetas = insertIgnore.getPkColumnMetas();

                // Compute the intersection of all inserted values
                final Map<String, List<List<Object>>> tableDeletePks =
                    getRowsToBeRemoved(tableName, tableInsertedValues, beforePkMapping, pkColumnMetas);

                affectedRows -=
                    removeInserted(insertIgnore, schemaName, tableName, isBroadcast, insertEc, tableDeletePks);
                if (returnIgnored) {
                    ignoredRows +=
                        Optional.ofNullable(tableDeletePks.get(insertIgnore.getLogicalTableName())).map(List::size)
                            .orElse(0);
                }

            }

            handlerParams.optimizedWithReturning = true;

            if (returnIgnored) {
                return ignoredRows;
            } else {
                return affectedRows;
            }
        } else {
            handlerParams.optimizedWithReturning = false;
        }

        // ... 
    }

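Finally, an example shows how the execution flow of the RETURNING optimization differs from logical execution. With the hint /*+TDDL:CMD_EXTRA(DML_USE_RETURNING=TRUE)*/, users can manually control whether the RETURNING optimization is used.

First create a table and insert one row: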

CREATE TABLE `t` (
    `id` int(11) NOT NULL,
    `a` int(11) NOT NULL,
    `b` int(11) NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE GLOBAL KEY `g_i_a` (`a`) COVERING (`id`) DBPARTITION BY HASH(`a`)
) DBPARTITION BY HASH(`id`);

INSERT INTO t VALUES (1,3,3);

Then execute an INSERT IGNORE:

INSERT IGNORE INTO t VALUES (1,2,3),(2,3,4),(3,4,5);

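Here (1,2,3) conflicts with (1,3,3) on the primary key, and (2,3,4) conflicts with (1,3,3) on the Unique Key g_i_a. With the RETURNING optimization:

PolarDB-X first executes the INSERT IGNORE and then deletes the extra rows: (1,2,3) conflicts on the primary table but is inserted into the UGSI, and (2,3,4) conflicts on the UGSI but is inserted into the primary table, so the corresponding DELETE statements are sent to the UGSI and to the primary table respectively.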
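The bookkeeping for this example can be worked through in a few lines. The sketch below is a simplified stand-in for the intersection logic, with hypothetical names and plain integer primary keys instead of GroupKey; it takes the primary keys each table reported as inserted and returns the ones to delete from each table.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Works out which optimistically inserted rows must be deleted again (hypothetical sketch). */
final class ReturningIntersectionSketch {
    /** insertedPks: table name -> primary keys that the table reported as inserted. */
    static Map<String, Set<Integer>> rowsToRemove(Map<String, Set<Integer>> insertedPks) {
        // Rows that made it into every table are the real result of the statement.
        Set<Integer> common = null;
        for (Set<Integer> pks : insertedPks.values()) {
            if (common == null) {
                common = new TreeSet<>(pks);
            } else {
                common.retainAll(pks);
            }
        }
        // Anything a table inserted beyond that intersection has to be deleted from it.
        Map<String, Set<Integer>> toRemove = new TreeMap<>();
        for (Map.Entry<String, Set<Integer>> e : insertedPks.entrySet()) {
            Set<Integer> extra = new TreeSet<>(e.getValue());
            extra.removeAll(common);
            if (!extra.isEmpty()) {
                toRemove.put(e.getKey(), extra);
            }
        }
        return toRemove;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> inserted = new TreeMap<>();
        inserted.put("t", new TreeSet<>(Arrays.asList(2, 3)));     // (1,2,3) hit the primary key
        inserted.put("g_i_a", new TreeSet<>(Arrays.asList(1, 3))); // (2,3,4) hit the unique key on a
        System.out.println(rowsToRemove(inserted)); // {g_i_a=[1], t=[2]}
    }
}
```

Only id 3 was inserted everywhere, so id 1 is deleted from the UGSI and id 2 from the primary table, leaving (3,4,5) as the single new row.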

With the RETURNING optimization turned off, logical execution is used:

PolarDB-X first runs the SELECT and then inserts (3,4,5), the only row without a conflict.

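Summary
This article described the execution flow of INSERT IGNORE in PolarDB-X. Besides INSERT IGNORE, other DML statements also need to check for duplicate values during execution, such as REPLACE and INSERT ON DUPLICATE KEY UPDATE. When the table has a GSI, these statements all use logical execution: they first SELECT and then perform duplicate detection, updates, and so on. Interested readers can explore the relevant code on their own.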

Original article: https://click.aliyun.com/m/1000362575/

This article is original content from Alibaba Cloud and may not be reproduced without permission.
