Testing join queries with go-randgen

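Join is one of the most frequently used operations in database queries. Because join algorithms are complex to implement, they are also comparatively prone to bugs. After analyzing the join issues that have occurred in TiDB, we grouped the bug-prone scenarios into the following categories: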

The same join query with join keys of different data types
Joins on partitioned tables
The same join query executed with different join algorithms
Special query conditions

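Starting from these scenarios, we have spent the past few months testing TiDB with the go-randgen framework. The rest of this article describes the go-randgen framework and our testing work in four parts: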

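An introduction to the go-randgen testing framework
How to use go-randgen, illustrated with an example
How we applied go-randgen to TiDB testing, and the results
Possible future work based on go-randgen, and an introduction to related work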

Introduction to go-randgen

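go-randgen is a fully configurable testing framework. It creates random data sets, runs randomly generated queries against them, and verifies the correctness of the query results through A/B testing, that is, by running the same queries on two databases and comparing the results.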
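The comparison at the heart of an A/B test can be sketched in Go as below. This is a minimal illustration under assumptions of our own, not go-randgen's implementation: every column is fetched as a nullable string, and the query has a deterministic ORDER BY, as the join queries in this article do (order by t1.pk, t2.pk), so rows can be compared by position. The names Row, diffResults, show and val are hypothetical.

```go
package main

import (
	"database/sql"
	"fmt"
)

// Row holds one result row. Each column is kept as sql.NullString so that a
// SQL NULL and the string "NULL" stay distinguishable when the results of
// the same query from two databases are compared.
type Row []sql.NullString

// diffResults compares the result sets returned by database A and database B
// for the same query. It assumes a deterministic ORDER BY, so rows are
// compared by position. It returns "" if the results match, otherwise a
// description of the first difference found.
func diffResults(a, b []Row) string {
	if len(a) != len(b) {
		return fmt.Sprintf("row count differs: %d vs %d", len(a), len(b))
	}
	for i := range a {
		if len(a[i]) != len(b[i]) {
			return fmt.Sprintf("row %d: column count differs: %d vs %d", i, len(a[i]), len(b[i]))
		}
		for j := range a[i] {
			if a[i][j] != b[i][j] {
				return fmt.Sprintf("row %d, column %d: %s vs %s", i, j, show(a[i][j]), show(b[i][j]))
			}
		}
	}
	return ""
}

// show renders a column value for messages, printing NULL for SQL NULL and
// a quoted string otherwise.
func show(v sql.NullString) string {
	if !v.Valid {
		return "NULL"
	}
	return fmt.Sprintf("%q", v.String)
}

// val builds a non-NULL column value; null is the SQL NULL value.
func val(s string) sql.NullString { return sql.NullString{String: s, Valid: true} }

var null = sql.NullString{}

func main() {
	// Hypothetical results of the same query from MySQL (a) and TiDB (b).
	a := []Row{{val("1"), val("3")}, {val("2"), null}}
	b := []Row{{val("1"), val("3")}, {val("2"), val("0")}}
	fmt.Println(diffResults(a, b)) // row 1, column 1: NULL vs "0"
}
```

For a query without a deterministic ORDER BY, both result sets would have to be sorted before a positional comparison like this one is meaningful.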

go-randgen usage example

Using join testing as an example, the workflow has three steps (see https://github.com/pingcap/go-randgen for the zz and yy syntax):

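Define a zz file that specifies how tables are generated, such as data types, table types, and row counts
Define a yy file that specifies how random SQL statements are generated
Run an A/B test using the generated tables and SQL statements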

The following example walks through each step in detail:

1. Define join.zz.lua. The zz file in this example generates 6 tables, each with 17 columns, one for each type defined in fields.types. The 6 tables are:

table_400_undef_undef_1 (400 rows)
table_400_undef_4_1 (400 rows, 4 partitions)
table_300_undef_undef_1 (300 rows)
table_300_undef_4_1 (300 rows, 4 partitions)
table_290_undef_undef_1 (290 rows)
table_290_undef_4_1 (290 rows, 4 partitions)

tables = {
    rows = {400, 300, 290},
    partitions = {'undef', 4},
}

fields = {
    types = {'int', 'tinyint', 'smallint', 'bigint', 'decimal(40, 20)', 'float', 'double', 'char(20)', 'varchar(20)', 'enum', 'set', 'datetime', 'bool', 'bit(64)', 'timestamp', 'year', 'date'},
    keys = {'key'},
}

data = {
    numbers = {'null', 'tinyint', 'smallint',
        'decimal',
    },
    smallint = {'null', 'smallint'},
    mediumint = {'null', 'mediumint'},
    tinyint = {'null', 'tinyint'},
    bool = {1, 0, 'null'},
    year = {'null', 'year'},
    datetime = {'null', 'datetime'},
    timestamp = {'null', 'datetime'},
    date = {'null', 'date'},
    strings = {'null', 'letter', 'english'},
}
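The six tables are every combination of the three row counts in tables.rows and the two partition settings in tables.partitions, 3 × 2 = 6. The Go sketch below reproduces the names listed above to make that cross product explicit. It is only an illustration, not go-randgen code: the hypothetical tableNames function copies the undef and trailing _1 name segments verbatim from the list and does not model what they stand for.

```go
package main

import "fmt"

// tableNames enumerates one table per (row count, partition setting) pair,
// in the same order as the list above. Illustration only: the "undef" and
// trailing "_1" segments are copied verbatim from the listed names.
func tableNames(rows []int, partitions []string) []string {
	var names []string
	for _, r := range rows {
		for _, p := range partitions {
			names = append(names, fmt.Sprintf("table_%d_undef_%s_1", r, p))
		}
	}
	return names
}

func main() {
	names := tableNames([]int{400, 300, 290}, []string{"undef", "4"})
	fmt.Println(len(names)) // 6
	for _, n := range names {
		fmt.Println(n)
	}
}
```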

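2. Define join.yy. The yy file in this example uses hints to generate queries that run with the inl_merge_join and inl_hash_join algorithms. Apart from the parts fixed in the grammar, the tables and columns in the query conditions are combined at random.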

An example of a generated SQL statement:

SELECT /*+ inl_hash_join(t1) */ t1.pk, t2.pk from table_290_undef_undef_1 t1, table_400_undef_undef_1 t2 where t1. `col_enum_key_signed` = t2. `col_int_key_signed` and t1. `col_smallint_key_signed` < -5418830167423061551 order by t1.pk, t2.pk;

query:
    select

select:
    SELECT hint_begin inl_merge_join(t1, t2) */ col_list FROM _table  t1, _table t2 where condition and condition1 order by t1.pk, t2.pk;
    SELECT hint_begin inl_hash_join(t1) */ col_list from _table  t1, _table t2 where condition and condition1 order by t1.pk, t2.pk;
    
col_list:
    t1.pk, t2.pk

condition:
    t1. _field = t2. _field

condition1:
    t1. _field_int < _int

hint_begin:
    /*+

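3. Run the A/B test with join.zz.lua and join.yy. In this example, the query results from TiDB are compared with those from MySQL, and any SQL statement whose results differ is recorded in the dump subdirectory of the current directory.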

./go-randgen exec -Z join.zz.lua -Y join.yy --dsn1 "root:password@tcp(127.0.0.1:3306)/test" --dsn2 "root:@tcp(127.0.0.1:4000)/test" -Q 2000
2020/12/25 16:37:18 Open DB ok, starting generate data in two db by ddls
2020/12/25 16:37:18 load zz from join.zz.lua
2020/12/25 16:37:20 generating data ok
2020/12/25 16:37:20 starting execute sqls generated by yy
2020/12/25 16:37:20 load yy from join.yy
2020/12/25 16:37:32 dump ok

Applying go-randgen to TiDB testing

By testing TiDB's join algorithms with go-randgen, we have so far found 10 correctness issues. For example:

Comparing column values across different types, e.g. select * from _table where _field > _field. This revealed incorrect results when comparing time columns with year columns, recorded in tidb/issues/20121.
Testing distinct, e.g. select count(distinct(t1. _field)), count(distinct t1. _field, t1. _field) from table_400_utf8_undef t1, table_290_utf8_undef t2 where t1. _field = t2. _field and t1. _field = t2. _field and t1. _field_int != _int. This revealed incorrect distinct computation, recorded in tidb/issues/20237.
Beyond randomizing types, broadening what a single test case covers by randomly combining statements so that consecutive SQL statements share context, e.g. alter table _table add index {print(string.format("t%d", math.random(10,2000000)))} (_field); SELECT t1.pk, t2.pk from t t1 left join t t2 on t1. _field = t2. _field where t1. _field != _int order by t1.pk, t2.pk. This revealed that queries returned errors after an index was added, recorded in tidb/issues/20698.

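These issues remind us never to take TiDB's quality for granted. They also confirm that our approach works: analyze previously found problems, generalize them into scenarios, and use those scenarios to broaden test coverage. Future join testing will continue to cover more data types and try more combinations of statements and scenarios, for example inserting and deleting data inside transactions and randomly combining those operations with join queries.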

Future work

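We can keep improving the zz and yy files to broaden join test coverage. However, the SQL generated by go-randgen has a very fixed structure: unless we know the test points in advance, we cannot write the query templates needed for effective coverage. Is there a way to generate join queries at random instead? We are currently implementing random join query generation in Horoscope (an optimizer testing tool). In addition, following the ideas in Manuel Rigger's paper "Testing Database Engines via Pivoted Query Synthesis", Horoscope randomly selects a row from some of the tables as a pivot row and builds a query whose result should contain the selected rows.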

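Because join queries are complex, join testing will be a long-term but highly valuable effort. Beyond that, optimizer testing, region testing, chaos testing of TiDB clusters, and transaction testing are also important and valuable work. If you are interested, you are welcome to test TiDB with go-randgen or other tools and report any problems you find by opening an issue on GitHub. If you have better testing methods, ideas, or tools, we would be glad to discuss them with you in TUG.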
