Semijoin

What is a semi-join?

Here, "semi-join" means a semi-join subquery. A semi-join returns a row from the first table once that row has found a matching row in the second table.

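Unlike a regular join, the left-side table returns each row only once, even if several matching rows are found on the right side. In addition, no rows from the right-side table are returned at all. Semi-joins are usually written with IN or EXISTS as the join condition. Such a subquery has the following structure: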

SELECT ... FROM outer_tables WHERE expr IN (SELECT ... FROM inner_tables ...) AND ...

That is, the subquery inside the IN of the WHERE clause is the semi-join.

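The defining feature of this kind of query is that we only care about the rows in outer_tables that match the semi-join.
In other words, the final result set comes from outer_tables, and the semi-join only filters those rows. This is the basis of semi-join optimization: we only need to get from the semi-join the minimum amount of information that is sufficient to filter the outer_tables rows.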

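In terms of optimization strategy, "the minimum amount" comes down to how duplicates are removed.
Take the following statement as an example: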

select * from Country
where Country.Code in (select City.country
                       from City
                       where City.Population > 1*1000*1000);

The semi-join part is:

select City.country
from City
where City.Population > 1*1000*1000

It might return a result set such as: China (Beijing), China (Shanghai), France (Paris), ...
Here China appears twice, coming from the two city rows Beijing and Shanghai, but a single China is already enough to filter the outer table Country. So the duplicates need to be removed.
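A small Python sketch of the difference between a plain join and a semi-join on this query. Assumptions: the tables are hypothetical in-memory lists of tuples and the population numbers are illustrative, not real statistics; the functions model only the result semantics, not how MySQL executes them.

```python
# Hypothetical in-memory tables; population values are illustrative only.
countries = [("CHN", "China"), ("FRA", "France"), ("ISL", "Iceland")]
cities = [  # (name, country_code, population)
    ("Beijing", "CHN", 20_000_000),
    ("Shanghai", "CHN", 20_000_000),
    ("Paris", "FRA", 2_000_000),
    ("Reykjavik", "ISL", 100_000),
]


def big_city_countries(cities):
    # The subquery: SELECT City.country FROM City WHERE City.Population > 1*1000*1000
    return [code for _, code, pop in cities if pop > 1 * 1000 * 1000]


def inner_join(countries, cities):
    # One output row per (country, qualifying city) pair, so China appears twice.
    return [c for c in countries for code in big_city_countries(cities) if c[0] == code]


def semi_join(countries, cities):
    # Each Country row is kept at most once and no City columns are returned.
    codes = set(big_city_countries(cities))  # the deduplicated subquery result
    return [c for c in countries if c[0] in codes]


print(inner_join(countries, cities))
print(semi_join(countries, cities))
```

The set in semi_join is exactly the "minimum amount" of information needed from the subquery: one entry per country code.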

Semi-join strategies supported by MySQL

MySQL supports five main semi-join strategies:

Convert the subquery to a join

In short: rewrite the semi-join as an ordinary join.
Convert the subquery to a join, or use table pullout and run the query as an inner join between subquery tables and outer tables. Table pullout pulls a table out from the subquery to the outer query.
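Table pullout is only safe when the plain join cannot multiply outer rows, which is the case when the subquery column is unique (for example a primary key). A Python sketch of that condition, assuming hypothetical City/Country tables as lists of tuples; in MySQL the uniqueness guarantee comes from a PRIMARY KEY or UNIQUE index, not from inspecting the data as column_is_unique does here.

```python
# Hypothetical tables: City(name, country) and Country(code, name).
# Country.code plays the role of a primary key (unique values).
cities = [("Beijing", "CHN"), ("Shanghai", "CHN"), ("Paris", "FRA")]
countries = [("CHN", "China"), ("FRA", "France"), ("DEU", "Germany")]


def semi_join(outer, outer_col, inner, inner_col):
    """Outer rows with at least one match in inner, each returned once."""
    keys = {row[inner_col] for row in inner}
    return [row for row in outer if row[outer_col] in keys]


def join_keep_outer(outer, outer_col, inner, inner_col):
    """Ordinary inner join that keeps only the outer table's columns."""
    return [o for o in outer for i in inner if o[outer_col] == i[inner_col]]


def column_is_unique(rows, col):
    # Stand-in for "the column has a PRIMARY KEY or UNIQUE index".
    values = [row[col] for row in rows]
    return len(values) == len(set(values))


# SELECT * FROM City WHERE City.country IN (SELECT Country.code FROM Country)
# Country.code is unique, so pulling Country out into a plain join gives the same rows.
print(column_is_unique(countries, 0))
print(join_keep_outer(cities, 1, countries, 0) == semi_join(cities, 1, countries, 0))
```

In the opposite direction (filtering Country by City.country) the subquery column is not unique, so a plain join would repeat China and pullout would not apply.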

FirstMatch

Use only the first inner-table row that matches the outer row.
FirstMatch: When scanning the inner tables for row combinations and there are multiple instances of a given value group, choose one rather than returning them all. This "shortcuts" scanning and eliminates production of unnecessary rows.
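A nested-loop sketch of the FirstMatch shortcut in Python, which also counts how many inner rows are read. Assumptions: plain lists stand in for tables and matches is a hypothetical join-condition callback.

```python
def first_match_semi_join(outer, inner, matches):
    """Nested-loop semi-join with FirstMatch short-cutting.

    For each outer row, scan inner rows only until the first match,
    then move on to the next outer row instead of finishing the scan.
    Returns the result rows and the number of inner rows read.
    """
    result = []
    inner_rows_read = 0
    for o in outer:
        for i in inner:
            inner_rows_read += 1
            if matches(o, i):
                result.append(o)
                break  # the shortcut: stop scanning inner for this outer row
    return result, inner_rows_read


rows, reads = first_match_semi_join(
    ["CHN", "FRA", "ISL"],   # outer: country codes
    ["CHN", "CHN", "FRA"],   # inner: subquery values, with a duplicate
    lambda o, i: o == i,
)
print(rows, reads)
```

Without the break the same data would read all 9 inner rows and emit CHN twice.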

LooseScan

Group the inner-table rows by an index and use only the first row of each group for matching.
LooseScan: Scan a subquery table using an index that enables a single value to be chosen from each subquery's value group.
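A Python sketch of LooseScan. Assumptions: inner_index_keys is the subquery column as read from an index (so it is already sorted), and outer_index is a hypothetical dict standing in for an index lookup on the outer table.

```python
from itertools import groupby


def loose_scan_semi_join(inner_index_keys, outer_index):
    """LooseScan: use only the first entry of each group of equal index keys."""
    keys = list(inner_index_keys)
    if keys != sorted(keys):
        raise ValueError("LooseScan needs the subquery column in index order")
    result = []
    for key, _group in groupby(keys):
        # groupby yields each distinct key once; a real index scan would jump
        # straight past the rest of the group instead of reading it.
        result.extend(outer_index.get(key, []))
    return result


outer_index = {"CHN": [("CHN", "China")], "FRA": [("FRA", "France")]}
print(loose_scan_semi_join(["CHN", "CHN", "FRA", "ISL"], outer_index))
```

Because the keys arrive grouped, no temporary table is needed to remove duplicates; the price is that a suitable index on the subquery column must exist.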

Duplicate Weedout

Use a temporary table to remove duplicates from the result produced by the semi-join.
Duplicate Weedout: Run the semijoin as if it was a join and remove duplicate records using a temporary table.

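In EXPLAIN output, this strategy shows up in the Extra column like this:

Using index; Start temporary
Using where
Using index; End temporary

Start temporary and End temporary indicate that the semi-join uses the temporary table of the Duplicate Weedout strategy: join rows produced between the two markers are deduplicated through that temporary table.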
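A Python sketch of Duplicate Weedout. Assumptions: lists stand in for tables, matches is a hypothetical join-condition callback, and a set of outer row ids plays the role of the temporary table with a unique key. The loop starts from the inner table, as in a plan where the inner table is read first.

```python
def duplicate_weedout_semi_join(outer, inner, matches):
    """Run the semi-join as an ordinary join, then weed out duplicates.

    seen_rowids stands in for the temporary table: a joined row is kept
    only if its outer row id has not been recorded yet.
    """
    seen_rowids = set()  # created at "Start temporary", checked at "End temporary"
    result = []
    for i in inner:                        # inner table read first
        for rowid, o in enumerate(outer):  # then the outer table
            if matches(o, i) and rowid not in seen_rowids:
                seen_rowids.add(rowid)
                result.append(o)
    return result


print(duplicate_weedout_semi_join(["CHN", "FRA", "ISL"], ["CHN", "CHN", "FRA"], lambda o, i: o == i))
```

Keying on the row id rather than on the row values matters: two distinct outer rows with identical values must both survive.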

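Materialization

Deduplicate the inner-table rows and materialize them into a temporary table, then scan the materialized table and look up matches in the outer table (this scan-based variant is called MaterializeScan).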

Materialize the subquery into an indexed temporary table that is used to perform a join, where the index is used to remove duplicates. The index might also be used later for lookups when joining the temporary table with the outer tables; if not, the table is scanned. For more information about materialization, see Section 8.2.2.2, “Optimizing Subqueries with Materialization”, of the MySQL Reference Manual.
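Both uses of the materialized table can be sketched in Python. Assumptions: a set stands in for the temporary table with its unique index, and outer_index is a hypothetical dict standing in for an index on the outer table.

```python
def materialize(inner_values):
    """Deduplicated 'temporary table'; the set acts as its unique index."""
    return set(inner_values)


def materialize_lookup(outer, key, temp):
    # For each outer row, probe the temporary table's index.
    return [row for row in outer if key(row) in temp]


def materialize_scan(temp, outer_index):
    # Scan the temporary table and look up matching outer rows by index.
    result = []
    for value in sorted(temp):  # sorted only to make the output order deterministic
        result.extend(outer_index.get(value, []))
    return result


outer = [("CHN", "China"), ("FRA", "France"), ("ISL", "Iceland")]
temp = materialize(["CHN", "CHN", "FRA"])
print(materialize_lookup(outer, lambda row: row[0], temp))
print(materialize_scan(temp, {row[0]: [row] for row in outer}))
```

Scanning the small materialized table pays off when the subquery result is much smaller than the outer table; probing it per outer row is the better choice otherwise.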

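Enabling semi-join

Semi-join optimization and its individual strategies are controlled by flags in the optimizer_switch system variable. Check the current settings with: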

mysql> SELECT @@optimizer_switch\G
*************************** 1. row ***************************
@@optimizer_switch: index_merge=on,index_merge_union=on,
index_merge_sort_union=on,
index_merge_intersection=on,
engine_condition_pushdown=on,
index_condition_pushdown=on,
mrr=on,mrr_cost_based=on,
block_nested_loop=on,batched_key_access=off,
materialization=on,semijoin=on,loosescan=on,
firstmatch=on,
subquery_materialization_cost_based=on,
use_index_extensions=on
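When checking these flags from a script, the value can be parsed into a dictionary. A Python sketch, assuming the value is fetched as a plain string (the database access is left out). Newer MySQL versions also have a duplicateweedout flag, which is missing from the output above, so the helper reports None for absent flags. In MySQL itself a single strategy is toggled with, for example, SET optimizer_switch='firstmatch=off';

```python
SEMIJOIN_FLAGS = ("semijoin", "firstmatch", "loosescan", "materialization", "duplicateweedout")


def parse_optimizer_switch(value):
    """Parse an optimizer_switch value like 'semijoin=on,firstmatch=off' into a dict."""
    flags = {}
    # The value is one comma-separated string; the client output above is
    # wrapped across lines, so newlines are dropped before splitting.
    for item in value.replace("\n", "").split(","):
        item = item.strip()
        if not item:
            continue
        name, _, state = item.partition("=")
        flags[name.strip()] = state.strip() == "on"
    return flags


def semijoin_status(value):
    """Semi-join related flags; None means the flag is absent on this server."""
    flags = parse_optimizer_switch(value)
    return {name: flags.get(name) for name in SEMIJOIN_FLAGS}


switch = """index_merge=on,index_merge_union=on,
index_merge_sort_union=on,
index_merge_intersection=on,
engine_condition_pushdown=on,
index_condition_pushdown=on,
mrr=on,mrr_cost_based=on,
block_nested_loop=on,batched_key_access=off,
materialization=on,semijoin=on,loosescan=on,
firstmatch=on,
subquery_materialization_cost_based=on,
use_index_extensions=on"""
print(semijoin_status(switch))
```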

Semijoin optimization in practice

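The following join query, which lists the classes that have at least one enrolled student, can produce duplicates: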

SELECT class.class_num, class.class_name
    FROM class
    INNER JOIN roster
    WHERE class.class_num = roster.class_num;

However, the result lists each class once for each enrolled student. For the question being asked, this is unnecessary duplication of information.

Assuming that class_num is a primary key in the class table, duplicate suppression is possible by using SELECT DISTINCT, but it is inefficient to generate all matching rows first only to eliminate duplicates later.
The same duplicate-free result can be obtained by using a subquery:

SELECT class_num, class_name
    FROM class
    WHERE class_num IN
        (SELECT class_num FROM roster);

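Here, the optimizer can recognize that the IN clause requires the subquery to return only one instance of each class number from the roster table. In this case, the query can use a semijoin; that is, an operation that returns only one instance of each row in class that is matched by rows in roster.
The following statement, which contains an EXISTS subquery predicate, is equivalent to the previous statement containing an IN subquery predicate: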

SELECT class_num, class_name
    FROM class
    WHERE EXISTS
        (SELECT * FROM roster WHERE class.class_num = roster.class_num);
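The three forms can be compared on hypothetical rows in Python. The sketch shows that JOIN + DISTINCT first builds one row per enrolled student and removes duplicates only afterwards, while the IN and EXISTS forms never produce the duplicates at all. The class and roster contents are made up for illustration.

```python
# Hypothetical rows: class(class_num, class_name) and roster(class_num, student).
classes = [(1, "Math"), (2, "Physics"), (3, "Art")]
roster = [(1, "Ann"), (1, "Bob"), (1, "Cid"), (2, "Dan")]


def join_then_distinct(classes, roster):
    joined = [c for c in classes for r in roster if c[0] == r[0]]  # one row per student
    distinct = list(dict.fromkeys(joined))  # SELECT DISTINCT, keeping first-seen order
    return joined, distinct


def semi_join_in(classes, roster):
    enrolled = {r[0] for r in roster}  # IN (SELECT class_num FROM roster)
    return [c for c in classes if c[0] in enrolled]


def semi_join_exists(classes, roster):
    # EXISTS (SELECT * FROM roster WHERE class.class_num = roster.class_num)
    return [c for c in classes if any(r[0] == c[0] for r in roster)]


joined, distinct = join_then_distinct(classes, roster)
print(len(joined), distinct)
print(semi_join_in(classes, roster))
print(semi_join_exists(classes, roster))
```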

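Conclusion

When a join produces duplicate rows, the first thing that comes to mind is DISTINCT. But that joins first and removes duplicates afterwards, which is inefficient.
We can just as well remove duplicates first and join afterwards.
So rewriting the query with an IN or EXISTS subquery lets the optimizer apply a semi-join instead.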

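Execution plan analysis

1. Using JOIN + DISTINCT. Note Using temporary on table d: all rows are joined first, and DISTINCT then removes duplicates with a temporary table.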

mysql> EXPLAIN SELECT DISTINCT
    a.spu,
    a.product_name productName,
    ( SELECT cat_name FROM `product_category` WHERE cat_no = a.cat_root_no LIMIT 1 ) catName,
    b.sku,
    b.k3_code,
    b.bar_code,
    b.model 
FROM
    jg_gift_rule d
    JOIN jg_gift_rel_product c ON d.id = c.gift_rule_id
    JOIN stock_product_detail b ON c.sku = b.sku
    JOIN stock_product a ON a.spu = b.spu 
WHERE
    d.del_status = 0 
    AND now( ) BETWEEN d.gift_start_time 
    AND d.gift_end_time;
+----+--------------------+------------------+------------+-------+----------------------------+----------------------+---------+------------------+------+----------+-------------------------------------------+
| id | select_type        | table            | partitions | type  | possible_keys              | key                  | key_len | ref              | rows | filtered | Extra                                     |
+----+--------------------+------------------+------------+-------+----------------------------+----------------------+---------+------------------+------+----------+-------------------------------------------+
|  1 | PRIMARY            | d                | NULL       | index | PRIMARY,index3             | index3               | 22      | NULL             |    1 |   100.00 | Using where; Using index; Using temporary |
|  1 | PRIMARY            | c                | NULL       | ref   | idx_sku_gift_rule_id,index | idx_sku_gift_rule_id | 8       | my.d.id          |    1 |   100.00 | Using index                               |
|  1 | PRIMARY            | b                | NULL       | ref   | idx_sku_spu,index          | idx_sku_spu          | 302     | my.c.sku         |    1 |   100.00 | Using index                               |
|  1 | PRIMARY            | a                | NULL       | ref   | sp_idx_spu                 | sp_idx_spu           | 302     | my.b.spu         |    1 |   100.00 | Using index                               |
|  2 | DEPENDENT SUBQUERY | product_category | NULL       | ref   | idx_cat_no                 | idx_cat_no           | 768     | my.a.cat_root_no |    1 |   100.00 | Using where; Using index                  |
+----+--------------------+------------------+------------+-------+----------------------------+----------------------+---------+------------------+------+----------+-------------------------------------------+
5 rows in set (0.03 sec)


The entry for table d in the EXPLAIN FORMAT=JSON output (excerpt). access_type is index, i.e. a full scan of index3, with all filter conditions applied as attached_condition:

"table": {
  "table_name": "d",
  "access_type": "index",
  "possible_keys": [
    "PRIMARY",
    "index3"
  ],
  "key": "index3",
  "used_key_parts": [
    "id",
    "del_status",
    "gift_start_time",
    "gift_end_time"
  ],
  "key_length": "22",
  "rows_examined_per_scan": 1,
  "rows_produced_per_join": 1,
  "filtered": "100.00",
  "using_index": true,
  "cost_info": {
    "read_cost": "1.00",
    "eval_cost": "0.20",
    "prefix_cost": "1.20",
    "data_read_per_join": "1K"
  },
  "used_columns": [
    "id",
    "gift_start_time",
    "gift_end_time",
    "del_status"
  ],
  "attached_condition": "((`my`.`d`.`del_status` = 0) and (<cache>(now()) between `my`.`d`.`gift_start_time` and `my`.`d`.`gift_end_time`))"
}

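2. Using a semi-join: FirstMatch(a) appears for table d. Once a matching row combination in c and d has been found, MySQL stops scanning c and d and moves on to the next row of table a.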

mysql> EXPLAIN SELECT
    a.spu,
    a.product_name productName,
    ( SELECT cat_name FROM `product_category` WHERE cat_no = a.cat_root_no LIMIT 1 ) catName,
    b.sku,
    b.k3_code,
    b.bar_code,
    b.model 
FROM
    stock_product_detail b
    JOIN stock_product a ON a.spu = b.spu 
WHERE
    b.sku IN (
    SELECT
        c.sku 
    FROM
        jg_gift_rule d FORCE INDEX ( index3 )
        JOIN jg_gift_rel_product c ON d.id = c.gift_rule_id 
    WHERE
        d.del_status = 0 
        AND now( ) BETWEEN d.gift_start_time 
        AND d.gift_end_time 
    );
+----+--------------------+------------------+------------+-------+----------------------------+-------------+---------+-------------------------+------+----------+-----------------------------------------+
| id | select_type        | table            | partitions | type  | possible_keys              | key         | key_len | ref                     | rows | filtered | Extra                                   |
+----+--------------------+------------------+------------+-------+----------------------------+-------------+---------+-------------------------+------+----------+-----------------------------------------+
|  1 | PRIMARY            | b                | NULL       | index | idx_sku_spu,index          | idx_sku_spu | 1513    | NULL                    |    1 |   100.00 | Using index                             |
|  1 | PRIMARY            | a                | NULL       | ref   | sp_idx_spu                 | sp_idx_spu  | 302     | my.b.spu                |    1 |   100.00 | Using index                             |
|  1 | PRIMARY            | c                | NULL       | ref   | idx_sku_gift_rule_id,index | index       | 302     | my.b.sku                |    1 |   100.00 | Using index                             |
|  1 | PRIMARY            | d                | NULL       | ref   | index3                     | index3      | 10      | my.c.gift_rule_id,const |    1 |   100.00 | Using where; Using index; FirstMatch(a) |
|  2 | DEPENDENT SUBQUERY | product_category | NULL       | ref   | idx_cat_no                 | idx_cat_no  | 768     | my.a.cat_root_no        |    1 |   100.00 | Using where; Using index                |
+----+--------------------+------------------+------------+-------+----------------------------+-------------+---------+-------------------------+------+----------+-----------------------------------------+
5 rows in set (0.04 sec)

The entry for table d in the EXPLAIN FORMAT=JSON output (excerpt). d is now accessed by ref on the first two parts of index3 (id and del_status), and "first_match": "a" records the FirstMatch strategy:

"table": {
  "table_name": "d",
  "access_type": "ref",
  "possible_keys": [
    "index3"
  ],
  "key": "index3",
  "used_key_parts": [
    "id",
    "del_status"
  ],
  "key_length": "10",
  "ref": [
    "my.c.gift_rule_id",
    "const"
  ],
  "rows_examined_per_scan": 1,
  "rows_produced_per_join": 1,
  "filtered": "100.00",
  "using_index": true,
  "first_match": "a",
  "cost_info": {
    "read_cost": "1.00",
    "eval_cost": "0.20",
    "prefix_cost": "4.80",
    "data_read_per_join": "1K"
  },
  "used_columns": [
    "id",
    "gift_start_time",
    "gift_end_time",
    "del_status"
  ],
  "attached_condition": "(<cache>(now()) between `my`.`d`.`gift_start_time` and `my`.`d`.`gift_end_time`)"
}

Another semi-join example, this time using Duplicate Weedout (Start temporary / End temporary)

CREATE TABLE `jg_gift_rel_product`  (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `gift_rule_id` bigint(20) NOT NULL COMMENT 'gift rule id',
  `sku` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL COMMENT 'SKU of the product that triggers the gift',
  PRIMARY KEY (`id`) USING BTREE,
  INDEX `idx_sku_gift_rule_id`(`sku`, `gift_rule_id`) USING BTREE,
  INDEX `idx_gift_rule_id_sku`(`gift_rule_id`, `sku`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = 'junction table of gift-triggering products' ROW_FORMAT = Dynamic;


CREATE TABLE `jg_gift_rule`  (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `platform_code` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL COMMENT 'platform code',
  `rule_type` tinyint(4) DEFAULT NULL COMMENT 'gift rule type (0: buy-and-get A+B, 1: buy-and-get A or B, 2: spend-threshold gift)',
  `gift_rule_name` varchar(100) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL COMMENT 'gift rule name',
  `trigger_type` tinyint(4) DEFAULT NULL COMMENT 'trigger type',
  `gift_type` tinyint(4) DEFAULT NULL COMMENT 'gift type (0: buy x get x, 1: buy at least x get 1, 2: buy x get 1, 3: gift on reaching threshold)',
  `gift_start_time` datetime(0) DEFAULT NULL COMMENT 'gift promotion start time',
  `gift_end_time` datetime(0) DEFAULT NULL COMMENT 'gift promotion end time',
  `gift_method` tinyint(4) DEFAULT NULL COMMENT 'gift method (0: none, 1: over the whole promotion period, 2: within a single period, bounded by working days)',
  `satisfy_count` int(10) DEFAULT NULL COMMENT 'quantity threshold for the gift',
  `gift_count` int(10) DEFAULT NULL COMMENT 'number of items given',
  `gift_total_count` int(10) DEFAULT NULL COMMENT 'total number of items to give',
  `gift_surplus_count` int(10) DEFAULT NULL COMMENT 'remaining quantity available to give',
  `use_type` tinyint(4) DEFAULT NULL COMMENT 'usage mode (threshold gifts only; 0: none, 1: settlement amount, 2: amount actually paid by the customer, chosen from a drop-down)',
  `is_fold` tinyint(4) DEFAULT NULL COMMENT 'whether gifts stack (threshold gifts only; 0: none, 1: yes, 2: no)',
  `disabled_status` tinyint(4) UNSIGNED DEFAULT 0 COMMENT 'disabled status (0: enabled, 1: disabled)',
  `create_by` varchar(64) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL COMMENT 'creator id',
  `create_time` datetime(0) DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
  `update_by` varchar(64) CHARACTER SET utf8 COLLATE utf8_general_ci DEFAULT NULL COMMENT 'modifier id',
  `update_time` datetime(0) DEFAULT CURRENT_TIMESTAMP COMMENT 'modification time',
  `del_status` tinyint(4) DEFAULT 0 COMMENT 'deletion status (0: enabled, 1: disabled)',
  PRIMARY KEY (`id`) USING BTREE,
  INDEX `idx_gift_start_time_gift_end_time_del_status`(`gift_start_time`, `gift_end_time`, `del_status`) USING BTREE
) ENGINE = InnoDB  CHARACTER SET = utf8 COLLATE = utf8_general_ci COMMENT = 'gift rule table' ROW_FORMAT = Dynamic;

mysql> EXPLAIN
SELECT a.id, a.gift_rule_id giftRuleId, a.sku
FROM jg_gift_rel_product a
WHERE a.gift_rule_id IN (
    SELECT b.gift_rule_id
    FROM jg_gift_rel_product b
    LEFT JOIN jg_gift_rule a ON a.id = b.gift_rule_id
    WHERE DATE_FORMAT(a.gift_end_time, '%Y-%m-%d %H:%i:%s') >= DATE_FORMAT('2022-02-12 15:20:00.0', '%Y-%m-%d %H:%i:%s')
      AND DATE_FORMAT('2022-02-12 15:20:00.0', '%Y-%m-%d %H:%i:%s') >= DATE_FORMAT(a.gift_start_time, '%Y-%m-%d %H:%i:%s')
      AND b.sku = 'K.33.11231.202201170002'
);
+----+-------------+-------+------------+--------+---------------------------------------------------------+----------------------+---------+------------------------------------+------+----------+------------------------------+
| id | select_type | table | partitions | type   | possible_keys                                           | key                  | key_len | ref                                | rows | filtered | Extra                        |
+----+-------------+-------+------------+--------+---------------------------------------------------------+----------------------+---------+------------------------------------+------+----------+------------------------------+
|  1 | SIMPLE      | b     | NULL       | ref    | idx_sku_gift_rule_id,idx_gift_rule_id_sku               | idx_sku_gift_rule_id | 302     | const                              |    1 |   100.00 | Using index; Start temporary |
|  1 | SIMPLE      | a     | NULL       | eq_ref | PRIMARY,idx_id_gift_start_time_gift_end_time_del_status | PRIMARY              | 8       | mgb_treasure_system.b.gift_rule_id |    1 |   100.00 | Using where                  |
|  1 | SIMPLE      | a     | NULL       | ref    | idx_gift_rule_id_sku                                    | idx_gift_rule_id_sku | 8       | mgb_treasure_system.b.gift_rule_id |    1 |   100.00 | Using index; End temporary   |
+----+-------------+-------+------------+--------+---------------------------------------------------------+----------------------+---------+------------------------------------+------+----------+------------------------------+
3 rows in set (0.03 sec)