MySQL NOT IN vs. LEFT JOIN: notes on a query efficiency problem

Original

2020-06-14 21:06

Acknowledgement: thanks to laserhe, denniswwh, ACMAIN_CHM, and vinsonshen for their generous help.

First, what this SQL does: it returns the rows of set A that are not in set B.
The NOT IN form
select add_tb.RUID
from (select distinct RUID
      from UserMsg
      where SubjectID =12
      and CreateTime>'2009-8-14 15:30:00'
      and CreateTime<='2009-8-17 16:00:00'
) add_tb
where add_tb.RUID
not in (select distinct RUID
          from UserMsg
          where SubjectID =12
    and CreateTime<'2009-8-14 15:30:00'
)
Returns 444 rows in 0.07 sec
EXPLAIN output
+----+--------------------+------------+----------------+---------------------------+------------+---------+------+------+------------------------------+
| id | select_type        | table      | type           | possible_keys             | key        | key_len | ref  | rows | Extra                        |
+----+--------------------+------------+----------------+---------------------------+------------+---------+------+------+------------------------------+
|  1 | PRIMARY            | <derived2> | ALL            | NULL                      | NULL       |    NULL | NULL |  452 | Using where                  |
|  3 | DEPENDENT SUBQUERY | UserMsg    | index_subquery | RUID,SubjectID,CreateTime | RUID       |      96 | func |    2 | Using index; Using where     |
|  2 | DERIVED            | UserMsg    | range          | SubjectID,CreateTime      | CreateTime |       9 | NULL | 1857 | Using where; Using temporary |
+----+--------------------+------------+----------------+---------------------------+------------+---------+------+------+------------------------------+
Analysis: this query is fast because the derived table (id=2) returns few rows, so the outer query (id=1) has little to scan, and each NOT IN check (id=3) is an index lookup on RUID. The id=2 step uses a temporary table, and I was not sure whether an index is still used at that point. (Its EXPLAIN row shows type=range with key=CreateTime, so the scan of UserMsg itself goes through the CreateTime index; the temporary table holds the DISTINCT result.)
One LEFT JOIN form
select a.ruid,b.ruid
from(select distinct RUID
     from UserMsg
     where SubjectID =12
     and CreateTime >= '2009-8-14 15:30:00'
     and CreateTime<='2009-8-17 16:00:00'
) a left join (
    select distinct RUID
    from UserMsg
    where SubjectID =12 and CreateTime< '2009-8-14 15:30:00'
) b on a.ruid = b.ruid
where b.ruid is null
Returns 444 rows in 0.39 sec
EXPLAIN output
+----+-------------+------------+-------+----------------------+------------+---------+------+------+------------------------------+
| id | select_type | table      | type  | possible_keys        | key        | key_len | ref  | rows | Extra                        |
+----+-------------+------------+-------+----------------------+------------+---------+------+------+------------------------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL                 | NULL       |    NULL | NULL |  452 |                              |
|  1 | PRIMARY     | <derived3> | ALL   | NULL                 | NULL       |    NULL | NULL | 1112 | Using where; Not exists      |
|  3 | DERIVED     | UserMsg    | ref   | SubjectID,CreateTime | SubjectID  |       5 |      | 6667 | Using where; Using temporary |
|  2 | DERIVED     | UserMsg    | range | SubjectID,CreateTime | CreateTime |       9 | NULL | 1838 | Using where; Using temporary |
+----+-------------+------------+-------+----------------------+------------+---------+------+------+------------------------------+
Analysis: two temporary tables are built and then joined to each other. EXPLAIN shows type=ALL with no key on both derived tables, so the join cannot use an index and degenerates towards a Cartesian product (452 x 1112 row combinations), which makes the amount of data processed large.
Another LEFT JOIN form
select distinct a.RUID
from UserMsg a
left join UserMsg b
    on a.ruid = b.ruid
    and b.subjectID =12 and b.createTime < '2009-8-14 15:30:00'
where a.subjectID =12
and a.createTime >= '2009-8-14 15:30:00'
and a.createtime <='2009-8-17 16:00:00'
and b.ruid is null;
Returns 444 rows in 0.07 sec
EXPLAIN output
+----+-------------+-------+-------+---------------------------+------------+---------+--------------+------+-----------------------------------+
| id | select_type | table | type  | possible_keys             | key        | key_len | ref          | rows | Extra                             |
+----+-------------+-------+-------+---------------------------+------------+---------+--------------+------+-----------------------------------+
|  1 | SIMPLE      | a     | range | SubjectID,CreateTime      | CreateTime |       9 | NULL         | 1839 | Using where; Using temporary      |
|  1 | SIMPLE      | b     | ref   | RUID,SubjectID,CreateTime | RUID       |      96 | dream.a.RUID |    2 | Using where; Not exists; Distinct |
+----+-------------+-------+-------+---------------------------+------------+---------+--------------+------+-----------------------------------+
Analysis: both table accesses use an index, and the lookup on b is done while a is being scanned (one join, no intermediate derived table), so the query should be very efficient.
Using NOT EXISTS
select distinct a.ruid
from UserMsg a
where a.subjectID =12
and a.createTime >= '2009-8-14 15:30:00'
and a.createTime <='2009-8-17 16:00:00'
and not exists (
    select distinct RUID
    from UserMsg
    where subjectID =12 and createTime < '2009-8-14 15:30:00'
    and ruid=a.ruid
)
Returns 444 rows in 0.08 sec
EXPLAIN output
+----+--------------------+---------+-------+---------------------------+------------+---------+--------------+------+------------------------------+
| id | select_type        | table   | type  | possible_keys             | key        | key_len | ref          | rows | Extra                        |
+----+--------------------+---------+-------+---------------------------+------------+---------+--------------+------+------------------------------+
|  1 | PRIMARY            | a       | range | SubjectID,CreateTime      | CreateTime |       9 | NULL         | 1839 | Using where; Using temporary |
|  2 | DEPENDENT SUBQUERY | UserMsg | ref   | RUID,SubjectID,CreateTime | RUID       |      96 | dream.a.RUID |    2 | Using where                  |
+----+--------------------+---------+-------+---------------------------+------------+---------+--------------+------+------------------------------+
Analysis: essentially the same plan as the previous one, except that it is split into two queries run one after the other (an outer query plus a dependent subquery), so it is somewhat slower than the third query.
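The four forms above are meant to return the same set of RUIDs. As a quick sanity check of that equivalence (not of the timings), here is a minimal sketch that runs them against an in-memory SQLite database from Python. Everything about the data is an assumption for illustration: the sample rows, the reduced UserMsg schema, the idx_ruid index name, and the zero-padded timestamps (SQLite compares them as text). SQLite's planner is not MySQL's, so this says nothing about MySQL performance.

```python
import sqlite3

# Hypothetical window boundaries, zero-padded so text comparison orders them correctly.
CUTOFF = "2009-08-14 15:30:00"
STOP = "2009-08-17 16:00:00"

# Hypothetical sample rows: (RUID, SubjectID, CreateTime).
ROWS = [
    ("u1", 12, "2009-08-10 09:00:00"),  # u1 was active before the window
    ("u1", 12, "2009-08-15 10:00:00"),  # ... and again inside it
    ("u2", 12, "2009-08-15 11:00:00"),  # u2 appears only inside the window
    ("u3", 12, "2009-08-16 12:00:00"),  # u3 appears twice inside the window
    ("u3", 12, "2009-08-16 13:00:00"),
    ("u4", 12, "2009-08-01 08:00:00"),  # u4 appears only before the window
    ("u5", 7, "2009-08-15 08:00:00"),   # different subject, must be ignored
]

QUERIES = {
    "not_in": """
        SELECT add_tb.RUID
        FROM (SELECT DISTINCT RUID FROM UserMsg
              WHERE SubjectID = 12 AND CreateTime >= :cutoff AND CreateTime <= :stop) add_tb
        WHERE add_tb.RUID NOT IN (SELECT DISTINCT RUID FROM UserMsg
                                  WHERE SubjectID = 12 AND CreateTime < :cutoff)""",
    "left_join_two_derived": """
        SELECT a.RUID, b.RUID
        FROM (SELECT DISTINCT RUID FROM UserMsg
              WHERE SubjectID = 12 AND CreateTime >= :cutoff AND CreateTime <= :stop) a
        LEFT JOIN (SELECT DISTINCT RUID FROM UserMsg
                   WHERE SubjectID = 12 AND CreateTime < :cutoff) b ON a.RUID = b.RUID
        WHERE b.RUID IS NULL""",
    "left_join_direct": """
        SELECT DISTINCT a.RUID
        FROM UserMsg a
        LEFT JOIN UserMsg b
            ON a.RUID = b.RUID AND b.SubjectID = 12 AND b.CreateTime < :cutoff
        WHERE a.SubjectID = 12 AND a.CreateTime >= :cutoff AND a.CreateTime <= :stop
          AND b.RUID IS NULL""",
    "not_exists": """
        SELECT DISTINCT a.RUID
        FROM UserMsg a
        WHERE a.SubjectID = 12 AND a.CreateTime >= :cutoff AND a.CreateTime <= :stop
          AND NOT EXISTS (SELECT 1 FROM UserMsg
                          WHERE SubjectID = 12 AND CreateTime < :cutoff AND RUID = a.RUID)""",
}


def build_db():
    """Create the reduced UserMsg table in memory and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserMsg (RUID TEXT NOT NULL, SubjectID INTEGER NOT NULL, CreateTime TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_ruid ON UserMsg (RUID)")
    conn.executemany("INSERT INTO UserMsg VALUES (?, ?, ?)", ROWS)
    return conn


def new_ruids(conn, name):
    """Run one of the four query forms and return the RUIDs it reports as a set."""
    params = {"cutoff": CUTOFF, "stop": STOP}
    return {row[0] for row in conn.execute(QUERIES[name], params)}


if __name__ == "__main__":
    conn = build_db()
    for name in QUERIES:
        print(name, sorted(new_ruids(conn, name)))
```

One design note: RUID is declared NOT NULL here on purpose. If the subquery of a NOT IN can return NULL, NOT IN stops matching any row, so the equivalence with the LEFT JOIN and NOT EXISTS forms only holds for non-null keys.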

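To check the comparison on more data, I removed the subjectID = 12 condition from the queries above. The execution times, in the same order as the queries, were:
NOT IN: 0.20s
LEFT JOIN of two derived tables: 21.31s
LEFT JOIN on UserMsg directly: 0.25s
NOT EXISTS: 0.43s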

Summary of laserhe's analysis of the problem
select a.ruid,b.ruid
from (select distinct RUID
      from UserMsg
      where CreateTime >= '2009-8-14 15:30:00'
      and CreateTime<='2009-8-17 16:00:00'
) a left join UserMsg b
    on a.ruid = b.ruid
    and b.createTime < '2009-8-14 15:30:00'
where b.ruid is null;
Execution time: 0.13s
+----+-------------+------------+-------+-----------------+------------+---------+--------+------+------------------------------+
| id | select_type | table      | type  | possible_keys   | key        | key_len | ref    | rows | Extra                        |
+----+-------------+------------+-------+-----------------+------------+---------+--------+------+------------------------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL            | NULL       |    NULL | NULL   | 1248 |                              |
|  1 | PRIMARY     | b          | ref   | RUID,CreateTime | RUID       |      96 | a.RUID |    2 | Using where; Not exists      |
|  2 | DERIVED     | UserMsg    | range | CreateTime      | CreateTime |       9 | NULL   | 3553 | Using where; Using temporary |
+----+-------------+------------+-------+-----------------+------------+---------+--------+------+------------------------------+
Its efficiency is similar to that of NOT IN.

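A basic principle of database optimization: make the Cartesian product happen between the smallest possible sets. When MySQL joins a table directly it can scan it through an index, but once that table is wrapped inside a subquery, the query planner no longer knows to use a suitable index.
This is how a database optimizes a SQL statement. The SQL is first parsed into a parse tree, a tree-shaped data structure. The query planner then looks in that structure for usable indexes, enumerates the possible combinations for the case at hand, computes the cost of each combination (something like a machine-readable version of the EXPLAIN output), and finally picks the cheapest one and executes it. So compare:
explain select a.ruid,b.ruid
from (select distinct RUID
      from UserMsg
      where CreateTime >= '2009-8-14 15:30:00'
      and CreateTime<='2009-8-17 16:00:00'
) a left join UserMsg b
    on a.ruid = b.ruid
    and b.createTime < '2009-8-14 15:30:00'
where b.ruid is null;
and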
explain select add_tb.RUID
    -> from (select distinct RUID
    ->       from UserMsg
    ->       where CreateTime>'2009-8-14 15:30:00'
    ->       and CreateTime<='2009-8-17 16:00:00'
    -> ) add_tb
    ->   where add_tb.RUID
    ->   not in (select distinct RUID
    ->           from UserMsg
    ->           where CreateTime<'2009-8-14 15:30:00'
    -> );
EXPLAIN output
+----+--------------------+------------+----------------+-----------------+------------+---------+------+------+------------------------------+
| id | select_type        | table      | type           | possible_keys   | key        | key_len | ref  | rows | Extra                        |
+----+--------------------+------------+----------------+-----------------+------------+---------+------+------+------------------------------+
|  1 | PRIMARY            | <derived2> | ALL            | NULL            | NULL       |    NULL | NULL | 1248 | Using where                  |
|  3 | DEPENDENT SUBQUERY | UserMsg    | index_subquery | RUID,CreateTime | RUID       |      96 | func |    2 | Using index; Using where     |
|  2 | DERIVED            | UserMsg    | range          | CreateTime      | CreateTime |       9 | NULL | 3509 | Using where; Using temporary |
+----+--------------------+------------+----------------+-----------------+------------+---------+------+------+------------------------------+
The costs are practically the same. The cost can be read off the rows column: it is roughly the product of the rows values of all the lines, in other words the Cartesian product.
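Taking that rule of thumb literally, the comparison can be sketched in a few lines of Python. The rows values are copied from the two EXPLAIN outputs above; the plan labels and the function names estimated_cost and cheapest are made up for illustration, and a real optimizer's cost model is more involved than a plain product of row estimates.

```python
from math import prod

# rows column of each EXPLAIN output above, one entry per EXPLAIN line.
PLANS = {
    "left join UserMsg b (filter in ON)": [1248, 2, 3553],
    "not in (dependent subquery)": [1248, 2, 3509],
}


def estimated_cost(rows):
    """Rule-of-thumb cost: the product of the row estimates of every EXPLAIN line."""
    return prod(rows)


def cheapest(plans):
    """Mimic the planner's last step: pick the candidate with the lowest estimated cost."""
    return min(plans, key=lambda name: estimated_cost(plans[name]))


if __name__ == "__main__":
    for name, rows in PLANS.items():
        print(f"{name}: {estimated_cost(rows):,}")
    print("cheapest:", cheapest(PLANS))
```

Under this estimate the two plans differ by only about one percent, which matches the observation that they cost practically the same.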
But now look at this one:
explain select a.ruid,b.ruid
from (select distinct RUID
      from UserMsg
      where CreateTime >= '2009-8-14 15:30:00'
      and CreateTime<='2009-8-17 16:00:00'
) a left join (
    select distinct RUID
    from UserMsg
    where createTime < '2009-8-14 15:30:00'
) b on a.ruid = b.ruid
where b.ruid is null;
Execution time: 21.31s
+----+-------------+------------+-------+---------------+------------+---------+------+-------+------------------------------+
| id | select_type | table      | type  | possible_keys | key        | key_len | ref  | rows  | Extra                        |
+----+-------------+------------+-------+---------------+------------+---------+------+-------+------------------------------+
|  1 | PRIMARY     | <derived2> | ALL   | NULL          | NULL       |    NULL | NULL |  1248 |                              |
|  1 | PRIMARY     | <derived3> | ALL   | NULL          | NULL       |    NULL | NULL | 30308 | Using where; Not exists      |
|  3 | DERIVED     | UserMsg    | ALL   | CreateTime    | NULL       |    NULL | NULL | 69366 | Using where; Using temporary |
|  2 | DERIVED     | UserMsg    | range | CreateTime    | CreateTime |       9 | NULL |  3510 | Using where; Using temporary |
+----+-------------+------------+-------+---------------+------------+---------+------+-------+------------------------------+
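What I don't quite understand is why there are four rows, and why the two in the middle are so enormous. In principle the query planner should be able to optimize this query into the same plan as the previous two (in PostgreSQL, which I know well, I am confident it would), but in MySQL it does not. So my feeling is that MySQL's query planner is still a bit crude.

As I said earlier, the basic principle of optimization is to make the Cartesian product happen between the smallest possible sets, and this last form at least does not violate that principle. Because so many rows of table b satisfy the condition, the index will mostly not be used for the filter itself, but that should not stop the optimizer from seeing the join's ON condition outside and, as in the previous two statements, picking the key for the join (RUID in the plans above).

I described the planner's job earlier: in theory, walking through every possibility and computing the cost of each is the reasonable approach. My impression is that for this last form it did not enumerate all the possibilities. Perhaps the subquery implementation is still fairly simple? Subqueries really are a challenge for a database, because they are essentially recursive, so it is not surprising to find some flaws at this stage.

If you think about it carefully, the last form is just a variant of the first one; the key is where the WHERE condition on table b is placed. Put it inside (in the derived table) and the index is not used for the join; put it outside (in the ON clause) and it is. That placement is itself just one of the possible combinations.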
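The difference between the two placements can be illustrated with a toy anti-join in Python. This is only a sketch of the general idea (a nested-loop anti-join with and without an index on the inner side), not MySQL's actual implementation; the function names and the generated keys are hypothetical, and a Python set stands in for the RUID index.

```python
def anti_join_scan(outer, inner):
    """Anti-join without an index: scan the inner rows for every outer key.

    Scanning stops at the first match, similar in spirit to the "Not exists"
    note in the EXPLAIN output. Returns the unmatched keys and the number of
    comparisons performed.
    """
    comparisons = 0
    unmatched = []
    for key in outer:
        found = False
        for other in inner:
            comparisons += 1
            if other == key:
                found = True
                break
        if not found:
            unmatched.append(key)
    return unmatched, comparisons


def anti_join_indexed(outer, inner):
    """Anti-join with an index on the inner side: one lookup per outer key."""
    index = set(inner)  # stands in for the index on RUID
    lookups = 0
    unmatched = []
    for key in outer:
        lookups += 1
        if key not in index:
            unmatched.append(key)
    return unmatched, lookups


if __name__ == "__main__":
    # Hypothetical data: 100 recent users, 1000 earlier users, 50 of them in both sets.
    outer = [f"u{i}" for i in range(100)]
    inner = [f"u{i}" for i in range(50, 1050)]
    scan_result, comparisons = anti_join_scan(outer, inner)
    index_result, lookups = anti_join_indexed(outer, inner)
    print(len(scan_result), comparisons)
    print(len(index_result), lookups)
```

Both functions return the same keys, but the scan version does work proportional to the product of the two set sizes whenever a key has no match, which is the situation of the plan with two derived tables.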
