MySQL derived table merging

A few days ago, while code-reviewing (CR) some SQL, I found a logic bug. The original SQL was:

 select user_id, max(create_time) as create_time,is_success
 from user_login_log
 group by user_id

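The intent is to fetch each user's latest record. However, is_success is not guaranteed to come from the same row as max(create_time): for a column that is neither in the GROUP BY nor aggregated, MySQL may return the value from any row in the group.
So I rewrote it as follows: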

 select user_id, create_time, is_success
 from (select * from user_login_log order by user_id, create_time desc) b
 group by b.user_id;

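However, when I checked the results, the data was still wrong.
Running EXPLAIN on this query showed only one row in the plan. After asking our DBA, I learned that this was caused by derived table merging.
Derived table merging is an optimization introduced in MySQL 5.7. On 5.6, the plan for the same statement has two rows, one of which is the derived table (select_type DERIVED).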
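To check whether a derived table was merged, compare the EXPLAIN output. A minimal sketch, assuming the article's user_login_log table and a sql_mode without ONLY_FULL_GROUP_BY (the query selects non-grouped columns, which that mode rejects); the comments describe what to look for rather than output captured from a server:

```sql
-- Assumes the article's user_login_log table and a sql_mode
-- without ONLY_FULL_GROUP_BY (otherwise this GROUP BY is rejected).
EXPLAIN
SELECT user_id, create_time, is_success
FROM (SELECT * FROM user_login_log ORDER BY user_id, create_time DESC) b
GROUP BY b.user_id;

-- What to look for in the select_type / table columns:
--   merged (5.7, derived_merge=on):      one row, SIMPLE, user_login_log
--   materialized (5.6, or merging off):  PRIMARY on <derived2>,
--                                        plus DERIVED on user_login_log
```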
The optimizer can handle a derived table in one of two ways:

  • Merge the derived table into the outer query block.
  • Materialize the derived table to an internal temporary table.

Examples of merging:
Before: SELECT * FROM (SELECT * FROM t1) AS derived_t1;
After:  SELECT * FROM t1;

Or

Before:
  SELECT *
  FROM t1 JOIN (SELECT t2.f1 FROM t2) AS derived_t2 ON t1.f2=derived_t2.f1
  WHERE t1.f1 > 0;
After:
  SELECT t1.*, t2.f1
  FROM t1 JOIN t2 ON t1.f2=t2.f1
  WHERE t1.f1 > 0;

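After merging, the optimizer carries the derived table's ORDER BY clause over to the outer query block only when certain conditions hold; otherwise it ignores that ORDER BY clause.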

The optimizer propagates an ORDER BY clause in a derived table or view reference to the outer query block if these conditions are all true:
 - The outer query is not grouped or aggregated.
 - The outer query does not specify DISTINCT, HAVING, or ORDER BY.
 - The outer query has this derived table or view reference as the only source in the FROM clause.
Otherwise, the optimizer ignores the ORDER BY clause.
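Applied to the query shapes in this article, the rules work out as follows. A sketch assuming the same user_login_log table; the comments restate the manual's conditions rather than describe output captured from a server:

```sql
-- Propagated: the outer query is not grouped, has no DISTINCT/HAVING/ORDER BY,
-- and b is the only source in its FROM clause, so the inner ORDER BY is kept.
SELECT user_id, create_time, is_success
FROM (SELECT * FROM user_login_log ORDER BY user_id, create_time DESC) b;

-- Ignored: the outer query specifies its own ORDER BY, which alone
-- determines the final order.
SELECT user_id, create_time, is_success
FROM (SELECT * FROM user_login_log ORDER BY user_id, create_time DESC) b
ORDER BY b.create_time;

-- Ignored: the outer query is grouped (the case in this article;
-- needs a sql_mode without ONLY_FULL_GROUP_BY to run at all).
SELECT user_id, create_time, is_success
FROM (SELECT * FROM user_login_log ORDER BY user_id, create_time DESC) b
GROUP BY b.user_id;
```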

In the rewritten query above, the outer query has a GROUP BY clause, so the ORDER BY inside the derived table is ignored.

How can this be fixed?
One option is to turn derived table merging off:

SET optimizer_switch = 'derived_merge=off';

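But this is far too broad: it changes how every derived table in the session is handled (and SET GLOBAL would change it for the whole server). Merging also exists for a good reason. MySQL added it to reduce query cost: a materialized derived table is an internal temporary table, and creating it, then sorting or grouping it, all costs time. So it is best to leave this flag alone.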
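If one statement really does need materialization, a narrower variant of the switch above is to change the flag for the current session only and restore it right afterwards. A sketch; @saved_switch is a hypothetical user variable name, and the NO_MERGE optimizer hint shown in the final comment exists in MySQL 8.0 but not in 5.7:

```sql
-- Remember the session's current flags (@saved_switch is an illustrative name).
SET @saved_switch = @@SESSION.optimizer_switch;

-- Turn off merging for this session only; the other flags keep their values.
SET SESSION optimizer_switch = 'derived_merge=off';

-- ... run the statement that needs its derived table materialized ...

-- Put the original flags back.
SET SESSION optimizer_switch = @saved_switch;

-- MySQL 8.0 only: a per-query hint instead of a session flag, e.g.
-- SELECT /*+ NO_MERGE(b) */ ... FROM (SELECT ...) b ...;
```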

Instead, the MySQL manual points out that merging can be prevented from inside the subquery itself:

It is possible to disable merging by using in the subquery any constructs that prevent merging, although these are not as explicit in their effect on materialization. Constructs that prevent merging are the same for derived tables and view references:

 - Aggregate functions (SUM(), MIN(), MAX(), COUNT(), and so forth)
 - DISTINCT
 - GROUP BY
 - HAVING
 - LIMIT
 - UNION or UNION ALL
 - Subqueries in the select list
 - Assignments to user variables
 - References only to literal values (in this case, there is no underlying table)

The final SQL:

 SELECT user_id, create_time, is_success
 FROM (SELECT * FROM user_login_log ORDER BY user_id, create_time DESC LIMIT 100000) b
 GROUP BY b.user_id;

or

 SELECT user_id, create_time, is_success
 FROM (SELECT DISTINCT user_id, create_time, is_success FROM user_login_log ORDER BY user_id, create_time DESC LIMIT 100000) b
 GROUP BY b.user_id;

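Of the two, LIMIT is somewhat cheaper than DISTINCT, and ORDER BY ... LIMIT can let the optimizer use a priority-queue sort when the limited rows fit in the sort buffer. Keep in mind that the LIMIT value must be at least the number of rows in user_login_log: rows past it are silently discarded before grouping. Using the largest value MySQL accepts, 18446744073709551615, effectively removes that cap.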
Query results before and after the change:
[Figure: query results before the change]
[Figure: query results after the change]
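Even with merging blocked, the MySQL manual does not promise that GROUP BY returns the first row of each group: for nonaggregated columns the server is free to choose any value from the group, and ORDER BY does not affect which value it chooses. When a guaranteed result matters, joining each row to its user's MAX(create_time) is a common alternative. This is a different technique from the fix above, not the author's query. A sketch, assuming a hypothetical login_demo table (and index) that mirrors the columns used here, created in the currently selected database:

```sql
-- Hypothetical demo table; the real user_login_log schema is not shown in the article.
DROP TABLE IF EXISTS login_demo;
CREATE TABLE login_demo (
  id          INT AUTO_INCREMENT PRIMARY KEY,
  user_id     INT      NOT NULL,
  create_time DATETIME NOT NULL,
  is_success  TINYINT  NOT NULL,
  KEY idx_user_time (user_id, create_time)  -- assumed index for the MAX lookup
);

INSERT INTO login_demo (user_id, create_time, is_success) VALUES
  (1, '2020-01-01 08:00:00', 1),
  (1, '2020-01-02 08:00:00', 0),  -- latest row for user 1
  (2, '2020-01-01 09:00:00', 0),
  (2, '2020-01-03 09:00:00', 1);  -- latest row for user 2

-- Join each row to its group's latest create_time, so every selected
-- column comes from that same row.
SELECT l.user_id, l.create_time, l.is_success
FROM login_demo AS l
JOIN (
  SELECT user_id, MAX(create_time) AS max_time
  FROM login_demo
  GROUP BY user_id
) AS m
  ON l.user_id = m.user_id
 AND l.create_time = m.max_time
ORDER BY l.user_id;
```

This query also satisfies ONLY_FULL_GROUP_BY, which is part of the default sql_mode from MySQL 5.7.5 on. If a user has two rows with the same latest create_time, both are returned. On MySQL 8.0, computing ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY create_time DESC) in a derived table and keeping row number 1 is another option.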
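Oracle performs a similar transformation: for inline views in the FROM clause it is called view merging, while subquery unnesting is the related transformation for subqueries in the WHERE clause.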

MySQL Reference Manual (English):

https://dev.mysql.com/doc/refman/5.7/en/upgrading-from-previous-series.html
https://dev.mysql.com/doc/refman/5.7/en/derived-table-optimization.html
https://dev.mysql.com/doc/refman/5.7/en/rewriting-subqueries.html
https://dev.mysql.com/doc/refman/5.7/en/limit-optimization.html
https://dev.mysql.com/doc/refman/5.7/en/semijoins.html
