JOIN, CROSS JOIN and INNER JOIN are equivalent in MySQL

Today I came across the following passage in the official MySQL documentation (reference 1):

At the parser stage, queries with right outer join operations are converted to equivalent queries containing only left join operations. In the general case, the conversion is performed such that this right join:

(T1, ...) RIGHT JOIN (T2, ...) ON P(T1, ..., T2, ...)

Becomes this equivalent left join:

(T2, ...) LEFT JOIN (T1, ...) ON P(T1, ..., T2, ...)

All inner join expressions of the form T1 INNER JOIN T2 ON P(T1,T2) are replaced by the list T1,T2,  and P(T1,T2) being joined as a conjunct to the WHERE condition (or to the join condition of the embedding join, if there is any).

This passage says that when MySQL parses a query, it converts every RIGHT JOIN into a LEFT JOIN. For INNER JOIN, it performs the following transformation:

FROM (T1, ...) INNER JOIN (T2, ...) ON P(T1, ..., T2, ...)

becomes:

FROM (T1, ..., T2, ...) WHERE P(T1, ..., T2, ...)
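To make the two rewrites concrete, here is a toy model in Python. It is only a sketch under my own assumptions, not how MySQL implements joins: tables are lists of dicts, column names carry a table prefix so they never collide, None stands in for NULL, and the helpers `cross_join`, `inner_join`, `left_join` and `right_join` are hypothetical names. It checks that a RIGHT JOIN gives the same rows as the mirrored LEFT JOIN, and that an INNER JOIN ON P gives the same rows as the Cartesian product filtered by P.

```python
# Toy model of the two parser rewrites. Assumptions: tables are lists of
# dicts, column names are table-prefixed, None stands in for SQL NULL.
from collections import Counter
from itertools import product

def merge(a, b):
    return {**a, **b}

def cross_join(t1, t2):
    """FROM T1, T2: the pure Cartesian product."""
    return [merge(a, b) for a, b in product(t1, t2)]

def cross_join_where(t1, t2, p):
    """FROM T1, T2 WHERE P: Cartesian product, then filter by P."""
    return [row for row in cross_join(t1, t2) if p(row)]

def inner_join(t1, t2, p):
    """T1 INNER JOIN T2 ON P."""
    return [merge(a, b) for a in t1 for b in t2 if p(merge(a, b))]

def left_join(t1, t2, p, t2_cols):
    """T1 LEFT JOIN T2 ON P: unmatched T1 rows get NULL for T2's columns."""
    out = []
    for a in t1:
        matches = [merge(a, b) for b in t2 if p(merge(a, b))]
        out.extend(matches or [merge(a, dict.fromkeys(t2_cols))])
    return out

def right_join(t1, t2, p, t1_cols):
    """T1 RIGHT JOIN T2 ON P: unmatched T2 rows get NULL for T1's columns."""
    out = []
    for b in t2:
        matches = [merge(a, b) for a in t1 if p(merge(a, b))]
        out.extend(matches or [merge(dict.fromkeys(t1_cols), b)])
    return out

def as_multiset(rows):
    """Compare results while ignoring row order and column order."""
    return Counter(frozenset(r.items()) for r in rows)

T1 = [{"t1_id": 1, "t1_x": 10}, {"t1_id": 2, "t1_x": 20}, {"t1_id": 3, "t1_x": 30}]
T2 = [{"t2_id": 1, "t2_y": "a"}, {"t2_id": 1, "t2_y": "b"}, {"t2_id": 4, "t2_y": "c"}]
T1_COLS = ["t1_id", "t1_x"]

def p(row):
    return row["t1_id"] == row["t2_id"]

# (T1) RIGHT JOIN (T2) ON P  has the same rows as  (T2) LEFT JOIN (T1) ON P
assert as_multiset(right_join(T1, T2, p, T1_COLS)) == as_multiset(left_join(T2, T1, p, T1_COLS))
# T1 INNER JOIN T2 ON P  has the same rows as  FROM T1, T2 WHERE P
assert as_multiset(inner_join(T1, T2, p)) == as_multiset(cross_join_where(T1, T2, p))
```

The bare `cross_join(T1, T2)` here yields 9 rows while the INNER JOIN yields 2, which is the row-count gap discussed next; it only disappears once P is applied in the WHERE clause.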

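Converting RIGHT JOIN to LEFT JOIN is understandable. On the one hand, it unifies the underlying implementation; on the other hand, it is forced by the fact that MySQL implements only one join algorithm, the nested-loop join (NLJ) (the so-called BNL, BKA and similar algorithms are still NLJ at heart). This holds for the MySQL 5.6 manual cited here; MySQL 8.0.18 and later also support hash joins. More on this later (see references 3 and 4). The conversion applied to INNER JOIN, however, puzzled me: it seems to turn the INNER JOIN into a CROSS JOIN, and in standard SQL the two are certainly not equivalent. A CROSS JOIN is a pure Cartesian product, so it generally produces more rows than an INNER JOIN. Then I found another passage on the MySQL website (reference 2):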

In MySQL, JOIN, CROSS JOIN, and INNER JOIN are syntactic equivalents (they can replace each other). In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause, CROSS JOIN is used otherwise.

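This passage says that in MySQL, JOIN, CROSS JOIN and INNER JOIN are syntactically equivalent, whereas in standard SQL they are not. With that, everything makes sense. Note also that the rewrite does not drop the join condition: P moves into the WHERE clause, and filtering the Cartesian product by P yields exactly the rows of an INNER JOIN on P.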

Besides this, I also found the following passage on the MySQL website (reference 2):

When the optimizer evaluates plans for outer join operations, it takes into consideration only plans where, for each such operation, the outer tables are accessed before the inner tables. The optimizer choices are limited because only such plans enable outer joins to be executed using the nested-loop algorithm.
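
This passage explains why MySQL converts RIGHT JOIN into LEFT JOIN. MySQL implements only the nested-loop algorithm, whose core idea is that the outer table drives the inner table: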

for each row in t1 matching range {
  for each row in t2 matching reference key {
    for each row in t3 {
      if row satisfies join conditions, send to client
    }
  }
}

It follows that the outer table must be accessed first. If you are interested, reference 4 describes three table-join algorithms in more detail.
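Here is a minimal Python sketch of why an outer join fits the nested-loop shape only when the preserved table is the outer one. It assumes tables are lists of dicts and None stands for NULL; `nested_loop_left_join` is a hypothetical helper, not MySQL code. The NULL-complemented row for an outer row can only be produced after the whole inner table has been scanned for that row, which is natural when the preserved table drives the loop.

```python
# Sketch of a nested-loop LEFT JOIN. Assumptions: tables are lists of dicts,
# None stands for NULL; all names here are illustrative only.
from typing import Callable, Dict, Iterator, List

Row = Dict[str, object]

def nested_loop_left_join(
    outer: List[Row],
    inner: List[Row],
    on: Callable[[Row], bool],
    inner_cols: List[str],
) -> Iterator[Row]:
    for o in outer:                  # the outer (preserved) table drives the loop
        matched = False
        for i in inner:              # the inner table is rescanned for every outer row
            row = {**o, **i}
            if on(row):
                matched = True
                yield row
        if not matched:
            # Only after the full inner scan do we know that `o` had no
            # partner, so only now can the NULL-complemented row be emitted.
            yield {**o, **dict.fromkeys(inner_cols)}

T1 = [{"t1_id": 1}, {"t1_id": 2}]
T2 = [{"t2_id": 1, "t2_v": "x"}]
rows = list(nested_loop_left_join(T1, T2, lambda r: r["t1_id"] == r["t2_id"], ["t2_id", "t2_v"]))
```

If T2 drove the loop instead, the join would need extra bookkeeping to remember which T1 rows never matched, which the plain nested loop does not keep.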

Finally, I also found this passage on the MySQL website (reference 1):

Consider a query of this form, where R(T2) greatly narrows the number of matching rows from table T2:

SELECT * FROM T1 LEFT JOIN T2 ON P1(T1,T2)
  WHERE P(T1,T2) AND R(T2)

If the query is executed as written, the optimizer has no choice but to access the less-restricted table T1 before the more-restricted table T2, which may produce a very inefficient execution plan.

Instead, MySQL converts the query to a query with no outer join operation if the WHERE condition is null-rejected. (That is, it converts the outer join to an inner join.) A condition is said to be null-rejected for an outer join operation if it evaluates to FALSE or UNKNOWN for any NULL-complemented row generated for the operation.

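In other words, in certain cases MySQL converts a LEFT JOIN into an INNER JOIN. This raises two questions: 1. Why do this conversion? 2. Under what conditions is the conversion allowed?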

The documentation actually answers both questions, although its answer to the second one is not easy to follow. Here is my understanding of both:

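First, the purpose of the conversion is to speed up the query. In the example above, R(T2) in the WHERE clause could filter out most of T2's rows, but because of the nested-loop constraint the LEFT JOIN forces MySQL to read T1 first and let T1 drive T2. Once the query is an INNER JOIN, that constraint disappears and the optimizer is free to read the more restricted table T2 first. Of course, not every LEFT JOIN can be converted to an INNER JOIN, which brings us to the second question. If you know the difference between LEFT JOIN and INNER JOIN well, the answer is easy to understand (if you don't, it is worth looking up first):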

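A LEFT JOIN takes T1 as the base and lets T2 match against it; for every T1 row that has no match, all of T2's columns in the result are NULL. In other words, the LEFT JOIN result contains every row of T1. An INNER JOIN, by contrast, returns only the rows where T1 and T2 match. Compared with the LEFT JOIN, then, the INNER JOIN omits exactly those T1 rows that T2 does not match. So if the WHERE condition guarantees that the result never contains those unmatched T1 rows, the LEFT JOIN and the INNER JOIN return the same result, and in that case the LEFT JOIN can be converted to an INNER JOIN.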
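The argument above can be checked on a toy example. This Python sketch assumes tables are lists of dicts, models SQL's three-valued logic with True / False / None (None meaning UNKNOWN), and uses hypothetical helpers `left_join`, `inner_join` and `where`. It shows that the LEFT JOIN's extra rows are exactly the NULL-complemented ones, and that a WHERE condition which drops them makes both joins return the same rows, while one that keeps them does not.

```python
# Assumptions: lists of dicts as tables, None as NULL, and three-valued
# logic modelled as True / False / None (UNKNOWN). Names are illustrative.
def sql_gt(a, b):
    """SQL '>': comparing with NULL yields UNKNOWN (None)."""
    if a is None or b is None:
        return None
    return a > b

def where(rows, cond):
    """WHERE keeps a row only if cond is TRUE; FALSE and UNKNOWN are dropped."""
    return [r for r in rows if cond(r) is True]

def inner_join(t1, t2, on):
    return [{**a, **b} for a in t1 for b in t2 if on({**a, **b})]

def left_join(t1, t2, on, t2_cols):
    out = []
    for a in t1:
        matches = [{**a, **b} for b in t2 if on({**a, **b})]
        out.extend(matches or [{**a, **dict.fromkeys(t2_cols)}])
    return out

T1 = [{"t1_a": 1}, {"t1_a": 2}, {"t1_a": 3}]
T2 = [{"t2_a": 1, "t2_b": 5}, {"t2_a": 2, "t2_b": 1}]

def on(r):
    return r["t1_a"] == r["t2_a"]

left = left_join(T1, T2, on, ["t2_a", "t2_b"])
inner = inner_join(T1, T2, on)

# The extra LEFT JOIN rows are exactly the NULL-complemented ones.
extra = [r for r in left if r not in inner]

def t2_b_gt_3(r):   # UNKNOWN on the NULL-complemented row, so that row is dropped
    return sql_gt(r["t2_b"], 3)

def t1_a_ge_2(r):   # looks only at T1, so the NULL-complemented row can survive
    return r["t1_a"] >= 2
```

With `t2_b_gt_3` as the WHERE condition, `where(left, ...)` and `where(inner, ...)` agree; with `t1_a_ge_2` they differ, because the unmatched row for `t1_a = 3` passes the filter.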

Now let's go back to the examples in the documentation:

T2.B IS NOT NULL
T2.B > 3
T2.C <= T1.C
T2.B < 2 OR T2.C > 1

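If R(T2) is any one of the conditions above, the filtered result is guaranteed to contain no T1 rows that T2 failed to match. Take T2.B > 3 as an example: for a T1 row with no match in T2, every T2 column is NULL, and NULL > 3 is not true (it evaluates to UNKNOWN), so that row is filtered out.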
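One way to check the four conditions is to evaluate each of them on a NULL-complemented row under SQL's three-valued logic. The sketch below assumes True / False / None model TRUE / FALSE / UNKNOWN, and the helpers `cmp`, `sql_or` and `is_null_rejected` are hypothetical. Trying a few sample T1 rows is only a sanity test, not a proof that a condition is null-rejected.

```python
# Assumptions: SQL three-valued logic modelled as True / False / None
# (None = UNKNOWN); rows are dicts keyed by column name. Checking a few sample
# T1 rows is a sanity test, not a proof of null-rejection.
import operator

def cmp(op, a, b):
    """A SQL comparison: any NULL operand makes the result UNKNOWN."""
    if a is None or b is None:
        return None
    return op(a, b)

def sql_or(x, y):
    """SQL OR: TRUE wins; otherwise UNKNOWN if either side is UNKNOWN."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False

# The NULL-complemented row: every T2 column is NULL.
NULL_T2 = {"B": None, "C": None}

POSITIVE = {
    "T2.B IS NOT NULL": lambda t1, t2: t2["B"] is not None,
    "T2.B > 3": lambda t1, t2: cmp(operator.gt, t2["B"], 3),
    "T2.C <= T1.C": lambda t1, t2: cmp(operator.le, t2["C"], t1["C"]),
    "T2.B < 2 OR T2.C > 1": lambda t1, t2: sql_or(cmp(operator.lt, t2["B"], 2),
                                                  cmp(operator.gt, t2["C"], 1)),
}

def is_null_rejected(cond, sample_t1):
    """True if cond is never TRUE (only FALSE/UNKNOWN) on a NULL-complemented row."""
    return all(cond(t1, NULL_T2) is not True for t1 in sample_t1)

SAMPLE_T1 = [{"B": 1, "C": 1}, {"B": 5, "C": 0}, {"B": None, "C": None}]

for text, cond in POSITIVE.items():
    print(f"{text:22} null-rejected: {is_null_rejected(cond, SAMPLE_T1)}")
```

Note that `T2.B IS NOT NULL` yields FALSE while the other three yield UNKNOWN; both outcomes make WHERE discard the row, which is why the definition accepts either.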

Conversely, the following R(T2) conditions clearly do not meet the requirement; working out why is left to the reader:

T2.B IS NULL
T1.B < 3 OR T2.B IS NOT NULL
T1.B < 3 OR T2.B > 3
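For readers who want to check their analysis, the same three-valued-logic sketch can search for a T1 row that lets a NULL-complemented row pass each condition. As before, True / False / None stand for TRUE / FALSE / UNKNOWN, and `find_witness` and the sample rows are hypothetical. `T2.B IS NULL` is TRUE precisely on NULL-complemented rows, and the two OR conditions are TRUE whenever T1.B < 3, whatever T2 contains, so an unmatched T1 row with a small B survives the WHERE clause.

```python
# Assumptions: SQL three-valued logic modelled as True / False / None
# (None = UNKNOWN); rows are dicts; helper names are illustrative only.
import operator

def cmp(op, a, b):
    """A SQL comparison: any NULL operand makes the result UNKNOWN."""
    if a is None or b is None:
        return None
    return op(a, b)

def sql_or(x, y):
    """SQL OR: TRUE wins; otherwise UNKNOWN if either side is UNKNOWN."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False

NULL_T2 = {"B": None}  # the NULL-complemented T2 part

NEGATIVE = {
    "T2.B IS NULL": lambda t1, t2: t2["B"] is None,
    "T1.B < 3 OR T2.B IS NOT NULL": lambda t1, t2: sql_or(cmp(operator.lt, t1["B"], 3),
                                                         t2["B"] is not None),
    "T1.B < 3 OR T2.B > 3": lambda t1, t2: sql_or(cmp(operator.lt, t1["B"], 3),
                                                 cmp(operator.gt, t2["B"], 3)),
}

def find_witness(cond, sample_t1):
    """Return a T1 row for which cond is TRUE on the NULL-complemented row."""
    for t1 in sample_t1:
        if cond(t1, NULL_T2) is True:
            return t1
    return None

SAMPLE_T1 = [{"B": 1}, {"B": 5}]
for text, cond in NEGATIVE.items():
    print(f"{text:30} kept for T1 row: {find_witness(cond, SAMPLE_T1)}")
```

For a T1 row with B = 5 the two OR conditions do reject the NULL-complemented row, but null-rejection requires this for every such row, so one witness is enough to block the conversion.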


References:

1. https://dev.mysql.com/doc/refman/5.6/en/outer-join-simplification.html (MySQL official documentation)

2. https://dev.mysql.com/doc/refman/5.6/en/join.html (MySQL official documentation)

3. http://blog.sina.com.cn/s/blog_aed82f6f0102x8al.html (How MySQL join performance works)

4. https://www.cnblogs.com/xqzt/p/4469673.html (The three table-join methods explained: hash join, merge join, nested loop)

