InnoDB Storage Engine Query Output Analysis (Including the Query Process and the do_select() Function)

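Do MySQL's query logic and the order of its results follow any pattern? The question hardly seemed worth discussing, but when I was asked it out of the blue, I felt I should properly understand the implementation behind it. So I ran some experiments and traced the source code, and came away with a much clearer picture of how queries are executed.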

Source Code Analysis

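Query execution is closely tied to the storage engine implementation, so the following focuses on how the InnoDB storage engine processes queries. As the entry point for query output, this article starts from the do_select() function (sql\sql_select.cc). This function finds the matching rows and either sends the results to the client over the socket or writes them into a table.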
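The row loop that do_select() drives through sub_select() (shown in the call chain that follows) can be sketched as below. This is a minimal, hypothetical single-table model, not MySQL code: scan_t, scan_next(), cond_fn and sub_select_sketch() are invented names, a row is reduced to one int, and "sending" a row just appends it to an array. The point it illustrates is that the server layer emits rows in exactly the order the storage engine returns them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical single-table model of the row loop in sub_select().
 * scan_t stands in for the handler cursor; rows are plain ints. */
typedef struct {
    const int *rows;   /* rows in the order the storage engine returns them */
    size_t     nrows;
    size_t     pos;    /* index of the next row to return */
} scan_t;

/* Plays the role of rr_sequential() -> ha_innobase::rnd_next():
 * stores the next row in *out and returns 1, or returns 0 at end of scan. */
static int scan_next(scan_t *s, int *out)
{
    if (s->pos >= s->nrows)
        return 0;
    *out = s->rows[s->pos++];
    return 1;
}

/* Condition check standing in for the WHERE evaluation done in
 * evaluate_join_record(); NULL means "accept every row". */
typedef int (*cond_fn)(int row);

/* Reads rows until the scan ends, "sends" each matching row by appending
 * it to out[], and returns how many rows were sent. Rows are never
 * reordered here: output order is exactly the engine's return order. */
static size_t sub_select_sketch(scan_t *s, cond_fn cond, int *out, size_t cap)
{
    size_t n = 0;
    int row;

    while (scan_next(s, &row)) {
        if (cond == NULL || cond(row)) {
            assert(n < cap);
            out[n++] = row;
        }
    }
    return n;
}
```

Fed the rows {3, 1, 2} with no condition, the sketch returns them as 3, 1, 2: any ordering visible in a result set comes from the engine side, which is what the rest of the article examines.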

First, here is the call chain:

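do_select(): entry function of the query.

| sub_select(): reads the records for one part of the join. It repeatedly calls ha_innobase::rnd_next() and evaluate_join_record() to fetch and process each record of that part. (sql\sql_select.cc:11705)

| | evaluate_join_record(): processes one fetched record. (sql\sql_select.cc:11758)

| | rr_sequential(): calls ha_innobase::rnd_next() to read the next record. (sql\records.cc:452)

| | | ha_innobase::rnd_next(): reads the next record. (storage\innobase\handler\ha_innodb.cc:6141)

| | | | ha_innobase::general_fetch(): fetches the next or previous record from the given index position. (storage\innobase\handler\ha_innodb.cc:5948)

| | | | | row_search_for_mysql(): searches the database for one row. Its work is divided into 6 phases, each handling one part of the search. (storage\innobase\row\row0sel.c:3369)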

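| | | | | | | rw_lock_get_writer(): returns the writer (exclusive) state of a read-write lock. Here it checks whether another thread is requesting an exclusive latch on the adaptive hash index; if so, the shared latch currently held is released. (storage\innobase\include\sync0rw.ic:122)

| | | | | | | row_sel_pop_cached_row_for_mysql(): reads one row from the row cache. (storage\innobase\row\row0sel.c:3167)

| | | | | | | | row_sel_copy_cached_field_for_mysql(): copies each field of the cached row. (storage\innobase\row\row0sel.c:3134)

| | | | | | | row_sel_try_search_shortcut_for_mysql(): uses the adaptive hash index to fetch a clustered index record. (storage\innobase\row\row0sel.c:3293)

| | | | | | | | row_sel_store_mysql_rec(): converts the fetched row from InnoDB format to MySQL format. (storage\innobase\row\row0sel.c:2692)

| | | | | | | | | row_sel_field_store_in_mysql_format(): converts each field of the InnoDB-format row to MySQL format. (storage\innobase\row\row0sel.c:2535)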

| | | | | | | sel_restore_position_for_mysql(): restores the cursor position on the index. (storage\innobase\row\row0sel.c:3070)

| | | | | | | | btr_pcur_restore_position_func(): restores the position of a persistent cursor. (storage\innobase\btr\btr0pcur.c:208)

| | | | | | | | | btr_cur_get_index(): gets the index. (storage\innobase\include\btr0pcur.ic:51)

| | | | | | | | | btr_pcur_get_rec(): gets the record the persistent cursor points to. (storage\innobase\include\btr0pcur.ic:104)

| | | | | | | | | | btr_cur_get_rec(): gets the record at the current cursor position. (storage\innobase\include\btr0pcur.ic:104)

| | | | | | | | | rec_get_offsets_func(): computes the offset of each field in the record. (storage\innobase\rem\rem0rec.c:524)

| | | | | | | | btr_pcur_move_to_next(): moves the persistent cursor to the next record. (storage\innobase\include\btr0pcur.ic:342)

| | | | | | | page_rec_is_infimum(): checks whether the current record is the page's infimum record, a pseudo-record smaller than any key value. (storage\innobase\include\page0page.ic:415)

| | | | | | | page_rec_is_supremum(): checks whether the current record is the page's supremum record, a pseudo-record larger than any key value. (storage\innobase\include\page0page.ic:403)

| | | | | | | rec_get_next_offs(): gets the offset of the next record in the same page. (storage\innobase\include\rem0rec.ic:325)

| | | | | | | rec_get_offsets_func(): computes the offset of each field in the record. (storage\innobase\rem\rem0rec.c:524)

| | | | | | | rec_offs_validate(): validates the record's field offsets. (storage\innobase\rem\rem0rec.c:954)

| | | | | | | row_sel_store_mysql_rec(): converts the fetched row from InnoDB format to MySQL format. (storage\innobase\row\row0sel.c:2692)

| | | | | | | | row_sel_field_store_in_mysql_format(): converts each field of the InnoDB-format row to MySQL format. (storage\innobase\row\row0sel.c:2535)

| | | | | | | | btr_pcur_store_position(): stores the cursor position. (storage\innobase\btr\btr0pcur.c:89)

| | | | | | | | | btr_pcur_get_block(): gets the buffer block of the persistent cursor. (storage\innobase\include\btr0pcur.ic:90)

| | | | | | | | | btr_pcur_get_page_cur(): gets the page cursor of the persistent cursor. (storage\innobase\include\btr0pcur.ic:64)

| | | | | | | | | page_cur_get_rec(): gets the record at the cursor position. (storage\innobase\include\page0cur.ic:76)

| | | | | | | | | dict_index_copy_rec_order_prefix(): copies the ordering prefix of the record, that is, the fields that determine its position in the index. (storage\innobase\dict\dict0dict.c:4185)

| | | | | | | | | | rec_copy_prefix_to_buf(): copies the record's prefix fields into a buffer. (storage\innobase\rem\rem0rec.c:1383)

| | | | | | | | | | dict_index_get_nth_field(): returns a pointer to the nth field (dict_field_t) of the index. (storage\innobase\include\dict0dict.ic:620)

| | | | | | | | | | dict_field_get_col(): returns the column (dict_col_t) of an index field. (storage\innobase\include\dict0dict.ic:663)

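| | | | | | | btr_pcur_move_to_next(): moves the persistent cursor to the next record. (storage\innobase\include\btr0pcur.ic:342)

| | | | | | | mtr_commit(): commits the mini-transaction (mtr), releasing the page latches it holds. (storage\innobase\mtr\mtr0mtr.c:247)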
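The btr_pcur_store_position() / sel_restore_position_for_mysql() pair in the chain above lets InnoDB commit the mini-transaction (dropping page latches) between rows and later find its place again from a saved copy of the record's ordering fields. The sketch below models only that idea under stated assumptions: pcur_t, pcur_store_position() and pcur_restore_and_move_next() are hypothetical names, the index is a sorted array, and restoring always re-searches. Real InnoDB first tries an optimistic check that the page is unchanged and tracks the cursor's relative position, none of which is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical persistent cursor over a sorted array of keys. */
typedef struct {
    const long *keys;
    size_t      n;
    size_t      pos;        /* current position, valid only while "latched" */
    long        saved_key;  /* ordering prefix saved by store_position */
} pcur_t;

/* Save the key of the current record so the position can be rebuilt
 * after latches are released and the index may have changed. */
static void pcur_store_position(pcur_t *c)
{
    assert(c->pos < c->n);
    c->saved_key = c->keys[c->pos];
}

/* Restore the position and step to the successor in one go: find the
 * first key strictly greater than the saved one (binary search).
 * Returns 1 if such a record exists (cursor now on it), 0 at the end. */
static int pcur_restore_and_move_next(pcur_t *c, const long *keys, size_t n)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (keys[mid] <= c->saved_key)
            lo = mid + 1;
        else
            hi = mid;
    }
    c->keys = keys;
    c->n = n;
    c->pos = lo;
    return lo < n;
}
```

Because the saved position is a key rather than a memory address, rows inserted or deleted while the cursor was "parked" do not break the scan, and the scan still continues in key order.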

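The call chain above shows clearly how MySQL executes a query. In short, if the record can be found through the adaptive hash index (which lives in memory), it is converted from InnoDB format to MySQL format and returned; otherwise, InnoDB positions a cursor on the index, reads the record from the current page, copies it into memory, and likewise converts it to MySQL format for output.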
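The two lookup paths can be sketched as follows. This is a hypothetical model, not InnoDB's adaptive hash index: index_t, ahi_t, index_search() and the 8-slot table are invented, a binary search over a sorted array stands in for the B+ tree descent, and entries are added to the hash on every miss purely for illustration, whereas real InnoDB builds hash entries adaptively from observed search patterns.

```c
#include <assert.h>
#include <stddef.h>

#define AHI_SLOTS 8  /* hypothetical, tiny hash table size */

/* Hypothetical stand-in for the adaptive hash index: key -> position. */
typedef struct {
    long key[AHI_SLOTS];
    long pos[AHI_SLOTS];  /* -1 marks an empty slot */
} ahi_t;

/* Hypothetical clustered index: keys kept in ascending order. */
typedef struct {
    const long *keys;
    size_t      n;
    ahi_t       ahi;
    int         hash_hits;  /* how often the shortcut path was taken */
} index_t;

static void index_init(index_t *ix, const long *keys, size_t n)
{
    assert(keys != NULL || n == 0);
    ix->keys = keys;
    ix->n = n;
    ix->hash_hits = 0;
    for (size_t i = 0; i < AHI_SLOTS; i++)
        ix->ahi.pos[i] = -1;
}

static size_t slot_of(long key)
{
    return (size_t)((unsigned long)key % AHI_SLOTS);
}

/* Returns the position of key in the index, or -1 if it is absent. */
static long index_search(index_t *ix, long key)
{
    size_t s = slot_of(key);

    /* Shortcut path: an in-memory hash probe, no tree descent. */
    if (ix->ahi.pos[s] >= 0 && ix->ahi.key[s] == key) {
        ix->hash_hits++;
        return ix->ahi.pos[s];
    }

    /* Normal path: binary search, standing in for a B+ tree descent. */
    size_t lo = 0, hi = ix->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ix->keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == ix->n || ix->keys[lo] != key)
        return -1;

    /* Illustration only: remember this key so the next lookup is a hit. */
    ix->ahi.key[s] = key;
    ix->ahi.pos[s] = (long)lo;
    return (long)lo;
}
```

Looking up the same key twice takes the slow path once and the hash shortcut the second time; either way the record found is the same, so the shortcut affects speed, not result order.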

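From the above alone, it is hard to see any pattern in the output. In fact, the output does follow a pattern, one closely tied to how the InnoDB storage engine is designed and how it stores data.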

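InnoDB stores table data clustered in a B+ tree index whose key is the primary key. If no primary key is defined, InnoDB uses the first UNIQUE index whose columns are all NOT NULL, and if there is none, it generates a hidden row ID to serve as the key. All leaf nodes of this B+ tree hold the data records, and the records can be traversed sequentially in logical (key) order. InnoDB also stores and reads data in units of pages, so during a query the whole page containing a record is loaded into memory; the default InnoDB page size is 16KB.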
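The ordering property described above can be sketched with a toy leaf level. This is a simplified, hypothetical model, not InnoDB's page format: each page keeps records in physical (insertion) order but chains them logically in key order from an infimum pseudo-record to a supremum pseudo-record (the chain rec_get_next_offs() follows in the call chain above), and leaf pages are chained to their right neighbour. PAGE_CAP, page_t, page_insert() and leaf_scan() are invented names; real pages also carry a page directory for binary search, and page splits are not modelled, so the caller must chain pages so that every key in one page is smaller than every key in the next.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_CAP 8   /* hypothetical: slots per page, tiny for the demo */
#define INFIMUM  0   /* slot 0: pseudo-record smaller than any key */
#define SUPREMUM 1   /* slot 1: pseudo-record larger than any key */

typedef struct page {
    long         key[PAGE_CAP];
    int          next[PAGE_CAP];  /* logical successor slot, like a next-record offset */
    int          used;            /* slots in use, in physical (insertion) order */
    struct page *next_page;       /* right neighbour in the leaf level */
} page_t;

static void page_init(page_t *p)
{
    p->used = 2;                  /* infimum and supremum always exist */
    p->next[INFIMUM] = SUPREMUM;
    p->next[SUPREMUM] = -1;
    p->next_page = NULL;
}

/* Store the record in the next free physical slot, but link it into the
 * logical chain at its key position. */
static void page_insert(page_t *p, long key)
{
    assert(p->used < PAGE_CAP);
    int slot = p->used++;
    int prev = INFIMUM;

    p->key[slot] = key;
    while (p->next[prev] != SUPREMUM && p->key[p->next[prev]] < key)
        prev = p->next[prev];
    p->next[slot] = p->next[prev];
    p->next[prev] = slot;
}

/* Full scan: follow the page chain, and inside each page the record chain
 * from infimum to supremum. Writes keys to out[] and returns the count. */
static size_t leaf_scan(const page_t *p, long *out, size_t cap)
{
    size_t n = 0;

    for (; p != NULL; p = p->next_page)
        for (int r = p->next[INFIMUM]; r != SUPREMUM; r = p->next[r]) {
            assert(n < cap);
            out[n++] = p->key[r];
        }
    return n;
}
```

Inserting 30, 10, 20 into one page and 50, 40 into its right neighbour still scans as 10, 20, 30, 40, 50: physical insertion order is irrelevant to a full scan.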

Experiments

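From the analysis above, it follows that the results of a simple query are generally returned in the storage order of the B+ tree index. To verify this further, the following two experiments were run.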

Experiment 1:

The test uses a simple table, student, which has a primary key; its definition is shown below. The test statement is simply select * from student;.

CREATE TABLE `student` (
  `std_id` int(11) NOT NULL,
  `std_name` varchar(20) NOT NULL DEFAULT '""',
  `std_spec` varchar(20) NOT NULL DEFAULT '""',
  `std_***` tinyint(4) NOT NULL DEFAULT '0',
  `std_age` tinyint(3) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`std_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The results are as follows:

mysql> select * from student;
+------------+----------+-------------+---------+---------+
| std_id     | std_name | std_spec    | std_*** | std_age |
+------------+----------+-------------+---------+---------+
| 2012072301 | aaa      | computer    |       0 |      20 |
| 2012072303 | ccc      | computer    |       1 |      21 |
| 2012072304 | ddd      | computer    |       0 |      20 |
| 2012072305 | eee      | information |       0 |      22 |
| 2012072306 | fff      | computer    |       1 |      20 |
| 2012072307 | ggg      | computer    |       0 |      20 |
| 2012072308 | hhh      | computer    |       0 |      21 |
| 2012072309 | iii      | automatic   |       0 |      20 |
| 2012072310 | abc      | computer    |       1 |      20 |
| 2012072311 | kkk      | computer    |       0 |      18 |
| 2012072312 | lll      | computer    |       0 |      20 |
| 2012072313 | mmm      | computer    |       0 |      20 |
| 2012072314 | nnn      | computer    |       1 |      20 |
| 2012072315 | ooo      | information |       0 |      20 |
| 2012072316 | ppp      | computer    |       0 |      19 |
| 2012072317 | qqq      | computer    |       1 |      20 |
| 2012072318 | rrr      | information |       0 |      20 |
| 2012072319 | sss      | computer    |       1 |      20 |
| 2012072320 | ttt      | computer    |       0 |      20 |
| 2012072321 | uuu      | automatic   |       0 |      23 |
| 2012072322 | vvv      | computer    |       0 |      20 |
| 2012072323 | www      | computer    |       1 |      20 |
| 2012072324 | xxx      | computer    |       0 |      25 |
| 2012072325 | yyy      | automatic   |       0 |      20 |
| 2012072326 | zzz      | computer    |       1 |      20 |
| 2012080811 | bbb      | information |       0 |      20 |
+------------+----------+-------------+---------+---------+

Experiment 2:

Again using the student table, the primary key of the record 2012080811 is updated as follows:

update student set std_id=2012072302 where std_id=2012080811;

Then the same query is run again, with the following result:

+------------+----------+-------------+---------+---------+
| std_id     | std_name | std_spec    | std_*** | std_age |
+------------+----------+-------------+---------+---------+
| 2012072301 | aaa      | computer    |       0 |      20 |
| 2012072302 | bbb      | information |       0 |      20 |
| 2012072303 | ccc      | computer    |       1 |      21 |
| 2012072304 | ddd      | computer    |       0 |      20 |
| 2012072305 | eee      | information |       0 |      22 |
| 2012072306 | fff      | computer    |       1 |      20 |
| 2012072307 | ggg      | computer    |       0 |      20 |
| 2012072308 | hhh      | computer    |       0 |      21 |
| 2012072309 | iii      | automatic   |       0 |      20 |
| 2012072310 | abc      | computer    |       1 |      20 |
| 2012072311 | kkk      | computer    |       0 |      18 |
| 2012072312 | lll      | computer    |       0 |      20 |
| 2012072313 | mmm      | computer    |       0 |      20 |
| 2012072314 | nnn      | computer    |       1 |      20 |
| 2012072315 | ooo      | information |       0 |      20 |
| 2012072316 | ppp      | computer    |       0 |      19 |
| 2012072317 | qqq      | computer    |       1 |      20 |
| 2012072318 | rrr      | information |       0 |      20 |
| 2012072319 | sss      | computer    |       1 |      20 |
| 2012072320 | ttt      | computer    |       0 |      20 |
| 2012072321 | uuu      | automatic   |       0 |      23 |
| 2012072322 | vvv      | computer    |       0 |      20 |
| 2012072323 | www      | computer    |       1 |      20 |
| 2012072324 | xxx      | computer    |       0 |      25 |
| 2012072325 | yyy      | automatic   |       0 |      20 |
| 2012072326 | zzz      | computer    |       1 |      20 |
+------------+----------+-------------+---------+---------+

Conclusion

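The tests show that changing the primary key changes the output order. In Experiment 1 the record 2012080811 came last; after its primary key was changed to 2012072302, it was returned as the second row. This confirms that, for this full table scan, InnoDB returns rows in the logical order of their primary keys in the leaf nodes of the B+ tree index.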
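Why the updated row moves: InnoDB handles a change to a primary key value as deleting (delete-marking) the old clustered index record and inserting a new one, so the row lands at the position of its new key. The sketch below models that with a sorted array; clust_t, clust_insert(), clust_delete() and clust_update_pk() are hypothetical names, and purge, locking and undo logging are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: the clustered index as a sorted array of primary keys. */
typedef struct {
    long   pk[32];
    size_t n;
} clust_t;

/* Insert while keeping ascending key order (shift larger keys right). */
static void clust_insert(clust_t *c, long pk)
{
    assert(c->n < sizeof c->pk / sizeof c->pk[0]);
    size_t i = c->n;
    while (i > 0 && c->pk[i - 1] > pk) {
        c->pk[i] = c->pk[i - 1];
        i--;
    }
    c->pk[i] = pk;
    c->n++;
}

/* Remove pk if present; returns 1 if it was removed. */
static int clust_delete(clust_t *c, long pk)
{
    for (size_t i = 0; i < c->n; i++)
        if (c->pk[i] == pk) {
            memmove(&c->pk[i], &c->pk[i + 1], (c->n - i - 1) * sizeof c->pk[0]);
            c->n--;
            return 1;
        }
    return 0;
}

/* UPDATE ... SET pk = new_pk WHERE pk = old_pk, modelled as delete + insert. */
static int clust_update_pk(clust_t *c, long old_pk, long new_pk)
{
    if (!clust_delete(c, old_pk))
        return 0;
    clust_insert(c, new_pk);
    return 1;
}
```

With keys 2012072301, 2012072303 and 2012080811, updating 2012080811 to 2012072302 leaves the array as 2012072301, 2012072302, 2012072303, matching the move observed in Experiment 2.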

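Comparing these tests: the tests in 《Innodb存儲引擎查詢輸出分析》 (InnoDB Storage Engine Query Output Analysis) only cover the ordinary full table scan and do not consider special queries, whereas the tests in 《MySQL技術內幕--SQL編程》 (MySQL Internals: SQL Programming) look precisely at special cases and conclude that query output is not necessarily in primary key order.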

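Therefore, the output order depends on how the query is executed. If the query performs a full table scan or reads through the primary key index, rows are returned in primary key order; if it reads through a secondary index, rows are returned in the key order of that secondary index. This describes observed implementation behavior: without ORDER BY, SQL does not guarantee any output order.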
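The secondary-index case can be sketched as follows. InnoDB secondary index records contain the secondary key columns plus the primary key columns, so an index scan visits rows ordered by the secondary key, with ties broken by primary key. The sketch assumes a hypothetical index on std_age (the student table above has none); sec_entry_t, sec_cmp() and sec_index_scan() are invented names, and whether the optimizer actually picks such an index for a given query is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical secondary index entry: secondary key plus primary key. */
typedef struct {
    int  sec_key;  /* e.g. std_age, assuming an index on it */
    long pk;       /* std_id */
} sec_entry_t;

/* Order by secondary key first, then by primary key. */
static int sec_cmp(const void *a, const void *b)
{
    const sec_entry_t *x = a, *y = b;

    if (x->sec_key != y->sec_key)
        return (x->sec_key > y->sec_key) - (x->sec_key < y->sec_key);
    return (x->pk > y->pk) - (x->pk < y->pk);
}

/* Sort entries into index order, then write the primary keys in the
 * order an index scan would visit them. */
static void sec_index_scan(sec_entry_t *entries, size_t n, long *out_pk)
{
    assert(entries != NULL || n == 0);
    qsort(entries, n, sizeof entries[0], sec_cmp);
    for (size_t i = 0; i < n; i++)
        out_pk[i] = entries[i].pk;
}
```

Given rows in primary key order with ages 20, 21, 22, 18, 20, the scan returns the age-18 row first and the two age-20 rows next in primary key order, so the result is no longer in primary key order overall.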

Reference: http://blog.chinaunix.net/uid-26896862-id-3307353.html
