《面試官:談談你對索引的認知》系列之磁盤I/O

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"寫在前面","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"前面兩講我們介紹了B-/+樹的特性對比,數據庫系統普遍採用B-/+樹作爲索引結構。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/d0485a1327e300611281f2194","title":"","type":null},"content":[{"type":"text","text":"《面試官:談談你對索引的認知》系列之B-樹","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/9c88e754b6edcba58a3e8776b","title":"","type":null},"content":[{"type":"text","text":"《面試官:談談你對索引的認知》系列之B+樹","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這一節將結合計算機組成原理相關知識討論B-/+樹作爲索引的理論基礎。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一般來說,索引本身也很大,不可能全部存儲在內存中,因此索引往往以索引文件的形式存儲的磁盤上。這樣的話,索引查找過程中就要產生磁盤I/O消耗,相對於內存存取,I/O存取的消耗要高几個數量級,所索引的結構組織要儘量減少查找過程中磁盤I/O的存取次數。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"爲什麼使用索引查詢性能會提高","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"正常查詢的時間複雜度是O(N),可能需要全表查詢遍歷纔可以查詢到。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"存儲引擎InnoDB,data域存儲完整的數據邏輯。聚集型,數據文件本身就是索引文件。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/99/992447186217f5c0f2592dc328643f35.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"非葉子節點上以年齡分區,葉子節點上保存實際數據,葉子節點互相有指引,類似數組+鏈表形式,這種查詢時間複雜度是O(logN),所以使用索引可以大大降低磁盤I/O。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"磁盤存取原理","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"索引一般以文件形式存儲在磁盤上,索引檢索需要磁盤I/O操作。與主存不同,磁盤I/O存在機械運動耗費,因此磁盤I/O的時間消耗是巨大的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如下所示爲磁盤的整體結構示意圖:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3b/3b578571365b08ac04f7e7414e4f0717.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一個磁盤由大小相同且同軸的圓形盤片組成,磁盤可以轉動(各個磁盤必須同步轉動)。在磁盤的一側有磁頭支架,磁頭支架固定了一組磁頭,每個磁頭負責存取一個磁盤的內容。磁頭不能轉動,但是可以沿磁盤半徑方向運動(實際是斜切向運動),每個磁頭同一時刻也必須是同軸的,即從正上方向下看,所有磁頭任何時候都是重疊的(不過目前已經有多磁頭獨立技術,可不受此限制)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如下所示爲磁盤結構的示意圖:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/c5/c5752a1d1b4e6820e954267c8bdc4ce5.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"盤片被劃分成一系列同心環,圓心是盤片中心,每個同心環叫做一個磁道,所有半徑相同的磁道組成一個柱面。磁道被沿半徑線劃分成一個個小的段,每個段叫做一個扇區,每個扇區是磁盤的最小存儲單元。爲了簡單起見,我們下面假設磁盤只有一個盤片和一個磁頭。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當需要從磁盤讀取數據時,系統會將數據邏輯地址傳給磁盤,磁盤的控制電路按照尋址邏輯將邏輯地址翻譯成物理地址,即確定要讀的數據在哪個磁道,哪個扇區。爲了讀取這個扇區的數據,需要將磁頭放到這個扇區上方,爲了實現這一點,磁頭需要移動對準相應磁道,這個過程叫做尋道,所耗費時間叫做尋道時間,然後磁盤旋轉將目標扇區旋轉到磁頭下,這個過程耗費的時間叫做旋轉時間。","attrs":{}}]},{"type":"bulletedlist","content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"磁盤I/O時間","attrs":{}},{"type":"text","text":":數據傳輸時間+讀取磁盤目錄時間+尋道時間+旋轉時間","attrs":{}}]}]}],"attrs":{}},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"存儲數據最小單元","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在計算機中磁盤存儲數據最小單元是扇區,一個扇區的大小是512字節,而文件系統(例如XFS/EXT4)他的最小單元是塊,一個塊的大小是4k。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而對於我們的InnoDB存儲引擎也有自己的最小儲存單元——頁(Page),一個頁的大小是16K。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在MySQL中我們的InnoDB頁的大小默認是16k,當然也可以通過參數設置:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"mysql>show variables like 'innodb_page_size';","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"數據表中的數據都是存儲在頁中的,所以一個頁中能存儲多少行數據呢?假設一行數據的大小是1k,那麼一個頁可以存放16行這樣的數據。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"局部性原理與磁盤預讀","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於存儲介質的特性,磁盤本身存取就比主存慢很多,再加上機械運動耗費,磁盤的存取速度往往是主存的幾百分之一,因此爲了提高效率,要儘量減少磁盤I/O。爲了達到這個目的,磁盤往往不是嚴格按需讀取,而是每次都會預讀,即使只需要一個字節,磁盤也會從這個位置開始,順序向後讀取一定長度的數據放入內存。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這樣做的理論依據是計算機科學中著名的局部性原理:","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當一個數據被用到時,其附近的數據也通常會馬上被使用。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"程序運行期間所需要的數據通常比較集中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於磁盤順序讀取的效率很高(不需要尋道時間,只需很少的旋轉時間),因此對於具有局部性的程序來說,","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"預讀可以提高I/O效率","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"預讀的長度一般爲頁(page)的整倍數。頁是計算機管理存儲器的邏輯塊,硬件及操作系統往往將主存和磁盤存儲區分割爲連續的大小相等的塊,每個存儲塊稱爲一頁(在許多操作系統中,頁的大小通常爲4k)主存和磁盤以頁爲單位交換數據。","attrs":{}}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"當程序要讀取的數據不在主存中時,會觸發一個缺頁異常,此時系統會向磁盤發出讀盤信號,磁盤會找到數據的起始位置並向後連續讀取一頁或幾頁載入內存中,然後異常返回,程序繼續運行。","attrs":{}}]}],"attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以IO一次就是讀一頁的大小。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"總結","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MySQL 普遍採用 B+Tree 實現,索引本身很大,不可能全部存儲內存,因此需要以索引文件的形式存儲磁盤。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通常B-/+Tree的高(h)固定情況下,而度(d)會很大,度越大索引性能越好。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"相對於內存讀取,I/O存取的消耗要高几個數量級(內存1s,磁盤100多s),由於MySQL數據存儲保存在磁盤中,所以在查詢時磁盤I/O是其主要查詢性能瓶頸,而使用索引就可以減少磁盤I/O。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"horizontalrule","attrs":{}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作者:架構精進之路,十年研發風雨路,大廠架構師,CSDN 博客專家,專注架構技術沉澱學習及分享,職業與認知升級,堅持分享接地氣兒的乾貨文章,期待與你一起成長","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"關注並私信我回復“01”,送你一份程序員成長進階大禮包,歡迎勾搭。","attrs":{}}]}],"attrs":{}},{"type":"horizontalrule","attrs":{}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"🎉 文章首發於個人公衆號「架構精進之路」,原文鏈接:","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/rzAKaR8pMM1LIPeGZzVkWg","title":"","type":null},"content":[{"type":"text","text":"《面試官:談談你對索引的認知》系列之磁盤I/O","attrs":{}}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Thanks for reading!","attrs":{}}]}]}
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章