高併發之存儲篇:關注下索引原理和優化吧!躲得過實踐,躲不過面試官!

{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"高併發系列歷史文章微信鏈接文檔","attrs":{}}]},{"type":"numberedlist","attrs":{"start":1,"normalizeStart":1},"content":[{"type":"listitem","attrs":{"listStyle":null},"content":[{"type":"paragraph","attrs":{"indent":0,"number":1,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"垂直性能提升","attrs":{}}]}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.1. ","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/FEbR0JyKfYMY-8IypmXVig","title":null,"type":null},"content":[{"type":"text","text":"架構優化:集羣部署,負載均衡","attrs":{}}],"marks":[{"type":"strong"}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.2. ","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/KM686VV1KJgmsRoaTNgJ6w","title":null,"type":null},"content":[{"type":"text","text":"萬億流量下負載均衡的實現","attrs":{}}],"marks":[{"type":"strong"}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.3. ","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/ZkSNv7whVlm-Lxv6EIVctg","title":null,"type":null},"content":[{"type":"text","text":"架構優化:消息中間件的妙用","attrs":{}}],"marks":[{"type":"strong"}]}]},{"type":"paragraph","attrs":{"indent":1,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"1.4. ","attrs":{}},{"type":"link","attrs":{"href":"https://mp.weixin.qq.com/s/yawrRw0_RTAb-GBe6Q8wlg","title":null,"type":null},"content":[{"type":"text","text":"存儲優化:mysql的索引原理和優化","attrs":{}}],"marks":[{"type":"strong"}]}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/4e/4e375e843a31b6f08434a5327e576d21.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":"center","origin":null},"content":[{"type":"text","marks":[{"type":"color","attrs":{"color":"#FF7021","name":"orange"}}],"text":"公號後臺提供高併發系列文章的離線 PDF 文檔整理版的下載,歡迎關注,歡迎討論","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"不管是啥業務,最終數據都要落地,數據庫這一環是肯定少不了的。隨着業務發展,併發越來越高,數據庫很容易成爲整個鏈路的短板。這也是大廠面試中比較常被問到的。而調優的第一步,都是從sql語句、索引入手。先得保證單個數據庫執行沒問題,纔會有更高層次的分庫分表、彈性、容災等等。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Part1 爲什麼Kafka不需要我們關心索引,而Mysql卻需要?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Kafka 和 MySQL 雖然最終數據都是落磁盤,但是兩者在用途和數據查詢方式上有着很大的差異,所以決定了數據的存儲結構不同,進而決定了索引的複雜程度。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們先看下kafka的存儲結構:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d4/d4d650a5f94fd02b71287a9c5d25cbf1.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由於 kafka 的定位是進行穩定的高性能數據讀寫。所以對磁盤來說,是採用順序讀寫的方式,落在了一些 .log 文件中,並以基準偏移量補0命名。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了實現高速查找 kafka 創建了稀疏索引文件(隔一段數據創建一條,而非全量),即index文件。其中維護消息的 offset 和 .log文件的物理位置。通過二分查找快速定位log文件並順序掃描找到目標。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"所以,kafka的索引組織方式是相對簡單、方案相對固定,但MySQL卻不行。Mysql是關係型數據庫,是爲了支持複雜的業務數據查詢而創建的,","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"查詢方式、數據獲取需求多種多樣,要求MySQL具備更加複雜的索引機制來加速複雜業務查詢場景","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Part2 MySQL數據怎麼被組織","attrs":{}},{"type":"sup","content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"[1]","attrs":{}}],"attrs":{}},{"type":"sup","content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"[2]","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"以InnoDB存儲引擎來看mysql數據存儲:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/1f/1ffb04441ffbdb9c272b1c7c29a0b0c8.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"參考了三本資料,基本把最重要的部分都概括了","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"數據被分了多個邏輯層:行->頁->區塊->段->表空間","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們知道,InnoDB存儲引擎表是Index organized的(數據即索引,索引即數據),他們都維護在一個B+樹上,數據段就是葉子節點,索引段就是非葉子節點;","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而我們劃分的段、區塊 其實都是爲了利用操作系統的資源(比如每次從磁盤加載到內存的數據大小按區塊來約定等等`)來達到更高效讀寫的目的,邏輯劃分的。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其中頁是MySQL和磁盤交互的最小單位,怎麼從頁找到行,怎麼聚合到塊、到段再到空間呢。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"1 數據記錄最小單位-- 行","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從上面總圖中摘出一條記錄的結構如下圖:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/0e/0e10bc3b711fe36db380742b70652d86.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們可以看到,記錄頭中除了行號,還有下一條記錄的標識","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"next_record","attrs":{}},{"type":"text","text":",所以,我們可以通過","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"next_record","attrs":{}},{"type":"text","text":"將記錄連接起來,以","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"單向鏈表","attrs":{}}],"attrs":{}},{"type":"text","text":"的形式,所以這就決定了,當我們在記錄鏈中尋找某記錄時,只能順序遍歷,這也決定了一條數據鏈不會太長。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但一個頁默認是16K,加上行溢出等處理,一頁最多存放","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"7992","attrs":{}}],"attrs":{}},{"type":"text","text":"行記錄,這麼多的記錄,必須順序遍歷麼?當然不需要,讓我看看頁是怎麼組織記錄行的。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"2 與磁盤最小交互單位-- 頁","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"作爲與磁盤交互的最小單位,是用來存放實際數據的(頁類型是b-tree Node存真實數據,還有其他類型如索引目錄頁等用來加速查詢)從上面的大圖中可以大致看到一個頁的整體結構:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/f2/f25e9f305dbb002c3f0a7642645c639d.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"讓我們來看幾個關鍵的字段參數:","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Page Directory 決定着記錄項在頁內的查詢效率","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"爲了更快速的查詢,頁目錄存儲的本頁的數據目錄(","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"槽","attrs":{}},{"type":"text","text":"),包含","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"最大最小記錄","attrs":{}}],"attrs":{}},{"type":"text","text":"和 ","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"分組數據鏈的最大記錄","attrs":{}}],"attrs":{}},{"type":"text","text":"的偏移量。方便使用","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"二分法","attrs":{}}],"attrs":{}},{"type":"text","text":"快速查找數據,不需要再從最小值開始遍歷,如下圖:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/cd/cd2e0fffd1df556ddb0983766d6f5289.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖片來自《從根兒上理解 MySQL》","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"File Header決定頁和頁之間怎樣關聯","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"記錄本頁的一些通用信息,主要包含。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通過頁號來找到本頁、通過上下頁進行雙向鏈表串聯、通過類型判斷是索引頁還是數據頁。。。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/72/7232a901ee01300099d00384d8bcf954.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖片來自《從根兒上理解 MySQL》","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"此字段決定了頁和頁之間可以很方便的通過上述屬性進行關聯。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Page Header決定頁的層級","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"存儲的本頁的數據信息,主要包含**","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"有了頁面的數據組織概念,那麼,怎麼利用這些結構來實現的數據快速查詢呢?","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Part3 索引的演進思路","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"從上面的數據組織的知識裏可以看到,行記錄之間串聯成單向鏈表,在每頁中都按分組方式分佈在此頁的最小記錄和最大記錄之間。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"頁面之間通過上一頁、下一頁的指針,串聯成雙向鏈表,在磁盤中進行存儲,如下圖:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/62/620ad96c663a44b40e24d4611b5b7982.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那麼,要查詢一條記錄,可以怎麼做?","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"3 原始:順序方式","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如上圖所示的數據串聯方式,自然的提供了一種查詢方式:即按主鍵順序遍歷每頁和頁中的記錄行。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"但是,這樣的查詢方式,除了在頁內有二分優化,再無效率可言。怎麼辦?","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"尋求改進","attrs":{}},{"type":"text","text":":既然頁內的行記錄可以分組入槽,那數據頁之間爲什麼不行呢?","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"4 改進:目錄方式","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"我們將頁向上聚蔟,構建一個頁號目錄,先在目錄中查找,再到對應頁中查找,就比順序查找要快很多了。","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/ff/ff1f4bbaf48d2aaadae0fc83532f3898.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"尋求改進","attrs":{}},{"type":"text","text":":這樣的方式所需大量連續空間 + 目錄會隨數據變動而頻繁變動,怎麼辦?","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"5 演進:主鍵B+樹方式","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"其實,在敘述行記錄結構的時候,我們就看到,數據行的結構中,除了實際業務數據外,還有很多額外空間。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如","attrs":{}},{"type":"codeinline","content":[{"type":"text","text":"record_type","attrs":{}}],"attrs":{}},{"type":"text","text":"用來表示該記錄的類型是數據還是索引。正是這些額外的空間的設計,給InnoDB以更加適合的方式組織索引提供了支持:","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/8e/8e901f9c2023f606d5968baa0dc60d36.png","alt":"圖片","title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"圖片來自《從根兒上理解 MySQL》","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這就是一棵B+樹,頁節點有層級區分,頁中的行記錄有類型區分。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"業務數據都包含在葉子節點中,目錄數據都包含在其他非葉節點中。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"這樣組織方式的優勢,是允許足夠少的層級容納足夠多的數據項(可以簡單的假設每一頁的數據項大小來預估)。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"而這個索引方式就是我們常說的","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"聚蔟索引","attrs":{}},{"type":"text","text":"。即使用主鍵值進行記錄和頁的排序,且葉子節點含有全部用戶數據。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"尋求改進","attrs":{}},{"type":"text","text":":如果我想用其他列來查詢,怎麼辦?","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"6 擴展:二級索引、聯合索引","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"二級索引","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"比如用戶需要根據某一列(a列)的值來查詢,那就再重新創建一個B+樹。此索引樹和聚蔟索引樹的差別在於,索引節點是以a列的值爲目錄,且","attrs":{}},{"type":"text","marks":[{"type":"italic","attrs":{}}],"text":"葉子節點只包含a列的值和主鍵兩個值","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果用戶需要查詢除c列以外的更多信息,則需要拿主鍵ID再去聚蔟索引查一次,也叫","attrs":{}},{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"回表","attrs":{}},{"type":"text","text":"。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"聯合索引","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"二級索引是除主鍵外的單列索引,而聯合索引則是多個列共同排序。假設用戶需要用a 、b 兩個列進行有序查詢,那內在含義是,在a列值相同的情況下,再判斷b的值。","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"同二級索引一樣,InnoDB也需要再創建一棵B+樹,且目錄項的排序按先a,後b進行排序串聯,葉子節點的數據項只包含 a 、b、主鍵三個值。","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":1},"content":[{"type":"text","text":"Part4 生產實踐之觸類旁通","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"7 美團定時任務索引優化","attrs":{}},{"type":"sup","content":[{"type":"text","marks":[{"type":"strong","attrs":{}}],"text":"[3]","attrs":{}}],"attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"系統需要定時的撈取特定時間段內特定狀態、特定類型、特定操作者的任務進行定時處理。","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"select * from task \nwhere\n   status=x\n   and operator_id=xxxx \n   and operate_time>xxxxxxxx01\n   and operate_time
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章