Exploring MySQL (4): InnoDB Disk Files and the Flush-to-Disk Mechanism

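{"type":"doc","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"size","attrs":{"size":9}}],"text":"Note: the WeChat official account formerly named 張狗蛋的技術之路 has been renamed 程序員歷小冰.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Every technology rests on a few key underlying techniques, and those same techniques often underpin other technologies as well. Learning them pays off broadly: once you understand one, you can pick up the others quickly. How to store data on disk, how to use log files so that data is not lost, and how to flush data to disk are key techniques not only for databases such as MySQL but also for message queues (MQ) and other middleware.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e7/e7271682f67bcc81778bf770d559b2e1.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" The figure above shows the architecture of the InnoDB storage engine in detail. As it shows, InnoDB consists of three major parts: the memory pool, the background threads, and the disk files. The rest of this article gives a brief overview of the concepts and principles behind the disk files.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" InnoDB's disk files fall into three main groups: the system tablespace, the user tablespaces, and the redo log files and archive files. Files such as the binary log (binlog) are maintained by the MySQL Server layer, so they are not counted among InnoDB's disk files.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"System Tablespace and User Tablespaces","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" The InnoDB system tablespace contains the InnoDB data dictionary (metadata and related objects) and is the storage area for the doublewrite buffer, the change buffer, and the undo logs. By default it also holds the table and index data of any tables that users create in the system tablespace. The system tablespace is a shared tablespace because it is shared by multiple tables.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 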
The system tablespace consists of one or more data files. By default, a single system data file named ibdata1 with an initial size of 10MB (the exact default depends on the MySQL version) is created in MySQL's data directory. Users can configure the size and number of data files with innodb_data_file_path.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" The format of innodb_data_file_path is as follows:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"innodb_data_file_path=datafile1[;datafile2]...\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Users can build one tablespace from several files and specify each file's attributes:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"innodb_data_file_path = /db/ibdata1:1000M;/dr2/db/ibdata2:1000M:autoextend\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Here the two files /db/ibdata1 and /dr2/db/ibdata2 make up the system tablespace. If the two files sit on different disks, the I/O load may be spread across them, which can improve overall database performance. Each file name is followed by its attributes: ibdata1 is 1000MB, and ibdata2 is 1000MB and can grow automatically once its space is used up (autoextend).","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Once innodb_data_file_path is set, the data of all InnoDB tables is recorded in this system tablespace. If innodb_file_per_table is enabled, each InnoDB table gets its own user tablespace, named table_name.ibd. This way users do not have to keep all data in the default system tablespace. A user tablespace, however, stores only that table's data, indexes, and insert buffer bitmap pages; everything else still lives in the default system tablespace.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/91/914cfe6800502389c7f7a28d158b38de.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
The figure above shows how InnoDB stores its files. The frm file is the table definition file and records the structure of each table.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Redo Log Files and Archive Files","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" By default, the InnoDB data directory contains two files named ib_logfile0 and ib_logfile1. These are InnoDB's redo log files, which record the transaction log of the InnoDB storage engine. ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" When something goes wrong with InnoDB's data files, the redo log files come into play. InnoDB can use them to restore the data to a correct state, which safeguards the correctness and integrity of the data.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Every InnoDB instance has at least one redo log file group, and each group contains at least two redo log files, such as the default ib_logfile0 and ib_logfile1. ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" For higher reliability, users can set up multiple mirrored log groups and place the groups on different disks, which improves the availability of the redo log.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" All redo log files in a group are the same size and are written in a circular fashion. InnoDB writes to redo log file 1 first; when it is full, InnoDB switches to redo log file 2; when that is full too, it switches back to redo log file 1.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Users can set the size of each redo log file with innodb_log_file_size, which has a very large effect on InnoDB's performance.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" If the redo log files are too large, recovery after data loss may take a long time. If they are too small, checkpoints are triggered frequently and dirty pages must be flushed to disk often, which causes performance jitter. ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 
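","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" A minimal sketch of how the redo log sizing discussed above might appear in a MySQL option file. The [mysqld] section and the innodb_log_files_in_group setting are assumptions not taken from this article, and the values are purely illustrative, not recommendations:","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"[mysqld]\n# size of each redo log file (illustrative value only)\ninnodb_log_file_size = 512M\n# number of redo log files in the group (assumed setting; 2 matches ib_logfile0 and ib_logfile1)\ninnodb_log_files_in_group = 2\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 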
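For more on the redo log and the checkpoint mechanism, see the corresponding sections of my earlier article: ","attrs":{}},{"type":"link","attrs":{"href":"https://xie.infoq.cn/article/724aa99813dc0480dfeeda1d8","title":"","type":null},"content":[{"type":"text","text":"https://xie.infoq.cn/article/724aa99813dc0480dfeeda1d8","attrs":{}}]}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"How the Redo Log Is Flushed to Disk","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" When flushing data files and log files, InnoDB follows two rules, WAL (write-ahead logging) and force-log-at-commit, which together guarantee transaction durability. WAL requires that before a data change is written to disk, the corresponding log in memory must be written to disk first. Force-log-at-commit requires that when a transaction commits, all the log it generated must be flushed to disk. If the database crashes after the log has been flushed but before the data in the buffer pool reaches disk, it can recover the data from the log on restart.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/3b/3bc3402806f9583c717bcf613714e779.jpeg","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" As the figure above shows, when InnoDB changes data in the buffer pool, it first writes the change to the redo log buffer, which is then written to disk either periodically or when the transaction commits; this satisfies force-log-at-commit. Only after the redo log is on disk are the changed pages in the buffer pool written to disk, at a time chosen by the checkpoint mechanism; this satisfies WAL. ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" One of the conditions that triggers a checkpoint is the redo log files filling up. So, as noted earlier, if the redo log files are too small and fill up often, checkpoints will frequently force changed data to disk, causing performance jitter.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" The operating system's file system has a cache. When InnoDB writes data to disk, the data may only have reached the file system cache and not yet be safely on the physical disk. ","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 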
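","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Whether InnoDB forces the redo log past the file system cache at commit time is governed by a server setting. A minimal sketch of that setting in a MySQL option file, assuming a [mysqld] section (the section name is an assumption, not taken from this article):","attrs":{}}]},{"type":"codeblock","attrs":{"lang":null},"content":[{"type":"text","text":"[mysqld]\n# 1 = write the redo log and call fsync at every transaction commit\ninnodb_flush_log_at_trx_commit = 1\n","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" 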
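InnoDB's innodb_flush_log_at_trx_commit setting controls what InnoDB does each time a transaction commits. With a value of 0, nothing is written to the redo log at commit; the write is left to the master thread, which does it periodically. With a value of 1, the redo log is written to the file system cache at commit and fsync is called so that the cached data is actually written to disk, ensuring no data is lost. With a value of 2, the redo log is also written to the file system cache at commit, but fsync is not called; the file system decides when to write the cache to disk. The figure below illustrates the log flushing mechanism.","attrs":{}}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/e6/e6f92228bce613578bb4a58151d81898.png","alt":null,"title":null,"style":[{"key":"width","value":"75%"},{"key":"bordertype","value":"none"}],"href":null,"fromPaste":true,"pastePass":true}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" innodb_flush_log_at_trx_commit is a basic InnoDB tuning parameter that trades write efficiency against data safety. A value of 0 gives the highest write efficiency and the lowest safety; 1 gives the lowest write efficiency and the highest safety; 2 sits in between on both. Setting it to 1 is generally recommended for better data safety, and only a value of 1 guarantees transaction durability.","attrs":{}}]},{"type":"heading","attrs":{"align":null,"level":4},"content":[{"type":"text","text":"Afterword","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":" Later articles will cover how binlog files and data files are flushed to disk, along with other topics related to InnoDB transactions. Stay tuned.","attrs":{}}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}}]}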