How MySQL Implements Transaction Durability

{"type":"doc","content":[
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Preface"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Mention database transactions and a pile of related knowledge probably springs to mind: the ACID properties, isolation levels, the problems they solve (dirty reads, non-repeatable reads, phantom reads), and so on. Far fewer people really understand how these properties are implemented, or why there are four isolation levels."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A previous article covered "},{"type":"link","attrs":{"href":"https://segmentfault.com/a/1190000025156465","title":null},"content":[{"type":"text","text":"how MySQL implements transaction isolation"}]},{"type":"text","text":"; today we continue with how MySQL implements durability."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MySQL is a vast and deep subject, so omissions in this article are inevitable; corrections are welcome."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Note"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MySQL implements transactions in the storage engine layer, and not every engine supports them. Everything below is based on the InnoDB engine."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"How InnoDB Reads and Writes Data"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Before going further, we need to understand how InnoDB reads and writes data. Database data is stored on disk, and disk I/O is expensive; if every read and write had to hit the disk, the database would be very slow. To solve this, InnoDB provides the Buffer Pool as a cache for database data."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Buffer Pool lives in memory and holds copies of some of the data pages on disk. To read data, InnoDB first tries the Buffer Pool; on a miss, it reads the page from disk and places it in the Buffer Pool. To write data, InnoDB first writes to the page in the Buffer Pool, marks the page as dirty, and puts it on a dedicated flush list. These modified pages are written back to disk at some later point (a process called flushing dirty pages, handled by separate background threads), as shown in the figure below:"}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d7/d7ba9f151d286457341e23305713398f.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The benefit of this design is that it turns a large amount of disk I/O into memory reads and writes, and merges multiple modifications to one page into a single I/O operation (a flush writes the whole page at once). Avoiding a disk access on every read and write greatly improves database performance."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Definition of Durability"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Durability means that once a transaction commits, its changes to the database are permanent; subsequent operations or failures must not affect the changes made by that transaction."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"As described above, InnoDB uses the Buffer Pool to improve read and write performance. But the Buffer Pool is in memory and therefore volatile. If MySQL crashes right after a transaction commits, and the modified data in the Buffer Pool has not yet been flushed to disk, that data is lost and durability cannot be guaranteed."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"To solve this problem, InnoDB introduces the redo log to persist data modifications. When data is modified, InnoDB not only changes the data in the Buffer Pool but also records the operation in the redo log, and guarantees that the redo log reaches disk before the corresponding page does (usually at transaction commit). This is what is commonly called WAL (write-ahead logging). If MySQL crashes before the data has been flushed to disk, then after a restart MySQL uses the redo log already on disk to recover the data pages that were never flushed."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"Implementation: the redo log"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For performance, and much like data pages, the redo log has two parts: the in-memory log buffer (redo log buffer), which is volatile, and the on-disk redo log file, which is persistent. The redo log is a physical log: it records changes to physical pages in the database."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When data is modified, InnoDB not only changes the data in the Buffer Pool but also records the operation in the redo log buffer. When the transaction commits, the redo log buffer is flushed to the redo log file. If MySQL crashes, it can read the redo log file on restart and recover the database. This way, dirty pages do not have to be flushed on every transaction commit."}]},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Write Process"}]},
{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dccc76edc3f4d18c8395ede8a88d0d9.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Points to note:"}]},
{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The Buffer Pool is modified first, then the redo log buffer is written."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The redo log is written back to disk before the data pages: when a transaction commits, the redo log buffer is written to the redo log file, and the commit succeeds only once that write succeeds (other events can also trigger the write; they are not covered here). The data in the Buffer Pool is written to disk by background threads at some later point."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When a dirty page is flushed, the corresponding redo log is guaranteed to be on disk already. This is the so-called WAL (write-ahead log); without it, data could be lost."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Benefits"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At transaction commit, writing the redo log has three main advantages over flushing dirty pages directly:"}]},
{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Flushing dirty pages is random I/O, while writing the redo log is sequential I/O, and sequential I/O is much faster than random I/O."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Dirty pages are flushed in units of data pages (Page): even a tiny change to a Page means writing the whole page. The redo log contains only the parts that actually changed, so the data volume is very small and wasted I/O drops sharply."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Flushing may involve many pages, so atomicity cannot be guaranteed (for example, the write may fail after only part of the data is written). The redo log buffer writes log blocks to the redo log file in units of 512 bytes, the size of one sector. Since the sector is the smallest unit of a write, a log block is never left half-written."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"Which Comes First: Writing the redo log or Modifying the Data?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A single DML statement can involve both modifying data and recording the redo log, so in what order do they happen? Some articles online say the data is modified first and the redo log recorded afterwards; others say the opposite. What actually happens?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"First, from the explanation above we know that the redo log buffer is written to the redo log file when the transaction commits, while dirty pages are flushed at some later point. So on disk the order is certain: "},{"type":"text","marks":[{"type":"strong"}],"text":"the redo log is recorded first, and the data page is modified afterwards"},{"type":"text","text":" (with WAL, the log naturally goes first)."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"The remaining question is whether the redo log buffer or the Buffer Pool is written first. To answer it, we need to look at how InnoDB executes a single DML operation. That execution involves modifying data, locking, unlocking, and recording the redo log and undo log, and it must also be atomic. InnoDB uses MTRs (mini-transactions) to guarantee the atomicity of a single DML operation."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here is the definition of an MTR:"}]},
{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An internal phase of "},{"type":"codeinline","content":[{"type":"text","text":"InnoDB"}]},{"type":"text","text":" processing, when making changes at the "},{"type":"text","marks":[{"type":"strong"}],"text":"physical"},{"type":"text","text":" level to internal data structures during "},{"type":"text","marks":[{"type":"strong"}],"text":"DML"},{"type":"text","text":" operations. A mini-transaction (mtr) has no notion of "},{"type":"text","marks":[{"type":"strong"}],"text":"rollback"},{"type":"text","text":"; multiple mini-transactions can occur within a single "},{"type":"text","marks":[{"type":"strong"}],"text":"transaction"},{"type":"text","text":". Mini-transactions write information to the "},{"type":"text","marks":[{"type":"strong"}],"text":"redo log"},{"type":"text","text":" that is used during "},{"type":"text","marks":[{"type":"strong"}],"text":"crash recovery"},{"type":"text","text":". A mini-transaction can also happen outside the context of a regular transaction, for example during "},{"type":"text","marks":[{"type":"strong"}],"text":"purge"},{"type":"text","text":" processing by background threads."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"See "},{"type":"link","attrs":{"href":"https://dev.mysql.com/doc/refman/8.0/en/glossary.html","title":null},"content":[{"type":"text","text":"https://dev.mysql.com/doc/refman/8.0/en/glossary.html"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An MTR is a short atomic operation. It cannot be rolled back, because it is itself atomic. Every change to a data page must go through an MTR, which records the DML operation's page modifications in the redo log."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here is a brief look at how an MTR proceeds:"}]},
{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When an MTR is initialized, it sets up an mtr_buf."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"When data is modified, the page in the in-memory Buffer Pool is changed and, at the same time, a redo log record is generated and stored in mtr_buf"},{"type":"text","text":"."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"When the mtr_commit function is called to commit the MTR, the redo log records in mtr_buf are copied to the redo log buffer, and the dirty pages are added to the flush list for later flushing. In the log buffer, every 496 bytes of log records are wrapped with a 12-byte block header and a 4-byte block trailer to form a 512-byte log block, so that flushes to disk are aligned to 512 bytes."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"So InnoDB modifies the Buffer Pool first and writes the redo log buffer afterwards."}]},
{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"The Data Recovery Process"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"InnoDB always attempts recovery at startup. Recovery needs the redo log and, if the binlog is also enabled, the binlog and undo log as well. The database may crash after data has been written to the binlog but before the redo log has been flushed (transactions are a feature of the InnoDB engine, and modified data is not necessarily committed, whereas the binlog belongs to the MySQL server layer and is written independently of the engine). In that case the redo log, binlog, and undo log are used together to find transactions that have not finished committing, and each of them is then either rolled back or committed."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Here is a brief description of recovery using only the redo log:"}]},
{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"At InnoDB startup, locate the most recent Checkpoint, then use the Checkpoint LSN to find the redo log records with a larger LSN and replay them."}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"If recovery fails partway, nothing is lost: the next attempt again starts from the last successfully saved Checkpoint."}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Recovery process"},{"type":"text","text":": crash recovery has three phases: Analysis, Redo, and Undo. The Analysis phase uses the Checkpoint and the information in the Log to determine the scope of the later Redo and Undo phases. It uses the Log to correct the Dirty Page set recorded in the Checkpoint, and takes the smallest LSN in that set as RedoLSN, the starting position for Redo. It also corrects the set of active (uncommitted) transactions recorded in the Checkpoint, which become the rollback targets for Undo. The Redo phase starts from the RedoLSN found by Analysis and replays all Redo content in the Log, including that of uncommitted transactions. Finally, the Undo phase rolls back every uncommitted transaction using the Undo information; following each Log record's PrevLSN finds, in order, all of the transaction's modifications that need to be rolled back."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"For details, see "},{"type":"link","attrs":{"href":"http://catkang.github.io/2019/01/16/crash-recovery.html","title":null},"content":[{"type":"text","text":"http://catkang.github.io/2019/01/16/crash-recovery.html"}]}]}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"What is an LSN?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"LSN stands for log sequence number: a monotonically increasing 64-bit unsigned integer. Both the redo log and the data pages store an LSN, which serves as the basis for recovery. A larger LSN means the change described by the log record it refers to happened later."}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"What is a Checkpoint?"}]},
{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"A Checkpoint is a save point: modifications to data pages made before this point (log LSN