MySQL中事务的持久性实现原理

{"type":"doc","content":[{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"前言"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"说到数据库事务,大家脑子里一定很容易蹦出一堆事务的相关知识,如事务的ACID特性,隔离级别,解决的问题(脏读,不可重复读,幻读)等等,但是可能很少有人真正的清楚事务的这些特性又是怎么实现的,为什么要有四个隔离级别。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在之前的文章我们已经了解了"},{"type":"link","attrs":{"href":"https://segmentfault.com/a/1190000025156465","title":null},"content":[{"type":"text","text":"MySQL中事务的隔离性的实现原理"}]},{"type":"text","text":",今天就继续来聊一聊MySQL持久性的实现原理。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"当然MySQL博大精深,文章疏漏之处在所难免,欢迎批评指正。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"说明"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MySQL的事务实现逻辑是位于引擎层的,并且不是所有的引擎都支持事务的,下面的说明都是以InnoDB引擎为基准。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"InnoDB读写数据原理"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在往下学习之前,我们需要先来了解下InnoDB是怎么来读写数据的。我们知道数据库的数据都是存放在磁盘中的,然后我们也知道磁盘I/O的成本是很大的,如果每次读写数据都要访问磁盘,数据库的效率就会非常低。为了解决这个问题,InnoDB提供了 Buffer Pool 作为访问数据库数据的缓冲。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Buffer Pool 是位于内存的,包含了磁盘中部分数据页的映射。当需要读取数据时,InnoDB会首先尝试从Buffer Pool中读取,读取不到的话就会从磁盘读取后放入Buffer Pool;当写入数据时,会先写入Buffer Pool的页面,并把这样的页面标记为dirty,并放到专门的flush list上,这些修改的数据页会在后续某个时刻被刷新到磁盘中(这一过程称为刷脏,由其他后台线程负责) 。如下图所示:"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/d7/d7ba9f151d286457341e23305713398f.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"这样设计的好处是可以把大量的磁盘I/O转成内存读写,并且把对一个页面的多次修改merge成一次I/O操作(刷脏一次刷入整个页面),避免每次读写操作都访问磁盘,从而大大提升了数据库的性能。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"持久性定义"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"持久性是指事务一旦提交,它对数据库的改变就应该是永久性的,接下来的其他操作或故障不应该对本次事务的修改有任何影响。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"通过前面的介绍,我们知道InnoDB使用 Buffer Pool 来提高读写的性能。但是 Buffer Pool 是在内存的,是易失性的,如果一个事务提交了事务后,MySQL突然宕机,且此时Buffer Pool中修改的数据还没有刷新到磁盘中的话,就会导致数据的丢失,事务的持久性就无法保证。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"为了解决这个问题,InnoDB引入了 redo log来实现数据修改的持久化。当数据修改时,InnoDB除了修改Buffer Pool中的数据,还会在redo log 记录这次操作,并保证redo log早于对应的页面落盘(一般在事务提交的时候),也就是常说的WAL。若MySQL突然宕机了且还没有把数据刷回磁盘,重启后,MySQL会通过已经写入磁盘的redo log来恢复没有被刷新到磁盘的数据页。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"实现原理:redo log"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"为了提高性能,和数据页类似,redo log 也包括两部分:一是内存中的日志缓冲(redo log buffer),该部分日志是易失性的;二是磁盘上的重做日志文件(redo log file),该部分日志是持久的。redo log是物理日志,记录的是数据库中物理页的情况 。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"当数据发生修改时,InnoDB不仅会修改Buffer Pool中的数据,也会在redo log buffer记录这次操作;当事务提交时,会对redo log buffer进行刷盘,记录到redo log file中。如果MySQL宕机,重启时可以读取redo log file中的数据,对数据库进行恢复。这样就不需要每次提交事务都实时进行刷脏了。"}]},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"写入过程"}]},{"type":"image","attrs":{"src":"https://static001.geekbang.org/infoq/7d/7dccc76edc3f4d18c8395ede8a88d0d9.png","alt":null,"title":"","style":[{"key":"width","value":"100%"},{"key":"bordertype","value":"none"}],"href":"","fromPaste":false,"pastePass":false}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"注意点:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"先修改Buffer Pool,后写 redo log buffer。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"redo日志比数据页先写回磁盘:事务提交的时候,会把redo log buffer写入redo log file,写入成功才算提交成功(也有其他场景触发写入,这里就不展开了),而Buffer Pool的数据由后台线程在后续某个时刻写入磁盘。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"刷脏的时候一定会保证对应的redo log已经落盘了,也即是所谓的WAL(预写式日志),否则会有数据丢失的可能性。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"好处"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"事务提交的时候,写入redo log 相比于直接刷脏的好处主要有三点:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"刷脏是随机I/O,但写redo log 是顺序I/O,顺序I/O可比随机I/O快多了,不需要。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"刷脏是以数据页(Page)为单位的,即使一个Page只有一点点修改也要整页写入;而redo log中只包含真正被修改的部分,数据量非常小,无效IO大大减少。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"刷脏的时候可能要刷很多页的数据,无法保证原子性(例如只写了一部分数据就失败了),而redo log buffer 向 redo log file 写log block,是按512个字节,也就是一个扇区的大小进行写入,扇区是写入的最小单位,因此可以保证写入是必定成功的。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"heading","attrs":{"align":null,"level":3},"content":[{"type":"text","text":"先写redo log还是先修改数据"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"一次DML可能涉及到数据的修改和redo log的记录,那它们的执行顺序是怎么样的呢?网上的文章有的说先修改数据,后记录redo log,有的说先记录redo log,后改数据,那真实的情况是如何呢?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先通过上面的说明我们知道,redo log buffer在事务提交的时候就会写入redo log file的,而刷脏则是在后续的某个时刻,所以可以确定的是"},{"type":"text","marks":[{"type":"strong"}],"text":"先记录redo log,后修改data page"},{"type":"text","text":"(WAL当然是日志先写啦)。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"那接下来的问题就是先写redo log buffer还是先修改Buffer Pool了。要了解这个问题,我们先要了解InnoDB中,一次DML的执行过程是怎么样的。一次DML的执行过程涉及了数据的修改,加锁,解锁,redo log的记录和undo log的记录等,也是需要保证原子性的,而InnoDB通过MTR(Mini-transactions)来保证一次DML操作的原子性。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"首先来看MTR的定义:"}]},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"An internal phase of "},{"type":"codeinline","content":[{"type":"text","text":"InnoDB"}]},{"type":"text","text":" processing, when making changes at the "},{"type":"text","marks":[{"type":"strong"}],"text":"physical"},{"type":"text","text":" level to internal data structures during "},{"type":"text","marks":[{"type":"strong"}],"text":"DML"},{"type":"text","text":" operations. A Mini-transactions (mtr) has no notion of "},{"type":"text","marks":[{"type":"strong"}],"text":"rollback"},{"type":"text","text":"; multiple Mini-transactionss can occur within a single "},{"type":"text","marks":[{"type":"strong"}],"text":"transaction"},{"type":"text","text":". Mini-transactionss write information to the "},{"type":"text","marks":[{"type":"strong"}],"text":"redo log"},{"type":"text","text":" that is used during "},{"type":"text","marks":[{"type":"strong"}],"text":"crash recovery"},{"type":"text","text":". A Mini-transactions can also happen outside the context of a regular transaction, for example during "},{"type":"text","marks":[{"type":"strong"}],"text":"purge"},{"type":"text","text":" processing by background threads."}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"见 "},{"type":"link","attrs":{"href":"https://dev.mysql.com/doc/refman/8.0/en/glossary.html","title":null},"content":[{"type":"text","text":"https://dev.mysql.com/doc/refman/8.0/en/glossary.html"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MTR 是一个短原子操作,不能回滚,因为它本身就是原子的。数据页的变更必须通过MTR,MTR 会把DML操作对数据页的修改记录到 redo log里。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面来简单看下MTR的过程:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"MTR初始化的时候会初始化一份 mtr_buf"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"当修改数据时,在对内存Buffer Pool中的页面进行修改的同时,还会生成redo log record,保存在mtr_buf中"},{"type":"text","text":"。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在执行mtr_commit函数提交本MTR的时候,会将mtr_buf中的redo log record更新到redo log buffer中,同时将脏页添加到flush list,供后续刷脏使用。在log buffer中,每接收到496字节的log record,就将这组log record包装一个12字节的block header和一个4字节的block tailer,成为一个512字节的log block,方便刷盘的时候对齐512字节刷盘。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"由此可见,InnoDB是先修改Buffer Pool,后写redo log buffer的。"}]},{"type":"heading","attrs":{"align":null,"level":2},"content":[{"type":"text","text":"恢复数据的过程"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"在任何情况下,InnoDB启动时都会尝试执行recovery操作。在恢复过程中,需要redo log参与,而如果还开启了binlog,那就还需要binlog、undo log的参与。因为有可能数据已经写入binlog了,但是redo log还没有刷盘的时候数据库就奔溃了(事务是InnoDB引擎的特性,修改了数据不一定提交了,而binlog是MySQL服务层的特性,修改数据就会记录了),这时候就需要redo log,binlog和undo log三者的参与来判断是否有还没提交的事务,未提交的事务进行回滚或者提交操作。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"下面来简单说下仅利用redo log恢复数据的过程:"}]},{"type":"bulletedlist","content":[{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"启动InnoDB时,找到最近一次Checkpoint的位置,利用Checkpoint LSN去找大于该LSN的redo log进行日志恢复。"}]}]},{"type":"listitem","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"如果中间恢复失败了也没影响,再次恢复的时候还是从上次保存成功的Checkpoint的位置继续恢复。"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"blockquote","content":[{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"Recover过程"},{"type":"text","text":":故障恢复包含三个阶段:Analysis,Redo和Undo。Analysis阶段的任务主要是利用Checkpoint及Log中的信息确认后续Redo和Undo阶段的操作范围,通过Log修正Checkpoint中记录的Dirty Page集合信息,并用其中涉及最小的LSN位置作为下一步Redo的开始位置RedoLSN。同时修正Checkpoint中记录的活跃事务集合(未提交事务),作为Undo过程的回滚对象;Redo阶段从Analysis获得的RedoLSN出发,重放所有的Log中的Redo内容,注意这里也包含了未Commit事务;最后Undo阶段对所有未提交事务利用Undo信息进行回滚,通过Log的PrevLSN可以顺序找到事务所有需要回滚的修改。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"具体见 "},{"type":"link","attrs":{"href":"http://catkang.github.io/2019/01/16/crash-recovery.html","title":null},"content":[{"type":"text","text":"http://catkang.github.io/2019/01/16/crash-recovery.html"}]}]}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null}},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"什么是LSN?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"LSN也就是log sequence number,也日志的序列号,是一个单调递增的64位无符号整数。redo log和数据页都保存着LSN,可以用作数据恢复的依据。LSN更大的表示所引用的日志记录所描述的变化发生在更后面。"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","marks":[{"type":"strong"}],"text":"什么是Checkpoint?"}]},{"type":"paragraph","attrs":{"indent":0,"number":0,"align":null,"origin":null},"content":[{"type":"text","text":"Checkpoint表示一个保存点,在这个点之前的数据页的修改(log LSN
發表評論
所有評論
還沒有人評論,想成為第一個評論的人麼? 請在上方評論欄輸入並且點擊發布.
相關文章