This post is reposted mainly from
http://blog.csdn.net/zbszhangbosen/article/details/7956558
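The official MySQL documentation describes the settings for the flush method, but many readers may not fully understand them. This post explains the topic in detail.
First, the official description is as follows: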
http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_method
innodb_flush_method
Command-Line Format | --innodb_flush_method=name
Option-File Format | innodb_flush_method
System Variable Name | innodb_flush_method
Variable Scope | Global
Dynamic Variable | No

Permitted Values (<= 5.6.6), identical for Linux, HP-UX, and Solaris
Type | string
Default | fdatasync
Valid Values | O_DSYNC, O_DIRECT

Permitted Values (>= 5.6.7), identical for Linux, Solaris, and HP-UX
Type | string
Default | fdatasync
Valid Values | fdatasync, O_DSYNC, O_DIRECT, O_DIRECT_NO_FSYNC

Controls the system calls used to flush data to the InnoDB data files and log files, which can influence I/O throughput. This variable is relevant only for Unix and Linux systems. On Windows systems, the flush method is always async_unbuffered and cannot be changed.
By default, InnoDB uses the fsync() system call to flush both the data and log files. If the innodb_flush_method option is set to O_DSYNC, InnoDB uses O_SYNC to open and flush the log files, and fsync() to flush the data files. If O_DIRECT is specified (available on some GNU/Linux versions, FreeBSD, and Solaris), InnoDB uses O_DIRECT (or directio() on Solaris) to open the data files, and uses fsync() to flush both the data and log files. Note that InnoDB uses fsync() instead of fdatasync(), and it does not use O_DSYNC by default because there have been problems with it on many varieties of Unix.
An alternative setting is O_DIRECT_NO_FSYNC: it uses the O_DIRECT flag during flushing I/O, but skips the fsync() system call afterwards. This setting is suitable for some types of filesystems but not others. For example, it is not suitable for XFS. If you are not sure whether the filesystem you use requires an fsync(), for example to preserve all file metadata, use O_DIRECT instead.
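The behaviour described in the quoted paragraphs can be condensed into a small lookup table. This is only a reading aid, not anything taken from the InnoDB source: the dictionary layout and the helper names are made up for illustration, "default" means the quoted text mentions no special open flag, and "not stated" marks what the text leaves open.

```python
# What the quoted documentation says each innodb_flush_method value does.
# "default" = no special open flag mentioned; "not stated" = the text is silent.
FLUSH_METHODS = {
    "fdatasync": {"data_open": "default", "log_open": "default",
                  "data_flush": "fsync", "log_flush": "fsync"},
    "O_DSYNC": {"data_open": "default", "log_open": "O_SYNC",
                "data_flush": "fsync", "log_flush": "O_SYNC"},
    "O_DIRECT": {"data_open": "O_DIRECT", "log_open": "default",
                 "data_flush": "fsync", "log_flush": "fsync"},
    "O_DIRECT_NO_FSYNC": {"data_open": "O_DIRECT", "log_open": "not stated",
                          "data_flush": "none", "log_flush": "not stated"},
}


def bypasses_os_cache_for_data(method):
    """True if data files are opened with O_DIRECT (hypothetical helper)."""
    return FLUSH_METHODS[method]["data_open"] == "O_DIRECT"


def calls_fsync_on_data(method):
    """True if fsync() follows writes to the data files (hypothetical helper)."""
    return FLUSH_METHODS[method]["data_flush"] == "fsync"


if __name__ == "__main__":
    for name in FLUSH_METHODS:
        print(name, bypasses_os_cache_for_data(name), calls_fsync_on_data(name))
```

Read this way, O_DIRECT and O_DIRECT_NO_FSYNC differ only in whether fsync() follows the write, which is why the documentation ties the choice to the filesystem.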
Depending on hardware configuration, setting innodb_flush_method to O_DIRECT or O_DIRECT_NO_FSYNC can have either a positive or negative effect on performance. Benchmark your particular configuration to decide which setting to use, or whether to keep the default. Examine the Innodb_data_fsyncs status variable to see the overall number of fsync() calls done with each setting. The mix of read and write operations in your workload can also affect which setting performs better for you. For example, on a system with a hardware RAID controller and battery-backed write cache, O_DIRECT can help to avoid double buffering between the InnoDB buffer pool and the operating system's filesystem cache. On some systems where InnoDB data and log files are located on a SAN, the default value or O_DSYNC might be faster for a read-heavy workload with mostly SELECT statements.
Always test this parameter with the same type of hardware and workload that reflects your production environment. For general I/O tuning advice, see Section 8.5.7, “Optimizing InnoDB Disk I/O”.
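One way to act on the advice about Innodb_data_fsyncs is to sample the counter twice under the same workload and compare the rate across settings. The sketch below assumes the two samples were already collected (for example with SHOW GLOBAL STATUS LIKE 'Innodb_data_fsyncs') and passed in as plain dictionaries. The function name and the sample numbers are invented for illustration.

```python
def fsyncs_per_second(before, after, elapsed_seconds):
    """Rate of fsync() calls between two samples of Innodb_data_fsyncs.

    before / after: dicts holding the status counter at each sample time.
    elapsed_seconds: time between the two samples.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = after["Innodb_data_fsyncs"] - before["Innodb_data_fsyncs"]
    return delta / elapsed_seconds


if __name__ == "__main__":
    # Made-up sample values, 60 seconds apart.
    sample_1 = {"Innodb_data_fsyncs": 1000}
    sample_2 = {"Innodb_data_fsyncs": 1600}
    print(fsyncs_per_second(sample_1, sample_2, 60))
```

The rate alone does not say which setting is faster. It only shows how much flushing each setting causes, so it belongs next to throughput and latency figures from the same benchmark run.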
Formerly, a value of fdatasync also specified the default behavior. This value was removed, due to confusion that a value of fdatasync caused fsync() system calls rather than fdatasync() for flushing. To obtain the default value now, do not set any value for innodb_flush_method at startup.
The passage above mentions the fsync() and fdatasync() system calls. The rest of this post explains them in detail.
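While looking into the MySQL parameter innodb_flush_method, I ran into the fsync/fdatasync system calls (for what a system call is and how it differs from a library function, see the link in the original post). What follows is a short comparison of sync, fsync, and fdatasync.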
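sync(): the prototype is void sync(void). Its description reads: "A call to this function will not return as long as there is data which has not been written to the device". Taken literally, sync() is a synchronous write that does not return until the data has reached the physical device, but in practice that is not guaranteed. The Linux man page explains this in its BUGS section (the BUGS section that appears in many man pages): "According to the standard specification (e.g., POSIX.1-2001), sync() schedules the writes, but may return before the actual writing is done. However, since version 1.3.20 Linux does actually wait. (This still does not guarantee data integrity: modern disks have large caches.)" In other words, the standard only requires sync() to put the write requests on the device's write queue, and the writes may not be finished when it returns. Linux does wait, but the data may still be sitting in the disk's own cache.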
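fsync(int fd): "The fsync function can be used to make sure all data associated with the open file fildes is written to the device associated with the descriptor". fsync() writes the file opened through a file descriptor to the physical device (a file descriptor is how Unix and Unix-like systems refer to an open file, roughly the equivalent of a file handle). It is a truly synchronous write: it does not return until the write has completed, and it also writes the file's metadata, such as atime and mtime, to the device.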
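fdatasync(int fd): "When a call to the fdatasync function returns, it is ensured that all of the file data is written to the device". It only guarantees that the data of the open file has been written to the physical device. Metadata is not necessarily written, which is the difference from fsync. Metadata that is needed to read the data back correctly, such as a changed file size, is still flushed.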
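The three calls can be tried side by side from Python, whose os module exposes them as os.sync(), os.fsync(), and os.fdatasync(). This is a minimal sketch assuming a Unix-like system. The function name write_and_sync and its mode argument are invented for illustration, and os.fdatasync is not available on every platform, so the code falls back to os.fsync when it is missing.

```python
import os
import tempfile


def write_and_sync(path, payload, mode="fsync"):
    """Write payload to path, then flush it with the chosen system call."""
    with open(path, "wb") as f:
        f.write(payload)   # data is in Python's userspace buffer
        f.flush()          # data is handed to the OS page cache
        fd = f.fileno()
        if mode == "fsync":
            os.fsync(fd)   # this file: data plus metadata
        elif mode == "fdatasync":
            # Data only; fall back to fsync where fdatasync does not exist.
            getattr(os, "fdatasync", os.fsync)(fd)
        elif mode == "sync":
            os.sync()      # every dirty buffer on the whole system
        else:
            raise ValueError("unknown mode: " + mode)
    return os.path.getsize(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "demo.bin")
        for mode in ("fsync", "fdatasync"):
            print(mode, write_and_sync(target, b"hello", mode))
```

The f.flush() call matters: without it the bytes may still be in the process's own buffer, and none of the three system calls can see them.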
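That covers the three system calls. Why are all three needed? The simplest answer is that they serve different application needs. sync is global and flushes the whole system. fsync targets a single file only. fdatasync was designed for cases where skipping some basic metadata such as atime and mtime cannot make later reads inconsistent, so leaving that metadata out of the synchronization may improve performance (how large the difference between fsync and fdatasync actually is, I do not know; has anyone measured it?). Each of the three therefore fits a different requirement.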
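The open question about how much faster fdatasync is can be probed with a small timing loop. This is a rough sketch rather than a rigorous benchmark: the function name, the round count, and the 4096-byte chunk are arbitrary choices, and the result depends heavily on the filesystem and hardware. The file is preallocated and then overwritten in place, so the file size does not change and fdatasync is free to skip the metadata update.

```python
import os
import tempfile
import time


def time_sync_calls(sync_fn, rounds=50, chunk=b"x" * 4096):
    """Overwrite a preallocated file chunk by chunk, calling sync_fn(fd)
    after every write. Returns the elapsed time in seconds."""
    fd, path = tempfile.mkstemp()
    try:
        # Preallocate so that later writes do not change the file size.
        os.write(fd, chunk * rounds)
        os.fsync(fd)
        start = time.perf_counter()
        for i in range(rounds):
            os.pwrite(fd, chunk, i * len(chunk))
            sync_fn(fd)
        return time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    fdatasync = getattr(os, "fdatasync", os.fsync)  # fallback if missing
    print("fsync    :", time_sync_calls(os.fsync))
    print("fdatasync:", time_sync_calls(fdatasync))
```

Run it several times on the filesystem that will hold the database files. A temporary directory on tmpfs, for instance, says nothing about a real disk.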
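Next, flushing dirty pages, which is the synchronous write discussed above (the caller blocks until the write has finished). Why "dirty" pages? A dirty page is a page in a cache (usually in memory) that no longer matches the page on the physical device, because it was modified in memory. To make the in-memory change durable, the page has to be flushed from memory to the physical disk.
As I understand it, there are generally these layers of cache: 1) the application's own cache, such as InnoDB's buffer pool; 2) the cache at the OS level; 3) the cache of the storage controller, for example a RAID card usually manages its own cache; 4) the cache on the disk itself (as the man page quoted above notes, modern disks have large caches).
In most cases, flushing a dirty page means moving it from the application's cache to the OS cache and then to the physical device. If the device has no cache, the data is durable at that point. If the disks sit behind a RAID card with a cache, the data is not yet truly durable, because it has only reached the RAID card's cache and not the physical device. RAID cards usually carry a backup battery, though, so a power failure at that moment does not lose the data.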
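The first two cache layers and the idea of a dirty page can be modelled in a few lines. The class below is a toy written for this explanation and has nothing to do with how InnoDB implements its buffer pool: the name ToyBufferPool, the 4096-byte page size, and the method names are all assumptions. Pages are modified only in memory, remembered as dirty, and reach the file when flush() writes them and calls fsync().

```python
import os
import tempfile

PAGE_SIZE = 4096  # illustrative page size, not InnoDB's


class ToyBufferPool:
    """Hypothetical application-level cache that tracks dirty pages."""

    def __init__(self, path, num_pages):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self.fd, num_pages * PAGE_SIZE)
        self.pages = {}     # page number -> bytearray cached in memory
        self.dirty = set()  # pages modified since the last flush

    def read_page(self, page_no):
        if page_no not in self.pages:
            data = os.pread(self.fd, PAGE_SIZE, page_no * PAGE_SIZE)
            self.pages[page_no] = bytearray(data)
        return self.pages[page_no]

    def write_page(self, page_no, offset, data):
        if offset + len(data) > PAGE_SIZE:
            raise ValueError("write crosses the page boundary")
        page = self.read_page(page_no)
        page[offset:offset + len(data)] = data  # change exists in memory only
        self.dirty.add(page_no)

    def flush(self):
        """Write dirty pages to the OS cache, then fsync them to the device."""
        for page_no in sorted(self.dirty):
            os.pwrite(self.fd, self.pages[page_no], page_no * PAGE_SIZE)
        os.fsync(self.fd)
        flushed = len(self.dirty)
        self.dirty.clear()
        return flushed

    def close(self):
        os.close(self.fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pool = ToyBufferPool(os.path.join(tmp, "pages.bin"), num_pages=4)
        pool.write_page(1, 0, b"hello")
        print("dirty before flush:", pool.dirty)
        print("pages flushed:", pool.flush())
        pool.close()
```

Between write_page() and flush() the file on disk still holds the old contents, which is exactly the inconsistency that the term dirty page describes.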
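As noted, applications often have their own caching. Does that duplicate the OS cache? Yes, it does. I came to all of this through the MySQL parameter innodb_flush_method, which selects the flush strategy. MySQL offers three options, fdatasync, O_DSYNC, and O_DIRECT, and the default is fdatasync (see the linked post for details). Here I mainly want to explain why the O_DIRECT option exists. It tells the OS that InnoDB's reads and writes of data should bypass the OS cache. InnoDB maintains its own cache, the buffer pool, so going through the OS cache as well would hold some of the same data twice. The article referenced earlier says that O_DIRECT improves efficiency for heavy random reads and writes but degrades sequential reads and writes, so choose according to your workload. If MySQL is serving OLTP, O_DIRECT is generally a sound choice, although the documentation quoted above still recommends benchmarking your own configuration.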
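What bypassing the OS cache looks like at the system-call level can be sketched in Python on Linux. O_DIRECT generally requires the buffer address and the I/O length to be aligned. The sketch assumes a 4096-byte alignment (the real requirement depends on the device and filesystem) and uses an anonymous mmap to obtain a page-aligned buffer. The function names are invented, and because some filesystems reject O_DIRECT, the code falls back to an ordinary buffered write followed by fsync().

```python
import mmap
import os
import tempfile

BLOCK = 4096  # assumed alignment; the real value depends on device and filesystem


def _write_all(path, buf, extra_flags):
    """Open path with the given extra flags, write buf, and fsync."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | extra_flags
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, buf)
        os.fsync(fd)
    finally:
        os.close(fd)


def write_direct(path, payload):
    """Write payload, padded with zero bytes to a multiple of BLOCK.

    Returns True if O_DIRECT was used, False if the buffered fallback ran.
    """
    padded = max(BLOCK, -(-len(payload) // BLOCK) * BLOCK)
    buf = mmap.mmap(-1, padded)  # anonymous mapping: page-aligned memory
    try:
        buf.write(payload)
        direct = getattr(os, "O_DIRECT", 0)  # 0 where the flag is missing
        if direct:
            try:
                _write_all(path, buf, direct)
                return True
            except OSError:
                pass  # filesystem refused O_DIRECT; use the buffered path
        _write_all(path, buf, 0)
        return False
    finally:
        buf.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        used = write_direct(os.path.join(tmp, "direct.bin"), b"hello")
        print("O_DIRECT used:", used)
```

The fsync() after the direct write mirrors the O_DIRECT setting described in the quoted documentation. Leaving it out would correspond to O_DIRECT_NO_FSYNC.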