A Simple Comparison of sync/fsync/fdatasync

This article is mainly reposted from:

http://blog.csdn.net/zbszhangbosen/article/details/7956558


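The official MySQL documentation describes the flush method setting (innodb_flush_method), but many readers may not fully understand it. This article explains the setting in detail.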


First, the official description reads as follows:

http://dev.mysql.com/doc/refman/5.6/en/innodb-parameters.html#sysvar_innodb_flush_method

innodb_flush_method

Command-Line Format: --innodb_flush_method=name
Option-File Format: innodb_flush_method
System Variable Name: innodb_flush_method
Variable Scope: Global
Dynamic Variable: No

Permitted Values (<= 5.6.6)
Type (Linux): string
Default: fdatasync
Valid Values: O_DSYNC, O_DIRECT

Permitted Values (<= 5.6.6)
Type (HP-UX): string
Default: fdatasync
Valid Values: O_DSYNC, O_DIRECT

Permitted Values (<= 5.6.6)
Type (Solaris): string
Default: fdatasync
Valid Values: O_DSYNC, O_DIRECT

Permitted Values (>= 5.6.7)
Type (Linux): string
Default: fdatasync
Valid Values: fdatasync, O_DSYNC, O_DIRECT, O_DIRECT_NO_FSYNC

Permitted Values (>= 5.6.7)
Type (Solaris): string
Default: fdatasync
Valid Values: fdatasync, O_DSYNC, O_DIRECT, O_DIRECT_NO_FSYNC

Permitted Values (>= 5.6.7)
Type (HP-UX): string
Default: fdatasync
Valid Values: fdatasync, O_DSYNC, O_DIRECT, O_DIRECT_NO_FSYNC

Controls the system calls used to flush data to the InnoDB data files and log files, which can influence I/O throughput. This variable is relevant only for Unix and Linux systems. On Windows systems, the flush method is always async_unbuffered and cannot be changed.

By default, InnoDB uses the fsync() system call to flush both the data and log files. If innodb_flush_method option is set to O_DSYNC, InnoDB uses O_SYNC to open and flush the log files, and fsync() to flush the data files. If O_DIRECT is specified (available on some GNU/Linux versions, FreeBSD, and Solaris), InnoDB uses O_DIRECT (or directio() on Solaris) to open the data files, and uses fsync() to flush both the data and log files. Note that InnoDB uses fsync() instead of fdatasync(), and it does not use O_DSYNC by default because there have been problems with it on many varieties of Unix.

An alternative setting is O_DIRECT_NO_FSYNC: it uses the O_DIRECT flag during flushing I/O, but skips the fsync() system call afterwards. This setting is suitable for some types of filesystems but not others. For example, it is not suitable for XFS. If you are not sure whether the filesystem you use requires an fsync(), for example to preserve all file metadata, use O_DIRECT instead.

Depending on hardware configuration, setting innodb_flush_method to O_DIRECT or O_DIRECT_NO_FSYNC can have either a positive or negative effect on performance. Benchmark your particular configuration to decide which setting to use, or whether to keep the default. Examine the Innodb_data_fsyncs status variable to see the overall number of fsync() calls done with each setting. The mix of read and write operations in your workload can also affect which setting performs better for you. For example, on a system with a hardware RAID controller and battery-backed write cache, O_DIRECT can help to avoid double buffering between the InnoDB buffer pool and the operating system's filesystem cache. On some systems where InnoDB data and log files are located on a SAN, the default value or O_DSYNC might be faster for a read-heavy workload with mostly SELECT statements. Always test this parameter with the same type of hardware and workload that reflects your production environment. For general I/O tuning advice, see Section 8.5.7, “Optimizing InnoDB Disk I/O”.

Formerly, a value of fdatasync also specified the default behavior. This value was removed, due to confusion that a value of fdatasync caused fsync() system calls rather than fdatasync() for flushing. To obtain the default value now, do not set any value for innodb_flush_method at startup.
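As a rough illustration of the option-file format described above, the sketch below reads innodb_flush_method from the [mysqld] section of an option file and checks it against the values listed in the quoted table for 5.6.7 and later. The helper name read_flush_method is hypothetical, and Python's configparser is only an approximation of MySQL's own option-file parser (it does not handle directives such as !include), so treat this as a sanity check rather than a faithful parser.

```python
import configparser

# Values listed in the quoted documentation for MySQL >= 5.6.7.
VALID_FLUSH_METHODS = {"fdatasync", "O_DSYNC", "O_DIRECT", "O_DIRECT_NO_FSYNC"}


def read_flush_method(option_file_text):
    """Return the configured innodb_flush_method, or None if it is not set.

    None means the server falls back to its default behaviour.
    Raises ValueError for a value that is not in the documented list.
    """
    parser = configparser.ConfigParser(
        allow_no_value=True,   # option files may contain bare flags
        strict=False,          # tolerate repeated options
        interpolation=None,    # do not treat '%' specially
    )
    parser.read_string(option_file_text)
    if not parser.has_section("mysqld"):
        return None
    for key, value in parser.items("mysqld"):
        # MySQL accepts both dashes and underscores in option names.
        if key.replace("-", "_") == "innodb_flush_method":
            if value not in VALID_FLUSH_METHODS:
                raise ValueError(f"unexpected innodb_flush_method: {value!r}")
            return value
    return None


if __name__ == "__main__":
    sample = "[mysqld]\ninnodb_flush_method = O_DIRECT\nskip-networking\n"
    print(read_flush_method(sample))
```

Because the variable is not dynamic, a changed value only takes effect after the server is restarted.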


The passage mentions the fsync() and fdatasync() system calls; the rest of this article explains them in detail.


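While studying the MySQL parameter innodb_flush_method, I ran into system calls such as fsync and fdatasync. (What is a system call, and how does it differ from a library function? The original post linked to an explanation.) What follows is a brief analysis of the differences between sync, fsync, and fdatasync.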

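sync(): its prototype is int sync(void). One description reads: "A call to this function will not return as long as there is data which has not been written to the device". That is, sync() is meant to be a synchronous write that does not return until the data has reached the physical device. In practice this is not guaranteed. The BUGS section of the Linux man page (the section that man shows near the end of many entries) explains: "According to the standard specification (e.g., POSIX.1-2001), sync() schedules the writes, but may return before the actual writing is done. However, since version 1.3.20 Linux does actually wait. (This still does not guarantee data integrity: modern disks have large caches.)" In other words, the standard only requires sync() to queue the writes to the physical device and allows it to return before they have finished. Linux does wait, but even then the data may still be sitting in the disk's own cache.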

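fsync(int fd): "The fsync function can be used to make sure all data associated with the open file fildes is written to the device associated with the descriptor." fsync() writes the file opened through a file descriptor to the physical device. (A file descriptor is how Unix and Unix-like systems refer to an open file, roughly the equivalent of a file handle.) It is a truly synchronous write: it does not return until the write has completed, and it also flushes the file's metadata, such as atime and mtime, to the physical device.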

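fdatasync(int fd): "When a call to the fdatasync function returns, it is ensured that all of the file data is written to the device." It only guarantees that the data of the open file has been written to the physical device. Metadata is not necessarily flushed: per the Linux man page, only the metadata needed to read the data back correctly, such as a changed file size, is written. This is the difference from fsync.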
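The three calls can be tried from Python, whose os module exposes thin wrappers around them (os.sync, os.fsync, os.fdatasync). The sketch below is illustrative only: write_durably and flush_whole_system are hypothetical helper names, and the fallback to fsync() is an assumption made for platforms where os.fdatasync does not exist.

```python
import os
import tempfile


def write_durably(path, data, metadata_too=True):
    """Write data to path and block until the kernel reports it flushed.

    metadata_too=True  -> fsync(): file data plus metadata such as mtime.
    metadata_too=False -> fdatasync(): file data, plus only the metadata
                          needed to read it back (falls back to fsync()
                          where the platform has no fdatasync()).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while len(view) > 0:             # os.write may write fewer bytes than asked
            written = os.write(fd, view)
            view = view[written:]
        if metadata_too or not hasattr(os, "fdatasync"):
            os.fsync(fd)
        else:
            os.fdatasync(fd)
    finally:
        os.close(fd)


def flush_whole_system():
    """sync(): ask the kernel to flush every dirty buffer, not just one file."""
    os.sync()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "demo.dat")
        write_durably(target, b"commit record", metadata_too=False)
        print(os.path.getsize(target))
```

Note that for a newly created file, the Linux man page for fsync also points out that the directory entry is only guaranteed to be on disk after an fsync() on the containing directory; the sketch leaves that step out.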

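With the three system calls introduced, why do we need all three? The simplest answer is that they serve different application needs. sync is global and flushes the whole system. fsync targets a single file only. fdatasync was designed for cases where basic metadata such as atime and mtime does not affect the consistency of later reads, so skipping the synchronization of that metadata may improve performance. (How large is the performance difference between fsync and fdatasync? I do not know whether anyone has measured it.) So each of the three exists to meet a different need.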
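The open question about the performance gap can at least be probed on one's own machine. The following is a minimal timing sketch, not a rigorous benchmark: average_sync_seconds is a hypothetical helper, the round count and block size are arbitrary assumptions, and the result depends heavily on the filesystem and device. It overwrites the same block in place so that the file size never changes, which leaves timestamps as the only metadata that fdatasync() is allowed to skip.

```python
import os
import tempfile
import time


def average_sync_seconds(sync_fn, rounds=50, block_size=4096, directory=None):
    """Overwrite one block in place `rounds` times, calling sync_fn(fd) after
    each write, and return the average seconds per write-plus-sync."""
    block = b"x" * block_size
    with tempfile.TemporaryFile(dir=directory) as tmp:
        fd = tmp.fileno()
        os.write(fd, block)        # allocate the block once
        os.fsync(fd)               # so later rounds do not change the file size
        start = time.perf_counter()
        for _ in range(rounds):
            os.pwrite(fd, block, 0)
            sync_fn(fd)
        return (time.perf_counter() - start) / rounds


if __name__ == "__main__":
    # Fall back to fsync where the platform has no fdatasync (an assumption
    # made only so the script still runs there).
    fdatasync = getattr(os, "fdatasync", os.fsync)
    print("fsync     avg seconds:", average_sync_seconds(os.fsync))
    print("fdatasync avg seconds:", average_sync_seconds(fdatasync))
```

Pass a directory on the filesystem you actually care about; the default temporary directory may live on an in-memory filesystem, where both calls return almost immediately and the comparison says nothing.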

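Next, a word on flushing dirty pages, which is the synchronous write described above (the caller blocks until the write has finished). Why "dirty" pages? A dirty page is a page in a cache (usually in memory) that no longer matches the corresponding page on the physical device because it has been modified in memory. To make the in-memory changes durable, the page has to be flushed from memory to the physical disk. As I understand it, caches generally fall into these layers: 1) the application's own cache, such as InnoDB's buffer pool; 2) the OS-level cache; 3) the cache of the storage controller, for example a RAID card, which usually manages its own cache; 4) the disk drive's own onboard cache (the man page quoted above notes that modern disks have large caches). Most of the time, then, "flush dirty page" means moving data from the application's cache to the OS cache to the physical device. If the device has no cache, the data is durable at that point. If the disks sit behind a RAID card with a cache, the data has not truly been persisted yet, because it has only reached the RAID card's cache and not the physical device. However, RAID cards generally have a backup battery, so a power loss at that moment still does not lose data.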
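The path through the first two cache layers can be made concrete with an ordinary buffered file in Python. This is a small sketch under the assumption that the application cache is simply Python's own file buffer; persist_text is a hypothetical helper name. What happens below the OS cache (controller and disk caches) is outside the program's direct control.

```python
import os
import tempfile


def persist_text(path, text):
    """Push text through each cache layer the program can reach."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)            # 1) lands in the application's (Python's) buffer
        f.flush()                # 2) application buffer -> OS cache
        os.fsync(f.fileno())     # 3) OS cache -> device (or its controller cache)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "page.txt")
        persist_text(target, "dirty page contents\n")
        with open(target, encoding="utf-8") as check:
            print(check.read(), end="")
```

Calling f.flush() alone only reaches step 2: the data survives a crash of the program but not necessarily a power loss.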

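As mentioned, applications often have their own caching mechanism. Does that duplicate the OS cache? Yes, it does. I came across all of this while studying the MySQL parameter innodb_flush_method, which selects the flush strategy. MySQL offers three options, fdatasync, O_DSYNC, and O_DIRECT, with fdatasync as the default (see the blog post for details); the documentation quoted above also lists O_DIRECT_NO_FSYNC from 5.6.7 on. Here I mainly want to explain why the O_DIRECT option exists. It tells the OS that InnoDB reads and writes its data without going through the OS cache: since InnoDB maintains its own cache, the buffer pool, also using the OS cache would duplicate part of it. The article referenced earlier says that O_DIRECT improves efficiency for heavy random reads and writes but reduces it for sequential ones, so choose according to your own workload. If your MySQL serves an OLTP workload, O_DIRECT is usually a sound choice, though, as the documentation advises, it is worth benchmarking on your own hardware.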
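To show what bypassing the OS cache looks like at the system-call level, here is a hedged sketch of a single O_DIRECT write from Python. It is not how InnoDB is implemented. The function name write_block_direct and the 4096-byte block size are assumptions; the real alignment requirement depends on the device and filesystem, and some filesystems reject O_DIRECT outright, so the sketch falls back to ordinary buffered I/O on any OSError. An anonymous mmap is used because it provides page-aligned memory, which O_DIRECT typically requires.

```python
import mmap
import os
import tempfile

BLOCK = 4096  # assumed alignment; the real requirement is device-specific


def write_block_direct(path, payload):
    """Write one zero-padded block to path, bypassing the OS cache if possible.

    Returns True if O_DIRECT was used, False if buffered I/O was used instead.
    """
    if len(payload) > BLOCK:
        raise ValueError("payload must fit in one block")
    buf = mmap.mmap(-1, BLOCK)                  # anonymous, page-aligned memory
    buf.write(payload.ljust(BLOCK, b"\0"))
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    direct_flag = getattr(os, "O_DIRECT", 0)    # 0 where O_DIRECT is unavailable
    used_direct = False
    try:
        if direct_flag:
            try:
                fd = os.open(path, flags | direct_flag, 0o600)
                try:
                    os.write(fd, buf)
                    os.fsync(fd)                # still needed for file metadata
                    used_direct = True
                finally:
                    os.close(fd)
            except OSError:
                used_direct = False             # filesystem refused O_DIRECT
        if not used_direct:
            fd = os.open(path, flags, 0o600)
            try:
                os.write(fd, buf)
                os.fsync(fd)
            finally:
                os.close(fd)
    finally:
        buf.close()
    return used_direct


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "block.dat")
        print("O_DIRECT used:", write_block_direct(target, b"hello"))
```

The fsync() after the direct write mirrors the O_DIRECT behaviour described in the quoted documentation; skipping it would correspond to O_DIRECT_NO_FSYNC.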

