On the Linux Cache and Direct I/O

A file is simply a collection of data stored on media. When a process wants to access data from a file, the operating system brings the data into main memory, where the process reads it, alters it, and stores it back to disk. The operating system could read and write data directly to and from the disk for each request, but the response time and throughput would be poor due to slow disk access times. The operating system therefore attempts to minimize the frequency of disk accesses by buffering data in main memory, within a structure called the file buffer cache.

Certain applications derive no benefit from the file buffer cache. Databases normally manage data caching at the application level, so they do not need the file system to implement this service for them. The use of a file buffer cache results in undesirable overhead in such cases, since data is first moved from the disk to the file buffer cache and from there to the application buffer. This "double copying" of data costs extra CPU cycles and consumes additional memory.

For applications that wish to bypass this buffering within the file system cache, Direct I/O is provided. When Direct I/O is used for a file, data is transferred directly between the disk and the application buffer, without passing through the file buffer cache. Direct I/O can be enabled for a file either by mounting the corresponding file system with a direct I/O option (the option name differs by OS), or by opening the file with the O_DIRECT flag in the open() system call. Direct I/O benefits such applications by reducing CPU consumption and eliminating the cost of copying data twice: first between the disk and the file buffer cache, and then from the file buffer cache to the application buffer.

However, Direct I/O also has performance costs. In particular, it bypasses the filesystem's read-ahead, so sequential reads that would have benefited from read-ahead can become noticeably slower.

As a generic term, "direct I/O" refers to filesystem I/O that does not pass through the OS-level page cache.

The Linux cache is conventionally divided into the page cache and the buffer cache. The page cache is tied to the filesystem: it caches file data, indexed by the file's inode, so data accessed at the file level ends up in the page cache. Mapping a file's logical contents onto physical disk blocks is the filesystem's job. The buffer cache, by contrast, caches disk blocks: when the disk is accessed directly, without going through a filesystem, the data is cached in the buffer cache. Together these caches substantially shorten I/O system calls such as read, write, and getdents.

In short, the page cache caches file data and the buffer cache caches disk data. With a filesystem in place, operations on files are cached in the page cache; if you instead read or write the disk device directly with a tool such as dd, the data is cached in the buffer cache. Since the 2.6 kernel, however, this picture has become much simpler.

In the Linux 2.6 kernel the page cache and the buffer cache were unified: buffer pages are simply pages within the page cache. From the standpoint of the kernel's implementation, the page cache and buffer cache are now the same thing, with one extra layer of abstraction, the buffer_head, used for access management. It is fine to think in terms of the page cache alone.

Standard I/O:

On Linux this way of accessing files is implemented by two system calls: read() and write(). When an application calls read() for a block of data, the kernel first checks the page cache: if the block is already in memory, it is copied straight to the application; if not, it is read from disk into the page cache and then copied from the page cache into the user address space. For writes, when a process calls write() on a file, the data is first copied from the user address space into the page cache in kernel address space, and only later written to disk. With this standard access method, the write() system call is considered complete as soon as the data reaches the page cache; it does not wait for the data to be fully written to disk. Linux here uses the deferred-write mechanism mentioned earlier: the application need not wait for the data to be written back to disk, since it suffices for the data to reach the page cache, and the operating system periodically flushes the dirty pages in the page cache out to disk.

Direct I/O:

With direct I/O, data is transferred directly between the application's buffer in user address space and the disk, with no involvement of the page cache at all. The cache provided by the operating system usually gives applications better read/write performance, but certain special applications, database management systems in particular, prefer their own caching mechanism: a DBMS typically understands the data it stores far better than the operating system does, so it can implement a more effective cache and achieve better access performance for its data.

