How the buffer cache works

On a typical system approximately 85% of disk I/O can be avoided by using the buffer cache, though this depends on the mix of jobs running. The buffer cache is created in an area of kernel memory and is never swapped out. Although the buffer cache can be regarded as a memory resource, it is primarily an I/O resource due to its use in mediating data transfer. When a user process issues a read request, the operating system searches the buffer cache for the requested data. If the data is in the buffer cache, the request is satisfied without accessing the physical device. It is quite likely that data to be read is already in the buffer cache because the kernel copies an entire block containing the data from disk into memory. This allows any subsequent data falling within that block to be read more quickly from the cache in memory, rather than having to re-access the disk. The kernel also performs read-ahead of blocks on the assumption that most files are accessed from beginning to end.
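
The read path described above can be sketched as a toy model. This is only an illustration, not the kernel's implementation: the class name ReadCache, the 1KB block size, and the one-block read-ahead are assumptions made for the example, and each read is assumed to fall within a single block.

```python
BLOCK_SIZE = 1024  # 1KB logical block (assumed for this sketch)


class ReadCache:
    """Toy block cache: whole-block reads plus one-block read-ahead."""

    def __init__(self, disk: bytes):
        self.disk = disk          # stands in for the physical device
        self.blocks = {}          # block number -> block contents
        self.disk_reads = 0       # number of simulated device accesses

    def _load(self, blkno: int) -> None:
        """Copy one whole block from the device into the cache if needed."""
        start = blkno * BLOCK_SIZE
        if blkno not in self.blocks and start < len(self.disk):
            self.blocks[blkno] = self.disk[start:start + BLOCK_SIZE]
            self.disk_reads += 1

    def read(self, offset: int, length: int) -> bytes:
        """Read bytes that lie within a single block."""
        blkno = offset // BLOCK_SIZE
        self._load(blkno)         # miss: copy the whole block into memory
        self._load(blkno + 1)     # read-ahead: assume sequential access
        start = offset % BLOCK_SIZE
        return self.blocks[blkno][start:start + length]


cache = ReadCache(bytes(range(256)) * 16)   # 4KB of sample data
first = cache.read(0, 100)      # miss: loads block 0, reads ahead block 1
second = cache.read(100, 100)   # hit: same block, no device access
third = cache.read(1024, 100)   # hit on block 1, reads ahead block 2
```

In this run, three reads cost three simulated device accesses in total, and only the first read had to wait for the block it actually needed.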

The data area of each buffer for filesystems other than DTFS is 1KB, which is the same size as a filesystem logical block and twice the typical physical disk block size of 512 bytes. DTFS filesystems use buffers with data areas in multiples of 512 bytes, from 512 bytes to 4KB.

When data is written to disk, the kernel first checks the buffer cache to see whether the block containing the address to be written is already in memory. If it is, the block found in the buffer cache is updated; if not, the block must first be read into the buffer cache so that the existing data can be overwritten.
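
A minimal sketch of this write path follows. It is an illustration under stated assumptions rather than kernel code: the function name cached_write is hypothetical, the cache is a plain dictionary keyed by block number, and each write is assumed to fit within one 1KB block.

```python
BLOCK_SIZE = 1024  # 1KB buffer data area, as for non-DTFS filesystems


def cached_write(cache: dict, disk: bytearray, offset: int, data: bytes) -> bool:
    """Apply a write within one block to the cache; return True on a hit."""
    blkno = offset // BLOCK_SIZE
    hit = blkno in cache
    if not hit:
        # Miss: read the whole block first so the bytes around the
        # written range keep their existing on-disk contents.
        start = blkno * BLOCK_SIZE
        cache[blkno] = bytearray(disk[start:start + BLOCK_SIZE])
    pos = offset % BLOCK_SIZE
    cache[blkno][pos:pos + len(data)] = data   # update the cached copy only
    return hit


disk = bytearray(b"." * 2048)      # two blocks of sample data
cache = {}
hit_first = cached_write(cache, disk, 1030, b"abc")    # block 1 not cached yet
hit_second = cached_write(cache, disk, 1040, b"xyz")   # block 1 now cached
```

The first write misses and pulls block 1 into the cache; the second write finds it there. In both cases only the cached copy changes, and the simulated disk is untouched until the buffer is flushed.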

When the kernel writes data to a buffer, it marks the buffer as delayed-write. This means that the buffer must be written to disk before it can be re-used. Writing data to the buffer cache allows multiple updates to occur in memory rather than having to access the disk each time. Once a buffer has aged in memory for a set interval, it is flushed to disk by the buffer flushing daemon, bdflush.

The kernel parameter NAUTOUP specifies how long a delayed-write buffer can remain in the buffer cache before its contents are written to disk. The default value for NAUTOUP is 10 seconds, and it can range from 0 to 60 seconds. It does not cause a buffer to be written at precisely NAUTOUP seconds, but at the next buffer flush following this time interval.

Although the system buffer cache significantly improves overall system throughput, data that remains in the buffer cache and has not yet been written to disk may be lost in the event of a system power failure or a kernel panic. This is because data scheduled to be written to a physical device will have been erased from physical memory (which is volatile) as a consequence of the crash.

The default flushing interval of the buffer flushing daemon, bdflush, is 30 seconds. The kernel parameter BDFLUSHR controls the flushing interval. You can configure BDFLUSHR to take a value in the range 1 to 300 seconds.
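
The interaction between the aging interval (NAUTOUP) and the flushing interval (BDFLUSHR) can be sketched with a simplified model. This is not how the kernel is implemented: the sketch assumes that bdflush runs at exact multiples of the flushing interval measured from time 0, and that a buffer is written at the first pass at or after it has aged for NAUTOUP seconds. The function name flush_time and its arguments are hypothetical.

```python
import math


def flush_time(dirtied_at: float, nautoup: int = 10, bdflushr: int = 30) -> int:
    """Return the time in seconds at which a delayed-write buffer is written.

    Assumes flush passes occur at t = 0, bdflushr, 2 * bdflushr, ...
    """
    eligible_at = dirtied_at + nautoup                    # aged long enough
    return math.ceil(eligible_at / bdflushr) * bdflushr   # next flush pass


early = flush_time(5)    # eligible at 15s, written at the 30s pass
late = flush_time(25)    # eligible at 35s, written at the 60s pass
```

Under this model a buffer marked at 5 seconds is written 25 seconds later, while one marked at 25 seconds waits 35 seconds, so the actual delay depends on where the aging interval ends relative to the next flush pass.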

If your system crashes, you will lose NAUTOUP + (BDFLUSHR/2) seconds of data on average. With the default values of these parameters, this corresponds to 10 + (30/2) = 25 seconds of data. Decreasing BDFLUSHR will improve data integrity but also increase system overhead. The converse is true if you increase the interval.
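
The arithmetic can be checked with a few lines of code. The function name average_loss_seconds is hypothetical; the formula, the defaults, and the permitted ranges are the ones given in the text.

```python
def average_loss_seconds(nautoup: int = 10, bdflushr: int = 30) -> float:
    """Average seconds of data lost in a crash: NAUTOUP + BDFLUSHR / 2."""
    if not 0 <= nautoup <= 60:
        raise ValueError("NAUTOUP must be between 0 and 60 seconds")
    if not 1 <= bdflushr <= 300:
        raise ValueError("BDFLUSHR must be between 1 and 300 seconds")
    return nautoup + bdflushr / 2


default_loss = average_loss_seconds()                 # 10 + 30/2 = 25.0
short_interval = average_loss_seconds(bdflushr=10)    # 10 + 10/2 = 15.0
long_interval = average_loss_seconds(bdflushr=120)    # 10 + 120/2 = 70.0
```

Comparing the three results shows the trade-off: a 10-second flushing interval lowers the average exposure to 15 seconds at the cost of more frequent flushing, while a 120-second interval raises it to 70 seconds.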

Apart from adjusting the aging and flushing intervals, you can also control the size of the buffer cache. The kernel parameter NBUF determines the amount of memory in kilobytes that is available for buffers. If you are using the DTFS filesystem, the value of NBUF does not correspond to the actual number of buffers in use. The default value of NBUF is 0; this causes the kernel to allocate approximately 10% of available physical memory to buffers.
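
As a rough illustration of the default sizing rule, the sketch below estimates the buffer cache size for a given NBUF setting. The function name buffer_cache_kb is hypothetical, and the 10% figure is only the approximation stated above; the kernel's real calculation is based on available rather than total memory and may differ.

```python
def buffer_cache_kb(nbuf: int, physical_memory_kb: int) -> int:
    """Estimate the buffer cache size in kilobytes for a given NBUF value."""
    if nbuf < 0:
        raise ValueError("NBUF cannot be negative")
    if nbuf == 0:
        # Default: the kernel sizes the cache itself, at roughly 10%
        # of physical memory (approximation taken from the text).
        return physical_memory_kb // 10
    return nbuf   # explicit setting: NBUF is the size in kilobytes


auto_sized = buffer_cache_kb(0, 65536)       # 64MB machine, NBUF left at 0
fixed_size = buffer_cache_kb(4096, 65536)    # NBUF set to 4096 (4MB)
```

On a machine with 64MB of memory this estimate gives about 6.4MB of buffers by default, which is the kind of figure to compare against the size reported at startup.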


The size of the buffer cache in kilobytes is displayed when the system starts up and is also recorded in the file /usr/adm/messages. Look for a line of the form:

   kernel: Hz = 100, i/o bufs = numberk

If there are any buffers in memory above the first 16MB, the line may take the form:

   kernel: Hz = 100, i/o bufs = numberk  (high bufs = numberk)
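
If you want to extract these figures with a script, a regular expression along the following lines should work for both forms of the line. This is a sketch: the function name parse_bufs_line is hypothetical, the pattern is based only on the two formats quoted above, and the sample numbers are made up for illustration.

```python
import re

# Pattern for the line formats quoted above; "k" follows each number.
BUFS_LINE = re.compile(
    r"kernel: Hz = (?P<hz>\d+), i/o bufs = (?P<bufs>\d+)k"
    r"(?:\s+\(high bufs = (?P<high>\d+)k\))?"
)


def parse_bufs_line(line: str):
    """Return (buffer KB, high buffer KB or None), or None if no match."""
    match = BUFS_LINE.search(line)
    if match is None:
        return None
    high = match.group("high")
    return int(match.group("bufs")), (int(high) if high is not None else None)


# Sample lines with made-up numbers, for illustration only.
plain = parse_bufs_line("kernel: Hz = 100, i/o bufs = 6000k")
with_high = parse_bufs_line(
    "kernel: Hz = 100, i/o bufs = 6000k  (high bufs = 4976k)"
)
```

Applied to each line of /usr/adm/messages, the function returns None for unrelated lines, so the last non-None result corresponds to the most recent startup.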

The amount of memory reserved automatically for buffers may not be optimal, depending on the mix of applications that a system will run. For example, you may need to increase the buffer cache size on a networked file server to make disk I/O more efficient and increase throughput. You might also find that you can reduce the buffer cache size on the clients of the file server, since the applications that they run tend to access a small number of files. It is usually beneficial to do this because it increases the amount of physical memory available for user processes.
