Page cache and buffer cache concepts in Linux


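The page cache is explained in great detail in Section 5.6, "File Writes and Reads", of Linux Kernel Scenario Analysis (《linux內核情景分析》). The key points are excerpted here.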

The file system layer has three main data structures: the file structure, the dentry structure, and the inode structure.

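file structure: represents one context (open instance) of the target file. Different processes can set up different contexts on the same file, and a single process can also create several contexts by opening the same file more than once. These file structures are not shared with one another, so the buffer queue cannot be attached to the file structure.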

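dentry structure: this is the file-name structure. Through hard links, several dentry structures can refer to the same file, so dentries and files are not in one-to-one correspondence either, and the buffer queue cannot be attached here.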

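inode structure: that leaves only the inode. An inode corresponds one-to-one with a file; one could say the inode is the file. The inode has an i_mapping pointer to an address_space structure, which is usually the inode's own embedded inode->i_data, and the buffer queue lives in that address_space.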
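The ownership chain described above can be sketched in C. The structures below are hypothetical, heavily simplified stand-ins (the kernel's versions have many more fields, and newer kernels reach the dentry through f_path rather than a bare f_dentry). The only point is that every open instance and every hard-linked name of a file arrives at the same address_space through the inode.

```c
/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct address_space {
    unsigned long nrpages;            /* the cached pages would hang off here */
};

struct inode {
    struct address_space i_data;      /* cache embedded in the inode */
    struct address_space *i_mapping;  /* normally points at i_data */
};

struct dentry {
    const char *d_name;               /* one name (hard link) of the file */
    struct inode *d_inode;
};

struct file {
    struct dentry *f_dentry;          /* name this file was opened through */
    long f_pos;                       /* per-open context, e.g. current offset */
};

void inode_init(struct inode *inode)
{
    inode->i_data.nrpages = 0;
    inode->i_mapping = &inode->i_data;
}

/* Every open file, whatever name it was opened by, reaches the single
 * address_space owned by the inode. */
struct address_space *file_mapping(const struct file *f)
{
    return f->f_dentry->d_inode->i_mapping;
}
```

Two opens of one name and one open of a second hard link all return the same pointer, which is why the cache can live there and nowhere else.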


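What hangs in this buffer queue are memory pages, not record blocks. So when a process calls mmap() to map a file into its user address space, it only has to set up the corresponding page-table entries, and the cached pages are naturally mapped into the process's user space. That is why the pointer is named i_mapping.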
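The effect can be observed from user space. The sketch below assumes a Linux system with a writable /tmp; the helper name write_then_mmap and the scratch-file name are made up for illustration. It writes a string with write(), which on an ordinary file system goes into the file's page cache, and then reads it back through a MAP_SHARED mapping of the same file without calling read().

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write `msg` into a scratch file with write(), then read it back through
 * a MAP_SHARED mapping of that file. Copies the mapped bytes into `out`
 * (NUL-terminated) and returns 0, or returns -1 on any error. */
int write_then_mmap(const char *msg, char *out, size_t outlen)
{
    size_t len = strlen(msg);
    if (len == 0 || len >= outlen)
        return -1;                    /* mmap() rejects a length of 0 */

    char path[64];                    /* hypothetical scratch-file name */
    snprintf(path, sizeof path, "/tmp/pgcache-demo-%ld", (long)getpid());
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    unlink(path);                     /* the file goes away once fd is closed */

    /* On an ordinary file system this data lands in the file's page cache. */
    if (write(fd, msg, len) != (ssize_t)len) {
        close(fd);
        return -1;
    }

    /* The mapping is backed by the same cached pages, so no read() is needed. */
    char *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                        /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    memcpy(out, p, len);
    out[len] = '\0';
    munmap(p, len);
    return 0;
}
```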


We also need the concept of a radix tree; first look at the figure (from Professional Linux Kernel Architecture, 《深入linux內核架構》).


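A radix tree is not a balanced tree. It is built from two different data structures: the root and the non-leaf nodes. The root is a simple structure holding the height of the tree and a pointer to the first node of the tree. Each node is essentially an array: count is the number of pointers in use in the node, and the remaining entries are pointers to nodes on the next level down. At the leaf level, the entries are pointers to pages.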

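Each node also carries search tags, such as the dirty tag and the writeback tag, which make it possible to quickly find which subtrees contain tagged pages.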
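The following is a minimal radix tree written for this article, not taken from the kernel (lib/radix-tree.c is far more elaborate). It assumes 6 index bits per level, i.e. 64 slots per node as in the kernel's default configuration, and two tag bitmaps per node standing in for the dirty and writeback tags. A tag bit set in a node means some item below that slot carries the tag, so a tagged search can skip whole untagged subtrees. Memory is never freed, to keep it short.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

#define RT_SHIFT 6                   /* index bits consumed per level */
#define RT_SLOTS (1UL << RT_SHIFT)   /* 64 slots per node */
#define RT_MASK  (RT_SLOTS - 1)
#define RT_TAGS  2                   /* e.g. tag 0 = dirty, tag 1 = writeback */
struct rt_node {
    unsigned int count;              /* number of non-NULL slots */
    void *slots[RT_SLOTS];           /* child nodes, or items on the bottom level */
    uint64_t tags[RT_TAGS];          /* bit i set: something under slot i is tagged */
};
struct rt_root { unsigned int height; struct rt_node *node; };  /* height 0: empty */
static unsigned long rt_maxindex(unsigned int height)  /* largest index that fits */
{
    unsigned int bits = height * RT_SHIFT;
    return bits >= sizeof(unsigned long) * CHAR_BIT ? ULONG_MAX : (1UL << bits) - 1;
}

/* Store item at index; returns 0, or -1 if the slot is taken or out of memory. */
int rt_insert(struct rt_root *root, unsigned long index, void *item)
{
    while (root->height == 0 || index > rt_maxindex(root->height)) {
        struct rt_node *top = calloc(1, sizeof *top);  /* grow upwards */
        if (!top)
            return -1;
        if (root->node) {            /* old top node becomes slot 0 */
            top->slots[0] = root->node;
            top->count = 1;
            for (int t = 0; t < RT_TAGS; t++)
                top->tags[t] = root->node->tags[t] ? 1 : 0;
        }
        root->node = top;
        root->height++;
    }
    struct rt_node *n = root->node;  /* walk down, RT_SHIFT bits per level */
    for (unsigned int h = root->height; h > 1; h--) {
        unsigned long off = (index >> ((h - 1) * RT_SHIFT)) & RT_MASK;
        if (!n->slots[off]) {
            if (!(n->slots[off] = calloc(1, sizeof(struct rt_node))))
                return -1;
            n->count++;
        }
        n = n->slots[off];
    }
    if (n->slots[index & RT_MASK])
        return -1;                   /* index already occupied */
    n->slots[index & RT_MASK] = item;
    n->count++;
    return 0;
}

void *rt_lookup(const struct rt_root *root, unsigned long index)
{
    const struct rt_node *n = index <= rt_maxindex(root->height) ? root->node : NULL;
    for (unsigned int h = root->height; h > 1 && n; h--)
        n = n->slots[(index >> ((h - 1) * RT_SHIFT)) & RT_MASK];
    return n ? n->slots[index & RT_MASK] : NULL;
}

/* Tag an existing item and mark every node on the path down to it. */
int rt_tag_set(struct rt_root *root, unsigned long index, int tag)
{
    if (!rt_lookup(root, index))
        return -1;
    struct rt_node *n = root->node;
    for (unsigned int h = root->height;; h--) {
        unsigned long off = (index >> ((h - 1) * RT_SHIFT)) & RT_MASK;
        n->tags[tag] |= (uint64_t)1 << off;
        if (h == 1)
            return 0;
        n = n->slots[off];
    }
}

/* Lowest tagged index under n; descends only into slots whose tag bit is set. */
static int rt_find(const struct rt_node *n, unsigned int h, unsigned long base,
                   int tag, unsigned long *out)
{
    for (unsigned long off = 0; off < RT_SLOTS; off++) {
        if (!(n->tags[tag] & ((uint64_t)1 << off)))
            continue;                /* nothing tagged below this slot */
        unsigned long idx = base | (off << ((h - 1) * RT_SHIFT));
        if (h == 1) {
            *out = idx;
            return 1;
        }
        if (rt_find(n->slots[off], h - 1, idx, tag, out))
            return 1;
    }
    return 0;
}

int rt_first_tagged(const struct rt_root *root, int tag, unsigned long *out)
{
    return root->height ? rt_find(root->node, root->height, 0, tag, out) : 0;
}
```

For example, inserting index 3 and then index 5000 grows the tree to height 3, since one level covers indices 0 to 63 and two levels cover 0 to 4095.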



Block buffers

Structurally, a block buffer consists of two parts:

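1. The buffer head: holds all the management data related to the buffer's state, such as the block number, block length, and access counter. The buffer data itself is not stored directly after the buffer head, but in a separate area of physical memory pointed to by a pointer in the buffer head.

2. The useful data is kept in specially allocated pages, which may at the same time also belong to the page cache.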


The buffer head:

/*
 * Historically, a buffer_head was used to map a single block
 * within a page, and of course as the unit of I/O through the
 * filesystem and block layers.  Nowadays the basic I/O unit
 * is the bio, and buffer_heads are used for extracting block
 * mappings (via a get_block_t call), for tracking state within
 * a page (via a page_mapping) and for wrapping bio submission
 * for backward compatibility reasons (e.g. submit_bh).
 */
struct buffer_head {
    unsigned long b_state;      /* buffer state bitmap (see above) */ // bits are listed in enum bh_state_bits below
    struct buffer_head *b_this_page;/* circular list of page's buffers */ // next buffer head in the same page
    struct page *b_page;        /* the page this bh is mapped to */ // descriptor of the buffer page holding this block

    sector_t b_blocknr;     /* start block number */ // logical block number on the block device
    size_t b_size;          /* size of mapping */ // block size
    char *b_data;           /* pointer to data within the page */ // where the block starts inside the buffer page

    struct block_device *b_bdev; // block device descriptor
    bh_end_io_t *b_end_io;      /* I/O completion */ // I/O completion callback
    void *b_private;        /* reserved for b_end_io */ // data argument for the completion callback
    struct list_head b_assoc_buffers; /* associated with another mapping */
    struct address_space *b_assoc_map;  /* mapping this buffer is
                           associated with */
    atomic_t b_count;       /* users using this buffer_head */ // usage counter
};


Generic buffer head state flags:

enum bh_state_bits {
    BH_Uptodate,    /* Contains valid data */
    BH_Dirty,   /* Is dirty */
    BH_Lock,    /* Is locked */
    BH_Req,     /* Has been submitted for I/O */
    BH_Uptodate_Lock,/* Used by the first bh in a page, to serialise
              * IO completion of other buffers in the page
              */

    BH_Mapped,  /* Has a disk mapping */ // b_bdev and b_blocknr are valid
    BH_New,     /* Disk mapping was newly created by get_block */ // block just allocated, not yet accessed
    BH_Async_Read,  /* Is under end_buffer_async_read I/O */
    BH_Async_Write, /* Is under end_buffer_async_write I/O */
    BH_Delay,   /* Buffer is not yet allocated on disk */ // delayed allocation
    BH_Boundary,    /* Block is followed by a discontiguity */
    BH_Write_EIO,   /* I/O error on write */
    BH_Unwritten,   /* Buffer is allocated on disk but not written */
    BH_Quiet,   /* Buffer Error Prinks to be quiet */
    BH_Meta,    /* Buffer contains metadata */
    BH_Prio,    /* Buffer should be submitted with REQ_PRIO */

    BH_PrivateStart,/* not a state bit, but the first bit available
             * for private allocation by other entities
             */
};
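Kernel code does not test these bits by hand. include/linux/buffer_head.h generates helpers such as buffer_dirty(), set_buffer_dirty() and clear_buffer_dirty() with a BUFFER_FNS macro. The sketch below imitates that pattern in user-space C on a hypothetical buffer_head that has only b_state. It uses plain, non-atomic bit operations, whereas the kernel uses the atomic set_bit()/clear_bit()/test_bit().

```c
/* Cut-down copy of the first few flags, in the same order as above. */
enum bh_state_bits {
    BH_Uptodate,
    BH_Dirty,
    BH_Lock,
    BH_Req,
    BH_Uptodate_Lock,
    BH_Mapped,
    BH_New,
};

/* Hypothetical minimal buffer head: only the state word. */
struct buffer_head {
    unsigned long b_state;
};

/* For a flag BH_<bit>, generate set_buffer_<name>(), clear_buffer_<name>()
 * and buffer_<name>(). Same shape as the kernel's BUFFER_FNS, but with
 * plain (non-atomic) bit operations. */
#define BUFFER_FNS(bit, name)                                           \
    static inline void set_buffer_##name(struct buffer_head *bh)         \
    {                                                                    \
        bh->b_state |= 1UL << BH_##bit;                                  \
    }                                                                    \
    static inline void clear_buffer_##name(struct buffer_head *bh)       \
    {                                                                    \
        bh->b_state &= ~(1UL << BH_##bit);                               \
    }                                                                    \
    static inline int buffer_##name(const struct buffer_head *bh)        \
    {                                                                    \
        return (bh->b_state >> BH_##bit) & 1UL;                          \
    }

BUFFER_FNS(Uptodate, uptodate)
BUFFER_FNS(Dirty, dirty)
BUFFER_FNS(Mapped, mapped)
BUFFER_FNS(New, new)
```

Generating the accessors from one macro keeps the flag name and its bit number in a single place, so adding a flag means adding one enum entry and one BUFFER_FNS line.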


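If a page is used as a buffer page, the buffer heads of all its block buffers are collected in a singly linked circular list. The private field of the buffer page's descriptor points to the buffer head of the first block in the page, and the b_this_page field of each buffer head points to the next buffer head in the list. The b_page field of every buffer head points back to the descriptor of the buffer page it belongs to.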
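A user-space sketch of this arrangement, assuming a 4 KiB page and hypothetical cut-down struct page and struct buffer_head types (in the kernel, page->private is an unsigned long that page_buffers() casts back to a buffer_head pointer). attach_buffers() splits a page into blocks and links their heads into a ring; count_buffers() walks the ring with the do/while idiom kernel code uses, stopping when it returns to the head. Because each b_data points into page->data, writing through a buffer head changes the page itself.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PAGE_SIZE 4096          /* assumed page size */

struct buffer_head;

/* Hypothetical cut-down page descriptor. */
struct page {
    struct buffer_head *private;     /* first buffer head of this page */
    char data[DEMO_PAGE_SIZE];       /* stands in for the page frame itself */
};

/* Hypothetical cut-down buffer head with the fields discussed above. */
struct buffer_head {
    struct buffer_head *b_this_page; /* next buffer head in the same page (circular) */
    struct page *b_page;             /* buffer page this block lives in */
    char *b_data;                    /* start of this block inside the page */
    size_t b_size;                   /* block size */
};

/* Split the page into DEMO_PAGE_SIZE / blocksize blocks and link their
 * buffer heads into a ring. Returns 0 on success, -1 on error. */
int attach_buffers(struct page *page, size_t blocksize)
{
    if (blocksize == 0 || DEMO_PAGE_SIZE % blocksize != 0)
        return -1;
    size_t n = DEMO_PAGE_SIZE / blocksize;
    struct buffer_head *bhs = calloc(n, sizeof *bhs);  /* one array, for brevity */
    if (!bhs)
        return -1;
    for (size_t i = 0; i < n; i++) {
        bhs[i].b_page = page;
        bhs[i].b_data = page->data + i * blocksize;
        bhs[i].b_size = blocksize;
        bhs[i].b_this_page = &bhs[(i + 1) % n];   /* the last one wraps to the first */
    }
    page->private = &bhs[0];
    return 0;
}

/* Visit every buffer of the page once, starting from page->private. */
size_t count_buffers(const struct page *page)
{
    const struct buffer_head *head = page->private, *bh = head;
    size_t n = 0;
    if (!head)
        return 0;
    do {
        n++;
        bh = bh->b_this_page;
    } while (bh != head);
    return n;
}
```

The buffer heads are allocated here as one array, so a single free(page->private) releases them; the kernel allocates each buffer head separately.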



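As the figure above shows, one buffer page corresponds to four buffers; this is how the page cache and the buffer cache are unified. Because the buffers' data lives inside the buffer page, a change made through either the buffer or the buffer page is seen by the other.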



The address_space structure:

struct address_space {
    struct inode        *host;      /* owner: inode, block_device */ // inode of the host file
    struct radix_tree_root  page_tree;  /* radix tree of all pages */ // root of the radix tree
    spinlock_t      tree_lock;  /* and lock protecting it */ // lock for the radix tree
    unsigned int        i_mmap_writable;/* count VM_SHARED mappings */
    struct rb_root      i_mmap;     /* tree of private and shared mappings */
    struct list_head    i_mmap_nonlinear;/*list VM_NONLINEAR mappings */ // nonlinear mappings made with remap_file_pages()
    struct mutex        i_mmap_mutex;   /* protect tree, count, list */
    /* Protected by tree_lock together with the radix tree */
    unsigned long       nrpages;    /* number of total pages */
    pgoff_t         writeback_index;/* writeback starts here */
    const struct address_space_operations *a_ops;   /* methods */ // table of function pointers
    unsigned long       flags;      /* error bits/gfp mask */
    struct backing_dev_info *backing_dev_info; /* device readahead, etc */
    spinlock_t      private_lock;   /* for use by the address_space */
    struct list_head    private_list;   /* ditto */
    void            *private_data;  /* ditto */
} __attribute__((aligned(sizeof(long))));


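struct inode *host and struct radix_tree_root page_tree are what tie a file to its memory pages: host points to the owning inode, and page_tree indexes the file's cached pages by page index.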
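To see how a file position finds its page, here is a small arithmetic sketch assuming 4 KiB pages. The page index (the key used to look the page up in page_tree, e.g. with find_get_page(mapping, index)) is the position shifted right by the page shift, and the remainder is the offset inside the page. If the page carries buffer heads, shifting that offset right by the block-size bits (inode->i_blkbits in the kernel) says which buffer holds the byte. The struct and function names are hypothetical.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12                        /* assumed 4 KiB pages */
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)

/* Where byte `pos` of a file sits in the page cache (hypothetical helper). */
struct cache_pos {
    unsigned long index;    /* page index: the key into mapping->page_tree */
    unsigned long offset;   /* byte offset inside that page */
    unsigned long block;    /* which buffer of the page, if it has buffer heads */
};

/* blkbits is log2 of the block size (inode->i_blkbits in the kernel). */
struct cache_pos file_pos_to_cache(uint64_t pos, unsigned int blkbits)
{
    struct cache_pos cp;
    cp.index  = (unsigned long)(pos >> DEMO_PAGE_SHIFT);
    cp.offset = (unsigned long)(pos & (DEMO_PAGE_SIZE - 1));
    cp.block  = cp.offset >> blkbits;
    return cp;
}
```

For example, byte 10000 of a file with 1 KiB blocks (blkbits = 10) falls in page index 2 at offset 1808, i.e. in buffer 1 of that page.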




The address_space_operations structure:

struct address_space_operations {
    int (*writepage)(struct page *page, struct writeback_control *wbc); // write a page to the owner's on-disk image
    int (*readpage)(struct file *, struct page *); // read a page from the owner's on-disk image

    /* Write back some dirty pages from this mapping. */
    int (*writepages)(struct address_space *, struct writeback_control *); // write back a given number of the owner's dirty pages

    /* Set a page dirty.  Return true if this dirtied it */
    int (*set_page_dirty)(struct page *page); // mark one of the owner's pages dirty

    int (*readpages)(struct file *filp, struct address_space *mapping,
            struct list_head *pages, unsigned nr_pages); // read a list of the owner's pages from disk (readahead)

    int (*write_begin)(struct file *, struct address_space *mapping,
                loff_t pos, unsigned len, unsigned flags,
                struct page **pagep, void **fsdata); // prepare the page for a write of len bytes at pos
    int (*write_end)(struct file *, struct address_space *mapping,
                loff_t pos, unsigned len, unsigned copied,
                struct page *page, void *fsdata); // finish the write once the copied bytes are in the page

    /* Unfortunately this kludge is needed for FIBMAP. Don't use it */
    sector_t (*bmap)(struct address_space *, sector_t);
    void (*invalidatepage) (struct page *, unsigned long);
    int (*releasepage) (struct page *, gfp_t);
    void (*freepage)(struct page *);
    ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
            loff_t offset, unsigned long nr_segs);
    int (*get_xip_mem)(struct address_space *, pgoff_t, int,
                        void **, unsigned long *);
    /*
     * migrate the contents of a page to the specified target. If sync
     * is false, it must not block.
     */
    int (*migratepage) (struct address_space *,
            struct page *, struct page *, enum migrate_mode);
    int (*launder_page) (struct page *);
    int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
                    unsigned long);
    int (*error_remove_page)(struct address_space *, struct page *);

    /* swapfile support */
    int (*swap_activate)(struct swap_info_struct *sis, struct file *file,
                sector_t *span);
    void (*swap_deactivate)(struct file *file);
};
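Generic page-cache code never calls into a particular file system directly; it goes through mapping->a_ops. The sketch below shows that dispatch with a toy in-memory "file system". Everything in it (memfs_readpage, memfs_writepage, fill_page and the cut-down structures) is hypothetical, and the signatures are simplified: the kernel's readpage takes a struct file * and a struct page *, and finds the mapping through page->mapping.

```c
#include <string.h>

#define DEMO_PAGE_SIZE 4096          /* assumed page size */

/* Hypothetical cut-down types: just enough to show dispatch through a_ops. */
struct page {
    unsigned long index;             /* page index within the file */
    char data[DEMO_PAGE_SIZE];
    int uptodate;                    /* 1 once data matches the disk image */
};

struct address_space;

struct address_space_operations {
    int (*readpage)(struct address_space *mapping, struct page *page);
    int (*writepage)(struct address_space *mapping, struct page *page);
};

struct address_space {
    const struct address_space_operations *a_ops;
    char *backing;                   /* stands in for the file's disk image */
    unsigned long npages;            /* size of that image in pages */
};

/* A toy "file system" whose disk image is a plain memory buffer. */
static int memfs_readpage(struct address_space *m, struct page *page)
{
    if (page->index >= m->npages)
        return -1;
    memcpy(page->data, m->backing + page->index * DEMO_PAGE_SIZE, DEMO_PAGE_SIZE);
    page->uptodate = 1;
    return 0;
}

static int memfs_writepage(struct address_space *m, struct page *page)
{
    if (page->index >= m->npages)
        return -1;
    memcpy(m->backing + page->index * DEMO_PAGE_SIZE, page->data, DEMO_PAGE_SIZE);
    return 0;
}

const struct address_space_operations memfs_aops = {
    .readpage  = memfs_readpage,
    .writepage = memfs_writepage,
};

/* Generic code: knows nothing about memfs, only calls through a_ops. */
int fill_page(struct address_space *mapping, struct page *page)
{
    if (page->uptodate)
        return 0;                    /* already cached, no I/O needed */
    return mapping->a_ops->readpage(mapping, page);
}
```

Swapping in another file system only means pointing a_ops at a different table; fill_page() stays unchanged.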