Linux internals: the zero page [repost]

Reposted from: https://blog.csdn.net/weixin_42730667/article/details/123121624

zero page
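The zero page is a special physical page whose contents are all zero. It is an optimization aimed specifically at anonymous pages, and its main benefits are lower memory usage and somewhat better performance. After a region of virtual memory has been obtained with malloc or mmap, if the first access to that memory is a read, an anonymous page fault occurs and is handled by do_anonymous_page. Because the first access is a read and no write has happened yet, the fault is resolved by mapping the special zero page instead of allocating a new physical page.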

If a process instantiates a new (non-huge) page by trying to read from it, the kernel still will not allocate a new memory page. Instead, it maps a special page, called simply the "zero page," into the process's address space instead. Thus, all unwritten anonymous pages, across all processes in the system, are, in fact, sharing one special page. Needless to say, the zero page is always mapped read-only; it would not do to have some process changing the value of zero for everybody else. Whenever a process attempts to write to the zero page, it will generate a write-protection fault; the kernel will then (finally) get around to allocating a real page of memory and substitute it into the process's address space at the right spot.
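The behaviour described in the quote can be observed from user space. The following is a minimal sketch, assuming a Linux/POSIX system with mmap; check_zero_fill is a hypothetical helper name, not something from the kernel or a library. It cannot prove that the kernel used the shared zero page, only that untouched anonymous memory reads as zero and that a write to one page leaves the other pages reading as zero.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map `npages` of private anonymous memory, read every byte, then write to
 * the first page only. Returns 0 if every byte read as zero before the write
 * and every other page still reads as zero after it, -1 otherwise.
 */
int check_zero_fill(size_t npages)
{
    long psz = sysconf(_SC_PAGESIZE);

    assert(npages > 0);
    if (psz <= 0)
        return -1;

    size_t page = (size_t)psz;
    size_t len = npages * page;
    void *m = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    volatile unsigned char *p = m;
    int ok = 1;

    /* First access is a read: untouched anonymous memory reads as zero. */
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            ok = 0;

    /* First write to a page: the process gets its own writable page. */
    p[0] = 0x5a;
    if (p[0] != 0x5a)
        ok = 0;

    /* Pages that were only read are unaffected by the write. */
    for (size_t i = page; i < len; i++)
        if (p[i] != 0)
            ok = 0;

    munmap(m, len);
    return ok ? 0 : -1;
}
```

Calling check_zero_fill(4) from a small main is enough to try it; a return value of 0 means the zero-fill guarantee held for that mapping.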

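Advantages of the zero page:

The zero page is a single, globally unique physical page.
The zero page saves a great deal of unnecessary physical memory. Real applications often allocate virtual memory that is only ever read and never written. If a read-only page fault in this situation also caused a physical page to be allocated for every page of the region, a lot of memory would be wasted, because that memory is never actually written. Mapping all such read-but-uninitialized memory to one shared physical page filled with zeros saves a large amount of physical memory.
It improves efficiency: a read-only page fault does not go to the buddy allocator for a physical page but directly uses the zero page, which was set up at initialization, so the fault is considerably cheaper to handle.
Using the zero page also prevents stale data left in previously allocated and freed physical pages from being exposed to the process.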
When an anonymous memory area is created or extended, no actual pages of memory are allocated (whether transparent huge pages are enabled or not). That is because a typical program will never touch many of the pages that are part of its address space; allocating pages before there is a demonstrated need would waste a considerable amount of time and memory. So the kernel will wait until the process tries to access a specific page, generating a page fault, before allocating memory for that page.

But, even then, there is an optimization that can be made. New anonymous pages must be filled with zeroes; to do anything else would be to risk exposing whatever data was left in the page by its previous user. Programs often depend on the initialization of their memory; since they know that memory starts zero-filled, there is no need to initialize that memory themselves. As it turns out, a lot of those pages may never be written to; they stay zero-filled for the life of the process that owns them. Once that is understood, it does not take long to see that there is an opportunity to save a lot of memory by sharing those zero-filled pages. One zero-filled page looks a lot like another, so there is little value in making too many of them.

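Disadvantages of the zero page:

When memory is read first and then written, two page faults are triggered. The read triggers a page fault that installs a zero page mapping; the later write triggers a second page fault, and only then is a real physical page allocated. Compared with the single page fault that a write-first access needs, the overhead may be higher.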
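The extra fault can be made visible by counting minor page faults around the two access patterns. This is a rough sketch, assuming Linux with getrusage and mmap; faults_for_touch and minor_faults are hypothetical helper names. The counts are process-wide and depend on the kernel configuration (transparent huge pages in particular), so the code only reports them and makes no claim about exact values.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken by this process so far, or -1 on error. */
static long minor_faults(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Touch `npages` fresh anonymous pages and return how many minor faults
 * that took. If read_first is nonzero, each page is read before it is
 * written; otherwise it is only written. Returns -1 on error.
 */
long faults_for_touch(size_t npages, int read_first)
{
    long psz = sysconf(_SC_PAGESIZE);

    assert(npages > 0);
    if (psz <= 0)
        return -1;

    size_t page = (size_t)psz;
    size_t len = npages * page;
    void *m = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    volatile unsigned char *p = m;
    unsigned int sink = 0;
    long before = minor_faults();

    for (size_t i = 0; i < npages; i++) {
        if (read_first)
            sink += p[i * page];   /* read access first */
        p[i * page] = 1;           /* then (or only) a write access */
    }

    long after = minor_faults();
    (void)sink;
    munmap(m, len);

    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

Comparing faults_for_touch(n, 1) with faults_for_touch(n, 0) for the same n gives a feel for the cost; on a system that serves small pages one would expect the read-first variant to report roughly twice as many faults, but that is an expectation and not something this sketch guarantees.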
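Anonymous page handling in do_anonymous_page

How an anonymous page fault deals with the zero page depends mainly on the order of reads and writes, which leads to different page fault handling, as follows:

If the first access to the memory is a read, a read page fault is triggered; the zero page is mapped and the mapping is read-only.
If the memory is read and then written, the read page fault leaves the mapping read-only, so the write triggers another page fault, which allocates a physical page and updates the PTE.
If the first access to the memory is a write, alloc_zeroed_user_highpage_movable is called directly to allocate a new physical page.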
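The three cases above can be summarized as a small state machine. The sketch below is a toy user-space model and not kernel code: the enum pte_state values, struct fault_stats and access_page are all hypothetical names introduced only for illustration.

```c
#include <assert.h>

/* Hypothetical states of one anonymous virtual page in this toy model. */
enum pte_state {
    PTE_NONE,        /* never accessed, nothing mapped            */
    PTE_ZERO_RO,     /* mapped read-only to the shared zero page  */
    PTE_PRIVATE_RW   /* mapped to a private, writable page        */
};

/* Counters so a caller can see what each access pattern costs. */
struct fault_stats {
    int faults;           /* page faults taken          */
    int pages_allocated;  /* real pages handed out      */
};

/* Apply one read (is_write == 0) or write access and return the new state. */
enum pte_state access_page(enum pte_state s, int is_write,
                           struct fault_stats *st)
{
    assert(st != 0);

    switch (s) {
    case PTE_NONE:
        st->faults++;
        if (!is_write)
            return PTE_ZERO_RO;      /* read first: map the zero page */
        st->pages_allocated++;       /* write first: allocate at once */
        return PTE_PRIVATE_RW;
    case PTE_ZERO_RO:
        if (!is_write)
            return PTE_ZERO_RO;      /* further reads need no fault   */
        st->faults++;                /* write-protection fault        */
        st->pages_allocated++;
        return PTE_PRIVATE_RW;
    case PTE_PRIVATE_RW:
        return PTE_PRIVATE_RW;       /* already backed by a real page */
    }
    return s;
}
```

In this model a read followed by a write costs two faults and one allocation, while a write as the first access costs one fault and one allocation, which matches the list above.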
Zero page handling in the page fault path
An anonymous page fault handles the zero page as follows:

static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
{
    ... ...

    // Read page fault: map the zero page
    /* Use the zero-page for reads */
    if (!(vmf->flags & FAULT_FLAG_WRITE) &&
            !mm_forbids_zeropage(vma->vm_mm)) {
        entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
                        vma->vm_page_prot));
        vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
                vmf->address, &vmf->ptl);
        ... ...
        goto setpte;
    }

    ... ...

    // Write page fault: allocate physical memory with alloc_zeroed_user_highpage_movable
    page = alloc_zeroed_user_highpage_movable(vma, vmf->address);
    if (!page)
        goto oom;

    ... ...
    // Write page fault: build the PTE for the page obtained from buddy, with its permissions
    entry = mk_pte(page, vma->vm_page_prot);
    entry = pte_sw_mkyoung(entry);
setpte:
    set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);

    /* No need to invalidate - it was non-present before */
    update_mmu_cache(vma, vmf->address, vmf->pte);
    ... ...
}
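vmf->flags records why the fault was triggered. When the fault is a read, that is !(vmf->flags & FAULT_FLAG_WRITE), and the mm does not forbid the zero page (!mm_forbids_zeropage), the zero page path is taken.
my_zero_pfn(vmf->address): returns the pfn of the zero page.
pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address), vma->vm_page_prot)): builds the PTE for the zero page.
set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry): installs the PTE in the page table, so that the virtual address now maps to the zero page.
When the page fault is triggered by a write, alloc_zeroed_user_highpage_movable is called to allocate physical memory.
entry = mk_pte(page, vma->vm_page_prot): builds the entry from the newly allocated physical page and the page protection.
set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry): installs the entry for the new page. In do_anonymous_page the PTE was not present before, as the code comment notes; a write to an address that is already mapped to the zero page is resolved through the write-protection fault path instead.
update_mmu_cache: updates the architecture's MMU cache for the new mapping.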
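my_zero_pfn
my_zero_pfn returns the pfn of the zero page. When __HAVE_COLOR_ZERO_PAGE is not defined, my_zero_pfn is implemented as:

static inline unsigned long my_zero_pfn(unsigned long addr)
{
    extern unsigned long zero_pfn;
    return zero_pfn;
}
zero_pfn is the pfn of the single, globally unique special physical page whose contents are all zero. It is set during system initialization: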

static int __init init_zero_pfn(void)
{
    zero_pfn = page_to_pfn(ZERO_PAGE(0));
    return 0;
}
ZERO_PAGE
ZERO_PAGE returns the actual physical page behind the zero page. On x86 it is defined in arch/x86/include/asm/pgtable.h:

/*
* ZERO_PAGE is a global shared page that is always zero: used
* for zero-mapped memory areas etc..
*/
extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]
__visible;
#define ZERO_PAGE(vaddr) ((void)(vaddr),virt_to_page(empty_zero_page))
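empty_zero_page is an array of unsigned long that occupies exactly one physical page. It is a global variable whose size is fixed at compile time, and it is loaded into memory together with the kernel image at boot. The start address of the empty_zero_page array is the virtual address of the zero page; because kernel data is linearly mapped, that virtual address can be converted directly to a physical address.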
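The sizing trick in the declaration (PAGE_SIZE / sizeof(unsigned long) elements) can be tried in user space. This is only an analogue under stated assumptions: TOY_PAGE_SIZE is an assumed 4096-byte page size, toy_zero_page and the helper functions are hypothetical names, and the aligned attribute is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096  /* assumed page size for this sketch */

/*
 * Zero-initialized static storage, exactly one page long and page-aligned.
 * Objects with static storage duration start out as all zeros.
 */
static unsigned long toy_zero_page[TOY_PAGE_SIZE / sizeof(unsigned long)]
    __attribute__((aligned(TOY_PAGE_SIZE)));

/* The element count is chosen so that the array is one page in bytes. */
static_assert(sizeof(toy_zero_page) == TOY_PAGE_SIZE,
              "array must cover exactly one page");

size_t toy_zero_page_bytes(void)
{
    return sizeof(toy_zero_page);
}

int toy_zero_page_is_aligned(void)
{
    return ((uintptr_t)toy_zero_page % TOY_PAGE_SIZE) == 0;
}

int toy_zero_page_is_zero(void)
{
    for (size_t i = 0; i < sizeof(toy_zero_page) / sizeof(toy_zero_page[0]); i++)
        if (toy_zero_page[i] != 0)
            return 0;
    return 1;
}
```

Dividing by sizeof(unsigned long) keeps the array one page long regardless of whether unsigned long is 4 or 8 bytes on the target.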

empty_zero_page is defined in assembly and initialized to all zeros (arch/x86/kernel/head_64.S):

__PAGE_ALIGNED_BSS
SYM_DATA_START_PAGE_ALIGNED(empty_zero_page)
.skip PAGE_SIZE
SYM_DATA_END(empty_zero_page)
EXPORT_SYMBOL(empty_zero_page)
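empty_zero_page lives in the BSS section; the GNU assembler .skip directive reserves PAGE_SIZE bytes for it, filled with 0.

The .skip directive is equivalent to the .space directive. Its syntax is:

.skip size [, fill]

size is the number of bytes to fill.

fill is the value to fill with; if omitted, it defaults to 0.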
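The effect of the optional fill argument can be checked from C with a file-scope asm statement. This is a sketch under several assumptions: GCC or Clang, the GNU assembler syntax, and an ELF target such as Linux (symbol naming and .pushsection differ elsewhere). The symbols skip_zeroed and skip_filled and the helper all_bytes_equal are hypothetical names. The data is placed in .data instead of BSS so that a non-zero fill is allowed.

```c
#include <assert.h>
#include <stddef.h>

/* Reserve two 16-byte objects with .skip: one default-filled, one with 0xAB. */
__asm__(
    ".pushsection .data\n"
    ".globl skip_zeroed\n"
    "skip_zeroed:\n"
    "    .skip 16\n"
    ".globl skip_filled\n"
    "skip_filled:\n"
    "    .skip 16, 0xAB\n"
    ".popsection\n"
);

extern unsigned char skip_zeroed[16];
extern unsigned char skip_filled[16];

/* Return 1 if all n bytes at p equal v, 0 otherwise. */
int all_bytes_equal(const unsigned char *p, size_t n, unsigned char v)
{
    assert(p != NULL);
    for (size_t i = 0; i < n; i++)
        if (p[i] != v)
            return 0;
    return 1;
}
```

With these definitions, all_bytes_equal(skip_zeroed, 16, 0x00) and all_bytes_equal(skip_filled, 16, 0xAB) should both hold, showing the default fill of 0 and an explicit fill value.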

For a zero page experiment, see the one in "Introduce huge zero page". The experiment code is as follows:

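#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>

#define MAX_NUM 10

#define BUFFER_SIZE 4096
int main(void)
{
    char *a[MAX_NUM] = {NULL};

    for (int i = 0; i < MAX_NUM; i++) {
        /* posix_memalign returns 0 on success and an error number otherwise */
        if (posix_memalign((void **)&a[i], 4096, BUFFER_SIZE) != 0) {
            fprintf(stderr, "posix_memalign failed\n");
            return 1;
        }

        /* Only read the buffer; it is never written */
        for (int j = 0; j < BUFFER_SIZE; j++) {
            assert(a[i][j] == 0);
        }

        /* Wait for Enter so memory usage can be checked between iterations */
        getchar();
    }

    for (int i = 0; i < MAX_NUM; i++)
        free(a[i]);
    return 0;
}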
Watching with the free command after each loop iteration shows that physical memory usage does not grow.
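The same observation can be made from inside the process instead of with free. The sketch below swaps in a different technique from the experiment above: it uses mmap and reads the resident page count from /proc/self/statm, so it assumes Linux with /proc mounted. resident_pages and rss_growth are hypothetical helper names. The measured growth includes a little noise from the C library itself, so it is best read as an approximate figure.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Resident set size in pages (second field of /proc/self/statm), -1 on error. */
long resident_pages(void)
{
    FILE *f = fopen("/proc/self/statm", "r");
    long size = 0, resident = 0;

    if (!f)
        return -1;
    int n = fscanf(f, "%ld %ld", &size, &resident);
    fclose(f);
    return n == 2 ? resident : -1;
}

/*
 * Map `npages` of anonymous memory, touch each page once (write if do_write
 * is nonzero, read otherwise) and store the change in resident pages in
 * *growth. Returns 0 on success, -1 on error.
 */
int rss_growth(size_t npages, int do_write, long *growth)
{
    long psz = sysconf(_SC_PAGESIZE);

    assert(growth != NULL);
    if (psz <= 0 || npages == 0)
        return -1;

    size_t page = (size_t)psz;
    size_t len = npages * page;
    void *m = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return -1;

    volatile unsigned char *p = m;
    unsigned int sink = 0;
    long before = resident_pages();

    for (size_t i = 0; i < npages; i++) {
        if (do_write)
            p[i * page] = 1;
        else
            sink += p[i * page];
    }

    long after = resident_pages();
    (void)sink;
    munmap(m, len);

    if (before < 0 || after < 0)
        return -1;
    *growth = after - before;
    return 0;
}
```

On a kernel that uses the zero page one would expect rss_growth(n, 0, &g) to leave g close to zero and rss_growth(n, 1, &g) to leave g close to n; the actual numbers depend on the system and are not guaranteed by this sketch.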

alloc_zeroed_user_highpage_movable
This function is called to allocate real physical memory when a write triggers the page fault:

static inline struct page *
alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma,
unsigned long vaddr)
{
return __alloc_zeroed_user_highpage(__GFP_MOVABLE, vma, vaddr);
}
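It allocates physical memory for the given vma with the __GFP_MOVABLE flag, which allows the allocation to come from ZONE_MOVABLE or from the movable migrate type:

#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
    alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
It then adds the GFP_HIGHUSER and __GFP_ZERO flags:

GFP_HIGHUSER: if ZONE_HIGHMEM exists, physical memory is preferably allocated from it. It also sets __GFP_HARDWALL, which enforces the cpuset memory allocation policy, and __GFP_RECLAIM, which allows memory reclaim to be triggered directly when memory is short.
__GFP_ZERO: the allocated physical memory is initialized to all zeros.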
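How the flags are OR-ed together can be sketched with a toy bitmask. The TOY_ names and bit positions below are made up for illustration and are not the kernel's values, which differ between versions; only the composition mirrors the macro shown above and, to my understanding, the GFP_USER and GFP_HIGHUSER definitions in include/linux/gfp.h.

```c
#include <assert.h>

/* Illustrative bit values only; the real kernel values are different. */
enum {
    TOY_GFP_HIGHMEM  = 1 << 0,
    TOY_GFP_MOVABLE  = 1 << 1,
    TOY_GFP_IO       = 1 << 2,
    TOY_GFP_FS       = 1 << 3,
    TOY_GFP_ZERO     = 1 << 4,
    TOY_GFP_RECLAIM  = 1 << 5,
    TOY_GFP_HARDWALL = 1 << 6
};

/* Composite masks, following the structure GFP_USER -> GFP_HIGHUSER. */
#define TOY_GFP_USER \
    (TOY_GFP_RECLAIM | TOY_GFP_IO | TOY_GFP_FS | TOY_GFP_HARDWALL)
#define TOY_GFP_HIGHUSER (TOY_GFP_USER | TOY_GFP_HIGHMEM)

/* The zeroing and movable bits are separate from the HIGHUSER mask. */
static_assert((TOY_GFP_HIGHUSER & (TOY_GFP_ZERO | TOY_GFP_MOVABLE)) == 0,
              "flags must not overlap");

/* Toy counterpart of the mask built by __alloc_zeroed_user_highpage. */
unsigned int toy_zeroed_user_highpage_flags(unsigned int movableflags)
{
    return TOY_GFP_HIGHUSER | TOY_GFP_ZERO | movableflags;
}

/* Return 1 if every bit of `flag` is set in `mask`. */
int toy_has_flag(unsigned int mask, unsigned int flag)
{
    return (mask & flag) == flag;
}
```

Passing TOY_GFP_MOVABLE as movableflags yields a mask that contains the high-memory, hardwall, reclaim, zeroing and movable bits at once, which is the combination the text above walks through.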
References
The GNU Assembler

Adding a huge zero page [LWN.net]

Introduce huge zero page [LWN.net]
————————————————
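Copyright notice: this is an original article by CSDN blogger 「Huo的藏經閣」, licensed under CC 4.0 BY-SA. Reposts must include a link to the original source and this notice.
Original link: https://blog.csdn.net/weixin_42730667/article/details/123121624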
