How does Linux manage physical memory, and how are pages allocated and reclaimed?
Linux manages physical memory through zones. On x86 (32-bit), physical memory below 16 MB forms the DMA zone; memory from 16 MB up to 896 MB forms the normal zone; and everything above 896 MB is collectively the HighMemory zone. When allocating memory, the caller can specify which zone the pages should come from.
So how does a zone manage its physical memory? Linux uses the buddy system, which splits and merges power-of-two blocks of pages to keep fragmentation low. The principle of the buddy system should be familiar, so it is not repeated here.
struct zone {
    /* Fields commonly accessed by the page allocator */

    /* zone watermarks, access with *_wmark_pages(zone) macros */
    unsigned long watermark[NR_WMARK];

    /*
     * We don't know if the memory that we're going to allocate will be freeable
     * or/and it will be released eventually, so to avoid totally wasting several
     * GB of ram we must reserve some of the lower zone memory (otherwise we risk
     * to run OOM on the lower zones despite there's tons of freeable ram
     * on the higher zones). This array is recalculated at runtime if the
     * sysctl_lowmem_reserve_ratio sysctl changes.
     */
    unsigned long lowmem_reserve[MAX_NR_ZONES];

    struct per_cpu_pageset __percpu *pageset;

    /*
     * free areas of different sizes
     */
    spinlock_t lock;
    struct free_area free_area[MAX_ORDER];

    /* Fields commonly accessed by the page reclaim scanner */
    spinlock_t lru_lock;
    struct lruvec lruvec;

    unsigned long pages_scanned;   /* since last reclaim */
    unsigned long flags;           /* zone flags, see below */

    wait_queue_head_t *wait_table;
    unsigned long wait_table_hash_nr_entries;
    unsigned long wait_table_bits;

    /*
     * spanned_pages is the total pages spanned by the zone, including
     * holes, which is calculated as:
     *     spanned_pages = zone_end_pfn - zone_start_pfn;
     *
     * present_pages is physical pages existing within the zone, which
     * is calculated as:
     *     present_pages = spanned_pages - absent_pages(pages in holes);
     *
     * managed_pages is present pages managed by the buddy system, which
     * is calculated as (reserved_pages includes pages allocated by the
     * bootmem allocator):
     *     managed_pages = present_pages - reserved_pages;
     *
     * So present_pages may be used by memory hotplug or memory power
     * management logic to figure out unmanaged pages by checking
     * (present_pages - managed_pages). And managed_pages should be used
     * by page allocator and vm scanner to calculate all kinds of watermarks
     * and thresholds.
     */
    unsigned long spanned_pages;
    unsigned long present_pages;
    unsigned long managed_pages;

    /*
     * rarely used fields:
     */
    const char *name;
};
The field tied to the buddy system is:
struct free_area free_area[MAX_ORDER]; /* MAX_ORDER = 11 */
The 11 free_area entries hold free blocks of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 pages respectively, so with 4 KB pages the largest block is 4 MB.
struct free_area {
    struct list_head free_list[MIGRATE_TYPES];
    unsigned long nr_free;
};
All free blocks hang off free_list. The buddy system's allocation and reclaim logic is easy to follow, so the kernel code is not reproduced here.
The buddy system is the foundation of physical page allocation. The central allocation function is __alloc_pages(), whose core loop works roughly as follows:
for (i = 0; (z = zonelist->zones[i]) != NULL; i++) {
    if (zone_watermark_ok(z, order, ...)) {
        page = buffered_rmqueue(z, order, gfp_mask);
        if (page)
            return page;
    }
}
zone_watermark_ok(z, order, ...) checks the zone's watermarks to decide whether an allocation of this size is safe; it will be discussed in detail later.
buffered_rmqueue() allocates pages from the given zone; it in turn calls __rmqueue() to pull pages out of the buddy system:
struct free_area *area;
unsigned int current_order;
unsigned long size;
struct page *page, *buddy;

for (current_order = order; current_order < MAX_ORDER; ++current_order) {
    area = zone->free_area + current_order;
    if (!list_empty(&area->free_list))
        goto block_found;
}
return NULL;    /* no block large enough is free */

block_found:
page = list_entry(area->free_list.next, struct page, lru);
list_del(&page->lru);
ClearPagePrivate(page);
page->private = 0;
area->nr_free--;
zone->free_pages -= 1UL << order;

size = 1 << current_order;
while (current_order > order) {
    area--;
    current_order--;
    size >>= 1;
    buddy = page + size;
    /* insert buddy as first element in the list */
    list_add(&buddy->lru, &area->free_list);
    area->nr_free++;
    buddy->private = current_order;
    SetPagePrivate(buddy);
}
return page;
This completes the allocation of physical pages. All physical page allocation, including the slab allocator's, is built on top of the path above.