Linux Memory Initialization (Part 2)

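Let's continue with the second half of Linux memory initialization. Once memory has been initialized, we can move on to memory management proper. By my summary, initialization breaks down roughly into the following stages:

Before physical memory enters the system
Managing memory with the memblock module
Page table mapping
Zone initialization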

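In Linux, the first two stages correspond to the following operations:

Loading the dtb through the fixmap: U-Boot copies the kernel image and the dtb into memory and passes the dtb's physical address to the kernel
Parsing the memory parameters in the dtb: the kernel has to read the dtb from that physical address and parse it before it has the final memory information

Both operations are covered briefly in the previous article. This article continues from there and moves into paging_init() and bootmem_init()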

paging_init

void __init paging_init(void)
{
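 phys_addr_t pgd_phys = early_pgtable_alloc(); // allocate one page of physical memory for the pgd
 pgd_t *pgd = pgd_set_fixmap(pgd_phys);

 map_kernel(pgd); // map the kernel segments: .text .init .data .bss
 map_mem(pgd); // map the physical memory registered with memblock (physical addresses into the linear region)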

 /*
  * We want to reuse the original swapper_pg_dir so we don't have to
  * communicate the new address to non-coherent secondaries in
  * secondary_entry, and so cpu_switch_mm can generate the address with
  * adrp+add rather than a load from some global variable.
  *
  * To do this we need to go via a temporary pgd.
  */
 cpu_replace_ttbr1(__va(pgd_phys)); // switch to the temporary page table
 memcpy(swapper_pg_dir, pgd, PGD_SIZE); // copy the newly built pgd contents over swapper_pg_dir
 cpu_replace_ttbr1(lm_alias(swapper_pg_dir));

 pgd_clear_fixmap();
 memblock_free(pgd_phys, PAGE_SIZE);

 /*
  * We only reuse the PGD from the swapper_pg_dir, not the pud + pmd
  * allocated with it.
  */
 memblock_free(__pa_symbol(swapper_pg_dir) + PAGE_SIZE,
        SWAPPER_DIR_SIZE - PAGE_SIZE);
}

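early_pgtable_alloc: allocates one page of physical memory to hold the pgd
map_kernel(pgd): maps each kernel segment (.text, .init, .data, .bss)

map_mem(pgd): maps the physical memory registered with the memblock subsystem (physical addresses are mapped into the linear region)

Its main job is to map the physical memory that was added to the system through memblock_add. Note that a memblock region flagged MEMBLOCK_NOMAP is not mapped.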
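
The no-map rule can be pictured with a minimal user-space sketch. The struct region type, the REGION_NOMAP flag and count_mapped_bytes() are hypothetical stand-ins invented for illustration, not the kernel's memblock API. The loop only shows the idea: walk the registered regions and leave out any region that carries the no-map flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a memblock region (not the kernel's struct). */
#define REGION_NOMAP 0x1u

struct region {
    uint64_t base;   /* physical start address */
    uint64_t size;   /* length in bytes */
    unsigned flags;  /* REGION_NOMAP means "do not map" */
};

/*
 * Walk the regions: every region is a candidate for the linear mapping
 * unless it carries the no-map flag.
 * Returns the number of bytes that would be mapped.
 */
uint64_t count_mapped_bytes(const struct region *regs, size_t n)
{
    uint64_t mapped = 0;

    assert(regs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (regs[i].flags & REGION_NOMAP)
            continue;            /* skipped: stays out of the linear map */
        mapped += regs[i].size;  /* a real kernel would build page tables here */
    }
    return mapped;
}
```

With three regions of 16 MB, 2 MB (flagged no-map) and 4 MB, the function reports 20 MB as mapped.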

cpu_replace_ttbr1(__va(pgd_phys)): switches to the temporary page table
memcpy(swapper_pg_dir, pgd, PGD_SIZE): copies the newly built page table contents over swapper_pg_dir

bootmem_init

void __init bootmem_init(void)
{
 unsigned long min, max;

 min = PFN_UP(memblock_start_of_DRAM());
 max = PFN_DOWN(memblock_end_of_DRAM());

 early_memtest(min << PAGE_SHIFT, max << PAGE_SHIFT);

 max_pfn = max_low_pfn = max;

 arm64_numa_init();
 /*
  * Sparsemem tries to allocate bootmem in memory_present(), so must be
  * done after the fixed reservations.
  */
 arm64_memory_present();

 sparse_init();
 zone_sizes_init(min, max);

 memblock_dump_all();
}
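
The first two statements round the DRAM range inward to whole page frames: the start is rounded up and the end is rounded down, so min and max are page frame numbers (pfn) that only cover complete pages. A small sketch of that arithmetic, assuming 4 KB pages. The lowercase helper names and usable_frames() are made up for this example, but the rounding matches the kernel's PFN_UP()/PFN_DOWN() macros.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12                        /* assumption: 4 KB pages */
#define PAGE_SIZE  (UINT64_C(1) << PAGE_SHIFT)

/* Same arithmetic as the kernel's PFN_UP()/PFN_DOWN() helpers. */
uint64_t pfn_up(uint64_t addr)   { return (addr + PAGE_SIZE - 1) >> PAGE_SHIFT; }
uint64_t pfn_down(uint64_t addr) { return addr >> PAGE_SHIFT; }

/* Number of whole page frames inside [start, end). */
uint64_t usable_frames(uint64_t start, uint64_t end)
{
    assert(start <= end);
    uint64_t min = pfn_up(start);   /* first frame fully inside the range */
    uint64_t max = pfn_down(end);   /* one past the last full frame */
    return max > min ? max - min : 0;
}
```

For a range starting at 0x80000800 and ending at 0x80003800, min is 0x80001 and max is 0x80003, so the partial pages at both ends are dropped and two full frames remain.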

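This function essentially completes Linux's initialization of how physical memory is "partitioned": node, zone and page frame, together with the corresponding data structures. Before walking through it, we need to look at how physical memory is organized.

「How does Linux organize physical memory?」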

「node」：

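Computer systems today use one of two memory architectures:

Non-Uniform Memory Access (NUMA): memory is divided into nodes, and the time it takes to access a node depends on how far the CPU is from that node. Each CPU has a local node, and accessing the local node is faster than accessing any other node
Uniform Memory Access (UMA), often equated with SMP (Symmetric Multi-Processing): every processor takes the same time to access memory. Put another way, the whole of memory forms a single node.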
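
To make the "distance" idea concrete, here is a small sketch of choosing a node to allocate from. Everything in it is hypothetical: the three-node machine, the distance values and pick_node() are illustrative and do not reproduce the kernel's allocator. It only shows the principle that the local node is preferred and a remote node is used when the local one is exhausted.

```c
#include <assert.h>

#define NR_NODES 3   /* hypothetical three-node machine */

/* Illustrative relative access costs: row = CPU's node, column = memory node. */
static const int node_distance[NR_NODES][NR_NODES] = {
    { 10, 20, 30 },
    { 20, 10, 20 },
    { 30, 20, 10 },
};

/*
 * Return the node with free pages that is cheapest to reach from `from`,
 * or -1 if every node is exhausted.
 */
int pick_node(int from, const unsigned long free_pages[NR_NODES])
{
    int best = -1;

    assert(from >= 0 && from < NR_NODES);
    for (int n = 0; n < NR_NODES; n++) {
        if (free_pages[n] == 0)
            continue;   /* nothing left on this node */
        if (best < 0 || node_distance[from][n] < node_distance[from][best])
            best = n;
    }
    return best;
}
```

On a UMA machine the table collapses to a single entry, which is another way of saying that there is only one node.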

「zone」：

A zone divides the whole of physical memory into several regions, each with its own purpose.

enum zone_type {
#ifdef CONFIG_ZONE_DMA
 /*
  * ZONE_DMA is used when there are devices that are not able
  * to do DMA to all of addressable memory (ZONE_NORMAL). Then we
  * carve out the portion of memory that is needed for these devices.
  * The range is arch specific.
  *
  * Some examples
  *
  * Architecture  Limit
  * ---------------------------
  * parisc, ia64, sparc <4G
  * s390   <2G
  * arm   Various
  * alpha  Unlimited or 0-16MB.
  *
  * i386, x86_64 and multiple other arches
  *    <16M.
  */
 ZONE_DMA,
#endif
#ifdef CONFIG_ZONE_DMA32
 /*
  * x86_64 needs two ZONE_DMAs because it supports devices that are
  * only able to do DMA to the lower 16M but also 32 bit devices that
  * can only do DMA areas below 4G.
  */
 ZONE_DMA32,
#endif
 /*
  * Normal addressable memory is in ZONE_NORMAL. DMA operations can be
  * performed on pages in ZONE_NORMAL if the DMA devices support
  * transfers to all addressable memory.
  */
 ZONE_NORMAL,
#ifdef CONFIG_HIGHMEM
 /*
  * A memory area that is only addressable by the kernel through
  * mapping portions into its own address space. This is for example
  * used by i386 to allow the kernel to address the memory beyond
  * 900MB. The kernel will set up special mappings (page
  * table entries on i386) for each page that the kernel needs to
  * access.
  */
 ZONE_HIGHMEM,
#endif
 ZONE_MOVABLE,
#ifdef CONFIG_ZONE_DEVICE
 ZONE_DEVICE,
#endif
 __MAX_NR_ZONES

};
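
As an illustration of what the limits in these comments mean, the sketch below classifies a physical address using the x86_64 figures quoted above (below 16 MB for ZONE_DMA, below 4 GB for ZONE_DMA32). The enum zone_kind type, zone_of() and zone_name() are hypothetical names for this example. Real zone boundaries are architecture and configuration specific, and the kernel does not classify addresses with a function like this.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified zone list for illustration; the kernel enum has more entries. */
enum zone_kind { ZK_DMA, ZK_DMA32, ZK_NORMAL };

#define DMA_LIMIT   (UINT64_C(16) << 20)  /* 16 MB, per the x86_64 comment */
#define DMA32_LIMIT (UINT64_C(4) << 30)   /* 4 GB */

/* Classify a physical address by the lowest zone whose limit covers it. */
enum zone_kind zone_of(uint64_t phys)
{
    if (phys < DMA_LIMIT)
        return ZK_DMA;
    if (phys < DMA32_LIMIT)
        return ZK_DMA32;
    return ZK_NORMAL;
}

/* Printable label for a zone, handy when dumping a memory map. */
const char *zone_name(enum zone_kind z)
{
    static const char *const names[] = { "DMA", "DMA32", "Normal" };

    assert((unsigned)z <= ZK_NORMAL);
    return names[z];
}
```

An address of exactly 16 MB is the first one outside the DMA range, so it falls into the DMA32 zone in this sketch.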

「page」：

Represents one physical page. In the kernel, each physical page is described by one struct page.

「page frame」:

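The kernel describes each physical page with a struct page. Assuming a page size of 4 KB, the kernel divides the whole of physical memory into 4 KB physical pages, and each such 4 KB region of physical memory is called a page frame.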

「page frame number (pfn)」:

The pfn is the number assigned to each page frame, so the relationship between a physical address and its pfn is:

physical address >> PAGE_SHIFT = pfn
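
A worked example of this relationship, assuming 4 KB pages (PAGE_SHIFT of 12). The helper names are chosen for this sketch and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KB pages */

/* Physical address -> page frame number: drop the in-page offset bits. */
uint64_t phys_to_pfn(uint64_t phys)
{
    return phys >> PAGE_SHIFT;
}

/* Page frame number -> physical address of the first byte of the frame. */
uint64_t pfn_to_phys(uint64_t pfn)
{
    assert(pfn <= (UINT64_MAX >> PAGE_SHIFT));   /* shift must not overflow */
    return pfn << PAGE_SHIFT;
}

/* Offset of a physical address inside its page frame. */
uint64_t page_offset(uint64_t phys)
{
    return phys & ((UINT64_C(1) << PAGE_SHIFT) - 1);
}
```

For the physical address 0x80001234 this gives pfn 0x80001 and an in-page offset of 0x234. Converting the pfn back yields 0x80001000, the start of the frame.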

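「Relationship between pfn and page」:

The kernel supports several memory models: CONFIG_FLATMEM (flat memory model), CONFIG_DISCONTIGMEM (discontiguous memory model) and CONFIG_SPARSEMEM_VMEMMAP (sparse memory model). ARM64 currently uses the sparse model.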

/* memmap is virtually contiguous.  */
#define __pfn_to_page(pfn) (vmemmap + (pfn))
#define __page_to_pfn(page) (unsigned long)((page) - vmemmap)

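At boot, the kernel maps the entire array of struct page into the vmemmap region of the kernel virtual address space, so we can simply treat vmemmap as the base address of the struct page array. Then:

vmemmap + pfn is the address of the struct page for that pfn.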
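
The two macros are plain pointer arithmetic on an array of descriptors, which a toy user-space model can show. Here struct page is a heavily reduced, hypothetical stand-in, the machine has only 16 frames, and pfn numbering is assumed to start at 0. In the kernel, vmemmap is a dedicated virtual region rather than a static array.

```c
#include <assert.h>

/* Hypothetical, heavily reduced stand-in for the kernel's struct page. */
struct page {
    unsigned long flags;
    int refcount;
};

#define NR_FRAMES 16                          /* toy machine: 16 page frames */
static struct page mem_map[NR_FRAMES];        /* one descriptor per frame */
static struct page *const vmemmap = mem_map;  /* base of the descriptor array */

/* Same pointer arithmetic as __pfn_to_page(). */
struct page *pfn_to_page(unsigned long pfn)
{
    assert(pfn < NR_FRAMES);
    return vmemmap + pfn;
}

/* Same pointer arithmetic as __page_to_pfn(). */
unsigned long page_to_pfn(const struct page *page)
{
    assert(page >= vmemmap && page < vmemmap + NR_FRAMES);
    return (unsigned long)(page - vmemmap);
}
```

Because the descriptors are contiguous in virtual memory, both conversions cost a single addition or subtraction, with no table lookup.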

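Conclusion

That wraps up Linux's initialization of physical memory and the mapping between virtual and physical addresses. You should now understand how the layout of the Linux virtual address space comes about, and how physical memory is modelled in software through nodes, zones and page frames.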
