http://blog.chinaunix.net/uid-24774106-id-3427836.html
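We all know that the functions in a dynamic shared library are shared, and that this is the main advantage of such libraries: it saves memory. Almost every executable compiled from C uses libc; without sharing, each executable would hold its own copy of libc in memory, which would be a huge waste. But I had never figured out how to prove that a shared library function exists as a single copy in physical memory. The essence of the question is how the logical addresses in a program map to physical memory addresses. Put bluntly: given a logical address, find the corresponding address in physical memory; or given a physical address, tell me what is stored there.
Over the last two days I found an excellent blog post that settles the mapping between logical, linear, and physical addresses once and for all. The author is deeply skilled and kindly provides a 29-page PDF; with that document out, the question is closed. So why write this post? The author uses kernel 2.6.18 as the example and provides two kernel modules and two user-space programs. On my Ubuntu 12.04 machine I fully verified the PAE (Physical Address Extension) address mapping described in the document, and found some compatibility problems in the code that broke the build, mostly small issues caused by the different kernel version and by gcc. It took me more than four hours to get through the whole experiment. If you want to deepen your understanding by doing the experiment yourself, you can refer to my modified programs. I have no intention of plagiarizing; as always, the glory belongs to those who came before.
The figure below comes from the Intel manual 64-ia-32-architectures-software-developer-vol-3a-part-1-manual and explains the mapping from logical address to physical address very well. The logical address is the address we see when we apply the address-of operator in C.
Using the function from the original article: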
- #include <stdio.h>
- int main()
- {
- unsigned long tmp;
- tmp = 0x12345678;
- printf("tmp address:0x%08lX\n", (unsigned long)&tmp);
- return 0;
- }
- tmp address:0xBF86D16C
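1 Segment mapping
The logical address of the local variable tmp, 0xBF86D16C, is the offset part. Because tmp lives on the stack, the segment part comes from the SS (Stack Segment) register that IA-32 provides. The kernel loads the segment registers in start_thread: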
- //arch/x86/kernel/process_32.c
- //-------------------------------------------
- void
- start_thread(struct pt_regs *regs, unsigned long new_ip, unsigned long new_sp)
- {
- set_user_gs(regs, 0);
- regs->fs = 0;
- regs->ds = __USER_DS;
- regs->es = __USER_DS;
- regs->ss = __USER_DS;
- regs->cs = __USER_CS;
- regs->ip = new_ip;
- regs->sp = new_sp;
- /*
- * Free the old FP and other extended state
- */
- free_thread_xstate(current);
- }
- //arch/x86/include/asm/segment.h
- //-------------------------------------------
- #define GDT_ENTRY_DEFAULT_USER_CS 14
- #define GDT_ENTRY_DEFAULT_USER_DS 15
- #define GDT_ENTRY_KERNEL_BASE (12)
- #define GDT_ENTRY_KERNEL_CS (GDT_ENTRY_KERNEL_BASE+0)
- #define GDT_ENTRY_KERNEL_DS (GDT_ENTRY_KERNEL_BASE+1)
- #define __KERNEL_CS (GDT_ENTRY_KERNEL_CS*8)
- #define __KERNEL_DS (GDT_ENTRY_KERNEL_DS*8)
- #define __USER_DS (GDT_ENTRY_DEFAULT_USER_DS*8+3)
- #define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS*8+3)
- __USER_CS = 0x73 = 0000000001110 0 11 (index=14, TI=0, RPL=3)
- __USER_DS = 0x7B = 0000000001111 0 11 (index=15, TI=0, RPL=3)
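A segment selector is 16 bits wide: a 13-bit index, a 1-bit table indicator (TI), and a 2-bit RPL. TI says whether the segment descriptor we want is in the GDT or in the LDT. The GDT and LDT can be thought of simply as two tables, each holding a set of descriptors.
The TI bit of both our CS and DS is 0; in other words, the segment descriptors we are looking for are in the GDT. In practice, Linux programs almost always select from the GDT and hardly ever from the LDT. According to Mao Decao, only processes like wine use the LDT.
RPL is the requested privilege level: 0 is the most privileged and 3 is unprivileged. That is why the definition
- #define __USER_CS (GDT_ENTRY_DEFAULT_USER_CS*8+3)
adds 3: the low two bits of the selector set RPL to 3, user mode.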
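The next step is to go to the GDT and find the segment descriptor we want. But wait: we have been happily talking about the GDT, and we know our DS descriptor sits at index 15, yet nobody has told us where the GDT itself lives.
Enter GDTR: the address of the GDT is stored in the GDTR register. The question is how to read the value of GDTR.
The author of the blog post mentioned above wrote a kernel module that extracts the values of GDTR, CR0, CR3, and so on. The core of the code is below: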
- static int my_get_info( char *buf, char **start, off_t off, int count )
- {
- int len = 0;
- struct mm_struct *mm;
- mm = current->active_mm;
- cr0 = read_cr0();
- cr3 = read_cr3();
- cr4 = read_cr4();
- //asm(" sgdt gdtr");
- asm("sgdt %0":"=m"(gdtr));
- len += sprintf( buf+len, "cr4=%08X ", cr4 );
- len += sprintf( buf+len, "PSE=%X ", (cr4>>4)&1 );
- len += sprintf( buf+len, "PAE=%X ", (cr4>>5)&1 );
- len += sprintf( buf+len, "\n" );
- len += sprintf( buf+len, "cr3=%08X cr0=%08X\n",cr3,cr0);
- len += sprintf( buf+len, "pgd:0x%08X\n",(unsigned int)mm->pgd);
- len += sprintf( buf+len, "gdtr address:%lX, limit:%X\n", gdtr.address,gdtr.limit);
- // len += sprintf( buf+len, "cpu_gdt_table address:0x%08lX\n", cpu_gdt_table);
- return len;
- }
In short, we have a way to read the GDTR register and thereby locate the GDT; looking up the 16th entry of that table (index 15) gives us our DS segment descriptor.
- root@manu:~/code/c/self/mm_addr# ./mem_map
- %ebp:0xBF86D178
- tmp address:0xBF86D16C
- cr4=000006F0 PSE=1 PAE=1
- cr3=06E3C000 cr0=8005003B
- pgd:0xC6E3C000
- gdtr address:F7BB9000, limit:FF
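GDTR holds a kernel virtual address, so the physical address of the GDT is 0xF7BB9000 - 0xC0000000 (0xC0000000 being the start of the kernel's direct mapping in the default 3G/1G split). We can then look at the memory contents with the fileview tool provided by the author: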
- -----------------------------------------------------------
- gdtr : f7bb9000 - c0000000 = 37bb9000
- 0000037BB9000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000037BB9010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000037BB9020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000037BB9030 FF FF 00 B9 61 F3 DF B7 00 00 00 00 00 00 00 00 ....a...........
- 0000037BB9040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000037BB9050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000037BB9060 FF FF 00 00 00 9B CF 00 FF FF 00 00 00 93 CF 00 ................
- 0000037BB9070 FF FF 00 00 00 FB CF 00 FF FF 00 00 00 F3 CF 00 ................
- 0000037BB9080 6B 20 C0 EA BB 8B 00 F7 00 00 00 00 00 00 00 00 k ..............
- 0000037BB9090 FF FF 00 00 00 9A 40 00 FF FF 00 00 00 9A 00 00 ......@.........
- 0000037BB90A0 FF FF 00 00 00 92 00 00 00 00 00 00 00 92 00 00 ................
- 0000037BB90B0 00 00 00 00 00 92 00 00 FF FF 00 00 00 9A 40 00 ..............@.
- 0000037BB90C0 FF FF 00 00 00 9A 00 00 FF FF 00 00 00 92 40 00 ..............@.
- 0000037BB90D0 FF FF 00 00 00 92 CF 00 FF FF 00 40 29 93 8F 36 ...........@)..6
- 0000037BB90E0 18 00 80 0C BC 91 40 F7 00 00 00 00 00 00 00 00 ......@.........
- 0000037BB90F0 00 00 00 00 00 00 00 00 6B 20 00 48 80 89 00 C1 ........k .H....
- FF FF 00 00 00 F3 CF 00 = 00cff300 0000ffff
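Entry 15 sits at offset 15*8 = 0x78, which is the last 8 bytes of the row at 0x37BB9070. Checking those bytes against the descriptor format gives BASE=0x00000000. After all that effort, the conclusion is:
Segmentation is fake: the virtual address always equals the linear address.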
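We can also read off some other useful information:
- S=1: not a system segment
- G=1: the limit is counted in units of 4096 bytes
- DPL=3 (binary 11): accessible from both kernel mode and user mode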
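2 Page mapping
Now that we have the linear address, the next step is to obtain the physical address.
My machine uses PAE, the Physical Address Extension paging mechanism. Here is my uname -r: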
- uname -r
- 3.2.0-29-generic-pae
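First, what is PAE? Most servers today have more than 4 GB of memory, and so do many PCs; a colleague of mine has a PC with 16 GB, which makes me drool with envy. By increasing the number of address pins from 32 to 36, Intel made it possible to support 64 GB of memory. That requires a new paging mechanism, though, one that translates 32-bit linear addresses into 36-bit physical addresses, so that the 64 GB can actually be used.
That mechanism is PAE:
1 It introduces a page directory pointer table (PDPT) made up of four 64-bit entries.
2 27 bits of the cr3 register (bits 31:5) hold the address of the PDPT; the table is 32-byte aligned, so the full 32 bits are not needed.
3 The top 2 bits of the linear address select one of the four PDPT entries.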
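The figure above gives the complete mapping from linear address to physical address in PAE mode. The only part that is a little hard to understand is the number 40: it is the width of the page-frame field, bits 51:12, in each table entry.
The Intel manual contains the following statements:
1) A PDE is selected using the physical address defined as follows: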
— Bits 51:12 are from PDPTEi.
— Bits 11:3 are bits 29:21 of the linear address.
— Bits 2:0 are 0.
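2) Bit 7 (the PS bit) of the PDE decides whether a 4 KB page or a 2 MB page is used. The figure above is for 4 KB pages; 2 MB pages use this scheme:
In our case the pages are not 2 MB pages, as the PS bit will show in the experiment below, so the 2 MB scheme is not discussed further.
3) A PTE is selected using the physical address defined as follows: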
— Bits 51:12 are from the PDE.
— Bits 11:3 are bits 20:12 of the linear address.
— Bits 2:0 are 0.
4) The final physical address is formed as follows:
— Bits 51:12 are from the PTE.
— Bits 11:0 are from the original linear address.
OK, back to our example:
- Linear address:
- 0xBF86D16C
- 10 111111100 001101101 000101101100 (binary; bits 31:30, 29:21, 20:12, 11:0)
- root@manu:~/code/c/self/mm_addr# ./mem_map
- %ebp:0xBF86D178
- tmp address:0xBF86D16C
- cr4=000006F0 PSE=1 PAE=1
- cr3=06E3C000 cr0=8005003B
- pgd:0xC6E3C000
- gdtr address:F7BB9000, limit:FF
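cr3 is 0x06E3C000, so the PDPT starts at physical address 0x6E3C000. The top two bits of the linear address are 10b = 2, so we want the third 8-byte entry, at offset 0x10. Let's bring out our trusty dram tool again and see what is stored there: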
- 0000006E3C000 01 B0 E3 06 00 00 00 00 01 60 3C 08 00 00 00 00 .........`<.....
- 0000006E3C010 01 50 3C 08 00 00 00 00 01 40 93 01 00 00 00 00 .P<......@......
- 0000006E3C020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C0A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C0B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C0C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C0D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C0E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006E3C0F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
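- 01 50 3C 08 00 00 00 00 = 0x083c5001
Bit 0 is the present flag; it being set means this 64-bit entry is valid.
Note that the PS bit, which selects between 4 KB and 2 MB pages, belongs to the PDE rather than to this entry; in a PAE PDPT entry bit 7 is reserved, so the page size is checked at the next level.
Clearing the low 12 flag bits gives the base address of the page directory: 0x83c5000.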
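- Linear address:
- 0xBF86D16C
- 10 111111100 001101101 000101101100 (binary; bits 29:21 = 0x1FC select the PDE)
The PDE is therefore at 0x83C5000 + 0x1FC*8 = 0x83C5FE0. Here is what is stored at that address: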
- 00000083C5FB0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C5FC0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C5FD0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C5FE0 67 70 D8 06 00 00 00 00 00 00 00 00 00 00 00 00 gp..............
- 00000083C5FF0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C6090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 00000083C60A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
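- 67 70 D8 06 00 00 00 00 = 0x6d87067
Bit 0 (present) is set and bit 7 (PS) is clear, so this PDE points to a page table of 4 KB pages based at 0x6D87000.
- Linear address:
- 0xBF86D16C
- 10 111111100 001101101 000101101100 (binary; bits 20:12 = 0x06D select the PTE)
The PTE is therefore at 0x6D87000 + 0x06D*8 = 0x6D87368. Here is what is stored at that address: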
- 0000006D87360 47 40 65 07 00 00 00 80 47 A0 94 0D 00 00 00 80 G@e.....G.......
- 0000006D87370 47 B0 BD 09 00 00 00 80 00 00 00 00 00 00 00 00 G...............
- 0000006D87380 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D87390 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D873A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D873B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D873C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D873D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D873E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D873F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D87400 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D87410 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D87420 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D87430 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D87440 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 0000006D87450 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
- 47 A0 94 0D 00 00 00 80 = 0x80000000 0d94a047
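1) Bits 51:12 come from the PTE 0x80000000 0d94a047, which gives 0x0d94a000. (Bit 63, the execute-disable flag, is set here; it is not part of the address.)
2) Bits 11:0 come from the low 12 bits of the linear address.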
- Linear address:
- 0xBF86D16C
- 10 111111100 001101101 000101101100 (binary; bits 11:0 = 0x16C are the page offset)
- 0xd94a000 + (0001 0110 1100)b = 0x0d94a16c
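Now use our tool to check whether physical address 0x0D94A16C really holds 0x12345678 (look at the last four bytes of the first row, stored little-endian):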
- 000000D94A160 70 A2 7A B7 00 00 00 00 A9 86 04 08 78 56 34 12 p.z.........xV4.
- 000000D94A170 A0 86 04 08 00 00 00 00 00 00 00 00 D3 14 5F B7 .............._.
- 000000D94A180 01 00 00 00 14 D2 86 BF 1C D2 86 BF 58 98 79 B7 ............X.y.
- 000000D94A190 00 00 00 00 1C D2 86 BF 1C D2 86 BF 00 00 00 00 ................
- 000000D94A1A0 A0 82 04 08 F4 DF 77 B7 00 00 00 00 00 00 00 00 ......w.........
- 000000D94A1B0 00 00 00 00 A9 68 DD 32 B8 4C 57 81 00 00 00 00 .....h.2.LW.....
- 000000D94A1C0 00 00 00 00 00 00 00 00 01 00 00 00 A0 84 04 08 ................
- 000000D94A1D0 00 00 00 00 A0 F6 7A B7 E9 13 5F B7 F4 BF 7B B7 ......z..._...{.
- 000000D94A1E0 01 00 00 00 A0 84 04 08 00 00 00 00 C1 84 04 08 ................
- 000000D94A1F0 54 85 04 08 01 00 00 00 14 D2 86 BF A0 86 04 08 T...............
- 000000D94A200 10 87 04 08 70 A2 7A B7 0C D2 86 BF 18 C9 7B B7 ....p.z.......{.
- 000000D94A210 01 00 00 00 DF E8 86 BF 00 00 00 00 E9 E8 86 BF ................
- 000000D94A220 F9 E8 86 BF 04 E9 86 BF 0E E9 86 BF 2F EE 86 BF ............/...
- 000000D94A230 3E EE 86 BF 4C EE 86 BF 5A EE 86 BF 6E EE 86 BF >...L...Z...n...
- 000000D94A240 B0 EE 86 BF D3 EE 86 BF E4 EE 86 BF EC EE 86 BF ................
- 000000D94A250 03 EF 86 BF 13 EF 86 BF 25 EF 86 BF 32 EF 86 BF ........%...2...
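Thanks again to the ilinuxkernel blogger for the document, which finally settled the translation from virtual address to physical address for me. I like articles of this kind because they give me a deeper understanding of how computers work. Most of the credit for this post goes to that kind blogger; the glory belongs to those who came before.
To make it easier for anyone interested to run the experiment, I have put the modified code on github. I have no intention of taking credit for the original author's work.
The address is: https://github.com/manuscola/mm_addr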
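plus:
The fileview tool can display memory contents by byte, 2 bytes, 4 bytes, or 8 bytes. Unfortunately I did not read the fileview source carefully when I ran the experiment last night, so all the physical memory contents above are shown byte by byte. If you want to do the experiment yourself, it is worth reading the fileview source code first.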
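References:
1 Linux內存地址映射 (Linux memory address mapping)
2 Computer Systems: A Programmer's Perspective
3 Understanding the Linux Kernel
4 Linux用戶程序如何訪問物理內存 (How a Linux user program accesses physical memory)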
5 http://cs.usfca.edu/~cruse/cs635/