[Assembly Optimization] Understanding the GOT in NASM

##1.Obtaining the Address of the GOT
Each code module in your shared library should define the GOT as an external symbol:

extern  _GLOBAL_OFFSET_TABLE_   ; in ELF 
extern  __GLOBAL_OFFSET_TABLE_  ; in BSD a.out

At the beginning of any function in your shared library which plans to access your data or BSS sections, you must first calculate the address of the GOT. This is typically done by writing the function in this form:

func:   push    ebp 
        mov     ebp,esp 
        push    ebx 
        call    .get_GOT 
.get_GOT: 
        pop     ebx 
        add     ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc 

        ; the function body comes here 

        mov     ebx,[ebp-4] 
        mov     esp,ebp 
        pop     ebp 
        ret

(For BSD, again, the symbol _GLOBAL_OFFSET_TABLE_ requires a second leading underscore.)

The first two lines of this function are simply the standard C prologue to set up a stack frame, and the last three lines are standard C function epilogue. The third line, and the fourth to last line, save and restore the EBX register, because PIC shared libraries use this register to store the address of the GOT.

The interesting bit is the CALL instruction and the following two lines. The CALL and POP combination obtains the address of the label .get_GOT, without having to know in advance where the program was loaded (since the CALL instruction is encoded relative to the current position). The ADD instruction makes use of one of the special PIC relocation types: GOTPC relocation. With the WRT ..gotpc qualifier specified, the symbol referenced (here _GLOBAL_OFFSET_TABLE_, the special symbol assigned to the GOT) is given as an offset from the beginning of the section. (Actually, ELF encodes it as the offset from the operand field of the ADD instruction, but NASM simplifies this deliberately, so you do things the same way for both ELF and BSD.) So the instruction then adds the beginning of the section, to get the real address of the GOT, and subtracts the value of .get_GOT which it knows is in EBX. Therefore, by the time that instruction has finished, EBX contains the address of the GOT.

If you didn’t follow that, don’t worry: it’s never necessary to obtain the address of the GOT by any other means, so you can put those three instructions into a macro and safely ignore them:

%macro  get_GOT 0 

        call    %%getgot 
  %%getgot: 
        pop     ebx 
        add     ebx,_GLOBAL_OFFSET_TABLE_+$$-%%getgot wrt ..gotpc 

%endmacro

###1.1 Understanding "Obtaining the Address of the GOT"
For example, take a function (here add_fun) written in assembly and called from C.
add.asm

extern _GLOBAL_OFFSET_TABLE_
section .data

local_val: dd 4	; dword, to match the 32-bit load in add_fun
  
section .text
global add_fun
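add_fun:
	push ebp		; standard prologue
	mov ebp, esp  
	push ebx		; ebx will hold the GOT address
	call .get_GOT
.get_GOT:
	pop ebx			; ebx = run-time address of .get_GOT
	
	mov ecx, $$					; probe: start of the current section
	mov ecx,_GLOBAL_OFFSET_TABLE_ wrt ..gotpc 	; probe: GOT offset from the section start
	mov ecx,.get_GOT				; probe: address of the label
	
	add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc 	; ebx = address of the GOT
	mov eax, [esp + 8]	; NOTE: after the two pushes this is the return address; a is at [ebp + 8]
	mov edx, [esp + 12]	; NOTE: this is a; b is at [ebp + 12]
	add eax, edx;  
	mov edx, [ebx + local_val wrt ..gotoff]	; load local_val relative to the GOT
	mov eax,edx		; return value = local_val (the sum above is discarded)
	mov esp, ebp		; NOTE: drops the saved ebx without restoring it (the template uses mov ebx,[ebp-4])
	pop ebp  
	ret  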

demo.c

#include<stdio.h>
#include<stdlib.h>
extern int add_fun(int a, int b);

int main()
{
	int sum = add_fun(2,3);
	printf("sum = %d\n", sum);
	return 0;
}

make.sh

#!/bin/sh
gcc -g -m32 -c demo.c -fPIC -I. -o demo.o
yasm -m x86 -f elf -DPIC -I. add.asm -o add.o
gcc -g -m32 -o demo demo.o add.o

After compiling, gdb disassembles add_fun as follows:

Dump of assembler code for function add_fun:
=> 0x56555580 <+0>:	    push   %ebp
   0x56555581 <+1>:	    mov    %esp,%ebp
   0x56555583 <+3>:  	push   %ebx
   0x56555584 <+4>:	    call   0x56555589 <add_fun+9>
   0x56555589 <+9>:	    pop    %ebx
   0x5655558a <+10>:	mov    $0x56555580,%ecx
   0x5655558f <+15>:	mov    $0x1a58,%ecx
   0x56555594 <+20>:	mov    $0x56555589,%ecx
   0x56555599 <+25>:	add    $0x1a4f,%ebx
   0x5655559f <+31>:	mov    0x8(%esp),%eax
   0x565555a3 <+35>:	mov    0xc(%esp),%edx
   0x565555a7 <+39>:	add    %edx,%eax
   0x565555a9 <+41>:	mov    0x30(%ebx),%edx
   0x565555af <+47>:	mov    %edx,%eax
   0x565555b1 <+49>:	mov    %ebp,%esp
   0x565555b3 <+51>:	pop    %ebp
   0x565555b4 <+52>:	ret    
   0x565555b5 <+53>:	xchg   %ax,%ax
   0x565555b7 <+55>:	xchg   %ax,%ax
   0x565555b9 <+57>:	xchg   %ax,%ax
   0x565555bb <+59>:	xchg   %ax,%ax
   0x565555bd <+61>:	xchg   %ax,%ax
   0x565555bf <+63>:	nop

(1) 0x56555584 <+4>: call 0x56555589 <add_fun+9> corresponds to call .get_GOT in add.asm.
   0x56555589 is the address of .get_GOT. After 0x56555589 <+9>: pop %ebx executes, ebx also holds the address of .get_GOT (that is, 0x56555589).

(2) 0x5655558a <+10>: mov $0x56555580,%ecx corresponds to mov ecx, $$ in add.asm.
  0x56555580 is the start address of the current section.

(3) 0x5655558f <+15>: mov $0x1a58,%ecx corresponds to mov ecx,_GLOBAL_OFFSET_TABLE_ wrt ..gotpc in add.asm.
  0x1a58 is the offset of the GOT from the start of the current section.

(4) 0x56555599 <+25>: add $0x1a4f,%ebx corresponds to add ebx,_GLOBAL_OFFSET_TABLE_+$$-.get_GOT wrt ..gotpc in add.asm.
  After this instruction executes, ebx holds the address of the GOT: 0x56556fd8 = 0x1a4f + 0x56555589 = 0x1a58 + 0x56555580.

Reference: http://coding.derkeiler.com/Archive/Assembler/comp.lang.asm.x86/2007-02/msg00041.html
Reference: http://www.nasm.us/doc/nasmdoc9.html
