9.9. Program Interpreter

The term “program interpreter” comes from the ELF standard. On Linux, the program interpreter is ld.so (/lib/ld-linux.so.2), the run time linker/loader. The program interpreter is responsible for bringing up an executable and getting it running. The kernel calls it and passes it a special array of information called an auxiliary vector. The vector can be displayed by setting the special environment variable LD_SHOW_AUXV, as follows:

penguin> export LD_SHOW_AUXV=true

penguin> foo

AT_HWCAP:    fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 19 21 22 mmx osfxsr xmm xmm2 27 28 29

AT_PAGESZ:      4096

AT_CLKTCK:      100

AT_PHDR:        0x8048034

AT_PHENT:       32

AT_PHNUM:       7

AT_BASE:        0x40000000

AT_FLAGS:       0x0

AT_ENTRY:       0x8048540

AT_UID:         7903

AT_EUID:        7903

AT_GID:         200

AT_EGID:        200

AT_PLATFORM:    i686

This is a printf format string in baz

This is a printf format string in main

 

Brief definitions of the various fields can be found in /usr/include/elf.h:

/* Legal values for a_type (entry type). */

 

#define AT_NULL      0        /* End of vector */

#define AT_IGNORE    1        /* Entry should be ignored */

#define AT_EXECFD    2        /* File descriptor of program */

#define AT_PHDR      3        /* Program headers for program */

#define AT_PHENT     4        /* Size of program header entry */

#define AT_PHNUM     5        /* Number of program headers */

#define AT_PAGESZ    6        /* System page size */

#define AT_BASE      7        /* Base address of interpreter */

#define AT_FLAGS     8        /* Flags */

#define AT_ENTRY     9        /* Entry point of program */

#define AT_NOTELF   10        /* Program is not ELF */

#define AT_UID      11        /* Real uid */

#define AT_EUID     12        /* Effective uid */

#define AT_GID      13        /* Real gid */

#define AT_EGID     14        /* Effective gid */

#define AT_CLKTCK   17        /* Frequency of times() */

 

/* Some more special a_type values describing the hardware. */

#define AT_PLATFORM 15        /* String identifying platform. */

#define AT_HWCAP    16        /* Machine dependent hints about processor capabilities.  */

                             

 

Here is some additional information about the various fields:

AT_PAGESZ:

The standard page size used on this operating system for normal memory regions. Other memory regions (such as shared memory) can have larger page sizes. It is assumed that normal memory regions are used when loading ELF objects into memory.

AT_PHDR:

The address of the program header table for the executable.

AT_PHENT:

The size of an entry in the program header table.

AT_PHNUM:

The number of entries in the program header table. Note that with AT_PHDR, AT_PHENT, and AT_PHNUM, the program interpreter can find all of the loadable segments of an ELF object file.

AT_BASE:

The address of the program interpreter (/lib/ld-linux.so.2 in this case) itself.

AT_ENTRY:

Entry point of the program. This is the address in the executable to which the run time loader hands control after finishing program initialization. This is usually the _start function.

AT_PLATFORM:

The current hardware platform.

AT_UID, AT_EUID, AT_GID, AT_EGID:

The real user ID, effective user ID, real group ID, and effective group ID, respectively.

 

The kernel on some platforms that support ELF may choose not to load the program but instead pass an open file descriptor to the run time loader/linker so that it can load the program on its own. In this case, the auxiliary vector will include another field called AT_EXECFD.

It is the run time loader/linker’s responsibility to load up the program if needed and perform all initialization. The initialization includes finding all required libraries, calling initialization functions, performing required relocations, and so on. However, before it initializes the program, it needs first to initialize itself. This is actually a fairly complex process that is beyond the scope of this chapter. The reason for its complexity is that the run time linker has to do this manually because the regular methods rely on some basic setup that does not exist when the run time linker starts.

The run time linker/loader ld.so also has a special environment variable, LD_DEBUG, to help debug it. This environment variable instructs ld.so to show all the main activity while it brings up a program. In other words, it is like a trace of the run time linker/loader. Here is an example of this special debug mode in action:

penguin> export LD_DEBUG=all

penguin> foo

27080:

27080:  file=libfoo.so; needed by foo

27080:  find library=libfoo.so; searching

27080:   search path=./i686/mmx:./i686:./mmx:. (RPATH from file foo)

27080:    trying file=./i686/mmx/libfoo.so

27080:    trying file=./i686/libfoo.so

27080:    trying file=./mmx/libfoo.so

27080:    trying file=./libfoo.so

27080:

27080:  file=libfoo.so; generating link map

27080:    dynamic: 0x40015c48 base: 0x40014000   size: 0x00001d84

27080:      entry: 0x400147e0 phdr: 0x40014034  phnum:          4

27080:

27080:

27080:  file=libstdc++.so.5; needed by foo

27080:  find library=libstdc++.so.5; searching

27080:   search path=./i686/mmx:./i686:./mmx:. (RPATH from file foo)

27080:    trying file=./i686/mmx/libstdc++.so.5

27080:    trying file=./i686/libstdc++.so.5

27080:    trying file=./mmx/libstdc++.so.5

27080:    trying file=./libstdc++.so.5

27080:   search path=/usr/lib/i686/mmx:/usr/lib/i686:/usr/lib/mmx:/usr/lib (system search path)

27080:    trying file=/usr/lib/i686/mmx/libstdc++.so.5

27080:    trying file=/usr/lib/i686/libstdc++.so.5

27080:    trying file=/usr/lib/mmx/libstdc++.so.5

27080:    trying file=/usr/lib/libstdc++.so.5

27080:

27080:  file=libstdc++.so.5; generating link map

27080:    dynamic: 0x400c246c base: 0x40016000   size: 0x000b23c0

27080:      entry: 0x40050700 phdr: 0x40016034  phnum:  4

<...>

 

This first part is called “loading” and involves finding and loading all required shared libraries. The run time linker searches the RPATH recorded in the executable, any directories named in LD_LIBRARY_PATH, and the system search path for each library. Make note of the text “generating link map” in the output; it is described in more detail shortly. Let’s see what else is in this debug output:

<...>

27080:

27080: calling init: ./libfoo.so

27080:

<...>

 

Here we see the init function in libfoo.so being called. This is before control has officially been handed over to the executable. The output continues...

<...>

27080:

27080: initialize program: foo

27080:

27080:

27080: transferring control: foo

27080:

<...>

 

This is where control is officially handed over to the executable foo. After this point, the contents of the debug output are for “late” or “lazy” binding:

<...>

27080:  symbol=_Z3bazi;  lookup in file=foo

27080:  symbol=_Z3bazi;  lookup in file=./libfoo.so

27080:  binding file foo to ./libfoo.so: normal symbol '_Z3bazi'

27080:  symbol=_Z3fooi;  lookup in file=foo

27080:  symbol=_Z3fooi;  lookup in file=./libfoo.so

27080:  symbol=_Z3fooi;  lookup in file=/usr/lib/libstdc++.so.5

27080:  symbol=_Z3fooi;  lookup in file=/lib/libm.so.6

27080:  symbol=_Z3fooi;  lookup in file=/lib/libgcc_s.so.1

27080:  symbol=_Z3fooi;  lookup in file=/lib/libc.so.6

27080:  symbol=_Z3fooi;  lookup in file=/lib/ld-linux.so.2

27080: binding file ./libfoo.so to ./libfoo.so: normal symbol '_Z3fooi'

27080:  symbol=printf;  lookup in file=foo

27080:  symbol=printf;  lookup in file=./libfoo.so

27080:  symbol=printf;  lookup in file=/usr/lib/libstdc++.so.5

27080:  symbol=printf;  lookup in file=/lib/libm.so.6

27080:  symbol=printf;  lookup in file=/lib/libgcc_s.so.1

27080:  symbol=printf;  lookup in file=/lib/libc.so.6

27080:  binding file ./libfoo.so to /lib/libc.so.6: normal symbol 'printf' [GLIBC_2.0]

<...>

 

These binding actions are driven by the _dl_runtime_resolve function described back in the “.plt” section of this chapter.

9.9.1. Link Map

Remember that text, “generating link map,” from the output from this special debug mode? A link map contains information about a shared library that has been loaded into the address space.

There is a special variable called _dl_main_searchlist that has the following structure:

struct

{

  /* Array of maps for the scope. */

  struct link_map **r_list;

  /* Number of entries in the scope. */

  unsigned int r_nlist;

};

 

From within GDB (the process has to be running for this to be useful), we can see the values of the two structure members:

(gdb) x/2x _dl_main_searchlist

0x400130c8:     0x40223030      0x00000007

 

The first value is the address of the list, and the second value is the number of elements in the list. Looking at the seven values in memory, we get the following:

(gdb) x/7 0x40223030

0x40223030:    0x40012fd0    0x40013590    0x400137f8    0x400139e8

0x40223040:    0x40013bd0    0x40013dc0    0x40012d80

 

Each of these values is a pointer to the following structure:

struct link_map

  {

    /* These first few members are part of the protocol with the debugger.

       This is the same format used in SVR4.  */

 

    ElfW(Addr) l_addr; /* Base address shared object is loaded at.*/

    char *l_name;      /* Absolute file name object was found in. */

    ElfW(Dyn) *l_ld;   /* Dynamic section of the shared object.   */

    struct link_map *l_next, *l_prev; /* Chain of loaded objects.*/

};

 

This is the link map that the ld.so output referred to. Let’s look at the second address in the list:

(gdb) x/5x 0x40013590

0x40013590:  0x40014000   0x40013580   0x40015c48   0x400137f8

0x400135a0:  0x40012fd0

 

According to the link_map structure, the second value should be the path of a loaded library, confirmed below:

(gdb) x/s 0x40013580

0x40013580:      "./libfoo.so"

 

The next link_map value is another library:

(gdb) x/5x 0x400137f8

0x400137f8:   0x40016000   0x400137e0   0x400c246c  0x400139e8

0x40013808:   0x40013590

(gdb) x/s 0x400137e0

0x400137e0:   "/usr/lib/libstdc++.so.5"

 

Also notice that the fourth value in each link_map structure (l_next) points to the next link_map structure, and the fifth value (l_prev) points to the previous one. The structures are therefore reachable both through a linked list and through an array of pointers.

The list of loaded libraries is used by the run time linker to keep track of the loaded libraries for a process.

 
