Chapter 9. ELF: Executable and Linking Format

Introduction

Concepts and Definitions

ELF Header

Overview of Segments and Sections

Segments and the Program Header Table

Sections and the Section Header Table

Relocation and Position Independent Code (PIC)

Stripping an ELF Object

Program Interpreter

Symbol Resolution

Use of Weak Symbols for Problem Investigations

Advanced Interception Using Global Offset Table

Source Files

ELF APIs

Other Information

Conclusion

9.1. Introduction

The ELF file format is the default format for shared libraries and executables on today’s Linux systems. ELF stands for Executable and Linking Format and is the most common file format for object files, executables, and shared libraries on Linux as well as some other popular operating systems. A good understanding of ELF will improve your overall knowledge of how the operating system works, which in turn will help you diagnose problems faster.

A good knowledge of ELF will also directly improve your diagnostic skills, given that many in-depth problems require a reasonable understanding of ELF. For example, in some cases a program may have more than one global symbol (for example, a variable) with the same name. Some parts of the program may access the intended variable, while other parts incorrectly access the other one. This type of problem can be very difficult to diagnose without a basic understanding of ELF and the run time linker. Some useful debugging tricks also require a good understanding of ELF. One such trick is to build an interceptor library for operating system (or libc) functions such as malloc and free. The end of this chapter has some examples.

Note: The purpose of this chapter is to provide enough knowledge about ELF to help you improve your diagnostic skills. This chapter is not meant to replace or supplement the ELF standard but rather to provide practical knowledge about ELF on Linux. Instead of walking through the ELF standard, this chapter provides real examples and explains how things work under the covers.

 

For this chapter, we will be using the source code listed here to illustrate the ELF standard and how it relates to the Linux operating system:

penguin> ls -l *.C *.h Makefile
-rw-r--r--    1 wilding  build        759 Jan 18 05:59 Makefile
-rw-r--r--    1 wilding  build        641 Jan  9 08:57 foo.C
-rw-r--r--    1 wilding  build        161 Jan  9 08:04 foo.h
-rw-r--r--    1 wilding  build        649 Jan 18 05:37 intercept.C
-rw-r--r--    1 wilding  build        275 Apr 15  2004 main.C
-rw-r--r--    1 wilding  build        276 Dec 28 17:37 pic.C

Note: The source code in these files can be found at the end of the chapter under the heading “Source Files.”

Note: The .C extension is used for C++ files, although .cc and .cpp will also work.

 

These source files are all compiled as C++. Some of them contain simple C++ objects that are used to explain how ELF handles basic C++ functionality such as constructors and destructors.

The source file foo.C contains a number of global objects that have a constructor, a number of global variables, and a number of functions. The header file foo.h contains the definition of the class used for the global objects in foo.C. The source file main.C calls a function in foo.C and also instantiates a global object. The source file pic.C is a smaller, simpler source file used to illustrate relocation. Lastly, the Makefile makes it easier to compile and link the source files using the make utility.

To compile and link the source files, simply run make in the directory containing the files:

penguin> make
g++ -c  main.C
g++ -c -fPIC  foo.C
g++ -shared foo.o -o libfoo.so
g++ -o foo main.o -L. -Wl,-rpath,. -lfoo
g++ -c -fPIC  pic.C
g++ -o pic pic.o -L. -Wl,-rpath,.
g++ -c pic.C -o pic_nopic.o
g++ -o pic_nopic pic_nopic.o -L. -Wl,-rpath,.
gcc -c -fPIC  intercept.c
gcc -o intercept.so -shared intercept.o

Note: The -fPIC option is used for some of the source files to create position independent object files. This is the preferred choice when creating shared libraries. Position-independent code is covered in more detail later in this chapter.

Note: The -L. switch is used for convenience. It tells the linker to look in the current directory (“.”) for shared libraries. Normally, a real shared library directory would be used, but this is convenient given that the shared libraries are in the same directory as the executables.

 

The source file foo.C is used to create libfoo.so, and the source file main.C is used to build the executable foo.

Throughout this chapter, we will be using the g++ compiler. This compiler is installed on most Linux systems and compiles C++ code, which helps illustrate how ELF handles some basic C++ functionality.

9.2. Concepts and Definitions

Learning about ELF can be a challenge because its concepts are interdependent: understanding one concept can require understanding another, and vice versa. Thus, before diving into ELF details, we will introduce a few basic concepts and definitions without going into too much detail. The most basic of these concepts is the “symbol.”

9.2.1. Symbol

A symbol is a description of a function, variable, or other entity that is stored in an ELF file. A symbol is similar to an entry in a phone book. It contains brief information about a function or variable in an ELF file, but, like a phone book entry, it is certainly not the item it describes. A symbol can describe many things, but symbols are mainly used for functions and variables, which we will focus on for now.

Symbols have a size, value, and name associated with them, as well as a few other tidbits of information. The size is the actual size of the function or variable that the symbol represents. The value gives the location of the function/variable in the ELF file or its expected address (using the load address of the shared library or executable as a base). Symbol names exist solely for the convenience of humans: they let software developers use descriptive names in their programs.

Consider the function printf in libc. This function is defined and stored in a library “/lib/libc.so.6” on the system used for this chapter. We can see the details of this “symbol” using the nm command (note that <...> represents uninteresting output that was not included):

penguin> nm -v -f s /lib/libc.so.6

Symbols from /lib/libc.so.6:

Name   Value      Class            Type    Size     Line   Section
<...>
printf|0004fc90|   T  |           FUNC|   00000039|      |.text
<...>

Note: Some distributions strip their shared libraries, which removes the main symbol table, so the nm command shown here will not produce this output. In this case, choose a library on the system that has not been stripped. You can also run the file command on an ELF file to see whether it has been stripped (the output will say “stripped” rather than “not stripped”).

Note: ELF uses the term “class” to describe the scope and/or category of a symbol. The ELF term class, when used to describe symbols, is not related to C++ classes.

 

The field “Value” is the offset in the library where the executable instructions for printf are found. The class is “T,” meaning “text”; the text section holds code and is read-only. The type is “FUNC” for function. The size is 0x39 bytes (57 bytes in decimal). Lastly, the section that this function is stored in is the “.text” section, which contains all of the executable code for the library.

Symbols can have different “scope,” such as global or local. A global variable in a shared library is exposed and available for other shared libraries or the executable itself to use. A local variable is one that was defined as “static” in its source file, which makes it local to the code in that source file. Static symbols are not available to other ELF files.

There are two variables defined in the source file foo.C that are meant for illustrating the concept of scope:

int globInt = 5 ;

static int staticInt = 5 ;

 

The first is a global variable, and the second is defined as “static.” Using nm again, we can see how the “class” (that is, scope) of the two variables differs:

penguin>  nm -v -f s foo.o | egrep "staticInt|globInt"
globInt    |00000000|   D  |            OBJECT|00000004|   |.data
staticInt  |00000004|   d  |            OBJECT|00000004|   |.data

 

The output shows an uppercase D for the global data variable and a lowercase d for the local data variable. Similarly, for text symbols, an uppercase T refers to a global text symbol, whereas a lowercase t refers to a local text symbol.

Symbols can also be defined or undefined. A defined symbol means that an ELF file actually contains the contents of the associated function or variable. An undefined symbol means that the object file references the function/variable but does not contain its contents. For example, the source file “main.C” makes a call to printf but it does not contain the contents of the printf function. Using nm again, we can see that the “class” is “U” for “undefined”:

 

penguin> nm -v -f s main.o |egrep printf
printf    |      |   U  |      NOTYPE|     |    |*UND*

 

Other ELF symbols describe variables that are read-only or not initialized (these have a “class” of R and B, respectively, in nm output).

ELF also has absolute symbols (marked with an A by nm) that are used to mark an offset in a file. The following command shows a few examples:

penguin>  nm -v -f s libfoo.so | egrep " A "
_DYNAMIC              |00001c48|   A  |    OBJECT|       |  |*ABS*
_GLOBAL_OFFSET_TABLE_ |00001d38|   A  |    OBJECT|       |  |*ABS*
__bss_start           |00001d74|   A  |    NOTYPE|       |  |*ABS*
_edata                |00001d74|   A  |    NOTYPE|       |  |*ABS*
_end                  |00001d84|   A  |    NOTYPE|       |  |*ABS*

 

Notice that the absolute symbols do not include a size because they are meant only to mark a location in an ELF file.

9.2.1.1. Symbol Names and C Versus C++

One of the unfortunate problems with C code is symbol name collisions. With so much software in the world, it is not uncommon for two developers to pick the same name for two different functions or variables. ELF and other executable file formats do not handle this well (as explained later in this chapter), and it has caused many problems over the years for large C-based software.

C++ aimed to solve this problem through name mangling. A mangled name encodes two extra pieces of information: the scope (that is, the namespace or class name) and the types of the function arguments. So function “foo(int bar)” will have a different C++ symbol name than “foo(char bar).” This is covered in Chapter 5, but it is worth a quick reminder here in the context of ELF. There are many examples with mangled C++ names in this chapter.

9.2.2. Object Files, Shared Libraries, Executables, and Core Files

Source files are meant for humans and cannot be efficiently interpreted by computers. Therefore, source code must be translated (human to computer) into a format that is easily and efficiently executed on a computer system. This is called “compiling.” There is more information on compiling in Chapter 4, but stated simply, compiling is the process of turning a source file into an object file.

9.2.2.1. Object Files

Even though an object file is the computer-readable version of a source file, it still retains traces of the original source code. Function names as well as global and static variable names are the basis for the symbol names stored in the resulting object file. As mentioned earlier, the symbol names are solely for humans and are actually a bit of an inconvenience for computer systems. A good example of this inconvenience is the hash table section in an ELF file, which is detailed later in this chapter. The hash table consumes disk space and memory and requires CPU resources to traverse, and it exists only because of the human-friendly symbol names.

The command to create an ELF object file is fairly straightforward, as shown here:

penguin> g++ -c foo.C
penguin> ls -l foo.C foo.o
-rw-r--r--    1 wilding  build     641 Jan   9 08:57 foo.C
-rw-r--r--    1 wilding  build    2360 Jan   9 09:05 foo.o
penguin> file foo.C foo.o
foo.C: ASCII C program text
foo.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped

 

The foo.o file is the object file that contains the compiled version of the source code in the file foo.C. From the output of the file command, we can see that foo.o is indeed an ELF file. From the ls command, we can see that the object file is much larger than the source file; in fact, it is about three and a half times larger. At first, this might seem strange. After all, shouldn’t the computer-readable version be smaller?

Note: For more information about creating small ELF files, see the following URL: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html.

 

The actual combined size of the compiled instructions and variables is 247 bytes (using nm to count the symbol sizes). This is about one-third of the size of the original source file and accounts for only about one-tenth of the object file size. The object file foo.o must contain more than just the machine instructions and variables of the source file foo.C. One example of something that takes up space is the ELF header, which is explained later in this chapter under the heading, “ELF Header.”

These object files cannot be run directly because they do not contain information about how they should be loaded into memory. Further, the undefined symbols in the object files must eventually point to the corresponding defined symbols, or the code in the object files will not run. For example, a call to printf must be able to find the actual function printf in libc (for example, “/lib/libc.so.6”). Before any machine instructions in an ELF object file are executed, the object files must be combined (“linked”) into a larger ELF file type called an executable or shared library. For shared libraries, there is an extra step where the run time linker/loader (explained later in this chapter) must dynamically load the shared libraries into the address space of an executable. In any case, the process of creating an executable or shared library from object files is called linking, and part of the linker's job is to resolve some of the undefined symbols.

9.2.2.2. Shared Libraries

A shared library is made up of the symbols from one or more object files and can be loaded anywhere in the address space. There are some architectural restrictions that limit or guide the actual load address of shared libraries in the address space, but this does not affect ELF (any address is okay). A shared library, like an object file, has a list of symbols that are either defined or undefined. However, any undefined symbols must be satisfied through other shared libraries.

9.2.2.3. Executables

An executable is very similar to a shared library, although it must be loaded at a specific address in memory. An executable also has a function that is called when a program starts. For programmers, this function is called main; however, the actual function that is run first in an executable is called _start and is explained later in this chapter.

The most significant part of an executable or shared library is the information about how and where to load the files into memory so that the machine instructions can be run. This information is contained in a special part of the ELF file called the “program header table.” This is also explained in more detail later in this chapter.

9.2.2.4. Core Files

A core file is a special type of ELF file that is very different from shared libraries and executables. It is the memory image from a once-running process. Core files contain a number of memory segments that were originally used by the running process and that can be loaded into a debugger for subsequent diagnosis.

9.2.2.5. Static Libraries

Archive files (files that end with .a), also known as static libraries, are not ELF files. Archive files are a convenient file format to store other files. Technically, you can store any type of file in an archive file, although in practice, object files are by far the most commonly stored file type. Archive files do contain an index of the symbols from each ELF object file contained within them (which crosses the boundary a bit as far as a generic storage format is concerned). A description of static libraries (and their format) is beyond the scope of this chapter (because they are not an ELF file type). However, their importance will be explained as part of the linking phase.

9.2.3. Linking

Linking takes the symbols from the object files, shuffles them into a specific order, and then combines them into either a shared library or executable. Linking also needs to resolve some of the undefined symbols, either using the functions and variables of the object files that it is combining or through symbols exported by other shared libraries. The linker must also create a program header that includes information about how the executable or shared library should be loaded into memory. Let’s take a quick look at the process of linking using Figure 9.1.

Figure 9.1. Linking object files into a shared library.

 

The diagram shows four separate object files being combined into one shared library. Each object file contains a number of sections; only four are being shown here for simplicity:

.text:

contains functions

.data:

contains initialized writable (that is, not constant) variables

.rodata:

contains read-only (that is, constant) variables

.bss:

contains uninitialized and writable variables

 

Each of the sections from the four object files is merged in with a larger section in the shared library. The sections are also included in larger contiguous regions called “segments,” which will eventually be loaded into memory with specific memory attributes such as read, write, and/or execute.

The order of information in an object file is not that important. However, the order of the information in a shared library or executable is very important because the goal is to move functions and variables that have similar properties into the specific loadable segments. The order of the segments in the shared library is:

  1. text: read only
  2. data: read/write

The segments of this shared library are loaded into memory in order, and space is also allocated for the variables in the .bss section. The .bss section is a special section that stores uninitialized variables; it occupies no space in the file itself, but the space for its variables must be taken into account in the memory layout of the resulting shared library or executable.

The memory attributes for the text segment are read-only to protect the data from changing during run time. The memory attributes of the data segment are read and write because the contents will need to be modified at run time.

The order of the text and data segments in a shared library or executable is important because code in the text segment relies on the data segment being at a specific offset from the text segment. This is also covered in more detail later in the chapter.

Like an object file, a shared library or executable will have defined and undefined symbols. The difference is that the linker (the program that does the linking) ensures that the unresolved symbols can be satisfied by other shared libraries. This is a protection mechanism to ensure that the shared library or executable will not run into any undefined symbols at run time. Here is an example of the error returned when trying to link an executable with an undefined symbol:

penguin> g++ main.o -o main
main.o: In function 'main':
main.o(.text+0x2b): undefined reference to 'baz(int)'
collect2: ld returned 1 exit status

 

The object file main.o references a function “baz(int),” but the linker cannot find this function. To successfully link this executable, we need to include a shared library that contains this function as follows:

g++ -o foo main.o -L. -Wl,-rpath,. -lfoo

 

The -lfoo switch is a short form for libfoo.so. Because libfoo.so contains baz(int), the executable can be successfully linked. The -L. tells the linker to look in the current directory (“.”) for shared libraries.

Note: The linking phase does not copy the contents of any undefined functions or variables from the shared libraries, but rather makes note of which libraries will be needed at run time to resolve the required symbols.

 

After the executable is linked, there are still no guarantees. The library libfoo.so could be removed, modified, switched for another library, and so on. The linker doesn’t guarantee a perfect run with no undefined symbols; it just tries to protect against such a situation by requiring a shared library during link time so that it can embed the names of the required shared libraries into the executable (or shared library) that is being built.

Consider Figure 9.2 in which the executable is dependent on two shared libraries: #1 and #2. The reason the executable has a dependency on these shared libraries is that they contain one or more of the undefined symbols required by the executable. Likewise, shared libraries #1 and #2 are also dependent on other shared libraries for the same reason.

Figure 9.2. Shared libraries and undefined symbols.

 

There are two good methods to find the dependent shared libraries. The first is to use the readelf command with the -d switch:

penguin> readelf -d foo

Dynamic segment at offset 0x810 contains 25 entries:
  Tag        Type                         Name/Value
 0x00000001 (NEEDED)             Shared library: [libfoo.so]
 0x00000001 (NEEDED)             Shared library: [libstdc++.so.5]
 0x00000001 (NEEDED)             Shared library: [libm.so.6]
 0x00000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x00000001 (NEEDED)             Shared library: [libc.so.6]
 0x0000000f (RPATH)              Library rpath: [.]
<...>

 

This readelf command lists the shared libraries that are required by the executable foo. It does not, however, show where this executable will find these libraries at run time. This is what the second method, ldd, can be used for:

penguin> ldd foo
        libfoo.so => ./libfoo.so (0x40014000)
        libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x40016000)
        libm.so.6 => /lib/libm.so.6 (0x400da000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x400fd000)
        libc.so.6 => /lib/libc.so.6 (0x40105000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

 

The ldd command is actually a wrapper around a special environment variable called LD_TRACE_LOADED_OBJECTS, which makes the run time linker/loader trace the loading of the various libraries. You can set this variable directly, but while it is set, any dynamically linked command you run from the shell will only display a trace of its shared libraries instead of running:

penguin> export LD_TRACE_LOADED_OBJECTS=true
penguin> foo
        libfoo.so => ./libfoo.so (0x40014000)
        libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x40016000)
        libm.so.6 => /lib/libm.so.6 (0x400da000)
        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x400fd000)
        libc.so.6 => /lib/libc.so.6 (0x40105000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

 

In general, it is best to use the ldd command. The run time loader that uses the environment variable LD_TRACE_LOADED_OBJECTS also deserves a quick overview.

9.2.3.1. Linking with Static Libraries

Static libraries (archive files of ELF object files) are a convenient way to store many object files. When linking with a static library, the linker uses the symbol index stored in the static library to find the symbols in the ELF object files, and it copies the object files it needs from the static library into the resulting executable or shared library.

9.2.4. Run Time Linking

Run time linking is the process of matching undefined symbols with defined symbols at run time (that is, when a program is starting up or while it is running). When a program is linked, the linker leaves some undefined symbols to be resolved by the run time linker when the program runs. Another term for run time linking is binding.

Lazy binding means deferring symbol resolution (linking an undefined symbol with the corresponding defined symbol) until the first time a function is actually called. This can improve program startup performance because only a few of the undefined symbols may ever be used.

9.2.5. Program Interpreter / Run Time Linker

The program interpreter or run time linker is a special library that has the responsibility of bringing a program up and eventually transferring control over to the program. This includes finding and loading all of the required libraries, potentially resolving some of the symbols for the executable or its shared libraries, running C++ global constructors, and so on. Eventually, the function main() is called, which transfers control over to the program source code.

Note: On Linux, the program interpreter will be similar to /lib/ld-linux.so or /lib/ld-linux.so.2. A command shown later in this chapter displays the actual program interpreter as defined in the ELF file itself.

 

Now that some of the basic definitions and concepts are clear, let’s take a look at the ELF format, starting with the ELF header.

 
