Chapter 9. ELF: Executable and Linking Format

Introduction

Concepts and Definitions

ELF Header

Overview of Segments and Sections

Segments and the Program Header Table

Sections and the Section Header Table

Relocation and Position Independent Code (PIC)

Stripping an ELF Object

Program Interpreter

Symbol Resolution

Use of Weak Symbols for Problem Investigations

Advanced Interception Using Global Offset Table

Source Files

ELF APIs

Other Information

Conclusion

9.1. Introduction

ELF stands for Executable and Linking Format. It is the default format for object files, executables, and shared libraries on today’s Linux systems as well as some other popular operating systems. A good understanding of ELF will improve your overall knowledge of how the operating system works, which in turn will help you diagnose problems faster.

A good knowledge of ELF will also directly improve your diagnostic skills, because many in-depth problems require a reasonable knowledge of ELF. For example, a program may have more than one global symbol (such as a variable) with the same name. Some parts of the program may access the correct variable, whereas other parts may incorrectly access the other one. This type of problem can be very difficult to diagnose without a basic understanding of ELF and the run time linker. Some useful debugging tricks also require a good understanding of ELF. One such trick is to build an interceptor library for operating system (or libc) functions such as malloc and free. The end of this chapter has some examples.

Note: The purpose of this chapter is to provide enough knowledge about ELF to help you improve your diagnostic skills. This chapter is not meant to replace or supplement the ELF standard but rather provide practical knowledge about ELF on Linux. Instead of walking through the ELF standard, this chapter will provide real examples and explain how things work under the covers.

 

For this chapter, we will be using the source code listed here to illustrate the ELF standard and how it relates to the Linux operating system:

penguin> ls -l *.C *.h Makefile

-rw-r--r--    1 wilding  build        759 Jan 18 05:59 Makefile

-rw-r--r--    1 wilding  build        641 Jan  9 08:57 foo.C

-rw-r--r--    1 wilding  build        161 Jan  9 08:04 foo.h

-rw-r--r--    1 wilding  build        649 Jan 18 05:37 intercept.C

-rw-r--r--    1 wilding  build        275 Apr 15  2004 main.C

-rw-r--r--    1 wilding  build        276 Dec 28 17:37 pic.C

Note: The source code in these files can be found at the end of the chapter under the heading “Source Files.”

Note: The .C extension is used for C++ files, although .cc and .cpp will also work.

 

These objects are all compiled as C++ source files. Some of them contain simple C++ objects that will be used to explain how ELF handles simple C++ functionality such as constructors and destructors.

The source file foo.C contains a number of global objects that have a constructor, a number of global variables, and a number of functions. The header file foo.h contains the definition of the class used for the global objects in foo.C. The source file main.C calls a function in foo.C and also instantiates a global object. The source file pic.C is a smaller, less complex source file used to illustrate relocation. Lastly, the make file, Makefile, makes it easier to compile and link the source files using the make utility.

To compile and link the source files, simply run make in the directory containing the files:

penguin> make

g++ -c  main.C

g++ -c -fPIC  foo.C

g++ -shared foo.o -o libfoo.so

g++ -o foo main.o -L. -Wl,-rpath,. -lfoo

g++ -c -fPIC  pic.C

g++ -o pic pic.o -L. -Wl,-rpath,.

g++ -c pic.C -o pic_nopic.o

g++ -o pic_nopic pic_nopic.o -L. -Wl,-rpath,.

gcc -c -fPIC  intercept.c

gcc -o intercept.so -shared intercept.o

Note: The -fPIC option is used for some of the source files to create position independent object files. This is the preferred choice when creating shared libraries. Position-independent code is covered in more detail later in this chapter.

Note: The -L. switch is used for convenience. It tells the linker to look in the current directory (“.”) for shared libraries. Normally, a real shared library directory would be used, but this is convenient given that the shared libraries are in the same directory as the executables.

 

The source file, foo.C, is used to create libfoo.so and the source file, main.C, is used to build the executable foo.

Throughout this chapter, we will be using the g++ compiler. This compiler is installed on most Linux systems and will compile C++ code to help illustrate how ELF handles some basic C++ functionality.

9.2. Concepts and Definitions

Learning about ELF can be a challenge because the concepts are somewhat interdependent in that learning about one concept can require understanding another concept and vice versa. Thus, before diving into ELF details, we will introduce a few basic concepts and definitions without going into too much detail. The most basic of these concepts is the “symbol.”

9.2.1. Symbol

A symbol is a description of a function or variable (or other) that is stored in an ELF file. A symbol is similar to an entry in a phone book. It contains brief information about a function or variable in an ELF file, but, like the phone book entry, it is certainly not the item it is describing. A symbol can describe many things, but they are mainly used for functions and variables, which for the time being, we’ll focus on.

Symbols have a size, value, and name associated with them as well as a few other tidbits of information. The size is the actual size of the function or variable that the symbol represents. The value gives the location in the ELF file of the function/variable or the expected address of the function/variable (using the load address of the shared library or executable as a base). Symbol names are for the sole convenience of humans and are used so that human software developers can use descriptive names in their programs.

Consider the function printf in libc. This function is defined and stored in a library “/lib/libc.so.6” on the system used for this chapter. We can see the details of this “symbol” using the nm command (note that <...> represents uninteresting output that was not included):

penguin> nm -v -f s /lib/libc.so.6

 

Symbols from /lib/libc.so.6:

 

Name   Value      Class            Type    Size     Line   Section

<...>

printf|0004fc90|   T  |           FUNC|   00000039|      |.text

<...>

Note: Some distributions may strip the shared libraries, which removes the main symbol table and makes the nm command here not work as shown. In this case, choose a library on the system that has not been stripped. You can also use the file command on an ELF file to see whether it has been stripped (the term “stripped” will be displayed).

Note: ELF uses the term “class” to describe the scope and/or category of a symbol. The ELF term class, when used to describe symbols, is not related to C++ classes.

 

The field “Value” is the offset in the library where the executable instructions for “printf” are found. The class is “T” meaning “text”. Text in this context means read-only. The type is “FUNC” for function. The size is 0x39 bytes or 57 bytes in decimal notation, and lastly, the section that this function is stored in is the “.text” section, which contains all of the executable code for the library.

Symbols can have different “scope” such as global or local. A global symbol in a shared library is exposed and available for other shared libraries or the executable itself to use. A local symbol comes from a function or variable that was declared “static” in the original source file, making it local to the code in that source file. Local symbols are not available to other ELF files.

There are two variables defined in the source file foo.C that are meant for illustrating the concept of scope:

int globInt = 5 ;

static int staticInt = 5 ;

 

The first is a global variable, and the second is defined as “static.” Using nm again, we can see how the “class” (that is, scope) of the two variables differs:

penguin>  nm -v -f s foo.o | egrep "staticInt|globInt"

globInt    |00000000|   D  |            OBJECT|00000004|   |.data

staticInt  |00000004|   d  |            OBJECT|00000004|   |.data

 

The output shows an uppercase D for the global data variable and a lowercase d for the local data variable. Text symbols follow the same convention: an uppercase T refers to a global text symbol, whereas a lowercase t refers to a local text symbol.

Symbols can also be defined or undefined. A defined symbol means that an ELF file actually contains the contents of the associated function or variable. An undefined symbol means that the object file references the function/variable but does not contain its contents. For example, the source file “main.C” makes a call to printf but it does not contain the contents of the printf function. Using nm again, we can see that the “class” is “U” for “undefined”:

 

penguin> nm -v -f s main.o |egrep printf

printf    |      |   U  |      NOTYPE|     |    |*UND*

 

Other ELF symbols describe variables that are read-only or not initialized (these have a “class” of R and B, respectively, in nm output).

ELF also has absolute symbols (marked with an A by nm) that are used to mark an offset in a file. The following command shows a few examples:

penguin>  nm -v -f s libfoo.so | egrep " A "

_DYNAMIC              |00001c48|   A  |    OBJECT|       |  |*ABS*

_GLOBAL_OFFSET_TABLE  |00001d38|   A  |    OBJECT|       |  |*ABS*

_bss_start            |00001d74|   A  |    NOTYPE|       |  |*ABS*

_edata                |00001d74|   A  |    NOTYPE|       |  |*ABS*

_end                  |00001d84|   A  |    NOTYPE|       |  |*ABS*

 

Notice that the absolute symbols do not include a size because they are meant only to mark a location in an ELF file.

9.2.1.1. Symbol Names and C Versus C++

One of the unfortunate problems with C code is symbol name collisions. With so much software in the world, it is not uncommon for two developers to pick the same name for two different functions or variables. This is not handled well (as explained later in this chapter) by ELF or other executable file formats and has caused many problems over the years for large C-based software.

C++ aimed to solve this problem in two ways: the mangled names include a name space (that is, class name) and information about the function arguments. So function “foo(int bar)” will have a different C++ symbol name than “foo(char bar).” This is covered in Chapter 5, but it is worth a quick reminder here in the context of ELF. There are many examples with mangled C++ names in this chapter.

9.2.2. Object Files, Shared Libraries, Executables, and Core Files

Source files are meant for humans and cannot be efficiently interpreted by computers. Therefore, source code must be translated (human to computer) into a format that is easily and efficiently executed on a computer system. This is called “compiling.” There is more information on compiling in Chapter 4, but stated simply, compiling is the process of turning a source file into an object file.

9.2.2.1. Object Files

Even though an object file is the computer-readable version of a source file, it still has hints of the original source code. Function names as well as global and static variable names are the basis for the symbol names stored in the resulting object file. As mentioned earlier, the symbol names are solely for humans and are actually a bit of an inconvenience for computer systems. A good example of an inconvenience is the hash table section in an ELF file, which is detailed later in this chapter. The hash table consumes disk space and memory and requires CPU resources to traverse; it is required because of the human-friendly symbol names.

The command to create an ELF object file is fairly straightforward, as shown here:

penguin> g++ -c foo.C

penguin> ls -l foo.C foo.o

-rw-r--r--    1 wilding  build     641 Jan   9 08:57 foo.C

-rw-r--r--    1 wilding  build    2360 Jan   9 09:05 foo.o

penguin> file foo.C foo.o

foo.C: ASCII C program text

foo.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped

 

The foo.o file is the object file that contains the compiled version of the source code in the file foo.C. From the output of the file command, we can see that foo.o is indeed an ELF file. From the output of the ls command, we can see that the object file is much larger than the source file. In fact, it is about three and a half times larger. At first, this might seem strange. After all, shouldn’t the computer-readable version be smaller?

Note: For more information about creating small ELF files, see the following URL: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html.

 

The actual combined size of the compiled instructions and variables is 247 bytes (using nm to count the symbol sizes). This is about one-third of the size of the original source file and accounts for only about one-tenth of the object file size. The object file foo.o must contain more than just the machine instructions and variables of the source file foo.C. One example of something that takes up space is the ELF header, which is explained later in this chapter under the heading, “ELF Header.”

These object files cannot be run directly because they do not contain information about how the object file should be loaded into memory. Further, the undefined symbols in the object files must eventually point to the corresponding defined symbols, or the code in the object files will not run. For example, a call to printf must be able to find the actual function printf in libc (for example, “/lib/libc.so.6”). Before any machine instructions in an ELF object file are executed, the object files must be combined (“linked”) into a larger ELF file type called an executable or shared library. For shared libraries, there is an extra step where the run time linker/loader (explained later in this chapter) must dynamically load the shared libraries into the address space of an executable. In any case, the process of creating an executable or shared library from object files is called linking. And part of the responsibility of linking is to resolve some of the undefined symbols.

9.2.2.2. Shared Libraries

A shared library is made up of the symbols from one or more object files and can be loaded anywhere in the address space. There are some architectural restrictions that limit or guide the actual load address of shared libraries in the address space, but this does not affect ELF (any address is okay). A shared library, like an object file, has a list of symbols that are either defined or undefined. However, any undefined symbols must be satisfied through other shared libraries.

9.2.2.3. Executables

An executable is very similar to a shared library, although it must be loaded at a specific address in memory. An executable also has a function that is called when a program starts. For programmers, this function is called main; however, the actual function that is run first in an executable is called _start and is explained later in this chapter.

The most significant part of an executable or shared library is the information about how and where to load the files into memory so that the machine instructions can be run. This information is contained in a special part of the ELF file called the “program header table.” This is also explained in more detail later in this chapter.

9.2.2.4. Core Files

A core file is a special type of ELF file that is very different from shared libraries and executables. It is the memory image from a once-running process. Core files contain a number of memory segments that were originally used by the running process and that can be loaded into a debugger for subsequent diagnosis.

9.2.2.5. Static Libraries

Archive files (files that end with .a), also known as static libraries, are not ELF files. Archive files are a convenient file format to store other files. Technically, you can store any type of file in an archive file, although in practice, object files are by far the most commonly stored file type. Archive files do contain an index of the symbols from each ELF object file contained within them (which crosses the boundary a bit as far as a generic storage format is concerned). A description of static libraries (and their format) is beyond the scope of this chapter (because they are not an ELF file type). However, their importance will be explained as part of the linking phase.

9.2.3. Linking

Linking takes the symbols from the object files, shuffles them into a specific order, and then combines them into either a shared library or executable. Linking also needs to resolve some of the undefined symbols, either using the functions and variables of the object files that it is combining or through symbols exported by other shared libraries. The linker must also create a program header that includes information about how the executable or shared library should be loaded into memory. Let’s take a quick look at the process of linking using Figure 9.1.

Figure 9.1. Linking object files into a shared library.

 

The diagram shows four separate object files being combined into one shared library. Each object file contains a number of sections; only four are being shown here for simplicity:

.text: contains functions

.data: contains initialized writable (that is, not constant) variables

.rodata: contains read-only (that is, constant) variables

.bss: contains uninitialized and writable variables

 

Each of the sections from the four object files is merged in with a larger section in the shared library. The sections are also included in larger contiguous regions called “segments,” which will eventually be loaded into memory with specific memory attributes such as read, write, and/or execute.

The order of information in an object file is not that important. However, the order of the information in a shared library or executable is very important because the goal is to move functions and variables that have similar properties into the specific loadable segments. The order of the segments in the shared library is:

  1. text: read only
  2. data: read/write

The segments of this shared library are loaded into memory in order, and space for the variables in the .bss section is also allocated. The .bss section is a special section that stores uninitialized variables and the space for them must be taken into account in the resulting shared library or executable.

The memory attributes for the text segment are read-only to protect the data from changing during run time. The memory attributes of the data segment are read and write because the contents will need to be modified at run time.

The order of the text and data segments in a shared library or executable is important because the code in the text segment relies on the data segment being at a specific offset from the text segment. This is also covered in more detail later in the chapter.

Like an object file, a shared library or executable will have defined and undefined symbols. The difference is that the linker (the program which does the linking) will ensure that the unresolved symbols will be satisfied through other shared libraries. This is a protection mechanism to ensure that the shared library or executable will not run into any undefined symbols during run time. Here is an example of the error returned when trying to link an executable with an undefined symbol:

penguin> g++ main.o -o main

main.o: In function 'main':

main.o(.text+0x2b): undefined reference to 'baz(int)'

collect2: ld returned 1 exit status

 

The object file main.o references a function “baz(int),” but the linker cannot find this function. To successfully link this executable, we need to include a shared library that contains this function as follows:

g++ -o foo main.o -L. -Wl,-rpath,. -lfoo

 

The -lfoo switch is a short form for libfoo.so. Because libfoo.so contains baz(int), the executable can be successfully linked. The -L. tells the linker to look in the current directory (“.”) for shared libraries.

Note: The linking phase does not copy the contents of any undefined functions or variables from the shared libraries, but rather makes note of which libraries will be needed at run time to resolve the required symbols.

 

After the executable is linked, there are still no guarantees. The library libfoo.so could be removed, modified, switched for another library, and so on. The linker doesn’t guarantee a perfect run with no undefined symbols; it just tries to protect against such a situation by requiring a shared library during link time so that it can embed the names of the required shared libraries into the executable (or shared library) that is being built.

Consider Figure 9.2 in which the executable is dependent on two shared libraries: #1 and #2. The reason the executable has a dependency on these shared libraries is that they contain one or more of the undefined symbols required by the executable. Likewise, shared libraries #1 and #2 are also dependent on other shared libraries for the same reason.

Figure 9.2. Shared libraries and undefined symbols.

 

There are two good methods to find the dependent shared libraries. The first outlined here is to use the readelf command with the -d switch.

penguin> readelf -d foo

 

Dynamic segment at offset 0x810 contains 25 entries:

  Tag        Type                         Name/Value

 0x00000001 (NEEDED)             Shared library: [libfoo.so]

 0x00000001 (NEEDED)             Shared library: [libstdc++.so.5]

 0x00000001 (NEEDED)             Shared library: [libm.so.6]

 0x00000001 (NEEDED)             Shared library: [libgcc_s.so.1]

 0x00000001 (NEEDED)             Shared library: [libc.so.6]

 0x0000000f (RPATH)              Library rpath: [.]

<...>

 

This readelf command lists the shared libraries that are required by the executable foo. It does not, however, show where the executable will find these libraries at run time. That is what the second method, ldd, is for:

penguin> ldd foo

        libfoo.so => ./libfoo.so (0x40014000)

        libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x40016000)

        libm.so.6 => /lib/libm.so.6 (0x400da000)

        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x400fd000)

        libc.so.6 => /lib/libc.so.6 (0x40105000)

        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

 

The ldd command is actually a wrapper around a special environment variable called LD_TRACE_LOADED_OBJECTS, which works with the run time linker/loader to trace the loading of the various libraries. You can set this variable directly, although while it is set, any command run from that command line will only display a trace of its shared libraries instead of actually running:

penguin> export LD_TRACE_LOADED_OBJECTS=true

penguin> foo

        libfoo.so => ./libfoo.so (0x40014000)

        libstdc++.so.5 => /usr/lib/libstdc++.so.5 (0x40016000)

        libm.so.6 => /lib/libm.so.6 (0x400da000)

        libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x400fd000)

        libc.so.6 => /lib/libc.so.6 (0x40105000)

        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)

 

In general, it is best to use the ldd command. The run time loader that uses the environment variable LD_TRACE_LOADED_OBJECTS also deserves a quick overview.

9.2.3.1. Linking with Static Libraries

Static libraries (archive files of ELF object files) are a convenient method to store many object files. When linking with a static library, the linker uses the symbol index stored in the static library to find the symbols in the ELF object files. When linking with a static library, the contents of the static library (the object files) are copied into the resulting executable or shared library.

9.2.4. Run Time Linking

Run time linking is the process of matching undefined symbols with defined symbols at run time (that is, when a program is starting up or while it is running). When a program is built, the linker leaves some undefined symbols to be resolved by the run time linker when the program is run. Another term for run time linking is binding.

Lazy binding means that a symbol is resolved (the undefined symbol is linked with the corresponding defined symbol) only when the function is actually called for the first time. This can improve the performance of program startup because only a few of the undefined symbols may ever be used.

9.2.5. Program Interpreter / Run Time Linker

The program interpreter or run time linker is a special library that has the responsibility of bringing a program up and eventually transferring control over to the program. This includes finding and loading all of the required libraries, potentially resolving some of the symbols for the executable or its shared libraries, running C++ global constructors, and so on. Eventually, the function main() is called, which transfers control over to the program source code.

Note: On Linux, the program interpreter will be similar to /lib/ld-linux.so or /lib/ld-linux.so.2. A command shown later in the chapter displays the actual program interpreter as defined in the ELF file itself.

 

Now that some of the basic definitions and concepts are clear, let’s take a look at the ELF format, starting with the ELF header.

 
