9.6. Sections and the Section Header Table

The section header table contains information about every part of an ELF file except the ELF header, the program header table, and the section header table itself. The section header table is a list (or array) of section header structures, each defining a different section in the ELF file.

The following is the structure of a section header table entry for a 32-bit ELF file. Refer to the /usr/include/elf.h file for the 64-bit version.

typedef struct {
  Elf32_Word    sh_name;
  Elf32_Word    sh_type;
  Elf32_Word    sh_flags;
  Elf32_Addr    sh_addr;
  Elf32_Off     sh_offset;
  Elf32_Word    sh_size;
  Elf32_Word    sh_link;
  Elf32_Word    sh_info;
  Elf32_Word    sh_addralign;
  Elf32_Word    sh_entsize;
} Elf32_Shdr;

 

sh_name

The numeric offset into the string table for the section name.

sh_type

The type of section.

sh_flags

A bit mask of miscellaneous attributes.

sh_addr

The memory address at which this section should reside in a process. The value will be zero if the section will not appear in a process memory image.

sh_offset

Contains the file offset of the actual section data. If the section type is SHT_NOBITS, the section occupies no space in the file, although the offset is still recorded and gives the conceptual position of the section in the file.

sh_size

The size of the actual section data.

sh_link

The meaning of this field depends on the section type.

sh_info

Contains extra information that depends on the section type.

sh_addralign

The alignment requirements for the section.

sh_entsize

The size of each fixed-size element for sections that contain them; a symbol table is one example.

 

The ELF header contains the file offset of the section header table (e_shoff), the size of an entry in the table (e_shentsize), and the number of entries in the table (e_shnum). This is everything needed to find and sift through the contents of the section header table:

penguin> readelf -h foo |egrep section
  Start of section headers:          9292 (bytes into file)
  Size of section headers:           40 (bytes)
  Number of section headers:         35
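As a rough illustration of where readelf gets these three numbers, the sketch below unpacks a 32-bit ELF header with Python's struct module. It assumes a little-endian, 32-bit file; the names EHDR32 and section_table_info are made up for this example, and the demo header is synthetic (only the section-related fields carry the values reported for foo).

```python
import struct

# Layout of Elf32_Ehdr, little-endian: e_ident[16], e_type, e_machine,
# e_version, e_entry, e_phoff, e_shoff, e_flags, e_ehsize, e_phentsize,
# e_phnum, e_shentsize, e_shnum, e_shstrndx
EHDR32 = struct.Struct("<16sHHIIIIIHHHHHH")

def section_table_info(header_bytes):
    """Return (e_shoff, e_shentsize, e_shnum, e_shstrndx) from a 32-bit LE ELF header."""
    fields = EHDR32.unpack_from(header_bytes)
    if fields[0][:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    return fields[6], fields[11], fields[12], fields[13]

# Synthetic header: placeholder values everywhere except the section table fields.
demo = EHDR32.pack(b"\x7fELF\x01\x01\x01" + bytes(9),
                   2, 3, 1, 0, 52, 9292, 0, 52, 32, 0, 40, 35, 32)
print(section_table_info(demo))
```

With a real file, the first 52 bytes read from disk would take the place of demo.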

 

Let’s take a look at the details behind the scenes again in the same way we did for the ELF header. This is useful to understand how the section header table and ELF string tables work.

According to the ELF header for the executable foo, the section header table offset is 9292 bytes, the size of a section header is 40 bytes, and there are 35 section headers. A hex dump of the file at the offset for the section header table shows (note that the “*” in the output denotes an identical row to the previous):

penguin> hexdump -C -s 9292 -n 160 foo
0000244c  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0000246c  00 00 00 00 00 00 00 00  1b 00 00 00 01 00 00 00  |................|
0000247c  02 00 00 00 14 81 04 08  14 01 00 00 13 00 00 00  |................|

 

The first section header (at file offset 0x244c for foo) always has a NULL type and can be ignored. Because the 32-bit section header structure is 40 (0x28) bytes, the next section header starts 40 bytes after the first, at 0x2474:

penguin> hexdump -C -s 0x2474 -n 40 foo
00002474  1b 00 00 00 01 00 00 00  02 00 00 00 14 81 04 08  |................|
00002484  14 01 00 00 13 00 00 00  00 00 00 00 00 00 00 00  |................|
00002494  01 00 00 00 00 00 00 00                           |........|
0000249c

 

We can get the values of the section header structure by mapping it onto the raw data at 0x2474:

sh_name:  0x1b    (section name is at offset 0x1b in string table)

sh_type:  0x1     (SHT_PROGBITS)

sh_flags: 0x2     (SHF_ALLOC)

sh_addr:  0x08048114 (virtual address)

sh_offset:        0x114   (file offset)

sh_size:  0x13    (total size in bytes)

sh_link:  0x0

sh_info:  0x0

sh_addralign:     0x1    (1-byte alignment, that is, no alignment constraint)

sh_entsize:       0x0    (does not use fixed size elements)
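The same mapping can be done mechanically. The sketch below is an illustration only: it hard-codes the 40 bytes from the hex dump at 0x2474 and unpacks them as ten little-endian 32-bit words, one per Elf32_Shdr field. The names FIELDS and parse_shdr32 are invented for this example.

```python
import struct

# The 40 bytes shown by `hexdump -C -s 0x2474 -n 40 foo`.
raw = bytes.fromhex(
    "1b000000 01000000 02000000 14810408"
    "14010000 13000000 00000000 00000000"
    "01000000 00000000"
)

FIELDS = ("sh_name", "sh_type", "sh_flags", "sh_addr", "sh_offset",
          "sh_size", "sh_link", "sh_info", "sh_addralign", "sh_entsize")

def parse_shdr32(data):
    """Unpack one little-endian Elf32_Shdr (ten 32-bit words) into a dict."""
    return dict(zip(FIELDS, struct.unpack("<10I", data[:40])))

shdr = parse_shdr32(raw)
for name in FIELDS:
    print(f"{name:13} {shdr[name]:#x}")
```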

 

The first field is the offset into the string table for the section name. To get the section name, we first have to find the string table and then look at offset 0x1b in it. There should be a NULL terminated string at offset 0x1b in the string table that is the name of this first section. According to the ELF header, the section header table index for the section header string table is 32:

penguin> readelf -h foo | tail -2

  Number of section headers:         35

  Section header string table index: 32

 

To find the string table, we need to use the size of an element in the section header table (40 bytes), multiply it by 32 (the section header table index of the string table), and add the result to 0x244c, which is the file offset of the start of the section header table.

32 x 40 = 1280 = 0x500

0x500 + 0x244C = 0x294C
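The arithmetic above generalizes to any section index. A minimal sketch follows; the function name section_header_offset is made up for illustration.

```python
def section_header_offset(e_shoff, e_shentsize, index):
    """File offset of entry `index` in the section header table."""
    return e_shoff + index * e_shentsize

# Values reported by readelf for foo: table at 0x244c, 40-byte entries.
print(hex(section_header_offset(0x244c, 40, 1)))   # header for section 1
print(hex(section_header_offset(0x244c, 40, 32)))  # header for the section name string table
```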

 

This offset in the file is only the entry in the section header table for the string table and is not the string table itself.

penguin> hexdump -C -s 0x294C -n 40 foo
0000294c  11 00 00 00 03 00 00 00  00 00 00 00 00 00 00 00  |................|
0000295c  13 23 00 00 39 01 00 00  00 00 00 00 00 00 00 00  |.#..9...........|
0000296c  01 00 00 00 00 00 00 00                           |........|
00002974

 

Mapping the section header structure onto this raw data (sh_offset is at offset 16, and sh_size is at offset 20 in the section header structure) shows us that the offset to the string table section is 0x2313 (as shown at offset 0x295C), and the size is 0x139 (directly after the sh_offset field). Remember that this platform is little endian, so the bytes 13 23 00 00 in the hex dump represent the value 0x2313. A hex dump of this file offset shows:

penguin> hexdump -C -s 0x2313 foo |head
00002313  00 2e 73 79 6d 74 61 62  00 2e 73 74 72 74 61 62  |..symtab..strtab|
00002323  00 2e 73 68 73 74 72 74  61 62 00 2e 69 6e 74 65  |..shstrtab..inte|
00002333  72 70 00 2e 6e 6f 74 65  2e 41 42 49 2d 74 61 67  |rp..note.ABI-tag|
00002343  00 2e 68 61 73 68 00 2e  64 79 6e 73 79 6d 00 2e  |..hash..dynsym..|

 

Now we need to add the offset (0x1b) of the section name in the string table to the offset of the string table itself (0x2313) to get 0x232e. The section name at this offset can be found with yet another hexdump:

penguin> hexdump -C -s 0x232e foo | head -2
0000232e  2e 69 6e 74 65 72 70 00  2e 6e 6f 74 65 2e 41 42  |.interp..note.AB|
0000233e  49 2d 74 61 67 00 2e 68  61 73 68 00 2e 64 79 6e  |I-tag..hash..dyn|

 

This offset contains the string .interp, which is the name of the first useful section. Whew! This is a lot of work, but again, the point is to show that there is no magic in an ELF file format.
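The name lookup just performed by hand can be sketched in a few lines. The byte string below is transcribed from the hex dump at 0x2313 (only the first few names), and name_at is a hypothetical helper, not part of any ELF library.

```python
# First bytes of the section name string table, transcribed from the hex dump at 0x2313.
shstrtab = (b"\x00.symtab\x00.strtab\x00.shstrtab\x00.interp"
            b"\x00.note.ABI-tag\x00.hash\x00.dynsym\x00")

def name_at(table, offset):
    """Return the NUL-terminated string that starts at `offset` in a string table."""
    end = table.index(b"\x00", offset)
    return table[offset:end].decode("ascii")

print(name_at(shstrtab, 0x1b))  # sh_name of section 1
print(name_at(shstrtab, 0x11))  # sh_name of the string table's own section header
```

The second lookup uses the sh_name value 0x11 seen in the section header at 0x294c, so it should name the string table section itself.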

After all of this work, we know that the first useful section header (the first actual section is a NULL section) is for the name of the program interpreter. Other sections can include global variables, the machine instructions, and many other types of data. The contents of a section depend entirely on its type and purpose.

Of course, there is a much easier way to view the section headers using readelf (don’t discount the importance of understanding how this really works, though).

penguin> readelf -S foo | head

There are 35 section headers, starting at offset 0x244c:

 

Section Headers:

  [Nr] Name          Type   Addr     Off    Size   ES Flg Lk Inf Al

  [ 0]               NULL   00000000 000000 000000 00  0  0   0

  [ 1] .interp       PROGBITS 08048114 000114 000013 00 A  0   0  1

  [ 2] .note.ABI-tag NOTE   08048128 000128 000020 00   A  0   0  4

  [ 3] .hash         HASH   08048148 000148 000094 04   A  4   0  4

  [ 4] .dynsym       DYNSYM 080481dc 0001dc 000120 10   A  1  4

  [ 5] .dynstr       STRTAB 080482fc 0002fc 00011a 00   A  0  0  1

<...>

 

The ELF specification contains a full list of section types. Only the most common and important ones are covered in detail next. We’ll start with two common section formats, the symbol table and the string table, because each is used by more than one of the sections described later.

9.6.1. String Table Format

A string table holds the strings, such as section names and symbol names, that other structures in the ELF file refer to. The format is very simple: it is a range of space that contains a list of NULL terminated strings, one after the other. Each string is identified by its byte offset from the beginning of the string table, as the sh_name value 0x1b was in the example above.

There can be a number of string tables in an ELF file, including one for the dynamic symbol table, one for the main symbol table, and one for the section header names. String tables all have the same simple format:

\0<string1>\0<string2>\0<string3>\0...<stringN>\0

 

An index into a string table will point to the start of a string.
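To make the indexing concrete, the following sketch walks a small string table and prints each string with the offset that would be used to refer to it. The sample bytes and the name list_strings are assumptions for illustration; note that the table begins with a NUL byte, so offset 0 names the empty string.

```python
def list_strings(table):
    """Yield (offset, string) for every non-empty string in an ELF string table."""
    offset = 0
    for chunk in table.split(b"\x00"):
        if chunk:
            yield offset, chunk.decode("ascii")
        offset += len(chunk) + 1  # skip the string and its terminating NUL

sample = b"\x00.symtab\x00.strtab\x00.shstrtab\x00"
for off, s in list_strings(sample):
    print(f"{off:#04x} {s}")
```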

9.6.2. Symbol Table Format

A symbol table is an array of ELF symbol structures, each of which describes a function, variable, or other type of symbol. As discussed at the beginning of this chapter, a symbol table is like a phone book for functions and variables in an ELF file.

There are actually two symbol tables for an ELF file. One is called the dynamic symbol table and is used at run time to find the various symbols in the ELF object. The other is the main symbol table and contains all of the symbols for an ELF object, including static symbol information that is not used at run time. The main symbol table is used at link time to find all of the unresolved symbols.

Each element of the array has the following structure (from /usr/include/elf.h):

typedef struct
{
  Elf32_Word    st_name;     /* Symbol name (string tbl index) */
  Elf32_Addr    st_value;    /* Symbol value */
  Elf32_Word    st_size;     /* Symbol size */
  unsigned char st_info;     /* Symbol type and binding */
  unsigned char st_other;    /* Symbol visibility */
  Elf32_Section st_shndx;    /* Section index */
} Elf32_Sym;

 

The st_name field is the string table index for the name of the symbol (note that the dynamic and main symbol tables each have their own string table). The “value” (st_value) is either an offset within the section that contains the symbol or the address of the symbol once it is loaded. Executables will have values that are actual addresses, whereas shared libraries will have values that are relative to the address at which the library is eventually loaded. The difference is due to the fact that shared libraries do not have a specific load address for their memory segments. Executables, on the other hand, are conventionally linked to load at address 0x08048000 on 32-bit x86 Linux. The st_size field is the actual size of the item described by the symbol entry. This could be a variable, function, or other item. The st_info field describes the type of binding (local, global, and so on) and the type of symbol (variable, function, and so on). The st_other field contains information about the visibility of a symbol. The st_shndx field describes which section contains the item described by the symbol entry.

The meaning of the st_value field depends on the ELF type. For a relocatable object, the value is the offset within the section specified by the section index, st_shndx. For shared libraries and executables, the value in the symbol table structure is a virtual address. This additional complexity is to ensure efficient access by the tools and code that use these values.
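As a hedged sketch of how these fields sit in a 16-byte Elf32_Sym entry, the code below unpacks one entry and splits st_info into its binding (upper four bits) and type (lower four bits). The entry itself is made up: it imitates a global 40-byte data object at offset 0x20 of section index 3, and the st_name offset is arbitrary. The names SYM32 and parse_sym32 are invented for this example.

```python
import struct

# st_name, st_value, st_size, st_info, st_other, st_shndx (little-endian, 16 bytes)
SYM32 = struct.Struct("<IIIBBH")

BINDINGS = {0: "LOCAL", 1: "GLOBAL", 2: "WEAK"}
TYPES = {0: "NOTYPE", 1: "OBJECT", 2: "FUNC", 3: "SECTION", 4: "FILE"}

def parse_sym32(data):
    """Decode one Elf32_Sym entry into a dict with readable binding and type."""
    st_name, st_value, st_size, st_info, st_other, st_shndx = SYM32.unpack(data)
    return {
        "st_name": st_name,
        "st_value": st_value,
        "st_size": st_size,
        "binding": BINDINGS.get(st_info >> 4, "OTHER"),
        "type": TYPES.get(st_info & 0xF, "OTHER"),
        "visibility": st_other & 0x3,
        "st_shndx": st_shndx,
    }

# Hypothetical entry; the st_name offset (1) is made up for the demo.
entry = SYM32.pack(1, 0x20, 0x28, (1 << 4) | 1, 0, 3)
print(parse_sym32(entry))
```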

Given what we’ve covered about symbols, let’s see if we can find the global variable “list” in the object file foo.o. The variable “list” in foo.C is defined as follows:

int list[10] = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 } ;

 

For the object file foo.o, the value in the symbol table for the global variable “list” is 0x20, which is the offset within the .data section for this global variable as shown here:

penguin> nm -v -f s foo.o | egrep list

list             |00000020|   D |          OBJECT|00000028|   |.data

 

We can confirm this by adding the file offset of the .data section to the value of the symbol “list.” We’ll need to use readelf to get the offset of the .data section:

penguin> readelf -S foo.o | egrep "\.data "

[ 3] .data            PROGBITS       00000000 000140 000048 00 WA 0  0 32

 

Given the file offset of the .data section shown in the readelf output, the global variable “list” should be at file offset 0x160 (0x140 + 0x20). We can use hexdump to confirm that the values for “list” are indeed at this offset in the file:

penguin> hexdump -C -s 0x160 foo.o | head -4

00000160  00 00 00 00 01 00 00 00  02 00 00 00 03 00 00 00 |................|

00000170  04 00 00 00 05 00 00 00  06 00 00 00 07 00 00 00 |................|

00000180  08 00 00 00 09 00 00 00  00 00 00 00 00 00 00 00 |................|

00000190  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00 |................|
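For a relocatable object, the steps above amount to adding the symbol value to the section's file offset and reading the data there. The sketch below hard-codes the first 40 bytes of the hex dump rather than opening foo.o, and decodes them as ten little-endian 32-bit integers.

```python
import struct

# .data in foo.o has file offset 0x140; "list" has st_value 0x20 (offset within .data).
file_offset = 0x140 + 0x20

# The first 40 bytes shown by `hexdump -C -s 0x160 foo.o`.
raw = bytes.fromhex(
    "00000000 01000000 02000000 03000000"
    "04000000 05000000 06000000 07000000"
    "08000000 09000000"
)
values = struct.unpack("<10i", raw)
print(hex(file_offset), values)
```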

 

For a shared library, the meaning of the “value” field in a symbol entry is a virtual address. In the case of libfoo.so, the value (virtual address) of the “list” variable is 0x1b40:

penguin> nm -v -f s libfoo.so | egrep list

list             |00001b40|   D |          OBJECT|00000028|   |.data

 

This isn’t the real virtual address when the shared library is loaded into memory because shared libraries can be loaded anywhere. For shared library files, the virtual address of the text segment starts at 0x0, and the virtual address of the data segment (which contains the .data section) is set to some offset from the beginning of the text segment. Looking at the .data section for libfoo.so reveals that its file offset is 0xb00 and the virtual address is 0x1b00:

penguin> readelf -S libfoo.so | egrep "\.data "

[14] .data  PROGBITS  00001b00 000b00 000070 00  WA  0   0 32

 

Thus we can subtract 0x1000 (0x1b00 - 0xb00) from the value listed by nm for any symbol in the .data section of libfoo.so to get the file offset of that symbol. This means that the file offset of “list” in libfoo.so is 0xb40, as confirmed here:

penguin> hexdump -C -s 0xb40 libfoo.so | head -4

00000b40  00 00 00 00 01 00 00 00  02 00 00 00 03 00 00 00 |................|

00000b50  04 00 00 00 05 00 00 00  06 00 00 00 07 00 00 00 |................|

00000b60  08 00 00 00 09 00 00 00  80 0a 00 00 00 00 00 00 |................|

00000b70  18 00 00 00 00 00 00 00  01 7a 50 52 00 01 7c 08 |.........zPR..|.|
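The subtraction used for libfoo.so can be written as a small helper that works for any section with a known address and file offset. This is a sketch; vaddr_to_file_offset is a made-up name, and the range check is an added safeguard rather than something the text requires.

```python
def vaddr_to_file_offset(vaddr, sh_addr, sh_offset, sh_size):
    """Translate a link-time virtual address inside a section into a file offset."""
    if not sh_addr <= vaddr < sh_addr + sh_size:
        raise ValueError("address is outside the section")
    return vaddr - sh_addr + sh_offset

# .data in libfoo.so: Addr 0x1b00, Off 0xb00, Size 0x70; "list" has value 0x1b40.
print(hex(vaddr_to_file_offset(0x1b40, 0x1b00, 0xb00, 0x70)))
```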

 

The binding of a symbol has an interesting feature worth mentioning. As mentioned earlier in this chapter, symbols can be global or local depending on their scope. Symbols can also be “weak,” which is similar to global—although a symbol with a global binding will be chosen over a weak symbol of the same name. This can actually be very useful for problem determination efforts because some system functions are declared “weak,” meaning that you can override them if needed. This is covered in more detail with an example in the section titled “Use of Weak Symbols for Problem Determination” later in this chapter.

9.6.3. Section Names and Types

The casual term “section type” has two different meanings in normal technical conversation. One refers to the section type as defined in the sh_type field of the section header structure. The other refers to the combination of name and type of a section. For example, a section can have a type (sh_type) of PROGBITS, but that does not describe what is in the section. On the other hand, someone might ask “what type of section,” and the response is usually the section name, such as “.rodata” or “.text.”

The sections included in the shared library libfoo.so are listed here:

penguin> readelf -S libfoo.so

There are 30 section headers, starting at offset 0x1090:

Section Headers:

  [Nr] Name           Type     Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                NULL     00000000 000000 000000 00      0   0  0
  [ 1] .hash          HASH     000000b4 0000b4 000158 04   A  2   0  4
  [ 2] .dynsym        DYNSYM   0000020c 00020c 0002f0 10   A  3  1b  4
  [ 3] .dynstr        STRTAB   000004fc 0004fc 000133 00   A  0   0  1
  [ 4] .gnu.version   VERSYM   00000630 000630 00005e 02   A  2   0  2
  [ 5] .gnu.version_r VERNEED  00000690 000690 000050 00   A  3   2  4
  [ 6] .rel.dyn       REL      000006e0 0006e0 000060 08   A  2   0  4
  [ 7] .rel.plt       REL      00000740 000740 000028 08   A  2   9  4
  [ 8] .init          PROGBITS 00000768 000768 000018 00  AX  0   0  4
  [ 9] .plt           PROGBITS 00000780 000780 000060 04  AX  0   0  4
  [10] .text          PROGBITS 000007e0 0007e0 000280 00  AX  0   0 16
  [11] .fini          PROGBITS 00000a60 000a60 00001c 00  AX  0   0  4
  [12] .rodata        PROGBITS 00000a80 000a80 00004c 00   A  0   0 32
  [13] .eh_frame_hdr  PROGBITS 00000acc 000acc 00002c 00   A  0   0  4
  [14] .data          PROGBITS 00001b00 000b00 000070 00  WA  0   0 32
  [15] .eh_frame      PROGBITS 00001b70 000b70 0000d8 00  WA  0   0  4
  [16] .dynamic       DYNAMIC  00001c48 000c48 0000d8 08  WA  3   0  4
  [17] .ctors         PROGBITS 00001d20 000d20 00000c 00  WA  0   0  4
  [18] .dtors         PROGBITS 00001d2c 000d2c 000008 00  WA  0   0  4
  [19] .jcr           PROGBITS 00001d34 000d34 000004 00  WA  0   0  4
  [20] .got           PROGBITS 00001d38 000d38 00003c 04  WA  0   0  4
  [21] .bss           NOBITS   00001d74 000d74 000010 00  WA  0   0  4
  [22] .comment       PROGBITS 00000000 000d74 000050 00      0   0  1
  [23] .debug_aranges PROGBITS 00000000 000dc8 000058 00      0   0  8
  [24] .debug_info    PROGBITS 00000000 000e20 000098 00      0   0  1
  [25] .debug_abbrev  PROGBITS 00000000 000eb8 00001c 00      0   0  1
  [26] .debug_line    PROGBITS 00000000 000ed4 0000bf 00      0   0  1
  [27] .shstrtab      STRTAB   00000000 000f93 0000fb 00      0   0  1
  [28] .symtab        SYMTAB   00000000 001540 0004d0 10     29  39  4
  [29] .strtab        STRTAB   00000000 001a10 000275 00      0   0  1

Key to Flags:

 W (write), A (alloc), X (execute), M (merge), S (strings)

 I (info), L (link order), G (group), x (unknown)

 O (extra OS processing required) o (OS specific), p (processor specific)
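The Flg column is a rendering of the sh_flags bit mask. As a minimal sketch covering only the three most common bits (SHF_WRITE, SHF_ALLOC, and SHF_EXECINSTR, with the values defined in elf.h), the helper below turns a mask into the letters readelf prints. The function name flag_letters is invented for this example, and the other flags in the key are deliberately ignored.

```python
SHF_WRITE, SHF_ALLOC, SHF_EXECINSTR = 0x1, 0x2, 0x4

def flag_letters(sh_flags):
    """Render the W/A/X bits of sh_flags the way the Flg column shows them."""
    letters = ""
    if sh_flags & SHF_WRITE:
        letters += "W"
    if sh_flags & SHF_ALLOC:
        letters += "A"
    if sh_flags & SHF_EXECINSTR:
        letters += "X"
    return letters

print(flag_letters(0x3), flag_letters(0x6), flag_letters(0x2))
```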

 

The first section is always NULL; the other 29 sections each have their own purpose. The most interesting sections are described in more detail below.

Before listing the details of each section, please note that we will be discussing the source file foo.C as listed later in this chapter under “Source Files.” It includes a wide range of data types that will help to clarify the various sections.

Note: By convention, the names of the standard sections in an ELF file start with a “.” prefix.

9.6.3.1. .bss

There is some debate about what the bss acronym actually stands for. The most likely origin is from Fortran compiler on the IBM 704. The acronym most likely stands for “Block Started by Symbol” and was adopted to describe the uninitialized data for an ELF object. The acronym bss is pretty much meaningless, so consider it a term, not a useful acronym.

The .bss section is used for global and file-local variables that are not initialized with a specific value. It is zeroed out as the process starts up, which sets the initial value of any variables in it to zero. For example, the global variable noValueGlobInt in foo.C is stored in the .bss section because it has no initial value.

penguin> readelf -S libfoo.so | egrep \.bss

[21] .bss         NOBITS      00001d74 000d74 000010 00  WA  0   0  4

penguin> nm libfoo.so |egrep noValueGlobInt

00001d80 B noValueGlobInt

 

According to nm, the value of the noValueGlobInt variable is 0x1d80. This value is right inside the .bss section as expected. Also notice that the section type (sh_type) is NOBITS, which indicates that this section takes up no space in the ELF file. We can confirm this by looking at the loadable program headers for this library:

penguin> readelf -l libfoo.so

 

Elf file type is DYN (Shared object file)

Entry point 0x7e0

There are 4 program headers, starting at offset 52

Program Headers:

  Type         Offset  VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align

  LOAD       0x000000 0x00000000 0x00000000 0x00af8 0x00af8 R E 0x1000

  LOAD       0x000b00 0x00001b00 0x00001b00 0x00274 0x00284 RW  0x1000

<...>

 

The virtual address of the second load segment (the “data” segment) is 0x1b00, and the value listed from nm for noValueGlobInt is 0x1d80. Therefore, noValueGlobInt is at offset 0x280 (0x1d80 - 0x1b00) into the second load segment. The value of 0x280 is larger than FileSiz (0x274) but not larger than MemSiz (0x284). The FileSiz field is the size of the load segment in the file. The MemSiz field is the size of the load segment once it is loaded into memory. That confirms that the variable really does not take any space in the file, but it will (obviously) require space once loaded into memory.
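The comparison against FileSiz and MemSiz can be expressed as a short check. This is only a sketch with an invented function name, backed_by_file; the second call uses 0x1b20, an address inside the initialized part of the same segment, for contrast.

```python
def backed_by_file(vaddr, p_vaddr, p_filesz, p_memsz):
    """True if vaddr lies in the file-backed part of a LOAD segment,
    False if it lies in the zero-filled tail between FileSiz and MemSiz."""
    offset = vaddr - p_vaddr
    if not 0 <= offset < p_memsz:
        raise ValueError("address is outside the segment")
    return offset < p_filesz

# Second LOAD segment of libfoo.so: VirtAddr 0x1b00, FileSiz 0x274, MemSiz 0x284.
print(backed_by_file(0x1d80, 0x1b00, 0x274, 0x284))  # noValueGlobInt, in .bss
print(backed_by_file(0x1b20, 0x1b00, 0x274, 0x284))  # an address inside .data
```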

9.6.3.2. .data

This section contains initialized global and static, writable variables. Looking at the details for this section brings to light a few interesting things:

penguin> readelf -S libfoo.so | egrep '\.data'

[14] .data     PROGBITS       00001b00 000b00 000070 00 WA 0  0 32

 

First, note that the .data section has the type PROGBITS, which means that it does occupy space in the file. This is different from the .bss section, which had type NOBITS. Another thing worth noting is that the flags, WA, are the same as those for the .bss section. “WA” means that the section is writable (W) and occupies memory when the program runs (A).

The source file foo.C contains two variables, globInt and staticInt, that can illustrate how the .data section is used:

int globInt = 5 ;

static int staticInt = 5 ;

 

The variable globInt is a global, writable (non-constant) variable, and staticInt is a static writable variable. Both should be in the .data section, but let’s confirm that here:

penguin> nm -v -f s libfoo.so | egrep globInt

globInt           |00001b20|  D |          OBJECT|00000004|   |.data

penguin> nm -v -f s libfoo.so | egrep staticInt

staticInt         |00001b24|  d |          OBJECT|00000004|   |.data

 

From the section information shown earlier, the .data section has a value (“address”) of 0x1b00 and a size of 0x70. The two variables have values of 0x1b20 and 0x1b24 respectively, both of which fall within the range of the .data section (0x1b00 up to 0x1b70); nm even lists the .data section in its output.

9.6.3.3. .dynamic

This section stores information about dynamic linking. This includes information about which libraries are required for a program or executable, where to look for these libraries (rpath), and the important sections of the ELF object needed to run the program. The DYNAMIC segment contains the .dynamic section (and contains only this one section). The readelf tool can be used to display the contents of the dynamic section/segment:

penguin> readelf -d foo

 

Dynamic segment at offset 0x830 contains 25 entries:

  Tag        Type               Name/Value

 0x00000001 (NEEDED)            Shared library: [libfoo.so]

 0x00000001 (NEEDED)            Shared library: [libstdc++.so.5]

 0x00000001 (NEEDED)            Shared library: [libm.so.6]

 0x00000001 (NEEDED)            Shared library: [libgcc_s.so.1]

 0x00000001 (NEEDED)            Shared library: [libc.so.6]

 0x0000000f (RPATH)             Library rpath: [.]

 0x0000000c (INIT)              0x80484c8

 0x0000000d (FINI)              0x80486f0

 0x00000004 (HASH)              0x8048148

 0x00000005 (STRTAB)            0x8048310

 0x00000006 (SYMTAB)            0x80481e0

 0x0000000a (STRSZ)             282 (bytes)

 0x0000000b (SYMENT)            16 (bytes)

 0x00000015 (DEBUG)             0x0

 0x00000003 (PLTGOT)            0x8049938

 0x00000002 (PLTRELSZ)          48 (bytes)

 0x00000014 (PLTREL)            REL

 0x00000017 (JMPREL)            0x8048498

 0x00000011 (REL)               0x8048490

 0x00000012 (RELSZ)             8 (bytes)

 0x00000013 (RELENT)            8 (bytes)

 0x6ffffffe (VERNEED)           0x8048450

 0x6fffffff (VERNEEDNUM)        2

 0x6ffffff0 (VERSYM)            0x804842a

 0x00000000 (NULL)              0x0

 

The key sections listed in the dynamic section are explained in this chapter. The entries that have a type of NEEDED are for the libraries required by the executable foo. The RPATH defines a search path for the libraries. The rest of the information in the dynamic section is a convenient way to locate the information needed to run this executable, including the address of other important sections such as .init, .fini, and others.
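Each line of the readelf -d output corresponds to one 8-byte Elf32_Dyn entry: a signed 32-bit tag followed by a 32-bit value or address. The sketch below decodes such an array until it reaches the NULL tag. The sample bytes are packed by hand from a few values in the listing above rather than read from foo, the tag table covers only some tags, and the names DT_NAMES and parse_dynamic32 are made up for this example.

```python
import struct

# A subset of the DT_* tag numbers from elf.h.
DT_NAMES = {0: "NULL", 1: "NEEDED", 4: "HASH", 5: "STRTAB", 6: "SYMTAB",
            10: "STRSZ", 11: "SYMENT", 12: "INIT", 13: "FINI", 15: "RPATH"}

def parse_dynamic32(data):
    """Return [(tag_name, value)] from a little-endian Elf32_Dyn array, stopping at NULL."""
    entries = []
    for d_tag, d_val in struct.iter_unpack("<iI", data):
        entries.append((DT_NAMES.get(d_tag, hex(d_tag)), d_val))
        if d_tag == 0:
            break
    return entries

# Hand-packed sample: INIT, FINI, SYMENT, then the terminating NULL entry.
sample = struct.pack("<iIiIiIiI", 12, 0x80484c8, 13, 0x80486f0, 11, 16, 0, 0)
for name, value in parse_dynamic32(sample):
    print(f"{name:8} {value:#x}")
```

For NEEDED and RPATH entries the value is not an address but an offset into the dynamic string table, which is why readelf can print the library names directly.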

This section is mainly used by the program interpreter, covered in more detail later in this chapter.

9.6.3.4. .dynsym (symbol table)

The dynamic symbol table is the smaller of the two symbol tables. It contains only the symbols that are required for program execution, that is, the global symbols. The dynamic symbol table is required and cannot be stripped from an ELF object. The dynamic symbol table does not contain any static symbols because static symbol information is not needed at run time.

Static symbols are local to a file, and once a shared library or executable is linked, the static symbols are referenced directly through offsets that are known at link time. Dynamic symbols might be satisfied outside of the shared library or executable, and thus finding these symbols must be done at run time.

Consider the two variables again, staticInt and globInt, defined as:

int globInt = 5 ;

static int staticInt = 5 ;

 

The globInt variable should be in the dynamic symbol table, whereas the staticInt variable should only be in the main symbol table:

penguin> nm  libfoo.so | egrep staticInt

00001b24 d staticInt

penguin> nm -D libfoo.so |egrep staticInt

penguin> nm -D libfoo.so | egrep globInt

00001b20 D globInt

 

As expected, the static variable is not found in the dynamic symbol table. The main symbol table, which contains all symbols, is described later under the section heading .symtab.

9.6.3.5. .dynstr (string table)

This string table contains only the strings that are required for dynamic linking. The majority of the content will be the symbol names from the dynamic symbol table. The format is the standard ELF string table format.

9.6.3.6. .fini

The .fini section contains the machine instructions for the function _fini. Notice that the address listed by nm for _fini and the address of the .fini section match exactly:

penguin> nm foo | egrep _fini

080486e0 T _fini

penguin> readelf -S foo | egrep fini

  [13] .fini  PROGBITS       080486e0 0006e0 00001c 00  AX  0  0  4

 

The _fini function at 0x080486e0 calls the static (global) destructors for an executable or shared library. For an executable, _fini (and thus the destructors) is called when the program terminates. For shared libraries, the _fini function is called when a library is unloaded from memory. The _fini function has a counterpart function called _init that calls the global constructors.

The .init section is covered in more detail later in this chapter.

9.6.3.7. .got (Global Offset Table)

The Global Offset Table, known as GOT, is required for position-independent code. Position-independent code is compiled in such a way that it can be loaded and run from any address. This isn’t as easy as it sounds. Code that needs to call a function has no idea where in the address space this function will be. The GOT solves this problem by providing a level of indirection between the code in an executable or shared library and the required functions and variables that may be in other shared libraries. Let’s look at the GOT in more detail:

penguin> readelf -S libfoo.so | egrep .got

[20] .got      PROGBITS       00001d38 000d38 00003c 04  WA  0  0  4

 

From the output, we know that the .got is of type PROGBITS, meaning that it does consume space in the ELF file itself (unlike the .bss section). The output also indicates that the file offset of the .got is 0xd38 and that it has a size of 0x3c or 60 decimal. Let’s look at the raw contents:

penguin> hexdump -s 0xd38 -C -n 60 libfoo.so

00000d38   48  1c  00 00  00 00  00  00   00  00  00  00  96  07  00  00 |H...............|

00000d48   a6  07  00 00  b6 07  00  00   c6  07  00  00  d6  07  00  00 |................|

00000d58   00  00  00 00  00 1b  00  00   00  00  00  00  00  00  00  00 |................|

00000d68 00 00 00 00 00 00 00 00 00 00 00 00              |............|

 

The GOT is an array of values that is stored in private memory. According to readelf, the size of this global offset table is 0x3c (60) bytes. Each entry is 4 bytes, which means that it holds the following 15 values:

Value number   GOT Address (File Offset)   Value

0              0x1d38 (0xd38)              0x1c48
1              0x1d3c (0xd3c)              0x0
2              0x1d40 (0xd40)              0x0
3              0x1d44 (0xd44)              0x796
4              0x1d48 (0xd48)              0x7a6
5              0x1d4c (0xd4c)              0x7b6
6              0x1d50 (0xd50)              0x7c6
7              0x1d54 (0xd54)              0x7d6
8              0x1d58 (0xd58)              0x0
9              0x1d5c (0xd5c)              0x1b00
10             0x1d60 (0xd60)              0x0
11             0x1d64 (0xd64)              0x0
12             0x1d68 (0xd68)              0x0
13             0x1d6c (0xd6c)              0x0
14             0x1d70 (0xd70)              0x0
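The values listed for the GOT can be read straight from the hexdump: each entry is a 32-bit word, and on this i386 system the bytes are stored in little-endian order, so the bytes 48 1c 00 00 are the value 0x1c48. The following is a minimal sketch of that decoding in C. The names read_le32, got_bytes, and got_entry are made up for this illustration, and the byte array holds only the first 16 bytes of the dump.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one little-endian 32-bit word from raw file bytes. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* First 16 bytes of the .got dump of libfoo.so (file offset 0xd38). */
const unsigned char got_bytes[16] = {
    0x48, 0x1c, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x96, 0x07, 0x00, 0x00
};

/* Value of GOT entry number `index` (valid for 0..3 with the bytes above). */
uint32_t got_entry(size_t index)
{
    return read_le32(got_bytes + 4 * index);
}
```

With these bytes, got_entry(0) yields 0x1c48 and got_entry(3) yields 0x796, matching values 0 and 3 of the GOT.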

 

The first value (0x1c48) is always the virtual address of a special global variable called _DYNAMIC. This is the address of the dynamic section (covered previously).

penguin> nm libfoo.so | egrep 1c48

00001c48 A _DYNAMIC

 

The next two values (1 and 2) are 0x0. Values 3 through 7 range from 0x796 to 0x7d6 and all point to addresses in the Procedure Linkage Table or PLT:

penguin> readelf -S libfoo.so

There are 30 section headers, starting at offset 0x1090:

 

Section Headers:

  [Nr] Name       Type      Addr     Off    Size   ES Flg Lk Inf Al

<...>

  [ 9] .plt       PROGBITS  00000780 000780 000060 04 AX  0  0   4

<...>

 

From the output, the PLT starts at 0x780 and has a size of 0x60. More information on the PLT and how it relates to the GOT follows under the heading “.plt.”

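Whether a GOT value points into the PLT is a simple range check against the section's start address and size. Here is a small sketch in C, using the numbers from the readelf and hexdump output above; the function and array names are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if addr lies inside [start, start + size), 0 otherwise. */
int in_section(uint32_t addr, uint32_t start, uint32_t size)
{
    return addr >= start && addr - start < size;
}

/* GOT values 3 through 7 from the libfoo.so dump. */
const uint32_t got_plt_values[5] = { 0x796, 0x7a6, 0x7b6, 0x7c6, 0x7d6 };

/* Count how many of those values fall inside .plt (start 0x780, size 0x60). */
size_t count_in_plt(void)
{
    size_t n = 0;
    for (size_t i = 0; i < 5; i++)
        if (in_section(got_plt_values[i], 0x780, 0x60))
            n++;
    return n;
}
```

All five values pass the check, while a value such as 0x1c48 (the address of _DYNAMIC) does not.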

So what are the rest of the entries in the GOT for? The answer requires some knowledge of relocation. Without going into too much detail here, some of the GOT entries are used for global variables. The machine instructions in the shared library will reference the GOT entries for the global variables, and the relocation entries ensure that the GOT entries point to the correct address at run time. The relocation entries are included here for the curious reader, although relocation will be covered in much more detail later in this chapter.

penguin> readelf -r libfoo.so

 

Relocation section '.rel.dyn' at offset 0x6e0 contains 12 entries:

 Offset     Info    Type            Sym.Value  Sym. Name

00001b00  00000008 R_386_RELATIVE

00001b04  00000008 R_386_RELATIVE

00001b68  00000008 R_386_RELATIVE

00001d24  00000008 R_386_RELATIVE

00001d5c  00000008 R_386_RELATIVE

00001b6c  00002101 R_386_32          00000000   __gxx_personality_v0

00001d58  00001e06 R_386_GLOB_DAT    00001d80   noValueGlobInt

00001d60  00002006 R_386_GLOB_DAT    00001b20   globInt

00001d64  00002606 R_386_GLOB_DAT    00000000   __cxa_finalize

00001d68  00002c06 R_386_GLOB_DAT    00001d7c   myObj2

00001d6c  00002d06 R_386_GLOB_DAT    00000000   _Jv_RegisterClasses

00001d70  00002e06 R_386_GLOB_DAT    00000000   __gmon_start__

 

Relocation section '.rel.plt' at offset 0x740 contains 5 entries:

 Offset     Info    Type            Sym.Value  Sym. Name

00001d44  00001d07 R_386_JUMP_SLOT   000009d6   _Z3fooi

00001d48  00002407 R_386_JUMP_SLOT   00000000   printf

00001d4c  00002607 R_386_JUMP_SLOT   00000000   __cxa_finalize

00001d50  00002b07 R_386_JUMP_SLOT   000009c8   _ZN7myClassC1Ev

00001d54  00002d07 R_386_JUMP_SLOT   00000000   _Jv_RegisterClasses

 

Notice that the offset of the relocation entry for globInt is 0x1d60, which is one of the slots in the GOT.
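The Info column in the readelf output packs two fields into one 32-bit word: for a 32-bit ELF file the low 8 bits are the relocation type and the upper 24 bits are the symbol table index (elf.h provides the ELF32_R_SYM and ELF32_R_TYPE macros for this). The sketch below uses its own hypothetical helper names rather than the macros, and covers only the four i386 relocation types that appear in this listing.

```c
#include <stdint.h>

/* Symbol table index: the upper 24 bits of r_info. */
uint32_t rel_sym(uint32_t info)
{
    return info >> 8;
}

/* Relocation type: the low 8 bits of r_info. */
uint32_t rel_type(uint32_t info)
{
    return info & 0xff;
}

/* Names for the i386 relocation types seen in the listing above. */
const char *rel_type_name(uint32_t type)
{
    switch (type) {
    case 1:  return "R_386_32";
    case 6:  return "R_386_GLOB_DAT";
    case 7:  return "R_386_JUMP_SLOT";
    case 8:  return "R_386_RELATIVE";
    default: return "(not covered by this sketch)";
    }
}
```

For globInt, the Info value 0x00002006 splits into symbol index 0x20 and type 6 (R_386_GLOB_DAT). The R_386_RELATIVE entries have Info 0x00000008, that is, type 8 with symbol index 0, which is why readelf shows no symbol name for them.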

Consider the following code from foo.C:

static int bar( int c )

{

   int d = 0;

 

   d = foo( c ) + globInt ;

   d += staticInt ;

   d += constInt ;

 

   return d ;

}

 

It references the global integer globInt. Let’s see how it uses the GOT to find this variable by looking at the assembly language for this function:

penguin> objdump -d libfoo.so

<...>

000008c4 <_Z3bari>:

 8c4:   55                      push    %ebp

 8c5:   89 e5                   mov     %esp,%ebp

 8c7:   53                      push    %ebx

 8c8:   83 ec 04                sub     $0x4,%esp

 8cb:   e8 00 00 00 00          call    8d0 <_Z3bari+0xc>

 8d0:   5b                      pop     %ebx

 8d1:   81 c3 68 14 00 00       add     $0x1468,%ebx

 8d7:   c7 45 f8 00 00 00 00    movl    $0x0,0xfffffff8(%ebp)

 8de:   83 ec 0c                sub     $0xc,%esp

 8e1:   ff 75 08                pushl   0x8(%ebp)

 8e4:   e8 a7 fe ff ff          call    790 <_init+0x28>

 8e9:   83 c4 10                add     $0x10,%esp

 8ec:   89 c2                   mov     %eax,%edx

 8ee:   8b 83 28 00 00 00       mov     0x28(%ebx),%eax

 8f4:   8b 08                   mov     (%eax),%ecx

 8f6:   8d 04 11                lea     (%ecx,%edx,1),%eax

 8f9:   89 45 f8                mov     %eax,0xfffffff8(%ebp)

 8fc:   8b 93 ec fd ff ff       mov     0xfffffdec(%ebx),%edx

 902:   8d 45 f8                lea     0xfffffff8(%ebp),%eax

 905:   01 10                   add     %edx,(%eax)

 907:   8d 45 f8                lea     0xfffffff8(%ebp),%eax

 90a:   83 00 05                addl    $0x5,(%eax)

 90d:   8b 45 f8                mov     0xfffffff8(%ebp),%eax

 910:   8b 5d fc                mov     0xfffffffc(%ebp),%ebx

 913:   c9                      leave

 914:   c3                      ret

 915:   90                      nop

<...>

Note: In the assembly language here, the symbol name _Z3bari is the function bar. The name is a “mangled” C++ function name. The objdump tool accepts a -C switch to demangle the name, or alternatively you can use the command echo _Z3bari |c++filt.

Note: In the assembly language just listed, the hex numbers to the left of the output are file offsets and not real memory addresses. However, the methods used by ELF to achieve position independence would work even if these were real addresses. When the library is loaded into memory, the real memory addresses would be used instead.

 

The instruction at 0x8cb makes a call to 0x8d0. This seems a bit strange because that is the next instruction to be run anyway, but it does have a purpose. The call pushes its return address (0x8d0) onto the stack, and the instruction at 0x8d0 pops that address into register EBX, so EBX now holds the address of the current instruction. The instruction at 0x8d1 then adds a hard-coded value 0x1468 to this.

0x8d0+0x1468 = 0x1d38.

 

This is where the GOT is located. For quick reference, here is the information for the GOT section again:

penguin> readelf -S libfoo.so | egrep .got

[20] .got      PROGBITS       00001d38 000d38 00003c 04  WA  0  0  4

 

Later in the assembly language, the instruction at 0x8ee loads the value stored at EBX + 0x28, which is the GOT slot for globInt.

EBX( 0x1d38) + 0x28 = 0x1d60

 

From the relocation information just listed, this matches the offset of the relocation entry for globInt:

00001d60   00002006 R_386_GLOB_DAT     00001b20    globInt

 

The instruction at 0x8ee therefore loads the address of globInt from its GOT slot into EAX, and the next instruction at 0x8f4 dereferences that address to read the value of globInt.
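The address arithmetic in this walk-through can be written down as a short C sketch. The function names are hypothetical; the constants come from the objdump listing. The sketch also covers the instruction at 0x8fc, which reads 0xfffffdec(%ebx): taken as a signed 32-bit displacement this is -0x214, which appears to reach staticInt directly, without going through a GOT slot.

```c
#include <stdint.h>

/* EBX after "call next; pop %ebx; add $delta,%ebx": the GOT address.
 * return_addr is the address of the pop instruction. */
uint32_t got_base_from_pc(uint32_t return_addr, uint32_t link_time_delta)
{
    return return_addr + link_time_delta;
}

/* Address referenced by an operand of the form disp(%ebx). */
uint32_t ebx_relative(uint32_t ebx, int32_t disp)
{
    return ebx + (uint32_t)disp;
}
```

Using the values from the listing: got_base_from_pc(0x8d0, 0x1468) is 0x1d38 (the start of .got), ebx_relative(0x1d38, 0x28) is 0x1d60 (the GOT slot for globInt), and ebx_relative(0x1d38, -0x214) is 0x1b24 (the address nm reported for staticInt). Adding any load address to the first argument shifts every result by the same amount, which is the point of position-independent code.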

Note: This is a very good example of how the text segment relies on the data segment being at a specific offset. In this case, it expects the GOT to be 0x1468 away from a particular instruction. If the data segment (which contains the GOT) was ever loaded in the wrong place, the hard-coded reference to the GOT would be inaccurate.

9.6.3.8. .hash

This is the symbol hash table. Because humans need to use symbol names (that is, function and variable names), ELF must provide a quick way to find the symbol that corresponds to a given name.

We know that the symbol table in an ELF file is simply an array. Without a hash table of some sort, finding a symbol in this array would require a linear search of the array until the symbol is found or until the end of the symbol table is reached. This might not be too bad for a one-time search, but the typical ELF file contains many symbols (possibly many thousands). A linear search for each of these either during run time or for the linker (ld) would not be practical.

The hash mechanism in ELF is illustrated in Figure 9.3.

Figure 9.3. ELF hash algorithm.

 

In the diagram, the function name printf is run through the hash function to produce a numeric value. The modulus of this numeric value with the hash bucket table size provides an index into the hash bucket table. The hash bucket table contains an index into the symbol table as well as the hash chain table. At this point, the symbol name in the symbol table (pointed to by the hash bucket slot) is checked against printf (“C1” in the diagram). If the symbol name at this entry in the symbol table doesn’t match, the index of the chain table is used.

Each chain table entry contains an index into the symbol table as well as the index of the next element in the chain if there are any. The symbol table entry from the first chain table entry is compared against printf (“C2” in the diagram). According to the diagram, there was no match, so the next element in the chain is used. This happens again until the function printf is found in the symbol table after four comparisons (the symbol is found in “C4” according to the diagram).

This is more complex but much more efficient than using a linear search in the symbol table, especially for large symbol tables.

For the interested reader, here is the hash algorithm as specified in the ELF standard:

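unsigned long ElfHash(const unsigned char *name)
{
  unsigned long h = 0, g;

  while (*name)
    {
      h = (h << 4) + *name++;      /* shift in the next character */
      if ((g = h & 0xF0000000))    /* are any of the top four bits set? */
        h ^= g >> 24;              /* fold them back into the low bits */
      h &= ~g;                     /* then clear them */
    }
  return h;
}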
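To tie the hash function to the bucket and chain tables described above, here is a minimal sketch of building and searching such a table in C. It is an illustration under stated assumptions, not the dynamic linker's code: hash_build, hash_lookup, and elf_hash32 are hypothetical names, symbol names are kept in a plain array of strings instead of a real symbol table, and the hash uses a fixed 32-bit type so that it behaves the same where unsigned long is 64 bits wide. Index 0 stands for "no symbol," as in ELF.

```c
#include <stdint.h>
#include <string.h>

/* The ELF hash function, restricted to 32 bits. */
uint32_t elf_hash32(const char *name)
{
    uint32_t h = 0, g;
    while (*name) {
        h = (h << 4) + (unsigned char)*name++;
        g = h & 0xf0000000u;
        if (g)
            h ^= g >> 24;
        h &= ~g;
    }
    return h;
}

/* Fill bucket[nbucket] and chain[nsyms] for names[1..nsyms-1]. */
void hash_build(const char *const *names, uint32_t nsyms,
                uint32_t nbucket, uint32_t *bucket, uint32_t *chain)
{
    for (uint32_t b = 0; b < nbucket; b++)
        bucket[b] = 0;
    for (uint32_t i = 0; i < nsyms; i++)
        chain[i] = 0;
    for (uint32_t i = 1; i < nsyms; i++) {
        uint32_t b = elf_hash32(names[i]) % nbucket;
        chain[i] = bucket[b];  /* previous head becomes the next link */
        bucket[b] = i;         /* new symbol becomes the head */
    }
}

/* Return the symbol index for name, or 0 if it is not present. */
uint32_t hash_lookup(const char *name, const char *const *names,
                     uint32_t nbucket, const uint32_t *bucket,
                     const uint32_t *chain)
{
    uint32_t i = bucket[elf_hash32(name) % nbucket];
    while (i != 0) {
        if (strcmp(names[i], name) == 0)
            return i;          /* comparison succeeded */
        i = chain[i];          /* follow the chain to the next candidate */
    }
    return 0;
}
```

The chain array is indexed by symbol number, which is why a bucket entry can serve as an index into both the symbol table and the chain table. With a single bucket, every lookup degenerates into walking one long chain, which is the linear search the hash table is meant to avoid.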



9.6.3.9. .init

This section contains executable instructions required for initialization of an ELF object. It is almost identical to the .fini section except that the information is for initialization, not finalization. The section .init contains a function called _init in the same way that the .fini section contains a function called _fini. The _init function is responsible for initializing global variables (including objects) for an ELF library or executable. Notice in the output that the address of the _init function matches the address of the .init section:

penguin> readelf -S foo |egrep "\.init"

  [10] .init   PROGBITS        080484b4 0004b4 000018 00  AX  0  0  4

penguin> nm foo |egrep 080484b4

080484b4 T _init

 

Let’s dig a bit deeper to see how the .init section works (which is very similar to how the .fini section works). We’ll start by looking at a global C++ object defined in main.C, which eventually is compiled into the executable called foo:

myClass myObj3 ;

The class "myClass" is defined as follows:

class myClass

{

   public:

 

   int myVar ;

 

   myClass() {

      myVar = 5 ;

   }

 

};

 

Notice that it includes a constructor. This is what should eventually be called by _init().

Disassembling the .init section using objdump shows us the following:

penguin> objdump -d foo | head -15

 

foo:     file format elf32-i386

 

Disassembly of section .init:

 

080484b4 <_init>:

 80484b4:       55                   push    %ebp

 80484b5:       89 e5                mov     %esp,%ebp

 80484b7:       83 ec 08             sub     $0x8,%esp

 80484ba:      e8 a5 00 00 00     call  8048564 <call_gmon_start>

 80484bf:       90                   nop

 80484c0:       e8 0b 01 00 00       call  80485d0 <frame_dummy>

 80484c5:    e8 e6 01 00 00   call  80486b0 <__do_global_ctors_aux>

 80484ca:       c9                   leave

 80484cb:       c3                   ret

<...>

 

In the assembly listing, there is a call to __do_global_ctors_aux. Let’s disassemble that function next (some output was excluded for simplicity):

penguin> objdump -d foo

<...>

080486b0 <__do_global_ctors_aux>:

 80486b0:       55                      push   %ebp

 80486b1:       89 e5                   mov    %esp,%ebp

 80486b3:       53                      push   %ebx

 80486b4:       52                      push   %edx

 80486b5:       a1 04 99 04 08          mov    0x8049904,%eax

 80486ba:       83 f8 ff                cmp    $0xffffffff,%eax

 80486bd:       bb 04 99 04 08          mov    $0x8049904,%ebx

 80486c2:  74 18           je    80486dc <__do_global_ctors_aux+0x2c>

 80486c4:       8d b6 00 00 00 00       lea    0x0(%esi),%esi

 80486ca:       8d bf 00 00 00 00       lea    0x0(%edi),%edi

 80486d0:       83 eb 04                sub    $0x4,%ebx

 80486d3:       ff d0                   call   *%eax

 80486d5:       8b 03                   mov    (%ebx),%eax

 80486d7:       83 f8 ff                cmp    $0xffffffff,%eax

 80486da: 75 f4             jne   80486d0 <__do_global_ctors_aux+0x20>

 80486dc:       58                      pop    %eax

 80486dd:       5b                      pop    %ebx

 80486de:       5d                      pop    %ebp

 80486df:       c3                      ret

<...>

 

The assembly listing shows the instruction call *%eax at address 0x80486d3. This takes the value of the EAX register, treats it as the address of a function, and calls that function. From the code above this call, we can see EAX being loaded with mov 0x8049904,%eax at address 0x80486b5, which reads the value stored in memory at 0x8049904. Let’s take a look at what is located at that address:

penguin> nm -v -f s foo

<...>

__CTOR_LIST__     |08049900| d |       OBJECT|     |   |.ctors

__CTOR_END__      |08049908| d |       OBJECT|     |   |.ctors

<...>

 

According to nm, there are two variables, __CTOR_LIST__ and __CTOR_END__, whose addresses surround the address 0x8049904. The value stored at 0x8049904 is what gets loaded into EAX and is eventually called.

We need to find the value stored at address 0x8049904 to see what address is eventually called. It is kind of a pain to do this, but we first need to find the difference between the virtual address and the file offset of the segment that contains this address. We know it will be in a LOAD segment because it contains information needed at run time:

penguin> readelf -l foo | egrep LOAD

  LOAD      0x000000 0x08048000 0x08048000 0x0076c 0x0076c R E 0x1000

  LOAD      0x00076c 0x0804976c 0x0804976c 0x001d4 0x001dc RW 0x1000

 

According to the output, the address 0x8049904 is in the second LOAD segment (the data segment), and the difference between the virtual address of the data segment and its file offset is 0x08049000 (0x0804976c - 0x76c). Subtracting this value from 0x8049904 gives the file offset 0x904. Let’s see the contents at that offset:

penguin> hexdump -C -s 0x904 -n 4 foo

00000904  7e 86 04 08                                     |~...|

00000908
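The address-to-offset conversion done by hand above is easy to automate. The sketch below, in C, searches a list of LOAD segments for the one whose file-backed range contains the address and applies the same subtraction. The struct and function names are hypothetical, and the table is filled in by hand from the readelf -l output rather than read from the file.

```c
#include <stddef.h>
#include <stdint.h>

/* The three program header fields needed for the conversion. */
struct load_segment {
    uint32_t offset;  /* p_offset: where the segment starts in the file */
    uint32_t vaddr;   /* p_vaddr: where it starts in memory */
    uint32_t filesz;  /* p_filesz: how many bytes the file provides */
};

/* The two LOAD segments of foo, copied from the readelf -l output. */
const struct load_segment foo_segments[2] = {
    { 0x000000, 0x08048000, 0x0076c },
    { 0x00076c, 0x0804976c, 0x001d4 },
};

/* Store the file offset for vaddr in *offset_out and return 1,
 * or return 0 if no segment provides file contents for vaddr. */
int vaddr_to_offset(const struct load_segment *segs, size_t nsegs,
                    uint32_t vaddr, uint32_t *offset_out)
{
    for (size_t i = 0; i < nsegs; i++) {
        if (vaddr >= segs[i].vaddr &&
            vaddr - segs[i].vaddr < segs[i].filesz) {
            *offset_out = vaddr - segs[i].vaddr + segs[i].offset;
            return 1;
        }
    }
    return 0;
}
```

For 0x08049904 this gives 0x904, and for the address of _fini (0x080486e0) it gives 0x6e0, the same file offset readelf listed for the .fini section.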

 

According to the output, we have a value of 0x0804867e at address 0x8049904 (file offset 0x904). One more step is needed before we know what this address is for:

penguin> nm foo | egrep 0804867e

0804867e t _GLOBAL__I_myObj3

 

This is the global constructor of the myObj3 object, which is called, as expected, by functions under _init.

The __CTOR_LIST__ variable stores the list of global constructors for an executable or shared library. We can find all of the global constructors by listing the addresses stored between __CTOR_LIST__ and __CTOR_END__. This can be useful if you need to know what will be run before the main() function of an executable or when a library is first loaded.

The .fini section has a very similar convention but uses the __DTOR_LIST__ and __DTOR_END__ variables to mark the addresses of the global destructors.
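The loop in __do_global_ctors_aux can be paraphrased in C. The sketch below is a reading of the disassembly above, not the actual startup source: it starts at the slot just below the end of the list, calls each function pointer it finds, and walks backward until it reaches the value -1 (0xffffffff) that marks the start of the list. The table demo_ctors and the two demo constructors are hypothetical stand-ins for the real .ctors contents, and casting -1 to a function pointer assumes a platform where that conversion behaves as it does on Linux.

```c
#include <stdint.h>

typedef void (*ctor_fn)(void);

/* The value -1 that the listing compares against to stop the loop. */
#define CTOR_SENTINEL ((ctor_fn)(intptr_t)(-1))

int ctor_log[4];   /* records the order in which the demo constructors ran */
int ctor_calls;    /* number of constructors called so far */

static void ctor_first(void)  { ctor_log[ctor_calls++] = 1; }
static void ctor_second(void) { ctor_log[ctor_calls++] = 2; }

/* Laid out like .ctors: sentinel, the constructors, a terminating zero. */
ctor_fn demo_ctors[] = { CTOR_SENTINEL, ctor_first, ctor_second, 0 };

/* list_end plays the role of __CTOR_END__ (the terminating zero). */
void run_ctors(ctor_fn *list_end)
{
    ctor_fn *p = list_end - 1;   /* like starting at __CTOR_END__ - 4 */
    while (*p != CTOR_SENTINEL) {
        (*p)();                  /* call *%eax */
        p--;                     /* sub $0x4,%ebx */
    }
}
```

Calling run_ctors(demo_ctors + 3) runs ctor_second first and ctor_first second, because the list is walked from the end toward the start.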

9.6.3.10. .interp

This section contains the path name of the program interpreter. The program interpreter is used for executables and is responsible for getting a process up and running with all of its required libraries, and so on. A quick way to get the program interpreter is to use the readelf command as follows:

penguin> readelf -l foo

<...>

  INTERP        0x000114 0x08048114 0x08048114 0x00013 0x00013 R 0x1

      [Requesting program interpreter: /lib/ld-linux.so.2]

<...>

 

The program interpreter is described later in the chapter under the section heading, “Program Interpreter.”

9.6.3.11. .plt (Procedure Linkage Table)

The procedure linkage table is required by every shared library or executable that is dependent on shared libraries to satisfy an unresolved symbol. The PLT is also used to support “lazy binding,” which means not resolving the address of a function until it is called for the first time.

The procedure linkage table contains a list of instructions that help functions find other functions in the address space. First, let’s use the readelf tool to find the PLT for the executable named foo. It is always in the section named .plt.

penguin> readelf -S foo | egrep ' \.plt'

  [11] .plt    PROGBITS        080484cc 0004cc 000070 04 AX 0  0 4

 

The best way to look at the PLT is through the debugger. It starts at address 0x80484cc and continues for 0x70 bytes:

(gdb) disass 0x080484cc

Dump of assembler code for function _init:

0x80484b4 <_init>:      push   %ebp

0x80484b5 <_init+1>:    mov    %esp,%ebp

0x80484b7 <_init+3>:    sub    $0x8,%esp

0x80484ba <_init+6>:    call   0x8048564 <call_gmon_start>

0x80484bf <_init+11>:   nop

0x80484c0 <_init+12>:   call   0x80485d0 <frame_dummy>

0x80484c5 <_init+17>:   call   0x80486b0 <__do_global_ctors_aux>

0x80484ca <_init+22>:   leave

0x80484cb <_init+23>:   ret

0x80484cc <_init+24>:   pushl  0x804991c

0x80484d2 <_init+30>:   jmp    *0x8049920

0x80484d8 <_init+36>:   add    %al,(%eax)

0x80484da <_init+38>:   add    %al,(%eax)

0x80484dc <sleep>:      jmp    *0x8049924

0x80484e2 <sleep+6>:    push   $0x0

0x80484e7 <sleep+11>:   jmp    0x80484cc <_init+24>

0x80484ec <uname>:      jmp    *0x8049928

0x80484f2 <uname+6>:    push   $0x8

0x80484f7 <uname+11>:   jmp    0x80484cc <_init+24>

0x80484fc <__gxx_personality_v0>:       jmp    *0x804992c

0x8048502 <__gxx_personality_v0+6>:     push   $0x10

0x8048507 <__gxx_personality_v0+11>:    jmp    0x80484cc <_init+24>

0x804850c <__libc_start_main>:  jmp    *0x8049930

0x8048512 <__libc_start_main+6>:        push   $0x18

0x8048517 <__libc_start_main+11>:       jmp    0x80484cc <_init+24>

0x804851c <_Z3bazi>:    jmp    *0x8049934

0x8048522 <_Z3bazi+6>:  push   $0x20

0x8048527 <_Z3bazi+11>: jmp    0x80484cc <_init+24>

0x804852c <printf>:     jmp    *0x8049938

0x8048532 <printf+6>:   push   $0x28

0x8048537 <printf+11>:  jmp    0x80484cc <_init+24>

End of assembler dump.

 

Disassemble the PLT? Yes, the PLT is executable, but it has very specific executable parts (one for each function in the PLT). We could have used objdump to disassemble the PLT, although objdump would not give any hints as to which parts of the PLT relate to which functions.

The location of the PLT is relative to functions that require other functions located somewhere in the address space. The relative offset from a function to the PLT is known at link time, so it is possible for the linker to specify a hard coded offset from an instruction address to the PLT. Because the location of the required/defined functions is not known at compile time or link time, the code instead makes a call directly to the appropriate slot in the PLT.

From the file main.C, the function main makes a call to sleep. The assembly language for this call from within gdb looks like this (some output skipped for simplicity):

(gdb) disass main

Dump of assembler code for function main:

0x80485fc <main>:       push   %ebp

0x80485fd <main+1>:     mov    %esp,%ebp

0x80485ff <main+3>:     sub    $0x198,%esp

<...>

0x8048641 <main+69>:    push   $0x3f2

0x8048646 <main+74>:    call   0x80484dc <sleep>

<...>

 

Note that the call to the sleep function is to the PLT slot for the sleep function (compare the address used in the call instruction to the assembly listing for the PLT above). Going back to the PLT slot for “sleep”:

0x80484dc <sleep>:      jmp    *0x8049924

0x80484e2 <sleep+6>:    push   $0x0

0x80484e7 <sleep+11>:   jmp    0x80484cc <_init+24>

 

The first instruction is an indirect jump through address 0x8049924; that is, it jumps to whatever address is stored there. This location is right inside the Global Offset Table or GOT. To confirm, let’s get the address of the GOT:

penguin> readelf -S foo | egrep ' \.got'

[22] .got    PROGBITS        08049918 000918 000028 04 WA 0  0 4

 

Okay, so what’s in the GOT that might be of interest to the PLT? That depends on when you look. Before the program is run, the GOT looks like the following:

(gdb) x/40 0x08049918

0x8049918 <_JCR_LIST_+4>: 0x08049810 0x00000000 0x00000000 0x080484e2

0x8049928 <_JCR_LIST_+20>: 0x080484f2 0x08048502 0x08048512 0x08048522

0x8049938 <_JCR_LIST_+36>: 0x08048532 0x00000000 0x00000000 0x00000000

0x8049948:      Cannot access memory at address 0x8049948

 

The GOT slot for the sleep function (at 0x8049924) has a value of 0x080484e2. This is the address of the second instruction in the PLT slot for the sleep function.

0x80484e2 <sleep+6>:     push   $0x0

 

The instruction pushes a value of 0x0 onto the stack. The next instruction jumps to 0x80484cc:

0x80484e7 <sleep+11>:   jmp     0x80484cc <_init+24>

 

It is worth noting that each of the PLT slots has a different value at offset 0x6:

0x80484e2 <sleep+6>:    push   $0x0

0x80484f2 <uname+6>:    push   $0x8

0x8048502 <__gxx_personality_v0+6>:    push   $0x10

0x8048512 <__libc_start_main+6>:       push   $0x18

0x8048522 <_Z3bazi+6>:  push   $0x20

0x8048532 <printf+6>:   push   $0x28

 

For example, the uname slot pushes a value of 0x8 onto the stack. This is a special marker that the dynamic linking code uses to identify which PLT slot was called. The values increase in steps of 8, the size of one entry in the .rel.plt relocation table, so each value also works as a byte offset to that slot's relocation entry. Dynamic linking is explained in more detail shortly.
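The regular layout visible in the gdb output can be summarized with a little arithmetic. The sketch below is derived only from the numbers in this listing (a 16-byte header at the start of the PLT, 16 bytes per slot, a pushed value that grows by 8, and three GOT entries ahead of the first function slot); the constant and function names are hypothetical, and other architectures or linkers may lay things out differently.

```c
#include <stdint.h>

enum {
    PLT_ENTRY_SIZE = 16,  /* bytes per PLT slot, and for the header */
    PLT_PUSH_STEP  = 8,   /* step between the values pushed by the slots */
    GOT_ENTRY_SIZE = 4,   /* bytes per GOT entry */
    GOT_RESERVED   = 3    /* GOT entries ahead of the first function slot */
};

/* Address of PLT slot number i (slot 0 is sleep in the listing). */
uint32_t plt_slot_addr(uint32_t plt_base, uint32_t i)
{
    return plt_base + PLT_ENTRY_SIZE * (i + 1);
}

/* Value pushed by the second instruction of slot i. */
uint32_t plt_pushed_value(uint32_t i)
{
    return PLT_PUSH_STEP * i;
}

/* Address of the GOT entry that slot i jumps through. */
uint32_t got_slot_addr(uint32_t got_base, uint32_t i)
{
    return got_base + GOT_ENTRY_SIZE * (GOT_RESERVED + i);
}

/* Initial content of that GOT entry: the push instruction of slot i. */
uint32_t got_initial_value(uint32_t plt_base, uint32_t i)
{
    return plt_slot_addr(plt_base, i) + 6;
}
```

With the PLT at 0x080484cc and the GOT at 0x08049918, slot 0 (sleep) comes out at 0x080484dc with GOT entry 0x08049924 and initial value 0x080484e2, and slot 5 (printf) at 0x0804852c with pushed value 0x28, matching the listing. The section size of 0x70 bytes is also consistent: one header plus six slots of 16 bytes each.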

Let’s get back to the address of 0x80484cc. This is the address of the beginning of the PLT and contains the following instructions (ignore the offset of _init):

0x80484cc <_init+24>:   pushl  0x804991c

0x80484d2 <_init+30>:   jmp    *0x8049920

0x80484d8 <_init+36>:   add    %al,(%eax)

0x80484da <_init+38>:   add    %al,(%eax)

 

The first instruction pushes a value onto the stack, while the second instruction jumps to the address stored in 0x8049920. Let’s see what value is at that address:

(gdb) break main

Breakpoint 1 at 0x8048605

(gdb) run

Starting program: /home/wilding/src/Linuxbook/ELF/foo

 

Breakpoint 1, 0x08048605 in main ()

 

(gdb) x 0x8049920

0x8049920 <__JCR_LIST__+12>:     0x40009c90

 

Okay, let’s see what function is at 0x40009c90:

(gdb) disass 0x40009c90 0x40009c94

Dump of assembler code from 0x40009c90 to 0x40009c94:

0x40009c90 <_dl_runtime_resolve>:       push   %eax

0x40009c91 <_dl_runtime_resolve+1>:     push   %ecx

0x40009c92 <_dl_runtime_resolve+2>:     push   %edx

0x40009c93 <_dl_runtime_resolve+3>:     mov    0x10(%esp,1),%edx

End of assembler dump.

 

So after all of this we know that the first call to “sleep” (or any function) will eventually call _dl_runtime_resolve. This is a special function that works to resolve the address of the function. The details of how this works are a bit beyond the scope of this chapter (and fairly lengthy to explain), but suffice it to say that this finds the address of the function whose slot was just executed. It then updates the GOT with the actual address of the function in the address space so that the second call to the function (that is, “sleep”) will go directly to the address of the actual function itself.

Let’s see what the GOT looks like after the function sleep is called:

(gdb) cont

Continuing.

This is a printf format string in baz

This is a printf format string in main

 

Program received signal SIGINT, Interrupt.

0x401a2d01 in nanosleep () from /lib/libc.so.6

(gdb) x/40 0x08049918

0x8049918 <_JCR_LIST_+4>: 0x08049810 0x40012fd0 0x40009c90 0x401a2ab0

0x8049928 <_JCR_LIST_+20>: 0x401a2750 0x08048502 0x4011d400 0x40014916

0x8049938 <_JCR_LIST_+36>: 0x40154c90 0x00000000 0x00000000 0x00000005

0x8049948: 0x00000000 0x00000019 0x7273752f 0x62696c2f

...

 

After the program is run and the sleep function is called, the slot for the sleep function in the GOT contains 0x401a2ab0. Let’s confirm that this is the address of the actual sleep function.

(gdb) disass 0x401a2ab0

Dump of assembler code for function sleep:

0x401a2ab0 <sleep>:     push   %ebp

0x401a2ab1 <sleep+1>:   mov    %esp,%ebp

0x401a2ab3 <sleep+3>:   push   %edi

0x401a2ab4 <sleep+4>:   push   %esi

...

 

So the next time the function sleep is called, the GOT slot points directly to the actual sleep function, avoiding the need to resolve the function a second time (that is, find its address in memory).

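The lazy binding sequence can be imitated in plain C with a function pointer standing in for the GOT slot. This is only a model of the control flow described above, with hypothetical names throughout: real_sleep stands in for the library function (it merely returns its argument), lazy_stub stands in for the "push, jump to the start of the PLT, _dl_runtime_resolve" path, and plt_sleep stands in for the jmp through the GOT.

```c
int resolve_count;   /* how many times the resolver path has run */

/* Stand-in for the real function in the shared library. */
int real_sleep(int seconds)
{
    return seconds;
}

int lazy_stub(int seconds);

/* The "GOT slot": before the first call it points back at the stub. */
int (*got_slot)(int) = lazy_stub;

/* Stand-in for the resolver: patch the slot, then finish the call. */
int lazy_stub(int seconds)
{
    resolve_count++;
    got_slot = real_sleep;
    return got_slot(seconds);
}

/* Stand-in for the PLT slot: always jump through the GOT slot. */
int plt_sleep(int seconds)
{
    return got_slot(seconds);
}
```

The first call to plt_sleep goes through lazy_stub and increments resolve_count; every later call reaches real_sleep directly, so resolve_count stays at 1.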

9.6.3.12. .rodata

This section contains read-only constant values, string literals, and other constant data such as the variable constInt from foo.C defined as:

const int constInt = 5 ;

 

This variable should be located in the .rodata section because it is read-only (that is, constant). Let’s confirm by finding the location of the .rodata section and of this read-only (constant) variable.

penguin> readelf -S libfoo.so |egrep rodata

  [12] .rodata   PROGBITS    00000a80 000a80 00004c 00  A 0   0 32

penguin> nm libfoo.so |egrep constInt

00000ac8 r constInt

 

As expected, constInt is contained within the .rodata section: its address, 0xac8, lies between the section's start at 0xa80 and its end at 0xacc (0xa80 + 0x4c). Let’s see what else is in the .rodata section using the hexdump utility:

penguin> hexdump -C -s 0xa80 -n 76 libfoo.so

00000a80  54 68 69 73 20 69 73 20  61 20 63 6f 6e 73 74 61 |This is a consta|

00000a90  6e 74 20 73 74 72 69 6e  67 21 00 00 00 00 00 00 |nt string!......|

00000aa0  54 68 69 73 20 69 73 20  61 20 70 72 69 6e 74 66 |This is a printf|

00000ab0  20 66 6f 72 6d 61 74 20  73 74 72 69 6e 67 20 69 | format string i|

00000ac0  6e 20 62 61 7a 0a 00 00  05 00 00 00             |n baz.......|

00000acc

 

This output shows that the .rodata section also contains constant strings, including the format strings used in printf statements. Notice that such strings are not stored in any of the ELF string tables.
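
The containment check made above for constInt, and the reading of its value from the last four bytes of the dump, can be sketched in C. This is a minimal illustration, not part of foo.C: the helper names in_section, read_le32, and const_int_check are made up here, and the numbers are copied from the readelf, nm, and hexdump output. It assumes the little-endian byte order of i386.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the len bytes starting at addr lie inside the section
 * [sec_addr, sec_addr + sec_size), otherwise 0. */
int in_section(uint32_t addr, uint32_t len,
               uint32_t sec_addr, uint32_t sec_size)
{
    return addr >= sec_addr && len <= sec_size &&
           addr - sec_addr <= sec_size - len;
}

/* Decodes a 32-bit little-endian value (the i386 byte order). */
uint32_t read_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hypothetical check using the values shown above: .rodata starts at
 * 0xa80 and is 0x4c bytes long; nm reports constInt at 0xac8; the dump
 * shows the bytes 05 00 00 00 at that offset. Returns the decoded
 * value, or -1 if the address is outside the section. */
int const_int_check(void)
{
    static const uint8_t bytes_at_ac8[4] = { 0x05, 0x00, 0x00, 0x00 };

    if (!in_section(0xac8, 4, 0xa80, 0x4c))
        return -1;
    return (int)read_le32(bytes_at_ac8);
}
```

Calling const_int_check() yields the value that foo.C assigns to constInt, which ties the symbol address from nm to the bytes in the hexdump.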

9.6.3.13. .shstrtab

This section is the string table that holds the names of the ELF file’s sections:

penguin> hexdump -C libfoo.so -s0xffb -n 251

00000ffb  00 2e 73 79 6d 74 61 62  00 2e 73 74 72 74 61 62 |..symtab..strtab|

0000100b  00 2e 73 68 73 74 72 74  61 62 00 2e 68 61 73 68 |..shstrtab..hash|

0000101b  00 2e 64 79 6e 73 79 6d  00 2e 64 79 6e 73 74 72 |..dynsym..dynstr|

0000102b  00 2e 67 6e 75 2e 76 65  72 73 69 6f 6e 00 2e 67 |..gnu.version..g|

0000103b  6e 75 2e 76 65 72 73 69  6f 6e 5f 72 00 2e 72 65 |nu.version_r..re|

0000104b  6c 2e 64 79 6e 00 2e 72  65 6c 2e 70 6c 74 00 2e |l.dyn..rel.plt..|

0000105b  69 6e 69 74 00 2e 74 65  78 74 00 2e 66 69 6e 69 |init..text..fini|

0000106b  00 2e 72 6f 64 61 74 61  00 2e 65 68 5f 66 72 61 |..rodata..eh_fra|

0000107b  6d 65 5f 68 64 72 00 2e  64 61 74 61 00 2e 65 68 |me_hdr..data..eh|

0000108b  5f 66 72 61 6d 65 00 2e  64 79 6e 61 6d 69 63 00 |_frame..dynamic.|

0000109b  2e 63 74 6f 72 73 00 2e  64 74 6f 72 73 00 2e 6a |.ctors..dtors..j|

000010ab  63 72 00 2e 67 6f 74 00  2e 62 73 73 00 2e 63 6f |cr..got..bss..co|

000010bb  6d 6d 65 6e 74 00 2e 64  65 62 75 67 5f 61 72 61 |mment..debug_ara|

000010cb  6e 67 65 73 00 2e 64 65  62 75 67 5f 69 6e 66 6f |nges..debug_info|

000010db  00 2e 64 65 62 75 67 5f  61 62 62 72 65 76 00 2e |..debug_abbrev..|

000010eb  64 65 62 75 67 5f 6c 69  6e 65 00                 |debug_line.|

000010f6
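
A section header’s sh_name field is a byte offset into this table, and the name is the NUL-terminated string that starts at that offset. The lookup can be sketched as follows. The function name strtab_name and the array demo_shstrtab are illustrative only; the array reproduces the first 27 bytes of the dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the NUL-terminated name at byte offset off in a string table
 * of size bytes, or NULL if the offset is out of range or the string is
 * not terminated inside the table. */
const char *strtab_name(const char *table, size_t size, size_t off)
{
    assert(table != NULL);
    if (off >= size)
        return NULL;
    if (memchr(table + off, '\0', size - off) == NULL)
        return NULL;
    return table + off;
}

/* First 27 bytes of the .shstrtab dump: a leading NUL, then ".symtab",
 * ".strtab", and ".shstrtab", each followed by a NUL. */
const char demo_shstrtab[] = "\0.symtab\0.strtab\0.shstrtab";
```

With this table, offset 1 yields ".symtab", offset 9 yields ".strtab", and offset 17 yields ".shstrtab". Offset 0 yields the empty string, which is why every string table begins with a NUL byte.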

9.6.3.14. .strtab (string table)

This string table stores the symbol names for the main symbol table. It uses the typical string table format described above under “String Table Format.” For the string table that serves the dynamic symbol table, see the discussion of the .dynsym section.

9.6.3.15. .symtab (symbol table)

This is the full (main) symbol table, which also includes all static functions and variables. It is used during the linking phase, when an executable or shared library is being built, and is not used at run time. In fact, this symbol table may be only partly loaded into the address space at run time, or not loaded at all.

In the executable foo, the .symtab section is at file offset 0x29c4 and is 0x510 bytes in size, as shown here:

penguin> readelf -S foo | egrep symtab

  [33] .symtab  SYMTAB        00000000 0029c4 000510 10    34 3a 4

 

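Using the readelf command, we can see that none of the ELF segments contains this section: the two LOAD segments together cover file offsets 0x0 through 0x93f (0x76c + 0x1d4 = 0x940), well short of the section’s offset of 0x29c4: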

penguin> readelf -l foo | head -16

 

Elf file type is EXEC (Executable file)

Entry point 0x8048540

There are 7 program headers, starting at offset 52

 

Program Headers:

  Type     Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align

  PHDR     0x000034 0x08048034 0x08048034 0x000e0 0x000e0 R E 0x4

  INTERP   0x000114 0x08048114 0x08048114 0x00013 0x00013 R  0x1

      [Requesting program interpreter: /lib/ld-linux.so.2]

  LOAD     0x000000 0x08048000 0x08048000 0x0076c 0x0076c R E 0x1000

  LOAD     0x00076c 0x0804976c 0x0804976c 0x001d4 0x001dc RW  0x1000

  DYNAMIC  0x000810 0x08049810 0x08049810 0x000f0 0x000f0 RW  0x4

  NOTE     0x000128 0x08048128 0x08048128 0x00020 0x00020 R   0x4

  GNU_EH_FRAME  0x000748 0x08048748 0x08048748 0x00024 0x00024 R 0x4
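
The comparison between the section’s file offset and the segments’ file ranges can be written out as a small C sketch. The type load_range and the function find_load_range are hypothetical names introduced for this illustration; the two table entries are the Offset and FileSiz values of the LOAD lines above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* File extent of one loadable segment (illustrative helper type). */
struct load_range {
    uint32_t offset;   /* p_offset: where the segment starts in the file */
    uint32_t filesz;   /* p_filesz: how many file bytes it covers */
};

/* Returns the index of the first range whose file bytes include
 * file_off, or -1 if no range covers that offset. */
int find_load_range(const struct load_range *r, size_t n, uint32_t file_off)
{
    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (file_off >= r[i].offset && file_off - r[i].offset < r[i].filesz)
            return (int)i;
    }
    return -1;
}

/* The two LOAD entries from the readelf -l output for foo. */
const struct load_range foo_loads[2] = {
    { 0x000000, 0x0076c },
    { 0x00076c, 0x001d4 },
};
```

For the .symtab offset 0x29c4 the search finds no covering segment, whereas an offset such as 0x810 (the start of the DYNAMIC segment) falls inside the second LOAD segment.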

 

Because this symbol table is not loaded at run time, it is impossible to write a stack traceback function that finds and displays the names of static functions from within the program itself.

Stack traceback functions are often called from within a signal handler to dump the stack trace to a file when a trap occurs. Some of the functions on the stack may be static, and the program’s address space contains no symbol table that can map the addresses of those functions to their names.

9.6.3.16. .text

This section contains the executable code of an ELF file. All of the compiled functions in an ELF file are in this section, which makes it the most important part of the file: the remaining sections, and all of the complexity of ELF, exist to allow the instructions in this section to run. The file foo.C defines three functions:

penguin> nm libfoo.so |egrep " T "

00000916 T _Z3bazi

00000a60 T _fini

00000768 T _init

 

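The functions compiled from foo.C, such as _Z3bazi at 0x916, fall within the range of the .text section, which runs from 0x7e0 up to 0xa60 (0x7e0 + 0x280). The _init and _fini functions lie just outside it, in the .init and .fini sections: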

penguin> readelf -S libfoo.so | egrep text

[10] .text    PROGBITS     000007e0 0007e0 000280 00  AX  0  0 16

 

Because the .text section contains executable instructions, the best way to view its contents is to disassemble the ELF file with the objdump tool and look for the .text section in the output:

penguin> objdump -d libfoo.so

<...>

Disassembly of section .text:

<...>

000009d6 <_Z3fooi>:

9d6:    55                     push   %ebp

9d7:    89 e5                  mov    %esp,%ebp

9d9:    53                     push   %ebx

9da:    83 ec 04               sub    $0x4,%esp

9dd:    e8 00 00 00 00         call   9e2 <_Z3fooi+0xc>

9e2:    5b                     pop    %ebx

9e3:    81 c3 56 13 00 00      add    $0x1356,%ebx

9e9:    c7 45 f8 00 00 00 00   movl   $0x0,0xfffffff8(%ebp)

9f0:    8b 45 08               mov    0x8(%ebp),%eax

9f3:    89 45 f8               mov    %eax,0xfffffff8(%ebp)

9f6:    83 7d f8 63            cmpl   $0x63,0xfffffff8(%ebp)

9fa:    7e 02                  jle    9fe <_Z3fooi+0x28>

9fc:    eb 07                  jmp    a05 <_Z3fooi+0x2f>

9fe:    8d 45 f8               lea    0xfffffff8(%ebp),%eax

a01:    ff 00                  incl   (%eax)

a03:    eb f1                  jmp    9f6 <_Z3fooi+0x20>

a05:    8b 93 20 00 00 00      mov    0x20(%ebx),%edx

a0b:    8b 45 08               mov    0x8(%ebp),%eax

a0e:    89 02                  mov    %eax,(%edx)

a10:    8b 45 08               mov    0x8(%ebp),%eax

a13:    03 45 f8               add    0xfffffff8(%ebp),%eax

a16:    83 c4 04               add    $0x4,%esp

a19:    5b                     pop    %ebx

a1a:    5d                     pop    %ebp

a1b:    c3                     ret

 

The output lists all of the functions from the source files, as well as a few additional functions that support the inner workings of ELF.
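
The call, pop, and add instructions at 0x9dd through 0x9e3 in the listing above look like the usual i386 position-independent-code prologue, in which %ebx ends up holding the address of the GOT. The listing itself does not say so, so treat that reading as an assumption. Under it, the address arithmetic can be sketched as follows; pic_base, ebx_operand, and foo_first_got_operand are names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Value left in %ebx by the prologue: the return address popped after
 * the call, plus the immediate operand of the add instruction. */
uint32_t pic_base(uint32_t popped_addr, uint32_t add_imm)
{
    return popped_addr + add_imm;
}

/* Address referenced by an operand of the form disp(%ebx). */
uint32_t ebx_operand(uint32_t base, uint32_t disp)
{
    return base + disp;
}

/* Numbers from the listing: the pop at 0x9e2 receives 0x9e2, the add
 * immediate is 0x1356, and the later mov uses 0x20(%ebx). */
uint32_t foo_first_got_operand(void)
{
    uint32_t ebx = pic_base(0x9e2, 0x1356);

    assert(ebx == 0x1d38);   /* library-relative value, before loading */
    return ebx_operand(ebx, 0x20);
}
```

The result, 0x1d58, equals the Offset that the readelf -r listing in the .rel section of this chapter reports for noValueGlobInt, which is consistent with the mov at 0xa05 fetching that variable’s address from its GOT slot.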

9.6.3.17. .rel

This is a relocation section. The name of a relocation section is formed by prefixing .rel to the name of the section it operates on. For example, .rel.text is a relocation table that works with the .text section. Relocation is a critical part of ELF because it allows shared libraries to be loaded anywhere in the address space.

Relocation is the process of changing an address in a loaded ELF section to the current address of the corresponding function or variable, as illustrated in Figure 9.4.

Figure 9.4. Relocations.

 

Figure 9.4 shows the sections needed to explain relocation. The text segment contains the procedure linkage table (PLT), the relocation sections, and the executable code. The data segment contains the global and static variables as well as the global offset table (GOT).

A function reference first goes to the appropriate slot in the PLT. The PLT slot then jumps to the address stored in the corresponding slot in the GOT. The address in the GOT may point back to a function in the text segment, or it may point to a function in another shared library. In the diagram, this reference goes to another shared library.

A variable reference goes directly to the GOT and then to the address of the variable. The reference could go to another shared library (or executable) or, as it does in the diagram, to the data segment of the same shared library.

Relocation in this case can be as simple as changing the addresses stored in the GOT.
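
The way a function call passes through a GOT slot, and the way changing that slot redirects later calls, can be imitated in plain C with a function pointer. This is only a simulation under simplifying assumptions: a real PLT entry and the dynamic linker’s resolver are written in assembly, and every name below (got_slot, resolver_stub, real_function, call_via_plt, resolver_calls) is made up for the sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*func_t)(int);

/* Counts how often the simulated resolver runs. */
int resolver_calls = 0;

/* Stand-in for a library function such as sleep. */
static int real_function(int x)
{
    return x * 2;
}

static int resolver_stub(int x);

/* Simulated GOT slot: it starts out pointing at the resolver. */
static func_t got_slot = resolver_stub;

/* Simulated resolver: stores the real address in the slot, then
 * completes the call that triggered it. */
static int resolver_stub(int x)
{
    resolver_calls++;
    got_slot = real_function;
    return got_slot(x);
}

/* Simulated PLT entry: an indirect jump through the GOT slot. */
int call_via_plt(int x)
{
    assert(got_slot != NULL);
    return got_slot(x);
}
```

The first call through call_via_plt runs the resolver once and patches the slot. Every later call reaches real_function directly, which mirrors the behavior seen earlier with sleep.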

The libfoo.so shared library has two sections whose names are prefixed by .rel:

penguin> readelf -S libfoo.so | egrep rel

  [ 6] .rel.dyn    REL       000006e0 0006e0 000060 08  A 2  0 4

  [ 7] .rel.plt    REL       00000740 000740 000028 08  A 2  9 4

 

These are both relocation sections that contain entries of the form:

這些都是包含表單條目的重定位節:

typedef struct

{

  Elf32_Addr     r_offset;   /* Address */

  Elf32_Word     r_info;     /* Relocation type and symbol index */

} Elf32_Rel;

Note: Other platforms may use .rela sections and the corresponding Elf32_Rela structure (see /usr/include/elf.h for this structure).

 

The r_offset field is the target address or offset that the relocation changes. For object files, it is an offset within the affected section. For shared libraries and executables, it is the virtual address of the location to be changed. The r_info field contains the relocation type and the symbol index. We can see the relocation information using readelf:

penguin> readelf -r libfoo.so

 

Relocation section '.rel.dyn' at offset 0x6e0 contains 12 entries:

 Offset     Info    Type            Sym.Value  Sym. Name

00001b00  00000008 R_386_RELATIVE

00001b04  00000008 R_386_RELATIVE

00001b68  00000008 R_386_RELATIVE

00001d24  00000008 R_386_RELATIVE

00001d5c  00000008 R_386_RELATIVE

00001b6c  00002101 R_386_32      00000000   __gxx_personality_v0

00001d58  00001e06 R_386_GLOB_DAT   00001d80   noValueGlobInt

00001d60  00002006 R_386_GLOB_DAT   00001b20   globInt

00001d64  00002606 R_386_GLOB_DAT   00000000   __cxa_finalize

00001d68  00002c06 R_386_GLOB_DAT   00001d7c   myObj2

00001d6c  00002d06 R_386_GLOB_DAT   00000000   _Jv_RegisterClasses

00001d70  00002e06 R_386_GLOB_DAT   00000000   __gmon_start__

 

Relocation section '.rel.plt' at offset 0x740 contains 5 entries:

 Offset     Info    Type            Sym.Value Sym. Name

00001d44  00001d07 R_386_JUMP_SLOT   000009d6 _Z3fooi

00001d48  00002407 R_386_JUMP_SLOT   00000000 printf

00001d4c  00002607 R_386_JUMP_SLOT   00000000 __cxa_finalize

00001d50  00002b07 R_386_JUMP_SLOT   000009c8 _ZN7myClassC1Ev

00001d54  00002d07 R_386_JUMP_SLOT   00000000 _Jv_RegisterClasses

 

Notice that the relocations for variables are in one section (.rel.dyn) and the relocations for function calls are in another (.rel.plt).

Static variables and functions do not need to be relocated because, within a shared library, they are always referenced using a relative offset. In executables, they are referenced using absolute addresses.
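
The Info column in the readelf -r listing packs two values into one word: the symbol index in the upper 24 bits and the relocation type in the low 8 bits. The sketch below decodes it with local equivalents of the ELF32_R_SYM and ELF32_R_TYPE macros from /usr/include/elf.h, and also derives the entry counts from the section sizes. The function names are illustrative, and the type names follow readelf’s spelling in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Local equivalents of ELF32_R_SYM and ELF32_R_TYPE. */
uint32_t rel_sym(uint32_t r_info)  { return r_info >> 8; }
uint32_t rel_type(uint32_t r_info) { return r_info & 0xff; }

/* Number of fixed-size entries in a section: sh_size / sh_entsize. */
uint32_t entry_count(uint32_t sh_size, uint32_t sh_entsize)
{
    assert(sh_entsize != 0);
    return sh_size / sh_entsize;
}

/* The i386 relocation type numbers that occur in the listing. */
const char *rel_type_name(uint32_t type)
{
    switch (type) {
    case 1:  return "R_386_32";
    case 6:  return "R_386_GLOB_DAT";
    case 7:  return "R_386_JUMP_SLOT";
    case 8:  return "R_386_RELATIVE";
    default: return "unknown";
    }
}
```

For globInt, the Info value 0x00002006 decodes to symbol index 0x20 and type 6 (R_386_GLOB_DAT). The section sizes 0x60 and 0x28, divided by the entry size of 8, give the 12 and 5 entries that readelf reports for .rel.dyn and .rel.plt.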

The file foo.C defines two objects, one static and one global, as follows:

static myClass myObj ;

myClass myObj2 ;

 

Only global variables appear in the relocation sections. Searching for the variables globInt and staticInt, for example, finds an entry for globInt alone:

penguin> readelf -r libfoo.so |egrep "globInt|staticInt"

00001d60  00002006 R_386_GLOB_DAT    00001b20   globInt

 

The type is R_386_GLOB_DAT and the symbol value is 0x1b20, which is the value of globInt in the symbol table. The relocation offset for globInt is 0x1d60, which is the address of the globInt slot in the GOT. More information on relocations follows in the next section of this chapter.
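
To make the effect of such an entry concrete, here is a rough C sketch of how a loader might apply it. It is not the dynamic linker’s code. It assumes the standard i386 rules (a GLOB_DAT or JMP_SLOT slot receives the symbol’s address, and a RELATIVE location has the load address added to it), a made-up load address, and that globInt resolves to libfoo.so’s own definition. The names demo_rel, apply_rel, and demo_globint_slot are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* i386 relocation type numbers, renamed to avoid clashing with elf.h. */
enum { DEMO_GLOB_DAT = 6, DEMO_JMP_SLOT = 7, DEMO_RELATIVE = 8 };

/* Same layout as Elf32_Rel. */
typedef struct {
    uint32_t r_offset;
    uint32_t r_info;
} demo_rel;

/* Applies one relocation to image, a copy of the library's bytes
 * indexed by library-relative address. base is the load address and
 * sym_addr the run-time address of the symbol (unused for RELATIVE).
 * Returns 0 on success, -1 for a bad offset or an unhandled type. */
int apply_rel(uint8_t *image, size_t size, uint32_t base,
              const demo_rel *rel, uint32_t sym_addr)
{
    uint32_t word;

    assert(image != NULL && rel != NULL);
    if (size < sizeof word || rel->r_offset > size - sizeof word)
        return -1;
    memcpy(&word, image + rel->r_offset, sizeof word);
    switch (rel->r_info & 0xff) {
    case DEMO_GLOB_DAT:
    case DEMO_JMP_SLOT:
        word = sym_addr;     /* the slot receives the symbol's address */
        break;
    case DEMO_RELATIVE:
        word += base;        /* the stored value is shifted by the base */
        break;
    default:
        return -1;
    }
    memcpy(image + rel->r_offset, &word, sizeof word);
    return 0;
}

/* Applies the globInt entry (offset 0x1d60, info 0x2006, symbol value
 * 0x1b20) for a given load address and returns the resulting slot. */
uint32_t demo_globint_slot(uint32_t base)
{
    static uint8_t image[0x2000];
    const demo_rel rel = { 0x1d60, 0x2006 };
    uint32_t slot = 0;

    if (apply_rel(image, sizeof image, base, &rel, base + 0x1b20) != 0)
        return 0;
    memcpy(&slot, image + rel.r_offset, sizeof slot);
    return slot;
}
```

With a hypothetical load address of 0x40000000, the GOT slot at library offset 0x1d60 ends up holding 0x40001b20, the run-time address of globInt under the stated assumptions.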

 
