Interpreting the PE/COFF File Format

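While developing an operating system on Windows and compiling the source files with the Cygwin build of GCC, I ran into the following problem. Running gcc -c bootpack.c produces the file bootpack.o. Open that file in a hex editor and you will see something like this: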

[Figure 1: hex dump of bootpack.o, with the readable COFF text marked by red boxes]

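The readable text inside the red boxes is defined by the COFF file format. This text, together with some of the other binary data, is not part of the program itself; it is auxiliary data that helps the system load and run the program. Windows understands this auxiliary data, but Linux and other operating systems do not load it natively, and a small home-made operating system certainly does not.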

So how can this be solved? Cygwin ships many tools for working with binary files, objcopy among them. Run objcopy -O binary bootpack.o bootpack.bin, then look at the contents of bootpack.bin.

[Figure 2: hex dump of bootpack.bin]

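There is far less content, and the data from the red boxes in the previous figure is gone. Look more closely and you will find that the bytes in Figure 2 are the bytes at offsets 0x0000008C through 0x000000CB in Figure 1. Those bytes are the actual machine instructions. What, then, do all the other parts of bootpack.o mean, and what are they for? Out of curiosity I downloaded the document "Visual Studio, Microsoft Portable Executable and Common Object File Format Specification" from Microsoft and, following it, wrote a small program called olink that parses .exe, .obj, .dll and similar files.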
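The relationship between the two dumps can be made concrete. The sketch below uses a hypothetical helper, copy_raw_section (it is not part of objcopy or olink), that copies a run of bytes starting at a given file offset. With offset 0x8C and length 64 it selects exactly the bytes 0x8C through 0xCB. It only illustrates what ended up in bootpack.bin for this particular file, and does not describe how objcopy works internally.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy one section's raw bytes out of an object file
 * that has already been read into memory.
 * Returns the number of bytes copied, or 0 if the requested range does not
 * fit inside the file or inside the output buffer. */
size_t copy_raw_section(const uint8_t *file, size_t file_len,
                        uint32_t pointer_to_raw_data,
                        uint32_t size_of_raw_data,
                        uint8_t *out, size_t out_cap)
{
    if (pointer_to_raw_data > file_len ||
        size_of_raw_data > file_len - pointer_to_raw_data ||
        size_of_raw_data > out_cap) {
        return 0;
    }
    memcpy(out, file + pointer_to_raw_data, size_of_raw_data);
    return size_of_raw_data;
}
```

Calling copy_raw_section(file, file_len, 0x8C, 64, out, sizeof out) on the contents of bootpack.o would be expected to yield the 64 bytes shown in Figure 2.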

 

First, let's look at what the program prints when it parses the bootpack.o file from above.

The source code of bootpack.c is very simple:

/* Colimas Simple OS */

void io_hlt(void);
void write_mem8(int addr, int data);

// entry
void ColimasMain(void)
{
    int i;

    for (i = 0xa000; i <= 0xaffff; i++) {
        write_mem8(i, 15);
    }

    for (;;)
        io_hlt();
}

 

You can ignore the details of the source code entirely. Running olink bootpack.o produces the following:

This is an image file.

1. Image file header info:

              Image file machine type:Intel 386 or later processors and compatible processors

              The number of sections:3

              Number of symbols:12

              Pointer of symbols table:0xe0

              Characteristics:

                            Machine is based on a 32-bit-word architecture.

2. The sections info of image file:

              1 .text:

                            The virtual size       :0

                            The virtual address    :0x0

                            The size of raw data   :64

                            The pointer to raw data:0x8c

                            The characteristics of the section:

                                          The section contains executable code.

                                          The section contains initialized data.

              The section has relocations.

              2 .data:

                            The virtual size       :0

                            The virtual address    :0x0

                            The size of raw data   :0

                            The pointer to raw data:0x0

                            The characteristics of the section:

                                          The section contains initialized data.

                                          The section contains uninitialized data.

              3 .bss:

                            The virtual size       :0

                            The virtual address    :0x0

                            The size of raw data   :0

                            The pointer to raw data:0x0

                            The characteristics of the section:

                                          The section contains initialized data.

                                          The section contains uninitialized data.

3. Symbol table of image file(12).

              1. .file

                            Value:Not yet assigned a section.

                            type:Base type.

                            Storage class:A value that Microsoft tools, as well as traditional COFF format, use for the source-file symbol record.

                            Number of section:-2

              2. Files

                            name:bootpack.c

              3. _ColimasMain

                            Value:Not yet assigned a section.

                            type:A function that returns a base type.

                            Storage class:A value that Microsoft tools use for external symbols.

                            Number of section:1

              4. Function Definitions

                            Tag index:0

                            Total size:0

                            Pointer to line number:0x0

                            Pointer to next function:0x0

              5. .text

                            Value:Not yet assigned a section.

                            type:Base type.

                            Storage class:The offset of the symbol within the section.

                            Number of section:1

              6. Section Definitions:

                            Length:55

                            Number of relocations:2

                            Number of line numbers:0

                            One-based index into the section table:0

              7. .data

                            Value:Not yet assigned a section.

                            type:Base type.

                            Storage class:The offset of the symbol within the section.

                            Number of section:2

              8. Section Definitions:

                            Length:0

                            Number of relocations:0

                            Number of line numbers:0

                            One-based index into the section table:0

              9. .bss

                            Value:Not yet assigned a section.

                            type:Base type.

                            Storage class:The offset of the symbol within the section.

                            Number of section:3

              10. Section Definitions:

                            Length:0

                            Number of relocations:0

                            Number of line numbers:0

                            One-based index into the section table:0

              11. _io_hlt

                            Value:Not yet assigned a section.

                            type:A function that returns a base type.

                            Storage class:A value that Microsoft tools use for external symbols.

                            Number of section:0

              12. _write_mem8

                            Value:Not yet assigned a section.

                            type:A function that returns a base type.

                            Storage class:A value that Microsoft tools use for external symbols.

                            Number of section:0

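That looks like a lot of content, but compared with more complex EXE programs this output is actually very short. After all, the source file is simple and does not use any dynamic-link libraries. If you are eager to see a more complex result, run the tool on an intermediate obj file compiled in debug mode. A debug-mode obj file stores the source line numbers used for debugging, along with other information. That is why files built in debug mode are larger than those built in Release mode, and why files built in Release mode cannot be debugged at the source level in the same way.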

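Implementing olink is not complicated, and my earlier experience parsing Java Class files made it easier still. The program has just two steps: read the data, then print the results.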
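To give a feel for the "read the data" step, here is a minimal sketch of decoding the 20-byte COFF file header that sits at the start of an object file. It is not olink's actual source, which this post does not show. The struct, the function names and the little-endian helpers rd16/rd32 are names made up for this illustration; the field order and sizes follow the PE/COFF specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of the 20-byte COFF file header, in file order. */
typedef struct {
    uint16_t machine;                 /* 0x14c = Intel 386 and compatible */
    uint16_t number_of_sections;
    uint32_t time_date_stamp;
    uint32_t pointer_to_symbol_table; /* file offset of the symbol table */
    uint32_t number_of_symbols;
    uint16_t size_of_optional_header; /* 0 for object files */
    uint16_t characteristics;         /* bit flags, e.g. 0x0100 = 32-bit machine */
} CoffFileHeader;

/* PE/COFF stores integers little-endian; read them byte by byte so the
 * code does not depend on host endianness or struct padding. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the header from buf. Returns 0 on success, -1 if buf is too short. */
int parse_coff_file_header(const uint8_t *buf, size_t len, CoffFileHeader *h)
{
    if (len < 20) {
        return -1;
    }
    h->machine                 = rd16(buf + 0);
    h->number_of_sections      = rd16(buf + 2);
    h->time_date_stamp         = rd32(buf + 4);
    h->pointer_to_symbol_table = rd32(buf + 8);
    h->number_of_symbols       = rd32(buf + 12);
    h->size_of_optional_header = rd16(buf + 16);
    h->characteristics         = rd16(buf + 18);
    return 0;
}

/* The section table follows the file header and the optional header. */
size_t section_table_offset(const CoffFileHeader *h)
{
    return 20u + h->size_of_optional_header;
}
```

As a sanity check against the output above: bootpack.o has 3 sections of 40 bytes each and, assuming its optional header size is 0, the headers end at 20 + 3 × 40 = 140 = 0x8c, which is where olink reports the .text raw data to begin.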

The data it reads are:

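1. The PE/COFF file header. It contains: the target machine type, for example the line "Image file machine type:Intel 386 or later processors and compatible processors" in the output above; the number of sections (NumberOfSections), where a section is a part of the file holding one kind of content, for example code in the .text section and data in the .data section; TimeDateStamp, which this article skips; the symbol table address PointerToSymbolTable, where the symbol table holds the ASCII names used as identifiers in the file together with some attribute values, such as a function name and the address of that function's instructions; the symbol count NumberOfSymbols; the size of the optional header ("Optional header info of image file"), data that exists in .exe and .dll files and is therefore missing from the object file output above; and the file Characteristics, for example "Image only, Windows CE, and Microsoft Windows NT and later", or "The image file is a dynamic-link library (DLL)", or "The image file is a system file, not a user program", and so on.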

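2. The optional header. It contains: the file identifier Magic, which is 0x10b for every PE32 file and 0x20b for PE32+, where PE32+ allows a 64-bit address space; the linker version; the total size of the code (.text sections); the size of the initialized data (.data); the size of the uninitialized data (.bss); the subsystem the file requires, for example "Device drivers and native Windows processes", or "The Windows graphical user interface (GUI) subsystem", or Windows CE, or XBOX; the program entry point address, from which execution reaches functions such as WinMain or main; the Data Directories, an array of entries that each hold an address and a size and describe, in turn, the Export Table, Import Table, Resource Table, Exception Table, Base Relocation Table, Debug data and so on; and some further fields, for which see the PE/COFF specification.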

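3. The section table. Each entry has a fixed length of 40 bytes and contains: the name, for example .text, .data or .bss; the size; the address; and the section flags, for example whether the section contains executable code, initialized data or uninitialized data.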

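4. The symbol table. Each entry contains: the symbol name, for example a function name or a section name; the number of the section the symbol belongs to; and the type, for example whether the symbol is a type name or a function name.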

5. The string table. It stores every string needed by the symbol table that is longer than 8 bytes.
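Items 3 to 5 can be sketched in code as well. The block below is an illustration under stated assumptions, not olink's implementation. The names SectionHeader, parse_section_header, symbol_name, le16 and le32 are hypothetical. The byte offsets, the 40-byte section header, the 18-byte symbol record and the three flag values come from the PE/COFF specification. The caller is assumed to pass a pointer to the string table, whose first 4 bytes hold the table's total size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    SECTION_HEADER_SIZE        = 40,
    SYMBOL_RECORD_SIZE         = 18,
    SCN_CNT_CODE               = 0x00000020, /* executable code */
    SCN_CNT_INITIALIZED_DATA   = 0x00000040,
    SCN_CNT_UNINITIALIZED_DATA = 0x00000080
};

typedef struct {
    char     name[9];              /* 8 bytes in the file, plus a NUL */
    uint32_t virtual_size;
    uint32_t virtual_address;
    uint32_t size_of_raw_data;
    uint32_t pointer_to_raw_data;
    uint16_t number_of_relocations;
    uint32_t characteristics;
} SectionHeader;

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode one 40-byte section table entry starting at p. */
void parse_section_header(const uint8_t *p, SectionHeader *s)
{
    memcpy(s->name, p, 8);
    s->name[8] = '\0';
    s->virtual_size          = le32(p + 8);
    s->virtual_address       = le32(p + 12);
    s->size_of_raw_data      = le32(p + 16);
    s->pointer_to_raw_data   = le32(p + 20);
    /* p + 24: PointerToRelocations, p + 28: PointerToLinenumbers (skipped) */
    s->number_of_relocations = le16(p + 32);
    /* p + 34: NumberOfLinenumbers (skipped) */
    s->characteristics       = le32(p + 36);
}

/* Resolve the name of one 18-byte symbol record.
 * Names of up to 8 bytes are stored inline; for longer names the first
 * 4 bytes are zero and the next 4 are an offset into the string table.
 * Returns 0 on success, -1 on a bad offset or a too-small buffer. */
int symbol_name(const uint8_t *sym, const uint8_t *strtab, size_t strtab_len,
                char *out, size_t out_cap)
{
    if (out_cap < 9) {
        return -1;
    }
    if (le32(sym) != 0) {              /* short name, NUL padded */
        memcpy(out, sym, 8);
        out[8] = '\0';
        return 0;
    }
    uint32_t off = le32(sym + 4);
    if (off < 4 || off >= strtab_len) {
        return -1;
    }
    const uint8_t *s = strtab + off;
    size_t max = strtab_len - off;
    size_t n = 0;
    while (n < max && s[n] != '\0') {
        n++;
    }
    if (n == max || n + 1 > out_cap) { /* unterminated, or too long for out */
        return -1;
    }
    memcpy(out, s, n + 1);
    return 0;
}
```

In the bootpack.o output, _io_hlt (7 characters) fits in the 8-byte name field, while _ColimasMain (12) and _write_mem8 (11) do not, so those two would be expected to live in the string table.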

The content that olink parses shows how complex and how thorough the PE/COFF file format is.

 