Introduction
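When application code is not robust, the program may crash. The PC register, the A0 register, the EXCCAUSE register, and the backtrace are usually enough to make a first pass at locating the problem.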
For example, the crash information in the screenshot:
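The methods below are meant to give the reader a first impression of the state of the system at the moment of the crash. They do not guarantee that the program crashed exactly at the location the analysis points to.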
Locating the crash via backtrace
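In the screenshot above, the yellow output is the backtrace, that is, the call stack at the time of the crash.
From the backtrace, the problem can be narrowed down to around line 21 of user_main.c, so the printf shown in the figure is the likely culprit.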
Locating the crash via the PC/A0 registers
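The contents of the PC and A0 registers usually also point to the code near the crash (PC is the address of the instruction being executed, and A0 holds the return address of the current function).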
How to judge the crash cause from PC:
1. If PC is 0x00000000, the code probably invoked a NULL callback or called through a NULL function pointer.
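2. If PC is a wild pointer, for example 0x80001210, memory has probably been corrupted. Once the PC saved at the bottom of the stack is overwritten, an instruction such as retw.n pops a wild pointer.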
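Wild PC pointer: in the author's view, a pointer that lies outside the normal IRAM range.
ESP8266: the normal IRAM address range is 0x40100000 - 0x40201010+appsize (see esp8266.ld);
ESP32: the normal IRAM address range is 0x40080000 - 0x40400000 (see esp32.ld).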
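Memory corruption has many possible causes, for example:
- Using unsafe APIs such as strcat(), strcpy(), sprintf()
- Out-of-bounds memory operations with memset, memcpy, memmove
- Boundary errors: out-of-bounds array access, or writing sizeof where strlen was intended
- A task stack that is too small, so the program overruns it at run time
- Returning the address of a local variable from a function
- Using a pointer that has already been freed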
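To make the first bullet concrete, here is a minimal C sketch of a bounded alternative to strcpy()/strcat()/sprintf(). The helper name build_label and its return convention are hypothetical and chosen only for illustration; the only library call relied on is the standard snprintf, which never writes more than the given buffer size and reports the length the full string would have needed.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes "<prefix><name>" into dst without ever
 * writing past dst[dst_size - 1].
 * Returns 0 on success, -1 on invalid arguments or if the result
 * had to be truncated. */
int build_label(char *dst, size_t dst_size, const char *prefix, const char *name)
{
    if (dst == NULL || dst_size == 0 || prefix == NULL || name == NULL) {
        return -1;
    }

    /* snprintf writes at most dst_size bytes, including the terminating
     * NUL, and returns the length the complete string would have had. */
    int needed = snprintf(dst, dst_size, "%s%s", prefix, name);

    if (needed < 0 || (size_t)needed >= dst_size) {
        return -1; /* truncated: dst is still NUL-terminated */
    }
    return 0;
}
```

Passing sizeof(buf) as dst_size is only correct when buf is a real array in scope; if it has decayed to a pointer, sizeof yields the pointer size, which is one form of the strlen/sizeof mix-up listed above.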
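3. If PC looks normal, for example 0x40201234, there are two possibilities.
The first is that the PC value really is valid. In that case, look at the function that the PC value falls in.
The second is that the PC value looks normal but is actually invalid.
For example, PC points to a function located in flash, but the cache was disabled just before the function ran (typically during a flash operation). The function cannot be fetched, so the program crashes.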
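The three cases above can be sketched as a small classification routine in C. This is only an illustration: the enum and function names are hypothetical, the range is the ESP32 one quoted in this article (treating the upper bound as exclusive is an assumption), and an in-range result still does not prove the PC is valid, as the cache-disabled case shows.

```c
#include <stdint.h>

/* ESP32 range as quoted in this article; verify against the linker
 * script of the SDK version in use. Exclusive upper bound is assumed. */
#define ESP32_CODE_START 0x40080000u
#define ESP32_CODE_END   0x40400000u

typedef enum {
    PC_NULL,         /* case 1: NULL callback or function pointer */
    PC_OUT_OF_RANGE, /* case 2: wild pointer, suspect memory corruption */
    PC_IN_RANGE      /* case 3: looks normal, inspect the disassembly */
} pc_class_t;

/* Hypothetical helper that sorts a crash PC into one of the three cases. */
pc_class_t classify_pc_esp32(uint32_t pc)
{
    if (pc == 0u) {
        return PC_NULL;
    }
    if (pc < ESP32_CODE_START || pc >= ESP32_CODE_END) {
        return PC_OUT_OF_RANGE;
    }
    return PC_IN_RANGE;
}
```

For instance, classify_pc_esp32(0x80001210) falls into the wild-pointer case, while classify_pc_esp32(0x40228115) is in range and calls for a look at the disassembly.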
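The concrete steps for locating a crash via the PC/A0 registers are as follows:
Enter the application's build directory and disassemble the firmware with the commands below.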
ESP8266:
cd build
xtensa-lx106-elf-objdump -S xxx.elf > a.S
ESP32:
cd build
xtensa-esp32-elf-objdump -S xxx.elf > a.S
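- -S: display source code intermixed with the disassembly.
- Readers who are not familiar with the roles of the PC/A0 registers can consult a computer organization textbook or similar material.
- xtensa-lx106-elf-objdump / xtensa-esp32-elf-objdump behave almost identically to objdump on Linux.
- Run xtensa-lx106-elf-objdump --help / xtensa-esp32-elf-objdump --help for a detailed description.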
Next, search a.S for the values of the PC and A0 registers. PC: 0x40228115 is used as the example here:
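As shown in the figure:
PC: 0x40228115 points into the disassembly of the printf call. The two preceding assembly instructions load a value from memory address 0 into register a3.
Under the C argument-passing convention, this corresponds to the argument *p being invalid: p is NULL, and reading the contents of address 0x0 is an illegal read, so the program crashes.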
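The source code behind the screenshot is not reproduced in the text, so the following C sketch is a hypothetical reconstruction of the pattern just described, together with a guarded variant. The function name print_value_checked is an assumption made for illustration. The unguarded form, printf("%d\n", *p) with p == NULL, is what would trigger the load from address 0.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical guarded version of the crashing pattern.
 * Returns 0 if the value was printed, -1 if p was NULL. */
int print_value_checked(const int *p)
{
    if (p == NULL) {
        /* Dereferencing here would read address 0x0 and raise a
         * load exception, as in the crash analyzed above. */
        printf("value: <null pointer>\n");
        return -1;
    }
    printf("value: %d\n", *p);
    return 0;
}
```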
Locating the crash via the EXCCAUSE register
The EXCCAUSE register (short for exception cause) records the reason for the crash when the program hits an exception, and can be used as a reference.
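In the screenshot from the Introduction, EXCCAUSE: 0x0000001c (decimal 28) corresponds to LoadProhibitedCause, meaning that a load referenced a page mapped with an attribute that does not permit loads.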
Other EXCCAUSE codes can be looked up in the Exception Causes table below.
Exception Causes
EXCCAUSE Code | Cause Name | Cause Description [Required Option] | EXCVADDR Loaded |
---|---|---|---|
0 | IllegalInstructionCause | Illegal instruction [Exception Option] | No |
1 | SyscallCause | SYSCALL instruction [Exception Option] | No |
2 | InstructionFetchErrorCause | Processor internal physical address or data error during instruction fetch [Exception Option] | Yes |
3 | LoadStoreErrorCause | Processor internal physical address or data error during load or store [Exception Option] | Yes |
4 | Level1InterruptCause | Level-1 interrupt as indicated by set level-1 bits in the INTERRUPT register [Interrupt Option] | No |
5 | AllocaCause | MOVSP instruction, if caller’s registers are not in the register file [Windowed Register Option] | No |
6 | IntegerDivideByZeroCause | QUOS, QUOU, REMS, or REMU divisor operand is zero [32-bit Integer Divide Option] | No |
7 | / | Reserved for Tensilica | / |
8 | PrivilegedCause | Attempt to execute a privileged operation when CRING ≠ 0 [MMU Option] | No |
9 | LoadStoreAlignmentCause | Load or store to an unaligned address [Unaligned Exception Option] | Yes |
10…11 | / | Reserved for Tensilica | / |
12 | InstrPIFDataErrorCause | PIF data error during instruction fetch [Processor Interface Option] | Yes |
13 | LoadStorePIFDataErrorCause | Synchronous PIF data error during LoadStore access [Processor Interface Option] | Yes |
14 | InstrPIFAddrErrorCause | PIF address error during instruction fetch [Processor Interface Option] | Yes |
15 | LoadStorePIFAddrErrorCause | Synchronous PIF address error during LoadStore access [Processor Interface Option] | Yes |
16 | InstTLBMissCause | Error during Instruction TLB refill [MMU Option] | Yes |
17 | InstTLBMultiHitCause | Multiple instruction TLB entries matched [MMU Option] | Yes |
18 | InstFetchPrivilegeCause | An instruction fetch referenced a virtual address at a ring level less than CRING [MMU Option] | Yes |
19 | / | Reserved for Tensilica | / |
20 | InstFetchProhibitedCause | An instruction fetch referenced a page mapped with an attribute that does not permit instruction fetch [Region Protection Option or MMU Option] | Yes |
21…23 | / | Reserved for Tensilica | / |
24 | LoadStoreTLBMissCause | Error during TLB refill for a load or store [MMU Option] | Yes |
25 | LoadStoreTLBMultiHitCause | Multiple TLB entries matched for a load or store [MMU Option] | Yes |
26 | LoadStorePrivilegeCause | A load or store referenced a virtual address at a ring level less than CRING [MMU Option] | Yes |
27 | / | Reserved for Tensilica | / |
28 | LoadProhibitedCause | A load referenced a page mapped with an attribute that does not permit loads [Region Protection Option or MMU Option] | Yes |
29 | StoreProhibitedCause | A store referenced a page mapped with an attribute that does not permit stores [Region Protection Option or MMU Option] | Yes |
30…31 | / | Reserved for Tensilica | / |
32…39 | CoprocessornDisabled | Coprocessor n instruction when cpn disabled. n varies 0…7 as the cause varies 32…39 [Coprocessor Option] | No |
40…63 | / | Reserved for Tensilica | / |
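When reading many crash logs, the table above can be turned into a small lookup routine. The C sketch below is a minimal illustration: the function name exccause_name is hypothetical, the strings are copied from the table, and codes that the table marks as reserved, or that fall outside 0…63, are reported as "Reserved" and "Unknown" respectively (both labels are assumptions).

```c
#include <stdint.h>

/* Hypothetical helper: maps an EXCCAUSE code to the cause name
 * listed in the Exception Causes table. */
const char *exccause_name(uint32_t code)
{
    switch (code) {
    case 0:  return "IllegalInstructionCause";
    case 1:  return "SyscallCause";
    case 2:  return "InstructionFetchErrorCause";
    case 3:  return "LoadStoreErrorCause";
    case 4:  return "Level1InterruptCause";
    case 5:  return "AllocaCause";
    case 6:  return "IntegerDivideByZeroCause";
    case 8:  return "PrivilegedCause";
    case 9:  return "LoadStoreAlignmentCause";
    case 12: return "InstrPIFDataErrorCause";
    case 13: return "LoadStorePIFDataErrorCause";
    case 14: return "InstrPIFAddrErrorCause";
    case 15: return "LoadStorePIFAddrErrorCause";
    case 16: return "InstTLBMissCause";
    case 17: return "InstTLBMultiHitCause";
    case 18: return "InstFetchPrivilegeCause";
    case 20: return "InstFetchProhibitedCause";
    case 24: return "LoadStoreTLBMissCause";
    case 25: return "LoadStoreTLBMultiHitCause";
    case 26: return "LoadStorePrivilegeCause";
    case 28: return "LoadProhibitedCause";
    case 29: return "StoreProhibitedCause";
    default: break;
    }
    if (code >= 32u && code <= 39u) {
        /* n = code - 32 selects the coprocessor */
        return "CoprocessornDisabled";
    }
    if (code <= 63u) {
        return "Reserved"; /* rows marked "Reserved for Tensilica" */
    }
    return "Unknown"; /* outside the range covered by the table */
}
```

With the value from the Introduction, exccause_name(0x1c) yields "LoadProhibitedCause".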