ARMv8 Exception Handling

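ARM exceptions are divided into synchronous and asynchronous exceptions.

An exception is synchronous if all of the following hold:
1. The exception is generated as a result of direct execution or attempted execution of an instruction.
2. The return address presented to the exception handler is guaranteed to indicate the instruction that caused the exception.
3. The exception is precise.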

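(I) Categories of synchronous exceptions and their possible causes

(1) Undefined instruction: UNDEFINED exceptions

Causes: a) an instruction is executed at an inappropriate exception level; b) an attempt is made to execute an undefined instruction bit pattern.

(2) Illegal execution state: Illegal Execution State exceptions

Cause: an attempt is made to execute an instruction while PSTATE.IL is set to 1. PSTATE.IL is the Illegal Execution state flag.

(3) Alignment: misaligned Stack Pointer/PC

Cause: the SP or PC is misaligned when it is used.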

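(4) System calls: SVC, HVC, or SMC

Cause: an exception generated by the SVC, HVC, or SMC instruction. SVC is typically used by software at EL0 (user mode) to ask the operating system at EL1 (OS services) for a privileged operation or for access to system resources. HVC is mainly used by a guest OS to request hypervisor services. SMC (Secure Monitor Call) is used to switch between the Secure and Non-secure states.

(5) Trapped execution: Traps execute Exception

Cause: an attempt is made to execute an instruction that a system control register configures to be trapped, so the exception is taken to a higher EL.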

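(6) Instruction and data aborts: Instruction/Data abort Exception

Causes: Instruction Abort: the CPU fetches an instruction from an address and finds that the address cannot be read or accessed, which raises a prefetch abort. Data Abort: the program attempts to read or write an invalid memory address (no access permission, or the address does not exist).

(7) Debug exceptions: debug exception

Cause: with debugging enabled, exceptions generated by debug events such as software breakpoint instructions, breakpoints, watchpoints, vector catch and software step.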

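Categories of asynchronous exceptions

Asynchronous exceptions are divided into physical interrupts and virtual interrupts:
SError or vSError: system error, including external data aborts;
IRQ or vIRQ: external interrupt or virtual external interrupt;
FIQ or vFIQ: fast interrupt or virtual fast interrupt.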

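Exception handling process

When an exception is taken, the PE does the following:
1. Saves PSTATE to SPSR_ELx (x = 1, 2, 3). On return from the exception, SPSR_ELx is used to restore the state of the PE.
2. Saves the return address to ELR_ELx. For synchronous exceptions such as undefined instructions and aborts this is the address of the instruction that caused the exception (for SVC, HVC and SMC it is the address of the following instruction); for asynchronous exceptions such as IRQ and FIQ it is the address of the first instruction that has not yet completed. On return from the exception, ELR_ELx is used to restore the PC.
3. Saves information about the cause of the exception to ESR_ELx (for synchronous exceptions and SError).
4. Forces a branch to the exception handler, at the address defined by the exception vector table of the target EL.
5. Selects the stack pointer SP according to the target EL.

EL0 (user mode) cannot handle exceptions. When an exception occurs in user mode the exception level changes, by default to EL1 (kernel mode), so most exceptions are routed to EL1 for handling.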
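The entry steps listed above can be sketched as a small C model. This is only an illustration under simplifying assumptions: the names `pe_state`, `take_exception_to_el1` and `exception_return` are hypothetical, and the model covers only the register bookkeeping described in the list, not what the hardware additionally does (interrupt masking, the change of exception level, stack pointer selection).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified view of the state involved in taking an exception to EL1. */
struct pe_state {
    uint64_t pc;        /* current program counter */
    uint64_t pstate;    /* current processor state */
    uint64_t spsr_el1;  /* saved PSTATE */
    uint64_t elr_el1;   /* return address */
    uint64_t esr_el1;   /* exception syndrome */
    uint64_t vbar_el1;  /* vector table base, 2 KB aligned */
};

/* Steps 1 to 4: save the state, record the cause, branch into the vector table. */
static void take_exception_to_el1(struct pe_state *pe, uint64_t return_addr,
                                  uint64_t syndrome, uint64_t vector_offset)
{
    assert((pe->vbar_el1 & 0x7ff) == 0);   /* VBAR_EL1[10:0] are reserved */
    pe->spsr_el1 = pe->pstate;
    pe->elr_el1  = return_addr;
    pe->esr_el1  = syndrome;
    pe->pc       = pe->vbar_el1 + vector_offset;
}

/* Model of the exception return: restore PSTATE and PC from SPSR_EL1 and ELR_EL1. */
static void exception_return(struct pe_state *pe)
{
    pe->pstate = pe->spsr_el1;
    pe->pc     = pe->elr_el1;
}
```

Calling `take_exception_to_el1()` with the faulting PC as `return_addr` and then `exception_return()` brings the model back to the instruction that caused the exception, which is the behaviour described for aborts and undefined instructions.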

(II) Exception vector table

arch/arm64/kernel/entry.S:

/*
 * Exception vectors.
 */

    .align  11
ENTRY(vectors)
    ventry  el1_sync_invalid        // Synchronous EL1t
    ventry  el1_irq_invalid         // IRQ EL1t
    ventry  el1_fiq_invalid         // FIQ EL1t
    ventry  el1_error_invalid       // Error EL1t

    ventry  el1_sync            // Synchronous EL1h: the exception occurs in kernel mode (EL1) and the system is configured so that the kernel handles it (it is taken to EL1)
    ventry  el1_irq             // IRQ EL1h
    ventry  el1_fiq_invalid         // FIQ EL1h
    ventry  el1_error_invalid       // Error EL1h

    ventry  el0_sync            // Synchronous 64-bit EL0: the exception occurs in user mode (EL0) and must be handled in kernel mode (EL1)
    ventry  el0_irq             // IRQ 64-bit EL0
    ventry  el0_fiq_invalid         // FIQ 64-bit EL0
    ventry  el0_error_invalid       // Error 64-bit EL0

#ifdef CONFIG_COMPAT
    ventry  el0_sync_compat         // Synchronous 32-bit EL0
    ventry  el0_irq_compat          // IRQ 32-bit EL0
    ventry  el0_fiq_invalid_compat      // FIQ 32-bit EL0
    ventry  el0_error_invalid_compat    // Error 32-bit EL0
#else
    ventry  el0_sync_invalid        // Synchronous 32-bit EL0
    ventry  el0_irq_invalid         // IRQ 32-bit EL0
    ventry  el0_fiq_invalid         // FIQ 32-bit EL0
    ventry  el0_error_invalid       // Error 32-bit EL0
#endif
END(vectors)

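.align 11: the base address of the EL1 exception vector table is held in VBAR_EL1 (Vector Base Address Register (EL1)). Bits [10:0] of this register are reserved and bits [63:11] hold the Vector Base Address, so the vector table here is 2 KB aligned (2^11 = 2048 bytes).

Each exception level that can take exceptions has its own Vector Base Address Register (VBAR), which holds the base address of that level's exception vector table. There are three such registers: VBAR_EL1, VBAR_EL2 and VBAR_EL3.
The address of a specific exception handler is obtained as vector base address + offset.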

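The vector table above consists of 4 groups, and each group holds one entry for each of the 4 exception types (in table order):

1. Synchronous
2. IRQ
3. FIQ
4. SError

The 4 groups are distinguished by whether the exception level changes when the exception is taken and by which stack pointer is in use:

1. The exception is taken from the current level while SP_EL0 (the stack pointer of EL0) is in use, so the exception level does not change. Roughly speaking, the exception occurs in kernel mode (EL1) while the kernel is using the EL0 stack pointer. Linux does no real handling for this case and goes straight to bad_mode().
2. The exception is taken from the current level while SP_ELx (the stack pointer of ELx, where x may be 1, 2 or 3) is in use, so the exception level does not change. Roughly speaking, the exception occurs in kernel mode (EL1) while the kernel is using the EL1 stack pointer. This is a common case.
3. The exception is taken from a lower level that is running in AArch64. Roughly speaking, the exception occurs in a 64-bit user process and is taken to the kernel. This is also a common case.
4. The exception is taken from a lower level that is running in AArch32. Roughly speaking, the exception occurs in a 32-bit user process and is taken to the kernel. As the table shows, this case is handled only when CONFIG_COMPAT is enabled; otherwise the entries are the invalid ones.

For example, el1_error_invalid handles an SError taken in EL1 kernel mode, where EL1t means SP_EL0 (the user-mode stack pointer) is in use and EL1h means SP_EL1 (the kernel stack pointer) is in use; el0_error_invalid handles an SError taken in user mode, and the handler runs on SP_EL1 (the kernel stack).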
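To make "vector base address + offset" concrete, here is a minimal sketch of the offset arithmetic. It relies on two architectural facts: each vector entry is 0x80 bytes (which is why ventry aligns every entry to 128 bytes) and each group holds four entries, so a group spans 0x200 bytes and the whole table 0x800 bytes (2 KB). The enum and function names are hypothetical and are not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* The four groups, in table order (illustrative names). */
enum vec_group { CUR_EL_SP0 = 0, CUR_EL_SPX = 1, LOWER_EL_A64 = 2, LOWER_EL_A32 = 3 };

/* The four exception types inside each group, in table order. */
enum vec_type { VEC_SYNC = 0, VEC_IRQ = 1, VEC_FIQ = 2, VEC_SERROR = 3 };

/* Each entry is 0x80 bytes; each group is 4 entries, i.e. 0x200 bytes. */
static uint64_t vector_address(uint64_t vbar, enum vec_group g, enum vec_type t)
{
    assert((vbar & 0x7ff) == 0);            /* the table must be 2 KB aligned */
    return vbar + (uint64_t)g * 0x200 + (uint64_t)t * 0x80;
}
```

With this layout, el1_sync (current EL with SP_ELx, synchronous) sits at offset 0x200 and el0_irq (lower EL in AArch64, IRQ) at offset 0x480 from the table base.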

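(III) Handlers for the invalid vectors

Vectors with the invalid suffix are the ones Linux does not handle any further. By default they all end up in bad_mode(), which means the kernel cannot handle this kind of exception and can only report it to the user process (user mode, SIGKILL or SIGBUS) or die (kernel mode).

Every vector with the invalid suffix ends up invoking inv_entry, which is implemented as follows: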

/*
 * Invalid mode handlers
 */
    .macro  inv_entry, el, reason, regsize = 64
    // .macro defines a macro: a shorthand for a sequence of instructions that is needed repeatedly
    kernel_entry el, \regsize  // (a)
    mov x0, sp
    mov x1, #\reason
    mrs x2, esr_el1
    // x0, x1 and x2 now hold the three arguments
    b   bad_mode   // (b)
    .endm

el0_sync_invalid:
    inv_entry 0, BAD_SYNC
ENDPROC(el0_sync_invalid)

(a) Building the stack frame on exception entry: kernel_entry

/*
 * Bad Abort numbers
 *-----------------
 */
#define BAD_SYNC    0
#define BAD_IRQ     1
#define BAD_FIQ     2
#define BAD_ERROR   3

    .macro  kernel_entry, el, regsize = 64
    sub sp, sp, #S_FRAME_SIZE - S_LR    // room for LR, SP, SPSR, ELR (the stack is full descending)
    .if \regsize == 32
    mov w0, w0              // zero upper 32 bits of x0
    .endif
    push    x28, x29
    push    x26, x27
    push    x24, x25
    push    x22, x23
    push    x20, x21
    push    x18, x19
    push    x16, x17
    push    x14, x15
    push    x12, x13
    push    x10, x11
    push    x8, x9
    push    x6, x7
    push    x4, x5
    push    x2, x3
    push    x0, x1          // registers are stored in pairs
    .if \el == 0
    mrs x21, sp_el0         // taken from EL0: read the user stack pointer
    get_thread_info tsk         // Ensure MDSCR_EL1.SS is clear,
    ldr x19, [tsk, #TI_FLAGS]       // since we can unmask debug
    disable_step_tsk x19, x20       // exceptions when scheduling.
    .else
    add x21, sp, #S_FRAME_SIZE
    // not taken from EL0: x21 = sp + sizeof(struct pt_regs),
    // i.e. the value SP had before kernel_entry ran
    .endif
    mrs x22, elr_el1        // x22 = ELR_EL1
    mrs x23, spsr_el1       // x23 = SPSR_EL1
    stp lr, x21, [sp, #S_LR]
    stp x22, x23, [sp, #S_PC]
    // LR, the aborted SP, ELR and SPSR are now saved in the frame, to be used on exception return
    /*
     * Set syscallno to -1 by default (overridden later if real syscall).
     */
    .if \el == 0
    mvn x21, xzr
    str x21, [sp, #S_SYSCALLNO]
    .endif

    /*
     * Registers that may be useful after this macro is invoked:
     *
     * x21 - aborted SP
     * x22 - aborted PC
     * x23 - aborted PSTATE
    */
    .endm

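S_FRAME_SIZE is sizeof(struct pt_regs), and S_LR is offsetof(struct pt_regs, regs[30]), i.e. the offset within struct pt_regs of the 31st register (x30, the link register). Subtracting one from the other gives the number of bytes occupied by everything from LR to the end of the structure: LR, SP, PC and PSTATE, followed by orig_x0 and syscallno.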

struct pt_regs {
    union {
        struct user_pt_regs user_regs;
        struct {
            u64 regs[31];
            u64 sp;
            u64 pc;
            u64 pstate;
        };
    };
    u64 orig_x0;
    u64 syscallno;
};
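The size arithmetic behind `sub sp, sp, #S_FRAME_SIZE - S_LR` can be checked in user space with a copy of the structure. The sketch below is an assumption-laden stand-in: `pt_regs_sketch` flattens the layout quoted above and leaves out the `user_pt_regs` arm of the union, and the `*_SKETCH` macros are hypothetical names, not the kernel's generated asm-offsets constants.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space copy of the layout quoted above (union arm user_regs omitted). */
struct pt_regs_sketch {
    uint64_t regs[31];
    uint64_t sp;
    uint64_t pc;
    uint64_t pstate;
    uint64_t orig_x0;
    uint64_t syscallno;
};

#define S_FRAME_SIZE_SKETCH sizeof(struct pt_regs_sketch)
#define S_LR_SKETCH         offsetof(struct pt_regs_sketch, regs[30])
#define S_PC_SKETCH         offsetof(struct pt_regs_sketch, pc)

/* 36 eight-byte fields, no padding. */
static_assert(sizeof(struct pt_regs_sketch) == 36 * 8, "unexpected padding");

/* Bytes reserved by the initial "sub sp, sp, #S_FRAME_SIZE - S_LR". */
static size_t initial_sp_adjust(void)
{
    return S_FRAME_SIZE_SKETCH - S_LR_SKETCH;
}

/* Bytes written by the fifteen "push" pairs (x0 through x29), 16 bytes each. */
static size_t pushed_bytes(void)
{
    return 15 * 16;
}
```

Under these assumptions the initial adjustment is 48 bytes and the fifteen pushes add 240 more, so after the pushes SP has moved down by exactly one `struct pt_regs` (288 bytes) and points at regs[0]. That is why `add x21, sp, #S_FRAME_SIZE` recovers the SP value from before kernel_entry.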

AArch64 has no push and pop instructions either; they are implemented as macros on top of stp and ldp, so push x0, x1 stores the register pair on the stack:

    .macro  push, xreg1, xreg2
    stp \xreg1, \xreg2, [sp, #-16]!
    .endm

    .macro  pop, xreg1, xreg2
    ldp \xreg1, \xreg2, [sp], #16
    .endm
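The addressing modes used by these two macros can be modelled in a few lines of C. `stp ..., [sp, #-16]!` is pre-indexed: SP is decremented by 16 first and the pair is stored at the new SP. `ldp ..., [sp], #16` is post-indexed: the pair is loaded from SP and SP is incremented afterwards. The stack type and function names below are hypothetical, and the stack pointer is modelled as an index in 8-byte words rather than a byte address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy full-descending stack: sp is an index in 8-byte words and starts at 64. */
struct sketch_stack {
    uint64_t mem[64];
    size_t sp;
};

/* Model of "stp r1, r2, [sp, #-16]!": decrement first, then store the pair. */
static void push_pair(struct sketch_stack *s, uint64_t r1, uint64_t r2)
{
    assert(s->sp >= 2);
    s->sp -= 2;
    s->mem[s->sp] = r1;
    s->mem[s->sp + 1] = r2;
}

/* Model of "ldp r1, r2, [sp], #16": load the pair, then increment. */
static void pop_pair(struct sketch_stack *s, uint64_t *r1, uint64_t *r2)
{
    assert(s->sp + 2 <= 64);
    *r1 = s->mem[s->sp];
    *r2 = s->mem[s->sp + 1];
    s->sp += 2;
}
```

Because the first register of each pair lands at the lower address, pushing x28/x29 first and x0/x1 last leaves x0 at the lowest address, which matches regs[0] being the first field of struct pt_regs.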

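(b) arch/arm/kernel/traps.c: bad_mode()

Note that the listing below comes from the 32-bit ARM tree. inv_entry passes ESR_EL1 in x2 as a third argument, so the arm64 version in arch/arm64/kernel/traps.c takes an additional esr parameter.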

/*
 * bad_mode handles the impossible case in the vectors.  If you see one of
 * these, then it's extremely serious, and could mean you have buggy hardware.
 * It never returns, and never tries to sync.  We hope that we can at least
 * dump out some state information...
 */
asmlinkage void bad_mode(struct pt_regs *regs, int reason)
{
    console_verbose();  // raise the console log level to the maximum

    printk(KERN_CRIT "Bad mode in %s handler detected\n", handler[reason]);

    die("Oops - bad mode", regs, 0);  // report the oops through die()
    local_irq_disable();  // disable interrupts
    panic("bad mode");
}

(IV) Handlers for the other vectors

The remaining handler entry points are:

    ventry  el1_sync            // Synchronous EL1h
    ventry  el1_irq             // IRQ EL1h

    ventry  el0_sync            // Synchronous 64-bit EL0
    ventry  el0_irq             // IRQ 64-bit EL0

Taking el1_sync as an example:

/*
 * EL1 mode handlers.
 */
    .align  6
el1_sync:
    kernel_entry 1                  // save the registers on the stack
    mrs x1, esr_el1         // read the syndrome register
    lsr x24, x1, #ESR_EL1_EC_SHIFT  // exception class
    cmp x24, #ESR_EL1_EC_DABT_EL1   // data abort in EL1: branch to el1_da
    b.eq    el1_da
    cmp x24, #ESR_EL1_EC_SYS64      // configurable trap
    b.eq    el1_undef
    cmp x24, #ESR_EL1_EC_SP_ALIGN   // stack alignment exception
    b.eq    el1_sp_pc
    cmp x24, #ESR_EL1_EC_PC_ALIGN   // pc alignment exception
    b.eq    el1_sp_pc
    cmp x24, #ESR_EL1_EC_UNKNOWN    // unknown exception in EL1
    b.eq    el1_undef
    cmp x24, #ESR_EL1_EC_BREAKPT_EL1    // debug exception in EL1
    b.ge    el1_dbg
    b   el1_inv
el1_da:
    /*
     * Data abort handling
     */
    mrs x0, far_el1
    enable_dbg
    // re-enable interrupts if they were enabled in the aborted context
    tbnz    x23, #7, 1f         // PSR_I_BIT: test the bit and branch if it is nonzero
    enable_irq
1:
    mov x2, sp              // struct pt_regs: sp holds the value left by kernel_entry and points to the top of the saved frame; it is passed as an argument to do_mem_abort()
    bl  do_mem_abort
    // arguments: x0 is the faulting address, x1 is the exception syndrome, x2 is the base address of the registers saved on the stack

    // disable interrupts before pulling preserved data off the stack
    disable_irq
    kernel_exit 1
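The exception class test at the top of el1_sync can be written out in C as a rough sketch. The shift (26) and the EC encodings come from the ARMv8 ESR_ELx layout, where EC occupies bits [31:26]; the enum names and the `el1_sync_target` function are hypothetical and merely mirror the labels in the listing above.

```c
#include <assert.h>
#include <stdint.h>

#define ESR_EC_SHIFT 26
#define ESR_EC_MASK  0x3f

/* Exception class encodings used by el1_sync (ESR_ELx.EC values). */
enum {
    EC_UNKNOWN     = 0x00,
    EC_SYS64       = 0x18,
    EC_PC_ALIGN    = 0x22,
    EC_DABT_CUR    = 0x25,  /* data abort taken from the current EL */
    EC_SP_ALIGN    = 0x26,
    EC_BREAKPT_CUR = 0x31,  /* first debug class compared against in el1_sync */
};

static_assert(EC_BREAKPT_CUR <= ESR_EC_MASK, "EC is a 6-bit field");

/* Labels that el1_sync branches to. */
enum el1_target { EL1_DA, EL1_UNDEF, EL1_SP_PC, EL1_DBG, EL1_INV };

/* Same order of comparisons as the assembly above. */
static enum el1_target el1_sync_target(uint32_t esr)
{
    unsigned int ec = (esr >> ESR_EC_SHIFT) & ESR_EC_MASK;

    if (ec == EC_DABT_CUR)
        return EL1_DA;
    if (ec == EC_SYS64)
        return EL1_UNDEF;
    if (ec == EC_SP_ALIGN || ec == EC_PC_ALIGN)
        return EL1_SP_PC;
    if (ec == EC_UNKNOWN)
        return EL1_UNDEF;
    if (ec >= EC_BREAKPT_CUR)
        return EL1_DBG;
    return EL1_INV;
}
```

For instance, an ESR value of 0x96000045 has EC 0x25, so it takes the el1_da path, while a value with EC 0x15 (an SVC from AArch64, which is not expected in EL1) falls through to el1_inv.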

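For a synchronous exception taken in EL1, the value read from esr_el1 is then used to determine which kind of synchronous exception it is. Taking the data abort path el1_da as an example, the handler it calls is do_mem_abort():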

/*
 * Dispatch a data abort to the relevant handler.
 */
asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int esr,
                     struct pt_regs *regs)
{
    // esr & 63 keeps the low 6 bits of esr (the fault status code) and uses them
    // to select the matching handler in the fault_info array, defined further below
    const struct fault_info *inf = fault_info + (esr & 63);
    struct siginfo info;

    if (!inf->fn(addr, esr, regs))  // the handler returns 0 on success, in which case we are done
        return;

    // the fault was not handled: print an error and process it further
    // (not analysed here; we assume the handler normally succeeds)
    pr_alert("Unhandled fault: %s (0x%08x) at 0x%016lx\n",
         inf->name, esr, addr);

    info.si_signo = inf->sig;
    info.si_errno = 0;
    info.si_code  = inf->code;
    info.si_addr  = (void __user *)addr;
    arm64_notify_die("", regs, &info, esr);  // notify the kernel to die
}

arch/arm64/mm/fault.c:

static struct fault_info {
    int (*fn)(unsigned long addr, unsigned int esr, struct pt_regs *regs);
    int sig;
    int code;
    const char *name;
} fault_info[] = {
    { do_bad,       SIGBUS,  0,     "ttbr address size fault"   },
    { do_bad,       SIGBUS,  0,     "level 1 address size fault"    },
    { do_bad,       SIGBUS,  0,     "level 2 address size fault"    },
    { do_bad,       SIGBUS,  0,     "level 3 address size fault"    },
    { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "input address range fault" },
    { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 1 translation fault" },
    { do_translation_fault, SIGSEGV, SEGV_MAPERR,   "level 2 translation fault" },
    { do_page_fault,    SIGSEGV, SEGV_MAPERR,   "level 3 translation fault" },
    { do_bad,       SIGBUS,  0,     "reserved access flag fault"    },
    { do_page_fault,    SIGSEGV, SEGV_ACCERR,   "level 1 access flag fault" },
    { do_page_fault,    SIGSEGV, SEGV_ACCERR,   "level 2 access flag fault" },
    { do_page_fault,    SIGSEGV, SEGV_ACCERR,   "level 3 access flag fault" },
    { do_bad,       SIGBUS,  0,     "reserved permission fault" },
    { do_page_fault,    SIGSEGV, SEGV_ACCERR,   "level 1 permission fault"  },
    { do_page_fault,    SIGSEGV, SEGV_ACCERR,   "level 2 permission fault"  },
    { do_page_fault,    SIGSEGV, SEGV_ACCERR,   "level 3 permission fault"  },
    { do_bad,       SIGBUS,  0,     "synchronous external abort"    },
    { do_bad,       SIGBUS,  0,     "asynchronous external abort"   },
    { do_bad,       SIGBUS,  0,     "unknown 18"            },
    { do_bad,       SIGBUS,  0,     "unknown 19"            },
    { do_bad,       SIGBUS,  0,     "synchronous abort (translation table walk)" },
    { do_bad,       SIGBUS,  0,     "synchronous abort (translation table walk)" },
    { do_bad,       SIGBUS,  0,     "synchronous abort (translation table walk)" },
    { do_bad,       SIGBUS,  0,     "synchronous abort (translation table walk)" },
    { do_bad,       SIGBUS,  0,     "synchronous parity error"  },
    { do_bad,       SIGBUS,  0,     "asynchronous parity error" },
    { do_bad,       SIGBUS,  0,     "unknown 26"            },
    { do_bad,       SIGBUS,  0,     "unknown 27"            },
    { do_bad,       SIGBUS,  0,     "synchronous parity error (translation table walk" },
......
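The table-driven dispatch in do_mem_abort() can be reduced to the following sketch. Everything prefixed with `sketch_` is hypothetical: the two handlers are trivial stand-ins for the kernel's do_translation_fault() and do_bad(), the regs argument is dropped, and only two of the 64 slots are filled in. What it keeps is the indexing rule: the low six bits of ESR select the entry, and a handler returning 0 means the fault was handled.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a handler that resolves the fault: returns 0 (handled). */
static int sketch_handled(unsigned long addr, unsigned int esr)
{
    (void)addr;
    (void)esr;
    return 0;
}

/* Stand-in for do_bad(): returns nonzero (not handled). */
static int sketch_bad(unsigned long addr, unsigned int esr)
{
    (void)addr;
    (void)esr;
    return 1;
}

struct sketch_fault_info {
    int (*fn)(unsigned long addr, unsigned int esr);
    const char *name;
};

/* Only two slots are populated here; empty slots fall back to sketch_bad. */
static const struct sketch_fault_info sketch_table[64] = {
    [5]  = { sketch_handled, "level 1 translation fault" },
    [16] = { sketch_bad,     "synchronous external abort" },
};

static_assert(sizeof(sketch_table) / sizeof(sketch_table[0]) == 64,
              "one slot per 6-bit fault status code");

/* Mirrors the lookup in do_mem_abort(): index by the low six bits of ESR. */
static int sketch_dispatch(unsigned long addr, unsigned int esr)
{
    const struct sketch_fault_info *inf = &sketch_table[esr & 63];

    if (inf->fn == NULL)
        return sketch_bad(addr, esr);
    return inf->fn(addr, esr);
}
```

An ESR of 0x96000045 has fault status code 0x05, so it selects slot 5, the "level 1 translation fault" entry, just as it would in the kernel table above.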

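Conclusion

This part introduced the two kinds of exception, synchronous and asynchronous. When an exception occurs, the matching entry is found through the exception vector table; the entry code saves the registers on the stack and then calls the specific handler. With that, the interface of the exception handling model should be clear. Later parts will analyse how the kernel is notified to die and then panics, as well as do_bad() and the page fault handler do_page_fault().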
