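Analysis of the Switch from User Mode to Kernel Mode

This article is reposted from http://www.cnblogs.com/justcxtoworld/p/3155741.html

This article examines the conditions under which a Linux system on the x86 architecture switches from user mode to kernel mode, the roles that the kernel stack and the task state segment (TSS) play in the interrupt and task-switching mechanisms during that switch, and how the relevant registers change.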

I: Ways to switch from user mode to kernel mode:

1: system calls  2: interrupts  3: exceptions

In the 3.3 kernel, the corresponding code can be found in /arch/x86/kernel/entry_32.S.

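II: The kernel stack

Kernel stack: every process in Linux has two stacks, one used while it runs in user mode and one used while it runs in kernel mode. The kernel stack is the latter. It is stored together with the process's thread_info structure (the small per-process descriptor that points to the task_struct) in a region the size of two consecutive page frames.

The kernel source defines a C union that conveniently represents a process's thread_info and its kernel stack: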

In the 3.3 kernel this union is defined at line 2016 of include/linux/sched.h:

union thread_union {
        struct thread_info thread_info;
        unsigned long stack[THREAD_SIZE/sizeof(long)];
};
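To make the sharing concrete, here is a minimal user-space sketch. The names `mock_thread_info`, `mock_thread_union`, and `MOCK_THREAD_SIZE` are stand-ins invented for illustration, and the 8 KB size is an assumption matching two 4 KB page frames; none of this is the kernel's own definition.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: two 4 KB page frames, as described for 32-bit x86. */
#define MOCK_THREAD_SIZE 8192

/* Hypothetical, trimmed-down stand-in for struct thread_info. */
struct mock_thread_info {
    void *task;              /* would point to the task_struct */
    unsigned int flags;
    int preempt_count;
};

/* Same shape as thread_union: both members start at offset 0. */
union mock_thread_union {
    struct mock_thread_info thread_info;
    unsigned long stack[MOCK_THREAD_SIZE / sizeof(long)];
};

/* Bytes left for the stack before it would run into thread_info. */
size_t usable_stack_bytes(void)
{
    return sizeof(union mock_thread_union) - sizeof(struct mock_thread_info);
}

/* Returns 1 if thread_info and the stack array start at the same address. */
int members_overlap(const union mock_thread_union *u)
{
    assert(u != NULL);
    return (const void *)&u->thread_info == (const void *)u->stack;
}
```

Because the two members overlap, a kernel stack that grows too deep would overwrite thread_info, which is why the usable stack is slightly smaller than the full region.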

The thread_info structure is defined as follows:

In the 3.3 kernel, line 26 of /arch/x86/include/asm/thread_info.h:

struct thread_info {
        struct task_struct      *task;          /* main task structure */
        struct exec_domain      *exec_domain;   /* execution domain */
        __u32                   flags;          /* low level flags */
        __u32                   status;         /* thread synchronous flags */
        __u32                   cpu;            /* current CPU */
        int                     preempt_count;  /* 0 => preemptable,
                                                   <0 => BUG */
        mm_segment_t            addr_limit;
        struct restart_block    restart_block;
        void __user             *sysenter_return;
#ifdef CONFIG_X86_32
        unsigned long           previous_esp;   /* ESP of the previous stack in
                                                   case of nested (IRQ) stacks
                                                */
        __u8                    supervisor_stack[0];
#endif
        unsigned int            sig_on_uaccess_error:1;
        unsigned int            uaccess_err:1;  /* uaccess failed */
};

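Their layout is roughly as follows: thread_info sits at the low-address end of the two-page region, and the kernel stack starts at the high-address end and grows down toward it.

The esp register is the CPU's stack pointer and, in kernel mode, holds the address of the top of the kernel stack. On x86 the stack starts at the high end of its memory region and grows toward the start of the region. Right after a switch from user mode to kernel mode the process's kernel stack is always empty, so esp points to the very top of this stack.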
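One consequence of this layout can be sketched in user space: since the two-page region is aligned to its own size, clearing the low bits of any stack pointer value inside it yields the region's base address, which is where thread_info sits. The names `MOCK_THREAD_SIZE`, `region_base`, and `empty_stack_top` are invented for this example, and the 8 KB size is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_THREAD_SIZE 8192u   /* assumption: two 4 KB pages */

/*
 * Given any stack pointer value inside a MOCK_THREAD_SIZE-aligned
 * region, clear the low bits to get the region's base address.
 */
uintptr_t region_base(uintptr_t sp)
{
    return sp & ~(uintptr_t)(MOCK_THREAD_SIZE - 1);
}

/* Value esp holds while the kernel stack is empty: one past the region's end. */
uintptr_t empty_stack_top(uintptr_t base)
{
    assert((base & (MOCK_THREAD_SIZE - 1)) == 0);   /* base must be aligned */
    return base + MOCK_THREAD_SIZE;
}
```

For example, with a region based at 0xc1234000, a stack pointer of 0xc1235ff0 masks back to 0xc1234000, and the empty-stack value of esp is 0xc1236000.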

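On x86, a system call made with the int instruction pushes the user stack's %esp and the related registers onto the kernel stack. The system call returns with the iret instruction; before returning, the user stack's %esp and the saved register state are popped from the kernel stack and restored. In other words, the process context must be saved before entering kernel mode and restored when the interrupt finishes, and the kernel stack is what makes this possible.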
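The save and restore round trip can be sketched with a simulated stack. This is a toy model only: `user_ctx`, `kstack`, `save_on_entry`, and `restore_on_return` are hypothetical names, the 64-slot stack is arbitrary, and the real work is done by the CPU, not by C code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the registers saved on entry. */
struct user_ctx {
    uint32_t ss, esp, eflags, cs, eip;
};

/* Simulated kernel stack: sp is an index that moves toward 0 on push. */
struct kstack {
    uint32_t mem[64];
    int sp;                  /* 64 means "empty" */
};

static void push(struct kstack *k, uint32_t v)
{
    assert(k->sp > 0);       /* no room left would be a stack overflow */
    k->mem[--k->sp] = v;
}

static uint32_t pop(struct kstack *k)
{
    assert(k->sp < 64);      /* popping an empty stack is a bug */
    return k->mem[k->sp++];
}

/* Entry: save the user context in the order ss, esp, eflags, cs, eip. */
void save_on_entry(struct kstack *k, const struct user_ctx *u)
{
    push(k, u->ss);
    push(k, u->esp);
    push(k, u->eflags);
    push(k, u->cs);
    push(k, u->eip);
}

/* Return (iret-like): pop in the reverse order of the pushes. */
struct user_ctx restore_on_return(struct kstack *k)
{
    struct user_ctx u;
    u.eip = pop(k);
    u.cs = pop(k);
    u.eflags = pop(k);
    u.esp = pop(k);
    u.ss = pop(k);
    return u;
}
```

After a matching save and restore the simulated stack is empty again and the returned context equals the one that was saved.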

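There is one detail to settle here. To save the user-mode esp, eip, and other registers on the kernel stack, the CPU first has to know the kernel stack pointer. So how is the kernel stack pointer obtained before kernel mode is entered? The answer is the TSS.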

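III: TSS

The x86 architecture includes a special segment type, the task state segment (TSS), which is used to store hardware context. The TSS reflects the privilege level of the current process on the CPU.

Linux provides one TSS for each CPU and keeps the selector of that segment in the tr register.

On a switch from user mode to kernel mode, the esp0 field of the TSS gives the top-of-stack pointer of the current process's kernel stack, which makes it possible to save the user-mode cs, esp, eip, and the rest of the context there.

Note: Linux provides one TSS per CPU rather than one per process mainly because the tr register can then always point to the same segment. tr does not have to be reloaded on a task switch, which reduces overhead; only the esp0 field of that CPU's TSS is updated to the new process's kernel stack.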
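A minimal sketch of that idea, under stated assumptions: one TSS object stays in place for the CPU, and a task switch only rewrites its kernel stack pointer field. `mock_tss`, `mock_task`, `mock_switch_to`, and the 8 KB stack size are hypothetical names and values for illustration, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_THREAD_SIZE 8192u   /* assumption: two 4 KB pages per kernel stack */

/* Hypothetical per-CPU TSS reduced to the one field of interest. */
struct mock_tss {
    uint32_t sp0;            /* kernel stack top of the task now running */
};

/* Hypothetical task: only the base address of its kernel stack region. */
struct mock_task {
    uint32_t kstack_base;
};

/*
 * On a task switch the TSS itself stays where it is (tr is untouched);
 * only its sp0 field is rewritten to the next task's kernel stack top.
 */
void mock_switch_to(struct mock_tss *cpu_tss, const struct mock_task *next)
{
    assert(next->kstack_base % MOCK_THREAD_SIZE == 0);
    cpu_tss->sp0 = next->kstack_base + MOCK_THREAD_SIZE;
}
```

The cost of a switch in this model is a single store into the TSS, compared with reloading tr and having the hardware save and load a whole task state.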

Next, let us look at how the Linux kernel implements the TSS on x86.

The definition of the TSS structure in the kernel code:

In the 3.3 kernel, line 248 of /arch/x86/include/asm/processor.h:

struct tss_struct {
        /*
         * The hardware state:
         */
        struct x86_hw_tss       x86_tss;

        /*
         * The extra 1 is there because the CPU will access an
         * additional byte beyond the end of the IO permission
         * bitmap. The extra byte must be all 1 bits, and must
         * be within the limit.
         */
        unsigned long           io_bitmap[IO_BITMAP_LONGS + 1];

        /*
         * .. and then another 0x100 bytes for the emergency kernel stack:
         */
        unsigned long           stack[64];

} ____cacheline_aligned;

Its main contents are:

Hardware state structure: x86_hw_tss

I/O permission bitmap: io_bitmap

Spare kernel stack: stack
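As a rough illustration of how an I/O permission bitmap is read, the sketch below keeps one bit per port, where a set bit denies user-mode access and a clear bit allows it. It is simplified to byte granularity and single-port checks; the `MOCK_` constants and the `io_bitmap_*` helper names are invented for this example.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define MOCK_IO_BITMAP_BITS  65536   /* one bit per x86 I/O port */
#define MOCK_IO_BITMAP_BYTES (MOCK_IO_BITMAP_BITS / CHAR_BIT)

/* Deny everything: all bits set, including the extra all-ones byte. */
void io_bitmap_deny_all(unsigned char *bm)
{
    memset(bm, 0xff, MOCK_IO_BITMAP_BYTES + 1);
}

/* Clearing a port's bit grants user-mode access to that port. */
void io_bitmap_allow(unsigned char *bm, unsigned int port)
{
    assert(port < MOCK_IO_BITMAP_BITS);
    bm[port / CHAR_BIT] &= (unsigned char)~(1u << (port % CHAR_BIT));
}

/* A port is accessible only if its bit is 0. */
int io_bitmap_allowed(const unsigned char *bm, unsigned int port)
{
    assert(port < MOCK_IO_BITMAP_BITS);
    return (bm[port / CHAR_BIT] & (1u << (port % CHAR_BIT))) == 0;
}
```

A caller would allocate `MOCK_IO_BITMAP_BYTES + 1` bytes, start from `io_bitmap_deny_all`, and then open individual ports such as 0x3f8 with `io_bitmap_allow`.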

The hardware state structure: on 32-bit x86 systems, x86_hw_tss is defined as follows.

Line 190 of /arch/x86/include/asm/processor.h:

#ifdef CONFIG_X86_32
/* This is the TSS defined by the hardware. */
struct x86_hw_tss {
        unsigned short          back_link, __blh;
        unsigned long           sp0;            /* kernel stack top pointer of the current process */
        unsigned short          ss0, __ss0h;    /* kernel stack segment selector of the current process */
        unsigned long           sp1;
        /* ss1 caches MSR_IA32_SYSENTER_CS: */
        unsigned short          ss1, __ss1h;
        unsigned long           sp2;
        unsigned short          ss2, __ss2h;
        unsigned long           __cr3;
        unsigned long           ip;
        unsigned long           flags;
        unsigned long           ax;
        unsigned long           cx;
        unsigned long           dx;
        unsigned long           bx;
        unsigned long           sp;             /* ESP slot for hardware task switching; Linux does not save the user-mode esp here */
        unsigned long           bp;
        unsigned long           si;
        unsigned long           di;
        unsigned short          es, __esh;
        unsigned short          cs, __csh;
        unsigned short          ss, __ssh;
        unsigned short          ds, __dsh;
        unsigned short          fs, __fsh;
        unsigned short          gs, __gsh;
        unsigned short          ldt, __ldth;
        unsigned short          trace;
        unsigned short          io_bitmap_base;

} __attribute__((packed));
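The layout can be checked in user space with a fixed-width copy of the structure. In the sketch below, `mock_hw_tss` and `hw_tss_size` are hypothetical names, the padding fields are renamed, and `uint16_t`/`uint32_t` replace `unsigned short`/`unsigned long` so that the offsets come out the same on a 64-bit host. The expected values (sp0 at offset 4, ss0 at offset 8, 104 bytes in total) are those of the 32-bit hardware TSS format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width copy of the 32-bit layout (GCC/Clang packed attribute). */
struct mock_hw_tss {
    uint16_t back_link, blh;
    uint32_t sp0;
    uint16_t ss0, ss0h;
    uint32_t sp1;
    uint16_t ss1, ss1h;
    uint32_t sp2;
    uint16_t ss2, ss2h;
    uint32_t cr3, ip, flags;
    uint32_t ax, cx, dx, bx, sp, bp, si, di;
    uint16_t es, esh, cs, csh, ss, ssh;
    uint16_t ds, dsh, fs, fsh, gs, gsh;
    uint16_t ldt, ldth;
    uint16_t trace;
    uint16_t io_bitmap_base;
} __attribute__((packed));

/* Returns the size in bytes after checking the two fields used on kernel entry. */
size_t hw_tss_size(void)
{
    assert(offsetof(struct mock_hw_tss, sp0) == 4);
    assert(offsetof(struct mock_hw_tss, ss0) == 8);
    return sizeof(struct mock_hw_tss);
}
```

The fixed offsets matter because the CPU reads these fields itself when it switches stacks; software cannot rearrange them.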

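Linux uses only fields such as esp0 (the sp0 field in the listing above) and the I/O map in the TSS, and does not use the other fields to save registers. When a user process is interrupted and enters kernel mode, esp0 (the kernel stack top pointer) is read from the hardware state structure in the TSS and the stack is switched to it; the other registers are saved on the kernel stack that esp0 points to rather than in the TSS.

The code that defines one TSS per CPU:

In the 3.3 kernel, line 35 of /arch/x86/kernel/init_task.c: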

/*
 * per-CPU TSS segments. Threads are completely 'soft' on Linux,
 * no more per-task TSS's. The TSS size is kept cacheline-aligned
 * so they are allowed to end up in the .data..cacheline_aligned
 * section. Since TSS's are completely CPU-local, we want them
 * on exact cacheline boundaries, to eliminate cacheline ping-pong.
 */
DEFINE_PER_CPU_SHARED_ALIGNED(struct tss_struct, init_tss) = INIT_TSS;

INIT_TSS is defined as follows:

In the 3.3 kernel, line 879 of /arch/x86/include/asm/processor.h:

#define INIT_TSS  {                                                       \
        .x86_tss = {                                                      \
                .sp0            = sizeof(init_stack) + (long)&init_stack, \
                .ss0            = __KERNEL_DS,                            \
                .ss1            = __KERNEL_CS,                            \
                .io_bitmap_base = INVALID_IO_BITMAP_OFFSET,               \
         },                                                               \
        .io_bitmap              = { [0 ... IO_BITMAP_LONGS] = ~0 },       \
}

Here init_stack is a macro that refers to the kernel stack (the stack member of init_thread_union):

#define init_stack              (init_thread_union.stack)

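Here the kernel stack top pointer, the kernel data segment selector, and the kernel code segment selector are assigned to the sp0, ss0, and ss1 fields of the TSS respectively. As a result, when a process switches from user mode to kernel mode, the kernel stack top pointer can be fetched from the TSS and the process context can then be saved on the kernel stack.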
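The arithmetic in the .sp0 initializer, start address plus size in bytes, gives the address just past the end of the stack array, which is the top of an empty downward-growing stack. A small user-space sketch, where `mock_init_stack`, `mock_initial_sp0`, and the 8 KB size are assumptions made for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_THREAD_SIZE 8192    /* assumption: two 4 KB pages */

/* Hypothetical stand-in for init_thread_union.stack. */
static unsigned long mock_init_stack[MOCK_THREAD_SIZE / sizeof(long)];

/* Same arithmetic as the .sp0 initializer: start address plus size in bytes. */
uintptr_t mock_initial_sp0(void)
{
    uintptr_t top = sizeof(mock_init_stack) + (uintptr_t)&mock_init_stack;

    /* The result is the address one past the last element of the array. */
    assert(top == (uintptr_t)(mock_init_stack + MOCK_THREAD_SIZE / sizeof(long)));
    return top;
}
```

The first push on such a stack decrements the pointer before storing, so the value one past the end is never written to.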

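Summary: with the groundwork above, the main steps of a switch from user mode to kernel mode are:

1: Read the tr register to locate the TSS.

2: Get the top-of-stack pointer of the process's kernel stack from sp0 in the TSS (and the stack segment from ss0), and load them into SS and ESP.

3: The control unit saves the user-mode ss, esp, eflags, cs, and eip on the kernel stack.

4: The control unit loads the kernel code segment selector into CS and the linear address of the kernel entry point into EIP.

5: At the entry point, SAVE_ALL saves the remaining registers on the kernel stack.

With step 4 the CPU has switched to kernel mode and starts executing at the kernel entry point held in EIP; the SAVE_ALL of step 5 runs as part of that entry code.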
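The sequence summarized above can be sketched as a toy simulation of the hardware part of the switch. Everything here is an assumption made for illustration: `mock_cpu`, `mock_tss`, `enter_kernel`, and `cpl` are invented names, the selector values are sample values whose low two bits are 3 for user mode and 0 for kernel mode, stack addresses are byte offsets into a small array, and the SAVE_ALL step is left out.

```c
#include <assert.h>
#include <stdint.h>

#define USER_CS   0x73u   /* sample selector, low two bits = 3 */
#define USER_DS   0x7bu
#define KERNEL_CS 0x60u   /* sample selector, low two bits = 0 */
#define KERNEL_DS 0x68u

struct mock_cpu {
    uint32_t cs, ss, esp, eip, eflags;
};

struct mock_tss {
    uint32_t sp0, ss0;
};

/* Simulated memory for one kernel stack; addresses are byte offsets into it. */
#define KSTACK_BYTES 256u
static uint32_t kstack[KSTACK_BYTES / 4];

static void push32(struct mock_cpu *c, uint32_t v)
{
    assert(c->esp >= 4 && c->esp <= KSTACK_BYTES);
    c->esp -= 4;
    kstack[c->esp / 4] = v;
}

/* Current privilege level: the low two bits of CS. */
unsigned int cpl(const struct mock_cpu *c)
{
    return c->cs & 3u;
}

/* Simulated entry into the kernel at the address `entry`. */
void enter_kernel(struct mock_cpu *c, const struct mock_tss *tss, uint32_t entry)
{
    struct mock_cpu user = *c;   /* remember the interrupted context */

    c->ss = tss->ss0;            /* switch to the kernel stack named by the TSS */
    c->esp = tss->sp0;
    push32(c, user.ss);          /* save the user context on the kernel stack */
    push32(c, user.esp);
    push32(c, user.eflags);
    push32(c, user.cs);
    push32(c, user.eip);
    c->cs = KERNEL_CS;           /* privilege level is now 0 */
    c->eip = entry;              /* continue at the kernel entry point */
}
```

After `enter_kernel` the simulated esp sits 20 bytes below sp0, with the saved eip at the top of the stack and the saved ss deepest.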
