Learning the Linux Kernel: Process Switching

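The current process may give up the CPU voluntarily or involuntarily, for example when it blocks on disk I/O, when its time slice expires, or when it calls exit. A process switch then takes place: the CPU moves on to another process in the ready state. Most of the work is done in context_switch(), located in kernel/sched.c and shown below: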

/*
 * context_switch - switch to the new MM and the new
 * thread's register state.
 */
static inline void
context_switch(struct rq *rq, struct task_struct *prev,
	       struct task_struct *next)
{
	struct mm_struct *mm, *oldmm;

	prepare_task_switch(rq, prev, next);
	mm = next->mm;
	oldmm = prev->active_mm;
	/*
	 * For paravirt, this is coupled with an exit in switch_to to
	 * combine the page table reload and the switch backend into
	 * one hypercall.
	 */
	arch_enter_lazy_cpu_mode();

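	/*
	 * The new task has no mm (the structure describing a process's
	 * virtual memory), i.e. it is a kernel thread. It borrows the mm
	 * that prev was using instead. A kernel thread only accesses the
	 * kernel address space, which is shared by all processes, so
	 * this is safe.
	 */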
	if (unlikely(!mm)) {
		next->active_mm = oldmm;	//borrow the mm
		atomic_inc(&oldmm->mm_count);	//take a reference on it
		enter_lazy_tlb(oldmm, next);
	} else
		switch_mm(oldmm, mm, next);	//the new task has its own mm: switch to its virtual address space

	if (unlikely(!prev->mm)) {
		prev->active_mm = NULL;
		rq->prev_mm = oldmm;
	}
	/*
	 * Since the runqueue lock will be released by the next
	 * task (which is an invalid locking op but in the case
	 * of the scheduler it's an obvious special-case), so we
	 * do an early lockdep release here:
	 */
#ifndef __ARCH_WANT_UNLOCKED_CTXSW
	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
#endif

	/* Here we just switch the register state and the stack. */
	switch_to(prev, next, prev);	//switch registers and stack; the actual switch happens here

	barrier();
	/*
	 * this_rq must be evaluated again because prev may have moved
	 * CPUs since it called schedule(), thus the 'rq' on its stack
	 * frame will be invalid.
	 */
	finish_task_switch(this_rq(), prev);
}

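The actual switch is performed by switch_to(), but by that point the virtual address space has already been switched. This causes no problem: the switch runs in kernel mode and accesses only kernel addresses, and the kernel address space is shared by all processes.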
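
The address-space half of context_switch() can be modelled in user space. The sketch below is a simplified model, not kernel code: struct mm, struct task, switch_address_space() and demo_borrow() are hypothetical names, a plain int stands in for the atomic mm_count, and the variable current_mm stands in for the page tables loaded on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct mm {
    int mm_count;           /* reference count (atomic_t in the kernel) */
};

struct task {
    struct mm *mm;          /* NULL for a kernel thread */
    struct mm *active_mm;   /* address space actually in use */
};

/* Stand-in for the page tables currently loaded on the CPU. */
static struct mm *current_mm;

/*
 * Models the mm handling in context_switch(). Returns the mm that
 * must be released after the switch (rq->prev_mm in the kernel),
 * or NULL if there is none.
 */
static struct mm *switch_address_space(struct task *prev, struct task *next)
{
    struct mm *mm = next->mm;
    struct mm *oldmm = prev->active_mm;
    struct mm *to_drop = NULL;

    if (!mm) {
        /* Kernel thread: borrow the mm prev was using, no reload. */
        next->active_mm = oldmm;
        oldmm->mm_count++;
    } else {
        /* User process: load its own address space. */
        current_mm = mm;
    }

    if (!prev->mm) {
        /* prev was a kernel thread: hand back what it borrowed. */
        prev->active_mm = NULL;
        to_drop = oldmm;
    }
    return to_drop;
}

/* Runs user A -> kernel thread K -> user B; returns A's final refcount. */
static int demo_borrow(void)
{
    struct mm mm_a = { 1 }, mm_b = { 1 };
    struct task a = { &mm_a, &mm_a };
    struct task k = { NULL, NULL };
    struct task b = { &mm_b, &mm_b };
    struct mm *drop;

    current_mm = &mm_a;

    drop = switch_address_space(&a, &k);    /* K borrows A's mm */
    assert(drop == NULL);
    assert(k.active_mm == &mm_a && mm_a.mm_count == 2);
    assert(current_mm == &mm_a);            /* no page-table reload */

    drop = switch_address_space(&k, &b);    /* K hands it back */
    assert(drop == &mm_a && current_mm == &mm_b);
    drop->mm_count--;   /* the kernel drops it later, in finish_task_switch() */

    return mm_a.mm_count;
}
```

In the model, switching to the kernel thread leaves current_mm untouched and only bumps the reference count, which is the point of the borrowing: no address-space reload is needed for a task that never touches user memory.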

switch_to() is defined in include/asm-x86/system_32.h and is written in inline assembly:

#define switch_to(prev,next,last) do {					\
	unsigned long esi,edi;						\
	asm volatile("pushfl\n\t"		/* Save flags */	\
		     "pushl %%ebp\n\t"					\
		     "movl %%esp,%0\n\t"	/* save ESP */		\
		     "movl %5,%%esp\n\t"	/* restore ESP */	\
		     "movl $1f,%1\n\t"		/* save EIP */		\
		     "pushl %6\n\t"		/* restore EIP */	\
		     "jmp __switch_to\n"				\
		     "1:\t"						\
		     "popl %%ebp\n\t"					\
		     "popfl"						\
		     :"=m" (prev->thread.esp),"=m" (prev->thread.eip),	\
		      "=a" (last),"=S" (esi),"=D" (edi)			\
		     :"m" (next->thread.esp),"m" (next->thread.eip),	\
		      "2" (prev), "d" (next));				\
} while (0)

It is actually a macro. Its job is to save the state of prev and switch to next.

First, prev's EFLAGS and EBP are pushed onto its kernel stack, and then ESP is stored in prev->thread.esp. Together these save prev's state.

Next, movl next->thread.esp, %esp switches the kernel stack. From this instruction on, the kernel is running on next's kernel stack.

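Then movl $1f, prev->thread.eip stores the address of label 1 in prev's saved eip. The next time prev is switched in, it resumes at label 1, where the EBP and EFLAGS pushed earlier are popped off the stack to restore its environment.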

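After that, pushl next->thread.eip pushes next's resume address onto the stack. Note that __switch_to is then entered with a jmp rather than a call, so no return address is pushed. When __switch_to executes ret, it pops the eip pushed just before and jumps there. In other words, once __switch_to returns, the CPU is executing next's code.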
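
The same pattern of saving the current stack and resume point, switching to another stack, and resuming there is available in user space through the ucontext functions getcontext(), makecontext() and swapcontext(). The sketch below is only an analogy and not the kernel mechanism: it assumes a glibc-style ucontext implementation, and task_body(), run_switch_demo() and the trace array are hypothetical names used for illustration.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];  /* private stack for the second context */
static int trace[4];                /* records the order of execution */
static int ntrace;

static void task_body(void)
{
    trace[ntrace++] = 2;                /* running on task_stack */
    swapcontext(&task_ctx, &main_ctx);  /* save our state, resume main */
    trace[ntrace++] = 4;                /* resumed right after the swap */
}

/* Switches back and forth between two stacks; returns the step count. */
static int run_switch_demo(void)
{
    ntrace = 0;
    getcontext(&task_ctx);
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof(task_stack);
    task_ctx.uc_link = &main_ctx;       /* resumed when task_body returns */
    makecontext(&task_ctx, task_body, 0);

    trace[ntrace++] = 1;
    swapcontext(&main_ctx, &task_ctx);  /* first switch: starts task_body */
    trace[ntrace++] = 3;
    swapcontext(&main_ctx, &task_ctx);  /* second switch: resumes task_body */

    /* task_body has returned and uc_link brought us back here. */
    assert(ntrace == 4);
    return ntrace;
}
```

As with label 1 in switch_to, each context resumes at the instruction following its own switch call, on its own stack, no matter which context performed the switch back.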

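One more question: a switch only involves two processes, so why does switch_to take three parameters? Suppose A switches to B. At that moment, on A's kernel stack, the context_switch() arguments prev and next point to A and B. Suppose B then switches to C, and finally C switches back to A. When A resumes, prev should point to C, but the prev saved on A's kernel stack still points to A. context_switch() uses prev again at its end, in the call to finish_task_switch(), so the stale value would cause an error. Hence the third parameter, last: when __switch_to returns, eax points to the prev of the switch that just happened (C in this example), and the macro assigns eax to last. Because context_switch() passes prev as last, the resumed task's stack then holds the correct value.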
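
The A, B, C scenario can be traced with a small model. This is an illustrative sketch only: model_switch_to(), demo_last() and the saved_prev array are hypothetical names, with saved_prev[t] standing in for the copy of prev frozen on task t's kernel stack and the return value standing in for eax after __switch_to returns.

```c
#include <assert.h>

enum { A, B, C, NTASKS };

/* Each task's private copy of `prev`, as frozen on its own kernel stack. */
static int saved_prev[NTASKS];
static int current_task = A;

/*
 * Models switch_to(prev, next, last): `from` stops running with
 * prev == from frozen on its stack, and `to` resumes. The return
 * value is the task that really ran just before `to`, which the
 * macro writes into last.
 */
static int model_switch_to(int from, int to)
{
    assert(current_task == from);
    saved_prev[from] = from;    /* frozen when `from` stops running */
    current_task = to;
    return from;                /* handed to `to` as last */
}

/* Runs A -> B -> C -> A and returns the `last` value A sees on resume. */
static int demo_last(void)
{
    int last;

    current_task = A;
    model_switch_to(A, B);
    model_switch_to(B, C);
    last = model_switch_to(C, A);
    return last;
}
```

After demo_last(), saved_prev[A] is still A while the returned last is C, which is the mismatch that the third parameter exists to correct.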

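An Intel-syntax rendering of switch_to is shown below. Remember that switch_to is a macro, not a function. Where it is expanded, prev and next are parameters of context_switch(), written here as ebp-relative accesses of the form ebp+prev_offset.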

mov eax, [ebp+prev_offset]                #"2" (prev) in the AT&T input list; shares eax with "=a" (last)
mov edx, [ebp+next_offset]                #"d" (next)
pushfd                                    #push EFLAGS
push ebp                                  #push EBP
mov [ebp+prev_offset]->thread.esp, esp    #save esp
mov esp, [ebp+next_offset]->thread.esp    #switch ESP; from here on we are on next's kernel stack
mov [ebp+prev_offset]->thread.eip, $1     #save the address of label 1 as prev's eip; prev resumes at label 1 when it is switched in again
push [ebp+next_offset]->thread.eip        #push next's saved eip; if next was switched out earlier, this is also label 1
jmp __switch_to                           #jump to __switch_to to switch the hardware context
                                          #__switch_to is a fastcall function; its two arguments, prev and next, are passed in eax and edx
                                          #because this is a jmp and not a call, ret jumps to the address on top of the stack, i.e. next's eip pushed above
                                          #__switch_to returns prev, so eax points to prev after the ret
1:
pop ebp                                   #__switch_to's ret lands here; restore the ebp saved when this task was switched out
popfd                                     #restore EFLAGS
mov [ebp+prev_offset], eax                #"=a" (last); ebp is now next's ebp, and eax holds the prev from before the switch

__switch_to() mainly switches the hardware context and is not discussed further here. When it returns, the CPU starts executing the next process's code.

The call chain for the whole switch is:

context_switch() ---> switch_to() ---> __switch_to()

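Here, context_switch() switches the process's page tables (its address space), switch_to() switches the registers (the kernel stack pointer and the saved instruction pointer), and __switch_to() switches the hardware context.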

Reference: http://blog.csdn.net/xiaoxiaomuyu2010/article/details/11935393
