Analyzing the execution of schedule()

Original work by 鄭德倫 (Zheng Delun). Please credit the source when reposting. "Linux Kernel Analysis" (《Linux內核分析》) MOOC course
http://mooc.study.163.com/course/USTC-1000029000
In a terminal on Shiyanlou (實驗樓), we run qemu -kernel linux-3.18.6/arch/x86/boot/bzImage -initrd rootfs.img -S -s
Then we open another terminal and enter:

gdb
(gdb)file linux-3.18.6/vmlinux
(gdb)target remote:1234
(gdb)b schedule
(gdb)c

This lets us debug and trace the execution of schedule().

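When the kernel schedules, it first enters schedule(), which sets a task_struct pointer tsk to the current process.
It then calls sched_submit_work(tsk).
Let us step into this function to see what it does.
When execution reaches sched_submit_work, type si (or s) to step into the function.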

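We can see that this function checks whether tsk->state is 0 (runnable) and returns immediately if the task is runnable.
It also calls tsk_is_pi_blocked(tsk), which checks whether tsk is blocked on a priority-inheritance mutex (tsk->pi_blocked_on is non-NULL), and returns if it is.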

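It then checks whether the task's plugged I/O queue needs to be flushed, so that the queued I/O is submitted before the task goes to sleep.
The main purpose of sched_submit_work is to avoid deadlocks.
Next we enter the __schedule() function.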
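The checks described above can be sketched in user space. This is only a minimal model, assuming a hypothetical struct mock_task that stands in for the few task_struct fields involved; the has_plugged_io flag and the flushes counter are illustrative names, not kernel fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task_struct fields the check reads. */
struct mock_task {
    long state;           /* 0 means runnable, as in the text */
    void *pi_blocked_on;  /* non-NULL: blocked on a PI mutex */
    bool has_plugged_io;  /* stands in for blk_needs_flush_plug() */
    int flushes;          /* counts simulated plug flushes */
};

/* Returns true if plugged I/O was flushed before the task sleeps. */
static bool mock_sched_submit_work(struct mock_task *tsk)
{
    assert(tsk != NULL);
    /* Runnable or PI-blocked tasks return early without flushing. */
    if (tsk->state == 0 || tsk->pi_blocked_on != NULL)
        return false;
    /* About to sleep with queued I/O: submit it first. */
    if (tsk->has_plugged_io) {
        tsk->flushes++;
        tsk->has_plugged_io = false;
        return true;
    }
    return false;
}
```

A runnable task (state 0) returns false without flushing, while a sleeping task with plugged I/O has its queue flushed exactly once.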

__schedule() is the code that actually switches processes. Let us go through its key parts.
1. Declare some local variables

struct task_struct *prev, *next;//task structs of the current process and the next process
unsigned long *switch_count;//points to prev's context-switch counter
struct rq *rq;//run queue
int cpu;

2. Disable kernel preemption and initialize some of the variables

need_resched:
preempt_disable();//disable kernel preemption
cpu = smp_processor_id();
rq = cpu_rq(cpu);//rq holds the run queue of this CPU
rcu_note_context_switch(cpu);
prev = rq->curr;//the task currently running on this run queue becomes prev

3. Select the next process

next = pick_next_task(rq, prev);//pick the highest-priority runnable task to run next
clear_tsk_need_resched(prev);//clear the TIF_NEED_RESCHED flag of prev
clear_preempt_need_resched();
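To illustrate the idea behind picking the next task, here is a simplified sketch that walks scheduling classes from highest to lowest priority and takes the first task offered. It is a user-space model under stated assumptions: struct mock_class, pick_rt, pick_idle and the rt_has_work flag are hypothetical names, and only two classes are modelled.

```c
#include <assert.h>
#include <stddef.h>

struct mock_task { const char *name; };

/* Hypothetical scheduling class: a pick callback plus a link to the
 * next, lower-priority class. */
struct mock_class {
    struct mock_task *(*pick)(void);
    const struct mock_class *next;
};

static struct mock_task rt_task   = { "rt" };
static struct mock_task idle_task = { "idle" };
static int rt_has_work = 0;  /* set to 1 when an rt task is runnable */

static struct mock_task *pick_rt(void)   { return rt_has_work ? &rt_task : NULL; }
static struct mock_task *pick_idle(void) { return &idle_task; }

static const struct mock_class idle_class = { pick_idle, NULL };
static const struct mock_class rt_class   = { pick_rt, &idle_class };

/* Ask each class in priority order; the first non-NULL task wins. */
static struct mock_task *mock_pick_next_task(void)
{
    for (const struct mock_class *c = &rt_class; c != NULL; c = c->next) {
        struct mock_task *p = c->pick();
        if (p != NULL)
            return p;
    }
    return NULL;  /* not reached here: the idle class always has a task */
}
```

With no runnable rt task the idle task is chosen; once rt_has_work is set, the higher-priority class wins.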

4. Perform the process switch

if (likely(prev != next)) {//prev and next are different processes
    rq->nr_switches++;//update the run queue's switch count
    rq->curr = next;
    ++*switch_count;//update prev's context-switch count

    context_switch(rq, prev, next); /* unlocks the rq *///switch the process context
    /*
     * The context switch have flipped the stack from under us
     * and restored the local variables which were saved when
     * this task called schedule() in the past. prev == current
     * is still correct, but it can be moved to another cpu/rq.
     */
    cpu = smp_processor_id();
    rq = cpu_rq(cpu);
} else//same process, no switch needed
    raw_spin_unlock_irq(&rq->lock);

In this code, context_switch(rq, prev, next) performs the context switch from prev to next. Let us step into this function.

static inline void
context_switch(struct rq *rq, struct task_struct *prev,
           struct task_struct *next)
{
    struct mm_struct *mm, *oldmm;//address-space descriptors for next and prev
    prepare_task_switch(rq, prev, next);//prepare for the task switch
    mm = next->mm;
    oldmm = prev->active_mm;
    /* switch the mm_struct */
    if (!mm) {//next is a kernel thread: borrow prev's active_mm
        next->active_mm = oldmm;
        atomic_inc(&oldmm->mm_count);
        enter_lazy_tlb(oldmm, next);
    } else
        switch_mm(oldmm, mm, next);
    if (!prev->mm) {//prev is a kernel thread: give up the borrowed mm
        prev->active_mm = NULL;
        rq->prev_mm = oldmm;
    }
    switch_to(prev, next, prev);//the core of the process switch
    barrier();
    finish_task_switch(this_rq(), prev);
}
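The mm handling in context_switch can be modelled in a few lines. This is a sketch under assumptions: struct mock_mm and struct mock_task are hypothetical types, a plain int replaces the atomic mm_count, and the page-table switch and the later reference drop are left out.

```c
#include <assert.h>
#include <stddef.h>

struct mock_mm { int mm_count; };

/* mm is NULL for a kernel thread; active_mm is what the CPU really uses. */
struct mock_task {
    struct mock_mm *mm;
    struct mock_mm *active_mm;
};

/* Returns the address space in use after switching from prev to next. */
static struct mock_mm *mock_switch_mm(struct mock_task *prev,
                                      struct mock_task *next)
{
    struct mock_mm *mm = next->mm;
    struct mock_mm *oldmm = prev->active_mm;

    assert(oldmm != NULL);
    if (mm == NULL) {
        /* Kernel thread: borrow prev's address space and pin it. */
        next->active_mm = oldmm;
        oldmm->mm_count++;
    }
    /* else: the real kernel calls switch_mm() here to load next's
     * page tables; next->active_mm already equals next->mm. */

    if (prev->mm == NULL)
        prev->active_mm = NULL;  /* kernel thread gives back the loan */
    return next->active_mm;
}
```

Switching from a user process to a kernel thread leaves the user address space in place and raises its count; switching back clears the kernel thread's active_mm.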

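As we can see, context_switch uses switch_to(prev, next, prev) to switch processes. Let us look at the code of switch_to.
switch_to is a macro that switches from prev to next. It first saves the flags, then saves the current process's ebp, stores the current esp in prev->thread.sp, loads next->thread.sp into esp, and stores the address of label 1: in prev->thread.ip.
It then pushes next->thread.ip onto the stack, which by now is next's kernel stack. If next has been switched out by switch_to before, next->thread.ip holds the address of label 1: below (written as 1f in the assembly); if next has just been created and has never been switched out, next->thread.ip holds ret_from_fork.
__switch_canary appears to be part of the stack canary technique that modern operating systems use to guard against stack overflow attacks.
jmp __switch_to is a regparm call: the arguments are passed in registers instead of being pushed onto the stack.
eax holds prev and edx holds next. Why jmp rather than call __switch_to? A call would automatically push the address of the following instruction (label 1:) as the return address, so __switch_to() could only ever ret to that point and never to ret_from_fork. With jmp, the ret at the end of __switch_to() pops the next->thread.ip pushed earlier.
When a process is scheduled again, it resumes at 1:, pops ebp, and then pops the flags.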
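The two resume points described above (a fresh task starts at its entry point, a previously switched-out task continues right after the switch) can be observed in user space. The following is only an analogy, assuming the POSIX ucontext API (getcontext, makecontext, swapcontext) as provided by glibc; it is not how the kernel implements switch_to, and task_body, run_switch_demo and trace are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <ucontext.h>

static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];
static int trace[4];
static int trace_len = 0;

static void task_body(void)
{
    trace[trace_len++] = 2;             /* first run: starts at the entry point */
    swapcontext(&task_ctx, &main_ctx);  /* switch away, saving the resume point */
    trace[trace_len++] = 4;             /* later run: resumes after the switch */
}

/* Switches into the task twice and returns the number of trace entries. */
static int run_switch_demo(void)
{
    trace_len = 0;
    if (getcontext(&task_ctx) != 0)
        return -1;
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;       /* continue here when task_body returns */
    makecontext(&task_ctx, task_body, 0);

    trace[trace_len++] = 1;
    swapcontext(&main_ctx, &task_ctx);  /* first switch: task starts fresh */
    trace[trace_len++] = 3;
    swapcontext(&main_ctx, &task_ctx);  /* second switch: task resumes mid-function */
    return trace_len;
}
```

After run_switch_demo() the trace array records the order in which the two contexts ran, which makes the "start fresh" and "resume after the switch" cases visible.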

#define switch_to(prev, next, last)                 \
do {                                    \
    /*                              \
     * Context-switching clobbers all registers, so we clobber  \
     * them explicitly, via unused output variables.        \
     * (EAX and EBP is not listed because EBP is saved/restored \
     * explicitly for wchan access and EAX is the return value of   \
     * __switch_to())                       \
     */                             \
    unsigned long ebx, ecx, edx, esi, edi;              \
                                    \
    asm volatile("pushfl\n\t"       /* save    flags */ \
             "pushl %%ebp\n\t"      /* save    EBP   */ \
             "movl %%esp,%[prev_sp]\n\t"    /* save    ESP   */ \
             "movl %[next_sp],%%esp\n\t"    /* restore ESP   */ \
             "movl $1f,%[prev_ip]\n\t" /* save    EIP   */ \
             "pushl %[next_ip]\n\t" /* restore EIP   */ \
             __switch_canary                    \
             "jmp __switch_to\n"    /* regparm call  */ \
             "1:\t"                     \
             "popl %%ebp\n\t"       /* restore EBP   */ \
             "popfl\n"          /* restore flags */ \
                                    \
             /* output parameters */                \
             : [prev_sp] "=m" (prev->thread.sp),        \
               [prev_ip] "=m" (prev->thread.ip),        \
               "=a" (last),                 \
                                    \
               /* clobbered output registers: */        \
               "=b" (ebx), "=c" (ecx), "=d" (edx),      \
               "=S" (esi), "=D" (edi)               \
                                        \
               __switch_canary_oparam               \
                                    \
               /* input parameters: */              \
             : [next_sp]  "m" (next->thread.sp),        \
               [next_ip]  "m" (next->thread.ip),        \
                                        \
               /* regparm parameters for __switch_to(): */  \
               [prev]     "a" (prev),               \
               [next]     "d" (next)                \
                                    \
               __switch_canary_iparam               \
                                    \
             : /* reloaded segment registers */         \
            "memory");                  \
} while (0)

5. Re-enable preemption

sched_preempt_enable_no_resched();
if (need_resched())
        goto need_resched;
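The retry pattern in this last step can be sketched as follows. It is a toy model under assumptions: mock_preempt_count, pending_wakeups and mock_schedule_loop are hypothetical names, and a simple counter stands in for interrupts that set the reschedule flag while a pass is running.

```c
#include <assert.h>

static int mock_preempt_count = 0;  /* 0 means preemption is allowed */

/* pending_wakeups: how many passes get a simulated reschedule request.
 * Returns the number of passes through the scheduling body. */
static int mock_schedule_loop(int pending_wakeups)
{
    int passes = 0;
    int need_resched_flag;

    assert(pending_wakeups >= 0);
need_resched:
    mock_preempt_count++;      /* like preempt_disable() */
    passes++;
    need_resched_flag = 0;     /* like clear_tsk_need_resched(prev) */
    if (pending_wakeups > 0) { /* a wakeup arrives during this pass */
        pending_wakeups--;
        need_resched_flag = 1;
    }
    mock_preempt_count--;      /* like sched_preempt_enable_no_resched() */
    if (need_resched_flag)
        goto need_resched;
    return passes;
}
```

With no pending wakeups the body runs once; each simulated wakeup adds one more pass, and the preemption counter is balanced on exit.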

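At this point the process switch is complete.
Summary:
The overall flow of schedule() is: call sched_submit_work(tsk), then enter __schedule(), which disables preemption, calls pick_next_task() to choose next, calls context_switch() (and within it switch_to()) when prev != next, and finally re-enables preemption.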
