MIT Operating Systems Lab: MIT JOS Lab 4




Some rambling before the main content:

         Having survived the baptism of Lab 3, I ground my way into Lab 4. Once again, a salute to the pioneers who took notes on the JOS labs, wrote blog posts, and hosted their JOS lab code publicly! Without such a great open-source environment, none of this could be learned. Cherish it — and not just in words. Doing is better than saying!


--------------------------------------------------------------------------------------------------------------------------------


LAB 4: Preemptive Multitasking



The theme of lab4 is preemptive multitasking.


JOS abstracts the multi-core CPU; the details are in ./kern/cpu.h.


We can see that the maximum number of CPUs supported is 8... heh — it's just a macro definition (NCPU).

Concretely, each CPU is then abstracted by a structure: struct CpuInfo.
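For reference, the relevant definitions in kern/cpu.h look roughly like this (excerpted from the lab sources; details can vary slightly between course years):

// Maximum number of CPUs
#define NCPU  8

// Per-CPU state
struct CpuInfo {
	uint8_t cpu_id;                // Local APIC ID; index into cpus[]
	volatile unsigned cpu_status;  // CPU_UNUSED, CPU_STARTED or CPU_HALTED
	struct Env *cpu_env;           // The currently-running environment
	struct Taskstate cpu_ts;       // Used by x86 to find stack for interrupt
};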


The multiprocessor as a whole is described by struct mp, the MP floating pointer structure.
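Its definition in kern/mpconfig.c follows the Intel MP spec:

struct mp {                  // floating pointer [MP 4.1]
	uint8_t signature[4];    // "_MP_"
	physaddr_t physaddr;     // phys addr of MP config table
	uint8_t length;          // 1
	uint8_t specrev;         // [14]
	uint8_t checksum;        // all bytes must add up to 0
	uint8_t type;            // MP system config type
	uint8_t imcrp;
	uint8_t reserved[3];
} __attribute__((__packed__));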



Its member mp.physaddr points to the configuration table, struct mpconf.



Note that the table's last member is a zero-length placeholder array, entries[0]; the macro definitions below it are the entry-type tags used when indexing these entries.
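Again from kern/mpconfig.c:

struct mpconf {              // configuration table header [MP 4.2]
	uint8_t signature[4];    // "PCMP"
	uint16_t length;         // total table length
	uint8_t version;         // [14]
	uint8_t checksum;        // all bytes must add up to 0
	uint8_t product[20];     // product id
	physaddr_t oemtable;     // OEM table pointer
	uint16_t oemlength;      // OEM table length
	uint16_t entry;          // entry count
	physaddr_t lapicaddr;    // address of local APIC
	uint16_t xlength;        // extended table length
	uint8_t xchecksum;       // extended table checksum
	uint8_t reserved;
	uint8_t entries[0];      // table entries
} __attribute__((__packed__));

// mpconf table entry types
#define MPPROC    0x00       // One per processor
#define MPBUS     0x01       // One per bus
#define MPIOAPIC  0x02       // One per I/O APIC
#define MPIOINTR  0x03       // One per bus interrupt source
#define MPLINTR   0x04       // One per system interrupt source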






The sum() function adds up the len bytes of data starting at addr.

mpsearch1() searches for the MP floating pointer structure within the len-byte range starting at physical address a.

At first I did not quite understand why sum(mp, sizeof(*mp)) has to equal 0 here — the reason is that the MP spec chooses the checksum byte so that all bytes of a valid structure sum to 0 (mod 256), so a nonzero sum means the candidate is not a real MP structure.
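For reference, the two helpers in kern/mpconfig.c are roughly:

static uint8_t
sum(void *addr, int len)
{
	int i, sum;

	sum = 0;
	for (i = 0; i < len; i++)
		sum += ((uint8_t *)addr)[i];
	return sum;
}

// Look for an MP structure in the len bytes at physical address a.
static struct mp *
mpsearch1(physaddr_t a, int len)
{
	struct mp *mp = KADDR(a), *end = KADDR(a + len);

	for (; mp < end; mp++)
		if (memcmp(mp->signature, "_MP_", 4) == 0 &&
		    sum(mp, sizeof(*mp)) == 0)
			return mp;
	return NULL;
}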



In mpconfig(), the pile of if checks up front is all preparation for the final *pmp = mp and return conf.

pmp is a pointer to a pointer; the assignment stores mp into the pointer variable passed in by the caller.


OK, with the groundwork laid, we can now read mp_init() (if it doesn't click, re-read the helper functions that mp_init() calls, covered above).


mp_init() first calls mpconfig() to locate struct mpconf, then configures each CPU structure according to the entries information in that table.

If proc->flags has MPPROC_BOOT set, the processor for this entry is the one used for booting, and we assign the address of cpus[ncpu] to the bootcpu pointer. Note that ncpu is a global variable, so at this point this effectively hands bootcpu the address of the first element of the cpus array.

ismp is also a global variable, with a default initial value of 0. When we run mp_init() it gets set to 1, but if any entry fails to match (the switch falls through to default), multiprocessor initialization has failed and the machine cannot run with multiple processors, so ismp is set back to 0.

ismp is checked again later.
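Condensed, the entry-walking loop at the heart of mp_init() (kern/mpconfig.c) looks like this (error printouts trimmed):

	// Walk conf->entries, one entry per processor/bus/APIC.
	for (p = conf->entries, i = 0; i < conf->entry; i++) {
		switch (*p) {
		case MPPROC:
			proc = (struct mpproc *)p;
			if (proc->flags & MPPROC_BOOT)
				bootcpu = &cpus[ncpu];
			if (ncpu < NCPU) {
				cpus[ncpu].cpu_id = ncpu;
				ncpu++;
			}
			p += sizeof(struct mpproc);
			continue;
		case MPBUS:
		case MPIOAPIC:
		case MPIOINTR:
		case MPLINTR:
			p += 8;
			continue;
		default:
			cprintf("mpinit: unknown config type %x\n", *p);
			ismp = 0;          // give up on SMP
			i = conf->entry;   // break out of the loop
		}
	}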


And that wraps up mp_init().

Next, lapic_init() does its initialization work.





 


Part A: Multiprocessor Support and Cooperative Multitasking

             In the first part of this lab, you will first extend JOS to run on a multiprocessor system, and then implement some new JOS kernel system calls to allow user-level environments to create additional new environments.


Multiprocessor Support


            We are going to make JOS support "symmetric multiprocessing" (SMP), a multiprocessor model in which all CPUs have equivalent access to system resources such as memory and I/O buses. While all CPUs are functionally identical in SMP, during the boot process they can be classified into two types: 
the bootstrap processor (BSP) is responsible for initializing the system and for booting the operating system; and the application processors (APs) are activated by the BSP only after the operating system is up and running. Which processor is the BSP is determined by the hardware and the BIOS. Up to this point, all your existing JOS code has been running on the BSP.






//
// Reserve size bytes in the MMIO region and map [pa,pa+size) at this
// location.  Return the base of the reserved region.  size does *not*
// have to be multiple of PGSIZE.
//
void *
mmio_map_region(physaddr_t pa, size_t size)
{
    // 'base' tracks where the next reservation starts; it is static,
    // so its value persists across calls (like nextfree in boot_alloc).
    static uintptr_t base = MMIOBASE;
    void *ret = (void *)base;

    size = ROUNDUP(size, PGSIZE);
    if (base + size > MMIOLIM || base + size < base)
    {
        panic("mmio_map_region : reservation overflow\n");
    }

    // Map [pa, pa+size) at base, cache-disabled and write-through.
    // Never set PTE_U on MMIO mappings.
    boot_map_region(kern_pgdir,
                    base,
                    size,
                    pa,
                    (PTE_P | PTE_PCD | PTE_PWT));
    base += size;

    return ret;
}




Application Processor Bootstrap


             Before booting up APs, the BSP should first collect information about the multiprocessor system, such
as the total number of CPUs, their APIC IDs and the MMIO address of the LAPIC unit. The  mp_init()
function in  kern/mpconfig.c  retrieves this information by reading the MP configuration table that resides
in the BIOS's region of memory.


With SMP, the system initially boots on a single core (the BSP); only after that boot completes are the APs brought up. boot_aps() is the entry function that starts the APs.

 


memmove copies essentially the whole of ./kern/mpentry.S — the assembly between the symbols mpentry_start and mpentry_end — to the address the pointer code refers to.
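For context, boot_aps() in kern/init.c ends up looking roughly like this (the skeleton plus the usual way of filling in the blanks):

// Start the non-boot (AP) processors.
static void
boot_aps(void)
{
	extern unsigned char mpentry_start[], mpentry_end[];
	void *code;
	struct CpuInfo *c;

	// Write AP entry code (kern/mpentry.S) to MPENTRY_PADDR.
	code = KADDR(MPENTRY_PADDR);
	memmove(code, mpentry_start, mpentry_end - mpentry_start);

	// Boot each AP one at a time.
	for (c = cpus; c < cpus + ncpu; c++) {
		if (c == cpus + cpunum())   // this CPU is already running
			continue;

		// Tell mpentry.S what stack to use.
		mpentry_kstack = percpu_kstacks[c - cpus] + KSTKSIZE;
		// Start the CPU at mpentry_start.
		lapic_startap(c->cpu_id, PADDR(code));
		// Wait until the AP reports in from mp_main().
		while (c->cpu_status != CPU_STARTED)
			;
	}
}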



                The boot_aps() function (in kern/init.c) drives the AP bootstrap process. APs start in real mode, much like how the bootloader started in boot/boot.S, so boot_aps() copies the AP entry code (kern/mpentry.S) to a memory location that is addressable in real mode. Unlike with the bootloader, we have some control over where the AP will start executing code; we copy the entry code to 0x7000 (MPENTRY_PADDR), but any unused, page-aligned physical address below 640KB would work.



At boot time, once memory mapping and interrupt initialization are done, the APs are started.




For page_init(), you only need to add one extra else if (i == MPENTRY_PADDR/PGSIZE) branch marking that page as in use; see the sketch below.
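A minimal sketch of that branch inside page_init()'s loop in kern/pmap.c (assuming the usual loop variable i indexing pages[]):

    else if (i == MPENTRY_PADDR / PGSIZE) {
        // This physical page holds the AP bootstrap code copied from
        // kern/mpentry.S, so keep it off the free list.
        pages[i].pp_ref = 1;
        pages[i].pp_link = NULL;
    }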





Answer:

             #define MPBOOTPHYS(s) ((s) - mpentry_start + MPENTRY_PADDR)

MPBOOTPHYS computes a symbol's address relative to MPENTRY_PADDR. The assembly is linked at load addresses above KERNBASE, but JOS needs to run this entry code at physical address 0x7000, so every symbol reference must be translated down. The page at 0x7000 is, of course, the one we reserved in pmap.c.


CSDN user SunnyBeiKe's answer:

Before the AP has protected mode enabled, there is no way to address the space above 3GB, so MPBOOTPHYS is used to compute the corresponding physical addresses.

In boot.S, paging had not yet been enabled, so we could dictate both where the program was loaded and where it started executing; in mpentry.S, however, the main CPU is already in protected mode, so physical addresses cannot be named directly — the code's link-time (linear) addresses have to be mapped back to the corresponding physical addresses.



Memory layout for the multi-core CPU case.



Per-CPU current environment pointer.


               Since each CPU can run a different user process simultaneously, we redefined the symbol curenv to refer to cpus[cpunum()].cpu_env (or thiscpu->cpu_env), which points to the environment currently executing on the current CPU (the CPU on which the code is running).




// Modify mappings in kern_pgdir to support SMP
//   - Map the per-CPU stacks in the region [KSTACKTOP-PTSIZE, KSTACKTOP)
//
static void
mem_init_mp(void)
{

    int i;
    uintptr_t kstacktop_i;

    // Stack i occupies [kstacktop_i - KSTKSIZE, kstacktop_i); the
    // KSTKGAP bytes below each stack remain unmapped as a guard, so
    // a stack overflow faults instead of corrupting the next stack.
    for (i = 0; i < NCPU; i++)
    {
        kstacktop_i = KSTACKTOP - i * (KSTKSIZE + KSTKGAP);
        boot_map_region(kern_pgdir,
                        kstacktop_i - KSTKSIZE,
                        KSTKSIZE,
                        PADDR(&percpu_kstacks[i]),
                        (PTE_W) | (PTE_P));
    }

}



Add lock_kernel() in i386_init(), mp_main(), and trap(); add unlock_kernel() in env_run(). A sketch of the placement follows.
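Rough placement of the big kernel lock calls (one common solution; exact placement may vary):

// kern/init.c, i386_init(): take the lock before waking the APs.
lock_kernel();
boot_aps();

// kern/init.c, mp_main(): take the lock before calling sched_yield().
lock_kernel();
sched_yield();

// kern/trap.c, trap(): lock when trapping in from user mode.
if ((tf->tf_cs & 3) == 3) {
	lock_kernel();
	assert(curenv);
}

// kern/env.c, env_run(): release the lock just before dropping
// back to user mode.
unlock_kernel();
env_pop_tf(&e->env_tf);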


My notes on spinlocks:

http://blog.csdn.net/cinmyheart/article/details/43880517




If a CPU left data it would need later on a single shared kernel stack, it would run into the re-entrancy problem: another CPU trapping into the kernel could clobber it. Hence the per-CPU kernel stacks.



Round-Robin Scheduling




The mp_main() function:
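After the exercise, mp_main() in kern/init.c ends up looking roughly like this (skeleton plus the usual lock_kernel()/sched_yield() ending):

// Setup code for APs.
void
mp_main(void)
{
	// We are running at a high EIP now; safe to switch to kern_pgdir.
	lcr3(PADDR(kern_pgdir));
	cprintf("SMP: CPU %d starting\n", cpunum());

	lapic_init();
	env_init_percpu();
	trap_init_percpu();
	xchg(&thiscpu->cpu_status, CPU_STARTED); // tell boot_aps() we're up

	// Grab the big kernel lock, then start running environments.
	lock_kernel();
	sched_yield();
}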



On the implementation of sched_yield():


// Choose a user environment to run and run it.
void
sched_yield(void)
{
	struct Env *idle;

	// Implement simple round-robin scheduling.
	//
	// Search through 'envs' for an ENV_RUNNABLE environment in
	// circular fashion starting just after the env this CPU was
	// last running.  Switch to the first such environment found.
	//
	// If no envs are runnable, but the environment previously
	// running on this CPU is still ENV_RUNNING, it's okay to
	// choose that environment.
	//
	// Never choose an environment that's currently running on
	// another CPU (env_status == ENV_RUNNING). If there are
	// no runnable environments, simply drop through to the code
	// below to halt the cpu.

	// LAB 4: Your code here.

    idle = thiscpu->cpu_env;
    uint32_t start = (idle != NULL) ? ENVX(idle->env_id) : 0;
    uint32_t i = start;
    bool first = true;

    // One full circular pass over envs[], starting from the slot of
    // the env this CPU ran last.
    for (; i != start || first; i = (i + 1) % NENV, first = false)
    {
        if (envs[i].env_status == ENV_RUNNABLE)
        {
            env_run(&envs[i]);   // env_run() does not return
        }
    }

    // Nothing else runnable: keep running the current env if it is
    // still ENV_RUNNING on this CPU.
    if (idle && idle->env_status == ENV_RUNNING)
    {
        env_run(idle);
    }

	// sched_halt never returns
	sched_halt();
}


You can also check out my other post on round-robin in JOS:

http://blog.csdn.net/cinmyheart/article/details/45192013


Remember to update the ENV_CREATE() calls in kern/init.c — the lab asks you to create three (or more) environments that all run user/yield. For example:
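// kern/init.c, in i386_init():
ENV_CREATE(user_yield, ENV_TYPE_USER);
ENV_CREATE(user_yield, ENV_TYPE_USER);
ENV_CREATE(user_yield, ENV_TYPE_USER);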




When the system runs with CPUS=2, the console I/O looks like this:

You will see output like the following:



System Calls for Environment Creation


  Although your kernel is now capable of running and switching between multiple user-level environments, it is still limited to running environments that the kernel initially set up. You will now implement the necessary JOS system calls to allow user environments to create and start other new user environments.


The multi-core concurrency and environment switching we just saw all involve environments the kernel itself set up; now we want to hand the ability to create concurrency to the user, which means extending our previously rudimentary system call facility.


  Unix provides the fork() system call as its process creation primitive. Unix fork() copies the entire address space of the calling process (the parent) to create a new process (the child). The only differences between the two observable from user space are their process IDs and parent process IDs (as returned by getpid and getppid). In the parent, fork() returns the child's process ID, while in the child, fork() returns 0. By default, each process gets its own private address space, and neither process's modifications to memory are visible to the other.


        You will provide a different, more primitive set of JOS system calls for creating new user-mode environments. With these system calls you will be able to implement a Unix-like fork() entirely in user space, in addition to other styles of environment creation. The new system calls you will write for JOS are as follows.

In other words, we have to build our own analogue of Unix's fork().

Complete the functions in kern/syscall.c, starting with sys_exofork().
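A sketch of the usual sys_exofork() solution:

// Allocate a new environment.  Returns the new envid on success,
// < 0 on error.  The child starts NOT_RUNNABLE, with register state
// copied from the parent -- except that eax is zeroed so that
// sys_exofork appears to return 0 in the child.
static envid_t
sys_exofork(void)
{
	struct Env *e;
	int ret;

	if ((ret = env_alloc(&e, curenv->env_id)) < 0)
		return ret;

	e->env_status = ENV_NOT_RUNNABLE;
	e->env_tf = curenv->env_tf;
	e->env_tf.tf_regs.reg_eax = 0;   // child sees return value 0

	return e->env_id;
}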



After that there are four more functions to implement:

sys_env_set_status
sys_page_alloc
sys_page_map
sys_page_unmap

They are too long to paste here — go see them on my GitHub.


Next we use a user test program to exercise the fork()-style machinery we just built around sys_exofork().

           We have provided a very primitive implementation of a Unix-like fork() in the test program user/dumbfork.c. This test program uses the above system calls to create and run a child environment with a copy of its own address space. The two environments then switch back and forth using sys_yield as in the previous exercise. The parent exits after 10 iterations, whereas the child exits after 20.

                                       

From midnight until now, 2:30 in the morning — (⊙v⊙) yep, it panicked the whole way....


Finally got it working...

umain() calls dumbfork(), which forks off a child.

Here who receives dumbfork's return value, just like fork() in the Unix family: the parent gets the child's pid and the child gets 0.

If who is 0, the loop prints 20 iterations; if who is nonzero, it prints 10.

The two environments switch back and forth via sys_yield, so neither hogs the CPU forever.


Below is part of the output:



The legendary copy-on-write (COW) technique.

Part B: Copy-on-Write Fork

Why do we need this technique?

    xv6 Unix implements fork() by copying all data from the parent's pages into new pages allocated for the child. This is essentially the same approach that dumbfork() takes. The copying of the parent's address space into the child is the most expensive part of the fork() operation.
             However, a call to fork() is frequently followed almost immediately by a call to exec() in the child process, which replaces the child's memory with a new program. This is what the shell typically does, for example. In this case, the time spent copying the parent's address space is largely wasted, because the child process will use very little of its memory before calling exec().


                                            




We have met struct Trapframe before — the structure used to save an environment's state when switching from user mode into the kernel.

And if a page fault is triggered here through fork(), user mode itself now needs to handle the exception.


   During normal execution, a user environment in JOS will run on the normal user stack: its ESP register starts out pointing at USTACKTOP, and the stack data it pushes resides on the page between USTACKTOP-PGSIZE and USTACKTOP-1 inclusive. When a page fault occurs in user mode, however, the kernel will restart the user environment running a designated user-level page fault handler on a different stack, namely the user exception stack.


Here is the frame layout pushed on the user exception stack — see struct UTrapframe below.
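The structure, from inc/trap.h:

struct UTrapframe {
	/* information about the fault */
	uint32_t utf_fault_va;    /* va for T_PGFLT, 0 otherwise */
	uint32_t utf_err;
	/* trap-time return state */
	struct PushRegs utf_regs;
	uintptr_t utf_eip;
	uint32_t utf_eflags;
	/* the trap-time stack to return to */
	uintptr_t utf_esp;
} __attribute__((packed));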










Exercise 11. Finish set_pgfault_handler() in lib/pgfault.c


Here the user program must register its own page fault handler by calling the function below. If none is set up, the kernel's page fault handler takes over instead!! It panicked on me endlessly, but once I figured it out it was genuinely fun. Kids, this is the very mechanism behind the core dump we all know!

How could I not be excited?
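A sketch of set_pgfault_handler() in lib/pgfault.c (the usual solution for Exercise 11):

// Set the user-mode page fault handler function.
void
set_pgfault_handler(void (*handler)(struct UTrapframe *utf))
{
	int r;

	if (_pgfault_handler == 0) {
		// First time through: allocate the user exception stack
		// and register the assembly upcall with the kernel.
		if ((r = sys_page_alloc(0, (void *)(UXSTACKTOP - PGSIZE),
					PTE_P | PTE_U | PTE_W)) < 0)
			panic("set_pgfault_handler: %e", r);
		if ((r = sys_env_set_pgfault_upcall(0, _pgfault_upcall)) < 0)
			panic("set_pgfault_handler: %e", r);
	}

	// Save handler pointer for the assembly upcall to call.
	_pgfault_handler = handler;
}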




There is a ready-made user program in user/faultdie.c:


// test user-level fault handler -- just exit when we fault

#include <inc/lib.h>

void
handler(struct UTrapframe *utf)
{
	void *addr = (void*)utf->utf_fault_va;
	uint32_t err = utf->utf_err;
	cprintf("i faulted at va %x, err %x\n", addr, err & 7);
	sys_env_destroy(sys_getenvid());
}

void
umain(int argc, char **argv)
{
	set_pgfault_handler(handler);
	*(int*)0xDeadBeef = 0;
}

Just run it with make run-faultdie.




We can dig deeper into exactly how the fault is triggered.

When debugging with gdb, the kernel side does not seem to see user-space debug symbols, so we have to set breakpoints by raw address manually (i.e., no gdb tab completion for symbol names).


Here we can see that obj/user/faultdie.sym contains the symbol information the linker needs, along with the corresponding addresses.


We set a breakpoint at umain, i.e. at 0x800033.



On the left is the result of disassembling faultdie.o directly; on the right is the per-instruction trace we see while debugging in gdb.


(The registers may look slightly different under gdb.) Either way, we can see that the page fault fires right after the movl instruction executes.

It dereferences the illegal address and tries to write 0 into it, triggering the page fault:

*(int*)0xDeadBeef = 0;

At this point we first enter the TRAPHANDLER(page_fault, T_PGFLT) stub.

It then calls trap(), which calls trap_dispatch(), which in turn calls page_fault_handler().

Now for the main event: because we have set env_pgfault_upcall, the kernel invokes that user-mode handler, and we land in:


handler(struct UTrapframe *utf)
{
	void *addr = (void*)utf->utf_fault_va;
	uint32_t err = utf->utf_err;
	cprintf("i faulted at va %x, err %x\n", addr, err & 7);
	sys_env_destroy(sys_getenvid());
}

That is why you see "i faulted at va deadbeef, err 6" printed to the screen via cprintf.

This mechanism is excellent! You can see that it protects the system!!! A user can trigger a page fault, but a user's page fault will not bring the system down. No panic — the fault information is simply returned to the user. It is the same principle that guards the ordinary C programs we write against illegal page reads and writes.

The "core dump" / "segmentation fault" we hit all the time (writing a value to an illegal address) is exactly this defense mechanism.

After all, you shouldn't be able to take the whole system down just by writing a user C program... I can't help marveling — it really is brilliant! Haha

Comparing the two user programs side by side, the only difference is that the left one calls cprintf while the right one calls sys_cputs directly.

The result: the left one triggers the user-mode page fault handler as expected, while the right one is killed by a kernel assertion.

The essence of the problem is that sys_cputs is itself a system call... the bad dereference happens while the kernel is already handling the call, so it never goes through the trap_dispatch route into the user handler... that's all there is to it.



Implementing Copy-on-Write Fork


    You now have the kernel facilities to implement copy-on-write fork() entirely in user space.


I'll just show the results directly. The code is a bit long; read it here:

https://github.com/jasonleaster/MIT_JOS_2014/tree/lab4





The user program is this:

// Fork a binary tree of processes and display their structure.

#include <inc/lib.h>

#define DEPTH 3

void forktree(const char *cur);

void
forkchild(const char *cur, char branch)
{
	char nxt[DEPTH+1];

	if (strlen(cur) >= DEPTH)
		return;

	snprintf(nxt, DEPTH+1, "%s%c", cur, branch);
	if (fork() == 0) {
		forktree(nxt);
		exit();
	}
}

void
forktree(const char *cur)
{
	cprintf("%04x: I am '%s'\n", sys_getenvid(), cur);

	forkchild(cur, '0');
	forkchild(cur, '1');
}

void
umain(int argc, char **argv)
{
	forktree("");
}

forktree() is given an initial empty string, and it in turn calls forkchild().

forkchild() is recursive: the parent returns while each child calls forktree() again, until the recursion depth reaches 3 (DEPTH), so the deepest child environments we can see are the ones named "111".




The core idea of copy-on-write is: do not copy the parent's address space into the child up front. Only when one of them attempts a write does a user-mode page fault fire; trap_dispatch() routes it to the configured user-mode page fault handler — which is essentially the pgfault() function in lib/fork.c — and that handler then gives the faulting environment its own private, writable copy of the page.
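A sketch of that handler (the usual lab solution; PTE_COW is the software-defined "copy-on-write" PTE bit from lib/fork.c, and uvpt is the user's read-only view of the page table):

static void
pgfault(struct UTrapframe *utf)
{
	void *addr = (void *)utf->utf_fault_va;
	uint32_t err = utf->utf_err;
	int r;

	// Only a write to a copy-on-write page should land here.
	if (!(err & FEC_WR) || !(uvpt[PGNUM(addr)] & PTE_COW))
		panic("pgfault: not a write to a COW page");

	// Allocate a fresh page at PFTEMP, copy the old page's contents
	// into it, then remap it writable at the faulting address.
	if ((r = sys_page_alloc(0, PFTEMP, PTE_P | PTE_U | PTE_W)) < 0)
		panic("pgfault: sys_page_alloc: %e", r);
	memmove(PFTEMP, ROUNDDOWN(addr, PGSIZE), PGSIZE);
	if ((r = sys_page_map(0, PFTEMP,
			      0, ROUNDDOWN(addr, PGSIZE),
			      PTE_P | PTE_U | PTE_W)) < 0)
		panic("pgfault: sys_page_map: %e", r);
	if ((r = sys_page_unmap(0, PFTEMP)) < 0)
		panic("pgfault: sys_page_unmap: %e", r);
}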


Part C: Preemptive Multitasking and Inter-Process Communication (IPC)


              

Go to trap_dispatch() in kern/trap.c and fill in the clock-interrupt handling below; for the exact code, my GitHub is the safer reference.
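A minimal sketch of the clock-interrupt case in trap_dispatch() (the standard Part C solution):

	// Handle clock interrupts: acknowledge the LAPIC, then let the
	// scheduler pick another environment -- this is what makes the
	// multitasking preemptive.
	if (tf->tf_trapno == IRQ_OFFSET + IRQ_TIMER) {
		lapic_eoi();
		sched_yield();
		return;
	}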





Inter-Process Communication (IPC)


               


I can hardly bring myself to keep writing... it gets so long-winded...

I'll update this part when I'm in a better mood.

I hope future notes will lean more toward theoretical analysis rather than pasting code solutions everywhere. The code is on GitHub — everything is in the corresponding lab branch, and those who care will go read it. I hope we get the chance to discuss.

Didn't Linus say it: "RTFSC"


: )


Good grief... lab4 was mildly stunning... it really was a blast




And with that, this stage finally comes to a close. I will keep updating these notes to make them clearer and more detailed.





