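Reading Notes on "Professional Linux Kernel Architecture" 006: System Calls for Process Management

Overview

System calls for process management are usually not invoked directly by applications; they go through an intermediate layer such as the C standard library.

The central step in creating a process is duplicating the parent process into the child.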

 

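Process duplication

Linux implements three system calls for process duplication.

fork: a heavyweight call that creates a complete copy of the parent process;

vfork: similar to fork, but it does not create a copy of the parent's data; the child shares the data with the parent. To make this safe, the kernel keeps the parent blocked until the child exits or starts a new program;

clone: creates threads and gives precise control over what is shared and what is copied between parent and child. The fine-grained resource allocation that clone offers extends the usual notion of a thread and, to some extent, allows a continuous transition between threads and processes. In Linux the distinction between threads and processes is in fact not rigid, and the two terms are often used interchangeably;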

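Most importantly, Linux uses copy-on-write (COW): the parent's data is not copied to the child right away. Instead, the address spaces of parent and child point to the same physical pages, and those pages are marked read-only. When either process tries to write to such a page, the processor reports a page fault to the kernel, and the kernel creates a private copy of that page for the writing process so that the write can proceed.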

The entry points of these system calls are sys_fork, sys_vfork and sys_clone, respectively. They are platform-dependent; taking x86 as an example (in arch/x86/kernel/process_64.c):

asmlinkage long sys_fork(struct pt_regs *regs)
{
	return do_fork(SIGCHLD, regs->rsp, regs, 0, NULL, NULL);
}

asmlinkage long
sys_clone(unsigned long clone_flags, unsigned long newsp,
	  void __user *parent_tid, void __user *child_tid, struct pt_regs *regs)
{
	if (!newsp)
		newsp = regs->rsp;
	return do_fork(clone_flags, newsp, regs, 0, parent_tid, child_tid);
}

/*
 * This is trivial, and on the face of it looks like it
 * could equally well be done in user mode.
 *
 * Not so, for quite unobvious reasons - register pressure.
 * In user mode vfork() cannot have a stack frame, and if
 * done by calling the "clone()" system call directly, you
 * do not have enough call-clobbered registers to hold all
 * the information you need.
 */
asmlinkage long sys_vfork(struct pt_regs *regs)
{
	return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs->rsp, regs, 0,
		    NULL, NULL);
}

In the end they all call the platform-independent function do_fork().

/*
 *  Ok, this is the main fork-routine.
 *
 * It copies the process, and if successful kick-starts
 * it and waits for it to finish using the VM if required.
 */
long do_fork(unsigned long clone_flags,
	      unsigned long stack_start,
	      struct pt_regs *regs,
	      unsigned long stack_size,
	      int __user *parent_tidptr,
	      int __user *child_tidptr)

For the implementation of this function, refer to the book directly.

 

Kernel threads

Kernel threads are processes started directly by the kernel itself. They are created through the following interface:

/*
 * create a kernel thread without removing it from tasklists
 */
extern long kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);

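Its implementation still calls do_fork underneath (the version shown here is the IA-64 one, as the ia64_* helpers indicate):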

pid_t
kernel_thread (int (*fn)(void *), void *arg, unsigned long flags)
{
	extern void start_kernel_thread (void);
	unsigned long *helper_fptr = (unsigned long *) &start_kernel_thread;
	struct {
		struct switch_stack sw;
		struct pt_regs pt;
	} regs;

	memset(&regs, 0, sizeof(regs));
	regs.pt.cr_iip = helper_fptr[0];	/* set entry point (IP) */
	regs.pt.r1 = helper_fptr[1];		/* set GP */
	regs.pt.r9 = (unsigned long) fn;	/* 1st argument */
	regs.pt.r11 = (unsigned long) arg;	/* 2nd argument */
	/* Preserve PSR bits, except for bits 32-34 and 37-45, which we can't read.  */
	regs.pt.cr_ipsr = ia64_getreg(_IA64_REG_PSR) | IA64_PSR_BN;
	regs.pt.cr_ifs = 1UL << 63;		/* mark as valid, empty frame */
	regs.sw.ar_fpsr = regs.pt.ar_fpsr = ia64_getreg(_IA64_REG_AR_FPSR);
	regs.sw.ar_bspstore = (unsigned long) current + IA64_RBS_OFFSET;
	regs.sw.pr = (1 << PRED_KERNEL_STACK);
	return do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, &regs.pt, 0, NULL, NULL);
}

Another interface for creating kernel threads is kthread_create:

/**
 * kthread_create - create a kthread.
 * @threadfn: the function to run until signal_pending(current).
 * @data: data ptr for @threadfn.
 * @namefmt: printf-style name for the thread.
 *
 * Description: This helper function creates and names a kernel
 * thread.  The thread will be stopped: use wake_up_process() to start
 * it.  See also kthread_run(), kthread_create_on_cpu().
 *
 * When woken, the thread will run @threadfn() with @data as its
 * argument. @threadfn() can either call do_exit() directly if it is a
 * standalone thread for which noone will call kthread_stop(), or
 * return when 'kthread_should_stop()' is true (which means
 * kthread_stop() has been called).  The return value should be zero
 * or a negative error number; it will be passed to kthread_stop().
 *
 * Returns a task_struct or ERR_PTR(-ENOMEM).
 */
struct task_struct *kthread_create(int (*threadfn)(void *data),
				   void *data,
				   const char namefmt[],
				   ...)

 

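Starting new programs

After a process has been duplicated, a new program is started by replacing the existing program with new code.

Linux uses the execve system call for this.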

As with the calls above, the entry point of execve is the sys_execve function:

long
sys_execve (char __user *filename, char __user * __user *argv, char __user * __user *envp,
	    struct pt_regs *regs)
{
	char *fname;
	int error;

	fname = getname(filename);
	error = PTR_ERR(fname);
	if (IS_ERR(fname))
		goto out;
	error = do_execve(fname, argv, envp, regs);
	putname(fname);
out:
	return error;
}

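This function is platform-dependent, while the do_execve it calls is platform-independent.

For the implementation of do_execve(), again refer to the book.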

 

Exiting processes

A process exits through the exit system call, whose entry point is sys_exit:

asmlinkage long sys_exit(int error_code)
{
	do_exit((error_code&0xff)<<8);
}

It is platform-independent.

 
