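Overview
Applications usually do not invoke the process-management system calls directly; they go through an intermediate layer such as the C standard library.
The central step in creating a process is copying the parent process into the child.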
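Process duplication
Linux implements three system calls for duplicating a process.
fork: the heavyweight call; it creates a complete copy of the parent process;
vfork: similar to fork, but it does not copy the parent's data and shares it with the parent instead. To make this safe, the kernel keeps the parent blocked until the child exits or starts a new program;
clone: creates threads and gives precise control over what is shared and what is copied between parent and child. The fine-grained resource allocation that clone offers extends the usual notion of a thread and, to a degree, allows a continuous transition between threads and processes. In Linux the distinction between threads and processes is in fact not rigid, and the two terms are often used interchangeably;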
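The vfork contract described above (the parent stays blocked until the child exits or execs) can be sketched from user space. This is only an illustration: the function name vfork_child_status is made up for this note, and the child restricts itself to _exit(), since it is running on the parent's memory.

```c
#define _DEFAULT_SOURCE 1   /* assumption: glibc, which needs this to expose vfork() */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a child with vfork(), let it exit with
 * `code`, and return the exit status the parent collects (-1 on error).
 */
int vfork_child_status(int code)
{
    pid_t pid = vfork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: shares the parent's memory, so it only calls _exit(). */
        _exit(code);
    }

    /* Parent: resumes only after the child has exited (or exec'ed). */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling vfork_child_status(7) from main should hand back 7; the child must never return from the function or touch the parent's variables.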
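Most importantly, Linux uses copy-on-write (COW). The parent's data is not copied to the child right away; instead the address spaces of parent and child point to the same physical memory, and those pages are marked read-only. When either process tries to write to such a page, the processor reports a page fault to the kernel, and the kernel creates a copy of that page that belongs to the writing process alone and lets the write proceed on it.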
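Copy-on-write itself cannot be observed from user space, but its visible effect can: a write in the child never shows up in the parent. A minimal sketch, with a made-up function name and buffer, assuming an ordinary Linux system:

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Illustrative data; after fork() the page holding it is shared at first. */
static char buffer[4096] = "parent data";

/*
 * Hypothetical helper: returns 1 if the parent's buffer is unchanged
 * after the child overwrote its own view of it, 0 if it changed,
 * and -1 on error.
 */
int parent_copy_survives_child_write(void)
{
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* The first write faults; the kernel gives the child its own page. */
        strcpy(buffer, "child data");
        _exit(strcmp(buffer, "child data") == 0 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* The parent still sees the original contents. */
    return strcmp(buffer, "parent data") == 0;
}
```

The child reports through its exit status that it saw its own write, while the parent checks that its copy is untouched.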
The entry points of these system calls are sys_fork, sys_vfork and sys_clone respectively. They are platform-dependent; taking x86 as an example (in arch\x86\kernel\process_64.c):
asmlinkage long sys_fork(struct pt_regs *regs)
{
return do_fork(SIGCHLD, regs->rsp, regs, 0, NULL, NULL);
}
asmlinkage long
sys_clone(unsigned long clone_flags, unsigned long newsp,
void __user *parent_tid, void __user *child_tid, struct pt_regs *regs)
{
if (!newsp)
newsp = regs->rsp;
return do_fork(clone_flags, newsp, regs, 0, parent_tid, child_tid);
}
/*
* This is trivial, and on the face of it looks like it
* could equally well be done in user mode.
*
* Not so, for quite unobvious reasons - register pressure.
* In user mode vfork() cannot have a stack frame, and if
* done by calling the "clone()" system call directly, you
* do not have enough call-clobbered registers to hold all
* the information you need.
*/
asmlinkage long sys_vfork(struct pt_regs *regs)
{
return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs->rsp, regs, 0,
NULL, NULL);
}
In practice all three end up in the platform-independent function do_fork().
/*
* Ok, this is the main fork-routine.
*
* It copies the process, and if successful kick-starts
* it and waits for it to finish using the VM if required.
*/
long do_fork(unsigned long clone_flags,
unsigned long stack_start,
struct pt_regs *regs,
unsigned long stack_size,
int __user *parent_tidptr,
int __user *child_tidptr)
For the implementation of this function, it is best to read the book directly.
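In the entry points shown earlier, sys_fork passes only SIGCHLD as the flags, and sys_clone falls back to the caller's stack pointer when newsp is 0. So a raw clone call with those arguments should behave like fork. The sketch below tries that from user space; it assumes x86-64 Linux, where the raw system call takes the flags first and the stack second, and the name clone_like_fork is made up for this note.

```c
#define _DEFAULT_SOURCE 1   /* assumption: glibc, which needs this to expose syscall() */
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue the raw clone system call with the same
 * arguments sys_fork hands to do_fork (flags = SIGCHLD, no new stack),
 * let the child exit with `code`, and return the status the parent sees.
 */
int clone_like_fork(int code)
{
    /* Assumed x86-64 argument order: flags, stack, parent_tid, child_tid, tls. */
    long pid = syscall(SYS_clone, (unsigned long)SIGCHLD, 0UL, 0UL, 0UL, 0UL);

    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: nothing is shared (no CLONE_VM), so this is a plain copy. */
        _exit(code);
    }

    int status;
    if (waitpid((pid_t)pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The vfork flag combination (CLONE_VFORK | CLONE_VM) is deliberately not tried this way: as the kernel comment above sys_vfork notes, it cannot be done safely from a C function with a stack frame.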
Kernel threads
A kernel thread is a process started directly by the kernel itself. It is created through the following interface:
/*
* create a kernel thread without removing it from tasklists
*/
extern long kernel_thread(int (*fn)(void *), void * arg, unsigned long flags);
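Its implementation still calls do_fork underneath (the version shown here is the ia64 one, as the ia64_getreg calls indicate):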
pid_t
kernel_thread (int (*fn)(void *), void *arg, unsigned long flags)
{
extern void start_kernel_thread (void);
unsigned long *helper_fptr = (unsigned long *) &start_kernel_thread;
struct {
struct switch_stack sw;
struct pt_regs pt;
} regs;
memset(&regs, 0, sizeof(regs));
regs.pt.cr_iip = helper_fptr[0]; /* set entry point (IP) */
regs.pt.r1 = helper_fptr[1]; /* set GP */
regs.pt.r9 = (unsigned long) fn; /* 1st argument */
regs.pt.r11 = (unsigned long) arg; /* 2nd argument */
/* Preserve PSR bits, except for bits 32-34 and 37-45, which we can't read. */
regs.pt.cr_ipsr = ia64_getreg(_IA64_REG_PSR) | IA64_PSR_BN;
regs.pt.cr_ifs = 1UL << 63; /* mark as valid, empty frame */
regs.sw.ar_fpsr = regs.pt.ar_fpsr = ia64_getreg(_IA64_REG_AR_FPSR);
regs.sw.ar_bspstore = (unsigned long) current + IA64_RBS_OFFSET;
regs.sw.pr = (1 << PRED_KERNEL_STACK);
return do_fork(flags | CLONE_VM | CLONE_UNTRACED, 0, &regs.pt, 0, NULL, NULL);
}
Another way to create a kernel thread is kthread_create:
/**
* kthread_create - create a kthread.
* @threadfn: the function to run until signal_pending(current).
* @data: data ptr for @threadfn.
* @namefmt: printf-style name for the thread.
*
* Description: This helper function creates and names a kernel
* thread. The thread will be stopped: use wake_up_process() to start
* it. See also kthread_run(), kthread_create_on_cpu().
*
* When woken, the thread will run @threadfn() with @data as its
* argument. @threadfn() can either call do_exit() directly if it is a
* standalone thread for which noone will call kthread_stop(), or
* return when 'kthread_should_stop()' is true (which means
* kthread_stop() has been called). The return value should be zero
* or a negative error number; it will be passed to kthread_stop().
*
* Returns a task_struct or ERR_PTR(-ENOMEM).
*/
struct task_struct *kthread_create(int (*threadfn)(void *data),
void *data,
const char namefmt[],
...)
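Starting a new program
Once a process has been duplicated, a new program is started by replacing the existing program with new code.
Linux uses the execve system call for this.
As before, the entry point of execve is the sys_execve function: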
long
sys_execve (char __user *filename, char __user * __user *argv, char __user * __user *envp,
struct pt_regs *regs)
{
char *fname;
int error;
fname = getname(filename);
error = PTR_ERR(fname);
if (IS_ERR(fname))
goto out;
error = do_execve(fname, argv, envp, regs);
putname(fname);
out:
return error;
}
This entry point is platform-dependent, while the do_execve it calls is platform-independent.
For the implementation of do_execve(), again see the book.
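From user space, the "duplicate, then replace" sequence is the familiar fork plus execve pair. A minimal sketch, assuming /bin/sh exists on the system; the function name run_shell_command and the environment passed to the child are made up for this note:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child, replace it with /bin/sh running
 * `command`, and return the shell's exit status (-1 on error).
 */
int run_shell_command(const char *command)
{
    /* Argument and environment vectors are prepared before forking. */
    char *const argv[] = { "sh", "-c", (char *)command, NULL };
    char *const envp[] = { "PATH=/usr/bin:/bin", NULL };

    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: on success execve() never returns. */
        execve("/bin/sh", argv, envp);
        _exit(127);   /* reached only if execve() failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_shell_command("exit 3") should return 3, since the shell that replaced the child exits with that status.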
Exiting a process
A process exits through the exit system call, whose entry point is sys_exit:
asmlinkage long sys_exit(int error_code)
{
do_exit((error_code&0xff)<<8);
}
It is platform-independent.
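The (error_code&0xff)<<8 encoding in sys_exit is what the parent later reads back from waitpid. The sketch below returns the raw wait status so the encoding is visible. It assumes Linux; note that glibc's _exit() normally goes through exit_group rather than exit, which applies the same encoding. The function name raw_wait_status is made up for this note.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that exits with `code` and return
 * the raw status word that waitpid() reports to the parent (-1 on error).
 */
int raw_wait_status(int code)
{
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0)
        _exit(code);   /* only the low 8 bits of `code` survive */

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;

    return status;
}
```

On Linux a normal exit leaves the low byte of the status word at zero, so WEXITSTATUS(status) simply undoes the shift by 8.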