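Overview

This article introduces task_struct, the structure the Linux kernel uses to represent a process.

It is defined in include/linux/sched.h.

The structure looks like this: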


struct task_struct {
	volatile long state;	/* -1 unrunnable, 0 runnable, >0 stopped */
	void *stack;
	atomic_t usage;
	unsigned int flags;	/* per process flags, defined below */
	unsigned int ptrace;

	int lock_depth;		/* BKL lock depth */

#ifdef CONFIG_SMP
#ifdef __ARCH_WANT_UNLOCKED_CTXSW
	int oncpu;
#endif
#endif

	int prio, static_prio, normal_prio;
	struct list_head run_list;
	const struct sched_class *sched_class;
	struct sched_entity se;


	/* ... many more members omitted ... */
};

The sections below describe some of the members in turn.

state

Holds the current state of the process.

The possible values are:

/*
 * Task state bitmask. NOTE! These bits are also
 * encoded in fs/proc/array.c: get_task_state().
 *
 * We have two separate sets of flags: task->state
 * is about runnability, while task->exit_state are
 * about the task exiting. Confusing, but this way
 * modifying one set can't modify the other one by
 * mistake.
 */
#define TASK_RUNNING		0
#define TASK_INTERRUPTIBLE	1
#define TASK_UNINTERRUPTIBLE	2
#define TASK_STOPPED		4
#define TASK_TRACED		8
/* in tsk->exit_state */
#define EXIT_ZOMBIE		16
#define EXIT_DEAD		32
/* in tsk->state again */
#define TASK_DEAD		64

There is also a separate exit_state member, which takes the EXIT_XXX values listed above.

rlim

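Resource limits are stored in an array (in the 2.6-series source it is declared inside signal_struct and reached through the task's signal pointer):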

	/*
	 * We don't bother to synchronize most readers of this at all,
	 * because there is no reader checking a limit that actually needs
	 * to get both rlim_cur and rlim_max atomically, and either one
	 * alone is a single word that can safely be read normally.
	 * getrlimit/setrlimit use task_lock(current->group_leader) to
	 * protect this instead of the siglock, because they really
	 * have no need to disable irqs.
	 */
	struct rlimit rlim[RLIM_NLIMITS];

Each element has the following type:

struct rlimit {
	unsigned long	rlim_cur;
	unsigned long	rlim_max;
};

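rlim_cur is the soft limit on the resource and rlim_max is the hard limit.

The limits can be manipulated with getrlimit/setrlimit, although reading the code did not make it clear to me how this is implemented.

The limits of a particular process can be viewed through procfs, for example with cat /proc/self/limits.

[Screenshot: resource limits of a process on Ubuntu 18.04]

RLIM_NLIMITS is 15 here, so there are at most 15 limits. The screenshot shows 16, probably because it was taken on a different kernel version (Ubuntu 18.04).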

nsproxy

Each process holds a pointer to the namespaces it belongs to. The structure it points to is:

/*
 * A structure to contain pointers to all per-process
 * namespaces - fs (mount), uts, network, sysvipc, etc.
 *
 * 'count' is the number of tasks holding a reference.
 * The count for each namespace, then, will be the number
 * of nsproxies pointing to it, not the number of tasks.
 *
 * The nsproxy is shared by tasks which share all namespaces.
 * As soon as a single namespace is cloned or unshared, the
 * nsproxy is copied.
 */
struct nsproxy {
	atomic_t count;
	struct uts_namespace *uts_ns;
	struct ipc_namespace *ipc_ns;
	struct mnt_namespace *mnt_ns;
	struct pid_namespace *pid_ns;
	struct user_namespace *user_ns;
	struct net 	     *net_ns;
};

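[Figure: the relationship between processes and namespaces]

There are several kinds of namespace, one for each pointer in nsproxy: UTS, IPC, mount, PID, user and network.

When a process is created with fork or clone, flags decide whether the child shares the parent's namespaces or gets new ones (a plain fork passes no such flags, so everything is shared). unshare does the same for an existing process. The flags are: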

/*
 * cloning flags:
 */
#define CSIGNAL		0x000000ff	/* signal mask to be sent at exit */
#define CLONE_VM	0x00000100	/* set if VM shared between processes */
#define CLONE_FS	0x00000200	/* set if fs info shared between processes */
#define CLONE_FILES	0x00000400	/* set if open files shared between processes */
#define CLONE_SIGHAND	0x00000800	/* set if signal handlers and blocked signals shared */
#define CLONE_PTRACE	0x00002000	/* set if we want to let tracing continue on the child too */
#define CLONE_VFORK	0x00004000	/* set if the parent wants the child to wake it up on mm_release */
#define CLONE_PARENT	0x00008000	/* set if we want to have the same parent as the cloner */
#define CLONE_THREAD	0x00010000	/* Same thread group? */
#define CLONE_NEWNS	0x00020000	/* New namespace group? */
#define CLONE_SYSVSEM	0x00040000	/* share system V SEM_UNDO semantics */
#define CLONE_SETTLS	0x00080000	/* create a new TLS for the child */
#define CLONE_PARENT_SETTID	0x00100000	/* set the TID in the parent */
#define CLONE_CHILD_CLEARTID	0x00200000	/* clear the TID in the child */
#define CLONE_DETACHED		0x00400000	/* Unused, ignored */
#define CLONE_UNTRACED		0x00800000	/* set if the tracing process can't force CLONE_PTRACE on this clone */
#define CLONE_CHILD_SETTID	0x01000000	/* set the TID in the child */
#define CLONE_STOPPED		0x02000000	/* Start in stopped state */
#define CLONE_NEWUTS		0x04000000	/* New utsname group? */
#define CLONE_NEWIPC		0x08000000	/* New ipcs */
#define CLONE_NEWUSER		0x10000000	/* New user namespace */
#define CLONE_NEWPID		0x20000000	/* New pid namespace */
#define CLONE_NEWNET		0x40000000	/* New network namespace */

init_nsproxy defines the initial, global namespaces:

extern struct nsproxy init_nsproxy;
struct nsproxy init_nsproxy = INIT_NSPROXY(init_nsproxy);

The corresponding macro:

#define INIT_NSPROXY(nsproxy) {						\
	.pid_ns		= &init_pid_ns,					\
	.count		= ATOMIC_INIT(1),				\
	.uts_ns		= &init_uts_ns,					\
	.mnt_ns		= NULL,						\
	INIT_NET_NS(net_ns)                                             \
	INIT_IPC_NS(ipc_ns)						\
	.user_ns	= &init_user_ns,				\
}

The individual namespaces are described below.

UTS namespace

The structure is:

struct uts_namespace {
	struct kref kref;
	struct new_utsname name;
};

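The first member, kref, was mentioned earlier in the discussion of kernel objects: it is a reference counter that tracks how many places in the kernel use this uts_namespace instance.

The second member, name, holds the actual data. It is itself a structure: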

struct new_utsname {
	char sysname[65];
	char nodename[65];
	char release[65];
	char version[65];
	char machine[65];
	char domainname[65];
};

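It is simply a collection of strings.

The strings hold the system name, the kernel release, the machine name and similar information.

They can be displayed with the uname command (for example, uname -a).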

The initial values come from init_uts_ns:

struct uts_namespace init_uts_ns = {
	.kref = {
		.refcount	= ATOMIC_INIT(2),
	},
	.name = {
		.sysname	= UTS_SYSNAME,
		.nodename	= UTS_NODENAME,
		.release	= UTS_RELEASE,
		.version	= UTS_VERSION,
		.machine	= UTS_MACHINE,
		.domainname	= UTS_DOMAINNAME,
	},
};
EXPORT_SYMBOL_GPL(init_uts_ns);

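The UTS_XXX macros are plain string constants. Some of these strings cannot be changed, while others can (nodename and domainname, for example, can be set at run time with sethostname and setdomainname).

A new UTS namespace is created by copy_utsname.

To create it, the kernel makes a copy of the previous uts_namespace instance and points the corresponding pointer in the current process's nsproxy instance at the new copy.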

User namespace

The structure of a user namespace is:

struct user_namespace {
	struct kref		kref;
	struct hlist_head	uidhash_table[UIDHASH_SZ];
	struct user_struct	*root_user;
};

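kref needs no further explanation.

root_user is responsible for recording resource consumption, and the uidhash_table hash table gives access to these per-user records.

Note that although it is called a user namespace, the focus is on resources. It is presumably named after users because resources are accounted per user, and users are told apart by their UID.

Process IDs

A process has many IDs associated with it.

First, every process has an ID that is unique within its namespace, called the PID: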

pid_t pid;

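Second, a process may belong to a thread group, so it also carries a thread group ID, called the TGID (for the thread group leader, tgid equals pid):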

pid_t tgid;

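Because a parent namespace can see the processes in its child namespaces, a process may have several PIDs, one per namespace in which it is visible. This calls for a distinction between global and local IDs:

A global ID is unique within the kernel itself and in the initial namespace, which is the namespace of the init process started at boot.

A local ID belongs to one particular namespace and is not valid globally.

The pid and tgid members shown above are global IDs.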

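Several processes can be combined into a process group, identified by a process group ID (PGID) whose value is the ID of the group leader. Several process groups can in turn be combined into a session, identified by a session ID (SID). Neither is stored directly in task_struct; both live in the structure used for signal handling: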

struct signal_struct *signal;  /* member of task_struct */
/*
 * NOTE! "signal_struct" does not have it's own
 * locking, because a shared signal_struct always
 * implies a shared sighand_struct, so locking
 * sighand_struct is always a proper superset of
 * the locking of signal_struct.
 */
struct signal_struct {
    /* ... earlier members omitted ... */

    /*
    * pgrp and session fields are deprecated.
    * use the task_session_Xnr and task_pgrp_Xnr routines below
    */

    union {
        pid_t pgrp __deprecated;
        pid_t __pgrp;
    };

    struct pid *tty_old_pgrp;

    union {
        pid_t session __deprecated;
        pid_t __session;
    };

    /* ... remaining members omitted ... */
};

Judging from the comment, however, these two fields are deprecated and are not meant to be used directly any more.

Process relationships

task_struct contains the following two members:

/*
* children/sibling forms the list of my children plus the
* tasks I'm ptracing.
*/
struct list_head children;	/* list of my children */
struct list_head sibling;	/* linkage in my parent's children list */

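Both are list heads: children is the head of the list of this process's children, and sibling links the process into its parent's children list, next to its siblings that share the same parent.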

task_struct is a very large structure, so not every member is covered here. Later articles will continue the walkthrough.

Reading notes on 《深入Linux內核架構》 (Professional Linux Kernel Architecture), part 004: process representation
