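Kernel Management of Linux Process IDs

When a process is created, Linux assigns it a number. The number is unique within the namespace the process belongs to, but it is not unique across sibling namespaces, so from the point of view of the global namespace it is not globally unique. This number is the process ID, or PID for short.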

1. Data structure representation of the process ID

The PID is stored in task_struct, the structure that represents a process.

struct task_struct{
....
     pid_t pid;
     pid_t tgid;
.....
};

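Both fields are of type pid_t. The type is architecture-dependent; on x86 it is defined as int, so the range of int bounds how many IDs can be in use at the same time. tgid is the thread group ID: because a thread in Linux is also a process, the thread group ID is simply the PID of the main thread.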

typedef int	__kernel_pid_t;
typedef __kernel_pid_t pid_t;
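
The pid/tgid relationship can be observed from user space. The following is a minimal sketch, not kernel code: it assumes a Linux system where `getpid()` reports the kernel's tgid and the `gettid` system call reports the kernel's per-task pid. The helper names `task_pid`, `task_tgid` and `is_thread_group_leader` are made up for this illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel "pid" of the calling task (user space calls this the thread ID). */
static pid_t task_pid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Kernel "tgid" of the calling task (user space calls this the process ID). */
static pid_t task_tgid(void)
{
    return getpid();
}

/* Hypothetical helper: 1 when the caller is the leader of its thread group. */
static int is_thread_group_leader(void)
{
    pid_t pid = task_pid();

    assert(pid > 0);
    return pid == task_tgid();
}
```

Called from the main thread, `is_thread_group_leader()` returns 1. Called from a thread created later, it would return 0, since that thread has its own pid but shares the tgid.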

2. Managing process IDs

All of this is tied to a specific PID namespace, so first look at the PID namespace structure.

struct pid_namespace {
	struct kref kref;
	struct pidmap pidmap[PIDMAP_ENTRIES];
	int last_pid;
	struct task_struct *child_reaper;
	struct kmem_cache *pid_cachep;
	int level;
	struct pid_namespace *parent;
#ifdef CONFIG_PROC_FS
	struct vfsmount *proc_mnt;
#endif
};

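The structure serves many purposes, such as guaranteeing that allocated PIDs are unique. Only these members matter here:

child_reaper. Every PID namespace has a process that plays the role init plays in the global namespace: it performs wait4 on orphaned processes in the namespace. child_reaper holds a pointer to that process.
parent. A pointer to the parent namespace.
level. The depth of this namespace in the namespace hierarchy. The first namespace, the global one, has level 0. Namespaces with a higher level are visible to namespaces with a lower level: the kernel maps every PID of a higher-level namespace into each lower-level ancestor namespace, which is what makes it visible there.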
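
The visibility rule can be sketched with a simplified model that keeps only the `level` and `parent` fields. `struct ns_model`, `ns_init_child` and `ns_can_see` are hypothetical names for this illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct pid_namespace: only the fields discussed. */
struct ns_model {
    int level;                /* 0 for the global (root) namespace */
    struct ns_model *parent;  /* NULL for the root namespace */
};

/* Initialise a namespace one level below its parent (or as root if NULL). */
static void ns_init_child(struct ns_model *child, struct ns_model *parent)
{
    child->parent = parent;
    child->level = parent ? parent->level + 1 : 0;
}

/*
 * Returns 1 if processes living in `owner` are visible from `viewer`,
 * that is, if `viewer` is `owner` itself or one of its ancestors.
 */
static int ns_can_see(const struct ns_model *viewer, const struct ns_model *owner)
{
    const struct ns_model *cur = owner;

    assert(viewer != NULL);
    /* Climb from owner towards the root until we reach viewer's depth. */
    while (cur && cur->level > viewer->level)
        cur = cur->parent;
    return cur == viewer;
}
```

With a chain root, child, grandchild, the root can see the grandchild's processes, while the grandchild cannot see the child's, and a sibling of the child sees neither.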

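To implement namespace visibility, two data structures are used: struct pid is the kernel's internal representation of a PID, and struct upid holds the information visible in one particular namespace. They are defined as follows: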

struct upid {
	/* Try to keep pid_chain in the same cacheline as nr for find_pid */
	int nr;   // the numeric ID, i.e. the PID value visible inside the namespace
	struct pid_namespace *ns; // the namespace this ID belongs to
	struct hlist_node pid_chain; // links all upid instances into a hash overflow chain
};

struct pid
{
	atomic_t count;  // reference count
	/* lists of tasks that use this pid */
	struct hlist_head tasks[PIDTYPE_MAX]; // each array entry is a list head, one per ID type
	struct rcu_head rcu;
	int level;
	struct upid numbers[1];
};

PIDTYPE_MAX is the number of ID types. The types themselves are:

enum pid_type
{
	PIDTYPE_PID,	/* process ID */
	PIDTYPE_PGID,	/* process group ID */
	PIDTYPE_SID,	/* session ID */
	PIDTYPE_MAX
};

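A diagram found online shows how these data structures relate:

The diagram is not easy to read at first. It has three parts, explained below.

struct pid is the kernel's representation of a process ID. Several task_struct instances may share the same ID, so all the task_struct instances sharing it have to be associated with the struct pid in some way. That is the job of the tasks array in struct pid, which holds the heads of several lists. struct task_struct therefore has to provide the list nodes, and it does: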
```
struct task_struct{
......
    struct pid_link pids[PIDTYPE_MAX];
......
};
struct pid_link{	 	// defined in pid.h
    struct hlist_node node;
    struct pid *pid;
};
```
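In this structure, pid points to the struct pid the process belongs to, and node is the list element. The label "mode" in the diagram above should read "node". With that, the upper right part of the diagram is easy to understand. Moving further down:
struct pid has a level field giving the depth of the namespace the pid was allocated in, so level + 1 is the number of namespaces that can see the process. With only the global namespace, the process is visible in one namespace. If that namespace has a child namespace, the child's level is 1, and a process in the child is visible in two namespaces: its own and the parent. To support this, struct pid declares the numbers field with a single element, because that is the common case. More entries are not a problem: numbers sits at the end of the structure, so extra entries can be appended and read back according to level, and as long as the space has been allocated the array does not overflow. Each array element is the struct upid for one level of the namespace hierarchy, which is exactly the information about the PID that is visible in that namespace.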
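
The trailing-array layout can be sketched in user space. This is a simplified model under stated assumptions: it uses a C99 flexible array member in place of the kernel's `numbers[1]`, replaces the namespace pointer with a plain level number, and the names `upid_model`, `pid_model` and `pid_model_alloc` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

struct upid_model {
    int nr;        /* PID value as seen inside one namespace */
    int ns_level;  /* stand-in for the namespace pointer */
};

struct pid_model {
    int level;                    /* depth of the namespace the pid lives in */
    struct upid_model numbers[];  /* level + 1 entries follow the header */
};

/* Allocate a pid_model with one upid slot per level from 0 to `level`. */
static struct pid_model *pid_model_alloc(int level)
{
    size_t size;
    struct pid_model *pid;

    assert(level >= 0);
    size = sizeof(struct pid_model)
         + ((size_t)level + 1) * sizeof(struct upid_model);
    pid = calloc(1, size);
    if (!pid)
        return NULL;

    pid->level = level;
    for (int i = 0; i <= level; i++)
        pid->numbers[i].ns_level = i;   /* slot i describes namespace level i */
    return pid;
}
```

Because the allocation size depends on `level`, indexing `numbers[0]` through `numbers[level]` stays inside the allocated block, which is the point made above about the array not overflowing.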
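Now the lower right part: finding the struct pid instance that corresponds to a given PID in a given namespace. What a namespace sees of a PID is a struct upid, so the struct pid has to be found from that struct upid.
To do this the kernel uses a hash table, defined in pid.c: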
```
static struct hlist_head *pid_hash;
```
This is an array whose size depends on the amount of memory in the machine. pidhash_init computes a suitable capacity and allocates the memory:
```
void __init pidhash_init(void)
{
	int i, pidhash_size;
	unsigned long megabytes = nr_kernel_pages >> (20 - PAGE_SHIFT);

	pidhash_shift = max(4, fls(megabytes * 4));
	pidhash_shift = min(12, pidhash_shift);
	pidhash_size = 1 << pidhash_shift;

	printk("PID hash table entries: %d (order: %d, %Zd bytes)\n",
		pidhash_size, pidhash_shift,
		pidhash_size * sizeof(struct hlist_head));

	pid_hash = alloc_bootmem(pidhash_size *	sizeof(*(pid_hash)));
	if (!pid_hash)
		panic("Could not alloc pidhash!\n");
	for (i = 0; i < pidhash_size; i++)
		INIT_HLIST_HEAD(&pid_hash[i]);
}
```
This function is called during kernel boot, from start_kernel.
After that, a struct upid can be hashed by its nr value together with the address of its namespace.
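
To make the sizing concrete, the sketch below reproduces the clamping arithmetic from pidhash_init in user space and pairs it with a bucket function. `fls_model` is a portable stand-in for the kernel's fls(), and `bucket_for` is a hypothetical multiplicative hash chosen for illustration; it is not claimed to be the kernel's pid_hashfn.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the most significant set bit, counting from 1; 0 when x == 0. */
static int fls_model(unsigned long x)
{
    int pos = 0;

    while (x) {
        pos++;
        x >>= 1;
    }
    return pos;
}

/* Same clamping as pidhash_init(): the shift is kept within [4, 12]. */
static int pidhash_shift_for(unsigned long megabytes)
{
    int shift = fls_model(megabytes * 4);

    if (shift < 4)
        shift = 4;
    if (shift > 12)
        shift = 12;
    return shift;
}

/* Hypothetical bucket function: mix nr with the namespace address. */
static unsigned int bucket_for(int nr, const void *ns, int shift)
{
    uint32_t key = (uint32_t)nr + (uint32_t)(uintptr_t)ns;

    assert(shift >= 4 && shift <= 12);
    /* Multiplicative hashing: keep the top `shift` bits of the product. */
    return (uint32_t)(key * 2654435761u) >> (32 - shift);
}
```

With this arithmetic a machine with 1 MB gets 16 buckets (shift 4), 64 MB gives 512 buckets (shift 9), and anything from 512 MB upward is capped at 4096 buckets (shift 12).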

3. Associating a process's task_struct with the PID structure struct pid

int fastcall attach_pid(struct task_struct *task, enum pid_type type,
		struct pid *pid)
{
	struct pid_link *link;

	link = &task->pids[type];
	link->pid = pid;
	hlist_add_head_rcu(&link->node, &pid->tasks[type]);

	return 0;
}

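This is easy to follow: it attaches the struct pid to the struct task_struct and so creates a two-way link. From the task_struct, the struct pid instance is reached through task_struct->pids[type].pid. From the struct pid instance, walking the tasks[type] list reaches the task_struct instances.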
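
The two-way link can be modelled in user space with a plain singly linked list instead of the kernel's RCU-protected hlist. All names here (`task_model`, `link_model`, `attach_model`, `first_task_model`) are hypothetical; the pointer arithmetic in `first_task_model` is a hand-written equivalent of what container_of does.

```c
#include <assert.h>
#include <stddef.h>

enum { TYPE_PID, TYPE_PGID, TYPE_SID, TYPE_MAX };

struct node { struct node *next; };

struct pid_model {
    struct node *tasks[TYPE_MAX];   /* list heads, one per ID type */
};

struct link_model {
    struct node node;               /* element on pid->tasks[type] */
    struct pid_model *pid;          /* back pointer to the pid */
};

struct task_model {
    const char *name;
    struct link_model pids[TYPE_MAX];
};

/* Link task and pid in both directions for the given ID type. */
static void attach_model(struct task_model *task, int type, struct pid_model *pid)
{
    struct link_model *link = &task->pids[type];

    assert(type >= 0 && type < TYPE_MAX);
    link->pid = pid;
    link->node.next = pid->tasks[type];   /* push onto the head of the list */
    pid->tasks[type] = &link->node;
}

/* Recover the first task on pid->tasks[type] from its embedded list node. */
static struct task_model *first_task_model(struct pid_model *pid, int type)
{
    struct node *first = pid->tasks[type];

    if (!first)
        return NULL;
    return (struct task_model *)((char *)first
            - offsetof(struct task_model, pids)
            - (size_t)type * sizeof(struct link_model)
            - offsetof(struct link_model, node));
}
```

Two tasks attached to the same pid under TYPE_PGID model two processes in one process group: each task points at the pid, and the pid's list leads back to both tasks, most recently attached first.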

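4. Implementation

Managing these structures mainly comes down to two problems:

Given a local process ID and its namespace, find the corresponding task_struct instance.
Given a task_struct, an ID type and a namespace, find the process ID inside that namespace, that is, the nr value of the struct upid.

For the first problem, first find the struct pid instance from the local process ID and its namespace, then determine the task_struct instance.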

struct pid * fastcall find_pid_ns(int nr, struct pid_namespace *ns)
{
	struct hlist_node *elem;
	struct upid *pnr;

	hlist_for_each_entry_rcu(pnr, elem,
			&pid_hash[pid_hashfn(nr, ns)], pid_chain)
		if (pnr->nr == nr && pnr->ns == ns)
			return container_of(pnr, struct pid,
					numbers[ns->level]);

	return NULL;
}
EXPORT_SYMBOL_GPL(find_pid_ns);

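The lookup walks the pid_hash array described earlier. For how container_of works, see the blog post "The container_of Mechanism in the Linux Kernel". Once the struct pid has been found, determining the struct task_struct is easy.

The kernel wraps both steps in a single function: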
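
The container_of step in find_pid_ns is the subtle part: the matching struct upid is element `ns->level` of the numbers array, so the enclosing struct pid is found by stepping back over that many array elements plus the offset of numbers. A minimal user-space sketch follows, with hypothetical `*_model` names, a fixed-size array, and a linear scan standing in for the hash chain walk.

```c
#include <assert.h>
#include <stddef.h>

struct ns_model { int level; };

struct upid_model {
    int nr;
    struct ns_model *ns;
};

struct pid_model {
    int level;
    struct upid_model numbers[3];   /* fixed size to keep the sketch simple */
};

/* Step back from &pid->numbers[ns->level] to the enclosing pid_model. */
static struct pid_model *pid_from_upid(struct upid_model *upid)
{
    size_t offset;

    assert(upid->ns->level >= 0 && upid->ns->level < 3);
    offset = offsetof(struct pid_model, numbers)
           + (size_t)upid->ns->level * sizeof(struct upid_model);
    return (struct pid_model *)((char *)upid - offset);
}

/* Linear stand-in for the hash chain walk: match on both nr and ns. */
static struct pid_model *find_pid_model(struct upid_model **chain, size_t n,
                                        int nr, struct ns_model *ns)
{
    for (size_t i = 0; i < n; i++)
        if (chain[i]->nr == nr && chain[i]->ns == ns)
            return pid_from_upid(chain[i]);
    return NULL;
}
```

Matching on both nr and ns matters: the same number can appear in different namespaces and refer to different processes, so the number alone does not identify a struct pid.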

struct task_struct *find_task_by_pid_type_ns(int type, int nr,
		struct pid_namespace *ns)
{
	return pid_task(find_pid_ns(nr, ns), type);
}

pid_task finds the struct task_struct for a given struct pid and type. It also relies on the container_of mechanism.

struct task_struct * fastcall pid_task(struct pid *pid, enum pid_type type)
{
	struct task_struct *result = NULL;
	if (pid) {
		struct hlist_node *first;
		first = rcu_dereference(pid->tasks[type].first);
		if (first)
			result = hlist_entry(first, struct task_struct, pids[(type)].node);
	}
	return result;
}

Now back to the second problem, which is simpler. Given a struct task_struct and a type, the struct pid is easy to find. All it takes is

struct pid* _pid = task->pids[type].pid;

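After that, walk _pid->numbers to find the struct upid whose namespace matches, and read the nr value from it.

5. Generating unique process IDs

The discussion so far covers how the kernel manages process IDs. The kernel is also responsible for generating PIDs and for guaranteeing that each one is unique within its namespace.

To track which IDs are already allocated, the kernel uses a large bitmap in which each bit stands for one PID, so a PID value corresponds to exactly one bit position. Allocating an unused PID then amounts to finding a bit that is 0 and setting it to 1. As the code below shows, the search starts just after the last PID that was handed out.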
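
The walk over numbers can be sketched as follows. This is a simplified user-space model with hypothetical names (`pid_nr_in_ns` and the `*_model` structures) and a fixed-size array; returning 0 for "not visible" is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

struct ns_model { int level; };

struct upid_model {
    int nr;
    struct ns_model *ns;
};

struct pid_model {
    int level;
    struct upid_model numbers[3];   /* fixed size to keep the sketch simple */
};

/* PID number as seen from `ns`, or 0 when the pid is not visible there. */
static int pid_nr_in_ns(const struct pid_model *pid, const struct ns_model *ns)
{
    if (!pid)
        return 0;
    assert(pid->level >= 0 && pid->level < 3);
    for (int i = 0; i <= pid->level; i++)
        if (pid->numbers[i].ns == ns)
            return pid->numbers[i].nr;
    return 0;
}
```

Since entry i of numbers belongs to the namespace at level i, the loop could be replaced by a direct check of `numbers[ns->level]`; the loop is kept here because it follows the description in the text.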
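
Before reading the kernel function, a much smaller model may help. The sketch below is an assumption-laden simplification: a single small bitmap, no locking, no per-page maps, and made-up limits `MODEL_PID_MAX` and `MODEL_RESERVED`. It keeps the behaviours visible in alloc_pidmap: the search starts at last_pid + 1, and after wrapping it resumes at a reserved lower bound rather than at 0.

```c
#include <assert.h>

#define MODEL_PID_MAX   64   /* tiny limit for illustration only */
#define MODEL_RESERVED   4   /* assumed lower bound used after wrap-around */

struct pidmap_model {
    unsigned char bits[MODEL_PID_MAX / 8];
    int last_pid;
};

/* Set the bit for `pid` and report whether it was already set. */
static int test_and_set(struct pidmap_model *m, int pid)
{
    unsigned char mask = (unsigned char)(1u << (pid % 8));
    int was_set = (m->bits[pid / 8] & mask) != 0;

    m->bits[pid / 8] |= mask;
    return was_set;
}

/* Next-fit allocation: returns a free pid, or -1 when none is found. */
static int alloc_model(struct pidmap_model *m)
{
    int pid = m->last_pid + 1;

    for (int scanned = 0; scanned < MODEL_PID_MAX; scanned++, pid++) {
        if (pid >= MODEL_PID_MAX)
            pid = MODEL_RESERVED;      /* wrap around, skipping low IDs */
        if (!test_and_set(m, pid)) {
            m->last_pid = pid;
            return pid;
        }
    }
    return -1;
}

/* Release a pid by clearing its bit. */
static void free_model(struct pidmap_model *m, int pid)
{
    assert(pid >= 0 && pid < MODEL_PID_MAX);
    m->bits[pid / 8] &= (unsigned char)~(1u << (pid % 8));
}
```

One consequence of starting at last_pid + 1 is that a freed PID is not reused immediately: after allocating 1 and 2 and freeing 1, the next allocation in this model is 3.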

static int alloc_pidmap(struct pid_namespace *pid_ns)
{
	int i, offset, max_scan, pid, last = pid_ns->last_pid;
	struct pidmap *map;

	pid = last + 1;
	if (pid >= pid_max)
		pid = RESERVED_PIDS;
	offset = pid & BITS_PER_PAGE_MASK;
	map = &pid_ns->pidmap[pid/BITS_PER_PAGE];
	max_scan = (pid_max + BITS_PER_PAGE - 1)/BITS_PER_PAGE - !offset;
	for (i = 0; i <= max_scan; ++i) {
		if (unlikely(!map->page)) {
			void *page = kzalloc(PAGE_SIZE, GFP_KERNEL);
			/*
			 * Free the page if someone raced with us
			 * installing it:
			 */
			spin_lock_irq(&pidmap_lock);
			if (map->page)
				kfree(page);
			else
				map->page = page;
			spin_unlock_irq(&pidmap_lock);
			if (unlikely(!map->page))
				break;
		}
		if (likely(atomic_read(&map->nr_free))) {
			do {
				if (!test_and_set_bit(offset, map->page)) {
					atomic_dec(&map->nr_free);
					pid_ns->last_pid = pid;
					return pid;
				}
				offset = find_next_offset(map, offset);
				pid = mk_pid(pid_ns, map, offset);
			/*
			 * find_next_offset() found a bit, the pid from it
			 * is in-bounds, and if we fell back to the last
			 * bitmap block and the final block was the same
			 * as the starting point, pid is before last_pid.
			 */
			} while (offset < BITS_PER_PAGE && pid < pid_max &&
					(i != max_scan || pid < last ||
					    !((last+1) & BITS_PER_PAGE_MASK)));
		}
		if (map < &pid_ns->pidmap[(pid_max-1)/BITS_PER_PAGE]) {
			++map;
			offset = 0;
		} else {
			map = &pid_ns->pidmap[0];
			offset = RESERVED_PIDS;
			if (unlikely(last == offset))
				break;
		}
		pid = mk_pid(pid_ns, map, offset);
	}
	return -1;
}

Releasing a PID:

static fastcall void free_pidmap(struct pid_namespace *pid_ns, int pid)
{
	struct pidmap *map = pid_ns->pidmap + pid / BITS_PER_PAGE;
	int offset = pid & BITS_PER_PAGE_MASK;

	clear_bit(offset, map->page);
	atomic_inc(&map->nr_free);
}
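
Both functions locate a bit by splitting the PID into a page index and an offset inside that page. The arithmetic can be sketched as below, assuming a 4096-byte page so that one page holds 32768 bits; the `MODEL_*` constants and helper names are hypothetical stand-ins for BITS_PER_PAGE, BITS_PER_PAGE_MASK and mk_pid.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE          4096                    /* assumed page size */
#define MODEL_BITS_PER_PAGE      (MODEL_PAGE_SIZE * 8)   /* 32768 bits */
#define MODEL_BITS_PER_PAGE_MASK (MODEL_BITS_PER_PAGE - 1)

/* Index of the bitmap page that holds the bit for `pid`. */
static int pid_to_page(int pid)
{
    assert(pid >= 0);
    return pid / MODEL_BITS_PER_PAGE;
}

/* Bit offset of `pid` inside its bitmap page. */
static int pid_to_offset(int pid)
{
    assert(pid >= 0);
    return pid & MODEL_BITS_PER_PAGE_MASK;
}

/* Inverse mapping: rebuild the pid from a page index and a bit offset. */
static int page_offset_to_pid(int page, int offset)
{
    return page * MODEL_BITS_PER_PAGE + offset;
}
```

For example, PID 300 lives in page 0 at offset 300, while PID 40000 lives in page 1 at offset 7232, because 40000 - 32768 = 7232. The mask works only because the number of bits per page is a power of two.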

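In addition, as discussed above, a newly created process has to be visible in several namespaces. For every namespace that can see it, a local PID has to be generated. This is handled as follows: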

struct pid *alloc_pid(struct pid_namespace *ns)
{
	struct pid *pid;
	enum pid_type type;
	int i, nr;
	struct pid_namespace *tmp;
	struct upid *upid;

	pid = kmem_cache_alloc(ns->pid_cachep, GFP_KERNEL);
	if (!pid)
		goto out;

	tmp = ns;
	for (i = ns->level; i >= 0; i--) { // allocate a local ID in every namespace that can see the process
		nr = alloc_pidmap(tmp);
		if (nr < 0)
			goto out_free;

		pid->numbers[i].nr = nr;
		pid->numbers[i].ns = tmp;
		tmp = tmp->parent;
	}

	get_pid_ns(ns);
	pid->level = ns->level;
	atomic_set(&pid->count, 1);
	for (type = 0; type < PIDTYPE_MAX; ++type)
		INIT_HLIST_HEAD(&pid->tasks[type]);

	spin_lock_irq(&pidmap_lock);
	for (i = ns->level; i >= 0; i--) { // update struct pid: put each struct upid on the hash table
		upid = &pid->numbers[i];
		hlist_add_head_rcu(&upid->pid_chain,
				&pid_hash[pid_hashfn(upid->nr, upid->ns)]);
	}
	spin_unlock_irq(&pidmap_lock);

out:
	return pid;

out_free:
	for (i++; i <= ns->level; i++)
		free_pidmap(pid->numbers[i].ns, pid->numbers[i].nr);

	kmem_cache_free(ns->pid_cachep, pid);
	pid = NULL;
	goto out;
}

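The function is simple and does one job: allocate one ID per visible namespace, undoing the allocations already made if one of them fails.

6. Summary

Looking back at processes in the kernel: a process is not characterised by a PID alone. It has other IDs as well, such as the tgid mentioned above. The types are: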
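
The per-level loop and the rollback path in alloc_pid can be modelled in user space. In this sketch a simple counter with a limit stands in for alloc_pidmap, an `in_use` count stands in for the bitmap, and hashing and locking are left out. All names (`ns_alloc_nr`, `ns_free_nr`, `alloc_pid_model`, the `*_model` structures) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_LEVELS 4

struct ns_model {
    int level;
    struct ns_model *parent;
    int next_nr;   /* trivial counter standing in for alloc_pidmap() */
    int limit;     /* allocation fails once next_nr exceeds this */
    int in_use;    /* number of IDs currently handed out */
};

struct upid_model { int nr; struct ns_model *ns; };

struct pid_model {
    int level;
    struct upid_model numbers[MAX_LEVELS];
};

static int ns_alloc_nr(struct ns_model *ns)
{
    if (ns->next_nr > ns->limit)
        return -1;
    ns->in_use++;
    return ns->next_nr++;
}

static void ns_free_nr(struct ns_model *ns)
{
    ns->in_use--;
}

/* Allocate one number per namespace from `ns` up to the root. */
static struct pid_model *alloc_pid_model(struct ns_model *ns)
{
    struct pid_model *pid;
    struct ns_model *tmp = ns;
    int i;

    assert(ns->level >= 0 && ns->level < MAX_LEVELS);
    pid = calloc(1, sizeof(*pid));
    if (!pid)
        return NULL;

    for (i = ns->level; i >= 0; i--) {
        int nr = ns_alloc_nr(tmp);

        if (nr < 0)
            goto out_free;
        pid->numbers[i].nr = nr;
        pid->numbers[i].ns = tmp;
        tmp = tmp->parent;
    }
    pid->level = ns->level;
    return pid;

out_free:
    /* Undo only the levels that were already allocated: i + 1 .. ns->level. */
    for (i++; i <= ns->level; i++)
        ns_free_nr(pid->numbers[i].ns);
    free(pid);
    return NULL;
}
```

The rollback loop starts at i + 1 because level i is the one whose allocation failed, so nothing was taken there. In the model, a failed allocation in the root leaves the child's `in_use` count unchanged afterwards.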

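Thread group ID (TGID). Before a process has created any threads, its TGID and PID are equal. The leader of a thread group is the process that created the first thread. Every "thread" holds a pointer to the group leader's task_struct instance: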
```
struct task_struct{
.......
       struct task_struct *group_leader;	/* threadgroup leader */
.......
};
```
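Process groups, formed by combining independent processes. Each task_struct holds the PID of the process group leader, stored in task_struct->signal->__pgrp.
Sessions. Several process groups can be combined into a session, and all processes in a session share the same session ID.
Global PID.
Local PID. This follows from the introduction of namespaces: all PIDs in a namespace are visible to its parent namespace, but a child namespace cannot see the PIDs of its parent. As a result a process may have several PIDs, because every namespace that can see the process assigns it one.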

zmxiangde_88
