[Original] Xenomai Kernel Internals: How a Xenomai Real-Time Thread Is Created

Problem Overview

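Three years ago, in the article "[Original] Xenomai Kernel Internals: Dual-Kernel System Calls (Part 1)", we raised two questions:

When the two kernels coexist, how is a system call issued by an application identified as a Xenomai kernel call or a Linux kernel call?

A Xenomai real-time task can call both Xenomai kernel services and Linux kernel services. How is this achieved?

That article answered question 1 through source code analysis and left question 2, which involves scheduling between the two kernels, to a later article.

This article picks up from there and looks at scheduling between the two kernels, focusing on the underlying implementation of pthread_create(). A Xenomai task can run in either the Cobalt kernel or the Linux kernel, so each kernel needs its own scheduling entity to manage the task. How does pthread_create() create a task with this dual identity? Let's find out.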

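Note: this article is a record of source code analysis written several years ago. It is a running log taken while reading the code and has not been carefully reorganized or revised, so it has rough edges. If you only want the conclusion, skip to the last section. I hope it is useful to you.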

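The following articles provide context for this one. Together they should give an overall picture of Xenomai task management:

"[Original] Xenomai Kernel Internals: Dual-Kernel System Calls (Part 1)": using an x86 processor as the example, this article walks through the Xenomai kernel call flow when an application issues a kernel system call.

"[Original] Xenomai Kernel Internals: Dual-Kernel System Calls (Part 2): How Applications Distinguish Xenomai and Linux System Calls or Services": this article analyzes how, once an application issues a system call, it is decided whether Linux or Xenomai should provide the service.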

1 Calling non-real-time POSIX interfaces in libcobalt

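Xenomai derives its real-time threads from tasks created through the standard POSIX API. As a result, Xenomai threads inherit the ability of Linux tasks to call regular Linux services when running in non-time-critical mode.
When a Linux task is promoted to a real-time application, it is attached to a special Xenomai extension called the real-time shadow. A real-time shadow allows the Xenomai co-kernel to schedule the paired Linux task while it runs in real-time mode.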

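Take the POSIX standard functions as an example. pthread_create() is not a system call. It is implemented by NPTL (Native POSIX Threads Library, the modern Linux threading implementation developed by Ulrich Drepper and Ingo Molnar to replace LinuxThreads). NPTL creates the user-space stack, allocates memory, and performs initialization for a user thread, and works with the Linux kernel to complete thread creation. Each thread maps to a separate kernel scheduling entity (KSE), and the kernel schedules each thread individually. Thread synchronization operations are implemented through kernel system calls.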

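Xenomai Cobalt is the scheduler for real-time tasks, so every real-time thread needs a corresponding Cobalt scheduling entity. If creating a real-time thread required libcobalt to be as deeply integrated with Cobalt as NPTL is with the Linux kernel, the implementation of Cobalt and libcobalt would become very complex. Instead, Xenomai lets NPTL create the thread and then attaches extra attributes to this ordinary thread so that its kernel entity can be scheduled by the real-time kernel. The real-time thread creation function pthread_create in libcobalt therefore ends up calling NPTL's pthread_create; Xenomai only extends the thread that NPTL creates so that it is scheduled by the real-time kernel.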

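When creating a real-time thread, the application calls the pthread_create implemented by libcobalt, which does some initial work and finally calls NPTL's pthread_create to create the thread. How are these three variants of the same function, pthread_create, told apart? The following sections explain them one by one.

We start the analysis of Cobalt thread creation from pthread_create, which is declared in pthread.h as follows: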

COBALT_DECL(int, pthread_create(pthread_t *ptid_r,
				const pthread_attr_t *attr,
				void *(*start) (void *),
				void *arg));

The COBALT_DECL macro is defined in wrappers.h as follows. Expanding the macro above generates three function declarations for pthread_create:

#define __WRAP(call)		__wrap_ ## call
#define __STD(call)		__real_ ## call
#define __COBALT(call)		__cobalt_ ## call
#define __RT(call)		__COBALT(call)
#define COBALT_DECL(T, P)	\
	__typeof__(T) __RT(P);	\
	__typeof__(T) __STD(P); \
	__typeof__(T) __WRAP(P)
	
int __cobalt_pthread_create(pthread_t *ptid_r,
				const pthread_attr_t *attr,
				void *(*start) (void *),
				void *arg);
int __wrap_pthread_create(pthread_t *ptid_r,
				const pthread_attr_t *attr,
				void *(*start) (void *),
				void *arg);
int __real_pthread_create(pthread_t *ptid_r,
				const pthread_attr_t *attr,
				void *(*start) (void *),
				void *arg);

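The three variants of pthread_create are:

__RT(P): __cobalt_pthread_create, the POSIX function as implemented by Cobalt.

__STD(P): __real_pthread_create, the original POSIX function (implemented by glibc). The Cobalt library uses it internally to call the original POSIX function (glibc NPTL).

__WRAP(P): __wrap_pthread_create, a weak alias of __cobalt_pthread_create that can be overridden.

For the last one, an external library can provide its own __wrap_pthread_create() implementation to override Cobalt's version of pthread_create(). The original Cobalt implementation can still be referenced as __COBALT(pthread_create). The alias is defined by the COBALT_IMPL macro: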
#define COBALT_IMPL(T, I, A)								\
__typeof__(T) __wrap_ ## I A __attribute__((alias("__cobalt_" __stringify(I)), weak));	\
__typeof__(T) __cobalt_ ## I A

Finally, the body of the Cobalt library function pthread_create is (xenomai3.0.8\lib\cobalt\thread.c):

COBALT_IMPL(int, pthread_create, (pthread_t *ptid_r,
				  const pthread_attr_t *attr,
				  void *(*start) (void *), void *arg))
{
	pthread_attr_ex_t attr_ex;
	......
	return pthread_create_ex(ptid_r, &attr_ex, start, arg);
}

COBALT_IMPL defines the function __cobalt_pthread_create together with a weak alias, __wrap_pthread_create. Calling either name executes the same function body.
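The weak-alias mechanism can be tried in isolation. The following is a minimal sketch, assuming GCC or Clang on an ELF platform such as Linux; DEMO_IMPL, DEMO_STRINGIFY, and demo_add are hypothetical names made up for this illustration and are not part of Xenomai.

```c
#include <assert.h>

/* Two-step stringification so the macro argument is expanded first. */
#define DEMO_STRINGIFY_1(x) #x
#define DEMO_STRINGIFY(x)   DEMO_STRINGIFY_1(x)

/*
 * Hypothetical macro shaped like COBALT_IMPL: it declares __wrap_<name>
 * as a weak alias of __cobalt_<name>, then opens the definition of
 * __cobalt_<name>, whose body follows the macro invocation.
 */
#define DEMO_IMPL(T, I, A)                                            \
	__typeof__(T) __wrap_ ## I A                                  \
		__attribute__((alias("__cobalt_" DEMO_STRINGIFY(I)), weak)); \
	__typeof__(T) __cobalt_ ## I A

/* Defines __cobalt_demo_add() plus the weak alias __wrap_demo_add(). */
DEMO_IMPL(int, demo_add, (int a, int b))
{
	return a + b;
}
```

Both names resolve to the same function body here. Because the alias is weak, a strong definition of __wrap_demo_add in another object file would take precedence at link time, while __cobalt_demo_add would still reach the code above.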

The NPTL function pthread_create is exposed inside the Cobalt library as __real_pthread_create, which is just a thin wrapper that calls NPTL's pthread_create directly. It is implemented in lib\cobalt\wrappers.c as follows:

__weak
int __real_pthread_create(pthread_t *ptid_r,
			  const pthread_attr_t * attr,
			  void *(*start) (void *), void *arg)
{
	return pthread_create(ptid_r, attr, start, arg);
}

When libcobalt needs NPTL's pthread_create to create the thread, it simply uses the __STD macro:

ret = __STD(pthread_create(&lptid, &attr, cobalt_thread_trampoline, &iargs));
if (ret) {
	__STD(sem_destroy(&iargs.sync));
	return ret;
}

2 Stage 1: Linux thread creation

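pthread_create is not a system call; it is a function of the real-time thread library libcobalt. To tell the variants apart, for a POSIX function func, the libcobalt implementation is written __RT(func) and the libc implementation is written __STD(func) (xenomai3.0.8\lib\cobalt\thread.c):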

COBALT_IMPL(int, pthread_create, (pthread_t *ptid_r,
				  const pthread_attr_t *attr,
				  void *(*start) (void *), void *arg))
{
	pthread_attr_ex_t attr_ex;
	struct sched_param param;
	int policy;

	if (attr == NULL)
		attr = &default_attr_ex.std;

	memcpy(&attr_ex.std, attr, sizeof(*attr));
	pthread_attr_getschedpolicy(attr, &policy);
	attr_ex.nonstd.sched_policy = policy;
	pthread_attr_getschedparam(attr, &param);
	attr_ex.nonstd.sched_param.sched_priority = param.sched_priority;
	attr_ex.nonstd.personality = 0; /* Default: use Cobalt. */

	return pthread_create_ex(ptid_r, &attr_ex, start, arg);
}

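The function first handles the thread attribute argument attr. If no attributes were passed in, the defaults are used.

attr_ex holds the attributes of a Cobalt thread and is an extension of pthread_attr_t.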

typedef struct pthread_attr_ex {
pthread_attr_t std;
struct {
	int personality;
	int sched_policy;
	struct sched_param_ex sched_param;
} nonstd;
} pthread_attr_ex_t;

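The scheduling policy is read from attr and stored as Cobalt's non-standard policy; the scheduling parameters are handled the same way. Both are saved in attr_ex.nonstd. Setting attr_ex.nonstd.personality to 0 selects Cobalt.

Once the extended attr_ex has been built from attr, pthread_create_ex is called for further processing, and from here on only attr_ex is used. The standard pthread_attr_t is kept in attr_ex.std because creating the user-space thread still requires calling NPTL's pthread_create, which needs attr.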
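The attribute-extension step can be sketched with plain POSIX calls. This is a simplified illustration, not libcobalt's code: struct demo_attr_ex and demo_attr_extend() are hypothetical names, and the non-standard part stores only a plain priority instead of a full struct sched_param_ex.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Hypothetical extended attribute, shaped like pthread_attr_ex_t in the text. */
struct demo_attr_ex {
	pthread_attr_t std;          /* untouched standard attributes */
	struct {
		int personality;
		int sched_policy;
		int sched_priority;
	} nonstd;                    /* values consumed by the real-time side */
};

/* Build the extended attribute from a standard one; returns 0 or an errno value. */
int demo_attr_extend(const pthread_attr_t *attr, struct demo_attr_ex *ex)
{
	struct sched_param param;
	int policy, ret;

	ret = pthread_attr_getschedpolicy(attr, &policy);
	if (ret)
		return ret;
	ret = pthread_attr_getschedparam(attr, &param);
	if (ret)
		return ret;

	/* Shallow copy, mirroring the memcpy() in the listing above. */
	memcpy(&ex->std, attr, sizeof(*attr));
	ex->nonstd.sched_policy = policy;
	ex->nonstd.sched_priority = param.sched_priority;
	ex->nonstd.personality = 0;  /* 0 stands for the default personality */
	return 0;
}
```

A caller would initialize a pthread_attr_t, set its policy and priority, and pass it in; the standard part stays available for the later call to the regular thread creation routine.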

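To make the rest of the analysis easier to follow, here is how Xenomai uses __STD(pthread_create) to create a thread scheduled by Cobalt. It first creates an ordinary thread through __STD(pthread_create), but the thread function is not the start function passed to __RT(pthread_create); it is cobalt_thread_trampoline, designed by Xenomai. Once __STD(pthread_create) and Linux have created the thread and it gets to run, it executes cobalt_thread_trampoline, which issues the Cobalt system call sc_cobalt_thread_create to create the Cobalt real-time thread and have it scheduled by the real-time kernel. After the system call returns, execution actually begins at the start function.

In pthread_create_ex(), the structure used to pass parameters to cobalt_thread_trampoline is struct pthread_iargs iargs.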

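struct pthread_iargs {
	struct sched_param_ex param_ex;
	int policy; /* scheduling policy */
	int personality;
	void *(*start)(void *); /* thread entry function */
	void *arg; /* pointer to the function argument */
	int parent_prio; /* priority of the parent (creator) */
	sem_t sync; /* semaphore signaling that thread creation has completed */
	int ret;
};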

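Before calling __STD(pthread_create), the code mainly fills in the members of iargs. It first issues a system call to obtain the extended scheduling policy of the current thread in the Cobalt core.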

pthread_getschedparam_ex(pthread_self(), &iargs.policy, &iargs.param_ex);

int pthread_getschedparam_ex(pthread_t thread,
			     int *restrict policy_r,
			     struct sched_param_ex *restrict param_ex)
{
	struct sched_param short_param;
	int ret;

	ret = -XENOMAI_SYSCALL3(sc_cobalt_thread_getschedparam_ex,
				thread, policy_r, param_ex);
	if (ret == ESRCH) {
		ret = __STD(pthread_getschedparam(thread, policy_r, &short_param));
		if (ret == 0)
			param_ex->sched_priority = short_param.sched_priority;
	}

	return ret;
}

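If the creator is already a Cobalt real-time thread, the system call sc_cobalt_thread_getschedparam_ex copies out that task's scheduling parameters. Otherwise the task is an ordinary Linux task (the system call reports ESRCH), and the parameters are obtained through NPTL's pthread_getschedparam.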
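The try-then-fall-back pattern used here can be shown without any kernel involved. In the sketch below every name (demo_rt_table, demo_rt_getprio, demo_std_getprio, demo_getprio) is hypothetical; a small table stands in for the threads known to the real-time core.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical table of threads known to the "real-time" side. */
struct demo_rt_entry {
	unsigned long id;
	int prio;
};

static const struct demo_rt_entry demo_rt_table[] = { { 1, 80 }, { 2, 50 } };

/* Primary lookup: returns 0 on success, ESRCH if the id is unknown. */
static int demo_rt_getprio(unsigned long id, int *prio)
{
	size_t n = sizeof(demo_rt_table) / sizeof(demo_rt_table[0]);

	for (size_t i = 0; i < n; i++) {
		if (demo_rt_table[i].id == id) {
			*prio = demo_rt_table[i].prio;
			return 0;
		}
	}
	return ESRCH;
}

/* Fallback lookup, standing in for the regular (non-real-time) query. */
static int demo_std_getprio(unsigned long id, int *prio)
{
	(void)id;
	*prio = 0; /* ordinary threads report priority 0 in this sketch */
	return 0;
}

/* Ask the real-time side first; fall back only when it does not know the thread. */
int demo_getprio(unsigned long id, int *prio)
{
	int ret = demo_rt_getprio(id, prio);

	if (ret == ESRCH)
		ret = demo_std_getprio(id, prio);
	return ret;
}
```

Only the "no such thread" result triggers the fallback; any other error from the primary lookup would be returned to the caller unchanged.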

	iargs.start = start;
	iargs.arg = arg;
	iargs.ret = EAGAIN;
	__STD(sem_init(&iargs.sync, 0, 0));

	ret = __STD(pthread_create(&lptid, &attr, cobalt_thread_trampoline, &iargs)); /* __STD calls the standard library function */
	if (ret) {
		__STD(sem_destroy(&iargs.sync));
		return ret;
	}

	__STD(clock_gettime(CLOCK_REALTIME, &timeout));
	timeout.tv_sec += 5;
	timeout.tv_nsec = 0;

	for (;;) {
		ret = __STD(sem_timedwait(&iargs.sync, &timeout)); /* wait for real-time thread creation to complete */
		if (ret && errno == EINTR)
			continue;
		if (ret == 0) {
			ret = iargs.ret;
			if (ret == 0)
				*ptid_r = lptid; /* pass the thread ID back to the caller */
			break;
		} else if (errno == ETIMEDOUT) {
			ret = EAGAIN;
			break;
		}
		ret = -errno;
		panic("regular sem_wait() failed with %s", symerror(ret));
	}

	__STD(sem_destroy(&iargs.sync)); /* destroy the semaphore */

	cobalt_thread_harden(); /* May fail if regular thread. */
	return ret;

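The synchronization semaphore iargs.sync is initialized first. After calling __STD(pthread_create), the parent thread continues and waits, for at most 5 seconds, until the real-time thread has been created. When creation completes, the new thread posts iargs.sync and passes its return value back through iargs.ret.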
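The trampoline and semaphore handshake can be reproduced with plain POSIX threads. The following is a minimal sketch, assuming a Linux/glibc environment; demo_iargs, demo_setup, demo_trampoline, demo_create, and demo_double are hypothetical names, and demo_setup() merely stands in for the kernel-side creation step, which this sketch does not perform.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical argument block, loosely modeled on struct pthread_iargs. */
struct demo_iargs {
	void *(*start)(void *); /* user entry point */
	void *arg;              /* user argument */
	sem_t sync;             /* posted once setup has finished */
	int ret;                /* setup status reported to the creator */
};

/* Stand-in for the real-time setup step; always succeeds here. */
static int demo_setup(void)
{
	return 0;
}

static void *demo_trampoline(void *p)
{
	struct demo_iargs *iargs = p;
	/* Copy what we need before posting: iargs lives on the creator's stack. */
	void *(*start)(void *) = iargs->start;
	void *arg = iargs->arg;
	int ret = demo_setup();

	iargs->ret = ret;
	sem_post(&iargs->sync);
	if (ret)
		return NULL;
	return start(arg); /* only now does the user function run */
}

/* Create a thread through the trampoline; returns 0 or an errno value. */
int demo_create(pthread_t *tid, void *(*start)(void *), void *arg)
{
	struct demo_iargs iargs = { .start = start, .arg = arg, .ret = EAGAIN };
	struct timespec timeout;
	pthread_t t;
	int ret;

	sem_init(&iargs.sync, 0, 0);
	ret = pthread_create(&t, NULL, demo_trampoline, &iargs);
	if (ret) {
		sem_destroy(&iargs.sync);
		return ret;
	}

	clock_gettime(CLOCK_REALTIME, &timeout);
	timeout.tv_sec += 5; /* same 5 second budget as the listing above */

	for (;;) {
		if (sem_timedwait(&iargs.sync, &timeout) == 0) {
			ret = iargs.ret;
			if (ret == 0)
				*tid = t;
			break;
		}
		if (errno == EINTR)
			continue;
		ret = (errno == ETIMEDOUT) ? EAGAIN : errno;
		break;
	}
	sem_destroy(&iargs.sync);
	return ret;
}

/* Example user entry point: doubles the integer it is handed. */
void *demo_double(void *arg)
{
	int *v = arg;

	*v *= 2;
	return v;
}
```

Calling demo_create(&tid, demo_double, &value) and then joining tid leaves value doubled. The creator may return and release its stack as soon as the semaphore is posted, which is why the trampoline copies start and arg before calling sem_post().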

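__STD(pthread_create(&lptid, &attr, cobalt_thread_trampoline, &iargs)) first allocates the thread stack in user space and then issues the Linux clone system call to create the kernel scheduling entity. After creation the kernel schedules, and when the new thread gets to run it starts executing cobalt_thread_trampoline. The original post also included a figure, sourced from the internet, comparing the Linux thread and process creation flows.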

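3 Stage 2: Cobalt kernel thread creation

The I-pipe lets the real-time kernel manage things at fine granularity, per thread rather than per process. For this reason, the real-time core should at least implement a mechanism that converts a regular task into a real-time thread with extended capabilities and binds it to Cobalt.

Next, the real-time thread scheduling entity is created in the Cobalt kernel. Once the ordinary thread has been created, cobalt_thread_trampoline runs and, based on the iargs passed in, issues a Cobalt system call. Because the system call is issued from the root domain, it enters through the I-pipe slow system call entry ipipe_syscall_hook(), which checks the permissions of the system call; this one may be called directly from the Linux domain by a non-real-time task. The Cobalt kernel then runs the function that creates the real-time scheduling entity cobalt_thread. For more on system calls, see "7. Linux kernel system calls and real-time kernel system calls".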

ipipe_handle_syscall()
	__ipipe_notify_syscall()
		ipipe_syscall_hook()
			handle_head_syscall()
				cobalt_search_process()/**/
		ipipe_syscall_hook()
			CoBaLt_thread_create()
/*
policy: scheduling policy
param_ex: extended parameters
    struct sched_param_ex {
        int sched_priority;   // priority
        union {
            struct __sched_ss_param ss; // parameters for the SPORADIC class
            struct __sched_rr_param rr; // parameters for the RR class
            struct __sched_tp_param tp; // parameters for the TP class
            struct __sched_quota_param quota; // parameters for the QUOTA class
        } sched_u;
    };
personality: cobalt
*/
ret = -XENOMAI_SYSCALL5(sc_cobalt_thread_create, ptid,
				policy, &param_ex, personality, &u_winoff);

This system call is located in kernel\xenomai\posix\thread.c:

COBALT_SYSCALL(thread_create, init,
	       (unsigned long pth, int policy,
		struct sched_param_ex __user *u_param,
		int xid,
		__u32 __user *u_winoff))
{
	struct sched_param_ex param_ex;

	ret = cobalt_copy_from_user(&param_ex, u_param, sizeof(param_ex));
	......

	return __cobalt_thread_create(pth, policy, &param_ex, xid, u_winoff);
}

The scheduling parameters are copied from user space into param_ex, and then __cobalt_thread_create is called to do the creation.

int __cobalt_thread_create(unsigned long pth, int policy,
			   struct sched_param_ex *param_ex,
			   int xid, __u32 __user *u_winoff)
{
	struct cobalt_thread *thread = NULL;
	struct task_struct *p = current;
	struct cobalt_local_hkey hkey;
	int ret;
	/*
	 * We have been passed the pthread_t identifier the user-space
	 * Cobalt library has assigned to our caller; we'll index our
	 * internal pthread_t descriptor in kernel space on it.
	 */
	hkey.u_pth = pth;
	hkey.mm = p->mm;

	ret = pthread_create(&thread, policy, param_ex, p); /* create the thread */
......
	ret = cobalt_map_user(&thread->threadbase, u_winoff); /* create the shadow thread context over the user task */
......
	if (!thread_hash(&hkey, thread, task_pid_vnr(p))) {
		goto fail;
	}

	thread->hkey = hkey;

	if (xid > 0 && cobalt_push_personality(xid) == NULL) {
		goto fail;
	}

	return xnthread_harden();
fail:
	xnthread_cancel(&thread->threadbase);

	return ret;
}

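The system call is issued by the new thread itself, so current in the kernel points to that thread's task_struct. hkey first records the thread's user-space thread ID (pthread_t) and its memory management structure, current->mm. A pthread_t is only meaningful within one process, so it is the pair of the two that identifies the thread uniquely across the system: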

struct cobalt_local_hkey {
	/** pthread_t from userland. */
	unsigned long u_pth;
	/** kernel mm context.*/
	struct mm_struct *mm; 
};

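hkey is used for hash lookups: given an hkey, the corresponding real-time thread entity cobalt_thread can be found quickly. As an example, suppose a real-time thread running on the real-time kernel needs to change its name. If it called the non-real-time pthread_setname_np, the I-pipe would see a Linux system call that needs a Linux service, which triggers a migration between the two kernels: the thread first migrates to the Linux kernel, the Linux implementation changes comm in task_struct, and then the thread migrates back to the real-time kernel. The whole process is very costly.

To avoid this, the real-time kernel implements the system call sc_cobalt_thread_setname, and libcobalt provides __RT(pthread_setname_np). libcobalt passes the thread ID as the first argument of sc_cobalt_thread_setname. The context is real-time both before and after the system call, so no switch between kernels is needed. The real-time kernel uses hkey to quickly find its scheduling entity cobalt_thread, obtains host_task from it, and then modifies the comm member of host_task.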

//xenomai3.0.8\lib\cobalt\thread.c
COBALT_IMPL(int, pthread_setname_np, (pthread_t thread, const char *name))
{
	return -XENOMAI_SYSCALL2(sc_cobalt_thread_setname, thread, name);
}

COBALT_SYSCALL(thread_setname, current,
	       (unsigned long pth, const char __user *u_name))
{
	struct cobalt_local_hkey hkey;
	struct cobalt_thread *thread;
	char name[XNOBJECT_NAME_LEN];
	struct task_struct *p;
......
	if (cobalt_strncpy_from_user(name, u_name,
				     sizeof(name) - 1) < 0)
......
	name[sizeof(name) - 1] = '\0';
	hkey.u_pth = pth;
	hkey.mm = current->mm;
......
	thread = thread_lookup(&hkey);
......
	p = xnthread_host_task(&thread->threadbase);
......
	knamecpy(p->comm, name);
......
	return 0;
}
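The lookup by (thread ID, mm) pair can be sketched as a small chained hash table. This is an illustration only, not Cobalt's thread_hash()/thread_lookup() code: demo_hkey, demo_thread, demo_thread_hash, and demo_thread_lookup are hypothetical names, and the hash function is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_BUCKETS 16

/* Hypothetical key: user-space thread id plus an address-space token. */
struct demo_hkey {
	unsigned long u_pth;
	const void *mm;
};

struct demo_thread {
	struct demo_hkey hkey;
	const char *name;
	struct demo_thread *next; /* chaining within one bucket */
};

static struct demo_thread *demo_table[DEMO_BUCKETS];

/* Mix both halves of the key so equal thread ids in different processes spread out. */
static unsigned demo_hash(const struct demo_hkey *k)
{
	uintptr_t h = (uintptr_t)k->u_pth ^ ((uintptr_t)k->mm >> 4);

	return (unsigned)(h % DEMO_BUCKETS);
}

/* Insert a thread at the head of its bucket. */
void demo_thread_hash(struct demo_thread *t)
{
	unsigned b = demo_hash(&t->hkey);

	t->next = demo_table[b];
	demo_table[b] = t;
}

/* Return the thread whose key matches on both fields, or NULL. */
struct demo_thread *demo_thread_lookup(const struct demo_hkey *k)
{
	struct demo_thread *t;

	for (t = demo_table[demo_hash(k)]; t != NULL; t = t->next)
		if (t->hkey.u_pth == k->u_pth && t->hkey.mm == k->mm)
			return t;
	return NULL;
}
```

Two threads that happen to carry the same user-space ID in different address spaces still resolve to different entries, because the comparison requires both fields to match.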

3.1 Initializing cobalt_thread->threadbase

Next, pthread_create(&thread, policy, param_ex, p) is called to create the real-time kernel scheduling entity cobalt_thread.

static int pthread_create(struct cobalt_thread **thread_p,
			  int policy,
			  const struct sched_param_ex *param_ex,
			  struct task_struct *task)
{
	struct xnsched_class *sched_class;
	union xnsched_policy_param param;
	struct xnthread_init_attr iattr;
	struct cobalt_thread *thread;
	xnticks_t tslice;
	int ret, n;
	spl_t s;

	thread = xnmalloc(sizeof(*thread));
......
	tslice = cobalt_time_slice; /*1000us *1000 */
	sched_class = cobalt_sched_policy_param(&param, policy,
						param_ex, &tslice); /* pick the scheduling class from the arguments and set the scheduling parameters */
......
  
	iattr.name = task->comm;
	iattr.flags = XNUSER|XNFPU;
	iattr.personality = &cobalt_personality;   /* Cobalt thread */
	iattr.affinity = CPU_MASK_ALL;
	ret = xnthread_init(&thread->threadbase, &iattr, sched_class, &param); /* initialize the xnthread */

	thread->magic = COBALT_THREAD_MAGIC;
	xnsynch_init(&thread->monitor_synch, XNSYNCH_FIFO, NULL);

	xnsynch_init(&thread->sigwait, XNSYNCH_FIFO, NULL);
	sigemptyset(&thread->sigpending);
	for (n = 0; n < _NSIG; n++)
		INIT_LIST_HEAD(thread->sigqueues + n);

	xnthread_set_slice(&thread->threadbase, tslice); /* set the thread's time slice */
	cobalt_set_extref(&thread->extref, NULL, NULL);

	/*
	 * We need an anonymous registry entry to obtain a handle for
	 * fast mutex locking.
	*/
	ret = xnthread_register(&thread->threadbase, "");
    
	xnlock_get_irqsave(&nklock, s);
	list_add_tail(&thread->next, &cobalt_thread_list); /* add to the cobalt_thread_list list */
	xnlock_put_irqrestore(&nklock, s);

	thread->hkey.u_pth = 0;
	thread->hkey.mm = NULL;

	*thread_p = thread;

	return 0;
}

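A cobalt_thread is allocated first. The allocation comes from cobalt_heap, a region of memory managed by the Cobalt kernel that Xenomai obtains from Linux at initialization. cobalt_heap is covered in a later article.

Next, the scheduling class is chosen from the policy and priority set by the user. By default only xnsched_class_rt is available; the other scheduling classes must be enabled when the kernel is configured. See section "11.2 Scheduling policies and scheduling classes".

The iattr assignments then set up the thread's initialization attributes: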

struct xnthread_init_attr {
	struct xnthread_personality *personality;
	cpumask_t affinity;
	int flags;
	const char *name;
};

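The members of this structure are defined as follows:

name: an ASCII string giving the symbolic name of the thread.
flags: a set of creation flags affecting the operation. The following flags can be part of this bitmask:
- XNSUSP creates the thread in a suspended state. In that case, in addition to calling xnthread_start() for it, the thread must be explicitly resumed with the xnthread_resume() service before it starts executing. This flag can also be specified as a starting mode when calling xnthread_start().
- XNUSER must be set if the thread will be mapped onto an existing user-space task. Otherwise, a new kernel task is created.
- XNFPU (enable FPU) tells Cobalt that the new thread may use the floating-point unit. XNFPU is implicitly assumed for user-space threads even if it is not set.
affinity: the processor affinity of this thread. Passing CPU_MASK_ALL lets the kernel run it on any CPU. Passing an empty set is invalid.

xnthread_init() calls __xnthread_init(), which mainly initializes the members of the xnthread structure embedded in cobalt_thread (threadbase).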

int __xnthread_init(struct xnthread *thread,
		    const struct xnthread_init_attr *attr,
		    struct xnsched *sched,
		    struct xnsched_class *sched_class,
		    const union xnsched_policy_param *sched_param)
{
	int flags = attr->flags, ret, gravity;
    ......
	thread->personality = attr->personality;/* xenomai_personality */
	cpumask_and(&thread->affinity, &attr->affinity, &cobalt_cpu_affinity);
	thread->sched = sched;
	thread->state = flags;/*(XNROOT | XNFPU)*//*XNUSER|XNFPU*/
	thread->info = 0;
	thread->local_info = 0;
	thread->lock_count = 0;
	thread->rrperiod = XN_INFINITE;//0
	thread->wchan = NULL;
	thread->wwake = NULL;
	thread->wcontext = NULL;
	thread->res_count = 0;
	thread->handle = XN_NO_HANDLE;
	memset(&thread->stat, 0, sizeof(thread->stat));
	thread->selector = NULL;
	INIT_LIST_HEAD(&thread->claimq);
	INIT_LIST_HEAD(&thread->glink);
	/* These will be filled by xnthread_start() */
	thread->entry = NULL;
	thread->cookie = NULL;
	init_completion(&thread->exited);
	memset(xnthread_archtcb(thread), 0, sizeof(struct xnarchtcb));

	/* Initialize the timers embedded in the xnthread */
	gravity = flags & XNUSER ? XNTIMER_UGRAVITY : XNTIMER_KGRAVITY;
	xntimer_init(&thread->rtimer, &nkclock, timeout_handler,
		     sched, gravity);   /* create the thread timer */
	xntimer_set_name(&thread->rtimer, thread->name);
	xntimer_set_priority(&thread->rtimer, XNTIMER_HIPRIO);
	xntimer_init(&thread->ptimer, &nkclock, periodic_handler,
		     sched, gravity);   /* create the thread's periodic timer */
	xntimer_set_name(&thread->ptimer, thread->name);
	xntimer_set_priority(&thread->ptimer, XNTIMER_HIPRIO); /* set the timer priority */

	thread->base_class = NULL; /* xnsched_set_policy() will set it. */
	ret = xnsched_init_thread(thread);/**/
	if (ret)
		goto err_out;

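This sets sched to the xnsched of the current CPU, affinity to attr->affinity (masked by cobalt_cpu_affinity), and the state flags to XNUSER|XNFPU, and initializes the two xntimer objects. Scheduling-related initialization comes next.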

ret = xnsched_set_policy(thread, sched_class, sched_param);

/* Must be called with nklock locked, interrupts off. */
int xnsched_set_policy(struct xnthread *thread,
		       struct xnsched_class *sched_class,
		       const union xnsched_policy_param *p)
{
	int ret;
	/*
	 * Declaring a thread to a new scheduling class may fail, so
	 * we do that early, while the thread is still a member of the
	 * previous class. However, this also means that the
	 * declaration callback shall not do anything that might
	 * affect the previous class (such as touching thread->rlink
	 * for instance).
	 */
	if (sched_class != thread->base_class) {
		ret = xnsched_declare(sched_class, thread, p);
		......
	}
	/*
	 * As a special case, we may be called from __xnthread_init()
	 * with no previous scheduling class at all.
	 */
	if (likely(thread->base_class != NULL)) {
		if (xnthread_test_state(thread, XNREADY))
			xnsched_dequeue(thread);

		if (sched_class != thread->base_class)
			xnsched_forget(thread);
	}

	thread->sched_class = sched_class;
	thread->base_class = sched_class;
	xnsched_setparam(thread, p);
	thread->bprio = thread->cprio;
	thread->wprio = thread->cprio + sched_class->weight;

	if (xnthread_test_state(thread, XNREADY))
		xnsched_enqueue(thread);

	if (!xnthread_test_state(thread, XNDORMANT))
		xnsched_set_resched(thread->sched);

	return 0;
}

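If the requested sched_class differs from base_class, the thread is first declared to the new sched_class. Then, if the thread already belongs to a scheduling class (base_class is not NULL) and is in the ready state, it is removed from the ready queue of base_class; and if sched_class differs from base_class, xnsched_forget() removes the thread from the old class. Once the thread has been removed from base_class, the two assignments to thread->sched_class and thread->base_class install the new sched_class.

xnsched_setparam() then sets the thread's new priority and weighted priority according to the new sched_class. Unless the thread is boosted (XNBOOST), it also sets the XNWEAK state bit when the priority is 0 and clears it otherwise.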

static inline void xnsched_setparam(struct xnthread *thread,
				    const union xnsched_policy_param *p)
{
	struct xnsched_class *sched_class = thread->sched_class;

	if (sched_class != &xnsched_class_idle)
		__xnsched_rt_setparam(thread, p);
	else
		__xnsched_idle_setparam(thread, p);

	thread->wprio = thread->cprio + sched_class->weight;
}

static inline void __xnsched_rt_setparam(struct xnthread *thread,
					 const union xnsched_policy_param *p)
{
	thread->cprio = p->rt.prio;
	if (!xnthread_test_state(thread, XNBOOST)) {
		if (thread->cprio)
			xnthread_clear_state(thread, XNWEAK);
		else
			xnthread_set_state(thread, XNWEAK);
	}
}
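The priority bookkeeping in the two listings above can be condensed into a small standalone sketch. The flag values, struct demo_class, struct demo_thread_prio, and demo_set_policy() are hypothetical; only the arithmetic (wprio = cprio + class weight) and the weak-flag rule are taken from the code shown.

```c
#include <assert.h>

/* Illustrative flag bits, not taken from the Cobalt headers. */
#define DEMO_WEAK   0x1u
#define DEMO_BOOST  0x2u

struct demo_class {
	int weight; /* offset that orders scheduling classes against each other */
};

struct demo_thread_prio {
	unsigned state;
	int cprio; /* current priority */
	int bprio; /* base priority */
	int wprio; /* weighted priority: cprio + class weight */
};

/* Apply a new class and priority the way the listings above do. */
void demo_set_policy(struct demo_thread_prio *t,
		     const struct demo_class *cls, int prio)
{
	t->cprio = prio;
	if (!(t->state & DEMO_BOOST)) {
		if (t->cprio)
			t->state &= ~DEMO_WEAK; /* non-zero priority: not weak */
		else
			t->state |= DEMO_WEAK;  /* priority 0: weak thread */
	}
	t->bprio = t->cprio;
	t->wprio = t->cprio + cls->weight;
}
```

With a class weight of 1024, a thread at priority 10 ends up with a weighted priority of 1034, so it always sorts above any thread of a class with a smaller weight regardless of that thread's own priority.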

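Once this is done, xnsched_set_resched() sets the XNRESCHED flag on the xnsched the thread belongs to, provided the thread is not XNDORMANT.

Back in pthread_create(), the signal-related members of cobalt_thread, sigpending and sigwait, are initialized, along with the xnsynch synchronization object monitor_synch (synchronization objects are analyzed in detail in "13 Xenomai inter-thread synchronization"). The default time slice is set and the round-robin timer rrbtimer is started.

cobalt_thread is then added to the global list cobalt_thread_list.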

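3.2 Creating the shadow thread context for the user task

The kernel's pthread_create function has now essentially initialized the real-time scheduling entity, but it is not yet connected to the Linux scheduling entity. In other words, the real-time kernel has a scheduling entity but knows nothing about where the user code of the real-time program lives. In addition, when the real-time task runs on the real-time kernel, its run state must be reflected back to Linux so that users can query it.

The entity scheduled by Cobalt is called a shadow of the Linux task. The cobalt_map_user function maps a Xenomai thread onto a regular Linux task running in user space. The priority and scheduling class of the underlying Linux task are not affected.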

int cobalt_map_user(struct xnthread *thread, __u32 __user *u_winoff)

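The function takes two parameters. thread is the address of the descriptor of the new shadow thread to be mapped to current, that is, the xnthread; it must have been initialized beforehand by a call to xnthread_init(). u_winoff receives the offset, from the start of the global memory pool (cobalt_kernel_ppd.umm.heap), of the "u_window" structure associated with thread (for Xenomai's xnheap, see "14 Xenomai memory pool management"). libcobalt maps the start address of cobalt_kernel_ppd.umm.heap into user space as cobalt_umm_shared, so user space can reach the thread's "u_window" structure in the kernel at cobalt_umm_shared + u_winoff. Thread state information visible from user space is obtained through this "u_window" structure via shared memory.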

	if (!xnthread_test_state(thread, XNUSER))
		return -EINVAL;

	if (xnthread_current() || xnthread_test_state(thread, XNMAPPED))
		return -EBUSY;

	if (!access_wok(u_winoff, sizeof(*u_winoff)))
		return -EFAULT;

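The function first checks whether the thread is a user thread and returns an error if it is not. It then checks whether the caller is already a Xenomai thread or thread has already been mapped; a thread cannot be mapped twice. Finally it checks that the user-space address u_winoff is writable, and fails otherwise.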
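The order of these checks and their error codes can be mirrored in a few lines. This is a sketch under simplifying assumptions: the flag values are illustrative and not taken from the Cobalt headers, struct demo_shadow and demo_map_user() are hypothetical, and a NULL test stands in for the access_wok() check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Illustrative flag bits, not taken from the Cobalt headers. */
#define DEMO_USER    0x1u
#define DEMO_MAPPED  0x2u

struct demo_shadow {
	unsigned state;
};

/* Returns 0 on success or a negative errno, checking in the same order as above. */
int demo_map_user(struct demo_shadow *t, int caller_is_shadow, void *u_winoff)
{
	if (!(t->state & DEMO_USER))
		return -EINVAL; /* only user-space threads can be mapped */
	if (caller_is_shadow || (t->state & DEMO_MAPPED))
		return -EBUSY;  /* caller already a shadow, or thread already mapped */
	if (u_winoff == NULL)
		return -EFAULT; /* stand-in for the writability check */
	t->state |= DEMO_MAPPED;
	return 0;
}
```

A second call on the same descriptor fails with -EBUSY, which is the "no double mapping" rule described above.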

    umm = &cobalt_kernel_ppd.umm;
	u_window = cobalt_umm_alloc(umm, sizeof(*u_window));
	if (u_window == NULL)
		return -ENOMEM;

	thread->u_window = u_window;
	__xn_put_user(cobalt_umm_offset(umm, u_window), u_winoff);

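A u_window structure is allocated from umm, a region managed by cobalt_kernel_ppd and shared with user space. Its address is stored in thread->u_window, the offset of that address from the base of umm is computed, and the offset is written to the user-space address u_winoff. The task_struct is handled next.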
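Passing an offset instead of a pointer is what lets two different mappings of the same memory agree on where the structure lives. The sketch below imitates that with a single heap buffer; struct demo_window, struct demo_umm, and the demo_umm_* helpers are hypothetical, and the bump allocator is a stand-in for the real shared heap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical status window shared between the "kernel" and "user" views. */
struct demo_window {
	uint32_t state;
	uint32_t info;
};

/* A tiny bump allocator over one buffer, standing in for the shared heap. */
struct demo_umm {
	unsigned char *base;
	size_t size;
	size_t used;
};

int demo_umm_init(struct demo_umm *umm, size_t size)
{
	umm->base = calloc(1, size);
	umm->size = size;
	umm->used = 0;
	return umm->base ? 0 : -1;
}

void *demo_umm_alloc(struct demo_umm *umm, size_t size)
{
	size_t aligned = (size + 7u) & ~(size_t)7u; /* keep 8-byte alignment */
	void *p;

	if (umm->used + aligned > umm->size)
		return NULL;
	p = umm->base + umm->used;
	umm->used += aligned;
	return p;
}

/* "Kernel" side: turn a pointer into an offset from the heap base. */
uint32_t demo_umm_offset(const struct demo_umm *umm, const void *p)
{
	return (uint32_t)((const unsigned char *)p - umm->base);
}

/* "User" side: rebuild a pointer from its own mapping base plus the offset. */
struct demo_window *demo_window_at(void *shared_base, uint32_t off)
{
	return (struct demo_window *)((unsigned char *)shared_base + off);
}
```

In this single-process sketch the "user" base is the same address as the "kernel" base. In the real system the two sides see the shared heap at different virtual addresses, which is exactly why only the offset is meaningful to both.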

	xnthread_init_shadow_tcb(thread);

xnthread_init_shadow_tcb(thread) saves the relevant fields of the Linux-managed task_struct into thread->tcb. The tcb structure is as follows:

struct xntcb {
	struct task_struct *host_task; /* points to the task_struct managed by Linux */
	struct thread_struct *tsp; /* task_struct->thread: registers switched on a context switch */
	struct mm_struct *mm; 		/* user-space memory management, task_struct->mm */
	struct mm_struct *active_mm;
	struct thread_struct ts;
#ifdef CONFIG_XENO_ARCH_WANT_TIP
	struct thread_info *tip;   /* thread_info */
#endif
#ifdef CONFIG_XENO_ARCH_FPU
	struct task_struct *user_fpu_owner; /* floating-point context */
#endif
};

struct xnarchtcb {
	struct xntcb core;
#if LINUX_VERSION_CODE < KERNEL_VERSION(4,8,0)
	unsigned long sp;	
	unsigned long *spp;	
	unsigned long ip;
	unsigned long *ipp;
#endif  
#ifdef IPIPE_X86_FPU_EAGER
	struct fpu *kfpu;
#else
	x86_fpustate *fpup;
	unsigned int root_used_math: 1;
	x86_fpustate *kfpu_state;
#endif
	unsigned int root_kfpu: 1;
	struct {
		unsigned long ip;
		unsigned long ax;
	} mayday;
};

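task_struct has a member variable thread, which holds the registers that must be changed when switching tasks. core.host_task points to the task_struct, core.tsp points to thread inside the task_struct, core.active_mm and core.mm both point to mm in the task_struct, and core.tip points to the thread_info of the task_struct.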

	xnthread_suspend(thread, XNRELAX, XN_INFINITE, XN_RELATIVE, NULL);
	init_uthread_info(thread);
	xnthread_set_state(thread, XNMAPPED); /* XNMAPPED: the thread is mapped to a Linux task */
	xnthread_run_handler(thread, map_thread); /* cobalt_thread_map */
	ipipe_enable_notifier(current); /* sets TIP_NOTIFY in thread_info->flags */

ipipe_enable_notifier() sets TIP_NOTIFY in thread_info->flags.

The thread is then started by calling xnthread_start(thread, &attr).

int xnthread_start(struct xnthread *thread,
		   const struct xnthread_start_attr *attr)
{
	spl_t s;
....
	thread->entry = attr->entry;
	thread->cookie = attr->cookie;
   .......
	if (xnthread_test_state(thread, XNUSER))
		enlist_new_thread(thread); /* add to the nkthreadq list */

	xnthread_resume(thread, XNDORMANT);
	xnsched_run();
	return 0;
}

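This sets the thread entry and its cookie argument and adds thread to the global queue nkthreadq. xnthread_resume() and xnsched_run() are then called, but given the current state flags neither does anything.

Control returns to __cobalt_thread_create(), which continues.

3.3 Binding to the Cobalt kernel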

	if (!thread_hash(&hkey, thread, task_pid_vnr(p))) {
		ret = -EAGAIN;
		goto fail;
	}

	thread->hkey = hkey; /* kernel mm */

	return xnthread_harden();

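This adds hkey to local_thread_hash and global_thread_hash and saves it in cobalt_thread->hkey.

Everything is now initialized and the thread can be scheduled in the Xenomai domain. Because it is a real-time thread with higher priority than Linux, it should start running as soon as it is created, so xnthread_harden() is called to migrate it to the head domain. This is analyzed in detail in "12 Task migration between the two kernels".

After xnthread_harden() returns, control goes back to user space and cobalt_thread_trampoline in libcobalt continues running.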

ret = -XENOMAI_SYSCALL5(sc_cobalt_thread_create, ptid,
				policy, &param_ex, personality, &u_winoff);
	if (ret == 0)
		cobalt_set_tsd(u_winoff);
	/*
	 * We must access anything we'll need from *iargs before
	 * posting the sync semaphore, since our released parent could
	 * unwind the stack space onto which the iargs struct is laid
	 * on before we actually get the CPU back.
	*/
sync_with_creator:
	iargs->ret = ret;
	__STD(sem_post(&iargs->sync));
	if (ret)
		return (void *)ret;

	/*
	 * If the parent thread runs with the same priority as we do,
	 * then we should yield the CPU to it, to preserve the
	 * scheduling order.
	 */
	if (param_ex.sched_priority == parent_prio)
		__STD(sched_yield());

	cobalt_thread_harden();

	retval = start(arg); /* run the real user thread function */

	pthread_setmode_np(PTHREAD_WARNSW, 0, NULL);

	return retval;
}

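A return value of 0 from the system call means the real-time thread was created successfully, and cobalt_set_tsd then sets the thread-specific data (TSD).

A single-threaded program has two basic kinds of data: global variables and local variables. A multithreaded program has a third kind: thread-specific data (TSD). It resembles a global variable in that every function running in the thread can use it as if it were global, but it is invisible to other threads. A familiar example is errno, which reports standard error information. It clearly cannot be a local variable, because almost every function needs access to it, and it cannot be an ordinary global either, because one thread's errors must not overwrite another's.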
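Thread-specific data is easy to observe with C11 thread-local storage. The sketch below is a generic illustration of the concept and makes no claim about how cobalt_set_tsd stores its data; demo_tsd, demo_worker, and demo_tsd_isolated are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One copy of this variable exists per thread (C11 thread-local storage). */
static _Thread_local int demo_tsd = 0;

void demo_tsd_set(int v)
{
	demo_tsd = v;
}

int demo_tsd_get(void)
{
	return demo_tsd;
}

static void *demo_worker(void *arg)
{
	(void)arg;
	demo_tsd_set(7); /* only changes the worker thread's copy */
	return NULL;
}

/*
 * Sets the caller's copy, lets another thread write its own copy,
 * and returns what the caller still sees afterwards.
 */
int demo_tsd_isolated(void)
{
	pthread_t t;

	demo_tsd_set(1);
	if (pthread_create(&t, NULL, demo_worker, NULL) != 0)
		return -1;
	pthread_join(t, NULL);
	return demo_tsd_get();
}
```

The caller still reads back its own value after the worker has finished, because each thread accesses a separate instance of demo_tsd through the same name.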

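cobalt_set_tsd uses the system call sc_cobalt_get_current to obtain xnthread.handle from the kernel and combines it with u_winoff to set up the TSD; the details are not covered here. Note that at this point the thread is in the head domain.

Next, glibc's sem_post is called. It issues a Linux system call that posts iargs->sync so that the blocked parent thread can continue. Calling a Linux system service causes a head-to-root migration; when the Linux call returns, the thread is in the root domain.

Because the thread is now in the root domain, cobalt_thread_harden() is called next. It issues the Cobalt system call sc_cobalt_migrate (a dedicated system call exposed by the real-time core), which switches the thread back to Cobalt scheduling (binds it to the Cobalt kernel). Thread creation is now complete. Once Cobalt schedules the thread, it returns to user space and, as a Cobalt thread, starts executing the user-specified thread function start(arg).

User code may call Linux services, in which case many more head->root->head migrations will occur.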

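4 Summary

The main flow of Cobalt thread creation is:

The standard pthread_create() creates a Linux task whose entry point is cobalt_thread_trampoline();
cobalt_thread_trampoline() issues a Cobalt system call that creates the Cobalt scheduling entity;
cobalt_thread_harden() migrates the thread to Cobalt scheduling;
the real user entry function start() runs.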
