Recently a project exposed how thin my knowledge of cgroups was, so I spent some time catching up on the topic.
A cgroup groups processes together and then controls their CPU, IO, and memory usage, so it is closely tied to overall system performance.
I. The overall cgroup framework:
From the framework diagram above, several points stand out:
1. cgroup subsystems (subsys) come in many kinds; the main ones are:
cpuacct: accounts CPU usage per group
cpuset: controls which CPU cores a group's processes run on
cpu: controls how much CPU time a group's processes receive; cpu.shares is the knob that matters
blkio: sets each group's share of IO, configurable either as a weight or as an absolute limit
memory: caps the maximum amount of memory a group's processes may use.
2. Each subsystem contains multiple cgroups. Taking the Android memory cgroup as an example, its hierarchy is structured as follows:
Its hierarchy id is 2 (why it is 2 I have not figured out; by my understanding root would be the first level, system/app the second, uid the third and pid the fourth, whereas two levels would mean just root and system/apps).
Other groupings are possible too: the cpu subsystem can split CPU time between foreground and background processes, or group by big/little cores.
3. Each process maps to exactly one css_set, and that css_set in turn points at state in multiple subsystems.
4. Each process can therefore belong to multiple subsystems, so several kinds of resources can be controlled for it at once.
II. cgroup initialization:
cgroup initialization mainly initializes the cgroup subsystems; the two key functions are:
cgroup_init_early------>initializes the root cgroup and the two global structures init_css_set and init_css_set_link, points the init task's cgroups pointer at init_css_set, and initializes those subsystems that require early init:
int __init cgroup_init_early(void)
{
	static struct cgroup_sb_opts __initdata opts;
	struct cgroup_subsys *ss;
	int i;

	init_cgroup_root(&cgrp_dfl_root, &opts);
	cgrp_dfl_root.cgrp.self.flags |= CSS_NO_REF;

	RCU_INIT_POINTER(init_task.cgroups, &init_css_set);

	for_each_subsys(ss, i) {	/* walk every cgroup subsystem and initialize it */
		WARN(!ss->css_alloc || !ss->css_free || ss->name || ss->id,
		     "invalid cgroup_subsys %d:%s css_alloc=%p css_free=%p id:name=%d:%s\n",
		     i, cgroup_subsys_name[i], ss->css_alloc, ss->css_free,
		     ss->id, ss->name);
		WARN(strlen(cgroup_subsys_name[i]) > MAX_CGROUP_TYPE_NAMELEN,
		     "cgroup_subsys_name %s too long\n", cgroup_subsys_name[i]);

		ss->id = i;
		ss->name = cgroup_subsys_name[i];
		if (!ss->legacy_name)
			ss->legacy_name = cgroup_subsys_name[i];

		if (ss->early_init)	/* subsystems marked early_init are set up here via cgroup_init_subsys() */
			cgroup_init_subsys(ss, true);
	}
	return 0;
}
cgroup_init------>initializes the remaining cgroup-related globals and the subsystems that were not handled during early init. This includes adding the file nodes each subsys specifies to the filesystem, hooking init_css_set into its hash list (hlist), and finally calling register_filesystem to register the cgroup pseudo filesystem and create /proc/cgroups.
cat /proc/cgroups shows the basic state of the current cgroups:
1. the name of each cgroup subsys: taken from cgroup_subsys->legacy_name
2. its hierarchy id: taken from cgroup_subsys->root->hierarchy_id
3. how many cgroups it contains: taken from cgroup_subsys->root->nr_cgrps
4. which subsystems are enabled: computed by cgroup_ssid_enabled
cgroup_init_subsys------>finishes initializing the cgroup_subsys structure, allocates and initializes a cgroup_subsys_state (css), and stores the initialized cgroup_subsys in the css's ss member.
The css itself is allocated through each subsystem's own registered css_alloc callback:
static void __init cgroup_init_subsys(struct cgroup_subsys *ss, bool early)
{
	struct cgroup_subsys_state *css;

	pr_debug("Initializing cgroup subsys %s\n", ss->name);

	mutex_lock(&cgroup_mutex);

	idr_init(&ss->css_idr);
	INIT_LIST_HEAD(&ss->cfts);

	/* Create the root cgroup state for this subsystem */
	ss->root = &cgrp_dfl_root;
	css = ss->css_alloc(cgroup_css(&cgrp_dfl_root.cgrp, ss));	/* allocate the css via the subsys callback */
	/* We don't handle early failures gracefully */
	BUG_ON(IS_ERR(css));
	init_and_link_css(css, ss, &cgrp_dfl_root.cgrp);

	/*
	 * Root csses are never destroyed and we can't initialize
	 * percpu_ref during early init.  Disable refcnting.
	 */
	css->flags |= CSS_NO_REF;

	if (early) {
		/* allocation can't be done safely during early init */
		css->id = 1;
	} else {
		css->id = cgroup_idr_alloc(&ss->css_idr, css, 1, 2, GFP_KERNEL);
		BUG_ON(css->id < 0);
	}

	/* Update the init_css_set to contain a subsys
	 * pointer to this state - since the subsystem is
	 * newly registered, all tasks and hence the
	 * init_css_set is in the subsystem's root cgroup. */
	init_css_set.subsys[ss->id] = css;

	have_fork_callback |= (bool)ss->fork << ss->id;
	have_exit_callback |= (bool)ss->exit << ss->id;
	have_free_callback |= (bool)ss->free << ss->id;
	have_canfork_callback |= (bool)ss->can_fork << ss->id;

	/* At system boot, before all subsystems have been
	 * registered, no tasks have been forked, so we don't
	 * need to invoke fork callbacks here. */
	BUG_ON(!list_empty(&init_task.tasks));

	BUG_ON(online_css(css));

	mutex_unlock(&cgroup_mutex);
}
Take the memory cgroup as an example:
Its css_alloc callback mainly allocates a memcg structure, initializes that structure's fields, and returns the css member embedded in the memcg.
static struct cgroup_subsys_state * __ref
mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
{
	struct mem_cgroup *parent = mem_cgroup_from_css(parent_css);
	struct mem_cgroup *memcg;
	long error = -ENOMEM;

	memcg = mem_cgroup_alloc();
	if (!memcg)
		return ERR_PTR(error);

	memcg->high = PAGE_COUNTER_MAX;
	memcg->soft_limit = PAGE_COUNTER_MAX;
	if (parent) {
		memcg->swappiness = mem_cgroup_swappiness(parent);
		memcg->oom_kill_disable = parent->oom_kill_disable;
	}
	if (parent && parent->use_hierarchy) {
		memcg->use_hierarchy = true;
		page_counter_init(&memcg->memory, &parent->memory);
		page_counter_init(&memcg->swap, &parent->swap);
		page_counter_init(&memcg->memsw, &parent->memsw);
		page_counter_init(&memcg->kmem, &parent->kmem);
		page_counter_init(&memcg->tcpmem, &parent->tcpmem);
	} else {
		page_counter_init(&memcg->memory, NULL);
		page_counter_init(&memcg->swap, NULL);
		page_counter_init(&memcg->memsw, NULL);
		page_counter_init(&memcg->kmem, NULL);
		page_counter_init(&memcg->tcpmem, NULL);
		/*
		 * Deeper hierarchy with use_hierarchy == false doesn't make
		 * much sense so let cgroup subsystem know about this
		 * unfortunate state in our controller.
		 */
		if (parent != root_mem_cgroup)
			memory_cgrp_subsys.broken_hierarchy = true;
	}

	/* The following stuff does not apply to the root */
	if (!parent) {
		root_mem_cgroup = memcg;
		return &memcg->css;
	}

	error = memcg_online_kmem(memcg);
	if (error)
		goto fail;

	if (cgroup_subsys_on_dfl(memory_cgrp_subsys) && !cgroup_memory_nosocket)
		static_branch_inc(&memcg_sockets_enabled_key);

	return &memcg->css;

fail:
	mem_cgroup_free(memcg);
	return ERR_PTR(-ENOMEM);
}
That covers the overall framework of cgroup.