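1. Introduction
This article uses the cpuidle driver for the ARM64 platform as an example of how to write a cpuidle driver within the cpuidle framework. Along the way it touches on the concept of CPU hotplug, so it can also serve as a lead-in to that topic.
2. arm64_idle_init
The code of the ARM64 generic CPU idle driver lives in "drivers/cpuidle/cpuidle-arm64.c". Its entry point is arm64_idle_init, shown below: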
static int __init arm64_idle_init(void)
{
    int cpu, ret;
    struct cpuidle_driver *drv = &arm64_idle_driver;

    /*
     * Initialize idle states data, starting at index 1.
     * This driver is DT only, if no DT idle states are detected (ret == 0)
     * let the driver initialization fail accordingly since there is no
     * reason to initialize the idle driver if only wfi is supported.
     */
    ret = dt_init_idle_driver(drv, arm64_idle_state_match, 1);
    if (ret <= 0) {
        if (ret)
            pr_err("failed to initialize idle states\n");
        return ret ? : -ENODEV;
    }

    /*
     * Call arch CPU operations in order to initialize
     * idle states suspend back-end specific data
     */
    for_each_possible_cpu(cpu) {
        ret = cpu_init_idle(cpu);
        if (ret) {
            pr_err("CPU %d failed to init idle CPU ops\n", cpu);
            return ret;
        }
    }

    ret = cpuidle_register(drv, NULL);
    if (ret) {
        pr_err("failed to register cpuidle driver\n");
        return ret;
    }

    return 0;
}
device_initcall(arm64_idle_init);
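The execution flow of this function shows the steps involved in implementing a cpuidle driver:
1) Statically define a struct cpuidle_driver variable (here arm64_idle_driver) and fill in the necessary fields, as follows: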
static struct cpuidle_driver arm64_idle_driver = {
    .name = "arm64_idle",
    .owner = THIS_MODULE,
    /*
     * State at index 0 is standby wfi and considered standard
     * on all ARM platforms. If in some platforms simple wfi
     * can't be used as "state 0", DT bindings must be implemented
     * to work around this issue and allow installing a special
     * handler for idle state index 0.
     */
    .states[0] = {
        .enter = arm64_enter_idle_state,
        .exit_latency = 1,
        .target_residency = 1,
        .power_usage = UINT_MAX,
        .flags = CPUIDLE_FLAG_TIME_VALID,
        .name = "WFI",
        .desc = "ARM64 WFI",
    }
};
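The driver is named "arm64_idle"; this name shows up in sysfs (for example in /sys/devices/system/cpu/cpuidle/current_driver).
Note the comment above: every ARM platform is expected to provide the default WFI standby state as idle state 0; any exception has to be handled separately in the DTS.
For state 0, the driver sets exit latency and target residency to 1 (the minimum value) and power usage to UINT_MAX, the largest unsigned integer. These are clearly not real measurements (the driver cannot know the WFI characteristics of every ARM platform) but relative values. They simply say that every other state has a larger exit latency and target residency than state 0 and a lower power usage, which is all the framework needs. A neat piece of design.
2) Initialize the other idle states (starting from state 1). This must be done through the DTS; otherwise initialization fails. See the description below for details.
3) For each CPU, call cpu_init_idle to initialize the back-end data, related to CPU suspend, that cpuidle support relies on. This is covered in more detail below.
4) Call cpuidle_register to register the cpuidle driver with the cpuidle core (see "Linux cpuidle framework(2)_cpuidle core" for details).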
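To make the "relative values" idea for state 0 concrete, here is a minimal userspace sketch. Everything in it is hypothetical: struct idle_state, pick_state and demo_states are illustration-only names, the selection rule is a simplification rather than the logic of any real cpuidle governor, and the numbers for state 1 are placeholders. It only shows that a state with exit latency 1 and target residency 1 always remains eligible as a fallback.

```c
#include <limits.h>

/* Hypothetical, simplified stand-in for the kernel's struct cpuidle_state. */
struct idle_state {
    unsigned int exit_latency;     /* microseconds */
    unsigned int target_residency; /* microseconds */
    unsigned int power_usage;      /* relative value; UINT_MAX is the worst */
};

/*
 * Pick the deepest state (highest index) whose target residency fits the
 * predicted idle time and whose exit latency respects the latency limit.
 * Illustration only: real governors are considerably more elaborate.
 */
static int pick_state(const struct idle_state *states, int count,
                      unsigned int predicted_us, unsigned int latency_limit_us)
{
    int i, best = 0; /* state 0 (WFI) is the fallback */

    for (i = 1; i < count; i++) {
        if (states[i].target_residency > predicted_us)
            continue;
        if (states[i].exit_latency > latency_limit_us)
            continue;
        best = i;
    }
    return best;
}

/* State 0 as in arm64_idle_driver; state 1 uses placeholder numbers. */
static const struct idle_state demo_states[] = {
    { 1, 1, UINT_MAX },
    { 1700, 2000, 0 },
};
```

With these values, a predicted idle time of 500 us selects state 0, while 3000 us with a generous latency limit selects state 1. The driver never needs real WFI numbers for this to work.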
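3. dt_init_idle_driver
dt_init_idle_driver parses the cpuidle state information from the DTS and initializes the states array in arm64_idle_driver. Before analyzing the function, let's look at how the cpuidle-related DTS source is written:
Note: in the kernel this article is based on (3.18-rc4), no ARM64 platform uses cpuidle yet, so no ARM64 reference file is available. Fortunately ARM and ARM64 parse the cpuidle DTS in the same way, so we can borrow an ARM example.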
/* arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts */
cpus {
    #address-cells = <1>;
    #size-cells = <0>;

    cpu0: cpu@0 {
        device_type = "cpu";
        compatible = "arm,cortex-a15";
        reg = <0>;
        cci-control-port = <&cci_control1>;
        cpu-idle-states = <&CLUSTER_SLEEP_BIG>;
    };

    cpu1: cpu@1 {
        device_type = "cpu";
        compatible = "arm,cortex-a15";
        reg = <1>;
        cci-control-port = <&cci_control1>;
        cpu-idle-states = <&CLUSTER_SLEEP_BIG>;
    };

    cpu2: cpu@2 {
        device_type = "cpu";
        compatible = "arm,cortex-a7";
        reg = <0x100>;
        cci-control-port = <&cci_control2>;
        cpu-idle-states = <&CLUSTER_SLEEP_LITTLE>;
    };

    cpu3: cpu@3 {
        device_type = "cpu";
        compatible = "arm,cortex-a7";
        reg = <0x101>;
        cci-control-port = <&cci_control2>;
        cpu-idle-states = <&CLUSTER_SLEEP_LITTLE>;
    };

    cpu4: cpu@4 {
        device_type = "cpu";
        compatible = "arm,cortex-a7";
        reg = <0x102>;
        cci-control-port = <&cci_control2>;
        cpu-idle-states = <&CLUSTER_SLEEP_LITTLE>;
    };

    idle-states {
        CLUSTER_SLEEP_BIG: cluster-sleep-big {
            compatible = "arm,idle-state";
            local-timer-stop;
            entry-latency-us = <1000>;
            exit-latency-us = <700>;
            min-residency-us = <2000>;
        };

        CLUSTER_SLEEP_LITTLE: cluster-sleep-little {
            compatible = "arm,idle-state";
            local-timer-stop;
            entry-latency-us = <1000>;
            exit-latency-us = <500>;
            min-residency-us = <2500>;
        };
    };
};
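The cpuidle-related DTS information lives under the cpus node. Look first at the idle-states node at the end: it defines all the idle states the platform supports, one state per child node:
Each state definition is identified by "arm,idle-state";
entry-latency-us, exit-latency-us and min-residency-us define the key parameters of the idle state, and local-timer-stop corresponds to the CPUIDLE_FLAG_TIMER_STOP flag (see "Linux cpuidle framework(2)_cpuidle core" for its meaning);
dt_init_idle_driver parses this information and stores it in the states array of arm64_idle_driver. In each CPU node, the cpu-idle-states property lists the idle states that CPU supports; there may be more than one.
With the DTS information above in mind, dt_init_idle_driver executes as follows.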
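As a rough illustration of how these DTS properties could end up in a state entry, here is a small userspace sketch. The types and names (dt_idle_node, demo_state, fill_state, DEMO_FLAG_TIMER_STOP) are hypothetical and are not the kernel's. Summing the entry and exit latencies into a single exit latency is my reading of the idle-states binding for the case where no separate wakeup latency is given; treat it as an assumption and check it against the kernel source.

```c
#include <stdbool.h>

/* Hypothetical flat view of one child node of idle-states in the DTS. */
struct dt_idle_node {
    unsigned int entry_latency_us;
    unsigned int exit_latency_us;
    unsigned int min_residency_us;
    bool local_timer_stop;
};

#define DEMO_FLAG_TIMER_STOP 0x1u /* stand-in for CPUIDLE_FLAG_TIMER_STOP */

/* Hypothetical subset of the fields a cpuidle state carries. */
struct demo_state {
    unsigned int exit_latency;
    unsigned int target_residency;
    unsigned int flags;
};

static void fill_state(struct demo_state *s, const struct dt_idle_node *n)
{
    /* Assumption: worst-case wakeup latency = entry latency + exit latency. */
    s->exit_latency = n->entry_latency_us + n->exit_latency_us;
    s->target_residency = n->min_residency_us;
    s->flags = n->local_timer_stop ? DEMO_FLAG_TIMER_STOP : 0;
}

/* Values copied from the CLUSTER_SLEEP_BIG node above. */
static const struct dt_idle_node cluster_sleep_big = { 1000, 700, 2000, true };
```

Under that assumption, CLUSTER_SLEEP_BIG becomes a state with an exit latency of 1700 us, a target residency of 2000 us and the timer-stop flag set.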
1: /* drivers/cpuidle/dt_idle_states.c */
2: int dt_init_idle_driver(struct cpuidle_driver *drv,
3:                         const struct of_device_id *matches,
4:                         unsigned int start_idx)
5: {
6:     struct cpuidle_state *idle_state;
7:     struct device_node *state_node, *cpu_node;
8:     int i, err = 0;
9:     const cpumask_t *cpumask;
10:     unsigned int state_idx = start_idx;
11:
12:     if (state_idx >= CPUIDLE_STATE_MAX)
13:         return -EINVAL;
14:     /*
15:      * We get the idle states for the first logical cpu in the
16:      * driver mask (or cpu_possible_mask if the driver cpumask is not set)
17:      * and we check through idle_state_valid() if they are uniform
18:      * across CPUs, otherwise we hit a firmware misconfiguration.
19:      */
20:     cpumask = drv->cpumask ? : cpu_possible_mask;
21:     cpu_node = of_cpu_device_node_get(cpumask_first(cpumask));
22:
23:     for (i = 0; ; i++) {
24:         state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
25:         if (!state_node)
26:             break;
27:
28:         if (!idle_state_valid(state_node, i, cpumask)) {
29:             pr_warn("%s idle state not valid, bailing out\n",
30:                     state_node->full_name);
31:             err = -EINVAL;
32:             break;
33:         }
34:
35:         if (state_idx == CPUIDLE_STATE_MAX) {
36:             pr_warn("State index reached static CPU idle driver states array size\n");
37:             break;
38:         }
39:
40:         idle_state = &drv->states[state_idx++];
41:         err = init_state_node(idle_state, matches, state_node);
42:         if (err) {
43:             pr_err("Parsing idle state node %s failed with err %d\n",
44:                    state_node->full_name, err);
45:             err = -EINVAL;
46:             break;
47:         }
48:         of_node_put(state_node);
49:     }
50:
51:     of_node_put(state_node);
52:     of_node_put(cpu_node);
53:     if (err)
54:         return err;
55:     /*
56:      * Update the driver state count only if some valid DT idle states
57:      * were detected
58:      */
59:     if (i)
60:         drv->state_count = state_idx;
61:
62:     /*
63:      * Return the number of present and valid DT idle states, which can
64:      * also be 0 on platforms with missing DT idle states or legacy DT
65:      * configuration predating the DT idle states bindings.
66:      */
67:     return i;
68: }
69: EXPORT_SYMBOL_GPL(dt_init_idle_driver);
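1) Lines 14~21 and 28~33: get the node of the first CPU and parse the cpuidle states it supports from its "cpu-idle-states" property. idle_state_valid then checks whether the other CPUs in the driver's cpumask support the same states; if not, an error is returned. Because arm64_idle_driver does not set a cpumask, the check covers all possible CPUs, so the ARM64 generic CPU idle driver only supports ARM64 platforms on which every CPU has the same cpuidle states.
2) Lines 40~47: for each supported state, call init_state_node to parse the cpuidle state information and store it at the given index of the drv->states array. During parsing, each state is given its enter callback; on ARM64 this is always arm64_enter_idle_state. The rest of the parsing is straightforward and is not described further.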
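The uniformity check and the return-value convention described above can be modelled in plain userspace C. This is a sketch under simplifying assumptions: struct demo_cpu, count_uniform_states and init_result are hypothetical names, and each idle state is reduced to a non-zero integer id instead of a device-tree node.

```c
#include <errno.h>

#define DEMO_MAX_STATES 4

/* Hypothetical model: each CPU lists the ids of its idle states, 0-terminated. */
struct demo_cpu {
    int idle_states[DEMO_MAX_STATES + 1];
};

/*
 * Walk the states of the first CPU and require every other CPU to list the
 * same state at the same index. Returns the number of states found,
 * 0 if the first CPU lists none, or -EINVAL if the CPUs disagree.
 */
static int count_uniform_states(const struct demo_cpu *cpus, int ncpus)
{
    int i, cpu;

    for (i = 0; i < DEMO_MAX_STATES && cpus[0].idle_states[i]; i++) {
        for (cpu = 1; cpu < ncpus; cpu++) {
            if (cpus[cpu].idle_states[i] != cpus[0].idle_states[i])
                return -EINVAL;
        }
    }
    return i;
}

/* Caller-side mapping as in arm64_idle_init: "no states" becomes -ENODEV. */
static int init_result(int parsed)
{
    if (parsed <= 0)
        return parsed ? parsed : -ENODEV;
    return 0;
}
```

Two CPUs that both list states {1, 2} yield a count of 2, CPUs that list different states yield -EINVAL, and a platform with no DT idle states makes the caller give up with -ENODEV, since a driver offering only WFI is not worth registering.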
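4. cpu_init_idle
cpuidle support depends on low-level, power-management-related CPU code. On ARM64 the kernel abstracts this code as a set of operations (struct cpu_operations; see arch/arm64/kernel/cpu_ops.c). A single ARM platform may offer several such sets of operations, and the designer picks one as needed (see later articles on this site for details).
For ARM64, ARM's documentation specifies PSCI (Power State Coordination Interface), an interface implemented by firmware and used for power-management operations such as idle, SMP and hotplug.
For cpuidle, cpu_init_idle must be called before the cpuidle driver is registered. It invokes the cpu_init_idle callback of the operations set currently in use to perform the idle-related initialization. I will cover the details in other articles and stop here for now.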
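The dispatch through an operations set can be sketched as follows. This is a userspace model, not the kernel code: struct demo_cpu_ops is a trimmed-down, hypothetical stand-in for struct cpu_operations, the two back ends are invented, and returning -EOPNOTSUPP when no callback is available is an assumption made for the illustration.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for struct cpu_operations. */
struct demo_cpu_ops {
    const char *name;
    int (*cpu_init_idle)(unsigned int cpu);
};

/* A made-up back end standing in for something like PSCI. */
static int demo_fw_init_idle(unsigned int cpu)
{
    (void)cpu; /* a real back end would set up per-CPU idle data here */
    return 0;
}

static const struct demo_cpu_ops demo_fw_ops = {
    .name = "demo-fw",
    .cpu_init_idle = demo_fw_init_idle,
};

/* A back end that offers no idle support at all. */
static const struct demo_cpu_ops demo_plain_ops = {
    .name = "demo-plain",
    .cpu_init_idle = NULL,
};

/* Generic entry point: forward to whichever operations set is in use. */
static int demo_cpu_init_idle(const struct demo_cpu_ops *ops, unsigned int cpu)
{
    if (!ops || !ops->cpu_init_idle)
        return -EOPNOTSUPP; /* assumption: missing callback means unsupported */
    return ops->cpu_init_idle(cpu);
}
```

The point of the indirection is that the cpuidle driver never needs to know which back end is active; it only sees success or an error code for each CPU.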
5. arm64_enter_idle_state
An idle state's enter function puts the CPU into the specified idle state, as follows:
static int arm64_enter_idle_state(struct cpuidle_device *dev,
                                  struct cpuidle_driver *drv, int idx)
{
    int ret;

    if (!idx) {
        cpu_do_idle();
        return idx;
    }

    ret = cpu_pm_enter();
    if (!ret) {
        /*
         * Pass idle state index to cpu_suspend which in turn will
         * call the CPU ops suspend protocol with idle index as a
         * parameter.
         */
        ret = cpu_suspend(idx);

        cpu_pm_exit();
    }

    return ret ? -1 : idx;
}
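For idle state 0 (WFI), the traditional cpu_do_idle interface is called. To trace its implementation in the kernel source, follow the ARM9 example in chapter 2 of "Linux cpuidle framework(1)_概述和軟件架構" (overview and software architecture);
For any other state, cpu_pm_enter is called first to send the notification that the CPU is about to enter a low-power state. If that succeeds, cpu_suspend is called to put the CPU into the specified idle state. Finally, on return from idle, the notification of leaving the low-power state is sent;
cpu_pm_enter/cpu_pm_exit are located in kernel/cpu_pm.c and will be covered in another article;
cpu_suspend is located in arch/arm64/kernel/suspend.c and directly calls the cpu_suspend callback in the operations set (struct cpu_operations, e.g. PSCI). It will be described in another article.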
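The control flow of the enter function can be exercised in userspace with fake hooks. In this sketch struct demo_hooks and demo_enter_idle_state are hypothetical: the counters and preset return values stand in for cpu_do_idle, cpu_pm_enter, cpu_suspend and cpu_pm_exit, so that the ordering and the return convention can be observed without any hardware.

```c
/* Hypothetical hooks replacing cpu_do_idle, cpu_pm_enter, cpu_suspend, cpu_pm_exit. */
struct demo_hooks {
    int pm_enter_ret;  /* value the fake cpu_pm_enter returns */
    int suspend_ret;   /* value the fake cpu_suspend returns */
    int wfi_calls;     /* how often the fake cpu_do_idle ran */
    int suspend_calls; /* how often the fake cpu_suspend ran */
    int pm_exit_calls; /* how often the fake cpu_pm_exit ran */
};

static int demo_enter_idle_state(struct demo_hooks *h, int idx)
{
    int ret;

    if (!idx) {
        h->wfi_calls++;       /* state 0: plain WFI, no notifications */
        return idx;
    }

    ret = h->pm_enter_ret;    /* "about to enter low power" notification */
    if (!ret) {
        h->suspend_calls++;
        ret = h->suspend_ret; /* back end enters the requested state */
        h->pm_exit_calls++;   /* "left low power" notification */
    }

    /* On success report the state actually entered, otherwise -1. */
    return ret ? -1 : idx;
}
```

If the fake cpu_pm_enter reports an error, neither the suspend step nor the exit notification runs and the function returns -1, which mirrors the structure of arm64_enter_idle_state above.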