Linux Power Management -- OPPs

  • Understand the operating performance points (OPP) framework

1.Operating performance points

  Originally, the OPP framework acted as a helper library that provided the kernel with a table of voltage-frequency pairs (plus some additional information). Kernel frameworks such as cpufreq and devfreq used these OPP tables to perform DVFS (dynamic voltage and frequency scaling) for devices. The OPP framework builds the table either dynamically, from platform-specific code, or statically, from device-tree blobs.

  Systems on chips (SoCs) have become increasingly complex and power-efficient. There are multiple sub-modules within a SoC that work in conjunction, but not all of them are required to function at their highest performance frequency and voltage levels at all times, as that can be less power-efficient. Devices like CPUs, GPUs, and I/O devices have the capability of working at a range of frequency and voltage pairs. They should stay at lower voltages and frequencies when the system load is low and at higher levels otherwise.

  The set of discrete frequency/voltage tuples that a device supports is called its “operating performance points” (OPPs).

  For example, a CPU core that can operate at 1.0GHz at minimum voltage 1.0V, 1.1GHz at minimum voltage 1.1V, and 1.2GHz at minimum voltage 1.3V can be represented by these OPP tuples:

    Hz         uV
    1000000000 1000000
    1100000000 1100000
    1200000000 1300000
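  To make the tuple idea concrete, here is a minimal userspace sketch in C (not kernel code) that stores the three example OPPs in a table sorted by frequency and looks up the minimum voltage for one of them. The names struct opp_tuple, cpu_opps and opp_min_uv are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of one OPP tuple (not a kernel structure):
 * a frequency in Hz and the minimum supply voltage in microvolts. */
struct opp_tuple {
    unsigned long hz;
    unsigned long uv;
};

/* The three example OPPs, sorted by ascending frequency. */
static const struct opp_tuple cpu_opps[] = {
    { 1000000000UL, 1000000UL },
    { 1100000000UL, 1100000UL },
    { 1200000000UL, 1300000UL },
};

/* Return the minimum voltage (uV) for an exact OPP frequency, or 0 if the
 * frequency is not one of the device's discrete operating points. */
static unsigned long opp_min_uv(const struct opp_tuple *opps, size_t n,
                                unsigned long hz)
{
    for (size_t i = 0; i < n; i++)
        if (opps[i].hz == hz)
            return opps[i].uv;
    return 0;
}
```

  Because the points are discrete, a frequency between two OPPs (for example 1.05GHz) is simply not in the table.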

1.1. Evolution of OPP

Before the 4.6 kernel:

  The OPP framework was responsible for creating an OPP table, by parsing the device tree (or through platform-specific code), and for providing a set of helpers to query it, for example to find the lowest or highest OPP for a target frequency. Consumer drivers of the OPP library used these helpers to find an OPP for the target frequency and then configured the device's clock and power supplies (if required) themselves.
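  A helper of the kind described above might look like the following userspace sketch. It mimics the semantics of dev_pm_opp_find_freq_ceil() (the lowest available OPP at or above the requested frequency, with the matched frequency written back), but struct opp and find_ceil are hypothetical names, and the model returns NULL where the kernel returns an ERR_PTR() value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an OPP table entry (not the kernel's dev_pm_opp). */
struct opp {
    unsigned long rate_hz;
    unsigned long u_volt;
    int available;          /* 0 if the OPP has been disabled */
};

/* Ceil-style helper: return the available OPP with the lowest rate >= *freq
 * and write that rate back through freq. Returns NULL if none exists.
 * The table must be sorted by ascending rate. */
static const struct opp *find_ceil(const struct opp *tbl, size_t n,
                                   unsigned long *freq)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].available && tbl[i].rate_hz >= *freq) {
            *freq = tbl[i].rate_hz;
            return &tbl[i];
        }
    }
    return NULL;
}
```

  A pre-4.6 consumer driver would then program its own clock with the returned rate and its regulator with the returned voltage.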

Starting with the 4.6 kernel:

  The OPP core gained the ability to perform DVFS on behalf of device drivers. A driver only needs to pass a target frequency, and the OPP core finds and sets the best possible OPP for it.

  In order to perform DVFS on behalf of device drivers, the OPP core needs some of the device’s resources. Some of them are acquired automatically by the OPP core, while the core needs help from the driver to get others. It is important for driver writers to understand the expectations of the OPP core before they try to use it to do DVFS for their devices.
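  One detail the OPP core takes care of when it switches OPPs is ordering: when the frequency goes up, the supply voltage is raised before the clock, and when it goes down, the clock is lowered before the voltage, so the device never runs at a frequency its current voltage cannot support. The userspace sketch below models only that ordering; struct dvfs_dev, set_volt, set_clk and switch_opp are hypothetical stand-ins for the clock and regulator calls the kernel actually makes.

```c
#include <assert.h>

enum dvfs_op { SET_VOLT, SET_CLK };

/* Hypothetical device state plus a log of operations, in the order issued. */
struct dvfs_dev {
    unsigned long hz, uv;
    enum dvfs_op log[2];
    int nlog;
};

static void set_volt(struct dvfs_dev *d, unsigned long uv)
{
    d->uv = uv;
    d->log[d->nlog++] = SET_VOLT;
}

static void set_clk(struct dvfs_dev *d, unsigned long hz)
{
    d->hz = hz;
    d->log[d->nlog++] = SET_CLK;
}

/* Switch to a new OPP: raise the voltage before raising the clock, and lower
 * the clock before lowering the voltage. Equal frequencies take the
 * "down" path in this model. */
static void switch_opp(struct dvfs_dev *d, unsigned long hz, unsigned long uv)
{
    d->nlog = 0;
    if (hz > d->hz) {
        set_volt(d, uv);
        set_clk(d, hz);
    } else {
        set_clk(d, hz);
        set_volt(d, uv);
    }
}
```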

1.2 Operating Performance Points Library

OPP library:

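  • drivers/base/power/opp.c (in later kernels split into the drivers/base/power/opp/ directory, which contains the opp.h quoted below, and then moved to drivers/opp/)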
  • include/linux/pm_opp.h

Kconfig setting:

  • CONFIG_PM_OPP
  • CONFIG_PM

1.3. Structures

struct dev_pm_opp:

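 drivers/base/power/opp/opp.h:

  struct dev_pm_opp {
      struct list_head node;      /* entry in the owning opp_table's opp_list */
      struct kref kref;           /* reference count */

      bool available;             /* true while the OPP is enabled */
      bool dynamic;               /* added at runtime with dev_pm_opp_add(), not from DT */
      bool turbo;                 /* DT "turbo-mode" OPP */
      bool suspend;               /* DT "opp-suspend": OPP to use during suspend */
      unsigned long rate;         /* frequency in Hz */

      struct dev_pm_opp_supply *supplies;  /* one entry per regulator */

      unsigned long clock_latency_ns;      /* DT "clock-latency-ns" */

      struct opp_table *opp_table;         /* table this OPP belongs to */

      struct device_node *np;              /* DT node of this OPP */
      /* ... */
  };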

struct dev_pm_opp_supply:

  struct dev_pm_opp_supply {
      unsigned long u_volt;       /* target voltage, uV */
      unsigned long u_volt_min;   /* minimum voltage, uV */
      unsigned long u_volt_max;   /* maximum voltage, uV */
      unsigned long u_amp;        /* maximum current drawn, uA */
  };

struct opp_table:

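  struct opp_table {
      struct list_head node;              /* entry in the global list of OPP tables */

      struct blocking_notifier_head head; /* notifier chain for OPP events */
      struct list_head dev_list;          /* devices sharing this table */
      struct list_head opp_list;          /* OPPs, sorted by ascending frequency */
      struct kref kref;                   /* reference count */
      struct mutex lock;                  /* protects the table */

      struct device_node *np;             /* DT node of the table */
      unsigned long clock_latency_ns_max; /* largest clock latency of its OPPs */

      /* For backward compatibility with v1 bindings */
      unsigned int voltage_tolerance_v1;

      enum opp_table_access shared_opp;   /* whether the OPPs are shared between devices */
      struct dev_pm_opp *suspend_opp;     /* OPP marked opp-suspend, if any */

      unsigned int *supported_hw;         /* versions matched against opp-supported-hw */
      unsigned int supported_hw_count;
      const char *prop_name;              /* suffix for opp-microvolt-<name> properties */
  };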
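  The OPPs of a table are kept on opp_list in ascending order of frequency. A simplified userspace model of adding an OPP dynamically, as dev_pm_opp_add() does, could look like this. struct opp_model, struct opp_table_model and model_opp_add are hypothetical; a fixed-size array replaces the kernel's linked list, and every duplicate rate is rejected with -EEXIST, which is simpler than the kernel's actual duplicate handling.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_OPPS 8

/* Hypothetical, simplified stand-ins for struct dev_pm_opp / opp_table. */
struct opp_model {
    unsigned long rate;     /* Hz */
    unsigned long u_volt;   /* uV */
    int available;
    int dynamic;            /* 1 if added at runtime */
};

struct opp_table_model {
    struct opp_model opps[MAX_OPPS];   /* kept sorted by ascending rate */
    size_t count;
};

/* Insert an OPP keeping the table sorted by rate. Duplicate rates are
 * rejected with -EEXIST and a full table with -ENOMEM. */
static int model_opp_add(struct opp_table_model *t, unsigned long rate,
                         unsigned long u_volt, int dynamic)
{
    size_t pos = 0;

    if (t->count == MAX_OPPS)
        return -ENOMEM;
    while (pos < t->count && t->opps[pos].rate < rate)
        pos++;
    if (pos < t->count && t->opps[pos].rate == rate)
        return -EEXIST;
    /* Shift higher-rate entries up by one to make room. */
    for (size_t i = t->count; i > pos; i--)
        t->opps[i] = t->opps[i - 1];
    t->opps[pos] = (struct opp_model){ rate, u_volt, 1, dynamic };
    t->count++;
    return 0;
}
```

  New OPPs start out available, matching the note in section 3 that OPPs added with dev_pm_opp_add are enabled by default.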

2.Data Structures

  Typically, an SoC contains multiple voltage domains whose voltage can vary. Each domain is represented by a device pointer. The relationship to OPPs can be represented as follows:

SoC                                                                                                          
 |- device 1
 |  |- opp 1 (availability, freq, voltage)
 |  |- opp 2 ..
 ...    ... 
 |  `- opp n ..
 |- device 2
 ... 
 `- device m

  The OPP library maintains an internal list that the SoC framework populates and that various functions access. The structures representing the actual OPPs and domains are internal to the OPP library itself, which provides an abstraction reusable across systems.

  Overall, in a simplistic view, the data structure operations are represented as follows:

Initialization / modification:
                   +-----+         /- dev_pm_opp_enable
dev_pm_opp_add --> | opp | <-------
  |                +-----+         \- dev_pm_opp_disable
  \-------> domain_info(device)

Search functions:
             /-- dev_pm_opp_find_freq_ceil  ---\   +-----+
domain_info<---- dev_pm_opp_find_freq_exact -----> | opp |
             \-- dev_pm_opp_find_freq_floor ---/   +-----+

Retrieval functions:
+-----+     /- dev_pm_opp_get_voltage
| opp | <---                                                                                                 
+-----+     \- dev_pm_opp_get_freq

domain_info <- dev_pm_opp_get_opp_count
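  The hierarchy and operations in these diagrams can be modelled in a few lines of userspace C. In this sketch all names are hypothetical: a device name stands in for the struct device pointer, find_domain plays the role of "domain_info", set_available plays the role of dev_pm_opp_enable()/dev_pm_opp_disable(), and opp_count, like dev_pm_opp_get_opp_count(), counts only available OPPs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the SoC -> device -> OPP hierarchy. */
struct opp_entry {
    unsigned long rate;     /* Hz */
    unsigned long u_volt;   /* uV */
    int available;
};

struct domain {
    const char *dev_name;   /* stands in for the struct device pointer */
    struct opp_entry *opps;
    size_t nr_opps;
};

/* Find the OPP list belonging to one device. */
static struct domain *find_domain(struct domain *soc, size_t n, const char *dev)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(soc[i].dev_name, dev) == 0)
            return &soc[i];
    return NULL;
}

/* Enable (on = 1) or disable (on = 0) the OPP with an exact rate.
 * Returns -ENODEV in this model if no OPP has that rate. */
static int set_available(struct domain *d, unsigned long rate, int on)
{
    for (size_t i = 0; i < d->nr_opps; i++) {
        if (d->opps[i].rate == rate) {
            d->opps[i].available = on;
            return 0;
        }
    }
    return -ENODEV;
}

/* Count only the available OPPs of a device. */
static int opp_count(const struct domain *d)
{
    int count = 0;

    for (size_t i = 0; i < d->nr_opps; i++)
        count += d->opps[i].available ? 1 : 0;
    return count;
}
```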

3.APIs

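  • dev_pm_opp_add (WARNING: do not use this function in interrupt context):

    • Adds a frequency/voltage pair (an OPP) to the OPP table of the given device. Frequency is given in Hz and voltage in uV.
  • dev_pm_opp_remove:

    • Removes an OPP from the OPP table.
  • dev_pm_opp_get:

    • Increments the reference count of an OPP.
  • dev_pm_opp_enable:

    • Enables the specified OPP. OPPs added with dev_pm_opp_add are enabled by default.
  • dev_pm_opp_disable:

    • Even if the device supports an OPP, the driver may consider it risky and not want to use it; in that case it can call dev_pm_opp_disable to disable that OPP.
  • dev_pm_opp_get_voltage:

    • Returns the OPP's voltage (in uV).
  • dev_pm_opp_get_freq:

    • Returns the OPP's frequency (in Hz).
  • dev_pm_opp_set_regulators:

    • Tells the OPP core which regulators supply the device, so that it can perform voltage scaling.
  • dev_pm_opp_put_regulators:

    • Frees the regulator resources acquired by the OPP core in dev_pm_opp_set_regulators.
  • dev_pm_opp_set_rate:

    • Configures the device for the OPP with the lowest frequency greater than or equal to the target frequency.
  • dev_pm_opp_get_opp_count:

    • Returns the number of available (enabled) OPPs in the device's OPP table.
  • dev_pm_opp_of_add_table:

    • Parses the device tree and initializes the device's OPP table.
  • The OPP lookup functions are:

    • dev_pm_opp_find_freq_floor: finds the OPP with the highest frequency less than or equal to the given freq; besides returning the OPP, it returns the actual frequency through the freq pointer.
    • dev_pm_opp_find_freq_ceil: finds the OPP with the lowest frequency greater than or equal to the given freq; besides returning the OPP, it returns the actual frequency through the freq pointer.
    • dev_pm_opp_find_freq_exact: finds the OPP whose frequency exactly matches freq; its available argument selects whether an enabled (true) or a disabled (false) OPP is matched. The two functions above never return disabled OPPs.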
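  The difference between the floor lookup and the exact lookup with its available argument can be shown with a small userspace sketch. find_floor and find_exact are hypothetical names that mirror the semantics listed above on a frequency-sorted array; they are not the kernel functions, which operate on an opp_table and return ERR_PTR() values instead of NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical OPP entry, sorted by ascending rate in the table. */
struct opp {
    unsigned long rate;     /* Hz */
    bool available;
};

/* Floor lookup: highest available rate <= *freq; writes the rate back.
 * Returns NULL (and leaves *freq untouched) if nothing matches. */
static const struct opp *find_floor(const struct opp *tbl, size_t n,
                                    unsigned long *freq)
{
    const struct opp *best = NULL;

    for (size_t i = 0; i < n && tbl[i].rate <= *freq; i++)
        if (tbl[i].available)
            best = &tbl[i];
    if (best)
        *freq = best->rate;
    return best;
}

/* Exact lookup: matches rate == freq AND available == want_available,
 * so passing false finds a disabled OPP. */
static const struct opp *find_exact(const struct opp *tbl, size_t n,
                                    unsigned long freq, bool want_available)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].rate == freq && tbl[i].available == want_available)
            return &tbl[i];
    return NULL;
}
```

  Note how the disabled 1.1GHz entry in the test below is skipped by the floor lookup and is only found by the exact lookup with want_available set to false.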

Note:
  Callers of these lookup functions must call dev_pm_opp_put() once they are done with the returned OPP. Otherwise the OPP's memory is never freed, resulting in a memory leak.
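  The reference-counting rule can be modelled in userspace as follows. struct ref_opp, ref_opp_new, lookup and the freed_count counter are hypothetical; the model assumes the table holds one reference to each OPP and that every lookup hands the caller an extra one, which the caller must drop (the role dev_pm_opp_put() plays in the kernel).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted OPP, modelling kref-style lifetime rules. */
struct ref_opp {
    unsigned long rate;
    int refcount;
};

/* Number of OPPs freed so far, so a test can detect leaks. */
static int freed_count;

static struct ref_opp *ref_opp_new(unsigned long rate)
{
    struct ref_opp *o = malloc(sizeof(*o));

    if (o) {
        o->rate = rate;
        o->refcount = 1;    /* the table's own reference */
    }
    return o;
}

static void ref_opp_get(struct ref_opp *o)
{
    o->refcount++;
}

/* Drop one reference; the OPP is freed when the last one goes away. */
static void ref_opp_put(struct ref_opp *o)
{
    if (--o->refcount == 0) {
        free(o);
        freed_count++;
    }
}

/* A lookup returns the OPP with an extra reference held for the caller. */
static struct ref_opp *lookup(struct ref_opp *o)
{
    ref_opp_get(o);
    return o;
}
```

  If the caller skipped its put, the count would never reach zero after the table dropped its reference, and the memory would leak, which is exactly the situation the note warns about.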

4.Analyzing imx:

arch/arm/mach-imx/mach-imx6q.c
.init_late = imx6q_init_late,
  ->imx6q_opp_init();
    ->dev_pm_opp_of_add_table // get the CPU's operating points (frequencies, voltages and, if given, currents) from the DTS
      -> _of_add_opp_table_v2 // taken if the CPU node defines operating-points-v2
        ->_opp_add_static_v2

_of_add_opp_table_v2 :

Note:
  Currently, _of_add_opp_table_v2 loops through the OPP nodes of the operating-points-v2 table in the device tree and calls _opp_add_static_v2 for each one to add it to the table.
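  Part of the per-node work in _opp_add_static_v2 is reading the opp-microvolt property. Per the operating-points-v2 binding, a single regulator's value is either one cell (the target voltage) or three cells in the order <target min max>; with a single cell, the target is also used as min and max. The sketch below models that step in userspace for one regulator. struct supply mirrors the voltage fields of dev_pm_opp_supply, parse_microvolt is a hypothetical name, and the min <= target <= max check is an addition of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mirrors the voltage fields of struct dev_pm_opp_supply (hypothetical copy). */
struct supply {
    unsigned long u_volt, u_volt_min, u_volt_max;
};

/* Hypothetical parser for one regulator's opp-microvolt cells:
 * <target> or <target min max>. */
static int parse_microvolt(const unsigned long *cells, size_t ncells,
                           struct supply *s)
{
    if (ncells == 1) {
        s->u_volt = s->u_volt_min = s->u_volt_max = cells[0];
        return 0;
    }
    if (ncells == 3) {
        s->u_volt = cells[0];
        s->u_volt_min = cells[1];
        s->u_volt_max = cells[2];
        /* Sanity check added for this sketch only. */
        if (s->u_volt_min > s->u_volt || s->u_volt > s->u_volt_max)
            return -EINVAL;
        return 0;
    }
    return -EINVAL;
}
```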

  Here is an example:
  Two CPU devices share their clock and voltage rails, so a single power supply must be configured to perform DVFS for both of them. The device-tree fragment describing the CPUs themselves would be:

	cpus {
		#address-cells = <1>;
		#size-cells = <0>;

		cpu@0 {
			compatible = "arm,cortex-a9";
			reg = <0>;
			next-level-cache = <&L2>;
			clocks = <&clk_controller 0>;
			clock-names = "cpu";
			vdd-supply = <&vdd_supply0>;
			operating-points-v2 = <&cpu_opp_table>;
		};

		cpu@1 {
			compatible = "arm,cortex-a9";
			reg = <1>;
			next-level-cache = <&L2>;
			clocks = <&clk_controller 0>;
			clock-names = "cpu";
			vdd-supply = <&vdd_supply0>;
			operating-points-v2 = <&cpu_opp_table>;
		};
	};

  These definitions reference cpu_opp_table, which is a table describing the valid operating points for these CPUs; it is also found in the device tree:


	cpu_opp_table: opp_table {
		compatible = "operating-points-v2";
		opp-shared;

		opp@1000000000 {
			opp-hz = /bits/ 64 <1000000000>;
			opp-microvolt = <1000000 990000 1050000>; /* <target min max> */
			opp-microamp = <70000>;
			clock-latency-ns = <300000>;
			opp-suspend;
		};
		opp@1100000000 {
			opp-hz = /bits/ 64 <1100000000>;
			opp-microvolt = <1100000 1090000 1150000>;
			opp-microamp = <80000>;
			clock-latency-ns = <310000>;
		};
		opp@1200000000 {
			opp-hz = /bits/ 64 <1200000000>;
			opp-microvolt = <1200000 1190000 1250000>;
			opp-microamp = <90000>;
			clock-latency-ns = <290000>;
			turbo-mode;
		};
	};
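  While walking these nodes, the OPP core also records table-wide data: opp_table->clock_latency_ns_max becomes the largest clock-latency-ns among the OPPs, and suspend_opp points at the OPP marked opp-suspend. The userspace sketch below, with hypothetical struct opp_node, struct table_summary and summarize, computes both for this table (if several nodes were marked opp-suspend, this model keeps the first one).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory form of the opp nodes in cpu_opp_table. */
struct opp_node {
    unsigned long long hz;            /* opp-hz (a 64-bit property) */
    unsigned long clock_latency_ns;   /* clock-latency-ns */
    bool suspend;                     /* opp-suspend present */
};

/* Hypothetical summary mirroring two opp_table fields. */
struct table_summary {
    unsigned long clock_latency_ns_max;
    const struct opp_node *suspend_opp;
};

/* Walk the nodes once, tracking the largest clock latency and the first
 * node marked opp-suspend. */
static struct table_summary summarize(const struct opp_node *nodes, size_t n)
{
    struct table_summary s = { 0, NULL };

    for (size_t i = 0; i < n; i++) {
        if (nodes[i].clock_latency_ns > s.clock_latency_ns_max)
            s.clock_latency_ns_max = nodes[i].clock_latency_ns;
        if (nodes[i].suspend && s.suspend_opp == NULL)
            s.suspend_opp = &nodes[i];
    }
    return s;
}
```

  For the three nodes above this yields a maximum latency of 310000 ns (from the 1.1GHz OPP) and the 1GHz OPP as the suspend OPP.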

  The platform-specific code needed to set up DVFS would look something like:

    /* The name must match the "vdd-supply" property of the CPU nodes. */
    const char * const names[] = { "vdd" };
    struct opp_table *opp_table;

    opp_table = dev_pm_opp_set_regulators(dev, names, ARRAY_SIZE(names));
    if (IS_ERR(opp_table))
        dev_err(dev, "Failed to set regulators: %ld\n", PTR_ERR(opp_table));

  The driver responsible for voltage and frequency scaling would then do something like this:

    ret = dev_pm_opp_set_rate(dev, target_freq);
    if (ret)
	dev_err(dev, "Failed to set rate: %d\n", ret);

References:

  • https://lwn.net/Articles/718632/
  • Documentation/power/opp.txt
  • https://www.kernel.org/doc/html/latest/power/opp.html