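Thermal subsystem overview
The thermal subsystem is the kernel's temperature-control framework: a software thermal solution that works with the temperature sensors inside an IC to keep the chip's temperature in check and the system stable.
It is mostly used to manage the main heat-producing blocks inside the IC, such as the CPU and GPU.
The thermal sensor driver reads the temperature from the hardware sensor and passes it to the thermal subsystem. Based on the temperature of the controlled object, the thermal subsystem decides whether to trigger the corresponding cooling action, such as limiting the CPU's maximum frequency or the number of online CPU cores, and thereby cools the system.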
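thermal zone
A thermal zone represents one temperature-control region. It can be viewed as a virtual temperature sensor, and it only becomes useful once a physical sensor is associated with it.
A thermal zone can be associated with at most one sensor, but that sensor may be a combination of several hardware sensors.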
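As a rough illustration of "one zone, one logical sensor that may mix several hardware sensors", the sketch below folds several raw readings into a single value by taking the maximum. The function name combine_sensors_max and the choice of "hottest reading wins" are assumptions made for illustration; the article does not say how a combined sensor mixes its inputs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: fold several hardware sensor readings into the single
 * temperature a thermal zone would see. Taking the hottest reading is one
 * conservative choice; a real driver may average or weight them instead. */
int combine_sensors_max(const int *readings, size_t count)
{
    int hottest = INT_MIN;  /* returned unchanged when count == 0 */

    assert(readings != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (readings[i] > hottest)
            hottest = readings[i];
    }
    return hottest;
}
```

Taking the maximum means the zone reacts to the hottest spot, at the cost of throttling earlier than an average would.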
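Trip Point
A trip point is a trigger point maintained by a thermal zone. Each thermal zone can maintain multiple trip points. A trip point carries the following information:
temp: the trigger temperature; the trip point is triggered when the temperature reaches it.
type: the trip point type. Following the PC cooling model, there are four types: passive, active, hot, and critical.
cooling device binding information:
Recorded in the thermal_instance structure, this describes the binding between a trip point and a cooling device, that is, which cooling device carries out the cooling action once the trip point is triggered. A passive or active trip point must be bound to a cooling device to have any practical effect.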
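The trigger rule can be sketched in a few lines of userspace C. This is a simplified model, not the kernel's own structure: the names trip_point and trip_triggered are hypothetical, and the hysteresis field mirrors the hysteresis property that appears in the device tree example later in this article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum trip_type { TRIP_ACTIVE, TRIP_PASSIVE, TRIP_HOT, TRIP_CRITICAL };

/* Simplified stand-in for a trip point; not the kernel's own structure. */
struct trip_point {
    int temp;             /* trigger temperature */
    int hysteresis;       /* how far below temp we must fall before releasing */
    enum trip_type type;
};

/* Returns whether the trip counts as triggered after this sample.
 * was_triggered carries the previous result so hysteresis can apply. */
bool trip_triggered(const struct trip_point *tp, int temp, bool was_triggered)
{
    assert(tp != NULL);
    if (temp >= tp->temp)
        return true;
    if (was_triggered && temp > tp->temp - tp->hysteresis)
        return true;      /* still inside the hysteresis band */
    return false;
}
```

With a hysteresis of 0, as in the article's configuration, the second check never fires and the trip releases as soon as the temperature drops below temp.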
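cooling device
A cooling device is the driver that actually applies cooling to the system; it is the executor of thermal control. It maintains a cooling level, called the state; in general, a higher state means a stronger cooling demand. The cooling device acts according to the requested level.
A cooling device only performs cooling according to its state. Computing that state is the job of the thermal governor.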
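thermal software framework
To implement temperature control, we need at least two basic things: a device that obtains the temperature and a device that controls it. Along with these comes some policy for how to use the temperature-controlling devices.
How does the Linux thermal framework represent them? Reading the source shows that it abstracts each one:
Device that obtains the temperature: abstracted as a Thermal Zone Device;
Device that controls the temperature: abstracted as a Thermal Cooling Device;
Temperature-control policy: abstracted as a Thermal Governor.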
thermal zone
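The DTS configuration looks like this (the trip temperatures in this example appear to be in degrees Celsius, whereas the mainline device tree binding specifies millidegrees Celsius):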
thermal-zones {
    cpu_thermal_zone {
        polling-delay-passive = <1000>; // polling interval (ms) while above the threshold
        polling-delay = <2000>;         // polling interval (ms) while below the threshold
        thermal-sensors = <&ths_combine0 0>;
        trips {
            cpu_trip0: t0 {
                temperature = <70>;
                type = "passive";
                hysteresis = <0>;
            };
            cpu_trip1: t1 {
                temperature = <90>;
                type = "passive";
                hysteresis = <0>;
            };
            cpu_trip2: t2 {
                temperature = <100>;
                type = "passive";
                hysteresis = <0>;
            };
            cpu_trip3: t3 {
                temperature = <105>;
                type = "passive";
                hysteresis = <0>;
            };
            cpu_trip4: t4 {
                temperature = <110>;
                type = "passive";
                hysteresis = <0>;
            };
            crt_trip0: t5 {
                temperature = <115>;
                type = "critical";
                hysteresis = <0>;
            };
        };
        cooling-maps {
            bind0 {
                contribution = <0>;
                trip = <&cpu_trip0>;
                cooling-device = <&cpu_budget_cooling 1 1>;
            };
            bind1 {
                contribution = <0>;
                trip = <&cpu_trip1>;
                cooling-device = <&cpu_budget_cooling 2 2>;
            };
            bind2 {
                contribution = <0>;
                trip = <&cpu_trip2>;
                cooling-device = <&cpu_budget_cooling 3 3>;
            };
            bind3 {
                contribution = <0>;
                trip = <&cpu_trip3>;
                cooling-device = <&cpu_budget_cooling 4 5>;
            };
            bind4 {
                contribution = <0>;
                trip = <&cpu_trip4>;
                cooling-device = <&cpu_budget_cooling 6 6>;
            };
        };
    };
};
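To make the trip-to-state mapping in this configuration concrete, the sketch below copies the five passive trips and their cooling-device limits into a C table and looks up the entry for a given temperature. The struct and function names are hypothetical, and "the hottest crossed trip wins" is a simplification; the real decision is made by the governor for each binding.

```c
#include <assert.h>
#include <stddef.h>

/* One row per cooling-maps entry above: trip temperature plus the
 * <lower upper> state limits bound to cpu_budget_cooling. */
struct cooling_map {
    int trip_temp;
    unsigned long lower;
    unsigned long upper;
};

static const struct cooling_map cpu_maps[] = {
    {  70, 1, 1 },
    {  90, 2, 2 },
    { 100, 3, 3 },
    { 105, 4, 5 },
    { 110, 6, 6 },
};

/* Hypothetical lookup: return the map of the hottest trip that temp has
 * reached, or NULL when the zone is below every passive trip. */
const struct cooling_map *active_cooling_map(const struct cooling_map *maps,
                                             size_t count, int temp)
{
    const struct cooling_map *hit = NULL;

    assert(maps != NULL);
    for (size_t i = 0; i < count; i++) {
        if (temp >= maps[i].trip_temp)
            hit = &maps[i];
    }
    return hit;
}
```

For example, a reading of 106 falls in the cpu_trip3 entry, so the requested state would lie between 4 and 5.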
The kernel uses thermal_zone_device to abstract the device that obtains the temperature.
struct thermal_zone_device {
    int id;
    char type[THERMAL_NAME_LENGTH];
    struct device device;
    struct thermal_attr *trip_temp_attrs;
    struct thermal_attr *trip_type_attrs;
    struct thermal_attr *trip_hyst_attrs;
    void *devdata;
    int trips;              /* number of trip points */
    int passive_delay;      /* polling interval (ms) during passive cooling */
    int polling_delay;      /* polling interval (ms) otherwise */
    int temperature;        /* current temperature */
    int last_temperature;   /* previous temperature reading */
    int emul_temperature;   /* emulated temperature, if set */
    int passive;
    unsigned int forced_passive;
    struct thermal_zone_device_ops *ops;
    const struct thermal_zone_params *tzp;
    struct thermal_governor *governor;
    struct list_head thermal_instances;
    struct idr idr;
    struct mutex lock; /* protect thermal_instances list */
    struct list_head node;
    struct delayed_work poll_queue;
};
struct thermal_zone_device_ops {
    /* bind/unbind a cooling device to/from this zone */
    int (*bind) (struct thermal_zone_device *,
                 struct thermal_cooling_device *);
    int (*unbind) (struct thermal_zone_device *,
                   struct thermal_cooling_device *);
    /* read the current zone temperature */
    int (*get_temp) (struct thermal_zone_device *, int *);
    int (*get_mode) (struct thermal_zone_device *,
                     enum thermal_device_mode *);
    int (*set_mode) (struct thermal_zone_device *,
                     enum thermal_device_mode);
    /* per-trip accessors; the int argument is the trip index */
    int (*get_trip_type) (struct thermal_zone_device *, int,
                          enum thermal_trip_type *);
    int (*get_trip_temp) (struct thermal_zone_device *, int,
                          int *);
    int (*set_trip_temp) (struct thermal_zone_device *, int,
                          int);
    int (*get_trip_hyst) (struct thermal_zone_device *, int,
                          int *);
    int (*set_trip_hyst) (struct thermal_zone_device *, int,
                          int);
    int (*get_crit_temp) (struct thermal_zone_device *, int *);
    int (*set_emul_temp) (struct thermal_zone_device *, int);
    int (*get_trend) (struct thermal_zone_device *, int,
                      enum thermal_trend *);
    int (*notify) (struct thermal_zone_device *, int,
                   enum thermal_trip_type);
};
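The ops table is a set of callbacks that the sensor driver fills in and the core calls. The userspace sketch below imitates that split with a single get_temp callback. Everything here (struct zone, fake_get_temp, zone_get_temp, and the rule that a non-zero emul_temperature overrides the driver) is a simplified assumption for illustration, not the kernel's actual code.

```c
#include <assert.h>
#include <stddef.h>

struct zone;  /* simplified userspace stand-in, not the kernel type */

struct zone_ops {
    int (*get_temp)(struct zone *z, int *temp);
};

struct zone {
    const struct zone_ops *ops;
    int emul_temperature;  /* 0 means "no emulated value set" in this model */
    void *devdata;         /* driver-private data, here a fake reading */
};

/* Driver side: a fake sensor that reports the value stored in devdata. */
static int fake_get_temp(struct zone *z, int *temp)
{
    *temp = *(int *)z->devdata;
    return 0;
}

const struct zone_ops fake_ops = { .get_temp = fake_get_temp };

/* Core side: fetch the temperature through the ops table, letting an
 * emulated temperature override the driver when one is set. */
int zone_get_temp(struct zone *z, int *temp)
{
    assert(z != NULL && temp != NULL);
    if (z->ops == NULL || z->ops->get_temp == NULL)
        return -1;         /* no way to read this zone */
    if (z->emul_temperature != 0) {
        *temp = z->emul_temperature;
        return 0;
    }
    return z->ops->get_temp(z, temp);
}
```

The core never touches the hardware directly; swapping in a different ops table is all it takes to support another sensor.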
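thermal governor
A thermal governor is an abstraction of the cooling policy, similar in concept to a cpufreq governor.
The kernel already implements several policies: step_wise, user_space, power_allocator, and bang_bang. step_wise is the one we use most often.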
/**
 * struct thermal_governor - structure that holds thermal governor information
 * @name: name of the governor
 * @throttle: callback called for every trip point even if temperature is
 *            below the trip point temperature
 * @governor_list: node in thermal_governor_list (in thermal_core.c)
 */
struct thermal_governor {
    char name[THERMAL_NAME_LENGTH];
    /* policy callback */
    int (*throttle)(struct thermal_zone_device *tz, int trip);
    struct list_head governor_list;
};
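To give a feel for what a throttle callback decides, here is a deliberately simplified step-by-step policy in userspace C. It is loosely inspired by the idea behind step_wise (move the cooling state one level at a time, depending on the trend) but it is not the kernel's algorithm; the binding struct, the trend enum, and step_wise_throttle are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum trend { TREND_STABLE, TREND_RAISING, TREND_DROPPING };

/* Simplified state for one trip/cooling-device binding;
 * field names are illustrative. */
struct binding {
    unsigned long lower;   /* lowest state this binding may request */
    unsigned long upper;   /* highest state this binding may request */
    unsigned long target;  /* state currently requested */
};

/* Step the requested state by at most one level per call:
 * up while hot and still heating, down while cool and cooling off. */
void step_wise_throttle(struct binding *b, int temp, int trip_temp,
                        enum trend trend)
{
    assert(b != NULL && b->lower <= b->upper);

    if (temp >= trip_temp) {
        if (trend == TREND_RAISING && b->target < b->upper)
            b->target++;
        if (b->target < b->lower)
            b->target = b->lower;   /* enforce the minimum while tripped */
    } else if (trend == TREND_DROPPING && b->target > 0) {
        b->target--;                /* relax one step at a time */
    }
}
```

Stepping one level per poll trades reaction speed for stability: the state does not jump around when the temperature hovers near a trip point.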
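thermal cooling device
A Thermal Cooling Device is an abstraction of any device that can lower the temperature. Devices such as fans are easy to understand as cooling devices, but how can a CPU or GPU be one?
Cooling can be approached from two directions: speeding up heat dissipation, or reducing heat generation. Fans and heat sinks speed up dissipation, while CPU and GPU cooling devices lower the temperature by generating less heat.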
struct thermal_cooling_device {
    int id;
    char type[THERMAL_NAME_LENGTH];
    struct device device;
    struct device_node *np;
    void *devdata;
    /* cooling device operations */
    const struct thermal_cooling_device_ops *ops;
    bool updated; /* true if the cooling device does not need update */
    struct mutex lock; /* protect thermal_instances list */
    struct list_head thermal_instances;
    struct list_head node;
};
struct thermal_cooling_device_ops {
    int (*get_max_state) (struct thermal_cooling_device *, unsigned long *);
    int (*get_cur_state) (struct thermal_cooling_device *, unsigned long *);
    /* set the cooling level (state) */
    int (*set_cur_state) (struct thermal_cooling_device *, unsigned long);
};
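As an illustration of "cooling by producing less heat", the sketch below models a CPU cooling device whose state selects a cap on the maximum frequency. The three functions mirror the shape of get_max_state, get_cur_state, and set_cur_state, but the struct cpu_cooling type and the frequency table are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frequency table in kHz, highest first; state N selects
 * entry N as the CPU's maximum allowed frequency. */
static const unsigned int freq_table_khz[] = {
    1800000, 1400000, 1000000, 600000
};
#define MAX_STATE (sizeof(freq_table_khz) / sizeof(freq_table_khz[0]) - 1)

struct cpu_cooling {
    unsigned long cur_state;
    unsigned int max_freq_khz;  /* cap currently applied */
};

int cpu_get_max_state(struct cpu_cooling *cdev, unsigned long *state)
{
    assert(cdev != NULL && state != NULL);
    *state = MAX_STATE;
    return 0;
}

int cpu_get_cur_state(struct cpu_cooling *cdev, unsigned long *state)
{
    assert(cdev != NULL && state != NULL);
    *state = cdev->cur_state;
    return 0;
}

/* Apply a cooling level: a higher state means a lower frequency cap. */
int cpu_set_cur_state(struct cpu_cooling *cdev, unsigned long state)
{
    assert(cdev != NULL);
    if (state > MAX_STATE)
        return -1;              /* reject out-of-range requests */
    cdev->cur_state = state;
    cdev->max_freq_khz = freq_table_khz[state];
    return 0;
}
```

State 0 leaves the CPU unrestricted, and each higher state lowers the cap by one table entry, which matches the idea that a higher state means stronger cooling.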
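thermal core
The thermal core is the hub: it registers the governors, registers the thermal class, and registers thermal zones based on the device tree. It provides the registration functions for thermal zones and cooling devices, as well as the functions that bind a cooling device to a zone; one thermal zone can have several cooling devices. It also provides the core function thermal_zone_device_update, which serves as both the thermal interrupt handler and the polling function; the polling interval is adjusted according to the delay that applies to the current trip state.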
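The polling behaviour can be sketched as follows. This is a userspace model under stated assumptions, not the body of thermal_zone_device_update: zone_model and zone_update are hypothetical names, and the rule "use the passive delay whenever any passive trip is crossed" is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct zone_model {
    int passive_delay_ms;      /* polling interval while a passive trip is hit */
    int polling_delay_ms;      /* polling interval otherwise */
    const int *passive_trips;  /* passive trip temperatures */
    size_t num_trips;
    int temperature;
    int last_temperature;
};

/* One update pass: record the new sample, then report how long to wait
 * before the next poll. */
int zone_update(struct zone_model *z, int new_temp)
{
    bool passive = false;

    assert(z != NULL);
    z->last_temperature = z->temperature;
    z->temperature = new_temp;

    for (size_t i = 0; i < z->num_trips; i++) {
        if (new_temp >= z->passive_trips[i])
            passive = true;    /* a governor would be asked to throttle here */
    }
    return passive ? z->passive_delay_ms : z->polling_delay_ms;
}
```

With the values from the device tree example (1000 ms and 2000 ms), the zone would be polled twice as often once the first passive trip is crossed.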
[Figure: module flow chart]
[Figure: operation sequence diagram]
Original article: https://blog.csdn.net/zhouhuacai/article/details/78172267