ftrace總結

一. 簡介

ftrace 的作用是幫助開發人員瞭解 Linux 內核的運行時行爲，以便進行故障調試或性能分析

從名字上分析，ftrace = function trace，表示可以進行函數級 trace，最早 ftrace 就是一個 function tracer，僅能夠記錄內核的函數調用流程。如今 ftrace 已經成爲一個 framework，採用 plugin 的方式支持開發人員添加更多種類的 trace 功能。

接着介紹ftrace之前，我們需要先簡單介紹一下debugfs，因爲ftrace是需要debugfs支持的。爲什麼需要debugfs支持呢？因爲我們作爲user/debuger是工作在user space，也就是用戶態的，如果要獲取調試內核的函數，獲取到內核的函數log，就需要內核空間和用戶空間交互數據。DebugFS，顧名思義，是一種用於內核調試的虛擬文件系統，內核開發者通過debugfs和用戶空間交換數據。（類似的虛擬文件系統還有procfs和sysfs等。procfs，其目的是反映進程的狀態信息；而sysfs主要用於Linux設備模型。不論是procfs或是sysfs的接口應該保持相對穩定，因爲用戶態程序很可能會依賴它們。）。虛擬文件系統都並不實際存儲在硬盤上，而是Linux內核運行起來後才建立起來。

默認情況下，debugfs會被掛載在目錄/sys/kernel/debug之下，如果你的發行版裏沒有自動掛載，可以用如下命令手動完成

# mount -t debugfs none /your/debugfs/dir

但是一般情況下，我們常用的Linux發行版，比如ubuntu 18.04 20.04，會自動掛載，一般是自動掛載到：

/sys/kernel/debug，或者/sys/kernel/tracing

二. 簡明用法

簡單流程可以概括位：設置tracer類型 --> 設置tracer參數 --> 啓動ftrace --> 獲取結果

一般情況下，我們首先要通過文件 tracing_enable 把tracer開關先關閉，因爲如果開關沒有關閉的情況下，一旦先設置好了tracer類型，開始源源不斷的記錄log了，會讓我們的log文件變得很大，不好去分析。

# echo 0 > tracing_on

2.1 設置tracer類型

首先要知道系統支持的tracer類型，可以通過下面的方式查看：

# cd /sys/kernel/debug/tracing
# cat available_tracers
hwlat blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
# echo function > current_tracer                           //設置 tracer 類型爲 function

2.2 設置tracer參數

set_ftrace_filter 表示要跟蹤的函數，這裏我們只跟蹤 mutex_unlock 函數

# echo mutex_unlock > set_ftrace_filter

2.3 啓動ftrace

通過下面這個命令：

# echo 1 > tracing_on

# echo 0 > tracing_on          //運行一段時間之後可以關閉ftrace，log會保留在trace文件中

2.4 獲取trace結果

trace結果保存在trace文件中

# tracer: function
#
# entries-in-buffer/entries-written: 165/165   #P:8
#
#                                _-----=> irqs-off
#                               / _----=> need-resched
#                              | / _---=> hardirq/softirq
#                              || / _--=> preempt-depth
#                              ||| /     delay
#           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
#              | |         |   ||||      |         |
           <...>-202430  [001] .... 1212056.188933: mutex_unlock <-rb_simple_write
           <...>-202430  [001] .... 1212056.188953: mutex_unlock <-tracing_release_generic_tr
           <...>-202430  [001] .... 1212056.189069: mutex_unlock <-tty_ioctl
           <...>-202430  [001] .... 1212056.189070: mutex_unlock <-tty_do_resize
           <...>-202430  [001] .... 1212056.189107: mutex_unlock <-process_output_block
           <...>-202430  [001] .... 1212056.189108: mutex_unlock <-do_tty_write
           <...>-202558  [005] .... 1212056.189129: mutex_unlock <-flush_to_ldisc
           <...>-202318  [003] .... 1212056.189204: mutex_unlock <-n_tty_read
           <...>-162326  [000] .... 1212056.260795: mutex_unlock <-psi_avgs_work
           <...>-162326  [000] .... 1212056.260799: mutex_unlock <-psi_avgs_work
     kworker/5:0-45      [005] .... 1212056.348817: mutex_unlock <-intel_fbc_flush
     kworker/5:0-45      [005] .... 1212056.552841: mutex_unlock <-intel_fbc_flush
           <...>-176455  [006] .... 1212056.676822: mutex_unlock <-psi_avgs_work
           <...>-187521  [007] .... 1212056.708794: mutex_unlock <-psi_avgs_work
           <...>-174712  [004] .... 1212056.708810: mutex_unlock <-psi_avgs_work
     kworker/5:0-45      [005] .... 1212056.756854: mutex_unlock <-intel_fbc_flush
           <...>-202013  [003] .... 1212056.900793: mutex_unlock <-pci_pme_list_scan
           <...>-176455  [006] .... 1212056.932793: mutex_unlock <-psi_avgs_work
     kworker/5:0-45      [005] .... 1212056.960851: mutex_unlock <-intel_fbc_flush
           <...>-202318  [003] .... 1212057.073399: mutex_unlock <-n_tty_write
           <...>-202318  [003] .... 1212057.073401: mutex_unlock <-do_tty_write
           <...>-202558  [005] .... 1212057.073454: mutex_unlock <-flush_to_ldisc

從 trace 信息我們可以獲取很多重要信息：

進程信息，TASK-PID
進程運行的 CPU
執行函數時的系統狀態，包括中斷，搶佔等狀態信息
執行函數的時間輟

與function trace相似，下面貼兩個function_graph trace和 trace event的示例：

function_graph trace:

# cat available_tracers
hwlat blk mmiotrace function_graph wakeup_dl wakeup_rt wakeup function nop
# echo function_graph > current_tracer

# echo dev_attr_show > set_graph_function
# echo 1 > tracing_on

# cat trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
 0)               |  dev_attr_show() {
 0)               |    energy_uj_show() {
 0)               |      get_energy_counter [intel_rapl_common]() {
 0)               |        cpus_read_lock() {
 0)               |          _cond_resched() {
 0)   0.283 us    |            rcu_all_qs();
 0)   0.880 us    |          }
 0)   1.521 us    |        }
 0)               |        rapl_read_data_raw [intel_rapl_common]() {
 0)               |          rapl_msr_read_raw [intel_rapl_msr]() {
 0)               |            rdmsrl_safe_on_cpu() {
 0)               |              rdmsr_safe_on_cpu() {
 0)   0.271 us    |                __init_waitqueue_head();
 0)               |                smp_call_function_single_async() {
 0)               |                  generic_exec_single() {
 0)               |                    __rdmsr_safe_on_cpu() {
 0)               |                      complete() {
 0)   0.268 us    |                        _raw_spin_lock_irqsave();
 0)               |                        __wake_up_locked() {
 0)   0.298 us    |                          __wake_up_common();
 0)   0.836 us    |                        }
 0)   0.281 us    |                        _raw_spin_unlock_irqrestore();
 0)   2.476 us    |                      }
 0)   3.610 us    |                    }
 0)   4.315 us    |                  }
 0)   4.941 us    |                }
 0)               |                wait_for_completion() {
 0)               |                  _cond_resched() {
 0)   0.270 us    |                    rcu_all_qs();
 0)   0.800 us    |                  }
 0)   0.271 us    |                  _raw_spin_lock_irq();
 0)   1.920 us    |                }
 0)   8.231 us    |              }
 0)   8.824 us    |            }
 0)   9.413 us    |          }
 0) + 10.670 us   |        }
 0)   0.269 us    |        cpus_read_unlock();
 0) + 13.757 us   |      }
 0) + 15.976 us   |    }
 0) + 17.029 us   |  }

trace event:

trace event 就是利用 ftrace 框架，實現低性能損耗，對執行流無影響的一種信息輸出機制。相比 printk，trace event：

不開啓沒有性能損耗
開啓後不影響代碼流程
不需要重新編譯內核即可獲取 debug 信息
系統支持的所有 trace event 都位於 /sys/kernel/debug/tracing/events 目錄

# cd /sys/kernel/debug/tracing/events/sched/sched_switch
# echo 1 > enable

# echo 0 > trace
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 221/221 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
          <idle>-0     [003] d...  3408.075314: sched_switch: prev_comm=swapper/3 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=chrome next_pid=4737 next_prio=120
          chrome-4737  [003] d...  3408.075465: sched_switch: prev_comm=chrome prev_pid=4737 prev_prio=120 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120

三. ftrace使用技巧

如果我們不想 trace schedule 這個函數，也可以這樣做：

echo '!schedule' > set_ftrace_filter

或者

echo schedule > set_ftrace_notrace

Function filter 的設置也支持 *match，match* ，*match* 這樣的正則表達式，譬如我們可以 echo '*lock*' < set_ftrace_notrace 來禁止跟蹤帶 lock 的函數，set_ftrace_notrace 文件裏面這時候就會顯示：

cat set_ftrace_notrace
xen_pte_unlock
read_hv_clock_msr
read_hv_clock_tsc
update_persistent_clock
read_persistent_clock
set_task_blockstep
user_enable_block_step

trace-cmd

從上面的例子可以看到，其實使用 ftrace 並不是那麼方便，我們需要手動的去控制多個文件，但幸運的是，我們有 trace-cmd，作爲 ftrace 的前端，trace-cmd 能夠非常方便的讓我們進行 ftrace 的操作，譬如我們可以使用 record 命令來 trace sched 事件：

trace-cmd record -e sched

然後使用 report 命令來查看 trace 的數據：

trace-cmd report | head -10
version = 6
CPU 27 is empty
cpus=40
       trace-cmd-29557 [003] 16159201.985281: sched_waking:         comm=kworker/u82:3 pid=28507 prio=120 target_cpu=037
       trace-cmd-29557 [003] 16159201.985283: sched_migrate_task:   comm=kworker/u82:3 pid=28507 prio=120 orig_cpu=37 dest_cpu=5
       trace-cmd-29557 [003] 16159201.985285: sched_stat_sleep:     comm=kworker/u82:3 pid=28507 delay=137014529 [ns]
       trace-cmd-29585 [023] 16159201.985286: sched_stat_runtime:   comm=trace-cmd pid=29585 runtime=217630 [ns] vruntime=107586626253137 [ns]
       trace-cmd-29557 [003] 16159201.985286: sched_wake_idle_without_ipi: cpu=5
       trace-cmd-29595 [037] 16159201.985286: sched_stat_runtime:   comm=trace-cmd pid=29595 runtime=213227 [ns] vruntime=105364596011783 [ns]
       trace-cmd-29557 [003] 16159201.985287: sched_wakeup:         kworker/u82:3:28507 [120] success=1 CPU:005

當然，在 report 的時候也可以加入自己的 filter 來過濾數據，譬如下面，我們就過濾出 sched_wakeup 事件並且是 success 爲 1 的。

trace-cmd report -F 'sched/sched_wakeup: success == 1'  | head -10
version = 6
CPU 27 is empty
cpus=40
       trace-cmd-29557 [003] 16159201.985287: sched_wakeup:         kworker/u82:3:28507 [120] success=1 CPU:005
       trace-cmd-29557 [003] 16159201.985292: sched_wakeup:         trace-cmd:29561 [120] success=1 CPU:007
          <idle>-0     [032] 16159201.985294: sched_wakeup:         qps_json_driver:24669 [120] success=1 CPU:032
          <idle>-0     [032] 16159201.985298: sched_wakeup:         trace-cmd:29590 [120] success=1 CPU:026
          <idle>-0     [010] 16159201.985300: sched_wakeup:         trace-cmd:29563 [120] success=1 CPU:010
       trace-cmd-29597 [037] 16159201.985302: sched_wakeup:         trace-cmd:29595 [120] success=1 CPU:039
          <idle>-0     [010] 16159201.985302: sched_wakeup:         sshd:29395 [120] success=1 CPU:010

參考鏈接：

1. linux的debugfs: https://www.cnblogs.com/cl1024cl/archive/2013/05/14/6205698.html

2. 使用ftrace跟蹤系統問題 https://www.jianshu.com/p/99e127973abe

3. ftrace簡介 https://www.cnblogs.com/danxi/p/6417828.html

一. 簡介

二. 簡明用法

三. ftrace使用技巧

10分鐘搞定Mysql主從部署配置

如何使用 JS 判斷用戶是否處於活躍狀態

「Pygors跨平臺GUI」2：安裝MinGW-w64、MSYS2還是WSL2

[轉帖]

python列出centos7內存使用前50的進程信息

「Pygors跨平臺GUI」1：Pygors跨平臺GUI應用研究

一鍵自動化博客發佈工具,用過的人都說好(掘金篇)

lightdb數據庫超時相關控制參數

lightdb秒級增加列和刪除列（not null帶默認值）

Java ThreadPoolShutdown

GPU資源同步：coherent、總線協議、shareable Vulkan內存與資源管理

Vulkan內存與資源管理

Vulkan中的同步與緩存控制

OpenGL中的Frame Buffer、Depth Buffer、Color Buffer、Stencil Buffer

AIDL與HIDL

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結