1、第一节

1、perf top

关于perf top界面常用命令如下：

h：显示帮助，即可显示详细的帮助信息。
UP/DOWN/PGUP/PGDN/SPACE：上下和翻页。
a：annotate current symbol，注解当前符号。能够给出汇编语言的注解，给出各条指令的采样率。
d：过滤掉所有不属于此DSO的符号。非常方便查看同一类别的符号。
P：将当前信息保存到perf.hist.N中。

perf top常用选项有：

-e <event>：指明要分析的性能事件。
-p <pid>：Profile events on existing Process ID (comma sperated list). 仅分析目标进程及其创建的线程。
-k <path>：Path to vmlinux. Required for annotation functionality. 带符号表的内核映像所在的路径。
-K：不显示属于内核或模块的符号。
-U：不显示属于用户态程序的符号。
-d <n>：界面的刷新周期，默认为2s，因为perf top默认每2s从mmap的内存区域读取一次性能数据。
-g：得到函数的调用关系图。

perf top --call-graph [fractal]，路径概率为相对值，加起来为100%，调用顺序为从下往上。
perf top --call-graph graph，路径概率为绝对值，加起来为该函数的热度。

2、perf stat

perf stat用于运行指令，并分析其统计结果。虽然perf top也可以指定pid，但是必须先启动应用才能查看信息。
perf stat能完整统计应用整个生命周期的信息。
命令格式为：

perf stat [-e <EVENT> | --event=EVENT] [-a] <command>
perf stat [-e <EVENT> | --event=EVENT] [-a] — <command> [<options>]

下面简单看一下perf stat 的输出：

al@al-System-Product-Name:~/perf$ sudo perf stat
^C
  Performance counter stats for 'system wide':

      40904.820871      cpu-clock (msec)          #    5.000 CPUs utilized          
             18,132      context-switches          #    0.443 K/sec                  
              1,053      cpu-migrations            #    0.026 K/sec                  
              2,420      page-faults               #    0.059 K/sec                  
      3,958,376,712      cycles                    #    0.097 GHz                      (49.99%)
        574,598,403      stalled-cycles-frontend   #   14.52% frontend cycles idle     (49.98%)
      9,392,982,910      stalled-cycles-backend    #  237.29% backend cycles idle      (50.00%)
      1,653,185,883      instructions              #    0.42  insn per cycle         
                                                   #    5.68  stalled cycles per insn  (50.01%)
        237,061,366      branches                  #    5.795 M/sec                    (50.02%)
         18,333,168      branch-misses             #    7.73% of all branches          (50.00%)
       8.181521203 seconds time elapsed

输出解释如下：

cpu-clock：任务真正占用的处理器时间，单位为ms。CPUs utilized = task-clock / time elapsed，CPU的占用率。
context-switches：程序在运行过程中上下文的切换次数。
CPU-migrations：程序在运行过程中发生的处理器迁移次数。Linux为了维持多个处理器的负载均衡，在特定条件下会将某个任务从一个CPU迁移到另一个CPU。
CPU迁移和上下文切换：发生上下文切换不一定会发生CPU迁移，而发生CPU迁移时肯定会发生上下文切换。发生上下文切换有可能只是把上下文从当前CPU中换出，下一次调度器还是将进程安排在这个CPU上执行。
page-faults：缺页异常的次数。当应用程序请求的页面尚未建立、请求的页面不在内存中，或者请求的页面虽然在内存中，但物理地址和虚拟地址的映射关系尚未建立时，都会触发一次缺页异常。另外TLB不命中，页面访问权限不匹配等情况也会触发缺页异常。
cycles：消耗的处理器周期数。如果把被ls使用的cpu cycles看成是一个处理器的，那么它的主频为2.486GHz。可以用cycles / task-clock算出。
stalled-cycles-frontend：指令读取或解码的质量步骤，未能按理想状态发挥并行左右，发生停滞的时钟周期。
stalled-cycles-backend：指令执行步骤，发生停滞的时钟周期。
instructions：执行了多少条指令。IPC为平均每个cpu cycle执行了多少条指令。
branches：遇到的分支指令数。branch-misses是预测错误的分支指令数。

其他常用参数

    -a, --all-cpus        显示所有CPU上的统计信息
    -C, --cpu <cpu>       显示指定CPU的统计信息
    -c, --scale           scale/normalize counters
    -D, --delay <n>       ms to wait before starting measurement after program start
    -d, --detailed        detailed run - start a lot of events
    -e, --event <event>   event selector. use 'perf list' to list available events
    -G, --cgroup <name>   monitor event in cgroup name only
    -g, --group           put the counters into a counter group
    -I, --interval-print <n>
                          print counts at regular interval in ms (>= 10)
    -i, --no-inherit      child tasks do not inherit counters
    -n, --null            null run - dont start any counters
    -o, --output <file>   输出统计信息到文件
    -p, --pid <pid>       stat events on existing process id
    -r, --repeat <n>      repeat command and print average + stddev (max: 100, forever: 0)
    -S, --sync            call sync() before starting a run
    -t, --tid <tid>       stat events on existing thread id
...

示例
前面统计程序的示例，下面看一下统计CPU信息的示例：
执行sudo perf stat -C 0，统计CPU 0的信息。想要停止后，按下Ctrl+C终止。可以看到统计项一样，只是统计对象变了。

al@al-System-Product-Name:~/perf$ sudo perf stat -C 0
^C
  Performance counter stats for 'CPU(s) 0':

       2517.107315      cpu-clock (msec)          #    1.000 CPUs utilized          
              2,941      context-switches          #    0.001 M/sec                  
                109      cpu-migrations            #    0.043 K/sec                  
                 38      page-faults               #    0.015 K/sec                  
        644,094,340      cycles                    #    0.256 GHz                      (49.94%)
         70,425,076      stalled-cycles-frontend   #   10.93% frontend cycles idle     (49.94%)
        965,270,543      stalled-cycles-backend    #  149.86% backend cycles idle      (49.94%)
        623,284,864      instructions              #    0.97  insn per cycle         
                                                   #    1.55  stalled cycles per insn  (50.06%)
         65,658,190      branches                  #   26.085 M/sec                    (50.06%)
          3,276,104      branch-misses             #    4.99% of all branches          (50.06%)
       2.516996126 seconds time elapsed

如果需要统计更多的项，需要使用-e，如：

perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses ls

结果如下，关注的特殊项也纳入统计。

al@al-System-Product-Name:~/perf$ sudo perf stat -e task-clock,context-switches,cpu-migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses,L1-dcache-loads,L1-dcache-load-misses,LLC-loads,LLC-load-misses,dTLB-loads,dTLB-load-misses ls
Performance counter stats for 'ls':

          2.319422      task-clock (msec)         #    0.719 CPUs utilized          
                  0      context-switches          #    0.000 K/sec                  
                  0      cpu-migrations            #    0.000 K/sec                  
                 89      page-faults               #    0.038 M/sec                  
          2,142,386      cycles                    #    0.924 GHz                    
            659,800      stalled-cycles-frontend   #   30.80% frontend cycles idle   
            725,343      stalled-cycles-backend    #   33.86% backend cycles idle    
          1,344,518      instructions              #    0.63  insn per cycle         
                                                   #    0.54  stalled cycles per insn
      <not counted>      branches                                                    
      <not counted>      branch-misses                                               
      <not counted>      L1-dcache-loads                                             
      <not counted>      L1-dcache-load-misses                                       
      <not counted>      LLC-loads                                                   
      <not counted>      LLC-load-misses                                             
      <not counted>      dTLB-loads                                                  
      <not counted>      dTLB-load-misses                                           
       0.003227507 seconds time elapsed

3、perf record

sudo perf record -a -g ./test 会在当前目录生成perf.data文件。
sudo perf record sleep常用选项

-e record指定PMU事件
    --filter  event事件过滤器
-a  录取所有CPU的事件
-p  录取指定pid进程的事件
-o  指定录取保存数据的文件名
-g  使能函数调用图功能
-C 录取指定CPU的事件

4、perf report

perf report -i perf.data，可以看出main函数所占百分比，以及funcA和funcB分别所占百分比。

sudo perf report --call-graph none结果如下,后面结合perf timechart分析.
上图看上去比较乱，如果想只看test产生的信息：
sudo perf report --call-graph none -c test

2、第二节

全局性概况：

perf list查看当前系统支持的性能事件；
perf bench对系统性能进行摸底；
perf test对系统进行健全性测试；
perf stat对全局性能进行统计；

全局细节：

perf top可以实时查看当前系统进程函数占用率情况；
perf probe可以自定义动态事件；

特定功能分析：

perf kmem针对slab子系统性能分析；
perf kvm针对kvm虚拟化分析；
perf lock分析锁性能；
perf mem分析内存slab性能；
perf sched分析内核调度器性能；
perf trace记录系统调用轨迹；

最常用功能perf record，可以系统全局，也可以具体到某个进程，更甚具体到某一进程某一事件；可宏观，也可以很微观。

pref record记录信息到perf.data；
perf report生成报告；
perf diff对两个记录进行diff；
perf evlist列出记录的性能事件；
perf annotate显示perf.data函数代码；
perf archive将相关符号打包，方便在其它机器进行分析；
perf script将perf.data输出可读性文本；

可视化工具perf timechart

perf timechart record记录事件；
perf timechart生成output.svg文档；

3、第三节

1、perf命令简要介绍

性能调优时，我们通常需要分析查找到程序百分比高的热点代码片段，这便需要使用 perf record 记录单个函数级别的统计信息，并使用 perf report 来显示统计结果；
perf record
perf report
举例：
sudo perf record -e cpu-clock -g -p 2548
-g 选项是告诉perf record额外记录函数的调用关系
-e cpu-clock 指perf record监控的指标为cpu周期
-p 指定需要record的进程pid

程序运行完之后，perf record会生成一个名为perf.data的文件，如果之前已有，那么之前的perf.data文件会被覆盖
获得这个perf.data文件之后，就需要perf report工具进行查看
perf report -i perf.data
-i 指定要查看的文件
以诊断mysql为例,report结果：
$sudo perf report -i perf.data

2、使用火焰图展示结果

1、Flame Graph项目位于GitHub上：https://github.com/brendangregg/FlameGraph
2、可以用git将其clone下来：git clone https://github.com/brendangregg/FlameGraph.git

我们以perf为例，看一下flamegraph的使用方法：
1、第一步
$sudo perf record -e cpu-clock -g -p 28591
Ctrl+c结束执行后，在当前目录下会生成采样数据perf.data.
2、第二步
用perf script工具对perf.data进行解析
perf script -i perf.data &> perf.unfold
3、第三步
将perf.unfold中的符号进行折叠：
#./stackcollapse-perf.pl perf.unfold &> perf.folded
4、最后生成svg图：
./flamegraph.pl perf.folded > perf.svg

Linux 调试辅助工具之perf 火焰图

1、第一节

1、perf top

2、perf stat

3、perf record

4、perf report

2、第二节

3、第三节

1、perf命令简要介绍

2、使用火焰图展示结果

钉钉打卡速度慢

使用neovim打造go ide(支持代码跳转, 代码补全, 实时语法检查)

Nginx R31 doc 官方文档-01-nginx 如何安装

Python 潮流周刊#51：用 Python 绘制美观的图表

cs01 CSS Syntax

Qt/C++音视频开发74-合并标签图形/生成yolo运算结果图形/文字和图形合并成一个/水印滤镜

挑战程序设计竞赛 2.2章习题 POJ - 3617 Best Cow Line 贪心

字节面试：MySQL什么时候锁表？如何防止锁表？

.NET8连接SQL SERVER 2008 R2 报：证书链是由不受信任的颁发机构颁发的

golang开发环境搭建(win10)

基於VISA標準的儀器驅動器設計

vsftp 移植到arm

GDB 交差編譯

Linux C 程序執行shell命令並獲取返回值結果的方法

ffmpeg簡介及編碼支持

https://yachay.unat.edu.pe/blog/index.php?comment_area=format_blog&comment_component=blog&comment_co

linux以太網驅動總結