In my previous post I said I had run into a problem with perf_event_open: the counter values I got were inaccurate.
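Here is my guess at the cause. A performance counter is a piece of hardware inside a single core, so it can only count the program that is running on that core. On a multi-core processor, the scheduler may move the program I want to measure onto other cores. That assignment is dynamic, so every run gives a different result. This may be one of the causes.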
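One way to test this hypothesis is to check whether the thread actually changes cores while it runs. The sketch below is only an illustration and was not part of the original experiment. It polls glibc's sched_getcpu() in a loop and counts how often the reported CPU changes. The function name count_cpu_migrations and the iteration count are hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Spin for `iters` iterations, sampling sched_getcpu() each time, and return
 * how many times the calling thread was seen on a different CPU than in the
 * previous sample, or -1 if sched_getcpu() fails. */
long count_cpu_migrations(long iters)
{
    int prev = sched_getcpu();
    if (prev < 0)
        return -1;
    long migrations = 0;
    for (long i = 0; i < iters; i++) {
        int cur = sched_getcpu();
        if (cur < 0)
            return -1;
        if (cur != prev) {   /* the scheduler moved us between samples */
            migrations++;
            prev = cur;
        }
    }
    return migrations;
}
```

Polling only sees the CPU at the moments it samples. A nonzero result therefore shows that migration happens, but the count is only a lower bound.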
Quite by chance, I stumbled onto a fairly accurate approach. The official manual (the perf_event_open(2) man page) explains:
The pid and cpu arguments allow specifying which process and CPU to monitor:

pid == 0 and cpu == -1
    This measures the calling process/thread on any CPU.

pid == 0 and cpu >= 0
    This measures the calling process/thread only when running on the specified CPU.

pid > 0 and cpu == -1
    This measures the specified process/thread on any CPU.

pid > 0 and cpu >= 0
    This measures the specified process/thread only when running on the specified CPU.

pid == -1 and cpu >= 0
    This measures all processes/threads on the specified CPU. This requires CAP_SYS_ADMIN capability or a /proc/sys/kernel/perf_event_paranoid value of less than 1.

pid == -1 and cpu == -1
    This setting is invalid and will return an error.

When pid is greater than zero, permission to perform this system call is governed by a ptrace access mode PTRACE_MODE_READ_REALCREDS check; see ptrace(2).
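glibc has no wrapper for perf_event_open, so it is normally called through syscall(2), which is also how the man page's own example does it. The sketch below illustrates the first case above (pid == 0, cpu == -1). It counts the user-space instructions retired by the calling thread while a workload runs. The names count_calling_thread and demo_workload are hypothetical. exclude_kernel is set on the assumption that perf_event_paranoid may forbid kernel-mode counting. The function returns -1 when the counter cannot be opened, for example when there is no PMU, when a container blocks the syscall, or when the paranoid level is too high.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical workload used only for illustration. */
void demo_workload(void)
{
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        sum += i;
}

/* Count user-space instructions retired by the calling thread while work()
 * runs, using pid == 0 and cpu == -1 ("calling thread on any CPU").
 * Returns -1 if the counter cannot be opened or read. */
long long count_calling_thread(void (*work)(void))
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;       /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1; /* user space only */
    attr.exclude_hv = 1;

    /* No glibc wrapper: invoke the raw system call. */
    int fd = (int)syscall(SYS_perf_event_open, &attr, (pid_t)0, -1, -1, 0UL);
    if (fd == -1)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count = 0;
    ssize_t n = read(fd, &count, sizeof count);
    close(fd);
    return n == (ssize_t)sizeof count ? (long long)count : -1;
}
```

Calling count_calling_thread(demo_workload) returns the instruction count, or -1 on systems where hardware counters are unavailable.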
I used pid == 0 with cpu >= 0 and set up several such counters in parallel, one per CPU. With this setup I got fairly accurate values.
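The per-CPU approach can be sketched as follows. This illustrates the idea under stated assumptions and is not the author's code. It opens one instruction counter per configured CPU with pid == 0 and cpu == i, enables them all, runs the workload, and then adds up the per-CPU readings. The post does not say how the values were combined, so summing is an assumption here. The names count_per_cpu, sum_counts, and busy_loop are hypothetical. CPUs whose counter fails to open, such as offline ones, are skipped.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Add up the per-CPU partial counts. */
uint64_t sum_counts(const uint64_t *counts, int n)
{
    uint64_t total = 0;
    for (int i = 0; i < n; i++)
        total += counts[i];
    return total;
}

/* Hypothetical workload used only for illustration. */
void busy_loop(void)
{
    volatile unsigned long acc = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        acc += i;
}

/* One counter per CPU with pid == 0 and cpu == i: each counts the calling
 * thread's user-space instructions only while it runs on CPU i.
 * Returns the sum over all CPUs, or -1 if no counter could be opened. */
long long count_per_cpu(void (*work)(void))
{
    long ncpu = sysconf(_SC_NPROCESSORS_CONF);
    if (ncpu < 1)
        return -1;
    int *fds = malloc((size_t)ncpu * sizeof *fds);
    uint64_t *counts = calloc((size_t)ncpu, sizeof *counts);
    if (fds == NULL || counts == NULL) {
        free(fds);
        free(counts);
        return -1;
    }

    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;       /* start stopped; enabled together below */
    attr.exclude_kernel = 1; /* user space only */
    attr.exclude_hv = 1;

    int opened = 0;
    for (long cpu = 0; cpu < ncpu; cpu++) {
        /* A failed open (e.g. an offline CPU) leaves fds[cpu] == -1. */
        fds[cpu] = (int)syscall(SYS_perf_event_open, &attr, (pid_t)0,
                                (int)cpu, -1, 0UL);
        if (fds[cpu] != -1)
            opened++;
    }

    for (long cpu = 0; cpu < ncpu; cpu++)
        if (fds[cpu] != -1) {
            ioctl(fds[cpu], PERF_EVENT_IOC_RESET, 0);
            ioctl(fds[cpu], PERF_EVENT_IOC_ENABLE, 0);
        }
    work();
    for (long cpu = 0; cpu < ncpu; cpu++)
        if (fds[cpu] != -1)
            ioctl(fds[cpu], PERF_EVENT_IOC_DISABLE, 0);

    for (long cpu = 0; cpu < ncpu; cpu++) {
        if (fds[cpu] == -1)
            continue;
        if (read(fds[cpu], &counts[cpu], sizeof counts[cpu])
            != (ssize_t)sizeof counts[cpu])
            counts[cpu] = 0; /* treat a failed read as no contribution */
        close(fds[cpu]);
    }

    uint64_t total = sum_counts(counts, (int)ncpu);
    free(fds);
    free(counts);
    return opened > 0 ? (long long)total : -1;
}
```

Each event counts only while the thread is on its own CPU. The per-CPU values therefore partition the thread's activity across cores, which is why this sketch adds them together rather than averaging them.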