perf_event_open: problems encountered and some thoughts

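A previous post covered how to use perf_event_open to monitor performance counters. I also found some examples that monitor several counters at once, some by opening multiple separate counters and some by creating an event group, for example https://stackoverflow.com/questions/42088515/perf-event-open-how-to-monitoring-multiple-events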

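They all share one problem, though: when I set type to PERF_TYPE_HARDWARE and config to PERF_COUNT_HW_CPU_CYCLES and PERF_COUNT_HW_INSTRUCTIONS (whether as two separate counters or as a group), the data I get back is wrong. The errors vary, for example: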

Everything is 0:

cpu cycles: 0
instructin: 0
page faults: 0

The measurement is inaccurate:

cpu cycles: 763916370

By contrast, the perf stat command reports 911,975,116 cycles, which is quite a large gap.
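For reference, here is a minimal sketch of the group approach described above, assuming Linux with <linux/perf_event.h> available. The helper names (fill_hw_attr, open_cycles_instr_group, start_group, stop_and_read_group) are hypothetical and are not taken from the Stack Overflow answer. It does not explain the gap on its own. Note, though, that a counter opened with pid 0 covers only the calling thread between enable and disable, while perf stat covers the whole process, so the two numbers are not necessarily measuring the same thing. That is a guess, not something verified in this post.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Layout returned by read() with PERF_FORMAT_GROUP and two events. */
struct group_counts {
    uint64_t nr;        /* number of events in the group */
    uint64_t values[2]; /* leader first, then siblings in open order */
};

/* Hypothetical helper: fill attr for one generic hardware event. */
void fill_hw_attr(struct perf_event_attr *attr, uint64_t config, int is_leader)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = config;
    attr->disabled = is_leader ? 1 : 0; /* only the leader starts disabled */
    attr->read_format = PERF_FORMAT_GROUP;
    /* exclude_kernel and exclude_hv stay 0: count in every mode if permitted. */
}

/* glibc has no wrapper for perf_event_open, so go through syscall(2). */
int perf_event_open_fd(struct perf_event_attr *attr, pid_t pid, int cpu,
                       int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Open cycles (leader) plus instructions for the calling thread on any CPU.
 * Returns the leader fd, or -1 on failure (errno is set by the kernel). */
int open_cycles_instr_group(int *instr_fd)
{
    struct perf_event_attr attr;

    fill_hw_attr(&attr, PERF_COUNT_HW_CPU_CYCLES, 1);
    int leader = perf_event_open_fd(&attr, 0, -1, -1, 0);
    if (leader < 0)
        return -1;

    fill_hw_attr(&attr, PERF_COUNT_HW_INSTRUCTIONS, 0);
    *instr_fd = perf_event_open_fd(&attr, 0, -1, leader, 0);
    if (*instr_fd < 0) {
        close(leader);
        return -1;
    }
    return leader;
}

/* Zero and enable every counter in the group. */
int start_group(int leader)
{
    if (ioctl(leader, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP) < 0)
        return -1;
    return ioctl(leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
}

/* Disable the group and read all counts in one call. */
int stop_and_read_group(int leader, struct group_counts *out)
{
    if (ioctl(leader, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP) < 0)
        return -1;
    return read(leader, out, sizeof(*out)) == (ssize_t)sizeof(*out) ? 0 : -1;
}
```

Call open_cycles_instr_group once, start_group just before the code under test and stop_and_read_group right after it; values[0] then holds cycles and values[1] instructions. Checking every return value matters here, because a failed open or read leaves the output untouched and can look like a count of 0.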

I searched Google and GitHub and finally found a project that does things differently, and in my tests its results are accurate. The link is: https://github.com/castl/easyperf

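The difference is that it sets type to PERF_TYPE_RAW. According to the perf_event_open documentation, that means you have to look up the event encodings in the processor manual yourself. Let me walk through easyperf from GitHub.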
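To make the PERF_TYPE_RAW idea concrete, here is a minimal sketch of opening a single raw counter. It is not easyperf's actual implementation, which I have not reproduced here; the helper names (fill_raw_attr, open_raw_counter, start_counter, stop_counter) are hypothetical, and the raw code passed in is assumed to come from the manual of the CPU you are running on.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: fill attr for a raw, CPU-specific event code. */
void fill_raw_attr(struct perf_event_attr *attr, uint64_t raw_code)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_RAW; /* config is interpreted by the CPU's PMU driver */
    attr->config = raw_code;    /* e.g. 0x3C, the EV_CYCLES value from easyperf.h */
    attr->disabled = 1;         /* start stopped; enabled later via ioctl */
}

/* Open the counter for the calling thread on any CPU.
 * Returns a file descriptor, or -1 with errno set. */
int open_raw_counter(uint64_t raw_code)
{
    struct perf_event_attr attr;

    fill_raw_attr(&attr, raw_code);
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Zero the counter and start counting. */
int start_counter(int fd)
{
    if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) < 0)
        return -1;
    return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
}

/* Stop counting and read the value; with read_format 0 the kernel
 * returns a single 64-bit count. */
int stop_counter(int fd, uint64_t *count)
{
    if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) < 0)
        return -1;
    return read(fd, count, sizeof(*count)) == (ssize_t)sizeof(*count) ? 0 : -1;
}
```

The only change from the PERF_TYPE_HARDWARE case is the type field and the meaning of config, so switching an existing program over for a comparison should be a small edit.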

First, the header file:

#ifndef __EASYPERF_H__
#define __EASYPERF_H__

#ifdef __cplusplus
extern "C" {
#endif

#include <stdint.h>

// Extra options for each event
#define PERFMON_EVENTSEL_OS     (1 << 17)
#define PERFMON_EVENTSEL_USR    (1 << 16)

int perf_init(unsigned int num_ctrs, ...);
void perf_close();

uint64_t perf_read(unsigned int ctr);
void perf_read_all(uint64_t* vals);

// microarch neutral
#define EV_CYCLES          (0x3C | (0x0 << 8))
#define EV_REF_CYCLES      (0x3C | (0x1 << 8))
#define EV_INSTR           (0xC0 | (0x0 << 8))
#define EV_BRANCH          (0xC4 | (0x1 << 8))
#define EV_BRANCH_MISS     (0xC5 | (0x1 << 8))

// microarch specific
#define I7_L3_REFS      (0x2e | (0x4f << 8))
#define I7_L3_MISS      (0x2e | (0x41 << 8))

#define I7_L2_REFS      (0x24 | (0xff << 8))
#define I7_L2_MISS      (0x24 | (0xaa << 8))

#define I7_ICACHE_HITS  (0x80 | (0x01 << 8))
#define I7_ICACHE_MISS  (0x80 | (0x02 << 8))

#define I7_DL1_REFS     (0x43 | (0x01 << 8))

#define I7_LOADS        (0x0b | (0x01 << 8))
#define I7_STORES       (0x0b | (0x02 << 8))

#define I7_L2_DTLB_MISS (0x49 | (0x01 << 8))
#define I7_L2_ITLB_MISS (0x85 | (0x01 << 8))

#define I7_IO_TXNS      (0x6c | (0x01 << 8))
#define I7_DRAM_REFS    (0x0f | (0x20 << 8))

#ifdef __cplusplus
}
#endif

#endif // __EASYPERF_H__

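As you can see, the header mainly defines how the performance event select register is programmed. I will go through it alongside the register diagram.

[Figure: layout of the performance event select register]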

The two lines below are bit 16 and bit 17, which select whether events are counted in user mode or in operating system mode:

#define PERFMON_EVENTSEL_OS     (1 << 17)
#define PERFMON_EVENTSEL_USR    (1 << 16)
  • USR (user mode) flag (bit 16) — Specifies that the selected microarchitectural condition is counted when the logical processor is operating at privilege levels 1, 2 or 3. This flag can be used with the OS flag.
  • OS (operating system mode) flag (bit 17) — Specifies that the selected microarchitectural condition is counted when the logical processor is operating at privilege level 0. This flag can be used with the USR flag.

The other definitions occupy the low 16 bits, for example:

#define EV_CYCLES          (0x3C | (0x0 << 8))
#define EV_REF_CYCLES      (0x3C | (0x1 << 8))

The detailed explanation is:

  • Unit mask (UMASK) field (bits 8 through 15) — These bits qualify the condition that the selected event logic unit detects. Valid UMASK values for each event logic unit are specific to the unit. For each architectural performance event, its corresponding UMASK value defines a specific microarchitectural condition. A pre-defined microarchitectural condition associated with an architectural event may not be applicable to a given processor. The processor then reports only a subset of pre-defined architectural events. Pre-defined architectural events are listed in Table 18-1; support for pre-defined architectural events is enumerated using CPUID.0AH:EBX. Architectural performance events available in the initial implementation are listed in Table 19-1.
  • Event select field (bits 0 through 7) — Selects the event logic unit used to detect microarchitectural conditions (see Table 18-1, for a list of architectural events and their 8-bit codes). The set of values for this field is defined architecturally; each value corresponds to an event logic unit for use with an architectural performance event. The number of architectural events is queried using CPUID.0AH:EAX. A processor may support only a subset of pre-defined values.
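The two fields above can be packed and unpacked with a few small helpers. This is only an illustrative sketch of the bit layout quoted above (event select in bits 0 through 7, UMASK in bits 8 through 15); the function names are hypothetical and do not appear in easyperf.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Pack an 8-bit event select and an 8-bit unit mask into one code,
 * the same way the EV_* and I7_* macros in easyperf.h do. */
uint32_t make_event_code(uint32_t event_select, uint32_t umask)
{
    assert(event_select <= 0xFF && umask <= 0xFF);
    return event_select | (umask << 8);
}

/* Bits 0 through 7: event select. */
uint8_t event_select_of(uint32_t code)
{
    return (uint8_t)(code & 0xFF);
}

/* Bits 8 through 15: unit mask. */
uint8_t umask_of(uint32_t code)
{
    return (uint8_t)((code >> 8) & 0xFF);
}

/* Print a code split into its two fields, handy when checking it
 * against a table in the processor manual. */
void print_event_code(const char *name, uint32_t code)
{
    printf("%s: code=0x%04x event=0x%02x umask=0x%02x\n",
           name, (unsigned)code,
           (unsigned)event_select_of(code), (unsigned)umask_of(code));
}
```

For instance, make_event_code(0x2e, 0x4f) gives 0x4f2e, the same value as the I7_L3_REFS macro in the header.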

The umask and event select values can be looked up in the corresponding tables of the processor manual; there are many such tables.

However, my main target is arm64, so I still need to dig up the ARM technical manual. Onward.
