This is a list of all performance counter event types for ARM V7. For details, see the ARM11 Technical Reference Manual.
Name | Description | Counters usable | Unit mask options |
SW_INCR | Software increment of PMNC registers | 1, 2, 3, 4, 5, 6 | |
L1I_CACHE_REFILL | Level 1 instruction cache refill | 1, 2, 3, 4, 5, 6 | |
L1I_TLB_REFILL | Level 1 instruction TLB refill | 1, 2, 3, 4, 5, 6 | |
L1D_CACHE_REFILL | Level 1 data cache refill | 1, 2, 3, 4, 5, 6 | |
L1D_CACHE | Level 1 data cache access | 1, 2, 3, 4, 5, 6 | |
L1D_TLB_REFILL | Level 1 data TLB refill | 1, 2, 3, 4, 5, 6 | |
LD_RETIRED | Load instruction architecturally executed, condition code pass | 1, 2, 3, 4, 5, 6 | |
ST_RETIRED | Store instruction architecturally executed, condition code pass | 1, 2, 3, 4, 5, 6 | |
INST_RETIRED | Instruction architecturally executed | 1, 2, 3, 4, 5, 6 | |
EXC_TAKEN | Exception taken | 1, 2, 3, 4, 5, 6 | |
EXC_RETURN | Exception return instruction architecturally executed | 1, 2, 3, 4, 5, 6 | |
CID_WRITE_RETIRED | Write to CONTEXTIDR register architecturally executed | 1, 2, 3, 4, 5, 6 | |
PC_WRITE_RETIRED | Software change of the PC architecturally executed, condition code pass | 1, 2, 3, 4, 5, 6 | |
BR_IMMED_RETIRED | Immediate branch instruction architecturally executed | 1, 2, 3, 4, 5, 6 | |
BR_RETURN_RETIRED | Procedure return instruction architecturally executed, condition code pass | 1, 2, 3, 4, 5, 6 | |
UNALIGNED_LDST_RETIRED | Unaligned load or store instruction architecturally executed, condition code pass | 1, 2, 3, 4, 5, 6 | |
BR_MIS_PRED | Mispredicted or not predicted branch speculatively executed | 1, 2, 3, 4, 5, 6 | |
BR_PRED | Predictable branch speculatively executed | 1, 2, 3, 4, 5, 6 | |
MEM_ACCESS | Data memory access | 1, 2, 3, 4, 5, 6 | |
L1I_CACHE | Level 1 instruction cache access | 1, 2, 3, 4, 5, 6 | |
L1D_CACHE_WB | Level 1 data cache write-back | 1, 2, 3, 4, 5, 6 | |
L2D_CACHE | Level 2 data cache access | 1, 2, 3, 4, 5, 6 | |
L2D_CACHE_REFILL | Level 2 data cache refill | 1, 2, 3, 4, 5, 6 | |
L2D_CACHE_WB | Level 2 data cache write-back | 1, 2, 3, 4, 5, 6 | |
BUS_ACCESS | Bus access | 1, 2, 3, 4, 5, 6 | |
MEMORY_ERROR | Local memory error | 1, 2, 3, 4, 5, 6 | |
INST_SPEC | Instruction speculatively executed | 1, 2, 3, 4, 5, 6 | |
TTBR_WRITE_RETIRED | Write to TTBR architecturally executed, condition code pass | 1, 2, 3, 4, 5, 6 | |
BUS_CYCLES | Bus cycle | 1, 2, 3, 4, 5, 6 | |
CPU_CYCLES | CPU cycle | 0 | |
WRITE_BUFFER_FULL | Any write buffer full cycle | 1, 2, 3, 4 | |
L2_STORE_MERGED | Any store that is merged in L2 cache | 1, 2, 3, 4 | |
L2_STORE_BUFF | Any bufferable store from load/store to L2 cache | 1, 2, 3, 4 | |
L2_ACCESS | Any access to L2 cache | 1, 2, 3, 4 | |
L2_CACH_MISS | Any cacheable miss in L2 cache | 1, 2, 3, 4 | |
AXI_READ_CYCLES | Number of cycles for an active AXI read | 1, 2, 3, 4 | |
AXI_WRITE_CYCLES | Number of cycles for an active AXI write | 1, 2, 3, 4 | |
MEMORY_REPLAY | Any replay event in the memory subsystem | 1, 2, 3, 4 | |
UNALIGNED_ACCESS_REPLAY | Unaligned access that causes a replay | 1, 2, 3, 4 | |
L1_DATA_MISS | L1 data cache miss as a result of the hashing algorithm | 1, 2, 3, 4 | |
L1_INST_MISS | L1 instruction cache miss as a result of the hashing algorithm | 1, 2, 3, 4 | |
L1_DATA_COLORING | L1 data access in which a page coloring alias occurs | 1, 2, 3, 4 | |
L1_NEON_DATA | NEON data access that hits L1 cache | 1, 2, 3, 4 | |
L1_NEON_CACH_DATA | NEON cacheable data access that hits L1 cache | 1, 2, 3, 4 | |
L2_NEON | L2 access as a result of NEON memory access | 1, 2, 3, 4 | |
L2_NEON_HIT | Any NEON hit in L2 cache | 1, 2, 3, 4 | |
L1_INST | Any L1 instruction cache access, excluding CP15 cache accesses | 1, 2, 3, 4 | |
PC_RETURN_MIS_PRED | Return stack misprediction at return stack pop (incorrect target address) | 1, 2, 3, 4 | |
PC_BRANCH_FAILED | Branch misprediction | 1, 2, 3, 4 |
PC_BRANCH_TAKEN | Any predicted branch that is taken | 1, 2, 3, 4 | |
PC_BRANCH_EXECUTED | Any taken branch that is executed | 1, 2, 3, 4 | |
OP_EXECUTED | Number of operations executed (in instruction or multi-cycle instruction) | 1, 2, 3, 4 |
CYCLES_INST_STALL | Cycles where no instruction available | 1, 2, 3, 4 | |
CYCLES_INST | Number of instructions issued in a cycle | 1, 2, 3, 4 | |
CYCLES_NEON_DATA_STALL | Number of cycles the processor waits on MRC data from NEON | 1, 2, 3, 4 | |
CYCLES_NEON_INST_STALL | Number of cycles the processor waits on NEON instruction queue or NEON load queue | 1, 2, 3, 4 | |
NEON_CYCLES | Number of cycles NEON and integer processors are not idle | 1, 2, 3, 4 | |
PMU0_EVENTS | Number of events from external input source PMUEXTIN[0] | 1, 2, 3, 4 | |
PMU1_EVENTS | Number of events from external input source PMUEXTIN[1] | 1, 2, 3, 4 | |
PMU_EVENTS | Number of events from both external input sources PMUEXTIN[0] and PMUEXTIN[1] | 1, 2, 3, 4 |
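The "Counters usable" column limits which events can be measured together: CPU_CYCLES is listed only for counter 0, some events for counters 1 to 6, and others only for counters 1 to 4. The following is a minimal sketch of checking whether a set of events fits, assuming each counter can count only one event at a time (an assumption, not stated in the table). The names `EVENT_COUNTERS` and `assign_counters` are hypothetical and not part of any profiling tool; the counter sets are copied from a few rows of the table above.

```python
# Counter constraints copied from a few rows of the event table.
# EVENT_COUNTERS and assign_counters are hypothetical names for illustration.
WIDE = {1, 2, 3, 4, 5, 6}
NARROW = {1, 2, 3, 4}

EVENT_COUNTERS = {
    "CPU_CYCLES": {0},
    "INST_RETIRED": WIDE,
    "L1D_CACHE": WIDE,
    "L1D_CACHE_REFILL": WIDE,
    "L2_ACCESS": NARROW,
    "L2_CACH_MISS": NARROW,
    "AXI_READ_CYCLES": NARROW,
    "AXI_WRITE_CYCLES": NARROW,
    "NEON_CYCLES": NARROW,
}


def assign_counters(events):
    """Map each distinct event to its own counter, or return None if impossible.

    Assumes one event per counter (an assumption made for this sketch).
    """
    # Place the most constrained events first to keep backtracking short.
    order = sorted(events, key=lambda name: len(EVENT_COUNTERS[name]))
    assignment = {}
    used = set()

    def place(index):
        if index == len(order):
            return True
        name = order[index]
        # Try the free counters this event may use, lowest number first.
        for counter in sorted(EVENT_COUNTERS[name] - used):
            used.add(counter)
            assignment[name] = counter
            if place(index + 1):
                return True
            used.remove(counter)
            del assignment[name]
        return False

    return assignment if place(0) else None


if __name__ == "__main__":
    print(assign_counters(["CPU_CYCLES", "L2_ACCESS", "L1D_CACHE"]))
    print(assign_counters(["L2_ACCESS", "L2_CACH_MISS", "AXI_READ_CYCLES",
                           "AXI_WRITE_CYCLES", "NEON_CYCLES"]))
```

Under that assumption, the first request fits (CPU_CYCLES takes counter 0 and the other two events take separate counters), while the second asks for five events that are all restricted to counters 1 to 4 and so cannot be placed.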
Measurement is a crucial component of performance improvement since reasoning and intuition are fallible guides and must be supplemented with tools like timing commands and profilers.
- The Practice of Programming, Brian W. Kernighan and Rob Pike