This is a list of all ARMv7 Cortex-A9 performance counter event types. For details, see the Cortex-A9 Technical Reference Manual (ARM DDI 0388E, revision r2p0).
Name | Description | Counters usable | Unit mask options |
--- | --- | --- | --- |
SW_INCR | Software increment of PMNC registers | 1, 2, 3, 4, 5, 6 | |
L1I_CACHE_REFILL | Level 1 instruction cache refill | 1, 2, 3, 4, 5, 6 | |
L1I_TLB_REFILL | Level 1 instruction TLB refill | 1, 2, 3, 4, 5, 6 | |
L1D_CACHE_REFILL | Level 1 data cache refill | 1, 2, 3, 4, 5, 6 | |
L1D_CACHE | Level 1 data cache access | 1, 2, 3, 4, 5, 6 | |
L1D_TLB_REFILL | Level 1 data TLB refill | 1, 2, 3, 4, 5, 6 | |
LD_RETIRED | Load instruction architecturally executed, condition code pass | 1, 2, 3, 4, 5, 6 | |
ST_RETIRED | Store instruction architecturally executed, condition code pass | 1, 2, 3, 4, 5, 6 | |
INST_RETIRED | Instruction architecturally executed | 1, 2, 3, 4, 5, 6 | |
EXC_TAKEN | Exception taken | 1, 2, 3, 4, 5, 6 | |
EXC_RETURN | Exception return instruction architecturally executed | 1, 2, 3, 4, 5, 6 | |
CID_WRITE_RETIRED | Write to CONTEXTIDR register architecturally executed | 1, 2, 3, 4, 5, 6 | |
PC_WRITE_RETIRED | Software change of the PC architecturally executed, condition code pass | 1, 2, 3, 4, 5, 6 | |
BR_IMMED_RETIRED | Immediate branch instruction architecturally executed | 1, 2, 3, 4, 5, 6 | |
BR_RETURN_RETIRED | Procedure return instruction architecturally executed, condition code pass | 1, 2, 3, 4, 5, 6 | |
UNALIGNED_LDST_RETIRED | Unaligned load or store instruction architecturally executed, condition code pass | 1, 2, 3, 4, 5, 6 | |
BR_MIS_PRED | Mispredicted or not predicted branch speculatively executed | 1, 2, 3, 4, 5, 6 | |
BR_PRED | Predictable branch speculatively executed | 1, 2, 3, 4, 5, 6 | |
MEM_ACCESS | Data memory access | 1, 2, 3, 4, 5, 6 | |
L1I_CACHE | Level 1 instruction cache access | 1, 2, 3, 4, 5, 6 | |
L1D_CACHE_WB | Level 1 data cache write-back | 1, 2, 3, 4, 5, 6 | |
L2D_CACHE | Level 2 data cache access | 1, 2, 3, 4, 5, 6 | |
L2D_CACHE_REFILL | Level 2 data cache refill | 1, 2, 3, 4, 5, 6 | |
L2D_CACHE_WB | Level 2 data cache write-back | 1, 2, 3, 4, 5, 6 | |
BUS_ACCESS | Bus access | 1, 2, 3, 4, 5, 6 | |
MEMORY_ERROR | Local memory error | 1, 2, 3, 4, 5, 6 | |
INST_SPEC | Instruction speculatively executed | 1, 2, 3, 4, 5, 6 | |
TTBR_WRITE_RETIRED | Write to TTBR architecturally executed, condition code pass | 1, 2, 3, 4, 5, 6 | |
BUS_CYCLES | Bus cycle | 1, 2, 3, 4, 5, 6 | |
CPU_CYCLES | CPU cycle | 0 | |
JAVA_BC_EXEC | Number of Java bytecodes decoded, including speculative ones | 1, 2, 3, 4, 5, 6 | |
JAVA_SFTBC_EXEC | Number of software Java bytecodes decoded, including speculative ones | 1, 2, 3, 4, 5, 6 | |
JAVA_BB_EXEC | Number of Jazelle taken branches executed, including those flushed due to a previous load/store which aborts late | 1, 2, 3, 4, 5, 6 | |
CO_LF_MISS | Number of coherent linefill requests that miss in all other CPUs, meaning that the request is sent to external memory | 1, 2, 3, 4, 5, 6 | |
CO_LF_HIT | Number of coherent linefill requests that hit in another CPU, meaning that the linefill data is fetched directly from the relevant cache | 1, 2, 3, 4, 5, 6 | |
IC_DEP_STALL | Number of cycles where CPU is ready to accept new instructions but does not receive any, because the instruction side cannot provide any and the instruction cache is currently performing at least one linefill | 1, 2, 3, 4, 5, 6 | |
DC_DEP_STALL | Number of cycles where CPU has some instructions that it cannot issue to any pipeline and the LSU has at least one pending linefill request but no pending TLB requests | 1, 2, 3, 4, 5, 6 | |
STALL_MAIN_TLB | Number of cycles where CPU is stalled waiting for completion of translation table walk from the main TLB | 1, 2, 3, 4, 5, 6 | |
STREX_PASS | Number of STREX instructions architecturally executed and passed | 1, 2, 3, 4, 5, 6 | |
STREX_FAILS | Number of STREX instructions architecturally executed and failed | 1, 2, 3, 4, 5, 6 | |
DATA_EVICT | Number of eviction requests due to a linefill in the data cache | 1, 2, 3, 4, 5, 6 | |
ISS_NO_DISP | Number of cycles where the issue stage does not dispatch any instruction | 1, 2, 3, 4, 5, 6 | |
ISS_EMPTY | Number of cycles where the issue stage is empty | 1, 2, 3, 4, 5, 6 | |
INS_RENAME | Number of instructions going through the Register Renaming stage | 1, 2, 3, 4, 5, 6 | |
PRD_FN_RET | Number of procedure returns whose condition codes do not fail, excluding all exception returns | 1, 2, 3, 4, 5, 6 | |
INS_MAIN_EXEC | Number of instructions being executed in the main execution pipeline of the CPU, the multiply pipeline and the ALU pipeline | 1, 2, 3, 4, 5, 6 | |
INS_SND_EXEC | Number of instructions being executed in the second execution pipeline (ALU) of the CPU | 1, 2, 3, 4, 5, 6 | |
INS_LSU | Number of instructions being executed in the Load/Store unit | 1, 2, 3, 4, 5, 6 | |
INS_FP_RR | Number of floating-point instructions going through the Register Rename stage | 1, 2, 3, 4, 5, 6 | |
INS_NEON_RR | Number of NEON instructions going through the Register Rename stage | 1, 2, 3, 4, 5, 6 | |
STALL_PLD | Number of cycles where CPU is stalled because PLD slots are all full | 1, 2, 3, 4, 5, 6 | |
STALL_WRITE | Number of cycles where CPU is stalled because the data side is full and executing writes to external memory | 1, 2, 3, 4, 5, 6 | |
STALL_INS_TLB | Number of cycles where CPU is stalled because of main TLB misses on requests issued by the instruction side | 1, 2, 3, 4, 5, 6 | |
STALL_DATA_TLB | Number of cycles where CPU is stalled because of main TLB misses on requests issued by the data side | 1, 2, 3, 4, 5, 6 | |
STALL_INS_UTLB | Number of cycles where CPU is stalled because of micro TLB misses on the instruction side | 1, 2, 3, 4, 5, 6 | |
STALL_DATA_ULTB | Number of cycles where CPU is stalled because of micro TLB misses on the data side | 1, 2, 3, 4, 5, 6 | |
STALL_DMB | Number of cycles where CPU is stalled due to the execution of a DMB memory barrier | 1, 2, 3, 4, 5, 6 | |
CLK_INT_EN | Number of cycles during which the integer core clock is enabled | 1, 2, 3, 4, 5, 6 | |
CLK_DE_EN | Number of cycles during which the Data Engine clock is enabled | 1, 2, 3, 4, 5, 6 | |
INS_ISB | Number of ISB instructions architecturally executed | 1, 2, 3, 4, 5, 6 | |
INS_DSB | Number of DSB instructions architecturally executed | 1, 2, 3, 4, 5, 6 | |
INS_DMB | Number of DMB instructions speculatively executed | 1, 2, 3, 4, 5, 6 | |
EXT_IRQ | Number of external interrupts executed by the processor | 1, 2, 3, 4, 5, 6 | |
PLE_CL_REQ_CMP | PLE cache line request completed | 1, 2, 3, 4, 5, 6 | |
PLE_CL_REQ_SKP | PLE cache line request skipped | 1, 2, 3, 4, 5, 6 | |
PLE_FIFO_FLSH | PLE FIFO flush | 1, 2, 3, 4, 5, 6 | |
PLE_REQ_COMP | PLE request completed | 1, 2, 3, 4, 5, 6 | |
PLE_FIFO_OF | PLE FIFO overflow | 1, 2, 3, 4, 5, 6 | |
PLE_REQ_PRG | PLE request programmed | 1, 2, 3, 4, 5, 6 | |
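The "Counters usable" column implies a scheduling constraint: CPU_CYCLES can only be counted on counter 0, while every other event in the table can use any of counters 1 to 6, so at most six of those events can be counted at the same time alongside CPU_CYCLES. The Python sketch below checks a requested event set against that constraint. The function `assign_counters` and its first-free-counter policy are hypothetical illustrations that read the column literally; this is not OProfile's own allocator.

```python
# Hypothetical helper mirroring the "Counters usable" column:
# CPU_CYCLES may only use counter 0; every other listed event may use 1-6.
CYCLE_COUNTER = 0
GENERAL_COUNTERS = range(1, 7)  # counters 1, 2, 3, 4, 5, 6


def assign_counters(events):
    """Map each event name to a counter index, or raise ValueError."""
    assignment = {}
    free = list(GENERAL_COUNTERS)  # general counters not yet handed out
    for name in events:
        if name in assignment:
            raise ValueError(f"{name} requested twice")
        if name == "CPU_CYCLES":
            # The cycle event has a dedicated counter of its own.
            assignment[name] = CYCLE_COUNTER
        elif free:
            # Any other event takes the lowest-numbered free general counter.
            assignment[name] = free.pop(0)
        else:
            raise ValueError(f"no free counter left for {name}")
    return assignment


if __name__ == "__main__":
    wanted = ["CPU_CYCLES", "INST_RETIRED", "L1D_CACHE", "L1D_CACHE_REFILL"]
    for event, counter in assign_counters(wanted).items():
        print(f"{event} -> counter {counter}")
```

A request for seven events other than CPU_CYCLES raises `ValueError`, which in practice means splitting the measurement across several runs.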
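Several events in the table are most useful as ratios, for example L1 data cache refills against L1 data cache accesses, or failed STREX instructions against all STREX instructions. The sketch below computes a few such ratios from raw counts keyed by event name; the metric names (`ipc`, `l1d_refill_rate` and so on) and the sample counts in the `__main__` block are made up for illustration. Note that BR_MIS_PRED also counts branches that were not predicted at all, so dividing it by BR_PRED gives only an approximate misprediction rate.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def derived_metrics(counts):
    """Compute a few ratios from raw event counts keyed by event name."""
    return {
        # Instructions architecturally executed per CPU cycle.
        "ipc": ratio(counts["INST_RETIRED"], counts["CPU_CYCLES"]),
        # Fraction of L1 data cache accesses that caused a refill.
        "l1d_refill_rate": ratio(counts["L1D_CACHE_REFILL"], counts["L1D_CACHE"]),
        # Approximate: BR_MIS_PRED also includes branches that were not predicted.
        "branch_mispredict_rate": ratio(counts["BR_MIS_PRED"], counts["BR_PRED"]),
        # Fraction of executed STREX instructions that failed.
        "strex_fail_rate": ratio(
            counts["STREX_FAILS"], counts["STREX_PASS"] + counts["STREX_FAILS"]
        ),
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    sample = {
        "CPU_CYCLES": 1_000_000,
        "INST_RETIRED": 650_000,
        "L1D_CACHE": 200_000,
        "L1D_CACHE_REFILL": 6_000,
        "BR_PRED": 90_000,
        "BR_MIS_PRED": 4_500,
        "STREX_PASS": 1_200,
        "STREX_FAILS": 30,
    }
    for metric, value in derived_metrics(sample).items():
        print(f"{metric}: {'n/a' if value is None else f'{value:.3f}'}")
```

Gathering all eight inputs takes more than one run, since only six events other than CPU_CYCLES fit in counters 1 to 6 at once.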