This is a list of the PPC E6500's performance counter event types. See the PowerPC e500 Core Complex Reference Manual, Chapter 7: Performance Monitor, downloadable from freescale.com.
Name | Description | Counters usable | Unit mask options |
CPU_CLK | Cycles | all | |
COMPLETED_INSNS | Completed Instructions (0, 1, or 2 per cycle) | all | |
COMPLETED_OPS | Completed Micro-ops | all | |
DECODED_OPS | Micro-ops decoded | all | |
TRANSITIONS_PM_EVENT | 0 to 1 transitions on the pm_event input | all | |
CPU_CLK_PM_EVENT | Processor cycles that occur when the pm_event input is asserted | all | |
COMPLETED_BRANCHES | Branch Instructions completed | all | |
COMPLETED_LOAD_OPS | Load micro-ops completed | all | |
COMPLETED_STORE_OPS | Store micro-ops completed | all | |
COMPLETION_REDIRECTS | Number of completion buffer redirects | all | |
BRANCHES_FINISHED | Branches finished | all | |
TAKEN_BRANCHES_FINISHED | Taken branches finished | all | |
TAKEN_BRANCHES_FINISHED_NOT_BTB | Finished unconditional branches that miss the BTB | all | |
BRANCHES_MISPREDICTED | Branch instructions mispredicted due to direction, target, or IAB prediction | all | |
BRANCHES_MISPREDICTED_DIRECTION | Branches mispredicted due to direction prediction | all | |
BTB_HITS | Branches that hit in the BTB, or missed but are not taken | all | |
DECODE_STALLED | Cycles the instruction buffer was not empty, but 0 instructions decoded | all | |
ISSUE_STALLED | Cycles the SFX/CFX issue queue is not empty but 0 instructions issued | all | |
BRANCH_ISSUE_STALLED | Cycles the branch buffer is not empty but 0 instructions issued | all | |
SFX0_SCHEDULE_STALLED | Cycles SFX0 is not empty but 0 instructions scheduled | all | |
SFX1_SCHEDULE_STALLED | Cycles SFX1 is not empty but 0 instructions scheduled | all | |
CFX_SCHEDULE_STALLED | Cycles CFX is not empty but 0 instructions scheduled | all | |
LSU_SCHEDULE_STALLED | Cycles LSU is not empty but 0 instructions scheduled | all | |
BU_SCHEDULE_STALLED | Cycles BU is not empty but 0 instructions scheduled | all | |
TOTAL_TRANSLATED | Total LSU micro-ops that reach the second stage of the LSU | all | |
LOADS_TRANSLATED | Cacheable load micro-ops translated. (Does not include WT) | all | |
STORES_TRANSLATED | Cacheable store micro-ops translated. (Does not include WT) | all | |
TOUCHES_TRANSLATED | Cacheable touch instructions translated. Includes: dcbt / dcbtep, dcbtst / dcbtstep, icbt ct=2 | all | |
CACHEOPS_TRANSLATED | Number of dcba, dcbf, dcbst, and dcbz instructions translated (e500 traps on dcbi) | all | |
CACHEINHIBITED_ACCESSES_TRANSLATED | Number of cache inhibited accesses translated | all | |
GUARDED_LOADS_TRANSLATED | Number of guarded loads translated | all | |
WRITETHROUGH_STORES_TRANSLATED | Number of write-through stores translated | all | |
MISALIGNED_ACCESSES_TRANSLATED | Number of misaligned load or store accesses translated. | all | |
FETCH_2X4_HITS | Each fetch retrieves up to 8 instructions, but only the first 4 are required. This event increments if at least one instruction of the second 4 is actually used. | all | |
FETCH_HITS_ON_PREFETCHES | Fetch hits on instruction prefetch when the data is still in the ILFB. | all | |
GENERATED_FETCH_PREFETCHES | Number of prefetches generated. | all | |
DL1_RELOADS | This is historically used to determine dcache miss rate (along with loads/stores completed). This counts dL1 reloads for any reason. | all | |
LOAD_MISS_WITH_LOAD_QUEUE_FULL | Counts number of stalls; Com:52 counts cycles stalled. Includes: cacheable loads, CI loads, loadec, larx, touches, ibll, ibsl, ibllsl | all | |
LOAD_GUARDED_MISS_NOT_LAST_REPLAYS | Load guarded miss when the load is not yet at the bottom of the completion buffer. | all | |
STORE_TRANSLATED_QUEUE_FULL_REPLAYS | Translate a store when the StQ is full. | all | |
ADDRESS_COLLISION_REPLAYS | Address collision. | all | |
DTLB_MISS_REPLAYS | Counts number of stalls; Com:56 counts cycles stalled. | all | |
DTLB_BUSY_REPLAYS | Counts number of stalls; Com:57 counts cycles stalled. | all | |
SECOND_PART_MISALIGNED_AFTER_MISS_REPLAYS | Second part of misaligned access when first part missed in cache. | all | |
LOAD_MISS_QUEUE_FULL_CYCLES | Cycles stalled on replay condition - Load miss with load queue full. | all | |
LOAD_GUARDED_MISS_NOT_LAST_CYCLES | Cycles stalled on replay condition - Load guarded miss when the load is not yet at the bottom of the completion buffer. | all | |
STORE_TRANSLATED_QUEUE_FULL_CYCLES | Cycles stalled on replay condition - Translate a store when the StQ is full. | all | |
ADDRESS_COLLISION_CYCLES | Cycles stalled on replay condition - Address collision. | all | |
DTLB_MISS_CYCLES | Cycles stalled on replay condition - DTLB miss. | all | |
DTLB_BUSY_CYCLES | Cycles stalled on replay condition - DTLB busy. | all | |
SECOND_PART_MISALIGNED_AFTER_MISS_CYCLES | Cycles stalled on replay condition - Second part of misaligned access when first part missed in cache. | all | |
IL1_FETCH_RELOADS | This is historically used to determine icache miss rate (along with instructions completed). Counts reloads due to demand fetch. | all | |
FETCHES | Counts fetches that write at least one instruction to the Instruction Buffer. | all | |
IMMU_TLB4K_RELOADS | iMMU TLB4K reloads | all | |
IMMU_VSP_RELOADS | iMMU VSP reloads | all | |
DMMU_TLB4K_RELOADS | dMMU TLB4K reloads | all | |
DMMU_VSP_RELOADS | dMMU VSP reloads | all | |
L2MMU_MISSES | Counts iTLB/dTLB error interrupt | all | |
TAKEN_BRANCHES | Completed branch instructions that were taken. | all | |
TAKEN_BLR | Completed blr instructions that were taken. | all | |
BTB_TARGET_MISPREDICT | Number of target mispredicts (BTB). | all | |
MISPREDICT_TARGET_BLR | Number of link stack mispredicts (LS). | all | |
TAKEN_BTB_BUT_MISS | Number of BTB misses, but taken (BTB allocates). | all | |
PMC0_OVERFLOW | Counts the number of times PMC0[32] transitioned from 1 to 0. | all | |
PMC1_OVERFLOW | Counts the number of times PMC1[32] transitioned from 1 to 0. | all | |
PMC2_OVERFLOW | Counts the number of times PMC2[32] transitioned from 1 to 0. | all | |
PMC3_OVERFLOW | Counts the number of times PMC3[32] transitioned from 1 to 0. | all | |
INTERRUPTS | Number of interrupts taken | all | |
EXTERNAL_INTERRUPTS | Number of external input interrupts taken | all | |
CRITICAL_INTERRUPTS | Number of critical input interrupts taken | all | |
SC_TRAP_INTERRUPTS | Number of system call and trap interrupts | all | |
TBL_BIT_TRANS_PMGC0 | Counts transitions of the TBL bit selected by PMGC0[TBSEL]. | all | |
PMC4_OVERFLOW | Counts the number of times PMC4[32] transitioned from 1 to 0. | all | |
PMC5_OVERFLOW | Counts the number of times PMC5[32] transitioned from 1 to 0. | all | |
L1_STASH_HIT | Stash hits in L1 Data Cache. | all | |
L1_STASH_REQ | Stash requests to L1 Data Cache. | all | |
TIMES_LSU_THREAD_PRIO_SWTICHED | Number of times the Load Store Unit thread priority switched based on resource collisions. | all | |
CLK_THREAD_REQ_FPU_DENIED | Number of cycles both threads had Floating Point Unit requests and one was denied. | all | |
CLK_THREAD_REQ_VPERM_DENIED | Number of cycles both threads had Altivec Permute requests and one was denied. | all | |
CLK_THREAD_REQ_VGEN_DENIED | Number of cycles both threads had Altivec General requests and one was denied. | all | |
CLK_THREAD_REQ_CFX_DENIED | Number of cycles both threads had Complex Fixed-Point Unit requests and one was denied. | all | |
CLK_THREAD_REQ_FETCH_DENIED | Number of cycles both threads made a Fetch request to the L1 Instruction Cache and one thread wins arbitration. | all | |
CLK_LSU_ISSUE_STALLED | Cycles the LSU issue queue is not empty but 0 instructions issued. | all | |
CLK_FPU_ISSUE_STALLED | Cycles the FPU issue queue is not empty but 0 instructions issued. | all | |
CLK_ALTIVEC_ISSUE_STALLED | Cycles the AltiVec issue queue is not empty but 0 instructions issued. | all | |
CLK_FPU_SCHEDULE_STALLED | Cycles FPU is not empty but 0 instructions scheduled. | all | |
CLK_VPERM_SCHEDULE_STALLED | Cycles VPERM is not empty but 0 instructions scheduled. | all | |
CLK_VGEN_SCHEDULE_STALLED | Cycles VGEN is not empty but 0 instructions scheduled. | all | |
CLK_VPU_INSTRUCTION_WAIT_FOR_OPERA | Cycles VPU instruction waits for operands. | all | |
CLK_VFPU_INSTRUCTION_WAIT_FOR_OPERA | Cycles VFPU instruction waits for operands. | all | |
CLK_VSFX_INSTRUCTION_WAIT_FOR_OPERA | Cycles VSFX instruction waits for operands | all | |
CLK_VCFX_INSTRUCTION_WAIT_FOR_OPERA | Cycles VCFX instruction waits for operands. | all | |
CLK_IB_EMPT | Number of cycles the Instruction Buffer is empty | all | |
CLK_IB_FULL | Number of cycles the Instruction Buffer is full enough such that fetch stops fetching. | all | |
CLK_CB_EMPT | Number of cycles the Completion Buffer is empty. | all | |
CLK_CB_FULL | Number of cycles the Completion Buffer is full enough such that decode stops. | all | |
CLK_PRESYNC_SI_IB | Number of cycles a pre-sync serialized instruction holds in the Instruction Buffer and is not decoded. | all | |
COMPLETED_CLK_0_INSTRUCTIONS | Increments if 0 instructions (micro-ops) completed. | all | |
COMPLETED_CLK_1_INSTRUCTIONS | Increments if 1 instruction (micro-op) completed. | all | |
COMPLETED_CLK_2_INSTRUCTIONS | Increments if 2 instructions (micro-ops) completed. | all | |
DETECTED_IAC5S | Every valid IAC5 detection. | all | |
DETECTED_IAC6S | Every valid IAC6 detection. | all | |
DETECTED_IAC7S | Every valid IAC7 detection. | all | |
DETECTED_IAC8S | Every valid IAC8 detection. | all | |
DETECTED_IAC1S | Every valid IAC1 detection. | all | |
DETECTED_IAC2S | Every valid IAC2 detection. | all | |
DETECTED_IAC3S | Every valid IAC3 detection. | all | |
DETECTED_IAC4S | Every valid IAC4 detection. | all | |
DETECTED_DAC1S | Every valid DAC1 detection. | all | |
DETECTED_DAC2S | Every valid DAC2 detection. | all | |
DETECTED_DVT0 | Detection of a write to DEVENT SPR with DVT0 set. | all | |
DETECTED_DVT1 | Detection of a write to DEVENT SPR with DVT1 set. | all | |
DETECTED_DVT2 | Detection of a write to DEVENT SPR with DVT2 set. | all | |
DETECTED_DVT3 | Detection of a write to DEVENT SPR with DVT3 set. | all | |
DETECTED_DVT4 | Detection of a write to DEVENT SPR with DVT4 set. | all | |
DETECTED_DVT5 | Detection of a write to DEVENT SPR with DVT5 set. | all | |
DETECTED_DVT6 | Detection of a write to DEVENT SPR with DVT6 set. | all | |
DETECTED_DVT7 | Detection of a write to DEVENT SPR with DVT7 set. | all | |
CLK_COMPLETION_STALLED | Number of completion cycles stalled due to Nexus FIFO full. | all | |
FPU_FINISH | FPU finish. | all | |
CLK_FPU_DIV | Counts once for every cycle of divide execution. (fdivs and fdiv). | all | |
FPU_DENORM_INPUT | Counts extra cycles of delay due to denormalized inputs. One denormalized operand increments this 4 times; two denormalized operands increment it 5 times. This shows the real penalty due to denorms, not just how often they occur. | all | |
FPU_DENORM_OUTPUT | FPU denorm output. | all | |
FPU_FPSCR_FULL_STALL | FPU FPSCR stall. | all | |
FPU_PIPE_SYNC_STALL | Synchronization-op stalls: count once for each cycle that a "break-before" FPU op is in the RS/issue stage but cannot issue. Also count once for each cycle that an FPU op is in the RS/issue stage but cannot issue due to "break-after" of an FPU op currently in progress. | all | |
FPU_INPUT_DATA_STALL | FPU data-ready stall: cycles in which there is an op in the RS/issue stage that cannot issue because one or more of its operands is not yet available. | all | |
FPU_INSTRUCTIONS_GEN_FLAG | FPU instruction sets FPSCR[FEX]. | all | |
PW20_CNT | Number of times the core enters the PW20 power management state. | all | |
DECORATED_LOADS | Number of decorated loads to cache inhibited memory performed. | all | |
DECORATED_STORES | Number of decorated stores to cache inhibited memory performed. | all | |
NUM_INSTRUCTIONS_SUCC | Number of successful stbcx., sthcx., stwcx., or stdcx. instructions. | all | |
NUM_INSTRUCTIONS_UNSUCC | Number of unsuccessful stbcx., sthcx., stwcx., or stdcx. instructions. | all | |
COMPLETED_LSU_MICROOPS | Completed Load Store Unit micro-ops. Every micro-op that goes down the LSU pipe. Includes: GPR loads / GPR stores, FPR loads / FPR stores, VR loads / VR stores, cache ops, memory barriers, other LSU ops (dsn, msgsnd, mvidsplt, mviwsplt, tlbilx, tlbivax, tlbsync) | all | |
COMPLETED_GPR_LOADS | GPR load micro-ops completed. This event only counts once for misaligns. Note that lmw that causes a fault may end up double-counting micro-ops -- once for first pass, once for second pass. | all | |
COMPLETED_GPR_STORES | GPR store micro-ops completed. This event only counts once for misaligns. Note that stmw that causes a fault may end up double-counting micro-ops -- once for first pass, once for second pass. | all | |
COMPLETED_CACHEOPS | Cache ops completed. Includes: dcba / dcbal, dcbf / dcbfep, dcbi, dcblc / dcblq, dcbst / dcbstep, dcbt / dcbtep / dcbtls, dcbtst / dcbtstep / dcbtstls, dcbz / dcbzep / dcbzl / dcbzlep, icbi / icbiep, icblc / icblq, icbt / icbtls | all | |
COMPLETED_MEM_BARRIERS | Memory barriers completed. Includes: msync (sync, lwsync, elemental barriers), mbar (eieio), miso. | all | |
COMPLETED_SFX_MICROOPS | SFX micro-ops completed. | all | |
COMPLETED_SINCLK_SFX_MICROOPS | SFX single-cycle micro-ops completed. | all | |
COMPLETED_DBLCLK_SFX_MICROOPS | SFX double-cycle micro-ops completed. | all | |
COMPLETED_CFX_INSTRUCTIONS | CFX instructions completed. | all | |
COMPLETED_SFX_CFX_INSTRUCTIONS | SFX or CFX instructions completed. | all | |
COMPLETED_FPU_INSTRUCTIONS | FPU instructions completed. | all | |
COMPLETED_FPR_MICROOPS_LOADS | FPR load micro-ops completed. | all | |
COMPLETED_FPR_MICROOPS_STORES | FPR store micro-ops completed. | all | |
COMPLETED_FPR_MICROOPS_LOADS_STORES | FPR load and store micro-ops completed. | all | |
COMPLETED_FPR_SINPRECISE_LOADS_STORES | FPR single-precision load and store micro-ops completed. | all | |
COMPLETED_FPR_DBLPRECISE_LOADS_STORES | FPR double-precision load and store micro-ops completed. | all | |
COMPLETED_ALTIVEC_INSTRUCTIONS | AltiVec instructions completed. (non-LSU). | all | |
COMPLETED_ALTIVEC_VSFX_INSTRUCTIONS | AltiVec VSFX instructions completed. | all | |
COMPLETED_ALTIVEC_VCFX_INSTRUCTIONS | AltiVec VCFX instructions completed. | all | |
COMPLETED_ALTIVEC_VPU_INSTRUCTIONS | AltiVec VPU instructions completed. | all | |
COMPLETED_ALTIVEC_VFPU_INSTRUCTIONS | AltiVec VFPU instructions completed. | all | |
COMPLETED_VR_LOADS_MICROOPS | VR load micro-ops completed. | all | |
COMPLETED_VR_STORES_MICROOPS | VR store micro-ops completed. | all | |
VSCR_SAT_SET | Number of times the saturate bit flips from 0 to 1. | all | |
CLK_SFX0_IDLE | Cycles Simple Fixed Point Unit 0 is idle. | all | |
CLK_SFX1_IDLE | Cycles Simple Fixed Point Unit 1 is idle. | all | |
CLK_CFX_IDLE | Cycles Complex Fixed Point Unit is idle. | all | |
CLK_LSU_IDLE | Cycles Load Store Unit is idle. | all | |
CLK_BU_IDLE | Cycles Branch Unit is idle. | all | |
CLK_FPU_IDLE | Cycles Floating Point Unit is idle. | all | |
CLK_VPU_IDLE | Cycles AltiVec Permute Unit is idle. | all | |
CLK_VFPU_IDLE | Cycles AltiVec Floating Point Unit is idle. | all | |
CLK_VSFX_IDLE | Cycles AltiVec Simple Fixed Point Unit is idle. | all | |
CLK_VCFX_IDLE | Cycles AltiVec Complex Fixed Point Unit is idle. | all | |
L1_CACHE_MISSES | Data L1 cache misses. (Includes load, store, cache ops). | all | |
L1_CACHE_LOAD_MISSES | Data L1 cache load misses. | all | |
L1_CACHE_STORE_MISSES | Data L1 cache store misses. | all | |
LMQ_ALLOCATED_LOADS | Loads that allocate into Load Miss Queue. (Data L1 cache misses, but may not be to different cache lines). | all | |
LOAD_THREAD_MISS_COLLISION | Number of times that this thread's load hits a line that is valid for the other thread but not this thread. | all | |
INTERTHREAD_STATUS_ARRAY_COLLISION | Number of times that two threads collide on status array access. | all | |
NUM_SGB_ALLOC | Number of Store Gather Buffer allocates. | all | |
NUM_SGB_GATHERS | Number of Store Gather Buffer gathers. | all | |
NUM_SGB_OVERFLOWS | Number of Store Gather Buffer overflows. (Causes SGB full condition when additional store request is made). | all | |
NUM_SGB_PROMOTIONS | Number of Store Gather Buffer promotions. | all | |
NUM_SGB_INORDER_PROMOTIONS | Number of Store Gather Buffer in-order promotions. (Also includes oldest-entry timeout condition). | all | |
NUM_SGB_OUTOFORDER_PROMOTIONS | Number of Store Gather Buffer out-of-order promotions. | all | |
NUM_SGB_HP_PROMOTIONS | Number of Store Gather Buffer high-priority promotions. (Load hits on pending store). | all | |
NUM_SGB_MISO_PROMOTIONS | Number of Store Gather Buffer miso promotions. (Load hits on pending store). | all | |
NUM_SGB_WATERMARK_PROMOTIONS | Number of Store Gather Buffer watermark promotions. | all | |
NUM_SGB_OVERFLOW_PROMOTIONS | Number of Store Gather Buffer overflow promotions. | all | |
CLK_DLAQ_FULL | Number of cycles the DLink Age Queue is full. | all | |
TIMES_DLAQ_FULL | Number of times the DLink Age Queue is full. | all | |
CLK_LRSAQ_FULL | Number of cycles the Load Reservation Set Age Queue is full. | all | |
TIMES_LRSAQ_FULL | Number of times the Load Reservation Set Age Queue is full. | all | |
CLK_FWDAQ_FULL | Number of cycles the Forward Age Queue is full. | all | |
TIMES_FWDAQ_FULL | Number of times the Forward Age Queue is full. | all | |
NUM_FWD_STQ_COLLISION_TIMES | Number of times a Store Queue collision is forwardable. The following cases are not forwardable: store address + size does not contain the load, cache-inhibited store, denormalized floating point store, stcx, guarded load. | all | |
NUM_FWD_STQ_COLLISION_TIMES_DATA_RDY | Number of times a Store Queue collision is forwardable and is ready with data to forward. | all | |
NUM_FWD_STQ_COLLISION_TIMES_DATA_NORDY | Number of times a Store Queue collision is forwardable but is not ready with data to forward. | all | |
NUM_NOFWD_STQ_COLLISION_TIMES | Number of times a Store Queue collision is not forwardable and must wait until the store leaves the Store Queue. | all | |
NUM_FWD_STQ_COLLISION_CLK | Number of cycles a Store Queue collision is forwardable. (Number of cycles from the detection of a forwardable Store Queue entry until the load is replayed in stg1). | all | |
NUM_FWD_STQ_COLLISION_CLK_DATA_RDY | Number of cycles a Store Queue collision is forwardable and is ready with data to forward. (Number of cycles from the detection of a forwardable Store Queue entry with valid data until the load is replayed in stg1). | all | |
NUM_FWD_STQ_COLLISION_CLK_DATA_NORDY | Number of cycles a Store Queue collision is forwardable but is not ready with data to forward. (Number of cycles from the detection of a forwardable Store Queue entry without valid data until the load is replayed in stg1). | all | |
NUM_NOFWD_STQ_COLLISION_CLK | Number of cycles a Store Queue collision is not forwardable and has to wait until the store leaves the Store Queue. (Number of cycles from the detection of a non-forwardable Store Queue entry until the load is replayed in stg1). | all | |
NUM_FALSE_EA_COLLISION | Number of times the lower 12-bits of EA matched but the upper bits did not, leading to a false load-on-store replay. Cycle penalty is 4x the number of times. | all | |
NUM_LSO_BUS_COLLISION | Number of LS0 result bus collisions. Cycle penalty is 3x this measurement. | all | |
NUM_INTERTHREAD_DBLWORKD_BANK_COLLISION | Number of inter-thread double-word bank collisions. Measures when both threads attempt to access the same double-word bank. Cycle penalty is 3x this measurement. | all | |
L1_CACHE_IM | Instruction L1 cache demand fetch misses. (Includes icbtls. Does not include prefetch). | all | |
IMMU_MISSES | Counts misses in the level 1 Instruction MMU. | all | |
IMMU_TLB4K_HITS | Counts hits in the level 1 Instruction MMU TLB-4K. | all | |
IMMU_VSP_HITS | Counts hits in the level 1 Instruction MMU VSP. | all | |
CLK_IMMU_HW_TABLEWALK | Counts IMMU cycles spent in hardware tablewalk. This represents the cycles from the point where the L2 MMU miss occurs to when the page table walk completes with a valid translation or exception. | all | |
DMMU_MISSES | Counts misses in the level 1 Data MMU. (Does not count replayed operations). | all | |
DMMU_TLB4K_HITS | Counts hits in the level 1 Data MMU TLB-4K. (Does not count replayed operations). | all | |
DMMU_VSP_HITS | Counts hits in the level 1 Data MMU VSP. (Does not count replayed operations). | all | |
CLK_DMMU_HW_TABLEWALK | Counts DMMU cycles spent in hardware tablewalk. This represents the cycles from the point where the L2 MMU miss occurs to when the page table walk completes with a valid translation or exception. | all | |
L2MMU_MISSES | Counts level 2 MMU misses. (Does not count misses that occur due to dcbt / dcbtst / dcba / dcbal instructions that fail translation and are no-oped. Does not count misses in L2MMU-VSP when looking up an indirect entry). | all | |
L2MMU_4K_HITS | Counts level 2 MMU hits in L2MMU-4K. | all | |
L2MMU_VSP_HITS | Counts level 2 MMU hits in L2MMU-VSP. (Does not count indirect lookups). | all | |
L2MMU_INDIRECT_MISSES | Counts level 2 MMU indirect misses. This represents indirect entry lookups that do not have a matching indirect entry. | all | |
L2MMU_INDIRECT_VALID_MISSES | Counts level 2 MMU indirect valid misses. This occurs when the indirect entry is valid, but the corresponding PTE[V] = 0 or the permissions in the PTE are not sufficient for the requested access. | all | |
LRAT_MISSES | Counts Logical to Real Address Translation misses. This includes LRAT misses from tlbwe instructions or from page table translations. | all | |
CLK_LMQ_LOSE_DLINK_DUE_SGB | Cycles the Load Miss Queue loses DLINK arbitration due to the Store Gather Buffer. | all | |
CLK_SGB_LOSE_DLINK_DUE_LMQ | Cycles the Store Gather Buffer loses DLINK arbitration due to the Load Miss Queue. | all | |
CLK_THREAD_LOSE_DLINK_DUE_OTHER_THREAD | Cycles thread loses DLINK arbitration due to other thread. | all | |
DECODE_MASK_VALUE | One mask/value pair that allows instructions to be counted in Decode. | all | |
SHR_L2_DLINK_REQ | Number of DLINK requests made from core to Shared L2. | all | |
SHR_L2_ILINK_REQ | Number of ILINK requests made from core to Shared L2. (Includes instruction fetches and L2MMU hardware tablewalk requests). | all | |
SHR_L2_RLINK_REQ | Number of RLINK requests made from Shared L2 to core. (back invalidates, stashes, barriers). | all | |
SHR_L2_BLINK_REQ | Number of BLINK requests made from Shared L2 to core. (back invalidates, stashes, barriers). | all | |
SHR_L2_CLINK_REQ | Number of CLINK requests made from Shared L2 to core. (back invalidates, stashes, barriers). | all | |
L2_HITS | Number of L2 Cache hits. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_MISSES | Number of L2 Cache misses. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_DEMAND_ACCESS | Number of L2 Cache demand accesses. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_ACCESSES | Number of L2 Cache accesses from all sources (demand, reload, snoop, etc). Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_STORE_ALLOCATE | Number of L2 Cache store allocates. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_INSTRUCTIONS_ACCESS | Number of L2 Cache instruction accesses. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_DATA_ACCESS | Number of L2 Cache data accesses. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_INSTRUCTIONS_MISSES | Number of L2 Cache instruction misses. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_DATA_MISSES | Number of L2 Cache data misses. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_HITS_PER_THREAD | Number of times this core/thread hits in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_MISSES_PER_THREAD | Number of times this core/thread misses in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_DEMAND_ACCESS_PER_THREAD | Number of times this core/thread makes a demand access to the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_STORE_ALLOC_PER_THREAD | Number of times a store from this core/thread allocates in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_INSTRUCTIONS_ACCESS_PER_THREAD | Number of times an instruction from this core/thread accesses the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_DATA_ACCESS_PER_THREAD | Number of times a data operation from this core/thread accesses the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_INSTRUCTION_MISSES_PER_THREAD | Number of times an instruction from this core/thread misses in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_DATA_MISSES_PER_THREAD | Number of times a data operation from this core/thread misses in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_RELOAD_FROM_CORENET | Number of L2 Cache reloads from CoreNet. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_IN_STASH_REQ | Number of incoming L2 Cache stash requests. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_STASH_REQ_DOWNGRD_TO_SNOOPS | Number of incoming L2 Cache stash requests downgraded to snoops. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_SNOOPS_HITS | Number of L2 Cache snoop hits. Counts 0, 1, 2, 3, or 4 per cycle. | all | |
L2_SNOOPS_MINT | Number of L2 Cache snoops causing MINT. | all | |
L2_SNOOPS_SINT | Number of L2 Cache snoops causing SINT. | all | |
L2_SNOOPS_PUSHES | Number of L2 Cache snoop pushes. | all | |
CLK_BIB_STALL | Stall for Back Invalidate Buffer entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle. | all | |
CLK_RLT_STALL | Stall for Reload Table entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle. | all | |
CLK_RLFQ_STALL | Stall for Reload Fold Queue entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle. | all | |
CLK_DTQ_STALL | Stall for Data Transaction Queue entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle. | all | |
CLK_COB_STALL | Stall for Castout Buffer entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle. | all | |
CLK_WDB_STALL | Stall for Write Data Buffer entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle. | all | |
CLK_RLDB_STALL | Stall for Reload Data Buffer entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle. | all | |
CLK_SNPQ_STALL | Stall for Snoop Queue entry (cycles). | all | |
BIU_MASTER_REQ | Master transaction starts. (Number of AOut sent to CoreNet). | all | |
BIU_MASTER_GLOBAL_REQ | Master transaction starts that are global. (Number of AOut with M=1 sent to CoreNet). | all | |
BIU_MASTER_DATA_SIDE_REQ | Master data-side transaction starts. (Number of D-side AOut sent to CoreNet). | all | |
BIU_MASTER_INSTRUCTION_SIDE_REQ | Master instruction-side transaction starts. (Number of I-side AOut sent to CoreNet). | all | |
L2_STASH_REQ | Stash request on AIn matches stash IDs for core or L2. | all | |
L2_SNOOP_REQ | Externally generated snoop requests. (Number of AIn from CoreNet not from self). | all |
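
The DL1_RELOADS and IL1_FETCH_RELOADS rows say these events are historically used to estimate dcache and icache miss rates. A minimal Python sketch of that arithmetic follows. It assumes raw per-event counts have already been collected into a dict keyed by the event names above. The `ratio` and `derived_metrics` helpers and the sample numbers are hypothetical and only illustrate the calculation. The instructions-per-cycle figure is an extra convenience that the table does not mention.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def derived_metrics(counts):
    """Compute simple ratios from a dict mapping event names to raw counts.

    Assumption: the counts are raw event totals, not sample counts.
    """
    # Load and store micro-ops together approximate the data-side accesses.
    mem_ops = counts["COMPLETED_LOAD_OPS"] + counts["COMPLETED_STORE_OPS"]
    return {
        # Completed instructions per processor cycle.
        "ipc": ratio(counts["COMPLETED_INSNS"], counts["CPU_CLK"]),
        # dL1 reloads per completed load/store micro-op (dcache miss-rate proxy).
        "dl1_reloads_per_mem_op": ratio(counts["DL1_RELOADS"], mem_ops),
        # iL1 demand-fetch reloads per completed instruction (icache proxy).
        "il1_reloads_per_insn": ratio(
            counts["IL1_FETCH_RELOADS"], counts["COMPLETED_INSNS"]
        ),
    }


# Made-up counts for illustration only; not measured on real hardware.
sample_counts = {
    "CPU_CLK": 2000,
    "COMPLETED_INSNS": 1000,
    "COMPLETED_LOAD_OPS": 300,
    "COMPLETED_STORE_OPS": 100,
    "DL1_RELOADS": 20,
    "IL1_FETCH_RELOADS": 10,
}

metrics = derived_metrics(sample_counts)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Because DL1_RELOADS counts dL1 reloads for any reason, the data-side ratio is best read as an upper-bound proxy rather than an exact miss rate.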
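
Three rows in the table state a fixed cycle penalty per event: 4x for NUM_FALSE_EA_COLLISION, and 3x for NUM_LSO_BUS_COLLISION and NUM_INTERTHREAD_DBLWORKD_BANK_COLLISION. The sketch below turns raw counts into estimated penalty cycles using those multipliers. The function name, the dict layout, and the sample counts are assumptions made for illustration.

```python
# Cycle multipliers taken from the event descriptions in the table.
PENALTY_CYCLES = {
    "NUM_FALSE_EA_COLLISION": 4,
    "NUM_LSO_BUS_COLLISION": 3,
    "NUM_INTERTHREAD_DBLWORKD_BANK_COLLISION": 3,
}


def estimated_penalty_cycles(counts):
    """Map each penalty event to count * multiplier; missing events count as 0."""
    return {
        name: counts.get(name, 0) * multiplier
        for name, multiplier in PENALTY_CYCLES.items()
    }


# Made-up counts for illustration only; not measured on real hardware.
sample_counts = {
    "NUM_FALSE_EA_COLLISION": 10,
    "NUM_LSO_BUS_COLLISION": 5,
    "NUM_INTERTHREAD_DBLWORKD_BANK_COLLISION": 2,
}

penalties = estimated_penalty_cycles(sample_counts)
total_penalty = sum(penalties.values())
print(penalties, total_penalty)
```

Dividing `total_penalty` by a CPU_CLK count from the same run would give a rough share of cycles lost to these collisions.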