PPC E6500 events

This is a list of the PPC E6500's performance counter event types. For details, see the PowerPC e500 Core Complex Reference Manual, Chapter 7: Performance Monitor, downloadable from freescale.com.

Name Description Counters usable Unit mask options
CPU_CLK Cycles all
COMPLETED_INSNS Completed Instructions (0, 1, or 2 per cycle) all
COMPLETED_OPS Completed Micro-ops all
DECODED_OPS Micro-ops decoded all
TRANSITIONS_PM_EVENT 0 to 1 transitions on the pm_event input all
CPU_CLK_PM_EVENT Processor cycles that occur when the pm_event input is asserted all
COMPLETED_BRANCHES Branch Instructions completed all
COMPLETED_LOAD_OPS Load micro-ops completed all
COMPLETED_STORE_OPS Store micro-ops completed all
COMPLETION_REDIRECTS Number of completion buffer redirects all
BRANCHES_FINISHED Branches finished all
TAKEN_BRANCHES_FINISHED Taken branches finished all
TAKEN_BRANCHES_FINISHED_NOT_BTB Finished unconditional branches that miss the BTB all
BRANCHES_MISPREDICTED Branch instructions mispredicted due to direction, target, or IAB prediction all
BRANCHES_MISPREDICTED_DIRECTION Branches mispredicted due to direction prediction all
BTB_HITS Branches that hit in the BTB, or missed but are not taken all
DECODE_STALLED Cycles the instruction buffer was not empty, but 0 instructions decoded all
ISSUE_STALLED Cycles the SFX/CFX issue queue is not empty but 0 instructions issued all
BRANCH_ISSUE_STALLED Cycles the branch buffer is not empty but 0 instructions issued all
SFX0_SCHEDULE_STALLED Cycles SFX0 is not empty but 0 instructions scheduled all
SFX1_SCHEDULE_STALLED Cycles SFX1 is not empty but 0 instructions scheduled all
CFX_SCHEDULE_STALLED Cycles CFX is not empty but 0 instructions scheduled all
LSU_SCHEDULE_STALLED Cycles LSU is not empty but 0 instructions scheduled all
BU_SCHEDULE_STALLED Cycles BU is not empty but 0 instructions scheduled all
TOTAL_TRANSLATED Total LSU micro-ops that reach the second stage of the LSU all
LOADS_TRANSLATED Cacheable load micro-ops translated. (Does not include WT) all
STORES_TRANSLATED Cacheable store micro-ops translated. (Does not include WT) all
TOUCHES_TRANSLATED Cacheable touch instructions translated. Includes: dcbt / dcbtep, dcbtst / dcbtstep, icbt ct=2 all
CACHEOPS_TRANSLATED Number of dcba, dcbf, dcbst, and dcbz instructions translated (e500 traps on dcbi) all
CACHEINHIBITED_ACCESSES_TRANSLATED Number of cache inhibited accesses translated all
GUARDED_LOADS_TRANSLATED Number of guarded loads translated all
WRITETHROUGH_STORES_TRANSLATED Number of write-through stores translated all
MISALIGNED_ACCESSES_TRANSLATED Number of misaligned load or store accesses translated. all
FETCH_2X4_HITS Each fetch retrieves up to 8 instructions, but only the first 4 are required. This event increments if at least one instruction of the second 4 is actually used. all
FETCH_HITS_ON_PREFETCHES Fetch hits on instruction prefetch when the data is still in the ILFB. all
GENERATED_FETCH_PREFETCHES Number of prefetches generated. all
DL1_RELOADS This is historically used to determine dcache miss rate (along with loads/stores completed). This counts dL1 reloads for any reason. all
LOAD_MISS_WITH_LOAD_QUEUE_FULL Counts number of stalls; Com:52 counts cycles stalled. Includes: cacheable loads, CI loads, loadec, larx, touches, ibll, ibsl, ibllsl all
LOAD_GUARDED_MISS_NOT_LAST_REPLAYS Load guarded miss when the load is not yet at the bottom of the completion buffer. all
STORE_TRANSLATED_QUEUE_FULL_REPLAYS Translate a store when the StQ is full. all
ADDRESS_COLLISION_REPLAYS Address collision. all
DTLB_MISS_REPLAYS Counts number of stalls; Com:56 counts cycles stalled. all
DTLB_BUSY_REPLAYS Counts number of stalls; Com:57 counts cycles stalled. all
SECOND_PART_MISALIGNED_AFTER_MISS_REPLAYS Second part of misaligned access when first part missed in cache. all
LOAD_MISS_QUEUE_FULL_CYCLES Cycles stalled on replay condition - Load miss with load queue full. all
LOAD_GUARDED_MISS_NOT_LAST_CYCLES Cycles stalled on replay condition - Load guarded miss when the load is not yet at the bottom of the completion buffer. all
STORE_TRANSLATED_QUEUE_FULL_CYCLES Cycles stalled on replay condition - Translate a store when the StQ is full. all
ADDRESS_COLLISION_CYCLES Cycles stalled on replay condition - Address collision. all
DTLB_MISS_CYCLES Cycles stalled on replay condition - DTLB miss. all
DTLB_BUSY_CYCLES Cycles stalled on replay condition - DTLB busy. all
SECOND_PART_MISALIGNED_AFTER_MISS_CYCLES Cycles stalled on replay condition - Second part of misaligned access when first part missed in cache. all
IL1_FETCH_RELOADS This is historically used to determine icache miss rate (along with instructions completed). Counts reloads due to demand fetch. all
FETCHES Counts fetches that write at least one instruction to the Instruction Buffer. all
IMMU_TLB4K_RELOADS iMMU TLB4K reloads all
IMMU_VSP_RELOADS iMMU VSP reloads all
DMMU_TLB4K_RELOADS dMMU TLB4K reloads all
DMMU_VSP_RELOADS dMMU VSP reloads all
L2MMU_MISSES Counts iTLB/dTLB error interrupt all
TAKEN_BRANCHES Completed branch instructions that were taken. all
TAKEN_BLR Completed blr instructions that were taken. all
BTB_TARGET_MISPREDICT Number of target mispredicts (BTB). all
MISPREDICT_TARGET_BLR Number of link stack mispredicts (LS). all
TAKEN_BTB_BUT_MISS Number of BTB misses, but taken (BTB allocates). all
PMC0_OVERFLOW Counts the number of times PMC0[32] transitioned from 1 to 0. all
PMC1_OVERFLOW Counts the number of times PMC1[32] transitioned from 1 to 0. all
PMC2_OVERFLOW Counts the number of times PMC2[32] transitioned from 1 to 0. all
PMC3_OVERFLOW Counts the number of times PMC3[32] transitioned from 1 to 0. all
INTERRUPTS Number of interrupts taken all
EXTERNAL_INTERRUPTS Number of external input interrupts taken all
CRITICAL_INTERRUPTS Number of critical input interrupts taken all
SC_TRAP_INTERRUPTS Number of system call and trap interrupts all
TBL_BIT_TRANS_PMGC0 Counts transitions of the TBL bit selected by PMGC0[TBSEL]. all
PMC4_OVERFLOW Counts the number of times PMC4[32] transitioned from 1 to 0. all
PMC5_OVERFLOW Counts the number of times PMC5[32] transitioned from 1 to 0. all
L1_STASH_HIT Stash hits in L1 Data Cache. all
L1_STASH_REQ Stash requests to L1 Data Cache. all
TIMES_LSU_THREAD_PRIO_SWTICHED Number of times the Load Store Unit thread priority switched based on resource collisions. all
CLK_THREAD_REQ_FPU_DENIED Number of cycles both threads had Floating Point Unit requests and one was denied. all
CLK_THREAD_REQ_VPERM_DENIED Number of cycles both threads had Altivec Permute requests and one was denied. all
CLK_THREAD_REQ_VGEN_DENIED Number of cycles both threads had Altivec General requests and one was denied. all
CLK_THREAD_REQ_CFX_DENIED Number of cycles both threads had Complex Fixed-Point Unit requests and one was denied. all
CLK_THREAD_REQ_FETCH_DENIED Number of cycles both threads made a Fetch request to the L1 Instruction Cache and one thread wins arbitration. all
CLK_LSU_ISSUE_STALLED Cycles the LSU issue queue is not empty but 0 instructions issued. all
CLK_FPU_ISSUE_STALLED Cycles the FPU issue queue is not empty but 0 instructions issued. all
CLK_ALTIVEC_ISSUE_STALLED Cycles the AltiVec issue queue is not empty but 0 instructions issued. all
CLK_FPU_SCHEDULE_STALLED Cycles FPU is not empty but 0 instructions scheduled. all
CLK_VPERM_SCHEDULE_STALLED Cycles VPERM is not empty but 0 instructions scheduled. all
CLK_VGEN_SCHEDULE_STALLED Cycles VGEN is not empty but 0 instructions scheduled. all
CLK_VPU_INSTRUCTION_WAIT_FOR_OPERA Cycles VPU instruction waits for operands. all
CLK_VFPU_INSTRUCTION_WAIT_FOR_OPERA Cycles VFPU instruction waits for operands. all
CLK_VSFX_INSTRUCTION_WAIT_FOR_OPERA Cycles VSFX instruction waits for operands all
CLK_VCFX_INSTRUCTION_WAIT_FOR_OPERA Cycles VCFX instruction waits for operands. all
CLK_IB_EMPT Number of cycles the Instruction Buffer is empty all
CLK_IB_FULL Number of cycles the Instruction Buffer is full enough such that fetch stops fetching. all
CLK_CB_EMPT Number of cycles the Completion Buffer is empty. all
CLK_CB_FULL Number of cycles the Completion Buffer is full enough such that decode stops. all
CLK_PRESYNC_SI_IB Number of cycles a pre-sync serialized instruction holds in the Instruction Buffer and is not decoded. all
COMPLETED_CLK_0_INSTRUCTIONS Increments if 0 instructions (micro-ops) completed. all
COMPLETED_CLK_1_INSTRUCTIONS Increments if 1 instruction (micro-op) completed. all
COMPLETED_CLK_2_INSTRUCTIONS Increments if 2 instructions (micro-ops) completed. all
DETECTED_IAC5S Every valid IAC5 detection. all
DETECTED_IAC6S Every valid IAC6 detection. all
DETECTED_IAC7S Every valid IAC7 detection. all
DETECTED_IAC8S Every valid IAC8 detection. all
DETECTED_IAC1S Every valid IAC1 detection. all
DETECTED_IAC2S Every valid IAC2 detection. all
DETECTED_IAC3S Every valid IAC3 detection. all
DETECTED_IAC4S Every valid IAC4 detection. all
DETECTED_DAC1S Every valid DAC1 detection. all
DETECTED_DAC2S Every valid DAC2 detection. all
DETECTED_DVT0 Detection of a write to DEVENT SPR with DVT0 set. all
DETECTED_DVT1 Detection of a write to DEVENT SPR with DVT1 set. all
DETECTED_DVT2 Detection of a write to DEVENT SPR with DVT2 set. all
DETECTED_DVT3 Detection of a write to DEVENT SPR with DVT3 set. all
DETECTED_DVT4 Detection of a write to DEVENT SPR with DVT4 set. all
DETECTED_DVT5 Detection of a write to DEVENT SPR with DVT5 set. all
DETECTED_DVT6 Detection of a write to DEVENT SPR with DVT6 set. all
DETECTED_DVT7 Detection of a write to DEVENT SPR with DVT7 set. all
CLK_COMPLETION_STALLED Number of completion cycles stalled due to Nexus FIFO full. all
FPU_FINISH FPU finish. all
CLK_FPU_DIV Counts once for every cycle of divide execution. (fdivs and fdiv). all
FPU_DENORM_INPUT Counts extra cycles of delay due to denormalized inputs. One denormalized operand increments the count 4 times; two operands increment it 5 times. This shows the real penalty due to denorms, not just how often they occur. all
FPU_DENORM_OUTPUT FPU denorm output. all
FPU_FPSCR_FULL_STALL FPU FPSCR stall. all
FPU_PIPE_SYNC_STALL Synchronization-op stalls: count once for each cycle that a "break-before" FPU op is in the RS/issue stage but cannot issue. Also count once for each cycle that an FPU op is in the RS/issue stage but cannot issue due to "break-after" of an FPU op currently in progress. all
FPU_INPUT_DATA_STALL FPU data-ready stall: cycles in which there is an op in the RS/issue stage that cannot issue because one or more of its operands is not yet available. all
FPU_INSTRUCTIONS_GEN_FLAG FPU instruction sets FPSCR[FEX]. all
PW20_CNT Number of times the core enters the PW20 power management state. all
DECORATED_LOADS Number of decorated loads to cache inhibited memory performed. all
DECORATED_STORES Number of decorated stores to cache inhibited memory performed. all
NUM_INSTRUCTIONS_SUCC Number of successful stbcx., sthcx., stwcx., or stdcx. instructions. all
NUM_INSTRUCTIONS_UNSUCC Number of unsuccessful stbcx., sthcx., stwcx., or stdcx. instructions. all
COMPLETED_LSU_MICROOPS Completed Load Store Unit micro-ops. Every micro-op that goes down the LSU pipe. Includes: GPR loads / GPR stores, FPR loads / FPR stores, VR loads / VR stores, cache ops, memory barriers, other LSU ops (dsn, msgsnd, mvidsplt, mviwsplt, tlbilx, tlbivax, tlbsync) all
COMPLETED_GPR_LOADS GPR load micro-ops completed. This event only counts once for misaligns. Note that lmw that causes a fault may end up double-counting micro-ops -- once for first pass, once for second pass. all
COMPLETED_GPR_STORES GPR store micro-ops completed. This event only counts once for misaligns. Note that stmw that causes a fault may end up double-counting micro-ops -- once for first pass, once for second pass. all
COMPLETED_CACHEOPS Cache ops completed. Includes: dcba / dcbal, dcbf / dcbfep, dcbi, dcblc / dcblq, dcbst / dcbstep, dcbt / dcbtep / dcbtls, dcbtst / dcbtstep / dcbtstls, dcbz / dcbzep / dcbzl / dcbzlep, icbi / icbiep, icblc / icblq, icbt / icbtls all
COMPLETED_MEM_BARRIERS Memory barriers completed. Includes: msync (sync, lwsync, elemental barriers), mbar (eieio), miso. all
COMPLETED_SFX_MICROOPS SFX micro-ops completed. all
COMPLETED_SINCLK_SFX_MICROOPS SFX single-cycle micro-ops completed. all
COMPLETED_DBLCLK_SFX_MICROOPS SFX double-cycle micro-ops completed. all
COMPLETED_CFX_INSTRUCTIONS CFX instructions completed. all
COMPLETED_SFX_CFX_INSTRUCTIONS SFX or CFX instructions completed. all
COMPLETED_FPU_INSTRUCTIONS FPU instructions completed. all
COMPLETED_FPR_MICROOPS_LOADS FPR load micro-ops completed. all
COMPLETED_FPR_MICROOPS_STORES FPR store micro-ops completed. all
COMPLETED_FPR_MICROOPS_LOADS_STORES FPR load and store micro-ops completed. all
COMPLETED_FPR_SINPRECISE_LOADS_STORES FPR single-precision load and store micro-ops completed. all
COMPLETED_FPR_DBLPRECISE_LOADS_STORES FPR double-precision load and store micro-ops completed. all
COMPLETED_ALTIVEC_INSTRUCTIONS AltiVec instructions completed. (non-LSU). all
COMPLETED_ALTIVEC_VSFX_INSTRUCTIONS AltiVec VSFX instructions completed. all
COMPLETED_ALTIVEC_VCFX_INSTRUCTIONS AltiVec VCFX instructions completed. all
COMPLETED_ALTIVEC_VPU_INSTRUCTIONS AltiVec VPU instructions completed. all
COMPLETED_ALTIVEC_VFPU_INSTRUCTIONS AltiVec VFPU instructions completed. all
COMPLETED_VR_LOADS_MICROOPS VR load micro-ops completed. all
COMPLETED_VR_STORES_MICROOPS VR store micro-ops completed. all
VSCR_SAT_SET Number of times the saturate bit flips from 0 to 1. all
CLK_SFX0_IDLE Cycles Simple Fixed Point Unit 0 is idle. all
CLK_SFX1_IDLE Cycles Simple Fixed Point Unit 1 is idle. all
CLK_CFX_IDLE Cycles Complex Fixed Point Unit is idle. all
CLK_LSU_IDLE Cycles Load Store Unit is idle. all
CLK_BU_IDLE Cycles Branch Unit is idle. all
CLK_FPU_IDLE Cycles Floating Point Unit is idle. all
CLK_VPU_IDLE Cycles AltiVec Permute Unit is idle. all
CLK_VFPU_IDLE Cycles AltiVec Floating Point Unit is idle. all
CLK_VSFX_IDLE Cycles AltiVec Simple Fixed Point Unit is idle. all
CLK_VCFX_IDLE Cycles AltiVec Complex Fixed Point Unit is idle. all
L1_CACHE_MISSES Data L1 cache misses. (Includes load, store, cache ops). all
L1_CACHE_LOAD_MISSES Data L1 cache load misses. all
L1_CACHE_STORE_MISSES Data L1 cache store misses. all
LMQ_ALLOCATED_LOADS Loads that allocate into Load Miss Queue. (Data L1 cache misses, but may not be to different cache lines). all
LOAD_THREAD_MISS_COLLISION Number of times that this thread's load hits a line that is valid for the other thread but not this thread. all
INTERTHREAD_STATUS_ARRAY_COLLISION Number of times that two threads collide on status array access. all
NUM_SGB_ALLOC Number of Store Gather Buffer allocates. all
NUM_SGB_GATHERS Number of Store Gather Buffer gathers. all
NUM_SGB_OVERFLOWS Number of Store Gather Buffer overflows. (Causes SGB full condition when additional store request is made). all
NUM_SGB_PROMOTIONS Number of Store Gather Buffer promotions. all
NUM_SGB_INORDER_PROMOTIONS Number of Store Gather Buffer in-order promotions. (Also includes oldest-entry timeout condition). all
NUM_SGB_OUTOFORDER_PROMOTIONS Number of Store Gather Buffer out-of-order promotions. all
NUM_SGB_HP_PROMOTIONS Number of Store Gather Buffer high-priority promotions. (Load hits on pending store). all
NUM_SGB_MISO_PROMOTIONS Number of Store Gather Buffer miso promotions. (Load hits on pending store). all
NUM_SGB_WATERMARK_PROMOTIONS Number of Store Gather Buffer watermark promotions. all
NUM_SGB_OVERFLOW_PROMOTIONS Number of Store Gather Buffer overflow promotions. all
CLK_DLAQ_FULL Number of cycles the DLink Age Queue is full. all
TIMES_DLAQ_FULL Number of times the DLink Age Queue is full. all
CLK_LRSAQ_FULL Number of cycles the Load Reservation Set Age Queue is full. all
TIMES_LRSAQ_FULL Number of times the Load Reservation Set Age Queue is full. all
CLK_FWDAQ_FULL Number of cycles the Forward Age Queue is full. all
TIMES_FWDAQ_FULL Number of times the Forward Age Queue is full. all
NUM_FWD_STQ_COLLISION_TIMES Number of times a Store Queue collision is forwardable. The following cases are not forwardable: store address + size does not contain the load, cache-inhibited store, denormalized floating point store, stcx, guarded load. all
NUM_FWD_STQ_COLLISION_TIMES_DATA_RDY Number of times a Store Queue collision is forwardable and is ready with data to forward. all
NUM_FWD_STQ_COLLISION_TIMES_DATA_NORDY Number of times a Store Queue collision is forwardable but is not ready with data to forward. all
NUM_NOFWD_STQ_COLLISION_TIMES Number of times a Store Queue collision is not forwardable and must wait until the store leaves the Store Queue. all
NUM_FWD_STQ_COLLISION_CLK Number of cycles a Store Queue collision is forwardable. (Number of cycles from the detection of a forwardable Store Queue entry until the load is replayed in stg1). all
NUM_FWD_STQ_COLLISION_CLK_DATA_RDY Number of cycles a Store Queue collision is forwardable and is ready with data to forward. (Number of cycles from the detection of a forwardable Store Queue entry with valid data until the load is replayed in stg1). all
NUM_FWD_STQ_COLLISION_CLK_DATA_NORDY Number of cycles a Store Queue collision is forwardable but is not ready with data to forward. (Number of cycles from the detection of a forwardable Store Queue entry without valid data until the load is replayed in stg1). all
NUM_NOFWD_STQ_COLLISION_CLK Number of cycles a Store Queue collision is not forwardable and has to wait until the store leaves the Store Queue. (Number of cycles from the detection of a non-forwardable Store Queue entry until the load is replayed in stg1). all
NUM_FALSE_EA_COLLISION Number of times the lower 12 bits of the EA matched but the upper bits did not, leading to a false load-on-store replay. Cycle penalty is 4x the number of times. all
NUM_LSO_BUS_COLLISION Number of LS0 result bus collisions. Cycle penalty is 3x this measurement. all
NUM_INTERTHREAD_DBLWORKD_BANK_COLLISION Number of inter-thread double-word bank collisions. Measures when both threads attempt to access the same double-word bank. Cycle penalty is 3x this measurement. all
L1_CACHE_IM Instruction L1 cache demand fetch misses. (Includes icbtls. Does not include prefetch). all
IMMU_MISSES Counts misses in the level 1 Instruction MMU. all
IMMU_TLB4K_HITS Counts hits in the level 1 Instruction MMU TLB-4K. all
IMMU_VSP_HITS Counts hits in the level 1 Instruction MMU VSP. all
CLK_IMMU_HW_TABLEWALK Counts IMMU cycles spent in hardware tablewalk. This represents the cycles from the point where the L2 MMU miss occurs to when the page table walk completes with a valid translation or exception. all
DMMU_MISSES Counts misses in the level 1 Data MMU. (Does not count replayed operations). all
DMMU_TLB4K_HITS Counts hits in the level 1 Data MMU TLB-4K. (Does not count replayed operations). all
DMMU_VSP_HITS Counts hits in the level 1 Data MMU VSP. (Does not count replayed operations). all
CLK_DMMU_HW_TABLEWALK Counts DMMU cycles spent in hardware tablewalk. This represents the cycles from the point where the L2 MMU miss occurs to when the page table walk completes with a valid translation or exception. all
L2MMU_MISSES Counts level 2 MMU misses. (Does not count misses that occur due to dcbt / dcbtst / dcba / dcbal instructions that fail translation and are no-oped. Does not count misses in L2MMU-VSP when looking up an indirect entry). all
L2MMU_4K_HITS Counts level 2 MMU hits in L2MMU-4K. all
L2MMU_VSP_HITS Counts level 2 MMU hits in L2MMU-VSP. (Does not count indirect lookups). all
L2MMU_INDIRECT_MISSES Counts level 2 MMU indirect misses. This represents indirect entry lookups that do not have a matching indirect entry. all
L2MMU_INDIRECT_VALID_MISSES Counts level 2 MMU indirect valid misses. This occurs when the indirect entry is valid, but the corresponding PTE[V] = 0 or the permissions in the PTE are not sufficient for the requested access. all
LRAT_MISSES Counts Logical to Real Address Translation misses. This includes LRAT misses from tlbwe instructions or from page table translations. all
CLK_LMQ_LOSE_DLINK_DUE_SGB Cycles the Load Miss Queue loses DLINK arbitration due to the Store Gather Buffer. all
CLK_SGB_LOSE_DLINK_DUE_LMQ Cycles the Store Gather Buffer loses DLINK arbitration due to the Load Miss Queue. all
CLK_THREAD_LOSE_DLINK_DUE_OTHER_THREAD Cycles thread loses DLINK arbitration due to other thread. all
DECODE_MASK_VALUE One mask/value pair that allows instructions to be counted in Decode. all
SHR_L2_DLINK_REQ Number of DLINK requests made from core to Shared L2. all
SHR_L2_ILINK_REQ Number of ILINK requests made from core to Shared L2. (Includes instruction fetches and L2MMU hardware tablewalk requests). all
SHR_L2_RLINK_REQ Number of RLINK requests made from Shared L2 to core. (back invalidates, stashes, barriers). all
SHR_L2_BLINK_REQ Number of BLINK requests made from Shared L2 to core. (back invalidates, stashes, barriers). all
SHR_L2_CLINK_REQ Number of CLINK requests made from Shared L2 to core. (back invalidates, stashes, barriers). all
L2_HITS Number of L2 Cache hits. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_MISSES Number of L2 Cache misses. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_DEMAND_ACCESS Number of L2 Cache demand accesses. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_ACCESSES Number of L2 Cache accesses from all sources (demand, reload, snoop, etc). Counts 0, 1, 2, 3, or 4 per cycle. all
L2_STORE_ALLOCATE Number of L2 Cache store allocates. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_INSTRUCTIONS_ACCESS Number of L2 Cache instruction accesses. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_DATA_ACCESS Number of L2 Cache data accesses. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_INSTRUCTIONS_MISSES Number of L2 Cache instruction misses. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_DATA_MISSES Number of L2 Cache data misses. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_HITS_PER_THREAD Number of times this core/thread hits in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_MISSES_PER_THREAD Number of times this core/thread misses in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_DEMAND_ACCESS_PER_THREAD Number of times this core/thread makes a demand access to the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_STORE_ALLOC_PER_THREAD Number of times a store from this core/thread allocates in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_INSTRUCTIONS_ACCESS_PER_THREAD Number of times an instruction from this core/thread accesses the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_DATA_ACCESS_PER_THREAD Number of times a data operation from this core/thread accesses the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_INSTRUCTION_MISSES_PER_THREAD Number of times an instruction from this core/thread misses in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_DATA_MISSES_PER_THREAD Number of times a data operation from this core/thread misses in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_RELOAD_FROM_CORENET Number of L2 Cache reloads from CoreNet. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_IN_STASH_REQ Number of incoming L2 Cache stash requests. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_STASH_REQ_DOWNGRD_TO_SNOOPS Number of incoming L2 Cache stash requests downgraded to snoops. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_SNOOPS_HITS Number of L2 Cache snoop hits. Counts 0, 1, 2, 3, or 4 per cycle. all
L2_SNOOPS_MINT Number of L2 Cache snoops causing MINT. all
L2_SNOOPS_SINT Number of L2 Cache snoops causing SINT. all
L2_SNOOPS_PUSHES Number of L2 Cache snoop pushes. all
CLK_BIB_STALL Stall for Back Invalidate Buffer entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle. all
CLK_RLT_STALL Stall for Reload Table entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle. all
CLK_RLFQ_STALL Stall for Reload Fold Queue entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle. all
CLK_DTQ_STALL Stall for Data Transaction Queue entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle. all
CLK_COB_STALL Stall for Castout Buffer entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle. all
CLK_WDB_STALL Stall for Write Data Buffer entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle. all
CLK_RLDB_STALL Stall for Reload Data Buffer entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle. all
CLK_SNPQ_STALL Stall for Snoop Queue entry (cycles). all
BIU_MASTER_REQ Master transaction starts. (Number of AOut sent to CoreNet). all
BIU_MASTER_GLOBAL_REQ Master transaction starts that are global. (Number of AOut with M=1 sent to CoreNet). all
BIU_MASTER_DATA_SIDE_REQ Master data-side transaction starts. (Number of D-side AOut sent to CoreNet). all
BIU_MASTER_INSTRUCTION_SIDE_REQ Master instruction-side transaction starts. (Number of I-side AOut sent to CoreNet). all
L2_STASH_REQ Stash request on AIn matches stash IDs for core or L2. all
L2_SNOOP_REQ Externally generated snoop requests. (Number of AIn from CoreNet not from self). all
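
Several descriptions above say how raw counts are meant to be combined: DL1_RELOADS with loads/stores completed for a dcache miss rate, IL1_FETCH_RELOADS with instructions completed for an icache miss rate, and a 4x cycle penalty per NUM_FALSE_EA_COLLISION. The following is a minimal sketch of that arithmetic, not part of oprofile. The helper names, the dict layout, and the sample numbers are all hypothetical, and it assumes the counts have already been collected by whatever profiling tool is in use.

```python
# Hypothetical helpers: turn raw event counts into derived ratios.
# The dict keys are event names from the table; the values are made up.

def ratio(numerator, denominator):
    """Return numerator / denominator, or None if the denominator is zero."""
    return numerator / denominator if denominator else None


def derived_metrics(counts):
    """Compute a few ratios suggested by the event descriptions."""
    loads_stores = counts["COMPLETED_LOAD_OPS"] + counts["COMPLETED_STORE_OPS"]
    return {
        # Completed instructions per processor cycle.
        "ipc": ratio(counts["COMPLETED_INSNS"], counts["CPU_CLK"]),
        # dL1 reloads per completed load/store micro-op.
        "dcache_miss_rate": ratio(counts["DL1_RELOADS"], loads_stores),
        # iL1 demand-fetch reloads per completed instruction.
        "icache_miss_rate": ratio(counts["IL1_FETCH_RELOADS"],
                                  counts["COMPLETED_INSNS"]),
        # The table gives a cycle penalty of 4x the collision count.
        "false_ea_penalty_cycles": 4 * counts["NUM_FALSE_EA_COLLISION"],
    }


# Made-up sample counts, for illustration only.
sample = {
    "CPU_CLK": 1_000_000,
    "COMPLETED_INSNS": 800_000,
    "COMPLETED_LOAD_OPS": 200_000,
    "COMPLETED_STORE_OPS": 100_000,
    "DL1_RELOADS": 6_000,
    "IL1_FETCH_RELOADS": 1_600,
    "NUM_FALSE_EA_COLLISION": 250,
}

metrics = derived_metrics(sample)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Because DL1_RELOADS counts reloads for any reason, the dcache figure is an approximation rather than an exact miss rate.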
2020/07/20