PPC E6500 performance counter events

PPC E6500 events

This is a list of the PPC E6500's performance counter event types. For details, see the PowerPC e500 Core Complex Reference Manual, Chapter 7: Performance Monitor, downloadable from freescale.com.

Name	Description	Counters usable	Unit mask options
CPU_CLK	Cycles	all
COMPLETED_INSNS	Completed Instructions (0, 1, or 2 per cycle)	all
COMPLETED_OPS	Completed Micro-ops	all
DECODED_OPS	Micro-ops decoded	all
TRANSITIONS_PM_EVENT	0 to 1 transitions on the pm_event input	all
CPU_CLK_PM_EVENT	Processor cycles that occur when the pm_event input is asserted	all
COMPLETED_BRANCHES	Branch Instructions completed	all
COMPLETED_LOAD_OPS	Load micro-ops completed	all
COMPLETED_STORE_OPS	Store micro-ops completed	all
COMPLETION_REDIRECTS	Number of completion buffer redirects	all
BRANCHES_FINISHED	Branches finished	all
TAKEN_BRANCHES_FINISHED	Taken branches finished	all
TAKEN_BRANCHES_FINISHED_NOT_BTB	Finished unconditional branches that miss the BTB	all
BRANCHES_MISPREDICTED	Branch instructions mispredicted due to direction, target, or IAB prediction	all
BRANCHES_MISPREDICTED_DIRECTION	Branches mispredicted due to direction prediction	all
BTB_HITS	Branches that hit in the BTB, or missed but are not taken	all
DECODE_STALLED	Cycles the instruction buffer was not empty, but 0 instructions decoded	all
ISSUE_STALLED	Cycles the SFX/CFX issue queue is not empty but 0 instructions issued	all
BRANCH_ISSUE_STALLED	Cycles the branch buffer is not empty but 0 instructions issued	all
SFX0_SCHEDULE_STALLED	Cycles SFX0 is not empty but 0 instructions scheduled	all
SFX1_SCHEDULE_STALLED	Cycles SFX1 is not empty but 0 instructions scheduled	all
CFX_SCHEDULE_STALLED	Cycles CFX is not empty but 0 instructions scheduled	all
LSU_SCHEDULE_STALLED	Cycles LSU is not empty but 0 instructions scheduled	all
BU_SCHEDULE_STALLED	Cycles BU is not empty but 0 instructions scheduled	all
TOTAL_TRANSLATED	Total LSU micro-ops that reach the second stage of the LSU	all
LOADS_TRANSLATED	Cacheable load micro-ops translated. (Does not include WT)	all
STORES_TRANSLATED	Cacheable store micro-ops translated. (Does not include WT)	all
TOUCHES_TRANSLATED	Cacheable touch instructions translated. Includes: dcbt / dcbtep, dcbtst / dcbtstep, icbt ct=2	all
CACHEOPS_TRANSLATED	Number of dcba, dcbf, dcbst, and dcbz instructions translated (e500 traps on dcbi)	all
CACHEINHIBITED_ACCESSES_TRANSLATED	Number of cache inhibited accesses translated	all
GUARDED_LOADS_TRANSLATED	Number of guarded loads translated	all
WRITETHROUGH_STORES_TRANSLATED	Number of write-through stores translated	all
MISALIGNED_ACCESSES_TRANSLATED	Number of misaligned load or store accesses translated.	all
FETCH_2X4_HITS	Each fetch retrieves up to 8 instructions, but only the first 4 are required. This event increments if at least one instruction of the second 4 is actually used.	all
FETCH_HITS_ON_PREFETCHES	Fetch hits on instruction prefetch when the data is still in the ILFB.	all
GENERATED_FETCH_PREFETCHES	Number of prefetches generated.	all
DL1_RELOADS	This is historically used to determine dcache miss rate (along with loads/stores completed). This counts dL1 reloads for any reason.	all
LOAD_MISS_WITH_LOAD_QUEUE_FULL	Counts number of stalls; Com:52 counts cycles stalled. Includes: cacheable loads, CI loads, loadec, larx, touches, ibll, ibsl, ibllsl	all
LOAD_GUARDED_MISS_NOT_LAST_REPLAYS	Load guarded miss when the load is not yet at the bottom of the completion buffer.	all
STORE_TRANSLATED_QUEUE_FULL_REPLAYS	Translate a store when the StQ is full.	all
ADDRESS_COLLISION_REPLAYS	Address collision.	all
DTLB_MISS_REPLAYS	Counts number of stalls; Com:56 counts cycles stalled.	all
DTLB_BUSY_REPLAYS	Counts number of stalls; Com:57 counts cycles stalled.	all
SECOND_PART_MISALIGNED_AFTER_MISS_REPLAYS	Second part of misaligned access when first part missed in cache.	all
LOAD_MISS_QUEUE_FULL_CYCLES	Cycles stalled on replay condition - Load miss with load queue full.	all
LOAD_GUARDED_MISS_NOT_LAST_CYCLES	Cycles stalled on replay condition - Load guarded miss when the load is not yet at the bottom of the completion buffer.	all
STORE_TRANSLATED_QUEUE_FULL_CYCLES	Cycles stalled on replay condition - Translate a store when the StQ is full.	all
ADDRESS_COLLISION_CYCLES	Cycles stalled on replay condition - Address collision.	all
DTLB_MISS_CYCLES	Cycles stalled on replay condition - DTLB miss.	all
DTLB_BUSY_CYCLES	Cycles stalled on replay condition - DTLB busy.	all
SECOND_PART_MISALIGNED_AFTER_MISS_CYCLES	Cycles stalled on replay condition - Second part of misaligned access when first part missed in cache.	all
IL1_FETCH_RELOADS	This is historically used to determine icache miss rate (along with instructions completed). Counts reloads due to demand fetch.	all
FETCHES	Counts fetches that write at least one instruction to the Instruction Buffer.	all
IMMU_TLB4K_RELOADS	iMMU TLB4K reloads	all
IMMU_VSP_RELOADS	iMMU VSP reloads	all
DMMU_TLB4K_RELOADS	dMMU TLB4K reloads	all
DMMU_VSP_RELOADS	dMMU VSP reloads	all
L2MMU_MISSES	Counts iTLB/dTLB error interrupt	all
TAKEN_BRANCHES	Completed branch instructions that were taken.	all
TAKEN_BLR	Completed blr instructions that were taken.	all
BTB_TARGET_MISPREDICT	Number of target mispredicts (BTB).	all
MISPREDICT_TARGET_BLR	Number of link stack mispredicts (LS).	all
TAKEN_BTB_BUT_MISS	Number of BTB misses, but taken (BTB allocates).	all
PMC0_OVERFLOW	Counts the number of times PMC0[32] transitioned from 1 to 0.	all
PMC1_OVERFLOW	Counts the number of times PMC1[32] transitioned from 1 to 0.	all
PMC2_OVERFLOW	Counts the number of times PMC2[32] transitioned from 1 to 0.	all
PMC3_OVERFLOW	Counts the number of times PMC3[32] transitioned from 1 to 0.	all
INTERRUPTS	Number of interrupts taken	all
EXTERNAL_INTERRUPTS	Number of external input interrupts taken	all
CRITICAL_INTERRUPTS	Number of critical input interrupts taken	all
SC_TRAP_INTERRUPTS	Number of system call and trap interrupts	all
TBL_BIT_TRANS_PMGC0	Counts transitions of the TBL bit selected by PMGC0[TBSEL].	all
PMC4_OVERFLOW	Counts the number of times PMC4[32] transitioned from 1 to 0.	all
PMC5_OVERFLOW	Counts the number of times PMC5[32] transitioned from 1 to 0.	all
L1_STASH_HIT	Stash hits in L1 Data Cache.	all
L1_STASH_REQ	Stash requests to L1 Data Cache.	all
TIMES_LSU_THREAD_PRIO_SWTICHED	Number of times the Load Store Unit thread priority switched based on resource collisions.	all
CLK_THREAD_REQ_FPU_DENIED	Number of cycles both threads had Floating Point Unit requests and one was denied.	all
CLK_THREAD_REQ_VPERM_DENIED	Number of cycles both threads had Altivec Permute requests and one was denied.	all
CLK_THREAD_REQ_VGEN_DENIED	Number of cycles both threads had Altivec General requests and one was denied.	all
CLK_THREAD_REQ_CFX_DENIED	Number of cycles both threads had Complex Fixed-Point Unit requests and one was denied.	all
CLK_THREAD_REQ_FETCH_DENIED	Number of cycles both threads made a Fetch request to the L1 Instruction Cache and one thread won arbitration.	all
CLK_LSU_ISSUE_STALLED	Cycles the LSU issue queue is not empty but 0 instructions issued.	all
CLK_FPU_ISSUE_STALLED	Cycles the FPU issue queue is not empty but 0 instructions issued.	all
CLK_ALTIVEC_ISSUE_STALLED	Cycles the AltiVec issue queue is not empty but 0 instructions issued.	all
CLK_FPU_SCHEDULE_STALLED	Cycles FPU is not empty but 0 instructions scheduled.	all
CLK_VPERM_SCHEDULE_STALLED	Cycles VPERM is not empty but 0 instructions scheduled.	all
CLK_VGEN_SCHEDULE_STALLED	Cycles VGEN is not empty but 0 instructions scheduled.	all
CLK_VPU_INSTRUCTION_WAIT_FOR_OPERA	Cycles VPU instruction waits for operands.	all
CLK_VFPU_INSTRUCTION_WAIT_FOR_OPERA	Cycles VFPU instruction waits for operands.	all
CLK_VSFX_INSTRUCTION_WAIT_FOR_OPERA	Cycles VSFX instruction waits for operands	all
CLK_VCFX_INSTRUCTION_WAIT_FOR_OPERA	Cycles VCFX instruction waits for operands.	all
CLK_IB_EMPT	Number of cycles the Instruction Buffer is empty	all
CLK_IB_FULL	Number of cycles the Instruction Buffer is full enough such that fetch stops fetching.	all
CLK_CB_EMPT	Number of cycles the Completion Buffer is empty.	all
CLK_CB_FULL	Number of cycles the Completion Buffer is full enough such that decode stops.	all
CLK_PRESYNC_SI_IB	Number of cycles a pre-sync serialized instruction holds in the Instruction Buffer and is not decoded.	all
COMPLETED_CLK_0_INSTRUCTIONS	Increments if 0 instructions (micro-ops) completed.	all
COMPLETED_CLK_1_INSTRUCTIONS	Increments if 1 instruction (micro-op) completed.	all
COMPLETED_CLK_2_INSTRUCTIONS	Increments if 2 instructions (micro-ops) completed.	all
DETECTED_IAC5S	Every valid IAC5 detection.	all
DETECTED_IAC6S	Every valid IAC6 detection.	all
DETECTED_IAC7S	Every valid IAC7 detection.	all
DETECTED_IAC8S	Every valid IAC8 detection.	all
DETECTED_IAC1S	Every valid IAC1 detection.	all
DETECTED_IAC2S	Every valid IAC2 detection.	all
DETECTED_IAC3S	Every valid IAC3 detection.	all
DETECTED_IAC4S	Every valid IAC4 detection.	all
DETECTED_DAC1S	Every valid DAC1 detection.	all
DETECTED_DAC2S	Every valid DAC2 detection.	all
DETECTED_DVT0	Detection of a write to DEVENT SPR with DVT0 set.	all
DETECTED_DVT1	Detection of a write to DEVENT SPR with DVT1 set.	all
DETECTED_DVT2	Detection of a write to DEVENT SPR with DVT2 set.	all
DETECTED_DVT3	Detection of a write to DEVENT SPR with DVT3 set.	all
DETECTED_DVT4	Detection of a write to DEVENT SPR with DVT4 set.	all
DETECTED_DVT5	Detection of a write to DEVENT SPR with DVT5 set.	all
DETECTED_DVT6	Detection of a write to DEVENT SPR with DVT6 set.	all
DETECTED_DVT7	Detection of a write to DEVENT SPR with DVT7 set.	all
CLK_COMPLETION_STALLED	Number of completion cycles stalled due to Nexus FIFO full.	all
FPU_FINISH	FPU finish.	all
CLK_FPU_DIV	Counts once for every cycle of divide execution. (fdivs and fdiv).	all
FPU_DENORM_INPUT	Counts extra cycles of delay due to denormalized inputs. One denormalized operand increments this 4 times; two operands increment it 5 times. This shows the real penalty due to denorms, not just how often they occur.	all
FPU_DENORM_OUTPUT	FPU denorm output.	all
FPU_FPSCR_FULL_STALL	FPU FPSCR stall.	all
FPU_PIPE_SYNC_STALL	Synchronization-op stalls: counts once for each cycle that a "break-before" FPU op is in the RS/issue stage but cannot issue. Also counts once for each cycle that an FPU op is in the RS/issue stage but cannot issue due to "break-after" of an FPU op currently in progress.	all
FPU_INPUT_DATA_STALL	FPU data-ready stall: cycles in which there is an op in the RS/issue stage that cannot issue because one or more of its operands is not yet available.	all
FPU_INSTRUCTIONS_GEN_FLAG	FPU instruction sets FPSCR[FEX].	all
PW20_CNT	Number of times the core enters the PW20 power management state.	all
DECORATED_LOADS	Number of decorated loads to cache inhibited memory performed.	all
DECORATED_STORES	Number of decorated stores to cache inhibited memory performed.	all
NUM_INSTRUCTIONS_SUCC	Number of successful stbcx., sthcx., stwcx., or stdcx. instructions.	all
NUM_INSTRUCTIONS_UNSUCC	Number of unsuccessful stbcx., sthcx., stwcx., or stdcx. instructions.	all
COMPLETED_LSU_MICROOPS	Completed Load Store Unit micro-ops. Every micro-op that goes down the LSU pipe. Includes: GPR loads / GPR stores, FPR loads / FPR stores, VR loads / VR stores, cache ops, memory barriers, other LSU ops (dsn, msgsnd, mvidsplt, mviwsplt, tlbilx, tlbivax, tlbsync)	all
COMPLETED_GPR_LOADS	GPR load micro-ops completed. This event only counts once for misaligns. Note that lmw that causes a fault may end up double-counting micro-ops -- once for first pass, once for second pass.	all
COMPLETED_GPR_STORES	GPR store micro-ops completed. This event only counts once for misaligns. Note that stmw that causes a fault may end up double-counting micro-ops -- once for first pass, once for second pass.	all
COMPLETED_CACHEOPS	Cache ops completed. Includes: dcba / dcbal, dcbf / dcbfep, dcbi, dcblc / dcblq, dcbst / dcbstep, dcbt / dcbtep / dcbtls, dcbtst / dcbtstep / dcbtstls, dcbz / dcbzep / dcbzl / dcbzlep, icbi / icbiep, icblc / icblq., icbt / icbtls	all
COMPLETED_MEM_BARRIERS	Memory barriers completed. Includes: msync (sync, lwsync, elemental barriers), mbar (eieio), miso.	all
COMPLETED_SFX_MICROOPS	SFX micro-ops completed.	all
COMPLETED_SINCLK_SFX_MICROOPS	SFX single-cycle micro-ops completed.	all
COMPLETED_DBLCLK_SFX_MICROOPS	SFX double-cycle micro-ops completed.	all
COMPLETED_CFX_INSTRUCTIONS	CFX instructions completed.	all
COMPLETED_SFX_CFX_INSTRUCTIONS	SFX or CFX instructions completed.	all
COMPLETED_FPU_INSTRUCTIONS	FPU instructions completed.	all
COMPLETED_FPR_MICROOPS_LOADS	FPR load micro-ops completed.	all
COMPLETED_FPR_MICROOPS_STORES	FPR store micro-ops completed.	all
COMPLETED_FPR_MICROOPS_LOADS_STORES	FPR load and store micro-ops completed.	all
COMPLETED_FPR_SINPRECISE_LOADS_STORES	FPR single-precision load and store micro-ops completed.	all
COMPLETED_FPR_DBLPRECISE_LOADS_STORES	FPR double-precision load and store micro-ops completed.	all
COMPLETED_ALTIVEC_INSTRUCTIONS	AltiVec instructions completed. (non-LSU).	all
COMPLETED_ALTIVEC_VSFX_INSTRUCTIONS	AltiVec VSFX instructions completed.	all
COMPLETED_ALTIVEC_VCFX_INSTRUCTIONS	AltiVec VCFX instructions completed.	all
COMPLETED_ALTIVEC_VPU_INSTRUCTIONS	AltiVec VPU instructions completed.	all
COMPLETED_ALTIVEC_VFPU_INSTRUCTIONS	AltiVec VFPU instructions completed.	all
COMPLETED_VR_LOADS_MICROOPS	VR load micro-ops completed.	all
COMPLETED_VR_STORES_MICROOPS	VR store micro-ops completed.	all
VSCR_SAT_SET	Number of times the saturate bit flips from 0 to 1.	all
CLK_SFX0_IDLE	Cycles Simple Fixed Point Unit 0 is idle.	all
CLK_SFX1_IDLE	Cycles Simple Fixed Point Unit 1 is idle.	all
CLK_CFX_IDLE	Cycles Complex Fixed Point Unit is idle.	all
CLK_LSU_IDLE	Cycles Load Store Unit is idle.	all
CLK_BU_IDLE	Cycles Branch Unit is idle.	all
CLK_FPU_IDLE	Cycles Floating Point Unit is idle.	all
CLK_VPU_IDLE	Cycles AltiVec Permute Unit is idle.	all
CLK_VFPU_IDLE	Cycles AltiVec Floating Point Unit is idle.	all
CLK_VSFX_IDLE	Cycles AltiVec Simple Fixed Point Unit is idle.	all
CLK_VCFX_IDLE	Cycles AltiVec Complex Fixed Point Unit is idle.	all
L1_CACHE_MISSES	Data L1 cache misses. (Includes load, store, cache ops).	all
L1_CACHE_LOAD_MISSES	Data L1 cache load misses.	all
L1_CACHE_STORE_MISSES	Data L1 cache store misses.	all
LMQ_ALLOCATED_LOADS	Loads that allocate into Load Miss Queue. (Data L1 cache misses, but may not be to different cache lines).	all
LOAD_THREAD_MISS_COLLISION	Number of times that this thread's load hits a line that is valid for the other thread but not this thread.	all
INTERTHREAD_STATUS_ARRAY_COLLISION	Number of times that two threads collide on status array access.	all
NUM_SGB_ALLOC	Number of Store Gather Buffer allocates.	all
NUM_SGB_GATHERS	Number of Store Gather Buffer gathers.	all
NUM_SGB_OVERFLOWS	Number of Store Gather Buffer overflows. (Causes SGB full condition when additional store request is made).	all
NUM_SGB_PROMOTIONS	Number of Store Gather Buffer promotions.	all
NUM_SGB_INORDER_PROMOTIONS	Number of Store Gather Buffer in-order promotions. (Also includes oldest-entry timeout condition).	all
NUM_SGB_OUTOFORDER_PROMOTIONS	Number of Store Gather Buffer out-of-order promotions.	all
NUM_SGB_HP_PROMOTIONS	Number of Store Gather Buffer high-priority promotions. (Load hits on pending store).	all
NUM_SGB_MISO_PROMOTIONS	Number of Store Gather Buffer miso promotions. (Load hits on pending store).	all
NUM_SGB_WATERMARK_PROMOTIONS	Number of Store Gather Buffer watermark promotions.	all
NUM_SGB_OVERFLOW_PROMOTIONS	Number of Store Gather Buffer overflow promotions.	all
CLK_DLAQ_FULL	Number of cycles the DLink Age Queue is full.	all
TIMES_DLAQ_FULL	Number of times the DLink Age Queue is full.	all
CLK_LRSAQ_FULL	Number of cycles the Load Reservation Set Age Queue is full.	all
TIMES_LRSAQ_FULL	Number of times the Load Reservation Set Age Queue is full.	all
CLK_FWDAQ_FULL	Number of cycles the Forward Age Queue is full.	all
TIMES_FWDAQ_FULL	Number of times the Forward Age Queue is full.	all
NUM_FWD_STQ_COLLISION_TIMES	Number of times a Store Queue collision is forwardable. The following cases are not forwardable: store address + size does not contain the load, cache-inhibited store, denormalized floating point store, stcx, guarded load.	all
NUM_FWD_STQ_COLLISION_TIMES_DATA_RDY	Number of times a Store Queue collision is forwardable and is ready with data to forward.	all
NUM_FWD_STQ_COLLISION_TIMES_DATA_NORDY	Number of times a Store Queue collision is forwardable but is not ready with data to forward.	all
NUM_NOFWD_STQ_COLLISION_TIMES	Number of times a Store Queue collision is not forwardable and must wait until the store leaves the Store Queue.	all
NUM_FWD_STQ_COLLISION_CLK	Number of cycles a Store Queue collision is forwardable. (Number of cycles from the detection of a forwardable Store Queue entry until the load is replayed in stg1).	all
NUM_FWD_STQ_COLLISION_CLK_DATA_RDY	Number of cycles a Store Queue collision is forwardable and is ready with data to forward. (Number of cycles from the detection of a forwardable Store Queue entry with valid data until the load is replayed in stg1).	all
NUM_FWD_STQ_COLLISION_CLK_DATA_NORDY	Number of cycles a Store Queue collision is forwardable but is not ready with data to forward. (Number of cycles from the detection of a forwardable Store Queue entry without valid data until the load is replayed in stg1).	all
NUM_NOFWD_STQ_COLLISION_CLK	Number of cycles a Store Queue collision is not forwardable and has to wait until the store leaves the Store Queue. (Number of cycles from the detection of a non-forwardable Store Queue entry until the load is replayed in stg1).	all
NUM_FALSE_EA_COLLISION	Number of times the lower 12 bits of the EA matched but the upper bits did not, leading to a false load-on-store replay. Cycle penalty is 4x the number of times.	all
NUM_LSO_BUS_COLLISION	Number of LS0 result bus collisions. Cycle penalty is 3x this measurement.	all
NUM_INTERTHREAD_DBLWORKD_BANK_COLLISION	Number of inter-thread double-word bank collisions. Measures when both threads attempt to access the same double-word bank. Cycle penalty is 3x this measurement.	all
L1_CACHE_IM	Instruction L1 cache demand fetch misses. (Includes icbtls. Does not include prefetch).	all
IMMU_MISSES	Counts misses in the level 1 Instruction MMU.	all
IMMU_TLB4K_HITS	Counts hits in the level 1 Instruction MMU TLB-4K.	all
IMMU_VSP_HITS	Counts hits in the level 1 Instruction MMU VSP.	all
CLK_IMMU_HW_TABLEWALK	Counts IMMU cycles spent in hardware tablewalk. This represents the cycles from the point where the L2 MMU miss occurs to when the page table walk completes with a valid translation or exception.	all
DMMU_MISSES	Counts misses in the level 1 Data MMU. (Does not count replayed operations).	all
DMMU_TLB4K_HITS	Counts hits in the level 1 Data MMU TLB-4K. (Does not count replayed operations).	all
DMMU_VSP_HITS	Counts hits in the level 1 Data MMU VSP. (Does not count replayed operations).	all
CLK_DMMU_HW_TABLEWALK	Counts DMMU cycles spent in hardware tablewalk. This represents the cycles from the point where the L2 MMU miss occurs to when the page table walk completes with a valid translation or exception.	all
L2MMU_MISSES	Counts level 2 MMU misses. (Does not count misses that occur due to dcbt / dcbtst / dcba / dcbal instructions that fail translation and are no-oped. Does not count misses in L2MMU-VSP when looking up an indirect entry).	all
L2MMU_4K_HITS	Counts level 2 MMU hits in L2MMU-4K.	all
L2MMU_VSP_HITS	Counts level 2 MMU hits in L2MMU-VSP. (Does not count indirect lookups).	all
L2MMU_INDIRECT_MISSES	Counts level 2 MMU indirect misses. This represents indirect entry lookups that do not have a matching indirect entry.	all
L2MMU_INDIRECT_VALID_MISSES	Counts level 2 MMU indirect valid misses. This occurs when the indirect entry is valid, but the corresponding PTE[V] = 0 or the permissions in the PTE are not sufficient for the requested access.	all
LRAT_MISSES	Counts Logical to Real Address Translation misses. This includes LRAT misses from tlbwe instructions or from page table translations.	all
CLK_LMQ_LOSE_DLINK_DUE_SGB	Cycles the Load Miss Queue loses DLINK arbitration due to the Store Gather Buffer.	all
CLK_SGB_LOSE_DLINK_DUE_LMQ	Cycles the Store Gather Buffer loses DLINK arbitration due to the Load Miss Queue.	all
CLK_THREAD_LOSE_DLINK_DUE_OTHER_THREAD	Cycles thread loses DLINK arbitration due to other thread.	all
DECODE_MASK_VALUE	One mask/value pair that allows instructions to be counted in Decode.	all
SHR_L2_DLINK_REQ	Number of DLINK requests made from core to Shared L2.	all
SHR_L2_ILINK_REQ	Number of ILINK requests made from core to Shared L2. (Includes instruction fetches and L2MMU hardware tablewalk requests).	all
SHR_L2_RLINK_REQ	Number of RLINK requests made from Shared L2 to core. (back invalidates, stashes, barriers).	all
SHR_L2_BLINK_REQ	Number of BLINK requests made from Shared L2 to core. (back invalidates, stashes, barriers).	all
SHR_L2_CLINK_REQ	Number of CLINK requests made from Shared L2 to core. (back invalidates, stashes, barriers).	all
L2_HITS	Number of L2 Cache hits. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_MISSES	Number of L2 Cache misses. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_DEMAND_ACCESS	Number of L2 Cache demand accesses. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_ACCESSES	Number of L2 Cache accesses from all sources (demand, reload, snoop, etc). Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_STORE_ALLOCATE	Number of L2 Cache store allocates. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_INSTRUCTIONS_ACCESS	Number of L2 Cache instruction accesses. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_DATA_ACCESS	Number of L2 Cache data accesses. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_INSTRUCTIONS_MISSES	Number of L2 Cache instruction misses. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_DATA_MISSES	Number of L2 Cache data misses. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_HITS_PER_THREAD	Number of times this core/thread hits in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_MISSES_PER_THREAD	Number of times this core/thread misses in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_DEMAND_ACCESS_PER_THREAD	Number of times this core/thread makes a demand access to the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_STORE_ALLOC_PER_THREAD	Number of times a store from this core/thread allocates in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_INSTRUCTIONS_ACCESS_PER_THREAD	Number of times an instruction from this core/thread accesses the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_DATA_ACCESS_PER_THREAD	Number of times a data operation from this core/thread accesses the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_INSTRUCTION_MISSES_PER_THREAD	Number of times an instruction from this core/thread misses in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_DATA_MISSES_PER_THREAD	Number of times a data operation from this core/thread misses in the L2 Cache. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_RELOAD_FROM_CORENET	Number of L2 Cache reloads from CoreNet. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_IN_STASH_REQ	Number of incoming L2 Cache stash requests. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_STASH_REQ_DOWNGRD_TO_SNOOPS	Number of incoming L2 Cache stash requests downgraded to snoops. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_SNOOPS_HITS	Number of L2 Cache snoop hits. Counts 0, 1, 2, 3, or 4 per cycle.	all
L2_SNOOPS_MINT	Number of L2 Cache snoops causing MINT.	all
L2_SNOOPS_SINT	Number of L2 Cache snoops causing SINT.	all
L2_SNOOPS_PUSHES	Number of L2 Cache snoop pushes.	all
CLK_BIB_STALL	Stall for Back Invalidate Buffer entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle.	all
CLK_RLT_STALL	Stall for Reload Table entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle.	all
CLK_RLFQ_STALL	Stall for Reload Fold Queue entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle.	all
CLK_DTQ_STALL	Stall for Data Transaction Queue entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle.	all
CLK_COB_STALL	Stall for Castout Buffer entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle.	all
CLK_WDB_STALL	Stall for Write Data Buffer entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle.	all
CLK_RLDB_STALL	Stall for Reload Data Buffer entry (cycles). Counts 0, 1, 2, 3, or 4 per cycle.	all
CLK_SNPQ_STALL	Stall for Snoop Queue entry (cycles).	all
BIU_MASTER_REQ	Master transaction starts. (Number of AOut sent to CoreNet).	all
BIU_MASTER_GLOBAL_REQ	Master transaction starts that are global. (Number of AOut with M=1 sent to CoreNet).	all
BIU_MASTER_DATA_SIDE_REQ	Master data-side transaction starts. (Number of D-side AOut sent to CoreNet).	all
BIU_MASTER_INSTRUCTION_SIDE_REQ	Master instruction-side transaction starts. (Number of I-side AOut sent to CoreNet).	all
L2_STASH_REQ	Stash request on AIn matches stash IDs for core or L2.	all
L2_SNOOP_REQ	Externally generated snoop requests. (Number of AIn from CoreNet not from self).	all
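
Several descriptions in the table imply simple derived metrics: DL1_RELOADS and IL1_FETCH_RELOADS are described as miss-rate inputs, and NUM_FALSE_EA_COLLISION and NUM_LSO_BUS_COLLISION state a fixed cycle penalty per occurrence. The following is a minimal sketch of that arithmetic, not a definitive method. The counts are made-up sample values, the function names are hypothetical, and pairing DL1_RELOADS with COMPLETED_LOAD_OPS + COMPLETED_STORE_OPS is an assumption about what "loads/stores completed" refers to.

```python
# Hypothetical raw event totals keyed by event name (sample values only;
# real totals would come from a profiling run on the target).
counts = {
    "CPU_CLK": 1_000_000,
    "COMPLETED_INSNS": 800_000,
    "COMPLETED_LOAD_OPS": 200_000,
    "COMPLETED_STORE_OPS": 100_000,
    "DL1_RELOADS": 6_000,
    "IL1_FETCH_RELOADS": 1_600,
    "NUM_FALSE_EA_COLLISION": 500,
    "NUM_LSO_BUS_COLLISION": 300,
}


def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_metrics(c):
    """Compute derived metrics from a dict of raw event totals."""
    # Assumption: "loads/stores completed" means the two micro-op events below.
    mem_ops = c["COMPLETED_LOAD_OPS"] + c["COMPLETED_STORE_OPS"]
    return {
        # Completed instructions per processor cycle.
        "ipc": ratio(c["COMPLETED_INSNS"], c["CPU_CLK"]),
        # dL1 reloads per completed load/store micro-op (dcache miss-rate proxy).
        "dl1_reloads_per_mem_op": ratio(c["DL1_RELOADS"], mem_ops),
        # iL1 demand-fetch reloads per completed instruction (icache miss-rate proxy).
        "il1_reloads_per_insn": ratio(c["IL1_FETCH_RELOADS"], c["COMPLETED_INSNS"]),
        # The table states a penalty of 4x the number of false EA collisions.
        "false_ea_penalty_cycles": 4 * c["NUM_FALSE_EA_COLLISION"],
        # The table states a penalty of 3x the number of LS0 result bus collisions.
        "ls0_bus_penalty_cycles": 3 * c["NUM_LSO_BUS_COLLISION"],
    }


if __name__ == "__main__":
    for name, value in derived_metrics(counts).items():
        print(f"{name}: {value}")
```

Because DL1_RELOADS counts reloads for any reason, the resulting ratio is better read as an approximation of the miss rate than as an exact figure.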

2020/07/20