This is a list of all IBM POWER9 microarchitecture performance counter event types. Please see the IBM POWER9 Processor User's Manual and the POWER9 Performance Monitor Unit User's Guide.
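These events are exposed through the Linux perf_events subsystem (used by tools such as perf(1) and OProfile's operf). As a minimal sketch of how a counter from this list is programmed, the C program below counts CPU cycles via perf_event_open(2). It is illustrative only: busy_loop() is a hypothetical stand-in workload, and it requests the generic PERF_COUNT_HW_CPU_CYCLES event, which is assumed here to resolve to PM_CYC on this platform; the other PM_* events in the table would instead be requested by name through perf or as raw PMU event codes.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Thin wrapper: glibc provides no perf_event_open() symbol. */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical workload standing in for real code. */
static void busy_loop(void)
{
    volatile unsigned long x = 0;
    for (unsigned long i = 0; i < 100000000UL; i++)
        x += i;
}

int main(void)
{
    struct perf_event_attr attr;
    long long count;
    int fd;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_CPU_CYCLES; /* generic cycles; assumed to map to PM_CYC */
    attr.disabled = 1;        /* start stopped; enable around the region of interest */
    attr.exclude_kernel = 1;  /* user-mode cycles only */

    fd = perf_event_open(&attr, 0 /* this process */, -1 /* any CPU */, -1, 0);
    if (fd == -1) {
        perror("perf_event_open");
        return EXIT_FAILURE;
    }

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    busy_loop();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    if (read(fd, &count, sizeof(count)) == sizeof(count))
        printf("cycles: %lld\n", count);
    close(fd);
    return 0;
}
```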
Name | Description | Counters usable | Unit mask options |
CYCLES | Cycles | 0 | |
PM_1PLUS_PPC_CMPL | 1 or more ppc insts finished (completed). | 0 | |
PM_1PLUS_PPC_DISP | Cycles at least one Instr Dispatched. Could be a group with only microcode. | 3 | |
PM_ANY_THRD_RUN_CYC | Any thread in run_cycles (was one thread in run_cycles). | 0 | |
PM_BR_MPRED_CMPL | Number of Branch Mispredicts. | 3 | |
PM_BR_TAKEN_CMPL | Branch Taken. | 1 | |
PM_CYC | Cycles. | 0, 1, 2, 3 | |
PM_DATA_FROM_L2MISS | Demand LD - L2 Miss (not L2 hit). | 1 | |
PM_DATA_FROM_L3MISS | Demand LD - L3 Miss (not L2 hit and not L3 hit). | 2 | |
PM_DATA_FROM_MEM | Data cache reload from memory (including L4). | 3 | |
PM_DTLB_MISS | Data PTEG Reloaded (DTLB Miss). | 2 | |
PM_EXT_INT | external interrupt. | 1 | |
PM_FLOP | Floating Point Operations Finished. | 0 | |
PM_FLUSH | Flush (any type). | 3 | |
PM_GCT_NOSLOT_CYC | Pipeline empty (No itags assigned, no GCT slots used). | 0 | |
PM_IERAT_RELOAD | IERAT Reloaded (Miss). | 0 | |
PM_INST_DISP | PPC Dispatched. | 1 | |
PM_INST_FROM_L3MISS | Inst from L3 miss. | 2 | |
PM_ITLB_MISS | ITLB Reloaded. | 3 | |
PM_L1_DCACHE_RELOAD_VALID | DL1 reloaded due to Demand Load. | 2 | |
PM_L1_ICACHE_MISS | Demand iCache Miss. | 1 | |
PM_LD_MISS_L1 | Load Missed L1. | 2 | |
PM_LSU_DERAT_MISS | DERAT Reloaded (Miss). | 1 | |
PM_MRK_BR_MPRED_CMPL | Marked Branch Mispredicted. | 2 | |
PM_MRK_BR_TAKEN_CMPL | Marked Branch Taken. | 0 | |
PM_MRK_DATA_FROM_L2MISS | Data cache reload L2 miss. | 3 | |
PM_MRK_DATA_FROM_L3MISS | The processor's data cache was reloaded from a location other than the local core's L3 due to a marked load. | 1 | |
PM_MRK_DATA_FROM_MEM | The processor's data cache was reloaded from a memory location, including L4 from local, remote, or distant, due to a marked load. | 1 | |
PM_MRK_DERAT_MISS | ERAT Miss (TLB Access), all page sizes. | 2 | |
PM_MRK_DTLB_MISS | Marked dtlb miss. | 3 | |
PM_MRK_INST_CMPL | marked instruction completed. | 3 | |
PM_MRK_INST_DISP | Marked Instruction dispatched. | 0 | |
PM_MRK_INST_FROM_L3MISS | n/a | 3 | |
PM_MRK_L1_ICACHE_MISS | Marked L1 Icache Miss. | 0 | |
PM_MRK_L1_RELOAD_VALID | Marked demand reload. | 0 | |
PM_MRK_LD_MISS_L1 | Marked DL1 Demand Miss counted at exec time. | 1 | |
PM_MRK_ST_CMPL | Marked store completed. | 0 | |
PM_RUN_CYC | Run_cycles. | 5 | |
PM_RUN_INST_CMPL | Run_Instructions. | 4 | |
PM_RUN_PURR | Run_PURR. | 3 | |
PM_ST_FIN | Store Instructions Finished (store sent to nest). | 1 | |
PM_ST_MISS_L1 | Store Missed L1. | 2 | |
PM_TB_BIT_TRANS | timebase event. | 2 | |
PM_THRD_CONC_RUN_INST | Concurrent Run Instructions. | 2 | |
PM_THRESH_EXC_1024 | Threshold counter exceeded a value of 1024. | 2 | |
PM_THRESH_EXC_128 | Threshold counter exceeded a value of 128. | 3 | |
PM_THRESH_EXC_2048 | Threshold counter exceeded a value of 2048. | 3 | |
PM_THRESH_EXC_256 | Threshold counter exceeded a value of 256. | 0 | |
PM_THRESH_EXC_32 | Threshold counter exceeded a value of 32. | 1 | |
PM_THRESH_EXC_4096 | Threshold counter exceeded a value of 4096. | 0 | |
PM_THRESH_EXC_512 | Threshold counter exceeded a value of 512. | 1 | |
PM_THRESH_EXC_64 | Threshold counter exceeded a value of 64. | 2 | |
PM_THRESH_MET | Threshold exceeded. | 0 | |
PM_1FLOP_CMPL | one flop (fadd, fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg) operation completed | 3 | |
PM_1PLUS_PPC_CMPL | 1 or more ppc insts finished | 0 | |
PM_1PLUS_PPC_DISP | Cycles at least one Instr Dispatched | 3 | |
PM_2FLOP_CMPL | DP vector version of fmul, fsub, fcmp, fsel, fabs, fnabs, fres, fsqrte, fneg | 3 | |
PM_4FLOP_CMPL | 4 FLOP instruction completed | 3 | |
PM_8FLOP_CMPL | 8 FLOP instruction completed | 3 | |
PM_ANY_THRD_RUN_CYC | Cycles in which at least one thread has the run latch set | 0 | |
PM_BACK_BR_CMPL | Branch instruction completed with a target address less than current instruction address | 1 | |
PM_BANK_CONFLICT | Read blocked due to interleave conflict. The ifar logic will detect an interleave conflict and kill the data that was read that cycle. | 0, 1, 2, 3 | |
PM_BFU_BUSY | Cycles in which all 4 Binary Floating Point units are busy. The BFU is running at capacity | 2 | |
PM_BR_2PATH | Branches that are not strongly biased | 1, 3 | |
PM_BR_CMPL | Any Branch instruction completed | 3 | |
PM_BR_CORECT_PRED_TAKEN_CMPL | Conditional Branch Completed in which the HW correctly predicted the direction as taken. Counted at completion time | 0, 1, 2, 3 | |
PM_BR_MPRED_CCACHE | Conditional Branch Completed that was Mispredicted due to the Count Cache Target Prediction | 0, 1, 2, 3 | |
PM_BR_MPRED_CMPL | Number of Branch Mispredicts | 3 | |
PM_BR_MPRED_LSTACK | Conditional Branch Completed that was Mispredicted due to the Link Stack Target Prediction | 0, 1, 2, 3 | |
PM_BR_MPRED_PCACHE | Conditional Branch Completed that was Mispredicted due to pattern cache prediction | 0, 1, 2, 3 | |
PM_BR_MPRED_TAKEN_CR | A Conditional Branch that resolved to taken was mispredicted as not taken (due to the BHT Direction Prediction). | 0, 1, 2, 3 | |
PM_BR_MPRED_TAKEN_TA | Conditional Branch Completed that was Mispredicted due to the Target Address Prediction from the Count Cache or Link Stack. Only XL-form branches that resolved Taken set this event. | 0, 1, 2, 3 | |
PM_BR_PRED | Conditional Branch Executed in which the HW predicted the Direction or Target. Includes taken and not taken and is counted at execution time | 0, 1, 2, 3 | |
PM_BR_PRED_CCACHE | Conditional Branch Completed that used the Count Cache for Target Prediction | 0, 1, 2, 3 | |
PM_BR_PRED_LSTACK | Conditional Branch Completed that used the Link Stack for Target Prediction | 0, 1, 2, 3 | |
PM_BR_PRED_PCACHE | Conditional branch completed that used pattern cache prediction | 0, 1, 2, 3 | |
PM_BR_PRED_TA | Conditional Branch Completed that had its target address predicted. Only XL-form branches set this event. This equals the sum of CCACHE, LSTACK, and PCACHE | 0, 1, 2, 3 | |
PM_BR_PRED_TAKEN_CR | Conditional Branch that had its direction predicted. I-form branches do not set this event. In addition, B-form branches which do not use the BHT do not set this event - these are branches with BO-field set to 'always taken' and branches | 0, 1, 2, 3 | |
PM_BR_TAKEN_CMPL | New event for Branch Taken | 1 | |
PM_BRU_FIN | Branch Instruction Finished | 0 | |
PM_BR_UNCOND | Unconditional Branch Completed. HW branch prediction was not used for this branch. This can be an I-form branch, a B-form branch with BO-field set to branch always, or a B-form branch which was converted to a Resolve. | 0, 1, 2, 3 | |
PM_BTAC_BAD_RESULT | BTAC thinks branch will be taken but it is either predicted not-taken by the BHT, or the target address is wrong (less common). In both cases, a redirect will happen | 0, 1, 2, 3 | |
PM_BTAC_GOOD_RESULT | BTAC predicts a taken branch and the BHT agrees, and the target address is correct | 0, 1, 2, 3 | |
PM_CHIP_PUMP_CPRED | Initial and Final Pump Scope was chip pump (prediction=correct) for all data types excluding data prefetch (demand load, inst prefetch, inst fetch, xlate) | 0 | |
PM_CLB_HELD | CLB (control logic block - indicates quadword fetch block) Hold: Any Reason | 0, 1, 2, 3 | |
PM_CMPLU_STALL | Nothing completed and ICT not empty | 0 | |
PM_CMPLU_STALL_ANY_SYNC | Cycles in which the NTC sync instruction (isync, lwsync or hwsync) is not allowed to complete | 0 | |
PM_CMPLU_STALL_BRU | Completion stall due to a Branch Unit | 3 | |
PM_CMPLU_STALL_CRYPTO | Finish stall because the NTF instruction was routed to the crypto execution pipe and was waiting to finish | 3 | |
PM_CMPLU_STALL_DCACHE_MISS | Finish stall because the NTF instruction was a load that missed the L1 and was waiting for the data to return from the nest | 1 | |
PM_CMPLU_STALL_DFLONG | Finish stall because the NTF instruction was a multi-cycle instruction issued to the Decimal Floating Point execution pipe and waiting to finish. Includes decimal floating point instructions + 128 bit binary floating point instructions. Qualified by multicycle | 0 | |
PM_CMPLU_STALL_DFU | Finish stall because the NTF instruction was issued to the Decimal Floating Point execution pipe and waiting to finish. Includes decimal floating point instructions + 128 bit binary floating point instructions. Not qualified by multicycle | 1 | |
PM_CMPLU_STALL_DMISS_L21_L31 | Completion stall by Dcache miss which resolved on chip (excluding local L2/L3) | 1 | |
PM_CMPLU_STALL_DMISS_L2L3 | Completion stall by Dcache miss which resolved in L2/L3 | 0 | |
PM_CMPLU_STALL_DMISS_L2L3_CONFLICT | Completion stall due to cache miss that resolves in the L2 or L3 with a conflict | 3 | |
PM_CMPLU_STALL_DMISS_L3MISS | Completion stall due to cache miss resolving missed the L3 | 3 | |
PM_CMPLU_STALL_DMISS_LMEM | Completion stall due to cache miss that resolves in local memory | 2 | |
PM_CMPLU_STALL_DMISS_REMOTE | Completion stall by Dcache miss which resolved from remote chip (cache or memory) | 1 | |
PM_CMPLU_STALL_DP | Finish stall because the NTF instruction was a scalar instruction issued to the Double Precision execution pipe and waiting to finish. Includes binary floating point instructions in 32 and 64 bit binary floating point format. Not qualified by multicycle. Qualified by NOT vector | 0 | |
PM_CMPLU_STALL_DPLONG | Finish stall because the NTF instruction was a scalar multi-cycle instruction issued to the Double Precision execution pipe and waiting to finish. Includes binary floating point instructions in 32 and 64 bit binary floating point format. Qualified by NOT vector AND multicycle | 2 | |
PM_CMPLU_STALL_EIEIO | Finish stall because the NTF instruction is an EIEIO waiting for response from L2 | 3 | |
PM_CMPLU_STALL_EMQ_FULL | Finish stall because the next to finish instruction suffered an ERAT miss and the EMQ was full | 2 | |
PM_CMPLU_STALL_ERAT_MISS | Finish stall because the NTF instruction was a load or store that suffered a translation miss | 3 | |
PM_CMPLU_STALL_EXCEPTION | Cycles in which the NTC instruction is not allowed to complete because it was interrupted by ANY exception, which has to be serviced before the instruction can complete | 2 | |
PM_CMPLU_STALL_EXEC_UNIT | Completion stall due to execution units (FXU/VSU/CRU) | 1 | |
PM_CMPLU_STALL_FLUSH_ANY_THREAD | Cycles in which the NTC instruction is not allowed to complete because any of the 4 threads in the same core suffered a flush, which blocks completion | 0 | |
PM_CMPLU_STALL_FXLONG | Completion stall due to a long latency scalar fixed point instruction (division, square root) | 3 | |
PM_CMPLU_STALL_FXU | Finish stall due to a scalar fixed point or CR instruction in the execution pipeline. These instructions get routed to the ALU, ALU2, and DIV pipes | 1 | |
PM_CMPLU_STALL_HWSYNC | completion stall due to hwsync | 2 | |
PM_CMPLU_STALL_LARX | Finish stall because the NTF instruction was a larx waiting to be satisfied | 0 | |
PM_CMPLU_STALL_LHS | Finish stall because the NTF instruction was a load that hit on an older store and it was waiting for store data | 1 | |
PM_CMPLU_STALL_LMQ_FULL | Finish stall because the NTF instruction was a load that missed in the L1 and the LMQ was unable to accept this load miss request because it was full | 3 | |
PM_CMPLU_STALL_LOAD_FINISH | Finish stall because the NTF instruction was a load instruction with all its dependencies satisfied just going through the LSU pipe to finish | 3 | |
PM_CMPLU_STALL_LRQ_FULL | Finish stall because the NTF instruction was a load that was held in LSAQ (load-store address queue) because the LRQ (load-reorder queue) was full | 1 | |
PM_CMPLU_STALL_LRQ_OTHER | Finish stall due to LRQ miscellaneous reasons: lost arbitration to LMQ slot, bank collisions, set prediction cleanup, set prediction multihit, and others | 0 | |
PM_CMPLU_STALL_LSAQ_ARB | Finish stall because the NTF instruction was a load or store that was held in LSAQ because an older instruction from SRQ or LRQ won arbitration to the LSU pipe when this instruction tried to launch | 3 | |
PM_CMPLU_STALL_LSU | Completion stall by LSU instruction | 1 | |
PM_CMPLU_STALL_LSU_FIN | Finish stall because the NTF instruction was an LSU op (other than a load or a store) with all its dependencies met and just going through the LSU pipe to finish | 0 | |
PM_CMPLU_STALL_LSU_FLUSH_NEXT | Completion stall of one cycle because the LSU requested to flush the next iop in the sequence. It takes 1 cycle for the ISU to process this request before the LSU instruction is allowed to complete | 1 | |
PM_CMPLU_STALL_LSU_MFSPR | Finish stall because the NTF instruction was a mfspr instruction targeting an LSU SPR and it was waiting for the register data to be returned | 2 | |
PM_CMPLU_STALL_LWSYNC | completion stall due to lwsync | 0 | |
PM_CMPLU_STALL_MTFPSCR | Completion stall because the ISU is updating the register and notifying the Effective Address Table (EAT) | 3 | |
PM_CMPLU_STALL_NESTED_TBEGIN | Completion stall because the ISU is updating the TEXASR to keep track of the nested tbegin. This is a short delay, and it includes ROT | 0 | |
PM_CMPLU_STALL_NESTED_TEND | Completion stall because the ISU is updating the TEXASR to keep track of the nested tend and decrement the TEXASR nested level. This is a short delay | 2 | |
PM_CMPLU_STALL_NTC_DISP_FIN | Finish stall because the NTF instruction was one that must finish at dispatch. | 3 | |
PM_CMPLU_STALL_NTC_FLUSH | Completion stall due to ntc flush | 1 | |
PM_CMPLU_STALL_OTHER_CMPL | Instructions the core completed while this thread was stalled | 2 | |
PM_CMPLU_STALL_PASTE | Finish stall because the NTF instruction was a paste waiting for response from L2 | 1 | |
PM_CMPLU_STALL_PM | Finish stall because the NTF instruction was issued to the Permute execution pipe and waiting to finish. Includes permute and decimal fixed point instructions (128 bit BCD arithmetic) + a few 128 bit fixpoint add/subtract instructions with carry. Not qualified by vector or multicycle | 2 | |
PM_CMPLU_STALL_SLB | Finish stall because the NTF instruction was awaiting L2 response for an SLB | 0 | |
PM_CMPLU_STALL_SPEC_FINISH | Finish stall while waiting for the non-speculative finish of either a stcx waiting for its result or a load waiting for non-critical sectors of data and ECC | 2 | |
PM_CMPLU_STALL_SRQ_FULL | Finish stall because the NTF instruction was a store that was held in LSAQ because the SRQ was full | 2 | |
PM_CMPLU_STALL_STCX | Finish stall because the NTF instruction was a stcx waiting for response from L2 | 1 | |
PM_CMPLU_STALL_ST_FWD | Completion stall due to store forward | 3 | |
PM_CMPLU_STALL_STORE_DATA | Finish stall because the next to finish instruction was a store waiting on data | 2 | |
PM_CMPLU_STALL_STORE_FIN_ARB | Finish stall because the NTF instruction was a store waiting for a slot in the store finish pipe. This means the instruction is ready to finish but there are instructions ahead of it, using the finish pipe | 2 | |
PM_CMPLU_STALL_STORE_FINISH | Finish stall because the NTF instruction was a store with all its dependencies met, just waiting to go through the LSU pipe to finish | 1 | |
PM_CMPLU_STALL_STORE_PIPE_ARB | Finish stall because the NTF instruction was a store waiting for the next relaunch opportunity after an internal reject. This means the instruction is ready to relaunch and tried once but lost arbitration | 3 | |
PM_CMPLU_STALL_SYNC_PMU_INT | Cycles in which the NTC instruction is waiting for a synchronous PMU interrupt | 1 | |
PM_CMPLU_STALL_TEND | Finish stall because the NTF instruction was a tend instruction awaiting response from L2 | 0 | |
PM_CMPLU_STALL_THRD | Completion Stalled because the thread was blocked | 0 | |
PM_CMPLU_STALL_TLBIE | Finish stall because the NTF instruction was a tlbie waiting for response from L2 | 1 | |
PM_CMPLU_STALL_VDP | Finish stall because the NTF instruction was a vector instruction issued to the Double Precision execution pipe and waiting to finish. Includes binary floating point instructions in 32 and 64 bit binary floating point format. Not qualified by multicycle. Qualified by vector | 3 | |
PM_CMPLU_STALL_VDPLONG | Finish stall because the NTF instruction was a vector multi-cycle instruction issued to the Double Precision execution pipe and waiting to finish. Includes binary floating point instructions in 32 and 64 bit binary floating point format. Qualified by vector AND multicycle | 2 | |
PM_CMPLU_STALL_VFXLONG | Completion stall due to a long latency vector fixed point instruction (division, square root) | 1 | |
PM_CMPLU_STALL_VFXU | Finish stall due to a vector fixed point instruction in the execution pipeline. These instructions get routed to the ALU, ALU2, and DIV pipes | 2 | |
PM_CO0_BUSY | CO mach 0 Busy. Used by PMU to sample ave CO lifetime (mach0 used as sample point) | 2, 3 | |
PM_CO_DISP_FAIL | CO dispatch failed due to all CO machines being busy | 0 | |
PM_CO_TM_SC_FOOTPRINT | L2 did a cleanifdirty CO to the L3 (ie created an SC line in the L3) OR L2 TM_store hit dirty HPC line and L3 indicated SC line formed in L3 on RDR bus | 1 | |
PM_CO_USAGE | Continuous 16-cycle (2:1) window where this signal rotates through sampling each CO machine busy. The PMU uses this wave to then do a 16-cycle count to sample the total number of machines running | 1 | |
PM_CYC | Processor cycles | 0, 1, 2, 3 | |
PM_DARQ0_0_3_ENTRIES | Cycles in which 3 or fewer DARQ entries (out of 12) are in use | 3 | |
PM_DARQ0_10_12_ENTRIES | Cycles in which 10 or more DARQ entries (out of 12) are in use | 0 | |
PM_DARQ0_4_6_ENTRIES | Cycles in which 4, 5, or 6 DARQ entries (out of 12) are in use | 2 | |
PM_DARQ0_7_9_ENTRIES | Cycles in which 7, 8, or 9 DARQ entries (out of 12) are in use | 1 | |
PM_DARQ1_0_3_ENTRIES | Cycles in which 3 or fewer DARQ1 entries (out of 12) are in use | 3 | |
PM_DARQ1_10_12_ENTRIES | Cycles in which 10 or more DARQ1 entries (out of 12) are in use | 1 | |
PM_DARQ1_4_6_ENTRIES | Cycles in which 4, 5, or 6 DARQ1 entries (out of 12) are in use | 2 | |
PM_DARQ1_7_9_ENTRIES | Cycles in which 7 to 9 DARQ1 entries (out of 12) are in use | 1 | |
PM_DARQ_STORE_REJECT | The DARQ attempted to transmit a store into an LSAQ or SRQ entry but it was rejected. Divide by PM_DARQ_STORE_XMIT to get reject ratio | 3 | |
PM_DARQ_STORE_XMIT | The DARQ attempted to transmit a store into an LSAQ or SRQ entry. Includes rejects. Not qualified by thread, so it includes counts for the whole core | 2 | |
PM_DATA_CHIP_PUMP_CPRED | Initial and Final Pump Scope was chip pump (prediction=correct) for a demand load | 0 | |
PM_DATA_FROM_DL2L3_MOD | The processor's data cache was reloaded with Modified (M) data from another chip's L2 or L3 on a different Node or Group (Distant) than this chip due to a demand load | 3 | |
PM_DATA_FROM_DL2L3_SHR | The processor's data cache was reloaded with Shared (S) data from another chip's L2 or L3 on a different Node or Group (Distant) than this chip due to a demand load | 2 | |
PM_DATA_FROM_DL4 | The processor's data cache was reloaded from another chip's L4 on a different Node or Group (Distant) due to a demand load | 2 | |
PM_DATA_FROM_DMEM | The processor's data cache was reloaded from another chip's memory on the same Node or Group (Distant) due to a demand load | 3 | |
PM_DATA_FROM_L2 | The processor's data cache was reloaded from local core's L2 due to a demand load | 0 | |
PM_DATA_FROM_L21_MOD | The processor's data cache was reloaded with Modified (M) data from another core's L2 on the same chip due to a demand load | 3 | |
PM_DATA_FROM_L21_SHR | The processor's data cache was reloaded with Shared (S) data from another core's L2 on the same chip due to a demand load | 2 | |
PM_DATA_FROM_L2_DISP_CONFLICT_LDHITST | The processor's data cache was reloaded from local core's L2 with load hit store conflict due to a demand load | 2 | |
PM_DATA_FROM_L2_DISP_CONFLICT_OTHER | The processor's data cache was reloaded from local core's L2 with dispatch conflict due to a demand load | 3 | |
PM_DATA_FROM_L2_MEPF | The processor's data cache was reloaded from local core's L2 hit without dispatch conflicts on Mepf state due to a demand load | 1 | |
PM_DATA_FROM_L2MISS | Demand LD - L2 Miss (not L2 hit) | 1 | |
PM_DATA_FROM_L2MISS_MOD | The processor's data cache was reloaded from a location other than the local core's L2 due to a demand load | 0 | |
PM_DATA_FROM_L2_NO_CONFLICT | The processor's data cache was reloaded from local core's L2 without conflict due to a demand load | 0 | |
PM_DATA_FROM_L3 | The processor's data cache was reloaded from local core's L3 due to a demand load | 3 | |
PM_DATA_FROM_L31_ECO_MOD | The processor's data cache was reloaded with Modified (M) data from another core's ECO L3 on the same chip due to a demand load | 3 | |
PM_DATA_FROM_L31_ECO_SHR | The processor's data cache was reloaded with Shared (S) data from another core's ECO L3 on the same chip due to a demand load | 2 | |
PM_DATA_FROM_L31_MOD | The processor's data cache was reloaded with Modified (M) data from another core's L3 on the same chip due to a demand load | 1 | |
PM_DATA_FROM_L31_SHR | The processor's data cache was reloaded with Shared (S) data from another core's L3 on the same chip due to a demand load | 0 | |
PM_DATA_FROM_L3_DISP_CONFLICT | The processor's data cache was reloaded from local core's L3 with dispatch conflict due to a demand load | 2 | |
PM_DATA_FROM_L3_MEPF | The processor's data cache was reloaded from local core's L3 without dispatch conflicts hit on Mepf state due to a demand load | 1 | |
PM_DATA_FROM_L3MISS | Demand LD - L3 Miss (not L2 hit and not L3 hit) | 2 | |
PM_DATA_FROM_L3MISS_MOD | The processor's data cache was reloaded from a location other than the local core's L3 due to a demand load | 3 | |
PM_DATA_FROM_L3_NO_CONFLICT | The processor's data cache was reloaded from local core's L3 without conflict due to a demand load | 0 | |
PM_DATA_FROM_LL4 | The processor's data cache was reloaded from the local chip's L4 cache due to a demand load | 0 | |
PM_DATA_FROM_LMEM | The processor's data cache was reloaded from the local chip's Memory due to a demand load | 1 | |
PM_DATA_FROM_MEMORY | The processor's data cache was reloaded from a memory location, including L4 from local, remote, or distant, due to a demand load | 3 | |
PM_DATA_FROM_OFF_CHIP_CACHE | The processor's data cache was reloaded with either shared or modified data from another core's L2/L3 on a different chip (remote or distant) due to a demand load | 3 | |
PM_DATA_FROM_ON_CHIP_CACHE | The processor's data cache was reloaded with either shared or modified data from another core's L2/L3 on the same chip due to a demand load | 0 | |
PM_DATA_FROM_RL2L3_MOD | The processor's data cache was reloaded with Modified (M) data from another chip's L2 or L3 on the same Node or Group (Remote) as this chip due to a demand load | 1 | |
PM_DATA_FROM_RL2L3_SHR | The processor's data cache was reloaded with Shared (S) data from another chip's L2 or L3 on the same Node or Group (Remote) as this chip due to a demand load | 0 | |
PM_DATA_FROM_RL4 | The processor's data cache was reloaded from another chip's L4 on the same Node or Group (Remote) due to a demand load | 1 | |
PM_DATA_FROM_RMEM | The processor's data cache was reloaded from another chip's memory on the same Node or Group (Remote) due to a demand load | 2 | |
PM_DATA_GRP_PUMP_CPRED | Initial and Final Pump Scope was group pump (prediction=correct) for a demand load | 1 | |
PM_DATA_GRP_PUMP_MPRED | Final Pump Scope (Group) ended up either larger or smaller than Initial Pump Scope for a demand load | 1 | |
PM_DATA_GRP_PUMP_MPRED_RTY | Final Pump Scope (Group) ended up larger than Initial Pump Scope (Chip) for a demand load | 0 | |
PM_DATA_PUMP_CPRED | Pump prediction correct. Counts across all types of pumps for a demand load | 0 | |
PM_DATA_PUMP_MPRED | Pump misprediction. Counts across all types of pumps for a demand load | 3 | |
PM_DATA_STORE | All ops that drain from s2q to L2 containing data | 0, 1, 2, 3 | |
PM_DATA_SYS_PUMP_CPRED | Initial and Final Pump Scope was system pump (prediction=correct) for a demand load | 2 | |
PM_DATA_SYS_PUMP_MPRED | Final Pump Scope (system) mispredicted. Either the original scope was too small (Chip/Group) or the original scope was System and it should have been smaller. Counts for a demand load | 2 | |
PM_DATA_SYS_PUMP_MPRED_RTY | Final Pump Scope (system) ended up larger than Initial Pump Scope (Chip/Group) for a demand load | 3 | |
PM_DATA_TABLEWALK_CYC | Data Tablewalk Cycles. Could be 1 or 2 active tablewalks. Includes data prefetches. | 2 | |
PM_DC_DEALLOC_NO_CONF | A demand load referenced a line in an active fuzzy prefetch stream. The stream could have been allocated through the hardware prefetch mechanism or through software. Fuzzy stream confirm (out of order effects, or pf can't keep up) | 0, 1, 2, 3 | |
PM_DC_PREF_CONF | A demand load referenced a line in an active prefetch stream. The stream could have been allocated through the hardware prefetch mechanism or through software. Includes forwards and backwards streams | 0, 1, 2, 3 | |
PM_DC_PREF_CONS_ALLOC | Prefetch stream allocated in the conservative phase by either the hardware prefetch mechanism or software prefetch | 0, 1, 2, 3 | |
PM_DC_PREF_FUZZY_CONF | A demand load referenced a line in an active fuzzy prefetch stream. The stream could have been allocated through the hardware prefetch mechanism or through software. Fuzzy stream confirm (out of order effects, or pf can't keep up) | 0, 1, 2, 3 | |
PM_DC_PREF_HW_ALLOC | Prefetch stream allocated by the hardware prefetch mechanism | 0, 1, 2, 3 | |
PM_DC_PREF_STRIDED_CONF | A demand load referenced a line in an active strided prefetch stream. The stream could have been allocated through the hardware prefetch mechanism or through software. | 0, 1, 2, 3 | |
PM_DC_PREF_SW_ALLOC | Prefetch stream allocated by software prefetching | 0, 1, 2, 3 | |
PM_DC_PREF_XCONS_ALLOC | Prefetch stream allocated in the Ultra conservative phase by either the hardware prefetch mechanism or software prefetch | 0, 1, 2, 3 | |
PM_DECODE_FUSION_CONST_GEN | 32-bit constant generation | 0, 1, 2, 3 | |
PM_DECODE_FUSION_EXT_ADD | 32-bit extended addition | 0, 1, 2, 3 | |
PM_DECODE_FUSION_LD_ST_DISP | 32-bit displacement D-form and 16-bit displacement X-form | 0, 1, 2, 3 | |
PM_DECODE_FUSION_OP_PRESERV | Destructive op operand preservation | 0, 1, 2, 3 | |
PM_DECODE_HOLD_ICT_FULL | Counts the number of cycles in which the IFU was not able to decode and transmit one or more instructions because all itags were in use. This means the ICT is full for this thread | 0, 1, 2, 3 | |
PM_DECODE_LANES_NOT_AVAIL | Decode has something to transmit but dispatch lanes are not available | 0, 1, 2, 3 | |
PM_DERAT_MISS_16G | Data ERAT Miss (Data TLB Access) page size 16G | 3 | |
PM_DERAT_MISS_16M | Data ERAT Miss (Data TLB Access) page size 16M | 2 | |
PM_DERAT_MISS_1G | Data ERAT Miss (Data TLB Access) page size 1G. Implies radix translation | 1 | |
PM_DERAT_MISS_2M | Data ERAT Miss (Data TLB Access) page size 2M. Implies radix translation | 0 | |
PM_DERAT_MISS_4K | Data ERAT Miss (Data TLB Access) page size 4K | 0 | |
PM_DERAT_MISS_64K | Data ERAT Miss (Data TLB Access) page size 64K | 1 | |
PM_DFU_BUSY | Cycles in which all 4 Decimal Floating Point units are busy. The DFU is running at capacity | 3 | |
PM_DISP_CLB_HELD_BAL | Dispatch/CLB Hold: Balance Flush | 0, 1, 2, 3 | |
PM_DISP_CLB_HELD_SB | Dispatch/CLB Hold: Scoreboard | 0, 1, 2, 3 | |
PM_DISP_CLB_HELD_TLBIE | Dispatch Hold: Due to TLBIE | 0, 1, 2, 3 | |
PM_DISP_HELD | Dispatch Held | 0 | |
PM_DISP_HELD_HB_FULL | Dispatch held due to History Buffer full. Could be GPR/VSR/VMR/FPR/CR/XVF | 2 | |
PM_DISP_HELD_ISSQ_FULL | Dispatch held due to Issue q full. Includes issue queue and branch queue | 1 | |
PM_DISP_HELD_SYNC_HOLD | Cycles in which dispatch is held because of a synchronizing instruction in the pipeline | 3 | |
PM_DISP_HELD_TBEGIN | This outer tbegin transaction cannot be dispatched until the previous tend instruction completes | 0, 1, 2, 3 | |
PM_DISP_STARVED | Dispatched Starved | 2 | |
PM_DP_QP_FLOP_CMPL | Double-Precision or Quad-Precision instruction completed | 3 | |
PM_DPTEG_FROM_DL2L3_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another chip's L2 or L3 on a different Node or Group (Distant) than this chip due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 3 | |
PM_DPTEG_FROM_DL2L3_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another chip's L2 or L3 on a different Node or Group (Distant) than this chip due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 2 | |
PM_DPTEG_FROM_DL4 | A Page Table Entry was loaded into the TLB from another chip's L4 on a different Node or Group (Distant) due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 2 | |
PM_DPTEG_FROM_DMEM | A Page Table Entry was loaded into the TLB from another chip's memory on the same Node or Group (Distant) due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 3 | |
PM_DPTEG_FROM_L2 | A Page Table Entry was loaded into the TLB from local core's L2 due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_DPTEG_FROM_L21_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another core's L2 on the same chip due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 3 | |
PM_DPTEG_FROM_L21_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another core's L2 on the same chip due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 2 | |
PM_DPTEG_FROM_L2_MEPF | A Page Table Entry was loaded into the TLB from local core's L2 hit without dispatch conflicts on Mepf state due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 1 | |
PM_DPTEG_FROM_L2MISS | A Page Table Entry was loaded into the TLB from a location other than the local core's L2 due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_DPTEG_FROM_L2_NO_CONFLICT | A Page Table Entry was loaded into the TLB from local core's L2 without conflict due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_DPTEG_FROM_L3 | A Page Table Entry was loaded into the TLB from local core's L3 due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 3 | |
PM_DPTEG_FROM_L31_ECO_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another core's ECO L3 on the same chip due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 3 | |
PM_DPTEG_FROM_L31_ECO_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another core's ECO L3 on the same chip due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 2 | |
PM_DPTEG_FROM_L31_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another core's L3 on the same chip due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 1 | |
PM_DPTEG_FROM_L31_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another core's L3 on the same chip due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_DPTEG_FROM_L3_DISP_CONFLICT | A Page Table Entry was loaded into the TLB from local core's L3 with dispatch conflict due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 2 | |
PM_DPTEG_FROM_L3_MEPF | A Page Table Entry was loaded into the TLB from local core's L3 without dispatch conflicts hit on Mepf state due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 1 | |
PM_DPTEG_FROM_L3MISS | A Page Table Entry was loaded into the TLB from a location other than the local core's L3 due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 3 | |
PM_DPTEG_FROM_L3_NO_CONFLICT | A Page Table Entry was loaded into the TLB from local core's L3 without conflict due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_DPTEG_FROM_LL4 | A Page Table Entry was loaded into the TLB from the local chip's L4 cache due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_DPTEG_FROM_LMEM | A Page Table Entry was loaded into the TLB from the local chip's Memory due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 1 | |
PM_DPTEG_FROM_MEMORY | A Page Table Entry was loaded into the TLB from a memory location, including L4 from local, remote, or distant, due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 1 | |
PM_DPTEG_FROM_OFF_CHIP_CACHE | A Page Table Entry was loaded into the TLB with either shared or modified data from another core's L2/L3 on a different chip (remote or distant) due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 3 | |
PM_DPTEG_FROM_ON_CHIP_CACHE | A Page Table Entry was loaded into the TLB with either shared or modified data from another core's L2/L3 on the same chip due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_DPTEG_FROM_RL2L3_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another chip's L2 or L3 on the same Node or Group (Remote) as this chip due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 1 | |
PM_DPTEG_FROM_RL2L3_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another chip's L2 or L3 on the same Node or Group (Remote) as this chip due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_DPTEG_FROM_RL4 | A Page Table Entry was loaded into the TLB from another chip's L4 on the same Node or Group (Remote) due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 1 | |
PM_DPTEG_FROM_RMEM | A Page Table Entry was loaded into the TLB from another chip's memory on the same Node or Group (Remote) due to a data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 2 | |
PM_DSIDE_L2MEMACC | Valid when first beat of data comes in for a D-side fetch where data came EXCLUSIVELY from memory (excluding hpcread64 accesses), i.e., total memory accesses by RCs | 2 | |
PM_DSIDE_MRU_TOUCH | D-side L2 MRU touch sent to L2 | 1 | |
PM_DSIDE_OTHER_64B_L2MEMACC | Valid when first beat of data comes in for a D-side fetch where data came EXCLUSIVELY from memory that was for hpc_read64 (RC had to fetch the other 64B of a line from MC), i.e., number of times RC had to go to memory to get 'missing' 64B | 2 | |
PM_DSLB_MISS | Data SLB Miss - Total of all segment sizes | 0, 1, 2, 3 | |
PM_DSLB_MISS | gate_and(sd_pc_c0_comp_valid AND sd_pc_c0_comp_thread(0:1)=tid,sd_pc_c0_comp_ppc_count(0:3)) + gate_and(sd_pc_c1_comp_valid AND sd_pc_c1_comp_thread(0:1)=tid,sd_pc_c1_comp_ppc_count(0:3)) | 0 | |
PM_DTLB_MISS | Data PTEG reload | 2 | |
PM_DTLB_MISS_16G | Data TLB Miss page size 16G | 0 | |
PM_DTLB_MISS_16M | Data TLB Miss page size 16M | 3 | |
PM_DTLB_MISS_1G | Data TLB reload (after a miss) page size 1G. Implies radix translation was used | 3 | |
PM_DTLB_MISS_2M | Data TLB reload (after a miss) page size 2M. Implies radix translation was used | 0 | |
PM_DTLB_MISS_4K | Data TLB Miss page size 4k | 1 | |
PM_DTLB_MISS_64K | Data TLB Miss page size 64K | 2 | |
PM_EAT_FORCE_MISPRED | XL-form branch was mispredicted due to the predicted target address missing from EAT. The EAT forces a mispredict in this case since there is no predicted target to validate. This is a rare case that may occur when the EAT is full and a branch is issued | 0, 1, 2, 3 | |
PM_EAT_FULL_CYC | Cycles No room in EAT | 0, 1, 2, 3 | |
PM_EE_OFF_EXT_INT | Cycles MSR[EE] is off and external interrupts are active | 0, 1, 2, 3 | |
PM_EXT_INT | external interrupt | 1 | |
PM_FLOP_CMPL | Floating Point Operation Finished | 3 | |
PM_FLUSH | Flush (any type) | 3 | |
PM_FLUSH_COMPLETION | The instruction that was next to complete did not complete because it suffered a flush | 2 | |
PM_FLUSH_DISP | Dispatch flush | 0, 1, 2, 3 | |
PM_FLUSH_DISP_SB | Dispatch Flush: Scoreboard | 0, 1, 2, 3 | |
PM_FLUSH_DISP_TLBIE | Dispatch Flush: TLBIE | 0, 1, 2, 3 | |
PM_FLUSH_HB_RESTORE_CYC | Cycles in which no new instructions can be dispatched to the ICT after a flush. History buffer recovery | 0, 1, 2, 3 | |
PM_FLUSH_LSU | LSU flushes. Includes all lsu flushes | 0, 1, 2, 3 | |
PM_FLUSH_MPRED | Branch mispredict flushes. Includes target and address misprediction | 0, 1, 2, 3 | |
PM_FMA_CMPL | Two-flop operation completed (fmadd, fnmadd, fmsub, fnmsub). Scalar instructions only. | 3 | |
PM_FORCED_NOP | Instruction was forced to execute as a nop because it was found to behave like a nop (have no effect) at decode time | 0, 1, 2, 3 | |
PM_FREQ_DOWN | Power Management: Below Threshold B | 2 | |
PM_FREQ_UP | Power Management: Above Threshold A | 3 | |
PM_FXU_1PLUS_BUSY | At least one of the 4 FXU units is busy | 2 | |
PM_FXU_BUSY | Cycles in which all 4 FXUs are busy. The FXU is running at capacity | 1 | |
PM_FXU_FIN | The Fixed Point Unit finished an instruction. Instructions that finish may not necessarily complete. | 3 | |
PM_FXU_IDLE | Cycles in which FXU0, FXU1, FXU2, and FXU3 are all idle | 1 | |
PM_GRP_PUMP_CPRED | Initial and Final Pump Scope and data sourced across this scope was group pump for all data types excluding data prefetch (demand load, inst prefetch, inst fetch, xlate) | 1 | |
PM_GRP_PUMP_MPRED | Final Pump Scope (Group) ended up either larger or smaller than Initial Pump Scope for all data types excluding data prefetch (demand load, inst prefetch, inst fetch, xlate) | 1 | |
PM_GRP_PUMP_MPRED_RTY | Final Pump Scope (Group) ended up larger than Initial Pump Scope (Chip) for all data types excluding data prefetch (demand load, inst prefetch, inst fetch, xlate) | 0 | |
PM_HV_CYC | Cycles in which msr_hv is high. Note that this event does not take msr_pr into consideration | 1 | |
PM_HWSYNC | Hwsync instruction decoded and transferred | 0, 1, 2, 3 | |
PM_IBUF_FULL_CYC | Cycles No room in ibuff | 0, 1, 2, 3 | |
PM_IC_DEMAND_CYC | Final Pump Scope (Group) ended up larger than Initial Pump Scope (Chip) for a demand load | 0 | |
PM_IC_DEMAND_L2_BHT_REDIRECT | L2 I cache demand request due to BHT redirect, branch redirect (2 bubbles, 3 cycles) | 0, 1, 2, 3 | |
PM_IC_DEMAND_L2_BR_REDIRECT | L2 I cache demand request due to branch Mispredict (15 cycle path) | 0, 1, 2, 3 | |
PM_IC_DEMAND_REQ | Demand Instruction fetch request | 0, 1, 2, 3 | |
PM_IC_INVALIDATE | Ic line invalidated | 0, 1, 2, 3 | |
PM_IC_MISS_CMPL | Non-speculative icache miss, counted at completion | 3 | |
PM_IC_MISS_ICBI | Threaded version; IC misses where we got an EA dir hit but no sector valids were on. ICBI took the line out | 0, 1, 2, 3 | |
PM_IC_PREF_CANCEL_HIT | Prefetch Canceled due to icache hit | 0, 1, 2, 3 | |
PM_IC_PREF_CANCEL_L2 | L2 Squashed a demand or prefetch request | 0, 1, 2, 3 | |
PM_IC_PREF_CANCEL_PAGE | Prefetch Canceled due to page boundary | 0, 1, 2, 3 | |
PM_IC_PREF_REQ | Instruction prefetch requests | 0, 1, 2, 3 | |
PM_IC_PREF_WRITE | Instruction prefetch written into IL1 | 0, 1, 2, 3 | |
PM_IC_RELOAD_PRIVATE | Reloading line was brought in private for a specific thread. Most lines are brought in shared for all eight threads. If the RA does not match, it invalidates and then brings it shared to the other thread. In P7, the line was brought in private, then the line was invalidated | 0, 1, 2, 3 | |
PM_ICT_EMPTY_CYC | Cycles in which the ICT is completely empty. No itags are assigned to any thread | 1 | |
PM_ICT_NOSLOT_BR_MPRED | Ict empty for this thread due to branch mispred | 3 | |
PM_ICT_NOSLOT_BR_MPRED_ICMISS | Ict empty for this thread due to Icache Miss and branch mispred | 2 | |
PM_ICT_NOSLOT_CYC | Number of cycles the ICT has no itags assigned to this thread | 0 | |
PM_ICT_NOSLOT_DISP_HELD | Cycles in which the NTC instruction is held at dispatch for any reason | 3 | |
PM_ICT_NOSLOT_DISP_HELD_HB_FULL | Ict empty for this thread due to dispatch holds because the History Buffer was full. Could be GPR/VSR/VMR/FPR/CR/XVF | 2 | |
PM_ICT_NOSLOT_DISP_HELD_ISSQ | Ict empty for this thread due to dispatch hold on this thread due to Issue q full, BRQ full, XVCF Full, Count cache, Link, Tar full | 1 | |
PM_ICT_NOSLOT_DISP_HELD_SYNC | Dispatch held due to a synchronizing instruction at dispatch | 3 | |
PM_ICT_NOSLOT_DISP_HELD_TBEGIN | The NTC instruction is being held at dispatch because it is a tbegin instruction and there is an older tbegin in the pipeline that must complete before the younger tbegin can dispatch | 0 | |
PM_ICT_NOSLOT_IC_L3 | Ict empty for this thread due to icache misses that were sourced from the local L3 | 2 | |
PM_ICT_NOSLOT_IC_L3MISS | Ict empty for this thread due to icache misses that were sourced from beyond the local L3. The source could be local/remote/distant memory or another core's cache | 3 | |
PM_ICT_NOSLOT_IC_MISS | Ict empty for this thread due to Icache Miss | 1 | |
PM_IERAT_RELOAD | Number of I-ERAT reloads | 0 | |
PM_IERAT_RELOAD_16M | IERAT Reloaded (Miss) for a 16M page | 3 | |
PM_IERAT_RELOAD_4K | IERAT reloaded (after a miss) for 4K pages | 1 | |
PM_IERAT_RELOAD_64K | IERAT Reloaded (Miss) for a 64k page | 2 | |
PM_IFETCH_THROTTLE | Cycles in which Instruction fetch throttle was active. | 2 | |
PM_INST_CHIP_PUMP_CPRED | Initial and Final Pump Scope was chip pump (prediction=correct) for an instruction fetch | 0 | |
PM_INST_CMPL | Number of PowerPC Instructions that completed. | 0, 1, 2, 3 | |
PM_INST_DISP | # PPC Dispatched | 1, 2 | |
PM_INST_FROM_DL2L3_MOD | The processor's Instruction cache was reloaded with Modified (M) data from another chip's L2 or L3 on a different Node or Group (Distant) than this chip due to an instruction fetch (not prefetch) | 3 | |
PM_INST_FROM_DL2L3_SHR | The processor's Instruction cache was reloaded with Shared (S) data from another chip's L2 or L3 on a different Node or Group (Distant) than this chip due to an instruction fetch (not prefetch) | 2 | |
PM_INST_FROM_DL4 | The processor's Instruction cache was reloaded from another chip's L4 on a different Node or Group (Distant) due to an instruction fetch (not prefetch) | 2 | |
PM_INST_FROM_DMEM | The processor's Instruction cache was reloaded from another chip's memory on the same Node or Group (Distant) due to an instruction fetch (not prefetch) | 3 | |
PM_INST_FROM_L1 | Instruction fetches from L1. L1 instruction hit | 0, 1, 2, 3 | |
PM_INST_FROM_L2 | The processor's Instruction cache was reloaded from local core's L2 due to an instruction fetch (not prefetch) | 0 | |
PM_INST_FROM_L21_MOD | The processor's Instruction cache was reloaded with Modified (M) data from another core's L2 on the same chip due to an instruction fetch (not prefetch) | 3 | |
PM_INST_FROM_L21_SHR | The processor's Instruction cache was reloaded with Shared (S) data from another core's L2 on the same chip due to an instruction fetch (not prefetch) | 2 | |
PM_INST_FROM_L2_DISP_CONFLICT_LDHITST | The processor's Instruction cache was reloaded from local core's L2 with load hit store conflict due to an instruction fetch (not prefetch) | 2 | |
PM_INST_FROM_L2_DISP_CONFLICT_OTHER | The processor's Instruction cache was reloaded from local core's L2 with dispatch conflict due to an instruction fetch (not prefetch) | 3 | |
PM_INST_FROM_L2_MEPF | The processor's Instruction cache was reloaded from local core's L2 hit without dispatch conflicts on Mepf state due to an instruction fetch (not prefetch) | 1 | |
PM_INST_FROM_L2MISS | The processor's Instruction cache was reloaded from a location other than the local core's L2 due to an instruction fetch (not prefetch) | 0 | |
PM_INST_FROM_L2_NO_CONFLICT | The processor's Instruction cache was reloaded from local core's L2 without conflict due to an instruction fetch (not prefetch) | 0 | |
PM_INST_FROM_L3 | The processor's Instruction cache was reloaded from local core's L3 due to an instruction fetch (not prefetch) | 3 | |
PM_INST_FROM_L31_ECO_MOD | The processor's Instruction cache was reloaded with Modified (M) data from another core's ECO L3 on the same chip due to an instruction fetch (not prefetch) | 3 | |
PM_INST_FROM_L31_ECO_SHR | The processor's Instruction cache was reloaded with Shared (S) data from another core's ECO L3 on the same chip due to an instruction fetch (not prefetch) | 2 | |
PM_INST_FROM_L31_MOD | The processor's Instruction cache was reloaded with Modified (M) data from another core's L3 on the same chip due to an instruction fetch (not prefetch) | 1 | |
PM_INST_FROM_L31_SHR | The processor's Instruction cache was reloaded with Shared (S) data from another core's L3 on the same chip due to an instruction fetch (not prefetch) | 0 | |
PM_INST_FROM_L3_DISP_CONFLICT | The processor's Instruction cache was reloaded from local core's L3 with dispatch conflict due to an instruction fetch (not prefetch) | 2 | |
PM_INST_FROM_L3_MEPF | The processor's Instruction cache was reloaded from local core's L3 without dispatch conflicts hit on Mepf state due to an instruction fetch (not prefetch) | 1 | |
PM_INST_FROM_L3MISS | Marked instruction was reloaded from a location beyond the local chiplet | 2 | |
PM_INST_FROM_L3MISS_MOD | The processor's Instruction cache was reloaded from a location other than the local core's L3 due to an instruction fetch | 3 | |
PM_INST_FROM_L3_NO_CONFLICT | The processor's Instruction cache was reloaded from local core's L3 without conflict due to an instruction fetch (not prefetch) | 0 | |
PM_INST_FROM_LL4 | The processor's Instruction cache was reloaded from the local chip's L4 cache due to an instruction fetch (not prefetch) | 0 | |
PM_INST_FROM_LMEM | The processor's Instruction cache was reloaded from the local chip's Memory due to an instruction fetch (not prefetch) | 1 | |
PM_INST_FROM_MEMORY | The processor's Instruction cache was reloaded from a memory location, including L4 from local, remote, or distant, due to an instruction fetch (not prefetch) | 1 | |
PM_INST_FROM_OFF_CHIP_CACHE | The processor's Instruction cache was reloaded with either shared or modified data from another core's L2/L3 on a different chip (remote or distant) due to an instruction fetch (not prefetch) | 3 | |
PM_INST_FROM_ON_CHIP_CACHE | The processor's Instruction cache was reloaded with either shared or modified data from another core's L2/L3 on the same chip due to an instruction fetch (not prefetch) | 0 | |
PM_INST_FROM_RL2L3_MOD | The processor's Instruction cache was reloaded with Modified (M) data from another chip's L2 or L3 on the same Node or Group (Remote) as this chip due to an instruction fetch (not prefetch) | 1 | |
PM_INST_FROM_RL2L3_SHR | The processor's Instruction cache was reloaded with Shared (S) data from another chip's L2 or L3 on the same Node or Group (Remote) as this chip due to an instruction fetch (not prefetch) | 0 | |
PM_INST_FROM_RL4 | The processor's Instruction cache was reloaded from another chip's L4 on the same Node or Group (Remote) due to an instruction fetch (not prefetch) | 1 | |
PM_INST_FROM_RMEM | The processor's Instruction cache was reloaded from another chip's memory on the same Node or Group (Remote) due to an instruction fetch (not prefetch) | 2 | |
PM_INST_GRP_PUMP_CPRED | Initial and Final Pump Scope was group pump (prediction=correct) for an instruction fetch (demand only) | 1 | |
PM_INST_GRP_PUMP_MPRED | Final Pump Scope (Group) ended up either larger or smaller than Initial Pump Scope for an instruction fetch (demand only) | 1 | |
PM_INST_GRP_PUMP_MPRED_RTY | Final Pump Scope (Group) ended up larger than Initial Pump Scope (Chip) for an instruction fetch | 0 | |
PM_INST_IMC_MATCH_CMPL | IMC Match Count | 3 | |
PM_INST_PUMP_CPRED | Pump prediction correct. Counts across all types of pumps for an instruction fetch | 0 | |
PM_INST_PUMP_MPRED | Pump misprediction. Counts across all types of pumps for an instruction fetch | 3 | |
PM_INST_SYS_PUMP_CPRED | Initial and Final Pump Scope was system pump (prediction=correct) for an instruction fetch | 2 | |
PM_INST_SYS_PUMP_MPRED | Final Pump Scope (system) mispredicted. Either the original scope was too small (Chip/Group) or the original scope was System and it should have been smaller. Counts for an instruction fetch | 2 | |
PM_INST_SYS_PUMP_MPRED_RTY | Final Pump Scope (system) ended up larger than Initial Pump Scope (Chip/Group) for an instruction fetch | 3 | |
PM_IOPS_CMPL | Internal Operations completed | 1 | |
PM_IPTEG_FROM_DL2L3_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another chip's L2 or L3 on a different Node or Group (Distant) than this chip due to an instruction side request | 3 | |
PM_IPTEG_FROM_DL2L3_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another chip's L2 or L3 on a different Node or Group (Distant) than this chip due to an instruction side request | 2 | |
PM_IPTEG_FROM_DL4 | A Page Table Entry was loaded into the TLB from another chip's L4 on a different Node or Group (Distant) due to an instruction side request | 2 | |
PM_IPTEG_FROM_DMEM | A Page Table Entry was loaded into the TLB from another chip's memory on the same Node or Group (Distant) due to an instruction side request | 3 | |
PM_IPTEG_FROM_L2 | A Page Table Entry was loaded into the TLB from local core's L2 due to an instruction side request | 0 | |
PM_IPTEG_FROM_L21_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another core's L2 on the same chip due to an instruction side request | 3 | |
PM_IPTEG_FROM_L21_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another core's L2 on the same chip due to an instruction side request | 2 | |
PM_IPTEG_FROM_L2_MEPF | A Page Table Entry was loaded into the TLB from local core's L2 hit without dispatch conflicts on Mepf state due to an instruction side request | 1 | |
PM_IPTEG_FROM_L2MISS | A Page Table Entry was loaded into the TLB from a location other than the local core's L2 due to an instruction side request | 0 | |
PM_IPTEG_FROM_L2_NO_CONFLICT | A Page Table Entry was loaded into the TLB from local core's L2 without conflict due to an instruction side request | 0 | |
PM_IPTEG_FROM_L3 | A Page Table Entry was loaded into the TLB from local core's L3 due to an instruction side request | 3 | |
PM_IPTEG_FROM_L31_ECO_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another core's ECO L3 on the same chip due to an instruction side request | 3 | |
PM_IPTEG_FROM_L31_ECO_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another core's ECO L3 on the same chip due to an instruction side request | 2 | |
PM_IPTEG_FROM_L31_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another core's L3 on the same chip due to an instruction side request | 1 | |
PM_IPTEG_FROM_L31_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another core's L3 on the same chip due to an instruction side request | 0 | |
PM_IPTEG_FROM_L3_DISP_CONFLICT | A Page Table Entry was loaded into the TLB from local core's L3 with dispatch conflict due to an instruction side request | 2 | |
PM_IPTEG_FROM_L3_MEPF | A Page Table Entry was loaded into the TLB from local core's L3 without dispatch conflicts hit on Mepf state due to an instruction side request | 1 | |
PM_IPTEG_FROM_L3MISS | A Page Table Entry was loaded into the TLB from a location other than the local core's L3 due to an instruction side request | 3 | |
PM_IPTEG_FROM_L3_NO_CONFLICT | A Page Table Entry was loaded into the TLB from local core's L3 without conflict due to an instruction side request | 0 | |
PM_IPTEG_FROM_LL4 | A Page Table Entry was loaded into the TLB from the local chip's L4 cache due to an instruction side request | 0 | |
PM_IPTEG_FROM_LMEM | A Page Table Entry was loaded into the TLB from the local chip's Memory due to an instruction side request | 1 | |
PM_IPTEG_FROM_MEMORY | A Page Table Entry was loaded into the TLB from a memory location, including L4 from local, remote, or distant, due to an instruction side request | 1 | |
PM_IPTEG_FROM_OFF_CHIP_CACHE | A Page Table Entry was loaded into the TLB with either shared or modified data from another core's L2/L3 on a different chip (remote or distant) due to an instruction side request | 3 | |
PM_IPTEG_FROM_ON_CHIP_CACHE | A Page Table Entry was loaded into the TLB with either shared or modified data from another core's L2/L3 on the same chip due to an instruction side request | 0 | |
PM_IPTEG_FROM_RL2L3_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another chip's L2 or L3 on the same Node or Group (Remote) as this chip due to an instruction side request | 1 | |
PM_IPTEG_FROM_RL2L3_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another chip's L2 or L3 on the same Node or Group (Remote) as this chip due to an instruction side request | 0 | |
PM_IPTEG_FROM_RL4 | A Page Table Entry was loaded into the TLB from another chip's L4 on the same Node or Group (Remote) due to an instruction side request | 1 | |
PM_IPTEG_FROM_RMEM | A Page Table Entry was loaded into the TLB from another chip's memory on the same Node or Group (Remote) due to an instruction side request | 2 | |
PM_ISIDE_DISP | All I-side dispatch attempts for this thread (excludes i_l2mru_tch_reqs) | 0 | |
PM_ISIDE_DISP_FAIL_ADDR | All I-side dispatch attempts for this thread that failed due to an address collision with another machine (excludes i_l2mru_tch_reqs) | 1 | |
PM_ISIDE_DISP_FAIL_OTHER | All I-side dispatch attempts for this thread that failed due to a reason other than an address collision (excludes i_l2mru_tch_reqs) | 1 | |
PM_ISIDE_L2MEMACC | Valid when first beat of data comes in for an I-side fetch where data came from memory | 1 | |
PM_ISIDE_MRU_TOUCH | I-side L2 MRU touch sent to L2 for this thread | 3 | |
PM_ISLB_MISS | Instruction SLB Miss - Total of all segment sizes | 0, 1, 2, 3 | |
PM_ISLB_MISS | Number of ISLB misses for this thread | 3 | |
PM_ISQ_0_8_ENTRIES | Cycles in which 8 or fewer Issue Queue entries are in use. This is a shared event, not per thread | 2 | |
PM_ISQ_36_44_ENTRIES | Cycles in which 36 or more Issue Queue entries are in use. This is a shared event, not per thread. There are 44 issue queue entries across 4 slices in the whole core | 3 | |
PM_ISU0_ISS_HOLD_ALL | All ISU rejects | 0, 1, 2, 3 | |
PM_ISU1_ISS_HOLD_ALL | All ISU rejects | 0, 1, 2, 3 | |
PM_ISU2_ISS_HOLD_ALL | All ISU rejects | 0, 1, 2, 3 | |
PM_ISU3_ISS_HOLD_ALL | All ISU rejects | 0, 1, 2, 3 | |
PM_ISYNC | Isync completion count per thread | 0, 1, 2, 3 | |
PM_ITLB_MISS | ITLB Reloaded. Counts 1 per ITLB miss for HPT but multiple for radix depending on the number of levels traversed | 3 | |
PM_L1_DCACHE_RELOADED_ALL | L1 data cache reloaded for demand. If MMCR1[16] is 1, prefetches will be included as well | 0 | |
PM_L1_DCACHE_RELOAD_VALID | DL1 reloaded due to Demand Load | 2 | |
PM_L1_DEMAND_WRITE | Instruction Demand sectors written into IL1 | 0, 1, 2, 3 | |
PM_L1_ICACHE_MISS | Demand iCache Miss | 1 | |
PM_L1_ICACHE_RELOADED_ALL | Counts all Icache reloads: includes demand, prefetch, prefetch turned into demand, and demand turned into prefetch | 3 | |
PM_L1_ICACHE_RELOADED_PREF | Counts all Icache prefetch reloads (includes demand turned into prefetch) | 2 | |
PM_L1PF_L2MEMACC | Valid when first beat of data comes in for an L1PF where data came from memory | 0 | |
PM_L1_PREF | A data line was written to the L1 due to a hardware or software prefetch | 1 | |
PM_L1_SW_PREF | Software L1 Prefetches, including SW Transient Prefetches | 0, 1, 2, 3 | |
PM_L2_CASTOUT_MOD | L2 Castouts - Modified (M,Mu,Me) | 0 | |
PM_L2_CASTOUT_SHR | L2 Castouts - Shared (Tx,Sx) | 0 | |
PM_L2_CHIP_PUMP | RC requests that were local (aka chip) pump attempts | 3 | |
PM_L2_DC_INV | D-cache invalidates sent over the reload bus to the core | 1 | |
PM_L2_DISP_ALL_L2MISS | All successful Ld/St dispatches for this thread that were an L2 miss (excludes i_l2mru_tch_reqs) | 3 | |
PM_L2_GROUP_PUMP | RC requests that were group (aka node) pump attempts | 3 | |
PM_L2_GRP_GUESS_CORRECT | L2 guess grp (GS or NNS) and guess was correct (data intra-group AND not on-chip) | 1 | |
PM_L2_GRP_GUESS_WRONG | L2 guess grp (GS or NNS) and guess was not correct (i.e., data on-chip OR beyond-group) | 1 | |
PM_L2_IC_INV | I-cache Invalidates sent over the reload bus to the core | 1 | |
PM_L2_INST | All successful I-side dispatches for this thread (excludes i_l2mru_tch reqs) | 2 | |
PM_L2_INST_MISS | All successful I-side dispatches that were an L2 miss for this thread (excludes i_l2mru_tch reqs) | 2 | |
PM_L2_INST_MISS | All successful I-side dispatches that were an L2 miss for this thread (excludes i_l2mru_tch reqs) | 3 | |
PM_L2_LD | All successful D-side Load dispatches for this thread (L2 miss + L2 hits) | 0 | |
PM_L2_LD_DISP | All successful D-side load dispatches for this thread (L2 miss + L2 hits) | 0 | |
PM_L2_LD_DISP | All successful I-or-D side load dispatches for this thread (excludes i_l2mru_tch_reqs) | 2 | |
PM_L2_LD_HIT | All successful D-side load dispatches that were L2 hits for this thread | 1 | |
PM_L2_LD_HIT | All successful I-or-D side load dispatches for this thread that were L2 hits (excludes i_l2mru_tch_reqs) | 2 | |
PM_L2_LD_MISS | All successful D-Side Load dispatches that were an L2 miss for this thread | 1 | |
PM_L2_LD_MISS_128B | All successful D-side load dispatches that were an L2 miss (NOT Sx,Tx,Mx) for this thread and the RC calculated the request should be for 128B (i.e., M=0) | 0 | |
PM_L2_LD_MISS_64B | All successful D-side load dispatches that were an L2 miss (NOT Sx,Tx,Mx) for this thread and the RC calculated the request should be for 64B (i.e., M=1) | 1 | |
PM_L2_LOC_GUESS_CORRECT | L2 guess local (LNS) and guess was correct (i.e., data local) | 0 | |
PM_L2_LOC_GUESS_WRONG | L2 guess local (LNS) and guess was not correct (i.e., data not on chip) | 0 | |
PM_L2_RCLD_DISP | All I-or-D side load dispatch attempts for this thread (excludes i_l2mru_tch_reqs) | 0 | |
PM_L2_RCLD_DISP_FAIL_ADDR | All I-or-D side load dispatch attempts for this thread that failed due to address collision with RC/CO/SN/SQ machine (excludes i_l2mru_tch_reqs) | 0 | |
PM_L2_RCLD_DISP_FAIL_OTHER | All I-or-D side load dispatch attempts for this thread that failed due to reason other than address collision (excludes i_l2mru_tch_reqs) | 1 | |
PM_L2_RCST_DISP | All D-side store dispatch attempts for this thread | 2 | |
PM_L2_RCST_DISP_FAIL_ADDR | All D-side store dispatch attempts for this thread that failed due to address collision with RC/CO/SN/SQ | 2 | |
PM_L2_RCST_DISP_FAIL_OTHER | All D-side store dispatch attempts for this thread that failed due to reason other than address collision | 3 | |
PM_L2_RC_ST_DONE | RC did store to line that was Tx or Sx | 2 | |
PM_L2_RTY_LD | RC retries on PB for any load from core (excludes DCBFs) | 2 | |
PM_L2_RTY_ST | RC retries on PB for any store from core (excludes DCBFs) | 2 | |
PM_L2_RTY_ST | RC retries on PB for any store from core (excludes DCBFs) | 3 | |
PM_L2_SN_M_RD_DONE | SNP dispatched for a read and was M (true M) | 3 | |
PM_L2_SN_M_WR_DONE | SNP dispatched for a write and was M (true M) | 0 | |
PM_L2_SN_M_WR_DONE | SNP dispatched for a write and was M (true M) | 3 | |
PM_L2_SN_SX_I_DONE | SNP dispatched and went from Sx to Ix | 2 | |
PM_L2_ST | All successful D-side store dispatches for this thread (L2 miss + L2 hits) | 0 | |
PM_L2_ST_DISP | All successful D-side store dispatches for this thread (L2 miss + L2 hits) | 0 | |
PM_L2_ST_DISP | All successful D-side store dispatches for this thread | 3 | |
PM_L2_ST_HIT | All successful D-side store dispatches that were L2 hits for this thread | 1 | |
PM_L2_ST_HIT | All successful D-side store dispatches for this thread that were L2 hits | 3 | |
PM_L2_ST_MISS | All successful D-Side Store dispatches that were an L2 miss for this thread | 1 | |
PM_L2_ST_MISS_128B | All successful D-side store dispatches that were an L2 miss (NOT Sx,Tx,Mx) for this thread and the RC calculated the request should be for 128B (i.e., M=0) | 0 | |
PM_L2_ST_MISS_64B | All successful D-side store dispatches that were an L2 miss (NOT Sx,Tx,Mx) for this thread and the RC calculated the request should be for 64B (i.e., M=1) | 1 | |
PM_L2_SYS_GUESS_CORRECT | L2 guess system (VGS or RNS) and guess was correct (i.e., data beyond-group) | 2 | |
PM_L2_SYS_GUESS_WRONG | L2 guess system (VGS or RNS) and guess was not correct (i.e., data not beyond-group) | 2 | |
PM_L2_SYS_PUMP | RC requests that were system pump attempts | 3 | |
PM_L3_CI_HIT | L3 Castins Hit (total count) | 1 | |
PM_L3_CI_MISS | L3 castins miss (total count) | 1 | |
PM_L3_CINJ | L3 castin of cache inject | 2 | |
PM_L3_CI_USAGE | Rotating sample of 16 CI or CO actives | 0 | |
PM_L3_CO | L3 castout occurring (does not include casthrough or log writes (cinj/dmaw)) | 2 | |
PM_L3_CO0_BUSY | Lifetime, sample of CO machine 0 valid | 2 | |
PM_L3_CO0_BUSY | Lifetime, sample of CO machine 0 valid | 3 | |
PM_L3_CO_L31 | L3 CO to L3.1 OR of port 0 and 1 (lossy = may undercount if two cresps come in the same cyc) | 1 | |
PM_L3_CO_LCO | Total L3 COs occurred on LCO L3.1 (good cresp, may end up in mem on a retry) | 2 | |
PM_L3_CO_MEM | L3 CO to memory OR of port 0 and 1 (lossy = may undercount if two cresps come in the same cyc) | 1 | |
PM_L3_CO_MEPF | L3 CO of line in Mep state (includes casthrough to memory). The Mepf state indicates that a line was brought in to satisfy an L3 prefetch request | 0 | |
PM_L3_CO_MEPF | L3 castouts in Mepf state for this thread | 2 | |
PM_L3_GRP_GUESS_CORRECT | Initial scope=group (GS or NNS) and data from same group (near) (pred successful) | 0 | |
PM_L3_GRP_GUESS_WRONG_HIGH | Initial scope=group (GS or NNS) but data from local node. Prediction too high | 2 | |
PM_L3_GRP_GUESS_WRONG_LOW | Initial scope=group (GS or NNS) but data from outside group (far or rem). Prediction too low | 2 | |
PM_L3_HIT | L3 Hits (L2 miss hitting L3, including data/instrn/xlate) | 0 | |
PM_L3_L2_CO_HIT | L2 CO hits | 2 | |
PM_L3_L2_CO_MISS | L2 CO miss | 2 | |
PM_L3_LAT_CI_HIT | L3 Lateral Castins Hit | 3 | |
PM_L3_LAT_CI_MISS | L3 Lateral Castins Miss | 3 | |
PM_L3_LD_HIT | L3 Hits for demand LDs | 1 | |
PM_L3_LD_MISS | L3 Misses for demand LDs | 1 | |
PM_L3_LD_PREF | L3 load prefetch, sourced from a hardware or software stream, was sent to the nest | 0, 1, 2, 3 | |
PM_L3_LOC_GUESS_CORRECT | Initial scope=node/chip (LNS) and data from local node (local) (pred successful) - always PFs only | 0 | |
PM_L3_LOC_GUESS_WRONG | Initial scope=node (LNS) but data from outside the local node (near or far or rem). Prediction too low | 1 | |
PM_L3_MISS | L3 Misses (L2 miss also missing L3, including data/instrn/xlate) | 0 | |
PM_L3_P0_CO_L31 | L3 CO to L3.1 (LCO) port 0 with or without data | 3 | |
PM_L3_P0_CO_MEM | L3 CO to memory port 0 with or without data | 2 | |
PM_L3_P0_CO_RTY | L3 CO received retry port 0 (memory only), every retry counted | 2 | |
PM_L3_P0_CO_RTY | L3 CO received retry port 2 (memory only), every retry counted | 3 | |
PM_L3_P0_GRP_PUMP | L3 PF sent with grp scope port 0, counts even retried requests | 1 | |
PM_L3_P0_LCO_DATA | LCO sent with data port 0 | 1 | |
PM_L3_P0_LCO_NO_DATA | Dataless L3 LCO sent port 0 | 0 | |
PM_L3_P0_LCO_RTY | L3 initiated LCO received retry on port 0 (can try 4 times) | 0 | |
PM_L3_P0_NODE_PUMP | L3 PF sent with nodal scope port 0, counts even retried requests | 0 | |
PM_L3_P0_PF_RTY | L3 PF received retry port 0, every retry counted | 0 | |
PM_L3_P0_PF_RTY | L3 PF received retry port 2, every retry counted | 1 | |
PM_L3_P0_SYS_PUMP | L3 PF sent with sys scope port 0, counts even retried requests | 2 | |
PM_L3_P1_CO_L31 | L3 CO to L3.1 (LCO) port 1 with or without data | 3 | |
PM_L3_P1_CO_MEM | L3 CO to memory port 1 with or without data | 2 | |
PM_L3_P1_CO_RTY | L3 CO received retry port 1 (memory only), every retry counted | 2 | |
PM_L3_P1_CO_RTY | L3 CO received retry port 3 (memory only), every retry counted | 3 | |
PM_L3_P1_GRP_PUMP | L3 PF sent with grp scope port 1, counts even retried requests | 1 | |
PM_L3_P1_LCO_DATA | LCO sent with data port 1 | 1 | |
PM_L3_P1_LCO_NO_DATA | Dataless L3 LCO sent port 1 | 0 | |
PM_L3_P1_LCO_RTY | L3 initiated LCO received retry on port 1 (can try 4 times) | 0 | |
PM_L3_P1_NODE_PUMP | L3 PF sent with nodal scope port 1, counts even retried requests | 0 | |
PM_L3_P1_PF_RTY | L3 PF received retry port 1, every retry counted | 0 | |
PM_L3_P1_PF_RTY | L3 PF received retry port 3, every retry counted | 1 | |
PM_L3_P1_SYS_PUMP | L3 PF sent with sys scope port 1, counts even retried requests | 2 | |
PM_L3_P2_LCO_RTY | L3 initiated LCO received retry on port 2 (can try 4 times) | 1 | |
PM_L3_P3_LCO_RTY | L3 initiated LCO received retry on port 3 (can try 4 times) | 1 | |
PM_L3_PF0_BUSY | Lifetime, sample of PF machine 0 valid | 2 | |
PM_L3_PF0_BUSY | Lifetime, sample of PF machine 0 valid | 3 | |
PM_L3_PF_HIT_L3 | L3 PF hit in L3 (abandoned) | 1 | |
PM_L3_PF_MISS_L3 | L3 PF missed in L3 | 0 | |
PM_L3_PF_OFF_CHIP_CACHE | L3 PF from Off chip cache | 2 | |
PM_L3_PF_OFF_CHIP_MEM | L3 PF from Off chip memory | 3 | |
PM_L3_PF_ON_CHIP_CACHE | L3 PF from On chip cache | 2 | |
PM_L3_PF_ON_CHIP_MEM | L3 PF from On chip memory | 3 | |
PM_L3_PF_USAGE | Rotating sample of 32 PF actives | 1 | |
PM_L3_RD0_BUSY | Lifetime, sample of RD machine 0 valid | 2 | |
PM_L3_RD0_BUSY | Lifetime, sample of RD machine 0 valid | 3 | |
PM_L3_RD_USAGE | Rotating sample of 16 RD actives | 1 | |
PM_L3_SN0_BUSY | Lifetime, sample of snooper machine 0 valid | 2 | |
PM_L3_SN0_BUSY | Lifetime, sample of snooper machine 0 valid | 3 | |
PM_L3_SN_USAGE | Rotating sample of 16 snoop valids | 0 | |
PM_L3_SW_PREF | L3 load prefetch, sourced from a software prefetch stream, was sent to the nest | 0, 1, 2, 3 | |
PM_L3_SYS_GUESS_CORRECT | Initial scope=system (VGS or RNS) and data from outside group (far or rem)(pred successful) | 1 | |
PM_L3_SYS_GUESS_WRONG | Initial scope=system (VGS or RNS) but data from local or near. Prediction too high | 3 | |
PM_L3_TRANS_PF | L3 Transient prefetch received from L2 | 3 | |
PM_L3_WI0_BUSY | Rotating sample of 8 WI valid | 0 | |
PM_L3_WI0_BUSY | Rotating sample of 8 WI valid (duplicate) | 1 | |
PM_L3_WI_USAGE | Lifetime, sample of Write Inject machine 0 valid | 0 | |
PM_LARX_FIN | Larx finished | 2 | |
PM_LD_CMPL | count of Loads completed | 3 | |
PM_LD_L3MISS_PEND_CYC | Cycles L3 miss was pending for this thread | 0 | |
PM_LD_MISS_L1 | Load Missed L1, counted at execution time (can be greater than loads finished). LMQ merges are not included in this count (i.e., if a load instruction misses on an address that is already allocated on the LMQ, this event will not increment for that load). Note that this count is per slice, so if a load spans multiple slices this event will increment multiple times for a single load. | 2 | |
PM_LD_MISS_L1 | Load Missed L1, counted at execution time (can be greater than loads finished). LMQ merges are not included in this count (i.e., if a load instruction misses on an address that is already allocated on the LMQ, this event will not increment for that load). Note that this count is per slice, so if a load spans multiple slices this event will increment multiple times for a single load. | 3 | |
PM_LD_MISS_L1_FIN | Number of load instructions that finished with an L1 miss. Note that even if a load spans multiple slices this event will increment only once per load op. | 1 | |
PM_LD_REF_L1 | All L1 D cache load references counted at finish, gated by reject | 0 | |
PM_LINK_STACK_CORRECT | Link stack predicts right address | 0, 1, 2, 3 | |
PM_LINK_STACK_INVALID_PTR | The link stack pointer was invalid. This is most often caused by certain types of flush where the pointer is not available, and can result in the data in the link stack becoming unusable | 0, 1, 2, 3 | |
PM_LINK_STACK_WRONG_ADD_PRED | Link stack predicts wrong address, because of link stack design limitation or software violating the coding conventions | 0, 1, 2, 3 | |
PM_LMQ_EMPTY_CYC | Cycles in which the LMQ has no pending load misses for this thread | 1 | |
PM_LMQ_MERGE | A demand miss collides with a prefetch for the same line | 0 | |
PM_LRQ_REJECT | Internal LSU reject from LRQ. Rejects cause the load to go back to LRQ, but it stays contained within the LSU once it gets issued. This event counts the number of times the LRQ attempts to relaunch an instruction after a reject. Any load can suffer multiple rejects | 1 | |
PM_LS0_DC_COLLISIONS | Read-write data cache collisions | 0, 1, 2, 3 | |
PM_LS0_ERAT_MISS_PREF | LS0 Erat miss due to prefetch | 0, 1, 2, 3 | |
PM_LS0_LAUNCH_HELD_PREF | Number of times a load or store instruction was unable to launch/relaunch because a high priority prefetch used that relaunch cycle | 0, 1, 2, 3 | |
PM_LS0_PTE_TABLEWALK_CYC | Cycles when a tablewalk is pending on this thread on table 0 | 0, 1, 2, 3 | |
PM_LS0_TM_DISALLOW | A TM-ineligible instruction tries to execute inside a transaction and the LSU disallows it | 0, 1, 2, 3 | |
PM_LS0_UNALIGNED_LD | Load instructions whose data crosses a double-word boundary, which causes the load to require an additional slice beyond what would normally be required for a load of that size. If the load wraps from slice 3 to slice 0, there is an additional 3-cycle penalty | 0, 1, 2, 3 | |
PM_LS0_UNALIGNED_ST | Store instructions whose data crosses a double-word boundary, which causes the store to require an additional slice beyond what would normally be required for a store of that size. If the store wraps from slice 3 to slice 0, there is an additional 3-cycle penalty | 0, 1, 2, 3 | |
PM_LS1_DC_COLLISIONS | Read-write data cache collisions | 0, 1, 2, 3 | |
PM_LS1_ERAT_MISS_PREF | LS1 Erat miss due to prefetch | 0, 1, 2, 3 | |
PM_LS1_LAUNCH_HELD_PREF | Number of times a load or store instruction was unable to launch/relaunch because a high priority prefetch used that relaunch cycle | 0, 1, 2, 3 | |
PM_LS1_PTE_TABLEWALK_CYC | Cycles when a tablewalk is pending on this thread on table 1 | 0, 1, 2, 3 | |
PM_LS1_TM_DISALLOW | A TM-ineligible instruction tries to execute inside a transaction and the LSU disallows it | 0, 1, 2, 3 | |
PM_LS1_UNALIGNED_LD | Load instructions whose data crosses a double-word boundary, which causes the load to require an additional slice beyond what would normally be required for a load of that size. If the load wraps from slice 3 to slice 0, there is an additional 3-cycle penalty | 0, 1, 2, 3 | |
PM_LS1_UNALIGNED_ST | Store instructions whose data crosses a double-word boundary, which causes the store to require an additional slice beyond what would normally be required for a store of that size. If the store wraps from slice 3 to slice 0, there is an additional 3-cycle penalty | 0, 1, 2, 3 | |
PM_LS2_DC_COLLISIONS | Read-write data cache collisions | 0, 1, 2, 3 | |
PM_LS2_ERAT_MISS_PREF | LS2 Erat miss due to prefetch | 0, 1, 2, 3 | |
PM_LS2_TM_DISALLOW | A TM-ineligible instruction tries to execute inside a transaction and the LSU disallows it | 0, 1, 2, 3 | |
PM_LS2_UNALIGNED_LD | Load instructions whose data crosses a double-word boundary, which causes the load to require an additional slice beyond what would normally be required for a load of that size. If the load wraps from slice 3 to slice 0, there is an additional 3-cycle penalty | 0, 1, 2, 3 | |
PM_LS2_UNALIGNED_ST | Store instructions whose data crosses a double-word boundary, which causes the store to require an additional slice beyond what would normally be required for a store of that size. If the store wraps from slice 3 to slice 0, there is an additional 3-cycle penalty | 0, 1, 2, 3 | |
PM_LS3_DC_COLLISIONS | Read-write data cache collisions | 0, 1, 2, 3 | |
PM_LS3_ERAT_MISS_PREF | LS3 Erat miss due to prefetch | 0, 1, 2, 3 | |
PM_LS3_TM_DISALLOW | A TM-ineligible instruction tries to execute inside a transaction and the LSU disallows it | 0, 1, 2, 3 | |
PM_LS3_UNALIGNED_LD | Load instructions whose data crosses a double-word boundary, which causes the load to require an additional slice beyond what would normally be required for a load of that size. If the load wraps from slice 3 to slice 0, there is an additional 3-cycle penalty | 0, 1, 2, 3 | |
PM_LS3_UNALIGNED_ST | Store instructions whose data crosses a double-word boundary, which causes the store to require an additional slice beyond what would normally be required for a store of that size. If the store wraps from slice 3 to slice 0, there is an additional 3-cycle penalty | 0, 1, 2, 3 | |
PM_LSU0_1_LRQF_FULL_CYC | Counts the number of cycles the LRQF is full. LRQF is the queue that holds loads between finish and completion. If it fills up, instructions stay in LRQ until completion, potentially backing up the LRQ | 0, 1, 2, 3 | |
PM_LSU0_ERAT_HIT | Primary ERAT hit. There is no secondary ERAT | 0, 1, 2, 3 | |
PM_LSU0_FALSE_LHS | False LHS match detected | 0, 1, 2, 3 | |
PM_LSU0_L1_CAM_CANCEL | ls0 l1 tm cam cancel | 0, 1, 2, 3 | |
PM_LSU0_LDMX_FIN | New P9 instruction LDMX. The definition of this new PMU event is (from the ldmx RFC02491): "The thread has executed an ldmx instruction that accessed a doubleword that contains an effective address within an enabled section of the Load Monitored region." This event, therefore, should not occur if the FSCR has disabled the load monitored facility (FSCR[52]) or disabled the EBB facility (FSCR[56]). | 0, 1, 2, 3 | |
PM_LSU0_LMQ_S0_VALID | Slot 0 of LMQ valid | 0, 1, 2, 3 | |
PM_LSU0_LRQ_S0_VALID_CYC | Slot 0 of LRQ valid | 0, 1, 2, 3 | |
PM_LSU0_SET_MPRED | Set prediction (set-p) miss. The entry was not found in the Set prediction table | 0, 1, 2, 3 | |
PM_LSU0_SRQ_S0_VALID_CYC | Slot 0 of SRQ valid | 0, 1, 2, 3 | |
PM_LSU0_STORE_REJECT | All internal store rejects cause the instruction to go back to the SRQ and go to sleep until woken up to try again after the condition has been met | 0, 1, 2, 3 | |
PM_LSU0_TM_L1_HIT | Load tm hit in L1 | 0, 1, 2, 3 | |
PM_LSU0_TM_L1_MISS | Load tm L1 miss | 0, 1, 2, 3 | |
PM_LSU1_ERAT_HIT | Primary ERAT hit. There is no secondary ERAT | 0, 1, 2, 3 | |
PM_LSU1_FALSE_LHS | False LHS match detected | 0, 1, 2, 3 | |
PM_LSU1_L1_CAM_CANCEL | ls1 l1 tm cam cancel | 0, 1, 2, 3 | |
PM_LSU1_LDMX_FIN | New P9 instruction LDMX. The definition of this new PMU event is (from the ldmx RFC02491): "The thread has executed an ldmx instruction that accessed a doubleword that contains an effective address within an enabled section of the Load Monitored region." This event, therefore, should not occur if the FSCR has disabled the load monitored facility (FSCR[52]) or disabled the EBB facility (FSCR[56]). | 0, 1, 2, 3 | |
PM_LSU1_SET_MPRED | Set prediction (set-p) miss. The entry was not found in the Set prediction table | 0, 1, 2, 3 | |
PM_LSU1_STORE_REJECT | All internal store rejects cause the instruction to go back to the SRQ and go to sleep until woken up to try again after the condition has been met | 0, 1, 2, 3 | |
PM_LSU1_TM_L1_HIT | Load tm hit in L1 | 0, 1, 2, 3 | |
PM_LSU1_TM_L1_MISS | Load tm L1 miss | 0, 1, 2, 3 | |
PM_LSU2_3_LRQF_FULL_CYC | Counts the number of cycles the LRQF is full. LRQF is the queue that holds loads between finish and completion. If it fills up, instructions stay in LRQ until completion, potentially backing up the LRQ | 0, 1, 2, 3 | |
PM_LSU2_ERAT_HIT | Primary ERAT hit. There is no secondary ERAT | 0, 1, 2, 3 | |
PM_LSU2_FALSE_LHS | False LHS match detected | 0, 1, 2, 3 | |
PM_LSU2_L1_CAM_CANCEL | ls2 l1 tm cam cancel | 0, 1, 2, 3 | |
PM_LSU2_LDMX_FIN | New P9 instruction LDMX. The definition of this new PMU event is (from the ldmx RFC02491): "The thread has executed an ldmx instruction that accessed a doubleword that contains an effective address within an enabled section of the Load Monitored region." This event, therefore, should not occur if the FSCR has disabled the load monitored facility (FSCR[52]) or disabled the EBB facility (FSCR[56]). | 0, 1, 2, 3 | |
PM_LSU2_SET_MPRED | Set prediction (set-p) miss. The entry was not found in the Set prediction table | 0, 1, 2, 3 | |
PM_LSU2_STORE_REJECT | All internal store rejects cause the instruction to go back to the SRQ and go to sleep until woken up to try again after the condition has been met | 0, 1, 2, 3 | |
PM_LSU2_TM_L1_HIT | Load tm hit in L1 | 0, 1, 2, 3 | |
PM_LSU2_TM_L1_MISS | Load tm L1 miss | 0, 1, 2, 3 | |
PM_LSU3_ERAT_HIT | Primary ERAT hit. There is no secondary ERAT | 0, 1, 2, 3 | |
PM_LSU3_FALSE_LHS | False LHS match detected | 0, 1, 2, 3 | |
PM_LSU3_L1_CAM_CANCEL | ls3 l1 tm cam cancel | 0, 1, 2, 3 | |
PM_LSU3_LDMX_FIN | New P9 instruction LDMX. The definition of this new PMU event is (from the ldmx RFC02491): "The thread has executed an ldmx instruction that accessed a doubleword that contains an effective address within an enabled section of the Load Monitored region." This event, therefore, should not occur if the FSCR has disabled the load monitored facility (FSCR[52]) or disabled the EBB facility (FSCR[56]). | 0, 1, 2, 3 | |
PM_LSU3_SET_MPRED | Set prediction (set-p) miss. The entry was not found in the Set prediction table | 0, 1, 2, 3 | |
PM_LSU3_STORE_REJECT | All internal store rejects cause the instruction to go back to the SRQ and go to sleep until woken up to try again after the condition has been met | 0, 1, 2, 3 | |
PM_LSU3_TM_L1_HIT | Load tm hit in L1 | 0, 1, 2, 3 | |
PM_LSU3_TM_L1_MISS | Load tm L1 miss | 0, 1, 2, 3 | |
PM_LSU_DERAT_MISS | DERAT Reloaded due to a DERAT miss | 1 | |
PM_LSU_FIN | LSU Finished a PPC instruction (up to 4 per cycle) | 2 | |
PM_LSU_FLUSH_ATOMIC | Quad-word loads (lq) are considered atomic because they always span at least 2 slices. If a snoop or store from another thread changes the data the load is accessing between the 2 or 3 pieces of the lq instruction, the lq will be flushed | 0, 1, 2, 3 | |
PM_LSU_FLUSH_CI | Load was not issued to LSU as a cache inhibited (non-cacheable) load but it was later determined to be cache inhibited | 0, 1, 2, 3 | |
PM_LSU_FLUSH_EMSH | An ERAT miss was detected after a set-p hit. Erat tracker indicates fail due to tlbmiss and the instruction gets flushed because the instruction was working on the wrong address | 0, 1, 2, 3 | |
PM_LSU_FLUSH_LARX_STCX | A larx is flushed because an older larx has an LMQ reservation for the same thread. A stcx is flushed because an older stcx is in the LMQ. The flush happens when the older larx/stcx relaunches | 0, 1, 2, 3 | |
PM_LSU_FLUSH_LHL_SHL | The instruction was flushed because of a sequential load/store consistency violation: a load or store hit on an older load that has either been snooped (for loads) or has stale data (for stores). | 0, 1, 2, 3 | |
PM_LSU_FLUSH_LHS | Effective Address alias flush: no EA match but Real Address match. If the data has not yet been returned for this load, the instruction will just be rejected, but if it has returned data, it will be flushed | 0, 1, 2, 3 | |
PM_LSU_FLUSH_NEXT | LSU flush next reported at flush time. Sometimes these also come with an exception | 0, 1, 2, 3 | |
PM_LSU_FLUSH_OTHER | Other LSU flushes, including: Sync (sync ack from L2 caused a search of the LRQ for the oldest snooped load; this will either signal a Precise Flush of the oldest snooped load or a Flush Next PPC) | 0, 1, 2, 3 | |
PM_LSU_FLUSH_RELAUNCH_MISS | If a load that has already returned data has to relaunch for any reason and then gets a miss (erat, setp, data cache), it will often be flushed at relaunch time because the data might be inconsistent | 0, 1, 2, 3 | |
PM_LSU_FLUSH_SAO | A load-hit-load condition with Strong Address Ordering will have address compare disabled and flush | 0, 1, 2, 3 | |
PM_LSU_FLUSH_UE | Correctable ECC error on reload data, reported at critical data forward time | 0, 1, 2, 3 | |
PM_LSU_FLUSH_WRK_ARND | LSU workaround flush. These flushes are set up with programmable scan-only latches to perform various actions when the flush macro receives a trigger from the dbg macros. These actions include things like flushing the next op encountered for a particular thread or flushing the next NTC op encountered on a particular slice. The kind of flush that the workaround is set up to perform is highly variable. | 0, 1, 2, 3 | |
PM_LSU_LMQ_FULL_CYC | Counts the number of cycles the LMQ is full | 0, 1, 2, 3 | |
PM_LSU_LMQ_SRQ_EMPTY_CYC | Cycles in which the LSU is empty for all threads (lmq and srq are completely empty) | 1 | |
PM_LSU_NCST | Asserts when an i=1 store op is sent to the nest. No record of the issue pipe (LS0/LS1) is maintained, so this covers both pipes; separate LS0 and LS1 counts are probably unnecessary | 0, 1, 2, 3 | |
PM_LSU_REJECT_ERAT_MISS | LSU Reject due to ERAT (up to 4 per cycle) | 1 | |
PM_LSU_REJECT_LHS | LSU Reject due to LHS (up to 4 per cycle) | 3 | |
PM_LSU_REJECT_LMQ_FULL | LSU Reject due to LMQ full (up to 4 per cycle) | 2 | |
PM_LSU_SRQ_FULL_CYC | Cycles in which the Store Queue is full on all 4 slices. This event is not per thread; all the threads will see the same count for this core resource | 0 | |
PM_LSU_STCX | STCX sent to nest, i.e. total | 0, 1, 2, 3 | |
PM_LSU_STCX_FAIL | STCX instructions that failed | 0, 1, 2, 3 | |
PM_LWSYNC | Lwsync instruction decoded and transferred | 0, 1, 2, 3 | |
PM_MATH_FLOP_CMPL | Math flop instruction completed | 3 | |
PM_MEM_CO | Memory castouts from this thread | 3 | |
PM_MEM_LOC_THRESH_IFU | Local Memory above threshold for IFU speculation control | 0 | |
PM_MEM_LOC_THRESH_LSU_HIGH | Local memory above threshold for LSU medium | 3 | |
PM_MEM_LOC_THRESH_LSU_MED | Local memory above threshold for data prefetch | 0 | |
PM_MEM_PREF | Memory prefetch for this thread. Includes L4 | 1 | |
PM_MEM_READ | Reads from Memory from this thread (includes data/inst/xlate/l1prefetch/inst prefetch). Includes L4 | 0 | |
PM_MEM_RWITM | Memory Read With Intent to Modify for this thread | 2 | |
PM_MRK_BACK_BR_CMPL | Marked branch instruction completed with a target address less than current instruction address | 2 | |
PM_MRK_BR_2PATH | marked branches which are not strongly biased | 0 | |
PM_MRK_BR_CMPL | Branch Instruction completed | 0 | |
PM_MRK_BR_MPRED_CMPL | Marked Branch Mispredicted | 2 | |
PM_MRK_BR_TAKEN_CMPL | Marked Branch Taken completed | 0 | |
PM_MRK_BRU_FIN | bru marked instr finish | 1 | |
PM_MRK_DATA_FROM_DL2L3_MOD | The processor's data cache was reloaded with Modified (M) data from another chip's L2 or L3 on a different Node or Group (Distant) from this chip due to a marked load | 2 | |
PM_MRK_DATA_FROM_DL2L3_MOD_CYC | Duration in cycles to reload with Modified (M) data from another chip's L2 or L3 on a different Node or Group (Distant) from this chip due to a marked load | 3 | |
PM_MRK_DATA_FROM_DL2L3_SHR | The processor's data cache was reloaded with Shared (S) data from another chip's L2 or L3 on a different Node or Group (Distant) from this chip due to a marked load | 0 | |
PM_MRK_DATA_FROM_DL2L3_SHR_CYC | Duration in cycles to reload with Shared (S) data from another chip's L2 or L3 on a different Node or Group (Distant) from this chip due to a marked load | 1 | |
PM_MRK_DATA_FROM_DL4 | The processor's data cache was reloaded from another chip's L4 on a different Node or Group (Distant) due to a marked load | 0 | |
PM_MRK_DATA_FROM_DL4_CYC | Duration in cycles to reload from another chip's L4 on a different Node or Group (Distant) due to a marked load | 1 | |
PM_MRK_DATA_FROM_DMEM | The processor's data cache was reloaded from another chip's memory on a different Node or Group (Distant) due to a marked load | 2 | |
PM_MRK_DATA_FROM_DMEM_CYC | Duration in cycles to reload from another chip's memory on a different Node or Group (Distant) due to a marked load | 3 | |
PM_MRK_DATA_FROM_L2 | The processor's data cache was reloaded from local core's L2 due to a marked load | 1 | |
PM_MRK_DATA_FROM_L21_MOD | The processor's data cache was reloaded with Modified (M) data from another core's L2 on the same chip due to a marked load | 3 | |
PM_MRK_DATA_FROM_L21_MOD_CYC | Duration in cycles to reload with Modified (M) data from another core's L2 on the same chip due to a marked load | 2 | |
PM_MRK_DATA_FROM_L21_SHR | The processor's data cache was reloaded with Shared (S) data from another core's L2 on the same chip due to a marked load | 1 | |
PM_MRK_DATA_FROM_L21_SHR_CYC | Duration in cycles to reload with Shared (S) data from another core's L2 on the same chip due to a marked load | 0 | |
PM_MRK_DATA_FROM_L2_CYC | Duration in cycles to reload from local core's L2 due to a marked load | 0 | |
PM_MRK_DATA_FROM_L2_DISP_CONFLICT_LDHITST | The processor's data cache was reloaded from local core's L2 with load hit store conflict due to a marked load | 1 | |
PM_MRK_DATA_FROM_L2_DISP_CONFLICT_LDHITST_CYC | Duration in cycles to reload from local core's L2 with load hit store conflict due to a marked load | 0 | |
PM_MRK_DATA_FROM_L2_DISP_CONFLICT_OTHER | The processor's data cache was reloaded from local core's L2 with dispatch conflict due to a marked load | 1 | |
PM_MRK_DATA_FROM_L2_DISP_CONFLICT_OTHER_CYC | Duration in cycles to reload from local core's L2 with dispatch conflict due to a marked load | 2 | |
PM_MRK_DATA_FROM_L2_MEPF | The processor's data cache was reloaded from local core's L2 hit without dispatch conflicts in Mepf state due to a marked load | 3 | |
PM_MRK_DATA_FROM_L2_MEPF_CYC | Duration in cycles to reload from local core's L2 hit without dispatch conflicts in Mepf state due to a marked load | 2 | |
PM_MRK_DATA_FROM_L2MISS | The processor's data cache was reloaded from a location other than the local core's L2 due to a marked load | 3 | |
PM_MRK_DATA_FROM_L2MISS_CYC | Duration in cycles to reload from a location other than the local core's L2 due to a marked load | 2 | |
PM_MRK_DATA_FROM_L2_NO_CONFLICT | The processor's data cache was reloaded from local core's L2 without conflict due to a marked load | 1 | |
PM_MRK_DATA_FROM_L2_NO_CONFLICT_CYC | Duration in cycles to reload from local core's L2 without conflict due to a marked load | 0 | |
PM_MRK_DATA_FROM_L3 | The processor's data cache was reloaded from local core's L3 due to a marked load | 3 | |
PM_MRK_DATA_FROM_L31_ECO_MOD | The processor's data cache was reloaded with Modified (M) data from another core's ECO L3 on the same chip due to a marked load | 3 | |
PM_MRK_DATA_FROM_L31_ECO_MOD_CYC | Duration in cycles to reload with Modified (M) data from another core's ECO L3 on the same chip due to a marked load | 2 | |
PM_MRK_DATA_FROM_L31_ECO_SHR | The processor's data cache was reloaded with Shared (S) data from another core's ECO L3 on the same chip due to a marked load | 1 | |
PM_MRK_DATA_FROM_L31_ECO_SHR_CYC | Duration in cycles to reload with Shared (S) data from another core's ECO L3 on the same chip due to a marked load | 0 | |
PM_MRK_DATA_FROM_L31_MOD | The processor's data cache was reloaded with Modified (M) data from another core's L3 on the same chip due to a marked load | 1 | |
PM_MRK_DATA_FROM_L31_MOD_CYC | Duration in cycles to reload with Modified (M) data from another core's L3 on the same chip due to a marked load | 0 | |
PM_MRK_DATA_FROM_L31_SHR | The processor's data cache was reloaded with Shared (S) data from another core's L3 on the same chip due to a marked load | 3 | |
PM_MRK_DATA_FROM_L31_SHR_CYC | Duration in cycles to reload with Shared (S) data from another core's L3 on the same chip due to a marked load | 2 | |
PM_MRK_DATA_FROM_L3_CYC | Duration in cycles to reload from local core's L3 due to a marked load | 2 | |
PM_MRK_DATA_FROM_L3_DISP_CONFLICT | The processor's data cache was reloaded from local core's L3 with dispatch conflict due to a marked load | 0 | |
PM_MRK_DATA_FROM_L3_DISP_CONFLICT_CYC | Duration in cycles to reload from local core's L3 with dispatch conflict due to a marked load | 1 | |
PM_MRK_DATA_FROM_L3_MEPF | The processor's data cache was reloaded from local core's L3 without dispatch conflicts, hit in Mepf state, due to a marked load | 1 | |
PM_MRK_DATA_FROM_L3_MEPF_CYC | Duration in cycles to reload from local core's L3 without dispatch conflicts, hit in Mepf state, due to a marked load | 0 | |
PM_MRK_DATA_FROM_L3MISS | The processor's data cache was reloaded from a location other than the local core's L3 due to a marked load | 1 | |
PM_MRK_DATA_FROM_L3MISS_CYC | Duration in cycles to reload from a location other than the local core's L3 due to a marked load | 0 | |
PM_MRK_DATA_FROM_L3_NO_CONFLICT | The processor's data cache was reloaded from local core's L3 without conflict due to a marked load | 2 | |
PM_MRK_DATA_FROM_L3_NO_CONFLICT_CYC | Duration in cycles to reload from local core's L3 without conflict due to a marked load | 3 | |
PM_MRK_DATA_FROM_LL4 | The processor's data cache was reloaded from the local chip's L4 cache due to a marked load | 0 | |
PM_MRK_DATA_FROM_LL4_CYC | Duration in cycles to reload from the local chip's L4 cache due to a marked load | 1 | |
PM_MRK_DATA_FROM_LMEM | The processor's data cache was reloaded from the local chip's Memory due to a marked load | 2 | |
PM_MRK_DATA_FROM_LMEM_CYC | Duration in cycles to reload from the local chip's Memory due to a marked load | 3 | |
PM_MRK_DATA_FROM_MEMORY | The processor's data cache was reloaded from a memory location including L4 from local remote or distant due to a marked load | 1 | |
PM_MRK_DATA_FROM_MEMORY_CYC | Duration in cycles to reload from a memory location including L4 from local remote or distant due to a marked load | 0 | |
PM_MRK_DATA_FROM_OFF_CHIP_CACHE | The processor's data cache was reloaded with either shared or modified data from another core's L2/L3 on a different chip (remote or distant) due to a marked load | 1 | |
PM_MRK_DATA_FROM_OFF_CHIP_CACHE_CYC | Duration in cycles to reload with either shared or modified data from another core's L2/L3 on a different chip (remote or distant) due to a marked load | 0 | |
PM_MRK_DATA_FROM_ON_CHIP_CACHE | The processor's data cache was reloaded with either shared or modified data from another core's L2/L3 on the same chip due to a marked load | 3 | |
PM_MRK_DATA_FROM_ON_CHIP_CACHE_CYC | Duration in cycles to reload with either shared or modified data from another core's L2/L3 on the same chip due to a marked load | 2 | |
PM_MRK_DATA_FROM_RL2L3_MOD | The processor's data cache was reloaded with Modified (M) data from another chip's L2 or L3 on the same Node or Group (Remote) as this chip due to a marked load | 0 | |
PM_MRK_DATA_FROM_RL2L3_MOD_CYC | Duration in cycles to reload with Modified (M) data from another chip's L2 or L3 on the same Node or Group (Remote) as this chip due to a marked load | 1 | |
PM_MRK_DATA_FROM_RL2L3_SHR | The processor's data cache was reloaded with Shared (S) data from another chip's L2 or L3 on the same Node or Group (Remote) as this chip due to a marked load | 2 | |
PM_MRK_DATA_FROM_RL2L3_SHR_CYC | Duration in cycles to reload with Shared (S) data from another chip's L2 or L3 on the same Node or Group (Remote) as this chip due to a marked load | 3 | |
PM_MRK_DATA_FROM_RL4 | The processor's data cache was reloaded from another chip's L4 on the same Node or Group (Remote) due to a marked load | 2 | |
PM_MRK_DATA_FROM_RL4_CYC | Duration in cycles to reload from another chip's L4 on the same Node or Group (Remote) due to a marked load | 3 | |
PM_MRK_DATA_FROM_RMEM | The processor's data cache was reloaded from another chip's memory on the same Node or Group (Remote) due to a marked load | 0 | |
PM_MRK_DATA_FROM_RMEM_CYC | Duration in cycles to reload from another chip's memory on the same Node or Group (Remote) due to a marked load | 1 | |
PM_MRK_DCACHE_RELOAD_INTV | Combined Intervention event | 3 | |
PM_MRK_DERAT_MISS | Erat Miss (TLB Access) All page sizes | 2 | |
PM_MRK_DERAT_MISS_16G | Marked Data ERAT Miss (Data TLB Access) page size 16G | 3 | |
PM_MRK_DERAT_MISS_16M | Marked Data ERAT Miss (Data TLB Access) page size 16M | 2 | |
PM_MRK_DERAT_MISS_1G | Marked Data ERAT Miss (Data TLB Access) page size 1G. Implies radix translation | 2 | |
PM_MRK_DERAT_MISS_2M | Marked Data ERAT Miss (Data TLB Access) page size 2M. Implies radix translation | 1 | |
PM_MRK_DERAT_MISS_4K | Marked Data ERAT Miss (Data TLB Access) page size 4K | 1 | |
PM_MRK_DERAT_MISS_64K | Marked Data ERAT Miss (Data TLB Access) page size 64K | 1 | |
PM_MRK_DFU_FIN | Decimal Unit marked Instruction Finish | 1 | |
PM_MRK_DPTEG_FROM_DL2L3_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another chip's L2 or L3 on a different Node or Group (Distant) from this chip due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 3 | |
PM_MRK_DPTEG_FROM_DL2L3_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another chip's L2 or L3 on a different Node or Group (Distant) from this chip due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 2 | |
PM_MRK_DPTEG_FROM_DL4 | A Page Table Entry was loaded into the TLB from another chip's L4 on a different Node or Group (Distant) due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 2 | |
PM_MRK_DPTEG_FROM_DMEM | A Page Table Entry was loaded into the TLB from another chip's memory on a different Node or Group (Distant) due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 3 | |
PM_MRK_DPTEG_FROM_L2 | A Page Table Entry was loaded into the TLB from local core's L2 due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_MRK_DPTEG_FROM_L21_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another core's L2 on the same chip due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 3 | |
PM_MRK_DPTEG_FROM_L21_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another core's L2 on the same chip due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 2 | |
PM_MRK_DPTEG_FROM_L2_MEPF | A Page Table Entry was loaded into the TLB from local core's L2 hit without dispatch conflicts in Mepf state due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 1 | |
PM_MRK_DPTEG_FROM_L2MISS | A Page Table Entry was loaded into the TLB from a location other than the local core's L2 due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_MRK_DPTEG_FROM_L2_NO_CONFLICT | A Page Table Entry was loaded into the TLB from local core's L2 without conflict due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_MRK_DPTEG_FROM_L3 | A Page Table Entry was loaded into the TLB from local core's L3 due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 3 | |
PM_MRK_DPTEG_FROM_L31_ECO_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another core's ECO L3 on the same chip due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 3 | |
PM_MRK_DPTEG_FROM_L31_ECO_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another core's ECO L3 on the same chip due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 2 | |
PM_MRK_DPTEG_FROM_L31_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another core's L3 on the same chip due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 1 | |
PM_MRK_DPTEG_FROM_L31_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another core's L3 on the same chip due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_MRK_DPTEG_FROM_L3_DISP_CONFLICT | A Page Table Entry was loaded into the TLB from local core's L3 with dispatch conflict due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 2 | |
PM_MRK_DPTEG_FROM_L3_MEPF | A Page Table Entry was loaded into the TLB from local core's L3 without dispatch conflicts, hit in Mepf state, due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 1 | |
PM_MRK_DPTEG_FROM_L3MISS | A Page Table Entry was loaded into the TLB from a location other than the local core's L3 due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 3 | |
PM_MRK_DPTEG_FROM_L3_NO_CONFLICT | A Page Table Entry was loaded into the TLB from local core's L3 without conflict due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_MRK_DPTEG_FROM_LL4 | A Page Table Entry was loaded into the TLB from the local chip's L4 cache due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_MRK_DPTEG_FROM_LMEM | A Page Table Entry was loaded into the TLB from the local chip's Memory due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 1 | |
PM_MRK_DPTEG_FROM_MEMORY | A Page Table Entry was loaded into the TLB from a memory location including L4 from local remote or distant due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 1 | |
PM_MRK_DPTEG_FROM_OFF_CHIP_CACHE | A Page Table Entry was loaded into the TLB with either shared or modified data from another core's L2/L3 on a different chip (remote or distant) due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 3 | |
PM_MRK_DPTEG_FROM_ON_CHIP_CACHE | A Page Table Entry was loaded into the TLB with either shared or modified data from another core's L2/L3 on the same chip due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_MRK_DPTEG_FROM_RL2L3_MOD | A Page Table Entry was loaded into the TLB with Modified (M) data from another chip's L2 or L3 on the same Node or Group (Remote) as this chip due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 1 | |
PM_MRK_DPTEG_FROM_RL2L3_SHR | A Page Table Entry was loaded into the TLB with Shared (S) data from another chip's L2 or L3 on the same Node or Group (Remote) as this chip due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 0 | |
PM_MRK_DPTEG_FROM_RL4 | A Page Table Entry was loaded into the TLB from another chip's L4 on the same Node or Group (Remote) due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 1 | |
PM_MRK_DPTEG_FROM_RMEM | A Page Table Entry was loaded into the TLB from another chip's memory on the same Node or Group (Remote) due to a marked data side request. When using Radix Page Translation, this count excludes PDE reloads. Only PTE reloads are included | 2 | |
PM_MRK_DTLB_MISS | Marked dtlb miss | 3 | |
PM_MRK_DTLB_MISS_16G | Marked Data TLB Miss page size 16G | 1 | |
PM_MRK_DTLB_MISS_16M | Marked Data TLB Miss page size 16M | 3 | |
PM_MRK_DTLB_MISS_1G | Marked Data TLB reload (after a miss) page size 1G. Implies radix translation was used | 0 | |
PM_MRK_DTLB_MISS_4K | Marked Data TLB Miss page size 4k | 1 | |
PM_MRK_DTLB_MISS_64K | Marked Data TLB Miss page size 64K | 2 | |
PM_MRK_FAB_RSP_BKILL | Marked store had to do a bkill | 3 | |
PM_MRK_FAB_RSP_BKILL_CYC | cycles L2 RC took for a bkill | 0 | |
PM_MRK_FAB_RSP_CLAIM_RTY | Sampled store did a dclaim and got a rty | 2 | |
PM_MRK_FAB_RSP_DCLAIM | Marked store had to do a dclaim | 2 | |
PM_MRK_FAB_RSP_DCLAIM_CYC | cycles L2 RC took for a dclaim | 1 | |
PM_MRK_FAB_RSP_RD_RTY | Sampled L2 reads retry count | 3 | |
PM_MRK_FAB_RSP_RD_T_INTV | Sampled Read got a T intervention | 0 | |
PM_MRK_FAB_RSP_RWITM_CYC | cycles L2 RC took for a rwitm | 3 | |
PM_MRK_FAB_RSP_RWITM_RTY | Sampled store did a rwitm and got a rty | 1 | |
PM_MRK_FXU_FIN | fxu marked instr finish | 1 | |
PM_MRK_IC_MISS | Marked instruction experienced I cache miss | 3 | |
PM_MRK_INST | An instruction was marked. Includes both Random Instruction Sampling (RIS) at decode time and Random Event Sampling (RES) at the time the configured event happens | 1 | |
PM_MRK_INST_CMPL | marked instruction completed | 3 | |
PM_MRK_INST_DECODED | An instruction was marked at decode time. Random Instruction Sampling (RIS) only | 1 | |
PM_MRK_INST_DISP | The thread has dispatched a randomly sampled marked instruction | 0 | |
PM_MRK_INST_FIN | marked instruction finished | 2 | |
PM_MRK_INST_FROM_L3MISS | Marked instruction was reloaded from a location beyond the local chiplet | 3 | |
PM_MRK_INST_ISSUED | Marked instruction issued | 0 | |
PM_MRK_INST_TIMEO | marked Instruction finish timeout (instruction lost) | 3 | |
PM_MRK_L1_ICACHE_MISS | sampled Instruction suffered an icache Miss | 0 | |
PM_MRK_L1_RELOAD_VALID | Marked demand reload | 0 | |
PM_MRK_L2_RC_DISP | Marked Instruction RC dispatched in L2 | 1 | |
PM_MRK_L2_RC_DONE | Marked RC done | 2 | |
PM_MRK_L2_TM_REQ_ABORT | TM abort | 0 | |
PM_MRK_L2_TM_ST_ABORT_SISTER | TM marked store abort for this thread | 2 | |
PM_MRK_LARX_FIN | Larx finished | 3 | |
PM_MRK_LD_MISS_EXPOSED_CYC | Marked Load exposed Miss (use edge detect to count #) | 0 | |
PM_MRK_LD_MISS_L1 | Marked DL1 Demand Miss counted at exec time. Note that this count is per slice, so if a load spans multiple slices this event will increment multiple times for a single load. | 1 | |
PM_MRK_LD_MISS_L1_CYC | Marked ld latency | 0 | |
PM_MRK_LSU_DERAT_MISS | Marked derat reload (miss) for any page size | 2 | |
PM_MRK_LSU_FIN | lsu marked instr PPC finish | 3 | |
PM_MRK_LSU_FLUSH_ATOMIC | Quad-word loads (lq) are considered atomic because they always span at least 2 slices. If a snoop or store from another thread changes the data the load is accessing between the 2 or 3 pieces of the lq instruction, the lq will be flushed | 0, 1, 2, 3 | |
PM_MRK_LSU_FLUSH_EMSH | An ERAT miss was detected after a set-p hit. Erat tracker indicates fail due to tlbmiss and the instruction gets flushed because the instruction was working on the wrong address | 0, 1, 2, 3 | |
PM_MRK_LSU_FLUSH_LARX_STCX | A larx is flushed because an older larx has an LMQ reservation for the same thread. A stcx is flushed because an older stcx is in the LMQ. The flush happens when the older larx/stcx relaunches | 0, 1, 2, 3 | |
PM_MRK_LSU_FLUSH_LHL_SHL | The instruction was flushed because of a sequential load/store consistency violation: a load or store hit on an older load that has either been snooped (for loads) or has stale data (for stores). | 0, 1, 2, 3 | |
PM_MRK_LSU_FLUSH_LHS | Effective Address alias flush: no EA match but Real Address match. If the data has not yet been returned for this load, the instruction will just be rejected, but if it has returned data, it will be flushed | 0, 1, 2, 3 | |
PM_MRK_LSU_FLUSH_RELAUNCH_MISS | If a load that has already returned data has to relaunch for any reason and then gets a miss (erat, setp, data cache), it will often be flushed at relaunch time because the data might be inconsistent | 0, 1, 2, 3 | |
PM_MRK_LSU_FLUSH_SAO | A load-hit-load condition with Strong Address Ordering will have address compare disabled and flush | 0, 1, 2, 3 | |
PM_MRK_LSU_FLUSH_UE | Correctable ECC error on reload data, reported at critical data forward time | 0, 1, 2, 3 | |
PM_MRK_NTC_CYC | Cycles during which the marked instruction is next to complete (completion is held up because the marked instruction hasn't completed yet) | 1 | |
PM_MRK_NTF_FIN | Marked next to finish instruction finished | 1 | |
PM_MRK_PROBE_NOP_CMPL | Marked probeNops completed | 0 | |
PM_MRK_RUN_CYC | Run cycles in which a marked instruction is in the pipeline | 0 | |
PM_MRK_STALL_CMPLU_CYC | Number of cycles the marked instruction is experiencing a stall while it is next to complete (NTC) | 2 | |
PM_MRK_ST_CMPL | Marked store completed and sent to nest | 2 | |
PM_MRK_ST_CMPL_INT | marked store finished with intervention | 2 | |
PM_MRK_STCX_FAIL | marked stcx failed | 2 | |
PM_MRK_STCX_FIN | Number of marked stcx instructions finished. This includes instructions in the speculative path of a branch that may be flushed | 1 | |
PM_MRK_ST_DONE_L2 | marked store completed in L2 ( RC machine done) | 0 | |
PM_MRK_ST_DRAIN_TO_L2DISP_CYC | cycles to drain st from core to L2 | 2 | |
PM_MRK_ST_FWD | Marked st forwards | 2 | |
PM_MRK_ST_L2DISP_TO_CMPL_CYC | cycles from L2 RC dispatch to L2 RC completion | 0 | |
PM_MRK_ST_NEST | Marked store sent to nest | 1 | |
PM_MRK_TEND_FAIL | Nested or not nested tend failed for a marked tend instruction | 0, 1, 2, 3 | |
PM_MRK_VSU_FIN | VSU marked instr finish | 2 | |
PM_MULT_MRK | mult marked instr | 2 | |
PM_NEST_REF_CLK | Multiply by 4 to obtain the number of PB cycles | 2 | |
PM_NON_DATA_STORE | All ops that drain from s2q to L2 and contain no data | 0, 1, 2, 3 | |
PM_NON_FMA_FLOP_CMPL | Non-FMA instruction completed | 3 | |
PM_NON_MATH_FLOP_CMPL | Non-FLOP operation completed | 3 | |
PM_NON_TM_RST_SC | Non-TM snp rst TM SC | 1 | |
PM_NTC_ALL_FIN | Cycles from when all instructions have finished until the group completed | 1 | |
PM_NTC_FIN | Cycles in which the oldest instruction in the pipeline (NTC) finishes. This event is used to account for cycles in which work is being completed in the CPI stack | 1 | |
PM_NTC_ISSUE_HELD_ARB | The NTC instruction is being held at dispatch because it lost arbitration onto the issue pipe to another instruction (from the same thread or a different thread) | 1 | |
PM_NTC_ISSUE_HELD_DARQ_FULL | The NTC instruction is being held at dispatch because there are no slots in the DARQ for it | 0 | |
PM_NTC_ISSUE_HELD_OTHER | The NTC instruction is being held at dispatch during regular pipeline cycles, or because the VSU is busy with multi-cycle instructions, or because of a write-back collision with VSU | 2 | |
PM_PARTIAL_ST_FIN | Any store finished by an LSU slice | 2 | |
PM_PMC1_OVERFLOW | Overflow from counter 1 | 1 | |
PM_PMC1_REWIND | PMC1 rewind event | 3 | |
PM_PMC1_SAVED | PMC1 Rewind Value saved | 3 | |
PM_PMC2_OVERFLOW | Overflow from counter 2 | 2 | |
PM_PMC2_REWIND | PMC2 Rewind Event (did not match condition) | 2 | |
PM_PMC2_SAVED | PMC2 Rewind Value saved | 0 | |
PM_PMC3_OVERFLOW | Overflow from counter 3 | 3 | |
PM_PMC3_REWIND | PMC3 rewind event. A rewind happens when a speculative event (such as latency or CPI stack) is selected on PMC3 and the stall reason or reload source did not match the one programmed in PMC3. When this occurs, the count in PMC3 will not change. | 0 | |
PM_PMC3_SAVED | PMC3 Rewind Value saved | 3 | |
PM_PMC4_OVERFLOW | Overflow from counter 4 | 0 | |
PM_PMC4_REWIND | PMC4 Rewind Event | 0 | |
PM_PMC4_SAVED | PMC4 Rewind Value saved (matched condition) | 2 | |
PM_PMC5_OVERFLOW | Overflow from counter 5 | 0 | |
PM_PMC6_OVERFLOW | Overflow from counter 6 | 2 | |
PM_PROBE_NOP_DISP | ProbeNops dispatched | 3 | |
PM_PTE_PREFETCH | PTE prefetches | 0, 1, 2, 3 | |
PM_PTESYNC | ptesync instruction counted when the instruction is decoded and transmitted | 0, 1, 2, 3 | |
PM_PUMP_CPRED | Pump prediction correct. Counts across all types of pumps for all data types excluding data prefetch (demand load, inst prefetch, inst fetch, xlate) | 0 | |
PM_PUMP_MPRED | Pump misprediction. Counts across all types of pumps for all data types excluding data prefetch (demand load, inst prefetch, inst fetch, xlate) | 3 | |
PM_RADIX_PWC_L1_HIT | A radix translation attempt missed in the TLB and only the first level page walk cache was a hit. | 0 | |
PM_RADIX_PWC_L1_PDE_FROM_L2 | A Page Directory Entry was reloaded to a level 1 page walk cache from the core's L2 data cache | 1 | |
PM_RADIX_PWC_L1_PDE_FROM_L3 | A Page Directory Entry was reloaded to a level 1 page walk cache from the core's L3 data cache | 2 | |
PM_RADIX_PWC_L1_PDE_FROM_L3MISS | A Page Directory Entry was reloaded to a level 1 page walk cache from beyond the core's L3 data cache. The source could be local/remote/distant memory or another core's cache | 3 | |
PM_RADIX_PWC_L2_HIT | A radix translation attempt missed in the TLB but hit on both the first and second levels of page walk cache. | 1 | |
PM_RADIX_PWC_L2_PDE_FROM_L2 | A Page Directory Entry was reloaded to a level 2 page walk cache from the core's L2 data cache | 1 | |
PM_RADIX_PWC_L2_PDE_FROM_L3 | A Page Directory Entry was reloaded to a level 2 page walk cache from the core's L3 data cache | 2 | |
PM_RADIX_PWC_L2_PTE_FROM_L2 | A Page Table Entry was reloaded to a level 2 page walk cache from the core's L2 data cache. This implies that level 3 and level 4 PWC accesses were not necessary for this translation | 0 | |
PM_RADIX_PWC_L2_PTE_FROM_L3 | A Page Table Entry was reloaded to a level 2 page walk cache from the core's L3 data cache. This implies that level 3 and level 4 PWC accesses were not necessary for this translation | 3 | |
PM_RADIX_PWC_L2_PTE_FROM_L3MISS | A Page Table Entry was reloaded to a level 2 page walk cache from beyond the core's L3 data cache. This implies that level 3 and level 4 PWC accesses were not necessary for this translation. The source could be local/remote/distant memory or another core's cache | 3 | |
PM_RADIX_PWC_L3_HIT | A radix translation attempt missed in the TLB but hit on the first, second, and third levels of page walk cache. | 2 | |
PM_RADIX_PWC_L3_PDE_FROM_L2 | A Page Directory Entry was reloaded to a level 3 page walk cache from the core's L2 data cache | 1 | |
PM_RADIX_PWC_L3_PDE_FROM_L3 | A Page Directory Entry was reloaded to a level 3 page walk cache from the core's L3 data cache | 0 | |
PM_RADIX_PWC_L3_PTE_FROM_L2 | A Page Table Entry was reloaded to a level 3 page walk cache from the core's L2 data cache. This implies that a level 4 PWC access was not necessary for this translation | 1 | |
PM_RADIX_PWC_L3_PTE_FROM_L3 | A Page Table Entry was reloaded to a level 3 page walk cache from the core's L3 data cache. This implies that a level 4 PWC access was not necessary for this translation | 2 | |
PM_RADIX_PWC_L3_PTE_FROM_L3MISS | A Page Table Entry was reloaded to a level 3 page walk cache from beyond the core's L3 data cache. This implies that a level 4 PWC access was not necessary for this translation. The source could be local/remote/distant memory or another core's cache | 3 | |
PM_RADIX_PWC_L4_PTE_FROM_L2 | A Page Table Entry was reloaded to a level 4 page walk cache from the core's L2 data cache. This is the deepest level of PWC possible for a translation | 0 | |
PM_RADIX_PWC_L4_PTE_FROM_L3 | A Page Table Entry was reloaded to a level 4 page walk cache from the core's L3 data cache. This is the deepest level of PWC possible for a translation | 3 | |
PM_RADIX_PWC_L4_PTE_FROM_L3MISS | A Page Table Entry was reloaded to a level 4 page walk cache from beyond the core's L3 data cache. This is the deepest level of PWC possible for a translation. The source could be local/remote/distant memory or another core's cache | 2 | |
PM_RADIX_PWC_MISS | A radix translation attempt missed in the TLB and all levels of page walk cache. | 3 | |
PM_RC0_BUSY | RC mach 0 Busy. Used by PMU to sample average RC lifetime (mach0 used as sample point) | 0 | |
PM_RC0_BUSY | RC mach 0 Busy. Used by PMU to sample average RC lifetime (mach0 used as sample point) | 1 | |
PM_RC_USAGE | Continuous 16-cycle (2-to-1) window in which this signal rotates through sampling each RC machine's busy state. The PMU uses this wave to do a 16-cycle count and sample the total number of machines running | 0 | |
PM_RD_CLEARING_SC | Read clearing SC | 3 | |
PM_RD_FORMING_SC | Read forming SC | 3 | |
PM_RD_HIT_PF | RD machine hit L3 PF machine | 1 | |
PM_RUN_CYC | Run_cycles | 1 | |
PM_RUN_CYC_SMT2_MODE | Cycles in which this thread's run latch is set and the core is in SMT2 mode | 2 | |
PM_RUN_CYC_SMT4_MODE | Cycles in which this thread's run latch is set and the core is in SMT4 mode | 1 | |
PM_RUN_CYC_ST_MODE | Cycles in which this thread's run latch is set and the core is in ST mode | 0 | |
PM_RUN_INST_CMPL | Run_Instructions | 3 | |
PM_RUN_PURR | Run_PURR | 3 | |
PM_RUN_SPURR | Run SPURR | 0 | |
PM_S2Q_FULL | Cycles during which the S2Q is full | 0, 1, 2, 3 | |
PM_SCALAR_FLOP_CMPL | Scalar flop operation completed | 3 | |
PM_SHL_CREATED | Store-Hit-Load Table Entry Created | 0, 1, 2, 3 | |
PM_SHL_ST_DEP_CREATED | Store-Hit-Load Table Read Hit with entry Enabled | 0, 1, 2, 3 | |
PM_SHL_ST_DISABLE | Store-Hit-Load Table Read Hit with entry Disabled (the entry was disabled because it was shown not to prevent the flush) | 0, 1, 2, 3 | |
PM_SLB_TABLEWALK_CYC | Cycles when a tablewalk is pending on this thread on the SLB table | 0, 1, 2, 3 | |
PM_SN0_BUSY | SN mach 0 Busy. Used by PMU to sample average SN lifetime (mach0 used as sample point) | 0 | |
PM_SN0_BUSY | SN mach 0 Busy. Used by PMU to sample average SN lifetime (mach0 used as sample point) | 1 | |
PM_SN_HIT | Any port snooper hit L3. Up to 4 can happen in a cycle, but only 1 is counted | 3 | |
PM_SN_INVL | Any port snooper detects a store to a line in the Sx state and invalidates the line. Up to 4 can happen in a cycle, but only 1 is counted | 2 | |
PM_SN_MISS | Any port snooper L3 miss or collision. Up to 4 can happen in a cycle, but only 1 is counted | 3 | |
PM_SNOOP_TLBIE | TLBIE snoop | 0, 1, 2, 3 | |
PM_SNP_TM_HIT_M | Snooped TM store hit a line in the M/Mu state | 2 | |
PM_SNP_TM_HIT_T | Snooped TM store hit a line in the T/Tn/Te state | 2 | |
PM_SN_USAGE | Continuous 16-cycle (2-to-1) window in which this signal rotates through sampling each SN machine's busy state. The PMU uses this wave to do a 16-cycle count and sample the total number of machines running | 2 | |
PM_SPACEHOLDER_0x0000040062 | SPACE_HOLDER for event 0x0000040062 | 3 | |
PM_SPACEHOLDER_0x0000040064 | SPACE_HOLDER for event 0x0000040064 | 3 | |
PM_SP_FLOP_CMPL | Single-precision FP instruction completed | 3 | |
PM_SRQ_EMPTY_CYC | Cycles in which the SRQ has at least one (out of four) empty slice | 3 | |
PM_SRQ_SYNC_CYC | A sync is in the S2Q (edge detect to count) | 0, 1, 2, 3 | |
PM_STALL_END_ICT_EMPTY | The number of times the core transitioned from a stall to ICT-empty for this thread | 0 | |
PM_ST_CAUSED_FAIL | Non-TM Store caused any thread to fail | 0 | |
PM_ST_CMPL | Stores completed from S2Q (2nd-level store queue). | 1 | |
PM_STCX_FAIL | stcx failed | 0 | |
PM_STCX_FIN | Number of stcx instructions finished. This includes instructions in the speculative path of a branch that may be flushed | 1 | |
PM_STCX_SUCCESS_CMPL | Number of stcx instructions that completed successfully | 0, 1, 2, 3 | |
PM_ST_FIN | Store finish count. Includes speculative activity | 1 | |
PM_ST_FWD | Store forwards that finished | 1 | |
PM_ST_MISS_L1 | Store Missed L1 | 2 | |
PM_STOP_FETCH_PENDING_CYC | Fetching is stopped due to an incoming instruction that will result in a flush | 0, 1, 2, 3 | |
PM_SUSPENDED | Counter OFF | 0 | |
PM_SUSPENDED | Counter OFF | 1 | |
PM_SUSPENDED | Counter OFF | 2 | |
PM_SUSPENDED | Counter OFF | 3 | |
PM_SYNC_MRK_BR_LINK | Marked Branch and link branch that can cause a synchronous interrupt | 0 | |
PM_SYNC_MRK_BR_MPRED | Marked Branch mispredict that can cause a synchronous interrupt | 0 | |
PM_SYNC_MRK_FX_DIVIDE | Marked fixed point divide that can cause a synchronous interrupt | 0 | |
PM_SYNC_MRK_L2HIT | Marked L2 Hits that can throw a synchronous interrupt | 0 | |
PM_SYNC_MRK_L2MISS | Marked L2 Miss that can throw a synchronous interrupt | 0 | |
PM_SYNC_MRK_L3MISS | Marked L3 misses that can throw a synchronous interrupt | 0 | |
PM_SYNC_MRK_PROBE_NOP | Marked probeNops which can cause synchronous interrupts | 0 | |
PM_SYS_PUMP_CPRED | Initial and Final Pump Scope was system pump for all data types excluding data prefetch (demand load, inst prefetch, inst fetch, xlate) | 2 | |
PM_SYS_PUMP_MPRED | Final Pump Scope (system) mispredicted. Either the original scope was too small (Chip/Group) or the original scope was System and it should have been smaller. Counts for all data types excluding data prefetch (demand load, inst prefetch, inst fetch, xlate) | 2 | |
PM_SYS_PUMP_MPRED_RTY | Final Pump Scope (system) ended up larger than Initial Pump Scope (Chip/Group) for all data types excluding data prefetch (demand load, inst prefetch, inst fetch, xlate) | 3 | |
PM_TABLEWALK_CYC | Cycles when an instruction tablewalk is active | 0 | |
PM_TABLEWALK_CYC_PREF | Tablewalk qualified for PTE prefetches | 0, 1, 2, 3 | |
PM_TAGE_CORRECT | The TAGE overrode BHT direction prediction and it was correct. Includes taken and not taken and is counted at execution time | 0, 1, 2, 3 | |
PM_TAGE_CORRECT_TAKEN_CMPL | The TAGE overrode BHT direction prediction and it was correct. Counted at completion for taken branches only | 0, 1, 2, 3 | |
PM_TAGE_OVERRIDE_WRONG | The TAGE overrode BHT direction prediction but it was incorrect. Counted at completion for taken branches only | 0, 1, 2, 3 | |
PM_TAGE_OVERRIDE_WRONG_SPEC | The TAGE overrode BHT direction prediction but it was incorrect. Includes taken and not taken and is counted at execution time | 0, 1, 2, 3 | |
PM_TAKEN_BR_MPRED_CMPL | Total number of taken branches that were incorrectly predicted as not-taken. This event counts branches completed and does not include speculative instructions | 1 | |
PM_TB_BIT_TRANS | timebase event | 2 | |
PM_TEND_PEND_CYC | TEND latency per thread | 0, 1, 2, 3 | |
PM_THRD_ALL_RUN_CYC | Cycles in which all the threads have the run latch set | 1 | |
PM_THRD_CONC_RUN_INST | PPC Instructions Finished by this thread when all threads in the core had the run-latch set | 2 | |
PM_THRD_PRIO_0_1_CYC | Cycles thread running at priority level 0 or 1 | 0, 1, 2, 3 | |
PM_THRD_PRIO_2_3_CYC | Cycles thread running at priority level 2 or 3 | 0, 1, 2, 3 | |
PM_THRD_PRIO_4_5_CYC | Cycles thread running at priority level 4 or 5 | 0, 1, 2, 3 | |
PM_THRD_PRIO_6_7_CYC | Cycles thread running at priority level 6 or 7 | 0, 1, 2, 3 | |
PM_THRESH_ACC | This event increments every time the threshold event counter ticks. Thresholding must be enabled (via MMCRA) and the thresholding start event must occur for this counter to increment. It will stop incrementing when the thresholding stop event occurs or when thresholding is disabled, until the next time a configured thresholding start event occurs. | 1 | |
PM_THRESH_EXC_1024 | Threshold counter exceeded a value of 1024 | 2 | |
PM_THRESH_EXC_128 | Threshold counter exceeded a value of 128 | 3 | |
PM_THRESH_EXC_2048 | Threshold counter exceeded a value of 2048 | 3 | |
PM_THRESH_EXC_256 | Threshold counter exceeded a value of 256 | 0 | |
PM_THRESH_EXC_32 | Threshold counter exceeded a value of 32 | 1 | |
PM_THRESH_EXC_4096 | Threshold counter exceeded a value of 4096 | 0 | |
PM_THRESH_EXC_512 | Threshold counter exceeded a value of 512 | 1 | |
PM_THRESH_EXC_64 | Threshold counter exceeded a value of 64 | 2 | |
PM_THRESH_MET | threshold exceeded | 0 | |
PM_THRESH_NOT_MET | Threshold counter did not meet threshold | 3 | |
PM_TLB_HIT | Number of times the TLB had the data required by the instruction. Applies to both HPT and RPT | 0 | |
PM_TLBIE_FIN | tlbie finished | 2 | |
PM_TLB_MISS | TLB Miss (I + D) | 1 | |
PM_TM_ABORTS | Number of TM transactions aborted | 2 | |
PM_TMA_REQ_L2 | Address-only request to L2, sent only on the first one; an indication that the load footprint is not expanding | 0, 1, 2, 3 | |
PM_TM_CAM_OVERFLOW | L3 TM CAM overflow during L2 castout of SC | 0 | |
PM_TM_CAP_OVERFLOW | TM Footprint Capacity Overflow | 3 | |
PM_TM_FAIL_CONF_NON_TM | TM aborted because a conflict occurred with a non-transactional access by another processor | 0, 1, 2, 3 | |
PM_TM_FAIL_CONF_TM | TM aborted because a conflict occurred with another transaction. | 0, 1, 2, 3 | |
PM_TM_FAIL_FOOTPRINT_OVERFLOW | TM aborted because the tracking limit for transactional storage accesses was exceeded. Asynchronous | 0, 1, 2, 3 | |
PM_TM_FAIL_NON_TX_CONFLICT | Non transactional conflict from LSU, gets reported to TEXASR | 0, 1, 2, 3 | |
PM_TM_FAIL_SELF | TM aborted because a self-induced conflict occurred in Suspended state, caused by a store to a storage location that was previously accessed transactionally | 0, 1, 2, 3 | |
PM_TM_FAIL_TLBIE | Transaction failed because there was a TLBIE hit in the bloom filter | 0, 1, 2, 3 | |
PM_TM_FAIL_TX_CONFLICT | Transactional conflict from LSU, gets reported to TEXASR | 0, 1, 2, 3 | |
PM_TM_FAV_CAUSED_FAIL | TM Load (fav) caused another thread to fail | 1 | |
PM_TM_FAV_TBEGIN | Dispatch time Favored tbegin | 0, 1, 2, 3 | |
PM_TM_LD_CAUSED_FAIL | Non-TM Load caused any thread to fail | 0 | |
PM_TM_LD_CONF | TM Load (fav or non-fav) ran into conflict (failed) | 1 | |
PM_TM_NESTED_TBEGIN | Completion time nested tbegin | 0, 1, 2, 3 | |
PM_TM_NESTED_TEND | Completion time nested tend | 0, 1, 2, 3 | |
PM_TM_NON_FAV_TBEGIN | Dispatch time non favored tbegin | 0, 1, 2, 3 | |
PM_TM_OUTER_TBEGIN | Completion time outer tbegin | 0, 1, 2, 3 | |
PM_TM_OUTER_TBEGIN_DISP | Number of outer tbegin instructions dispatched. The dispatch unit determines whether the tbegin instruction is outer or nested. This is a speculative count, which includes flushed instructions | 3 | |
PM_TM_OUTER_TEND | Completion time outer tend | 0, 1, 2, 3 | |
PM_TM_PASSED | Number of TM transactions that passed | 1 | |
PM_TM_RST_SC | TM-snp rst RM SC | 1 | |
PM_TM_SC_CO | L3 castout TM SC line | 0 | |
PM_TM_ST_CAUSED_FAIL | TM Store (fav or non-fav) caused another thread to fail | 2 | |
PM_TM_ST_CONF | TM Store (fav or non-fav) ran into conflict (failed) | 2 | |
PM_TM_TABORT_TRECLAIM | Completion time tabortnoncd, tabortcd, treclaim | 0, 1, 2, 3 | |
PM_TM_TRANS_RUN_CYC | Run cycles in transactional state | 0 | |
PM_TM_TRANS_RUN_INST | Run instructions completed in transactional state (gated by the run latch) | 2 | |
PM_TM_TRESUME | TM resume instruction completed | 0, 1, 2, 3 | |
PM_TM_TSUSPEND | TM suspend instruction completed | 0, 1, 2, 3 | |
PM_TM_TX_PASS_RUN_CYC | Cycles spent in successful transactions | 1 | |
PM_TM_TX_PASS_RUN_INST | Run instructions spent in successful transactions | 3 | |
PM_VECTOR_FLOP_CMPL | Vector FP instruction completed | 3 | |
PM_VECTOR_LD_CMPL | Number of vector load instructions completed | 3 | |
PM_VECTOR_ST_CMPL | Number of vector store instructions completed | 3 | |
PM_VSU_DP_FSQRT_FDIV | Vector versions of fdiv, fsqrt | 2 | |
PM_VSU_FIN | VSU instruction finished. Up to 4 per cycle | 1 | |
PM_VSU_FSQRT_FDIV | Four-flop operations (fdiv, fsqrt); scalar instructions only | 3 | |
PM_VSU_NON_FLOP_CMPL | Non-FLOP operation completed | 3 | |
PM_XLATE_HPT_MODE | LSU reports every cycle the thread is in HPT translation mode (as opposed to radix mode) | 0, 1, 2, 3 | |
PM_XLATE_MISS | The LSU requested a line from L2 for translation. It may be satisfied from any source beyond L2. Includes speculative instructions | 0, 1, 2, 3 | |
PM_XLATE_RADIX_MODE | LSU reports every cycle the thread is in radix translation mode (as opposed to HPT mode) | 0, 1, 2, 3 | |
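The counters above are not read directly; on Linux they are programmed through the kernel's perf_event subsystem. The following C program is a minimal sketch of counting one event around a workload with perf_event_open(2). The raw event code 0x600F4 is a placeholder assumption standing in for PM_RUN_CYC; the real code for any event in this table must be taken from the processor's performance monitoring reference, and each event can only be scheduled on the hardware counters listed in its "Counters usable" column.

    /* Sketch: count one raw PMU event (placeholder code 0x600F4,
     * standing in for PM_RUN_CYC) around a workload on Linux. */
    #include <linux/perf_event.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
    {
        /* glibc provides no wrapper for this syscall */
        return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
    }

    int main(void)
    {
        struct perf_event_attr attr;
        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_RAW;  /* raw PMU event, not a generic alias */
        attr.config = 0x600F4;      /* placeholder: substitute the documented code */
        attr.disabled = 1;          /* start counting only when enabled below */
        attr.exclude_kernel = 1;
        attr.exclude_hv = 1;

        int fd = perf_event_open(&attr, 0 /* this thread */, -1 /* any cpu */, -1, 0);
        if (fd == -1) {
            perror("perf_event_open");
            return 1;
        }

        ioctl(fd, PERF_EVENT_IOC_RESET, 0);
        ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);

        /* workload under measurement goes here */
        for (volatile long i = 0; i < 10000000; i++)
            ;

        ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

        uint64_t count = 0;
        if (read(fd, &count, sizeof(count)) != sizeof(count)) {
            perror("read");
            return 1;
        }
        printf("event count: %llu\n", (unsigned long long)count);
        close(fd);
        return 0;
    }

Opening a second event the same way and dividing the two counts yields the usual derived ratios, e.g. PM_RUN_INST_CMPL over PM_RUN_CYC for run-latch IPC, or PM_TM_TX_PASS_RUN_CYC over PM_TM_TRANS_RUN_CYC for the fraction of transactional cycles spent in transactions that eventually passed.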