This is a list of the performance counter event types for the PPC e500 v2. For details, see Chapter 7, "Performance Monitor", of the PowerPC e500 Core Complex Reference Manual, downloadable from freescale.com.
Name | Description | Counters usable | Unit mask options |
CPU_CLK | Cycles | all | |
COMPLETED_INSNS | Completed Instructions (0, 1, or 2 per cycle) | all | |
COMPLETED_OPS | Completed Micro-ops (counts 2 for load/store w/update) | all | |
INSTRUCTION_FETCHES | Instruction fetches | all | |
DECODED_OPS | Micro-ops decoded | all | |
COMPLETED_BRANCHES | Branch Instructions completed | all | |
COMPLETED_LOAD_OPS | Load micro-ops completed | all | |
COMPLETED_STORE_OPS | Store micro-ops completed | all | |
COMPLETION_REDIRECTS | Number of completion buffer redirects | all | |
BRANCHES_FINISHED | Branches finished | all | |
TAKEN_BRANCHES_FINISHED | Taken branches finished | all | |
BIFFED_BRANCHES_FINISHED | Biffed branches finished | all | |
BRANCHES_MISPREDICTED | Branch instructions mispredicted due to direction, target, or IAB prediction | all | |
BRANCHES_MISPREDICTED_DIRECTION | Branches mispredicted due to direction prediction | all | |
BTB_HITS | Branches that hit in the BTB, or missed but are not taken | all | |
DECODE_STALLED | Cycles the instruction buffer was not empty, but 0 instructions decoded | all | |
ISSUE_STALLED | Cycles the issue buffer is not empty but 0 instructions issued | all | |
BRANCH_ISSUE_STALLED | Cycles the branch buffer is not empty but 0 instructions issued | all | |
SRS0_SCHEDULE_STALLED | Cycles SRS0 is not empty but 0 instructions scheduled | all | |
SRS1_SCHEDULE_STALLED | Cycles SRS1 is not empty but 0 instructions scheduled | all | |
VRS_SCHEDULE_STALLED | Cycles VRS is not empty but 0 instructions scheduled | all | |
LRS_SCHEDULE_STALLED | Cycles LRS is not empty but 0 instructions scheduled | all | |
BRS_SCHEDULE_STALLED | Cycles BRS is not empty but 0 instructions scheduled | all | |
TOTAL_TRANSLATED | Total load/store micro-ops translated. | all | |
LOADS_TRANSLATED | Number of cacheable L* or EVL* microops translated. (This includes microops from load-multiple, load-update, and load-context instructions.) | all | |
STORES_TRANSLATED | Number of cacheable ST* or EVST* microops translated. (This includes microops from store-multiple, store-update, and save-context instructions.) | all | |
TOUCHES_TRANSLATED | Number of cacheable DCBT and DCBTST instructions translated (L1 only). (Does not count touches that are converted to nops, i.e. on exceptions, on noncacheable accesses, or when the hid0[nopti] bit is set.) | all | |
CACHEOPS_TRANSLATED | Number of dcba, dcbf, dcbst, and dcbz instructions translated (e500 traps on dcbi) | all | |
CACHEINHIBITED_ACCESSES_TRANSLATED | Number of cache inhibited accesses translated | all | |
GUARDED_LOADS_TRANSLATED | Number of guarded loads translated | all | |
WRITETHROUGH_STORES_TRANSLATED | Number of write-through stores translated | all | |
MISALIGNED_ACCESSES_TRANSLATED | Number of misaligned load or store accesses translated. | all | |
TOTAL_ALLOCATED_DLFB | Total allocated to dLFB | all | |
LOADS_TRANSLATED_ALLOCATED_DLFB | Loads translated and allocated to dLFB (Applies to same class of instructions as loads translated.) | all | |
STORES_COMPLETED_ALLOCATED_DLFB | Stores completed and allocated to dLFB (Applies to same class of instructions as stores translated.) | all | |
TOUCHES_TRANSLATED_ALLOCATED_DLFB | Touches translated and allocated to dLFB (Applies to same class of instructions as touches translated.) | all | |
STORES_COMPLETED | Number of cacheable ST* or EVST* microops completed. (Applies to the same class of instructions as stores translated.) | all | |
DL1_LOCKS | Number of cache lines locked in the dL1. (Counts a lock even if an overlock condition is encountered.) | all | |
DL1_RELOADS | dL1 reloads for any reason. Historically used, along with loads/stores completed, to determine the dcache miss rate. | all | |
DL1_CASTOUTS | dL1 castouts. Does not count castouts due to DCBF. | all | |
DETECTED_REPLAYS | Times detected replay condition - Load miss with dLFB full. | all | |
LOAD_MISS_QUEUE_FULL_REPLAYS | Load miss with load queue full. | all | |
LOAD_GUARDED_MISS_NOT_LAST_REPLAYS | Load guarded miss when the load is not yet at the bottom of the completion buffer. | all | |
STORE_TRANSLATED_QUEUE_FULL_REPLAYS | Translate a store when the StQ is full. | all | |
ADDRESS_COLLISION_REPLAYS | Address collision. | all | |
DMMU_MISS_REPLAYS | DMMU miss. | all | |
DMMU_BUSY_REPLAYS | DMMU busy. | all | |
SECOND_PART_MISALIGNED_AFTER_MISS_REPLAYS | Second part of misaligned access when first part missed in cache. | all | |
LOAD_MISS_DLFB_FULL_CYCLES | Cycles stalled on replay condition - Load miss with dLFB full. | all | |
LOAD_MISS_QUEUE_FULL_CYCLES | Cycles stalled on replay condition - Load miss with load queue full. | all | |
LOAD_GUARDED_MISS_NOT_LAST_CYCLES | Cycles stalled on replay condition - Load guarded miss when the load is not yet at the bottom of the completion buffer. | all | |
STORE_TRANSLATED_QUEUE_FULL_CYCLES | Cycles stalled on replay condition - Translate a store when the StQ is full. | all | |
ADDRESS_COLLISION_CYCLES | Cycles stalled on replay condition - Address collision. | all | |
DMMU_MISS_CYCLES | Cycles stalled on replay condition - DMMU miss. | all | |
DMMU_BUSY_CYCLES | Cycles stalled on replay condition - DMMU busy. | all | |
SECOND_PART_MISALIGNED_AFTER_MISS_CYCLES | Cycles stalled on replay condition - Second part of misaligned access when first part missed in cache. | all | |
IL1_LOCKS | Number of cache lines locked in the iL1. (Counts a lock even if an overlock condition is encountered.) | all | |
IL1_FETCH_RELOADS | Reloads due to demand fetch. Historically used, along with instructions completed, to determine the icache miss rate. | all | |
FETCHES | Counts the number of fetches that write at least one instruction to the instruction buffer. (Together with instructions fetched, can be used to compute instructions per fetch.) | all | |
IMMU_TLB4K_RELOADS | iMMU TLB4K reloads | all | |
IMMU_VSP_RELOADS | iMMU VSP reloads | all | |
DMMU_TLB4K_RELOADS | dMMU TLB4K reloads | all | |
DMMU_VSP_RELOADS | dMMU VSP reloads | all | |
L2MMU_MISSES | Counts iTLB/dTLB error interrupt | all | |
BIU_MASTER_REQUESTS | Number of master transactions. (Number of master TSs.) | all | |
BIU_MASTER_I_REQUESTS | Number of master I-Side transactions. (Number of master I-Side TSs.) | all | |
BIU_MASTER_D_REQUESTS | Number of master D-Side transactions. (Number of master D-Side TSs.) | all | |
BIU_MASTER_D_CASTOUT_REQUESTS | Number of master D-Side non-program-demand castout transactions. This counts replacement pushes and snoop pushes. This does not count DCBF castouts. (Number of master D-side non-program-demand castout TSs.) | all | |
BIU_MASTER_RETRIES | Number of transactions which were initiated by this processor which were retried on the BIU interface. (Number of master ARTRYs.) | all | |
SNOOP_REQUESTS | Number of externally generated snoop requests. (Counts snoop TSs.) | all | |
SNOOP_HITS | Number of snoop hits on all D-side resources regardless of the cache state (modified, exclusive, or shared) | all | |
SNOOP_PUSHES | Number of snoop pushes from all D-side resources. (Counts snoop ARTRY/WOPs.) | all | |
SNOOP_RETRIES | Number of snoop requests retried. (Counts snoop ARTRYs.) | all | |
PMC0_OVERFLOW | Counts the number of times PMC0[32] transitioned from 1 to 0. | all | |
PMC1_OVERFLOW | Counts the number of times PMC1[32] transitioned from 1 to 0. | all | |
PMC2_OVERFLOW | Counts the number of times PMC2[32] transitioned from 1 to 0. | all | |
PMC3_OVERFLOW | Counts the number of times PMC3[32] transitioned from 1 to 0. | all | |
INTERRUPTS | Number of interrupts taken | all | |
EXTERNAL_INTERRUPTS | Number of external input interrupts taken | all | |
CRITICAL_INTERRUPTS | Number of critical input interrupts taken | all | |
SC_TRAP_INTERRUPTS | Number of system call and trap interrupts | all | |
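
Several descriptions in the table mention ratios that combine two events (cache miss rates, instructions per fetch). The following is a minimal sketch of how such ratios might be computed, not a definitive method. It assumes the raw event totals have already been collected into a dictionary keyed by the event names above. The helper names `ratio` and `derived_metrics` are hypothetical, and using COMPLETED_LOAD_OPS plus COMPLETED_STORE_OPS as the "loads/stores completed" denominator is an assumption, as is the branch misprediction ratio.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def derived_metrics(counts):
    """Compute illustrative ratios from a dict of event name -> raw count.

    Missing events are treated as zero, so the affected ratio becomes
    None (zero denominator) or 0.0 (zero numerator).
    """
    get = counts.get
    # Assumption: "loads/stores completed" is taken to mean the sum of
    # completed load and store micro-ops.
    loads_stores = get("COMPLETED_LOAD_OPS", 0) + get("COMPLETED_STORE_OPS", 0)
    return {
        "insns_per_cycle": ratio(get("COMPLETED_INSNS", 0), get("CPU_CLK", 0)),
        "dcache_miss_rate": ratio(get("DL1_RELOADS", 0), loads_stores),
        "icache_miss_rate": ratio(get("IL1_FETCH_RELOADS", 0),
                                  get("COMPLETED_INSNS", 0)),
        "insns_per_fetch": ratio(get("INSTRUCTION_FETCHES", 0),
                                 get("FETCHES", 0)),
        # Assumption: mispredictions are compared against finished branches.
        "branch_mispredict_rate": ratio(get("BRANCHES_MISPREDICTED", 0),
                                        get("BRANCHES_FINISHED", 0)),
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    sample = {
        "CPU_CLK": 1000,
        "COMPLETED_INSNS": 500,
        "COMPLETED_LOAD_OPS": 60,
        "COMPLETED_STORE_OPS": 40,
        "DL1_RELOADS": 25,
        "IL1_FETCH_RELOADS": 125,
        "INSTRUCTION_FETCHES": 600,
        "FETCHES": 200,
        "BRANCHES_FINISHED": 80,
        "BRANCHES_MISPREDICTED": 20,
    }
    for name, value in derived_metrics(sample).items():
        print(name, value)
```

Counts gathered by a sampling profiler are estimates, so ratios computed this way are approximate. Events in a ratio should be measured over the same run.
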
"Measurement is a crucial component of performance improvement since reasoning and intuition are fallible guides and must be supplemented with tools like timing commands and profilers." - The Practice of Programming, Brian W. Kernighan and Rob Pike