This is a list of all performance counter event types for P4-core CPUs with 2 logical processors per physical package. Please see the Intel Architecture 32 Family Developer's Manual, Volume 3, Appendix A. OProfile uses synthesized events and doesn't provide low-level access to the P4 hardware, so the Intel manual is useful mainly for people trying to add new events to OProfile rather than for end users.
Name | Description | Counters usable | Unit mask options |
GLOBAL_POWER_EVENTS | time during which processor is not stopped | 0 | 0x01: mandatory |
BRANCH_RETIRED | retired branches | 3 | 0x01: branch not-taken predicted; 0x02: branch not-taken mispredicted; 0x04: branch taken predicted; 0x08: branch taken mispredicted |
MISPRED_BRANCH_RETIRED | retired mispredicted branches | 3 | 0x01: retired instruction is non-bogus |
BPU_FETCH_REQUEST | instruction fetch requests from the branch predict unit | 0 | 0x01: trace cache lookup miss |
ITLB_REFERENCE | translations using the instruction translation lookaside buffer | 0 | 0x01: ITLB hit; 0x02: ITLB miss; 0x04: uncacheable ITLB hit |
MEMORY_CANCEL | cancelled requests in data cache address control unit | 2 | 0x04: replayed because no store request buffer available; 0x08: conflicts due to 64k aliasing |
MEMORY_COMPLETE | completed split loads and stores | 2 | 0x01: load split completed, excluding UC/WC loads; 0x02: any split stores completed; 0x04: uncacheable load split completed; 0x08: uncacheable store split completed |
LOAD_PORT_REPLAY | replayed events at the load port | 2 | 0x02: split load |
STORE_PORT_REPLAY | replayed events at the store port | 2 | 0x02: split store |
MOB_LOAD_REPLAY | replayed loads from the memory order buffer | 0 | 0x02: replay cause: unknown store address; 0x08: replay cause: unknown store data; 0x10: replay cause: partial overlap between load and store; 0x20: replay cause: mismatched low 4 bits between load and store address |
BSQ_CACHE_REFERENCE | cache references seen by the bus unit | 0 | 0x01: read 2nd level cache hit shared; 0x02: read 2nd level cache hit exclusive; 0x04: read 2nd level cache hit modified; 0x08: read 3rd level cache hit shared; 0x10: read 3rd level cache hit exclusive; 0x20: read 3rd level cache hit modified; 0x100: read 2nd level cache miss; 0x200: read 3rd level cache miss; 0x400: writeback lookup from DAC misses 2nd level cache |
X87_ASSIST | retired x87 instructions which required special handling | 3 | 0x01: handle FP stack underflow; 0x02: handle FP stack overflow; 0x04: handle x87 output overflow; 0x08: handle x87 output underflow; 0x10: handle x87 input assist |
MACHINE_CLEAR | cycles with entire machine pipeline cleared | 3 | 0x01: count a portion of cycles the machine is cleared for any cause; 0x04: count each time the machine is cleared due to memory ordering issues; 0x40: count each time the machine is cleared due to self modifying code |
TC_MS_XFER | number of times uop delivery changed from TC to MS ROM | 1 | 0x01: count TC to MS transfers |
UOP_QUEUE_WRITES | number of valid uops written to the uop queue | 1 | 0x01: count uops written to queue from TC build mode; 0x02: count uops written to queue from TC deliver mode; 0x04: count uops written to queue from microcode ROM |
INSTR_RETIRED | retired instructions | 3 | 0x01: count non-bogus instructions which are not tagged; 0x02: count non-bogus instructions which are tagged; 0x04: count bogus instructions which are not tagged; 0x08: count bogus instructions which are tagged |
UOPS_RETIRED | retired uops | 3 | 0x01: count marked uops which are non-bogus; 0x02: count marked uops which are bogus |
UOP_TYPE | type of uop tagged by front-end tagging | 3 | 0x02: count uops which are load operations; 0x04: count uops which are store operations |
RETIRED_MISPRED_BRANCH_TYPE | retired mispredicted branches, selected by type | 1 | 0x01: count unconditional jumps; 0x02: count conditional jumps; 0x04: count call branches; 0x08: count return branches; 0x10: count indirect jumps |
RETIRED_BRANCH_TYPE | retired branches, selected by type | 1 | 0x01: count unconditional jumps; 0x02: count conditional jumps; 0x04: count call branches; 0x08: count return branches; 0x10: count indirect jumps |
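The unit mask values listed for each event are distinct powers of two, which suggests they can be combined with a bitwise OR to count several conditions at once. The sketch below illustrates that arithmetic for BRANCH_RETIRED. It assumes the masks really do combine as bit flags and that the result is passed in the `name:count:unitmask` form accepted by `opcontrol --event`. The helper names and the sample count of 100000 are hypothetical and chosen only for illustration.

```python
# Unit mask bits for BRANCH_RETIRED, copied from the table above.
BRANCH_RETIRED_MASKS = {
    "not_taken_predicted": 0x01,
    "not_taken_mispredicted": 0x02,
    "taken_predicted": 0x04,
    "taken_mispredicted": 0x08,
}


def combine_unit_masks(masks, names):
    """OR together the mask bits for the given option names.

    Assumption: the event's unit masks are independent bit flags.
    """
    value = 0
    for name in names:
        value |= masks[name]
    return value


def event_spec(event, count, unit_mask):
    """Hypothetical helper: format an event as NAME:count:unitmask."""
    return f"{event}:{count}:{unit_mask:#04x}"


# Count every mispredicted branch, whether taken or not taken.
mispredicted = combine_unit_masks(
    BRANCH_RETIRED_MASKS,
    ["not_taken_mispredicted", "taken_mispredicted"],
)

# The sample count 100000 is an arbitrary illustrative value.
print(event_spec("BRANCH_RETIRED", 100000, mispredicted))
# prints BRANCH_RETIRED:100000:0x0a
```

The resulting string could then be supplied as the argument to `--event`. Events whose only option is marked mandatory, such as GLOBAL_POWER_EVENTS, would simply use that single value.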
"Measurement is a crucial component of performance improvement since reasoning and intuition are fallible guides and must be supplemented with tools like timing commands and profilers." - The Practice of Programming, Brian W. Kernighan and Rob Pike