This is a list of all performance counter event types for P4-core CPUs. For details, see the IA-32 Intel Architecture Software Developer's Manual, Volume 3, Appendix A. OProfile uses synthesized events and does not provide low-level access to the P4 hardware, so the Intel manual is useful mainly to people adding new events to OProfile rather than to end users.
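As a rough illustration of how an end user refers to one of these synthesized events, the sketch below builds an event string in the `name:count:unitmask:kernel:user` form that `opcontrol --event=` accepts. The helper name `event_spec`, the sample count of 100000, and writing the unit mask in hex are assumptions made for this example only; `GLOBAL_POWER_EVENTS` and its mandatory `0x01` unit mask come from this list.

```python
def event_spec(name, count, unit_mask, kernel=True, user=True):
    """Build an event string of the form name:count:unitmask:kernel:user.

    Hypothetical helper for illustration; it is not part of OProfile.
    """
    if count <= 0:
        raise ValueError("count must be a positive number of events")
    if unit_mask < 0:
        raise ValueError("unit mask must be non-negative")
    # Assumption: the unit mask is written in hex to match this list.
    return f"{name}:{count}:{unit_mask:#x}:{int(kernel)}:{int(user)}"


# Sample count chosen arbitrarily for the example.
spec = event_spec("GLOBAL_POWER_EVENTS", 100000, 0x01)
print(f"opcontrol --event={spec}")
```

The last two fields select kernel-space and user-space counting; passing `kernel=False` to the helper would restrict the sketch to user-space samples.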
Name | Description | Counters usable | Unit mask options |
GLOBAL_POWER_EVENTS | time during which processor is not stopped | 0, 4 |
0x01: mandatory
|
BRANCH_RETIRED | retired branches | 3, 7 |
0x01: branch not-taken predicted
0x02: branch not-taken mispredicted
0x04: branch taken predicted
0x08: branch taken mispredicted
|
MISPRED_BRANCH_RETIRED | retired mispredicted branches | 3, 7 |
0x01: retired instruction is non-bogus
|
BPU_FETCH_REQUEST | instruction fetch requests from the branch predict unit | 0, 4 |
0x01: trace cache lookup miss
|
ITLB_REFERENCE | translations using the instruction translation lookaside buffer | 0, 4 |
0x01: ITLB hit
0x02: ITLB miss
0x04: uncacheable ITLB hit
|
MEMORY_CANCEL | cancelled requests in data cache address control unit | 2, 6 |
0x04: replayed because no store request buffer available
0x08: conflicts due to 64k aliasing
|
MEMORY_COMPLETE | completed split | 2, 6 |
0x01: load split completed, excluding UC/WC loads
0x02: any split stores completed
0x04: uncacheable load split completed
0x08: uncacheable store split completed
|
LOAD_PORT_REPLAY | replayed events at the load port | 2, 6 |
0x02: split load
|
STORE_PORT_REPLAY | replayed events at the store port | 2, 6 |
0x02: split store
|
MOB_LOAD_REPLAY | replayed loads from the memory order buffer | 0, 4 |
0x02: replay cause: unknown store address
0x08: replay cause: unknown store data
0x10: replay cause: partial overlap between load and store
0x20: replay cause: mismatched low 4 bits between load and store addr
|
BSQ_CACHE_REFERENCE | cache references seen by the bus unit | 0, 4 |
0x01: read 2nd level cache hit shared
0x02: read 2nd level cache hit exclusive
0x04: read 2nd level cache hit modified
0x08: read 3rd level cache hit shared
0x10: read 3rd level cache hit exclusive
0x20: read 3rd level cache hit modified
0x100: read 2nd level cache miss
0x200: read 3rd level cache miss
0x400: writeback lookup from DAC misses 2nd level cache
|
IOQ_ALLOCATION | bus transactions | 0 |
0x01: bus request type bit 0
0x02: bus request type bit 1
0x04: bus request type bit 2
0x08: bus request type bit 3
0x10: bus request type bit 4
0x20: count read entries
0x40: count write entries
0x80: count UC memory access entries
0x100: count WC memory access entries
0x200: count write-through memory access entries
0x400: count write-protected memory access entries
0x800: count WB memory access entries
0x2000: count own store requests
0x4000: count other / DMA store requests
0x8000: count HW/SW prefetch requests
|
IOQ_ACTIVE_ENTRIES | number of entries in the IOQ which are active | 4 |
0x01: bus request type bit 0
0x02: bus request type bit 1
0x04: bus request type bit 2
0x08: bus request type bit 3
0x10: bus request type bit 4
0x20: count read entries
0x40: count write entries
0x80: count UC memory access entries
0x100: count WC memory access entries
0x200: count write-through memory access entries
0x400: count write-protected memory access entries
0x800: count WB memory access entries
0x2000: count own store requests
0x4000: count other / DMA store requests
0x8000: count HW/SW prefetch requests
|
BSQ_ALLOCATION | allocations in the bus sequence unit | 0 |
0x01: (r)eq (t)ype (e)ncoding, bit 0: see next bit
0x02: rte bit 1: 00=read, 01=read invalidate, 10=write, 11=writeback
0x04: req len bit 0
0x08: req len bit 1
0x20: request type is input (0=output)
0x40: request type is bus lock
0x80: request type is cacheable
0x100: request type is 8-byte chunk split across 8-byte boundary
0x200: request type is demand (0=prefetch)
0x400: request type is ordered
0x800: (m)emory (t)ype (e)ncoding, bit 0: see next bits
0x1000: mte bit 1: see next bits
0x2000: mte bit 2: 000=UC, 001=USWC, 100=WT, 101=WP, 110=WB
|
X87_ASSIST | retired x87 instructions which required special handling | 3, 7 |
0x01: handle FP stack underflow
0x02: handle FP stack overflow
0x04: handle x87 output overflow
0x08: handle x87 output underflow
0x10: handle x87 input assist
|
MACHINE_CLEAR | cycles with entire machine pipeline cleared | 3, 7 |
0x01: count a portion of cycles the machine is cleared for any cause
0x04: count each time the machine is cleared due to memory ordering issues
0x40: count each time the machine is cleared due to self modifying code
|
TC_MS_XFER | number of times uop delivery changed from TC to MS ROM | 1, 5 |
0x01: count TC to MS transfers
|
UOP_QUEUE_WRITES | number of valid uops written to the uop queue | 1, 5 |
0x01: count uops written to queue from TC build mode
0x02: count uops written to queue from TC deliver mode
0x04: count uops written to queue from microcode ROM
|
INSTR_RETIRED | retired instructions | 3, 7 |
0x01: count non-bogus instructions which are not tagged
0x02: count non-bogus instructions which are tagged
0x04: count bogus instructions which are not tagged
0x08: count bogus instructions which are tagged
|
UOPS_RETIRED | retired uops | 3, 7 |
0x01: count marked uops which are non-bogus
0x02: count marked uops which are bogus
|
UOP_TYPE | type of uop tagged by front-end tagging | 3, 7 |
0x02: count uops which are load operations
0x04: count uops which are store operations
|
RETIRED_MISPRED_BRANCH_TYPE | retired mispredicted branches, selected by type | 1, 5 |
0x01: count unconditional jumps
0x02: count conditional jumps
0x04: count call branches
0x08: count return branches
0x10: count indirect jumps
|
RETIRED_BRANCH_TYPE | retired branches, selected by type | 1, 5 |
0x01: count unconditional jumps
0x02: count conditional jumps
0x04: count call branches
0x08: count return branches
0x10: count indirect jumps
|
TC_DELIVER_MODE | duration (in clock cycles) of the operating modes of the trace cache and decode engine | 1, 5 |
0x04: processor is in deliver mode
0x20: processor is in build mode
|
PAGE_WALK_TYPE | page walks by the page miss handler | 0, 4 |
0x01: page walk for data TLB miss
0x02: page walk for instruction TLB miss
|
FSB_DATA_ACTIVITY | DRDY or DBSY events on the front side bus | 0, 4 |
0x01: count when this processor drives data onto bus
0x02: count when this processor reads data from bus
0x04: count when data is on bus but not sampled by this processor
0x08: count when this processor reserves bus for driving
0x10: count when other reserves bus and this processor will sample
0x20: count when other reserves bus and this processor will not sample
|
BSQ_ACTIVE_ENTRIES | number of entries in the bus sequence unit which are active | 4 |
0x01: (r)eq (t)ype (e)ncoding, bit 0: see next bit
0x02: rte bit 1: 00=read, 01=read invalidate, 10=write, 11=writeback
0x04: req len bit 0
0x08: req len bit 1
0x20: request type is input (0=output)
0x40: request type is bus lock
0x80: request type is cacheable
0x100: request type is 8-byte chunk split across 8-byte boundary
0x200: request type is demand (0=prefetch)
0x400: request type is ordered
0x800: (m)emory (t)ype (e)ncoding, bit 0: see next bits
0x1000: mte bit 1: see next bits
0x2000: mte bit 2: 000=UC, 001=USWC, 100=WT, 101=WP, 110=WB
|
SSE_INPUT_ASSIST | input assists requested for SSE or SSE2 operands | 2, 6 |
0x8000: count all uops of this type
|
PACKED_SP_UOP | packed single precision uops | 2, 6 |
0x8000: count all uops of this type
|
PACKED_DP_UOP | packed double precision uops | 2, 6 |
0x8000: count all uops of this type
|
SCALAR_SP_UOP | scalar single precision uops | 2, 6 |
0x8000: count all uops of this type
|
SCALAR_DP_UOP | scalar double precision uops | 2, 6 |
0x8000: count all uops of this type
|
64BIT_MMX_UOP | 64 bit integer SIMD MMX uops | 2, 6 |
0x8000: count all uops of this type
|
128BIT_MMX_UOP | 128 bit integer SIMD SSE2 uops | 2, 6 |
0x8000: count all uops of this type
|
X87_FP_UOP | x87 floating point uops | 2, 6 |
0x8000: count all uops of this type
|
X87_SIMD_MOVES_UOP | x87 FPU, MMX, SSE, or SSE2 loads, stores and reg-to-reg moves | 2, 6 |
0x08: count all x87 SIMD store/move uops
0x10: count all x87 SIMD load uops
|
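The unit mask options in the table are power-of-two values, which suggests that several options for one event can be ORed into a single mask. The following is a minimal sketch under that assumption, using two rows copied from the table. The `EVENTS` dictionary and the helpers `combine_masks`, `describe_mask`, and `counter_ok` are hypothetical names introduced for this example and are not part of OProfile.

```python
# Two rows copied from the table above: usable counters and unit mask options.
EVENTS = {
    "BRANCH_RETIRED": {
        "counters": (3, 7),
        "masks": {
            0x01: "branch not-taken predicted",
            0x02: "branch not-taken mispredicted",
            0x04: "branch taken predicted",
            0x08: "branch taken mispredicted",
        },
    },
    "ITLB_REFERENCE": {
        "counters": (0, 4),
        "masks": {
            0x01: "ITLB hit",
            0x02: "ITLB miss",
            0x04: "uncacheable ITLB hit",
        },
    },
}


def combine_masks(event, *bits):
    """OR together unit mask options, rejecting bits not listed for the event."""
    valid = EVENTS[event]["masks"]
    mask = 0
    for bit in bits:
        if bit not in valid:
            raise ValueError(f"{bit:#x} is not a unit mask option of {event}")
        mask |= bit
    return mask


def describe_mask(event, mask):
    """List the descriptions of every option bit set in mask, lowest bit first."""
    options = sorted(EVENTS[event]["masks"].items())
    return [text for bit, text in options if mask & bit]


def counter_ok(event, counter):
    """Check whether the event may be programmed on the given counter."""
    return counter in EVENTS[event]["counters"]


taken = combine_masks("BRANCH_RETIRED", 0x04, 0x08)
print(hex(taken), describe_mask("BRANCH_RETIRED", taken))
print(counter_ok("BRANCH_RETIRED", 0))
```

Here `taken` selects all taken branches, predicted or not. The same approach does not directly apply to multi-bit fields such as the `rte` and `mte` encodings of `BSQ_ALLOCATION`, where a group of bits forms one value rather than independent options.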