
Intel P4 events

This is a list of all performance counter event types for P4-core CPUs. Please see the Intel Architecture 32 Family Developer's Manual, Volume 3, Appendix A. OProfile uses synthesized events and doesn't provide low-level access to the P4 hardware, so the Intel manual is useful mainly for people trying to add new events to OProfile rather than for end users.

Columns: Name, Description, Counters usable, Unit mask options
GLOBAL_POWER_EVENTS time during which processor is not stopped 0, 4 0x01: mandatory
BRANCH_RETIRED retired branches 3, 7 0x01: branch not-taken predicted
0x02: branch not-taken mispredicted
0x04: branch taken predicted
0x08: branch taken mispredicted
MISPRED_BRANCH_RETIRED retired mispredicted branches 3, 7 0x01: retired instruction is non-bogus
BPU_FETCH_REQUEST instruction fetch requests from the branch predict unit 0, 4 0x01: trace cache lookup miss
ITLB_REFERENCE translations using the instruction translation lookaside buffer 0, 4 0x01: ITLB hit
0x02: ITLB miss
0x04: uncacheable ITLB hit
MEMORY_CANCEL cancelled requests in data cache address control unit 2, 6 0x04: replayed because no store request buffer available
0x08: conflicts due to 64k aliasing
MEMORY_COMPLETE completed split 2, 6 0x01: load split completed, excluding UC/WC loads
0x02: any split stores completed
0x04: uncacheable load split completed
0x08: uncacheable store split completed
LOAD_PORT_REPLAY replayed events at the load port 2, 6 0x02: split load
STORE_PORT_REPLAY replayed events at the store port 2, 6 0x02: split store
MOB_LOAD_REPLAY replayed loads from the memory order buffer 0, 4 0x02: replay cause: unknown store address
0x08: replay cause: unknown store data
0x10: replay cause: partial overlap between load and store
0x20: replay cause: mismatched low 4 bits between load and store addr
BSQ_CACHE_REFERENCE cache references seen by the bus unit 0, 4 0x01: read 2nd level cache hit shared
0x02: read 2nd level cache hit exclusive
0x04: read 2nd level cache hit modified
0x08: read 3rd level cache hit shared
0x10: read 3rd level cache hit exclusive
0x20: read 3rd level cache hit modified
0x100: read 2nd level cache miss
0x200: read 3rd level cache miss
0x400: writeback lookup from DAC misses 2nd level cache
IOQ_ALLOCATION bus transactions 0 0x01: bus request type bit 0
0x02: bus request type bit 1
0x04: bus request type bit 2
0x08: bus request type bit 3
0x10: bus request type bit 4
0x20: count read entries
0x40: count write entries
0x80: count UC memory access entries
0x100: count WC memory access entries
0x200: count write-through memory access entries
0x400: count write-protected memory access entries
0x800: count WB memory access entries
0x2000: count own store requests
0x4000: count other / DMA store requests
0x8000: count HW/SW prefetch requests
IOQ_ACTIVE_ENTRIES number of entries in the IOQ which are active 4 0x01: bus request type bit 0
0x02: bus request type bit 1
0x04: bus request type bit 2
0x08: bus request type bit 3
0x10: bus request type bit 4
0x20: count read entries
0x40: count write entries
0x80: count UC memory access entries
0x100: count WC memory access entries
0x200: count write-through memory access entries
0x400: count write-protected memory access entries
0x800: count WB memory access entries
0x2000: count own store requests
0x4000: count other / DMA store requests
0x8000: count HW/SW prefetch requests
BSQ_ALLOCATION allocations in the bus sequence unit 0 0x01: (r)eq (t)ype (e)ncoding, bit 0: see next bit
0x02: rte bit 1: 00=read, 01=read invalidate, 10=write, 11=writeback
0x04: req len bit 0
0x08: req len bit 1
0x20: request type is input (0=output)
0x40: request type is bus lock
0x80: request type is cacheable
0x100: request type is 8-byte chunk split across 8-byte boundary
0x200: request type is demand (0=prefetch)
0x400: request type is ordered
0x800: (m)emory (t)ype (e)ncoding, bit 0: see next bits
0x1000: mte bit 1: see next bits
0x2000: mte bit 2: 000=UC, 001=USWC, 100=WT, 101=WP, 110=WB
X87_ASSIST retired x87 instructions which required special handling 3, 7 0x01: handle FP stack underflow
0x02: handle FP stack overflow
0x04: handle x87 output overflow
0x08: handle x87 output underflow
0x10: handle x87 input assist
MACHINE_CLEAR cycles with entire machine pipeline cleared 3, 7 0x01: count a portion of cycles the machine is cleared for any cause
0x04: count each time the machine is cleared due to memory ordering issues
0x40: count each time the machine is cleared due to self modifying code
TC_MS_XFER number of times uop delivery changed from TC to MS ROM 1, 5 0x01: count TC to MS transfers
UOP_QUEUE_WRITES number of valid uops written to the uop queue 1, 5 0x01: count uops written to queue from TC build mode
0x02: count uops written to queue from TC deliver mode
0x04: count uops written to queue from microcode ROM
INSTR_RETIRED retired instructions 3, 7 0x01: count non-bogus instructions which are not tagged
0x02: count non-bogus instructions which are tagged
0x04: count bogus instructions which are not tagged
0x08: count bogus instructions which are tagged
UOPS_RETIRED retired uops 3, 7 0x01: count marked uops which are non-bogus
0x02: count marked uops which are bogus
UOP_TYPE type of uop tagged by front-end tagging 3, 7 0x02: count uops which are load operations
0x04: count uops which are store operations
RETIRED_MISPRED_BRANCH_TYPE retired mispredicted branches, selected by type 1, 5 0x01: count unconditional jumps
0x02: count conditional jumps
0x04: count call branches
0x08: count return branches
0x10: count indirect jumps
RETIRED_BRANCH_TYPE retired branches, selected by type 1, 5 0x01: count unconditional jumps
0x02: count conditional jumps
0x04: count call branches
0x08: count return branches
0x10: count indirect jumps
TC_DELIVER_MODE duration (in clock cycles) in the trace cache and decode engine 1, 5 0x04: processor is in deliver mode
0x20: processor is in build mode
PAGE_WALK_TYPE page walks by the page miss handler 0, 4 0x01: page walk for data TLB miss
0x02: page walk for instruction TLB miss
FSB_DATA_ACTIVITY DRDY or DBSY events on the front side bus 0, 4 0x01: count when this processor drives data onto bus
0x02: count when this processor reads data from bus
0x04: count when data is on bus but not sampled by this processor
0x08: count when this processor reserves bus for driving
0x10: count when other reserves bus and this processor will sample
0x20: count when other reserves bus and this processor will not sample
BSQ_ACTIVE_ENTRIES number of entries in the bus sequence unit which are active 4 0x01: (r)eq (t)ype (e)ncoding, bit 0: see next bit
0x02: rte bit 1: 00=read, 01=read invalidate, 10=write, 11=writeback
0x04: req len bit 0
0x08: req len bit 1
0x20: request type is input (0=output)
0x40: request type is bus lock
0x80: request type is cacheable
0x100: request type is 8-byte chunk split across 8-byte boundary
0x200: request type is demand (0=prefetch)
0x400: request type is ordered
0x800: (m)emory (t)ype (e)ncoding, bit 0: see next bits
0x1000: mte bit 1: see next bits
0x2000: mte bit 2: 000=UC, 001=USWC, 100=WT, 101=WP, 110=WB
SSE_INPUT_ASSIST input assists requested for SSE or SSE2 operands 2, 6 0x8000: count all uops of this type
PACKED_SP_UOP packed single precision uops 2, 6 0x8000: count all uops of this type
PACKED_DP_UOP packed double precision uops 2, 6 0x8000: count all uops of this type
SCALAR_SP_UOP scalar single precision uops 2, 6 0x8000: count all uops of this type
SCALAR_DP_UOP scalar double precision uops 2, 6 0x8000: count all uops of this type
64BIT_MMX_UOP 64 bit integer SIMD MMX uops 2, 6 0x8000: count all uops of this type
128BIT_MMX_UOP 128 bit integer SIMD SSE2 uops 2, 6 0x8000: count all uops of this type
X87_FP_UOP x87 floating point uops 2, 6 0x8000: count all uops of this type
X87_SIMD_MOVES_UOP x87 FPU, MMX, SSE, or SSE2 loads, stores and reg-to-reg moves 2, 6 0x08: count all x87 SIMD store/move uops
0x10: count all x87 SIMD load uops
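
The unit mask values in the table are single bits, and some of them (the rte and mte entries of BSQ_ALLOCATION and BSQ_ACTIVE_ENTRIES) form multi-bit fields. The following Python sketch shows one way to work with them. It rests on two assumptions that the table does not state outright: that the bits of one event can be ORed together into a single unit mask, and that the encodings such as "110=WB" are written with the highest bit first. The dictionary and function names are made up for this illustration and are not part of OProfile.

```python
# Unit mask bits copied from the BRANCH_RETIRED row of the table.
BRANCH_RETIRED = {
    "not_taken_predicted": 0x01,
    "not_taken_mispredicted": 0x02,
    "taken_predicted": 0x04,
    "taken_mispredicted": 0x08,
}

# Request type encoding (rte bits 1..0 live in mask bits 0x02 and 0x01).
# Assumption: the table writes the encoding as "bit 1, bit 0".
BSQ_REQUEST_TYPES = {
    0b00: "read",
    0b01: "read invalidate",
    0b10: "write",
    0b11: "writeback",
}

# Memory type encoding (mte bits 2..0 live in mask bits 0x2000, 0x1000, 0x800).
# Assumption: the table writes the encoding as "bit 2, bit 1, bit 0".
BSQ_MEMORY_TYPES = {
    0b000: "UC",
    0b001: "USWC",
    0b100: "WT",
    0b101: "WP",
    0b110: "WB",
}


def combine_masks(bits, names):
    """OR the named single-bit options into one unit mask value."""
    mask = 0
    for name in names:
        mask |= bits[name]
    return mask


def bsq_request_type(unit_mask):
    """Decode the two rte bits (mask bits 0 and 1) of a BSQ unit mask."""
    return BSQ_REQUEST_TYPES[unit_mask & 0b11]


def bsq_memory_type(unit_mask):
    """Decode the three mte bits (mask bits 11 to 13) of a BSQ unit mask."""
    mte = (unit_mask >> 11) & 0b111
    # Encodings the table does not list are reported as such, not guessed.
    return BSQ_MEMORY_TYPES.get(mte, "unlisted")


if __name__ == "__main__":
    mispredicted = combine_masks(
        BRANCH_RETIRED, ["not_taken_mispredicted", "taken_mispredicted"]
    )
    print(hex(mispredicted))
    print(bsq_request_type(0x02), bsq_memory_type(0x3000))
```

Under these assumptions, a mask selecting both mispredicted cases of BRANCH_RETIRED is 0x02 | 0x08, and a BSQ mask with 0x1000 and 0x2000 set decodes to the WB memory type.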
2020/07/20