Intel P4 performance counter events

Intel P4 with hyperthreading (2 logical processors) events

This is a list of all performance counter event types for P4-core CPUs with 2 logical processors per physical package. Please see the Intel Architecture 32 Family Developer's Manual, Volume 3, Appendix A. OProfile uses synthesized events and doesn't provide low-level access to the P4 hardware, so the Intel manual is useful mainly for people trying to add new events to OProfile rather than for end users.

Name	Description	Counters usable	Unit mask options
GLOBAL_POWER_EVENTS	time during which processor is not stopped	0	0x01: mandatory
BRANCH_RETIRED	retired branches	3	0x01: branch not-taken predicted 0x02: branch not-taken mispredicted 0x04: branch taken predicted 0x08: branch taken mispredicted
MISPRED_BRANCH_RETIRED	retired mispredicted branches	3	0x01: retired instruction is non-bogus
BPU_FETCH_REQUEST	instruction fetch requests from the branch predict unit	0	0x01: trace cache lookup miss
ITLB_REFERENCE	translations using the instruction translation lookaside buffer	0	0x01: ITLB hit 0x02: ITLB miss 0x04: uncacheable ITLB hit
MEMORY_CANCEL	cancelled requests in data cache address control unit	2	0x04: replayed because no store request buffer available 0x08: conflicts due to 64k aliasing
MEMORY_COMPLETE	completed split	2	0x01: load split completed, excluding UC/WC loads 0x02: any split stores completed 0x04: uncacheable load split completed 0x08: uncacheable store split completed
LOAD_PORT_REPLAY	replayed events at the load port	2	0x02: split load
STORE_PORT_REPLAY	replayed events at the store port	2	0x02: split store
MOB_LOAD_REPLAY	replayed loads from the memory order buffer	0	0x02: replay cause: unknown store address 0x08: replay cause: unknown store data 0x10: replay cause: partial overlap between load and store 0x20: replay cause: mismatched low 4 bits between load and store addr
BSQ_CACHE_REFERENCE	cache references seen by the bus unit	0	0x01: read 2nd level cache hit shared 0x02: read 2nd level cache hit exclusive 0x04: read 2nd level cache hit modified 0x08: read 3rd level cache hit shared 0x10: read 3rd level cache hit exclusive 0x20: read 3rd level cache hit modified 0x100: read 2nd level cache miss 0x200: read 3rd level cache miss 0x400: writeback lookup from DAC misses 2nd level cache
X87_ASSIST	retired x87 instructions which required special handling	3	0x01: handle FP stack underflow 0x02: handle FP stack overflow 0x04: handle x87 output overflow 0x08: handle x87 output underflow 0x10: handle x87 input assist
MACHINE_CLEAR	cycles with entire machine pipeline cleared	3	0x01: count a portion of cycles the machine is cleared for any cause 0x04: count each time the machine is cleared due to memory ordering issues 0x40: count each time the machine is cleared due to self modifying code
TC_MS_XFER	number of times uop delivery changed from TC to MS ROM	1	0x01: count TC to MS transfers
UOP_QUEUE_WRITES	number of valid uops written to the uop queue	1	0x01: count uops written to queue from TC build mode 0x02: count uops written to queue from TC deliver mode 0x04: count uops written to queue from microcode ROM
INSTR_RETIRED	retired instructions	3	0x01: count non-bogus instructions which are not tagged 0x02: count non-bogus instructions which are tagged 0x04: count bogus instructions which are not tagged 0x08: count bogus instructions which are tagged
UOPS_RETIRED	retired uops	3	0x01: count marked uops which are non-bogus 0x02: count marked uops which are bogus
UOP_TYPE	type of uop tagged by front-end tagging	3	0x02: count uops which are load operations 0x04: count uops which are store operations
RETIRED_MISPRED_BRANCH_TYPE	retired mispredicted branches, selected by type	1	0x01: count unconditional jumps 0x02: count conditional jumps 0x04: count call branches 0x08: count return branches 0x10: count indirect jumps
RETIRED_BRANCH_TYPE	retired branches, selected by type	1	0x01: count unconditional jumps 0x02: count conditional jumps 0x04: count call branches 0x08: count return branches 0x10: count indirect jumps
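
The unit mask options in the table are single-bit values, so several options for one event can usually be OR'ed together into one mask. The Python sketch below shows that arithmetic for BRANCH_RETIRED, using the bit values from the table. The helper names (combine, describe, event_spec) and the sample count of 100000 are illustrative and not part of OProfile. The name:count:unitmask string follows the form accepted by opcontrol --event. Whether a given event accepts every combination of bits is an assumption here, so check the result against ophelp or the Intel manual.

```python
# Unit mask bits for BRANCH_RETIRED, copied from the table above.
BRANCH_RETIRED_MASKS = {
    0x01: "branch not-taken predicted",
    0x02: "branch not-taken mispredicted",
    0x04: "branch taken predicted",
    0x08: "branch taken mispredicted",
}


def combine(bits, allowed):
    """OR the given unit mask bits together, rejecting bits not in the table."""
    mask = 0
    for bit in bits:
        if bit not in allowed:
            raise ValueError(f"unknown unit mask bit: {bit:#04x}")
        mask |= bit
    return mask


def describe(mask, allowed):
    """List the descriptions of every table entry whose bit is set in mask."""
    return [text for bit, text in sorted(allowed.items()) if mask & bit]


def event_spec(name, count, mask):
    """Format an event as name:count:unitmask (count is a hypothetical value)."""
    return f"{name}:{count}:{mask:#04x}"


if __name__ == "__main__":
    # Count only mispredicted branches, taken or not taken.
    mispredicted = combine([0x02, 0x08], BRANCH_RETIRED_MASKS)
    print(event_spec("BRANCH_RETIRED", 100000, mispredicted))
    for line in describe(mispredicted, BRANCH_RETIRED_MASKS):
        print("  " + line)
```

The same helpers work for any other row if its unit mask column is copied into a dictionary in the same way.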

2020/07/20