
Intel Westmere Microarchitecture events

This is a list of all Intel Westmere Microarchitecture performance counter event types. For details, please see the Intel Architecture Developer's Manual Volume 3B, Appendix A, and the Intel Architecture Optimization Reference Manual (730795-001).

Name Description Counters usable Unit mask options
CPU_CLK_UNHALTED Clock cycles when not halted all
UNHALTED_REFERENCE_CYCLES Unhalted reference cycles all 0x01: No unit mask
INST_RETIRED Number of instructions retired all
LLC_MISSES Last level cache demand requests from this core that missed the LLC all 0x41: No unit mask
LLC_REFS Last level cache demand requests from this core all 0x4f: No unit mask
BR_INST_RETIRED Number of branch instructions retired all
BR_MISS_PRED_RETIRED Number of mispredicted branches retired (precise) all
LOAD_BLOCK Loads that partially overlap an earlier store all 0x02: No unit mask
SB_DRAIN All Store buffer stall cycles all 0x07: No unit mask
MISALIGN_MEM_REF Misaligned store references all 0x02: No unit mask
STORE_BLOCKS Loads delayed with at-Retirement block code all 0x04: (name=at_ret) Loads delayed with at-Retirement block code
0x08: (name=l1d_block) Cacheable loads delayed with L1D block code
PARTIAL_ADDRESS_ALIAS False dependencies due to partial address aliasing all 0x01: No unit mask
DTLB_LOAD_MISSES DTLB load misses all 0x01: (name=any) DTLB load misses
0x02: (name=walk_completed) DTLB load miss page walks complete
0x04: (name=walk_cycles) DTLB load miss page walk cycles
0x10: (name=stlb_hit) DTLB second level hit
0x20: (name=pde_miss) DTLB load miss caused by low part of address
0x80: (name=large_walk_completed) DTLB load miss large page walks
MEM_INST_RETIRED Memory instructions retired above 0 clocks (Precise Event) all 0x01: (name=loads) Instructions retired which contains a load (Precise Event)
0x02: (name=stores) Instructions retired which contains a store (Precise Event)
MEM_STORE_RETIRED Retired stores that miss the DTLB (Precise Event) all 0x01: No unit mask
UOPS_ISSUED Uops issued all 0x01: (name=any) Uops issued
0x02: (name=fused) Fused Uops issued
MEM_UNCORE_RETIRED Load instructions retired that HIT modified data in sibling core (Precise Event) all 0x02: (name=local_hitm) Load instructions retired that HIT modified data in sibling core (Precise Event)
0x04: (name=remote_hitm) Retired loads that hit remote socket in modified state (Precise Event)
0x08: (name=local_dram_and_remote_cache_hit) Load instructions retired local dram and remote cache HIT data sources (Precise Event)
0x10: (name=remote_dram) Load instructions retired remote DRAM and remote home-remote cache HITM (Precise Event)
0x80: (name=uncacheable) Load instructions retired IO (Precise Event)
FP_COMP_OPS_EXE MMX Uops all 0x01: (name=x87) Computational floating-point operations executed
0x02: (name=mmx) MMX Uops
0x04: (name=sse_fp) SSE and SSE2 FP Uops
0x08: (name=sse2_integer) SSE2 integer Uops
0x10: (name=sse_fp_packed) SSE FP packed Uops
0x20: (name=sse_fp_scalar) SSE FP scalar Uops
0x40: (name=sse_single_precision) SSE* FP single precision Uops
0x80: (name=sse_double_precision) SSE* FP double precision Uops
SIMD_INT_128 128 bit SIMD integer pack operations all 0x01: (name=packed_mpy) 128 bit SIMD integer multiply operations
0x02: (name=packed_shift) 128 bit SIMD integer shift operations
0x04: (name=pack) 128 bit SIMD integer pack operations
0x08: (name=unpack) 128 bit SIMD integer unpack operations
0x10: (name=packed_logical) 128 bit SIMD integer logical operations
0x20: (name=packed_arith) 128 bit SIMD integer arithmetic operations
0x40: (name=shuffle_move) 128 bit SIMD integer shuffle/move operations
LOAD_DISPATCH All loads dispatched all 0x01: (name=rs) Loads dispatched that bypass the MOB
0x02: (name=rs_delayed) Loads dispatched from stage 305
0x04: (name=mob) Loads dispatched from the MOB
0x07: (name=any) All loads dispatched
ARITH Cycles the divider is busy all 0x01: (name=cycles_div_busy) Cycles the divider is busy
0x02: (name=mul) Multiply operations executed
INST_QUEUE_WRITES Instructions written to instruction queue. all 0x01: No unit mask
INST_DECODED Instructions that must be decoded by decoder 0 all 0x01: No unit mask
TWO_UOP_INSTS_DECODED Two Uop instructions decoded all 0x01: No unit mask
INST_QUEUE_WRITE_CYCLES Cycles instructions are written to the instruction queue all 0x01: No unit mask
LSD_OVERFLOW Loops that can't stream from the instruction queue all 0x01: No unit mask
L2_RQSTS L2 instruction fetch hits all 0x01: (name=ld_hit) L2 load hits
0x02: (name=ld_miss) L2 load misses
0x03: (name=loads) L2 requests
0x04: (name=rfo_hit) L2 RFO hits
0x08: (name=rfo_miss) L2 RFO misses
0x0c: (name=rfos) L2 RFO requests
0x10: (name=ifetch_hit) L2 instruction fetch hits
0x20: (name=ifetch_miss) L2 instruction fetch misses
0x30: (name=ifetches) L2 instruction fetches
0x40: (name=prefetch_hit) L2 prefetch hits
0x80: (name=prefetch_miss) L2 prefetch misses
0xaa: (name=miss) All L2 misses
0xc0: (name=prefetches) All L2 prefetches
0xff: (name=references) All L2 requests
L2_DATA_RQSTS All L2 data requests all 0x01: (name=demand_i_state) L2 data demand loads in I state (misses)
0x02: (name=demand_s_state) L2 data demand loads in S state
0x04: (name=demand_e_state) L2 data demand loads in E state
0x08: (name=demand_m_state) L2 data demand loads in M state
0x0f: (name=demand_mesi) L2 data demand requests
0x10: (name=prefetch_i_state) L2 data prefetches in the I state (misses)
0x20: (name=prefetch_s_state) L2 data prefetches in the S state
0x40: (name=prefetch_e_state) L2 data prefetches in E state
0x80: (name=prefetch_m_state) L2 data prefetches in M state
0xf0: (name=prefetch_mesi) All L2 data prefetches
0xff: (name=any) All L2 data requests
L2_WRITE L2 demand lock RFOs in E state all 0x01: (name=rfo_i_state) L2 demand store RFOs in I state (misses)
0x02: (name=rfo_s_state) L2 demand store RFOs in S state
0x08: (name=rfo_m_state) L2 demand store RFOs in M state
0x0e: (name=rfo_hit) All L2 demand store RFOs that hit the cache
0x0f: (name=rfo_mesi) All L2 demand store RFOs
0x10: (name=lock_i_state) L2 demand lock RFOs in I state (misses)
0x20: (name=lock_s_state) L2 demand lock RFOs in S state
0x40: (name=lock_e_state) L2 demand lock RFOs in E state
0x80: (name=lock_m_state) L2 demand lock RFOs in M state
0xe0: (name=lock_hit) All demand L2 lock RFOs that hit the cache
0xf0: (name=lock_mesi) All demand L2 lock RFOs
L1D_WB_L2 L1 writebacks to L2 in E state all 0x01: (name=i_state) L1 writebacks to L2 in I state (misses)
0x02: (name=s_state) L1 writebacks to L2 in S state
0x04: (name=e_state) L1 writebacks to L2 in E state
0x08: (name=m_state) L1 writebacks to L2 in M state
0x0f: (name=mesi) All L1 writebacks to L2
LONGEST_LAT_CACHE Longest latency cache miss all 0x01: (name=miss) Longest latency cache miss
0x02: (name=reference) Longest latency cache reference
CPU_CLK_UNHALTED Reference base clock (133 MHz) cycles when thread is not halted (programmable counter) all 0x00: (name=thread_p) Cycles when thread is not halted (programmable counter)
0x01: (name=ref_p) Reference base clock (133 MHz) cycles when thread is not halted (programmable counter)
DTLB_MISSES DTLB misses all 0x01: (name=any) DTLB misses
0x02: (name=walk_completed) DTLB miss page walks
0x04: (name=walk_cycles) DTLB miss page walk cycles
0x10: (name=stlb_hit) DTLB first level misses but second level hit
0x20: (name=pde_miss) DTLB misses caused by low part of address
0x80: (name=large_walk_completed) DTLB miss large page walks
LOAD_HIT_PRE Load operations conflicting with software prefetches 0, 1 0x01: No unit mask
L1D_PREFETCH L1D hardware prefetch misses 0, 1 0x01: (name=requests) L1D hardware prefetch requests
0x02: (name=miss) L1D hardware prefetch misses
0x04: (name=triggers) L1D hardware prefetch requests triggered
EPT Extended Page Table walk cycles all 0x10: No unit mask
L1D L1D cache lines replaced in M state 0, 1 0x01: (name=repl) L1 data cache lines allocated
0x02: (name=m_repl) L1D cache lines allocated in the M state
0x04: (name=m_evict) L1D cache lines replaced in M state
0x08: (name=m_snoop_evict) L1D snoop eviction of cache lines in M state
L1D_CACHE_PREFETCH_LOCK_FB_HIT L1D prefetch load lock accepted in fill buffer 0, 1 0x01: No unit mask
OFFCORE_REQUESTS_OUTSTANDING Outstanding offcore reads 0 0x01: (name=demand_read_data) Outstanding offcore demand data reads
0x02: (name=demand_read_code) Outstanding offcore demand code reads
0x04: (name=demand_rfo) Outstanding offcore demand RFOs
0x08: (name=any_read) Outstanding offcore reads
CACHE_LOCK_CYCLES Cycles L1D locked 0, 1 0x01: (name=l1d_l2) Cycles L1D and L2 locked
0x02: (name=l1d) Cycles L1D locked
IO_TRANSACTIONS I/O transactions all 0x01: No unit mask
L1I L1I instruction fetch stall cycles all 0x01: (name=hits) L1I instruction fetch hits
0x02: (name=misses) L1I instruction fetch misses
0x03: (name=reads) L1I Instruction fetches
0x04: (name=cycles_stalled) L1I instruction fetch stall cycles
LARGE_ITLB Large ITLB hit all 0x01: No unit mask
ITLB_MISSES ITLB miss all 0x01: (name=any) ITLB miss
0x02: (name=walk_completed) ITLB miss page walks
0x04: (name=walk_cycles) ITLB miss page walk cycles
0x80: (name=large_walk_completed) ITLB miss large page walks
ILD_STALL Any Instruction Length Decoder stall cycles all 0x01: (name=lcp) Length Change Prefix stall cycles
0x02: (name=mru) Stall cycles due to BPU MRU bypass
0x04: (name=iq_full) Instruction Queue full stall cycles
0x08: (name=regen) Regen stall cycles
0x0f: (name=any) Any Instruction Length Decoder stall cycles
BR_INST_EXEC Branch instructions executed all 0x01: (name=cond) Conditional branch instructions executed
0x02: (name=direct) Unconditional branches executed
0x04: (name=indirect_non_call) Indirect non call branches executed
0x07: (name=non_calls) All non call branches executed
0x08: (name=return_near) Indirect return branches executed
0x10: (name=direct_near_call) Unconditional call branches executed
0x20: (name=indirect_near_call) Indirect call branches executed
0x30: (name=near_calls) Call branches executed
0x40: (name=taken) Taken branches executed
0x7f: (name=any) Branch instructions executed
BR_MISP_EXEC Mispredicted branches executed all 0x01: (name=cond) Mispredicted conditional branches executed
0x02: (name=direct) Mispredicted unconditional branches executed
0x04: (name=indirect_non_call) Mispredicted indirect non call branches executed
0x07: (name=non_calls) Mispredicted non call branches executed
0x08: (name=return_near) Mispredicted return branches executed
0x10: (name=direct_near_call) Mispredicted unconditional call branches executed
0x20: (name=indirect_near_call) Mispredicted indirect call branches executed
0x30: (name=near_calls) Mispredicted call branches executed
0x40: (name=taken) Mispredicted taken branches executed
0x7f: (name=any) Mispredicted branches executed
RESOURCE_STALLS Resource related stall cycles all 0x01: (name=any) Resource related stall cycles
0x02: (name=load) Load buffer stall cycles
0x04: (name=rs_full) Reservation Station full stall cycles
0x08: (name=store) Store buffer stall cycles
0x10: (name=rob_full) ROB full stall cycles
0x20: (name=fpcw) FPU control word write stall cycles
0x40: (name=mxcsr) MXCSR rename stall cycles
0x80: (name=other) Other Resource related stall cycles
MACRO_INSTS_FUSED Macro-fused instructions decoded all 0x01: No unit mask
BACLEAR_FORCE_IQ Instruction queue forced BACLEAR all 0x01: No unit mask
LSD Cycles when uops were delivered by the LSD all 0x01: No unit mask
ITLB_FLUSH ITLB flushes all 0x01: No unit mask
OFFCORE_REQUESTS All offcore requests all 0x01: (name=demand_read_data) Offcore demand data read requests
0x02: (name=demand_read_code) Offcore demand code read requests
0x04: (name=demand_rfo) Offcore demand RFO requests
0x08: (name=any_read) Offcore read requests
0x10: (name=any_rfo) Offcore RFO requests
0x40: (name=l1d_writeback) Offcore L1 data cache writebacks
0x80: (name=any) All offcore requests
UOPS_EXECUTED Cycles Uops executed on any port (core count) all 0x01: (name=port0) Uops executed on port 0
0x02: (name=port1) Uops executed on port 1
0x04: (name=port2_core) Uops executed on port 2 (core count)
0x08: (name=port3_core) Uops executed on port 3 (core count)
0x10: (name=port4_core) Uops executed on port 4 (core count)
0x1f: (name=core_active_cycles_no_port5) Cycles Uops executed on ports 0-4 (core count)
0x20: (name=port5) Uops executed on port 5
0x3f: (name=core_active_cycles) Cycles Uops executed on any port (core count)
0x40: (name=port015) Uops issued on ports 0, 1 or 5
0x80: (name=port234_core) Uops issued on ports 2, 3 or 4
OFFCORE_REQUESTS_SQ_FULL Offcore requests blocked due to Super Queue full all 0x01: No unit mask
SNOOPQ_REQUESTS_OUTSTANDING Outstanding snoop code requests 0 0x01: (name=data) Outstanding snoop data requests
0x02: (name=invalidate) Outstanding snoop invalidate requests
0x04: (name=code) Outstanding snoop code requests
SNOOPQ_REQUESTS Snoop code requests all 0x01: (name=data) Snoop data requests
0x02: (name=invalidate) Snoop invalidate requests
0x04: (name=code) Snoop code requests
OFFCORE_RESPONSE_ANY_DATA_0 REQUEST = ANY_DATA read and RESPONSE = ANY_CACHE_DRAM 2 0x01: No unit mask
SNOOP_RESPONSE Thread responded HIT to snoop all 0x01: (name=hit) Thread responded HIT to snoop
0x02: (name=hite) Thread responded HITE to snoop
0x04: (name=hitm) Thread responded HITM to snoop
OFFCORE_RESPONSE_ANY_DATA_1 REQUEST = ANY_DATA read and RESPONSE = ANY_CACHE_DRAM 1 0x01: No unit mask
INST_RETIRED Instructions retired (Programmable counter and Precise Event) all 0x01: (name=any_p) Instructions retired (Programmable counter and Precise Event)
0x02: (name=x87) Retired floating-point operations (Precise Event)
0x04: (name=mmx) Retired MMX instructions (Precise Event)
UOPS_RETIRED Cycles Uops are being retired all 0x01: (name=active_cycles) Cycles Uops are being retired
0x02: (name=retire_slots) Retirement slots used (Precise Event)
0x04: (name=macro_fused) Macro-fused Uops retired (Precise Event)
MACHINE_CLEARS Cycles machine clear asserted all 0x01: (name=cycles) Cycles machine clear asserted
0x02: (name=mem_order) Execution pipeline restart due to Memory ordering conflicts
0x04: (name=smc) Self-Modifying Code detected
BR_INST_RETIRED Retired branch instructions (Precise Event) all 0x01: (name=conditional) Retired conditional branch instructions (Precise Event)
0x02: (name=near_call) Retired near call instructions (Precise Event)
0x04: (name=all_branches) Retired branch instructions (Precise Event)
BR_MISP_RETIRED Mispredicted retired branch instructions (Precise Event) all 0x01: (name=conditional) Mispredicted conditional retired branches (Precise Event)
0x02: (name=near_call) Mispredicted near retired calls (Precise Event)
0x04: (name=all_branches) Mispredicted retired branch instructions (Precise Event)
SSEX_UOPS_RETIRED SIMD Packed-Double Uops retired (Precise Event) all 0x01: (name=packed_single) SIMD Packed-Single Uops retired (Precise Event)
0x02: (name=scalar_single) SIMD Scalar-Single Uops retired (Precise Event)
0x04: (name=packed_double) SIMD Packed-Double Uops retired (Precise Event)
0x08: (name=scalar_double) SIMD Scalar-Double Uops retired (Precise Event)
0x10: (name=vector_integer) SIMD Vector Integer Uops retired (Precise Event)
ITLB_MISS_RETIRED Retired instructions that missed the ITLB (Precise Event) all 0x20: No unit mask
MEM_LOAD_RETIRED Retired loads that miss the DTLB (Precise Event) all 0x01: (name=l1d_hit) Retired loads that hit the L1 data cache (Precise Event)
0x02: (name=l2_hit) Retired loads that hit the L2 cache (Precise Event)
0x04: (name=llc_unshared_hit) Retired loads that hit valid versions in the LLC cache (Precise Event)
0x08: (name=other_core_l2_hit_hitm) Retired loads that hit sibling core's L2 in modified or unmodified states (Precise Event)
0x10: (name=llc_miss) Retired loads that miss the LLC cache (Precise Event)
0x40: (name=hit_lfb) Retired loads that miss L1D and hit a previously allocated LFB (Precise Event)
0x80: (name=dtlb_miss) Retired loads that miss the DTLB (Precise Event)
FP_MMX_TRANS All Floating Point to and from MMX transitions all 0x01: (name=to_fp) Transitions from MMX to Floating Point instructions
0x02: (name=to_mmx) Transitions from Floating Point to MMX instructions
0x03: (name=any) All Floating Point to and from MMX transitions
MACRO_INSTS Instructions decoded all 0x01: No unit mask
UOPS_DECODED Stack pointer instructions decoded all 0x01: (name=stall_cycles) Cycles no Uops are decoded
0x02: (name=ms_cycles_active) Uops decoded by Microcode Sequencer
0x04: (name=esp_folding) Stack pointer instructions decoded
0x08: (name=esp_sync) Stack pointer sync operations
RAT_STALLS All RAT stall cycles all 0x01: (name=flags) Flag stall cycles
0x02: (name=registers) Partial register stall cycles
0x04: (name=rob_read_port) ROB read port stall cycles
0x08: (name=scoreboard) Scoreboard stall cycles
0x0f: (name=any) All RAT stall cycles
SEG_RENAME_STALLS Segment rename stall cycles all 0x01: No unit mask
ES_REG_RENAMES ES segment renames all 0x01: No unit mask
UOP_UNFUSION Uop unfusions due to FP exceptions all 0x01: No unit mask
BR_INST_DECODED Branch instructions decoded all 0x01: No unit mask
BPU_MISSED_CALL_RET Branch prediction unit missed call or return all 0x01: No unit mask
BACLEAR BACLEAR asserted with bad target address all 0x01: (name=clear) BACLEAR asserted, regardless of cause
0x02: (name=bad_target) BACLEAR asserted with bad target address
BPU_CLEARS Early Branch Prediction Unit clears all 0x01: (name=early) Early Branch Prediction Unit clears
0x02: (name=late) Late Branch Prediction Unit clears
L2_TRANSACTIONS All L2 transactions all 0x01: (name=load) L2 Load transactions
0x02: (name=rfo) L2 RFO transactions
0x04: (name=ifetch) L2 instruction fetch transactions
0x08: (name=prefetch) L2 prefetch transactions
0x10: (name=l1d_wb) L1D writeback to L2 transactions
0x20: (name=fill) L2 fill transactions
0x40: (name=wb) L2 writeback to LLC transactions
0x80: (name=any) All L2 transactions
L2_LINES_IN L2 lines allocated all 0x02: (name=s_state) L2 lines allocated in the S state
0x04: (name=e_state) L2 lines allocated in the E state
0x07: (name=any) L2 lines allocated
L2_LINES_OUT L2 lines evicted all 0x01: (name=demand_clean) L2 lines evicted by a demand request
0x02: (name=demand_dirty) L2 modified lines evicted by a demand request
0x04: (name=prefetch_clean) L2 lines evicted by a prefetch request
0x08: (name=prefetch_dirty) L2 modified lines evicted by a prefetch request
0x0f: (name=any) L2 lines evicted
SQ_MISC Super Queue LRU hints sent to LLC all 0x04: (name=lru_hints) Super Queue LRU hints sent to LLC
0x10: (name=split_lock) Super Queue lock splits across a cache line
SQ_FULL_STALL_CYCLES Super Queue full stall cycles all 0x01: No unit mask
FP_ASSIST X87 Floating point assists (Precise Event) all 0x01: (name=all) X87 Floating point assists (Precise Event)
0x02: (name=output) X87 Floating point assists for invalid output value (Precise Event)
0x04: (name=input) X87 Floating point assists for invalid input value (Precise Event)
SIMD_INT_64 SIMD integer 64 bit pack operations all 0x01: (name=packed_mpy) SIMD integer 64 bit packed multiply operations
0x02: (name=packed_shift) SIMD integer 64 bit shift operations
0x04: (name=pack) SIMD integer 64 bit pack operations
0x08: (name=unpack) SIMD integer 64 bit unpack operations
0x10: (name=packed_logical) SIMD integer 64 bit logical operations
0x20: (name=packed_arith) SIMD integer 64 bit arithmetic operations
0x40: (name=shuffle_move) SIMD integer 64 bit shuffle/move operations
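
Many of the composite unit masks listed above appear to be the bitwise OR of the single-bit masks for the same event. The short Python sketch below checks this for L2_RQSTS, using only values copied from the table. The `combine` helper and the dictionary layout are hypothetical names introduced for illustration and are not part of any profiling tool.

```python
# Single-bit L2_RQSTS unit masks, copied from the table above.
L2_RQSTS = {
    "ld_hit": 0x01,
    "ld_miss": 0x02,
    "rfo_hit": 0x04,
    "rfo_miss": 0x08,
    "ifetch_hit": 0x10,
    "ifetch_miss": 0x20,
    "prefetch_hit": 0x40,
    "prefetch_miss": 0x80,
}


def combine(masks, *names):
    """OR together the named single-bit unit masks (hypothetical helper)."""
    value = 0
    for name in names:
        value |= masks[name]
    return value


# Composite masks rebuilt from their single-bit parts.
loads = combine(L2_RQSTS, "ld_hit", "ld_miss")
all_misses = combine(
    L2_RQSTS, "ld_miss", "rfo_miss", "ifetch_miss", "prefetch_miss"
)
references = combine(L2_RQSTS, *L2_RQSTS)

print(f"loads      = {loads:#04x}")       # table lists 0x03 (name=loads)
print(f"miss       = {all_misses:#04x}")  # table lists 0xaa (name=miss)
print(f"references = {references:#04x}")  # table lists 0xff (name=references)
```

The same pattern can be used to build a mask that the table does not list, for example hits only, though whether a given combination is accepted depends on the profiling tool in use.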
2020/07/20