
Intel Haswell Microarchitecture events

This is a list of all Intel Haswell Microarchitecture performance counter event types. For details, see the Intel Architecture Developer's Manual Volume 3B, Appendix A, and the Intel Architecture Optimization Reference Manual (730795-001).
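The unit mask values in this list are the UMASK field of an Intel IA32_PERFEVTSELx control register, which Volume 3B documents together with the event select, counter mask (CMASK), invert (INV) and edge-detect bits. Several entries share one unit mask (for example, uops_issued any and stall_cycles both use 0x01); on Intel parts such variants are typically told apart by CMASK/INV/EDGE settings that this list does not show. The sketch below packs those fields following the architectural bit layout. The function name `encode_perfevtsel` and the constant `PLACEHOLDER_EVENT` are hypothetical; the event select codes themselves are not given in this list, so look them up in Volume 3B.

```python
# Sketch: pack IA32_PERFEVTSELx fields using the architectural bit layout
# described in Intel SDM Volume 3B. Function and constant names are hypothetical.

PLACEHOLDER_EVENT = 0x00  # hypothetical: substitute the real event select code


def encode_perfevtsel(event_select, umask, *, user=True, kernel=False,
                      edge=False, any_thread=False, inv=False, cmask=0,
                      enable=True):
    """Return a PERFEVTSEL register value (bits above 31 left zero)."""
    for name, value in (("event_select", event_select), ("umask", umask),
                        ("cmask", cmask)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {value:#x}")
    reg = event_select               # bits 0-7: event select
    reg |= umask << 8                # bits 8-15: unit mask (the table's last column)
    reg |= int(user) << 16           # USR: count at CPL 1-3
    reg |= int(kernel) << 17         # OS: count at CPL 0
    reg |= int(edge) << 18           # E: count rising edges only
    reg |= int(any_thread) << 21     # AnyThread: count for both logical processors of the core
    reg |= int(enable) << 22         # EN: enable the counter
    reg |= int(inv) << 23            # INV: count when the per-cycle count is below CMASK
    reg |= cmask << 24               # bits 24-31: counter mask threshold
    return reg


# A "stall cycles" style variant: same unit mask, but count only cycles in
# which the event occurred fewer than 1 time (cmask=1 with inv set).
print(hex(encode_perfevtsel(PLACEHOLDER_EVENT, 0x01, inv=True, cmask=1)))  # prints 0x1c10100
```

Keeping CMASK/INV as explicit keyword arguments makes it clear why two names in this list can carry the same unit mask yet count different things.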

Name  Description  Counters usable  Unit mask options
CPU_CLK_UNHALTED Clock cycles when not halted all
UNHALTED_REFERENCE_CYCLES Unhalted reference cycles all 0x01: No unit mask
INST_RETIRED Number of instructions retired all
LLC_MISSES Last level cache demand requests from this core that missed the LLC all 0x41: No unit mask
LLC_REFS Last level cache demand requests from this core all 0x4f: No unit mask
BR_INST_RETIRED Number of branch instructions retired all
BR_MISS_PRED_RETIRED Number of mispredicted branches retired (precise) all
ld_blocks all 0x02: (name=store_forward) This event counts loads that followed a store to the same address, where the data could not be forwarded inside the pipeline from the store to the load. The most common reason why store forwarding would be blocked is when a load's address range overlaps with a preceding smaller uncompleted store. The penalty for blocked store forwarding is that the load must wait for the store to write its value to the cache before it can be issued.
0x08: (name=no_sr) The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use
misalign_mem_ref all 0x01: (name=loads) Speculative cache line split load uops dispatched to L1 cache
0x02: (name=stores) Speculative cache line split STA uops dispatched to L1 cache
ld_blocks_partial_address_alias all 0x01: No unit mask
dtlb_load_misses all 0x01: (name=miss_causes_a_walk) Load misses in all DTLB levels that cause page walks
0x02: (name=walk_completed_4k) Demand load misses in all translation lookaside buffer (TLB) levels that cause a completed page walk (4K).
0x04: (name=walk_completed_2m_4m) Demand load misses in all translation lookaside buffer (TLB) levels that cause a completed page walk (2M/4M).
0x10: (name=walk_duration) This event counts cycles when the page miss handler (PMH) is servicing page walks caused by DTLB load misses.
0x20: (name=stlb_hit_4k) This event counts load operations from a 4K page that miss the first DTLB level but hit the second and do not cause page walks.
0x40: (name=stlb_hit_2m) This event counts load operations from a 2M page that miss the first DTLB level but hit the second and do not cause page walks.
0x80: (name=pde_cache_miss) DTLB demand load misses with low part of linear-to-physical address translation missed
0x0e: (name=walk_completed) Demand load misses in all translation lookaside buffer (TLB) levels that cause a completed page walk of any page size.
0x60: (name=stlb_hit) Load operations that miss the first DTLB level but hit the second and do not cause page walks
int_misc_recovery_cycles all 0x03: No unit mask
uops_issued all 0x01: (name=any) This event counts the number of uops issued by the Front-end of the pipeline to the Back-end. This event is counted at the allocation stage and will count both retired and non-retired uops.
0x10: (name=flags_merge) Number of flags-merge uops being allocated. Such uops are considered performance-sensitive; they were added by the GSR u-arch.
0x20: (name=slow_lea) Number of slow LEA uops being allocated. A uop is generally considered a slow LEA if it has 3 sources (e.g. 2 sources plus an immediate), regardless of whether it comes from an LEA instruction.
0x40: (name=single_mul) Number of Multiply packed/scalar single precision uops allocated
0x01: (name=stall_cycles) Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for the thread
0x01: (name=core_stall_cycles) Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for all threads
l2_rqsts all 0x21: (name=demand_data_rd_miss) Demand Data Read miss L2, no rejects
0x41: (name=demand_data_rd_hit) Demand Data Read requests that hit L2 cache
0x30: (name=l2_pf_miss) L2 prefetch requests that miss L2 cache
0x50: (name=l2_pf_hit) L2 prefetch requests that hit L2 cache
0xe1: (name=all_demand_data_rd) Demand Data Read requests
0xe2: (name=all_rfo) RFO requests to L2 cache
0xe4: (name=all_code_rd) L2 code requests
0xf8: (name=all_pf) Requests from L2 hardware prefetchers
0x42: (name=rfo_hit) RFO requests that hit L2 cache
0x22: (name=rfo_miss) RFO requests that miss L2 cache
0x44: (name=code_rd_hit) L2 cache hits when fetching instructions, code reads.
0x24: (name=code_rd_miss) L2 cache misses when fetching instructions
0x27: (name=all_demand_miss) Demand requests that miss L2 cache
0xe7: (name=all_demand_references) Demand requests to L2 cache
0x3f: (name=miss) All requests that miss L2 cache
0xff: (name=references) All L2 requests
l2_demand_rqsts_wb_hit all 0x50: No unit mask
l1d_pend_miss 2 0x01: (name=pending) L1D miss outstanding duration in cycles
0x01: (name=pending_cycles) Cycles with L1D load Misses outstanding.
dtlb_store_misses all 0x01: (name=miss_causes_a_walk) Store misses in all DTLB levels that cause page walks
0x02: (name=walk_completed_4k) Store misses in all TLB levels that cause a completed page walk (4K)
0x04: (name=walk_completed_2m_4m) Store misses in all DTLB levels that cause completed page walks (2M/4M)
0x10: (name=walk_duration) This event counts cycles when the page miss handler (PMH) is servicing page walks caused by DTLB store misses.
0x20: (name=stlb_hit_4k) This event counts store operations from a 4K page that miss the first DTLB level but hit the second and do not cause page walks.
0x40: (name=stlb_hit_2m) This event counts store operations from a 2M page that miss the first DTLB level but hit the second and do not cause page walks.
0x80: (name=pde_cache_miss) DTLB store misses with low part of linear-to-physical address translation missed
0x0e: (name=walk_completed) Store misses in all DTLB levels that cause completed page walks
0x60: (name=stlb_hit) Store operations that miss the first TLB level but hit the second and do not cause page walks
load_hit_pre all 0x01: (name=sw_pf) Non-software-prefetch load dispatches that hit a fill buffer (FB) allocated for software prefetch
0x02: (name=hw_pf) Non-software-prefetch load dispatches that hit a fill buffer (FB) allocated for hardware prefetch
ept_walk_cycles all 0x10: No unit mask
l1d_replacement all 0x01: No unit mask
tx_mem all 0x01: (name=abort_conflict) Number of times a transactional abort was signaled due to a data conflict on a transactionally accessed address
0x02: (name=abort_capacity_write) Number of times a transactional abort was signaled due to a data capacity limitation for transactional writes.
0x04: (name=abort_hle_store_to_elided_lock) Number of times an HLE transactional region aborted due to a non-XRELEASE-prefixed instruction writing to an elided lock in the elision buffer
0x08: (name=abort_hle_elision_buffer_not_empty) Number of times an HLE transactional execution aborted due to NoAllocatedElisionBuffer being non-zero.
0x10: (name=abort_hle_elision_buffer_mismatch) Number of times an HLE transactional execution aborted due to XRELEASE lock not satisfying the address and value requirements in the elision buffer
0x20: (name=abort_hle_elision_buffer_unsupported_alignment) Number of times an HLE transactional execution aborted due to an unsupported read alignment from the elision buffer.
0x40: (name=hle_elision_buffer_full) Number of times HLE lock could not be elided due to ElisionBufferAvailable being zero.
move_elimination all 0x01: (name=int_eliminated) Number of integer Move Elimination candidate uops that were eliminated.
0x02: (name=simd_eliminated) Number of SIMD Move Elimination candidate uops that were eliminated.
0x04: (name=int_not_eliminated) Number of integer Move Elimination candidate uops that were not eliminated.
0x08: (name=simd_not_eliminated) Number of SIMD Move Elimination candidate uops that were not eliminated.
cpl_cycles all 0x01: (name=ring0) Unhalted core cycles when the thread is in ring 0
0x02: (name=ring123) Unhalted core cycles when thread is in rings 1, 2, or 3
0x01: (name=ring0_trans) Number of intervals between processor halts while thread is in ring 0
tx_exec all 0x01: (name=misc1) Counts the number of times a class of instructions that may cause a transactional abort was executed. Since this is the count of execution, it may not always cause a transactional abort.
0x02: (name=misc2) Counts the number of times a class of instructions (e.g., vzeroupper) that may cause a transactional abort was executed inside a transactional region
0x04: (name=misc3) Counts the number of times an instruction execution caused the supported transactional nesting count to be exceeded
0x08: (name=misc4) Counts the number of times an XBEGIN instruction was executed inside an HLE transactional region.
0x10: (name=misc5) Counts the number of times an HLE XACQUIRE instruction was executed inside an RTM transactional region
rs_events all 0x01: (name=empty_cycles) This event counts cycles when the Reservation Station (RS) is empty for the thread. The RS is a structure that buffers allocated micro-ops from the Front-end. If there are many cycles when the RS is empty, it may indicate an underflow of instructions delivered from the Front-end.
0x01: (name=empty_end) Counts end of periods where the Reservation Station (RS) was empty. Could be useful to precisely locate Frontend Latency Bound issues.
offcore_requests_outstanding all 0x01: (name=demand_data_rd) Offcore outstanding Demand Data Read transactions in uncore queue.
0x02: (name=demand_code_rd) Offcore outstanding code read transactions in SuperQueue (SQ), queue to uncore, every cycle
0x04: (name=demand_rfo) Offcore outstanding RFO store transactions in SuperQueue (SQ), queue to uncore
0x08: (name=all_data_rd) Offcore outstanding cacheable Core Data Read transactions in SuperQueue (SQ), queue to uncore
0x01: (name=cycles_with_demand_data_rd) Cycles when offcore outstanding Demand Data Read transactions are present in SuperQueue (SQ), queue to uncore
0x08: (name=cycles_with_data_rd) Cycles when offcore outstanding cacheable Core Data Read transactions are present in SuperQueue (SQ), queue to uncore
lock_cycles all 0x01: (name=split_lock_uc_lock_duration) Cycles when L1 and L2 are locked due to UC or split lock
0x02: (name=cache_lock_duration) Cycles when L1D is locked
idq all 0x02: (name=empty) Instruction Decode Queue (IDQ) empty cycles
0x04: (name=mite_uops) Uops delivered to Instruction Decode Queue (IDQ) from MITE path
0x08: (name=dsb_uops) Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path
0x10: (name=ms_dsb_uops) Uops initiated by Decode Stream Buffer (DSB) that are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
0x20: (name=ms_mite_uops) Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
0x30: (name=ms_uops) This event counts uops delivered by the Front-end with the assistance of the microcode sequencer. Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder. Using other instructions, if possible, will usually improve performance.
0x30: (name=ms_cycles) This event counts cycles during which the microcode sequencer assisted the Front-end in delivering uops. Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder. Using other instructions, if possible, will usually improve performance.
0x04: (name=mite_cycles) Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from MITE path
0x08: (name=dsb_cycles) Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path
0x10: (name=ms_dsb_cycles) Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
0x10: (name=ms_dsb_occur) Deliveries to Instruction Decode Queue (IDQ) initiated by Decode Stream Buffer (DSB) while Microcode Sequencer (MS) is busy
0x18: (name=all_dsb_cycles_4_uops) Cycles Decode Stream Buffer (DSB) is delivering 4 Uops
0x18: (name=all_dsb_cycles_any_uops) Cycles Decode Stream Buffer (DSB) is delivering any Uop
0x24: (name=all_mite_cycles_4_uops) Cycles MITE is delivering 4 Uops
0x24: (name=all_mite_cycles_any_uops) Cycles MITE is delivering any Uop
0x3c: (name=mite_all_uops) Uops delivered to Instruction Decode Queue (IDQ) from MITE path
0x30: (name=ms_switches) Number of switches from DSB (Decode Stream Buffer) or MITE (legacy decode pipeline) to the Microcode Sequencer
icache all 0x02: (name=misses) This event counts Instruction Cache (ICACHE) misses.
0x04: (name=ifetch_stall) Cycles where a code-fetch stalled due to L1 instruction-cache miss or an iTLB miss
itlb_misses all 0x01: (name=miss_causes_a_walk) Misses at all ITLB levels that cause page walks
0x02: (name=walk_completed_4k) Code misses in all TLB levels that cause a completed page walk (4K)
0x04: (name=walk_completed_2m_4m) Code misses in all TLB levels that cause a completed page walk (2M/4M)
0x10: (name=walk_duration) This event counts cycles when the page miss handler (PMH) is servicing page walks caused by ITLB misses.
0x20: (name=stlb_hit_4k) Code misses that miss the ITLB and hit the STLB (4K)
0x40: (name=stlb_hit_2m) Code misses that miss the ITLB and hit the STLB (2M)
0x0e: (name=walk_completed) Misses in all ITLB levels that cause completed page walks
0x60: (name=stlb_hit) Operations that miss the first ITLB level but hit the second and do not cause any page walks
ild_stall all 0x01: (name=lcp) This event counts cycles where the decoder is stalled on an instruction with a length changing prefix (LCP).
0x04: (name=iq_full) Stall cycles because IQ is full
br_inst_exec all 0xff: (name=all_branches) Speculative and retired branches
0x41: (name=nontaken_conditional) Not taken macro-conditional branches
0x81: (name=taken_conditional) Taken speculative and retired macro-conditional branches
0x82: (name=taken_direct_jump) Taken speculative and retired macro-unconditional branch instructions excluding calls and indirects
0x84: (name=taken_indirect_jump_non_call_ret) Taken speculative and retired indirect branches excluding calls and returns
0x88: (name=taken_indirect_near_return) Taken speculative and retired indirect branches with return mnemonic
0x90: (name=taken_direct_near_call) Taken speculative and retired direct near calls
0xa0: (name=taken_indirect_near_call) Taken speculative and retired indirect calls
0xc1: (name=all_conditional) Speculative and retired macro-conditional branches
0xc2: (name=all_direct_jmp) Speculative and retired macro-unconditional branches excluding calls and indirects
0xc4: (name=all_indirect_jump_non_call_ret) Speculative and retired indirect branches excluding calls and returns
0xc8: (name=all_indirect_near_return) Speculative and retired indirect return branches.
0xd0: (name=all_direct_near_call) Speculative and retired direct near calls
br_misp_exec all 0xff: (name=all_branches) Speculative and retired mispredicted macro conditional branches
0x41: (name=nontaken_conditional) Not taken speculative and retired mispredicted macro conditional branches
0x81: (name=taken_conditional) Taken speculative and retired mispredicted macro conditional branches
0x84: (name=taken_indirect_jump_non_call_ret) Taken speculative and retired mispredicted indirect branches excluding calls and returns
0x88: (name=taken_return_near) Taken speculative and retired mispredicted indirect branches with return mnemonic
0xc1: (name=all_conditional) Speculative and retired mispredicted macro conditional branches
0xc4: (name=all_indirect_jump_non_call_ret) Mispredicted indirect branches excluding calls and returns
0xa0: (name=taken_indirect_near_call) Taken speculative and retired mispredicted indirect calls
idq_uops_not_delivered all 0x01: (name=core) This event counts the number of undelivered (unallocated) uops from the Front-end to the Resource Allocation Table (RAT) while the Back-end of the processor is not stalled. The Front-end can allocate up to 4 uops per cycle, so this event can increment 0-4 times per cycle depending on the number of unallocated uops. This event is counted on a per-core basis.
0x01: (name=cycles_0_uops_deliv_core) This event counts the number of cycles during which the Front-end allocated exactly zero uops to the Resource Allocation Table (RAT) while the Back-end of the processor is not stalled. This event is counted on a per-core basis.
0x01: (name=cycles_le_1_uop_deliv_core) Cycles per thread when 3 or more uops are not delivered to Resource Allocation Table (RAT) when backend of the machine is not stalled
0x01: (name=cycles_le_2_uop_deliv_core) Cycles with less than 2 uops delivered by the front end.
0x01: (name=cycles_le_3_uop_deliv_core) Cycles with less than 3 uops delivered by the front end.
0x01: (name=cycles_fe_was_ok) Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT) was stalling FE.
uops_executed_port all 0x01: (name=port_0) Cycles per thread when uops are executed in port 0
0x02: (name=port_1) Cycles per thread when uops are executed in port 1
0x04: (name=port_2) Cycles per thread when uops are executed in port 2
0x08: (name=port_3) Cycles per thread when uops are executed in port 3
0x10: (name=port_4) Cycles per thread when uops are executed in port 4
0x20: (name=port_5) Cycles per thread when uops are executed in port 5
0x40: (name=port_6) Cycles per thread when uops are executed in port 6
0x80: (name=port_7) Cycles per thread when uops are executed in port 7
0x01: (name=port_0_core) Cycles per core when uops are executed in port 0
0x02: (name=port_1_core) Cycles per core when uops are executed in port 1
0x04: (name=port_2_core) Cycles per core when uops are dispatched to port 2
0x08: (name=port_3_core) Cycles per core when uops are dispatched to port 3
0x10: (name=port_4_core) Cycles per core when uops are executed in port 4
0x20: (name=port_5_core) Cycles per core when uops are executed in port 5
0x40: (name=port_6_core) Cycles per core when uops are executed in port 6
0x80: (name=port_7_core) Cycles per core when uops are dispatched to port 7
resource_stalls all 0x01: (name=any) Resource-related stall cycles
0x04: (name=rs) Cycles stalled due to no eligible RS entry available.
0x08: (name=sb) This event counts cycles during which no instructions were allocated because no Store Buffers (SB) were available.
0x10: (name=rob) Cycles stalled due to re-order buffer full.
cycle_activity 2 0x01: (name=cycles_l2_pending) Cycles with pending L2 cache miss loads.
0x08: (name=cycles_l1d_pending) Cycles with pending L1 cache miss loads.
0x02: (name=cycles_ldm_pending) Cycles with pending memory loads.
0x04: (name=cycles_no_execute) This event counts cycles during which no instructions were executed in the execution stage of the pipeline.
0x05: (name=stalls_l2_pending) Execution stalls due to L2 cache misses.
0x06: (name=stalls_ldm_pending) This event counts cycles during which no instructions were executed in the execution stage of the pipeline and there were memory instructions pending (waiting for data).
0x0c: (name=stalls_l1d_pending) Execution stalls due to L1 data cache misses
lsd_uops all 0x01: No unit mask
dsb2mite_switches_penalty_cycles all 0x02: No unit mask
itlb_itlb_flush all 0x01: No unit mask
offcore_requests all 0x01: (name=demand_data_rd) Demand Data Read requests sent to uncore
0x02: (name=demand_code_rd) Cacheable and noncacheable code read requests
0x04: (name=demand_rfo) Demand RFO requests including regular RFOs, locks, ItoM
0x08: (name=all_data_rd) Demand and prefetch data reads
uops_executed all 0x02: (name=core) Number of uops executed on the core. Errata: HSM31
0x01: (name=stall_cycles) Counts number of cycles no uops were dispatched to be executed on this thread.
0x01: (name=cycles_ge_1_uops_exec) This event counts the cycles in which at least one uop was executed. It is counted per thread. Errata: HSM31
0x01: (name=cycles_ge_2_uops_exec) This event counts the cycles in which at least two uops were executed. It is counted per thread. Errata: HSM31
0x01: (name=cycles_ge_3_uops_exec) This event counts the cycles in which at least three uops were executed. It is counted per thread. Errata: HSM31
0x01: (name=cycles_ge_4_uops_exec) Cycles in which at least 4 uops were executed, counted per thread. Errata: HSM31
page_walker_loads all 0x11: (name=dtlb_l1) Number of DTLB page walker hits in the L1+FB
0x21: (name=itlb_l1) Number of ITLB page walker hits in the L1+FB
0x41: (name=ept_dtlb_l1) Counts the number of Extended Page Table walks from the DTLB that hit in the L1 and FB.
0x81: (name=ept_itlb_l1) Counts the number of Extended Page Table walks from the ITLB that hit in the L1 and FB.
0x12: (name=dtlb_l2) Number of DTLB page walker hits in the L2
0x22: (name=itlb_l2) Number of ITLB page walker hits in the L2
0x42: (name=ept_dtlb_l2) Counts the number of Extended Page Table walks from the DTLB that hit in the L2.
0x82: (name=ept_itlb_l2) Counts the number of Extended Page Table walks from the ITLB that hit in the L2.
0x14: (name=dtlb_l3) Number of DTLB page walker hits in the L3 + XSNP
0x24: (name=itlb_l3) Number of ITLB page walker hits in the L3 + XSNP
0x44: (name=ept_dtlb_l3) Counts the number of Extended Page Table walks from the DTLB that hit in the L3.
0x84: (name=ept_itlb_l3) Counts the number of Extended Page Table walks from the ITLB that hit in the L3.
0x18: (name=dtlb_memory) Number of DTLB page walker hits in Memory
0x48: (name=ept_dtlb_memory) Counts the number of Extended Page Table walks from the DTLB that hit in memory.
0x88: (name=ept_itlb_memory) Counts the number of Extended Page Table walks from the ITLB that hit in memory.
tlb_flush all 0x01: (name=dtlb_thread) DTLB flush attempts of the thread-specific entries
0x20: (name=stlb_any) STLB flush attempts
inst_retired_prec_dist 1 0x01: No unit mask
other_assists all 0x08: (name=avx_to_sse) Number of transitions from AVX-256 to legacy SSE when penalty applicable. Errata: HSM57
0x10: (name=sse_to_avx) Number of transitions from SSE to AVX-256 when penalty applicable. Errata: HSM57
0x40: (name=any_wb_assist) Number of times any microcode assist is invoked by HW upon uop writeback.
uops_retired all 0x01: (name=all) Actually retired uops.
0x01: (name=all_pebs) Actually retired uops.
0x02: (name=retire_slots) This event counts the number of retirement slots used each cycle. Up to 4 slots can be used each cycle, meaning that up to 4 uops or 4 instructions can retire per cycle.
0x02: (name=retire_slots_pebs) This event counts the number of retirement slots used each cycle. Up to 4 slots can be used each cycle, meaning that up to 4 uops or 4 instructions can retire per cycle.
0x01: (name=stall_cycles) Cycles without actually retired uops.
0x01: (name=total_cycles) Cycles with less than 10 actually retired uops.
0x01: (name=core_stall_cycles) Cycles without actually retired uops.
machine_clears all 0x01: (name=cycles) Cycles during which there was a nuke. Accounts for both thread-specific and all-thread nukes.
0x02: (name=memory_ordering) This event counts the number of memory ordering machine clears detected. Memory ordering machine clears can result from memory address aliasing or snoops from another hardware thread or core to data inflight in the pipeline. Machine clears can have a significant performance impact if they are happening frequently.
0x04: (name=smc) This event is incremented when self-modifying code (SMC) is detected, which causes a machine clear. Machine clears can have a significant performance impact if they are happening frequently.
0x20: (name=maskmov) This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0.
0x01: (name=count) Number of machine clears (nukes) of any type.
br_inst_retired all 0x01: (name=conditional) Conditional branch instructions retired.
0x01: (name=conditional_pebs) Conditional branch instructions retired.
0x02: (name=near_call) Direct and indirect near call instructions retired.
0x02: (name=near_call_pebs) Direct and indirect near call instructions retired.
0x08: (name=near_return) Return instructions retired.
0x08: (name=near_return_pebs) Return instructions retired.
0x10: (name=not_taken) Not taken branch instructions retired.
0x20: (name=near_taken) Taken branch instructions retired.
0x20: (name=near_taken_pebs) Taken branch instructions retired.
0x40: (name=far_branch) Far branch instructions retired.
0x04: (name=all_branches_pebs) All (macro) branch instructions retired.
br_misp_retired all 0x01: (name=conditional) Mispredicted conditional branch instructions retired.
0x01: (name=conditional_pebs) Mispredicted conditional branch instructions retired.
0x04: (name=all_branches_pebs) This event counts all mispredicted branch instructions retired. This is a precise event.
0x20: (name=near_taken) Number of near branch instructions retired that were mispredicted and taken.
0x20: (name=near_taken_pebs) Number of near branch instructions retired that were mispredicted and taken.
hle_retired all 0x01: (name=start) Number of times an HLE execution started.
0x02: (name=commit) Number of times an HLE execution successfully committed
0x04: (name=aborted) Number of times an HLE execution aborted for any reason (multiple categories may count as one).
0x04: (name=aborted_pebs) Number of times an HLE execution aborted for any reason (multiple categories may count as one).
0x08: (name=aborted_misc1) Number of times an HLE execution aborted due to various memory events (e.g., read/write capacity and conflicts).
0x10: (name=aborted_misc2) Number of times an HLE execution aborted due to uncommon conditions
0x20: (name=aborted_misc3) Number of times an HLE execution aborted due to HLE-unfriendly instructions
0x40: (name=aborted_misc4) Number of times an HLE execution aborted due to incompatible memory type
0x80: (name=aborted_misc5) Number of times an HLE execution aborted due to none of the previous 4 categories (e.g. interrupts)
rtm_retired all 0x01: (name=start) Number of times an RTM execution started.
0x02: (name=commit) Number of times an RTM execution successfully committed
0x04: (name=aborted) Number of times an RTM execution aborted for any reason (multiple categories may count as one).
0x04: (name=aborted_pebs) Number of times an RTM execution aborted for any reason (multiple categories may count as one).
0x08: (name=aborted_misc1) Number of times an RTM execution aborted due to various memory events (e.g. read/write capacity and conflicts)
0x10: (name=aborted_misc2) Number of times an RTM execution aborted due to uncommon conditions
0x20: (name=aborted_misc3) Number of times an RTM execution aborted due to HLE-unfriendly instructions
0x40: (name=aborted_misc4) Number of times an RTM execution aborted due to incompatible memory type
0x80: (name=aborted_misc5) Number of times an RTM execution aborted due to none of the previous 4 categories (e.g. interrupt)
fp_assist all 0x1e: (name=any) Cycles with any input/output SSE or FP assist
0x02: (name=x87_output) Number of X87 assists due to output value.
0x04: (name=x87_input) Number of X87 assists due to input value.
0x08: (name=simd_output) Number of SIMD FP assists due to Output values
0x10: (name=simd_input) Number of SIMD FP assists due to input values
rob_misc_events_lbr_inserts all 0x20: No unit mask
mem_uops_retired all 0x11: (name=stlb_miss_loads) Load uops with true STLB miss retired to architected path. Errata: HSM30
0x11: (name=stlb_miss_loads_pebs) Load uops with true STLB miss retired to architected path. Errata: HSM30
0x12: (name=stlb_miss_stores) Store uops with true STLB miss retired to architected path. Errata: HSM30
0x12: (name=stlb_miss_stores_pebs) Store uops with true STLB miss retired to architected path. Errata: HSM30
0x21: (name=lock_loads) Load uops with locked access retired to architected path. Errata: HSM30
0x21: (name=lock_loads_pebs) Load uops with locked access retired to architected path. Errata: HSM30
0x41: (name=split_loads) Line-split load uops retired to architected path. Errata: HSM30
0x41: (name=split_loads_pebs) Line-split load uops retired to architected path. Errata: HSM30
0x42: (name=split_stores) Line-split store uops retired to architected path. Errata: HSM30
0x42: (name=split_stores_pebs) Line-split store uops retired to architected path. Errata: HSM30
0x81: (name=all_loads) Load uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
0x81: (name=all_loads_pebs) Load uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
0x82: (name=all_stores) Store uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
0x82: (name=all_stores_pebs) Store uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
mem_load_uops_retired all 0x01: (name=l1_hit) Retired load uops with L1 cache hits as data sources. Errata: HSM30
0x01: (name=l1_hit_pebs) Retired load uops with L1 cache hits as data sources. Errata: HSM30
0x02: (name=l2_hit) Retired load uops with L2 cache hits as data sources. Errata: HSM30
0x02: (name=l2_hit_pebs) Retired load uops with L2 cache hits as data sources. Errata: HSM30
0x04: (name=l3_hit) Retired load uops whose data source was an L3 hit with no snoop required. Errata: HSM26, HSM30
0x04: (name=l3_hit_pebs) Retired load uops whose data source was an L3 hit with no snoop required. Errata: HSM26, HSM30
0x08: (name=l1_miss) Retired load uops that missed the L1 cache. Errata: HSM30
0x08: (name=l1_miss_pebs) Retired load uops that missed the L1 cache. Errata: HSM30
0x10: (name=l2_miss) Miss in mid-level (L2) cache. Excludes Unknown data-source. Errata: HSM30
0x10: (name=l2_miss_pebs) Miss in mid-level (L2) cache. Excludes Unknown data-source. Errata: HSM30
0x20: (name=l3_miss) Miss in last-level (L3) cache. Excludes Unknown data-source. Errata: HSM26, HSM30
0x20: (name=l3_miss_pebs) Miss in last-level (L3) cache. Excludes Unknown data-source. Errata: HSM26, HSM30
0x40: (name=hit_lfb) Retired load uops that missed L1 but hit the fill buffer (FB) because a preceding miss to the same cache line had not yet returned its data. Errata: HSM30
0x40: (name=hit_lfb_pebs) Retired load uops that missed L1 but hit the fill buffer (FB) because a preceding miss to the same cache line had not yet returned its data. Errata: HSM30
mem_load_uops_l3_hit_retired all 0x01: (name=xsnp_miss) Retired load uops whose data source was an L3 hit where the cross-core snoop missed in the on-package core caches. Errata: HSM26, HSM30
0x01: (name=xsnp_miss_pebs) Retired load uops whose data source was an L3 hit where the cross-core snoop missed in the on-package core caches. Errata: HSM26, HSM30
0x02: (name=xsnp_hit) Retired load uops whose data source was an L3 hit where the cross-core snoop hit in an on-package core cache. Errata: HSM26, HSM30
0x02: (name=xsnp_hit_pebs) Retired load uops whose data source was an L3 hit where the cross-core snoop hit in an on-package core cache. Errata: HSM26, HSM30
0x04: (name=xsnp_hitm) Retired load uops whose data source was a HitM response from the shared L3. Errata: HSM26, HSM30
0x04: (name=xsnp_hitm_pebs) Retired load uops whose data source was a HitM response from the shared L3. Errata: HSM26, HSM30
0x08: (name=xsnp_none) Retired load uops whose data source was an L3 hit with no snoop required. Errata: HSM26, HSM30
0x08: (name=xsnp_none_pebs) Retired load uops whose data source was an L3 hit with no snoop required. Errata: HSM26, HSM30
mem_load_uops_l3_miss_retired all 0x01: (name=local_dram) This event counts retired load uops where the data came from local DRAM. This does not include hardware prefetches. Errata: HSM30
0x01: (name=local_dram_pebs) This event counts retired load uops where the data came from local DRAM. This does not include hardware prefetches. Errata: HSM30
baclears_any all 0x1f: No unit mask
l2_trans all 0x80: (name=all_requests) Transactions accessing L2 pipe
0x01: (name=demand_data_rd) Demand Data Read requests that access L2 cache
0x02: (name=rfo) RFO requests that access L2 cache
0x04: (name=code_rd) L2 cache accesses when fetching instructions
0x08: (name=all_pf) L2 or L3 HW prefetches that access L2 cache
0x10: (name=l1d_wb) L1D writebacks that access L2 cache
0x20: (name=l2_fill) L2 fill requests that access L2 cache
0x40: (name=l2_wb) L2 writebacks that access L2 cache
l2_lines_in all 0x07: (name=all) This event counts the number of L2 cache lines brought into the L2 cache. Lines are filled into the L2 cache when there was an L2 miss.
0x01: (name=i) L2 cache lines in I state filling L2
0x02: (name=s) L2 cache lines in S state filling L2
0x04: (name=e) L2 cache lines in E state filling L2
l2_lines_out all 0x05: (name=demand_clean) Clean L2 cache lines evicted by demand
0x06: (name=demand_dirty) Dirty L2 cache lines evicted by demand
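Many combined unit masks in the table above are bitwise ORs of single-bit masks listed under the same event: cycle_activity stalls_l2_pending (0x05) is cycles_l2_pending (0x01) | cycles_no_execute (0x04), br_inst_exec all_conditional (0xc1) is nontaken_conditional (0x41) | taken_conditional (0x81), l2_lines_in all (0x07) covers the i, s and e masks, and each uops_executed_port mask is one bit per port. The page_walker_loads masks follow a two-field pattern: the high nibble selects the walk source and the low nibble the level that supplied the data. The sketch below decodes and composes such masks. The helper names and the field tables are assumptions inferred from the entry names in this list, not from an Intel specification, and the OR pattern is an observation about these entries rather than a guarantee for every event.

```python
# Field tables inferred from the page_walker_loads entry names (assumption).
WALK_SOURCE = {0x10: "dtlb", 0x20: "itlb", 0x40: "ept_dtlb", 0x80: "ept_itlb"}
HIT_LEVEL = {0x1: "l1", 0x2: "l2", 0x4: "l3", 0x8: "memory"}


def decode_page_walker_mask(umask):
    """Split a page_walker_loads unit mask into (walk sources, hit levels)."""
    sources = [name for bit, name in WALK_SOURCE.items() if umask & bit]
    levels = [name for bit, name in HIT_LEVEL.items() if umask & bit]
    return sources, levels


def combine(*masks):
    """OR single-bit unit masks together, as the combined table entries do."""
    result = 0
    for mask in masks:
        result |= mask
    return result


def uops_executed_port_mask(port):
    """uops_executed_port uses one bit per execution port (ports 0-7)."""
    if not 0 <= port <= 7:
        raise ValueError(f"Haswell has ports 0-7, got {port}")
    return 1 << port


# Worked checks against values listed in the table above.
CYCLES_L2_PENDING, CYCLES_NO_EXECUTE = 0x01, 0x04
assert combine(CYCLES_L2_PENDING, CYCLES_NO_EXECUTE) == 0x05  # stalls_l2_pending
assert combine(0x41, 0x81) == 0xc1        # br_inst_exec all_conditional
assert combine(0x01, 0x02, 0x04) == 0x07  # l2_lines_in all
assert uops_executed_port_mask(5) == 0x20  # port_5

print(decode_page_walker_mask(0x84))  # prints (['ept_itlb'], ['l3'])
```

Decoding a mask this way is mainly a consistency check: an entry whose description disagrees with the bits in its mask is likely to contain a copy-and-paste error.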
2020/07/20