This is a list of all Intel Haswell microarchitecture performance counter event types. For details, see the Intel Architecture Developer's Manual, Volume 3B, Appendix A, and the Intel Architecture Optimization Reference Manual (730795-001).
Name | Description | Counters usable | Unit mask options |
CPU_CLK_UNHALTED | Clock cycles when not halted | all | |
UNHALTED_REFERENCE_CYCLES | Unhalted reference cycles | all |
0x01: No unit mask
|
INST_RETIRED | Number of instructions retired | all | |
LLC_MISSES | Last level cache demand requests from this core that missed the LLC | all |
0x41: No unit mask
|
LLC_REFS | Last level cache demand requests from this core | all |
0x4f: No unit mask
|
BR_INST_RETIRED | Number of branch instructions retired | all | |
BR_MISS_PRED_RETIRED | Number of mispredicted branches retired (precise) | all | |
ld_blocks | all |
0x02: (name=store_forward) This event counts loads that followed a store to the same address, where the data could not be forwarded inside the pipeline from the store to the load. The most common reason why store forwarding would be blocked is when a load's address range overlaps with a preceding smaller uncompleted store. The penalty for blocked store forwarding is that the load must wait for the store to write its value to the cache before it can be issued.
0x08: (name=no_sr) The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use |
|
misalign_mem_ref | all |
0x01: (name=loads) Speculative cache line split load uops dispatched to L1 cache
0x02: (name=stores) Speculative cache line split STA uops dispatched to L1 cache |
|
ld_blocks_partial_address_alias | all |
0x01: No unit mask
|
|
dtlb_load_misses | all |
0x01: (name=miss_causes_a_walk) Load misses in all DTLB levels that cause page walks
0x02: (name=walk_completed_4k) Demand load miss in all translation lookaside buffer (TLB) levels causes a page walk that completes (4K).
0x04: (name=walk_completed_2m_4m) Demand load miss in all translation lookaside buffer (TLB) levels causes a page walk that completes (2M/4M).
0x10: (name=walk_duration) This event counts cycles when the page miss handler (PMH) is servicing page walks caused by DTLB load misses.
0x20: (name=stlb_hit_4k) This event counts load operations from a 4K page that miss the first DTLB level but hit the second and do not cause page walks.
0x40: (name=stlb_hit_2m) This event counts load operations from a 2M page that miss the first DTLB level but hit the second and do not cause page walks.
0x80: (name=pde_cache_miss) DTLB demand load misses with low part of linear-to-physical address translation missed
0x0e: (name=walk_completed) Demand load miss in all translation lookaside buffer (TLB) levels causes a page walk that completes (any page size).
0x60: (name=stlb_hit) Load operations that miss the first DTLB level but hit the second and do not cause page walks |
|
int_misc_recovery_cycles | all |
0x03: No unit mask
|
|
uops_issued | all |
0x01: (name=any) This event counts the number of uops issued by the Front-end of the pipeline to the Back-end. This event is counted at the allocation stage and will count both retired and non-retired uops.
0x10: (name=flags_merge) Number of flags-merge uops being allocated. Such uops are considered performance sensitive; added by GSR u-arch.
0x20: (name=slow_lea) Number of slow LEA uops being allocated. A uop is generally considered SlowLea if it has 3 sources (e.g. 2 sources + immediate), regardless of whether it results from an LEA instruction.
0x40: (name=single_mul) Number of Multiply packed/scalar single precision uops allocated
0x01: (name=stall_cycles) Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for the thread
0x01: (name=core_stall_cycles) Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for all threads |
|
l2_rqsts | all |
0x21: (name=demand_data_rd_miss) Demand Data Read miss L2, no rejects
0x41: (name=demand_data_rd_hit) Demand Data Read requests that hit L2 cache
0x30: (name=l2_pf_miss) L2 prefetch requests that miss L2 cache
0x50: (name=l2_pf_hit) L2 prefetch requests that hit L2 cache
0xe1: (name=all_demand_data_rd) Demand Data Read requests
0xe2: (name=all_rfo) RFO requests to L2 cache
0xe4: (name=all_code_rd) L2 code requests
0xf8: (name=all_pf) Requests from L2 hardware prefetchers
0x42: (name=rfo_hit) RFO requests that hit L2 cache
0x22: (name=rfo_miss) RFO requests that miss L2 cache
0x44: (name=code_rd_hit) L2 cache hits when fetching instructions, code reads.
0x24: (name=code_rd_miss) L2 cache misses when fetching instructions
0x27: (name=all_demand_miss) Demand requests that miss L2 cache
0xe7: (name=all_demand_references) Demand requests to L2 cache
0x3f: (name=miss) All requests that miss L2 cache
0xff: (name=references) All L2 requests |
|
l2_demand_rqsts_wb_hit | all |
0x50: No unit mask
|
|
l1d_pend_miss | 2 |
0x01: (name=pending) L1D miss outstanding duration in cycles
0x01: (name=pending_cycles) Cycles with L1D load Misses outstanding. |
|
dtlb_store_misses | all |
0x01: (name=miss_causes_a_walk) Store misses in all DTLB levels that cause page walks
0x02: (name=walk_completed_4k) Store miss in all TLB levels causes a page walk that completes. (4K)
0x04: (name=walk_completed_2m_4m) Store misses in all DTLB levels that cause completed page walks (2M/4M)
0x10: (name=walk_duration) This event counts cycles when the page miss handler (PMH) is servicing page walks caused by DTLB store misses.
0x20: (name=stlb_hit_4k) This event counts store operations from a 4K page that miss the first DTLB level but hit the second and do not cause page walks.
0x40: (name=stlb_hit_2m) This event counts store operations from a 2M page that miss the first DTLB level but hit the second and do not cause page walks.
0x80: (name=pde_cache_miss) DTLB store misses with low part of linear-to-physical address translation missed
0x0e: (name=walk_completed) Store misses in all DTLB levels that cause completed page walks
0x60: (name=stlb_hit) Store operations that miss the first TLB level but hit the second and do not cause page walks |
|
load_hit_pre | all |
0x01: (name=sw_pf) Non-software-prefetch load dispatches that hit a fill buffer (FB) allocated for software prefetch
0x02: (name=hw_pf) Non-software-prefetch load dispatches that hit a fill buffer (FB) allocated for hardware prefetch |
|
ept_walk_cycles | all |
0x10: No unit mask
|
|
l1d_replacement | all |
0x01: No unit mask
|
|
tx_mem | all |
0x01: (name=abort_conflict) Number of times a transactional abort was signaled due to a data conflict on a transactionally accessed address
0x02: (name=abort_capacity_write) Number of times a transactional abort was signaled due to a data capacity limitation for transactional writes.
0x04: (name=abort_hle_store_to_elided_lock) Number of times a HLE transactional region aborted due to a non XRELEASE prefixed instruction writing to an elided lock in the elision buffer
0x08: (name=abort_hle_elision_buffer_not_empty) Number of times an HLE transactional execution aborted due to NoAllocatedElisionBuffer being non-zero.
0x10: (name=abort_hle_elision_buffer_mismatch) Number of times an HLE transactional execution aborted due to XRELEASE lock not satisfying the address and value requirements in the elision buffer
0x20: (name=abort_hle_elision_buffer_unsupported_alignment) Number of times an HLE transactional execution aborted due to an unsupported read alignment from the elision buffer.
0x40: (name=hle_elision_buffer_full) Number of times HLE lock could not be elided due to ElisionBufferAvailable being zero. |
|
move_elimination | all |
0x01: (name=int_eliminated) Number of integer Move Elimination candidate uops that were eliminated.
0x02: (name=simd_eliminated) Number of SIMD Move Elimination candidate uops that were eliminated.
0x04: (name=int_not_eliminated) Number of integer Move Elimination candidate uops that were not eliminated.
0x08: (name=simd_not_eliminated) Number of SIMD Move Elimination candidate uops that were not eliminated. |
|
cpl_cycles | all |
0x01: (name=ring0) Unhalted core cycles when the thread is in ring 0
0x02: (name=ring123) Unhalted core cycles when thread is in rings 1, 2, or 3
0x01: (name=ring0_trans) Number of intervals between processor halts while thread is in ring 0 |
|
tx_exec | all |
0x01: (name=misc1) Counts the number of times a class of instructions that may cause a transactional abort was executed. Since this is the count of execution, it may not always cause a transactional abort.
0x02: (name=misc2) Counts the number of times a class of instructions (e.g., vzeroupper) that may cause a transactional abort was executed inside a transactional region
0x04: (name=misc3) Counts the number of times an instruction execution caused the supported transactional nest count to be exceeded
0x08: (name=misc4) Counts the number of times an XBEGIN instruction was executed inside an HLE transactional region.
0x10: (name=misc5) Counts the number of times an HLE XACQUIRE instruction was executed inside an RTM transactional region |
|
rs_events | all |
0x01: (name=empty_cycles) This event counts cycles when the Reservation Station (RS) is empty for the thread. The RS is a structure that buffers allocated micro-ops from the Front-end. If there are many cycles when the RS is empty, it may represent an underflow of instructions delivered from the Front-end.
0x01: (name=empty_end) Counts end of periods where the Reservation Station (RS) was empty. Could be useful to precisely locate Frontend Latency Bound issues. |
|
offcore_requests_outstanding | all |
0x01: (name=demand_data_rd) Offcore outstanding Demand Data Read transactions in uncore queue.
0x02: (name=demand_code_rd) Offcore outstanding code reads transactions in SuperQueue (SQ), queue to uncore, every cycle
0x04: (name=demand_rfo) Offcore outstanding RFO store transactions in SuperQueue (SQ), queue to uncore
0x08: (name=all_data_rd) Offcore outstanding cacheable Core Data Read transactions in SuperQueue (SQ), queue to uncore
0x01: (name=cycles_with_demand_data_rd) Cycles when offcore outstanding Demand Data Read transactions are present in SuperQueue (SQ), queue to uncore
0x08: (name=cycles_with_data_rd) Cycles when offcore outstanding cacheable Core Data Read transactions are present in SuperQueue (SQ), queue to uncore |
|
lock_cycles | all |
0x01: (name=split_lock_uc_lock_duration) Cycles when L1 and L2 are locked due to UC or split lock
0x02: (name=cache_lock_duration) Cycles when L1D is locked |
|
idq | all |
0x02: (name=empty) Instruction Decode Queue (IDQ) empty cycles
0x04: (name=mite_uops) Uops delivered to Instruction Decode Queue (IDQ) from MITE path
0x08: (name=dsb_uops) Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path
0x10: (name=ms_dsb_uops) Uops initiated by Decode Stream Buffer (DSB) that are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
0x20: (name=ms_mite_uops) Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
0x30: (name=ms_uops) This event counts uops delivered by the Front-end with the assistance of the microcode sequencer. Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder. Using other instructions, if possible, will usually improve performance.
0x30: (name=ms_cycles) This event counts cycles during which the microcode sequencer assisted the Front-end in delivering uops. Microcode assists are used for complex instructions or scenarios that can't be handled by the standard decoder. Using other instructions, if possible, will usually improve performance.
0x04: (name=mite_cycles) Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from MITE path
0x08: (name=dsb_cycles) Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path
0x10: (name=ms_dsb_cycles) Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
0x10: (name=ms_dsb_occur) Deliveries to Instruction Decode Queue (IDQ) initiated by Decode Stream Buffer (DSB) while Microcode Sequencer (MS) is busy
0x18: (name=all_dsb_cycles_4_uops) Cycles Decode Stream Buffer (DSB) is delivering 4 Uops
0x18: (name=all_dsb_cycles_any_uops) Cycles Decode Stream Buffer (DSB) is delivering any Uop
0x24: (name=all_mite_cycles_4_uops) Cycles MITE is delivering 4 Uops
0x24: (name=all_mite_cycles_any_uops) Cycles MITE is delivering any Uop
0x3c: (name=mite_all_uops) Uops delivered to Instruction Decode Queue (IDQ) from MITE path
0x30: (name=ms_switches) Number of switches from DSB (Decode Stream Buffer) or MITE (legacy decode pipeline) to the Microcode Sequencer |
|
icache | all |
0x02: (name=misses) This event counts Instruction Cache (ICACHE) misses.
0x04: (name=ifetch_stall) Cycles where a code-fetch stalled due to L1 instruction-cache miss or an iTLB miss |
|
itlb_misses | all |
0x01: (name=miss_causes_a_walk) Misses at all ITLB levels that cause page walks
0x02: (name=walk_completed_4k) Code miss in all TLB levels causes a page walk that completes. (4K)
0x04: (name=walk_completed_2m_4m) Code miss in all TLB levels causes a page walk that completes. (2M/4M)
0x10: (name=walk_duration) This event counts cycles when the page miss handler (PMH) is servicing page walks caused by ITLB misses.
0x20: (name=stlb_hit_4k) Code misses that miss the DTLB and hit the STLB (4K)
0x40: (name=stlb_hit_2m) Code misses that miss the DTLB and hit the STLB (2M)
0x0e: (name=walk_completed) Misses in all ITLB levels that cause completed page walks
0x60: (name=stlb_hit) Operations that miss the first ITLB level but hit the second and do not cause any page walks |
|
ild_stall | all |
0x01: (name=lcp) This event counts cycles where the decoder is stalled on an instruction with a length changing prefix (LCP).
0x04: (name=iq_full) Stall cycles because IQ is full |
|
br_inst_exec | all |
0xff: (name=all_branches) Speculative and retired branches
0x41: (name=nontaken_conditional) Not taken macro-conditional branches
0x81: (name=taken_conditional) Taken speculative and retired macro-conditional branches
0x82: (name=taken_direct_jump) Taken speculative and retired macro-unconditional branch instructions excluding calls and indirects
0x84: (name=taken_indirect_jump_non_call_ret) Taken speculative and retired indirect branches excluding calls and returns
0x88: (name=taken_indirect_near_return) Taken speculative and retired indirect branches with return mnemonic
0x90: (name=taken_direct_near_call) Taken speculative and retired direct near calls
0xa0: (name=taken_indirect_near_call) Taken speculative and retired indirect calls
0xc1: (name=all_conditional) Speculative and retired macro-conditional branches
0xc2: (name=all_direct_jmp) Speculative and retired macro-unconditional branches excluding calls and indirects
0xc4: (name=all_indirect_jump_non_call_ret) Speculative and retired indirect branches excluding calls and returns
0xc8: (name=all_indirect_near_return) Speculative and retired indirect return branches.
0xd0: (name=all_direct_near_call) Speculative and retired direct near calls |
|
br_misp_exec | all |
0xff: (name=all_branches) Speculative and retired mispredicted macro conditional branches
0x41: (name=nontaken_conditional) Not taken speculative and retired mispredicted macro conditional branches
0x81: (name=taken_conditional) Taken speculative and retired mispredicted macro conditional branches
0x84: (name=taken_indirect_jump_non_call_ret) Taken speculative and retired mispredicted indirect branches excluding calls and returns
0x88: (name=taken_return_near) Taken speculative and retired mispredicted indirect branches with return mnemonic
0xc1: (name=all_conditional) Speculative and retired mispredicted macro conditional branches
0xc4: (name=all_indirect_jump_non_call_ret) Mispredicted indirect branches excluding calls and returns
0xa0: (name=taken_indirect_near_call) Taken speculative and retired mispredicted indirect calls |
|
idq_uops_not_delivered | all |
0x01: (name=core) This event counts the number of undelivered (unallocated) uops from the Front-end to the Resource Allocation Table (RAT) while the Back-end of the processor is not stalled. The Front-end can allocate up to 4 uops per cycle so this event can increment 0-4 times per cycle depending on the number of unallocated uops. This event is counted on a per-core basis.
0x01: (name=cycles_0_uops_deliv_core) This event counts the number of cycles during which the Front-end allocated exactly zero uops to the Resource Allocation Table (RAT) while the Back-end of the processor is not stalled. This event is counted on a per-core basis.
0x01: (name=cycles_le_1_uop_deliv_core) Cycles per thread when 3 or more uops are not delivered to Resource Allocation Table (RAT) when backend of the machine is not stalled
0x01: (name=cycles_le_2_uop_deliv_core) Cycles with less than 2 uops delivered by the front end.
0x01: (name=cycles_le_3_uop_deliv_core) Cycles with less than 3 uops delivered by the front end.
0x01: (name=cycles_fe_was_ok) Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT) was stalling FE. |
|
uops_executed_port | all |
0x01: (name=port_0) Cycles per thread when uops are executed in port 0
0x02: (name=port_1) Cycles per thread when uops are executed in port 1
0x04: (name=port_2) Cycles per thread when uops are executed in port 2
0x08: (name=port_3) Cycles per thread when uops are executed in port 3
0x10: (name=port_4) Cycles per thread when uops are executed in port 4
0x20: (name=port_5) Cycles per thread when uops are executed in port 5
0x40: (name=port_6) Cycles per thread when uops are executed in port 6
0x80: (name=port_7) Cycles per thread when uops are executed in port 7
0x01: (name=port_0_core) Cycles per core when uops are executed in port 0
0x02: (name=port_1_core) Cycles per core when uops are executed in port 1
0x04: (name=port_2_core) Cycles per core when uops are dispatched to port 2
0x08: (name=port_3_core) Cycles per core when uops are dispatched to port 3
0x10: (name=port_4_core) Cycles per core when uops are executed in port 4
0x20: (name=port_5_core) Cycles per core when uops are executed in port 5
0x40: (name=port_6_core) Cycles per core when uops are executed in port 6
0x80: (name=port_7_core) Cycles per core when uops are dispatched to port 7 |
|
resource_stalls | all |
0x01: (name=any) Resource-related stall cycles
0x04: (name=rs) Cycles stalled due to no eligible RS entry available.
0x08: (name=sb) This event counts cycles during which no instructions were allocated because no Store Buffers (SB) were available.
0x10: (name=rob) Cycles stalled due to re-order buffer full. |
|
cycle_activity | 2 |
0x01: (name=cycles_l2_pending) Cycles with pending L2 cache miss loads.
0x08: (name=cycles_l1d_pending) Cycles with pending L1 cache miss loads.
0x02: (name=cycles_ldm_pending) Cycles with pending memory loads.
0x04: (name=cycles_no_execute) This event counts cycles during which no instructions were executed in the execution stage of the pipeline.
0x05: (name=stalls_l2_pending) Execution stalls due to L2 cache misses.
0x06: (name=stalls_ldm_pending) This event counts cycles during which no instructions were executed in the execution stage of the pipeline and there were memory instructions pending (waiting for data).
0x0c: (name=stalls_l1d_pending) Execution stalls due to L1 data cache misses |
|
lsd_uops | all |
0x01: No unit mask
|
|
dsb2mite_switches_penalty_cycles | all |
0x02: No unit mask
|
|
itlb_itlb_flush | all |
0x01: No unit mask
|
|
offcore_requests | all |
0x01: (name=demand_data_rd) Demand Data Read requests sent to uncore
0x02: (name=demand_code_rd) Cacheable and noncacheable code read requests
0x04: (name=demand_rfo) Demand RFO requests including regular RFOs, locks, ItoM
0x08: (name=all_data_rd) Demand and prefetch data reads |
|
uops_executed | all |
0x02: (name=core) Number of uops executed on the core. Errata: HSM31
0x01: (name=stall_cycles) Counts number of cycles no uops were dispatched to be executed on this thread.
0x01: (name=cycles_ge_1_uops_exec) This event counts the cycles where at least one uop was executed. It is counted per thread. Errata: HSM31
0x01: (name=cycles_ge_2_uops_exec) This event counts the cycles where at least two uops were executed. It is counted per thread. Errata: HSM31
0x01: (name=cycles_ge_3_uops_exec) This event counts the cycles where at least three uops were executed. It is counted per thread. Errata: HSM31
0x01: (name=cycles_ge_4_uops_exec) Cycles where at least 4 uops were executed per-thread Errata: HSM31 |
|
page_walker_loads | all |
0x11: (name=dtlb_l1) Number of DTLB page walker hits in the L1+FB
0x21: (name=itlb_l1) Number of ITLB page walker hits in the L1+FB
0x41: (name=ept_dtlb_l1) Counts the number of Extended Page Table walks from the DTLB that hit in the L1 and FB.
0x81: (name=ept_itlb_l1) Counts the number of Extended Page Table walks from the ITLB that hit in the L1 and FB.
0x12: (name=dtlb_l2) Number of DTLB page walker hits in the L2
0x22: (name=itlb_l2) Number of ITLB page walker hits in the L2
0x42: (name=ept_dtlb_l2) Counts the number of Extended Page Table walks from the DTLB that hit in the L2.
0x82: (name=ept_itlb_l2) Counts the number of Extended Page Table walks from the ITLB that hit in the L2.
0x14: (name=dtlb_l3) Number of DTLB page walker hits in the L3 + XSNP
0x24: (name=itlb_l3) Number of ITLB page walker hits in the L3 + XSNP
0x44: (name=ept_dtlb_l3) Counts the number of Extended Page Table walks from the DTLB that hit in the L3.
0x84: (name=ept_itlb_l3) Counts the number of Extended Page Table walks from the ITLB that hit in the L3.
0x18: (name=dtlb_memory) Number of DTLB page walker hits in Memory
0x48: (name=ept_dtlb_memory) Counts the number of Extended Page Table walks from the DTLB that hit in memory.
0x88: (name=ept_itlb_memory) Counts the number of Extended Page Table walks from the ITLB that hit in memory. |
|
tlb_flush | all |
0x01: (name=dtlb_thread) DTLB flush attempts of the thread-specific entries
0x20: (name=stlb_any) STLB flush attempts |
|
inst_retired_prec_dist | 1 |
0x01: No unit mask
|
|
other_assists | all |
0x08: (name=avx_to_sse) Number of transitions from AVX-256 to legacy SSE when penalty applicable. Errata: HSM57
0x10: (name=sse_to_avx) Number of transitions from SSE to AVX-256 when penalty applicable. Errata: HSM57
0x40: (name=any_wb_assist) Number of times any microcode assist is invoked by HW upon uop writeback. |
|
uops_retired | all |
0x01: (name=all) Actually retired uops.
0x01: (name=all_pebs) Actually retired uops.
0x02: (name=retire_slots) This event counts the number of retirement slots used each cycle. There are potentially 4 slots that can be used each cycle - meaning, 4 uops or 4 instructions could retire each cycle.
0x02: (name=retire_slots_pebs) This event counts the number of retirement slots used each cycle. There are potentially 4 slots that can be used each cycle - meaning, 4 uops or 4 instructions could retire each cycle.
0x01: (name=stall_cycles) Cycles without actually retired uops.
0x01: (name=total_cycles) Cycles with less than 10 actually retired uops.
0x01: (name=core_stall_cycles) Cycles without actually retired uops. |
|
machine_clears | all |
0x01: (name=cycles) Cycles there was a Nuke. Account for both thread-specific and All Thread Nukes.
0x02: (name=memory_ordering) This event counts the number of memory ordering machine clears detected. Memory ordering machine clears can result from memory address aliasing or snoops from another hardware thread or core to data inflight in the pipeline. Machine clears can have a significant performance impact if they are happening frequently.
0x04: (name=smc) This event is incremented when self-modifying code (SMC) is detected, which causes a machine clear. Machine clears can have a significant performance impact if they are happening frequently.
0x20: (name=maskmov) This event counts the number of executed Intel AVX masked load operations that refer to an illegal address range with the mask bits set to 0.
0x01: (name=count) Number of machine clears (nukes) of any type. |
|
br_inst_retired | all |
0x01: (name=conditional) Conditional branch instructions retired.
0x01: (name=conditional_pebs) Conditional branch instructions retired.
0x02: (name=near_call) Direct and indirect near call instructions retired.
0x02: (name=near_call_pebs) Direct and indirect near call instructions retired.
0x08: (name=near_return) Return instructions retired.
0x08: (name=near_return_pebs) Return instructions retired.
0x10: (name=not_taken) Not taken branch instructions retired.
0x20: (name=near_taken) Taken branch instructions retired.
0x20: (name=near_taken_pebs) Taken branch instructions retired.
0x40: (name=far_branch) Far branch instructions retired.
0x04: (name=all_branches_pebs) All (macro) branch instructions retired. |
|
br_misp_retired | all |
0x01: (name=conditional) Mispredicted conditional branch instructions retired.
0x01: (name=conditional_pebs) Mispredicted conditional branch instructions retired.
0x04: (name=all_branches_pebs) This event counts all mispredicted branch instructions retired. This is a precise event.
0x20: (name=near_taken) Number of near branch instructions retired that were mispredicted and taken.
0x20: (name=near_taken_pebs) Number of near branch instructions retired that were mispredicted and taken. |
|
hle_retired | all |
0x01: (name=start) Number of times an HLE execution started.
0x02: (name=commit) Number of times an HLE execution successfully committed
0x04: (name=aborted) Number of times an HLE execution aborted due to any reasons (multiple categories may count as one).
0x04: (name=aborted_pebs) Number of times an HLE execution aborted due to any reasons (multiple categories may count as one).
0x08: (name=aborted_misc1) Number of times an HLE execution aborted due to various memory events (e.g., read/write capacity and conflicts).
0x10: (name=aborted_misc2) Number of times an HLE execution aborted due to uncommon conditions
0x20: (name=aborted_misc3) Number of times an HLE execution aborted due to HLE-unfriendly instructions
0x40: (name=aborted_misc4) Number of times an HLE execution aborted due to incompatible memory type
0x80: (name=aborted_misc5) Number of times an HLE execution aborted due to none of the previous 4 categories (e.g. interrupts) |
|
rtm_retired | all |
0x01: (name=start) Number of times an RTM execution started.
0x02: (name=commit) Number of times an RTM execution successfully committed
0x04: (name=aborted) Number of times an RTM execution aborted due to any reasons (multiple categories may count as one).
0x04: (name=aborted_pebs) Number of times an RTM execution aborted due to any reasons (multiple categories may count as one).
0x08: (name=aborted_misc1) Number of times an RTM execution aborted due to various memory events (e.g. read/write capacity and conflicts)
0x10: (name=aborted_misc2) Number of times an RTM execution aborted due to various memory events (e.g., read/write capacity and conflicts).
0x20: (name=aborted_misc3) Number of times an RTM execution aborted due to HLE-unfriendly instructions
0x40: (name=aborted_misc4) Number of times an RTM execution aborted due to incompatible memory type
0x80: (name=aborted_misc5) Number of times an RTM execution aborted due to none of the previous 4 categories (e.g. interrupt) |
|
fp_assist | all |
0x1e: (name=any) Cycles with any input/output SSE or FP assist
0x02: (name=x87_output) Number of X87 assists due to output value.
0x04: (name=x87_input) Number of X87 assists due to input value.
0x08: (name=simd_output) Number of SIMD FP assists due to Output values
0x10: (name=simd_input) Number of SIMD FP assists due to input values |
|
rob_misc_events_lbr_inserts | all |
0x20: No unit mask
|
|
mem_uops_retired | all |
0x11: (name=stlb_miss_loads) Load uops with true STLB miss retired to architected path. Errata: HSM30
0x11: (name=stlb_miss_loads_pebs) Load uops with true STLB miss retired to architected path. Errata: HSM30
0x12: (name=stlb_miss_stores) Store uops with true STLB miss retired to architected path. Errata: HSM30
0x12: (name=stlb_miss_stores_pebs) Store uops with true STLB miss retired to architected path. Errata: HSM30
0x21: (name=lock_loads) Load uops with locked access retired to architected path. Errata: HSM30
0x21: (name=lock_loads_pebs) Load uops with locked access retired to architected path. Errata: HSM30
0x41: (name=split_loads) Line-split load uops retired to architected path. Errata: HSM30
0x41: (name=split_loads_pebs) Line-split load uops retired to architected path. Errata: HSM30
0x42: (name=split_stores) Line-split store uops retired to architected path. Errata: HSM30
0x42: (name=split_stores_pebs) Line-split store uops retired to architected path. Errata: HSM30
0x81: (name=all_loads) Load uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
0x81: (name=all_loads_pebs) Load uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
0x82: (name=all_stores) Store uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30
0x82: (name=all_stores_pebs) Store uops retired to architected path with filter on bits 0 and 1 applied. Errata: HSM30 |
|
mem_load_uops_retired | all |
0x01: (name=l1_hit) Retired load uops with L1 cache hits as data sources. Errata: HSM30
0x01: (name=l1_hit_pebs) Retired load uops with L1 cache hits as data sources. Errata: HSM30
0x02: (name=l2_hit) Retired load uops with L2 cache hits as data sources. Errata: HSM30
0x02: (name=l2_hit_pebs) Retired load uops with L2 cache hits as data sources. Errata: HSM30
0x04: (name=l3_hit) Retired load uops which data sources were data hits in L3 without snoops required. Errata: HSM26, HSM30
0x04: (name=l3_hit_pebs) Retired load uops which data sources were data hits in L3 without snoops required. Errata: HSM26, HSM30
0x08: (name=l1_miss) Retired load uops misses in L1 cache as data sources. Errata: HSM30
0x08: (name=l1_miss_pebs) Retired load uops misses in L1 cache as data sources. Errata: HSM30
0x10: (name=l2_miss) Miss in mid-level (L2) cache. Excludes Unknown data-source. Errata: HSM30
0x10: (name=l2_miss_pebs) Miss in mid-level (L2) cache. Excludes Unknown data-source. Errata: HSM30
0x20: (name=l3_miss) Miss in last-level (L3) cache. Excludes Unknown data-source. Errata: HSM26, HSM30
0x20: (name=l3_miss_pebs) Miss in last-level (L3) cache. Excludes Unknown data-source. Errata: HSM26, HSM30
0x40: (name=hit_lfb) Retired load uops which data sources were load uops missed L1 but hit FB due to preceding miss to the same cache line with data not ready. Errata: HSM30
0x40: (name=hit_lfb_pebs) Retired load uops which data sources were load uops missed L1 but hit FB due to preceding miss to the same cache line with data not ready. Errata: HSM30 |
|
mem_load_uops_l3_hit_retired | all |
0x01: (name=xsnp_miss) Retired load uops which data sources were L3 hit and cross-core snoop missed in on-pkg core cache. Errata: HSM26, HSM30
0x01: (name=xsnp_miss_pebs) Retired load uops which data sources were L3 hit and cross-core snoop missed in on-pkg core cache. Errata: HSM26, HSM30
0x02: (name=xsnp_hit) Retired load uops which data sources were L3 and cross-core snoop hits in on-pkg core cache. Errata: HSM26, HSM30
0x02: (name=xsnp_hit_pebs) Retired load uops which data sources were L3 and cross-core snoop hits in on-pkg core cache. Errata: HSM26, HSM30
0x04: (name=xsnp_hitm) Retired load uops which data sources were HitM responses from shared L3. Errata: HSM26, HSM30
0x04: (name=xsnp_hitm_pebs) Retired load uops which data sources were HitM responses from shared L3. Errata: HSM26, HSM30
0x08: (name=xsnp_none) Retired load uops which data sources were hits in L3 without snoops required. Errata: HSM26, HSM30
0x08: (name=xsnp_none_pebs) Retired load uops which data sources were hits in L3 without snoops required. Errata: HSM26, HSM30 |
|
mem_load_uops_l3_miss_retired | all |
0x01: (name=local_dram) This event counts retired load uops where the data came from local DRAM. This does not include hardware prefetches. Errata: HSM30
0x01: (name=local_dram_pebs) This event counts retired load uops where the data came from local DRAM. This does not include hardware prefetches. Errata: HSM30 |
|
baclears_any | all |
0x1f: No unit mask
|
|
l2_trans | all |
0x80: (name=all_requests) Transactions accessing L2 pipe
0x01: (name=demand_data_rd) Demand Data Read requests that access L2 cache
0x02: (name=rfo) RFO requests that access L2 cache
0x04: (name=code_rd) L2 cache accesses when fetching instructions
0x08: (name=all_pf) L2 or L3 HW prefetches that access L2 cache
0x10: (name=l1d_wb) L1D writebacks that access L2 cache
0x20: (name=l2_fill) L2 fill requests that access L2 cache
0x40: (name=l2_wb) L2 writebacks that access L2 cache |
|
l2_lines_in | all |
0x07: (name=all) This event counts the number of L2 cache lines brought into the L2 cache. Lines are filled into the L2 cache when there was an L2 miss.
0x01: (name=i) L2 cache lines in I state filling L2
0x02: (name=s) L2 cache lines in S state filling L2
0x04: (name=e) L2 cache lines in E state filling L2 |
|
l2_lines_out | all |
0x05: (name=demand_clean) Clean L2 cache lines evicted by demand
0x06: (name=demand_dirty) Dirty L2 cache lines evicted by demand |
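
Several rows in the table list a unit mask whose value appears to be the bitwise OR of other masks in the same row, for example `fp_assist` `any` (0x1e) or `l2_lines_in` `all` (0x07). The Python sketch below checks that reading against values copied from the table. The dictionary layout and the helper name `is_union` are illustrative assumptions and are not part of any profiler API.

```python
from functools import reduce
from operator import or_

# Unit mask values copied from the table: name -> (composite, components).
COMPOSITES = {
    "fp_assist.any": (0x1E, [0x02, 0x04, 0x08, 0x10]),
    "l2_lines_in.all": (0x07, [0x01, 0x02, 0x04]),
    "dtlb_load_misses.stlb_hit": (0x60, [0x20, 0x40]),
    "cycle_activity.stalls_l2_pending": (0x05, [0x01, 0x04]),
    "cycle_activity.stalls_ldm_pending": (0x06, [0x02, 0x04]),
    "cycle_activity.stalls_l1d_pending": (0x0C, [0x08, 0x04]),
}


def is_union(composite, components):
    """Return True when OR-ing all component masks yields the composite mask."""
    return reduce(or_, components, 0) == composite


for name, (composite, components) in COMPOSITES.items():
    print(f"{name}: 0x{composite:02x} union={is_union(composite, components)}")
```

Not every composite decomposes into masks listed here: `dtlb_load_misses` `walk_completed` (0x0e) also sets bit 0x08, which has no row of its own in this table.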
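
Raw counts from the events in the table are usually easier to interpret as ratios. The Python sketch below derives a few common ones. The sample counts are invented for illustration, the function names are hypothetical, and the undelivered-slot fraction relies on the statement in the `idq_uops_not_delivered` row that the Front-end can allocate up to 4 uops per cycle. Treat it as a rough guide under those assumptions, not as a definitive methodology.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_metrics(counts):
    """Combine raw event counts (keyed by event name) into simple ratios."""
    cycles = counts["CPU_CLK_UNHALTED"]
    return {
        "ipc": ratio(counts["INST_RETIRED"], cycles),
        "llc_miss_ratio": ratio(counts["LLC_MISSES"], counts["LLC_REFS"]),
        "branch_mispredict_ratio": ratio(
            counts["BR_MISS_PRED_RETIRED"], counts["BR_INST_RETIRED"]
        ),
        # Assumes 4 allocation slots per cycle, as stated in the table.
        "undelivered_slot_fraction": ratio(
            counts["idq_uops_not_delivered.core"], 4 * cycles
        ),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sample = {
    "CPU_CLK_UNHALTED": 1_000_000,
    "INST_RETIRED": 1_500_000,
    "LLC_MISSES": 2_000,
    "LLC_REFS": 10_000,
    "BR_INST_RETIRED": 200_000,
    "BR_MISS_PRED_RETIRED": 5_000,
    "idq_uops_not_delivered.core": 800_000,
}

for name, value in derived_metrics(sample).items():
    print(f"{name}: {value:.3f}")
```

Because `idq_uops_not_delivered` `core` is counted per core while `CPU_CLK_UNHALTED` is typically read per thread, the last ratio is only a rough estimate when two hardware threads share a core.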