This is a list of all Intel Skylake microarchitecture performance counter event types. For details, see the Intel Architecture Developer's Manual, Volume 3B, Appendix A, and the Intel Architecture Optimization Reference Manual (730795-001).
Name | Description | Counters usable | Unit mask options |
ld_blocks | all |
0x02: (name=store_forward) Loads blocked by overlapping with store buffer that cannot be forwarded.
0x08: (name=no_sr) The number of times that split load operations are temporarily blocked because all resources for handling the split accesses are in use |
|
ld_blocks_partial_address_alias | all |
0x01: (name=address_alias) False dependencies in MOB due to partial compare on address.
|
|
dtlb_load_misses | all |
0x01: (name=miss_causes_a_walk) Load misses in all DTLB levels that cause page walks
0x10: (name=walk_pending) Counts 1 per cycle for each PMH that is busy with a page walk for a load.
0x20: (name=stlb_hit) Loads that miss the DTLB and hit the STLB.
0x0e: (name=walk_completed) Load miss in all TLB levels causes a page walk that completes. (All page sizes)
0x10: (name=walk_active) Cycles when at least one PMH is busy with a page walk for a load. |
|
int_misc | all |
0x01: (name=recovery_cycles) Core cycles the allocator was stalled due to recovery from earlier clear event for this thread (e.g. misprediction or memory nuke)
0x80: (name=clear_resteer_cycles) Cycles the issue-stage is waiting for front-end to fetch from resteered path following branch misprediction or machine clear events.
0x01: (name=recovery_cycles_any) Core cycles the allocator was stalled due to recovery from earlier clear event for any thread running on the physical core (e.g. misprediction or memory nuke) |
|
uops_issued | all |
0x01: (name=any) Uops that Resource Allocation Table (RAT) issues to Reservation Station (RS)
0x20: (name=slow_lea) Number of slow LEA uops being allocated. A uop is generally considered SlowLea if it has 3 sources (e.g. 2 sources + immediate), regardless of whether it results from an LEA instruction.
0x01: (name=stall_cycles) Cycles when Resource Allocation Table (RAT) does not issue Uops to Reservation Station (RS) for the thread
0x02: (name=vector_width_mismatch) Number of Blend Uops issued by the Resource Allocation Table (RAT) to the Reservation Station (RS) in order to preserve upper bits of vector registers. Starting with the Skylake microarchitecture, these Blend uops are needed since every Intel SSE instruction executed in Dirty Upper State needs to preserve bits 128-255 of the destination register. For more information, refer to the "Mixing Intel AVX and Intel SSE Code" section of the Optimization Guide. |
|
arith_divider_active | all |
0x01: (name=divider_active) Cycles when divide unit is busy executing divide or square root operations. Accounts for integer and floating-point operations.
|
|
l2_rqsts | all |
0x21: (name=demand_data_rd_miss) Demand Data Read miss L2, no rejects
0x41: (name=demand_data_rd_hit) Demand Data Read requests that hit L2 cache
0xe1: (name=all_demand_data_rd) Demand Data Read requests
0xe2: (name=all_rfo) RFO requests to L2 cache
0xe4: (name=all_code_rd) L2 code requests
0xf8: (name=all_pf) Requests from the L1/L2/L3 hardware prefetchers or Load software prefetches
0x38: (name=pf_miss) Requests from the L1/L2/L3 hardware prefetchers or Load software prefetches that miss L2 cache
0xd8: (name=pf_hit) Requests from the L1/L2/L3 hardware prefetchers or Load software prefetches that hit L2 cache
0x42: (name=rfo_hit) RFO requests that hit L2 cache
0x22: (name=rfo_miss) RFO requests that miss L2 cache
0x44: (name=code_rd_hit) L2 cache hits when fetching instructions, code reads.
0x24: (name=code_rd_miss) L2 cache misses when fetching instructions
0x27: (name=all_demand_miss) Demand requests that miss L2 cache
0xe7: (name=all_demand_references) Demand requests to L2 cache
0x3f: (name=miss) All requests that miss L2 cache
0xff: (name=references) All L2 requests |
|
longest_lat_cache | all |
0x41: (name=miss) Core-originated cacheable demand requests missed L3
0x4f: (name=reference) Core-originated cacheable demand requests that refer to L3 |
|
cpu_clk_unhalted | all |
0x00: (name=thread) Core cycles when the thread is not in halt state
0x01: (name=ref_tsc) Reference cycles when the core is not in halt state.
0x00: (name=thread_p) Thread cycles when thread is not in halt state
0x02: (name=thread_any) Core cycles when at least one thread on the physical core is not in halt state
0x00: (name=thread_p_any) Core cycles when at least one thread on the physical core is not in halt state |
|
cpu_clk_thread_unhalted | all |
0x01: (name=ref_xclk) Reference cycles when the thread is unhalted (counts at 100 MHz rate)
0x02: (name=one_thread_active) Count XClk pulses when this thread is unhalted and the other thread is halted.
0x01: (name=ref_xclk_any) Reference cycles when at least one thread on the physical core is unhalted (counts at 100 MHz rate) |
|
l1d_pend_miss | all |
0x01: (name=pending) L1D miss outstanding duration in cycles
0x02: (name=fb_full) Number of times a request needed a FB entry but there was no entry available for it. That is, the FB unavailability was the dominant reason for blocking the request. A request includes cacheable/uncacheable demands that are load, store or SW prefetch. HWP are e
0x01: (name=pending_cycles) Cycles with L1D load Misses outstanding.
0x01: (name=pending_cycles_any) Cycles with L1D load Misses outstanding from any thread on physical core |
|
dtlb_store_misses | all |
0x01: (name=miss_causes_a_walk) Store misses in all DTLB levels that cause page walks
0x10: (name=walk_pending) Counts 1 per cycle for each PMH that is busy with a page walk for a store.
0x20: (name=stlb_hit) Stores that miss the DTLB and hit the STLB.
0x0e: (name=walk_completed) Store miss in all TLB levels causes a page walk that completes. (All page sizes)
0x10: (name=walk_active) Cycles when at least one PMH is busy with a page walk for a store. |
|
load_hit_pre_sw_pf | all |
0x01: (name=sw_pf) Demand load dispatches that hit L1D fill buffer (FB) allocated for software prefetch.
|
|
ept_walk_pending | all |
0x10: (name=walk_pending) Counts 1 per cycle for each PMH that is busy with an EPT (Extended Page Table) walk for any request type.
|
|
l1d_replacement | all |
0x01: (name=replacement) L1D data line replacements
|
|
tx_mem | all |
0x01: (name=abort_conflict) Number of times a transactional abort was signaled due to a data conflict on a transactionally accessed address
0x02: (name=abort_capacity) Number of times a transactional abort was signaled due to a data capacity limitation for transactional reads or writes.
0x04: (name=abort_hle_store_to_elided_lock) Number of times a HLE transactional region aborted due to a non XRELEASE prefixed instruction writing to an elided lock in the elision buffer
0x08: (name=abort_hle_elision_buffer_not_empty) Number of times an HLE transactional execution aborted due to NoAllocatedElisionBuffer being non-zero.
0x10: (name=abort_hle_elision_buffer_mismatch) Number of times an HLE transactional execution aborted due to XRELEASE lock not satisfying the address and value requirements in the elision buffer
0x20: (name=abort_hle_elision_buffer_unsupported_alignment) Number of times an HLE transactional execution aborted due to an unsupported read alignment from the elision buffer.
0x40: (name=hle_elision_buffer_full) Number of times HLE lock could not be elided due to ElisionBufferAvailable being zero. |
|
tx_exec | all |
0x01: (name=misc1) Counts the number of times a class of instructions that may cause a transactional abort was executed. Since this is the count of execution, it may not always cause a transactional abort.
0x02: (name=misc2) Counts the number of times a class of instructions (e.g., vzeroupper) that may cause a transactional abort was executed inside a transactional region
0x04: (name=misc3) Counts the number of times an instruction execution caused the transactional nest count supported to be exceeded
0x08: (name=misc4) Counts the number of times a XBEGIN instruction was executed inside an HLE transactional region.
0x10: (name=misc5) Counts the number of times an HLE XACQUIRE instruction was executed inside an RTM transactional region |
|
rs_events | all |
0x01: (name=empty_cycles) Cycles when Reservation Station (RS) is empty for the thread
0x01: (name=empty_end) Counts end of periods where the Reservation Station (RS) was empty. Could be useful to precisely locate Frontend Latency Bound issues. |
|
offcore_requests_outstanding | all |
0x01: (name=demand_data_rd) Offcore outstanding Demand Data Read transactions in uncore queue.
0x02: (name=demand_code_rd) Offcore outstanding Code Reads transactions in the SuperQueue (SQ), queue to uncore.
0x04: (name=demand_rfo) Offcore outstanding demand RFO reads transactions in SuperQueue (SQ), queue to uncore, every cycle
0x08: (name=all_data_rd) Offcore outstanding cacheable Core Data Read transactions in SuperQueue (SQ), queue to uncore
0x10: (name=l3_miss_demand_data_rd) Counts number of Offcore outstanding Demand Data Read requests that miss L3 cache in the SuperQueue (SQ) every cycle.
0x01: (name=cycles_with_demand_data_rd) Cycles when offcore outstanding Demand Data Read transactions are present in SuperQueue (SQ), queue to uncore
0x08: (name=cycles_with_data_rd) Cycles when offcore outstanding cacheable Core Data Read transactions are present in SuperQueue (SQ), queue to uncore
0x02: (name=cycles_with_demand_code_rd) Cycles with offcore outstanding Code Reads transactions in the SuperQueue (SQ), queue to uncore, every cycle.
0x04: (name=cycles_with_demand_rfo) Offcore outstanding demand RFO reads transactions in SuperQueue (SQ), queue to uncore, every cycle
0x10: (name=cycles_with_l3_miss_demand_data_rd) Cycles with at least 1 Demand Data Read request that misses L3 cache in the SuperQueue (SQ)
0x10: (name=l3_miss_demand_data_rd_ge_6) Cycles with at least 6 Demand Data Read requests that miss L3 cache in the SuperQueue (SQ)
0x01: (name=demand_data_rd_ge_6) Cycles with at least 6 offcore outstanding Demand Data Read transactions in uncore queue |
|
lock_cycles_cache_lock_duration | all |
0x02: (name=cache_lock_duration) Cycles when L1D is locked
|
|
idq | all |
0x04: (name=mite_uops) Uops delivered to Instruction Decode Queue (IDQ) from MITE path
0x08: (name=dsb_uops) Uops delivered to Instruction Decode Queue (IDQ) from the Decode Stream Buffer (DSB) path
0x20: (name=ms_mite_uops) Uops initiated by MITE and delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
0x30: (name=ms_cycles) Cycles when uops are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
0x04: (name=mite_cycles) Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from MITE path
0x08: (name=dsb_cycles) Cycles when uops are being delivered to Instruction Decode Queue (IDQ) from Decode Stream Buffer (DSB) path
0x10: (name=ms_dsb_cycles) Cycles when uops initiated by Decode Stream Buffer (DSB) are being delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy
0x18: (name=all_dsb_cycles_4_uops) Cycles Decode Stream Buffer (DSB) is delivering 4 Uops
0x18: (name=all_dsb_cycles_any_uops) Cycles Decode Stream Buffer (DSB) is delivering any Uop
0x24: (name=all_mite_cycles_4_uops) Cycles MITE is delivering 4 Uops
0x24: (name=all_mite_cycles_any_uops) Cycles MITE is delivering any Uop
0x30: (name=ms_switches) Number of switches from DSB (Decode Stream Buffer) or MITE (legacy decode pipeline) to the Microcode Sequencer
0x30: (name=ms_uops) Uops delivered to Instruction Decode Queue (IDQ) while Microcode Sequencer (MS) is busy |
|
icache_16b_ifdata_stall | all |
0x04: (name=ifdata_stall) Cycles where a code fetch is stalled due to L1 instruction cache miss.
|
|
icache_64b | all |
0x01: (name=iftag_hit) Instruction fetch tag lookups that hit in the instruction cache (L1I). Counts at 64-byte cache-line granularity.
0x02: (name=iftag_miss) Instruction fetch tag lookups that miss in the instruction cache (L1I). Counts at 64-byte cache-line granularity.
0x04: (name=iftag_stall) Cycles where a code fetch is stalled due to L1 instruction cache tag miss. |
|
itlb_misses | all |
0x01: (name=miss_causes_a_walk) Misses at all ITLB levels that cause page walks
0x10: (name=walk_pending) Counts 1 per cycle for each PMH that is busy with a page walk for an instruction fetch request.
0x20: (name=stlb_hit) Instruction fetch requests that miss the ITLB and hit the STLB.
0x0e: (name=walk_completed) Code miss in all TLB levels causes a page walk that completes. (All page sizes) |
|
ild_stall_lcp | all |
0x01: (name=lcp) Stalls caused by changing prefix length of the instruction.
|
|
idq_uops_not_delivered | all |
0x01: (name=core) Uops not delivered to Resource Allocation Table (RAT) per thread when backend of the machine is not stalled
0x01: (name=cycles_0_uops_deliv_core) Cycles per thread when 4 or more uops are not delivered to Resource Allocation Table (RAT) when backend of the machine is not stalled
0x01: (name=cycles_le_1_uop_deliv_core) Cycles per thread when 3 or more uops are not delivered to Resource Allocation Table (RAT) when backend of the machine is not stalled
0x01: (name=cycles_le_2_uop_deliv_core) Cycles with less than 2 uops delivered by the front end.
0x01: (name=cycles_le_3_uop_deliv_core) Cycles with less than 3 uops delivered by the front end.
0x01: (name=cycles_fe_was_ok) Counts cycles FE delivered 4 uops or Resource Allocation Table (RAT) was stalling FE. |
|
uops_dispatched_port | all |
0x01: (name=port_0) Cycles per thread when uops are executed in port 0
0x02: (name=port_1) Cycles per thread when uops are executed in port 1
0x04: (name=port_2) Cycles per thread when uops are executed in port 2
0x08: (name=port_3) Cycles per thread when uops are executed in port 3
0x10: (name=port_4) Cycles per thread when uops are executed in port 4
0x20: (name=port_5) Cycles per thread when uops are executed in port 5
0x40: (name=port_6) Cycles per thread when uops are executed in port 6
0x80: (name=port_7) Cycles per thread when uops are executed in port 7 |
|
resource_stalls | all |
0x01: (name=any) Resource-related stall cycles
0x08: (name=sb) Cycles stalled due to no store buffers available (not including draining from sync). |
|
cycle_activity | all |
0x04: (name=stalls_total) Total execution stalls.
0x08: (name=cycles_l1d_miss) Cycles while L1 cache miss demand load is outstanding.
0x0c: (name=stalls_l1d_miss) Execution stalls while L1 cache miss demand load is outstanding.
0x01: (name=cycles_l2_miss) Cycles while L2 cache miss demand load is outstanding.
0x05: (name=stalls_l2_miss) Execution stalls while L2 cache miss demand load is outstanding.
0x10: (name=cycles_mem_any) Cycles while memory subsystem has an outstanding load.
0x14: (name=stalls_mem_any) Execution stalls while memory subsystem has an outstanding load.
0x02: (name=cycles_l3_miss) Cycles while L3 cache miss demand load is outstanding.
0x06: (name=stalls_l3_miss) Execution stalls while L3 cache miss demand load is outstanding. |
|
exe_activity | all |
0x01: (name=exe_bound_0_ports) Cycles where no uops were executed, the Reservation Station was not empty, the Store Buffer was full and there was no outstanding load.
0x02: (name=u1_ports_util) Cycles total of 1 uop is executed on all ports and Reservation Station was not empty.
0x04: (name=u2_ports_util) Cycles total of 2 uops are executed on all ports and Reservation Station was not empty.
0x08: (name=u3_ports_util) Cycles total of 3 uops are executed on all ports and Reservation Station was not empty.
0x10: (name=u4_ports_util) Cycles total of 4 uops are executed on all ports and Reservation Station was not empty.
0x40: (name=bound_on_stores) Cycles where the Store Buffer was full and no outstanding load. |
|
lsd | all |
0x01: (name=uops) Number of Uops delivered by the LSD.
0x01: (name=cycles_active) Cycles Uops delivered by the LSD, but didn't come from the decoder
0x01: (name=cycles_4_uops) Cycles 4 Uops delivered by the LSD, but didn't come from the decoder |
|
dsb2mite_switches_penalty_cycles | all |
0x02: (name=penalty_cycles) Decode Stream Buffer (DSB)-to-MITE switch true penalty cycles.
|
|
itlb_itlb_flush | all |
0x01: (name=itlb_flush) Flushing of the Instruction TLB (ITLB) pages, includes 4k/2M/4M pages.
|
|
offcore_requests | all |
0x80: (name=all_requests) Any memory transaction that reached the SQ.
0x01: (name=demand_data_rd) Demand Data Read requests sent to uncore
0x02: (name=demand_code_rd) Cacheable and noncacheable code read requests
0x04: (name=demand_rfo) Demand RFO requests including regular RFOs, locks, ItoM
0x08: (name=all_data_rd) Demand and prefetch data reads
0x10: (name=l3_miss_demand_data_rd) Demand Data Read requests that miss L3 cache |
|
uops_executed | all |
0x01: (name=thread) Counts the number of uops to be executed per-thread each cycle.
0x02: (name=core) Number of uops executed on the core.
0x10: (name=x87) Counts the number of x87 uops dispatched.
0x01: (name=stall_cycles) Counts number of cycles no uops were dispatched to be executed on this thread.
0x01: (name=cycles_ge_1_uop_exec) Cycles where at least 1 uop was executed per-thread
0x01: (name=cycles_ge_2_uops_exec) Cycles where at least 2 uops were executed per-thread
0x01: (name=cycles_ge_3_uops_exec) Cycles where at least 3 uops were executed per-thread
0x01: (name=cycles_ge_4_uops_exec) Cycles where at least 4 uops were executed per-thread
0x02: (name=core_cycles_ge_1) Cycles at least 1 micro-op is executed from any thread on physical core
0x02: (name=core_cycles_ge_2) Cycles at least 2 micro-ops are executed from any thread on physical core
0x02: (name=core_cycles_ge_3) Cycles at least 3 micro-ops are executed from any thread on physical core
0x02: (name=core_cycles_ge_4) Cycles at least 4 micro-ops are executed from any thread on physical core
0x02: (name=core_cycles_none) Cycles with no micro-ops executed from any thread on physical core |
|
offcore_requests_buffer_sq_full | all |
0x01: (name=sq_full) Offcore requests buffer cannot take more entries for this thread core.
|
|
tlb_flush | all |
0x01: (name=dtlb_thread) DTLB flush attempts of the thread-specific entries
0x20: (name=stlb_any) STLB flush attempts |
|
inst_retired | 1 |
0x00: (name=any) Instructions retired from execution.
0x00: (name=any_p) Number of instructions retired. General Counter - architectural event
0x01: (name=prec_dist) Precise instruction retired event with HW to reduce effect of PEBS shadow in IP distribution |
|
other_assists_any | all |
0x3f: (name=any) Number of times a microcode assist is invoked by HW other than FP-assist. Examples include AD (page Access Dirty) and AVX* related assists.
|
|
uops_retired | all |
0x02: (name=retire_slots) Retirement slots used.
0x01: (name=stall_cycles) Cycles without actually retired uops.
0x01: (name=total_cycles) Cycles with less than 10 actually retired uops. |
|
machine_clears | all |
0x01: (name=count) Number of machine clears (nukes) of any type.
0x02: (name=memory_ordering) Counts the number of machine clears due to memory order conflicts.
0x04: (name=smc) Self-modifying code (SMC) detected. |
|
br_inst_retired | all |
0x00: (name=all_branches) All (macro) branch instructions retired.
0x01: (name=conditional) Conditional branch instructions retired.
0x01: (name=conditional_pebs) Conditional branch instructions retired.
0x02: (name=near_call) Direct and indirect near call instructions retired.
0x02: (name=near_call_pebs) Direct and indirect near call instructions retired.
0x00: (name=all_branches_pebs) All (macro) branch instructions retired.
0x08: (name=near_return) Return instructions retired.
0x08: (name=near_return_pebs) Return instructions retired.
0x10: (name=not_taken) Not taken branch instructions retired.
0x20: (name=near_taken) Taken branch instructions retired.
0x20: (name=near_taken_pebs) Taken branch instructions retired.
0x40: (name=far_branch) Far branch instructions retired.
0x40: (name=far_branch_pebs) Far branch instructions retired.
0x04: (name=all_branches_pebs) All (macro) branch instructions retired. |
|
br_misp_retired | all |
0x00: (name=all_branches) All mispredicted macro branch instructions retired.
0x01: (name=conditional) Mispredicted conditional branch instructions retired.
0x01: (name=conditional_pebs) Mispredicted conditional branch instructions retired.
0x20: (name=near_taken) Number of near branch instructions retired that were mispredicted and taken.
0x20: (name=near_taken_pebs) Number of near branch instructions retired that were mispredicted and taken.
0x04: (name=all_branches_pebs) Mispredicted macro branch instructions retired. |
|
fp_arith_inst_retired | all |
0x01: (name=scalar_double) Number of SSE/AVX computational scalar double precision floating-point instructions retired. Each count represents 1 computation. Applies to SSE* and AVX* scalar double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform multiple calculations per element.
0x02: (name=scalar_single) Number of SSE/AVX computational scalar single precision floating-point instructions retired. Each count represents 1 computation. Applies to SSE* and AVX* scalar single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP RSQRT SQRT FM(N)ADD/SUB. FM(N)ADD/SUB instructions count twice as they perform multiple calculations per element.
0x04: (name=u128b_packed_double) Number of SSE/AVX computational 128-bit packed double precision floating-point instructions retired. Each count represents 2 computations. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform multiple calculations per element.
0x08: (name=u128b_packed_single) Number of SSE/AVX computational 128-bit packed single precision floating-point instructions retired. Each count represents 4 computations. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP RSQRT SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform multiple calculations per element.
0x10: (name=u256b_packed_double) Number of SSE/AVX computational 256-bit packed double precision floating-point instructions retired. Each count represents 4 computations. Applies to SSE* and AVX* packed double precision floating-point instructions: ADD SUB MUL DIV MIN MAX SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform multiple calculations per element.
0x20: (name=u256b_packed_single) Number of SSE/AVX computational 256-bit packed single precision floating-point instructions retired. Each count represents 8 computations. Applies to SSE* and AVX* packed single precision floating-point instructions: ADD SUB MUL DIV MIN MAX RCP RSQRT SQRT DPP FM(N)ADD/SUB. DPP and FM(N)ADD/SUB instructions count twice as they perform multiple calculations per element. |
|
hle_retired | all |
0x01: (name=start) Number of times an HLE execution started.
0x02: (name=commit) Number of times an HLE execution successfully committed
0x04: (name=aborted) Number of times an HLE execution aborted due to any reasons (multiple categories may count as one).
0x04: (name=aborted_pebs) Number of times an HLE execution aborted due to any reasons (multiple categories may count as one).
0x08: (name=aborted_misc1) Number of times an HLE execution aborted due to various memory events (e.g., read/write capacity and conflicts).
0x10: (name=aborted_misc2) Number of times an HLE execution aborted due to hardware timer expiration.
0x20: (name=aborted_misc3) Number of times an HLE execution aborted due to HLE-unfriendly instructions and certain unfriendly events (such as AD assists etc.).
0x40: (name=aborted_misc4) Number of times an HLE execution aborted due to incompatible memory type
0x80: (name=aborted_misc5) Number of times an HLE execution aborted due to unfriendly events (such as interrupts). |
|
rtm_retired | all |
0x01: (name=start) Number of times an RTM execution started.
0x02: (name=commit) Number of times an RTM execution successfully committed
0x04: (name=aborted) Number of times an RTM execution aborted due to any reasons (multiple categories may count as one).
0x04: (name=aborted_pebs) Number of times an RTM execution aborted due to any reasons (multiple categories may count as one).
0x08: (name=aborted_misc1) Number of times an RTM execution aborted due to various memory events (e.g. read/write capacity and conflicts)
0x10: (name=aborted_misc2) Number of times an RTM execution aborted due to uncommon conditions.
0x20: (name=aborted_misc3) Number of times an RTM execution aborted due to HLE-unfriendly instructions
0x40: (name=aborted_misc4) Number of times an RTM execution aborted due to incompatible memory type
0x80: (name=aborted_misc5) Number of times an RTM execution aborted due to none of the previous 4 categories (e.g. interrupt) |
|
fp_assist_any | all |
0x1e: (name=any) Cycles with any input/output SSE or FP assist
|
|
hw_interrupts_received | all |
0x01: (name=received) Number of hardware interrupts received by the processor.
|
|
mem_inst_retired | all |
0x11: (name=stlb_miss_loads) Number of load instructions retired with STLB miss
0x11: (name=stlb_miss_loads_pebs) Number of load instructions retired with STLB miss
0x12: (name=stlb_miss_stores) Number of store instructions retired with STLB miss
0x12: (name=stlb_miss_stores_pebs) Number of store instructions retired with STLB miss
0x21: (name=lock_loads) Number of lock load instructions retired
0x21: (name=lock_loads_pebs) Number of lock load instructions retired
0x41: (name=split_loads) Number of load instructions retired with cache-line splits that may impact performance.
0x41: (name=split_loads_pebs) Number of load instructions retired with cache-line splits that may impact performance.
0x42: (name=split_stores) Number of store instructions retired with line-split
0x42: (name=split_stores_pebs) Number of store instructions retired with line-split
0x81: (name=all_loads) Number of load instructions retired
0x81: (name=all_loads_pebs) Number of load instructions retired
0x82: (name=all_stores) Number of store instructions retired
0x82: (name=all_stores_pebs) Number of store instructions retired |
|
mem_load_retired | all |
0x01: (name=l1_hit) Retired load instructions with L1 cache hits as data sources
0x01: (name=l1_hit_pebs) Retired load instructions with L1 cache hits as data sources
0x02: (name=l2_hit) Retired load instructions with L2 cache hits as data sources
0x02: (name=l2_hit_pebs) Retired load instructions with L2 cache hits as data sources
0x04: (name=l3_hit) Retired load instructions with L3 cache hits as data sources
0x04: (name=l3_hit_pebs) Retired load instructions with L3 cache hits as data sources
0x08: (name=l1_miss) Retired load instructions missed L1 cache as data sources
0x08: (name=l1_miss_pebs) Retired load instructions missed L1 cache as data sources
0x10: (name=l2_miss) Retired load instructions missed L2 cache as data sources
0x10: (name=l2_miss_pebs) Retired load instructions missed L2 cache as data sources
0x20: (name=l3_miss) Retired load instructions missed L3 cache as data sources
0x20: (name=l3_miss_pebs) Retired load instructions missed L3 cache as data sources
0x40: (name=fb_hit) Retired load instructions which data sources were load missed L1 but hit FB due to preceding miss to the same cache line with data not ready
0x40: (name=fb_hit_pebs) Retired load instructions which data sources were load missed L1 but hit FB due to preceding miss to the same cache line with data not ready |
|
mem_load_l3_hit_retired | all |
0x01: (name=xsnp_miss) Retired load instructions which data sources were L3 hit and cross-core snoop missed in on-pkg core cache.
0x01: (name=xsnp_miss_pebs) Retired load instructions which data sources were L3 hit and cross-core snoop missed in on-pkg core cache.
0x02: (name=xsnp_hit) Retired load instructions which data sources were L3 and cross-core snoop hits in on-pkg core cache
0x02: (name=xsnp_hit_pebs) Retired load instructions which data sources were L3 and cross-core snoop hits in on-pkg core cache
0x04: (name=xsnp_hitm) Retired load instructions which data sources were HitM responses from shared L3
0x04: (name=xsnp_hitm_pebs) Retired load instructions which data sources were HitM responses from shared L3
0x08: (name=xsnp_none) Retired load instructions which data sources were hits in L3 without snoops required
0x08: (name=xsnp_none_pebs) Retired load instructions which data sources were hits in L3 without snoops required |
|
baclears_any | all |
0x01: (name=any) Counts the total number when the front end is resteered, mainly when the BPU cannot provide a correct prediction and this is corrected by other branch handling mechanisms at the front end.
|
|
l2_trans_l2_wb | all |
0x40: (name=l2_wb) L2 writebacks that access L2 cache
|
|
l2_lines_in_all | all |
0x07: (name=all) L2 cache lines filling L2
|
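Several unit mask values in the table look like bitwise combinations of others. In the cycle_activity row, for instance, each stalls_* value equals the matching cycles_* value OR'ed with stalls_total (0x04). The following is a minimal Python sketch that checks this against text copied from that row. It assumes the "0xNN: (name=...) description" layout used throughout this list; `UNIT_MASK` and `parse_unit_masks` are hypothetical names for illustration, not part of any profiler's API.

```python
import re

# Unit mask entries copied from the cycle_activity row of the table.
CYCLE_ACTIVITY = (
    "0x04: (name=stalls_total) Total execution stalls. "
    "0x08: (name=cycles_l1d_miss) Cycles while L1 cache miss demand load is outstanding. "
    "0x0c: (name=stalls_l1d_miss) Execution stalls while L1 cache miss demand load is outstanding. "
    "0x01: (name=cycles_l2_miss) Cycles while L2 cache miss demand load is outstanding. "
    "0x05: (name=stalls_l2_miss) Execution stalls while L2 cache miss demand load is outstanding."
)

# Hypothetical pattern for one "0xNN: (name=foo) description" entry.
# The lookahead ends each description at the next entry or at end of text.
UNIT_MASK = re.compile(
    r"(0x[0-9a-fA-F]{2}): \(name=(\w+)\) (.*?)(?=\s+0x[0-9a-fA-F]{2}: \(name=|\s*$)",
    re.S,
)


def parse_unit_masks(cell):
    """Return a list of (mask_value, name, description) tuples."""
    return [(int(mask, 16), name, desc) for mask, name, desc in UNIT_MASK.findall(cell)]


entries = parse_unit_masks(CYCLE_ACTIVITY)
by_name = {name: value for value, name, _ in entries}
stalls_total = by_name["stalls_total"]

# Check that stalls_<level>_miss == cycles_<level>_miss | stalls_total.
for level in ("l1d", "l2"):
    combined = by_name[f"cycles_{level}_miss"] | stalls_total
    print(level, hex(combined), combined == by_name[f"stalls_{level}_miss"])
```

The parser returns a list rather than a dictionary because names can repeat within a row (br_inst_retired lists all_branches_pebs twice), and the same mask value can appear under several names.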
"Measurement is a crucial component of performance improvement since reasoning and intuition are fallible guides and must be supplemented with tools like timing commands and profilers." - The Practice of Programming, Brian W. Kernighan and Rob Pike