This is a list of all Intel Goldmont Plus microarchitecture performance counter event types. For details, see the Intel Architecture Developer's Manual Volume 3B, Appendix A, and the Intel Architecture Optimization Reference Manual (730795-001).
Name | Description | Counters usable | Unit mask options |
CPU_CLK_UNHALTED | Clock cycles when not halted | all | |
UNHALTED_REFERENCE_CYCLES | Unhalted reference cycles | all |
0x01: No unit mask
|
INST_RETIRED | Number of instructions retired | all | |
LLC_MISSES | Last level cache demand requests from this core that missed the LLC | all |
0x41: No unit mask
|
LLC_REFS | Last level cache demand requests from this core | all |
0x4f: No unit mask
|
BR_INST_RETIRED | Number of branch instructions retired | all | |
BR_MISS_PRED_RETIRED | Number of mispredicted branches retired (precise) | all | |
recycleq | Counts the number of retired load or store micro-ops that get pushed into the Recycle Queue | all |
0x01: (name=ld_block_st_forward) Counts the number of times a retired load is blocked because its address partially overlaps with a store.
0x02: (name=ld_block_std_notready) Counts the number of times a retired load is blocked because its address overlaps with a store whose data is not ready.
0x04: (name=st_splits) Counts the number of retired stores that are cache line splits. Each split should be counted only once.
0x08: (name=ld_splits) Counts the number of retired loads that are cache line splits. Each split should be counted only once.
0x10: (name=lock) Counts all retired locked loads. Stores are excluded because counting them would double count.
0x20: (name=sta_full) Counts the store micro-ops retired that were pushed into the recycle queue because the store address buffer is full.
0x40: (name=any_ld) Counts any retired load that was pushed into the recycle queue for any reason.
0x80: (name=any_st) Counts any retired store that was pushed into the recycle queue for any reason. |
mem_uops_retired | Counts the number of memory micro-ops retired. | all |
0x01: (name=l1_miss_loads) Counts the number of load micro-ops retired that miss in L1 D cache.
0x02: (name=l2_hit_loads) Counts the number of load micro-ops retired that hit in the L2.
0x04: (name=l2_miss_loads) Counts the number of load micro-ops retired that miss in the L2.
0x08: (name=dtlb_miss_loads) Counts the number of load micro-ops retired that cause a DTLB miss.
0x10: (name=utlb_miss_loads) Counts the number of load micro-ops retired that cause a micro TLB miss.
0x20: (name=hitm) Counts the loads retired that get the data from the other core in the same tile in M state.
0x40: (name=any_loads) Counts all the load micro-ops retired.
0x80: (name=any_stores) Counts all the store micro-ops retired. |
page_walks | Counts the number of core cycles for page walks | all |
0x01: (name=d_side_walks) Counts the total D-side page walks that are completed or started. The page walks started in the speculative path will also be counted.
0x01: (name=d_side_cycles) Counts the total number of core cycles for all the D-side page walks. The cycles for page walks started in speculative path will also be included.
0x02: (name=i_side_walks) Counts the total I-side page walks that are completed.
0x02: (name=i_side_cycles) Counts the total number of core cycles for all the I-side page walks. The cycles for page walks started in speculative path will also be included.
0x03: (name=walks) Counts the total page walks completed (I-side and D-side).
0x03: (name=cycles) Counts the total number of core cycles for all the page walks. The cycles for page walks started in speculative path will also be included. |
l2_requests_reject | Counts the number of MEC requests from the L2Q that reference a cache line and were rejected. | all |
0x00: (name=all) Counts the number of MEC requests from the L2Q that reference a cache line, excluding SW prefetches filling only to L2 cache and L1 evictions (automatically excludes L2HWP, UC, WC), that were rejected. Multiple repeated rejects should be counted multiple times.
|
core_reject_l2q | Number of requests not accepted into the L2Q because of any L2 queue reject condition. | all |
0x00: (name=all) Counts the number of MEC requests that were not accepted into the L2Q because of any L2 queue reject condition. This event has no at-retirement notion, so it may include requests from instructions on the speculative path.
|
icache | Instruction fetches | all |
0x03: (name=accesses) All instruction fetches including uncacheable
0x01: (name=hits) All instruction fetches that hit instruction cache
0x02: (name=misses) All instruction fetches that missed instruction cache (produced a memory request); counted only once, not once per outstanding cycle |
fetch_stall | Counts the number of core cycles the instruction fetch pipe was stalled | all |
0x01: (name=icache_fill_pending_cycles) Counts the number of core cycles the fetch stalls because of an icache miss. This is a cumulative count of core cycles the fetch stalled for all icache misses.
0x01: (name=icache_fill_pending_edge) Counts the number of times the fetch stalls because of an icache miss. |
l2_requests | L2 cache requests | all |
0x41: (name=miss) Counts the total number of L2 cache misses.
0x4f: (name=reference) Counts the total number of L2 cache references. |
uops_retired | Retired uops | all |
0x01: (name=ms) Counts the number of uops retired that are from complex flows issued by the micro-sequencer.
0x10: (name=all) Counts the number of uops retired.
0x20: (name=scalar_simd) Counts the number of scalar SSE, AVX, AVX2, AVX-512 micro-ops except for loads (memory-to-register mov-type micro-ops), division, sqrt.
0x40: (name=packed_simd) Counts the number of packed SSE, AVX, AVX2, AVX-512 micro-ops (both floating point and integer) except for loads (memory-to-register mov-type micro-ops), packed byte and word multiplies. |
machine_clears | Counts the number of times that the machine clears at retire. | all |
0x01: (name=smc) Counts the number of times that the machine clears due to the program modifying data within 1K of a recently fetched code page.
0x02: (name=memory_ordering) Counts the number of times the machine clears due to memory ordering hazards.
0x04: (name=fp_assist) Counts the number of floating-point operations retired that required microcode assists.
0x08: (name=all) Counts all machine clears. |
br_inst_retired | Counts the number of branch instructions retired | all |
0x00: (name=any) Counts the number of branch instructions retired
0x7e: (name=jcc) Counts the number of branch instructions retired that were conditional jumps.
0xfe: (name=taken_jcc) Counts the number of branch instructions retired that were conditional jumps and predicted taken.
0xf9: (name=call) Counts the number of near CALL branch instructions retired.
0xfd: (name=rel_call) Counts the number of near relative CALL branch instructions retired.
0xfb: (name=ind_call) Counts the number of near indirect CALL branch instructions retired.
0xf7: (name=return) Counts the number of near RET branch instructions retired.
0xeb: (name=non_return_ind) Counts the number of branch instructions retired that were near indirect CALL or near indirect JMP.
0xbf: (name=far_branch) Counts the number of far branch instructions retired. |
br_misp_retired | Counts the number of mispredicted branch instructions retired | all |
0x00: (name=any) All mispredicted branches
0x7e: (name=jcc) Number of mispredicted conditional branch instructions retired
0xfe: (name=taken_jcc) Number of mispredicted taken conditional branch instructions retired
0xf9: (name=call) Counts the number of mispredicted near CALL branch instructions retired.
0xfd: (name=rel_call) Counts the number of mispredicted near relative CALL branch instructions retired.
0xfb: (name=ind_call) Number of mispredicted indirect call branch instructions retired
0xf7: (name=return) Number of mispredicted return branch instructions retired
0xeb: (name=non_return_ind) Number of mispredicted non-return branch instructions retired
0xbf: (name=far_branch) Counts the number of mispredicted far branch instructions retired. |
no_alloc_cycles | Counts the number of core cycles when no micro-ops are allocated | all |
0x01: (name=rob_full) Counts the number of core cycles when no micro-ops are allocated and the ROB is full.
0x02: (name=mispredicts) Counts the number of core cycles when no micro-ops are allocated and the alloc pipe is stalled waiting for a mispredicted branch to retire.
0x20: (name=rat_stall) Counts the number of core cycles when no micro-ops are allocated and a RAT stall (caused by reservation station full) is asserted.
0x7f: (name=all) Counts the total number of core cycles when no micro-ops are allocated for any reason. |
rs_full_stall | Counts the number of core cycles when the allocation pipeline stalls because the required RS is full. | all |
0x01: (name=mec) Counts the number of core cycles when the allocation pipeline is stalled and is waiting for a free MEC reservation station entry.
0x1f: (name=all) Counts the total number of core cycles the allocation pipeline is stalled when any one of the reservation stations is full. |
cycles_div_busy | Number of core cycles when the divider is busy | all |
0x01: (name=all) Counts the number of core cycles when the divider is busy; this does not imply a stall waiting for the divider.
|
baclears | Counts the number of times Branch Target Buffer (BTB) prediction was corrected by a later branch predictor | all |
0x01: (name=all) Counts the number of times front-end resteers for any branch as a result of another branch handling mechanism in the front-end.
0x08: (name=return) Counts the number of times the front-end resteers for RET branches as a result of another branch handling mechanism in the front-end.
0x10: (name=cond) Counts the number of times the front-end resteers for conditional branches as a result of another branch handling mechanism in the front-end. |
ms_decoded | Microcode sequencer decode entrypoints | all |
0x01: (name=ms_entry) Counts the number of times the MSROM starts a flow of uops.
|
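
As an illustration of how the names and unit masks in the table might be used, the sketch below builds an event specification string in the name:count:unitmask:kernel:user layout that operf's --events option is understood to accept, and derives a miss ratio from two counts. The mask values are copied from the table; the helper names (UNIT_MASKS, event_spec, miss_ratio) and the default sample count of 100000 are assumptions made for this example, not part of OProfile. Check the exact syntax against ophelp and the operf man page on your system before relying on it.

```python
# A few unit mask values copied from the table above (not the full list).
UNIT_MASKS = {
    "mem_uops_retired": {
        "l1_miss_loads": 0x01,
        "l2_hit_loads": 0x02,
        "l2_miss_loads": 0x04,
        "any_loads": 0x40,
        "any_stores": 0x80,
    },
    "machine_clears": {"smc": 0x01, "memory_ordering": 0x02, "all": 0x08},
    "l2_requests": {"miss": 0x41, "reference": 0x4F},
}


def event_spec(event, mask_name, count=100000, kernel=True, user=True):
    """Build a 'name:count:unitmask:kernel:user' string (assumed layout).

    The count default is an arbitrary placeholder, not a recommendation.
    Raises KeyError if the event or mask name is not in UNIT_MASKS.
    """
    mask = UNIT_MASKS[event][mask_name]
    return f"{event}:{count}:{mask:#04x}:{int(kernel)}:{int(user)}"


def miss_ratio(misses, references):
    """Return misses / references, or 0.0 when there were no references."""
    return misses / references if references else 0.0


if __name__ == "__main__":
    # Specs for the two l2_requests masks listed in the table.
    print(event_spec("l2_requests", "miss"))
    print(event_spec("l2_requests", "reference"))
    # Hypothetical counts, only to show the arithmetic.
    print(miss_ratio(misses=250, references=1000))
```

Events such as page_walks and fetch_stall list the same mask value under two different names, so a numeric mask alone does not identify the variant; for those, selecting the unit mask by name is likely the safer choice.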
"Measurement is a crucial component of performance improvement since reasoning and intuition are fallible guides and must be supplemented with tools like timing commands and profilers."
- The Practice of Programming, Brian W. Kernighan and Rob Pike