
Intel Goldmont Plus Microarchitecture events

This is a list of all Intel Goldmont Plus Microarchitecture performance counter event types. Please see Intel Architecture Developer's Manual Volume 3B, Appendix A and Intel Architecture Optimization Reference Manual (730795-001).
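
As a rough illustration of how an entry from this list might be selected for profiling, the sketch below builds an event specification string of the form name:count[:unitmask], assuming that is the form accepted by operf's --events option. The helper name event_spec and the sample count of 100000 are hypothetical and do not come from this page; consult ophelp for the counts permitted on a given system.

```python
def event_spec(name, count, unit_mask=None):
    """Return an event spec string 'name:count[:unitmask]'.

    unit_mask may be an int (formatted as hex) or a unit mask name
    such as 'ld_splits'. This helper is an illustrative assumption,
    not part of any profiling tool.
    """
    if count <= 0:
        raise ValueError("count must be positive")
    spec = f"{name}:{count}"
    if unit_mask is not None:
        if isinstance(unit_mask, int):
            mask = f"0x{unit_mask:02x}"
        else:
            mask = unit_mask
        spec += f":{mask}"
    return spec


# Hypothetical sample count; choose one suited to the workload.
print(event_spec("mem_uops_retired", 100000, 0x04))
print(event_spec("INST_RETIRED", 100000))
```

The resulting strings could then be passed to the profiler, for example as the argument of --events, again assuming that syntax applies to the installed version.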

Columns: Name, Description, Counters usable, Unit mask options
CPU_CLK_UNHALTED Clock cycles when not halted all
UNHALTED_REFERENCE_CYCLES Unhalted reference cycles all 0x01: No unit mask
INST_RETIRED number of instructions retired all
LLC_MISSES Last level cache demand requests from this core that missed the LLC all 0x41: No unit mask
LLC_REFS Last level cache demand requests from this core all 0x4f: No unit mask
BR_INST_RETIRED number of branch instructions retired all
BR_MISS_PRED_RETIRED number of mispredicted branches retired (precise) all
recycleq Counts the number of retired load or store micro-ops that get pushed into the Recycle Queue all 0x01: (name=ld_block_st_forward) Counts the number of occurrences a retired load gets blocked because its address partially overlaps with a store.
0x02: (name=ld_block_std_notready) Counts the number of occurrences a retired load gets blocked because its address overlaps with a store whose data is not ready.
0x04: (name=st_splits) Counts the number of retired stores that are cache line splits. Each split should be counted only once.
0x08: (name=ld_splits) Counts the number of retired loads that are cache line splits. Each split should be counted only once.
0x10: (name=lock) Counts all the retired locked loads. It does not include stores because we would double count if we count stores.
0x20: (name=sta_full) Counts the store micro-ops retired that were pushed into the recycle queue because the store address buffer is full.
0x40: (name=any_ld) Counts any retired load that was pushed into the recycle queue for any reason.
0x80: (name=any_st) Counts any retired store that was pushed into the recycle queue for any reason.
mem_uops_retired Counts the number of memory micro-ops retired. all 0x01: (name=l1_miss_loads) Counts the number of load micro-ops retired that miss in L1 D cache.
0x02: (name=l2_hit_loads) Counts the number of load micro-ops retired that hit in the L2.
0x04: (name=l2_miss_loads) Counts the number of load micro-ops retired that miss in the L2.
0x08: (name=dtlb_miss_loads) Counts the number of load micro-ops retired that cause a DTLB miss.
0x10: (name=utlb_miss_loads) Counts the number of load micro-ops retired that caused micro TLB miss.
0x20: (name=hitm) Counts the loads retired that get the data from the other core in the same tile in M state.
0x40: (name=any_loads) Counts all the load micro-ops retired.
0x80: (name=any_stores) Counts all the store micro-ops retired.
page_walks Counts the number of core cycles for page walks all 0x01: (name=d_side_walks) Counts the total D-side page walks that are completed or started. The page walks started in the speculative path will also be counted.
0x01: (name=d_side_cycles) Counts the total number of core cycles for all the D-side page walks. The cycles for page walks started in speculative path will also be included.
0x02: (name=i_side_walks) Counts the total I-side page walks that are completed.
0x02: (name=i_side_cycles) Counts the total number of core cycles for all the I-side page walks. The cycles for page walks started in speculative path will also be included.
0x03: (name=walks) Counts the total page walks completed (I-side and D-side)
0x03: (name=cycles) Counts the total number of core cycles for all the page walks. The cycles for page walks started in speculative path will also be included.
l2_requests_reject Counts the number of MEC requests from the L2Q that reference a cache line and were rejected. all 0x00: (name=all) Counts the number of MEC requests from the L2Q that reference a cache line, excluding SW prefetches filling only to L2 cache and L1 evictions (automatically excludes L2HWP, UC, WC), that were rejected. Multiple repeated rejects should be counted multiple times.
core_reject_l2q Number of requests not accepted into the L2Q because of any L2 queue reject condition. all 0x00: (name=all) Counts the number of MEC requests that were not accepted into the L2Q because of any L2 queue reject condition. There is no concept of at-ret here. It might include requests due to instructions in the speculative path.
icache Instruction fetches all 0x03: (name=accesses) All instruction fetches including uncacheable
0x01: (name=hits) All instruction fetches that hit instruction cache
0x02: (name=misses) All instruction fetches that missed instruction cache (produced a memory request); counted only once, not once per outstanding cycle
fetch_stall Counts the number of core cycles the instruction fetch pipe was stalled all 0x01: (name=icache_fill_pending_cycles) Counts the number of core cycles the fetch stalls because of an icache miss. This is a cumulative count of core cycles the fetch stalled for all icache misses.
0x01: (name=icache_fill_pending_edge) Counts the number of times the fetch stalls because of an icache miss.
l2_requests L2 cache requests all 0x41: (name=miss) Counts the total number of L2 cache misses.
0x4f: (name=reference) Counts the total number of L2 cache references.
uops_retired Retired uops all 0x01: (name=ms) Counts the number of uops retired that are from complex flows issued by the micro-sequencer
0x10: (name=all) Counts the number of uops retired
0x20: (name=scalar_simd) Counts the number of scalar SSE, AVX, AVX2, AVX-512 micro-ops except for loads (memory-to-register mov-type micro ops), division, sqrt.
0x40: (name=packed_simd) Counts the number of packed SSE, AVX, AVX2, AVX-512 micro-ops (both floating point and integer) except for loads (memory-to-register mov-type micro-ops), packed byte and word multiplies.
machine_clears Counts the number of times that the machine clears at retire. all 0x01: (name=smc) Counts the number of times that the machine clears due to program modifying data within 1K of a recently fetched code page.
0x02: (name=memory_ordering) Counts the number of times the machine clears due to memory ordering hazards.
0x04: (name=fp_assist) Counts the number of floating-point operations retired that required microcode assists.
0x08: (name=all) Counts all machine clears
br_inst_retired Counts the number of branch instructions retired all 0x00: (name=any) Counts the number of branch instructions retired
0x7e: (name=jcc) Counts the number of branch instructions retired that were conditional jumps.
0xfe: (name=taken_jcc) Counts the number of branch instructions retired that were conditional jumps and predicted taken.
0xf9: (name=call) Counts the number of near CALL branch instructions retired.
0xfd: (name=rel_call) Counts the number of near relative CALL branch instructions retired.
0xfb: (name=ind_call) Counts the number of near indirect CALL branch instructions retired.
0xf7: (name=return) Counts the number of near RET branch instructions retired.
0xeb: (name=non_return_ind) Counts the number of branch instructions retired that were near indirect CALL or near indirect JMP.
0xbf: (name=far_branch) Counts the number of far branch instructions retired.
br_misp_retired Counts the number of mispredicted branch instructions retired all 0x00: (name=any) All mispredicted branches
0x7e: (name=jcc) Number of mispredicted conditional branch instructions retired
0xfe: (name=taken_jcc) Number of mispredicted taken conditional branch instructions retired
0xf9: (name=call) Counts the number of mispredicted near CALL branch instructions retired.
0xfd: (name=rel_call) Counts the number of mispredicted near relative CALL branch instructions retired.
0xfb: (name=ind_call) Number of mispredicted indirect call branch instructions retired
0xf7: (name=return) Number of mispredicted return branch instructions retired
0xeb: (name=non_return_ind) Number of mispredicted non-return branch instructions retired
0xbf: (name=far_branch) Counts the number of mispredicted far branch instructions retired.
no_alloc_cycles Counts the number of core cycles when no micro-ops are allocated all 0x01: (name=rob_full) Counts the number of core cycles when no micro-ops are allocated and the ROB is full
0x02: (name=mispredicts) Counts the number of core cycles when no micro-ops are allocated and the alloc pipe is stalled waiting for a mispredicted branch to retire.
0x20: (name=rat_stall) Counts the number of core cycles when no micro-ops are allocated and a RATstall (caused by reservation station full) is asserted.
0x7f: (name=all) Counts the total number of core cycles when no micro-ops are allocated for any reason.
rs_full_stall Counts the number of core cycles when the allocate stalls because the required RS is full. all 0x01: (name=mec) Counts the number of core cycles when allocation pipeline is stalled and is waiting for a free MEC reservation station entry.
0x1f: (name=all) Counts the total number of core cycles the Alloc pipeline is stalled when any one of the reservation stations is full.
cycles_div_busy Number of core cycles when divider is busy all 0x01: (name=all) Counts the number of core cycles when the divider is busy; this does not imply a stall waiting for the divider.
baclears Counts the number of times Branch Target Buffer (BTB) prediction was corrected by a later branch predictor all 0x01: (name=all) Counts the number of times front-end resteers for any branch as a result of another branch handling mechanism in the front-end.
0x08: (name=return) Counts the number of times the front-end resteers for RET branches as a result of another branch handling mechanism in the front-end.
0x10: (name=cond) Counts the number of times the front-end resteers for conditional branches as a result of another branch handling mechanism in the front-end.
ms_decoded Microcode sequencer decode entrypoints all 0x01: (name=ms_entry) Counts the number of times the MSROM starts a flow of uops.
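
Raw counts for the events above are often easier to interpret once combined into ratios. The following is a minimal sketch of a few such derived metrics, assuming the counts have already been collected into a dictionary. The function names, the 'event.unit_mask_name' key convention, and all sample numbers are made up for illustration and are not part of any profiling tool.

```python
def ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derived_metrics(counts):
    """Compute a few ratios from a dict of raw event counts.

    Keys follow an 'event.unit_mask_name' convention chosen for this sketch.
    """
    return {
        # Share of L2 references that missed.
        "l2_miss_ratio": ratio(counts["l2_requests.miss"],
                               counts["l2_requests.reference"]),
        # Share of retired branches that were mispredicted.
        "branch_mispredict_ratio": ratio(counts["br_misp_retired.any"],
                                         counts["br_inst_retired.any"]),
        # Average core cycles spent per page walk.
        "cycles_per_page_walk": ratio(counts["page_walks.cycles"],
                                      counts["page_walks.walks"]),
        # Share of unhalted cycles in which no micro-ops were allocated.
        "no_alloc_fraction": ratio(counts["no_alloc_cycles.all"],
                                   counts["CPU_CLK_UNHALTED"]),
    }


# Made-up counts, for illustration only.
sample = {
    "l2_requests.miss": 500,
    "l2_requests.reference": 10000,
    "br_misp_retired.any": 200,
    "br_inst_retired.any": 8000,
    "page_walks.cycles": 3000,
    "page_walks.walks": 100,
    "no_alloc_cycles.all": 25000,
    "CPU_CLK_UNHALTED": 100000,
}

for name, value in derived_metrics(sample).items():
    print(f"{name}: {value:.3f}")
```

If the counts come from sampling, they are estimates, and any ratio built from them inherits that uncertainty.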
2020/07/20