spr-metrics.json - OpenGrok cross reference for /linux/tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json

Lines Matching +full:average +full:- +full:on
4         "MetricExpr": "cstate_core@c1\\-residency@ / TSC",
11         "MetricExpr": "cstate_pkg@c2\\-residency@ / TSC",
18         "MetricExpr": "cstate_core@c6\\-residency@ / TSC",
25         "MetricExpr": "cstate_pkg@c6\\-residency@ / TSC",
210 …"BriefDescription": "Average latency of a last level cache (LLC) demand data read miss (read memor…
216 …"BriefDescription": "Average latency of a last level cache (LLC) demand data read miss (read memor…
222 …"BriefDescription": "Average latency of a last level cache (LLC) demand data read miss (read memor…
228 …"BriefDescription": "Average latency of a last level cache (LLC) demand data read miss (read memor…
234 …"BriefDescription": "Average latency of a last level cache (LLC) demand data read miss (read memor…
312 …"BriefDescription": "Uops delivered from legacy decode pipeline (Micro-instruction Translation Eng…
343         "MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
363 …"BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on executi…
384 …-cases for operations that cannot be handled natively by the execution pipeline. For example; when…
398 …"MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-ret…
403 …-of-order scheduler dispatches ready uops into their respective execution units; and once complete…
409         "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
414 …s for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For…
418 …of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and B…
425 …"BriefDescription": "Total pipeline cost of instructions used for program control-flow - a subset …
430 …"PublicDescription": "Total pipeline cost of instructions used for program control-flow - a subset…
433 …"BriefDescription": "Total pipeline cost of external Memory- or Cache-Bandwidth related bottleneck…
438 …"PublicDescription": "Total pipeline cost of external Memory- or Cache-Bandwidth related bottlenec…
441 …"BriefDescription": "Total pipeline cost of external Memory- or Cache-Latency related bottlenecks",
446 …"PublicDescription": "Total pipeline cost of external Memory- or Cache-Latency related bottlenecks…
449 …     "BriefDescription": "Total pipeline cost when the execution is compute-bound - an estimation",
454 …ine cost when the execution is compute-bound - an estimation. Covers Core Bound when High ILP as w…
457 …tch bandwidth related bottlenecks (when the front-end could not sustain operations delivery to the…
458 …- (1 - 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispredicts) * tma_fetch_…
465 …"MetricExpr": "100 * ((1 - INST_RETIRED.REP_ITERATION / cpu@UOPS_RETIRED.MS\\,cmask\\=1@) * (tma_f…
469 …"PublicDescription": "Total pipeline cost of irregular execution (e.g. FP-assists in HPC, Wait tim…
472 …ription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
477 …"Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs). Related m…
481 …t_stores + tma_store_latency + tma_streaming_stores - tma_store_latency)) + tma_machine_clears * (…
489 …"MetricExpr": "100 * (1 - 10 * tma_microcode_sequencer * tma_other_mispredicts / tma_branch_mispre…
496         "BriefDescription": "Total pipeline cost of remaining bottlenecks in the back-end",
497 …"MetricExpr": "100 - (tma_bottleneck_big_code + tma_bottleneck_instruction_fetch_bw + tma_bottlene…
501 …aining bottlenecks in the back-end. Examples include data-dependencies (Core Bound when Low ILP) a…
504 …"BriefDescription": "Total pipeline cost of \"useful operations\" - the portion of Retiring catego…
505 … "100 * (tma_retiring - (BR_INST_RETIRED.ALL_BRANCHES + 2 * BR_INST_RETIRED.NEAR_CALL + INST_RETIR…
513 …"MetricExpr": "topdown\\-br\\-mispredict / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\…
518 …etched from an incorrectly speculated program path; or stalls when the out-of-order part of the ma…
527 … corrected path; following all sorts of mispredicted branches. For example; branchy code with lo…
531 … represents fraction of cycles the CPU was stalled due to staying in C0.1 power-performance optimized…
539 … represents fraction of cycles the CPU was stalled due to staying in C0.2 power-performance optimized…
548         "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)",
552 … as in the case of read-modify-write as an example. Since these instructions require multiple uops…
557 …"MetricExpr": "(1 - tma_branch_mispredicts / tma_bad_speculation) * INT_MISC.CLEAR_RESTEER_CYCLES …
566         "MetricExpr": "max(0, tma_icache_misses - tma_code_l2_miss)",
581 …e (first level) ITLB was missed by instructions fetches, that later on hit in second-level TLB (ST…
582         "MetricExpr": "max(0, tma_itlb_misses - tma_code_stlb_miss)",
589 …"BriefDescription": "This metric estimates the fraction of cycles where the Second-level TLB (STLB…
619 …ata written by one Logical Processor are read by another Logical Processor on a different Physical…
623 …"BriefDescription": "This metric represents fraction of slots where Core non-memory issues were of…
625         "MetricExpr": "max(0, tma_backend_bound - tma_memory_bound)",
630 …-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in s…
634 …n of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses",
636 …MEM_LOAD_L3_HIT_RETIRED.XSNP_NO_FWD + MEM_LOAD_L3_HIT_RETIRED.XSNP_FWD * (1 - OCR.DEMAND_DATA_RD.L…
640 … cycles while the memory subsystem was handling synchronizations due to data-sharing accesses. Dat…
644 …"BriefDescription": "This metric represents fraction of cycles where decoder-0 was the only active…
645 …"MetricExpr": "(cpu@INST_DECODED.DECODERS\\,cmask\\=1@ - cpu@INST_DECODED.DECODERS\\,cmask\\=2@) /…
649 …"PublicDescription": "This metric represents fraction of cycles where decoder-0 was the only activ…
662 …"BriefDescription": "This metric estimates how often the CPU was stalled on accesses to external m…
667 …"PublicDescription": "This metric estimates how often the CPU was stalled on accesses to external …
672         "MetricExpr": "(IDQ.DSB_CYCLES_ANY - IDQ.DSB_CYCLES_OK) / tma_info_core_core_clks / 2",
685 …o switches from DSB to MITE pipelines. The DSB (decoded i-cache) is a Uop Cache where the front-en…
690 …mask\\=1@ + DTLB_LOAD_MISSES.WALK_ACTIVE, max(CYCLE_ACTIVITY.CYCLES_MEM_ANY - MEMORY_ACTIVITY.CYCL…
694 …-aside Buffers) are processor caches for recently used entries out of the Page Tables that are use…
698 …: "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store …
703 …-level data TLB store misses.  As with ordinary data caching; focus on improving data locality and…
712 …a multithreading hiccup; where multiple Logical Processors contend on different data-elements mapp…
721 …the misses are satisfied from (metric values >1 are valid). Often it hints on approaching bandwidt…
727         "MetricExpr": "max(0, tma_frontend_bound - tma_fetch_latency)",
738 …MetricExpr": "topdown\\-fetch\\-lat / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-ret…
743 …he CPU was stalled due to Frontend latency issues.  For example; instruction-cache misses; iTLB mi…
748         "MetricExpr": "max(0, tma_heavy_operations - tma_microcode_sequencer)",
752 …tiring instructions that are decoded into two or more uops. This highly correlates with the n…
756 …"BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations frac…
761 …-point (FP) operations fraction the CPU has executed (retired). Note this metric's value may excee…
770 …ts. FP Assist may apply when working with very small floating point values (so-called Denormals).",
774 …"BriefDescription": "This metric represents fraction of cycles where the Floating-Point Divider un…
782 …"BriefDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction …
787 …"PublicDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction…
791 …"BriefDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction …
796 …"PublicDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction…
800 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors",
805 … approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors. May…
809 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors",
814 … approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors. May…
818 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 512-bit wide vectors",
823 … approximates arithmetic FP vector uops fraction the CPU has retired for 512-bit wide vectors. May…
829 …"MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-ret…
834 …on by the Backend part. Within the Frontend; a branch predictor predicts the next address to fetch…
838 …represents fraction of slots where the CPU was retiring fused instructions -- where one uop can re…
843 …represents fraction of slots where the CPU was retiring fused instructions -- where one uop can re…
847 … slots where the CPU was retiring heavy-weight operations -- instructions that require two or more…
849 …"MetricExpr": "topdown\\-heavy\\-ops / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-re…
854 …he CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro…
867 …Misprediction Cost: Cycles representing fraction of TMA slots wasted per non-speculative branch mi…
871 …Misprediction Cost: Cycles representing fraction of TMA slots wasted per non-speculative branch mi…
874 …"BriefDescription": "Instructions per retired Mispredicts for conditional non-taken branches (lowe…
902 …"BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lo…
915 …      "BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
916 …"MetricExpr": "(100 * (1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_ut…
922 …"BriefDescription": "Total pipeline cost of DSB (uop cache) hits - subset of the Instruction_Fetch…
927 …"PublicDescription": "Total pipeline cost of DSB (uop cache) hits - subset of the Instruction_Fetc…
930 …"BriefDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fet…
935 …"PublicDescription": "Total pipeline cost of DSB (uop cache) misses - subset of the Instruction_Fe…
938 …"BriefDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bott…
943 …"PublicDescription": "Total pipeline cost of Instruction Cache misses - subset of the Big_Code Bot…
952         "BriefDescription": "Fraction of branches that are non-taken conditionals",
965 …"MetricExpr": "(BR_INST_RETIRED.NEAR_TAKEN - BR_INST_RETIRED.COND_TAKEN - 2 * BR_INST_RETIRED.NEAR…
971 …"MetricExpr": "1 - (tma_info_branches_cond_nt + tma_info_branches_cond_tk + tma_info_branches_call…
976 …"BriefDescription": "Core actual clocks when any Logical Processor is active on the Physical Core",
982         "BriefDescription": "Instructions Per Cycle across hyper-threads (per physical core)",
1000 …BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardles…
1004 …-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width…
1007 …efDescription": "Instruction-Level-Parallelism (average number of uops executed when there is exec…
1021 …"BriefDescription": "Average number of cycles of a switch from the DSB fetch-unit to MITE fetch un…
1027         "BriefDescription": "Average number of Uops issued by front-end when it issued something",
1033         "BriefDescription": "Average Latency for L1 instruction cache misses",
1039 …"BriefDescription": "Instructions per non-speculative DSB miss (lower number means higher occurren…
1070 …"BriefDescription": "Average number of cycles the front-end was delayed due to an Unknown Branch d…
1074 …"PublicDescription": "Average number of cycles the front-end was delayed due to an Unknown Branch …
1098 …"BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number mean…
1103 …"PublicDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number mea…
1106 …"BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means h…
1111 …"PublicDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means …
1114 …"BriefDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means hi…
1119 …"PublicDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means h…
1122 …"BriefDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower num…
1127 …"PublicDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower nu…
1130 …"BriefDescription": "Instructions per FP Arithmetic Scalar Half-Precision instruction (lower numbe…
1135 …"PublicDescription": "Instructions per FP Arithmetic Scalar Half-Precision instruction (lower numb…
1138 …"BriefDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower num…
1143 …"PublicDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower nu…
1202         "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
1208         "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
1226         "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
1232         "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
1238 … instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)",
1244 …      "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
1262         "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
1269         "MetricExpr": "1e3 * (L2_RQSTS.REFERENCES - L2_RQSTS.MISS) / INST_RETIRED.ANY",
1304         "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]",
1310         "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
1322         "BriefDescription": "Average Parallel L2 cache miss data reads",
1328         "BriefDescription": "Average Latency for L2 cache miss demand Loads",
1334         "BriefDescription": "Average Parallel L2 cache miss demand Loads",
1340         "BriefDescription": "Average Latency for L3 cache miss demand Loads",
1346 …"BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core…
1358         "BriefDescription": "Off-core accesses per kilo instruction for modified write requests",
1364 …"BriefDescription": "Off-core accesses per kilo instruction for reads-to-core requests (speculativ…
1370 …tion": "L3 cache misses per kilo instruction for reads-to-core requests (speculative; including in…
1376         "BriefDescription": "Un-cacheable retired load per kilo instruction",
1382 …"BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand loads when there is…
1386 …ublicDescription": "Memory-Level-Parallelism (average number of L1 miss demand loads when there is …
1396 …"BriefDescription": "Average DRAM BW for Reads-to-Core (R2C) covering for memory attached to local…
1400 …"PublicDescription": "Average DRAM BW for Reads-to-Core (R2C) covering for memory attached to loca…
1403         "BriefDescription": "Average L3-cache miss BW for Reads-to-Core (R2C)",
1407 …"PublicDescription": "Average L3-cache miss BW for Reads-to-Core (R2C). This covering going to DRA…
1410         "BriefDescription": "Average Off-core access BW for Reads-to-Core (R2C)",
1414 …"PublicDescription": "Average Off-core access BW for Reads-to-Core (R2C). R2C account for demand o…
1417 … level TLB) code speculative misses per kilo instruction (misses of any page-size that complete th…
1423 …l TLB) data load speculative misses per kilo instruction (misses of any page-size that complete th…
1436 … TLB) data store speculative misses per kilo instruction (misses of any page-size that complete th…
1448         "BriefDescription": "Average number of uops fetched from DSB per cycle",
1454         "BriefDescription": "Average number of uops fetched from MITE per cycle",
1468 …"BriefDescription": "Average number of Uops retired in cycles where at least one uop has retired.",
1474 …    "BriefDescription": "Estimated fraction of retirement-cycles dealing with repeat instructions",
1481 …et unhalted; covering legacy PAUSE instruction, as well as C0.1 / C0.2 power-performance optimized…
1488         "BriefDescription": "Measured Average Core Frequency for unhalted processors [GHz]",
1494         "BriefDescription": "Average CPU Utilization (percentage)",
1500         "BriefDescription": "Average number of utilized CPUs",
1506         "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]",
1510 …"PublicDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]. Relat…
1517 …egate across all supported options of: FP precisions, scalar and vector instructions, vector-width"
1520         "BriefDescription": "Average IO (network or disk) Bandwidth Use for Reads [GB / sec]",
1524 …"PublicDescription": "Average IO (network or disk) Bandwidth Use for Reads [GB / sec]. Bandwidth o…
1527         "BriefDescription": "Average IO (network or disk) Bandwidth Use for Writes [GB / sec]",
1531 …"PublicDescription": "Average IO (network or disk) Bandwidth Use for Writes [GB / sec]. Bandwidth …
1554 …"BriefDescription": "Average latency of data read request to external DRAM memory [in nanoseconds]…
1558 …cDescription": "Average latency of data read request to external DRAM memory [in nanoseconds]. Acc…
1568         "BriefDescription": "Average number of parallel data read requests to external memory",
1572 …"PublicDescription": "Average number of parallel data read requests to external memory. Accounts f…
1575 …    "BriefDescription": "Average latency of data read request to external memory (in nanoseconds)",
1580 …tion": "Average latency of data read request to external memory (in nanoseconds). Accounts for dem…
1591 …   "MetricExpr": "(power@energy\\-pkg@ * 61 + 15.6 * power@energy\\-ram@) / (duration_time * 1e6)",
1597 …"MetricExpr": "(1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_DISTRIBUTED if #SMT_…
1602         "BriefDescription": "Socket actual clocks when any core is active on that socket",
1615         "BriefDescription": "Average Frequency Utilization relative to nominal frequency",
1621         "BriefDescription": "Measured Average Uncore Frequency for the SoC [GHz]",
1627 …"BriefDescription": "Cross-socket Ultra Path Interconnect (UPI) data transmit bandwidth for data o…
1633 …   "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
1645         "BriefDescription": "The ratio of Executed- by Issued-Uops",
1649 …"PublicDescription": "The ratio of Executed- by Issued-Uops. Ratio > 1 suggests high rate of uop m…
1658 …"BriefDescription": "Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor…
1664 …    "BriefDescription": "Fraction of Physical Core issue-slots utilized by this Logical Processor",
1685         "MetricExpr": "tma_divider - tma_fp_divider",
1701 …"BriefDescription": "This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI (Vector Neu…
1706 …"PublicDescription": "This metric represents 128-bit vector Integer ADD/SUB/SAD or VNNI (Vector Ne…
1710 …"BriefDescription": "This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI (Vector…
1715 …"PublicDescription": "This metric represents 256-bit vector Integer ADD/SUB/SAD/MUL or VNNI (Vecto…
1729 …"MetricExpr": "max((EXE_ACTIVITY.BOUND_ON_LOADS - MEMORY_ACTIVITY.STALLS_L1D_MISS) / tma_info_thre…
1733 …on older stores; a load might suffer due to high latency even though it is being satisfied by the …
1738 …EM_INST_RETIRED.ALL_LOADS - MEM_LOAD_RETIRED.FB_HIT - MEM_LOAD_RETIRED.L1_MISS) * 20 / 100, max(CY…
1742 … the L1D cache. The short latency of the L1D cache may be exposed in pointer-chasing memory access…
1747 …"MetricExpr": "(MEMORY_ACTIVITY.STALLS_L1D_MISS - MEMORY_ACTIVITY.STALLS_L2_MISS) / tma_info_threa…
1766 …"MetricExpr": "(MEMORY_ACTIVITY.STALLS_L2_MISS - MEMORY_ACTIVITY.STALLS_L3_MISS) / tma_info_thread…
1793 …slots where the CPU was retiring light-weight operations -- instructions that require no more than…
1795         "MetricExpr": "max(0, tma_retiring - tma_heavy_operations)",
1800 …-weight operations -- instructions that require no more than one uop (micro-operation). This corre…
1804 …"BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on executi…
1809 …"PublicDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execut…
1813 …here the (first level) DTLB was missed by load accesses, that later on hit in second-level TLB (ST…
1814         "MetricExpr": "tma_dtlb_load - tma_load_stlb_miss",
1821 …"BriefDescription": "This metric estimates the fraction of cycles where the Second-level TLB (STLB…
1863 …"MetricExpr": "(16 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS.ALL_RFO) + MEM_INST_RETIRED.LOC…
1873         "MetricExpr": "max(0, tma_bad_speculation - tma_branch_mispredicts)",
1878 …-of-order portion of the machine needs to recover its state after the clear. For example; this can…
1890 …as likely hurt due to approaching bandwidth limits of external memory - DRAM ([SPR-HBM] and/or HBM…
1895 …- DRAM ([SPR-HBM] and/or HBM).  The underlying heuristic assumes that a similar off-core traffic i…
1899 …e the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM…
1900 …EAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD) / tma_info_thread_clks - tma_mem_bandwidth",
1904 …e the performance was likely hurt due to latency from external memory - DRAM ([SPR-HBM] and/or HBM…
1910 …"MetricExpr": "topdown\\-mem\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-re…
1915 …-completed in-flight memory demand loads which coincides with execution units starvation; in addit…
1928 … represents fraction of slots where the CPU was retiring memory operations -- uops for memory load…
1955         "MetricExpr": "(IDQ.MITE_CYCLES_ANY - IDQ.MITE_CYCLES_OK) / tma_info_core_core_clks / 2",
1959 …the legacy decode pipeline). This pipeline is used for code that was not pre-cached in the DSB or …
1963 …n terms of percentage of ([SKL+] injected blend uops out of all Uops Issued -- the Count Domain; [A…
1968 …n terms of percentage of ([SKL+] injected blend uops out of all Uops Issued -- the Count Domain; [A…
1972 …es in which CPU was likely limited due to the Microcode Sequencer (MS) unit - see Microcode_Sequen…
1985 … Commonly used instructions are optimized for delivery by the DSB (decoded i-cache) or MITE (legac…
1990 …"MetricExpr": "tma_light_operations * (BR_INST_RETIRED.ALL_BRANCHES - INST_RETIRED.MACRO_FUSED) / …
1994 …lots where the CPU was retiring branch instructions that were not fused. Non-conditional branches …
2003 …o op) instructions. Compilers often use NOPs for certain address alignments - e.g. start address o…
2007 …is metric represents the remaining light uops fraction the CPU has executed - remaining means not …
2008 …"MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_int_operations + tma_memory_opera…
2012 …is metric represents the remaining light uops fraction the CPU has executed - remaining means not …
2016 …action of slots the CPU was stalled due to other cases of misprediction (non-retired x86 branches …
2017 …"MetricExpr": "max(tma_branch_mispredicts * (1 - BR_MISP_RETIRED.ALL_BRANCHES / (INT_MISC.CLEARS_C…
2025 …"MetricExpr": "max(tma_machine_clears * (1 - MACHINE_CLEARS.MEMORY_ORDERING / MACHINE_CLEARS.COUNT…
2037 …PU retired uops as a result of handling Page Faults. A Page Fault may apply on first application ac…
2041 …"BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on executi…
2046 …"PublicDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execut…
2050 …"BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on executi…
2055 …"PublicDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execut…
2059 …"BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on executi…
2064 …"PublicDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execut…
2068 … the CPU performance was potentially limited due to Core computation issues (non divider-related)",
2069 …)) / tma_info_thread_clks if ARITH.DIV_ACTIVE < CYCLE_ACTIVITY.STALLS_TOTAL - EXE_ACTIVITY.BOUND_O…
2073 …-related).  Two distinct categories can be attributed into this metric: (1) heavy data-dependency …
2077 …"BriefDescription": "This metric represents fraction of cycles CPU executed no uops on any executi…
2078 …0_PORTS + max(RS.EMPTY_RESOURCE - RESOURCE_STALLS.SCOREBOARD, 0)) / tma_info_thread_clks * (CYCLE_…
2082 …cycles CPU executed no uops on any execution port (Logical Processor cycles since ICL, Physical Co…
2086 …resents fraction of cycles where the CPU executed total of 1 uop per cycle on all execution ports …
2091 …on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise). This …
2095 …etric represents fraction of cycles CPU executed total of 2 uops per cycle on all execution ports …
2101 …on all execution ports (Logical Processor cycles since ICL, Physical Core cycles otherwise).  Loop…
2105 …presents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports …
2111 …presents fraction of cycles CPU executed total of 3 or more uops per cycle on all execution ports …
2120 …r sockets including synchronization issues. This is often caused by non-optimal NUMA allocati…
2129 …ystem was handling loads from remote memory. This is often caused by non-optimal NUMA allocati…
2135 …"MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retir…
2140 …ions-per-cycle (see IPC metric). Note that a high Retiring value does not necessarily mean there is …
2144 …"BriefDescription": "This metric represents fraction of cycles the CPU issue-pipeline was stalled …
2149 …ycles the CPU issue-pipeline was stalled due to serializing operations. Instructions like CPUID; W…
2153 …sents fraction of slots where the CPU was retiring Shuffle operations of 256-bit vector size (FP o…
2158 …sents fraction of slots where the CPU was retiring Shuffle operations of 256-bit vector size (FP o…
2172 … estimates fraction of cycles handling memory load split accesses - loads that cross 64-byte cache …
2177 … estimates fraction of cycles handling memory load split accesses - loads that cross 64-byte cache …
2186 …resents rate of split store accesses.  Consider aligning your data to the 64-byte cache line granu…
2190 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
2195 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
2199 … CPU was stalled due to RFO store memory accesses; RFO stores issue a read-for-ownership request b…
2204 …ses; RFO stores issue a read-for-ownership request before the write. Even though store accesses do …
2213 …perations in the pipeline; a load can avoid waiting for memory if a prior in-flight store is writi…
2218 …xpr": "(MEM_STORE_RETIRED.L2_HIT * 10 * (1 - MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_ST…
2222 …-of-order core performance; however; holding resources for a longer time can lead to undesired imp…
2226 …"BriefDescription": "This metric represents Core fraction of cycles CPU dispatched uops on executi…
2231 …"PublicDescription": "This metric represents Core fraction of cycles CPU dispatched uops on execut…
2235 …tion of cycles where the TLB was missed by store accesses, hitting in the second-level TLB (STLB)",
2236         "MetricExpr": "tma_dtlb_store - tma_store_stlb_miss",
2280 …uired by RFO stores. Even though store accesses do not typically stall out-of-order CPUs; there ar…
2303         "MetricExpr": "(max(cycles\\-t - cycles\\-ct, 0) / cycles if has_event(cycles\\-t) else 0)",
2310         "MetricExpr": "(cycles\\-t / tx\\-start if has_event(cycles\\-t) else 0)",
2317         "MetricExpr": "(cycles\\-t / cycles if has_event(cycles\\-t) else 0)",
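
Several of the complete MetricExpr entries listed above follow a small set of arithmetic patterns: a clamped difference (`max(0, a - b)`, as at lines 625 and 1795), a clamped remainder (line 409), a per-kilo-instruction rate (line 1269), and a guard on event availability (line 2317). The Python sketch below mirrors those four patterns for illustration only. The function names and the sample counter values are hypothetical, and perf evaluates these expressions with its own parser rather than with code like this.

```python
def clamped_diff(parent: float, child: float) -> float:
    """Mirror the 'max(0, parent - child)' pattern, e.g. line 625."""
    return max(0.0, parent - child)


def clamped_remainder(frontend: float, backend: float, retiring: float) -> float:
    """Mirror 'max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)'."""
    return max(1.0 - (frontend + backend + retiring), 0.0)


def per_kilo_instruction(references: int, misses: int, instructions: int) -> float:
    """Mirror '1e3 * (L2_RQSTS.REFERENCES - L2_RQSTS.MISS) / INST_RETIRED.ANY'."""
    return 1e3 * (references - misses) / instructions


def guarded_ratio(cycles_t: int, cycles: int, has_cycles_t: bool) -> float:
    """Mirror '(cycles-t / cycles if has_event(cycles-t) else 0)'.

    has_cycles_t stands in for perf's has_event() check (an assumption here).
    """
    return cycles_t / cycles if has_cycles_t else 0.0


if __name__ == "__main__":
    # Hypothetical sample values, not measurements.
    print(clamped_diff(0.30, 0.50))
    print(clamped_remainder(0.20, 0.40, 0.30))
    print(per_kilo_instruction(5_000, 1_000, 2_000_000))
    print(guarded_ratio(50, 200, True))
```

The clamp matters because the operands are measured independently, so a child metric can slightly exceed its parent; without `max(0, ...)` the derived value could go negative.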