Using TopDown metrics
---------------------

TopDown metrics break apart performance bottlenecks. Starting at level
1 it is typical to get metrics on retiring, bad speculation, frontend
bound, and backend bound. Higher levels provide more detail into the
level 1 bottlenecks, such as at level 2: core bound, memory bound,
heavy operations, light operations, branch mispredicts, machine
clears, fetch latency and fetch bandwidth. For more details see [1][2][3].

perf stat --topdown implements this using available metrics that vary
per architecture.

% perf stat -a --topdown -I1000
#           time      %  tma_retiring %  tma_backend_bound %  tma_frontend_bound %  tma_bad_speculation
     1.001141351                 11.5                 34.9                  46.9                    6.7
     2.006141972                 13.4                 28.1                  50.4                    8.1
     3.010162040                 12.9                 28.1                  51.1                    8.0
     4.014009311                 12.5                 28.6                  51.8                    7.2
     5.017838554                 11.8                 33.0                  48.0                    7.2
     5.704818971                 14.0                 27.5                  51.3                    7.3
...

New Topdown features in Intel Ice Lake
======================================

With Ice Lake CPUs the TopDown metrics are directly available as
fixed counters and do not require generic counters. This allows
TopDown to always be collected in addition to other events.

Using TopDown through RDPMC in applications on Intel Ice Lake
=============================================================

For more fine grained measurements it can be useful to
access the new counters directly from user space. This is more complicated,
but drastically lowers overhead.

On Ice Lake, there is a new fixed counter 3: SLOTS, which reports
"pipeline SLOTS" (cycles multiplied by core issue width) and a
metric register that reports slots ratios for the different bottleneck
categories.

The metrics counter is CPU model specific and is not available on older
CPUs.

Example code
============

Library functions implementing the functionality described below
are also available in libjevents [4].

The application opens a group with fixed counter 3 (SLOTS) and any
metric event, and allows user programs to read the performance counters.

Fixed counter 3 is mapped to a pseudo event event=0x00, umask=0x04,
so the perf_event_attr structure should be initialized with
{ .config = 0x0400, .type = PERF_TYPE_RAW }.
The metric events are mapped to the pseudo event event=0x00, umask=0x8X.
For example, the perf_event_attr structure can be initialized with
{ .config = 0x8000, .type = PERF_TYPE_RAW } for the Retiring metric event.
Fixed counter 3 must be the leader of the group.
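
The two config values above are consistent with the usual x86 raw event
layout, with the event code in bits 0-7 and the umask in bits 8-15. A minimal
sketch of that packing follows; the helper name topdown_raw_config() is
hypothetical and not part of any perf API.

```c
#include <stdint.h>

/* Hypothetical helper: pack an event code and umask into a raw config. */
static inline uint64_t topdown_raw_config(uint8_t event, uint8_t umask)
{
	/* event code in bits 0-7, umask in bits 8-15 */
	return ((uint64_t)umask << 8) | event;
}
```

With this helper, topdown_raw_config(0x00, 0x04) gives 0x0400 for SLOTS and
topdown_raw_config(0x00, 0x80) gives 0x8000 for the Retiring metric event.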

#include <linux/perf_event.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Provide own perf_event_open stub because glibc doesn't */
__attribute__((weak))
int perf_event_open(struct perf_event_attr *attr, pid_t pid,
		    int cpu, int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Open slots counter file descriptor for current task. */
struct perf_event_attr slots = {
	.type = PERF_TYPE_RAW,
	.size = sizeof(struct perf_event_attr),
	.config = 0x400,
	.exclude_kernel = 1,
};

int slots_fd = perf_event_open(&slots, 0, -1, -1, 0);
if (slots_fd < 0)
	... error ...

/* Memory mapping the fd permits _rdpmc calls from userspace */
void *slots_p = mmap(0, getpagesize(), PROT_READ, MAP_SHARED, slots_fd, 0);
if (slots_p == MAP_FAILED)	/* mmap reports failure as MAP_FAILED, not NULL */
	... error ...

/*
 * Open metrics event file descriptor for current task.
 * Set slots event as the leader of the group.
 */
struct perf_event_attr metrics = {
	.type = PERF_TYPE_RAW,
	.size = sizeof(struct perf_event_attr),
	.config = 0x8000,
	.exclude_kernel = 1,
};

int metrics_fd = perf_event_open(&metrics, 0, -1, slots_fd, 0);
if (metrics_fd < 0)
	... error ...

/* Memory mapping the fd permits _rdpmc calls from userspace */
void *metrics_p = mmap(0, getpagesize(), PROT_READ, MAP_SHARED, metrics_fd, 0);
if (metrics_p == MAP_FAILED)	/* mmap reports failure as MAP_FAILED, not NULL */
	... error ...

Note: the file descriptors returned by the perf_event_open calls must be memory
mapped to permit use of the RDPMC instruction. Permission may also be granted
by writing the /sys/devices/cpu/rdpmc sysfs node.

The RDPMC instruction (or _rdpmc compiler intrinsic) can now be used
to read slots and the topdown metrics at different points of the program:

#include <stdint.h>
#include <x86intrin.h>

#define RDPMC_FIXED	(1 << 30)	/* return fixed counters */
#define RDPMC_METRIC	(1 << 29)	/* return metric counters */

#define FIXED_COUNTER_SLOTS		3
#define METRIC_COUNTER_TOPDOWN_L1_L2	0

static inline uint64_t read_slots(void)
{
	return _rdpmc(RDPMC_FIXED | FIXED_COUNTER_SLOTS);
}

static inline uint64_t read_metrics(void)
{
	return _rdpmc(RDPMC_METRIC | METRIC_COUNTER_TOPDOWN_L1_L2);
}

Then the program can be instrumented to read these metrics at different
points.

It's not a good idea to do this with code regions that are too short,
as the parallelism and overlap in the CPU program execution will
cause too much measurement inaccuracy. For example, instrumenting
individual basic blocks is definitely too fine grained.

_rdpmc calls should not be mixed with reading the metrics and slots counters
through system calls, as the kernel will reset these counters after each system
call.

Decoding metrics values
=======================

The value reported by read_metrics() contains four 8-bit fields,
each a scaled ratio that represents one Level 1 bottleneck category.
All four fields add up to 0xff (= 100%).

The binary ratios in the metric value can be converted to float ratios:

#define GET_METRIC(m, i) (((m) >> ((i) * 8)) & 0xff)

/* L1 Topdown metric events */
#define TOPDOWN_RETIRING(val)	((float)GET_METRIC(val, 0) / 0xff)
#define TOPDOWN_BAD_SPEC(val)	((float)GET_METRIC(val, 1) / 0xff)
#define TOPDOWN_FE_BOUND(val)	((float)GET_METRIC(val, 2) / 0xff)
#define TOPDOWN_BE_BOUND(val)	((float)GET_METRIC(val, 3) / 0xff)

/*
 * L2 Topdown metric events.
 * Available on Sapphire Rapids and later platforms.
 */
#define TOPDOWN_HEAVY_OPS(val)		((float)GET_METRIC(val, 4) / 0xff)
#define TOPDOWN_BR_MISPREDICT(val)	((float)GET_METRIC(val, 5) / 0xff)
#define TOPDOWN_FETCH_LAT(val)		((float)GET_METRIC(val, 6) / 0xff)
#define TOPDOWN_MEM_BOUND(val)		((float)GET_METRIC(val, 7) / 0xff)

and then converted to percent for printing.
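
As an illustration of the decoding step, the following sketch unpacks the
four level 1 fields of one metrics value into percentages. The struct
topdown_l1_pct and the function topdown_l1_decode() are hypothetical names
chosen for this example; only the field order is taken from the macros above.

```c
#include <stdint.h>

/* Extract the i-th 8-bit field from a metrics value. */
#define GET_METRIC(m, i) (((m) >> ((i) * 8)) & 0xff)

/* Hypothetical container for the four level 1 percentages. */
struct topdown_l1_pct {
	double retiring;
	double bad_spec;
	double fe_bound;
	double be_bound;
};

/* Convert each 8-bit field (0..0xff) into a percentage (0..100). */
static struct topdown_l1_pct topdown_l1_decode(uint64_t metrics)
{
	struct topdown_l1_pct p = {
		.retiring = 100.0 * GET_METRIC(metrics, 0) / 0xff,
		.bad_spec = 100.0 * GET_METRIC(metrics, 1) / 0xff,
		.fe_bound = 100.0 * GET_METRIC(metrics, 2) / 0xff,
		.be_bound = 100.0 * GET_METRIC(metrics, 3) / 0xff,
	};
	return p;
}
```

A caller would typically pass the value returned by read_metrics() and print
the four members with a "%.2f%%" format.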

The ratios in the metric accumulate for the time when the counter
is enabled. For measuring programs it is often useful to measure
specific sections. This requires computing deltas on the metrics.

This can be done by scaling the metrics with the slots counter
read at the same time.

Then it's possible to take deltas of these slots counts
measured at different points, and determine the metrics
for that time period.

	slots_a = read_slots();
	metric_a = read_metrics();

	... larger code region ...

	slots_b = read_slots();
	metric_b = read_metrics();

	/* compute scaled metrics for measurement a (units: slots * 0xff) */
	retiring_slots_a = GET_METRIC(metric_a, 0) * slots_a;
	bad_spec_slots_a = GET_METRIC(metric_a, 1) * slots_a;
	fe_bound_slots_a = GET_METRIC(metric_a, 2) * slots_a;
	be_bound_slots_a = GET_METRIC(metric_a, 3) * slots_a;

	/* compute delta scaled metrics between b and a */
	retiring_slots = GET_METRIC(metric_b, 0) * slots_b - retiring_slots_a;
	bad_spec_slots = GET_METRIC(metric_b, 1) * slots_b - bad_spec_slots_a;
	fe_bound_slots = GET_METRIC(metric_b, 2) * slots_b - fe_bound_slots_a;
	be_bound_slots = GET_METRIC(metric_b, 3) * slots_b - be_bound_slots_a;

Later the individual ratios of L1 metric events for the measurement period can
be recreated from these counts. The scaled counts still carry the 0xff factor
of the 8-bit fields, so it is divided out here.

	slots_delta = slots_b - slots_a;
	retiring_ratio = (float)retiring_slots / 0xff / slots_delta;
	bad_spec_ratio = (float)bad_spec_slots / 0xff / slots_delta;
	fe_bound_ratio = (float)fe_bound_slots / 0xff / slots_delta;
	be_bound_ratio = (float)be_bound_slots / 0xff / slots_delta;

	printf("Retiring %.2f%% Bad Speculation %.2f%% FE Bound %.2f%% BE Bound %.2f%%\n",
		retiring_ratio * 100.,
		bad_spec_ratio * 100.,
		fe_bound_ratio * 100.,
		be_bound_ratio * 100.);

The individual ratios of L2 metric events for the measurement period can be
recreated from L1 and L2 metric counters. (Available on Sapphire Rapids and
later platforms.)

	/* compute scaled metrics for measurement a (units: slots * 0xff) */
	heavy_ops_slots_a = GET_METRIC(metric_a, 4) * slots_a;
	br_mispredict_slots_a = GET_METRIC(metric_a, 5) * slots_a;
	fetch_lat_slots_a = GET_METRIC(metric_a, 6) * slots_a;
	mem_bound_slots_a = GET_METRIC(metric_a, 7) * slots_a;

	/* compute delta scaled metrics between b and a */
	heavy_ops_slots = GET_METRIC(metric_b, 4) * slots_b - heavy_ops_slots_a;
	br_mispredict_slots = GET_METRIC(metric_b, 5) * slots_b - br_mispredict_slots_a;
	fetch_lat_slots = GET_METRIC(metric_b, 6) * slots_b - fetch_lat_slots_a;
	mem_bound_slots = GET_METRIC(metric_b, 7) * slots_b - mem_bound_slots_a;

	slots_delta = slots_b - slots_a;
	heavy_ops_ratio = (float)heavy_ops_slots / 0xff / slots_delta;
	light_ops_ratio = retiring_ratio - heavy_ops_ratio;

	br_mispredict_ratio = (float)br_mispredict_slots / 0xff / slots_delta;
	machine_clears_ratio = bad_spec_ratio - br_mispredict_ratio;

	fetch_lat_ratio = (float)fetch_lat_slots / 0xff / slots_delta;
	fetch_bw_ratio = fe_bound_ratio - fetch_lat_ratio;

	mem_bound_ratio = (float)mem_bound_slots / 0xff / slots_delta;
	core_bound_ratio = be_bound_ratio - mem_bound_ratio;

	printf("Heavy Operations %.2f%% Light Operations %.2f%% "
	       "Branch Mispredict %.2f%% Machine Clears %.2f%% "
	       "Fetch Latency %.2f%% Fetch Bandwidth %.2f%% "
	       "Mem Bound %.2f%% Core Bound %.2f%%\n",
		heavy_ops_ratio * 100.,
		light_ops_ratio * 100.,
		br_mispredict_ratio * 100.,
		machine_clears_ratio * 100.,
		fetch_lat_ratio * 100.,
		fetch_bw_ratio * 100.,
		mem_bound_ratio * 100.,
		core_bound_ratio * 100.);
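
The delta computation above can be folded into one helper that works for any
of the eight fields. This is a sketch only: struct topdown_sample and
topdown_delta_ratio() are hypothetical names, the two samples are assumed to
come from read_slots() and read_metrics() called back to back, and signed
arithmetic is used on the assumption that 8-bit rounding can make a delta
slightly negative.

```c
#include <stdint.h>

/* Extract the i-th 8-bit field from a metrics value. */
#define GET_METRIC(m, i) (((m) >> ((i) * 8)) & 0xff)

/* Hypothetical snapshot: both counters read back to back. */
struct topdown_sample {
	uint64_t slots;
	uint64_t metrics;
};

/*
 * Ratio of metric field idx over the region between samples a and b.
 * Returns 0 when no slots elapsed, to avoid dividing by zero.
 */
static double topdown_delta_ratio(const struct topdown_sample *a,
				  const struct topdown_sample *b, int idx)
{
	int64_t slots_delta = (int64_t)(b->slots - a->slots);
	/* Scaled counts are in units of slots * 0xff. */
	int64_t scaled_a = (int64_t)(GET_METRIC(a->metrics, idx) * a->slots);
	int64_t scaled_b = (int64_t)(GET_METRIC(b->metrics, idx) * b->slots);

	if (slots_delta <= 0)
		return 0.0;
	/* Divide out the 0xff scale factor to get a ratio nominally in 0..1. */
	return (double)(scaled_b - scaled_a) / 0xff / (double)slots_delta;
}
```

Indexes 0 to 3 select the level 1 fields. Indexes 4 to 7 select the level 2
fields and are only meaningful on Sapphire Rapids and later platforms.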

Resetting metrics counters
==========================

Since the individual metrics are only 8 bits wide, they lose precision for
short regions over time: as slots accumulate, each step of the 8-bit
fraction covers more slots, so the contribution of a short region shrinks
below one step. So the counters need to be reset regularly.
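
As a rough illustration of the precision loss, assume each field is simply
the category's share of all slots accumulated since the last reset, rounded
to 1/255. Under that assumption the number of slots represented by one step
grows linearly with the accumulated total. The helper name
topdown_slots_per_step() is hypothetical.

```c
#include <stdint.h>

/* Slots represented by one step (1/255) of an 8-bit metric field. */
static inline uint64_t topdown_slots_per_step(uint64_t total_slots)
{
	return total_slots / 0xff;
}
```

For instance, after 2,550,000,000 accumulated slots one step covers
10,000,000 slots, so a region much shorter than that may not change any
field at all.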

When using the kernel perf API, the kernel resets on every read.
So as long as the reading is at reasonable intervals (every few
seconds), the precision is good.

When using perf stat it is recommended to always use the -I option,
with an interval no longer than a few seconds:

	perf stat -I 1000 --topdown ...

For user programs using RDPMC directly, the counter can
be reset explicitly using ioctl:

	ioctl(perf_fd, PERF_EVENT_IOC_RESET, 0);

This "opens" a new measurement period.

A program using RDPMC for TopDown should schedule such a reset
regularly, as in every few seconds.

Limits on Intel Ice Lake
========================

Four pseudo TopDown metric events are exposed to end users:
topdown-retiring, topdown-bad-spec, topdown-fe-bound and topdown-be-bound.
They can be used to collect the TopDown value under the following
rules:
- All the TopDown metric events must be in a group with the SLOTS event.
- The SLOTS event must be the leader of the group.
- The PERF_FORMAT_GROUP flag must be applied to each TopDown metric
  event.

The SLOTS event and the TopDown metric events can be counting members of
a sampling read group. Since the SLOTS event must be the leader of a TopDown
group, the second event of the group is the sampling event.
For example, perf record -e '{slots, $sampling_event, topdown-retiring}:S'

Extension on Intel Sapphire Rapids Server
=========================================

The metrics counter is extended to support TMA method level 2 metrics.
The lower half of the register is the TMA level 1 metrics (legacy).
The upper half is also divided into four 8-bit fields for the new level 2
metrics. Four more TopDown metric events are exposed to end users:
topdown-heavy-ops, topdown-br-mispredict, topdown-fetch-lat and
topdown-mem-bound.

Each of the new level 2 metrics in the upper half is a subset of the
corresponding level 1 metric in the lower half. Software can deduce the
other four level 2 metrics by subtracting corresponding metrics as below.

    Light_Operations = Retiring - Heavy_Operations
    Machine_Clears = Bad_Speculation - Branch_Mispredicts
    Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
    Core_Bound = Backend_Bound - Memory_Bound
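
The four subtractions can be sketched in C for a single metrics value. The
field indexes follow the decoding macros earlier in this document; the
METRIC_RATIO macro, struct topdown_l2_derived and topdown_l2_derive() are
hypothetical names introduced for this example.

```c
#include <stdint.h>

/* Extract the i-th 8-bit field and convert it to a ratio in 0..1. */
#define GET_METRIC(m, i) (((m) >> ((i) * 8)) & 0xff)
#define METRIC_RATIO(m, i) ((double)GET_METRIC(m, i) / 0xff)

/* Hypothetical container for the four deduced level 2 ratios. */
struct topdown_l2_derived {
	double light_ops;
	double machine_clears;
	double fetch_bw;
	double core_bound;
};

/* Subtract each level 2 field (4..7) from its level 1 parent (0..3). */
static struct topdown_l2_derived topdown_l2_derive(uint64_t m)
{
	struct topdown_l2_derived d = {
		.light_ops      = METRIC_RATIO(m, 0) - METRIC_RATIO(m, 4),
		.machine_clears = METRIC_RATIO(m, 1) - METRIC_RATIO(m, 5),
		.fetch_bw       = METRIC_RATIO(m, 2) - METRIC_RATIO(m, 6),
		.core_bound     = METRIC_RATIO(m, 3) - METRIC_RATIO(m, 7),
	};
	return d;
}
```

This operates on the accumulated value of one read. For a bounded code
region the same subtractions apply to the per-region ratios instead.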

TPEBS in TopDown
================

TPEBS (Timed PEBS) is one of the new Intel PMU features available since the
Granite Rapids microarchitecture. The TPEBS feature adds a 16-bit
retire_latency field in the Basic Info group of the PEBS record. It records the
core cycles from the retirement of the previous instruction to the retirement
of the current instruction.
Please refer to Section 8.4.1 of "Intel® Architecture Instruction Set Extensions
Programming Reference" for more details about this feature. Because this feature
extends the PEBS record, sampling with the weight option is required to get the
retire_latency value.

	perf record -e event_name -W ...

In the most recent release of TMA, some of the metric formulas begin to use
event retire_latency values on processors that support the TPEBS feature.
For previous generations that do not support TPEBS, the values are static and
predefined per processor family by the hardware architects. Due to the diversity
of workloads in execution environments, retire_latency values measured in real
time are more accurate. Therefore, new TMA metrics that use TPEBS will provide
more accurate performance analysis results.

To support TPEBS in TMA metrics, a new event modifier :R is added. Perf
captures the retire_latency value of the required events (events with :R in
the metric formula) with perf record. The retire_latency value is then used in
the metric calculation. Currently, this feature is supported through perf stat:

	perf stat -M metric_name --record-tpebs ...

[1] https://software.intel.com/en-us/top-down-microarchitecture-analysis-method-win
[2] https://sites.google.com/site/analysismethods/yasin-pubs
[3] https://perf.wiki.kernel.org/index.php/Top-Down_Analysis
[4] https://github.com/andikleen/pmu-tools/tree/master/jevents