perf-mem(1)
===========

NAME
----
perf-mem - Profile memory accesses

SYNOPSIS
--------
[verse]
'perf mem' [<options>] (record [<command>] | report)

DESCRIPTION
-----------
"perf mem record" runs a command and gathers memory operation data
from it into perf.data. Perf record options are accepted and are passed through.

"perf mem report" displays the result. It invokes perf report with the
right set of options to display a memory access profile. By default, loads
and stores are sampled. Use the -t option to limit sampling to loads or stores.

Note that on Intel systems the memory latency reported is the use-latency,
not the pure load (or store) latency. Use-latency includes any pipeline
queuing delays in addition to the memory subsystem latency.

On Arm64 this uses SPE to sample load and store operations, therefore hardware
and kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide.
Due to the statistical nature of SPE sampling, not every memory operation will
be sampled.

On AMD this uses the IBS Op PMU to sample load and store operations.

COMMON OPTIONS
--------------
-f::
--force::
	Don't do ownership validation

-t::
--type=<type>::
	Select the memory operation type: load or store (default: load,store)

-v::
--verbose::
	Be more verbose (show counter open errors, etc)

-p::
--phys-data::
	Record/Report sample physical addresses

--data-page-size::
	Record/Report sample data address page size

RECORD OPTIONS
--------------
<command>...::
	Any command you can specify in a shell.

-e::
--event <event>::
	Event selector. Use 'perf mem record -e list' to list available events.

-K::
--all-kernel::
	Configure all used events to run in kernel space.

-U::
--all-user::
	Configure all used events to run in user space.

--ldlat <n>::
	Specify the desired latency for load events. Supported on Intel, Arm64 and
	some AMD processors. Ignored on other architectures.

	On supported AMD processors:
	- The /sys/bus/event_source/devices/ibs_op/caps/ldlat file contains '1'.
	- Supported latency values are 128 to 2048 (both inclusive).
	- A latency value that is a multiple of 128 incurs slightly less profiling
	  overhead than other values.
	- Load latency filtering is disabled by default.
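
The AMD constraints on --ldlat listed above can be expressed as a small check.
The following Python sketch is illustrative only: `check_amd_ldlat` and its
return labels are hypothetical names, not part of perf, and the sketch does not
read the sysfs capability file.

```python
# Limits taken from the AMD notes for --ldlat above.
AMD_LDLAT_MIN = 128
AMD_LDLAT_MAX = 2048
AMD_LDLAT_STEP = 128


def check_amd_ldlat(value):
    """Classify an --ldlat value against the AMD constraints.

    Hypothetical helper: returns "out-of-range" for unsupported values,
    "ok-low-overhead" for multiples of 128, and "ok" otherwise.
    """
    if not AMD_LDLAT_MIN <= value <= AMD_LDLAT_MAX:
        return "out-of-range"
    if value % AMD_LDLAT_STEP == 0:
        return "ok-low-overhead"
    return "ok"


for candidate in (64, 128, 200, 2048, 4096):
    print(candidate, check_amd_ldlat(candidate))
```

Under these assumptions, 128 and 2048 fall in the lower-overhead class, 200 is
accepted, and 64 and 4096 are outside the supported range.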

REPORT OPTIONS
--------------
-i::
--input=<file>::
	Input file name.

-C::
--cpu=<cpu>::
	Monitor only on the list of CPUs provided. Multiple CPUs can be provided as a
	comma-separated list with no spaces: 0,1. Ranges of CPUs are specified with '-',
	like 0-2. The default is to monitor all CPUs.

-D::
--dump-raw-samples::
	Dump the raw decoded samples on the screen in an easy-to-parse format,
	one sample per line.

-s::
--sort=<key>::
	Group results by the given key(s). Multiple keys can be specified
	in CSV format. The keys specific to memory samples are:
	symbol_daddr, symbol_iaddr, dso_daddr, locked, tlb, mem, snoop,
	dcacheline, phys_daddr, data_page_size, blocked.

	- symbol_daddr: name of the data symbol being accessed at the time of the sample
	- symbol_iaddr: name of the code symbol being executed at the time of the sample
	- dso_daddr: name of the library or module containing the data being accessed
	             at the time of the sample
	- locked: whether the bus was locked at the time of the sample
	- tlb: type of tlb access for the data at the time of the sample
	- mem: type of memory access for the data at the time of the sample
	- snoop: type of snoop (if any) for the data at the time of the sample
	- dcacheline: the cacheline the data address is on at the time of the sample
	- phys_daddr: physical address of the data being accessed at the time of the sample
	- data_page_size: page size of the data being accessed at the time of the sample
	- blocked: reason for the blocked load access for the data at the time of the sample

	The default sort keys are also changed to local_weight, mem, sym, dso,
	symbol_daddr, dso_daddr, snoop, tlb, locked, blocked, local_ins_lat.

-T::
--type-profile::
	Show the data-type profile result instead of code symbols. This requires
	debug information and changes the default sort keys to:
	mem, snoop, tlb, type.

-U::
--hide-unresolved::
	Only display entries resolved to a symbol.

-x::
--field-separator=<separator>::
	Specify the field separator used when dumping raw samples (-D option). By default,
	the separator is the space character.

In addition, all perf report options are valid for report, and all perf record
options are valid for record.
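
As an illustration of the CPU list syntax accepted by -C, the following Python
sketch expands a list such as 0,1 or 0-2 into individual CPU numbers.
`parse_cpu_list` is a hypothetical helper, not perf's own parser, and accepting
a mix of single CPUs and ranges in one list is an assumption of the sketch.

```python
def parse_cpu_list(spec):
    """Expand a CPU list like "0,1" or "0-2" into a sorted list of ints.

    Hypothetical helper for illustration; it is not perf's parser.
    """
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            # A range such as "0-2" includes both end points.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0,1"))  # [0, 1]
print(parse_cpu_list("0-2"))  # [0, 1, 2]
```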

OVERHEAD CALCULATION
--------------------
Unlike linkperf:perf-report[1], which calculates overhead from the actual
sample period, perf-mem overhead is calculated using sample weight. For example,
suppose the perf.data file contains two samples with the same sample period,
one with weight 180 and the other with weight 20:

  $ perf script -F period,data_src,weight,ip,sym
  100000    629080842 |OP LOAD|LVL L3 hit|...     20       7e69b93ca524 strcmp
  100000   1a29081042 |OP LOAD|LVL RAM hit|...   180   ffffffff82429168 memcpy

  $ perf report -F overhead,symbol
  50%   [.] strcmp
  50%   [k] memcpy

  $ perf mem report -F overhead,symbol
  90%   [k] memcpy
  10%   [.] strcmp
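
The arithmetic behind the two reports above can be reproduced with a short
Python sketch. It is not how perf is implemented; it only assumes that each
symbol's overhead is its share of the total sample period (perf report) or of
the total sample weight (perf mem report), using the two samples from the
example. `overhead` is a hypothetical helper name.

```python
# The two samples from the example: (symbol, period, weight).
samples = [
    ("strcmp", 100000, 20),
    ("memcpy", 100000, 180),
]


def overhead(samples, field):
    """Return {symbol: percentage share of the total of the chosen field}."""
    index = {"period": 1, "weight": 2}[field]
    total = sum(sample[index] for sample in samples)
    shares = {}
    for sample in samples:
        symbol = sample[0]
        shares[symbol] = shares.get(symbol, 0.0) + 100.0 * sample[index] / total
    return shares


by_period = overhead(samples, "period")  # share by sample period
by_weight = overhead(samples, "weight")  # share by sample weight
print(by_period)  # {'strcmp': 50.0, 'memcpy': 50.0}
print(by_weight)  # {'strcmp': 10.0, 'memcpy': 90.0}
```

With equal periods both symbols get 50%, while weighting by 20 and 180 out of a
total of 200 gives the 10% and 90% split shown by perf mem report.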

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]