Lines Matching +full:spe +full:- +full:pmu
1 perf-mem(1)
5 ----
6 perf-mem - Profile memory accesses
9 --------
14 -----------
20 and stores are sampled. Use the -t option to limit to loads or stores.
22 Note that on Intel systems the memory latency reported is the use-latency,
26 On Arm64 this uses SPE to sample load and store operations, therefore hardware
27 and kernel support is required. See linkperf:perf-arm-spe[1] for a setup guide.
28 Due to the statistical nature of SPE sampling, not every memory operation will
31 On AMD this uses the IBS Op PMU to sample load-store operations.
34 --------------
35 -f::
36 --force::
39 -t::
40 --type=<type>::
43 -v::
44 --verbose::
47 -p::
48 --phys-data::
51 --data-page-size::
55 --------------
59 -e::
60 --event <event>::
61 Event selector. Use 'perf mem record -e list' to list available events.
63 -K::
64 --all-kernel::
67 -U::
68 --all-user::
71 --ldlat <n>::
76 - /sys/bus/event_source/devices/ibs_op/caps/ldlat file contains '1'.
77 - Supported latency values are 128 to 2048 (both inclusive).
78 - A latency value that is a multiple of 128 incurs slightly less profiling
80 - Load latency filtering is disabled by default.
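The constraints in the list above can be checked before invoking `perf mem record --ldlat <n>`. The following is a minimal sketch, not part of perf itself: the function names `ldlat_supported` and `check_ldlat` are hypothetical, and the only facts taken from the text are the sysfs path, the expected content '1', the inclusive 128 to 2048 range, and the special status of multiples of 128.

```python
from pathlib import Path

# Capability file named in the list above.
LDLAT_CAP = Path("/sys/bus/event_source/devices/ibs_op/caps/ldlat")


def ldlat_supported(cap_path=LDLAT_CAP):
    """Return True if the capability file exists and contains '1'."""
    try:
        return cap_path.read_text().strip() == "1"
    except OSError:
        # Missing or unreadable file: treat the feature as unavailable.
        return False


def check_ldlat(n):
    """Return (in_range, multiple_of_128) for a requested latency value.

    in_range:        n lies within 128..2048, both inclusive.
    multiple_of_128: n is in range and is a multiple of 128, the case
                     the list above singles out.
    """
    in_range = 128 <= n <= 2048
    return in_range, in_range and n % 128 == 0
```

For example, `check_ldlat(200)` reports a value that is in range but not a multiple of 128, while `check_ldlat(64)` reports a value outside the supported range.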
83 --------------
84 -i::
85 --input=<file>::
88 -C::
89 --cpu=<cpu>::
91 comma-separated list with no space: 0,1. Ranges of CPUs are specified with -
92 like 0-2. Default is to monitor all CPUs.
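The CPU list syntax described above (comma-separated entries with no spaces, ranges written with `-`) can be illustrated with a small parser. This is only a sketch of the syntax, not perf's own parsing code, and the function name `parse_cpu_list` is hypothetical.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as '0,1' or '0-2' into a sorted list of ints."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            # A range such as '0-2' includes both endpoints.
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)
```

Under this sketch, `parse_cpu_list("0,1")` yields CPUs 0 and 1, and `parse_cpu_list("0-2")` yields CPUs 0, 1 and 2.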
94 -D::
95 --dump-raw-samples::
99 -s::
100 --sort=<key>::
101 Group result by given key(s) - multiple keys can be specified
106 - symbol_daddr: name of data symbol being accessed at the time of the sample
107 - symbol_iaddr: name of code symbol being executed at the time of the sample
108 - dso_daddr: name of library or module containing the data being executed
110 - locked: whether the bus was locked at the time of the sample
111 - tlb: type of tlb access for the data at the time of the sample
112 - mem: type of memory access for the data at the time of the sample
113 - snoop: type of snoop (if any) for the data at the time of the sample
114 - dcacheline: the cacheline the data address is on at the time of the sample
115 - phys_daddr: physical address of the data being accessed at the time of the sample
116 - data_page_size: page size of the data being accessed at the time of the sample
117 - blocked: reason for the blocked load access to the data at the time of the sample
122 -F::
123 --fields=::
124 Specify output field - multiple keys can be specified in CSV format.
125 Please see linkperf:perf-report[1] for details.
130 - op: operation in the sample instruction (load, store, prefetch, ...)
131 - cache: location in CPU cache (L1, L2, ...) where the sample hit
132 - mem: location in memory or other places the sample hit
133 - dtlb: location in Data TLB (L1, L2) where the sample hit
134 - snoop: snoop result for the sampled data access
138 -T::
139 --type-profile::
140 Show data-type profile result instead of code symbols. This requires
144 -U::
145 --hide-unresolved::
148 -x::
149 --field-separator=<separator>::
150 Specify the field separator used when dumping raw samples (-D option). By default,
157 --------------------
158 Unlike linkperf:perf-report[1], which calculates overhead from the actual
159 sample period, perf-mem overhead is calculated using sample weight. E.g.
163 $ perf script -F period,data_src,weight,ip,sym
167 $ perf report -F overhead,symbol
171 $ perf mem report -F overhead,symbol
176 ----------------------
180 behave differently when it's used by -F/--fields or -s/--sort.
185 $ perf mem report -F mem,snoop
187 # ------ Memory ------- --- Snoop ----
196 $ perf mem report -s mem,snoop
210 --------
211 linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-arm-spe[1]