1 perf-c2c(1)
5 ----
6 perf-c2c - Shared Data C2C/HITM Analyzer.
9 --------
12 'perf c2c record' [<options>] \-- [<record command options>] <command>
16 -----------
26 sample load and store operations, therefore hardware and kernel support is
27 required. See linkperf:perf-arm-spe[1] for a setup guide. Due to the
32 - memory address of the access
33 - type of the access (load and store details)
34 - latency (in cycles) of the load access
37 for cachelines with highest contention - highest number of HITM accesses.
39 The basic workflow with this tool follows the standard record/report phases.
45 --------------
46 -e::
47 --event=::
48 Select the PMU event. Use 'perf c2c record -e list'
51 -v::
52 --verbose::
55 -l::
56 --ldlat::
57 Configure mem-loads latency. Supported on Intel, Arm64 and some AMD
61 - /sys/bus/event_source/devices/ibs_op/caps/ldlat file contains '1'.
62 - Supported latency values are 128 to 2048 (both inclusive).
63 - A latency value that is a multiple of 128 incurs a little less profiling
65 - Load latency filtering is disabled by default.
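The AMD constraints listed above can be checked before a value is passed to `--ldlat`. The following is a minimal sketch; the function name `check_ibs_ldlat` is hypothetical and not part of perf, and it only encodes the range and the multiple-of-128 note quoted above. It does not inspect the `ibs_op/caps/ldlat` sysfs file.

```python
def check_ibs_ldlat(threshold: int) -> bool:
    """Validate a load-latency threshold against the AMD limits quoted above.

    Returns True when the value is a multiple of 128 (the case described
    as incurring slightly less profiling overhead), False otherwise.
    Raises ValueError when the value is outside the 128..2048 range.
    """
    if not 128 <= threshold <= 2048:
        raise ValueError(
            f"ldlat {threshold} is outside the supported range 128..2048"
        )
    return threshold % 128 == 0
```

For example, `check_ibs_ldlat(256)` returns `True`, while `check_ibs_ldlat(200)` is accepted as in range but returns `False`.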
67 -k::
68 --all-kernel::
71 -u::
72 --all-user::
76 --------------
77 -k::
78 --vmlinux=<file>::
81 -v::
82 --verbose::
85 -i::
86 --input::
89 -N::
90 --node-info::
93 -c::
94 --coalesce::
99 -g::
100 --call-graph::
102 Please refer to perf-report man page for details.
104 --stdio::
107 --stats::
110 --full-symbols::
113 --no-source::
116 --show-all::
119 -f::
120 --force::
123 -d::
124 --display::
126 and sort on. Total HITMs (tot) is the default, except on Arm64, which uses peer mode
127 by default.
129 --stitch-lbr::
132 perf c2c record --call-graph lbr.
133 Disabled by default. In common cases with call stack overflows,
134 it can recreate better call stacks than the default lbr call stack
140 --double-cl::
147 ----------
151 The following perf record options are configured by default:
154 -W,-d,--phys-data,--sample-cpu
156 Unless specified otherwise with the '-e' option, the following events are monitored by
157 default on Intel:
159 cpu/mem-loads,ldlat=30/P
160 cpu/mem-stores/P
168 cpu/mem-loads/
169 cpu/mem-stores/
171 The user can pass any 'perf record' option after the '--' mark, for example (to enable
174 $ perf c2c record -- -g -a
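When the record step is driven from a script, the placement of the '--' mark is easy to get wrong. The sketch below only assembles the argument list following the synopsis shown earlier; it does not run perf. The helper name `c2c_record_argv` and its parameters are hypothetical.

```python
def c2c_record_argv(c2c_options=(), record_options=(), command=()):
    """Build an argv list for 'perf c2c record' without executing it.

    c2c_options    -- options consumed by 'perf c2c record' itself
    record_options -- options forwarded to 'perf record' after '--'
    command        -- the workload to profile, if any
    """
    argv = ["perf", "c2c", "record", *c2c_options]
    if record_options or command:
        # Everything after '--' is handed to 'perf record'.
        argv.append("--")
        argv.extend(record_options)
        argv.extend(command)
    return argv
```

Calling `c2c_record_argv(record_options=["-g", "-a"])` yields the same argument list as the command line shown above.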
179 ----------
181 display modes: stdio and tui (default).
184 - sort all the data based on the cacheline address
185 - store access details for each cacheline
186 - sort all cachelines based on user settings
187 - display data
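The four report steps above can be illustrated with a small sketch. This is not how perf implements them: the 64-byte cacheline size, the `(address, is_hitm)` sample shape, and the name `rank_cachelines` are assumptions made for illustration, and the only sort key shown is the total HITM count.

```python
from collections import defaultdict

CACHELINE_SIZE = 64  # assumption: 64-byte cachelines


def rank_cachelines(samples):
    """Group samples by cacheline and rank the lines by HITM count.

    samples -- iterable of (address, is_hitm) tuples (hypothetical shape)
    Returns a list of (index, cacheline_hex, hitm, accesses, hitm_percent).
    """
    lines = defaultdict(lambda: {"accesses": 0, "hitm": 0})
    for address, is_hitm in samples:
        # Mask off the offset bits to get the cacheline base address.
        base = address & ~(CACHELINE_SIZE - 1)
        lines[base]["accesses"] += 1
        lines[base]["hitm"] += int(is_hitm)

    total_hitm = sum(stats["hitm"] for stats in lines.values())
    # Most contended cachelines first.
    ranked = sorted(lines.items(), key=lambda kv: kv[1]["hitm"], reverse=True)
    return [
        (
            index,
            hex(base),
            stats["hitm"],
            stats["accesses"],
            100.0 * stats["hitm"] / total_hitm if total_hitm else 0.0,
        )
        for index, (base, stats) in enumerate(ranked)
    ]
```

Two samples at `0x1000` and `0x1008` fall into the same cacheline, so they are counted together under `0x1000`.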
197 - zero based index to identify the cacheline
200 - cacheline address (hex number)
203 - cacheline percentage of all Remote/Local HITM accesses
206 - cacheline percentage of all peer accesses
208 LLC Load Hitm - Total, LclHitm, RmtHitm (For display with HITM types)
209 - count of Total/Local/Remote load HITMs
211 Load Peer - Total, Local, Remote (For display with peer type)
212 - count of Total/Local/Remote loads from peer cache or DRAM
215 - sum of all cacheline accesses
218 - sum of all load accesses
221 - sum of all store accesses
223 Store Reference - L1Hit, L1Miss, N/A
224 L1Hit - store accesses that hit L1
225 L1Miss - store accesses that missed L1
226 N/A - store accesses whose memory level is not available
228 Core Load Hit - FB, L1, L2
229 - count of load hits in FB (Fill Buffer), L1 and L2 cache
231 LLC Load Hit - LlcHit, LclHitm
232 - count of LLC load accesses, includes LLC hits and LLC HITMs
234 RMT Load Hit - RmtHit, RmtHitm
235 - count of remote load accesses, includes remote hits and remote HITMs;
239 Load Dram - Lcl, Rmt
240 - count of local and remote DRAM accesses
244 HITM - Rmt, Lcl (Display with HITM types)
245 - % of Remote/Local HITM accesses for given offset within cacheline
247 Peer Snoop - Rmt, Lcl (Display with peer type)
248 - % of Remote/Local peer accesses for given offset within cacheline
250 Store Refs - L1 Hit, L1 Miss, N/A
251 - % of store accesses that hit L1, missed L1 and N/A (not available) memory
254 Data address - Offset
255 - offset address
258 - pid of the process responsible for the accesses
261 - tid of the process responsible for the accesses
264 - code address responsible for the accesses
266 cycles - rmt hitm, lcl hitm, load (Display with HITM types)
267 - sum of cycles for given accesses - Remote/Local HITM and generic load
269 cycles - rmt peer, lcl peer, load (Display with peer type)
270 - sum of cycles for given accesses - Remote/Local peer load and generic load
273 - number of cpus that participated in the access
276 - code symbol related to the 'Code address' value
279 - shared object name related to the 'Code address' value
282 - source information related to the 'Code address' value
285 - nodes participating in the access (see NODE INFO section)
288 ---------
291 - node IDs separated by ','
292 - node IDs with stats for each ID, in the following format:
295 - node IDs with list of affected CPUs in the following format:
298 The user can switch between the above flavors with the -N option or
302 --------
308 tid - coalesced by process TIDs
309 pid - coalesced by process PIDs
310 iaddr - coalesced by code address, the following fields are displayed:
312 dso - coalesced by shared object
314 By default the coalescing is set up with 'pid,iaddr'.
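Coalescing can be pictured as merging per-offset entries that share the same values for the selected fields. The sketch below is an illustration under assumptions, not perf's implementation: the record layout (dicts with `pid`, `tid`, `iaddr`, `dso`, `count`) and the function name `coalesce` are hypothetical.

```python
from collections import defaultdict


def coalesce(records, keys=("pid", "iaddr")):
    """Merge access records that share the same values for `keys`.

    records -- iterable of dicts with 'pid', 'tid', 'iaddr', 'dso' and
               'count' entries (hypothetical layout)
    keys    -- fields to coalesce by; the default mirrors 'pid,iaddr'
    Returns a dict mapping each key tuple to the summed access count.
    """
    merged = defaultdict(int)
    for record in records:
        merged[tuple(record[key] for key in keys)] += record["count"]
    return dict(merged)
```

With the default keys, two threads of one process that hit the same code address collapse into a single entry; passing `keys=("tid", "iaddr")` keeps them apart.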
317 ------------
322 - overall statistics of memory accesses
325 - overall statistics on shared cachelines
328 - list of most expensive cachelines
331 - list of all accessed offsets for each cacheline
334 ----------
341 -------
347 --------
349 https://joemario.github.io/blog/2016/09/01/c2c-blog/
352 --------
353 linkperf:perf-record[1], linkperf:perf-mem[1], linkperf:perf-arm-spe[1]