perf-c2c.txt - OpenGrok cross reference for /linux/tools/perf/Documentation/perf-c2c.txt

1 perf-c2c(1)
5 ----
6 perf-c2c - Shared Data C2C/HITM Analyzer.
9 --------
12 'perf c2c record' [<options>] \-- [<record command options>] <command>
16 -----------
26 sample load and store operations, therefore hardware and kernel support is
27 required. See linkperf:perf-arm-spe[1] for a setup guide. Due to the
32   - memory address of the access
33   - type of the access (load and store details)
34   - latency (in cycles) of the load access
37 for cachelines with highest contention - highest number of HITM accesses.
39 The basic workflow with this tool follows the standard record/report phase.
45 --------------
46 -e::
47 --event=::
48 	Select the PMU event. Use 'perf c2c record -e list'
51 -v::
52 --verbose::
55 -l::
56 --ldlat::
57 	Configure mem-loads latency. Supported on Intel, Arm64 and some AMD
61 	- /sys/bus/event_source/devices/ibs_op/caps/ldlat file contains '1'.
62 	- Supported latency values are 128 to 2048 (both inclusive).
63 	- Latency value which is a multiple of 128 incurs a little less profiling
65 	- Load latency filtering is disabled by default.
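The AMD constraints listed above could be checked before recording. The following is a minimal sketch, not part of perf: the helper names `ibs_ldlat_supported`, `valid_amd_ldlat` and `is_multiple_of_128` are hypothetical, and only the sysfs path and the 128 to 2048 range are taken from the text above.

```python
from pathlib import Path

# Path quoted in the --ldlat description above.
IBS_LDLAT_CAPS = Path("/sys/bus/event_source/devices/ibs_op/caps/ldlat")


def ibs_ldlat_supported(caps_file: Path = IBS_LDLAT_CAPS) -> bool:
    """Return True if the caps file is readable and contains '1'."""
    try:
        return caps_file.read_text().strip() == "1"
    except OSError:
        # Missing or unreadable file: treat as unsupported (assumption).
        return False


def valid_amd_ldlat(value: int) -> bool:
    """Check the documented AMD range: 128 to 2048, both inclusive."""
    return 128 <= value <= 2048


def is_multiple_of_128(value: int) -> bool:
    """Multiples of 128 are the values the text singles out."""
    return value % 128 == 0
```

A wrapper script might call these helpers first and only then pass the value to 'perf c2c record --ldlat'.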
67 -k::
68 --all-kernel::
71 -u::
72 --all-user::
76 --------------
77 -k::
78 --vmlinux=<file>::
81 -v::
82 --verbose::
85 -i::
86 --input::
89 -N::
90 --node-info::
93 -c::
94 --coalesce::
99 -g::
100 --call-graph::
102 	Please refer to perf-report man page for details.
104 --stdio::
107 --stats::
110 --full-symbols::
113 --no-source::
116 --show-all::
119 -f::
120 --force::
123 -d::
124 --display::
126 	and sort on. Total HITMs (tot) as default, except Arm64 uses peer mode
127 	as default.
129 --stitch-lbr::
132 	perf c2c record --call-graph lbr.
133 	Disabled by default. In common cases with call stack overflows,
134 	it can recreate better call stacks than the default lbr call stack
140 --double-cl::
147 ----------
151 The following perf record options are configured by default:
154   -W,-d,--phys-data,--sample-cpu
156 Unless specified otherwise with the '-e' option, the following events are monitored by
157 default on Intel:
159   cpu/mem-loads,ldlat=30/P
160   cpu/mem-stores/P
168   cpu/mem-loads/
169   cpu/mem-stores/
171 Users can pass any 'perf record' option after the '--' mark, like (to enable
174   $ perf c2c record -- -g -a
179 ----------
181 display modes: stdio and tui (default).
184   - sort all the data based on the cacheline address
185   - store access details for each cacheline
186   - sort all cachelines based on user settings
187   - display data
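The four report steps above can be illustrated with a small sketch. This is not how perf implements them: the sample format (address, is_hitm), the function names, and the 64-byte cacheline size are all assumptions made for the example.

```python
from collections import defaultdict

CACHELINE_SIZE = 64  # assumption: 64-byte cachelines


def cacheline_of(addr: int) -> int:
    """Cacheline base address: clear the low offset bits."""
    return addr & ~(CACHELINE_SIZE - 1)


def offset_in_cacheline(addr: int) -> int:
    """Byte offset of the access within its cacheline."""
    return addr & (CACHELINE_SIZE - 1)


def rank_cachelines(samples):
    """Group samples by cacheline and sort by HITM count, highest first.

    samples: iterable of (address, is_hitm) pairs (hypothetical format).
    """
    stats = defaultdict(lambda: {"accesses": 0, "hitm": 0, "offsets": set()})
    for addr, is_hitm in samples:
        entry = stats[cacheline_of(addr)]           # steps 1 and 2
        entry["accesses"] += 1
        entry["hitm"] += int(is_hitm)
        entry["offsets"].add(offset_in_cacheline(addr))
    # Step 3: sort cachelines, here by total HITM count.
    return sorted(stats.items(), key=lambda kv: kv[1]["hitm"], reverse=True)


if __name__ == "__main__":
    demo = [(0x1000, True), (0x1008, True), (0x1040, False), (0x2010, True)]
    for index, (line, info) in enumerate(rank_cachelines(demo)):  # step 4
        print(index, hex(line), info["hitm"], sorted(info["offsets"]))
```

Sorting by HITM count mirrors the stated goal of finding the cachelines with the highest contention; a real report sorts according to the user's display settings.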
197   - zero based index to identify the cacheline
200   - cacheline address (hex number)
203   - cacheline percentage of all Remote/Local HITM accesses
206   - cacheline percentage of all peer accesses
208   LLC Load Hitm - Total, LclHitm, RmtHitm (For display with HITM types)
209   - count of Total/Local/Remote load HITMs
211   Load Peer - Total, Local, Remote (For display with peer type)
212   - count of Total/Local/Remote loads from peer cache or DRAM
215   - sum of all cacheline accesses
218   - sum of all load accesses
221   - sum of all store accesses
223   Store Reference - L1Hit, L1Miss, N/A
224     L1Hit - store accesses that hit L1
225     L1Miss - store accesses that missed L1
226     N/A - store accesses whose memory level is not available
228   Core Load Hit - FB, L1, L2
229   - count of load hits in FB (Fill Buffer), L1 and L2 cache
231   LLC Load Hit - LlcHit, LclHitm
232   - count of LLC load accesses, includes LLC hits and LLC HITMs
234   RMT Load Hit - RmtHit, RmtHitm
235   - count of remote load accesses, includes remote hits and remote HITMs;
239   Load Dram - Lcl, Rmt
240   - count of local and remote DRAM accesses
244   HITM - Rmt, Lcl (Display with HITM types)
245   - % of Remote/Local HITM accesses for given offset within cacheline
247   Peer Snoop - Rmt, Lcl (Display with peer type)
248   - % of Remote/Local peer accesses for given offset within cacheline
250   Store Refs - L1 Hit, L1 Miss, N/A
251   - % of store accesses that hit L1, missed L1 and N/A (not available) memory
254   Data address - Offset
255   - offset address
258   - pid of the process responsible for the accesses
261   - tid of the process responsible for the accesses
264   - code address responsible for the accesses
266   cycles - rmt hitm, lcl hitm, load (Display with HITM types)
267     - sum of cycles for given accesses - Remote/Local HITM and generic load
269   cycles - rmt peer, lcl peer, load (Display with peer type)
270     - sum of cycles for given accesses - Remote/Local peer load and generic load
273     - number of CPUs that participated in the access
276     - code symbol related to the 'Code address' value
279     - shared object name related to the 'Code address' value
282     - source information related to the 'Code address' value
285     - nodes participating in the access (see NODE INFO section)
288 ---------
291   - node IDs separated by ','
292   - node IDs with stats for each ID, in the following format:
295   - node IDs with list of affected CPUs in the following format:
298 Users can switch between the above flavors with the -N option or
302 --------
308   tid   - coalesced by process TIDs
309   pid   - coalesced by process PIDs
310   iaddr - coalesced by code address, following fields are displayed:
312   dso   - coalesced by shared object
314 By default the coalescing is set up with 'pid,iaddr'.
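To make the effect of a coalescing specification such as 'pid,iaddr' concrete, here is a hedged sketch. The record dictionaries and the `coalesce` function are hypothetical and only mimic the idea of merging entries that share the selected keys; they do not reflect perf's internal data structures.

```python
from collections import Counter

VALID_KEYS = ("tid", "pid", "iaddr", "dso")  # keys listed above


def coalesce(records, spec="pid,iaddr"):
    """Count records that share the same values for the keys in spec.

    records: iterable of dicts with 'tid', 'pid', 'iaddr' and 'dso' entries
    (hypothetical format). spec: comma-separated list of keys.
    """
    keys = spec.split(",")
    for key in keys:
        if key not in VALID_KEYS:
            raise ValueError(f"unknown coalesce key: {key}")
    counts = Counter()
    for rec in records:
        counts[tuple(rec[key] for key in keys)] += 1
    return counts
```

With the default specification, two threads of the same process executing the same instruction address collapse into one entry, while a different process at the same address stays separate.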
317 ------------
322   - overall statistics of memory accesses
325   - overall statistics on shared cachelines
328   - list of most expensive cachelines
331   - list of all accessed offsets for each cacheline
334 ----------
341 -------
347 --------
349   https://joemario.github.io/blog/2016/09/01/c2c-blog/
352 --------
353 linkperf:perf-record[1], linkperf:perf-mem[1], linkperf:perf-arm-spe[1]