1 perf-stat(1)
5 ----
6 perf-stat - Run a command and gather performance counter statistics
9 --------
11 'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
12 'perf stat' [-e <EVENT> | --event=EVENT] [-a] \-- <command> [<options>]
13 'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] \-- <command> [<options>]
14 'perf stat' report [-i file]
17 -----------
23 -------
25 Any command you can specify in a shell.
33 -e::
34 --event=::
35 Select the PMU event. Selection can be:
37 - a symbolic event name (use 'perf list' to list all events)
39 - a raw PMU event in the form of rN where N is a hexadecimal value
44 - a symbolic or raw PMU event followed by an optional colon
45 and a list of event modifiers, e.g., cpu-cycles:p. See the
46 linkperf:perf-list[1] man page for details on event modifiers.
48 - a symbolically formed event like 'pmu/param1=0x3,param2/' where
54 perf stat -A -a -e cpu/event,percore=1/,otherevent ...
56 - a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
69 -i::
70 --no-inherit::
72 -p::
73 --pid=<pid>::
76 -t::
77 --tid=<tid>::
80 -b::
81 --bpf-prog::
83 requiring root rights. bpftool-prog can be used to find program
86 # bpftool prog | head -n 1
89 # perf stat -e cycles,instructions --bpf-prog 17247 --timeout 1000
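The program ID passed to --bpf-prog is the first field of a 'bpftool prog' line. Below is a minimal bash sketch of extracting it. It assumes each line of that output starts with "<id>:". The helper name bpf_prog_id is hypothetical and is not part of perf or bpftool.

```shell
# Hypothetical helper: print the program ID from one line of `bpftool prog`
# output, assuming the line starts with "<id>: <type> ...".
bpf_prog_id() {
    local line=$1
    # Strip everything from the first ':' onward, leaving the numeric ID.
    printf '%s\n' "${line%%:*}"
}

# Usage (needs root and at least one loaded BPF program):
#   id=$(bpf_prog_id "$(bpftool prog | head -n 1)")
#   perf stat -e cycles,instructions --bpf-prog "$id" --timeout 1000
```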
98 --bpf-counters::
100 allows multiple perf-stat sessions that are counting the same metric (cycles,
103 "perf config stat.bpf-counter-events=<list_of_events>".
105 --bpf-attr-map::
106 With option "--bpf-counters", different perf-stat sessions share
108 Use "--bpf-attr-map" to specify the path of this pinned hashmap.
112 --pfm-events events::
114 including support for event filters. For example '--pfm-events
115 inst_retired:any_p:u:c=1:i'. More than one event can be passed to the
117 events cannot be mixed together. The latter must be used with the -e
118 option. The -e option and this one can be mixed and matched. Events
119 can be grouped using the {} notation.
122 -a::
123 --all-cpus::
124 system-wide collection from all CPUs (default if no target is specified)
126 --no-scale::
129 -d::
130 --detailed::
131 print more detailed statistics, can be specified up to 3 times
133 -d: detailed events, L1 and LLC data cache
134 -d -d: more detailed events, dTLB and iTLB events
135 -d -d -d: very detailed events, adding prefetch events
137 -r::
138 --repeat=<n>::
141 -B::
142 --big-num::
144 Enabled by default. Use "--no-big-num" to disable.
145 Default setting can be changed with "perf config stat.big-num=false".
147 -C::
148 --cpu=::
149 Count only on the list of CPUs provided. Multiple CPUs can be provided as a
150 comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
151 In per-thread mode, this option is ignored. The -a option is still necessary
152 to activate system-wide monitoring. Default is to count on all CPUs.
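The list syntax accepted by -C combines single CPUs and inclusive ranges. The bash sketch below expands such a list into individual CPU numbers, to show how a value like 0-2,5 is read. expand_cpu_list is a hypothetical helper, not part of perf.

```shell
# Hypothetical helper: expand a CPU list such as "0-2,5" into "0 1 2 5".
# Ranges (lo-hi) are inclusive, matching the -C syntax described above.
expand_cpu_list() {
    local list=$1 item lo hi cpu
    local -a cpus=()
    local IFS=,                      # split the list on commas
    for item in $list; do
        if [[ $item == *-* ]]; then
            lo=${item%-*}
            hi=${item#*-}
            for ((cpu = lo; cpu <= hi; cpu++)); do
                cpus+=("$cpu")
            done
        else
            cpus+=("$item")
        fi
    done
    IFS=' '                          # join the result with spaces
    printf '%s\n' "${cpus[*]}"
}

# Usage: perf stat -a -C 0-2,5 -e cycles -- sleep 1 counts on the CPUs
# printed by: expand_cpu_list 0-2,5
```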
154 -A::
155 --no-aggr::
158 -n::
159 --null::
160 null run - Don't start any counters.
162 This can be useful to measure just elapsed wall-clock time - or to assess the
165 -v::
166 --verbose::
169 -x SEP::
170 --field-separator SEP::
171 print counts using a CSV-style output to make it easy to import directly into
174 --table:: Display time for each run (-r option), in a table format, e.g.:
176 $ perf stat --null -r 5 --table perf bench sched pipe
181 5.189 (-0.293) #
182 5.189 (-0.294) #
183 5.186 (-0.296) #
188 5.483 +- 0.198 seconds time elapsed ( +- 3.62% )
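The summary line of the --table output reports a mean and a spread over the runs. Below is a minimal bash/awk sketch of that arithmetic. It assumes the spread is the standard deviation of the mean, which is the sample standard deviation divided by the square root of the number of runs. It also assumes the relative figure is that value expressed as a percentage of the mean. run_stats is a hypothetical helper that reads one elapsed time per line.

```shell
# Hypothetical helper: read one run time per line on stdin and print
# "mean +- stddev_of_mean ( +- relative% )".
run_stats() {
    awk '
        { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n == 0) exit 1
            mean = sum / n
            sd = 0
            if (n > 1) {
                var = (sumsq - n * mean * mean) / (n - 1)   # sample variance
                if (var < 0) var = 0                         # guard rounding error
                sd = sqrt(var / n)                           # std. deviation of the mean
            }
            rel = (mean != 0) ? 100 * sd / mean : 0
            printf "%.3f +- %.3f ( +- %.2f%% )\n", mean, sd, rel
        }'
}
```

As a consistency check, the ratio 0.198 / 5.483 in the example above is about 3.6%, which agrees with the reported relative spread.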
190 -G name::
191 --cgroup name::
193 in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
195 can be provided. Each cgroup is applied to the corresponding event, i.e., first cgroup
197 an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
199 line. If the user wants to track multiple events for a specific cgroup, the user can
200 use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.
203 command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'.
205 --for-each-cgroup name::
208 effect as repeating the -e and -G options for each event and name pair. This option
209 cannot be used with -G/--cgroup option.
211 -o file::
212 --output file::
215 --append::
216 Append to the output file designated with the -o option. Ignored if -o is not specified.
218 --log-fd::
220 Log output to fd, instead of stderr. Complementary to --output, and mutually exclusive
221 with it. --append may be used here. Examples:
222 3>results perf stat --log-fd 3 \-- $cmd
223 3>>results perf stat --log-fd 3 --append \-- $cmd
225 --control=fifo:ctl-fifo[,ack-fifo]::
226 --control=fd:ctl-fd[,ack-fd]::
227 ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
228 Listen on the ctl-fd descriptor for commands that control measurement ('enable': enable events,
229 'disable': disable events). Measurements can be started with events disabled using the
230 --delay=-1 option. Optionally send control command completion ('ack\n') to the ack-fd descriptor
239 test -p ${ctl_fifo} && unlink ${ctl_fifo}
244 test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
248 perf stat -D -1 -e cpu-cycles -a -I 1000 \
249 --control fd:${ctl_fd},${ctl_fd_ack} \
250 \-- sleep 30 &
253 sleep 5 && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
254 sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
256 exec {ctl_fd_ack}>&-
259 exec {ctl_fd}>&-
262 wait -n ${perf_pid}
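The same flow can also use the fifo: form, in which perf opens the named FIFOs itself. Below is a minimal bash sketch of the sending side. It assumes perf answers each command with 'ack' on the ack FIFO. perf_ctl and the /tmp paths are hypothetical.

```shell
# Hypothetical helper: send one control command ('enable' or 'disable')
# through the ctl FIFO, then wait for the 'ack' reply on the ack FIFO.
perf_ctl() {
    local ctl=$1 ack=$2 cmd=$3 reply
    printf '%s\n' "$cmd" > "$ctl" || return 1
    read -r reply < "$ack" || return 1
    [ "$reply" = ack ]
}

# Usage sketch (hypothetical paths):
#   mkfifo /tmp/perf.ctl /tmp/perf.ack
#   perf stat -D -1 -e cpu-cycles -a -I 1000 \
#       --control fifo:/tmp/perf.ctl,/tmp/perf.ack -- sleep 30 &
#   sleep 5  && perf_ctl /tmp/perf.ctl /tmp/perf.ack enable
#   sleep 10 && perf_ctl /tmp/perf.ctl /tmp/perf.ack disable
#   wait
```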
266 --pre::
267 --post::
270 perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' \-- make -s -j64 O=defc…
272 -I msecs::
273 --interval-print msecs::
276 example: 'perf stat -I 1000 -e cycles -a sleep 5'
280 --interval-count times::
282 This option should be used together with "-I" option.
283 example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'
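When -I is combined with -x, each line starts with the interval time stamp. Below is a minimal bash/awk sketch that turns this output into "time value" pairs for one event. It assumes no aggregation columns are present, so the layout is time, value, unit, event, and so on. interval_series is a hypothetical helper.

```shell
# Hypothetical helper: print "<time> <value>" for every interval line of
# `perf stat -I <ms> -x <sep>` output whose event name equals $1.
# Assumed layout (no aggregation options): time,value,unit,event,...
interval_series() {
    local event=$1 sep=${2:-,}
    awk -F "$sep" -v ev="$event" '
        $4 == ev {
            t = $1
            gsub(/ /, "", t)    # the time stamp may be padded with spaces
            print t, $2
        }'
}

# Usage (perf stat prints its counts on stderr):
#   perf stat -I 1000 --interval-count 2 -x, -e cycles -a 2>&1 | interval_series cycles
```

The event name must match exactly as printed, including any modifier suffix such as :u.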
285 --interval-clear::
288 --timeout msecs::
290 This option is not supported with the "-I" option.
291 example: 'perf stat --timeout 2000 -e cycles -a'
293 --metric-only::
295 Don't show any raw values. Not supported with --per-thread.
297 --per-socket::
298 Aggregate counts per processor socket for system-wide mode measurements. This
300 use --per-socket in addition to -a (system-wide). The output includes the
304 --per-die::
305 Aggregate counts per processor die for system-wide mode measurements. This
307 use --per-die in addition to -a (system-wide). The output includes the
311 --per-cluster::
312 Aggregate counts per processor cluster for system-wide mode measurements. This
314 use --per-cluster in addition to -a (system-wide). The output includes the
317 related CPUs can be obtained from /sys/devices/system/cpu/cpuX/topology/cluster_{id,cpus}.
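The cluster grouping can be inspected directly in sysfs. The same approach should work for the socket and die groupings used by --per-socket and --per-die. Below is a minimal bash sketch that prints one "cpuN id" line per CPU for a given topology attribute. The helper name cpu_topology_ids and its optional root-directory argument are hypothetical. The attribute names die_id and physical_package_id are assumed from the Linux sysfs CPU topology interface.

```shell
# Hypothetical helper: print "cpuN <id>" for each CPU that exposes the given
# topology attribute (e.g. cluster_id, die_id, physical_package_id).
# The optional second argument replaces the sysfs root, e.g. for testing.
cpu_topology_ids() {
    local attr=$1 root=${2:-/sys/devices/system/cpu} f cpu id
    for f in "$root"/cpu[0-9]*/topology/"$attr"; do
        [ -r "$f" ] || continue          # no match: the glob stays literal
        cpu=${f#"$root"/}                # cpuN/topology/<attr>
        cpu=${cpu%%/*}                   # cpuN
        read -r id < "$f"
        printf '%s %s\n' "$cpu" "$id"
    done
}

# Usage: cpu_topology_ids cluster_id | sort -k2,2n
```

The glob lists CPUs in lexical order, so cpu10 comes before cpu2. Sorting on the second column groups CPUs that share an ID.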
319 --per-cache::
320 Aggregate counts per cache instance for system-wide mode measurements. By
323 alongside the option in the format [Ll][1-9][0-9]*. For example,
324 using option "--per-cache=l3" or "--per-cache=L3" will aggregate the
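The level argument can be checked against the documented pattern before a run starts. Below is a minimal bash sketch. valid_cache_level is a hypothetical helper. It only tests the syntax and does not check whether such a cache level exists on the machine.

```shell
# Hypothetical helper: succeed if $1 matches the documented --per-cache
# level syntax [Ll][1-9][0-9]* (for example l3 or L3).
valid_cache_level() {
    [[ $1 =~ ^[Ll][1-9][0-9]*$ ]]
}

# Usage:
#   valid_cache_level "$level" && perf stat -a --per-cache="$level" -- sleep 1
```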
327 --per-core::
328 Aggregate counts per physical processor for system-wide mode measurements. This
330 use --per-core in addition to -a (system-wide). The output includes the
333 --per-thread::
334 Aggregate counts per monitored thread, when monitoring threads (-t option)
335 or processes (-p option).
337 --per-node::
338 Aggregate counts per NUMA node for system-wide mode measurements. This
340 mode, use --per-node in addition to -a (system-wide).
342 -D msecs::
343 --delay msecs::
344 After starting the program, wait msecs before measuring (-1: start with events
348 -T::
349 --transaction::
353 --metric-no-group::
356 --metric-no-group option places events outside of groups and may
357 increase the chance of the event being scheduled - leading to more
359 for metrics like instructions per cycle can be lower - as both metrics
360 may no longer be measured at the same time.
362 --metric-no-merge::
363 By default metric events in different weak groups can be shared if one
372 --metric-no-threshold::
375 may not be desirable, for example, as the events can introduce
381 --quiet::
386 -----------
389 -o file::
390 --output file::
394 -----------
397 -i file::
398 --input file::
401 --per-socket::
402 Aggregate counts per processor socket for system-wide mode measurements.
404 --per-die::
405 Aggregate counts per processor die for system-wide mode measurements.
407 --per-cluster::
408 Aggregate counts per processor cluster for system-wide mode measurements.
410 --per-cache::
411 Aggregate counts per cache instance for system-wide mode measurements. By
414 alongside the option in the format [Ll][1-9][0-9]*. For example, using
415 option "--per-cache=l3" or "--per-cache=L3" will aggregate the
418 --per-core::
419 Aggregate counts per physical processor for system-wide mode measurements.
421 -M::
422 --metrics::
431 no threshold information was available or the threshold
434 -A::
435 --no-aggr::
436 --no-merge::
459 --hybrid-merge::
465 --topdown::
466 Print top-down metrics supported by the CPU. This helps determine
479 mode like -I 1000, as the bottleneck of workloads can change often.
481 This enables --metric-only, unless overridden with --no-metric-only.
484 on newer CPUs (IceLake and later) TopDown can be collected for any thread:
488 and -a (global monitoring) is needed, requiring root rights or
489 kernel.perf_event_paranoid=-1.
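Before using -a without root, you can check the current paranoid level. Below is a minimal bash sketch. It assumes the level is read from /proc/sys/kernel/perf_event_paranoid, the file behind the kernel.perf_event_paranoid sysctl. system_wide_allowed and its optional path argument are hypothetical. The sketch deliberately ignores capabilities such as CAP_PERFMON.

```shell
# Hypothetical helper: succeed when system-wide counting is likely permitted,
# i.e. when running as root or when perf_event_paranoid is -1.
# The optional argument replaces the sysctl file path, e.g. for testing.
system_wide_allowed() {
    local file=${1:-/proc/sys/kernel/perf_event_paranoid} level
    [ "$(id -u)" -eq 0 ] && return 0          # root may always use -a
    read -r level < "$file" || return 1
    [ "$level" -le -1 ]                         # -1 allows system-wide counting
}

# Usage:
#   system_wide_allowed && perf stat --topdown -a -- sleep 1
```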
498 CPUs the workload runs on. If needed the CPUs can be forced using
501 --record-tpebs::
509 --td-level::
510 Print the top-down statistics at the level given as input. This allows
511 users to print the top-down metrics level of interest instead of the
512 level 1 top-down metrics.
515 will be less accurate. By convention a metric can be examined by
520 'perf stat -M tma_frontend_bound_group...'.
524 --smi-cost::
530 The cost of SMI can be measured by (aperf - unhalted core cycles).
533 oriented analysis. --metric-only will be applied by default.
534 The output is SMI cycles%, which equals (aperf - unhalted core cycles) / aperf
536 Users who want to get the actual values can apply --no-metric-only.
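The SMI cycles% figure is plain arithmetic on two counts. Below is a minimal bash/awk sketch of the formula above. The two arguments are the aperf count and the unhalted core cycles count. smi_cost_pct is a hypothetical helper.

```shell
# Hypothetical helper: print SMI cycles% as
#   100 * (aperf - unhalted core cycles) / aperf
# with two decimals; fails if aperf is not positive.
smi_cost_pct() {
    local aperf=$1 unhalted=$2
    awk -v a="$aperf" -v u="$unhalted" 'BEGIN {
        if (a <= 0) exit 1
        printf "%.2f\n", 100 * (a - u) / a
    }'
}

# Usage: smi_cost_pct <aperf> <unhalted core cycles>
```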
538 --all-kernel::
541 --all-user::
544 --percore-show-thread::
553 --summary::
554 Print summary for interval mode (-I).
556 --no-csv-summary::
558 This option must be used with -x and --summary.
560 This option can be enabled in perf config by setting the variable
561 'stat.no-csv-summary'.
563 $ perf config stat.no-csv-summary=true
565 --cputype::
570 --------
572 $ perf stat \-- make
576 83723.452481 task-clock:u (msec) # 1.004 CPUs utilized
577 0 context-switches:u # 0.000 K/sec
578 0 cpu-migrations:u # 0.000 K/sec
579 3,228,188 page-faults:u # 0.039 M/sec
583 2,078,861,393 branch-misses:u # 2.98% of all branches
591 -------
592 As shown in the example above, perf stat can display three types of timings.
606 ----------
608 With -x, perf stat can print output in a not-quite-CSV format.
610 it is recommended to use a different character like -x \;
614 - optional usec time stamp in fractions of a second (with -I xxx)
615 - optional CPU, core, or socket identifier
616 - optional number of logical CPUs aggregated
617 - counter value
618 - unit of the counter value or empty
619 - event name
620 - run time of counter
621 - percentage of measurement time the counter was running
622 - optional variance if multiple values are collected with -r
623 - optional metric value
624 - optional unit of metric
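Below is a minimal bash/awk sketch of consuming this format. It assumes the simplest layout, with no -I time stamp and no aggregation columns, so the counter value is field 1 and the event name is field 3. csv_event_values is a hypothetical helper. It skips lines starting with '#' and lines with fewer than three fields.

```shell
# Hypothetical helper: print "<event> <value>" for each counter line of
# `perf stat -x <sep>` output. Assumes no -I time stamp and no aggregation
# columns, so the counter value is field 1 and the event name field 3.
csv_event_values() {
    local sep=${1:-,}
    awk -F "$sep" '
        /^#/ { next }          # skip comment lines such as "# started on ..."
        NF >= 3 { print $3, $1 }'
}

# Usage:
#   perf stat -x \; -o stat.csv -- make && csv_event_values ';' < stat.csv
```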
628 include::intel-hybrid.txt[]
631 -----------
633 With -j, perf stat can print output in JSON format
634 that can be used for parsing.
636 - timestamp : optional usec time stamp in fractions of a second (with -I)
637 - optional aggregate options:
638 - core : core identifier (with --per-core)
639 - die : die identifier (with --per-die)
640 - socket : socket identifier (with --per-socket)
641 - node : node identifier (with --per-node)
642 - thread : thread identifier (with --per-thread)
643 - counter-value : counter value
644 - unit : unit of the counter value or empty
645 - event : event name
646 - variance : optional variance if multiple values are collected (with -r)
647 - runtime : run time of counter
648 - metric-value : optional metric value
649 - metric-unit : optional unit of metric
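Because each counter is printed as a JSON object, you can extract a single field with standard tools. Below is a minimal bash/sed sketch. It assumes one object per line and a string-quoted value for the requested key, so keys with unquoted numeric values are not matched. json_field is a hypothetical helper. A real JSON parser such as jq is more robust.

```shell
# Hypothetical helper: print the string value of key $1 from each line of
# `perf stat -j` output, assuming one JSON object per line. This is a quick
# sed-based sketch, not a general JSON parser.
json_field() {
    local key=$1
    sed -n "s/.*\"$key\"[[:space:]]*:[[:space:]]*\"\([^\"]*\)\".*/\1/p"
}

# Usage (perf stat prints its counts on stderr):
#   perf stat -j -e cycles -- true 2>&1 | json_field counter-value
```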
652 --------
653 linkperf:perf-top[1], linkperf:perf-list[1]