perf-sched(1)
=============

NAME
----
perf-sched - Tool to trace/measure scheduler properties (latencies)

SYNOPSIS
--------
[verse]
'perf sched' {record|latency|map|replay|script|timehist|stats}

DESCRIPTION
-----------
There are several variants of 'perf sched':

  'perf sched record <command>' to record the scheduling events
  of an arbitrary workload.

  'perf sched latency' to report the per task scheduling latencies
  and other scheduling properties of the workload.

   Example usage:
       perf sched record -- sleep 1
       perf sched latency

  -------------------------------------------------------------------------------------------------------------------------------------------
  Task                  |   Runtime ms  |  Count   | Avg delay ms    | Max delay ms    | Max delay start           | Max delay end          |
  -------------------------------------------------------------------------------------------------------------------------------------------
  perf:(2)              |      2.804 ms |       66 | avg:   0.524 ms | max:   1.069 ms | max start: 254752.314960 s | max end: 254752.316029 s
  NetworkManager:1343   |      0.372 ms |       13 | avg:   0.008 ms | max:   0.013 ms | max start: 254751.551153 s | max end: 254751.551166 s
  kworker/1:2-xfs:4649  |      0.012 ms |        1 | avg:   0.008 ms | max:   0.008 ms | max start: 254751.519807 s | max end: 254751.519815 s
  kworker/3:1-xfs:388   |      0.011 ms |        1 | avg:   0.006 ms | max:   0.006 ms | max start: 254751.519809 s | max end: 254751.519815 s
  sleep:147736          |      0.938 ms |        3 | avg:   0.006 ms | max:   0.007 ms | max start: 254751.313817 s | max end: 254751.313824 s

  It shows Runtime (time that a task spent actually running on the CPU),
  Count (number of times a delay was calculated) and delay (time that a
  task was ready to run but was kept waiting).

  Tasks with the same command name are merged and the merge count is
  given within '()'. However, if the -p option is used, the pid is shown
  instead.

  'perf sched script' to see a detailed trace of the workload that
  was recorded (aliased to 'perf script' for now).

  'perf sched replay' to simulate the workload that was recorded
  via perf sched record. (this is done by starting up mockup threads
  that mimic the workload based on the events in the trace. These
  threads can then replay the timings (CPU runtime and sleep patterns)
  of the workload as it occurred when it was recorded - and can repeat
  it a number of times, measuring its performance.)

  'perf sched map' to print a textual context-switching outline of
  workload captured via perf sched record.  Columns stand for
  individual CPUs, and the two-letter shortcuts stand for tasks that
  are running on a CPU. A '*' denotes the CPU that had the event, and
  a dot signals an idle CPU.

  'perf sched timehist' provides an analysis of scheduling events.

    Example usage:
        perf sched record -- sleep 1
        perf sched timehist

   By default it shows the individual schedule events, including the wait
   time (time between sched-out and next sched-in events for the task), the
   task scheduling delay (time between runnable and actually running) and
   run time for the task:

                time    cpu  task name             wait time  sch delay   run time
                             [tid/pid]                (msec)     (msec)     (msec)
      -------------- ------  --------------------  ---------  ---------  ---------
        79371.874569 [0011]  gcc[31949]                0.014      0.000      1.148
        79371.874591 [0010]  gcc[31951]                0.000      0.000      0.024
        79371.874603 [0010]  migration/10[59]          3.350      0.004      0.011
        79371.874604 [0011]  <idle>                    1.148      0.000      0.035
        79371.874723 [0005]  <idle>                    0.016      0.000      1.383
        79371.874746 [0005]  gcc[31949]                0.153      0.078      0.022
    ...

   Times are in msec.usec.

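   As a rough illustration of the column layout above, here is a small Python
   sketch (a hypothetical helper, not part of perf) that splits one timehist
   output line into named fields:

```python
# Hypothetical helper (not part of perf): split one 'perf sched timehist'
# line, as shown in the sample output above, into named fields.
def parse_timehist_line(line):
    parts = line.split()
    # columns: time, [cpu], task[tid/pid], wait time, sch delay, run time
    wait_ms, delay_ms, run_ms = (float(v) for v in parts[3:6])
    return {
        "time": float(parts[0]),           # event timestamp
        "cpu": int(parts[1].strip("[]")),  # CPU the event occurred on
        "task": parts[2],                  # task name[tid] or <idle>
        "wait": wait_ms,                   # msec between sched-out and sched-in
        "sch_delay": delay_ms,             # msec between runnable and running
        "run": run_ms,                     # msec spent on the CPU
    }

row = parse_timehist_line(
    "79371.874569 [0011]  gcc[31949]                0.014      0.000      1.148")
print(row["cpu"], row["task"], row["run"])
```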
   'perf sched stats {record | report | diff} <command>' to capture schedstat
   counters, report them, and show the difference between two such reports,
   respectively. The schedstat counters are maintained by the linux kernel and
   exposed through the file ``/proc/schedstat``. These counters are enabled or
   disabled via the sysctl governed by the file ``/proc/sys/kernel/sched_schedstats``.
   They account for many scheduler events such as ``schedule()`` calls,
   load-balancing events, and ``try_to_wake_up()`` calls, among others. This is
   useful for understanding the scheduler behavior for the workload.

   Note: The tool will not give correct results if there is topological reordering
         or online/offline of CPUs in between capturing snapshots of `/proc/schedstat`.

    Example usage:
        perf sched stats record -- sleep 1
        perf sched stats report
        perf sched stats diff

   A detailed description of the schedstats can be found in the Kernel Documentation:
   https://www.kernel.org/doc/html/latest/scheduler/sched-stats.html

   The result can be interpreted as follows:

   The `perf sched stats report` starts with a description of the columns present
   in the report. These column names are given before the cpu and domain stats to
   improve the readability of the report.

   ----------------------------------------------------------------------------------------------------
   DESC                    -> Description of the field
   COUNT                   -> Value of the field
   PCT_CHANGE              -> Percent change with respect to the corresponding base value
   AVG_JIFFIES             -> Avg time in jiffies between two consecutive occurrences of the event
   ----------------------------------------------------------------------------------------------------

   Next is the total profiling time in terms of jiffies:

   ----------------------------------------------------------------------------------------------------
   Time elapsed (in jiffies)                                   :        2323
   ----------------------------------------------------------------------------------------------------

   Next are the CPU scheduling statistics. These are simple diffs of the
   /proc/schedstat CPU lines, along with a description. The report also prints
   the % change relative to the base stat.

   In the example below, schedule() left CPU0 idle 36.58% of the time; 0.45% of
   the total try_to_wake_up() calls were to wake up the local CPU; and the total
   wait time by tasks on CPU0 is 48.70% of the total runtime by tasks on the
   same CPU.

   ----------------------------------------------------------------------------------------------------
   CPU 0
   ----------------------------------------------------------------------------------------------------
   DESC                                                                     COUNT   PCT_CHANGE
   ----------------------------------------------------------------------------------------------------
   yld_count                                                        :           0
   array_exp                                                        :           0
   sched_count                                                      :      402267
   sched_goidle                                                     :      147161  (    36.58% )
   ttwu_count                                                       :      236309
   ttwu_local                                                       :        1062  (     0.45% )
   rq_cpu_time                                                      :  7083791148
   run_delay                                                        :  3449973971  (    48.70% )
   pcount                                                           :      255035
   ----------------------------------------------------------------------------------------------------
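
   The percentages quoted above are plain ratios of the raw counters in the
   table; a minimal Python sketch reproducing them (counter values copied from
   the report above):

```python
# Reproduce the report's percentages from the raw CPU 0 counters above.
sched_count  = 402267      # schedule() calls
sched_goidle = 147161      # schedule() calls that left the CPU idle
ttwu_count   = 236309      # try_to_wake_up() calls
ttwu_local   = 1062        # wakeups of tasks on the local CPU
rq_cpu_time  = 7083791148  # total runtime by tasks on this CPU
run_delay    = 3449973971  # total wait time by tasks on this CPU

print(f"idle after schedule(): {100 * sched_goidle / sched_count:.2f}%")  # 36.58%
print(f"local wakeups:         {100 * ttwu_local / ttwu_count:.2f}%")     # 0.45%
print(f"wait time vs runtime:  {100 * run_delay / rq_cpu_time:.2f}%")     # 48.70%
```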

   Next are the load balancing statistics. For each of the sched domains
   (e.g. `SMT`, `MC`, `DIE`...), the scheduler computes statistics under
   the following three categories:

   1) Idle Load Balance: Load balancing performed on behalf of a long
                         idling CPU by some other CPU.
   2) Busy Load Balance: Load balancing performed when the CPU was busy.
   3) New Idle Balance : Load balancing performed when a CPU just became
                         idle.

   Under each of these three categories, the sched stats report provides
   different load balancing statistics. Along with the direct stats, the
   report also contains derived metrics prefixed with *. Example:

   ----------------------------------------------------------------------------------------------------
   CPU 0, DOMAIN SMT CPUS 0,64
   ----------------------------------------------------------------------------------------------------
   DESC                                                                     COUNT    AVG_JIFFIES
   ----------------------------------------- <Category busy> ------------------------------------------
   busy_lb_count                                                    :         136  $       17.08 $
   busy_lb_balanced                                                 :         131  $       17.73 $
   busy_lb_failed                                                   :           0  $        0.00 $
   busy_lb_imbalance_load                                           :          58
   busy_lb_imbalance_util                                           :           0
   busy_lb_imbalance_task                                           :           0
   busy_lb_imbalance_misfit                                         :           0
   busy_lb_gained                                                   :           7
   busy_lb_hot_gained                                               :           0
   busy_lb_nobusyq                                                  :           2  $     1161.50 $
   busy_lb_nobusyg                                                  :         129  $       18.01 $
   *busy_lb_success_count                                           :           5
   *busy_lb_avg_pulled                                              :        1.40
   ----------------------------------------- <Category idle> ------------------------------------------
   idle_lb_count                                                    :         449  $        5.17 $
   idle_lb_balanced                                                 :         382  $        6.08 $
   idle_lb_failed                                                   :           3  $      774.33 $
   idle_lb_imbalance_load                                           :           0
   idle_lb_imbalance_util                                           :           0
   idle_lb_imbalance_task                                           :          71
   idle_lb_imbalance_misfit                                         :           0
   idle_lb_gained                                                   :          67
   idle_lb_hot_gained                                               :           0
   idle_lb_nobusyq                                                  :           0  $        0.00 $
   idle_lb_nobusyg                                                  :         382  $        6.08 $
   *idle_lb_success_count                                           :          64
   *idle_lb_avg_pulled                                              :        1.05
   ---------------------------------------- <Category newidle> ----------------------------------------
   newidle_lb_count                                                 :       30471  $        0.08 $
   newidle_lb_balanced                                              :       28490  $        0.08 $
   newidle_lb_failed                                                :         633  $        3.67 $
   newidle_lb_imbalance_load                                        :           0
   newidle_lb_imbalance_util                                        :           0
   newidle_lb_imbalance_task                                        :        2040
   newidle_lb_imbalance_misfit                                      :           0
   newidle_lb_gained                                                :        1348
   newidle_lb_hot_gained                                            :           0
   newidle_lb_nobusyq                                               :           6  $      387.17 $
   newidle_lb_nobusyg                                               :       26634  $        0.09 $
   *newidle_lb_success_count                                        :        1348
   *newidle_lb_avg_pulled                                           :        1.00
   ----------------------------------------------------------------------------------------------------


   Consider the following line:

   newidle_lb_balanced                                              :       28490  $        0.08 $

   While profiling was active, the load balancer found 28490 times that the
   load needed to be balanced on a newly idle CPU 0. The value enclosed in $
   is the average number of jiffies between two such events (2323 / 28490 = 0.08).
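
   The AVG_JIFFIES arithmetic can be reproduced directly; a minimal Python
   sketch with the numbers above:

```python
# AVG_JIFFIES is simply elapsed jiffies divided by the event count,
# using the values from the report above.
time_elapsed = 2323    # "Time elapsed (in jiffies)" from the report
event_count  = 28490   # newidle_lb_balanced

avg_jiffies = time_elapsed / event_count
print(f"$ {avg_jiffies:.2f} $")  # matches the $ 0.08 $ shown above
```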

   Next are the active_load_balance() stats. alb did not trigger while
   profiling was active, hence all counts are 0.

   --------------------------------- <Category active_load_balance()> ---------------------------------
   alb_count                                                        :           0
   alb_failed                                                       :           0
   alb_pushed                                                       :           0
   ----------------------------------------------------------------------------------------------------

   Next are the sched_balance_exec() and sched_balance_fork() stats. They are
   not used, but were kept in the RFC just for legacy purposes. Unless opposed,
   we plan to remove them in a future revision.

   Next are the wakeup statistics. For every domain, the report also shows
   task-wakeup statistics. Example:

   ------------------------------------------ <Wakeup Info> -------------------------------------------
   ttwu_wake_remote                                                 :        1590
   ttwu_move_affine                                                 :          84
   ttwu_move_balance                                                :           0
   ----------------------------------------------------------------------------------------------------

   The same set of stats is reported for each CPU and each domain level.

   How to interpret the diff
   ~~~~~~~~~~~~~~~~~~~~~~~~~

   The `perf sched stats diff` also starts by explaining the columns present
   in the diff. It then shows the diff in elapsed time in terms of jiffies.
   The order of the values depends on the order of the input data files: it
   takes `perf.data.old` and `perf.data`, respectively, as the defaults for
   comparison. Example:

   ----------------------------------------------------------------------------------------------------
   Time elapsed (in jiffies)                                        :        2009,       2001
   ----------------------------------------------------------------------------------------------------

   Below is a sample representing the difference in the cpu and domain stats
   of two runs. The third column, with values enclosed in `|...|`, shows the
   percent change between the two. The second and fourth columns show the
   side-by-side representations of the corresponding fields from `perf sched
   stats report`.
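
   The PCT_CHANGE column is the usual relative change against the first (base)
   value; a small Python sketch using the sched_count and ttwu_count values
   from the diff output below (the zero-base guard mirrors the 0.00 printed
   for counters whose base value is 0):

```python
# Percent change relative to the first (base) value, as in the
# PCT_CHANGE column of 'perf sched stats diff'. A zero base is shown
# as 0.00 in the sample output, so guard for it here as well.
def pct_change(base, new):
    return 0.0 if base == 0 else 100.0 * (new - base) / base

print(f"sched_count: {pct_change(528533, 412573):.2f}")  # -21.94
print(f"ttwu_count:  {pct_change(313134, 385975):.2f}")  # 23.26
```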

   ----------------------------------------------------------------------------------------------------
   CPU <ALL CPUS SUMMARY>
   ----------------------------------------------------------------------------------------------------
   DESC                                                                    COUNT1      COUNT2   PCT_CHANG>
   ----------------------------------------------------------------------------------------------------
   yld_count                                                        :           0,          0  |     0.00>
   array_exp                                                        :           0,          0  |     0.00>
   sched_count                                                      :      528533,     412573  |   -21.94>
   sched_goidle                                                     :      193426,     146082  |   -24.48>
   ttwu_count                                                       :      313134,     385975  |    23.26>
   ttwu_local                                                       :        1126,       1282  |    13.85>
   rq_cpu_time                                                      :  8257200244, 8301250047  |     0.53>
   run_delay                                                        :  4728347053, 3997100703  |   -15.47>
   pcount                                                           :      335031,     266396  |   -20.49>
   ----------------------------------------------------------------------------------------------------

   Below is a sample of the domain stats diff:

   ----------------------------------------------------------------------------------------------------
   CPU <ALL CPUS SUMMARY>, DOMAIN SMT
   ----------------------------------------------------------------------------------------------------
   DESC                                                                    COUNT1      COUNT2   PCT_CHANG>
   ----------------------------------------- <Category busy> ------------------------------------------
   busy_lb_count                                                    :         122,         80  |   -34.43>
   busy_lb_balanced                                                 :         115,         76  |   -33.91>
   busy_lb_failed                                                   :           1,          3  |   200.00>
   busy_lb_imbalance_load                                           :          35,         49  |    40.00>
   busy_lb_imbalance_util                                           :           0,          0  |     0.00>
   busy_lb_imbalance_task                                           :           0,          0  |     0.00>
   busy_lb_imbalance_misfit                                         :           0,          0  |     0.00>
   busy_lb_gained                                                   :           7,          2  |   -71.43>
   busy_lb_hot_gained                                               :           0,          0  |     0.00>
   busy_lb_nobusyq                                                  :           0,          0  |     0.00>
   busy_lb_nobusyg                                                  :         115,         76  |   -33.91>
   *busy_lb_success_count                                           :           6,          1  |   -83.33>
   *busy_lb_avg_pulled                                              :        1.17,       2.00  |    71.43>
   ----------------------------------------- <Category idle> ------------------------------------------
   idle_lb_count                                                    :         568,        620  |     9.15>
   idle_lb_balanced                                                 :         462,        449  |    -2.81>
   idle_lb_failed                                                   :          11,         21  |    90.91>
   idle_lb_imbalance_load                                           :           0,          0  |     0.00>
   idle_lb_imbalance_util                                           :           0,          0  |     0.00>
   idle_lb_imbalance_task                                           :         115,        189  |    64.35>
   idle_lb_imbalance_misfit                                         :           0,          0  |     0.00>
   idle_lb_gained                                                   :         103,        169  |    64.08>
   idle_lb_hot_gained                                               :           0,          0  |     0.00>
   idle_lb_nobusyq                                                  :           0,          0  |     0.00>
   idle_lb_nobusyg                                                  :         462,        449  |    -2.81>
   *idle_lb_success_count                                           :          95,        150  |    57.89>
   *idle_lb_avg_pulled                                              :        1.08,       1.13  |     3.92>
   ---------------------------------------- <Category newidle> ----------------------------------------
   newidle_lb_count                                                 :       16961,       3155  |   -81.40>
   newidle_lb_balanced                                              :       15646,       2556  |   -83.66>
   newidle_lb_failed                                                :         397,        142  |   -64.23>
   newidle_lb_imbalance_load                                        :           0,          0  |     0.00>
   newidle_lb_imbalance_util                                        :           0,          0  |     0.00>
   newidle_lb_imbalance_task                                        :        1376,        655  |   -52.40>
   newidle_lb_imbalance_misfit                                      :           0,          0  |     0.00>
   newidle_lb_gained                                                :         917,        457  |   -50.16>
   newidle_lb_hot_gained                                            :           0,          0  |     0.00>
   newidle_lb_nobusyq                                               :           3,          1  |   -66.67>
   newidle_lb_nobusyg                                               :       14480,       2103  |   -85.48>
   *newidle_lb_success_count                                        :         918,        457  |   -50.22>
   *newidle_lb_avg_pulled                                           :        1.00,       1.00  |     0.11>
   --------------------------------- <Category active_load_balance()> ---------------------------------
   alb_count                                                        :           0,          1  |     0.00>
   alb_failed                                                       :           0,          0  |     0.00>
   alb_pushed                                                       :           0,          1  |     0.00>
   --------------------------------- <Category sched_balance_exec()> ----------------------------------
   sbe_count                                                        :           0,          0  |     0.00>
   sbe_balanced                                                     :           0,          0  |     0.00>
   sbe_pushed                                                       :           0,          0  |     0.00>
   --------------------------------- <Category sched_balance_fork()> ----------------------------------
   sbf_count                                                        :           0,          0  |     0.00>
   sbf_balanced                                                     :           0,          0  |     0.00>
   sbf_pushed                                                       :           0,          0  |     0.00>
   ------------------------------------------ <Wakeup Info> -------------------------------------------
   ttwu_wake_remote                                                 :        2031,       2914  |    43.48>
   ttwu_move_affine                                                 :          73,        124  |    69.86>
   ttwu_move_balance                                                :           0,          0  |     0.00>
   ----------------------------------------------------------------------------------------------------

OPTIONS
-------
Applicable to {record|latency|map|replay|script}

-i::
--input=<file>::
        Input file name. (default: perf.data unless stdin is a fifo)

-v::
--verbose::
        Be more verbose. (show symbol address, etc)

-D::
--dump-raw-trace=::
        Display verbose dump of the sched data.

-f::
--force::
	Don't complain, do it.

OPTIONS for 'perf sched latency'
-------------------------------

-C::
--CPU <n>::
        CPU to profile on.

-p::
--pids::
        Report latency stats per pid instead of per command name.

-s::
--sort <key[,key2...]>::
        Sort by key(s): runtime, switch, avg, max.
        By default it's sorted by "avg, max, switch, runtime".

OPTIONS for 'perf sched map'
----------------------------

--compact::
	Show only CPUs with activity. Helps visualizing on high core
	count systems.

--cpus::
	Show just entries with activities for the given CPUs.

--color-cpus::
	Highlight the given cpus.

--color-pids::
	Highlight the given pids.

--task-name <task>::
	Map output only for the given task name(s). Separate the
	task names with a comma (without whitespace). The sched-out
	time is printed and is represented by '*-' for the given
	task name(s).
	('-' indicates other tasks while '.' is idle).

--fuzzy-name::
	Given task name(s) can be partially matched (fuzzy matching).

OPTIONS for 'perf sched timehist'
---------------------------------
-k::
--vmlinux=<file>::
    vmlinux pathname

--kallsyms=<file>::
    kallsyms pathname

-g::
--call-graph::
	Display call chains if present (default on).

--max-stack::
	Maximum number of functions to display in backtrace, default 5.

-C=::
--cpu=::
	Only show events for the given CPU(s) (comma separated list).

-p=::
--pid=::
	Only show events for given process ID (comma separated list).

-t=::
--tid=::
	Only show events for given thread ID (comma separated list).

-s::
--summary::
    Show only a summary of scheduling by thread with min, max, and average
    run times (in sec) and relative stddev.

-S::
--with-summary::
    Show all scheduling events followed by a summary by thread with min,
    max, and average run times (in sec) and relative stddev.

--symfs=<directory>::
    Look for files with symbols relative to this directory.

-V::
--cpu-visual::
	Show visual aid for sched switches by CPU: 'i' marks idle time,
	's' are scheduler events.

-w::
--wakeups::
	Show wakeup events.

-M::
--migrations::
	Show migration events.

-n::
--next::
	Show next task.

-I::
--idle-hist::
	Show idle-related events only.

--time::
	Only analyze samples within given time window: <start>,<stop>. Times
	have the format seconds.microseconds. If start is not given (i.e., time
	string is ',x.y') then analysis starts at the beginning of the file. If
	stop time is not given (i.e., time string is 'x.y,') then analysis goes
	to the end of the file.
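
	The accepted window forms can be sketched with a small parser (a
	hypothetical illustration, not perf's own implementation; None stands
	for "beginning of file" or "end of file"):

```python
# Hypothetical parser (illustration only, not perf's implementation)
# for the --time window syntax: "<start>,<stop>", ",<stop>" or "<start>,".
# None stands for "beginning of file" / "end of file".
def parse_time_window(spec):
    start, stop = spec.split(",")
    return (float(start) if start else None,
            float(stop) if stop else None)

print(parse_time_window("1234.5,4567.8"))  # (1234.5, 4567.8)
print(parse_time_window(",4567.8"))        # (None, 4567.8)
print(parse_time_window("1234.5,"))        # (1234.5, None)
```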

--state::
	Show task state when it switched out.

--show-prio::
	Show task priority.

--prio::
	Only show events for given task priority(ies). Multiple priorities can be
	provided as a comma-separated list with no spaces: 0,120. Ranges of
	priorities are specified with -: 120-129. A combination of both can also be
	provided: 0,120-129.
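
	A hypothetical parser (illustration only) expanding such a priority
	specification into the set of matching priorities:

```python
# Hypothetical expansion (illustration only) of a --prio specification
# such as "0,120-129" into the sorted list of matching priorities.
def parse_prio(spec):
    prios = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = map(int, part.split("-"))
            prios.update(range(lo, hi + 1))  # inclusive range, e.g. 120-129
        else:
            prios.add(int(part))
    return sorted(prios)

print(parse_prio("0,120-129"))  # 0 plus every priority from 120 to 129
```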

-P::
--pre-migrations::
	Show pre-migration wait time. Pre-migration wait time is the time spent
	by a task waiting on a runqueue but not getting the chance to run there,
	before being migrated to a different runqueue where it finally runs. This
	time between sched_wakeup and migrate_task is the pre-migration wait
	time.

OPTIONS for 'perf sched replay'
------------------------------

-r::
--repeat <n>::
	Repeat the workload n times (0: infinite). Default is 10.

SEE ALSO
--------
linkperf:perf-record[1]