xref: /linux/Documentation/trace/timerlat-tracer.rst (revision cdd5b5a9761fd66d17586e4f4ba6588c70e640ea)
1a955d7eaSDaniel Bristot de Oliveira###############
2a955d7eaSDaniel Bristot de OliveiraTimerlat tracer
3a955d7eaSDaniel Bristot de Oliveira###############
4a955d7eaSDaniel Bristot de Oliveira
5a955d7eaSDaniel Bristot de OliveiraThe timerlat tracer aims to help the preemptive kernel developers to
64d4eac7bSDaniel Bristot de Oliveirafind sources of wakeup latencies of real-time threads. Like cyclictest,
7a955d7eaSDaniel Bristot de Oliveirathe tracer sets a periodic timer that wakes up a thread. The thread then
8a955d7eaSDaniel Bristot de Oliveiracomputes a *wakeup latency* value as the difference between the *current
9a955d7eaSDaniel Bristot de Oliveiratime* and the *absolute time* that the timer was set to expire. The main
10a955d7eaSDaniel Bristot de Oliveiragoal of timerlat is tracing in such a way to help kernel developers.
11a955d7eaSDaniel Bristot de Oliveira
12a955d7eaSDaniel Bristot de OliveiraUsage
13a955d7eaSDaniel Bristot de Oliveira-----
14a955d7eaSDaniel Bristot de Oliveira
15a955d7eaSDaniel Bristot de OliveiraWrite the ASCII text "timerlat" into the current_tracer file of the
16a955d7eaSDaniel Bristot de Oliveiratracing system (generally mounted at /sys/kernel/tracing).
17a955d7eaSDaniel Bristot de Oliveira
18a955d7eaSDaniel Bristot de OliveiraFor example::
19a955d7eaSDaniel Bristot de Oliveira
20a955d7eaSDaniel Bristot de Oliveira        [root@f32 ~]# cd /sys/kernel/tracing/
21a955d7eaSDaniel Bristot de Oliveira        [root@f32 tracing]# echo timerlat > current_tracer
22a955d7eaSDaniel Bristot de Oliveira
237f77ebbfSAkhil RajIt is possible to follow the trace by reading the trace file::
24a955d7eaSDaniel Bristot de Oliveira
25a955d7eaSDaniel Bristot de Oliveira  [root@f32 tracing]# cat trace
26a955d7eaSDaniel Bristot de Oliveira  # tracer: timerlat
27a955d7eaSDaniel Bristot de Oliveira  #
28a955d7eaSDaniel Bristot de Oliveira  #                              _-----=> irqs-off
29a955d7eaSDaniel Bristot de Oliveira  #                             / _----=> need-resched
30a955d7eaSDaniel Bristot de Oliveira  #                            | / _---=> hardirq/softirq
31a955d7eaSDaniel Bristot de Oliveira  #                            || / _--=> preempt-depth
32a955d7eaSDaniel Bristot de Oliveira  #                            || /
33a955d7eaSDaniel Bristot de Oliveira  #                            ||||             ACTIVATION
34a955d7eaSDaniel Bristot de Oliveira  #         TASK-PID      CPU# ||||   TIMESTAMP    ID            CONTEXT                LATENCY
35a955d7eaSDaniel Bristot de Oliveira  #            | |         |   ||||      |         |                  |                       |
36a955d7eaSDaniel Bristot de Oliveira          <idle>-0       [000] d.h1    54.029328: #1     context    irq timer_latency       932 ns
37a955d7eaSDaniel Bristot de Oliveira           <...>-867     [000] ....    54.029339: #1     context thread timer_latency     11700 ns
38a955d7eaSDaniel Bristot de Oliveira          <idle>-0       [001] dNh1    54.029346: #1     context    irq timer_latency      2833 ns
39a955d7eaSDaniel Bristot de Oliveira           <...>-868     [001] ....    54.029353: #1     context thread timer_latency      9820 ns
40a955d7eaSDaniel Bristot de Oliveira          <idle>-0       [000] d.h1    54.030328: #2     context    irq timer_latency       769 ns
41a955d7eaSDaniel Bristot de Oliveira           <...>-867     [000] ....    54.030330: #2     context thread timer_latency      3070 ns
42a955d7eaSDaniel Bristot de Oliveira          <idle>-0       [001] d.h1    54.030344: #2     context    irq timer_latency       935 ns
43a955d7eaSDaniel Bristot de Oliveira           <...>-868     [001] ....    54.030347: #2     context thread timer_latency      4351 ns
44a955d7eaSDaniel Bristot de Oliveira
45a955d7eaSDaniel Bristot de Oliveira
46a955d7eaSDaniel Bristot de OliveiraThe tracer creates a per-cpu kernel thread with real-time priority that
47a955d7eaSDaniel Bristot de Oliveiraprints two lines at every activation. The first is the *timer latency*
48a955d7eaSDaniel Bristot de Oliveiraobserved at the *hardirq* context before the activation of the thread.
49a955d7eaSDaniel Bristot de OliveiraThe second is the *timer latency* observed by the thread. The ACTIVATION
50a955d7eaSDaniel Bristot de OliveiraID field serves to relate the *irq* execution to its respective *thread*
51a955d7eaSDaniel Bristot de Oliveiraexecution.
52a955d7eaSDaniel Bristot de Oliveira
534d4eac7bSDaniel Bristot de OliveiraThe *irq*/*thread* splitting is important to clarify in which context
54a955d7eaSDaniel Bristot de Oliveirathe unexpected high value is coming from. The *irq* context can be
554d4eac7bSDaniel Bristot de Oliveiradelayed by hardware-related actions, such as SMIs, NMIs, IRQs,
564d4eac7bSDaniel Bristot de Oliveiraor by thread masking interrupts. Once the timer happens, the delay
57a955d7eaSDaniel Bristot de Oliveiracan also be influenced by blocking caused by threads. For example, by
584d4eac7bSDaniel Bristot de Oliveirapostponing the scheduler execution via preempt_disable(), scheduler
594d4eac7bSDaniel Bristot de Oliveiraexecution, or masking interrupts. Threads can also be delayed by the
604d4eac7bSDaniel Bristot de Oliveirainterference from other threads and IRQs.
61a955d7eaSDaniel Bristot de Oliveira
62a955d7eaSDaniel Bristot de OliveiraTracer options
63a955d7eaSDaniel Bristot de Oliveira---------------------
64a955d7eaSDaniel Bristot de Oliveira
65a955d7eaSDaniel Bristot de OliveiraThe timerlat tracer is built on top of osnoise tracer.
66a955d7eaSDaniel Bristot de OliveiraSo its configuration is also done in the osnoise/ config
67a955d7eaSDaniel Bristot de Oliveiradirectory. The timerlat configs are:
68a955d7eaSDaniel Bristot de Oliveira
69a955d7eaSDaniel Bristot de Oliveira - cpus: CPUs at which a timerlat thread will execute.
70a955d7eaSDaniel Bristot de Oliveira - timerlat_period_us: the period of the timerlat thread.
714d4eac7bSDaniel Bristot de Oliveira - stop_tracing_us: stop the system tracing if a
72a955d7eaSDaniel Bristot de Oliveira   timer latency at the *irq* context higher than the configured
73a955d7eaSDaniel Bristot de Oliveira   value happens. Writing 0 disables this option.
74a955d7eaSDaniel Bristot de Oliveira - stop_tracing_total_us: stop the system tracing if a
754d4eac7bSDaniel Bristot de Oliveira   timer latency at the *thread* context is higher than the configured
76a955d7eaSDaniel Bristot de Oliveira   value happens. Writing 0 disables this option.
774dd2aea2SDaniel Bristot de Oliveira - print_stack: save the stack of the IRQ occurrence. The stack is printed
784dd2aea2SDaniel Bristot de Oliveira   after the *thread context* event, or at the IRQ handler if *stop_tracing_us*
794dd2aea2SDaniel Bristot de Oliveira   is hit.
80a955d7eaSDaniel Bristot de Oliveira
81a955d7eaSDaniel Bristot de Oliveiratimerlat and osnoise
82a955d7eaSDaniel Bristot de Oliveira----------------------------
83a955d7eaSDaniel Bristot de Oliveira
84a955d7eaSDaniel Bristot de OliveiraThe timerlat can also take advantage of the osnoise: traceevents.
85a955d7eaSDaniel Bristot de OliveiraFor example::
86a955d7eaSDaniel Bristot de Oliveira
87a955d7eaSDaniel Bristot de Oliveira        [root@f32 ~]# cd /sys/kernel/tracing/
88a955d7eaSDaniel Bristot de Oliveira        [root@f32 tracing]# echo timerlat > current_tracer
89a955d7eaSDaniel Bristot de Oliveira        [root@f32 tracing]# echo 1 > events/osnoise/enable
90a955d7eaSDaniel Bristot de Oliveira        [root@f32 tracing]# echo 25 > osnoise/stop_tracing_total_us
91a955d7eaSDaniel Bristot de Oliveira        [root@f32 tracing]# tail -10 trace
92a955d7eaSDaniel Bristot de Oliveira             cc1-87882   [005] d..h...   548.771078: #402268 context    irq timer_latency     13585 ns
93a955d7eaSDaniel Bristot de Oliveira             cc1-87882   [005] dNLh1..   548.771082: irq_noise: local_timer:236 start 548.771077442 duration 7597 ns
94a955d7eaSDaniel Bristot de Oliveira             cc1-87882   [005] dNLh2..   548.771099: irq_noise: qxl:21 start 548.771085017 duration 7139 ns
95a955d7eaSDaniel Bristot de Oliveira             cc1-87882   [005] d...3..   548.771102: thread_noise:      cc1:87882 start 548.771078243 duration 9909 ns
96a955d7eaSDaniel Bristot de Oliveira      timerlat/5-1035    [005] .......   548.771104: #402268 context thread timer_latency     39960 ns
97a955d7eaSDaniel Bristot de Oliveira
98a955d7eaSDaniel Bristot de OliveiraIn this case, the root cause of the timer latency does not point to a
994d4eac7bSDaniel Bristot de Oliveirasingle cause but to multiple ones. Firstly, the timer IRQ was delayed
100a955d7eaSDaniel Bristot de Oliveirafor 13 us, which may point to a long IRQ disabled section (see IRQ
101a955d7eaSDaniel Bristot de Oliveirastacktrace section). Then the timer interrupt that wakes up the timerlat
102a955d7eaSDaniel Bristot de Oliveirathread took 7597 ns, and the qxl:21 device IRQ took 7139 ns. Finally,
103a955d7eaSDaniel Bristot de Oliveirathe cc1 thread noise took 9909 ns of time before the context switch.
104a955d7eaSDaniel Bristot de OliveiraSuch pieces of evidence are useful for the developer to use other
105a955d7eaSDaniel Bristot de Oliveiratracing methods to figure out how to debug and optimize the system.
106a955d7eaSDaniel Bristot de Oliveira
107a955d7eaSDaniel Bristot de OliveiraIt is worth mentioning that the *duration* values reported
108a955d7eaSDaniel Bristot de Oliveiraby the osnoise: events are *net* values. For example, the
109a955d7eaSDaniel Bristot de Oliveirathread_noise does not include the duration of the overhead caused
110a955d7eaSDaniel Bristot de Oliveiraby the IRQ execution (which indeed accounted for 12736 ns). But
111a955d7eaSDaniel Bristot de Oliveirathe values reported by the timerlat tracer (timerlat_latency)
112a955d7eaSDaniel Bristot de Oliveiraare *gross* values.
113a955d7eaSDaniel Bristot de Oliveira
114a955d7eaSDaniel Bristot de OliveiraThe art below illustrates a CPU timeline and how the timerlat tracer
115a955d7eaSDaniel Bristot de Oliveiraobserves it at the top and the osnoise: events at the bottom. Each "-"
116a955d7eaSDaniel Bristot de Oliveirain the timelines means circa 1 us, and the time moves ==>::
117a955d7eaSDaniel Bristot de Oliveira
118a955d7eaSDaniel Bristot de Oliveira      External     timer irq                   thread
119a955d7eaSDaniel Bristot de Oliveira       clock        latency                    latency
120a955d7eaSDaniel Bristot de Oliveira       event        13585 ns                   39960 ns
121a955d7eaSDaniel Bristot de Oliveira         |             ^                         ^
122a955d7eaSDaniel Bristot de Oliveira         v             |                         |
123a955d7eaSDaniel Bristot de Oliveira         |-------------|                         |
124a955d7eaSDaniel Bristot de Oliveira         |-------------+-------------------------|
125a955d7eaSDaniel Bristot de Oliveira                       ^                         ^
126a955d7eaSDaniel Bristot de Oliveira  ========================================================================
127a955d7eaSDaniel Bristot de Oliveira                    [tmr irq]  [dev irq]
128a955d7eaSDaniel Bristot de Oliveira  [another thread...^       v..^       v.......][timerlat/ thread]  <-- CPU timeline
129a955d7eaSDaniel Bristot de Oliveira  =========================================================================
130a955d7eaSDaniel Bristot de Oliveira                    |-------|  |-------|
131a955d7eaSDaniel Bristot de Oliveira                            |--^       v-------|
132a955d7eaSDaniel Bristot de Oliveira                            |          |       |
133a955d7eaSDaniel Bristot de Oliveira                            |          |       + thread_noise: 9909 ns
134a955d7eaSDaniel Bristot de Oliveira                            |          +-> irq_noise: 6139 ns
135a955d7eaSDaniel Bristot de Oliveira                            +-> irq_noise: 7597 ns
136a955d7eaSDaniel Bristot de Oliveira
137a955d7eaSDaniel Bristot de OliveiraIRQ stacktrace
138a955d7eaSDaniel Bristot de Oliveira---------------------------
139a955d7eaSDaniel Bristot de Oliveira
140a955d7eaSDaniel Bristot de OliveiraThe osnoise/print_stack option is helpful for the cases in which a thread
141a955d7eaSDaniel Bristot de Oliveiranoise causes the major factor for the timer latency, because of preempt or
142a955d7eaSDaniel Bristot de Oliveirairq disabled. For example::
143a955d7eaSDaniel Bristot de Oliveira
144a955d7eaSDaniel Bristot de Oliveira        [root@f32 tracing]# echo 500 > osnoise/stop_tracing_total_us
145a955d7eaSDaniel Bristot de Oliveira        [root@f32 tracing]# echo 500 > osnoise/print_stack
146a955d7eaSDaniel Bristot de Oliveira        [root@f32 tracing]# echo timerlat > current_tracer
147a955d7eaSDaniel Bristot de Oliveira        [root@f32 tracing]# tail -21 per_cpu/cpu7/trace
148a955d7eaSDaniel Bristot de Oliveira          insmod-1026    [007] dN.h1..   200.201948: irq_noise: local_timer:236 start 200.201939376 duration 7872 ns
149a955d7eaSDaniel Bristot de Oliveira          insmod-1026    [007] d..h1..   200.202587: #29800 context    irq timer_latency      1616 ns
150a955d7eaSDaniel Bristot de Oliveira          insmod-1026    [007] dN.h2..   200.202598: irq_noise: local_timer:236 start 200.202586162 duration 11855 ns
151a955d7eaSDaniel Bristot de Oliveira          insmod-1026    [007] dN.h3..   200.202947: irq_noise: local_timer:236 start 200.202939174 duration 7318 ns
152a955d7eaSDaniel Bristot de Oliveira          insmod-1026    [007] d...3..   200.203444: thread_noise:   insmod:1026 start 200.202586933 duration 838681 ns
153a955d7eaSDaniel Bristot de Oliveira      timerlat/7-1001    [007] .......   200.203445: #29800 context thread timer_latency    859978 ns
154a955d7eaSDaniel Bristot de Oliveira      timerlat/7-1001    [007] ....1..   200.203446: <stack trace>
155a955d7eaSDaniel Bristot de Oliveira  => timerlat_irq
156a955d7eaSDaniel Bristot de Oliveira  => __hrtimer_run_queues
157a955d7eaSDaniel Bristot de Oliveira  => hrtimer_interrupt
158a955d7eaSDaniel Bristot de Oliveira  => __sysvec_apic_timer_interrupt
159a955d7eaSDaniel Bristot de Oliveira  => asm_call_irq_on_stack
160a955d7eaSDaniel Bristot de Oliveira  => sysvec_apic_timer_interrupt
161a955d7eaSDaniel Bristot de Oliveira  => asm_sysvec_apic_timer_interrupt
162a955d7eaSDaniel Bristot de Oliveira  => delay_tsc
163a955d7eaSDaniel Bristot de Oliveira  => dummy_load_1ms_pd_init
164a955d7eaSDaniel Bristot de Oliveira  => do_one_initcall
165a955d7eaSDaniel Bristot de Oliveira  => do_init_module
166a955d7eaSDaniel Bristot de Oliveira  => __do_sys_finit_module
167a955d7eaSDaniel Bristot de Oliveira  => do_syscall_64
168a955d7eaSDaniel Bristot de Oliveira  => entry_SYSCALL_64_after_hwframe
169a955d7eaSDaniel Bristot de Oliveira
170a955d7eaSDaniel Bristot de OliveiraIn this case, it is possible to see that the thread added the highest
171a955d7eaSDaniel Bristot de Oliveiracontribution to the *timer latency* and the stack trace, saved during
172a955d7eaSDaniel Bristot de Oliveirathe timerlat IRQ handler, points to a function named
173a955d7eaSDaniel Bristot de Oliveiradummy_load_1ms_pd_init, which had the following code (on purpose)::
174a955d7eaSDaniel Bristot de Oliveira
175a955d7eaSDaniel Bristot de Oliveira	static int __init dummy_load_1ms_pd_init(void)
176a955d7eaSDaniel Bristot de Oliveira	{
177a955d7eaSDaniel Bristot de Oliveira		preempt_disable();
178a955d7eaSDaniel Bristot de Oliveira		mdelay(1);
179a955d7eaSDaniel Bristot de Oliveira		preempt_enable();
180a955d7eaSDaniel Bristot de Oliveira		return 0;
181a955d7eaSDaniel Bristot de Oliveira
182a955d7eaSDaniel Bristot de Oliveira	}
183*e88ed227SDaniel Bristot de Oliveira
184*e88ed227SDaniel Bristot de OliveiraUser-space interface
185*e88ed227SDaniel Bristot de Oliveira---------------------------
186*e88ed227SDaniel Bristot de Oliveira
187*e88ed227SDaniel Bristot de OliveiraTimerlat allows user-space threads to use timerlat infra-structure to
188*e88ed227SDaniel Bristot de Oliveirameasure scheduling latency. This interface is accessible via a per-CPU
189*e88ed227SDaniel Bristot de Oliveirafile descriptor inside $tracing_dir/osnoise/per_cpu/cpu$ID/timerlat_fd.
190*e88ed227SDaniel Bristot de Oliveira
191*e88ed227SDaniel Bristot de OliveiraThis interface is accessible under the following conditions:
192*e88ed227SDaniel Bristot de Oliveira
193*e88ed227SDaniel Bristot de Oliveira - timerlat tracer is enable
194*e88ed227SDaniel Bristot de Oliveira - osnoise workload option is set to NO_OSNOISE_WORKLOAD
195*e88ed227SDaniel Bristot de Oliveira - The user-space thread is affined to a single processor
196*e88ed227SDaniel Bristot de Oliveira - The thread opens the file associated with its single processor
197*e88ed227SDaniel Bristot de Oliveira - Only one thread can access the file at a time
198*e88ed227SDaniel Bristot de Oliveira
199*e88ed227SDaniel Bristot de OliveiraThe open() syscall will fail if any of these conditions are not met.
200*e88ed227SDaniel Bristot de OliveiraAfter opening the file descriptor, the user space can read from it.
201*e88ed227SDaniel Bristot de Oliveira
202*e88ed227SDaniel Bristot de OliveiraThe read() system call will run a timerlat code that will arm the
203*e88ed227SDaniel Bristot de Oliveiratimer in the future and wait for it as the regular kernel thread does.
204*e88ed227SDaniel Bristot de Oliveira
205*e88ed227SDaniel Bristot de OliveiraWhen the timer IRQ fires, the timerlat IRQ will execute, report the
206*e88ed227SDaniel Bristot de OliveiraIRQ latency and wake up the thread waiting in the read. The thread will be
207*e88ed227SDaniel Bristot de Oliveirascheduled and report the thread latency via tracer - as for the kernel
208*e88ed227SDaniel Bristot de Oliveirathread.
209*e88ed227SDaniel Bristot de Oliveira
210*e88ed227SDaniel Bristot de OliveiraThe difference from the in-kernel timerlat is that, instead of re-arming
211*e88ed227SDaniel Bristot de Oliveirathe timer, timerlat will return to the read() system call. At this point,
212*e88ed227SDaniel Bristot de Oliveirathe user can run any code.
213*e88ed227SDaniel Bristot de Oliveira
214*e88ed227SDaniel Bristot de OliveiraIf the application rereads the file timerlat file descriptor, the tracer
215*e88ed227SDaniel Bristot de Oliveirawill report the return from user-space latency, which is the total
216*e88ed227SDaniel Bristot de Oliveiralatency. If this is the end of the work, it can be interpreted as the
217*e88ed227SDaniel Bristot de Oliveiraresponse time for the request.
218*e88ed227SDaniel Bristot de Oliveira
219*e88ed227SDaniel Bristot de OliveiraAfter reporting the total latency, timerlat will restart the cycle, arm
220*e88ed227SDaniel Bristot de Oliveiraa timer, and go to sleep for the following activation.
221*e88ed227SDaniel Bristot de Oliveira
222*e88ed227SDaniel Bristot de OliveiraIf at any time one of the conditions is broken, e.g., the thread migrates
223*e88ed227SDaniel Bristot de Oliveirawhile in user space, or the timerlat tracer is disabled, the SIG_KILL
224*e88ed227SDaniel Bristot de Oliveirasignal will be sent to the user-space thread.
225*e88ed227SDaniel Bristot de Oliveira
226*e88ed227SDaniel Bristot de OliveiraHere is an basic example of user-space code for timerlat::
227*e88ed227SDaniel Bristot de Oliveira
228*e88ed227SDaniel Bristot de Oliveira int main(void)
229*e88ed227SDaniel Bristot de Oliveira {
230*e88ed227SDaniel Bristot de Oliveira	char buffer[1024];
231*e88ed227SDaniel Bristot de Oliveira	int timerlat_fd;
232*e88ed227SDaniel Bristot de Oliveira	int retval;
233*e88ed227SDaniel Bristot de Oliveira	long cpu = 0;   /* place in CPU 0 */
234*e88ed227SDaniel Bristot de Oliveira	cpu_set_t set;
235*e88ed227SDaniel Bristot de Oliveira
236*e88ed227SDaniel Bristot de Oliveira	CPU_ZERO(&set);
237*e88ed227SDaniel Bristot de Oliveira	CPU_SET(cpu, &set);
238*e88ed227SDaniel Bristot de Oliveira
239*e88ed227SDaniel Bristot de Oliveira	if (sched_setaffinity(gettid(), sizeof(set), &set) == -1)
240*e88ed227SDaniel Bristot de Oliveira		return 1;
241*e88ed227SDaniel Bristot de Oliveira
242*e88ed227SDaniel Bristot de Oliveira	snprintf(buffer, sizeof(buffer),
243*e88ed227SDaniel Bristot de Oliveira		"/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd",
244*e88ed227SDaniel Bristot de Oliveira		cpu);
245*e88ed227SDaniel Bristot de Oliveira
246*e88ed227SDaniel Bristot de Oliveira	timerlat_fd = open(buffer, O_RDONLY);
247*e88ed227SDaniel Bristot de Oliveira	if (timerlat_fd < 0) {
248*e88ed227SDaniel Bristot de Oliveira		printf("error opening %s: %s\n", buffer, strerror(errno));
249*e88ed227SDaniel Bristot de Oliveira		exit(1);
250*e88ed227SDaniel Bristot de Oliveira	}
251*e88ed227SDaniel Bristot de Oliveira
252*e88ed227SDaniel Bristot de Oliveira	for (;;) {
253*e88ed227SDaniel Bristot de Oliveira		retval = read(timerlat_fd, buffer, 1024);
254*e88ed227SDaniel Bristot de Oliveira		if (retval < 0)
255*e88ed227SDaniel Bristot de Oliveira			break;
256*e88ed227SDaniel Bristot de Oliveira	}
257*e88ed227SDaniel Bristot de Oliveira
258*e88ed227SDaniel Bristot de Oliveira	close(timerlat_fd);
259*e88ed227SDaniel Bristot de Oliveira	exit(0);
260*e88ed227SDaniel Bristot de Oliveira }
261