1a955d7eaSDaniel Bristot de Oliveira############### 2a955d7eaSDaniel Bristot de OliveiraTimerlat tracer 3a955d7eaSDaniel Bristot de Oliveira############### 4a955d7eaSDaniel Bristot de Oliveira 5a955d7eaSDaniel Bristot de OliveiraThe timerlat tracer aims to help the preemptive kernel developers to 64d4eac7bSDaniel Bristot de Oliveirafind sources of wakeup latencies of real-time threads. Like cyclictest, 7a955d7eaSDaniel Bristot de Oliveirathe tracer sets a periodic timer that wakes up a thread. The thread then 8a955d7eaSDaniel Bristot de Oliveiracomputes a *wakeup latency* value as the difference between the *current 9a955d7eaSDaniel Bristot de Oliveiratime* and the *absolute time* that the timer was set to expire. The main 10a955d7eaSDaniel Bristot de Oliveiragoal of timerlat is tracing in such a way to help kernel developers. 11a955d7eaSDaniel Bristot de Oliveira 12a955d7eaSDaniel Bristot de OliveiraUsage 13a955d7eaSDaniel Bristot de Oliveira----- 14a955d7eaSDaniel Bristot de Oliveira 15a955d7eaSDaniel Bristot de OliveiraWrite the ASCII text "timerlat" into the current_tracer file of the 16a955d7eaSDaniel Bristot de Oliveiratracing system (generally mounted at /sys/kernel/tracing). 17a955d7eaSDaniel Bristot de Oliveira 18a955d7eaSDaniel Bristot de OliveiraFor example:: 19a955d7eaSDaniel Bristot de Oliveira 20a955d7eaSDaniel Bristot de Oliveira [root@f32 ~]# cd /sys/kernel/tracing/ 21a955d7eaSDaniel Bristot de Oliveira [root@f32 tracing]# echo timerlat > current_tracer 22a955d7eaSDaniel Bristot de Oliveira 237f77ebbfSAkhil RajIt is possible to follow the trace by reading the trace file:: 24a955d7eaSDaniel Bristot de Oliveira 25a955d7eaSDaniel Bristot de Oliveira [root@f32 tracing]# cat trace 26a955d7eaSDaniel Bristot de Oliveira # tracer: timerlat 27a955d7eaSDaniel Bristot de Oliveira # 28a955d7eaSDaniel Bristot de Oliveira # _-----=> irqs-off 29a955d7eaSDaniel Bristot de Oliveira # / _----=> need-resched 30a955d7eaSDaniel Bristot de Oliveira # | / _---=> hardirq/softirq 31a955d7eaSDaniel Bristot de Oliveira # || / _--=> preempt-depth 32a955d7eaSDaniel Bristot de Oliveira # || / 33a955d7eaSDaniel Bristot de Oliveira # |||| ACTIVATION 34a955d7eaSDaniel Bristot de Oliveira # TASK-PID CPU# |||| TIMESTAMP ID CONTEXT LATENCY 35a955d7eaSDaniel Bristot de Oliveira # | | | |||| | | | | 36a955d7eaSDaniel Bristot de Oliveira <idle>-0 [000] d.h1 54.029328: #1 context irq timer_latency 932 ns 37a955d7eaSDaniel Bristot de Oliveira <...>-867 [000] .... 54.029339: #1 context thread timer_latency 11700 ns 38a955d7eaSDaniel Bristot de Oliveira <idle>-0 [001] dNh1 54.029346: #1 context irq timer_latency 2833 ns 39a955d7eaSDaniel Bristot de Oliveira <...>-868 [001] .... 54.029353: #1 context thread timer_latency 9820 ns 40a955d7eaSDaniel Bristot de Oliveira <idle>-0 [000] d.h1 54.030328: #2 context irq timer_latency 769 ns 41a955d7eaSDaniel Bristot de Oliveira <...>-867 [000] .... 54.030330: #2 context thread timer_latency 3070 ns 42a955d7eaSDaniel Bristot de Oliveira <idle>-0 [001] d.h1 54.030344: #2 context irq timer_latency 935 ns 43a955d7eaSDaniel Bristot de Oliveira <...>-868 [001] .... 54.030347: #2 context thread timer_latency 4351 ns 44a955d7eaSDaniel Bristot de Oliveira 45a955d7eaSDaniel Bristot de Oliveira 46a955d7eaSDaniel Bristot de OliveiraThe tracer creates a per-cpu kernel thread with real-time priority that 47a955d7eaSDaniel Bristot de Oliveiraprints two lines at every activation. The first is the *timer latency* 48a955d7eaSDaniel Bristot de Oliveiraobserved at the *hardirq* context before the activation of the thread. 49a955d7eaSDaniel Bristot de OliveiraThe second is the *timer latency* observed by the thread. The ACTIVATION 50a955d7eaSDaniel Bristot de OliveiraID field serves to relate the *irq* execution to its respective *thread* 51a955d7eaSDaniel Bristot de Oliveiraexecution. 52a955d7eaSDaniel Bristot de Oliveira 534d4eac7bSDaniel Bristot de OliveiraThe *irq*/*thread* splitting is important to clarify in which context 54a955d7eaSDaniel Bristot de Oliveirathe unexpected high value is coming from. The *irq* context can be 554d4eac7bSDaniel Bristot de Oliveiradelayed by hardware-related actions, such as SMIs, NMIs, IRQs, 564d4eac7bSDaniel Bristot de Oliveiraor by thread masking interrupts. Once the timer happens, the delay 57a955d7eaSDaniel Bristot de Oliveiracan also be influenced by blocking caused by threads. For example, by 584d4eac7bSDaniel Bristot de Oliveirapostponing the scheduler execution via preempt_disable(), scheduler 594d4eac7bSDaniel Bristot de Oliveiraexecution, or masking interrupts. Threads can also be delayed by the 604d4eac7bSDaniel Bristot de Oliveirainterference from other threads and IRQs. 61a955d7eaSDaniel Bristot de Oliveira 62a955d7eaSDaniel Bristot de OliveiraTracer options 63a955d7eaSDaniel Bristot de Oliveira--------------------- 64a955d7eaSDaniel Bristot de Oliveira 65a955d7eaSDaniel Bristot de OliveiraThe timerlat tracer is built on top of osnoise tracer. 66a955d7eaSDaniel Bristot de OliveiraSo its configuration is also done in the osnoise/ config 67a955d7eaSDaniel Bristot de Oliveiradirectory. The timerlat configs are: 68a955d7eaSDaniel Bristot de Oliveira 69a955d7eaSDaniel Bristot de Oliveira - cpus: CPUs at which a timerlat thread will execute. 70a955d7eaSDaniel Bristot de Oliveira - timerlat_period_us: the period of the timerlat thread. 714d4eac7bSDaniel Bristot de Oliveira - stop_tracing_us: stop the system tracing if a 72a955d7eaSDaniel Bristot de Oliveira timer latency at the *irq* context higher than the configured 73a955d7eaSDaniel Bristot de Oliveira value happens. Writing 0 disables this option. 74a955d7eaSDaniel Bristot de Oliveira - stop_tracing_total_us: stop the system tracing if a 754d4eac7bSDaniel Bristot de Oliveira timer latency at the *thread* context is higher than the configured 76a955d7eaSDaniel Bristot de Oliveira value happens. Writing 0 disables this option. 774dd2aea2SDaniel Bristot de Oliveira - print_stack: save the stack of the IRQ occurrence. The stack is printed 784dd2aea2SDaniel Bristot de Oliveira after the *thread context* event, or at the IRQ handler if *stop_tracing_us* 794dd2aea2SDaniel Bristot de Oliveira is hit. 80a955d7eaSDaniel Bristot de Oliveira 81a955d7eaSDaniel Bristot de Oliveiratimerlat and osnoise 82a955d7eaSDaniel Bristot de Oliveira---------------------------- 83a955d7eaSDaniel Bristot de Oliveira 84a955d7eaSDaniel Bristot de OliveiraThe timerlat can also take advantage of the osnoise: traceevents. 85a955d7eaSDaniel Bristot de OliveiraFor example:: 86a955d7eaSDaniel Bristot de Oliveira 87a955d7eaSDaniel Bristot de Oliveira [root@f32 ~]# cd /sys/kernel/tracing/ 88a955d7eaSDaniel Bristot de Oliveira [root@f32 tracing]# echo timerlat > current_tracer 89a955d7eaSDaniel Bristot de Oliveira [root@f32 tracing]# echo 1 > events/osnoise/enable 90a955d7eaSDaniel Bristot de Oliveira [root@f32 tracing]# echo 25 > osnoise/stop_tracing_total_us 91a955d7eaSDaniel Bristot de Oliveira [root@f32 tracing]# tail -10 trace 92a955d7eaSDaniel Bristot de Oliveira cc1-87882 [005] d..h... 548.771078: #402268 context irq timer_latency 13585 ns 93a955d7eaSDaniel Bristot de Oliveira cc1-87882 [005] dNLh1.. 548.771082: irq_noise: local_timer:236 start 548.771077442 duration 7597 ns 94a955d7eaSDaniel Bristot de Oliveira cc1-87882 [005] dNLh2.. 548.771099: irq_noise: qxl:21 start 548.771085017 duration 7139 ns 95a955d7eaSDaniel Bristot de Oliveira cc1-87882 [005] d...3.. 548.771102: thread_noise: cc1:87882 start 548.771078243 duration 9909 ns 96a955d7eaSDaniel Bristot de Oliveira timerlat/5-1035 [005] ....... 548.771104: #402268 context thread timer_latency 39960 ns 97a955d7eaSDaniel Bristot de Oliveira 98a955d7eaSDaniel Bristot de OliveiraIn this case, the root cause of the timer latency does not point to a 994d4eac7bSDaniel Bristot de Oliveirasingle cause but to multiple ones. Firstly, the timer IRQ was delayed 100a955d7eaSDaniel Bristot de Oliveirafor 13 us, which may point to a long IRQ disabled section (see IRQ 101a955d7eaSDaniel Bristot de Oliveirastacktrace section). Then the timer interrupt that wakes up the timerlat 102a955d7eaSDaniel Bristot de Oliveirathread took 7597 ns, and the qxl:21 device IRQ took 7139 ns. Finally, 103a955d7eaSDaniel Bristot de Oliveirathe cc1 thread noise took 9909 ns of time before the context switch. 104a955d7eaSDaniel Bristot de OliveiraSuch pieces of evidence are useful for the developer to use other 105a955d7eaSDaniel Bristot de Oliveiratracing methods to figure out how to debug and optimize the system. 106a955d7eaSDaniel Bristot de Oliveira 107a955d7eaSDaniel Bristot de OliveiraIt is worth mentioning that the *duration* values reported 108a955d7eaSDaniel Bristot de Oliveiraby the osnoise: events are *net* values. For example, the 109a955d7eaSDaniel Bristot de Oliveirathread_noise does not include the duration of the overhead caused 110a955d7eaSDaniel Bristot de Oliveiraby the IRQ execution (which indeed accounted for 12736 ns). But 111a955d7eaSDaniel Bristot de Oliveirathe values reported by the timerlat tracer (timerlat_latency) 112a955d7eaSDaniel Bristot de Oliveiraare *gross* values. 113a955d7eaSDaniel Bristot de Oliveira 114a955d7eaSDaniel Bristot de OliveiraThe art below illustrates a CPU timeline and how the timerlat tracer 115a955d7eaSDaniel Bristot de Oliveiraobserves it at the top and the osnoise: events at the bottom. Each "-" 116a955d7eaSDaniel Bristot de Oliveirain the timelines means circa 1 us, and the time moves ==>:: 117a955d7eaSDaniel Bristot de Oliveira 118a955d7eaSDaniel Bristot de Oliveira External timer irq thread 119a955d7eaSDaniel Bristot de Oliveira clock latency latency 120a955d7eaSDaniel Bristot de Oliveira event 13585 ns 39960 ns 121a955d7eaSDaniel Bristot de Oliveira | ^ ^ 122a955d7eaSDaniel Bristot de Oliveira v | | 123a955d7eaSDaniel Bristot de Oliveira |-------------| | 124a955d7eaSDaniel Bristot de Oliveira |-------------+-------------------------| 125a955d7eaSDaniel Bristot de Oliveira ^ ^ 126a955d7eaSDaniel Bristot de Oliveira ======================================================================== 127a955d7eaSDaniel Bristot de Oliveira [tmr irq] [dev irq] 128a955d7eaSDaniel Bristot de Oliveira [another thread...^ v..^ v.......][timerlat/ thread] <-- CPU timeline 129a955d7eaSDaniel Bristot de Oliveira ========================================================================= 130a955d7eaSDaniel Bristot de Oliveira |-------| |-------| 131a955d7eaSDaniel Bristot de Oliveira |--^ v-------| 132a955d7eaSDaniel Bristot de Oliveira | | | 133a955d7eaSDaniel Bristot de Oliveira | | + thread_noise: 9909 ns 134a955d7eaSDaniel Bristot de Oliveira | +-> irq_noise: 6139 ns 135a955d7eaSDaniel Bristot de Oliveira +-> irq_noise: 7597 ns 136a955d7eaSDaniel Bristot de Oliveira 137a955d7eaSDaniel Bristot de OliveiraIRQ stacktrace 138a955d7eaSDaniel Bristot de Oliveira--------------------------- 139a955d7eaSDaniel Bristot de Oliveira 140a955d7eaSDaniel Bristot de OliveiraThe osnoise/print_stack option is helpful for the cases in which a thread 141a955d7eaSDaniel Bristot de Oliveiranoise causes the major factor for the timer latency, because of preempt or 142a955d7eaSDaniel Bristot de Oliveirairq disabled. For example:: 143a955d7eaSDaniel Bristot de Oliveira 144a955d7eaSDaniel Bristot de Oliveira [root@f32 tracing]# echo 500 > osnoise/stop_tracing_total_us 145a955d7eaSDaniel Bristot de Oliveira [root@f32 tracing]# echo 500 > osnoise/print_stack 146a955d7eaSDaniel Bristot de Oliveira [root@f32 tracing]# echo timerlat > current_tracer 147a955d7eaSDaniel Bristot de Oliveira [root@f32 tracing]# tail -21 per_cpu/cpu7/trace 148a955d7eaSDaniel Bristot de Oliveira insmod-1026 [007] dN.h1.. 200.201948: irq_noise: local_timer:236 start 200.201939376 duration 7872 ns 149a955d7eaSDaniel Bristot de Oliveira insmod-1026 [007] d..h1.. 200.202587: #29800 context irq timer_latency 1616 ns 150a955d7eaSDaniel Bristot de Oliveira insmod-1026 [007] dN.h2.. 200.202598: irq_noise: local_timer:236 start 200.202586162 duration 11855 ns 151a955d7eaSDaniel Bristot de Oliveira insmod-1026 [007] dN.h3.. 200.202947: irq_noise: local_timer:236 start 200.202939174 duration 7318 ns 152a955d7eaSDaniel Bristot de Oliveira insmod-1026 [007] d...3.. 200.203444: thread_noise: insmod:1026 start 200.202586933 duration 838681 ns 153a955d7eaSDaniel Bristot de Oliveira timerlat/7-1001 [007] ....... 200.203445: #29800 context thread timer_latency 859978 ns 154a955d7eaSDaniel Bristot de Oliveira timerlat/7-1001 [007] ....1.. 200.203446: <stack trace> 155a955d7eaSDaniel Bristot de Oliveira => timerlat_irq 156a955d7eaSDaniel Bristot de Oliveira => __hrtimer_run_queues 157a955d7eaSDaniel Bristot de Oliveira => hrtimer_interrupt 158a955d7eaSDaniel Bristot de Oliveira => __sysvec_apic_timer_interrupt 159a955d7eaSDaniel Bristot de Oliveira => asm_call_irq_on_stack 160a955d7eaSDaniel Bristot de Oliveira => sysvec_apic_timer_interrupt 161a955d7eaSDaniel Bristot de Oliveira => asm_sysvec_apic_timer_interrupt 162a955d7eaSDaniel Bristot de Oliveira => delay_tsc 163a955d7eaSDaniel Bristot de Oliveira => dummy_load_1ms_pd_init 164a955d7eaSDaniel Bristot de Oliveira => do_one_initcall 165a955d7eaSDaniel Bristot de Oliveira => do_init_module 166a955d7eaSDaniel Bristot de Oliveira => __do_sys_finit_module 167a955d7eaSDaniel Bristot de Oliveira => do_syscall_64 168a955d7eaSDaniel Bristot de Oliveira => entry_SYSCALL_64_after_hwframe 169a955d7eaSDaniel Bristot de Oliveira 170a955d7eaSDaniel Bristot de OliveiraIn this case, it is possible to see that the thread added the highest 171a955d7eaSDaniel Bristot de Oliveiracontribution to the *timer latency* and the stack trace, saved during 172a955d7eaSDaniel Bristot de Oliveirathe timerlat IRQ handler, points to a function named 173a955d7eaSDaniel Bristot de Oliveiradummy_load_1ms_pd_init, which had the following code (on purpose):: 174a955d7eaSDaniel Bristot de Oliveira 175a955d7eaSDaniel Bristot de Oliveira static int __init dummy_load_1ms_pd_init(void) 176a955d7eaSDaniel Bristot de Oliveira { 177a955d7eaSDaniel Bristot de Oliveira preempt_disable(); 178a955d7eaSDaniel Bristot de Oliveira mdelay(1); 179a955d7eaSDaniel Bristot de Oliveira preempt_enable(); 180a955d7eaSDaniel Bristot de Oliveira return 0; 181a955d7eaSDaniel Bristot de Oliveira 182a955d7eaSDaniel Bristot de Oliveira } 183*e88ed227SDaniel Bristot de Oliveira 184*e88ed227SDaniel Bristot de OliveiraUser-space interface 185*e88ed227SDaniel Bristot de Oliveira--------------------------- 186*e88ed227SDaniel Bristot de Oliveira 187*e88ed227SDaniel Bristot de OliveiraTimerlat allows user-space threads to use timerlat infra-structure to 188*e88ed227SDaniel Bristot de Oliveirameasure scheduling latency. This interface is accessible via a per-CPU 189*e88ed227SDaniel Bristot de Oliveirafile descriptor inside $tracing_dir/osnoise/per_cpu/cpu$ID/timerlat_fd. 190*e88ed227SDaniel Bristot de Oliveira 191*e88ed227SDaniel Bristot de OliveiraThis interface is accessible under the following conditions: 192*e88ed227SDaniel Bristot de Oliveira 193*e88ed227SDaniel Bristot de Oliveira - timerlat tracer is enable 194*e88ed227SDaniel Bristot de Oliveira - osnoise workload option is set to NO_OSNOISE_WORKLOAD 195*e88ed227SDaniel Bristot de Oliveira - The user-space thread is affined to a single processor 196*e88ed227SDaniel Bristot de Oliveira - The thread opens the file associated with its single processor 197*e88ed227SDaniel Bristot de Oliveira - Only one thread can access the file at a time 198*e88ed227SDaniel Bristot de Oliveira 199*e88ed227SDaniel Bristot de OliveiraThe open() syscall will fail if any of these conditions are not met. 200*e88ed227SDaniel Bristot de OliveiraAfter opening the file descriptor, the user space can read from it. 201*e88ed227SDaniel Bristot de Oliveira 202*e88ed227SDaniel Bristot de OliveiraThe read() system call will run a timerlat code that will arm the 203*e88ed227SDaniel Bristot de Oliveiratimer in the future and wait for it as the regular kernel thread does. 204*e88ed227SDaniel Bristot de Oliveira 205*e88ed227SDaniel Bristot de OliveiraWhen the timer IRQ fires, the timerlat IRQ will execute, report the 206*e88ed227SDaniel Bristot de OliveiraIRQ latency and wake up the thread waiting in the read. The thread will be 207*e88ed227SDaniel Bristot de Oliveirascheduled and report the thread latency via tracer - as for the kernel 208*e88ed227SDaniel Bristot de Oliveirathread. 209*e88ed227SDaniel Bristot de Oliveira 210*e88ed227SDaniel Bristot de OliveiraThe difference from the in-kernel timerlat is that, instead of re-arming 211*e88ed227SDaniel Bristot de Oliveirathe timer, timerlat will return to the read() system call. At this point, 212*e88ed227SDaniel Bristot de Oliveirathe user can run any code. 213*e88ed227SDaniel Bristot de Oliveira 214*e88ed227SDaniel Bristot de OliveiraIf the application rereads the file timerlat file descriptor, the tracer 215*e88ed227SDaniel Bristot de Oliveirawill report the return from user-space latency, which is the total 216*e88ed227SDaniel Bristot de Oliveiralatency. If this is the end of the work, it can be interpreted as the 217*e88ed227SDaniel Bristot de Oliveiraresponse time for the request. 218*e88ed227SDaniel Bristot de Oliveira 219*e88ed227SDaniel Bristot de OliveiraAfter reporting the total latency, timerlat will restart the cycle, arm 220*e88ed227SDaniel Bristot de Oliveiraa timer, and go to sleep for the following activation. 221*e88ed227SDaniel Bristot de Oliveira 222*e88ed227SDaniel Bristot de OliveiraIf at any time one of the conditions is broken, e.g., the thread migrates 223*e88ed227SDaniel Bristot de Oliveirawhile in user space, or the timerlat tracer is disabled, the SIG_KILL 224*e88ed227SDaniel Bristot de Oliveirasignal will be sent to the user-space thread. 225*e88ed227SDaniel Bristot de Oliveira 226*e88ed227SDaniel Bristot de OliveiraHere is an basic example of user-space code for timerlat:: 227*e88ed227SDaniel Bristot de Oliveira 228*e88ed227SDaniel Bristot de Oliveira int main(void) 229*e88ed227SDaniel Bristot de Oliveira { 230*e88ed227SDaniel Bristot de Oliveira char buffer[1024]; 231*e88ed227SDaniel Bristot de Oliveira int timerlat_fd; 232*e88ed227SDaniel Bristot de Oliveira int retval; 233*e88ed227SDaniel Bristot de Oliveira long cpu = 0; /* place in CPU 0 */ 234*e88ed227SDaniel Bristot de Oliveira cpu_set_t set; 235*e88ed227SDaniel Bristot de Oliveira 236*e88ed227SDaniel Bristot de Oliveira CPU_ZERO(&set); 237*e88ed227SDaniel Bristot de Oliveira CPU_SET(cpu, &set); 238*e88ed227SDaniel Bristot de Oliveira 239*e88ed227SDaniel Bristot de Oliveira if (sched_setaffinity(gettid(), sizeof(set), &set) == -1) 240*e88ed227SDaniel Bristot de Oliveira return 1; 241*e88ed227SDaniel Bristot de Oliveira 242*e88ed227SDaniel Bristot de Oliveira snprintf(buffer, sizeof(buffer), 243*e88ed227SDaniel Bristot de Oliveira "/sys/kernel/tracing/osnoise/per_cpu/cpu%ld/timerlat_fd", 244*e88ed227SDaniel Bristot de Oliveira cpu); 245*e88ed227SDaniel Bristot de Oliveira 246*e88ed227SDaniel Bristot de Oliveira timerlat_fd = open(buffer, O_RDONLY); 247*e88ed227SDaniel Bristot de Oliveira if (timerlat_fd < 0) { 248*e88ed227SDaniel Bristot de Oliveira printf("error opening %s: %s\n", buffer, strerror(errno)); 249*e88ed227SDaniel Bristot de Oliveira exit(1); 250*e88ed227SDaniel Bristot de Oliveira } 251*e88ed227SDaniel Bristot de Oliveira 252*e88ed227SDaniel Bristot de Oliveira for (;;) { 253*e88ed227SDaniel Bristot de Oliveira retval = read(timerlat_fd, buffer, 1024); 254*e88ed227SDaniel Bristot de Oliveira if (retval < 0) 255*e88ed227SDaniel Bristot de Oliveira break; 256*e88ed227SDaniel Bristot de Oliveira } 257*e88ed227SDaniel Bristot de Oliveira 258*e88ed227SDaniel Bristot de Oliveira close(timerlat_fd); 259*e88ed227SDaniel Bristot de Oliveira exit(0); 260*e88ed227SDaniel Bristot de Oliveira } 261