.. SPDX-License-Identifier: GPL-2.0

=====================
Theory of operation
=====================

:Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

Preface
=======

PREEMPT_RT transforms the Linux kernel into a real-time kernel. It achieves
this by replacing locking primitives, such as spinlock_t, with a preemptible
and priority-inheritance aware implementation known as rtmutex, and by enforcing
the use of threaded interrupts. As a result, the kernel becomes fully
preemptible, with the exception of a few critical code paths, including entry
code, the scheduler, and low-level interrupt handling routines.

This transformation places the majority of kernel execution contexts under the
control of the scheduler and significantly increases the number of preemption
points. Consequently, it reduces the latency between a high-priority task
becoming runnable and its actual execution on the CPU.

Scheduling
==========

The core principles of Linux scheduling and the associated user-space API are
documented in the man page
`sched(7) <https://man7.org/linux/man-pages/man7/sched.7.html>`_.
By default, the Linux kernel uses the SCHED_OTHER scheduling policy. Under
this policy, a task is preempted when the scheduler determines that it has
consumed a fair share of CPU time relative to other runnable tasks. However,
the policy does not guarantee immediate preemption when a new SCHED_OTHER task
becomes runnable. The currently running task may continue executing.

This behavior differs from that of real-time scheduling policies such as
SCHED_FIFO. When a task with a real-time policy becomes runnable, the
scheduler immediately selects it for execution if it has a higher priority than
the currently running task. The task continues to run until it voluntarily
yields the CPU, typically by blocking on an event, or until a task with a
higher priority becomes runnable.

Sleeping spin locks
===================

The various lock types and their behavior under real-time configurations are
described in detail in Documentation/locking/locktypes.rst.
In a non-PREEMPT_RT configuration, a spinlock_t is acquired by first disabling
preemption and then actively spinning until the lock becomes available. Once
the lock is released, preemption is enabled again. From a real-time
perspective, this approach is undesirable because disabling preemption prevents
the scheduler from switching to a higher-priority task, potentially increasing
latency.

To address this, PREEMPT_RT replaces spinning locks with sleeping spin locks
that do not disable preemption. On PREEMPT_RT, spinlock_t is implemented using
rtmutex. Instead of spinning, a task attempting to acquire a contended lock
disables CPU migration, donates its priority to the lock owner (priority
inheritance), and voluntarily schedules out while waiting for the lock to
become available.

Disabling CPU migration preserves the guarantee, otherwise provided by
disabling preemption, that the task continues to run on the same CPU while
holding a sleeping lock. Unlike disabling preemption, it still allows the task
to be preempted.
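The kernel-internal locks described above cannot be exercised from user space,
but the priority donation they rely on can be observed with a POSIX mutex that
uses the PTHREAD_PRIO_INHERIT protocol. On Linux with glibc, such mutexes are
backed by PI futexes, which in turn build on the kernel's rtmutex. The
following is a minimal user-space sketch, not kernel code; ``struct
pi_counter``, ``pi_counter_init()`` and ``pi_counter_add()`` are hypothetical
names introduced only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Hypothetical example type: a counter protected by a PI-aware mutex. */
struct pi_counter {
	pthread_mutex_t lock;
	long value;
};

/*
 * Initialise the counter and request the priority-inheritance protocol
 * for its mutex. Returns 0 on success or a positive pthread error number,
 * for example if the system does not support PI mutexes.
 */
int pi_counter_init(struct pi_counter *c)
{
	pthread_mutexattr_t attr;
	int ret;

	ret = pthread_mutexattr_init(&attr);
	if (ret)
		return ret;

	ret = pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT);
	if (!ret)
		ret = pthread_mutex_init(&c->lock, &attr);

	pthread_mutexattr_destroy(&attr);
	if (!ret)
		c->value = 0;
	return ret;
}

/*
 * Add delta to the counter under the lock. While a thread is inside this
 * critical section, a higher-priority thread that blocks on the same mutex
 * lends its priority to the lock holder until the mutex is released.
 * Returns 0 on success or a positive pthread error number.
 */
int pi_counter_add(struct pi_counter *c, long delta)
{
	int ret = pthread_mutex_lock(&c->lock);

	if (ret)
		return ret;
	c->value += delta;
	return pthread_mutex_unlock(&c->lock);
}
```

A caller would run ``pi_counter_init()`` once and then call
``pi_counter_add()`` from threads of different priorities. Depending on the
toolchain, the program may need to be built with ``-pthread``.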

Priority inheritance
====================

Lock types such as spinlock_t and struct mutex in a PREEMPT_RT enabled kernel
are implemented on top of rtmutex, which provides support for priority
inheritance (PI). When a task blocks on such a lock, the PI mechanism
temporarily propagates the blocked task’s scheduling parameters to the lock
owner.

For example, if a SCHED_FIFO task A blocks on a lock currently held by a
SCHED_OTHER task B, task A’s scheduling policy and priority are temporarily
inherited by task B. After this inheritance, task A is put to sleep while
waiting for the lock, and task B effectively becomes the highest-priority task
in the system. This allows B to continue executing, make progress, and
eventually release the lock.

Once B releases the lock, it reverts to its original scheduling parameters, and
task A can resume execution.

Threaded interrupts
===================

Interrupt handlers are another source of code that executes with preemption
disabled and outside the control of the scheduler. To bring interrupt handling
under scheduler control, PREEMPT_RT enforces threaded interrupt handlers.

With forced threading, interrupt handling is split into two stages. The first
stage, the primary handler, is executed in IRQ context with interrupts disabled.
Its sole responsibility is to wake the associated threaded handler. The second
stage, the threaded handler, is the function passed to request_irq() as the
interrupt handler. It runs in process context, scheduled by the kernel.

From waking the interrupt thread until threaded handling is completed, the
interrupt source is masked in the interrupt controller. This ensures that the
device interrupt remains pending but does not retrigger the CPU, allowing the
system to exit IRQ context and handle the interrupt in a scheduled thread.

By default, the threaded handler executes with the SCHED_FIFO scheduling policy
and a priority of 50 (MAX_RT_PRIO / 2), which is midway between the minimum and
maximum real-time priorities.

If the threaded interrupt handler raises any soft interrupts during its
execution, those soft interrupt routines are invoked after the threaded handler
completes, within the same thread. Preemption remains enabled during the
execution of the soft interrupt handler.

Summary
=======

By using sleeping locks and forced-threaded interrupts, PREEMPT_RT
significantly reduces the amount of code that runs with interrupts or
preemption disabled, allowing the scheduler to preempt the current execution
context and switch to a higher-priority task.
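As a rough illustration of how the scheduling policies and the default
interrupt thread priority described above look from user space, the sketch
below queries the SCHED_FIFO priority range and requests a real-time priority
for the calling thread. On Linux, sched(7) documents the range as 1 to 99, so
the midpoint is 50, the same value used for interrupt threads by default. The
helper names ``fifo_mid_priority()`` and ``set_fifo_priority()`` are
hypothetical and exist only for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>

/* Return the midpoint of the SCHED_FIFO priority range, or -1 on error. */
int fifo_mid_priority(void)
{
	int min = sched_get_priority_min(SCHED_FIFO);
	int max = sched_get_priority_max(SCHED_FIFO);

	if (min < 0 || max < 0)
		return -1;
	return (min + max) / 2;
}

/*
 * Switch the calling thread to SCHED_FIFO at the given priority.
 * Returns 0 on success or a negative errno value. Without the required
 * privilege the call is expected to fail, typically with -EPERM.
 */
int set_fifo_priority(int prio)
{
	struct sched_param param = { .sched_priority = prio };

	if (sched_setscheduler(0, SCHED_FIFO, &param) < 0)
		return -errno;
	return 0;
}
```

A task that should run ahead of interrupt threads left at their default
priority could, for instance, call
``set_fifo_priority(fifo_mid_priority() + 1)``. Whether that is appropriate
depends on the application, since such a task then delays interrupt handling
on that CPU while it runs.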