.. SPDX-License-Identifier: GPL-2.0

.. _kernel_hacking_locktypes:

==========================
Lock types and their rules
==========================

Introduction
============

The kernel provides a variety of locking primitives which can be divided
into three categories:

 - Sleeping locks
 - CPU local locks
 - Spinning locks

This document conceptually describes these lock types and provides rules
for their nesting, including the rules for use under PREEMPT_RT.


Lock categories
===============

Sleeping locks
--------------

Sleeping locks can only be acquired in preemptible task context.

Although implementations allow try_lock() from other contexts, it is
necessary to carefully evaluate the safety of unlock() as well as of
try_lock().  Furthermore, it is also necessary to evaluate the debugging
versions of these primitives.  In short, don't acquire sleeping locks from
other contexts unless there is no other option.

Sleeping lock types:

 - mutex
 - rt_mutex
 - semaphore
 - rw_semaphore
 - ww_mutex
 - percpu_rw_semaphore

On PREEMPT_RT kernels, these lock types are converted to sleeping locks:

 - local_lock
 - spinlock_t
 - rwlock_t


CPU local locks
---------------

 - local_lock

On non-PREEMPT_RT kernels, local_lock functions are wrappers around
preemption and interrupt disabling primitives. Contrary to other locking
mechanisms, disabling preemption or interrupts is a pure CPU-local
concurrency control mechanism and is not suited for inter-CPU concurrency
control.


Spinning locks
--------------

 - raw_spinlock_t
 - bit spinlocks

On non-PREEMPT_RT kernels, these lock types are also spinning locks:

 - spinlock_t
 - rwlock_t

Spinning locks implicitly disable preemption and the lock / unlock functions
can have suffixes which apply further protections:

 ===================  ====================================================
 _bh()                Disable / enable bottom halves (soft interrupts)
 _irq()               Disable / enable interrupts
 _irqsave/restore()   Save and disable / restore interrupt disabled state
 ===================  ====================================================

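For illustration, a minimal sketch (the lock and counter names are
hypothetical) of the _bh() suffix protecting data shared between task
context and a softirq handler::

  static DEFINE_SPINLOCK(stats_lock);
  static unsigned long stats_count;

  /* Task context: also blocks softirq handlers on this CPU. */
  void stats_inc(void)
  {
    spin_lock_bh(&stats_lock);
    stats_count++;
    spin_unlock_bh(&stats_lock);
  }

  /* Softirq context: the plain variants are sufficient here. */
  void stats_inc_softirq(void)
  {
    spin_lock(&stats_lock);
    stats_count++;
    spin_unlock(&stats_lock);
  }
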

Owner semantics
===============

The aforementioned lock types except semaphores have strict owner
semantics:

  The context (task) that acquired the lock must release it.

rw_semaphores have a special interface which allows non-owner release for
readers.


rtmutex
=======

RT-mutexes are mutexes with support for priority inheritance (PI).

PI has limitations on non-PREEMPT_RT kernels due to preemption and
interrupt disabled sections.

PI clearly cannot preempt preemption-disabled or interrupt-disabled
regions of code, even on PREEMPT_RT kernels.  Instead, PREEMPT_RT kernels
execute most such regions of code in preemptible task context, especially
interrupt handlers and soft interrupts.  This conversion allows spinlock_t
and rwlock_t to be implemented via RT-mutexes.


semaphore
=========

semaphore is a counting semaphore implementation.

Semaphores are often used for both serialization and waiting, but new use
cases should instead use separate serialization and wait mechanisms, such
as mutexes and completions.

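As a hedged sketch of the waiting part, a semaphore used purely to wait for
an event can be expressed with a completion instead; the names below are
made up for this example::

  static DECLARE_COMPLETION(setup_done);

  /* Waiting side: sleeps until the event has been signaled. */
  void consumer(void)
  {
    wait_for_completion(&setup_done);
    /* setup has finished, proceed */
  }

  /* Signaling side: wakes up one waiter. */
  void producer(void)
  {
    /* ... perform the setup ... */
    complete(&setup_done);
  }
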
semaphores and PREEMPT_RT
-------------------------

PREEMPT_RT does not change the semaphore implementation because counting
semaphores have no concept of owners, thus preventing PREEMPT_RT from
providing priority inheritance for semaphores.  After all, an unknown
owner cannot be boosted. As a consequence, blocking on semaphores can
result in priority inversion.


rw_semaphore
============

rw_semaphore is a multiple readers and single writer lock mechanism.

On non-PREEMPT_RT kernels the implementation is fair, thus preventing
writer starvation.

rw_semaphore complies by default with the strict owner semantics, but there
exist special-purpose interfaces that allow non-owner release for readers.
These interfaces work independently of the kernel configuration.

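One such interface pair is down_read_non_owner() and up_read_non_owner().
The sketch below is illustrative only; the rwsem name and the split between
start_io() and finish_io() are hypothetical::

  static DECLARE_RWSEM(map_sem);

  /* Acquires the reader side, but the lock is released elsewhere. */
  void start_io(void)
  {
    down_read_non_owner(&map_sem);
    /* hand the work off, e.g. to a completion handler */
  }

  /* Completion path, possibly running in a different task. */
  void finish_io(void)
  {
    up_read_non_owner(&map_sem);
  }
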
rw_semaphore and PREEMPT_RT
---------------------------

PREEMPT_RT kernels map rw_semaphore to a separate rt_mutex-based
implementation, thus changing the fairness:

 Because an rw_semaphore writer cannot grant its priority to multiple
 readers, a preempted low-priority reader will continue holding its lock,
 thus starving even high-priority writers.  In contrast, because readers
 can grant their priority to a writer, a preempted low-priority writer will
 have its priority boosted until it releases the lock, thus preventing that
 writer from starving readers.


local_lock
==========

local_lock provides a named scope to critical sections which are protected
by disabling preemption or interrupts.

On non-PREEMPT_RT kernels local_lock operations map to the preemption and
interrupt disabling and enabling primitives:

 ===============================  ======================
 local_lock(&llock)               preempt_disable()
 local_unlock(&llock)             preempt_enable()
 local_lock_irq(&llock)           local_irq_disable()
 local_unlock_irq(&llock)         local_irq_enable()
 local_lock_irqsave(&llock)       local_irq_save()
 local_unlock_irqrestore(&llock)  local_irq_restore()
 ===============================  ======================

The named scope of local_lock has two advantages over the regular
primitives:

  - The lock name allows static analysis and also clearly documents the
    protection scope, while the regular primitives are scopeless and
    opaque.

  - If lockdep is enabled, the local_lock gains a lockmap which allows
    validating the correctness of the protection. This can detect cases
    where, e.g., a function using preempt_disable() as protection mechanism
    is invoked from interrupt or soft-interrupt context. Apart from that,
    lockdep_assert_held(&llock) works as with any other locking primitive.

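A minimal usage sketch, assuming a hypothetical per-CPU statistics structure
(the names are made up for this example)::

  struct cpu_stats {
    local_lock_t lock;
    unsigned long count;
  };

  static DEFINE_PER_CPU(struct cpu_stats, cpu_stats) = {
    .lock = INIT_LOCAL_LOCK(lock),
  };

  void cpu_stats_inc(void)
  {
    /* Protects only this CPU's instance of cpu_stats. */
    local_lock(&cpu_stats.lock);
    this_cpu_inc(cpu_stats.count);
    local_unlock(&cpu_stats.lock);
  }
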
local_lock and PREEMPT_RT
-------------------------

PREEMPT_RT kernels map local_lock to a per-CPU spinlock_t, thus changing
semantics:

  - All spinlock_t changes also apply to local_lock.

local_lock usage
----------------

local_lock should be used in situations where disabling preemption or
interrupts is the appropriate form of concurrency control to protect
per-CPU data structures on a non-PREEMPT_RT kernel.

local_lock is not suitable to protect against preemption or interrupts on a
PREEMPT_RT kernel due to the PREEMPT_RT-specific spinlock_t semantics.

CPU local scope and bottom-half
-------------------------------

Per-CPU variables that are accessed only in softirq context should not rely on
the assumption that this context is implicitly protected due to being
non-preemptible. In a PREEMPT_RT kernel, softirq context is preemptible, and
synchronizing every bottom-half-disabled section via implicit context results
in an implicit per-CPU "big kernel lock."

A local_lock_t together with local_lock_nested_bh() and
local_unlock_nested_bh() for locking operations helps to identify the locking
scope.

When lockdep is enabled, these functions verify that data structure access
occurs within softirq context.
Unlike local_lock(), local_lock_nested_bh() does not disable preemption and
does not add overhead when used without lockdep.

On a PREEMPT_RT kernel, local_lock_t behaves as a real lock and
local_lock_nested_bh() serializes access to the data structure, which allows
removal of serialization via local_bh_disable().

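A hedged sketch with hypothetical names for data that is only touched from
softirq context, for example from a NAPI poll callback::

  struct softirq_stats {
    local_lock_t lock;
    unsigned long packets;
  };

  static DEFINE_PER_CPU(struct softirq_stats, softirq_stats) = {
    .lock = INIT_LOCAL_LOCK(lock),
  };

  /* Invoked with bottom halves already disabled. */
  void softirq_stats_update(void)
  {
    local_lock_nested_bh(&softirq_stats.lock);
    this_cpu_inc(softirq_stats.packets);
    local_unlock_nested_bh(&softirq_stats.lock);
  }
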
raw_spinlock_t and spinlock_t
=============================

raw_spinlock_t
--------------

raw_spinlock_t is a strict spinning lock implementation in all kernels,
including PREEMPT_RT kernels.  Use raw_spinlock_t only in real critical
core code, low-level interrupt handling and places where disabling
preemption or interrupts is required, for example, to safely access
hardware state.  raw_spinlock_t can sometimes also be used when the
critical section is tiny, thus avoiding RT-mutex overhead.

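For illustration only (the lock, register offset and function names are
hypothetical), a tiny critical section accessing hardware state::

  static DEFINE_RAW_SPINLOCK(hw_lock);

  void hw_update_ctrl(void __iomem *base, u32 val)
  {
    unsigned long flags;

    /* Truly atomic on all kernels, including PREEMPT_RT. */
    raw_spin_lock_irqsave(&hw_lock, flags);
    writel(val, base + 0x10);
    raw_spin_unlock_irqrestore(&hw_lock, flags);
  }
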
spinlock_t
----------

The semantics of spinlock_t change with the state of PREEMPT_RT.

On a non-PREEMPT_RT kernel spinlock_t is mapped to raw_spinlock_t and has
exactly the same semantics.

spinlock_t and PREEMPT_RT
-------------------------

On a PREEMPT_RT kernel spinlock_t is mapped to a separate implementation
based on rt_mutex which changes the semantics:

 - Preemption is not disabled.

 - The hard interrupt related suffixes for spin_lock / spin_unlock
   operations (_irq, _irqsave / _irqrestore) do not affect the CPU's
   interrupt disabled state.

 - The soft interrupt related suffix (_bh()) still disables softirq
   handlers.

   Non-PREEMPT_RT kernels disable preemption to get this effect.

   PREEMPT_RT kernels use a per-CPU lock for serialization which keeps
   preemption enabled. The lock disables softirq handlers and also
   prevents reentrancy due to task preemption.

PREEMPT_RT kernels preserve all other spinlock_t semantics:

 - Tasks holding a spinlock_t do not migrate.  Non-PREEMPT_RT kernels
   avoid migration by disabling preemption.  PREEMPT_RT kernels instead
   disable migration, which ensures that pointers to per-CPU variables
   remain valid even if the task is preempted.

 - Task state is preserved across spinlock acquisition, ensuring that the
   task-state rules apply to all kernel configurations.  Non-PREEMPT_RT
   kernels leave task state untouched.  However, PREEMPT_RT must change
   task state if the task blocks during acquisition.  Therefore, it saves
   the current task state before blocking and the corresponding lock wakeup
   restores it, as shown below::

    task->state = TASK_INTERRUPTIBLE
     lock()
       block()
         task->saved_state = task->state
         task->state = TASK_UNINTERRUPTIBLE
         schedule()
                                        lock wakeup
                                          task->state = task->saved_state

   Other types of wakeups would normally unconditionally set the task state
   to RUNNING, but that does not work here because the task must remain
   blocked until the lock becomes available.  Therefore, when a non-lock
   wakeup attempts to awaken a task blocked waiting for a spinlock, it
   instead sets the saved state to RUNNING.  Then, when the lock
   acquisition completes, the lock wakeup sets the task state to the saved
   state, in this case setting it to RUNNING::

    task->state = TASK_INTERRUPTIBLE
     lock()
       block()
         task->saved_state = task->state
         task->state = TASK_UNINTERRUPTIBLE
         schedule()
                                        non lock wakeup
                                          task->saved_state = TASK_RUNNING

                                        lock wakeup
                                          task->state = task->saved_state

   This ensures that the real wakeup cannot be lost.


rwlock_t
========

rwlock_t is a multiple readers and single writer lock mechanism.

Non-PREEMPT_RT kernels implement rwlock_t as a spinning lock and the
suffix rules of spinlock_t apply accordingly. The implementation is fair,
thus preventing writer starvation.

rwlock_t and PREEMPT_RT
-----------------------

PREEMPT_RT kernels map rwlock_t to a separate rt_mutex-based
implementation, thus changing semantics:

 - All the spinlock_t changes also apply to rwlock_t.

 - Because an rwlock_t writer cannot grant its priority to multiple
   readers, a preempted low-priority reader will continue holding its lock,
   thus starving even high-priority writers.  In contrast, because readers
   can grant their priority to a writer, a preempted low-priority writer
   will have its priority boosted until it releases the lock, thus
   preventing that writer from starving readers.


PREEMPT_RT caveats
==================

local_lock on RT
----------------

The mapping of local_lock to spinlock_t on PREEMPT_RT kernels has a few
implications. For example, on a non-PREEMPT_RT kernel the following code
sequence works as expected::

  local_lock_irq(&local_lock);
  raw_spin_lock(&lock);

and is fully equivalent to::

  raw_spin_lock_irq(&lock);

On a PREEMPT_RT kernel this code sequence breaks because local_lock_irq()
is mapped to a per-CPU spinlock_t which neither disables interrupts nor
preemption. The following code sequence works correctly on both
PREEMPT_RT and non-PREEMPT_RT kernels::

  local_lock_irq(&local_lock);
  spin_lock(&lock);

Another caveat with local locks is that each local_lock has a specific
protection scope. So the following substitution is wrong::

  func1()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock_1, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_1, flags);
  }

  func2()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock_2, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock_2, flags);
  }

  func3()
  {
    lockdep_assert_irqs_disabled();
    access_protected_data();
  }

On a non-PREEMPT_RT kernel this works correctly, but on a PREEMPT_RT kernel
local_lock_1 and local_lock_2 are distinct and cannot serialize the callers
of func3(). Also the lockdep assert will trigger on a PREEMPT_RT kernel
because local_lock_irqsave() does not disable interrupts due to the
PREEMPT_RT-specific semantics of spinlock_t. The correct substitution is::

  func1()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags);
  }

  func2()
  {
    local_irq_save(flags);    -> local_lock_irqsave(&local_lock, flags);
    func3();
    local_irq_restore(flags); -> local_unlock_irqrestore(&local_lock, flags);
  }

  func3()
  {
    lockdep_assert_held(&local_lock);
    access_protected_data();
  }


spinlock_t and rwlock_t
-----------------------

The changes in spinlock_t and rwlock_t semantics on PREEMPT_RT kernels
have a few implications.  For example, on a non-PREEMPT_RT kernel the
following code sequence works as expected::

   local_irq_disable();
   spin_lock(&lock);

and is fully equivalent to::

   spin_lock_irq(&lock);

The same applies to rwlock_t and the _irqsave() suffix variants.

On a PREEMPT_RT kernel this code sequence breaks because an RT-mutex requires
a fully preemptible context.  Instead, use spin_lock_irq() or
spin_lock_irqsave() and their unlock counterparts.  In cases where the
interrupt disabling and locking must remain separate, PREEMPT_RT offers a
local_lock mechanism.  Acquiring the local_lock pins the task to a CPU,
allowing things like per-CPU interrupt-disabled locks to be acquired.
However, this approach should be used only where absolutely necessary.

A typical scenario is protection of per-CPU variables in thread context::

  struct foo *p = get_cpu_ptr(&var1);

  spin_lock(&p->lock);
  p->count += this_cpu_read(var2);

This is correct code on a non-PREEMPT_RT kernel, but on a PREEMPT_RT kernel
this breaks. The PREEMPT_RT-specific change of spinlock_t semantics does
not allow acquiring p->lock because get_cpu_ptr() implicitly disables
preemption. The following substitution works on both kernels::

  struct foo *p;

  migrate_disable();
  p = this_cpu_ptr(&var1);
  spin_lock(&p->lock);
  p->count += this_cpu_read(var2);

migrate_disable() ensures that the task is pinned on the current CPU, which
in turn guarantees that the per-CPU accesses to var1 and var2 stay on the
same CPU while the task remains preemptible.

The migrate_disable() substitution is not valid for the following
scenario::

  func()
  {
    struct foo *p;

    migrate_disable();
    p = this_cpu_ptr(&var1);
    p->val = func2();

This breaks because migrate_disable() does not protect against reentrancy from
a preempting task. A correct substitution for this case is::

  func()
  {
    struct foo *p;

    local_lock(&foo_lock);
    p = this_cpu_ptr(&var1);
    p->val = func2();

On a non-PREEMPT_RT kernel this protects against reentrancy by disabling
preemption. On a PREEMPT_RT kernel this is achieved by acquiring the
underlying per-CPU spinlock.


raw_spinlock_t on RT
--------------------

Acquiring a raw_spinlock_t disables preemption and possibly also
interrupts, so the critical section must avoid acquiring a regular
spinlock_t or rwlock_t; for example, the critical section must avoid
allocating memory.  Thus, on a non-PREEMPT_RT kernel the following code
works perfectly::

  raw_spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);

But this code fails on PREEMPT_RT kernels because the memory allocator is
fully preemptible and therefore cannot be invoked from truly atomic
contexts.  However, it is perfectly fine to invoke the memory allocator
while holding normal non-raw spinlocks because they do not disable
preemption on PREEMPT_RT kernels::

  spin_lock(&lock);
  p = kmalloc(sizeof(*p), GFP_ATOMIC);


bit spinlocks
-------------

PREEMPT_RT cannot substitute bit spinlocks because a single bit is too
small to accommodate an RT-mutex.  Therefore, the semantics of bit
spinlocks are preserved on PREEMPT_RT kernels, so that the raw_spinlock_t
caveats also apply to bit spinlocks.

Some bit spinlocks are replaced with regular spinlock_t for PREEMPT_RT
using conditional (#ifdef'ed) code changes at the usage site.  In contrast,
usage-site changes are not needed for the spinlock_t substitution.
Instead, conditionals in header files and the core locking implementation
enable the compiler to do the substitution transparently.


Lock type nesting rules
=======================

The most basic rules are:

  - Lock types of the same lock category (sleeping, CPU local, spinning)
    can nest arbitrarily as long as they respect the general lock ordering
    rules to prevent deadlocks.

  - Sleeping lock types cannot nest inside CPU local and spinning lock types.

  - CPU local and spinning lock types can nest inside sleeping lock types.

  - Spinning lock types can nest inside all lock types.

These constraints apply both in PREEMPT_RT and otherwise.

The fact that PREEMPT_RT changes the lock category of spinlock_t and
rwlock_t from spinning to sleeping and substitutes local_lock with a
per-CPU spinlock_t means that they cannot be acquired while holding a raw
spinlock.  This results in the following nesting ordering:

  1) Sleeping locks
  2) spinlock_t, rwlock_t, local_lock
  3) raw_spinlock_t and bit spinlocks

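For example, the following sketch (the lock names are hypothetical) is a
valid nesting on both PREEMPT_RT and non-PREEMPT_RT kernels because each
lock is taken while holding only locks from the same or an earlier category
in the ordering above::

  mutex_lock(&big_mutex);       /* 1) sleeping lock  */
  spin_lock(&obj_lock);         /* 2) spinlock_t     */
  raw_spin_lock(&hw_lock);      /* 3) raw_spinlock_t */

  raw_spin_unlock(&hw_lock);
  spin_unlock(&obj_lock);
  mutex_unlock(&big_mutex);

Reversing the order, for example acquiring a spinlock_t while holding a
raw_spinlock_t, violates the ordering because spinlock_t sleeps on
PREEMPT_RT kernels.
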
Lockdep will complain if these constraints are violated, both in
PREEMPT_RT and otherwise.