.\" Copyright (c) 2007 Julian Elischer (julian - freebsd org )
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 3, 2023
.Dt LOCKING 9
.Os
.Sh NAME
.Nm locking
.Nd kernel synchronization primitives
.Sh DESCRIPTION
The
.Em FreeBSD
kernel is written to run across multiple CPUs and as such provides
several different synchronization primitives to allow developers
to safely access and manipulate many data types.
.Ss Mutexes
Mutexes (also called "blocking mutexes") are the most commonly used
synchronization primitive in the kernel.
A thread acquires (locks) a mutex before accessing data shared with other
threads (including interrupt threads), and releases (unlocks) it afterwards.
If the mutex cannot be acquired, the thread requesting it will wait.
Mutexes are adaptive by default, meaning that
if the owner of a contended mutex is currently running on another CPU,
then a thread attempting to acquire the mutex will spin rather than yielding
the processor.
Mutexes fully support priority propagation.
.Pp
See
.Xr mutex 9
for details.
.Ss Spin Mutexes
Spin mutexes are a variation of basic mutexes; the main difference between
the two is that spin mutexes never block.
Instead, they spin while waiting for the lock to be released.
To avoid deadlock, a thread that holds a spin mutex must never yield its CPU.
Unlike ordinary mutexes, spin mutexes disable interrupts when acquired.
Since disabling interrupts can be expensive, they are generally slower to
acquire and release.
Spin mutexes should be used only when absolutely necessary,
e.g. to protect data shared
with interrupt filter code (see
.Xr bus_setup_intr 9
for details),
or for scheduler internals.
.Ss Mutex Pools
With most synchronization primitives, such as mutexes, the programmer must
provide memory to hold the primitive.
For example, a mutex may be embedded inside the structure it protects.
Mutex pools provide a preallocated set of mutexes to avoid this
requirement.
Note that mutexes from a pool may only be used as leaf locks.
.Pp
See
.Xr mtx_pool 9
for details.
.Ss Reader/Writer Locks
Reader/writer locks allow shared access to protected data by multiple threads
or exclusive access by a single thread.
The threads with shared access are known as
.Em readers
since they should only read the protected data.
A thread with exclusive access is known as a
.Em writer
since it may modify protected data.
.Pp
Reader/writer locks can be treated as mutexes (see above and
.Xr mutex 9 )
with shared/exclusive semantics.
Reader/writer locks support priority propagation like mutexes,
but priority is propagated only to an exclusive holder.
This limitation comes from the fact that shared owners
are anonymous.
.Pp
See
.Xr rwlock 9
for details.
.Ss Read-Mostly Locks
Read-mostly locks are similar to
.Em reader/writer
locks but optimized for very infrequent write locking.
.Em Read-mostly
locks implement full priority propagation by tracking shared owners
using a caller-supplied
.Em tracker
data structure.
.Pp
See
.Xr rmlock 9
for details.
.Ss Sleepable Read-Mostly Locks
Sleepable read-mostly locks are a variation on read-mostly locks.
Threads holding an exclusive lock may sleep,
but threads holding a shared lock may not.
Priority is propagated to shared owners but not to exclusive owners.
.Ss Shared/exclusive locks
Shared/exclusive locks are similar to reader/writer locks; the main difference
between them is that shared/exclusive locks may be held during unbounded sleep.
Acquiring a contested shared/exclusive lock can perform an unbounded sleep.
These locks do not support priority propagation.
.Pp
See
.Xr sx 9
for details.
.Ss Lockmanager locks
Lockmanager locks are sleepable shared/exclusive locks used mostly in
.Xr VFS 9
.Po
as a
.Xr vnode 9
lock
.Pc
and in the buffer cache
.Po
.Xr BUF_LOCK 9
.Pc .
They have features other lock types do not have such as sleep
timeouts, blocking upgrades,
writer starvation avoidance, draining, and an interlock mutex,
but this makes them complicated both to use and to implement;
for this reason, they should be avoided.
.Pp
See
.Xr lock 9
for details.
.Ss Non-blocking synchronization
The kernel has two facilities,
.Xr epoch 9
and
.Xr smr 9 ,
which can be used to provide read-only access to a data structure while one or
more writers are concurrently modifying the data structure.
Specifically, readers using
.Xr epoch 9
and
.Xr smr 9
to synchronize accesses do not block writers, in contrast with reader/writer
locks, and they help ensure that memory freed by writers is not reused until
all readers which may be accessing it have finished.
Thus, they are a useful building block in the construction of lock-free
data structures.
.Pp
These facilities are difficult to use correctly and should be avoided
in preference to traditional mutual exclusion-based synchronization,
except when performance or non-blocking guarantees are a major concern.
.Pp
See
.Xr epoch 9
and
.Xr smr 9
for details.
.Ss Counting semaphores
Counting semaphores provide a mechanism for synchronizing access
to a pool of resources.
Unlike mutexes, semaphores do not have the concept of an owner,
so they can be useful in situations where one thread needs
to acquire a resource, and another thread needs to release it.
They are largely deprecated.
.Pp
See
.Xr sema 9
for details.
.Ss Condition variables
Condition variables are used in conjunction with locks to wait for
a condition to become true.
A thread must hold the associated lock before calling one of the
.Fn cv_wait
functions.
When a thread waits on a condition, the lock
is atomically released before the thread yields the processor
and reacquired before the function call returns.
Condition variables may be used with blocking mutexes,
reader/writer locks, read-mostly locks, and shared/exclusive locks.
.Pp
See
.Xr condvar 9
for details.
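.Pp
The wait protocol can be sketched with a userspace analogue.
The example below is an illustration only and assumes POSIX threads rather
than the kernel interfaces:
.Fn pthread_cond_wait
stands in for
.Fn cv_wait ,
and the
.Vt struct mailbox
type and the
.Fn producer
and
.Fn mailbox_demo
functions are hypothetical names.
.Bd -literal
```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state; the mutex protects `ready` and `value`. */
struct mailbox {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;
	bool		ready;
	int		value;
};

static void *
producer(void *arg)
{
	struct mailbox *mb = arg;

	pthread_mutex_lock(&mb->lock);
	mb->value = 42;
	mb->ready = true;
	/* Wake the waiter; it reacquires the lock before its wait returns. */
	pthread_cond_signal(&mb->cv);
	pthread_mutex_unlock(&mb->lock);
	return (NULL);
}

/* Wait for the producer and return the value it published, or -1. */
int
mailbox_demo(void)
{
	struct mailbox mb = { .ready = false, .value = 0 };
	pthread_t td;
	int v = -1;

	pthread_mutex_init(&mb.lock, NULL);
	pthread_cond_init(&mb.cv, NULL);

	if (pthread_create(&td, NULL, producer, &mb) == 0) {
		/* The lock must be held before waiting on the condition. */
		pthread_mutex_lock(&mb.lock);
		/* Recheck the predicate in a loop: wakeups may be spurious. */
		while (!mb.ready)
			pthread_cond_wait(&mb.cv, &mb.lock);
		v = mb.value;
		pthread_mutex_unlock(&mb.lock);
		pthread_join(td, NULL);
	}

	pthread_cond_destroy(&mb.cv);
	pthread_mutex_destroy(&mb.lock);
	return (v);
}
```
.Ed
.Pp
Calling
.Fn mailbox_demo
returns the value published by the producer thread.
The predicate is rechecked in a loop because a wait may return without the
condition having become true.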
.Ss Sleep/Wakeup
The functions
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
.Fn pause ,
.Fn wakeup ,
and
.Fn wakeup_one
also handle event-based thread blocking.
Unlike condition variables,
arbitrary addresses may be used as wait channels and a dedicated
structure does not need to be allocated.
However, care must be taken to ensure that wait channel addresses are
unique to an event.
If a thread must wait for an external event, it is put to sleep by
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
or
.Fn pause .
Threads may also wait using one of the locking primitive sleep routines
.Xr mtx_sleep 9 ,
.Xr rw_sleep 9 ,
or
.Xr sx_sleep 9 .
.Pp
The parameter
.Fa chan
is an arbitrary address that uniquely identifies the event on which
the thread is being put to sleep.
All threads sleeping on a single
.Fa chan
are woken up later by
.Fn wakeup
.Pq often called from inside an interrupt routine
to indicate that the
event the thread was blocking on has occurred.
.Pp
Several of the sleep functions including
.Fn msleep ,
.Fn msleep_spin ,
and the locking primitive sleep routines specify an additional lock
parameter.
The lock will be released before sleeping and reacquired
before the sleep routine returns.
If
.Fa priority
includes the
.Dv PDROP
flag, then the lock will not be reacquired before returning.
The lock is used to ensure that a condition can be checked atomically,
and that the current thread can be suspended without missing a
change to the condition or an associated wakeup.
In addition, all of the sleep routines will fully drop the
.Va Giant
mutex
.Pq even if recursed
while the thread is suspended and will reacquire the
.Va Giant
mutex
.Pq restoring any recursion
before the function returns.
.Pp
The
.Fn pause
function is a special sleep function that waits for a specified
amount of time to pass before the thread resumes execution.
This sleep cannot be terminated early by either an explicit
.Fn wakeup
or a signal.
.Pp
See
.Xr sleep 9
for details.
.Ss Giant
Giant is a special mutex used to protect data structures that do not
yet have their own locks.
Since it provides semantics akin to the old
.Xr spl 9
interface,
Giant has special characteristics:
.Bl -enum
.It
It is recursive.
.It
Drivers can request that Giant be locked around them
by not marking themselves MPSAFE.
Note that infrastructure to do this is slowly going away as non-MPSAFE
drivers either become properly locked or disappear.
.It
Giant must be locked before other non-sleepable locks.
.It
Giant is dropped during unbounded sleeps and reacquired after wakeup.
.It
There are places in the kernel that drop Giant and pick it back up
again.
Sleep locks will do this before sleeping.
Parts of the network or VM code may do this as well.
This means that you cannot count on Giant keeping other code from
running if your code sleeps, even if you want it to.
.El
.Sh INTERACTIONS
The primitives can interact and have a number of rules regarding how
they can and cannot be combined.
Many of these rules are checked by
.Xr witness 4 .
.Ss Bounded vs. Unbounded Sleep
In a bounded sleep
.Po also referred to as
.Dq blocking
.Pc
the only resource needed to resume execution of a thread
is CPU time for the owner of a lock that the thread is waiting to acquire.
In an unbounded sleep
.Po
often referred to as simply
.Dq sleeping
.Pc
a thread waits for an external event or for a condition
to become true.
In particular,
a dependency chain of threads in bounded sleeps should always make forward
progress,
since there is always CPU time available.
This requires that no thread in a bounded sleep is waiting for a lock held
by a thread in an unbounded sleep.
To avoid priority inversions,
a thread in a bounded sleep lends its priority to the owner of the lock
that it is waiting for.
.Pp
The following primitives perform bounded sleeps:
mutexes, reader/writer locks and read-mostly locks.
.Pp
The following primitives perform unbounded sleeps:
sleepable read-mostly locks, shared/exclusive locks, lockmanager locks,
counting semaphores, condition variables, and sleep/wakeup.
.Ss General Principles
.Bl -bullet
.It
It is an error to do any operation that could result in yielding the processor
while holding a spin mutex.
.It
It is an error to do any operation that could result in unbounded sleep
while holding any primitive from the 'bounded sleep' group.
For example, it is an error to try to acquire a shared/exclusive lock while
holding a mutex, or to try to allocate memory with M_WAITOK while holding a
reader/writer lock.
.Pp
Note that the lock passed to one of the
.Fn sleep
or
.Fn cv_wait
functions is dropped before the thread enters the unbounded sleep and does
not violate this rule.
.It
It is an error to do any operation that could result in yielding of
the processor when running inside an interrupt filter.
.It
It is an error to do any operation that could result in unbounded sleep when
running inside an interrupt thread.
.El
.Ss Interaction table
The following table shows what you can and cannot do while holding
one of the locking primitives discussed.
Note that
.Dq sleep
includes
.Fn sema_wait ,
.Fn sema_timedwait ,
any of the
.Fn cv_wait
functions,
and any of the
.Fn sleep
functions.
.Bl -column ".Ic xxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
.It Em " You want:" Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
.It Em "You have: " Ta -------- Ta -------- Ta ------ Ta -------- Ta ------ Ta ------
.It spin mtx Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no-1
.It mutex/rw Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
.It rmlock Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
.It sleep rm Ta \&ok Ta \&ok Ta \&ok Ta \&ok-2 Ta \&ok-2 Ta \&ok-2/3
.It sx Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok-3
.It lockmgr Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
.El
.Pp
.Em *1
There are calls that atomically release this primitive when going to sleep
and reacquire it on wakeup
.Po
.Fn mtx_sleep ,
.Fn rw_sleep ,
.Fn msleep_spin ,
etc.
.Pc .
.Pp
.Em *2
These cases are only allowed while holding a write lock on a sleepable
read-mostly lock.
.Pp
.Em *3
Though one can sleep while holding this lock,
one can also use a
.Fn sleep
function to atomically release this primitive when going to sleep and
reacquire it on wakeup.
.Pp
Note that non-blocking try operations on locks are always permitted.
.Ss Context mode table
The next table shows what can be used in different contexts.
At this time this is a rather easy to remember table.
.Bl -column ".Ic Xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
.It Em "Context:" Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
.It interrupt filter: Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
.It interrupt thread: Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
.It callout: Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
.It direct callout: Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
.It system call: Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
.El
.Sh SEE ALSO
.Xr lockstat 1 ,
.Xr witness 4 ,
.Xr atomic 9 ,
.Xr BUS_SETUP_INTR 9 ,
.Xr callout 9 ,
.Xr condvar 9 ,
.Xr epoch 9 ,
.Xr lock 9 ,
.Xr LOCK_PROFILING 9 ,
.Xr mtx_pool 9 ,
.Xr mutex 9 ,
.Xr rmlock 9 ,
.Xr rwlock 9 ,
.Xr sema 9 ,
.Xr sleep 9 ,
.Xr smr 9 ,
.Xr sx 9
.Sh BUGS
There are too many locking primitives to choose from.