.\" Copyright (c) 2007 Julian Elischer (julian - freebsd org )
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd March 29, 2022
.Dt LOCKING 9
.Os
.Sh NAME
.Nm locking
.Nd kernel synchronization primitives
.Sh DESCRIPTION
The
.Em FreeBSD
kernel is written to run across multiple CPUs and as such provides
several different synchronization primitives to allow developers
to safely access and manipulate many data types.
.Ss Mutexes
Mutexes (also called "blocking mutexes") are the most commonly used
synchronization primitive in the kernel.
A thread acquires (locks) a mutex before accessing data shared with other
threads (including interrupt threads), and releases (unlocks) it afterwards.
If the mutex cannot be acquired, the thread requesting it will wait.
Mutexes are adaptive by default, meaning that
if the owner of a contended mutex is currently running on another CPU,
then a thread attempting to acquire the mutex will spin rather than yielding
the processor.
Mutexes fully support priority propagation.
.Pp
See
.Xr mutex 9
for details.
.Ss Spin Mutexes
Spin mutexes are a variation of basic mutexes; the main difference between
the two is that spin mutexes never block.
Instead, they spin while waiting for the lock to be released.
To avoid deadlock, a thread that holds a spin mutex must never yield its CPU.
Unlike ordinary mutexes, spin mutexes disable interrupts when acquired.
Since disabling interrupts can be expensive, they are generally slower to
acquire and release.
Spin mutexes should be used only when absolutely necessary,
e.g. to protect data shared
with interrupt filter code (see
.Xr bus_setup_intr 9
for details),
or for scheduler internals.
.Ss Mutex Pools
With most synchronization primitives, such as mutexes, the programmer must
provide memory to hold the primitive.
For example, a mutex may be embedded inside the structure it protects.
Mutex pools provide a preallocated set of mutexes to avoid this
requirement.
Note that mutexes from a pool may only be used as leaf locks.
.Pp
See
.Xr mtx_pool 9
for details.
.Ss Reader/Writer Locks
Reader/writer locks allow shared access to protected data by multiple threads
or exclusive access by a single thread.
The threads with shared access are known as
.Em readers
since they should only read the protected data.
A thread with exclusive access is known as a
.Em writer
since it may modify protected data.
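.Pp
As a hedged illustration of the reader and writer roles described above,
the following sketch protects a counter with a reader/writer lock.
The names
.Va example_lock ,
.Va example_count ,
.Fn example_init ,
.Fn example_read ,
and
.Fn example_increment
are hypothetical and are not part of the kernel; only the
.Xr rwlock 9
calls are existing interfaces.
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/rwlock.h>

/* Hypothetical data protected by a hypothetical lock. */
static struct rwlock example_lock;
static int example_count;

static void
example_init(void)
{
	rw_init(&example_lock, "example");
}

/* Reader: shared access, must not modify example_count. */
static int
example_read(void)
{
	int v;

	rw_rlock(&example_lock);
	v = example_count;
	rw_runlock(&example_lock);
	return (v);
}

/* Writer: exclusive access, may modify example_count. */
static void
example_increment(void)
{
	rw_wlock(&example_lock);
	example_count++;
	rw_wunlock(&example_lock);
}
.Ed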
.Pp
Reader/writer locks can be treated as mutexes (see above and
.Xr mutex 9 )
with shared/exclusive semantics.
Reader/writer locks support priority propagation like mutexes,
but priority is propagated only to an exclusive holder.
This limitation comes from the fact that shared owners
are anonymous.
.Pp
See
.Xr rwlock 9
for details.
.Ss Read-Mostly Locks
Read-mostly locks are similar to
.Em reader/writer
locks but optimized for very infrequent write locking.
.Em Read-mostly
locks implement full priority propagation by tracking shared owners
using a caller-supplied
.Em tracker
data structure.
.Pp
See
.Xr rmlock 9
for details.
.Ss Sleepable Read-Mostly Locks
Sleepable read-mostly locks are a variation on read-mostly locks.
Threads holding an exclusive lock may sleep,
but threads holding a shared lock may not.
Priority is propagated to shared owners but not to exclusive owners.
.Ss Shared/exclusive locks
Shared/exclusive locks are similar to reader/writer locks; the main difference
between them is that shared/exclusive locks may be held during unbounded sleep.
Acquiring a contested shared/exclusive lock can perform an unbounded sleep.
These locks do not support priority propagation.
.Pp
See
.Xr sx 9
for details.
.Ss Lockmanager locks
Lockmanager locks are sleepable shared/exclusive locks used mostly in
.Xr VFS 9
.Po
as a
.Xr vnode 9
lock
.Pc
and in the buffer cache
.Po
.Xr BUF_LOCK 9
.Pc .
They have features other lock types do not have such as sleep
timeouts, blocking upgrades,
writer starvation avoidance, draining, and an interlock mutex,
but this makes them complicated both to use and to implement;
for this reason, they should be avoided.
.Pp
See
.Xr lock 9
for details.
.Ss Counting semaphores
Counting semaphores provide a mechanism for synchronizing access
to a pool of resources.
Unlike mutexes, semaphores do not have the concept of an owner,
so they can be useful in situations where one thread needs
to acquire a resource, and another thread needs to release it.
They are largely deprecated.
.Pp
See
.Xr sema 9
for details.
.Ss Condition variables
Condition variables are used in conjunction with locks to wait for
a condition to become true.
A thread must hold the associated lock before calling one of the
.Fn cv_wait
functions.
When a thread waits on a condition, the lock
is atomically released before the thread yields the processor
and reacquired before the function call returns.
Condition variables may be used with blocking mutexes,
reader/writer locks, read-mostly locks, and shared/exclusive locks.
.Pp
See
.Xr condvar 9
for details.
.Ss Sleep/Wakeup
The functions
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
.Fn pause ,
.Fn wakeup ,
and
.Fn wakeup_one
also handle event-based thread blocking.
Unlike condition variables,
arbitrary addresses may be used as wait channels and a dedicated
structure does not need to be allocated.
However, care must be taken to ensure that wait channel addresses are
unique to an event.
If a thread must wait for an external event, it is put to sleep by
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
or
.Fn pause .
Threads may also wait using one of the locking primitive sleep routines
.Xr mtx_sleep 9 ,
.Xr rw_sleep 9 ,
or
.Xr sx_sleep 9 .
.Pp
The parameter
.Fa chan
is an arbitrary address that uniquely identifies the event on which
the thread is being put to sleep.
All threads sleeping on a single
.Fa chan
are woken up later by
.Fn wakeup
.Pq often called from inside an interrupt routine
to indicate that the
event the thread was blocking on has occurred.
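.Pp
A minimal sketch of the wait channel pattern described above, assuming a
hypothetical flag
.Va example_ready
protected by a hypothetical mutex
.Va example_mtx
that has already been initialized with
.Fn mtx_init .
The address of the flag serves as the wait channel; none of the
.Va example_
names exist in the kernel.
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>

static struct mtx example_mtx;	/* hypothetical; set up with mtx_init() */
static int example_ready;	/* hypothetical condition flag */

/* Waiter: sleep on &example_ready until the flag is set. */
static void
example_wait(void)
{
	mtx_lock(&example_mtx);
	while (example_ready == 0) {
		/* Drops example_mtx while asleep, reacquires on return. */
		mtx_sleep(&example_ready, &example_mtx, 0, "exrdy", 0);
	}
	example_ready = 0;
	mtx_unlock(&example_mtx);
}

/* Waker: set the flag, then wake all threads sleeping on the channel. */
static void
example_post(void)
{
	mtx_lock(&example_mtx);
	example_ready = 1;
	wakeup(&example_ready);
	mtx_unlock(&example_mtx);
}
.Ed
.Pp
The waiter rechecks the flag in a loop because
.Fn wakeup
wakes every thread sleeping on the channel, and another thread may have
consumed the event first.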
.Pp
Several of the sleep functions including
.Fn msleep ,
.Fn msleep_spin ,
and the locking primitive sleep routines specify an additional lock
parameter.
The lock will be released before sleeping and reacquired
before the sleep routine returns.
If
.Fa priority
includes the
.Dv PDROP
flag, then the lock will not be reacquired before returning.
The lock is used to ensure that a condition can be checked atomically,
and that the current thread can be suspended without missing a
change to the condition or an associated wakeup.
In addition, all of the sleep routines will fully drop the
.Va Giant
mutex
.Pq even if recursed
while the thread is suspended and will reacquire the
.Va Giant
mutex
.Pq restoring any recursion
before the function returns.
.Pp
The
.Fn pause
function is a special sleep function that waits for a specified
amount of time to pass before the thread resumes execution.
This sleep cannot be terminated early by either an explicit
.Fn wakeup
or a signal.
.Pp
See
.Xr sleep 9
for details.
.Ss Giant
Giant is a special mutex used to protect data structures that do not
yet have their own locks.
Since it provides semantics akin to the old
.Xr spl 9
interface,
Giant has special characteristics:
.Bl -enum
.It
It is recursive.
.It
Drivers can request that Giant be locked around them
by not marking themselves MPSAFE.
Note that infrastructure to do this is slowly going away as non-MPSAFE
drivers either become properly locked or disappear.
.It
Giant must be locked before other non-sleepable locks.
.It
Giant is dropped during unbounded sleeps and reacquired after wakeup.
.It
There are places in the kernel that drop Giant and pick it back up
again.
Sleep locks will do this before sleeping.
Parts of the network or VM code may do this as well.
This means that you cannot count on Giant keeping other code from
running if your code sleeps, even if you want it to.
.El
.Sh INTERACTIONS
The primitives can interact and have a number of rules regarding how
they can and cannot be combined.
Many of these rules are checked by
.Xr witness 4 .
.Ss Bounded vs. Unbounded Sleep
In a bounded sleep
.Po also referred to as
.Dq blocking
.Pc
the only resource needed to resume execution of a thread
is CPU time for the owner of a lock that the thread is waiting to acquire.
In an unbounded sleep
.Po
often referred to as simply
.Dq sleeping
.Pc
a thread waits for an external event or for a condition
to become true.
In particular,
a dependency chain of threads in bounded sleeps should always make forward
progress,
since there is always CPU time available.
This requires that no thread in a bounded sleep is waiting for a lock held
by a thread in an unbounded sleep.
To avoid priority inversions,
a thread in a bounded sleep lends its priority to the owner of the lock
that it is waiting for.
.Pp
The following primitives perform bounded sleeps:
mutexes, reader/writer locks and read-mostly locks.
.Pp
The following primitives perform unbounded sleeps:
sleepable read-mostly locks, shared/exclusive locks, lockmanager locks,
counting semaphores, condition variables, and sleep/wakeup.
.Ss General Principles
.Bl -bullet
.It
It is an error to do any operation that could result in yielding the processor
while holding a spin mutex.
.It
It is an error to do any operation that could result in unbounded sleep
while holding any primitive from the 'bounded sleep' group.
For example, it is an error to try to acquire a shared/exclusive lock while
holding a mutex, or to try to allocate memory with M_WAITOK while holding a
reader/writer lock.
.Pp
Note that the lock passed to one of the
.Fn sleep
or
.Fn cv_wait
functions is dropped before the thread enters the unbounded sleep and does
not violate this rule.
.It
It is an error to do any operation that could result in yielding of
the processor when running inside an interrupt filter.
.It
It is an error to do any operation that could result in unbounded sleep when
running inside an interrupt thread.
.El
.Ss Interaction table
The following table shows what you can and cannot do while holding
one of the locking primitives discussed.
Note that
.Dq sleep
includes
.Fn sema_wait ,
.Fn sema_timedwait ,
any of the
.Fn cv_wait
functions,
and any of the
.Fn sleep
functions.
.Bl -column ".Ic xxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
.It Em " You want:" Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
.It Em "You have: " Ta -------- Ta -------- Ta ------ Ta -------- Ta ------ Ta ------
.It spin mtx Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no-1
.It mutex/rw Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
.It rmlock Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
.It sleep rm Ta \&ok Ta \&ok Ta \&ok Ta \&ok-2 Ta \&ok-2 Ta \&ok-2/3
.It sx Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok-3
.It lockmgr Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
.El
.Pp
.Em *1
There are calls that atomically release this primitive when going to sleep
and reacquire it on wakeup
.Po
.Fn mtx_sleep ,
.Fn rw_sleep ,
.Fn msleep_spin ,
etc.
.Pc .
.Pp
.Em *2
These cases are only allowed while holding a write lock on a sleepable
read-mostly lock.
.Pp
.Em *3
Though one can sleep while holding this lock,
one can also use a
.Fn sleep
function to atomically release this primitive when going to sleep and
reacquire it on wakeup.
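.Pp
Read as an ordering rule, the table implies that a sleepable lock is taken
before a non-sleepable one.
The sketch below assumes a hypothetical shared/exclusive lock
.Va example_sx
and a hypothetical mutex
.Va example_mtx ,
both already initialized; it is an illustration, not code from the kernel.
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/sx.h>

static struct sx example_sx;	/* hypothetical; set up with sx_init() */
static struct mtx example_mtx;	/* hypothetical; set up with mtx_init() */

static void
example_ordered(void)
{
	/* "You have: sx, you want: mutex" is ok in the table. */
	sx_xlock(&example_sx);
	mtx_lock(&example_mtx);
	/* Short critical section; no sleeping while the mutex is held. */
	mtx_unlock(&example_mtx);
	sx_xunlock(&example_sx);

	/*
	 * The reverse order, sx_xlock() while holding example_mtx,
	 * is a "no" entry in the mutex/rw row.
	 */
}
.Ed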
.Pp
Note that non-blocking try operations on locks are always permitted.
.Ss Context mode table
The next table shows what can be used in different contexts.
At this time the table is fairly easy to remember.
.Bl -column ".Ic Xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
.It Em "Context:" Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
.It interrupt filter: Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
.It interrupt thread: Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
.It callout: Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
.It direct callout: Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
.It system call: Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
.El
.Sh SEE ALSO
.Xr lockstat 1 ,
.Xr witness 4 ,
.Xr BUS_SETUP_INTR 9 ,
.Xr condvar 9 ,
.Xr lock 9 ,
.Xr LOCK_PROFILING 9 ,
.Xr mtx_pool 9 ,
.Xr mutex 9 ,
.Xr rmlock 9 ,
.Xr rwlock 9 ,
.Xr sema 9 ,
.Xr sleep 9 ,
.Xr sx 9 ,
.Xr timeout 9
.Sh HISTORY
These
functions appeared in
.Bsx 4.1
through
.Fx 7.0 .
.Sh BUGS
There are too many locking primitives to choose from.