.\" Copyright (c) 2007 Julian Elischer (julian - freebsd org )
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd February 3, 2023
.Dt LOCKING 9
.Os
.Sh NAME
.Nm locking
.Nd kernel synchronization primitives
.Sh DESCRIPTION
The
.Em FreeBSD
kernel is written to run across multiple CPUs and as such provides
several different synchronization primitives to allow developers
to safely access and manipulate many data types.
.Ss Mutexes
Mutexes (also called "blocking mutexes") are the most commonly used
synchronization primitive in the kernel.
A thread acquires (locks) a mutex before accessing data shared with other
threads (including interrupt threads), and releases (unlocks) it afterwards.
If the mutex cannot be acquired, the thread requesting it will wait.
Mutexes are adaptive by default, meaning that
if the owner of a contended mutex is currently running on another CPU,
then a thread attempting to acquire the mutex will spin rather than yielding
the processor.
Mutexes fully support priority propagation.
.Pp
See
.Xr mutex 9
for details.
.Ss Spin Mutexes
Spin mutexes are a variation of basic mutexes; the main difference between
the two is that spin mutexes never block.
Instead, they spin while waiting for the lock to be released.
To avoid deadlock, a thread that holds a spin mutex must never yield its CPU.
Unlike ordinary mutexes, spin mutexes disable interrupts when acquired.
Since disabling interrupts can be expensive, they are generally slower to
acquire and release.
Spin mutexes should be used only when absolutely necessary,
e.g. to protect data shared
with interrupt filter code (see
.Xr bus_setup_intr 9
for details),
or for scheduler internals.
.Ss Mutex Pools
With most synchronization primitives, such as mutexes, the programmer must
provide memory to hold the primitive.
For example, a mutex may be embedded inside the structure it protects.
Mutex pools provide a preallocated set of mutexes to avoid this
requirement.
Note that mutexes from a pool may only be used as leaf locks.
.Pp
See
.Xr mtx_pool 9
for details.
.Ss Reader/Writer Locks
Reader/writer locks allow shared access to protected data by multiple threads
or exclusive access by a single thread.
The threads with shared access are known as
.Em readers
since they should only read the protected data.
A thread with exclusive access is known as a
.Em writer
since it may modify protected data.
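.Pp
The reader and writer roles can be sketched as follows.
This is only an illustration, assuming the
.Fn rw_init ,
.Fn rw_rlock ,
and
.Fn rw_wlock
family documented in
.Xr rwlock 9 ;
the names
.Va foo_lock
and
.Va foo_count
are hypothetical and do not exist in the kernel.
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/rwlock.h>

static struct rwlock foo_lock;	/* hypothetical lock */
static int foo_count;		/* hypothetical data it protects */

static void
foo_init(void)
{
	rw_init(&foo_lock, "foo lock");
}

static int
foo_read(void)
{
	int v;

	rw_rlock(&foo_lock);	/* shared: other readers may enter */
	v = foo_count;
	rw_runlock(&foo_lock);
	return (v);
}

static void
foo_increment(void)
{
	rw_wlock(&foo_lock);	/* exclusive: excludes readers and writers */
	foo_count++;
	rw_wunlock(&foo_lock);
}
.Ed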
.Pp
Reader/writer locks can be treated as mutexes (see above and
.Xr mutex 9 )
with shared/exclusive semantics.
Reader/writer locks support priority propagation like mutexes,
but priority is propagated only to an exclusive holder.
This limitation comes from the fact that shared owners
are anonymous.
.Pp
See
.Xr rwlock 9
for details.
.Ss Read-Mostly Locks
Read-mostly locks are similar to
.Em reader/writer
locks but optimized for very infrequent write locking.
.Em Read-mostly
locks implement full priority propagation by tracking shared owners
using a caller-supplied
.Em tracker
data structure.
.Pp
See
.Xr rmlock 9
for details.
.Ss Sleepable Read-Mostly Locks
Sleepable read-mostly locks are a variation on read-mostly locks.
Threads holding an exclusive lock may sleep,
but threads holding a shared lock may not.
Priority is propagated to shared owners but not to exclusive owners.
.Ss Shared/exclusive locks
Shared/exclusive locks are similar to reader/writer locks; the main difference
between them is that shared/exclusive locks may be held during unbounded sleep.
Acquiring a contested shared/exclusive lock can perform an unbounded sleep.
These locks do not support priority propagation.
.Pp
See
.Xr sx 9
for details.
.Ss Lockmanager locks
Lockmanager locks are sleepable shared/exclusive locks used mostly in
.Xr VFS 9
.Po
as a
.Xr vnode 9
lock
.Pc
and in the buffer cache
.Po
.Xr BUF_LOCK 9
.Pc .
They have features other lock types do not have such as sleep
timeouts, blocking upgrades,
writer starvation avoidance, draining, and an interlock mutex,
but this makes them complicated both to use and to implement;
for this reason, they should be avoided.
.Pp
See
.Xr lock 9
for details.
.Ss Non-blocking synchronization
The kernel has two facilities,
.Xr epoch 9
and
.Xr smr 9 ,
which can be used to provide read-only access to a data structure while one or
more writers are concurrently modifying the data structure.
Specifically, readers using
.Xr epoch 9
and
.Xr smr 9
to synchronize accesses do not block writers, in contrast with reader/writer
locks, and they help ensure that memory freed by writers is not reused until
all readers which may be accessing it have finished.
Thus, they are a useful building block in the construction of lock-free
data structures.
.Pp
These facilities are difficult to use correctly and should be avoided
in favor of traditional mutual exclusion-based synchronization,
except when performance or non-blocking guarantees are a major concern.
.Pp
See
.Xr epoch 9
and
.Xr smr 9
for details.
.Ss Counting semaphores
Counting semaphores provide a mechanism for synchronizing access
to a pool of resources.
Unlike mutexes, semaphores do not have the concept of an owner,
so they can be useful in situations where one thread needs
to acquire a resource, and another thread needs to release it.
They are largely deprecated.
.Pp
See
.Xr sema 9
for details.
.Ss Condition variables
Condition variables are used in conjunction with locks to wait for
a condition to become true.
A thread must hold the associated lock before calling one of the
.Fn cv_wait
functions.
When a thread waits on a condition, the lock
is atomically released before the thread yields the processor
and reacquired before the function call returns.
Condition variables may be used with blocking mutexes,
reader/writer locks, read-mostly locks, and shared/exclusive locks.
.Pp
See
.Xr condvar 9
for details.
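.Pp
The wait pattern described above for condition variables can be sketched
as follows.
This is a minimal illustration, assuming a blocking mutex as the associated
lock and the
.Fn cv_init ,
.Fn cv_wait ,
and
.Fn cv_signal
calls documented in
.Xr condvar 9 ;
the names
.Va work_mtx ,
.Va work_cv ,
and
.Va work_ready
are hypothetical.
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>
#include <sys/condvar.h>

static struct mtx work_mtx;	/* hypothetical lock for work_ready */
static struct cv work_cv;	/* hypothetical condition variable */
static int work_ready;		/* the condition being waited for */

static void
work_init(void)
{
	mtx_init(&work_mtx, "work mutex", NULL, MTX_DEF);
	cv_init(&work_cv, "work cv");
}

static void
work_wait(void)
{
	mtx_lock(&work_mtx);
	/* Recheck in a loop: the lock is dropped while waiting. */
	while (!work_ready)
		cv_wait(&work_cv, &work_mtx);
	work_ready = 0;
	mtx_unlock(&work_mtx);
}

static void
work_post(void)
{
	mtx_lock(&work_mtx);
	work_ready = 1;
	cv_signal(&work_cv);	/* wake one waiter */
	mtx_unlock(&work_mtx);
}
.Ed
.Pp
Changing the condition only while holding the lock is what prevents a waiter
from missing the update between its check and its call to
.Fn cv_wait .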
.Ss Sleep/Wakeup
The functions
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
.Fn pause ,
.Fn wakeup ,
and
.Fn wakeup_one
also handle event-based thread blocking.
Unlike condition variables,
arbitrary addresses may be used as wait channels and a dedicated
structure does not need to be allocated.
However, care must be taken to ensure that wait channel addresses are
unique to an event.
If a thread must wait for an external event, it is put to sleep by
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
or
.Fn pause .
Threads may also wait using one of the locking primitive sleep routines
.Xr mtx_sleep 9 ,
.Xr rw_sleep 9 ,
or
.Xr sx_sleep 9 .
.Pp
The parameter
.Fa chan
is an arbitrary address that uniquely identifies the event on which
the thread is being put to sleep.
All threads sleeping on a single
.Fa chan
are woken up later by
.Fn wakeup
.Pq often called from inside an interrupt routine
to indicate that the
event the thread was blocking on has occurred.
.Pp
Several of the sleep functions including
.Fn msleep ,
.Fn msleep_spin ,
and the locking primitive sleep routines specify an additional lock
parameter.
The lock will be released before sleeping and reacquired
before the sleep routine returns.
If
.Fa priority
includes the
.Dv PDROP
flag, then the lock will not be reacquired before returning.
The lock is used to ensure that a condition can be checked atomically,
and that the current thread can be suspended without missing a
change to the condition or an associated wakeup.
In addition, all of the sleep routines will fully drop the
.Va Giant
mutex
.Pq even if recursed
while the thread is suspended and will reacquire the
.Va Giant
mutex
.Pq restoring any recursion
before the function returns.
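.Pp
The role of the lock parameter can be sketched as follows.
This is a minimal illustration, assuming
.Fn mtx_sleep
and
.Fn wakeup
as documented in
.Xr sleep 9 ;
the names
.Va io_mtx
and
.Va io_done
are hypothetical, and the address of
.Va io_done
is used as the wait channel.
A
.Fa priority
of 0 leaves the thread priority unchanged and a timeout of 0 means that the
sleep has no time limit.
.Bd -literal -offset indent
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/lock.h>
#include <sys/mutex.h>

static struct mtx io_mtx;	/* hypothetical lock for io_done */
static int io_done;		/* hypothetical condition and channel */

static void
io_init(void)
{
	mtx_init(&io_mtx, "io mutex", NULL, MTX_DEF);
}

static void
io_wait(void)
{
	mtx_lock(&io_mtx);
	/* io_mtx is released while asleep and reacquired on return. */
	while (!io_done)
		mtx_sleep(&io_done, &io_mtx, 0, "iowait", 0);
	mtx_unlock(&io_mtx);
}

static void
io_complete(void)
{
	mtx_lock(&io_mtx);
	io_done = 1;
	wakeup(&io_done);	/* wakes all threads sleeping on the channel */
	mtx_unlock(&io_mtx);
}
.Ed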
.Pp
The
.Fn pause
function is a special sleep function that waits for a specified
amount of time to pass before the thread resumes execution.
This sleep cannot be terminated early by either an explicit
.Fn wakeup
or a signal.
.Pp
See
.Xr sleep 9
for details.
.Ss Giant
Giant is a special mutex used to protect data structures that do not
yet have their own locks.
Since it provides semantics akin to the old
.Xr spl 9
interface,
Giant has special characteristics:
.Bl -enum
.It
It is recursive.
.It
Drivers can request that Giant be locked around them
by not marking themselves MPSAFE.
Note that infrastructure to do this is slowly going away as non-MPSAFE
drivers either become properly locked or disappear.
.It
Giant must be locked before other non-sleepable locks.
.It
Giant is dropped during unbounded sleeps and reacquired after wakeup.
.It
There are places in the kernel that drop Giant and pick it back up
again.
Sleep locks will do this before sleeping.
Parts of the network or VM code may do this as well.
This means that you cannot count on Giant keeping other code from
running if your code sleeps, even if you want it to.
.El
.Sh INTERACTIONS
The primitives can interact and have a number of rules regarding how
they can and cannot be combined.
Many of these rules are checked by
.Xr witness 4 .
.Ss Bounded vs. Unbounded Sleep
In a bounded sleep
.Po also referred to as
.Dq blocking
.Pc
the only resource needed to resume execution of a thread
is CPU time for the owner of a lock that the thread is waiting to acquire.
In an unbounded sleep
.Po
often referred to as simply
.Dq sleeping
.Pc
a thread waits for an external event or for a condition
to become true.
In particular,
a dependency chain of threads in bounded sleeps should always make forward
progress,
since there is always CPU time available.
This requires that no thread in a bounded sleep is waiting for a lock held
by a thread in an unbounded sleep.
To avoid priority inversions,
a thread in a bounded sleep lends its priority to the owner of the lock
that it is waiting for.
.Pp
The following primitives perform bounded sleeps:
mutexes, reader/writer locks and read-mostly locks.
.Pp
The following primitives perform unbounded sleeps:
sleepable read-mostly locks, shared/exclusive locks, lockmanager locks,
counting semaphores, condition variables, and sleep/wakeup.
.Ss General Principles
.Bl -bullet
.It
It is an error to do any operation that could result in yielding the processor
while holding a spin mutex.
.It
It is an error to do any operation that could result in unbounded sleep
while holding any primitive from the 'bounded sleep' group.
For example, it is an error to try to acquire a shared/exclusive lock while
holding a mutex, or to try to allocate memory with M_WAITOK while holding a
reader/writer lock.
.Pp
Note that the lock passed to one of the
.Fn sleep
or
.Fn cv_wait
functions is dropped before the thread enters the unbounded sleep and does
not violate this rule.
.It
It is an error to do any operation that could result in yielding of
the processor when running inside an interrupt filter.
.It
It is an error to do any operation that could result in unbounded sleep when
running inside an interrupt thread.
.El
.Ss Interaction table
The following table shows what you can and cannot do while holding
one of the locking primitives discussed.
Note that
.Dq sleep
includes
.Fn sema_wait ,
.Fn sema_timedwait ,
any of the
.Fn cv_wait
functions,
and any of the
.Fn sleep
functions.
.Bl -column ".Ic xxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
.It Em " You want:" Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
.It Em "You have: " Ta -------- Ta -------- Ta ------ Ta -------- Ta ------ Ta ------
.It spin mtx Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no-1
.It mutex/rw Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
.It rmlock Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
.It sleep rm Ta \&ok Ta \&ok Ta \&ok Ta \&ok-2 Ta \&ok-2 Ta \&ok-2/3
.It sx Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok-3
.It lockmgr Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
.El
.Pp
.Em *1
There are calls that atomically release this primitive when going to sleep
and reacquire it on wakeup
.Po
.Fn mtx_sleep ,
.Fn rw_sleep ,
.Fn msleep_spin ,
etc.
.Pc .
.Pp
.Em *2
These cases are only allowed while holding a write lock on a sleepable
read-mostly lock.
.Pp
.Em *3
Though one can sleep while holding this lock,
one can also use a
.Fn sleep
function to atomically release this primitive when going to sleep and
reacquire it on wakeup.
.Pp
Note that non-blocking try operations on locks are always permitted.
.Ss Context mode table
The next table shows what can be used in different contexts.
At this time, the table is rather easy to remember.
.Bl -column ".Ic Xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
.It Em "Context:" Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
.It interrupt filter: Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
.It interrupt thread: Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
.It callout: Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
.It direct callout: Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
.It system call: Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
.El
.Sh SEE ALSO
.Xr lockstat 1 ,
.Xr witness 4 ,
.Xr atomic 9 ,
.Xr BUS_SETUP_INTR 9 ,
.Xr callout 9 ,
.Xr condvar 9 ,
.Xr epoch 9 ,
.Xr lock 9 ,
.Xr LOCK_PROFILING 9 ,
.Xr mtx_pool 9 ,
.Xr mutex 9 ,
.Xr rmlock 9 ,
.Xr rwlock 9 ,
.Xr sema 9 ,
.Xr sleep 9 ,
.Xr smr 9 ,
.Xr sx 9
.Sh BUGS
There are too many locking primitives to choose from.