.\" Copyright (c) 2007 Julian Elischer (julian - freebsd org )
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd May 25, 2012
.Dt LOCKING 9
.Os
.Sh NAME
.Nm locking
.Nd kernel synchronization primitives
.Sh DESCRIPTION
The
.Em FreeBSD
kernel is written to run across multiple CPUs and as such requires
several different synchronization primitives to allow the developers
to safely access and manipulate the many data types required.
.Ss Mutexes
Mutexes (also erroneously called "sleep mutexes") are the most commonly used
synchronization primitive in the kernel.
A thread acquires (locks) a mutex before accessing data shared with other
threads (including interrupt threads), and releases (unlocks) it afterwards.
If the mutex cannot be acquired, the thread requesting it will wait.
Mutexes are by default adaptive, meaning that
if the owner of a contended mutex is currently running on another CPU,
then a thread attempting to acquire the mutex will briefly spin
in the hope that the owner is only briefly holding it,
and might release it shortly.
If the owner does not do so, the waiting thread proceeds to yield the processor,
allowing other threads to run.
If the owner is not currently actually running then the spin step is skipped.
Mutexes fully support priority propagation.
.Pp
See
.Xr mutex 9
for details.
.Ss Spin mutexes
Spin mutexes are a variation of basic mutexes; the main difference between
the two is that spin mutexes never yield the processor.
Instead, they spin, waiting for the thread holding the lock
(which must be running on another CPU) to release it.
Spin mutexes disable interrupts while they are held so as to not get preempted.
Since disabling interrupts is expensive, they are also generally slower.
Spin mutexes should be used only when necessary, e.g. to protect data shared
with interrupt filter code (see
.Xr bus_setup_intr 9
for details).
.Ss Pool mutexes
With most synchronization primitives, such as mutexes, the programmer must
provide a piece of allocated memory to hold the primitive.
For example, a mutex may be embedded inside the structure it protects.
A pool mutex is a variant of a mutex without this requirement: to lock or
unlock a pool mutex, one uses the address of the structure being protected
with it, not the mutex itself.
Pool mutexes are seldom used.
.Pp
See
.Xr mtx_pool 9
for details.
.Ss Reader/writer locks
Reader/writer locks allow shared access to protected data by multiple threads,
or exclusive access by a single thread.
The threads with shared access are known as
.Em readers
since they should only read the protected data.
A thread with exclusive access is known as a
.Em writer
since it may modify protected data.
.Pp
Reader/writer locks can be treated as mutexes (see above and
.Xr mutex 9 )
with shared/exclusive semantics.
More specifically, regular mutexes can be
considered to be equivalent to a write-lock on an
.Em rw_lock .
The
.Em rw_lock
locks have priority propagation like mutexes, but priority
can be propagated only to an exclusive holder.
This limitation comes from the fact that shared owners
are anonymous.
Another important property is that shared holders of
.Em rw_lock
can recurse, but exclusive locks are not allowed to recurse.
This ability should not be used lightly and
.Em may go away .
.Pp
See
.Xr rwlock 9
for details.
.Ss Read-mostly locks
Read-mostly locks are similar to
.Em reader/writer
locks but are optimized for very infrequent write locking.
.Em Read-mostly
locks implement full priority propagation by tracking shared owners
using a caller-supplied
.Em tracker
data structure.
.Pp
See
.Xr rmlock 9
for details.
.Ss Shared/exclusive locks
Shared/exclusive locks are similar to reader/writer locks; the main difference
between them is that shared/exclusive locks may be held during unbounded sleep
(and may thus perform an unbounded sleep).
They are inherently less efficient than mutexes, reader/writer locks
and read-mostly locks.
They do not support priority propagation.
They should be considered to be closely related to
.Xr sleep 9 .
They could in some cases be
considered a conditional sleep.
.Pp
See
.Xr sx 9
for details.
.Ss Counting semaphores
Counting semaphores provide a mechanism for synchronizing access
to a pool of resources.
Unlike mutexes, semaphores do not have the concept of an owner,
so they can be useful in situations where one thread needs
to acquire a resource, and another thread needs to release it.
They are largely deprecated.
.Pp
See
.Xr sema 9
for details.
.Ss Condition variables
Condition variables are used in conjunction with mutexes to wait for
conditions to occur.
A thread must hold the mutex before calling the
.Fn cv_wait*
functions.
When a thread waits on a condition, the mutex
is atomically released before the thread yields the processor,
then reacquired before the function call returns.
.Pp
See
.Xr condvar 9
for details.
.Ss Giant
Giant is an instance of a mutex, with some special characteristics:
.Bl -enum
.It
It is recursive.
.It
Drivers and filesystems can request that Giant be locked around them
by not marking themselves MPSAFE.
Note that infrastructure to do this is slowly going away as non-MPSAFE
drivers either become properly locked or disappear.
.It
Giant must be locked first before other locks.
.It
It is OK to hold Giant while performing unbounded sleep; in such a case,
Giant will be dropped before sleeping and picked up after wakeup.
.It
There are places in the kernel that drop Giant and pick it back up
again.
Sleep locks will do this before sleeping.
Parts of the network or VM code may do this as well, depending on the
setting of a sysctl.
This means that you cannot count on Giant keeping other code from
running if your code sleeps, even if you want it to.
.El
.Ss Sleep/wakeup
The functions
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
.Fn pause ,
.Fn wakeup ,
and
.Fn wakeup_one
handle event-based thread blocking.
If a thread must wait for an external event, it is put to sleep by
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
or
.Fn pause .
Threads may also wait using one of the locking primitive sleep routines
.Xr mtx_sleep 9 ,
.Xr rw_sleep 9 ,
or
.Xr sx_sleep 9 .
.Pp
The parameter
.Fa chan
is an arbitrary address that uniquely identifies the event on which
the thread is being put to sleep.
All threads sleeping on a single
.Fa chan
are woken up later by
.Fn wakeup ,
often called from inside an interrupt routine, to indicate that the
resource the thread was blocking on is available now.
.Pp
Several of the sleep functions including
.Fn msleep ,
.Fn msleep_spin ,
and the locking primitive sleep routines specify an additional lock
parameter.
The lock will be released before sleeping and reacquired
before the sleep routine returns.
If
.Fa priority
includes the
.Dv PDROP
flag, then the lock will not be reacquired before returning.
The lock is used to ensure that a condition can be checked atomically,
and that the current thread can be suspended without missing a
change to the condition, or an associated wakeup.
In addition, all of the sleep routines will fully drop the
.Va Giant
mutex
(even if recursed)
while the thread is suspended and will reacquire the
.Va Giant
mutex before the function returns.
.Pp
See
.Xr sleep 9
for details.
.Ss Lockmanager locks
Lockmanager locks are shared/exclusive locks used mostly in
.Xr VFS 9 ,
in particular as a
.Xr vnode 9
lock.
They have features other lock types do not have, such as sleep timeout,
writer starvation avoidance, draining, and interlock mutex, but this makes them
complicated to implement; for this reason, they are deprecated.
.Pp
See
.Xr lock 9
for details.
.Sh INTERACTIONS
The primitives interact and have a number of rules regarding how
they can and cannot be combined.
Many of these rules are checked using the
.Xr witness 4
code.
.Ss Bounded vs. unbounded sleep
The following primitives perform bounded sleep:
mutexes, pool mutexes, reader/writer locks and read-mostly locks.
.Pp
The following primitives may perform an unbounded sleep:
shared/exclusive locks, counting semaphores, condition variables,
sleep/wakeup and lockmanager locks.
.Pp
It is an error to do any operation that could result in yielding the processor
while holding a spin mutex.
.Pp
As a general rule, it is an error to do any operation that could result
in unbounded sleep while holding any primitive from the 'bounded sleep' group.
For example, it is an error to try to acquire a shared/exclusive lock while
holding a mutex, or to try to allocate memory with M_WAITOK while holding
a reader/writer lock.
.Pp
As a special case, it is possible to call
.Fn sleep
or
.Fn mtx_sleep
while holding a single mutex.
It will atomically drop that mutex and reacquire it as part of waking up.
This is often a bad idea because it generally relies on the programmer having
good knowledge of all of the call graph above the place where
.Fn mtx_sleep
is being called and of the assumptions the calling code has made.
Because the lock gets dropped during sleep, one must re-test all
the assumptions that were made before, all the way up the call graph to the
place where the lock was acquired.
.Pp
It is an error to do any operation that could result in yielding of
the processor when running inside an interrupt filter.
.Pp
It is an error to do any operation that could result in unbounded sleep when
running inside an interrupt thread.
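.Pp
The rule that a condition must be re-tested once the lock has been dropped
during a sleep can be sketched in userspace.
The following is a minimal illustration, not kernel code: it assumes POSIX
threads, with
.Fn pthread_cond_wait
standing in for
.Fn mtx_sleep
and
.Fn pthread_cond_broadcast
standing in for
.Fn wakeup .
The names
.Vt struct demo_softc ,
.Fn demo_producer
and
.Fn demo_wait_for_value
are hypothetical and exist only for this example.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared state; every field is protected by lock. */
struct demo_softc {
	pthread_mutex_t	lock;
	pthread_cond_t	cv;	/* plays the role of the sleep channel */
	bool		ready;	/* the condition being waited for */
	int		value;	/* data published together with ready */
};

/* Producer: change the state under the lock, then wake the sleeper. */
static void *
demo_producer(void *arg)
{
	struct demo_softc *sc = arg;

	pthread_mutex_lock(&sc->lock);
	sc->value = 42;
	sc->ready = true;
	pthread_cond_broadcast(&sc->cv);	/* userspace stand-in for wakeup */
	pthread_mutex_unlock(&sc->lock);
	return (NULL);
}

/*
 * Consumer: sleep until ready is set.  The mutex is released while the
 * thread sleeps, so the condition is re-tested in a loop after each wakeup
 * instead of being assumed to hold.
 */
static int
demo_wait_for_value(void)
{
	struct demo_softc sc;
	pthread_t td;
	int error, value;

	error = pthread_mutex_init(&sc.lock, NULL);
	assert(error == 0);
	error = pthread_cond_init(&sc.cv, NULL);
	assert(error == 0);
	sc.ready = false;
	sc.value = 0;

	error = pthread_create(&td, NULL, demo_producer, &sc);
	assert(error == 0);

	pthread_mutex_lock(&sc.lock);
	while (!sc.ready)		/* a loop, never a single "if" */
		pthread_cond_wait(&sc.cv, &sc.lock);
	value = sc.value;		/* read only after the re-test */
	pthread_mutex_unlock(&sc.lock);

	error = pthread_join(td, NULL);
	assert(error == 0);
	pthread_cond_destroy(&sc.cv);
	pthread_mutex_destroy(&sc.lock);
	return (value);
}
```
.Ed
.Pp
In this sketch,
.Fn demo_wait_for_value
returns 42 once the producer thread has run.
The loop around the wait is what protects the caller from acting on a stale
assumption or a spurious wakeup; kernel code that sleeps with a lock dropped
needs the same discipline for every assumption made before the sleep.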
.Ss Interaction table
The following table shows what you can and cannot do while holding
one of the synchronization primitives discussed:
.Bl -column ".Ic xxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXX" -offset indent
.It Em "       You want:" Ta spin-mtx Ta mutex Ta rwlock Ta rmlock Ta sx Ta sleep
.It Em "You have:     " Ta ------ Ta ------ Ta ------ Ta ------ Ta ------ Ta ------
.It spin mtx  Ta \&ok-1 Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no-3
.It mutex     Ta \&ok Ta \&ok-1 Ta \&ok Ta \&ok Ta \&no Ta \&no-3
.It rwlock    Ta \&ok Ta \&ok Ta \&ok-2 Ta \&ok Ta \&no Ta \&no-3
.It rmlock    Ta \&ok Ta \&ok Ta \&ok Ta \&ok-2 Ta \&no-5 Ta \&no-5
.It sx        Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&no-2 Ta \&ok-4
.El
.Pp
.Em *1
Recursion is defined per lock.
Lock order is important.
.Pp
.Em *2
Readers can recurse though writers cannot.
Lock order is important.
.Pp
.Em *3
There are calls that atomically release this primitive when going to sleep
and reacquire it on wakeup (e.g.
.Fn mtx_sleep ,
.Fn rw_sleep
and
.Fn msleep_spin ) .
.Pp
.Em *4
Though one can sleep holding an sx lock, one can also use
.Fn sx_sleep
which will atomically release this primitive when going to sleep and
reacquire it on wakeup.
.Pp
.Em *5
.Em Read-mostly
locks can be initialized to support sleeping while holding a write lock.
See
.Xr rmlock 9
for details.
.Ss Context mode table
The next table shows what can be used in different contexts.
At this time, the table is rather easy to remember.
.Bl -column ".Ic Xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXX" -offset indent
.It Em "Context:"  Ta spin mtx Ta mutex Ta sx Ta rwlock Ta rmlock Ta sleep
.It interrupt filter:  Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
.It interrupt thread:  Ta \&ok Ta \&ok Ta \&no Ta \&ok Ta \&ok Ta \&no
.It callout:    Ta \&ok Ta \&ok Ta \&no Ta \&ok Ta \&no Ta \&no
.It syscall:    Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
.El
.Sh SEE ALSO
.Xr witness 4 ,
.Xr condvar 9 ,
.Xr lock 9 ,
.Xr mtx_pool 9 ,
.Xr mutex 9 ,
.Xr rmlock 9 ,
.Xr rwlock 9 ,
.Xr sema 9 ,
.Xr sleep 9 ,
.Xr sx 9 ,
.Xr BUS_SETUP_INTR 9 ,
.Xr LOCK_PROFILING 9
.Sh HISTORY
These functions appeared in
.Bsx 4.1
through
.Fx 7.0 .
.Sh BUGS
There are too many locking primitives to choose from.