.\" Copyright (c) 2007 Julian Elischer (julian - freebsd org )
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd November 3, 2010
.Dt LOCKING 9
.Os
.Sh NAME
.Nm locking
.Nd kernel synchronization primitives
.Sh DESCRIPTION
The
.Em FreeBSD
kernel is written to run across multiple CPUs and as such requires
several different synchronization primitives to allow the developers
to safely access and manipulate the many data types required.
.Ss Mutexes
Mutexes (also called "sleep mutexes") are the most commonly used
synchronization primitive in the kernel.
A thread acquires (locks) a mutex before accessing data shared with other
threads (including interrupt threads), and releases (unlocks) it afterwards.
If the mutex cannot be acquired, the thread requesting it will sleep.
Mutexes fully support priority propagation.
.Pp
See
.Xr mutex 9
for details.
.Ss Spin mutexes
Spin mutexes are a variation of basic mutexes; the main difference between
the two is that spin mutexes never sleep; instead, they spin, waiting
for the thread holding the lock, which runs on another CPU, to release it.
Unlike ordinary mutexes, spin mutexes disable interrupts when acquired.
Since disabling interrupts is expensive, they are also generally slower.
Spin mutexes should be used only when necessary, e.g. to protect data shared
with interrupt filter code (see
.Xr bus_setup_intr 9
for details).
.Ss Pool mutexes
With most synchronization primitives, such as mutexes, the programmer must
provide a piece of allocated memory to hold the primitive.
For example, a mutex may be embedded inside the structure it protects.
A pool mutex is a variant of a mutex without this requirement: to lock or
unlock a pool mutex, one uses the address of the structure being protected
by it, not the mutex itself.
Pool mutexes are seldom used.
.Pp
See
.Xr mtx_pool 9
for details.
.Ss Reader/writer locks
Reader/writer locks allow shared access to protected data by multiple threads,
or exclusive access by a single thread.
The threads with shared access are known as
.Em readers
since they should only read the protected data.
A thread with exclusive access is known as a
.Em writer
since it may modify protected data.
.Pp
Reader/writer locks can be treated as mutexes (see above and
.Xr mutex 9 )
with shared/exclusive semantics.
More specifically, regular mutexes can be
considered to be equivalent to a write-lock on an
.Em rw_lock .
The
.Em rw_lock
locks have priority propagation like mutexes, but priority
can be propagated only to an exclusive holder.
This limitation comes from the fact that shared owners
are anonymous.
Another important property is that shared holders of
.Em rw_lock
can recurse, but exclusive locks are not allowed to recurse.
This ability should not be used lightly and
.Em may go away .
.Pp
See
.Xr rwlock 9
for details.
.Ss Read-mostly locks
Read-mostly locks are similar to
.Em reader/writer
locks but optimized for very infrequent write locking.
.Em Read-mostly
locks implement full priority propagation by tracking shared owners
using a caller-supplied
.Em tracker
data structure.
.Pp
See
.Xr rmlock 9
for details.
.Ss Shared/exclusive locks
Shared/exclusive locks are similar to reader/writer locks; the main difference
between them is that shared/exclusive locks may be held during unbounded sleep
(and may thus perform an unbounded sleep).
They are inherently less efficient than mutexes, reader/writer locks
and read-mostly locks.
They do not support priority propagation.
They should be considered to be closely related to
.Xr sleep 9 .
In fact, in some cases they can be
considered a conditional sleep.
.Pp
See
.Xr sx 9
for details.
.Ss Counting semaphores
Counting semaphores provide a mechanism for synchronizing access
to a pool of resources.
Unlike mutexes, semaphores do not have the concept of an owner,
so they can be useful in situations where one thread needs
to acquire a resource, and another thread needs to release it.
They are largely deprecated.
.Pp
See
.Xr sema 9
for details.
.Ss Condition variables
Condition variables are used in conjunction with mutexes to wait for
conditions to occur.
A thread must hold the mutex before calling the
.Fn cv_wait*
functions.
When a thread waits on a condition, the mutex
is atomically released before the thread is blocked, then reacquired
before the function call returns.
.Pp
See
.Xr condvar 9
for details.
.Ss Giant
Giant is an instance of a mutex, with some special characteristics:
.Bl -enum
.It
It is recursive.
.It
Drivers and filesystems can request that Giant be locked around them
by not marking themselves MPSAFE.
Note that infrastructure to do this is slowly going away as non-MPSAFE
drivers either become properly locked or disappear.
.It
Giant must be acquired before any other locks.
.It
It is OK to hold Giant while performing unbounded sleep; in such a case,
Giant will be dropped before sleeping and picked up after wakeup.
.It
There are places in the kernel that drop Giant and pick it back up
again.
Sleep locks will do this before sleeping.
Parts of the network or VM code may do this as well, depending on the
setting of a sysctl.
This means that you cannot count on Giant keeping other code from
running if your code sleeps, even if you want it to.
.El
.Ss Sleep/wakeup
The functions
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
.Fn pause ,
.Fn wakeup ,
and
.Fn wakeup_one
handle event-based thread blocking.
If a thread must wait for an external event, it is put to sleep by
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
or
.Fn pause .
Threads may also wait using one of the locking primitive sleep routines
.Xr mtx_sleep 9 ,
.Xr rw_sleep 9 ,
or
.Xr sx_sleep 9 .
.Pp
The parameter
.Fa chan
is an arbitrary address that uniquely identifies the event on which
the thread is being put to sleep.
All threads sleeping on a single
.Fa chan
are woken up later by
.Fn wakeup ,
often called from inside an interrupt routine, to indicate that the
resource the thread was blocking on is available now.
.Pp
Several of the sleep functions, including
.Fn msleep ,
.Fn msleep_spin ,
and the locking primitive sleep routines, specify an additional lock
parameter.
The lock will be released before sleeping and reacquired
before the sleep routine returns.
If
.Fa priority
includes the
.Dv PDROP
flag, then the lock will not be reacquired before returning.
The lock is used to ensure that a condition can be checked atomically,
and that the current thread can be suspended without missing a
change to the condition, or an associated wakeup.
In addition, all of the sleep routines will fully drop the
.Va Giant
mutex
(even if recursed)
while the thread is suspended and will reacquire the
.Va Giant
mutex before the function returns.
.Pp
See
.Xr sleep 9
for details.
.Ss Lockmanager locks
Shared/exclusive locks, used mostly in
.Xr VFS 9 ,
in particular as a
.Xr vnode 9
lock.
They have features other lock types do not have, such as sleep timeout,
writer starvation avoidance, draining, and an interlock mutex, but this makes
them complicated to implement; for this reason, they are deprecated.
.Pp
See
.Xr lock 9
for details.
.Sh INTERACTIONS
The primitives interact and have a number of rules regarding how
they can and cannot be combined.
Many of these rules are checked using the
.Xr witness 4
code.
.Ss Bounded vs. unbounded sleep
The following primitives perform bounded sleep: mutexes, pool mutexes,
reader/writer locks and read-mostly locks.
.Pp
The following primitives block (perform unbounded sleep): shared/exclusive locks,
counting semaphores, condition variables, sleep/wakeup and lockmanager locks.
.Pp
It is an error to do any operation that could result in any kind of sleep while
holding a spin mutex.
.Pp
As a general rule, it is an error to do any operation that could result
in unbounded sleep while holding any primitive from the 'bounded sleep' group.
For example, it is an error to try to acquire a shared/exclusive lock while
holding a mutex, or to try to allocate memory with M_WAITOK while holding
a reader/writer lock.
.Pp
As a special case, it is possible to call
.Fn sleep
or
.Fn mtx_sleep
while holding a single mutex.
It will atomically drop that mutex and reacquire it as part of waking up.
This is often a bad idea because it generally relies on the programmer having
good knowledge of all of the call graph above the place where
.Fn mtx_sleep
is being called and the assumptions the calling code has made.
Because the lock gets dropped during sleep, one must re-test all
the assumptions that were made before, all the way up the call graph to the
place where the lock was acquired.
.Pp
It is an error to do any operation that could result in any kind of sleep when
running inside an interrupt filter.
.Pp
It is an error to do any operation that could result in unbounded sleep when
running inside an interrupt thread.
.Ss Interaction table
The following table shows what you can and cannot do while holding
one of the synchronization primitives discussed:
.Bl -column ".Ic xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXX" -offset indent
.It Xo
.Em "You have: You want:" Ta spin mtx Ta mutex Ta sx Ta rwlock Ta rmlock Ta sleep
.Xc
.It spin mtx Ta \&ok-1 Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no-3
.It mutex Ta \&ok Ta \&ok-1 Ta \&no Ta \&ok Ta \&ok Ta \&no-3
.It sx Ta \&ok Ta \&ok Ta \&ok-2 Ta \&ok Ta \&ok Ta \&ok-4
.It rwlock Ta \&ok Ta \&ok Ta \&no Ta \&ok-2 Ta \&ok Ta \&no-3
.It rmlock Ta \&ok Ta \&ok Ta \&ok-5 Ta \&ok Ta \&ok-2 Ta \&ok-5
.El
.Pp
.Em *1
Recursion is defined per lock.
Lock order is important.
.Pp
.Em *2
Readers can recurse, though writers cannot.
Lock order is important.
.Pp
.Em *3
There are calls that atomically release this primitive when going to sleep
and reacquire it on wakeup (e.g.
.Fn mtx_sleep ,
.Fn rw_sleep
and
.Fn msleep_spin ) .
.Pp
.Em *4
Though one can sleep holding an sx lock, one can also use
.Fn sx_sleep
which will atomically release this primitive when going to sleep and
reacquire it on wakeup.
.Pp
.Em *5
.Em Read-mostly
locks can be initialized to support sleeping while holding a write lock.
See
.Xr rmlock 9
for details.
.Ss Context mode table
The next table shows what can be used in different contexts.
At this time, this is a rather easy-to-remember table.
.Bl -column ".Ic Xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXX" -offset indent
.It Xo
.Em "Context:" Ta spin mtx Ta mutex Ta sx Ta rwlock Ta rmlock Ta sleep
.Xc
.It interrupt filter: Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
.It interrupt thread: Ta \&ok Ta \&ok Ta \&no Ta \&ok Ta \&ok Ta \&no
.It callout: Ta \&ok Ta \&ok Ta \&no Ta \&ok Ta \&no Ta \&no
.It syscall: Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
.El
.Sh SEE ALSO
.Xr witness 4 ,
.Xr condvar 9 ,
.Xr lock 9 ,
.Xr mtx_pool 9 ,
.Xr mutex 9 ,
.Xr rmlock 9 ,
.Xr rwlock 9 ,
.Xr sema 9 ,
.Xr sleep 9 ,
.Xr sx 9 ,
.Xr BUS_SETUP_INTR 9 ,
.Xr LOCK_PROFILING 9
.Sh HISTORY
These
functions appeared in
.Bsx 4.1
through
.Fx 7.0 .
.Sh BUGS
There are too many locking primitives to choose from.