.\" Copyright (c) 2007 Julian Elischer  (julian -  freebsd org )
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd May 25, 2012
.Dt LOCKING 9
.Os
.Sh NAME
.Nm locking
.Nd kernel synchronization primitives
.Sh DESCRIPTION
The
.Em FreeBSD
kernel is written to run across multiple CPUs and as such requires
several different synchronization primitives to allow the developers
to safely access and manipulate the many data types required.
.Ss Mutexes
Mutexes (also erroneously called "sleep mutexes") are the most commonly used
synchronization primitive in the kernel.
A thread acquires (locks) a mutex before accessing data shared with other
threads (including interrupt threads), and releases (unlocks) it afterwards.
If the mutex cannot be acquired, the thread requesting it will wait.
Mutexes are adaptive by default, meaning that
if the owner of a contended mutex is currently running on another CPU,
then a thread attempting to acquire the mutex will spin briefly
in the hope that the owner releases it shortly.
If the owner does not do so, the waiting thread yields the processor,
allowing other threads to run.
If the owner is not currently running, the spin step is skipped.
Mutexes fully support priority propagation.
.Pp
See
.Xr mutex 9
for details.
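.Pp
The adaptive behaviour described above can be sketched as a small decision
function.
This is a userspace illustration only, not the kernel implementation; the
names
.Fn adaptive_step ,
.Vt struct mtx_view
and the
.Dv MTX_*
constants are hypothetical, and the spin budget is an assumed parameter.
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>

/* What a thread does on one attempt to take a mutex (hypothetical). */
enum mtx_action {
	MTX_ACQUIRE,	/* lock is free: take it */
	MTX_SPIN,	/* owner is on a CPU: spin briefly */
	MTX_BLOCK	/* yield the processor and wait */
};

/* Toy snapshot of the lock state; not the layout of the real mutex. */
struct mtx_view {
	bool held;		/* some thread owns the mutex */
	bool owner_running;	/* owner is running on another CPU */
};

/*
 * Adaptive policy: spin only while the owner is running and the spin
 * budget is not exhausted; otherwise block.
 */
enum mtx_action
adaptive_step(const struct mtx_view *m, int spins_left)
{
	assert(spins_left >= 0);
	if (!m->held)
		return (MTX_ACQUIRE);
	if (m->owner_running && spins_left > 0)
		return (MTX_SPIN);
	return (MTX_BLOCK);
}
```
.Ed
A waiter facing a sleeping owner goes straight to
.Dv MTX_BLOCK ,
which matches the rule that the spin step is skipped when the owner is not
running.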
.Ss Spin mutexes
Spin mutexes are a variation of basic mutexes; the main difference between
the two is that spin mutexes never yield the processor.
Instead, they spin, waiting for the thread holding the lock
(which must be running on another CPU) to release it.
Spin mutexes disable interrupts while they are held so as to not get preempted.
Since disabling interrupts is expensive, they are also generally slower.
Spin mutexes should be used only when necessary, e.g. to protect data shared
with interrupt filter code (see
.Xr bus_setup_intr 9
for details).
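.Pp
As a rough illustration of busy-waiting, the following userspace sketch
builds a spin lock from a C11
.Vt atomic_flag .
It is an assumption-laden toy: the names
.Vt struct spinlk ,
.Fn spin_trylock ,
.Fn spin_lock
and
.Fn spin_unlock
are hypothetical, and it cannot disable interrupts as kernel spin mutexes do.
.Bd -literal
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace spin lock; not the kernel's spin mutex. */
struct spinlk {
	atomic_flag locked;
};

#define	SPINLK_INIT	{ ATOMIC_FLAG_INIT }

/* Try once; returns true if the lock was taken. */
bool
spin_trylock(struct spinlk *l)
{
	return (!atomic_flag_test_and_set_explicit(&l->locked,
	    memory_order_acquire));
}

/* Busy-wait until the lock is taken; never yields the processor. */
void
spin_lock(struct spinlk *l)
{
	assert(l != 0);
	while (!spin_trylock(l))
		;	/* spin */
}

/* Release the lock so that a spinning thread can take it. */
void
spin_unlock(struct spinlk *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```
.Ed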
.Ss Pool mutexes
With most synchronization primitives, such as mutexes, the programmer must
provide a piece of allocated memory to hold the primitive.
For example, a mutex may be embedded inside the structure it protects.
A pool mutex is a variant of mutex without this requirement: to lock or unlock
a pool mutex, one uses the address of the structure being protected with it,
not the mutex itself.
Pool mutexes are seldom used.
.Pp
See
.Xr mtx_pool 9
for details.
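.Pp
The idea of locking by object address can be sketched in userspace as a
fixed array of locks indexed by a hash of the pointer.
Everything here is hypothetical: the pool size, the hash, and the names
.Fn pool_index ,
.Fn pool_lock ,
.Fn pool_unlock
and
.Fn pool_locked
are illustrative and are not the
.Xr mtx_pool 9
interface.
.Bd -literal
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define	POOL_SIZE	8	/* hypothetical; must be a power of two */

/* Shared pool of locks; static atomics start out as zero (unlocked). */
static atomic_int pool[POOL_SIZE];

/* Map the address of a protected object to a slot in the pool. */
unsigned
pool_index(const void *obj)
{
	uintptr_t a = (uintptr_t)obj;

	/* Drop low bits, which are mostly zero because of alignment. */
	return ((unsigned)((a >> 4) & (POOL_SIZE - 1)));
}

/* Lock by object address; the object itself holds no lock storage. */
void
pool_lock(const void *obj)
{
	assert(obj != 0);
	while (atomic_exchange_explicit(&pool[pool_index(obj)], 1,
	    memory_order_acquire) != 0)
		;	/* spin here; a real pool mutex would block */
}

/* Unlock the slot that covers obj. */
void
pool_unlock(const void *obj)
{
	atomic_store_explicit(&pool[pool_index(obj)], 0,
	    memory_order_release);
}

/* Nonzero if the pool slot covering obj is currently locked. */
int
pool_locked(const void *obj)
{
	return (atomic_load(&pool[pool_index(obj)]));
}
```
.Ed
In this sketch, two unrelated objects may hash to the same slot and so
contend for the same lock, so a thread must not hold two locks from the
same pool at once.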
.Ss Reader/writer locks
Reader/writer locks allow shared access to protected data by multiple threads,
or exclusive access by a single thread.
The threads with shared access are known as
.Em readers
since they should only read the protected data.
A thread with exclusive access is known as a
.Em writer
since it may modify protected data.
.Pp
Reader/writer locks can be treated as mutexes (see above and
.Xr mutex 9 )
with shared/exclusive semantics.
More specifically, regular mutexes can be
considered to be equivalent to a write-lock on an
.Em rw_lock .
The
.Em rw_lock
locks have priority propagation like mutexes, but priority
can be propagated only to an exclusive holder.
This limitation comes from the fact that shared owners
are anonymous.
Another important property is that shared holders of
.Em rw_lock
can recurse, but exclusive locks are not allowed to recurse.
This ability should not be used lightly and
.Em may go away .
.Pp
See
.Xr rwlock 9
for details.
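.Pp
The shared/exclusive rules above can be modelled with a reader count and a
writer flag.
The following single-threaded toy only checks whether an acquisition would
succeed; it does not block or provide real synchronization, and the
.Fn rwm_*
names and
.Vt struct rw_model
are hypothetical.
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of reader/writer state; not the kernel's lock structure. */
struct rw_model {
	int readers;	/* number of shared holds, recursion included */
	bool writer;	/* true while held exclusively */
};

/* Shared acquire succeeds unless a writer holds the lock. */
bool
rwm_try_rlock(struct rw_model *rw)
{
	if (rw->writer)
		return (false);
	rw->readers++;
	return (true);
}

/* Exclusive acquire succeeds only if nobody holds the lock at all. */
bool
rwm_try_wlock(struct rw_model *rw)
{
	if (rw->writer || rw->readers > 0)
		return (false);
	rw->writer = true;
	return (true);
}

/* Drop one shared hold. */
void
rwm_runlock(struct rw_model *rw)
{
	assert(rw->readers > 0);
	rw->readers--;
}

/* Drop the exclusive hold. */
void
rwm_wunlock(struct rw_model *rw)
{
	assert(rw->writer);
	rw->writer = false;
}
```
.Ed
A second shared acquisition succeeds while a second exclusive one fails,
which mirrors the recursion rule stated above.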
.Ss Read-mostly locks
Read-mostly locks are similar to
.Em reader/writer
locks but optimized for very infrequent write locking.
.Em Read-mostly
locks implement full priority propagation by tracking shared owners
using a caller-supplied
.Em tracker
data structure.
.Pp
See
.Xr rmlock 9
for details.
.Ss Shared/exclusive locks
Shared/exclusive locks are similar to reader/writer locks; the main difference
between them is that shared/exclusive locks may be held during unbounded sleep
(and may thus perform an unbounded sleep).
They are inherently less efficient than mutexes, reader/writer locks
and read-mostly locks.
They do not support priority propagation.
They should be considered to be closely related to
.Xr sleep 9 .
They could in some cases be
considered a conditional sleep.
.Pp
See
.Xr sx 9
for details.
.Ss Counting semaphores
Counting semaphores provide a mechanism for synchronizing access
to a pool of resources.
Unlike mutexes, semaphores do not have the concept of an owner,
so they can be useful in situations where one thread needs
to acquire a resource, and another thread needs to release it.
They are largely deprecated.
.Pp
See
.Xr sema 9
for details.
.Ss Condition variables
Condition variables are used in conjunction with mutexes to wait for
conditions to occur.
A thread must hold the mutex before calling the
.Fn cv_wait*
functions.
When a thread waits on a condition, the mutex
is atomically released before the thread yields the processor,
then reacquired before the function call returns.
.Pp
See
.Xr condvar 9
for details.
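.Pp
The same pattern exists in userspace with POSIX threads, which makes for a
convenient illustration.
The sketch below is an analogy, not the kernel
.Xr condvar 9
interface; the names
.Fn mark_ready ,
.Fn wait_ready
and
.Fn is_ready
and the
.Va ready
flag are hypothetical.
.Bd -literal
```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical shared state; ready is protected by state_mtx. */
static pthread_mutex_t state_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cv = PTHREAD_COND_INITIALIZER;
static bool ready;

/* Producer side: change the condition under the mutex, then signal. */
void
mark_ready(void)
{
	pthread_mutex_lock(&state_mtx);
	ready = true;
	pthread_cond_signal(&ready_cv);
	pthread_mutex_unlock(&state_mtx);
}

/* Consumer side: wait until the condition holds, then consume it. */
void
wait_ready(void)
{
	pthread_mutex_lock(&state_mtx);
	/* The mutex is released while waiting and held again on return. */
	while (!ready)
		pthread_cond_wait(&ready_cv, &state_mtx);
	ready = false;
	pthread_mutex_unlock(&state_mtx);
}

/* Read the flag under the mutex. */
bool
is_ready(void)
{
	bool r;

	pthread_mutex_lock(&state_mtx);
	r = ready;
	pthread_mutex_unlock(&state_mtx);
	assert(r == true || r == false);
	return (r);
}
```
.Ed
Testing the condition in a loop, with the mutex held, is what keeps a
wakeup from being missed between the check and the wait.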
.Ss Giant
Giant is an instance of a mutex, with some special characteristics:
.Bl -enum
.It
It is recursive.
.It
Drivers and filesystems can request that Giant be locked around them
by not marking themselves MPSAFE.
Note that infrastructure to do this is slowly going away as non-MPSAFE
drivers either become properly locked or disappear.
.It
Giant must be locked before other locks.
.It
It is OK to hold Giant while performing unbounded sleep; in that case,
Giant will be dropped before sleeping and picked up after wakeup.
.It
There are places in the kernel that drop Giant and pick it back up
again.
Sleep locks will do this before sleeping.
Parts of the network or VM code may do this as well, depending on the
setting of a sysctl.
This means that you cannot count on Giant keeping other code from
running if your code sleeps, even if you want it to.
.El
.Ss Sleep/wakeup
The functions
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
.Fn pause ,
.Fn wakeup ,
and
.Fn wakeup_one
handle event-based thread blocking.
If a thread must wait for an external event, it is put to sleep by
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
or
.Fn pause .
Threads may also wait using one of the locking primitive sleep routines
.Xr mtx_sleep 9 ,
.Xr rw_sleep 9 ,
or
.Xr sx_sleep 9 .
.Pp
The parameter
.Fa chan
is an arbitrary address that uniquely identifies the event on which
the thread is being put to sleep.
All threads sleeping on a single
.Fa chan
are woken up later by
.Fn wakeup ,
often called from inside an interrupt routine, to indicate that the
resource the thread was blocking on is available now.
.Pp
Several of the sleep functions including
.Fn msleep ,
.Fn msleep_spin ,
and the locking primitive sleep routines specify an additional lock
parameter.
The lock will be released before sleeping and reacquired
before the sleep routine returns.
If
.Fa priority
includes the
.Dv PDROP
flag, then the lock will not be reacquired before returning.
The lock is used to ensure that a condition can be checked atomically,
and that the current thread can be suspended without missing a
change to the condition, or an associated wakeup.
In addition, all of the sleep routines will fully drop the
.Va Giant
mutex
(even if recursed)
while the thread is suspended and will reacquire the
.Va Giant
mutex before the function returns.
.Pp
See
.Xr sleep 9
for details.
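.Pp
The role of
.Fa chan
as a rendezvous address can be shown with a small bookkeeping model.
This is a single-threaded toy under stated assumptions: no thread is
actually suspended, the table size is arbitrary, the
.Fn model_*
names are hypothetical, and the choice of which sleeper
.Fn model_wakeup_one
picks (lowest slot) is an arbitrary simplification.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>

#define	MAX_THREADS	4	/* hypothetical table size */

/* One slot per modelled thread; NULL means the thread is runnable. */
static const void *sleeping_on[MAX_THREADS];

/* Record that thread tid sleeps on chan. */
void
model_sleep(int tid, const void *chan)
{
	assert(tid >= 0 && tid < MAX_THREADS && chan != NULL);
	sleeping_on[tid] = chan;
}

/* Wake every thread sleeping on chan; returns how many were woken. */
int
model_wakeup(const void *chan)
{
	int i, n = 0;

	assert(chan != NULL);
	for (i = 0; i < MAX_THREADS; i++) {
		if (sleeping_on[i] == chan) {
			sleeping_on[i] = NULL;
			n++;
		}
	}
	return (n);
}

/* Wake at most one thread sleeping on chan; returns 0 or 1. */
int
model_wakeup_one(const void *chan)
{
	int i;

	assert(chan != NULL);
	for (i = 0; i < MAX_THREADS; i++) {
		if (sleeping_on[i] == chan) {
			sleeping_on[i] = NULL;
			return (1);
		}
	}
	return (0);
}
```
.Ed
Sleepers on a different address are untouched by a wakeup, which is why
.Fa chan
should identify the awaited event uniquely.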
.Ss Lockmanager locks
Lockmanager locks are shared/exclusive locks used mostly in
.Xr VFS 9 ,
in particular as a
.Xr vnode 9
lock.
They have features other lock types do not have, such as sleep timeout,
writer starvation avoidance, draining, and interlock mutex, but this makes them
complicated to implement; for this reason, they are deprecated.
.Pp
See
.Xr lock 9
for details.
.Sh INTERACTIONS
The primitives interact and have a number of rules regarding how
they can and cannot be combined.
Many of these rules are checked using the
.Xr witness 4
code.
.Ss Bounded vs. unbounded sleep
The following primitives perform bounded sleep:
mutexes, pool mutexes, reader/writer locks and read-mostly locks.
.Pp
The following primitives may perform an unbounded sleep:
shared/exclusive locks, counting semaphores, condition variables,
sleep/wakeup and lockmanager locks.
.Pp
It is an error to do any operation that could result in yielding the processor
while holding a spin mutex.
.Pp
As a general rule, it is an error to do any operation that could result
in unbounded sleep while holding any primitive from the 'bounded sleep' group.
For example, it is an error to try to acquire a shared/exclusive lock while
holding a mutex, or to try to allocate memory with
.Dv M_WAITOK
while holding a reader/writer lock.
.Pp
As a special case, it is possible to call
.Fn sleep
or
.Fn mtx_sleep
while holding a single mutex.
It will atomically drop that mutex and reacquire it as part of waking up.
This is often a bad idea because it generally relies on the programmer having
good knowledge of all of the call graph above the place where
.Fn mtx_sleep
is being called and the assumptions the calling code has made.
Because the lock gets dropped during sleep, one must re-test all
the assumptions that were made before, all the way up the call graph to the
place where the lock was acquired.
.Pp
It is an error to do any operation that could result in yielding
the processor when running inside an interrupt filter.
.Pp
It is an error to do any operation that could result in unbounded sleep when
running inside an interrupt thread.
.Ss Interaction table
The following table shows what you can and cannot do while holding
one of the synchronization primitives discussed:
.Bl -column ".Ic xxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXX" -offset indent
.It Em "       You want:" Ta spin-mtx Ta mutex Ta rwlock Ta rmlock Ta sx Ta sleep
.It Em "You have:     " Ta ------ Ta ------ Ta ------ Ta ------ Ta ------ Ta ------
.It spin mtx  Ta \&ok-1 Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no-3
.It mutex     Ta \&ok Ta \&ok-1 Ta \&ok Ta \&ok Ta \&no Ta \&no-3
.It rwlock    Ta \&ok Ta \&ok Ta \&ok-2 Ta \&ok Ta \&no Ta \&no-3
.It rmlock    Ta \&ok Ta \&ok Ta \&ok Ta \&ok-2 Ta \&no-5 Ta \&no-5
.It sx        Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&no-2 Ta \&ok-4
.El
.Pp
.Em *1
Recursion is defined per lock.
Lock order is important.
.Pp
.Em *2
Readers can recurse though writers cannot.
Lock order is important.
.Pp
.Em *3
There are calls that atomically release this primitive when going to sleep
and reacquire it on wakeup (e.g.
.Fn mtx_sleep ,
.Fn rw_sleep
and
.Fn msleep_spin ) .
.Pp
.Em *4
Though one can sleep holding an sx lock, one can also use
.Fn sx_sleep
which will atomically release this primitive when going to sleep and
reacquire it on wakeup.
.Pp
.Em *5
.Em Read-mostly
locks can be initialized to support sleeping while holding a write lock.
See
.Xr rmlock 9
for details.
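.Pp
For quick experiments, the interaction table can be encoded as a lookup
array.
The sketch below copies only the plain ok/no answers from the table and
deliberately ignores the numbered footnotes; the names
.Vt enum prim ,
.Va allowed
and
.Fn may_acquire
are hypothetical.
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>

/* Row and column order follows the interaction table. */
enum prim {
	P_SPIN, P_MUTEX, P_RWLOCK, P_RMLOCK, P_SX, P_SLEEP, P_NPRIM
};

/* allowed[have][want]; there is no row for sleep, it is never held. */
static const bool allowed[P_SLEEP][P_NPRIM] = {
	/*             spin   mutex  rwlock rmlock sx     sleep */
	[P_SPIN]   = { true,  false, false, false, false, false },
	[P_MUTEX]  = { true,  true,  true,  true,  false, false },
	[P_RWLOCK] = { true,  true,  true,  true,  false, false },
	[P_RMLOCK] = { true,  true,  true,  true,  false, false },
	[P_SX]     = { true,  true,  true,  true,  false, true  },
};

/* True if the table permits wanting "want" while holding "have". */
bool
may_acquire(enum prim have, enum prim want)
{
	assert(have < P_SLEEP && want < P_NPRIM);
	return (allowed[have][want]);
}
```
.Ed
The footnotes still apply: for example, an entry that reads no here may
have a dedicated sleep routine that drops the lock first.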
.Ss Context mode table
The next table shows what can be used in different contexts.
At this time, the table is fairly easy to remember.
.Bl -column ".Ic Xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXX" -offset indent
.It Em "Context:"  Ta spin mtx Ta mutex Ta sx Ta rwlock Ta rmlock Ta sleep
.It interrupt filter:  Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
.It interrupt thread:  Ta \&ok Ta \&ok Ta \&no Ta \&ok Ta \&ok Ta \&no
.It callout:    Ta \&ok Ta \&ok Ta \&no Ta \&ok Ta \&no Ta \&no
.It syscall:    Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
.El
.Sh SEE ALSO
.Xr witness 4 ,
.Xr condvar 9 ,
.Xr lock 9 ,
.Xr mtx_pool 9 ,
.Xr mutex 9 ,
.Xr rmlock 9 ,
.Xr rwlock 9 ,
.Xr sema 9 ,
.Xr sleep 9 ,
.Xr sx 9 ,
.Xr BUS_SETUP_INTR 9 ,
.Xr LOCK_PROFILING 9
.Sh HISTORY
These
functions appeared in
.Bsx 4.1
through
.Fx 7.0 .
.Sh BUGS
There are too many locking primitives to choose from.