1.\" Copyright (c) 2007 Julian Elischer  (julian -  freebsd org )
2.\" All rights reserved.
3.\"
4.\" Redistribution and use in source and binary forms, with or without
5.\" modification, are permitted provided that the following conditions
6.\" are met:
7.\" 1. Redistributions of source code must retain the above copyright
8.\"    notice, this list of conditions and the following disclaimer.
9.\" 2. Redistributions in binary form must reproduce the above copyright
10.\"    notice, this list of conditions and the following disclaimer in the
11.\"    documentation and/or other materials provided with the distribution.
12.\"
13.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
14.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
15.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
16.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
17.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
18.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
19.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
20.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
21.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
22.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
23.\" SUCH DAMAGE.
24.\"
25.\" $FreeBSD$
26.\"
.Dd July 5, 2015
.Dt LOCKING 9
.Os
.Sh NAME
.Nm locking
.Nd kernel synchronization primitives
.Sh DESCRIPTION
The
.Em FreeBSD
kernel is written to run across multiple CPUs and as such provides
several different synchronization primitives to allow developers
to safely access and manipulate many data types.
.Ss Mutexes
Mutexes (also called "blocking mutexes") are the most commonly used
synchronization primitive in the kernel.
A thread acquires (locks) a mutex before accessing data shared with other
threads (including interrupt threads), and releases (unlocks) it afterwards.
If the mutex cannot be acquired, the thread requesting it will wait.
Mutexes are adaptive by default, meaning that
if the owner of a contended mutex is currently running on another CPU,
then a thread attempting to acquire the mutex will spin rather than yielding
the processor.
Mutexes fully support priority propagation.
.Pp
See
.Xr mutex 9
for details.
.Ss Spin Mutexes
Spin mutexes are a variation of basic mutexes; the main difference between
the two is that spin mutexes never block.
Instead, they spin while waiting for the lock to be released.
To avoid deadlock, a thread that holds a spin mutex must never yield its CPU.
Unlike ordinary mutexes, spin mutexes disable interrupts when acquired.
Since disabling interrupts can be expensive, they are generally slower to
acquire and release.
Spin mutexes should be used only when absolutely necessary,
e.g., to protect data shared
with interrupt filter code (see
.Xr bus_setup_intr 9
for details),
or for scheduler internals.
.Ss Mutex Pools
With most synchronization primitives, such as mutexes, the programmer must
provide memory to hold the primitive.
For example, a mutex may be embedded inside the structure it protects.
Mutex pools provide a preallocated set of mutexes to avoid this
requirement.
Note that mutexes from a pool may only be used as leaf locks.
.Pp
See
.Xr mtx_pool 9
for details.
.Ss Reader/Writer Locks
Reader/writer locks allow shared access to protected data by multiple threads
or exclusive access by a single thread.
The threads with shared access are known as
.Em readers
since they should only read the protected data.
A thread with exclusive access is known as a
.Em writer
since it may modify protected data.
.Pp
Reader/writer locks can be treated as mutexes (see above and
.Xr mutex 9 )
with shared/exclusive semantics.
Reader/writer locks support priority propagation like mutexes,
but priority is propagated only to an exclusive holder.
This limitation comes from the fact that shared owners
are anonymous.
.Pp
See
.Xr rwlock 9
for details.
.Ss Read-Mostly Locks
Read-mostly locks are similar to
.Em reader/writer
locks but optimized for very infrequent write locking.
.Em Read-mostly
locks implement full priority propagation by tracking shared owners
using a caller-supplied
.Em tracker
data structure.
.Pp
See
.Xr rmlock 9
for details.
.Ss Sleepable Read-Mostly Locks
Sleepable read-mostly locks are a variation on read-mostly locks.
Threads holding an exclusive lock may sleep,
but threads holding a shared lock may not.
Priority is propagated to shared owners but not to exclusive owners.
.Ss Shared/exclusive locks
Shared/exclusive locks are similar to reader/writer locks; the main difference
between them is that shared/exclusive locks may be held during unbounded sleep.
Acquiring a contested shared/exclusive lock can perform an unbounded sleep.
These locks do not support priority propagation.
.Pp
See
.Xr sx 9
for details.
.Ss Lockmanager locks
Lockmanager locks are sleepable shared/exclusive locks used mostly in
.Xr VFS 9
.Po
as a
.Xr vnode 9
lock
.Pc
and in the buffer cache
.Po
.Xr BUF_LOCK 9
.Pc .
They have features that other lock types lack, such as sleep
timeouts, blocking upgrades,
writer starvation avoidance, draining, and an interlock mutex,
but this makes them complicated both to use and to implement;
for this reason, they should be avoided.
.Pp
See
.Xr lock 9
for details.
.Ss Counting semaphores
Counting semaphores provide a mechanism for synchronizing access
to a pool of resources.
Unlike mutexes, semaphores do not have the concept of an owner,
so they can be useful in situations where one thread needs
to acquire a resource, and another thread needs to release it.
They are largely deprecated.
.Pp
See
.Xr sema 9
for details.
.Ss Condition variables
Condition variables are used in conjunction with locks to wait for
a condition to become true.
A thread must hold the associated lock before calling one of the
.Fn cv_wait
functions.
When a thread waits on a condition, the lock
is atomically released before the thread yields the processor
and reacquired before the function call returns.
Condition variables may be used with blocking mutexes,
reader/writer locks, read-mostly locks, and shared/exclusive locks.
.Pp
See
.Xr condvar 9
for details.
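.Pp
The wait protocol described above (hold the lock, wait, find the lock held
again on return) can be sketched in userland with POSIX threads.
This is an analogue only, not kernel code:
.Fn pthread_cond_wait
stands in for the
.Fn cv_wait
functions, and every variable and function name below is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Shared state protected by `lock`; all names here are hypothetical. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cv = PTHREAD_COND_INITIALIZER;
static bool ready = false;
static int value = 0;

/* Producer: publish a value under the lock, then wake the waiter. */
static void *
producer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	value = 42;
	ready = true;
	pthread_cond_signal(&ready_cv);
	pthread_mutex_unlock(&lock);
	return (NULL);
}

/* Consumer: take the lock first, then wait for the condition. */
int
wait_for_value(void)
{
	pthread_t tid;
	int error, result;

	error = pthread_create(&tid, NULL, producer, NULL);
	assert(error == 0);

	pthread_mutex_lock(&lock);
	/*
	 * The wait releases the lock while the thread is blocked and
	 * reacquires it before returning, so the predicate is always
	 * rechecked with the lock held.
	 */
	while (!ready)
		pthread_cond_wait(&ready_cv, &lock);
	result = value;
	pthread_mutex_unlock(&lock);

	pthread_join(tid, NULL);
	return (result);
}
```

The loop around the wait matters: the condition is re-tested each time the
thread resumes rather than assumed to hold.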
.Ss Sleep/Wakeup
The functions
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
.Fn pause ,
.Fn wakeup ,
and
.Fn wakeup_one
also handle event-based thread blocking.
Unlike condition variables,
arbitrary addresses may be used as wait channels and a dedicated
structure does not need to be allocated.
However, care must be taken to ensure that wait channel addresses are
unique to an event.
If a thread must wait for an external event, it is put to sleep by
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
or
.Fn pause .
Threads may also wait using one of the locking primitive sleep routines
.Xr mtx_sleep 9 ,
.Xr rw_sleep 9 ,
or
.Xr sx_sleep 9 .
.Pp
The parameter
.Fa chan
is an arbitrary address that uniquely identifies the event on which
the thread is being put to sleep.
All threads sleeping on a single
.Fa chan
are woken up later by
.Fn wakeup
.Pq often called from inside an interrupt routine
to indicate that the
event the thread was blocking on has occurred.
.Pp
Several of the sleep functions including
.Fn msleep ,
.Fn msleep_spin ,
and the locking primitive sleep routines specify an additional lock
parameter.
The lock will be released before sleeping and reacquired
before the sleep routine returns.
If
.Fa priority
includes the
.Dv PDROP
flag, then the lock will not be reacquired before returning.
The lock is used to ensure that a condition can be checked atomically,
and that the current thread can be suspended without missing a
change to the condition or an associated wakeup.
In addition, all of the sleep routines will fully drop the
.Va Giant
mutex
.Pq even if recursed
while the thread is suspended and will reacquire the
.Va Giant
mutex
.Pq restoring any recursion
before the function returns.
.Pp
The
.Fn pause
function is a special sleep function that waits for a specified
amount of time to pass before the thread resumes execution.
This sleep cannot be terminated early by either an explicit
.Fn wakeup
or a signal.
.Pp
See
.Xr sleep 9
for details.
.Ss Giant
Giant is a special mutex used to protect data structures that do not
yet have their own locks.
Since it provides semantics akin to the old
.Xr spl 9
interface,
Giant has special characteristics:
.Bl -enum
.It
It is recursive.
.It
Drivers can request that Giant be locked around them
by not marking themselves MPSAFE.
Note that infrastructure to do this is slowly going away as non-MPSAFE
drivers either become properly locked or disappear.
.It
Giant must be locked before other non-sleepable locks.
.It
Giant is dropped during unbounded sleeps and reacquired after wakeup.
.It
There are places in the kernel that drop Giant and pick it back up
again.
Sleep locks will do this before sleeping.
Parts of the network or VM code may do this as well.
This means that you cannot count on Giant keeping other code from
running if your code sleeps, even if you want it to.
.El
.Sh INTERACTIONS
The primitives can interact and have a number of rules regarding how
they can and cannot be combined.
Many of these rules are checked by
.Xr witness 4 .
.Ss Bounded vs. Unbounded Sleep
In a bounded sleep
.Po also referred to as
.Dq blocking
.Pc
the only resource needed to resume execution of a thread
is CPU time for the owner of a lock that the thread is waiting to acquire.
In an unbounded sleep
.Po
often referred to as simply
.Dq sleeping
.Pc
a thread waits for an external event or for a condition
to become true.
In particular,
a dependency chain of threads in bounded sleeps should always make forward
progress,
since there is always CPU time available.
This requires that no thread in a bounded sleep is waiting for a lock held
by a thread in an unbounded sleep.
To avoid priority inversions,
a thread in a bounded sleep lends its priority to the owner of the lock
that it is waiting for.
.Pp
The following primitives perform bounded sleeps:
mutexes, reader/writer locks and read-mostly locks.
.Pp
The following primitives perform unbounded sleeps:
sleepable read-mostly locks, shared/exclusive locks, lockmanager locks,
counting semaphores, condition variables, and sleep/wakeup.
.Ss General Principles
.Bl -bullet
.It
It is an error to do any operation that could result in yielding the processor
while holding a spin mutex.
.It
It is an error to do any operation that could result in unbounded sleep
while holding any primitive from the 'bounded sleep' group.
For example, it is an error to try to acquire a shared/exclusive lock while
holding a mutex, or to try to allocate memory with M_WAITOK while holding a
reader/writer lock.
.Pp
Note that the lock passed to one of the
.Fn sleep
or
.Fn cv_wait
functions is dropped before the thread enters the unbounded sleep and does
not violate this rule.
.It
It is an error to do any operation that could result in yielding
the processor when running inside an interrupt filter.
.It
It is an error to do any operation that could result in unbounded sleep when
running inside an interrupt thread.
.El
.Ss Interaction table
The following table shows what you can and cannot do while holding
one of the locking primitives discussed.
Note that
.Dq sleep
includes
.Fn sema_wait ,
.Fn sema_timedwait ,
any of the
.Fn cv_wait
functions,
and any of the
.Fn sleep
functions.
.Bl -column ".Ic xxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
.It Em "       You want:" Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
.It Em "You have:     " Ta -------- Ta -------- Ta ------ Ta -------- Ta ------ Ta ------
.It spin mtx  Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no-1
.It mutex/rw  Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
.It rmlock    Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
.It sleep rm  Ta \&ok Ta \&ok Ta \&ok Ta \&ok-2 Ta \&ok-2 Ta \&ok-2/3
.It sx        Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok-3
.It lockmgr   Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
.El
.Pp
.Em *1
There are calls that atomically release this primitive when going to sleep
and reacquire it on wakeup
.Po
.Fn mtx_sleep ,
.Fn rw_sleep ,
.Fn msleep_spin ,
etc.
.Pc .
.Pp
.Em *2
These cases are only allowed while holding a write lock on a sleepable
read-mostly lock.
.Pp
.Em *3
Though one can sleep while holding this lock,
one can also use a
.Fn sleep
function to atomically release this primitive when going to sleep and
reacquire it on wakeup.
.Pp
Note that non-blocking try operations on locks are always permitted.
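.Pp
The interaction table can also be read as a lookup from the primitive held
to the primitive wanted.
The userland C sketch below is illustrative only: the enum, array and
function names are hypothetical and belong to no kernel interface.
It flattens the footnotes, recording ok-2 and ok-3 entries as allowed and
no-1 entries as not allowed, so the footnote conditions still have to be
checked separately.

```c
#include <assert.h>
#include <stdbool.h>

/* Rows of the table: the primitive currently held (hypothetical names). */
enum held {
	H_SPIN, H_MUTEX_RW, H_RMLOCK, H_SLEEP_RM, H_SX, H_LOCKMGR, H_COUNT
};

/* Columns of the table: the primitive or operation wanted. */
enum wanted {
	W_SPIN, W_MUTEX_RW, W_RMLOCK, W_SLEEP_RM, W_SX_LK, W_SLEEP, W_COUNT
};

/* One row per held primitive, one column per wanted primitive. */
static const bool allowed[H_COUNT][W_COUNT] = {
	/*                 spin   mtx/rw rmlock slp rm sx/lk  sleep */
	[H_SPIN]     = { true,  false, false, false, false, false },
	[H_MUTEX_RW] = { true,  true,  true,  false, false, false },
	[H_RMLOCK]   = { true,  true,  true,  false, false, false },
	[H_SLEEP_RM] = { true,  true,  true,  true,  true,  true  },
	[H_SX]       = { true,  true,  true,  true,  true,  true  },
	[H_LOCKMGR]  = { true,  true,  true,  true,  true,  true  },
};

/* Return true if the table permits `want` while `have` is held. */
bool
may_acquire(enum held have, enum wanted want)
{
	assert(have < H_COUNT && want < W_COUNT);
	return (allowed[have][want]);
}
```

For instance, may_acquire(H_MUTEX_RW, W_SX_LK) is false, matching the rule
that a shared/exclusive lock must not be acquired while holding a mutex.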
.Ss Context mode table
The next table shows what can be used in different contexts.
At present, this table is fairly easy to remember.
.Bl -column ".Ic Xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
.It Em "Context:"  Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
.It interrupt filter:  Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
.It interrupt thread:  Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
.It callout:    Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
.It direct callout:  Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
.It system call:    Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
.El
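.Pp
The context table reduces to a similar lookup.
Again, this is only an illustrative userland sketch with hypothetical names,
not an interface the kernel provides.

```c
#include <assert.h>
#include <stdbool.h>

/* Rows of the table: the execution context (hypothetical names). */
enum context {
	C_INTR_FILTER, C_INTR_THREAD, C_CALLOUT, C_DIRECT_CALLOUT,
	C_SYSCALL, C_COUNT
};

/* Columns of the table: the primitive or operation to be used. */
enum primitive {
	P_SPIN, P_MUTEX_RW, P_RMLOCK, P_SLEEP_RM, P_SX_LK, P_SLEEP, P_COUNT
};

/* One row per context, one column per primitive. */
static const bool usable[C_COUNT][P_COUNT] = {
	/*                       spin   mtx/rw rmlock slp rm sx/lk  sleep */
	[C_INTR_FILTER]    = { true,  false, false, false, false, false },
	[C_INTR_THREAD]    = { true,  true,  true,  false, false, false },
	[C_CALLOUT]        = { true,  true,  true,  false, false, false },
	[C_DIRECT_CALLOUT] = { true,  false, false, false, false, false },
	[C_SYSCALL]        = { true,  true,  true,  true,  true,  true  },
};

/* Return true if the table permits `prim` in context `ctx`. */
bool
context_allows(enum context ctx, enum primitive prim)
{
	assert(ctx < C_COUNT && prim < P_COUNT);
	return (usable[ctx][prim]);
}
```

Only the system call row permits every column; interrupt filters and direct
callouts are limited to spin mutexes.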
.Sh SEE ALSO
.Xr witness 4 ,
.Xr BUS_SETUP_INTR 9 ,
.Xr condvar 9 ,
.Xr lock 9 ,
.Xr LOCK_PROFILING 9 ,
.Xr mtx_pool 9 ,
.Xr mutex 9 ,
.Xr rmlock 9 ,
.Xr rwlock 9 ,
.Xr sema 9 ,
.Xr sleep 9 ,
.Xr sx 9 ,
.Xr timeout 9
.Sh HISTORY
These
functions appeared in
.Bsx 4.1
through
.Fx 7.0 .
.Sh BUGS
There are too many locking primitives to choose from.
416