.\" Copyright (c) 2007 Julian Elischer  (julian -  freebsd org )
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd February 3, 2023
.Dt LOCKING 9
.Os
.Sh NAME
.Nm locking
.Nd kernel synchronization primitives
.Sh DESCRIPTION
The
.Em FreeBSD
kernel is written to run across multiple CPUs and as such provides
several different synchronization primitives to allow developers
to safely access and manipulate many data types.
.Ss Mutexes
Mutexes (also called "blocking mutexes") are the most commonly used
synchronization primitive in the kernel.
A thread acquires (locks) a mutex before accessing data shared with other
threads (including interrupt threads), and releases (unlocks) it afterwards.
If the mutex cannot be acquired, the thread requesting it will wait.
Mutexes are adaptive by default, meaning that
if the owner of a contended mutex is currently running on another CPU,
then a thread attempting to acquire the mutex will spin rather than yielding
the processor.
Mutexes fully support priority propagation.
.Pp
See
.Xr mutex 9
for details.
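.Pp
As a rough userspace analogy only, the sketch below protects a shared
counter with a POSIX mutex; in the kernel the corresponding calls are
.Fn mtx_init ,
.Fn mtx_lock ,
.Fn mtx_unlock ,
and
.Fn mtx_destroy .
The names
.Vt shared_counter ,
.Fn worker ,
and
.Fn run_workers
are hypothetical, and a POSIX mutex models neither adaptive spinning
nor priority propagation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Userspace analogy only: a POSIX mutex stands in for a kernel
 * struct mtx.  All names here are hypothetical.
 */
struct shared_counter {
	pthread_mutex_t lock;	/* protects value */
	long value;
};

struct worker_arg {
	struct shared_counter *sc;
	int iterations;
};

static void *
worker(void *p)
{
	struct worker_arg *wa = p;

	for (int i = 0; i < wa->iterations; i++) {
		/* Acquire before touching shared data, release afterwards. */
		pthread_mutex_lock(&wa->sc->lock);
		wa->sc->value++;
		pthread_mutex_unlock(&wa->sc->lock);
	}
	return (NULL);
}

/* Runs nthreads workers (at most 16) and returns the final count. */
static long
run_workers(int nthreads, int iterations)
{
	struct shared_counter sc = { .value = 0 };
	struct worker_arg wa = { &sc, iterations };
	pthread_t tids[16];

	assert(nthreads > 0 && nthreads <= 16);
	pthread_mutex_init(&sc.lock, NULL);
	for (int i = 0; i < nthreads; i++)
		pthread_create(&tids[i], NULL, worker, &wa);
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&sc.lock);
	return (sc.value);
}
```

Because every increment happens with the mutex held,
.Fn run_workers 4 10000
should return 40000.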
.Ss Spin Mutexes
Spin mutexes are a variation of basic mutexes; the main difference between
the two is that spin mutexes never block.
Instead, they spin while waiting for the lock to be released.
To avoid deadlock, a thread that holds a spin mutex must never yield its CPU.
Unlike ordinary mutexes, spin mutexes disable interrupts when acquired.
Since disabling interrupts can be expensive, they are generally slower to
acquire and release.
Spin mutexes should be used only when absolutely necessary,
e.g. to protect data shared
with interrupt filter code (see
.Xr bus_setup_intr 9
for details),
or for scheduler internals.
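.Pp
The sketch below illustrates only the spinning behavior, as a
userspace test-and-set lock built on C11 atomics.
It cannot disable interrupts, so it is not a substitute for
.Fn mtx_lock_spin
and
.Fn mtx_unlock_spin ;
the type
.Vt spinlock
and the
.Fn spin_*
functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace analogy only: a test-and-set spinlock.  Unlike a kernel
 * spin mutex it does not disable interrupts; it only shows that a
 * waiter busy-waits instead of blocking.  Names are hypothetical.
 */
struct spinlock {
	atomic_bool locked;
};

static void
spin_init(struct spinlock *sl)
{
	atomic_init(&sl->locked, false);
}

/* Returns true if the lock was free and is now held by the caller. */
static bool
spin_trylock(struct spinlock *sl)
{
	/* exchange returns the previous value: false means we got it. */
	return (!atomic_exchange_explicit(&sl->locked, true,
	    memory_order_acquire));
}

static void
spin_lock(struct spinlock *sl)
{
	while (!spin_trylock(sl))
		;	/* spin: never sleep, never yield */
}

static void
spin_unlock(struct spinlock *sl)
{
	atomic_store_explicit(&sl->locked, false, memory_order_release);
}
```

A waiter in
.Fn spin_lock
keeps its CPU busy until the holder calls
.Fn spin_unlock ,
which is why the holder must never yield the CPU while the lock is held.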
.Ss Mutex Pools
With most synchronization primitives, such as mutexes, the programmer must
provide memory to hold the primitive.
For example, a mutex may be embedded inside the structure it protects.
Mutex pools provide a preallocated set of mutexes to avoid this
requirement.
Note that mutexes from a pool may only be used as leaf locks.
.Pp
See
.Xr mtx_pool 9
for details.
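.Pp
The idea of a pool can be sketched in userspace as a fixed array of
mutexes selected by hashing an object's address.
This is a hypothetical model: the kernel interface is
.Fn mtx_pool_find ,
.Fn mtx_pool_lock ,
and
.Fn mtx_pool_unlock ,
while
.Fn pool_init ,
.Fn pool_index ,
.Fn pool_find ,
.Dv POOL_SIZE ,
and the hash function below are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of a mutex pool: a fixed array of
 * mutexes shared by many objects, selected by hashing the object's
 * address.  The hash is illustrative, not the kernel's.
 */
#define POOL_SIZE 64		/* power of two, hypothetical */

static pthread_mutex_t pool[POOL_SIZE];

static void
pool_init(void)
{
	for (size_t i = 0; i < POOL_SIZE; i++)
		pthread_mutex_init(&pool[i], NULL);
}

/* Maps an object address to a slot in the pool. */
static size_t
pool_index(const void *obj)
{
	uintptr_t p = (uintptr_t)obj;

	/* Mix the address bits, then mask to the pool size. */
	p ^= p >> 7;
	p *= 0x9E3779B1u;
	return ((size_t)(p >> 5) & (POOL_SIZE - 1));
}

/* The same object always maps to the same mutex. */
static pthread_mutex_t *
pool_find(const void *obj)
{
	return (&pool[pool_index(obj)]);
}
```

Because two unrelated objects can hash to the same pool mutex, a thread
that holds one pool mutex and then locks another could end up waiting
on the mutex it already holds; this is one way to see why pool mutexes
may only be used as leaf locks.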
.Ss Reader/Writer Locks
Reader/writer locks allow shared access to protected data by multiple threads
or exclusive access by a single thread.
The threads with shared access are known as
.Em readers
since they should only read the protected data.
A thread with exclusive access is known as a
.Em writer
since it may modify protected data.
.Pp
Reader/writer locks can be treated as mutexes (see above and
.Xr mutex 9 )
with shared/exclusive semantics.
Reader/writer locks support priority propagation like mutexes,
but priority is propagated only to an exclusive holder.
This limitation comes from the fact that shared owners
are anonymous.
.Pp
See
.Xr rwlock 9
for details.
.Ss Read-Mostly Locks
Read-mostly locks are similar to
.Em reader/writer
locks but optimized for very infrequent write locking.
.Em Read-mostly
locks implement full priority propagation by tracking shared owners
using a caller-supplied
.Em tracker
data structure.
.Pp
See
.Xr rmlock 9
for details.
.Ss Sleepable Read-Mostly Locks
Sleepable read-mostly locks are a variation on read-mostly locks.
Threads holding an exclusive lock may sleep,
but threads holding a shared lock may not.
Priority is propagated to shared owners but not to exclusive owners.
.Pp
See
.Xr rmlock 9
for details.
.Ss Shared/exclusive locks
Shared/exclusive locks are similar to reader/writer locks; the main difference
between them is that shared/exclusive locks may be held during unbounded sleep.
Acquiring a contended shared/exclusive lock can perform an unbounded sleep.
These locks do not support priority propagation.
.Pp
See
.Xr sx 9
for details.
.Ss Lockmanager locks
Lockmanager locks are sleepable shared/exclusive locks used mostly in
.Xr VFS 9
.Po
as a
.Xr vnode 9
lock
.Pc
and in the buffer cache
.Po
.Xr BUF_LOCK 9
.Pc .
They have features that other lock types lack, such as sleep
timeouts, blocking upgrades,
writer starvation avoidance, draining, and an interlock mutex,
but this makes them complicated both to use and to implement;
for this reason, they should be avoided.
.Pp
See
.Xr lock 9
for details.
.Ss Non-blocking synchronization
The kernel has two facilities,
.Xr epoch 9
and
.Xr smr 9 ,
which can be used to provide read-only access to a data structure while one or
more writers are concurrently modifying the data structure.
Specifically, readers using
.Xr epoch 9
and
.Xr smr 9
to synchronize accesses do not block writers, in contrast with reader/writer
locks, and they help ensure that memory freed by writers is not reused until
all readers that may be accessing it have finished.
Thus, they are a useful building block in the construction of lock-free
data structures.
.Pp
These facilities are difficult to use correctly and should be avoided
in favor of traditional mutual exclusion-based synchronization,
except when performance or non-blocking guarantees are a major concern.
.Pp
See
.Xr epoch 9
and
.Xr smr 9
for details.
.Ss Counting semaphores
Counting semaphores provide a mechanism for synchronizing access
to a pool of resources.
Unlike mutexes, semaphores do not have the concept of an owner,
so they can be useful in situations where one thread needs
to acquire a resource, and another thread needs to release it.
They are largely deprecated.
.Pp
See
.Xr sema 9
for details.
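.Pp
The ownerless nature of a counting semaphore can be sketched with a
single atomic counter.
This is a hypothetical userspace model of the non-blocking operations
only, loosely corresponding to
.Fn sema_trywait
and
.Fn sema_post ;
the type
.Vt csem
and the
.Fn csem_*
functions are illustrative names, and nothing here sleeps the way
.Fn sema_wait
does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of a counting semaphore (non-blocking
 * operations only).  The count is the number of free resources;
 * nothing records which thread took one, so any thread may post.
 */
struct csem {
	atomic_int count;
};

static void
csem_init(struct csem *s, int value)
{
	atomic_init(&s->count, value);
}

/* Takes one resource if available; never blocks. */
static bool
csem_trywait(struct csem *s)
{
	int c = atomic_load(&s->count);

	while (c > 0) {
		/* On failure, c is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&s->count, &c, c - 1))
			return (true);
	}
	return (false);
}

/* Returns one resource; may be called by a different thread. */
static void
csem_post(struct csem *s)
{
	atomic_fetch_add(&s->count, 1);
}
```

Since the semaphore keeps only a count, the thread that calls
.Fn csem_post
need not be the one whose
.Fn csem_trywait
succeeded.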
.Ss Condition variables
Condition variables are used in conjunction with locks to wait for
a condition to become true.
A thread must hold the associated lock before calling one of the
.Fn cv_wait
family of functions.
When a thread waits on a condition, the lock
is atomically released before the thread yields the processor
and reacquired before the function call returns.
Condition variables may be used with blocking mutexes,
reader/writer locks, read-mostly locks, and shared/exclusive locks.
.Pp
See
.Xr condvar 9
for details.
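.Pp
The usual wait pattern can be shown in userspace with POSIX condition
variables, used here only as an analogy for
.Fn cv_wait
and
.Fn cv_signal .
The waiter holds the lock, re-checks its condition in a loop, and lets
.Fn pthread_cond_wait
drop and reacquire the lock.
The names
.Vt mailbox ,
.Fn producer ,
and
.Fn receive_one
are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace analogy only: pthread condition variables stand in for
 * condvar(9).  All names here are hypothetical.
 */
struct mailbox {
	pthread_mutex_t lock;	/* protects full and value */
	pthread_cond_t cv;	/* signalled when full becomes true */
	bool full;
	int value;
};

static void *
producer(void *p)
{
	struct mailbox *mb = p;

	pthread_mutex_lock(&mb->lock);
	mb->value = 42;
	mb->full = true;
	pthread_cond_signal(&mb->cv);
	pthread_mutex_unlock(&mb->lock);
	return (NULL);
}

/* Starts a producer thread and waits for the value it stores. */
static int
receive_one(void)
{
	struct mailbox mb = { .full = false, .value = 0 };
	pthread_t tid;
	int v;

	pthread_mutex_init(&mb.lock, NULL);
	pthread_cond_init(&mb.cv, NULL);
	pthread_create(&tid, NULL, producer, &mb);

	pthread_mutex_lock(&mb.lock);
	while (!mb.full)	/* re-check after every wakeup */
		pthread_cond_wait(&mb.cv, &mb.lock);
	v = mb.value;
	pthread_mutex_unlock(&mb.lock);

	pthread_join(tid, NULL);
	pthread_cond_destroy(&mb.cv);
	pthread_mutex_destroy(&mb.lock);
	return (v);
}
```

The condition is tested with the lock held, so the producer cannot set
.Va full
and signal between the test and the wait; if the producer runs first,
the waiter sees
.Va full
already true and never waits.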
.Ss Sleep/Wakeup
The functions
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
.Fn pause ,
.Fn wakeup ,
and
.Fn wakeup_one
also handle event-based thread blocking.
Unlike condition variables,
arbitrary addresses may be used as wait channels and a dedicated
structure does not need to be allocated.
However, care must be taken to ensure that wait channel addresses are
unique to an event.
If a thread must wait for an external event, it is put to sleep by
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
or
.Fn pause .
Threads may also wait using one of the locking primitive sleep routines
.Xr mtx_sleep 9 ,
.Xr rw_sleep 9 ,
or
.Xr sx_sleep 9 .
.Pp
The wait channel parameter
.Fa chan
is an arbitrary address that uniquely identifies the event on which
the thread is being put to sleep.
All threads sleeping on a single
.Fa chan
are woken up later by
.Fn wakeup
.Pq often called from inside an interrupt routine
to indicate that the
event the thread was blocking on has occurred.
.Pp
Several of the sleep functions, including
.Fn msleep ,
.Fn msleep_spin ,
and the locking primitive sleep routines, specify an additional lock
parameter.
The lock will be released before sleeping and reacquired
before the sleep routine returns.
If
.Fa priority
includes the
.Dv PDROP
flag, then the lock will not be reacquired before returning.
The lock is used to ensure that a condition can be checked atomically,
and that the current thread can be suspended without missing a
change to the condition or an associated wakeup.
In addition, all of the sleep routines will fully drop the
.Va Giant
mutex
.Pq even if recursed
while the thread is suspended and will reacquire the
.Va Giant
mutex
.Pq restoring any recursion
before the function returns.
.Pp
The
.Fn pause
function is a special sleep function that waits for a specified
amount of time to pass before the thread resumes execution.
This sleep cannot be terminated early by either an explicit
.Fn wakeup
or a signal.
.Pp
See
.Xr sleep 9
for details.
.Ss Giant
Giant is a special mutex used to protect data structures that do not
yet have their own locks.
Since it provides semantics akin to the old
.Xr spl 9
interface,
Giant has special characteristics:
.Bl -enum
.It
It is recursive.
.It
Drivers can request that Giant be locked around them
by not marking themselves MPSAFE.
Note that infrastructure to do this is slowly going away as non-MPSAFE
drivers either become properly locked or disappear.
.It
Giant must be locked before other non-sleepable locks.
.It
Giant is dropped during unbounded sleeps and reacquired after wakeup.
.It
There are places in the kernel that drop Giant and pick it back up
again.
Sleep locks will do this before sleeping.
Parts of the network or VM code may do this as well.
This means that you cannot count on Giant keeping other code from
running if your code sleeps, even if you want it to.
.El
.Sh INTERACTIONS
The primitives can interact and have a number of rules regarding how
they can and cannot be combined.
Many of these rules are checked by
.Xr witness 4 .
.Ss Bounded vs. Unbounded Sleep
In a bounded sleep
.Po also referred to as
.Dq blocking
.Pc
the only resource needed to resume execution of a thread
is CPU time for the owner of a lock that the thread is waiting to acquire.
In an unbounded sleep
.Po
often referred to as simply
.Dq sleeping
.Pc
a thread waits for an external event or for a condition
to become true.
Consequently,
a dependency chain of threads in bounded sleeps should always make forward
progress,
since there is always CPU time available.
This requires that no thread in a bounded sleep is waiting for a lock held
by a thread in an unbounded sleep.
To avoid priority inversions,
a thread in a bounded sleep lends its priority to the owner of the lock
that it is waiting for.
.Pp
The following primitives perform bounded sleeps:
mutexes, reader/writer locks, and read-mostly locks.
.Pp
The following primitives perform unbounded sleeps:
sleepable read-mostly locks, shared/exclusive locks, lockmanager locks,
counting semaphores, condition variables, and sleep/wakeup.
.Ss General Principles
.Bl -bullet
.It
It is an error to do any operation that could result in yielding the processor
while holding a spin mutex.
.It
It is an error to do any operation that could result in unbounded sleep
while holding any primitive from the
.Dq bounded sleep
group.
For example, it is an error to try to acquire a shared/exclusive lock while
holding a mutex, or to try to allocate memory with
.Dv M_WAITOK
while holding a reader/writer lock.
.Pp
Note that the lock passed to one of the
.Fn sleep
or
.Fn cv_wait
functions is dropped before the thread enters the unbounded sleep and does
not violate this rule.
.It
It is an error to do any operation that could result in yielding
the processor when running inside an interrupt filter.
.It
It is an error to do any operation that could result in unbounded sleep when
running inside an interrupt thread.
.El
361.El
362.Ss Interaction table
363The following table shows what you can and can not do while holding
364one of the locking primitives discussed.
365Note that
366.Dq sleep
367includes
368.Fn sema_wait ,
369.Fn sema_timedwait ,
370any of the
371.Fn cv_wait
372functions,
373and any of the
374.Fn sleep
375functions.
376.Bl -column ".Ic xxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
377.It Em "       You want:" Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
378.It Em "You have:     " Ta -------- Ta -------- Ta ------ Ta -------- Ta ------ Ta ------
379.It spin mtx  Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no-1
380.It mutex/rw  Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
381.It rmlock    Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
382.It sleep rm  Ta \&ok Ta \&ok Ta \&ok Ta \&ok-2 Ta \&ok-2 Ta \&ok-2/3
383.It sx        Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok-3
384.It lockmgr   Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
385.El
386.Pp
387.Em *1
388There are calls that atomically release this primitive when going to sleep
389and reacquire it on wakeup
390.Po
391.Fn mtx_sleep ,
392.Fn rw_sleep ,
393.Fn msleep_spin ,
394etc.
395.Pc .
396.Pp
397.Em *2
398These cases are only allowed while holding a write lock on a sleepable
399read-mostly lock.
400.Pp
401.Em *3
402Though one can sleep while holding this lock,
403one can also use a
404.Fn sleep
405function to atomically release this primitive when going to sleep and
406reacquire it on wakeup.
407.Pp
408Note that non-blocking try operations on locks are always permitted.
409.Ss Context mode table
410The next table shows what can be used in different contexts.
411At this time this is a rather easy to remember table.
412.Bl -column ".Ic Xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
413.It Em "Context:"  Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
414.It interrupt filter:  Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
415.It interrupt thread:  Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
416.It callout:    Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
417.It direct callout:  Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
418.It system call:    Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
419.El
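.Pp
As with the interaction table, this table can be encoded directly.
The sketch below is a hypothetical encoding; the enumerations
.Vt ctx
and
.Vt prim ,
the array
.Va usable ,
and the function
.Fn can_use
are illustrative names, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the context mode table. */
enum ctx {
	C_INTR_FILTER, C_INTR_THREAD, C_CALLOUT, C_DIRECT_CALLOUT,
	C_SYSCALL, C_COUNT
};
enum prim {
	P_SPIN, P_MUTEX_RW, P_RMLOCK, P_SLEEP_RM, P_SX_LK, P_SLEEP, P_COUNT
};

static const bool usable[C_COUNT][P_COUNT] = {
	/* columns: spin mtx, mutex/rw, rmlock, sleep rm, sx/lk, sleep */
	[C_INTR_FILTER] = { true, false, false, false, false, false },
	[C_INTR_THREAD] = { true, true, true, false, false, false },
	[C_CALLOUT] = { true, true, true, false, false, false },
	[C_DIRECT_CALLOUT] = { true, false, false, false, false, false },
	[C_SYSCALL] = { true, true, true, true, true, true },
};

/* Returns true if primitive p may be used in context c. */
static bool
can_use(enum ctx c, enum prim p)
{
	assert(c < C_COUNT && p < P_COUNT);
	return (usable[c][p]);
}
```

Only the first column is true in every row: spin mutexes are the one
primitive usable in all of the listed contexts.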
.Sh SEE ALSO
.Xr lockstat 1 ,
.Xr witness 4 ,
.Xr atomic 9 ,
.Xr BUS_SETUP_INTR 9 ,
.Xr callout 9 ,
.Xr condvar 9 ,
.Xr epoch 9 ,
.Xr lock 9 ,
.Xr LOCK_PROFILING 9 ,
.Xr mtx_pool 9 ,
.Xr mutex 9 ,
.Xr rmlock 9 ,
.Xr rwlock 9 ,
.Xr sema 9 ,
.Xr sleep 9 ,
.Xr smr 9 ,
.Xr sx 9
.Sh BUGS
There are too many locking primitives to choose from.
440