.\" xref: /freebsd/share/man/man9/locking.9 (revision 0e8011faf58b743cc652e3b2ad0f7671227610df)
.\" Copyright (c) 2007 Julian Elischer  (julian -  freebsd org )
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd February 3, 2023
.Dt LOCKING 9
.Os
.Sh NAME
.Nm locking
.Nd kernel synchronization primitives
.Sh DESCRIPTION
The
.Em FreeBSD
kernel is written to run across multiple CPUs and as such provides
several different synchronization primitives to allow developers
to safely access and manipulate many data types.
.Ss Mutexes
Mutexes (also called "blocking mutexes") are the most commonly used
synchronization primitive in the kernel.
A thread acquires (locks) a mutex before accessing data shared with other
threads (including interrupt threads), and releases (unlocks) it afterwards.
If the mutex cannot be acquired, the thread requesting it will wait.
Mutexes are adaptive by default, meaning that
if the owner of a contended mutex is currently running on another CPU,
then a thread attempting to acquire the mutex will spin rather than yield
the processor.
Mutexes fully support priority propagation.
.Pp
See
.Xr mutex 9
for details.
.Ss Spin Mutexes
Spin mutexes are a variation of basic mutexes; the main difference between
the two is that spin mutexes never block.
Instead, they spin while waiting for the lock to be released.
To avoid deadlock, a thread that holds a spin mutex must never yield its CPU.
Unlike ordinary mutexes, spin mutexes disable interrupts when acquired.
Since disabling interrupts can be expensive, they are generally slower to
acquire and release.
Spin mutexes should be used only when absolutely necessary,
e.g. to protect data shared
with interrupt filter code (see
.Xr bus_setup_intr 9
for details),
or for scheduler internals.
.Ss Mutex Pools
With most synchronization primitives, such as mutexes, the programmer must
provide memory to hold the primitive.
For example, a mutex may be embedded inside the structure it protects.
Mutex pools provide a preallocated set of mutexes to avoid this
requirement.
Note that mutexes from a pool may only be used as leaf locks.
.Pp
See
.Xr mtx_pool 9
for details.
.Ss Reader/Writer Locks
Reader/writer locks allow shared access to protected data by multiple threads
or exclusive access by a single thread.
The threads with shared access are known as
.Em readers
since they should only read the protected data.
A thread with exclusive access is known as a
.Em writer
since it may modify protected data.
.Pp
Reader/writer locks can be treated as mutexes (see above and
.Xr mutex 9 )
with shared/exclusive semantics.
Reader/writer locks support priority propagation like mutexes,
but priority is propagated only to an exclusive holder.
This limitation comes from the fact that shared owners
are anonymous.
.Pp
See
.Xr rwlock 9
for details.
.Ss Read-Mostly Locks
Read-mostly locks are similar to
.Em reader/writer
locks but optimized for very infrequent write locking.
.Em Read-mostly
locks implement full priority propagation by tracking shared owners
using a caller-supplied
.Em tracker
data structure.
.Pp
See
.Xr rmlock 9
for details.
.Ss Sleepable Read-Mostly Locks
Sleepable read-mostly locks are a variation on read-mostly locks.
Threads holding an exclusive lock may sleep,
but threads holding a shared lock may not.
Priority is propagated to shared owners but not to exclusive owners.
.Pp
See
.Xr rmlock 9
for details.
.Ss Shared/exclusive locks
Shared/exclusive locks are similar to reader/writer locks; the main difference
between them is that shared/exclusive locks may be held during unbounded sleep.
Acquiring a contested shared/exclusive lock can perform an unbounded sleep.
These locks do not support priority propagation.
.Pp
See
.Xr sx 9
for details.
.Ss Lockmanager locks
Lockmanager locks are sleepable shared/exclusive locks used mostly in
.Xr VFS 9
.Po
as a
.Xr vnode 9
lock
.Pc
and in the buffer cache
.Po
.Xr BUF_LOCK 9
.Pc .
They have features that other lock types lack, such as sleep
timeouts, blocking upgrades,
writer starvation avoidance, draining, and an interlock mutex,
but these features make them complicated both to use and to implement;
for this reason, they should be avoided.
.Pp
See
.Xr lock 9
for details.
.Ss Non-blocking synchronization
The kernel has two facilities,
.Xr epoch 9
and
.Xr smr 9 ,
which can be used to provide read-only access to a data structure while one or
more writers are concurrently modifying the data structure.
Specifically, readers using
.Xr epoch 9
and
.Xr smr 9
to synchronize accesses do not block writers, in contrast with reader/writer
locks, and they help ensure that memory freed by writers is not reused until
all readers which may be accessing it have finished.
Thus, they are a useful building block in the construction of lock-free
data structures.
.Pp
These facilities are difficult to use correctly and should be avoided
in favor of traditional mutual exclusion-based synchronization,
except when performance or non-blocking guarantees are a major concern.
.Pp
See
.Xr epoch 9
and
.Xr smr 9
for details.
.Ss Counting semaphores
Counting semaphores provide a mechanism for synchronizing access
to a pool of resources.
Unlike mutexes, semaphores do not have the concept of an owner,
so they can be useful in situations where one thread needs
to acquire a resource and another thread needs to release it.
They are largely deprecated.
.Pp
See
.Xr sema 9
for details.
.Ss Condition variables
Condition variables are used in conjunction with locks to wait for
a condition to become true.
A thread must hold the associated lock before calling one of the
.Fn cv_wait
family of functions.
When a thread waits on a condition, the lock
is atomically released before the thread yields the processor
and reacquired before the function call returns.
Condition variables may be used with blocking mutexes,
reader/writer locks, read-mostly locks, and shared/exclusive locks.
.Pp
See
.Xr condvar 9
for details.
.Ss Sleep/Wakeup
The functions
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
.Fn pause ,
.Fn wakeup ,
and
.Fn wakeup_one
also handle event-based thread blocking.
Unlike condition variables,
arbitrary addresses may be used as wait channels, and a dedicated
structure does not need to be allocated.
However, care must be taken to ensure that wait channel addresses are
unique to an event.
If a thread must wait for an external event, it is put to sleep by
.Fn tsleep ,
.Fn msleep ,
.Fn msleep_spin ,
or
.Fn pause .
Threads may also wait using one of the locking primitive sleep routines
.Xr mtx_sleep 9 ,
.Xr rw_sleep 9 ,
or
.Xr sx_sleep 9 .
.Pp
The
.Fa chan
argument of these functions is an arbitrary address that uniquely
identifies the event on which the thread is being put to sleep.
All threads sleeping on a single
.Fa chan
are woken up later by
.Fn wakeup
.Pq often called from inside an interrupt routine
to indicate that the
event the thread was blocking on has occurred.
.Pp
Several of the sleep functions, including
.Fn msleep ,
.Fn msleep_spin ,
and the locking primitive sleep routines, take an additional lock
parameter.
The lock will be released before sleeping and reacquired
before the sleep routine returns.
If the
.Fa priority
argument includes the
.Dv PDROP
flag, then the lock will not be reacquired before returning.
The lock is used to ensure that a condition can be checked atomically,
and that the current thread can be suspended without missing a
change to the condition or an associated wakeup.
In addition, all of the sleep routines will fully drop the
.Va Giant
mutex
.Pq even if recursed
while the thread is suspended and will reacquire the
.Va Giant
mutex
.Pq restoring any recursion
before the function returns.
.Pp
The
.Fn pause
function is a special sleep function that waits for a specified
amount of time to pass before the thread resumes execution.
This sleep cannot be terminated early by either an explicit
.Fn wakeup
or a signal.
.Pp
See
.Xr sleep 9
for details.
.Ss Giant
Giant is a special mutex used to protect data structures that do not
yet have their own locks.
Since it provides semantics akin to the old
.Xr spl 9
interface,
Giant has special characteristics:
.Bl -enum
.It
It is recursive.
.It
Drivers can request that Giant be locked around them
by not marking themselves MPSAFE.
Note that infrastructure to do this is slowly going away as non-MPSAFE
drivers either become properly locked or disappear.
.It
Giant must be locked before other non-sleepable locks.
.It
Giant is dropped during unbounded sleeps and reacquired after wakeup.
.It
There are places in the kernel that drop Giant and pick it back up
again.
Sleep locks will do this before sleeping.
Parts of the network or VM code may do this as well.
This means that you cannot count on Giant keeping other code from
running if your code sleeps, even if you want it to.
.El
.Sh INTERACTIONS
The primitives can interact and have a number of rules regarding how
they can and cannot be combined.
Many of these rules are checked by
.Xr witness 4 .
.Ss Bounded vs. Unbounded Sleep
In a bounded sleep
.Po also referred to as
.Dq blocking
.Pc
the only resource needed to resume execution of a thread
is CPU time for the owner of a lock that the thread is waiting to acquire.
In an unbounded sleep
.Po
often referred to as simply
.Dq sleeping
.Pc
a thread waits for an external event or for a condition
to become true.
In particular,
a dependency chain of threads in bounded sleeps should always make forward
progress,
since there is always CPU time available.
This requires that no thread in a bounded sleep is waiting for a lock held
by a thread in an unbounded sleep.
To avoid priority inversions,
a thread in a bounded sleep lends its priority to the owner of the lock
that it is waiting for.
.Pp
The following primitives perform bounded sleeps:
mutexes, reader/writer locks, and read-mostly locks.
.Pp
The following primitives perform unbounded sleeps:
sleepable read-mostly locks, shared/exclusive locks, lockmanager locks,
counting semaphores, condition variables, and sleep/wakeup.
.Ss General Principles
.Bl -bullet
.It
It is an error to do any operation that could result in yielding the processor
while holding a spin mutex.
.It
It is an error to do any operation that could result in unbounded sleep
while holding any primitive from the
.Dq bounded sleep
group.
For example, it is an error to try to acquire a shared/exclusive lock while
holding a mutex, or to try to allocate memory with
.Dv M_WAITOK
while holding a
reader/writer lock.
.Pp
Note that the lock passed to one of the
.Fn sleep
or
.Fn cv_wait
functions is dropped before the thread enters the unbounded sleep and does
not violate this rule.
.It
It is an error to do any operation that could result in yielding
the processor when running inside an interrupt filter.
.It
It is an error to do any operation that could result in unbounded sleep when
running inside an interrupt thread.
.El
.Ss Interaction table
The following table shows what you can and cannot do while holding
one of the locking primitives discussed.
Note that
.Dq sleep
includes
.Fn sema_wait ,
.Fn sema_timedwait ,
any of the
.Fn cv_wait
functions,
and any of the
.Fn sleep
functions.
.Bl -column ".Ic xxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
.It Em "       You want:" Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
.It Em "You have:     " Ta -------- Ta -------- Ta ------ Ta -------- Ta ------ Ta ------
.It spin mtx  Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no-1
.It mutex/rw  Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
.It rmlock    Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no-1
.It sleep rm  Ta \&ok Ta \&ok Ta \&ok Ta \&ok-2 Ta \&ok-2 Ta \&ok-2/3
.It sx        Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok-3
.It lockmgr   Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
.El
.Pp
.Em *1
There are calls that atomically release this primitive when going to sleep
and reacquire it on wakeup
.Po
.Fn mtx_sleep ,
.Fn rw_sleep ,
.Fn msleep_spin ,
etc.
.Pc .
.Pp
.Em *2
These cases are only allowed while holding a write lock on a sleepable
read-mostly lock.
.Pp
.Em *3
Though one can sleep while holding this lock,
one can also use a
.Fn sleep
function to atomically release this primitive when going to sleep and
reacquire it on wakeup.
.Pp
Note that non-blocking try operations on locks are always permitted.
.Ss Context mode table
The next table shows what can be used in different contexts.
At present, this table is fairly easy to remember.
.Bl -column ".Ic Xxxxxxxxxxxxxxxxxxx" ".Xr XXXXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXXX" ".Xr XXXXXXXXX" ".Xr XXXXXX" -offset 3n
.It Em "Context:"  Ta spin mtx Ta mutex/rw Ta rmlock Ta sleep rm Ta sx/lk Ta sleep
.It interrupt filter:  Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
.It interrupt thread:  Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
.It callout:    Ta \&ok Ta \&ok Ta \&ok Ta \&no Ta \&no Ta \&no
.It direct callout:  Ta \&ok Ta \&no Ta \&no Ta \&no Ta \&no Ta \&no
.It system call:    Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok Ta \&ok
.El
.Sh SEE ALSO
.Xr lockstat 1 ,
.Xr witness 4 ,
.Xr atomic 9 ,
.Xr BUS_SETUP_INTR 9 ,
.Xr callout 9 ,
.Xr condvar 9 ,
.Xr epoch 9 ,
.Xr lock 9 ,
.Xr LOCK_PROFILING 9 ,
.Xr mtx_pool 9 ,
.Xr mutex 9 ,
.Xr rmlock 9 ,
.Xr rwlock 9 ,
.Xr sema 9 ,
.Xr sleep 9 ,
.Xr smr 9 ,
.Xr sx 9
.Sh BUGS
There are too many locking primitives to choose from.