xref: /freebsd/share/man/man9/smr.9 (revision 7d8e1e8dd9042f802a67adefabd28fcd9b1e4051)
1.\" SPDX-License-Identifier: BSD-2-Clause
2.\"
3.\" Copyright (c) 2023 The FreeBSD Foundation
4.\"
5.\" This documentation was written by Mark Johnston <markj@FreeBSD.org>
6.\" under sponsorship from the FreeBSD Foundation.
7.\"
8.\" Redistribution and use in source and binary forms, with or without
9.\" modification, are permitted provided that the following conditions
10.\" are met:
11.\" 1. Redistributions of source code must retain the above copyright
12.\"    notice, this list of conditions and the following disclaimer.
13.\" 2. Redistributions in binary form must reproduce the above copyright
14.\"    notice, this list of conditions and the following disclaimer in the
15.\"    documentation and/or other materials provided with the distribution.
16.\"
17.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
18.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
19.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
20.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
21.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
22.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
23.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
24.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
25.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
26.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
27.\" SUCH DAMAGE.
28.\"
29.Dd January 17, 2023
30.Dt SMR 9
31.Os
32.Sh NAME
33.Nm smr
34.Nd safe memory reclamation for lock-free data structures
35.Sh SYNOPSIS
36.In sys/smr.h
37.Ft smr_seq_t
38.Fo smr_advance
39.Fa "smr_t smr"
40.Fc
41.Ft smr_t
42.Fo smr_create
43.Fa "const char *name"
44.Fc
45.Ft void
46.Fo smr_destroy
47.Fa "smr_t smr"
48.Fc
49.Ft void
50.Fo smr_enter
51.Fa "smr_t smr"
52.Fc
53.Ft void
54.Fo smr_exit
55.Fa "smr_t smr"
56.Fc
57.Ft bool
58.Fo smr_poll
59.Fa "smr_t smr"
60.Fa "smr_seq_t goal"
61.Fa "bool wait"
62.Fc
63.Ft void
64.Fo smr_synchronize
65.Fa "smr_t smr"
66.Fc
67.Ft void
68.Fo smr_wait
69.Fa "smr_t smr"
70.Fa "smr_seq_t goal"
71.Fc
72.Sh DESCRIPTION
73Safe Memory Reclamation (SMR) is a facility which enables the implementation of
74memory-safe lock-free data structures.
75In typical usage, read accesses to an SMR-protected data structure, such as a
76hash table or tree, are performed in a
77.Dq read section
78consisting of code bracketed by
79.Fn smr_enter
80and
81.Fn smr_exit
82calls, while mutations of the data structure are serialized by a traditional
83mutex such as
84.Xr mutex 9 .
85In contrast with reader-writer locks such as
86.Xr rwlock 9 ,
87.Xr rmlock 9 ,
88and
89.Xr sx 9 ,
90SMR allows readers and writers to access the data structure concurrently.
91Readers can always enter a read section immediately
92.Po
93.Fn smr_enter
94never blocks
95.Pc ,
96so mutations do not introduce read latency.
97Furthermore,
98.Fn smr_enter
99and
100.Fn smr_exit
101operate only on per-CPU data and thus avoid some of the performance problems
102inherent in the implementation of traditional reader-writer mutexes.
103SMR can therefore be a useful building block for data structures which are
104accessed frequently but are only rarely modified.
105.Pp
106Note that any SMR-protected data structure must be implemented carefully such
107that operations behave correctly in the absence of mutual exclusion between
108readers and writers.
109The data structure must be designed to be lock-free; SMR merely facilitates
110the implementation, for example by making it safe to follow dangling pointers
111and by helping avoid the ABA problem.
112.Pp
113When shared accesses to and mutations of a data structure can proceed
114concurrently, writers must take care to ensure that any items removed from the
115structure are not freed and recycled while readers are accessing them in
116parallel.
117This requirement results in a two-phase approach to the removal of items:
118first, the item is unlinked such that all pointers to the item are removed from
119the structure, preventing any new readers from observing the item.
120Then, the writer waits until some mechanism guarantees that no existing readers
121are still accessing the item.
122At that point the memory for that item can be freed and reused safely.
123SMR provides this mechanism: readers may access a lock-free data structure in
124between calls to the
125.Fn smr_enter
126and
127.Fn smr_exit
128functions, which together create a read section, and the
129.Fn smr_advance ,
130.Fn smr_poll ,
131.Fn smr_wait ,
132and
133.Fn smr_synchronize
134functions can be used to wait for threads in read sections to finish.
135All of these functions operate on a
136.Ft smr_t
137state block which holds both per-CPU and global state.
138Readers load global state and modify per-CPU state, while writers must scan all
139per-CPU states to detect active readers.
140SMR is designed to amortize this cost by batching to give acceptable
141performance in write-heavy workloads.
142.Ss Readers
143Threads enter a read section by calling
144.Fn smr_enter .
145Read sections should be short, and many operations are not permitted while in
146a read section.
147Specifically, context switching is disabled, and thus readers may not acquire
148blocking mutexes such as
149.Xr mutex 9 .
150Another consequence of this is that the thread is pinned to the current CPU for
151the duration of the read section.
152Furthermore, read sections may not be nested: it is incorrect to call
153.Fn smr_enter
154with a given
155.Ft smr_t
156state block when already in a read section for that state block.
157.Ss UMA Integration
158To simplify the integration of SMR into consumers, the
159.Xr uma 9
160kernel memory allocator provides some SMR-specified facilities.
161This eliminates a good deal of complexity from the implementation of consumers
162and automatically batches write operations.
163.Pp
164In typical usage, a UMA zone (created with the
165.Dv UMA_ZONE_SMR
166flag or initialized with the
167.Fn uma_zone_set_smr
168function) is coupled with a
169.Ft smr_t
170state block.
171To insert an item into an SMR-protected data structure, memory is allocated
172from the zone using the
173.Fn uma_zalloc_smr
174function.
175Insertions and removals are serialized using traditional mutual exclusion
176and items are freed using the
177.Fn uma_zfree_smr
178function.
179Read-only lookup operations are performed in SMR read sections.
180.Fn uma_zfree_smr
181waits for all active readers which may be accessing the freed item to finish
182their read sections before recycling that item's memory.
183.Pp
184If the zone has an associated per-item destructor, it will be invoked at some
185point when no readers can be accessing a given item.
186The destructor can be used to release additional resources associated with the
187item.
188Note however that there is no guarantee that the destructor will be invoked in
189a bounded time period.
190.Ss Writers
191Consumers are expected to use SMR in conjunction with UMA and thus need only
192make use of the
193.Fn smr_enter
194and
195.Fn smr_exit
196functions, and the SMR helper macros defined in
197.Pa sys/smr_types.h .
198However, an introduction to the write-side interface of SMR can be useful.
199.Pp
200Internally, SMR maintains a global
201.Ql write sequence
202number.
203When entering a read section,
204.Fn smr_enter
205loads a copy of the write sequence and stores it in per-CPU memory, hence
206.Ql observing
207that value.
208To exit a read section, this per-CPU memory is overwritten with an invalid
209value, making the CPU inactive.
210Writers perform two operations: advancing the write sequence number, and
211polling all CPUs to see whether active readers have observed a given sequence
212number.
213These operations are performed by
214.Fn smr_advance
215and
216.Fn smr_poll ,
217respectively, which do not require serialization between multiple writers.
218.Pp
219After a writer unlinks an item from a data structure, it increments the write
220sequence number and tags the item with the new value returned by
221.Fn smr_advance .
222Once all CPUs have observed the new value, the writer can use
223.Fn smr_poll
224to deduce that no active readers have access to the unlinked item, and thus the
225item is safe to recycle.
226Because this pair of operations is relatively expensive, it is generally a good
227idea to amortize this cost by accumulating a collection of multiple unlinked
228items and tagging the entire batch with a target write sequence number.
229.Pp
230.Fn smr_poll
231is a non-blocking operation and returns true only if all active readers are
232guaranteed to have observed the target sequence number value.
233.Fn smr_wait
234is a variant of
235.Fn smr_poll
236which waits until all CPUs have observed the target sequence number value.
237.Fn smr_synchronize
238combines
239.Fn smr_advance
240with
241.Fn smr_wait
242to wait for all CPUs to observe a new write sequence number.
243This is an expensive operation and should only be used if polling cannot be
244deferred in some way.
245.Ss Memory Ordering
246The
247.Fn smr_enter
248function has acquire semantics, and the
249.Fn smr_exit
250function has release semantics.
251The
252.Fn smr_advance ,
253.Fn smr_poll ,
254.Fn smr_wait ,
255and
256.Fn smr_synchronize
257functions should not be assumed to have any guarantees with respect to memory
258ordering.
259In practice, some of these functions have stronger ordering semantics than
260is stated here, but this is specific to the implementation and should not be
261relied upon.
262See
263.Xr atomic 9
264for more details.
265.Sh NOTES
266Outside of
267.Fx
268the acronym SMR typically refers to a family of algorithms which enable
269memory-safe read-only access to a data structure concurrent with modifications
270to that data structure.
271Here, SMR refers to a particular algorithm belonging to this family, as well as
272its implementation in
273.Fx .
274See
275.Pa sys/sys/smr.h
276and
277.Pa sys/kern/subr_smr.c
278in the
279.Fx
280source tree for further details on the algorithm and the context.
281.Pp
282The acronym SMR is also used to mean "shingled magnetic recording", a
283technology used to store data on hard disk drives which requires operating
284system support.
285These two uses of the acronym are unrelated.
286.Sh SEE ALSO
287.Xr atomic 9 ,
288.Xr locking 9 ,
289.Xr uma 9
290.Sh AUTHORS
291The SMR algorithm and its implementation were provided by
292.An Jeff Roberson Aq Mt jeff@FreeBSD.org .
293This manual page was written by
294.An Mark Johnston Aq Mt markj@FreeBSD.org .
295