xref: /freebsd/share/man/man9/smr.9 (revision 0de03c306c2f95580e56c65b74ebf8c04c1416b3)
1*0de03c30SMark Johnston.\" SPDX-License-Identifier: BSD-2-Clause
2*0de03c30SMark Johnston.\"
3*0de03c30SMark Johnston.\" Copyright (c) 2023 The FreeBSD Foundation
4*0de03c30SMark Johnston.\"
5*0de03c30SMark Johnston.\" This documentation was written by Mark Johnston <markj@FreeBSD.org>
6*0de03c30SMark Johnston.\" under sponsorship from the FreeBSD Foundation.
7*0de03c30SMark Johnston.\"
8*0de03c30SMark Johnston.\" Redistribution and use in source and binary forms, with or without
9*0de03c30SMark Johnston.\" modification, are permitted provided that the following conditions
10*0de03c30SMark Johnston.\" are met:
11*0de03c30SMark Johnston.\" 1. Redistributions of source code must retain the above copyright
12*0de03c30SMark Johnston.\"    notice, this list of conditions and the following disclaimer.
13*0de03c30SMark Johnston.\" 2. Redistributions in binary form must reproduce the above copyright
14*0de03c30SMark Johnston.\"    notice, this list of conditions and the following disclaimer in the
15*0de03c30SMark Johnston.\"    documentation and/or other materials provided with the distribution.
16*0de03c30SMark Johnston.\"
17*0de03c30SMark Johnston.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
18*0de03c30SMark Johnston.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
19*0de03c30SMark Johnston.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
20*0de03c30SMark Johnston.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
21*0de03c30SMark Johnston.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
22*0de03c30SMark Johnston.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
23*0de03c30SMark Johnston.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
24*0de03c30SMark Johnston.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
25*0de03c30SMark Johnston.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
26*0de03c30SMark Johnston.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
27*0de03c30SMark Johnston.\" SUCH DAMAGE.
28*0de03c30SMark Johnston.\"
29*0de03c30SMark Johnston.Dd January 17, 2023
30*0de03c30SMark Johnston.Dt SMR 9
31*0de03c30SMark Johnston.Os
32*0de03c30SMark Johnston.Sh NAME
33*0de03c30SMark Johnston.Nm smr
34*0de03c30SMark Johnston.Nd safe memory reclamation for lock-free data structures
35*0de03c30SMark Johnston.Sh SYNOPSIS
36*0de03c30SMark Johnston.In sys/smr.h
37*0de03c30SMark Johnston.Ft smr_seq_t
38*0de03c30SMark Johnston.Fo smr_advance
39*0de03c30SMark Johnston.Fa "smr_t smr"
40*0de03c30SMark Johnston.Fc
41*0de03c30SMark Johnston.Ft smr_t
42*0de03c30SMark Johnston.Fo smr_create
43*0de03c30SMark Johnston.Fa "const char *name"
44*0de03c30SMark Johnston.Fc
45*0de03c30SMark Johnston.Ft void
46*0de03c30SMark Johnston.Fo smr_destroy
47*0de03c30SMark Johnston.Fa "smr_t smr"
48*0de03c30SMark Johnston.Fc
49*0de03c30SMark Johnston.Ft void
50*0de03c30SMark Johnston.Fo smr_enter
51*0de03c30SMark Johnston.Fa "smr_t smr"
52*0de03c30SMark Johnston.Fc
53*0de03c30SMark Johnston.Ft void
54*0de03c30SMark Johnston.Fo smr_exit
55*0de03c30SMark Johnston.Fa "smr_t smr"
56*0de03c30SMark Johnston.Fc
57*0de03c30SMark Johnston.Ft bool
58*0de03c30SMark Johnston.Fo smr_poll
59*0de03c30SMark Johnston.Fa "smr_t smr"
60*0de03c30SMark Johnston.Fa "smr_seq_t goal"
61*0de03c30SMark Johnston.Fa "bool wait"
62*0de03c30SMark Johnston.Fc
63*0de03c30SMark Johnston.Ft void
64*0de03c30SMark Johnston.Fo smr_synchronize
65*0de03c30SMark Johnston.Fa "smr_t smr"
66*0de03c30SMark Johnston.Fc
67*0de03c30SMark Johnston.Ft bool
68*0de03c30SMark Johnston.Fo smr_wait
69*0de03c30SMark Johnston.Fa "smr_t smr"
70*0de03c30SMark Johnston.Fa "smr_seq_t goal"
71*0de03c30SMark Johnston.Fc
72*0de03c30SMark Johnston.Sh DESCRIPTION
73*0de03c30SMark JohnstonSafe Memory Reclamation (SMR) is a facility which enables the implementation of
74*0de03c30SMark Johnstonmemory-safe lock-free data structures.
75*0de03c30SMark JohnstonIn typical usage, read accesses to an SMR-protected data structure, such as a
76*0de03c30SMark Johnstonhash table or tree, are performed in a
77*0de03c30SMark Johnston.Dq read section
78*0de03c30SMark Johnstonconsisting of code bracketed by
79*0de03c30SMark Johnston.Fn smr_enter
80*0de03c30SMark Johnstonand
81*0de03c30SMark Johnston.Fn smr_exit
82*0de03c30SMark Johnstoncalls, while mutations of the data structure are serialized by a traditional
83*0de03c30SMark Johnstonmutex such as
84*0de03c30SMark Johnston.Xr mutex 9 .
85*0de03c30SMark JohnstonIn contrast with reader-writer locks such as
86*0de03c30SMark Johnston.Xr rwlock 9 ,
87*0de03c30SMark Johnston.Xr rmlock 9 ,
88*0de03c30SMark Johnstonand
89*0de03c30SMark Johnston.Xr sx 9 ,
90*0de03c30SMark JohnstonSMR allows readers and writers to access the data structure concurrently.
91*0de03c30SMark JohnstonReaders can always enter a read section immediately
92*0de03c30SMark Johnston.Po
93*0de03c30SMark Johnston.Fn smr_enter
94*0de03c30SMark Johnstonnever blocks
95*0de03c30SMark Johnston.Pc ,
96*0de03c30SMark Johnstonso mutations do not introduce read latency.
97*0de03c30SMark JohnstonFurthermore,
98*0de03c30SMark Johnston.Fn smr_enter
99*0de03c30SMark Johnstonand
100*0de03c30SMark Johnston.Fn smr_exit
101*0de03c30SMark Johnstonoperate only on per-CPU data and thus avoid some of the performance problems
102*0de03c30SMark Johnstoninherent in the implementation of traditional reader-writer mutexes.
103*0de03c30SMark JohnstonSMR can therefore be a useful building block for data structures which are
104*0de03c30SMark Johnstonaccessed frequently but are only rarely modified.
105*0de03c30SMark Johnston.Pp
106*0de03c30SMark JohnstonNote that any SMR-protected data structure must be implemented carefully such
107*0de03c30SMark Johnstonthat operations behave correctly in the absence of mutual exclusion between
108*0de03c30SMark Johnstonreaders and writers.
109*0de03c30SMark JohnstonThe data structure must be designed to be lock-free; SMR merely facilitates
110*0de03c30SMark Johnstonthe implementation, for example by making it safe to follow dangling pointers
111*0de03c30SMark Johnstonand by helping avoid the ABA problem.
112*0de03c30SMark Johnston.Pp
113*0de03c30SMark JohnstonWhen shared accesses to and mutations of a data structure can proceed
114*0de03c30SMark Johnstonconcurrently, writers must take care to ensure that any items removed from the
115*0de03c30SMark Johnstonstructure are not freed and recycled while readers are accessing them in
116*0de03c30SMark Johnstonparallel.
117*0de03c30SMark JohnstonThis requirement results in a two-phase approach to the removal of items:
118*0de03c30SMark Johnstonfirst, the item is unlinked such that all pointers to the item are removed from
119*0de03c30SMark Johnstonthe structure, preventing any new readers from observing the item.
120*0de03c30SMark JohnstonThen, the writer waits until some mechanism guarantees that no existing readers
121*0de03c30SMark Johnstonare still accessing the item.
122*0de03c30SMark JohnstonAt that point the memory for that item can be freed and reused safely.
123*0de03c30SMark JohnstonSMR provides this mechanism: readers may access a lock-free data structure in
124*0de03c30SMark Johnstonbetween calls to the
125*0de03c30SMark Johnston.Fn smr_enter
126*0de03c30SMark Johnstonand
127*0de03c30SMark Johnston.Fn smr_exit
128*0de03c30SMark Johnstonfunctions, which together create a read section, and the
129*0de03c30SMark Johnston.Fn smr_advance ,
130*0de03c30SMark Johnston.Fn smr_poll ,
131*0de03c30SMark Johnston.Fn smr_wait ,
132*0de03c30SMark Johnstonand
133*0de03c30SMark Johnston.Fn smr_synchronize
134*0de03c30SMark Johnstonfunctions can be used to wait for threads in read sections to finish.
135*0de03c30SMark JohnstonAll of these functions operate on a
136*0de03c30SMark Johnston.Ft smr_t
137*0de03c30SMark Johnstonstate block which holds both per-CPU and global state.
138*0de03c30SMark JohnstonReaders load global state and modify per-CPU state, while writers must scan all
139*0de03c30SMark Johnstonper-CPU states to detect active readers.
140*0de03c30SMark JohnstonSMR is designed to amortize this cost by batching to give acceptable
141*0de03c30SMark Johnstonperformance in write-heavy workloads.
142*0de03c30SMark Johnston.Ss Readers
143*0de03c30SMark JohnstonThreads enter a read section by calling
144*0de03c30SMark Johnston.Fn smr_enter .
145*0de03c30SMark JohnstonRead sections should be short, and many operations are not permitted while in
146*0de03c30SMark Johnstona read section.
147*0de03c30SMark JohnstonSpecifically, context switching is disabled, and thus readers may not acquire
148*0de03c30SMark Johnstonblocking mutexes such as
149*0de03c30SMark Johnston.Xr mutex 9 .
150*0de03c30SMark JohnstonAnother consequence of this is that the thread is pinned to the current CPU for
151*0de03c30SMark Johnstonthe duration of the read section.
152*0de03c30SMark JohnstonFurthermore, read sections may not be nested: it is incorrect to call
153*0de03c30SMark Johnston.Fn smr_enter
154*0de03c30SMark Johnstonwith a given
155*0de03c30SMark Johnston.Ft smr_t
156*0de03c30SMark Johnstonstate block when already in a read section for that state block.
157*0de03c30SMark Johnston.Ss UMA Integration
158*0de03c30SMark JohnstonTo simplify the integration of SMR into consumers, the
159*0de03c30SMark Johnston.Xr uma 9
160*0de03c30SMark Johnstonkernel memory allocator provides some SMR-specified facilities.
161*0de03c30SMark JohnstonThis eliminates a good deal of complexity from the implementation of consumers
162*0de03c30SMark Johnstonand automatically batches write operations.
163*0de03c30SMark Johnston.Pp
164*0de03c30SMark JohnstonIn typical usage, a UMA zone (created with the
165*0de03c30SMark Johnston.Dv UMA_ZONE_SMR
166*0de03c30SMark Johnstonflag or initialized with the
167*0de03c30SMark Johnston.Fn uma_zone_set_smr
168*0de03c30SMark Johnstonfunction) is coupled with a
169*0de03c30SMark Johnston.Ft smr_t
170*0de03c30SMark Johnstonstate block.
171*0de03c30SMark JohnstonTo insert an item into an SMR-protected data structure, memory is allocated
172*0de03c30SMark Johnstonfrom the zone using the
173*0de03c30SMark Johnston.Fn uma_zalloc_smr
174*0de03c30SMark Johnstonfunction.
175*0de03c30SMark JohnstonInsertions and removals are serialized using traditional mutual exclusion
176*0de03c30SMark Johnstonand items are freed using the
177*0de03c30SMark Johnston.Fn uma_zfree_smr
178*0de03c30SMark Johnstonfunction.
179*0de03c30SMark JohnstonRead-only lookup operations are performed in SMR read sections.
180*0de03c30SMark Johnston.Fn uma_zfree_smr
181*0de03c30SMark Johnstonwaits for all active readers which may be accessing the freed item to finish
182*0de03c30SMark Johnstontheir read sections before recycling that item's memory.
183*0de03c30SMark Johnston.Pp
184*0de03c30SMark JohnstonIf the zone has an associated per-item destructor, it will be invoked at some
185*0de03c30SMark Johnstonpoint when no readers can be accessing a given item.
186*0de03c30SMark JohnstonThe destructor can be used to release additional resources associated with the
187*0de03c30SMark Johnstonitem.
188*0de03c30SMark JohnstonNote however that there is no guarantee that the destructor will be invoked in
189*0de03c30SMark Johnstona bounded time period.
190*0de03c30SMark Johnston.Ss Writers
191*0de03c30SMark JohnstonConsumers are expected to use SMR in conjunction with UMA and thus need only
192*0de03c30SMark Johnstonmake use of the
193*0de03c30SMark Johnston.Fn smr_enter
194*0de03c30SMark Johnstonand
195*0de03c30SMark Johnston.Fn smr_exit
196*0de03c30SMark Johnstonfunctions, and the SMR helper macros defined in
197*0de03c30SMark Johnston.Pa sys/smr_types.h .
198*0de03c30SMark JohnstonHowever, an introduction to the write-side interface of SMR can be useful.
199*0de03c30SMark Johnston.Pp
200*0de03c30SMark JohnstonInternally, SMR maintains a global
201*0de03c30SMark Johnston.Ql write sequence
202*0de03c30SMark Johnstonnumber.
203*0de03c30SMark JohnstonWhen entering a read section,
204*0de03c30SMark Johnston.Fn smr_enter
205*0de03c30SMark Johnstonloads a copy of the write sequence and stores it in per-CPU memory, hence
206*0de03c30SMark Johnston.Ql observing
207*0de03c30SMark Johnstonthat value.
208*0de03c30SMark JohnstonTo exit a read section, this per-CPU memory is overwritten with an invalid
209*0de03c30SMark Johnstonvalue, making the CPU inactive.
210*0de03c30SMark JohnstonWriters perform two operations: advancing the write sequence number, and
211*0de03c30SMark Johnstonpolling all CPUs to see whether active readers have observed a given sequence
212*0de03c30SMark Johnstonnumber.
213*0de03c30SMark JohnstonThese operations are performed by
214*0de03c30SMark Johnston.Fn smr_advance
215*0de03c30SMark Johnstonand
216*0de03c30SMark Johnston.Fn smr_poll ,
217*0de03c30SMark Johnstonrespectively, which do not require serialization between multiple writers.
218*0de03c30SMark Johnston.Pp
219*0de03c30SMark JohnstonAfter a writer unlinks an item from a data structure, it increments the write
220*0de03c30SMark Johnstonsequence number and tags the item with the new value returned by
221*0de03c30SMark Johnston.Fn smr_advance .
222*0de03c30SMark JohnstonOnce all CPUs have observed the new value, the writer can use
223*0de03c30SMark Johnston.Fn smr_poll
224*0de03c30SMark Johnstonto deduce that no active readers have access to the unlinked item, and thus the
225*0de03c30SMark Johnstonitem is safe to recycle.
226*0de03c30SMark JohnstonBecause this pair of operations is relatively expensive, it is generally a good
227*0de03c30SMark Johnstonidea to amortize this cost by accumulating a collection of multiple unlinked
228*0de03c30SMark Johnstonitems and tagging the entire batch with a target write sequence number.
229*0de03c30SMark Johnston.Pp
230*0de03c30SMark Johnston.Fn smr_poll
231*0de03c30SMark Johnstonis a non-blocking operation and returns true only if all active readers are
232*0de03c30SMark Johnstonguaranteed to have observed the target sequence number value.
233*0de03c30SMark Johnston.Fn smr_wait
234*0de03c30SMark Johnstonis a variant of
235*0de03c30SMark Johnston.Fn smr_poll
236*0de03c30SMark Johnstonwhich waits until all CPUs have observed the target sequence number value.
237*0de03c30SMark Johnston.Fn smr_synchronize
238*0de03c30SMark Johnstoncombines
239*0de03c30SMark Johnston.Fn smr_advance
240*0de03c30SMark Johnstonwith
241*0de03c30SMark Johnston.Fn smr_wait
242*0de03c30SMark Johnstonto wait for all CPUs to observe a new write sequence number.
243*0de03c30SMark JohnstonThis is an expensive operation and should only be used if polling cannot be
244*0de03c30SMark Johnstondeferred in some way.
245*0de03c30SMark Johnston.Ss Memory Ordering
246*0de03c30SMark JohnstonThe
247*0de03c30SMark Johnston.Fn smr_enter
248*0de03c30SMark Johnstonfunction has acquire semantics, and the
249*0de03c30SMark Johnston.Fn smr_exit
250*0de03c30SMark Johnstonfunction has release semantics.
251*0de03c30SMark JohnstonThe
252*0de03c30SMark Johnston.Fn smr_advance ,
253*0de03c30SMark Johnston.Fn smr_poll ,
254*0de03c30SMark Johnston.Fn smr_wait ,
255*0de03c30SMark Johnstonand
256*0de03c30SMark Johnston.Fn smr_synchronize
257*0de03c30SMark Johnstonfunctions should not be assumed to have any guarantees with respect to memory
258*0de03c30SMark Johnstonordering.
259*0de03c30SMark JohnstonIn practice, some of these functions have stronger ordering semantics than
260*0de03c30SMark Johnstonis stated here, but this is specific to the implementation and should not be
261*0de03c30SMark Johnstonrelied upon.
262*0de03c30SMark JohnstonSee
263*0de03c30SMark Johnston.Xr atomic 9
264*0de03c30SMark Johnstonfor more details.
265*0de03c30SMark Johnston.Sh NOTES
266*0de03c30SMark JohnstonOutside of
267*0de03c30SMark Johnston.Fx
268*0de03c30SMark Johnstonthe acronym SMR typically refers to a family of algorithms which enable
269*0de03c30SMark Johnstonmemory-safe read-only access to a data structure concurrent with modifications
270*0de03c30SMark Johnstonto that data structure.
271*0de03c30SMark JohnstonHere, SMR refers to a particular algorithm belonging to this family, as well as
272*0de03c30SMark Johnstonits implementation in
273*0de03c30SMark Johnston.Fx .
274*0de03c30SMark JohnstonSee
275*0de03c30SMark Johnston.Pa sys/sys/smr.h
276*0de03c30SMark Johnstonand
277*0de03c30SMark Johnston.Pa sys/kern/subr_smr.c
278*0de03c30SMark Johnstonin the
279*0de03c30SMark Johnston.Fx
280*0de03c30SMark Johnstonsource tree for further details on the algorithm and the context.
281*0de03c30SMark Johnston.Pp
282*0de03c30SMark JohnstonThe acronym SMR is also used to mean "shingled magnetic recording", a
283*0de03c30SMark Johnstontechnology used to store data on hard disk drives which requires operating
284*0de03c30SMark Johnstonsystem support.
285*0de03c30SMark JohnstonThese two uses of the acronym are unrelated.
286*0de03c30SMark Johnston.Sh SEE ALSO
287*0de03c30SMark Johnston.Xr atomic 9 ,
288*0de03c30SMark Johnston.Xr locking 9 ,
289*0de03c30SMark Johnston.Xr uma 9
290*0de03c30SMark Johnston.Sh AUTHORS
291*0de03c30SMark JohnstonThe SMR algorithm and its implementation were provided by
292*0de03c30SMark Johnston.An Jeff Roberson Aq Mt jeff@FreeBSD.org .
293*0de03c30SMark JohnstonThis manual page was written by
294*0de03c30SMark Johnston.An Mark Johnston Aq Mt markj@FreeBSD.org .
295