xref: /freebsd/share/man/man9/smr.9 (revision cd133525fad197ac8cbbd4bd68860a4dd51a561f)
10de03c30SMark Johnston.\" SPDX-License-Identifier: BSD-2-Clause
20de03c30SMark Johnston.\"
30de03c30SMark Johnston.\" Copyright (c) 2023 The FreeBSD Foundation
40de03c30SMark Johnston.\"
50de03c30SMark Johnston.\" This documentation was written by Mark Johnston <markj@FreeBSD.org>
60de03c30SMark Johnston.\" under sponsorship from the FreeBSD Foundation.
70de03c30SMark Johnston.\"
80de03c30SMark Johnston.\" Redistribution and use in source and binary forms, with or without
90de03c30SMark Johnston.\" modification, are permitted provided that the following conditions
100de03c30SMark Johnston.\" are met:
110de03c30SMark Johnston.\" 1. Redistributions of source code must retain the above copyright
120de03c30SMark Johnston.\"    notice, this list of conditions and the following disclaimer.
130de03c30SMark Johnston.\" 2. Redistributions in binary form must reproduce the above copyright
140de03c30SMark Johnston.\"    notice, this list of conditions and the following disclaimer in the
150de03c30SMark Johnston.\"    documentation and/or other materials provided with the distribution.
160de03c30SMark Johnston.\"
170de03c30SMark Johnston.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
180de03c30SMark Johnston.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
190de03c30SMark Johnston.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
200de03c30SMark Johnston.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
210de03c30SMark Johnston.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
220de03c30SMark Johnston.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
230de03c30SMark Johnston.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
240de03c30SMark Johnston.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
250de03c30SMark Johnston.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
260de03c30SMark Johnston.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
270de03c30SMark Johnston.\" SUCH DAMAGE.
280de03c30SMark Johnston.\"
290de03c30SMark Johnston.Dd January 17, 2023
300de03c30SMark Johnston.Dt SMR 9
310de03c30SMark Johnston.Os
320de03c30SMark Johnston.Sh NAME
330de03c30SMark Johnston.Nm smr
340de03c30SMark Johnston.Nd safe memory reclamation for lock-free data structures
350de03c30SMark Johnston.Sh SYNOPSIS
360de03c30SMark Johnston.In sys/smr.h
370de03c30SMark Johnston.Ft smr_seq_t
380de03c30SMark Johnston.Fo smr_advance
390de03c30SMark Johnston.Fa "smr_t smr"
400de03c30SMark Johnston.Fc
410de03c30SMark Johnston.Ft smr_t
420de03c30SMark Johnston.Fo smr_create
430de03c30SMark Johnston.Fa "const char *name"
440de03c30SMark Johnston.Fc
450de03c30SMark Johnston.Ft void
460de03c30SMark Johnston.Fo smr_destroy
470de03c30SMark Johnston.Fa "smr_t smr"
480de03c30SMark Johnston.Fc
490de03c30SMark Johnston.Ft void
500de03c30SMark Johnston.Fo smr_enter
510de03c30SMark Johnston.Fa "smr_t smr"
520de03c30SMark Johnston.Fc
530de03c30SMark Johnston.Ft void
540de03c30SMark Johnston.Fo smr_exit
550de03c30SMark Johnston.Fa "smr_t smr"
560de03c30SMark Johnston.Fc
570de03c30SMark Johnston.Ft bool
580de03c30SMark Johnston.Fo smr_poll
590de03c30SMark Johnston.Fa "smr_t smr"
600de03c30SMark Johnston.Fa "smr_seq_t goal"
610de03c30SMark Johnston.Fa "bool wait"
620de03c30SMark Johnston.Fc
630de03c30SMark Johnston.Ft void
640de03c30SMark Johnston.Fo smr_synchronize
650de03c30SMark Johnston.Fa "smr_t smr"
660de03c30SMark Johnston.Fc
67*cd133525SMark Johnston.Ft void
680de03c30SMark Johnston.Fo smr_wait
690de03c30SMark Johnston.Fa "smr_t smr"
700de03c30SMark Johnston.Fa "smr_seq_t goal"
710de03c30SMark Johnston.Fc
720de03c30SMark Johnston.Sh DESCRIPTION
730de03c30SMark JohnstonSafe Memory Reclamation (SMR) is a facility which enables the implementation of
740de03c30SMark Johnstonmemory-safe lock-free data structures.
750de03c30SMark JohnstonIn typical usage, read accesses to an SMR-protected data structure, such as a
760de03c30SMark Johnstonhash table or tree, are performed in a
770de03c30SMark Johnston.Dq read section
780de03c30SMark Johnstonconsisting of code bracketed by
790de03c30SMark Johnston.Fn smr_enter
800de03c30SMark Johnstonand
810de03c30SMark Johnston.Fn smr_exit
820de03c30SMark Johnstoncalls, while mutations of the data structure are serialized by a traditional
830de03c30SMark Johnstonmutex such as
840de03c30SMark Johnston.Xr mutex 9 .
850de03c30SMark JohnstonIn contrast with reader-writer locks such as
860de03c30SMark Johnston.Xr rwlock 9 ,
870de03c30SMark Johnston.Xr rmlock 9 ,
880de03c30SMark Johnstonand
890de03c30SMark Johnston.Xr sx 9 ,
900de03c30SMark JohnstonSMR allows readers and writers to access the data structure concurrently.
910de03c30SMark JohnstonReaders can always enter a read section immediately
920de03c30SMark Johnston.Po
930de03c30SMark Johnston.Fn smr_enter
940de03c30SMark Johnstonnever blocks
950de03c30SMark Johnston.Pc ,
960de03c30SMark Johnstonso mutations do not introduce read latency.
970de03c30SMark JohnstonFurthermore,
980de03c30SMark Johnston.Fn smr_enter
990de03c30SMark Johnstonand
1000de03c30SMark Johnston.Fn smr_exit
1010de03c30SMark Johnstonoperate only on per-CPU data and thus avoid some of the performance problems
1020de03c30SMark Johnstoninherent in the implementation of traditional reader-writer mutexes.
1030de03c30SMark JohnstonSMR can therefore be a useful building block for data structures which are
1040de03c30SMark Johnstonaccessed frequently but are only rarely modified.
1050de03c30SMark Johnston.Pp
1060de03c30SMark JohnstonNote that any SMR-protected data structure must be implemented carefully such
1070de03c30SMark Johnstonthat operations behave correctly in the absence of mutual exclusion between
1080de03c30SMark Johnstonreaders and writers.
1090de03c30SMark JohnstonThe data structure must be designed to be lock-free; SMR merely facilitates
1100de03c30SMark Johnstonthe implementation, for example by making it safe to follow dangling pointers
1110de03c30SMark Johnstonand by helping avoid the ABA problem.
1120de03c30SMark Johnston.Pp
1130de03c30SMark JohnstonWhen shared accesses to and mutations of a data structure can proceed
1140de03c30SMark Johnstonconcurrently, writers must take care to ensure that any items removed from the
1150de03c30SMark Johnstonstructure are not freed and recycled while readers are accessing them in
1160de03c30SMark Johnstonparallel.
1170de03c30SMark JohnstonThis requirement results in a two-phase approach to the removal of items:
1180de03c30SMark Johnstonfirst, the item is unlinked such that all pointers to the item are removed from
1190de03c30SMark Johnstonthe structure, preventing any new readers from observing the item.
1200de03c30SMark JohnstonThen, the writer waits until some mechanism guarantees that no existing readers
1210de03c30SMark Johnstonare still accessing the item.
1220de03c30SMark JohnstonAt that point the memory for that item can be freed and reused safely.
1230de03c30SMark JohnstonSMR provides this mechanism: readers may access a lock-free data structure in
1240de03c30SMark Johnstonbetween calls to the
1250de03c30SMark Johnston.Fn smr_enter
1260de03c30SMark Johnstonand
1270de03c30SMark Johnston.Fn smr_exit
1280de03c30SMark Johnstonfunctions, which together create a read section, and the
1290de03c30SMark Johnston.Fn smr_advance ,
1300de03c30SMark Johnston.Fn smr_poll ,
1310de03c30SMark Johnston.Fn smr_wait ,
1320de03c30SMark Johnstonand
1330de03c30SMark Johnston.Fn smr_synchronize
1340de03c30SMark Johnstonfunctions can be used to wait for threads in read sections to finish.
1350de03c30SMark JohnstonAll of these functions operate on a
1360de03c30SMark Johnston.Ft smr_t
1370de03c30SMark Johnstonstate block which holds both per-CPU and global state.
1380de03c30SMark JohnstonReaders load global state and modify per-CPU state, while writers must scan all
1390de03c30SMark Johnstonper-CPU states to detect active readers.
1400de03c30SMark JohnstonSMR is designed to amortize this cost by batching to give acceptable
1410de03c30SMark Johnstonperformance in write-heavy workloads.
1420de03c30SMark Johnston.Ss Readers
1430de03c30SMark JohnstonThreads enter a read section by calling
1440de03c30SMark Johnston.Fn smr_enter .
1450de03c30SMark JohnstonRead sections should be short, and many operations are not permitted while in
1460de03c30SMark Johnstona read section.
1470de03c30SMark JohnstonSpecifically, context switching is disabled, and thus readers may not acquire
1480de03c30SMark Johnstonblocking mutexes such as
1490de03c30SMark Johnston.Xr mutex 9 .
1500de03c30SMark JohnstonAnother consequence of this is that the thread is pinned to the current CPU for
1510de03c30SMark Johnstonthe duration of the read section.
1520de03c30SMark JohnstonFurthermore, read sections may not be nested: it is incorrect to call
1530de03c30SMark Johnston.Fn smr_enter
1540de03c30SMark Johnstonwith a given
1550de03c30SMark Johnston.Ft smr_t
1560de03c30SMark Johnstonstate block when already in a read section for that state block.
1570de03c30SMark Johnston.Ss UMA Integration
1580de03c30SMark JohnstonTo simplify the integration of SMR into consumers, the
1590de03c30SMark Johnston.Xr uma 9
1600de03c30SMark Johnstonkernel memory allocator provides some SMR-specified facilities.
1610de03c30SMark JohnstonThis eliminates a good deal of complexity from the implementation of consumers
1620de03c30SMark Johnstonand automatically batches write operations.
1630de03c30SMark Johnston.Pp
1640de03c30SMark JohnstonIn typical usage, a UMA zone (created with the
1650de03c30SMark Johnston.Dv UMA_ZONE_SMR
1660de03c30SMark Johnstonflag or initialized with the
1670de03c30SMark Johnston.Fn uma_zone_set_smr
1680de03c30SMark Johnstonfunction) is coupled with a
1690de03c30SMark Johnston.Ft smr_t
1700de03c30SMark Johnstonstate block.
1710de03c30SMark JohnstonTo insert an item into an SMR-protected data structure, memory is allocated
1720de03c30SMark Johnstonfrom the zone using the
1730de03c30SMark Johnston.Fn uma_zalloc_smr
1740de03c30SMark Johnstonfunction.
1750de03c30SMark JohnstonInsertions and removals are serialized using traditional mutual exclusion
1760de03c30SMark Johnstonand items are freed using the
1770de03c30SMark Johnston.Fn uma_zfree_smr
1780de03c30SMark Johnstonfunction.
1790de03c30SMark JohnstonRead-only lookup operations are performed in SMR read sections.
1800de03c30SMark Johnston.Fn uma_zfree_smr
1810de03c30SMark Johnstonwaits for all active readers which may be accessing the freed item to finish
1820de03c30SMark Johnstontheir read sections before recycling that item's memory.
1830de03c30SMark Johnston.Pp
1840de03c30SMark JohnstonIf the zone has an associated per-item destructor, it will be invoked at some
1850de03c30SMark Johnstonpoint when no readers can be accessing a given item.
1860de03c30SMark JohnstonThe destructor can be used to release additional resources associated with the
1870de03c30SMark Johnstonitem.
1880de03c30SMark JohnstonNote however that there is no guarantee that the destructor will be invoked in
1890de03c30SMark Johnstona bounded time period.
1900de03c30SMark Johnston.Ss Writers
1910de03c30SMark JohnstonConsumers are expected to use SMR in conjunction with UMA and thus need only
1920de03c30SMark Johnstonmake use of the
1930de03c30SMark Johnston.Fn smr_enter
1940de03c30SMark Johnstonand
1950de03c30SMark Johnston.Fn smr_exit
1960de03c30SMark Johnstonfunctions, and the SMR helper macros defined in
1970de03c30SMark Johnston.Pa sys/smr_types.h .
1980de03c30SMark JohnstonHowever, an introduction to the write-side interface of SMR can be useful.
1990de03c30SMark Johnston.Pp
2000de03c30SMark JohnstonInternally, SMR maintains a global
2010de03c30SMark Johnston.Ql write sequence
2020de03c30SMark Johnstonnumber.
2030de03c30SMark JohnstonWhen entering a read section,
2040de03c30SMark Johnston.Fn smr_enter
2050de03c30SMark Johnstonloads a copy of the write sequence and stores it in per-CPU memory, hence
2060de03c30SMark Johnston.Ql observing
2070de03c30SMark Johnstonthat value.
2080de03c30SMark JohnstonTo exit a read section, this per-CPU memory is overwritten with an invalid
2090de03c30SMark Johnstonvalue, making the CPU inactive.
2100de03c30SMark JohnstonWriters perform two operations: advancing the write sequence number, and
2110de03c30SMark Johnstonpolling all CPUs to see whether active readers have observed a given sequence
2120de03c30SMark Johnstonnumber.
2130de03c30SMark JohnstonThese operations are performed by
2140de03c30SMark Johnston.Fn smr_advance
2150de03c30SMark Johnstonand
2160de03c30SMark Johnston.Fn smr_poll ,
2170de03c30SMark Johnstonrespectively, which do not require serialization between multiple writers.
2180de03c30SMark Johnston.Pp
2190de03c30SMark JohnstonAfter a writer unlinks an item from a data structure, it increments the write
2200de03c30SMark Johnstonsequence number and tags the item with the new value returned by
2210de03c30SMark Johnston.Fn smr_advance .
2220de03c30SMark JohnstonOnce all CPUs have observed the new value, the writer can use
2230de03c30SMark Johnston.Fn smr_poll
2240de03c30SMark Johnstonto deduce that no active readers have access to the unlinked item, and thus the
2250de03c30SMark Johnstonitem is safe to recycle.
2260de03c30SMark JohnstonBecause this pair of operations is relatively expensive, it is generally a good
2270de03c30SMark Johnstonidea to amortize this cost by accumulating a collection of multiple unlinked
2280de03c30SMark Johnstonitems and tagging the entire batch with a target write sequence number.
2290de03c30SMark Johnston.Pp
2300de03c30SMark Johnston.Fn smr_poll
2310de03c30SMark Johnstonis a non-blocking operation and returns true only if all active readers are
2320de03c30SMark Johnstonguaranteed to have observed the target sequence number value.
2330de03c30SMark Johnston.Fn smr_wait
2340de03c30SMark Johnstonis a variant of
2350de03c30SMark Johnston.Fn smr_poll
2360de03c30SMark Johnstonwhich waits until all CPUs have observed the target sequence number value.
2370de03c30SMark Johnston.Fn smr_synchronize
2380de03c30SMark Johnstoncombines
2390de03c30SMark Johnston.Fn smr_advance
2400de03c30SMark Johnstonwith
2410de03c30SMark Johnston.Fn smr_wait
2420de03c30SMark Johnstonto wait for all CPUs to observe a new write sequence number.
2430de03c30SMark JohnstonThis is an expensive operation and should only be used if polling cannot be
2440de03c30SMark Johnstondeferred in some way.
2450de03c30SMark Johnston.Ss Memory Ordering
2460de03c30SMark JohnstonThe
2470de03c30SMark Johnston.Fn smr_enter
2480de03c30SMark Johnstonfunction has acquire semantics, and the
2490de03c30SMark Johnston.Fn smr_exit
2500de03c30SMark Johnstonfunction has release semantics.
2510de03c30SMark JohnstonThe
2520de03c30SMark Johnston.Fn smr_advance ,
2530de03c30SMark Johnston.Fn smr_poll ,
2540de03c30SMark Johnston.Fn smr_wait ,
2550de03c30SMark Johnstonand
2560de03c30SMark Johnston.Fn smr_synchronize
2570de03c30SMark Johnstonfunctions should not be assumed to have any guarantees with respect to memory
2580de03c30SMark Johnstonordering.
2590de03c30SMark JohnstonIn practice, some of these functions have stronger ordering semantics than
2600de03c30SMark Johnstonis stated here, but this is specific to the implementation and should not be
2610de03c30SMark Johnstonrelied upon.
2620de03c30SMark JohnstonSee
2630de03c30SMark Johnston.Xr atomic 9
2640de03c30SMark Johnstonfor more details.
2650de03c30SMark Johnston.Sh NOTES
2660de03c30SMark JohnstonOutside of
2670de03c30SMark Johnston.Fx
2680de03c30SMark Johnstonthe acronym SMR typically refers to a family of algorithms which enable
2690de03c30SMark Johnstonmemory-safe read-only access to a data structure concurrent with modifications
2700de03c30SMark Johnstonto that data structure.
2710de03c30SMark JohnstonHere, SMR refers to a particular algorithm belonging to this family, as well as
2720de03c30SMark Johnstonits implementation in
2730de03c30SMark Johnston.Fx .
2740de03c30SMark JohnstonSee
2750de03c30SMark Johnston.Pa sys/sys/smr.h
2760de03c30SMark Johnstonand
2770de03c30SMark Johnston.Pa sys/kern/subr_smr.c
2780de03c30SMark Johnstonin the
2790de03c30SMark Johnston.Fx
2800de03c30SMark Johnstonsource tree for further details on the algorithm and the context.
2810de03c30SMark Johnston.Pp
2820de03c30SMark JohnstonThe acronym SMR is also used to mean "shingled magnetic recording", a
2830de03c30SMark Johnstontechnology used to store data on hard disk drives which requires operating
2840de03c30SMark Johnstonsystem support.
2850de03c30SMark JohnstonThese two uses of the acronym are unrelated.
2860de03c30SMark Johnston.Sh SEE ALSO
2870de03c30SMark Johnston.Xr atomic 9 ,
2880de03c30SMark Johnston.Xr locking 9 ,
2890de03c30SMark Johnston.Xr uma 9
2900de03c30SMark Johnston.Sh AUTHORS
2910de03c30SMark JohnstonThe SMR algorithm and its implementation were provided by
2920de03c30SMark Johnston.An Jeff Roberson Aq Mt jeff@FreeBSD.org .
2930de03c30SMark JohnstonThis manual page was written by
2940de03c30SMark Johnston.An Mark Johnston Aq Mt markj@FreeBSD.org .
295