10de03c30SMark Johnston.\" SPDX-License-Identifier: BSD-2-Clause 20de03c30SMark Johnston.\" 30de03c30SMark Johnston.\" Copyright (c) 2023 The FreeBSD Foundation 40de03c30SMark Johnston.\" 50de03c30SMark Johnston.\" This documentation was written by Mark Johnston <markj@FreeBSD.org> 60de03c30SMark Johnston.\" under sponsorship from the FreeBSD Foundation. 70de03c30SMark Johnston.\" 80de03c30SMark Johnston.\" Redistribution and use in source and binary forms, with or without 90de03c30SMark Johnston.\" modification, are permitted provided that the following conditions 100de03c30SMark Johnston.\" are met: 110de03c30SMark Johnston.\" 1. Redistributions of source code must retain the above copyright 120de03c30SMark Johnston.\" notice, this list of conditions and the following disclaimer. 130de03c30SMark Johnston.\" 2. Redistributions in binary form must reproduce the above copyright 140de03c30SMark Johnston.\" notice, this list of conditions and the following disclaimer in the 150de03c30SMark Johnston.\" documentation and/or other materials provided with the distribution. 160de03c30SMark Johnston.\" 170de03c30SMark Johnston.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND 180de03c30SMark Johnston.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 190de03c30SMark Johnston.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 200de03c30SMark Johnston.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE 210de03c30SMark Johnston.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 220de03c30SMark Johnston.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 230de03c30SMark Johnston.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 240de03c30SMark Johnston.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 250de03c30SMark Johnston.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 260de03c30SMark Johnston.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 270de03c30SMark Johnston.\" SUCH DAMAGE. 280de03c30SMark Johnston.\" 290de03c30SMark Johnston.Dd January 17, 2023 300de03c30SMark Johnston.Dt SMR 9 310de03c30SMark Johnston.Os 320de03c30SMark Johnston.Sh NAME 330de03c30SMark Johnston.Nm smr 340de03c30SMark Johnston.Nd safe memory reclamation for lock-free data structures 350de03c30SMark Johnston.Sh SYNOPSIS 360de03c30SMark Johnston.In sys/smr.h 370de03c30SMark Johnston.Ft smr_seq_t 380de03c30SMark Johnston.Fo smr_advance 390de03c30SMark Johnston.Fa "smr_t smr" 400de03c30SMark Johnston.Fc 410de03c30SMark Johnston.Ft smr_t 420de03c30SMark Johnston.Fo smr_create 430de03c30SMark Johnston.Fa "const char *name" 440de03c30SMark Johnston.Fc 450de03c30SMark Johnston.Ft void 460de03c30SMark Johnston.Fo smr_destroy 470de03c30SMark Johnston.Fa "smr_t smr" 480de03c30SMark Johnston.Fc 490de03c30SMark Johnston.Ft void 500de03c30SMark Johnston.Fo smr_enter 510de03c30SMark Johnston.Fa "smr_t smr" 520de03c30SMark Johnston.Fc 530de03c30SMark Johnston.Ft void 540de03c30SMark Johnston.Fo smr_exit 550de03c30SMark Johnston.Fa "smr_t smr" 560de03c30SMark Johnston.Fc 570de03c30SMark Johnston.Ft bool 580de03c30SMark Johnston.Fo smr_poll 590de03c30SMark Johnston.Fa "smr_t smr" 600de03c30SMark Johnston.Fa "smr_seq_t goal" 610de03c30SMark Johnston.Fa "bool wait" 620de03c30SMark Johnston.Fc 630de03c30SMark Johnston.Ft void 640de03c30SMark Johnston.Fo smr_synchronize 650de03c30SMark Johnston.Fa "smr_t smr" 660de03c30SMark Johnston.Fc 67*cd133525SMark Johnston.Ft void 680de03c30SMark Johnston.Fo smr_wait 690de03c30SMark Johnston.Fa "smr_t smr" 700de03c30SMark Johnston.Fa "smr_seq_t goal" 710de03c30SMark Johnston.Fc 720de03c30SMark Johnston.Sh DESCRIPTION 730de03c30SMark JohnstonSafe Memory Reclamation (SMR) is a facility which enables the implementation of 740de03c30SMark Johnstonmemory-safe lock-free data structures. 750de03c30SMark JohnstonIn typical usage, read accesses to an SMR-protected data structure, such as a 760de03c30SMark Johnstonhash table or tree, are performed in a 770de03c30SMark Johnston.Dq read section 780de03c30SMark Johnstonconsisting of code bracketed by 790de03c30SMark Johnston.Fn smr_enter 800de03c30SMark Johnstonand 810de03c30SMark Johnston.Fn smr_exit 820de03c30SMark Johnstoncalls, while mutations of the data structure are serialized by a traditional 830de03c30SMark Johnstonmutex such as 840de03c30SMark Johnston.Xr mutex 9 . 850de03c30SMark JohnstonIn contrast with reader-writer locks such as 860de03c30SMark Johnston.Xr rwlock 9 , 870de03c30SMark Johnston.Xr rmlock 9 , 880de03c30SMark Johnstonand 890de03c30SMark Johnston.Xr sx 9 , 900de03c30SMark JohnstonSMR allows readers and writers to access the data structure concurrently. 910de03c30SMark JohnstonReaders can always enter a read section immediately 920de03c30SMark Johnston.Po 930de03c30SMark Johnston.Fn smr_enter 940de03c30SMark Johnstonnever blocks 950de03c30SMark Johnston.Pc , 960de03c30SMark Johnstonso mutations do not introduce read latency. 970de03c30SMark JohnstonFurthermore, 980de03c30SMark Johnston.Fn smr_enter 990de03c30SMark Johnstonand 1000de03c30SMark Johnston.Fn smr_exit 1010de03c30SMark Johnstonoperate only on per-CPU data and thus avoid some of the performance problems 1020de03c30SMark Johnstoninherent in the implementation of traditional reader-writer mutexes. 1030de03c30SMark JohnstonSMR can therefore be a useful building block for data structures which are 1040de03c30SMark Johnstonaccessed frequently but are only rarely modified. 1050de03c30SMark Johnston.Pp 1060de03c30SMark JohnstonNote that any SMR-protected data structure must be implemented carefully such 1070de03c30SMark Johnstonthat operations behave correctly in the absence of mutual exclusion between 1080de03c30SMark Johnstonreaders and writers. 1090de03c30SMark JohnstonThe data structure must be designed to be lock-free; SMR merely facilitates 1100de03c30SMark Johnstonthe implementation, for example by making it safe to follow dangling pointers 1110de03c30SMark Johnstonand by helping avoid the ABA problem. 1120de03c30SMark Johnston.Pp 1130de03c30SMark JohnstonWhen shared accesses to and mutations of a data structure can proceed 1140de03c30SMark Johnstonconcurrently, writers must take care to ensure that any items removed from the 1150de03c30SMark Johnstonstructure are not freed and recycled while readers are accessing them in 1160de03c30SMark Johnstonparallel. 1170de03c30SMark JohnstonThis requirement results in a two-phase approach to the removal of items: 1180de03c30SMark Johnstonfirst, the item is unlinked such that all pointers to the item are removed from 1190de03c30SMark Johnstonthe structure, preventing any new readers from observing the item. 1200de03c30SMark JohnstonThen, the writer waits until some mechanism guarantees that no existing readers 1210de03c30SMark Johnstonare still accessing the item. 1220de03c30SMark JohnstonAt that point the memory for that item can be freed and reused safely. 1230de03c30SMark JohnstonSMR provides this mechanism: readers may access a lock-free data structure in 1240de03c30SMark Johnstonbetween calls to the 1250de03c30SMark Johnston.Fn smr_enter 1260de03c30SMark Johnstonand 1270de03c30SMark Johnston.Fn smr_exit 1280de03c30SMark Johnstonfunctions, which together create a read section, and the 1290de03c30SMark Johnston.Fn smr_advance , 1300de03c30SMark Johnston.Fn smr_poll , 1310de03c30SMark Johnston.Fn smr_wait , 1320de03c30SMark Johnstonand 1330de03c30SMark Johnston.Fn smr_synchronize 1340de03c30SMark Johnstonfunctions can be used to wait for threads in read sections to finish. 1350de03c30SMark JohnstonAll of these functions operate on a 1360de03c30SMark Johnston.Ft smr_t 1370de03c30SMark Johnstonstate block which holds both per-CPU and global state. 1380de03c30SMark JohnstonReaders load global state and modify per-CPU state, while writers must scan all 1390de03c30SMark Johnstonper-CPU states to detect active readers. 1400de03c30SMark JohnstonSMR is designed to amortize this cost by batching to give acceptable 1410de03c30SMark Johnstonperformance in write-heavy workloads. 1420de03c30SMark Johnston.Ss Readers 1430de03c30SMark JohnstonThreads enter a read section by calling 1440de03c30SMark Johnston.Fn smr_enter . 1450de03c30SMark JohnstonRead sections should be short, and many operations are not permitted while in 1460de03c30SMark Johnstona read section. 1470de03c30SMark JohnstonSpecifically, context switching is disabled, and thus readers may not acquire 1480de03c30SMark Johnstonblocking mutexes such as 1490de03c30SMark Johnston.Xr mutex 9 . 1500de03c30SMark JohnstonAnother consequence of this is that the thread is pinned to the current CPU for 1510de03c30SMark Johnstonthe duration of the read section. 1520de03c30SMark JohnstonFurthermore, read sections may not be nested: it is incorrect to call 1530de03c30SMark Johnston.Fn smr_enter 1540de03c30SMark Johnstonwith a given 1550de03c30SMark Johnston.Ft smr_t 1560de03c30SMark Johnstonstate block when already in a read section for that state block. 1570de03c30SMark Johnston.Ss UMA Integration 1580de03c30SMark JohnstonTo simplify the integration of SMR into consumers, the 1590de03c30SMark Johnston.Xr uma 9 1600de03c30SMark Johnstonkernel memory allocator provides some SMR-specified facilities. 1610de03c30SMark JohnstonThis eliminates a good deal of complexity from the implementation of consumers 1620de03c30SMark Johnstonand automatically batches write operations. 1630de03c30SMark Johnston.Pp 1640de03c30SMark JohnstonIn typical usage, a UMA zone (created with the 1650de03c30SMark Johnston.Dv UMA_ZONE_SMR 1660de03c30SMark Johnstonflag or initialized with the 1670de03c30SMark Johnston.Fn uma_zone_set_smr 1680de03c30SMark Johnstonfunction) is coupled with a 1690de03c30SMark Johnston.Ft smr_t 1700de03c30SMark Johnstonstate block. 1710de03c30SMark JohnstonTo insert an item into an SMR-protected data structure, memory is allocated 1720de03c30SMark Johnstonfrom the zone using the 1730de03c30SMark Johnston.Fn uma_zalloc_smr 1740de03c30SMark Johnstonfunction. 1750de03c30SMark JohnstonInsertions and removals are serialized using traditional mutual exclusion 1760de03c30SMark Johnstonand items are freed using the 1770de03c30SMark Johnston.Fn uma_zfree_smr 1780de03c30SMark Johnstonfunction. 1790de03c30SMark JohnstonRead-only lookup operations are performed in SMR read sections. 1800de03c30SMark Johnston.Fn uma_zfree_smr 1810de03c30SMark Johnstonwaits for all active readers which may be accessing the freed item to finish 1820de03c30SMark Johnstontheir read sections before recycling that item's memory. 1830de03c30SMark Johnston.Pp 1840de03c30SMark JohnstonIf the zone has an associated per-item destructor, it will be invoked at some 1850de03c30SMark Johnstonpoint when no readers can be accessing a given item. 1860de03c30SMark JohnstonThe destructor can be used to release additional resources associated with the 1870de03c30SMark Johnstonitem. 1880de03c30SMark JohnstonNote however that there is no guarantee that the destructor will be invoked in 1890de03c30SMark Johnstona bounded time period. 1900de03c30SMark Johnston.Ss Writers 1910de03c30SMark JohnstonConsumers are expected to use SMR in conjunction with UMA and thus need only 1920de03c30SMark Johnstonmake use of the 1930de03c30SMark Johnston.Fn smr_enter 1940de03c30SMark Johnstonand 1950de03c30SMark Johnston.Fn smr_exit 1960de03c30SMark Johnstonfunctions, and the SMR helper macros defined in 1970de03c30SMark Johnston.Pa sys/smr_types.h . 1980de03c30SMark JohnstonHowever, an introduction to the write-side interface of SMR can be useful. 1990de03c30SMark Johnston.Pp 2000de03c30SMark JohnstonInternally, SMR maintains a global 2010de03c30SMark Johnston.Ql write sequence 2020de03c30SMark Johnstonnumber. 2030de03c30SMark JohnstonWhen entering a read section, 2040de03c30SMark Johnston.Fn smr_enter 2050de03c30SMark Johnstonloads a copy of the write sequence and stores it in per-CPU memory, hence 2060de03c30SMark Johnston.Ql observing 2070de03c30SMark Johnstonthat value. 2080de03c30SMark JohnstonTo exit a read section, this per-CPU memory is overwritten with an invalid 2090de03c30SMark Johnstonvalue, making the CPU inactive. 2100de03c30SMark JohnstonWriters perform two operations: advancing the write sequence number, and 2110de03c30SMark Johnstonpolling all CPUs to see whether active readers have observed a given sequence 2120de03c30SMark Johnstonnumber. 2130de03c30SMark JohnstonThese operations are performed by 2140de03c30SMark Johnston.Fn smr_advance 2150de03c30SMark Johnstonand 2160de03c30SMark Johnston.Fn smr_poll , 2170de03c30SMark Johnstonrespectively, which do not require serialization between multiple writers. 2180de03c30SMark Johnston.Pp 2190de03c30SMark JohnstonAfter a writer unlinks an item from a data structure, it increments the write 2200de03c30SMark Johnstonsequence number and tags the item with the new value returned by 2210de03c30SMark Johnston.Fn smr_advance . 2220de03c30SMark JohnstonOnce all CPUs have observed the new value, the writer can use 2230de03c30SMark Johnston.Fn smr_poll 2240de03c30SMark Johnstonto deduce that no active readers have access to the unlinked item, and thus the 2250de03c30SMark Johnstonitem is safe to recycle. 2260de03c30SMark JohnstonBecause this pair of operations is relatively expensive, it is generally a good 2270de03c30SMark Johnstonidea to amortize this cost by accumulating a collection of multiple unlinked 2280de03c30SMark Johnstonitems and tagging the entire batch with a target write sequence number. 2290de03c30SMark Johnston.Pp 2300de03c30SMark Johnston.Fn smr_poll 2310de03c30SMark Johnstonis a non-blocking operation and returns true only if all active readers are 2320de03c30SMark Johnstonguaranteed to have observed the target sequence number value. 2330de03c30SMark Johnston.Fn smr_wait 2340de03c30SMark Johnstonis a variant of 2350de03c30SMark Johnston.Fn smr_poll 2360de03c30SMark Johnstonwhich waits until all CPUs have observed the target sequence number value. 2370de03c30SMark Johnston.Fn smr_synchronize 2380de03c30SMark Johnstoncombines 2390de03c30SMark Johnston.Fn smr_advance 2400de03c30SMark Johnstonwith 2410de03c30SMark Johnston.Fn smr_wait 2420de03c30SMark Johnstonto wait for all CPUs to observe a new write sequence number. 2430de03c30SMark JohnstonThis is an expensive operation and should only be used if polling cannot be 2440de03c30SMark Johnstondeferred in some way. 2450de03c30SMark Johnston.Ss Memory Ordering 2460de03c30SMark JohnstonThe 2470de03c30SMark Johnston.Fn smr_enter 2480de03c30SMark Johnstonfunction has acquire semantics, and the 2490de03c30SMark Johnston.Fn smr_exit 2500de03c30SMark Johnstonfunction has release semantics. 2510de03c30SMark JohnstonThe 2520de03c30SMark Johnston.Fn smr_advance , 2530de03c30SMark Johnston.Fn smr_poll , 2540de03c30SMark Johnston.Fn smr_wait , 2550de03c30SMark Johnstonand 2560de03c30SMark Johnston.Fn smr_synchronize 2570de03c30SMark Johnstonfunctions should not be assumed to have any guarantees with respect to memory 2580de03c30SMark Johnstonordering. 2590de03c30SMark JohnstonIn practice, some of these functions have stronger ordering semantics than 2600de03c30SMark Johnstonis stated here, but this is specific to the implementation and should not be 2610de03c30SMark Johnstonrelied upon. 2620de03c30SMark JohnstonSee 2630de03c30SMark Johnston.Xr atomic 9 2640de03c30SMark Johnstonfor more details. 2650de03c30SMark Johnston.Sh NOTES 2660de03c30SMark JohnstonOutside of 2670de03c30SMark Johnston.Fx 2680de03c30SMark Johnstonthe acronym SMR typically refers to a family of algorithms which enable 2690de03c30SMark Johnstonmemory-safe read-only access to a data structure concurrent with modifications 2700de03c30SMark Johnstonto that data structure. 2710de03c30SMark JohnstonHere, SMR refers to a particular algorithm belonging to this family, as well as 2720de03c30SMark Johnstonits implementation in 2730de03c30SMark Johnston.Fx . 2740de03c30SMark JohnstonSee 2750de03c30SMark Johnston.Pa sys/sys/smr.h 2760de03c30SMark Johnstonand 2770de03c30SMark Johnston.Pa sys/kern/subr_smr.c 2780de03c30SMark Johnstonin the 2790de03c30SMark Johnston.Fx 2800de03c30SMark Johnstonsource tree for further details on the algorithm and the context. 2810de03c30SMark Johnston.Pp 2820de03c30SMark JohnstonThe acronym SMR is also used to mean "shingled magnetic recording", a 2830de03c30SMark Johnstontechnology used to store data on hard disk drives which requires operating 2840de03c30SMark Johnstonsystem support. 2850de03c30SMark JohnstonThese two uses of the acronym are unrelated. 2860de03c30SMark Johnston.Sh SEE ALSO 2870de03c30SMark Johnston.Xr atomic 9 , 2880de03c30SMark Johnston.Xr locking 9 , 2890de03c30SMark Johnston.Xr uma 9 2900de03c30SMark Johnston.Sh AUTHORS 2910de03c30SMark JohnstonThe SMR algorithm and its implementation were provided by 2920de03c30SMark Johnston.An Jeff Roberson Aq Mt jeff@FreeBSD.org . 2930de03c30SMark JohnstonThis manual page was written by 2940de03c30SMark Johnston.An Mark Johnston Aq Mt markj@FreeBSD.org . 295