1*0de03c30SMark Johnston.\" SPDX-License-Identifier: BSD-2-Clause 2*0de03c30SMark Johnston.\" 3*0de03c30SMark Johnston.\" Copyright (c) 2023 The FreeBSD Foundation 4*0de03c30SMark Johnston.\" 5*0de03c30SMark Johnston.\" This documentation was written by Mark Johnston <markj@FreeBSD.org> 6*0de03c30SMark Johnston.\" under sponsorship from the FreeBSD Foundation. 7*0de03c30SMark Johnston.\" 8*0de03c30SMark Johnston.\" Redistribution and use in source and binary forms, with or without 9*0de03c30SMark Johnston.\" modification, are permitted provided that the following conditions 10*0de03c30SMark Johnston.\" are met: 11*0de03c30SMark Johnston.\" 1. Redistributions of source code must retain the above copyright 12*0de03c30SMark Johnston.\" notice, this list of conditions and the following disclaimer. 13*0de03c30SMark Johnston.\" 2. Redistributions in binary form must reproduce the above copyright 14*0de03c30SMark Johnston.\" notice, this list of conditions and the following disclaimer in the 15*0de03c30SMark Johnston.\" documentation and/or other materials provided with the distribution. 16*0de03c30SMark Johnston.\" 17*0de03c30SMark Johnston.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND 18*0de03c30SMark Johnston.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 19*0de03c30SMark Johnston.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 20*0de03c30SMark Johnston.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE 21*0de03c30SMark Johnston.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 22*0de03c30SMark Johnston.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 23*0de03c30SMark Johnston.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 24*0de03c30SMark Johnston.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 25*0de03c30SMark Johnston.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 26*0de03c30SMark Johnston.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 27*0de03c30SMark Johnston.\" SUCH DAMAGE. 28*0de03c30SMark Johnston.\" 29*0de03c30SMark Johnston.Dd January 17, 2023 30*0de03c30SMark Johnston.Dt SMR 9 31*0de03c30SMark Johnston.Os 32*0de03c30SMark Johnston.Sh NAME 33*0de03c30SMark Johnston.Nm smr 34*0de03c30SMark Johnston.Nd safe memory reclamation for lock-free data structures 35*0de03c30SMark Johnston.Sh SYNOPSIS 36*0de03c30SMark Johnston.In sys/smr.h 37*0de03c30SMark Johnston.Ft smr_seq_t 38*0de03c30SMark Johnston.Fo smr_advance 39*0de03c30SMark Johnston.Fa "smr_t smr" 40*0de03c30SMark Johnston.Fc 41*0de03c30SMark Johnston.Ft smr_t 42*0de03c30SMark Johnston.Fo smr_create 43*0de03c30SMark Johnston.Fa "const char *name" 44*0de03c30SMark Johnston.Fc 45*0de03c30SMark Johnston.Ft void 46*0de03c30SMark Johnston.Fo smr_destroy 47*0de03c30SMark Johnston.Fa "smr_t smr" 48*0de03c30SMark Johnston.Fc 49*0de03c30SMark Johnston.Ft void 50*0de03c30SMark Johnston.Fo smr_enter 51*0de03c30SMark Johnston.Fa "smr_t smr" 52*0de03c30SMark Johnston.Fc 53*0de03c30SMark Johnston.Ft void 54*0de03c30SMark Johnston.Fo smr_exit 55*0de03c30SMark Johnston.Fa "smr_t smr" 56*0de03c30SMark Johnston.Fc 57*0de03c30SMark Johnston.Ft bool 58*0de03c30SMark Johnston.Fo smr_poll 59*0de03c30SMark Johnston.Fa "smr_t smr" 60*0de03c30SMark Johnston.Fa "smr_seq_t goal" 61*0de03c30SMark Johnston.Fa "bool wait" 62*0de03c30SMark Johnston.Fc 63*0de03c30SMark Johnston.Ft void 64*0de03c30SMark Johnston.Fo smr_synchronize 65*0de03c30SMark Johnston.Fa "smr_t smr" 66*0de03c30SMark Johnston.Fc 67*0de03c30SMark Johnston.Ft bool 68*0de03c30SMark Johnston.Fo smr_wait 69*0de03c30SMark Johnston.Fa "smr_t smr" 70*0de03c30SMark Johnston.Fa "smr_seq_t goal" 71*0de03c30SMark Johnston.Fc 72*0de03c30SMark Johnston.Sh DESCRIPTION 73*0de03c30SMark JohnstonSafe Memory Reclamation (SMR) is a facility which enables the implementation of 74*0de03c30SMark Johnstonmemory-safe lock-free data structures. 75*0de03c30SMark JohnstonIn typical usage, read accesses to an SMR-protected data structure, such as a 76*0de03c30SMark Johnstonhash table or tree, are performed in a 77*0de03c30SMark Johnston.Dq read section 78*0de03c30SMark Johnstonconsisting of code bracketed by 79*0de03c30SMark Johnston.Fn smr_enter 80*0de03c30SMark Johnstonand 81*0de03c30SMark Johnston.Fn smr_exit 82*0de03c30SMark Johnstoncalls, while mutations of the data structure are serialized by a traditional 83*0de03c30SMark Johnstonmutex such as 84*0de03c30SMark Johnston.Xr mutex 9 . 85*0de03c30SMark JohnstonIn contrast with reader-writer locks such as 86*0de03c30SMark Johnston.Xr rwlock 9 , 87*0de03c30SMark Johnston.Xr rmlock 9 , 88*0de03c30SMark Johnstonand 89*0de03c30SMark Johnston.Xr sx 9 , 90*0de03c30SMark JohnstonSMR allows readers and writers to access the data structure concurrently. 91*0de03c30SMark JohnstonReaders can always enter a read section immediately 92*0de03c30SMark Johnston.Po 93*0de03c30SMark Johnston.Fn smr_enter 94*0de03c30SMark Johnstonnever blocks 95*0de03c30SMark Johnston.Pc , 96*0de03c30SMark Johnstonso mutations do not introduce read latency. 97*0de03c30SMark JohnstonFurthermore, 98*0de03c30SMark Johnston.Fn smr_enter 99*0de03c30SMark Johnstonand 100*0de03c30SMark Johnston.Fn smr_exit 101*0de03c30SMark Johnstonoperate only on per-CPU data and thus avoid some of the performance problems 102*0de03c30SMark Johnstoninherent in the implementation of traditional reader-writer mutexes. 103*0de03c30SMark JohnstonSMR can therefore be a useful building block for data structures which are 104*0de03c30SMark Johnstonaccessed frequently but are only rarely modified. 105*0de03c30SMark Johnston.Pp 106*0de03c30SMark JohnstonNote that any SMR-protected data structure must be implemented carefully such 107*0de03c30SMark Johnstonthat operations behave correctly in the absence of mutual exclusion between 108*0de03c30SMark Johnstonreaders and writers. 109*0de03c30SMark JohnstonThe data structure must be designed to be lock-free; SMR merely facilitates 110*0de03c30SMark Johnstonthe implementation, for example by making it safe to follow dangling pointers 111*0de03c30SMark Johnstonand by helping avoid the ABA problem. 112*0de03c30SMark Johnston.Pp 113*0de03c30SMark JohnstonWhen shared accesses to and mutations of a data structure can proceed 114*0de03c30SMark Johnstonconcurrently, writers must take care to ensure that any items removed from the 115*0de03c30SMark Johnstonstructure are not freed and recycled while readers are accessing them in 116*0de03c30SMark Johnstonparallel. 117*0de03c30SMark JohnstonThis requirement results in a two-phase approach to the removal of items: 118*0de03c30SMark Johnstonfirst, the item is unlinked such that all pointers to the item are removed from 119*0de03c30SMark Johnstonthe structure, preventing any new readers from observing the item. 120*0de03c30SMark JohnstonThen, the writer waits until some mechanism guarantees that no existing readers 121*0de03c30SMark Johnstonare still accessing the item. 122*0de03c30SMark JohnstonAt that point the memory for that item can be freed and reused safely. 123*0de03c30SMark JohnstonSMR provides this mechanism: readers may access a lock-free data structure in 124*0de03c30SMark Johnstonbetween calls to the 125*0de03c30SMark Johnston.Fn smr_enter 126*0de03c30SMark Johnstonand 127*0de03c30SMark Johnston.Fn smr_exit 128*0de03c30SMark Johnstonfunctions, which together create a read section, and the 129*0de03c30SMark Johnston.Fn smr_advance , 130*0de03c30SMark Johnston.Fn smr_poll , 131*0de03c30SMark Johnston.Fn smr_wait , 132*0de03c30SMark Johnstonand 133*0de03c30SMark Johnston.Fn smr_synchronize 134*0de03c30SMark Johnstonfunctions can be used to wait for threads in read sections to finish. 135*0de03c30SMark JohnstonAll of these functions operate on a 136*0de03c30SMark Johnston.Ft smr_t 137*0de03c30SMark Johnstonstate block which holds both per-CPU and global state. 138*0de03c30SMark JohnstonReaders load global state and modify per-CPU state, while writers must scan all 139*0de03c30SMark Johnstonper-CPU states to detect active readers. 140*0de03c30SMark JohnstonSMR is designed to amortize this cost by batching to give acceptable 141*0de03c30SMark Johnstonperformance in write-heavy workloads. 142*0de03c30SMark Johnston.Ss Readers 143*0de03c30SMark JohnstonThreads enter a read section by calling 144*0de03c30SMark Johnston.Fn smr_enter . 145*0de03c30SMark JohnstonRead sections should be short, and many operations are not permitted while in 146*0de03c30SMark Johnstona read section. 147*0de03c30SMark JohnstonSpecifically, context switching is disabled, and thus readers may not acquire 148*0de03c30SMark Johnstonblocking mutexes such as 149*0de03c30SMark Johnston.Xr mutex 9 . 150*0de03c30SMark JohnstonAnother consequence of this is that the thread is pinned to the current CPU for 151*0de03c30SMark Johnstonthe duration of the read section. 152*0de03c30SMark JohnstonFurthermore, read sections may not be nested: it is incorrect to call 153*0de03c30SMark Johnston.Fn smr_enter 154*0de03c30SMark Johnstonwith a given 155*0de03c30SMark Johnston.Ft smr_t 156*0de03c30SMark Johnstonstate block when already in a read section for that state block. 157*0de03c30SMark Johnston.Ss UMA Integration 158*0de03c30SMark JohnstonTo simplify the integration of SMR into consumers, the 159*0de03c30SMark Johnston.Xr uma 9 160*0de03c30SMark Johnstonkernel memory allocator provides some SMR-specified facilities. 161*0de03c30SMark JohnstonThis eliminates a good deal of complexity from the implementation of consumers 162*0de03c30SMark Johnstonand automatically batches write operations. 163*0de03c30SMark Johnston.Pp 164*0de03c30SMark JohnstonIn typical usage, a UMA zone (created with the 165*0de03c30SMark Johnston.Dv UMA_ZONE_SMR 166*0de03c30SMark Johnstonflag or initialized with the 167*0de03c30SMark Johnston.Fn uma_zone_set_smr 168*0de03c30SMark Johnstonfunction) is coupled with a 169*0de03c30SMark Johnston.Ft smr_t 170*0de03c30SMark Johnstonstate block. 171*0de03c30SMark JohnstonTo insert an item into an SMR-protected data structure, memory is allocated 172*0de03c30SMark Johnstonfrom the zone using the 173*0de03c30SMark Johnston.Fn uma_zalloc_smr 174*0de03c30SMark Johnstonfunction. 175*0de03c30SMark JohnstonInsertions and removals are serialized using traditional mutual exclusion 176*0de03c30SMark Johnstonand items are freed using the 177*0de03c30SMark Johnston.Fn uma_zfree_smr 178*0de03c30SMark Johnstonfunction. 179*0de03c30SMark JohnstonRead-only lookup operations are performed in SMR read sections. 180*0de03c30SMark Johnston.Fn uma_zfree_smr 181*0de03c30SMark Johnstonwaits for all active readers which may be accessing the freed item to finish 182*0de03c30SMark Johnstontheir read sections before recycling that item's memory. 183*0de03c30SMark Johnston.Pp 184*0de03c30SMark JohnstonIf the zone has an associated per-item destructor, it will be invoked at some 185*0de03c30SMark Johnstonpoint when no readers can be accessing a given item. 186*0de03c30SMark JohnstonThe destructor can be used to release additional resources associated with the 187*0de03c30SMark Johnstonitem. 188*0de03c30SMark JohnstonNote however that there is no guarantee that the destructor will be invoked in 189*0de03c30SMark Johnstona bounded time period. 190*0de03c30SMark Johnston.Ss Writers 191*0de03c30SMark JohnstonConsumers are expected to use SMR in conjunction with UMA and thus need only 192*0de03c30SMark Johnstonmake use of the 193*0de03c30SMark Johnston.Fn smr_enter 194*0de03c30SMark Johnstonand 195*0de03c30SMark Johnston.Fn smr_exit 196*0de03c30SMark Johnstonfunctions, and the SMR helper macros defined in 197*0de03c30SMark Johnston.Pa sys/smr_types.h . 198*0de03c30SMark JohnstonHowever, an introduction to the write-side interface of SMR can be useful. 199*0de03c30SMark Johnston.Pp 200*0de03c30SMark JohnstonInternally, SMR maintains a global 201*0de03c30SMark Johnston.Ql write sequence 202*0de03c30SMark Johnstonnumber. 203*0de03c30SMark JohnstonWhen entering a read section, 204*0de03c30SMark Johnston.Fn smr_enter 205*0de03c30SMark Johnstonloads a copy of the write sequence and stores it in per-CPU memory, hence 206*0de03c30SMark Johnston.Ql observing 207*0de03c30SMark Johnstonthat value. 208*0de03c30SMark JohnstonTo exit a read section, this per-CPU memory is overwritten with an invalid 209*0de03c30SMark Johnstonvalue, making the CPU inactive. 210*0de03c30SMark JohnstonWriters perform two operations: advancing the write sequence number, and 211*0de03c30SMark Johnstonpolling all CPUs to see whether active readers have observed a given sequence 212*0de03c30SMark Johnstonnumber. 213*0de03c30SMark JohnstonThese operations are performed by 214*0de03c30SMark Johnston.Fn smr_advance 215*0de03c30SMark Johnstonand 216*0de03c30SMark Johnston.Fn smr_poll , 217*0de03c30SMark Johnstonrespectively, which do not require serialization between multiple writers. 218*0de03c30SMark Johnston.Pp 219*0de03c30SMark JohnstonAfter a writer unlinks an item from a data structure, it increments the write 220*0de03c30SMark Johnstonsequence number and tags the item with the new value returned by 221*0de03c30SMark Johnston.Fn smr_advance . 222*0de03c30SMark JohnstonOnce all CPUs have observed the new value, the writer can use 223*0de03c30SMark Johnston.Fn smr_poll 224*0de03c30SMark Johnstonto deduce that no active readers have access to the unlinked item, and thus the 225*0de03c30SMark Johnstonitem is safe to recycle. 226*0de03c30SMark JohnstonBecause this pair of operations is relatively expensive, it is generally a good 227*0de03c30SMark Johnstonidea to amortize this cost by accumulating a collection of multiple unlinked 228*0de03c30SMark Johnstonitems and tagging the entire batch with a target write sequence number. 229*0de03c30SMark Johnston.Pp 230*0de03c30SMark Johnston.Fn smr_poll 231*0de03c30SMark Johnstonis a non-blocking operation and returns true only if all active readers are 232*0de03c30SMark Johnstonguaranteed to have observed the target sequence number value. 233*0de03c30SMark Johnston.Fn smr_wait 234*0de03c30SMark Johnstonis a variant of 235*0de03c30SMark Johnston.Fn smr_poll 236*0de03c30SMark Johnstonwhich waits until all CPUs have observed the target sequence number value. 237*0de03c30SMark Johnston.Fn smr_synchronize 238*0de03c30SMark Johnstoncombines 239*0de03c30SMark Johnston.Fn smr_advance 240*0de03c30SMark Johnstonwith 241*0de03c30SMark Johnston.Fn smr_wait 242*0de03c30SMark Johnstonto wait for all CPUs to observe a new write sequence number. 243*0de03c30SMark JohnstonThis is an expensive operation and should only be used if polling cannot be 244*0de03c30SMark Johnstondeferred in some way. 245*0de03c30SMark Johnston.Ss Memory Ordering 246*0de03c30SMark JohnstonThe 247*0de03c30SMark Johnston.Fn smr_enter 248*0de03c30SMark Johnstonfunction has acquire semantics, and the 249*0de03c30SMark Johnston.Fn smr_exit 250*0de03c30SMark Johnstonfunction has release semantics. 251*0de03c30SMark JohnstonThe 252*0de03c30SMark Johnston.Fn smr_advance , 253*0de03c30SMark Johnston.Fn smr_poll , 254*0de03c30SMark Johnston.Fn smr_wait , 255*0de03c30SMark Johnstonand 256*0de03c30SMark Johnston.Fn smr_synchronize 257*0de03c30SMark Johnstonfunctions should not be assumed to have any guarantees with respect to memory 258*0de03c30SMark Johnstonordering. 259*0de03c30SMark JohnstonIn practice, some of these functions have stronger ordering semantics than 260*0de03c30SMark Johnstonis stated here, but this is specific to the implementation and should not be 261*0de03c30SMark Johnstonrelied upon. 262*0de03c30SMark JohnstonSee 263*0de03c30SMark Johnston.Xr atomic 9 264*0de03c30SMark Johnstonfor more details. 265*0de03c30SMark Johnston.Sh NOTES 266*0de03c30SMark JohnstonOutside of 267*0de03c30SMark Johnston.Fx 268*0de03c30SMark Johnstonthe acronym SMR typically refers to a family of algorithms which enable 269*0de03c30SMark Johnstonmemory-safe read-only access to a data structure concurrent with modifications 270*0de03c30SMark Johnstonto that data structure. 271*0de03c30SMark JohnstonHere, SMR refers to a particular algorithm belonging to this family, as well as 272*0de03c30SMark Johnstonits implementation in 273*0de03c30SMark Johnston.Fx . 274*0de03c30SMark JohnstonSee 275*0de03c30SMark Johnston.Pa sys/sys/smr.h 276*0de03c30SMark Johnstonand 277*0de03c30SMark Johnston.Pa sys/kern/subr_smr.c 278*0de03c30SMark Johnstonin the 279*0de03c30SMark Johnston.Fx 280*0de03c30SMark Johnstonsource tree for further details on the algorithm and the context. 281*0de03c30SMark Johnston.Pp 282*0de03c30SMark JohnstonThe acronym SMR is also used to mean "shingled magnetic recording", a 283*0de03c30SMark Johnstontechnology used to store data on hard disk drives which requires operating 284*0de03c30SMark Johnstonsystem support. 285*0de03c30SMark JohnstonThese two uses of the acronym are unrelated. 286*0de03c30SMark Johnston.Sh SEE ALSO 287*0de03c30SMark Johnston.Xr atomic 9 , 288*0de03c30SMark Johnston.Xr locking 9 , 289*0de03c30SMark Johnston.Xr uma 9 290*0de03c30SMark Johnston.Sh AUTHORS 291*0de03c30SMark JohnstonThe SMR algorithm and its implementation were provided by 292*0de03c30SMark Johnston.An Jeff Roberson Aq Mt jeff@FreeBSD.org . 293*0de03c30SMark JohnstonThis manual page was written by 294*0de03c30SMark Johnston.An Mark Johnston Aq Mt markj@FreeBSD.org . 295