On atomic types (atomic_t, atomic64_t and atomic_long_t).

The atomic type provides an interface to the architecture's means of atomic
RMW operations between CPUs (atomic operations on MMIO are not supported and
can lead to fatal traps on some platforms).

API
---

The 'full' API consists of (atomic64_ and atomic_long_ prefixes omitted for
brevity):

Non-RMW ops:

  atomic_read(), atomic_set()
  atomic_read_acquire(), atomic_set_release()


RMW atomic operations:

Arithmetic:

  atomic_{add,sub,inc,dec}()
  atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}()
  atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}()


Bitwise:

  atomic_{and,or,xor,andnot}()
  atomic_fetch_{and,or,xor,andnot}{,_relaxed,_acquire,_release}()


Swap:

  atomic_xchg{,_relaxed,_acquire,_release}()
  atomic_cmpxchg{,_relaxed,_acquire,_release}()
  atomic_try_cmpxchg{,_relaxed,_acquire,_release}()


Reference count (but please see refcount_t):

  atomic_add_unless(), atomic_inc_not_zero()
  atomic_sub_and_test(), atomic_dec_and_test()


Misc:

  atomic_inc_and_test(), atomic_add_negative()
  atomic_dec_unless_positive(), atomic_inc_unless_negative()


Barriers:

  smp_mb__{before,after}_atomic()


TYPES (signed vs unsigned)
-----

While atomic_t, atomic_long_t and atomic64_t use int, long and s64
respectively (for hysterical raisins), the kernel uses -fno-strict-overflow
(which implies -fwrapv) and defines signed overflow to behave like
2s-complement.

Therefore, an explicitly unsigned variant of the atomic ops is strictly
unnecessary and we can simply cast; there is no UB.
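
For instance, a caller that wants unsigned semantics can simply cast at the
boundary. A minimal sketch, with hypothetical helper names (these are not
kernel API):

  static inline unsigned int counter_read(atomic_t *v)
  {
          return (unsigned int)atomic_read(v);
  }

  static inline void counter_add(unsigned int n, atomic_t *v)
  {
          atomic_add((int)n, v);  /* wraps like 2s-complement; no UB */
  }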

There was a bug in UBSAN prior to GCC-8 that would generate UB warnings for
signed types.

With this we also conform to the C/C++ _Atomic behaviour and things like
P1236R1.


SEMANTICS
---------

Non-RMW ops:

The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
smp_store_release() respectively. Therefore, if you find yourself only using
the Non-RMW operations of atomic_t, you do not in fact need atomic_t at all
and are doing it wrong.

A note for the implementation of atomic_set() is that it must not break the
atomicity of the RMW ops. That is:

  C Atomic-RMW-ops-are-atomic-WRT-atomic_set

  {
    atomic_t v = ATOMIC_INIT(1);
  }

  P0(atomic_t *v)
  {
    (void)atomic_add_unless(v, 1, 0);
  }

  P1(atomic_t *v)
  {
    atomic_set(v, 0);
  }

  exists
  (v=2)

In this case we would expect the atomic_set() from CPU1 to either happen
before the atomic_add_unless(), in which case that latter one would no-op, or
_after_ in which case we'd overwrite its result. In no case is "2" a valid
outcome.

This is typically true on 'normal' platforms, where a regular competing STORE
will invalidate a LL/SC or fail a CMPXCHG.

The obvious case where this is not so is when we need to implement atomic ops
with a lock:

  CPU0                                          CPU1

  atomic_add_unless(v, 1, 0);
    lock();
    ret = READ_ONCE(v->counter); // == 1
                                                atomic_set(v, 0);
    if (ret != u)                                 WRITE_ONCE(v->counter, 0);
      WRITE_ONCE(v->counter, ret + 1);
    unlock();

the typical solution is to then implement atomic_set() with atomic_xchg().
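
A minimal sketch of that solution, assuming a lock-based architecture where
atomic_xchg() takes the same internal lock as the other RMW ops:

  /* Hypothetical lock-based fallback: route atomic_set() through the
   * (locked) xchg so it serializes against the other RMW ops. */
  static inline void atomic_set(atomic_t *v, int i)
  {
          (void)atomic_xchg(v, i);
  }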

RMW ops:

These come in various forms:

 - plain operations without return value: atomic_{}()

 - operations which return the modified value: atomic_{}_return()

   these are limited to the arithmetic operations because those are
   reversible. Bitops are irreversible and therefore the modified value
   is of dubious utility.

 - operations which return the original value: atomic_fetch_{}()

 - swap operations: xchg(), cmpxchg() and try_cmpxchg()

 - misc; special purpose operations that are commonly used and would,
   given the interface, normally be implemented using (try_)cmpxchg
   loops, but are time critical and can, typically on LL/SC
   architectures, be implemented more efficiently.

All these operations are SMP atomic; that is, the operations (for a single
atomic variable) can be fully ordered and no intermediate state is lost or
visible.


ORDERING (go read memory-barriers.txt first)
--------

The rule of thumb:

 - non-RMW operations are unordered;

 - RMW operations that have no return value are unordered;

 - RMW operations that have a return value are fully ordered;

 - RMW operations that are conditional are unordered on FAILURE,
   otherwise the above rules apply.

Except of course when an operation has an explicit ordering like:

 {}_relaxed: unordered
 {}_acquire: the R of the RMW (or atomic_read) is an ACQUIRE
 {}_release: the W of the RMW (or atomic_set) is a RELEASE

'Unordered' here means unordered against other memory locations. Address
dependencies are not defeated.

Fully ordered primitives are ordered against everything prior and everything
subsequent. Therefore a fully ordered primitive is like having an smp_mb()
before and an smp_mb() after the primitive.


The barriers:

  smp_mb__{before,after}_atomic()

only apply to the RMW atomic ops and can be used to augment/upgrade the
ordering inherent to the op. These barriers act almost like a full smp_mb():
smp_mb__before_atomic() orders all earlier accesses against the RMW op
itself and all accesses following it, and smp_mb__after_atomic() orders all
later accesses against the RMW op and all accesses preceding it. However,
accesses between the smp_mb__{before,after}_atomic() and the RMW op are not
ordered, so it is advisable to place the barrier right next to the RMW atomic
op whenever possible.
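
For example, a release-like reference drop with the barrier placed directly
adjacent to the RMW op (obj and its fields are purely illustrative):

  obj->dead = 1;
  smp_mb__before_atomic();      /* orders the store to obj->dead ... */
  atomic_dec(&obj->ref_count);  /* ... against the RMW and all later accesses */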

These helper barriers exist because architectures have varying implicit
ordering on their SMP atomic primitives. For example, our TSO architectures
provide fully ordered atomics and these barriers are no-ops.

NOTE: when the atomic RMW ops are fully ordered, they should also imply a
compiler barrier.

Thus:

  atomic_fetch_add();

is equivalent to:

  smp_mb__before_atomic();
  atomic_fetch_add_relaxed();
  smp_mb__after_atomic();

However the atomic_fetch_add() might be implemented more efficiently.

Further, while something like:

  smp_mb__before_atomic();
  atomic_dec(&X);

is a 'typical' RELEASE pattern, the barrier is strictly stronger than
a RELEASE because it orders preceding instructions against both the read
and write parts of the atomic_dec(), and against all following instructions
as well. Similarly, something like:

  atomic_inc(&X);
  smp_mb__after_atomic();

is an ACQUIRE pattern (though very much not typical), but again the barrier is
strictly stronger than ACQUIRE. As illustrated:

  C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire

  {
  }

  P0(int *x, atomic_t *y)
  {
    r0 = READ_ONCE(*x);
    smp_rmb();
    r1 = atomic_read(y);
  }

  P1(int *x, atomic_t *y)
  {
    atomic_inc(y);
    smp_mb__after_atomic();
    WRITE_ONCE(*x, 1);
  }

  exists
  (0:r0=1 /\ 0:r1=0)

This should not happen; but a hypothetical atomic_inc_acquire() --
(void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
because it would not order the W part of the RMW against the following
WRITE_ONCE. Thus:

  P0                    P1

                        t = LL.acq *y (0)
                        t++;
                        *x = 1;
  r0 = *x (1)
  RMB
  r1 = *y (0)
                        SC *y, t;

is allowed.
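
To tie the explicit-ordering variants together, a minimal publish/consume
sketch (the functions and variables are made up for illustration; only
atomic_set_release() and atomic_read_acquire() are real API):

  int payload;
  atomic_t published = ATOMIC_INIT(0);

  void produce(void)
  {
          WRITE_ONCE(payload, 42);
          atomic_set_release(&published, 1);      /* the W is a RELEASE */
  }

  int consume(void)
  {
          if (atomic_read_acquire(&published))    /* the R is an ACQUIRE */
                  return READ_ONCE(payload);      /* guaranteed to see 42 */
          return -1;
  }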


CMPXCHG vs TRY_CMPXCHG
----------------------

  int atomic_cmpxchg(atomic_t *ptr, int old, int new);
  bool atomic_try_cmpxchg(atomic_t *ptr, int *oldp, int new);

Both provide the same functionality, but try_cmpxchg() can lead to more
compact code. The functions relate like:

  bool atomic_try_cmpxchg(atomic_t *ptr, int *oldp, int new)
  {
    int ret, old = *oldp;
    ret = atomic_cmpxchg(ptr, old, new);
    if (ret != old)
      *oldp = ret;
    return ret == old;
  }

and:

  int atomic_cmpxchg(atomic_t *ptr, int old, int new)
  {
    (void)atomic_try_cmpxchg(ptr, &old, new);
    return old;
  }

Usage:

  old = atomic_read(&v);                        old = atomic_read(&v);
  for (;;) {                                    do {
    new = func(old);                              new = func(old);
    tmp = atomic_cmpxchg(&v, old, new);         } while (!atomic_try_cmpxchg(&v, &old, new));
    if (tmp == old)
      break;
    old = tmp;
  }

NB. try_cmpxchg() also generates better code on some platforms (notably x86)
where the function more closely matches the hardware instruction.
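
The compact form also suits conditional updates; a sketch of an
inc-unless-zero style operation (the helper name is hypothetical --
atomic_inc_not_zero() is the real API for this):

  static inline bool my_inc_not_zero(atomic_t *v)
  {
          int old = atomic_read(v);

          do {
                  if (!old)       /* hit the boundary; do not increment */
                          return false;
          } while (!atomic_try_cmpxchg(v, &old, old + 1));

          return true;
  }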


FORWARD PROGRESS
----------------

In general strong forward progress is expected of all unconditional atomic
operations -- those in the Arithmetic and Bitwise classes and xchg(). However
a fair amount of code also requires forward progress from the conditional
atomic operations.

Specifically, 'simple' cmpxchg() loops are expected to not starve one another
indefinitely. However, this is not evident on LL/SC architectures, because
while an LL/SC architecture 'can/should/must' provide forward progress
guarantees between competing LL/SC sections, such a guarantee does not
transfer to cmpxchg() implemented using LL/SC. Consider:

  old = atomic_read(&v);
  do {
    new = func(old);
  } while (!atomic_try_cmpxchg(&v, &old, new));

which on LL/SC becomes something like:

  old = atomic_read(&v);
  do {
    new = func(old);
  } while (!({
    asm volatile ("1: LL  %[oldval], %[v]\n"
                  "   CMP %[oldval], %[old]\n"
                  "   BNE 2f\n"
                  "   SC  %[new], %[v]\n"
                  "   BNE 1b\n"
                  "2:\n"
                  : [oldval] "=&r" (oldval), [v] "m" (v)
                  : [old] "r" (old), [new] "r" (new)
                  : "memory");
    success = (oldval == old);
    if (!success)
      old = oldval;
    success; }));

However, even the forward branch from the failed compare can cause the LL/SC
to fail on some architectures, let alone whatever the compiler makes of the C
loop body. As a result there is no guarantee whatsoever that the cacheline
containing @v will stay on the local CPU and progress is made.

Even native CAS architectures can fail to provide forward progress for their
primitive (see Sparc64 for an example).

Such implementations are strongly encouraged to add exponential backoff loops
to a failed CAS in order to ensure some progress. Affected architectures are
also strongly encouraged to inspect/audit the atomic fallbacks, refcount_t and
their locking primitives.
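
As an illustration only, such a backoff loop might look like the following
(the helper name and the backoff cap are made up, not kernel API; cpu_relax()
is the real spin-wait hint):

  static inline void add_with_backoff(atomic_t *v, int n)
  {
          int old = atomic_read(v);
          unsigned int i, delay = 1;

          while (!atomic_try_cmpxchg(v, &old, old + n)) {
                  for (i = 0; i < delay; i++)
                          cpu_relax();    /* let the current owner finish */
                  if (delay < 64)
                          delay <<= 1;    /* exponential backoff, capped */
          }
  }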