
On atomic types (atomic_t, atomic64_t and atomic_long_t).

The atomic type provides an interface to the architecture's means of atomic
RMW operations between CPUs (atomic operations on MMIO are not supported and
can lead to fatal traps on some platforms).

API
---

The 'full' API consists of (atomic64_ and atomic_long_ prefixes omitted for
brevity):

Non-RMW ops:

  atomic_read(), atomic_set()
  atomic_read_acquire(), atomic_set_release()


RMW atomic operations:

Arithmetic:

  atomic_{add,sub,inc,dec}()
  atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}()
  atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}()


Bitwise:

  atomic_{and,or,xor,andnot}()
  atomic_fetch_{and,or,xor,andnot}{,_relaxed,_acquire,_release}()


Swap:

  atomic_xchg{,_relaxed,_acquire,_release}()
  atomic_cmpxchg{,_relaxed,_acquire,_release}()
  atomic_try_cmpxchg{,_relaxed,_acquire,_release}()


Reference count (but please see refcount_t):

  atomic_add_unless(), atomic_inc_not_zero()
  atomic_sub_and_test(), atomic_dec_and_test()


Misc:

  atomic_inc_and_test(), atomic_add_negative()
  atomic_dec_unless_positive(), atomic_inc_unless_negative()


Barriers:

  smp_mb__{before,after}_atomic()

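
As an aside, the Reference count operations above map onto the familiar
get/put pattern; a minimal sketch (struct obj and the use of kfree() are
illustrative only, and new code should normally use refcount_t, which adds
saturation and overflow checking):

  struct obj {
    atomic_t refs;
    /* ... payload ... */
  };

  /* Take a reference, but only if the object is still alive. */
  bool obj_get(struct obj *o)
  {
    return atomic_inc_not_zero(&o->refs);
  }

  /* Drop a reference; the final put frees the object. */
  void obj_put(struct obj *o)
  {
    if (atomic_dec_and_test(&o->refs))
      kfree(o);
  }
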

TYPES (signed vs unsigned)
-----

While atomic_t, atomic_long_t and atomic64_t use int, long and s64
respectively (for hysterical raisins), the kernel uses -fno-strict-overflow
(which implies -fwrapv) and defines signed overflow to behave like
2s-complement.

Therefore, an explicitly unsigned variant of the atomic ops is strictly
unnecessary and we can simply cast; there is no UB.
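
For example, code that wants unsigned wrap-around semantics can simply cast
at the boundaries; a minimal sketch (the *_u() helpers are hypothetical and
not part of the kernel API):

  /* Wrap-around is well defined because of -fno-strict-overflow. */
  static inline unsigned int atomic_read_u(const atomic_t *v)
  {
    return (unsigned int)atomic_read(v);
  }

  static inline void atomic_add_u(unsigned int i, atomic_t *v)
  {
    atomic_add((int)i, v);
  }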

There was a bug in UBSAN prior to GCC-8 that would generate UB warnings for
signed types.

With this we also conform to the C/C++ _Atomic behaviour and things like
P1236R1.


SEMANTICS
---------

Non-RMW ops:

The non-RMW ops are (typically) regular LOADs and STOREs and are canonically
implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
smp_store_release() respectively. Therefore, if you find yourself only using
the non-RMW operations of atomic_t, you do not in fact need atomic_t at all
and are doing it wrong.
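
Concretely, that canonical mapping looks roughly like the following sketch
(the my_ prefix is only to emphasize this is an illustration, not the actual
generated kernel headers):

  static __always_inline int my_atomic_read(const atomic_t *v)
  {
    return READ_ONCE(v->counter);
  }

  static __always_inline void my_atomic_set(atomic_t *v, int i)
  {
    WRITE_ONCE(v->counter, i);
  }

  static __always_inline int my_atomic_read_acquire(const atomic_t *v)
  {
    return smp_load_acquire(&v->counter);
  }

  static __always_inline void my_atomic_set_release(atomic_t *v, int i)
  {
    smp_store_release(&v->counter, i);
  }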

A note for the implementation of atomic_set{}() is that it must not break the
atomicity of the RMW ops. That is:

  C Atomic-RMW-ops-are-atomic-WRT-atomic_set

  {
    atomic_t v = ATOMIC_INIT(1);
  }

  P0(atomic_t *v)
  {
    (void)atomic_add_unless(v, 1, 0);
  }

  P1(atomic_t *v)
  {
    atomic_set(v, 0);
  }

  exists
  (v=2)

In this case we would expect the atomic_set() from CPU1 to either happen
before the atomic_add_unless(), in which case that latter one would no-op, or
_after_, in which case we'd overwrite its result. In no case is "2" a valid
outcome.

This is typically true on 'normal' platforms, where a regular competing STORE
will invalidate an LL/SC or fail a CMPXCHG.

The obvious case where this is not so is when we need to implement atomic ops
with a lock:

  CPU0						CPU1

  atomic_add_unless(v, 1, 0);
    lock();
    ret = READ_ONCE(v->counter); // == 1
						atomic_set(v, 0);
    if (ret != u)				  WRITE_ONCE(v->counter, 0);
      WRITE_ONCE(v->counter, ret + 1);
    unlock();

The typical solution is then to implement atomic_set{}() with atomic_xchg().
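
For instance, a lock-based implementation might route atomic_set() through
atomic_xchg(), as in this sketch (illustrative only; the real lock-based
fallbacks live in the architecture code):

  /* Participate in the same serialization as the RMW ops by
   * implementing the store as an exchange whose result is ignored. */
  static inline void locked_atomic_set(atomic_t *v, int i)
  {
    (void)atomic_xchg(v, i);
  }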


RMW ops:

These come in various forms:

 - plain operations without return value: atomic_{}()

 - operations which return the modified value: atomic_{}_return()

   these are limited to the arithmetic operations because those are
   reversible. Bitops are irreversible and therefore the modified value
   is of dubious utility.

 - operations which return the original value: atomic_fetch_{}()

 - swap operations: xchg(), cmpxchg() and try_cmpxchg()

 - misc; the special purpose operations that are commonly used and would,
   given the interface, normally be implemented using (try_)cmpxchg loops,
   but are time critical and can (typically on LL/SC architectures) be
   implemented more efficiently -- see the sketch below.

All these operations are SMP atomic; that is, the operations (for a single
atomic variable) can be fully ordered and no intermediate state is lost or
visible.
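
As a sketch of how these forms relate (the my_*() helpers are made up; the
real fallbacks are generated, so treat this purely as illustration):

  /* The value-returning form follows from the fetch form for the
   * reversible (arithmetic) operations: */
  static inline int my_atomic_add_return(int i, atomic_t *v)
  {
    return atomic_fetch_add(i, v) + i;
  }

  /* A misc operation written as the try_cmpxchg() loop it would
   * otherwise be; LL/SC architectures can typically do better: */
  static inline bool my_atomic_inc_unless_negative(atomic_t *v)
  {
    int old = atomic_read(v);

    do {
      if (old < 0)
        return false;
    } while (!atomic_try_cmpxchg(v, &old, old + 1));

    return true;
  }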


ORDERING  (go read memory-barriers.txt first)
--------

The rule of thumb:

 - non-RMW operations are unordered;

 - RMW operations that have no return value are unordered;

 - RMW operations that have a return value are fully ordered;

 - RMW operations that are conditional are unordered on FAILURE,
   otherwise the above rules apply.

Except of course when an operation has an explicit ordering like:

 {}_relaxed: unordered
 {}_acquire: the R of the RMW (or atomic_read) is an ACQUIRE
 {}_release: the W of the RMW (or atomic_set)  is a  RELEASE

Where 'unordered' is against other memory locations. Address dependencies are
not defeated.

Fully ordered primitives are ordered against everything prior and everything
subsequent. Therefore a fully ordered primitive is like having an smp_mb()
before and an smp_mb() after the primitive.
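
As an illustration of the explicit orderings, a minimal exclusive-ownership
flag could pair an _acquire RMW with a _release store (a sketch only; real
code should use the existing locking primitives):

  /* 0 means free, 1 means held. */
  static inline bool my_trylock(atomic_t *lock)
  {
    int old = 0;

    /* On success the R of the RMW is an ACQUIRE: the critical section
     * cannot leak before it. On failure, no ordering is implied. */
    return atomic_try_cmpxchg_acquire(lock, &old, 1);
  }

  static inline void my_unlock(atomic_t *lock)
  {
    /* The store is a RELEASE: the critical section cannot leak past it. */
    atomic_set_release(lock, 0);
  }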


The barriers:

  smp_mb__{before,after}_atomic()

only apply to the RMW atomic ops and can be used to augment/upgrade the
ordering inherent to the op. These barriers act almost like a full smp_mb():
smp_mb__before_atomic() orders all earlier accesses against the RMW op
itself and all accesses following it, and smp_mb__after_atomic() orders all
later accesses against the RMW op and all accesses preceding it. However,
accesses between the smp_mb__{before,after}_atomic() and the RMW op are not
ordered, so it is advisable to place the barrier right next to the RMW atomic
op whenever possible.

These helper barriers exist because architectures have varying implicit
ordering on their SMP atomic primitives. For example, our TSO architectures
provide fully ordered atomics and these barriers are no-ops.

NOTE: when the atomic RMW ops are fully ordered, they should also imply a
compiler barrier.

Thus:

  atomic_fetch_add();

is equivalent to:

  smp_mb__before_atomic();
  atomic_fetch_add_relaxed();
  smp_mb__after_atomic();

However, the atomic_fetch_add() might be implemented more efficiently.

Further, while something like:

  smp_mb__before_atomic();
  atomic_dec(&X);

is a 'typical' RELEASE pattern, the barrier is strictly stronger than
a RELEASE because it orders preceding instructions against both the read
and write parts of the atomic_dec(), and against all following instructions
as well. Similarly, something like:

  atomic_inc(&X);
  smp_mb__after_atomic();

is an ACQUIRE pattern (though very much not typical), but again the barrier is
strictly stronger than ACQUIRE. As illustrated:

  C Atomic-RMW+mb__after_atomic-is-stronger-than-acquire

  {
  }

  P0(int *x, atomic_t *y)
  {
    r0 = READ_ONCE(*x);
    smp_rmb();
    r1 = atomic_read(y);
  }

  P1(int *x, atomic_t *y)
  {
    atomic_inc(y);
    smp_mb__after_atomic();
    WRITE_ONCE(*x, 1);
  }

  exists
  (0:r0=1 /\ 0:r1=0)

This should not happen; but a hypothetical atomic_inc_acquire() --
(void)atomic_fetch_inc_acquire() for instance -- would allow the outcome,
because it would not order the W part of the RMW against the following
WRITE_ONCE().  Thus:

  P0			P1

			t = LL.acq *y (0)
			t++;
			*x = 1;
  r0 = *x (1)
  RMB
  r1 = *y (0)
			SC *y, t;

is allowed.


CMPXCHG vs TRY_CMPXCHG
----------------------

  int atomic_cmpxchg(atomic_t *ptr, int old, int new);
  bool atomic_try_cmpxchg(atomic_t *ptr, int *oldp, int new);

Both provide the same functionality, but try_cmpxchg() can lead to more
compact code. The functions relate as follows:

  bool atomic_try_cmpxchg(atomic_t *ptr, int *oldp, int new)
  {
    int ret, old = *oldp;
    ret = atomic_cmpxchg(ptr, old, new);
    if (ret != old)
      *oldp = ret;
    return ret == old;
  }

and:

  int atomic_cmpxchg(atomic_t *ptr, int old, int new)
  {
    (void)atomic_try_cmpxchg(ptr, &old, new);
    return old;
  }

Usage:

  old = atomic_read(&v);			old = atomic_read(&v);
  for (;;) {					do {
    new = func(old);				  new = func(old);
    tmp = atomic_cmpxchg(&v, old, new);		} while (!atomic_try_cmpxchg(&v, &old, new));
    if (tmp == old)
      break;
    old = tmp;
  }

NB. try_cmpxchg() also generates better code on some platforms (notably x86),
where the function more closely matches the hardware instruction.
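
A common non-loop use is a one-shot claim, where the updated 'old' value is
itself useful; a small sketch (the owner/me variables are made up):

  int expected = 0;

  if (atomic_try_cmpxchg(&owner, &expected, me)) {
    /* We won the race and now own the object. */
  } else {
    /* Someone beat us to it; 'expected' now holds the current owner,
     * without a separate atomic_read(). */
  }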


FORWARD PROGRESS
----------------

In general, strong forward progress is expected of all unconditional atomic
operations -- those in the Arithmetic and Bitwise classes and xchg(). However,
a fair amount of code also requires forward progress from the conditional
atomic operations.

Specifically, 'simple' cmpxchg() loops are expected not to starve one another
indefinitely. However, this is not evident on LL/SC architectures, because
while an LL/SC architecture 'can/should/must' provide forward progress
guarantees between competing LL/SC sections, such a guarantee does not
transfer to cmpxchg() implemented using LL/SC. Consider:

  old = atomic_read(&v);
  do {
    new = func(old);
  } while (!atomic_try_cmpxchg(&v, &old, new));

which on LL/SC becomes something like:

  old = atomic_read(&v);
  do {
    new = func(old);
  } while (!({
    asm volatile ("1: LL  %[oldval], %[v]\n"
                  "   CMP %[oldval], %[old]\n"
                  "   BNE 2f\n"
                  "   SC  %[new], %[v]\n"
                  "   BNE 1b\n"
                  "2:\n"
                  : [oldval] "=&r" (oldval), [v] "+m" (v)
                  : [old] "r" (old), [new] "r" (new)
                  : "memory");
    success = (oldval == old);
    if (!success)
      old = oldval;
    success; }));

However, even the forward branch from the failed compare can cause the LL/SC
to fail on some architectures, let alone whatever the compiler makes of the C
loop body. As a result there is no guarantee whatsoever that the cacheline
containing @v will stay on the local CPU and that progress is made.

Even native CAS architectures can fail to provide forward progress for their
primitive (see Sparc64 for an example).

Such implementations are strongly encouraged to add exponential backoff loops
to a failed CAS in order to ensure some progress. Affected architectures are
also strongly encouraged to inspect/audit the atomic fallbacks, refcount_t and
their locking primitives.
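
Such a backoff might look like the following sketch (purely illustrative; the
constants, and expressing it in C rather than inside the architecture's
cmpxchg() fallback, are assumptions):

  old = atomic_read(&v);
  delay = 1;
  for (;;) {
    new = func(old);
    if (atomic_try_cmpxchg(&v, &old, new))
      break;

    /* Failed CAS: back off before retrying so that some CPU can make
     * progress, roughly doubling the wait each time. */
    for (i = 0; i < delay; i++)
      cpu_relax();
    if (delay < 1024)
      delay <<= 1;
  }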