
 5. ORDERING AND CYCLES
 7. THE PROGRAM ORDER RELATION: po AND po-loc
 9. DEPENDENCY RELATIONS: data, addr, and ctrl
10. THE READS-FROM RELATION: rf, rfi, and rfe
11. CACHE COHERENCE AND THE COHERENCE ORDER RELATION: co, coi, and coe
12. THE FROM-READS RELATION: fr, fri, and fre
19. AND THEN THERE WAS ALPHA
22. RCU RELATIONS: rcu-link, rcu-gp, rcu-rscsi, rcu-order, rcu-fence, and rb
25. PLAIN ACCESSES AND DATA RACES
26. ODDS AND ENDS
INTRODUCTION

The Linux-kernel memory consistency model (LKMM) is rather complex and
linux-kernel.bell and linux-kernel.cat files that make up the formal
version of the model; they are extremely terse and their meanings are
not go into the details of the code in the .bell and .cat files;
Sections 2 (BACKGROUND) through 5 (ORDERING AND CYCLES) are aimed
toward beginners; they explain what memory consistency models are and
not passed by pointers, and they don't have an "exists" clause at the

BACKGROUND

That is, given a piece of code and a collection of values specified
factors such as DMA and mixed-size accesses.) But on multiprocessor
Different architectures have differing memory models, and the Linux

A SIMPLE EXAMPLE

device, stores it in a buffer, and sets a flag to indicate the buffer
ready, and if it is, copies the data back to userspace. The buffer
and the flag are memory locations shared between the two CPUs.
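In pseudo-C, the pattern looks something like this (buf and flag
assumed initially zero):

	int buf = 0, flag = 0;

	P0()
	{
		WRITE_ONCE(buf, 1);
		WRITE_ONCE(flag, 1);
	}

	P1()
	{
		int r1;
		int r2 = 0;

		r1 = READ_ONCE(flag);
		if (r1)
			r2 = READ_ONCE(buf);
	}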
(the reason for using WRITE_ONCE() and READ_ONCE() instead of simple
CPU and P1() represents the read() routine running on another. The
Thus, P0 stores the data in buf and then sets flag. Meanwhile, P1
reads flag into the private variable r1, and if it is set, reads the
while and try again.)
shared memory locations and another CPU loads from those locations in
usually use sleep and wakeup mechanisms rather than polling for I/O
completion, and real code generally doesn't bother to copy values into
from flag and buf, or equivalently, what values r1 and r2 might end up
instance, P1 might run entirely before P0 begins, in which case r1 and
begins, in which case r1 and r2 will both be 1.
store to buf but before the store to flag. In this case, r1 and r2
unconditionally then we would instead have r1 = 0 and r2 = 1.)
However, the most interesting possibility is where r1 = 1 and r2 = 0.
happens, the LKMM does predict this outcome can occur, and the example

A SELECTION OF MEMORY MODELS

The first widely cited memory model, and the simplest to understand,
performance optimizations. On ARM and PowerPC architectures, for
instance, the MP example code really does sometimes yield r1 = 1 and
x86 and SPARC follow yet a different memory model: TSO (Total Store
Ordering).
each CPU stores to its own shared location and then loads from the
this outcome to occur, and in fact it does sometimes occur on x86 and
x86, Alpha, and other architectures. However, it is different in

ORDERING AND CYCLES

as those instructions occur in the code, and there are many other
It's not possible to have X ordered before Y, Y ordered before Z, and
and a certain outcome for the loads in a piece of code can happen only

EVENTS

spin_lock(). However, no single event can be both a read and a write.
shared memory, do not give rise to events. Thus, arithmetic and
is concerned only with the store itself -- its value and its address

THE PROGRAM ORDER RELATION: po AND po-loc

code after branches are taken into account and loops have been
first comes before the second in program order and they access the
code, and it retains their outlook to a considerable extent. The
read, write, and fence events used by the model are close in spirit to
kernel code written in C, and the mapping from C to machine code can
by the C standard), and it will not introduce extraneous accesses.
The MP and SB examples above used READ_ONCE() and WRITE_ONCE() rather
buf and P0's write event to flag, and similarly for the other shared

A WARNING

The protections provided by READ_ONCE(), WRITE_ONCE(), and others are
not perfect; and under some circumstances it is possible for the
operators and function calls can be evaluated in any order. For example:
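	r1 = f(5) + g(6);

The object code may call f first or g first; the memory model cannot
assume anything about that order of evaluation. (Here f() and g()
stand for arbitrary functions.)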

DEPENDENCY RELATIONS: data, addr, and ctrl

from memory by the first. The first event must be a read, and the
There are three kinds of dependencies: data, address (addr), and
A read and a write event are linked by a data dependency if the value
arbitrarily complicated computations, and a write can depend on the
A read event and another memory access event are linked by an address
Finally, a read event X and a write event Y are linked by a control
dependency if Y syntactically lies within an arm of an if statement and
about what will happen if r1 is nonzero, and it can assume that r1
from semantic dependencies and therefore mistakenly predicts that the

THE READS-FROM RELATION: rf, rfi, and rfe

further distinguish the cases where the load and the store occur on
the same CPU (internal reads-from, or rfi) and where they occur on
and some of them from another store. Fortunately, use of READ_ONCE()
and WRITE_ONCE() will prevent load-tearing; it's not possible to have:
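	int x = 0;

	P0()
	{
		WRITE_ONCE(x, 0x1234);
	}

	P1()
	{
		int r1;

		r1 = READ_ONCE(x);
	}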
and end up with r1 = 0x1200 (partly from x's initial value and partly
from both of P0's stores. It is possible to handle mixed-size and
attempt to do so. It requires all accesses to be properly aligned and

CACHE COHERENCE AND THE COHERENCE ORDER RELATION: co, coi, and coe

ordering which all the CPUs agree on (the coherence order), and this
that value comes third, and so on.
	Write-write coherence: If W ->po-loc W' (i.e., W comes before
	W' in program order and they access the same location), where W
	and W' are two stores, then W ->co W'.

	Write-read coherence: If W ->po-loc R, where W is a store and R
	is a load, then R must read from W or from some other store
	which comes after W in the coherence order.

	Read-write coherence: If R ->po-loc W, where R is a load and W
	is a store, then the store which R reads from must come before
	W in the coherence order.

	Read-read coherence: If R ->po-loc R', where R and R' are two
	loads, then either they read from the same store or else the
	store read by R comes before the store read by R' in the
	coherence order.
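To illustrate, consider something like this sketch (x assumed
initially 0):

	int x;

	P0()
	{
		WRITE_ONCE(x, 1);
		WRITE_ONCE(x, 2);
	}

	P1()
	{
		WRITE_ONCE(x, 3);
	}

Write-write coherence requires that the store of 1 come before the
store of 2 in x's coherence order; P1's store of 3 may fall anywhere
relative to them, so the possible final values of x are 2 and 3.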
Wikipedia, sequential consistency per variable and cache coherence
program order, it must also come later in x's coherence order and
If r1 = 5 (reading from P0's store) and r2 = 0 (reading from the
and r2 = 0! This results from parallel execution of the operations
encoded in Itanium's Very-Long-Instruction-Word format, and it is yet
possible for a store to directly or indirectly overwrite itself! And
occur on the same CPU (internal coherence order, or coi) and stores

THE FROM-READS RELATION: fr, fri, and fre

The value loaded from x will be 0 (assuming cache coherence!), and it
As with rf, rfi, and rfe, we subdivide the fr relation into fri (when
the load and the store are on the same CPU) and fre (when they are on
Note that the fr relation is determined entirely by the rf and co
relations; it is not independent. Given a read event R and a write
event W for the same location, we will have R ->fr W if and only if

	(R ->fr W) := (there exists W' with W' ->rf R and W' ->co W).
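For instance, something like this (x assumed initially 0):

	int x = 0;

	P0()
	{
		int r1;

		r1 = READ_ONCE(x);	/* R: reads x's initial value */
		WRITE_ONCE(x, 2);	/* W: overwrites the value R read */
	}

Here r1 will be 0, and R ->fr W because the store R reads from (the
initialization of x) is coherence-earlier than W.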

AN OPERATIONAL MODEL

The system as a whole is divided into the CPUs and a memory subsystem.
in program order), and they communicate with the memory subsystem.
only internal operations. However, loads, stores, and fences involve
and we say that the store's value is forwarded to R. Otherwise, the
CPU asks the memory subsystem for the value to load and we say that R
CPUs have local caches, and propagating a store to a CPU really means
time to process the stores that it receives, and a store can't be used
First-In-First-Out order, and consequently the processing delay
Note that load instructions may be executed speculatively and may be
about the fence. However, fences do constrain the way CPUs and the
Strong fences, including smp_mb() and synchronize_rcu(), force
The propagation ordering enforced by release fences and strong fences
fence. We describe this property by saying that release fences and
rcu_read_lock(), rcu_read_unlock(), and synchronize_rcu() fences have
The fences which affect propagation order (i.e., strong, release, and
defined to link memory access events E and F whenever:

	E and F are both stores on the same CPU and an smp_wmb() fence

	F is a release fence and some X comes before F in program order,

	A strong fence event occurs between some X and F in program

The operational model requires that whenever W and W' are both stores
and W ->cumul-fence W', then W must propagate to any given CPU
before W' does. However, for different CPUs C and C', it does not
maintaining cache coherence and the fact that a CPU can't operate on a

	Propagation: This requires that certain stores propagate to
	CPUs and to RAM in a specific order.

	Rcu: This requires that RCU read-side critical sections and

The first and second are quite common; they can be found in many
memory models (such as those for C11/C++11). The "happens-before" and
"rcu" and "plain-coherence" axioms are specific to the LKMM.

SEQUENTIAL CONSISTENCY PER VARIABLE

each load between the store that it reads from and the following
and po-loc relations agree with this global ordering; in other words,

ATOMIC UPDATES: rmw

this case) does not get altered between the read and the write events
occurs in between CPU 1's load and store. To put it another way, the
is between the store that CPU 1 reads from and the store that CPU 1
rule: Whenever R and W are the read and write events composing an
atomic read-modify-write and W' is the write event which R reads from,
there must not be any stores coming between W' and W in the coherence
order:

	(R ->rmw W) implies (there is no X with R ->fr X and X ->co W),

where the rmw relation links the read and write events making up each
atomic update.
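For example, something like this sketch (using atomic_inc() on an
atomic_t as the read-modify-write; x assumed initially 0):

	atomic_t x = ATOMIC_INIT(0);

	P0()
	{
		atomic_inc(&x);
	}

	P1()
	{
		atomic_inc(&x);
	}

The rule forbids either CPU's store from falling between the store
that the other CPU's read event reads from and the other CPU's own
store, so the two increments cannot overlap and x must end up equal
to 2.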
where Z0 is some store event and n can be any number (even 0, in the
Note that this implies Z0 and Zn are stores to the same variable.
to each CPU before the store X does. Then the fact that X and Y are
the w-post-bounded relation defined below in the PLAIN ACCESSES AND

THE PRESERVED PROGRAM ORDER RELATION: ppo

instruction to the po-later instruction and is thus a sub-relation of
situation: Fences are a source of ppo links. Suppose X and Y are

	X and Y;

	X and Y are both stores and an smp_wmb() fence occurs between

	X and Y are both loads and an smp_rmb() fence occurs between

	X and Y are both loads, X ->addr Y (i.e., there is an address
	dependency from X to Y), and X is a READ_ONCE() or an atomic
	access.
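Applied to the MP pattern, for instance, the smp_wmb() and smp_rmb()
rules yield ppo links that rule out the r1 = 1, r2 = 0 outcome, along
these lines:

	P0()
	{
		WRITE_ONCE(buf, 1);
		smp_wmb();		/* orders the two stores */
		WRITE_ONCE(flag, 1);
	}

	P1()
	{
		int r1;
		int r2 = 0;

		r1 = READ_ONCE(flag);
		smp_rmb();		/* orders the two loads */
		if (r1)
			r2 = READ_ONCE(buf);
	}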
can always satisfy the second load speculatively before the first, and
be executed after all. And lastly, the real difficulties begin when
problem when the loads come from READ_ONCE(), and therefore the LKMM
store and a second, po-later load reads from that store:
it knows what that value is, or that W and R' do access the same
and W then the CPU can speculatively forward W to R' before executing
because it could tell that the store and the second load access the
(the po-loc link says that R comes before W in program order and they
and the CPU executed W' before W, then the memory subsystem would put

AND THEN THERE WAS ALPHA

and r2 = 0 at the end, in spite of the address dependency.
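The pattern in question looks something like this (ptr assumed
initially NULL and x initially 0; P1's second load carries an address
dependency on its first):

	int x = 0;
	int *ptr = NULL;

	P0()
	{
		WRITE_ONCE(x, 1);
		smp_wmb();
		WRITE_ONCE(ptr, &x);
	}

	P1()
	{
		int *r1;
		int r2 = 0;

		r1 = READ_ONCE(ptr);
		if (r1 != NULL)
			r2 = READ_ONCE(*r1);
	}

On early Alpha systems this really could yield r1 = &x and r2 = 0.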
to ptr does. And since P1 can't execute its second load
adds this fence after every READ_ONCE() and atomic load on Alpha. The
then we would never get r1 = &x and r2 = 0. By the time P1 executed
the local cache and available for satisfying the read request. Thus
The LKMM requires that smp_rmb(), acquire fences, and strong fences
And of course, none of this matters for any architecture other than

THE HAPPENS-BEFORE RELATION: hb

execute in a certain order. hb includes the ppo relation and two
W ->rfe R implies that W and R are on different CPUs. It also means
must have executed before R, and so we have W ->hb R.
The equivalent fact need not hold if W ->rfi R (i.e., W and R are on
W ->coe W'. This means that W and W' are stores to the same location,
they execute on different CPUs, and W comes before W' in the coherence
cache coherence. The relation is called prop, and it links two events
the first event in the coherence order and propagates to C before the
order, and P1's store propagated to P0 before P0's load executed.
If r1 = 0 and r2 = 9 at the end then P0's accesses must have executed
load executed, and so r1 would have been 9 rather than 0. In this
because P1's store overwrote the value read by P0's first load, and
stores. If r1 = 1 and r2 = 0 at the end then there is a prop link
to buf will propagate to P1 before the store to flag does, and the
Alpha, although the details are more complicated and we will not go
would force the two loads to be executed in program order, and it
link (hence an hb link) from the first load to the second, and the
followed by two cumul-fences and an rfe link, utilizing the fact that
If x = 2, r0 = 1, and r2 = 1 after this code runs then there is a prop
before P2's load and store execute, P2's smp_store_release()
guarantees that the stores to x and y both propagate to P0 before the
store to z does (the second cumul-fence), and P0's load executes after the

THE PROPAGATES-BEFORE RELATION: pb

features of strong fences. It links two events E and F whenever some
store is coherence-later than E and propagates to every CPU and to RAM
optional rfe link, a strong fence, and an arbitrary number of hb
of links begins with coe). Then there are events W, X, Y, and Z such
that:

	E ->coe W ->cumul-fence* X ->rfe? Y ->strong-fence Z ->hb* F,

where the * suffix indicates an arbitrary number of links of the
specified type, and the ? suffix indicates the link is optional (Y may
propagate to Y's CPU before X does, hence before Y executes and hence
know that W will propagate to every CPU and to RAM before Z executes.
And because of the hb links, we know that Z will execute before F.
propagate to every CPU and to RAM before F executes.
value read by P0), and a strong fence between P1's store and its load.
In this example, the sequences of cumul-fence and hb links are empty.
because it does not start and end on the same CPU.
to P0's. This means that if both r1 and r2 were 0 there would be a

RCU RELATIONS: rcu-link, rcu-gp, rcu-rscsi, rcu-order, rcu-fence, and rb

rests on two concepts: grace periods and read-side critical sections.
at the start and rcu_read_unlock() at the end. Critical sections can
For any critical section C and any grace period G, at least
one of the following statements must hold:

(1)	C ends before G does, and in addition, every store that

(2)	G starts before C does, and in addition, every store that

before and end after a grace period.
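The guarantee is illustrated by something like the following (x and y
assumed initially 0):

	int x, y;

	P0()
	{
		rcu_read_lock();
		WRITE_ONCE(x, 1);
		WRITE_ONCE(y, 1);
		rcu_read_unlock();
	}

	P1()
	{
		int r1, r2;

		r1 = READ_ONCE(x);
		synchronize_rcu();
		r2 = READ_ONCE(y);
	}

This code can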
never end with r1 = 1 and r2 = 0. The reasoning is as follows. r1 = 1
execute an smp_mb() fence after the end of the critical section and
And if a critical section ends after a grace period does then the
and some time before the critical section's opening rcu_read_lock()
"before": If E and F are RCU fence events (i.e., rcu_read_lock(),
event X, F is po-after some memory-access event Y, and we have any of
obscure, and we won't give it here. It is closely related to the pb
relation, and the details don't matter unless you want to comb through
The LKMM also defines the rcu-gp and rcu-rscsi relations. They bring
grace periods and read-side critical sections into the picture, in the
following way:

	E ->rcu-gp F means that E and F are in fact the same event,
	and that event is a synchronize_rcu() fence (i.e., a grace
	period).

	E ->rcu-rscsi F means that E and F are the rcu_read_unlock()
	and rcu_read_lock() fence events delimiting some read-side
	critical section.

rcu-gp and rcu-rscsi links separated by rcu-link links, in which the
rcu-gp links and one rcu-rscsi link. (It also implies that
X ->rcu-order T and Z ->rcu-order V.) On the other hand:
This formula means that G and W are the same event (a grace period),
and there are events X, Y and a read-side critical section C such that:

	2. X comes "before" Y in some sense (including rfe, co and fr);

executing and hence before any instruction po-after F can execute.
there are fence events E and F such that:
"super-strong" fence: Unlike the original strong fences (smp_mb() and
before F, just as E ->pb F does (and for much the same reasons).
and F with E ->rcu-link F ->rcu-order E. Or to put it a third way,
the axiom requires that there are no cycles consisting of rcu-gp and
violated: A critical section starts before a grace period, and some
Putting symbols to these ideas, let L and U be the rcu_read_lock() and
question, and let S be the synchronize_rcu() fence event for the grace
are events Q and R where Q is po-after L (which marks the start of the
relation, and R is po-before the grace period S. Thus we have:
critical section and witness that W propagates to the critical
section's CPU by reading from W, and let Z on some arbitrary CPU be a
discussion of the rcu-link relation earlier) that S and U are related
Since S is a grace period we have S ->rcu-gp S, and since L and U are
the start and end of the critical section C we have U ->rcu-rscsi L.
P1's load at W reads from, so we have W ->fre Y. Since S ->po W and
so we have X ->rfe Z. Together with L ->po X and Z ->po S, this
yields L ->rcu-link S. And since L and U are the start and end of a
This requires P0 and P2 to execute their loads and stores out of
program order, but of course they are allowed to do so. And as you
section in P0 both starts before P1's grace period does and ends
before it does, and the critical section in P2 both starts after P1's
grace period does and ends after it does.

SRCU READ-SIDE CRITICAL SECTIONS

relations srcu-gp and srcu-rscsi added to represent SRCU grace periods
and read-side critical sections. However, there are some significant
differences between RCU read-side critical sections and their SRCU
counterparts:

	srcu_read_lock(), and srcu_read_unlock() take a pointer to a
	an SRCU domain, and calls linked by srcu-rscsi must have the
	same domain. Read-side critical sections and grace periods
	to pairs of critical sections and grace periods having the

	rcu_read_lock() and rcu_read_unlock(), an srcu_read_lock()
	following example (where s is an srcu_struct and idx1 and idx2
	are integers):
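	idx1 = srcu_read_lock(&s);
	idx2 = srcu_read_lock(&s);
	srcu_read_unlock(&s, idx1);
	srcu_read_unlock(&s, idx2);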
	The matching is determined entirely by the domain pointer and
	rcu_read_lock() and rcu_read_unlock() then they would have
	sections: an inner one and an outer one.

3.	The srcu_down_read() and srcu_up_read() primitives work
	exactly like srcu_read_lock() and srcu_read_unlock(), except
	pointer and index value, these primitives make it possible for
	an SRCU read-side critical section to start on one CPU and end
	on another.

appropriate, since it takes a memory location as argument and returns
a value, just as a load does) and srcu_read_unlock() as a special type
memory location and a value). These loads and stores are annotated as
belonging to the "srcu-lock" and "srcu-unlock" event classes,
respectively.
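Labeling the four calls of the example above:

	A: idx1 = srcu_read_lock(&s);
	B: idx2 = srcu_read_lock(&s);
	C: srcu_read_unlock(&s, idx1);
	D: srcu_read_unlock(&s, idx2);

Given these labels,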
the LKMM will treat A and B as loads from s yielding values saved in
idx1 and idx2 respectively. Similarly, it will treat C and D as
though they stored the values from idx1 and idx2 in s. The end result
except for the presence of the special srcu-lock and srcu-unlock
annotations. You can see at once that we have A ->data C and
B ->data D; these dependencies tell the LKMM that C is the
srcu-unlock event matching srcu-lock event A, and D is the
srcu-unlock event matching srcu-lock event B.

This approach is admittedly a hack, and it has the potential to lead
Assuming that P1 executes after P0 and does read the index value
an srcu-lock event and the final data dependency leading to the
the LKMM treats B as a store to the variable s and C as a load from

LOCKING

the same as an int, and spin_lock(&s) is treated almost the same as:
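	while (cmpxchg_acquire(&s, 0, 1) != 0)
		cpu_relax();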
This waits until s is equal to 0 and then atomically sets it to 1,
and the read part of the cmpxchg operation acts as an acquire fence.
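Similarly, spin_trylock(&s) is treated as something like:

	r = cmpxchg_acquire(&s, 0, 1) == 0;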
which atomically sets s to 1 if it is currently equal to 0 and returns
true if the exchange succeeded. The
store-release in a spin_unlock() and the load-acquire which forms the
first half of a spin_lock() or a successful
spin_trylock() -- we can call these things lock-releases and
lock-acquires -- have two properties beyond those of ordinary releases
and acquires.

This
would naturally hold if the release and acquire operations were on
different CPUs and accessed the same lock variable, but the LKMM says
it also holds when they occur on the same CPU. For example (a sketch
of the sort of test meant here, with x and y initially 0):
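	int x, y;
	spinlock_t s;

	P0()
	{
		spin_lock(&s);
		WRITE_ONCE(x, 1);
		spin_unlock(&s);
		spin_lock(&s);
		WRITE_ONCE(y, 1);
		spin_unlock(&s);
	}

	P1()
	{
		int r1, r2;

		r1 = READ_ONCE(y);
		smp_rmb();
		r2 = READ_ONCE(x);
	}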
Here the second spin_lock() is po-after the first spin_unlock(), and
r1 = 1 and r2 = 0 at the end (this is an instance of the MP pattern).
This requirement does not apply to ordinary release and acquire
and thus it could load y before x, obtaining r2 = 0 and r1 = 1.
and some other stores W and W' occur po-before the lock-release and
before the store to y does, so we cannot have r2 = 1 and r3 = 0. But
propagated in either order. (On the other hand, if the code in P0 and
These two special requirements for lock-release and lock-acquire do
have come to expect and rely on them because they do hold on all

PLAIN ACCESSES AND DATA RACES

smp_load_acquire(&z), and so on are collectively referred to as
possible final values for r2 are 6 and 0, depending on whether or not
for the READ_ONCE()) and eliminate the temporary variable r1. The
And now it is obvious that this code runs the risk of dereferencing a
the compiler would not have performed this optimization and there
same CPU), and
1 and 2 above. We'll go a little farther and say that two accesses
a potential data race and makes no predictions about the program's
if they can be connected by a sequence of hb, pb, and rb links
If X is a load and X executes before a store Y, then indeed there is
no danger of X and Y being concurrent. After all, Y can't have any
some time after Y executes and thus after X executes. But if X is a
could very well interfere somehow with Y, and we would have to
consider X and Y to be concurrent.
Therefore when X is a store, for X and Y to be non-concurrent the LKMM
these links are present, X and Z are the same event),

and either:

	Z is on the same CPU as Y and is connected to Y by a possibly
	means Z and Y are the same event).
pattern (with fences and statement labels, but without the conditional
The smp_wmb() memory barrier gives a cumul-fence link from X to W, and
executes. Next, Z and Y are on the same CPU and the smp_rmb() fence
cumul-fence, pb, and so on -- including vis) apply only to marked
allowed to apply fancy transformations to marked accesses, and
split them up, duplicate them, eliminate them, invent new ones, and
race (if it could, memory models would be useless and no multithreaded
This program does not contain a data race. Although the U and V
The smp_wmb() fence in P0 is both a compiler barrier and a

	X and Y are both marked accesses. Hence an rfe link from X to
	executed, i.e., X ->vis Y. (And if there is no rfe link then
	r1 will be 0, so V will not be executed and ipso facto won't

	corresponding to the access V will be po-after the fence, and
	after the fence does and hence after Y does.
general. Suppose R is a plain load and we want to show that R
marked access X such that R and X are ordered by a suitable fence and
marked access Y such that X ->xb* Y, and Y and E are ordered by a
"post-bounded" by X and E is "pre-bounded" by Y.
(i.e., smp_rmb()) and some affect only stores (smp_wmb()); otherwise
the two types of bounds are the same. And as a degenerate case, we
say that a marked access pre-bounds and post-bounds itself (e.g., if R
The need to distinguish between r- and w-bounding raises yet another
thereby adding a load (and possibly replacing the store entirely).
compiler has augmented a store with a load in this fashion, and the
(In this example the rcu_read_lock() and rcu_read_unlock() calls don't
is definitely w-post-bounded before the store to ptr, and the two
and a certain amount of care is required when programming constructs
like this one. In particular, comparisons between the pointer and
be on different CPUs, and fences don't link events on different CPUs.
P1 before the critical section started and so would have been visible
"y = 3" store, and consequently the first must propagate from P1 to P0
concurrent and there is no race, even though P1's plain store to y
race-candidate stores W and W', where W ->co W', the LKMM says the
sequence, and if W' is plain then they also have to be linked by a
sequence. For race-candidate load R and store W, the LKMM says the

	strong-fence ; xb* ; {w and/or r}-pre-bounded

sequence with no post-bounding, and in every case the LKMM also allows
happens-before, propagates-before, and rcu axioms (which state that
satisfy a load request and its determination of where a store will

	If R and W are race candidates and it is possible to link R to

	If W and R are race candidates and it is possible to link W to

	If W and W' are race candidates and it is possible to link W

ODDS AND ENDS

optional, and it doesn't require the events linked by the relation to
because all the other parts (fences and rfe) are already included in
hb anyway, and where the formal model adds prop into hb, it includes
accesses and fences, such as those corresponding to smp_load_acquire()
and fences; rather, they are read events with an annotation marking
x, and then the increment is carried out by the memory hardware with
treated as READ_ONCE() and rcu_assign_pointer() is treated as
followed by smp_wmb() and then a marked store W', the LKMM creates a
links a marked load R to a store W, and the store is read by a load R'
smp_mb__before_atomic(), smp_mb__after_atomic(), and

	po-later atomic updates and the events following them;

	smp_mb__after_atomic() orders po-earlier atomic updates and

	events and the events preceding them against all po-later

Interestingly, RCU and locking each introduce the possibility of