.. _rcu_dereference_doc:

PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference()
===============================================================

Most of the time, you can use values from rcu_dereference() or one of
the similar primitives without worries.  Dereferencing (prefix "*"),
field selection ("->"), assignment ("="), address-of ("&"), addition and
subtraction of constants, and casts all work quite naturally and safely.

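As a rough user-space illustration of these benign operations, consider
the sketch below.  The sketch_rcu_dereference() macro is a hypothetical
stand-in that performs only a volatile load.  It is not the kernel
primitive, and it provides neither lockdep checking nor DEC Alpha
ordering.  The names "gp", "foo1", and safe_uses() are likewise invented
for this example.

```c
#include <stddef.h>

/* Hypothetical stand-in: one volatile load, nothing more. */
#define sketch_rcu_dereference(p) (*(volatile __typeof__(p) *)&(p))

struct foo {
	int a;
	int b[4];
};

static struct foo foo1 = { .a = 1, .b = { 10, 11, 12, 13 } };
static struct foo *gp = &foo1;	/* Assumed to be RCU-protected. */

int safe_uses(void)
{
	struct foo *p = sketch_rcu_dereference(gp);
	struct foo copy;
	int *bp;
	char *cp;

	copy = *p;		/* Dereference ("*") and assignment ("="). */
	bp = &p->b[0];		/* Field selection ("->") and address-of ("&"). */
	bp = bp + 2;		/* Addition of a constant. */
	bp = bp - 1;		/* Subtraction of a constant: now &p->b[1]. */
	cp = (char *)p;		/* Cast. */
	cp += offsetof(struct foo, a);

	/* Every access above still depends on the value loaded into "p". */
	return copy.a + *bp + *(int *)cp;
}
```
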
It is nevertheless possible to get into trouble with other operations.
Follow these rules to keep your RCU code working properly:

-	You must use one of the rcu_dereference() family of primitives
	to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU
	will complain.  Worse yet, your code can see random memory-corruption
	bugs due to games that compilers and DEC Alpha can play.
	Without one of the rcu_dereference() primitives, compilers
	can reload the value, and won't your code have fun with two
	different values for a single pointer!  Without rcu_dereference(),
	DEC Alpha can load a pointer, dereference that pointer, and
	return pre-initialization data, even though the initialization
	preceded the store of the pointer.  (As noted later, in recent
	kernels READ_ONCE() also prevents DEC Alpha from playing these
	tricks.)

	In addition, the volatile cast in rcu_dereference() prevents the
	compiler from deducing the resulting pointer value.  Please see
	the section entitled "EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH"
	for an example where the compiler can in fact deduce the exact
	value of the pointer, and thus cause misordering.

-	In the special case where data is added but is never removed
	while readers are accessing the structure, READ_ONCE() may be used
	instead of rcu_dereference().  In this case, use of READ_ONCE()
	takes on the role of the lockless_dereference() primitive that
	was removed in v4.15.

-	You are only permitted to use rcu_dereference() on pointer values.
	The compiler simply knows too much about integral values to
	trust it to carry dependencies through integer operations.
	There are a very few exceptions, namely that you can temporarily
	cast the pointer to uintptr_t in order to:

	-	Set bits and clear bits down in the must-be-zero low-order
		bits of that pointer.  This clearly means that the pointer
		must have alignment constraints, for example, this does
		*not* work in general for char* pointers.

	-	XOR bits to translate pointers, as is done in some
		classic buddy-allocator algorithms.

	It is important to cast the value back to pointer before
	doing much of anything else with it.

-	Avoid cancellation when using the "+" and "-" infix arithmetic
	operators.  For example, for a given variable "x", avoid
	"(x-(uintptr_t)x)" for char* pointers.  The compiler is within its
	rights to substitute zero for this sort of expression, so that
	subsequent accesses no longer depend on the rcu_dereference(),
	again possibly resulting in bugs due to misordering.

	Of course, if "p" is a pointer from rcu_dereference(), and "a"
	and "b" are integers that happen to be equal, the expression
	"p+a-b" is safe because its value still necessarily depends on
	the rcu_dereference(), thus maintaining proper ordering.

-	If you are using RCU to protect JITed functions, so that the
	"()" function-invocation operator is applied to a value obtained
	(directly or indirectly) from rcu_dereference(), you may need to
	interact directly with the hardware to flush instruction caches.
	This issue arises on some systems when a newly JITed function is
	using the same memory that was used by an earlier JITed function.

-	Do not use the results from relational operators ("==", "!=",
	">", ">=", "<", or "<=") when dereferencing.  For example,
	the following (quite strange) code is buggy::

		int *p;
		int *q;

		...

		p = rcu_dereference(gp);
		q = &global_q;
		q += p > &oom_p;
		r1 = *q;  /* BUGGY!!! */

	The reason this is buggy is that relational operators are often
	compiled using branches.  Although weak-memory machines such as
	ARM or PowerPC do order stores after such branches, they can
	speculate loads, which can result in misordering bugs.

-	Be very careful about comparing pointers obtained from
	rcu_dereference() against non-NULL values.  As Linus Torvalds
	explained, if the two pointers are equal, the compiler could
	substitute the pointer you are comparing against for the pointer
	obtained from rcu_dereference().  For example::

		p = rcu_dereference(gp);
		if (p == &default_struct)
			do_default(p->a);

	Because the compiler now knows that the value of "p" is exactly
	the address of the variable "default_struct", it is free to
	transform this code into the following::

		p = rcu_dereference(gp);
		if (p == &default_struct)
			do_default(default_struct.a);

	On ARM and Power hardware, the load from "default_struct.a"
	can now be speculated, such that it might happen before the
	rcu_dereference().  This could result in bugs due to misordering.

	However, comparisons are OK in the following cases:

	-	The comparison was against the NULL pointer.  If the
		compiler knows that the pointer is NULL, you had better
		not be dereferencing it anyway.  If the comparison is
		non-equal, the compiler is none the wiser.  Therefore,
		it is safe to compare pointers from rcu_dereference()
		against NULL pointers.

	-	The pointer is never dereferenced after being compared.
		Since there are no subsequent dereferences, the compiler
		cannot use anything it learned from the comparison
		to reorder the non-existent subsequent dereferences.
		This sort of comparison occurs frequently when scanning
		RCU-protected circular linked lists.

		Note that if the pointer comparison is done outside
		of an RCU read-side critical section, and the pointer
		is never dereferenced, rcu_access_pointer() should be
		used in place of rcu_dereference().  In most cases,
		it is best to avoid accidental dereferences by testing
		the rcu_access_pointer() return value directly, without
		assigning it to a variable.

		Within an RCU read-side critical section, there is little
		reason to use rcu_access_pointer().

	-	The comparison is against a pointer that references memory
		that was initialized "a long time ago."  The reason
		this is safe is that even if misordering occurs, the
		misordering will not affect the accesses that follow
		the comparison.  So exactly how long ago is "a long
		time ago"?  Here are some possibilities:

		-	Compile time.

		-	Boot time.

		-	Module-init time for module code.

		-	Prior to kthread creation for kthread code.

		-	During some prior acquisition of the lock that
			we now hold.

		-	Before mod_timer() time for a timer handler.

		There are many other possibilities involving the Linux
		kernel's wide array of primitives that cause code to
		be invoked at a later time.

	-	The pointer being compared against also came from
		rcu_dereference().  In this case, both pointers depend
		on one rcu_dereference() or another, so you get proper
		ordering either way.

		That said, this situation can make certain RCU usage
		bugs more likely to happen.  Which can be a good thing,
		at least if they happen during testing.  An example
		of such an RCU usage bug is shown in the section titled
		"EXAMPLE OF AMPLIFIED RCU-USAGE BUG".

	-	All of the accesses following the comparison are stores,
		so that a control dependency preserves the needed ordering.
		That said, it is easy to get control dependencies wrong.
		Please see the "CONTROL DEPENDENCIES" section of
		Documentation/memory-barriers.txt for more details.

	-	The pointers are not equal *and* the compiler does
		not have enough information to deduce the value of the
		pointer.  Note that the volatile cast in rcu_dereference()
		will normally prevent the compiler from knowing too much.

		However, please note that if the compiler knows that the
		pointer takes on only one of two values, a not-equal
		comparison will provide exactly the information that the
		compiler needs to deduce the value of the pointer.

-	Disable any value-speculation optimizations that your compiler
	might provide, especially if you are making use of feedback-based
	optimizations that take data collected from prior runs.  Such
	value-speculation optimizations reorder operations by design.

	There is one exception to this rule:  Value-speculation
	optimizations that leverage the branch-prediction hardware are
	safe on strongly ordered systems (such as x86), but not on weakly
	ordered systems (such as ARM or Power).  Choose your compiler
	command-line options wisely!

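The uintptr_t exception in the rules above (setting and clearing the
must-be-zero low-order bits) might look roughly like the following
user-space sketch.  It assumes a 4-byte-aligned structure, so that the
two low-order bits are free.  The sketch_rcu_dereference() macro is a
hypothetical stand-in that does only a volatile load, and
publish_tagged() and read_tagged() are names invented for this example.
A real updater would use rcu_assign_pointer().

```c
#include <stdint.h>

/* Hypothetical stand-in: one volatile load, nothing more. */
#define sketch_rcu_dereference(p) (*(volatile __typeof__(p) *)&(p))

/* Assumption: pointers are at least 4-byte aligned, so bits 0-1 are zero. */
#define SKETCH_TAG_MASK ((uintptr_t)0x3)

struct foo {
	int a;
} __attribute__((aligned(4)));

static struct foo foo1 = { .a = 42 };
static struct foo *gp;	/* Holds a pointer with a tag in its low bits. */

/* Update side: fold the tag into the low-order bits and publish. */
void publish_tagged(struct foo *p, unsigned int tag)
{
	gp = (struct foo *)((uintptr_t)p | ((uintptr_t)tag & SKETCH_TAG_MASK));
}

/* Read side: extract the tag, then cast back to a pointer right away. */
int read_tagged(unsigned int *tag)
{
	struct foo *p = sketch_rcu_dereference(gp);
	uintptr_t bits = (uintptr_t)p;

	*tag = (unsigned int)(bits & SKETCH_TAG_MASK);
	p = (struct foo *)(bits & ~SKETCH_TAG_MASK);
	return p->a;	/* Dereference only the cleaned-up pointer. */
}
```

Note that the tag itself is a plain integer, so nothing should rely on
it to carry a dependency.  Only the pointer cast back from "bits" does.
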

EXAMPLE OF AMPLIFIED RCU-USAGE BUG
----------------------------------

Because updaters can run concurrently with RCU readers, RCU readers can
see stale and/or inconsistent values.  If RCU readers need fresh or
consistent values, which they sometimes do, they need to take proper
precautions.  To see this, consider the following code fragment::

	struct foo {
		int a;
		int b;
		int c;
	};
	struct foo *gp1;
	struct foo *gp2;

	void updater(void)
	{
		struct foo *p;

		p = kmalloc(...);
		if (p == NULL)
			deal_with_it();
		p->a = 42;  /* Each field in its own cache line. */
		p->b = 43;
		p->c = 44;
		rcu_assign_pointer(gp1, p);
		p->b = 143;
		p->c = 144;
		rcu_assign_pointer(gp2, p);
	}

	void reader(void)
	{
		struct foo *p;
		struct foo *q;
		int r1, r2;

		rcu_read_lock();
		p = rcu_dereference(gp2);
		if (p == NULL) {
			rcu_read_unlock();  /* Do not leak the read-side critical section. */
			return;
		}
		r1 = p->b;  /* Guaranteed to get 143. */
		q = rcu_dereference(gp1);  /* Guaranteed non-NULL. */
		if (p == q) {
			/* The compiler decides that q->c is same as p->c. */
			r2 = p->c; /* Could get 44 on weakly ordered system. */
		} else {
			r2 = p->c - r1; /* Unconditional access to p->c. */
		}
		rcu_read_unlock();
		do_something_with(r1, r2);
	}

You might be surprised that the outcome (r1 == 143 && r2 == 44) is possible,
but you should not be.  After all, the updater might have been invoked
a second time between the time reader() loaded into "r1" and the time
that it loaded into "r2".  The fact that this same result can occur due
to some reordering from the compiler and CPUs is beside the point.

But suppose that the reader needs a consistent view?

Then one approach is to use locking, for example, as follows::

	struct foo {
		int a;
		int b;
		int c;
		spinlock_t lock;
	};
	struct foo *gp1;
	struct foo *gp2;

	void updater(void)
	{
		struct foo *p;

		p = kmalloc(...);
		if (p == NULL)
			deal_with_it();
		spin_lock_init(&p->lock);  /* Initialize before first use. */
		spin_lock(&p->lock);
		p->a = 42;  /* Each field in its own cache line. */
		p->b = 43;
		p->c = 44;
		spin_unlock(&p->lock);
		rcu_assign_pointer(gp1, p);
		spin_lock(&p->lock);
		p->b = 143;
		p->c = 144;
		spin_unlock(&p->lock);
		rcu_assign_pointer(gp2, p);
	}

	void reader(void)
	{
		struct foo *p;
		struct foo *q;
		int r1, r2;

		rcu_read_lock();
		p = rcu_dereference(gp2);
		if (p == NULL) {
			rcu_read_unlock();  /* Do not leak the read-side critical section. */
			return;
		}
		spin_lock(&p->lock);
		r1 = p->b;  /* Guaranteed to get 143. */
		q = rcu_dereference(gp1);  /* Guaranteed non-NULL. */
		if (p == q) {
			/* The compiler decides that q->c is same as p->c. */
			r2 = p->c; /* Locking guarantees r2 == 144. */
		} else {
			spin_lock(&q->lock);
			r2 = q->c - r1;
			spin_unlock(&q->lock);
		}
		spin_unlock(&p->lock);  /* Release while "p" is still RCU-protected. */
		rcu_read_unlock();
		do_something_with(r1, r2);
	}

As always, use the right tool for the job!


EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH
-----------------------------------------

If a pointer obtained from rcu_dereference() compares not-equal to some
other pointer, the compiler normally has no clue what the value of the
first pointer might be.  This lack of knowledge prevents the compiler
from carrying out optimizations that otherwise might destroy the ordering
guarantees that RCU depends on.  And the volatile cast in rcu_dereference()
should prevent the compiler from guessing the value.

But without rcu_dereference(), the compiler knows more than you might
expect.  Consider the following code fragment::

	struct foo {
		int a;
		int b;
	};
	static struct foo variable1;
	static struct foo variable2;
	static struct foo *gp = &variable1;

	void updater(void)
	{
		initialize_foo(&variable2);
		rcu_assign_pointer(gp, &variable2);
		/*
		 * The above is the only store to gp in this translation unit,
		 * and the address of gp is not exported in any way.
		 */
	}

	int reader(void)
	{
		struct foo *p;

		p = gp;
		barrier();
		if (p == &variable1)
			return p->a; /* Must be variable1.a. */
		else
			return p->b; /* Must be variable2.b. */
	}

Because the compiler can see all stores to "gp", it knows that the only
possible values of "gp" are "&variable1" on the one hand and "&variable2"
on the other.  The comparison in reader() therefore tells the compiler
the exact value of "p" even in the not-equals case.  This allows the
compiler to make the return values independent of the load from "gp",
in turn destroying the ordering between this load and the loads of the
return values.  This can result in "p->b" returning pre-initialization
garbage values on weakly ordered systems.

In short, rcu_dereference() is *not* optional when you are going to
dereference the resulting pointer.

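A minimal user-space sketch of the reader rewritten along those lines
follows.  The sketch_rcu_dereference() macro is a hypothetical stand-in
that supplies only the volatile load, the field values are invented, and
the GCC/Clang __atomic_store_n() builtin stands in for
rcu_assign_pointer().  Kernel code would of course use the real
primitives.

```c
/* Hypothetical stand-in: the volatile cast hides the value from the compiler. */
#define sketch_rcu_dereference(p) (*(volatile __typeof__(p) *)&(p))

struct foo {
	int a;
	int b;
};
static struct foo variable1 = { .a = 1, .b = 2 };
static struct foo variable2;
static struct foo *gp = &variable1;

void updater(void)
{
	variable2.a = 3;	/* Stand-in for initialize_foo(). */
	variable2.b = 4;
	/* Release store: initialization is ordered before publication. */
	__atomic_store_n(&gp, &variable2, __ATOMIC_RELEASE);
}

int reader(void)
{
	struct foo *p;

	p = sketch_rcu_dereference(gp);	/* Instead of a plain "p = gp". */
	if (p == &variable1)
		return p->a;	/* variable1 was initialized at compile time. */
	else
		return p->b;	/* Still depends on the volatile load of gp. */
}
```
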

WHICH MEMBER OF THE rcu_dereference() FAMILY SHOULD YOU USE?
------------------------------------------------------------

First, please avoid using rcu_dereference_raw() and also please avoid
using rcu_dereference_check() and rcu_dereference_protected() with a
second argument with a constant value of 1 (or true, for that matter).
With that caution out of the way, here is some guidance for which
member of the rcu_dereference() family to use in various situations:

1.	If the access needs to be within an RCU read-side critical
	section, use rcu_dereference().  With the new consolidated
	RCU flavors, an RCU read-side critical section is entered
	using rcu_read_lock(), anything that disables bottom halves,
	anything that disables interrupts, or anything that disables
	preemption.

2.	If the access might be within an RCU read-side critical section
	on the one hand, or protected by (say) my_lock on the other,
	use rcu_dereference_check(), for example::

		p1 = rcu_dereference_check(p->rcu_protected_pointer,
					   lockdep_is_held(&my_lock));

3.	If the access might be within an RCU read-side critical section
	on the one hand, or protected by either my_lock or your_lock on
	the other, again use rcu_dereference_check(), for example::

		p1 = rcu_dereference_check(p->rcu_protected_pointer,
					   lockdep_is_held(&my_lock) ||
					   lockdep_is_held(&your_lock));

4.	If the access is on the update side, so that it is always protected
	by my_lock, use rcu_dereference_protected()::

		p1 = rcu_dereference_protected(p->rcu_protected_pointer,
					       lockdep_is_held(&my_lock));

	This can be extended to handle multiple locks as in #3 above,
	and both can be extended to check other conditions as well.

5.	If the protection is supplied by the caller, and is thus unknown
	to this code, that is the rare case when rcu_dereference_raw()
	is appropriate.  In addition, rcu_dereference_raw() might be
	appropriate when the lockdep expression would be excessively
	complex, except that a better approach in that case might be to
	take a long hard look at your synchronization design.  Still,
	there are data-locking cases where any one of a very large number
	of locks or reference counters suffices to protect the pointer,
	so rcu_dereference_raw() does have its place.

	However, its place is probably quite a bit smaller than one
	might expect given the number of uses in the current kernel.
	Ditto for its synonym, rcu_dereference_check( ... , 1), and
	its close relative, rcu_dereference_protected(... , 1).

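To make the difference between cases 2 and 4 above a bit more concrete,
here is a toy user-space model.  Everything prefixed with "sketch_" is
hypothetical, as are the "my_lock_held" flag and the nesting counter
that stand in for lockdep and rcu_read_lock().  The model only mirrors
the idea that the "check" variant accepts either a read-side critical
section or the stated condition, while the "protected" variant accepts
only the stated condition.

```c
#include <assert.h>
#include <stdbool.h>

#define sketch_read_once(p) (*(volatile __typeof__(p) *)&(p))

static int sketch_reader_nesting;	/* Stand-in for RCU reader state. */
static bool my_lock_held;		/* Stand-in for lockdep_is_held(&my_lock). */

static bool sketch_in_reader(void)
{
	return sketch_reader_nesting > 0;
}

/* Reader or condition: models rcu_dereference_check(). */
#define sketch_dereference_check(p, c) \
	(assert(sketch_in_reader() || (c)), sketch_read_once(p))

/* Condition only, plain load: models rcu_dereference_protected(). */
#define sketch_dereference_protected(p, c) \
	(assert(c), (p))

struct foo {
	int a;
};
static struct foo foo1 = { .a = 7 };
static struct foo *rcu_protected_pointer = &foo1;

int read_side(void)
{
	struct foo *p;
	int a;

	sketch_reader_nesting++;	/* Stand-in for rcu_read_lock(). */
	p = sketch_dereference_check(rcu_protected_pointer, my_lock_held);
	a = p->a;
	sketch_reader_nesting--;	/* Stand-in for rcu_read_unlock(). */
	return a;
}

int update_side(void)
{
	struct foo *p;
	int a;

	my_lock_held = true;		/* Stand-in for spin_lock(&my_lock). */
	p = sketch_dereference_protected(rcu_protected_pointer, my_lock_held);
	a = p->a;
	my_lock_held = false;		/* Stand-in for spin_unlock(&my_lock). */
	return a;
}
```

Calling update_side() without setting "my_lock_held" would trip the
assertion, which is the user-space analog of a lockdep splat.
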

SPARSE CHECKING OF RCU-PROTECTED POINTERS
-----------------------------------------

The sparse static-analysis tool checks for non-RCU access to RCU-protected
pointers, which can result in "interesting" bugs due to compiler
optimizations involving invented loads and perhaps also load tearing.
For example, suppose someone mistakenly does something like this::

	p = q->rcu_protected_pointer;
	do_something_with(p->a);
	do_something_else_with(p->b);

If register pressure is high, the compiler might optimize "p" out
of existence, transforming the code to something like this::

	do_something_with(q->rcu_protected_pointer->a);
	do_something_else_with(q->rcu_protected_pointer->b);

This could fatally disappoint your code if q->rcu_protected_pointer
changed in the meantime.  Nor is this a theoretical problem:  Exactly
this sort of bug cost Paul E. McKenney (and several of his innocent
colleagues) a three-day weekend back in the early 1990s.

Load tearing could of course result in dereferencing a mashup of a pair
of pointers, which also might fatally disappoint your code.

These problems could have been avoided simply by making the code instead
read as follows::

	p = rcu_dereference(q->rcu_protected_pointer);
	do_something_with(p->a);
	do_something_else_with(p->b);

Unfortunately, these sorts of bugs can be extremely hard to spot during
review.  This is where the sparse tool comes into play, along with the
"__rcu" marker.  If you mark a pointer declaration, whether in a structure
or as a formal parameter, with "__rcu", sparse will complain if this
pointer is accessed directly.  Sparse will also complain if a pointer
not marked with "__rcu" is accessed using rcu_dereference() and friends.
For example, ->rcu_protected_pointer might be declared as follows::

	struct foo __rcu *rcu_protected_pointer;

Use of "__rcu" is opt-in.  If you choose not to use it, then you should
ignore the sparse warnings.
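
For readers who want to experiment outside the kernel tree, the
following sketch shows one way the annotation and accessor might fit
together.  The "__rcu" and "__force" definitions are modeled on the
kernel's sparse-only attributes but are assumptions here, "struct bar"
and get_a() are invented names, and sketch_rcu_dereference() is a
hypothetical stand-in.  Only the ordinary (non-sparse) build, where both
markers expand to nothing, is exercised.

```c
#ifdef __CHECKER__
/* Assumed sparse-only definitions, modeled on the kernel's. */
# define __rcu		__attribute__((noderef, address_space(__rcu)))
# define __force	__attribute__((force))
#else
/* Ordinary compilers see plain pointers. */
# define __rcu
# define __force
#endif

/* Hypothetical stand-in: volatile load, then strip the __rcu marking. */
#define sketch_rcu_dereference(p) \
	((__typeof__(*(p)) __force *)(*(volatile __typeof__(p) *)&(p)))

struct foo {
	int a;
};

struct bar {
	struct foo __rcu *rcu_protected_pointer;
};

int get_a(struct bar *q)
{
	/* Access goes through the accessor, never through the raw field. */
	struct foo *p = sketch_rcu_dereference(q->rcu_protected_pointer);

	return p->a;
}
```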
487