.. _whatisrcu_doc:

What is RCU?  --  "Read, Copy, Update"
======================================

Please note that the "What is RCU?" LWN series is an excellent place
to start learning about RCU:

| 1.	What is RCU, Fundamentally?  https://lwn.net/Articles/262464/
| 2.	What is RCU? Part 2: Usage   https://lwn.net/Articles/263130/
| 3.	RCU part 3: the RCU API      https://lwn.net/Articles/264090/
| 4.	The RCU API, 2010 Edition    https://lwn.net/Articles/418853/
|	2010 Big API Table           https://lwn.net/Articles/419086/
| 5.	The RCU API, 2014 Edition    https://lwn.net/Articles/609904/
|	2014 Big API Table           https://lwn.net/Articles/609973/
| 6.	The RCU API, 2019 Edition    https://lwn.net/Articles/777036/
|	2019 Big API Table           https://lwn.net/Articles/777165/

For those preferring video:

| 1.	Unraveling RCU Mysteries: Fundamentals          https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries
| 2.	Unraveling RCU Mysteries: Additional Use Cases  https://www.linuxfoundation.org/webinars/unraveling-rcu-usage-mysteries-additional-use-cases


What is RCU?

RCU is a synchronization mechanism, added to the Linux kernel during
the 2.5 development effort, that is optimized for read-mostly
situations.  Although RCU is actually quite simple, making effective use
of it requires you to think differently about your code.  Another part
of the problem is the mistaken assumption that there is "one true way" to
describe and to use RCU.  Instead, the experience has been that different
people must take different paths to arrive at an understanding of RCU,
depending on their experiences and use cases.  This document provides
several different paths, as follows:

:ref:`1.	RCU OVERVIEW <1_whatisRCU>`

:ref:`2.	WHAT IS RCU'S CORE API? <2_whatisRCU>`

:ref:`3.	WHAT ARE SOME EXAMPLE USES OF CORE RCU API? <3_whatisRCU>`

:ref:`4.	WHAT IF MY UPDATING THREAD CANNOT BLOCK? <4_whatisRCU>`

:ref:`5.	WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU? <5_whatisRCU>`

:ref:`6.	ANALOGY WITH READER-WRITER LOCKING <6_whatisRCU>`

:ref:`7.	ANALOGY WITH REFERENCE COUNTING <7_whatisRCU>`

:ref:`8.	FULL LIST OF RCU APIs <8_whatisRCU>`

:ref:`9.	ANSWERS TO QUICK QUIZZES <9_whatisRCU>`

People who prefer starting with a conceptual overview should focus on
Section 1, though most readers will profit by reading this section at
some point.  People who prefer to start with an API that they can then
experiment with should focus on Section 2.  People who prefer to start
with example uses should focus on Sections 3 and 4.  People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code.  People who reason best by analogy should
focus on Sections 6 and 7.  Section 8 serves as an index to the docbook
API documentation, and Section 9 is the traditional answer key.

So, start with the section that makes the most sense to you and your
preferred method of learning.  If you need to know everything about
everything, feel free to read the whole thing -- but if you are really
that type of person, you have perused the source code and will therefore
never need this document anyway.  ;-)

.. _1_whatisRCU:

1.  RCU OVERVIEW
----------------

The basic idea behind RCU is to split updates into "removal" and
"reclamation" phases.  The removal phase removes references to data items
within a data structure (possibly by replacing them with references to
new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is that the semantics of modern CPUs guarantee that readers will
see either the old or the new version of the data structure rather than
a partially updated reference.  The reclamation phase does the work of
reclaiming (e.g., freeing) the data items removed from the data structure
during the removal phase.  Because reclaiming data items can disrupt any
readers concurrently referencing those data items, the reclamation phase
must not start until readers no longer hold references to those data
items.

Splitting the update into removal and reclamation phases permits the
updater to perform the removal phase immediately, and to defer the
reclamation phase until all readers active during the removal phase have
completed, either by blocking until they finish or by registering a
callback that is invoked after they finish.  Only readers that are active
during the removal phase need be considered, because any reader starting
after the removal phase will be unable to gain a reference to the removed
data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following:

a.	Remove pointers to a data structure, so that subsequent
	readers cannot gain a reference to it.

b.	Wait for all previous readers to complete their RCU read-side
	critical sections.

c.	At this point, there cannot be any readers who hold references
	to the data structure, so it now may safely be reclaimed
	(e.g., kfree()d).

Step (b) above is the key idea underlying RCU's deferred destruction.
The ability to wait until all readers are done allows RCU readers to
use much lighter-weight synchronization, in some cases, absolutely no
synchronization at all.  In contrast, in more conventional lock-based
schemes, readers must use heavy-weight synchronization in order to
prevent an updater from deleting the data structure out from under them.
This is because lock-based updaters typically update data items in place,
and must therefore exclude readers.  In contrast, RCU-based updaters
typically take advantage of the fact that writes to single aligned
pointers are atomic on modern CPUs, allowing atomic insertion, removal,
and replacement of data items in a linked structure without disrupting
readers.  Concurrent RCU readers can then continue accessing the old
versions, and can dispense with the atomic operations, memory barriers,
and communications cache misses that are so expensive on present-day
SMP computer systems, even in the absence of lock contention.

In the three-step procedure shown above, the updater is performing both
the removal and the reclamation step, but it is often helpful for an
entirely different thread to do the reclamation, as is in fact the case
in the Linux kernel's directory-entry cache (dcache).  Even if the same
thread performs both the update step (step (a) above) and the reclamation
step (step (c) above), it is often helpful to think of them separately.
For example, RCU readers and updaters need not communicate at all,
but RCU provides implicit low-overhead communication between readers
and reclaimers, namely, in step (b) above.

So how the heck can a reclaimer tell when a reader is done, given
that readers are not doing any sort of synchronization operations???
Read on to learn about how RCU's API makes this easy.

.. _2_whatisRCU:

2.  WHAT IS RCU'S CORE API?
---------------------------

The core RCU API is quite small:

a.	rcu_read_lock()
b.	rcu_read_unlock()
c.	synchronize_rcu() / call_rcu()
d.	rcu_assign_pointer()
e.	rcu_dereference()

There are many other members of the RCU API, but the rest can be
expressed in terms of these five, though most implementations instead
express synchronize_rcu() in terms of the call_rcu() callback API.

The five core RCU APIs are described below; the other 18 will be
enumerated later.  See the kernel docbook documentation for more info,
or look directly at the function header comments.

rcu_read_lock()
^^^^^^^^^^^^^^^
	void rcu_read_lock(void);

	This temporal primitive is used by a reader to inform the
	reclaimer that the reader is entering an RCU read-side critical
	section.  It is illegal to block while in an RCU read-side
	critical section, though kernels built with CONFIG_PREEMPT_RCU
	can preempt RCU read-side critical sections.  Any RCU-protected
	data structure accessed during an RCU read-side critical section
	is guaranteed to remain unreclaimed for the full duration of that
	critical section.  Reference counts may be used in conjunction
	with RCU to maintain longer-term references to data structures.

	Note that anything that disables bottom halves, preemption,
	or interrupts also enters an RCU read-side critical section.
	Acquiring a spinlock also enters an RCU read-side critical
	section, even for spinlocks that do not disable preemption,
	as is the case in kernels built with CONFIG_PREEMPT_RT=y.
	Sleeplocks do *not* enter RCU read-side critical sections.

rcu_read_unlock()
^^^^^^^^^^^^^^^^^
	void rcu_read_unlock(void);

	This temporal primitive is used by a reader to inform the
	reclaimer that the reader is exiting an RCU read-side critical
	section.  Anything that enables bottom halves, preemption,
	or interrupts also exits an RCU read-side critical section.
	Releasing a spinlock also exits an RCU read-side critical section.

	Note that RCU read-side critical sections may be nested and/or
	overlapping.

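	For example, the following fragment shows a legally nested pair
	of RCU read-side critical sections (a sketch; gp is a hypothetical
	``__rcu`` pointer and the do_something_*() calls are placeholders,
	perhaps living in a called function)::

		rcu_read_lock();
		p = rcu_dereference(gp);
		rcu_read_lock();	/* Nesting is legal. */
		do_something_quickly(p);
		rcu_read_unlock();	/* Outer critical section still active... */
		do_something_else(p);	/* ...so p may still be used here. */
		rcu_read_unlock();	/* Reader is now done. */
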
synchronize_rcu()
^^^^^^^^^^^^^^^^^
	void synchronize_rcu(void);

	This temporal primitive marks the end of updater code and the
	beginning of reclaimer code.  It does this by blocking until
	all pre-existing RCU read-side critical sections on all CPUs
	have completed.  Note that synchronize_rcu() will **not**
	necessarily wait for any subsequent RCU read-side critical
	sections to complete.  For example, consider the following
	sequence of events::

	         CPU 0                  CPU 1                 CPU 2
	     ----------------- ------------------------- ---------------
	 1.  rcu_read_lock()
	 2.                    enters synchronize_rcu()
	 3.                                               rcu_read_lock()
	 4.  rcu_read_unlock()
	 5.                     exits synchronize_rcu()
	 6.                                              rcu_read_unlock()

	To reiterate, synchronize_rcu() waits only for ongoing RCU
	read-side critical sections to complete, not necessarily for
	any that begin after synchronize_rcu() is invoked.

	Of course, synchronize_rcu() does not necessarily return
	**immediately** after the last pre-existing RCU read-side critical
	section completes.  For one thing, there might well be scheduling
	delays.  For another thing, many RCU implementations process
	requests in batches in order to improve efficiency, which can
	further delay synchronize_rcu().

	Since synchronize_rcu() is the API that must figure out when
	readers are done, its implementation is key to RCU.  For RCU
	to be useful in all but the most read-intensive situations,
	synchronize_rcu()'s overhead must also be quite small.

	The call_rcu() API is an asynchronous callback form of
	synchronize_rcu(), and is described in more detail in a later
	section.  Instead of blocking, it registers a function and
	argument which are invoked after all ongoing RCU read-side
	critical sections have completed.  This callback variant is
	particularly useful in situations where it is illegal to block
	or where update-side performance is critically important.

	However, the call_rcu() API should not be used lightly, as use
	of the synchronize_rcu() API generally results in simpler code.
	In addition, the synchronize_rcu() API has the nice property
	of automatically limiting update rate should grace periods
	be delayed.  This property results in system resilience in the
	face of denial-of-service attacks.  Code using call_rcu() should
	limit update rate in order to gain this same sort of resilience.
	See checklist.rst for some approaches to limiting the update rate.

rcu_assign_pointer()
^^^^^^^^^^^^^^^^^^^^
	void rcu_assign_pointer(p, typeof(p) v);

	Yes, rcu_assign_pointer() **is** implemented as a macro, though it
	would be cool to be able to declare a function in this manner.
	(Compiler experts will no doubt disagree.)

	The updater uses this spatial macro to assign a new value to an
	RCU-protected pointer, in order to safely communicate the change
	in value from the updater to the reader.  This is a spatial (as
	opposed to temporal) macro.  It does not evaluate to an rvalue,
	but it does execute any memory-barrier instructions required
	for a given CPU architecture.  Its ordering properties are that
	of a store-release operation.

	Perhaps just as important, it serves to document (1) which
	pointers are protected by RCU and (2) the point at which a
	given structure becomes accessible to other CPUs.  That said,
	rcu_assign_pointer() is most frequently used indirectly, via
	the _rcu list-manipulation primitives such as list_add_rcu().

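	For example, a minimal initialize-then-publish sketch might look
	as follows (struct foo and gbl_foo are the placeholders used in
	Section 3 below; error handling is omitted)::

		struct foo *p = kmalloc(sizeof(*p), GFP_KERNEL);

		p->a = 42;	/* Fully initialize the structure... */
		rcu_assign_pointer(gbl_foo, p);	/* ...then publish it. */

	The store-release semantics of rcu_assign_pointer() guarantee
	that a reader who sees the new value of gbl_foo also sees the
	initialized value of p->a.
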
rcu_dereference()
^^^^^^^^^^^^^^^^^
	typeof(p) rcu_dereference(p);

	Like rcu_assign_pointer(), rcu_dereference() must be implemented
	as a macro.

	The reader uses the spatial rcu_dereference() macro to fetch
	an RCU-protected pointer, which returns a value that may
	then be safely dereferenced.  Note that rcu_dereference()
	does not actually dereference the pointer; instead, it
	protects the pointer for later dereferencing.  It also
	executes any needed memory-barrier instructions for a given
	CPU architecture.  Currently, only Alpha needs memory barriers
	within rcu_dereference() -- on other CPUs, it compiles to a
	volatile load.

	Common coding practice uses rcu_dereference() to copy an
	RCU-protected pointer to a local variable, then dereferences
	this local variable, for example as follows::

		p = rcu_dereference(head.next);
		return p->data;

	However, in this case, one could just as easily combine these
	into one statement::

		return rcu_dereference(head.next)->data;

	If you are going to be fetching multiple fields from the
	RCU-protected structure, using the local variable is of
	course preferred.  Repeated rcu_dereference() calls look
	ugly, do not guarantee that the same pointer will be returned
	if an update happened while in the critical section, and incur
	unnecessary overhead on Alpha CPUs.

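	For example, a reader needing several fields should fetch the
	pointer once (a sketch; the field names are hypothetical)::

		p = rcu_dereference(head.next);
		a = p->data;		/* Both loads are from the... */
		b = p->more_data;	/* ...same version of the object. */

	whereas two separate rcu_dereference(head.next) calls might
	return pointers to two different versions of the structure if
	an update happened in the meantime.
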
	Note that the value returned by rcu_dereference() is valid
	only within the enclosing RCU read-side critical section [1]_.
	For example, the following is **not** legal::

		rcu_read_lock();
		p = rcu_dereference(head.next);
		rcu_read_unlock();
		x = p->address;	/* BUG!!! */
		rcu_read_lock();
		y = p->data;	/* BUG!!! */
		rcu_read_unlock();

	Holding a reference from one RCU read-side critical section
	to another is just as illegal as holding a reference from
	one lock-based critical section to another!  Similarly,
	using a reference outside of the critical section in which
	it was acquired is just as illegal as doing so with normal
	locking.

	As with rcu_assign_pointer(), an important function of
	rcu_dereference() is to document which pointers are protected by
	RCU, in particular, flagging a pointer that is subject to changing
	at any time, including immediately after the rcu_dereference().
	And, again like rcu_assign_pointer(), rcu_dereference() is
	typically used indirectly, via the _rcu list-manipulation
	primitives, such as list_for_each_entry_rcu() [2]_.

.. 	[1] The variant rcu_dereference_protected() can be used outside
	of an RCU read-side critical section as long as the usage is
	protected by locks acquired by the update-side code.  This variant
	avoids the lockdep warning that would happen when using (for
	example) rcu_dereference() without rcu_read_lock() protection.
	Using rcu_dereference_protected() also has the advantage
	of permitting compiler optimizations that rcu_dereference()
	must prohibit.  The rcu_dereference_protected() variant takes
	a lockdep expression to indicate which locks must be acquired
	by the caller.  If the indicated protection is not provided,
	a lockdep splat is emitted.  See Design/Requirements/Requirements.rst
	and the API's code comments for more details and example usage.

.. 	[2] If the list_for_each_entry_rcu() instance might be used by
	update-side code as well as by RCU readers, then an additional
	lockdep expression can be added to its list of arguments.
	For example, given an additional "lock_is_held(&mylock)" argument,
	the RCU lockdep code would complain only if this instance was
	invoked outside of an RCU read-side critical section and without
	the protection of mylock.
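
	Such a shared traversal might look as follows (a sketch;
	mylist, mylock, the ->list member, and do_something_with()
	are hypothetical, with lockdep_is_held() supplying the lockdep
	expression)::

		list_for_each_entry_rcu(p, &mylist, list,
					lockdep_is_held(&mylock)) {
			do_something_with(p);
		}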

The following diagram shows how each API communicates among the
reader, updater, and reclaimer.
::


	    rcu_assign_pointer()
	                            +--------+
	    +---------------------->| reader |---------+
	    |                       +--------+         |
	    |                           |              |
	    |                           |              | Protect:
	    |                           |              | rcu_read_lock()
	    |                           |              | rcu_read_unlock()
	    |        rcu_dereference()  |              |
	    +---------+                 |              |
	    | updater |<----------------+              |
	    +---------+                                V
	    |                                    +-----------+
	    +----------------------------------->| reclaimer |
	                                         +-----------+
	      Defer:
	      synchronize_rcu() & call_rcu()


The RCU infrastructure observes the temporal sequence of rcu_read_lock(),
rcu_read_unlock(), synchronize_rcu(), and call_rcu() invocations in
order to determine when (1) synchronize_rcu() invocations may return
to their callers and (2) call_rcu() callbacks may be invoked.  Efficient
implementations of the RCU infrastructure make heavy use of batching in
order to amortize their overhead over many uses of the corresponding APIs.
The rcu_assign_pointer() and rcu_dereference() invocations communicate
spatial changes via stores to and loads from the RCU-protected pointer in
question.

There are at least three flavors of RCU usage in the Linux kernel. The diagram
above shows the most common one. On the updater side, the rcu_assign_pointer(),
synchronize_rcu() and call_rcu() primitives used are the same for all three
flavors. However, for protection (on the reader side), the primitives used
vary depending on the flavor:

a.	rcu_read_lock() / rcu_read_unlock()
	rcu_dereference()

b.	rcu_read_lock_bh() / rcu_read_unlock_bh()
	local_bh_disable() / local_bh_enable()
	rcu_dereference_bh()

c.	rcu_read_lock_sched() / rcu_read_unlock_sched()
	preempt_disable() / preempt_enable()
	local_irq_save() / local_irq_restore()
	hardirq enter / hardirq exit
	NMI enter / NMI exit
	rcu_dereference_sched()

These three flavors are used as follows:

a.	RCU applied to normal data structures.

b.	RCU applied to networking data structures that may be subjected
	to remote denial-of-service attacks.

c.	RCU applied to scheduler and interrupt/NMI-handler tasks.

Again, most uses will be of (a).  The (b) and (c) cases are important
for specialized uses, but are relatively uncommon.  The SRCU, RCU-Tasks,
RCU-Tasks-Rude, and RCU-Tasks-Trace have similar relationships among
their assorted primitives.

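For example, a flavor (b) reader protecting a networking data structure
might look as follows (a sketch; gbl_net_conf and do_something_with()
are hypothetical)::

	rcu_read_lock_bh();
	p = rcu_dereference_bh(gbl_net_conf);
	if (p)
		do_something_with(p);
	rcu_read_unlock_bh();
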
.. _3_whatisRCU:

3.  WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
-----------------------------------------------

This section shows a simple use of the core RCU API to protect a
global pointer to a dynamically allocated structure.  More-typical
uses of RCU may be found in listRCU.rst, arrayRCU.rst, and NMI-RCU.rst.
::

	struct foo {
		int a;
		char b;
		long c;
	};
	DEFINE_SPINLOCK(foo_mutex);

	struct foo __rcu *gbl_foo;

	/*
	 * Create a new struct foo that is the same as the one currently
	 * pointed to by gbl_foo, except that field "a" is replaced
	 * with "new_a".  Points gbl_foo to the new structure, and
	 * frees up the old structure after a grace period.
	 *
	 * Uses rcu_assign_pointer() to ensure that concurrent readers
	 * see the initialized version of the new structure.
	 *
	 * Uses synchronize_rcu() to ensure that any readers that might
	 * have references to the old structure complete before freeing
	 * the old structure.
	 */
	void foo_update_a(int new_a)
	{
		struct foo *new_fp;
		struct foo *old_fp;

		new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
		spin_lock(&foo_mutex);
		old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
		*new_fp = *old_fp;
		new_fp->a = new_a;
		rcu_assign_pointer(gbl_foo, new_fp);
		spin_unlock(&foo_mutex);
		synchronize_rcu();
		kfree(old_fp);
	}

	/*
	 * Return the value of field "a" of the current gbl_foo
	 * structure.  Use rcu_read_lock() and rcu_read_unlock()
	 * to ensure that the structure does not get deleted out
	 * from under us, and use rcu_dereference() to ensure that
	 * we see the initialized version of the structure (important
	 * for DEC Alpha and for people reading the code).
	 */
	int foo_get_a(void)
	{
		int retval;

		rcu_read_lock();
		retval = rcu_dereference(gbl_foo)->a;
		rcu_read_unlock();
		return retval;
	}

So, to sum up:

-	Use rcu_read_lock() and rcu_read_unlock() to guard RCU
	read-side critical sections.

-	Within an RCU read-side critical section, use rcu_dereference()
	to dereference RCU-protected pointers.

-	Use some solid design (such as locks or semaphores) to
	keep concurrent updates from interfering with each other.

-	Use rcu_assign_pointer() to update an RCU-protected pointer.
	This primitive protects concurrent readers from the updater,
	**not** concurrent updates from each other!  You therefore still
	need to use locking (or something similar) to keep concurrent
	rcu_assign_pointer() primitives from interfering with each other.

-	Use synchronize_rcu() **after** removing a data element from an
	RCU-protected data structure, but **before** reclaiming/freeing
	the data element, in order to wait for the completion of all
	RCU read-side critical sections that might be referencing that
	data item.

See checklist.rst for additional rules to follow when using RCU.
And again, more-typical uses of RCU may be found in listRCU.rst,
arrayRCU.rst, and NMI-RCU.rst.

.. _4_whatisRCU:

4.  WHAT IF MY UPDATING THREAD CANNOT BLOCK?
--------------------------------------------

In the example above, foo_update_a() blocks until a grace period elapses.
This is quite simple, but in some cases one cannot afford to wait so
long -- there might be other high-priority work to be done.

In such cases, one uses call_rcu() rather than synchronize_rcu().
The call_rcu() API is as follows::

	void call_rcu(struct rcu_head *head, rcu_callback_t func);

This function invokes func(head) after a grace period has elapsed.
This invocation might happen from either softirq or process context,
so the function is not permitted to block.  The foo struct needs to
have an rcu_head structure added, perhaps as follows::

	struct foo {
		int a;
		char b;
		long c;
		struct rcu_head rcu;
	};

The foo_update_a() function might then be written as follows::

	/*
	 * Create a new struct foo that is the same as the one currently
	 * pointed to by gbl_foo, except that field "a" is replaced
	 * with "new_a".  Points gbl_foo to the new structure, and
	 * frees up the old structure after a grace period.
	 *
	 * Uses rcu_assign_pointer() to ensure that concurrent readers
	 * see the initialized version of the new structure.
	 *
	 * Uses call_rcu() to ensure that any readers that might have
	 * references to the old structure complete before freeing the
	 * old structure.
	 */
	void foo_update_a(int new_a)
	{
		struct foo *new_fp;
		struct foo *old_fp;

		new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
		spin_lock(&foo_mutex);
		old_fp = rcu_dereference_protected(gbl_foo, lockdep_is_held(&foo_mutex));
		*new_fp = *old_fp;
		new_fp->a = new_a;
		rcu_assign_pointer(gbl_foo, new_fp);
		spin_unlock(&foo_mutex);
		call_rcu(&old_fp->rcu, foo_reclaim);
	}

The foo_reclaim() function might appear as follows::

	void foo_reclaim(struct rcu_head *rp)
	{
		struct foo *fp = container_of(rp, struct foo, rcu);

		foo_cleanup(fp->a);

		kfree(fp);
	}

The container_of() primitive is a macro that, given a pointer into a
struct, the type of the struct, and the pointed-to field within the
struct, returns a pointer to the beginning of the struct.

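For example, the container_of() invocation in foo_reclaim() above is
roughly equivalent to the following pointer arithmetic (an illustration
only; the real macro also performs type checking)::

	struct foo *fp = (struct foo *)((char *)rp - offsetof(struct foo, rcu));
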
The use of call_rcu() permits the caller of foo_update_a() to
immediately regain control, without needing to worry further about the
old version of the newly updated element.  It also clearly shows the
RCU distinction between updater, namely foo_update_a(), and reclaimer,
namely foo_reclaim().

The summary of advice is the same as for the previous section, except
that we are now using call_rcu() rather than synchronize_rcu():

-	Use call_rcu() **after** removing a data element from an
	RCU-protected data structure in order to register a callback
	function that will be invoked after the completion of all RCU
	read-side critical sections that might be referencing that
	data item.

If the callback for call_rcu() is not doing anything more than calling
kfree() on the structure, you can use kfree_rcu() instead of call_rcu()
to avoid having to write your own callback::

	kfree_rcu(old_fp, rcu);

If the occasional sleep is permitted, the single-argument form may
be used, omitting the rcu_head structure from struct foo::

	kfree_rcu_mightsleep(old_fp);

This variant almost never blocks, but might do so by invoking
synchronize_rcu() in response to memory-allocation failure.

Again, see checklist.rst for additional rules governing the use of RCU.

.. _5_whatisRCU:

5.  WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
------------------------------------------------

One of the nice things about RCU is that it has extremely simple "toy"
implementations that are a good first step towards understanding the
production-quality implementations in the Linux kernel.  This section
presents two such "toy" implementations of RCU, one that is implemented
in terms of familiar locking primitives, and another that more closely
resembles "classic" RCU.  Both are way too simple for real-world use,
lacking both functionality and performance.  However, they are useful
in getting a feel for how RCU works.  See kernel/rcu/update.c for a
production-quality implementation, and see:

	https://docs.google.com/document/d/1X0lThx8OK0ZgLMqVoXiR4ZrGURHrXK6NyLRbeXe3Xac/edit

for papers describing the Linux kernel RCU implementation.  The OLS'01
and OLS'02 papers are a good introduction, and the dissertation provides
more details on the current implementation as of early 2004.


5A.  "TOY" IMPLEMENTATION #1: LOCKING
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section presents a "toy" RCU implementation that is based on
familiar locking primitives.  Its overhead makes it a non-starter for
real-life use, as does its lack of scalability.  It is also unsuitable
for realtime use, since it allows scheduling latency to "bleed" from
one read-side critical section to another.  It also assumes recursive
reader-writer locks:  If you try this with non-recursive locks, and
you allow nested rcu_read_lock() calls, you can deadlock.

However, it is probably the easiest implementation to relate to, so is
a good starting point.

It is extremely simple::

	static DEFINE_RWLOCK(rcu_gp_mutex);

	void rcu_read_lock(void)
	{
		read_lock(&rcu_gp_mutex);
	}

	void rcu_read_unlock(void)
	{
		read_unlock(&rcu_gp_mutex);
	}

	void synchronize_rcu(void)
	{
		write_lock(&rcu_gp_mutex);
		smp_mb__after_spinlock();
		write_unlock(&rcu_gp_mutex);
	}

[You can ignore rcu_assign_pointer() and rcu_dereference() without missing
much.  But here are simplified versions anyway.  And whatever you do,
don't forget about them when submitting patches making use of RCU!]::

	#define rcu_assign_pointer(p, v) \
	({ \
		smp_store_release(&(p), (v)); \
	})

	#define rcu_dereference(p) \
	({ \
		typeof(p) _________p1 = READ_ONCE(p); \
		(_________p1); \
	})


The rcu_read_lock() and rcu_read_unlock() primitives read-acquire
and release a global reader-writer lock.  The synchronize_rcu()
primitive write-acquires this same lock, then releases it.  This means
that once synchronize_rcu() exits, all RCU read-side critical sections
that were in progress before synchronize_rcu() was called are guaranteed
to have completed -- there is no way that synchronize_rcu() would have
been able to write-acquire the lock otherwise.  The smp_mb__after_spinlock()
promotes synchronize_rcu() to a full memory barrier in compliance with
the "Memory-Barrier Guarantees" listed in:

	Design/Requirements/Requirements.rst

It is possible to nest rcu_read_lock(), since reader-writer locks may
be recursively acquired.  Note also that rcu_read_lock() is immune
from deadlock (an important property of RCU).  The reason for this is
that the only thing that can block rcu_read_lock() is a synchronize_rcu().
But synchronize_rcu() does not acquire any locks while holding rcu_gp_mutex,
so there can be no deadlock cycle.

.. _quiz_1:

Quick Quiz #1:
		Why is this argument naive?  How could a deadlock
		occur when using this algorithm in a real-world Linux
		kernel?  How could this deadlock be avoided?

:ref:`Answers to Quick Quiz <9_whatisRCU>`

5B.  "TOY" EXAMPLE #2: CLASSIC RCU
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This section presents a "toy" RCU implementation that is based on
"classic RCU".  It is also short on performance (but only for updates) and
on features such as hotplug CPU and the ability to run in CONFIG_PREEMPTION
kernels.  The definitions of rcu_dereference() and rcu_assign_pointer()
are the same as those shown in the preceding section, so they are omitted.
::

	void rcu_read_lock(void) { }

	void rcu_read_unlock(void) { }

	void synchronize_rcu(void)
	{
		int cpu;

		for_each_possible_cpu(cpu)
			run_on(cpu);
	}

Note that rcu_read_lock() and rcu_read_unlock() do absolutely nothing.
This is the great strength of classic RCU in a non-preemptive kernel:
read-side overhead is precisely zero, at least on non-Alpha CPUs.
And there is absolutely no way that rcu_read_lock() can possibly
participate in a deadlock cycle!

The implementation of synchronize_rcu() simply schedules itself on each
CPU in turn.  The run_on() primitive can be implemented straightforwardly
in terms of the sched_setaffinity() primitive.  Of course, a somewhat less
"toy" implementation would restore the affinity upon completion rather
than just leaving all tasks running on the last CPU, but when I said
"toy", I meant **toy**!

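A minimal sketch of run_on(), assuming process context and using
the in-kernel sched_setaffinity() as suggested above, might look
like this::

	static void run_on(int cpu)
	{
		struct cpumask mask;

		cpumask_clear(&mask);
		cpumask_set_cpu(cpu, &mask);
		sched_setaffinity(current->pid, &mask);	/* Migrate to "cpu". */
	}
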
So how the heck is this supposed to work???

Remember that it is illegal to block while in an RCU read-side critical
section.  Therefore, if a given CPU executes a context switch, we know
that it must have completed all preceding RCU read-side critical sections.
Once **all** CPUs have executed a context switch, then **all** preceding
RCU read-side critical sections will have completed.

So, suppose that we remove a data item from its structure and then invoke
synchronize_rcu().  Once synchronize_rcu() returns, we are guaranteed
that there are no RCU read-side critical sections holding a reference
to that data item, so we can safely reclaim it.

.. _quiz_2:

Quick Quiz #2:
		Give an example where Classic RCU's read-side
		overhead is **negative**.

:ref:`Answers to Quick Quiz <9_whatisRCU>`

.. _quiz_3:

Quick Quiz #3:
		If it is illegal to block in an RCU read-side
		critical section, what the heck do you do in
		CONFIG_PREEMPT_RT, where normal spinlocks can block???

:ref:`Answers to Quick Quiz <9_whatisRCU>`

.. _6_whatisRCU:

6.  ANALOGY WITH READER-WRITER LOCKING
--------------------------------------

Although RCU can be used in many different ways, a very common use of
RCU is analogous to reader-writer locking.  The following unified
diff shows how closely related RCU and reader-writer locking can be.
::

	@@ -5,5 +5,5 @@ struct el {
	 	int data;
	 	/* Other data fields */
	 };
	-rwlock_t listmutex;
	+spinlock_t listmutex;
	 struct el head;

	@@ -13,15 +14,15 @@
		struct list_head *lp;
		struct el *p;

	-	read_lock(&listmutex);
	-	list_for_each_entry(p, head, lp) {
	+	rcu_read_lock();
	+	list_for_each_entry_rcu(p, head, lp) {
			if (p->key == key) {
				*result = p->data;
	-			read_unlock(&listmutex);
	+			rcu_read_unlock();
				return 1;
			}
		}
	-	read_unlock(&listmutex);
	+	rcu_read_unlock();
		return 0;
	 }

	@@ -29,15 +30,16 @@
	 {
		struct el *p;

	-	write_lock(&listmutex);
	+	spin_lock(&listmutex);
		list_for_each_entry(p, head, lp) {
			if (p->key == key) {
	-			list_del(&p->list);
	-			write_unlock(&listmutex);
	+			list_del_rcu(&p->list);
	+			spin_unlock(&listmutex);
	+			synchronize_rcu();
				kfree(p);
				return 1;
			}
		}
	-	write_unlock(&listmutex);
	+	spin_unlock(&listmutex);
		return 0;
	 }

Or, for those who prefer a side-by-side listing::

 1 struct el {                          1 struct el {
 2   struct list_head list;             2   struct list_head list;
 3   long key;                          3   long key;
 4   spinlock_t mutex;                  4   spinlock_t mutex;
 5   int data;                          5   int data;
 6   /* Other data fields */            6   /* Other data fields */
 7 };                                   7 };
 8 rwlock_t listmutex;                  8 spinlock_t listmutex;
 9 struct el head;                      9 struct el head;

::

  1 int search(long key, int *result)    1 int search(long key, int *result)
  2 {                                    2 {
  3   struct list_head *lp;              3   struct list_head *lp;
  4   struct el *p;                      4   struct el *p;
  5                                      5
  6   read_lock(&listmutex);             6   rcu_read_lock();
  7   list_for_each_entry(p, head, lp) { 7   list_for_each_entry_rcu(p, head, lp) {
  8     if (p->key == key) {             8     if (p->key == key) {
  9       *result = p->data;             9       *result = p->data;
 10       read_unlock(&listmutex);      10       rcu_read_unlock();
 11       return 1;                     11       return 1;
 12     }                               12     }
 13   }                                 13   }
 14   read_unlock(&listmutex);          14   rcu_read_unlock();
 15   return 0;                         15   return 0;
 16 }                                   16 }

::

  1 int delete(long key)                 1 int delete(long key)
  2 {                                    2 {
  3   struct el *p;                      3   struct el *p;
  4                                      4
  5   write_lock(&listmutex);            5   spin_lock(&listmutex);
  6   list_for_each_entry(p, head, lp) { 6   list_for_each_entry(p, head, lp) {
  7     if (p->key == key) {             7     if (p->key == key) {
  8       list_del(&p->list);            8       list_del_rcu(&p->list);
  9       write_unlock(&listmutex);      9       spin_unlock(&listmutex);
                                        10       synchronize_rcu();
 10       kfree(p);                     11       kfree(p);
 11       return 1;                     12       return 1;
 12     }                               13     }
 13   }                                 14   }
 14   write_unlock(&listmutex);         15   spin_unlock(&listmutex);
 15   return 0;                         16   return 0;
 16 }                                   17 }

892
893Either way, the differences are quite small.  Read-side locking moves
894to rcu_read_lock() and rcu_read_unlock, update-side locking moves from
895a reader-writer lock to a simple spinlock, and a synchronize_rcu()
896precedes the kfree().
897
898However, there is one potential catch: the read-side and update-side
899critical sections can now run concurrently.  In many cases, this will
900not be a problem, but it is necessary to check carefully regardless.
901For example, if multiple independent list updates must be seen as
902a single atomic update, converting to RCU will require special care.
903
904Also, the presence of synchronize_rcu() means that the RCU version of
905delete() can now block.  If this is a problem, there is a callback-based
906mechanism that never blocks, namely call_rcu() or kfree_rcu(), that can
907be used in place of synchronize_rcu().

.. _7_whatisRCU:

7.  ANALOGY WITH REFERENCE COUNTING
-----------------------------------

The reader-writer analogy (illustrated by the previous section) is not
always the best way to think about using RCU.  Another helpful analogy
considers RCU an effective reference count on everything which is
protected by RCU.

A reference count typically does not prevent the referenced object's
values from changing, but does prevent changes to type -- particularly the
gross change of type that happens when that object's memory is freed and
re-allocated for some other purpose.  Once a type-safe reference to the
object is obtained, some other mechanism is needed to ensure consistent
access to the data in the object.  This could involve taking a spinlock,
but with RCU the typical approach is to perform reads with SMP-aware
operations such as smp_load_acquire(), to perform updates with atomic
read-modify-write operations, and to provide the necessary ordering.
RCU provides a number of support functions that embed the required
operations and ordering, such as the list_for_each_entry_rcu() macro
used in the previous section.

A more focused view of the reference counting behavior is that,
between rcu_read_lock() and rcu_read_unlock(), any reference taken with
rcu_dereference() on a pointer marked as ``__rcu`` can be treated as
though a reference-count on that object has been temporarily increased.
This prevents the object from changing type.  Exactly what this means
will depend on normal expectations of objects of that type, but it
typically includes that spinlocks can still be safely locked, normal
reference counters can be safely manipulated, and ``__rcu`` pointers
can be safely dereferenced.

Some operations that one might expect to see on an object for
which an RCU reference is held include:

 - Copying out data that is guaranteed to be stable by the object's type.
 - Using kref_get_unless_zero() or similar to get a longer-term
   reference.  This may, of course, fail (see the sketch following
   this list).
 - Acquiring a spinlock in the object, and checking if the object still
   is the expected object and if so, manipulating it freely.

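The following sketch combines these operations into a lookup that
upgrades an RCU reference into a long-term kref reference (obj_table,
hash, ->key, and ->ref are hypothetical)::

	rcu_read_lock();
	obj = rcu_dereference(obj_table[hash]);
	if (obj && obj->key == key && kref_get_unless_zero(&obj->ref)) {
		rcu_read_unlock();
		/* Long-term reference held: use obj, then kref_put(). */
	} else {
		rcu_read_unlock();
		obj = NULL;	/* Gone, or never was there. */
	}
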
The understanding that RCU provides a reference that only prevents a
change of type is particularly visible with objects allocated from a
slab cache marked ``SLAB_TYPESAFE_BY_RCU``.  RCU operations may yield a
reference to an object from such a cache that has been concurrently freed
and the memory reallocated to a completely different object, though of
the same type.  In this case RCU doesn't even protect the identity of the
object from changing, only its type.  So the object found may not be the
one expected, but it will be one where it is safe to take a reference
(and then potentially acquire a spinlock), allowing subsequent code
to check whether the identity matches expectations.  It is tempting
to simply acquire the spinlock without first taking the reference, but
unfortunately any spinlock in a ``SLAB_TYPESAFE_BY_RCU`` object must be
initialized after each and every call to kmem_cache_alloc(), which renders
reference-free spinlock acquisition completely unsafe.  Therefore, when
using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
(Those willing to initialize their locks in a kmem_cache constructor
may also use locking, including cache-friendly sequence locking.)

With traditional reference counting -- such as that implemented by the
kref library in Linux -- there is typically code that runs when the last
reference to an object is dropped.  With kref, this is the function
passed to kref_put().  When RCU is being used, such finalization code
must not be run until all ``__rcu`` pointers referencing the object have
been updated, and then a grace period has passed.  Every remaining
globally visible pointer to the object must be considered to be a
potential counted reference, and the finalization code is typically run
using call_rcu() only after all those pointers have been changed.

To see how to choose between these two analogies -- of RCU as a
reader-writer lock and RCU as a reference counting system -- it is useful
to reflect on the scale of the thing being protected.  The reader-writer
lock analogy looks at larger multi-part objects such as a linked list
and shows how RCU can facilitate concurrency while elements are added
to, and removed from, the list.  The reference-count analogy looks at
the individual objects and looks at how they can be accessed safely
within whatever whole they are a part of.

.. _8_whatisRCU:

8.  FULL LIST OF RCU APIs
-------------------------

The RCU APIs are documented in docbook-format header comments in the
Linux-kernel source code, but it helps to have a full list of the
APIs, since there does not appear to be a way to categorize them
in docbook.  Here is the list, by category.

RCU list traversal::

	list_entry_rcu
	list_entry_lockless
	list_first_entry_rcu
	list_next_rcu
	list_for_each_entry_rcu
	list_for_each_entry_continue_rcu
	list_for_each_entry_from_rcu
	list_first_or_null_rcu
	list_next_or_null_rcu
	hlist_first_rcu
	hlist_next_rcu
	hlist_pprev_rcu
	hlist_for_each_entry_rcu
	hlist_for_each_entry_rcu_bh
	hlist_for_each_entry_from_rcu
	hlist_for_each_entry_continue_rcu
	hlist_for_each_entry_continue_rcu_bh
	hlist_nulls_first_rcu
	hlist_nulls_for_each_entry_rcu
	hlist_bl_first_rcu
	hlist_bl_for_each_entry_rcu

RCU pointer/list update::

	rcu_assign_pointer
	list_add_rcu
	list_add_tail_rcu
	list_del_rcu
	list_replace_rcu
	hlist_add_behind_rcu
	hlist_add_before_rcu
	hlist_add_head_rcu
	hlist_add_tail_rcu
	hlist_del_rcu
	hlist_del_init_rcu
	hlist_replace_rcu
	list_splice_init_rcu
	list_splice_tail_init_rcu
	hlist_nulls_del_init_rcu
	hlist_nulls_del_rcu
	hlist_nulls_add_head_rcu
	hlist_bl_add_head_rcu
	hlist_bl_del_init_rcu
	hlist_bl_del_rcu
	hlist_bl_set_first_rcu

RCU::

	Critical sections	Grace period		Barrier

	rcu_read_lock		synchronize_net		rcu_barrier
	rcu_read_unlock		synchronize_rcu
	rcu_dereference		synchronize_rcu_expedited
	rcu_read_lock_held	call_rcu
	rcu_dereference_check	kfree_rcu
	rcu_dereference_protected

bh::

	Critical sections	Grace period		Barrier

	rcu_read_lock_bh	call_rcu		rcu_barrier
	rcu_read_unlock_bh	synchronize_rcu
	[local_bh_disable]	synchronize_rcu_expedited
	[and friends]
	rcu_dereference_bh
	rcu_dereference_bh_check
	rcu_dereference_bh_protected
	rcu_read_lock_bh_held

sched::

	Critical sections	Grace period		Barrier

	rcu_read_lock_sched	call_rcu		rcu_barrier
	rcu_read_unlock_sched	synchronize_rcu
	[preempt_disable]	synchronize_rcu_expedited
	[and friends]
	rcu_read_lock_sched_notrace
	rcu_read_unlock_sched_notrace
	rcu_dereference_sched
	rcu_dereference_sched_check
	rcu_dereference_sched_protected
	rcu_read_lock_sched_held


RCU-Tasks::

	Critical sections	Grace period		Barrier

	N/A			call_rcu_tasks		rcu_barrier_tasks
				synchronize_rcu_tasks


RCU-Tasks-Rude::

	Critical sections	Grace period		Barrier

	N/A			call_rcu_tasks_rude	rcu_barrier_tasks_rude
				synchronize_rcu_tasks_rude


RCU-Tasks-Trace::

	Critical sections	Grace period		Barrier

	rcu_read_lock_trace	call_rcu_tasks_trace	rcu_barrier_tasks_trace
	rcu_read_unlock_trace	synchronize_rcu_tasks_trace


SRCU::

	Critical sections	Grace period		Barrier

	srcu_read_lock		call_srcu		srcu_barrier
	srcu_read_unlock	synchronize_srcu
	srcu_dereference	synchronize_srcu_expedited
	srcu_dereference_check
	srcu_read_lock_held

SRCU: Initialization/cleanup::

	DEFINE_SRCU
	DEFINE_STATIC_SRCU
	init_srcu_struct
	cleanup_srcu_struct

All: lockdep-checked RCU utility APIs::

	RCU_LOCKDEP_WARN
	rcu_sleep_check

All: Unchecked RCU-protected pointer access::

	rcu_dereference_raw

All: Unchecked RCU-protected pointer access with dereferencing prohibited::

	rcu_access_pointer

See the comment headers in the source code (or the docbook generated
from them) for more information.

However, given that there are no fewer than four families of RCU APIs
in the Linux kernel, how do you choose which one to use?  The following
list can be helpful:

a.	Will readers need to block?  If so, you need SRCU.  (See the
	SRCU reader sketch following this list.)

b.	Will readers need to block and are you doing tracing, for
	example, ftrace or BPF?  If so, you need RCU-tasks,
	RCU-tasks-rude, and/or RCU-tasks-trace.

c.	What about the -rt patchset?  If readers would need to block in
	a non-rt kernel, you need SRCU.  If readers would block when
	acquiring spinlocks in a -rt kernel, but not in a non-rt kernel,
	SRCU is not necessary.  (The -rt patchset turns spinlocks into
	sleeplocks, hence this distinction.)

d.	Do you need to treat NMI handlers, hardirq handlers,
	and code segments with preemption disabled (whether
	via preempt_disable(), local_irq_save(), local_bh_disable(),
	or some other mechanism) as if they were explicit RCU readers?
	If so, RCU-sched readers are the only choice that will work
	for you, but since about v4.20 you can use the vanilla RCU
	update primitives.

e.	Do you need RCU grace periods to complete even in the face of
	softirq monopolization of one or more of the CPUs?  For example,
	is your code subject to network-based denial-of-service attacks?
	If so, you should disable softirq across your readers, for
	example, by using rcu_read_lock_bh().  Since about v4.20 you
	can use the vanilla RCU update primitives.

f.	Is your workload too update-intensive for normal use of
	RCU, but inappropriate for other synchronization mechanisms?
	If so, consider SLAB_TYPESAFE_BY_RCU (which was originally
	named SLAB_DESTROY_BY_RCU).  But please be careful!

g.	Do you need read-side critical sections that are respected even
	on CPUs that are deep in the idle loop, during entry to or exit
	from user-mode execution, or on an offlined CPU?  If so, SRCU
	and RCU Tasks Trace are the only choices that will work for you,
	with SRCU being strongly preferred in almost all cases.

h.	Otherwise, use RCU.

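As promised in item (a) above, here is a minimal SRCU reader sketch
(my_srcu, gbl_p, and do_something_that_might_sleep() are hypothetical)::

	DEFINE_STATIC_SRCU(my_srcu);	/* File-scope srcu_struct. */

	int idx;

	idx = srcu_read_lock(&my_srcu);
	p = srcu_dereference(gbl_p, &my_srcu);
	if (p)
		do_something_that_might_sleep(p);	/* Blocking is OK. */
	srcu_read_unlock(&my_srcu, idx);
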
Of course, this all assumes that you have determined that RCU is in fact
the right tool for your job.

.. _9_whatisRCU:

9.  ANSWERS TO QUICK QUIZZES
----------------------------

Quick Quiz #1:
		Why is this argument naive?  How could a deadlock
		occur when using this algorithm in a real-world Linux
		kernel?  [Referring to the lock-based "toy" RCU
		algorithm.]

Answer:
		Consider the following sequence of events:

		1.	CPU 0 acquires some unrelated lock, call it
			"problematic_lock", disabling irq via
			spin_lock_irqsave().

		2.	CPU 1 enters synchronize_rcu(), write-acquiring
			rcu_gp_mutex.

		3.	CPU 0 enters rcu_read_lock(), but must wait
			because CPU 1 holds rcu_gp_mutex.

		4.	CPU 1 is interrupted, and the irq handler
			attempts to acquire problematic_lock.

		The system is now deadlocked.

		One way to avoid this deadlock is to use an approach like
		that of CONFIG_PREEMPT_RT, where all normal spinlocks
		become blocking locks, and all irq handlers execute in
		the context of special tasks.  In this case, in step 4
		above, the irq handler would block, allowing CPU 1 to
		release rcu_gp_mutex, avoiding the deadlock.

		Even in the absence of deadlock, this RCU implementation
		allows latency to "bleed" from readers to other
		readers through synchronize_rcu().  To see this,
		consider task A in an RCU read-side critical section
		(thus read-holding rcu_gp_mutex), task B blocked
		attempting to write-acquire rcu_gp_mutex, and
		task C blocked in rcu_read_lock() attempting to
		read-acquire rcu_gp_mutex.  Task A's RCU read-side
		latency is holding up task C, albeit indirectly via
		task B.

		Realtime RCU implementations therefore use a counter-based
		approach where tasks in RCU read-side critical sections
		cannot be blocked by tasks executing synchronize_rcu().

:ref:`Back to Quick Quiz #1 <quiz_1>`

Quick Quiz #2:
		Give an example where Classic RCU's read-side
		overhead is **negative**.

Answer:
		Imagine a single-CPU system with a non-CONFIG_PREEMPTION
		kernel where a routing table is used by process-context
		code, but can be updated by irq-context code (for example,
		by an "ICMP REDIRECT" packet).  The usual way of handling
		this would be to have the process-context code disable
		interrupts while searching the routing table.  Use of
		RCU allows such interrupt-disabling to be dispensed with.
		Thus, without RCU, you pay the cost of disabling interrupts,
		and with RCU you don't.

		One can argue that the overhead of RCU in this
		case is negative with respect to the single-CPU
		interrupt-disabling approach.  Others might argue that
		the overhead of RCU is merely zero, and that replacing
		the positive overhead of the interrupt-disabling scheme
		with the zero-overhead RCU scheme does not constitute
		negative overhead.

		In real life, of course, things are more complex.  But
		even the theoretical possibility of negative overhead for
		a synchronization primitive is a bit unexpected.  ;-)

:ref:`Back to Quick Quiz #2 <quiz_2>`

Quick Quiz #3:
		If it is illegal to block in an RCU read-side
		critical section, what the heck do you do in
		CONFIG_PREEMPT_RT, where normal spinlocks can block???

Answer:
		Just as CONFIG_PREEMPT_RT permits preemption of spinlock
		critical sections, it permits preemption of RCU
		read-side critical sections.  It also permits
		spinlocks blocking while in RCU read-side critical
		sections.

		Why the apparent inconsistency?  Because it is
		possible to use priority boosting to keep the RCU
		grace periods short if need be (for example, if running
		short of memory).  In contrast, if blocking waiting
		for (say) network reception, there is no way to know
		what should be boosted.  Especially given that the
		process we need to boost might well be a human being
		who just went out for a pizza or something.  And although
		a computer-operated cattle prod might arouse serious
		interest, it might also provoke serious objections.
		Besides, how does the computer know what pizza parlor
		the human being went to???

:ref:`Back to Quick Quiz #3 <quiz_3>`

ACKNOWLEDGEMENTS

My thanks to the people who helped make this human-readable, including
Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.


For more information, see http://www.rdrop.com/users/paulmck/RCU.
