======================================
Housekeeping
======================================


CPU isolation moves away kernel work that may otherwise run on any CPU.
The purpose of its related features is to reduce the OS jitter that some
extreme workloads can't tolerate, such as some DPDK use cases.

The kernel work moved away by CPU isolation is commonly described as
"housekeeping" because it includes ground work that performs cleanups,
statistics maintenance and actions relying on them, memory release,
various deferrals, etc.

Sometimes housekeeping is just some unbound work (unbound workqueues,
unbound timers, ...) that gets easily assigned to non-isolated CPUs.
But sometimes housekeeping is tied to a specific CPU and requires
elaborate tricks to be offloaded to non-isolated CPUs (RCU_NOCB, remote
scheduler tick, etc.).

Thus, a housekeeping CPU can be considered as the reverse of an isolated
CPU. It is simply a CPU that can execute housekeeping work. There must
always be at least one online housekeeping CPU at any time. The CPUs that
are not isolated are automatically assigned as housekeeping.

Housekeeping is currently divided into four features described
by ``enum hk_type``:

1.	HK_TYPE_DOMAIN matches the work moved away by scheduler domain
	isolation performed through the ``isolcpus=domain`` boot parameter or
	isolated cpuset partitions in cgroup v2. This includes scheduler
	load balancing, unbound workqueues and timers.

2.	HK_TYPE_KERNEL_NOISE matches the work moved away by tick isolation
	performed through the ``nohz_full=`` or ``isolcpus=nohz`` boot
	parameters. This includes remote scheduler tick, vmstat and lockup
	watchdog.

3.	HK_TYPE_MANAGED_IRQ matches the IRQ handlers moved away by managed
	IRQ isolation performed through ``isolcpus=managed_irq``.

4.	HK_TYPE_DOMAIN_BOOT matches the work moved away by scheduler domain
	isolation performed through ``isolcpus=domain`` only. It is similar
	to HK_TYPE_DOMAIN except it ignores the isolation performed by
	cpusets.

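The difference between HK_TYPE_DOMAIN and HK_TYPE_DOMAIN_BOOT can be
illustrated with a small userspace model. This is only a sketch and not
kernel code: CPUs are represented as bits in a 64-bit word, and
``model_domain_boot_mask()``, ``model_domain_mask()`` and their parameters
are hypothetical names made up for this illustration::

	#include <stdint.h>

	/*
	 * Userspace model only, not kernel code. Bit N set means CPU N is
	 * in the mask. All names below are hypothetical.
	 */

	/* Model of HK_TYPE_DOMAIN_BOOT: only isolcpus=domain is subtracted. */
	static uint64_t model_domain_boot_mask(uint64_t possible,
					       uint64_t isolcpus)
	{
		return possible & ~isolcpus;
	}

	/* Model of HK_TYPE_DOMAIN: isolated cpuset partitions count too. */
	static uint64_t model_domain_mask(uint64_t possible, uint64_t isolcpus,
					  uint64_t cpuset_isolated)
	{
		return possible & ~(isolcpus | cpuset_isolated);
	}

In this model, with 8 possible CPUs, ``isolcpus=domain,7`` and CPU 6 placed
in an isolated cpuset partition, the first helper yields CPUs 0-6 (0x7f)
while the second yields CPUs 0-5 (0x3f).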

Housekeeping cpumasks
=================================

Housekeeping cpumasks include the CPUs that can execute the work moved
away by the matching isolation feature. These cpumasks are returned by
the following function::

	const struct cpumask *housekeeping_cpumask(enum hk_type type)

By default, if none of ``nohz_full=``, ``isolcpus=`` or cpuset isolated
partitions are used, which covers most use cases, this function
returns the cpu_possible_mask.

Otherwise the function returns the cpumask complement of the isolation
feature. For example:

With ``isolcpus=domain,7`` the following will return a mask with all
possible CPUs except 7::

	housekeeping_cpumask(HK_TYPE_DOMAIN)

Similarly with ``nohz_full=5,6`` the following will return a mask with all
possible CPUs except 5 and 6::

	housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)

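The two examples above can be worked through with a userspace model of the
complement operation. This is a hedged sketch rather than the kernel
implementation: it assumes at most 64 CPUs stored as bits in a 64-bit word,
and ``model_possible_mask()`` and ``model_housekeeping_mask()`` are
hypothetical helpers that do not exist in the kernel::

	#include <stddef.h>
	#include <stdint.h>

	/*
	 * Userspace model only, not kernel code. Bit N set means CPU N is
	 * in the mask. All names below are hypothetical.
	 */

	/* Model of cpu_possible_mask for CPUs 0..nr_cpus-1 (nr_cpus <= 64). */
	static uint64_t model_possible_mask(unsigned int nr_cpus)
	{
		return nr_cpus >= 64 ? ~0ULL : (1ULL << nr_cpus) - 1;
	}

	/* Every possible CPU that is not listed as isolated. */
	static uint64_t model_housekeeping_mask(unsigned int nr_cpus,
						const unsigned int *isolated,
						size_t nr_isolated)
	{
		uint64_t mask = model_possible_mask(nr_cpus);
		size_t i;

		for (i = 0; i < nr_isolated; i++)
			if (isolated[i] < 64)
				mask &= ~(1ULL << isolated[i]);

		return mask;
	}

On a model machine with 8 possible CPUs, isolating CPU 7 gives 0x7f (CPUs
0-6), isolating CPUs 5 and 6 gives 0x9f (CPUs 0-4 and 7), and an empty
isolation list gives back the full possible mask, 0xff.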

Synchronization against cpusets
=================================

Cpuset can modify the HK_TYPE_DOMAIN housekeeping cpumask while creating,
modifying or deleting an isolated partition.

The users of the HK_TYPE_DOMAIN cpumask must then synchronize properly
against cpuset in order to make sure that:

1.	The cpumask snapshot stays coherent.

2.	No housekeeping work is queued on a newly isolated CPU.

3.	Pending housekeeping work that was queued to a non-isolated
	CPU which just turned isolated through cpuset is flushed
	before the related created/modified isolated partition is made
	available to userspace.

This synchronization is maintained by an RCU based scheme. The cpuset update
side waits for an RCU grace period after updating the HK_TYPE_DOMAIN
cpumask and before flushing pending works. On the read side, care must be
taken to keep the housekeeping target election and the work enqueue within
the same RCU read side critical section.

A typical layout example would look like this on the update side
(``housekeeping_update()``)::

	/* Publish the new housekeeping cpumask */
	rcu_assign_pointer(housekeeping_cpumasks[type], trial);
	/* Wait for the readers that may still use the old cpumask */
	synchronize_rcu();
	/* Flush the works those readers may have queued */
	flush_workqueue(example_workqueue);

And then on the read side::

	rcu_read_lock();
	/* Elect the target and enqueue within the same critical section */
	cpu = housekeeping_any_cpu(HK_TYPE_DOMAIN);
	queue_work_on(cpu, example_workqueue, work);
	rcu_read_unlock();
112