======================================
Housekeeping
======================================


CPU Isolation moves away kernel work that may otherwise run on any CPU.
The purpose of its related features is to reduce the OS jitter that some
extreme workloads can't stand, such as in some DPDK use cases.

The kernel work moved away by CPU isolation is commonly described as
"housekeeping" because it includes ground work that performs cleanups,
statistics maintenance and actions relying on them, memory release,
various deferrals, etc.

Sometimes housekeeping is just some unbound work (unbound workqueues,
unbound timers, ...) that gets easily assigned to non-isolated CPUs.
But sometimes housekeeping is tied to a specific CPU and requires
elaborate tricks to be offloaded to non-isolated CPUs (RCU_NOCB, remote
scheduler tick, etc.).

Thus, a housekeeping CPU can be considered as the reverse of an isolated
CPU. It is simply a CPU that can execute housekeeping work. There must
always be at least one online housekeeping CPU at any time. The CPUs that
are not isolated are automatically assigned as housekeeping.
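
As a rough illustration of the "reverse of an isolated CPU" relationship,
the following userspace C sketch models CPU sets as plain 64-bit masks. It
is not kernel code: ``model_cpumask_t`` and the ``model_*`` helpers are
hypothetical names made up for this example, and the model ignores the
distinction between possible and online CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per CPU, at most 64 CPUs. */
typedef uint64_t model_cpumask_t;

/* Mask with bits 0..nr_cpus-1 set, standing in for cpu_possible_mask. */
static model_cpumask_t model_possible_mask(unsigned int nr_cpus)
{
	assert(nr_cpus >= 1 && nr_cpus <= 64);
	return nr_cpus == 64 ? UINT64_MAX : (UINT64_C(1) << nr_cpus) - 1;
}

/* Housekeeping CPUs are the possible CPUs that are not isolated. */
static model_cpumask_t model_housekeeping_mask(model_cpumask_t possible,
					       model_cpumask_t isolated)
{
	return possible & ~isolated;
}

/* An isolation request must leave at least one housekeeping CPU. */
static bool model_isolation_is_valid(model_cpumask_t possible,
				     model_cpumask_t isolated)
{
	return model_housekeeping_mask(possible, isolated) != 0;
}
```

Under this model, with 8 possible CPUs, isolating CPU 7 leaves the
housekeeping mask 0x7f, while a request isolating all 8 CPUs is rejected
because no housekeeping CPU would remain.
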

Housekeeping is currently divided into four features described
by ``enum hk_type``:

1. HK_TYPE_DOMAIN matches the work moved away by scheduler domain
   isolation performed through the ``isolcpus=domain`` boot parameter or
   isolated cpuset partitions in cgroup v2. This includes scheduler
   load balancing, unbound workqueues and timers.

2. HK_TYPE_KERNEL_NOISE matches the work moved away by tick isolation
   performed through the ``nohz_full=`` or ``isolcpus=nohz`` boot
   parameters. This includes remote scheduler tick, vmstat and lockup
   watchdog.

3. HK_TYPE_MANAGED_IRQ matches the IRQ handlers moved away by managed
   IRQ isolation performed through ``isolcpus=managed_irq``.

4. HK_TYPE_DOMAIN_BOOT matches the work moved away by scheduler domain
   isolation performed through ``isolcpus=domain`` only. It is similar
   to HK_TYPE_DOMAIN except it ignores the isolation performed by
   cpusets.


Housekeeping cpumasks
=================================

Housekeeping cpumasks include the CPUs that can execute the work moved
away by the matching isolation feature.
These cpumasks are returned by
the following function::

    const struct cpumask *housekeeping_cpumask(enum hk_type type)

By default, if neither ``nohz_full=``, nor ``isolcpus``, nor cpuset's
isolated partitions are used, which covers most use cases, this function
returns the cpu_possible_mask.

Otherwise the function returns the cpumask complement of the isolation
feature. For example:

With ``isolcpus=domain,7`` the following will return a mask with all possible
CPUs except 7::

    housekeeping_cpumask(HK_TYPE_DOMAIN)

Similarly with ``nohz_full=5,6`` the following will return a mask with all
possible CPUs except 5 and 6::

    housekeeping_cpumask(HK_TYPE_KERNEL_NOISE)


Synchronization against cpusets
=================================

Cpuset can modify the HK_TYPE_DOMAIN housekeeping cpumask while creating,
modifying or deleting an isolated partition.

The users of the HK_TYPE_DOMAIN cpumask must then synchronize properly
against cpuset in order to make sure that:

1. The cpumask snapshot stays coherent.

2. No housekeeping work is queued on a newly made isolated CPU.

3. Pending housekeeping work that was queued to a non-isolated
   CPU which just turned isolated through cpuset is flushed
   before the related created/modified isolated partition is made
   available to userspace.

This synchronization is maintained by an RCU based scheme. The cpuset update
side waits for an RCU grace period after updating the HK_TYPE_DOMAIN
cpumask and before flushing pending works. On the read side, care must be
taken to gather the housekeeping target election and the work enqueue within
the same RCU read side critical section.

A typical layout example would look like this on the update side
(``housekeeping_update()``)::

    rcu_assign_pointer(housekeeping_cpumasks[type], trial);
    synchronize_rcu();
    flush_workqueue(example_workqueue);

And then on the read side::

    rcu_read_lock();
    cpu = housekeeping_any_cpu(HK_TYPE_DOMAIN);
    queue_work_on(cpu, example_workqueue, work);
    rcu_read_unlock();
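
The update and read side ordering described above can be walked through in
a deliberately simplified, single-threaded userspace model. This is only a
sketch and not RCU: with a single thread there is never a reader in flight,
so the grace period is reduced to a comment. All ``model_*`` names are
hypothetical, and ``model_any_cpu()`` simply picks the lowest CPU in the
mask, which is an assumption of this example rather than the behaviour of
``housekeeping_any_cpu()``.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 8

/* Hypothetical model state: one bit per housekeeping CPU, all CPUs at start. */
static uint32_t model_hk_domain = 0xff;
/* Work items queued per CPU and not yet executed. */
static unsigned int model_pending[MODEL_NR_CPUS];

/* Simplified stand-in for housekeeping_any_cpu(): lowest CPU set in the mask. */
static int model_any_cpu(uint32_t mask)
{
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & (UINT32_C(1) << cpu))
			return cpu;
	return -1;
}

/* Read side: elect the target and enqueue within one critical section. */
static int model_queue_work(void)
{
	/* rcu_read_lock() in the kernel */
	int cpu = model_any_cpu(model_hk_domain);

	assert(cpu >= 0);	/* at least one housekeeping CPU must exist */
	model_pending[cpu]++;
	/* rcu_read_unlock() in the kernel */
	return cpu;
}

/* Update side: publish the new mask, wait for readers, then flush. */
static void model_update(uint32_t new_mask)
{
	model_hk_domain = new_mask;	/* rcu_assign_pointer() in the kernel */
	/* synchronize_rcu() in the kernel; nothing to wait for in one thread */
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		model_pending[cpu] = 0;	/* flush_workqueue(): run all pending work */
}
```

In this model, a work item queued before the update lands on CPU 0. After
``model_update(0xfe)`` isolates CPU 0, that item has been flushed and the
next ``model_queue_work()`` call elects CPU 1, so nothing remains pending
on the newly isolated CPU.
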