.. SPDX-License-Identifier: GPL-2.0

===========================
The KVM halt polling system
===========================

The KVM halt polling system provides a feature within KVM whereby the latency
of a guest can, under some circumstances, be reduced by polling in the host
for some time period after the guest has elected to no longer run by ceding.
That is, when a guest vcpu has ceded, or in the case of powerpc when all of the
vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions
before giving up the cpu to the scheduler in order to let something else run.

Polling provides a latency advantage in cases where the guest can be run again
very quickly, because it saves at least a trip through the scheduler, normally
on the order of a few microseconds, although performance benefits are workload
dependent. If no wakeup source arrives during the polling interval, or if some
other task on the runqueue is runnable, the scheduler is invoked. Thus halt
polling is especially useful on workloads with very short wakeup periods, where
the time spent halt polling is minimised and the time savings of not invoking
the scheduler are distinguishable.
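
The poll-before-scheduling idea described above can be modelled in userspace.
The following is a minimal sketch, not the kernel implementation: the names
now_ns() and poll_for_wakeup() are hypothetical, the wakeup source is reduced
to a single atomic flag, and the check for other runnable tasks is omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Busy-poll for at most poll_ns nanoseconds.
 * Returns true if a wakeup was seen while polling, so the caller can
 * resume immediately. Returns false if the interval expired, in which
 * case the caller would give up the cpu and block instead.
 */
static bool poll_for_wakeup(atomic_bool *wakeup_pending, uint64_t poll_ns)
{
	uint64_t deadline = now_ns() + poll_ns;

	do {
		if (atomic_load(wakeup_pending))
			return true;
	} while (now_ns() < deadline);

	return false;
}
```

A caller would try poll_for_wakeup() first and fall back to a blocking wait
only when it returns false, which is where the scheduler trip is paid.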

The generic halt polling code is implemented in:

	virt/kvm/kvm_main.c: kvm_vcpu_block()

The powerpc kvm-hv specific case is implemented in:

	arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()

Halt Polling Interval
=====================

The maximum time for which to poll before invoking the scheduler, referred to
as the halt polling interval, is increased and decreased based on the perceived
effectiveness of the polling in an attempt to limit pointless polling.
This value is stored in either the vcpu struct:

	kvm_vcpu->halt_poll_ns

or in the case of powerpc kvm-hv, in the vcore struct:

	kvmppc_vcore->halt_poll_ns

Thus this is a per-vcpu (or per-vcore) value.

During polling, if a wakeup source is received within the halt polling interval,
the interval is left unchanged. In the event that a wakeup source isn't
received during the polling interval (and thus schedule is invoked) there are
two cases: either the polling interval and total block time[0] were less than
the global max polling interval (see module params below), or the total block
time was greater than the global max polling interval.

In the event that both the polling interval and total block time were less than
the global max polling interval then the polling interval can be increased in
the hope that next time, during the longer polling interval, the wakeup source
will be received while the host is polling and the latency benefits will be
realised. The polling interval is grown in the function grow_halt_poll_ns():
it is multiplied by the module parameter halt_poll_ns_grow, or set to
halt_poll_ns_grow_start when growing from zero.

In the event that the total block time was greater than the global max polling
interval then the host will never poll for long enough (limited by the global
max) to wake up during the polling interval, so the interval may as well be
shrunk in order to avoid pointless polling. The polling interval is shrunk in
the function shrink_halt_poll_ns() and is divided by the module parameter
halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0.

It is worth noting that this adjustment process attempts to home in on some
steady state polling interval but will only really do a good job for wakeups
which come at an approximately constant rate; otherwise there will be constant
adjustment of the polling interval.

[0] total block time:
	the time between when the halt polling function is
	invoked and a wakeup source received (irrespective of
	whether the scheduler is invoked within that function).
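
The adjustment rules in this section can be summarised as a small model. This
is a sketch that follows the prose above rather than the kernel source: struct
poll_params and the three functions are hypothetical names, and treating
halt_poll_ns_grow == 0 as "never grow" is an assumption made here so the model
is well defined.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the module parameters. */
struct poll_params {
	uint64_t max_ns;     /* halt_poll_ns: global max polling interval */
	uint64_t grow;       /* halt_poll_ns_grow */
	uint64_t grow_start; /* halt_poll_ns_grow_start */
	uint64_t shrink;     /* halt_poll_ns_shrink */
};

static uint64_t grow_interval(uint64_t cur, const struct poll_params *p)
{
	uint64_t next;

	if (p->grow == 0)
		return cur; /* assumption: a zero multiplier disables growth */

	next = cur * p->grow;
	if (next < p->grow_start)
		next = p->grow_start; /* growing from zero */
	if (next > p->max_ns)
		next = p->max_ns; /* the global max is a ceiling */
	return next;
}

static uint64_t shrink_interval(uint64_t cur, const struct poll_params *p)
{
	return p->shrink == 0 ? 0 : cur / p->shrink;
}

/* Returns the interval to use next, given how long this block lasted. */
static uint64_t adjust_poll_interval(uint64_t cur, uint64_t block_ns,
				     const struct poll_params *p)
{
	if (block_ns <= cur)
		return cur; /* wakeup arrived while polling */
	if (block_ns > p->max_ns)
		return shrink_interval(cur, p); /* polling could not have helped */
	if (cur < p->max_ns && block_ns < p->max_ns)
		return grow_interval(cur, p); /* a longer poll might catch it */
	return cur;
}
```

For example, assuming halt_poll_ns is 200000 with a grow factor of 2 and a grow
start of 10000, an interval of 0 becomes 10000 after a 5000 ns block, and then
20000 after a 15000 ns block.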

Module Parameters
=================

The kvm module has 4 tunable module parameters to adjust the global max
polling interval as well as the rate at which the polling interval is grown and
shrunk. These variables are defined in include/linux/kvm_host.h and as module
parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
powerpc kvm-hv case.

+-----------------------+---------------------------+-------------------------+
|Module Parameter       | Description               | Default Value           |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns           | The global max polling    | KVM_HALT_POLL_NS_DEFAULT|
|                       | interval which defines    |                         |
|                       | the ceiling value of the  |                         |
|                       | polling interval for      | (per arch value)        |
|                       | each vcpu.                |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_grow      | The value by which the    | 2                       |
|                       | halt polling interval is  |                         |
|                       | multiplied in the         |                         |
|                       | grow_halt_poll_ns()       |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_grow_start| The initial value to grow | 10000                   |
|                       | to from zero in the       |                         |
|                       | grow_halt_poll_ns()       |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_shrink    | The value by which the    | 0                       |
|                       | halt polling interval is  |                         |
|                       | divided in the            |                         |
|                       | shrink_halt_poll_ns()     |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+

These module parameters can be set from the sysfs files in:

	/sys/module/kvm/parameters/

Note: these module parameters are system-wide values and cannot be tuned on
      a per-VM basis.

Any changes to these parameters will be picked up by new and existing vCPUs the
next time they halt, with the notable exception of VMs using KVM_CAP_HALT_POLL
(see next section).

KVM_CAP_HALT_POLL
=================

KVM_CAP_HALT_POLL is a VM capability that allows userspace to override halt_poll_ns
on a per-VM basis. VMs using KVM_CAP_HALT_POLL ignore halt_poll_ns completely (but
still obey halt_poll_ns_grow, halt_poll_ns_grow_start, and halt_poll_ns_shrink).

See Documentation/virt/kvm/api.rst for more information on this capability.
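
As an illustration of the per-VM override, the sketch below enables the
capability through the KVM_ENABLE_CAP ioctl. It assumes that args[0] carries
the maximum poll time in nanoseconds, which should be checked against
Documentation/virt/kvm/api.rst, and that vm_fd is a descriptor obtained from
KVM_CREATE_VM. The helper names halt_poll_cap() and set_vm_halt_poll() are
hypothetical.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Build the KVM_ENABLE_CAP argument for a per-VM halt polling limit. */
static struct kvm_enable_cap halt_poll_cap(uint64_t max_poll_ns)
{
	struct kvm_enable_cap cap;

	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_HALT_POLL;
	cap.args[0] = max_poll_ns; /* assumed: max poll time in nanoseconds */
	return cap;
}

/*
 * Apply the limit to one VM. Returns the ioctl() result: 0 on success,
 * -1 with errno set on failure (for example, an invalid descriptor).
 */
static int set_vm_halt_poll(int vm_fd, uint64_t max_poll_ns)
{
	struct kvm_enable_cap cap = halt_poll_cap(max_poll_ns);

	return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
}
```

Userspace would typically confirm support with KVM_CHECK_EXTENSION before
relying on this call.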

Further Notes
=============

- Care should be taken when setting the halt_poll_ns module parameter, as a large
  value has the potential to drive the cpu usage to 100% on a machine which would be
  almost entirely idle otherwise. This is because even if a guest has wakeups during
  which very little work is done and which are quite far apart, if the period between
  them is shorter than the global max polling interval (halt_poll_ns) then the host
  will always poll for the entire block time and thus cpu utilisation will go to 100%.

- Halt polling essentially presents a trade-off between power usage and latency, and
  the module parameters should be used to tune this trade-off. Idle cpu time is
  essentially converted to host kernel time with the aim of decreasing latency when
  entering the guest.

- Halt polling will only be conducted by the host when no other tasks are runnable on
  that cpu; otherwise the polling will cease immediately and schedule will be invoked
  to allow that other task to run. Thus halt polling doesn't allow a guest to cause
  denial of service of the cpu.
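
Before changing any of the parameters it can help to check the values currently
in effect. The following is a minimal sketch that reads one unsigned parameter
from a file such as those under /sys/module/kvm/parameters/. The helper name
read_param() is hypothetical, and the sketch assumes the file holds a single
decimal number.

```c
#include <stdio.h>

/*
 * Read a single unsigned decimal value from the file at path.
 * Returns 0 and stores the value in *out on success, -1 if the file
 * cannot be opened or does not start with a number.
 */
static int read_param(const char *path, unsigned long long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	ok = fscanf(f, "%llu", out) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller would pass, for example, the path of the halt_poll_ns file in that
directory and treat a return of -1 as "kvm module not loaded or not readable".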