.. SPDX-License-Identifier: GPL-2.0

===========================
The KVM halt polling system
===========================

The KVM halt polling system provides a feature within KVM whereby the latency
of a guest can, under some circumstances, be reduced by polling in the host
for some time period after the guest has elected to no longer run by ceding.
That is, when a guest vcpu has ceded, or in the case of powerpc when all of the
vcpus of a single vcore have ceded, the host kernel polls for wakeup conditions
before giving up the cpu to the scheduler in order to let something else run.

Polling provides a latency advantage in cases where the guest can be run again
very quickly, because it at least saves a trip through the scheduler, which
normally costs on the order of a few microseconds, although performance
benefits are workload dependent. If no wakeup source arrives during the polling
interval, or some other task on the runqueue is runnable, the scheduler is
invoked. Thus halt polling is especially useful on workloads with very short
wakeup periods, where the time spent halt polling is minimised and the time
saved by not invoking the scheduler is noticeable.
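
The trade-off described above can be illustrated with a small model of a single
halt. This is only a sketch: the function name ``halt_outcome`` and the
3000 ns scheduler cost are assumptions chosen for illustration (the text only
says "a few microseconds"), and the case where another task becomes runnable
during polling is not modelled.

```python
def halt_outcome(wakeup_after_ns, poll_ns, sched_cost_ns=3000):
    """Model one halt and return (polled_ns, extra_latency_ns).

    wakeup_after_ns: time from the halt until a wakeup source arrives.
    poll_ns:         how long the host polls before invoking the scheduler.
    sched_cost_ns:   assumed cost of a trip through the scheduler
                     (illustrative value, not measured).
    """
    if wakeup_after_ns <= poll_ns:
        # The wakeup arrived while the host was still polling, so no
        # scheduler trip is paid and only the waiting time was polled.
        return wakeup_after_ns, 0
    # Polling expired first: the whole interval was spent polling and the
    # scheduler trip is paid anyway.
    return poll_ns, sched_cost_ns
```

With a 10000 ns interval, a wakeup after 2000 ns costs 2000 ns of polling and
no scheduler trip, while a wakeup after 50000 ns costs the full 10000 ns of
polling plus the assumed scheduler cost.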

The generic halt polling code is implemented in:

	virt/kvm/kvm_main.c: kvm_vcpu_block()

The powerpc kvm-hv specific case is implemented in:

	arch/powerpc/kvm/book3s_hv.c: kvmppc_vcore_blocked()

Halt Polling Interval
=====================

The maximum time for which to poll before invoking the scheduler, referred to
as the halt polling interval, is increased and decreased based on the perceived
effectiveness of the polling in an attempt to limit pointless polling.
This value is stored in either the vcpu struct:

	kvm_vcpu->halt_poll_ns

or in the case of powerpc kvm-hv, in the vcore struct:

	kvmppc_vcore->halt_poll_ns

Thus this is a per vcpu (or vcore) value.

During polling, if a wakeup source is received within the halt polling
interval, the interval is left unchanged. If a wakeup source isn't received
during the polling interval (and thus schedule is invoked), there are two
cases: either the polling interval and total block time[0] were both less than
the global max polling interval (see module params below), or the total block
time was greater than the global max polling interval.
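
The cases above can be written down as a small decision function. This is a
sketch derived from the prose, not a copy of the kernel code: the name
``classify_halt`` and the string results are hypothetical, and how the boundary
case (block time exactly equal to the global max) is handled is an assumption.

```python
def classify_halt(poll_ns, block_ns, max_poll_ns):
    """Decide what happens to the polling interval after one halt.

    poll_ns:     current per-vcpu halt polling interval.
    block_ns:    total block time of this halt.
    max_poll_ns: global max polling interval (halt_poll_ns).
    Returns "keep", "grow" or "shrink".
    """
    if block_ns <= poll_ns:
        # Wakeup arrived while polling: interval left unchanged.
        return "keep"
    if block_ns > max_poll_ns:
        # Polling could never have been long enough: shrink.
        return "shrink"
    if poll_ns < max_poll_ns and block_ns < max_poll_ns:
        # A longer interval might have caught this wakeup: grow.
        return "grow"
    # Assumption: anything else (the exact boundary) leaves it unchanged.
    return "keep"
```

For example, with a global max of 200000 ns and a current interval of 10000 ns,
a 50000 ns block leads to growth and a 500000 ns block leads to shrinking.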

If both the polling interval and total block time were less than the global
max polling interval, the polling interval can be increased in the hope that
next time, during the longer polling interval, the wakeup source will be
received while the host is polling and the latency benefits will be realised.
The polling interval is grown in the function grow_halt_poll_ns(): it is
multiplied by the module parameter halt_poll_ns_grow, and when it grows from
zero it starts at the module parameter halt_poll_ns_grow_start.
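
As a rough illustration of the growth step, the following sketch applies the
rule to a plain integer. It is written from the description in this document
rather than from the kernel source: the name ``grow_poll_ns`` is hypothetical,
the 200000 ns ceiling is only an example value (the real default is per
architecture), and clamping inside this helper is an assumption made to keep
the example self-contained.

```python
def grow_poll_ns(poll_ns, grow=2, grow_start=10000, max_poll_ns=200000):
    """Return the grown polling interval (sketch, not kernel code).

    grow and grow_start default to the documented defaults of
    halt_poll_ns_grow and halt_poll_ns_grow_start; max_poll_ns is an
    illustrative ceiling standing in for halt_poll_ns.
    """
    # Multiply by the growth factor, but never end up below the
    # starting value (this is what lifts an interval of zero).
    grown = max(poll_ns * grow, grow_start)
    # The global max polling interval is the ceiling for every vcpu.
    return min(grown, max_poll_ns)
```

Starting from zero this yields 10000, then 20000, 40000 and so on, until the
ceiling is reached.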

If the total block time was greater than the global max polling interval, the
host will never poll for long enough (limited by the global max) to wake up
during the polling interval, so the interval may as well be shrunk in order to
avoid pointless polling. The polling interval is shrunk in the function
shrink_halt_poll_ns() and is divided by the module parameter
halt_poll_ns_shrink, or set to 0 iff halt_poll_ns_shrink == 0.
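
The shrink step can be sketched in the same way. The name ``shrink_poll_ns`` is
hypothetical and the function only reflects the rule stated above; any further
handling the kernel may apply to very small results is not modelled.

```python
def shrink_poll_ns(poll_ns, shrink=0):
    """Return the shrunk polling interval (sketch, not kernel code).

    shrink defaults to the documented default of halt_poll_ns_shrink.
    """
    if shrink == 0:
        # A divisor of zero means "reset polling to zero".
        return 0
    # Integer division, since the interval is a whole number of ns.
    return poll_ns // shrink
```

With the default of 0, any long block drops the interval straight back to
zero; with a value of 2 the interval is halved instead.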

It is worth noting that this adjustment process attempts to home in on some
steady state polling interval, but it will only really do a good job for
wakeups which come at an approximately constant rate; otherwise there will be
constant adjustment of the polling interval.
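
The difference between regular and irregular wakeups can be seen in a toy
simulation. Everything here is a simplified model built from the description in
this section: the function ``simulate`` is hypothetical, the 200000 ns global
max is an example value, and the block times are fed in directly rather than
measured.

```python
def simulate(block_times_ns, max_poll_ns=200000, grow=2,
             grow_start=10000, shrink=0):
    """Return the polling interval in effect at the start of each halt."""
    poll_ns = 0
    history = []
    for block_ns in block_times_ns:
        history.append(poll_ns)
        if block_ns <= poll_ns:
            # Wakeup arrived while polling: interval unchanged.
            continue
        if block_ns > max_poll_ns:
            # Long block: shrink (or reset to zero when shrink == 0).
            poll_ns = 0 if shrink == 0 else poll_ns // shrink
        elif poll_ns < max_poll_ns and block_ns < max_poll_ns:
            # Short block that polling missed: grow, up to the ceiling.
            poll_ns = min(max(poll_ns * grow, grow_start), max_poll_ns)
    return history


steady = simulate([30000] * 6)
# -> [0, 10000, 20000, 40000, 40000, 40000]
mixed = simulate([30000, 500000] * 3)
# -> [0, 10000, 0, 10000, 0, 10000]
```

With wakeups every 30000 ns the interval settles at 40000 ns after three
halts. When short and very long blocks alternate, the interval is grown and
reset on every halt and never settles.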

[0] total block time:
		      the time between when the halt polling function is
		      invoked and a wakeup source received (irrespective of
		      whether the scheduler is invoked within that function).

Module Parameters
=================

The kvm module has four tuneable module parameters to adjust the global max
polling interval as well as the rate at which the polling interval is grown and
shrunk. These variables are defined in include/linux/kvm_host.h and as module
parameters in virt/kvm/kvm_main.c, or arch/powerpc/kvm/book3s_hv.c in the
powerpc kvm-hv case.

+-----------------------+---------------------------+-------------------------+
|Module Parameter       | Description               | Default Value           |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns           | The global max polling    | KVM_HALT_POLL_NS_DEFAULT|
|                       | interval which defines    |                         |
|                       | the ceiling value of the  |                         |
|                       | polling interval for      | (per arch value)        |
|                       | each vcpu.                |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_grow      | The value by which the    | 2                       |
|                       | halt polling interval is  |                         |
|                       | multiplied in the         |                         |
|                       | grow_halt_poll_ns()       |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_grow_start| The initial value to grow | 10000                   |
|                       | to from zero in the       |                         |
|                       | grow_halt_poll_ns()       |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+
|halt_poll_ns_shrink    | The value by which the    | 0                       |
|                       | halt polling interval is  |                         |
|                       | divided in the            |                         |
|                       | shrink_halt_poll_ns()     |                         |
|                       | function.                 |                         |
+-----------------------+---------------------------+-------------------------+

These module parameters can be set from the sysfs files in:

	/sys/module/kvm/parameters/

Note: these module parameters are system-wide values and cannot be tuned on a
per-VM basis.

Any changes to these parameters will be picked up by new and existing vCPUs the
next time they halt, with the notable exception of VMs using KVM_CAP_HALT_POLL
(see next section).
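
As a convenience, the current values can be read back from sysfs with a few
lines of Python. This is a minimal sketch: the helper name
``read_halt_poll_params`` is hypothetical, the directory is the one given
above, and a parameter that is missing or unreadable on a given host is
reported as ``None`` rather than treated as an error.

```python
import os

# The four parameters listed in the table above.
PARAMS = (
    "halt_poll_ns",
    "halt_poll_ns_grow",
    "halt_poll_ns_grow_start",
    "halt_poll_ns_shrink",
)


def read_halt_poll_params(base="/sys/module/kvm/parameters"):
    """Return a dict mapping each parameter name to its integer value."""
    values = {}
    for name in PARAMS:
        try:
            with open(os.path.join(base, name)) as f:
                values[name] = int(f.read().strip())
        except (OSError, ValueError):
            # File absent, unreadable, or not a plain integer.
            values[name] = None
    return values


if __name__ == "__main__":
    for name, value in read_halt_poll_params().items():
        print(name, value)
```

Changing a value is done by writing to the same file, which normally requires
root privileges.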

KVM_CAP_HALT_POLL
=================

KVM_CAP_HALT_POLL is a VM capability that allows userspace to override
halt_poll_ns on a per-VM basis. VMs using KVM_CAP_HALT_POLL ignore halt_poll_ns
completely (but still obey halt_poll_ns_grow, halt_poll_ns_grow_start, and
halt_poll_ns_shrink).

See Documentation/virt/kvm/api.rst for more information on this capability.
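
The precedence between the module parameter and the per-VM override can be
summarised in a few lines. This is a sketch of the rule stated above only: the
function and argument names are hypothetical, and ``None`` is used here merely
to represent a VM that has not used KVM_CAP_HALT_POLL.

```python
def effective_max_poll_ns(module_halt_poll_ns, vm_override_ns=None):
    """Return the max polling interval that applies to a VM's vcpus.

    module_halt_poll_ns: value of the halt_poll_ns module parameter.
    vm_override_ns:      value set through KVM_CAP_HALT_POLL, or None
                         if the VM does not use the capability.
    """
    if vm_override_ns is not None:
        # The VM uses the capability: halt_poll_ns is ignored entirely.
        return vm_override_ns
    return module_halt_poll_ns
```

Note that an override of 0 is still an override in this model, so such a VM
would not poll even if the module parameter is non-zero.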

Further Notes
=============

- Care should be taken when setting the halt_poll_ns module parameter, as a large value
  has the potential to drive the cpu usage to 100% on a machine which would be almost
  entirely idle otherwise. This is because even if a guest has wakeups during which very
  little work is done and which are quite far apart, if the period is shorter than the
  global max polling interval (halt_poll_ns) then the host will always poll for the
  entire block time and thus cpu utilisation will go to 100%.

- Halt polling essentially presents a trade-off between power usage and latency, and
  the module parameters should be used to tune the affinity for this. Idle cpu time is
  essentially converted to host kernel time with the aim of decreasing latency when
  entering the guest.

- Halt polling will only be conducted by the host when no other tasks are runnable on
  that cpu; otherwise the polling will cease immediately and schedule will be invoked to
  allow that other task to run. Thus this doesn't allow a guest to cause denial of service
  of the cpu.
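
The first note can be made concrete with a back-of-the-envelope estimate. The
model below is an assumption-laden sketch: the function name is hypothetical,
the guest is assumed to do negligible work per wakeup, wakeups are assumed to
be perfectly periodic, and the per-vcpu polling interval is assumed to stay
fixed at ``poll_ns`` instead of being adjusted.

```python
def polling_cpu_fraction(wakeup_period_ns, poll_ns):
    """Estimate the fraction of cpu time spent halt polling.

    wakeup_period_ns: time between guest wakeups (must be positive).
    poll_ns:          polling interval assumed to be in effect.
    """
    if wakeup_period_ns <= 0:
        raise ValueError("wakeup_period_ns must be positive")
    # The host polls until the wakeup arrives or the interval expires,
    # whichever comes first.
    return min(poll_ns, wakeup_period_ns) / wakeup_period_ns
```

Under these assumptions, a guest waking every 100000 ns with a 200000 ns
interval keeps the cpu polling all the time (fraction 1.0), while the same
guest with polling disabled leaves the cpu idle (fraction 0.0).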