.. SPDX-License-Identifier: GPL-2.0

======================
Generic vcpu interface
======================

The virtual cpu "device" also accepts the ioctls KVM_SET_DEVICE_ATTR,
KVM_GET_DEVICE_ATTR, and KVM_HAS_DEVICE_ATTR. The interface uses the same struct
kvm_device_attr as other devices, but targets VCPU-wide settings and controls.

The groups and attributes per virtual cpu, if any, are architecture specific.

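As a rough illustration of the calling convention, the sketch below wraps the
three ioctls for a vcpu file descriptor. The helper names (vcpu_attr_ioctl,
vcpu_has_attr, vcpu_get_attr, vcpu_set_attr) and the "0 or negative errno"
return convention are assumptions of this example, not part of the KVM API;
struct kvm_device_attr and the ioctl requests come from <linux/kvm.h>.

.. code-block:: c

    #include <errno.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Issue one of the *_DEVICE_ATTR ioctls on a vcpu fd (hypothetical helper). */
    static int vcpu_attr_ioctl(int vcpu_fd, unsigned long request,
                               uint32_t group, uint64_t attr, void *data)
    {
        struct kvm_device_attr da = {
            .flags = 0,
            .group = group,                     /* e.g. KVM_ARM_VCPU_PMU_V3_CTRL */
            .attr  = attr,                      /* attribute within the group */
            .addr  = (uint64_t)(uintptr_t)data, /* userspace pointer, may be NULL */
        };

        if (ioctl(vcpu_fd, request, &da) < 0)
            return -errno;                      /* report failure as -errno */
        return 0;
    }

    /* Probe whether the vcpu supports a given group/attribute pair. */
    int vcpu_has_attr(int vcpu_fd, uint32_t group, uint64_t attr)
    {
        return vcpu_attr_ioctl(vcpu_fd, KVM_HAS_DEVICE_ATTR, group, attr, NULL);
    }

    /* Read an attribute into the buffer pointed to by data. */
    int vcpu_get_attr(int vcpu_fd, uint32_t group, uint64_t attr, void *data)
    {
        return vcpu_attr_ioctl(vcpu_fd, KVM_GET_DEVICE_ATTR, group, attr, data);
    }

    /* Write an attribute from the buffer pointed to by data. */
    int vcpu_set_attr(int vcpu_fd, uint32_t group, uint64_t attr, void *data)
    {
        return vcpu_attr_ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, group, attr, data);
    }

Probing with vcpu_has_attr() before setting an attribute lets userspace
distinguish "not supported" from the other error codes listed per attribute.
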
1. GROUP: KVM_ARM_VCPU_PMU_V3_CTRL
==================================

:Architectures: ARM64

1.1. ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_IRQ
---------------------------------------

:Parameters: in kvm_device_attr.addr the address for PMU overflow interrupt is a
             pointer to an int

Returns:

         =======  ========================================================
         -EBUSY   The PMU overflow interrupt is already set
         -EFAULT  Error reading interrupt number
         -ENXIO   PMUv3 not supported or the overflow interrupt not set
                  when attempting to get it
         -ENODEV  KVM_ARM_VCPU_PMU_V3 feature missing from VCPU
         -EINVAL  Invalid PMU overflow interrupt number supplied or
                  trying to set the IRQ number without using an in-kernel
                  irqchip.
         =======  ========================================================

A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt
number for this vcpu. This interrupt can be a PPI or an SPI, but the interrupt
type must be the same for each vcpu. As a PPI, the interrupt number is the same
for all vcpus, while as an SPI it must be a separate number per vcpu.

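The PPI/SPI rule above can be checked in userspace before any attribute is
set. The following is a minimal sketch, assuming the standard GIC interrupt ID
ranges (PPIs 16..31, SPIs 32..1019); pmu_irqs_consistent() is a hypothetical
helper name, and the check only mirrors the rule as stated in this document.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    /* GIC interrupt ID ranges: PPIs are 16..31, SPIs are 32..1019. */
    static bool is_ppi(int irq) { return irq >= 16 && irq < 32; }
    static bool is_spi(int irq) { return irq >= 32 && irq < 1020; }

    /*
     * irqs[i] is the PMU overflow interrupt planned for vcpu i.
     * Valid layouts: one PPI shared by all vcpus, or one distinct SPI per vcpu.
     */
    bool pmu_irqs_consistent(const int *irqs, size_t nr_vcpus)
    {
        if (nr_vcpus == 0)
            return true;

        if (is_ppi(irqs[0])) {
            /* PPI: every vcpu must use the same interrupt number. */
            for (size_t i = 1; i < nr_vcpus; i++)
                if (irqs[i] != irqs[0])
                    return false;
            return true;
        }

        /* Otherwise every entry must be an SPI, and all must differ. */
        for (size_t i = 0; i < nr_vcpus; i++) {
            if (!is_spi(irqs[i]))
                return false;
            for (size_t j = 0; j < i; j++)
                if (irqs[j] == irqs[i])
                    return false;
        }
        return true;
    }
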
1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT
---------------------------------------

:Parameters: no additional parameter in kvm_device_attr.addr

Returns:

         =======  ======================================================
         -EEXIST  Interrupt number already used
         -ENODEV  PMUv3 not supported or GIC not initialized
         -ENXIO   PMUv3 not supported, missing VCPU feature or interrupt
                  number not set
         -EBUSY   PMUv3 already initialized
         =======  ======================================================

Request the initialization of the PMUv3.  If using the PMUv3 with an in-kernel
virtual GIC implementation, this must be done after initializing the in-kernel
irqchip.

1.3 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_FILTER
-----------------------------------------

:Parameters: in kvm_device_attr.addr the address for a PMU event filter is a
             pointer to a struct kvm_pmu_event_filter

:Returns:

         =======  ======================================================
         -ENODEV  PMUv3 not supported or GIC not initialized
         -ENXIO   PMUv3 not properly configured or in-kernel irqchip not
                  configured as required prior to calling this attribute
         -EBUSY   PMUv3 already initialized or a VCPU has already run
         -EINVAL  Invalid filter range
         =======  ======================================================

Request the installation of a PMU event filter described as follows::

    struct kvm_pmu_event_filter {
            __u16   base_event;
            __u16   nevents;

    #define KVM_PMU_EVENT_ALLOW     0
    #define KVM_PMU_EVENT_DENY      1

            __u8    action;
            __u8    pad[3];
    };

A filter range is defined as the range [@base_event, @base_event + @nevents),
together with an @action (KVM_PMU_EVENT_ALLOW or KVM_PMU_EVENT_DENY). The
first registered range defines the global policy (global ALLOW if the first
@action is DENY, global DENY if the first @action is ALLOW). Multiple ranges
can be programmed, and must fit within the event space defined by the PMU
architecture (10 bits on ARMv8.0, 16 bits from ARMv8.1 onwards).

Note: "Cancelling" a filter by registering the opposite action for the same
range doesn't change the default action. For example, installing an ALLOW
filter for event range [0:10) as the first filter and then applying a DENY
action for the same range will leave the whole range disabled.

Restrictions: Event 0 (SW_INCR) is never filtered, as it doesn't count a
hardware event. Filtering event 0x1E (CHAIN) has no effect either, as it
isn't strictly speaking an event. Filtering the cycle counter is possible
using event 0x11 (CPU_CYCLES).

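The filter semantics described above can be modelled in a few lines of
userspace code. This is only a sketch of the documented behaviour, not the
kernel implementation: struct filter_model, filter_apply() and
filter_event_allowed() are hypothetical names, a 16-bit event space is
assumed, and the PMU_ALLOW/PMU_DENY values mirror KVM_PMU_EVENT_ALLOW and
KVM_PMU_EVENT_DENY.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define PMU_ALLOW  0            /* mirrors KVM_PMU_EVENT_ALLOW */
    #define PMU_DENY   1            /* mirrors KVM_PMU_EVENT_DENY */
    #define NR_EVENTS  (1u << 16)   /* assumed 16-bit event space */

    struct filter_model {
        bool installed;                 /* has a first range been registered? */
        uint8_t allowed[NR_EVENTS / 8]; /* one bit per event, 1 = allowed */
    };

    /* Register one range; returns 0 on success, -1 for an invalid range. */
    int filter_apply(struct filter_model *m, uint16_t base, uint16_t nevents,
                     uint8_t action)
    {
        if ((uint32_t)base + nevents > NR_EVENTS ||
            (action != PMU_ALLOW && action != PMU_DENY))
            return -1;

        if (!m->installed) {
            /* The first range fixes the global policy: opposite of its action. */
            memset(m->allowed, action == PMU_ALLOW ? 0x00 : 0xff,
                   sizeof(m->allowed));
            m->installed = true;
        }

        for (uint32_t ev = base; ev < (uint32_t)base + nevents; ev++) {
            if (action == PMU_ALLOW)
                m->allowed[ev / 8] |= (uint8_t)(1u << (ev % 8));
            else
                m->allowed[ev / 8] &= (uint8_t)~(1u << (ev % 8));
        }
        return 0;
    }

    /* Would the guest be able to count this event under the model? */
    bool filter_event_allowed(const struct filter_model *m, uint16_t ev)
    {
        if (ev == 0x00 || ev == 0x1e)   /* SW_INCR and CHAIN are unaffected */
            return true;
        if (!m->installed)              /* no filter: everything is allowed */
            return true;
        return (m->allowed[ev / 8] & (1u << (ev % 8))) != 0;
    }

With a zero-initialized model, applying ALLOW on [0:10) and then DENY on the
same range leaves events 1 to 9 denied and every other filterable event denied
as well, since the global policy was fixed to DENY by the first call.
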
1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_PMU
------------------------------------------

:Parameters: in kvm_device_attr.addr the address of an int representing the PMU
             identifier.

:Returns:

         =======  ====================================================
         -EBUSY   PMUv3 already initialized, a VCPU has already run or
                  an event filter has already been set
         -EFAULT  Error accessing the PMU identifier
         -ENXIO   PMU not found
         -ENODEV  PMUv3 not supported or GIC not initialized
         -ENOMEM  Could not allocate memory
         =======  ====================================================

Request that the VCPU uses the specified hardware PMU when creating guest events
for the purpose of PMU emulation. The PMU identifier can be read from the "type"
file for the desired PMU instance under /sys/devices (or, equivalently,
/sys/bus/event_source). This attribute is particularly useful on heterogeneous
systems where there are at least two CPU PMUs on the system. The PMU that is set
for one VCPU will be used by all the other VCPUs. It isn't possible to set a PMU
if a PMU event filter is already present.

Note that KVM will not make any attempt to run the VCPU on the physical CPUs
associated with the PMU specified by this attribute. This is entirely left to
userspace. However, attempting to run the VCPU on a physical CPU not supported
by the PMU will fail: KVM_RUN will return with
exit_reason = KVM_EXIT_FAIL_ENTRY and populate the fail_entry struct by setting
the hardware_entry_failure_reason field to KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED
and the cpu field to the processor id.

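Reading the PMU identifier mentioned above amounts to parsing a single decimal
integer from the sysfs "type" file. A minimal sketch follows; read_pmu_type()
is a hypothetical helper, and the path passed to it is platform dependent.

.. code-block:: c

    #include <stdio.h>

    /*
     * Parse the decimal PMU identifier from a sysfs "type" file.
     * Returns 0 and stores the identifier in *type, or -1 on failure.
     */
    int read_pmu_type(const char *type_path, int *type)
    {
        FILE *f = fopen(type_path, "r");
        int ok;

        if (!f)
            return -1;

        ok = (fscanf(f, "%d", type) == 1);
        fclose(f);
        return ok ? 0 : -1;
    }

A caller would pass something like
/sys/bus/event_source/devices/<pmu-name>/type, where <pmu-name> is a
placeholder for whatever PMU instance the platform exposes, and then hand the
address of the resulting int to KVM through kvm_device_attr.addr.
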
1.5 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS
--------------------------------------------------

:Parameters: in kvm_device_attr.addr the address of an unsigned int
             representing the maximum value taken by PMCR_EL0.N

:Returns:

         =======  ====================================================
         -EBUSY   PMUv3 already initialized, a VCPU has already run or
                  an event filter has already been set
         -EFAULT  Error accessing the value pointed to by addr
         -ENODEV  PMUv3 not supported or GIC not initialized
         -EINVAL  No PMUv3 explicitly selected, or value of N out of
                  range
         =======  ====================================================

Set the number of implemented event counters in the virtual PMU. This
requires that a PMU has been explicitly selected via
KVM_ARM_VCPU_PMU_V3_SET_PMU, and will fail when no PMU has been
explicitly selected, or when the number of counters is out of range for the
selected PMU. Selecting a new PMU cancels the effect of setting this
attribute.

2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
=================================

:Architectures: ARM64

2.1. ATTRIBUTES: KVM_ARM_VCPU_TIMER_IRQ_{VTIMER,PTIMER,HVTIMER,HPTIMER}
-----------------------------------------------------------------------

:Parameters: in kvm_device_attr.addr the address for the timer interrupt is a
             pointer to an int

Returns:

         =======  ==================================
         -EINVAL  Invalid timer interrupt number
         -EBUSY   One or more VCPUs have already run
         =======  ==================================

A value describing the architected timer interrupt number when connected to an
in-kernel virtual GIC.  Each must be a PPI (16 <= intid < 32).  Setting the
attribute overrides the default values (see below).

==============================  ==========================================
KVM_ARM_VCPU_TIMER_IRQ_VTIMER   The EL1 virtual timer intid (default: 27)
KVM_ARM_VCPU_TIMER_IRQ_PTIMER   The EL1 physical timer intid (default: 30)
KVM_ARM_VCPU_TIMER_IRQ_HVTIMER  The EL2 virtual timer intid (default: 28)
KVM_ARM_VCPU_TIMER_IRQ_HPTIMER  The EL2 physical timer intid (default: 26)
==============================  ==========================================

Setting the same PPI for different timers will prevent the VCPUs from running.
Setting the interrupt number on a VCPU configures all VCPUs created at that
time to use the number provided for a given timer, overwriting any previously
configured values on other VCPUs.  Userspace should configure the interrupt
numbers on at least one VCPU after creating all VCPUs and before running any
VCPUs.

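Because a duplicated PPI only shows up later as VCPUs that refuse to run, a
VMM may want to validate its timer interrupt assignment up front. The sketch
below checks the two constraints stated above (PPI range and uniqueness);
struct timer_ppis and timer_ppis_valid() are hypothetical names used for
illustration only.

.. code-block:: c

    #include <stdbool.h>

    /* Planned interrupt IDs for the four architected timers. */
    struct timer_ppis {
        int vtimer;     /* EL1 virtual timer,  default 27 */
        int ptimer;     /* EL1 physical timer, default 30 */
        int hvtimer;    /* EL2 virtual timer,  default 28 */
        int hptimer;    /* EL2 physical timer, default 26 */
    };

    /* True if every intid is a PPI (16 <= intid < 32) and all four differ. */
    bool timer_ppis_valid(const struct timer_ppis *t)
    {
        const int ids[4] = { t->vtimer, t->ptimer, t->hvtimer, t->hptimer };

        for (int i = 0; i < 4; i++) {
            if (ids[i] < 16 || ids[i] >= 32)
                return false;           /* not a PPI */
            for (int j = 0; j < i; j++)
                if (ids[j] == ids[i])
                    return false;       /* same PPI used by two timers */
        }
        return true;
    }
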
.. _kvm_arm_vcpu_pvtime_ctrl:

3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL
==================================

:Architectures: ARM64

3.1 ATTRIBUTE: KVM_ARM_VCPU_PVTIME_IPA
--------------------------------------

:Parameters: 64-bit base address

Returns:

         =======  ======================================
         -ENXIO   Stolen time not implemented
         -EEXIST  Base address already set for this VCPU
         -EINVAL  Base address not 64-byte aligned
         =======  ======================================

Specifies the base address of the stolen time structure for this VCPU. The
base address must be 64-byte aligned and exist within a valid guest memory
region. See Documentation/virt/kvm/arm/pvtime.rst for more information,
including the layout of the stolen time structure.

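The alignment requirement is easy to verify before issuing the ioctl. The
sketch below also derives per-VCPU base addresses from a single reserved guest
region; giving each VCPU its own 64-byte slot is a layout choice made for this
example, not something mandated by the interface, and both helper names are
hypothetical.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    #define PVTIME_ALIGN 64u    /* required alignment of the base address */

    /* True if ipa satisfies the 64-byte alignment requirement. */
    bool pvtime_ipa_aligned(uint64_t ipa)
    {
        return (ipa & (PVTIME_ALIGN - 1)) == 0;
    }

    /*
     * Base address for a given VCPU, assuming one 64-byte slot per VCPU
     * packed from region_base (an assumption of this sketch).
     */
    uint64_t pvtime_ipa_for_vcpu(uint64_t region_base, unsigned int vcpu_index)
    {
        return region_base + (uint64_t)vcpu_index * PVTIME_ALIGN;
    }

As long as region_base itself is 64-byte aligned, every address produced by
pvtime_ipa_for_vcpu() is aligned too.
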
4. GROUP: KVM_VCPU_TSC_CTRL
===========================

:Architectures: x86

4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET
----------------------------------

:Parameters: 64-bit unsigned TSC offset

Returns:

         ======= ======================================
         -EFAULT Error reading/writing the provided
                 parameter address.
         -ENXIO  Attribute not supported
         ======= ======================================

Specifies the guest's TSC offset relative to the host's TSC. The guest's
TSC is then derived by the following equation:

  guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET

This attribute is useful to adjust the guest's TSC on live migration,
so that the TSC counts the time during which the VM was paused. The
following describes a possible algorithm to use for this purpose.

From the source VMM process:

1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_src),
   kvmclock nanoseconds (guest_src), and host CLOCK_REALTIME nanoseconds
   (host_src).

2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the
   guest TSC offset (ofs_src[i]).

3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the
   guest's TSC (freq).

From the destination VMM process:

4. Invoke the KVM_SET_CLOCK ioctl, providing the source nanoseconds from
   kvmclock (guest_src) and CLOCK_REALTIME (host_src) in their respective
   fields.  Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
   structure.

   KVM will advance the VM's kvmclock to account for elapsed time since
   recording the clock values.  Note that this will cause problems in
   the guest (e.g., timeouts) unless CLOCK_REALTIME is synchronized
   between the source and destination, and a reasonably short time passes
   between the source pausing the VMs and the destination executing
   steps 4-7.

5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_dest) and
   kvmclock nanoseconds (guest_dest).

6. Adjust the guest TSC offsets for every vCPU to account for (1) time
   elapsed since recording state and (2) difference in TSCs between the
   source and destination machine:

   ofs_dst[i] = ofs_src[i] -
     (guest_src - guest_dest) * freq +
     (tsc_src - tsc_dest)

   ("ofs[i] + tsc - guest * freq" is the guest TSC value corresponding to
   a time of 0 in kvmclock.  The above formula ensures that it is the
   same on the destination as it was on the source).

7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the
   respective value derived in the previous step.

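The arithmetic of step 6 can be sketched as follows. The formula above is
written without units; this example assumes that the kvmclock values are in
nanoseconds and that freq is the value in kHz returned by KVM_GET_TSC_KHZ, so
the product is divided by 1000000 to obtain TSC ticks. The function names are
hypothetical, and the 128-bit intermediate relies on the __int128 extension
provided by GCC and Clang on 64-bit targets.

.. code-block:: c

    #include <stdint.h>

    /* Convert a signed nanosecond delta to TSC ticks: ns * kHz / 10^6. */
    static int64_t ns_to_tsc_ticks(int64_t ns, uint32_t tsc_khz)
    {
        /* 128-bit intermediate avoids overflow for long pauses. */
        return (int64_t)(((__int128)ns * tsc_khz) / 1000000);
    }

    /*
     * ofs_dst = ofs_src - (guest_src - guest_dest) * freq + (tsc_src - tsc_dest)
     *
     * Offsets and TSC values are treated as unsigned 64-bit quantities that
     * wrap modulo 2^64, which is how the result is consumed by the guest TSC
     * equation guest_tsc = host_tsc + offset.
     */
    uint64_t tsc_offset_for_destination(uint64_t ofs_src,
                                        uint64_t guest_src, uint64_t guest_dest,
                                        uint64_t tsc_src, uint64_t tsc_dest,
                                        uint32_t tsc_khz)
    {
        int64_t guest_delta_ns = (int64_t)(guest_src - guest_dest);
        uint64_t guest_delta_ticks =
            (uint64_t)ns_to_tsc_ticks(guest_delta_ns, tsc_khz);

        return ofs_src - guest_delta_ticks + (tsc_src - tsc_dest);
    }

The function would be called once per vCPU with that vCPU's ofs_src[i]; the
remaining arguments are the same for all vCPUs of the VM.
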