.. SPDX-License-Identifier: GPL-2.0

======================
Generic vcpu interface
======================

The virtual cpu "device" also accepts the ioctls KVM_SET_DEVICE_ATTR,
KVM_GET_DEVICE_ATTR, and KVM_HAS_DEVICE_ATTR. The interface uses the same struct
kvm_device_attr as other devices, but targets VCPU-wide settings and controls.

The groups and attributes per virtual cpu, if any, are architecture specific.
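
All attributes below go through these three ioctls on the vcpu file descriptor.
The following is a minimal sketch, not a KVM-provided API: the helper names
vcpu_attr, vcpu_has_attr, vcpu_set_attr and vcpu_get_attr are hypothetical, and
the caller is assumed to already hold a vcpu fd returned by KVM_CREATE_VCPU.
When an attribute takes a value, it is passed indirectly, by storing a userspace
pointer in kvm_device_attr.addr.

.. code-block:: c

    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Build a kvm_device_attr for a vcpu-wide group/attribute pair. */
    struct kvm_device_attr vcpu_attr(uint32_t group, uint64_t attr, void *val)
    {
        struct kvm_device_attr a = {
            .flags = 0,
            .group = group,
            .attr  = attr,
            /* userspace address of the value, or 0 if there is none */
            .addr  = (uint64_t)(uintptr_t)val,
        };
        return a;
    }

    /* 1 if supported, 0 if KVM reports ENXIO, -1 (errno set) otherwise. */
    int vcpu_has_attr(int vcpu_fd, uint32_t group, uint64_t attr)
    {
        struct kvm_device_attr a = vcpu_attr(group, attr, NULL);

        if (ioctl(vcpu_fd, KVM_HAS_DEVICE_ATTR, &a) == 0)
            return 1;
        return errno == ENXIO ? 0 : -1;
    }

    /* Both return the ioctl result: 0 on success, -1 with errno set. */
    int vcpu_set_attr(int vcpu_fd, uint32_t group, uint64_t attr, void *val)
    {
        struct kvm_device_attr a = vcpu_attr(group, attr, val);
        return ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &a);
    }

    int vcpu_get_attr(int vcpu_fd, uint32_t group, uint64_t attr, void *val)
    {
        struct kvm_device_attr a = vcpu_attr(group, attr, val);
        return ioctl(vcpu_fd, KVM_GET_DEVICE_ATTR, &a);
    }

Probing with vcpu_has_attr() before relying on an optional attribute lets a VMM
fall back gracefully, since an unknown group or attribute is typically reported
with ENXIO.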

1. GROUP: KVM_ARM_VCPU_PMU_V3_CTRL
==================================

:Architectures: ARM64

1.1. ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_IRQ
---------------------------------------

:Parameters: kvm_device_attr.addr is a pointer to an int holding the PMU
	     overflow interrupt number

Returns:

	 =======  ========================================================
	 -EBUSY   The PMU overflow interrupt is already set
	 -EFAULT  Error reading interrupt number
	 -ENXIO   PMUv3 not supported or the overflow interrupt not set
		  when attempting to get it
	 -ENODEV  KVM_ARM_VCPU_PMU_V3 feature missing from VCPU
	 -EINVAL  Invalid PMU overflow interrupt number supplied or
		  trying to set the IRQ number without using an in-kernel
		  irqchip.
	 =======  ========================================================

A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt
number for this vcpu. This interrupt can be a PPI or an SPI, but the interrupt
type must be the same for every vcpu. As a PPI, the interrupt number is the same
for all vcpus, while as an SPI it must be a separate number per vcpu.

For GICv5-based guests, the architected PPI (23) must be used, and must be
communicated as the full GICv5-style Interrupt ID, i.e., 0x20000017. This ioctl
can be omitted altogether for a GICv5-based guest.
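
As an illustration of these rules, a VMM might derive the value it passes for
each vcpu as sketched below. The enum, the function name and the SPI layout (one
SPI per vcpu, numbered consecutively from a VMM-chosen base) are hypothetical
choices of this sketch; only the PPI/SPI rules and the GICv5 ID 0x20000017 come
from the text above.

.. code-block:: c

    /* Hypothetical description of how the VMM wired the PMU overflow IRQ. */
    enum pmu_irq_kind { PMU_IRQ_PPI, PMU_IRQ_SPI, PMU_IRQ_GICV5 };

    /* Architected PMU PPI (23) as a full GICv5-style Interrupt ID. */
    #define GICV5_PMU_INTID 0x20000017

    /*
     * Interrupt number for vcpu 'idx'. For a PPI, 'base' is the PPI intid and
     * is shared by all vcpus; for SPIs this sketch assumes one SPI per vcpu,
     * numbered consecutively from 'base'.
     */
    int pmu_irq_for_vcpu(enum pmu_irq_kind kind, int base, int idx)
    {
        switch (kind) {
        case PMU_IRQ_PPI:
            return base;
        case PMU_IRQ_SPI:
            return base + idx;
        case PMU_IRQ_GICV5:
            /* fixed ID; the attribute may also be skipped entirely */
            return GICV5_PMU_INTID;
        }
        return -1;
    }

The returned int is what kvm_device_attr.addr would point to when setting
KVM_ARM_VCPU_PMU_V3_IRQ on that vcpu.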

1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT
---------------------------------------

:Parameters: no additional parameter in kvm_device_attr.addr

Returns:

	 =======  ======================================================
	 -EEXIST  Interrupt number already used
	 -ENODEV  PMUv3 not supported or GIC not initialized
	 -ENXIO   PMUv3 not supported, missing VCPU feature or interrupt
		  number not set (non-GICv5 guests only)
	 -EBUSY   PMUv3 already initialized
	 =======  ======================================================

Request the initialization of the PMUv3.  If using the PMUv3 with an in-kernel
virtual GIC implementation, this must be done after initializing the in-kernel
irqchip.
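
A sketch of the expected ordering for one vcpu, assuming the in-kernel irqchip
has already been created and initialized. The function name vcpu_pmu_init is
hypothetical. The #ifndef fallbacks exist only so the sketch also compiles on
non-arm64 hosts; their values are assumed to match the arm64 UAPI header, and on
arm64 the real definitions from <linux/kvm.h> take precedence.

.. code-block:: c

    #include <errno.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Fallbacks for non-arm64 builds (assumed to match arm64 <asm/kvm.h>). */
    #ifndef KVM_ARM_VCPU_PMU_V3_CTRL
    #define KVM_ARM_VCPU_PMU_V3_CTRL 0
    #endif
    #ifndef KVM_ARM_VCPU_PMU_V3_IRQ
    #define KVM_ARM_VCPU_PMU_V3_IRQ  0
    #endif
    #ifndef KVM_ARM_VCPU_PMU_V3_INIT
    #define KVM_ARM_VCPU_PMU_V3_INIT 1
    #endif

    /*
     * Set the overflow interrupt (skipped when irq < 0, e.g. for GICv5
     * guests) and then request PMUv3 initialization. Must run after the
     * in-kernel irqchip is initialized. Returns 0 or a negative errno.
     */
    int vcpu_pmu_init(int vcpu_fd, int irq)
    {
        struct kvm_device_attr attr = { .group = KVM_ARM_VCPU_PMU_V3_CTRL };

        if (irq >= 0) {
            attr.attr = KVM_ARM_VCPU_PMU_V3_IRQ;
            attr.addr = (uint64_t)(uintptr_t)&irq;   /* pointer to an int */
            if (ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr) < 0)
                return -errno;
        }

        attr.attr = KVM_ARM_VCPU_PMU_V3_INIT;
        attr.addr = 0;                               /* no parameter */
        if (ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr) < 0)
            return -errno;
        return 0;
    }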

1.3 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_FILTER
-----------------------------------------

:Parameters: kvm_device_attr.addr is a pointer to a struct
             kvm_pmu_event_filter

:Returns:

	 =======  ======================================================
	 -ENODEV  PMUv3 not supported or GIC not initialized
	 -ENXIO   PMUv3 not properly configured or in-kernel irqchip not
		  configured as required prior to calling this attribute
	 -EBUSY   PMUv3 already initialized or a VCPU has already run
	 -EINVAL  Invalid filter range
	 =======  ======================================================

Request the installation of a PMU event filter described as follows::

    struct kvm_pmu_event_filter {
	    __u16	base_event;
	    __u16	nevents;

    #define KVM_PMU_EVENT_ALLOW	0
    #define KVM_PMU_EVENT_DENY	1

	    __u8	action;
	    __u8	pad[3];
    };

A filter range is defined as the range [@base_event, @base_event + @nevents),
together with an @action (KVM_PMU_EVENT_ALLOW or KVM_PMU_EVENT_DENY). The
first registered range defines the global policy (global ALLOW if the first
@action is DENY, global DENY if the first @action is ALLOW). Multiple ranges
can be programmed, and must fit within the event space defined by the PMU
architecture (10 bits on ARMv8.0, 16 bits from ARMv8.1 onwards).

Note: "Cancelling" a filter by registering the opposite action for the same
range doesn't change the default action. For example, installing an ALLOW
filter for event range [0, 10) as the first filter and then applying a DENY
action for the same range will leave the whole range disabled.

Restrictions: Event 0 (SW_INCR) is never filtered, as it doesn't count a
hardware event. Filtering event 0x1E (CHAIN) has no effect either, as it
isn't strictly speaking an event. Filtering the cycle counter is possible
using event 0x11 (CPU_CYCLES).
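
The allow/deny semantics described above can be modelled in plain C, which may
help when reasoning about the order in which a VMM installs ranges. This is a
hypothetical userspace model of the documented behaviour, not KVM's
implementation; struct pmu_filter_range mirrors the documented fields but is
renamed so it cannot clash with the different x86 struct of the same name.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define FILTER_ALLOW 0          /* same values as KVM_PMU_EVENT_ALLOW/DENY */
    #define FILTER_DENY  1
    #define MAX_EVENTS   65536      /* 16-bit event space (ARMv8.1 onwards) */

    /* Mirrors base_event/nevents/action of struct kvm_pmu_event_filter. */
    struct pmu_filter_range {
        uint16_t base_event;
        uint16_t nevents;
        uint8_t  action;
    };

    /* Hypothetical model of the filter state kept for one VM. */
    struct pmu_filter_model {
        bool     installed;          /* has any range been applied yet? */
        uint32_t nr_events;          /* 1024 (ARMv8.0) or 65536; <= MAX_EVENTS */
        uint8_t  allowed[MAX_EVENTS];
    };

    /* Apply one range; -1 for an invalid range or action (-EINVAL in KVM). */
    int filter_apply(struct pmu_filter_model *m, const struct pmu_filter_range *r)
    {
        if ((uint32_t)r->base_event + r->nevents > m->nr_events ||
            (r->action != FILTER_ALLOW && r->action != FILTER_DENY))
            return -1;

        if (!m->installed) {
            /* The first range sets the default for every other event. */
            memset(m->allowed, r->action == FILTER_ALLOW ? 0 : 1, m->nr_events);
            m->installed = true;
        }
        memset(m->allowed + r->base_event, r->action == FILTER_ALLOW ? 1 : 0,
               r->nevents);
        return 0;
    }

    bool filter_event_allowed(const struct pmu_filter_model *m, uint16_t event)
    {
        if (event == 0x00 || event == 0x1E)   /* SW_INCR and CHAIN */
            return true;
        if (!m->installed)
            return true;
        return event < m->nr_events && m->allowed[event];
    }

Applying an ALLOW of [0, 10) followed by a DENY of the same range reproduces the
note above: those events end up disabled, and so does everything outside the
range, because the first filter made DENY the default.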

1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_PMU
------------------------------------------

:Parameters: kvm_device_attr.addr is a pointer to an int holding the PMU
             identifier

:Returns:

	 =======  ====================================================
	 -EBUSY   PMUv3 already initialized, a VCPU has already run or
		  an event filter has already been set
	 -EFAULT  Error accessing the PMU identifier
	 -ENXIO   PMU not found
	 -ENODEV  PMUv3 not supported or GIC not initialized
	 -ENOMEM  Could not allocate memory
	 =======  ====================================================

Request that the VCPU use the specified hardware PMU when creating guest events
for the purpose of PMU emulation. The PMU identifier can be read from the "type"
file for the desired PMU instance under /sys/devices (or, equivalently,
/sys/bus/event_source). This attribute is particularly useful on heterogeneous
systems where there are at least two CPU PMUs on the system. The PMU that is set
for one VCPU will be used by all the other VCPUs. It isn't possible to set a PMU
if a PMU event filter is already present.
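
The identifier can be obtained by reading the PMU's "type" file, for example as
below. The function names are hypothetical, and the PMU name passed in must be
one of the directory entries under /sys/bus/event_source/devices on the host;
names differ between systems.

.. code-block:: c

    #include <limits.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Parse the decimal contents of a PMU "type" file. */
    int parse_pmu_type(const char *text, int *type)
    {
        char *end;
        long v = strtol(text, &end, 10);

        if (end == text || (*end != '\0' && *end != '\n') || v < 0 || v > INT_MAX)
            return -1;
        *type = (int)v;
        return 0;
    }

    /* Read /sys/bus/event_source/devices/<pmu>/type into *type. */
    int read_pmu_type(const char *pmu, int *type)
    {
        char path[256], buf[32];
        FILE *f;
        int n = snprintf(path, sizeof(path),
                         "/sys/bus/event_source/devices/%s/type", pmu);

        if (n < 0 || (size_t)n >= sizeof(path))
            return -1;                   /* name too long */
        f = fopen(path, "r");
        if (!f)
            return -1;                   /* no such PMU */
        if (!fgets(buf, sizeof(buf), f)) {
            fclose(f);
            return -1;
        }
        fclose(f);
        return parse_pmu_type(buf, type);
    }

The resulting int is what kvm_device_attr.addr would point to when setting
KVM_ARM_VCPU_PMU_V3_SET_PMU.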

Note that KVM will not make any attempt to run the VCPU on the physical CPUs
associated with the PMU specified by this attribute. This is entirely left to
userspace. However, attempting to run the VCPU on a physical CPU not supported
by the PMU will fail and KVM_RUN will return with
exit_reason = KVM_EXIT_FAIL_ENTRY and populate the fail_entry struct by setting
the hardware_entry_failure_reason field to KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED
and the cpu field to the processor id.
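
A VMM's KVM_RUN loop could recognise this failure as sketched below. The
function name is hypothetical. KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED comes from the
arm64 UAPI headers; the #ifndef fallback, whose value is an assumption, is only
there so the sketch compiles on other architectures.

.. code-block:: c

    #include <linux/kvm.h>

    /* Fallback for non-arm64 builds (assumed to match arm64 <asm/kvm.h>). */
    #ifndef KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED
    #define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED (1ULL << 0)
    #endif

    /*
     * Inspect the shared kvm_run area after KVM_RUN has returned. Returns the
     * id of the physical CPU that the selected PMU does not support, or -1 if
     * the exit happened for some other reason.
     */
    int pmu_unsupported_cpu(const struct kvm_run *run)
    {
        if (run->exit_reason != KVM_EXIT_FAIL_ENTRY)
            return -1;
        if (run->fail_entry.hardware_entry_failure_reason !=
            KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED)
            return -1;
        return (int)run->fail_entry.cpu;
    }

On such an exit, a VMM would typically move the vcpu thread onto a CPU covered by
the selected PMU (for example with sched_setaffinity()) before calling KVM_RUN
again.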

1.5 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS
--------------------------------------------------

:Parameters: kvm_device_attr.addr is a pointer to an unsigned int
	     representing the maximum value taken by PMCR_EL0.N

:Returns:

	 =======  ====================================================
	 -EBUSY   PMUv3 already initialized, a VCPU has already run or
		  an event filter has already been set
	 -EFAULT  Error accessing the value pointed to by addr
	 -ENODEV  PMUv3 not supported or GIC not initialized
	 -EINVAL  No PMUv3 explicitly selected, or value of N out of
		  range
	 =======  ====================================================

Set the number of implemented event counters in the virtual PMU. This
requires that a PMU has been explicitly selected via
KVM_ARM_VCPU_PMU_V3_SET_PMU, and will fail when no PMU has been
explicitly selected, or the number of counters is out of range for the
selected PMU. Selecting a new PMU cancels the effect of setting this
attribute.
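
Because the counter limit is tied to an explicitly selected PMU, the two
attributes are naturally set back to back, PMU first. The function name is
hypothetical, pmu_type is the value read from the PMU's sysfs "type" file, and
the #ifndef fallbacks (values assumed to match the arm64 UAPI header) only let
the sketch compile on other architectures.

.. code-block:: c

    #include <errno.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Fallbacks for non-arm64 builds (assumed to match arm64 <asm/kvm.h>). */
    #ifndef KVM_ARM_VCPU_PMU_V3_CTRL
    #define KVM_ARM_VCPU_PMU_V3_CTRL 0
    #endif
    #ifndef KVM_ARM_VCPU_PMU_V3_SET_PMU
    #define KVM_ARM_VCPU_PMU_V3_SET_PMU 3
    #endif
    #ifndef KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS
    #define KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS 4
    #endif

    /*
     * Select a host PMU, then limit PMCR_EL0.N. Setting N first would fail
     * (no PMU selected yet), and re-selecting a PMU later would discard the
     * limit. Returns 0 or a negative errno.
     */
    int vcpu_pmu_select(int vcpu_fd, int pmu_type, unsigned int nr_counters)
    {
        struct kvm_device_attr attr = {
            .group = KVM_ARM_VCPU_PMU_V3_CTRL,
            .attr  = KVM_ARM_VCPU_PMU_V3_SET_PMU,
            .addr  = (uint64_t)(uintptr_t)&pmu_type,
        };

        if (ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr) < 0)
            return -errno;

        attr.attr = KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS;
        attr.addr = (uint64_t)(uintptr_t)&nr_counters;
        if (ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr) < 0)
            return -errno;          /* -EINVAL if N is out of range */
        return 0;
    }

Both calls have to precede KVM_ARM_VCPU_PMU_V3_INIT and the first KVM_RUN, since
either attribute returns -EBUSY afterwards.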

2. GROUP: KVM_ARM_VCPU_TIMER_CTRL
=================================

:Architectures: ARM64

2.1. ATTRIBUTES: KVM_ARM_VCPU_TIMER_IRQ_{VTIMER,PTIMER,HVTIMER,HPTIMER}
-----------------------------------------------------------------------

:Parameters: kvm_device_attr.addr is a pointer to an int holding the timer
	     interrupt number

Returns:

	 =======  ==================================
	 -EINVAL  Invalid timer interrupt number
	 -EBUSY   One or more VCPUs have already run
	 =======  ==================================

A value describing the architected timer interrupt number when connected to an
in-kernel virtual GIC.  Each must be a PPI (16 <= intid < 32).  Setting the
attribute overrides the default values (see below).

==============================  ==========================================
KVM_ARM_VCPU_TIMER_IRQ_VTIMER   The EL1 virtual timer intid (default: 27)
KVM_ARM_VCPU_TIMER_IRQ_PTIMER   The EL1 physical timer intid (default: 30)
KVM_ARM_VCPU_TIMER_IRQ_HVTIMER  The EL2 virtual timer intid (default: 28)
KVM_ARM_VCPU_TIMER_IRQ_HPTIMER  The EL2 physical timer intid (default: 26)
==============================  ==========================================

Setting the same PPI for different timers will prevent the VCPUs from running.
Setting the interrupt number on a VCPU configures all VCPUs created at that
time to use the number provided for a given timer, overwriting any previously
configured values on other VCPUs.  Userspace should configure the interrupt
numbers on at least one VCPU after creating all VCPUs and before running any
VCPUs.
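
Since overlapping PPIs stop the VCPUs from running, a VMM may want to validate
its choices before issuing the attribute writes. The function below is a
hypothetical helper that checks the two documented constraints; the array lists
intids in the order of the table above.

.. code-block:: c

    #include <stdbool.h>

    /* Defaults from the table above: VTIMER, PTIMER, HVTIMER, HPTIMER. */
    const int timer_default_intid[4] = { 27, 30, 28, 26 };

    /* True if every intid is a PPI (16 <= intid < 32) and none is reused. */
    bool timer_intids_valid(const int intid[4])
    {
        for (int i = 0; i < 4; i++) {
            if (intid[i] < 16 || intid[i] >= 32)
                return false;
            for (int j = 0; j < i; j++)
                if (intid[i] == intid[j])
                    return false;
        }
        return true;
    }

Once validated, each value would be written with KVM_SET_DEVICE_ATTR on one
VCPU, after all VCPUs exist and before any of them runs.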

.. _kvm_arm_vcpu_pvtime_ctrl:

3. GROUP: KVM_ARM_VCPU_PVTIME_CTRL
==================================

:Architectures: ARM64

3.1 ATTRIBUTE: KVM_ARM_VCPU_PVTIME_IPA
--------------------------------------

:Parameters: kvm_device_attr.addr is a pointer to the 64-bit base address

Returns:

	 =======  ======================================
	 -ENXIO   Stolen time not implemented
	 -EEXIST  Base address already set for this VCPU
	 -EINVAL  Base address not 64-byte aligned
	 =======  ======================================

Specifies the base address of the stolen time structure for this VCPU. The
base address must be 64-byte aligned and exist within a valid guest memory
region. See Documentation/virt/kvm/arm/pvtime.rst for more information
including the layout of the stolen time structure.
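
A sketch of programming the stolen time base address for one VCPU. The helper
names are hypothetical, the caller is assumed to have placed ipa inside guest
memory it registered, and the #ifndef fallbacks (values assumed to match the
arm64 UAPI header) exist only for non-arm64 builds.

.. code-block:: c

    #include <errno.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    /* Fallbacks for non-arm64 builds (assumed to match arm64 <asm/kvm.h>). */
    #ifndef KVM_ARM_VCPU_PVTIME_CTRL
    #define KVM_ARM_VCPU_PVTIME_CTRL 2
    #endif
    #ifndef KVM_ARM_VCPU_PVTIME_IPA
    #define KVM_ARM_VCPU_PVTIME_IPA 0
    #endif

    /* The stolen time structure must start on a 64-byte boundary. */
    bool pvtime_ipa_aligned(uint64_t ipa)
    {
        return (ipa & 63) == 0;
    }

    /* Returns 0 or a negative errno; 'ipa' must lie in guest memory. */
    int vcpu_set_pvtime(int vcpu_fd, uint64_t ipa)
    {
        struct kvm_device_attr attr = {
            .group = KVM_ARM_VCPU_PVTIME_CTRL,
            .attr  = KVM_ARM_VCPU_PVTIME_IPA,
            .addr  = (uint64_t)(uintptr_t)&ipa,   /* points at the 64-bit IPA */
        };

        if (!pvtime_ipa_aligned(ipa))
            return -EINVAL;           /* KVM rejects this with -EINVAL too */
        if (ioctl(vcpu_fd, KVM_SET_DEVICE_ATTR, &attr) < 0)
            return -errno;
        return 0;
    }

Each VCPU needs its own structure, so a VMM would typically reserve one aligned
slot per VCPU.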

4. GROUP: KVM_VCPU_TSC_CTRL
===========================

:Architectures: x86

4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET
----------------------------------

:Parameters: kvm_device_attr.addr is a pointer to the 64-bit unsigned TSC
	     offset

Returns:

	 ======= ======================================
	 -EFAULT Error reading/writing the provided
		 parameter address.
	 -ENXIO  Attribute not supported
	 ======= ======================================

Specifies the guest's TSC offset relative to the host's TSC. The guest's
TSC is then derived by the following equation::

  guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET
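
Treating the equation as 64-bit unsigned arithmetic (the offset is a 64-bit
unsigned value), an offset that moves the guest TSC backwards is simply a large
value that wraps around. The two hypothetical helpers below only restate the
equation and its inverse; they ignore TSC scaling.

.. code-block:: c

    #include <stdint.h>

    /* guest_tsc = host_tsc + offset, wrapping modulo 2^64. */
    uint64_t guest_tsc(uint64_t host_tsc, uint64_t offset)
    {
        return host_tsc + offset;
    }

    /* Offset at which the guest reads 'target' while the host reads 'host_tsc'. */
    uint64_t tsc_offset_for(uint64_t host_tsc, uint64_t target)
    {
        return target - host_tsc;
    }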

This attribute is useful to adjust the guest's TSC on live migration,
so that the TSC counts the time during which the VM was paused. The
following describes a possible algorithm to use for this purpose.

From the source VMM process:

1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_src),
   kvmclock nanoseconds (guest_src), and host CLOCK_REALTIME nanoseconds
   (host_src).

2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the
   guest TSC offset (ofs_src[i]).

3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the
   guest's TSC (freq).

From the destination VMM process:

4. Invoke the KVM_SET_CLOCK ioctl, providing the source nanoseconds from
   kvmclock (guest_src) and CLOCK_REALTIME (host_src) in their respective
   fields.  Ensure that the KVM_CLOCK_REALTIME flag is set in the provided
   structure.

   KVM will advance the VM's kvmclock to account for elapsed time since
   recording the clock values.  Note that this will cause problems in
   the guest (e.g., timeouts) unless CLOCK_REALTIME is synchronized
   between the source and destination, and a reasonably short time passes
   between the source pausing the VM and the destination executing
   steps 4-7.

5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_dest) and
   kvmclock nanoseconds (guest_dest).

6. Adjust the guest TSC offsets for every vCPU to account for (1) time
   elapsed since recording state and (2) difference in TSCs between the
   source and destination machine::

     ofs_dst[i] = ofs_src[i] -
       (guest_src - guest_dest) * freq +
       (tsc_src - tsc_dest)

   ("ofs[i] + tsc - guest * freq" is the guest TSC value corresponding to
   a time of 0 in kvmclock.  The above formula ensures that it is the
   same on the destination as it was on the source).

7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the
   respective value derived in the previous step.
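
Step 6 can be written as a small helper; this is a sketch with a hypothetical
function name. One assumption needs stating: the formula mixes kvmclock
nanoseconds with freq, which KVM_GET_TSC_KHZ reports in kHz, so the sketch
converts the elapsed nanoseconds to TSC ticks by multiplying by the kHz value and
dividing by 10^6. It also assumes guest_dest >= guest_src, which is expected once
KVM_SET_CLOCK has advanced kvmclock as described in step 4.

.. code-block:: c

    #include <stdint.h>

    /*
     * ofs_dst = ofs_src - (guest_src - guest_dest) * freq + (tsc_src - tsc_dest)
     * guest_* are kvmclock nanoseconds, freq_khz is the guest TSC frequency in
     * kHz; all arithmetic wraps modulo 2^64 like the TSC offset itself.
     */
    uint64_t migrated_tsc_offset(uint64_t ofs_src, uint64_t tsc_src,
                                 uint64_t guest_src, uint64_t tsc_dest,
                                 uint64_t guest_dest, uint32_t freq_khz)
    {
        uint64_t elapsed_ns = guest_dest - guest_src;   /* assumed >= 0 */
        /* ns * kHz / 10^6, split to keep the intermediate product small */
        uint64_t elapsed_ticks = (elapsed_ns / 1000000) * freq_khz +
                                 (elapsed_ns % 1000000) * freq_khz / 1000000;

        return ofs_src + elapsed_ticks + (tsc_src - tsc_dest);
    }

Plugging the result back into "ofs[i] + tsc - guest * freq" (with the same unit
conversion) gives the same value on both hosts, which is the property step 6 is
meant to preserve.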
298