Lines Matching +full:ignore +full:- +full:power +full:- +full:on +full:- +full:sel

1 .. SPDX-License-Identifier: GPL-2.0
4 The Definitive KVM (Kernel-based Virtual Machine) API Documentation
13 can be used to issue system ioctls. A KVM_CREATE_VM ioctl on this
15 ioctls. A KVM_CREATE_VCPU or KVM_CREATE_DEVICE ioctl on a VM fd will
21 a virtual machine. Depending on the file descriptor that accepts them,
24 - System ioctls: These query and set global attributes which affect the
28 - VM ioctls: These query and set attributes that affect an entire virtual
35 - vcpu ioctls: These query and set attributes that control the operation
43 - device ioctls: These query and set attributes that control the operation
67 the API. See "General description" for details on the ioctl usage
83 by and on behalf of the VM's process may not be freed/unaccounted when
92 facility that allows backward-compatible extensions to the API to be
95 The extension mechanism is not based on the Linux version number.
133 -----------------------
150 -----------------
169 In order to create user controlled virtual machines on S390, check
176 To use hardware assisted virtualization on MIPS (VZ ASE) rather than
184 On arm64, the physical address size for a VM (IPA Size limit) is limited
189 address used by the VM. The IPA_Bits is encoded in bits[7-0] of the
204 Host_IPA_Limit is the maximum possible value for IPA_Bits on the host and
205 is dependent on the CPU capability and the kernel configuration. The limit can
207 ioctl() at run-time.
210 implicit or explicit) is unsupported on the host.
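The bits[7-0] encoding of IPA_Bits described above can be sketched as a pair of helpers. This is a minimal illustration, not the kernel's definition: the macro and function names are hypothetical, and only the 8-bit field position comes from the text.

```c
#include <stdint.h>

/* Assumption: IPA_Bits occupies bits[7:0] of the machine type, as the text states.
 * SKETCH_IPA_SIZE_MASK is an illustrative name, not a KVM UAPI macro. */
#define SKETCH_IPA_SIZE_MASK UINT64_C(0xff)

/* Place ipa_bits into bits[7:0] of a machine type value, preserving the other bits. */
static inline uint64_t sketch_vm_type_with_ipa_bits(uint64_t base_type,
                                                    unsigned int ipa_bits)
{
    return (base_type & ~SKETCH_IPA_SIZE_MASK) |
           ((uint64_t)ipa_bits & SKETCH_IPA_SIZE_MASK);
}

/* Recover IPA_Bits from a machine type value. */
static inline unsigned int sketch_vm_type_ipa_bits(uint64_t vm_type)
{
    return (unsigned int)(vm_type & SKETCH_IPA_SIZE_MASK);
}
```

Whether a given IPA_Bits value is accepted still depends on the host limit, which has to be queried at run-time.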
219 ----------------------------------------------------------
225 :Returns: 0 on success; -1 on error
263 -----------------------
277 Based on their initialization different VMs may have different capabilities.
279 with KVM_CAP_CHECK_EXTENSION_VM on the vm fd)
282 --------------------------
295 the VCPU file descriptor can be mmap-ed, including:
297 - if KVM_CAP_COALESCED_MMIO is available, a page at
302 - if KVM_CAP_DIRTY_LOG_RING is available, a number of pages at
303 KVM_DIRTY_LOG_PAGE_OFFSET * PAGE_SIZE. For more information on
308 -------------------
313 :Parameters: vcpu id (apic id on x86)
314 :Returns: vcpu fd on success, -1 on error
320 the KVM_CHECK_EXTENSION ioctl() at run-time.
322 KVM_CAP_MAX_VCPUS of the KVM_CHECK_EXTENSION ioctl() at run-time.
330 KVM_CAP_MAX_VCPU_ID of the KVM_CHECK_EXTENSION ioctl() at run-time.
335 On powerpc using book3s_hv mode, the vcpus are mapped onto virtual
345 single-threaded guest vcpus, it should make all vcpu ids be a multiple
355 ---------------------
361 :Returns: 0 on success, -1 on error
380 If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of the slot field specify
382 KVM_SET_USER_MEMORY_REGION for details on the usage of the slot field.
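As a rough sketch of the slot layout, assuming the slot id sits in bits 0-15 and the address space id in bits 16-31 when KVM_CAP_MULTI_ADDRESS_SPACE is available, the field can be packed and unpacked as follows. The helper names are hypothetical.

```c
#include <stdint.h>

/* Pack an address space id (bits 16-31) and a slot id (bits 0-15) into one value. */
static inline uint32_t sketch_make_slot(uint16_t as_id, uint16_t id)
{
    return ((uint32_t)as_id << 16) | id;
}

/* Extract the address space id from a packed slot value. */
static inline uint16_t sketch_slot_as_id(uint32_t slot)
{
    return (uint16_t)(slot >> 16);
}

/* Extract the slot id from a packed slot value. */
static inline uint16_t sketch_slot_id(uint32_t slot)
{
    return (uint16_t)(slot & 0xffff);
}
```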
393 ------------
399 :Returns: 0 on success, -1 on error
420 -----------------
426 :Returns: 0 on success, -1 on error
460 -----------------
466 :Returns: 0 on success, -1 on error
474 ------------------
480 :Returns: 0 on success, -1 on error
497 /* ppc -- see arch/powerpc/include/uapi/asm/kvm.h */
505 ------------------
511 :Returns: 0 on success, -1 on error
518 ------------------
524 :Returns: 0 on success, -1 on error
545 ------------------
551 :Returns: 0 on success, negative on failure.
569 0 on success,
570 -EEXIST if an interrupt is already enqueued
571 -EINVAL the irq number is invalid
572 -ENXIO if the PIC is in the kernel
573 -EFAULT if the pointer is invalid
577 ioctl is useful if the in-kernel PIC is not used.
617 RISC-V:
644 -----------------
651 -1 on error
654 Reads the values of MSR-based features that are available for the VM. This
656 The list of msr-based features can be obtained using KVM_GET_MSR_FEATURE_INDEX_LIST
660 Reads model-specific registers from the vcpu. Supported msr indices can
684 -----------------
690 :Returns: number of msrs successfully set (see below), -1 on error
692 Writes model-specific registers to the vcpu. See KVM_GET_MSRS for the
706 ------------------
712 :Returns: 0 on success, -1 on error
718 - If this IOCTL fails, KVM gives no guarantees that previous valid CPUID
721 - Using KVM_SET_CPUID{,2} after KVM_RUN, i.e. changing the guest vCPU model
723 - Using heterogeneous CPUID configurations, modulo APIC IDs, topology, etc...
746 ------------------------
752 :Returns: 0 on success, -1 on error
757 their traditional behaviour) will cause KVM_RUN to return with -EINTR.
772 ----------------
778 :Returns: 0 on success, -1 on error
810 ----------------
816 :Returns: 0 on success, -1 on error
848 -----------------------
854 :Returns: 0 on success, -1 on error
857 On x86, creates a virtual ioapic, a virtual PIC (two PICs, nested), and sets up
858 future vcpus to have a local APIC. IRQ routing for GSIs 0-15 is set to both
859 PIC and IOAPIC; GSIs 16-23 go only to the IOAPIC.
860 On arm64, a GICv2 is created. Any other GIC versions require the usage of
863 On s390, a dummy irq routing table is created.
865 Note that on s390 the KVM_CAP_S390_IRQCHIP vm capability needs to be enabled
870 -----------------
876 :Returns: 0 on success, -1 on error
879 On some architectures it is required that an interrupt controller model has
880 been previously created with KVM_CREATE_IRQCHIP. Note that edge-triggered
883 On real hardware, interrupt pins can be active-low or active-high. This
888 (active-low/active-high) for level-triggered interrupts, and KVM used
890 active-low interrupts, the above convention is now valid on x86 too.
892 should not present interrupts to the guest as active-low unless this
893 capability is present (or unless it is not using the in-kernel irqchip,
898 in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to
907 - KVM_ARM_IRQ_TYPE_CPU:
908 out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
909 - KVM_ARM_IRQ_TYPE_SPI:
910 in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
912 - KVM_ARM_IRQ_TYPE_PPI:
913 in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)
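The irq_id ranges listed above can be checked with a small validator. This is only a sketch: the enum is a local stand-in for the KVM_ARM_IRQ_TYPE_* constants, and its values are not meant to match the UAPI header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local stand-ins for KVM_ARM_IRQ_TYPE_CPU/SPI/PPI. */
enum sketch_irq_type {
    SKETCH_IRQ_TYPE_CPU,
    SKETCH_IRQ_TYPE_SPI,
    SKETCH_IRQ_TYPE_PPI,
};

/* Return true if irq_id lies in the range the text gives for this type. */
static bool sketch_irq_id_valid(enum sketch_irq_type type, uint32_t irq_id)
{
    switch (type) {
    case SKETCH_IRQ_TYPE_CPU:
        return irq_id <= 1;                     /* 0 is IRQ, 1 is FIQ */
    case SKETCH_IRQ_TYPE_SPI:
        return irq_id >= 32 && irq_id <= 1019;  /* inclusive */
    case SKETCH_IRQ_TYPE_PPI:
        return irq_id >= 16 && irq_id <= 31;    /* inclusive */
    }
    return false;
}
```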
923 Note that on arm64, the KVM_CAP_IRQCHIP capability only conditions
924 injection of interrupts for the in-kernel irqchip. KVM_IRQ_LINE can always
939 --------------------
945 :Returns: 0 on success, -1 on error
964 --------------------
970 :Returns: 0 on success, -1 on error
989 -----------------------
995 :Returns: 0 on success, -1 on error
1000 page of a blob (32- or 64-bit, depending on the vcpu mode) to guest
1039 ------------------
1045 :Returns: 0 on success, -1 on error
1048 conjunction with KVM_SET_CLOCK, it is used to ensure monotonicity in scenarios
1089 ------------------
1095 :Returns: 0 on success, -1 on error
1098 In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity in scenarios
1124 ------------------------
1131 :Returns: 0 on success, -1 on error
1176 - KVM_VCPUEVENT_VALID_SHADOW may be set to signal that
1179 - KVM_VCPUEVENT_VALID_SMM may be set to signal that smi contains a
1182 - KVM_VCPUEVENT_VALID_PAYLOAD may be set to signal that the
1187 - KVM_VCPUEVENT_VALID_TRIPLE_FAULT may be set to signal that the
1206 guest-visible registers. It is not possible to 'cancel' an SError that has been
1209 A device being emulated in user-space may also wish to generate an SError. To do
1210 this the events structure can be populated by user-space. The current state
1219 always have a non-zero value when read, and the agent making an SError pending
1221 the system supports KVM_CAP_ARM_INJECT_SERROR_ESR, but user-space sets the events
1224 Specifying exception.has_esr on a system that does not support it will return
1225 -EINVAL. Setting anything other than the lower 24 bits of exception.serror_esr
1226 will return -EINVAL.
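A userspace pre-check for the 24-bit restriction on exception.serror_esr might look like the sketch below. The mask and function names are hypothetical, and the check only mirrors the rule stated above; the kernel remains the authority on what it accepts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Only the lower 24 bits may be set, per the text above. */
#define SKETCH_SERROR_ESR_MASK UINT64_C(0xffffff)

/* Return true if esr uses nothing beyond the lower 24 bits. */
static bool sketch_serror_esr_valid(uint64_t esr)
{
    return (esr & ~SKETCH_SERROR_ESR_MASK) == 0;
}
```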
1232 Calling this ioctl on a vCPU that hasn't been initialized will return
1233 -ENOEXEC.
1250 ------------------------
1257 :Returns: 0 on success, -1 on error
1270 suppress overwriting the current in-kernel state. The bits are:
1275 KVM_VCPUEVENT_VALID_SMM transfer the smi sub-struct.
1305 from the exiting fault on the VCPU. It is a programming error to set
1315 Calling this ioctl on a vCPU that hasn't been initialized will return
1316 -ENOEXEC.
1319 ----------------------
1325 :Returns: 0 on success, -1 on error
1341 ----------------------
1347 :Returns: 0 on success, -1 on error
1352 yet and must be cleared on entry.
1356 -------------------------------
1362 :Returns: 0 on success, -1 on error
1379 memory slot. Bits 0-15 of "slot" specify the slot id and this value
1384 If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of "slot"
1388 are unrelated; the restriction on overlapping slots only applies within
1400 On architectures that support a form of address tagging, userspace_addr must
1411 to make a new slot read-only. In this case, writes to this memory will be
1420 Read-only regions aren't supported. Only as-id 0 is supported.
1422 Note: On arm64, a write generated by the page-table walker (to update
1426 page-table walker, making it impossible to emulate the access.
1427 Instead, an abort (data abort if the cause of the page-table update
1434 Returns -EINVAL or -EEXIST if the VM has the KVM_VM_S390_UCONTROL flag set.
1435 Returns -EINVAL if called on a protected VM.
1438 ---------------------
1444 :Returns: 0 on success, -1 on error
1446 This ioctl defines the physical address of a three-page region in the guest
1452 This ioctl is required on Intel-based hosts. This is needed on Intel hardware
1460 -------------------
1466 :Returns: 0 on success; -1 on error
1472 :Returns: 0 on success; -1 on error
1479 On systems that do not support this ioctl, it always fails. On systems that
1511 The vcpu ioctl should be used for vcpu-specific capabilities, the vm ioctl
1512 for vm-wide capabilities.
1515 ---------------------
1521 :Returns: 0 on success; -1 on error
1529 Returns the vcpu's current "multiprocessing state" (though also valid on
1555 On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
1556 in-kernel irqchip, the multiprocessing state must be maintained by userspace on
1592 On LoongArch, only the KVM_MP_STATE_RUNNABLE state is used to reflect
1596 ---------------------
1602 :Returns: 0 on success; -1 on error
1607 On x86, this ioctl is only useful after KVM_CREATE_IRQCHIP. Without an
1608 in-kernel irqchip, the multiprocessing state must be maintained by userspace on
1617 On LoongArch, only the KVM_MP_STATE_RUNNABLE state is used to reflect
1621 ------------------------------
1627 :Returns: 0 on success, -1 on error
1629 This ioctl defines the physical address of a one-page region in the guest
1638 This ioctl is required on Intel-based hosts. This is needed on Intel hardware
1645 ------------------------
1651 :Returns: 0 on success, -1 on error
1660 ------------------
1666 :Returns: 0 on success, -1 on error
1680 ------------------
1686 :Returns: 0 on success, -1 on error
1698 when invoked on the vm file descriptor. The size value returned by
1704 contents of CPUID leaf 0xD on the host.
1708 -----------------
1714 :Returns: 0 on success, -1 on error
1735 -----------------
1741 :Returns: 0 on success, -1 on error
1762 ----------------------------
1768 :Returns: 0 on success, -1 on error
1801 Dynamically-enabled feature bits need to be requested with
1811 with the 'nent' field indicating the number of entries in the variable-size
1841 may be returned as true, but they depend on KVM_CREATE_IRQCHIP for in-kernel
1854 -----------------------
1860 :Returns: 0 on success, !0 on error
1875 If any additional field gets added to this structure later on, a bit for that
1884 ------------------------
1890 :Returns: 0 on success, -1 on error
1894 On arm64, GSI routing has the following limitation:
1896 - GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD.
1932 On s390, adding a KVM_IRQ_ROUTING_S390_ADAPTER is rejected on ucontrol VMs with
1933 error -EINVAL.
1937 - KVM_MSI_VALID_DEVID: used along with KVM_IRQ_ROUTING_MSI routing entry
1938 type, specifies that the devid field contains a valid value. The per-VM
1942 - zero otherwise
1965 On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
1967 address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of
2001 --------------------
2007 :Returns: 0 on success, -1 on error
2024 --------------------
2030 :Returns: virtual tsc-khz on success, negative value on error
2033 KHz. If the host has unstable tsc this ioctl returns -EIO instead as an
2038 ------------------
2044 :Returns: 0 on success, -1 on error
2057 enabled, then the format of APIC_ID register depends on the APIC mode
2059 the APIC_ID register (bytes 32-35). xAPIC only allows an 8-bit APIC ID
2060 which is stored in bits 31-24 of the APIC register, or equivalently in
2069 ------------------
2075 :Returns: 0 on success, -1 on error
2087 The format of the APIC ID register (bytes 32-35 of struct kvm_lapic_state's
2088 regs field) depends on the state of the KVM_CAP_X2APIC_API capability.
2093 ------------------
2099 :Returns: 0 on success, !0 on error
2116 For the special case of virtio-ccw devices on s390, the ioevent is matched
2130 For virtio-ccw devices, addr contains the subchannel id and datamatch the
2134 the kernel will ignore the length of guest write and may get a faster vmexit.
2139 ------------------
2145 :Returns: 0 on success, -1 on error
2155 TLB, prior to calling KVM_RUN on the associated vcpu.
2165 The array is little-endian: bit 0 is the least significant bit of the
2175 -------------------------
2184 is an IOMMU for PAPR-style virtual I/O. It is used to translate
2198 which this TCE table will translate - the table will contain one 64
2201 When the guest issues an H_PUT_TCE hcall on a liobn for which a TCE
2208 the entries written by kernel-handled H_PUT_TCE calls, and also lets
2214 ------------
2220 :Returns: 0 on success, -1 on error
2222 Queues an NMI on the thread's vcpu. Note this is well defined only
2230 - pause the vcpu
2231 - read the local APIC's state (KVM_GET_LAPIC)
2232 - check whether changing LINT1 will queue an NMI (see the LVT entry for LINT1)
2233 - if so, issue KVM_NMI
2234 - resume the vcpu
2241 ----------------------
2263 ------------------------
2285 ------------------------
2293 This call creates a page table entry on the virtual cpu's address space
2303 --------------------
2309 :Returns: 0 on success, negative value on failure
2316 protected virtualization mode on s390
2322 (These error codes are indicative only: do not rely on a specific error
2546 ARM 32-bit CP15 registers have the following id bit patterns::
2550 ARM 64-bit CP15 registers have the following id bit patterns::
2558 ARM 32-bit VFP control registers have the following id bit patterns::
2562 ARM 64-bit FP registers have the following id bit patterns::
2566 ARM firmware pseudo-registers have the following bit pattern::
2574 arm64 core/FP-SIMD registers have the following id bit patterns. Note
2608 .. [1] These encodings are not accepted for SVE-enabled vcpus. See
2633 arm64 firmware pseudo-registers have the following bit pattern::
2642 0x6060 0000 0015 ffff KVM_REG_ARM64_SVE_VLS pseudo-register
2645 ENOENT. max_vq is the vcpu's maximum supported vector length in 128-bit
2648 These registers are only accessible on vcpus for which SVE is enabled.
2656 KVM_REG_ARM64_SVE_VLS is a pseudo-register that allows the set of vector
2666 ((vector_lengths[(vq - KVM_ARM64_SVE_VQ_MIN) / 64] >>
2667 ((vq - KVM_ARM64_SVE_VQ_MIN) % 64)) & 1))
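The bitmap test in the expression above can be wrapped in a helper. This is a sketch under stated assumptions: the SKETCH_* constants stand in for KVM_ARM64_SVE_VQ_MIN and KVM_ARM64_SVE_VQ_MAX (taken here as 1 and 512), and the array is assumed to hold one bit per vq starting at the minimum.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-ins for KVM_ARM64_SVE_VQ_MIN / KVM_ARM64_SVE_VQ_MAX. */
#define SKETCH_SVE_VQ_MIN 1
#define SKETCH_SVE_VQ_MAX 512
/* Number of 64-bit words needed to hold one bit per vq. */
#define SKETCH_SVE_VLS_WORDS \
    ((SKETCH_SVE_VQ_MAX - SKETCH_SVE_VQ_MIN) / 64 + 1)

/* Return true if the bit for vector length vq (in 128-bit quadwords) is set. */
static bool sketch_sve_vq_supported(const uint64_t vls[SKETCH_SVE_VLS_WORDS],
                                    unsigned int vq)
{
    unsigned int bit;

    if (vq < SKETCH_SVE_VQ_MIN || vq > SKETCH_SVE_VQ_MAX)
        return false;

    bit = vq - SKETCH_SVE_VQ_MIN;
    return (vls[bit / 64] >> (bit % 64)) & 1;
}
```

A caller would typically also bound vq by the vcpu's max_vq before trusting the result.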
2673 max_vq. This is the maximum vector length available to the guest on
2689 is hardware-dependent and may not be available. Attempting to configure
2696 arm64 bitmap feature firmware pseudo-registers have the following bit pattern::
2710 a -EBUSY to userspace.
2723 patterns depending on whether they're 32-bit or 64-bit registers::
2725 0x7020 0000 0001 00 <reg:5> <sel:3> (32-bit)
2726 0x7030 0000 0001 00 <reg:5> <sel:3> (64-bit)
2744 id bit patterns depending on the size of the register being accessed. They are
2751 0x7020 0000 0003 00 <0:3> <reg:5> (32-bit FPU registers)
2752 0x7030 0000 0003 00 <0:3> <reg:5> (64-bit FPU registers)
2753 0x7040 0000 0003 00 <0:3> <reg:5> (128-bit MSA vector registers)
2765 RISC-V registers are mapped using the lower 32 bits. The upper 8 bits of
2768 RISC-V config registers are meant for configuring a Guest VCPU and it has
2774 Following are the RISC-V config registers:
2786 RISC-V core registers represent the general execution state of a Guest VCPU
2792 Following are the RISC-V core registers:
2829 0x80x0 0000 0200 0020 mode Privilege mode (1 = S-mode or 0 = U-mode)
2832 RISC-V csr registers represent the supervisor mode control/status registers
2838 Following are the RISC-V csr registers:
2854 RISC-V timer registers represent the timer state of a Guest VCPU and it has
2859 Following are the RISC-V timer registers:
2864 0x8030 0000 0400 0000 frequency Time base frequency (read-only)
2867 0x8030 0000 0400 0003 state Time compare state (1 = ON or 0 = OFF)
2870 RISC-V F-extension registers represent the single precision floating point
2875 Following are the RISC-V F-extension registers:
2886 RISC-V D-extension registers represent the double precision floating point
2890 0x8030 0000 06 <index into the __riscv_d_ext_state struct:24> (non-fcsr)
2892 Following are the RISC-V D-extension registers:
2909 0x9030 0000 0001 00 <reg:5> <sel:3> (64-bit)
2919 Following are the KVM-defined registers for x86:
2928 --------------------
2934 :Returns: 0 on success, negative value on failure
2941 protected virtualization mode on s390
2945 (These error codes are indicative only: do not rely on a specific error
2950 kvm_one_reg struct passed in. On success, the register value can be found
2958 ----------------------
2964 :Returns: 0 on success, -1 on error
2975 load-link/store-conditional, or equivalent must be used. There are two cases
2982 -------------------
2988 :Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error
2990 Directly inject an MSI message. Only valid with in-kernel irqchip that handles
3005 KVM_MSI_VALID_DEVID: devid contains a valid value. The per-VM
3014 On x86, address_hi is ignored unless the KVM_X2APIC_API_USE_32BIT_IDS
3016 address_hi bits 31-8 provide bits 31-8 of the destination id. Bits 7-0 of
3021 --------------------
3027 :Returns: 0 on success, -1 on error
3029 Creates an in-kernel device model for the i8254 PIT. This call is only valid
3030 after enabling in-kernel irqchip support via KVM_CREATE_IRQCHIP. The following
3042 PIT timer interrupts may use a per-VM kernel thread for injection. If it
3045 kvm-pit/<owner-process-pid>
3054 -----------------
3060 :Returns: 0 on success, -1 on error
3062 Retrieves the state of the in-kernel PIT model. Only valid after
3082 -----------------
3088 :Returns: 0 on success, -1 on error
3090 Sets the state of the in-kernel PIT model. Only valid after KVM_CREATE_PIT2.
3091 See KVM_GET_PIT2 for details on struct kvm_pit_state2.
3097 interval timer <https://www.scs.stanford.edu/10wi-cs140/pintos/specs/8254.pdf>`_.
3103 --------------------------
3109 :Returns: 0 on success, -1 on error
3114 device-tree properties for the guest operating system.
3128 - KVM_PPC_PAGE_SIZES_REAL:
3133 - KVM_PPC_1T_SEGMENTS
3137 - KVM_PPC_NO_HASH
3178 --------------
3184 :Returns: 0 on success, -1 on error
3189 an event is triggered on the eventfd, an interrupt is injected into
3194 With KVM_CAP_IRQFD_RESAMPLE, KVM_IRQFD supports a de-assert and notify
3195 mechanism allowing emulation of level-triggered, irqfd-based
3200 as from an EOI, the gsi is de-asserted and the user is notified via
3201 kvm_irqfd.resamplefd. It is the user's responsibility to re-queue
3204 irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment
3207 On arm64, where GSI routing is supported, the following can happen:
3209 - if no routing entry is associated with this gsi, injection fails
3210 - if the gsi is associated with an irqchip routing entry,
3212 - if the gsi is associated with an MSI routing entry, the MSI
3214 to GICv3 ITS in-kernel emulation).
3217 --------------------------
3223 :Returns: 0 on success, -1 on error
3235 The parameter is a pointer to a 32-bit unsigned integer variable
3237 table, which must be between 18 and 46. On successful return from the
3242 default-sized hash table (16 MB).
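To make the size arithmetic concrete, the sketch below assumes the 32-bit value is the base-2 logarithm of the hash table size in bytes, which is consistent with the 16 MB default corresponding to an order of 24. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The text requires the order to be between 18 and 46. */
static bool sketch_hpt_order_valid(uint32_t order)
{
    return order >= 18 && order <= 46;
}

/* Size in bytes for a valid order, or 0 if the order is out of range. */
static uint64_t sketch_hpt_size_bytes(uint32_t order)
{
    return sketch_hpt_order_valid(order) ? (UINT64_C(1) << order) : 0;
}
```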
3250 real-mode area (VRMA) facility, the kernel will re-create the VRMA
3251 HPTEs on the next KVM_RUN of any vcpu.
3254 -----------------------
3260 :Returns: 0 on success, -1 on error
3263 (vm ioctl) or per cpu (vcpu ioctl), depending on the interrupt type.
3276 - sigp stop; optional flags in parm
3278 - program check; code in parm
3280 - sigp set prefix; prefix address in parm
3282 - restart
3284 - clock comparator interrupt
3286 - CPU timer interrupt
3288 - virtio external interrupt; external interrupt
3291 - sclp external interrupt; sclp parameter in parm
3293 - sigp emergency; source cpu in parm
3295 - sigp external call; source cpu in parm
3297 - compound value to indicate an
3298 I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel);
3302 - machine check interrupt; cr 14 bits in parm, machine check interrupt
3309 ------------------------
3315 :Returns: file descriptor number (>= 0) on success, -1 on error
3338 Reads on the fd will initially supply information about all
3342 return. If read() is called again on the fd, it will start again from
3364 ----------------------
3370 :Returns: 0 on success, -1 on error
3403 --------------------------------------------
3411 :Returns: 0 on success, -1 on error
3419 (e.g. read-only attribute, or attribute that only makes
3426 semantics are device-specific. See individual device documentation in
3434 __u32 group; /* device-defined */
3435 __u64 attr; /* group-defined */
3440 ------------------------
3447 :Returns: 0 on success, -1 on error
3464 ----------------------
3470 :Returns: 0 on success; -1 on error
3485 - Processor state:
3490 - General Purpose registers, including PC and SP: set to 0
3491 - FPSIMD/NEON registers: set to 0
3492 - SVE registers: set to 0
3493 - System registers: Reset to their architecturally defined
3507 - KVM_ARM_VCPU_POWER_OFF: Starts the CPU in a power-off state.
3508 Depends on KVM_CAP_ARM_PSCI. If not set, the CPU will be powered on
3510 - KVM_ARM_VCPU_EL1_32BIT: Starts the CPU in a 32-bit mode.
3511 Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
3512 - KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 (or a future revision
3514 Depends on KVM_CAP_ARM_PSCI_0_2.
3515 - KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
3516 Depends on KVM_CAP_ARM_PMU_V3.
3518 - KVM_ARM_VCPU_PTRAUTH_ADDRESS: Enables Address Pointer authentication
3520 Depends on KVM_CAP_ARM_PTRAUTH_ADDRESS.
3526 - KVM_ARM_VCPU_PTRAUTH_GENERIC: Enables Generic Pointer authentication
3528 Depends on KVM_CAP_ARM_PTRAUTH_GENERIC.
3534 - KVM_ARM_VCPU_SVE: Enables SVE for the CPU (arm64 only).
3535 Depends on KVM_CAP_ARM_SVE.
3540 - KVM_REG_ARM64_SVE_VLS may be read using KVM_GET_ONE_REG: the
3541 initial value of this pseudo-register indicates the best set of
3542 vector lengths possible for a vcpu on this host.
3546 - KVM_RUN and KVM_GET_REG_LIST are not available;
3548 - KVM_GET_ONE_REG and KVM_SET_ONE_REG cannot be used to access
3553 - KVM_REG_ARM64_SVE_VLS may optionally be written using
3559 - the KVM_REG_ARM64_SVE_VLS pseudo-register is immutable, and can
3562 - KVM_ARM_VCPU_HAS_EL2: Enable Nested Virtualisation support,
3564 Depends on KVM_CAP_ARM_EL2.
3568 - KVM_ARM_VCPU_HAS_EL2_E2H0: Restrict Nested Virtualisation
3569 support to HCR_EL2.E2H being RES0 (non-VHE).
3570 Depends on KVM_CAP_ARM_EL2_E2H0.
3574 -----------------------------
3580 :Returns: 0 on success; -1 on error
3589 by KVM on the underlying host.
3593 kvm_vcpu_init->features bitmap returned will have feature bits set if
3603 ---------------------
3609 :Returns: 0 on success; -1 on error
3631 - KVM_REG_S390_TODPR
3633 - KVM_REG_S390_EPOCHDIFF
3635 - KVM_REG_S390_CPU_TIMER
3637 - KVM_REG_S390_CLOCK_COMP
3639 - KVM_REG_S390_PFTOKEN
3641 - KVM_REG_S390_PFCOMPARE
3643 - KVM_REG_S390_PFSELECT
3645 - KVM_REG_S390_PP
3647 - KVM_REG_S390_GBEA
3653 -----------------------------------------
3659 :Returns: 0 on success, -1 on error
3665 ENXIO Device not supported on current system
3689 arm64 currently only requires this when using the in-kernel GIC
3694 KVM_RUN on any of the VCPUs. Calling this ioctl twice for any of the
3695 base addresses will return -EEXIST.
3702 ------------------------------
3708 :Returns: 0 on success, -1 on error
3713 of a service that has a kernel-side implementation. If the token
3714 value is non-zero, it will be associated with that service, and
3722 ------------------------
3728 :Returns: 0 on success; -1 on error
3743 - KVM_GUESTDBG_ENABLE: guest debugging is enabled
3744 - KVM_GUESTDBG_SINGLESTEP: the next run should single-step
3749 - KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64]
3750 - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390]
3751 - KVM_GUESTDBG_USE_HW: using hardware debug events [arm64]
3752 - KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
3753 - KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
3754 - KVM_GUESTDBG_EXIT_PENDING: trigger an immediate guest exit [s390]
3755 - KVM_GUESTDBG_BLOCKIRQ: avoid injecting interrupts/NMI/SMI [x86]
3773 the single-step debug event (KVM_GUESTDBG_SINGLESTEP) is supported.
3783 ---------------------------
3789 :Returns: 0 on success, -1 on error
3824 the variable-size array 'entries'. If the number of entries is too low
3858 --------------------
3864 :Returns: = 0 on success,
3865 < 0 on generic error (e.g. -EFAULT or -ENOMEM),
3923 Logical accesses are permitted for non-protected guests only.
3940 On protection exceptions, unless specified otherwise, the injected
3941 translation-exception identifier (TEID) indicates suppression.
3964 Absolute accesses are permitted for non-protected guests only.
3976 Perform cmpxchg on absolute guest memory. Intended for use with the
3981 parameter. "size" must be a power of two up to and including 16.
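The constraint on "size" (a power of two, at most 16) can be pre-checked in userspace with a helper like the following. The function name is hypothetical and the check only restates the rule above.

```c
#include <stdbool.h>

/* Return true if size is a power of two in the range 1..16. */
static bool sketch_cmpxchg_size_valid(unsigned int size)
{
    return size >= 1 && size <= 16 && (size & (size - 1)) == 0;
}
```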
4003 -----------------------
4009 :Returns: 0 on success, KVM_S390_GET_SKEYS_NONE if guest is not using storage
4010 keys, negative value on error
4012 This ioctl is used to get guest storage key values on the s390
4029 will cause the ioctl to return -EINVAL.
4035 -----------------------
4041 :Returns: 0 on success, negative value on error
4043 This ioctl is used to set guest storage key values on the s390
4045 See section on KVM_S390_GET_SKEYS for struct definition.
4053 will cause the ioctl to return -EINVAL.
4060 the ioctl will return -EINVAL.
4063 -----------------
4069 :Returns: 0 on success, -1 on error
4110 - KVM_S390_SIGP_STOP - sigp stop; parameter in .stop
4111 - KVM_S390_PROGRAM_INT - program check; parameters in .pgm
4112 - KVM_S390_SIGP_SET_PREFIX - sigp set prefix; parameters in .prefix
4113 - KVM_S390_RESTART - restart; no parameters
4114 - KVM_S390_INT_CLOCK_COMP - clock comparator interrupt; no parameters
4115 - KVM_S390_INT_CPU_TIMER - CPU timer interrupt; no parameters
4116 - KVM_S390_INT_EMERGENCY - sigp emergency; parameters in .emerg
4117 - KVM_S390_INT_EXTERNAL_CALL - sigp external call; parameters in .extcall
4118 - KVM_S390_MCHK - machine check interrupt; parameters in .mchk
4123 ---------------------------
4130 -EINVAL if buffer size is 0,
4131 -ENOBUFS if buffer size is too small to fit all pending interrupts,
4132 -EFAULT if the buffer address was invalid
4150 the kernel never checked for flags == 0 and QEMU never pre-zeroed flags and
4154 If -ENOBUFS is returned the buffer provided was too small and userspace
4158 ---------------------------
4164 :Returns: 0 on success,
4165 -EFAULT if the buffer address was invalid,
4166 -EINVAL for an invalid buffer length (see below),
4167 -EBUSY if there were already interrupts pending,
4171 This ioctl allows userspace to set the complete state of all cpu-local
4193 which is the maximum number of possibly pending cpu-local interrupts.
4196 ------------
4202 :Returns: 0 on success, -1 on error
4204 Queues an SMI on the thread's vcpu.
4207 ----------------------------
4213 :Returns: 0 on success, < 0 on error
4267 If an MSR access is denied by userspace, the resulting KVM behavior depends on
4270 on denied accesses, i.e. userspace effectively intercepts the MSR access. If
4272 on denied accesses. Note, if an MSR access is denied during emulation of MSR
4296 part of VM-Enter/VM-Exit emulation.
4299 of VM-Enter/VM-Exit emulation. If an MSR access is denied on VM-Enter, KVM
4300 synthesizes a consistency check VM-Exit(EXIT_REASON_MSR_LOAD_FAIL). If an
4301 MSR access is denied on VM-Exit, KVM synthesizes a VM-Abort. In short, KVM
4303 the VM-Enter/VM-Exit MSR list. It is the platform owner's responsibility to
4314 Similarly, if userspace wishes to intercept on denied accesses,
4320 ----------------------------
4353 -------------------------
4359 :Returns: 0 on success,
4360 -EFAULT if struct kvm_reinject_control cannot be read,
4361 -ENXIO if KVM_CREATE_PIT or KVM_CREATE_PIT2 didn't succeed earlier.
4380 ------------------------------
4386 :Returns: 0 on success,
4387 -EFAULT if struct kvm_ppc_mmuv3_cfg cannot be read,
4388 -EINVAL if the configuration is invalid
4411 the Power ISA V3.00, Book III section 5.7.6.1.
4414 ---------------------------
4420 :Returns: 0 on success,
4421 -EFAULT if struct kvm_ppc_rmmu_info cannot be written,
4422 -EINVAL if no useful information can be returned
4451 --------------------------------
4457 :Returns: 0 on successful completion,
4460 -EFAULT if struct kvm_ppc_resize_hpt cannot be read,
4461 -EINVAL if the supplied shift or flags are invalid,
4462 -ENOMEM if unable to allocate the new HPT,
4495 returns 0 (i.e. cancels any in-progress preparation).
4498 flags will result in an -EINVAL.
4505 -------------------------------
4511 :Returns: 0 on successful completion,
4512 -EFAULT if struct kvm_ppc_resize_hpt cannot be read,
4513 -EINVAL if the supplied shift or flags are invalid,
4514 -ENXIO if there is no pending HPT, or the pending HPT doesn't
4516 -EBUSY if the pending HPT is not fully prepared,
4517 -ENOSPC if there was a hash collision when moving existing
4519 -EIO on other error conditions
4536 KVM_PPC_RESIZE_HPT_COMMIT will return an error (usually -ENXIO or
4537 -EBUSY, though others may be possible if the preparation was started,
4540 This will have undefined effects on the guest if it has not already
4544 On successful completion, the pending HPT will become the guest's active
4547 On failure, the guest will still be operating on its previous HPT.
4550 -----------------------------------
4556 :Returns: 0 on success, -1 on error
4563 -----------------------
4569 :Returns: 0 on success,
4570 -EFAULT if u64 mcg_cap cannot be read,
4571 -EINVAL if the requested number of banks is invalid,
4572 -EINVAL if requested MCE capability is not supported.
4577 supported number of error-reporting banks can be retrieved when
4582 ---------------------
4588 :Returns: 0 on success,
4589 -EFAULT if struct kvm_x86_mce cannot be read,
4590 -EINVAL if the bank number is invalid,
4591 -EINVAL if VAL bit is not set in status field.
4616 ----------------------------
4622 :Returns: 0 on success, a negative value on error
4636 This ioctl is used to get the values of the CMMA bits on the s390
4639 - During live migration to save the CMMA values. Live migration needs
4641 - To non-destructively peek at the CMMA values, with the flag
4672 KVM_S390_SKEYS_MAX. KVM_S390_SKEYS_MAX is re-used for consistency with
4678 Depending on the flags, different actions are performed. The only
4717 ----------------------------
4723 :Returns: 0 on success, a negative value on error
4725 This ioctl is used to set the values of the CMMA bits on the s390
4727 the CMMA values, but there are no restrictions on its use.
4756 This ioctl can fail with -ENOMEM if not enough memory can be allocated to
4757 complete the task, with -ENXIO if CMMA is not enabled, with -EINVAL if
4759 if the flags field was not 0, with -EFAULT if the userspace address is
4765 --------------------------
4771 :Returns: 0 on successful completion,
4772 -EFAULT if struct kvm_ppc_cpu_char cannot be written
4777 CVE-2017-5715, CVE-2017-5753 and CVE-2017-5754). The information is
4790 userspace will be able to tell whether it is running on a kernel that
4794 with preventing inadvertent information disclosure - specifically,
4795 whether there is an instruction to flash-invalidate the L1 data cache
4812 ---------------------------
4818 :Returns: 0 on success; -1 on error
4821 for issuing platform-specific memory encryption commands to manage those
4825 (SEV) commands on AMD Processors and Trusted Domain Extensions (TDX) commands
4826 on Intel Processors. The detailed commands are defined in
4827 Documentation/virt/kvm/x86/amd-memory-encryption.rst and
4828 Documentation/virt/kvm/x86/intel-tdx.rst.
4831 -----------------------------------
4837 :Returns: 0 on success; -1 on error
4842 It is used in the SEV-enabled guest. When encryption is enabled, a guest
4855 -------------------------------------
4861 :Returns: 0 on success; -1 on error
4867 ------------------------
4874 This ioctl (un)registers an eventfd to receive notifications from the guest on
4875 the specified Hyper-V connection id through the SIGNAL_EVENT hypercall, without
4876 causing a user exit. SIGNAL_EVENT hypercall with non-zero event flag number
4877 (bits 24-31) still triggers a KVM_EXIT_HYPERV_HCALL user exit.
4896 :Returns: 0 on success,
4897 -EINVAL if conn_id or flags is outside the allowed range,
4898 -ENOENT on deassign if the conn_id isn't registered,
4899 -EEXIST on assign if the conn_id is already registered
4902 --------------------------
4908 :Returns: 0 on success, -1 on error
4976 --------------------------
4982 :Returns: 0 on success, -1 on error
4988 -------------------------------------
4995 :Returns: 0 on success, < 0 on error
5008 register on the same device. This last access will cause a vmexit and
5010 it. That will avoid exiting to userspace on repeated writes.
5012 Coalesced pio is based on coalesced mmio. There is little difference
5017 -------------------------
5023 :Returns: 0 on success, -1 on error
5045 in KVM's dirty bitmap, and dirty tracking is re-enabled for that page
5046 (for example via write-protection, or by clearing the dirty bit in
5049 If KVM_CAP_MULTI_ADDRESS_SPACE is available, bits 16-31 of the slot field specify
5051 KVM_SET_USER_MEMORY_REGION for details on the usage of the slot field.
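As a rough illustration of the slot encoding mentioned above, the sketch below packs an address-space ID into bits 16-31 of a 32-bit slot value. It assumes that the low 16 bits carry the memslot ID, as with KVM_SET_USER_MEMORY_REGION. The helper names are hypothetical and are not part of the KVM headers.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of <linux/kvm.h>. */

/* Combine an address-space ID (bits 16-31) with a memslot ID (bits 0-15,
 * an assumption based on the KVM_SET_USER_MEMORY_REGION description). */
static inline uint32_t make_slot(uint16_t as_id, uint16_t slot_index)
{
    return ((uint32_t)as_id << 16) | slot_index;
}

/* Extract the address-space ID from bits 16-31 of the slot field. */
static inline uint16_t slot_as_id(uint32_t slot)
{
    return (uint16_t)(slot >> 16);
}

/* Extract the memslot ID from bits 0-15 of the slot field. */
static inline uint16_t slot_index_of(uint32_t slot)
{
    return (uint16_t)(slot & 0xffff);
}
```

A caller would presumably store the result of make_slot() in the slot field of the structure passed to the ioctl.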
5059 --------------------------------
5065 :Returns: 0 on success, -1 on error
5086 This ioctl returns x86 cpuid feature leaves related to Hyper-V emulation in
5088 cpuid information presented to guests consuming Hyper-V enlightenments (e.g.
5089 Windows or Hyper-V guests).
5091 CPUID feature leaves returned by this ioctl are defined by Hyper-V Top Level
5098 - HYPERV_CPUID_VENDOR_AND_MAX_FUNCTIONS
5099 - HYPERV_CPUID_INTERFACE
5100 - HYPERV_CPUID_VERSION
5101 - HYPERV_CPUID_FEATURES
5102 - HYPERV_CPUID_ENLIGHTMENT_INFO
5103 - HYPERV_CPUID_IMPLEMENT_LIMITS
5104 - HYPERV_CPUID_NESTED_FEATURES
5105 - HYPERV_CPUID_SYNDBG_VENDOR_AND_MAX_FUNCTIONS
5106 - HYPERV_CPUID_SYNDBG_INTERFACE
5107 - HYPERV_CPUID_SYNDBG_PLATFORM_CAPABILITIES
5110 with the 'nent' field indicating the number of entries in the variable-size
5111 array 'entries'. If the number of entries is too low to describe all Hyper-V
5113 to the number of Hyper-V feature leaves, the 'nent' field is adjusted to the
5123 - HYPERV_CPUID_NESTED_FEATURES leaf and HV_X64_ENLIGHTENED_VMCS_RECOMMENDED
5125 on the corresponding vCPU (KVM_CAP_HYPERV_ENLIGHTENED_VMCS).
5126 - HV_STIMER_DIRECT_MODE_AVAILABLE bit is only exposed with in-kernel LAPIC.
5130 ---------------------------
5135 :Returns: 0 on success, -1 on error
5161 that should be performed and how to do it are feature-dependent.
5163 Other calls that depend on a particular feature being finalized, such as
5165 -EPERM unless the feature has already been finalized by means of a
5172 ------------------------------
5178 :Returns: 0 on success, -1 on error
5231 ---- -----------
5260 When setting a new pmu event filter, -EINVAL will be returned if any of the
5262 select are set when called on Intel.
5272 Specifically, KVM follows the following pseudo-code when determining whether to
5273 allow the guest FixCtr[i] to count its pre-defined fixed event::
5288 ---------------------
5294 :Returns: 0 on successful completion,
5312 ---------------------------
5324 ----------------------------
5337 --------------------------
5351 -------------------------
5357 :Returns: 0 on success, < 0 on error
5396 All registered VCPUs are converted back to non-protected ones. If a
5497 not succeed all other subcommands will fail with -EINVAL. This
5498 subcommand will return -EINVAL if a dump process has not yet been
5517 On success `conf_dump_finalize_len` bytes of completion data will be
5529 resume execution immediately as non-protected. There can be at most
5554 --------------------------
5560 :Returns: 0 on success, < 0 on error
5600 Sets the ABI mode of the VM to 32-bit or 64-bit (long mode). This
5634 re-mapped in guest physical address space.
5640 This is the HVM-wide vector injected directly by the hypervisor
5650 a specified vCPU (by APIC ID) / port / priority on the guest, or to
5651 trigger events on an eventfd. The vCPU and priority can be changed
5662 the 32-bit version code returned to the guest when it invokes the
5677 --------------------------
5683 :Returns: 0 on success, < 0 on error
5690 ---------------------------
5696 :Returns: 0 on success, < 0 on error
5731 on dirty logging. Setting the gpa to KVM_XEN_INVALID_GPA will disable
5741 an overlay on guest memory and remains at a fixed host address
5775 other four times. The state field must be set to -1, or to a valid
5783 vCPU ID of the given vCPU, to allow timer-related VCPU operations to
5796 per-vCPU local APIC upcall vector, configured by a Xen guest with
5798 used by Windows guests, and is distinct from the HVM-wide upcall
5804 ---------------------------
5810 :Returns: 0 on success, < 0 on error
5819 ---------------------------
5825 :Returns: number of bytes copied, < 0 on error (-EINVAL for incorrect
5826 arguments, -EFAULT if memory cannot be accessed).
5840 ``length`` must not be bigger than 2^31 - PAGE_SIZE bytes. The ``addr``
5857 --------------------
5863 :Returns: 0 on success, -1 on error
5890 --------------------
5896 :Returns: 0 on success, -1 on error
5903 ----------------------
5909 :Returns: statistics file descriptor on success, < 0 on error
5922 +-------------+
5924 +-------------+
5926 +-------------+
5928 +-------------+
5930 +-------------+
5974 The id string block contains a string which identifies the file descriptor on
6020 Bits 0-3 of ``flags`` encode the type:
6035 of items in a hash table bucket, the longest time waited and so on.
6042 is [``hist_param``*(N-1), ``hist_param``*N), while the range of the last
6043 bucket is [``hist_param``*(``size``-1), +INF). (+INF means positive infinity
6048 [0, 1), while the range of the last bucket is [pow(2, ``size``-2), +INF).
6050 [pow(2, N-2), pow(2, N-1)).
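The bucket ranges quoted above can be turned into a lookup from a sample value to a bucket. The following is a minimal sketch, assuming 0-based bucket indices (so index i corresponds to the (i+1)-th bucket in the text) and that ``size`` is the number of buckets. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Linear histogram: bucket i covers [hist_param * i, hist_param * (i + 1)),
 * and the last bucket covers [hist_param * (size - 1), +INF). */
static uint32_t linear_hist_bucket(uint64_t value, uint32_t hist_param,
                                   uint32_t size)
{
    assert(hist_param > 0 && size > 0);
    uint64_t idx = value / hist_param;
    return idx >= size ? size - 1 : (uint32_t)idx;
}

/* Logarithmic histogram: bucket 0 covers [0, 1), bucket i covers
 * [2^(i-1), 2^i), and the last bucket covers [2^(size-2), +INF). */
static uint32_t log_hist_bucket(uint64_t value, uint32_t size)
{
    assert(size > 0);
    uint32_t idx = 0;
    /* After the loop, idx is floor(log2(value)) + 1 for value > 0. */
    while (value != 0) {
        value >>= 1;
        idx++;
    }
    return idx >= size ? size - 1 : idx;
}
```

Clamping to ``size - 1`` is what makes the last bucket open-ended in both variants.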
6052 Bits 4-7 of ``flags`` encode the unit:
6075 Bits 8-11 of ``flags``, together with ``exponent``, encode the scale of the
6079 The scale is based on power of 10. It is used for measurement of time and
6080 CPU clock cycles. For example, an exponent of -9 can be used with
6083 The scale is based on power of 2. It is used for measurement of memory size.
6096 bucket in the unit expressed by bits 4-11 of ``flags`` together with ``exponent``.
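To make the ``flags`` layout concrete, the sketch below splits a descriptor's ``flags`` word into the three 4-bit fields named in the text: bits 0-3 (type), bits 4-7 (unit) and bits 8-11 (the part that, with ``exponent``, gives the scale). The struct and function names are hypothetical. Real code would more likely use the mask and shift macros from the KVM uapi header, which are not reproduced here.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields; illustration only. */
struct stats_flags_fields {
    uint32_t type;  /* bits 0-3 of flags */
    uint32_t unit;  /* bits 4-7 of flags */
    uint32_t base;  /* bits 8-11 of flags, combined with exponent for scale */
};

static struct stats_flags_fields decode_stats_flags(uint32_t flags)
{
    struct stats_flags_fields f;

    f.type = flags & 0xf;
    f.unit = (flags >> 4) & 0xf;
    f.base = (flags >> 8) & 0xf;
    return f;
}
```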
6102 The Stats Data block contains an array of 64-bit values in the same order
6106 --------------------
6112 :Returns: 0 on success, -1 on error
6124 when invoked on the vm file descriptor. The size value returned by
6130 of CPUID leaf 0xD on the host.
6133 -----------------------------
6139 :Returns: 0 on success, < 0 on error
6153 -----------------------------
6159 :Returns: 0 on success, < 0 on error
6162 for vcpus. It re-uses the kvm_s390_pv_dmp struct and hence also shares
6178 ----------------------
6184 :Returns: 0 on success, <0 on error
6186 Used to manage hardware-assisted virtualization features for zPCI devices.
6225 --------------------------------
6231 :Returns: 0 on success, < 0 on error
6233 This capability indicates that userspace is able to apply a single VM-wide
6251 on previous values of the guest counters.
6254 (-EINVAL) being returned. This ioctl can also return -EBUSY if any vcpu
6265 ------------------------------------
6271 :Returns: 0 on success, < 0 on error
6304 op0==3, op1=={0, 1, 3}, CRn==0, CRm=={0-7}, op2=={0-7}.
6313 ---------------------------------
6319 :Returns: 0 on success, -1 on error
6326 must point at a file created via KVM_CREATE_GUEST_MEMFD on the current VM, and
6349 on-demand.
6352 userspace_addr vs. guest_memfd, based on the gfn's KVM_MEMORY_ATTRIBUTE_PRIVATE
6360 Returns -EINVAL if the VM has the KVM_VM_S390_UCONTROL flag set.
6361 Returns -EINVAL if called on a protected VM.
6364 -------------------------------
6370 :Returns: 0 on success, <0 on error
6387 retrieved via ioctl(KVM_CHECK_EXTENSION) on KVM_CAP_MEMORY_ATTRIBUTES. If
6388 executed on a VM, KVM_CAP_MEMORY_ATTRIBUTES precisely returns the attributes
6400 ----------------------------
6406 :Returns: A file descriptor on success, <0 on error
6444 GUEST_MEMFD_FLAG_MMAP Enable using mmap() on the guest_memfd file
6461 ---------------------------
6467 :Returns: 0 if at least one page is processed, < 0 on error
6493 KVM_PRE_FAULT_MEMORY populates KVM's stage-2 page tables used to map memory
6495 stage-2 read page fault, e.g. faults in memory as needed, but doesn't break
6496 CoW. However, KVM does not mark any newly created stage-2 PTE as Accessed.
6508 remaining range. If `size` > 0 on return, the caller can just issue
6545 This field is polled once when KVM_RUN starts; if non-zero, KVM_RUN
6546 exits immediately, returning -EINTR. In the common scenario where a
6550 a signal handler that sets run->immediate_exit to a non-zero value.
6576 The value of the current interrupt flag. Only valid if in-kernel
6583 More architecture-specific flags detailing state of the VCPU that may
6601 The value of the cr8 register. Only valid if in-kernel local APIC is
6608 The value of the APIC BASE msr. Only valid if in-kernel local
6620 reasons. Further architecture-specific information is available in
6632 to unknown reasons. Further architecture-specific information is
6685 executed a memory-mapped I/O instruction which could not be satisfied
6699 has re-entered the kernel with KVM_RUN. The kernel side will first finish
6704 completed before performing a live migration. Userspace can re-enter the
6727 ----------
6729 SMCCC exits can be enabled depending on the configuration of the SMCCC
6738 - ``KVM_HYPERCALL_EXIT_SMC``: Indicates that the guest used the SMC
6742 - ``KVM_HYPERCALL_EXIT_16BIT``: Indicates that the guest used a 16-bit
6795 machine (KVM_VM_S390_UCONTROL) on its host page table that cannot be
6811 Deprecated - was used for 440 KVM.
6837 This is used on 64-bit PowerPC when emulating a pSeries partition,
6841 the arguments (from the guest R4 - R12). Userspace should put the
6843 The possible hypercalls are defined in the Power Architecture Platform
6844 Requirements (PAPR) document available from www.power.org (free
6872 On FSL BookE PowerPC chips, the interrupt controller has a fast patch
6903 a system-level event using some architecture specific mechanism (hypercall
6907 The 'type' field describes the system-level event type.
6910 - KVM_SYSTEM_EVENT_SHUTDOWN -- the guest has requested a shutdown of the
6914 - KVM_SYSTEM_EVENT_RESET -- the guest has requested a reset of the VM.
6915 As with SHUTDOWN, userspace can choose to ignore the request, or
6917 - KVM_SYSTEM_EVENT_CRASH -- the guest crash occurred and the guest
6919 to ignore the request, or to gather VM memory core dump and/or
6921 - KVM_SYSTEM_EVENT_SEV_TERM -- an AMD SEV guest requested termination.
6923 - KVM_SYSTEM_EVENT_TDX_FATAL -- a TDX guest reported a fatal error state.
6924 KVM doesn't do any parsing or conversion, it just dumps 16 general-purpose
6925 registers to userspace, in ascending order of the 4-bit indices for x86-64
6926 general-purpose registers in instruction encoding, as defined in the Intel
6928 - KVM_SYSTEM_EVENT_WAKEUP -- the exiting vCPU is in a suspended state and
6931 - KVM_SYSTEM_EVENT_SUSPEND -- the guest has requested a suspension of
6935 architecture specific information for the system-level event. Only
6938 - for arm64, data[0] is set to KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2 if
6942 - for arm64, data[0] is set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2
6946 - for RISC-V, data[0] is set to the value of the second argument of the
6954 --------------
6964 the call parameters are left in-place in the vCPU registers.
6969 - Honor the guest request to suspend the VM. Userspace can request
6970 in-kernel emulation of suspension by setting the calling vCPU's
6974 for details on the function parameters.
6976 - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2
6992 Indicates that the VCPU's in-kernel local APIC received an EOI for a
6993 level-triggered IOAPIC interrupt. This exit only triggers when the
7035 related to Hyper-V emulation.
7039 - KVM_EXIT_HYPERV_SYNIC -- synchronously notify user-space about
7041 Hyper-V SynIC state change. Notification is used to remap SynIC
7045 - KVM_EXIT_HYPERV_SYNDBG -- synchronously notify user-space about
7047 Hyper-V Synthetic debugger state change. Notification is used to either update
7059 Used on arm64 systems. If a guest accesses memory not in a memslot,
7060 KVM will typically return to userspace and ask it to do MMIO emulation on its
7066 the VM. KVM assumed that if the guest accessed non-memslot memory, it was
7090 exposed if queried on a protected VM file descriptor.
7096 __u8 error; /* user -> kernel */
7098 __u32 reason; /* kernel -> user */
7099 __u32 index; /* kernel -> user */
7100 __u64 data; /* kernel <-> user */
7103 Used on x86 systems. When the VM capability KVM_CAP_X86_USER_SPACE_MSR is
7132 See KVM_X86_SET_MSR_FILTER for details on the interaction with MSR filtering.
7158 - KVM_EXIT_XEN_HCALL -- synchronously notify user-space about Xen hypercall.
7173 done a SBI call which is not handled by KVM RISC-V kernel module. The details
7179 values of SBI call before resuming the VCPU. For more details on the RISC-V SBI
7180 spec, refer to https://github.com/riscv/riscv-sbi-doc.
7197 - KVM_MEMORY_EXIT_FLAG_PRIVATE - When set, indicates the memory fault occurred
7198 on a private memory access. When clear, indicates the fault occurred on a
7202 accompanies a return code of '-1', not '0'! errno will always be set to EFAULT
7214 Used on x86 systems. When the VM capability KVM_CAP_X86_NOTIFY_VMEXIT is
7215 enabled, a VM exit is generated if no event window occurs in VM non-root mode
7223 - KVM_NOTIFY_CONTEXT_INVALID -- the VM context is corrupted and not valid
7255 on the Guest-Hypervisor Communication Interface (GHCI) specification;
7258 on re-entry.
7265 * ``TDVMCALL_GET_QUOTE``: the guest has requested to generate a TD-Quote
7266 signed by a service hosting TD-Quoting Enclave operating on the host.
7269 (without the shared bit set) and the size of a shared-memory buffer, in
7273 shared-memory area to check whether the Quote generation is completed or
7324 6. Capabilities that can be enabled on vCPUs
7331 Below you can find a list of capabilities and what their effect on the vCPU or
7341 whether this is a per-vcpu or per-vm capability.
7352 -------------------
7357 :Returns: 0 on success; -1 on error
7361 were invented by Mac-on-Linux to have a standardized communication mechanism
7368 --------------------
7373 :Returns: 0 on success; -1 on error
7389 ------------------
7394 :Returns: 0 on success; -1 on error
7407 addresses of mmu-type-specific data structures. The "array_len" field is an
7416 On return from KVM_RUN, the shared region will reflect the current state of
7419 on this vcpu.
7423 - The "params" field is of type "struct kvm_book3e_206_tlb_params".
7424 - The "array" field points to an array of type "struct
7426 - The array consists of all entries in the first TLB, followed by all
7428 - Within a TLB, entries are ordered first by increasing set number. Within a
7430 - The hash for determining set number in TLB0 is: (MAS2 >> 12) & (num_sets - 1)
7432 - The tsize field of mas1 shall be set to 4K on TLB0, even though the
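The TLB0 set-number hash given in the list above can be written as a small helper. This is only a sketch: the function name is hypothetical, and it assumes ``num_sets`` is a power of two, which the mask ``num_sets - 1`` implies but the text does not state.

```c
#include <assert.h>
#include <stdint.h>

/* Set number for an entry in TLB0: (MAS2 >> 12) & (num_sets - 1).
 * Assumes num_sets is a non-zero power of two so the mask is valid. */
static uint32_t tlb0_set_index(uint64_t mas2, uint32_t num_sets)
{
    assert(num_sets != 0 && (num_sets & (num_sets - 1)) == 0);
    return (uint32_t)((mas2 >> 12) & (num_sets - 1));
}
```

The shift by 12 discards the offset within a 4K page, so all addresses in the same 4K page map to the same set.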
7436 ----------------------------
7441 :Returns: 0 on success; -1 on error
7446 handled in-kernel, while the other I/O instructions are passed to userspace.
7448 When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST
7451 Note that even though this capability is enabled per-vcpu, the complete
7455 -------------------
7460 :Returns: 0 on success; -1 on error
7474 --------------------
7480 This capability connects the vcpu to an in-kernel MPIC device.
7483 --------------------
7490 This capability connects the vcpu to an in-kernel XICS device.
7493 ------------------------
7499 This capability enables the in-kernel irqchip for s390. Please refer to
7503 --------------------
7512 accessed (depending on the current guest FPU register mode), and the Status.FR,
7514 depending on them being supported by the FPU.
7517 ---------------------
7530 ----------------------
7535 :Returns: x86: KVM_CHECK_EXTENSION returns a bit-array indicating which register
7551 - the register sets to be copied out to kvm_run are selectable
7553 - vcpu_events are available in addition to regs and sregs.
7556 function as an input bit-array field set by userspace to indicate the
7557 specific register sets to be copied out on the next exit.
7576 -------------------------
7583 This capability connects the vcpu to an in-kernel XIVE device.
7586 -------------------------
7593 Hyper-V Synthetic interrupt controller (SynIC). Hyper-V SynIC is
7594 used to support Windows Hyper-V based guest paravirt drivers (VMBus).
7597 capability via KVM_ENABLE_CAP ioctl on the vcpu fd. Note that this
7599 by the CPU, as it's incompatible with SynIC auto-EOI behavior.
7602 --------------------------
7607 This capability enables a newer version of Hyper-V Synthetic interrupt
7613 -----------------------------------
7618 This capability indicates that KVM running on top of Hyper-V hypervisor
7620 hypercalls are handled by Level 0 hypervisor (Hyper-V) bypassing KVM.
7621 Due to the different ABI for hypercall parameters between Hyper-V and
7624 flush hypercalls by Hyper-V) so userspace should disable KVM identification
7625 in CPUID and only expose Hyper-V identification. In this case, the guest
7626 thinks it's running on Hyper-V and only uses Hyper-V hypercalls.
7629 ---------------------------------
7634 When enabled, KVM will disable emulated Hyper-V features provided to the
7635 guest according to the bits in the Hyper-V CPUID feature leaves. Otherwise, all
7636 currently implemented Hyper-V features are provided unconditionally when
7637 Hyper-V identification is set in the HYPERV_CPUID_INTERFACE (0x40000001)
7641 -------------------------------------
7656 7. Capabilities that can be enabled on VMs
7662 what their effect on the VM is when enabling them.
7679 ----------------------------
7683 args[1] is 0 to disable, 1 to enable in-kernel handling
7686 get handled by the kernel or not. Enabling or disabling in-kernel
7687 handling of an hcall is effective across the VM. On creation, an
7688 initial set of hcalls are enabled for in-kernel handling, which
7689 consists of those hcalls for which in-kernel handlers were implemented
7696 If the hcall number specified is not one that has an in-kernel
7701 --------------------------
7710 - SENSE
7711 - SENSE RUNNING
7712 - EXTERNAL CALL
7713 - EMERGENCY SIGNAL
7714 - CONDITIONAL EMERGENCY SIGNAL
7723 ---------------------------------
7727 :Returns: 0 on success, negative value on error
7731 return -EINVAL if the machine does not support vectors.
7734 --------------------------
7739 This capability allows post-handlers for the STSI instruction. After
7744 vcpu->run::
7755 @addr - guest address of STSI SYSIB
7756 @fc - function code
7757 @sel1 - selector 1
7758 @sel2 - selector 2
7759 @ar - access register number
7761 KVM handlers should exit to userspace with rc = -EREMOTE.
7764 -------------------------
7767 :Parameters: args[0] - number of routes reserved for userspace IOAPICs
7768 :Returns: 0 on success, -1 on error
7785 -------------------
7790 Allows use of runtime-instrumentation introduced with zEC12 processor.
7791 Will return -EINVAL if the machine does not support runtime-instrumentation.
7792 Will return -EBUSY if a VCPU has already been created.
7795 ----------------------
7798 :Parameters: args[0] - features that should be enabled
7799 :Returns: 0 on success, -EINVAL when args[0] contains invalid features
7808 allowing the use of 32-bit APIC IDs. See KVM_CAP_X2APIC_API in their
7815 where 0xff represents CPUs 0-7 in cluster 0.
7818 ----------------------------
7825 mechanism e.g. to realize 2-byte software breakpoints. The kernel will
7833 -------------------
7837 :Returns: 0 on success; -EINVAL if the machine does not support
7838 guarded storage; -EBUSY if a VCPU has already been created.
7843 ---------------------
7848 Allow use of adapter-interruption suppression.
7849 :Returns: 0 on success; -EBUSY if a VCPU has already been created.
7852 --------------------
7857 Enabling this capability on a VM provides userspace with a way to set
7859 virtual core). The virtual SMT mode, vsmt_mode, must be a power of 2
7860 between 1 and 8. On POWER8, vsmt_mode must also be no greater than
7870 ----------------------
7882 ------------------------------
7886 :Returns: 0 on success, -EINVAL when args[0] contains invalid exits
7897 Enabling this capability on a VM provides userspace with a way to no
7908 document strict usage conditions for these MSRs--emphasizing that only
7910 architecturally defined--simply passing through the MSRs can still
7918 4. C-states lower than C0 are emulated (e.g., via HLT interception).
7929 --------------------------
7933 :Returns: 0 on success, -EINVAL if hpage module parameter was not set
7941 hpage module parameter is not set to 1, -EINVAL is returned.
7947 ------------------------------
7957 --------------------------
7961 :Returns: 0 on success, -EINVAL when the implementation doesn't support
7962 nested-HV virtualization.
7964 HV-KVM on POWER9 and later systems allows for "nested-HV"
7966 can run using the CPU's supervisor mode (privileged non-hypervisor
7967 state). Enabling this capability on a VM depends on the CPU having
7968 the necessary functionality and on the facility being enabled with a
7969 kvm-hv module parameter.
7972 ------------------------------
7978 emulated VM-exit when L1 intercepts a #PF exception that occurs in
7979 L2. Similarly, for kvm-intel only, DR6 will not be modified prior to
7980 the emulated VM-exit when L1 intercepts a #DB exception that occurs in
7986 exception.has_payload and to put the faulting address - or the new DR6
7987 bits\ [#]_ - in the exception_payload field.
7998 --------------------------------------
8009 automatically clear and write-protect all pages that are returned as dirty.
8015 KVM_CLEAR_DIRTY_LOG ioctl can operate on a 64-page granularity rather
8028 dirty logging can be enabled gradually in small chunks on the first call
8029 to KVM_CLEAR_DIRTY_LOG. KVM_DIRTY_LOG_INITIALLY_SET depends on
8030 KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE (it is also only available on
8040 ------------------------------
8044 This capability indicates that KVM is running on a host that has
8045 ultravisor firmware and thus can support a secure guest. On such a
8057 ----------------------
8062 :Returns: 0 on success; -1 on error
8065 maximum halt-polling time for all vCPUs in the target VM. This capability can
8067 maximum halt-polling time.
8069 See Documentation/virt/kvm/halt-polling.rst for more information on halt
8073 -------------------------------
8078 :Returns: 0 on success; -1 on error
8081 access to an MSR is denied. By default, KVM injects #GP on denied accesses.
8105 -------------------------------
8110 :Returns: 0 on success, -EINVAL when args[0] contains invalid bits
8117 Enabling this capability on a VM provides userspace with a way to select a
8120 the KVM_ENABLE_CAP. The supported modes are mutually-exclusive.
8122 This capability allows userspace to force VM exits on bus locks detected in the
8123 guest, irrespective of whether or not the host has enabled split-lock detection
8129 exit, although the host kernel's split-lock #AC detection still applies, if
8135 apply some other policy-based mitigation. When exiting to userspace, KVM sets
8136 KVM_RUN_X86_BUS_LOCK in vcpu->run->flags, and conditionally sets the exit_reason
8140 the time of exit diverges between Intel and AMD. On Intel hosts, RIP points at
8141 the next instruction, i.e. the exit is trap-like. On AMD hosts, RIP points at
8142 the offending instruction, i.e. the exit is fault-like.
8146 userspace wants to take action on all detected bus locks.
8149 ----------------------
8153 :Returns: 0 on success, -EINVAL when CPU doesn't support 2nd DAWR
8160 -------------------------------------
8165 :Returns: 0 on success; ENOTTY on error
8168 indicated by the fd to the vm this is called on.
8170 This is intended to support in-guest workloads scheduled by the host. This
8171 allows the in-guest workload to maintain its own NPTs and keeps the two vms
8176 --------------------------
8181 :Returns: 0 on success, -EINVAL if the file handle is invalid or if a requested
8199 --------------------------------------
8217 --------------------
8234 ``MAP_ANONYMOUS`` or with a RAM-based file mapping (``tmpfs``, ``memfd``),
8236 -EINVAL return.
8242 -------------------------------------
8247 :Returns: 0 on success
8250 indicated by the fd to the VM this is called on.
8252 This is intended to support intra-host migration of VMs between userspace VMMs,
8256 ----------------------------
8258 :Parameters: args[0] - set of KVM quirks to disable
8280 KVM_X86_QUIRK_CD_NW_CLEARED By default, KVM clears CR0.CD and CR0.NW on
8294 KVM_X86_QUIRK_OUT_7E_INC_RIP By default, KVM pre-increments %rip before
8297 KVM does not pre-increment %rip before
8322 KVM will inject a #UD on MONITOR/MWAIT if
8325 guest CPUID on writes to MISC_ENABLE if
8358 be set by userspace (KVM sets them based on
8361 KVM_X86_QUIRK_IGNORE_GUEST_PAT By default, on Intel platforms, KVM ignores
8364 on Intel platforms which are incapable of
8366 self-snoop, KVM always ignores guest PAT and
8368 also ignored on AMD platforms or, on Intel,
8369 when a VM has non-coherent DMA devices
8372 slowdowns on certain Intel Xeon platforms
8373 (e.g. ICX, SPR) where self-snoop feature is
8385 ------------------------
8389 :Parameters: args[0] - maximum APIC ID value set for current VM
8390 :Returns: 0 on success, -EINVAL if args[0] is beyond KVM_MAX_VCPU_IDS
8406 ------------------------------
8411 :Returns: 0 on success, -EINVAL if args[0] contains invalid flags or notify
8420 This capability allows userspace to configure the notify VM exit on/off
8421 in per-VM scope during VM creation. Notify VM exit is disabled by default.
8424 a VM exit if no event window occurs in VM non-root mode for a specified amount of
8435 -----------------------------------
8440 :Returns: 0 on success, -EINVAL if args[0] contains an invalid value for the
8441 frequency or if any vCPUs have been created, -ENXIO if a virtual
8444 This capability sets the VM's APIC bus clock frequency, used by KVM's in-kernel
8449 core crystal clock frequency, if a non-zero CPUID 0x15 is exposed to the guest.
8452 ----------------------------------------------------------
8456 :Parameters: args[0] - size of the dirty log ring
8480 vCPU, and the size of the ring must be a power of two. The larger the
8482 exit to userspace. The optimal size depends on the workload, but it is
8496 00 -----------> 01 -------------> 1X -------+
8499 +------------------------------------------+
8507 on to the next GFN. The userspace should continue to do this until the
8511 Note that on weakly ordered architectures, userspace accesses to the
8513 using load-acquire/store-release accessors when available, or any
8539 the additional memory ordering requirements imposed on userspace when
8541 Architectures with TSO-like ordering (such as x86) are allowed to
8547 ring structures can be backed by per-slot bitmaps. With this capability
8557 context. Otherwise, the stand-alone per-slot bitmap mechanism needs to
8569 tables through command KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_SAVE_TABLES} on
8570 KVM device "kvm-arm-vgic-its". (2) restore vgic/its tables through
8571 command KVM_DEV_ARM_{VGIC_GRP_CTRL, ITS_RESTORE_TABLES} on KVM device
8572 "kvm-arm-vgic-its". VGICv3 LPI pending status is restored. (3) save
8574 command on KVM device "kvm-arm-vgic-v3".
8577 ---------------------------
8582 :Returns: 0 on success, -EINVAL when args[0] contains invalid bits
8587 PMU virtualization capabilities that can be adjusted on a VM.
8591 only be invoked on a VM prior to the creation of VCPUs.
8598 -------------------------------------
8603 :Returns: 0 on success, -EPERM if the userspace process does not
8604 have CAP_SYS_BOOT, -EINVAL if args[0] is not 0 or any vCPUs have been
8614 ---------------------------------------
8619 :Returns: 0 on success, -EINVAL if any memslot was already created.
8623 Eager Page Splitting improves the performance of dirty-logging (used
8624 in live migrations) when guest memory is backed by huge-pages. It
8625 avoids splitting huge-pages (into PAGE_SIZE pages) on fault, by doing
8636 64-bit bitmap (each bit describing a block size). The default value is
8640 ---------------------------
8658 -------------------------------
8667 -------------------------------------
8672 :Returns: 0 on success, -EINVAL if vCPUs have been created before enabling this
8681 scoped, meaning that the same set of values are presented on all vCPUs in a
8685 ---------------------------------
8690 :Returns: 0 on success, -EINVAL if args[0] is not zero
8696 -------------------------------------------
8703 can be safely mapped as cacheable. This relies on the presence of
8704 force write back (FWB) feature support on the hardware.
8713 ---------------------
8719 H_RANDOM hypercall backed by a hardware random-number generator.
8724 -------------------------
8730 radix MMU defined in Power ISA V3.00 (as implemented in the POWER9
8734 ---------------------------
8740 hashed page table MMU defined in Power ISA V3.00 (as implemented in
8741 the POWER9 processor), including in-memory segment tables.
8744 -------------------
8748 This capability, if KVM_CHECK_EXTENSION on the main kvm handle indicates that
8754 If KVM_CHECK_EXTENSION on a kvm VM handle indicates that this capability is
8774 ----------------------
8781 The values returned when this capability is checked by KVM_CHECK_EXTENSION on a
8788 Both registers and addresses are 32 bits wide.
8789 It will only be possible to run 32-bit guest code.
8791 1 MIPS64 or microMIPS64 with access only to 32-bit compatibility segments.
8792 Registers are 64 bits wide, but addresses are 32 bits wide.
8793 64-bit guest code may run but cannot access MIPS64 memory segments.
8794 It will also be possible to run 32-bit guest code.
8797 Both registers and addresses are 64 bits wide.
8798 It will be possible to run 64-bit or 32-bit guest code.
8802 ------------------------
8807 that if userspace creates a VM without an in-kernel interrupt controller, it
8808 will be notified of changes to the output level of in-kernel emulated devices,
8810 For such VMs, on every return to userspace, the kernel
8811 updates the vcpu's run->s.regs.device_irq_level field to represent the actual
8817 userspace can always sample the device output level and re-compute the state of
8819 of run->s.regs.device_irq_level on every kvm exit.
8820 The value in run->s.regs.device_irq_level can represent both level and edge
8821 triggered interrupt signals, depending on the device. Edge triggered interrupt
8822 signals will exit to userspace with the bit in run->s.regs.device_irq_level
8825 The field run->s.regs.device_irq_level is available independent of
8826 run->kvm_valid_regs or run->kvm_dirty_regs bits.
8830 and thereby which bits in run->s.regs.device_irq_level can signal values.
8836 KVM_ARM_DEV_EL1_VTIMER - EL1 virtual timer
8837 KVM_ARM_DEV_EL1_PTIMER - EL1 physical timer
8838 KVM_ARM_DEV_PMU - ARM PMU overflow interrupt signal
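Because userspace is expected to sample run->s.regs.device_irq_level on every exit, a common pattern is to compare the new value against the previously seen one. The sketch below illustrates that idea under stated assumptions: the helper names are hypothetical, the field is treated as a 64-bit mask, and the bit values are defined locally (mirroring the arm64 uapi header as far as I know) only so that the example builds on any host.

```c
#include <stdint.h>

/* Fallback definitions so the sketch builds without the arm64 uapi header. */
#ifndef KVM_ARM_DEV_EL1_VTIMER
#define KVM_ARM_DEV_EL1_VTIMER (1u << 0)
#define KVM_ARM_DEV_EL1_PTIMER (1u << 1)
#define KVM_ARM_DEV_PMU        (1u << 2)
#endif

/* Hypothetical helper: device lines whose level differs between two exits. */
static uint64_t irq_level_changed(uint64_t prev, uint64_t cur)
{
    return prev ^ cur;
}

/* Hypothetical helper: non-zero if the given device line is currently high. */
static int irq_line_asserted(uint64_t level, uint64_t dev_bit)
{
    return (level & dev_bit) != 0;
}
```

A userspace interrupt controller model would feed each changed line into its own state machine and remember the current value for the next exit.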
8845 -----------------------------
8855 ----------------------------
8865 -------------------------------
8874 ---------------------
8881 ----------------------
8890 ---------------------
8895 use copy-on-write semantics as well as dirty page tracking via read-only page
8899 ---------------------
8908 ----------------------------
8912 This capability indicates that KVM supports paravirtualized Hyper-V TLB Flush
8918 ----------------------------------
8933 ----------------------------
8937 This capability indicates that KVM supports paravirtualized Hyper-V IPI send
8942 -----------------------------
8950 ---------------------------
8961 -----------------------
8967 architecture-specific interfaces. This capability and the architecture-
8974 -------------------------
8981 environments running on the machine.
8984 an 8-byte value consisting of a one-byte Control Program Name Code (CPNC) and
8985 a 7-byte Control Program Version Code (CPVC). The CPNC determines what
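The 8-byte value described above can be packed and unpacked with plain shifts. This is a sketch under an explicit assumption: the CPNC is placed in the most significant byte and the CPVC in the remaining 56 bits, which matches the order in which the text names the two fields but is not stated by it. All names here are hypothetical.

```c
#include <stdint.h>

/* Low 56 bits hold the 7-byte CPVC (assumed layout, see lead-in). */
#define CPI_CPVC_MASK 0x00ffffffffffffffULL

/* Combine a one-byte CPNC and a 7-byte CPVC into one 8-byte value. */
static uint64_t cpi_pack(uint8_t cpnc, uint64_t cpvc)
{
    return ((uint64_t)cpnc << 56) | (cpvc & CPI_CPVC_MASK);
}

/* Extract the one-byte Control Program Name Code. */
static uint8_t cpi_cpnc(uint64_t value)
{
    return (uint8_t)(value >> 56);
}

/* Extract the 7-byte Control Program Version Code. */
static uint64_t cpi_cpvc(uint64_t value)
{
    return value & CPI_CPVC_MASK;
}
```

Masking in cpi_pack() keeps an oversized CPVC from overwriting the CPNC byte.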
8994 -------------------------------
8999 writes to user space. It can be enabled on a VM level. If enabled, MSR
9005 ---------------------------
9016 limit the attack surface on KVM's MSR emulation code.
9019 --------------------
9049 The KVM_XEN_HVM_CONFIG_RUNSTATE flag indicates that the runstate-related
9083 ---------------------------
9097 IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
9098 present in the "ibm,hypertas-functions" device-tree property.
9102 they will get passed on to user space. So user space still has to have
9108 --------------------
9114 available to the guest on migration.
9117 --------------------------------
9129 ------------------------------
9144 on vm fd, KVM_S390_VM_CPU_TOPOLOGY.
9149 When getting the Modified Change Topology Report value, the attr->addr
9153 ---------------------
9158 This capability returns a bitmap of support VM types. The 1-setting of bit @n
9168 production. The behavior and effective ABI for software-protected VMs are
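Testing the VM type bitmap is a single bit test. The sketch below assumes the bitmap is the non-negative value returned by KVM_CHECK_EXTENSION, stored in a 32-bit unsigned integer; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if bit @type is set in @bitmap, i.e. if the VM type with
 * that value is reported as supported. Out-of-range types are rejected
 * rather than shifted, which would be undefined behavior.
 */
static bool vm_type_supported(uint32_t bitmap, unsigned int type)
{
    if (type >= 32)
        return false;
    return ((bitmap >> type) & 1u) != 0;
}
```

A VMM would typically run this check before passing the chosen type to KVM_CREATE_VM.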
9172 -------------------------------
9181 IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
9182 present in the "ibm,hypertas-functions" device-tree property.
9184 This capability is enabled for hypervisors on platforms like POWER9
9188 ---------------------------
9193 "Address Translation Mode on Interrupt" aka "Alternate Interrupt Location"
9196 This capability allows a guest kernel to use a better-performance mode for
9200 ------------------------------
9205 kvm_run.memory_fault if KVM cannot resolve a guest page fault VM-Exit, e.g. if
9220 ---------------------------
9229 depending on which was executing at the time of the exit. Userspace must
9243 --------
9257 ``KVM_ENABLE_CAP(KVM_CAP_IRQCHIP_SPLIT)`` are used to enable in-kernel emulation of
9262 On older versions of Linux, CPUID[EAX=1]:ECX[24] (TSC_DEADLINE) is not reported by
9264 is present and the kernel has enabled in-kernel emulation of the local APIC.
9265 On newer versions, ``KVM_GET_SUPPORTED_CPUID`` does report the bit as available.
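On kernels that do not report the bit, userspace can set it itself once both conditions above hold. The following sketch shows only the bit manipulation on the ECX value of CPUID leaf 1; the macro and function names are hypothetical, and the two boolean inputs stand for "the relevant capability is present" and "the local APIC is emulated in the kernel", which the caller is assumed to have determined elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for bit 24 of ECX in CPUID leaf 1 (TSC_DEADLINE). */
#define CPUID_1_ECX_TSC_DEADLINE (1u << 24)

/*
 * Return @ecx with TSC_DEADLINE set when the capability is present and the
 * local APIC is emulated in the kernel. A bit that is already set (newer
 * kernels) is left untouched, so the helper is safe on both old and new hosts.
 */
static uint32_t ecx_with_tsc_deadline(uint32_t ecx, bool cap_present,
                                      bool lapic_in_kernel)
{
    if (cap_present && lapic_in_kernel)
        ecx |= CPUID_1_ECX_TSC_DEADLINE;
    return ecx;
}
```

The result would be written back into the leaf 1 entry before the CPUID table is handed to the vCPU.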
9273 should not rely on it. Currently they return all zeroes.