History log of /linux/arch/x86/kvm/vmx/tdx.c (Results 1 – 25 of 79)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 73d7cf07 10-Jul-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
"Many patches, pretty much all of them small, that accumulated while I
was on vacation.

AR

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
"Many patches, pretty much all of them small, that accumulated while I
was on vacation.

ARM:

- Remove the last leftovers of the ill-fated FPSIMD host state
mapping at EL2 stage-1

- Fix unexpected advertisement to the guest of unimplemented S2 base
granule sizes

- Gracefully fail initialising pKVM if the interrupt controller isn't
GICv3

- Also gracefully fail initialising pKVM if the carveout allocation
fails

- Fix the computing of the minimum MMIO range required for the host
on stage-2 fault

- Fix the generation of the GICv3 Maintenance Interrupt in nested
mode

x86:

- Reject SEV{-ES} intra-host migration if one or more vCPUs are
actively being created, so as not to create a non-SEV{-ES} vCPU in
an SEV{-ES} VM

- Use a pre-allocated, per-vCPU buffer for handling de-sparsification
of vCPU masks in Hyper-V hypercalls; fixes a "stack frame too
large" issue

- Allow out-of-range/invalid Xen event channel ports when configuring
IRQ routing, to avoid dictating a specific ioctl() ordering to
userspace

- Conditionally reschedule when setting memory attributes to avoid
soft lockups when userspace converts huge swaths of memory to/from
private

- Add back MWAIT as a required feature for the MONITOR/MWAIT selftest

- Add a missing field in struct sev_data_snp_launch_start that
resulted in the guest-visible workarounds field being filled at the
wrong offset

- Skip non-canonical address when processing Hyper-V PV TLB flushes
to avoid VM-Fail on INVVPID

- Advertise supported TDX TDVMCALLs to userspace

- Pass SetupEventNotifyInterrupt arguments to userspace

- Fix TSC frequency underflow"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: avoid underflow when scaling TSC frequency
KVM: arm64: Remove kvm_arch_vcpu_run_map_fp()
KVM: arm64: Fix handling of FEAT_GTG for unimplemented granule sizes
KVM: arm64: Don't free hyp pages with pKVM on GICv2
KVM: arm64: Fix error path in init_hyp_mode()
KVM: arm64: Adjust range correctly during host stage-2 faults
KVM: arm64: nv: Fix MI line level calculation in vgic_v3_nested_update_mi()
KVM: x86/hyper-v: Skip non-canonical addresses during PV TLB flush
KVM: SVM: Add missing member in SNP_LAUNCH_START command structure
Documentation: KVM: Fix unexpected unindent warnings
KVM: selftests: Add back the missing check of MONITOR/MWAIT availability
KVM: Allow CPU to reschedule while setting per-page memory attributes
KVM: x86/xen: Allow 'out of range' event channel ports in IRQ routing table.
KVM: x86/hyper-v: Use preallocated per-vCPU buffer for de-sparsified vCPU masks
KVM: SVM: Initialize vmsa_pa in VMCB to INVALID_PAGE if VMSA page is NULL
KVM: SVM: Reject SEV{-ES} intra host migration if vCPU creation is in-flight
KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities
KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt

show more ...


# 5383fc05 08-Jul-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvm-x86-fixes-6.16-rcN' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 6.16-rcN

- Reject SEV{-ES} intra-host migration if one or more vCPUs are actively
being created

Merge tag 'kvm-x86-fixes-6.16-rcN' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 6.16-rcN

- Reject SEV{-ES} intra-host migration if one or more vCPUs are actively
being created so as not to create a non-SEV{-ES} vCPU in an SEV{-ES} VM.

- Use a pre-allocated, per-vCPU buffer for handling de-sparsified vCPU masks
when emulating Hyper-V hypercalls to fix a "stack frame too large" issue.

- Allow out-of-range/invalid Xen event channel ports when configuring IRQ
routing to avoid dictating a specific ioctl() ordering to userspace.

- Conditionally reschedule when setting memory attributes to avoid soft
lockups when userspace converts huge swaths of memory to/from private.

- Add back MWAIT as a required feature for the MONITOR/MWAIT selftest.

- Add a missing field in struct sev_data_snp_launch_start that resulted in
the guest-visible workarounds field being filled at the wrong offset.

- Skip non-canonical address when processing Hyper-V PV TLB flushes to avoid
VM-Fail on INVVPID.

- Advertise supported TDX TDVMCALLs to userspace.

show more ...


Revision tags: v6.16-rc5, v6.16-rc4, v6.16-rc3
# 28224ef0 20-Jun-2025 Paolo Bonzini <pbonzini@redhat.com>

KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities

Allow userspace to advertise TDG.VP.VMCALL subfunctions that the
kernel also supports. For each output register of GetTdVmCallInfo'

KVM: TDX: Report supported optional TDVMCALLs in TDX capabilities

Allow userspace to advertise TDG.VP.VMCALL subfunctions that the
kernel also supports. For each output register of GetTdVmCallInfo's
leaf 1, add two fields to KVM_TDX_CAPABILITIES: one for kernel-supported
TDVMCALLs (userspace can set those blindly) and one for user-supported
TDVMCALLs (userspace can set those if it knows how to handle them).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# 4580dbef 20-Jun-2025 Paolo Bonzini <pbonzini@redhat.com>

KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


# 346bd8a9 26-Jun-2025 Takashi Iwai <tiwai@suse.de>

Merge tag 'asoc-fix-v6.16-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.16

A small collection of fixes, the main one being a fix for resume

Merge tag 'asoc-fix-v6.16-rc3' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.16

A small collection of fixes, the main one being a fix for resume from
hibernation on AMD systems, plus a few new quirk entries for AMD
systems.

show more ...


# e669e322 22-Jun-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"ARM:

- Fix another set of FP/SIMD/SVE bugs affecting NV, and plugging some
missing sy

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"ARM:

- Fix another set of FP/SIMD/SVE bugs affecting NV, and plugging some
missing synchronisation

- A small fix for the irqbypass hook fixes, tightening the check and
ensuring that we only deal with MSI for both the old and the new
route entry

- Rework the way the shadow LRs are addressed in a nesting
configuration, plugging an embarrassing bug as well as simplifying
the whole process

- Add yet another fix for the dreaded arch_timer_edge_cases selftest

RISC-V:

- Fix the size parameter check in SBI SFENCE calls

- Don't treat SBI HFENCE calls as NOPs

x86 TDX:

- Complete API for handling complex TDVMCALLs in userspace.

This was delayed because the spec lacked a way for userspace to
deny supporting these calls; the new exit code is now approved"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: TDX: Exit to userspace for GetTdVmCallInfo
KVM: TDX: Handle TDG.VP.VMCALL<GetQuote>
KVM: TDX: Add new TDVMCALL status code for unsupported subfuncs
KVM: arm64: VHE: Centralize ISBs when returning to host
KVM: arm64: Remove cpacr_clear_set()
KVM: arm64: Remove ad-hoc CPTR manipulation from kvm_hyp_handle_fpsimd()
KVM: arm64: Remove ad-hoc CPTR manipulation from fpsimd_sve_sync()
KVM: arm64: Reorganise CPTR trap manipulation
KVM: arm64: VHE: Synchronize CPTR trap deactivation
KVM: arm64: VHE: Synchronize restore of host debug registers
KVM: arm64: selftests: Close the GIC FD in arch_timer_edge_cases
KVM: arm64: Explicitly treat routing entry type changes as changes
KVM: arm64: nv: Fix tracking of shadow list registers
RISC-V: KVM: Don't treat SBI HFENCE calls as NOPs
RISC-V: KVM: Fix the size parameter check in SBI SFENCE calls

show more ...


Revision tags: v6.16-rc2
# 25e8b1dd 10-Jun-2025 Binbin Wu <binbin.wu@linux.intel.com>

KVM: TDX: Exit to userspace for GetTdVmCallInfo

Exit to userspace for TDG.VP.VMCALL<GetTdVmCallInfo> via KVM_EXIT_TDX,
to allow userspace to provide information about the support of
TDVMCALLs when r

KVM: TDX: Exit to userspace for GetTdVmCallInfo

Exit to userspace for TDG.VP.VMCALL<GetTdVmCallInfo> via KVM_EXIT_TDX,
to allow userspace to provide information about the support of
TDVMCALLs when r12 is 1 for the TDVMCALLs beyond the GHCI base API.

GHCI spec defines the GHCI base TDVMCALLs: <GetTdVmCallInfo>, <MapGPA>,
<ReportFatalError>, <Instruction.CPUID>, <#VE.RequestMMIO>,
<Instruction.HLT>, <Instruction.IO>, <Instruction.RDMSR> and
<Instruction.WRMSR>. They must be supported by VMM to support TDX guests.

For GetTdVmCallInfo
- When leaf (r12) to enumerate TDVMCALL functionality is set to 0,
successful execution indicates all GHCI base TDVMCALLs listed above are
supported.

Update the KVM TDX document with the set of the GHCI base APIs.

- When leaf (r12) to enumerate TDVMCALL functionality is set to 1, it
indicates the TDX guest is querying the supported TDVMCALLs beyond
the GHCI base TDVMCALLs.
Exit to userspace to let userspace set the TDVMCALL sub-function bit(s)
accordingly to the leaf outputs. KVM could set the TDVMCALL bit(s)
supported by itself when the TDVMCALLs don't need support from userspace
after returning from userspace and before entering guest. Currently, no
such TDVMCALLs implemented, KVM just sets the values returned from
userspace.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
[Adjust userspace API. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# cf207eac 10-Jun-2025 Binbin Wu <binbin.wu@linux.intel.com>

KVM: TDX: Handle TDG.VP.VMCALL<GetQuote>

Handle TDVMCALL for GetQuote to generate a TD-Quote.

GetQuote is a doorbell-like interface used by TDX guests to request VMM
to generate a TD-Quote signed b

KVM: TDX: Handle TDG.VP.VMCALL<GetQuote>

Handle TDVMCALL for GetQuote to generate a TD-Quote.

GetQuote is a doorbell-like interface used by TDX guests to request VMM
to generate a TD-Quote signed by a service hosting TD-Quoting Enclave
operating on the host. A TDX guest passes a TD Report (TDREPORT_STRUCT) in
a shared-memory area as parameter. Host VMM can access it and queue the
operation for a service hosting TD-Quoting enclave. When completed, the
Quote is returned via the same shared-memory area.

KVM only checks the GPA from the TDX guest has the shared-bit set and drops
the shared-bit before exiting to userspace to avoid bleeding the shared-bit
into KVM's exit ABI. KVM forwards the request to userspace VMM (e.g. QEMU)
and userspace VMM queues the operation asynchronously. KVM sets the return
code according to the 'ret' field set by userspace to notify the TDX guest
whether the request has been queued successfully or not. When the request
has been queued successfully, the TDX guest can poll the status field in
the shared-memory area to check whether the Quote generation is completed
or not. When completed, the generated Quote is returned via the same
buffer.

Add KVM_EXIT_TDX as a new exit reason to userspace. Userspace is
required to handle the KVM exit reason as the initial support for TDX,
by reentering KVM to ensure that the TDVMCALL is complete. While at it,
add a note that KVM_EXIT_HYPERCALL also requires reentry with KVM_RUN.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Tested-by: Mikko Ylinen <mikko.ylinen@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
[Adjust userspace API. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# b5aafcb4 10-Jun-2025 Binbin Wu <binbin.wu@linux.intel.com>

KVM: TDX: Add new TDVMCALL status code for unsupported subfuncs

Add the new TDVMCALL status code TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED and
return it for unimplemented TDVMCALL subfunctions.

Returning

KVM: TDX: Add new TDVMCALL status code for unsupported subfuncs

Add the new TDVMCALL status code TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED and
return it for unimplemented TDVMCALL subfunctions.

Returning TDVMCALL_STATUS_INVALID_OPERAND when a subfunction is not
implemented is vague because TDX guests can't tell the error is due to
the subfunction is not supported or an invalid input of the subfunction.
New GHCI spec adds TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED to avoid the
ambiguity. Use it instead of TDVMCALL_STATUS_INVALID_OPERAND.

Before the change, for common guest implementations, when a TDX guest
receives TDVMCALL_STATUS_INVALID_OPERAND, it has two cases:
1. Some operand is invalid. It could change the operand to another value
retry.
2. The subfunction is not supported.

For case 1, an invalid operand usually means the guest implementation bug.
Since the TDX guest can't tell which case is, the best practice for
handling TDVMCALL_STATUS_INVALID_OPERAND is stopping calling such leaf,
treating the failure as fatal if the TDVMCALL is essential or ignoring
it if the TDVMCALL is optional.

With this change, TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED could be sent to
old TDX guest that do not know about it, but it is expected that the
guest will make the same action as TDVMCALL_STATUS_INVALID_OPERAND.
Currently, no known TDX guest checks TDVMCALL_STATUS_INVALID_OPERAND
specifically; for example Linux just checks for success.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
[Return it for untrapped KVM_HC_MAP_GPA_RANGE. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


Revision tags: v6.16-rc1
# 43db1111 29-May-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
"As far as x86 goes this pull request "only" includes TDX host support.

Quotes are appropr

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
"As far as x86 goes this pull request "only" includes TDX host support.

Quotes are appropriate because (at 6k lines and 100+ commits) it is
much bigger than the rest, which will come later this week and
consists mostly of bugfixes and selftests. s390 changes will also come
in the second batch.

ARM:

- Add large stage-2 mapping (THP) support for non-protected guests
when pKVM is enabled, clawing back some performance.

- Enable nested virtualisation support on systems that support it,
though it is disabled by default.

- Add UBSAN support to the standalone EL2 object used in nVHE/hVHE
and protected modes.

- Large rework of the way KVM tracks architecture features and links
them with the effects of control bits. While this has no functional
impact, it ensures correctness of emulation (the data is
automatically extracted from the published JSON files), and helps
dealing with the evolution of the architecture.

- Significant changes to the way pKVM tracks ownership of pages,
avoiding page table walks by storing the state in the hypervisor's
vmemmap. This in turn enables the THP support described above.

- New selftest checking the pKVM ownership transition rules

- Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
even if the host didn't have it.

- Fixes for the address translation emulation, which happened to be
rather buggy in some specific contexts.

- Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
from the number of counters exposed to a guest and addressing a
number of issues in the process.

- Add a new selftest for the SVE host state being corrupted by a
guest.

- Keep HCR_EL2.xMO set at all times for systems running with the
kernel at EL2, ensuring that the window for interrupts is slightly
bigger, and avoiding a pretty bad erratum on the AmpereOne HW.

- Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
from a pretty bad case of TLB corruption unless accesses to HCR_EL2
are heavily synchronised.

- Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
tables in a human-friendly fashion.

- and the usual random cleanups.

LoongArch:

- Don't flush tlb if the host supports hardware page table walks.

- Add KVM selftests support.

RISC-V:

- Add vector registers to get-reg-list selftest

- VCPU reset related improvements

- Remove scounteren initialization from VCPU reset

- Support VCPU reset from userspace using set_mpstate() ioctl

x86:

- Initial support for TDX in KVM.

This finally makes it possible to use the TDX module to run
confidential guests on Intel processors. This is quite a large
series, including support for private page tables (managed by the
TDX module and mirrored in KVM for efficiency), forwarding some
TDVMCALLs to userspace, and handling several special VM exits from
the TDX module.

This has been in the works for literally years and it's not really
possible to describe everything here, so I'll defer to the various
merge commits up to and including commit 7bcf7246c42a ('Merge
branch 'kvm-tdx-finish-initial' into HEAD')"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits)
x86/tdx: mark tdh_vp_enter() as __flatten
Documentation: virt/kvm: remove unreferenced footnote
RISC-V: KVM: lock the correct mp_state during reset
KVM: arm64: Fix documentation for vgic_its_iter_next()
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
KVM: arm64: Stage-2 huge mappings for np-guests
KVM: arm64: Add a range to pkvm_mappings
KVM: arm64: Convert pkvm_mappings to interval tree
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
KVM: arm64: Add a range to __pkvm_host_unshare_guest()
KVM: arm64: Add a range to __pkvm_host_share_guest()
KVM: arm64: Introduce for_each_hyp_page
KVM: arm64: Handle huge mappings for np-guest CMOs
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
RISC-V: KVM: Remove scounteren initialization
KVM: RISC-V: remove unnecessary SBI reset state
...

show more ...


Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14
# fd02aa45 19-Mar-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-tdx-initial' into HEAD

This large commit contains the initial support for TDX in KVM. All x86
parts enable the host-side hypercalls that KVM uses to talk to the TDX
module, a soft

Merge branch 'kvm-tdx-initial' into HEAD

This large commit contains the initial support for TDX in KVM. All x86
parts enable the host-side hypercalls that KVM uses to talk to the TDX
module, a software component that runs in a special CPU mode called SEAM
(Secure Arbitration Mode).

The series is in turn split into multiple sub-series, each with a separate
merge commit:

- Initialization: basic setup for using the TDX module from KVM, plus
ioctls to create TDX VMs and vCPUs.

- MMU: in TDX, private and shared halves of the address space are mapped by
different EPT roots, and the private half is managed by the TDX module.
Using the support that was added to the generic MMU code in 6.14,
add support for TDX's secure page tables to the Intel side of KVM.
Generic KVM code takes care of maintaining a mirror of the secure page
tables so that they can be queried efficiently, and ensuring that changes
are applied to both the mirror and the secure EPT.

- vCPU enter/exit: implement the callbacks that handle the entry of a TDX
vCPU (via the SEAMCALL TDH.VP.ENTER) and the corresponding save/restore
of host state.

- Userspace exits: introduce support for guest TDVMCALLs that KVM forwards to
userspace. These correspond to the usual KVM_EXIT_* "heavyweight vmexits"
but are triggered through a different mechanism, similar to VMGEXIT for
SEV-ES and SEV-SNP.

- Interrupt handling: support for virtual interrupt injection as well as
handling VM-Exits that are caused by vectored events. Exclusive to
TDX are machine-check SMIs, which the kernel already knows how to
handle through the kernel machine check handler (commit 7911f145de5f,
"x86/mce: Implement recovery for errors in TDX/SEAM non-root mode")

- Loose ends: handling of the remaining exits from the TDX module, including
EPT violation/misconfig and several TDVMCALL leaves that are handled in
the kernel (CPUID, HLT, RDMSR/WRMSR, GetTdVmCallInfo); plus returning
an error or ignoring operations that are not supported by TDX guests

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


Revision tags: v6.14-rc7
# 7bcf7246 12-Mar-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-tdx-finish-initial' into HEAD

This patch ties the remaining loose ends and finally enables TDX guests to
run inside KVM. It implements handling of EPT violation/misconfig and of
s

Merge branch 'kvm-tdx-finish-initial' into HEAD

This patch ties the remaining loose ends and finally enables TDX guests to
run inside KVM. It implements handling of EPT violation/misconfig and of
several TDVMCALL leaves that are handled in the kernel (CPUID, HLT, RDMSR/WRMSR,
GetTdVmCallInfo); it also adds a bunch of wrappers in vmx/main.c to
ignore operations not supported by TDX guests(*)

Finally, it introduces documentation for the new APIs that have been
added along the way.

(*) access to CPU state, VMX preemption timer, accesses to TSC offset or
multiplier, LMCE enable/disable, hypercall patching.

show more ...


# 9913212b 12-Mar-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-tdx-interrupts' into HEAD

Introduces support for interrupt handling for TDX guests, including
virtual interrupt injection and VM-Exits caused by vectored events.

Injection
=======

Merge branch 'kvm-tdx-interrupts' into HEAD

Introduces support for interrupt handling for TDX guests, including
virtual interrupt injection and VM-Exits caused by vectored events.

Injection
=========

TDX supports non-NMI interrupt injection only by posted interrupt. Posted
interrupt descriptors (PIDs) are allocated in shared memory, KVM
can update them directly. To post pending interrupts in the PID, KVM
can generate a self-IPI with notification vector prior to TD entry.
TDX guest status is protected, KVM can't get the interrupt status of
TDX guest. For now, assume the interrupt is always allowed. A later
patch set will let TDX guests to call TDVMCALL with HLT, which passes
the interrupt block flag, so that whether interrupt is allowed in HLT
will checked against the interrupt block flag.

For NMIs, KVM can request the TDX module to inject a NMI into a TDX vCPU
by setting the PEND_NMI TDVPS field to 1. Following that, KVM can call
TDH.VP.ENTER to run the vCPU and the TDX module will attempt to inject
the NMI as soon as possible. PEND_NMI TDVPS field is a 1-bit filed,
i.e. KVM can only pend one NMI in the TDX module. Also, TDX doesn't
allow KVM to request NMI-window exit directly. When there is already
one NMI pending in the TDX module, i.e. it has not been delivered to
TDX guest yet, if there is NMI pending in KVM, collapse the pending
NMI in KVM into the one pending in the TDX module. Such collapse is OK
considering on X86 bare metal, multiple NMIs could collapse into one NMI,
e.g. when NMI is blocked by SMI. It's OS's responsibility to poll all
NMI sources in the NMI handler to avoid missing handling of some NMI
events. More details can be found in the changelog of the patch "KVM:
TDX: Implement methods to inject NMI".

TDX doesn't support system-management mode (SMM) and system-management
interrupt (SMI) in guest TDs because TDX module doesn't provide a way for
VMM to inject SMI into guest TD or switch guest vCPU mode into SMM.
SMI requests return -ENOTTY similar to CONFIG_KVM_SMM=n. Likewise,
INIT and SIPI events are not used and are blocked for TDX guests;
TDX defines its own vCPU creation and initialization sequence, which
is done on the host via SEAMCALLs at TD build time.

VM-exit for external events
===========================

Similar to the VMX case, external interrupts are with interrupts off:
in the .handle_exit_irqoff() callback for external interrupts and in
the noinstr region for NMIs. Just like VMX, NMI remains blocked after
exiting from TDX guest for NMI-induced exits.

Machine check, which is handled in the .handle_exit_irqoff() callback, is
the only exception type KVM handles for TDX guests. For other exceptions,
because TDX guest state is protected, exceptions in TDX guests can't be
intercepted. TDX VMM isn't supposed to handle these exceptions. Exit to
userspace with KVM_EXIT_EXCEPTION If unexpected exception occurs.

Host SMIs also cause an exit to KVM. This is needed because in SEAM
root mode (TDX module) all interrupts are blocked. An SMI can be "I/O
SMI" or "other SMI". For TDX, there will be no I/O SMI because I/O
instructions inside TDX guest trigger #VE and TDX guest needs to use
TDVMCALL to request VMM to do I/O emulation. The only case of interest
for "other SMI" is an #MC occurring in the guest when MCE-SMI morphing
is enabled in the host firmware. Such "MSMI" is marked by having bit 0
set in the exit qualification; MSMI exits are fatal for the TD and
are eventually handled by the kernel machine check handler (7911f14
x86/mce: Implement recovery for errors in TDX/SEAM non-root mode),
which marks the page as poisoned. It is not possible right now to
pass machine check exceptions to the guest.

SMIs other than machine check SMIs are handled just by leaving SEAM
root mode and KVM doesn't need to do anything.

show more ...


# 4d2dc9a2 12-Mar-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-tdx-userspace-exit' into HEAD

Introduces support for VM exits that are forwarded the host VMM in
userspace. These are initiated from the TDCALL exit code; although
these userspace

Merge branch 'kvm-tdx-userspace-exit' into HEAD

Introduces support for VM exits that are forwarded the host VMM in
userspace. These are initiated from the TDCALL exit code; although
these userspace exits have the same TDX exit code, they result in several
different types of exits to userspace.

When a guest TD issues a TDVMCALL, it
exits to VMM with a new exit reason. The arguments from the guest TD and
return values from the VMM are passed through the guest registers. The
ABI details for the guest TD hypercalls are specified in the TDX GHCI
specification.

There are two types of hypercalls defined in the GHCI specification:

- Standard TDVMCALLs: When input of R10 from guest TD is set to 0, it
indicates that the TDVMCALL sub-function used in R11 is defined in GHCI
specification.

- Vendor-Specific TDVMCALLs: When input of R10 from guest TD is non-zero,
it indicates a vendor-specific TDVMCALL. KVM hypercalls from the guest
follow this interface, using R10 as KVM hypercall number and R11-R14 as
4 arguments. The error code returned in R10.

This series includes basic standard TDVMCALLs that map to existing eixt
reasons:

- TDG.VP.VMCALL<MapGPA> reuses exit reason KVM_EXIT_HYPERCALL with the
hypercall number KVM_HC_MAP_GPA_RANGE.

- TDG.VP.VMCALL<ReportFatalError> reuses exit reason KVM_EXIT_SYSTEM_EVENT
with a new event type KVM_SYSTEM_EVENT_TDX_FATAL.

- TDG.VP.VMCALL<Instruction.IO> reuses exit reason KVM_EXIT_IO.

- TDG.VP.VMCALL<#VE.RequestMMIO> reuses exit reason KVM_EXIT_MMIO.

Notably, handling for TDG.VP.VMCALL<SetupEventNotifyInterrupt> and
TDG.VP.VMCALL<GetQuote> is not included yet.

show more ...


# 77ab80c6 12-Mar-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-tdx-enter-exit' into HEAD

This series introduces callbacks to facilitate the entry of a TD VCPU
and the corresponding save/restore of host state.

A TD VCPU is entered via the SEAM

Merge branch 'kvm-tdx-enter-exit' into HEAD

This series introduces callbacks to facilitate the entry of a TD VCPU
and the corresponding save/restore of host state.

A TD VCPU is entered via the SEAMCALL TDH.VP.ENTER. The TDX Module manages
the save/restore of guest state and, in conjunction with the SEAMCALL
interface, handles certain aspects of host state. However, there are
specific elements of the host state that require additional attention, as
detailed in the Intel TDX ABI documentation for TDH.VP.ENTER.

TDX is quite different from VMX in this regard. For VMX, the host VMM is
heavily involved in restoring, managing and saving guest CPU state, whereas
for TDX this is handled by the TDX Module. In that way, the TDX Module can
protect the confidentiality and integrity of TD CPU state.

The TDX Module does not save/restore all host CPU state because the host
VMM can do it more efficiently and selectively. CPU state referred to
below is host CPU state. Often values are already held in memory so no
explicit save is needed, and restoration may not be needed if the kernel
is not using a feature.

TDX does not support PAUSE-loop exiting. According to the TDX Module
Base arch. spec., hypercalls are expected to be used instead. Note that
the Linux TDX guest supports existing hypercalls via TDG.VP.VMCALL.

This series requires TDX module 1.5.06.00.0744, or later, due to removal
of the workarounds for the lack of the NO_RBP_MOD feature required by the
kernel. NO_RBP_MOD is now required.

show more ...


Revision tags: v6.14-rc6
# fcbe3482 06-Mar-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-tdx-mmu' into HEAD

This series picks up from commit 86eb1aef7279 ("Merge branch
'kvm-mirror-page-tables' into HEAD", 2025-01-20), which focused on
changes to the generic x86 parts

Merge branch 'kvm-tdx-mmu' into HEAD

This series picks up from commit 86eb1aef7279 ("Merge branch
'kvm-mirror-page-tables' into HEAD", 2025-01-20), which focused on
changes to the generic x86 parts of the KVM MMU code, and adds support
for TDX's secure page tables to the Intel side of KVM.

Confidential computing solutions have concepts of private and shared
memory. Often the guest accesses either private or shared memory via a bit
in the guest PTE. Solutions like SEV treat this bit more like a permission
bit, where solutions like TDX and ARM CCA treat it more like a GPA bit. In
the latter case, the host maps private memory in one half of the address
space and shared in another. For TDX these two halves are mapped by
different EPT roots. The private half (also called Secure EPT in Intel
documentation) gets managed by the privileged TDX Module. The shared half
is managed by the untrusted part of the VMM (KVM).

In addition to the separate roots for private and shared, there are
limitations on what operations can be done on the private side. Like SNP,
TDX wants to protect against protected memory being reset or otherwise
scrambled by the host. In order to prevent this, the guest has to take
specific action to “accept” memory after changes are made by the VMM to
the private EPT. This prevents the VMM from performing many of the usual
memory management operations that involve zapping and refaulting memory.
The private memory also is always RWX and cannot have VMM specified cache
attribute attributes applied.

TDX memory implementation
=========================

Creating shared EPT
-------------------
Shared EPT handling is relatively simple compared to private memory. It is
managed from within KVM. The main differences between shared EPT and EPT
in a normal VM are that the root is set with a TDVMCS field (via SEAMCALL),
and that the GFN specified in the memslot perspective needs to be mapped
at an offset in the EPT. For the former, this series plumbs in the
load_mmu_pgd() operation to the correct field for the shared EPT. For the
latter, previous patches have laid the groundwork for mapping so called
“direct roots” roots at an offset specified in kvm->arch.gfn_direct_bits.

Creating private EPT
--------------------
In previous patches, the concept of “mirrored roots” were introduced. Such
roots maintain a KVM side “mirror” of the “external” EPT by keeping an
unmapped EPT tree within the KVM MMU code. When changing these mirror
EPTs, the KVM MMU code calls out via x86_ops to update the external EPT.
This series adds implementations for these “external” ops for TDX to
create and manage “private” memory via TDX module APIs.

Managing S-EPT with the TDX Module
----------------------------------
The TDX module allows the TD’s private memory to be managed via SEAMCALLs.
This management consists of operating on two internal elements:

1. The private EPT, which the TDX module calls the S-EPT. It maps the
actual mapped, private half of the GPA space using an EPT tree.

2. The HKID, which represents private encryption keys used for encrypting
TD memory. The CPU doesn’t guarantee cache coherency between these
encryption keys, so memory that is encrypted with one of these keys
needs to be reclaimed for use on the host in special ways.

This series will primarily focus on the SEAMCALLs for managing the private
EPT. Consideration of the HKID is needed for when the TD is torn down.

Populating TDX Private memory
-----------------------------
TDX allows the EPT mapping the TD's private memory to be modified in
limited ways. There are SEAMCALLs for building and tearing down the EPT
tree, as well as mapping pages into the private EPT.

As for building and tearing down the EPT page tables, it is relatively
simple. There are SEAMCALLs for installing and removing them. However, the
current implementation only supports adding private EPT page tables, and
leaves them installed for the lifetime of the TD. For teardown, the
details are discussed in a later section.

As for populating and zapping private SPTE, there are SEAMCALLs for this
as well. The zapping case will be described in detail later. As for the
populating case, there are two categories: before TD is finalized and
after TD is finalized. Both of these scenarios go through the TDP MMU map
path. The changes done previously to introduce “mirror” and “external”
page tables handle directing SPTE installation operations through the
set_external_spte() op.

In the “after” case, the TDX set_external_spte() handler simply calls a
SEAMCALL (TDX.MEM.PAGE.AUG).

For the before case, it is a bit more complicated as it requires both
setting the private SPTE *and* copying in the initial contents of the page
at the same time. For TDX this is done via the KVM_TDX_INIT_MEM_REGION
ioctl, which is effectively the kvm_gmem_populate() operation.

For SNP, the private memory can be pre-populated first, and faulted in
later like normal. But for TDX these need to both happen both at the same
time and the setting of the private SPTE needs to happen in a different
way than the “after” case described above. It needs to use the
TDH.MEM.SEPT.ADD SEAMCALL which does both the copying in of the data and
setting the SPTE.

Without extensive modification to the fault path, it’s not possible
utilize this callback from the set_external_spte() handler because it the
source page for the data to be copied in is not known deep down in this
callchain. So instead the post-populate callback does a three step
process.

1. Pre-fault the memory into the mirror EPT, but have the
set_external_spte() not make any SEAMCALLs.

2. Check that the page is still faulted into the mirror EPT under read
mmu_lock that is held over this and the following step.

3. Call TDH.MEM.SEPT.ADD with the HPA of the page to copy data from, and
the private page installed in the mirror EPT to use for the private
mapping.

The scheme involves some assumptions about the operations that might
operate on the mirrored EPT before the VM is finalized. It assumes that no
other memory will be faulted into the mirror EPT, that is not also added
via TDH.MEM.SEPT.ADD). If this is violated the KVM MMU may not see private
memory faulted in there later and so not make the proper external spte
callbacks. To check this, KVM enforces that the number of
pre-faulted pages is the same as the number of pages added via
KVM_TDX_INIT_MEM_REGION.

TDX TLB flushing
----------------
For TDX, TLB flushing needs to happen in different ways depending on
whether private and/or shared EPT needs to be flushed. Shared EPT can be
flushed like normal EPT with INVEPT. To avoid reading TD's EPTP out from
TDX module, this series flushes shared EPT with type 2 INVEPT. Private TLB
entries can be flushed this way too (via type 2). However, since the TDX
module needs to enforce some guarantees around which private memory is
mapped in the TD, it requires these operations to be done in special ways
for private memory.

For flushing private memory, two methods are possible. The simple one
is the TDH.VP.FLUSH SEAMCALL; this flush is of the INVEPT type 1 variety
(i.e. mappings associated with the TD).

The second method is part of a sequence of SEAMCALLs for removing a guest
page. The sequence looks like:

1. TDH.MEM.RANGE.BLOCK - Remove RWX bits from entry (similar to KVM’s zap).

2. TDH.MEM.TRACK - Increment the TD TLB epoch, which is a per-TD counter

3. Kick off all vCPUs - In order to force them to have to re-enter.

4. TDH.MEM.PAGE.REMOVE - Actually remove the page and make it available for
other use.

5. TDH.VP.ENTER - On re-entering TDX module will see the epoch is
incremented and flush the TLB.

On top of this, during TDX module init TDH.SYS.LP.INIT (which is used
to online a CPU for TDX usage) invokes INVEPT to flush all mappings in
the TLB.

During runtime, for normal (TDP MMU, non-nested) guests, KVM will do a TLB
flushes in 4 scenarios:

(1) kvm_mmu_load()

After EPT is loaded, call kvm_x86_flush_tlb_current() to invalidate
TLBs for current vCPU loaded EPT on current pCPU.

(2) Loading vCPU to a new pCPU

Send request KVM_REQ_TLB_FLUSH to current vCPU, the request handler
will call kvm_x86_flush_tlb_all() to flush all EPTs assocated with the
new pCPU.

(3) When EPT mapping has changed (after removing or permission reduction)
(e.g. in kvm_flush_remote_tlbs())

Send request KVM_REQ_TLB_FLUSH to all vCPUs by kicking all them off,
the request handler on each vCPU will call kvm_x86_flush_tlb_all() to
invalidate TLBs for all EPTs associated with the pCPU.

(4) When EPT changes only affects current vCPU, e.g. virtual apic mode
changed.

Send request KVM_REQ_TLB_FLUSH_CURRENT, the request handler will call
kvm_x86_flush_tlb_current() to invalidate TLBs for current vCPU loaded
EPT on current pCPU.

Only the first 3 are relevant to TDX. They are implemented as follows.

(1) kvm_mmu_load()

Only the shared EPT root is loaded in this path. The TDX module does
not require any assurances about the operation, so the
flush_tlb_current()->ept_sync_global() can be called as normal.

(2) vCPU load

When a vCPU migrates to a new logical processor, it has to be flushed
on the *old* pCPU, unlike normal VMs where the INVEPT is executed on
the new pCPU to remove stale mappings from previous usage of the same
EPTP on the new pCPU. The TDX behavior comes from a requirement
that a vCPU can only be associated with one pCPU at at time. This
flush happens via an IPI that invokes TDH.VP.FLUSH SEAMCALL, during
the vcpu_load callback.

(3) Removing a private SPTE

This is the more complicated flow. It is done in a simple way for now
and is especially inefficient during VM teardown. The plan is to get a
basic functional version working and optimize some of these flows
later.

When a private page mapping is removed, the core MMU code calls the
newly remove_external_spte() op, and flushes the TLB on all vCPUs. But
TDX can’t rely on doing that for private memory, so it has it’s own
process for making sure the private page is removed. This flow
(TDH.MEM.RANGE.BLOCK, TDH.MEM.TRACK, TDH.MEM.PAGE.REMOVE) is done
withing the remove_external_spte() implementation as described in the
“TDX TLB flushing” section above.

After that, back in the core MMU code, KVM will call
kvm_flush_remote_tlbs*() resulting in an INVEPT. Despite that, when
the vCPUs re-enter (TDH.VP.ENTER) the TD, the TDX module will do
another INVEPT for its own reassurance.

Private memory teardown
-----------------------
Tearing down private memory involves reclaiming three types of resources
from the TDX module:

1. TD’s HKID

To reclaim the TD’s HKID, no mappings may be mapped with it.

2. Private guest pages (mapped with HKID)
3. Private page tables that map private pages (mapped with HKID)

From the TDX module’s perspective, to reclaim guest private pages they
need to be prevented from be accessed via the HKID (unmapped and TLB
flushed), their HKID associated cachelines need to be flushed, and
they need to be marked as no longer use by the TD in the TDX modules
internal tracking (PAMT)

During runtime private PTEs can be zapped as part of memslot deletion or
when memory coverts from shared to private, but private page tables and
HKIDs are not torn down until the TD is being destructed. The means the
operation to zap private guest mapped pages needs to do the required cache
writeback under the assumption that other vCPU’s may be active, but the
PTs do not.

TD teardown resource reclamation
--------------------------------
The code that does the TD teardown is organized such that when an HKID is
reclaimed:
1. vCPUs will no longer enter the TD
2. The TLB is flushed on all CPUs
3. The HKID associated cachelines have been flushed.

So at that point most of the steps needed to reclaim TD private pages and
page tables have already been done and the reclaim operation only needs to
update the TDX module’s tracking of page ownership. For simplicity each
operation only supports one scenario: before or after HKID reclaim. Since
zapping and reclaiming private pages has to function during runtime for
memslot deletion and converting from shared to private, the TD teardown is
arranged so this happens before HKID reclaim. Since private page tables
are never torn down during TD runtime, they can happen in a simpler and
more efficient way after HKID reclaim. The private page reclaim is
initiated from the kvm fd release. The callchain looks like this:

do_exit
|->exit_mm --> tdx_mmu_release_hkid() was called here previously in v19
|->exit_files
|->1.release vcpu fd
|->2.kvm_gmem_release
| |->kvm_gmem_invalidate_begin --> unmap all leaf entries, causing
| zapping of private guest pages
|->3.release kvmfd
|->kvm_destroy_vm
|->kvm_arch_pre_destroy_vm
| | kvm_x86_call(vm_pre_destroy)(kvm) -->tdx_mmu_release_hkid()
|->kvm_arch_destroy_vm
|->kvm_unload_vcpu_mmus
| kvm_destroy_vcpus(kvm)
| |->kvm_arch_vcpu_destroy
| |->kvm_x86_call(vcpu_free)(vcpu)
| | kvm_mmu_destroy(vcpu) -->unref mirror root
| kvm_mmu_uninit_vm(kvm) --> mirror root ref is 1 here,
| zap private page tables
| static_call_cond(kvm_x86_vm_destroy)(kvm);

show more ...


# 0d20742b 06-Mar-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-tdx-initialization' into HEAD

This series kicks off the actual interaction of KVM with the TDX module.
This series encompasses the basic setup for using the TDX module from KVM,
an

Merge branch 'kvm-tdx-initialization' into HEAD

This series kicks off the actual interaction of KVM with the TDX module.
This series encompasses the basic setup for using the TDX module from KVM,
and the creation of TD VMs and vCPUs.

The TDX Module is a software component that runs in a special CPU mode
called SEAM (Secure Arbitration Mode). Loading it is mostly handled
outside of KVM by the core kernel. Once it’s loaded KVM can interact with
the TDX Module via a new instruction called SEAMCALL to virtualize a TD
guests. This instruction can be used to make various types of seamcalls,
with names organized into a hierarchy. The format is TDH.[AREA].[ACTION],
where “TDH” stands for “Trust Domain Host”, and differentiates from
another set of calls that can be done by the guest “TDG”. The KVM relevant
areas of SEAMCALLs are:
SYS – TDX module management, static metadata reading.
MNG – TD management. VM scoped things that operate on a TDX module
controlled structure called the TDCS.
VP – vCPU management. vCPU scoped things that operate on TDX module
controlled structures called the TDVPS.
PHYMEM - Operations related to physical memory management (page
reclaiming, cache operations, etc).

This series introduces some TDX specific KVM APIs and stops short of
fully “finalizing” the creation of a TD VM. The part of initializing
a guest where initial private memory is loaded is left to a separate
MMU related series.

show more ...


Revision tags: v6.14-rc5
# 90fe64a9 24-Feb-2025 Yan Zhao <yan.y.zhao@intel.com>

KVM: TDX: KVM: TDX: Always honor guest PAT on TDX enabled guests

Always honor guest PAT in KVM-managed EPTs on TDX enabled guests by
making self-snoop feature a hard dependency for TDX and making qu

KVM: TDX: KVM: TDX: Always honor guest PAT on TDX enabled guests

Always honor guest PAT in KVM-managed EPTs on TDX enabled guests by
making self-snoop feature a hard dependency for TDX and making quirk
KVM_X86_QUIRK_IGNORE_GUEST_PAT not a valid quirk once TDX is enabled.

The quirk KVM_X86_QUIRK_IGNORE_GUEST_PAT only affects memory type of
KVM-managed EPTs. For the TDX-module-managed private EPT, memory type is
always forced to WB now.

Honoring guest PAT in KVM-managed EPTs ensures KVM does not invoke
kvm_zap_gfn_range() when attaching/detaching non-coherent DMA devices,
which would cause mirrored EPTs for TDs to be zapped, leading to the
TDX-module-managed private EPT being incorrectly zapped.

As a new feature, TDX always comes with support for self-snoop, and does
not have to worry about unmodifiable but buggy guests. So, simply ignore
KVM_X86_QUIRK_IGNORE_GUEST_PAT on TDX guests just like kvm-amd.ko already
does.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20250224071039.31511-1-yan.y.zhao@intel.com>
[Only apply to TDX guests. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# 26eab9ae 27-Feb-2025 Binbin Wu <binbin.wu@linux.intel.com>

KVM: TDX: Enable guest access to MTRR MSRs

Allow TDX guests to access MTRR MSRs as what KVM does for normal VMs, i.e.,
KVM emulates accesses to MTRR MSRs, but doesn't virtualize guest MTRR
memory ty

KVM: TDX: Enable guest access to MTRR MSRs

Allow TDX guests to access MTRR MSRs as what KVM does for normal VMs, i.e.,
KVM emulates accesses to MTRR MSRs, but doesn't virtualize guest MTRR
memory types.

TDX module exposes MTRR feature to TDX guests unconditionally. KVM needs
to support MTRR MSRs accesses for TDX guests to match the architectural
behavior.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20250227012021.1778144-19-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# 04733836 27-Feb-2025 Isaku Yamahata <isaku.yamahata@intel.com>

KVM: TDX: Handle TDG.VP.VMCALL<GetTdVmCallInfo> hypercall

Implement TDG.VP.VMCALL<GetTdVmCallInfo> hypercall. If the input value is
zero, return success code and zero in output registers.

TDG.VP.V

KVM: TDX: Handle TDG.VP.VMCALL<GetTdVmCallInfo> hypercall

Implement TDG.VP.VMCALL<GetTdVmCallInfo> hypercall. If the input value is
zero, return success code and zero in output registers.

TDG.VP.VMCALL<GetTdVmCallInfo> hypercall is a subleaf of TDG.VP.VMCALL to
enumerate which TDG.VP.VMCALL sub leaves are supported. This hypercall is
for future enhancement of the Guest-Host-Communication Interface (GHCI)
specification. The GHCI version of 344426-001US defines it to require
input R12 to be zero and to return zero in output registers, R11, R12, R13,
and R14 so that guest TD enumerates no enhancement.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20250227012021.1778144-12-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# 9fc3402a 27-Feb-2025 Isaku Yamahata <isaku.yamahata@intel.com>

KVM: TDX: Enable guest access to LMCE related MSRs

Allow TDX guest to configure LMCE (Local Machine Check Event) by handling
MSR IA32_FEAT_CTL and IA32_MCG_EXT_CTL.

MCE and MCA are advertised via c

KVM: TDX: Enable guest access to LMCE related MSRs

Allow TDX guest to configure LMCE (Local Machine Check Event) by handling
MSR IA32_FEAT_CTL and IA32_MCG_EXT_CTL.

MCE and MCA are advertised via cpuid based on the TDX module spec. Guest
kernel can access IA32_FEAT_CTL to check whether LMCE is opted-in by the
platform or not. If LMCE is opted-in by the platform, guest kernel can
access IA32_MCG_EXT_CTL to enable/disable LMCE.

Handle MSR IA32_FEAT_CTL and IA32_MCG_EXT_CTL for TDX guests to avoid
failure when a guest accesses them with TDG.VP.VMCALL<MSR> on #VE. E.g.,
Linux guest will treat the failure as a #GP(0).

Userspace VMM may not opt-in LMCE by default, e.g., QEMU disables it by
default, "-cpu lmce=on" is needed in QEMU command line to opt-in it.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
[binbin: rework changelog]
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20250227012021.1778144-11-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# 081385db 27-Feb-2025 Isaku Yamahata <isaku.yamahata@intel.com>

KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall

Morph PV RDMSR/WRMSR hypercall to EXIT_REASON_MSR_{READ,WRITE} and
wire up KVM backend functions.

For complete_emulated_msr() callback, instead of inje

KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall

Morph PV RDMSR/WRMSR hypercall to EXIT_REASON_MSR_{READ,WRITE} and
wire up KVM backend functions.

For complete_emulated_msr() callback, instead of injecting #GP on error,
implement tdx_complete_emulated_msr() to set return code on error. Also
set return value on MSR read according to the values from kvm x86
registers.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20250227012021.1778144-10-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# dd50294f 27-Feb-2025 Isaku Yamahata <isaku.yamahata@intel.com>

KVM: TDX: Implement callbacks for MSR operations

Add functions to implement MSR related callbacks, .set_msr(), .get_msr(),
and .has_emulated_msr(), for preparation of handling hypercalls from TDX
gu

KVM: TDX: Implement callbacks for MSR operations

Add functions to implement MSR related callbacks, .set_msr(), .get_msr(),
and .has_emulated_msr(), for preparation of handling hypercalls from TDX
guest for PV RDMSR and WRMSR. Ignore KVM_REQ_MSR_FILTER_CHANGED for TDX.

There are three classes of MSR virtualization for TDX.
- Non-configurable: TDX module directly virtualizes it. VMM can't configure
it, the value set by KVM_SET_MSRS is ignored.
- Configurable: TDX module directly virtualizes it. VMM can configure it at
VM creation time. The value set by KVM_SET_MSRS is used.
- #VE case: TDX guest would issue TDG.VP.VMCALL<INSTRUCTION.{WRMSR,RDMSR}>
and VMM handles the MSR hypercall. The value set by KVM_SET_MSRS is used.

For the MSRs belonging to the #VE case, the TDX module injects #VE to the
TDX guest upon RDMSR or WRMSR. The exact list of such MSRs is defined in
TDX Module ABI Spec.

Upon #VE, the TDX guest may call TDG.VP.VMCALL<INSTRUCTION.{WRMSR,RDMSR}>,
which are defined in GHCI (Guest-Host Communication Interface) so that the
host VMM (e.g. KVM) can virtualize the MSRs.

TDX doesn't allow VMM to configure interception of MSR accesses. Ignore
KVM_REQ_MSR_FILTER_CHANGED for TDX guest. If the userspace has set any
MSR filters, it will be applied when handling
TDG.VP.VMCALL<INSTRUCTION.{WRMSR,RDMSR}> in a later patch.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <20250227012021.1778144-9-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# 5cf7239b 27-Feb-2025 Isaku Yamahata <isaku.yamahata@intel.com>

KVM: TDX: Handle TDX PV HLT hypercall

Handle TDX PV HLT hypercall and the interrupt status due to it.

TDX guest status is protected, KVM can't get the interrupt status
of TDX guest and it assumes i

KVM: TDX: Handle TDX PV HLT hypercall

Handle TDX PV HLT hypercall and the interrupt status due to it.

TDX guest status is protected, KVM can't get the interrupt status
of TDX guest and it assumes interrupt is always allowed unless TDX
guest calls TDVMCALL with HLT, which passes the interrupt blocked flag.

If the guest halted with interrupt enabled, also query pending RVI by
checking bit 0 of TD_VCPU_STATE_DETAILS_NON_ARCH field via a seamcall.

Update vt_interrupt_allowed() for TDX based on interrupt blocked flag
passed by HLT TDVMCALL. Do not wakeup TD vCPU if interrupt is blocked
for VT-d PI.

For NMIs, KVM cannot determine the NMI blocking status for TDX guests,
so KVM always assumes NMIs are not blocked. In the unlikely scenario
where a guest invokes the PV HLT hypercall within an NMI handler, this
could result in a spurious wakeup. The guest should implement the PV
HLT hypercall within a loop if it truly requires no interruptions, since
NMI could be unblocked by an IRET due to an exception occurring before
the PV HLT is executed in the NMI handler.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20250227012021.1778144-7-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# 3bf31b57 27-Feb-2025 Isaku Yamahata <isaku.yamahata@intel.com>

KVM: TDX: Handle TDX PV CPUID hypercall

Handle TDX PV CPUID hypercall for the CPUIDs virtualized by VMM
according to TDX Guest Host Communication Interface (GHCI).

For TDX, most CPUID leaf/sub-leaf

KVM: TDX: Handle TDX PV CPUID hypercall

Handle TDX PV CPUID hypercall for the CPUIDs virtualized by VMM
according to TDX Guest Host Communication Interface (GHCI).

For TDX, most CPUID leaf/sub-leaf combinations are virtualized by
the TDX module while some trigger #VE. On #VE, TDX guest can issue
TDG.VP.VMCALL<INSTRUCTION.CPUID> (same value as EXIT_REASON_CPUID)
to request VMM to emulate CPUID operation.

Morph PV CPUID hypercall to EXIT_REASON_CPUID and wire up to the KVM
backend function.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
[binbin: rewrite changelog]
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20250227012021.1778144-6-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


1234