History log of /linux/arch/x86/kvm/vmx/common.h (Results 1 – 13 of 13)
# 7f9039c5 02-Jun-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull more kvm updates from Paolo Bonzini:
"Generic:

- Clean up locking of all vCPUs for a VM by using the *_nest_lock()
family of functions, and move duplicated code to virt/kvm/. kernel/
patches acked by Peter Zijlstra

- Add MGLRU support to the access tracking perf test

ARM fixes:

- Make the irqbypass hooks resilient to changes in the GSI<->MSI
routing, avoiding stale vLPI mappings being left behind. The
fix is to resolve the VGIC IRQ using the host IRQ (which is stable)
and nuking the vLPI mapping upon a routing change

- Close another VGIC race where vCPU creation races with VGIC
creation, leading to in-flight vCPUs entering the kernel w/o
private IRQs allocated

- Fix a build issue triggered by the recently added workaround for
Ampere's AC04_CPU_23 erratum

- Correctly sign-extend the VA when emulating a TLBI instruction
potentially targeting a VNCR mapping

- Avoid dereferencing a NULL pointer in the VGIC debug code, which
can happen if the device doesn't have any mapping yet

s390:

- Fix interaction between some filesystems and Secure Execution

- Some cleanups and refactorings, preparing for an upcoming big
series

x86:

- Wait for target vCPU to ack KVM_REQ_UPDATE_PROTECTED_GUEST_STATE
to fix a race between AP destroy and VMRUN

- Decrypt and dump the VMSA in dump_vmcb() if debugging enabled for
the VM

- Refine and harden handling of spurious faults

- Add support for ALLOWED_SEV_FEATURES

- Add #VMGEXIT to the set of handlers special cased for
CONFIG_RETPOLINE=y

- Treat DEBUGCTL[5:2] as reserved to pave the way for virtualizing
features that utilize those bits

- Don't account temporary allocations in sev_send_update_data()

- Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM, via Bus Lock
Threshold

- Unify virtualization of IBRS on nested VM-Exit, and cross-vCPU
IBPB, between SVM and VMX

- Advertise support to userspace for WRMSRNS and PREFETCHI

- Rescan I/O APIC routes after handling EOI that needed to be
intercepted due to the old/previous routing, but not the
new/current routing

- Add a module param to control and enumerate support for device
posted interrupts

- Fix a potential overflow with nested virt on Intel systems running
32-bit kernels

- Flush shadow VMCSes on emergency reboot

- Add support for SNP to the various SEV selftests

- Add a selftest to verify fastops instructions via forced emulation

- Refine and optimize KVM's software processing of the posted
interrupt bitmap, and share the harvesting code between KVM and the
kernel's Posted MSI handler"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
rtmutex_api: provide correct extern functions
KVM: arm64: vgic-debug: Avoid dereferencing NULL ITE pointer
KVM: arm64: vgic-init: Plug vCPU vs. VGIC creation race
KVM: arm64: Unmap vLPIs affected by changes to GSI routing information
KVM: arm64: Resolve vLPI by host IRQ in vgic_v4_unset_forwarding()
KVM: arm64: Protect vLPI translation with vgic_irq::irq_lock
KVM: arm64: Use lock guard in vgic_v4_set_forwarding()
KVM: arm64: Mask out non-VA bits from TLBI VA* on VNCR invalidation
arm64: sysreg: Drag linux/kconfig.h to work around vdso build issue
KVM: s390: Simplify and move pv code
KVM: s390: Refactor and split some gmap helpers
KVM: s390: Remove unneeded srcu lock
s390: Remove unneeded includes
s390/uv: Improve splitting of large folios that cannot be split while dirty
s390/uv: Always return 0 from s390_wiggle_split_folio() if successful
s390/uv: Don't return 0 from make_hva_secure() if the operation was not successful
rust: add helper for mutex_trylock
RISC-V: KVM: use kvm_trylock_all_vcpus when locking all vCPUs
KVM: arm64: use kvm_trylock_all_vcpus when locking all vCPUs
x86: KVM: SVM: use kvm_lock_all_vcpus instead of a custom implementation
...


# cd1be30b 27-May-2025 Edward Adam Davis <eadavis@qq.com>

KVM: VMX: use __always_inline for is_td_vcpu and is_td

is_td() and is_td_vcpu() are used in no-instrumentation sections; use
__always_inline instead of inline.

vmlinux.o: error: objtool: vmx_handle_nmi+0x47:
call to is_td_vcpu.isra.0() leaves .noinstr.text section

Fixes: 7172c753c26a ("KVM: VMX: Move common fields of struct vcpu_{vmx,tdx} to a struct")
Signed-off-by: Edward Adam Davis <eadavis@qq.com>
Message-ID: <tencent_1A767567C83C1137829622362E4A72756F09@qq.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
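A rough userspace illustration of the fix follows. It assumes simplified stand-ins for struct kvm and struct kvm_vcpu and an assumed value for KVM_X86_TDX_VM; neither is copied from the kernel headers. The point is the annotation: with plain inline the compiler may emit an out-of-line clone such as is_td_vcpu.isra.0() and call it, which objtool rejects inside .noinstr.text, whereas __always_inline forces every call to be expanded in place.

```c
#include <stdbool.h>
#include <stdint.h>

/* glibc's <sys/cdefs.h> (reached via <stdint.h>) may already define this;
 * the kernel defines it in its own compiler headers. Fallback otherwise. */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* Simplified stand-ins, not the kernel's real layouts. */
#define KVM_X86_TDX_VM 5	/* assumed value, for illustration only */

struct kvm_arch { uint32_t vm_type; };
struct kvm { struct kvm_arch arch; };
struct kvm_vcpu { struct kvm *kvm; };

/* Every call is expanded in place, so a noinstr caller (e.g. an NMI exit
 * handler) never calls an out-of-line copy living in instrumentable text. */
static __always_inline bool is_td(struct kvm *kvm)
{
	return kvm->arch.vm_type == KVM_X86_TDX_VM;
}

static __always_inline bool is_td_vcpu(struct kvm_vcpu *vcpu)
{
	return is_td(vcpu->kvm);
}
```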


# 43db1111 29-May-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm updates from Paolo Bonzini:
"As far as x86 goes this pull request "only" includes TDX host support.

Quotes are appropriate because (at 6k lines and 100+ commits) it is
much bigger than the rest, which will come later this week and
consists mostly of bugfixes and selftests. s390 changes will also come
in the second batch.

ARM:

- Add large stage-2 mapping (THP) support for non-protected guests
when pKVM is enabled, clawing back some performance.

- Enable nested virtualisation support on systems that support it,
though it is disabled by default.

- Add UBSAN support to the standalone EL2 object used in nVHE/hVHE
and protected modes.

- Large rework of the way KVM tracks architecture features and links
them with the effects of control bits. While this has no functional
impact, it ensures correctness of emulation (the data is
automatically extracted from the published JSON files), and helps
deal with the evolution of the architecture.

- Significant changes to the way pKVM tracks ownership of pages,
avoiding page table walks by storing the state in the hypervisor's
vmemmap. This in turn enables the THP support described above.

- New selftest checking the pKVM ownership transition rules

- Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests
even if the host didn't have it.

- Fixes for the address translation emulation, which happened to be
rather buggy in some specific contexts.

- Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N
from the number of counters exposed to a guest and addressing a
number of issues in the process.

- Add a new selftest for the SVE host state being corrupted by a
guest.

- Keep HCR_EL2.xMO set at all times for systems running with the
kernel at EL2, ensuring that the window for interrupts is slightly
bigger, and avoiding a pretty bad erratum on the AmpereOne HW.

- Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers
from a pretty bad case of TLB corruption unless accesses to HCR_EL2
are heavily synchronised.

- Add a per-VM, per-ITS debugfs entry to dump the state of the ITS
tables in a human-friendly fashion.

- and the usual random cleanups.

LoongArch:

- Don't flush TLB if the host supports hardware page table walks.

- Add KVM selftests support.

RISC-V:

- Add vector registers to get-reg-list selftest

- VCPU reset related improvements

- Remove scounteren initialization from VCPU reset

- Support VCPU reset from userspace using set_mpstate() ioctl

x86:

- Initial support for TDX in KVM.

This finally makes it possible to use the TDX module to run
confidential guests on Intel processors. This is quite a large
series, including support for private page tables (managed by the
TDX module and mirrored in KVM for efficiency), forwarding some
TDVMCALLs to userspace, and handling several special VM exits from
the TDX module.

This has been in the works for literally years and it's not really
possible to describe everything here, so I'll defer to the various
merge commits up to and including commit 7bcf7246c42a ('Merge
branch 'kvm-tdx-finish-initial' into HEAD')"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits)
x86/tdx: mark tdh_vp_enter() as __flatten
Documentation: virt/kvm: remove unreferenced footnote
RISC-V: KVM: lock the correct mp_state during reset
KVM: arm64: Fix documentation for vgic_its_iter_next()
KVM: arm64: np-guest CMOs with PMD_SIZE fixmap
KVM: arm64: Stage-2 huge mappings for np-guests
KVM: arm64: Add a range to pkvm_mappings
KVM: arm64: Convert pkvm_mappings to interval tree
KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest()
KVM: arm64: Add a range to __pkvm_host_wrprotect_guest()
KVM: arm64: Add a range to __pkvm_host_unshare_guest()
KVM: arm64: Add a range to __pkvm_host_share_guest()
KVM: arm64: Introduce for_each_hyp_page
KVM: arm64: Handle huge mappings for np-guest CMOs
KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section
KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held
KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating
RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET
RISC-V: KVM: Remove scounteren initialization
KVM: RISC-V: remove unnecessary SBI reset state
...


Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14
# fd02aa45 19-Mar-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-tdx-initial' into HEAD

This large commit contains the initial support for TDX in KVM. All x86
parts enable the host-side hypercalls that KVM uses to talk to the TDX
module, a software component that runs in a special CPU mode called SEAM
(Secure Arbitration Mode).

The series is in turn split into multiple sub-series, each with a separate
merge commit:

- Initialization: basic setup for using the TDX module from KVM, plus
ioctls to create TDX VMs and vCPUs.

- MMU: in TDX, private and shared halves of the address space are mapped by
different EPT roots, and the private half is managed by the TDX module.
Using the support that was added to the generic MMU code in 6.14,
add support for TDX's secure page tables to the Intel side of KVM.
Generic KVM code takes care of maintaining a mirror of the secure page
tables so that they can be queried efficiently, and ensuring that changes
are applied to both the mirror and the secure EPT.

- vCPU enter/exit: implement the callbacks that handle the entry of a TDX
vCPU (via the SEAMCALL TDH.VP.ENTER) and the corresponding save/restore
of host state.

- Userspace exits: introduce support for guest TDVMCALLs that KVM forwards to
userspace. These correspond to the usual KVM_EXIT_* "heavyweight vmexits"
but are triggered through a different mechanism, similar to VMGEXIT for
SEV-ES and SEV-SNP.

- Interrupt handling: support for virtual interrupt injection as well as
handling VM-Exits that are caused by vectored events. Exclusive to
TDX are machine-check SMIs, which the kernel already knows how to
handle through the kernel machine check handler (commit 7911f145de5f,
"x86/mce: Implement recovery for errors in TDX/SEAM non-root mode")

- Loose ends: handling of the remaining exits from the TDX module, including
EPT violation/misconfig and several TDVMCALL leaves that are handled in
the kernel (CPUID, HLT, RDMSR/WRMSR, GetTdVmCallInfo); plus returning
an error or ignoring operations that are not supported by TDX guests

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>


Revision tags: v6.14-rc7
# 9913212b 12-Mar-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-tdx-interrupts' into HEAD

Introduces support for interrupt handling for TDX guests, including
virtual interrupt injection and VM-Exits caused by vectored events.

Injection
=========

TDX supports non-NMI interrupt injection only via posted interrupts. Posted
interrupt descriptors (PIDs) are allocated in shared memory, so KVM
can update them directly. To post pending interrupts in the PID, KVM
can generate a self-IPI with the notification vector prior to TD entry.
Because TDX guest state is protected, KVM can't get the interrupt status
of a TDX guest. For now, assume the interrupt is always allowed. A later
patch set will let TDX guests call TDVMCALL with HLT, which passes the
interrupt-blocked flag, so that whether an interrupt is allowed in HLT
will be checked against that flag.
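The descriptor update can be modelled roughly as below. This is a hypothetical userspace sketch: the struct layout, the function names and the use of C11 atomics are assumptions, not KVM's pi_desc or its helpers. It captures the idea that only the first poster after the descriptor was last processed needs to send the notification IPI.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of a PID: a 256-bit Posted Interrupt Request bitmap
 * (one bit per vector) plus the Outstanding Notification (ON) flag.
 * This is not the hardware layout. */
struct pi_desc_sketch {
	_Atomic uint64_t pir[4];
	atomic_bool on;
};

/* Post 'vector'. Returns true if the caller should send the notification
 * IPI: the vector was newly set and no notification was outstanding. */
static bool post_interrupt(struct pi_desc_sketch *pid, uint8_t vector)
{
	uint64_t mask = UINT64_C(1) << (vector % 64);
	uint64_t old = atomic_fetch_or(&pid->pir[vector / 64], mask);

	if (old & mask)
		return false;	/* already pending, nothing new to notify */
	/* Only the poster that flips ON from false to true sends the IPI. */
	return !atomic_exchange(&pid->on, true);
}

/* Process the descriptor: clear ON first, then move pending vectors
 * into 'irr' and clear them from the PIR. */
static void harvest_pir(struct pi_desc_sketch *pid, uint64_t irr[4])
{
	atomic_store(&pid->on, false);
	for (int i = 0; i < 4; i++)
		irr[i] |= atomic_exchange(&pid->pir[i], 0);
}
```

For a TDX guest, the notification would be the self-IPI KVM sends before TD entry; the sketch only reports whether one is needed.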

For NMIs, KVM can request the TDX module to inject an NMI into a TDX vCPU
by setting the PEND_NMI TDVPS field to 1. Following that, KVM can call
TDH.VP.ENTER to run the vCPU and the TDX module will attempt to inject
the NMI as soon as possible. The PEND_NMI TDVPS field is a 1-bit field,
i.e. KVM can only pend one NMI in the TDX module. Also, TDX doesn't
allow KVM to request an NMI-window exit directly. When one NMI is already
pending in the TDX module (i.e. it has not been delivered to the TDX
guest yet) and another NMI is pending in KVM, the NMI pending in KVM is
collapsed into the one pending in the TDX module. Such a collapse is OK
because on x86 bare metal, multiple NMIs can also collapse into one,
e.g. when NMIs are blocked by an SMI. It is the OS's responsibility to
poll all NMI sources in the NMI handler to avoid missing some NMI
events. More details can be found in the changelog of the patch
"KVM: TDX: Implement methods to inject NMI".
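The collapsing rule can be sketched as a tiny model. The struct, its fields and the function name are hypothetical and only mirror the behavior described above (one PEND_NMI bit and no NMI-window exit), not KVM's actual NMI bookkeeping.

```c
#include <stdbool.h>

/* Hypothetical model: NMIs queued in KVM plus the TDX module's 1-bit
 * PEND_NMI field. Names and layout are illustrative, not KVM's. */
struct td_nmi_model {
	unsigned int kvm_nmi_pending;	/* NMIs KVM has queued for the vCPU */
	bool pend_nmi;			/* stands in for the PEND_NMI TDVPS field */
};

/*
 * Before TD entry, hand KVM's pending NMIs to the TDX module. PEND_NMI
 * holds at most one NMI and there is no NMI-window exit, so every NMI
 * beyond the one that fits is collapsed into it. Returns how many NMIs
 * were collapsed (merged rather than delivered separately).
 */
static unsigned int td_sync_pending_nmis(struct td_nmi_model *s)
{
	unsigned int collapsed;

	if (!s->kvm_nmi_pending)
		return 0;

	collapsed = s->pend_nmi ? s->kvm_nmi_pending : s->kvm_nmi_pending - 1;
	s->pend_nmi = true;
	s->kvm_nmi_pending = 0;
	return collapsed;
}
```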

TDX doesn't support system-management mode (SMM) and system-management
interrupt (SMI) in guest TDs because the TDX module doesn't provide a way for
the VMM to inject an SMI into a guest TD or switch a guest vCPU into SMM.
SMI requests return -ENOTTY similar to CONFIG_KVM_SMM=n. Likewise,
INIT and SIPI events are not used and are blocked for TDX guests;
TDX defines its own vCPU creation and initialization sequence, which
is done on the host via SEAMCALLs at TD build time.

VM-exit for external events
===========================

Similar to the VMX case, external events are handled with interrupts off:
in the .handle_exit_irqoff() callback for external interrupts and in
the noinstr region for NMIs. Just like VMX, NMIs remain blocked after
exiting from a TDX guest for NMI-induced exits.

Machine check, which is handled in the .handle_exit_irqoff() callback, is
the only exception type KVM handles for TDX guests. Other exceptions in
TDX guests can't be intercepted because TDX guest state is protected,
and the TDX VMM isn't supposed to handle them. If an unexpected
exception occurs, KVM exits to userspace with KVM_EXIT_EXCEPTION.

Host SMIs also cause an exit to KVM. This is needed because in SEAM
root mode (TDX module) all interrupts are blocked. An SMI can be "I/O
SMI" or "other SMI". For TDX, there will be no I/O SMI because I/O
instructions inside a TDX guest trigger #VE and the TDX guest needs to use
TDVMCALL to ask the VMM to do I/O emulation. The only case of interest
for "other SMI" is an #MC occurring in the guest when MCE-SMI morphing
is enabled in the host firmware. Such "MSMI" is marked by having bit 0
set in the exit qualification; MSMI exits are fatal for the TD and
are eventually handled by the kernel machine check handler (7911f14
x86/mce: Implement recovery for errors in TDX/SEAM non-root mode),
which marks the page as poisoned. It is not possible right now to
pass machine check exceptions to the guest.

SMIs other than machine check SMIs are handled just by leaving SEAM
root mode and KVM doesn't need to do anything.
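The resulting decision for an "other SMI" exit can be sketched as follows. The macro and enum names are hypothetical; the sketch models only the classification by bit 0 of the exit qualification, not the machine check handling itself, which the text attributes to the kernel's machine check handler.

```c
#include <stdint.h>

/* Bit 0 of the exit qualification marks a machine-check SMI (MSMI), as
 * described above. The macro and enum names are hypothetical. */
#define SKETCH_EXIT_QUAL_MSMI (UINT64_C(1) << 0)

enum other_smi_action {
	OTHER_SMI_RESUME,	/* leaving SEAM root mode was all that was needed */
	OTHER_SMI_TD_FATAL,	/* #MC morphed into an SMI: fatal for the TD */
};

static enum other_smi_action classify_other_smi(uint64_t exit_qual)
{
	if (exit_qual & SKETCH_EXIT_QUAL_MSMI)
		return OTHER_SMI_TD_FATAL;
	return OTHER_SMI_RESUME;
}
```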


# 77ab80c6 12-Mar-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-tdx-enter-exit' into HEAD

This series introduces callbacks to facilitate the entry of a TD VCPU
and the corresponding save/restore of host state.

A TD VCPU is entered via the SEAMCALL TDH.VP.ENTER. The TDX Module manages
the save/restore of guest state and, in conjunction with the SEAMCALL
interface, handles certain aspects of host state. However, there are
specific elements of the host state that require additional attention, as
detailed in the Intel TDX ABI documentation for TDH.VP.ENTER.

TDX is quite different from VMX in this regard. For VMX, the host VMM is
heavily involved in restoring, managing and saving guest CPU state, whereas
for TDX this is handled by the TDX Module. In that way, the TDX Module can
protect the confidentiality and integrity of TD CPU state.

The TDX Module does not save/restore all host CPU state because the host
VMM can do it more efficiently and selectively. CPU state referred to
below is host CPU state. Often values are already held in memory so no
explicit save is needed, and restoration may not be needed if the kernel
is not using a feature.

TDX does not support PAUSE-loop exiting. According to the TDX Module
Base arch. spec., hypercalls are expected to be used instead. Note that
the Linux TDX guest supports existing hypercalls via TDG.VP.VMCALL.

This series requires TDX module 1.5.06.00.0744, or later, due to removal
of the workarounds for the lack of the NO_RBP_MOD feature required by the
kernel. NO_RBP_MOD is now required.


Revision tags: v6.14-rc6
# fcbe3482 06-Mar-2025 Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-tdx-mmu' into HEAD

This series picks up from commit 86eb1aef7279 ("Merge branch
'kvm-mirror-page-tables' into HEAD", 2025-01-20), which focused on
changes to the generic x86 parts of the KVM MMU code, and adds support
for TDX's secure page tables to the Intel side of KVM.

Confidential computing solutions have concepts of private and shared
memory. Often the guest accesses either private or shared memory via a bit
in the guest PTE. Solutions like SEV treat this bit more like a permission
bit, whereas solutions like TDX and ARM CCA treat it more like a GPA bit. In
the latter case, the host maps private memory in one half of the address
space and shared in another. For TDX these two halves are mapped by
different EPT roots. The private half (also called Secure EPT in Intel
documentation) gets managed by the privileged TDX Module. The shared half
is managed by the untrusted part of the VMM (KVM).

In addition to the separate roots for private and shared, there are
limitations on what operations can be done on the private side. Like SNP,
TDX wants to protect against protected memory being reset or otherwise
scrambled by the host. In order to prevent this, the guest has to take
specific action to “accept” memory after changes are made by the VMM to
the private EPT. This prevents the VMM from performing many of the usual
memory management operations that involve zapping and refaulting memory.
The private memory is also always RWX and cannot have VMM-specified cache
attributes applied.

TDX memory implementation
=========================

Creating shared EPT
-------------------
Shared EPT handling is relatively simple compared to private memory. It is
managed from within KVM. The main differences between shared EPT and EPT
in a normal VM are that the root is set with a TDVMCS field (via SEAMCALL),
and that the GFN seen from the memslot perspective needs to be mapped
at an offset in the EPT. For the former, this series plumbs in the
load_mmu_pgd() operation to the correct field for the shared EPT. For the
latter, previous patches have laid the groundwork for mapping so-called
“direct” roots at an offset specified in kvm->arch.gfn_direct_bits.
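The GPA-bit view of shared memory can be sketched as plain bit arithmetic. The sketch assumes the TD's shared bit is the most significant GPA bit, GPA[gpaw - 1]; a function parameter stands in for the value the text keeps in kvm->arch.gfn_direct_bits, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12

/*
 * Assumption for this sketch: the shared bit is GPA[gpaw - 1], e.g. bit 47
 * for a 48-bit guest physical address width. As a GFN bit it is shifted
 * down by the page shift.
 */
static uint64_t shared_gfn_bit(unsigned int gpaw)
{
	assert(gpaw > SKETCH_PAGE_SHIFT && gpaw <= 64);
	return (UINT64_C(1) << (gpaw - 1)) >> SKETCH_PAGE_SHIFT;
}

static bool gfn_is_shared(uint64_t gfn, unsigned int gpaw)
{
	return gfn & shared_gfn_bit(gpaw);
}

/* Memslot GFNs carry no shared bit; the shared ("direct") root maps them
 * at an offset, modelled here as OR-ing in the shared bit. */
static uint64_t memslot_gfn_to_shared_gfn(uint64_t gfn, unsigned int gpaw)
{
	return gfn | shared_gfn_bit(gpaw);
}

static uint64_t gfn_strip_shared(uint64_t gfn, unsigned int gpaw)
{
	return gfn & ~shared_gfn_bit(gpaw);
}
```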

Creating private EPT
--------------------
In previous patches, the concept of “mirrored roots” was introduced. Such
roots maintain a KVM-side “mirror” of the “external” EPT by keeping an
unmapped EPT tree within the KVM MMU code. When changing these mirror
EPTs, the KVM MMU code calls out via x86_ops to update the external EPT.
This series adds implementations for these “external” ops for TDX to
create and manage “private” memory via TDX module APIs.
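The mirror/external split can be pictured as a KVM-side table plus a hook that propagates each change. This is a heavily simplified, hypothetical model: a flat array replaces the EPT tree, and the ops struct, its signature and the fake external table are illustrative names, not KVM's x86_ops.

```c
#include <stdint.h>

#define SKETCH_NR_GFNS 16

/* Hypothetical stand-in for the external op hook. */
struct sketch_external_ops {
	/* Returns 0 on success; in TDX this would be backed by a SEAMCALL. */
	int (*set_external_spte)(void *ctx, uint64_t gfn, uint64_t pfn);
};

/* KVM-side mirror: a flat array instead of a real EPT tree. */
struct sketch_mirror_mmu {
	uint64_t mirror[SKETCH_NR_GFNS];	/* 0 means "not present" */
	const struct sketch_external_ops *ops;
	void *ctx;
};

/* Map gfn -> pfn. In this simplified ordering the external table is
 * updated first and the mirror only on success, so the mirror never
 * claims a mapping the external table lacks. */
static int mirror_map(struct sketch_mirror_mmu *mmu, uint64_t gfn, uint64_t pfn)
{
	int r;

	if (gfn >= SKETCH_NR_GFNS || !pfn)
		return -1;

	r = mmu->ops->set_external_spte(mmu->ctx, gfn, pfn);
	if (r)
		return r;
	mmu->mirror[gfn] = pfn;
	return 0;
}

/* A fake "external" table playing the role of the S-EPT. */
struct fake_sept {
	uint64_t entries[SKETCH_NR_GFNS];
};

static int fake_set_external_spte(void *ctx, uint64_t gfn, uint64_t pfn)
{
	struct fake_sept *sept = ctx;

	sept->entries[gfn] = pfn;
	return 0;
}

static const struct sketch_external_ops fake_ops = {
	.set_external_spte = fake_set_external_spte,
};
```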

Managing S-EPT with the TDX Module
----------------------------------
The TDX module allows the TD’s private memory to be managed via SEAMCALLs.
This management consists of operating on two internal elements:

1. The private EPT, which the TDX module calls the S-EPT. It maps the
actual mapped, private half of the GPA space using an EPT tree.

2. The HKID, which represents private encryption keys used for encrypting
TD memory. The CPU doesn’t guarantee cache coherency between these
encryption keys, so memory that is encrypted with one of these keys
needs to be reclaimed for use on the host in special ways.

This series will primarily focus on the SEAMCALLs for managing the private
EPT. Consideration of the HKID is needed for when the TD is torn down.

Populating TDX Private memory
-----------------------------
TDX allows the EPT mapping the TD's private memory to be modified in
limited ways. There are SEAMCALLs for building and tearing down the EPT
tree, as well as mapping pages into the private EPT.

As for building and tearing down the EPT page tables, it is relatively
simple. There are SEAMCALLs for installing and removing them. However, the
current implementation only supports adding private EPT page tables, and
leaves them installed for the lifetime of the TD. For teardown, the
details are discussed in a later section.

As for populating and zapping private SPTE, there are SEAMCALLs for this
as well. The zapping case will be described in detail later. As for the
populating case, there are two categories: before TD is finalized and
after TD is finalized. Both of these scenarios go through the TDP MMU map
path. The changes done previously to introduce “mirror” and “external”
page tables handle directing SPTE installation operations through the
set_external_spte() op.

In the “after” case, the TDX set_external_spte() handler simply calls a
SEAMCALL (TDH.MEM.PAGE.AUG).

For the before case, it is a bit more complicated as it requires both
setting the private SPTE *and* copying in the initial contents of the page
at the same time. For TDX this is done via the KVM_TDX_INIT_MEM_REGION
ioctl, which is effectively the kvm_gmem_populate() operation.

For SNP, the private memory can be pre-populated first, and faulted in
later like normal. But for TDX both need to happen at the same
time, and the setting of the private SPTE needs to happen in a different
way than the “after” case described above. It needs to use the
TDH.MEM.PAGE.ADD SEAMCALL, which does both the copying in of the data and
the setting of the SPTE.

Without extensive modification to the fault path, it’s not possible to
utilize this callback from the set_external_spte() handler because the
source page for the data to be copied in is not known deep down in this
callchain. So instead the post-populate callback does a three-step
process.

1. Pre-fault the memory into the mirror EPT, but have the
set_external_spte() not make any SEAMCALLs.

2. Check that the page is still faulted into the mirror EPT under read
mmu_lock that is held over this and the following step.

3. Call TDH.MEM.PAGE.ADD with the HPA of the page to copy data from, and
the private page installed in the mirror EPT to use for the private
mapping.

The scheme involves some assumptions about the operations that might
operate on the mirrored EPT before the VM is finalized. It assumes that no
other memory will be faulted into the mirror EPT that is not also added
via TDH.MEM.PAGE.ADD. If this is violated, the KVM MMU may not see private
memory faulted in there later and so not make the proper external SPTE
callbacks. To check this, KVM enforces that the number of
pre-faulted pages is the same as the number of pages added via
KVM_TDX_INIT_MEM_REGION.
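The three-step pre-finalization flow and its counting check can be modelled as below. Everything here is hypothetical: a flat array stands in for the mirror EPT, mmu_lock is not modelled, and a counter stands in for the SEAMCALL that copies the source data and installs the private SPTE.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_GFNS 8

/* Hypothetical model of TD build-time population; not KVM's structures. */
struct td_build {
	bool finalized;
	uint64_t mirror[SKETCH_NR_GFNS];	/* pfn, or 0 if not faulted in */
	bool added[SKETCH_NR_GFNS];
	unsigned long nr_prefaulted;
	unsigned long nr_added;
};

/* Step 1: fault the page into the mirror. Before finalization the
 * external hook makes no SEAMCALL, so only the mirror and a count change. */
static int prefault_gfn(struct td_build *td, uint64_t gfn, uint64_t pfn)
{
	if (td->finalized || gfn >= SKETCH_NR_GFNS || !pfn || td->mirror[gfn])
		return -1;
	td->mirror[gfn] = pfn;
	td->nr_prefaulted++;
	return 0;
}

/* Steps 2 and 3: confirm the page is still in the mirror (done under
 * read mmu_lock in the real flow), then "add" it exactly once. */
static int init_mem_region_gfn(struct td_build *td, uint64_t gfn)
{
	if (td->finalized || gfn >= SKETCH_NR_GFNS || !td->mirror[gfn] ||
	    td->added[gfn])
		return -1;
	td->added[gfn] = true;
	td->nr_added++;
	return 0;
}

/* Finalization is refused unless every pre-faulted page was also added. */
static int finalize_td(struct td_build *td)
{
	if (td->nr_prefaulted != td->nr_added)
		return -1;
	td->finalized = true;
	return 0;
}
```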

TDX TLB flushing
----------------
For TDX, TLB flushing needs to happen in different ways depending on
whether private and/or shared EPT needs to be flushed. Shared EPT can be
flushed like normal EPT with INVEPT. To avoid reading the TD's EPTP out of the
TDX module, this series flushes shared EPT with type 2 INVEPT. Private TLB
entries can be flushed this way too (via type 2). However, since the TDX
module needs to enforce some guarantees around which private memory is
mapped in the TD, it requires these operations to be done in special ways
for private memory.

For flushing private memory, two methods are possible. The simple one
is the TDH.VP.FLUSH SEAMCALL; this flush is of the INVEPT type 1 variety
(i.e. mappings associated with the TD).

The second method is part of a sequence of SEAMCALLs for removing a guest
page. The sequence looks like:

1. TDH.MEM.RANGE.BLOCK - Remove RWX bits from entry (similar to KVM’s zap).

2. TDH.MEM.TRACK - Increment the TD TLB epoch, which is a per-TD counter

3. Kick off all vCPUs - In order to force them to have to re-enter.

4. TDH.MEM.PAGE.REMOVE - Actually remove the page and make it available for
other use.

5. TDH.VP.ENTER - On re-entering TDX module will see the epoch is
incremented and flush the TLB.
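The epoch-based sequence can be illustrated with a small model. This is a hypothetical sketch of the idea only (bump a per-TD counter, force vCPUs out, flush on re-entry when the counter moved); the struct and function names are invented and this is not the TDX module's actual bookkeeping.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_VCPUS 2

/* Hypothetical model of per-TD TLB epoch tracking. */
struct td_tlb_model {
	uint64_t epoch;				/* bumped by TDH.MEM.TRACK */
	uint64_t vcpu_epoch[SKETCH_NR_VCPUS];	/* epoch seen at last entry */
	bool in_guest[SKETCH_NR_VCPUS];
};

/* Step 2: TDH.MEM.TRACK increments the TD TLB epoch. */
static void mem_track(struct td_tlb_model *td)
{
	td->epoch++;
}

/* Step 3: kick all vCPUs out of the guest. */
static void kick_all_vcpus(struct td_tlb_model *td)
{
	for (int i = 0; i < SKETCH_NR_VCPUS; i++)
		td->in_guest[i] = false;
}

/* Step 4 precondition: the epoch has advanced since the page was blocked
 * (step 1), and no vCPU is still running on an epoch from before that. */
static bool page_remove_allowed(const struct td_tlb_model *td, uint64_t blocked_epoch)
{
	if (td->epoch == blocked_epoch)
		return false;
	for (int i = 0; i < SKETCH_NR_VCPUS; i++)
		if (td->in_guest[i] && td->vcpu_epoch[i] <= blocked_epoch)
			return false;
	return true;
}

/* Step 5: entry flushes the TLB if the epoch moved. Returns true if so. */
static bool vp_enter(struct td_tlb_model *td, int vcpu)
{
	bool flush = td->vcpu_epoch[vcpu] != td->epoch;

	td->vcpu_epoch[vcpu] = td->epoch;
	td->in_guest[vcpu] = true;
	return flush;
}
```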

On top of this, during TDX module init TDH.SYS.LP.INIT (which is used
to online a CPU for TDX usage) invokes INVEPT to flush all mappings in
the TLB.

During runtime, for normal (TDP MMU, non-nested) guests, KVM will do TLB
flushes in 4 scenarios:

(1) kvm_mmu_load()

After EPT is loaded, call kvm_x86_flush_tlb_current() to invalidate
TLBs for current vCPU loaded EPT on current pCPU.

(2) Loading vCPU to a new pCPU

Send request KVM_REQ_TLB_FLUSH to current vCPU, the request handler
will call kvm_x86_flush_tlb_all() to flush all EPTs associated with the
new pCPU.

(3) When EPT mapping has changed (after removing or permission reduction)
(e.g. in kvm_flush_remote_tlbs())

Send request KVM_REQ_TLB_FLUSH to all vCPUs by kicking them all off,
the request handler on each vCPU will call kvm_x86_flush_tlb_all() to
invalidate TLBs for all EPTs associated with the pCPU.

(4) When an EPT change only affects the current vCPU, e.g. virtual APIC
mode changed.

Send request KVM_REQ_TLB_FLUSH_CURRENT, the request handler will call
kvm_x86_flush_tlb_current() to invalidate TLBs for current vCPU loaded
EPT on current pCPU.

Only the first 3 are relevant to TDX. They are implemented as follows.

(1) kvm_mmu_load()

Only the shared EPT root is loaded in this path. The TDX module does
not require any assurances about the operation, so the
flush_tlb_current()->ept_sync_global() can be called as normal.

(2) vCPU load

When a vCPU migrates to a new logical processor, it has to be flushed
on the *old* pCPU, unlike normal VMs where the INVEPT is executed on
the new pCPU to remove stale mappings from previous usage of the same
EPTP on the new pCPU. The TDX behavior comes from a requirement
that a vCPU can only be associated with one pCPU at a time. This
flush happens via an IPI that invokes TDH.VP.FLUSH SEAMCALL, during
the vcpu_load callback.
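The vCPU-load rule can be sketched as follows. The struct, the counter array and the function names are hypothetical; the counter stands in for the IPI that runs TDH.VP.FLUSH, and the sketch shows only that the flush targets the old pCPU before the vCPU is associated with the new one.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 4

/* Hypothetical model: a TD vCPU is associated with at most one pCPU. */
struct td_vcpu_model {
	int cpu;	/* associated pCPU, or -1 if none */
};

/* Flushes per pCPU; stands in for the IPI that runs TDH.VP.FLUSH. */
static unsigned int vp_flushes[SKETCH_NR_CPUS];

static void flush_on_old_cpu(struct td_vcpu_model *v)
{
	vp_flushes[v->cpu]++;	/* the flush runs on the old pCPU */
	v->cpu = -1;		/* association dropped */
}

static void td_vcpu_load(struct td_vcpu_model *v, int new_cpu)
{
	assert(new_cpu >= 0 && new_cpu < SKETCH_NR_CPUS);
	if (v->cpu == new_cpu)
		return;
	if (v->cpu >= 0)
		flush_on_old_cpu(v);	/* old pCPU, unlike a normal VM */
	v->cpu = new_cpu;
}
```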

(3) Removing a private SPTE

This is the more complicated flow. It is done in a simple way for now
and is especially inefficient during VM teardown. The plan is to get a
basic functional version working and optimize some of these flows
later.

When a private page mapping is removed, the core MMU code calls the
new remove_external_spte() op, and flushes the TLB on all vCPUs. But
TDX can’t rely on doing that for private memory, so it has its own
process for making sure the private page is removed. This flow
(TDH.MEM.RANGE.BLOCK, TDH.MEM.TRACK, TDH.MEM.PAGE.REMOVE) is done
within the remove_external_spte() implementation as described in the
“TDX TLB flushing” section above.

After that, back in the core MMU code, KVM will call
kvm_flush_remote_tlbs*() resulting in an INVEPT. Despite that, when
the vCPUs re-enter (TDH.VP.ENTER) the TD, the TDX module will do
another INVEPT for its own reassurance.

Private memory teardown
-----------------------
Tearing down private memory involves reclaiming three types of resources
from the TDX module:

1. TD’s HKID

To reclaim the TD’s HKID, no mappings may be mapped with it.

2. Private guest pages (mapped with HKID)
3. Private page tables that map private pages (mapped with HKID)

From the TDX module’s perspective, to reclaim guest private pages they
need to be prevented from being accessed via the HKID (unmapped and TLB
flushed), their HKID-associated cachelines need to be flushed, and
they need to be marked as no longer used by the TD in the TDX module’s
internal tracking (PAMT).

During runtime, private PTEs can be zapped as part of memslot deletion or
when memory converts from shared to private, but private page tables and
HKIDs are not torn down until the TD is being destructed. This means the
operation to zap private guest mapped pages needs to do the required cache
writeback under the assumption that other vCPUs may be active, but the
page table teardown does not.

TD teardown resource reclamation
--------------------------------
The code that does the TD teardown is organized such that when an HKID is
reclaimed:
1. vCPUs will no longer enter the TD
2. The TLB is flushed on all CPUs
3. The HKID associated cachelines have been flushed.

So at that point most of the steps needed to reclaim TD private pages and
page tables have already been done and the reclaim operation only needs to
update the TDX module’s tracking of page ownership. For simplicity each
operation only supports one scenario: before or after HKID reclaim. Since
zapping and reclaiming private pages has to function during runtime for
memslot deletion and converting from shared to private, the TD teardown is
arranged so this happens before HKID reclaim. Since private page tables
are never torn down during TD runtime, they can happen in a simpler and
more efficient way after HKID reclaim. The private page reclaim is
initiated from the kvm fd release. The callchain looks like this:

do_exit
|->exit_mm --> tdx_mmu_release_hkid() was called here previously in v19
|->exit_files
|->1.release vcpu fd
|->2.kvm_gmem_release
| |->kvm_gmem_invalidate_begin --> unmap all leaf entries, causing
| zapping of private guest pages
|->3.release kvmfd
|->kvm_destroy_vm
|->kvm_arch_pre_destroy_vm
| | kvm_x86_call(vm_pre_destroy)(kvm) -->tdx_mmu_release_hkid()
|->kvm_arch_destroy_vm
|->kvm_unload_vcpu_mmus
| kvm_destroy_vcpus(kvm)
| |->kvm_arch_vcpu_destroy
| |->kvm_x86_call(vcpu_free)(vcpu)
| | kvm_mmu_destroy(vcpu) -->unref mirror root
| kvm_mmu_uninit_vm(kvm) --> mirror root ref is 1 here,
| zap private page tables
| static_call_cond(kvm_x86_vm_destroy)(kvm);
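The "one scenario per operation" rule behind this ordering can be sketched as a small model. The struct and function names are hypothetical; the model only enforces that private page reclaim happens before HKID reclaim and private page-table reclaim after it, matching the callchain above.

```c
#include <stdbool.h>

/* Hypothetical model of the teardown ordering described above. */
struct td_teardown_model {
	bool hkid_released;
	unsigned long private_pages;	/* guest pages still mapped */
	unsigned long private_pts;	/* private page-table pages */
};

/* Zapping a private page supports only the "before HKID reclaim" case,
 * so it must do the HKID cache writeback itself. */
static int reclaim_private_page(struct td_teardown_model *t)
{
	if (t->hkid_released || !t->private_pages)
		return -1;
	t->private_pages--;
	return 0;
}

/* The vm_pre_destroy step in the callchain above. */
static void release_hkid(struct td_teardown_model *t)
{
	t->hkid_released = true;
}

/* Freeing private page tables supports only the "after HKID reclaim"
 * case, where only the ownership tracking needs updating. */
static int reclaim_private_pt(struct td_teardown_model *t)
{
	if (!t->hkid_released || !t->private_pts)
		return -1;
	t->private_pts--;
	return 0;
}
```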


Revision tags: v6.14-rc5, v6.14-rc4
# 7e548b0d 22-Feb-2025 Sean Christopherson <sean.j.christopherson@intel.com>

KVM: VMX: Add a helper for NMI handling

Add a helper to handle NMI exits.

TDX handles the NMI exit the same as the VMX case. Add a helper to share the
code with TDX, and expose the helper in common.h.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20250222014757.897978-15-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
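The idea of one NMI helper shared by the VMX and TDX exit paths can be sketched as follows. The interruption-information layout (vector in bits 7:0, type in bits 10:8 with 2 meaning NMI, valid in bit 31) follows the VMX VM-exit interruption-information format. The vcpu_sketch type, the counter, and the helper and wrapper names are hypothetical and do not correspond to the actual helper in common.h. Real code would invoke the host NMI handler instead of counting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VM-exit interruption-information layout: vector [7:0], type [10:8], valid [31]. */
#define INTR_INFO_VECTOR_MASK	0xffu
#define INTR_INFO_TYPE_MASK	(7u << 8)
#define INTR_INFO_VALID		(1u << 31)
#define INTR_TYPE_NMI		(2u << 8)
#define NMI_VECTOR		2u

static bool is_nmi(uint32_t intr_info)
{
	uint32_t mask = INTR_INFO_VALID | INTR_INFO_TYPE_MASK | INTR_INFO_VECTOR_MASK;

	return (intr_info & mask) == (INTR_INFO_VALID | INTR_TYPE_NMI | NMI_VECTOR);
}

/* Hypothetical per-vCPU state; only the counter matters for this sketch. */
struct vcpu_sketch {
	unsigned int nmis_forwarded;
};

/* Shared helper: both exit paths call it once they have the intr info. */
static void vt_handle_nmi(struct vcpu_sketch *vcpu, uint32_t intr_info)
{
	if (!is_nmi(intr_info))
		return;
	/* Real code would invoke the host NMI handler here. */
	vcpu->nmis_forwarded++;
}

/* Hypothetical wrappers: each backend fetches intr info from its own source. */
static void vmx_handle_exit_nmi(struct vcpu_sketch *vcpu, uint32_t vmcs_intr_info)
{
	vt_handle_nmi(vcpu, vmcs_intr_info);
}

static void tdx_handle_exit_nmi(struct vcpu_sketch *vcpu, uint32_t td_exit_intr_info)
{
	vt_handle_nmi(vcpu, td_exit_intr_info);
}
```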


# d5bc91e8 22-Feb-2025 Binbin Wu <binbin.wu@linux.intel.com>

KVM: VMX: Move emulation_required to struct vcpu_vt

Move emulation_required from struct vcpu_vmx to struct vcpu_vt so that
vmx_handle_exit_irqoff() can be reused by TDX code.

No functional change intended.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20250222014757.897978-14-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
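Moving emulation_required into the shared struct means an irqoff exit handler that only reads struct vcpu_vt works for both VMX and TDX vCPUs. The sketch below uses hypothetical, trimmed-down layouts. The early return when emulation is required reflects our understanding that no real VM-exit occurred in that case, because KVM skipped VM-Enter to emulate. Basic exit reason 1 is the architectural "external interrupt" exit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down layouts; not the kernel's definitions. */
struct vcpu_vt_sketch {
	bool emulation_required;
	unsigned int exit_reason;
};

struct vcpu_vmx_sketch {
	struct vcpu_vt_sketch vt;  /* shared with TDX */
	unsigned long vmx_only_state;
};

struct vcpu_tdx_sketch {
	struct vcpu_vt_sketch vt;  /* shared with VMX */
	unsigned long tdx_only_state;
};

#define EXIT_REASON_EXTERNAL_INTERRUPT 1u

/* Returns true if the irqoff work (here: an external interrupt) would be done. */
static bool handle_exit_irqoff(struct vcpu_vt_sketch *vt)
{
	/* If KVM skipped VM-Enter to emulate, there is no real exit to handle. */
	if (vt->emulation_required)
		return false;
	return vt->exit_reason == EXIT_REASON_EXTERNAL_INTERRUPT;
}
```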


# 254e5dcd 22-Feb-2025 Isaku Yamahata <isaku.yamahata@intel.com>

KVM: VMX: Move posted interrupt delivery code to common header

Move posted interrupt delivery code to common header so that TDX can
leverage it.

No functional change intended.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
[binbin: split into new patch]
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Message-ID: <20250222014757.897978-4-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
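Posted-interrupt delivery follows a familiar pattern. The sender sets the vector's bit in the 256-bit posted-interrupt request (PIR) bitmap. It then sets the ON (outstanding notification) bit, and it sends a notification only if ON was previously clear. The sketch below uses C11 atomics and a simplified descriptor layout. The struct and function names are hypothetical, and the actual notification IPI is represented only by the return value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified posted-interrupt descriptor (hypothetical layout). */
struct pi_desc_sketch {
	_Atomic uint64_t pir[4];   /* one bit per vector 0..255 */
	_Atomic uint64_t control;  /* bit 0: ON, outstanding notification */
};

#define PI_ON_BIT 0

/* Set the vector's PIR bit; return true if it was already set. */
static bool pi_test_and_set_pir(struct pi_desc_sketch *pi, unsigned int vector)
{
	uint64_t mask = 1ull << (vector % 64);

	assert(vector < 256);
	return atomic_fetch_or(&pi->pir[vector / 64], mask) & mask;
}

/* Set ON; return true if it was already set. */
static bool pi_test_and_set_on(struct pi_desc_sketch *pi)
{
	uint64_t mask = 1ull << PI_ON_BIT;

	return atomic_fetch_or(&pi->control, mask) & mask;
}

/* Returns true if the caller should send the notification IPI. */
static bool deliver_posted_interrupt(struct pi_desc_sketch *pi, unsigned int vector)
{
	if (pi_test_and_set_pir(pi, vector))
		return false;   /* vector already pending */
	if (pi_test_and_set_on(pi))
		return false;   /* a notification is already outstanding */
	return true;
}
```

Checking ON with a test-and-set keeps repeated deliveries from sending redundant IPIs while a notification is still pending.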


# 7172c753 14-Mar-2025 Binbin Wu <binbin.wu@linux.intel.com>

KVM: VMX: Move common fields of struct vcpu_{vmx,tdx} to a struct

Move common fields of struct vcpu_vmx and struct vcpu_tdx to struct
vcpu_vt to share as much code as possible between VMX and TDX, and to make
TDX exit handling more VMX-like.

No functional change intended.

[Adrian: move code that depends on struct vcpu_vmx back to vmx.h]

Suggested-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/Z1suNzg2Or743a7e@google.com
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Message-ID: <20250129095902.16391-5-adrian.hunter@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
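Embedding a shared struct in both vCPU types lets common code operate on the shared part. Backend-specific code can then recover the containing struct with the usual container_of() idiom. The sketch below defines a userspace container_of() based on offsetof. The struct layouts and helper names are hypothetical. The kernel's real to_vt()/to_vmx() helpers are not reproduced here. Exit reason 48 is the architectural EPT-violation exit.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct vcpu_vt_sketch {            /* fields shared by VMX and TDX */
	unsigned int exit_reason;
	unsigned long exit_qualification;
};

struct vcpu_vmx_sketch {
	int vmx_only;
	struct vcpu_vt_sketch vt;
};

struct vcpu_tdx_sketch {
	long tdx_only[4];
	struct vcpu_vt_sketch vt;
};

static struct vcpu_vmx_sketch *vt_to_vmx(struct vcpu_vt_sketch *vt)
{
	return container_of(vt, struct vcpu_vmx_sketch, vt);
}

static struct vcpu_tdx_sketch *vt_to_tdx(struct vcpu_vt_sketch *vt)
{
	return container_of(vt, struct vcpu_tdx_sketch, vt);
}

/* Common exit-handling code only needs the shared part. */
static void vt_record_exit(struct vcpu_vt_sketch *vt, unsigned int reason,
			   unsigned long qual)
{
	vt->exit_reason = reason;
	vt->exit_qualification = qual;
}
```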


Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1, v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12
# 3b725e97 12-Nov-2024 Rick Edgecombe <rick.p.edgecombe@intel.com>

KVM: VMX: Teach EPT violation helper about private mem

Teach the EPT violation helper to check the shared mask of a GPA to
determine whether the GPA is for private memory.

When an EPT violation is triggered after a TD accesses a private GPA, KVM
will exit to user space if the corresponding GFN's attribute is not private.
User space will then update the GFN's attribute during its memory conversion
process. After that, the TD will re-access the private GPA and trigger an EPT
violation again. Only when the GFN's attribute matches private will KVM
fault in the private page, map it in the mirrored TDP root, and propagate the
changes to the private EPT to resolve the EPT violation.

Relying on the GFN's attribute tracking xarray to determine whether a GFN is
private, as is done for KVM_X86_SW_PROTECTED_VM, may lead to endless EPT
violations.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Co-developed-by: Yan Zhao <yan.y.zhao@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Message-ID: <20241112073539.22056-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
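The decision described above can be sketched as follows. Whether a fault is private comes from the GPA's shared bit. For TDX that bit is GPA bit GPAW-1, which is bit 47 or bit 51. A mismatch with the GFN's attribute causes an exit to user space for conversion. The helper names and the enum are hypothetical. Treating a shared access to a private-attribute GFN as an exit is a simplification made by this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gpa_t;

/* TDX places the shared bit at GPA bit (GPAW - 1): bit 47 or bit 51. */
static gpa_t tdx_shared_mask(unsigned int gpaw)
{
	return 1ull << (gpaw - 1);
}

/* Private iff the VM has a shared mask and the GPA's shared bit is clear. */
static bool is_private_gpa(gpa_t gpa, gpa_t shared_mask)
{
	return shared_mask && !(gpa & shared_mask);
}

enum fault_action { MAP_PRIVATE, MAP_SHARED, EXIT_TO_USERSPACE };

/*
 * Privateness comes from the GPA itself, not from the attribute xarray.
 * If the GFN's attribute disagrees, exit so user space can convert it.
 */
static enum fault_action ept_violation_action(gpa_t gpa, gpa_t shared_mask,
					      bool attr_private)
{
	bool is_priv = is_private_gpa(gpa, shared_mask);

	if (is_priv != attr_private)
		return EXIT_TO_USERSPACE;
	return is_priv ? MAP_PRIVATE : MAP_SHARED;
}
```

After user space flips the attribute, the re-access of the same GPA takes the MAP_PRIVATE branch. This is the two-step sequence the commit message describes.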


# c8563d1b 12-Nov-2024 Sean Christopherson <sean.j.christopherson@intel.com>

KVM: VMX: Split out guts of EPT violation to common/exposed function

The difference for TDX EPT violations lies in how the information (the GPA
and the exit qualification) is retrieved. To share the code that handles EPT
violations, split out the guts of the EPT violation handler so that both the
VMX and TDX exit handlers can call it after retrieving the GPA and exit
qualification.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Co-developed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20241112073528.22042-1-yan.y.zhao@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
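The split can be sketched as a common function that takes the GPA and exit qualification as parameters. Thin per-backend wrappers fetch those values from their own sources. The exit-qualification access bits follow the SDM EPT-violation layout: bit 0 read, bit 1 write, bit 2 instruction fetch, and bits 3-5 the entry's read/write/execute permissions. The error-code bits are the architectural #PF bits: present, write, and fetch. The mapping between the two is a simplified version written for this sketch, not KVM's exact translation, and the vmx_exit_sketch and tdx_exit_sketch sources are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* EPT-violation exit-qualification bits (Intel SDM). */
#define EPTV_ACC_READ	(1ull << 0)
#define EPTV_ACC_WRITE	(1ull << 1)
#define EPTV_ACC_INSTR	(1ull << 2)
#define EPTV_RWX_MASK	(7ull << 3)   /* GPA was readable/writable/executable */

/* Architectural #PF error-code bits. */
#define PF_PRESENT	(1u << 0)
#define PF_WRITE	(1u << 1)
#define PF_FETCH	(1u << 4)

struct fault_sketch {
	uint64_t gpa;
	uint32_t error_code;
};

/* Common guts: the caller has already retrieved the GPA and exit qualification. */
static struct fault_sketch handle_ept_violation_common(uint64_t gpa, uint64_t exit_qual)
{
	struct fault_sketch f = { .gpa = gpa, .error_code = 0 };

	if (exit_qual & EPTV_ACC_WRITE)
		f.error_code |= PF_WRITE;
	if (exit_qual & EPTV_ACC_INSTR)
		f.error_code |= PF_FETCH;
	if (exit_qual & EPTV_RWX_MASK)
		f.error_code |= PF_PRESENT;   /* a translation existed */
	return f;
}

/* Hypothetical per-backend exit information sources. */
struct vmx_exit_sketch { uint64_t vmcs_gpa, vmcs_exit_qual; };
struct tdx_exit_sketch { uint64_t td_exit_gpa, td_exit_qual; };

static struct fault_sketch vmx_handle_ept_violation(const struct vmx_exit_sketch *e)
{
	return handle_ept_violation_common(e->vmcs_gpa, e->vmcs_exit_qual);
}

static struct fault_sketch tdx_handle_ept_violation(const struct tdx_exit_sketch *e)
{
	return handle_ept_violation_common(e->td_exit_gpa, e->td_exit_qual);
}
```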
