tdx.h - OpenGrok history log for /linux/arch/x86/kvm/vmx/tdx.h

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
# 43db1111	29-May-2025	Linus Torvalds <torvalds@linux-foundation.org>	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "As far as x86 goes this pull request "only" includes TDX host support. Quotes are appropr Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "As far as x86 goes this pull request "only" includes TDX host support. Quotes are appropriate because (at 6k lines and 100+ commits) it is much bigger than the rest, which will come later this week and consists mostly of bugfixes and selftests. s390 changes will also come in the second batch. ARM: - Add large stage-2 mapping (THP) support for non-protected guests when pKVM is enabled, clawing back some performance. - Enable nested virtualisation support on systems that support it, though it is disabled by default. - Add UBSAN support to the standalone EL2 object used in nVHE/hVHE and protected modes. - Large rework of the way KVM tracks architecture features and links them with the effects of control bits. While this has no functional impact, it ensures correctness of emulation (the data is automatically extracted from the published JSON files), and helps dealing with the evolution of the architecture. - Significant changes to the way pKVM tracks ownership of pages, avoiding page table walks by storing the state in the hypervisor's vmemmap. This in turn enables the THP support described above. - New selftest checking the pKVM ownership transition rules - Fixes for FEAT_MTE_ASYNC being accidentally advertised to guests even if the host didn't have it. - Fixes for the address translation emulation, which happened to be rather buggy in some specific contexts. - Fixes for the PMU emulation in NV contexts, decoupling PMCR_EL0.N from the number of counters exposed to a guest and addressing a number of issues in the process. - Add a new selftest for the SVE host state being corrupted by a guest. - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, ensuring that the window for interrupts is slightly bigger, and avoiding a pretty bad erratum on the AmpereOne HW. - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from a pretty bad case of TLB corruption unless accesses to HCR_EL2 are heavily synchronised. - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables in a human-friendly fashion. - and the usual random cleanups. LoongArch: - Don't flush tlb if the host supports hardware page table walks. - Add KVM selftests support. RISC-V: - Add vector registers to get-reg-list selftest - VCPU reset related improvements - Remove scounteren initialization from VCPU reset - Support VCPU reset from userspace using set_mpstate() ioctl x86: - Initial support for TDX in KVM. This finally makes it possible to use the TDX module to run confidential guests on Intel processors. This is quite a large series, including support for private page tables (managed by the TDX module and mirrored in KVM for efficiency), forwarding some TDVMCALLs to userspace, and handling several special VM exits from the TDX module. This has been in the works for literally years and it's not really possible to describe everything here, so I'll defer to the various merge commits up to and including commit 7bcf7246c42a ('Merge branch 'kvm-tdx-finish-initial' into HEAD')" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (248 commits) x86/tdx: mark tdh_vp_enter() as __flatten Documentation: virt/kvm: remove unreferenced footnote RISC-V: KVM: lock the correct mp_state during reset KVM: arm64: Fix documentation for vgic_its_iter_next() KVM: arm64: np-guest CMOs with PMD_SIZE fixmap KVM: arm64: Stage-2 huge mappings for np-guests KVM: arm64: Add a range to pkvm_mappings KVM: arm64: Convert pkvm_mappings to interval tree KVM: arm64: Add a range to __pkvm_host_test_clear_young_guest() KVM: arm64: Add a range to __pkvm_host_wrprotect_guest() KVM: arm64: Add a range to __pkvm_host_unshare_guest() KVM: arm64: Add a range to __pkvm_host_share_guest() KVM: arm64: Introduce for_each_hyp_page KVM: arm64: Handle huge mappings for np-guest CMOs KVM: arm64: nv: Release faulted-in VNCR page from mmu_lock critical section KVM: arm64: nv: Handle TLBI S1E2 for VNCR invalidation with mmu_lock held KVM: arm64: nv: Hold mmu_lock when invalidating VNCR SW-TLB before translating RISC-V: KVM: add KVM_CAP_RISCV_MP_STATE_RESET RISC-V: KVM: Remove scounteren initialization KVM: RISC-V: remove unnecessary SBI reset state ... show more ...
Revision tags: v6.15, v6.15-rc7, v6.15-rc6, v6.15-rc5, v6.15-rc4, v6.15-rc3, v6.15-rc2, v6.15-rc1, v6.14
# fd02aa45	19-Mar-2025	Paolo Bonzini <pbonzini@redhat.com>	Merge branch 'kvm-tdx-initial' into HEAD This large commit contains the initial support for TDX in KVM. All x86 parts enable the host-side hypercalls that KVM uses to talk to the TDX module, a soft Merge branch 'kvm-tdx-initial' into HEAD This large commit contains the initial support for TDX in KVM. All x86 parts enable the host-side hypercalls that KVM uses to talk to the TDX module, a software component that runs in a special CPU mode called SEAM (Secure Arbitration Mode). The series is in turn split into multiple sub-series, each with a separate merge commit: - Initialization: basic setup for using the TDX module from KVM, plus ioctls to create TDX VMs and vCPUs. - MMU: in TDX, private and shared halves of the address space are mapped by different EPT roots, and the private half is managed by the TDX module. Using the support that was added to the generic MMU code in 6.14, add support for TDX's secure page tables to the Intel side of KVM. Generic KVM code takes care of maintaining a mirror of the secure page tables so that they can be queried efficiently, and ensuring that changes are applied to both the mirror and the secure EPT. - vCPU enter/exit: implement the callbacks that handle the entry of a TDX vCPU (via the SEAMCALL TDH.VP.ENTER) and the corresponding save/restore of host state. - Userspace exits: introduce support for guest TDVMCALLs that KVM forwards to userspace. These correspond to the usual KVM_EXIT_* "heavyweight vmexits" but are triggered through a different mechanism, similar to VMGEXIT for SEV-ES and SEV-SNP. - Interrupt handling: support for virtual interrupt injection as well as handling VM-Exits that are caused by vectored events. Exclusive to TDX are machine-check SMIs, which the kernel already knows how to handle through the kernel machine check handler (commit 7911f145de5f, "x86/mce: Implement recovery for errors in TDX/SEAM non-root mode") - Loose ends: handling of the remaining exits from the TDX module, including EPT violation/misconfig and several TDVMCALL leaves that are handled in the kernel (CPUID, HLT, RDMSR/WRMSR, GetTdVmCallInfo); plus returning an error or ignoring operations that are not supported by TDX guests Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
Revision tags: v6.14-rc7
# 7bcf7246	12-Mar-2025	Paolo Bonzini <pbonzini@redhat.com>	Merge branch 'kvm-tdx-finish-initial' into HEAD This patch ties the remaining loose ends and finally enables TDX guests to run inside KVM. It implements handling of EPT violation/misconfig and of s Merge branch 'kvm-tdx-finish-initial' into HEAD This patch ties the remaining loose ends and finally enables TDX guests to run inside KVM. It implements handling of EPT violation/misconfig and of several TDVMCALL leaves that are handled in the kernel (CPUID, HLT, RDMSR/WRMSR, GetTdVmCallInfo); it also adds a bunch of wrappers in vmx/main.c to ignore operations not supported by TDX guests() Finally, it introduces documentation for the new APIs that have been added along the way. () access to CPU state, VMX preemption timer, accesses to TSC offset or multiplier, LMCE enable/disable, hypercall patching. show more ...
# 9913212b	12-Mar-2025	Paolo Bonzini <pbonzini@redhat.com>	Merge branch 'kvm-tdx-interrupts' into HEAD Introduces support for interrupt handling for TDX guests, including virtual interrupt injection and VM-Exits caused by vectored events. Injection ======= Merge branch 'kvm-tdx-interrupts' into HEAD Introduces support for interrupt handling for TDX guests, including virtual interrupt injection and VM-Exits caused by vectored events. Injection ========= TDX supports non-NMI interrupt injection only by posted interrupt. Posted interrupt descriptors (PIDs) are allocated in shared memory, KVM can update them directly. To post pending interrupts in the PID, KVM can generate a self-IPI with notification vector prior to TD entry. TDX guest status is protected, KVM can't get the interrupt status of TDX guest. For now, assume the interrupt is always allowed. A later patch set will let TDX guests to call TDVMCALL with HLT, which passes the interrupt block flag, so that whether interrupt is allowed in HLT will checked against the interrupt block flag. For NMIs, KVM can request the TDX module to inject a NMI into a TDX vCPU by setting the PEND_NMI TDVPS field to 1. Following that, KVM can call TDH.VP.ENTER to run the vCPU and the TDX module will attempt to inject the NMI as soon as possible. PEND_NMI TDVPS field is a 1-bit filed, i.e. KVM can only pend one NMI in the TDX module. Also, TDX doesn't allow KVM to request NMI-window exit directly. When there is already one NMI pending in the TDX module, i.e. it has not been delivered to TDX guest yet, if there is NMI pending in KVM, collapse the pending NMI in KVM into the one pending in the TDX module. Such collapse is OK considering on X86 bare metal, multiple NMIs could collapse into one NMI, e.g. when NMI is blocked by SMI. It's OS's responsibility to poll all NMI sources in the NMI handler to avoid missing handling of some NMI events. More details can be found in the changelog of the patch "KVM: TDX: Implement methods to inject NMI". TDX doesn't support system-management mode (SMM) and system-management interrupt (SMI) in guest TDs because TDX module doesn't provide a way for VMM to inject SMI into guest TD or switch guest vCPU mode into SMM. SMI requests return -ENOTTY similar to CONFIG_KVM_SMM=n. Likewise, INIT and SIPI events are not used and are blocked for TDX guests; TDX defines its own vCPU creation and initialization sequence, which is done on the host via SEAMCALLs at TD build time. VM-exit for external events =========================== Similar to the VMX case, external interrupts are with interrupts off: in the .handle_exit_irqoff() callback for external interrupts and in the noinstr region for NMIs. Just like VMX, NMI remains blocked after exiting from TDX guest for NMI-induced exits. Machine check, which is handled in the .handle_exit_irqoff() callback, is the only exception type KVM handles for TDX guests. For other exceptions, because TDX guest state is protected, exceptions in TDX guests can't be intercepted. TDX VMM isn't supposed to handle these exceptions. Exit to userspace with KVM_EXIT_EXCEPTION If unexpected exception occurs. Host SMIs also cause an exit to KVM. This is needed because in SEAM root mode (TDX module) all interrupts are blocked. An SMI can be "I/O SMI" or "other SMI". For TDX, there will be no I/O SMI because I/O instructions inside TDX guest trigger #VE and TDX guest needs to use TDVMCALL to request VMM to do I/O emulation. The only case of interest for "other SMI" is an #MC occurring in the guest when MCE-SMI morphing is enabled in the host firmware. Such "MSMI" is marked by having bit 0 set in the exit qualification; MSMI exits are fatal for the TD and are eventually handled by the kernel machine check handler (7911f14 x86/mce: Implement recovery for errors in TDX/SEAM non-root mode), which marks the page as poisoned. It is not possible right now to pass machine check exceptions to the guest. SMIs other than machine check SMIs are handled just by leaving SEAM root mode and KVM doesn't need to do anything. show more ...
# 4d2dc9a2	12-Mar-2025	Paolo Bonzini <pbonzini@redhat.com>	Merge branch 'kvm-tdx-userspace-exit' into HEAD Introduces support for VM exits that are forwarded the host VMM in userspace. These are initiated from the TDCALL exit code; although these userspace Merge branch 'kvm-tdx-userspace-exit' into HEAD Introduces support for VM exits that are forwarded the host VMM in userspace. These are initiated from the TDCALL exit code; although these userspace exits have the same TDX exit code, they result in several different types of exits to userspace. When a guest TD issues a TDVMCALL, it exits to VMM with a new exit reason. The arguments from the guest TD and return values from the VMM are passed through the guest registers. The ABI details for the guest TD hypercalls are specified in the TDX GHCI specification. There are two types of hypercalls defined in the GHCI specification: - Standard TDVMCALLs: When input of R10 from guest TD is set to 0, it indicates that the TDVMCALL sub-function used in R11 is defined in GHCI specification. - Vendor-Specific TDVMCALLs: When input of R10 from guest TD is non-zero, it indicates a vendor-specific TDVMCALL. KVM hypercalls from the guest follow this interface, using R10 as KVM hypercall number and R11-R14 as 4 arguments. The error code returned in R10. This series includes basic standard TDVMCALLs that map to existing eixt reasons: - TDG.VP.VMCALL<MapGPA> reuses exit reason KVM_EXIT_HYPERCALL with the hypercall number KVM_HC_MAP_GPA_RANGE. - TDG.VP.VMCALL<ReportFatalError> reuses exit reason KVM_EXIT_SYSTEM_EVENT with a new event type KVM_SYSTEM_EVENT_TDX_FATAL. - TDG.VP.VMCALL<Instruction.IO> reuses exit reason KVM_EXIT_IO. - TDG.VP.VMCALL<#VE.RequestMMIO> reuses exit reason KVM_EXIT_MMIO. Notably, handling for TDG.VP.VMCALL<SetupEventNotifyInterrupt> and TDG.VP.VMCALL<GetQuote> is not included yet. show more ...
# 77ab80c6	12-Mar-2025	Paolo Bonzini <pbonzini@redhat.com>	Merge branch 'kvm-tdx-enter-exit' into HEAD This series introduces callbacks to facilitate the entry of a TD VCPU and the corresponding save/restore of host state. A TD VCPU is entered via the SEAM Merge branch 'kvm-tdx-enter-exit' into HEAD This series introduces callbacks to facilitate the entry of a TD VCPU and the corresponding save/restore of host state. A TD VCPU is entered via the SEAMCALL TDH.VP.ENTER. The TDX Module manages the save/restore of guest state and, in conjunction with the SEAMCALL interface, handles certain aspects of host state. However, there are specific elements of the host state that require additional attention, as detailed in the Intel TDX ABI documentation for TDH.VP.ENTER. TDX is quite different from VMX in this regard. For VMX, the host VMM is heavily involved in restoring, managing and saving guest CPU state, whereas for TDX this is handled by the TDX Module. In that way, the TDX Module can protect the confidentiality and integrity of TD CPU state. The TDX Module does not save/restore all host CPU state because the host VMM can do it more efficiently and selectively. CPU state referred to below is host CPU state. Often values are already held in memory so no explicit save is needed, and restoration may not be needed if the kernel is not using a feature. TDX does not support PAUSE-loop exiting. According to the TDX Module Base arch. spec., hypercalls are expected to be used instead. Note that the Linux TDX guest supports existing hypercalls via TDG.VP.VMCALL. This series requires TDX module 1.5.06.00.0744, or later, due to removal of the workarounds for the lack of the NO_RBP_MOD feature required by the kernel. NO_RBP_MOD is now required. show more ...
Revision tags: v6.14-rc6
# fcbe3482	06-Mar-2025	Paolo Bonzini <pbonzini@redhat.com>	Merge branch 'kvm-tdx-mmu' into HEAD This series picks up from commit 86eb1aef7279 ("Merge branch 'kvm-mirror-page-tables' into HEAD", 2025-01-20), which focused on changes to the generic x86 parts Merge branch 'kvm-tdx-mmu' into HEAD This series picks up from commit 86eb1aef7279 ("Merge branch 'kvm-mirror-page-tables' into HEAD", 2025-01-20), which focused on changes to the generic x86 parts of the KVM MMU code, and adds support for TDX's secure page tables to the Intel side of KVM. Confidential computing solutions have concepts of private and shared memory. Often the guest accesses either private or shared memory via a bit in the guest PTE. Solutions like SEV treat this bit more like a permission bit, where solutions like TDX and ARM CCA treat it more like a GPA bit. In the latter case, the host maps private memory in one half of the address space and shared in another. For TDX these two halves are mapped by different EPT roots. The private half (also called Secure EPT in Intel documentation) gets managed by the privileged TDX Module. The shared half is managed by the untrusted part of the VMM (KVM). In addition to the separate roots for private and shared, there are limitations on what operations can be done on the private side. Like SNP, TDX wants to protect against protected memory being reset or otherwise scrambled by the host. In order to prevent this, the guest has to take specific action to “accept” memory after changes are made by the VMM to the private EPT. This prevents the VMM from performing many of the usual memory management operations that involve zapping and refaulting memory. The private memory also is always RWX and cannot have VMM specified cache attribute attributes applied. TDX memory implementation ========================= Creating shared EPT ------------------- Shared EPT handling is relatively simple compared to private memory. It is managed from within KVM. The main differences between shared EPT and EPT in a normal VM are that the root is set with a TDVMCS field (via SEAMCALL), and that the GFN specified in the memslot perspective needs to be mapped at an offset in the EPT. For the former, this series plumbs in the load_mmu_pgd() operation to the correct field for the shared EPT. For the latter, previous patches have laid the groundwork for mapping so called “direct roots” roots at an offset specified in kvm->arch.gfn_direct_bits. Creating private EPT -------------------- In previous patches, the concept of “mirrored roots” were introduced. Such roots maintain a KVM side “mirror” of the “external” EPT by keeping an unmapped EPT tree within the KVM MMU code. When changing these mirror EPTs, the KVM MMU code calls out via x86_ops to update the external EPT. This series adds implementations for these “external” ops for TDX to create and manage “private” memory via TDX module APIs. Managing S-EPT with the TDX Module ---------------------------------- The TDX module allows the TD’s private memory to be managed via SEAMCALLs. This management consists of operating on two internal elements: 1. The private EPT, which the TDX module calls the S-EPT. It maps the actual mapped, private half of the GPA space using an EPT tree. 2. The HKID, which represents private encryption keys used for encrypting TD memory. The CPU doesn’t guarantee cache coherency between these encryption keys, so memory that is encrypted with one of these keys needs to be reclaimed for use on the host in special ways. This series will primarily focus on the SEAMCALLs for managing the private EPT. Consideration of the HKID is needed for when the TD is torn down. Populating TDX Private memory ----------------------------- TDX allows the EPT mapping the TD's private memory to be modified in limited ways. There are SEAMCALLs for building and tearing down the EPT tree, as well as mapping pages into the private EPT. As for building and tearing down the EPT page tables, it is relatively simple. There are SEAMCALLs for installing and removing them. However, the current implementation only supports adding private EPT page tables, and leaves them installed for the lifetime of the TD. For teardown, the details are discussed in a later section. As for populating and zapping private SPTE, there are SEAMCALLs for this as well. The zapping case will be described in detail later. As for the populating case, there are two categories: before TD is finalized and after TD is finalized. Both of these scenarios go through the TDP MMU map path. The changes done previously to introduce “mirror” and “external” page tables handle directing SPTE installation operations through the set_external_spte() op. In the “after” case, the TDX set_external_spte() handler simply calls a SEAMCALL (TDX.MEM.PAGE.AUG). For the before case, it is a bit more complicated as it requires both setting the private SPTE and copying in the initial contents of the page at the same time. For TDX this is done via the KVM_TDX_INIT_MEM_REGION ioctl, which is effectively the kvm_gmem_populate() operation. For SNP, the private memory can be pre-populated first, and faulted in later like normal. But for TDX these need to both happen both at the same time and the setting of the private SPTE needs to happen in a different way than the “after” case described above. It needs to use the TDH.MEM.SEPT.ADD SEAMCALL which does both the copying in of the data and setting the SPTE. Without extensive modification to the fault path, it’s not possible utilize this callback from the set_external_spte() handler because it the source page for the data to be copied in is not known deep down in this callchain. So instead the post-populate callback does a three step process. 1. Pre-fault the memory into the mirror EPT, but have the set_external_spte() not make any SEAMCALLs. 2. Check that the page is still faulted into the mirror EPT under read mmu_lock that is held over this and the following step. 3. Call TDH.MEM.SEPT.ADD with the HPA of the page to copy data from, and the private page installed in the mirror EPT to use for the private mapping. The scheme involves some assumptions about the operations that might operate on the mirrored EPT before the VM is finalized. It assumes that no other memory will be faulted into the mirror EPT, that is not also added via TDH.MEM.SEPT.ADD). If this is violated the KVM MMU may not see private memory faulted in there later and so not make the proper external spte callbacks. To check this, KVM enforces that the number of pre-faulted pages is the same as the number of pages added via KVM_TDX_INIT_MEM_REGION. TDX TLB flushing ---------------- For TDX, TLB flushing needs to happen in different ways depending on whether private and/or shared EPT needs to be flushed. Shared EPT can be flushed like normal EPT with INVEPT. To avoid reading TD's EPTP out from TDX module, this series flushes shared EPT with type 2 INVEPT. Private TLB entries can be flushed this way too (via type 2). However, since the TDX module needs to enforce some guarantees around which private memory is mapped in the TD, it requires these operations to be done in special ways for private memory. For flushing private memory, two methods are possible. The simple one is the TDH.VP.FLUSH SEAMCALL; this flush is of the INVEPT type 1 variety (i.e. mappings associated with the TD). The second method is part of a sequence of SEAMCALLs for removing a guest page. The sequence looks like: 1. TDH.MEM.RANGE.BLOCK - Remove RWX bits from entry (similar to KVM’s zap). 2. TDH.MEM.TRACK - Increment the TD TLB epoch, which is a per-TD counter 3. Kick off all vCPUs - In order to force them to have to re-enter. 4. TDH.MEM.PAGE.REMOVE - Actually remove the page and make it available for other use. 5. TDH.VP.ENTER - On re-entering TDX module will see the epoch is incremented and flush the TLB. On top of this, during TDX module init TDH.SYS.LP.INIT (which is used to online a CPU for TDX usage) invokes INVEPT to flush all mappings in the TLB. During runtime, for normal (TDP MMU, non-nested) guests, KVM will do a TLB flushes in 4 scenarios: (1) kvm_mmu_load() After EPT is loaded, call kvm_x86_flush_tlb_current() to invalidate TLBs for current vCPU loaded EPT on current pCPU. (2) Loading vCPU to a new pCPU Send request KVM_REQ_TLB_FLUSH to current vCPU, the request handler will call kvm_x86_flush_tlb_all() to flush all EPTs assocated with the new pCPU. (3) When EPT mapping has changed (after removing or permission reduction) (e.g. in kvm_flush_remote_tlbs()) Send request KVM_REQ_TLB_FLUSH to all vCPUs by kicking all them off, the request handler on each vCPU will call kvm_x86_flush_tlb_all() to invalidate TLBs for all EPTs associated with the pCPU. (4) When EPT changes only affects current vCPU, e.g. virtual apic mode changed. Send request KVM_REQ_TLB_FLUSH_CURRENT, the request handler will call kvm_x86_flush_tlb_current() to invalidate TLBs for current vCPU loaded EPT on current pCPU. Only the first 3 are relevant to TDX. They are implemented as follows. (1) kvm_mmu_load() Only the shared EPT root is loaded in this path. The TDX module does not require any assurances about the operation, so the flush_tlb_current()->ept_sync_global() can be called as normal. (2) vCPU load When a vCPU migrates to a new logical processor, it has to be flushed on the old pCPU, unlike normal VMs where the INVEPT is executed on the new pCPU to remove stale mappings from previous usage of the same EPTP on the new pCPU. The TDX behavior comes from a requirement that a vCPU can only be associated with one pCPU at at time. This flush happens via an IPI that invokes TDH.VP.FLUSH SEAMCALL, during the vcpu_load callback. (3) Removing a private SPTE This is the more complicated flow. It is done in a simple way for now and is especially inefficient during VM teardown. The plan is to get a basic functional version working and optimize some of these flows later. When a private page mapping is removed, the core MMU code calls the newly remove_external_spte() op, and flushes the TLB on all vCPUs. But TDX can’t rely on doing that for private memory, so it has it’s own process for making sure the private page is removed. This flow (TDH.MEM.RANGE.BLOCK, TDH.MEM.TRACK, TDH.MEM.PAGE.REMOVE) is done withing the remove_external_spte() implementation as described in the “TDX TLB flushing” section above. After that, back in the core MMU code, KVM will call kvm_flush_remote_tlbs*() resulting in an INVEPT. Despite that, when the vCPUs re-enter (TDH.VP.ENTER) the TD, the TDX module will do another INVEPT for its own reassurance. Private memory teardown ----------------------- Tearing down private memory involves reclaiming three types of resources from the TDX module: 1. TD’s HKID To reclaim the TD’s HKID, no mappings may be mapped with it. 2. Private guest pages (mapped with HKID) 3. Private page tables that map private pages (mapped with HKID) From the TDX module’s perspective, to reclaim guest private pages they need to be prevented from be accessed via the HKID (unmapped and TLB flushed), their HKID associated cachelines need to be flushed, and they need to be marked as no longer use by the TD in the TDX modules internal tracking (PAMT) During runtime private PTEs can be zapped as part of memslot deletion or when memory coverts from shared to private, but private page tables and HKIDs are not torn down until the TD is being destructed. The means the operation to zap private guest mapped pages needs to do the required cache writeback under the assumption that other vCPU’s may be active, but the PTs do not. TD teardown resource reclamation -------------------------------- The code that does the TD teardown is organized such that when an HKID is reclaimed: 1. vCPUs will no longer enter the TD 2. The TLB is flushed on all CPUs 3. The HKID associated cachelines have been flushed. So at that point most of the steps needed to reclaim TD private pages and page tables have already been done and the reclaim operation only needs to update the TDX module’s tracking of page ownership. For simplicity each operation only supports one scenario: before or after HKID reclaim. Since zapping and reclaiming private pages has to function during runtime for memslot deletion and converting from shared to private, the TD teardown is arranged so this happens before HKID reclaim. Since private page tables are never torn down during TD runtime, they can happen in a simpler and more efficient way after HKID reclaim. The private page reclaim is initiated from the kvm fd release. The callchain looks like this: do_exit \|->exit_mm --> tdx_mmu_release_hkid() was called here previously in v19 \|->exit_files \|->1.release vcpu fd \|->2.kvm_gmem_release \| \|->kvm_gmem_invalidate_begin --> unmap all leaf entries, causing \| zapping of private guest pages \|->3.release kvmfd \|->kvm_destroy_vm \|->kvm_arch_pre_destroy_vm \| \| kvm_x86_call(vm_pre_destroy)(kvm) -->tdx_mmu_release_hkid() \|->kvm_arch_destroy_vm \|->kvm_unload_vcpu_mmus \| kvm_destroy_vcpus(kvm) \| \|->kvm_arch_vcpu_destroy \| \|->kvm_x86_call(vcpu_free)(vcpu) \| \| kvm_mmu_destroy(vcpu) -->unref mirror root \| kvm_mmu_uninit_vm(kvm) --> mirror root ref is 1 here, \| zap private page tables \| static_call_cond(kvm_x86_vm_destroy)(kvm); show more ...
# 0d20742b	06-Mar-2025	Paolo Bonzini <pbonzini@redhat.com>	Merge branch 'kvm-tdx-initialization' into HEAD This series kicks off the actual interaction of KVM with the TDX module. This series encompasses the basic setup for using the TDX module from KVM, an Merge branch 'kvm-tdx-initialization' into HEAD This series kicks off the actual interaction of KVM with the TDX module. This series encompasses the basic setup for using the TDX module from KVM, and the creation of TD VMs and vCPUs. The TDX Module is a software component that runs in a special CPU mode called SEAM (Secure Arbitration Mode). Loading it is mostly handled outside of KVM by the core kernel. Once it’s loaded KVM can interact with the TDX Module via a new instruction called SEAMCALL to virtualize a TD guests. This instruction can be used to make various types of seamcalls, with names organized into a hierarchy. The format is TDH.[AREA].[ACTION], where “TDH” stands for “Trust Domain Host”, and differentiates from another set of calls that can be done by the guest “TDG”. The KVM relevant areas of SEAMCALLs are: SYS – TDX module management, static metadata reading. MNG – TD management. VM scoped things that operate on a TDX module controlled structure called the TDCS. VP – vCPU management. vCPU scoped things that operate on TDX module controlled structures called the TDVPS. PHYMEM - Operations related to physical memory management (page reclaiming, cache operations, etc). This series introduces some TDX specific KVM APIs and stops short of fully “finalizing” the creation of a TD VM. The part of initializing a guest where initial private memory is loaded is left to a separate MMU related series. show more ...
Revision tags: v6.14-rc5
# 081385db	27-Feb-2025	Isaku Yamahata <isaku.yamahata@intel.com>	KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall Morph PV RDMSR/WRMSR hypercall to EXIT_REASON_MSR_{READ,WRITE} and wire up KVM backend functions. For complete_emulated_msr() callback, instead of inje KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall Morph PV RDMSR/WRMSR hypercall to EXIT_REASON_MSR_{READ,WRITE} and wire up KVM backend functions. For complete_emulated_msr() callback, instead of injecting #GP on error, implement tdx_complete_emulated_msr() to set return code on error. Also set return value on MSR read according to the values from kvm x86 registers. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20250227012021.1778144-10-binbin.wu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
# 5cf7239b	27-Feb-2025	Isaku Yamahata <isaku.yamahata@intel.com>	KVM: TDX: Handle TDX PV HLT hypercall Handle TDX PV HLT hypercall and the interrupt status due to it. TDX guest status is protected, KVM can't get the interrupt status of TDX guest and it assumes i KVM: TDX: Handle TDX PV HLT hypercall Handle TDX PV HLT hypercall and the interrupt status due to it. TDX guest status is protected, KVM can't get the interrupt status of TDX guest and it assumes interrupt is always allowed unless TDX guest calls TDVMCALL with HLT, which passes the interrupt blocked flag. If the guest halted with interrupt enabled, also query pending RVI by checking bit 0 of TD_VCPU_STATE_DETAILS_NON_ARCH field via a seamcall. Update vt_interrupt_allowed() for TDX based on interrupt blocked flag passed by HLT TDVMCALL. Do not wakeup TD vCPU if interrupt is blocked for VT-d PI. For NMIs, KVM cannot determine the NMI blocking status for TDX guests, so KVM always assumes NMIs are not blocked. In the unlikely scenario where a guest invokes the PV HLT hypercall within an NMI handler, this could result in a spurious wakeup. The guest should implement the PV HLT hypercall within a loop if it truly requires no interruptions, since NMI could be unblocked by an IRET due to an exception occurring before the PV HLT is executed in the NMI handler. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Message-ID: <20250227012021.1778144-7-binbin.wu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
# 4b2abc49	27-Feb-2025	Yan Zhao <yan.y.zhao@intel.com>	KVM: TDX: Kick off vCPUs when SEAMCALL is busy during TD page removal Kick off all vCPUs and prevent tdh_vp_enter() from executing whenever tdh_mem_range_block()/tdh_mem_track()/tdh_mem_page_remove( KVM: TDX: Kick off vCPUs when SEAMCALL is busy during TD page removal Kick off all vCPUs and prevent tdh_vp_enter() from executing whenever tdh_mem_range_block()/tdh_mem_track()/tdh_mem_page_remove() encounters contention, since the page removal path does not expect error and is less sensitive to the performance penalty caused by kicking off vCPUs. Although KVM has protected SEPT zap-related SEAMCALLs with kvm->mmu_lock, KVM may still encounter TDX_OPERAND_BUSY due to the contention in the TDX module. - tdh_mem_track() may contend with tdh_vp_enter(). - tdh_mem_range_block()/tdh_mem_page_remove() may contend with tdh_vp_enter() and TDCALLs. Resources SHARED users EXCLUSIVE users ------------------------------------------------------------ TDCS epoch tdh_vp_enter tdh_mem_track ------------------------------------------------------------ SEPT tree tdh_mem_page_remove tdh_vp_enter (0-step mitigation) tdh_mem_range_block ------------------------------------------------------------ SEPT entry tdh_mem_range_block (Host lock) tdh_mem_page_remove (Host lock) tdg_mem_page_accept (Guest lock) tdg_mem_page_attr_rd (Guest lock) tdg_mem_page_attr_wr (Guest lock) Use a TDX specific per-VM flag wait_for_sept_zap along with KVM_REQ_OUTSIDE_GUEST_MODE to kick off vCPUs and prevent them from entering TD, thereby avoiding the potential contention. Apply the kick-off and no vCPU entering only after each SEAMCALL busy error to minimize the window of no TD entry, as the contention due to 0-step mitigation or TDCALLs is expected to be rare. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Message-ID: <20250227012021.1778144-5-binbin.wu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
Revision tags: v6.14-rc4
# acc64eb4	22-Feb-2025	Isaku Yamahata <isaku.yamahata@intel.com>	KVM: TDX: Implement methods to inject NMI Inject NMI to TDX guest by setting the PEND_NMI TDVPS field to 1, i.e. make the NMI pending in the TDX module. If there is a further pending NMI in KVM, co KVM: TDX: Implement methods to inject NMI Inject NMI to TDX guest by setting the PEND_NMI TDVPS field to 1, i.e. make the NMI pending in the TDX module. If there is a further pending NMI in KVM, collapse it to the one pending in the TDX module. VMM can request the TDX module to inject a NMI into a TDX vCPU by setting the PEND_NMI TDVPS field to 1. Following that, VMM can call TDH.VP.ENTER to run the vCPU and the TDX module will attempt to inject the NMI as soon as possible. KVM has the following 3 cases to inject two NMIs when handling simultaneous NMIs and they need to be injected in a back-to-back way. Otherwise, OS kernel may fire a warning about the unknown NMI [1]: K1. One NMI is being handled in the guest and one NMI pending in KVM. KVM requests NMI window exit to inject the pending NMI. K2. Two NMIs are pending in KVM. KVM injects the first NMI and requests NMI window exit to inject the second NMI. K3. A previous NMI needs to be rejected and one NMI pending in KVM. KVM first requests force immediate exit followed by a VM entry to complete the NMI rejection. Then, during the force immediate exit, KVM requests NMI window exit to inject the pending NMI. For TDX, PEND_NMI TDVPS field is a 1-bit field, i.e. KVM can only pend one NMI in the TDX module. Also, the vCPU state is protected, KVM doesn't know the NMI blocking states of TDX vCPU, KVM has to assume NMI is always unmasked and allowed. When KVM sees PEND_NMI is 1 after a TD exit, it means the previous NMI needs to be re-injected. Based on KVM's NMI handling flow, there are following 6 cases: In NMI handler TDX module KVM T1. No PEND_NMI=0 1 pending NMI T2. No PEND_NMI=0 2 pending NMIs T3. No PEND_NMI=1 1 pending NMI T4. Yes PEND_NMI=0 1 pending NMI T5. Yes PEND_NMI=0 2 pending NMIs T6. Yes PEND_NMI=1 1 pending NMI K1 is mapped to T4. K2 is mapped to T2 or T5. K3 is mapped to T3 or T6. Note: KVM doesn't know whether NMI is blocked by a NMI or not, case T5 and T6 can happen. When handling pending NMI in KVM for TDX guest, what KVM can do is to add a pending NMI in TDX module when PEND_NMI is 0. T1 and T4 can be handled by this way. However, TDX doesn't allow KVM to request NMI window exit directly, if PEND_NMI is already set and there is still pending NMI in KVM, the only way KVM could try is to request a force immediate exit. But for case T5 and T6, force immediate exit will result in infinite loop because force immediate exit makes it no progress in the NMI handler, so that the pending NMI in the TDX module can never be injected. Considering on X86 bare metal, multiple NMIs could collapse into one NMI, e.g. when NMI is blocked by SMI. It's OS's responsibility to poll all NMI sources in the NMI handler to avoid missing handling of some NMI events. Based on that, for the above 3 cases (K1-K3), only case K1 must inject the second NMI because the guest NMI handler may have already polled some of the NMI sources, which could include the source of the pending NMI, the pending NMI must be injected to avoid the lost of NMI. For case K2 and K3, the guest OS will poll all NMI sources (including the sources caused by the second NMI and further NMI collapsed) when the delivery of the first NMI, KVM doesn't have the necessity to inject the second NMI. To handle the NMI injection properly for TDX, there are two options: - Option 1: Modify the KVM's NMI handling common code, to collapse the second pending NMI for K2 and K3. - Option 2: Do it in TDX specific way. When the previous NMI is still pending in the TDX module, i.e. it has not been delivered to TDX guest yet, collapse the pending NMI in KVM into the previous one. This patch goes with option 2 because it is simple and doesn't impact other VM types. Option 1 may need more discussions. This is the first need to access vCPU scope metadata in the "management" class. Make needed accessors available. [1] https://lore.kernel.org/all/1317409584-23662-5-git-send-email-dzickus@redhat.com/ Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20250222014757.897978-8-binbin.wu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
# 2c304880	22-Feb-2025	Binbin Wu <binbin.wu@linux.intel.com>	KVM: TDX: Handle TDG.VP.VMCALL<MapGPA> Convert TDG.VP.VMCALL<MapGPA> to KVM_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE and forward it to userspace for handling. MapGPA is used by TDX guest to request KVM: TDX: Handle TDG.VP.VMCALL<MapGPA> Convert TDG.VP.VMCALL<MapGPA> to KVM_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE and forward it to userspace for handling. MapGPA is used by TDX guest to request to map a GPA range as private or shared memory. It needs to exit to userspace for handling. KVM has already implemented a similar hypercall KVM_HC_MAP_GPA_RANGE, which will exit to userspace with exit reason KVM_EXIT_HYPERCALL. Do sanity checks, convert TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE and forward the request to userspace. To prevent a TDG.VP.VMCALL<MapGPA> call from taking too long, the MapGPA range is split into 2MB chunks and check interrupt pending between chunks. This allows for timely injection of interrupts and prevents issues with guest lockup detection. TDX guest should retry the operation for the GPA starting at the address specified in R11 when the TDVMCALL return TDVMCALL_RETRY as status code. Note userspace needs to enable KVM_CAP_EXIT_HYPERCALL with KVM_HC_MAP_GPA_RANGE bit set for TD VM. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Message-ID: <20250222014225.897298-7-binbin.wu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
# 095b71a0	06-Mar-2025	Isaku Yamahata <isaku.yamahata@intel.com>	KVM: TDX: Add a place holder to handle TDX VM exit Introduce the wiring for handling TDX VM exits by implementing the callbacks .get_exit_info(), .get_entry_info(), and .handle_exit(). Additionally, KVM: TDX: Add a place holder to handle TDX VM exit Introduce the wiring for handling TDX VM exits by implementing the callbacks .get_exit_info(), .get_entry_info(), and .handle_exit(). Additionally, add error handling during the TDX VM exit flow, and add a place holder to handle various exit reasons. Store VMX exit reason and exit qualification in struct vcpu_vt for TDX, so that TDX/VMX can use the same helpers to get exit reason and exit qualification. Store extended exit qualification and exit GPA info in struct vcpu_tdx because they are used by TDX code only. Contention Handling: The TDH.VP.ENTER operation may contend with TDH.MEM.* operations due to secure EPT or TD EPOCH. If the contention occurs, the return value will have TDX_OPERAND_BUSY set, prompting the vCPU to attempt re-entry into the guest with EXIT_FASTPATH_EXIT_HANDLED, not EXIT_FASTPATH_REENTER_GUEST, so that the interrupts pending during IN_GUEST_MODE can be delivered for sure. Otherwise, the requester of KVM_REQ_OUTSIDE_GUEST_MODE may be blocked endlessly. Error Handling: - TDX_SW_ERROR: This includes #UD caused by SEAMCALL instruction if the CPU isn't in VMX operation, #GP caused by SEAMCALL instruction when TDX isn't enabled by the BIOS, and TDX_SEAMCALL_VMFAILINVALID when SEAM firmware is not loaded or disabled. - TDX_ERROR: This indicates some check failed in the TDX module, preventing the vCPU from running. - Failed VM Entry: Exit to userspace with KVM_EXIT_FAIL_ENTRY. Handle it separately before handling TDX_NON_RECOVERABLE because when off-TD debug is not enabled, TDX_NON_RECOVERABLE is set. - TDX_NON_RECOVERABLE: Set by the TDX module when the error is non-recoverable, indicating that the TDX guest is dead or the vCPU is disabled. A special case is triple fault, which also sets TDX_NON_RECOVERABLE but exits to userspace with KVM_EXIT_SHUTDOWN, aligning with the VMX case. - Any unhandled VM exit reason will also return to userspace with KVM_EXIT_INTERNAL_ERROR. Suggested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Reviewed-by: Chao Gao <chao.gao@intel.com> Message-ID: <20250222014225.897298-4-binbin.wu@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
Revision tags: v6.14-rc3, v6.14-rc2, v6.14-rc1
# e0b4f31a	29-Jan-2025	Isaku Yamahata <isaku.yamahata@intel.com>	KVM: TDX: restore user ret MSRs Several MSRs are clobbered on TD exit that are not used by Linux while in ring 0. Ensure the cached value of the MSR is updated on vcpu_put, and the MSRs themselves KVM: TDX: restore user ret MSRs Several MSRs are clobbered on TD exit that are not used by Linux while in ring 0. Ensure the cached value of the MSR is updated on vcpu_put, and the MSRs themselves before returning to ring 3. Co-developed-by: Tony Lindgren <tony.lindgren@linux.intel.com> Signed-off-by: Tony Lindgren <tony.lindgren@linux.intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Message-ID: <20250129095902.16391-10-adrian.hunter@intel.com> Reviewed-by: Xiayao Li <xiaoyao.li@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
# 81bf912b	29-Jan-2025	Isaku Yamahata <isaku.yamahata@intel.com>	KVM: TDX: Implement TDX vcpu enter/exit path Implement callbacks to enter/exit a TDX VCPU by calling tdh_vp_enter(). Ensure the TDX VCPU is in a correct state to run. Do not pass arguments from/to KVM: TDX: Implement TDX vcpu enter/exit path Implement callbacks to enter/exit a TDX VCPU by calling tdh_vp_enter(). Ensure the TDX VCPU is in a correct state to run. Do not pass arguments from/to vcpu->arch.regs[] unconditionally. Instead, marshall state to/from the appropriate x86 registers only when needed, i.e., to handle some TDVMCALL sub-leaves following KVM's ABI to leverage the existing code. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Message-ID: <20250129095902.16391-6-adrian.hunter@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
# 7172c753	14-Mar-2025	Binbin Wu <binbin.wu@linux.intel.com>	KVM: VMX: Move common fields of struct vcpu_{vmx,tdx} to a struct Move common fields of struct vcpu_vmx and struct vcpu_tdx to struct vcpu_vt, to share the code between VMX/TDX as much as possible a KVM: VMX: Move common fields of struct vcpu_{vmx,tdx} to a struct Move common fields of struct vcpu_vmx and struct vcpu_tdx to struct vcpu_vt, to share the code between VMX/TDX as much as possible and to make TDX exit handling more VMX like. No functional change intended. [Adrian: move code that depends on struct vcpu_vmx back to vmx.h] Suggested-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/Z1suNzg2Or743a7e@google.com Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Message-ID: <20250129095902.16391-5-adrian.hunter@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
Revision tags: v6.13, v6.13-rc7, v6.13-rc6, v6.13-rc5, v6.13-rc4, v6.13-rc3, v6.13-rc2, v6.13-rc1, v6.12
# d789fa6e	12-Nov-2024	Isaku Yamahata <isaku.yamahata@intel.com>	KVM: TDX: Handle vCPU dissociation Handle vCPUs dissociations by invoking SEAMCALL TDH.VP.FLUSH which flushes the address translation caches and cached TD VMCS of a TD vCPU in its associated pCPU. KVM: TDX: Handle vCPU dissociation Handle vCPUs dissociations by invoking SEAMCALL TDH.VP.FLUSH which flushes the address translation caches and cached TD VMCS of a TD vCPU in its associated pCPU. In TDX, a vCPUs can only be associated with one pCPU at a time, which is done by invoking SEAMCALL TDH.VP.ENTER. For a successful association, the vCPU must be dissociated from its previous associated pCPU. To facilitate vCPU dissociation, introduce a per-pCPU list associated_tdvcpus. Add a vCPU into this list when it's loaded into a new pCPU (i.e. when a vCPU is loaded for the first time or migrated to a new pCPU). vCPU dissociations can happen under below conditions: - On the op hardware_disable is called. This op is called when virtualization is disabled on a given pCPU, e.g. when hot-unplug a pCPU or machine shutdown/suspend. In this case, dissociate all vCPUs from the pCPU by iterating its per-pCPU list associated_tdvcpus. - On vCPU migration to a new pCPU. Before adding a vCPU into associated_tdvcpus list of the new pCPU, dissociation from its old pCPU is required, which is performed by issuing an IPI and executing SEAMCALL TDH.VP.FLUSH on the old pCPU. On a successful dissociation, the vCPU will be removed from the associated_tdvcpus list of its previously associated pCPU. - On tdx_mmu_release_hkid() is called. TDX mandates that all vCPUs must be disassociated prior to the release of an hkid. Therefore, dissociation of all vCPUs is a must before executing the SEAMCALL TDH.MNG.VPFLUSHDONE and subsequently freeing the hkid. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Co-developed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Message-ID: <20241112073858.22312-1-yan.y.zhao@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
Revision tags: v6.12-rc7, v6.12-rc6, v6.12-rc5, v6.12-rc4, v6.12-rc3, v6.12-rc2, v6.12-rc1, v6.11, v6.11-rc7
# 012426d6	04-Sep-2024	Isaku Yamahata <isaku.yamahata@intel.com>	KVM: TDX: Finalize VM initialization Add a new VM-scoped KVM_MEMORY_ENCRYPT_OP IOCTL subcommand, KVM_TDX_FINALIZE_VM, to perform TD Measurement Finalization. Documentation for the API is added in a KVM: TDX: Finalize VM initialization Add a new VM-scoped KVM_MEMORY_ENCRYPT_OP IOCTL subcommand, KVM_TDX_FINALIZE_VM, to perform TD Measurement Finalization. Documentation for the API is added in another patch: "Documentation/virt/kvm: Document on Trust Domain Extensions(TDX)" For the purpose of attestation, a measurement must be made of the TDX VM initial state. This is referred to as TD Measurement Finalization, and uses SEAMCALL TDH.MR.FINALIZE, after which: 1. The VMM adding TD private pages with arbitrary content is no longer allowed 2. The TDX VM is runnable Co-developed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Message-ID: <20240904030751.117579-21-rick.p.edgecombe@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
# fe1e6d48	12-Nov-2024	Isaku Yamahata <isaku.yamahata@intel.com>	KVM: TDX: Add accessors VMX VMCS helpers TDX defines SEAMCALL APIs to access TDX control structures corresponding to the VMX VMCS. Introduce helper accessors to hide its SEAMCALL ABI details. Sign KVM: TDX: Add accessors VMX VMCS helpers TDX defines SEAMCALL APIs to access TDX control structures corresponding to the VMX VMCS. Introduce helper accessors to hide its SEAMCALL ABI details. Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Co-developed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Message-ID: <20241112073551.22070-1-yan.y.zhao@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
# 7c035bea	19-Feb-2025	Zhiming Hu <zhiming.hu@intel.com>	KVM: TDX: Register TDX host key IDs to cgroup misc controller TDX host key IDs (HKID) are limit resources in a machine, and the misc cgroup lets the machine owner track their usage and limits the po KVM: TDX: Register TDX host key IDs to cgroup misc controller TDX host key IDs (HKID) are limit resources in a machine, and the misc cgroup lets the machine owner track their usage and limits the possibility of abusing them outside the owner's control. The cgroup v2 miscellaneous subsystem was introduced to control the resource of AMD SEV & SEV-ES ASIDs. Likewise introduce HKIDs as a misc resource. Signed-off-by: Zhiming Hu <zhiming.hu@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
# a50f673f	30-Oct-2024	Isaku Yamahata <isaku.yamahata@intel.com>	KVM: TDX: Do TDX specific vcpu initialization TD guest vcpu needs TDX specific initialization before running. Repurpose KVM_MEMORY_ENCRYPT_OP to vcpu-scope, add a new sub-command KVM_TDX_INIT_VCPU, KVM: TDX: Do TDX specific vcpu initialization TD guest vcpu needs TDX specific initialization before running. Repurpose KVM_MEMORY_ENCRYPT_OP to vcpu-scope, add a new sub-command KVM_TDX_INIT_VCPU, and implement the callback for it. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Tony Lindgren <tony.lindgren@linux.intel.com> Signed-off-by: Tony Lindgren <tony.lindgren@linux.intel.com> Co-developed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> --- - Fix comment: https://lore.kernel.org/kvm/Z36OYfRW9oPjW8be@google.com/ (Sean) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
# 0186dd29	14-Jan-2025	Isaku Yamahata <isaku.yamahata@intel.com>	KVM: TDX: add ioctl to initialize VM with TDX specific parameters After the crypto-protection key has been configured, TDX requires a VM-scope initialization as a step of creating the TDX guest. Th KVM: TDX: add ioctl to initialize VM with TDX specific parameters After the crypto-protection key has been configured, TDX requires a VM-scope initialization as a step of creating the TDX guest. This "per-VM" TDX initialization does the global configurations/features that the TDX guest can support, such as guest's CPUIDs (emulated by the TDX module), the maximum number of vcpus etc. Because there is no room in KVM_CREATE_VM to pass all the required parameters, introduce a new ioctl KVM_TDX_INIT_VM and mark the VM as TD_STATE_UNINITIALIZED until it is invoked. This "per-VM" TDX initialization must be done before any "vcpu-scope" TDX initialization; KVM_TDX_INIT_VM IOCTL must be invoked before the creation of vCPUs. Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
# 8d032b68	25-Feb-2025	Isaku Yamahata <isaku.yamahata@intel.com>	KVM: TDX: create/destroy VM structure Implement managing the TDX private KeyID to implement, create, destroy and free for a TDX guest. When creating at TDX guest, assign a TDX private KeyID for the KVM: TDX: create/destroy VM structure Implement managing the TDX private KeyID to implement, create, destroy and free for a TDX guest. When creating at TDX guest, assign a TDX private KeyID for the TDX guest for memory encryption, and allocate pages for the guest. These are used for the Trust Domain Root (TDR) and Trust Domain Control Structure (TDCS). On destruction, free the allocated pages, and the KeyID. Before tearing down the private page tables, TDX requires the guest TD to be destroyed by reclaiming the KeyID. Do it in the vm_pre_destroy() kvm_x86_ops hook. The TDR control structures can be freed in the vm_destroy() hook, which runs last. Co-developed-by: Tony Lindgren <tony.lindgren@linux.intel.com> Signed-off-by: Tony Lindgren <tony.lindgren@linux.intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Kai Huang <kai.huang@intel.com> Signed-off-by: Kai Huang <kai.huang@intel.com> Co-developed-by: Yan Zhao <yan.y.zhao@intel.com> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com> Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> --- - Fix build issue in kvm-coco-queue - Init ret earlier to fix __tdx_td_init() error handling. (Chao) - Standardize -EAGAIN for __tdx_td_init() retry errors (Rick) Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
# 1001d988	30-Oct-2024	Sean Christopherson <sean.j.christopherson@intel.com>	KVM: TDX: Add TDX "architectural" error codes Add error codes for the TDX SEAMCALLs both for TDX VMM side for TDH SEAMCALL and TDX guest side for TDG.VP.VMCALL. KVM issues the TDX SEAMCALLs and che KVM: TDX: Add TDX "architectural" error codes Add error codes for the TDX SEAMCALLs both for TDX VMM side for TDH SEAMCALL and TDX guest side for TDG.VP.VMCALL. KVM issues the TDX SEAMCALLs and checks its error code. KVM handles hypercall from the TDX guest and may return an error. So error code for the TDX guest is also needed. TDX SEAMCALL uses bits 31:0 to return more information, so these error codes will only exactly match RAX[63:32]. Error codes for TDG.VP.VMCALL is defined by TDX Guest-Host-Communication interface spec. Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Yuan Yao <yuan.yao@intel.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Message-ID: <20241030190039.77971-14-rick.p.edgecombe@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> show more ...
12