KVM: TDX: Exit to userspace for SetupEventNotifyInterrupt

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

KVM: TDX: Add new TDVMCALL status code for unsupported subfuncs

Add the new TDVMCALL status code TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED and
return it for unimplemented TDVMCALL subfunctions.

Returning TDVMCALL_STATUS_INVALID_OPERAND when a subfunction is not
implemented is vague because TDX guests can't tell the error is due to
the subfunction is not supported or an invalid input of the subfunction.
New GHCI spec adds TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED to avoid the
ambiguity. Use it instead of TDVMCALL_STATUS_INVALID_OPERAND.

Before the change, for common guest implementations, when a TDX guest
receives TDVMCALL_STATUS_INVALID_OPERAND, it has two cases:
1. Some operand is invalid. It could change the operand to another value
   retry.
2. The subfunction is not supported.

For case 1, an invalid operand usually means the guest implementation bug.
Since the TDX guest can't tell which case is, the best practice for
handling TDVMCALL_STATUS_INVALID_OPERAND is stopping calling such leaf,
treating the failure as fatal if the TDVMCALL is essential or ignoring
it if the TDVMCALL is optional.

With this change, TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED could be sent to
old TDX guest that do not know about it, but it is expected that the
guest will make the same action as TDVMCALL_STATUS_INVALID_OPERAND.
Currently, no known TDX guest checks TDVMCALL_STATUS_INVALID_OPERAND
specifically; for example Linux just checks for success.

Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
[Return it for untrapped KVM_HC_MAP_GPA_RANGE. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

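The guest-side policy described in this message (stop on any failure, fatal if the call is essential, ignore it otherwise) can be sketched as follows. This is a minimal illustration, not kernel code: the numeric status values are my reading of the GHCI spec and should be treated as assumptions, and `enum guest_action` and `handle_tdvmcall_status()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Status values assumed for illustration; check the GHCI spec before relying on them. */
#define TDVMCALL_STATUS_SUCCESS             0x0000000000000000ULL
#define TDVMCALL_STATUS_INVALID_OPERAND     0x8000000000000000ULL
#define TDVMCALL_STATUS_SUBFUNC_UNSUPPORTED 0x8000000000000003ULL

/* Hypothetical guest reactions to a TDVMCALL result. */
enum guest_action { GUEST_CONTINUE, GUEST_IGNORE, GUEST_FATAL };

/*
 * Only success is checked specifically, so an old guest reacts to
 * SUBFUNC_UNSUPPORTED exactly as it would to INVALID_OPERAND.
 */
enum guest_action handle_tdvmcall_status(uint64_t status, bool essential)
{
    if (status == TDVMCALL_STATUS_SUCCESS)
        return GUEST_CONTINUE;

    return essential ? GUEST_FATAL : GUEST_IGNORE;
}
```

Because the sketch never compares against a specific error value, adding a new error code on the host side does not change what such a guest does.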
Merge tag 'tsm-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm

Pull trusted security manager (TSM) updates from Dan Williams:

 - Add a general sysfs scheme for publishing "Measurement" values
   provided by the architecture's TEE Security Manager. Use it to
   publish TDX "Runtime Measurement Registers" ("RTMRs") that either
   maintain a hash of stored values (similar to a TPM PCR) or provide
   statically provisioned data. These measurements are validated by a
   relying party.

 - Reorganize the drivers/virt/coco/ directory for "host" and "guest"
   shared infrastructure.

 - Fix a configfs-tsm-report unregister bug

 - With CONFIG_TSM_MEASUREMENTS joining CONFIG_TSM_REPORTS and in
   anticipation of more shared "TSM" infrastructure arriving, rename the
   maintainer entry to "TRUSTED SECURITY MODULE (TSM) INFRASTRUCTURE".

* tag 'tsm-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/devsec/tsm:
  tsm-mr: Fix init breakage after bin_attrs constification by scoping non-const pointers to init phase
  sample/tsm-mr: Fix missing static for sample_report
  virt: tdx-guest: Transition to scoped_cond_guard for mutex operations
  virt: tdx-guest: Refactor and streamline TDREPORT generation
  virt: tdx-guest: Expose TDX MRs as sysfs attributes
  x86/tdx: tdx_mcall_get_report0: Return -EBUSY on TDCALL_OPERAND_BUSY error
  x86/tdx: Add tdx_mcall_extend_rtmr() interface
  tsm-mr: Add tsm-mr sample code
  tsm-mr: Add TVM Measurement Register support
  configfs-tsm-report: Fix NULL dereference of tsm_ops
  coco/guest: Move shared guest CC infrastructure to drivers/virt/coco/guest/
  configfs-tsm: Namespace TSM report symbols

x86/tdx: Add tdx_mcall_extend_rtmr() interface

The TDX guest exposes one MRTD (Build-time Measurement Register) and four
RTMR (Run-time Measurement Register) registers to record the build and boot
measurements of a virtual machine (VM). These registers are similar to PCR
(Platform Configuration Register) registers in the TPM (Trusted Platform
Module) space. This measurement data is used to implement security features
like attestation and trusted boot.

To facilitate updating the RTMR registers, the TDX module provides support
for the `TDG.MR.RTMR.EXTEND` TDCALL which can be used to securely extend
the RTMR registers.

Add helper function to update RTMR registers. It will be used by the TDX
guest driver in enabling RTMR extension support.

Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Cedric Xing <cedric.xing@intel.com>
Acked-by: Dionna Amalie Glaze <dionnaglaze@google.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://patch.msgid.link/20250506-tdx-rtmr-v6-3-ac6ff5e9d58a@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Merge branch 'kvm-tdx-initial' into HEAD

This large commit contains the initial support for TDX in KVM. All x86
parts enable the host-side hypercalls that KVM uses to talk to the TDX
module, a software component that runs in a special CPU mode called SEAM
(Secure Arbitration Mode).

The series is in turn split into multiple sub-series, each with a separate
merge commit:

- Initialization: basic setup for using the TDX module from KVM, plus
  ioctls to create TDX VMs and vCPUs.

- MMU: in TDX, private and shared halves of the address space are mapped by
  different EPT roots, and the private half is managed by the TDX module.
  Using the support that was added to the generic MMU code in 6.14,
  add support for TDX's secure page tables to the Intel side of KVM.
  Generic KVM code takes care of maintaining a mirror of the secure page
  tables so that they can be queried efficiently, and ensuring that changes
  are applied to both the mirror and the secure EPT.

- vCPU enter/exit: implement the callbacks that handle the entry of a TDX
  vCPU (via the SEAMCALL TDH.VP.ENTER) and the corresponding save/restore
  of host state.

- Userspace exits: introduce support for guest TDVMCALLs that KVM forwards to
  userspace. These correspond to the usual KVM_EXIT_* "heavyweight vmexits"
  but are triggered through a different mechanism, similar to VMGEXIT for
  SEV-ES and SEV-SNP.

- Interrupt handling: support for virtual interrupt injection as well as
  handling VM-Exits that are caused by vectored events. Exclusive to
  TDX are machine-check SMIs, which the kernel already knows how to
  handle through the kernel machine check handler (commit 7911f145de5f,
  "x86/mce: Implement recovery for errors in TDX/SEAM non-root mode")

- Loose ends: handling of the remaining exits from the TDX module, including
  EPT violation/misconfig and several TDVMCALL leaves that are handled in
  the kernel (CPUID, HLT, RDMSR/WRMSR, GetTdVmCallInfo); plus returning
  an error or ignoring operations that are not supported by TDX guests

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

x86/headers: Replace __ASSEMBLY__ with __ASSEMBLER__ in non-UAPI headers

While the GCC and Clang compilers already define __ASSEMBLER__
automatically when compiling assembly code, __ASSEMBLY__ is a
macro that only gets defined by the Makefiles in the kernel.

This can be very confusing when switching between userspace
and kernelspace coding, or when dealing with UAPI headers that
rather should use __ASSEMBLER__ instead. So let's standardize on
the __ASSEMBLER__ macro that is provided by the compilers now.

This is mostly a mechanical patch (done with a simple "sed -i"
statement), with some manual tweaks in <asm/frame.h>, <asm/hw_irq.h>
and <asm/setup.h> that mentioned this macro in comments with some
missing underscores.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20250314071013.1575167-38-thuth@redhat.com

KVM: TDX: Handle TDG.VP.VMCALL<GetTdVmCallInfo> hypercall

Implement TDG.VP.VMCALL<GetTdVmCallInfo> hypercall. If the input value is
zero, return success code and zero in output registers.

TDG.VP.VMCALL<GetTdVmCallInfo> hypercall is a subleaf of TDG.VP.VMCALL to
enumerate which TDG.VP.VMCALL sub leaves are supported. This hypercall is
for future enhancement of the Guest-Host-Communication Interface (GHCI)
specification. The GHCI version of 344426-001US defines it to require
input R12 to be zero and to return zero in output registers, R11, R12, R13,
and R14 so that guest TD enumerates no enhancement.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20250227012021.1778144-12-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

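As a rough illustration of the register contract described above, the sketch below models the handler on a plain struct. It is not the KVM implementation: `struct vmcall_regs` and `handle_get_td_vm_call_info()` are hypothetical, R10 is assumed to carry the status code, and the reaction to a non-zero R12 (returning an invalid-operand status) is an assumption, since the message only describes the zero-input case.

```c
#include <stdint.h>

#define STATUS_SUCCESS          0x0000000000000000ULL
#define STATUS_INVALID_OPERAND  0x8000000000000000ULL /* assumed value */

/* Hypothetical view of the registers shared between guest and VMM. */
struct vmcall_regs {
    uint64_t r10; /* status code returned to the guest (assumed) */
    uint64_t r11, r12, r13, r14;
};

void handle_get_td_vm_call_info(struct vmcall_regs *regs)
{
    if (regs->r12 != 0) {
        /* Assumption: anything other than leaf 0 is rejected. */
        regs->r10 = STATUS_INVALID_OPERAND;
        return;
    }

    /* Input 0: report success and "no enhancements" in all outputs. */
    regs->r10 = STATUS_SUCCESS;
    regs->r11 = 0;
    regs->r12 = 0;
    regs->r13 = 0;
    regs->r14 = 0;
}
```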
KVM: TDX: Handle TDG.VP.VMCALL<MapGPA>

Convert TDG.VP.VMCALL<MapGPA> to KVM_EXIT_HYPERCALL with
KVM_HC_MAP_GPA_RANGE and forward it to userspace for handling.

MapGPA is used by TDX guest to request to map a GPA range as private
or shared memory. It needs to exit to userspace for handling. KVM has
already implemented a similar hypercall KVM_HC_MAP_GPA_RANGE, which will
exit to userspace with exit reason KVM_EXIT_HYPERCALL. Do sanity checks,
convert TDVMCALL_MAP_GPA to KVM_HC_MAP_GPA_RANGE and forward the request
to userspace.

To prevent a TDG.VP.VMCALL<MapGPA> call from taking too long, the MapGPA
range is split into 2MB chunks and check interrupt pending between chunks.
This allows for timely injection of interrupts and prevents issues with
guest lockup detection. TDX guest should retry the operation for the
GPA starting at the address specified in R11 when the TDVMCALL return
TDVMCALL_RETRY as status code.

Note userspace needs to enable KVM_CAP_EXIT_HYPERCALL with
KVM_HC_MAP_GPA_RANGE bit set for TD VM.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20250222014225.897298-7-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

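The chunk-and-retry scheme can be pictured with a small simulation. This is a simplified sketch, not the KVM code path: the real handler exits to userspace for each chunk, whereas here the conversion is only a comment. `map_gpa_range()`, `struct map_gpa_result`, the callback type, and the numeric status values are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAP_GPA_CHUNK   (2ULL << 20)  /* 2MB chunks, as in the commit message */
#define STATUS_SUCCESS  0ULL
#define STATUS_RETRY    1ULL          /* assumed value of the retry status */

struct map_gpa_result {
    uint64_t status;
    uint64_t next_gpa; /* where the guest should resume (R11 in the message) */
};

/* Hypothetical hook: returns true when an interrupt is waiting for the vCPU. */
typedef bool (*irq_pending_fn)(void *ctx);

/* Example hook: reports a pending interrupt once *ctx checks have gone by. */
bool pending_after_budget(void *ctx)
{
    int *budget = ctx;

    return (*budget)-- <= 0;
}

struct map_gpa_result map_gpa_range(uint64_t gpa, uint64_t size,
                                    irq_pending_fn pending, void *ctx)
{
    uint64_t end = gpa + size;

    while (gpa < end) {
        uint64_t len = end - gpa < MAP_GPA_CHUNK ? end - gpa : MAP_GPA_CHUNK;

        /* Real code would convert [gpa, gpa + len) via userspace here. */
        gpa += len;

        /* Between chunks, give a pending interrupt the chance to be injected. */
        if (gpa < end && pending(ctx))
            return (struct map_gpa_result){ STATUS_RETRY, gpa };
    }

    return (struct map_gpa_result){ STATUS_SUCCESS, end };
}
```

A guest that receives the retry status would issue the call again starting at `next_gpa`, so no chunk is converted twice.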
KVM: TDX: Add TDX "architectural" error codes

Add error codes for the TDX SEAMCALLs both for TDX VMM side for TDH
SEAMCALL and TDX guest side for TDG.VP.VMCALL. KVM issues the TDX
SEAMCALLs and checks its error code. KVM handles hypercall from the TDX
guest and may return an error. So error code for the TDX guest is also
needed.

TDX SEAMCALL uses bits 31:0 to return more information, so these error
codes will only exactly match RAX[63:32]. Error codes for TDG.VP.VMCALL is
defined by TDX Guest-Host-Communication interface spec.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Yuan Yao <yuan.yao@intel.com>
Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-ID: <20241030190039.77971-14-rick.p.edgecombe@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

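Since the low 32 bits of RAX carry extra detail, a comparison against an error code has to mask them off first. The sketch below shows that comparison under stated assumptions: the mask name mirrors what I believe the kernel uses, while `TDX_EXAMPLE_ERROR` is a made-up code and `seamcall_status_is()` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

/* Keep only RAX[63:32], where the architectural status code lives. */
#define TDX_SEAMCALL_STATUS_MASK 0xFFFFFFFF00000000ULL

/* Made-up error code for illustration; real codes come from the TDX spec. */
#define TDX_EXAMPLE_ERROR        0xC000010000000000ULL

/* True when the status part of a raw SEAMCALL return value equals 'code'. */
bool seamcall_status_is(uint64_t rax, uint64_t code)
{
    return (rax & TDX_SEAMCALL_STATUS_MASK) == code;
}
```

A raw value such as `TDX_EXAMPLE_ERROR | 0x2A` matches through the helper, whereas a direct `==` comparison against the code would fail because of the detail bits.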
x86/tdx: Dump attributes and TD_CTLS on boot

Dump TD configuration on boot. Attributes and TD_CTLS define TD
behavior. This information is useful for tracking down bugs.

The output ends up looking like this in practice:

[ 0.000000] tdx: Guest detected
[ 0.000000] tdx: Attributes: SEPT_VE_DISABLE
[ 0.000000] tdx: TD_CTLS: PENDING_VE_DISABLE ENUM_TOPOLOGY VIRT_CPUID2 REDUCE_VE

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/all/20241202072458.447455-1-kirill.shutemov%40linux.intel.com

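A dump like the TD_CTLS line above amounts to walking a table of (bit, name) pairs. The following sketch shows one way to do that in plain C. The bit positions are assumptions chosen to match the order of names in the sample output, and `format_td_ctls()` is a hypothetical helper, not the kernel's function.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct flag_name {
    uint64_t bit;
    const char *name;
};

/* Bit positions are assumed for illustration only. */
static const struct flag_name td_ctls_names[] = {
    { 1ULL << 0, "PENDING_VE_DISABLE" },
    { 1ULL << 1, "ENUM_TOPOLOGY" },
    { 1ULL << 2, "VIRT_CPUID2" },
    { 1ULL << 3, "REDUCE_VE" },
};

/* Writes the space-separated names of all set bits into buf and returns buf. */
char *format_td_ctls(uint64_t val, char *buf, size_t len)
{
    size_t used = 0;

    if (len == 0)
        return buf;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(td_ctls_names) / sizeof(td_ctls_names[0]); i++) {
        int n;

        if (!(val & td_ctls_names[i].bit))
            continue;

        n = snprintf(buf + used, len - used, "%s%s",
                     used ? " " : "", td_ctls_names[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* output truncated; stop adding names */
        used += (size_t)n;
    }

    return buf;
}
```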
x86/tdx: Disable unnecessary virtualization exceptions

Originally, #VE was defined as the TDX behavior in order to support
paravirtualization of x86 features that can’t be virtualized by the TDX
module. The intention is that if guest software wishes to use such a
feature, it implements some logic to support this. This logic resides in
the #VE exception handler it may work in cooperation with the host VMM.

Theoretically, the guest TD’s #VE handler was supposed to act as a "TDX
enlightenment agent" inside the TD. However, in practice, the #VE
handler is simplistic:

 - #VE on CPUID is handled by returning all-0 to the code which
   executed CPUID. In many cases, an all-0 value is not the correct
   value, and may cause improper operation.

 - #VE on RDMSR is handled by requesting the MSR value from the host
   VMM. This is prone to security issues since the host VMM is
   untrusted. It may also be functionally incorrect in case the expected
   operation is to paravirtualize some CPU functionality.

Newer TDX modules provide a "REDUCE_VE" feature. When enabled, it
drastically cuts cases when guests receive #VE on MSR and CPUID
accesses. Basically, instead of punting the problem to the VMM, the
TDX module fills in good data. What the TDX module provides is
obviously highly specific to the MSR or CPUID. This is all spelled
out in excruciating detail in the TDX specs.

Enable REDUCE_VE. Make TDX guest behaviour less odd, and closer to
how a normal CPU behaves.

Note that enabling of the feature doesn't eliminate need in #VE handler
for CPUID and MSR accesses. Some MSRs still generate #VE (notably
APIC-related) and kernel needs CPUID #VE handler to ask VMM for leafs in
hypervisor range.

[ dhansen: changelog tweaks, rename/rework VE reduction function ]

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/all/20241202072431.447380-1-kirill.shutemov%40linux.intel.com

x86/tdx: Enable CPU topology enumeration

TDX 1.0 defines baseline behaviour of TDX guest platform. TDX 1.0
generates a #VE when accessing topology-related CPUID leafs (0xB and
0x1F) and the X2APIC_APICID MSR. The kernel returns all zeros on CPUID
topology. In practice, this means that the kernel can only boot with a
plain topology. Any complications will cause problems.

The ENUM_TOPOLOGY feature allows the VMM to provide topology
information to the guest. Enabling the feature eliminates
topology-related #VEs: the TDX module virtualizes accesses to
the CPUID leafs and the MSR.

Enable ENUM_TOPOLOGY if it is available.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/all/20241104103803.195705-5-kirill.shutemov%40linux.intel.com

x86/tdx: Dynamically disable SEPT violations from causing #VEs

Memory access #VEs are hard for Linux to handle in contexts like the
entry code or NMIs. But other OSes need them for functionality.
There's a static (pre-guest-boot) way for a VMM to choose one or the
other. But VMMs don't always know which OS they are booting, so they
choose to deliver those #VEs so the "other" OSes will work. That,
unfortunately has left us in the lurch and exposed to these
hard-to-handle #VEs.

The TDX module has introduced a new feature. Even if the static
configuration is set to "send nasty #VEs", the kernel can dynamically
request that they be disabled. Once they are disabled, access to private
memory that is not in the Mapped state in the Secure-EPT (SEPT) will
result in an exit to the VMM rather than injecting a #VE.

Check if the feature is available and disable SEPT #VE if possible.

If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE
attribute is no longer reliable. It reflects the initial state of the
control for the TD, but it will not be updated if someone (e.g. bootloader)
changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to
determine if SEPT #VEs are enabled or disabled.

[ dhansen: remove 'return' at end of function ]

Fixes: 373e715e31bf ("x86/tdx: Panic on bad configs that #VE on "private" memory access")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/all/20241104103803.195705-4-kirill.shutemov%40linux.intel.com

x86/tdx: Introduce wrappers to read and write TD metadata

The TDG_VM_WR TDCALL is used to ask the TDX module to change some
TD-specific VM configuration. There is currently only one user in the
kernel of this TDCALL leaf. More will be added shortly.

Refactor to make way for more users of TDG_VM_WR who will need to modify
other TD configuration values.

Add a wrapper for the TDG_VM_RD TDCALL that requests TD-specific
metadata from the TDX module. There are currently no users for
TDG_VM_RD. Mark it as __maybe_unused until the first user appears.

This is preparation for enumeration and enabling optional TD features.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Link: https://lore.kernel.org/all/20241104103803.195705-2-kirill.shutemov%40linux.intel.com

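To make the read and write wrappers concrete, here is a user-space model of a metadata field with a masked write. It assumes, as I understand the TDCALL, that the write takes a value plus a mask selecting which bits may change. `struct td_field`, `sim_vm_rd()` and `sim_vm_wr()` are hypothetical stand-ins and do not issue any TDCALL.

```c
#include <stdint.h>

/* Hypothetical in-memory stand-in for one TD metadata field. */
struct td_field {
    uint64_t value;
};

/* Models a metadata read: returns the current field contents. */
uint64_t sim_vm_rd(const struct td_field *field)
{
    return field->value;
}

/* Models a masked metadata write: only bits set in 'mask' are updated. */
void sim_vm_wr(struct td_field *field, uint64_t value, uint64_t mask)
{
    field->value = (field->value & ~mask) | (value & mask);
}
```

With this shape, enabling a single optional feature bit is `sim_vm_wr(&f, bit, bit)`, which leaves every other bit in the field as it was.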
x86/virt/tdx: Get module global metadata for module initialization

The TDX module global metadata provides system-wide information about
the module.

TL;DR:

Use the TDH.SYS.RD SEAMCALL to tell if the module is good or not.

Long Version:

1) Only initialize TDX module with version 1.5 and later

TDX module 1.0 has some compatibility issues with the later versions of
module, as documented in the "Intel TDX module ABI incompatibilities
between TDX1.0 and TDX1.5" spec. Don't bother with module versions that
do not have a stable ABI.

2) Get the essential global metadata for module initialization

TDX reports a list of "Convertible Memory Region" (CMR) to tell the
kernel which memory is TDX compatible. The kernel needs to build a list
of memory regions (out of CMRs) as "TDX-usable" memory and pass them to
the TDX module. The kernel does this by constructing a list of "TD
Memory Regions" (TDMRs) to cover all these memory regions and passing
them to the TDX module.

Each TDMR is a TDX architectural data structure containing the memory
region that the TDMR covers, plus the information to track (within this
TDMR):
  a) the "Physical Address Metadata Table" (PAMT) to track each TDX
     memory page's status (such as which TDX guest "owns" a given page,
     and
  b) the "reserved areas" to tell memory holes that cannot be used as
     TDX memory.

The kernel needs to get below metadata from the TDX module to build the
list of TDMRs:
  a) the maximum number of supported TDMRs
  b) the maximum number of supported reserved areas per TDMR and,
  c) the PAMT entry size for each TDX-supported page size.

== Implementation ==

The TDX module has two modes of fetching the metadata: a one field at
a time, or all in one blob. Use the field at a time for now. It is
slower, but there just are not enough fields now to justify the
complexity of extra unpacking.

The err_free_tdxmem=>out_put_tdxmem goto looks wonky by itself. But
it is the first of a bunch of error handling that will get stuck at
its site.

[ dhansen: clean up changelog and add a struct to map between
	   the TDX module fields and 'struct tdx_tdmr_sysinfo' ]

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20231208170740.53979-8-dave.hansen%40intel.com

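The "one field at a time" approach, together with a table that maps module field IDs onto struct members, can be sketched as below. Everything here is illustrative: the field IDs and returned values are invented, `fake_sys_rd()` stands in for the TDH.SYS.RD SEAMCALL, and `struct tdmr_sysinfo` is a cut-down, hypothetical version of the structure named in the message.

```c
#include <stddef.h>
#include <stdint.h>

/* Subset of the metadata listed above; member names are illustrative. */
struct tdmr_sysinfo {
    uint16_t max_tdmrs;
    uint16_t max_reserved_per_tdmr;
    uint16_t pamt_entry_size_4k;
};

/* Invented field IDs; the real ones are defined by the TDX module spec. */
enum { FID_MAX_TDMRS = 1, FID_MAX_RESERVED = 2, FID_PAMT_4K = 3 };

/* One table row: which field to read and where its value is stored. */
struct field_mapping {
    uint64_t field_id;
    size_t offset;
};

/* Stand-in for a single-field metadata read; the values are made up. */
int fake_sys_rd(uint64_t field_id, uint64_t *val)
{
    switch (field_id) {
    case FID_MAX_TDMRS:    *val = 64; return 0;
    case FID_MAX_RESERVED: *val = 16; return 0;
    case FID_PAMT_4K:      *val = 16; return 0;
    default:               return -1;
    }
}

/* Reads every mapped field, one call per field; fails on the first error. */
int read_tdmr_sysinfo(struct tdmr_sysinfo *info)
{
    static const struct field_mapping fields[] = {
        { FID_MAX_TDMRS,    offsetof(struct tdmr_sysinfo, max_tdmrs) },
        { FID_MAX_RESERVED, offsetof(struct tdmr_sysinfo, max_reserved_per_tdmr) },
        { FID_PAMT_4K,      offsetof(struct tdmr_sysinfo, pamt_entry_size_4k) },
    };

    for (size_t i = 0; i < sizeof(fields) / sizeof(fields[0]); i++) {
        uint64_t val;

        if (fake_sys_rd(fields[i].field_id, &val))
            return -1;

        /* Each offset points at a uint16_t member of *info. */
        *(uint16_t *)((char *)info + fields[i].offset) = (uint16_t)val;
    }

    return 0;
}
```

Adding another metadata field then only requires a new struct member and one more table row, which is the appeal of the table-driven form.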
x86/virt/tdx: Define TDX supported page sizes as macros

TDX supports 4K, 2M and 1G page sizes. The corresponding values are
defined by the TDX module spec and used as TDX module ABI. Currently,
they are used in try_accept_one() when the TDX guest tries to accept a
page. However currently try_accept_one() uses hard-coded magic values.

Define TDX supported page sizes as macros and get rid of the hard-coded
values in try_accept_one(). TDX host support will need to use them too.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/all/20231208170740.53979-2-dave.hansen%40intel.com

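For illustration, the page-size macros can be paired with a helper that converts a level into a byte count. The numeric values 0, 1 and 2 are what I believe the TDX module ABI uses, but treat them as assumptions; `tdx_ps_to_bytes()` is a hypothetical helper that is not part of the commit.

```c
#include <stdint.h>

/* Page-size levels; the values are assumed to follow the TDX module ABI. */
#define TDX_PS_4K 0
#define TDX_PS_2M 1
#define TDX_PS_1G 2

/*
 * Each level up covers 512 times more memory (9 more address bits),
 * starting from a 4K (2^12) base page.
 */
uint64_t tdx_ps_to_bytes(int ps)
{
    return 1ULL << (12 + 9 * ps);
}
```

Named levels make call sites such as a page-accept loop self-describing, where a bare `1` or `2` would need a comment.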
Merge tag 'tsm-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux

Pull unified attestation reporting from Dan Williams:
 "In an ideal world there would be a cross-vendor standard attestation
  report format for confidential guests along with a common device
  definition to act as the transport.

  In the real world the situation ended up with multiple platform
  vendors inventing their own attestation report formats with the
  SEV-SNP implementation being a first mover to define a custom
  sev-guest character device and corresponding ioctl(). Later, this
  configfs-tsm proposal intercepted an attempt to add a tdx-guest
  character device and a corresponding new ioctl(). It also anticipated
  ARM and RISC-V showing up with more chardevs and more ioctls().

  The proposal takes for granted that Linux tolerates the vendor report
  format differentiation until a standard arrives. From talking with
  folks involved, it sounds like that standardization work is unlikely
  to resolve anytime soon. It also takes the position that kernfs ABIs
  are easier to maintain than ioctl(). The result is a shared configfs
  mechanism to return per-vendor report-blobs with the option to later
  support a standard when that arrives.

  Part of the goal here also is to get the community into the
  "uncomfortable, but beneficial to the long term maintainability of the
  kernel" state of talking to each other about their differentiation and
  opportunities to collaborate. Think of this like the device-driver
  equivalent of the common memory-management infrastructure for
  confidential-computing being built up in KVM.

  As for establishing an "upstream path for cross-vendor
  confidential-computing device driver infrastructure" this is something
  I want to discuss at Plumbers. At present, the multiple vendor
  proposals for assigning devices to confidential computing VMs likely
  needs a new dedicated repository and maintainer team, but that is a
  discussion for v6.8.

  For now, Greg and Thomas have acked this approach and this is passing
  is AMD, Intel, and Google tests.

  Summary:

   - Introduce configfs-tsm as a shared ABI for confidential computing
     attestation reports

   - Convert sev-guest to additionally support configfs-tsm alongside
     its vendor specific ioctl()

   - Added signed attestation report retrieval to the tdx-guest driver
     forgoing a new vendor specific ioctl()

   - Misc cleanups and a new __free() annotation for kvfree()"

* tag 'tsm-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/linux:
  virt: tdx-guest: Add Quote generation support using TSM_REPORTS
  virt: sevguest: Add TSM_REPORTS support for SNP_GET_EXT_REPORT
  mm/slab: Add __free() support for kvfree
  virt: sevguest: Prep for kernel internal get_ext_report()
  configfs-tsm: Introduce a shared ABI for attestation reports
  virt: coco: Add a coco/Makefile and coco/Kconfig
  virt: sevguest: Fix passing a stack buffer as a scatterlist target

virt: tdx-guest: Add Quote generation support using TSM_REPORTS

In TDX guest, the attestation process is used to verify the TDX guest
trustworthiness to other entities before provisioning secrets to the
guest. The first step in the attestation process is TDREPORT
generation, which involves getting the guest measurement data in the
format of TDREPORT, which is further used to validate the authenticity
of the TDX guest. TDREPORT by design is integrity-protected and can
only be verified on the local machine.

To support remote verification of the TDREPORT in a SGX-based
attestation, the TDREPORT needs to be sent to the SGX Quoting Enclave
(QE) to convert it to a remotely verifiable Quote. SGX QE by design can
only run outside of the TDX guest (i.e. in a host process or in a
normal VM) and guest can use communication channels like vsock or
TCP/IP to send the TDREPORT to the QE. But for security concerns, the
TDX guest may not support these communication channels. To handle such
cases, TDX defines a GetQuote hypercall which can be used by the guest
to request the host VMM to communicate with the SGX QE. More details
about GetQuote hypercall can be found in TDX Guest-Host Communication
Interface (GHCI) for Intel TDX 1.0, section titled
"TDG.VP.VMCALL<GetQuote>".

Trusted Security Module (TSM) [1] exposes a common ABI for Confidential
Computing Guest platforms to get the measurement data via ConfigFS.
Extend the TSM framework and add support to allow an attestation agent
to get the TDX Quote data (included usage example below).

  report=/sys/kernel/config/tsm/report/report0
  mkdir $report
  dd if=/dev/urandom bs=64 count=1 > $report/inblob
  hexdump -C $report/outblob
  rmdir $report

GetQuote TDVMCALL requires TD guest pass a 4K aligned shared buffer
with TDREPORT data as input, which is further used by the VMM to copy
the TD Quote result after successful Quote generation. To create the
shared buffer, allocate a large enough memory and mark it shared using
set_memory_decrypted() in tdx_guest_init(). This buffer will be re-used
for GetQuote requests in the TDX TSM handler.

Although this method reserves a fixed chunk of memory for GetQuote
requests, such one time allocation can help avoid memory fragmentation
related allocation failures later in the uptime of the guest.

Since the Quote generation process is not time-critical or frequently
used, the current version uses a polling model for Quote requests and
it also does not support parallel GetQuote requests.

Link: https://lore.kernel.org/lkml/169342399185.3934343.3035845348326944519.stgit@dwillia2-xfh.jf.intel.com/ [1]
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Erdem Aktas <erdemaktas@google.com>
Tested-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Tested-by: Peter Gonda <pgonda@google.com>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

x86/tdx: Fix __noreturn build warning around __tdx_hypercall_failed()

LKP reported below build warning:

  vmlinux.o: warning: objtool: __tdx_hypercall+0x128: __tdx_hypercall_failed() is missing a __noreturn annotation

The __tdx_hypercall_failed() function definition already has __noreturn
annotation, but it turns out the __noreturn must be annotated to the
function declaration.

PeterZ explains:

  "FWIW, the reason being that...

   The point of noreturn is that the caller should know to stop
   generating code. For that the declaration needs the attribute,
   because call sites typically do not have access to the function
   definition in C."

Add __noreturn annotation to the declaration of __tdx_hypercall_failed()
to fix. It's not a bad idea to document the __noreturn nature at the
definition site either, so keep the annotation at the definition.

Note <asm/shared/tdx.h> is also included by TDX related assembly files.
Include <linux/compiler_attributes.h> only in case of !__ASSEMBLY__
otherwise compiling assembly file would trigger build error.

Also, following the objtool documentation, add __tdx_hypercall_failed()
to "tools/objtool/noreturns.h".

Fixes: c641cfb5c157 ("x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230918041858.331234-1-kai.huang@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202309140828.9RdmlH2Z-lkp@intel.com/

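The declaration-versus-definition point can be reproduced in a few lines of user-space C. This is a sketch using the GCC/Clang `__attribute__((noreturn))` spelling that the kernel's `__noreturn` macro is built on; `hypercall_failed()` and `checked_div()` are hypothetical names, not kernel functions.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * The attribute sits on the declaration, which is all a call site sees.
 * This is what tells the compiler that nothing after the call is reachable.
 */
__attribute__((noreturn)) void hypercall_failed(void);

int checked_div(int a, int b)
{
    if (b == 0)
        hypercall_failed(); /* control never comes back from here */

    return a / b;
}

/* Repeating the attribute at the definition is optional documentation. */
__attribute__((noreturn)) void hypercall_failed(void)
{
    fputs("hypercall failed\n", stderr);
    abort();
}
```

If the attribute appeared only on the definition in another translation unit, callers would be compiled as though the function could return.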
x86/tdx: Remove 'struct tdx_hypercall_args'

Now 'struct tdx_hypercall_args' is basically 'struct tdx_module_args'
minus RCX. Although from __tdx_hypercall()'s perspective RCX isn't
used as shared register thus not part of input/output registers, it's
not worth to have a separate structure just due to one register.

Remove the 'struct tdx_hypercall_args' and use 'struct tdx_module_args'
instead in __tdx_hypercall() related code. This also saves the memory
copy between the two structures within __tdx_hypercall().

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/798dad5ce24e9d745cf0e16825b75ccc433ad065.1692096753.git.kai.huang%40intel.com

x86/tdx: Reimplement __tdx_hypercall() using TDX_MODULE_CALL asm

Now the TDX_HYPERCALL asm is basically identical to the TDX_MODULE_CALL
with both '\saved' and '\ret' enabled, with two minor things though:

1) The way to restore the structure pointer is different

The TDX_HYPERCALL uses RCX as spare to restore the structure pointer,
but the TDX_MODULE_CALL assumes no spare register can be used. In other
words, TDX_MODULE_CALL already covers what TDX_HYPERCALL does.

2) TDX_MODULE_CALL only clears shared registers for TDH.VP.ENTER

For this just need to make that code available for the non-host case.

Thus, remove the TDX_HYPERCALL and reimplement the __tdx_hypercall()
using the TDX_MODULE_CALL.

Extend the TDX_MODULE_CALL to cover "clear shared registers" for
TDG.VP.VMCALL. Introduce a new __tdcall_saved_ret() to replace the
temporary __tdcall_hypercall().

The __tdcall_saved_ret() can also be used for those new TDCALLs which
require more input/output registers than the basic TDCALLs do.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/e68a2473fb6f5bcd78b078cae7510e9d0753b3df.1692096753.git.kai.huang%40intel.com

x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL

Now the 'struct tdx_hypercall_args' and 'struct tdx_module_args' are almost the same, and the TDX_HYPERCALL and TDX_MODULE_CALL asm macro share similar code pattern too. The __tdx_hypercall() and __tdcall() should be unified to use the same assembly code.

As a preparation to unify them, simplify the TDX_HYPERCALL to make it more like the TDX_MODULE_CALL.

The TDX_HYPERCALL takes the pointer of 'struct tdx_hypercall_args' as function call argument, and does below extra things comparing to the TDX_MODULE_CALL:

1) It sets RAX to 0 (TDG.VP.VMCALL leaf) internally;
2) It sets RCX to the (fixed) bitmap of shared registers internally;
3) It calls __tdx_hypercall_failed() internally (and panics) when the TDCALL instruction itself fails;
4) After TDCALL, it moves R10 to RAX to return the return code of the VMCALL leaf, regardless the '\ret' asm macro argument;

Firstly, change the TDX_HYPERCALL to take the same function call arguments as the TDX_MODULE_CALL does: TDCALL leaf ID, and the pointer to 'struct tdx_module_args'. Then 1) and 2) can be moved to the caller:

 - TDG.VP.VMCALL leaf ID can be passed via the function call argument;
 - 'struct tdx_module_args' is 'struct tdx_hypercall_args' + RCX, thus the bitmap of shared registers can be passed via RCX in the structure.

Secondly, to move 3) and 4) out of assembly, make the TDX_HYPERCALL always save output registers to the structure. The caller then can:

 - Call __tdx_hypercall_failed() when TDX_HYPERCALL returns error;
 - Return R10 in the structure as the return code of the VMCALL leaf;

With above changes, change the asm function from __tdx_hypercall() to __tdcall_hypercall(), and reimplement __tdx_hypercall() as the C wrapper of it. This avoids having to add another wrapper of __tdx_hypercall() (_tdx_hypercall() is already taken).

The __tdcall_hypercall() will be replaced with a __tdcall() variant using TDX_MODULE_CALL in a later commit as the final goal is to have one assembly to handle both TDCALL and TDVMCALL.

Currently, the __tdx_hypercall() asm is in '.noinstr.text'. To keep this unchanged, annotate __tdx_hypercall(), which is a C function now, as 'noinstr'.

Remove the __tdx_hypercall_ret() as __tdx_hypercall() already does so.

Implement __tdx_hypercall() in tdx-shared.c so it can be shared with the compressed code.

Opportunistically fix a checkpatch error complaining using space around parenthesis '(' and ')' while moving the bitmap of shared registers to <asm/shared/tdx.h>.

[ dhansen: quash new calls of __tdx_hypercall_ret() that showed up ]

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/0cbf25e7aee3256288045023a31f65f0cef90af4.1692096753.git.kai.huang%40intel.com

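The division of labour described in items 1) to 4) of the commit above can be illustrated with a small user-space sketch. It is not the kernel's code: every `sketch_` name is hypothetical, the structure layout and the bitmap value are assumptions, and the TDCALL itself is replaced by a stand-in function because the instruction cannot run outside a TDX guest. The kernel panics when the TDCALL instruction fails; the sketch only counts the failure so that it stays runnable.

```c
#include <stdint.h>

/* Assumed layout, modelled on the registers the commit message names. */
struct sketch_module_args {
	uint64_t rcx, rdx, r8, r9, r10, r11;
	uint64_t r12, r13, r14, r15, rbx, rdi, rsi;
};

#define SKETCH_TDG_VP_VMCALL	0		/* RAX = 0 selects TDG.VP.VMCALL */
#define SKETCH_SHARED_REGS	0xfc00ULL	/* placeholder bitmap, not the kernel's value */

/* Knobs for the stand-in below so both outcomes can be exercised. */
static uint64_t sketch_tdcall_status;	/* what the TDCALL instruction "returns" */
static uint64_t sketch_vmcall_r10;	/* what the VMM "returns" in R10 */
static unsigned int sketch_failures;	/* how often the failure path ran */

/* Stand-in for the asm __tdcall_hypercall(): it always saves the outputs. */
static uint64_t sketch_tdcall_hypercall(uint64_t fn,
					struct sketch_module_args *args)
{
	(void)fn;
	args->r10 = sketch_vmcall_r10;
	return sketch_tdcall_status;
}

/* The kernel panics at this point; the sketch only records the event. */
static void sketch_hypercall_failed(void)
{
	sketch_failures++;
}

/* C wrapper in the spirit of the reimplemented __tdx_hypercall(). */
static uint64_t sketch_tdx_hypercall(struct sketch_module_args *args)
{
	/* Item 2: the caller places the shared-register bitmap in RCX. */
	args->rcx = SKETCH_SHARED_REGS;

	/* Item 1: the leaf ID is an argument. Item 3: failure handled in C. */
	if (sketch_tdcall_hypercall(SKETCH_TDG_VP_VMCALL, args))
		sketch_hypercall_failed();

	/* Item 4: R10 in the structure is the VMCALL return code. */
	return args->r10;
}
```

Because the stand-in always writes its outputs back into the structure, the wrapper needs no separate "_ret" variant, which mirrors the reason the commit gives for removing __tdx_hypercall_ret().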
x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs

The TDX guest live migration support (TDX 1.5) adds new TDCALL/SEAMCALL leaf functions. Those new TDCALLs/SEAMCALLs take additional registers for input (R10-R13) and output (R12-R13). TDG.SERVTD.RD is an example.

Also, the current TDX_MODULE_CALL doesn't aim to handle TDH.VP.ENTER SEAMCALL, which monitors the TDG.VP.VMCALL in input/output registers when it returns in case of VMCALL from TDX guest.

With those new TDCALLs/SEAMCALLs and the TDH.VP.ENTER covered, the TDX_MODULE_CALL macro basically needs to handle the same input/output registers as the TDX_HYPERCALL does. And as a result, they also share similar logic in the assembly, thus should be unified to use one common assembly.

Extend the TDX_MODULE_CALL asm to support the new TDCALLs/SEAMCALLs and also the TDH.VP.ENTER SEAMCALL. Eventually it will be unified with the TDX_HYPERCALL.

The new input/output registers fit with the "callee-saved" registers in the x86 calling convention. Add a new "saved" parameter to support those new TDCALLs/SEAMCALLs and TDH.VP.ENTER and keep the existing TDCALLs/SEAMCALLs minimally impacted.

For TDH.VP.ENTER, after it returns the registers shared by the guest contain guest's values. Explicitly clear them to prevent speculative use of guest's values.

Note most TDX live migration related SEAMCALLs may also clobber AVX* state ("AVX, AVX2 and AVX512 state: may be reset to the architectural INIT state" -- see TDH.EXPORT.MEM for example). And TDH.VP.ENTER also clobbers XMM0-XMM15 when the corresponding bit is set in RCX. Don't handle them in the TDX_MODULE_CALL macro but let the caller save and restore when needed.

This is basically based on Peter's code.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/d4785de7c392f7c5684407f6c24a73b92148ec49.1692096753.git.kai.huang%40intel.com

x86/tdx: Pass TDCALL/SEAMCALL input/output registers via a structure

Currently, the TDX_MODULE_CALL asm macro, which handles both TDCALL and SEAMCALL, takes one parameter for each input register and an optional 'struct tdx_module_output' (a collection of output registers) as output. This is different from the TDX_HYPERCALL macro which uses a single 'struct tdx_hypercall_args' to carry all input/output registers.

The newer TDX versions introduce more TDCALLs/SEAMCALLs which use more input/output registers. Also, the TDH.VP.ENTER (which isn't covered by the current TDX_MODULE_CALL macro) basically can use all registers that the TDX_HYPERCALL does. The current TDX_MODULE_CALL macro isn't extendible to cover those cases.

Similar to the TDX_HYPERCALL macro, simplify the TDX_MODULE_CALL macro to use a single structure 'struct tdx_module_args' to carry all the input/output registers. Currently, R10/R11 are only used as output register but not as input by any TDCALL/SEAMCALL. Change to also use R10/R11 as input register to make input/output registers symmetric.

Currently, the TDX_MODULE_CALL macro depends on the caller to pass a non-NULL 'struct tdx_module_output' to get additional output registers. Similar to the TDX_HYPERCALL macro, change the TDX_MODULE_CALL macro to take a new 'ret' macro argument to indicate whether to save the output registers to the 'struct tdx_module_args'. Also introduce a new __tdcall_ret() for that purpose, similar to the __tdx_hypercall_ret().

Note the tdcall(), which is a wrapper of __tdcall(), is called by three callers: tdx_parse_tdinfo(), tdx_get_ve_info() and tdx_early_init(). The former two need the additional output but the last one doesn't. For simplicity, make tdcall() always call __tdcall_ret() to avoid another "_ret()" wrapper. The last caller tdx_early_init() isn't performance critical anyway.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/483616c1762d85eb3a3c3035a7de061cfacf2f14.1692096753.git.kai.huang%40intel.com

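The effect of the 'ret' macro argument introduced by the commit above can be pictured with a short C model. This is only a sketch under stated assumptions: the real TDX_MODULE_CALL is an assembly macro, the `sketch_` names and the reduced structure are hypothetical, and the TDX module is replaced by a stand-in that always succeeds and produces arbitrary output values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced, assumed register set; the kernel structure carries more. */
struct sketch_args {
	uint64_t rcx, rdx, r8, r9, r10, r11;
};

/* Stand-in for the TDX module: succeeds and alters two "registers". */
static uint64_t sketch_fake_module(uint64_t fn, struct sketch_args *regs)
{
	regs->rcx = fn + 1;	/* arbitrary output derived from the leaf ID */
	regs->r10 += 1;		/* R10 acts as both input and output */
	return 0;		/* status that would come back in RAX */
}

/*
 * Model of the macro body: load the input registers from the structure,
 * make the call, and write the outputs back only when 'ret' is set.
 */
static uint64_t sketch_module_call(uint64_t fn, struct sketch_args *args,
				   bool ret)
{
	struct sketch_args regs = *args;	/* inputs come from the structure */
	uint64_t status = sketch_fake_module(fn, &regs);

	if (ret)
		*args = regs;			/* outputs go back to the structure */
	return status;
}

/* Counterparts of __tdcall() and __tdcall_ret() in this model. */
static uint64_t sketch_tdcall(uint64_t fn, struct sketch_args *args)
{
	return sketch_module_call(fn, args, false);
}

static uint64_t sketch_tdcall_ret(uint64_t fn, struct sketch_args *args)
{
	return sketch_module_call(fn, args, true);
}
```

In this model a caller that ignores the outputs can use either variant, which is the reasoning the commit applies when it routes every tdcall() user through __tdcall_ret(): the write-back costs a little, and it avoids maintaining a second wrapper.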
x86/tdx: Rename __tdx_module_call() to __tdcall()

__tdx_module_call() is only used by the TDX guest to issue TDCALL to the TDX module. Rename it to __tdcall() to match its behaviour, e.g., it cannot be used to make host-side SEAMCALL.

Also rename tdx_module_call() which is a wrapper of __tdx_module_call() to tdcall().

No functional change intended.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/785d20d99fbcd0db8262c94da6423375422d8c75.1692096753.git.kai.huang%40intel.com