| b7d21739 | 02-Apr-2026 |
Vishal Verma <vishal.l.verma@intel.com> |
x86/virt/tdx: Add SEAMCALL wrapper for TDH.SYS.DISABLE
Some early TDX-capable platforms have an erratum where a partial write to TDX private memory can cause a machine check on a subsequent read. On these platforms, kexec and kdump have been disabled, because the old kernel cannot safely hand off TDX state to the new kernel. Later TDX modules support the TDH.SYS.DISABLE SEAMCALL, which provides a way to cleanly disable TDX and allow kexec to proceed.
The new SEAMCALL has an enumeration bit, but that is ignored. It is expected that users will be using the latest TDX module, and the failure mode for running the missing SEAMCALL on an older module is not fatal.
This can be a long-running operation, and the time needed largely depends on the amount of memory that has been allocated to TDs. If all TDs have been destroyed prior to the sys_disable call, then it is fast, only needing to overwrite the TDX module memory.
After the SEAMCALL completes, the TDX module is disabled and all memory resources allocated to TDX are freed and reset. The next kernel can then re-initialize the TDX module from scratch via the normal TDX bring-up sequence.
The SEAMCALL can return two different error codes that expect a retry:
- TDX_INTERRUPTED_RESUMABLE can be returned in the case of a host interrupt. However, it will not return until it makes some forward progress, so we can expect it to complete even in the case of interrupt storms.
- TDX_SYS_BUSY will be returned on contention with other TDH.SYS.* SEAMCALLs. However, a side effect of TDH.SYS.DISABLE is that it blocks other SEAMCALLs once it gets going, so this contention will be short-lived.
So loop infinitely on either of these error codes, until success or other error.
An error is printed if the SEAMCALL fails with anything other than the error codes that cause retries, or the 'synthesized' error codes produced for #GP or #UD. For example, a properly initialized old module that doesn't implement TDH.SYS.DISABLE returns TDX_OPERAND_INVALID. This prints:
virt/tdx: TDH.SYS.DISABLE failed: 0xc000010000000000
But a system that doesn't have any TDX support at all doesn't print anything.
Co-developed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kiryl Shutsemau (Meta) <kas@kernel.org>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://patch.msgid.link/20260402-fuller_tdx_kexec_support-v3-3-34438d7094bf@intel.com
| 28bcd8d8 | 03-Mar-2026 |
Xiaoyao Li <xiaoyao.li@intel.com> |
x86/tdx: Rename TDX_ATTR_* to TDX_TD_ATTR_*
The macros TDX_ATTR_* and DEF_TDX_ATTR_* are related to TD attributes, which are TD-scope attributes. Naming them as TDX_ATTR_* can be somewhat confusing and might mislead people into thinking they are TDX global things.
Rename TDX_ATTR_* to TDX_TD_ATTR_* to explicitly clarify they are TD-scope things.
Suggested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Kiryl Shutsemau <kas@kernel.org>
Acked-by: Sean Christopherson <seanjc@google.com>
Link: https://patch.msgid.link/20260303030335.766779-4-xiaoyao.li@intel.com
| 04733836 | 27-Feb-2025 |
Isaku Yamahata <isaku.yamahata@intel.com> |
KVM: TDX: Handle TDG.VP.VMCALL<GetTdVmCallInfo> hypercall
Implement the TDG.VP.VMCALL<GetTdVmCallInfo> hypercall. If the input value is zero, return the success code and zeros in the output registers.
The TDG.VP.VMCALL<GetTdVmCallInfo> hypercall is a sub-leaf of TDG.VP.VMCALL used to enumerate which TDG.VP.VMCALL sub-leaves are supported. This hypercall is for future enhancements of the Guest-Host-Communication Interface (GHCI) specification. GHCI version 344426-001US defines it to require input R12 to be zero and to return zero in output registers R11, R12, R13, and R14, so that the guest TD enumerates no enhancements.
Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
Signed-off-by: Binbin Wu <binbin.wu@linux.intel.com>
Message-ID: <20250227012021.1778144-12-binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| 564ea84c | 02-Dec-2024 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Dump attributes and TD_CTLS on boot
Dump TD configuration on boot. Attributes and TD_CTLS define TD behavior. This information is useful for tracking down bugs.
The output ends up looking like this in practice:
[ 0.000000] tdx: Guest detected
[ 0.000000] tdx: Attributes: SEPT_VE_DISABLE
[ 0.000000] tdx: TD_CTLS: PENDING_VE_DISABLE ENUM_TOPOLOGY VIRT_CPUID2 REDUCE_VE
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Link: https://lore.kernel.org/all/20241202072458.447455-1-kirill.shutemov%40linux.intel.com
| 7ae15e2f | 04-Nov-2024 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Enable CPU topology enumeration
TDX 1.0 defines the baseline behaviour of the TDX guest platform. TDX 1.0 generates a #VE when accessing the topology-related CPUID leaves (0xB and 0x1F) and the X2APIC_APICID MSR. The kernel returns all zeros for the CPUID topology leaves. In practice, this means that the kernel can only boot with a plain topology; anything more complicated will cause problems.
The ENUM_TOPOLOGY feature allows the VMM to provide topology information to the guest. Enabling the feature eliminates topology-related #VEs: the TDX module virtualizes accesses to the CPUID leaves and the MSR.
Enable ENUM_TOPOLOGY if it is available.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/all/20241104103803.195705-5-kirill.shutemov%40linux.intel.com
| f65aa0ad | 04-Nov-2024 |
Kirill A. Shutemov <kirill.shutemov@linux.intel.com> |
x86/tdx: Dynamically disable SEPT violations from causing #VEs
Memory access #VEs are hard for Linux to handle in contexts like the entry code or NMIs. But other OSes need them for functionality. There's a static (pre-guest-boot) way for a VMM to choose one or the other. But VMMs don't always know which OS they are booting, so they choose to deliver those #VEs so the "other" OSes will work. That, unfortunately, has left us in the lurch and exposed to these hard-to-handle #VEs.
The TDX module has introduced a new feature. Even if the static configuration is set to "send nasty #VEs", the kernel can dynamically request that they be disabled. Once they are disabled, access to private memory that is not in the Mapped state in the Secure-EPT (SEPT) will result in an exit to the VMM rather than injecting a #VE.
Check if the feature is available and disable SEPT #VE if possible.
If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE attribute is no longer reliable. It reflects the initial state of the control for the TD, but it will not be updated if someone (e.g. the bootloader) changes it before the kernel starts. The kernel must check the TDCS_TD_CTLS bit to determine whether SEPT #VEs are enabled or disabled.
[ dhansen: remove 'return' at end of function ]
Fixes: 373e715e31bf ("x86/tdx: Panic on bad configs that #VE on "private" memory access")
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/all/20241104103803.195705-4-kirill.shutemov%40linux.intel.com
| cf72bc48 | 08-Dec-2023 |
Kai Huang <kai.huang@intel.com> |
x86/virt/tdx: Get module global metadata for module initialization
The TDX module global metadata provides system-wide information about the module.
TL;DR:
Use the TDH.SYS.RD SEAMCALL to tell if the module is good or not.
Long Version:
1) Only initialize TDX module with version 1.5 and later
TDX module 1.0 has some compatibility issues with later versions of the module, as documented in the "Intel TDX module ABI incompatibilities between TDX1.0 and TDX1.5" spec. Don't bother with module versions that do not have a stable ABI.
2) Get the essential global metadata for module initialization
TDX reports a list of "Convertible Memory Regions" (CMRs) to tell the kernel which memory is TDX compatible. The kernel needs to build a list of memory regions (out of CMRs) as "TDX-usable" memory and pass them to the TDX module. The kernel does this by constructing a list of "TD Memory Regions" (TDMRs) to cover all these memory regions and passing them to the TDX module.
Each TDMR is a TDX architectural data structure containing the memory region that the TDMR covers, plus the information to track (within this TDMR):
 a) the "Physical Address Metadata Table" (PAMT) to track each TDX memory page's status (such as which TDX guest "owns" a given page), and
 b) the "reserved areas" to describe memory holes that cannot be used as TDX memory.
The kernel needs to get the following metadata from the TDX module to build the list of TDMRs:
 a) the maximum number of supported TDMRs,
 b) the maximum number of supported reserved areas per TDMR, and
 c) the PAMT entry size for each TDX-supported page size.
== Implementation ==
The TDX module has two modes of fetching the metadata: one field at a time, or all in one blob. Use the field at a time for now. It is slower, but there just are not enough fields now to justify the complexity of extra unpacking.
The err_free_tdxmem=>out_put_tdxmem goto looks wonky by itself. But it is the first of a bunch of error handling that will get stuck at its site.
[ dhansen: clean up changelog and add a struct to map between the TDX module fields and 'struct tdx_tdmr_sysinfo' ]
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/20231208170740.53979-8-dave.hansen%40intel.com
| 518755a7 | 18-Sep-2023 |
Kai Huang <kai.huang@intel.com> |
x86/tdx: Fix __noreturn build warning around __tdx_hypercall_failed()
LKP reported the following build warning:
vmlinux.o: warning: objtool: __tdx_hypercall+0x128: __tdx_hypercall_failed() is missing a __noreturn annotation
The __tdx_hypercall_failed() function definition already has the __noreturn annotation, but it turns out the __noreturn annotation must be on the function declaration.
PeterZ explains:
"FWIW, the reason being that...
The point of noreturn is that the caller should know to stop generating code. For that the declaration needs the attribute, because call sites typically do not have access to the function definition in C."
Add __noreturn annotation to the declaration of __tdx_hypercall_failed() to fix. It's not a bad idea to document the __noreturn nature at the definition site either, so keep the annotation at the definition.
Note that <asm/shared/tdx.h> is also included by TDX-related assembly files. Include <linux/compiler_attributes.h> only in the !__ASSEMBLY__ case; otherwise compiling the assembly files would trigger a build error.
Also, following the objtool documentation, add __tdx_hypercall_failed() to "tools/objtool/noreturns.h".
Fixes: c641cfb5c157 ("x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230918041858.331234-1-kai.huang@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202309140828.9RdmlH2Z-lkp@intel.com/
| 8a8544bd | 15-Aug-2023 |
Kai Huang <kai.huang@intel.com> |
x86/tdx: Remove 'struct tdx_hypercall_args'
Now 'struct tdx_hypercall_args' is basically 'struct tdx_module_args' minus RCX. Although from __tdx_hypercall()'s perspective RCX isn't used as a shared register and thus isn't part of the input/output registers, it's not worth having a separate structure just because of one register.
Remove the 'struct tdx_hypercall_args' and use 'struct tdx_module_args' instead in __tdx_hypercall() related code. This also saves the memory copy between the two structures within __tdx_hypercall().
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/798dad5ce24e9d745cf0e16825b75ccc433ad065.1692096753.git.kai.huang%40intel.com