Using XSTATE features in user space applications
================================================

The x86 architecture supports floating-point extensions which are
enumerated via CPUID. Applications consult CPUID and use XGETBV to
evaluate which features have been enabled by the kernel in XCR0.

Up to AVX-512 and PKRU states, these features are automatically enabled by
the kernel if available. Features like AMX TILE_DATA (XSTATE component 18)
are enabled by XCR0 as well, but the first use of a related instruction is
trapped by the kernel because, by default, the required large XSTATE
buffers are not allocated automatically.

The purpose of dynamic features
-------------------------------

Legacy userspace libraries often have hard-coded, static sizes for
alternate signal stacks, commonly MINSIGSTKSZ, which is typically 2KB.
That stack must be able to store at *least* the signal frame that the
kernel sets up before jumping into the signal handler. That signal frame
must include an XSAVE buffer defined by the CPU.

However, that means that the size of signal stacks is dynamic, not static,
because different CPUs have differently-sized XSAVE buffers. A compiled-in
size of 2KB with existing applications is too small for new CPU features
like AMX. Instead of universally requiring a larger stack, dynamic
enabling lets the kernel enforce that userspace applications have
properly-sized altstacks.

Using dynamically enabled XSTATE features in user space applications
--------------------------------------------------------------------

The kernel provides an arch_prctl(2) based mechanism for applications to
request the usage of such features. The arch_prctl(2) options related to
this are:

-ARCH_GET_XCOMP_SUPP

 arch_prctl(ARCH_GET_XCOMP_SUPP, &features);

 ARCH_GET_XCOMP_SUPP stores the supported features in userspace storage of
 type uint64_t. The second argument is a pointer to that storage.

-ARCH_GET_XCOMP_PERM

 arch_prctl(ARCH_GET_XCOMP_PERM, &features);

 ARCH_GET_XCOMP_PERM stores the features for which the userspace process
 has permission in userspace storage of type uint64_t. The second argument
 is a pointer to that storage.

-ARCH_REQ_XCOMP_PERM

 arch_prctl(ARCH_REQ_XCOMP_PERM, feature_nr);

 ARCH_REQ_XCOMP_PERM allows an application to request permission for a
 dynamically enabled feature or a feature set. A feature set can be mapped
 to a facility, e.g. AMX, and can require one or more XSTATE components to
 be enabled.

 The feature_nr argument is the number of the highest XSTATE component
 which is required for a facility to work.

When requesting permission for a feature, the kernel checks its
availability. The kernel ensures that sigaltstacks in the process's tasks
are large enough to accommodate the resulting large signal frame. It
enforces this both during ARCH_REQ_XCOMP_PERM and during any subsequent
sigaltstack(2) calls. If an installed sigaltstack is smaller than the
resulting sigframe size, ARCH_REQ_XCOMP_PERM fails with -ENOSPC. Also,
sigaltstack(2) fails with -ENOMEM if the requested altstack is too small
for the permitted features.

Permission, when granted, is valid per process. Permissions are inherited
on fork(2) and cleared on exec(3).

The first use of an instruction related to a dynamically enabled feature is
trapped by the kernel. The trap handler checks whether the process has
permission to use the feature. If the process has no permission, the
kernel sends SIGILL to the application. If the process has permission, the
handler allocates a larger xstate buffer for the task so the large state
can be context switched. In the unlikely case that the allocation fails,
the kernel sends SIGSEGV.

AMX TILE_DATA enabling example
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Below is an example of how userspace applications enable
TILE_DATA dynamically:

  1. The application first needs to query the kernel for AMX
     support::

        #include <asm/prctl.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <sys/syscall.h>
        #include <unistd.h>

        #ifndef ARCH_GET_XCOMP_SUPP
        #define ARCH_GET_XCOMP_SUPP     0x1021
        #endif

        /* XSTATE component numbers of the two AMX components */
        #ifndef ARCH_XCOMP_TILECFG
        #define ARCH_XCOMP_TILECFG      17
        #endif

        #ifndef ARCH_XCOMP_TILEDATA
        #define ARCH_XCOMP_TILEDATA     18
        #endif

        /* AMX is usable only if both components are supported */
        #define MASK_XCOMP_TILE ((1 << ARCH_XCOMP_TILECFG) | \
                                 (1 << ARCH_XCOMP_TILEDATA))

        uint64_t features;      /* filled in by the kernel */
        long rc;

        ...

        rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_SUPP, &features);

        if (!rc && (features & MASK_XCOMP_TILE) == MASK_XCOMP_TILE)
                printf("AMX is available.\n");

  2. After determining that AMX is supported, the application must
     explicitly ask for permission to use it::

        #ifndef ARCH_REQ_XCOMP_PERM
        #define ARCH_REQ_XCOMP_PERM     0x1023
        #endif

        ...

        /* Request the AMX facility by its highest component, TILE_DATA */
        rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, ARCH_XCOMP_TILEDATA);

        if (!rc)
                printf("AMX is ready for use.\n");

Note that this example does not include the sigaltstack preparation.

Dynamic features in signal frames
---------------------------------

Dynamically enabled features are not written to the signal frame upon signal
entry if the feature is in its initial configuration. This differs from
non-dynamic features, which are always written regardless of their
configuration. Signal handlers can examine the XSAVE buffer's XSTATE_BV
field to determine if a feature was written.

Dynamic features for virtual machines
-------------------------------------

The permission for guest state components needs to be managed separately
from the host permission, as the two are independent of each other.
A couple of options are provided to control the guest permission:

-ARCH_GET_XCOMP_GUEST_PERM

 arch_prctl(ARCH_GET_XCOMP_GUEST_PERM, &features);

 ARCH_GET_XCOMP_GUEST_PERM is a variant of ARCH_GET_XCOMP_PERM. It
 provides the same semantics and functionality, but for the guest
 components.

-ARCH_REQ_XCOMP_GUEST_PERM

 arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM, feature_nr);

 ARCH_REQ_XCOMP_GUEST_PERM is a variant of ARCH_REQ_XCOMP_PERM. It has the
 same semantics, applied to the guest permission. While it provides similar
 functionality, it comes with a constraint: the permission is frozen when
 the first VCPU is created, and any attempt to change it after that point
 is rejected. The permission therefore has to be requested before the
 first VCPU is created.

Note that some VMMs may have already established a set of supported state
components. These options are not presumed to support any particular VMM.
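The guest-permission options described above can be sketched in C as a
small VMM-side helper. This is a minimal, hypothetical sketch, not code from
any particular VMM: the names xcomp_has_amx(), xcomp_prctl() and
vmm_enable_guest_amx() are illustrative, and the fallback option values
mirror those in <asm/prctl.h> on x86-64 so the sketch also builds against
older headers. It checks that both AMX components are supported, requests
guest permission for component 18 (TILE_DATA), and reads the guest
permission back to confirm it.

.. code-block:: c

    #include <errno.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Option values as found in <asm/prctl.h> on x86-64 (fallbacks only). */
    #ifndef ARCH_GET_XCOMP_SUPP
    #define ARCH_GET_XCOMP_SUPP        0x1021
    #endif
    #ifndef ARCH_GET_XCOMP_GUEST_PERM
    #define ARCH_GET_XCOMP_GUEST_PERM  0x1024
    #endif
    #ifndef ARCH_REQ_XCOMP_GUEST_PERM
    #define ARCH_REQ_XCOMP_GUEST_PERM  0x1025
    #endif
    #ifndef ARCH_XCOMP_TILECFG
    #define ARCH_XCOMP_TILECFG         17
    #endif
    #ifndef ARCH_XCOMP_TILEDATA
    #define ARCH_XCOMP_TILEDATA        18
    #endif

    #define MASK_XCOMP_TILE ((UINT64_C(1) << ARCH_XCOMP_TILECFG) | \
                             (UINT64_C(1) << ARCH_XCOMP_TILEDATA))

    /* True if both AMX components are set in an XSTATE feature bitmap. */
    static bool xcomp_has_amx(uint64_t features)
    {
            return (features & MASK_XCOMP_TILE) == MASK_XCOMP_TILE;
    }

    /* Hypothetical wrapper: 0 on success, -1 with errno set on failure. */
    static long xcomp_prctl(int option, unsigned long arg)
    {
    #if defined(__x86_64__) && defined(SYS_arch_prctl)
            return syscall(SYS_arch_prctl, option, arg);
    #else
            /* Not an x86-64 build: these arch_prctl options do not exist. */
            (void)option;
            (void)arg;
            errno = ENOSYS;
            return -1;
    #endif
    }

    /*
     * Hypothetical VMM helper: request guest permission for AMX TILE_DATA.
     * It must run before the first VCPU is created, because the kernel
     * rejects permission changes after that point.
     * Returns 0 if guests may use AMX, -1 otherwise (errno is set).
     */
    static int vmm_enable_guest_amx(void)
    {
            uint64_t features = 0;

            if (xcomp_prctl(ARCH_GET_XCOMP_SUPP, (unsigned long)&features))
                    return -1;
            if (!xcomp_has_amx(features)) {
                    errno = EOPNOTSUPP;
                    return -1;
            }

            /* feature_nr is the highest component the facility needs. */
            if (xcomp_prctl(ARCH_REQ_XCOMP_GUEST_PERM, ARCH_XCOMP_TILEDATA))
                    return -1;

            /* Read the guest permission back to confirm both tile bits. */
            features = 0;
            if (xcomp_prctl(ARCH_GET_XCOMP_GUEST_PERM, (unsigned long)&features))
                    return -1;
            return xcomp_has_amx(features) ? 0 : -1;
    }

A VMM would call vmm_enable_guest_amx() once during setup, before issuing
its first KVM_CREATE_VCPU ioctl, and treat a -1 return as "AMX is not
available to guests".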