===================================================
Scalable Vector Extension support for AArch64 Linux
===================================================

Author: Dave Martin <Dave.Martin@arm.com>

Date: 4 August 2017

This document briefly outlines the interface provided to userspace by Linux in
order to support use of the ARM Scalable Vector Extension (SVE), including
interactions with Streaming SVE mode added by the Scalable Matrix Extension
(SME).

This is an outline of the most important features and issues only, and is not
intended to be exhaustive.

This document does not aim to describe the SVE architecture or programmer's
model. To aid understanding, a minimal description of relevant programmer's
model features for SVE is included in Appendix A.


1. General
-----------

* SVE registers Z0..Z31, P0..P15 and FFR, and the current vector length VL, are
  tracked per-thread.

* In streaming mode, FFR is not accessible unless HWCAP2_SME_FA64 is present
  in the system. When it is not supported and the interfaces described in this
  document are used to access FFR in streaming mode, FFR is read and written
  as zero.

* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
  AT_HWCAP entry. Presence of this flag implies the presence of the SVE
  instructions and registers, and the Linux-specific system interfaces
  described in this document. SVE is reported in /proc/cpuinfo as "sve".

* Support for the execution of SVE instructions in userspace can also be
  detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS
  instruction, and checking that the value of the SVE field is nonzero. [3]

  This check does not guarantee the presence of the system interfaces
  described in the following sections: software that needs to verify that
  those interfaces are present must check for HWCAP_SVE instead.

* On hardware that supports the SVE2 extensions, HWCAP2_SVE2 will also
  be reported in the AT_HWCAP2 aux vector entry. In addition to this,
  optional extensions to SVE2 may be reported by the presence of:

        HWCAP2_SVE2
        HWCAP2_SVEAES
        HWCAP2_SVEPMULL
        HWCAP2_SVEBITPERM
        HWCAP2_SVESHA3
        HWCAP2_SVESM4
        HWCAP2_SVE2P1

  This list may be extended over time as the SVE architecture evolves.

  These extensions are also reported via the CPU ID register ID_AA64ZFR0_EL1,
  which userspace can read using an MRS instruction. See elf_hwcaps.txt and
  cpu-feature-registers.txt for details.

* On hardware that supports the SME extensions, HWCAP2_SME will also be
  reported in the AT_HWCAP2 aux vector entry. Among other things, SME adds
  streaming mode, which provides a subset of the SVE feature set using a
  separate SME vector length and the same Z/V registers. See sme.rst
  for more details.

* Debuggers should restrict themselves to interacting with the target via the
  NT_ARM_SVE regset. The recommended way of detecting support for this regset
  is to connect to a target process first and then attempt a
  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). Note that when SME is
  present and streaming SVE mode is in use, the FPSIMD subset of registers
  will be read via NT_ARM_SVE, and NT_ARM_SVE writes will exit streaming mode
  in the target.

* Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
  between userspace and the kernel, the register value is encoded in memory in
  an endianness-invariant layout, with bits [(8 * i + 7) : (8 * i)] encoded at
  byte offset i from the start of the memory representation.
  This affects, for example, the signal frame (struct sve_context) and the
  ptrace interface (struct user_sve_header) and associated data.

  Beware that on big-endian systems this results in a different byte order than
  for the FPSIMD V-registers, which are stored as single host-endian 128-bit
  values, with bits [(127 - 8 * i) : (120 - 8 * i)] of the register encoded at
  byte offset i (struct fpsimd_context, struct user_fpsimd_state).


2. Vector length terminology
-----------------------------

The size of an SVE vector (Z) register is referred to as the "vector length".

To avoid confusion about the units used to express vector length, the kernel
adopts the following conventions:

* Vector length (VL) = size of a Z-register in bytes

* Vector quadwords (VQ) = size of a Z-register in units of 128 bits

(So, VL = 16 * VQ.)

The VQ convention is used where the underlying granularity is important, such
as in data structure definitions. In most other situations, the VL convention
is used. This is consistent with the meaning of the "VL" pseudo-register in
the SVE instruction set architecture.


3. System call behaviour
-------------------------

* On syscall, V0..V31 are preserved (as without SVE). Thus, bits [127:0] of
  Z0..Z31 are preserved. All other bits of Z0..Z31, and all of P0..P15 and
  FFR, become zero on return from a syscall.

* The SVE registers are not used to pass arguments to or receive results from
  any syscall.

* All other SVE state of a thread, including the currently configured vector
  length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector
  length (if any), is preserved across all syscalls, subject to the specific
  exceptions for execve() described in sections 6 and 9.

  In particular, on return from a fork() or clone(), the parent and new child
  process or thread share identical SVE configuration, matching that of the
  parent before the call.


4. Signal handling
-------------------

* A new signal frame record sve_context encodes the SVE registers on signal
  delivery. [1]

* This record is supplementary to fpsimd_context. The FPSR and FPCR registers
  are only present in fpsimd_context. For convenience, the content of V0..V31
  is duplicated between sve_context and fpsimd_context.

* The record contains a flags field (sve_context.flags) which includes the
  flag SVE_SIG_FLAG_SM. If set, this flag indicates that the thread is in
  streaming mode, and that the vector length and register data (if present)
  describe the streaming SVE vector length and data.

* The signal frame record for SVE always contains basic metadata, in particular
  the thread's vector length (in sve_context.vl).

* The SVE registers may or may not be included in the record, depending on
  whether the registers are live for the thread. The registers are present if
  and only if::

      sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl))

* If the registers are present, the remainder of the record has a vl-dependent
  size and layout. Macros SVE_SIG_* are defined [1] to facilitate access to
  the members.

* Each scalable register (Zn, Pn, FFR) is stored in an endianness-invariant
  layout, with bits [(8 * i + 7) : (8 * i)] stored at byte offset i from the
  start of the register's representation in memory.

* If the SVE context is too big to fit in sigcontext.__reserved[], then extra
  space is allocated on the stack, and an extra_context record referencing
  this space is written in __reserved[]. sve_context is then written in the
  extra space. Refer to [1] for further details about this mechanism.
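
The presence rule and the byte layout described above can be illustrated with
a small, self-contained C model. This is a sketch under stated assumptions
rather than the kernel's definition: the model_* names are hypothetical, the
16-byte record header is an assumption, and the register data is assumed to be
laid out as Z0..Z31, then P0..P15, then FFR, with no padding. Real code should
use sve_vq_from_vl() and the SVE_SIG_* macros [1] instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of the sve_context record, for
 * illustration only. Assumptions (not taken from the kernel headers):
 *   - the record header occupies 16 bytes before the register data;
 *   - register data is Z0..Z31, then P0..P15, then FFR, with no padding.
 * Real code should use sve_vq_from_vl() and the SVE_SIG_* macros instead.
 */
#define MODEL_VQ_BYTES     16u /* one quadword: 128 bits */
#define MODEL_HEADER_BYTES 16u /* assumed header size */
#define MODEL_NUM_ZREGS    32u
#define MODEL_NUM_PREGS    16u

/* VL (bytes) to VQ (128-bit units): VL = 16 * VQ. */
static unsigned model_vq_from_vl(unsigned vl)
{
	assert(vl % MODEL_VQ_BYTES == 0);
	return vl / MODEL_VQ_BYTES;
}

/* Size of a record that includes register data, for a given VQ. */
static size_t model_context_size(unsigned vq)
{
	size_t zreg = (size_t)vq * MODEL_VQ_BYTES;       /* VL bytes */
	size_t preg = (size_t)vq * (MODEL_VQ_BYTES / 8); /* VL / 8 bytes */

	/* 32 Z registers, then 16 P registers plus FFR. */
	return MODEL_HEADER_BYTES + MODEL_NUM_ZREGS * zreg +
	       (MODEL_NUM_PREGS + 1) * preg;
}

/* Mirrors the rule: registers present iff head.size >= context size. */
static int model_has_regs(uint32_t head_size, uint16_t vl)
{
	return head_size >= model_context_size(model_vq_from_vl(vl));
}

/*
 * Endianness-invariant layout: bits [8i+7:8i] live at byte offset i,
 * so register bit n is bit (n % 8) of byte (n / 8).
 */
static int model_reg_bit(const uint8_t *reg, unsigned n)
{
	return (reg[n / 8] >> (n % 8)) & 1;
}
```

With this model, a record for VL = 16 (VQ = 1) needs 562 bytes to carry
register data, so model_has_regs() is false for a metadata-only record whose
head.size is smaller than that. Keeping the size computation in one helper
mirrors the way the presence test is phrased in terms of a single
SVE_SIG_CONTEXT_SIZE() value.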


5. Signal return
-----------------

When returning from a signal handler:

* If there is no sve_context record in the signal frame, or if the record is
  present but contains no register data as described in the previous section,
  then the SVE registers/bits become non-live and take unspecified values.

* If sve_context is present in the signal frame and contains full register
  data, the SVE registers become live and are populated with the specified
  data. However, for backward compatibility reasons, bits [127:0] of Z0..Z31
  are always restored from the corresponding members of fpsimd_context.vregs[]
  and not from sve_context. The remaining bits are restored from sve_context.

* Inclusion of fpsimd_context in the signal frame remains mandatory,
  irrespective of whether sve_context is present or not.

* The vector length cannot be changed via signal return. If sve_context.vl in
  the signal frame does not match the current vector length, the signal return
  attempt is treated as illegal, resulting in a forced SIGSEGV.

* It is permitted to enter or leave streaming mode by setting or clearing
  the SVE_SIG_FLAG_SM flag, but applications should take care, when doing so,
  to ensure that sve_context.vl and any register data are appropriate for the
  vector length in the new mode.


6. prctl extensions
--------------------

Some new prctl() calls are added to allow programs to manage the SVE vector
length:

prctl(PR_SVE_SET_VL, unsigned long arg)

    Sets the vector length of the calling thread and related flags, where
    arg == vl | flags. Other threads of the calling process are unaffected.

    vl is the desired vector length, where sve_vl_valid(vl) must be true.

    flags:

        PR_SVE_VL_INHERIT

            Inherit the current vector length across execve(). Otherwise, the
            vector length is reset to the system default at execve(). (See
            Section 9.)

        PR_SVE_SET_VL_ONEXEC

            Defer the requested vector length change until the next execve()
            performed by this thread.

            The effect is equivalent to implicit execution of the following
            call immediately after the next execve() (if any) by the thread:

                prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC)

            This allows a new program to be launched with a different vector
            length, while avoiding runtime side effects in the caller.

            Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect
            immediately.


    Return value: a nonnegative value on success, or a negative value on error:
        EINVAL: SVE not supported, invalid vector length requested, or
            invalid flags.


    On success:

    * Either the calling thread's vector length, or the deferred vector length
      to be applied at the next execve() by the thread (depending on whether
      PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value
      supported by the system that is less than or equal to vl. If vl ==
      SVE_VL_MAX, the value set will be the largest value supported by the
      system.

    * Any previously outstanding deferred vector length change in the calling
      thread is cancelled.

    * The returned value describes the resulting configuration, encoded as for
      PR_SVE_GET_VL. The vector length reported in this value is the new
      current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not
      present in arg; otherwise, the reported vector length is the deferred
      vector length that will be applied at the next execve() by the calling
      thread.

    * Changing the vector length causes all of P0..P15 and FFR, and all bits of
      Z0..Z31 other than bits [127:0], to become unspecified. Calling
      PR_SVE_SET_VL with vl equal to the thread's current vector length, or
      calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC flag, does not
      constitute a change to the vector length for this purpose.


prctl(PR_SVE_GET_VL)

    Gets the vector length of the calling thread.

    The following flag may be OR-ed into the result:

        PR_SVE_VL_INHERIT

            Vector length will be inherited across execve().

    There is no way to determine whether there is an outstanding deferred
    vector length change (which would normally only be the case between a
    fork() or vfork() and the corresponding execve() in typical use).

    To extract the vector length from the result, bitwise AND it with
    PR_SVE_VL_LEN_MASK.

    Return value: a nonnegative value on success, or a negative value on error:
        EINVAL: SVE not supported.


7. ptrace extensions
---------------------

* New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with
  PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the
  streaming mode SVE registers and NT_ARM_SVE describes the
  non-streaming mode SVE registers.

  In this description a register set is referred to as being "live" when
  the target is in the appropriate streaming or non-streaming mode and is
  using data beyond the subset shared with the FPSIMD Vn registers.

  Refer to [2] for definitions.

The regset data starts with struct user_sve_header, containing:

    size

        Size of the complete regset, in bytes.
        This depends on vl and possibly on other things in the future.

        If a call to PTRACE_GETREGSET requests less data than the value of
        size, the caller can allocate a larger buffer and retry in order to
        read the complete regset.

    max_size

        Maximum size in bytes that the regset can grow to for the target
        thread. The regset won't grow bigger than this even if the target
        thread changes its vector length etc.

    vl

        Target thread's current vector length, in bytes.

    max_vl

        Maximum possible vector length for the target thread.

    flags

        at most one of

            SVE_PT_REGS_FPSIMD

                SVE registers are not live (GETREGSET) or are to be made
                non-live (SETREGSET).

                The payload is of type struct user_fpsimd_state, with the same
                meaning as for NT_PRFPREG, starting at offset
                SVE_PT_FPSIMD_OFFSET from the start of user_sve_header.

                Extra data might be appended in the future: the size of the
                payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags).

                vq should be obtained using sve_vq_from_vl(vl).

                or

            SVE_PT_REGS_SVE

                SVE registers are live (GETREGSET) or are to be made live
                (SETREGSET).

                The payload contains the SVE register data, starting at offset
                SVE_PT_SVE_OFFSET from the start of user_sve_header, and with
                size SVE_PT_SVE_SIZE(vq, flags).

        ... OR-ed with zero or more of the following flags, which have the same
        meaning and behaviour as the corresponding PR_SVE_VL_INHERIT and
        PR_SVE_SET_VL_ONEXEC flags:

            SVE_PT_VL_INHERIT

            SVE_PT_VL_ONEXEC (SETREGSET only).

        If neither the FPSIMD nor the SVE flag is provided, then no register
        payload is available; this is only possible when SME is implemented.


* The effects of changing the vector length and/or flags are equivalent to
  those documented for PR_SVE_SET_VL.

  The caller must make a further GETREGSET call if it needs to know what VL is
  actually set by SETREGSET, unless it is known in advance that the requested
  VL is supported.

* In the SVE_PT_REGS_SVE case, the size and layout of the payload depend on
  the header fields. The SVE_PT_SVE_*() macros are provided to facilitate
  access to the members.

* In either case, for SETREGSET it is permissible to omit the payload, in which
  case only the vector length and flags are changed (along with any
  consequences of those changes).

* In systems supporting SME, when the target is in streaming mode a GETREGSET
  for NT_ARM_SVE will return only the user_sve_header with no register data;
  similarly, a GETREGSET for NT_ARM_SSVE will not return any register data
  when the target is not in streaming mode.

* A GETREGSET for NT_ARM_SSVE will never return SVE_PT_REGS_FPSIMD.

* For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the
  requested VL is not supported, the effect will be the same as if the
  payload were omitted, except that an EIO error is reported. No
  attempt is made to translate the payload data to the correct layout
  for the vector length actually set. The thread's FPSIMD state is
  preserved, but the remaining bits of the SVE registers become
  unspecified. It is up to the caller to translate the payload layout
  for the actual VL and retry.

* Where SME is implemented, it is not possible to GETREGSET the register
  state for normal SVE when in streaming mode, nor the streaming mode
  register state when in normal mode, regardless of the implementation-defined
  behaviour of the hardware for sharing data between the two modes.

* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in
  streaming mode, and any SETREGSET of NT_ARM_SSVE will enter streaming mode
  if the target was not in streaming mode.

* The effect of writing a partial, incomplete payload is unspecified.


8. ELF coredump extensions
---------------------------

* NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for
  each thread of the dumped process. The contents will be equivalent to the
  data that would have been read if a PTRACE_GETREGSET of the corresponding
  type were executed for each thread when the coredump was generated.


9. System runtime configuration
--------------------------------

* To mitigate the ABI impact of expansion of the signal frame, a policy
  mechanism is provided for administrators, distro maintainers and developers
  to set the default vector length for userspace processes:

/proc/sys/abi/sve_default_vector_length

    Writing the text representation of an integer to this file sets the system
    default vector length to the specified value, rounded to a supported value
    using the same rules as for setting vector length via PR_SVE_SET_VL.

    The result can be determined by reopening the file and reading its
    contents.

    At boot, the default vector length is initially set to 64 or the maximum
    supported vector length, whichever is smaller.
    This determines the initial vector length of the init process (PID 1).

    Reading this file returns the current system default vector length.

* At every execve() call, the new vector length of the new process is set to
  the system default vector length, unless

  * PR_SVE_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the
    calling thread, or

  * a deferred vector length change is pending, established via the
    PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC).

* Modifying the system default vector length does not affect the vector length
  of any existing process or thread that does not make an execve() call.


10. Perf extensions
--------------------------------

* The arm64-specific DWARF standard [5] added the VG (Vector Granule) register
  at index 46. This register is used for DWARF unwinding when variable-length
  SVE registers are pushed onto the stack.

* Its value is equivalent to the current SVE vector length (VL) in bits divided
  by 64.

* The value is included in Perf samples in the regs[46] field if
  PERF_SAMPLE_REGS_USER is set and the sample_regs_user mask has bit 46 set.

* The value is the current value at the time the sample was taken, and it can
  change over time.

* If the system doesn't support SVE when perf_event_open is called with these
  settings, the event will fail to open.


Appendix A. SVE programmer's model (informative)
=================================================

This section provides a minimal description of the additions made by SVE to the
ARMv8-A programmer's model that are relevant to this document.

Note: This section is for information only and is not intended to be complete
or to replace any architectural specification.

A.1. Registers
---------------

In A64 state, SVE adds the following:

* 32 8VL-bit vector registers Z0..Z31.

  For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn.

  A register write using a Vn register name zeros all bits of the corresponding
  Zn except for bits [127:0].

* 16 VL-bit predicate registers P0..P15

* 1 VL-bit special-purpose predicate register FFR (the "first-fault register")

* a VL "pseudo-register" that determines the size of each vector register

  The SVE instruction set architecture provides no way to write VL directly.
  Instead, it can be modified only by EL1 and above, by writing appropriate
  system registers.

* The value of VL can be configured at runtime by EL1 and above:
  16 <= VL <= VLmax, where VL must be a multiple of 16.

* The maximum vector length is determined by the hardware:
  16 <= VLmax <= 256.

  (The SVE architecture specifies 256, but permits future architecture
  revisions to raise this limit.)

* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point
  operations in much the same way as they interact with ARMv8 floating-point
  operations::

         8VL-1                        128              0  bit index
        +----          ////            -----------------+
     Z0 |                               :       V0      |
      :                                          :
     Z7 |                               :       V7      |
     Z8 |                               :     * V8      |
      :                                       :  :
    Z15 |                               :     *V15      |
    Z16 |                               :      V16      |
      :                                          :
    Z31 |                               :      V31      |
        +----          ////            -----------------+
                                                 31    0
         VL-1                  0                +-------+
        +----       ////      --+          FPSR |       |
     P0 |                       |               +-------+
      : |                       |         *FPCR |       |
    P15 |                       |               +-------+
        +----       ////      --+
    FFR |                       |               +-----+
        +----       ////      --+            VL |     |
                                                +-----+

(*) callee-save:
    This only applies to bits [63:0] of Z-/V-registers.
    FPCR contains callee-save and caller-save bits. See [4] for details.


A.2. Procedure call standard
-----------------------------

The ARMv8-A base procedure call standard is extended as follows with respect to
the additional SVE register state:

* All SVE register bits that are not shared with FP/SIMD are caller-save.

* Z8 bits [63:0] .. Z15 bits [63:0] are callee-save.

  This follows from the way these bits are mapped to V8..V15, whose bits [63:0]
  are callee-save in the base procedure call standard.


Appendix B. ARMv8-A FP/SIMD programmer's model
===============================================

Note: This section is for information only and is not intended to be complete
or to replace any architectural specification.

Refer to [4] for more information.

ARMv8-A defines the following floating-point / SIMD register state:

* 32 128-bit vector registers V0..V31
* 2 32-bit status/control registers FPSR, FPCR

::

         127           0  bit index
        +---------------+
     V0 |               |
      : :               :
     V7 |               |
   * V8 |               |
   :  : :               :
   *V15 |               |
    V16 |               |
      : :               :
    V31 |               |
        +---------------+

                 31    0
                +-------+
           FPSR |       |
                +-------+
          *FPCR |       |
                +-------+

(*) callee-save:
    This only applies to bits [63:0] of V-registers.
    FPCR contains a mixture of callee-save and caller-save bits.


References
==========

[1] arch/arm64/include/uapi/asm/sigcontext.h
    AArch64 Linux signal ABI definitions

[2] arch/arm64/include/uapi/asm/ptrace.h
    AArch64 Linux ptrace ABI definitions

[3] Documentation/arch/arm64/cpu-feature-registers.rst

[4] ARM IHI0055C
    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)

[5] https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
    DWARF for the Arm 64-bit Architecture (AArch64)