===================================================
Scalable Vector Extension support for AArch64 Linux
===================================================

Author: Dave Martin <Dave.Martin@arm.com>

Date: 4 August 2017

This document outlines briefly the interface provided to userspace by Linux in
order to support use of the ARM Scalable Vector Extension (SVE), including
interactions with Streaming SVE mode added by the Scalable Matrix Extension
(SME).

This is an outline of the most important features and issues only and is not
intended to be exhaustive.

This document does not aim to describe the SVE architecture or programmer's
model. To aid understanding, a minimal description of relevant programmer's
model features for SVE is included in Appendix A.


1.  General
-----------

* SVE registers Z0..Z31, P0..P15 and FFR, and the current vector length VL, are
  tracked per-thread.

* In streaming mode, FFR is not accessible unless HWCAP2_SME_FA64 is present
  in the system. When it is not supported and these interfaces are used to
  access streaming mode, FFR is read and written as zero.

* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
  AT_HWCAP entry. Presence of this flag implies the presence of the SVE
  instructions and registers, and the Linux-specific system interfaces
  described in this document. SVE is reported in /proc/cpuinfo as "sve".

* Support for the execution of SVE instructions in userspace can also be
  detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS
  instruction, and checking that the value of the SVE field is nonzero. [3]

  This check does not guarantee the presence of the system interfaces described
  in the following sections: software that needs to verify that those
  interfaces are present must check for HWCAP_SVE instead.

* On hardware that supports the SVE2 extensions, HWCAP2_SVE2 will also
  be reported in the AT_HWCAP2 aux vector entry. In addition to this,
  optional extensions to SVE2 may be reported by the presence of:

        HWCAP2_SVE2
        HWCAP2_SVEAES
        HWCAP2_SVEPMULL
        HWCAP2_SVEBITPERM
        HWCAP2_SVESHA3
        HWCAP2_SVESM4
        HWCAP2_SVE2P1

  This list may be extended over time as the SVE architecture evolves.

  These extensions are also reported via the CPU ID register ID_AA64ZFR0_EL1,
  which userspace can read using an MRS instruction. See elf_hwcaps.txt and
  cpu-feature-registers.txt for details.

* On hardware that supports the SME extensions, HWCAP2_SME will also be
  reported in the AT_HWCAP2 aux vector entry. Among other things SME adds
  streaming mode which provides a subset of the SVE feature set using a
  separate SME vector length and the same Z/V registers. See sme.rst
  for more details.

* Debuggers should restrict themselves to interacting with the target via the
  NT_ARM_SVE regset. The recommended way of detecting support for this regset
  is to connect to a target process first and then attempt a
  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov). Note that when SME is
  present and streaming SVE mode is in use the FPSIMD subset of registers
  will be read via NT_ARM_SVE and NT_ARM_SVE writes will exit streaming mode
  in the target.

* Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
  between userspace and the kernel, the register value is encoded in memory in
  an endianness-invariant layout, with bits [(8 * i + 7) : (8 * i)] encoded at
  byte offset i from the start of the memory representation.
  This affects, for example, the signal frame (struct sve_context) and the
  ptrace interface (struct user_sve_header) and associated data.

  Beware that on big-endian systems this results in a different byte order than
  for the FPSIMD V-registers, which are stored as single host-endian 128-bit
  values, with bits [(127 - 8 * i) : (120 - 8 * i)] of the register encoded at
  byte offset i. (struct fpsimd_context, struct user_fpsimd_state).


2.  Vector length terminology
-----------------------------

The size of an SVE vector (Z) register is referred to as the "vector length".

To avoid confusion about the units used to express vector length, the kernel
adopts the following conventions:

* Vector length (VL) = size of a Z-register in bytes

* Vector quadwords (VQ) = size of a Z-register in units of 128 bits

(So, VL = 16 * VQ.)

The VQ convention is used where the underlying granularity is important, such
as in data structure definitions. In most other situations, the VL convention
is used. This is consistent with the meaning of the "VL" pseudo-register in
the SVE instruction set architecture.


3.  System call behaviour
-------------------------

* On syscall, V0..V31 are preserved (as without SVE). Thus, bits [127:0] of
  Z0..Z31 are preserved. All other bits of Z0..Z31, and all of P0..P15 and FFR
  become zero on return from a syscall.

* The SVE registers are not used to pass arguments to or receive results from
  any syscall.

* In practice the affected registers/bits will be preserved or will be replaced
  with zeros on return from a syscall, but userspace should not make
  assumptions about this. The kernel behaviour may vary on a case-by-case
  basis.

* All other SVE state of a thread, including the currently configured vector
  length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector
  length (if any), is preserved across all syscalls, subject to the specific
  exceptions for execve() described in section 6.

  In particular, on return from a fork() or clone(), the parent and new child
  process or thread share identical SVE configuration, matching that of the
  parent before the call.


4.  Signal handling
-------------------

* A new signal frame record sve_context encodes the SVE registers on signal
  delivery.
  [1]

* This record is supplementary to fpsimd_context. The FPSR and FPCR registers
  are only present in fpsimd_context. For convenience, the content of V0..V31
  is duplicated between sve_context and fpsimd_context.

* The record contains a flag field which includes a flag SVE_SIG_FLAG_SM which,
  if set, indicates that the thread is in streaming mode and that the vector
  length and register data (if present) describe the streaming SVE data and
  vector length.

* The signal frame record for SVE always contains basic metadata, in particular
  the thread's vector length (in sve_context.vl).

* The SVE registers may or may not be included in the record, depending on
  whether the registers are live for the thread. The registers are present if
  and only if:
  sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)).

* If the registers are present, the remainder of the record has a vl-dependent
  size and layout. Macros SVE_SIG_* are defined [1] to facilitate access to
  the members.

* Each scalable register (Zn, Pn, FFR) is stored in an endianness-invariant
  layout, with bits [(8 * i + 7) : (8 * i)] stored at byte offset i from the
  start of the register's representation in memory.
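
The endianness-invariant layout described above means that byte offset 0 always
holds the least significant byte of the register, whatever the host byte order.
As a minimal sketch, assuming a buffer that already holds the in-memory
representation of one register, a 64-bit lane could be reassembled as shown
below. The helper name sve_reg_lane_u64() is hypothetical and is not provided
by any kernel header.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the kernel UAPI): return 64-bit lane
 * number 'lane' of a scalable register stored in the endianness-invariant
 * layout, where bits [(8 * i + 7) : (8 * i)] live at byte offset i.
 *
 * The caller is assumed to guarantee that the buffer holds at least
 * 8 * (lane + 1) bytes.
 */
static inline uint64_t sve_reg_lane_u64(const uint8_t *reg, size_t lane)
{
	uint64_t value = 0;
	size_t i;

	/* Byte i of the lane contributes bits [(8 * i + 7) : (8 * i)]. */
	for (i = 0; i < 8; i++)
		value |= (uint64_t)reg[8 * lane + i] << (8 * i);

	return value;
}
```

Because the value is assembled byte by byte, the same code gives the same
result on little-endian and big-endian hosts.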

* If the SVE context is too big to fit in sigcontext.__reserved[], then extra
  space is allocated on the stack and an extra_context record is written in
  __reserved[] referencing this space. sve_context is then written in the
  extra space. Refer to [1] for further details about this mechanism.


5.  Signal return
-----------------

When returning from a signal handler:

* If there is no sve_context record in the signal frame, or if the record is
  present but contains no register data as described in the previous section,
  then the SVE registers/bits become non-live and take unspecified values.

* If sve_context is present in the signal frame and contains full register
  data, the SVE registers become live and are populated with the specified
  data. However, for backward compatibility reasons, bits [127:0] of Z0..Z31
  are always restored from the corresponding members of fpsimd_context.vregs[]
  and not from sve_context. The remaining bits are restored from sve_context.

* Inclusion of fpsimd_context in the signal frame remains mandatory,
  irrespective of whether sve_context is present or not.

* The vector length cannot be changed via signal return.
  If sve_context.vl in the signal frame does not match the current vector
  length, the signal return attempt is treated as illegal, resulting in a
  forced SIGSEGV.

* It is permitted to enter or leave streaming mode by setting or clearing
  the SVE_SIG_FLAG_SM flag but applications should take care to ensure that
  when doing so sve_context.vl and any register data are appropriate for the
  vector length in the new mode.


6.  prctl extensions
--------------------

Some new prctl() calls are added to allow programs to manage the SVE vector
length:

prctl(PR_SVE_SET_VL, unsigned long arg)

    Sets the vector length of the calling thread and related flags, where
    arg == vl | flags. Other threads of the calling process are unaffected.

    vl is the desired vector length, where sve_vl_valid(vl) must be true.

    flags:

        PR_SVE_VL_INHERIT

            Inherit the current vector length across execve(). Otherwise, the
            vector length is reset to the system default at execve(). (See
            Section 9.)

        PR_SVE_SET_VL_ONEXEC

            Defer the requested vector length change until the next execve()
            performed by this thread.

            The effect is equivalent to implicit execution of the following
            call immediately after the next execve() (if any) by the thread:

                prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC)

            This allows launching of a new program with a different vector
            length, while avoiding runtime side effects in the caller.


            Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect
            immediately.


    Return value: a nonnegative value on success, or a negative value on error:
        EINVAL: SVE not supported, invalid vector length requested, or
            invalid flags.


    On success:

    * Either the calling thread's vector length or the deferred vector length
      to be applied at the next execve() by the thread (dependent on whether
      PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value
      supported by the system that is less than or equal to vl. If vl ==
      SVE_VL_MAX, the value set will be the largest value supported by the
      system.

    * Any previously outstanding deferred vector length change in the calling
      thread is cancelled.

    * The returned value describes the resulting configuration, encoded as for
      PR_SVE_GET_VL. The vector length reported in this value is the new
      current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not
      present in arg; otherwise, the reported vector length is the deferred
      vector length that will be applied at the next execve() by the calling
      thread.

    * Changing the vector length causes all of P0..P15, FFR and all bits of
      Z0..Z31 except for Z0 bits [127:0] .. Z31 bits [127:0] to become
      unspecified. Calling PR_SVE_SET_VL with vl equal to the thread's current
      vector length, or calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC
      flag, does not constitute a change to the vector length for this purpose.


prctl(PR_SVE_GET_VL)

    Gets the vector length of the calling thread.

    The following flag may be OR-ed into the result:

        PR_SVE_VL_INHERIT

            Vector length will be inherited across execve().

    There is no way to determine whether there is an outstanding deferred
    vector length change (which would only normally be the case between a
    fork() or vfork() and the corresponding execve() in typical use).

    To extract the vector length from the result, bitwise and it with
    PR_SVE_VL_LEN_MASK.

    Return value: a nonnegative value on success, or a negative value on error:
        EINVAL: SVE not supported.


7.  ptrace extensions
---------------------

* New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with
  PTRACE_GETREGSET and PTRACE_SETREGSET. NT_ARM_SSVE describes the
  streaming mode SVE registers and NT_ARM_SVE describes the
  non-streaming mode SVE registers.

  In this description a register set is referred to as being "live" when
  the target is in the appropriate streaming or non-streaming mode and is
  using data beyond the subset shared with the FPSIMD Vn registers.

  Refer to [2] for definitions.

The regset data starts with struct user_sve_header, containing:

    size

        Size of the complete regset, in bytes.
        This depends on vl and possibly on other things in the future.

        If a call to PTRACE_GETREGSET requests less data than the value of
        size, the caller can allocate a larger buffer and retry in order to
        read the complete regset.

    max_size

        Maximum size in bytes that the regset can grow to for the target
        thread. The regset won't grow bigger than this even if the target
        thread changes its vector length etc.

    vl

        Target thread's current vector length, in bytes.

    max_vl

        Maximum possible vector length for the target thread.

    flags

        at most one of

            SVE_PT_REGS_FPSIMD

                SVE registers are not live (GETREGSET) or are to be made
                non-live (SETREGSET).

                The payload is of type struct user_fpsimd_state, with the same
                meaning as for NT_PRFPREG, starting at offset
                SVE_PT_FPSIMD_OFFSET from the start of user_sve_header.

                Extra data might be appended in the future: the size of the
                payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags).

                vq should be obtained using sve_vq_from_vl(vl).

            or

            SVE_PT_REGS_SVE

                SVE registers are live (GETREGSET) or are to be made live
                (SETREGSET).

                The payload contains the SVE register data, starting at offset
                SVE_PT_SVE_OFFSET from the start of user_sve_header, and with
                size SVE_PT_SVE_SIZE(vq, flags);

        ... OR-ed with zero or more of the following flags, which have the same
        meaning and behaviour as the corresponding PR_SVE_* flags:

            SVE_PT_VL_INHERIT

            SVE_PT_VL_ONEXEC (SETREGSET only).

        If neither FPSIMD nor SVE flags are provided then no register
        payload is available. This is only possible when SME is implemented.


* The effects of changing the vector length and/or flags are equivalent to
  those documented for PR_SVE_SET_VL.

  The caller must make a further GETREGSET call if it needs to know what VL is
  actually set by SETREGSET, unless it is known in advance that the requested
  VL is supported.

* In the SVE_PT_REGS_SVE case, the size and layout of the payload depends on
  the header fields. The SVE_PT_SVE_*() macros are provided to facilitate
  access to the members.

* In either case, for SETREGSET it is permissible to omit the payload, in which
  case only the vector length and flags are changed (along with any
  consequences of those changes).

* In systems supporting SME, when in streaming mode a GETREGSET for
  NT_ARM_SVE will return only the user_sve_header with no register data.
  Similarly, a GETREGSET for NT_ARM_SSVE will not return any register data
  when not in streaming mode.

* A GETREGSET for NT_ARM_SSVE will never return SVE_PT_REGS_FPSIMD.

* For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the
  requested VL is not supported, the effect will be the same as if the
  payload were omitted, except that an EIO error is reported. No
  attempt is made to translate the payload data to the correct layout
  for the vector length actually set. The thread's FPSIMD state is
  preserved, but the remaining bits of the SVE registers become
  unspecified. It is up to the caller to translate the payload layout
  for the actual VL and retry.

* Where SME is implemented it is not possible to GETREGSET the register
  state for normal SVE when in streaming mode, nor the streaming mode
  register state when in normal mode, regardless of the implementation defined
  behaviour of the hardware for sharing data between the two modes.

* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in
  streaming mode and any SETREGSET of NT_ARM_SSVE will enter streaming mode
  if the target was not in streaming mode.

* The effect of writing a partial, incomplete payload is unspecified.


8.  ELF coredump extensions
---------------------------

* NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for
  each thread of the dumped process. The contents will be equivalent to the
  data that would have been read if a PTRACE_GETREGSET of the corresponding
  type were executed for each thread when the coredump was generated.

9.  System runtime configuration
--------------------------------

* To mitigate the ABI impact of expansion of the signal frame, a policy
  mechanism is provided for administrators, distro maintainers and developers
  to set the default vector length for userspace processes:

/proc/sys/abi/sve_default_vector_length

    Writing the text representation of an integer to this file sets the system
    default vector length to the specified value, unless the value is greater
    than the maximum vector length supported by the system in which case the
    default vector length is set to that maximum.

    The result can be determined by reopening the file and reading its
    contents.

    At boot, the default vector length is initially set to 64 or the maximum
    supported vector length, whichever is smaller. This determines the initial
    vector length of the init process (PID 1).

    Reading this file returns the current system default vector length.
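
As a minimal sketch of reading the file described above from C, the helper
below parses a single integer from a text file and returns -1 if the file
cannot be opened or parsed, for example on a system without SVE support. The
function name read_vl_sysctl() is an assumption made for illustration and is
not an existing interface.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read one integer from a sysctl-style text file.
 * Returns the parsed value, or -1 if the file is missing or malformed.
 */
static inline long read_vl_sysctl(const char *path)
{
	FILE *f = fopen(path, "r");
	long vl = -1;

	if (!f)
		return -1;

	/* The file holds the text representation of a single integer. */
	if (fscanf(f, "%ld", &vl) != 1)
		vl = -1;

	fclose(f);
	return vl;
}
```

A caller would pass "/proc/sys/abi/sve_default_vector_length" as the path and
treat a negative result as "default vector length not available".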

* At every execve() call, the new vector length of the new process is set to
  the system default vector length, unless

    * PR_SVE_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the
      calling thread, or

    * a deferred vector length change is pending, established via the
      PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC).

* Modifying the system default vector length does not affect the vector length
  of any existing process or thread that does not make an execve() call.

10.  Perf extensions
--------------------

* The arm64 specific DWARF standard [5] added the VG (Vector Granule) register
  at index 46. This register is used for DWARF unwinding when variable length
  SVE registers are pushed onto the stack.

* Its value is equivalent to the current SVE vector length (VL) in bits divided
  by 64.

* The value is included in Perf samples in the regs[46] field if
  PERF_SAMPLE_REGS_USER is set and the sample_regs_user mask has bit 46 set.

* The value is the current value at the time the sample was taken, and it can
  change over time.
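
Since VG counts 64-bit granules while this document expresses VL in bytes and
VQ in units of 128 bits, a sampled VG value can be converted with simple
arithmetic: VL = 8 * VG and VQ = VG / 2. The sketch below illustrates this; the
helper names vg_to_vl_bytes() and vg_to_vq() are hypothetical and are not
defined by the kernel or by perf.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for interpreting a sampled VG value.
 *
 * VG is the vector length in bits divided by 64, so each granule is
 * 8 bytes and two granules make up one 128-bit quadword.
 */
static inline uint64_t vg_to_vl_bytes(uint64_t vg)
{
	return vg * 8;	/* VL in bytes */
}

static inline uint64_t vg_to_vq(uint64_t vg)
{
	return vg / 2;	/* VQ in 128-bit units */
}
```

For example, a sampled VG of 4 corresponds to a 256-bit vector, that is
VL = 32 bytes and VQ = 2.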

* If the system doesn't support SVE when perf_event_open is called with these
  settings, the event will fail to open.

Appendix A.  SVE programmer's model (informative)
=================================================

This section provides a minimal description of the additions made by SVE to the
ARMv8-A programmer's model that are relevant to this document.

Note: This section is for information only and not intended to be complete or
to replace any architectural specification.

A.1.  Registers
---------------

In A64 state, SVE adds the following:

* 32 8VL-bit vector registers Z0..Z31
  For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn.

  A register write using a Vn register name zeros all bits of the corresponding
  Zn except for bits [127:0].

* 16 VL-bit predicate registers P0..P15

* 1 VL-bit special-purpose predicate register FFR (the "first-fault register")

* a VL "pseudo-register" that determines the size of each vector register

  The SVE instruction set architecture provides no way to write VL directly.
502*e4624435SJonathan Corbet Instead, it can be modified only by EL1 and above, by writing appropriate 503*e4624435SJonathan Corbet system registers. 504*e4624435SJonathan Corbet 505*e4624435SJonathan Corbet* The value of VL can be configured at runtime by EL1 and above: 506*e4624435SJonathan Corbet 16 <= VL <= VLmax, where VL must be a multiple of 16. 507*e4624435SJonathan Corbet 508*e4624435SJonathan Corbet* The maximum vector length is determined by the hardware: 509*e4624435SJonathan Corbet 16 <= VLmax <= 256. 510*e4624435SJonathan Corbet 511*e4624435SJonathan Corbet (The SVE architecture specifies 256, but permits future architecture 512*e4624435SJonathan Corbet revisions to raise this limit.) 513*e4624435SJonathan Corbet 514*e4624435SJonathan Corbet* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point 515*e4624435SJonathan Corbet operations in a similar way to the way in which they interact with ARMv8 516*e4624435SJonathan Corbet floating-point operations:: 517*e4624435SJonathan Corbet 518*e4624435SJonathan Corbet 8VL-1 128 0 bit index 519*e4624435SJonathan Corbet +---- //// -----------------+ 520*e4624435SJonathan Corbet Z0 | : V0 | 521*e4624435SJonathan Corbet : : 522*e4624435SJonathan Corbet Z7 | : V7 | 523*e4624435SJonathan Corbet Z8 | : * V8 | 524*e4624435SJonathan Corbet : : : 525*e4624435SJonathan Corbet Z15 | : *V15 | 526*e4624435SJonathan Corbet Z16 | : V16 | 527*e4624435SJonathan Corbet : : 528*e4624435SJonathan Corbet Z31 | : V31 | 529*e4624435SJonathan Corbet +---- //// -----------------+ 530*e4624435SJonathan Corbet 31 0 531*e4624435SJonathan Corbet VL-1 0 +-------+ 532*e4624435SJonathan Corbet +---- //// --+ FPSR | | 533*e4624435SJonathan Corbet P0 | | +-------+ 534*e4624435SJonathan Corbet : | | *FPCR | | 535*e4624435SJonathan Corbet P15 | | +-------+ 536*e4624435SJonathan Corbet +---- //// --+ 537*e4624435SJonathan Corbet FFR | | +-----+ 538*e4624435SJonathan Corbet +---- //// --+ VL | | 539*e4624435SJonathan Corbet 
+-----+ 540*e4624435SJonathan Corbet 541*e4624435SJonathan Corbet(*) callee-save: 542*e4624435SJonathan Corbet This only applies to bits [63:0] of Z-/V-registers. 543*e4624435SJonathan Corbet FPCR contains callee-save and caller-save bits. See [4] for details. 544*e4624435SJonathan Corbet 545*e4624435SJonathan Corbet 546*e4624435SJonathan CorbetA.2. Procedure call standard 547*e4624435SJonathan Corbet----------------------------- 548*e4624435SJonathan Corbet 549*e4624435SJonathan CorbetThe ARMv8-A base procedure call standard is extended as follows with respect to 550*e4624435SJonathan Corbetthe additional SVE register state: 551*e4624435SJonathan Corbet 552*e4624435SJonathan Corbet* All SVE register bits that are not shared with FP/SIMD are caller-save. 553*e4624435SJonathan Corbet 554*e4624435SJonathan Corbet* Z8 bits [63:0] .. Z15 bits [63:0] are callee-save. 555*e4624435SJonathan Corbet 556*e4624435SJonathan Corbet This follows from the way these bits are mapped to V8..V15, which are caller- 557*e4624435SJonathan Corbet save in the base procedure call standard. 558*e4624435SJonathan Corbet 559*e4624435SJonathan Corbet 560*e4624435SJonathan CorbetAppendix B. ARMv8-A FP/SIMD programmer's model 561*e4624435SJonathan Corbet=============================================== 562*e4624435SJonathan Corbet 563*e4624435SJonathan CorbetNote: This section is for information only and not intended to be complete or 564*e4624435SJonathan Corbetto replace any architectural specification. 565*e4624435SJonathan Corbet 566*e4624435SJonathan CorbetRefer to [4] for more information. 

ARMv8-A defines the following floating-point / SIMD register state:

* 32 128-bit vector registers V0..V31
* 2 32-bit status/control registers FPSR, FPCR

::

         127           0  bit index
        +---------------+
     V0 |               |
      : :               :
     V7 |               |
   * V8 |               |
   :  : :               :
   *V15 |               |
    V16 |               |
      : :               :
    V31 |               |
        +---------------+

         31    0
        +-------+
   FPSR |       |
        +-------+
  *FPCR |       |
        +-------+

(*) callee-save:
    This only applies to bits [63:0] of V-registers.
    FPCR contains a mixture of callee-save and caller-save bits.
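
As an informal illustration, the size relationships described in Appendix A
and Appendix B, together with the VG value from the Perf extensions section,
can be sketched as plain arithmetic in C. This is only a sketch: every function
and macro name below is hypothetical and is not part of any kernel or libc
interface, and VL is assumed to be expressed in bytes, as in Appendix A.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical name: largest VL (in bytes) quoted in Appendix A. */
#define EXAMPLE_SVE_VL_ARCH_MAX	256u

/* VL is valid if it is a multiple of 16 with 16 <= VL <= VLmax <= 256. */
static inline bool example_vl_is_valid(unsigned int vl, unsigned int vl_max)
{
	return vl_max >= 16 && vl_max <= EXAMPLE_SVE_VL_ARCH_MAX &&
	       vl >= 16 && vl <= vl_max && vl % 16 == 0;
}

/* Each Zn register holds 8 * VL bits, that is, VL bytes. */
static inline size_t example_z_reg_bytes(unsigned int vl)
{
	return vl;
}

/* Each Pn register (and FFR) holds VL bits, that is, VL / 8 bytes. */
static inline size_t example_p_reg_bytes(unsigned int vl)
{
	return vl / 8;
}

/* VG is VL in bits divided by 64: (8 * VL) / 64 == VL / 8. */
static inline unsigned int example_vg(unsigned int vl)
{
	return vl / 8;
}

/*
 * Bytes of each Zn that lie above bits [127:0] and so do not alias Vn.
 * The caller must pass a valid VL (at least 16).
 */
static inline size_t example_z_unshared_bytes(unsigned int vl)
{
	return vl - 16;
}
```

For the minimum VL of 16 bytes, each Zn is exactly as wide as Vn, so no bits
are unshared and VG is 2. For a VL of 256 bytes, VG is 32.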


References
==========

[1] arch/arm64/include/uapi/asm/sigcontext.h
    AArch64 Linux signal ABI definitions

[2] arch/arm64/include/uapi/asm/ptrace.h
    AArch64 Linux ptrace ABI definitions

[3] Documentation/arch/arm64/cpu-feature-registers.rst

[4] ARM IHI0055C
    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)

[5] https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst