===================================================
Scalable Vector Extension support for AArch64 Linux
===================================================

Author: Dave Martin <Dave.Martin@arm.com>

Date:   4 August 2017

This document briefly outlines the interface provided to userspace by Linux in
order to support use of the ARM Scalable Vector Extension (SVE), including
interactions with Streaming SVE mode added by the Scalable Matrix Extension
(SME).

This is an outline of the most important features and issues only; it is not
intended to be exhaustive.

This document does not aim to describe the SVE architecture or programmer's
model.  To aid understanding, a minimal description of relevant programmer's
model features for SVE is included in Appendix A.


1.  General
-----------

* SVE registers Z0..Z31, P0..P15 and FFR and the current vector length VL are
  tracked per-thread.

* In streaming mode, FFR is not accessible unless HWCAP2_SME_FA64 is present
  in the system.  When FA64 is not supported and these interfaces are used to
  access the streaming mode FFR, it is read and written as zero.

* The presence of SVE is reported to userspace via HWCAP_SVE in the aux vector
  AT_HWCAP entry.  Presence of this flag implies the presence of the SVE
  instructions and registers, and the Linux-specific system interfaces
  described in this document.  SVE is reported in /proc/cpuinfo as "sve".

* Support for the execution of SVE instructions in userspace can also be
  detected by reading the CPU ID register ID_AA64PFR0_EL1 using an MRS
  instruction, and checking that the value of the SVE field is nonzero. [3]

  This check does not guarantee the presence of the system interfaces
  described in the following sections: software that needs to verify that
  those interfaces are present must check for HWCAP_SVE instead.

* On hardware that supports the SVE2 extensions, HWCAP2_SVE2 will also
  be reported in the AT_HWCAP2 aux vector entry.  In addition to this,
  optional extensions to SVE2 may be reported by the presence of:

	HWCAP2_SVE2
	HWCAP2_SVEAES
	HWCAP2_SVEPMULL
	HWCAP2_SVEBITPERM
	HWCAP2_SVESHA3
	HWCAP2_SVESM4
	HWCAP2_SVE2P1

  This list may be extended over time as the SVE architecture evolves.

  These extensions are also reported via the CPU ID register ID_AA64ZFR0_EL1,
  which userspace can read using an MRS instruction.  See elf_hwcaps.rst and
  cpu-feature-registers.rst for details.

* On hardware that supports the SME extensions, HWCAP2_SME will also be
  reported in the AT_HWCAP2 aux vector entry.  Among other things, SME adds
  streaming mode, which provides a subset of the SVE feature set using a
  separate SME vector length and the same Z/V registers.  See sme.rst
  for more details.

* Debuggers should restrict themselves to interacting with the target via the
  NT_ARM_SVE regset.  The recommended way of detecting support for this regset
  is to connect to a target process first and then attempt a
  ptrace(PTRACE_GETREGSET, pid, NT_ARM_SVE, &iov).  Note that when SME is
  present and streaming SVE mode is in use, the FPSIMD subset of registers
  will be read via NT_ARM_SVE, and NT_ARM_SVE writes will exit streaming mode
  in the target.

* Whenever SVE scalable register values (Zn, Pn, FFR) are exchanged in memory
  between userspace and the kernel, the register value is encoded in memory in
  an endianness-invariant layout, with bits [(8 * i + 7) : (8 * i)] encoded at
  byte offset i from the start of the memory representation.  This affects, for
  example, the signal frame (struct sve_context) and the ptrace interface
  (struct user_sve_header) and associated data.

  Beware that on big-endian systems this results in a different byte order than
  for the FPSIMD V-registers, which are stored as single host-endian 128-bit
  values, with bits [(127 - 8 * i) : (120 - 8 * i)] of the register encoded at
  byte offset i (struct fpsimd_context, struct user_fpsimd_state).
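
As a minimal sketch of the HWCAP checks described above, the following C fragment tests the AT_HWCAP and AT_HWCAP2 words returned by getauxval().  The helper names are hypothetical, and the fallback bit values are assumptions copied from the arm64 <asm/hwcap.h> uapi header; they only take effect if the system headers do not already define those macros.

```c
#include <stdbool.h>
#include <sys/auxv.h>	/* getauxval(), AT_HWCAP, AT_HWCAP2 */

/*
 * Fallback bit positions, assumed to match arm64 <asm/hwcap.h>.  On an
 * arm64 glibc build, <sys/auxv.h> normally provides the real definitions.
 */
#ifndef HWCAP_SVE
#define HWCAP_SVE	(1UL << 22)
#endif
#ifndef HWCAP2_SVE2
#define HWCAP2_SVE2	(1UL << 1)
#endif
#ifndef HWCAP2_SME
#define HWCAP2_SME	(1UL << 23)
#endif

/* Pure bit tests on an AT_HWCAP or AT_HWCAP2 value. */
static inline bool hwcap_has_sve(unsigned long hwcap)
{
	return (hwcap & HWCAP_SVE) != 0;
}

static inline bool hwcap2_has_sve2(unsigned long hwcap2)
{
	return (hwcap2 & HWCAP2_SVE2) != 0;
}

static inline bool hwcap2_has_sme(unsigned long hwcap2)
{
	return (hwcap2 & HWCAP2_SME) != 0;
}

/*
 * Ask the running kernel.  Only meaningful on arm64: other architectures
 * assign unrelated meanings to the same AT_HWCAP bits.
 */
static inline bool system_has_sve_interfaces(void)
{
	return hwcap_has_sve(getauxval(AT_HWCAP));
}
```

Software that relies on the prctl() and ptrace() interfaces in this document would call a check like system_has_sve_interfaces() rather than reading ID_AA64PFR0_EL1, for the reason given above.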


2.  Vector length terminology
-----------------------------

The size of an SVE vector (Z) register is referred to as the "vector length".

To avoid confusion about the units used to express vector length, the kernel
adopts the following conventions:

* Vector length (VL) = size of a Z-register in bytes

* Vector quadwords (VQ) = size of a Z-register in units of 128 bits

(So, VL = 16 * VQ.)

The VQ convention is used where the underlying granularity is important, such
as in data structure definitions.  In most other situations, the VL convention
is used.  This is consistent with the meaning of the "VL" pseudo-register in
the SVE instruction set architecture.
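
The VL/VQ relationship is plain arithmetic.  The helpers below are hypothetical illustrations of the conversion that sve_vq_from_vl() (used later in this document) performs, plus its inverse; the shape check only encodes the "at least 16 bytes, multiple of 16" rule from Appendix A and is not a substitute for sve_vl_valid().

```c
#include <stdbool.h>

/* Bytes per 128-bit quadword (hypothetical name). */
#define EXAMPLE_VQ_BYTES 16u

/* VQ: number of 128-bit quadwords in a Z register of vl bytes. */
static inline unsigned int example_vq_from_vl(unsigned int vl)
{
	return vl / EXAMPLE_VQ_BYTES;
}

/* VL in bytes: 16 * VQ. */
static inline unsigned int example_vl_from_vq(unsigned int vq)
{
	return vq * EXAMPLE_VQ_BYTES;
}

/*
 * Shape check only: a nonzero multiple of 16 bytes.  Whether a length is
 * actually supported depends on the hardware (see Appendix A).
 */
static inline bool example_vl_well_formed(unsigned int vl)
{
	return vl >= EXAMPLE_VQ_BYTES && vl % EXAMPLE_VQ_BYTES == 0;
}
```

For example, a 256-bit Z register has VL = 32 and VQ = 2.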


3.  System call behaviour
-------------------------

* On syscall, V0..V31 are preserved (as without SVE).  Thus, bits [127:0] of
  Z0..Z31 are preserved.  All other bits of Z0..Z31, and all of P0..P15 and FFR
  become zero on return from a syscall.

* The SVE registers are not used to pass arguments to or receive results from
  any syscall.

* In practice the affected registers/bits will be preserved or will be replaced
  with zeros on return from a syscall, but userspace should not make
  assumptions about this.  The kernel behaviour may vary on a case-by-case
  basis.

* All other SVE state of a thread, including the currently configured vector
  length, the state of the PR_SVE_VL_INHERIT flag, and the deferred vector
  length (if any), is preserved across all syscalls, subject to the specific
  exceptions for execve() described in section 6.

  In particular, on return from a fork() or clone(), the parent and new child
  process or thread share identical SVE configuration, matching that of the
  parent before the call.
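
The fork() guarantee can be observed from userspace with PR_SVE_GET_VL (see section 6).  This sketch, with a hypothetical helper name, compares the raw prctl() result in a parent and its forked child through a pipe.  The fallback value of PR_SVE_GET_VL is an assumption taken from <linux/prctl.h>.  Where SVE is unsupported, both calls fail in the same way (-1 with errno EINVAL), so the results still compare equal.

```c
#include <stdbool.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef PR_SVE_GET_VL
#define PR_SVE_GET_VL 51	/* assumed value, as in <linux/prctl.h> */
#endif

/*
 * Returns true if a child created with fork() reports the same
 * PR_SVE_GET_VL result as its parent, and false if the results differ
 * or pipe()/fork() fails.
 */
static bool example_child_inherits_sve_config(void)
{
	int fds[2];
	int parent = prctl(PR_SVE_GET_VL);
	int child = 0;
	ssize_t got;
	pid_t pid;

	if (pipe(fds) != 0)
		return false;

	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return false;
	}

	if (pid == 0) {
		/* Child: report its own view and exit immediately. */
		int mine = prctl(PR_SVE_GET_VL);

		close(fds[0]);
		_exit(write(fds[1], &mine, sizeof(mine)) == (ssize_t)sizeof(mine) ? 0 : 1);
	}

	close(fds[1]);
	got = read(fds[0], &child, sizeof(child));
	close(fds[0]);
	waitpid(pid, NULL, 0);

	return got == (ssize_t)sizeof(child) && child == parent;
}
```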


4.  Signal handling
-------------------

* A new signal frame record sve_context encodes the SVE registers on signal
  delivery. [1]

* This record is supplementary to fpsimd_context.  The FPSR and FPCR registers
  are only present in fpsimd_context.  For convenience, the content of V0..V31
  is duplicated between sve_context and fpsimd_context.

* The record contains a flags field, which may include the flag
  SVE_SIG_FLAG_SM.  If set, this flag indicates that the thread is in
  streaming mode and that the vector length and register data (if present)
  describe the streaming SVE data and vector length.

* The signal frame record for SVE always contains basic metadata, in particular
  the thread's vector length (in sve_context.vl).

* The SVE registers may or may not be included in the record, depending on
  whether the registers are live for the thread.  The registers are present if
  and only if:
  sve_context.head.size >= SVE_SIG_CONTEXT_SIZE(sve_vq_from_vl(sve_context.vl)).

* If the registers are present, the remainder of the record has a vl-dependent
  size and layout.  Macros SVE_SIG_* are defined [1] to facilitate access to
  the members.

* Each scalable register (Zn, Pn, FFR) is stored in an endianness-invariant
  layout, with bits [(8 * i + 7) : (8 * i)] stored at byte offset i from the
  start of the register's representation in memory.

* If the SVE context is too big to fit in sigcontext.__reserved[], then extra
  space is allocated on the stack and an extra_context record referencing this
  space is written in __reserved[].  sve_context is then written in the extra
  space.  Refer to [1] for further details about this mechanism.
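
The presence rule above can be expressed as a small predicate.  Real arm64 code would use SVE_SIG_CONTEXT_SIZE() and sve_vq_from_vl() from <asm/sigcontext.h>; so that the sketch builds on any host, it substitutes a hypothetical example_sig_context_size() whose layout (a 16-byte record header, then Z0..Z31 of VL bytes each, then P0..P15 and FFR of VL / 8 bytes each) is an assumption rather than a copy of the kernel header.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for SVE_SIG_CONTEXT_SIZE(vq).  Assumed layout:
 * a 16-byte header, then Z0..Z31 (16 * vq bytes each), then P0..P15 and
 * FFR (2 * vq bytes each, i.e. VL / 8).
 */
static inline uint32_t example_sig_context_size(uint32_t vq)
{
	const uint32_t header = 16;
	const uint32_t zregs = 32 * (16 * vq);
	const uint32_t pregs_and_ffr = 17 * (2 * vq);

	return header + zregs + pregs_and_ffr;
}

/* Register data is present iff head.size covers the full context for vl. */
static inline bool example_sve_regs_present(uint32_t head_size, uint16_t vl)
{
	uint32_t vq = vl / 16;	/* what sve_vq_from_vl() computes */

	return head_size >= example_sig_context_size(vq);
}
```

A record that carries only the metadata (head.size covering just the header) therefore reports no register data, whatever its vl.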


5.  Signal return
-----------------

When returning from a signal handler:

* If there is no sve_context record in the signal frame, or if the record is
  present but contains no register data as described in the previous section,
  then the SVE registers/bits become non-live and take unspecified values.

* If sve_context is present in the signal frame and contains full register
  data, the SVE registers become live and are populated with the specified
  data.  However, for backward compatibility reasons, bits [127:0] of Z0..Z31
  are always restored from the corresponding members of fpsimd_context.vregs[]
  and not from sve_context.  The remaining bits are restored from sve_context.

* Inclusion of fpsimd_context in the signal frame remains mandatory,
  irrespective of whether sve_context is present or not.

* The vector length cannot be changed via signal return.  If sve_context.vl in
  the signal frame does not match the current vector length, the signal return
  attempt is treated as illegal, resulting in a forced SIGSEGV.

* It is permitted to enter or leave streaming mode by setting or clearing
  the SVE_SIG_FLAG_SM flag, but applications should take care to ensure that,
  when doing so, sve_context.vl and any register data are appropriate for the
  vector length in the new mode.
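
To illustrate the restore rules, this hypothetical helper rebuilds one Z register image the way the text describes: bytes 0..15 come from the fpsimd_context.vregs[] entry and the rest from sve_context, and a vector length mismatch is rejected (where the kernel would force SIGSEGV).  Because the V register is a host-endian 128-bit value while sve_context uses the endianness-invariant layout, the low bytes are extracted with shifts rather than memcpy().  It relies on the GCC/Clang unsigned __int128 extension and ignores the streaming-mode case.

```c
#include <stdint.h>
#include <string.h>

/*
 * Model of how one Z register is rebuilt on signal return.
 *   out:      receives frame_vl bytes in endianness-invariant layout
 *   sve_zreg: this register's bytes from sve_context (frame_vl bytes)
 *   vreg:     the matching fpsimd_context.vregs[] entry
 * Returns 0 on success, or -1 if frame_vl does not match current_vl
 * (the kernel treats that as an illegal sigreturn).
 */
static int example_restore_zreg(uint8_t *out, const uint8_t *sve_zreg,
				unsigned __int128 vreg,
				unsigned int frame_vl, unsigned int current_vl)
{
	if (frame_vl != current_vl || frame_vl < 16)
		return -1;

	/* Bits [127:0] always come from fpsimd_context: byte i holds bits [8i+7:8i]. */
	for (unsigned int i = 0; i < 16; i++)
		out[i] = (uint8_t)(vreg >> (8 * i));

	/* The remaining bits come from sve_context, already in memory order. */
	memcpy(out + 16, sve_zreg + 16, frame_vl - 16);
	return 0;
}
```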


6.  prctl extensions
--------------------

Some new prctl() calls are added to allow programs to manage the SVE vector
length:

prctl(PR_SVE_SET_VL, unsigned long arg)

    Sets the vector length of the calling thread and related flags, where
    arg == vl | flags.  Other threads of the calling process are unaffected.

    vl is the desired vector length, where sve_vl_valid(vl) must be true.

    flags:

	PR_SVE_VL_INHERIT

	    Inherit the current vector length across execve().  Otherwise, the
	    vector length is reset to the system default at execve().  (See
	    Section 9.)

	PR_SVE_SET_VL_ONEXEC

	    Defer the requested vector length change until the next execve()
	    performed by this thread.

	    The effect is equivalent to implicit execution of the following
	    call immediately after the next execve() (if any) by the thread:

		prctl(PR_SVE_SET_VL, arg & ~PR_SVE_SET_VL_ONEXEC)

	    This allows launching of a new program with a different vector
	    length, while avoiding runtime side effects in the caller.

	    Without PR_SVE_SET_VL_ONEXEC, the requested change takes effect
	    immediately.


    Return value: a nonnegative value on success, or a negative value on error:
	EINVAL: SVE not supported, invalid vector length requested, or
	    invalid flags.


    On success:

    * Either the calling thread's vector length or the deferred vector length
      to be applied at the next execve() by the thread (dependent on whether
      PR_SVE_SET_VL_ONEXEC is present in arg), is set to the largest value
      supported by the system that is less than or equal to vl.  If vl ==
      SVE_VL_MAX, the value set will be the largest value supported by the
      system.

    * Any previously outstanding deferred vector length change in the calling
      thread is cancelled.

    * The returned value describes the resulting configuration, encoded as for
      PR_SVE_GET_VL.  The vector length reported in this value is the new
      current vector length for this thread if PR_SVE_SET_VL_ONEXEC was not
      present in arg; otherwise, the reported vector length is the deferred
      vector length that will be applied at the next execve() by the calling
      thread.

    * Changing the vector length causes all of P0..P15, FFR, and all bits of
      Z0..Z31 except bits [127:0] of each Zn, to become unspecified.  Calling
      PR_SVE_SET_VL with vl equal to the thread's current vector length, or
      calling PR_SVE_SET_VL with the PR_SVE_SET_VL_ONEXEC flag, does not
      constitute a change to the vector length for this purpose.
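
A sketch of the deferred-change use case, as a launcher might issue it in the child between fork() and execve().  The helper name is hypothetical; PR_SVE_SET_VL, PR_SVE_SET_VL_ONEXEC and PR_SVE_VL_LEN_MASK normally come from <linux/prctl.h> (pulled in by <sys/prctl.h>), and the fallback values here are assumptions for older headers.

```c
#include <sys/prctl.h>

/* Fallback values, assumed to match <linux/prctl.h>. */
#ifndef PR_SVE_SET_VL
#define PR_SVE_SET_VL		50
#endif
#ifndef PR_SVE_SET_VL_ONEXEC
#define PR_SVE_SET_VL_ONEXEC	(1 << 18)
#endif
#ifndef PR_SVE_VL_LEN_MASK
#define PR_SVE_VL_LEN_MASK	0xffff
#endif

/*
 * Ask for vector length vl (in bytes) to take effect at this thread's
 * next execve().  Returns the deferred vector length the kernel chose
 * (the largest supported value <= vl), or -1 with errno set (EINVAL if
 * SVE is unsupported or the request is invalid).
 */
static int example_set_vl_onexec(unsigned long vl)
{
	int ret = prctl(PR_SVE_SET_VL, vl | PR_SVE_SET_VL_ONEXEC);

	if (ret < 0)
		return -1;
	return ret & PR_SVE_VL_LEN_MASK;
}
```

Because PR_SVE_SET_VL_ONEXEC is set, the caller's own vector length and SVE register contents are left alone; the value returned is the deferred vector length.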


prctl(PR_SVE_GET_VL)

    Gets the vector length of the calling thread.

    The following flag may be OR-ed into the result:

	PR_SVE_VL_INHERIT

	    Vector length will be inherited across execve().

    There is no way to determine whether there is an outstanding deferred
    vector length change (which would only normally be the case between a
    fork() or vfork() and the corresponding execve() in typical use).

    To extract the vector length from the result, bitwise AND it with
    PR_SVE_VL_LEN_MASK.

    Return value: a nonnegative value on success, or a negative value on error:
	EINVAL: SVE not supported.
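
Decoding a PR_SVE_GET_VL result into its vector length and PR_SVE_VL_INHERIT flag might look like this.  The struct and function names are hypothetical, and the fallback constants are assumptions mirroring <linux/prctl.h>.

```c
#include <stdbool.h>
#include <sys/prctl.h>

/* Fallback values, assumed to match <linux/prctl.h>. */
#ifndef PR_SVE_GET_VL
#define PR_SVE_GET_VL		51
#endif
#ifndef PR_SVE_VL_LEN_MASK
#define PR_SVE_VL_LEN_MASK	0xffff
#endif
#ifndef PR_SVE_VL_INHERIT
#define PR_SVE_VL_INHERIT	(1 << 17)
#endif

struct example_sve_vl_info {
	unsigned int vl;	/* vector length in bytes */
	bool inherit;		/* PR_SVE_VL_INHERIT was reported */
};

/* Split a nonnegative PR_SVE_GET_VL (or PR_SVE_SET_VL) result. */
static struct example_sve_vl_info example_decode_vl(int ret)
{
	struct example_sve_vl_info info = {
		.vl = (unsigned int)ret & PR_SVE_VL_LEN_MASK,
		.inherit = (ret & PR_SVE_VL_INHERIT) != 0,
	};

	return info;
}

/* Query the calling thread.  Returns 0, or -1 (errno EINVAL) without SVE. */
static int example_get_vl(struct example_sve_vl_info *info)
{
	int ret = prctl(PR_SVE_GET_VL);

	if (ret < 0)
		return -1;
	*info = example_decode_vl(ret);
	return 0;
}
```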


7.  ptrace extensions
---------------------

* New regsets NT_ARM_SVE and NT_ARM_SSVE are defined for use with
  PTRACE_GETREGSET and PTRACE_SETREGSET.  NT_ARM_SSVE describes the
  streaming mode SVE registers and NT_ARM_SVE describes the
  non-streaming mode SVE registers.

  In this description, a register set is referred to as being "live" when
  the target is in the appropriate streaming or non-streaming mode and is
  using data beyond the subset shared with the FPSIMD Vn registers.

  Refer to [2] for definitions.

The regset data starts with struct user_sve_header, containing:

    size

	Size of the complete regset, in bytes.
	This depends on vl and possibly on other things in the future.

	If a call to PTRACE_GETREGSET requests less data than the value of
	size, the caller can allocate a larger buffer and retry in order to
	read the complete regset.

    max_size

	Maximum size in bytes that the regset can grow to for the target
	thread.  The regset won't grow bigger than this even if the target
	thread changes its vector length etc.

    vl

	Target thread's current vector length, in bytes.

    max_vl

	Maximum possible vector length for the target thread.

    flags

	at most one of

	    SVE_PT_REGS_FPSIMD

		SVE registers are not live (GETREGSET) or are to be made
		non-live (SETREGSET).

		The payload is of type struct user_fpsimd_state, with the same
		meaning as for NT_PRFPREG, starting at offset
		SVE_PT_FPSIMD_OFFSET from the start of user_sve_header.

		Extra data might be appended in the future: the size of the
		payload should be obtained using SVE_PT_FPSIMD_SIZE(vq, flags).

		vq should be obtained using sve_vq_from_vl(vl).

		or

	    SVE_PT_REGS_SVE

		SVE registers are live (GETREGSET) or are to be made live
		(SETREGSET).

		The payload contains the SVE register data, starting at offset
		SVE_PT_SVE_OFFSET from the start of user_sve_header, and with
		size SVE_PT_SVE_SIZE(vq, flags).

	... OR-ed with zero or more of the following flags, which have the same
	meaning and behaviour as the corresponding PR_SVE_VL_INHERIT and
	PR_SVE_SET_VL_ONEXEC flags:

	    SVE_PT_VL_INHERIT

	    SVE_PT_VL_ONEXEC (SETREGSET only).

	If neither the FPSIMD nor the SVE flag is provided, then no register
	payload is available.  This is only possible when SME is implemented.
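
A sketch of reading NT_ARM_SVE from a stopped tracee, using the size field to grow the buffer and retry as described above.  struct example_sve_header is a hypothetical mirror of the user_sve_header fields listed in this section (on arm64, use the real struct from <asm/ptrace.h>); the NT_ARM_SVE fallback value is assumed from <elf.h>, and rounding the buffer to a multiple of 16 bytes is an assumption about the regset's transfer granule.

```c
#include <elf.h>	/* NT_ARM_SVE */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>	/* struct iovec */

#ifndef NT_ARM_SVE
#define NT_ARM_SVE 0x405	/* assumed value, as in <elf.h> */
#endif

/* Hypothetical mirror of the user_sve_header fields described above. */
struct example_sve_header {
	uint32_t size;
	uint32_t max_size;
	uint16_t vl;
	uint16_t max_vl;
	uint16_t flags;
	uint16_t reserved;
};

/*
 * Read the complete NT_ARM_SVE regset of a stopped tracee.  Returns a
 * malloc()ed buffer beginning with the header (the caller frees it) and
 * stores the number of bytes read in *len_out, or returns NULL with errno
 * set by ptrace() or malloc().
 */
static void *example_read_sve_regset(pid_t pid, size_t *len_out)
{
	size_t len = sizeof(struct example_sve_header);

	for (;;) {
		void *buf = malloc(len);
		struct iovec iov = { .iov_base = buf, .iov_len = len };
		size_t want;

		if (!buf)
			return NULL;
		if (ptrace(PTRACE_GETREGSET, pid,
			   (void *)(uintptr_t)NT_ARM_SVE, &iov) != 0) {
			free(buf);
			return NULL;
		}

		want = ((struct example_sve_header *)buf)->size;
		if (want <= len) {
			if (len_out)
				*len_out = iov.iov_len;
			return buf;
		}

		/* Too small: retry with room for the whole regset, rounded up
		 * to 16 bytes (assumed transfer granule). */
		free(buf);
		len = (want + 15) & ~(size_t)15;
	}
}
```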


* The effects of changing the vector length and/or flags are equivalent to
  those documented for PR_SVE_SET_VL.

  The caller must make a further GETREGSET call if it needs to know what VL is
  actually set by SETREGSET, unless it is known in advance that the requested
  VL is supported.

* In the SVE_PT_REGS_SVE case, the size and layout of the payload depend on
  the header fields.  The SVE_PT_SVE_*() macros are provided to facilitate
  access to the members.

* In either case, for SETREGSET it is permissible to omit the payload, in which
  case only the vector length and flags are changed (along with any
  consequences of those changes).

* On systems supporting SME, when the target is in streaming mode a GETREGSET
  for NT_ARM_SVE will return only the user_sve_header with no register data.
  Similarly, a GETREGSET for NT_ARM_SSVE will not return any register data
  when the target is not in streaming mode.

* A GETREGSET for NT_ARM_SSVE will never return SVE_PT_REGS_FPSIMD.

* For SETREGSET, if an SVE_PT_REGS_SVE payload is present and the
  requested VL is not supported, the effect will be the same as if the
  payload were omitted, except that an EIO error is reported.  No
  attempt is made to translate the payload data to the correct layout
  for the vector length actually set.  The thread's FPSIMD state is
  preserved, but the remaining bits of the SVE registers become
  unspecified.  It is up to the caller to translate the payload layout
  for the actual VL and retry.

* Where SME is implemented, it is not possible to GETREGSET the register
  state for normal SVE when in streaming mode, nor the streaming mode
  register state when in normal mode, regardless of the implementation-defined
  behaviour of the hardware for sharing data between the two modes.

* Any SETREGSET of NT_ARM_SVE will exit streaming mode if the target was in
  streaming mode, and any SETREGSET of NT_ARM_SSVE will enter streaming mode
  if the target was not in streaming mode.

* The effect of writing a partial, incomplete payload is unspecified.
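
Because SETREGSET accepts a header with no payload, a debugger can change only a tracee's vector length.  In this sketch the helper and struct names are hypothetical, and both the fallback NT_ARM_SVE value and the choice of SVE_PT_REGS_FPSIMD (assumed to be 0) in the flags are assumptions; with that flag the SVE registers are made non-live.  A follow-up GETREGSET is still needed to learn the vector length actually set, and writing NT_ARM_SVE also exits streaming mode.

```c
#include <elf.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/uio.h>

#ifndef NT_ARM_SVE
#define NT_ARM_SVE 0x405		/* assumed value, as in <elf.h> */
#endif
#define EXAMPLE_SVE_PT_REGS_FPSIMD 0	/* assumed value of SVE_PT_REGS_FPSIMD */

/* Hypothetical mirror of struct user_sve_header. */
struct example_sve_header {
	uint32_t size;
	uint32_t max_size;
	uint16_t vl;
	uint16_t max_vl;
	uint16_t flags;
	uint16_t reserved;
};

/*
 * Ask for a new vector length on a stopped tracee without supplying any
 * register payload.  Returns 0 on success or -1 with errno set.  The VL
 * actually applied may be smaller; read NT_ARM_SVE back to find out.
 */
static int example_set_tracee_vl(pid_t pid, uint16_t vl)
{
	struct example_sve_header hdr = {
		.size = sizeof(hdr),	/* header only, no payload */
		.vl = vl,
		.flags = EXAMPLE_SVE_PT_REGS_FPSIMD,
	};
	struct iovec iov = { .iov_base = &hdr, .iov_len = sizeof(hdr) };

	if (ptrace(PTRACE_SETREGSET, pid,
		   (void *)(uintptr_t)NT_ARM_SVE, &iov) != 0)
		return -1;
	return 0;
}
```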


8.  ELF coredump extensions
---------------------------

* NT_ARM_SVE and NT_ARM_SSVE notes will be added to each coredump for
  each thread of the dumped process.  The contents will be equivalent to the
  data that would have been read if a PTRACE_GETREGSET of the corresponding
  type were executed for each thread when the coredump was generated.


9.  System runtime configuration
--------------------------------

* To mitigate the ABI impact of expansion of the signal frame, a policy
  mechanism is provided for administrators, distro maintainers and developers
  to set the default vector length for userspace processes:

/proc/sys/abi/sve_default_vector_length

    Writing the text representation of an integer to this file sets the system
    default vector length to the specified value, unless the value is greater
    than the maximum vector length supported by the system, in which case the
    default vector length is set to that maximum.

    The result can be determined by reopening the file and reading its
    contents.

    At boot, the default vector length is initially set to 64 or the maximum
    supported vector length, whichever is smaller.  This determines the initial
    vector length of the init process (PID 1).

    Reading this file returns the current system default vector length.

* At every execve() call, the new vector length of the new process is set to
  the system default vector length, unless

    * PR_SVE_VL_INHERIT (or equivalently SVE_PT_VL_INHERIT) is set for the
      calling thread, or

    * a deferred vector length change is pending, established via the
      PR_SVE_SET_VL_ONEXEC flag (or SVE_PT_VL_ONEXEC).

* Modifying the system default vector length does not affect the vector length
  of any existing process or thread that does not make an execve() call.
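
Reading and updating the default through the sysctl file might look like the following.  The helper names are hypothetical; the read helper returns -1 where the file is absent, for example on kernels without SVE support.  Writing requires appropriate privileges, and since the kernel may apply a different value than requested, the result should be read back afterwards.

```c
#include <stdio.h>

/* Path documented above. */
#define EXAMPLE_SVE_DEFAULT_VL_PATH "/proc/sys/abi/sve_default_vector_length"

/*
 * Return the system default vector length in bytes, or -1 if the file
 * cannot be opened (e.g. no SVE support) or does not contain an integer.
 */
static int example_read_default_vl(void)
{
	FILE *f = fopen(EXAMPLE_SVE_DEFAULT_VL_PATH, "r");
	int vl = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &vl) != 1)
		vl = -1;
	fclose(f);
	return vl;
}

/*
 * Request a new default.  Returns 0 if the write succeeded, -1 otherwise.
 * Reopen the file with example_read_default_vl() to see the value applied.
 */
static int example_write_default_vl(int vl)
{
	FILE *f = fopen(EXAMPLE_SVE_DEFAULT_VL_PATH, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", vl) > 0;
	/* fclose() flushes the buffer, so a rejected write is reported here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```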


10.  Perf extensions
--------------------

* The arm64-specific DWARF standard [5] added the VG (Vector Granule) register
  at index 46.  This register is used for DWARF unwinding when variable length
  SVE registers are pushed onto the stack.

* Its value is the current SVE vector length in bits divided by 64, which is
  VL / 8 when VL is expressed in bytes as in section 2.

* The value is included in Perf samples in the regs[46] field if
  PERF_SAMPLE_REGS_USER is set and the sample_regs_user mask has bit 46 set.

* The value is the current value at the time the sample was taken, and it can
  change over time.

* If the system doesn't support SVE when perf_event_open() is called with these
  settings, the event will fail to open.
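
The VG arithmetic and the sample_regs_user bit can be sketched as follows.  The constant and function names are hypothetical; index 46 is taken from the text above rather than from a header.  With VL measured in bytes, VG = (8 * VL) / 64 = VL / 8, so a 128-bit (16-byte) vector gives VG = 2.

```c
#include <stdint.h>

/* VG register index, taken from the text above (hypothetical name). */
#define EXAMPLE_PERF_REG_VG 46

/* VG = VL in bits / 64 = (8 * vl_bytes) / 64. */
static inline uint64_t example_vg_from_vl(uint64_t vl_bytes)
{
	return (8 * vl_bytes) / 64;
}

/* Inverse: VL in bytes from a sampled VG value. */
static inline uint64_t example_vl_from_vg(uint64_t vg)
{
	return vg * 8;
}

/* Add the VG bit to a perf_event_attr.sample_regs_user mask. */
static inline uint64_t example_sample_regs_user_with_vg(uint64_t mask)
{
	return mask | (UINT64_C(1) << EXAMPLE_PERF_REG_VG);
}
```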

Appendix A.  SVE programmer's model (informative)
=================================================

This section provides a minimal description of the additions made by SVE to the
ARMv8-A programmer's model that are relevant to this document.

Note: This section is for information only and not intended to be complete or
to replace any architectural specification.

A.1.  Registers
---------------

In A64 state, SVE adds the following:

* 32 8VL-bit vector registers Z0..Z31.

  For each Zn, Zn bits [127:0] alias the ARMv8-A vector register Vn.

  A register write using a Vn register name zeros all bits of the corresponding
  Zn except for bits [127:0].

* 16 VL-bit predicate registers P0..P15

* 1 VL-bit special-purpose predicate register FFR (the "first-fault register")

* a VL "pseudo-register" that determines the size of each vector register

  The SVE instruction set architecture provides no way to write VL directly.
  Instead, it can be modified only by EL1 and above, by writing appropriate
  system registers.

* The value of VL can be configured at runtime by EL1 and above:
  16 <= VL <= VLmax, where VL must be a multiple of 16.

* The maximum vector length is determined by the hardware:
  16 <= VLmax <= 256.

  (The SVE architecture specifies 256, but permits future architecture
  revisions to raise this limit.)

* FPSR and FPCR are retained from ARMv8-A, and interact with SVE floating-point
  operations in much the same way as they interact with ARMv8 floating-point
  operations::

         8VL-1                       128               0  bit index
        +----          ////            -----------------+
     Z0 |                               :       V0      |
      :                                          :
     Z7 |                               :       V7      |
     Z8 |                               :     * V8      |
      :                                       :  :
    Z15 |                               :     *V15      |
    Z16 |                               :      V16      |
      :                                          :
    Z31 |                               :      V31      |
        +----          ////            -----------------+
                                                 31    0
         VL-1                  0                +-------+
        +----       ////      --+          FPSR |       |
     P0 |                       |               +-------+
      : |                       |         *FPCR |       |
    P15 |                       |               +-------+
        +----       ////      --+
    FFR |                       |               +-----+
        +----       ////      --+            VL |     |
                                                +-----+

(*) callee-save:
    This only applies to bits [63:0] of Z-/V-registers.
    FPCR contains callee-save and caller-save bits.  See [4] for details.
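
The sizes and constraints above can be made concrete with a small model. VL is counted in bytes, so each Zn is VL bytes (8VL bits) wide, while each Pn and FFR is VL / 8 bytes (VL bits); for example, VL = 32 gives 256-bit Z registers and 32-bit predicates. The C sketch below checks a candidate VL against the listed constraints and models the rule that a write through Vn zeroes the rest of Zn. All names are hypothetical; this illustrates the programmer's model only and is not a kernel or architectural interface.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SVE_VL_ARCH_MAX 256	/* current architectural limit on VLmax, bytes */

/* VL constraints: 16 <= VL <= VLmax, VL a multiple of 16 (all in bytes),
 * with 16 <= VLmax <= 256. */
bool vl_is_valid(unsigned int vl, unsigned int vlmax)
{
	if (vlmax < 16 || vlmax > SVE_VL_ARCH_MAX)
		return false;
	return vl >= 16 && vl <= vlmax && vl % 16 == 0;
}

unsigned int zreg_bytes(unsigned int vl) { return vl; }      /* 8VL bits */
unsigned int preg_bytes(unsigned int vl) { return vl / 8; }  /* VL bits  */

/* Model of one Z register.  In this model byte 0 holds bits [7:0], so
 * bytes [15:0] hold bits [127:0], the part aliased by Vn. */
struct zreg_model {
	unsigned int vl;		/* assumed already valid */
	uint8_t bytes[SVE_VL_ARCH_MAX];
};

/* A write using the Vn name replaces bits [127:0] and zeroes the rest
 * of Zn up to the current VL. */
void write_v(struct zreg_model *z, const uint8_t v[16])
{
	memcpy(z->bytes, v, 16);
	memset(z->bytes + 16, 0, z->vl - 16);
}
```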


A.2.  Procedure call standard
-----------------------------

The ARMv8-A base procedure call standard is extended as follows with respect to
the additional SVE register state:

* All SVE register bits that are not shared with FP/SIMD are caller-save.

* Z8 bits [63:0] .. Z15 bits [63:0] are callee-save.

  This follows from the way these bits are mapped to V8..V15, which are
  callee-save in the base procedure call standard.
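
These rules can be expressed as a predicate over Z-register bits. Applying it also shows that the amount of Z state a callee must preserve (8 registers of 64 bits each, 64 bytes in total) does not depend on VL. The helper names are hypothetical, and only Z0..Z31 are modelled; P0..P15, FFR and FPCR are left out.

```c
#include <stdbool.h>

/* True if bit 'bit' of Z register 'n' is callee-save under the base
 * procedure call standard as extended above: only Z8..Z15 bits [63:0]
 * (shared with V8..V15) qualify; all other Z bits are caller-save. */
bool z_bit_is_callee_save(unsigned int n, unsigned int bit)
{
	return n >= 8 && n <= 15 && bit < 64;
}

/* Bytes of Z-register state a callee must preserve for a given VL
 * (in bytes), found by counting callee-save bits across Z0..Z31. */
unsigned int z_callee_save_bytes(unsigned int vl)
{
	unsigned int bits = 0;

	for (unsigned int n = 0; n < 32; n++)
		for (unsigned int b = 0; b < 8 * vl; b++)
			if (z_bit_is_callee_save(n, b))
				bits++;
	return bits / 8;
}
```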


Appendix B.  ARMv8-A FP/SIMD programmer's model
===============================================

Note: This section is for information only and not intended to be complete or
to replace any architectural specification.

Refer to [4] for more information.

ARMv8-A defines the following floating-point / SIMD register state:

* 32 128-bit vector registers V0..V31
* 2 32-bit status/control registers FPSR, FPCR

::

         127           0  bit index
        +---------------+
     V0 |               |
      : :               :
     V7 |               |
   * V8 |               |
   :  : :               :
   *V15 |               |
    V16 |               |
      : :               :
    V31 |               |
        +---------------+

                 31    0
                +-------+
           FPSR |       |
                +-------+
          *FPCR |       |
                +-------+

(*) callee-save:
    This only applies to bits [63:0] of V-registers.
    FPCR contains a mixture of callee-save and caller-save bits.


References
==========

[1] arch/arm64/include/uapi/asm/sigcontext.h
    AArch64 Linux signal ABI definitions

[2] arch/arm64/include/uapi/asm/ptrace.h
    AArch64 Linux ptrace ABI definitions

[3] Documentation/arch/arm64/cpu-feature-registers.rst

[4] ARM IHI0055C
    http://infocenter.arm.com/help/topic/com.arm.doc.ihi0055c/IHI0055C_beta_aapcs64.pdf
    http://infocenter.arm.com/help/topic/com.arm.doc.subset.swdev.abi/index.html
    Procedure Call Standard for the ARM 64-bit Architecture (AArch64)

[5] https://github.com/ARM-software/abi-aa/blob/main/aadwarf64/aadwarf64.rst
    DWARF for the Arm 64-bit Architecture (AArch64)