Lines Matching +full:a +full:- +full:bit

4 Adding a New System Call
7 This document describes what's involved in adding a new system call to the
9 :ref:`Documentation/process/submitting-patches.rst <submittingpatches>`.
13 ------------------------
15 The first thing to consider when adding a new system call is whether one of
18 kernel, there are other possibilities -- choose what fits best for your
21 - If the operations involved can be made to look like a filesystem-like
22 object, it may make more sense to create a new filesystem or device. This
23 also makes it easier to encapsulate the new functionality in a kernel module
26 - If the new functionality involves operations where the kernel notifies
27 userspace that something has happened, then returning a new file
30 - However, operations that don't map to
31 :manpage:`read(2)`/:manpage:`write(2)`-like operations
33 to a somewhat opaque API.
35 - If you're just exposing runtime system information, a new node in sysfs
39 in a namespaced/sandboxed/chrooted environment). Avoid adding any API to
40 debugfs, as this is not considered a 'production' interface to userspace.
41 - If the operation is specific to a particular file or file descriptor, then
43 :manpage:`fcntl(2)` is a multiplexing system call that hides a lot of complexity, so
46 (for example, getting/setting a simple flag related to a file descriptor).
47 - If the operation is specific to a particular task or process, then an
49 with :manpage:`fcntl(2)`, this system call is a complicated multiplexor so
50 is best reserved for near-analogs of existing ``prctl()`` commands or
51 getting/setting a simple flag related to a process.
55 -----------------------------------------
57 A new system call forms part of the API of the kernel, and has to be supported
58 indefinitely. As such, it's a very good idea to explicitly discuss the
63 together with the corresponding follow-up system calls --
65 ``pipe``/``pipe2``, ``renameat``/``renameat2`` -- so
68 For simpler system calls that only take a couple of arguments, the preferred
69 way to allow for future extensibility is to include a flags argument to the
75 return -EINVAL;
79 For more sophisticated system calls that involve a larger number of arguments,
80 it's preferred to encapsulate the majority of the arguments into a structure
81 that is passed in by pointer. Such a structure can cope with future extension
82 by including a size argument in the structure::
85 u32 size; /* userspace sets p->size = sizeof(struct xyzzy_params) */
91 As long as any subsequently added field, say ``param_4``, is designed so that a
95 - To cope with a later userspace program calling an older kernel, the kernel
98 - To cope with an older userspace program calling a newer kernel, the kernel
99 code can zero-extend a smaller instance of the structure (effectively
107 ---------------------------------------
109 If your new system call allows userspace to refer to a kernel object, it
110 should use a file descriptor as the handle for that object -- don't invent a
112 well-defined semantics for using file descriptors.
114 If your new :manpage:`xyzzy(2)` system call does return a new file descriptor,
115 then the flags argument should include a value that is equivalent to setting
119 ``execve()`` in another thread could leak a descriptor to
120 the exec'ed program. (However, resist the temptation to re-use the actual value
121 of the ``O_CLOEXEC`` constant, as it is architecture-specific and is part of a
124 If your system call returns a new file descriptor, you should also consider
126 descriptor. Making a file descriptor ready for reading or writing is the
130 If your new :manpage:`xyzzy(2)` system call involves a filename argument::
140 already-opened file descriptor using the ``AT_EMPTY_PATH`` flag, effectively
143 - xyzzyat(AT_FDCWD, path, ..., 0) is equivalent to xyzzy(path,...)
144 - xyzzyat(fd, "", ..., AT_EMPTY_PATH) is equivalent to fxyzzy(fd, ...)
150 If your new :manpage:`xyzzy(2)` system call involves a parameter describing an
151 offset within a file, make its type ``loff_t`` so that 64-bit offsets can be
152 supported even on 32-bit architectures.
155 it needs to be governed by the appropriate Linux capability bit (checked with
156 a call to ``capable()``), as described in the :manpage:`capabilities(7)` man
157 page. Choose an existing capability bit that governs related functionality,
159 under the same bit, as this goes against capabilities' purpose of splitting
161 overly-general ``CAP_SYS_ADMIN`` capability.
163 If your new :manpage:`xyzzy(2)` system call manipulates a process other than
164 the calling process, it should be restricted (using a call to
165 ``ptrace_may_access()``) so that only a calling process with the same
169 Finally, be aware that some non-x86 architectures have an easier time if
170 system call parameters that are explicitly 64-bit fall on odd-numbered
171 arguments (i.e. parameter 1, 3, 5), to allow use of contiguous pairs of 32-bit
172 registers. (This concern does not apply if the arguments are part of a
177 -----------------
183 - The core implementation of the system call, together with prototypes,
185 - Wiring up of the new system call for one particular architecture, usually
187 - A demonstration of the use of the new system call in userspace via a
189 - A draft man-page for the new system call, either as plain text in the
190 cover letter, or as a patch to the (separate) man-pages repository.
193 be cc'ed to linux-api@vger.kernel.org.
197 ----------------------------------
207 The new entry point also needs a corresponding function prototype, in
213 Some architectures (e.g. x86) have their own architecture-specific syscall
214 tables, but several other architectures share a generic syscall table. Add your
216 ``include/uapi/asm-generic/unistd.h``::
225 The file ``kernel/sys_ni.c`` provides a fallback stub implementation of each
226 system call, returning ``-ENOSYS``. Add your new system call here too::
231 normally be optional, so add a ``CONFIG`` option (typically to
234 - Include a description of the new functionality and system call controlled
236 - Make the option depend on EXPERT if it should be hidden from normal users.
237 - Make any new source files implementing the function dependent on the CONFIG
238 option in the Makefile (e.g. ``obj-$(CONFIG_XYZZY_SYSCALL) += xyzzy.o``).
239 - Double check that the kernel still builds with the new CONFIG option turned
242 To summarize, you need a commit that includes:
244 - ``CONFIG`` option for the new function, normally in ``init/Kconfig``
245 - ``SYSCALL_DEFINEn(xyzzy, ...)`` for the entry point
246 - corresponding prototype in ``include/linux/syscalls.h``
247 - generic table entry in ``include/uapi/asm-generic/unistd.h``
248 - fallback stub in ``kernel/sys_ni.c``
258 ``include/uapi/asm-generic/unistd.h``:
260 - arc
261 - arm64
262 - csky
263 - hexagon
264 - loongarch
265 - nios2
266 - openrisc
267 - riscv
272 As ``scripts/syscall.tbl`` serves as a common syscall table across multiple
273 architectures, a new entry is required in this table::
279 architecture-specific changes, consider using an architecture-specific ABI or
280 defining a new one.
282 If a new ABI, say ``xyz``, is introduced, the corresponding updates should be
287 To summarize, you need a commit that includes:
289 - ``CONFIG`` option for the new function, normally in ``init/Kconfig``
290 - ``SYSCALL_DEFINEn(xyzzy, ...)`` for the entry point
291 - corresponding prototype in ``include/linux/syscalls.h``
292 - new entry in ``scripts/syscall.tbl``
293 - (if needed) Makefile updates in ``arch/*/kernel/Makefile.syscalls``
294 - fallback stub in ``kernel/sys_ni.c``
298 ------------------------------
302 way (see below), this involves a "common" entry (for x86_64 and x32) in
316 ------------------------------------
318 For most system calls the same 64-bit implementation can be invoked even when
319 the userspace program is itself 32-bit; even if the system call's parameters
322 However, there are a couple of situations where a compatibility layer is
323 needed to cope with size differences between 32-bit and 64-bit.
325 The first is if the 64-bit kernel also supports 32-bit userspace programs, and
326 so needs to parse areas of (``__user``) memory that could hold either 32-bit or
327 64-bit values. In particular, this is needed whenever a system call argument
330 - a pointer to a pointer
331 - a pointer to a struct containing a pointer (e.g. ``struct iovec __user *``)
332 - a pointer to a varying sized integral type (``time_t``, ``off_t``,
334 - a pointer to a struct containing a varying sized integral type.
336 The second situation that requires a compatibility layer is if one of the
337 system call's arguments has a type that is explicitly 64-bit even on a 32-bit
338 architecture, for example ``loff_t`` or ``__u64``. In this case, a value that
339 arrives at a 64-bit kernel from a 32-bit application will be split into two
340 32-bit values, which then need to be re-assembled in the compatibility layer.
342 (Note that a system call argument that's a pointer to an explicit 64-bit type
343 does **not** need a compatibility layer; for example, :manpage:`splice(2)`'s arguments of
344 type ``loff_t __user *`` do not trigger the need for a ``compat_`` system call.)
348 SYSCALL_DEFINEn. This version of the implementation runs as part of a 64-bit
349 kernel, but expects to receive 32-bit parameter values and does whatever is
351 values to 64-bit versions and either calls on to the ``sys_`` version, or both of
352 them call a common inner implementation function.)
354 The compat entry point also needs a corresponding function prototype, in
360 If the system call involves a structure that is laid out differently on 32-bit
361 and 64-bit systems, say ``struct xyzzy_args``, then the include/linux/compat.h
362 header file should also include a compat version of the structure (``struct
363 compat_xyzzy_args``) where each variable-size field has the appropriate
366 parse the arguments from a 32-bit invocation.
387 version; the entry in ``include/uapi/asm-generic/unistd.h`` should use
395 - a ``COMPAT_SYSCALL_DEFINEn(xyzzy, ...)`` for the compat entry point
396 - corresponding prototype in ``include/linux/compat.h``
397 - (if needed) 32-bit mapping struct in ``include/linux/compat.h``
398 - instance of ``__SC_COMP`` not ``__SYSCALL`` in
399 ``include/uapi/asm-generic/unistd.h``
410 to indicate that a 32-bit userspace program running on a 64-bit kernel should
417 - ``COMPAT_SYSCALL_DEFINEn(xyzzy, ...)`` for the compat entry point
418 - corresponding prototype in ``include/linux/compat.h``
419 - modification of the entry in ``scripts/syscall.tbl`` to include an extra
421 - (if needed) 32-bit mapping struct in ``include/linux/compat.h``
429 On arm64, there is a dedicated syscall table for compatibility system calls
430 targeting 32-bit (AArch32) userspace: ``arch/arm64/tools/syscall_32.tbl``.
438 --------------------------------
440 To wire up the x86 architecture of a system call with a compatibility version,
444 column to indicate that a 32-bit userspace program running on a 64-bit kernel
450 the new system call. There's a choice here: the layout of the arguments
451 should either match the 64-bit version or the 32-bit version.
453 If there's a pointer-to-a-pointer involved, the decision is easy: x32 is
454 ILP32, so the layout should match the 32-bit version, and the entry in
462 If no pointers are involved, then it is preferable to re-use the 64-bit system
467 layout do indeed map exactly from x32 (-mx32) to either the 32-bit (-m32) or
468 64-bit (-m64) equivalents.
472 --------------------------------
475 continues exactly where it left off -- at the next instruction, with the
479 However, a few system calls do things differently. They might return to a
488 This is arch-specific, but typically involves defining assembly entry points
492 For x86_64, this is implemented as a ``stub_xyzzy`` entry point in
498 The equivalent for 32-bit programs running on a 64-bit kernel is normally
505 If the system call needs a compatibility layer (as in the previous section)
507 of the system call rather than the native 64-bit version. Also, if the x32 ABI
509 table will also need to invoke a stub that calls on to the ``compat_sys_``
512 For completeness, it's also nice to set up a mapping so that user-mode Linux
513 still works -- its syscall table will reference stub_xyzzy, but the UML build
515 simulates registers etc). Fixing this is as simple as adding a #define to
522 -------------
524 Most of the kernel treats system calls in a generic way, but there is the
527 The audit subsystem is one such special case; it includes (arch-specific)
528 functions that classify some special types of system call -- specifically
534 new system call, it's worth doing a kernel-wide grep for the existing system
539 -------
541 A new system call should obviously be tested; it is also useful to provide
542 reviewers with a demonstration of how user space programs will use the system
543 call. A good way to combine these aims is to include a simple self-test
544 program in a new directory under ``tools/testing/selftests/``.
546 For a new system call, there will obviously be no libc wrapper function and so
548 involves a new userspace-visible structure, the corresponding header will need
552 example, check that it works when compiled as an x86_64 (-m64), x86_32 (-m32)
553 and x32 (-mx32) ABI program.
557 for filesystem-related changes.
559 - https://linux-test-project.github.io/
560 - git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git
564 --------
566 All new system calls should come with a complete man page, ideally using groff
567 markup, but plain text will do. If groff is used, it's helpful to include a
568 pre-rendered ASCII version of the man page in the cover email for the
571 The man page should be cc'ed to linux-man@vger.kernel.org.
572 For more details, see https://www.kernel.org/doc/man-pages/patches.html
576 --------------------------------------
582 useful to be used within the kernel, needs to be shared between an old and a
583 new syscall, or needs to be shared between a syscall and its compatibility
584 variant, it should be implemented by means of a "helper" function (such as
589 At least on 64-bit x86, it will be a hard requirement from v4.17 onwards to not
590 call system call functions in the kernel. It uses a different calling
591 convention for system calls where ``struct pt_regs`` is decoded on-the-fly in a
593 This means that only those parameters which are actually needed for a specific
599 user data. This is another reason why calling ``sys_xyzzy()`` is generally a
602 Exceptions to this rule are only allowed in architecture-specific overrides,
603 architecture-specific compatibility wrappers, or other code in arch/.
607 ----------------------
609 - LWN article from Michael Kerrisk on use of flags argument in system calls:
611 - LWN article from Michael Kerrisk on how to handle unknown flags in a system
613 - LWN article from Jake Edge describing constraints on 64-bit system call
615 - Pair of LWN articles from David Drysdale that describe the system call
618 - https://lwn.net/Articles/604287/
619 - https://lwn.net/Articles/604515/
621 - Architecture-specific requirements for system calls are discussed in the
622 :manpage:`syscall(2)` man-page:
623 http://man7.org/linux/man-pages/man2/syscall.2.html#NOTES
624 - Collated emails from Linus Torvalds discussing the problems with ``ioctl()``:
626 - "How to not invent kernel interfaces", Arnd Bergmann,
628 - LWN article from Michael Kerrisk on avoiding new uses of CAP_SYS_ADMIN:
630 - Recommendation from Andrew Morton that all related information for a new
632 https://lore.kernel.org/r/20140724144747.3041b208832bbdf9fbce5d96@linux-foundation.org
633 - Recommendation from Michael Kerrisk that a new system call should come with
634 a man page: https://lore.kernel.org/r/CAKgNAkgMA39AfoSoA5Pe1r9N+ZzfYQNvNPvcRN7tOvRb8+v06Q@mail.gma…
635 - Suggestion from Thomas Gleixner that x86 wire-up should be in a separate
637 - Suggestion from Greg Kroah-Hartman that it's good for new system calls to
638 come with a man-page & selftest: https://lore.kernel.org/r/20140320025530.GA25469@kroah.com
639 - Discussion from Michael Kerrisk of new system call vs. :manpage:`prctl(2)` extension:
640 https://lore.kernel.org/r/CAHO5Pa3F2MjfTtfNxa8LbnkeeU8=YJ+9tDqxZpw7Gz59E-4AUg@mail.gmail.com
641 - Suggestion from Ingo Molnar that system calls that involve multiple
642 arguments should encapsulate those arguments in a struct, which includes a
644 - Numbering oddities arising from (re-)use of O_* numbering space flags:
646 - commit 75069f2b5bfb ("vfs: renumber FMODE_NONOTIFY and add to uniqueness
648 - commit 12ed2e36c98a ("fanotify: FMODE_NONOTIFY and __O_SYNC in sparc
650 - commit bb458c644a59 ("Safer ABI for O_TMPFILE")
652 - Discussion from Matthew Wilcox about restrictions on 64-bit arguments:
653 https://lore.kernel.org/r/20081212152929.GM26095@parisc-linux.org
654 - Recommendation from Greg Kroah-Hartman that unknown flags should be
656 - Recommendation from Linus Torvalds that x32 system calls should prefer
657 compatibility with 64-bit versions rather than 32-bit versions:
659 - Patch series revising system call table infrastructure to use
661 https://lore.kernel.org/lkml/20240704143611.2979589-1-arnd@kernel.org