The Linux Kernel threat model
=============================

There are a lot of assumptions regarding what the kernel does and does not
protect against. These assumptions tend to cause confusion for bug reports
(:doc:`security-related ones <security-bugs>` vs :doc:`non-security ones
<../admin-guide/reporting-issues>`), and can complicate security enforcement
when the responsibilities for some boundaries are not clear between the kernel,
distros, administrators and users.

This document tries to clarify the responsibilities of the kernel in this
domain.

The kernel's responsibilities
-----------------------------

The kernel abstracts access to local hardware resources and to remote systems
in a way that allows multiple local users to get a fair share of the available
resources granted to them, and, when the underlying hardware permits, to assign
a level of confidentiality to their communications and to the data they are
processing or storing.

The kernel assumes that the underlying hardware behaves according to its
specifications. This includes the integrity of the CPU's instruction set, the
transparency of the branch prediction unit and the cache units, the consistency
of the Memory Management Unit (MMU), the isolation of DMA-capable peripherals
(e.g., via IOMMU), state transitions in controllers, ranges of values read from
registers, compliance with documented hardware limitations, etc.

When hardware fails to maintain its specified isolation (e.g., CPU bugs,
side-channels, hardware response to unexpected inputs), the kernel will usually
attempt to implement reasonable mitigations. These are best-effort measures
intended to reduce the attack surface or elevate the cost of an attack within
the limits of the hardware's facilities; they do not constitute a
kernel-provided safety guarantee.

Users always perform their activities under the authority of an administrator
who is able to grant or deny various types of permissions that may affect how
users benefit from available resources, or the level of confidentiality of
their activities. Administrators may also delegate all or part of their own
permissions to some users, particularly, but not only, via capabilities. All of
this is performed via configuration (sysctl, file system permissions, etc.).
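
As an illustration of configuration through file system permissions, the
following minimal sketch restricts a file to its owner and then checks that no
permission bits remain for the group or for other users. The helper names
``restrict_to_owner()`` and ``open_to_others()`` are hypothetical and only
serve this example; access control lists and security modules, which may also
affect access, are not considered.

.. code-block:: python

    import os
    import stat


    def restrict_to_owner(path):
        """Leave read/write access to the owner only (mode 0600)."""
        os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


    def open_to_others(path):
        """Tell whether any group or other permission bit is set on path."""
        mode = stat.S_IMODE(os.stat(path).st_mode)
        return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))

Calling ``restrict_to_owner()`` on a file whose mode was 0644 makes
``open_to_others()`` return ``False`` for that file.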

The Linux Kernel applies a certain collection of default settings that match
its threat model. Distros have their own threat model and will come with their
own configuration presets, which the administrator may have to adjust (relax or
restrict) to better suit their expectations.

By default, the Linux Kernel guarantees the following protections when running
on common processors featuring privilege levels and memory management units:

* **User-based isolation**: an unprivileged user may restrict access to their
  own data from other unprivileged users running on the same system. This
  includes:

  * stored data, via file system permissions
  * in-memory data (pages are not accessible by default to other users)
  * process activity (ptrace is not permitted to other users)
  * inter-process communication (other users may not observe data exchanged via
    UNIX domain sockets or other IPC mechanisms)
  * network communications within the same system or with other systems

* **Capability-based protection**:

  * users not having elevated capabilities (including but not limited to
    ``CAP_SYS_ADMIN``) may not alter the kernel's configuration, memory or
    state, change other users' view of the file system layout, grant any user
    capabilities they do not have, or affect the system's availability
    (shutdown, reboot, panic, hang, or making the system unresponsive via
    unbounded resource exhaustion).
  * users not having the ``CAP_NET_ADMIN`` capability may not alter the network
    configuration, or intercept or spoof network communications from other
    users or systems.
  * users not having ``CAP_SYS_PTRACE`` may not observe the activity of other
    users' processes.
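
Whether a process holds one of the capabilities mentioned above can be checked
from user space. The sketch below decodes the ``CapEff`` field of
``/proc/<pid>/status``; the function names are hypothetical, and the capability
numbers are those of ``include/uapi/linux/capability.h``.

.. code-block:: python

    import re

    # Capability bit numbers from include/uapi/linux/capability.h.
    CAP_NET_ADMIN = 12
    CAP_SYS_PTRACE = 19
    CAP_SYS_ADMIN = 21


    def parse_cap_eff(status_text):
        """Return the effective capability mask found in status_text."""
        match = re.search(r"^CapEff:\s*([0-9a-fA-F]+)$", status_text,
                          re.MULTILINE)
        if match is None:
            raise ValueError("no CapEff line found")
        return int(match.group(1), 16)


    def has_capability(mask, cap):
        """Tell whether capability number cap is set in mask."""
        return bool(mask & (1 << cap))


    def current_capabilities(path="/proc/self/status"):
        """Read the effective capability mask of the calling process."""
        with open(path) as status:
            return parse_cap_eff(status.read())

For an ordinary unprivileged process, ``current_capabilities()`` is expected to
return an empty mask, so ``has_capability()`` is false for each of the three
capabilities.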

When ``CONFIG_USER_NS`` is set, the kernel also permits unprivileged users to
create their own user namespace in which they have all capabilities, but with a
number of restrictions (they may not perform actions that affect the initial
user namespace, such as changing the time, loading modules or mounting block
devices). Please refer to ``user_namespaces(7)`` for more details; the
possibilities of user namespaces are not covered in this document.
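
A process can get a hint about whether it runs inside a non-initial user
namespace by looking at ``/proc/self/uid_map``, whose format is described in
``user_namespaces(7)``. The sketch below is a heuristic only, with hypothetical
function names: the initial namespace reports an identity mapping over the
whole ID range, but a sufficiently privileged process may set up the same
mapping in a child namespace.

.. code-block:: python

    FULL_ID_RANGE = 4294967295  # length reported by the initial namespace


    def parse_id_map(text):
        """Parse uid_map content into (inside, outside, length) tuples."""
        entries = []
        for line in text.splitlines():
            fields = line.split()
            if len(fields) == 3:
                entries.append(tuple(int(field) for field in fields))
        return entries


    def looks_like_initial_namespace(entries):
        """Heuristic: a single identity mapping covering the whole range."""
        return entries == [(0, 0, FULL_ID_RANGE)]


    def current_uid_map(path="/proc/self/uid_map"):
        """Read and parse the UID mapping of the calling process."""
        with open(path) as id_map:
            return parse_id_map(id_map.read())

A mapping such as ``0 1000 1``, typical of a namespace created by an
unprivileged user, is not reported as the initial namespace by this check.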

The kernel also offers a lot of troubleshooting and debugging facilities, which
can constitute attack vectors when placed in the wrong hands. While some of them
are designed to be accessible to regular local users with a low risk (e.g.
kernel logs via ``/proc/kmsg``), some would expose enough information to
represent a risk in most places and the decision to expose them is under the
administrator's responsibility (perf events, traces), and others are not
designed to be accessed by non-privileged users (e.g. debugfs). Access to these
facilities by a user who has been explicitly granted permission by an
administrator does not constitute a security breach.
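
Exposure of some of these facilities is commonly governed by sysctls. The
sketch below reads three of them, ``kernel.perf_event_paranoid``,
``kernel.dmesg_restrict`` and ``kernel.kptr_restrict``, which were picked for
illustration and are not named by the text above; the function names are
hypothetical. It only reports the values and leaves their interpretation to
the administrator.

.. code-block:: python

    import os

    # Sysctls picked for illustration; this list is an assumption.
    DEBUG_EXPOSURE_SYSCTLS = (
        "kernel.perf_event_paranoid",
        "kernel.dmesg_restrict",
        "kernel.kptr_restrict",
    )


    def read_sysctl_int(name, root="/proc/sys"):
        """Return the integer value of a dotted sysctl name, or None."""
        path = os.path.join(root, *name.split("."))
        try:
            with open(path) as entry:
                return int(entry.read().split()[0])
        except (OSError, ValueError, IndexError):
            # Missing, unreadable or non-numeric entries are reported as None.
            return None


    def report(root="/proc/sys"):
        """Map each sysctl of interest to its current value (or None)."""
        return {name: read_sysctl_int(name, root)
                for name in DEBUG_EXPOSURE_SYSCTLS}

The ``root`` parameter exists only so that the helpers can be exercised against
a directory other than ``/proc/sys``.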

Bugs that allow the principles above to be violated constitute security
breaches. However, bugs that permit one violation only once another one was
already achieved are only weaknesses. The kernel applies a number of
self-protection measures whose purpose is to avoid crossing a security boundary
when certain classes of bugs are found, but a failure of these extra
protections does not constitute a vulnerability on its own.

What does not constitute a security bug
---------------------------------------

In the Linux kernel's threat model, the following classes of problems are
**NOT** considered Linux Kernel security bugs. However, when it is believed
that the kernel could do better, they should be reported so that they can be
reviewed and fixed where reasonably possible, but they will be handled like any
regular bug:

* **Configuration**:

  * outdated kernels and particularly end-of-life branches are out of the scope
    of the kernel's threat model: administrators are responsible for keeping
    their system up to date. For a bug to qualify as a security bug, it must be
    demonstrated that it affects actively maintained versions.

  * build-level: changes to the kernel configuration that are explicitly
    documented as lowering the security level (e.g. ``CONFIG_NOMMU``), or
    targeted at developers only.

  * OS-level: changes to command line parameters, sysctls, filesystem
    permissions, user capabilities, or exposure of privileged interfaces, that
    explicitly increase exposure by either offering non-default access to
    unprivileged users, or reducing the kernel's ability to enforce some
    protections or mitigations. Example: write access to procfs or debugfs.

  * issues triggered only when using features intended for development or
    debugging (e.g., ``LOCKDEP``, ``KASAN``, ``FAULT_INJECTION``): these
    features are known to introduce overhead and potential instability and are
    not intended for production use.

  * issues affecting drivers exposed under ``CONFIG_STAGING``, as well as
    features marked EXPERIMENTAL in the configuration.

  * loading of explicitly insecure/broken/staging modules, and generally any
    use of a subsystem marked as experimental or not intended for production
    use.

  * running out-of-tree modules or unofficial kernel forks; these should be
    reported to the relevant vendor.

* **Excess of initial privileges**:

  * actions performed by a user already possessing the privileges required to
    perform that action or modify that state (e.g. ``CAP_SYS_ADMIN``,
    ``CAP_NET_ADMIN``, ``CAP_SYS_RAWIO``, ``CAP_SYS_MODULE`` with no further
    boundary being crossed).

  * actions performed in a user namespace that do not bypass the restrictions
    imposed on the initial user (e.g. ptrace usage, signal delivery, resource
    usage, access to FS/device/sysctl/memory, network binding, system/network
    configuration, etc.).

  * anything performed by the root user in the initial namespace (e.g. kernel
    oops when writing to a privileged device).

* **Out of production use**:

  This covers theoretical/probabilistic attacks that rely on laboratory
  conditions with zero system noise, or those requiring an unrealistic number
  of attempts (e.g., billions of trials) that would be detected by standard
  system monitoring long before success, such as:

  * prediction of random numbers that only works in a totally silent
    environment (such as IP ID, TCP ports or sequence numbers that can only be
    guessed in a lab).

  * activity observation and information leaks based on probabilistic
    approaches that are prone to measurement noise and not realistically
    reproducible on a production system.

  * issues that can only be triggered by heavy attacks (e.g. brute force) whose
    impact on the system makes it unlikely or impossible to remain undetected
    before they succeed (e.g. consuming all memory before succeeding).

  * problems seen only under development simulators, emulators, or combinations
    that do not exist on real systems at the time of reporting (issues
    involving tens of millions of threads, tens of thousands of CPUs,
    unrealistic CPU frequencies, RAM sizes, disk capacities or network speeds).

  * issues whose reproduction requires hardware modification or emulation,
    including fake USB devices that pretend to be another one.

  * issues that can be triggered only at a cost that is orders of magnitude
    higher than the expected benefits (e.g. a fully functional keyboard
    emulator built only to retrieve 7 uninitialized bytes in a structure, or a
    brute-force method involving millions of connection attempts to guess a
    port number).

* **Hardening failures**:

  * ability to bypass some of the kernel's hardening measures with no
    demonstrable exploit path (e.g. ASLR bypass, event timing or probing with
    no demonstrable consequence). These are just weaknesses, not
    vulnerabilities.

  * missing argument checks and failure to report certain errors with no
    immediate consequence.

* **Random information leaks**:

  This concerns leaks of small pieces of data that happen to be present and
  that cannot be chosen by the attacker, or that are subject to access
  restrictions:

  * structure padding reported by syscalls or other interfaces.

  * identifiers, partial data, non-terminated strings reported in error
    messages.

  * leaks of kernel memory addresses/pointers: these do not constitute an
    immediately exploitable vector and are not security bugs, though they must
    be reported and fixed.

* **Crafted file system images**:

  * bugs triggered by mounting a corrupted or maliciously crafted file system
    image are generally not security bugs, as the kernel assumes the underlying
    storage media is under the administrator's control, unless the filesystem
    driver is specifically documented as being hardened against untrusted media.

  * issues that are resolved, mitigated, or detected by running a filesystem
    consistency check (fsck) on the image prior to mounting.

* **Physical access**:

  Issues that require physical access to the machine, hardware modification, or
  the use of specialized hardware (e.g., logic analyzers, DMA-attack tools over
  PCI-E/Thunderbolt) are out of scope unless the system is explicitly
  configured with technologies meant to defend against such attacks
  (e.g. IOMMU).

* **Functional and performance regressions**:

  Any issue that can be mitigated by setting proper permissions and limits
  does not qualify as a security bug.
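
As an example of a limit that an administrator or a service can set, the sketch
below lowers the soft limit on open file descriptors for the calling process
using ``setrlimit(2)`` through Python's ``resource`` module. The choice of
``RLIMIT_NOFILE`` and the function name ``cap_soft_limit()`` are assumptions
made for this example.

.. code-block:: python

    import resource


    def cap_soft_limit(resource_id, ceiling):
        """Lower the soft limit to at most ceiling; never raise it."""
        soft, hard = resource.getrlimit(resource_id)
        new_soft = ceiling
        # The soft limit may not exceed the hard limit.
        if hard != resource.RLIM_INFINITY and new_soft > hard:
            new_soft = hard
        # Keep the current soft limit when it is already stricter.
        if soft != resource.RLIM_INFINITY and soft < new_soft:
            new_soft = soft
        resource.setrlimit(resource_id, (new_soft, hard))
        return new_soft, hard

Calling ``cap_soft_limit(resource.RLIMIT_NOFILE, 512)`` leaves the hard limit
untouched, so a process that is allowed to do so may still raise its soft limit
again up to that value.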