.. SPDX-License-Identifier: GPL-2.0

================
FUSE Passthrough
================

Introduction
============

FUSE (Filesystem in Userspace) passthrough is a feature designed to improve the
performance of FUSE filesystems for I/O operations. Typically, FUSE operations
involve communication between the kernel and a userspace FUSE daemon, which can
incur overhead. Passthrough allows certain operations on a FUSE file to bypass
the userspace daemon and be executed directly by the kernel on an underlying
"backing file".

This is achieved by the FUSE daemon registering a file descriptor (pointing to
the backing file on a lower filesystem) with the FUSE kernel module. The kernel
then returns an identifier (``backing_id``) for this registered backing file.
When a FUSE file is subsequently opened, the FUSE daemon can, in its response to
the ``OPEN`` request, include this ``backing_id`` and set the
``FOPEN_PASSTHROUGH`` flag. This establishes a direct link for specific
operations.

Currently, passthrough is supported for operations like ``read(2)``/``write(2)``
(via ``read_iter``/``write_iter``), ``splice(2)``, and ``mmap(2)``.

Enabling Passthrough
====================

To use FUSE passthrough:

  1. The kernel's FUSE driver must be built with ``CONFIG_FUSE_PASSTHROUGH``
     enabled.
  2. The FUSE daemon, during the ``FUSE_INIT`` handshake, must negotiate the
     ``FUSE_PASSTHROUGH`` capability and specify its desired
     ``max_stack_depth``.
  3. The (privileged) FUSE daemon uses the ``FUSE_DEV_IOC_BACKING_OPEN`` ioctl
     on its connection file descriptor (e.g., ``/dev/fuse``) to register a
     backing file descriptor and obtain a ``backing_id``.
  4. When handling an ``OPEN`` or ``CREATE`` request for a FUSE file, the daemon
     replies with the ``FOPEN_PASSTHROUGH`` flag set in
     ``fuse_open_out::open_flags`` and provides the corresponding ``backing_id``
     in ``fuse_open_out::backing_id``.
  5. The FUSE daemon should eventually call ``FUSE_DEV_IOC_BACKING_CLOSE`` with
     the ``backing_id`` to release the kernel's reference to the backing file
     once it is no longer needed to set up passthrough opens.
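
Steps 3 to 5 can be sketched from the daemon's side as follows. This is a
minimal illustration, not a reference implementation: the helper names
(``register_backing``, ``fill_passthrough_open_reply``, ``unregister_backing``)
are hypothetical, and the struct layouts and ioctl numbers are local copies
that are assumed to match ``<linux/fuse.h>`` from a kernel with passthrough
support. A real daemon should include that header instead of repeating the
definitions. The ``FUSE_INIT`` negotiation of step 2 is not shown.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Local copies of UAPI definitions, assumed to match <linux/fuse.h> on a
 * kernel with passthrough support. They are repeated here only so that the
 * sketch builds against older headers.
 */
struct fuse_backing_map {
    int32_t  fd;        /* backing file descriptor to register */
    uint32_t flags;     /* assumed: must be zero */
    uint64_t padding;   /* assumed: must be zero */
};

struct fuse_open_out {
    uint64_t fh;
    uint32_t open_flags;
    int32_t  backing_id;
};

#define FUSE_DEV_IOC_MAGIC         229
#define FUSE_DEV_IOC_BACKING_OPEN  _IOW(FUSE_DEV_IOC_MAGIC, 1, struct fuse_backing_map)
#define FUSE_DEV_IOC_BACKING_CLOSE _IOW(FUSE_DEV_IOC_MAGIC, 2, uint32_t)
#define FOPEN_PASSTHROUGH          (1 << 7)

static_assert(sizeof(struct fuse_backing_map) == 16, "unexpected layout");

/*
 * Step 3: register backing_fd on the FUSE connection descriptor.
 * Returns the backing_id on success, or -1 with errno set.
 */
static int register_backing(int fuse_dev_fd, int backing_fd)
{
    struct fuse_backing_map map;

    memset(&map, 0, sizeof(map));
    map.fd = backing_fd;
    return ioctl(fuse_dev_fd, FUSE_DEV_IOC_BACKING_OPEN, &map);
}

/* Step 4: build the OPEN reply payload that requests passthrough. */
static void fill_passthrough_open_reply(struct fuse_open_out *out,
                                        uint64_t fh, int backing_id)
{
    memset(out, 0, sizeof(*out));
    out->fh = fh;
    out->open_flags = FOPEN_PASSTHROUGH;
    out->backing_id = backing_id;
}

/* Step 5: drop the kernel's reference that is tied to backing_id. */
static int unregister_backing(int fuse_dev_fd, int backing_id)
{
    uint32_t id = (uint32_t)backing_id;

    return ioctl(fuse_dev_fd, FUSE_DEV_IOC_BACKING_CLOSE, &id);
}
```

A daemon would typically call ``register_backing()`` when it first needs
passthrough for a file, reuse the returned ``backing_id`` in its ``OPEN``
replies, and call ``unregister_backing()`` once no further opens need it.
Without ``CAP_SYS_ADMIN`` the registration is expected to fail.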

Privilege Requirements
======================

Setting up passthrough functionality currently requires the FUSE daemon to
possess the ``CAP_SYS_ADMIN`` capability. This requirement stems from several
security and resource management considerations that are actively being
discussed and worked on. The primary reasons for this restriction are detailed
below.

Resource Accounting and Visibility
----------------------------------

The core mechanism for passthrough involves the FUSE daemon opening a file
descriptor to a backing file and registering it with the FUSE kernel module via
the ``FUSE_DEV_IOC_BACKING_OPEN`` ioctl. This ioctl returns a ``backing_id``
associated with a kernel-internal ``struct fuse_backing`` object, which holds a
reference to the backing ``struct file``.

A significant concern arises because the FUSE daemon can close its own file
descriptor to the backing file after registration. The kernel, however, will
still hold a reference to the ``struct file`` via the ``struct fuse_backing``
object as long as it's associated with a ``backing_id`` (or subsequently, with
an open FUSE file in passthrough mode).

This behavior would lead to two main issues if unprivileged FUSE daemons were
allowed to register backing files:

  1. **Invisibility to lsof and other inspection tools**: Once the FUSE
     daemon closes its file descriptor, the open backing file held by the kernel
     becomes "hidden." Standard tools like ``lsof``, which typically inspect
     process file descriptor tables, would not be able to identify that this
     file is still open by the system on behalf of the FUSE filesystem. This
     makes it difficult for system administrators to track resource usage or
     debug issues related to open files (e.g., preventing unmounts).

  2. **Bypassing RLIMIT_NOFILE**: The FUSE daemon process is subject to
     resource limits, including the maximum number of open file descriptors
     (``RLIMIT_NOFILE``). If an unprivileged daemon could register backing files
     and then close its own FDs, it could potentially cause the kernel to hold
     an unlimited number of open ``struct file`` references without these being
     accounted against the daemon's ``RLIMIT_NOFILE``. This could lead to a
     denial-of-service (DoS) by exhausting system-wide file resources.

The ``CAP_SYS_ADMIN`` requirement acts as a safeguard against these issues,
restricting this powerful capability to trusted processes.
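
To make the accounting gap concrete, the sketch below shows the two views that
per-process tooling relies on: the entries under ``/proc/self/fd`` (roughly
what ``lsof``-style tools enumerate) and the soft ``RLIMIT_NOFILE`` value. The
helper names ``count_visible_fds`` and ``within_nofile_limit`` are
hypothetical, and the code assumes a Linux system with ``/proc`` mounted. A
descriptor that the daemon has closed after registration appears in neither
view, even though the kernel still holds the file.

```c
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Count the descriptors listed in /proc/self/fd. The directory stream used
 * for the scan holds a descriptor of its own, which is subtracted again.
 * Returns -1 if /proc/self/fd cannot be opened.
 */
static long count_visible_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    struct dirent *ent;
    long count = 0;

    if (!dir)
        return -1;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] != '.')   /* skip "." and ".." */
            count++;
    }
    closedir(dir);
    return count - 1;
}

/*
 * Return 1 if open_fds is below the soft RLIMIT_NOFILE, 0 if not,
 * and -1 if the limit cannot be read.
 */
static int within_nofile_limit(long open_fds)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
        return -1;
    return lim.rlim_cur == RLIM_INFINITY || (rlim_t)open_fds < lim.rlim_cur;
}
```

Opening a file raises the value returned by ``count_visible_fds()`` by one and
closing it lowers the value again, regardless of whether another kernel object
still references the underlying ``struct file``.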

**NOTE**: ``io_uring`` addresses a similar issue by exposing its "fixed files",
which are visible via ``fdinfo`` and accounted under the registering user's
``RLIMIT_NOFILE``.

Filesystem Stacking and Shutdown Loops
--------------------------------------

Another concern relates to the potential for creating complex and problematic
filesystem stacking scenarios if unprivileged users could set up passthrough.
A FUSE passthrough filesystem might use a backing file that resides:

  * On the *same* FUSE filesystem.
  * On another filesystem (like OverlayFS) which itself might have an upper or
    lower layer that is a FUSE filesystem.

These configurations could create dependency loops, particularly during
filesystem shutdown or unmount sequences, leading to deadlocks or system
instability. This is conceptually similar to the risks associated with the
``LOOP_SET_FD`` ioctl, which also requires ``CAP_SYS_ADMIN``.

To mitigate this, FUSE passthrough already incorporates checks based on
filesystem stacking depth (``sb->s_stack_depth`` and ``fc->max_stack_depth``).
For example, during the ``FUSE_INIT`` handshake, the FUSE daemon can negotiate
the ``max_stack_depth`` it supports. When a backing file is registered via
``FUSE_DEV_IOC_BACKING_OPEN``, the kernel checks if the backing file's
filesystem stack depth is within the allowed limit.
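
The depth check can be modelled in userspace as shown below. This is a
simplified sketch, not the kernel code: ``struct fs_model`` and
``struct conn_model`` are hypothetical stand-ins for the superblock and the
FUSE connection, and both the strict comparison and the ``-ELOOP`` error code
are assumptions about how the kernel implements the check.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the backing filesystem's superblock. */
struct fs_model {
    int s_stack_depth;      /* 0 for a filesystem that is not stacked */
};

/* Hypothetical stand-in for the FUSE connection state. */
struct conn_model {
    int max_stack_depth;    /* value negotiated during FUSE_INIT */
};

/*
 * Accept a backing file only if its filesystem sits strictly below the
 * negotiated maximum stacking depth (assumed comparison).
 */
static int check_backing_depth(const struct conn_model *fc,
                               const struct fs_model *backing_sb)
{
    if (backing_sb->s_stack_depth >= fc->max_stack_depth)
        return -ELOOP;
    return 0;
}
```

Under these assumptions, a connection that negotiated a ``max_stack_depth`` of
1 accepts backing files on a filesystem of depth 0 and rejects backing files on
a stacked filesystem of depth 1, while a ``max_stack_depth`` of 2 accepts both.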

The ``CAP_SYS_ADMIN`` requirement provides an additional layer of security,
ensuring that only privileged users can create these potentially complex
stacking arrangements.

General Security Posture
------------------------

As a general principle for new kernel features that allow userspace to instruct
the kernel to perform direct operations on its behalf based on user-provided
file descriptors, starting with a higher privilege requirement (like
``CAP_SYS_ADMIN``) is a conservative and common security practice. This allows
the feature to be used and tested while further security implications are
evaluated and addressed.