| #
7c8a4671 |
| 15-Apr-2026 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'vfs-7.1-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
- Add FSMOUNT_NAMESPACE flag to fsmount() that creates a ne
Merge tag 'vfs-7.1-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
- Add FSMOUNT_NAMESPACE flag to fsmount() that creates a new mount namespace with the newly created filesystem attached to a copy of the real rootfs. This returns a namespace file descriptor instead of an O_PATH mount fd, similar to how OPEN_TREE_NAMESPACE works for open_tree().
This allows creating a new filesystem and immediately placing it in a new mount namespace in a single operation, which is useful for container runtimes and other namespace-based isolation mechanisms.
This accompanies OPEN_TREE_NAMESPACE and avoids a needless detour via OPEN_TREE_NAMESPACE to get the same effect. Will be especially useful when you mount an actual filesystem to be used as the container rootfs.
- Currently, creating a new mount namespace always copies the entire mount tree from the caller's namespace. For containers and sandboxes that intend to build their mount table from scratch this is wasteful: they inherit a potentially large mount tree only to immediately tear it down.
This series adds support for creating a mount namespace that contains only a clone of the root mount, with none of the child mounts. Two new flags are introduced:
- CLONE_EMPTY_MNTNS (0x400000000) for clone3(), using the 64-bit flag space - UNSHARE_EMPTY_MNTNS (0x00100000) for unshare()
Both flags imply CLONE_NEWNS. The resulting namespace contains a single nullfs root mount with an immutable empty directory. The intended workflow is to then mount a real filesystem (e.g., tmpfs) over the root and build the mount table from there.
- Allow MOVE_MOUNT_BENEATH to target the caller's rootfs, allowing to switch out the rootfs without pivot_root(2).
The traditional approach to switching the rootfs involves pivot_root(2) or a chroot_fs_refs()-based mechanism that atomically updates fs->root for all tasks sharing the same fs_struct. This has consequences for fork(), unshare(CLONE_FS), and setns().
This series instead decomposes root-switching into individually atomic, locally-scoped steps:
fd_tree = open_tree(-EBADF, "/newroot", OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC); fchdir(fd_tree); move_mount(fd_tree, "", AT_FDCWD, "/", MOVE_MOUNT_BENEATH | MOVE_MOUNT_F_EMPTY_PATH); chroot("."); umount2(".", MNT_DETACH);
Since each step only modifies the caller's own state, the fork/unshare/setns races are eliminated by design.
A key step to making this possible is to remove the locked mount restriction. Originally MOVE_MOUNT_BENEATH doesn't support mounting beneath a mount that is locked. The locked mount protects the underlying mount from being revealed. This is a core mechanism of unshare(CLONE_NEWUSER | CLONE_NEWNS). The mounts in the new mount namespace become locked. That effectively makes the new mount table useless as the caller cannot ever get rid of any of the mounts no matter how useless they are.
We can lift this restriction though. We simply transfer the locked property from the top mount to the mount beneath. This works because what we care about is to protect the underlying mount aka the parent. The mount mounted between the parent and the top mount takes over the job of protecting the parent mount from the top mount mount. This leaves us free to remove the locked property from the top mount which can consequently be unmounted:
unshare(CLONE_NEWUSER | CLONE_NEWNS)
and we inherit a clone of procfs on /proc then currently we cannot unmount it as:
umount -l /proc
will fail with EINVAL because the procfs mount is locked.
After this series we can now do:
mount --beneath -t tmpfs tmpfs /proc umount -l /proc
after which a tmpfs mount has been placed beneath the procfs mount. The tmpfs mount has become locked and the procfs mount has become unlocked.
This means you can safely modify an inherited mount table after unprivileged namespace creation.
Afterwards we simply make it possible to move a mount beneath the rootfs allowing to upgrade the rootfs.
Removing the locked restriction makes this very useful for containers created with unshare(CLONE_NEWUSER | CLONE_NEWNS) to reshuffle an inherited mount table safely and MOVE_MOUNT_BENEATH makes it possible to switch out the rootfs instead of using the costly pivot_root(2).
* tag 'vfs-7.1-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/namespaces: remove unused utils.h include from listns_efault_test selftests/fsmount_ns: add missing TARGETS and fix cap test selftests/empty_mntns: fix wrong CLONE_EMPTY_MNTNS hex value in comment selftests/empty_mntns: fix statmount_alloc() signature mismatch selftests/statmount: remove duplicate wait_for_pid() mount: always duplicate mount selftests/filesystems: add MOVE_MOUNT_BENEATH rootfs tests move_mount: allow MOVE_MOUNT_BENEATH on the rootfs move_mount: transfer MNT_LOCKED selftests/filesystems: add clone3 tests for empty mount namespaces selftests/filesystems: add tests for empty mount namespaces namespace: allow creating empty mount namespaces selftests: add FSMOUNT_NAMESPACE tests selftests/statmount: add statmount_alloc() helper tools: update mount.h header mount: add FSMOUNT_NAMESPACE mount: simplify __do_loopback() mount: start iterating from start of rbtree
show more ...
|
|
Revision tags: v7.0, v7.0-rc7, v7.0-rc6 |
|
| #
1a398a23 |
| 23-Mar-2026 |
Christian Brauner <brauner@kernel.org> |
selftests/empty_mntns: fix statmount_alloc() signature mismatch
empty_mntns.h includes ../statmount/statmount.h which provides a 4-argument statmount_alloc(mnt_id, mnt_ns_id, mask, flags), but then
selftests/empty_mntns: fix statmount_alloc() signature mismatch
empty_mntns.h includes ../statmount/statmount.h which provides a 4-argument statmount_alloc(mnt_id, mnt_ns_id, mask, flags), but then redefines its own 3-argument version without the flags parameter. This causes a build failure due to conflicting types.
Remove the duplicate definition from empty_mntns.h and update all callers to pass 0 for the flags argument.
Fixes: 32f54f2bbccf ("selftests/filesystems: add tests for empty mount namespaces") Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
|
Revision tags: v7.0-rc5, v7.0-rc4 |
|
| #
4e9f7592 |
| 11-Mar-2026 |
Christian Brauner <brauner@kernel.org> |
Merge patch series "namespace: allow creating empty mount namespaces"
Christian Brauner <brauner@kernel.org> says:
Currently, creating a new mount namespace always copies the entire mount tree from
Merge patch series "namespace: allow creating empty mount namespaces"
Christian Brauner <brauner@kernel.org> says:
Currently, creating a new mount namespace always copies the entire mount tree from the caller's namespace. For containers and sandboxes that intend to build their mount table from scratch this is wasteful: they inherit a potentially large mount tree only to immediately tear it down.
This series adds support for creating a mount namespace that contains only a clone of the root mount, with none of the child mounts. Two new flags are introduced:
- CLONE_EMPTY_MNTNS (0x400000000) for clone3(), using the 64-bit flag space. - UNSHARE_EMPTY_MNTNS (0x00100000) for unshare(), reusing the CLONE_PARENT_SETTID bit which has no meaning for unshare.
Both flags imply CLONE_NEWNS. The resulting namespace contains a single nullfs root mount with an immutable empty directory. The intended workflow is to then mount a real filesystem (e.g., tmpfs) over the root and build the mount table from there.
* patches from https://patch.msgid.link/20260306-work-empty-mntns-consolidated-v1-0-6eb30529bbb0@kernel.org: selftests/filesystems: add clone3 tests for empty mount namespaces selftests/filesystems: add tests for empty mount namespaces namespace: allow creating empty mount namespaces
Link: https://patch.msgid.link/20260306-work-empty-mntns-consolidated-v1-0-6eb30529bbb0@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
|
Revision tags: v7.0-rc3 |
|
| #
32f54f2b |
| 06-Mar-2026 |
Christian Brauner <brauner@kernel.org> |
selftests/filesystems: add tests for empty mount namespaces
Add a test suite for the UNSHARE_EMPTY_MNTNS and CLONE_EMPTY_MNTNS flags exercising the empty mount namespace functionality through the ks
selftests/filesystems: add tests for empty mount namespaces
Add a test suite for the UNSHARE_EMPTY_MNTNS and CLONE_EMPTY_MNTNS flags exercising the empty mount namespace functionality through the kselftest harness.
The tests cover:
- basic functionality: unshare succeeds, exactly one mount exists in the new namespace, root and cwd point to the same mount - flag interactions: UNSHARE_EMPTY_MNTNS works standalone without explicit CLONE_NEWNS, combines correctly with CLONE_NEWUSER and other namespace flags (CLONE_NEWUTS, CLONE_NEWIPC) - edge cases: EPERM without capabilities, works from a user namespace, many source mounts still result in one mount, cwd on a different mount gets reset to root - error paths: invalid flags return EINVAL - regression: plain CLONE_NEWNS still copies the full mount tree, other namespace unshares are unaffected - mount properties: the root mount has the expected statmount properties, is its own parent, and is the only entry returned by listmount - repeated unshare: consecutive UNSHARE_EMPTY_MNTNS calls each produce a new namespace with a distinct mount ID - overmount workflow: verifies the intended usage pattern of creating an empty mount namespace with a nullfs root and then mounting tmpfs over it to build a writable filesystem from scratch
Link: https://patch.msgid.link/20260306-work-empty-mntns-consolidated-v1-2-6eb30529bbb0@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|