| #
157d3d6e |
| 09-Feb-2026 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'vfs-7.0-rc1.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
- statmount: accept fd as a parameter
Extend struct mn
Merge tag 'vfs-7.0-rc1.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs mount updates from Christian Brauner:
- statmount: accept fd as a parameter
Extend struct mnt_id_req with a file descriptor field and a new STATMOUNT_BY_FD flag. When set, statmount() returns mount information for the mount the fd resides on — including detached mounts (unmounted via umount2(MNT_DETACH)).
For detached mounts the STATMOUNT_MNT_POINT and STATMOUNT_MNT_NS_ID mask bits are cleared since neither is meaningful. The capability check is skipped for STATMOUNT_BY_FD since holding an fd already implies prior access to the mount and equivalent information is available through fstatfs() and /proc/pid/mountinfo without privilege. Includes comprehensive selftests covering both attached and detached mount cases.
- fs: Remove internal old mount API code (1 patch)
Now that every in-tree filesystem has been converted to the new mount API, remove all the legacy shim code in fs_context.c that handled unconverted filesystems. This deletes ~280 lines including legacy_init_fs_context(), the legacy_fs_context struct, and associated wrappers. The mount(2) syscall path for userspace remains untouched. Documentation references to the legacy callbacks are cleaned up.
- mount: add OPEN_TREE_NAMESPACE to open_tree()
Container runtimes currently use CLONE_NEWNS to copy the caller's entire mount namespace — only to then pivot_root() and recursively unmount everything they just copied. With large mount tables and thousands of parallel container launches this creates significant contention on the namespace semaphore.
OPEN_TREE_NAMESPACE copies only the specified mount tree (like OPEN_TREE_CLONE) but returns a mount namespace fd instead of a detached mount fd. The new namespace contains the copied tree mounted on top of a clone of the real rootfs.
This functions as a combined unshare(CLONE_NEWNS) + pivot_root() in a single syscall. Works with user namespaces: an unshare(CLONE_NEWUSER) followed by OPEN_TREE_NAMESPACE creates a mount namespace owned by the new user namespace. Mount namespace file mounts are excluded from the copy to prevent cycles. Includes ~1000 lines of selftests"
* tag 'vfs-7.0-rc1.namespace' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/open_tree: add OPEN_TREE_NAMESPACE tests mount: add OPEN_TREE_NAMESPACE fs: Remove internal old mount API code selftests: statmount: tests for STATMOUNT_BY_FD statmount: accept fd as a parameter statmount: permission check should return EPERM
show more ...
|
|
Revision tags: v6.19, v6.19-rc8, v6.19-rc7, v6.19-rc6 |
|
| #
1bce1a66 |
| 12-Jan-2026 |
Christian Brauner <brauner@kernel.org> |
Merge patch series "mount: add OPEN_TREE_NAMESPACE"
Christian Brauner <brauner@kernel.org> says:
When creating containers the setup usually involves using CLONE_NEWNS via clone3() or unshare(). Thi
Merge patch series "mount: add OPEN_TREE_NAMESPACE"
Christian Brauner <brauner@kernel.org> says:
When creating containers the setup usually involves using CLONE_NEWNS via clone3() or unshare(). This copies the caller's complete mount namespace. The runtime will also assemble a new rootfs and then use pivot_root() to switch the old mount tree with the new rootfs. Afterward it will recursively umount the old mount tree thereby getting rid of all mounts.
On a basic system here where the mount table isn't particularly large this still copies about 30 mounts. Copying all of these mounts only to get rid of them later is pretty wasteful.
This is exacerbated if intermediary mount namespaces are used that only exist for a very short amount of time and are immediately destroyed again causing a ton of mounts to be copied and destroyed needlessly.
With a large mount table and a system where thousands or ten-thousands of namespaces are spawned in parallel this quickly becomes a bottleneck increasing contention on the semaphore.
Extend open_tree() with a new OPEN_TREE_NAMESPACE flag. Similar to OPEN_TREE_CLONE only the indicated mount tree is copied. Instead of returning a file descriptor referring to that mount tree OPEN_TREE_NAMESPACE will cause open_tree() to return a file descriptor to a new mount namespace. In that new mount namespace the copied mount tree has been mounted on top of a copy of the real rootfs.
The caller can setns() into that mount namespace and perform any additionally setup such as move_mount()ing detached mounts in there.
This allows OPEN_TREE_NAMESPACE to function as a combined unshare(CLONE_NEWNS) and pivot_root().
A caller may for example choose to create an extremely minimal rootfs:
fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE);
This will create a mount namespace where "wootwoot" has become the rootfs mounted on top of the real rootfs. The caller can now setns() into this new mount namespace and assemble additional mounts.
This also works with user namespaces:
unshare(CLONE_NEWUSER); fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE);
which creates a new mount namespace owned by the earlier created user namespace with "wootwoot" as the rootfs mounted on top of the real rootfs.
This will scale a lot better when creating tons of mount namespaces and will allow to get rid of a lot of unnecessary mount and umount cycles. It also allows to create mount namespaces without needing to spawn throwaway helper processes.
* patches from https://patch.msgid.link/20251229-work-empty-namespace-v1-0-bfb24c7b061f@kernel.org: selftests/open_tree: add OPEN_TREE_NAMESPACE tests mount: add OPEN_TREE_NAMESPACE
Link: https://patch.msgid.link/20251229-work-empty-namespace-v1-0-bfb24c7b061f@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
|
Revision tags: v6.19-rc5, v6.19-rc4 |
|
| #
b8f7622a |
| 29-Dec-2025 |
Christian Brauner <brauner@kernel.org> |
selftests/open_tree: add OPEN_TREE_NAMESPACE tests
Add tests for OPEN_TREE_NAMESPACE.
Link: https://patch.msgid.link/20251229-work-empty-namespace-v1-2-bfb24c7b061f@kernel.org Tested-by: Jeff Layto
selftests/open_tree: add OPEN_TREE_NAMESPACE tests
Add tests for OPEN_TREE_NAMESPACE.
Link: https://patch.msgid.link/20251229-work-empty-namespace-v1-2-bfb24c7b061f@kernel.org Tested-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|