#
31d1c080 |
| 16-Jul-2025 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: drop SDT_PROBES_ENABLED usage
since sdt probes started being hot patched
This eliminates a now spurious branch on fpl.status
|
#
6567623f |
| 15-Jul-2025 |
Mateusz Piotrowski <0mp@FreeBSD.org> |
vfs_cache: Fix the SDT definition of vfs:fplookup:lookup:done
1. The definition lists struct nameidata as the type of the first argument. However, the actual probes always pass a variable of type
vfs_cache: Fix the SDT definition of vfs:fplookup:lookup:done
1. The definition lists struct nameidata as the type of the first argument. However, the actual probes always pass a variable of type struct nameidata* to SDT_PROBE3. 2. The third argument (args[2]) is actually enum cache_fpl_status.
Reviewed by: markj Approved by: markj (mentor) Fixes: 07d2145a1717 vfs: add the infrastructure for lockless lookup MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D51315
show more ...
|
#
f1f23043 |
| 03-Jul-2025 |
Mark Johnston <markj@FreeBSD.org> |
vfs: Initial revision of inotify
Add an implementation of inotify_init(), inotify_add_watch(), inotify_rm_watch(), source-compatible with Linux. This provides functionality similar to kevent(2)'s E
vfs: Initial revision of inotify
Add an implementation of inotify_init(), inotify_add_watch(), inotify_rm_watch(), source-compatible with Linux. This provides functionality similar to kevent(2)'s EVFILT_VNODE, i.e., it lets applications monitor filesystem files for accesses. Compared to inotify, however, EVFILT_VNODE has the limitation of requiring the application to open the file to be monitored. This means that activity on a newly created file cannot be monitored reliably, and that a file descriptor per file in the hierarchy is required.
inotify on the other hand allows a directory and its entries to be monitored at once. It introduces a new file descriptor type to which "watches" can be attached; a watch is a pseudo-file descriptor associated with a file or directory and a set of events to watch for. When a watched vnode is accessed, a description of the event is queued to the inotify descriptor, readable with read(2). Events for files in a watched directory include the file name.
A watched vnode has its usecount bumped, so name cache entries originating from a watched directory are not evicted. Name cache entries are used to populate inotify events for files with a link in a watched directory. In particular, if a file is accessed with, say, read(2), an IN_ACCESS event will be generated for any watched hard link of the file.
The inotify_add_watch_at() variant is included so that this functionality is available in capability mode; plain inotify_add_watch() is disallowed in capability mode.
When a file in a nullfs mount is watched, the watch is attached to the lower vnode, such that accesses via either layer generate inotify events.
Many thanks to Gleb Popov for testing this patch and finding lots of bugs.
PR: 258010, 215011 Reviewed by: kib Tested by: arrowd MFC after: 3 months Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D50315
show more ...
|
Revision tags: release/14.3.0-p1, release/14.2.0-p4, release/13.5.0-p2 |
|
#
f35525ff |
| 24-Jun-2025 |
Mark Johnston <markj@FreeBSD.org> |
file: Add a fd flag with O_RESOLVE_BENEATH semantics
The O_RESOLVE_BENEATH openat(2) flag restricts name lookups such that they remain under the directory referenced by the dirfd. This commit intro
file: Add a fd flag with O_RESOLVE_BENEATH semantics
The O_RESOLVE_BENEATH openat(2) flag restricts name lookups such that they remain under the directory referenced by the dirfd. This commit introduces an implicit version of the flag, FD_RESOLVE_BENEATH, stored in the file descriptor entry. When the flag is set, any lookup relative to that fd automatically has O_RESOLVE_BENEATH semantics. Furthermore, the flag is sticky, meaning that it cannot be cleared, and it is copied by dup() and openat().
File descriptors with FD_RESOLVE_BENEATH set may not be passed to fchdir(2) or fchroot(2). Various fd lookup routines are modified to return fd flags to the caller.
This flag will be used to address a case where jails with different root directories and the ability to pass SCM_RIGHTS messages across the jail boundary can transfer directory fds in such as way as to allow a filesystem escape.
PR: 262180 Reviewed by: kib MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D50371
show more ...
|
Revision tags: release/14.3.0 |
|
#
0d224af3 |
| 27-May-2025 |
Mark Johnston <markj@FreeBSD.org> |
namei: Fix cn_flags width in various places
This truncation is mostly harmless today, but fix it anyway to avoid pain later down the road.
Reviewed by: olce, kib MFC after: 2 weeks Differential Rev
namei: Fix cn_flags width in various places
This truncation is mostly harmless today, but fix it anyway to avoid pain later down the road.
Reviewed by: olce, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D50417
show more ...
|
#
f4158953 |
| 27-May-2025 |
Mark Johnston <markj@FreeBSD.org> |
vfs cache: Add NAMEILOOKUP to the whitelist of fastpath lookup flags
Otherwise the lockless name lookup path is inadvertently disabled since NAMEILOOKUP isn't recognized.
Reviewed by: olce, kib Fix
vfs cache: Add NAMEILOOKUP to the whitelist of fastpath lookup flags
Otherwise the lockless name lookup path is inadvertently disabled since NAMEILOOKUP isn't recognized.
Reviewed by: olce, kib Fixes: 7587f6d4840f ("namei: Make stackable filesystems check harder for jail roots") Differential Revision: https://reviews.freebsd.org/D50532
show more ...
|
#
0596b4a3 |
| 26-May-2025 |
Rick Macklem <rmacklem@FreeBSD.org> |
vfs_cache.c: Use CACHE_FPL_SUPPORTED_CN_FLAGS
Commit 2ec2ba7e232d added some code to cache_can_fplookup() which worked (ensuring an abort when OPENNNAMED was set), but showed I didn't understand wha
vfs_cache.c: Use CACHE_FPL_SUPPORTED_CN_FLAGS
Commit 2ec2ba7e232d added some code to cache_can_fplookup() which worked (ensuring an abort when OPENNNAMED was set), but showed I didn't understand what CACHE_FPL_SUPPORTED_CN_FLAGS was used for.
This patch cleans it up.
Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D50524 Fixes: 2ec2ba7e232d ("vfs: Add VFS/syscall support for Solaris style extended attributes")
show more ...
|
#
14ec281a |
| 23-May-2025 |
Mark Johnston <markj@FreeBSD.org> |
namei: Remove a now-unused variable
Reported by: bapt Fixes: 7587f6d4840f ("namei: Make stackable filesystems check harder for jail roots")
|
#
7587f6d4 |
| 23-May-2025 |
Mark Johnston <markj@FreeBSD.org> |
namei: Make stackable filesystems check harder for jail roots
Suppose a process has its cwd pointing to a nullfs directory, where the lower directory is also visible in the jail's filesystem namespa
namei: Make stackable filesystems check harder for jail roots
Suppose a process has its cwd pointing to a nullfs directory, where the lower directory is also visible in the jail's filesystem namespace. Suppose that the lower directory vnode is moved out from under the nullfs mount. The nullfs vnode still shadows the lower vnode, and dotdot lookups relative to that directory will instantiate new nullfs vnodes outside of the nullfs mountpoint, effectively shadowing the lower filesystem.
This phenomenon can be abused to escape a chroot, since the nullfs vnodes instantiated by these dotdot lookups defeat the root vnode check in vfs_lookup(), which uses vnode pointer equality to test for the process root.
Fix this by extending nullfs and unionfs to perform the same check, exploiting the fact that the passed componentname is embedded in a nameidata structure to avoid changing the VOP_LOOKUP interface. That is, add a flag to indicate that containerof can be used to get the full nameidata structure, and perform the root vnode check on the lower vnode when performing a dotdot lookup.
PR: 262180 Reviewed by: olce, kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D50418
show more ...
|
#
713abc98 |
| 01-May-2025 |
Olivier Certner <olce@FreeBSD.org> |
sysctl(9): Ease exporting struct sizes; Discourage doing that
Introduce two helpers, the more general SYSCTL_SIZEOF() and a struct-specific one SYSCTL_SIZEOF_STRUCT() which prepends 'struct' in the
sysctl(9): Ease exporting struct sizes; Discourage doing that
Introduce two helpers, the more general SYSCTL_SIZEOF() and a struct-specific one SYSCTL_SIZEOF_STRUCT() which prepends 'struct' in the description and in the use of sizeof() but uses the raw structure name as the knob's name. The size of the object/structure is exported under 'debug.sizeof'.
Existing knobs under 'debug.sizeof' were all converted to use the helpers.
Add a note before the helpers discouraging the introduction of new leaves for ad-hoc reasons. List alternative means for developers to obtain the size of arbitrary kernel structures easily (thanks to markj@ for providing these).
No functional change (intended).
Reviewed by: kib, markj MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D50121
show more ...
|
#
811f6a0a |
| 01-May-2025 |
Olivier Certner <olce@FreeBSD.org> |
VFS cache: Fix initial sizing for non-default 'ncsizefactor'
Reviewed by: markj MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D
VFS cache: Fix initial sizing for non-default 'ncsizefactor'
Reviewed by: markj MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D50120
show more ...
|
#
01435e28 |
| 02-May-2025 |
Mark Johnston <markj@FreeBSD.org> |
vfs cache: Simplify cache_enter_time() a bit
The condition `flag == NFC_ISDOTDOT && vp != NULL && vp->v_type != VDIR` is never true at this point in the function. This is asserted slightly earlier.
vfs cache: Simplify cache_enter_time() a bit
The condition `flag == NFC_ISDOTDOT && vp != NULL && vp->v_type != VDIR` is never true at this point in the function. This is asserted slightly earlier. So, remove some dead code and simplify control flow.
N.B. we set v_cache_dd for all vnode types, not just VDIR. This seems to be intentional, see commit ce575cd0e2f9069. For regular files it appears to effectively represent the most recently entered cache entry for the vnode.
No functional change intended.
Reviewed by: olce, kib MFC after: 2 weeks Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D50107
show more ...
|
#
cc25864d |
| 02-May-2025 |
Mark Johnston <markj@FreeBSD.org> |
vfs cache: Move hash row lookup loops into a subroutine
No functional change intended.
Reviewed by: olce, kib MFC after: 2 weeks Sponsored by: Klara, Inc. Differential Revision: https://reviews.fre
vfs cache: Move hash row lookup loops into a subroutine
No functional change intended.
Reviewed by: olce, kib MFC after: 2 weeks Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D50106
show more ...
|
#
029ed5f5 |
| 02-May-2025 |
Mark Johnston <markj@FreeBSD.org> |
vfs cache: Add a predicate for testing cache entries
No functional change intended.
Reviewed by: olce, kib MFC after: 2 weeks Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebs
vfs cache: Add a predicate for testing cache entries
No functional change intended.
Reviewed by: olce, kib MFC after: 2 weeks Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D50105
show more ...
|
Revision tags: release/13.4.0-p5, release/13.5.0-p1, release/14.2.0-p3 |
|
#
2ec2ba7e |
| 02-Apr-2025 |
Rick Macklem <rmacklem@FreeBSD.org> |
vfs: Add VFS/syscall support for Solaris style extended attributes
Some systems, such as Solaris, represent extended attributes as a set of files in a directory associated with a file object. This
vfs: Add VFS/syscall support for Solaris style extended attributes
Some systems, such as Solaris, represent extended attributes as a set of files in a directory associated with a file object. This allows extended attributes to be acquired/modified via regular file system operations, such as read(2), write(2), lseek(2) and ftruncate(2).
Since ZFS already has the capability to do this, this patch allows system calls (and the NFSv4 client/server) such access to extended attributes. This permits handling of large extended attributes and allows the NFSv4 server to provide the service to NFSv4 clients that want it, such as Windows, MacOS and Solaris.
The top level syscall change is a new open(2)/openat(2) flag I called O_NAMEDATTR that allows the named attribute directory or any attribute within that directory to be open'd.
The patch defines two new v_irflag flags called VIRF_NAMEDDIR and VIRF_NAMEDATTR to indicate that the vnode is for this alternate name space and not a normal file object. The patch also defines flags (OPENNAMED and CREATENAMED) for VOP_LOOKUP() to pass this new case down into VOP_LOOKUP() and MNT_NAMEDATTR for file systems that support named attributes.
Most of the code in this patch is to avoid creation of links, symlinks or non-regular file objects in the named attribute directory.
It also must avoid using the name cache, since the named attribute directory is associated with the same name as the file object.
Man pages updates will be done as separate commits.
Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D49583
show more ...
|
Revision tags: release/13.5.0, release/14.2.0-p2, release/14.1.0-p8, release/13.4.0-p4, release/14.1.0-p7, release/14.2.0-p1, release/13.4.0-p3, release/14.2.0 |
|
#
bde575b2 |
| 25-Nov-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
kern___realpathat(): honor uio_seg argument
Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D47739
|
#
67218bce |
| 25-Nov-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
kern___realpathat(): do not copyout past end of string
Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.fr
kern___realpathat(): do not copyout past end of string
Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D47739
show more ...
|
#
31784ee1 |
| 25-Nov-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
kern___realpathat(): style
Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 3 days Differential revision: https://reviews.freebsd.org/D47739
|
Revision tags: release/13.4.0 |
|
#
0a487207 |
| 08-Jul-2024 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: add sysctl vfs.cache.param.hitpct
Sponsored by: Rubicon Communications, LLC ("Netgate")
|
Revision tags: release/14.1.0 |
|
#
0cd9cde7 |
| 06-Apr-2024 |
Jake Freeland <jfree@FreeBSD.org> |
ktrace: Record namei violations with KTR_CAPFAIL
Report namei path lookups while Capsicum violation tracing with CAPFAIL_NAMEI. vfs caching is also ignored when tracing to mimic capability mode beha
ktrace: Record namei violations with KTR_CAPFAIL
Report namei path lookups while Capsicum violation tracing with CAPFAIL_NAMEI. vfs caching is also ignored when tracing to mimic capability mode behavior.
Reviewed by: markj Approved by: markj (mentor) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D40680
show more ...
|
Revision tags: release/13.3.0 |
|
#
55edc40e |
| 04-Jan-2024 |
Mark Johnston <markj@FreeBSD.org> |
file: Remove the fd parameter to fgetvp_lookup() and fgetvp_lookup_smr()
The fd is always obtained from nameidata, so just fetch it from there instead. No functional change intended.
Reviewed by:
file: Remove the fd parameter to fgetvp_lookup() and fgetvp_lookup_smr()
The fd is always obtained from nameidata, so just fetch it from there instead. No functional change intended.
Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D43257
show more ...
|
#
29363fb4 |
| 23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl s
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script.
Sponsored by: Netflix
show more ...
|
Revision tags: release/14.0.0 |
|
#
bb8ecf25 |
| 19-Oct-2023 |
Dmitry Chagin <dchagin@FreeBSD.org> |
vfs cache: Fallback to namei to resolve symlinks with leading / in target for non-native ABI
This is a temporary solution to fix PR before release. During 15.0 it's necessary to refactor symlinks ha
vfs cache: Fallback to namei to resolve symlinks with leading / in target for non-native ABI
This is a temporary solution to fix PR before release. During 15.0 it's necessary to refactor symlinks handling between vfs & namecache.
PR: 273414 Reported by: Vincent Milum Jr, Dan Kotowski, glebius Tested by: Dan Kotowski, glebius Reviewed by: Differential Revision: https://reviews.freebsd.org/D41806 MFC after: 3 days
show more ...
|
#
8b622172 |
| 05-Oct-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: add 2 more optimizaiton ideas
|
#
cd2105d6 |
| 05-Oct-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs cache: denote a known bug in cache_remove_cnp
|