.gitignore - OpenGrok history log for /linux/tools/testing/selftests/pipe/.gitignore

Revision	Date	Author	Comments
Revision tags: v7.2-rc1
# 7e0e7bd6	15-Jun-2026	Linus Torvalds <torvalds@linux-foundation.org>	Merge tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:

"Features:

- Reduce pipe->mutex contention by pre-allocating pages outside the lock in anon_pipe_write(). anon_pipe_write() called alloc_page() once per page while holding pipe->mutex. The allocation can sleep doing direct reclaim and runs memcg charging, which extends the critical section and stalls any concurrent reader on the same mutex. Now up to 8 pages are pre-allocated before the mutex is taken, leftovers are recycled into the per-pipe tmp_page[] cache before unlock, and any remainder is released after unlock, keeping the allocator out of the critical section on both sides. On a writers x readers sweep with 64KB writes against a 1 MB pipe, throughput improves 6-28% and average write latency drops 5-22%; under memory pressure, when the cost of holding the mutex across reclaim is highest, throughput improves 21-48% and latency drops 17-33%. The microbenchmark is added to selftests.
- uaccess/sockptr: fix the ignored_trailing logic in copy_struct_to_user() to behave as documented and the usize check in copy_struct_from_sockptr() for user pointers, and add copy_struct_{from,to}_bounce_buffer() and copy_struct_to_sockptr() helpers for upcoming users (IPPROTO_SMBDIRECT, IPPROTO_QUIC).
- bpf: add a sleepable bpf_real_inode() kfunc that resolves the real inode backing a dentry via d_real_inode(). On overlayfs the inode attached to the dentry doesn't carry the underlying device information; this is used by the filesystem restriction BPF program that was merged into systemd.
- docs: add guidelines for submitting new filesystems, motivated by the maintenance burden abandoned and untestable filesystems impose on VFS developers, blocking infrastructure work like folio conversions and iomap migration.

Fixes:

- libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo() and drop the now-redundant assignments in callers. This began as a one-line dma-buf fix for a path_noexec() warning; a pseudo filesystem has no reason not to set SB_I_NOEXEC. All init_pseudo() callers were audited: the only visible effect is on dma-buf, where SB_I_NOEXEC silences the warning.
- Handle set_blocksize() failures in legacy filesystems (bfs, hpfs, qnx4, jfs, befs, affs, isofs, minix, ntfs3, omfs). Mounting a device with a sector size > PAGE_SIZE crashed roughly half of them; the rest had the same missing error handling pattern. Plus a follow-up releasing the superblock buffer_head when setting the minix v3 block size fails.
- mount: honour SB_NOUSER in the new mount API.
- fs/fcntl: fix a SOFTIRQ-unsafe lock order in fasync signaling by switching the process-group paths of send_sigio() and send_sigurg() from read_lock(&tasklist_lock) to RCU, matching the single-PID path.
- vfs: add an FS_USERNS_DELEGATABLE flag and set it for NFS, fixing delegated NFS mounts (fsopen() in a container with the mount performed by a privileged daemon) that broke when non-init s_user_ns was tied to FS_USERNS_MOUNT.
- selftests/namespaces: fix a hang in nsid_test where an unreaped grandchild kept the TAP pipe write-end open, a waitpid(-1) race in listns_efault_test, and a false FAIL on kernels without listns() where the tests should SKIP.
- filelock: fix the break_lease() stub signature for CONFIG_FILE_LOCKING=n.
- init/initramfs_test: wait for the async initramfs unpacking before running; the test and do_populate_rootfs() share the parser state.
- fs/coredump: reduce redundant log noise in validate_coredump_safety().
- iomap: pass the correct length to fserror_report_io() in __iomap_write_begin().
- backing-file: fix the backing_file_open() kerneldoc.

Cleanups:

- initramfs: refactor the cpio hex header parsing to use hex2bin() instead of the hand-rolled simple_strntoul() which is reverted, and extend the initramfs KUnit tests to cover header fields with 0x prefixes.
- Replace __get_free_pages() and friends with kmalloc()/kzalloc() across quota, proc, ocfs2/dlm, nilfs2, nfs, nfsd, libfs, jfs, jbd2, isofs, fuse, select, namespace, configfs, binfmt_misc, bfs, and the do_mounts init code - part of the larger work of replacing page allocator calls with kmalloc().
- Use clear_and_wake_up_bit() in unlock_buffer() and journal_end_buffer_io_sync() instead of open-coding the sequence.
- Drop unused VFS exports: unexport drop_super_exclusive(), remove start_removing_user_path_at(), and fold __start_removing_path() into start_removing_path().
- fs/read_write: narrow the __kernel_write() export with EXPORT_SYMBOL_FOR_MODULES().
- vfs: uapi: retire octal and hex constants in favor of (1 << n) for the O_ flags. Finding a free bit for a new flag across the architectures was needlessly hard with the mixed bases.
- dcache: add extra sanity checks of dead dentries in dentry_free() via a new DENTRY_WARN_ONCE() that also prints d_flags.
- iov_iter: use kmemdup_array() in dup_iter() to harden the allocation against multiplication overflow.
- fs/pipe: write to ->poll_usage only once.
- vfs: remove an always-taken if-branch in find_next_fd().
- dcache: use kmalloc_flex() for struct external_name in __d_alloc().
- namei: use QSTR() instead of QSTR_INIT() in path_pts().
- sync_file_range: delete dead S_ISLNK code.
- Comment fixes: retire a stale comment in fget_task_next() and fix assorted spelling mistakes"

* tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (73 commits)
  backing-file: fix backing_file_open() kerneldoc parameter
  iomap: pass the correct len to fserror_report_io in __iomap_write_begin
  vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS
  filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n
  vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags
  bpf: add bpf_real_inode() kfunc
  fs/read_write: Do not export __kernel_write() to the entire world
  libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers
  libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
  mount: honour SB_NOUSER in the new mount API
  fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling
  selftests/pipe: add pipe_bench microbenchmark
  fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write
  fs: retire stale comment in fget_task_next()
  fs: fix spelling mistakes in comment
  bfs: replace get_zeroed_page() with kzalloc()
  binfmt_misc: replace __get_free_page() with kmalloc()
  configfs: replace __get_free_pages() with kzalloc()
  fs/namespace: use __getname() to allocate mntpath buffer
  fs/select: replace __get_free_page() with kmalloc()
  ...
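The copy_struct_to_user() fix listed above concerns the kernel's extensible-struct convention: new fields are only ever appended, and a zeroed field means "not used". The sketch below is a userspace model of that convention as we understand the helper's kernel-doc, not the kernel implementation. copy_struct_out() is a hypothetical name, plain memcpy()/memset() stand in for copy_to_user()/clear_user(), and there is therefore no -EFAULT path. ignored_trailing is set only when the user struct is smaller than the kernel's and the bytes that do not fit are non-zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if any byte in buf[0..len) is non-zero. */
static bool any_nonzero(const unsigned char *buf, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i])
			return true;
	return false;
}

/*
 * Hypothetical userspace model of the documented copy_struct_to_user()
 * semantics. dst stands in for the user buffer of usize bytes, src for
 * the kernel struct of ksize bytes.
 */
static void copy_struct_out(void *dst, size_t usize, const void *src,
			    size_t ksize, bool *ignored_trailing)
{
	size_t size = usize < ksize ? usize : ksize;

	/* Older (smaller) user struct: report kernel data it cannot see. */
	if (ignored_trailing)
		*ignored_trailing = usize < ksize &&
			any_nonzero((const unsigned char *)src + size,
				    ksize - size);

	/* Newer (larger) user struct: zero the fields the kernel lacks. */
	if (usize > ksize)
		memset((unsigned char *)dst + size, 0, usize - size);

	/* Copy the interoperable prefix. */
	memcpy(dst, src, size);
}
```

A caller can use ignored_trailing to decide whether to tell userspace that it missed some information, instead of failing unconditionally.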
Revision tags: v7.1, v7.1-rc7, v7.1-rc6
# 99f41427	28-May-2026	Christian Brauner <brauner@kernel.org>	Merge patch series "fs/pipe: reduce pipe->mutex contention by pre-allocating outside the lock"

Breno Leitao <leitao@debian.org> says:

While profiling Meta's caching code[1], I found pipe->mutex contention on the hot path. anon_pipe_write() currently calls alloc_page() once per page while holding pipe->mutex. The allocation can sleep doing direct reclaim and runs memcg charging, which extends the critical section and stalls any concurrent reader on the same mutex.

This series pre-allocates pages outside pipe->mutex in anon_pipe_write(): for writes that span more than one full page, up to PIPE_PREALLOC_MAX (8) pages are allocated via a per-page alloc_page() loop before the mutex is taken. anon_pipe_get_page() then drains the prealloc array first, falls back to the per-pipe tmp_page[] cache, and only enters the allocator under the mutex for the leftover pages (writes larger than PIPE_PREALLOC_MAX, single-page writes that skip prealloc, or shortfalls when the prealloc loop fails). Leftover prealloc pages are recycled into tmp_page[] before unlock and any remainder is put_page()'d after unlock, keeping the allocator out of the critical section on both sides.

alloc_pages_bulk_mempolicy() looked tempting, but the bulk allocator refuses __GFP_ACCOUNT under memcg -- it returns at most one page when memcg_kmem_online() && (gfp & __GFP_ACCOUNT), see commit 8dcb3060d81d ("memcg: page_alloc: skip bulk allocator for __GFP_ACCOUNT"). A per-page loop keeps memcg accounting and the task NUMA mempolicy honoured uniformly without open-coding the charge.

I also vibe-coded a microbenchmark to validate the change.
It sweeps writers x readers over {1,2,5} x {1,5,10} with 64KB writes against a 1 MB pipe and prints throughput + latency percentiles per config. Measured on arm64 and also on x86 using virtme-ng (16 vCPUs, 64KB writes, 1 MB pipe). The numbers below were collected on v1 (alloc_pages_bulk()); v2's per-page loop preserves the dominant "allocation outside the mutex" win and is expected to land in the same range.

== No memory pressure (10s per config) ==

Throughput in MB/s (baseline -> patched, delta):

  writers  readers=1             readers=5             readers=10
  1        1119 -> 1354 (+21%)   1132 -> 1195 (+6%)    1060 -> 1240 (+17%)
  2        1162 -> 1487 (+28%)   1034 -> 1285 (+24%)   1069 -> 1213 (+14%)
  5        1152 -> 1357 (+18%)   1021 -> 1164 (+14%)    997 -> 1239 (+24%)

Avg write latency in ns (baseline -> patched, delta):

  writers  readers=1                readers=5                readers=10
  1         55786 ->  46103 (-17%)   55164 ->  52260 (-5%)    58906 ->  50370 (-14%)
  2        107546 ->  84011 (-22%)  120837 ->  97206 (-20%)  116860 -> 103036 (-12%)
  5        271293 -> 230170 (-15%)  306089 -> 268429 (-12%)  313300 -> 252232 (-19%)

Throughput improves +6% to +28% and average write latency drops 5% to 22% across every configuration.

== Under memory pressure (--memory-pressure, 6s per config) ==

stress-ng --vm 2 --vm-bytes 50% --vm-keep is forked alongside the sweep, so the alloc_page() calls inside anon_pipe_write() routinely hit direct reclaim -- exactly the regime the patch targets.
Throughput in MB/s (baseline -> patched, delta):

  writers  readers=1             readers=5             readers=10
  1        1088 -> 1438 (+32%)    996 -> 1477 (+48%)    989 -> 1194 (+21%)
  2        1076 -> 1378 (+28%)   1007 -> 1269 (+26%)   1018 -> 1234 (+21%)
  5        1052 -> 1311 (+25%)    986 -> 1225 (+24%)    972 -> 1249 (+29%)

Avg write latency in ns (baseline -> patched, delta):

  writers  readers=1                readers=5                readers=10
  1         57397 ->  43406 (-24%)   62690 ->  42272 (-33%)   63136 ->  52272 (-17%)
  2        116121 ->  90700 (-22%)  124098 ->  98481 (-21%)  122754 -> 101217 (-18%)
  5        297122 -> 238322 (-20%)  316836 -> 255095 (-19%)  321496 -> 250189 (-22%)

Throughput improves +21% to +48% and average write latency drops 17% to 33% -- a noticeably bigger win than the no-pressure run. That tracks: when alloc_page() has to dip into reclaim, the cost of holding pipe->mutex across it is highest, and pulling the allocation out of the critical section pays the most.

* patches from https://patch.msgid.link/20260524-fix_pipe-v3-0-bb4a75d23a90@debian.org:
  selftests/pipe: add pipe_bench microbenchmark
  fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write

Link: https://www.usenix.org/system/files/conference/atc13/atc13-bronson.pdf [1]
Link: https://patch.msgid.link/20260524-fix_pipe-v3-0-bb4a75d23a90@debian.org
Signed-off-by: Christian Brauner <brauner@kernel.org>
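The allocation ordering described in this series can be modelled in userspace to make the lock boundaries explicit. This is a hedged sketch, not the fs/pipe.c code. It assumes a pthread mutex in place of pipe->mutex, malloc()'d 4 KiB buffers in place of alloc_page(), and a hypothetical struct toy_pipe whose fixed ring stands in for the pipe buffers (a full ring simply ends the write instead of sleeping). PREALLOC_MAX mirrors PIPE_PREALLOC_MAX (8); TMP_SLOTS, RING_SLOTS, toy_get_page(), toy_write() and the allocs_under_lock counter are invented for the example.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096
#define PREALLOC_MAX  8   /* same cap as PIPE_PREALLOC_MAX in the series */
#define TMP_SLOTS     2   /* hypothetical size of the per-pipe page cache */
#define RING_SLOTS    64  /* hypothetical capacity of the toy pipe */

struct toy_pipe {
	pthread_mutex_t mutex;
	void *tmp_page[TMP_SLOTS];  /* recycled pages; protected by mutex */
	void *ring[RING_SLOTS];     /* filled pages; protected by mutex */
	size_t nr_ring;
	size_t allocs_under_lock;   /* instrumentation for this sketch */
};

/* Called with p->mutex held: prealloc array, then cache, then allocator. */
static void *toy_get_page(struct toy_pipe *p, void **pre, size_t *nr_pre)
{
	if (*nr_pre)
		return pre[--*nr_pre];
	for (size_t i = 0; i < TMP_SLOTS; i++) {
		if (p->tmp_page[i]) {
			void *page = p->tmp_page[i];

			p->tmp_page[i] = NULL;
			return page;
		}
	}
	p->allocs_under_lock++;
	return malloc(TOY_PAGE_SIZE);  /* the slow path the series avoids */
}

/* Write npages pages into the toy pipe; returns how many were written. */
static size_t toy_write(struct toy_pipe *p, size_t npages)
{
	void *pre[PREALLOC_MAX];
	size_t nr_pre = 0, done = 0;

	/* Pre-allocate outside the lock, only for multi-page writes. */
	if (npages > 1) {
		size_t want = npages < PREALLOC_MAX ? npages : PREALLOC_MAX;

		while (nr_pre < want) {
			void *page = malloc(TOY_PAGE_SIZE);

			if (!page)
				break;  /* shortfall: fall back to the locked path */
			pre[nr_pre++] = page;
		}
	}

	pthread_mutex_lock(&p->mutex);
	while (done < npages && p->nr_ring < RING_SLOTS) {
		void *page = toy_get_page(p, pre, &nr_pre);

		if (!page)
			break;
		p->ring[p->nr_ring++] = page;
		done++;
	}
	/* Recycle leftovers into empty cache slots before unlocking. */
	for (size_t i = 0; i < TMP_SLOTS && nr_pre; i++)
		if (!p->tmp_page[i])
			p->tmp_page[i] = pre[--nr_pre];
	pthread_mutex_unlock(&p->mutex);

	/* Free any remainder after unlocking. */
	while (nr_pre)
		free(pre[--nr_pre]);
	return done;
}
```

With this ordering a 5-page write allocates nothing while the mutex is held, a single-page write allocates under the lock unless a recycled page is cached, and a 12-page write allocates only the 4 pages beyond the prealloc cap under the lock.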
Revision tags: v7.1-rc5
# d29bd8ef	24-May-2026	Breno Leitao <leitao@debian.org>	selftests/pipe: add pipe_bench microbenchmark

Add a small selftest that stresses pipe->mutex contention by spawning N writer threads that hammer a single pipe with multi-page writes, plus M reader threads that drain it. Each writer records its own write() latency samples into a log2-bucketed histogram; main() aggregates them and prints total writes, throughput, average and percentile (p50/p99) latencies, and the maximum observed latency.

Pass --memory-pressure to fork stress-ng (--vm 4 --vm-bytes 80% --vm-method all) for the duration of the run, so alloc_page() in anon_pipe_write() routinely hits direct reclaim. The flag fails fast if stress-ng is not on $PATH.

The program prints something like the following for different numbers of writers and readers, message sizes, and memory-pressure settings:

  config: writers=X readers=Y msgsize=Z duration=3 pipe_size=1048576 memory_pressure=[no|yes]
  writes: total=54451 rate=18150/s
  throughput_MBps: 1134.40
  lat_avg_ns: 275355
  lat_p50_ns_upper: 262143
  lat_p99_ns_upper: 1048575
  lat_max_ns: 2145633

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260524-fix_pipe-v3-2-bb4a75d23a90@debian.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
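A log2-bucketed latency histogram like the one described in this commit can be sketched as follows. This is an illustration under assumptions, not pipe_bench's source: the struct and function names are hypothetical, and reading the *_upper fields as the inclusive upper bound of the bucket that holds the percentile is an inference from the sample output, where 262143 = 2^18 - 1 and 1048575 = 2^20 - 1.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_BUCKETS 64

/* Hypothetical per-writer histogram: bucket[b] counts samples in [2^b, 2^(b+1)). */
struct lat_hist {
	uint64_t bucket[NR_BUCKETS];
	uint64_t count;
	uint64_t sum_ns;
	uint64_t max_ns;
};

/* floor(log2(v)); a 0 ns sample also lands in bucket 0. */
static unsigned int log2_floor(uint64_t v)
{
	unsigned int b = 0;

	while (v >>= 1)
		b++;
	return b;
}

static void hist_add(struct lat_hist *h, uint64_t ns)
{
	h->bucket[log2_floor(ns)]++;
	h->count++;
	h->sum_ns += ns;
	if (ns > h->max_ns)
		h->max_ns = ns;
}

/* Fold one writer's histogram into an aggregate. */
static void hist_merge(struct lat_hist *dst, const struct lat_hist *src)
{
	for (size_t b = 0; b < NR_BUCKETS; b++)
		dst->bucket[b] += src->bucket[b];
	dst->count += src->count;
	dst->sum_ns += src->sum_ns;
	if (src->max_ns > dst->max_ns)
		dst->max_ns = src->max_ns;
}

/*
 * Inclusive upper bound of the bucket holding the pct-th percentile
 * sample: bucket b reports 2^(b+1) - 1, e.g. 262143 for b = 17.
 * Returns 0 for an empty histogram.
 */
static uint64_t hist_pct_upper(const struct lat_hist *h, unsigned int pct)
{
	uint64_t target = (h->count * pct + 99) / 100;  /* ceil(count * pct / 100) */
	uint64_t seen = 0;

	for (unsigned int b = 0; b < NR_BUCKETS; b++) {
		seen += h->bucket[b];
		if (seen > 0 && seen >= target)
			return b == 63 ? UINT64_MAX : (UINT64_C(1) << (b + 1)) - 1;
	}
	return 0;
}
```

Because histograms merge bucket by bucket, each writer thread can record into its own struct without synchronization, and the aggregation step only has to sum them after the threads finish. The price is resolution: a percentile is reported as a power-of-two bucket bound rather than an exact latency.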