History log of /linux/tools/testing/selftests/pipe/.gitignore (Results 1 – 3 of 3)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v7.2-rc1
# 7e0e7bd6 15-Jun-2026 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
"Features:

- Reduce pipe->mutex contention by pre-allocating

Merge tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
"Features:

- Reduce pipe->mutex contention by pre-allocating pages outside the
lock in anon_pipe_write().

anon_pipe_write() called alloc_page() once per page while holding
pipe->mutex. The allocation can sleep doing direct reclaim and runs
memcg charging, which extends the critical section and stalls any
concurrent reader on the same mutex. Now up to 8 pages are
pre-allocated before the mutex is taken, leftovers are recycled
into the per-pipe tmp_page[] cache before unlock, and any remainder
is released after unlock, keeping the allocator out of the critical
section on both sides. On a writers x readers sweep with 64KB
writes against a 1 MB pipe throughput improves 6-28% and average
write latency drops 5-22%; under memory pressure - when the cost of
holding the mutex across reclaim is highest - throughput improves
21-48% and latency drops 17-33%. The microbenchmark is added to
selftests.

- uaccess/sockptr: fix the ignored_trailing logic in
copy_struct_to_user() to behave as documented and the usize check
in copy_struct_from_sockptr() for user pointers, and add
copy_struct_{from,to}_bounce_buffer() and copy_struct_to_sockptr()
helpers for upcoming users (IPPROTO_SMBDIRECT, IPPROTO_QUIC).

- bpf: add a sleepable bpf_real_inode() kfunc that resolves the real
inode backing a dentry via d_real_inode(). On overlayfs the inode
attached to the dentry doesn't carry the underlying device
information; this is used by the filesystem restriction BPF program
that was merged into systemd.

- docs: add guidelines for submitting new filesystems, motivated by
the maintenance burden abandoned and untestable filesystems impose
on VFS developers, blocking infrastructure work like folio
conversions and iomap migration.

Fixes:

- libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
and drop the now-redundant assignments in callers. This began as a
one-line dma-buf fix for a path_noexec() warning; a pseudo
filesystem has no reason not to set SB_I_NOEXEC. All init_pseudo()
callers were audited: the only visible effect is on dma-buf where
SB_I_NOEXEC silences the warning.

- Handle set_blocksize() failures in legacy filesystems (bfs, hpfs,
qnx4, jfs, befs, affs, isofs, minix, ntfs3, omfs). Mounting a
device with a sector size > PAGE_SIZE crashed roughly half of them;
the rest had the same missing error handling pattern. Plus a
follow-up releasing the superblock buffer_head when setting the
minix v3 block size fails.

- mount: honour SB_NOUSER in the new mount API.

- fs/fcntl: fix a SOFTIRQ-unsafe lock order in fasync signaling by
switching the process-group paths of send_sigio() and send_sigurg()
from read_lock(&tasklist_lock) to RCU, matching the single-PID
path.

- vfs: add an FS_USERNS_DELEGATABLE flag and set it for NFS, fixing
delegated NFS mounts (fsopen() in a container with the mount
performed by a privileged daemon) that broke when non-init
s_user_ns was tied to FS_USERNS_MOUNT.

- selftests/namespaces: fix a hang in nsid_test where an unreaped
grandchild kept the TAP pipe write-end open, a waitpid(-1) race in
listns_efault_test, and a false FAIL on kernels without listns()
where the tests should SKIP.

- filelock: fix the break_lease() stub signature for
CONFIG_FILE_LOCKING=n.

- init/initramfs_test: wait for the async initramfs unpacking before
running; the test and do_populate_rootfs() share the parser state.

- fs/coredump: reduce redundant log noise in
validate_coredump_safety().

- iomap: pass the correct length to fserror_report_io() in
__iomap_write_begin().

- backing-file: fix the backing_file_open() kerneldoc.

Cleanups:

- initramfs: refactor the cpio hex header parsing to use hex2bin()
instead of the hand-rolled simple_strntoul() which is reverted, and
extend the initramfs KUnit tests to cover header fields with 0x
prefixes.

- Replace __get_free_pages() and friends with kmalloc()/kzalloc()
across quota, proc, ocfs2/dlm, nilfs2, nfs, nfsd, libfs, jfs, jbd2,
isofs, fuse, select, namespace, configfs, binfmt_misc, bfs, and the
do_mounts init code - part of the larger work of replacing page
allocator calls with kmalloc().

- Use clear_and_wake_up_bit() in unlock_buffer() and
journal_end_buffer_io_sync() instead of open-coding the sequence.

- Drop unused VFS exports: unexport drop_super_exclusive(), remove
start_removing_user_path_at(), and fold __start_removing_path()
into start_removing_path().

- fs/read_write: narrow the __kernel_write() export with
EXPORT_SYMBOL_FOR_MODULES().

- vfs: uapi: retire octal and hex constants in favor of (1 << n) for
the O_ flags. Finding a free bit for a new flag across the
architectures was needlessly hard with the mixed bases.

- dcache: add extra sanity checks of dead dentries in dentry_free()
via a new DENTRY_WARN_ONCE() that also prints d_flags.

- iov_iter: use kmemdup_array() in dup_iter() to harden the
allocation against multiplication overflow.

- fs/pipe: write to ->poll_usage only once.

- vfs: remove an always-taken if-branch in find_next_fd().

- dcache: use kmalloc_flex() for struct external_name in __d_alloc().

- namei: use QSTR() instead of QSTR_INIT() in path_pts().

- sync_file_range: delete dead S_ISLNK code.

- Comment fixes: retire a stale comment in fget_task_next() and fix
assorted spelling mistakes"

* tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (73 commits)
backing-file: fix backing_file_open() kerneldoc parameter
iomap: pass the correct len to fserror_report_io in __iomap_write_begin
vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS
filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n
vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags
bpf: add bpf_real_inode() kfunc
fs/read_write: Do not export __kernel_write() to the entire world
libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers
libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo()
mount: honour SB_NOUSER in the new mount API
fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling
selftests/pipe: add pipe_bench microbenchmark
fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write
fs: retire stale comment in fget_task_next()
fs: fix spelling mistakes in comment
bfs: replace get_zeroed_page() with kzalloc()
binfmt_misc: replace __get_free_page() with kmalloc()
configfs: replace __get_free_pages() with kzalloc()
fs/namespace: use __getname() to allocate mntpath buffer
fs/select: replace __get_free_page() with kmalloc()
...

show more ...


Revision tags: v7.1, v7.1-rc7, v7.1-rc6
# 99f41427 28-May-2026 Christian Brauner <brauner@kernel.org>

Merge patch series "fs/pipe: reduce pipe->mutex contention by pre-allocating outside the lock"

Breno Leitao <leitao@debian.org> says:

While profiling Meta's caching code[1], I found pipe->mutex con

Merge patch series "fs/pipe: reduce pipe->mutex contention by pre-allocating outside the lock"

Breno Leitao <leitao@debian.org> says:

While profiling Meta's caching code[1], I found pipe->mutex contention
on the hot path. anon_pipe_write() currently calls alloc_page() once
per page while holding pipe->mutex. The allocation can sleep doing
direct reclaim and runs memcg charging, which extends the critical
section and stalls any concurrent reader on the same mutex.

This series pre-allocates pages outside pipe->mutex in
anon_pipe_write(): for writes that span more than one full page, up
to PIPE_PREALLOC_MAX (8) pages are allocated via a per-page
alloc_page() loop before the mutex is taken. anon_pipe_get_page()
then drains the prealloc array first, falls back to the per-pipe
tmp_page[] cache, and only enters the allocator under the mutex for
the leftover pages (writes larger than PIPE_PREALLOC_MAX, single-page
writes that skip prealloc, or shortfalls when the prealloc loop
fails). Leftover prealloc pages are recycled into tmp_page[] before
unlock and any remainder is put_page()'d after unlock, keeping the
allocator out of the critical section on both sides.

alloc_pages_bulk_mempolicy() looked tempting but the bulk allocator
refuses __GFP_ACCOUNT under memcg -- it returns at most one page
when memcg_kmem_online() && (gfp & __GFP_ACCOUNT), see commit
8dcb3060d81d ("memcg: page_alloc: skip bulk allocator for
__GFP_ACCOUNT"). A per-page loop keeps memcg accounting and the
task NUMA mempolicy honoured uniformly without open-coding the
charge.

I also vibe-coded a microbenchmark to validate the change. It sweeps
writers x readers over {1,2,5} x {1,5,10} with 64KB writes against a
1 MB pipe and prints throughput + latency percentiles per config.

Measured on arm64 and also on x86 using virtme-ng (16 vCPUs, 64KB
writes, 1 MB pipe). The numbers below were collected on v1
(alloc_pages_bulk()); v2's per-page loop preserves the dominant
"allocation outside the mutex" win and is expected to land in the same
range.

== No memory pressure (10s per config) ==

Throughput in MB/s (baseline -> patched, delta):
writers readers=1 readers=5 readers=10
1 1119 -> 1354 (+21%) 1132 -> 1195 (+6%) 1060 -> 1240 (+17%)
2 1162 -> 1487 (+28%) 1034 -> 1285 (+24%) 1069 -> 1213 (+14%)
5 1152 -> 1357 (+18%) 1021 -> 1164 (+14%) 997 -> 1239 (+24%)

Avg write latency in ns (baseline -> patched, delta):
writers readers=1 readers=5 readers=10
1 55786 -> 46103 (-17%) 55164 -> 52260 (-5%) 58906 -> 50370 (-14%)
2 107546 -> 84011 (-22%) 120837 -> 97206 (-20%) 116860 -> 103036 (-12%)
5 271293 -> 230170 (-15%) 306089 -> 268429 (-12%) 313300 -> 252232 (-19%)

Throughput improves +6% to +28% and average write latency drops 5%
to 22% across every configuration.

== Under memory pressure (--memory-pressure, 6s per config) ==

stress-ng --vm 2 --vm-bytes 50% --vm-keep is forked alongside the
sweep so the alloc_page() calls inside anon_pipe_write() routinely
hit direct reclaim -- exactly the regime the patch targets.

Throughput in MB/s (baseline -> patched, delta):
writers readers=1 readers=5 readers=10
1 1088 -> 1438 (+32%) 996 -> 1477 (+48%) 989 -> 1194 (+21%)
2 1076 -> 1378 (+28%) 1007 -> 1269 (+26%) 1018 -> 1234 (+21%)
5 1052 -> 1311 (+25%) 986 -> 1225 (+24%) 972 -> 1249 (+29%)

Avg write latency in ns (baseline -> patched, delta):
writers readers=1 readers=5 readers=10
1 57397 -> 43406 (-24%) 62690 -> 42272 (-33%) 63136 -> 52272 (-17%)
2 116121 -> 90700 (-22%) 124098 -> 98481 (-21%) 122754 -> 101217 (-18%)
5 297122 -> 238322 (-20%) 316836 -> 255095 (-19%) 321496 -> 250189 (-22%)

Throughput improves +21% to +48% and average write latency drops
17% to 33% -- a noticeably bigger win than the no-pressure run.

That tracks: when alloc_page() has to dip into reclaim, the cost
of holding pipe->mutex across it is highest, and pulling the
allocation out of the critical section pays the most.

* patches from https://patch.msgid.link/20260524-fix_pipe-v3-0-bb4a75d23a90@debian.org:
selftests/pipe: add pipe_bench microbenchmark
fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write

Link: https://www.usenix.org/system/files/conference/atc13/atc13-bronson.pdf [1]
Link: https://patch.msgid.link/20260524-fix_pipe-v3-0-bb4a75d23a90@debian.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


Revision tags: v7.1-rc5
# d29bd8ef 24-May-2026 Breno Leitao <leitao@debian.org>

selftests/pipe: add pipe_bench microbenchmark

Add a small selftest that stresses pipe->mutex contention by spawning N
writer threads that hammer a single pipe with multi-page writes, plus M
reader t

selftests/pipe: add pipe_bench microbenchmark

Add a small selftest that stresses pipe->mutex contention by spawning N
writer threads that hammer a single pipe with multi-page writes, plus M
reader threads that drain. Each writer records its own write() latency
samples into a log2-bucketed histogram; main aggregates and prints
total writes, throughput, average and percentile (p50/p99) latencies,
and the maximum observed latency.

Pass --memory-pressure to fork stress-ng (--vm 4 --vm-bytes 80%
--vm-method all) for the duration of the run, so alloc_page() in
anon_pipe_write() routinely hits direct reclaim. The flag fails
fast if stress-ng is not on $PATH.

Program print something like the following, for different writes,
readers, msgsizes and memory pressure:

config: writers=X readers=Y msgsize=Z duration=3 pipe_size=1048576
memory_pressure=[no|yes]
writes: total=54451 rate=18150/s
throughput_MBps: 1134.40
lat_avg_ns: 275355
lat_p50_ns_upper: 262143
lat_p99_ns_upper: 1048575
lat_max_ns: 2145633

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://patch.msgid.link/20260524-fix_pipe-v3-2-bb4a75d23a90@debian.org
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>

show more ...