History log of /linux/fs/proc/base.c (Results 1 – 25 of 3051)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.12-rc2
# c8d430db 06-Oct-2024 Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvmarm-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #1

- Fix pKVM error path on init, making sure we do not chang

Merge tag 'kvmarm-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #1

- Fix pKVM error path on init, making sure we do not change critical
system registers as we're about to fail

- Make sure that the host's vector length is at capped by a value
common to all CPUs

- Fix kvm_has_feat*() handling of "negative" features, as the current
code is pretty broken

- Promote Joey to the status of official reviewer, while James steps
down -- hopefully only temporarly

show more ...


# 0c436dfe 02-Oct-2024 Takashi Iwai <tiwai@suse.de>

Merge tag 'asoc-fix-v6.12-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.12

A bunch of fixes here that came in during the merge window and t

Merge tag 'asoc-fix-v6.12-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.12

A bunch of fixes here that came in during the merge window and the first
week of release, plus some new quirks and device IDs. There's nothing
major here, it's a bit bigger than it might've been due to there being
no fixes sent during the merge window due to your vacation.

show more ...


# 2cd86f02 01-Oct-2024 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixes

Required for a panthor fix that broke when
FOP_UNSIGNED_OFFSET was added in place of FMODE_UNSIGNED_OFFSET.

Signed-off-by: Maarten L

Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixes

Required for a panthor fix that broke when
FOP_UNSIGNED_OFFSET was added in place of FMODE_UNSIGNED_OFFSET.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

show more ...


Revision tags: v6.12-rc1
# 3a39d672 27-Sep-2024 Paolo Abeni <pabeni@redhat.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

No conflicts and no adjacent changes.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 36ec807b 20-Sep-2024 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.12 merge window.


Revision tags: v6.11, v6.11-rc7, v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2, v6.11-rc1
# 3daee2e4 16-Jul-2024 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.10' into next

Sync up with mainline to bring in device_for_each_child_node_scoped()
and other newer APIs.


# 66e72a01 29-Jul-2024 Jerome Brunet <jbrunet@baylibre.com>

Merge tag 'v6.11-rc1' into clk-meson-next

Linux 6.11-rc1


# ee057c8c 14-Aug-2024 Steven Rostedt <rostedt@goodmis.org>

Merge tag 'v6.11-rc3' into trace/ring-buffer/core

The "reserve_mem" kernel command line parameter has been pulled into
v6.11. Merge the latest -rc3 to allow the persistent ring buffer memory to
be a

Merge tag 'v6.11-rc3' into trace/ring-buffer/core

The "reserve_mem" kernel command line parameter has been pulled into
v6.11. Merge the latest -rc3 to allow the persistent ring buffer memory to
be able to be mapped at the address specified by the "reserve_mem" command
line parameter.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

show more ...


# c8faf11c 30-Jul-2024 Tejun Heo <tj@kernel.org>

Merge tag 'v6.11-rc1' into for-6.12

Linux 6.11-rc1


# 2004cef1 19-Sep-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

- Implement the SCHED_DEADLINE server infrastructure - Daniel Br

Merge tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

- Implement the SCHED_DEADLINE server infrastructure - Daniel Bristot
de Oliveira's last major contribution to the kernel:

"SCHED_DEADLINE servers can help fixing starvation issues of low
priority tasks (e.g., SCHED_OTHER) when higher priority tasks
monopolize CPU cycles. Today we have RT Throttling; DEADLINE
servers should be able to replace and improve that."

(Daniel Bristot de Oliveira, Peter Zijlstra, Joel Fernandes, Youssef
Esmat, Huang Shijie)

- Preparatory changes for sched_ext integration:
- Use set_next_task(.first) where required
- Fix up set_next_task() implementations
- Clean up DL server vs. core sched
- Split up put_prev_task_balance()
- Rework pick_next_task()
- Combine the last put_prev_task() and the first set_next_task()
- Rework dl_server
- Add put_prev_task(.next)

(Peter Zijlstra, with a fix by Tejun Heo)

- Complete the EEVDF transition and refine EEVDF scheduling:
- Implement delayed dequeue
- Allow shorter slices to wakeup-preempt
- Use sched_attr::sched_runtime to set request/slice suggestion
- Document the new feature flags
- Remove unused and duplicate-functionality fields
- Simplify & unify pick_next_task_fair()
- Misc debuggability enhancements

(Peter Zijlstra, with fixes/cleanups by Dietmar Eggemann, Valentin
Schneider and Chuyi Zhou)

- Initialize the vruntime of a new task when it is first enqueued,
resulting in significant decrease in latency of newly woken tasks
(Zhang Qiao)

- Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
(K Prateek Nayak, Peter Zijlstra)

- Clean up and clarify the usage of Clean up usage of rt_task()
(Qais Yousef)

- Preempt SCHED_IDLE entities in strict cgroup hierarchies
(Tianchen Ding)

- Clarify the documentation of time units for deadline scheduler
parameters (Christian Loehle)

- Remove the HZ_BW chicken-bit feature flag introduced a year ago,
the original change seems to be working fine (Phil Auld)

- Misc fixes and cleanups (Chen Yu, Dan Carpenter, Huang Shijie,
Peilin He, Qais Yousefm and Vincent Guittot)

* tag 'sched-core-2024-09-19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
sched/cpufreq: Use NSEC_PER_MSEC for deadline task
cpufreq/cppc: Use NSEC_PER_MSEC for deadline task
sched/deadline: Clarify nanoseconds in uapi
sched/deadline: Convert schedtool example to chrt
sched/debug: Fix the runnable tasks output
sched: Fix sched_delayed vs sched_core
kernel/sched: Fix util_est accounting for DELAY_DEQUEUE
kthread: Fix task state in kthread worker if being frozen
sched/pelt: Use rq_clock_task() for hw_pressure
sched/fair: Move effective_cpu_util() and effective_cpu_util() in fair.c
sched/core: Introduce SM_IDLE and an idle re-entry fast-path in __schedule()
sched: Add put_prev_task(.next)
sched: Rework dl_server
sched: Combine the last put_prev_task() and the first set_next_task()
sched: Rework pick_next_task()
sched: Split up put_prev_task_balance()
sched: Clean up DL server vs core sched
sched: Fixup set_next_task() implementations
sched: Use set_next_task(.first) where required
sched/fair: Properly deactivate sched_delayed task upon class change
...

show more ...


# 0e8655b4 29-Jul-2024 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next

Backmerging to get a late RC of v6.10 before moving into v6.11.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


# 9ea925c8 17-Sep-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
"Core:

- Overhaul of posix-timers in preparation of removing

Merge tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
"Core:

- Overhaul of posix-timers in preparation of removing the workaround
for periodic timers which have signal delivery ignored.

- Remove the historical extra jiffie in msleep()

msleep() adds an extra jiffie to the timeout value to ensure
minimal sleep time. The timer wheel ensures minimal sleep time
since the large rewrite to a non-cascading wheel, but the extra
jiffie in msleep() remained unnoticed. Remove it.

- Make the timer slack handling correct for realtime tasks.

The procfs interface is inconsistent and does neither reflect
reality nor conforms to the man page. Show the correct 0 slack for
real time tasks and enforce it at the core level instead of having
inconsistent individual checks in various timer setup functions.

- The usual set of updates and enhancements all over the place.

Drivers:

- Allow the ACPI PM timer to be turned off during suspend

- No new drivers

- The usual updates and enhancements in various drivers"

* tag 'timers-core-2024-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
ntp: Make sure RTC is synchronized when time goes backwards
treewide: Fix wrong singular form of jiffies in comments
cpu: Use already existing usleep_range()
timers: Rename next_expiry_recalc() to be unique
platform/x86:intel/pmc: Fix comment for the pmc_core_acpi_pm_timer_suspend_resume function
clocksource/drivers/jcore: Use request_percpu_irq()
clocksource/drivers/cadence-ttc: Add missing clk_disable_unprepare in ttc_setup_clockevent
clocksource/drivers/asm9260: Add missing clk_disable_unprepare in asm9260_timer_init
clocksource/drivers/qcom: Add missing iounmap() on errors in msm_dt_timer_init()
clocksource/drivers/ingenic: Use devm_clk_get_enabled() helpers
platform/x86:intel/pmc: Enable the ACPI PM Timer to be turned off when suspended
clocksource: acpi_pm: Add external callback for suspend/resume
clocksource/drivers/arm_arch_timer: Using for_each_available_child_of_node_scoped()
dt-bindings: timer: rockchip: Add rk3576 compatible
timers: Annotate possible non critical data race of next_expiry
timers: Remove historical extra jiffie for timeout in msleep()
hrtimer: Use and report correct timerslack values for realtime tasks
hrtimer: Annotate hrtimer_cpu_base_.*_expiry() for sparse.
timers: Add sparse annotation for timer_sync_wait_running().
signal: Replace BUG_ON()s
...

show more ...


# ed4fb6d7 14-Aug-2024 Felix Moessbauer <felix.moessbauer@siemens.com>

hrtimer: Use and report correct timerslack values for realtime tasks

The timerslack_ns setting is used to specify how much the hardware
timers should be delayed, to potentially dispatch multiple tim

hrtimer: Use and report correct timerslack values for realtime tasks

The timerslack_ns setting is used to specify how much the hardware
timers should be delayed, to potentially dispatch multiple timers in a
single interrupt. This is a performance optimization. Timers of
realtime tasks (having a realtime scheduling policy) should not be
delayed.

This logic was inconsitently applied to the hrtimers, leading to delays
of realtime tasks which used timed waits for events (e.g. condition
variables). Due to the downstream override of the slack for rt tasks,
the procfs reported incorrect (non-zero) timerslack_ns values.

This is changed by setting the timer_slack_ns task attribute to 0 for
all tasks with a rt policy. By that, downstream users do not need to
specially handle rt tasks (w.r.t. the slack), and the procfs entry
shows the correct value of "0". Setting non-zero slack values (either
via procfs or PR_SET_TIMERSLACK) on tasks with a rt policy is ignored,
as stated in "man 2 PR_SET_TIMERSLACK":

Timer slack is not applied to threads that are scheduled under a
real-time scheduling policy (see sched_setscheduler(2)).

The special handling of timerslack on rt tasks in downstream users
is removed as well.

Signed-off-by: Felix Moessbauer <felix.moessbauer@siemens.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240814121032.368444-2-felix.moessbauer@siemens.com

show more ...


# 9a7b0158 30-Jul-2024 Thomas Gleixner <tglx@linutronix.de>

Merge tag 'posix-timers-2024-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core

Pull updates for posix timers and related signal code from Frederic Weis

Merge tag 'posix-timers-2024-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core

Pull updates for posix timers and related signal code from Frederic Weisbecker:

* Prepare posix timers selftests for upcoming changes:

- Check signal behaviour sanity against SIG_IGN

- Check signal behaviour sanity against timer
reprogramm/deletion

- Check SIGEV_NONE pending expiry read

- Check interval timer read on a pending SIGNAL

- Check correct overrun count after signal block/unblock

* Various consolidations:

- timer get/set

- signal queue

* Fixes:
- Correctly read SIGEV_NONE timers

- Forward expiry while reading expired interval timers
with pending signal

- Don't arm SIGEV_NONE timers

* Various cleanups all over the place

show more ...


Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4
# 52dea0a1 10-Jun-2024 Thomas Gleixner <tglx@linutronix.de>

posix-timers: Convert timer list to hlist

No requirement for a real list. Spare a few bytes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.

posix-timers: Convert timer list to hlist

No requirement for a real list. Spare a few bytes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>

show more ...


# e8fc317d 16-Sep-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'vfs-6.12.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull procfs updates from Christian Brauner:
"This contains the following changes for procfs:

- Add config op

Merge tag 'vfs-6.12.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull procfs updates from Christian Brauner:
"This contains the following changes for procfs:

- Add config options and parameters to block forcing memory writes.

This adds a Kconfig option and boot param to allow removing the
FOLL_FORCE flag from /proc/<pid>/mem write calls as this can be
used in various attacks.

The traditional forcing behavior is kept as default because it can
break GDB and some other use cases.

This is the simpler version that you had requested.

- Restrict overmounting of ephemeral entities.

It is currently possible to mount on top of various ephemeral
entities in procfs. This specifically includes magic links. To
recap, magic links are links of the form /proc/<pid>/fd/<nr>. They
serve as references to a target file and during path lookup they
cause a jump to the target path. Such magic links disappear if the
corresponding file descriptor is closed.

Currently it is possible to overmount such magic links. This is
mostly interesting for an attacker that wants to somehow trick a
process into e.g., reopening something that it didn't intend to
reopen or to hide a malicious file descriptor.

But also it risks leaking mounts for long-running processes. When
overmounting a magic link like above, the mount will not be
detached when the file descriptor is closed. Only the target
mountpoint will disappear. Which has the consequence of making it
impossible to unmount that mount afterwards. So the mount will
stick around until the process exits and the /proc/<pid>/ directory
is cleaned up during proc_flush_pid() when the dentries are pruned
and invalidated.

That in turn means it's possible for a program to accidentally leak
mounts and it's also possible to make a task leak mounts without
it's knowledge if the attacker just keeps overmounting things under
/proc/<pid>/fd/<nr>.

Disallow overmounting of such ephemeral entities.

- Cleanup the readdir method naming in some procfs file operations.

- Replace kmalloc() and strcpy() with a simple kmemdup() call"

* tag 'vfs-6.12.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
proc: fold kmalloc() + strcpy() into kmemdup()
proc: block mounting on top of /proc/<pid>/fdinfo/*
proc: block mounting on top of /proc/<pid>/fd/*
proc: block mounting on top of /proc/<pid>/map_files/*
proc: add proc_splice_unmountable()
proc: proc_readfdinfo() -> proc_fdinfo_iterate()
proc: proc_readfd() -> proc_fd_iterate()
proc: add config & param to block forcing mem writes

show more ...


# 3352633c 16-Sep-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'vfs-6.12.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs file updates from Christian Brauner:
"This is the work to cleanup and shrink struct file significantly.

Merge tag 'vfs-6.12.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs file updates from Christian Brauner:
"This is the work to cleanup and shrink struct file significantly.

Right now, (focusing on x86) struct file is 232 bytes. After this
series struct file will be 184 bytes aka 3 cacheline and a spare 8
bytes for future extensions at the end of the struct.

With struct file being as ubiquitous as it is this should make a
difference for file heavy workloads and allow further optimizations in
the future.

- struct fown_struct was embedded into struct file letting it take up
32 bytes in total when really it shouldn't even be embedded in
struct file in the first place. Instead, actual users of struct
fown_struct now allocate the struct on demand. This frees up 24
bytes.

- Move struct file_ra_state into the union containg the cleanup hooks
and move f_iocb_flags out of the union. This closes a 4 byte hole
we created earlier and brings struct file to 192 bytes. Which means
struct file is 3 cachelines and we managed to shrink it by 40
bytes.

- Reorder struct file so that nothing crosses a cacheline.

I suspect that in the future we will end up reordering some members
to mitigate false sharing issues or just because someone does
actually provide really good perf data.

- Shrinking struct file to 192 bytes is only part of the work.

Files use a slab that is SLAB_TYPESAFE_BY_RCU and when a kmem cache
is created with SLAB_TYPESAFE_BY_RCU the free pointer must be
located outside of the object because the cache doesn't know what
part of the memory can safely be overwritten as it may be needed to
prevent object recycling.

That has the consequence that SLAB_TYPESAFE_BY_RCU may end up
adding a new cacheline.

So this also contains work to add a new kmem_cache_create_rcu()
function that allows the caller to specify an offset where the
freelist pointer is supposed to be placed. Thus avoiding the
implicit addition of a fourth cacheline.

- And finally this removes the f_version member in struct file.

The f_version member isn't particularly well-defined. It is mainly
used as a cookie to detect concurrent seeks when iterating
directories. But it is also abused by some subsystems for
completely unrelated things.

It is mostly a directory and filesystem specific thing that doesn't
really need to live in struct file and with its wonky semantics it
really lacks a specific function.

For pipes, f_version is (ab)used to defer poll notifications until
a write has happened. And struct pipe_inode_info is used by
multiple struct files in their ->private_data so there's no chance
of pushing that down into file->private_data without introducing
another pointer indirection.

But pipes don't rely on f_pos_lock so this adds a union into struct
file encompassing f_pos_lock and a pipe specific f_pipe member that
pipes can use. This union of course can be extended to other file
types and is similar to what we do in struct inode already"

* tag 'vfs-6.12.file' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (26 commits)
fs: remove f_version
pipe: use f_pipe
fs: add f_pipe
ubifs: store cookie in private data
ufs: store cookie in private data
udf: store cookie in private data
proc: store cookie in private data
ocfs2: store cookie in private data
input: remove f_version abuse
ext4: store cookie in private data
ext2: store cookie in private data
affs: store cookie in private data
fs: add generic_llseek_cookie()
fs: use must_set_pos()
fs: add must_set_pos()
fs: add vfs_setpos_cookie()
s390: remove unused f_version
ceph: remove unused f_version
adi: remove unused f_version
mm: Removed @freeptr_offset to prevent doc warning
...

show more ...


# 8f72c31f 16-Sep-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
"This contains the usual pile of misc updates:

Features:

- Add

Merge tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs

Pull misc vfs updates from Christian Brauner:
"This contains the usual pile of misc updates:

Features:

- Add F_CREATED_QUERY fcntl() that allows userspace to query whether
a file was actually created. Often userspace wants to know whether
an O_CREATE request did actually create a file without using
O_EXCL. The current logic is that to first attempts to open the
file without O_CREAT | O_EXCL and if ENOENT is returned userspace
tries again with both flags. If that succeeds all is well. If it
now reports EEXIST it retries.

That works fairly well but some corner cases make this more
involved. If this operates on a dangling symlink the first openat()
without O_CREAT | O_EXCL will return ENOENT but the second openat()
with O_CREAT | O_EXCL will fail with EEXIST.

The reason is that openat() without O_CREAT | O_EXCL follows the
symlink while O_CREAT | O_EXCL doesn't for security reasons. So
it's not something we can really change unless we add an explicit
opt-in via O_FOLLOW which seems really ugly.

All available workarounds are really nasty (fanotify, bpf lsm etc)
so add a simple fcntl().

- Try an opportunistic lookup for O_CREAT. Today, when opening a file
we'll typically do a fast lookup, but if O_CREAT is set, the kernel
always takes the exclusive inode lock. This was likely done with
the expectation that O_CREAT means that we always expect to do the
create, but that's often not the case. Many programs set O_CREAT
even in scenarios where the file already exists (see related
F_CREATED_QUERY patch motivation above).

The series contained in the pr rearranges the pathwalk-for-open
code to also attempt a fast_lookup in certain O_CREAT cases. If a
positive dentry is found, the inode_lock can be avoided altogether
and it can stay in rcuwalk mode for the last step_into.

- Expose the 64 bit mount id via name_to_handle_at()

Now that we provide a unique 64-bit mount ID interface in statx(2),
we can now provide a race-free way for name_to_handle_at(2) to
provide a file handle and corresponding mount without needing to
worry about racing with /proc/mountinfo parsing or having to open a
file just to do statx(2).

While this is not necessary if you are using AT_EMPTY_PATH and
don't care about an extra statx(2) call, users that pass full paths
into name_to_handle_at(2) need to know which mount the file handle
comes from (to make sure they don't try to open_by_handle_at a file
handle from a different filesystem) and switching to AT_EMPTY_PATH
would require allocating a file for every name_to_handle_at(2) call

- Add a per dentry expire timeout to autofs

There are two fairly well known automounter map formats, the autofs
format and the amd format (more or less System V and Berkley).

Some time ago Linux autofs added an amd map format parser that
implemented a fair amount of the amd functionality. This was done
within the autofs infrastructure and some functionality wasn't
implemented because it either didn't make sense or required extra
kernel changes. The idea was to restrict changes to be within the
existing autofs functionality as much as possible and leave changes
with a wider scope to be considered later.

One of these changes is implementing the amd options:
1) "unmount", expire this mount according to a timeout (same as
the current autofs default).
2) "nounmount", don't expire this mount (same as setting the
autofs timeout to 0 except only for this specific mount) .
3) "utimeout=<seconds>", expire this mount using the specified
timeout (again same as setting the autofs timeout but only for
this mount)

To implement these options per-dentry expire timeouts need to be
implemented for autofs indirect mounts. This is because all map
keys (mounts) for autofs indirect mounts use an expire timeout
stored in the autofs mount super block info. structure and all
indirect mounts use the same expire timeout.

Fixes:

- Fix missing fput for FSCONFIG_SET_FD in autofs

- Use param->file for FSCONFIG_SET_FD in coda

- Delete the 'fs/netfs' proc subtreee when netfs module exits

- Make sure that struct uid_gid_map fits into a single cacheline

- Don't flush in-flight wb switches for superblocks without cgroup
writeback

- Correcting the idmapping mount example in the idmapping
documentation

- Fix a race between evice_inodes() and find_inode() and iput()

- Refine the show_inode_state() macro definition in writeback code

- Prevent dump_mapping() from accessing invalid dentry.d_name.name

- Show actual source for debugfs in /proc/mounts

- Annotate data-race of busy_poll_usecs in eventpoll

- Don't WARN for racy path_noexec check in exec code

- Handle OOM on mnt_warn_timestamp_expiry()

- Fix some spelling in the iomap design documentation

- Fix typo in procfs comment

- Fix typo in fs/namespace.c comment

Cleanups:

- Add the VFS git tree to the MAINTAINERS file

- Move FMODE_UNSIGNED_OFFSET to fop_flags freeing up another f_mode
bit in struct file bringing us to 5 free f_mode bits

- Remove the __I_DIO_WAKEUP bit from i_state flags as we can simplify
the wait mechanism

- Remove the unused path_put_init() helper

- Replace a __u32 with u32 for s_fsnotify_mask as __u32 is uapi
specific

- Replace the unsigned long i_state member with a u32 i_state member
in struct inode freeing up 4 bytes in struct inode. Instead of
using the bit based wait apis we're now using the var event apis
and using the individual bytes of the i_state member to wait on
state changes

- Explain how per-syscall AT_* flags should be allocated

- Use in_group_or_capable() helper to simplify the posix acl mode
update code

- Switch to LIST_HEAD() in fsync_buffers_list() to simplify the code

- Removed comment about d_rcu_to_refcount() as that function doesn't
exist anymore

- Add kernel documentation for lookup_fast()

- Don't re-zero evenpoll fields

- Remove outdated comment after close_fd()

- Fix imprecise wording in comment about the pipe filesystem

- Drop GFP_NOFAIL mode from alloc_page_buffers

- Missing blank line warnings and struct declaration improved in
file_table

- Annotate struct poll_list with __counted_by()

- Remove the unused read parameter in percpu-rwsem

- Remove linux/prefetch.h include from direct-io code

- Use kmemdup_array instead of kmemdup for multiple allocation in
mnt_idmapping code

- Remove unused mnt_cursor_del() declaration

Performance tweaks:

- Dodge smp_mb in break_lease and break_deleg in the common case

- Only read fops once in fops_{get,put}()

- Use RCU in ilookup()

- Elide smp_mb in iversion handling in the common case

- Drop one lock trip in evict()"

* tag 'vfs-6.12.misc' of gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs: (58 commits)
uidgid: make sure we fit into one cacheline
proc: Fix typo in the comment
fs/pipe: Correct imprecise wording in comment
fhandle: expose u64 mount id to name_to_handle_at(2)
uapi: explain how per-syscall AT_* flags should be allocated
fs: drop GFP_NOFAIL mode from alloc_page_buffers
writeback: Refine the show_inode_state() macro definition
fs/inode: Prevent dump_mapping() accessing invalid dentry.d_name.name
mnt_idmapping: Use kmemdup_array instead of kmemdup for multiple allocation
netfs: Delete subtree of 'fs/netfs' when netfs module exits
fs: use LIST_HEAD() to simplify code
inode: make i_state a u32
inode: port __I_LRU_ISOLATING to var event
vfs: fix race between evice_inodes() and find_inode()&iput()
inode: port __I_NEW to var event
inode: port __I_SYNC to var event
fs: reorder i_state bits
fs: add i_state helpers
MAINTAINERS: add the VFS git tree
fs: s/__u32/u32/ for s_fsnotify_mask
...

show more ...


# 24a988f7 08-Sep-2024 Christian Brauner <brauner@kernel.org>

Merge patch series "file: remove f_version"

Christian Brauner <brauner@kernel.org> says:

The f_version member in struct file isn't particularly well-defined. It
is mainly used as a cookie to detect

Merge patch series "file: remove f_version"

Christian Brauner <brauner@kernel.org> says:

The f_version member in struct file isn't particularly well-defined. It
is mainly used as a cookie to detect concurrent seeks when iterating
directories. But it is also abused by some subsystems for completely
unrelated things.

It is mostly a directory specific thing that doesn't really need to live
in struct file and with its wonky semantics it really lacks a specific
function.

For pipes, f_version is (ab)used to defer poll notifications until a
write has happened. And struct pipe_inode_info is used by multiple
struct files in their ->private_data so there's no chance of pushing
that down into file->private_data without introducing another pointer
indirection.

But this should be a solvable problem. Only regular files with
FMODE_ATOMIC_POS and directories require f_pos_lock. Pipes and other
files don't. So this adds a union into struct file encompassing
f_pos_lock and a pipe specific f_pipe member that pipes can use. This
union of course can be extended to other file types and is similar to
what we do in struct inode already.

* patches from https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-0-6d3e4816aa7b@kernel.org:
fs: remove f_version
pipe: use f_pipe
fs: add f_pipe
ubifs: store cookie in private data
ufs: store cookie in private data
udf: store cookie in private data
proc: store cookie in private data
ocfs2: store cookie in private data
input: remove f_version abuse
ext4: store cookie in private data
ext2: store cookie in private data
affs: store cookie in private data
fs: add generic_llseek_cookie()
fs: use must_set_pos()
fs: add must_set_pos()
fs: add vfs_setpos_cookie()
s390: remove unused f_version
ceph: remove unused f_version
adi: remove unused f_version
file: remove pointless comment

Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-0-6d3e4816aa7b@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# b4dba2ef 30-Aug-2024 Christian Brauner <brauner@kernel.org>

proc: store cookie in private data

Store the cookie to detect concurrent seeks on directories in
file->private_data.

Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-14-6d3e4816aa7b@k

proc: store cookie in private data

Store the cookie to detect concurrent seeks on directories in
file->private_data.

Link: https://lore.kernel.org/r/20240830-vfs-file-f_version-v1-14-6d3e4816aa7b@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# 641bb439 09-Aug-2024 Christian Brauner <brauner@kernel.org>

fs: move FMODE_UNSIGNED_OFFSET to fop_flags

This is another flag that is statically set and doesn't need to use up
an FMODE_* bit. Move it to ->fop_flags and free up another FMODE_* bit.

(1) mem_op

fs: move FMODE_UNSIGNED_OFFSET to fop_flags

This is another flag that is statically set and doesn't need to use up
an FMODE_* bit. Move it to ->fop_flags and free up another FMODE_* bit.

(1) mem_open() used from proc_mem_operations
(2) adi_open() used from adi_fops
(3) drm_open_helper():
(3.1) accel_open() used from DRM_ACCEL_FOPS
(3.2) drm_open() used from
(3.2.1) amdgpu_driver_kms_fops
(3.2.2) psb_gem_fops
(3.2.3) i915_driver_fops
(3.2.4) nouveau_driver_fops
(3.2.5) panthor_drm_driver_fops
(3.2.6) radeon_driver_kms_fops
(3.2.7) tegra_drm_fops
(3.2.8) vmwgfx_driver_fops
(3.2.9) xe_driver_fops
(3.2.10) DRM_GEM_FOPS
(3.2.11) DEFINE_DRM_GEM_DMA_FOPS
(4) struct memdev sets fmode flags based on type of device opened. For
devices using struct mem_fops unsigned offset is used.

Mark all these file operations as FOP_UNSIGNED_OFFSET and add asserts
into the open helper to ensure that the flag is always set.

Link: https://lore.kernel.org/r/20240809-work-fop_unsigned-v1-1-658e054d893e@kernel.org
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# d80b065b 30-Aug-2024 Christian Brauner <brauner@kernel.org>

Merge patch series "proc: restrict overmounting of ephemeral entities"

Christian Brauner <brauner@kernel.org> says:

It is currently possible to mount on top of various ephemeral entities
in procfs.

Merge patch series "proc: restrict overmounting of ephemeral entities"

Christian Brauner <brauner@kernel.org> says:

It is currently possible to mount on top of various ephemeral entities
in procfs. This specifically includes magic links. To recap, magic links
are links of the form /proc/<pid>/fd/<nr>. They serve as references to
a target file and during path lookup they cause a jump to the target
path. Such magic links disappear if the corresponding file descriptor is
closed.

Currently it is possible to overmount such magic links:

int fd = open("/mnt/foo", O_RDONLY);
sprintf(path, "/proc/%d/fd/%d", getpid(), fd);
int fd2 = openat(AT_FDCWD, path, O_PATH | O_NOFOLLOW);
mount("/mnt/bar", path, "", MS_BIND, 0);

Arguably, this is nonsensical and is mostly interesting for an attacker
that wants to somehow trick a process into e.g., reopening something
that they didn't intend to reopen or to hide a malicious file
descriptor.

But also it risks leaking mounts for long-running processes. When
overmounting a magic link like above, the mount will not be detached
when the file descriptor is closed. Only the target mountpoint will
disappear. Which has the consequence of making it impossible to unmount
that mount afterwards. So the mount will stick around until the process
exits and the /proc/<pid>/ directory is cleaned up during
proc_flush_pid() when the dentries are pruned and invalidated.

That in turn means it's possible for a program to accidentally leak
mounts and it's also possible to make a task leak mounts without it's
knowledge if the attacker just keeps overmounting things under
/proc/<pid>/fd/<nr>.

I think it's wrong to try and fix this by us starting to play games with
close() or somewhere else to undo these mounts when the file descriptor
is closed. The fact that we allow overmounting of such magic links is
simply a bug and one that we need to fix.

Similar things can be said about entries under fdinfo/ and map_files/ so
those are restricted as well.

I have a further more aggressive patch that gets out the big hammer and
makes everything under /proc/<pid>/*, as well as immediate symlinks such
as /proc/self, /proc/thread-self, /proc/mounts, /proc/net that point
into /proc/<pid>/ not overmountable. Imho, all of this should be blocked
if we can get away with it. It's only useful to hide exploits such as in [1].

And again, overmounting of any global procfs files remains unaffected
and is an existing and supported use-case.

Link: https://righteousit.com/2024/07/24/hiding-linux-processes-with-bind-mounts [1]

// Note that repro uses the traditional way of just mounting over
// /proc/<pid>/fd/<nr>. This could also all be achieved just based on
// file descriptors using move_mount(). So /proc/<pid>/fd/<nr> isn't the
// only entry vector here. It's also possible to e.g., mount directly
// onto /proc/<pid>/map_files/* without going over /proc/<pid>/fd/<nr>.
int main(int argc, char *argv[])
{
char path[PATH_MAX];

creat("/mnt/foo", 0777);
creat("/mnt/bar", 0777);

/*
* For illustration use a bunch of file descriptors in the upper
* range that are unused.
*/
for (int i = 10000; i >= 256; i--) {
printf("I'm: /proc/%d/\n", getpid());

int fd2 = open("/mnt/foo", O_RDONLY);
if (fd2 < 0) {
printf("%m - Failed to open\n");
_exit(1);
}

int newfd = dup2(fd2, i);
if (newfd < 0) {
printf("%m - Failed to dup\n");
_exit(1);
}
close(fd2);

sprintf(path, "/proc/%d/fd/%d", getpid(), newfd);
int fd = openat(AT_FDCWD, path, O_PATH | O_NOFOLLOW);
if (fd < 0) {
printf("%m - Failed to open\n");
_exit(3);
}

sprintf(path, "/proc/%d/fd/%d", getpid(), fd);
printf("Mounting on top of %s\n", path);
if (mount("/mnt/bar", path, "", MS_BIND, 0)) {
printf("%m - Failed to mount\n");
_exit(4);
}

close(newfd);
close(fd2);
}

/*
* Give some time to look at things. The mounts now linger until
* the process exits.
*/
sleep(10000);
_exit(0);
}

* patches from https://lore.kernel.org/r/20240806-work-procfs-v1-0-fb04e1d09f0c@kernel.org:
proc: block mounting on top of /proc/<pid>/fdinfo/*
proc: block mounting on top of /proc/<pid>/fd/*
proc: block mounting on top of /proc/<pid>/map_files/*
proc: add proc_splice_unmountable()
proc: proc_readfdinfo() -> proc_fdinfo_iterate()
proc: proc_readfd() -> proc_fd_iterate()

Link: https://lore.kernel.org/r/20240806-work-procfs-v1-0-fb04e1d09f0c@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# 3836b31c 06-Aug-2024 Christian Brauner <brauner@kernel.org>

proc: block mounting on top of /proc/<pid>/map_files/*

Entries under /proc/<pid>/map_files/* are ephemeral and may go away
before the process dies. As such allowing them to be used as mount
points c

proc: block mounting on top of /proc/<pid>/map_files/*

Entries under /proc/<pid>/map_files/* are ephemeral and may go away
before the process dies. As such allowing them to be used as mount
points creates the ability to leak mounts that linger until the process
dies with no ability to unmount them until then. Don't allow using them
as mountpoints.

Link: https://lore.kernel.org/r/20240806-work-procfs-v1-4-fb04e1d09f0c@kernel.org
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# 41e8149c 02-Aug-2024 Adrian Ratiu <adrian.ratiu@collabora.com>

proc: add config & param to block forcing mem writes

This adds a Kconfig option and boot param to allow removing
the FOLL_FORCE flag from /proc/pid/mem write calls because
it can be abused.

The tra

proc: add config & param to block forcing mem writes

This adds a Kconfig option and boot param to allow removing
the FOLL_FORCE flag from /proc/pid/mem write calls because
it can be abused.

The traditional forcing behavior is kept as default because
it can break GDB and some other use cases.

Previously we tried a more sophisticated approach allowing
distributions to fine-tune /proc/pid/mem behavior, however
that got NAK-ed by Linus [1], who prefers this simpler
approach with semantics also easier to understand for users.

Link: https://lore.kernel.org/lkml/CAHk-=wiGWLChxYmUA5HrT5aopZrB7_2VTa0NLZcxORgkUe5tEQ@mail.gmail.com/ [1]
Cc: Doug Anderson <dianders@chromium.org>
Cc: Jeff Xu <jeffxu@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Kees Cook <kees@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Link: https://lore.kernel.org/r/20240802080225.89408-1-adrian.ratiu@collabora.com
Signed-off-by: Christian Brauner <brauner@kernel.org>

show more ...


# b5dd4241 17-Jun-2024 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'v6.10-rc4' into driver-core-next

We need the driver core and sysfs fixes in here to build on top of.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


12345678910>>...123