a634dda2 | 17-Jan-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'io_uring-6.13-20250116' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe: "One fix for the error handling in buffer cloning, and one fix for the ring resizing.
Two m
Merge tag 'io_uring-6.13-20250116' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe: "One fix for the error handling in buffer cloning, and one fix for the ring resizing.
Two minor followups for the latter as well.
Both of these issues only affect 6.13, so not marked for stable"
* tag 'io_uring-6.13-20250116' of git://git.kernel.dk/linux: io_uring/register: cache old SQ/CQ head reading for copies io_uring/register: document io_register_resize_rings() shared mem usage io_uring/register: use stable SQ/CQ ring data during resize io_uring/rsrc: fixup io_clone_buffers() error handling
show more ...
|
6f7a644e | 15-Jan-2025 |
Jens Axboe <axboe@kernel.dk> |
io_uring/register: cache old SQ/CQ head reading for copies
The SQ and CQ ring heads are read twice - once for verifying that it's within bounds, and once inside the loops copying SQE and CQE entries
io_uring/register: cache old SQ/CQ head reading for copies
The SQ and CQ ring heads are read twice - once for verifying that it's within bounds, and once inside the loops copying SQE and CQE entries. This is technically incorrect, in case the values could get modified in between verifying them and using them in the copy loop. While this won't lead to anything truly nefarious, it may cause longer loop times for the copies than expected.
Read the ring head values once, and use the verified value in the copy loops.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
2c5aae12 | 15-Jan-2025 |
Jens Axboe <axboe@kernel.dk> |
io_uring/register: document io_register_resize_rings() shared mem usage
It can be a bit hard to tell which parts of io_register_resize_rings() are operating on shared memory, and which ones are not.
io_uring/register: document io_register_resize_rings() shared mem usage
It can be a bit hard to tell which parts of io_register_resize_rings() are operating on shared memory, and which ones are not. And anything reading or writing to those regions should really use the read/write once primitives.
Hence add those, ensuring sanity in how this memory is accessed, and helping document the shared nature of it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
8911798d | 15-Jan-2025 |
Jens Axboe <axboe@kernel.dk> |
io_uring/register: use stable SQ/CQ ring data during resize
Normally the kernel would not expect an application to modify any of the data shared with the kernel during a resize operation, but of cou
io_uring/register: use stable SQ/CQ ring data during resize
Normally the kernel would not expect an application to modify any of the data shared with the kernel during a resize operation, but of course the kernel cannot always assume good intent on behalf of the application.
As part of resizing the rings, existing SQEs and CQEs are copied over to the new storage. Resizing uses the masks in the newly allocated shared storage to index the arrays, however it's possible that malicious userspace could modify these after they have been sanity checked.
Use the validated and locally stored CQ and SQ ring sizing for masking to ensure the values are both stable and valid.
Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
c1c03ee7 | 14-Jan-2025 |
Jens Axboe <axboe@kernel.dk> |
io_uring/rsrc: fixup io_clone_buffers() error handling
Jann reports he can trigger a UAF if the target ring unregisters buffers before the clone operation is fully done. And additionally also an iss
io_uring/rsrc: fixup io_clone_buffers() error handling
Jann reports he can trigger a UAF if the target ring unregisters buffers before the clone operation is fully done. And additionally also an issue related to node allocation failures. Both of those stemp from the fact that the cleanup logic puts the buffers manually, rather than just relying on io_rsrc_data_free() doing it. Hence kill the manual cleanup code and just let io_rsrc_data_free() handle it, it'll put the nodes appropriately.
Reported-by: Jann Horn <jannh@google.com> Fixes: 3597f2786b68 ("io_uring/rsrc: unify file and buffer resource tables") Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
52a5a22d | 11-Jan-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'io_uring-6.13-20250111' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Fix for multishot timeout updates only using the updated value for the first invocation, n
Merge tag 'io_uring-6.13-20250111' of git://git.kernel.dk/linux
Pull io_uring fixes from Jens Axboe:
- Fix for multishot timeout updates only using the updated value for the first invocation, not subsequent ones
- Silence a false positive lockdep warning
- Fix the eventfd signaling and putting RCU logic
- Fix fault injected SQPOLL setup not clearing the task pointer in the error path
- Fix local task_work looking at the SQPOLL thread rather than just signaling the safe variant. Again one of those theoretical issues, which should be closed up none the less.
* tag 'io_uring-6.13-20250111' of git://git.kernel.dk/linux: io_uring: don't touch sqd->thread off tw add io_uring/sqpoll: zero sqd->thread on tctx errors io_uring/eventfd: ensure io_eventfd_signal() defers another RCU period io_uring: silence false positive warnings io_uring/timeout: fix multishot updates
show more ...
|
bd2703b4 | 10-Jan-2025 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: don't touch sqd->thread off tw add
With IORING_SETUP_SQPOLL all requests are created by the SQPOLL task, which means that req->task should always match sqd->thread. Since accesses to sqd->
io_uring: don't touch sqd->thread off tw add
With IORING_SETUP_SQPOLL all requests are created by the SQPOLL task, which means that req->task should always match sqd->thread. Since accesses to sqd->thread should be separately protected, use req->task in io_req_normal_work_add() instead.
Note, in the eyes of io_req_normal_work_add(), the SQPOLL task struct is always pinned and alive, and sqd->thread can either be the task or NULL. It's only problematic if the compiler decides to reload the value after the null check, which is not so likely.
Cc: stable@vger.kernel.org Cc: Bui Quang Minh <minhquangbui99@gmail.com> Reported-by: lizetao <lizetao1@huawei.com> Fixes: 78f9b61bd8e54 ("io_uring: wake SQPOLL task when task_work is added to an empty queue") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/1cbbe72cf32c45a8fee96026463024cd8564a7d7.1736541357.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
4b7cfa8b | 10-Jan-2025 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring/sqpoll: zero sqd->thread on tctx errors
Syzkeller reports:
BUG: KASAN: slab-use-after-free in thread_group_cputime+0x409/0x700 kernel/sched/cputime.c:341 Read of size 8 at addr ffff8880357
io_uring/sqpoll: zero sqd->thread on tctx errors
Syzkeller reports:
BUG: KASAN: slab-use-after-free in thread_group_cputime+0x409/0x700 kernel/sched/cputime.c:341 Read of size 8 at addr ffff88803578c510 by task syz.2.3223/27552 Call Trace: <TASK> ... kasan_report+0x143/0x180 mm/kasan/report.c:602 thread_group_cputime+0x409/0x700 kernel/sched/cputime.c:341 thread_group_cputime_adjusted+0xa6/0x340 kernel/sched/cputime.c:639 getrusage+0x1000/0x1340 kernel/sys.c:1863 io_uring_show_fdinfo+0xdfe/0x1770 io_uring/fdinfo.c:197 seq_show+0x608/0x770 fs/proc/fd.c:68 ...
That's due to sqd->task not being cleared properly in cases where SQPOLL task tctx setup fails, which can essentially only happen with fault injection to insert allocation errors.
Cc: stable@vger.kernel.org Fixes: 1251d2025c3e1 ("io_uring/sqpoll: early exit thread if task_context wasn't allocated") Reported-by: syzbot+3d92cfcfa84070b0a470@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/efc7ec7010784463b2e7466d7b5c02c2cb381635.1736519461.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
7110f24f | 10-Jan-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'vfs-6.13-rc7.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner: "afs:
- Fix the maximum cell name length
- Fix merge prefere
Merge tag 'vfs-6.13-rc7.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner: "afs:
- Fix the maximum cell name length
- Fix merge preference rule failure condition
fuse:
- Fix fuse_get_user_pages() so it doesn't risk misleading the caller to think pages have been allocated when they actually haven't
- Fix direct-io folio offset and length calculation
netfs:
- Fix async direct-io handling
- Fix read-retry for filesystems that don't provide a ->prepare_read() method
vfs:
- Prevent truncating 64-bit offsets to 32-bits in iomap
- Fix memory barrier interactions when polling
- Remove MNT_ONRB to fix concurrent modification of @mnt->mnt_flags leading to MNT_ONRB to not be raised and invalid access to a list member"
* tag 'vfs-6.13-rc7.fixes.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: poll: kill poll_does_not_wait() sock_poll_wait: kill the no longer necessary barrier after poll_wait() io_uring_poll: kill the no longer necessary barrier after poll_wait() poll_wait: kill the obsolete wait_address check poll_wait: add mb() to fix theoretical race between waitqueue_active() and .poll() afs: Fix merge preference rule failure condition netfs: Fix read-retry for fs with no ->prepare_read() netfs: Fix kernel async DIO fs: kill MNT_ONRB iomap: avoid avoid truncating 64-bit offset to 32 bits afs: Fix the maximum cell name length fuse: Set *nbytesp=0 in fuse_get_user_pages on allocation failure fuse: fix direct io folio offset and length calculation
show more ...
|
1623bc27 | 10-Jan-2025 |
Christian Brauner <brauner@kernel.org> |
Merge branch 'vfs-6.14.poll' into vfs.fixes
Bring in the fixes for __pollwait() and waitqueue_active() interactions.
Signed-off-by: Christian Brauner <brauner@kernel.org> |
4e15fa83 | 07-Jan-2025 |
Oleg Nesterov <oleg@redhat.com> |
io_uring_poll: kill the no longer necessary barrier after poll_wait()
Now that poll_wait() provides a full barrier we can remove smp_rmb() from io_uring_poll().
In fact I don't think smp_rmb() was
io_uring_poll: kill the no longer necessary barrier after poll_wait()
Now that poll_wait() provides a full barrier we can remove smp_rmb() from io_uring_poll().
In fact I don't think smp_rmb() was correct, it can't serialize LOADs and STOREs.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20250107162730.GA18940@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
show more ...
|
643e2e25 | 09-Jan-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba: "A few more fixes.
Besides the one-liners in Btrfs there's fix to th
Merge tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba: "A few more fixes.
Besides the one-liners in Btrfs there's fix to the io_uring and encoded read integration (added in this development cycle). The update to io_uring provides more space for the ongoing command that is then used in Btrfs to handle some cases.
- io_uring and encoded read: - provide stable storage for io_uring command data - make a copy of encoded read ioctl call, reuse that in case the call would block and will be called again
- properly initialize zlib context for hardware compression on s390
- fix max extent size calculation on filesystems with non-zoned devices
- fix crash in scrub on crafted image due to invalid extent tree"
* tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zlib: fix avail_in bytes for s390 zlib HW compression path btrfs: zoned: calculate max_extent_size properly on non-zoned setup btrfs: avoid NULL pointer dereference if no valid extent tree btrfs: don't read from userspace twice in btrfs_uring_encoded_read() io_uring: add io_uring_cmd_get_async_data helper io_uring/cmd: add per-op data to struct io_uring_cmd_data io_uring/cmd: rename struct uring_cache to io_uring_cmd_data
show more ...
|
c9a40292 | 08-Jan-2025 |
Jens Axboe <axboe@kernel.dk> |
io_uring/eventfd: ensure io_eventfd_signal() defers another RCU period
io_eventfd_do_signal() is invoked from an RCU callback, but when dropping the reference to the io_ev_fd, it calls io_eventfd_fr
io_uring/eventfd: ensure io_eventfd_signal() defers another RCU period
io_eventfd_do_signal() is invoked from an RCU callback, but when dropping the reference to the io_ev_fd, it calls io_eventfd_free() directly if the refcount drops to zero. This isn't correct, as any potential freeing of the io_ev_fd should be deferred another RCU grace period.
Just call io_eventfd_put() rather than open-code the dec-and-test and free, which will correctly defer it another RCU grace period.
Fixes: 21a091b970cd ("io_uring: signal registered eventfd to process deferred task work") Reported-by: Jann Horn <jannh@google.com> Cc: stable@vger.kernel.org Tested-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Li Zetao<lizetao1@huawei.com> Reviewed-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
60495b08 | 07-Jan-2025 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: silence false positive warnings
If we kill a ring and then immediately exit the task, we'll get cancellattion running by the task and a kthread in io_ring_exit_work. For DEFER_TASKRUN, we
io_uring: silence false positive warnings
If we kill a ring and then immediately exit the task, we'll get cancellattion running by the task and a kthread in io_ring_exit_work. For DEFER_TASKRUN, we do want to limit it to only one entity executing it, however it's currently not an issue as it's protected by uring_lock.
Silence lockdep assertions for now, we'll return to it later.
Reported-by: syzbot+1bcb75613069ad4957fc@syzkaller.appspotmail.com Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/7e5f68281acb0f081f65fde435833c68a3b7e02f.1736257837.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
3347fa65 | 03-Jan-2025 |
Jens Axboe <axboe@kernel.dk> |
io_uring/cmd: add per-op data to struct io_uring_cmd_data
In case an op handler for ->uring_cmd() needs stable storage for user data, it can allocate io_uring_cmd_data->op_data and use it for the du
io_uring/cmd: add per-op data to struct io_uring_cmd_data
In case an op handler for ->uring_cmd() needs stable storage for user data, it can allocate io_uring_cmd_data->op_data and use it for the duration of the request. When the request gets cleaned up, uring_cmd will free it automatically.
Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
dadf03cf | 03-Jan-2025 |
Jens Axboe <axboe@kernel.dk> |
io_uring/cmd: rename struct uring_cache to io_uring_cmd_data
In preparation for making this more generically available for ->uring_cmd() usage that needs stable command data, rename it and move it t
io_uring/cmd: rename struct uring_cache to io_uring_cmd_data
In preparation for making this more generically available for ->uring_cmd() usage that needs stable command data, rename it and move it to io_uring/cmd.h instead.
Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: David Sterba <dsterba@suse.com>
show more ...
|
c83c8462 | 04-Jan-2025 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring/timeout: fix multishot updates
After update only the first shot of a multishot timeout request adheres to the new timeout value while all subsequent retries continue to use the old value. D
io_uring/timeout: fix multishot updates
After update only the first shot of a multishot timeout request adheres to the new timeout value while all subsequent retries continue to use the old value. Don't forget to update the timeout stored in struct io_timeout_data.
Cc: stable@vger.kernel.org Fixes: ea97f6c8558e8 ("io_uring: add support for multishot timeouts") Reported-by: Christian Mazakas <christian.mazakas@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e6516c3304eb654ec234cfa65c88a9579861e597.1736015288.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
ed123c94 | 03-Jan-2025 |
Jens Axboe <axboe@kernel.dk> |
io_uring/kbuf: use pre-committed buffer address for non-pollable file
For non-pollable files, buffer ring consumption will commit upfront. This is fine, but io_ring_buffer_select() will return the a
io_uring/kbuf: use pre-committed buffer address for non-pollable file
For non-pollable files, buffer ring consumption will commit upfront. This is fine, but io_ring_buffer_select() will return the address of the buffer after having committed it. For incrementally consumed buffers, this is incorrect as it will modify the buffer address.
Store the pre-committed value and return that. If that isn't done, then the initial part of the buffer is not used and the application will correctly assume the content arrived at the start of the userspace buffer, but the kernel will have put it later in the buffer. Or it can cause a spurious -EFAULT returned in the CQE, depending on the buffer size. As bounds are suitably checked for doing the actual IO, no adverse side effects are possible - it's just a data misplacement within the existing buffer.
Reported-by: Gwendal Fernet <gwendalfernet@gmail.com> Cc: stable@vger.kernel.org Fixes: ae98dbf43d75 ("io_uring/kbuf: add support for incremental buffer consumption") Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
c6e60a0a | 03-Jan-2025 |
Jens Axboe <axboe@kernel.dk> |
io_uring/net: always initialize kmsg->msg.msg_inq upfront
syzbot reports that ->msg_inq may get used uinitialized from the following path:
BUG: KMSAN: uninit-value in io_recv_buf_select io_uring/ne
io_uring/net: always initialize kmsg->msg.msg_inq upfront
syzbot reports that ->msg_inq may get used uinitialized from the following path:
BUG: KMSAN: uninit-value in io_recv_buf_select io_uring/net.c:1094 [inline] BUG: KMSAN: uninit-value in io_recv+0x930/0x1f90 io_uring/net.c:1158 io_recv_buf_select io_uring/net.c:1094 [inline] io_recv+0x930/0x1f90 io_uring/net.c:1158 io_issue_sqe+0x420/0x2130 io_uring/io_uring.c:1740 io_queue_sqe io_uring/io_uring.c:1950 [inline] io_req_task_submit+0xfa/0x1d0 io_uring/io_uring.c:1374 io_handle_tw_list+0x55f/0x5c0 io_uring/io_uring.c:1057 tctx_task_work_run+0x109/0x3e0 io_uring/io_uring.c:1121 tctx_task_work+0x6d/0xc0 io_uring/io_uring.c:1139 task_work_run+0x268/0x310 kernel/task_work.c:239 io_run_task_work+0x43a/0x4a0 io_uring/io_uring.h:343 io_cqring_wait io_uring/io_uring.c:2527 [inline] __do_sys_io_uring_enter io_uring/io_uring.c:3439 [inline] __se_sys_io_uring_enter+0x204f/0x4ce0 io_uring/io_uring.c:3330 __x64_sys_io_uring_enter+0x11f/0x1a0 io_uring/io_uring.c:3330 x64_sys_call+0xce5/0x3c30 arch/x86/include/generated/asm/syscalls_64.h:427 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
and it is correct, as it's never initialized upfront. Hence the first submission can end up using it uninitialized, if the recv wasn't successful and the networking stack didn't honor ->msg_get_inq being set and filling in the output value of ->msg_inq as requested.
Set it to 0 upfront when it's allocated, just to silence this KMSAN warning. There's no side effect of using it uninitialized, it'll just potentially cause the next receive to use a recv value hint that's not accurate.
Fixes: c6f32c7d9e09 ("io_uring/net: get rid of ->prep_async() for receive side") Reported-by: syzbot+068ff190354d2f74892f@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
a9c83a0a | 30-Dec-2024 |
Jens Axboe <axboe@kernel.dk> |
io_uring/timeout: flush timeouts outside of the timeout lock
syzbot reports that a recent fix causes nesting issues between the (now) raw timeoutlock and the eventfd locking:
======================
io_uring/timeout: flush timeouts outside of the timeout lock
syzbot reports that a recent fix causes nesting issues between the (now) raw timeoutlock and the eventfd locking:
============================= [ BUG: Invalid wait context ] 6.13.0-rc4-00080-g9828a4c0901f #29 Not tainted ----------------------------- kworker/u32:0/68094 is trying to lock: ffff000014d7a520 (&ctx->wqh#2){..-.}-{3:3}, at: eventfd_signal_mask+0x64/0x180 other info that might help us debug this: context-{5:5} 6 locks held by kworker/u32:0/68094: #0: ffff0000c1d98148 ((wq_completion)iou_exit){+.+.}-{0:0}, at: process_one_work+0x4e8/0xfc0 #1: ffff80008d927c78 ((work_completion)(&ctx->exit_work)){+.+.}-{0:0}, at: process_one_work+0x53c/0xfc0 #2: ffff0000c59bc3d8 (&ctx->completion_lock){+.+.}-{3:3}, at: io_kill_timeouts+0x40/0x180 #3: ffff0000c59bc358 (&ctx->timeout_lock){-.-.}-{2:2}, at: io_kill_timeouts+0x48/0x180 #4: ffff800085127aa0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x8/0x38 #5: ffff800085127aa0 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire+0x8/0x38 stack backtrace: CPU: 7 UID: 0 PID: 68094 Comm: kworker/u32:0 Not tainted 6.13.0-rc4-00080-g9828a4c0901f #29 Hardware name: linux,dummy-virt (DT) Workqueue: iou_exit io_ring_exit_work Call trace: show_stack+0x1c/0x30 (C) __dump_stack+0x24/0x30 dump_stack_lvl+0x60/0x80 dump_stack+0x14/0x20 __lock_acquire+0x19f8/0x60c8 lock_acquire+0x1a4/0x540 _raw_spin_lock_irqsave+0x90/0xd0 eventfd_signal_mask+0x64/0x180 io_eventfd_signal+0x64/0x108 io_req_local_work_add+0x294/0x430 __io_req_task_work_add+0x1c0/0x270 io_kill_timeout+0x1f0/0x288 io_kill_timeouts+0xd4/0x180 io_uring_try_cancel_requests+0x2e8/0x388 io_ring_exit_work+0x150/0x550 process_one_work+0x5e8/0xfc0 worker_thread+0x7ec/0xc80 kthread+0x24c/0x300 ret_from_fork+0x10/0x20
because after the preempt-rt fix for the timeout lock nesting inside the io-wq lock, we now have the eventfd spinlock nesting inside the raw timeout spinlock.
Rather than play whack-a-mole with other nesting on the timeout lock, split the deletion and killing of timeouts so queueing the task_work for the timeout cancelations can get done outside of the timeout lock.
Reported-by: syzbot+b1fc199a40b65d601b65@syzkaller.appspotmail.com Fixes: 020b40f35624 ("io_uring: make ctx->timeout_lock a raw spinlock") Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
38fc96a5 | 28-Dec-2024 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring/rw: fix downgraded mshot read
The io-wq path can downgrade a multishot request to oneshot mode, however io_read_mshot() doesn't handle that and would still post multiple CQEs. That's not al
io_uring/rw: fix downgraded mshot read
The io-wq path can downgrade a multishot request to oneshot mode, however io_read_mshot() doesn't handle that and would still post multiple CQEs. That's not allowed, because io_req_post_cqe() requires stricter context requirements.
The described can only happen with pollable files that don't support FMODE_NOWAIT, which is an odd combination, so if even allowed it should be fairly rare.
Cc: stable@vger.kernel.org Reported-by: chase xd <sl1589472800@gmail.com> Fixes: bee1d5becdf5b ("io_uring: disable io-wq execution of multishot NOWAIT requests") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/c5c8c4a50a882fd581257b81bf52eee260ac29fd.1735407848.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
e33ac68e | 26-Dec-2024 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring/sqpoll: fix sqpoll error handling races
BUG: KASAN: slab-use-after-free in __lock_acquire+0x370b/0x4a10 kernel/locking/lockdep.c:5089 Call Trace: <TASK> ... _raw_spin_lock_irqsave+0x3d/0x60
io_uring/sqpoll: fix sqpoll error handling races
BUG: KASAN: slab-use-after-free in __lock_acquire+0x370b/0x4a10 kernel/locking/lockdep.c:5089 Call Trace: <TASK> ... _raw_spin_lock_irqsave+0x3d/0x60 kernel/locking/spinlock.c:162 class_raw_spinlock_irqsave_constructor include/linux/spinlock.h:551 [inline] try_to_wake_up+0xb5/0x23c0 kernel/sched/core.c:4205 io_sq_thread_park+0xac/0xe0 io_uring/sqpoll.c:55 io_sq_thread_finish+0x6b/0x310 io_uring/sqpoll.c:96 io_sq_offload_create+0x162/0x11d0 io_uring/sqpoll.c:497 io_uring_create io_uring/io_uring.c:3724 [inline] io_uring_setup+0x1728/0x3230 io_uring/io_uring.c:3806 ...
Kun Hu reports that the SQPOLL creating error path has UAF, which happens if io_uring_alloc_task_context() fails and then io_sq_thread() manages to run and complete before the rest of error handling code, which means io_sq_thread_finish() is looking at already killed task.
Note that this is mostly theoretical, requiring fault injection on the allocation side to trigger in practice.
Cc: stable@vger.kernel.org Reported-by: Kun Hu <huk23@m.fudan.edu.cn> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0f2f1aa5729332612bd01fe0f2f385fd1f06ce7c.1735231717.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
dbd2ca93 | 19-Dec-2024 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: check if iowq is killed before queuing
task work can be executed after the task has gone through io_uring termination, whether it's the final task_work run or the fallback path. In this ca
io_uring: check if iowq is killed before queuing
task work can be executed after the task has gone through io_uring termination, whether it's the final task_work run or the fallback path. In this case, task work will find ->io_wq being already killed and null'ed, which is a problem if it then tries to forward the request to io_queue_iowq(). Make io_queue_iowq() fail requests in this case.
Note that it also checks PF_KTHREAD, because the user can first close a DEFER_TASKRUN ring and shortly after kill the task, in which case ->iowq check would race.
Cc: stable@vger.kernel.org Fixes: 50c52250e2d74 ("block: implement async io_uring discard cmd") Fixes: 773af69121ecc ("io_uring: always reissue from task_work context") Reported-by: Will <willsroot@protonmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/63312b4a2c2bb67ad67b857d17a300e1d3b078e8.1734637909.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
c261e4f1 | 19-Dec-2024 |
Jens Axboe <axboe@kernel.dk> |
io_uring/register: limit ring resizing to DEFER_TASKRUN
With DEFER_TASKRUN, we know the ring can't be both waited upon and resized at the same time. This is important for CQ resizing. Allowing SQ ri
io_uring/register: limit ring resizing to DEFER_TASKRUN
With DEFER_TASKRUN, we know the ring can't be both waited upon and resized at the same time. This is important for CQ resizing. Allowing SQ ring resizing is more trivial, but isn't the interesting use case. Hence limit ring resizing in general to DEFER_TASKRUN only for now. This isn't a huge problem as CQ ring resizing is generally the most useful on networking type of workloads where it can be hard to size the ring appropriately upfront, and those should be using DEFER_TASKRUN for better performance.
Fixes: 79cfe9e59c2a ("io_uring/register: add IORING_REGISTER_RESIZE_RINGS") Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|
12d90811 | 18-Dec-2024 |
Jann Horn <jannh@google.com> |
io_uring: Fix registered ring file refcount leak
Currently, io_uring_unreg_ringfd() (which cleans up registered rings) is only called on exit, but __io_uring_free (which frees the tctx in which the
io_uring: Fix registered ring file refcount leak
Currently, io_uring_unreg_ringfd() (which cleans up registered rings) is only called on exit, but __io_uring_free (which frees the tctx in which the registered ring pointers are stored) is also called on execve (via begin_new_exec -> io_uring_task_cancel -> __io_uring_cancel -> io_uring_cancel_generic -> __io_uring_free).
This means: A process going through execve while having registered rings will leak references to the rings' `struct file`.
Fix it by zapping registered rings on execve(). This is implemented by moving the io_uring_unreg_ringfd() from io_uring_files_cancel() into its callee __io_uring_cancel(), which is called from io_uring_task_cancel() on execve.
This could probably be exploited *on 32-bit kernels* by leaking 2^32 references to the same ring, because the file refcount is stored in a pointer-sized field and get_file() doesn't have protection against refcount overflow, just a WARN_ONCE(); but on 64-bit it should have no impact beyond a memory leak.
Cc: stable@vger.kernel.org Fixes: e7a6c00dc77a ("io_uring: add support for registering ring file descriptors") Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20241218-uring-reg-ring-cleanup-v1-1-8f63e999045b@google.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
show more ...
|