| #
591beb0e |
| 10-Feb-2026 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'io_uring-bpf-restrictions.4-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring bpf filters from Jens Axboe: "This adds support for both cBPF filters for io_uring, as well as task inherited restrictions and filters.
seccomp and io_uring don't play along nicely, as most of the interesting data to filter on resides somewhat out-of-band, in the submission queue ring.
As a result, things like containers and systemd that apply seccomp filters can't filter io_uring operations.
That leaves them with just one choice if filtering is critical - filter the actual io_uring_setup(2) system call to simply disallow io_uring. That's rather unfortunate, and io_uring has been limited because of it.
io_uring already has some filtering support. It requires the ring to be setup in a disabled state, and then a filter set can be applied. This filter set is completely bi-modal - an opcode is either enabled or it's not. Once a filter set is registered, the ring can be enabled. This is very restrictive, and it's not useful at all to systemd or containers which really want both broader and more specific control.
This first adds support for cBPF filters for opcodes, which enables tighter control over what exactly a specific opcode may do. As examples, specific support is added for IORING_OP_OPENAT/OPENAT2, allowing filtering on resolve flags. And another example is added for IORING_OP_SOCKET, allowing filtering on domain/type/protocol. These are both common use cases. cBPF was chosen rather than eBPF, because the latter is often restricted in containers as well.
These filters are run after the init phase of the request, which allows filters to even dip into data that is passed in structs in user memory, as the init side of requests makes that data stable by bringing it into the kernel. This allows filtering without needing to copy this data twice, or have filters etc. know about the exact layout of the user data. The filters are passed the already copied and sanitized data.
On top of that support is added for per-task filters, meaning that any ring created with a task that has a per-task filter will get those filters applied when it's created. These filters are inherited across fork as well. Once a filter has been registered, any further added filters may only further restrict what operations are permitted.
Filters cannot change the return value of an operation, they can only permit or deny it based on the contents"
* tag 'io_uring-bpf-restrictions.4-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring: allow registration of per-task restrictions
  io_uring: add task fork hook
  io_uring/bpf_filter: add ref counts to struct io_bpf_filter
  io_uring/bpf_filter: cache lookup table in ctx->bpf_filters
  io_uring/bpf_filter: allow filtering on contents of struct open_how
  io_uring/net: allow filtering on IORING_OP_SOCKET data
  io_uring: add support for BPF filtering for opcode restrictions
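As a rough illustration of the allow/deny semantics described above (a minimal sketch, not taken from the series): a cBPF filter attached to an opcode returns 1 to allow or 0 to deny, so the smallest possible program reproduces what the old bitmap restrictions could express, while longer programs can inspect the per-request context. The registration structures themselves appear in the d42eb05e entry further down.

#include <linux/filter.h>

/* Hypothetical standalone example: unconditionally deny whatever opcode
 * this filter gets attached to; the request then fails with -EACCES. */
static const struct sock_filter deny_opcode[] = {
        BPF_STMT(BPF_RET | BPF_K, 0),
};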
|
| #
f5d4feed |
| 10-Feb-2026 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'for-7.0/io_uring-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring updates from Jens Axboe:
- Clean up the IORING_SETUP_R_DISABLED and submitter task checking, mostly just in preparation for relaxing the locking for SINGLE_ISSUER in the future.
- Improve IOPOLL by using a doubly linked list to manage completions.
Previously it was singly linked, which meant that to complete request N, requests 0..N-1 in the chain had to have completed first. With a doubly linked list, requests can be completed in whatever order they finish, rather than needing to wait for a consecutive range to be available. This reduces latencies.
- Improve the restriction setup and checking. Mostly in preparation for adding further features on top of that. Coming in a separate pull request.
- Split out task_work and wait handling into separate files. These are mostly nicely abstracted already, but still remained in the io_uring.c file which is on the larger side.
- Use GFP_KERNEL_ACCOUNT in a few more spots, where appropriate.
- Ensure even the idle io-wq worker exits if a task no longer has any rings open.
- Add support for a non-circular submission queue.
By default, the SQ ring keeps moving around, even if only a few entries are used for each submission. This can be wasteful in terms of cachelines.
If IORING_SETUP_SQ_REWIND is set when the ring is created, each submission will start at offset 0 instead of where the previous submission left off (a minimal setup sketch follows this list).
- Various little cleanups
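A minimal setup sketch for the non-circular SQ bullet above, assuming only that the IORING_SETUP_SQ_REWIND flag from this pull is exposed by the uapi/liburing headers; everything else is stock liburing:

#include <liburing.h>

/* Create a ring whose SQ entries are always filled from index 0 for each
 * submission batch, rather than continuing where the last batch ended. */
static int setup_rewind_ring(struct io_uring *ring, unsigned entries)
{
        return io_uring_queue_init(ring, entries, IORING_SETUP_SQ_REWIND);
}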
* tag 'for-7.0/io_uring-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (30 commits)
  io_uring/kbuf: fix memory leak if io_buffer_add_list fails
  io_uring: Add SPDX id lines to remaining source files
  io_uring: allow io-wq workers to exit when unused
  io_uring/io-wq: add exit-on-idle state
  io_uring/net: don't continue send bundle if poll was required for retry
  io_uring/rsrc: use GFP_KERNEL_ACCOUNT consistently
  io_uring/futex: use GFP_KERNEL_ACCOUNT for futex data allocation
  io_uring/io-wq: handle !sysctl_hung_task_timeout_secs
  io_uring: fix bad indentation for setup flags if statement
  io_uring/rsrc: take unsigned index in io_rsrc_node_lookup()
  io_uring: introduce non-circular SQ
  io_uring: split out CQ waiting code into wait.c
  io_uring: split out task work code into tw.c
  io_uring/io-wq: don't trigger hung task for syzbot craziness
  io_uring: add IO_URING_EXIT_WAIT_MAX definition
  io_uring/sync: validate passed in offset
  io_uring/eventfd: remove unused ctx->evfd_last_cq_tail member
  io_uring/timeout: annotate data race in io_flush_timeouts()
  io_uring/uring_cmd: explicitly disallow cancelations for IOPOLL
  io_uring: fix IOPOLL with passthrough I/O
  ...
|
|
Revision tags: v6.19, v6.19-rc8, v6.19-rc7, v6.19-rc6 |
|
| #
d42eb05e |
| 15-Jan-2026 |
Jens Axboe <axboe@kernel.dk> |
io_uring: add support for BPF filtering for opcode restrictions
Add support for loading classic BPF programs with io_uring to provide fine-grained filtering of SQE operations. Unlike IORING_REGISTER_RESTRICTIONS, which only allows bitmap-based allow/deny of opcodes, BPF filters can inspect request attributes and make dynamic decisions.
The filter is registered via IORING_REGISTER_BPF_FILTER with a struct io_uring_bpf:
struct io_uring_bpf_filter {
        __u32   opcode;         /* io_uring opcode to filter */
        __u32   flags;
        __u32   filter_len;     /* number of BPF instructions */
        __u32   resv;
        __u64   filter_ptr;     /* pointer to BPF filter */
        __u64   resv2[5];
};
enum {
        IO_URING_BPF_CMD_FILTER = 1,
};
struct io_uring_bpf {
        __u16   cmd_type;       /* IO_URING_BPF_* values */
        __u16   cmd_flags;      /* none so far */
        __u32   resv;
        union {
                struct io_uring_bpf_filter filter;
        };
};
and the filters get supplied a struct io_uring_bpf_ctx:
struct io_uring_bpf_ctx {
        __u64   user_data;
        __u8    opcode;
        __u8    sqe_flags;
        __u8    pdu_size;
        __u8    pad[5];
};
where it's possible to filter on opcode and sqe_flags, with pdu_size indicating how much extra data is being passed in beyond the pad field. This will be used for more fine-grained filtering inside an opcode. An example of that for sockets is in one of the following patches. Anything the opcode supports can end up in this struct, populated by the opcode itself, and hence can be filtered on.
Filters have the following semantics:

- Return 1 to allow the request
- Return 0 to deny the request with -EACCES
- Multiple filters can be stacked per opcode. All filters must return 1 for the opcode to be allowed.
- Filters are evaluated in registration order (most recent first)
The implementation uses classic BPF (cBPF) rather than eBPF, as that's required for containers, and since cBPF can be used by any user in the system.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
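A userspace registration sketch based on the uapi above. The struct definitions and the IORING_REGISTER_BPF_FILTER name come from this commit message; the io_uring_register() calling convention (nr_args == 1), the availability of these definitions in <linux/io_uring.h>, and the idea that cBPF absolute loads index into struct io_uring_bpf_ctx (as they do into struct seccomp_data for seccomp) are assumptions. The policy shown is arbitrary:

#include <linux/filter.h>
#include <linux/io_uring.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Attach a filter to IORING_OP_OPENAT2 that denies requests carrying
 * IOSQE_ASYNC and allows everything else (example policy only). */
static int register_openat2_filter(int ring_fd)
{
        struct sock_filter insns[] = {
                /* A = sqe_flags byte from struct io_uring_bpf_ctx */
                BPF_STMT(BPF_LD | BPF_B | BPF_ABS,
                         offsetof(struct io_uring_bpf_ctx, sqe_flags)),
                /* if (A & IOSQE_ASYNC) -> deny, else -> allow */
                BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, IOSQE_ASYNC, 0, 1),
                BPF_STMT(BPF_RET | BPF_K, 0),   /* deny, request gets -EACCES */
                BPF_STMT(BPF_RET | BPF_K, 1),   /* allow */
        };
        struct io_uring_bpf reg = {
                .cmd_type = IO_URING_BPF_CMD_FILTER,
                .filter = {
                        .opcode     = IORING_OP_OPENAT2,
                        .filter_len = sizeof(insns) / sizeof(insns[0]),
                        .filter_ptr = (uint64_t)(uintptr_t)insns,
                },
        };

        return syscall(__NR_io_uring_register, ring_fd,
                       IORING_REGISTER_BPF_FILTER, &reg, 1);
}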
|
| #
0105b056 |
| 22-Jan-2026 |
Jens Axboe <axboe@kernel.dk> |
io_uring: split out CQ waiting code into wait.c
Move the completion queue waiting and scheduling code out of io_uring.c into a dedicated wait.c file. This further moves code out of the main io_uring C and header files and into a new topical file.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| #
7642e668 |
| 22-Jan-2026 |
Jens Axboe <axboe@kernel.dk> |
io_uring: split out task work code into tw.c
Move the task work handling code out of io_uring.c into a new tw.c file. This includes the local work, normal work, and fallback work handling infrastructure.
The associated tw.h header contains io_should_terminate_tw() as a static inline helper, along with the necessary function declarations.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Revision tags: v6.19-rc5, v6.19-rc4, v6.19-rc3, v6.19-rc2, v6.19-rc1 |
|
| #
a4a508df |
| 13-Dec-2025 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v6.18' into next
Sync up with the mainline to bring in the latest APIs.
|
|
Revision tags: v6.18, v6.18-rc7, v6.18-rc6, v6.18-rc5, v6.18-rc4 |
|
| #
cb9f145f |
| 01-Nov-2025 |
Rob Clark <robin.clark@oss.qualcomm.com> |
Merge remote-tracking branch 'drm/drm-next' into msm-next-robclark
Back-merge drm-next to get caught up.
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
|
|
Revision tags: v6.18-rc3, v6.18-rc2 |
|
| #
82ee5025 |
| 14-Oct-2025 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
Merge drm/drm-next into drm-xe-next
Backmerging to bring in 6.18-rc1.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
| #
2acee98f |
| 14-Oct-2025 |
Jani Nikula <jani.nikula@intel.com> |
Merge drm/drm-next into drm-intel-next
Sync to v6.18-rc1.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
| #
9b966ae4 |
| 13-Oct-2025 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Updating drm-misc-next to the state of v6.18-rc1.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
|
Revision tags: v6.18-rc1, v6.17, v6.17-rc7 |
|
| #
f088104d |
| 16-Sep-2025 |
Joonas Lahtinen <joonas.lahtinen@linux.intel.com> |
Merge drm/drm-next into drm-intel-gt-next
Backmerge in order to get the commit:
048832a3f400 ("drm/i915: Refactor shmem_pwrite() to use kiocb and write_iter")
To drm-intel-gt-next as there are followup fixes to be applied.
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
| #
2ace5271 |
| 21-Nov-2025 |
Peter Zijlstra <peterz@infradead.org> |
Merge branch 'objtool/core'
Bring in the UDB and objtool data annotations to avoid conflicts while further extending the bug exceptions.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
|
|
Revision tags: v6.17-rc6, v6.17-rc5, v6.17-rc4, v6.17-rc3, v6.17-rc2, v6.17-rc1 |
|
| #
a53d0cf7 |
| 05-Aug-2025 |
Ingo Molnar <mingo@kernel.org> |
Merge commit 'linus' into core/bugs, to resolve conflicts
Resolve conflicts with this commit that was developed in parallel during the merge window:
8c8efa93db68 ("x86/bug: Add ARCH_WARN_ASM macro for BUG/WARN asm code sharing with Rust")
Conflicts:
  arch/riscv/include/asm/bug.h
  arch/x86/include/asm/bug.h
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| #
f39b6c46 |
| 18-Nov-2025 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v6.18-rc6' into for-linus
Sync up with the mainline to bring in definition of INPUT_PROP_HAPTIC_TOUCHPAD.
|
| #
4f38da1f |
| 13-Oct-2025 |
Mark Brown <broonie@kernel.org> |
spi: Merge up v6.18-rc1
Ensure my CI has a sensible baseline.
|
| #
ec2e0fb0 |
| 16-Oct-2025 |
Takashi Iwai <tiwai@suse.de> |
Merge tag 'asoc-fix-v6.18-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.18
A moderately large collection of driver specific fixes, plus a few new quirks and device IDs. The NAU8821 changes are a little large, but more mechanical than complex.
|
| #
48a71076 |
| 14-Oct-2025 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-fixes into drm-misc-fixes
Updating drm-misc-fixes to the state of v6.18-rc1.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
| #
8b87f67b |
| 08-Oct-2025 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 6.18 merge window.
|
| #
4b051897 |
| 21-Aug-2025 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v6.17-rc2' into HEAD
Sync up with mainline to bring in changes to include/linux/sprintf.h
|
| #
5832d264 |
| 02-Oct-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'for-6.18/io_uring-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring updates from Jens Axboe:
- Store ring provided buffers locally for the users, rather than stuff them into struct io_kiocb.
These types of buffers must always be fully consumed or recycled in the current context, and leaving them in struct io_kiocb is hence not a good idea, as that struct has a vastly different lifetime.
Basically just an architecture cleanup that can help prevent issues with ring provided buffers in the future.
- Support for mixed CQE sizes in the same ring.
Before this change, a CQ ring either used the default 16b CQEs, or it was setup with 32b CQE using IORING_SETUP_CQE32. For use cases where a few 32b CQEs were needed, this caused everything else to use big CQEs. This is wasteful both in terms of memory usage, but also memory bandwidth for the posted CQEs.
With IORING_SETUP_CQE_MIXED, applications may use request types that post both normal 16b and big 32b CQEs on the same ring (a setup sketch follows this list).
- Add helpers for async data management, to make it harder for opcode handlers to mess it up.
- Add support for multishot for uring_cmd, which ublk can use. This helps improve efficiency, by providing a persistent request type that can trigger multiple CQEs.
- Add initial support for ring feature querying.
We had basic support for probe operations, but the API isn't great. Rather than expand that, add support for QUERY which is easily expandable and can cover a lot more cases than the existing probe support. This will help applications get a better idea of what operations are supported on a given host.
- zcrx improvements from Pavel:
    - Improve refill entry alignment for better caching
    - Various cleanups, especially around deduplicating normal memory vs dmabuf setup
    - Generalisation of the niov size (Patch 12). It's still hard coded to PAGE_SIZE on init, but will let the user specify the rx buffer length on setup.
    - Syscall / synchronous buffer return. It'll be used as a slow fallback path for returning buffers when the refill queue is full. Useful for tolerating slight queue size misconfiguration or inconsistent load.
    - Accounting more memory to cgroups
    - Additional independent cleanups that will also be useful for multi-area support
- Various fixes and cleanups
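A setup sketch for the mixed CQE sizes bullet above, assuming the IORING_SETUP_CQE_MIXED flag from this pull is available in the liburing/uapi headers; how an individual CQE advertises its size is not shown, as that detail isn't spelled out here:

#include <liburing.h>

/* Create a ring where most CQEs stay at the default 16b size but request
 * types that need it may post 32b CQEs on the same CQ ring. */
static int setup_mixed_cqe_ring(struct io_uring *ring, unsigned entries)
{
        return io_uring_queue_init(ring, entries, IORING_SETUP_CQE_MIXED);
}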
* tag 'for-6.18/io_uring-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (68 commits)
  io_uring/cmd: drop unused res2 param from io_uring_cmd_done()
  io_uring: fix nvme's 32b cqes on mixed cq
  io_uring/query: cap number of queries
  io_uring/query: prevent infinite loops
  io_uring/zcrx: account niov arrays to cgroup
  io_uring/zcrx: allow synchronous buffer return
  io_uring/zcrx: introduce io_parse_rqe()
  io_uring/zcrx: don't adjust free cache space
  io_uring/zcrx: use guards for the refill lock
  io_uring/zcrx: reduce netmem scope in refill
  io_uring/zcrx: protect netdev with pp_lock
  io_uring/zcrx: rename dma lock
  io_uring/zcrx: make niov size variable
  io_uring/zcrx: set sgt for umem area
  io_uring/zcrx: remove dmabuf_offset
  io_uring/zcrx: deduplicate area mapping
  io_uring/zcrx: pass ifq to io_zcrx_alloc_fallback()
  io_uring/zcrx: check all niovs filled with dma addresses
  io_uring/zcrx: move area reg checks into io_import_area
  io_uring/zcrx: don't pass slot to io_zcrx_create_area
  ...
|
| #
b4d90dbc |
| 15-Sep-2025 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next-fixes
Backmerging to drm-misc-next-fixes to get features and fixes from v6.17-rc6.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
| #
702fdf35 |
| 10-Sep-2025 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-next
Catching up with some display dependencies.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
| #
c265ae75 |
| 08-Sep-2025 |
Pavel Begunkov <asml.silence@gmail.com> |
io_uring: introduce io_uring querying
There are many parameters users might want to query about io_uring like available request types or the ring sizes. This patch introduces an interface for such slow path queries.
It was written with several requirements in mind:

- Can be used with or without an io_uring instance. Asking for supported setup flags before creating an instance, as well as querying info about an already created ring, are valid use cases.
- Should be moderately fast. For example, users might use it to periodically retrieve ring attributes at runtime. As a consequence, it should be able to query multiple attributes in a single syscall.
- Backward and forward compatible.
- Should be reasonably easy to use.
- Reduce the kernel code size for introducing new query types.
It's implemented as a new registration opcode, IORING_REGISTER_QUERY. The user passes one or more query structures linked together, each represented by struct io_uring_query_hdr. The header stores common control fields needed for processing and points to query type specific information.
The header contains:

- The query type
- The result field, which on return contains the error code for the query
- A pointer to the query type specific information
- The size of the query structure. The kernel will only populate up to this size, which helps with backward compatibility. The kernel can also reduce the size, so if the current kernel is older than the interface the user tries to use, it'll only get the supported bits.
- A next_entry field, used to chain multiple queries
Apart from common registration syscall failures, it can only immediately return an error code when the headers are incorrect or any of the other addresses are invalid. That usually means that userspace isn't using the API correctly and should be corrected. All query type specific errors are returned in the header's result field.
As an example, the patch adds a single query type for now, IO_URING_QUERY_OPCODES, which tells what register / request / etc. opcodes are supported, but there are concrete plans to extend it.
Note: there is a request probing interface via IORING_REGISTER_PROBE, but it's a mess. It requires the user to create a ring first, it only works for requests, and requires dynamic allocations.
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
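A rough sketch of issuing a single query, following the description above. IORING_REGISTER_QUERY and IO_URING_QUERY_OPCODES are named in the commit message; the header field names below are illustrative stand-ins for the real struct io_uring_query_hdr (which lives in the uapi header), and the nr_args == 1 calling convention is an assumption:

#include <linux/io_uring.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Illustrative header layout mirroring the description: query type, result,
 * pointer + size of the type specific payload, and a chain pointer. */
struct query_hdr_sketch {
        uint64_t next_entry;    /* 0 terminates the chain */
        uint64_t query_data;    /* pointer to the type specific payload */
        uint32_t query_op;      /* e.g. IO_URING_QUERY_OPCODES */
        uint32_t size;          /* payload size; kernel fills up to this much */
        int32_t  result;        /* per-query error code on return */
        uint32_t pad;
};

static int run_query(int ring_fd, uint32_t op, void *payload, uint32_t size)
{
        struct query_hdr_sketch hdr = {
                .query_op   = op,
                .query_data = (uint64_t)(uintptr_t)payload,
                .size       = size,
        };
        int ret = syscall(__NR_io_uring_register, ring_fd,
                          IORING_REGISTER_QUERY, &hdr, 1);

        return ret < 0 ? ret : hdr.result;
}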
|
| #
ca994e89 |
| 12-Aug-2025 |
Lucas De Marchi <lucas.demarchi@intel.com> |
Merge drm/drm-next into drm-xe-next
Bring v6.17-rc1 to propagate commits from other subsystems, particularly PCI, which has some new functions needed for SR-IOV integration.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
|
| #
08c51f5b |
| 11-Aug-2025 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Updating drm-misc-next to the state of v6.17-rc1. Begins a new release cycle.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|