| 7e311baf | 13-Apr-2026 |
Kuba Piecuch <jpiecuch@google.com> |
tools/sched_ext: Add explicit cast from void* in RESIZE_ARRAY()
This fixes the following compilation error when using the header from C++ code:
error: assigning to 'struct scx_flux__data_uei_dump
tools/sched_ext: Add explicit cast from void* in RESIZE_ARRAY()
This fixes the following compilation error when using the header from C++ code:
error: assigning to 'struct scx_flux__data_uei_dump *' from incompatible type 'void *'
Signed-off-by: Kuba Piecuch <jpiecuch@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| ea702393 | 26-Mar-2026 |
Tejun Heo <tj@kernel.org> |
tools/sched_ext: Remove redundant SCX_ENQ_IMMED compat definition
compat.bpf.h defined a fallback SCX_ENQ_IMMED macro using __COMPAT_ENUM_OR_ZERO(). After 6bf36c68b0a2 ("tools/sched_ext: Regenerate
tools/sched_ext: Remove redundant SCX_ENQ_IMMED compat definition
compat.bpf.h defined a fallback SCX_ENQ_IMMED macro using __COMPAT_ENUM_OR_ZERO(). After 6bf36c68b0a2 ("tools/sched_ext: Regenerate autogen enum headers") added SCX_ENQ_IMMED to the autogen headers, including both triggers -Wmacro-redefined warnings.
The autogen definition through const volatile __weak already resolves to 0 on older kernels, providing the same backward compatibility. Remove the now-redundant compat fallback.
Fixes: 6bf36c68b0a2 ("tools/sched_ext: Regenerate autogen enum headers") Link: https://lore.kernel.org/r/20260326100313.338388-1-zhaomzhao@126.com Reported-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| 6bf36c68 | 25-Mar-2026 |
Cheng-Yang Chou <yphbchou0911@gmail.com> |
tools/sched_ext: Regenerate autogen enum headers
Regenerate enum_defs.autogen.h, enums.autogen.h and enums.autogen.bpf.h using the upstream scripts [1][2] to sync with recent kernel enum additions.
tools/sched_ext: Regenerate autogen enum headers
Regenerate enum_defs.autogen.h, enums.autogen.h and enums.autogen.bpf.h using the upstream scripts [1][2] to sync with recent kernel enum additions.
[1] https://github.com/sched-ext/scx/blob/main/scripts/gen_enum_defs.py [2] https://github.com/sched-ext/scx/blob/main/scripts/gen_enums.py
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| cb251eae | 23-Mar-2026 |
Cheng-Yang Chou <yphbchou0911@gmail.com> |
tools/sched_ext: Add scx_bpf_sub_dispatch() compat wrapper
Add a transparent compatibility wrapper for the scx_bpf_sub_dispatch() kfunc in compat.bpf.h. This allows BPF schedulers using the sub-sche
tools/sched_ext: Add scx_bpf_sub_dispatch() compat wrapper
Add a transparent compatibility wrapper for the scx_bpf_sub_dispatch() kfunc in compat.bpf.h. This allows BPF schedulers using the sub-sched dispatch feature to build and run on older kernels that lack the kfunc.
To avoid requiring code changes in individual schedulers, the transparent wrapper pattern is used instead of a __COMPAT prefix. The kfunc is declared with a ___compat suffix, while the static inline wrapper retains the original scx_bpf_sub_dispatch() name.
When the kfunc is unavailable, the wrapper safely falls back to returning false. This is acceptable because the dispatch path cannot do anything useful without underlying sub-sched support anyway.
Tested scx_qmap on v6.14 successfully.
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| 3229ac4a | 13-Mar-2026 |
Tejun Heo <tj@kernel.org> |
sched_ext: Add SCX_OPS_ALWAYS_ENQ_IMMED ops flag
SCX_ENQ_IMMED makes enqueue to local DSQs succeed only if the task can start running immediately. Otherwise, the task is re-enqueued through ops.enqu
sched_ext: Add SCX_OPS_ALWAYS_ENQ_IMMED ops flag
SCX_ENQ_IMMED makes enqueue to local DSQs succeed only if the task can start running immediately. Otherwise, the task is re-enqueued through ops.enqueue(). This provides tighter control but requires specifying the flag on every insertion.
Add SCX_OPS_ALWAYS_ENQ_IMMED ops flag. When set, SCX_ENQ_IMMED is automatically applied to all local DSQ enqueues including through scx_bpf_dsq_move_to_local().
scx_qmap is updated with -I option to test the feature and -F option for IMMED stress testing which forces every Nth enqueue to a busy local DSQ.
v2: - Cover scx_bpf_dsq_move_to_local() path (now has enq_flags via ___v2). - scx_qmap: Remove sched_switch and cpu_release handlers (superseded by kernel-side wakeup_preempt_scx()). Add -F for IMMED stress testing.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| 98d709cb | 13-Mar-2026 |
Tejun Heo <tj@kernel.org> |
sched_ext: Implement SCX_ENQ_IMMED
Add SCX_ENQ_IMMED enqueue flag for local DSQ insertions. Once a task is dispatched with IMMED, it either gets on the CPU immediately and stays on it, or gets reenq
sched_ext: Implement SCX_ENQ_IMMED
Add SCX_ENQ_IMMED enqueue flag for local DSQ insertions. Once a task is dispatched with IMMED, it either gets on the CPU immediately and stays on it, or gets reenqueued back to the BPF scheduler. It will never linger on a local DSQ behind other tasks or on a CPU taken by a higher-priority class.
rq_is_open() uses rq->next_class to determine whether the rq is available, and wakeup_preempt_scx() triggers reenqueue when a higher-priority class task arrives. These capture all higher class preemptions. Combined with reenqueue points in the dispatch path, all cases where an IMMED task would not execute immediately are covered.
SCX_TASK_IMMED persists in p->scx.flags until the next fresh enqueue, so the guarantee survives SAVE/RESTORE cycles. If preempted while running, put_prev_task_scx() reenqueues through ops.enqueue() with SCX_TASK_REENQ_PREEMPTED instead of silently placing the task back on the local DSQ.
This enables tighter scheduling latency control by preventing tasks from piling up on local DSQs. It also enables opportunistic CPU sharing across sub-schedulers - without this, a sub-scheduler can stuff the local DSQ of a shared CPU, making it difficult for others to use.
v2: - Rewrite is_curr_done() as rq_is_open() using rq->next_class and implement wakeup_preempt_scx() to achieve complete coverage of all cases where IMMED tasks could get stranded. - Track IMMED persistently in p->scx.flags and reenqueue preempted-while-running tasks through ops.enqueue(). - Bound deferred reenq cycles (SCX_REENQ_LOCAL_MAX_REPEAT). - Misc renames, documentation.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| bec10581 | 09-Mar-2026 |
Zhao Mengmeng <zhaomengmeng@kylinos.cn> |
sched_ext: remove SCX_OPS_HAS_CGROUP_WEIGHT
While running scx_flatcg, dmesg prints "SCX_OPS_HAS_CGROUP_WEIGHT is deprecated and a noop", in code, SCX_OPS_HAS_CGROUP_WEIGHT has been marked as DEPRECA
sched_ext: remove SCX_OPS_HAS_CGROUP_WEIGHT
While running scx_flatcg, dmesg prints "SCX_OPS_HAS_CGROUP_WEIGHT is deprecated and a noop", in code, SCX_OPS_HAS_CGROUP_WEIGHT has been marked as DEPRECATED, and will be removed on 6.18. Now it's time to do it.
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| 0a0d3b8d | 08-Mar-2026 |
Tejun Heo <tj@kernel.org> |
tools/sched_ext/include: Regenerate enum_defs.autogen.h
Regenerate enum_defs.autogen.h from the current vmlinux.h to pick up new SCX enums added in the for-7.1 cycle.
Signed-off-by: Tejun Heo <tj@k
tools/sched_ext/include: Regenerate enum_defs.autogen.h
Regenerate enum_defs.autogen.h from the current vmlinux.h to pick up new SCX enums added in the for-7.1 cycle.
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| 93ac9b15 | 08-Mar-2026 |
Tejun Heo <tj@kernel.org> |
tools/sched_ext/include: Add libbpf version guard for assoc_struct_ops
Extract the inline bpf_program__assoc_struct_ops() call in SCX_OPS_LOAD() into a __scx_ops_assoc_prog() helper and wrap it with
tools/sched_ext/include: Add libbpf version guard for assoc_struct_ops
Extract the inline bpf_program__assoc_struct_ops() call in SCX_OPS_LOAD() into a __scx_ops_assoc_prog() helper and wrap it with a libbpf >= 1.7 version guard. bpf_program__assoc_struct_ops() was added in libbpf 1.7; the guard provides a no-op fallback for older versions. Add the <bpf/libbpf.h> include needed by the helper, and fix "assumming" typo in a nearby comment.
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| c9c8546c | 08-Mar-2026 |
Tejun Heo <tj@kernel.org> |
tools/sched_ext/include: Add __COMPAT_HAS_scx_bpf_select_cpu_and macro
scx_bpf_select_cpu_and() is now an inline wrapper so bpf_ksym_exists(scx_bpf_select_cpu_and) no longer works. Add __COMPAT_HAS_
tools/sched_ext/include: Add __COMPAT_HAS_scx_bpf_select_cpu_and macro
scx_bpf_select_cpu_and() is now an inline wrapper so bpf_ksym_exists(scx_bpf_select_cpu_and) no longer works. Add __COMPAT_HAS_scx_bpf_select_cpu_and macro that checks for either the struct args type (new) or the compat ksym (old) to test availability.
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| 3691d380 | 08-Mar-2026 |
Tejun Heo <tj@kernel.org> |
tools/sched_ext/include: Add missing helpers to common.bpf.h
Sync several helpers from the scx repo: - bpf_cgroup_acquire() ksym declaration - __sink() macro for hiding values from verifier precisio
tools/sched_ext/include: Add missing helpers to common.bpf.h
Sync several helpers from the scx repo: - bpf_cgroup_acquire() ksym declaration - __sink() macro for hiding values from verifier precision tracking - ctzll() count-trailing-zeros implementation - get_prandom_u64() helper - scx_clock_task/pelt/virt/irq() clock helpers with get_current_rq()
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| 9c6437f7 | 08-Mar-2026 |
Tejun Heo <tj@kernel.org> |
tools/sched_ext/include: Sync bpf_arena_common.bpf.h with scx repo
Sync the following changes from the scx repo:
- Guard __arena define with #ifndef to avoid redefinition when the attribute is al
tools/sched_ext/include: Sync bpf_arena_common.bpf.h with scx repo
Sync the following changes from the scx repo:
- Guard __arena define with #ifndef to avoid redefinition when the attribute is already defined by another header. - Add bpf_arena_reserve_pages() and bpf_arena_mapping_nr_pages() ksym declarations. - Rename TEST to SCX_BPF_UNITTEST to avoid collision with generic TEST macros in other projects.
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| 9c34c507 | 07-Mar-2026 |
Tejun Heo <tj@kernel.org> |
sched_ext: Introduce scx_bpf_dsq_reenq() for remote local DSQ reenqueue
scx_bpf_reenqueue_local() can only trigger re-enqueue of the current CPU's local DSQ. Introduce scx_bpf_dsq_reenq() which take
sched_ext: Introduce scx_bpf_dsq_reenq() for remote local DSQ reenqueue
scx_bpf_reenqueue_local() can only trigger re-enqueue of the current CPU's local DSQ. Introduce scx_bpf_dsq_reenq() which takes a DSQ ID and can target any local DSQ including remote CPUs via SCX_DSQ_LOCAL_ON | cpu. This will be expanded to support user DSQs by future changes.
scx_bpf_reenqueue_local() is reimplemented as a simple wrapper around scx_bpf_dsq_reenq(SCX_DSQ_LOCAL, 0) and may be deprecated in the future.
Update compat.bpf.h with a compatibility shim and scx_qmap to test the new functionality.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| 4f8b1228 | 06-Mar-2026 |
Tejun Heo <tj@kernel.org> |
sched_ext: Add basic building blocks for nested sub-scheduler dispatching
This is an early-stage partial implementation that demonstrates the core building blocks for nested sub-scheduler dispatchin
sched_ext: Add basic building blocks for nested sub-scheduler dispatching
This is an early-stage partial implementation that demonstrates the core building blocks for nested sub-scheduler dispatching. While significant work remains in the enqueue path and other areas, this patch establishes the fundamental mechanisms needed for hierarchical scheduler operation.
The key building blocks introduced include:
- Private stack support for ops.dispatch() to prevent stack overflow when walking down nested schedulers during dispatch operations
- scx_bpf_sub_dispatch() kfunc that allows parent schedulers to trigger dispatch operations on their direct child schedulers
- Proper parent-child relationship validation to ensure dispatch requests are only made to legitimate child schedulers
- Updated scx_dispatch_sched() to handle both nested and non-nested invocations with appropriate kf_mask handling
The qmap scheduler is updated to demonstrate the functionality by calling scx_bpf_sub_dispatch() on registered child schedulers when it has no tasks in its own queues.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| 34423456 | 29-Oct-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext/tools: Restore backward compat with v6.12 kernels
Commit 111a79800aed ("tools/sched_ext: Strip compatibility macros for cgroup and dispatch APIs") removed the compat layer for v6.12-v6.13
sched_ext/tools: Restore backward compat with v6.12 kernels
Commit 111a79800aed ("tools/sched_ext: Strip compatibility macros for cgroup and dispatch APIs") removed the compat layer for v6.12-v6.13 kfunc renaming, but v6.12 is the current LTS kernel and will remain supported through 2026. Restore backward compatibility so schedulers built with v6.19+ headers can run on v6.12 kernels.
The restored compat differs from the original in two ways:
1. Uses ___new/___old suffixes instead of ___compat for clarity. The new macros check for v6.13+ names (scx_bpf_dsq_move*), fall back to v6.12 names (scx_bpf_dispatch_from_dsq*, scx_bpf_consume), then return safe no-ops for missing symbols.
2. Integrates with the args-struct-packing changes added in c0d630ba347c ("sched_ext: Wrap kfunc args in struct to prepare for aux__prog"). scx_bpf_dsq_insert_vtime() now tries __scx_bpf_dsq_insert_vtime (args struct), then scx_bpf_dsq_insert_vtime___compat (v6.13-v6.18), then scx_bpf_dispatch_vtime___compat (pre-v6.13).
Forward compatibility is not restored - binaries built against v6.13 or earlier headers won't run on v6.19+ kernels, as the old kfunc names are not exported. This is acceptable since the priority is new binaries running on older kernels.
Also add missing compat checks for ops.cgroup_set_bandwidth() (added v6.17) and ops.cgroup_set_idle() (added v6.19). These need to be NULLed out in userspace on older kernels.
Reported-by: Andrea Righi <arighi@nvidia.com> Acked-and-tested-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| a3f5d482 | 27-Oct-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext: Allow scx_bpf_reenqueue_local() to be called from anywhere
The ops.cpu_acquire/release() callbacks miss events under multiple conditions. There are two distinct task dispatch gaps that ca
sched_ext: Allow scx_bpf_reenqueue_local() to be called from anywhere
The ops.cpu_acquire/release() callbacks miss events under multiple conditions. There are two distinct task dispatch gaps that can cause cpu_released flag desynchronization:
1. balance-to-pick_task gap: This is what was originally reported. balance_scx() can enqueue a task, but during consume_remote_task() when the rq lock is released, a higher priority task can be enqueued and ultimately picked while cpu_released remains false. This gap is closeable via RETRY_TASK handling.
2. ttwu-to-pick_task gap: ttwu() can directly dispatch a task to a CPU's local DSQ. By the time the sched path runs on the target CPU, higher class tasks may already be queued. In such cases, nothing on sched_ext side will be invoked, and the only solution would be a hook invoked regardless of sched class, which isn't desirable.
Rather than adding invasive core hooks, BPF schedulers can use generic BPF mechanisms like tracepoints. From SCX scheduler's perspective, this is congruent with other mechanisms it already uses and doesn't add further friction.
The main use case for cpu_release() was calling scx_bpf_reenqueue_local() when a CPU gets preempted by a higher priority scheduling class. However, the old scx_bpf_reenqueue_local() could only be called from cpu_release() context.
Add a new version of scx_bpf_reenqueue_local() that can be called from any context by deferring the actual re-enqueue operation. This eliminates the need for cpu_acquire/release() ops entirely. Schedulers can now use standard BPF mechanisms like the sched_switch tracepoint to detect and handle CPU preemption.
Update scx_qmap to demonstrate the new approach using sched_switch instead of cpu_release, with compat support for older kernels. Mark cpu_acquire/release() as deprecated. The old scx_bpf_reenqueue_local() variant will be removed in v6.23.
Reported-by: Wen-Fang Liu <liuwenfang@honor.com> Link: https://lore.kernel.org/all/8d64c74118c6440f81bcf5a4ac6b9f00@honor.com/ Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| dcb938c4 | 25-Oct-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext: Add ___compat suffix to scx_bpf_dsq_insert___v2 in compat.bpf.h
2dbbdeda77a6 ("sched_ext: Fix scx_bpf_dsq_insert() backward binary compatibility") renamed the new bool-returning variant t
sched_ext: Add ___compat suffix to scx_bpf_dsq_insert___v2 in compat.bpf.h
2dbbdeda77a6 ("sched_ext: Fix scx_bpf_dsq_insert() backward binary compatibility") renamed the new bool-returning variant to scx_bpf_dsq_insert___v2 in the kernel. However, libbpf currently only strips ___SUFFIX on the BPF side, not on kernel symbols, so the compat wrapper couldn't match the kernel kfunc and would always fall back to the old variant even when the new one was available.
Add an extra ___compat suffix as a workaround - libbpf strips one suffix on the BPF side leaving ___v2, which then matches the kernel kfunc directly. In the future when libbpf strips all suffixes on both sides, all suffixes can be dropped.
Fixes: 2dbbdeda77a6 ("sched_ext: Fix scx_bpf_dsq_insert() backward binary compatibility") Cc: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|