| b565a73b | 27-May-2026 |
Zicheng Qu <quzicheng315@gmail.com> |
tools/sched_ext: Fix scx_show_state per-scheduler state reads
scx_show_state.py still reads scx_aborting and scx_bypass_depth as global symbols. Those symbols no longer exist after the state was mov
tools/sched_ext: Fix scx_show_state per-scheduler state reads
scx_show_state.py still reads scx_aborting and scx_bypass_depth as global symbols. Those symbols no longer exist after the state was moved into struct scx_sched, so the drgn script fails when it reaches either field.
Read aborting and bypass_depth from scx_root instead. This preserves the script's current root-scheduler view: with sub-scheduler support, the reported values are for the root scheduler and sub-schedulers are not enumerated.
Fixes: 5c8d98a1b4de ("sched_ext: Move bypass state into scx_sched") Fixes: c1743da43cf5 ("sched_ext: Move aborting flag to per-scheduler field") Signed-off-by: Zicheng Qu <quzicheng@huawei.com> Reviewed-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| 7e311baf | 13-Apr-2026 |
Kuba Piecuch <jpiecuch@google.com> |
tools/sched_ext: Add explicit cast from void* in RESIZE_ARRAY()
This fixes the following compilation error when using the header from C++ code:
error: assigning to 'struct scx_flux__data_uei_dump
tools/sched_ext: Add explicit cast from void* in RESIZE_ARRAY()
This fixes the following compilation error when using the header from C++ code:
error: assigning to 'struct scx_flux__data_uei_dump *' from incompatible type 'void *'
Signed-off-by: Kuba Piecuch <jpiecuch@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| 4615361f | 13-Apr-2026 |
Kuba Piecuch <jpiecuch@google.com> |
sched_ext: Make string params of __ENUM_set() const
A small change to improve type safety/const correctness. __COMPAT_read_enum() already has const string parameters.
It fixes a warning when using
sched_ext: Make string params of __ENUM_set() const
A small change to improve type safety/const correctness. __COMPAT_read_enum() already has const string parameters.
It fixes a warning when using the header in C++ code:
error: ISO C++11 does not allow conversion from string literal to 'char *' [-Werror,-Wwritable-strings]
That's because string literals have type char[N] in C and const char[N] in C++.
Signed-off-by: Kuba Piecuch <jpiecuch@google.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| 3d3667f2 | 13-Apr-2026 |
Tejun Heo <tj@kernel.org> |
tools/sched_ext: Kick home CPU for stranded tasks in scx_qmap
scx_qmap uses global BPF queue maps (BPF_MAP_TYPE_QUEUE) that any CPU's ops.dispatch() can pop from. When a CPU pops a task that can't r
tools/sched_ext: Kick home CPU for stranded tasks in scx_qmap
scx_qmap uses global BPF queue maps (BPF_MAP_TYPE_QUEUE) that any CPU's ops.dispatch() can pop from. When a CPU pops a task that can't run on it (e.g. a pinned per-CPU kthread), it inserts the task into SHARED_DSQ. consume_dispatch_q() then skips the task due to affinity mismatch, leaving it stranded until some CPU in its allowed mask calls ops.dispatch(). This doesn't cause indefinite stalls -- the periodic tick keeps firing (can_stop_idle_tick() returns false when softirq is pending) -- but can cause noticeable scheduling delays.
After inserting to SHARED_DSQ, kick the task's home CPU if this CPU can't run it. There's a small race window where the home CPU can enter idle before the kick lands -- if a per-CPU kthread like ksoftirqd is the stranded task, this can trigger a "NOHZ tick-stop error" warning. The kick arrives shortly after and the home CPU drains the task.
Rather than fully eliminating the warning by routing pinned tasks to local or global DSQs, the current code keeps them going through the normal BPF queue path and documents the race and the resulting warning in detail. scx_qmap is an example scheduler and having tasks go through the usual dispatch path is useful for testing. The detailed comment also serves as a reference for other schedulers that may encounter similar warnings.
Reviewed-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| a3c3fb2f | 31-Mar-2026 |
Cheng-Yang Chou <yphbchou0911@gmail.com> |
tools/sched_ext: Fix off-by-one in scx_sdt payload zeroing
scx_alloc_free_idx() zeroes the payload of a freed arena allocation one word at a time. The loop bound was alloc->pool.elem_size / 8, but e
tools/sched_ext: Fix off-by-one in scx_sdt payload zeroing
scx_alloc_free_idx() zeroes the payload of a freed arena allocation one word at a time. The loop bound was alloc->pool.elem_size / 8, but elem_size includes sizeof(struct sdt_data) (the 8-byte union sdt_id header). This caused the loop to write one extra u64 past the allocation, corrupting the tid field of the adjacent pool element.
Fix the loop bound to (elem_size - sizeof(struct sdt_data)) / 8 so only the payload portion is zeroed.
Test plan: - Add a temporary sanity check in scx_task_free() before the free call:
if (mval->data->tid.idx != mval->tid.idx) scx_bpf_error("tid corruption: arena=%d storage=%d", mval->data->tid.idx, (int)mval->tid.idx);
- stress-ng --fork 100 -t 10 & sudo ./build/bin/scx_sdt
Without this fix, running scx_sdt under fork-heavy load triggers the corruption error. With the fix applied, the same workload completes without error.
Fixes: 36929ebd17ae ("tools/sched_ext: add arena based scheduler") Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| d6edb15a | 27-Mar-2026 |
Zhao Mengmeng <zhaomengmeng@kylinos.cn> |
scx_central: Defer timer start to central dispatch to fix init error
scx_central currently assumes that ops.init() runs on the selected central CPU and aborts otherwise. This is no longer true, as o
scx_central: Defer timer start to central dispatch to fix init error
scx_central currently assumes that ops.init() runs on the selected central CPU and aborts otherwise. This is no longer true, as ops.init() is invoked from the scx_enable_helper thread, which can run on any CPU.
As a result, sched_setaffinity() from userspace doesn't work, causing scx_central to fail when loading with:
[ 1985.319942] sched_ext: central: scx_central.bpf.c:314: init from non-central CPU [ 1985.320317] scx_exit+0xa3/0xd0 [ 1985.320535] scx_bpf_error_bstr+0xbd/0x220 [ 1985.320840] bpf_prog_3a445a8163fa8149_central_init+0x103/0x1ba [ 1985.321073] bpf__sched_ext_ops_init+0x40/0xa8 [ 1985.321286] scx_root_enable_workfn+0x507/0x1650 [ 1985.321461] kthread_worker_fn+0x260/0x940 [ 1985.321745] kthread+0x303/0x3e0 [ 1985.321901] ret_from_fork+0x589/0x7d0 [ 1985.322065] ret_from_fork_asm+0x1a/0x30
DEBUG DUMP ===================================================================
central: root scx_enable_help[134] triggered exit kind 1025: scx_bpf_error (scx_central.bpf.c:314: init from non-central CPU)
Fix this by: - Defer bpf_timer_start() to the first dispatch on the central CPU. - Initialize the BPF timer in central_init() and kick the central CPU to guarantee entering the dispatch path on the central CPU immediately. - Remove the unnecessary sched_setaffinity() call in userspace.
Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| ea702393 | 26-Mar-2026 |
Tejun Heo <tj@kernel.org> |
tools/sched_ext: Remove redundant SCX_ENQ_IMMED compat definition
compat.bpf.h defined a fallback SCX_ENQ_IMMED macro using __COMPAT_ENUM_OR_ZERO(). After 6bf36c68b0a2 ("tools/sched_ext: Regenerate
tools/sched_ext: Remove redundant SCX_ENQ_IMMED compat definition
compat.bpf.h defined a fallback SCX_ENQ_IMMED macro using __COMPAT_ENUM_OR_ZERO(). After 6bf36c68b0a2 ("tools/sched_ext: Regenerate autogen enum headers") added SCX_ENQ_IMMED to the autogen headers, including both triggers -Wmacro-redefined warnings.
The autogen definition through const volatile __weak already resolves to 0 on older kernels, providing the same backward compatibility. Remove the now-redundant compat fallback.
Fixes: 6bf36c68b0a2 ("tools/sched_ext: Regenerate autogen enum headers") Link: https://lore.kernel.org/r/20260326100313.338388-1-zhaomzhao@126.com Reported-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| f546c770 | 26-Mar-2026 |
Zhao Mengmeng <zhaomengmeng@kylinos.cn> |
tools/sched_ext: scx_pair: fix pair_ctx indexing for CPU pairs
scx_pair sizes pair_ctx to nr_cpu_ids / 2, so valid pair_ctx keys are dense pair indexes in the range [0, nr_cpu_ids / 2).
However, th
tools/sched_ext: scx_pair: fix pair_ctx indexing for CPU pairs
scx_pair sizes pair_ctx to nr_cpu_ids / 2, so valid pair_ctx keys are dense pair indexes in the range [0, nr_cpu_ids / 2).
However, the userspace setup code stores pair_id as the first CPU number in each pair. On an 8-CPU system with "-S 1", that produces pair IDs 0, 2, 4 and 6 for pairs [0,1], [2,3], [4,5] and [6,7]. CPUs in the latter half then look up pair_ctx with out-of-range keys and the BPF scheduler aborts with:
EXIT: scx_bpf_error (scx_pair.bpf.c:328: failed to lookup pairc and in_pair_mask for cpu[5])
Assign pair_id using a dense pair counter instead so that each CPU pair maps to a valid pair_ctx entry. Besides, reject odd CPU configuration, as scx_pair requires all CPUs to be paired.
Fixes: f0262b102c7c ("tools/sched_ext: add scx_pair scheduler") Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| 6bf36c68 | 25-Mar-2026 |
Cheng-Yang Chou <yphbchou0911@gmail.com> |
tools/sched_ext: Regenerate autogen enum headers
Regenerate enum_defs.autogen.h, enums.autogen.h and enums.autogen.bpf.h using the upstream scripts [1][2] to sync with recent kernel enum additions.
tools/sched_ext: Regenerate autogen enum headers
Regenerate enum_defs.autogen.h, enums.autogen.h and enums.autogen.bpf.h using the upstream scripts [1][2] to sync with recent kernel enum additions.
[1] https://github.com/sched-ext/scx/blob/main/scripts/gen_enum_defs.py [2] https://github.com/sched-ext/scx/blob/main/scripts/gen_enums.py
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| cb251eae | 23-Mar-2026 |
Cheng-Yang Chou <yphbchou0911@gmail.com> |
tools/sched_ext: Add scx_bpf_sub_dispatch() compat wrapper
Add a transparent compatibility wrapper for the scx_bpf_sub_dispatch() kfunc in compat.bpf.h. This allows BPF schedulers using the sub-sche
tools/sched_ext: Add scx_bpf_sub_dispatch() compat wrapper
Add a transparent compatibility wrapper for the scx_bpf_sub_dispatch() kfunc in compat.bpf.h. This allows BPF schedulers using the sub-sched dispatch feature to build and run on older kernels that lack the kfunc.
To avoid requiring code changes in individual schedulers, the transparent wrapper pattern is used instead of a __COMPAT prefix. The kfunc is declared with a ___compat suffix, while the static inline wrapper retains the original scx_bpf_sub_dispatch() name.
When the kfunc is unavailable, the wrapper safely falls back to returning false. This is acceptable because the dispatch path cannot do anything useful without underlying sub-sched support anyway.
Tested scx_qmap on v6.14 successfully.
Signed-off-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| f03ffe53 | 22-Mar-2026 |
Andrea Righi <arighi@nvidia.com> |
tools/sched_ext: Add compat handling for sub-scheduler ops
Extend SCX_OPS_OPEN() with compatibility handling for ops.sub_attach() and ops.sub_detach(), allowing scx C schedulers with sub-scheduler s
tools/sched_ext: Add compat handling for sub-scheduler ops
Extend SCX_OPS_OPEN() with compatibility handling for ops.sub_attach() and ops.sub_detach(), allowing scx C schedulers with sub-scheduler support to run on kernels both with and without its support.
Cc: Cheng-Yang Chou <yphbchou0911@gmail.com> Fixes: ebeca1f930ea ("sched_ext: Introduce cgroup sub-sched support") Signed-off-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Cheng-Yang Chou <yphbchou0911@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| 068014da | 18-Mar-2026 |
Ke Zhao <ke.zhao.kernel@gmail.com> |
tools/sched_ext: Update stale scx_ops_error() comment in fcg_cgroup_move()
The function scx_ops_error() was dropped, but the comment here is left pointing to the old name. Update to be consistent wi
tools/sched_ext: Update stale scx_ops_error() comment in fcg_cgroup_move()
The function scx_ops_error() was dropped, but the comment here is left pointing to the old name. Update to be consistent with current API.
Signed-off-by: Ke Zhao <ke.zhao.kernel@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| 3229ac4a | 13-Mar-2026 |
Tejun Heo <tj@kernel.org> |
sched_ext: Add SCX_OPS_ALWAYS_ENQ_IMMED ops flag
SCX_ENQ_IMMED makes enqueue to local DSQs succeed only if the task can start running immediately. Otherwise, the task is re-enqueued through ops.enqu
sched_ext: Add SCX_OPS_ALWAYS_ENQ_IMMED ops flag
SCX_ENQ_IMMED makes enqueue to local DSQs succeed only if the task can start running immediately. Otherwise, the task is re-enqueued through ops.enqueue(). This provides tighter control but requires specifying the flag on every insertion.
Add SCX_OPS_ALWAYS_ENQ_IMMED ops flag. When set, SCX_ENQ_IMMED is automatically applied to all local DSQ enqueues including through scx_bpf_dsq_move_to_local().
scx_qmap is updated with -I option to test the feature and -F option for IMMED stress testing which forces every Nth enqueue to a busy local DSQ.
v2: - Cover scx_bpf_dsq_move_to_local() path (now has enq_flags via ___v2). - scx_qmap: Remove sched_switch and cpu_release handlers (superseded by kernel-side wakeup_preempt_scx()). Add -F for IMMED stress testing.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| 86068376 | 13-Mar-2026 |
Tejun Heo <tj@kernel.org> |
sched_ext: Add enq_flags to scx_bpf_dsq_move_to_local()
scx_bpf_dsq_move_to_local() moves a task from a non-local DSQ to the current CPU's local DSQ. This is an indirect way of dispatching to a loca
sched_ext: Add enq_flags to scx_bpf_dsq_move_to_local()
scx_bpf_dsq_move_to_local() moves a task from a non-local DSQ to the current CPU's local DSQ. This is an indirect way of dispatching to a local DSQ and should support enq_flags like direct dispatches do - e.g. SCX_ENQ_HEAD for head-of-queue insertion and SCX_ENQ_IMMED for immediate execution guarantees.
Add scx_bpf_dsq_move_to_local___v2() with an enq_flags parameter. The original becomes a v1 compat wrapper passing 0. The compat macro is updated to a three-level chain: v2 (7.1+) -> v1 (current) -> scx_bpf_consume (pre-rename). All in-tree BPF schedulers are updated to pass 0.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| 98d709cb | 13-Mar-2026 |
Tejun Heo <tj@kernel.org> |
sched_ext: Implement SCX_ENQ_IMMED
Add SCX_ENQ_IMMED enqueue flag for local DSQ insertions. Once a task is dispatched with IMMED, it either gets on the CPU immediately and stays on it, or gets reenq
sched_ext: Implement SCX_ENQ_IMMED
Add SCX_ENQ_IMMED enqueue flag for local DSQ insertions. Once a task is dispatched with IMMED, it either gets on the CPU immediately and stays on it, or gets reenqueued back to the BPF scheduler. It will never linger on a local DSQ behind other tasks or on a CPU taken by a higher-priority class.
rq_is_open() uses rq->next_class to determine whether the rq is available, and wakeup_preempt_scx() triggers reenqueue when a higher-priority class task arrives. These capture all higher class preemptions. Combined with reenqueue points in the dispatch path, all cases where an IMMED task would not execute immediately are covered.
SCX_TASK_IMMED persists in p->scx.flags until the next fresh enqueue, so the guarantee survives SAVE/RESTORE cycles. If preempted while running, put_prev_task_scx() reenqueues through ops.enqueue() with SCX_TASK_REENQ_PREEMPTED instead of silently placing the task back on the local DSQ.
This enables tighter scheduling latency control by preventing tasks from piling up on local DSQs. It also enables opportunistic CPU sharing across sub-schedulers - without this, a sub-scheduler can stuff the local DSQ of a shared CPU, making it difficult for others to use.
v2: - Rewrite is_curr_done() as rq_is_open() using rq->next_class and implement wakeup_preempt_scx() to achieve complete coverage of all cases where IMMED tasks could get stranded. - Track IMMED persistently in p->scx.flags and reenqueue preempted-while-running tasks through ops.enqueue(). - Bound deferred reenq cycles (SCX_REENQ_LOCAL_MAX_REPEAT). - Misc renames, documentation.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| bec10581 | 09-Mar-2026 |
Zhao Mengmeng <zhaomengmeng@kylinos.cn> |
sched_ext: remove SCX_OPS_HAS_CGROUP_WEIGHT
While running scx_flatcg, dmesg prints "SCX_OPS_HAS_CGROUP_WEIGHT is deprecated and a noop", in code, SCX_OPS_HAS_CGROUP_WEIGHT has been marked as DEPRECA
sched_ext: remove SCX_OPS_HAS_CGROUP_WEIGHT
While running scx_flatcg, dmesg prints "SCX_OPS_HAS_CGROUP_WEIGHT is deprecated and a noop", in code, SCX_OPS_HAS_CGROUP_WEIGHT has been marked as DEPRECATED, and will be removed on 6.18. Now it's time to do it.
Signed-off-by: Zhao Mengmeng <zhaomengmeng@kylinos.cn> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
| 0a0d3b8d | 08-Mar-2026 |
Tejun Heo <tj@kernel.org> |
tools/sched_ext/include: Regenerate enum_defs.autogen.h
Regenerate enum_defs.autogen.h from the current vmlinux.h to pick up new SCX enums added in the for-7.1 cycle.
Signed-off-by: Tejun Heo <tj@k
tools/sched_ext/include: Regenerate enum_defs.autogen.h
Regenerate enum_defs.autogen.h from the current vmlinux.h to pick up new SCX enums added in the for-7.1 cycle.
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| 93ac9b15 | 08-Mar-2026 |
Tejun Heo <tj@kernel.org> |
tools/sched_ext/include: Add libbpf version guard for assoc_struct_ops
Extract the inline bpf_program__assoc_struct_ops() call in SCX_OPS_LOAD() into a __scx_ops_assoc_prog() helper and wrap it with
tools/sched_ext/include: Add libbpf version guard for assoc_struct_ops
Extract the inline bpf_program__assoc_struct_ops() call in SCX_OPS_LOAD() into a __scx_ops_assoc_prog() helper and wrap it with a libbpf >= 1.7 version guard. bpf_program__assoc_struct_ops() was added in libbpf 1.7; the guard provides a no-op fallback for older versions. Add the <bpf/libbpf.h> include needed by the helper, and fix "assumming" typo in a nearby comment.
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| c9c8546c | 08-Mar-2026 |
Tejun Heo <tj@kernel.org> |
tools/sched_ext/include: Add __COMPAT_HAS_scx_bpf_select_cpu_and macro
scx_bpf_select_cpu_and() is now an inline wrapper so bpf_ksym_exists(scx_bpf_select_cpu_and) no longer works. Add __COMPAT_HAS_
tools/sched_ext/include: Add __COMPAT_HAS_scx_bpf_select_cpu_and macro
scx_bpf_select_cpu_and() is now an inline wrapper so bpf_ksym_exists(scx_bpf_select_cpu_and) no longer works. Add __COMPAT_HAS_scx_bpf_select_cpu_and macro that checks for either the struct args type (new) or the compat ksym (old) to test availability.
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| 3691d380 | 08-Mar-2026 |
Tejun Heo <tj@kernel.org> |
tools/sched_ext/include: Add missing helpers to common.bpf.h
Sync several helpers from the scx repo: - bpf_cgroup_acquire() ksym declaration - __sink() macro for hiding values from verifier precisio
tools/sched_ext/include: Add missing helpers to common.bpf.h
Sync several helpers from the scx repo: - bpf_cgroup_acquire() ksym declaration - __sink() macro for hiding values from verifier precision tracking - ctzll() count-trailing-zeros implementation - get_prandom_u64() helper - scx_clock_task/pelt/virt/irq() clock helpers with get_current_rq()
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
| 9c6437f7 | 08-Mar-2026 |
Tejun Heo <tj@kernel.org> |
tools/sched_ext/include: Sync bpf_arena_common.bpf.h with scx repo
Sync the following changes from the scx repo:
- Guard __arena define with #ifndef to avoid redefinition when the attribute is al
tools/sched_ext/include: Sync bpf_arena_common.bpf.h with scx repo
Sync the following changes from the scx repo:
- Guard __arena define with #ifndef to avoid redefinition when the attribute is already defined by another header. - Add bpf_arena_reserve_pages() and bpf_arena_mapping_nr_pages() ksym declarations. - Rename TEST to SCX_BPF_UNITTEST to avoid collision with generic TEST macros in other projects.
Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Andrea Righi <arighi@nvidia.com>
show more ...
|