| c9894e6f | 26-Dec-2025 |
Kohei Enju <enjuk@amazon.com> |
tools/sched_ext: update scx_show_state.py for scx_aborting change
Commit a69040ed57f5 ("sched_ext: Simplify breather mechanism with scx_aborting flag") removed scx_in_softlockup and scx_breather_depth, replacing them with scx_aborting.
Update the script accordingly.
Fixes: a69040ed57f5 ("sched_ext: Simplify breather mechanism with scx_aborting flag") Signed-off-by: Kohei Enju <enjuk@amazon.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| 06a7415c | 18-Nov-2025 |
Rong Tao <rongtao@cestc.cn> |
sched_ext: tools: Removing duplicate targets during non-cross compilation
When cross-compilation is not used, BPFOBJ and HOST_BPFOBJ are the same file, libbpf.a, so the duplicate libbpf.a target should be removed.
Signed-off-by: Rong Tao <rongtao@cestc.cn> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| 34423456 | 29-Oct-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext/tools: Restore backward compat with v6.12 kernels
Commit 111a79800aed ("tools/sched_ext: Strip compatibility macros for cgroup and dispatch APIs") removed the compat layer for v6.12-v6.13 kfunc renaming, but v6.12 is the current LTS kernel and will remain supported through 2026. Restore backward compatibility so schedulers built with v6.19+ headers can run on v6.12 kernels.
The restored compat differs from the original in two ways:
1. Uses ___new/___old suffixes instead of ___compat for clarity. The new macros check for v6.13+ names (scx_bpf_dsq_move*), fall back to v6.12 names (scx_bpf_dispatch_from_dsq*, scx_bpf_consume), then return safe no-ops for missing symbols.
2. Integrates with the args-struct-packing changes added in c0d630ba347c ("sched_ext: Wrap kfunc args in struct to prepare for aux__prog"). scx_bpf_dsq_insert_vtime() now tries __scx_bpf_dsq_insert_vtime (args struct), then scx_bpf_dsq_insert_vtime___compat (v6.13-v6.18), then scx_bpf_dispatch_vtime___compat (pre-v6.13).
Forward compatibility is not restored - binaries built against v6.13 or earlier headers won't run on v6.19+ kernels, as the old kfunc names are not exported. This is acceptable since the priority is new binaries running on older kernels.
Also add missing compat checks for ops.cgroup_set_bandwidth() (added v6.17) and ops.cgroup_set_idle() (added v6.19). These need to be NULLed out in userspace on older kernels.
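To illustrate the fallback chain in point 2 above, here is a minimal sketch of how such a wrapper can probe the kfunc variants with weak ksym declarations. The wrapper name, args-struct layout, and return handling are assumptions for illustration, not the actual compat.bpf.h contents:

  /* Hedged sketch only; the real compat macros also handle the case where
   * none of the variants exist. */
  struct scx_dsq_insert_vtime_args___sketch {
          struct task_struct *p;
          u64 dsq_id;
          u64 slice;
          u64 vtime;
          u64 enq_flags;
  };

  /* Weak declarations: unresolved symbols stay NULL instead of failing load. */
  bool __scx_bpf_dsq_insert_vtime(struct scx_dsq_insert_vtime_args___sketch *args) __ksym __weak;
  bool scx_bpf_dsq_insert_vtime___compat(struct task_struct *p, u64 dsq_id, u64 slice,
                                         u64 vtime, u64 enq_flags) __ksym __weak;
  void scx_bpf_dispatch_vtime___compat(struct task_struct *p, u64 dsq_id, u64 slice,
                                       u64 vtime, u64 enq_flags) __ksym __weak;

  static inline void sketch_dsq_insert_vtime(struct task_struct *p, u64 dsq_id,
                                             u64 slice, u64 vtime, u64 enq_flags)
  {
          if (bpf_ksym_exists(__scx_bpf_dsq_insert_vtime)) {
                  /* v6.19+: args wrapped in a struct, see c0d630ba further down */
                  struct scx_dsq_insert_vtime_args___sketch args = {
                          .p = p, .dsq_id = dsq_id, .slice = slice,
                          .vtime = vtime, .enq_flags = enq_flags,
                  };
                  __scx_bpf_dsq_insert_vtime(&args);
          } else if (bpf_ksym_exists(scx_bpf_dsq_insert_vtime___compat)) {
                  /* v6.13-v6.18 name */
                  scx_bpf_dsq_insert_vtime___compat(p, dsq_id, slice, vtime, enq_flags);
          } else {
                  /* v6.12 name */
                  scx_bpf_dispatch_vtime___compat(p, dsq_id, slice, vtime, enq_flags);
          }
  }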
Reported-by: Andrea Righi <arighi@nvidia.com> Acked-and-tested-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| a3f5d482 | 27-Oct-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext: Allow scx_bpf_reenqueue_local() to be called from anywhere
The ops.cpu_acquire/release() callbacks miss events under multiple conditions. There are two distinct task dispatch gaps that can cause cpu_released flag desynchronization:
1. balance-to-pick_task gap: This is what was originally reported. balance_scx() can enqueue a task, but during consume_remote_task() when the rq lock is released, a higher priority task can be enqueued and ultimately picked while cpu_released remains false. This gap is closeable via RETRY_TASK handling.
2. ttwu-to-pick_task gap: ttwu() can directly dispatch a task to a CPU's local DSQ. By the time the sched path runs on the target CPU, higher class tasks may already be queued. In such cases, nothing on sched_ext side will be invoked, and the only solution would be a hook invoked regardless of sched class, which isn't desirable.
Rather than adding invasive core hooks, BPF schedulers can use generic BPF mechanisms like tracepoints. From an SCX scheduler's perspective, this is congruent with other mechanisms it already uses and doesn't add further friction.
The main use case for cpu_release() was calling scx_bpf_reenqueue_local() when a CPU gets preempted by a higher priority scheduling class. However, the old scx_bpf_reenqueue_local() could only be called from cpu_release() context.
Add a new version of scx_bpf_reenqueue_local() that can be called from any context by deferring the actual re-enqueue operation. This eliminates the need for cpu_acquire/release() ops entirely. Schedulers can now use standard BPF mechanisms like the sched_switch tracepoint to detect and handle CPU preemption.
Update scx_qmap to demonstrate the new approach using sched_switch instead of cpu_release, with compat support for older kernels. Mark cpu_acquire/release() as deprecated. The old scx_bpf_reenqueue_local() variant will be removed in v6.23.
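A minimal sketch of the sched_switch-based approach; this is not the scx_qmap code itself, and the policy check, compat handling, and the kfunc's return type are simplified assumptions:

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  /* Stable UAPI policy values; vmlinux.h carries no macros for them. */
  #define SCHED_FIFO      1
  #define SCHED_RR        2
  #define SCHED_DEADLINE  6

  u32 scx_bpf_reenqueue_local(void) __ksym;       /* return type assumed */

  /*
   * Hedged sketch: when a higher-priority class takes over this CPU, ask
   * sched_ext to re-enqueue whatever sits on the CPU's local DSQ so that
   * other CPUs can pick those tasks up.
   */
  SEC("tp_btf/sched_switch")
  int BPF_PROG(sketch_sched_switch, bool preempt, struct task_struct *prev,
               struct task_struct *next)
  {
          if (next->policy == SCHED_FIFO || next->policy == SCHED_RR ||
              next->policy == SCHED_DEADLINE)
                  scx_bpf_reenqueue_local();
          return 0;
  }

  char _license[] SEC("license") = "GPL";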
Reported-by: Wen-Fang Liu <liuwenfang@honor.com> Link: https://lore.kernel.org/all/8d64c74118c6440f81bcf5a4ac6b9f00@honor.com/ Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| dcb938c4 | 25-Oct-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext: Add ___compat suffix to scx_bpf_dsq_insert___v2 in compat.bpf.h
2dbbdeda77a6 ("sched_ext: Fix scx_bpf_dsq_insert() backward binary compatibility") renamed the new bool-returning variant to scx_bpf_dsq_insert___v2 in the kernel. However, libbpf currently only strips ___SUFFIX on the BPF side, not on kernel symbols, so the compat wrapper couldn't match the kernel kfunc and would always fall back to the old variant even when the new one was available.
Add an extra ___compat suffix as a workaround - libbpf strips one suffix on the BPF side leaving ___v2, which then matches the kernel kfunc directly. In the future when libbpf strips all suffixes on both sides, all suffixes can be dropped.
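Putting the two renames together, a hedged sketch of what the BPF-side shim can look like after this change; the wrapper name and parameter list are assumptions for illustration:

  /*
   * Hedged sketch, not the tree's compat.bpf.h. libbpf strips one trailing
   * ___SUFFIX from BPF-side symbols, so this declaration resolves against
   * the kernel's scx_bpf_dsq_insert___v2 kfunc.
   */
  void scx_bpf_dsq_insert(struct task_struct *p, u64 dsq_id, u64 slice,
                          u64 enq_flags) __ksym;
  bool scx_bpf_dsq_insert___v2___compat(struct task_struct *p, u64 dsq_id,
                                        u64 slice, u64 enq_flags) __ksym __weak;

  static inline bool sketch_dsq_insert(struct task_struct *p, u64 dsq_id,
                                       u64 slice, u64 enq_flags)
  {
          if (bpf_ksym_exists(scx_bpf_dsq_insert___v2___compat))
                  return scx_bpf_dsq_insert___v2___compat(p, dsq_id, slice, enq_flags);

          /* Older kernels only have the void-returning variant. */
          scx_bpf_dsq_insert(p, dsq_id, slice, enq_flags);
          return true;
  }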
Fixes: 2dbbdeda77a6 ("sched_ext: Fix scx_bpf_dsq_insert() backward binary compatibility") Cc: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| 2dbbdeda | 21-Oct-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext: Fix scx_bpf_dsq_insert() backward binary compatibility
cded46d97159 ("sched_ext: Make scx_bpf_dsq_insert*() return bool") introduced a new bool-returning scx_bpf_dsq_insert() and renamed the old void-returning version to scx_bpf_dsq_insert___compat, with the expectation that libbpf would match old binaries to the ___compat variant, maintaining backward binary compatibility. However, while libbpf ignores ___suffix on the BPF side when matching symbols, it doesn't do so for kernel-side symbols. Old binaries compiled with the original scx_bpf_dsq_insert() could no longer resolve the symbol.
Fix by reversing the naming: Keep scx_bpf_dsq_insert() as the old void-returning interface and add ___v2 to the new bool-returning version. This allows old binaries to continue working while new code can use the ___v2 variant. Once libbpf is updated to ignore kernel-side ___SUFFIX, the ___v2 suffix can be dropped when the compat interface is removed.
v2: Use ___v2 instead of ___new.
Fixes: cded46d97159 ("sched_ext: Make scx_bpf_dsq_insert*() return bool") Signed-off-by: Tejun Heo <tj@kernel.org>
|
| 44f5c8ec | 15-Oct-2025 |
Ryan Newton <newton@meta.com> |
sched_ext: Add lockless peek operation for DSQs
The builtin DSQ queue data structures are meant to be used by a wide range of different sched_ext schedulers with different demands on these data structures. They might be per-cpu with low-contention, or high-contention shared queues. Unfortunately, DSQs have a coarse-grained lock around the whole data structure. Without going all the way to a lock-free, more scalable implementation, a small step we can take to reduce lock contention is to allow a lockless, small-fixed-cost peek at the head of the queue.
This change allows certain custom SCX schedulers to cheaply peek at queues, e.g. during load balancing, before locking them. The cost is a few extra memory operations to update the pointer each time the DSQ is modified, including a memory barrier on ARM so the write is observed in the correct order.
This commit adds a first_task pointer field which is updated atomically when the DSQ is modified, and allows any thread to peek at the head of the queue without holding the lock.
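A hedged kernel-side sketch of the mechanism, assuming the existing dsq->list / p->scx.dsq_list layout; the helper names are made up and the committed code may differ:

  /* first_task is the pointer field added by this change. */
  static void sketch_dsq_publish_first(struct scx_dispatch_q *dsq)
  {
          struct task_struct *first;

          lockdep_assert_held(&dsq->lock);
          first = list_first_entry_or_null(&dsq->list, struct task_struct,
                                           scx.dsq_list.node);
          /* Release store so the update is ordered on weakly ordered CPUs (ARM). */
          smp_store_release(&dsq->first_task, first);
  }

  static struct task_struct *sketch_dsq_peek(struct scx_dispatch_q *dsq)
  {
          /*
           * No dsq->lock needed; pairs with the release store above. Callers
           * still need RCU or similar protection before dereferencing.
           */
          return smp_load_acquire(&dsq->first_task);
  }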
Signed-off-by: Ryan Newton <newton@meta.com> Reviewed-by: Andrea Righi <arighi@nvidia.com> Reviewed-by: Christian Loehle <christian.loehle@arm.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| bd7143e7 | 07-Oct-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext/tools: Add compat wrapper for scx_bpf_task_set_slice/dsq_vtime()
Commit 3035addf ("sched_ext: Add scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime()") added the two kfuncs for sub-scheduler authority checks. Add compat wrappers which fall back to direct p->scx field writes on older kernels.
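A hedged sketch of such a wrapper; the kfunc signature is assumed:

  void scx_bpf_task_set_slice(struct task_struct *p, u64 slice) __ksym __weak;

  static inline void sketch_task_set_slice(struct task_struct *p, u64 slice)
  {
          if (bpf_ksym_exists(scx_bpf_task_set_slice))
                  scx_bpf_task_set_slice(p, slice);
          else
                  p->scx.slice = slice;   /* older kernels: direct field write */
  }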
Suggested-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| cded46d9 | 07-Oct-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext: Make scx_bpf_dsq_insert*() return bool
In preparation for hierarchical schedulers, change scx_bpf_dsq_insert() and scx_bpf_dsq_insert_vtime() to return bool instead of void. With sub-schedulers, there will be no reliable way to guarantee a task is still owned by the sub-scheduler at insertion time (e.g., the task may have been migrated to another scheduler). The bool return value will enable sub-schedulers to detect and gracefully handle insertion failures.
For the root scheduler, insertion failures will continue to trigger scheduler abort via scx_error(), so existing code doesn't need to check the return value. Backward compatibility is maintained through compat wrappers.
Also update scx_bpf_dsq_move() documentation to clarify that it can return false for sub-schedulers when @dsq_id points to a disallowed local DSQ.
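For a sub-scheduler, checking the new return value could look like the hedged sketch below; the DSQ id, counter, and failure handling are illustrative only, and the compat renames handled by the newer commits above are ignored here:

  #define SHARED_DSQ 0                    /* illustrative DSQ id */

  static u64 nr_failed_inserts;

  void BPF_STRUCT_OPS(sketch_enqueue, struct task_struct *p, u64 enq_flags)
  {
          if (!scx_bpf_dsq_insert(p, SHARED_DSQ, SCX_SLICE_DFL, enq_flags)) {
                  /*
                   * The task no longer belongs to this sub-scheduler; count
                   * the miss and move on instead of aborting.
                   */
                  __sync_fetch_and_add(&nr_failed_inserts, 1);
          }
  }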
Reviewed-by: Changwoo Min <changwoo@igalia.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| c0d630ba | 07-Oct-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext: Wrap kfunc args in struct to prepare for aux__prog
scx_bpf_dsq_insert_vtime() and scx_bpf_select_cpu_and() currently have 5 parameters. An upcoming change will add an aux__prog parameter, which would exceed BPF's 5-argument limit.
Prepare by adding new kfuncs __scx_bpf_dsq_insert_vtime() and __scx_bpf_select_cpu_and() that take args structs. The existing kfuncs are kept as compatibility wrappers. BPF programs use inline wrappers that detect kernel API version via bpf_core_type_exists() and use the new struct-based kfuncs when available, falling back to compat kfuncs otherwise. This allows BPF programs to work with both old and new kernels.
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| 3035addf | 07-Oct-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext: Add scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime()
With the planned hierarchical scheduler support, sub-schedulers will need to be verified for authority before being allowed to modify task->scx.slice and task->scx.dsq_vtime. Add scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() which will perform the necessary permission checks.
Root schedulers can still directly write to these fields, so this doesn't affect existing schedulers.
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| 7852e0fd | 23-Sep-2025 |
Tejun Heo <tj@kernel.org> |
tools/sched_ext: scx_qmap: Make debug output quieter by default
scx_qmap currently outputs verbose debug messages including cgroup operations and CPU online/offline events by default, which can be noisy during normal operation. While the existing -P option controls DSQ dumps and event statistics, there's no way to suppress the other debug messages.
Split the debug output controls to make scx_qmap quieter by default. The -P option continues to control DSQ dumps and event statistics (print_dsqs_and_events), while a new -M option controls debug messages like cgroup operations and CPU events (print_msgs). This allows users to run scx_qmap with minimal output and selectively enable debug information as needed.
Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| d4529728 | 23-Sep-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext: Make qmap dump operation non-destructive
The qmap dump operation was destructively consuming queue entries while displaying them. As dump can be triggered anytime, this can easily lead to stalls. Add a temporary dump_store queue and modify the dump logic to pop entries, display them, and then restore them back to the original queue. This allows dump operations to be performed without affecting the scheduler's queue state.
Note that if racing against new enqueues during dump, ordering can get mixed up, but this is acceptable for debugging purposes.
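A hedged sketch of the pop-print-restore pattern with BPF queue maps; the map size, names, and the use of bpf_printk() instead of the dump helpers are illustrative, not scx_qmap's actual code:

  #define SKETCH_QUEUE_LEN 4096

  /* Side queue that holds entries while they are being printed. */
  struct {
          __uint(type, BPF_MAP_TYPE_QUEUE);
          __uint(max_entries, SKETCH_QUEUE_LEN);
          __type(value, u32);
  } dump_store SEC(".maps");

  static void sketch_dump_queue(void *queue)
  {
          u32 pid;
          int i;

          /* Drain the queue into dump_store, printing each entry. */
          for (i = 0; i < SKETCH_QUEUE_LEN; i++) {
                  if (bpf_map_pop_elem(queue, &pid))
                          break;
                  bpf_printk("  pid=%u", pid);
                  bpf_map_push_elem(&dump_store, &pid, 0);
          }

          /* Put everything back; ordering may shift if enqueues race with us. */
          for (i = 0; i < SKETCH_QUEUE_LEN; i++) {
                  if (bpf_map_pop_elem(&dump_store, &pid))
                          break;
                  bpf_map_push_elem(queue, &pid, 0);
          }
  }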
Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| 20b15809 | 03-Sep-2025 |
Christian Loehle <christian.loehle@arm.com> |
sched_ext: Introduce scx_bpf_cpu_curr()
Provide scx_bpf_cpu_curr() as a way for scx schedulers to check the curr task of a remote rq without assuming its lock is held.
Many scx schedulers make use of scx_bpf_cpu_rq() to check a remote curr (e.g. to see if it should be preempted). This is problematic because scx_bpf_cpu_rq() provides access to all fields of struct rq, most of which aren't safe to use without holding the associated rq lock.
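A hedged usage sketch; whether the returned pointer requires an explicit RCU read section, and the vtime comparison itself, are assumptions for illustration:

  void bpf_rcu_read_lock(void) __ksym;
  void bpf_rcu_read_unlock(void) __ksym;
  struct task_struct *scx_bpf_cpu_curr(s32 cpu) __ksym __weak;

  static inline bool sketch_remote_curr_has_larger_vtime(s32 cpu, u64 my_vtime)
  {
          struct task_struct *curr;
          bool ret = false;

          bpf_rcu_read_lock();
          curr = scx_bpf_cpu_curr(cpu);
          if (curr)
                  ret = curr->scx.dsq_vtime > my_vtime;
          bpf_rcu_read_unlock();

          return ret;
  }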
Signed-off-by: Christian Loehle <christian.loehle@arm.com> Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| e0ca1696 | 03-Sep-2025 |
Christian Loehle <christian.loehle@arm.com> |
sched_ext: Introduce scx_bpf_locked_rq()
Most fields of the rq returned by scx_bpf_cpu_rq() assume that the rq's lock is held, and they become meaningless without the rq lock, too. Make a safer version of scx_bpf_cpu_rq() that only returns a rq if we hold the rq lock of that rq.
Also mark the new scx_bpf_locked_rq() as possibly returning NULL, as scx_bpf_cpu_rq() should have been, too.
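A brief hedged usage sketch:

  struct rq *scx_bpf_locked_rq(void) __ksym __weak;

  static inline u64 sketch_locked_rq_nr_running(void)
  {
          struct rq *rq = scx_bpf_locked_rq();

          if (!rq)        /* this callback doesn't hold any rq lock */
                  return 0;
          return rq->nr_running;
  }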
Signed-off-by: Christian Loehle <christian.loehle@arm.com> Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| f203683c | 18-Apr-2025 |
Honglei Wang <jameshongleiwang@126.com> |
sched_ext: change the variable name for slice refill event
SCX_EV_ENQ_SLICE_DFL gives the impression that the event only occurs when tasks are enqueued, which is not accurate. What it actually counts is refilling with the default slice, which can occur either at enqueue or at pick_task. Rename the variable to SCX_EV_REFILL_SLICE_DFL.
Signed-off-by: Honglei Wang <jameshongleiwang@126.com> Acked-by: Changwoo Min <changwoo@igalia.com> Acked-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|
| 6d65f682 | 14-Apr-2025 |
yangsonghua <jluyangsonghua@gmail.com> |
sched_ext: Improve cross-compilation support in Makefile
Modify the tools/sched_ext/Makefile to better handle cross-compilation environments by:
1. Fix host tools build directory structure by separating obj/ from output (HOST_BUILD_DIR now points to $(OBJ_DIR)/host/obj)
2. Properly propagate CROSS_COMPILE to libbpf sub-make invocation
3. Add missing $(HOST_BPFOBJ) build rule with proper host toolchain flags (ARCH=, CROSS_COMPILE=, explicit HOSTCC/HOSTLD)
4. Consistently quote $(HOSTCC) in bpftool build rule
5. Change LDFLAGS assignment to += to allow external extensions
The changes ensure proper cross-compilation behavior while maintaining backward compatibility with native builds. Host tools are now correctly built with the host toolchain while target binaries use the cross-toolchain.
Signed-off-by: yangsonghua <yangsonghua@lixiang.com> Signed-off-by: Tejun Heo <tj@kernel.org>
|