Revision tags: v6.16-rc7 |
|
#
bf61759d |
| 19-Jul-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'sched_ext-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix handling of migration disabled tasks in default id
Merge tag 'sched_ext-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix handling of migration disabled tasks in default idle selection
- update_locked_rq() called __this_cpu_write() spuriously with NULL when @rq was not locked. As the writes were spurious, it didn't break anything directly. However, the function could be called in a preemptible leading to a context warning in __this_cpu_write(). Skip the spurious NULL writes.
- Selftest fix on UP
* tag 'sched_ext-for-6.16-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: idle: Handle migration-disabled tasks in idle selection sched/ext: Prevent update_locked_rq() calls with NULL rq selftests/sched_ext: Fix exit selftest hang on UP
show more ...
|
Revision tags: v6.16-rc6, v6.16-rc5 |
|
#
06efc9fe |
| 05-Jul-2025 |
Andrea Righi <arighi@nvidia.com> |
sched_ext: idle: Handle migration-disabled tasks in idle selection
When SCX_OPS_ENQ_MIGRATION_DISABLED is enabled, migration-disabled tasks are also routed to ops.enqueue(). A scheduler may attempt
sched_ext: idle: Handle migration-disabled tasks in idle selection
When SCX_OPS_ENQ_MIGRATION_DISABLED is enabled, migration-disabled tasks are also routed to ops.enqueue(). A scheduler may attempt to dispatch such tasks directly to an idle CPU using the default idle selection policy via scx_bpf_select_cpu_and() or scx_bpf_select_cpu_dfl().
This scenario must be properly handled by the built-in idle policy to avoid returning an idle CPU where the target task isn't allowed to run. Otherwise, it can lead to errors such as:
EXIT: runtime error (SCX_DSQ_LOCAL[_ON] cannot move migration disabled Chrome_ChildIOT[291646] from CPU 3 to 14)
Prevent this by explicitly handling migration-disabled tasks in the built-in idle selection logic, maintaining their CPU affinity.
Fixes: a730e3f7a48bc ("sched_ext: idle: Consolidate default idle CPU selection kfuncs") Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
Revision tags: v6.16-rc4, v6.16-rc3, v6.16-rc2, v6.16-rc1 |
|
#
16b70698 |
| 04-Jun-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'sched_ext-for-6.16-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo: "Two fixes in the built-in idle selection helpers:
-
Merge tag 'sched_ext-for-6.16-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo: "Two fixes in the built-in idle selection helpers:
- Fix prev_cpu handling to guarantee that idle selection never returns a CPU that's not allowed
- Skip cross-node search when !NUMA which could lead to infinite looping due to a bug in NUMA iterator"
* tag 'sched_ext-for-6.16-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: idle: Skip cross-node search with !CONFIG_NUMA sched_ext: idle: Properly handle invalid prev_cpu during idle selection
show more ...
|
#
9960be72 |
| 03-Jun-2025 |
Andrea Righi <arighi@nvidia.com> |
sched_ext: idle: Skip cross-node search with !CONFIG_NUMA
In the idle CPU selection logic, attempting cross-node searches adds unnecessary complexity when CONFIG_NUMA is disabled.
Since there's no
sched_ext: idle: Skip cross-node search with !CONFIG_NUMA
In the idle CPU selection logic, attempting cross-node searches adds unnecessary complexity when CONFIG_NUMA is disabled.
Since there's no meaningful concept of nodes in this case, simplify the logic by restricting the idle CPU search to the current node only.
Fixes: 48849271e6611 ("sched_ext: idle: Per-node idle cpumasks") Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
#
ee9a4e92 |
| 30-May-2025 |
Andrea Righi <arighi@nvidia.com> |
sched_ext: idle: Properly handle invalid prev_cpu during idle selection
The default idle selection policy doesn't properly handle the case where @prev_cpu is not part of the task's allowed CPUs.
In
sched_ext: idle: Properly handle invalid prev_cpu during idle selection
The default idle selection policy doesn't properly handle the case where @prev_cpu is not part of the task's allowed CPUs.
In this situation, it may return an idle CPU that is not usable by the task, breaking the assumption that the returned CPU must always be within the allowed cpumask, causing inefficiencies or even stalls in certain cases.
This issue can arise in the following cases:
- The task's affinity may have changed by the time the function is invoked, especially now that the idle selection logic can be used from multiple contexts (i.e., BPF test_run call).
- The BPF scheduler may provide a @prev_cpu that is not part of the allowed mask, either unintentionally or as a placement hint. In fact @prev_cpu may not necessarily refer to the CPU the task last ran on, but it can also be considered as a target CPU that the scheduler wishes to use for the task.
Therefore, enforce the right behavior by always checking whether @prev_cpu is in the allowed mask, when using scx_bpf_select_cpu_and(), and it's also usable by the task (@p->cpus_ptr). If it is not, try to find a valid CPU nearby @prev_cpu, following the usual locality-aware fallback path (SMT, LLC, node, allowed CPUs).
This ensures the returned CPU is always allowed, improving robustness to affinity changes and invalid scheduler hints, while preserving locality as much as possible.
Fixes: a730e3f7a48bc ("sched_ext: idle: Consolidate default idle CPU selection kfuncs") Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
Revision tags: v6.15 |
|
#
4d4eb387 |
| 19-May-2025 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up changes for other tools/ libraries used by perf and for header synchronization with the kernel sources originals.
Sign
Merge remote-tracking branch 'torvalds/master' into perf-tools-next
To pick up changes for other tools/ libraries used by perf and for header synchronization with the kernel sources originals.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
show more ...
|
#
4f978603 |
| 02-Jun-2025 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge branch 'next' into for-linus
Prepare input updates for 6.16 merge window.
|
Revision tags: v6.15-rc7 |
|
#
d51b9d81 |
| 16-May-2025 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v6.15-rc6' into next
Sync up with mainline to bring in xpad controller changes.
|
#
ef223385 |
| 26-May-2025 |
Jason Gunthorpe <jgg@nvidia.com> |
Merge tag 'v6.15' into rdma.git for-next
Following patches need the RDMA rc branch since we are past the RC cycle now.
Merge conflicts resolved based on Linux-next:
- For RXE odp changes keep for-
Merge tag 'v6.15' into rdma.git for-next
Following patches need the RDMA rc branch since we are past the RC cycle now.
Merge conflicts resolved based on Linux-next:
- For RXE odp changes keep for-next version and fixup new places that need to call is_odp_mr() https://lore.kernel.org/r/20250422143019.500201bd@canb.auug.org.au https://lore.kernel.org/r/20250514122455.3593b083@canb.auug.org.au
- irdma is keeping the while/kfree bugfix from -rc and the pf/cdev_info change from for-next https://lore.kernel.org/r/20250513130630.280ee6c5@canb.auug.org.au
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
#
9c7dcf4c |
| 23-May-2025 |
Wolfram Sang <wsa+renesas@sang-engineering.com> |
Merge tag 'i2c-host-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-mergewindow
i2c-host updates for v6.16
Cleanups and refactorings - Many drivers switched to
Merge tag 'i2c-host-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-mergewindow
i2c-host updates for v6.16
Cleanups and refactorings - Many drivers switched to dev_err_probe() - Generic cleanups applied to designware, iproc, ismt, mlxbf, npcm7xx, qcom-geni, pasemi, and thunderx - davinci: declare I2C mangling support among I2C features - designware: clean up DTS handling - designware: fix PM runtime on driver unregister - imx: improve error logging during probe - lpc2k: improve checks in probe error path - xgene-slimpro: improve PCC shared memory handling - pasemi: improve error handling in reset, smbus clear, timeouts - tegra: validate buffer length during transfers - wmt: convert binding to YAML format
Improvements and extended support: - microchip-core: add SMBus support - mlxbf: add support for repeated start in block transfers - mlxbf: improve timer configuration - npcm: attempt clock toggle recovery before failing init - octeon: add support for block mode operations - pasemi: add support for unjam device feature - riic: add support for bus recovery
New device support: - MediaTek Dimensity 1200 (MT6893) - Sophgo SG2044 - Renesas RZ/V2N (R9A09G056) - Rockchip RK3528 - AMD ISP (new driver)
Misc changes: - core: add support for Write Disable-aware SPD
show more ...
|
#
85502b22 |
| 26-May-2025 |
Paolo Bonzini <pbonzini@redhat.com> |
Merge tag 'loongarch-kvm-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.16
1. Don't flush tlb if HW PTW supported. 2. Add Lo
Merge tag 'loongarch-kvm-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD
LoongArch KVM changes for v6.16
1. Don't flush tlb if HW PTW supported. 2. Add LoongArch KVM selftests support.
show more ...
|
#
feacb177 |
| 28-May-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'sched_ext-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- More in-kernel idle CPU selection improvements. Expand topolog
Merge tag 'sched_ext-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext updates from Tejun Heo:
- More in-kernel idle CPU selection improvements. Expand topology awareness coverage add scx_bpf_select_cpu_and() to allow more flexibility. The idle CPU selection kfuncs can now be called from unlocked contexts too.
- A bunch of reorganization changes to lay the foundation for multiple hierarchical scheduler support. This isn't ready yet and the included changes don't make meaningful behavior differences. One notable change is replacing some static_key tests with dynamic tests as the test results may differ depending on the scheduler instance. This isn't expected to cause meaningful performance difference.
- Other minor and doc updates.
- There were multiple patches in for-6.15-fixes which conflicted with changes in for-6.16. for-6.15-fixes were pulled three times into for-6.16 to resolve the conflicts.
* tag 'sched_ext-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (49 commits) sched_ext: Call ops.update_idle() after updating builtin idle bits sched_ext, docs: convert mentions of "CFS" to "fair-class scheduler" selftests/sched_ext: Update test enq_select_cpu_fails sched_ext: idle: Consolidate default idle CPU selection kfuncs selftests/sched_ext: Add test for scx_bpf_select_cpu_and() via test_run sched_ext: idle: Allow scx_bpf_select_cpu_and() from unlocked context sched_ext: idle: Validate locking correctness in scx_bpf_select_cpu_and() sched_ext: Make scx_kf_allowed_if_unlocked() available outside ext.c sched_ext, docs: add label sched_ext: Explain the temporary situation around scx_root dereferences sched_ext: Add @sch to SCX_CALL_OP*() sched_ext: Cleanup [__]scx_exit/error*() sched_ext: Add @sch to SCX_CALL_OP*() sched_ext: Clean up scx_root usages Documentation: scheduler: Changed lowercase acronyms to uppercase sched_ext: Avoid NULL scx_root deref in __scx_exit() sched_ext: Add RCU protection to scx_root in DSQ iterator sched_ext: Clean up SCX_EXIT_NONE handling in scx_disable_workfn() sched_ext: Move disable machinery into scx_sched sched_ext: Move event_stats_cpu into scx_sched ...
show more ...
|
#
273cc949 |
| 22-May-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext: Call ops.update_idle() after updating builtin idle bits
BPF schedulers that use both builtin CPU idle mechanism and ops.update_idle() may want to use the latter to create interlocking bet
sched_ext: Call ops.update_idle() after updating builtin idle bits
BPF schedulers that use both builtin CPU idle mechanism and ops.update_idle() may want to use the latter to create interlocking between ops.enqueue() and CPU idle transitions so that either ops.enqueue() sees the idle bit or ops.update_idle() sees the task queued somewhere. This can prevent race conditions where CPUs go idle while tasks are waiting in DSQs.
For such interlocking to work, ops.update_idle() must be called after builtin CPU masks are updated. Relocate the invocation. Currently, there are no ordering requirements on transitions from idle and this relocation isn't expected to make meaningful differences in that direction.
This also makes the ops.update_idle() behavior semantically consistent: any action performed in this callback should be able to override the builtin idle state, not the other way around.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-and-tested-by: Andrea Righi <arighi@nvidia.com> Acked-by: Changwoo Min <changwoo@igalia.com>
show more ...
|
#
a730e3f7 |
| 21-May-2025 |
Andrea Righi <arighi@nvidia.com> |
sched_ext: idle: Consolidate default idle CPU selection kfuncs
There is no reason to restrict scx_bpf_select_cpu_dfl() invocations to ops.select_cpu() while allowing scx_bpf_select_cpu_and() to be u
sched_ext: idle: Consolidate default idle CPU selection kfuncs
There is no reason to restrict scx_bpf_select_cpu_dfl() invocations to ops.select_cpu() while allowing scx_bpf_select_cpu_and() to be used from multiple contexts, as both provide equivalent functionality, with the latter simply accepting an additional "allowed" cpumask.
Therefore, unify the two APIs, enabling both kfuncs to be used from ops.select_cpu(), ops.enqueue(), and unlocked contexts (e.g., via BPF test_run).
This allows schedulers to implement a consistent idle CPU selection policy and helps reduce code duplication.
Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
#
4ac760bd |
| 15-May-2025 |
Andrea Righi <arighi@nvidia.com> |
sched_ext: idle: Allow scx_bpf_select_cpu_and() from unlocked context
Allow scx_bpf_select_cpu_and() to be used from an unlocked context, in addition to ops.enqueue() or ops.select_cpu().
This enab
sched_ext: idle: Allow scx_bpf_select_cpu_and() from unlocked context
Allow scx_bpf_select_cpu_and() to be used from an unlocked context, in addition to ops.enqueue() or ops.select_cpu().
This enables schedulers, including user-space ones, to implement a consistent idle CPU selection policy and helps reduce code duplication.
Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
#
686d1337 |
| 15-May-2025 |
Andrea Righi <arighi@nvidia.com> |
sched_ext: idle: Validate locking correctness in scx_bpf_select_cpu_and()
Validate locking correctness when accessing p->nr_cpus_allowed and p->cpus_ptr inside scx_bpf_select_cpu_and(): if the rq lo
sched_ext: idle: Validate locking correctness in scx_bpf_select_cpu_and()
Validate locking correctness when accessing p->nr_cpus_allowed and p->cpus_ptr inside scx_bpf_select_cpu_and(): if the rq lock is held, access is safe; otherwise, require that p->pi_lock is held.
This allows to catch potential unsafe calls to scx_bpf_select_cpu_and().
Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|
#
bebd7b26 |
| 15-May-2025 |
Jakub Kicinski <kuba@kernel.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc7).
Conflicts:
tools/testing/selftests/drivers/net/hw/ncdevmem.c 97c4e
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.15-rc7).
Conflicts:
tools/testing/selftests/drivers/net/hw/ncdevmem.c 97c4e094a4b2 ("tests/ncdevmem: Fix double-free of queue array") 2f1a805f32ba ("selftests: ncdevmem: Implement devmem TCP TX") https://lore.kernel.org/20250514122900.1e77d62d@canb.auug.org.au
Adjacent changes:
net/core/devmem.c net/core/devmem.h 0afc44d8cdf6 ("net: devmem: fix kernel panic when netlink socket close after module unload") bd61848900bf ("net: devmem: Implement TX path")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
a8433f7a |
| 14-May-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext: Add @sch to SCX_CALL_OP*()
In preparation of hierarchical scheduling support, add @sch to scx_exit() and friends:
- scx_exit/error() updated to take explicit @sch instead of assuming s
sched_ext: Add @sch to SCX_CALL_OP*()
In preparation of hierarchical scheduling support, add @sch to scx_exit() and friends:
- scx_exit/error() updated to take explicit @sch instead of assuming scx_root.
- scx_kf_exit/error() added. These are to be used from kfuncs, don't take @sch and internally determine the scx_sched instance to abort. Currently, it's always scx_root but once multiple scheduler support is in place, it will be the scx_sched instance that invoked the kfunc. This simplifies many callsites and defers scx_sched lookup until error is triggered.
- @sch is propagated to ops_cpu_valid() and ops_sanitize_err(). The CPU validity conditions in ops_cpu_valid() are factored into __cpu_valid() to implement kf_cpu_valid() which is the counterpart to scx_kf_exit/error().
- All users are converted. Most conversions are straightforward. check_rq_for_timeouts() and scx_softlockup() are updated to use explicit rcu_dereference*(scx_root) for safety as they may execute asynchronous to the exit path. scx_tick() is also updated to use rcu_dereference(). While not strictly necessary due to the preceding scx_enabled() test and IRQ disabled context, this removes the subtlety at no noticeable cost.
No behavior changes intended.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
#
ab3f497a |
| 14-May-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext: Add @sch to SCX_CALL_OP*()
In preparation of hierarchical scheduling support, make SCX_CALL_OP*() take explicit @sch instead of assuming scx_root. As scx_root is still the only scheduler
sched_ext: Add @sch to SCX_CALL_OP*()
In preparation of hierarchical scheduling support, make SCX_CALL_OP*() take explicit @sch instead of assuming scx_root. As scx_root is still the only scheduler instance, this patch doesn't make any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
#
d310fb40 |
| 14-May-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext: Clean up scx_root usages
- Always cache scx_root into local variable sch before using.
- Don't use scx_root if cached sch is available.
- Wrap !sch test with unlikely().
- Pass @scx in
sched_ext: Clean up scx_root usages
- Always cache scx_root into local variable sch before using.
- Don't use scx_root if cached sch is available.
- Wrap !sch test with unlikely().
- Pass @scx into scx_cgroup_init/exit().
No behavior changes intended.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
show more ...
|
Revision tags: v6.15-rc6, v6.15-rc5 |
|
#
844e31bb |
| 29-Apr-2025 |
Rob Clark <robdclark@chromium.org> |
Merge remote-tracking branch 'drm-misc/drm-misc-next' into msm-next
Merge drm-misc-next to get commit Fixes: fec450ca15af ("drm/display: hdmi: provide central data authority for ACR params").
Signe
Merge remote-tracking branch 'drm-misc/drm-misc-next' into msm-next
Merge drm-misc-next to get commit Fixes: fec450ca15af ("drm/display: hdmi: provide central data authority for ACR params").
Signed-off-by: Rob Clark <robdclark@chromium.org>
show more ...
|
#
48e12677 |
| 29-Apr-2025 |
Tejun Heo <tj@kernel.org> |
sched_ext: Introduce scx_sched
To support multiple scheduler instances, collect some of the global variables that should be specific to a scheduler instance into the new struct scx_sched. scx_root i
sched_ext: Introduce scx_sched
To support multiple scheduler instances, collect some of the global variables that should be specific to a scheduler instance into the new struct scx_sched. scx_root is the root scheduler instance and points to a static instance of struct scx_sched. Except for an extra dereference through the scx_root pointer, this patch makes no functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com> Acked-by: Changwoo Min <changwoo@igalia.com>
show more ...
|
Revision tags: v6.15-rc4 |
|
#
3ab7ae8e |
| 24-Apr-2025 |
Thomas Hellström <thomas.hellstrom@linux.intel.com> |
Merge drm/drm-next into drm-xe-next
Backmerge to bring in linux 6.15-rc.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
|
#
ac47c272 |
| 22-Apr-2025 |
Tejun Heo <tj@kernel.org> |
Merge branch 'for-6.15-fixes' into for-6.16
a11d6784d731 ("sched_ext: Fix missing rq lock in scx_bpf_cpuperf_set()") added a call to scx_ops_error() which was renamed to scx_error() in for-6.16. Fix
Merge branch 'for-6.15-fixes' into for-6.16
a11d6784d731 ("sched_ext: Fix missing rq lock in scx_bpf_cpuperf_set()") added a call to scx_ops_error() which was renamed to scx_error() in for-6.16. Fix it up.
show more ...
|
Revision tags: v6.15-rc3, v6.15-rc2, v6.15-rc1 |
|
#
683d2d0f |
| 05-Apr-2025 |
Andrea Righi <arighi@nvidia.com> |
sched_ext: idle: Introduce scx_bpf_select_cpu_and()
Provide a new kfunc, scx_bpf_select_cpu_and(), that can be used to apply the built-in idle CPU selection policy to a subset of allowed CPU.
This
sched_ext: idle: Introduce scx_bpf_select_cpu_and()
Provide a new kfunc, scx_bpf_select_cpu_and(), that can be used to apply the built-in idle CPU selection policy to a subset of allowed CPU.
This new helper is basically an extension of scx_bpf_select_cpu_dfl(). However, when an idle CPU can't be found, it returns a negative value instead of @prev_cpu, aligning its behavior more closely with scx_bpf_pick_idle_cpu().
It also accepts %SCX_PICK_IDLE_* flags, which can be used to enforce strict selection to @prev_cpu's node (%SCX_PICK_IDLE_IN_NODE), or to request only a full-idle SMT core (%SCX_PICK_IDLE_CORE), while applying the built-in selection logic.
With this helper, BPF schedulers can apply the built-in idle CPU selection policy restricted to any arbitrary subset of CPUs.
Example usage =============
Possible usage in ops.select_cpu():
s32 BPF_STRUCT_OPS(foo_select_cpu, struct task_struct *p, s32 prev_cpu, u64 wake_flags) { const struct cpumask *cpus = task_allowed_cpus(p) ?: p->cpus_ptr; s32 cpu;
cpu = scx_bpf_select_cpu_and(p, prev_cpu, wake_flags, cpus, 0); if (cpu >= 0) { scx_bpf_dsq_insert(p, SCX_DSQ_LOCAL, SCX_SLICE_DFL, 0); return cpu; }
return prev_cpu; }
Results =======
Load distribution on a 4 sockets, 4 cores per socket system, simulated using virtme-ng, running a modified version of scx_bpfland that uses scx_bpf_select_cpu_and() with 0xff00 as the allowed subset of CPUs:
$ vng --cpu 16,sockets=4,cores=4,threads=1 ... $ stress-ng -c 16 ... $ htop ... 0[ 0.0%] 8[||||||||||||||||||||||||100.0%] 1[ 0.0%] 9[||||||||||||||||||||||||100.0%] 2[ 0.0%] 10[||||||||||||||||||||||||100.0%] 3[ 0.0%] 11[||||||||||||||||||||||||100.0%] 4[ 0.0%] 12[||||||||||||||||||||||||100.0%] 5[ 0.0%] 13[||||||||||||||||||||||||100.0%] 6[ 0.0%] 14[||||||||||||||||||||||||100.0%] 7[ 0.0%] 15[||||||||||||||||||||||||100.0%]
With scx_bpf_select_cpu_dfl() tasks would be distributed evenly across all the available CPUs.
Signed-off-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Tejun Heo <tj@kernel.org>
show more ...
|