History log of /freebsd/sys/kern/sched_ule.c (Results 1 – 25 of 823)
Revision  Date  Author  Comments
# df114dae 03-Jul-2025 Ruslan Bukin <br@FreeBSD.org>

Import the Hardware Trace (HWT) framework.

The HWT framework provides infrastructure for hardware-assisted tracing. It
collects detailed information about software execution and records it as
"events" in highly compressed format into DRAM. The events cover information
about control flow changes of a program, whether branches taken or not,
exceptions taken, timing information, cycles elapsed and more. This allows
to reconstruct entire program flow of a given application.

This comes with separate machine-dependent tracing backends for trace
collection, trace decoder libraries and an instrumentation tool.

Reviewed by: kib (sys/kern bits)
Sponsored by: UKRI
Differential Revision: https://reviews.freebsd.org/D40466


# 013c58ce 18-Jun-2025 Olivier Certner <olce@FreeBSD.org>

sched_ule: 32-bit platforms: Fix runq_print() after runq changes

The compiler would report a mismatch between the format and the actual
type of the runqueue status word because the latter is now
unconditionally defined as an 'unsigned long' (which has the "natural"
platform size) and the format expects a 'size_t', which expands to an
'unsigned int' on 32-bit platforms (although they are both of the same
actual size).

This worked before as the C type used depended on the architecture and
was set to 'uint32_t' aka 'unsigned int' on these 32-bit platforms.

Just fix the format (use 'l'). While here, stop outputting '0x' by
hand and rely on '#' instead (the only difference is for 0, which is
fine).

runq_print() should be moved out of 'sched_ule.c' in a subsequent
commit.

Reported by: Jenkins
Fixes: 79d8a99ee583 ("runq: Deduce most parameters, remove machine headers")
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
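
The mismatch and the fix can be pictured with a minimal userspace
sketch (illustrative values, not the committed code):

    #include <stdio.h>

    int
    main(void)
    {
            unsigned long sw = 0x2a;    /* stand-in for the status word */

            /* Before (conceptually): printf("0x%zx", sw) -- on 32-bit
             * platforms 'size_t' is 'unsigned int', so the compiler
             * warns even though the sizes match there. */
            printf("%#lx\n", sw);       /* 'l' matches; '#' adds "0x" */
            return (0);
    }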


Revision tags: release/14.3.0, release/13.4.0-p5, release/13.5.0-p1, release/14.2.0-p3, release/13.5.0, release/14.2.0-p2, release/14.1.0-p8, release/13.4.0-p4, release/14.1.0-p7, release/14.2.0-p1, release/13.4.0-p3, release/14.2.0, release/13.4.0
# a33225ef 25-Jun-2024 Olivier Certner <olce@FreeBSD.org>

sched_ule: Sanitize CPU's use and priority computations, and ticks storage

Computation of %CPU in sched_pctcpu() was overly complicated, wrong in
the case of a non-maximal window (10 seconds span; this is always the
case in practice as the window would oscillate between 10 and 11 seconds
for continuously running processes) and performed unshifted for the
first part, essentially losing precision (up to 9% for SCHED_TICK_SECS
being 10), and with some ineffective shift for the second part.
Conserve maximum precision by shifting only by the required amount to
attain FSHIFT before dividing. Apply classical rounding to nearest
instead of rounding down.

To generally avoid wraparound problems with tick fields in 'struct
td_sched' (as already happened once in sched_pctcpu_update()), make them
all unsigned, and ensure 'ticks' is always converted to some 'u_int'.
While here, fix SCHED_AFFINITY().

Rewrite sched_pctcpu_update() while keeping the existing formulas:
- Fix the hole in the cliff case where, in theory, 'ts_ticks' can
become greater than the window size if a running thread has not been
accounted for in too long (today this cannot happen because of
sched_clock()).
- Make the decay ratio explicit and configurable (SCHED_CPU_DECAY_NUMER,
SCHED_CPU_DECAY_DENOM). Set it to the current value (10/11),
currently producing a 95% attenuation after about ~32s. This eases
experimenting with changing it. Apply the ratio on shifted ticks for
better precision, independently of the chosen value for
SCHED_TICK_MAX/SCHED_TICK_SECS.
- Remove redundant SCHED_TICK_TARG. Compute SCHED_TICK_MAX from
SCHED_TICK_SECS, the latter now really specifying the maximum size of
the %CPU estimation window.
- Ensure it is immune to varying 'hz' (which today can't happen), so
that, after computation, SCHED_TICK_RUN(ts) is mathematically
guaranteed to be lower than SCHED_TICK_LENGTH(ts).
- Thoroughly explain the current formula, and mention its main drawback
(it is completely dependent on the frequency of calls to
sched_pctcpu_update(), which currently manifests itself for sleeping
threads).

Rework sched_priority():
- Ensure 'p_nice' is read only once, to be immune to a concurrent
change.
- Clearly show that the computed priority is the sum of 3 components.
Make them all positive by shifting the starting priority and shifting
the nice value in SCHED_PRI_NICE().
- Compute the priority offset deriving from the %CPU with rounding to
nearest.
- Much more informative KASSERT() output with details regarding the
priority computation.

MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46567
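
A minimal userspace sketch of the two numeric ideas above,
round-to-nearest division and the 10/11 decay ratio; FSHIFT's value and
the other constants here are illustrative assumptions, not the kernel's:

    #include <stdio.h>

    #define FSHIFT 11                   /* fixed-point fraction bits */
    #define FSCALE (1 << FSHIFT)
    #define DECAY_NUMER 10              /* decay ratio from the commit */
    #define DECAY_DENOM 11

    /* Divide with classical rounding to nearest, not rounding down. */
    static unsigned int
    div_round_nearest(unsigned int n, unsigned int d)
    {
            return ((n + d / 2) / d);
    }

    int
    main(void)
    {
            unsigned int run = 37, window = 100; /* ticks, illustrative */

            /* %CPU as a fixed-point fraction: shift first, then divide. */
            unsigned int pct = div_round_nearest(run << FSHIFT, window);

            /* One decay step, applied on the tick count. */
            unsigned int decayed = run * DECAY_NUMER / DECAY_DENOM;

            printf("pct = %u/%u, decayed = %u\n", pct, FSCALE, decayed);
            return (0);
    }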


# 6792f341 17-Jun-2024 Olivier Certner <olce@FreeBSD.org>

sched_ule: Recover previous nice and anti-starvation behaviors

The justification for this change is to avoid disturbing ULE's behavior
too much at this time. We acknowledge, however, that the effect of
"nice" values is extremely weak, and we will most probably change it
going forward.

This tuning mostly recovers ULE's behavior prior to the switch to
a single 256-queue runqueue and the increase of the timesharing priority
levels' range.

After this change, in a series of tests involving two long-running
processes with varying nice values competing for the same CPU, we
observe that the ratios of CPU time used by the highest-priority process
change by at most 1.15% and on average by 0.46% (absolute differences).
In relative differences, they change by at most 2% and on average by
0.78%.

In order to preserve these ratios, as the number of priority levels
allotted to timesharing has been raised from 136 to 168 (and the subsets
of them dedicated to either interactive or batch threads scaled
accordingly), we keep the ratio of levels reserved for handling nice
values to those reserved for CPU usage by applying a factor of 5/4
(which is close to 168/136).

Time-based advance of the timesharing circular queue's head is ULE's
main fairness and anti-starvation mechanism. The higher number of
queues subject to the timesharing scheduling policy is now compensated
by allowing a greater increment of the head offset per tick. Because
there are now 109 queue levels dedicated to the timesharing scheduling
policy (in contrast with the 168 levels allotted to timesharing as a
whole, which include the former but also those dedicated to threads
considered interactive) whereas there previously were 64 (priorities spread
into a single, separate runqueue), we advance the circular queue's head
7/4 faster (a ratio close to 109/64).

While here, take 'cnt', the number of elapsed ticks, into account when
advancing the circular queue's head. This fix depends on the other code
changes enabling incrementation by more than one.

MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46566
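
A toy sketch of the compensated head advance (the 109 levels and the
7/4 ratio are from the commit text; the names and the modulo handling
are illustrative assumptions):

    #include <stdio.h>

    #define TS_LEVELS 109       /* timesharing queue levels */

    int
    main(void)
    {
            unsigned int head = 0, cnt = 4;     /* 'cnt': elapsed ticks */

            /* Advance roughly 7/4 queue levels per tick instead of 1. */
            head = (head + cnt * 7 / 4) % TS_LEVELS;
            printf("head advanced to %u\n", head);      /* prints 7 */
            return (0);
    }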


Revision tags: release/14.1.0
# baecdea1 14-May-2024 Olivier Certner <olce@FreeBSD.org>

sched_ule: Use a single runqueue per CPU

Previously, ULE would use 3 separate runqueues per CPU to store threads,
one for each of its selection policies, which are realtime, timesharing
and idle. They would be examined in this order, and the first thread
found would be the one selected.

This choice indeed appears as the easiest evolution from the single
runqueue used by sched_4bsd (4BSD): It allows sharing most of the same
runqueue code, which currently defines 64 levels per runqueue, while
multiplying the number of levels (by 3). However, it has several
important drawbacks:

1. The number of levels is the same for each selection policy. 64 is
unnecessarily large for the idle policy (only 32 distinct levels would
be necessary, given the 32 levels of our RTP_PRIO_IDLE and their future
aliases in the to-be-introduced SCHED_IDLE POSIX scheduling policy) and
unnecessarily restrictive both for the realtime policy (which should
include 32 distinct levels for PRI_REALTIME, given our implementation of
SCHED_RR/SCHED_FIFO, leaving at most 32 levels for ULE's interactive
processes where the current implementation provisions 48 (perhaps taking
into account the spreading problem, see next point)) and the timesharing
one (88 distinct levels currently provisioned).

2. A runqueue has only 64 distinct levels, and maps priorities in the
range [0;255] to a queue index by just performing a division by 4.
Priorities mapped to the same level are treated exactly the same from
a scheduling perspective, which is generally both unexpected and
incorrect. ULE's code tries to compensate for this aliasing in the
timesharing selection policy by spreading the 88 levels over 256,
knowing that the latter in the end amount to only 64 distinct ones. This
scaling is unfortunately not performed for the other policies, breaking
the expectations mentioned in the previous point about distinct priority
levels.

With this change, only a single runqueue is now used to store all
threads, regardless of the scheduling policy ULE applies to them (going
back to what 4BSD has always been doing). ULE's 3 selection policies
are assigned non-overlapping ranges of levels, and helper functions have
been created to select or steal a thread in these distinct ranges,
preserving the "circular" queue mechanism for the timesharing selection
policy that (tries to) prevent starvation in the face of permanent
dynamic priority adjustments.

This change allows choosing an arbitrary partitioning of runqueue
levels between selection policies. It is a prerequisite to the increase
to 256 levels per runqueue, which will allow dispensing with all the
drawbacks listed above.

MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45389
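
The aliasing described in point 2 is easy to demonstrate; a minimal
sketch (the division by 4 is from the commit text, the rest is
illustrative):

    #include <stdio.h>

    #define RQ_PPQ 4    /* priorities per queue: 256 priorities, 64 queues */

    int
    main(void)
    {
            /* Distinct priorities 100 and 103 both map to queue 25. */
            printf("%d %d\n", 100 / RQ_PPQ, 103 / RQ_PPQ);
            return (0);
    }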


# fdf31d27 29-Apr-2024 Olivier Certner <olce@FreeBSD.org>

sched_ule: runq_steal_from(): Suppress first thread special case

This special case was introduced as soon as commit "ULE 3.0"
(ae7a6b38d53f, r171482, from July 2007). It caused runq_steal_from() to
ignore the highest-priority thread while stealing.

Its functionality was changed in commit "Rework CPU load balancing in
SCHED_ULE" (36acfc6507aa, r232207, from February 2012), where the intent
was to keep track of that first thread and return it if no other one was
stealable, instead of returning NULL (no steal). Some bug prevented it
from working in loaded cases (more than one thread, and all threads but
the first one not stealable), which was subsequently fixed in commit
"sched_ule(4): Fix interactive threads stealing." (bd84094a51c4, from
September 2021).

All the reasons we could second-guess for this mechanism were dubious at
best. Jeff Roberson, ULE's main author, says in the differential
revision that "The point was to move threads that are least likely to
benefit from affinity because they are unlikely to run soon enough to
take advantage of it.", to which we responded: "(snip) This may improve
affinity in some cases, but at the same time we don't really know when
the next thread on the queue is to run. Not stealing in this case also
amounts to slightly violating the expected execution ordering and
fairness.".

As this twist doesn't seem to bring any performance improvement in
general, let's just remove it.

MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45388


# f4be333b 29-Apr-2024 Olivier Certner <olce@FreeBSD.org>

sched_ule: Re-implement stealing on top of runq common-code

Stop using internal knowledge of runqueues. Remove duplicate
boilerplate parts.

Concretely, runq_steal() and runq_steal_from() are now implemented on
top of runq_findq().

Besides considerably simplifying the code, this change also brings an
algorithmic improvement since, previously, set bits in the runqueue's
status words were found by testing each bit individually in a loop
instead of using ffsl()/bsfl() (except for the first set bit per status
word).

This change also makes it more apparent that runq_steal_from() treats
the first thread at the highest priority specially (which runq_steal()
does not).

MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45388
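
The algorithmic point can be pictured with a userspace sketch: each set
bit of a status word is found with a find-first-set builtin rather than
by testing every bit (names and values are illustrative):

    #include <stdio.h>

    int
    main(void)
    {
            unsigned long sw = 0x90;    /* queues 4 and 7 are non-empty */
            int bit;

            while (sw != 0) {
                    /* ffs-style builtin: 1-based result -> 0-based index. */
                    bit = __builtin_ffsl((long)sw) - 1;
                    printf("non-empty queue at bit %d\n", bit);
                    sw &= sw - 1;       /* clear the bit just found */
            }
            return (0);
    }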


# a3119317 23-May-2024 Olivier Certner <olce@FreeBSD.org>

runq: New function runq_is_queue_empty(); Use it in ULE

Indicates whether a particular queue of the runqueue is empty.

Reviewed by: kib
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45387


# 7e2502e3 23-Apr-2024 Olivier Certner <olce@FreeBSD.org>

runq: More macros; Better and more consistent naming

Most existing macros have ambiguous names regarding which index they
operate on (queue, word, bit?), so have been renamed to improve clarity.
Use the 'RQSW_' prefix for all macros related to status words, and
change the status word type name accordingly.

Rename RQB_FFS() to RQSW_BSF() to remove confusion about the return
value (ffs*() return bit indices starting at 1, or 0 if the input is 0,
whereas BSF on x86 returns 0-based indices, which is what the current
code assumes). While here, add a check (under INVARIANTS) that
RQSW_BSF() isn't called with 0 as an argument.

Also, rename 'rqb_bits_t' to the more concise 'rqsw_t', 'struct rqbits'
to 'struct rq_status', its 'rqb_bits' field to 'rq_sw' (it designates an
array of words, not bits), and the type 'rqhead' to 'rq_queue'.

Add macros computing a queue index from a status word index and a bit in
order to factorize code. If the precise index of the bit is known,
callers can use RQSW_TO_QUEUE_IDX() to get the corresponding queue
index, whereas if they want the one corresponding to the first
(least-significant) set bit in a given status word (corresponding to
the non-empty queue with the lowest index in the status word), they can use
RQSW_FIRST_QUEUE_IDX() instead.

Add RQSW_BIT_IDX(), which computes the corresponding bit's index within
its status word. This allows more code factorization (even if
most uses will be eliminated in a later commit) and makes what is
computed clearer.

Reviewed by: kib
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45387
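
A sketch of the index arithmetic these macros encapsulate (the macro
name RQSW_TO_QUEUE_IDX is from the commit text; the bits-per-word
constant and its name are illustrative assumptions):

    #include <limits.h>
    #include <stdio.h>

    #define RQSW_BPW (sizeof(unsigned long) * CHAR_BIT) /* bits per word */

    int
    main(void)
    {
            int word_idx = 2, bit_idx = 5;

            /* RQSW_TO_QUEUE_IDX-style computation: word and bit -> queue. */
            printf("queue %d\n", (int)(word_idx * RQSW_BPW) + bit_idx);
            return (0);
    }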


Revision tags: release/13.3.0
# a11926f2 27-Feb-2024 Olivier Certner <olce@FreeBSD.org>

runq: API tidy up: 'pri' => 'idx', 'idx' as int, remove runq_remove_idx()

Make sure that external and internal users are aware that the runqueue
API always expects queue indices, and not priority levels. Name
arithmetic arguments in 'runq.h' for better immediate reference.

Use plain integers to pass indices instead of 'u_char' (using the latter
probably doesn't bring any gain, and an 'int' makes the API agnostic to
a number of queues greater than 256). Add a static assertion that
RQ_NQS can't be strictly greater than 256 as long as the 'td_rqindex'
thread field is of type 'u_char'.

Add a new macro CHECK_IDX() that checks that an index is non-negative
and below RQ_NQS, and use it in all low-level functions (and "public"
ones when they don't need to call the former).

While here, remove runq_remove_idx(), as it knows a bit too much of
ULE's internals, in particular by treating the whole runqueue as
round-robin, which we are going to change. Instead, have runq_remove()
return whether the queue from which the thread was removed is now empty,
and leverage this information in tdq_runq_rem() (sched_ule(4)).

While here, re-implement runq_add() on top of runq_add_idx() to remove
its duplicated code (all lines except one). Introduce the new
RQ_PRI_TO_IDX() macro to convert a priority to a queue index, and use it
in runq_add() (many more uses will be introduced in later commits).

While here, rename runq_check() to runq_not_empty() and have it return
a boolean instead of an 'int'; do the same for sched_runnable() as a
consequence (and while here, fix a small style violation in sched_4bsd(4)'s
version).

While here, simplify sched_runnable().

While here, make <sys/sched.h> standalone include-wise.

No functional change intended.

Reviewed by: kib
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45387
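
A sketch of the two checks described above (CHECK_IDX and the static
assertion are named in the commit text; the values and assertion
message are illustrative assumptions):

    #include <assert.h>

    #define RQ_NQS 64
    _Static_assert(RQ_NQS <= 256, "RQ_NQS must fit 'td_rqindex' (u_char)");

    #define CHECK_IDX(idx) assert((idx) >= 0 && (idx) < RQ_NQS)

    int
    main(void)
    {
            CHECK_IDX(25);      /* a queue index, not a priority level */
            return (0);
    }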


# c21c24ad 27-Mar-2024 Olivier Certner <olce@FreeBSD.org>

runq: More selective includes of <sys/runq.h> to reduce pollution

<sys/proc.h> doesn't need <sys/runq.h>. Remove this include and add it
back for kernel files that relied on the pollution.

Reviewed by: kib
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45387


# 2fefe2c8 26-Feb-2024 Olivier Certner <olce@FreeBSD.org>

runq: Deduce most parameters, remove machine headers

The 'runq' machinery now depends on only two settable parameters:
RQ_MAX_PRIO, the maximum priority number that can be accepted (the
minimum being 0), and RQ_PPQ, the number of priorities per queue (used
to reduce the number of queues).

All other parameters are deduced from these two. Also, all
architectures automatically get a runqueue status word of their natural
word size.

RQB_FFS() always was 'ffsl() - 1' except for amd64 where it was
'bsfq()'. Now that all these finally call compiler builtins, the
resulting assembly code is the same, so there is no cost to removing
this special case.

After all these changes, <machine/runq.h> headers have no more purpose,
so remove them.

While here, fix potentially confusing parameter name for RQB_WORD() and
RQB_BIT().

While here, include all necessary headers so that <sys/runq.h> can be
included standalone.

No functional change (intended).

Reviewed by: kib
MFC after: 1 month
Event: Kitchener-Waterloo Hackathon 202506
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D45387
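
A sketch of the deduction (the two settable names are from the commit
text; the exact formula shown is an illustrative assumption):

    #include <stdio.h>

    #define RQ_MAX_PRIO 255     /* settable: highest accepted priority */
    #define RQ_PPQ 4            /* settable: priorities per queue */
    /* Deduced: enough queues for priorities 0..RQ_MAX_PRIO. */
    #define RQ_NQS (((RQ_MAX_PRIO) + (RQ_PPQ)) / (RQ_PPQ))

    int
    main(void)
    {
            printf("RQ_NQS = %d\n", RQ_NQS);    /* prints 64 */
            return (0);
    }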


# 62f55b34 11-May-2025 Ka Ho Ng <khng@FreeBSD.org>

sched: mark several kern.sched.* sysctls as CTLFLAG_RWTUN

The following sysctls, which are not touched during boot-time
initialization, are marked as CTLFLAG_RWTUN so they can also be set via
loader tunables:
- kern.sched.interact
- kern.sched.preempt_thresh
- kern.sched.static_boost
- kern.sched.idlespins
- kern.sched.balance
- kern.sched.steal_idle
- kern.sched.steal_thresh
- kern.sched.trysteal_limit
- kern.sched.always_steal

MFC after: 1 week
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D50279
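
With CTLFLAG_RWTUN, such knobs can also be preset at boot from
loader.conf(5); an illustrative example (the values are arbitrary):

    kern.sched.steal_idle="1"
    kern.sched.preempt_thresh="120"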


# e24a6552 29-Jul-2024 Mark Johnston <markj@FreeBSD.org>

thread: Remove kernel stack swapping support, part 4

- Remove the IS_SWAPPED thread inhibitor state.
- Remove all uses of TD_IS_SWAPPED() in the kernel.
- Remove the TDF_CANSWAP flag.
- Remove the P_SWAPPINGOUT and P_SWAPPINGIN flags.

Tested by: pho
Reviewed by: alc, imp, kib
Differential Revision: https://reviews.freebsd.org/D46115


# aeff15b3 09-Feb-2024 Olivier Certner <olce@FreeBSD.org>

sched: Simplify sched_lend_user_prio_cond()

If 'td_lend_user_pri' has the expected value, there is no need to check
the fields that sched_lend_user_prio() modifies; they are either already
good or soon will be ('td->td_lend_user_pri' has just been changed by
a concurrent update).

Reviewed by: kib
Approved by: emaste (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D44050
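
A userspace toy of the simplified fast path; the field name is from the
commit text, everything else (types, helper, locking elided) is an
illustrative assumption:

    /* Toy types; not kernel code. */
    struct toy_thread {
            unsigned char td_lend_user_pri;
    };

    static void
    lend_user_prio(struct toy_thread *td, unsigned char pri)
    {
            td->td_lend_user_pri = pri; /* stands in for the full update */
    }

    void
    lend_user_prio_cond(struct toy_thread *td, unsigned char pri)
    {
            /* Expected value already there: the dependent fields are
             * either already good or a concurrent update will fix them,
             * so skip the full check. */
            if (td->td_lend_user_pri == pri)
                    return;
            lend_user_prio(td, pri);
    }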


# 6a3c02bc 16-Jan-2024 Olivier Certner <olce@FreeBSD.org>

sched: sched_switch(): Factorize sleepqueue flags

Avoid duplicating common flags for the preempted and non-preempted
cases, making it clear that they are the same without resorting to
formatting.

No functional change.

Approved by: markj (mentor)
MFC after: 3 days
Sponsored by: The FreeBSD Foundation


# 0a713948 22-Nov-2023 Alexander Motin <mav@FreeBSD.org>

Replace random sbuf_printf() with cheaper cat/putc.
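
An illustrative shape of the substitution, using sbuf(9) calls (also
usable from userspace via libsbuf on FreeBSD):

    #include <sys/sbuf.h>

    void
    emit_name(struct sbuf *sb, const char *name)
    {
            /* Instead of sbuf_printf(sb, "%s:\n", name): */
            sbuf_cat(sb, name);
            sbuf_putc(sb, ':');
            sbuf_putc(sb, '\n');
    }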


Revision tags: release/14.0.0
# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC after: 3 days
Sponsored by: Netflix


Revision tags: release/13.2.0
# 1029dab6 09-Feb-2023 Mitchell Horne <mhorne@FreeBSD.org>

mi_switch(): clean up switch types and their usage

Overall, this is a non-functional change, except for kernels built with
SCHED_STATS. However, the switch types are useful for communicating the
intent of the caller.

1. Ensure that every caller provides a type. In most cases, we upgrade
the basic yield to sched_relinquish() aka SWT_RELINQUISH.
2. The case of sched_bind() is distinct, so add a new switch type SWT_BIND.
3. Remove the two unused types, SWT_PREEMPT and SWT_SLEEPQTIMO.
4. Remove SWT_NONE altogether and assert that callers always provide
a type flag.
5. Reference the mi_switch(9) man page in the comments, as these flags
will be documented there.

Reviewed by: kib, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38184


Revision tags: release/12.4.0
# c6d31b83 18-Jul-2022 Konstantin Belousov <kib@FreeBSD.org>

AST: rework

Make most AST handlers dynamically registered. This allows
subsystem-specific handler source to be located in the subsystem files,
instead of making subr_trap.c aware of it. For instance, signal
delivery code on return to userspace is now moved to kern_sig.c.

Also, it allows some handlers to be designated as the cleanup (kclear)
type, which are called both at AST and on thread/process exit. For
instance, ast(), exit1(), and the NFS server no longer need to be aware
of UFS softdep processing.

The dynamic registration also allows third-party modules to register AST
handlers if needed. There is one caveat with loadable modules: the
code does not make any effort to ensure that the module is not unloaded
before all threads have been processed through its AST handlers. In
fact, this is already the existing behavior for hwpmc.ko and ufs.ko.
I do not think it is worth the effort and the runtime overhead to try to
fix it.

Reviewed by: markj
Tested by: emaste (arm64), pho
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35888


# bd980ca8 18-Jul-2022 Mark Johnston <markj@FreeBSD.org>

sched_ule: Ensure we hold the thread lock when modifying td_flags

The load balancer may force a running thread to reschedule and pick a
new CPU. To do this it sets some flags in the thread running on a
loaded CPU. But the code assumed that a running thread's lock is the
same as that of the corresponding runqueue, and there are small windows
where this is not true. In this case, we can end up with non-atomic
modifications to td_flags.

Since this load balancing is best-effort, simply give up if the thread's
lock doesn't match; in this case the thread is about to enter the
scheduler anyway.

Reviewed by: kib
Reported by: glebius
Fixes: e745d729be60 ("sched_ule(4): Improve long-term load balancer.")
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D35821
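
A userspace analogue of the give-up pattern, assuming toy types and a
made-up flag value (the kernel code instead compares the thread's lock
pointer with the runqueue's):

    #include <pthread.h>

    struct toy_thread {
            pthread_mutex_t *lock;  /* lock currently protecting 'flags' */
            int flags;
    };

    /* Called with 'runq_lock' held; returns 0 if we gave up. */
    static int
    mark_for_resched(struct toy_thread *td, pthread_mutex_t *runq_lock)
    {
            if (td->lock != runq_lock)
                    return (0);     /* lock changed underneath: give up,
                                       balancing is best-effort */
            td->flags |= 0x1;       /* a NEEDRESCHED-style flag */
            return (1);
    }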


# 6eeba7db 16-Jul-2022 Mateusz Guzik <mjg@FreeBSD.org>

ule: unbreak UP builds

Sponsored by: Rubicon Communications, LLC ("Netgate")


# 954cffe9 14-Jul-2022 John Baldwin <jhb@FreeBSD.org>

ule: Simplistic time-sharing for interrupt threads.

If an interrupt thread runs for a full quantum without yielding the
CPU, demote its priority and schedule a preemption to give other
ithreads a turn.

Reviewed by: kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35644
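
A toy sketch of the policy under illustrative assumptions (the quantum
length, field names, and demotion step are made up):

    #define QUANTUM_TICKS 10

    struct toy_ithread {
            int ran_ticks;
            int prio;       /* numerically larger = weaker priority */
    };

    /* Per-tick accounting; returns 1 when a preemption should be forced. */
    static int
    ithread_tick(struct toy_ithread *it)
    {
            if (++it->ran_ticks < QUANTUM_TICKS)
                    return (0);
            it->ran_ticks = 0;
            it->prio++;     /* demote one level */
            return (1);
    }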


# fea89a28 14-Jul-2022 John Baldwin <jhb@FreeBSD.org>

Add sched_ithread_prio to set the base priority of an interrupt thread.

Use it instead of sched_prio when setting the priority of an interrupt
thread.

Reviewed by: kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D35642

