.. SPDX-License-Identifier: GPL-2.0

====================
Utilization Clamping
====================

1. Introduction
===============

Utilization clamping, also known as util clamp or uclamp, is a scheduler
feature that allows user space to help in managing the performance requirement
of tasks. It was introduced in the v5.3 release. The CGroup support was merged
in v5.4.

Uclamp is a hinting mechanism that allows the scheduler to understand the
performance requirements and restrictions of the tasks, thus it helps the
scheduler to make a better decision. When the schedutil cpufreq governor is
used, util clamp will influence the CPU frequency selection as well.

Since the scheduler and schedutil are both driven by PELT (util_avg) signals,
util clamp acts on that to achieve its goal by clamping the signal to a certain
point; hence the name. That is, by clamping utilization we are making the
system run at a certain performance point.

The right way to view util clamp is as a mechanism to make requests or hints
about performance constraints. It consists of two tunables:

        * UCLAMP_MIN, which sets the lower bound.
        * UCLAMP_MAX, which sets the upper bound.

These two bounds will ensure a task will operate within this performance range
of the system. UCLAMP_MIN implies boosting a task, while UCLAMP_MAX implies
capping a task.

One can tell the system (scheduler) that some tasks require a minimum
performance point to operate at to deliver the desired user experience. Or one
can tell the system that some tasks should be restricted from consuming too
many resources and should not go above a specific performance point. Viewing
the uclamp values as performance points rather than utilization is a better
abstraction from the user space point of view.

As an example, a game can use util clamp to form a feedback loop with its
perceived Frames Per Second (FPS). It can dynamically increase the minimum
performance point required by its display pipeline to ensure no frame is
dropped. It can also dynamically 'prime' up these tasks if it knows that in the
coming few hundred milliseconds a computationally intensive scene is about to
happen.

On mobile hardware where the capability of the devices varies a lot, this
dynamic feedback loop offers great flexibility to ensure the best user
experience given the capabilities of any system.

Of course a static configuration is possible too. The exact usage will depend
on the system, application and the desired outcome.

Another example is in Android where tasks are classified as background,
foreground, top-app, etc. Util clamp can be used to constrain how many
resources background tasks are consuming by capping the performance point they
can run at. This constraint helps reserve resources for important tasks, like
the ones belonging to the currently active app (top-app group). Besides, this
helps in limiting how much power they consume. This can be more obvious in
heterogeneous systems (e.g. Arm big.LITTLE); the constraint will help bias the
background tasks to stay on the little cores which will ensure that:

        1. The big cores are free to run top-app tasks immediately. top-app
           tasks are the tasks the user is currently interacting with, hence
           the most important tasks in the system.
        2. The background tasks don't run on a power hungry core and drain the
           battery even if they are CPU intensive tasks.

.. note::
  **little cores**:
    CPUs with capacity < 1024

  **big cores**:
    CPUs with capacity = 1024

By making these uclamp performance requests, or rather hints, user space can
ensure system resources are used optimally to deliver the best possible user
experience.

Another use case is to help with **overcoming the ramp up latency inherent in
how scheduler utilization signal is calculated**.

For instance, a busy task that needs to run at the maximum performance point
will suffer a delay of ~200ms (PELT HALFLIFE = 32ms) for the scheduler to
realize that. This is known to affect workloads like gaming on mobile devices
where frames will drop due to slow response time to select the higher frequency
required for the tasks to finish their work in time. Setting UCLAMP_MIN=1024
will ensure such tasks will always see the highest performance level when they
start running.

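The order of magnitude of this delay can be approximated with a simplified
model of the PELT geometric series. The sketch below is an illustration only
and not the kernel implementation: it assumes a task that runs continuously
starting from a utilization of 0, accounting periods of about 1ms, and a decay
factor chosen so that 32 periods halve the history. The name ``pelt_ramp_ms``
is hypothetical.

```c
/* Simplified PELT ramp-up model (illustration only, not kernel code). */

#define PELT_MAX_UTIL 1024.0

/* Decay per ~1ms period, chosen so that PELT_Y^32 ~= 0.5 (32ms half-life). */
#define PELT_Y 0.97857206

/*
 * Number of periods an always-running task, starting from util 0, needs
 * before its modelled utilization reaches @target (expected in 0..1023).
 * The loop is bounded so that an unreachable target cannot spin forever.
 */
static unsigned int pelt_ramp_ms(double target)
{
    double util = 0.0;
    unsigned int ms = 0;

    while (util < target && ms < 10000) {
        /* Old history decays by y; the new busy period contributes fully. */
        util = util * PELT_Y + PELT_MAX_UTIL * (1.0 - PELT_Y);
        ms++;
    }
    return ms;
}
```

In this model half of the range (512) is reached after about one half-life,
while getting within about 1.5% of 1024 takes roughly six half-lives, which is
where a delay in the order of ~200ms comes from.
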
The overall visible effect goes beyond better perceived user
experience/performance and stretches to help achieve a better overall
performance/watt if used effectively.

User space can form a feedback loop with the thermal subsystem too to ensure
the device doesn't heat up to the point where it will throttle.

Both SCHED_NORMAL/OTHER and SCHED_FIFO/RR honour uclamp requests/hints.

In the SCHED_FIFO/RR case, uclamp gives the option to run RT tasks at any
performance point rather than being tied to MAX frequency all the time, which
can be useful on general purpose systems that run on battery powered devices.

Note that by design RT tasks don't have per-task PELT signal and must always
run at a constant frequency to combat non-deterministic DVFS rampup delays.

Note that using schedutil always implies a single delay to modify the frequency
when an RT task wakes up. This cost is unchanged by using uclamp. Uclamp only
helps in picking what frequency to request instead of schedutil always
requesting MAX for all RT tasks.

See :ref:`section 3.4 <uclamp-default-values>` for default values and
:ref:`3.4.1 <sched-util-clamp-min-rt-default>` on how to change RT tasks
default value.

2. Design
=========

Util clamp is a property of every task in the system. It sets the boundaries of
its utilization signal; acting as a bias mechanism that influences certain
decisions within the scheduler.

The actual utilization signal of a task is never clamped in reality. If you
inspect PELT signals at any point of time you should continue to see them
intact. Clamping happens only when needed, e.g: when a task wakes up
and the scheduler needs to select a suitable CPU for it to run on.

Since the goal of util clamp is to allow requesting a minimum and maximum
performance point for a task to run on, it must be able to influence the
frequency selection as well as task placement to be most effective. Both of
which have implications on the utilization value at CPU runqueue (rq for short)
level, which brings us to the main design challenge.

When a task wakes up on an rq, the utilization signal of the rq will be
affected by the uclamp settings of all the tasks enqueued on it. For example if
a task requests to run at UCLAMP_MIN = 512, then the util signal of the rq
needs to respect this request as well as all other requests from all of the
enqueued tasks.

To be able to aggregate the util clamp value of all the tasks attached to the
rq, uclamp must do some housekeeping at every enqueue/dequeue, which is the
scheduler hot path. Hence care must be taken since any slow down will have
significant impact on a lot of use cases and could hinder its usability in
practice.

The way this is handled is by dividing the utilization range into buckets
(struct uclamp_bucket) which allows us to reduce the search space from every
task on the rq to only a subset of tasks on the top-most bucket.

When a task is enqueued, the counter in the matching bucket is incremented,
and on dequeue it is decremented. This makes keeping track of the effective
uclamp value at rq level a lot easier.

As tasks are enqueued and dequeued, we keep track of the current effective
uclamp value of the rq. See :ref:`section 2.1 <uclamp-buckets>` for details on
how this works.

Later, any path that wants to identify the effective uclamp value of the rq
simply needs to read it at the exact moment it needs to take a decision.

For the task placement case, only Energy Aware and Capacity Aware Scheduling
(EAS/CAS) make use of uclamp for now, which implies that it is applied on
heterogeneous systems only.
When a task wakes up, the scheduler will look at the current effective uclamp
value of every rq and compare it with the potential new value if the task were
to be enqueued there, favoring the rq that will end up with the most energy
efficient combination.

Similarly in schedutil, when it needs to make a frequency update it will look
at the current effective uclamp value of the rq which is influenced by the set
of tasks currently enqueued there and select the appropriate frequency that
will satisfy constraints from requests.

Other paths like setting overutilization state (which effectively disables EAS)
make use of uclamp as well. Such cases are considered necessary housekeeping to
allow the 2 main use cases above and will not be covered in detail here as they
could change with implementation details.

.. _uclamp-buckets:

2.1. Buckets
------------

::

                           [struct rq]

  (bottom)                                                    (top)

    0                                                          1024
    |                                                           |
    +-----------+-----------+-----------+----   ----+-----------+
    |  Bucket 0 |  Bucket 1 |  Bucket 2 |    ...    |  Bucket N |
    +-----------+-----------+-----------+----   ----+-----------+
       :           :                                   :
       +- p0       +- p3                               +- p4
       :                                               :
       +- p1                                           +- p5
       :
       +- p2


.. note::
  The diagram above is an illustration rather than a true depiction of the
  internal data structure.

To reduce the search space when trying to decide the effective uclamp value of
an rq as tasks are enqueued/dequeued, the whole utilization range is divided
into N buckets where N is configured at compile time by setting
CONFIG_UCLAMP_BUCKETS_COUNT. By default it is set to 5.

The rq has a set of buckets for each uclamp_id tunable: [UCLAMP_MIN,
UCLAMP_MAX].

The range of each bucket is 1024/N. For example, for the default value of
5 there will be 5 buckets, each of which will cover the following range:

::

        DELTA = round_closest(1024/5) = round_closest(204.8) = 205

        Bucket 0: [0:204]
        Bucket 1: [205:409]
        Bucket 2: [410:614]
        Bucket 3: [615:819]
        Bucket 4: [820:1024]

When a task p with the following tunable parameters

::

        p->uclamp[UCLAMP_MIN] = 300
        p->uclamp[UCLAMP_MAX] = 1024

is enqueued into the rq, bucket 1 will be incremented for UCLAMP_MIN and bucket
4 will be incremented for UCLAMP_MAX to reflect the fact the rq has a task in
this range.

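The mapping from a clamp value to its bucket can be sketched as follows. This
is a minimal illustration of the arithmetic above, not a copy of the kernel
code; the helper names ``div_round_closest`` and ``uclamp_bucket_id`` and the
explicit ``nr_buckets`` parameter are assumptions made for the example.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024u

/* Integer division rounded to the closest value, as used for DELTA above. */
static unsigned int div_round_closest(unsigned int x, unsigned int d)
{
    return (x + d / 2) / d;
}

/*
 * Map a clamp value (0..1024) to a bucket index for @nr_buckets buckets.
 * The index is capped at nr_buckets - 1 so that the rounding of DELTA can
 * never produce an out-of-range bucket.
 */
static unsigned int uclamp_bucket_id(unsigned int clamp_value,
                                     unsigned int nr_buckets)
{
    unsigned int delta, id;

    assert(nr_buckets > 0 && clamp_value <= SCHED_CAPACITY_SCALE);

    delta = div_round_closest(SCHED_CAPACITY_SCALE, nr_buckets);
    id = clamp_value / delta;

    return id < nr_buckets ? id : nr_buckets - 1;
}
```

With 5 buckets, a value of 300 maps to bucket 1 and a value of 1024 maps to
bucket 4, matching the example above.
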
The rq then keeps track of its current effective uclamp value for each
uclamp_id.

When a task p is enqueued, the rq value changes to:

::

        // update bucket logic goes here
        rq->uclamp[UCLAMP_MIN] = max(rq->uclamp[UCLAMP_MIN], p->uclamp[UCLAMP_MIN])
        // repeat for UCLAMP_MAX

Similarly, when p is dequeued the rq value changes to:

::

        // update bucket logic goes here
        rq->uclamp[UCLAMP_MIN] = search_top_bucket_for_highest_value()
        // repeat for UCLAMP_MAX

When all buckets are empty, the rq uclamp values are reset to system defaults.
See :ref:`section 3.4 <uclamp-default-values>` for details on default values.

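The enqueue/dequeue bookkeeping for a single uclamp_id can be modelled as
below. This is a simplified sketch under stated assumptions rather than the
kernel data structure: all type and function names are hypothetical, 5 buckets
are hard-coded, and each bucket is assumed to remember only the highest value
it has seen until it becomes empty.

```c
#define NR_BUCKETS   5u
#define BUCKET_DELTA 205u   /* round_closest(1024 / NR_BUCKETS) */

struct bucket {
    unsigned int tasks;     /* number of enqueued tasks in this bucket */
    unsigned int value;     /* highest clamp value seen while non-empty */
};

struct rq_clamp {
    unsigned int value;     /* current effective clamp value of the rq */
    unsigned int dflt;      /* value used when no task is enqueued */
    unsigned int nr_tasks;  /* total number of enqueued tasks */
    struct bucket bucket[NR_BUCKETS];
};

static unsigned int bucket_id(unsigned int v)
{
    unsigned int id = v / BUCKET_DELTA;

    return id < NR_BUCKETS ? id : NR_BUCKETS - 1;
}

/* Highest value among non-empty buckets, scanning from the top bucket. */
static unsigned int rq_clamp_search(const struct rq_clamp *rq)
{
    for (unsigned int i = NR_BUCKETS; i-- > 0; )
        if (rq->bucket[i].tasks)
            return rq->bucket[i].value;
    return rq->dflt;        /* all buckets empty: back to the default */
}

static void rq_clamp_enqueue(struct rq_clamp *rq, unsigned int v)
{
    struct bucket *b = &rq->bucket[bucket_id(v)];

    b->tasks++;
    if (v > b->value)
        b->value = v;
    /* First task replaces the default, later tasks are max aggregated. */
    if (rq->nr_tasks++ == 0 || v > rq->value)
        rq->value = v;
}

static void rq_clamp_dequeue(struct rq_clamp *rq, unsigned int v)
{
    struct bucket *b = &rq->bucket[bucket_id(v)];

    if (!b->tasks)
        return;             /* nothing enqueued with this value */
    if (--b->tasks == 0)
        b->value = 0;       /* an empty bucket forgets its value */
    rq->nr_tasks--;
    rq->value = rq_clamp_search(rq);
}
```

Only the dequeue path needs a search, and that search touches at most
``NR_BUCKETS`` entries regardless of how many tasks are enqueued.
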
2.2. Max aggregation
--------------------

Util clamp is tuned to honour the request for the task that requires the
highest performance point.

When multiple tasks are attached to the same rq, then util clamp must make sure
the task that needs the highest performance point gets it even if there's
another task that doesn't need it or is disallowed from reaching this point.

For example, if there are multiple tasks attached to an rq with the following
values:

::

        p0->uclamp[UCLAMP_MIN] = 300
        p0->uclamp[UCLAMP_MAX] = 900

        p1->uclamp[UCLAMP_MIN] = 500
        p1->uclamp[UCLAMP_MAX] = 500

then assuming both p0 and p1 are enqueued to the same rq, both UCLAMP_MIN
and UCLAMP_MAX become:

::

        rq->uclamp[UCLAMP_MIN] = max(300, 500) = 500
        rq->uclamp[UCLAMP_MAX] = max(900, 500) = 900

As we shall see in :ref:`section 5.1 <uclamp-capping-fail>`, this max
aggregation is the cause of one of the limitations when using util clamp, in
particular for the UCLAMP_MAX hint when user space would like to save power.

2.3. Hierarchical aggregation
-----------------------------

As stated earlier, util clamp is a property of every task in the system. But
the actual applied (effective) value can be influenced by more than just the
request made by the task or another actor on its behalf (middleware library).

The effective util clamp value of any task is restricted as follows:

  1. By the uclamp settings defined by the cgroup CPU controller it is attached
     to, if any.
  2. The restricted value in (1) is then further restricted by the system wide
     uclamp settings.

:ref:`Section 3 <uclamp-interfaces>` discusses the interfaces and will expand
further on that.

For now suffice to say that if a task makes a request, its actual effective
value will have to adhere to some restrictions imposed by cgroup and system
wide settings.

The system will still accept the request even if it will effectively be beyond
the constraints, but as soon as the task moves to a different cgroup or a
sysadmin modifies the system settings, the request will be satisfied only if it
is within the new constraints.

In other words, this aggregation will not cause an error when a task changes
its uclamp values, but rather the system may not be able to satisfy requests
based on those factors.

2.4. Range
----------

A uclamp performance request ranges from 0 to 1024 inclusive.

For the cgroup interface a percentage is used (that is 0 to 100 inclusive).
Just like other cgroup interfaces, you can use 'max' instead of 100.

.. _uclamp-interfaces:

3. Interfaces
=============

3.1. Per task interface
-----------------------

The sched_setattr() syscall was extended to accept two new fields:

* sched_util_min: requests the minimum performance point the system should run
  at when this task is running, i.e. the lower performance bound.
* sched_util_max: requests the maximum performance point the system should run
  at when this task is running, i.e. the upper performance bound.

For example, the following scenario has 40% to 80% utilization constraints:

::

        attr->sched_util_min = 40% * 1024;
        attr->sched_util_max = 80% * 1024;

When task @p is running, **the scheduler should try its best to ensure it
starts at 40% performance level**. If the task runs for a long enough time so
that its actual utilization goes above 80%, the utilization, or performance
level, will be capped.

The special value -1 is used to reset the uclamp settings to the system
default.

Note that resetting the uclamp value to system default using -1 is not the same
as manually setting uclamp value to system default. This distinction is
important because as we shall see in system interfaces, the default value for
RT could be changed. SCHED_NORMAL/OTHER might gain similar knobs too in the
future.

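A minimal user space sketch of the 40% to 80% request above is shown below. It
invokes the syscall directly through ``syscall()`` and uses a local mirror of
the UAPI ``struct sched_attr`` layout so that it does not depend on a
particular libc version. The ``EX_`` and ``ex_`` prefixed names, as well as
``pct_to_util`` and ``set_uclamp_pct``, are hypothetical names chosen for this
example; the flag values are the ones found in the kernel UAPI header
``linux/sched.h``.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Flag values as defined in the kernel UAPI header linux/sched.h. */
#define EX_SCHED_FLAG_KEEP_POLICY    0x08
#define EX_SCHED_FLAG_KEEP_PARAMS    0x10
#define EX_SCHED_FLAG_UTIL_CLAMP_MIN 0x20
#define EX_SCHED_FLAG_UTIL_CLAMP_MAX 0x40

/* Local mirror of the UAPI struct sched_attr layout (56 bytes). */
struct ex_sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;
    uint32_t sched_priority;
    uint64_t sched_runtime;
    uint64_t sched_deadline;
    uint64_t sched_period;
    uint32_t sched_util_min;
    uint32_t sched_util_max;
};

/* Convert a percentage (0..100) to the 0..1024 uclamp scale. */
static uint32_t pct_to_util(unsigned int pct)
{
    return pct * 1024u / 100u;
}

/*
 * Request [min_pct, max_pct] for @pid (0 means the calling thread) while
 * keeping its current policy and parameters. Returns 0 on success and -1
 * with errno set on failure, e.g. when the kernel lacks uclamp support.
 */
int set_uclamp_pct(pid_t pid, unsigned int min_pct, unsigned int max_pct)
{
    struct ex_sched_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.sched_flags = EX_SCHED_FLAG_KEEP_POLICY |
                       EX_SCHED_FLAG_KEEP_PARAMS |
                       EX_SCHED_FLAG_UTIL_CLAMP_MIN |
                       EX_SCHED_FLAG_UTIL_CLAMP_MAX;
    attr.sched_util_min = pct_to_util(min_pct);
    attr.sched_util_max = pct_to_util(max_pct);

    return (int)syscall(SYS_sched_setattr, pid, &attr, 0);
}
```

Calling ``set_uclamp_pct(0, 40, 80)`` would issue the request from the example
for the calling thread. Only the two clamp flags are needed to change the
clamps; the two KEEP flags are included so that the policy and its parameters
are left untouched.
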
3.2. cgroup interface
---------------------

There are two uclamp related values in the CPU cgroup controller:

* cpu.uclamp.min
* cpu.uclamp.max

When a task is attached to a CPU controller, its uclamp values will be impacted
as follows:

* cpu.uclamp.min is a protection as described in :ref:`section 3-3 of cgroup
  v2 documentation <cgroupv2-protections-distributor>`.

  If a task's uclamp_min value is lower than cpu.uclamp.min, then the task will
  inherit the cgroup cpu.uclamp.min value.

  In a cgroup hierarchy, effective cpu.uclamp.min is the max of (child,
  parent).

* cpu.uclamp.max is a limit as described in :ref:`section 3-2 of cgroup v2
  documentation <cgroupv2-limits-distributor>`.

  If a task's uclamp_max value is higher than cpu.uclamp.max, then the task
  will inherit the cgroup cpu.uclamp.max value.

  In a cgroup hierarchy, effective cpu.uclamp.max is the min of (child,
  parent).

For example, given the following parameters:

::

        p0->uclamp[UCLAMP_MIN] = // system default;
        p0->uclamp[UCLAMP_MAX] = // system default;

        p1->uclamp[UCLAMP_MIN] = 40% * 1024;
        p1->uclamp[UCLAMP_MAX] = 50% * 1024;

        cgroup0->cpu.uclamp.min = 20% * 1024;
        cgroup0->cpu.uclamp.max = 60% * 1024;

        cgroup1->cpu.uclamp.min = 60% * 1024;
        cgroup1->cpu.uclamp.max = 100% * 1024;

when p0 and p1 are attached to cgroup0, the values become:

::

        p0->uclamp[UCLAMP_MIN] = cgroup0->cpu.uclamp.min = 20% * 1024;
        p0->uclamp[UCLAMP_MAX] = cgroup0->cpu.uclamp.max = 60% * 1024;

        p1->uclamp[UCLAMP_MIN] = 40% * 1024; // intact
        p1->uclamp[UCLAMP_MAX] = 50% * 1024; // intact

when p0 and p1 are attached to cgroup1, these instead become:

::

        p0->uclamp[UCLAMP_MIN] = cgroup1->cpu.uclamp.min = 60% * 1024;
        p0->uclamp[UCLAMP_MAX] = cgroup1->cpu.uclamp.max = 100% * 1024;

        p1->uclamp[UCLAMP_MIN] = cgroup1->cpu.uclamp.min = 60% * 1024;
        p1->uclamp[UCLAMP_MAX] = 50% * 1024; // intact

Note that the cgroup interface allows the cpu.uclamp.max value to be lower than
cpu.uclamp.min. Other interfaces don't allow that.

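The two rules and the example above can be expressed as a small model. This
sketch follows the rules exactly as stated in this section and assumes system
defaults of 0 and 1024 for p0; the in-kernel implementation may differ in
detail. The names ``uclamp_pair``, ``pct`` and ``cgroup_restrict`` are
hypothetical.

```c
struct uclamp_pair {
    unsigned int min;
    unsigned int max;
};

/* Convert a percentage (0..100) to the 0..1024 uclamp scale. */
static unsigned int pct(unsigned int p)
{
    return p * 1024u / 100u;
}

/*
 * Apply the rules stated above: cpu.uclamp.min raises a lower task minimum
 * (protection), cpu.uclamp.max lowers a higher task maximum (limit).
 * Values already within the cgroup settings are left intact.
 */
static struct uclamp_pair cgroup_restrict(struct uclamp_pair task,
                                          struct uclamp_pair cg)
{
    struct uclamp_pair eff = task;

    if (eff.min < cg.min)
        eff.min = cg.min;
    if (eff.max > cg.max)
        eff.max = cg.max;
    return eff;
}
```

Feeding p1 and cgroup1 from the example into ``cgroup_restrict()`` yields a
minimum of 60% and a maximum of 50%, i.e. the same combination shown above
where the effective minimum ends up higher than the effective maximum.
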
3.3. System interface
---------------------

3.3.1 sched_util_clamp_min
--------------------------

System wide limit of allowed UCLAMP_MIN range. By default it is set to 1024,
which means that permitted effective UCLAMP_MIN range for tasks is [0:1024].
By changing it to 512 for example the range reduces to [0:512]. This is useful
to restrict how much boosting tasks are allowed to acquire.

Requests from tasks to go above this knob value will still succeed, but
they won't be satisfied until it is more than p->uclamp[UCLAMP_MIN].

The value must be smaller than or equal to sched_util_clamp_max.

3.3.2 sched_util_clamp_max
--------------------------

System wide limit of allowed UCLAMP_MAX range. By default it is set to 1024,
which means that permitted effective UCLAMP_MAX range for tasks is [0:1024].

By changing it to 512 for example the effective allowed range reduces to
[0:512]. This means that no task can run above 512, which implies that all
rqs are restricted too. IOW, the whole system is capped to half its performance
capacity.

This is useful to restrict the overall maximum performance point of the system.
For example, it can be handy to limit performance when running low on battery
or when the system wants to limit access to more energy hungry performance
levels when it's in idle state or screen is off.

Requests from tasks to go above this knob value will still succeed, but they
won't be satisfied until it is more than p->uclamp[UCLAMP_MAX].

The value must be greater than or equal to sched_util_clamp_min.

.. _uclamp-default-values:

3.4. Default values
-------------------

By default all SCHED_NORMAL/SCHED_OTHER tasks are initialized to:

::

        p_fair->uclamp[UCLAMP_MIN] = 0
        p_fair->uclamp[UCLAMP_MAX] = 1024

That is, by default they are neither boosted nor capped. These defaults cannot
be changed at boot or runtime. No argument was made yet as to why we should
provide this, but it can be added in the future.

For SCHED_FIFO/SCHED_RR tasks:

::

        p_rt->uclamp[UCLAMP_MIN] = 1024
        p_rt->uclamp[UCLAMP_MAX] = 1024

That is, by default they're boosted to run at the maximum performance point of
the system which retains the historical behavior of the RT tasks.

RT tasks default uclamp_min value can be modified at boot or runtime via
sysctl. See the section below.

.. _sched-util-clamp-min-rt-default:

3.4.1 sched_util_clamp_min_rt_default
-------------------------------------

Running RT tasks at maximum performance point is expensive on battery powered
devices and not necessary. To allow system developers to offer good performance
guarantees for these tasks without pushing them all the way to maximum
performance point, this sysctl knob allows tuning the best boost value to
address the system requirement without burning power running at maximum
performance point all the time.

Application developers are encouraged to use the per task util clamp interface
to ensure they are performance and power aware. Ideally this knob should be set
to 0 by system designers, leaving the task of managing performance
requirements to the apps.

4. How to use util clamp
========================

Util clamp promotes the concept of user space assisted power and performance
management. At the scheduler level, the info required to make the best
decision is not available. However, with util clamp user space can hint to the
scheduler to make better decisions about task placement and frequency
selection.

Best results are achieved by not making any assumptions about the system the
application is running on and to use it in conjunction with a feedback loop to
dynamically monitor and adjust. Ultimately this will allow for a better user
experience at a better perf/watt.

For some systems and use cases, static setup will help to achieve good results.
Portability will be a problem in this case. How much work one can do at 100,
200 or 1024 is different for each system. Unless there's a specific target
system, static setup should be avoided.

There are enough possibilities to create a whole framework based on util clamp
or a self contained app that makes use of it directly.

4.1. Boost important and DVFS-latency-sensitive tasks
-----------------------------------------------------

A GUI task might not be busy enough to warrant driving the frequency high when
it wakes up. However, it needs to finish its work within a specific time window
to deliver the desired user experience. The right frequency it requires at
wakeup will be system dependent. On some underpowered systems it will be high,
on other overpowered ones it will be low or 0.

This task can increase its UCLAMP_MIN value every time it misses the deadline
to ensure on next wake up it runs at a higher performance point. It should try
to approach the lowest UCLAMP_MIN value that allows it to meet its deadline on
any particular system to achieve the best possible perf/watt for that system.

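One possible shape for such a feedback loop is sketched below. It is only an
illustration: the step sizes are hypothetical tuning values, the function name
``next_uclamp_min`` is made up for this example, and a real application would
pick its own policy and pass the result to the per task interface.

```c
#define UCLAMP_SCALE 1024u
#define BOOST_STEP   64u    /* hypothetical increment after a missed deadline */
#define RELAX_STEP   8u     /* hypothetical decrement after a met deadline */

/*
 * Return the next UCLAMP_MIN request given the current one and whether the
 * last activation missed its deadline. Boosting is fast, relaxing is slow,
 * so the value settles near the lowest one that still meets the deadline.
 */
static unsigned int next_uclamp_min(unsigned int cur, int missed_deadline)
{
    if (missed_deadline) {
        unsigned int boosted = cur + BOOST_STEP;

        return boosted > UCLAMP_SCALE ? UCLAMP_SCALE : boosted;
    }

    return cur > RELAX_STEP ? cur - RELAX_STEP : 0;
}
```

The asymmetry between the two steps is a design choice of this sketch: a missed
deadline is visible to the user, so it is corrected quickly, while the slow
decay probes for a lower, more power efficient value.
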
On heterogeneous systems, it might be important for this task to run on
a faster CPU.

**Generally it is advised to perceive the input as performance level or point
which will imply both task placement and frequency selection**.

4.2. Cap background tasks
-------------------------

As explained for the Android case in the introduction, any app can lower
UCLAMP_MAX for some background tasks that don't care about performance but
could end up being busy and consume unnecessary system resources.

4.3. Powersave mode
-------------------

The sched_util_clamp_max system wide interface can be used to limit all tasks
from operating at the higher performance points which are usually energy
inefficient.

This is not unique to uclamp as one can achieve the same by reducing max
frequency of the cpufreq governor. It can be considered a more convenient
alternative interface.

4.4. Per-app performance restriction
------------------------------------

Middleware/Utility can provide the user an option to set UCLAMP_MIN/MAX for an
app every time it is executed to guarantee a minimum performance point and/or
limit it from draining system power at the cost of reduced performance for
these apps.

If you want to prevent your laptop from heating up while on the go from
compiling the kernel and are happy to sacrifice performance to save power, but
still would like to keep your browser performance intact, uclamp makes it
possible.

5. Limitations
==============

.. _uclamp-capping-fail:

5.1. Capping frequency with uclamp_max fails under certain conditions
---------------------------------------------------------------------

If task p0 is capped to run at 512:

::

        p0->uclamp[UCLAMP_MAX] = 512

and it shares the rq with p1 which is free to run at any performance point:

::

        p1->uclamp[UCLAMP_MAX] = 1024

then due to max aggregation the rq will be allowed to reach max performance
point:

::

        rq->uclamp[UCLAMP_MAX] = max(512, 1024) = 1024

Assuming both p0 and p1 have UCLAMP_MIN = 0, then the frequency selection for
the rq will depend on the actual utilization value of the tasks.

If p1 is a small task but p0 is a CPU intensive task, then due to the fact that
both are running at the same rq, p1 will cause the frequency capping to be
lifted from the rq although p1, which is allowed to run at any performance
point, doesn't actually need to run at that frequency.

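The effect can be reproduced with a small model of rq level capping. The sketch
below assumes UCLAMP_MIN = 0 for all tasks, as in the text, and treats the
utilization that frequency selection is based on as the sum of the task
utilizations capped by the max aggregated UCLAMP_MAX. The struct and function
names, and the utilization numbers used to exercise it, are hypothetical.

```c
struct task {
    unsigned int util;          /* actual utilization (util_avg) */
    unsigned int uclamp_max;    /* requested UCLAMP_MAX */
};

/* Utilization a frequency request would be based on after rq capping. */
static unsigned int rq_capped_util(const struct task *t, unsigned int nr)
{
    unsigned int util = 0, rq_max = 0;

    for (unsigned int i = 0; i < nr; i++) {
        util += t[i].util;
        if (t[i].uclamp_max > rq_max)
            rq_max = t[i].uclamp_max;   /* max aggregation */
    }
    if (util > 1024)
        util = 1024;

    return util < rq_max ? util : rq_max;
}
```

With a busy p0 of utilization 800 capped at 512, the function returns 512 when
p0 is alone on the rq. Adding a small p1 of utilization 50 with UCLAMP_MAX =
1024 raises the result to 850, even though p1 itself contributes very little.
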
5.2. UCLAMP_MAX can break PELT (util_avg) signal
------------------------------------------------

PELT assumes that frequency will always increase as the signals grow to ensure
there's always some idle time on the CPU. But with UCLAMP_MAX, this frequency
increase will be prevented which can lead to no idle time in some
circumstances. When there's no idle time, a task will be stuck in a busy loop,
which would result in util_avg being 1024.

Combined with the issue described in :ref:`section 5.1 <uclamp-capping-fail>`,
this can lead to unwanted frequency spikes when severely capped tasks share the
rq with a small non capped task.

As an example if task p0, which has:

::

        p0->util_avg = 300
        p0->uclamp[UCLAMP_MAX] = 0

wakes up on an idle CPU, then it will run at the minimum frequency (Fmin) this
CPU is capable of. The max CPU frequency (Fmax) matters here as well,
since it designates the shortest computational time to finish the task's
work on this CPU.

::

        rq->uclamp[UCLAMP_MAX] = 0

If the ratio of Fmax/Fmin is 3, then the maximum value will be:

::

        300 * (Fmax/Fmin) = 900

which indicates the CPU will still see idle time since 900 is < 1024. The
_actual_ util_avg will not be 900 though, but somewhere between 300 and 900. As
long as there's idle time, p0->util_avg updates will be off by some margin,
but not proportional to Fmax/Fmin.

::

        p0->util_avg = 300 + small_error

Now if the ratio of Fmax/Fmin is 4, the maximum value becomes:

::

        300 * (Fmax/Fmin) = 1200

which is higher than 1024 and indicates that the CPU has no idle time. When
this happens, then the _actual_ util_avg will become:

::

        p0->util_avg = 1024

If task p1 wakes up on this CPU, which has:

::

        p1->util_avg = 200
        p1->uclamp[UCLAMP_MAX] = 1024

then the effective UCLAMP_MAX for the CPU will be 1024 according to max
aggregation rule. But since the capped p0 task was running and throttled
severely, then the rq->util_avg will be:

::

        p0->util_avg = 1024
        p1->util_avg = 200

        rq->util_avg = 1024
        rq->uclamp[UCLAMP_MAX] = 1024

This leads to a frequency spike, since if p0 wasn't throttled we would get:

::

        p0->util_avg = 300
        p1->util_avg = 200

        rq->util_avg = 500

and run somewhere near mid performance point of that CPU, not the Fmax we get.

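The upper bound used in the two cases above can be written as a small helper.
This is a sketch of the arithmetic in this section only; the function name
``capped_util_bound`` is hypothetical and the frequencies may be given in any
unit as long as both use the same one.

```c
/*
 * Upper bound on the utilization that a task with utilization @util
 * (as measured when running at Fmax) can appear to have when it is forced
 * to run at Fmin. The result saturates at 1024, which corresponds to the
 * CPU having no idle time left. @fmin must be non-zero.
 */
static unsigned int capped_util_bound(unsigned int util,
                                      unsigned int fmax, unsigned int fmin)
{
    unsigned long long scaled = (unsigned long long)util * fmax / fmin;

    return scaled < 1024 ? (unsigned int)scaled : 1024;
}
```

A result below 1024 means the CPU still sees idle time and util_avg stays close
to its real value. A result of 1024 marks the case where util_avg saturates and
a spike becomes possible as soon as an uncapped task shares the rq.
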
5.3. Schedutil response time issues
-----------------------------------

schedutil has three limitations:

        1. Hardware takes non-zero time to respond to any frequency change
           request. On some platforms it can be in the order of a few ms.
        2. Non fast-switch systems require a worker deadline thread to wake up
           and perform the frequency change, which adds measurable overhead.
        3. schedutil rate_limit_us drops any requests during this rate_limit_us
           window.

If a relatively small task is doing a critical job and requires a certain
performance point when it wakes up and starts running, then all these
limitations will prevent it from getting what it wants in the time scale it
expects.

This limitation is not only impactful when using uclamp, but will be more
prevalent as we no longer gradually ramp up or down. We could easily be
jumping between frequencies depending on the order tasks wake up, and their
respective uclamp values.

We regard that as a limitation of the capabilities of the underlying system
itself.

There is room to improve the behavior of schedutil rate_limit_us, but not much
can be done for 1 or 2. They are considered hard limitations of the system.
