cpufreq.rst - OpenGrok cross reference for /linux/Documentation/admin-guide/pm/cpufreq.rst

Lines Matching +full:cpu +full:- +full:capacity
1 .. SPDX-License-Identifier: GPL-2.0
7 CPU Performance Scaling
15 The Concept of CPU Performance Scaling
20 Operating Performance Points or P-states (in ACPI terminology).  As a rule,
22 can be retired by the CPU over a unit of time, but also the higher the clock
24 time (or the more power is drawn) by the CPU in the given P-state.  Therefore
25 there is a natural tradeoff between the CPU capacity (the number of instructions
26 that can be executed over a unit of time) and the power drawn by the CPU.
29 as possible and then there is no reason to use any P-states different from the
30 highest one (i.e. the highest-performance frequency/voltage configuration
32 instructions so quickly and maintaining the highest available CPU capacity for a
34 It also may not be physically possible to maintain maximum CPU capacity for too
35 long for thermal or power supply capacity reasons or similar.  To cover those
38 put into different P-states.
40 Typically, they are used along with algorithms to estimate the required CPU
41 capacity, so as to decide which P-states to put the CPUs into.  Of course, since
44 to as CPU performance scaling or CPU frequency scaling (because it involves
45 adjusting the CPU clock frequency).
48 CPU Performance Scaling in Linux
51 The Linux kernel supports CPU performance scaling by means of the ``CPUFreq``
52 (CPU Frequency scaling) subsystem that consists of three layers of code: the
56 interfaces for all platforms that support CPU performance scaling.  It defines
59 Scaling governors implement algorithms to estimate the required CPU capacity.
64 information on the available P-states (or P-state ranges in some cases) and
65 access platform-specific hardware interfaces to change CPU P-states as requested
70 performance scaling algorithms for P-state selection can be represented in a
71 platform-independent form in the majority of cases, so it should be possible
80 platform-independent way.  For this reason, ``CPUFreq`` allows scaling drivers
88 In some cases the hardware interface for P-state control is shared by multiple
90 control the P-state of multiple CPUs at the same time and writing to it affects
93 Sets of CPUs sharing hardware P-state control interfaces are represented by
95 struct cpufreq_policy is also used when there is only one CPU in the given
99 every CPU in the system, including CPUs that are currently offline.  If multiple
100 CPUs share the same hardware P-state control interface, all of the pointers
107 CPU Initialization
114 The scaling driver may be registered before or after CPU registration.  If
121 In any case, the ``CPUFreq`` core is invoked to take note of any logical CPU it
122 has not seen so far as soon as it is ready to handle that CPU.  [Note that the
123 logical CPU may be a physical single-core processor, or a single core in a
125 core.  In what follows "CPU" always means "logical CPU" unless explicitly stated
130 for the given CPU and if so, it skips the policy object creation.  Otherwise,
133 the given CPU is set to the new policy object's address in memory.
135 Next, the scaling driver's ``->init()`` callback is invoked with the policy
136 pointer of the new CPU passed to it as the argument.  That callback is expected
137 to initialize the performance scaling hardware interface for the given CPU (or,
142 the set of supported P-states is not a continuous range), and the mask of CPUs
151 the governor's ``->init()`` callback which is expected to initialize all of the
154 invoking its ``->start()`` callback.
156 That callback is expected to register per-CPU utilization update callbacks for
157 all of the online CPUs belonging to the given policy with the CPU scheduler.
158 The utilization update callbacks will be invoked by the CPU scheduler on
160 scheduler tick or generally whenever the CPU utilization may change (from the
162 to determine the P-state to use for the given policy going forward and to
164 the P-state selection.  The scaling driver may be invoked directly from
172 "inactive" (and is re-initialized now) instead of the default governor.
174 In turn, if a previously offline CPU is being brought back online, but some
176 need to re-initialize the policy object at all.  In that case, it is only
177 necessary to restart the scaling governor so that it can take the new online CPU
178 into account.  That is achieved by invoking the governor's ``->stop()`` and
179 ``->start()`` callbacks, in this order, for the entire policy.
182 governor layer of ``CPUFreq`` and provides its own P-state selection algorithms.
184 new policy objects.  Instead, the driver's ``->setpolicy()`` callback is invoked
185 to register per-CPU utilization update callbacks for each policy.  These
186 callbacks are invoked by the CPU scheduler in the same way as for scaling
187 governors, but in the |intel_pstate| case they both determine the P-state to
191 The policy objects created during CPU initialization and other data structures
194 when the last CPU belonging to the given policy is unregistered.
202 :file:`/sys/devices/system/cpu/`.
207 under :file:`/sys/devices/system/cpu/cpuY/` (where ``Y`` represents an integer
210 in :file:`/sys/devices/system/cpu/cpufreq` each contain policy-specific
217 also add driver-specific attributes to the policy directories in ``sysfs`` to
218 control policy-specific aspects of driver behavior.
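The per-policy ``sysfs`` layout described above can be explored with a short script.  The following is a minimal sketch, not part of ``CPUFreq`` itself: the function name ``read_policy_attrs`` is hypothetical, and the set of attributes actually found depends on the scaling driver in use.

```python
from pathlib import Path


def read_policy_attrs(cpufreq_root="/sys/devices/system/cpu/cpufreq"):
    """Return {policy_dir_name: {attribute_name: value}} for each policyX directory.

    Hypothetical helper: it only walks the directory layout described in the
    text and makes no assumption about which attributes a driver exposes.
    """
    root = Path(cpufreq_root)
    result = {}
    if not root.is_dir():
        # No CPUFreq sysfs directory (e.g. no scaling driver registered).
        return result
    for policy_dir in sorted(root.glob("policy[0-9]*")):
        attrs = {}
        for attr in sorted(policy_dir.iterdir()):
            if not attr.is_file():
                # Skip subdirectories such as per-policy governor tunables.
                continue
            try:
                attrs[attr.name] = attr.read_text().strip()
            except OSError:
                # Some attributes are not readable or fail on unsupported CPUs.
                continue
        result[policy_dir.name] = attrs
    return result


if __name__ == "__main__":
    for policy, attrs in read_policy_attrs().items():
        print(policy)
        for name, value in attrs.items():
            print(f"  {name}: {value}")
```

The root directory is a parameter so that the helper can be pointed at a copy of the tree for testing.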
220 The generic attributes under :file:`/sys/devices/system/cpu/cpufreq/policyX/`
230 	CPU frequencies, that limit will be reported through this attribute (if
235 	BIOS/HW-based mechanisms.
262         CPU(s) will result in an appropriate error, i.e. EAGAIN for a CPU that
275 	P-state to another, in nanoseconds.
278 	work with the `ondemand`_ governor, -1 (:c:macro:`CPUFREQ_ETERNAL`)
301 	In the majority of cases, this is the frequency of the last P-state
304 	the CPU is actually running at (due to hardware design and other
308 	more precisely reflecting the current CPU frequency through this
309 	attribute, but that still may not be the exact current CPU frequency as
321 	This attribute is read-write and writing to it will cause a new scaling
332 	This attribute is read-write and writing a string representing an
340 	This attribute is read-write and writing a string representing a
341 	non-negative integer to it will cause a new limit to be set (it must not
366 Some governors expose ``sysfs`` attributes to control or fine-tune the scaling
368 tunables, can be either global (system-wide) or per-policy, depending on the
370 per-policy, they are located in a subdirectory of each policy directory.
372 :file:`/sys/devices/system/cpu/cpufreq/`.  In either case the name of the
377 ---------------
387 -------------
397 -------------
400 to set the CPU frequency for the policy it is attached to by writing to the
404 -------------
406 This governor uses CPU utilization data available from the CPU scheduler.  It
407 generally is regarded as a part of the CPU scheduler, so it can access the
411 invoke the scaling driver asynchronously when it decides that the CPU frequency
413 is capable of changing the CPU frequency from scheduler context).
415 The actions of this governor for a particular CPU depend on the scheduling class
416 invoking its utilization update callback for that CPU.  If it is invoked by the
420 Per-Entity Load Tracking (PELT) metric for the root control group of the
421 given CPU as the CPU utilization estimate (see the *Per-entity load tracking*
423 CPU frequency to apply is computed in accordance with the formula
428 ``util``, and ``f_0`` is either the maximum possible CPU frequency for the given
429 policy (if the PELT number is frequency-invariant), or the current CPU frequency
433 CPU frequency for tasks that have been waiting on I/O most recently, called
434 "IO-wait boosting".  That happens when the :c:macro:`SCHED_CPUFREQ_IOWAIT` flag
451 tightly integrated with the CPU scheduler, its overhead in terms of CPU context
452 switches and similar is less significant, and it uses the scheduler's own CPU
457 ------------
459 This governor uses CPU load as a CPU frequency selection metric.
461 In order to estimate the current CPU load, it measures the time elapsed between
463 time in which the given CPU was not idle.  The ratio of the non-idle (active)
464 time to the total CPU time is taken as an estimate of the load.
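The load estimate described above can be sketched as follows.  This is an illustration under stated assumptions rather than the governor's actual code: the function name and its four arguments (cumulative total and idle time counters sampled at two points in time, in the same unit) are hypothetical.

```python
def estimate_load(prev_total, prev_idle, cur_total, cur_idle):
    """Estimate CPU load in percent between two samples.

    Hypothetical inputs: cumulative total and idle time counters taken at the
    previous and the current sampling point, in the same time unit.
    """
    total = cur_total - prev_total
    idle = cur_idle - prev_idle
    if total <= 0:
        # No time elapsed between the samples; report no load.
        return 0
    # Active time is the part of the interval in which the CPU was not idle.
    active = max(total - idle, 0)
    return min(100, 100 * active // total)
```

For example, an interval of 1000 time units of which 250 were idle yields an estimated load of 75 percent.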
471 invoked asynchronously (via a workqueue) and CPU P-states are updated from
473 governor is minimal, but it causes additional CPU context switches to happen
474 relatively often and the CPU P-state updates triggered by it can be relatively
475 irregular.  Also, it affects its own CPU load metric by running code that
476 reduces the CPU idle time (even though the CPU idle time is only reduced very
479 It generally selects CPU frequencies proportional to the estimated load, so that
498 	If this tunable is per-policy, the following shell command sets the time
505 	If the estimated CPU load is above this value (in percent), the governor
508 	CPU load.
511 	If set to 1 (default 0), it will cause the CPU load estimation code to
512 	treat the CPU time spent on executing tasks with "nice" levels greater
513 	than 0 as CPU idle time.
522 	the ``sampling_rate`` value if the CPU load goes above ``up_threshold``.
529 	at the cost of additional energy spent on maintaining the maximum CPU
530 	capacity.
535 	value is exceeded by the estimated CPU load) or sensitivity threshold
543 		f * (1 - ``powersave_bias`` / 1000)
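As a worked instance of the expression above, the following sketch applies the reduction factor with integer arithmetic.  The function name and the kHz unit are assumptions made for illustration, and the accepted range of 0 to 1000 is likewise assumed here.

```python
def apply_powersave_bias(freq_khz, powersave_bias):
    """Return f * (1 - powersave_bias / 1000) using integer arithmetic.

    Hypothetical helper; the 0..1000 range check is an assumption of this
    sketch, with 0 meaning no reduction.
    """
    if not 0 <= powersave_bias <= 1000:
        raise ValueError("powersave_bias must be between 0 and 1000")
    return freq_khz * (1000 - powersave_bias) // 1000
```

With a target of 2000000 kHz and a ``powersave_bias`` of 100, the reduced target is 1800000 kHz, that is, 10 percent lower.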
555 	workload running on a CPU will change in response to frequency changes.
557 	The performance of a workload with the sensitivity of 0 (memory-bound or
558 	IO-bound) is not expected to increase at all as a result of increasing
559 	the CPU frequency, whereas workloads with the sensitivity of 100%
560 	(CPU-bound) are expected to perform much better if the CPU frequency is
566 	target, so as to avoid over-provisioning workloads that will not benefit
567 	from running at higher CPU frequencies.
570 ----------------
572 This governor uses CPU load as a CPU frequency selection metric.
574 It estimates the CPU load in the same way as the `ondemand`_ governor described
575 above, but the CPU frequency selection algorithm implemented by it is different.
578 which may not be suitable for systems with limited power supply capacity (e.g.
579 battery-powered).  To achieve that, it changes the frequency in relatively
580 small steps, one step at a time, up or down - depending on whether or not a
581 (configurable) threshold has been exceeded by the estimated CPU load.
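The stepwise behavior described above can be illustrated with a small sketch.  It is not the governor's implementation: the function name, the ``step`` argument and the default threshold values of 80 and 20 percent are assumptions chosen for the example.

```python
def conservative_next_freq(cur, load, min_freq, max_freq, step,
                           up_threshold=80, down_threshold=20):
    """Move the frequency by at most one step, depending on the load.

    Hypothetical helper: frequencies and step share one unit (e.g. kHz),
    load and thresholds are percentages; the defaults are assumptions.
    """
    if load > up_threshold:
        # Load is high: go up by one step, but never above the maximum.
        return min(cur + step, max_freq)
    if load < down_threshold:
        # Load is low: go down by one step, but never below the minimum.
        return max(cur - step, min_freq)
    # Load is between the thresholds: keep the current frequency.
    return cur
```

Calling the function repeatedly with a sustained high load walks the frequency up one step per sample instead of jumping straight to the maximum.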
600 	If the estimated CPU load is greater than this value, the frequency will
617 ----------
626 "Turbo-Core" or (in technical documentation) "Core Performance Boost" and so on.
631 The frequency boost mechanism may be either hardware-based or software-based.
632 If it is hardware-based (e.g. on x86), the decision to trigger the boosting is
634 into a special state in which it can control the CPU frequency within certain
635 limits).  If it is software-based (e.g. on ARM), the scaling driver decides
639 -------------------------------
641 This file is located under :file:`/sys/devices/system/cpu/cpufreq/` and controls
644 but provides a driver-specific interface for controlling it, like
649 trigger boosting (in the hardware-based case), or the software is allowed to
650 trigger boosting (in the software-based case).  It does not mean that boosting
661 --------------------------------
664 CPU performance on time scales below software resolution (e.g. below the
677      limited capacity, such as batteries, so the ability to disable the boost
691      single-thread performance may vary because of it, which may lead to
697 -----------------------
699 The AMD powernow-k8 scaling driver supports a ``sysfs`` knob very similar to
704 ``sysfs`` (:file:`/sys/devices/system/cpu/cpufreq/policyX/`) and is called
706 implementation, however, works on a system-wide basis and setting that knob
726 .. [1] Jonathan Corbet, *Per-entity load tracking*,