/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 *
 * Copyright 2024 Oxide Computer Company
 */

/*
 * AMD-specific CPU power management support.
 *
 * And, a brief history of AMD CPU power management. Or, "Why you care about CPU
 * power management even when you are not worried about a few watts from the
 * wall." This history is intended to provide lodestones for this domain, but is
 * not a fully comprehensive AMD power management feature chronology.
 *
 * In the early 2000s, AMD shipped a feature called PowerNow! in the K6 era -
 * K6-2E+ and K6-III+ cores, according to "AMD PowerNow! Technology Dynamically
 * Manages Power and Performance", publication number 24404A. This feature
 * allowed operating systems to control power and performance settings in a way
 * that is very similar to ACPI P-states. That is, selectable core voltage and
 * frequency levels, with default "power-saver" and "high-performance" modes
 * that are reflective of Pmin and Pmax on a 2024-era AMD processor.
 *
 * With Thuban and Zosma parts later in the K10 era, AMD extended power and
 * frequency management with the "Turbo Core" feature. They talk about this in
 * more detail in blogs about the Bulldozer architecture, though many materials
 * are now dead links. Exactly how Turbo Core is informed and managed is less
 * discussed, or at least I have been unable to find good technical material on
 * the topic, but we can draw some inferences from what *is* discussed with
 * those Bulldozer cores:
 *  * introduces the notion of boosting all cores beyond a "base frequency"
 *  * introduces the notion of boosting further with only half or fewer cores
 *    active
 *  * introduces the notion of power-governed turbo boost
 *
 * Somewhere in the K10 era, AMD also introduced C-state support, allowing cores
 * to be put into low-power idle states when not used. Some articles from
 * reviewers and system integrators around this time indicate that setting the
 * "C-state mode to C6" is "required to get the highest Turbo Core frequencies."
 *
 * As the AMD 15h BIOS and Kernel Developers Guide (BKDG) is clear to note, AMD
 * C-states do not directly correspond to ACPI C-states. But when an ACPI
 * low-power C-state is entered, the CPU's low-power implementation is one of
 * these AMD C-states, and C6 is the lowest-power of them.
 *
 * Further, note that in the Bulldozer era, CPUs were in the range of 4-8 cores,
 * so "half or fewer cores" means "2-4 active cores."
 *
 * At this point and onwards, for some families of AMD parts, best single-core
 * performance can only be achieved if an operating system parks idle CPU cores
 * in the lowest-power states - AMD's C6, aka ACPI C3.
 *
 * Boosting beyond a base clock, in an AMD-defined and approved manner,
 * potentially on all cores, has since also been branded as "AMD Core
 * Performance Boost." This is the name under which this behavior is known in
 * Zen and later parts.
 *
 * Zen included a more expansive power management approach, "Precision Boost."
 * "Precision Boost" is where we start to see power management more
 * explicitly *managed* - core clocks and voltages are decided by some software
 * running on the new System Management Unit (SMU). Correspondingly, exactly
 * what voltage/temperature/power inputs will produce what operational outcomes
 * from the processor becomes less and less clearly documented.
 *
 * For example, a (Zen 1) Ryzen 7 1700 part is labeled as 3.0GHz base clock,
 * with up to 3.8GHz boost clock. This 800MHz gamut is the purview of the SMU
 * implementing "Precision Boost."
 *
 * In practice, later AMD marketing material implies that Precision Boost
 * retained the "Turbo Core" behavior in which peak boost frequencies are only
 * attainable when one or two cores are actually active. Additionally, even if
 * all cores are loaded, Precision Boost provides some amount of boost if
 * thermal and power headroom allows.
 *
 * Taking the above Ryzen 7 1700 part as an example, the "base clock" of 3.0 GHz
 * is relatively unlikely to be an actual operational frequency of the part.
 * Either a core will be off (as in AMD-defined C1 or C6), on in a low-power
 * P-state (the processor's minimum operational frequency, probably P1 or
 * whatever Pmin the part supports), or on in a high-power P-state (P0). In the
 * high-power P-state, if the "boost above base clock" feature is enabled, a
 * core will probably be some hundreds of MHz above its requested clock speed!
 *
 * Further, somewhere around the Zen architecture AMD introduced the "Extended
 * Frequency Range" (XFR) feature, which allows the processor to clock
 * 100-150MHz (depending on SKU) higher than "max turbo." This is still
 * constrained by the silicon, power, and thermal limits indicated by a
 * combination of fused values set at fabrication time, platform firmware, and
 * potentially user customization (if firmware allows). Specifics here are
 * still slim pickings.
 *
 * Forum-goers in 2018 would discuss their Ryzen 7 1700s having a clock speed of
 * 3.1-3.2GHz under all-core load, going up to 3.7GHz under one- or two-core
 * loads. All frequency selection in this range is up to the SMU, potentially
 * capped by BIOS or OS-provided parameters.
 *
 * As of Zen 5, the latest development here is "Precision Boost 2", which began
 * shipping with Zen 2. This seems to be an upgrade of the power/frequency
 * selection regime used by the SMU - instead of "all-core" and "low-core"
 * turbos, the processor measures its utilization of system-specific parameters
 * such as package temperature and power draw. Exactly how frequency choices are
 * made at this point appears to be a black box, other than blanket statements
 * like "the processor will pick the highest permissible frequency given its
 * operating environment."
 *
 * An interesting detail in the marketing material and slide decks surrounding
 * the introduction of Precision Boost 2.0 is an implicit confirmation that
 * Precision Boost did maintain a strict "all-core" and "low-core" pair of
 * frequencies. This comes from the marketing statement that Precision Boost 2.0
 * has done away with those concepts from previous generations, instead
 * providing a "linear scaling" of frequencies under increasing load levels.
 *
 * This brings us to 2024; empirically the above blanket statements are only
 * correct if the operating system manages CPU cores in a way roughly
 * commensurate with how AMD would expect an operating system to manage them.
 *
 * This is especially dramatic on AMD's server parts - Naples, Rome, Milan, and
 * onward - where having all cores in high-power C-states, even if possibly in
 * low-power P-states, still prevents individual cores from boosting closer to
 * a part's Fmax. The difference between a peak clock without C-state
 * management, and peak clock with C-state management, can be as much as 20% of
 * a part's Fmax. This has also been seen on Threadripper systems. But the
 * impact of C-state management seems much less dramatic on "desktop" parts; a
 * 7950X without C-state management can see individual cores clocking to 5.4
 * GHz or above, much closer to its rated Fmax of 5.75 GHz.
 *
 * From empirical measurement, the difference here appears to be an undocumented
 * "all-core" turbo that the part limits itself to if all cores are in C0, even
 * if those cores are in Pmax and stopped in hlt/mwait idle - the actual power
 * draw differences between these states may be small, but simply being powered
 * seems to trip some threshold.
 *
 * One conclusion from all this is that across the board, C-state management can
 * have a surprising relationship to performance. Unfortunately, the direct
 * relationships are undocumented. We are entirely dependent on ACPI-provided
 * latency information to decide if C-state transitions are profitable given
 * instantaneous workloads and performance needs.
 *
 * Finally, CPPC (Collaborative Processor Performance Control) is a feature
 * that currently seems to be more oriented towards desktop enthusiast parts,
 * but stretches the above even further. CPPC includes an abstract "performance
 * scale" a processor supports, where the operating system requests some factor
 * along this scale based on workloads it must run. CPPC also introduces the
 * idea of "Preferred Cores", where at manufacturing time individual cores in a
 * die are fused with information indicating how highly they can be driven.
 * This is reportedly reflected as higher peak clocks under load, and lower
 * voltage (and less power) at intermediate clocks.
 *
 * It would be nice, in the limit of time, to find if a given processor supports
 * CPPC, collect its preferred cores, and prefer scheduling tasks on those cores
 * if they are not already busy. This extends somewhat beyond simply managing
 * power states of loaded cores.
 */

#include <sys/x86_archext.h>
#include <sys/cpu_acpi.h>
#include <sys/cpu_idle.h>
#include <sys/pwrnow.h>

boolean_t
cpupm_amd_init(cpu_t *cp)
{
	cpupm_mach_state_t *mach_state =
	    (cpupm_mach_state_t *)(cp->cpu_m.mcpu_pm_mach_state);

	/* AMD or Hygon? */
	if (x86_vendor != X86_VENDOR_AMD &&
	    x86_vendor != X86_VENDOR_HYGON)
		return (B_FALSE);

	/*
	 * If we support PowerNow! on this processor, then set the
	 * correct cma_ops for the processor.
	 */
	mach_state->ms_pstate.cma_ops = pwrnow_supported() ?
	    &pwrnow_ops : NULL;

	/*
	 * AMD systems may support C-states, so optimistically set cma_ops to
	 * drive C-states. If the system does not *actually* support C-states,
	 * ACPI tables will not include _CST objects and `cpus_init` will fail.
	 * This, in turn, will cause `cpupm_init` to reset idle handling to not
	 * use C-states including clearing `ms_cstate.cma_ops`.
	 */
	mach_state->ms_cstate.cma_ops = &cpu_idle_ops;

	return (B_TRUE);
}