/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2009 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 *
 * Copyright 2024 Oxide Computer Company
 */

/*
 * AMD-specific CPU power management support.
 *
 * And, a brief history of AMD CPU power management. Or, "Why you care about CPU
 * power management even when you are not worried about a few watts from the
 * wall." This history is intended to provide lodestones for this domain, but is
 * not a fully comprehensive AMD power management feature chronology.
 *
 * In the early 2000s, AMD shipped a feature called PowerNow! in the K6 era -
 * K6-2E+ and K6-III+ cores, according to "AMD PowerNow! Technology Dynamically
 * Manages Power and Performance", publication number 24404A. This feature
 * allowed operating systems to control power and performance settings in a way
 * that is very similar to ACPI P-states. That is, selectable core voltage and
 * frequency levels, with default "power-saver" and "high-performance" modes
 * that are reflective of Pmin and Pmax on a 2024-era AMD processor.
 *
 * With Thuban and Zosma parts later in the K10 era, AMD extended power and
 * frequency management with the "Turbo Core" feature. They talk about this in
 * more detail in blogs about the Bulldozer architecture, though many materials
 * are now dead links. Exactly how Turbo Core is informed and managed is less
 * discussed, or at least I have been unable to find good technical material on
 * the topic, but we can draw some inferences from what *is* discussed with
 * those Bulldozer cores:
 *   * introduces the notion of boosting all cores beyond a "base frequency"
 *   * introduces the notion of boosting further with only half or fewer cores
 *     active
 *   * introduces the notion of power-governed turbo boost
 *
 * Somewhere in the K10 era, AMD also introduced C-state support, allowing cores
 * to be put into low-power idle states when not used. Some articles from
 * reviewers and system integrators around this time indicate that setting the
 * "C-state mode to C6" is "required to get the highest Turbo Core frequencies."
 *
 * As the AMD 15h BIOS and Kernel Developers Guide (BKDG) is clear to note, AMD
 * C-states do not directly correspond to ACPI C-states. But when an ACPI
 * low-power C-state is entered, the CPU's low-power implementation is one of
 * these AMD C-states, and C6 is the lowest-power of them.
 *
 * Further, note that in the Bulldozer era, CPUs were in the range of 4-8 cores,
 * so "half or fewer cores" means "2-4 active cores."
 *
 * At this point and onwards, for some families of AMD parts, best single-core
 * performance can only be achieved if an operating system parks idle CPU cores
 * in the lowest-power states - AMD's C6, aka ACPI C3.
 *
 * Boosting beyond a base clock, in an AMD-defined and approved manner,
 * potentially on all cores, has since also been branded as "AMD Core
 * Performance Boost." This is the name under which this behavior is known in
 * Zen and later parts.
 *
 * Zen included a more expansive power management approach, "Precision Boost."
 * "Precision Boost" is where we start to see power management more
 * explicitly *managed* - core clocks and voltages are decided by some software
 * running on the new System Management Unit (SMU). Correspondingly, exactly
 * what voltage/temperature/power inputs will produce what operational outcomes
 * from the processor becomes less and less clearly documented.
 *
 * For example, a (Zen 1) Ryzen 7 1700 part is labeled as 3.0GHz base clock,
 * with up to 3.8GHz boost clock. This 800MHz gamut is the purview of the SMU
 * implementing "Precision Boost."
 *
 * In practice, later AMD marketing material implies that Precision Boost
 * retained the "Turbo Core" behavior that peak boost frequencies are only
 * attainable when one or two cores are actually active. Additionally, even if
 * all cores are loaded, Precision Boost provides some amount of boost if
 * thermal and power headroom allows.
 *
 * Taking the above Ryzen 7 1700 part as an example, the "base clock" of 3.0 GHz
 * is relatively unlikely to be an actual operational frequency of the part.
 * Either a core will be off (as in AMD-defined C1 or C6), on in a low-power
 * P-state (the processor's minimum operational frequency, probably P1 or
 * whatever Pmin the part supports), or on in a high-power P-state (P0). In the
 * high-power P-state, if the "boost above base clock" feature is enabled, a
 * core will probably be some hundreds of MHz above its requested clock speed!
 *
 * Further, somewhere around the Zen architecture AMD introduced the "Extended
 * Frequency Range" (XFR) feature, which allows the processor to clock
 * 100-150MHz (depending on SKU) higher than "max turbo."
 * This is still constrained by the silicon, power, and thermal limits
 * indicated by a combination of fused values set at fabrication time, platform
 * firmware, and potentially user customization (if firmware allows). Specifics
 * here are still slim pickings.
 *
 * Forum-goers in 2018 would discuss their Ryzen 7 1700s having a clock speed of
 * 3.1-3.2GHz under all-core load, going up to 3.7GHz under one- or two-core
 * loads. All frequency selection in this range is up to the SMU, potentially
 * capped by BIOS or OS-provided parameters.
 *
 * As of Zen 5, the latest development here is "Precision Boost 2", which began
 * shipping with Zen 2. This seems to be an upgrade of the power/frequency
 * selection regime used by the SMU - instead of "all-core" and "low-core"
 * turbos, the processor measures its utilization of system-specific parameters
 * such as package temperature and power draw. Exactly how frequency choices are
 * made at this point appears to be a black box, other than blanket statements
 * like "the processor will pick the highest permissible frequency given its
 * operating environment."
 *
 * An interesting detail in the marketing material and slide decks surrounding
 * the introduction of Precision Boost 2 is an implicit confirmation that
 * Precision Boost did maintain a strict "all-core" and "low-core" pair of
 * frequencies. This comes from the marketing statement that Precision Boost 2
 * has done away with those concepts from previous generations, instead
 * providing a "linear scaling" of frequencies under increasing load levels.
 *
 * This brings us to 2024; empirically the above blanket statements are only
 * correct if the operating system manages CPU cores in a way roughly
 * commensurate with how AMD would expect an operating system to manage them.
 *
 * This is especially dramatic on AMD's server parts - Naples, Rome, Milan, and
 * onward - where having all cores in high-power C-states, even if in low-power
 * P-states, still prevents individual cores from boosting closer to a part's
 * Fmax. The difference between a peak clock without C-state management, and
 * peak clock with C-state management, can be as much as 20% of a part's Fmax.
 * This has also been seen on Threadripper systems. But the impact of C-state
 * management seems much less dramatic on "desktop" parts; a 7950X without
 * C-state management can see individual cores clocking to 5.4 GHz or above,
 * much closer to its rated Fmax of 5.75 GHz.
 *
 * From empirical measurement, the difference here appears to be an undocumented
 * "all-core" turbo that the part limits itself to if all cores are in C0, even
 * if they are in Pmax and stopped in hlt/mwait idle - the actual power draw
 * differences between these states may be small, but simply being powered
 * seems to trip some threshold.
 *
 * One conclusion from all this is that across the board, C-state management can
 * have a surprising relationship to performance. Unfortunately, the direct
 * relationships are undocumented. We are entirely dependent on ACPI-provided
 * latency information to decide if C-state transitions are profitable given
 * instantaneous workloads and performance needs.
 *
 * Finally, CPPC (Collaborative Processor Performance Control) is a feature
 * that currently seems to be more oriented towards desktop enthusiast parts,
 * but stretches the above even further. CPPC includes an abstract "performance
 * scale" a processor supports, where the operating system requests some factor
 * along this scale based on workloads it must run.
 * CPPC also introduces the idea of "Preferred Cores", where at manufacturing
 * time individual cores in a die are fused with information indicating how
 * highly they can be driven. This is reportedly reflected as higher peak
 * clocks under load, and lower voltage (and less power) at intermediate clocks.
 *
 * It would be nice, in the limit of time, to find if a given processor supports
 * CPPC, collect its preferred cores, and prefer scheduling tasks on those cores
 * if they are not already busy. This extends somewhat beyond simply managing
 * power states of loaded cores.
 */

#include <sys/x86_archext.h>
#include <sys/cpu_acpi.h>
#include <sys/cpu_idle.h>
#include <sys/pwrnow.h>

boolean_t
cpupm_amd_init(cpu_t *cp)
{
	cpupm_mach_state_t *mach_state =
	    (cpupm_mach_state_t *)(cp->cpu_m.mcpu_pm_mach_state);

	/* AMD or Hygon? */
	if (x86_vendor != X86_VENDOR_AMD &&
	    x86_vendor != X86_VENDOR_HYGON)
		return (B_FALSE);

	/*
	 * If we support PowerNow! on this processor, then set the
	 * correct cma_ops for the processor.
	 */
	mach_state->ms_pstate.cma_ops = pwrnow_supported() ?
	    &pwrnow_ops : NULL;

	/*
	 * AMD systems may support C-states, so optimistically set cma_ops to
	 * drive C-states. If the system does not *actually* support C-states,
	 * ACPI tables will not include _CST objects and `cpus_init` will fail.
	 * This, in turn, will cause `cpupm_init` to reset idle handling to not
	 * use C-states, including clearing `ms_cstate.cma_ops`.
	 */
	mach_state->ms_cstate.cma_ops = &cpu_idle_ops;

	return (B_TRUE);
}