1 /* 2 * This file and its contents are supplied under the terms of the 3 * Common Development and Distribution License ("CDDL"), version 1.0. 4 * You may only use this file in accordance with the terms of version 5 * 1.0 of the CDDL. 6 * 7 * A full copy of the text of the CDDL should have accompanied this 8 * source. A copy of the CDDL is also available via the Internet at 9 * http://www.illumos.org/license/CDDL. 10 */ 11 12 /* 13 * Copyright 2022 Oxide Computer Co. 14 */ 15 16 #ifndef _SYS_AMDZEN_SMN_H 17 #define _SYS_AMDZEN_SMN_H 18 19 #include <sys/debug.h> 20 #include <sys/types.h> 21 22 /* 23 * Generic definitions for the system management network (SMN) in Milan and many 24 * other AMD Zen processors. These are shared between the amdzen nexus and its 25 * client drivers and kernel code that may require SMN access to resources. 26 * 27 * ------------------------ 28 * Endpoints and Addressing 29 * ------------------------ 30 * 31 * SMN addresses are 36 bits long but in practice we can use only 32. Bits 32 * [35:32] identify a destination node, but all consumers instead direct SMN 33 * transactions to a specific node by selecting the address/data register pair 34 * in the NBIO PCI config space corresponding to the destination. Additional 35 * information about nodes and the organisation of devices in the Zen 36 * architecture may be found in the block comments in amdzen.c and cpuid.c. 37 * 38 * The SMN provides access to instances of various functional units present on 39 * or accessed via each node. Some functional units have only a single instance 40 * per node while others may have many. Each functional unit instance has one 41 * or more apertures in which it decodes addresses. The aperture portion of the 42 * address consists of bits [31:20] and the remainder of the address is used to 43 * specify a register instance within that functional unit. To complicate 44 * matters, some functional units have multiple smaller sub-units that decode 45 * smaller regions within its parent's aperture; in some cases, the bits in a 46 * mask describing the sub-unit's registers may not be contiguous. To keep 47 * software relatively simple, we generally treat sub-units and parent units the 48 * same and try to choose collections of registers whose addresses can all be 49 * computed in the same manner to form what we will describe as a unit. 50 * 51 * Each functional unit should typically have its own header containing register 52 * definitions, accessors, and address calculation routines; some functional 53 * units are small and straightforward while others may have numerous complex 54 * sub-units, registers with many instances whose locations are computed in 55 * unusual and nonstandard ways, and other features that need to be declared for 56 * consumers. Those functional units that are present across many processors 57 * and have similar or identical contents across them should live in this 58 * directory; umc.h is such an example. Others may be specific to a particular 59 * processor family (see cpuid.c) or other collection and may require their own 60 * subdirectories, symbol prefixes, and so on. Unlike the DF, the existence, 61 * location, and format of registers accessible over SMN are not versioned nor 62 * are they generally self-discoverable. Each functional unit may be present or 63 * absent, in varying numbers and with varying functionality, across the entire 64 * Zen product range. Therefore, at this time most per-unit headers are 65 * intended for use only by code that will execute on a specific processor 66 * family. Unifying them over time is considered desirable to the extent the 67 * hardware allows it. 68 * 69 * ----- 70 * Types 71 * ----- 72 * 73 * Practically every last one of us has screwed up the order of arguments to 74 * functions like amdzen_smn_write32() when they take an address and a value of 75 * the same type. Repeatedly. Often. To safety this particularly annoying 76 * footgun, we pass SMN register addresses around in a dedicated struct type 77 * smn_reg_t, intended to be instantiated only by the amdzen_xx_smn_reg() and 78 * analogous kernel functions and the macros that expand to them or, for the 79 * YOLO crew, SMN_MAKE_REG(). Since the struct type and uint32_t are not 80 * compatible, the compiler will always squawk if the register and value 81 * arguments are reversed, leaving us far fewer baffling failures to debug at 82 * runtime. Typical callers don't require any awareness of this at all, but 83 * those that want to pass the address around to e.g. log warnings can obtain 84 * the uint32_t address via SMN_REG_ADDR(). 85 * 86 * Register definitions within functional units are provided by objects of type 87 * `const smn_reg_def_t`, the usage of which is described in detail in the next 88 * section. For now these are produced on demand by macros; see additional 89 * notes on conventions below. In time, this mechanism may be extended to 90 * incorporate version information in a manner similar to that used in df.h. An 91 * automated mechanism for creating a single collection of register and field 92 * definitions for C, in CTF, and/or for other language consumers as well as 93 * automated register value decoding remains an open area for future work. 94 * 95 * ----------------------- 96 * Instances and Iterators 97 * ----------------------- 98 * 99 * Not only do some functional units have many instances, so too do many 100 * registers. AMD documentation describes registers in terms of a series of 101 * iterators over various functional units, subunits, and other entities and 102 * attributes that each multiply the number of register instances. A concrete 103 * example from the publicly-available Naples PPR (publication 54945 rev. 1.14) 104 * may make this simpler to understand. Unfortunately, SMN is not described by 105 * this document, but the register instance syntax used is the same and is 106 * described in additional detail in sections 1.3.3-4. For our example, let us 107 * consider the same MSR that AMD uses in their own example, 108 * Core::X86::MSR::TSC. We are given that this register has the following 109 * instances: lthree[1:0]_core[3:0]_thread[1:0]. We therefore have three 110 * iterators: one for 'lthree's, one for 'core's for each 'lthree', and one for 111 * 'thread's for each 'core'. We can also see that there are 16 total 112 * instances; in fact, there are actually 16 per core-complex die (CCD), which 113 * documents for more recent processors would expose as a fourth iterator. To 114 * keep things relatively simple, we will assume that there are only 16 per 115 * processor. If it were possible to access all of these instances via MMIO, 116 * SMN, or some other flat address space (it isn't, as far as we can tell), a 117 * function for computing the address of each instance would require three 118 * parameters. Let us suppose that this register really were accessible via 119 * SMN; in that case, we would also be provided with a list of instance alias 120 * such as 121 * 122 * _thread[1:0]_core[7:0]_lthree[1:0]_alias_SMN: THREADREGS[1:0]x0000_0010; 123 * THREADREGS[1:0]=COREREGS[7:0]x0000_[4,0]000; 124 * COREREGS[7:0]=L3REGS[1:0]x000[7:0]_5000; L3REGS[1:0]=57[A,6]0_0000 125 * 126 * To compute the address of an instance of this hypothetical register, we would 127 * begin by determining that its top-level functional unit is L3REGS with a base 128 * aperture at 0x5760_0000. There are two instances of this functional unit (01 129 * and 1) and each subsequent instance is offset 0x40_0000 from the previous. 130 * This allows us to compute the base address of each L3REGS block; a similar 131 * process is then used to compute the base address of each COREREGS block, and 132 * finally the address of each THREADREGS block that contains the register 133 * instance. In practice, we might choose instead to consider the COREREGS as 134 * our functional unit, with instances at 0x5760_5000, 0x5761_5000, 0x57A0_5000, 135 * and 0x57A1_5000; whether it is useful to do this depends on whether we need 136 * to consider other registers in the L3REGS unit that may not have per-core 137 * blocks or instances but would otherwise be interleaved with these. This ends 138 * up being something of a judgment call. Let's suppose we want to consider the 139 * entire L3REGS functional unit and write a function to compute the address of 140 * any register (including our hypothetical TSC) in the subordinate THREADREGS 141 * blocks. We'll start by adding the new unit to the smn_unit_t enumeration; 142 * let's call it SMN_UNIT_L3REGS_COREREGS since that's the sub-unit level at 143 * which we can uniformly compute register instance addresses. We have already 144 * determined our base aperture and we know that we have 3 iterators and 145 * therefore three parameters; all SMN address calculators return an smn_reg_t 146 * and must accept an smn_reg_def_t. Therefore our function's signature is: 147 * 148 * smn_reg_t amdzen_smn_l3regs_coreregs_reg(uint8_t l3no, 149 * const smn_reg_def_t def, uint16_t coreinst, uint16_t threadinst); 150 * 151 * We have chosen to use a base aperture of 0x5760_0000 and unit offset 152 * 0x40_0000, so we can begin by computing a COREREGS aperture: 153 * 154 * const uint32_t aperture_base = 0x57600000; 155 * const uint32_t aperture_off = l3no * 0x400000; 156 * const uint32_t coreregs_aperture_base = 0x5000; 157 * const uint32_t coreregs_aperture_off = coreinst * 0x10000; 158 * 159 * We can now consider the smn_reg_def_t our function will be given, which 160 * describes THREADREGS::TSC. Within the COREREGS functional sub-unit, each 161 * thread register has 2 instances present at a stride of 0x4000 bytes (from our 162 * hypothetical register definition), so the register would be defined as 163 * follows: 164 * 165 * #define D_L3REGS_COREREGS_THREAD_TSC (const smn_reg_def_t){ \ 166 * .srd_unit = SMN_UNIT_L3REGS_COREREGS, \ 167 * .srd_reg = 0x10, \ 168 * .srd_nents = 2, \ 169 * .srd_stride = 0x4000 \ 170 * } 171 * 172 * Note that describing the number of entries and their stride in the register 173 * definition allows us to collapse the last functional sub-unit in our 174 * calculation process: we need not compute the base aperture address of the 175 * THREADREGS sub-unit. Instead, we can follow our previous code with: 176 * 177 * const uint32_t aperture = aperture_base + 178 * coreregs_aperture_base + coreregs_aperture_off; 179 * const uint32_t reg = def.srd_reg + threadinst * def.srd_stride; 180 * 181 * Finally, we convert the aperture address and register offset into the 182 * appropriate type and return it: 183 * 184 * return (SMN_MAKE_REG(aperture + reg)); 185 * 186 * As you can see, other registers in THREADREGS would be defined with the same 187 * number entries and stride but a different offset (srd_reg member), while 188 * other registers in the COREREGS block would have a different offset and 189 * stride. For example, if a block of per-core (not per-thread) registers were 190 * located at COREREGS[7:0]x0000_1000, a register called "COREREGS::FrobberCntl" 191 * in that block with a single instance at offset 0x48 might be defined as 192 * 193 * #define D_L3REGS_COREREGS_FROB_CTL (const smn_reg_def_t){ \ 194 * .srd_unit = SMN_UNIT_L3REGS_COREREGS, \ 195 * .srd_reg = 0x1048, \ 196 * .srd_nents = 1 \ 197 * } 198 * 199 * You can satisfy yourself that the same calculation function we wrote above 200 * will correctly compute the address of the sole instance (0) of this register. 201 * To further simplify register definitions and callers, the actual address 202 * calculation functions are written to treat srd_nents == 0 to mean a register 203 * with a single instance, and to treat srd_stride == 0 as if it were 4 (the 204 * space occupied by registers accessed by SMN is -- so far as we can tell, 205 * practically always -- 4 bytes in size, even if the register itself is 206 * smaller). Additionally, a large number of assertions should be present in 207 * such functions to guard against foreign unit register definitions, 208 * out-of-bounds unit and register instance parameters, address overflow, and 209 * register instance offsets that overflow improperly into an aperture base 210 * address. All of these conditions indicate either an incorrect register 211 * definition or a bug in the caller. See the template macro at the bottom of 212 * this file and umc.h for additional examples of calculating and checking 213 * register addresses. 214 * 215 * With address computation out of the way, we can then provide an accessor for 216 * each instance this register: 217 * 218 * #define L3REGS_COREREGS_THREAD_TSC(l3, core, thread) \ 219 * amdzen_l3regs_coreregs_reg(l3, D_L3REGS_COREREGS_THREAD_TSC, \ 220 * core, thread) 221 * 222 * Our other per-core register's accessor would look like: 223 * 224 * #define L3REGS_COREREGS_FROB_CTL(l3, core) \ 225 * amdzen_l3regs_coreregs_reg(l3, D_L3REGS_COREREGS_FROB_CTL, core, 0) 226 * 227 * The next section describes these conventions in greater detail. 228 * 229 * ----------- 230 * Conventions 231 * ----------- 232 * 233 * First, let's consider the names of the register definition and the 234 * convenience macro supplied to obtain an instance of that register: we've 235 * prefixed the global definition of the registers with D_ and the convenience 236 * macros to return a specific instance are simply named for the register 237 * itself. Additionally, the two macros expand to objects of incompatible 238 * types, so that using the wrong one will always be detected at compile time. 239 * Why do we expose both of these? The instance macro is useful for callers who 240 * know at compile-time the name of the register of which they want instances; 241 * this makes it unnecessary to remember the names of functions used to compute 242 * register instance addresses. The definition itself is useful to callers that 243 * accept const smn_reg_def_t arguments referring to registers of which the 244 * immediate caller does not know the names at compile time. 245 * 246 * You may wonder why we don't declare named constants for the definitions. 247 * There are two ways we could do that and both are unfortunate: one would be to 248 * declare them static in the header, the other to separate declarations in the 249 * header from initialisation in a separate source file. Measurements revealed 250 * that the former causes a very substantial increase in data size, which will 251 * be multiplied by the number of registers defined and the number of source 252 * files including the header. As convenient as it is to have these symbolic 253 * constants available to debuggers and other tools at runtime, they're just too 254 * big. However, it is possible to generate code to be compiled into loadable 255 * modules that would contain a single copy of the constants for this purpose as 256 * well as for providing CTF to foreign-language binding generators. The other 257 * option considered here, putting the constants in separate source files, makes 258 * maintenance significantly more challenging and makes it likely not only that 259 * new registers may not be added properly but also that definitions, macros, or 260 * both may be incorrect. Neither of these options is terrible but for now 261 * we've optimised for simplicity of maintenance and minimal data size at the 262 * immediate but not necessarily permanent expense of some debugging 263 * convenience. 264 * 265 * We wish to standardise as much as possible on conventions across all 266 * Zen-related functional units and blocks (including those accessed by SMN, 267 * through the DF directly, and by other means). In general, some register and 268 * field names are shortened from their official names for clarity and brevity; 269 * the official names are always given in the comment above the definition. 270 * AMD's functional units come from many internal teams and presumably several 271 * outside vendors as well; as a result, there is no single convention to be 272 * found throughout the PPRs and other documentation. For example, different 273 * units may have registers containing "CTL", "CNTL", "CTRL", "CNTRL", and 274 * "CONTROL", as well as "FOO_CNTL", "FooCntl", and "Foo_Cntl". Reflecting 275 * longstanding illumos conventions, we collapse all such register names 276 * regardless of case as follows: 277 * 278 * CTL/CTRL/CNTL/CNTRL/CONTROL => CTL 279 * CFG/CONF/CONFIG/CONFIGURATION => CFG 280 * EN/ENAB/ENABLE/ENABLED => EN 281 * DIS/DISAB/DISABLE/DISABLED => DIS 282 * 283 * Note that if collapsing these would result in ambiguity, more of the official 284 * names will be preserved. In addition to collapsing register and field names 285 * in this case-insensitive manner, we also follow standard code style practice 286 * and name macros and constants in SCREAMING_SNAKE_CASE regardless of AMD's 287 * official name. It is similarly reasonable to truncate or abbreviate other 288 * common terms in a consistent manner where doing so preserves uniqueness and 289 * at least some semantic value; without doing so, some official register names 290 * will be excessively unwieldy and may not even fit into 80 columns. Please 291 * maintain these practices and strive for consistency with existing examples 292 * when abbreviation is required. 293 * 294 * As we have done elsewhere throughout the amdzen body of work, register fields 295 * should always be given in order starting with the most significant bits and 296 * working down toward 0; this matches AMD's documentation and makes it easier 297 * for reviewers and other readers to follow. The routines in bitext.h should 298 * be used to extract and set bitfields unless there is a compelling reason to 299 * do otherwise (e.g., assembly consumers). Accessors should be named 300 * UNIT_REG_GET_FIELD and UNIT_REG_SET_FIELD respectively, unless the register 301 * has a single field that has no meaningful name (i.e., the field's name is the 302 * same as the register's or it's otherwise obvious from the context what its 303 * purpose is), in which case UNIT_REG_GET and UNIT_REG_SET are appropriate. 304 * Additional getters and setters that select a particular bit from a register 305 * or field consisting entirely of individual bits describing or controlling the 306 * state of some entity may also be useful. As with register names, be as brief 307 * as possible without sacrificing too much information. 308 * 309 * Constant values associated with a field should be declared immediately 310 * following that field. If a constant or collection of constants is used in 311 * multiple fields of the same register, the definitions should follow the last 312 * such field; similarly, constants used in multiple registers should follow the 313 * last such register, and a comment explaining the scope of their validity is 314 * recommended. Such constants should be named for the common elements of the 315 * fields or registers in which they are valid. 316 * 317 * As noted above, SMN register definitions should omit the srd_nents and 318 * srd_stride members when there is a single instance of the register within the 319 * unit. The srd_stride member should also be elided when the register 320 * instances are contiguous. All address calculation routines should be written 321 * to support these conventions. Each register should have an accessor macro or 322 * function, and should accept instance numbers in order from superior to 323 * inferior (e.g., from the largest functional unit to the smallest, ending with 324 * the register instance itself). This convention is similar to that used in 325 * generic PCIe code in which a register is specified by bus, device, and 326 * function numbers in that order. Register accessor macros or inline functions 327 * should not expose inapplicable taxons to callers; in our example above, 328 * COREREGS_FROB_CTL has an instance for each core but is not associated with a 329 * thread; therefore its accessor should not accept a thread instance argument 330 * even though the address calculation function it uses does. 331 * 332 * Most of these conventions are not specific to registers accessed via SMN; 333 * note also that some registers may be accessed in multiple ways (e.g., SMN and 334 * MMIO, or SMN and the MSR instructions). While the code here is generally 335 * unaware of such aliased access methods, following these conventions will 336 * simplify naming and usage if such a register needs to be accessed in multiple 337 * ways. Sensible additions to macro and symbol names such as the access method 338 * to be used will generally be sufficient to disambiguate while allowing reuse 339 * of associated field accessors, constants, and in some cases even register 340 * offset, instance count, and stride. 341 */ 342 343 #ifdef __cplusplus 344 extern "C" { 345 #endif 346 347 #define SMN_APERTURE_MASK 0xfff00000 348 349 /* 350 * An instance of an SMN-accessible register. 351 */ 352 typedef struct smn_reg { 353 uint32_t sr_addr; 354 } smn_reg_t; 355 356 /*CSTYLED*/ 357 #define SMN_MAKE_REG(x) ((const smn_reg_t){ .sr_addr = (x) }) 358 #define SMN_REG_ADDR(x) ((x).sr_addr) 359 360 /* 361 * This exists so that address calculation functions can check that the register 362 * definitions they're passed are something they understand how to use. While 363 * many address calculation functions are similar, some functional units define 364 * registers with multiple iterators, have differently-sized apertures, or both; 365 * it's important that we reject foreign register definitions in these 366 * functions. In principle this could be done at compile time, but the 367 * preprocessor gymnastics required to do so are excessively vile and we are 368 * really already hanging it pretty far over the edge in terms of what the C 369 * preprocessor can do for us. 370 */ 371 typedef enum smn_unit { 372 SMN_UNIT_UNKNOWN, 373 SMN_UNIT_IOAPIC, 374 SMN_UNIT_IOHC, 375 SMN_UNIT_IOHCDEV_PCIE, 376 SMN_UNIT_IOHCDEV_NBIF, 377 SMN_UNIT_IOHCDEV_SB, 378 SMN_UNIT_IOAGR, 379 SMN_UNIT_SDPMUX, 380 SMN_UNIT_UMC, 381 SMN_UNIT_PCIE_CORE, 382 SMN_UNIT_PCIE_PORT, 383 SMN_UNIT_PCIE_RSMU, 384 SMN_UNIT_SCFCTP, 385 SMN_UNIT_SMUPWR, 386 SMN_UNIT_IOMMUL1, 387 SMN_UNIT_IOMMUL2, 388 SMN_UNIT_NBIF, 389 SMN_UNIT_NBIF_ALT, 390 SMN_UNIT_NBIF_FUNC 391 } smn_unit_t; 392 393 /* 394 * srd_unit and srd_reg are required; they describe the functional unit and the 395 * register's address within that unit's aperture (which may be the SDP-defined 396 * aperture described above or a smaller one if a unit has been broken down 397 * logically into smaller units). srd_nents is optional; if not set, all 398 * existing consumers assume a value of 0 is equivalent to 1: the register has 399 * but a single instance in each unit. srd_stride is ignored if srd_nents is 0 400 * or 1 and optional otherwise; it describes the number of bytes to be added to 401 * the previous instance's address to obtain that of the next instance. If left 402 * at 0 it is assumed to be 4 bytes. 403 * 404 * There are units in which registers have more complicated collections of 405 * instances that cannot be represented perfectly by this simple descriptor; 406 * they require custom address calculation macros and functions that may take 407 * additional arguments, and they may not be able to check their arguments or 408 * the computed addresses as carefully as would be ideal. 409 */ 410 typedef struct smn_reg_def { 411 smn_unit_t srd_unit; 412 uint32_t srd_reg; 413 uint32_t srd_stride; 414 uint16_t srd_nents; 415 } smn_reg_def_t; 416 417 /* 418 * This macro may be used by per-functional-unit code to construct an address 419 * calculation function. It is usable by some, BUT NOT ALL, functional units; 420 * see the block comment above for an example that cannot be accommodated. Here 421 * we assume that there are at most 2 iterators in any register's definition. 422 * Use this when possible, as it provides a large number of useful checks on 423 * DEBUG bits. Similar checks should be incorporated into implementations for 424 * nonstandard functional units to the extent possible. 425 */ 426 427 #define AMDZEN_MAKE_SMN_REG_FN(_fn, _unit, _base, _mask, _nunits, _unitshift) \ 428 CTASSERT(((_base) & ~(_mask)) == 0); \ 429 static inline smn_reg_t \ 430 _fn(const uint8_t unitno, const smn_reg_def_t def, const uint16_t reginst) \ 431 { \ 432 const uint32_t unit32 = (const uint32_t)unitno; \ 433 const uint32_t reginst32 = (const uint32_t)reginst; \ 434 const uint32_t stride = (def.srd_stride == 0) ? 4 : def.srd_stride; \ 435 const uint32_t nents = (def.srd_nents == 0) ? 1 : \ 436 (const uint32_t)def.srd_nents; \ 437 \ 438 ASSERT3S(def.srd_unit, ==, SMN_UNIT_ ## _unit); \ 439 ASSERT3U(unit32, <, (_nunits)); \ 440 ASSERT3U(nents, >, reginst32); \ 441 ASSERT0(def.srd_reg & (_mask)); \ 442 \ 443 const uint32_t aperture_base = (_base); \ 444 \ 445 const uint32_t aperture_off = (unit32 << (_unitshift)); \ 446 ASSERT3U(aperture_off, <=, UINT32_MAX - aperture_base); \ 447 \ 448 const uint32_t aperture = aperture_base + aperture_off; \ 449 ASSERT0(aperture & ~(_mask)); \ 450 \ 451 const uint32_t reg = def.srd_reg + reginst32 * stride; \ 452 ASSERT0(reg & (_mask)); \ 453 \ 454 return (SMN_MAKE_REG(aperture + reg)); \ 455 } 456 457 #ifdef __cplusplus 458 } 459 #endif 460 461 #endif /* _SYS_AMDZEN_SMN_H */ 462