.. SPDX-License-Identifier: GPL-2.0

======================
BIOS/EFI Configuration
======================

BIOS and EFI are largely responsible for configuring static information about
devices (or potential future devices) such that Linux can build the appropriate
logical representations of these devices.

At a high level, this is what occurs during this phase of configuration.

* The bootloader starts the BIOS/EFI.

* BIOS/EFI performs an early device probe to determine static configuration.

* BIOS/EFI creates ACPI Tables that describe static config for the OS.

* BIOS/EFI creates the system memory map (EFI Memory Map, E820, etc).

* BIOS/EFI calls :code:`start_kernel` and begins the Linux Early Boot process.

Much of what this section is concerned with is ACPI Table production and
static memory map configuration. More detail on these tables can be found
at :doc:`ACPI Tables <acpi>`.

.. note::
   Platform Vendors should read carefully, as this section has recommendations
   on physical memory region size and alignment, memory holes, HDM interleave,
   and what Linux expects of HDM decoders trying to work with these features.


Linux Expectations of BIOS/EFI Software
=======================================
Linux expects BIOS/EFI software to construct sufficient ACPI tables (such as
CEDT, SRAT, HMAT, etc) and platform-specific configurations (such as HPA spaces
and host-bridge interleave configurations) to allow the Linux driver to
subsequently configure the devices in the CXL fabric at runtime.

Programming of HDM decoders and switch ports is not required, and may be
deferred to the CXL driver based on admin policy (e.g. udev rules).

Some platforms may require pre-programming HDM decoders and locking them
due to quirks (see: Zen5 address translation), but this is not the normal,
"expected" configuration path. This should be avoided if possible.

Some platforms may wish to pre-configure these resources to bring memory
up without requiring CXL driver support. These platform vendors should
test their configurations with the existing CXL driver and provide driver
support for their auto-configurations if features like RAS are required.

Platforms requiring boot-time programming and/or locking of CXL fabric
components may prevent features, such as device hot-plug, from working.

UEFI Settings
=============
If your platform supports it, the :code:`uefisettings` command can be used to
read/write EFI settings. Changes take effect on the next full reboot; a kexec
is not sufficient.

One notable configuration here is the EFI_MEMORY_SP (Specific Purpose) bit.
When this is enabled, this bit tells Linux to defer management of a memory
region to a driver (in this case, the CXL driver). Otherwise, the memory is
treated as "normal memory", and is exposed to the page allocator during
:code:`__init`.

uefisettings examples
---------------------

:code:`uefisettings identify` ::

        uefisettings identify

        bios_vendor: xxx
        bios_version: xxx
        bios_release: xxx
        bios_date: xxx
        product_name: xxx
        product_family: xxx
        product_version: xxx

On some AMD platforms, the :code:`EFI_MEMORY_SP` bit is set via the
:code:`CXL Memory Attribute` field. This may be called something else on your
platform.

:code:`uefisettings get "CXL Memory Attribute"` ::

        selector: xxx
        ...
        question: Question {
            name: "CXL Memory Attribute",
            answer: "Enabled",
            ...
        }

Physical Memory Map
===================

Physical Address Region Alignment
---------------------------------

As of Linux v6.14, the hotplug memory system requires memory regions to be
uniform in size and alignment.
While the CXL specification allows for memory
regions as small as 256MB, the supported memory block size and alignment for
hotplugged memory is architecture-defined.

Linux memory blocks may be as small as 128MB and increase in powers of two.

* On ARM, the default block size and alignment is either 128MB or 256MB.

* On x86, the default block size is 256MB, and increases to 2GB as the
  capacity of the system increases up to 64GB.

For best support across versions, platform vendors should place CXL memory at
a 2GB aligned base address, and regions should be 2GB aligned. This also helps
prevent the creation of thousands of memory devices (one per block).

Memory Holes
------------

Holes in the memory map are tricky. Consider a 4GB device located at base
address 0x100000000, but with the following memory map ::

  ---------------------
  |    0x100000000    |
  |        CXL        |
  |    0x1BFFFFFFF    |
  ---------------------
  |    0x1C0000000    |
  |    MEMORY HOLE    |
  |    0x1FFFFFFFF    |
  ---------------------
  |    0x200000000    |
  |     CXL CONT.     |
  |    0x23FFFFFFF    |
  ---------------------

There are two issues to consider:

* decoder programming, and
* memory block alignment.

If your architecture requires 2GB uniform size and aligned memory blocks, the
only capacity Linux is capable of mapping (as of v6.14) would be the capacity
from `0x100000000-0x180000000`. The remaining capacity will be stranded, as
it is not of 2GB aligned length.

Assuming your architecture and memory configuration allows 1GB memory blocks,
this memory map is supported and should be presented as multiple CFMWS in the
CEDT, each describing one side of the memory hole, along with matching
decoders.
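
The alignment arithmetic in this example can be sketched in a few lines of
Python. This is only a simplified model, not kernel code: it assumes that a
chunk is usable only in whole, naturally aligned memory blocks, and the helper
name ``mappable_ranges`` is hypothetical.

```python
GIB = 1 << 30


def mappable_ranges(chunks, block_size):
    """Trim each (start, end) chunk inward to block_size boundaries.

    End addresses are exclusive. Chunks that do not contain at least one
    whole aligned block are dropped (their capacity is stranded).
    """
    result = []
    for start, end in chunks:
        # Round the start up and the end down to the block size.
        aligned_start = (start + block_size - 1) // block_size * block_size
        aligned_end = end // block_size * block_size
        if aligned_end > aligned_start:
            result.append((aligned_start, aligned_end))
    return result


# The two CXL chunks on either side of the memory hole (exclusive ends).
chunks = [(0x100000000, 0x1C0000000), (0x200000000, 0x240000000)]

usable_2g = mappable_ranges(chunks, 2 * GIB)
usable_1g = mappable_ranges(chunks, 1 * GIB)

for label, ranges in (("2GB blocks", usable_2g), ("1GB blocks", usable_1g)):
    total = sum(end - start for start, end in ranges)
    print(label, [(hex(s), hex(e)) for s, e in ranges], total // GIB, "GB usable")
```

Under these assumptions, 2GB blocks leave only the first 2GB of the device
usable, while 1GB blocks allow both chunks to be used in full.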

Multiple decoders can (and should) be used to manage such a memory hole (see
below), but each chunk of a memory hole should be aligned to a reasonable block
size (larger alignment is always better). If you intend to have memory holes
in the memory map, expect to use one decoder per contiguous chunk of host
physical memory.

As of v6.14, Linux does provide support for memory hotplug of multiple
physical memory regions separated by a memory hole described by a single
HDM decoder.


Decoder Programming
===================
If BIOS/EFI intends to program the decoders to be statically configured,
there are a few things to consider to avoid major pitfalls that will
prevent Linux compatibility. Some of these recommendations are not
required "per the specification", but Linux makes no guarantees of support
otherwise.


Translation Point
-----------------
Per the specification, the only decoders which **TRANSLATE** Host Physical
Address (HPA) to Device Physical Address (DPA) are the **Endpoint Decoders**.
All other decoders in the fabric are intended to route accesses without
translating the addresses.

This is heavily implied by the specification, see: ::

  CXL Specification 3.1
  8.2.4.20: CXL HDM Decoder Capability Structure
  - Implementation Note: CXL Host Bridge and Upstream Switch Port Decoder Flow
  - Implementation Note: Device Decoder Logic

Given this, Linux makes a strong assumption that decoders between CPU and
endpoint will all be programmed with address ranges that are subsets of
their parent decoder.

Due to some ambiguity in how Architecture, ACPI, PCI, and CXL specifications
"hand off" responsibility between domains, some early adopting platforms
attempted to do translation at the originating memory controller or host
bridge.
This configuration requires a platform specific extension to the
driver and is not officially endorsed - despite being supported.

It is *highly recommended* **NOT** to do this; otherwise, you are on your own
to implement driver support for your platform.

Interleave and Configuration Flexibility
----------------------------------------
If providing cross-host-bridge interleave, a CFMWS entry in the :doc:`CEDT
<acpi/cedt>` must be presented with target host-bridges for the interleaved
device sets (there may be multiple behind each host bridge).

If providing intra-host-bridge interleaving, only 1 CFMWS entry in the CEDT is
required for that host bridge - if it covers the entire capacity of the devices
behind the host bridge.

If intending to provide users flexibility in programming decoders beyond the
root, you may want to provide multiple CFMWS entries in the CEDT intended for
different purposes. For example, you may want to consider adding:

1) A CFMWS entry to cover all interleavable host bridges.
2) A CFMWS entry to cover all devices on a single host bridge.
3) A CFMWS entry to cover each device.

A platform may choose to add all of these, or change the mode based on a BIOS
setting. For each CFMWS entry, Linux expects descriptions of the described
memory regions in the :doc:`SRAT <acpi/srat>` to determine the number of
NUMA nodes it should reserve during early boot / init.

As of v6.14, Linux will create a NUMA node for each CEDT CFMWS entry, even if
a matching SRAT entry does not exist; however, this is not guaranteed in the
future and such a configuration should be avoided.
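
A platform vendor could sanity-check this pairing before shipping firmware.
The sketch below is illustrative only: the tuples are hypothetical stand-ins
for already-parsed CEDT and SRAT contents, the helper name
``windows_missing_srat`` is made up, and treating a window as "matched" only
when a single SRAT memory range fully contains it is a simplifying assumption.

```python
def windows_missing_srat(cfmws_windows, srat_ranges):
    """Return the CFMWS windows that no single SRAT memory range contains.

    cfmws_windows: list of (base, size) tuples.
    srat_ranges:   list of (base, size, proximity_domain) tuples.
    """
    missing = []
    for base, size in cfmws_windows:
        end = base + size
        covered = any(
            s_base <= base and end <= s_base + s_size
            for s_base, s_size, _domain in srat_ranges
        )
        if not covered:
            missing.append((base, size))
    return missing


# Hypothetical tables: two CFMWS windows, but only one SRAT memory range.
cfmws = [(0x100000000, 0xC0000000), (0x200000000, 0x40000000)]
srat = [(0x100000000, 0xC0000000, 1)]

unmatched = windows_missing_srat(cfmws, srat)
for base, size in unmatched:
    print(f"CFMWS window {base:#x}+{size:#x} has no matching SRAT entry")
```

Here the second window would be reported, which is the configuration the
paragraph above recommends avoiding.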

Memory Holes
------------
If your platform includes memory holes interspersed between your CXL memory, it
is recommended to utilize multiple decoders to cover these regions of memory,
rather than try to program the decoders to accept the entire range and expect
Linux to manage the overlap.

For example, consider the Memory Hole described above ::

  ---------------------
  |    0x100000000    |
  |        CXL        |
  |    0x1BFFFFFFF    |
  ---------------------
  |    0x1C0000000    |
  |    MEMORY HOLE    |
  |    0x1FFFFFFFF    |
  ---------------------
  |    0x200000000    |
  |     CXL CONT.     |
  |    0x23FFFFFFF    |
  ---------------------

Assuming this is provided by a single device attached directly to a host bridge,
Linux would expect the following decoder programming ::

     -----------------------             -----------------------
     | root-decoder-0      |             | root-decoder-1      |
     |   base: 0x100000000 |             |   base: 0x200000000 |
     |   size:  0xC0000000 |             |   size:  0x40000000 |
     -----------------------             -----------------------
                |                                   |
     -----------------------             -----------------------
     | HB-decoder-0        |             | HB-decoder-1        |
     |   base: 0x100000000 |             |   base: 0x200000000 |
     |   size:  0xC0000000 |             |   size:  0x40000000 |
     -----------------------             -----------------------
                |                                   |
     -----------------------             -----------------------
     | ep-decoder-0        |             | ep-decoder-1        |
     |   base: 0x100000000 |             |   base: 0x200000000 |
     |   size:  0xC0000000 |             |   size:  0x40000000 |
     -----------------------             -----------------------

The CEDT would then contain two CFMWS entries describing the above root
decoders.

Linux makes no guarantee of support for strange memory hole situations.

Multi-Media Devices
-------------------
The CFMWS field of the CEDT has special restriction bits which describe whether
the described memory region allows volatile or persistent memory (or both).
If the platform intends to support either:

1) A device with multiple media types, or
2) Using a persistent memory device as normal memory

then the platform may wish to create multiple CEDT CFMWS entries to describe
the same memory, with the intent of allowing the end user flexibility in how
that memory is configured. Linux does not presently have strong requirements in
this area.
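
As a rough illustration of the restriction bits mentioned above, the sketch
below decodes which media types a window permits. The bit positions (bit 2 for
volatile, bit 3 for persistent) are an assumption based on the CFMWS Window
Restrictions field as commonly defined and should be verified against the CXL
specification; the constant and function names are hypothetical.

```python
# Assumed bit positions in the CFMWS "Window Restrictions" field.
RESTRICT_VOLATILE = 1 << 2  # window may back volatile memory (assumption)
RESTRICT_PMEM = 1 << 3      # window may back persistent memory (assumption)


def allowed_media(restrictions):
    """Return the media types a CFMWS window permits, per its restriction bits."""
    media = []
    if restrictions & RESTRICT_VOLATILE:
        media.append("volatile")
    if restrictions & RESTRICT_PMEM:
        media.append("persistent")
    return media


# A window that allows either media type, and one limited to volatile memory.
flexible_window = allowed_media(RESTRICT_VOLATILE | RESTRICT_PMEM)
volatile_only_window = allowed_media(RESTRICT_VOLATILE)

print("flexible window:", flexible_window)
print("volatile-only window:", volatile_only_window)
```

A window with both bits set is the kind of entry that would let an
administrator decide at runtime how a multi-media device is used.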