.. SPDX-License-Identifier: GPL-2.0

======================
BIOS/EFI Configuration
======================

BIOS and EFI are largely responsible for configuring static information about
devices (or potential future devices) such that Linux can build the appropriate
logical representations of these devices.

At a high level, the following occurs during this phase of configuration:

* The bootloader starts the BIOS/EFI.

* BIOS/EFI performs an early device probe to determine static configuration.

* BIOS/EFI creates ACPI Tables that describe static config for the OS.

* BIOS/EFI creates the system memory map (EFI Memory Map, E820, etc.).

* BIOS/EFI calls :code:`start_kernel` and begins the Linux Early Boot process.

Much of what this section is concerned with is ACPI Table production and
static memory map configuration. More detail on these tables can be found
at :doc:`ACPI Tables <acpi>`.

.. note::
   Platform Vendors should read carefully, as this section has recommendations
   on physical memory region size and alignment, memory holes, HDM interleave,
   and what Linux expects of HDM decoders trying to work with these features.


Linux Expectations of BIOS/EFI Software
=======================================
Linux expects BIOS/EFI software to construct sufficient ACPI tables (such as
CEDT, SRAT, HMAT, etc) and platform-specific configurations (such as HPA spaces
and host-bridge interleave configurations) to allow the Linux driver to
subsequently configure the devices in the CXL fabric at runtime.

Programming of HDM decoders and switch ports is not required, and may be
deferred to the CXL driver based on admin policy (e.g. udev rules).

Some platforms may require pre-programming HDM decoders and locking them
due to quirks (see: Zen5 address translation), but this is not the normal,
"expected" configuration path.  This should be avoided if possible.

Some platforms may wish to pre-configure these resources to bring memory
up without requiring CXL driver support.  These platform vendors should
test their configurations with the existing CXL driver and provide driver
support for their auto-configurations if features like RAS are required.

Platforms requiring boot-time programming and/or locking of CXL fabric
components may prevent features, such as device hot-plug, from working.

UEFI Settings
=============
If your platform supports it, the :code:`uefisettings` command can be used to
read/write EFI settings. Changes take effect on the next reboot; a kexec is
not a sufficient reboot.

One notable configuration here is the EFI_MEMORY_SP (Specific Purpose) bit.
When enabled, this bit tells Linux to defer management of a memory
region to a driver (in this case, the CXL driver). Otherwise, the memory is
treated as "normal memory", and is exposed to the page allocator during
:code:`__init`.
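
As a rough illustration of the decision described above, the sketch below tests
the attribute mask of a memory map entry for the Specific Purpose bit.  The
constant values follow the UEFI specification; the helper name and the sample
attribute values are hypothetical and only serve the example.

```python
# Attribute bits as defined by the UEFI specification.
EFI_MEMORY_WB = 0x8      # write-back cacheable
EFI_MEMORY_SP = 0x40000  # Specific Purpose


def managed_by_driver(attribute: int) -> bool:
    """Return True if the region is Specific Purpose, i.e. left to a driver.

    Hypothetical helper for illustration, not a kernel interface.
    """
    return bool(attribute & EFI_MEMORY_SP)


# Hypothetical attribute masks for two memory map entries.
cxl_entry = EFI_MEMORY_WB | EFI_MEMORY_SP
dram_entry = EFI_MEMORY_WB

print(managed_by_driver(cxl_entry))   # True: deferred to the CXL driver
print(managed_by_driver(dram_entry))  # False: treated as normal memory
```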

uefisettings examples
---------------------

:code:`uefisettings identify` ::

        uefisettings identify

        bios_vendor: xxx
        bios_version: xxx
        bios_release: xxx
        bios_date: xxx
        product_name: xxx
        product_family: xxx
        product_version: xxx

On some AMD platforms, the :code:`EFI_MEMORY_SP` bit is set via the :code:`CXL
Memory Attribute` field.  This may be called something else on your platform.

:code:`uefisettings get "CXL Memory Attribute"` ::

        selector: xxx
        ...
        question: Question {
            name: "CXL Memory Attribute",
            answer: "Enabled",
            ...
        }

Physical Memory Map
===================

Physical Address Region Alignment
---------------------------------

As of Linux v6.14, the hotplug memory system requires memory regions to be
uniform in size and alignment.  While the CXL specification allows for memory
regions as small as 256MB, the supported memory block size and alignment for
hotplugged memory is architecture-defined.

A Linux memory block may be as small as 128MB, and block sizes increase in
powers of two.

* On ARM, the default block size and alignment is either 128MB or 256MB.

* On x86, the default block size is 256MB, and increases to 2GB as the
  capacity of the system increases up to 64GB.

For best support across versions, platform vendors should place CXL memory at
a 2GB aligned base address, and region sizes should be multiples of 2GB.  This
also helps prevent the creation of thousands of memory devices (one per block).
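
The recommendation above reduces to simple arithmetic.  The following is a
minimal sketch, not kernel code: the function names are hypothetical, and the
512GB capacity is an arbitrary example chosen to show how the block size
affects the number of memory devices.

```python
SZ_128M = 128 << 20
SZ_2G = 2 << 30


def region_is_aligned(base: int, size: int, alignment: int = SZ_2G) -> bool:
    """True if both the base address and the size are multiples of alignment."""
    return base % alignment == 0 and size % alignment == 0


def block_count(size: int, block_size: int) -> int:
    """Number of memory blocks (one memory device each) needed to cover size."""
    return size // block_size


# A 4GB region at 0x100000000 meets the recommendation; a 3GB one does not.
print(region_is_aligned(0x100000000, 4 << 30))  # True
print(region_is_aligned(0x100000000, 3 << 30))  # False

# Hypothetical 512GB of CXL capacity with small and large block sizes.
print(block_count(512 << 30, SZ_128M))  # 4096
print(block_count(512 << 30, SZ_2G))    # 256
```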

Memory Holes
------------

Holes in the memory map are tricky.  Consider a 4GB device located at base
address 0x100000000, but with the following memory map ::

  ---------------------
  |    0x100000000    |
  |        CXL        |
  |    0x1BFFFFFFF    |
  ---------------------
  |    0x1C0000000    |
  |    MEMORY HOLE    |
  |    0x1FFFFFFFF    |
  ---------------------
  |    0x200000000    |
  |     CXL CONT.     |
  |    0x23FFFFFFF    |
  ---------------------

There are two issues to consider:

* decoder programming, and
* memory block alignment.

If your architecture requires 2GB uniform size and aligned memory blocks, the
only capacity Linux is capable of mapping (as of v6.14) would be the capacity
from ``0x100000000-0x180000000``.  The remaining capacity will be stranded, as
the remaining regions are not of 2GB aligned length.

Assuming your architecture and memory configuration allows 1GB memory blocks,
this memory map is supported and should be presented as multiple CFMWS
in the CEDT that describe each side of the memory hole separately, along with
matching decoders.
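
The stranded-capacity reasoning above can be reproduced with a short
calculation.  This is only a sketch under a simplifying assumption: a chunk is
usable to the extent that it contains whole blocks aligned to the block size.
The helper name is hypothetical.

```python
SZ_1G = 1 << 30
SZ_2G = 2 << 30


def usable_range(start, end, block_size):
    """Return the largest block-aligned sub-range of [start, end), or None."""
    aligned_start = -(-start // block_size) * block_size  # round start up
    aligned_end = (end // block_size) * block_size        # round end down
    if aligned_end <= aligned_start:
        return None
    return aligned_start, aligned_end


# The two CXL chunks from the memory map above, as half-open [start, end).
chunks = [(0x100000000, 0x1C0000000), (0x200000000, 0x240000000)]

for block_size in (SZ_2G, SZ_1G):
    for start, end in chunks:
        usable = usable_range(start, end, block_size)
        label = f"{block_size >> 30}GB blocks, chunk {start:#x}-{end:#x}"
        if usable is None:
            print(f"{label}: stranded")
        else:
            print(f"{label}: usable {usable[0]:#x}-{usable[1]:#x}")
```

With 2GB blocks only the first 2GB of the first chunk survives; with 1GB
blocks both chunks are fully usable.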

Multiple decoders can (and should) be used to manage such a memory hole (see
below), but each chunk of a memory hole should be aligned to a reasonable block
size (larger alignment is always better).  If you intend to have memory holes
in the memory map, expect to use one decoder per contiguous chunk of host
physical memory.

As of v6.14, Linux does provide support for memory hotplug of multiple
physical memory regions separated by a memory hole described by a single
HDM decoder.


Decoder Programming
===================
If BIOS/EFI intends to program the decoders to be statically configured,
there are a few things to consider to avoid major pitfalls that will
prevent Linux compatibility.  Some of these recommendations are not
required "per the specification", but Linux makes no guarantees of support
otherwise.


Translation Point
-----------------
Per the specification, the only decoders which **TRANSLATE** Host Physical
Address (HPA) to Device Physical Address (DPA) are the **Endpoint Decoders**.
All other decoders in the fabric are intended to route accesses without
translating the addresses.

This is heavily implied by the specification, see: ::

  CXL Specification 3.1
  8.2.4.20: CXL HDM Decoder Capability Structure
  - Implementation Note: CXL Host Bridge and Upstream Switch Port Decoder Flow
  - Implementation Note: Device Decoder Logic

Given this, Linux makes a strong assumption that decoders between CPU and
endpoint will all be programmed with address ranges that are subsets of
their parent decoder's range.
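
The subset assumption can be stated as a simple range check.  The sketch below
is illustrative only: the ``Decoder`` class, the helper names, and the example
ranges are hypothetical and do not reflect the CXL driver's data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decoder:
    name: str
    base: int  # first HPA claimed by the decoder
    size: int  # length of the claimed range in bytes

    @property
    def end(self) -> int:
        return self.base + self.size


def is_subset(child: Decoder, parent: Decoder) -> bool:
    """True if the child's HPA range lies entirely within the parent's."""
    return parent.base <= child.base and child.end <= parent.end


def chain_is_consistent(chain) -> bool:
    """Check a root-to-endpoint list of decoders, parent first."""
    return all(is_subset(c, p) for p, c in zip(chain, chain[1:]))


root = Decoder("root", 0x100000000, 0x100000000)
host_bridge = Decoder("host-bridge", 0x100000000, 0x100000000)
endpoint = Decoder("endpoint", 0x100000000, 0x80000000)
# An endpoint range rebased outside the root window, as would happen if
# translation were performed above the endpoint.
rebased = Decoder("endpoint", 0x0, 0x80000000)

print(chain_is_consistent([root, host_bridge, endpoint]))  # True
print(chain_is_consistent([root, host_bridge, rebased]))   # False
```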

Due to some ambiguity in how Architecture, ACPI, PCI, and CXL specifications
"hand off" responsibility between domains, some early adopting platforms
attempted to do translation at the originating memory controller or host
bridge.  This configuration requires a platform specific extension to the
driver and is not officially endorsed, despite being supported.

It is *highly recommended* **NOT** to do this; otherwise, you are on your own
to implement driver support for your platform.

Interleave and Configuration Flexibility
----------------------------------------
If providing cross-host-bridge interleave, a CFMWS entry in the :doc:`CEDT
<acpi/cedt>` must be presented with target host-bridges for the interleaved
device sets (there may be multiple behind each host bridge).

If providing intra-host-bridge interleaving, only one CFMWS entry in the CEDT
is required for that host bridge, provided it covers the entire capacity of the
devices behind the host bridge.

If intending to provide users flexibility in programming decoders beyond the
root, you may want to provide multiple CFMWS entries in the CEDT intended for
different purposes.  For example, you may want to consider adding:

1) A CFMWS entry to cover all interleavable host bridges.
2) A CFMWS entry to cover all devices on a single host bridge.
3) A CFMWS entry to cover each device.

A platform may choose to add all of these, or change the mode based on a BIOS
setting.  For each CFMWS entry, Linux expects descriptions of the corresponding
memory regions in the :doc:`SRAT <acpi/srat>` to determine the number of
NUMA nodes it should reserve during early boot / init.
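
To make the three kinds of CFMWS entries listed above concrete, the sketch
below enumerates them for a small topology.  The topology, the names, and the
helper are hypothetical; a real CEDT also assigns each window an HPA range and
interleave parameters, which are omitted here.

```python
def candidate_windows(host_bridges):
    """Return (description, target host bridges) pairs, one per CFMWS entry.

    host_bridges maps a host bridge name to the list of devices behind it.
    """
    bridges = sorted(host_bridges)
    # 1) One window interleaved across all host bridges.
    windows = [("interleave across all host bridges", bridges)]
    for bridge in bridges:
        # 2) One window covering all devices on a single host bridge.
        windows.append((f"all devices behind {bridge}", [bridge]))
        # 3) One window per device.
        for device in host_bridges[bridge]:
            windows.append((f"{device} behind {bridge}", [bridge]))
    return windows


# Hypothetical topology: two host bridges with three devices in total.
topology = {"hb0": ["mem0", "mem1"], "hb1": ["mem2"]}
for description, targets in candidate_windows(topology):
    print(f"{description}: targets={targets}")
```

For this topology the platform would describe six windows: one across both
host bridges, one per host bridge, and one per device.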

As of v6.14, Linux will create a NUMA node for each CEDT CFMWS entry, even if
a matching SRAT entry does not exist; however, this is not guaranteed in the
future and such a configuration should be avoided.

Memory Holes
------------
If your platform includes memory holes interspersed between your CXL memory, it
is recommended to utilize multiple decoders to cover these regions of memory,
rather than try to program the decoders to accept the entire range and expect
Linux to manage the overlap.

For example, consider the Memory Hole described above ::

  ---------------------
  |    0x100000000    |
  |        CXL        |
  |    0x1BFFFFFFF    |
  ---------------------
  |    0x1C0000000    |
  |    MEMORY HOLE    |
  |    0x1FFFFFFFF    |
  ---------------------
  |    0x200000000    |
  |     CXL CONT.     |
  |    0x23FFFFFFF    |
  ---------------------

Assuming this is provided by a single device attached directly to a host bridge,
Linux would expect the following decoder programming ::

     -----------------------   -----------------------
     | root-decoder-0      |   | root-decoder-1      |
     |   base: 0x100000000 |   |   base: 0x200000000 |
     |   size:  0xC0000000 |   |   size:  0x40000000 |
     -----------------------   -----------------------
                |                         |
     -----------------------   -----------------------
     | HB-decoder-0        |   | HB-decoder-1        |
     |   base: 0x100000000 |   |   base: 0x200000000 |
     |   size:  0xC0000000 |   |   size:  0x40000000 |
     -----------------------   -----------------------
                |                         |
     -----------------------   -----------------------
     | ep-decoder-0        |   | ep-decoder-1        |
     |   base: 0x100000000 |   |   base: 0x200000000 |
     |   size:  0xC0000000 |   |   size:  0x40000000 |
     -----------------------   -----------------------

The CEDT for this configuration should contain two CFMWS entries describing the
root decoders above.
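
The base and size values in the diagram follow directly from splitting the
device's address range around the hole.  The sketch below shows that
derivation; the helper name is hypothetical, and the same (base, size) pair is
printed at each level because no decoder above the endpoint translates
addresses.

```python
def split_around_holes(base, end, holes):
    """Yield (base, size) for each contiguous chunk of [base, end) outside holes."""
    cursor = base
    for hole_start, hole_end in sorted(holes):
        # Clamp the hole to the range under consideration.
        hole_start, hole_end = max(hole_start, base), min(hole_end, end)
        if hole_start >= hole_end:
            continue
        if hole_start > cursor:
            yield cursor, hole_start - cursor
        cursor = max(cursor, hole_end)
    if cursor < end:
        yield cursor, end - cursor


# Address range and hole from the memory map above, as half-open ranges.
chunks = list(split_around_holes(0x100000000, 0x240000000,
                                 [(0x1C0000000, 0x200000000)]))

for index, (base, size) in enumerate(chunks):
    for level in ("root-decoder", "HB-decoder", "ep-decoder"):
        print(f"{level}-{index}: base={base:#x} size={size:#x}")
```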

Linux makes no guarantee of support for strange memory hole situations.

Multi-Media Devices
-------------------
Each CFMWS entry in the CEDT has special restriction bits which describe whether
the described memory region allows volatile or persistent memory (or both). If
the platform intends to support either:

1) A device with multiple media types, or
2) Using a persistent memory device as normal memory

then the platform may wish to create multiple CEDT CFMWS entries to describe the
same memory, with the intent of allowing the end user flexibility in how that
memory is configured. Linux does not presently have strong requirements in this
area.
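
As an illustration of how the restriction bits might be interpreted, the sketch
below checks a window's restrictions for volatile and persistent support.  The
bit positions are an assumption based on the CEDT CFMWS layout and should be
checked against the CXL specification; the class, helper, and sample values
are hypothetical.

```python
from enum import IntFlag


class WindowRestrictions(IntFlag):
    """Media-related CFMWS restriction bits (assumed bit positions)."""
    VOLATILE = 1 << 2
    PMEM = 1 << 3


def allows(restrictions: int, media: WindowRestrictions) -> bool:
    """True if the window's restriction bits permit the given media type."""
    return bool(restrictions & media)


# Hypothetical windows describing the same memory for different uses.
volatile_only = WindowRestrictions.VOLATILE
either = WindowRestrictions.VOLATILE | WindowRestrictions.PMEM

print(allows(volatile_only, WindowRestrictions.PMEM))  # False
print(allows(either, WindowRestrictions.PMEM))         # True
```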