.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

=======================================
Compute Express Link: Linux Conventions
=======================================

There exist shipping platforms that bend or break CXL specification
expectations. Record the details and the rationale for those deviations.
Borrow the ACPI Code First template format to capture the assumptions
and tradeoffs such that multiple platform implementations can follow the
same convention.

<(template) Title>
==================

Document
--------
CXL Revision <rev>, Version <ver>

License
-------
SPDX-License Identifier: CC-BY-4.0

Creator/Contributors
--------------------

Summary of the Change
---------------------

<Detail the conflict with the specification and, where available, the
assumptions and tradeoffs taken by the hardware platform.>


Benefits of the Change
----------------------

<Detail what happens if platforms and Linux do not adopt this
convention.>

References
----------

Detailed Description of the Change
----------------------------------

<Propose spec language that corrects the conflict.>


Resolve conflict between CFMWS, Platform Memory Holes, and Endpoint Decoders
============================================================================

Document
--------

CXL Revision 3.2, Version 1.0

License
-------

SPDX-License Identifier: CC-BY-4.0

Creator/Contributors
--------------------

- Fabio M. De Francesco, Intel
- Dan J. Williams, Intel
- Mahesh Natu, Intel

Summary of the Change
---------------------

According to the current Compute Express Link (CXL) Specification (Revision
3.2, Version 1.0), the CXL Fixed Memory Window Structure (CFMWS) describes zero
or more Host Physical Address (HPA) windows associated with each CXL Host
Bridge. Each window represents a contiguous HPA range that may be interleaved
across one or more targets, including CXL Host Bridges. Each window has a set
of restrictions that govern its usage. It is the responsibility of Operating
System-directed configuration and Power Management (OSPM) to use each window
for its specified purpose.

Table 9-22 of the current CXL Specification states that the Window Size field
contains the total number of consecutive bytes of HPA this window describes.
This value must be a multiple of the Number of Interleave Ways (NIW) * 256 MB.
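The Window Size rule can be expressed as a simple divisibility check. The
following is a minimal sketch, not code from the Linux CXL driver; the name
``cfmws_window_size_aligned()`` and the ``SZ_256M`` macro are assumptions made
for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_256M (256ULL << 20)

/*
 * Hypothetical helper: return true if a CFMWS Window Size is a multiple
 * of NIW * 256 MB, as required by the Window Size field description.
 */
static bool cfmws_window_size_aligned(uint64_t size, unsigned int niw)
{
	assert(niw != 0);	/* a window always has at least one target */
	return size % ((uint64_t)niw * SZ_256M) == 0;
}
```

With 12 interleave ways the granule is 12 * 256 MB = 3 GB, so a 3 GB window
passes this check while a 2 GB window does not.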

Platform Firmware (BIOS) might reserve physical addresses below 4 GB where a
memory gap, such as the Low Memory Hole (LMH) for PCIe MMIO, may exist. In such
cases, the CFMWS Window Size may not adhere to the NIW * 256 MB rule.

The HPA represents the actual physical memory address space that the CXL devices
can decode and respond to, while the System Physical Address (SPA), a related
but distinct concept, represents the system-visible address space that users can
direct transactions to, and so it excludes reserved regions.

BIOS publishes CFMWS to communicate the active SPA ranges that, on platforms
with an LMH, map to a strict subset of the HPA. The SPA range trims out the
hole, so the part of the Endpoints' HPA range that intersects the hole has no
SPA mapped to it and that capacity is lost.

For example, an x86 platform with two CFMWS and an LMH starting at 2 GB:

 +--------+------------+-------------------+------------------+-------------------+------+
 | Window | CFMWS Base |    CFMWS Size     | HDM Decoder Base |  HDM Decoder Size | Ways |
 +========+============+===================+==================+===================+======+
 |   0    |   0 GB     |       2 GB        |      0 GB        |       3 GB        |  12  |
 +--------+------------+-------------------+------------------+-------------------+------+
 |   1    |   4 GB     | NIW*256MB Aligned |      4 GB        | NIW*256MB Aligned |  12  |
 +--------+------------+-------------------+------------------+-------------------+------+

HDM Decoder Base and HDM Decoder Size describe all 12 Endpoint Decoders of a
12-way region and all the intermediate Switch Decoders. BIOS configures them
according to the NIW * 256 MB rule, resulting in an HPA range size of 3 GB. The
CFMWS Base and CFMWS Size, by contrast, configure the Root Decoder HPA range,
which ends up smaller (2 GB) than that of the Switch and Endpoint Decoders in
the hierarchy (3 GB).
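The arithmetic behind Window 0 can be reproduced with a short sketch. It
assumes that firmware sizes the HDM decoders by rounding the trimmed window up
to the next multiple of NIW * 256 MB; the text above only states that the
decoders follow the rule, so the rounding policy and both function names are
illustrative assumptions:

```c
#include <assert.h>
#include <stdint.h>

#define SZ_256M (256ULL << 20)
#define SZ_1G   (1ULL << 30)

/*
 * Hypothetical helper: round a CFMWS size up to the next multiple of
 * NIW * 256 MB (assumed firmware behavior, not taken from any spec text).
 */
static uint64_t hdm_size_for_window(uint64_t cfmws_size, unsigned int niw)
{
	assert(niw != 0);

	uint64_t granule = (uint64_t)niw * SZ_256M;

	return (cfmws_size + granule - 1) / granule * granule;
}

/* HPA covered by the HDM decoders but not by the trimmed CFMWS window. */
static uint64_t unreachable_capacity(uint64_t cfmws_size, unsigned int niw)
{
	return hdm_size_for_window(cfmws_size, niw) - cfmws_size;
}
```

For the example platform, a 2 GB window with 12 ways yields 3 GB of HDM
decoder range, leaving 1 GB of device capacity with no SPA mapped to it.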

This creates two issues that lead to a failure to construct a region:

1) A mismatch in region size between the root decoder and any HDM decoder. The
   root decoder will always be smaller due to the trim.

2) The trim causes the root decoder to violate the NIW * 256 MB rule.

This change allows a region with a base address of 0 GB to bypass these checks,
so that the region can be created with the trimmed root decoder address range.

This change does not allow any other arbitrary region to violate these
checks. It is intended exclusively to enable x86 platforms that map CXL memory
under 4 GB.
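One way to picture the narrow scope of the exception is as a predicate that
recognizes only a window based at 0 that was trimmed below 4 GB. This is a
sketch under stated assumptions, not the kernel's implementation: the
``struct hpa_range`` type, the local ``range_len()`` helper, and
``lmh_trimmed_match()`` are all hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_256M (256ULL << 20)
#define SZ_4G   (4ULL << 30)

struct hpa_range {
	uint64_t start;	/* first byte */
	uint64_t end;	/* last byte, inclusive */
};

static uint64_t range_len(const struct hpa_range *r)
{
	return r->end - r->start + 1;
}

/*
 * Hypothetical predicate: true when a root decoder range (from CFMWS)
 * looks like a window trimmed by a Low Memory Hole, so that a larger
 * HDM decoder range may still be matched with it.
 */
static bool lmh_trimmed_match(const struct hpa_range *root,
			      const struct hpa_range *hdm, unsigned int niw)
{
	assert(niw != 0);
	return root->start == 0 &&		/* only the window based at 0 */
	       hdm->start == root->start &&	/* both ranges share the base */
	       root->end < SZ_4G &&		/* window ends below 4 GB */
	       root->end < hdm->end &&		/* root decoder was trimmed */
	       range_len(hdm) % ((uint64_t)niw * SZ_256M) == 0;
}
```

Window 0 of the example (root decoder 0 to 2 GB, HDM decoders 0 to 3 GB,
12 ways) satisfies the predicate. A window based at 4 GB does not, so it stays
subject to the regular checks.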

Although the HDM decoders cover the PCIe hole HPA region, the platform is
expected never to route accesses in that region to the CXL complex, because the
root decoder only covers the trimmed range, which excludes the hole. Linux has
no ability to enforce this.

On the example platform, only the first 2 GB are potentially usable, but
Linux, aiming to adhere to the current specification, fails to construct
Regions and attach Endpoint and intermediate Switch Decoders to them.

There are several points of failure, all due to the expectation that the Root
Decoder HPA size, which equals the size of the CFMWS from which it is
configured, is greater than or equal to that of the matching Switch and
Endpoint HDM Decoders.

To succeed with construction and attachment, Linux must construct a Region
sized to the Root Decoder HPA range and then attach to it all the intermediate
Switch Decoders and Endpoint Decoders that belong to the hierarchy, regardless
of their range sizes.
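Sizing the Region to the Root Decoder range can be sketched as taking the
overlap of the two ranges. This is only an illustration of the idea above;
``struct hpa_range`` and ``region_range()`` are hypothetical names, and the
sketch assumes the two ranges were already matched with each other:

```c
#include <assert.h>
#include <stdint.h>

struct hpa_range {
	uint64_t start;	/* first byte */
	uint64_t end;	/* last byte, inclusive */
};

/*
 * Hypothetical helper: the Region covers only the part of the HDM
 * decoder range that the root decoder (CFMWS) also covers.
 */
static struct hpa_range region_range(struct hpa_range root,
				     struct hpa_range hdm)
{
	struct hpa_range region;

	/* The ranges are assumed to overlap; matching happens beforehand. */
	assert(root.start <= hdm.end && hdm.start <= root.end);

	region.start = root.start > hdm.start ? root.start : hdm.start;
	region.end = root.end < hdm.end ? root.end : hdm.end;
	return region;
}
```

For Window 0 of the example this yields a Region of 0 to 2 GB even though the
attached decoders are programmed for 0 to 3 GB. When the root decoder is at
least as large as the HDM decoders, the result is simply the HDM decoder range.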

Benefits of the Change
----------------------

Without the change, the OSPM cannot match intermediate Switch and Endpoint
Decoders with Root Decoders configured with CFMWS HPA sizes that don't align
with the NIW * 256 MB constraint, which leads to lost memdev capacity.

This change allows the OSPM to construct Regions and attach intermediate Switch
and Endpoint Decoders to them, so that the addressable part of the memory
devices' total capacity is made available to users.

References
----------

Compute Express Link Specification Revision 3.2, Version 1.0
<https://www.computeexpresslink.org/>

Detailed Description of the Change
----------------------------------

The description of the Window Size field in Table 9-22 needs to account for
platforms with Low Memory Holes, where SPA ranges might be subsets of the
endpoints' HPA. Therefore, it has to be changed to the following:

"The total number of consecutive bytes of HPA this window represents. This value
shall be a multiple of NIW * 256 MB.

On platforms that reserve physical addresses below 4 GB, such as the Low Memory
Hole for PCIe MMIO on x86, an instance of CFMWS whose Base HPA is 0 might
have a size that doesn't align with the NIW * 256 MB constraint.

Note that the HPA range sizes of the matching intermediate Switch Decoders and
Endpoint Decoders must still align to the above-mentioned rule, but the memory
capacity that exceeds the CFMWS window size won't be accessible."
183