.. SPDX-License-Identifier: GPL-2.0

Resolve conflict between CFMWS, Platform Memory Holes, and Endpoint Decoders
============================================================================

Document
--------

CXL Revision 3.2, Version 1.0

License
-------

SPDX-License Identifier: CC-BY-4.0

Creator/Contributors
--------------------

- Fabio M. De Francesco, Intel
- Dan J. Williams, Intel
- Mahesh Natu, Intel

Summary of the Change
---------------------

According to the current Compute Express Link (CXL) Specification (Revision
3.2, Version 1.0), the CXL Fixed Memory Window Structure (CFMWS) describes zero
or more Host Physical Address (HPA) windows associated with each CXL Host
Bridge. Each window represents a contiguous HPA range that may be interleaved
across one or more targets, including CXL Host Bridges. Each window has a set
of restrictions that govern its usage. It is the responsibility of the
Operating System-directed configuration and Power Management (OSPM) to utilize
each window for the specified use.

Table 9-22 of the current CXL Specification states that the Window Size field
contains the total number of consecutive bytes of HPA this window describes.
This value must be a multiple of the Number of Interleave Ways (NIW) * 256 MB.

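As a minimal sketch of the Window Size rule above, the following Python snippet
checks whether a size is a multiple of NIW * 256 MB. The helper name
``window_size_is_aligned`` is hypothetical, ``niw`` is assumed to be the actual
number of ways rather than the encoded field value, and MB is assumed to mean
2^20 bytes.

```python
MIB = 1 << 20
GIB = 1 << 30
GRANULE = 256 * MIB  # per-way granule from the Window Size rule


def window_size_is_aligned(window_size: int, niw: int) -> bool:
    """Return True if window_size is a non-zero multiple of NIW * 256 MB."""
    if niw <= 0 or window_size <= 0:
        return False
    return window_size % (niw * GRANULE) == 0


print(window_size_is_aligned(3 * GIB, 12))  # True: 12 * 256 MB = 3 GB
print(window_size_is_aligned(2 * GIB, 12))  # False: 2 GB is not a multiple of 3 GB
```

For a 12-way window the rule requires multiples of 3 GB, so a 3 GB window
passes the check and a 2 GB window does not.
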
Platform Firmware (BIOS) might reserve physical addresses below 4 GB where a
memory gap such as the Low Memory Hole (LMH) for PCIe MMIO may exist. In such
cases, the CFMWS Window Size may not adhere to the NIW * 256 MB rule.

The HPA represents the actual physical memory address space that the CXL devices
can decode and respond to, while the System Physical Address (SPA), a related
but distinct concept, represents the system-visible address space that users can
direct transactions to, and so it excludes reserved regions.

BIOS publishes CFMWS to communicate the active SPA ranges that, on platforms
with LMHs, map to a strict subset of the HPA. The SPA range trims out the hole,
resulting in lost capacity in the Endpoints, with no SPA to map to the part of
the HPA range that intersects the hole.

For example, an x86 platform with two CFMWS and an LMH starting at 2 GB:

 +--------+------------+-------------------+------------------+-------------------+------+
 | Window | CFMWS Base |    CFMWS Size     | HDM Decoder Base |  HDM Decoder Size | Ways |
 +========+============+===================+==================+===================+======+
 |   0    |   0 GB     |       2 GB        |      0 GB        |       3 GB        |  12  |
 +--------+------------+-------------------+------------------+-------------------+------+
 |   1    |   4 GB     | NIW*256MB Aligned |      4 GB        | NIW*256MB Aligned |  12  |
 +--------+------------+-------------------+------------------+-------------------+------+

HDM Decoder Base and HDM Decoder Size represent all 12 Endpoint Decoders of
a 12-way region and all the intermediate Switch Decoders. They are configured
by the BIOS according to the NIW * 256 MB rule, resulting in an HPA range size
of 3 GB. The CFMWS Base and CFMWS Size, by contrast, are used to configure the
Root Decoder HPA range, which ends up smaller (2 GB) than that of the Switch
and Endpoint Decoders in the hierarchy (3 GB).

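The arithmetic behind window 0 of the table can be checked with a short Python
sketch. The ``HpaRange`` type and the variable names are illustrative only; the
values are those of the example platform, with MB and GB assumed to be binary
units.

```python
from dataclasses import dataclass

MIB = 1 << 20
GIB = 1 << 30


@dataclass(frozen=True)
class HpaRange:
    base: int
    size: int

    @property
    def end(self) -> int:
        return self.base + self.size  # exclusive end address


WAYS = 12
granule = WAYS * 256 * MIB  # NIW * 256 MB = 3 GB for a 12-way region

root = HpaRange(base=0, size=2 * GIB)  # Root Decoder, from CFMWS window 0
hdm = HpaRange(base=0, size=3 * GIB)   # Switch and Endpoint HDM Decoders

size_mismatch = root.size < hdm.size        # root is smaller than the HDM range
root_misaligned = root.size % granule != 0  # 2 GB is not a multiple of 3 GB
hdm_aligned = hdm.size % granule == 0       # 3 GB follows the rule
uncovered = hdm.end - root.end              # HPA decoded but not in the window

print(granule // GIB, size_mismatch, root_misaligned, hdm_aligned, uncovered // GIB)
# 3 True True True 1
```

With 12 ways the rule requires multiples of 3 GB, which the HDM decoders
satisfy and the trimmed 2 GB root decoder does not, leaving 1 GB of decoded HPA
outside the window.
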
This creates two issues, which lead to a failure to construct a region:

1) A mismatch in region size between the root decoder and any HDM decoder. The
   root decoder will always be smaller due to the trim.

2) The trim causes the root decoder to violate the NIW * 256 MB rule.

This change allows a region with a base address of 0 GB to bypass these checks,
so that a region can be created with the trimmed root decoder address range.

This change does not allow any other arbitrary region to violate these
checks. It is intended exclusively to enable x86 platforms that map CXL memory
under 4 GB.

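One way to picture the relaxed check is the Python sketch below. It is not the
kernel's implementation: the function name ``root_decoder_accepts`` and its
parameters are hypothetical, and it models only the two checks described above
(size match and NIW * 256 MB alignment) together with the base-0 exception.

```python
MIB = 1 << 20
GIB = 1 << 30


def root_decoder_accepts(root_base: int, root_size: int,
                         hdm_size: int, niw: int) -> bool:
    """Hypothetical region check with the exception for a window based at 0."""
    granule = niw * 256 * MIB
    # HDM decoders must follow the NIW * 256 MB rule in every case.
    if hdm_size <= 0 or hdm_size % granule != 0:
        return False
    if root_size <= 0:
        return False
    # Exception: a root decoder based at 0 may be trimmed by a memory hole,
    # so both the size-match and the alignment checks are bypassed.
    if root_base == 0:
        return True
    # Every other region must pass both checks.
    return root_size % granule == 0 and root_size >= hdm_size


print(root_decoder_accepts(0, 2 * GIB, 3 * GIB, 12))        # True
print(root_decoder_accepts(4 * GIB, 2 * GIB, 3 * GIB, 12))  # False
```

The same trimmed 2 GB size is accepted at base 0 and rejected at base 4 GB,
which mirrors the intent that no other region may violate the checks.
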
Despite the HDM decoders covering the PCIe hole HPA region, it is expected that
the platform will never route accesses in that region to the CXL complex,
because the root decoder only covers the trimmed region (which excludes the
hole). This is outside the ability of Linux to enforce.

On the example platform, only the first 2 GB will be potentially usable, but
Linux, aiming to adhere to the current specification, fails to construct
Regions and attach Endpoint and intermediate Switch Decoders to them.

There are several points of failure, all due to the expectation that the Root
Decoder HPA size, which is equal to that of the CFMWS from which it is
configured, has to be greater than or equal to that of the matching Switch and
Endpoint HDM Decoders.

In order to succeed with construction and attachment, Linux must construct a
Region with the Root Decoder HPA range size, and then attach to it all the
intermediate Switch Decoders and Endpoint Decoders that belong to the hierarchy,
regardless of their range sizes.

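The construction step described above can be sketched in Python as follows.
The ``Decoder`` and ``Region`` types and the ``construct_region`` helper are
hypothetical names used for illustration, not kernel objects; the caller is
assumed to pass only decoders that belong to the hierarchy.

```python
from dataclasses import dataclass, field

GIB = 1 << 30


@dataclass(frozen=True)
class Decoder:
    name: str
    base: int
    size: int


@dataclass
class Region:
    base: int
    size: int
    attached: list = field(default_factory=list)
    inaccessible: int = 0  # decoder HPA bytes that lie beyond the region end


def construct_region(root: Decoder, decoders: list) -> Region:
    """Size the region to the root decoder and attach every given decoder."""
    region = Region(base=root.base, size=root.size)
    region_end = region.base + region.size
    for dec in decoders:
        # Attach regardless of the decoder's own range size.
        region.attached.append(dec.name)
        beyond = (dec.base + dec.size) - region_end
        region.inaccessible = max(region.inaccessible, beyond, 0)
    return region


root = Decoder("root0", 0, 2 * GIB)
endpoints = [Decoder(f"endpoint{i}", 0, 3 * GIB) for i in range(12)]
region = construct_region(root, endpoints)
print(region.size // GIB, len(region.attached), region.inaccessible // GIB)
# 2 12 1
```

On the example platform this yields a 2 GB region with all 12 Endpoint Decoders
attached, and 1 GB of decoder HPA range that has no SPA to map to.
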
Benefits of the Change
----------------------

Without the change, the OSPM wouldn't match intermediate Switch and Endpoint
Decoders with Root Decoders configured with CFMWS HPA sizes that don't align
with the NIW * 256 MB constraint, which leads to lost memdev capacity.

This change allows the OSPM to construct Regions and attach intermediate Switch
and Endpoint Decoders to them, so that the addressable part of the memory
devices' total capacity is made available to the users.

References
----------

Compute Express Link Specification Revision 3.2, Version 1.0
<https://www.computeexpresslink.org/>

Detailed Description of the Change
----------------------------------

The description of the Window Size field in Table 9-22 needs to account for
platforms with Low Memory Holes, where SPA ranges might be subsets of the
endpoints' HPA. Therefore, it has to be changed to the following:

"The total number of consecutive bytes of HPA this window represents. This value
shall be a multiple of NIW * 256 MB.

On platforms that reserve physical addresses below 4 GB, such as the Low Memory
Hole for PCIe MMIO on x86, an instance of CFMWS whose Base HPA range is 0 might
have a size that doesn't align with the NIW * 256 MB constraint.

Note that the matching intermediate Switch Decoders and the Endpoint Decoders
HPA range sizes must still align to the above-mentioned rule, but the memory
capacity that exceeds the CFMWS window size won't be accessible.".