.. SPDX-License-Identifier: GPL-2.0

Resolve conflict between CFMWS, Platform Memory Holes, and Endpoint Decoders
============================================================================

Document
--------

CXL Revision 3.2, Version 1.0

License
-------

SPDX-License Identifier: CC-BY-4.0

Creator/Contributors
--------------------

- Fabio M. De Francesco, Intel
- Dan J. Williams, Intel
- Mahesh Natu, Intel

Summary of the Change
---------------------

According to the current Compute Express Link (CXL) Specification (Revision
3.2, Version 1.0), the CXL Fixed Memory Window Structure (CFMWS) describes zero
or more Host Physical Address (HPA) windows associated with each CXL Host
Bridge. Each window represents a contiguous HPA range that may be interleaved
across one or more targets, including CXL Host Bridges. Each window has a set
of restrictions that govern its usage. It is the responsibility of the
Operating System-directed configuration and Power Management (OSPM) to use
each window for its specified purpose.

Table 9-22 of the current CXL Specification states that the Window Size field
contains the total number of consecutive bytes of HPA this window describes.
This value must be a multiple of the Number of Interleave Ways (NIW) * 256 MB.
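
The following C fragment is a minimal sketch of the Table 9-22 rule, assuming
the Window Size and NIW are available as plain integers. The function name
``cfmws_size_is_aligned()`` and the locally defined ``SZ_256M`` macro are
illustrative only and are not taken from the Linux CXL driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local definition for this sketch (mirrors SZ_256M in <linux/sizes.h>). */
#define SZ_256M (256ULL << 20)

/*
 * Hypothetical check of the Table 9-22 rule: the CFMWS Window Size must
 * be a multiple of NIW * 256 MB. Returns true if window_size satisfies it.
 */
static bool cfmws_size_is_aligned(uint64_t window_size, unsigned int niw)
{
	assert(niw != 0);	/* zero interleave ways is not meaningful */

	return window_size % ((uint64_t)niw * SZ_256M) == 0;
}
```

With 12 interleave ways the granule is 3 GB, so a 2 GB window fails the
check, while an 8-way window of the same 2 GB size passes.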

Platform Firmware (BIOS) might reserve physical addresses below 4 GB, where a
memory gap such as the Low Memory Hole (LMH) for PCIe MMIO may exist. In such
cases, the CFMWS Window Size may not adhere to the NIW * 256 MB rule.

The HPA represents the actual physical memory address space that the CXL
devices can decode and respond to, while the System Physical Address (SPA), a
related but distinct concept, represents the system-visible address space that
users can direct transactions to; it therefore excludes reserved regions.

BIOS publishes CFMWS to communicate the active SPA ranges that, on platforms
with LMHs, map to a strict subset of the HPA. The SPA range trims out the hole,
which results in lost Endpoint capacity: no SPA maps to the part of the HPA
range that intersects the hole.

For example, consider an x86 platform with two CFMWS and an LMH starting at 2 GB:

 +--------+------------+-------------------+------------------+-------------------+------+
 | Window | CFMWS Base |    CFMWS Size     | HDM Decoder Base |  HDM Decoder Size | Ways |
 +========+============+===================+==================+===================+======+
 |   0    |   0 GB     |       2 GB        |      0 GB        |       3 GB        |  12  |
 +--------+------------+-------------------+------------------+-------------------+------+
 |   1    |   4 GB     | NIW*256MB Aligned |      4 GB        | NIW*256MB Aligned |  12  |
 +--------+------------+-------------------+------------------+-------------------+------+

The HDM Decoder Base and HDM Decoder Size columns apply to all 12 Endpoint
Decoders of a 12-way region and to all the intermediate Switch Decoders. The
BIOS configures them according to the NIW * 256 MB rule, resulting in an HPA
range size of 3 GB (12 * 256 MB). The CFMWS Base and CFMWS Size, instead, are
used to configure the Root Decoder HPA range, which is therefore smaller (2 GB)
than that of the Switch and Endpoint Decoders in the hierarchy (3 GB).
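
The sizes in Window 0 can be reproduced with a short C sketch. It assumes,
for illustration only, that the BIOS picks the smallest multiple of
NIW * 256 MB that covers the trimmed CFMWS. ``hdm_size_for()`` and
``stranded_capacity()`` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Local definitions for this sketch. */
#define SZ_256M (256ULL << 20)
#define SZ_1G   (1ULL << 30)

/*
 * Hypothetical: smallest multiple of NIW * 256 MB that covers the
 * trimmed CFMWS size (assumed BIOS policy for the HDM decoders).
 */
static uint64_t hdm_size_for(uint64_t cfmws_size, unsigned int niw)
{
	uint64_t granule;

	assert(niw != 0);
	granule = (uint64_t)niw * SZ_256M;

	/* Round up to the next granule boundary. */
	return (cfmws_size + granule - 1) / granule * granule;
}

/* HPA bytes decoded by the HDM decoders but not backed by any SPA. */
static uint64_t stranded_capacity(uint64_t cfmws_size, uint64_t hdm_size)
{
	return hdm_size > cfmws_size ? hdm_size - cfmws_size : 0;
}
```

For Window 0, a 2 GB CFMWS with 12 ways yields a 3 GB HDM decoder size, and
1 GB of the 0-3 GB HDM range lies above the 2 GB start of the LMH with no SPA
behind it.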

This creates two issues that prevent a region from being constructed:

1) A mismatch in region size between the root decoder and any HDM decoder. The
   root decoders will always be smaller due to the trim.

2) The trim causes the root decoder to violate the NIW * 256 MB rule.
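
A hypothetical strict check, written as a C sketch, shows how Window 0 trips
both issues. The enum and ``strict_region_check()`` are illustrative and do not
reproduce the Linux CXL driver's region-assembly code; the order in which the
two checks run is an arbitrary choice of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Local definitions for this sketch. */
#define SZ_256M (256ULL << 20)
#define SZ_1G   (1ULL << 30)

enum region_check {
	REGION_OK,
	REGION_SIZE_MISMATCH,	/* issue 1: root smaller than the HDM decoders */
	REGION_MISALIGNED,	/* issue 2: root violates NIW * 256 MB */
};

/* Hypothetical strict validation applied without any LMH exception. */
static enum region_check strict_region_check(uint64_t root_size,
					     uint64_t hdm_size,
					     unsigned int niw)
{
	assert(niw != 0);

	if (root_size < hdm_size)
		return REGION_SIZE_MISMATCH;
	if (root_size % ((uint64_t)niw * SZ_256M) != 0)
		return REGION_MISALIGNED;
	return REGION_OK;
}
```

With the Window 0 values (root 2 GB, HDM 3 GB, 12 ways) the first check
already fails; even if the sizes matched at 2 GB, the second check would
still fail.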

This change allows a region with a base address of 0 GB to bypass these checks,
so that the region can be created with the trimmed root decoder address range.

This change does not allow any other region to violate these checks; it is
intended exclusively to enable x86 platforms that map CXL memory under 4 GB.
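
The scope of the exception can be expressed as a predicate. This C sketch
assumes that the root and HDM decoder bases and sizes are known, that the
exception requires a root decoder based at HPA 0 that ends at or below 4 GB,
and that the HDM decoders are strictly larger than the trimmed root.
``lmh_exception_applies()`` is a hypothetical name, not a kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local definition for this sketch. */
#define SZ_4G (4ULL << 30)

/*
 * Hypothetical predicate: may a region skip the size-match and
 * NIW * 256 MB checks because an LMH trimmed its root decoder?
 */
static bool lmh_exception_applies(uint64_t root_base, uint64_t root_size,
				  uint64_t hdm_base, uint64_t hdm_size)
{
	/* Only a window based at HPA 0, shared with the HDM decoders. */
	if (root_base != 0 || hdm_base != 0)
		return false;
	/* Assumed: the trimmed window must end at or below 4 GB. */
	if (root_size == 0 || root_size > SZ_4G)
		return false;
	/* The HDM decoders extend past the trimmed root. */
	return hdm_size > root_size;
}
```

Window 0 of the example qualifies, whereas Window 1 (based at 4 GB) does not
and remains subject to the unmodified checks.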

Although the HDM decoders cover the HPA range of the PCIe hole, the platform is
expected never to route accesses in that range to the CXL complex, because the
root decoder covers only the trimmed region, which excludes the hole. Linux has
no means to enforce this.

On the example platform, only the first 2 GB are potentially usable, but Linux,
adhering to the current specification, fails to construct Regions and to attach
Endpoint and intermediate Switch Decoders to them.

There are several points of failure due to the expectation that the Root
Decoder HPA size, which equals the size of the CFMWS from which it is
configured, is greater than or equal to the size of the matching Switch and
Endpoint HDM Decoders.

For construction and attachment to succeed, Linux must construct a Region with
the Root Decoder HPA range size and then attach to it all the intermediate
Switch Decoders and Endpoint Decoders that belong to the hierarchy, regardless
of their range sizes.
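
The construction step can be sketched in C as follows. The ``struct decoder``
and ``struct region`` types and ``lmh_region_build()`` are simplified,
hypothetical models, not the Linux CXL driver's data structures. The sketch
assumes the caller has already established that the LMH exception applies, and
it approximates membership in the hierarchy by a matching base address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified decoder model (HPA base and size only). */
struct decoder {
	uint64_t base;
	uint64_t size;
};

/* Hypothetical region: sized by the root decoder, not by its members. */
struct region {
	uint64_t base;
	uint64_t size;
	size_t nr_attached;
};

/*
 * Build a region from the trimmed root decoder, then attach every
 * Switch/Endpoint decoder of the hierarchy regardless of its size.
 * Assumption: hierarchy membership is approximated by a matching base.
 */
static void lmh_region_build(struct region *r, const struct decoder *root,
			     const struct decoder *members, size_t nr)
{
	r->base = root->base;
	r->size = root->size;	/* usable capacity is bounded by the root */
	r->nr_attached = 0;

	for (size_t i = 0; i < nr; i++) {
		if (members[i].base != r->base)
			continue;
		/* No size comparison: larger member ranges are tolerated. */
		r->nr_attached++;
	}
}
```

For Window 0, the resulting region is 2 GB and all 12 Endpoint Decoders
(3 GB each) are attached.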

Benefits of the Change
----------------------

Without the change, the OSPM does not match intermediate Switch and Endpoint
Decoders with Root Decoders whose CFMWS HPA sizes do not align with the
NIW * 256 MB constraint, which leads to lost memdev capacity.

This change allows the OSPM to construct Regions and attach intermediate Switch
and Endpoint Decoders to them, so that the addressable part of the memory
devices' total capacity is made available to users.

References
----------

Compute Express Link Specification Revision 3.2, Version 1.0
<https://www.computeexpresslink.org/>

Detailed Description of the Change
----------------------------------

The description of the Window Size field in Table 9-22 needs to account for
platforms with Low Memory Holes, where SPA ranges might be subsets of the
Endpoints' HPA ranges. Therefore, it has to be changed to the following:

"The total number of consecutive bytes of HPA this window represents. This value
shall be a multiple of NIW * 256 MB.

On platforms that reserve physical addresses below 4 GB, such as the Low Memory
Hole for PCIe MMIO on x86, an instance of CFMWS whose Base HPA is 0 might have
a size that doesn't align with the NIW * 256 MB constraint.

Note that the HPA range sizes of the matching intermediate Switch Decoders and
Endpoint Decoders must still align with the above-mentioned rule, but the
memory capacity that exceeds the CFMWS window size won't be accessible."
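
A minimal C sketch of how a consumer might validate a CFMWS and HDM decoder
pair against the amended text follows. ``lmh_pair_valid()`` is hypothetical,
and the requirements that a relaxed window end at or below 4 GB and be smaller
than the HDM decoders are assumptions drawn from the x86 LMH case rather than
from the proposed wording itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local definitions for this sketch. */
#define SZ_256M (256ULL << 20)
#define SZ_4G   (4ULL << 30)

/*
 * Hypothetical validation of a CFMWS window against the HDM decoders
 * that map it, following the amended Window Size description.
 */
static bool lmh_pair_valid(uint64_t cfmws_base, uint64_t cfmws_size,
			   uint64_t hdm_size, unsigned int niw)
{
	uint64_t granule;

	assert(niw != 0);
	granule = (uint64_t)niw * SZ_256M;

	/* Switch and Endpoint Decoders must still obey NIW * 256 MB. */
	if (hdm_size % granule != 0)
		return false;
	/* An aligned window follows the unmodified rule. */
	if (cfmws_size % granule == 0)
		return true;
	/*
	 * Relaxation: only a window based at 0 may be unaligned. Assumed:
	 * it ends at or below 4 GB and is smaller than the HDM decoders,
	 * whose excess capacity is then inaccessible.
	 */
	return cfmws_base == 0 && cfmws_size <= SZ_4G && cfmws_size < hdm_size;
}
```

The Window 0 pair (base 0, 2 GB CFMWS, 3 GB HDM, 12 ways) passes, while the
same sizes at a 4 GB base are rejected.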