.. SPDX-License-Identifier: GPL-2.0

Resolve conflict between CFMWS, Platform Memory Holes, and Endpoint Decoders
============================================================================

Document
--------

CXL Revision 3.2, Version 1.0

License
-------

SPDX-License Identifier: CC-BY-4.0

Creator/Contributors
--------------------

- Fabio M. De Francesco, Intel
- Dan J. Williams, Intel
- Mahesh Natu, Intel

Summary of the Change
---------------------

According to the current Compute Express Link (CXL) Specifications (Revision
3.2, Version 1.0), the CXL Fixed Memory Window Structure (CFMWS) describes zero
or more Host Physical Address (HPA) windows associated with each CXL Host
Bridge. Each window represents a contiguous HPA range that may be interleaved
across one or more targets, including CXL Host Bridges. Each window has a set
of restrictions that govern its usage. It is the responsibility of the
Operating System-directed configuration and Power Management (OSPM) to utilize
each window for the specified use.

Table 9-22 of the current CXL Specifications states that the Window Size field
contains the total number of consecutive bytes of HPA this window describes.
This value must be a multiple of the Number of Interleave Ways (NIW) * 256 MB.

Platform Firmware (BIOS) might reserve physical addresses below 4 GB where a
memory gap such as the Low Memory Hole (LMH) for PCIe MMIO may exist. In such
cases, the CFMWS Range Size may not adhere to the NIW * 256 MB rule.

The HPA represents the actual physical memory address space that the CXL devices
can decode and respond to, while the System Physical Address (SPA), a related
but distinct concept, represents the system-visible address space that users can
direct transactions to, and so it excludes reserved regions.

BIOS publishes CFMWS to communicate the active SPA ranges that, on platforms
with LMHs, map to a strict subset of the HPA.
The SPA range trims out the hole,
resulting in lost capacity in the Endpoints with no SPA to map to that part of
the HPA range that intersects the hole.

For example, an x86 platform with two CFMWS and an LMH starting at 2 GB:

+--------+------------+-------------------+------------------+-------------------+------+
| Window | CFMWS Base | CFMWS Size        | HDM Decoder Base | HDM Decoder Size  | Ways |
+========+============+===================+==================+===================+======+
| 0      | 0 GB       | 2 GB              | 0 GB             | 3 GB              | 12   |
+--------+------------+-------------------+------------------+-------------------+------+
| 1      | 4 GB       | NIW*256MB Aligned | 4 GB             | NIW*256MB Aligned | 12   |
+--------+------------+-------------------+------------------+-------------------+------+

HDM decoder base and HDM decoder size represent all the 12 Endpoint Decoders of
a 12-way region and all the intermediate Switch Decoders. They are configured
by the BIOS according to the NIW * 256 MB rule, resulting in an HPA range size
of 3 GB. The CFMWS Base and CFMWS Size, instead, are used to configure the Root
Decoder HPA range, which is smaller (2 GB) than that of the Switch and
Endpoint Decoders in the hierarchy (3 GB).

This creates 2 issues which lead to a failure to construct a region:

1) A mismatch in region size between root and any HDM decoder. The root decoders
   will always be smaller due to the trim.

2) The trim causes the root decoder to violate the (NIW * 256 MB) rule.

This change allows a region with a base address of 0 GB to bypass these checks
to allow for region creation with the trimmed root decoder address range.

This change does not allow for any other arbitrary region to violate these
checks - it is intended exclusively to enable x86 platforms which map CXL memory
under 4 GB.
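
The two checks and the base-0 exemption described above can be sketched as
follows. This is a minimal illustration, not the Linux implementation: the
names ``HpaRange``, ``strict_match``, ``lmh_match`` and
``can_construct_region`` are hypothetical, and the exact exemption conditions
(both ranges start at 0, the root range ends at or below 4 GB, the HDM range is
larger and still aligned) are assumptions drawn from the example table.

```python
from dataclasses import dataclass

MB = 1 << 20
GB = 1 << 30
GRANULE = 256 * MB   # per-way granule from the Window Size rule
LOW_LIMIT = 4 * GB   # assumption: the exemption only covers windows below 4 GB


@dataclass(frozen=True)
class HpaRange:
    """Hypothetical stand-in for a decoder's HPA range."""
    base: int
    size: int

    @property
    def end(self) -> int:
        return self.base + self.size


def is_aligned(size: int, ways: int) -> bool:
    """True if size is a multiple of NIW * 256 MB."""
    return size % (ways * GRANULE) == 0


def strict_match(root: HpaRange, hdm: HpaRange, ways: int) -> bool:
    """Checks without the exemption: aligned root, HDM range inside root."""
    contained = root.base <= hdm.base and hdm.end <= root.end
    return is_aligned(root.size, ways) and contained


def lmh_match(root: HpaRange, hdm: HpaRange, ways: int) -> bool:
    """Assumed exemption for a root decoder trimmed by a Low Memory Hole."""
    return (root.base == 0 and hdm.base == 0
            and root.end <= LOW_LIMIT
            and hdm.size > root.size
            and is_aligned(hdm.size, ways))


def can_construct_region(root: HpaRange, hdm: HpaRange, ways: int) -> bool:
    return strict_match(root, hdm, ways) or lmh_match(root, hdm, ways)


# Window 0 of the example table: 2 GB root decoder, 3 GB HDM decoders, 12 ways.
window0 = (HpaRange(0, 2 * GB), HpaRange(0, 3 * GB))
# The same mismatch at a non-zero base must still be rejected.
elsewhere = (HpaRange(8 * GB, 2 * GB), HpaRange(8 * GB, 3 * GB))
print(can_construct_region(*window0, ways=12))    # True
print(can_construct_region(*elsewhere, ways=12))  # False
```

With 12 ways the granule is 12 * 256 MB = 3 GB, so the 2 GB root decoder of
window 0 fails both strict checks and is accepted only through the base-0 path.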

Despite the HDM decoders covering the PCIe hole HPA region, it is expected that
the platform will never route accesses to those addresses to the CXL complex,
because the root decoder only covers the trimmed region (which excludes the
hole). This is outside the ability of Linux to enforce.

On the example platform, only the first 2 GB will be potentially usable, but
Linux, aiming to adhere to the current specifications, fails to construct
Regions and attach Endpoint and intermediate Switch Decoders to them.

There are several points of failure, all due to the expectation that the Root
Decoder HPA size, which is equal to that of the CFMWS from which it is
configured, has to be greater than or equal to that of the matching Switch and
Endpoint HDM Decoders.

In order to succeed with construction and attachment, Linux must construct a
Region with the Root Decoder HPA range size, and then attach to it all the
intermediate Switch Decoders and Endpoint Decoders that belong to the hierarchy
regardless of their range sizes.

Benefits of the Change
----------------------

Without the change, the OSPM wouldn't match intermediate Switch and Endpoint
Decoders with Root Decoders configured with CFMWS HPA sizes that don't align
with the NIW * 256 MB constraint, which leads to lost memdev capacity.

This change allows the OSPM to construct Regions and attach intermediate Switch
and Endpoint Decoders to them, so that the addressable part of the memory
devices' total capacity is made available to the users.

References
----------

Compute Express Link Specification Revision 3.2, Version 1.0
<https://www.computeexpresslink.org/>

Detailed Description of the Change
----------------------------------

The description of the Window Size field in Table 9-22 needs to account for
platforms with Low Memory Holes, where SPA ranges might be subsets of the
Endpoints' HPA.
Therefore, it has to be changed to the following:

"The total number of consecutive bytes of HPA this window represents. This value
shall be a multiple of NIW * 256 MB.

On platforms that reserve physical addresses below 4 GB, such as the Low Memory
Hole for PCIe MMIO on x86, an instance of CFMWS whose Base HPA range is 0 might
have a size that doesn't align with the NIW * 256 MB constraint.

Note that the matching intermediate Switch Decoders and the Endpoint Decoders
HPA range sizes must still align to the above-mentioned rule, but the memory
capacity that exceeds the CFMWS window size won't be accessible.".
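
As a rough illustration of the amended wording, the sketch below validates a
Window Size value and computes the capacity that ends up inaccessible. The
function names ``window_size_is_valid`` and ``inaccessible_bytes`` are
hypothetical, and the check is simplified: it keys the exception only on a Base
HPA of 0 and cannot tell whether the platform actually reserves addresses below
4 GB.

```python
MB = 1 << 20
GB = 1 << 30


def window_size_is_valid(base: int, size: int, niw: int) -> bool:
    """Window Size rule with the proposed exception for a window at base 0."""
    if size % (niw * 256 * MB) == 0:
        return True
    # Proposed exception: only a CFMWS whose Base HPA is 0 may be unaligned.
    return base == 0


def inaccessible_bytes(window_size: int, decoder_size: int) -> int:
    """Decoder capacity beyond the CFMWS window, which has no SPA mapping."""
    return max(0, decoder_size - window_size)


print(window_size_is_valid(0, 2 * GB, 12))       # True
print(window_size_is_valid(4 * GB, 2 * GB, 12))  # False
print(inaccessible_bytes(2 * GB, 3 * GB) // MB)  # 1024
```

For the example platform, the 3 GB HDM decoder range against the 2 GB window
leaves 1024 MB of device capacity without an SPA mapping.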