.. SPDX-License-Identifier: GPL-2.0

Resolve conflict between CFMWS, Platform Memory Holes, and Endpoint Decoders
============================================================================

Document
--------

CXL Revision 3.2, Version 1.0

License
-------

SPDX-License Identifier: CC-BY-4.0

Creator/Contributors
--------------------

- Fabio M. De Francesco, Intel
- Dan J. Williams, Intel
- Mahesh Natu, Intel

Summary of the Change
---------------------

According to the current Compute Express Link (CXL) Specifications (Revision
3.2, Version 1.0), the CXL Fixed Memory Window Structure (CFMWS) describes zero
or more Host Physical Address (HPA) windows associated with each CXL Host
Bridge. Each window represents a contiguous HPA range that may be interleaved
across one or more targets, including CXL Host Bridges. Each window has a set
of restrictions that govern its usage. It is the responsibility of the
Operating System-directed configuration and Power Management (OSPM) to use each
window for its specified use.

Table 9-22 of the current CXL Specifications states that the Window Size field
contains the total number of consecutive bytes of HPA this window describes.
This value must be a multiple of the Number of Interleave Ways (NIW) * 256 MB.

Platform Firmware (BIOS) might reserve physical addresses below 4 GB where a
memory gap such as the Low Memory Hole (LMH) for PCIe MMIO may exist. In such
cases, the CFMWS Range Size may not adhere to the NIW * 256 MB rule.

The HPA represents the actual physical memory address space that the CXL devices
can decode and respond to, while the System Physical Address (SPA), a related
but distinct concept, represents the system-visible address space that users can
direct transactions to, and so it excludes reserved regions.

BIOS publishes CFMWS to communicate the active SPA ranges that, on platforms
with LMHs, map to a strict subset of the HPA. The SPA range trims out the hole,
resulting in lost capacity in the Endpoints, which have no SPA to map to the
part of the HPA range that intersects the hole.
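
The capacity lost to the hole is the overlap between the HPA range programmed
into the Endpoint decoders and the hole itself. A minimal sketch of that
arithmetic follows. The helper ``hpa_lost_to_hole()`` is hypothetical, and the
half-open ``[start, end)`` ranges are a convention chosen for this
illustration only:

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes of the HPA range
 * [hpa_start, hpa_end) that fall inside the hole [hole_start, hole_end).
 */
static inline uint64_t hpa_lost_to_hole(uint64_t hpa_start, uint64_t hpa_end,
					uint64_t hole_start, uint64_t hole_end)
{
	uint64_t lo = hpa_start > hole_start ? hpa_start : hole_start;
	uint64_t hi = hpa_end < hole_end ? hpa_end : hole_end;

	/* No overlap means nothing is lost. */
	return hi > lo ? hi - lo : 0;
}
```

For instance, with values assumed purely for illustration, a decoder range of
0 GB to 3 GB and a hole spanning 2 GB to 4 GB overlap by 1 GB.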

For example, an x86 platform with two CFMWS and an LMH starting at 2 GB:

+--------+------------+-------------------+------------------+-------------------+------+
| Window | CFMWS Base | CFMWS Size        | HDM Decoder Base | HDM Decoder Size  | Ways |
+========+============+===================+==================+===================+======+
| 0      | 0 GB       | 2 GB              | 0 GB             | 3 GB              | 12   |
+--------+------------+-------------------+------------------+-------------------+------+
| 1      | 4 GB       | NIW*256MB Aligned | 4 GB             | NIW*256MB Aligned | 12   |
+--------+------------+-------------------+------------------+-------------------+------+

The HDM Decoder Base and HDM Decoder Size columns describe all 12 Endpoint
Decoders of a 12-way region and all the intermediate Switch Decoders. The BIOS
configures them according to the NIW * 256 MB rule, resulting in an HPA range
size of 3 GB. The CFMWS Base and CFMWS Size, by contrast, are used to configure
the Root Decoder HPA range, which ends up smaller (2 GB) than that of the Switch
and Endpoint Decoders in the hierarchy (3 GB).

This creates two issues that lead to a failure to construct a region:

1) A mismatch in region size between the root decoder and any HDM decoder. The
   root decoder will always be smaller due to the trim.

2) The trim causes the root decoder to violate the NIW * 256 MB rule.
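
Both failures can be reproduced with plain arithmetic on the Window 0 values.
The following is a minimal sketch, not kernel code: the helpers
``size_is_niw_aligned()`` and ``root_covers_decoder()`` and the ``SZ_*``
constants are hypothetical names defined only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_256M	(256ULL << 20)
#define SZ_1G	(1ULL << 30)

/* Hypothetical helper: true if size is a multiple of NIW * 256 MB. */
static inline bool size_is_niw_aligned(uint64_t size, unsigned int niw)
{
	if (niw == 0)
		return false;

	return size % (niw * SZ_256M) == 0;
}

/*
 * Hypothetical helper: true if the root decoder range is at least as
 * large as the HDM decoder range it is expected to contain.
 */
static inline bool root_covers_decoder(uint64_t root_size,
				       uint64_t decoder_size)
{
	return root_size >= decoder_size;
}
```

With 12 ways, NIW * 256 MB is 3 GB, so the 3 GB HDM decoder size passes the
alignment check while the 2 GB CFMWS size fails it, and the 2 GB root decoder
range does not cover the 3 GB HDM decoder range.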

This change allows a region with a base address of 0 GB to bypass these checks,
so that a region can be created with the trimmed root decoder address range.

This change does not allow any other arbitrary region to violate these checks.
It is intended exclusively to enable x86 platforms which map CXL memory under
4 GB.

Although the HDM decoders cover the HPA region of the PCIe hole, it is expected
that the platform will never route accesses to those addresses to the CXL
complex, because the root decoder only covers the trimmed region (which
excludes the hole). This is outside the ability of Linux to enforce.

On the example platform, only the first 2 GB will be potentially usable, but
Linux, aiming to adhere to the current specifications, fails to construct
Regions and attach Endpoint and intermediate Switch Decoders to them.

There are several points of failure, all due to the expectation that the Root
Decoder HPA size, which equals the size of the CFMWS from which it is
configured, is greater than or equal to that of the matching Switch and
Endpoint HDM Decoders.

In order to succeed with construction and attachment, Linux must construct a
Region with the Root Decoder HPA range size, and then attach to it all the
intermediate Switch Decoders and Endpoint Decoders that belong to the
hierarchy, regardless of their range sizes.
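
One way to picture the relaxed construction is as a range comparison that
tolerates a larger HDM decoder only when both ranges start at HPA 0 and the
root decoder ends below 4 GB. The sketch below is illustrative only:
``struct hpa_range`` and all function names are hypothetical, the exact
conditions are an assumption based on the description above, and address
overflow is ignored for brevity.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_1G	(1ULL << 30)
#define SZ_4G	(4ULL << 30)

/* Hypothetical description of a decoder's HPA range. */
struct hpa_range {
	uint64_t base;
	uint64_t size;
};

/* Strict rule: the HDM decoder range lies within the root decoder range. */
static inline bool ranges_match_strict(struct hpa_range root,
				       struct hpa_range dec)
{
	return dec.base >= root.base &&
	       dec.base + dec.size <= root.base + root.size;
}

/*
 * Assumed LMH exception: both ranges start at 0, the root decoder was
 * trimmed to end below 4 GB, and the HDM decoder extends past it.
 */
static inline bool lmh_ranges_match(struct hpa_range root,
				    struct hpa_range dec)
{
	return root.base == 0 && dec.base == 0 &&
	       root.size < SZ_4G && dec.size > root.size;
}

static inline bool ranges_match(struct hpa_range root, struct hpa_range dec)
{
	return ranges_match_strict(root, dec) || lmh_ranges_match(root, dec);
}

/* The region never exceeds what the root decoder can address. */
static inline uint64_t region_size(struct hpa_range root,
				   struct hpa_range dec)
{
	return dec.size < root.size ? dec.size : root.size;
}
```

For Window 0 of the example, the strict comparison fails, the exception
matches, and the resulting region is 2 GB. A window based at 4 GB with the
same size mismatch is still rejected.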

Benefits of the Change
----------------------

Without the change, the OSPM wouldn't match intermediate Switch and Endpoint
Decoders with Root Decoders configured with CFMWS HPA sizes that don't align
with the NIW * 256 MB constraint, which leads to lost memdev capacity.

This change allows the OSPM to construct Regions and attach intermediate Switch
and Endpoint Decoders to them, so that the addressable part of the memory
devices' total capacity is made available to the users.

References
----------

Compute Express Link Specification Revision 3.2, Version 1.0
<https://www.computeexpresslink.org/>

Detailed Description of the Change
----------------------------------

The description of the Window Size field in Table 9-22 needs to account for
platforms with Low Memory Holes, where SPA ranges might be subsets of the
endpoints' HPA. Therefore, it has to be changed to the following:

"The total number of consecutive bytes of HPA this window represents. This value
shall be a multiple of NIW * 256 MB.

On platforms that reserve physical addresses below 4 GB, such as the Low Memory
Hole for PCIe MMIO on x86, an instance of CFMWS whose Base HPA range is 0 might
have a size that doesn't align with the NIW * 256 MB constraint.

Note that the matching intermediate Switch Decoders and the Endpoint Decoders
HPA range sizes must still align to the above-mentioned rule, but the memory
capacity that exceeds the CFMWS window size won't be accessible."
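
Read as a validation rule, the amended Window Size description could be
sketched as follows. This is one interpretation for illustration, not spec or
kernel code: ``cfmws_window_size_ok()`` is a hypothetical name, and the extra
requirement that the exempted window end at or below 4 GB is an assumption
drawn from the "below 4 GB" wording.

```c
#include <stdbool.h>
#include <stdint.h>

#define SZ_256M	(256ULL << 20)
#define SZ_1G	(1ULL << 30)
#define SZ_4G	(4ULL << 30)

/* Hypothetical check of a CFMWS Window Size against the amended rule. */
static inline bool cfmws_window_size_ok(uint64_t base, uint64_t size,
					unsigned int niw)
{
	if (niw == 0 || size == 0)
		return false;

	/* General rule: a multiple of NIW * 256 MB. */
	if (size % (niw * SZ_256M) == 0)
		return true;

	/*
	 * Exemption for a window trimmed by a Low Memory Hole: base 0
	 * and, as an assumption of this sketch, ending at or below 4 GB.
	 */
	return base == 0 && size <= SZ_4G;
}
```

Under these assumptions, a 12-way window at base 0 with a size of 2 GB is
accepted, while the same size at base 4 GB is not.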