.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>

===============================================
Compute Express Link Driver Theory of Operation
===============================================

A Compute Express Link Memory Device is a CXL component that implements the
CXL.mem protocol. It contains some amount of volatile memory, persistent memory,
or both. It is enumerated as a PCI device for configuration and passing
messages over an MMIO mailbox. Its contribution to the System Physical
Address space is handled via HDM (Host Managed Device Memory) decoders
that optionally define a device's contribution to an interleaved address
range across multiple devices underneath a host-bridge or interleaved
across host-bridges.

The CXL Bus
===========
Similar to how a RAID driver takes disk objects and assembles them into a new
logical device, the CXL subsystem is tasked with taking PCIe and ACPI objects
and assembling them into a CXL.mem decode topology. The need for runtime
configuration of the CXL.mem topology is also similar to RAID in that different
environments with the same hardware configuration may decide to assemble the
topology in contrasting ways. One may choose performance (RAID0), striping
memory across multiple Host Bridges and endpoints, while another may opt for
fault tolerance and disable any striping in the CXL.mem topology.

Platform firmware enumerates a menu of interleave options at the "CXL root port"
(Linux term for the top of the CXL decode topology). From there, PCIe topology
dictates which endpoints can participate in which Host Bridge decode regimes.
Each PCIe Switch in the path between the root and an endpoint introduces a point
at which the interleave can be split. For example, platform firmware may say a
given range only decodes to one Host Bridge, but that Host Bridge may in turn
interleave cycles across multiple Root Ports. An intervening Switch between a
port and an endpoint may interleave cycles across multiple Downstream Switch
Ports, etc.
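
The way each level can split the interleave further can be pictured with a
small model. This is a simplified sketch, not the kernel's decode logic: it
assumes modulo interleave at every level and assumes that each level
interleaves at its parent's granularity multiplied by its parent's ways. The
helper name 'route', the 256-byte granularity, and the 2 x 2 x 2 shape are
illustrative choices only.

```python
def route(offset, ways_per_level, granularity=256):
    """Return the target index picked at each level for a byte offset
    into an interleaved range (hypothetical helper, simplified model)."""
    path = []
    for ways in ways_per_level:
        # Each level selects one of its targets from the current chunk index.
        path.append((offset // granularity) % ways)
        # Assumption: the next level down interleaves at a coarser
        # granularity, namely parent granularity * parent ways.
        granularity *= ways
    return path


# Host Bridges x Root Ports x Downstream Switch Ports
levels = [2, 2, 2]
for chunk in range(8):
    print(chunk, route(chunk * 256, levels))
```

Under these assumptions eight consecutive 256-byte chunks each take a distinct
path, so the range is striped across eight endpoints. Passing '[1, 1, 1]'
models the "no striping" choice, where every offset follows the same path.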

Here is a sample listing of a CXL topology defined by 'cxl_test'. The 'cxl_test'
module generates an emulated CXL topology of 2 Host Bridges, each with 2 Root
Ports. Each of those Root Ports is connected to a 2-way switch, with endpoints
connected to the switch's downstream ports, for a total of 8 endpoints::

    # cxl list -BEMPu -b cxl_test
    {
      "bus":"root3",
      "provider":"cxl_test",
      "ports:root3":[
        {
          "port":"port5",
          "host":"cxl_host_bridge.1",
          "ports:port5":[
            {
              "port":"port8",
              "host":"cxl_switch_uport.1",
              "endpoints:port8":[
                {
                  "endpoint":"endpoint9",
                  "host":"mem2",
                  "memdev":{
                    "memdev":"mem2",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0x1",
                    "numa_node":1,
                    "host":"cxl_mem.1"
                  }
                },
                {
                  "endpoint":"endpoint15",
                  "host":"mem6",
                  "memdev":{
                    "memdev":"mem6",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0x5",
                    "numa_node":1,
                    "host":"cxl_mem.5"
                  }
                }
              ]
            },
            {
              "port":"port12",
              "host":"cxl_switch_uport.3",
              "endpoints:port12":[
                {
                  "endpoint":"endpoint17",
                  "host":"mem8",
                  "memdev":{
                    "memdev":"mem8",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0x7",
                    "numa_node":1,
                    "host":"cxl_mem.7"
                  }
                },
                {
                  "endpoint":"endpoint13",
                  "host":"mem4",
                  "memdev":{
                    "memdev":"mem4",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0x3",
                    "numa_node":1,
                    "host":"cxl_mem.3"
                  }
                }
              ]
            }
          ]
        },
        {
          "port":"port4",
          "host":"cxl_host_bridge.0",
          "ports:port4":[
            {
              "port":"port6",
              "host":"cxl_switch_uport.0",
              "endpoints:port6":[
                {
                  "endpoint":"endpoint7",
                  "host":"mem1",
                  "memdev":{
                    "memdev":"mem1",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0",
                    "numa_node":0,
                    "host":"cxl_mem.0"
                  }
                },
                {
                  "endpoint":"endpoint14",
                  "host":"mem5",
                  "memdev":{
                    "memdev":"mem5",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0x4",
                    "numa_node":0,
                    "host":"cxl_mem.4"
                  }
                }
              ]
            },
            {
              "port":"port10",
              "host":"cxl_switch_uport.2",
              "endpoints:port10":[
                {
                  "endpoint":"endpoint16",
                  "host":"mem7",
                  "memdev":{
                    "memdev":"mem7",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0x6",
                    "numa_node":0,
                    "host":"cxl_mem.6"
                  }
                },
                {
                  "endpoint":"endpoint11",
                  "host":"mem3",
                  "memdev":{
                    "memdev":"mem3",
                    "pmem_size":"256.00 MiB (268.44 MB)",
                    "ram_size":"256.00 MiB (268.44 MB)",
                    "serial":"0x2",
                    "numa_node":0,
                    "host":"cxl_mem.2"
                  }
                }
              ]
            }
          ]
        }
      ]
    }

In that listing each "root", "port", and "endpoint" object corresponds to a
kernel 'struct cxl_port' object. A 'cxl_port' is a device that can decode
CXL.mem to its descendants. So "root" claims non-PCIe enumerable platform decode
ranges and decodes them to "ports", "ports" decode to "endpoints", and
"endpoints" represent the decode from SPA (System Physical Address) to DPA
(Device Physical Address).

Continuing the RAID analogy, disks have both topology metadata and on-device
metadata that determine RAID set assembly. CXL Port topology and CXL Port link
status are the metadata for CXL.mem set assembly. The CXL Port topology is
enumerated by the arrival of a CXL.mem device. That is, unless and until the
PCIe core attaches the cxl_pci driver to a CXL Memory Expander there is no role
for CXL Port objects. Conversely, for hot-unplug / removal scenarios, there is
no need for the Linux PCI core to tear down switch-level CXL resources because
the endpoint ->remove() event cleans up the port data that was established to
support that Memory Expander.

The port metadata and potential decode schemes that a given memory device may
participate in can be determined via a command like::

    # cxl list -BDMu -d root -m mem3
    {
      "bus":"root3",
      "provider":"cxl_test",
      "decoders:root3":[
        {
          "decoder":"decoder3.1",
          "resource":"0x8030000000",
          "size":"512.00 MiB (536.87 MB)",
          "volatile_capable":true,
          "nr_targets":2
        },
        {
          "decoder":"decoder3.3",
          "resource":"0x8060000000",
          "size":"512.00 MiB (536.87 MB)",
          "pmem_capable":true,
          "nr_targets":2
        },
        {
          "decoder":"decoder3.0",
          "resource":"0x8020000000",
          "size":"256.00 MiB (268.44 MB)",
          "volatile_capable":true,
          "nr_targets":1
        },
        {
          "decoder":"decoder3.2",
          "resource":"0x8050000000",
          "size":"256.00 MiB (268.44 MB)",
          "pmem_capable":true,
          "nr_targets":1
        }
      ],
      "memdevs:root3":[
        {
          "memdev":"mem3",
          "pmem_size":"256.00 MiB (268.44 MB)",
          "ram_size":"256.00 MiB (268.44 MB)",
          "serial":"0x2",
          "numa_node":0,
          "host":"cxl_mem.2"
        }
      ]
    }

...which queries the CXL topology to ask "given a CXL Memory Expander with a
kernel device name of 'mem3', which platform level decode ranges may this device
participate in?". A given expander can participate in multiple CXL.mem
interleave sets simultaneously depending on how many decoder resources it has.
In this example mem3 can participate in one or more of: a PMEM interleave that
spans two Host Bridges, a PMEM interleave that targets a single Host Bridge, a
Volatile memory interleave that spans two Host Bridges, and a Volatile memory
interleave that only targets a single Host Bridge.
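
The four options named above can be read directly off the root decoder entries.
The sketch below is illustrative only: it copies the "decoder", capability, and
"nr_targets" fields from the listing and assumes, as the description above
implies, that "nr_targets" on a root decoder counts the Host Bridges the range
spans. The helper name 'describe' is hypothetical.

```python
import json

# Root decoder entries copied (with fields trimmed) from the listing above.
DECODERS = json.loads("""
[
  {"decoder":"decoder3.1", "volatile_capable":true, "nr_targets":2},
  {"decoder":"decoder3.3", "pmem_capable":true, "nr_targets":2},
  {"decoder":"decoder3.0", "volatile_capable":true, "nr_targets":1},
  {"decoder":"decoder3.2", "pmem_capable":true, "nr_targets":1}
]
""")


def describe(decoder):
    """Summarize one root decoder entry as a human-readable line."""
    kinds = [label
             for flag, label in (("volatile_capable", "volatile"),
                                 ("pmem_capable", "PMEM"))
             if decoder.get(flag)]
    targets = decoder["nr_targets"]
    # Assumption: nr_targets on a root decoder counts Host Bridges.
    span = ("a single Host Bridge" if targets == 1
            else f"{targets} Host Bridges")
    return f"{decoder['decoder']}: {'/'.join(kinds)}, targets {span}"


for entry in DECODERS:
    print(describe(entry))
```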

Conversely, the memory devices that can participate in a given platform level
decode scheme can be determined via a command like the following::

    # cxl list -MDu -d 3.2
    [
      {
        "memdevs":[
          {
            "memdev":"mem1",
            "pmem_size":"256.00 MiB (268.44 MB)",
            "ram_size":"256.00 MiB (268.44 MB)",
            "serial":"0",
            "numa_node":0,
            "host":"cxl_mem.0"
          },
          {
            "memdev":"mem5",
            "pmem_size":"256.00 MiB (268.44 MB)",
            "ram_size":"256.00 MiB (268.44 MB)",
            "serial":"0x4",
            "numa_node":0,
            "host":"cxl_mem.4"
          },
          {
            "memdev":"mem7",
            "pmem_size":"256.00 MiB (268.44 MB)",
            "ram_size":"256.00 MiB (268.44 MB)",
            "serial":"0x6",
            "numa_node":0,
            "host":"cxl_mem.6"
          },
          {
            "memdev":"mem3",
            "pmem_size":"256.00 MiB (268.44 MB)",
            "ram_size":"256.00 MiB (268.44 MB)",
            "serial":"0x2",
            "numa_node":0,
            "host":"cxl_mem.2"
          }
        ]
      },
      {
        "root decoders":[
          {
            "decoder":"decoder3.2",
            "resource":"0x8050000000",
            "size":"256.00 MiB (268.44 MB)",
            "pmem_capable":true,
            "nr_targets":1
          }
        ]
      }
    ]

...where the naming scheme for decoders is "decoder<port_id>.<instance_id>".
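
A decoder name can be split back into its two ids with a short helper. This is
a minimal sketch that assumes both ids are plain decimal integers, as they are
in the listings above; the helper name 'parse_decoder' is hypothetical.

```python
import re

# "decoder<port_id>.<instance_id>", assuming both ids are decimal integers.
DECODER_NAME = re.compile(r"^decoder(\d+)\.(\d+)$")


def parse_decoder(name):
    """Return (port_id, instance_id) for a decoder name."""
    match = DECODER_NAME.match(name)
    if match is None:
        raise ValueError(f"not a decoder name: {name!r}")
    return int(match.group(1)), int(match.group(2))


port_id, instance_id = parse_decoder("decoder3.2")
print(port_id, instance_id)
```

In the listings above the port id of "decoder3.2" is 3, which lines up with the
"root3" object that owns it.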

Driver Infrastructure
=====================

This section covers the driver infrastructure for a CXL memory device.

CXL Memory Device
-----------------

.. kernel-doc:: drivers/cxl/pci.c
   :doc: cxl pci

.. kernel-doc:: drivers/cxl/pci.c
   :internal:

.. kernel-doc:: drivers/cxl/mem.c
   :doc: cxl mem

.. kernel-doc:: drivers/cxl/cxlmem.h
   :internal:

.. kernel-doc:: drivers/cxl/core/memdev.c
   :identifiers:

CXL Port
--------
.. kernel-doc:: drivers/cxl/port.c
   :doc: cxl port

CXL Core
--------
.. kernel-doc:: drivers/cxl/cxl.h
   :doc: cxl objects

.. kernel-doc:: drivers/cxl/cxl.h
   :internal:

.. kernel-doc:: drivers/cxl/acpi.c
   :identifiers: add_cxl_resources

.. kernel-doc:: drivers/cxl/core/hdm.c
   :doc: cxl core hdm

.. kernel-doc:: drivers/cxl/core/hdm.c
   :identifiers:

.. kernel-doc:: drivers/cxl/core/cdat.c
   :identifiers:

.. kernel-doc:: drivers/cxl/core/port.c
   :doc: cxl core

.. kernel-doc:: drivers/cxl/core/port.c
   :identifiers:

.. kernel-doc:: drivers/cxl/core/pci.c
   :doc: cxl core pci

.. kernel-doc:: drivers/cxl/core/pci.c
   :identifiers:

.. kernel-doc:: drivers/cxl/core/pmem.c
   :doc: cxl pmem

.. kernel-doc:: drivers/cxl/core/pmem.c
   :identifiers:

.. kernel-doc:: drivers/cxl/core/regs.c
   :doc: cxl registers

.. kernel-doc:: drivers/cxl/core/regs.c
   :identifiers:

.. kernel-doc:: drivers/cxl/core/mbox.c
   :doc: cxl mbox

.. kernel-doc:: drivers/cxl/core/mbox.c
   :identifiers:

.. kernel-doc:: drivers/cxl/core/features.c
   :doc: cxl features

See :c:func:`devm_cxl_setup_features` for API details.

CXL Regions
-----------
.. kernel-doc:: drivers/cxl/core/region.c
   :doc: cxl core region

.. kernel-doc:: drivers/cxl/core/region.c
   :identifiers:

External Interfaces
===================

CXL IOCTL Interface
-------------------

.. kernel-doc:: include/uapi/linux/cxl_mem.h
   :doc: UAPI

.. kernel-doc:: include/uapi/linux/cxl_mem.h
   :internal:
416