.. SPDX-License-Identifier: GPL-2.0

============================
PCI Peer-to-Peer DMA Support
============================

The PCI bus has pretty decent support for performing DMA transfers
between two devices on the bus. This type of transaction is henceforth
called Peer-to-Peer (or P2P). However, there are a number of issues that
make P2P transactions tricky to do in a perfectly safe way.

One of the biggest issues is that PCI doesn't require forwarding
transactions between hierarchy domains, and in PCIe, each Root Port
defines a separate hierarchy domain. To make things worse, there is no
simple way to determine if a given Root Complex supports this or not.
(See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel
only supports doing P2P when the endpoints involved are all behind the
same PCI bridge, as such devices are all in the same PCI hierarchy
domain, and the spec guarantees that all transactions within the
hierarchy will be routable, but it does not require routing
between hierarchies.

The second issue is that to make use of existing interfaces in Linux,
memory that is used for P2P transactions needs to be backed by struct
pages. However, PCI BARs are not typically cache coherent, so there are
a few corner-case gotchas with these pages and developers need to
be careful about what they do with them.


Driver Writer's Guide
=====================

In a given P2P implementation there may be three or more different
types of kernel drivers in play:

* Provider - A driver which provides or publishes P2P resources like
  memory or doorbell registers to other drivers.
* Client - A driver which makes use of a resource by setting up a
  DMA transaction to or from it.
* Orchestrator - A driver which orchestrates the flow of data between
  clients and providers.

In many cases there could be overlap between these three types (e.g.,
it may be typical for a driver to be both a provider and a client).

For example, in the NVMe Target Copy Offload implementation:

* The NVMe PCI driver is a client, provider and orchestrator
  in that it exposes any CMB (Controller Memory Buffer) as a P2P memory
  resource (provider), it accepts P2P memory pages as buffers in requests
  to be used directly (client) and it can also make use of the CMB to
  hold submission queue entries (orchestrator).
* The RDMA driver is a client in this arrangement so that an RNIC
  can DMA directly to the memory exposed by the NVMe device.
* The NVMe Target driver (nvmet) can orchestrate the data from the RNIC
  to the P2P memory (CMB) and then to the NVMe device (and vice versa).

This is currently the only arrangement supported by the kernel but
one could imagine slight tweaks to this that would allow for the same
functionality. For example, if a specific RNIC added a BAR with some
memory behind it, its driver could add support as a P2P provider and
then the NVMe Target could use the RNIC's memory instead of the CMB
in cases where the NVMe cards in use do not have CMB support.


Provider Drivers
----------------

A provider simply needs to register a BAR (or a portion of a BAR)
as a P2P DMA resource using :c:func:`pci_p2pdma_add_resource()`.
This will register struct pages for all the specified memory.

After that it may optionally publish all of its resources as
P2P memory using :c:func:`pci_p2pmem_publish()`. This will allow
any orchestrator drivers to find and use the memory. When marked in
this way, the resource must be regular memory with no side effects.

For the time being this is fairly rudimentary in that all resources
are typically going to be P2P memory. Future work will likely expand
this to include other types of resources like doorbells.
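
The two provider steps above (register, then optionally publish) can be
sketched as follows. This is a minimal illustration rather than code from
any real driver: ``example_setup_p2pmem()`` and the ``EXAMPLE_P2P_*``
values are hypothetical, and the ``struct pci_dev`` and the two function
bodies are stand-in stubs that only record the calls, so that the sketch
builds outside the kernel tree. In a real driver the declarations come
from ``<linux/pci.h>`` and ``<linux/pci-p2pdma.h>``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-ins so the sketch builds outside the kernel tree. In a real
 * driver these come from <linux/pci.h> and <linux/pci-p2pdma.h>; the
 * bodies below only record the calls and are NOT the kernel's code.
 */
struct pci_dev {
	bool p2p_registered;	/* set by the stub add_resource */
	bool p2p_published;	/* set by the stub publish */
	int p2p_bar;
	size_t p2p_size;
	uint64_t p2p_offset;
};

static int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
				   size_t size, uint64_t offset)
{
	assert(pdev != NULL);
	pdev->p2p_registered = true;
	pdev->p2p_bar = bar;
	pdev->p2p_size = size;
	pdev->p2p_offset = offset;
	return 0;
}

static void pci_p2pmem_publish(struct pci_dev *pdev, bool publish)
{
	assert(pdev != NULL);
	pdev->p2p_published = publish;
}

/* Hypothetical values: which BAR holds the memory, and which part of it. */
#define EXAMPLE_P2P_BAR		4
#define EXAMPLE_P2P_SIZE	(1UL << 20)
#define EXAMPLE_P2P_OFFSET	0

/* Hypothetical helper a provider might call from its probe() routine. */
static int example_setup_p2pmem(struct pci_dev *pdev)
{
	int rc;

	/* Step 1: register the BAR region so it gets struct pages. */
	rc = pci_p2pdma_add_resource(pdev, EXAMPLE_P2P_BAR,
				     EXAMPLE_P2P_SIZE, EXAMPLE_P2P_OFFSET);
	if (rc)
		return rc;

	/* Step 2 (optional): let orchestrators discover the memory. */
	pci_p2pmem_publish(pdev, true);
	return 0;
}
```

Publishing only after registration has succeeded keeps an orchestrator
from discovering a provider whose memory is not yet backed by struct
pages.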


Client Drivers
--------------

A client driver only has to use the mapping API :c:func:`dma_map_sg()`
and :c:func:`dma_unmap_sg()` functions as usual, and the implementation
will do the right thing for the P2P capable memory.


Orchestrator Drivers
--------------------

The first task an orchestrator driver must do is compile a list of
all client devices that will be involved in a given transaction. For
example, the NVMe Target driver creates a list including the namespace
block device and the RNIC in use. If the orchestrator has access to
a specific P2P provider to use it may check compatibility using
:c:func:`pci_p2pdma_distance()`; otherwise it may find a memory provider
that's compatible with all clients using :c:func:`pci_p2pmem_find()`.
If more than one provider is supported, the one nearest to all the clients will
be chosen first. If more than one provider is an equal distance away, the
one returned will be chosen at random (it is not arbitrary but
truly random). This function returns the PCI device to use for the provider
with a reference taken and therefore when it's no longer needed it should be
returned with pci_dev_put().

Once a provider is selected, the orchestrator can then use
:c:func:`pci_alloc_p2pmem()` and :c:func:`pci_free_p2pmem()` to
allocate P2P memory from the provider. :c:func:`pci_p2pmem_alloc_sgl()`
and :c:func:`pci_p2pmem_free_sgl()` are convenience functions for
allocating scatter-gather lists with P2P memory.

Struct Page Caveats
-------------------

Driver writers should be very careful about not passing these special
struct pages to code that isn't prepared for them. At this time, the kernel
interfaces do not have any checks for ensuring this. This obviously
precludes passing these pages to userspace.

P2P memory is also technically IO memory but should never have any side
effects behind it.
Thus, the order of loads and stores should not be important
and ioreadX(), iowriteX() and friends should not be necessary.


P2P DMA Support Library
=======================

.. kernel-doc:: drivers/pci/p2pdma.c
   :export: