Buffer Sharing and Synchronization (dma-buf)
============================================

The dma-buf subsystem provides the framework for sharing buffers for
hardware (DMA) access across multiple device drivers and subsystems, and
for synchronizing asynchronous hardware access.

As an example, it is used extensively by the DRM subsystem to exchange
buffers between processes, contexts, library APIs within the same
process, and also to exchange buffers with other subsystems such as
V4L2.

This document describes the way in which kernel subsystems can use and
interact with the three main primitives offered by dma-buf:

 - dma-buf, representing a sg_table and exposed to userspace as a file
   descriptor to allow passing between processes, subsystems, devices,
   etc;
 - dma-fence, providing a mechanism to signal when an asynchronous
   hardware operation has completed; and
 - dma-resv, which manages a set of dma-fences for a particular dma-buf
   allowing implicit (kernel-ordered) synchronization of work to
   preserve the illusion of coherent access.


Userspace API principles and use
--------------------------------

For more details on how to design your subsystem's API for dma-buf use, please
see Documentation/userspace-api/dma-buf-alloc-exchange.rst.


Shared DMA Buffers
------------------

This document serves as a guide to device-driver writers on what the dma-buf
buffer sharing API is and how to use it for exporting and importing shared
buffers.

Any device driver which wishes to be a part of DMA buffer sharing can do so as
either the 'exporter' of buffers, or the 'user' or 'importer' of buffers.

Say a driver A wants to use buffers created by driver B; we then call B the
exporter, and A the buffer-user/importer.

The exporter

 - implements and manages operations in :c:type:`struct dma_buf_ops
   <dma_buf_ops>` for the buffer,
 - allows other users to share the buffer by using dma_buf sharing APIs,
 - manages the details of buffer allocation, wrapped in a :c:type:`struct
   dma_buf <dma_buf>`,
 - decides about the actual backing storage where this allocation happens,
 - and takes care of any migration of scatterlists - for all (shared) users of
   this buffer.

The buffer-user

 - is one of (many) sharing users of the buffer.
 - doesn't need to worry about how the buffer is allocated, or where.
 - and needs a mechanism to get access to the scatterlist that makes up this
   buffer in memory, mapped into its own address space, so it can access the
   same area of memory.
   This interface is provided by :c:type:`struct dma_buf_attachment
   <dma_buf_attachment>`.

Any exporters or users of the dma-buf buffer sharing framework must have a
'select DMA_SHARED_BUFFER' in their respective Kconfigs.

Userspace Interface Notes
~~~~~~~~~~~~~~~~~~~~~~~~~

Mostly a DMA buffer file descriptor is simply an opaque object for userspace,
and hence the generic interface exposed is very minimal. There are a few things
to consider though:

- Since kernel 3.12 the dma-buf FD supports the llseek system call, but only
  with offset=0 and whence=SEEK_END|SEEK_SET. SEEK_SET is supported to allow
  the usual size discovery pattern size = SEEK_END(0); SEEK_SET(0). Every other
  llseek operation will report -EINVAL.

  If llseek on dma-buf FDs isn't supported the kernel will report -ESPIPE for
  all cases. Userspace can use this to detect support for discovering the
  dma-buf size using llseek.

- In order to avoid fd leaks on exec, the FD_CLOEXEC flag must be set
  on the file descriptor. This is not just a resource leak, but a
  potential security hole. It could give the newly exec'd application
  access to buffers, via the leaked fd, to which it should otherwise
  not be permitted access.

  The problem with doing this via a separate fcntl() call, versus doing it
  atomically when the fd is created, is that this is inherently racy in a
  multi-threaded app[3]. The issue is made worse when it is library code
  opening/creating the file descriptor, as the application may not even be
  aware of the fds.

  To avoid this problem, userspace must have a way to request that the
  O_CLOEXEC flag be set when the dma-buf fd is created. So any API provided by
  the exporting driver to create a dmabuf fd must provide a way to let
  userspace control setting of the O_CLOEXEC flag passed in to dma_buf_fd().

- Memory mapping the contents of the DMA buffer is also supported. See the
  discussion below on `CPU Access to DMA Buffer Objects`_ for the full details.

- The DMA buffer FD is also pollable, see `Implicit Fence Poll Support`_ below
  for details.

- The DMA buffer FD also supports a few dma-buf-specific ioctls, see
  `DMA Buffer ioctls`_ below for details.

Basic Operation and Device DMA Access
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: dma buf device access

CPU Access to DMA Buffer Objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: cpu access

Implicit Fence Poll Support
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: implicit fence polling

DMA-BUF statistics
~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf-sysfs-stats.c
   :doc: overview

DMA Buffer ioctls
~~~~~~~~~~~~~~~~~

.. kernel-doc:: include/uapi/linux/dma-buf.h

DMA-BUF locking convention
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: locking convention

Kernel Functions and Structures Reference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :export:

.. kernel-doc:: include/linux/dma-buf.h
   :internal:

Reservation Objects
-------------------

.. kernel-doc:: drivers/dma-buf/dma-resv.c
   :doc: Reservation Object Overview

.. kernel-doc:: drivers/dma-buf/dma-resv.c
   :export:

.. kernel-doc:: include/linux/dma-resv.h
   :internal:

DMA Fences
----------

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: DMA fences overview

DMA Fence Cross-Driver Contract
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: fence cross-driver contract

DMA Fence Signalling Annotations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: fence signalling annotation

DMA Fence Deadline Hints
~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: deadline hints

DMA Fences Functions Reference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :export:

.. kernel-doc:: include/linux/dma-fence.h
   :internal:

DMA Fence Array
~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence-array.c
   :export:

.. kernel-doc:: include/linux/dma-fence-array.h
   :internal:

DMA Fence Chain
~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence-chain.c
   :export:

.. kernel-doc:: include/linux/dma-fence-chain.h
   :internal:

DMA Fence unwrap
~~~~~~~~~~~~~~~~

.. kernel-doc:: include/linux/dma-fence-unwrap.h
   :internal:

DMA Fence Sync File
~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/sync_file.c
   :export:

.. kernel-doc:: include/linux/sync_file.h
   :internal:

DMA Fence Sync File uABI
~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: include/uapi/linux/sync_file.h
   :internal:

Indefinite DMA Fences
~~~~~~~~~~~~~~~~~~~~~

At various times, struct dma_fence implementations with an indefinite time
until dma_fence_wait() finishes have been proposed. Examples include:

* Future fences, used in HWC1 to signal when a buffer isn't used by the display
  any longer, and created with the screen update that makes the buffer visible.
  The time this fence completes is entirely under userspace's control.

* Proxy fences, proposed to handle &drm_syncobj for which the fence has not yet
  been set. Used to asynchronously delay command submission.

* Userspace fences or gpu futexes, fine-grained locking within a command buffer
  that userspace uses for synchronization across engines or with the CPU, which
  are then imported as a DMA fence for integration into existing winsys
  protocols.

* Long-running compute command buffers, while still using traditional end of
  batch DMA fences for memory management instead of context preemption DMA
  fences which get reattached when the compute job is rescheduled.

Common to all these schemes is that userspace controls the dependencies of these
fences and controls when they fire. Mixing indefinite fences with normal
in-kernel DMA fences does not work, even when a fallback timeout is included to
protect against malicious userspace:

* Only the kernel knows about all DMA fence dependencies; userspace is not
  aware of dependencies injected due to memory management or scheduler
  decisions.

* Only userspace knows about all dependencies in indefinite fences and when
  exactly they will complete; the kernel has no visibility.

Furthermore the kernel has to be able to hold up userspace command submission
for memory management needs, which means we must support indefinite fences being
dependent upon DMA fences. If the kernel also supported indefinite fences, as
any of the above proposals would require, there is the potential for deadlocks.

.. kernel-render:: DOT
   :alt: Indefinite Fencing Dependency Cycle
   :caption: Indefinite Fencing Dependency Cycle

   digraph "Fencing Cycle" {
      node [shape=box bgcolor=grey style=filled]
      kernel [label="Kernel DMA Fences"]
      userspace [label="userspace controlled fences"]
      kernel -> userspace [label="memory management"]
      userspace -> kernel [label="Future fence, fence proxy, ..."]

      { rank=same; kernel userspace }
   }

This means that the kernel might accidentally create deadlocks through memory
management dependencies which userspace is unaware of, randomly hanging
workloads until the timeout kicks in - workloads which, from userspace's
perspective, do not contain a deadlock. In such a mixed fencing architecture
there is no single entity with knowledge of all dependencies. Therefore
preventing such deadlocks from within the kernel is not possible.

The only solution to avoid dependency loops is by not allowing indefinite
fences in the kernel. This means:

* No future fences, proxy fences or userspace fences imported as DMA fences,
  with or without a timeout.

* No DMA fences that signal end of batchbuffer for command submission where
  userspace is allowed to use userspace fencing or long running compute
  workloads. This also means no implicit fencing for shared buffers in these
  cases.

Recoverable Hardware Page Faults Implications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Modern hardware supports recoverable page faults, which has a lot of
implications for DMA fences.

First, a pending page fault obviously holds up the work that's running on the
accelerator, and a memory allocation is usually required to resolve the fault.
But memory allocations are not allowed to gate completion of DMA fences, which
means any workload using recoverable page faults cannot use DMA fences for
synchronization. Synchronization fences controlled by userspace must be used
instead.

On GPUs this poses a problem, because current desktop compositor protocols on
Linux rely on DMA fences, which means that without an entirely new userspace
stack built on top of userspace fences, they cannot benefit from recoverable
page faults. Specifically this means implicit synchronization will not be
possible. The exception is when page faults are only used as migration hints
and never to on-demand fill a memory request. For now this means recoverable
page faults on GPUs are limited to pure compute workloads.

Furthermore GPUs usually have shared resources between the 3D rendering and
compute side, like compute units or command submission engines. If both a 3D
job with a DMA fence and a compute workload using recoverable page faults are
pending they could deadlock:

- The 3D workload might need to wait for the compute job to finish and release
  hardware resources first.

- The compute workload might be stuck in a page fault, because the memory
  allocation is waiting for the DMA fence of the 3D workload to complete.

There are a few options to prevent this problem, one of which drivers need to
ensure:

- Compute workloads can always be preempted, even when a page fault is pending
  and not yet repaired. Not all hardware supports this.

- DMA fence workloads and workloads which need page fault handling have
  independent hardware resources to guarantee forward progress. This could be
  achieved through e.g. dedicated engines and minimal compute unit reservations
  for DMA fence workloads.

- The reservation approach could be further refined by only reserving the
  hardware resources for DMA fence workloads when they are in-flight. This must
  cover the time from when the DMA fence is visible to other threads up to the
  moment when the fence is completed through dma_fence_signal().

- As a last resort, if the hardware provides no useful reservation mechanics,
  all workloads must be flushed from the GPU when switching between jobs
  requiring DMA fences or jobs requiring page fault handling: This means all DMA
  fences must complete before a compute job with page fault handling can be
  inserted into the scheduler queue. And vice versa, before a DMA fence can be
  made visible anywhere in the system, all compute workloads must be preempted
  to guarantee all pending GPU page faults are flushed.

- Only a fairly theoretical option would be to untangle these dependencies when
  allocating memory to repair hardware page faults, either through separate
  memory blocks or runtime tracking of the full dependency graph of all DMA
  fences. This would have a very wide impact on the kernel, since resolving the
  page fault on the CPU side can itself involve a page fault. It is much more
  feasible and robust to limit the impact of handling hardware page faults to
  the specific driver.

Note that workloads that run on independent hardware like copy engines or other
GPUs do not have any impact. This allows us to keep using DMA fences internally
in the kernel even for resolving hardware page faults, e.g. by using copy
engines to clear or copy memory needed to resolve the page fault.

In some ways this page fault problem is a special case of the `Indefinite DMA
Fences`_ discussion: indefinite fences from compute workloads are allowed to
depend on DMA fences, but not the other way around. And not even the page fault
problem is new, because some other CPU thread in userspace might hit a page
fault which holds up a userspace fence - supporting page faults on GPUs doesn't
add anything fundamentally new.