Buffer Sharing and Synchronization (dma-buf)
============================================

The dma-buf subsystem provides the framework for sharing buffers for
hardware (DMA) access across multiple device drivers and subsystems, and
for synchronizing asynchronous hardware access.

As an example, it is used extensively by the DRM subsystem to exchange
buffers between processes, contexts, library APIs within the same
process, and also to exchange buffers with other subsystems such as
V4L2.

This document describes the way in which kernel subsystems can use and
interact with the three main primitives offered by dma-buf:

 - dma-buf, representing a sg_table and exposed to userspace as a file
   descriptor to allow passing between processes, subsystems, devices,
   etc;
 - dma-fence, providing a mechanism to signal when an asynchronous
   hardware operation has completed; and
 - dma-resv, which manages a set of dma-fences for a particular dma-buf
   allowing implicit (kernel-ordered) synchronization of work to
   preserve the illusion of coherent access.


Userspace API principles and use
--------------------------------

For more details on how to design your subsystem's API for dma-buf use, please
see Documentation/userspace-api/dma-buf-alloc-exchange.rst.


Shared DMA Buffers
------------------

This document serves as a guide to device-driver writers on what the dma-buf
buffer sharing API is and how to use it for exporting and importing shared
buffers.

Any device driver which wishes to be a part of DMA buffer sharing can do so as
either the 'exporter' of buffers or the 'user' or 'importer' of buffers.

Say a driver A wants to use buffers created by driver B; then we call B the
exporter, and A the buffer-user/importer.

The exporter

 - implements and manages operations in :c:type:`struct dma_buf_ops
   <dma_buf_ops>` for the buffer,
 - allows other users to share the buffer by using dma_buf sharing APIs,
 - manages the details of buffer allocation, wrapped in a :c:type:`struct
   dma_buf <dma_buf>`,
 - decides about the actual backing storage where this allocation happens,
 - and takes care of any migration of the scatterlist for all (shared) users
   of this buffer.

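As a rough sketch (not taken from any particular driver), an exporter wires
these pieces together roughly like this. ``struct my_buffer`` and the
``my_*()`` helpers are hypothetical driver internals; only
``DEFINE_DMA_BUF_EXPORT_INFO()``, :c:type:`struct dma_buf_ops <dma_buf_ops>`
and dma_buf_export() are the real dma-buf API::

   /* my_map_dma_buf() hands out an sg_table for the attached device;
    * my_unmap_dma_buf() and my_release() are hypothetical counterparts. */
   static struct sg_table *my_map_dma_buf(struct dma_buf_attachment *attach,
                                          enum dma_data_direction dir)
   {
           struct my_buffer *buf = attach->dmabuf->priv;

           return my_get_sg_table(buf, attach->dev, dir);
   }

   static const struct dma_buf_ops my_dma_buf_ops = {
           .map_dma_buf   = my_map_dma_buf,
           .unmap_dma_buf = my_unmap_dma_buf,
           .release       = my_release,
   };

   static struct dma_buf *my_export(struct my_buffer *buf)
   {
           DEFINE_DMA_BUF_EXPORT_INFO(exp_info);

           exp_info.ops   = &my_dma_buf_ops;
           exp_info.size  = buf->size;
           exp_info.flags = O_CLOEXEC;
           exp_info.priv  = buf;

           return dma_buf_export(&exp_info);
   }
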
The buffer-user

 - is one of (many) sharing users of the buffer.
 - doesn't need to worry about how the buffer is allocated, or where.
 - and needs a mechanism to get access to the scatterlist that makes up this
   buffer in memory, mapped into its own address space, so it can access the
   same area of memory. This interface is provided by :c:type:`struct
   dma_buf_attachment <dma_buf_attachment>`.

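On the importing side the flow is, as a hedged sketch with error unwinding
elided (``my_program_dma()`` is a hypothetical stand-in for programming the
device with the returned scatterlist)::

   static void my_import(struct device *dev, int fd)
   {
           struct dma_buf *dmabuf = dma_buf_get(fd);   /* takes a reference */
           struct dma_buf_attachment *attach = dma_buf_attach(dmabuf, dev);
           struct sg_table *sgt =
                   dma_buf_map_attachment_unlocked(attach, DMA_BIDIRECTIONAL);

           my_program_dma(dev, sgt);                   /* device-specific */

           dma_buf_unmap_attachment_unlocked(attach, sgt, DMA_BIDIRECTIONAL);
           dma_buf_detach(dmabuf, attach);
           dma_buf_put(dmabuf);                        /* drops the reference */
   }

The ``_unlocked`` wrappers take the reservation lock internally; see
`DMA-BUF locking convention`_ below for when the locked variants must be used
instead.
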
Any exporters or users of the dma-buf buffer sharing framework must have a
'select DMA_SHARED_BUFFER' in their respective Kconfigs.

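For illustration, a hypothetical driver's Kconfig entry would contain::

   config MY_DRIVER
           tristate "My buffer-sharing driver"
           select DMA_SHARED_BUFFER
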
Userspace Interface Notes
~~~~~~~~~~~~~~~~~~~~~~~~~

Mostly a DMA buffer file descriptor is simply an opaque object for userspace,
and hence the generic interface exposed is very minimal. There are a few things
to consider though:

- Since kernel 3.12 the dma-buf FD supports the llseek system call, but only
  with offset=0 and whence=SEEK_END|SEEK_SET. SEEK_SET is supported to allow
  the usual size discovery pattern size = SEEK_END(0); SEEK_SET(0). Every other
  llseek operation will report -EINVAL.

  If llseek on dma-buf FDs isn't supported, the kernel will report -ESPIPE for
  all cases. Userspace can use this to detect support for discovering the
  dma-buf size using llseek, as in the sketch below.

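  A hedged userspace sketch of this discovery pattern; ``dmabuf_fd`` is
  assumed to be a valid dma-buf file descriptor::

     #include <unistd.h>

     off_t dmabuf_get_size(int dmabuf_fd)
     {
             /* Kernels without llseek support fail here with ESPIPE. */
             off_t size = lseek(dmabuf_fd, 0, SEEK_END);

             if (size < 0)
                     return -1;
             /* Rewind, completing the SEEK_END(0); SEEK_SET(0) pattern. */
             lseek(dmabuf_fd, 0, SEEK_SET);
             return size;
     }
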
- In order to avoid fd leaks on exec, the FD_CLOEXEC flag must be set
  on the file descriptor.  This is not just a resource leak, but a
  potential security hole.  It could give the newly exec'd application
  access to buffers, via the leaked fd, to which it should otherwise
  not be permitted access.

  The problem with doing this via a separate fcntl() call, versus doing it
  atomically when the fd is created, is that this is inherently racy in a
  multi-threaded app.  The issue is made worse when it is library code
  opening/creating the file descriptor, as the application may not even be
  aware of the fds.

  To avoid this problem, userspace must have a way to request that the
  O_CLOEXEC flag be set when the dma-buf fd is created.  So any API provided
  by the exporting driver to create a dmabuf fd must provide a way to let
  userspace control setting of the O_CLOEXEC flag passed in to dma_buf_fd().

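  A hedged sketch of such an exporter ioctl; ``struct my_export_args`` and
  ``my_export()`` are hypothetical::

     static int my_export_ioctl(struct my_device *mydev,
                                struct my_export_args *args)
     {
             struct dma_buf *dmabuf = my_export(mydev, args->handle);
             int fd;

             if (IS_ERR(dmabuf))
                     return PTR_ERR(dmabuf);

             /* Pass userspace-requested O_CLOEXEC through to dma_buf_fd(). */
             fd = dma_buf_fd(dmabuf, args->flags & O_CLOEXEC);
             if (fd < 0)
                     dma_buf_put(dmabuf);
             return fd;
     }
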
- Memory mapping the contents of the DMA buffer is also supported. See the
  discussion below on `CPU Access to DMA Buffer Objects`_ for the full details.

- The DMA buffer FD is also pollable, see `Implicit Fence Poll Support`_ below
  for details.

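  A hedged userspace sketch: waiting with poll(2) until the fences attached to
  the buffer have signalled (POLLIN waits for read access, i.e. all pending
  writes; POLLOUT waits for write access, i.e. all pending access)::

     #include <poll.h>

     struct pollfd pfd = { .fd = dmabuf_fd, .events = POLLIN };

     /* Blocks until the buffer is safe to read. */
     poll(&pfd, 1, -1);
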
- The DMA buffer FD also supports a few dma-buf-specific ioctls, see
  `DMA Buffer ioctls`_ below for details.

Basic Operation and Device DMA Access
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: dma buf device access

CPU Access to DMA Buffer Objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: cpu access

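As a rough kernel-internal illustration, CPU reads and writes must be
bracketed by the begin/end calls so the exporter can flush or invalidate
caches. A hedged sketch using the unlocked vmap wrappers, with error checks
elided; it assumes the exporter implements the corresponding callbacks::

   struct iosys_map map;

   dma_buf_begin_cpu_access(dmabuf, DMA_FROM_DEVICE);
   if (!dma_buf_vmap_unlocked(dmabuf, &map)) {
           /* ... read the buffer contents through map.vaddr ... */
           dma_buf_vunmap_unlocked(dmabuf, &map);
   }
   dma_buf_end_cpu_access(dmabuf, DMA_FROM_DEVICE);
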
Implicit Fence Poll Support
~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: implicit fence polling

DMA-BUF statistics
~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf-sysfs-stats.c
   :doc: overview

DMA Buffer ioctls
~~~~~~~~~~~~~~~~~

.. kernel-doc:: include/uapi/linux/dma-buf.h

DMA-BUF locking convention
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :doc: locking convention

Kernel Functions and Structures Reference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-buf.c
   :export:

.. kernel-doc:: include/linux/dma-buf.h
   :internal:

Reservation Objects
-------------------

.. kernel-doc:: drivers/dma-buf/dma-resv.c
   :doc: Reservation Object Overview

.. kernel-doc:: drivers/dma-buf/dma-resv.c
   :export:

.. kernel-doc:: include/linux/dma-resv.h
   :internal:

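A hedged sketch of how a driver might publish a fence through a buffer's
reservation object; ``bo->resv`` and ``fence`` stand in for driver-owned
objects::

   dma_resv_lock(bo->resv, NULL);
   if (!dma_resv_reserve_fences(bo->resv, 1))
           dma_resv_add_fence(bo->resv, fence, DMA_RESV_USAGE_WRITE);
   dma_resv_unlock(bo->resv);
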
DMA Fences
----------

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: DMA fences overview

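A hedged sketch of a driver-private fence: embed a struct dma_fence, provide
the two mandatory callbacks, initialise it on a driver-allocated context, and
signal it when the hardware finishes; ``struct my_fence`` is hypothetical::

   struct my_fence {
           struct dma_fence base;
           spinlock_t lock;        /* protects the fence machinery */
   };

   static const char *my_get_driver_name(struct dma_fence *f)
   {
           return "my-driver";
   }

   static const char *my_get_timeline_name(struct dma_fence *f)
   {
           return "my-timeline";
   }

   static const struct dma_fence_ops my_fence_ops = {
           .get_driver_name   = my_get_driver_name,
           .get_timeline_name = my_get_timeline_name,
   };

   /* at submission; context typically comes from dma_fence_context_alloc() */
   static struct dma_fence *my_submit_job(struct my_fence *mf, u64 context,
                                          u64 seqno)
   {
           spin_lock_init(&mf->lock);
           dma_fence_init(&mf->base, &my_fence_ops, &mf->lock, context, seqno);
           return &mf->base;
   }

   /* later, from the "job done" interrupt handler: */
   dma_fence_signal(&mf->base);
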
DMA Fence Cross-Driver Contract
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: fence cross-driver contract

DMA Fence Signalling Annotations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: fence signalling annotation

DMA Fence Deadline Hints
~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :doc: deadline hints

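For instance, a driver might hint that a fence should signal before the next
display refresh; a hedged sketch, with ``next_vblank`` computed by the
caller::

   /* ask the fence's signaller to try to finish before the upcoming vblank */
   dma_fence_set_deadline(fence, next_vblank);
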
DMA Fences Functions Reference
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence.c
   :export:

.. kernel-doc:: include/linux/dma-fence.h
   :internal:

DMA Fence Array
~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence-array.c
   :export:

.. kernel-doc:: include/linux/dma-fence-array.h
   :internal:

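A hedged sketch of aggregating several fences into one that signals once all
of them have signalled; ``fences`` is a driver-allocated array whose
ownership, along with the fence references, passes to the array on success::

   struct dma_fence_array *array;

   array = dma_fence_array_create(num_fences, fences,
                                  dma_fence_context_alloc(1), 1,
                                  false /* signal when all have signalled */);
   if (array)
           my_wait_on(&array->base);   /* hypothetical; it is a dma_fence */
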
DMA Fence Chain
~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/dma-fence-chain.c
   :export:

.. kernel-doc:: include/linux/dma-fence-chain.h
   :internal:

DMA Fence unwrap
~~~~~~~~~~~~~~~~

.. kernel-doc:: include/linux/dma-fence-unwrap.h
   :internal:

DMA Fence Sync File
~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: drivers/dma-buf/sync_file.c
   :export:

.. kernel-doc:: include/linux/sync_file.h
   :internal:

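A hedged sketch of exporting a fence to userspace as a sync_file fd, the
pattern behind explicit fencing ioctls; error handling is elided::

   struct sync_file *sync_file = sync_file_create(fence); /* grabs a fence ref */
   int fd = get_unused_fd_flags(O_CLOEXEC);

   /* hand the fd to userspace, e.g. through an ioctl argument */
   fd_install(fd, sync_file->file);
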
DMA Fence Sync File uABI
~~~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: include/uapi/linux/sync_file.h
   :internal:

Indefinite DMA Fences
~~~~~~~~~~~~~~~~~~~~~

At various times, variants of struct dma_fence with an indefinite time until
dma_fence_wait() finishes have been proposed. Examples include:

* Future fences, used in HWC1 to signal when a buffer isn't used by the display
  any longer, and created with the screen update that makes the buffer visible.
  The time this fence completes is entirely under userspace's control.

* Proxy fences, proposed to handle &drm_syncobj for which the fence has not yet
  been set. Used to asynchronously delay command submission.

* Userspace fences or gpu futexes, fine-grained locking within a command buffer
  that userspace uses for synchronization across engines or with the CPU, which
  are then imported as a DMA fence for integration into existing winsys
  protocols.

* Long-running compute command buffers, while still using traditional end of
  batch DMA fences for memory management instead of context preemption DMA
  fences which get reattached when the compute job is rescheduled.

Common to all these schemes is that userspace controls the dependencies of these
fences and controls when they fire. Mixing indefinite fences with normal
in-kernel DMA fences does not work, even when a fallback timeout is included to
protect against malicious userspace:

* Only the kernel knows about all DMA fence dependencies; userspace is not aware
  of dependencies injected due to memory management or scheduler decisions.

* Only userspace knows about all dependencies in indefinite fences and when
  exactly they will complete; the kernel has no visibility.

Furthermore the kernel has to be able to hold up userspace command submission
for memory management needs, which means we must support indefinite fences being
dependent upon DMA fences. If the kernel also treated indefinite fences like
DMA fences, as any of the above proposals would require, there is the
potential for deadlocks.

.. kernel-render:: DOT
   :alt: Indefinite Fencing Dependency Cycle
   :caption: Indefinite Fencing Dependency Cycle

   digraph "Fencing Cycle" {
      node [shape=box bgcolor=grey style=filled]
      kernel [label="Kernel DMA Fences"]
      userspace [label="userspace controlled fences"]
      kernel -> userspace [label="memory management"]
      userspace -> kernel [label="Future fence, fence proxy, ..."]

      { rank=same; kernel userspace }
   }

This means that the kernel might accidentally create deadlocks through memory
management dependencies which userspace is unaware of, randomly hanging
workloads until the timeout kicks in, even though from userspace's perspective
those workloads do not contain a deadlock. In such a mixed fencing
architecture there is no single entity with knowledge of all dependencies.
Therefore preventing such deadlocks from within the kernel is not possible.

The only solution to avoid dependency loops is by not allowing indefinite
fences in the kernel. This means:

* No future fences, proxy fences or userspace fences imported as DMA fences,
  with or without a timeout.

* No DMA fences that signal end of batchbuffer for command submission where
  userspace is allowed to use userspace fencing or long running compute
  workloads. This also means no implicit fencing for shared buffers in these
  cases.


Recoverable Hardware Page Faults Implications
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Modern hardware supports recoverable page faults, which has a lot of
implications for DMA fences.

First, a pending page fault obviously holds up the work that's running on the
accelerator, and a memory allocation is usually required to resolve the fault.
But memory allocations are not allowed to gate completion of DMA fences, which
means any workload using recoverable page faults cannot use DMA fences for
synchronization. Synchronization fences controlled by userspace must be used
instead.

On GPUs this poses a problem, because current desktop compositor protocols on
Linux rely on DMA fences, which means without an entirely new userspace stack
built on top of userspace fences, they cannot benefit from recoverable page
faults. Specifically this means implicit synchronization will not be possible.
The exception is when page faults are only used as migration hints and never to
fill a memory request on demand. For now this means recoverable page
faults on GPUs are limited to pure compute workloads.

Furthermore GPUs usually have shared resources between the 3D rendering and
compute side, like compute units or command submission engines. If both a 3D
job with a DMA fence and a compute workload using recoverable page faults are
pending they could deadlock:

- The 3D workload might need to wait for the compute job to finish and release
  hardware resources first.

- The compute workload might be stuck in a page fault, because the memory
  allocation is waiting for the DMA fence of the 3D workload to complete.

There are a few options to prevent this problem; drivers need to ensure at
least one of them:

- Compute workloads can always be preempted, even when a page fault is pending
  and not yet repaired. Not all hardware supports this.

- DMA fence workloads and workloads which need page fault handling have
  independent hardware resources to guarantee forward progress. This could be
  achieved e.g. through dedicated engines and minimal compute unit
  reservations for DMA fence workloads.

- The reservation approach could be further refined by only reserving the
  hardware resources for DMA fence workloads when they are in-flight. This must
  cover the time from when the DMA fence is visible to other threads up to the
  moment when the fence is completed through dma_fence_signal().

- As a last resort, if the hardware provides no useful reservation mechanics,
  all workloads must be flushed from the GPU when switching between jobs
  requiring DMA fences or jobs requiring page fault handling: This means all DMA
  fences must complete before a compute job with page fault handling can be
  inserted into the scheduler queue. And vice versa, before a DMA fence can be
  made visible anywhere in the system, all compute workloads must be preempted
  to guarantee all pending GPU page faults are flushed.

- Only a fairly theoretical option would be to untangle these dependencies when
  allocating memory to repair hardware page faults, either through separate
  memory blocks or runtime tracking of the full dependency graph of all DMA
  fences. This would result in a very wide impact on the kernel, since
  resolving the page fault on the CPU side can itself involve a page fault. It
  is much more feasible and robust to limit the impact of handling hardware
  page faults to the specific driver.

Note that workloads that run on independent hardware like copy engines or other
GPUs do not have any impact. This allows us to keep using DMA fences internally
in the kernel even for resolving hardware page faults, e.g. by using copy
engines to clear or copy memory needed to resolve the page fault.

In some ways this page fault problem is a special case of the `Indefinite DMA
Fences`_ discussion: indefinite fences from compute workloads are allowed to
depend on DMA fences, but not the other way around. And not even the page fault
problem is new, because some other CPU thread in userspace might
hit a page fault which holds up a userspace fence - supporting page faults on
GPUs doesn't add anything fundamentally new.