.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)

===============
GPU SVM Section
===============

Agreed upon design principles
=============================

* migrate_to_ram path
	* Rely only on core MM concepts (migration PTEs, page references, and
	  page locking).
	* No driver-specific locks in this path other than locks for hardware
	  interaction. Driver-defined locks are not required here, and
	  inventing them to seal core MM races is generally a bad idea.
	* An example of a driver-specific lock causing issues occurred before
	  do_swap_page was fixed to lock the faulting page. A driver-exclusive
	  lock in migrate_to_ram produced a stable livelock if enough threads
	  read the faulting page.
	* Partial migration is supported (i.e., only a subset of the pages
	  selected for migration may actually migrate, with only the faulting
	  page guaranteed to migrate).
	* Driver handles mixed migrations via retry loops rather than locking.
* Eviction
	* Eviction is defined as migrating data from the GPU back to the
	  CPU without a virtual address to free up GPU memory.
	* Look only at physical memory data structures and locks, not at
	  virtual memory data structures and locks.
	* Do not look at mm/vma structs or rely on those being locked.
	* The rationale for the above two points is that CPU virtual addresses
	  can change at any moment, while the physical pages remain stable.
	* GPU page table invalidation, which requires a GPU virtual address, is
	  handled via the notifier that has access to the GPU virtual address.
* GPU fault side
	* The mmap_read lock is only used around core MM functions which
	  require it, and the design should strive to take the mmap_read lock
	  only in the GPU SVM layer.
	* Big retry loop to handle all races with the mmu notifier under the GPU
	  pagetable locks/mmu notifier range lock/whatever we end up calling
	  those.
	* Races (especially against concurrent eviction or migrate_to_ram)
	  should not be handled on the fault side by trying to hold locks;
	  rather, they should be handled using retry loops. One possible
	  exception is holding a BO's dma-resv lock during the initial migration
	  to VRAM, as this is a well-defined lock that can be taken underneath
	  the mmap_read lock.
	* One possible issue with the above approach is if a driver has a strict
	  migration policy requiring GPU access to occur in GPU memory.
	  Concurrent CPU access could cause a livelock due to endless retries.
	  While no current user (Xe) of GPU SVM has such a policy, it is likely
	  to be added in the future. Ideally, this should be resolved on the
	  core-MM side rather than through a driver-side lock.
* Physical memory to virtual backpointer
	* This does not work, as no pointers from physical memory to virtual
	  memory should exist. mremap() is an example of the core MM updating
	  the virtual address without notifying the driver of the address
	  change; the driver only receives the invalidation notifier.
	* The physical memory backpointer (page->zone_device_data) should remain
	  stable from allocation to page free. Safely updating this against a
	  concurrent user would be very difficult unless the page is free.
* GPU pagetable locking
	* Notifier lock only protects the range tree, the pages valid state for
	  a range (rather than seqno due to wider notifiers), pagetable
	  entries, and mmu notifier seqno tracking. It is not a global lock to
	  protect against races.
	* All races are handled with the big retry loop mentioned above.

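
The partial-migration rule for the migrate_to_ram path can be illustrated with
a small userspace model. This is a sketch under stated assumptions, not kernel
code: ``struct model_page``, its ``busy`` flag, and ``model_migrate_to_ram()``
are hypothetical names invented for this illustration, and ``busy`` merely
stands in for any reason the core MM might decline to migrate a page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are not kernel structures. */
enum model_loc { MODEL_IN_VRAM, MODEL_IN_RAM };

struct model_page {
	enum model_loc loc;
	bool busy;	/* assumption: page cannot be migrated right now */
};

/*
 * Try to move every device page in pages[0..npages) back to RAM.
 * Busy pages are skipped rather than waited on with a driver lock; only
 * the faulting page is always moved. Returns the number of pages moved.
 */
static size_t model_migrate_to_ram(struct model_page *pages, size_t npages,
				   size_t fault_idx)
{
	size_t moved = 0;

	assert(fault_idx < npages);

	for (size_t i = 0; i < npages; i++) {
		if (pages[i].loc != MODEL_IN_VRAM)
			continue;	/* already in RAM */
		if (pages[i].busy && i != fault_idx)
			continue;	/* partial migration: leave it behind */
		pages[i].loc = MODEL_IN_RAM;
		moved++;
	}

	return moved;
}
```

In this model a page that was skipped stays in device memory until a later
fault names it as the faulting page, which is the retry behavior the
principles above ask for instead of additional locking.
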
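
The retry-loop principle on the GPU fault side can be sketched as a
single-threaded userspace model. ``struct model_notifier``,
``struct model_range``, and ``model_gpu_fault()`` are hypothetical names, not
the drm_gpusvm interface, and the ``races`` parameter simply injects an
invalidation into the first few attempts. A real driver would perform the
final sequence check and the bind under the notifier lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; these are not drm_gpusvm structures. */
struct model_notifier {
	unsigned long seq;	/* bumped by every invalidation */
};

struct model_range {
	bool pages_valid;	/* cleared by an invalidation */
	unsigned long bound_seq;	/* sequence the binding was made against */
};

/* What a notifier callback would do: bump the sequence, drop validity. */
static void model_invalidate(struct model_notifier *n, struct model_range *r)
{
	n->seq++;
	r->pages_valid = false;
}

/*
 * Fault handler as one big retry loop. "races" injects an invalidation into
 * each of the first "races" attempts to stand in for concurrent CPU activity.
 * Returns the attempt number that succeeded, or -1 after max_tries.
 */
static int model_gpu_fault(struct model_notifier *n, struct model_range *r,
			   int races, int max_tries)
{
	assert(max_tries > 0);

	for (int attempt = 1; attempt <= max_tries; attempt++) {
		/* Sample the sequence before touching any pages. */
		unsigned long seq = n->seq;

		/* Collect pages; no long-held lock covers this step. */
		r->pages_valid = true;

		if (attempt <= races)
			model_invalidate(n, r);

		/* Lost the race: throw the work away and start over. */
		if (n->seq != seq)
			continue;

		/* Sequence unchanged, so the pages are still good to bind. */
		r->bound_seq = seq;
		return attempt;
	}

	return -1;
}
```

With three racing invalidations the model succeeds on the fourth attempt. The
``max_tries`` bound exists only to keep the model finite; it also shows where
the livelock concern described above would surface if invalidations never
stopped.
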

Overview of baseline design
===========================

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Overview

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Locking

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Migration

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Partial Unmapping of Ranges

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Examples

Possible future design features
===============================

* Concurrent GPU faults
	* CPU faults are concurrent, so it makes sense to have concurrent GPU
	  faults.
	* Should be possible with fine-grained locking in the driver GPU
	  fault handler.
	* No expected GPU SVM changes required.
* Ranges with mixed system and device pages
	* Can be added to drm_gpusvm_get_pages fairly easily if required.
* Multi-GPU support
	* Work in progress, with patches expected after the initial landing of
	  GPU SVM.
	* Ideally can be done with little to no changes to GPU SVM.
* Drop ranges in favor of radix tree
	* May be desirable for faster notifiers.
* Compound device pages
	* Nvidia, AMD, and Intel have all agreed that expensive core MM
	  functions in the migrate device layer are a performance bottleneck;
	  having compound device pages should help increase performance by
	  reducing the number of these expensive calls.
* Higher order dma mapping for migration
	* 4k dma mapping adversely affects migration performance on Intel
	  hardware; higher order (2M) dma mapping should help here.
* Build common userptr implementation on top of GPU SVM
* Driver side madvise implementation and migration policies
* Pull in pending dma-mapping API changes from Leon / Nvidia when these land
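
The motivation for compound device pages and higher order dma mapping comes
down to how many per-granule operations a migration needs. The arithmetic can
be sketched as below; ``model_granule_ops()`` and the ``MODEL_SZ_*`` constants
are hypothetical names for this illustration, and the function only counts
granules. It says nothing about the actual cost of any individual call.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SZ_4K	(4UL << 10)	/* 4 KiB granule */
#define MODEL_SZ_2M	(2UL << 20)	/* 2 MiB granule */

/*
 * Number of per-granule operations (for example dma mappings) needed to
 * cover "bytes" of memory, rounding up to whole granules.
 */
static size_t model_granule_ops(size_t bytes, size_t granule)
{
	assert(granule != 0);

	return (bytes + granule - 1) / granule;
}
```

Covering a single 2M region takes 512 operations at 4k granularity and one
operation at 2M granularity, which is the reduction the two items above are
after.
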