.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)

===============
GPU SVM Section
===============

Agreed upon design principles
=============================

* migrate_to_ram path
	* Rely only on core MM concepts (migration PTEs, page references, and
	  page locking).
	* No driver-specific locks, other than locks for hardware interaction,
	  in this path. Such locks are not required, and inventing
	  driver-defined locks to seal core MM races is generally a bad idea.
	* An example of a driver-specific lock causing issues occurred before
	  fixing do_swap_page to lock the faulting page. A driver-exclusive lock
	  in migrate_to_ram produced a stable livelock if enough threads read
	  the faulting page.
	* Partial migration is supported (i.e., only a subset of the pages
	  selected for migration may actually migrate, with only the faulting
	  page guaranteed to migrate).
	* The driver handles mixed migrations via retry loops rather than
	  locking.
* Eviction
	* Eviction is defined as migrating data from the GPU back to the
	  CPU without a virtual address to free up GPU memory.
	* Eviction looks only at physical memory data structures and locks,
	  never at virtual memory data structures and locks.
	* Eviction does not look at mm/vma structs or rely on those being
	  locked.
	* The rationale for the above two points is that CPU virtual addresses
	  can change at any moment, while the physical pages remain stable.
	* GPU page table invalidation, which requires a GPU virtual address, is
	  handled via the notifier that has access to the GPU virtual address.
* GPU fault side
	* The mmap_read lock is only used around core MM functions that
	  require it, and it should ideally be taken only in the GPU SVM layer.
	* A big retry loop handles all races with the MMU notifier under the
	  GPU pagetable locks/MMU notifier range lock/whatever we end up
	  calling those.
	* Races (especially against concurrent eviction or migrate_to_ram)
	  should not be handled on the fault side by trying to hold locks;
	  rather, they should be handled using retry loops. One possible
	  exception is holding a BO's dma-resv lock during the initial migration
	  to VRAM, as this is a well-defined lock that can be taken underneath
	  the mmap_read lock.
	* One possible issue with the above approach arises if a driver has a
	  strict migration policy requiring GPU access to occur in GPU memory.
	  Concurrent CPU access could cause a livelock due to endless retries.
	  While no current user (Xe) of GPU SVM has such a policy, it is likely
	  to be added in the future. Ideally, this should be resolved on the
	  core-MM side rather than through a driver-side lock.
* Physical memory to virtual backpointer
	* This does not work, as no pointers from physical memory to virtual
	  memory should exist. mremap() is an example of the core MM updating
	  the virtual address without notifying the driver of the address
	  change; the driver only receives the invalidation notifier.
	* The physical memory backpointer (page->zone_device_data) should remain
	  stable from allocation to page free. Safely updating this against a
	  concurrent user would be very difficult unless the page is free.
* GPU pagetable locking
	* The notifier lock only protects the range tree, the pages-valid state
	  for a range (rather than seqno, due to wider notifiers), pagetable
	  entries, and MMU notifier seqno tracking. It is not a global lock to
	  protect against races.
	* All races are handled with the big retry loop mentioned above.
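
The retry-loop principle from the GPU fault side and GPU pagetable locking
points above can be sketched in plain C. This is a minimal userspace
illustration, not code from drm_gpusvm.c: every ``toy_`` name is hypothetical,
the concurrent invalidation is simulated by a counter, and a comment marks
where a real driver would hold the notifier lock.

```c
#include <stddef.h>

/* Hypothetical stand-in for a notifier's invalidation sequence number. */
struct toy_notifier {
	unsigned long seqno;		/* bumped by every invalidation */
	unsigned int pending_races;	/* simulated invalidations left to fire */
};

/* Simulates collecting pages; an invalidation may race with this step. */
static void toy_get_pages(struct toy_notifier *n)
{
	if (n->pending_races) {
		n->pending_races--;
		n->seqno++;		/* simulated concurrent invalidation */
	}
}

/*
 * Returns the number of attempts needed to commit the binding, or -1 if
 * max_tries attempts all raced with an invalidation.
 */
static int toy_gpu_fault(struct toy_notifier *n, int max_tries)
{
	for (int attempt = 1; attempt <= max_tries; attempt++) {
		/* Sample the sequence number before the lockless work. */
		unsigned long seq = n->seqno;

		toy_get_pages(n);

		/* A real driver would do this check under the notifier lock. */
		if (seq == n->seqno)
			return attempt;	/* no race observed: commit */

		/* Raced with an invalidation: drop the result and retry. */
	}
	return -1;
}
```

With two simulated invalidations pending, ``toy_gpu_fault()`` succeeds on the
third attempt. No lock is held across the page collection step; the race is
detected afterwards and resolved by retrying.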

Overview of baseline design
===========================

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Overview

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Locking

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Migration

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Partial Unmapping of Ranges

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Examples

Possible future design features
===============================

* Concurrent GPU faults
	* CPU faults are concurrent, so it makes sense to have concurrent GPU
	  faults.
	* Should be possible with fine-grained locking in the driver GPU
	  fault handler.
	* No GPU SVM changes are expected to be required.
* Ranges with mixed system and device pages
	* Can be added to drm_gpusvm_get_pages fairly easily if required.
* Multi-GPU support
	* Work in progress; patches are expected after GPU SVM initially
	  lands.
	* Ideally can be done with little to no changes to GPU SVM.
* Drop ranges in favor of radix tree
	* May be desirable for faster notifiers.
* Compound device pages
	* Nvidia, AMD, and Intel have all agreed that expensive core MM
	  functions in the migrate device layer are a performance bottleneck.
	  Compound device pages should help increase performance by reducing
	  the number of these expensive calls.
* Higher order DMA mapping for migration
	* 4K DMA mapping adversely affects migration performance on Intel
	  hardware; higher order (2M) DMA mapping should help here.
* Build common userptr implementation on top of GPU SVM
* Driver side madvise implementation and migration policies
* Pull in pending dma-mapping API changes from Leon / Nvidia when these land
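
As a rough illustration of why higher order DMA mapping may help, the
following sketch counts how many mappings are needed to cover a buffer at 4K
versus 2M granularity. The helper and macro names are hypothetical and chosen
for this example only; it models mapping counts, not actual dma-mapping API
calls.

```c
#include <stddef.h>

#define TOY_SZ_4K	(4UL << 10)	/* 4 KiB mapping granule */
#define TOY_SZ_2M	(2UL << 20)	/* 2 MiB mapping granule */

/* Number of mappings needed to cover len bytes, rounding up. */
static size_t toy_nr_mappings(size_t len, size_t granule)
{
	return (len + granule - 1) / granule;
}
```

Under these assumptions, a 2M region needs 512 mappings at 4K granularity but
a single mapping at 2M granularity.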