.. SPDX-License-Identifier: (GPL-2.0+ OR MIT)

===============
GPU SVM Section
===============

Agreed upon design principles
=============================

* migrate_to_ram path
  * Rely only on core MM concepts (migration PTEs, page references, and
    page locking).
  * No driver-specific locks other than locks for hardware interaction in
    this path. Inventing driver-defined locks to seal core MM races is not
    required and is generally a bad idea.
  * An example of a driver-specific lock causing issues occurred before
    fixing do_swap_page to lock the faulting page. A driver-exclusive lock
    in migrate_to_ram produced a stable livelock if enough threads read
    the faulting page.
  * Partial migration is supported (i.e., a subset of pages attempting to
    migrate can actually migrate, with only the faulting page guaranteed
    to migrate).
  * Driver handles mixed migrations via retry loops rather than locking.
* Eviction
  * Eviction is defined as migrating data from the GPU back to the
    CPU without a virtual address to free up GPU memory.
  * Only looking at physical memory data structures and locks as opposed to
    looking at virtual memory data structures and locks.
  * No looking at mm/vma structs or relying on those being locked.
  * The rationale for the above two points is that CPU virtual addresses
    can change at any moment, while the physical pages remain stable.
  * GPU page table invalidation, which requires a GPU virtual address, is
    handled via the notifier that has access to the GPU virtual address.
* GPU fault side
  * The mmap_read lock is only taken around core MM functions which require
    it, and the driver should strive to take the mmap_read lock only in the
    GPU SVM layer.
  * Big retry loop to handle all races with the mmu notifier under the gpu
    pagetable locks/mmu notifier range lock/whatever we end up calling
    those.
  * Races (especially against concurrent eviction or migrate_to_ram)
    should not be handled on the fault side by trying to hold locks;
    rather, they should be handled using retry loops. One possible
    exception is holding a BO's dma-resv lock during the initial migration
    to VRAM, as this is a well-defined lock that can be taken underneath
    the mmap_read lock.
  * One possible issue with the above approach is if a driver has a strict
    migration policy requiring GPU access to occur in GPU memory.
    Concurrent CPU access could cause a livelock due to endless retries.
    While no current user (Xe) of GPU SVM has such a policy, it is likely
    to be added in the future. Ideally, this should be resolved on the
    core-MM side rather than through a driver-side lock.
* Physical memory to virtual backpointer
  * This does not work, as no pointers from physical memory to virtual
    memory should exist. mremap() is an example of the core MM updating a
    virtual address without notifying the driver of the change; the driver
    only receives the invalidation notifier.
  * The physical memory backpointer (page->zone_device_data) should remain
    stable from allocation to page free. Safely updating this against a
    concurrent user would be very difficult unless the page is free.
* GPU pagetable locking
  * The notifier lock only protects the range tree, the pages-valid state
    for a range (rather than a seqno, due to wider notifiers), pagetable
    entries, and mmu notifier seqno tracking; it is not a global lock to
    protect against races.
  * All races are handled with the big retry loop as mentioned above.

Overview of baseline design
===========================

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Overview

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Locking

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Migration

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Partial Unmapping of Ranges

.. kernel-doc:: drivers/gpu/drm/drm_gpusvm.c
   :doc: Examples

Possible future design features
===============================

* Concurrent GPU faults
  * CPU faults are concurrent, so it makes sense to have concurrent GPU
    faults.
  * Should be possible with fine-grained locking in the driver GPU
    fault handler.
  * No expected GPU SVM changes required.
* Ranges with mixed system and device pages
  * Can be added to drm_gpusvm_get_pages fairly easily if required.
* Multi-GPU support
  * Work in progress; patches expected after GPU SVM initially lands.
  * Ideally can be done with little to no changes to GPU SVM.
* Drop ranges in favor of radix tree
  * May be desirable for faster notifiers.
* Compound device pages
  * Nvidia, AMD, and Intel have all agreed that expensive core MM functions
    in the migrate device layer are a performance bottleneck; compound
    device pages should help increase performance by reducing the number
    of these expensive calls.
* Higher order dma mapping for migration
  * 4k dma mapping adversely affects migration performance on Intel
    hardware; higher order (2M) dma mapping should help here.
* Build common userptr implementation on top of GPU SVM
* Driver side madvise implementation and migration policies
* Pull in pending dma-mapping API changes from Leon / Nvidia when these land