xref: /linux/Documentation/gpu/amdgpu/ring-buffer.rst (revision c17ee635fd3a482b2ad2bf5e269755c2eae5f25e)
=============
 Ring Buffer
=============

To handle communication between user space and kernel space, AMD GPUs use a
ring buffer design to feed the engines (GFX, Compute, SDMA, UVD, VCE, VCN,
VPE, etc.). The figure below illustrates how this communication works:

.. kernel-figure:: ring_buffers.svg

Ring buffers in amdgpu follow a producer-consumer model: user space acts as
the producer, constantly filling the ring buffer with GPU commands to be
executed, while the GPU retrieves the information from the ring, parses it,
and distributes the specific set of instructions between the different amdgpu
blocks.

Notice from the diagram that the ring has a Read Pointer (rptr), which
indicates where the engine is currently reading packets from the ring, and a
Write Pointer (wptr), which indicates how many packets software has added to
the ring. When the rptr and wptr are equal, the ring is idle. When software
adds packets to the ring, it updates the wptr, which causes the engine to
start fetching and processing packets. As the engine processes packets, the
rptr gets updated until it catches up to the wptr and they are equal again.

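The rptr/wptr bookkeeping above can be sketched in a few lines of C. This is a
simplified illustration, not the driver's actual helpers: the names
``ring_free_entries``, ``ring_idle``, and the 256-entry power-of-two ring are
assumptions made for the example.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>
   #include <stdio.h>

   /* Hypothetical sketch of rptr/wptr ring accounting; names are
    * illustrative, not the amdgpu driver's real functions. */
   #define RING_SIZE 256u /* entries, power of two */

   static uint32_t ring_free_entries(uint32_t rptr, uint32_t wptr)
   {
       /* Pointers wrap modulo the ring size; one slot is kept free so
        * that rptr == wptr unambiguously means "idle". */
       return (RING_SIZE - 1u) - ((wptr - rptr) & (RING_SIZE - 1u));
   }

   static int ring_idle(uint32_t rptr, uint32_t wptr)
   {
       return rptr == wptr;
   }

   int main(void)
   {
       uint32_t rptr = 0, wptr = 0;

       assert(ring_idle(rptr, wptr));          /* empty ring */
       wptr = (wptr + 8u) & (RING_SIZE - 1u);  /* software queues 8 packets */
       assert(!ring_idle(rptr, wptr));
       assert(ring_free_entries(rptr, wptr) == RING_SIZE - 1u - 8u);
       rptr = wptr;                            /* engine consumes everything */
       assert(ring_idle(rptr, wptr));
       printf("ok\n");
       return 0;
   }

The masked subtraction makes the free-space computation correct even after the
pointers wrap around the end of the ring.
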
Usually, ring buffers in the driver have a limited size (search for occurrences
of `amdgpu_ring_init()`). One of the reasons the ring buffer can stay small is
that the CP (Command Processor) is capable of following addresses inserted into
the ring; this is illustrated in the image by the reference to the IB (Indirect
Buffer). The IB gives user space an area in memory from which the CP can fetch
extra instructions to feed the hardware.

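The following sketch shows why IBs let the ring stay small: instead of copying
a large command stream into the ring, software places only a small entry in the
ring pointing at the stream, which the CP then follows. The struct layout and
names here are hypothetical, chosen for the example.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>
   #include <stdio.h>
   #include <string.h>

   /* Illustrative model of indirect-buffer submission, not amdgpu's
    * actual ring entry or packet format. */
   #define RING_ENTRIES 16u

   struct ring_entry {
       uint64_t ib_gpu_addr; /* address the CP will jump to */
       uint32_t ib_len_dw;   /* length of the indirect buffer in dwords */
   };

   struct ring {
       struct ring_entry entries[RING_ENTRIES];
       uint32_t wptr;
   };

   /* Queue a large command stream: the ring only stores a reference. */
   static void ring_submit_ib(struct ring *r, uint64_t ib_addr,
                              uint32_t len_dw)
   {
       r->entries[r->wptr % RING_ENTRIES] = (struct ring_entry){
           .ib_gpu_addr = ib_addr,
           .ib_len_dw = len_dw,
       };
       r->wptr++; /* hardware would now fetch and follow the IB */
   }

   int main(void)
   {
       struct ring r = { 0 };
       uint32_t big_ib[4096]; /* large command stream lives here... */

       memset(big_ib, 0, sizeof(big_ib));
       ring_submit_ib(&r, (uint64_t)(uintptr_t)big_ib, 4096);

       /* ...while the ring itself only grew by one small entry. */
       assert(r.wptr == 1);
       assert(r.entries[0].ib_len_dw == 4096);
       printf("ok\n");
       return 0;
   }
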
All pre-GFX11 ASICs use what is called a kernel queue, which means the ring is
allocated in kernel space and has some restrictions, such as not being able to
be :ref:`preempted directly by the scheduler<amdgpu-mes>`. GFX11 and newer
support kernel queues, but also provide a new mechanism named
:ref:`user queues<amdgpu-userq>`, where the queue is moved to user space and
can be mapped and unmapped via the scheduler. In practice, both queue types
insert user-space-generated GPU commands from different jobs into the requested
component ring.

Enforce Isolation
=================

.. note:: After reading this section, you might want to check the
   :ref:`Process Isolation<amdgpu-process-isolation>` page for more details.

Before examining the Enforce Isolation mechanism in the ring buffer context, it
is helpful to briefly discuss how instructions from the ring buffer are
processed in the graphics pipeline. The diagram below illustrates the graphics
pipeline:

.. kernel-figure:: gfx_pipeline_seq.svg

In terms of executing instructions, the GFX pipeline follows the sequence:
Shader Export (SX), Geometry Engine (GE), Shader Processor Input (SPI), Scan
Converter (SC), Primitive Assembler (PA), and cache manipulation (which may
vary across ASICs). Another common way to describe the pipeline is to use Pixel
Shader (PS), raster, and Vertex Shader (VS) to symbolize the two shader stages.
Now, with this pipeline in mind, suppose that Job B causes a hang, while Job
C's instructions might already be executing, leading developers to incorrectly
identify Job C as the problematic one. This problem can be mitigated on
multiple levels; the diagram below illustrates how to minimize part of it:

.. kernel-figure:: no_enforce_isolation.svg

Note from the diagram that there is no guarantee of order or a clear separation
between instructions, which is not a problem most of the time and is also good
for performance. Furthermore, notice the circles between jobs in the diagram
that represent a **fence wait**, used to avoid overlapping work in the ring. At
the end of the fence, a cache flush occurs, ensuring that the next job begins
in a clean state and, if issues arise, the developer can pinpoint the
problematic process more precisely.

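The fence waits between jobs can be modeled with a monotonically increasing
sequence number: each job is followed by a fence write, and a job is known to
be complete once the hardware-signaled value has reached its number. The
sketch below assumes hypothetical names (``emit_job_with_fence``,
``fence_signaled``); only the wrap-safe seqno comparison idiom is the point.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>
   #include <stdio.h>

   /* Stands in for the fence memory location the GPU writes back. */
   static uint32_t hw_signaled_seq;

   static uint32_t emit_job_with_fence(uint32_t *next_seq)
   {
       /* ...job packets would be written to the ring here... */
       return (*next_seq)++; /* fence value emitted after the job */
   }

   static int fence_signaled(uint32_t seq)
   {
       /* Signed difference makes the check tolerant of 32-bit
        * wraparound of the sequence counter. */
       return (int32_t)(hw_signaled_seq - seq) >= 0;
   }

   int main(void)
   {
       uint32_t next_seq = 1;
       uint32_t job_a = emit_job_with_fence(&next_seq);
       uint32_t job_b = emit_job_with_fence(&next_seq);

       assert(!fence_signaled(job_a)); /* nothing executed yet */

       hw_signaled_seq = job_a;        /* GPU finishes job A */
       assert(fence_signaled(job_a));
       assert(!fence_signaled(job_b)); /* job B must still wait */

       printf("ok\n");
       return 0;
   }
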
To increase the level of isolation between jobs, there is the "Enforce
Isolation" method described in the picture below:

.. kernel-figure:: enforce_isolation.svg

As shown in the diagram, enforcing isolation introduces ordering between
submissions, since access to GFX/Compute is serialized; think of it as a
single-process-at-a-time mode for GFX/Compute. Notice that this approach has a
significant performance impact, as it allows only one job to submit commands at
a time. However, this option can help pinpoint the job that caused the problem.
Although enforcing isolation improves the situation, it does not fully resolve
the issue of precisely pinpointing bad jobs, since isolation might mask the
problem. In summary, identifying which job caused the issue may not be precise,
but enforcing isolation might help with the debugging.

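The single-process-at-a-time behavior can be pictured as an ownership gate on
the engine: a submission from a different process is deferred until the engine
goes idle and ownership can switch. This is a toy model of the policy, not the
driver's scheduler; all names here are invented for the example.

.. code-block:: c

   #include <assert.h>
   #include <stdio.h>

   /* Toy model of enforce isolation: one process owns GFX/Compute at a
    * time, and ownership only changes when the engine is idle. */
   static int engine_owner = -1; /* pid currently allowed on the engine */
   static int engine_busy;

   static int try_submit(int pid)
   {
       if (engine_busy && pid != engine_owner)
           return 0; /* submission deferred until the engine idles */
       engine_owner = pid;
       engine_busy = 1;
       return 1;
   }

   static void engine_idle(void)
   {
       engine_busy = 0;
   }

   int main(void)
   {
       assert(try_submit(100));  /* first process gets the engine */
       assert(try_submit(100));  /* same process may keep submitting */
       assert(!try_submit(200)); /* a different process must wait */
       engine_idle();            /* previous owner's work drains */
       assert(try_submit(200));  /* now ownership can switch */
       printf("ok\n");
       return 0;
   }

The serialization is exactly what makes hangs easier to attribute: at any
moment only one process's commands can be in flight.
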
Ring Operations
===============

.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
   :internal: