xref: /linux/Documentation/gpu/amdgpu/ring-buffer.rst (revision c17ee635fd3a482b2ad2bf5e269755c2eae5f25e)
=============
 Ring Buffer
=============

To handle communication between user space and kernel space, AMD GPUs use a
ring buffer design to feed the engines (GFX, Compute, SDMA, UVD, VCE, VCN,
VPE, etc.). The figure below illustrates how this communication works:

.. kernel-figure:: ring_buffers.svg

Ring buffers in amdgpu follow a producer-consumer model: user space acts as
the producer, constantly filling the ring buffer with GPU commands to be
executed, while the GPU retrieves the information from the ring, parses it,
and distributes the specific set of instructions between the different amdgpu
blocks.

Notice from the diagram that the ring has a Read Pointer (rptr), which
indicates where the engine is currently reading packets from the ring, and a
Write Pointer (wptr), which indicates how many packets software has added to
the ring. When the rptr and wptr are equal, the ring is idle. When software
adds packets to the ring, it updates the wptr, which causes the engine to
start fetching and processing packets. As the engine processes packets, the
rptr gets updated until it catches up to the wptr and they are equal again.

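The rptr/wptr bookkeeping above can be sketched in a few lines of C. This is a
simplified illustration, not the driver's actual helpers: the names
``ring_free_entries``, ``ring_idle``, and the 256-entry power-of-two ring are
assumptions made for the example.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>
   #include <stdio.h>

   /* Hypothetical sketch of rptr/wptr ring accounting; names are
    * illustrative, not the amdgpu driver's real functions. */
   #define RING_SIZE 256u /* entries, power of two */

   static uint32_t ring_free_entries(uint32_t rptr, uint32_t wptr)
   {
       /* Pointers wrap modulo the ring size; one slot is kept free so
        * that rptr == wptr unambiguously means "idle". */
       return (RING_SIZE - 1u) - ((wptr - rptr) & (RING_SIZE - 1u));
   }

   static int ring_idle(uint32_t rptr, uint32_t wptr)
   {
       return rptr == wptr;
   }

   int main(void)
   {
       uint32_t rptr = 0, wptr = 0;

       assert(ring_idle(rptr, wptr));          /* empty ring */
       wptr = (wptr + 8u) & (RING_SIZE - 1u);  /* software queues 8 packets */
       assert(!ring_idle(rptr, wptr));
       assert(ring_free_entries(rptr, wptr) == RING_SIZE - 1u - 8u);
       rptr = wptr;                            /* engine consumes everything */
       assert(ring_idle(rptr, wptr));
       printf("ok\n");
       return 0;
   }

The masked subtraction makes the free-space computation correct even after the
pointers wrap around the end of the ring.
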
Usually, ring buffers in the driver have a limited size (search for occurrences
of `amdgpu_ring_init()`). One of the reasons the ring buffer can stay small is
that the CP (Command Processor) is capable of following addresses inserted into
the ring; this is illustrated in the image by the reference to the IB (Indirect
Buffer). The IB gives user space an area in memory from which the CP can fetch
extra instructions to feed the hardware.

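The following sketch shows why IBs let the ring stay small: instead of copying
a large command stream into the ring, software places only a small entry in the
ring pointing at the stream, which the CP then follows. The struct layout and
names here are hypothetical, chosen for the example.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>
   #include <stdio.h>
   #include <string.h>

   /* Illustrative model of indirect-buffer submission, not amdgpu's
    * actual ring entry or packet format. */
   #define RING_ENTRIES 16u

   struct ring_entry {
       uint64_t ib_gpu_addr; /* address the CP will jump to */
       uint32_t ib_len_dw;   /* length of the indirect buffer in dwords */
   };

   struct ring {
       struct ring_entry entries[RING_ENTRIES];
       uint32_t wptr;
   };

   /* Queue a large command stream: the ring only stores a reference. */
   static void ring_submit_ib(struct ring *r, uint64_t ib_addr,
                              uint32_t len_dw)
   {
       r->entries[r->wptr % RING_ENTRIES] = (struct ring_entry){
           .ib_gpu_addr = ib_addr,
           .ib_len_dw = len_dw,
       };
       r->wptr++; /* hardware would now fetch and follow the IB */
   }

   int main(void)
   {
       struct ring r = { 0 };
       uint32_t big_ib[4096]; /* large command stream lives here... */

       memset(big_ib, 0, sizeof(big_ib));
       ring_submit_ib(&r, (uint64_t)(uintptr_t)big_ib, 4096);

       /* ...while the ring itself only grew by one small entry. */
       assert(r.wptr == 1);
       assert(r.entries[0].ib_len_dw == 4096);
       printf("ok\n");
       return 0;
   }
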
All pre-GFX11 ASICs use what is called a kernel queue, which means the ring is
allocated in kernel space and has some restrictions, such as not being able to
be :ref:`preempted directly by the scheduler<amdgpu-mes>`. GFX11 and newer
support kernel queues, but also provide a new mechanism named
:ref:`user queues<amdgpu-userq>`, where the queue is moved to user space and
can be mapped and unmapped via the scheduler. In practice, both queue types
insert user-space-generated GPU commands from different jobs into the requested
component ring.

Enforce Isolation
=================

.. note:: After reading this section, you might want to check the
   :ref:`Process Isolation<amdgpu-process-isolation>` page for more details.

Before examining the Enforce Isolation mechanism in the ring buffer context, it
is helpful to briefly discuss how instructions from the ring buffer are
processed in the graphics pipeline. The diagram below illustrates the graphics
pipeline:

.. kernel-figure:: gfx_pipeline_seq.svg

In terms of executing instructions, the GFX pipeline follows the sequence:
Shader Export (SX), Geometry Engine (GE), Shader Processor Input (SPI), Scan
Converter (SC), Primitive Assembler (PA), and cache manipulation (which may
vary across ASICs). Another common way to describe the pipeline is to use Pixel
Shader (PS), raster, and Vertex Shader (VS) to symbolize the two shader stages.
Now, with this pipeline in mind, suppose that Job B causes a hang, while Job
C's instructions might already be executing, leading developers to incorrectly
identify Job C as the problematic one. This problem can be mitigated on
multiple levels; the diagram below illustrates how to minimize part of it:

.. kernel-figure:: no_enforce_isolation.svg

Note from the diagram that there is no guarantee of order or a clear separation
between instructions, which is not a problem most of the time and is also good
for performance. Furthermore, notice the circles between jobs in the diagram
that represent a **fence wait**, used to avoid overlapping work in the ring. At
the end of the fence, a cache flush occurs, ensuring that the next job begins
in a clean state and, if issues arise, the developer can pinpoint the
problematic process more precisely.

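The fence waits between jobs can be modeled with a monotonically increasing
sequence number: each job is followed by a fence write, and a job is known to
be complete once the hardware-signaled value has reached its number. The
sketch below assumes hypothetical names (``emit_job_with_fence``,
``fence_signaled``); only the wrap-safe seqno comparison idiom is the point.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>
   #include <stdio.h>

   /* Stands in for the fence memory location the GPU writes back. */
   static uint32_t hw_signaled_seq;

   static uint32_t emit_job_with_fence(uint32_t *next_seq)
   {
       /* ...job packets would be written to the ring here... */
       return (*next_seq)++; /* fence value emitted after the job */
   }

   static int fence_signaled(uint32_t seq)
   {
       /* Signed difference makes the check tolerant of 32-bit
        * wraparound of the sequence counter. */
       return (int32_t)(hw_signaled_seq - seq) >= 0;
   }

   int main(void)
   {
       uint32_t next_seq = 1;
       uint32_t job_a = emit_job_with_fence(&next_seq);
       uint32_t job_b = emit_job_with_fence(&next_seq);

       assert(!fence_signaled(job_a)); /* nothing executed yet */

       hw_signaled_seq = job_a;        /* GPU finishes job A */
       assert(fence_signaled(job_a));
       assert(!fence_signaled(job_b)); /* job B must still wait */

       printf("ok\n");
       return 0;
   }
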
To increase the level of isolation between jobs, there is the "Enforce
Isolation" method described in the picture below:

.. kernel-figure:: enforce_isolation.svg

As shown in the diagram, enforcing isolation introduces ordering between
submissions, since access to GFX/Compute is serialized; think of it as a
single-process-at-a-time mode for GFX/Compute. Notice that this approach has a
significant performance impact, as it allows only one job to submit commands at
a time. However, this option can help pinpoint the job that caused the problem.
Although enforcing isolation improves the situation, it does not fully resolve
the issue of precisely pinpointing bad jobs, since isolation might mask the
problem. In summary, identifying which job caused the issue may not be precise,
but enforcing isolation might help with the debugging.

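The single-process-at-a-time behavior can be pictured as an ownership gate on
the engine: a submission from a different process is deferred until the engine
goes idle and ownership can switch. This is a toy model of the policy, not the
driver's scheduler; all names here are invented for the example.

.. code-block:: c

   #include <assert.h>
   #include <stdio.h>

   /* Toy model of enforce isolation: one process owns GFX/Compute at a
    * time, and ownership only changes when the engine is idle. */
   static int engine_owner = -1; /* pid currently allowed on the engine */
   static int engine_busy;

   static int try_submit(int pid)
   {
       if (engine_busy && pid != engine_owner)
           return 0; /* submission deferred until the engine idles */
       engine_owner = pid;
       engine_busy = 1;
       return 1;
   }

   static void engine_idle(void)
   {
       engine_busy = 0;
   }

   int main(void)
   {
       assert(try_submit(100));  /* first process gets the engine */
       assert(try_submit(100));  /* same process may keep submitting */
       assert(!try_submit(200)); /* a different process must wait */
       engine_idle();            /* previous owner's work drains */
       assert(try_submit(200));  /* now ownership can switch */
       printf("ok\n");
       return 0;
   }

The serialization is exactly what makes hangs easier to attribute: at any
moment only one process's commands can be in flight.
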
Ring Operations
===============

.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
   :internal: