=============
 Ring Buffer
=============

To handle communication between user space and kernel space, AMD GPUs use a
ring buffer design to feed the engines (GFX, Compute, SDMA, UVD, VCE, VCN, VPE,
etc.). See the figure below that illustrates how this communication works:

.. kernel-figure:: ring_buffers.svg

Ring buffers in amdgpu follow a producer-consumer model, where userspace
acts as the producer, constantly filling the ring buffer with GPU commands to
be executed. Meanwhile, the GPU retrieves the information from the ring, parses
it, and distributes the specific set of instructions between the different
amdgpu blocks.

Notice from the diagram that the ring has a Read Pointer (rptr), which
indicates where the engine is currently reading packets from the ring, and a
Write Pointer (wptr), which indicates how many packets software has added to
the ring. When the rptr and wptr are equal, the ring is idle. When software
adds packets to the ring, it updates the wptr, which causes the engine to start
fetching and processing packets. As the engine processes packets, the rptr gets
updated until it catches up to the wptr and they are equal again.

Usually, ring buffers in the driver have a limited size (search for occurrences
of `amdgpu_ring_init()`).
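
The rptr/wptr bookkeeping described above can be sketched as follows. This is
a hypothetical illustration, not the actual amdgpu structures or helpers; it
only assumes the common convention that the ring size is a power of two, so
pointer arithmetic wraps with a simple mask:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring state; field names are illustrative, not the
 * real struct amdgpu_ring.  size_dw must be a power of two. */
struct demo_ring {
	uint32_t rptr;    /* next packet the engine will fetch */
	uint32_t wptr;    /* next free slot software will fill */
	uint32_t size_dw; /* ring size in dwords */
};

/* The ring is idle when the engine has caught up with software. */
static int demo_ring_idle(const struct demo_ring *ring)
{
	return ring->rptr == ring->wptr;
}

/* Dwords still available before the producer would overrun the
 * consumer; one slot is kept unused so that a full ring can be
 * distinguished from an empty one (both would otherwise have
 * rptr == wptr). */
static uint32_t demo_ring_free_dw(const struct demo_ring *ring)
{
	uint32_t mask = ring->size_dw - 1;

	return (ring->rptr - ring->wptr - 1) & mask;
}
```

The unsigned subtraction plus mask handles wraparound without branches, which
is why power-of-two ring sizes are the usual choice for this kind of design.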
One of the reasons for the small ring buffer size is
that the CP (Command Processor) is capable of following addresses inserted into
the ring; this is illustrated in the image by the reference to the IB (Indirect
Buffer). The IB gives userspace the possibility to have an area in memory that
the CP can read, feeding the hardware with extra instructions.

All ASICs pre-GFX11 use what is called a kernel queue, which means
the ring is allocated in kernel space and has some restrictions, such as not
being able to be :ref:`preempted directly by the scheduler<amdgpu-mes>`. GFX11
and newer support kernel queues, but also provide a new mechanism named
:ref:`user queues<amdgpu-userq>`, where the queue is moved to the user space
and can be mapped and unmapped via the scheduler. In practice, both queue types
insert user-space-generated GPU commands from different jobs into the requested
component ring.

Enforce Isolation
=================

.. note:: After reading this section, you might want to check the
   :ref:`Process Isolation<amdgpu-process-isolation>` page for more details.

Before examining the Enforce Isolation mechanism in the ring buffer context, it
is helpful to briefly discuss how instructions from the ring buffer are
processed in the graphics pipeline.
Let's expand on this topic by checking the
diagram below that illustrates the graphics pipeline:

.. kernel-figure:: gfx_pipeline_seq.svg

In terms of executing instructions, the GFX pipeline follows the sequence:
Shader Export (SX), Geometry Engine (GE), Shader Processor Input (SPI), Scan
Converter (SC), Primitive Assembler (PA), and cache manipulation (which may
vary across ASICs). Another common way to describe the pipeline is to use Pixel
Shader (PS), raster, and Vertex Shader (VS) to symbolize the two shader stages.
Now, with this pipeline in mind, let's assume that Job B causes a hang, but Job
C's instructions might already be executing, leading developers to incorrectly
identify Job C as the problematic one. This problem can be mitigated on
multiple levels; the diagram below illustrates how to minimize part of it:

.. kernel-figure:: no_enforce_isolation.svg

Note from the diagram that there is no guarantee of order or a clear separation
between instructions, which is not a problem most of the time and is also good
for performance. Furthermore, notice some circles between jobs in the diagram
that represent a **fence wait** used to avoid overlapping work in the ring.
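
The fence handshake between jobs can be reduced to a sequence-number
comparison, sketched below. The names are hypothetical illustrations, not the
actual amdgpu fence API; the signed-difference trick is a common way to keep
the comparison correct across 32-bit counter wraparound:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fence bookkeeping: each job gets a sequence number when
 * its commands are emitted, and a later job only starts once the engine
 * has signalled that number.  Names are illustrative only. */
struct demo_fence_ctx {
	uint32_t emitted;   /* last seqno handed out to a job */
	uint32_t signalled; /* last seqno the engine completed */
};

/* Assign a fence sequence number to the job currently being queued. */
static uint32_t demo_fence_emit(struct demo_fence_ctx *ctx)
{
	return ++ctx->emitted;
}

/* A job has retired when the engine's signalled seqno has reached its
 * fence.  Casting the difference to a signed type keeps the test valid
 * even after the 32-bit counter wraps around. */
static int demo_fence_done(const struct demo_fence_ctx *ctx, uint32_t seq)
{
	return (int32_t)(ctx->signalled - seq) >= 0;
}
```

A job waiting on `demo_fence_done()` for the previous job's seqno is the
"circle" in the diagram: it serializes the two submissions at that point.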
At
the end of the fence, a cache flush occurs, ensuring that when the next job
starts, it begins in a clean state and, if issues arise, the developer can
pinpoint the problematic process more precisely.

To increase the level of isolation between jobs, there is the "Enforce
Isolation" method described in the picture below:

.. kernel-figure:: enforce_isolation.svg

As shown in the diagram, enforcing isolation introduces ordering between
submissions, since access to GFX/Compute is serialized; think of it as a
single-process-at-a-time mode for GFX/Compute. Notice that this approach has a
significant performance impact, as it allows only one job to submit commands at
a time. However, this option can help pinpoint the job that caused the problem.
Although enforcing isolation improves the situation, it does not fully resolve
the issue of precisely pinpointing bad jobs, since isolation might mask the
problem. In summary, identifying which job caused the issue may not be precise,
but enforcing isolation might help with the debugging.

Ring Operations
===============

.. kernel-doc:: drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
   :internal: