========================
Display Core Debug tools
========================

In this section, you will find helpful information on debugging the amdgpu
driver from the display perspective. This page introduces debug mechanisms and
procedures to help you identify whether an issue is related to display code.

Narrow down display issues
==========================

Since the display is the driver's visual component, it is common to see users
reporting a problem as a display issue when another component causes it. This
section equips users to determine if a specific issue was caused by the display
component or another part of the driver.

DC dmesg important messages
---------------------------

The dmesg log is the first source of information to be checked, and amdgpu
takes advantage of this feature by logging some valuable information. When
looking for issues associated with amdgpu, remember that each component of
the driver (e.g., smu, PSP, dm, etc.) is loaded one by one, and this
information can be found in the dmesg log. In this sense, look for the part of
the log that looks like the below log snippet::

  [ 4.254295] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x744C 0x1002:0x0E3B 0xC8).
  [ 4.254718] [drm] register mmio base: 0xFCB00000
  [ 4.254918] [drm] register mmio size: 1048576
  [ 4.260095] [drm] add ip block number 0 <soc21_common>
  [ 4.260318] [drm] add ip block number 1 <gmc_v11_0>
  [ 4.260510] [drm] add ip block number 2 <ih_v6_0>
  [ 4.260696] [drm] add ip block number 3 <psp>
  [ 4.260878] [drm] add ip block number 4 <smu>
  [ 4.261057] [drm] add ip block number 5 <dm>
  [ 4.261231] [drm] add ip block number 6 <gfx_v11_0>
  [ 4.261402] [drm] add ip block number 7 <sdma_v6_0>
  [ 4.261568] [drm] add ip block number 8 <vcn_v4_0>
  [ 4.261729] [drm] add ip block number 9 <jpeg_v4_0>
  [ 4.261887] [drm] add ip block number 10 <mes_v11_0>

From the above example, you can see the line that reports that `<dm>`
(**Display Manager**) was loaded, which means that display can be part of the
issue. If you do not see that line, something else might have failed before
amdgpu loaded the display component, indicating that the problem is not a
display issue.

After you have identified that the DM was loaded correctly, you can check for
the display version of the hardware in use, which can be retrieved from the
dmesg log with the command::

  dmesg | grep -i 'display core'

This command shows a message that looks like this::

  [ 4.655828] [drm] Display Core v3.2.285 initialized on DCN 3.2

This message has two key pieces of information:

* **The DC version (e.g., v3.2.285)**: Display developers release a new DC version
  every week, and this information can be advantageous in a situation where a
  user/developer must find a good point versus a bad point based on a tested
  version of the display code. Remember from page :ref:`Display Core <amdgpu-display-core>`,
  that every week the new patches for display are heavily tested with IGT and
  manual tests.
* **The DCN version (e.g., DCN 3.2)**: The DCN block is associated with the
  hardware generation, and the DCN version conveys the hardware generation that
  the driver is currently running. This information helps to narrow down the
  code debug area since each DCN version has its files in the DC folder per DCN
  component (from the example, the developer might want to focus on
  files/folders/functions/structs with the dcn32 label, since those are the
  ones likely to be executed). However, keep in mind that DC reuses code across
  different DCN versions; for example, it is expected to have some callbacks
  set in one DCN that are the same as those from another DCN. In summary, use
  the DCN version just as a guide.

From the dmesg file, it is also possible to get the ATOM bios code by using::

  dmesg | grep -i 'ATOM BIOS'

Which generates an output that looks like this::

  [ 4.274534] amdgpu: ATOM BIOS: 113-D7020100-102

This type of information is useful to include in bug reports.

Avoid loading display core
--------------------------

Sometimes, it might be hard to figure out which part of the driver is causing
the issue; if you suspect that the display is not part of the problem and your
bug scenario is simple (e.g., some desktop configuration) you can try to remove
the display component from the equation. First, you need to identify the `dm`
ID from the dmesg log; for example, search for the following log::

  [ 4.254295] [drm] initializing kernel modesetting (IP DISCOVERY 0x1002:0x744C 0x1002:0x0E3B 0xC8).
  [..]
  [ 4.260095] [drm] add ip block number 0 <soc21_common>
  [ 4.260318] [drm] add ip block number 1 <gmc_v11_0>
  [..]
  [ 4.261057] [drm] add ip block number 5 <dm>

Notice from the above example that the `dm` ID is 5 for this specific hardware.
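
The lookup above can be scripted. The following is a minimal sketch, not part
of any driver tooling: `dm_block_id` and `ip_block_mask` are hypothetical
helper names, and the sketch assumes the `add ip block number N <dm>` line
format shown in the log snippet. It extracts the `dm` block number from a
dmesg-style log and prints `0xffffffff` with that single bit cleared, which is
the mask used in the next step:

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Read a dmesg-style log on stdin and print the number of the <dm> IP block.
# Only the first matching line is used.
dm_block_id() {
    sed -n 's/.*add ip block number \([0-9][0-9]*\) <dm>.*/\1/p' | head -n 1
}

# Print 0xffffffff & ~(1 << ID) as eight hexadecimal digits.
# $1 = IP block number, as printed by dm_block_id
ip_block_mask() {
    printf '0x%08x\n' $(( 0xffffffff & ~(1 << $1) ))
}
```

For example, `ip_block_mask "$(dmesg | dm_block_id)"` prints the mask for the
running system, provided the boot messages are still in the kernel log buffer
and the user is allowed to read them.
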
Next, you need to run the following bitwise operation to identify the IP block
mask::

  0xffffffff & ~(1 << [DM ID])

From our example the IP mask is::

  0xffffffff & ~(1 << 5) = 0xffffffdf

Finally, to disable DC, you just need to add the below parameter to the kernel
command line in your bootloader::

  amdgpu.ip_block_mask=0xffffffdf

If you can boot your system with the DC disabled and still see the issue, it
means you can rule DC out of the equation. However, if the bug disappears, you
still need to consider DC part of the problem and keep narrowing down the
issue. In some scenarios, disabling DC is impossible since it might be
necessary to use the display component to reproduce the issue (e.g., play a
game).

**Note: This will probably lead to the absence of a display output.**

Display flickering
------------------

Display flickering might have multiple causes; one is the lack of proper power
to the GPU or problems in the DPM switches. A good first generic verification
is to force the GPU to its highest performance level::

  bash -c "echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level"

The above command sets the GPU/APU to use the maximum power allowed, which
disables DPM switches. If forcing DPM levels high does not fix the issue, it
is less likely that the issue is related to power management. If the issue
disappears, there is a good chance that other components might be involved, and
the display should not be ignored since this could be a DPM issue. From the
display side, if the power increase fixes the issue, it is worth debugging the
clock configuration and the pipe split policy used in the specific
configuration.

Display artifacts
-----------------

Users may see some screen artifacts that can be categorized into two different
types: localized artifacts and general artifacts.
The localized artifacts
happen in some specific areas, such as around the UI window corners; if you see
this type of issue, there is a considerable chance that you have a userspace
problem, likely Mesa or similar. The general artifacts usually happen on the
entire screen. They might be caused by a misconfiguration at the driver level
of the display parameters, but the userspace might also cause this issue. One
way to identify the source of the problem is to take a screenshot or make a
desktop video capture when the problem happens; after checking the
screenshot/video recording, if you don't see any of the artifacts, it means
that the issue is likely on the driver side. If you can still see the
problem in the data collected, it is an issue that probably happened during
rendering, and the display code just received an already corrupted framebuffer.

Disabling/Enabling specific features
====================================

DC has a struct named `dc_debug_options`, which is statically initialized by
all DCE/DCN components based on the specific hardware characteristics. This
structure usually facilitates the bring-up phase since developers can start
with many disabled features and enable them individually. This is also an
important debug feature since users can change it when debugging specific
issues.

For example, dGPU users sometimes see a problem where a horizontal band of
flickering happens in some specific part of the screen. This could be an
indication of Sub-Viewport issues; after users have identified the target DCN,
they can set the `force_disable_subvp` field to true in the statically
initialized version of `dc_debug_options` to see if the issue gets fixed. Along
the same lines, users/developers can also try to turn off `fams2_config` and
`enable_single_display_2to1_odm_policy`. In summary, `dc_debug_options` is
a useful tool for narrowing down the problem.
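
To locate the statically initialized `dc_debug_options` for a given DCN, a
recursive search of the display code is usually enough. The sketch below is an
illustration only: `find_dc_debug_option` is a hypothetical helper, the default
path assumes the command runs from the root of a Linux kernel source tree, and
the `--include` option assumes GNU grep.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# List source lines that mention a dc_debug_options field.
# $1 = field name (e.g. force_disable_subvp)
# $2 = directory to search; the default assumes the current directory is the
#      root of a kernel source tree.
find_dc_debug_option() {
    field=$1
    dir=${2:-drivers/gpu/drm/amd/display/dc}
    # -r: recurse, -n: print line numbers, -w: match the whole identifier only
    grep -rnw --include='*.c' --include='*.h' -- "$field" "$dir"
}
```

Running `find_dc_debug_option force_disable_subvp` lists every file and line
that mentions the field; matches in files carrying the target DCN label (for
example dcn32) are the likely candidates to change.
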

DC Visual Confirmation
======================

Display core provides a feature named visual confirmation, which is a set of
bars added at the scanout time by the driver to convey some specific
information. In general, you can enable this debug option by using::

  echo <N> > /sys/kernel/debug/dri/0/amdgpu_dm_visual_confirm

Here, `N` is an integer that selects the specific scenario the developer
wants to enable. Some of these debug cases are described in the following
subsections.

Multiple Planes Debug
---------------------

If you want to enable or debug multiple planes in a specific user-space
application, you can leverage a debug feature named visual confirm. To
enable it, run::

  echo 1 > /sys/kernel/debug/dri/0/amdgpu_dm_visual_confirm

You need to reload your GUI to see the visual confirmation. When the plane
configuration changes or a full update occurs, there will be a colored bar at
the bottom of each hardware plane being drawn on the screen.

* The color indicates the format. For example, red is AR24 and green is NV12
* The height of the bar indicates the index of the plane
* Pipe split can be observed if there are two bars with a difference in height
  covering the same plane

Consider the video playback case in which a video is played in a specific
plane, and the desktop is drawn in another plane. The video plane should
feature one or two green bars at the bottom of the video depending on pipe
split configuration.

* There should **not** be any visual corruption
* There should **not** be any underflow or screen flashes
* There should **not** be any black screens
* There should **not** be any cursor corruption
* Multiple planes **may** be briefly disabled during window transitions or
  resizing but should come back after the action has finished

Pipe Split Debug
----------------

Sometimes we need to debug if DCN is splitting pipes correctly, and visual
confirmation is also handy for this case. Similar to the multiple planes case,
you can use the below command to enable visual confirmation::

  echo 1 > /sys/kernel/debug/dri/0/amdgpu_dm_visual_confirm

In this case, if you have a pipe split, you will see one small red bar at the
bottom of the display covering the entire display width and another bar
covering the second pipe. In other words, the bar over the second pipe is
slightly taller.

DTN Debug
=========

DC (DCN) provides an extensive log that dumps multiple details from our
hardware configuration. You can capture those status values with the Display
Test Next (DTN) log, which is exposed via debugfs::

  cat /sys/kernel/debug/dri/0/amdgpu_dm_dtn_log

Since this log is updated along with the DCN status, you can also follow the
changes in real time by using something like::

  sudo watch -d cat /sys/kernel/debug/dri/0/amdgpu_dm_dtn_log

When reporting a bug related to DC, consider attaching this log before and
after you reproduce the bug.

Collect Firmware information
============================

When reporting issues, it is important to have the firmware information since
it can be helpful for debugging purposes. To get all the firmware information,
use the command::

  cat /sys/kernel/debug/dri/0/amdgpu_firmware_info

From the display perspective, pay attention to the firmware of the DMCU and
DMCUB.
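
The two captures described above (the DTN log before and after reproducing a
bug, plus the firmware information) can be gathered in one step. This is a
minimal sketch under stated assumptions: `collect_dc_state` is a hypothetical
helper name, the output file names are arbitrary, and the default debugfs path
assumes the GPU is DRI device 0 as in the commands above.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# Save the DTN log and the firmware information under a label.
# $1 = label for this capture (e.g. before, after)
# $2 = output directory (created if missing)
# $3 = debugfs directory of the GPU; defaults to DRI device 0
collect_dc_state() {
    label=$1
    out=$2
    dbg=${3:-/sys/kernel/debug/dri/0}
    mkdir -p "$out" || return 1
    cat "$dbg/amdgpu_dm_dtn_log" > "$out/dtn_log.$label" || return 1
    cat "$dbg/amdgpu_firmware_info" > "$out/firmware_info.$label" || return 1
}
```

Run as root, a typical session would call `collect_dc_state before ./dc-report`,
reproduce the bug, call `collect_dc_state after ./dc-report`, and then compare
the two DTN logs with `diff -u`.
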

DMUB Firmware Debug
===================

Sometimes, dmesg logs aren't enough. This is especially true if a feature is
implemented primarily in DMUB firmware. In such cases, all we see in dmesg when
an issue arises is some generic timeout error. So, to get more relevant
information, we can trace DMUB commands by enabling the relevant bits in
`amdgpu_dm_dmub_trace_mask`.

Currently, we support the tracing of the following groups:

Trace Groups
------------

.. csv-table::
   :header-rows: 1
   :widths: 1, 1
   :file: ./trace-groups-table.csv

**Note: Not all ASICs support all of the listed trace groups**

So, to enable just PSR tracing you can use the following command::

  # echo 0x8020 > /sys/kernel/debug/dri/0/amdgpu_dm_dmub_trace_mask

Then, you need to enable logging trace events to the buffer, which you can do
using the following::

  # echo 1 > /sys/kernel/debug/dri/0/amdgpu_dm_dmcub_trace_event_en

Lastly, after you are able to reproduce the issue you are trying to debug,
you can disable tracing and read the trace log by using the following::

  # echo 0 > /sys/kernel/debug/dri/0/amdgpu_dm_dmcub_trace_event_en
  # cat /sys/kernel/debug/dri/0/amdgpu_dm_dmub_tracebuffer

So, when reporting bugs related to features such as PSR and ABM, consider
enabling the relevant bits in the mask before reproducing the issue, and
attach the log that you obtain from the trace buffer to any bug report that
you create.
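
The tracing steps above can be wrapped in two small functions so that the
sequence is harder to get wrong. This is only a sketch: `dmub_trace_start`,
`dmub_trace_stop` and the `DBG` variable are hypothetical names, and the
default path assumes the GPU is DRI device 0 as in the commands above.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.
# DBG = debugfs directory of the GPU; defaults to DRI device 0.
DBG=${DBG:-/sys/kernel/debug/dri/0}

# Select the trace groups and start logging trace events to the buffer.
# $1 = trace mask (e.g. 0x8020 for the PSR example above)
dmub_trace_start() {
    echo "$1" > "$DBG/amdgpu_dm_dmub_trace_mask" || return 1
    echo 1 > "$DBG/amdgpu_dm_dmcub_trace_event_en"
}

# Stop logging trace events and save the trace buffer.
# $1 = file that receives the trace log
dmub_trace_stop() {
    echo 0 > "$DBG/amdgpu_dm_dmcub_trace_event_en" || return 1
    cat "$DBG/amdgpu_dm_dmub_tracebuffer" > "$1"
}
```

Run as root, a session would call `dmub_trace_start 0x8020`, reproduce the
issue, and then call `dmub_trace_stop dmub-trace.log` to obtain the file to
attach to the bug report.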