.. SPDX-License-Identifier: GPL-2.0

========================================
Debugging advice for driver development
========================================

This document serves as a general starting point and lookup for debugging
device drivers.
While this guide focuses on debugging that requires re-compiling the
module/kernel, the :doc:`userspace debugging guide
</process/debugging/userspace_debugging_guide>` will guide
you through tools like dynamic debug, ftrace and other tools useful for
debugging issues and behavior.
For general debugging advice, see the :doc:`general advice document
</process/debugging/index>`.

.. contents::
    :depth: 3

The following sections show you the available tools.

printk() & friends
------------------

These are derivatives of printf() with varying destinations and support for
being dynamically turned on or off, or lack thereof.

Simple printk()
~~~~~~~~~~~~~~~

The classic approach. It can be used to great effect for quick and dirty
development of new modules or to extract arbitrary data needed for
troubleshooting.


Prerequisite: ``CONFIG_PRINTK`` (usually enabled by default)

**Pros**:

- No need to learn anything, simple to use
- Easy to modify exactly to your needs (formatting of the data (See:
  :doc:`/core-api/printk-formats`), visibility in the log)
- Can cause delays in the execution of the code (beneficial to confirm whether
  timing is a factor)

**Cons**:

- Requires rebuilding the kernel/module
- Can cause delays in the execution of the code (which can make issues
  non-reproducible)

For the full documentation see :doc:`/core-api/printk-basics`.

Trace_printk
~~~~~~~~~~~~

Prerequisite: ``CONFIG_DYNAMIC_FTRACE`` & ``#include <linux/ftrace.h>``

It is a tiny bit less comfortable to use than printk(), because you will have
to read the messages from the trace file (See: :ref:`read_ftrace_log`)
instead of from the kernel log, but it is very useful when printk() adds
unwanted delays to the code execution, causing issues to be flaky or hidden.

If the processing of this still causes timing issues then you can try
trace_puts().


For the full documentation see trace_printk().

dev_dbg
~~~~~~~

A print statement that can be targeted by
:ref:`process/debugging/userspace_debugging_guide:dynamic debug` and that
includes additional information about the device used within the context.

**When is it appropriate to leave a debug print in the code?**

Permanent debug statements have to be useful for a developer to troubleshoot
driver misbehavior. Judging that is a bit more of an art than a science, but
some guidelines are in the :ref:`Coding style guidelines
<process/coding-style:13) printing kernel messages>`. In almost all cases the
debug statements shouldn't be upstreamed, as a working driver is supposed to be
silent.

Custom printk
~~~~~~~~~~~~~

Example::

  #define core_dbg(fmt, arg...) do { \
          if (core_debug) \
                  printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \
  } while (0)

**When should you do this?**

It is better to just use pr_debug(), which can later be turned on/off with
dynamic debug.
Additionally, a lot of drivers activate these prints via a
variable like ``core_debug`` set by a module parameter. However, module
parameters `are not recommended anymore
<https://lore.kernel.org/all/2024032757-surcharge-grime-d3dd@gregkh>`_.

Ftrace
------

Creating a custom Ftrace tracepoint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A tracepoint adds a hook into your code that will be called and logged when the
tracepoint is enabled. This can be used, for example, to trace hitting a
conditional branch or to dump the internal state at specific points of the code
flow during a debugging session.

Here is a basic description of :ref:`how to implement new tracepoints
<trace/tracepoints:usage>`.

For the full event tracing documentation see :doc:`/trace/events`.

For the full Ftrace documentation see :doc:`/trace/ftrace`.

DebugFS
-------

Prerequisite: ``CONFIG_DEBUG_FS`` & ``#include <linux/debugfs.h>``

DebugFS differs from the other approaches of debugging, as it neither writes
messages to the kernel log nor adds traces to the code. Instead it allows the
developer to handle a set of files.
With these files you can either store values of variables or make
register/memory dumps, or you can make these files writable and modify
values/settings in the driver.

Possible use-cases among others:

- Store register values
- Keep track of variables
- Store errors
- Store settings
- Toggle a setting like debug on/off
- Error injection

This is especially useful when the size of a data dump would be hard to digest
as part of the general kernel log (for example when dumping raw bitstream data)
or when you are not interested in all the values all the time, but still want
the possibility to inspect them.

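
As a rough illustration, the following minimal sketch exposes one driver
variable as a read-only DebugFS file. It is a sketch under stated assumptions,
not a reference implementation: the names ``my_driver``, ``my_value`` and
``my_variable`` are placeholders, and the files are created from the module
init function only to keep the example self-contained, whereas a real driver
would typically do this in its probe callback.

.. code-block:: c

    #include <linux/debugfs.h>
    #include <linux/init.h>
    #include <linux/module.h>

    /* Hypothetical driver state that should be inspectable. */
    static u32 my_variable;
    static struct dentry *my_debugfs_dir;

    static int __init my_driver_init(void)
    {
            /* Create <debugfs>/my_driver; a NULL parent means the debugfs root. */
            my_debugfs_dir = debugfs_create_dir("my_driver", NULL);

            /* Read-only file (octal mode 0444) backed directly by my_variable. */
            debugfs_create_u32("my_value", 0444, my_debugfs_dir, &my_variable);

            return 0;
    }

    static void __exit my_driver_exit(void)
    {
            /* Removes the directory and every file created below it. */
            debugfs_remove_recursive(my_debugfs_dir);
    }

    module_init(my_driver_init);
    module_exit(my_driver_exit);

    MODULE_LICENSE("GPL");
    MODULE_DESCRIPTION("Minimal DebugFS sketch");

With the module loaded and DebugFS mounted at ``/sys/kernel/debug``, reading
``my_driver/my_value`` there should return the current value of
``my_variable``. The return value of debugfs_create_dir() is deliberately not
checked in this sketch, because the debugfs functions accept an error pointer
as parent and debugging aids should not make driver initialization fail.
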

The general idea is:

- Create a directory during probe (``struct dentry *parent =
  debugfs_create_dir("my_driver", NULL);``)
- Create a file (``debugfs_create_u32("my_value", 0444, parent, &my_variable);``)

  - In this example the file is found in
    ``/sys/kernel/debug/my_driver/my_value`` (with read permissions for
    user/group/all)
  - Any read of the file will return the current contents of the variable
    ``my_variable``

- Clean up the directory when removing the device
  (``debugfs_remove_recursive(parent);``)

For the full documentation see :doc:`/filesystems/debugfs`.

KASAN, UBSAN, lockdep and other error checkers
----------------------------------------------

KASAN (Kernel Address Sanitizer)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_KASAN``

KASAN is a dynamic memory error detector that helps to find use-after-free and
out-of-bounds bugs. It uses compile-time instrumentation to check every memory
access.

For the full documentation see :doc:`/dev-tools/kasan`.


UBSAN (Undefined Behavior Sanitizer)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_UBSAN``

UBSAN relies on compiler instrumentation and runtime checks to detect undefined
behavior. It is designed to find a variety of issues, including signed integer
overflow, array index out of bounds, and more.

For the full documentation see :doc:`/dev-tools/ubsan`.

lockdep (Lock Dependency Validator)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_DEBUG_LOCKDEP``

lockdep is a runtime lock dependency validator that detects potential deadlocks
and other locking-related issues in the kernel.
It tracks lock acquisitions and releases, building a dependency graph that is
analyzed for potential deadlocks.
lockdep is especially useful for validating the correctness of lock ordering in
the kernel.

PSI (Pressure stall information tracking)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_PSI``

PSI is a measurement tool to identify excessive overcommits on hardware
resources that can cause performance disruptions or even OOM kills.

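
PSI reports its measurements through the files ``/proc/pressure/cpu``,
``/proc/pressure/memory`` and ``/proc/pressure/io``, with lines of the form
``some avg10=0.00 avg60=0.00 avg300=0.00 total=0``. The following user-space
sketch shows one way such a line could be parsed while stress-testing a
driver. The helper names ``psi_parse_line()`` and ``psi_print()`` are
hypothetical and only serve this illustration.

.. code-block:: c

    #include <stdio.h>
    #include <string.h>

    /* One line of a /proc/pressure/<resource> file. */
    struct psi_line {
            char kind[8];                /* "some" or "full" */
            double avg10, avg60, avg300; /* stall percentage over 10s/60s/300s */
            unsigned long long total;    /* accumulated stall time in microseconds */
    };

    /* Parse one PSI line; returns 0 on success, -1 if the format does not match. */
    static int psi_parse_line(const char *line, struct psi_line *out)
    {
            int n = sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
                           out->kind, &out->avg10, &out->avg60, &out->avg300,
                           &out->total);

            return n == 5 ? 0 : -1;
    }

    /* Print the 10 second averages for one resource ("cpu", "memory" or "io"). */
    static int psi_print(const char *resource)
    {
            char path[64], buf[128];
            struct psi_line l;
            FILE *f;

            snprintf(path, sizeof(path), "/proc/pressure/%s", resource);
            f = fopen(path, "r");
            if (!f)
                    return -1; /* PSI not available, e.g. CONFIG_PSI disabled */

            while (fgets(buf, sizeof(buf), f))
                    if (psi_parse_line(buf, &l) == 0)
                            printf("%s %s: avg10=%.2f%%\n", resource, l.kind, l.avg10);

            fclose(f);
            return 0;
    }

Calling ``psi_print("memory")`` before and during a driver stress test gives a
quick impression of whether the workload is stalling on memory. A rising
``some`` average means that at least some tasks were delayed waiting for the
resource.
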

device coredump
---------------

Prerequisite: ``#include <linux/devcoredump.h>``

Device coredump provides the infrastructure for a driver to pass arbitrary data
to userland. It is most often used in conjunction with udev or a similar
userland application that listens for kernel uevents, which indicate that the
dump is ready. Udev has rules to copy that file somewhere for long-term storage
and analysis, as by default the data for the dump is automatically cleaned up
after 5 minutes. That data is analyzed with driver-specific tools or GDB.

You can find an example implementation at:
`drivers/media/platform/qcom/venus/core.c
<https://elixir.bootlin.com/linux/v6.11.6/source/drivers/media/platform/qcom/venus/core.c#L30>`__

**Copyright** ©2024 : Collabora