.. SPDX-License-Identifier: GPL-2.0

========================================
Debugging advice for driver development
========================================

This document serves as a general starting point and lookup for debugging
device drivers.
While this guide focuses on debugging that requires re-compiling the
module/kernel, the :doc:`userspace debugging guide
</process/debugging/userspace_debugging_guide>` will guide
you through tools like dynamic debug, ftrace and other tools useful for
debugging issues and behavior.
For general debugging advice, see the :doc:`general advice document
</process/debugging/index>`.

.. contents::
    :depth: 3

The following sections show you the available tools.

printk() & friends
------------------

These are derivatives of printf() with varying destinations and support for
being dynamically turned on or off, or lack thereof.

Simple printk()
~~~~~~~~~~~~~~~

The classic approach: it can be used to great effect for quick and dirty
development of new modules or to extract arbitrary necessary data for
troubleshooting.

Prerequisite: ``CONFIG_PRINTK`` (usually enabled by default)

**Pros**:

- No need to learn anything, simple to use
- Easy to modify exactly to your needs (formatting of the data (See:
  :doc:`/core-api/printk-formats`), visibility in the log)
- Can cause delays in the execution of the code (beneficial to confirm whether
  timing is a factor)

**Cons**:

- Requires rebuilding the kernel/module
- Can cause delays in the execution of the code (which can cause issues to be
  not reproducible)

For the full documentation see :doc:`/core-api/printk-basics`.
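
As a minimal sketch of such a temporary debug print (the function
``my_driver_handle_irq()`` and its arguments are hypothetical and only serve as
an illustration)::

  #include <linux/printk.h>
  #include <linux/types.h>

  /* Hypothetical helper; only the pr_info() call matters here. */
  static void my_driver_handle_irq(u32 status, unsigned int count)
  {
          /* Logged at KERN_INFO level into the kernel log buffer */
          pr_info("my_driver: irq status=0x%08x count=%u\n", status, count);
  }

The message ends up in the kernel log and can be read, for example, with
``dmesg``.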

Trace_printk
~~~~~~~~~~~~

Prerequisite: ``CONFIG_DYNAMIC_FTRACE`` & ``#include <linux/ftrace.h>``

It is a tiny bit less comfortable to use than printk(), because you will have
to read the messages from the trace file (See: :ref:`read_ftrace_log`)
instead of from the kernel log, but it is very useful when printk() adds
unwanted delays into the code execution, causing issues to be flaky or hidden.

If the processing of this still causes timing issues then you can try
trace_puts().

For the full documentation see trace_printk()
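
A minimal sketch of both calls, assuming a hypothetical hot-path function
``my_driver_xfer()``::

  #include <linux/ftrace.h>
  #include <linux/kernel.h>

  /* Hypothetical hot-path function, for illustration only. */
  static void my_driver_xfer(unsigned int len)
  {
          /* Written to the ftrace ring buffer, not to the kernel log */
          trace_printk("xfer start len=%u\n", len);

          /* Takes a constant string only, so it is cheaper than trace_printk() */
          trace_puts("xfer done\n");
  }

The output appears in the ``trace`` file of tracefs (typically
``/sys/kernel/tracing/trace``) rather than in ``dmesg``.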

dev_dbg
~~~~~~~

A print statement that contains additional information about the device used
within the context and that can be targeted by
:ref:`process/debugging/userspace_debugging_guide:dynamic debug`.
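
A minimal sketch, assuming a hypothetical platform driver probe function
``my_driver_probe()``::

  #include <linux/device.h>
  #include <linux/platform_device.h>

  /* Hypothetical probe function; only the dev_dbg() call matters here. */
  static int my_driver_probe(struct platform_device *pdev)
  {
          struct device *dev = &pdev->dev;

          /*
           * The message is prefixed with the driver and device name. It is
           * only emitted when DEBUG is defined for this file or when the
           * call site is enabled through dynamic debug.
           */
          dev_dbg(dev, "probing, %u resources\n", pdev->num_resources);

          return 0;
  }

With dynamic debug enabled in the kernel configuration, the statement stays
silent until it is switched on at runtime.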

**When is it appropriate to leave a debug print in the code?**

Permanent debug statements have to be useful for a developer to troubleshoot
driver misbehavior. Judging that is a bit more of an art than a science, but
some guidelines are in the :ref:`Coding style guidelines
<process/coding-style:13) printing kernel messages>`. In almost all cases the
debug statements shouldn't be upstreamed, as a working driver is supposed to be
silent.

Custom printk
~~~~~~~~~~~~~

Example::

  /* Print only when the driver-private flag core_debug is set */
  #define core_dbg(fmt, arg...) do { \
          if (core_debug) \
                  printk(KERN_DEBUG pr_fmt("core: " fmt), ## arg); \
  } while (0)

**When should you do this?**

It is better to just use a pr_debug(), which can later be turned on/off with
dynamic debug. Additionally, a lot of drivers activate these prints via a
variable like ``core_debug`` set by a module parameter. However, module
parameters `are not recommended anymore
<https://lore.kernel.org/all/2024032757-surcharge-grime-d3dd@gregkh>`_.
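
A minimal sketch of the pr_debug() alternative, assuming a hypothetical
function ``core_start()``. The ``pr_fmt`` definition has to come before the
first include so that it applies to every ``pr_*()`` call of the file::

  /* Prefix every pr_*() message of this file */
  #define pr_fmt(fmt) "core: " fmt

  #include <linux/printk.h>

  /* Hypothetical function, for illustration only. */
  static void core_start(int id)
  {
          /* Emitted only if DEBUG is defined or dynamic debug enables it */
          pr_debug("starting instance %d\n", id);
  }

No private ``core_debug`` variable or module parameter is needed in this
variant.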

Ftrace
------

Creating a custom Ftrace tracepoint
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A tracepoint adds a hook into your code that will be called and logged when the
tracepoint is enabled. This can be used, for example, to trace hitting a
conditional branch or to dump the internal state at specific points of the code
flow during a debugging session.

Here is a basic description of :ref:`how to implement new tracepoints
<trace/tracepoints:usage>`.

For the full event tracing documentation see :doc:`/trace/events`.

For the full Ftrace documentation see :doc:`/trace/ftrace`.
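
As a rough sketch of what a trace event definition can look like, assume a
hypothetical header ``my_driver_trace.h`` that defines one event,
``my_driver_irq``, in a hypothetical trace system ``my_driver``. All of these
names are made up for illustration::

  /* my_driver_trace.h (hypothetical file name) */
  #undef TRACE_SYSTEM
  #define TRACE_SYSTEM my_driver

  #if !defined(_MY_DRIVER_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
  #define _MY_DRIVER_TRACE_H

  #include <linux/tracepoint.h>

  TRACE_EVENT(my_driver_irq,
          /* Prototype and argument names of trace_my_driver_irq() */
          TP_PROTO(u32 status),
          TP_ARGS(status),
          /* Layout of the record stored in the ring buffer */
          TP_STRUCT__entry(
                  __field(u32, status)
          ),
          /* Copy the arguments into the record */
          TP_fast_assign(
                  __entry->status = status;
          ),
          /* How the record is formatted when the trace is read */
          TP_printk("status=0x%08x", __entry->status)
  );

  #endif /* _MY_DRIVER_TRACE_H */

  /* This part must be outside the include guard */
  #undef TRACE_INCLUDE_PATH
  #define TRACE_INCLUDE_PATH .
  #define TRACE_INCLUDE_FILE my_driver_trace
  #include <trace/define_trace.h>

The event is then instantiated in exactly one C file and called where needed::

  #define CREATE_TRACE_POINTS
  #include "my_driver_trace.h"

  /* Hypothetical interrupt helper, for illustration only. */
  static void my_driver_handle_irq(u32 status)
  {
          /* Close to a no-op while the event is disabled */
          trace_my_driver_irq(status);
  }

Because ``TRACE_INCLUDE_PATH`` is set to ``.`` in this sketch, the build has to
add the driver directory to the include path of that C file.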

DebugFS
-------

Prerequisite: ``CONFIG_DEBUG_FS`` & ``#include <linux/debugfs.h>``

DebugFS differs from the other approaches of debugging, as it doesn't write
messages to the kernel log nor add traces to the code. Instead, it lets the
developer expose a set of files.
These files can present the values of variables or register/memory dumps, or
they can be made writable to modify values/settings in the driver.

Possible use-cases among others:

- Store register values
- Keep track of variables
- Store errors
- Store settings
- Toggle a setting like debug on/off
- Error injection

This is especially useful when the size of a data dump would be hard to digest
as part of the general kernel log (for example when dumping raw bitstream data)
or when you are not interested in all the values all the time, but still want
to be able to inspect them.

The general idea is:

- Create a directory during probe (``struct dentry *parent =
  debugfs_create_dir("my_driver", NULL);``)
- Create a file (``debugfs_create_u32("my_value", 0444, parent, &my_variable);``)

  - In this example the file is found in
    ``/sys/kernel/debug/my_driver/my_value`` (with read permissions for
    user/group/all)
  - any read of the file will return the current contents of the variable
    ``my_variable``

- Clean up the directory when removing the device
  (``debugfs_remove(parent);``)

For the full documentation see :doc:`/filesystems/debugfs`.
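
The steps above can be sketched as follows. The helpers
``my_driver_debugfs_init()`` and ``my_driver_debugfs_exit()`` are hypothetical
and are assumed to be called from the driver's probe and remove paths::

  #include <linux/debugfs.h>
  #include <linux/types.h>

  static u32 my_variable;
  static struct dentry *my_debugfs_dir;

  /* Hypothetical helper, assumed to be called from probe */
  static void my_driver_debugfs_init(void)
  {
          /* NULL parent: create the directory in the debugfs root */
          my_debugfs_dir = debugfs_create_dir("my_driver", NULL);

          /* Read-only file (octal mode 0444) backed by my_variable */
          debugfs_create_u32("my_value", 0444, my_debugfs_dir, &my_variable);
  }

  /* Hypothetical helper, assumed to be called when the device is removed */
  static void my_driver_debugfs_exit(void)
  {
          /* On current kernels this also removes the files below the directory */
          debugfs_remove(my_debugfs_dir);
  }

Note that the mode is written as an octal constant (``0444``). The return
values are deliberately not checked here, as a driver is expected to keep
working when debugfs is unavailable.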

KASAN, UBSAN, lockdep and other error checkers
----------------------------------------------

KASAN (Kernel Address Sanitizer)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_KASAN``

KASAN is a dynamic memory error detector that helps to find use-after-free and
out-of-bounds bugs. It uses compile-time instrumentation to check every memory
access.

For the full documentation see :doc:`/dev-tools/kasan`.

UBSAN (Undefined Behavior Sanitizer)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_UBSAN``

UBSAN relies on compiler instrumentation and runtime checks to detect undefined
behavior. It is designed to find a variety of issues, including signed integer
overflow, array index out of bounds, and more.

For the full documentation see :doc:`/dev-tools/ubsan`.

lockdep (Lock Dependency Validator)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_DEBUG_LOCKDEP``

lockdep is a runtime lock dependency validator that detects potential deadlocks
and other locking-related issues in the kernel.
It tracks lock acquisitions and releases, building a dependency graph that is
analyzed for potential deadlocks.
lockdep is especially useful for validating the correctness of lock ordering in
the kernel.

PSI (Pressure stall information tracking)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prerequisite: ``CONFIG_PSI``

PSI is a measurement tool to identify excessive overcommits on hardware
resources that can cause performance disruptions or even OOM kills.

device coredump
---------------

Prerequisite: ``CONFIG_DEV_COREDUMP`` & ``#include <linux/devcoredump.h>``

Provides the infrastructure for a driver to provide arbitrary data to userland.
It is most often used in conjunction with udev or a similar userland
application that listens for kernel uevents, which indicate that the dump is
ready. Udev has rules to copy that file somewhere for long-term storage and
analysis, because by default the dump data is automatically cleaned up after
5 minutes. That data is analyzed with driver-specific tools or GDB.

A device coredump can be created with a vmalloc area, with read/free
methods, or as a scatter/gather list.
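
A minimal sketch of the vmalloc variant, assuming a hypothetical helper
``my_driver_coredump()`` that snapshots a driver state buffer::

  #include <linux/devcoredump.h>
  #include <linux/device.h>
  #include <linux/gfp.h>
  #include <linux/string.h>
  #include <linux/vmalloc.h>

  /* Hypothetical helper: copy a driver state buffer into a device coredump */
  static void my_driver_coredump(struct device *dev, const void *state,
                                 size_t len)
  {
          void *buf = vmalloc(len);

          if (!buf)
                  return;

          memcpy(buf, state, len);

          /* Ownership of buf passes to devcoredump, which frees it with vfree() */
          dev_coredumpv(dev, buf, len, GFP_KERNEL);
  }

The driver must not touch or free ``buf`` after the call. Userland can then
typically read the dump from a ``data`` file below ``/sys/class/devcoredump/``.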

You can find an example implementation at:
`drivers/media/platform/qcom/venus/core.c
<https://elixir.bootlin.com/linux/v6.11.6/source/drivers/media/platform/qcom/venus/core.c#L30>`__,
in the Bluetooth HCI layer, in several wireless drivers, and in several
DRM drivers.

devcoredump interfaces
~~~~~~~~~~~~~~~~~~~~~~

.. kernel-doc:: include/linux/devcoredump.h

.. kernel-doc:: drivers/base/devcoredump.c

**Copyright** ©2024 : Collabora