Bug hunting
===========

Kernel bug reports often come with a stack dump like the one below::

	------------[ cut here ]------------
	WARNING: CPU: 1 PID: 28102 at kernel/module.c:1108 module_put+0x57/0x70
	Modules linked in: dvb_usb_gp8psk(-) dvb_usb dvb_core nvidia_drm(PO) nvidia_modeset(PO) snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd soundcore nvidia(PO) [last unloaded: rc_core]
	CPU: 1 PID: 28102 Comm: rmmod Tainted: P        WC O 4.8.4-build.1 #1
	Hardware name: MSI MS-7309/MS-7309, BIOS V1.12 02/23/2009
	 00000000 c12ba080 00000000 00000000 c103ed6a c1616014 00000001 00006dc6
	 c1615862 00000454 c109e8a7 c109e8a7 00000009 ffffffff 00000000 f13f6a10
	 f5f5a600 c103ee33 00000009 00000000 00000000 c109e8a7 f80ca4d0 c109f617
	Call Trace:
	 [<c12ba080>] ? dump_stack+0x44/0x64
	 [<c103ed6a>] ? __warn+0xfa/0x120
	 [<c109e8a7>] ? module_put+0x57/0x70
	 [<c109e8a7>] ? module_put+0x57/0x70
	 [<c103ee33>] ? warn_slowpath_null+0x23/0x30
	 [<c109e8a7>] ? module_put+0x57/0x70
	 [<f80ca4d0>] ? gp8psk_fe_set_frontend+0x460/0x460 [dvb_usb_gp8psk]
	 [<c109f617>] ? symbol_put_addr+0x27/0x50
	 [<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]
	 [<f80bb3bf>] ? dvb_usb_exit+0x2f/0xd0 [dvb_usb]
	 [<c13d03bc>] ? usb_disable_endpoint+0x7c/0xb0
	 [<f80bb48a>] ? dvb_usb_device_exit+0x2a/0x50 [dvb_usb]
	 [<c13d2882>] ? usb_unbind_interface+0x62/0x250
	 [<c136b514>] ? __pm_runtime_idle+0x44/0x70
	 [<c13620d8>] ? __device_release_driver+0x78/0x120
	 [<c1362907>] ? driver_detach+0x87/0x90
	 [<c1361c48>] ? bus_remove_driver+0x38/0x90
	 [<c13d1c18>] ? usb_deregister+0x58/0xb0
	 [<c109fbb0>] ? SyS_delete_module+0x130/0x1f0
	 [<c1055654>] ? task_work_run+0x64/0x80
	 [<c1000fa5>] ? exit_to_usermode_loop+0x85/0x90
	 [<c10013f0>] ? do_fast_syscall_32+0x80/0x130
	 [<c1549f43>] ? sysenter_past_esp+0x40/0x6a
	---[ end trace 6ebc60ef3981792f ]---

Such stack traces provide enough information to identify the line inside the
kernel's source code where the bug happened. Depending on the severity of
the issue, it may also contain the word **Oops**, as in this one::

	BUG: unable to handle kernel NULL pointer dereference at   (null)
	IP: [<c06969d4>] iret_exc+0x7d0/0xa59
	*pdpt = 000000002258a001 *pde = 0000000000000000
	Oops: 0002 [#1] PREEMPT SMP
	...

Whether it is an **Oops** or some other sort of stack trace, the offending
line is usually required to identify and handle the bug. Throughout this
chapter, we'll use "Oops" to refer to all kinds of stack traces that need to
be analyzed.

If the kernel is compiled with ``CONFIG_DEBUG_INFO``, you can enhance the
quality of the stack trace by using :file:`scripts/decode_stacktrace.sh`.

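As a rough sketch of a typical invocation: the script reads the raw trace on
standard input and takes the path of the matching ``vmlinux`` as its first
argument. The wrapper name ``decode_oops`` and the file names used with it are
placeholders chosen for this example, and the wrapper assumes it is run from
the top of the kernel tree.

```shell
# Hypothetical helper: decode a saved trace with scripts/decode_stacktrace.sh.
# Usage: decode_oops <vmlinux> <file containing the raw trace>
decode_oops() {
    if [ ! -x ./scripts/decode_stacktrace.sh ]; then
        echo "decode_oops: run this from the top of the kernel tree" >&2
        return 1
    fi
    # The script reads the trace on stdin and writes the decoded trace
    # to stdout.
    ./scripts/decode_stacktrace.sh "$1" < "$2"
}
```

For instance, ``decode_oops vmlinux oops.txt > decoded.txt`` would keep the
decoded trace next to the original one.
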
Modules linked in
-----------------

Modules that are tainted or are being loaded or unloaded are marked with
"(...)", where the taint flags are described in
:file:`Documentation/admin-guide/tainted-kernels.rst`, "being loaded" is
annotated with "+", and "being unloaded" is annotated with "-".

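As an illustration, the annotated entries can be pulled out of a saved
``Modules linked in:`` line with standard text tools. This is only a sketch:
the function name ``annotated_modules`` is made up for this example, and the
sample line is shortened from the trace at the top of this document.

```shell
# Print every module from a "Modules linked in:" line that carries an
# annotation in parentheses (taint flags, "+" for loading, "-" for unloading).
annotated_modules() {
    # Drop the trailing "[last unloaded: ...]" part, put one word per line,
    # then keep only the words that end in "(...)".
    sed -e 's/\[last unloaded:[^]]*\]//' |
        tr -s ' \t' '\n\n' |
        grep -E '^[A-Za-z0-9_]+\([^)]*\)$'
}

line='Modules linked in: dvb_usb_gp8psk(-) dvb_usb dvb_core nvidia_drm(PO) nvidia_modeset(PO) snd soundcore nvidia(PO) [last unloaded: rc_core]'
printf '%s\n' "$line" | annotated_modules
```

For the sample line this prints ``dvb_usb_gp8psk(-)``, ``nvidia_drm(PO)``,
``nvidia_modeset(PO)`` and ``nvidia(PO)``, one per line.
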
Where is the Oops message located?
-------------------------------------

Normally the Oops text is read from the kernel buffers by ``klogd`` and
handed to ``syslogd``, which writes it to a syslog file, typically
``/var/log/messages`` (depends on ``/etc/syslog.conf``). On systems with
systemd, it may also be stored by the ``journald`` daemon, and accessed
by running the ``journalctl`` command.

Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to
read the data from the kernel buffers and save it.  Or you can
``cat /proc/kmsg > file``; however, you have to interrupt it to stop the
transfer, since ``kmsg`` is a "never ending file".

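Once the log is in a file, the trace itself can be cut out of the surrounding
messages. The sketch below assumes the trace is delimited by the ``cut here``
and ``end trace`` marker lines seen in the first example; traces without those
markers need a different pattern. The function name ``extract_trace`` is a
placeholder.

```shell
# Print only the lines from a "[ cut here ]" marker up to the following
# "[ end trace ... ]" marker. Reads the named files, or stdin if none given.
extract_trace() {
    sed -n '/\[ cut here \]/,/\[ end trace/p' "$@"
}
```

It could be used as ``dmesg | extract_trace > oops.txt`` or, on a syslog-based
system, as ``extract_trace /var/log/messages``.
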
If the machine has crashed so badly that you cannot enter commands or
the disk is not available, then you have three options:

(1) Hand copy the text from the screen and type it in after the machine
    has restarted.  Messy, but it is the only option if you have not
    planned for a crash. Alternatively, you can take a picture of
    the screen with a digital camera - not nice, but better than
    nothing.  If the messages scroll off the top of the console, you
    may find that booting with a higher resolution (e.g., ``vga=791``)
    will allow you to read more of the text. (Caveat: This needs ``vesafb``,
    so won't help for 'early' oopses.)

(2) Boot with a serial console (see
    :ref:`Documentation/admin-guide/serial-console.rst <serial_console>`),
    run a null modem to a second machine and capture the output there
    using your favourite communication program.  Minicom works well.

(3) Use Kdump (see Documentation/admin-guide/kdump/kdump.rst) and
    extract the kernel ring buffer from old memory using the dmesg
    gdbmacro in Documentation/admin-guide/kdump/gdbmacros.txt.

Finding the bug's location
--------------------------

Reporting a bug works best if you point to the location of the bug in the
kernel source file. There are two methods for doing that. Usually, using
``gdb`` is easier, but the kernel must be compiled with debug info.

gdb
^^^

The GNU debugger (``gdb``) is the best way to figure out the exact file and line
number of the Oops from the ``vmlinux`` file.

``gdb`` works best on a kernel compiled with ``CONFIG_DEBUG_INFO``.
This can be set by running::

  $ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO

On a kernel compiled with ``CONFIG_DEBUG_INFO``, you can simply copy the
EIP value from the Oops::

 EIP:    0060:[<c021e50e>]    Not tainted VLI

And use ``gdb`` to translate that to human-readable form::

  $ gdb vmlinux
  (gdb) l *0xc021e50e

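The copy-and-paste step can also be scripted. The following sketch turns an
``EIP:`` line of the form shown above into the corresponding ``gdb`` command;
the function name ``eip_to_gdb_cmd`` is made up for this example, and other
Oops formats would need a different pattern.

```shell
# Turn an "EIP: ... [<address>]" line into the matching gdb "list" command.
eip_to_gdb_cmd() {
    sed -n 's/.*EIP:.*\[<\([0-9a-f]\{1,\}\)>\].*/l *0x\1/p'
}

printf '%s\n' ' EIP:    0060:[<c021e50e>]    Not tainted VLI' | eip_to_gdb_cmd
```

This prints ``l *0xc021e50e``, which can be handed to gdb non-interactively,
for example with ``gdb -batch -ex 'l *0xc021e50e' vmlinux``.
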
If you don't have ``CONFIG_DEBUG_INFO`` enabled, use the function
offset from the Oops::

 EIP is at vt_ioctl+0xda8/0x1482

And recompile the kernel with ``CONFIG_DEBUG_INFO`` enabled::

  $ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO
  $ make vmlinux
  $ gdb vmlinux
  (gdb) l *vt_ioctl+0xda8
  0x1888 is in vt_ioctl (drivers/tty/vt/vt_ioctl.c:293).
  288	{
  289		struct vc_data *vc = NULL;
  290		int ret = 0;
  291
  292		console_lock();
  293		if (VT_BUSY(vc_num))
  294			ret = -EBUSY;
  295		else if (vc_num)
  296			vc = vc_deallocate(vc_num);
  297		console_unlock();

or, if you want to be more verbose::

  (gdb) p vt_ioctl
  $1 = {int (struct tty_struct *, unsigned int, unsigned long)} 0xae0 <vt_ioctl>
  (gdb) l *0xae0+0xda8

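The address ``0x1888`` that gdb printed for ``vt_ioctl+0xda8`` is just the
start address reported by ``p vt_ioctl`` plus the offset from the Oops. Such
sums are easy to check with shell arithmetic (the helper name ``sym_addr`` is
made up for this sketch):

```shell
# Add a function's start address and an offset, both given in hex with a
# 0x prefix, and print the result in hex.
sym_addr() {
    printf '0x%x\n' $(($1 + $2))
}

# 0xae0 is the start of vt_ioctl, 0xda8 the offset from the Oops.
sym_addr 0xae0 0xda8
```

This prints ``0x1888``, matching the address in the gdb output above.
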
You could, instead, use the object file::

  $ make drivers/tty/
  $ gdb drivers/tty/vt/vt_ioctl.o
  (gdb) l *vt_ioctl+0xda8

If you have a call trace, such as::

     Call Trace:
      [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
      [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
      [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
      ...

this shows the problem is likely in the :jbd: module. You can load that module
in gdb and list the relevant code::

  $ gdb fs/jbd/jbd.ko
  (gdb) l *log_wait_commit+0xa3

.. note::

     You can also do the same for any function call in the stack trace,
     like this one::

	 [<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]

     The position where the above call happened can be seen with::

	$ gdb drivers/media/usb/dvb-usb/dvb-usb.o
	(gdb) l *dvb_usb_adapter_frontend_exit+0x3a

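When a trace has many entries, the ``l *symbol+offset`` commands can be
generated instead of typed. This sketch only handles lines in the
``[<address>] ? symbol+0xoff/0xsize`` form used in the first example of this
document; the function name ``trace_to_gdb`` is made up for illustration.

```shell
# For each "[<address>] ? symbol+0xoff/0xsize [module]" line on stdin, print
# the gdb command that lists the code at symbol+0xoff. Other lines are
# silently skipped.
trace_to_gdb() {
    sed -n 's/^.*\[<[0-9a-f]*>\][? ]*\([A-Za-z0-9_.]*+0x[0-9a-f]*\)\/0x[0-9a-f]*.*$/l *\1/p'
}

printf '%s\n' \
    'Call Trace:' \
    ' [<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]' |
    trace_to_gdb
```

This prints ``l *dvb_usb_adapter_frontend_exit+0x3a``. For symbols that live in
a module, the command still has to be run in a gdb session that has the
module's object file loaded, as described in the note above.
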
objdump
^^^^^^^

To debug a kernel, use objdump and look for the hex offset from the crash
output to find the valid line of code/assembler. Without debug symbols, you
will see the assembler code for the routine shown, but if your kernel has
debug symbols the C code will also be available. (Debug symbols can be enabled
in the kernel hacking menu of the menu configuration.) For example::

    $ objdump -r -S -l --disassemble net/dccp/ipv4.o

.. note::

   You need to be at the top level of the kernel tree for this to pick up
   your C files.

If you don't have access to the source code you can still debug some crash
dumps using the following method (example crash dump output as shown by
Dave Miller)::

     EIP is at 	+0x14/0x4c0
      ...
     Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
     00 00 55 57  56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
     <8b> 83 3c 01 00 00 89 44  24 14 8b 45 28 85 c0 89 44 24 18 0f 85

     Put the bytes into a "foo.s" file like this:

            .text
            .globl foo
     foo:
            .byte  .... /* bytes from Code: part of OOPS dump */

     Compile it with "gcc -c -o foo.o foo.s" then look at the output of
     "objdump --disassemble foo.o".

     Output:

     ip_queue_xmit:
         push       %ebp
         push       %edi
         push       %esi
         push       %ebx
         sub        $0xbc, %esp
         mov        0xd0(%esp), %ebp        ! %ebp = arg0 (skb)
         mov        0x8(%ebp), %ebx         ! %ebx = skb->sk
         mov        0x13c(%ebx), %eax       ! %eax = inet_sk(sk)->opt

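The step of putting the bytes into ``foo.s`` can be sketched as a small shell
function. It expects only the ``Code:`` line (and any continuation lines) on
standard input, and it drops the ``<`` ``>`` marker around the faulting byte,
so note that position separately. The function name ``code_to_asm`` is made up
for this example.

```shell
# Convert the "Code:" bytes of an Oops into an assembler source file
# that defines them as data under the label "foo".
code_to_asm() {
    # Strip the "Code:" label and the <> around the faulting byte, join
    # all lines, then turn "44 24" into "0x44, 0x24".
    bytes=$(sed -e 's/^.*Code: *//' -e 's/[<>]//g' |
        tr -s ' \t\n' '   ' |
        sed -e 's/^ *//' -e 's/ *$//' \
            -e 's/[0-9a-f][0-9a-f]/0x&/g' -e 's/ /, /g')
    printf '\t.text\n\t.globl foo\nfoo:\n\t.byte %s\n' "$bytes"
}

printf '%s\n' 'Code: 44 24 04 <8b> 83' | code_to_asm
```

Redirect the output to ``foo.s`` and continue with the ``gcc`` and ``objdump``
commands quoted above.
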
:file:`scripts/decodecode` can be used to automate most of this, depending
on what CPU architecture is being debugged.

Reporting the bug
-----------------

Once you find where the bug happened, by inspecting its location,
you could either try to fix it yourself or report it upstream.

In order to report it upstream, you should identify the bug tracker, if any, or
mailing list used for the development of the affected code. This can be done by
using the ``get_maintainer.pl`` script.

For example, if you find a bug in the gspca ``sonixj.c`` file, you can get
its maintainers with::

	$ ./scripts/get_maintainer.pl --bug -f drivers/media/usb/gspca/sonixj.c
	Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
	Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB),commit_signer:1/1=100%)
	Tejun Heo <tj@kernel.org> (commit_signer:1/1=100%)
	Bhaktipriya Shridhar <bhaktipriya96@gmail.com> (commit_signer:1/1=100%,authored:1/1=100%,added_lines:4/4=100%,removed_lines:9/9=100%)
	linux-media@vger.kernel.org (open list:GSPCA USB WEBCAM DRIVER)
	linux-kernel@vger.kernel.org (open list)

Please note that it will point to:

- The last developers that touched the source code (if this is done inside
  a git tree). In the above example, Tejun and Bhaktipriya (in this
  specific case, neither was really involved in the development of this file);
- The driver maintainer (Hans Verkuil);
- The subsystem maintainer (Mauro Carvalho Chehab);
- The driver and/or subsystem mailing list (linux-media@vger.kernel.org);
- The Linux Kernel mailing list (linux-kernel@vger.kernel.org);
- The bug reporting URIs for the driver/subsystem (none in the above example).

If the listing contains bug reporting URIs at the end, please prefer them over
email. Otherwise, please report bugs to the mailing list used for the
development of the code (linux-media ML) copying the driver maintainer (Hans).

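To pick the mailing lists out of a long listing, the output can be filtered.
The sketch below keys on the ``open list`` role text seen in the example
output; lists reported with a different role text would not be matched. The
function name ``open_lists`` is made up for illustration.

```shell
# From get_maintainer.pl output on stdin, print only the addresses whose
# role (the text in parentheses) mentions "open list".
open_lists() {
    sed -n 's/^[[:space:]]*\([^[:space:]]*@[^[:space:]]*\) (.*open list.*$/\1/p'
}

printf '%s\n' \
    'Tejun Heo <tj@kernel.org> (commit_signer:1/1=100%)' \
    'linux-media@vger.kernel.org (open list:GSPCA USB WEBCAM DRIVER)' \
    'linux-kernel@vger.kernel.org (open list)' |
    open_lists
```

This prints the two list addresses and skips the individual developers. With a
real tree, the input would come from piping the ``get_maintainer.pl`` command
shown above into ``open_lists``.
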
If you are totally stumped as to whom to send the report, and
``get_maintainer.pl`` didn't provide you anything useful, send it to
linux-kernel@vger.kernel.org.

Thanks for your help in making Linux as stable as humanly possible.

Fixing the bug
--------------

If you know programming, you could help us by not only reporting the bug,
but also providing us with a solution. After all, open source is about
sharing what you do, and don't you want to be recognised for your genius?

If you decide to go this way, once you have worked out a fix, please submit
it upstream.

Please do read
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` though
to help your code get accepted.

---------------------------------------------------------------------------

Notes on Oops tracing with ``klogd``
------------------------------------

In order to help Linus and the other kernel developers there has been
substantial support incorporated into ``klogd`` for processing protection
faults.  In order to have full support for address resolution at least
version 1.3-pl3 of the ``sysklogd`` package should be used.

When a protection fault occurs the ``klogd`` daemon automatically
translates important addresses in the kernel log messages to their
symbolic equivalents.  This translated kernel message is then
forwarded through whatever reporting mechanism ``klogd`` is using.  The
protection fault message can be simply cut out of the message files
and forwarded to the kernel developers.

Two types of address resolution are performed by ``klogd``.  The first is
static translation and the second is dynamic translation.
Static translation uses the System.map file.  In order to do static
translation the ``klogd`` daemon must be able to find a system map file at
daemon initialization time.  See the ``klogd`` man page for information on
how ``klogd`` searches for map files.

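The idea behind static translation can be sketched in a few lines of shell:
find the last symbol in the map whose address is not above the address being
resolved, and report the distance to it. This is an illustration of the
lookup, not ``klogd``'s actual code. It assumes 32-bit addresses written as 8
lowercase hex digits and a map sorted by address; the function name
``resolve_addr`` is made up.

```shell
# Resolve ADDR against a System.map-style file and print "symbol+0xoffset".
# Usage: resolve_addr <addr without 0x prefix> <map file>
resolve_addr() {
    addr=$1
    # Prepending "" forces awk to compare the addresses as strings, which
    # gives the same order as a numeric comparison for fixed-width hex.
    set -- $(awk -v a="$addr" \
        '("" $1) <= ("" a) { base = $1; sym = $3 } END { print base, sym }' "$2")
    [ $# -eq 2 ] || return 1    # no symbol at or below ADDR
    printf '%s+0x%x\n' "$2" $((0x$addr - 0x$1))
}
```

Given a map that contains the line ``c109e850 T module_put`` (an address
derived from the first trace in this document), ``resolve_addr c109e8a7
System.map`` would report ``module_put+0x57``.
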
Dynamic address translation is important when kernel loadable modules
are being used.  Since memory for kernel modules is allocated from the
kernel's dynamic memory pools there are no fixed locations for either
the start of the module or for functions and symbols in the module.

The kernel supports system calls which allow a program to determine
which modules are loaded and their location in memory.  Using these
system calls the ``klogd`` daemon builds a symbol table which can be used
to debug a protection fault which occurs in a loadable kernel module.

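On current systems the list of loaded modules and their load addresses can
also be read from ``/proc/modules``. That is a different mechanism from the
system calls mentioned above and is shown here only as an assumption about
what is convenient today. The function name ``module_addresses`` is made up
for this sketch.

```shell
# Print "load-address module-name" for every loaded module, sorted by
# address. Reads /proc/modules unless another file in the same format
# (name size refcount users state address) is given as the first argument.
module_addresses() {
    awk '{ print $6, $1 }' "${1:-/proc/modules}" | sort
}
```

Unprivileged users may see the addresses reported as zero, so this is best
run as root.
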
At the very minimum ``klogd`` will provide the name of the module which
generated the protection fault.  There may be additional symbolic
information available if the developer of the loadable module chose to
export symbol information from the module.

Since the kernel module environment can be dynamic there must be a
mechanism for notifying the ``klogd`` daemon when a change in module
environment occurs.  There are command line options available which
allow ``klogd`` to signal the currently executing daemon that symbol
information should be refreshed.  See the ``klogd`` manual page for more
information.

A patch is included with the sysklogd distribution which modifies the
``modules-2.0.0`` package to automatically signal ``klogd`` whenever a module
is loaded or unloaded.  Applying this patch provides essentially
seamless support for debugging protection faults which occur with
kernel loadable modules.

The following is an example of a protection fault in a loadable module
processed by ``klogd``::

	Aug 29 09:51:01 blizard kernel: Unable to handle kernel paging request at virtual address f15e97cc
	Aug 29 09:51:01 blizard kernel: current->tss.cr3 = 0062d000, %cr3 = 0062d000
	Aug 29 09:51:01 blizard kernel: *pde = 00000000
	Aug 29 09:51:01 blizard kernel: Oops: 0002
	Aug 29 09:51:01 blizard kernel: CPU:    0
	Aug 29 09:51:01 blizard kernel: EIP:    0010:[oops:_oops+16/3868]
	Aug 29 09:51:01 blizard kernel: EFLAGS: 00010212
	Aug 29 09:51:01 blizard kernel: eax: 315e97cc   ebx: 003a6f80   ecx: 001be77b   edx: 00237c0c
	Aug 29 09:51:01 blizard kernel: esi: 00000000   edi: bffffdb3   ebp: 00589f90   esp: 00589f8c
	Aug 29 09:51:01 blizard kernel: ds: 0018   es: 0018   fs: 002b   gs: 002b   ss: 0018
	Aug 29 09:51:01 blizard kernel: Process oops_test (pid: 3374, process nr: 21, stackpage=00589000)
	Aug 29 09:51:01 blizard kernel: Stack: 315e97cc 00589f98 0100b0b4 bffffed4 0012e38e 00240c64 003a6f80 00000001
	Aug 29 09:51:01 blizard kernel:        00000000 00237810 bfffff00 0010a7fa 00000003 00000001 00000000 bfffff00
	Aug 29 09:51:01 blizard kernel:        bffffdb3 bffffed4 ffffffda 0000002b 0007002b 0000002b 0000002b 00000036
	Aug 29 09:51:01 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/272] [_system_call+82/128]
	Aug 29 09:51:01 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3

---------------------------------------------------------------------------

::

  Dr. G.W. Wettstein           Oncology Research Div. Computing Facility
  Roger Maris Cancer Center    INTERNET: greg@wind.rmcc.com
  820 4th St. N.
  Fargo, ND  58122
  Phone: 701-234-7556