Bug hunting
===========

Kernel bug reports often come with a stack dump like the one below::

    ------------[ cut here ]------------
    WARNING: CPU: 1 PID: 28102 at kernel/module.c:1108 module_put+0x57/0x70
    Modules linked in: dvb_usb_gp8psk(-) dvb_usb dvb_core nvidia_drm(PO) nvidia_modeset(PO) snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_timer snd soundcore nvidia(PO) [last unloaded: rc_core]
    CPU: 1 PID: 28102 Comm: rmmod Tainted: P WC O 4.8.4-build.1 #1
    Hardware name: MSI MS-7309/MS-7309, BIOS V1.12 02/23/2009
     00000000 c12ba080 00000000 00000000 c103ed6a c1616014 00000001 00006dc6
     c1615862 00000454 c109e8a7 c109e8a7 00000009 ffffffff 00000000 f13f6a10
     f5f5a600 c103ee33 00000009 00000000 00000000 c109e8a7 f80ca4d0 c109f617
    Call Trace:
     [<c12ba080>] ? dump_stack+0x44/0x64
     [<c103ed6a>] ? __warn+0xfa/0x120
     [<c109e8a7>] ? module_put+0x57/0x70
     [<c109e8a7>] ? module_put+0x57/0x70
     [<c103ee33>] ? warn_slowpath_null+0x23/0x30
     [<c109e8a7>] ? module_put+0x57/0x70
     [<f80ca4d0>] ? gp8psk_fe_set_frontend+0x460/0x460 [dvb_usb_gp8psk]
     [<c109f617>] ? symbol_put_addr+0x27/0x50
     [<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]
     [<f80bb3bf>] ? dvb_usb_exit+0x2f/0xd0 [dvb_usb]
     [<c13d03bc>] ? usb_disable_endpoint+0x7c/0xb0
     [<f80bb48a>] ? dvb_usb_device_exit+0x2a/0x50 [dvb_usb]
     [<c13d2882>] ? usb_unbind_interface+0x62/0x250
     [<c136b514>] ? __pm_runtime_idle+0x44/0x70
     [<c13620d8>] ? __device_release_driver+0x78/0x120
     [<c1362907>] ? driver_detach+0x87/0x90
     [<c1361c48>] ? bus_remove_driver+0x38/0x90
     [<c13d1c18>] ? usb_deregister+0x58/0xb0
     [<c109fbb0>] ? SyS_delete_module+0x130/0x1f0
     [<c1055654>] ? task_work_run+0x64/0x80
     [<c1000fa5>] ? exit_to_usermode_loop+0x85/0x90
     [<c10013f0>] ? do_fast_syscall_32+0x80/0x130
     [<c1549f43>] ? sysenter_past_esp+0x40/0x6a
    ---[ end trace 6ebc60ef3981792f ]---

Such stack traces provide enough information to identify the line inside the
kernel's source code where the bug happened. Depending on the severity of
the issue, they may also contain the word **Oops**, as in this one::

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<c06969d4>] iret_exc+0x7d0/0xa59
    *pdpt = 000000002258a001 *pde = 0000000000000000
    Oops: 0002 [#1] PREEMPT SMP
    ...
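
Each entry in the ``Call Trace`` of the first example has the form
``[<address>] ? function+offset/size [module]``, where the ``?`` marker and
the module name are optional. As a rough illustration only, the sketch below
splits such an entry into its parts. The helper name ``parse_frame`` is
hypothetical, and the pattern only covers the entry format shown in that
example; other kernel versions print frames differently.

```python
import re

# Pattern for entries such as:
#   [<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]
# Assumption: only this layout is handled; older formats are not matched.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"   # address in angle brackets
    r"(?:\?\s+)?"                     # optional '?' marker
    r"(?P<symbol>[\w.]+)"             # function name
    r"\+0x(?P<offset>[0-9a-f]+)"      # offset into the function
    r"/0x(?P<size>[0-9a-f]+)"         # size of the function
    r"(?:\s+\[(?P<module>\w+)\])?"    # optional module name
)


def parse_frame(line):
    """Return the parts of one call trace entry, or None if it does not match."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    return {
        "address": int(m.group("addr"), 16),
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),   # None for built-in code
    }


frame = parse_frame(
    "[<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]"
)
print(frame["symbol"], hex(frame["offset"]), frame["module"])
```

The ``function+offset`` pair is the part that identifies a position inside
the function, independent of where the code was loaded.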

Whether it is an **Oops** or some other sort of stack trace, the offending
line is usually required to identify and handle the bug. Throughout this chapter,
we'll refer to "Oops" for all kinds of stack traces that need to be analyzed.

If the kernel is compiled with ``CONFIG_DEBUG_INFO``, you can enhance the
quality of the stack trace by using :file:`scripts/decode_stacktrace.sh`.

Modules linked in
-----------------

Modules that are tainted or are being loaded or unloaded are marked with
"(...)", where the taint flags are described in
:file:`Documentation/admin-guide/tainted-kernels.rst`, "being loaded" is
annotated with "+", and "being unloaded" is annotated with "-".


Where is the Oops message located?
----------------------------------

Normally the Oops text is read from the kernel buffers by ``klogd`` and
handed to ``syslogd`` which writes it to a syslog file, typically
``/var/log/messages`` (depends on ``/etc/syslog.conf``). On systems with
systemd, it may also be stored by the ``journald`` daemon, and accessed
by running the ``journalctl`` command.

Sometimes ``klogd`` dies, in which case you can run ``dmesg > file`` to
read the data from the kernel buffers and save it.
Or you can
``cat /proc/kmsg > file``, however you have to interrupt the command to stop
the transfer, since ``kmsg`` is a "never ending file".

If the machine has crashed so badly that you cannot enter commands or
the disk is not available then you have three options:

(1) Hand copy the text from the screen and type it in after the machine
    has restarted. Messy but it is the only option if you have not
    planned for a crash. Alternatively, you can take a picture of
    the screen with a digital camera - not nice, but better than
    nothing. If the messages scroll off the top of the console, you
    may find that booting with a higher resolution (e.g., ``vga=791``)
    will allow you to read more of the text. (Caveat: This needs ``vesafb``,
    so won't help for 'early' oopses.)

(2) Boot with a serial console (see
    :ref:`Documentation/admin-guide/serial-console.rst <serial_console>`),
    run a null modem to a second machine and capture the output there
    using your favourite communication program. Minicom works well.

(3) Use Kdump (see Documentation/admin-guide/kdump/kdump.rst),
    extract the kernel ring buffer from old memory using the dmesg
    gdbmacro in Documentation/admin-guide/kdump/gdbmacros.txt.
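
Once the log text has been saved by one of the methods above, the trace can
be cut out of the surrounding messages before it is analyzed or attached to
a report. The following is a minimal sketch, not a complete tool: it assumes
the trace is delimited by the ``cut here`` and ``end trace`` markers seen in
the first example of this chapter (many Oops messages carry no such markers),
and the function name ``extract_traces`` is hypothetical.

```python
def extract_traces(log_text):
    """Return the blocks found between 'cut here' and 'end trace' markers.

    Assumption: only traces carrying both markers are returned.
    """
    traces = []
    current = None
    for line in log_text.splitlines():
        if "[ cut here ]" in line:
            current = [line]            # start collecting a new trace
        elif current is not None:
            current.append(line)
            if "[ end trace" in line:
                traces.append("\n".join(current))
                current = None          # wait for the next start marker
    return traces


# Shortened sample log; the surrounding "usb" lines are made up.
sample = """\
usb 1-1: new device found
------------[ cut here ]------------
WARNING: CPU: 1 PID: 28102 at kernel/module.c:1108 module_put+0x57/0x70
 [<c109e8a7>] ? module_put+0x57/0x70
---[ end trace 6ebc60ef3981792f ]---
usb 1-1: USB disconnect
"""

for trace in extract_traces(sample):
    print(trace)
```

The same function can be fed the contents of a file produced with
``dmesg > file``.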

Finding the bug's location
--------------------------

Reporting a bug works best if you point to the location of the bug in the
kernel source file. There are two methods for doing that. Usually, using
``gdb`` is easier, but the kernel should be pre-compiled with debug info.

gdb
^^^

The GNU debugger (``gdb``) is the best way to figure out the exact file and line
number of the OOPS from the ``vmlinux`` file.

``gdb`` works best on a kernel compiled with ``CONFIG_DEBUG_INFO``.
This can be set by running::

    $ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO

On a kernel compiled with ``CONFIG_DEBUG_INFO``, you can simply copy the
EIP value from the OOPS::

    EIP: 0060:[<c021e50e>] Not tainted VLI

And use GDB to translate that to human-readable form::

    $ gdb vmlinux
    (gdb) l *0xc021e50e

If you don't have ``CONFIG_DEBUG_INFO`` enabled, you use the function
offset from the OOPS::

    EIP is at vt_ioctl+0xda8/0x1482

And recompile the kernel with ``CONFIG_DEBUG_INFO`` enabled::

    $ ./scripts/config -d COMPILE_TEST -e DEBUG_KERNEL -e DEBUG_INFO
    $ make vmlinux
    $ gdb vmlinux
    (gdb) l *vt_ioctl+0xda8
    0x1888 is in vt_ioctl (drivers/tty/vt/vt_ioctl.c:293).
    288     {
    289             struct vc_data *vc = NULL;
    290             int ret = 0;
    291
    292             console_lock();
    293             if (VT_BUSY(vc_num))
    294                     ret = -EBUSY;
    295             else if (vc_num)
    296                     vc = vc_deallocate(vc_num);
    297             console_unlock();

or, if you want to be more verbose::

    (gdb) p vt_ioctl
    $1 = {int (struct tty_struct *, unsigned int, unsigned long)} 0xae0 <vt_ioctl>
    (gdb) l *0xae0+0xda8

You could, instead, use the object file::

    $ make drivers/tty/
    $ gdb drivers/tty/vt/vt_ioctl.o
    (gdb) l *vt_ioctl+0xda8

If you have a call trace, such as::

    Call Trace:
     [<ffffffff8802c8e9>] :jbd:log_wait_commit+0xa3/0xf5
     [<ffffffff810482d9>] autoremove_wake_function+0x0/0x2e
     [<ffffffff8802770b>] :jbd:journal_stop+0x1be/0x1ee
     ...

this shows the problem likely is in the :jbd: module. You can load that module
in gdb and list the relevant code::

    $ gdb fs/jbd/jbd.ko
    (gdb) l *log_wait_commit+0xa3

.. note::

     You can also do the same for any function call in the stack trace,
     like this one::

         [<f80bc9ca>] ? dvb_usb_adapter_frontend_exit+0x3a/0x70 [dvb_usb]

     The position where the above call happened can be seen with::

        $ gdb drivers/media/usb/dvb-usb/dvb-usb.o
        (gdb) l *dvb_usb_adapter_frontend_exit+0x3a

objdump
^^^^^^^

To debug a kernel, use objdump and look for the hex offset from the crash
output to find the valid line of code/assembler. Without debug symbols, you
will see the assembler code for the routine shown, but if your kernel has
debug symbols the C code will also be available. (Debug symbols can be enabled
in the kernel hacking menu of the menu configuration.)
For example::

    $ objdump -r -S -l --disassemble net/dccp/ipv4.o

.. note::

   You need to be at the top level of the kernel tree for this to pick up
   your C files.

If you don't have access to the source code you can still debug some crash
dumps using the following method (example crash dump output as shown by
Dave Miller)::

     EIP is at +0x14/0x4c0
      ...
     Code: 44 24 04 e8 6f 05 00 00 e9 e8 fe ff ff 8d 76 00 8d bc 27 00 00
     00 00 55 57 56 53 81 ec bc 00 00 00 8b ac 24 d0 00 00 00 8b 5d 08
     <8b> 83 3c 01 00 00 89 44 24 14 8b 45 28 85 c0 89 44 24 18 0f 85

     Put the bytes into a "foo.s" file like this:

            .text
            .globl foo
     foo:
            .byte  .... /* bytes from Code: part of OOPS dump */

     Compile it with "gcc -c -o foo.o foo.s" then look at the output of
     "objdump --disassemble foo.o".

     Output:

     ip_queue_xmit:
         push       %ebp
         push       %edi
         push       %esi
         push       %ebx
         sub        $0xbc, %esp
         mov        0xd0(%esp), %ebp        ! %ebp = arg0 (skb)
         mov        0x8(%ebp), %ebx         ! %ebx = skb->sk
         mov        0x13c(%ebx), %eax       ! %eax = inet_sk(sk)->opt

:file:`scripts/decodecode` can be used to automate most of this, depending
on what CPU architecture is being debugged.

Reporting the bug
-----------------

Once you find where the bug happened, by inspecting its location,
you could either try to fix it yourself or report it upstream.

In order to report it upstream, you should identify the bug tracker, if any, or
mailing list used for the development of the affected code. This can be done by
using the ``get_maintainer.pl`` script.

For example, if you find a bug in the gspca sonixj.c file, you can get
its maintainers with::

    $ ./scripts/get_maintainer.pl --bug -f drivers/media/usb/gspca/sonixj.c
    Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
    Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB),commit_signer:1/1=100%)
    Tejun Heo <tj@kernel.org> (commit_signer:1/1=100%)
    Bhaktipriya Shridhar <bhaktipriya96@gmail.com> (commit_signer:1/1=100%,authored:1/1=100%,added_lines:4/4=100%,removed_lines:9/9=100%)
    linux-media@vger.kernel.org (open list:GSPCA USB WEBCAM DRIVER)
    linux-kernel@vger.kernel.org (open list)

Please note that it will point to:

- The last developers that touched the source code (if this is done inside
  a git tree). In the above example, Tejun and Bhaktipriya (in this
  specific case, neither was really involved in the development of this file);
- The driver maintainer (Hans Verkuil);
- The subsystem maintainer (Mauro Carvalho Chehab);
- The driver and/or subsystem mailing list (linux-media@vger.kernel.org);
- The Linux Kernel mailing list (linux-kernel@vger.kernel.org);
- The bug reporting URIs for the driver/subsystem (none in the above example).
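
The roles listed above can be read from the parenthesised annotation at the
end of each output line. The sketch below is only an illustration of that
idea: the helper name ``classify`` is hypothetical, and the mapping from
annotation to role is an assumption based solely on the annotations that
appear in the example output above (``maintainer``, ``odd fixer``,
``open list`` and ``commit_signer``).

```python
import re

# "<who> (<annotation>)"; the annotation may itself contain parentheses.
ENTRY_RE = re.compile(r"^(?P<who>.+?)\s+\((?P<roles>.*)\)$")


def classify(line):
    """Return (kind, who) for one output line, or None if it does not match."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None
    roles = m.group("roles")
    if roles.startswith("open list"):
        kind = "list"
    elif roles.startswith(("maintainer", "odd fixer")):
        kind = "maintainer"
    else:
        kind = "contributor"        # e.g. commit_signer entries
    return kind, m.group("who")


output = """\
Hans Verkuil <hverkuil@xs4all.nl> (odd fixer:GSPCA USB WEBCAM DRIVER,commit_signer:1/1=100%)
Mauro Carvalho Chehab <mchehab@kernel.org> (maintainer:MEDIA INPUT INFRASTRUCTURE (V4L/DVB),commit_signer:1/1=100%)
Tejun Heo <tj@kernel.org> (commit_signer:1/1=100%)
linux-media@vger.kernel.org (open list:GSPCA USB WEBCAM DRIVER)
linux-kernel@vger.kernel.org (open list)
"""

for entry in output.splitlines():
    print(classify(entry))
```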

If the listing contains bug reporting URIs at the end, please prefer them over
email. Otherwise, please report bugs to the mailing list used for the
development of the code (linux-media ML) copying the driver maintainer (Hans).

If you are totally stumped as to whom to send the report, and
``get_maintainer.pl`` didn't provide you anything useful, send it to
linux-kernel@vger.kernel.org.

Thanks for your help in making Linux as stable as humanly possible.

Fixing the bug
--------------

If you know programming, you could help us by not only reporting the bug,
but also providing us with a solution. After all, open source is about
sharing what you do, and don't you want to be recognised for your genius?

If you decide to take this route, once you have worked out a fix please submit
it upstream.

Please do read
:ref:`Documentation/process/submitting-patches.rst <submittingpatches>` though
to help your code get accepted.


---------------------------------------------------------------------------

Notes on Oops tracing with ``klogd``
------------------------------------

In order to help Linus and the other kernel developers there has been
substantial support incorporated into ``klogd`` for processing protection
faults. In order to have full support for address resolution at least
version 1.3-pl3 of the ``sysklogd`` package should be used.

When a protection fault occurs the ``klogd`` daemon automatically
translates important addresses in the kernel log messages to their
symbolic equivalents. This translated kernel message is then
forwarded through whatever reporting mechanism ``klogd`` is using. The
protection fault message can be simply cut out of the message files
and forwarded to the kernel developers.

Two types of address resolution are performed by ``klogd``. The first is
static translation and the second is dynamic translation.
Static translation uses the System.map file.
In order to do static translation the ``klogd`` daemon
must be able to find a system map file at daemon initialization time.
See the ``klogd`` man page for information on how ``klogd`` searches for map
files.

Dynamic address translation is important when kernel loadable modules
are being used. Since memory for kernel modules is allocated from the
kernel's dynamic memory pools there are no fixed locations for either
the start of the module or for functions and symbols in the module.

The kernel supports system calls which allow a program to determine
which modules are loaded and their location in memory. Using these
system calls the ``klogd`` daemon builds a symbol table which can be used
to debug a protection fault which occurs in a loadable kernel module.

At the very minimum ``klogd`` will provide the name of the module which
generated the protection fault. There may be additional symbolic
information available if the developer of the loadable module chose to
export symbol information from the module.

Since the kernel module environment can be dynamic there must be a
mechanism for notifying the ``klogd`` daemon when a change in module
environment occurs. There are command line options available which
allow ``klogd`` to signal the currently executing daemon that symbol
information should be refreshed.
See the ``klogd`` manual page for more
information.

A patch is included with the sysklogd distribution which modifies the
``modules-2.0.0`` package to automatically signal ``klogd`` whenever a module
is loaded or unloaded. Applying this patch provides essentially
seamless support for debugging protection faults which occur with
kernel loadable modules.

The following is an example of a protection fault in a loadable module
processed by ``klogd``::

    Aug 29 09:51:01 blizard kernel: Unable to handle kernel paging request at virtual address f15e97cc
    Aug 29 09:51:01 blizard kernel: current->tss.cr3 = 0062d000, %cr3 = 0062d000
    Aug 29 09:51:01 blizard kernel: *pde = 00000000
    Aug 29 09:51:01 blizard kernel: Oops: 0002
    Aug 29 09:51:01 blizard kernel: CPU: 0
    Aug 29 09:51:01 blizard kernel: EIP: 0010:[oops:_oops+16/3868]
    Aug 29 09:51:01 blizard kernel: EFLAGS: 00010212
    Aug 29 09:51:01 blizard kernel: eax: 315e97cc ebx: 003a6f80 ecx: 001be77b edx: 00237c0c
    Aug 29 09:51:01 blizard kernel: esi: 00000000 edi: bffffdb3 ebp: 00589f90 esp: 00589f8c
    Aug 29 09:51:01 blizard kernel: ds: 0018 es: 0018 fs: 002b gs: 002b ss: 0018
    Aug 29 09:51:01 blizard kernel: Process oops_test (pid: 3374, process nr: 21, stackpage=00589000)
    Aug 29 09:51:01 blizard kernel: Stack: 315e97cc 00589f98 0100b0b4 bffffed4 0012e38e 00240c64 003a6f80 00000001
    Aug 29 09:51:01 blizard kernel: 00000000 00237810 bfffff00 0010a7fa 00000003 00000001 00000000 bfffff00
    Aug 29 09:51:01 blizard kernel: bffffdb3 bffffed4 ffffffda 0000002b 0007002b 0000002b 0000002b 00000036
    Aug 29 09:51:01 blizard kernel: Call Trace: [oops:_oops_ioctl+48/80] [_sys_ioctl+254/272] [_system_call+82/128]
    Aug 29 09:51:01 blizard kernel: Code: c7 00 05 00 00 00 eb 08 90 90 90 90 90 90 90 90 89 ec 5d c3

---------------------------------------------------------------------------

::

  Dr. G.W. Wettstein           Oncology Research Div. Computing Facility
  Roger Maris Cancer Center    INTERNET: greg@wind.rmcc.com
  820 4th St. N.
  Fargo, ND  58122
  Phone: 701-234-7556
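
The static translation described in the ``klogd`` notes above amounts to
finding the last symbol in the system map whose start address is not greater
than the address being resolved. The sketch below illustrates that lookup
only; it is not how ``klogd`` itself is implemented. The helper names
``load_map`` and ``resolve`` are hypothetical, and the three map entries are
reconstructed from the first trace in this chapter by subtracting each offset
from its address, with the ``T`` type letter assumed.

```python
import bisect


def load_map(text):
    """Parse 'address type name' lines into a sorted list of (address, name)."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue                    # skip anything that is not a map entry
        addr, _type, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Return 'name+0xoffset' for the symbol containing address, or None."""
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None                     # address lies below the first symbol
    start, name = symbols[i]
    return f"{name}+{address - start:#x}"


# Illustrative entries only (see the assumptions stated above).
SYSTEM_MAP = """\
c109e850 T module_put
c109f5f0 T symbol_put_addr
c109fa80 T SyS_delete_module
"""

symbols = load_map(SYSTEM_MAP)
print(resolve(symbols, 0xc109e8a7))
```

A real lookup would also need the size of each symbol to reject addresses
that fall in a gap after the end of a function; this sketch does not check
that.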