.. SPDX-License-Identifier: GPL-2.0

Debugging AMD Zen systems
+++++++++++++++++++++++++

Introduction
============

This document describes techniques that are useful for debugging issues with
AMD Zen systems. It is intended for use by developers and technical users
to help identify and resolve issues.

S3 vs s2idle
============

On AMD systems, it's not possible to simultaneously support suspend-to-RAM (S3)
and suspend-to-idle (s2idle). To confirm which mode your system supports you
can look at ``cat /sys/power/mem_sleep``. If it shows ``s2idle [deep]`` then
*S3* is supported. If it shows ``[s2idle]`` then *s2idle* is
supported.

On systems that support *S3*, the firmware will be utilized to put all hardware into
the appropriate low power state.

On systems that support *s2idle*, the kernel will be responsible for transitioning devices
into the appropriate low power state. When all devices are in the appropriate low
power state, the hardware will transition into a hardware sleep state.

After a suspend cycle you can tell how much time was spent in a hardware sleep
state by looking at ``cat /sys/power/suspend_stats/last_hw_sleep``.
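
The ``mem_sleep`` check above can also be scripted. The following is a minimal
Python sketch, not part of any kernel tooling. The function names are made up
for illustration, and it relies only on the convention described above that
the active mode is the bracketed entry::

  def active_sleep_mode(mem_sleep_text):
      """Return the bracketed (active) entry of mem_sleep, or None."""
      for token in mem_sleep_text.split():
          if token.startswith("[") and token.endswith("]"):
              return token[1:-1]
      return None

  def supported_suspend_type(mem_sleep_text):
      """Map the active mem_sleep entry to the names used in this document."""
      names = {"deep": "S3", "s2idle": "s2idle"}
      return names.get(active_sleep_mode(mem_sleep_text), "unknown")

  def read_sysfs(path):
      """Read a sysfs attribute and return it as stripped text."""
      with open(path) as f:
          return f.read().strip()

Calling ``supported_suspend_type(read_sysfs("/sys/power/mem_sleep"))`` should
return ``S3`` or ``s2idle`` on the systems described above, and
``read_sysfs("/sys/power/suspend_stats/last_hw_sleep")`` returns the raw value
of the hardware sleep counter after a suspend cycle.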

This flowchart explains how the AMD s2idle suspend flow works.

.. kernel-figure:: suspend.svg

This flowchart explains how the AMD s2idle resume flow works.

.. kernel-figure:: resume.svg

s2idle debugging tool
=====================

As there are a lot of places that problems can occur, a debugging tool has been
created at
`amd-debug-tools <https://git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/about/>`_
that can help test for common problems and offer suggestions.

If you have an s2idle issue, it's best to start with this tool and follow the
instructions from its findings. If you continue to have an issue, file a bug
with the report generated by the tool at
`drm/amd gitlab <https://gitlab.freedesktop.org/drm/amd/-/issues/new?issuable_template=s2idle_BUG_TEMPLATE>`_.

Spurious s2idle wakeups from an IRQ
===================================

Spurious wakeups will generally have an IRQ set to ``/sys/power/pm_wakeup_irq``.
This can be matched to ``/proc/interrupts`` to determine what device woke the system.
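
Matching the IRQ by hand can be tedious on systems with many interrupts. A
minimal Python sketch of the lookup follows. The function name is
hypothetical, and the sketch assumes the usual ``/proc/interrupts`` layout:
a header row naming the CPUs, followed by rows of the form
``IRQ: per-CPU counts, then a description``::

  def describe_wakeup_irq(interrupts_text, irq):
      """Return the description columns for an IRQ, or None if not listed."""
      lines = interrupts_text.splitlines()
      if not lines:
          return None
      # The header row has one column per CPU; use it to skip the counters.
      cpu_count = len(lines[0].split())
      wanted = str(irq).strip()
      for line in lines[1:]:
          label, sep, rest = line.partition(":")
          if sep and label.strip() == wanted:
              return " ".join(rest.split()[cpu_count:])
      return None

  def read_text(path):
      """Read a procfs or sysfs file as text."""
      with open(path) as f:
          return f.read()

After a wakeup,
``describe_wakeup_irq(read_text("/proc/interrupts"), read_text("/sys/power/pm_wakeup_irq"))``
returns the interrupt controller and device names for the row that matches,
or ``None`` if no row matches.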

If this isn't enough to debug the problem, then the following sysfs files
can be set to add more verbosity to the wakeup process: ::

  # echo 1 | sudo tee /sys/power/pm_debug_messages
  # echo 1 | sudo tee /sys/power/pm_print_times

After making those changes, the kernel will display messages that can
be traced back to kernel s2idle loop code as well as display any active
GPIO sources while waking up.

If the wakeup is caused by the ACPI SCI, additional ACPI debugging may be
needed. These commands can enable additional trace data: ::

  # echo enable | sudo tee /sys/module/acpi/parameters/trace_state
  # echo 1 | sudo tee /sys/module/acpi/parameters/aml_debug_output
  # echo 0x0800000f | sudo tee /sys/module/acpi/parameters/debug_level
  # echo 0xffff0000 | sudo tee /sys/module/acpi/parameters/debug_layer

Spurious s2idle wakeups from a GPIO
===================================

If a GPIO is active when waking up the system ideally you would look at the
schematic to determine what device it is associated with. If the schematic
is not available, another tactic is to look at the ACPI _EVT() entry
to determine what device is notified when that GPIO is active.

For a hypothetical example, say that GPIO 59 woke up the system. You can
look at the SSDT to determine what device is notified when GPIO 59 is active.

First convert the GPIO number into hex. ::

  $ python3 -c "print(hex(59))"
  0x3b

Next determine which ACPI table has the ``_EVT`` entry. For example: ::

  $ sudo grep EVT /sys/firmware/acpi/tables/SSDT*
  grep: /sys/firmware/acpi/tables/SSDT27: binary file matches

Decode this table::

  $ sudo cp /sys/firmware/acpi/tables/SSDT27 .
  $ sudo iasl -d SSDT27

Then look at the table and find the matching entry for GPIO 0x3b. ::

  Case (0x3B)
  {
      M000 (0x393B)
      M460 (" Notify (\\_SB.PCI0.GP17.XHC1, 0x02)\n", Zero, Zero, Zero, Zero, Zero, Zero)
      Notify (\_SB.PCI0.GP17.XHC1, 0x02) // Device Wake
  }

You can see in this case that the device ``\_SB.PCI0.GP17.XHC1`` is notified
when GPIO 59 is active. It's obvious this is an XHCI controller, but to go a
step further you can figure out which XHCI controller it is by matching it to
ACPI::

  $ grep "PCI0.GP17.XHC1" /sys/bus/acpi/devices/*/path
  /sys/bus/acpi/devices/device:2d/path:\_SB_.PCI0.GP17.XHC1
  /sys/bus/acpi/devices/device:2e/path:\_SB_.PCI0.GP17.XHC1.RHUB
  /sys/bus/acpi/devices/device:2f/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT1
  /sys/bus/acpi/devices/device:30/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT1.CAM0
  /sys/bus/acpi/devices/device:31/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT1.CAM1
  /sys/bus/acpi/devices/device:32/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT2
  /sys/bus/acpi/devices/LNXPOWER:0d/path:\_SB_.PCI0.GP17.XHC1.PWRS

Here you can see it matches to ``device:2d``. Look at the ``physical_node``
to determine what PCI device that actually is. ::

  $ ls -l /sys/bus/acpi/devices/device:2d/physical_node
  lrwxrwxrwx 1 root root 0 Feb 12 13:22 /sys/bus/acpi/devices/device:2d/physical_node -> ../../../../../pci0000:00/0000:00:08.1/0000:c2:00.4

So there you have it: the PCI device associated with this GPIO wakeup was ``0000:c2:00.4``.

The ``amd_s2idle.py`` script will capture most of these artifacts for you.
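
The last two lookups can be combined in a short script. This is a minimal
Python sketch with made-up function names, not the method used by
``amd_s2idle.py``. It assumes, based on the example output above, that sysfs
pads ACPI name segments with trailing underscores (``\_SB_`` where the AML
shows ``\_SB``), so both paths are normalized before they are compared::

  import os

  ACPI_DEVICES = "/sys/bus/acpi/devices"

  def normalize_acpi_path(path):
      """Drop the leading backslash and the '_' padding of each segment."""
      segments = path.strip().lstrip("\\").split(".")
      return ".".join(seg.rstrip("_") or seg for seg in segments)

  def find_acpi_device(paths, wanted):
      """paths maps a sysfs device name to the contents of its 'path' file."""
      target = normalize_acpi_path(wanted)
      for name, acpi_path in paths.items():
          if normalize_acpi_path(acpi_path) == target:
              return name
      return None

  def read_acpi_paths(root=ACPI_DEVICES):
      """Collect the 'path' attribute of every ACPI device that has one."""
      paths = {}
      for name in os.listdir(root):
          try:
              with open(os.path.join(root, name, "path")) as f:
                  paths[name] = f.read()
          except OSError:
              continue
      return paths

  def physical_node(name, root=ACPI_DEVICES):
      """Resolve the physical_node symlink to the underlying device name."""
      link = os.path.join(root, name, "physical_node")
      return os.path.basename(os.path.realpath(link))

With the example above,
``find_acpi_device(read_acpi_paths(), "\\_SB.PCI0.GP17.XHC1")`` would be
expected to return ``device:2d``, and ``physical_node("device:2d")`` the PCI
address ``0000:c2:00.4``.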

s2idle PM debug messages
========================

During the s2idle flow on AMD systems, the ACPI LPS0 driver is responsible
for checking all uPEP constraints. Failing uPEP constraints does not prevent
s0i3 entry. This means that if some constraints are not met, it is possible
the kernel may attempt to enter s2idle even if there are some known issues.

To activate PM debugging, either specify the ``pm_debug_messages`` kernel
command-line option at boot or write to ``/sys/power/pm_debug_messages``.
Unmet constraints will be displayed in the kernel log and can be
viewed by logging tools that process the kernel ring buffer like ``dmesg`` or
``journalctl``.

If the system freezes on entry/exit before these messages are flushed, a
useful debugging tactic is to unbind the ``amd_pmc`` driver to prevent
notification to the platform to start s0i3 entry. This will stop the
system from freezing on entry or exit and let you view all the failed
constraints. ::

  cd /sys/bus/platform/drivers/amd_pmc
  ls | grep AMD | sudo tee unbind

After doing this, run the suspend cycle and look specifically for errors around: ::

  ACPI: LPI: Constraint not met; min power state:%s current power state:%s

Historical examples of s2idle issues
====================================

To help understand the types of issues that can occur and how to debug them,
here are some historical examples of s2idle issues that have been resolved.

Core offlining
--------------
An end user had reported that taking a core offline would prevent the system
from properly entering s0i3. This was debugged using internal AMD tools
to capture and display a stream of metrics from the hardware showing what changed
when a core was offlined. It was determined that the hardware didn't get
notification the offline cores were in the deepest state, and so it prevented
the CPU from going into the deepest state. The issue was debugged to a missing
command to put cores into C3 upon offline.

`commit d6b88ce2eb9d2 ("ACPI: processor idle: Allow playing dead in C3 state") <https://git.kernel.org/torvalds/c/d6b88ce2eb9d2>`_

Corruption after resume
-----------------------
A big problem that occurred with Rembrandt was that there was graphical
corruption after resume. This happened because of a misalignment of PSP
and driver responsibility. The PSP will save and restore DMCUB, but the
driver assumed it needed to reset DMCUB on resume.
This actually was a misalignment for earlier silicon as well, but was not
observed.

`commit 79d6b9351f086 ("drm/amd/display: Don't reinitialize DMCUB on s0ix resume") <https://git.kernel.org/torvalds/c/79d6b9351f086>`_

Back to Back suspends fail
--------------------------
When using a wakeup source that triggers the IRQ to wake up, a bug in the
pinctrl-amd driver may capture the wrong state of the IRQ and prevent the
system from going back to sleep properly.

`commit b8c824a869f22 ("pinctrl: amd: Don't save/restore interrupt status and wake status bits") <https://git.kernel.org/torvalds/c/b8c824a869f22>`_

Spurious timer based wakeup after 5 minutes
-------------------------------------------
The HPET was being used to program the wakeup source for the system, however
this was causing a spurious wakeup after 5 minutes. The correct alarm to use
was the ACPI alarm.

`commit 3d762e21d5637 ("rtc: cmos: Use ACPI alarm for non-Intel x86 systems too") <https://git.kernel.org/torvalds/c/3d762e21d5637>`_

Disk disappears after resume
----------------------------
After resuming from s2idle, the NVME disk would disappear. This was due to the
BIOS not specifying the _DSD StorageD3Enable property. This caused the NVME
driver not to put the disk into the expected state at suspend and to fail
on resume.

`commit e79a10652bbd3 ("ACPI: x86: Force StorageD3Enable on more products") <https://git.kernel.org/torvalds/c/e79a10652bbd3>`_

Spurious IRQ1
-------------
A number of Renoir, Lucienne, Cezanne, & Barcelo platforms have a
platform firmware bug where IRQ1 is triggered during s0i3 resume.

This was fixed in the platform firmware, but a number of systems didn't
receive any more platform firmware updates.

`commit 8e60615e89321 ("platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN") <https://git.kernel.org/torvalds/c/8e60615e89321>`_

Hardware timeout
----------------
The hardware performs many actions besides accepting the values from
the amd-pmc driver. As the communication path with the hardware is a mailbox,
it's possible that it might not respond quickly enough.
This issue manifested as a failure to suspend: ::

  PM: dpm_run_callback(): acpi_subsys_suspend_noirq+0x0/0x50 returns -110
  amd_pmc AMDI0005:00: PM: failed to suspend noirq: error -110

The timing problem was identified by comparing the values of the idle mask.

`commit 3c3c8e88c8712 ("platform/x86: amd-pmc: Increase the response register timeout") <https://git.kernel.org/torvalds/c/3c3c8e88c8712>`_

Failed to reach hardware sleep state with panel on
--------------------------------------------------
On some Strix systems certain panels were observed to block the system from
entering a hardware sleep state if the internal panel was on during the sequence.

Even though the panel got turned off during suspend it exposed a timing problem
where an interrupt caused the display hardware to wake up and block low power
state entry.

`commit 40b8c14936bd2 ("drm/amd/display: Disable unneeded hpd interrupts during dm_init") <https://git.kernel.org/torvalds/c/40b8c14936bd2>`_

Runtime power consumption issues
================================

Runtime power consumption is influenced by many factors, including but not
limited to the configuration of the PCIe Active State Power Management (ASPM),
the display brightness, the EPP policy of the CPU, and the power management
of the devices.

ASPM
----
For the best runtime power consumption, ASPM should be programmed as intended
by the BIOS from the hardware vendor. To accomplish this the Linux kernel
should be compiled with ``CONFIG_PCIEASPM_DEFAULT`` set to ``y`` and the
sysfs file ``/sys/module/pcie_aspm/parameters/policy`` should not be modified.

Most notably, if L1.2 is not configured properly for any devices, the SoC
will not be able to enter the deepest idle state.
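
Whether the policy file has been modified can be checked with a few lines of
Python. This is a minimal sketch with hypothetical function names. It assumes
the policy file lists all policies on one line with the active one in
brackets, for example ``[default] performance powersave powersupersave``::

  import re

  def active_aspm_policy(policy_text):
      """Return the bracketed (active) ASPM policy name, or None."""
      match = re.search(r"\[(\w+)\]", policy_text)
      return match.group(1) if match else None

  def aspm_policy_is_bios_default(policy_text):
      """True when the kernel leaves ASPM as programmed by the BIOS."""
      return active_aspm_policy(policy_text) == "default"

Passing the contents of ``/sys/module/pcie_aspm/parameters/policy`` to
``aspm_policy_is_bios_default()`` gives a quick indication of whether the
policy was changed away from the BIOS-programmed configuration.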

EPP Policy
----------
The ``energy_performance_preference`` sysfs file can be used to set a bias
of efficiency or performance for a CPU. This has a direct effect on
battery life when more heavily biased towards performance.


BIOS debug messages
===================

Most OEM machines don't have a serial UART for outputting kernel or BIOS
debug messages. However BIOS debug messages are useful for understanding
both BIOS bugs and bugs with the Linux kernel drivers that call BIOS AML.

As the BIOS on most OEM AMD systems is based on an AMD reference BIOS,
the infrastructure used for exporting debugging messages is often the same
as in the AMD reference BIOS.

Manually Parsing
----------------
There is generally an ACPI method ``\M460`` that different paths of the AML
will call to emit a message to the BIOS serial log. This method takes
7 arguments, with the first being a string and the rest being optional
integers::

  Method (M460, 7, Serialized)

Here is an example of a string that BIOS AML may call out using ``\M460``::

  M460 (" OEM-ASL-PCIe Address (0x%X)._REG (%d %d) PCSA = %d\n", DADR, Arg0, Arg1, PCSA, Zero, Zero)

Normally when executed, the ``\M460`` method would populate the additional
arguments into the string. In order to get these messages from the Linux
kernel a hook has been added into ACPICA that can capture the *arguments*
sent to ``\M460`` and print them to the kernel ring buffer.
For example the following message could be emitted into the kernel ring buffer::

  extrace-0174 ex_trace_args : " OEM-ASL-PCIe Address (0x%X)._REG (%d %d) PCSA = %d\n", ec106000, 2, 1, 1, 0, 0

In order to get these messages, you need to compile with ``CONFIG_ACPI_DEBUG``
and then turn on the following ACPICA tracing parameters.
This can be done either on the kernel command line or at runtime:

* ``acpi.trace_method_name=\M460``
* ``acpi.trace_state=method``

NOTE: These can be very noisy at bootup.
If you turn these parameters on via
the kernel command line, please also consider turning up ``CONFIG_LOG_BUF_SHIFT``
to a larger size such as 17 to avoid losing early boot messages.

Tool assisted Parsing
---------------------
Parsing by hand can be tedious, especially with a lot of
messages. To help with this, a tool has been created at
`amd-debug-tools <https://git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/about/>`_
to help parse the messages.

Random reboot issues
====================

When a random reboot occurs, the high-level reason for the reboot is stored
in a register that will persist onto the next boot.

There are 6 classes of reasons for the reboot:
 * Software induced
 * Power state transition
 * Pin induced
 * Hardware induced
 * Remote reset
 * Internal CPU event

.. csv-table::
   :header: "Bit", "Type", "Reason"
   :align: left

   "0",  "Pin",      "thermal pin BP_THERMTRIP_L was tripped"
   "1",  "Pin",      "power button was pressed for 4 seconds"
   "2",  "Pin",      "shutdown pin was tripped"
   "4",  "Remote",   "remote ASF power off command was received"
   "9",  "Internal", "internal CPU thermal limit was tripped"
   "16", "Pin",      "system reset pin BP_SYS_RST_L was tripped"
   "17", "Software", "software issued PCI reset"
   "18", "Software", "software wrote 0x4 to reset control register 0xCF9"
   "19", "Software", "software wrote 0x6 to reset control register 0xCF9"
   "20", "Software", "software wrote 0xE to reset control register 0xCF9"
   "21", "ACPI-state", "ACPI power state transition occurred"
   "22", "Pin",      "keyboard reset pin KB_RST_L was tripped"
   "23", "Internal", "internal CPU shutdown event occurred"
   "24", "Hardware", "system failed to boot before failed boot timer expired"
   "25", "Hardware", "hardware watchdog timer expired"
   "26", "Remote",   "remote ASF reset command was received"
   "27", "Internal", "an uncorrected error caused a data fabric sync flood event"
   "29", "Internal", "FCH and MP1 failed warm reset handshake"
   "30", "Internal", "a parity error occurred"
   "31", "Internal", "a software sync flood event occurred"

This information is read by the kernel at bootup and printed into
the syslog. When a random reboot occurs this message can be helpful
to determine the next component to debug.
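
If only the raw register value is available, the table above can be applied
with a short script. This is a minimal Python sketch and not the kernel's
implementation. The function name is made up, the bit assignments are copied
from the table above, and how the raw value is obtained is left to the
reader::

  # Bit number -> (type, reason), copied from the table above.
  RESET_REASONS = {
      0: ("Pin", "thermal pin BP_THERMTRIP_L was tripped"),
      1: ("Pin", "power button was pressed for 4 seconds"),
      2: ("Pin", "shutdown pin was tripped"),
      4: ("Remote", "remote ASF power off command was received"),
      9: ("Internal", "internal CPU thermal limit was tripped"),
      16: ("Pin", "system reset pin BP_SYS_RST_L was tripped"),
      17: ("Software", "software issued PCI reset"),
      18: ("Software", "software wrote 0x4 to reset control register 0xCF9"),
      19: ("Software", "software wrote 0x6 to reset control register 0xCF9"),
      20: ("Software", "software wrote 0xE to reset control register 0xCF9"),
      21: ("ACPI-state", "ACPI power state transition occurred"),
      22: ("Pin", "keyboard reset pin KB_RST_L was tripped"),
      23: ("Internal", "internal CPU shutdown event occurred"),
      24: ("Hardware", "system failed to boot before failed boot timer expired"),
      25: ("Hardware", "hardware watchdog timer expired"),
      26: ("Remote", "remote ASF reset command was received"),
      27: ("Internal", "an uncorrected error caused a data fabric sync flood event"),
      29: ("Internal", "FCH and MP1 failed warm reset handshake"),
      30: ("Internal", "a parity error occurred"),
      31: ("Internal", "a software sync flood event occurred"),
  }

  def decode_reset_reason(value):
      """Return (bit, type, reason) for every known bit set in value."""
      return [
          (bit, *RESET_REASONS[bit])
          for bit in sorted(RESET_REASONS)
          if value & (1 << bit)
      ]

For example, ``decode_reset_reason(1 << 25)`` yields a single entry of type
``Hardware`` for the watchdog timer. Bits that are not listed in the table
are ignored by this sketch.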