.. SPDX-License-Identifier: GPL-2.0

Debugging AMD Zen systems
+++++++++++++++++++++++++

Introduction
============

This document describes techniques that are useful for debugging issues with
AMD Zen systems.  It is intended for use by developers and technical users
to help identify and resolve issues.

S3 vs s2idle
============

On AMD systems, it's not possible to simultaneously support suspend-to-RAM (S3)
and suspend-to-idle (s2idle).  To confirm which mode your system supports, read
``/sys/power/mem_sleep``.  If it shows ``s2idle [deep]`` then *S3* is
supported.  If it shows ``[s2idle]`` then *s2idle* is supported.
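
The check above can be scripted.  The following is a minimal sketch, not part
of any kernel tooling, that parses the contents of ``/sys/power/mem_sleep``.
The function names are hypothetical.  It assumes that the selected mode is the
one shown in square brackets and that ``deep`` appearing anywhere in the list
means *S3* is supported. ::

  def parse_mem_sleep(text):
      """Return (active, available) parsed from /sys/power/mem_sleep contents."""
      active = None
      available = []
      for token in text.split():
          # The currently selected mode is wrapped in square brackets.
          if token.startswith("[") and token.endswith("]"):
              token = token[1:-1]
              active = token
          available.append(token)
      return active, available


  def supported_sleep_mode(text):
      """Map mem_sleep contents to the mode names used in this document."""
      _, available = parse_mem_sleep(text)
      # Assumption: "deep" being listed at all means the platform offers S3.
      return "S3" if "deep" in available else "s2idle"

On a running system the input would be the text read from
``/sys/power/mem_sleep``; passing the string keeps the sketch testable on any
machine.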

On systems that support *S3*, the firmware is used to put all hardware into
the appropriate low power state.

On systems that support *s2idle*, the kernel is responsible for transitioning
devices into the appropriate low power state.  When all devices are in the
appropriate low power state, the hardware will transition into a hardware
sleep state.

After a suspend cycle you can tell how much time was spent in a hardware sleep
state by reading ``/sys/power/suspend_stats/last_hw_sleep``.
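
To put that number in context, it can be compared against how long the suspend
cycle lasted.  The sketch below assumes the value in ``last_hw_sleep`` is
reported in microseconds (check the sysfs ABI documentation for your kernel)
and that the wall-clock suspend duration was measured separately.  The function
name and the ``suspend_duration_s`` parameter are hypothetical. ::

  def hw_sleep_summary(last_hw_sleep_text, suspend_duration_s):
      """Return (seconds in hardware sleep, percent of the suspend cycle)."""
      # Assumption: the sysfs value is an integer count of microseconds.
      hw_sleep_s = int(last_hw_sleep_text.strip()) / 1_000_000
      if suspend_duration_s <= 0:
          return hw_sleep_s, 0.0
      percent = 100.0 * hw_sleep_s / suspend_duration_s
      return hw_sleep_s, percent

A result of 0 seconds after a suspend cycle means the hardware sleep state was
never reached, which is the case the rest of this document helps debug.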

This flowchart explains how the AMD s2idle suspend flow works.

.. kernel-figure:: suspend.svg

This flowchart explains how the AMD s2idle resume flow works.

.. kernel-figure:: resume.svg

s2idle debugging tool
=====================

As there are a lot of places that problems can occur, a debugging tool has been
created at
`amd-debug-tools <https://git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/about/>`_
that can help test for common problems and offer suggestions.

If you have an s2idle issue, it's best to start with this tool and follow the
instructions from its findings.  If you continue to have an issue, raise a bug
with the report generated by this tool at
`drm/amd gitlab <https://gitlab.freedesktop.org/drm/amd/-/issues/new?issuable_template=s2idle_BUG_TEMPLATE>`_.

Spurious s2idle wakeups from an IRQ
===================================

Spurious wakeups will generally have an IRQ recorded in ``/sys/power/pm_wakeup_irq``.
This can be matched to ``/proc/interrupts`` to determine what device woke the system.
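
The matching step can be sketched as follows.  This is an illustrative helper
with a hypothetical name; it takes the text of ``/proc/interrupts`` and the IRQ
number read from ``/sys/power/pm_wakeup_irq`` and returns the rest of the
matching row, which normally ends with the names of the attached handlers. ::

  def find_irq_line(interrupts_text, irq):
      """Return the /proc/interrupts row for ``irq``, or None if absent."""
      wanted = str(irq)
      for line in interrupts_text.splitlines():
          # Each row looks like "  7:  <per-CPU counts>  <chip>  <handlers>".
          label, sep, rest = line.partition(":")
          if sep and label.strip() == wanted:
              # Collapse the column padding to single spaces.
              return " ".join(rest.split())
      return None

The header row of ``/proc/interrupts`` contains no colon, so it is skipped
automatically.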

If this isn't enough to debug the problem, then the following sysfs files
can be set to add more verbosity to the wakeup process: ::

  # echo 1 | sudo tee /sys/power/pm_debug_messages
  # echo 1 | sudo tee /sys/power/pm_print_times

After making those changes, the kernel will display messages that can
be traced back to kernel s2idle loop code as well as display any active
GPIO sources while waking up.

If the wakeup is caused by the ACPI SCI, additional ACPI debugging may be
needed.  These commands can enable additional trace data: ::

  # echo enable | sudo tee /sys/module/acpi/parameters/trace_state
  # echo 1 | sudo tee /sys/module/acpi/parameters/aml_debug_output
  # echo 0x0800000f | sudo tee /sys/module/acpi/parameters/debug_level
  # echo 0xffff0000 | sudo tee /sys/module/acpi/parameters/debug_layer

Spurious s2idle wakeups from a GPIO
===================================

If a GPIO is active when waking up the system, ideally you would look at the
schematic to determine what device it is associated with.  If the schematic
is not available, another tactic is to look at the ACPI ``_EVT`` entry
to determine what device is notified when that GPIO is active.

For a hypothetical example, say that GPIO 59 woke up the system.  You can
look at the SSDT to determine what device is notified when GPIO 59 is active.

First convert the GPIO number into hex. ::

  $ python3 -c "print(hex(59))"
  0x3b

Next determine which ACPI table has the ``_EVT`` entry. For example: ::

  $ sudo grep EVT /sys/firmware/acpi/tables/SSDT*
  grep: /sys/firmware/acpi/tables/SSDT27: binary file matches

Decode this table::

  $ sudo cp /sys/firmware/acpi/tables/SSDT27 .
  $ sudo iasl -d SSDT27

Then look at the table and find the matching entry for GPIO 0x3b. ::

  Case (0x3B)
  {
      M000 (0x393B)
      M460 ("    Notify (\\_SB.PCI0.GP17.XHC1, 0x02)\n", Zero, Zero, Zero, Zero, Zero, Zero)
      Notify (\_SB.PCI0.GP17.XHC1, 0x02) // Device Wake
  }

You can see in this case that the device ``\_SB.PCI0.GP17.XHC1`` is notified
when GPIO 59 is active.  The name indicates an XHCI controller, but to go a
step further you can figure out which XHCI controller it is by matching it to
ACPI. ::

  $ grep "PCI0.GP17.XHC1" /sys/bus/acpi/devices/*/path
  /sys/bus/acpi/devices/device:2d/path:\_SB_.PCI0.GP17.XHC1
  /sys/bus/acpi/devices/device:2e/path:\_SB_.PCI0.GP17.XHC1.RHUB
  /sys/bus/acpi/devices/device:2f/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT1
  /sys/bus/acpi/devices/device:30/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT1.CAM0
  /sys/bus/acpi/devices/device:31/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT1.CAM1
  /sys/bus/acpi/devices/device:32/path:\_SB_.PCI0.GP17.XHC1.RHUB.PRT2
  /sys/bus/acpi/devices/LNXPOWER:0d/path:\_SB_.PCI0.GP17.XHC1.PWRS

Here you can see it matches to ``device:2d``. Look at the ``physical_node``
to determine what PCI device that actually is. ::

  $ ls -l /sys/bus/acpi/devices/device:2d/physical_node
  lrwxrwxrwx 1 root root 0 Feb 12 13:22 /sys/bus/acpi/devices/device:2d/physical_node -> ../../../../../pci0000:00/0000:00:08.1/0000:c2:00.4

So the PCI device associated with this GPIO wakeup was ``0000:c2:00.4``.

The ``amd_s2idle.py`` script will capture most of these artifacts for you.
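
If you prefer to script the lookup in the decoded table yourself, the search
for the ``Case`` block can be sketched as below.  This is a rough illustration
with a hypothetical function name, not what ``amd_s2idle.py`` does.  It assumes
the ``iasl`` output places each ``Case``, brace and ``Notify`` statement on its
own line, as in the listing above, and it ignores ``Notify`` text that only
appears inside an ``M460`` debug string. ::

  import re


  def notify_targets_for_gpio(dsl_text, gpio):
      """Return the devices notified in the Case block for ``gpio``."""
      # iasl prints the Case value in hex, e.g. GPIO 59 becomes "Case (0x3B)".
      case_re = re.compile(r"^\s*Case\s*\(0x0*%X\)" % gpio, re.IGNORECASE)
      # Only lines that *start* with Notify are real statements.
      notify_re = re.compile(r"^\s*Notify\s*\(([^,]+),")
      targets = []
      depth = 0
      in_case = False
      for line in dsl_text.splitlines():
          if not in_case:
              in_case = bool(case_re.match(line))
              continue
          match = notify_re.match(line)
          if match:
              targets.append(match.group(1).strip())
          # Track braces so the scan stops at the end of this Case block.
          depth += line.count("{") - line.count("}")
          if depth <= 0 and "}" in line:
              break
      return targets

For the listing above, calling the function with the decoded table text and
``59`` yields the single path ``\_SB.PCI0.GP17.XHC1``.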

s2idle PM debug messages
========================

During the s2idle flow on AMD systems, the ACPI LPS0 driver is responsible
for checking all uPEP constraints.  Failing uPEP constraints does not prevent
s0i3 entry.  This means that the kernel may attempt to enter s2idle even when
some constraints are not met and known issues remain.

To activate PM debugging, either specify the ``pm_debug_messages`` kernel
command-line option at boot or write to ``/sys/power/pm_debug_messages``.
Unmet constraints will be displayed in the kernel log and can be
viewed with logging tools that process the kernel ring buffer, such as
``dmesg`` or ``journalctl``.

If the system freezes on entry/exit before these messages are flushed, a
useful debugging tactic is to unbind the ``amd_pmc`` driver to prevent
notification to the platform to start s0i3 entry.  This will stop the
system from freezing on entry or exit and let you view all the failed
constraints. ::

  cd /sys/bus/platform/drivers/amd_pmc
  ls | grep AMD | sudo tee unbind

After doing this, run the suspend cycle and look specifically for messages like: ::

  ACPI: LPI: Constraint not met; min power state:%s current power state:%s
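
Picking these lines out of a long log can be automated.  The following sketch
scans saved log text (for example the output of ``dmesg``) for the message
format shown above.  The function name is hypothetical, and it assumes the two
power state names contain no whitespace. ::

  import re

  # Mirrors the message format above; each %s becomes a captured state name.
  CONSTRAINT_RE = re.compile(
      r"LPI: Constraint not met; "
      r"min power state:(\S+) current power state:(\S+)"
  )


  def unmet_constraints(log_text):
      """Return (min_state, current_state) for every unmet constraint."""
      results = []
      for line in log_text.splitlines():
          match = CONSTRAINT_RE.search(line)
          if match:
              results.append((match.group(1), match.group(2)))
      return results

An empty result after a suspend cycle with PM debugging enabled suggests that
no constraint was reported as unmet.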

Historical examples of s2idle issues
====================================

To help understand the types of issues that can occur and how to debug them,
here are some historical examples of s2idle issues that have been resolved.

Core offlining
--------------
An end user reported that taking a core offline would prevent the system
from properly entering s0i3.  This was debugged using internal AMD tools
to capture and display a stream of metrics from the hardware showing what changed
when a core was offlined.  It was determined that the hardware was not notified
that the offline cores were in the deepest state, and so it prevented the
CPU from going into the deepest state.  The issue was traced to a missing
command to put cores into C3 upon offline.

`commit d6b88ce2eb9d2 ("ACPI: processor idle: Allow playing dead in C3 state") <https://git.kernel.org/torvalds/c/d6b88ce2eb9d2>`_

Corruption after resume
-----------------------
A major problem on Rembrandt was graphical corruption after resume.  This
happened because of a misalignment of PSP and driver responsibility.  The PSP
will save and restore DMCUB, but the driver assumed it needed to reset DMCUB
on resume.
The same misalignment existed on earlier silicon as well, but the corruption
was not observed there.

`commit 79d6b9351f086 ("drm/amd/display: Don't reinitialize DMCUB on s0ix resume") <https://git.kernel.org/torvalds/c/79d6b9351f086>`_

Back-to-back suspends fail
--------------------------
When using a wakeup source that triggers an IRQ to wake the system, a bug in
the pinctrl-amd driver may capture the wrong state of the IRQ and prevent the
system from going back to sleep properly.

`commit b8c824a869f22 ("pinctrl: amd: Don't save/restore interrupt status and wake status bits") <https://git.kernel.org/torvalds/c/b8c824a869f22>`_

Spurious timer-based wakeup after 5 minutes
-------------------------------------------
The HPET was being used to program the wakeup source for the system; however,
this was causing a spurious wakeup after 5 minutes.  The correct alarm to use
was the ACPI alarm.

`commit 3d762e21d5637 ("rtc: cmos: Use ACPI alarm for non-Intel x86 systems too") <https://git.kernel.org/torvalds/c/3d762e21d5637>`_

Disk disappears after resume
----------------------------
After resuming from s2idle, the NVMe disk would disappear.  This was due to the
BIOS not specifying the _DSD StorageD3Enable property.  As a result, the NVMe
driver did not put the disk into the expected state at suspend, and the disk
failed on resume.

`commit e79a10652bbd3 ("ACPI: x86: Force StorageD3Enable on more products") <https://git.kernel.org/torvalds/c/e79a10652bbd3>`_

Spurious IRQ1
-------------
A number of Renoir, Lucienne, Cezanne, and Barcelo platforms have a
platform firmware bug where IRQ1 is triggered during s0i3 resume.

This was fixed in the platform firmware, but a number of systems didn't
receive any more platform firmware updates.

`commit 8e60615e89321 ("platform/x86/amd: pmc: Disable IRQ1 wakeup for RN/CZN") <https://git.kernel.org/torvalds/c/8e60615e89321>`_

Hardware timeout
----------------
The hardware performs many actions besides accepting the values from the
amd-pmc driver.  As the communication path with the hardware is a mailbox,
it's possible that it might not respond quickly enough.
This issue manifested as a failure to suspend: ::

  PM: dpm_run_callback(): acpi_subsys_suspend_noirq+0x0/0x50 returns -110
  amd_pmc AMDI0005:00: PM: failed to suspend noirq: error -110

The timing problem was identified by comparing the values of the idle mask.

`commit 3c3c8e88c8712 ("platform/x86: amd-pmc: Increase the response register timeout") <https://git.kernel.org/torvalds/c/3c3c8e88c8712>`_

Failed to reach hardware sleep state with panel on
--------------------------------------------------
On some Strix systems, certain panels were observed to block the system from
entering a hardware sleep state if the internal panel was on during the sequence.

Even though the panel was turned off during suspend, this exposed a timing
problem where an interrupt caused the display hardware to wake up and block
low power state entry.

`commit 40b8c14936bd2 ("drm/amd/display: Disable unneeded hpd interrupts during dm_init") <https://git.kernel.org/torvalds/c/40b8c14936bd2>`_

Runtime power consumption issues
================================

Runtime power consumption is influenced by many factors, including but not
limited to the configuration of the PCIe Active State Power Management (ASPM),
the display brightness, the EPP policy of the CPU, and the power management
of the devices.

ASPM
----
For the best runtime power consumption, ASPM should be programmed as intended
by the BIOS from the hardware vendor.  To accomplish this the Linux kernel
should be compiled with ``CONFIG_PCIEASPM_DEFAULT`` set to ``y`` and the
sysfs file ``/sys/module/pcie_aspm/parameters/policy`` should not be modified.

Most notably, if L1.2 is not configured properly for any devices, the SoC
will not be able to enter the deepest idle state.

EPP Policy
----------
The ``energy_performance_preference`` sysfs file can be used to set a bias
of efficiency or performance for a CPU.  Biasing more heavily towards
performance directly reduces battery life.
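
When investigating battery life, it can help to see which preference each CPU
currently uses.  The sketch below is an illustration only.  The function names
are hypothetical, and the glob pattern is an assumption about where the file
lives under the cpufreq policy directories; adjust it if your system differs. ::

  import glob
  from collections import Counter

  # Assumed location of the per-CPU file; not stated in this document.
  EPP_GLOB = (
      "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/energy_performance_preference"
  )


  def read_epp(pattern=EPP_GLOB):
      """Return {path: preference} for every matching sysfs file."""
      values = {}
      for path in sorted(glob.glob(pattern)):
          with open(path) as handle:
              values[path] = handle.read().strip()
      return values


  def summarize_epp(values):
      """Count how many CPUs use each preference string."""
      return Counter(values.values())

``summarize_epp(read_epp())`` returns the number of CPUs per preference, so
differing values across CPUs show up as more than one key in the result.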


BIOS debug messages
===================

Most OEM machines don't have a serial UART for outputting kernel or BIOS
debug messages.  However, BIOS debug messages are useful for understanding
both BIOS bugs and bugs with the Linux kernel drivers that call BIOS AML.

As the BIOS on most OEM AMD systems is based on an AMD reference BIOS,
the infrastructure used for exporting debugging messages is often the same
as in the AMD reference BIOS.

Manually Parsing
----------------
There is generally an ACPI method ``\M460`` that different paths of the AML
will call to emit a message to the BIOS serial log. This method takes
7 arguments, with the first being a string and the rest being optional
integers::

  Method (M460, 7, Serialized)

Here is an example of a string that BIOS AML may call out using ``\M460``::

  M460 ("  OEM-ASL-PCIe Address (0x%X)._REG (%d %d)  PCSA = %d\n", DADR, Arg0, Arg1, PCSA, Zero, Zero)

Normally when executed, the ``\M460`` method would populate the additional
arguments into the string.  In order to get these messages from the Linux
kernel a hook has been added into ACPICA that can capture the *arguments*
sent to ``\M460`` and print them to the kernel ring buffer.
For example, the following message could be emitted into the kernel ring buffer::

  extrace-0174 ex_trace_args         :  "  OEM-ASL-PCIe Address (0x%X)._REG (%d %d)  PCSA = %d\n", ec106000, 2, 1, 1, 0, 0

In order to get these messages, you need to compile with ``CONFIG_ACPI_DEBUG``
and then turn on the following ACPICA tracing parameters.
This can be done either on the kernel command line or at runtime:

* ``acpi.trace_method_name=\M460``
* ``acpi.trace_state=method``

NOTE: These can be very noisy at bootup. If you set these parameters on
the kernel command line, please also consider turning up ``CONFIG_LOG_BUF_SHIFT``
to a larger size such as 17 to avoid losing early boot messages.
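
To illustrate what parsing such a line involves, the sketch below substitutes
the captured arguments back into the format string.  It is a simplified
illustration with hypothetical names, not the tool described in the next
section.  It assumes the traced arguments are printed in hexadecimal and
handles only the ``%d``, ``%X`` and ``%x`` specifiers; anything else is left
untouched. ::

  import re

  # Captures the quoted format string and the comma-separated arguments.
  TRACE_RE = re.compile(
      r'ex_trace_args\s*:\s*"(.*)"((?:,\s*[0-9a-fA-F]+)*)\s*$'
  )
  SPEC_RE = re.compile(r"%([dXx])")


  def format_m460_trace(line):
      """Return the message with arguments filled in, or None if no match."""
      match = TRACE_RE.search(line)
      if match is None:
          return None
      fmt, raw_args = match.group(1), match.group(2)
      # Assumption: every traced argument is a hexadecimal number.
      args = iter([int(a, 16) for a in re.findall(r"[0-9a-fA-F]+", raw_args)])

      def substitute(spec):
          try:
              value = next(args)
          except StopIteration:
              return spec.group(0)  # more specifiers than arguments
          kind = spec.group(1)
          return str(value) if kind == "d" else format(value, kind)

      text = SPEC_RE.sub(substitute, fmt)
      # The trace keeps the trailing "\n" as two literal characters.
      return text.replace("\\n", "").strip()

Applied to the trace line shown above, this produces
``OEM-ASL-PCIe Address (0xEC106000)._REG (2 1)  PCSA = 1``; the two trailing
zero arguments are unused because the string has only four specifiers.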

Tool-assisted Parsing
---------------------
Parsing by hand can be tedious, especially with a lot of
messages.  To help with this, a tool has been created at
`amd-debug-tools <https://git.kernel.org/pub/scm/linux/kernel/git/superm1/amd-debug-tools.git/about/>`_
to help parse the messages.

Random reboot issues
====================

When a random reboot occurs, the high-level reason for the reboot is stored
in a register that will persist onto the next boot.

There are 6 classes of reasons for the reboot:

* Software induced
* Power state transition
* Pin induced
* Hardware induced
* Remote reset
* Internal CPU event

.. csv-table::
   :header: "Bit", "Type", "Reason"
   :align: left

   "0",  "Pin",      "thermal pin BP_THERMTRIP_L was tripped"
   "1",  "Pin",      "power button was pressed for 4 seconds"
   "2",  "Pin",      "shutdown pin was tripped"
   "4",  "Remote",   "remote ASF power off command was received"
   "9",  "Internal", "internal CPU thermal limit was tripped"
   "16", "Pin",      "system reset pin BP_SYS_RST_L was tripped"
   "17", "Software", "software issued PCI reset"
   "18", "Software", "software wrote 0x4 to reset control register 0xCF9"
   "19", "Software", "software wrote 0x6 to reset control register 0xCF9"
   "20", "Software", "software wrote 0xE to reset control register 0xCF9"
   "21", "ACPI-state", "ACPI power state transition occurred"
   "22", "Pin",      "keyboard reset pin KB_RST_L was tripped"
   "23", "Internal", "internal CPU shutdown event occurred"
   "24", "Hardware", "system failed to boot before failed boot timer expired"
   "25", "Hardware", "hardware watchdog timer expired"
   "26", "Remote",   "remote ASF reset command was received"
   "27", "Internal", "an uncorrected error caused a data fabric sync flood event"
   "29", "Internal", "FCH and MP1 failed warm reset handshake"
   "30", "Internal", "a parity error occurred"
   "31", "Internal", "a software sync flood event occurred"

This information is read by the kernel at bootup and printed into
the kernel log. When a random reboot occurs this message can be helpful
to determine the next component to debug.
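
If you only have a raw register value, for instance from a bug report, the
table above can be applied by hand or with a small lookup such as the sketch
below.  The names are hypothetical and how the raw value is obtained is outside
the scope of this example; the bit assignments are copied from the table. ::

  RESET_REASONS = {
      0: ("Pin", "thermal pin BP_THERMTRIP_L was tripped"),
      1: ("Pin", "power button was pressed for 4 seconds"),
      2: ("Pin", "shutdown pin was tripped"),
      4: ("Remote", "remote ASF power off command was received"),
      9: ("Internal", "internal CPU thermal limit was tripped"),
      16: ("Pin", "system reset pin BP_SYS_RST_L was tripped"),
      17: ("Software", "software issued PCI reset"),
      18: ("Software", "software wrote 0x4 to reset control register 0xCF9"),
      19: ("Software", "software wrote 0x6 to reset control register 0xCF9"),
      20: ("Software", "software wrote 0xE to reset control register 0xCF9"),
      21: ("ACPI-state", "ACPI power state transition occurred"),
      22: ("Pin", "keyboard reset pin KB_RST_L was tripped"),
      23: ("Internal", "internal CPU shutdown event occurred"),
      24: ("Hardware", "system failed to boot before failed boot timer expired"),
      25: ("Hardware", "hardware watchdog timer expired"),
      26: ("Remote", "remote ASF reset command was received"),
      27: ("Internal", "an uncorrected error caused a data fabric sync flood event"),
      29: ("Internal", "FCH and MP1 failed warm reset handshake"),
      30: ("Internal", "a parity error occurred"),
      31: ("Internal", "a software sync flood event occurred"),
  }


  def decode_reset_reason(value):
      """Return (bit, type, reason) for each documented bit set in ``value``."""
      return [
          (bit, kind, reason)
          for bit, (kind, reason) in sorted(RESET_REASONS.items())
          if value & (1 << bit)
      ]

For example, a value of ``0x02000000`` has only bit 25 set and decodes to the
hardware watchdog entry.  Bits that are not in the table are ignored.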