.. SPDX-License-Identifier: GPL-2.0

=================================================
Recoverable Hardware Error Tracking in vmcoreinfo
=================================================

Overview
--------

This feature provides a generic infrastructure within the Linux kernel to track
and log recoverable hardware errors. These are errors that the hardware
recovers from and that do not cause an immediate panic, but that may still
affect system health, mainly because handling them causes the kernel to execute
code paths it would not otherwise run.

By recording counts and timestamps of recoverable errors into the vmcoreinfo
crash dump notes, this infrastructure aids post-mortem crash analysis tools in
correlating hardware events with kernel failures. This enables faster triage
and better understanding of root causes, especially in large-scale cloud
environments where hardware issues are common.

Benefits
--------

- Facilitates correlation of recoverable hardware errors with kernel panics or
  unusual code paths that lead to system crashes.
- Gives operators and cloud providers quick insight into hardware health,
  improving reliability and reducing troubleshooting time.
- Complements existing full hardware diagnostics without replacing them.

Data Exposure and Consumption
-----------------------------

- The tracked error data consists of a per-error-type count and the timestamp
  of the last occurrence.
- This data is stored in the `hwerror_data` array, categorized by error source
  type: CPU, memory, PCI, CXL, and others.
- It is exposed via vmcoreinfo crash dump notes and can be read using tools
  like `crash`, `drgn`, or other kernel crash analysis utilities.
- The data can only be read from crash dumps; no other interface exposes it.

Typical usage example (in drgn REPL):

.. code-block:: python

  >>> prog['hwerror_data']
  (struct hwerror_info[HWERR_RECOV_MAX]){
          {
                  .count = (int)844,
                  .timestamp = (time64_t)1752852018,
          },
          ...
  }

Enabling
--------

- This feature is enabled when CONFIG_VMCORE_INFO is set.
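
For post-processing the counts and timestamps described under "Data Exposure
and Consumption", the following is a minimal sketch and not part of the kernel
or of drgn. It assumes the data has already been extracted into plain
``(count, timestamp)`` pairs in array order, and that ``timestamp`` holds
seconds since the Unix epoch. The helper name ``summarize_hwerrors`` and the
labels in ``HWERR_LABELS`` are hypothetical; the real index-to-source mapping
is defined by the kernel enum.

.. code-block:: python

  from datetime import datetime, timezone

  # Illustrative labels only (assumption): one per error source named in
  # this document, in the order the document lists them.
  HWERR_LABELS = ["cpu", "memory", "pci", "cxl", "others"]


  def summarize_hwerrors(entries, labels=HWERR_LABELS):
      """Return one readable line per error source with a nonzero count.

      entries: sequence of (count, timestamp) pairs, in array order.
      timestamp is assumed to be seconds since the Unix epoch.
      """
      lines = []
      for label, (count, timestamp) in zip(labels, entries):
          if count == 0:
              continue  # nothing was recorded for this source
          when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
          lines.append(
              f"{label}: {count} errors, last at {when:%Y-%m-%d %H:%M:%S} UTC"
          )
      return lines


  # The first pair reuses the values from the drgn output shown earlier;
  # the zero entries are placeholders for sources with no recorded errors.
  sample = [(844, 1752852018), (0, 0), (0, 0), (0, 0), (0, 0)]
  for line in summarize_hwerrors(sample):
      print(line)

Skipping zero-count entries keeps the summary focused on sources that actually
reported errors, which is usually what matters when correlating a crash with
earlier hardware events.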