PCIe Device AER statistics
--------------------------

These attributes show up under all the devices that are AER capable. These
statistical counters indicate the errors "as seen/reported by the device".
As a result, if an endpoint is causing problems, the AER counters may
increment at its link partner (e.g. the root port) instead, because the
errors may be "seen" and reported by the link partner rather than by the
problematic endpoint itself. The endpoint may then report all counters as 0
because it never saw any problems.

What:		/sys/bus/pci/devices/<dev>/aer_dev_correctable
Date:		July 2018
KernelVersion:	4.19.0
Contact:	linux-pci@vger.kernel.org, rajatja@google.com
Description:	List of correctable errors seen and reported by this
		PCI device using ERR_COR. Note that because multiple errors
		may be reported using a single ERR_COR message, the
		TOTAL_ERR_COR value at the end of the file may not match the
		sum of the individual error counts in the file. Sample output::

		    localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_correctable
		    Receiver Error 2
		    Bad TLP 0
		    Bad DLLP 0
		    RELAY_NUM Rollover 0
		    Replay Timer Timeout 0
		    Advisory Non-Fatal 0
		    Corrected Internal Error 0
		    Header Log Overflow 0
		    TOTAL_ERR_COR 2

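Each line of an aer_dev_* file is an error name followed by a count, and the TOTAL_ERR_* line comes last, so the file is easy to parse. The Python sketch below assumes exactly that layout: names may contain spaces, and the count is the last space-separated field. `parse_aer_dev_counters` is a hypothetical helper, not a kernel or library API. Applied to the sample output above, it shows that the per-error sum and the TOTAL_ERR_COR value are two separate numbers that need not agree.

```python
def parse_aer_dev_counters(text):
    """Split the text of an aer_dev_* file into (per_error_counts, total).

    Hypothetical helper. Assumes every non-empty line is "<name> <count>",
    where <name> may contain spaces, and that the summary line's name
    starts with TOTAL_ERR_.
    """
    per_error = {}
    total = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The count is the last space-separated field on the line.
        name, _, value = line.rpartition(" ")
        if name.startswith("TOTAL_ERR_"):
            total = int(value)
        else:
            per_error[name] = int(value)
    return per_error, total


# The sample output shown for aer_dev_correctable.
SAMPLE = """\
Receiver Error 2
Bad TLP 0
Bad DLLP 0
RELAY_NUM Rollover 0
Replay Timer Timeout 0
Advisory Non-Fatal 0
Corrected Internal Error 0
Header Log Overflow 0
TOTAL_ERR_COR 2
"""

per_error, total = parse_aer_dev_counters(SAMPLE)
# One ERR_COR message may carry several errors, so treat a difference
# between these two numbers as expected, not as corruption.
print(sum(per_error.values()), total)  # 2 2
```

On a live system, pass it the contents of /sys/bus/pci/devices/<dev>/aer_dev_correctable. Because aer_dev_fatal and aer_dev_nonfatal use the same layout, the same parser should apply to them as well.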
What:		/sys/bus/pci/devices/<dev>/aer_dev_fatal
Date:		July 2018
KernelVersion:	4.19.0
Contact:	linux-pci@vger.kernel.org, rajatja@google.com
Description:	List of uncorrectable fatal errors seen and reported by this
		PCI device using ERR_FATAL. Note that because multiple errors
		may be reported using a single ERR_FATAL message, the
		TOTAL_ERR_FATAL value at the end of the file may not match the
		sum of the individual error counts in the file. Sample output::

		    localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_fatal
		    Undefined 0
		    Data Link Protocol 0
		    Surprise Down Error 0
		    Poisoned TLP 0
		    Flow Control Protocol 0
		    Completion Timeout 0
		    Completer Abort 0
		    Unexpected Completion 0
		    Receiver Overflow 0
		    Malformed TLP 0
		    ECRC 0
		    Unsupported Request 0
		    ACS Violation 0
		    Uncorrectable Internal Error 0
		    MC Blocked TLP 0
		    AtomicOp Egress Blocked 0
		    TLP Prefix Blocked Error 0
		    TOTAL_ERR_FATAL 0

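If the counters only ever increase while the device is present, a monitor can compare two readings of aer_dev_fatal (or aer_dev_nonfatal) and report which error types moved in between. Below is a minimal Python sketch under that assumption. `snapshot` and `counter_deltas` are hypothetical helper names, and the two readings are made up for illustration.

```python
def snapshot(text):
    """Map each "<name> <count>" line of an aer_dev_* file to its count."""
    counts = {}
    for line in text.splitlines():
        # Split at the last space: error names may themselves contain spaces.
        name, sep, value = line.strip().rpartition(" ")
        if sep:  # skip blank lines
            counts[name] = int(value)
    return counts


def counter_deltas(before, after):
    """Return {name: increase} for every counter that grew between readings.

    Names missing from `before` count as 0. Decreases (e.g. counters that
    started over from zero) are ignored rather than reported as negative.
    """
    deltas = {}
    for name, new in after.items():
        old = before.get(name, 0)
        if new > old:
            deltas[name] = new - old
    return deltas


# Hypothetical readings of aer_dev_fatal taken some time apart.
before = snapshot("Poisoned TLP 0\nCompletion Timeout 0\nTOTAL_ERR_FATAL 0\n")
after = snapshot("Poisoned TLP 0\nCompletion Timeout 1\nTOTAL_ERR_FATAL 1\n")
print(counter_deltas(before, after))
# {'Completion Timeout': 1, 'TOTAL_ERR_FATAL': 1}
```

The TOTAL_ERR_* line is treated like any other key, so a growing message count shows up even when no individual error type is recognised.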
What:		/sys/bus/pci/devices/<dev>/aer_dev_nonfatal
Date:		July 2018
KernelVersion:	4.19.0
Contact:	linux-pci@vger.kernel.org, rajatja@google.com
Description:	List of uncorrectable nonfatal errors seen and reported by this
		PCI device using ERR_NONFATAL. Note that because multiple
		errors may be reported using a single ERR_NONFATAL message,
		the TOTAL_ERR_NONFATAL value at the end of the file may not
		match the sum of the individual error counts in the file.
		Sample output::

		    localhost /sys/devices/pci0000:00/0000:00:1c.0 # cat aer_dev_nonfatal
		    Undefined 0
		    Data Link Protocol 0
		    Surprise Down Error 0
		    Poisoned TLP 0
		    Flow Control Protocol 0
		    Completion Timeout 0
		    Completer Abort 0
		    Unexpected Completion 0
		    Receiver Overflow 0
		    Malformed TLP 0
		    ECRC 0
		    Unsupported Request 0
		    ACS Violation 0
		    Uncorrectable Internal Error 0
		    MC Blocked TLP 0
		    AtomicOp Egress Blocked 0
		    TLP Prefix Blocked Error 0
		    TOTAL_ERR_NONFATAL 0

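To find which devices have logged any errors at all, one approach is to read only the TOTAL_ERR_* line of each of the three files above for every entry in /sys/bus/pci/devices. Devices that lack the files are not AER capable and are skipped. The Python sketch below works under those assumptions. `aer_dev_totals` and `devices_with_errors` are hypothetical names. Keep in mind the caveat at the start of this section: an endpoint with all-zero counters may still sit on a faulty link, so its link partner is worth checking too.

```python
from pathlib import Path

AER_DEV_FILES = ("aer_dev_correctable", "aer_dev_fatal", "aer_dev_nonfatal")


def aer_dev_totals(dev_dir):
    """Return {file name: TOTAL_ERR_* count} for one PCI device directory.

    Files that do not exist (device not AER capable) are skipped.
    """
    totals = {}
    for name in AER_DEV_FILES:
        try:
            text = (Path(dev_dir) / name).read_text()
        except FileNotFoundError:
            continue
        for line in text.splitlines():
            # Only the summary line is needed; it starts with TOTAL_ERR_.
            if line.startswith("TOTAL_ERR_"):
                totals[name] = int(line.rsplit(" ", 1)[1])
    return totals


def devices_with_errors(devices_root="/sys/bus/pci/devices"):
    """Yield (device address, totals) for each device with a nonzero total."""
    for dev_dir in sorted(Path(devices_root).iterdir()):
        totals = aer_dev_totals(dev_dir)
        if any(totals.values()):
            yield dev_dir.name, totals
```

On a live system, `for addr, totals in devices_with_errors(): print(addr, totals)` would list the affected devices.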
PCIe Rootport AER statistics
----------------------------

These attributes show up only under the rootports (or root complex event
collectors) that are AER capable. They indicate the number of error messages
"reported to" the rootport. Note that a rootport also transmits (internally)
the ERR_* messages for errors seen by its own PCI device. These counters
therefore include those messages too, and are cumulative over all the error
messages on the PCI hierarchy rooted at that rootport.

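Because a rootport's counters cover its whole hierarchy, it is often useful to go from a device to the rootport above it. In sysfs, /sys/bus/pci/devices/<dev> is a symlink into /sys/devices, where the directory path follows the PCI topology (compare the sample path /sys/devices/pci0000:00/0000:00:1c.0 above). The sketch below assumes that the first PCI device directory below the pciDDDD:BB host bridge directory is the rootport. That assumption does not hold for devices integrated directly into the root complex; for those, the function simply returns the device itself. `root_port_of` and `root_port_dir` are hypothetical helpers.

```python
import os
import re

# A PCI function address such as 0000:00:1c.0 (domain:bus:device.function).
# Domains may be wider than four hex digits on some systems.
_BDF = re.compile(r"^[0-9a-f]{4,}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def root_port_of(resolved_path):
    """Return the address of the first PCI device below the host bridge.

    `resolved_path` is the real path of /sys/bus/pci/devices/<dev>, e.g.
    /sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0. Assumes that first
    device is the rootport; returns None if no host bridge is found.
    """
    parts = [p for p in resolved_path.split("/") if p]
    for i, part in enumerate(parts[:-1]):
        # Host bridge directories are named pci<domain>:<bus>.
        if part.startswith("pci") and _BDF.match(parts[i + 1]):
            return parts[i + 1]
    return None


def root_port_dir(dev):
    """Hypothetical wrapper for a live system: sysfs dir of dev's rootport."""
    rp = root_port_of(os.path.realpath(f"/sys/bus/pci/devices/{dev}"))
    return None if rp is None else f"/sys/bus/pci/devices/{rp}"
```

Any PCIe switch ports between the rootport and the device appear as further address components in the path, and the function skips them.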
What:		/sys/bus/pci/devices/<dev>/aer_rootport_total_err_cor
Date:		July 2018
KernelVersion:	4.19.0
Contact:	linux-pci@vger.kernel.org, rajatja@google.com
Description:	Total number of ERR_COR messages reported to the rootport.

What:		/sys/bus/pci/devices/<dev>/aer_rootport_total_err_fatal
Date:		July 2018
KernelVersion:	4.19.0
Contact:	linux-pci@vger.kernel.org, rajatja@google.com
Description:	Total number of ERR_FATAL messages reported to the rootport.

What:		/sys/bus/pci/devices/<dev>/aer_rootport_total_err_nonfatal
Date:		July 2018
KernelVersion:	4.19.0
Contact:	linux-pci@vger.kernel.org, rajatja@google.com
Description:	Total number of ERR_NONFATAL messages reported to the rootport.

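These files exist only on AER-capable rootports and root complex event collectors. A reader should therefore treat their absence as "not a rootport" rather than as an error. The Python sketch below assumes each file holds a single decimal integer. `rootport_totals` is a hypothetical name, and the short dictionary keys are this sketch's own labels.

```python
from pathlib import Path

# Keys are short labels chosen for this sketch; values are the sysfs names.
_ROOTPORT_FILES = {
    "cor": "aer_rootport_total_err_cor",
    "fatal": "aer_rootport_total_err_fatal",
    "nonfatal": "aer_rootport_total_err_nonfatal",
}


def rootport_totals(dev_dir):
    """Return {"cor": n, "fatal": n, "nonfatal": n} for a rootport.

    Returns None when the files are absent, i.e. dev_dir is not an
    AER-capable rootport or root complex event collector.
    """
    totals = {}
    for key, name in _ROOTPORT_FILES.items():
        path = Path(dev_dir) / name
        if not path.is_file():
            return None
        # Assumed format: one decimal integer, possibly followed by a newline.
        totals[key] = int(path.read_text())
    return totals
```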
PCIe AER ratelimits
-------------------

These attributes show up in the aer/ subdirectory of every AER-capable
device. They are configurable ratelimits for AER error logs, set separately
per error type (correctable and non-fatal uncorrectable).

See Documentation/PCI/pcieaer-howto.rst for more information on ratelimits.

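Based on the descriptions below, the interval/burst pair can be read as follows. Within each interval, up to burst messages are logged, and the rest are suppressed until the interval expires. An interval of 0 turns ratelimiting off. The Python model below is a simplified illustration of that behaviour, not the kernel's implementation. The class name `LogRatelimit` is hypothetical, and time is passed in explicitly in milliseconds. It also assumes burst is at least 1; it does not try to model how the kernel treats a burst of 0.

```python
class LogRatelimit:
    """Simplified interval/burst ratelimiter (illustrative model only).

    interval_ms == 0 disables ratelimiting, as documented for the AER
    ratelimit attributes. Assumes burst >= 1.
    """

    def __init__(self, interval_ms=5000, burst=10):
        self.interval_ms = interval_ms
        self.burst = burst
        self.window_start = None  # time the current interval began
        self.logged = 0           # messages allowed in this interval
        self.suppressed = 0       # messages dropped so far

    def allow(self, now_ms):
        """Return True if a message at time now_ms may be logged."""
        if self.interval_ms == 0:
            return True
        # Start a new interval on the first message or once the old one expires.
        if self.window_start is None or now_ms - self.window_start >= self.interval_ms:
            self.window_start = now_ms
            self.logged = 0
        if self.logged < self.burst:
            self.logged += 1
            return True
        self.suppressed += 1
        return False


# 12 errors arriving at once with the defaults: 10 are logged, 2 dropped.
rl = LogRatelimit()
print(sum(rl.allow(0) for _ in range(12)), rl.suppressed)  # 10 2
```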
What:		/sys/bus/pci/devices/<dev>/aer/correctable_ratelimit_interval_ms
Date:		May 2025
KernelVersion:	6.16.0
Contact:	linux-pci@vger.kernel.org
Description:	Writing 0 disables AER correctable error log ratelimiting.
		Writing a positive value sets the ratelimit interval in ms.
		Default is DEFAULT_RATELIMIT_INTERVAL (5000 ms).

What:		/sys/bus/pci/devices/<dev>/aer/correctable_ratelimit_burst
Date:		May 2025
KernelVersion:	6.16.0
Contact:	linux-pci@vger.kernel.org
Description:	Ratelimit burst for correctable error logs. Writing a value
		changes the number of errors (burst) allowed per interval
		before ratelimiting. Reading gets the current ratelimit
		burst. Default is DEFAULT_RATELIMIT_BURST (10).

What:		/sys/bus/pci/devices/<dev>/aer/nonfatal_ratelimit_interval_ms
Date:		May 2025
KernelVersion:	6.16.0
Contact:	linux-pci@vger.kernel.org
Description:	Writing 0 disables AER non-fatal uncorrectable error log
		ratelimiting. Writing a positive value sets the ratelimit
		interval in ms. Default is DEFAULT_RATELIMIT_INTERVAL
		(5000 ms).

What:		/sys/bus/pci/devices/<dev>/aer/nonfatal_ratelimit_burst
Date:		May 2025
KernelVersion:	6.16.0
Contact:	linux-pci@vger.kernel.org
Description:	Ratelimit burst for non-fatal uncorrectable error logs.
		Writing a value changes the number of errors (burst)
		allowed per interval before ratelimiting. Reading gets the
		current ratelimit burst. Default is DEFAULT_RATELIMIT_BURST
		(10).

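Changing these values means writing a decimal number to the attribute, which normally requires root. The Python sketch below sets the interval and/or burst for one error type and reads the burst back, since the burst attribute is documented as readable. `set_aer_ratelimit` and `get_aer_ratelimit_burst` are hypothetical helpers. The non-negative check is this sketch's own validation, not a documented kernel rule.

```python
from pathlib import Path

_KINDS = ("correctable", "nonfatal")


def set_aer_ratelimit(dev_dir, kind, interval_ms=None, burst=None):
    """Write new ratelimit values under <dev_dir>/aer/.

    kind is "correctable" or "nonfatal"; an interval of 0 disables
    ratelimiting for that error type. Values left as None are untouched.
    """
    if kind not in _KINDS:
        raise ValueError(f"kind must be one of {_KINDS}")
    aer = Path(dev_dir) / "aer"
    for suffix, value in (("ratelimit_interval_ms", interval_ms),
                          ("ratelimit_burst", burst)):
        if value is None:
            continue
        if value < 0:  # this sketch's own check, not a documented kernel rule
            raise ValueError(f"{suffix} must be non-negative")
        (aer / f"{kind}_{suffix}").write_text(f"{int(value)}\n")


def get_aer_ratelimit_burst(dev_dir, kind):
    """Read the current burst for kind ("correctable" or "nonfatal")."""
    return int((Path(dev_dir) / "aer" / f"{kind}_ratelimit_burst").read_text())
```

For example, `set_aer_ratelimit("/sys/bus/pci/devices/0000:00:1c.0", "correctable", interval_ms=0)` would disable correctable-error log ratelimiting for that (example) device.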