What:		/sys/kernel/debug/cxl/memX/inject_poison
Date:		April, 2023
KernelVersion:	v6.4
Contact:	linux-cxl@vger.kernel.org
Description:
		(WO) When a Device Physical Address (DPA) is written to this
		attribute, the memdev driver sends an inject poison command to
		the device for the specified address. The DPA must be 64-byte
		aligned, and the injected poison covers 64 bytes. If
		successful, the device returns poison when the address is
		accessed through the CXL.mem bus. Injecting poison adds the
		address to the device's Poison List and sets the error source
		to Injected. In addition, the device adds a poison creation
		event to its internal Informational Event log, updates the
		Event Status register, and, if configured, interrupts the host.
		Injecting poison into an address that already has poison
		present is not an error, and no error is returned. If the
		device returns 'Inject Poison Limit Reached', -EBUSY is
		returned to the user. The inject_poison attribute is only
		visible for devices supporting the capability.

		TEST-ONLY INTERFACE: This interface is intended for testing
		and validation purposes only. It is not a data repair mechanism
		and should never be used on production systems or live data.

		DATA LOSS RISK: For CXL persistent memory (PMEM) devices,
		poison injection can result in permanent data loss. Injected
		poison may render data permanently inaccessible even after
		clearing, as the clear operation writes zeros and does not
		recover original data.

		SYSTEM STABILITY RISK: For volatile memory, poison injection
		can cause kernel crashes, system instability, or unpredictable
		behavior if running code accesses the poisoned addresses or if
		those addresses back critical kernel structures.
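
		The following Python sketch shows one way a test harness on a
		non-production system might drive this attribute. It assumes the
		attribute accepts the DPA as a 0x-prefixed hexadecimal string and
		that the driver's -EBUSY surfaces in userspace as an OSError with
		errno EBUSY. The helpers format_dpa() and inject_poison() are
		hypothetical, not part of any kernel or library API.

```python
import errno
import os

# From the ABI text: the DPA must be 64-byte aligned and each
# injection covers 64 bytes.
POISON_GRANULE = 64
CXL_DEBUGFS = "/sys/kernel/debug/cxl"


def format_dpa(dpa: int) -> str:
    """Validate a DPA and render it as the string written to debugfs.

    Hypothetical helper: assumes the attribute parses a 0x-prefixed
    hexadecimal value.
    """
    if dpa < 0:
        raise ValueError("DPA must be non-negative")
    if dpa % POISON_GRANULE:
        raise ValueError(f"DPA {dpa:#x} is not 64-byte aligned")
    return f"{dpa:#x}\n"


def inject_poison(memdev: str, dpa: int) -> bool:
    """Write a DPA to memX/inject_poison (hypothetical helper).

    Returns True on success and False if the device reported
    'Inject Poison Limit Reached' (assumed to surface as EBUSY).
    Any other error is re-raised.
    """
    path = os.path.join(CXL_DEBUGFS, memdev, "inject_poison")
    payload = format_dpa(dpa)
    try:
        # The buffered write is flushed when the file closes, which
        # happens inside this try block, so driver errors land here.
        with open(path, "w") as f:
            f.write(payload)
    except OSError as e:
        if e.errno == errno.EBUSY:
            return False
        raise
    return True
```

		For example, inject_poison("mem0", 0x1000) would write 0x1000 to
		/sys/kernel/debug/cxl/mem0/inject_poison; the memdev name and
		address are placeholders.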

What:		/sys/kernel/debug/cxl/memX/clear_poison
Date:		April, 2023
KernelVersion:	v6.4
Contact:	linux-cxl@vger.kernel.org
Description:
		(WO) When a Device Physical Address (DPA) is written to this
		attribute, the memdev driver sends a clear poison command to
		the device for the specified address. Clearing poison removes
		the address from the device's Poison List and writes 0 (zero)
		to the 64 bytes starting at that address. It is not an error to
		clear poison from an address that does not have poison set. If
		the device cannot clear poison from the address, -ENXIO is
		returned. The clear_poison attribute is only visible for
		devices supporting the capability.

		TEST-ONLY INTERFACE: This interface is intended for testing
		and validation purposes only. It is not a data repair mechanism
		and should never be used on production systems or live data.

		CLEAR IS NOT DATA RECOVERY: This operation writes zeros to the
		specified address range and removes the address from the poison
		list. It does NOT recover or restore original data that may have
		been present before poison injection. Any original data at the
		cleared address is permanently lost and replaced with zeros.

		CLEAR IS NOT A REPAIR MECHANISM: This interface is for testing
		purposes only and should not be used as a data repair tool.
		Clearing poison is fundamentally different from data recovery
		or error correction.
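
		To make the inject and clear semantics concrete, the toy model
		below simulates a Poison List in Python. It is hypothetical and
		does not talk to hardware: PoisonModel, its limit parameter, the
		set of "unclearable" lines used to trigger -ENXIO, and the -EINVAL
		return for misaligned or out-of-range DPAs are all assumptions made
		for illustration. It shows that re-injecting and clearing are both
		tolerated, and that a cleared line reads back as zeros rather than
		the data written before injection.

```python
import errno

LINE = 64  # poison granularity taken from the ABI text


class PoisonModel:
    """Toy model of the documented memdev inject/clear semantics.

    Hypothetical: it does not talk to hardware, and the injection
    limit, error codes for bad DPAs, and unclearable lines are
    assumptions for illustration.
    """

    def __init__(self, size: int, limit: int = 4, unclearable=()):
        self.mem = bytearray(size)              # device media, initially zero
        self.poisoned = set()                   # stands in for the Poison List
        self.limit = limit                      # hypothetical 'Inject Poison Limit'
        self.unclearable = set(unclearable)     # lines the model refuses to clear

    def _valid(self, dpa: int) -> bool:
        # Aligned to 64 bytes and the whole line fits in the media.
        return dpa % LINE == 0 and 0 <= dpa and dpa + LINE <= len(self.mem)

    def inject(self, dpa: int) -> int:
        if not self._valid(dpa):
            return -errno.EINVAL
        if dpa in self.poisoned:
            return 0  # re-injecting poisoned memory is not an error
        if len(self.poisoned) >= self.limit:
            return -errno.EBUSY  # 'Inject Poison Limit Reached'
        self.poisoned.add(dpa)
        return 0

    def clear(self, dpa: int) -> int:
        if not self._valid(dpa):
            return -errno.EINVAL
        if dpa in self.unclearable:
            return -errno.ENXIO  # device cannot clear poison here
        self.poisoned.discard(dpa)  # fine even if dpa was never poisoned
        self.mem[dpa:dpa + LINE] = bytes(LINE)  # zeros, NOT the old data
        return 0

    def read_line(self, dpa: int) -> bytes:
        base = dpa - dpa % LINE
        if base in self.poisoned:
            raise OSError(errno.EIO, "read hit a poisoned line")
        return bytes(self.mem[base:base + LINE])
```

		In this model, writing b"A" * 64 at DPA 0, injecting, and then
		clearing leaves 64 zero bytes at DPA 0: the "A" bytes are gone,
		which is the point of the CLEAR IS NOT DATA RECOVERY note.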

What:		/sys/kernel/debug/cxl/regionX/inject_poison
Date:		August, 2025
Contact:	linux-cxl@vger.kernel.org
Description:
		(WO) When a Host Physical Address (HPA) is written to this
		attribute, the region driver translates it to a Device
		Physical Address (DPA) and identifies the corresponding
		memdev. It then sends an inject poison command to that memdev
		at the translated DPA. Refer to the memdev ABI entry for
		/sys/kernel/debug/cxl/memX/inject_poison for detailed
		behavior. This attribute is only visible if all memdevs
		participating in the region support both the inject and clear
		poison commands.

		TEST-ONLY INTERFACE: This interface is intended for testing
		and validation purposes only. It is not a data repair mechanism
		and should never be used on production systems or live data.

		DATA LOSS RISK: For CXL persistent memory (PMEM) devices,
		poison injection can result in permanent data loss. Injected
		poison may render data permanently inaccessible even after
		clearing, as the clear operation writes zeros and does not
		recover original data.

		SYSTEM STABILITY RISK: For volatile memory, poison injection
		can cause kernel crashes, system instability, or unpredictable
		behavior if running code accesses the poisoned addresses or if
		those addresses back critical kernel structures.
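
		As a rough illustration of the HPA to DPA step, the sketch below
		assumes a single-level region with simple modulo interleaving:
		each run of "granularity" bytes goes to the next memdev in
		round-robin order. SimpleRegion and hpa_to_dpa() are hypothetical;
		the region driver's actual translation walks the CXL decoder
		hierarchy and handles cases this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SimpleRegion:
    """Hypothetical single-level region with modulo interleaving."""
    hpa_base: int          # first HPA of the region
    size: int              # region size in bytes
    ways: int              # number of interleaved memdevs
    granularity: int       # bytes sent to one memdev before moving on
    dpa_bases: List[int]   # DPA where the region starts on each memdev


def hpa_to_dpa(region: SimpleRegion, hpa: int) -> Tuple[int, int]:
    """Return (interleave position, DPA) for an HPA in the region."""
    if not region.hpa_base <= hpa < region.hpa_base + region.size:
        raise ValueError(f"HPA {hpa:#x} is outside the region")
    offset = hpa - region.hpa_base
    granule = offset // region.granularity   # granule index in HPA space
    pos = granule % region.ways              # memdev that owns this granule
    # Each memdev stores every ways-th granule back to back.
    dev_offset = (granule // region.ways) * region.granularity
    dev_offset += offset % region.granularity
    return pos, region.dpa_bases[pos] + dev_offset
```

		With 2 ways and a 256-byte granularity, an HPA 0x240 bytes into
		the region falls in the third granule, which this model maps to
		position 0 at DPA offset 0x140.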

What:		/sys/kernel/debug/cxl/regionX/clear_poison
Date:		August, 2025
Contact:	linux-cxl@vger.kernel.org
Description:
		(WO) When a Host Physical Address (HPA) is written to this
		attribute, the region driver translates it to a Device
		Physical Address (DPA) and identifies the corresponding
		memdev. It then sends a clear poison command to that memdev
		at the translated DPA. Refer to the memdev ABI entry for
		/sys/kernel/debug/cxl/memX/clear_poison for detailed
		behavior. This attribute is only visible if all memdevs
		participating in the region support both the inject and clear
		poison commands.

		TEST-ONLY INTERFACE: This interface is intended for testing
		and validation purposes only. It is not a data repair mechanism
		and should never be used on production systems or live data.

		CLEAR IS NOT DATA RECOVERY: This operation writes zeros to the
		specified address range and removes the address from the poison
		list. It does NOT recover or restore original data that may have
		been present before poison injection. Any original data at the
		cleared address is permanently lost and replaced with zeros.

		CLEAR IS NOT A REPAIR MECHANISM: This interface is for testing
		purposes only and should not be used as a data repair tool.
		Clearing poison is fundamentally different from data recovery
		or error correction.
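
		Before writing an HPA, a test script may want to confirm that the
		address lies inside the region. The sketch below assumes the
		region's base and size can be read from the resource and size
		attributes under /sys/bus/cxl/devices/regionX/, and it adds a
		64-byte alignment pre-check on the HPA, an assumption carried over
		from the memdev DPA rule. read_int(), check_hpa(), and
		clear_region_poison() are hypothetical helpers.

```python
import os

CXL_SYSFS = "/sys/bus/cxl/devices"      # assumed location of regionX/resource, size
CXL_DEBUGFS = "/sys/kernel/debug/cxl"


def read_int(path: str) -> int:
    """Read a sysfs value; base 0 accepts both '0x...' and decimal."""
    with open(path) as f:
        return int(f.read().strip(), 0)


def check_hpa(hpa: int, base: int, size: int) -> str:
    """Validate an HPA against a region window and format it (hypothetical)."""
    if not base <= hpa < base + size:
        raise ValueError(f"HPA {hpa:#x} outside [{base:#x}, {base + size:#x})")
    if hpa % 64:  # assumption: mirrors the 64-byte DPA alignment rule
        raise ValueError(f"HPA {hpa:#x} is not 64-byte aligned")
    return f"{hpa:#x}\n"


def clear_region_poison(region: str, hpa: int) -> None:
    """Write an HPA to regionX/clear_poison after a range check."""
    base = read_int(os.path.join(CXL_SYSFS, region, "resource"))
    size = read_int(os.path.join(CXL_SYSFS, region, "size"))
    payload = check_hpa(hpa, base, size)
    with open(os.path.join(CXL_DEBUGFS, region, "clear_poison"), "w") as f:
        f.write(payload)
```

		Rejecting out-of-window addresses in userspace gives a clearer
		error message than relying on the kernel's own validation, which
		still applies regardless.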

What:		/sys/kernel/debug/cxl/einj_types
Date:		January, 2024
KernelVersion:	v6.9
Contact:	linux-cxl@vger.kernel.org
Description:
		(RO) Prints the CXL protocol error types made available by
		the platform in the format:

			0x<error number> <error type>

		The possible error types are (as of ACPI v6.5):

			0x1000	CXL.cache Protocol Correctable
			0x2000	CXL.cache Protocol Uncorrectable non-fatal
			0x4000	CXL.cache Protocol Uncorrectable fatal
			0x8000	CXL.mem Protocol Correctable
			0x10000	CXL.mem Protocol Uncorrectable non-fatal
			0x20000	CXL.mem Protocol Uncorrectable fatal

		The <error number> can be written to einj_inject to inject
		<error type> into a chosen dport.
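
		A script can turn the einj_types output into a lookup table. The
		parser below is a sketch: it assumes one entry per line, with the
		0x-prefixed error number separated from the type string by
		whitespace (a space or a tab). parse_einj_types() is a
		hypothetical helper name.

```python
from typing import Dict


def parse_einj_types(text: str) -> Dict[int, str]:
    """Map error number -> error type from einj_types output."""
    types = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if not parts:
            continue  # skip blank lines
        number = int(parts[0], 16)  # base 16 accepts the '0x' prefix
        types[number] = parts[1].strip() if len(parts) > 1 else ""
    return types
```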

What:		/sys/kernel/debug/cxl/$dport_dev/einj_inject
Date:		January, 2024
KernelVersion:	v6.9
Contact:	linux-cxl@vger.kernel.org
Description:
		(WO) Writing an integer to this file injects the corresponding
		CXL protocol error into $dport_dev ($dport_dev is a device
		name from /sys/bus/pci/devices). The integer-to-type mapping
		for injection can be found by reading einj_types. If the dport
		was enumerated in RCH mode, a CXL 1.1 error is injected;
		otherwise, a CXL 2.0 error is injected.
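
		The sketch below injects a protocol error only if its number is
		advertised in einj_types. It assumes einj_inject accepts the same
		0x-prefixed number that einj_types prints. available_types(),
		einj_payload(), and inject_protocol_error() are hypothetical helper
		names, and the caller must supply the dport device name.

```python
import os
from typing import Set

CXL_DEBUGFS = "/sys/kernel/debug/cxl"


def available_types(debugfs: str = CXL_DEBUGFS) -> Set[int]:
    """Collect the error numbers listed in einj_types."""
    numbers = set()
    with open(os.path.join(debugfs, "einj_types")) as f:
        for line in f:
            parts = line.split()
            if parts:
                numbers.add(int(parts[0], 16))
    return numbers


def einj_payload(available: Set[int], error_number: int) -> str:
    """Refuse error numbers the platform did not advertise (hypothetical)."""
    if error_number not in available:
        raise ValueError(f"{error_number:#x} is not listed in einj_types")
    return f"{error_number:#x}\n"


def inject_protocol_error(dport_dev: str, error_number: int) -> None:
    """Write an advertised error number to $dport_dev/einj_inject."""
    payload = einj_payload(available_types(), error_number)
    path = os.path.join(CXL_DEBUGFS, dport_dev, "einj_inject")
    with open(path, "w") as f:
        f.write(payload)
```

		For example, inject_protocol_error("0000:0c:00.0", 0x8000) would
		request a CXL.mem Protocol Correctable error on that dport; the
		PCI device name is a placeholder.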