What:		/sys/bus/edac/devices/<dev-name>/mem_repairX
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		The sysfs EDAC bus devices /<dev-name>/mem_repairX subdirectory
		pertains to the control of memory media repair features, such
		as PPR (Post Package Repair) and memory sparing, where the
		<dev-name> directory corresponds to a device registered with
		the EDAC device driver for the memory repair features.

		Post Package Repair is a maintenance operation that requests
		the memory device to perform a repair operation on its media.
		It is a memory self-healing feature that fixes a failing
		memory location by replacing it with a spare row in a DRAM
		device. For example, a CXL memory device with DRAM components
		that support PPR features may implement PPR maintenance
		operations. DRAM components may support two types of PPR
		functions: hard PPR, for a permanent row repair, and soft PPR,
		for a temporary row repair. Soft PPR may be much faster than
		hard PPR, but the repair is lost with a power cycle.

		The sysfs attribute nodes for a repair feature are only
		present if the parent driver has implemented the corresponding
		attr callback function and provided the necessary operations
		to the EDAC device driver during registration.

		In some states of system configuration (e.g. before address
		decoders have been configured), memory devices (e.g. CXL)
		may not have an active mapping in the main host physical
		address map. As such, the memory to repair must be
		identified by a device specific physical addressing scheme
		using a device physical address (DPA). The DPA and other
		control attributes to use will be presented in related error
		records.
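
		Because attribute nodes are only present when the driver
		implements them, userspace may want to discover what a device
		exposes before using it. The following Python sketch lists the
		attribute files found under each mem_repairX subdirectory. The
		helper name list_repair_instances is hypothetical, and treating
		X as a decimal index is an assumption; only the directory
		layout is taken from this document.

```python
from pathlib import Path


def list_repair_instances(dev_dir):
    """Map each mem_repairX directory under dev_dir to its attribute names.

    dev_dir is a path such as /sys/bus/edac/devices/<dev-name>. Only the
    attribute files that actually exist are reported, since a driver
    exposes just the ones it implements.
    """
    instances = {}
    base = Path(dev_dir)
    if not base.is_dir():
        return instances
    for entry in sorted(base.iterdir()):
        name = entry.name
        if not (entry.is_dir() and name.startswith("mem_repair")):
            continue
        # Assumption: the X in mem_repairX is a decimal instance index.
        if not name[len("mem_repair"):].isdigit():
            continue
        instances[name] = sorted(p.name for p in entry.iterdir() if p.is_file())
    return instances
```

		Called on a device directory, the helper returns a dictionary
		keyed by mem_repair0, mem_repair1 and so on, and an empty
		dictionary if the directory does not exist.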

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_type
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RO) Memory repair type, e.g. post package repair or
		memory sparing. Valid values are:

		- ppr - Post package repair.

		- cacheline-sparing

		- row-sparing

		- bank-sparing

		- rank-sparing

		- All other values are reserved.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/persist_mode
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Get/Set the current persist repair mode of a repair
		function. Depending on the memory repair function, the
		persist repair mode supported by the device is either
		temporary, in which case the repair is lost with a power
		cycle, or permanent. Valid values are:

		- 0 - Soft memory repair (temporary repair).

		- 1 - Hard memory repair (permanent repair).

		- All other values are reserved.
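
		As a sketch of how a tool might read and set this attribute
		while refusing reserved values, consider the Python helpers
		below. The function and constant names are hypothetical, and
		the code assumes the attribute is read and written as a plain
		integer string.

```python
from pathlib import Path

# Values documented for persist_mode; everything else is reserved.
PERSIST_SOFT = 0  # temporary repair, lost with a power cycle
PERSIST_HARD = 1  # permanent repair


def get_persist_mode(repair_dir):
    """Return the current persist_mode as an int."""
    return int(Path(repair_dir, "persist_mode").read_text().strip(), 0)


def set_persist_mode(repair_dir, mode):
    """Write persist_mode, rejecting reserved values before touching sysfs."""
    if mode not in (PERSIST_SOFT, PERSIST_HARD):
        raise ValueError(f"reserved persist_mode value: {mode!r}")
    Path(repair_dir, "persist_mode").write_text(f"{mode}\n")
```

		Validating in userspace keeps a reserved value from ever
		reaching the driver; the driver may still reject a mode that
		the device does not support.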

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_safe_when_in_use
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RO) True if memory media is accessible and data is retained
		during the memory repair operation.
		Otherwise, the data may not be retained and memory requests
		may not be correctly processed during a repair operation. In
		that case the repair operation cannot be executed at runtime
		and the memory must be taken offline.
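
		A tool would typically consult this attribute before issuing
		a repair on memory that is online. The Python sketch below
		shows one way to do so. The helper name is hypothetical, and
		the exact text the attribute returns is not specified in this
		document, so the set of accepted "true" strings is an
		assumption.

```python
from pathlib import Path

# Assumption: the boolean may be rendered as "1"/"0" or as a word; the
# exact text is not specified in this document.
_TRUE_STRINGS = {"1", "y", "yes", "true"}


def repair_safe_when_in_use(repair_dir):
    """Return True if the attribute reports that runtime repair is safe."""
    text = Path(repair_dir, "repair_safe_when_in_use").read_text()
    return text.strip().lower() in _TRUE_STRINGS
```

		If the helper returns False, the memory should be taken
		offline before the repair is requested.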

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/hpa
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Host Physical Address (HPA) of the memory to repair.
		The HPA to use will be provided in related error records.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/dpa
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Device Physical Address (DPA) of the memory to repair.
		The specific DPA to use will be provided in related error
		records.

		In some states of system configuration (e.g. before address
		decoders have been configured), memory devices (e.g. CXL)
		may not have an active mapping in the main host physical
		address map. As such, the memory to repair must be
		identified by a device specific physical addressing scheme
		using a DPA.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/nibble_mask
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Nibble mask of the memory to repair.
		The nibble mask identifies one or more nibbles in error on
		the memory bus that produced the error event. Nibble mask
		bit 0 shall be set if nibble 0 on the memory bus produced
		the event, and so on. For example, for CXL PPR and sparing,
		a nibble mask bit set to 1 indicates a request to perform
		the repair operation in the specific device. All nibble
		mask bits set to 1 indicates a request to perform the
		operation in all devices. For CXL memory repair, for
		example, the specific value of the nibble mask to use will
		be provided in related error records.
		For more details, see the nibble mask field in CXL spec
		ver 3.1, section 8.2.9.7.1.2 Table 8-103 (soft PPR),
		section 8.2.9.7.1.3 Table 8-104 (hard PPR) and section
		8.2.9.7.1.4 Table 8-105 (memory sparing).
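
		The bit-to-nibble mapping described above (bit n set when
		nibble n is in error) can be sketched in Python as follows.
		Both helper names are hypothetical. The width of the mask is
		not stated in this document, so the code does not enforce an
		upper bound.

```python
def nibble_mask_from_indices(nibbles):
    """Build a nibble mask with bit n set for every nibble index n."""
    mask = 0
    for n in nibbles:
        if n < 0:
            raise ValueError(f"nibble index must be non-negative: {n}")
        mask |= 1 << n
    return mask


def nibbles_in_mask(mask):
    """Return the sorted nibble indices whose bits are set in mask."""
    return [n for n in range(mask.bit_length()) if mask & (1 << n)]
```

		For instance, nibbles 0 and 3 give the mask 0x9. In practice
		the value normally comes from the related error record rather
		than being computed by hand.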

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_hpa
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_hpa
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_dpa
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_dpa
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) The supported range of memory addresses for repair.
		The memory device may provide the supported range of the
		attributes to use, which depends on the memory device and
		the portion of memory to repair.
		Userspace may receive the specific attribute values to use
		for a repair operation from the memory device via related
		error records and trace events, e.g. CXL DRAM and CXL
		general media error records in CXL memory devices.
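
		A tool might use these limits to reject an address before
		writing it to hpa or dpa. The Python sketch below does this
		check. The helper names are hypothetical, and two points are
		assumptions not stated in this document: that the values
		parse as decimal or 0x-prefixed hexadecimal integers, and
		that the range is inclusive at both ends.

```python
from pathlib import Path


def read_int_attr(repair_dir, name):
    """Read a numeric attribute; base 0 accepts both "0x..." and decimal."""
    return int(Path(repair_dir, name).read_text().strip(), 0)


def address_in_range(repair_dir, kind, address):
    """Check address against min_<kind>/max_<kind>, kind being "hpa" or "dpa".

    Assumes the reported range is inclusive at both ends.
    """
    if kind not in ("hpa", "dpa"):
        raise ValueError(f"unknown address kind: {kind!r}")
    low = read_int_attr(repair_dir, f"min_{kind}")
    high = read_int_attr(repair_dir, f"max_{kind}")
    return low <= address <= high
```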

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/bank_group
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/bank
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/rank
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/row
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/column
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/channel
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/sub_channel
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) The control attributes for the memory to be repaired.
		The specific value of attributes to use depends on the
		portion of memory to repair and will be reported to the host
		in related error records and be available to userspace
		in trace events, such as CXL DRAM and CXL general media
		error records of CXL memory devices.

		When reading back these attributes, the current value of
		the memory requested to be repaired is returned.

		bank_group - The bank group of the memory to repair.

		bank - The bank number of the memory to repair.

		rank - The rank of the memory to repair. Rank is defined as a
		set of memory devices on a channel that together execute a
		transaction.

		row - The row number of the memory to repair.

		column - The column number of the memory to repair.

		channel - The channel of the memory to repair. Channel is
		defined as an interface that can be independently accessed
		for a transaction.

		sub_channel - The subchannel of the memory to repair.

		The requirement to set these attributes varies based on the
		repair function. The attributes in sysfs are not present
		unless required for a repair function.

		For example, for the CXL spec ver 3.1, Section 8.2.9.7.1.2
		Table 8-103 soft PPR and Section 8.2.9.7.1.3 Table 8-104 hard
		PPR operations, these attributes are not required to be set.
		For the CXL spec ver 3.1, Section 8.2.9.7.1.4 Table 8-105
		memory sparing operation, these attributes are required to be
		set based on the memory sparing granularity.
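
		Since these attributes are present only when the repair
		function requires them, a tool can write the values it has
		from an error record and skip any attribute that is absent.
		The Python sketch below illustrates this. The helper name is
		hypothetical, and writing the values as 0x-prefixed
		hexadecimal is an assumption about the accepted input format.

```python
from pathlib import Path

# Attribute names listed in this entry.
LOCATION_ATTRS = (
    "bank_group", "bank", "rank", "row", "column", "channel", "sub_channel",
)


def write_location_attrs(repair_dir, values):
    """Write each value in `values` to the attribute of the same name.

    Returns the names that were skipped because the attribute file does
    not exist, i.e. the repair function does not require it.
    """
    unknown = set(values) - set(LOCATION_ATTRS)
    if unknown:
        raise ValueError(f"not a location attribute: {sorted(unknown)}")
    skipped = []
    for name, value in values.items():
        path = Path(repair_dir, name)
        if not path.exists():
            skipped.append(name)
            continue
        # Assumption: the attribute accepts 0x-prefixed hexadecimal.
        path.write_text(f"{value:#x}\n")
    return skipped
```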

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(WO) Issue the memory repair operation for the specified
		memory repair attributes. The operation may fail if resources
		are insufficient based on the requirements of the memory
		device and repair function.

		- 1 - Issue the repair operation.

		- All other values are reserved.
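
		Putting the pieces together, a repair request amounts to
		setting the relevant attributes and then writing 1 to this
		file. The Python sketch below shows that sequence. The helper
		name issue_repair is hypothetical, and writing values as
		0x-prefixed hexadecimal is an assumption about the accepted
		input format.

```python
from pathlib import Path


def issue_repair(repair_dir, attrs):
    """Set the given attributes, then write 1 to `repair`.

    attrs maps attribute names (e.g. "dpa", "nibble_mask") to integers.
    Raises FileNotFoundError if an attribute is not exposed, and lets any
    OSError from the final write propagate, since the operation may fail
    when the device lacks resources.
    """
    base = Path(repair_dir)
    for name, value in attrs.items():
        path = base / name
        if not path.exists():
            raise FileNotFoundError(f"{name} is not exposed in {base}")
        # Assumption: the attribute accepts 0x-prefixed hexadecimal.
        path.write_text(f"{value:#x}\n")
    # 1 is the only documented value; all other values are reserved.
    (base / "repair").write_text("1\n")
```

		The attribute values themselves would come from the related
		error record, as described in the entries above.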