What:		/sys/bus/edac/devices/<dev-name>/mem_repairX
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		The sysfs EDAC bus devices /<dev-name>/mem_repairX subdirectory
		pertains to the control of memory media repair features, such as
		PPR (Post Package Repair) and memory sparing, where the <dev-name>
		directory corresponds to a device registered with the EDAC
		device driver for the memory repair features.

		Post Package Repair is a maintenance operation that requests the
		memory device to perform a repair operation on its media. It is a
		memory self-healing feature that fixes a failing memory location by
		replacing it with a spare row in a DRAM device. For example, a
		CXL memory device with DRAM components that support PPR features may
		implement PPR maintenance operations. DRAM components may support
		two types of PPR functions: hard PPR, for a permanent row repair, and
		soft PPR, for a temporary row repair. Soft PPR may be much faster
		than hard PPR, but the repair is lost with a power cycle.

		The sysfs attribute nodes for a repair feature are only
		present if the parent driver has implemented the corresponding
		attr callback function and provided the necessary operations
		to the EDAC device driver during registration.

		In some states of system configuration (e.g. before address
		decoders have been configured), memory devices (e.g. CXL)
		may not have an active mapping in the main host physical
		address map. As such, the memory to repair must be
		identified by a device specific physical addressing scheme
		using a device physical address (DPA). The DPA and other control
		attributes to use will be presented in related error records.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_type
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RO) Memory repair type, for example post package repair
		or memory sparing. Valid values are:

		- ppr - Post package repair.

		- All other values are reserved.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/persist_mode
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Get/Set the persist repair mode currently set for a
		repair function. The persist repair modes a device supports
		for a memory repair function are temporary, where the repair
		is lost with a power cycle, and permanent. Valid values are:

		- 0 - Soft memory repair (temporary repair).

		- 1 - Hard memory repair (permanent repair).

		- All other values are reserved.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_safe_when_in_use
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RO) True if memory media is accessible and data is retained
		during the memory repair operation.
		Otherwise, data may not be retained and memory requests may
		not be correctly processed during a repair operation. In that
		case the repair operation cannot be executed at runtime and
		the memory must be taken offline.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/hpa
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Host Physical Address (HPA) of the memory to repair.
		The HPA to use will be provided in related error records.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/dpa
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Device Physical Address (DPA) of the memory to repair.
		The specific DPA to use will be provided in related error
		records.

		In some states of system configuration (e.g. before address
		decoders have been configured), memory devices (e.g. CXL)
		may not have an active mapping in the main host physical
		address map. As such, the memory to repair must be
		identified by a device specific physical addressing scheme
		using a DPA. The device physical address (DPA) to use will be
		presented in related error records.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/nibble_mask
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Read/Write Nibble mask of the memory to repair.
		Nibble mask identifies one or more nibbles in error on the
		memory bus that produced the error event. Nibble Mask bit 0
		shall be set if nibble 0 on the memory bus produced the
		event, etc. For example, for CXL PPR and sparing, a nibble
		mask bit set to 1 indicates the request to perform the repair
		operation in the specific device. All nibble mask bits set
		to 1 indicates the request to perform the operation in all
		devices. For CXL memory repair, the specific value of the
		nibble mask to use will be provided in related error records.
		For more details, see the nibble mask field in CXL spec ver 3.1,
		section 8.2.9.7.1.2 Table 8-103 soft PPR, section
		8.2.9.7.1.3 Table 8-104 hard PPR, and section 8.2.9.7.1.4
		Table 8-105 memory sparing.

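		As an illustration of the bit layout described above (nibble
		n maps to mask bit n), a mask could be assembled as in the
		sketch below. The helper name nibble_mask() is hypothetical
		and not part of any kernel or CXL interface; the valid mask
		width is device specific and is not checked here. In practice
		the value should be taken from the related error record.

```python
def nibble_mask(nibbles):
    """Return a mask with bit n set for every nibble index n in `nibbles`."""
    mask = 0
    for n in nibbles:
        if n < 0:
            raise ValueError("nibble index must be non-negative")
        # Bit 0 corresponds to nibble 0, bit 1 to nibble 1, and so on.
        mask |= 1 << n
    return mask


# Nibbles 0 and 3 in error: bits 0 and 3 set.
print(hex(nibble_mask([0, 3])))  # prints 0x9
```
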
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_hpa
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_hpa
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_dpa
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_dpa
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) The supported range of memory addresses for the memory
		to be repaired. The memory device may provide the supported
		range of attributes to use, which depends on the memory
		device and the portion of memory to repair.
		Userspace may receive the specific attribute values to use
		for a repair operation from the memory device via related
		error records and trace events, for example CXL DRAM and CXL
		general media error records in CXL memory devices.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(WO) Issue the memory repair operation for the specified
		memory repair attributes. The operation may fail if resources
		are insufficient based on the requirements of the memory
		device and repair function.

		- 1 - Issue the repair operation.

		- All other values are reserved.
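
		The attributes above might be combined as in the following
		sketch: set the repair attributes first, then write 1 to
		repair. The function name request_repair() is hypothetical,
		the hexadecimal string format used for dpa and nibble_mask is
		an assumption, and which attributes exist depends on the
		parent driver. A real caller should also consult
		repair_safe_when_in_use and take the values from the related
		error record.

```python
from pathlib import Path


def request_repair(repair_dir, dpa, nibble_mask, hard=False):
    """Set the repair attributes in `repair_dir`, then issue the repair."""
    d = Path(repair_dir)
    # 0 selects soft (temporary) repair, 1 selects hard (permanent) repair.
    (d / "persist_mode").write_text("1" if hard else "0")
    # Assumption: the attributes accept 0x-prefixed hexadecimal strings.
    (d / "dpa").write_text(hex(dpa))
    (d / "nibble_mask").write_text(hex(nibble_mask))
    # Writing 1 issues the operation; all other values are reserved.
    (d / "repair").write_text("1")


# Hypothetical usage; the device name and values are placeholders:
# request_repair("/sys/bus/edac/devices/<dev-name>/mem_repair0",
#                dpa=0x300000, nibble_mask=0x9)
```

		The write to repair may raise OSError if the device rejects
		the request, for example when resources are insufficient.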