What:		/sys/bus/edac/devices/<dev-name>/mem_repairX
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		The sysfs EDAC bus devices /<dev-name>/mem_repairX subdirectory
		pertains to the control of memory media repair features, such as
		PPR (Post Package Repair) and memory sparing, where the <dev-name>
		directory corresponds to a device registered with the EDAC
		device driver for the memory repair features.

		Post Package Repair is a maintenance operation that requests the
		memory device to perform a repair operation on its media. It is a
		memory self-healing feature that fixes a failing memory location by
		replacing it with a spare row in a DRAM device. For example, a
		CXL memory device with DRAM components that support PPR features may
		implement PPR maintenance operations. DRAM components may support
		two types of PPR functions: hard PPR, for a permanent row repair, and
		soft PPR, for a temporary row repair. Soft PPR may be much faster
		than hard PPR, but the repair is lost with a power cycle.

		The sysfs attribute nodes for a repair feature are only
		present if the parent driver has implemented the corresponding
		attr callback function and provided the necessary operations
		to the EDAC device driver during registration.

		In some states of system configuration (e.g. before address
		decoders have been configured), memory devices (e.g.
		CXL) may not have an active mapping in the main host
		physical address map. As such, the memory to repair must be
		identified by a device specific physical addressing scheme
		using a device physical address (DPA). The DPA and other control
		attributes to use will be presented in related error records.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_type
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RO) Memory repair type, for example post package repair or
		memory sparing. Valid values are:

		- ppr - Post package repair.

		- cacheline-sparing

		- row-sparing

		- bank-sparing

		- rank-sparing

		- All other values are reserved.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/persist_mode
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Get/Set the current persist repair mode set for a
		repair function. Depending on the memory repair function,
		the persist repair modes supported by the device are either
		temporary (the repair is lost with a power cycle) or
		permanent. Valid values are:

		- 0 - Soft memory repair (temporary repair).

		- 1 - Hard memory repair (permanent repair).

		- All other values are reserved.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_safe_when_in_use
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RO) True if memory media is accessible and data is retained
		during the memory repair operation.
		Otherwise, the data may not be retained and memory requests may
		not be correctly processed during a repair operation. In such a
		case the repair operation cannot be executed at runtime and the
		memory must be taken offline.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/hpa
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Host Physical Address (HPA) of the memory to repair.
		The HPA to use will be provided in related error records.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/dpa
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Device Physical Address (DPA) of the memory to repair.
		The specific DPA to use will be provided in related error
		records.

		In some states of system configuration (e.g. before address
		decoders have been configured), memory devices (e.g.
		CXL) may not have an active mapping in the main host
		physical address map. As such, the memory to repair must be
		identified by a device specific physical addressing scheme
		using a DPA. The device physical address (DPA) to use will be
		presented in related error records.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/nibble_mask
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Read/Write Nibble mask of the memory to repair.
		The nibble mask identifies one or more nibbles in error on the
		memory bus that produced the error event. Nibble mask bit 0
		shall be set if nibble 0 on the memory bus produced the
		event, etc. For example, for CXL PPR and sparing, a nibble mask
		bit set to 1 indicates the request to perform the repair
		operation in the specific device. All nibble mask bits set
		to 1 indicates the request to perform the operation in all
		devices. For CXL memory repair, for example, the specific value
		of the nibble mask to use will be provided in related error
		records. For more details, see the nibble mask field in CXL spec
		ver 3.1, section 8.2.9.7.1.2 Table 8-103 soft PPR, section
		8.2.9.7.1.3 Table 8-104 hard PPR, and section 8.2.9.7.1.4
		Table 8-105 memory sparing.
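
		The bit layout described above (bit n set when nibble n produced
		the event) can be sketched as follows. This is only an
		illustration: the helper names are hypothetical, and the mask
		width passed to all_nibbles_mask() is an assumption that depends
		on the device. In practice the value to write is taken from the
		related error record rather than computed by hand.

```python
def nibble_mask_from_indices(nibbles):
    """Return a mask with bit n set for every nibble index n in `nibbles`."""
    mask = 0
    for n in nibbles:
        if n < 0:
            raise ValueError(f"invalid nibble index: {n}")
        mask |= 1 << n
    return mask


def all_nibbles_mask(width_bits):
    """Return a mask with the low `width_bits` bits set (all devices).

    `width_bits` is an assumption supplied by the caller; the real width
    is defined by the device and the relevant specification.
    """
    return (1 << width_bits) - 1


print(hex(nibble_mask_from_indices([0])))     # 0x1
print(hex(nibble_mask_from_indices([0, 3])))  # 0x9
```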

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_hpa
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_hpa
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_dpa
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_dpa
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) The supported range of memory addresses that can be
		repaired. The memory device may give the supported range of
		attributes to use, which depends on the memory device
		and the portion of memory to repair.
		Userspace may receive the specific attribute values
		to use for a repair operation from the memory device via
		related error records and trace events, for example the CXL
		DRAM and CXL general media error records of CXL memory devices.
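
		As an illustration of how userspace might use the range
		attributes, the sketch below checks a DPA taken from an error
		record against min_dpa and max_dpa before it is written to dpa.
		The helper names are hypothetical, and the assumption that the
		attributes read back as '0x'-prefixed hex or plain decimal text
		is not stated by this document.

```python
from pathlib import Path


def read_sysfs_int(path):
    """Read one integer attribute; base 0 accepts '0x' hex or plain decimal."""
    return int(Path(path).read_text().strip(), 0)


def dpa_in_supported_range(repair_dir, dpa):
    """Return True if `dpa` lies within [min_dpa, max_dpa] of `repair_dir`.

    `repair_dir` is the path of a mem_repairX directory.
    """
    repair_dir = Path(repair_dir)
    lowest = read_sysfs_int(repair_dir / "min_dpa")
    highest = read_sysfs_int(repair_dir / "max_dpa")
    return lowest <= dpa <= highest
```

		The same check applies to min_hpa and max_hpa when the repair
		is addressed by HPA.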

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/bank_group
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/bank
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/rank
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/row
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/column
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/channel
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/sub_channel
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) The control attributes for the memory to be repaired.
		The specific value of attributes to use depends on the
		portion of memory to repair and will be reported to the host
		in related error records and be available to userspace
		in trace events, such as CXL DRAM and CXL general media
		error records of CXL memory devices.

		When read back, these attributes return the current
		value of the memory requested to be repaired.

		bank_group - The bank group of the memory to repair.

		bank - The bank number of the memory to repair.

		rank - The rank of the memory to repair. Rank is defined as a
		set of memory devices on a channel that together execute a
		transaction.

		row - The row number of the memory to repair.

		column - The column number of the memory to repair.

		channel - The channel of the memory to repair. Channel is
		defined as an interface that can be independently accessed
		for a transaction.

		sub_channel - The subchannel of the memory to repair.

		The requirement to set these attributes varies based on the
		repair function. The attributes in sysfs are not present
		unless required for a repair function.

		For example, for the soft PPR (CXL spec ver 3.1, Section
		8.2.9.7.1.2 Table 8-103) and hard PPR (Section 8.2.9.7.1.3
		Table 8-104) operations, these attributes do not need to be
		set. For memory sparing (CXL spec ver 3.1, Section 8.2.9.7.1.4
		Table 8-105), these attributes must be set based on the memory
		sparing granularity.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(WO) Issue the memory repair operation for the specified
		memory repair attributes. The operation may fail if resources
		are insufficient based on the requirements of the memory
		device and repair function.

		- 1 - Issue the repair operation.

		- All other values are reserved.
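
		Putting the attributes together, a soft repair request might be
		issued roughly as sketched below. This is a minimal sketch under
		stated assumptions, not a reference implementation: the helper
		names are hypothetical, repair_safe_when_in_use is assumed to
		read back as "1" when true, and the attributes are assumed to
		accept '0x'-prefixed hex text. Which attributes must be set
		depends on the repair function, as described above.

```python
from pathlib import Path


def write_attr(repair_dir, name, value):
    """Write one value to the attribute file `name` under `repair_dir`."""
    (Path(repair_dir) / name).write_text(f"{value}\n")


def issue_soft_repair(repair_dir, dpa, nibble_mask):
    """Program the repair target, then trigger the repair by writing 1."""
    repair_dir = Path(repair_dir)
    # Assumption: the attribute reads "1" when a runtime repair is safe.
    safe = (repair_dir / "repair_safe_when_in_use").read_text().strip()
    if safe != "1":
        raise RuntimeError("repair is not safe at runtime; offline the memory first")
    write_attr(repair_dir, "persist_mode", 0)  # 0 = soft (temporary) repair
    write_attr(repair_dir, "dpa", hex(dpa))
    write_attr(repair_dir, "nibble_mask", hex(nibble_mask))
    write_attr(repair_dir, "repair", 1)        # 1 = issue the repair operation
```

		The function would be called with the path of a mem_repairX
		directory and with the DPA and nibble mask reported in the
		related error record. A failed write to repair surfaces as an
		OSError, for example when the device lacks repair resources.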