What:		/sys/bus/edac/devices/<dev-name>/mem_repairX
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		The sysfs EDAC bus devices /<dev-name>/mem_repairX subdirectory
		pertains to the control of memory media repair features, such
		as PPR (Post Package Repair) and memory sparing, where the
		<dev-name> directory corresponds to a device registered with
		the EDAC device driver for the memory repair features.

		Post Package Repair is a maintenance operation that requests
		the memory device to perform a repair operation on its media.
		It is a memory self-healing feature that fixes a failing memory
		location by replacing it with a spare row in a DRAM device. For
		example, a CXL memory device with DRAM components that support
		PPR features may implement PPR maintenance operations. DRAM
		components may support two types of PPR functions: hard PPR,
		for a permanent row repair, and soft PPR, for a temporary row
		repair. Soft PPR may be much faster than hard PPR, but the
		repair is lost with a power cycle.

		The sysfs attribute nodes for a repair feature are only
		present if the parent driver has implemented the corresponding
		attr callback function and provided the necessary operations
		to the EDAC device driver during registration.

		In some states of system configuration (e.g. before address
		decoders have been configured), memory devices (e.g.
		CXL) may not have an active mapping in the host physical
		address map. As such, the memory to repair must be
		identified by a device specific physical addressing scheme
		using a device physical address (DPA). The DPA and other
		control attributes to use will be presented in related error
		records.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_type
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RO) Memory repair type, for example post package repair or
		memory sparing. Valid values are:

		- ppr - Post package repair.

		- All other values are reserved.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/persist_mode
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Get/Set the current persist repair mode for a repair
		function. The persist repair modes supported by the device
		depend on the memory repair function. A repair is either
		temporary, in which case it is lost with a power cycle, or
		permanent. Valid values are:

		- 0 - Soft memory repair (temporary repair).

		- 1 - Hard memory repair (permanent repair).

		- All other values are reserved.
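
As a rough illustration of how userspace might read the repair_type and persist_mode attributes described above, the following Python sketch decodes both values for one mem_repairX directory. The helper names (read_attr, describe_repair_instance) are hypothetical, and the code assumes persist_mode reads back as a plain integer; neither is defined by this ABI.

```python
from pathlib import Path

# Meaning of the persist_mode values listed in the entry above.
PERSIST_MODES = {0: "soft (temporary)", 1: "hard (permanent)"}


def read_attr(repair_dir, name):
    """Return the stripped text content of one sysfs attribute file."""
    return (Path(repair_dir) / name).read_text().strip()


def describe_repair_instance(repair_dir):
    """Return (repair_type, persist mode description) for a mem_repairX dir."""
    repair_type = read_attr(repair_dir, "repair_type")
    # Assumption: persist_mode is printed as a plain integer.
    mode = int(read_attr(repair_dir, "persist_mode"), 0)
    # Any value other than 0 or 1 is reserved according to the entry above.
    return repair_type, PERSIST_MODES.get(mode, "reserved")
```

It would be called with the path of an actual instance, i.e. the mem_repairX directory under /sys/bus/edac/devices/<dev-name>/ with <dev-name> and X filled in for the system at hand.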

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair_safe_when_in_use
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RO) True if memory media is accessible and data is retained
		during the memory repair operation.
		Otherwise, data may not be retained and memory requests may
		not be correctly processed during a repair operation. In that
		case the repair operation cannot be executed at runtime and
		the memory must be taken offline.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/hpa
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Host Physical Address (HPA) of the memory to repair.
		The HPA to use will be provided in related error records.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/dpa
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Device Physical Address (DPA) of the memory to repair.
		The specific DPA to use will be provided in related error
		records.

		In some states of system configuration (e.g. before address
		decoders have been configured), memory devices (e.g. CXL)
		may not have an active mapping in the host physical
		address map.
		As such, the memory to repair must be
		identified by a device specific physical addressing scheme
		using a DPA. The DPA to use will be presented in related
		error records.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/nibble_mask
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) Read/Write the nibble mask of the memory to repair.
		The nibble mask identifies one or more nibbles in error on the
		memory bus that produced the error event. Nibble mask bit 0
		shall be set if nibble 0 on the memory bus produced the
		event, etc. For example, for CXL PPR and sparing, a nibble
		mask bit set to 1 indicates the request to perform the repair
		operation in the specific device. All nibble mask bits set
		to 1 indicates the request to perform the operation in all
		devices. For CXL memory repair, the specific value of the
		nibble mask to use will be provided in related error records.
		For more details, see the nibble mask field in CXL spec ver 3.1,
		section 8.2.9.7.1.2 Table 8-103 soft PPR and section
		8.2.9.7.1.3 Table 8-104 hard PPR, section 8.2.9.7.1.4
		Table 8-105 memory sparing.
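
The bit-to-nibble mapping described above (bit n set when nibble n produced the event) can be sketched in Python as follows. The function names are hypothetical and exist only for illustration. In practice the mask value is expected to come from the related error record rather than being computed by hand.

```python
def nibble_mask_from_indices(nibbles):
    """Build a mask with bit n set for every nibble index n in `nibbles`."""
    mask = 0
    for n in nibbles:
        if n < 0:
            raise ValueError("nibble index must be non-negative")
        mask |= 1 << n
    return mask


def nibbles_in_mask(mask):
    """Return the sorted nibble indices whose bits are set in `mask`."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]
```

For instance, nibbles 0 and 2 in error correspond to the mask 0x5, and decoding 0x5 yields the indices 0 and 2 again.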

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_hpa
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_hpa
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/min_dpa
What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/max_dpa
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(RW) The supported range of memory addresses that can be
		repaired. The memory device may give the supported range of
		attributes to use, and it will depend on the memory device
		and the portion of memory to repair.
		Userspace may receive the specific attribute values to use
		for a repair operation from the memory device via related
		error records and trace events, for example CXL DRAM and
		CXL general media error records in CXL memory devices.

What:		/sys/bus/edac/devices/<dev-name>/mem_repairX/repair
Date:		March 2025
KernelVersion:	6.15
Contact:	linux-edac@vger.kernel.org
Description:
		(WO) Issue the memory repair operation for the specified
		memory repair attributes. The operation may fail if resources
		are insufficient based on the requirements of the memory
		device and repair function. Valid values are:

		- 1 - Issue the repair operation.

		- All other values are reserved.
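
Putting the attributes together, a repair request might be driven from userspace roughly as in the Python sketch below. This is only an illustration under several assumptions that this ABI does not state: numeric attributes are taken to read back as decimal or 0x-prefixed hex, repair_safe_when_in_use is taken to read as 1/0 (or true/false), the min_dpa, max_dpa, dpa, nibble_mask and persist_mode nodes are assumed to be present for the device, and the order of the writes is a guess. The function request_repair and its parameters are hypothetical names.

```python
from pathlib import Path


def _read(repair_dir, name):
    """Return the stripped text content of one attribute file."""
    return (Path(repair_dir) / name).read_text().strip()


def _write(repair_dir, name, value):
    """Write one value to an attribute file; kernel errors surface as OSError."""
    (Path(repair_dir) / name).write_text(f"{value}\n")


def request_repair(repair_dir, dpa, nibble_mask, persist_mode=0,
                   memory_offline=False):
    """Program the repair attributes, then write 1 to 'repair'."""
    # Assumption: range attributes parse as decimal or 0x-prefixed hex.
    lo = int(_read(repair_dir, "min_dpa"), 0)
    hi = int(_read(repair_dir, "max_dpa"), 0)
    if not lo <= dpa <= hi:
        raise ValueError(f"DPA {dpa:#x} outside range {lo:#x}-{hi:#x}")

    # Assumption: the flag reads as "1"/"0" or "true"/"false".
    safe = _read(repair_dir, "repair_safe_when_in_use").lower() in ("1", "true")
    if not safe and not memory_offline:
        raise RuntimeError("repair is not safe while the memory is in use")

    if persist_mode not in (0, 1):
        raise ValueError("persist_mode must be 0 (soft) or 1 (hard)")

    _write(repair_dir, "persist_mode", persist_mode)
    _write(repair_dir, "dpa", f"{dpa:#x}")
    _write(repair_dir, "nibble_mask", f"{nibble_mask:#x}")
    # Writing 1 issues the repair; all other values are reserved.
    _write(repair_dir, "repair", 1)
```

The range and safety checks are done before any write so that a request the device is known to reject, or one that would disturb memory in use, is never issued. The DPA and nibble mask passed in would normally be taken from the related error record.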