1 .. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-or-later
4 EDAC Memory Repair Control
7 Copyright (c) 2024-2025 HiSilicon Limited.
11 Invariant Sections, Front-Cover Texts nor Back-Cover Texts.
15 - Written for: 6.15
18 ------------
20 Some memory devices support repair operations to address issues in their
21 memory media. Post Package Repair (PPR) and memory sparing are examples of
27 Post Package Repair is a maintenance operation which requests the memory
28 device to perform a repair operation on its media. It is a memory self-healing
29 feature that fixes a failing memory location by replacing it with a spare row
30 in a DRAM device.
32 For example, a CXL memory device with DRAM components that support PPR
36 - hard PPR, for a permanent row repair, and
37 - soft PPR, for a temporary row repair.
42 The data may not be retained and memory requests may not be correctly
43 processed during a repair operation. In such a case, the repair operation should
46 For example, for CXL memory devices, see CXL spec rev 3.1 [1]_ sections
50 Memory Sparing
53 Memory sparing is a repair function that replaces a portion of memory with
54 a portion of functional memory at a particular granularity. Memory
55 sparing has cacheline/row/bank/rank sparing granularities. For example, in
56 rank memory-sparing mode, one memory rank serves as a spare for other ranks on
57 the same channel in case they fail.
59 The spare rank is held in reserve and not used as active memory until
61 available memory in the system.
63 After an error threshold is surpassed in a system protected by memory sparing,
66 active memory in place of the failed rank.
68 For example, CXL memory devices can support various subclasses of sparing
69 operation, which vary in terms of the scope of the sparing being performed.
74 to be replaced. Rank sparing is defined as an operation in which an entire DDR
77 See CXL spec 3.1 [1]_ section 8.2.9.7.1.4 Memory Sparing Maintenance
80 .. [1] https://computeexpresslink.org/cxl-specification/
82 Use cases of generic memory repair features control
85 1. The soft PPR, hard PPR and memory-sparing features share similar control
90 2. When a CXL device detects an error in a memory component, it informs the
93 specifies the device physical address (DPA) and attributes of the memory
96 rasdaemon) initiate a repair maintenance operation in response to the
99 3. Userspace tools, such as rasdaemon, request a repair operation on a memory
100 region when the maintenance-needed flag is set, when an uncorrected memory error or
101 an excess of corrected memory errors above a threshold value is reported, or when
102 the corrected-errors-threshold-exceeded flag is set for that memory.
104 4. Multiple PPR/sparing instances may be present per memory device.
106 5. Drivers should enforce that live repair is safe. In systems where memory
108 memory errors seen on this boot against which to check live memory repair
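The trigger conditions listed in use case 3 can be sketched as a single predicate. This is only an illustration of the decision described there, not how rasdaemon or any driver implements it; the ``MemoryErrorReport`` type, its field names and the ``corrected_error_threshold`` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemoryErrorReport:
    # All field names are hypothetical, chosen only for this sketch.
    maintenance_needed: bool = False       # "maintenance needed" flag
    uncorrected_error: bool = False        # an uncorrected error was reported
    corrected_error_count: int = 0         # corrected errors seen so far
    threshold_exceeded_flag: bool = False  # device-set threshold flag


def should_request_repair(report, corrected_error_threshold):
    """Return True if any of the trigger conditions from use case 3 holds."""
    return (
        report.maintenance_needed
        or report.uncorrected_error
        or report.corrected_error_count > corrected_error_threshold
        or report.threshold_exceeded_flag
    )
```

A userspace tool following this pattern would still need to confirm that a live repair is safe before issuing it, as use case 5 requires.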
112 ---------------
114 The control attributes of a registered memory repair instance can be
115 accessed in the /sys/bus/edac/devices/<dev-name>/mem_repairX/
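As a rough illustration of that layout, the following Python sketch walks the EDAC device directory and lists whatever files each ``mem_repairX`` instance exposes. It assumes only the path pattern given above; the function name ``list_mem_repair_instances`` is hypothetical, and no particular attribute names are assumed (see the ABI document for those).

```python
from pathlib import Path

# Location taken from the text above; pass a different root for testing.
EDAC_DEVICES = Path("/sys/bus/edac/devices")


def list_mem_repair_instances(root=EDAC_DEVICES):
    """Map each EDAC device name to its mem_repairX directories and their files."""
    result = {}
    root = Path(root)
    if not root.is_dir():
        return result
    for dev in sorted(root.iterdir()):
        if not dev.is_dir():
            continue
        instances = {}
        for inst in sorted(dev.glob("mem_repair[0-9]*")):
            if inst.is_dir():
                # Record only the file names; their meaning is defined by the ABI doc.
                instances[inst.name] = sorted(
                    p.name for p in inst.iterdir() if p.is_file()
                )
        if instances:
            result[dev.name] = instances
    return result


if __name__ == "__main__":
    for dev, instances in list_mem_repair_instances().items():
        for name, attrs in instances.items():
            print(f"{dev}/{name}: {', '.join(attrs)}")
```

On a system with no registered memory repair instances the function simply returns an empty mapping.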
118 -----
120 Sysfs files are documented in
121 `Documentation/ABI/testing/sysfs-edac-memory-repair`.
124 --------
126 The memory repair usage takes the form shown in this example:
128 1. CXL memory sparing
130 Memory sparing is defined as a repair function that replaces a portion of
131 memory with a portion of functional memory at that same DPA. The subclasses
132 for this operation, cacheline/row/bank/rank sparing, vary in terms of the
135 Memory sparing maintenance operations may be supported by CXL devices that
138 device with DRAM components that support memory sparing features may
141 2. CXL memory Soft Post Package Repair (sPPR)
151 Sysfs files for memory repair are documented in
152 `Documentation/ABI/testing/sysfs-edac-memory-repair`