Lines Matching +full:in +full:- +full:memory

1 .. SPDX-License-Identifier: GPL-2.0 OR GFDL-1.2-no-invariants-or-later
7 Copyright (c) 2024-2025 HiSilicon Limited.
11 Invariant Sections, Front-Cover Texts nor Back-Cover Texts.
14 - Written for: 6.15
17 ------------
19 Increasing DRAM size and cost have made memory subsystem reliability an
21 could cause expensive or fatal issues. Memory errors are among the top
24 Memory scrubbing is a feature where an ECC (Error-Correcting Code) engine
25 reads data from each memory media location, corrects if necessary and writes
26 the corrected data back to the same memory media location.
28 DIMMs can be scrubbed at a configurable rate to detect uncorrected memory
35 2. When detected, uncorrected errors caught in unallocated memory pages are
39 memory errors.
41 4. The additional data on failures in memory may be used to build up
42 statistics that are later used to decide whether to use memory repair
45 There are two types of memory scrubbing:
49 2. On-demand scrubbing for a specific address range or region of memory.
51 Several types of interfaces to hardware memory scrubbers have been
52 identified, such as CXL memory device patrol scrub, CXL DDR5 ECS, ACPI
53 RAS2 memory scrubbing, and ACPI NVDIMM ARS (Address Range Scrub).
55 The control mechanisms vary across different memory scrubbers. To enable
59 A generic memory EDAC scrub control allows users to manage underlying
60 scrubbers in the system through a standardized sysfs control interface. It
65 -----------------------------------------
67 1. Several types of interfaces for hardware memory scrubbers have been
68 identified, including the CXL memory device patrol scrub, CXL DDR5 ECS,
69 ACPI RAS2 memory scrubbing features, ACPI NVDIMM ARS (Address Range Scrub),
70 and software-based memory scrubbers.
72 Of the identified interfaces to hardware memory scrubbers, some support
74 on-demand scrubbing (e.g., ACPI RAS2, ACPI ARS). However, the scrub control
75 interfaces vary between memory scrubbers, highlighting the need for
79 2. User-space scrub controls allow users to disable scrubbing if necessary,
81 rate for performance-aware operations where background activities need to
84 3. User-space tools enable on-demand scrubbing for specific address ranges,
87 4. User-space tools can also control memory DIMM scrubbing at a configurable
90 4.1. Detects uncorrectable memory errors early, before user access to affected
91 memory, helping facilitate recovery.
96 5. Policy control for hotplugged memory is necessary because there may not
97 be a system-wide BIOS or similar control to manage scrub settings for a CXL
100 Therefore, a unified interface is recommended for handling this function in
105 ------------------
107 CXL Memory Scrubbing features
110 CXL spec r3.1 [1]_ section 8.2.9.9.11.1 describes the memory device patrol
112 corrections to errors in a regular cycle. The patrol scrub control allows the
116 hours in which the patrol scrub cycles must be completed, provided that
118 scrub rate that the device is capable of. In the CXL driver, the
122 In addition, they allow the host to disable the feature in case it interferes
123 with performance-aware operations which require the background operations to
130 - a feature defined in the JEDEC DDR5 SDRAM Specification (JESD79-5) and
131 allowing DRAM to internally read, correct single-bit errors, and write back
135 The DDR5 device contains a number of memory media Field Replaceable Units (FRUs)
139 ACPI RAS2 Hardware-based Memory Scrubbing
145 of the same component in a given system.
147 Memory RAS features apply to RAS capabilities, controls and operations that
148 are specific to memory. RAS2 PCC sub-spaces for memory-specific RAS features
149 have a Feature Type of 0x00 (Memory).
151 The platform can use the hardware-based memory scrubbing feature to expose
152 controls and capabilities associated with hardware-based memory scrub
153 engines. The RAS2 memory scrubbing feature supports, as per the spec,
155 1. Independent memory scrubbing controls for each NUMA domain, identified
158 2. Provision for background (patrol) scrubbing of the entire memory system,
159 as well as on-demand scrubbing for a specific region of memory.
165 ARS allows the platform to communicate memory errors to system software.
167 uncorrectable errors in memory. ARS functions manage all NVDIMMs present in
168 the system. Only one scrub can be in progress system-wide at any given time.
175 2. Start ARS triggers an Address Range Scrub for the given memory range.
176 Address scrubbing can be done for volatile or persistent memory, or both.
188 supported in EDAC.
190 .. [1] https://computeexpresslink.org/cxl-specification/
197 +--------------+-----------+-----------+-----------+-----------+
200 +--------------+-----------+-----------+-----------+-----------+
202 | On-demand | Supported | No | No | Supported |
205 +--------------+-----------+-----------+-----------+-----------+
210 +--------------+-----------+-----------+-----------+-----------+
212 | Mode of | Scrub ctrl| per device| per memory| Unknown |
215 +--------------+-----------+-----------+-----------+-----------+
220 +--------------+-----------+-----------+-----------+-----------+
225 +--------------+-----------+-----------+-----------+-----------+
230 +--------------+-----------+-----------+-----------+-----------+
232 | Unit for | Not | in hours | No | No |
235 +--------------+-----------+-----------+-----------+-----------+
237 | Scrub | on-demand | No | No | Supported |
240 +--------------+-----------+-----------+-----------+-----------+
245 +--------------+-----------+-----------+-----------+-----------+
250 +--------------+-----------+-----------+-----------+-----------+
253 ---------------
256 accessed in:
258 /sys/bus/edac/devices/<dev-name>/scrubX/
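As an illustration of the directory layout above, the following is a minimal Python sketch that enumerates scrub instances. It relies only on the `/sys/bus/edac/devices/<dev-name>/scrubX/` pattern quoted above; the helper name `list_scrub_instances` is hypothetical, and the attribute files inside each directory (described in the ABI documents) are deliberately not touched.

```python
from pathlib import Path

# Root taken from the sysfs path quoted above.
EDAC_DEVICES = Path("/sys/bus/edac/devices")


def list_scrub_instances(root=EDAC_DEVICES):
    """Return (device name, scrub instance name) pairs found under root.

    Hypothetical helper: it only walks directories matching
    <root>/<dev-name>/scrubX and reads no attribute files.
    """
    root = Path(root)
    found = []
    if not root.is_dir():
        # No EDAC bus directory (or no permission): nothing to report.
        return found
    for dev in sorted(root.iterdir()):
        if not dev.is_dir():
            continue
        # "scrub" followed by at least one digit, e.g. scrub0, scrub1.
        for entry in sorted(dev.glob("scrub[0-9]*")):
            if entry.is_dir():
                found.append((dev.name, entry.name))
    return found


if __name__ == "__main__":
    for dev, scrub in list_scrub_instances():
        print(f"{dev}/{scrub}")
```

On a system without EDAC scrub devices the sketch prints nothing. Entries are sorted by name, so `scrub10` would sort before `scrub2`.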
261 -----
263 Sysfs files are documented in
264 `Documentation/ABI/testing/sysfs-edac-scrub`
266 `Documentation/ABI/testing/sysfs-edac-ecs`
269 --------
271 The usage takes the form shown in these examples:
273 1. CXL memory Patrol Scrub
277 - Scrubbing is needed at device granularity because a device is showing
280 - Scrubbing may apply to memory that isn't online at all yet. Likely this
283 - Scrubbing at a higher rate because the monitor software has determined that
289 CXL memory is exposed to the memory management subsystem and ultimately userspace
290 via CXL devices. Device-based scrubbing is used for the first use case
291 described in "Section 1 CXL Memory Patrol Scrub".
296 Sysfs files for scrubbing are documented in
297 `Documentation/ABI/testing/sysfs-edac-scrub`
301 CXL memory is exposed to the memory management subsystem and ultimately userspace
302 via CXL regions. CXL regions represent mapped memory capacity in system
304 memory devices with traffic interleaved across them. The user may want to control
308 requests for scrubbing of other regions may result in a higher scrub rate than
311 Region-based scrubbing is used for the third use case described in
312 "Section 1 CXL Memory Patrol Scrub".
317 1. Take each region in turn, from lowest desired scrub rate to highest, and set
326 Sysfs files for scrubbing are documented in
327 `Documentation/ABI/testing/sysfs-edac-scrub`
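The ordering rule for regions (lowest desired scrub rate first, highest last) can be sketched in Python as follows. This is an illustration under stated assumptions, not the documented procedure in full. It assumes the desired rate of each region is expressed as a scrub cycle duration in seconds, so a longer duration means a lower rate. The function name, the region names and the durations are all hypothetical. The sketch only computes the order and writes nothing to sysfs.

```python
from typing import Dict, List


def region_programming_order(desired_cycle_seconds: Dict[str, int]) -> List[str]:
    """Order regions from lowest desired scrub rate to highest.

    Assumption: the rate is given as a cycle duration in seconds, so a
    longer duration is a lower rate. Regions are therefore sorted by
    descending duration; ties are broken by region name.
    """
    return sorted(
        desired_cycle_seconds,
        key=lambda region: (-desired_cycle_seconds[region], region),
    )


if __name__ == "__main__":
    # Hypothetical region names and cycle durations, for illustration only.
    wanted = {"region0": 43200, "region1": 86400, "region2": 21600}
    for region in region_programming_order(wanted):
        print(region, wanted[region])
```

Programming regions in this order means that, where regions share a device, the setting applied last is the one for the region with the highest desired rate.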
329 2. CXL memory Error Check Scrub (ECS)
331 The Error Check Scrub (ECS) feature enables a memory device to perform error
332 checking and correction (ECC) and count single-bit errors. The associated
333 memory controller sets the ECS mode with a trigger sent to the memory
338 initiating Error Check Scrub on a memory device may lie with the memory
341 Sysfs files for scrubbing are documented in
342 `Documentation/ABI/testing/sysfs-edac-ecs`