Revision tags: v6.16-rc5 |
|
#
4f5b1aa2 |
| 04-Jul-2025 |
Takashi Iwai <tiwai@suse.de> |
Merge tag 'asoc-fix-v6.16-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.16
An update for the MAINTAINERS file, plus a number of small drive
Merge tag 'asoc-fix-v6.16-rc4' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v6.16
An update for the MAINTAINERS file, plus a number of small driver specific fixes and device quirks.
show more ...
|
Revision tags: v6.16-rc4 |
|
#
26fd9f7b |
| 28-Jun-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'cxl-fixes-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) fixes from Dave Jiang: "These fixes address a few issues in the CXL subsystem
Merge tag 'cxl-fixes-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) fixes from Dave Jiang: "These fixes address a few issues in the CXL subsystem, including dealing with some bugs in the CXL EDAC and RAS drivers:
- Fix return value of cxlctl_validate_set_features()
- Fix min_scrub_cycle of a region miscaculation and add additional documentation
- Fix potential memory leak issues for CXL EDAC
- Fix CPER handler device confusion for CXL RAS
- Fix using wrong repair type to check DRAM event record"
* tag 'cxl-fixes-6.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: cxl/edac: Fix using wrong repair type to check dram event record cxl/ras: Fix CPER handler device confusion cxl/edac: Fix potential memory leak issues cxl/Documentation: Add more description about min/max scrub cycle cxl/edac: Fix the min_scrub_cycle of a region miscalculation cxl: fix return value in cxlctl_validate_set_features()
show more ...
|
Revision tags: v6.16-rc3 |
|
#
0a46f60a |
| 20-Jun-2025 |
Li Ming <ming.li@zohomail.com> |
cxl/edac: Fix using wrong repair type to check dram event record
cxl_find_rec_dram() is used to find a DRAM event record based on the inputted attributes. Different repair_type of the inputted attri
cxl/edac: Fix using wrong repair type to check dram event record
cxl_find_rec_dram() is used to find a DRAM event record based on the inputted attributes. Different repair_type of the inputted attributes will check the DRAM event record in different ways. When EDAC driver is performing a memory rank sparing, it should use CXL_RANK_SPARING rather than CXL_BANK_SPARING as repair_type for DRAM event record checking.
Fixes: 588ca944c277 ("cxl/edac: Add CXL memory device memory sparing control feature") Signed-off-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Link: https://patch.msgid.link/20250620052924.138892-1-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
Revision tags: v6.16-rc2 |
|
#
a403fe6c |
| 13-Jun-2025 |
Li Ming <ming.li@zohomail.com> |
cxl/edac: Fix potential memory leak issues
In cxl_store_rec_gen_media() and cxl_store_rec_dram(), use kmemdup() to duplicate a cxl gen_media/dram event to store the event in a xarray by xa_store().
cxl/edac: Fix potential memory leak issues
In cxl_store_rec_gen_media() and cxl_store_rec_dram(), use kmemdup() to duplicate a cxl gen_media/dram event to store the event in a xarray by xa_store(). The cxl gen_media/dram event allocated by kmemdup() should be freed in the case that the xa_store() fails.
Fixes: 0b5ccb0de1e2 ("cxl/edac: Support for finding memory operation attributes from the current boot") Signed-off-by: Li Ming <ming.li@zohomail.com> Tested-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20250613011648.102840-1-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
Revision tags: v6.16-rc1 |
|
#
fdc9be90 |
| 03-Jun-2025 |
Li Ming <ming.li@zohomail.com> |
cxl/edac: Fix the min_scrub_cycle of a region miscalculation
When trying to update the scrub_cycle value of a cxl region, which means updating the scrub_cycle value of each memdev under a cxl region
cxl/edac: Fix the min_scrub_cycle of a region miscalculation
When trying to update the scrub_cycle value of a cxl region, which means updating the scrub_cycle value of each memdev under a cxl region. cxl driver needs to guarantee the new scrub_cycle value is greater than the min_scrub_cycle value of a memdev, otherwise the updating operation will fail(Per Table 8-223 in CXL r3.2 section 8.2.10.9.11.1).
Current implementation logic of getting the min_scrub_cycle value of a cxl region is that getting the min_scrub_cycle value of each memdevs under the cxl region, then using the minimum min_scrub_cycle value as the region's min_scrub_cycle. Checking if the new scrub_cycle value is greater than this value. If yes, updating the new scrub_cycle value to each memdevs. The issue is that the new scrub_cycle value is possibly greater than the minimum min_scrub_cycle value of all memdevs but less than the maximum min_scrub_cycle value of all memdevs if memdevs have a different min_scrub_cycle value. The updating operation will always fail on these memdevs which have a greater min_scrub_cycle than the new scrub_cycle.
The correct implementation logic is to get the maximum value of these memdevs' min_scrub_cycle, check if the new scrub_cycle value is greater than the value. If yes, the new scrub_cycle value is fit for the region.
The change also impacts the result of cxl_patrol_scrub_get_min_scrub_cycle(), the interface returned the minimum min_scrub_cycle value among all memdevs under the region before the change. The interface will return the maximum min_scrub_cycle value among all memdevs under the region with the change.
Signed-off-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Link: https://patch.msgid.link/20250603104314.25569-1-ming.li@zohomail.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
#
29e93590 |
| 03-Jun-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'cxl-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) updates from Dave Jiang:
- Remove always true condition in cxl features code
- A
Merge tag 'cxl-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link (CXL) updates from Dave Jiang:
- Remove always true condition in cxl features code
- Add verification of CHBS length for CXL 2.0
- Ignore interleave granularity when interleave ways is 1
- Add update addressing mising MODULE_DESCRIPTION for cxl_test
- A series of cleanups/refactor to prep for AMD Zen5 translate code
- Clean %pa debug printk in core/hdm.c
- Documentation updates: - Update to CXL Maturity Map - Fixes to source linking in CXL documentation - CXL documentation fixes, spelling corrections - A large collection of CXL documentation for the entire CXL subsystem, including documentation on CXL related platform and firmware notes
- Remove redundant code of cxlctl_get_supported_features()
- Series to support CXL RAS Features - Including "Patrol Scrub Control", "Error Check Scrub", "Performance Maitenance" and "Memory Sparing". The series connects CXL to EDAC.
* tag 'cxl-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (53 commits) cxl/edac: Add CXL memory device soft PPR control feature cxl/edac: Add CXL memory device memory sparing control feature cxl/edac: Support for finding memory operation attributes from the current boot cxl/edac: Add support for PERFORM_MAINTENANCE command cxl/edac: Add CXL memory device ECS control feature cxl/edac: Add CXL memory device patrol scrub control feature cxl: Update prototype of function get_support_feature_info() EDAC: Update documentation for the CXL memory patrol scrub control feature cxl/features: Remove the inline specifier from to_cxlfs() cxl/feature: Remove redundant code of get supported features docs: ABI: Fix "firwmare" to "firmware" cxl/Documentation: Fix typo in sysfs write_bandwidth attribute path cxl: doc/linux/access-coordinates Update access coordinates calculation methods cxl: docs/platform/acpi/srat Add generic target documentation cxl: docs/platform/cdat reference documentation Documentation: Update the CXL Maturity Map cxl: Sync up the driver-api/cxl documentation cxl: docs - add self-referencing cross-links cxl: docs/allocation/hugepages cxl: docs/allocation/reclaim ...
show more ...
|
Revision tags: v6.15 |
|
#
9f153b7f |
| 23-May-2025 |
Dave Jiang <dave.jiang@intel.com> |
Merge branch 'for-6.16/cxl-features-ras' into cxl-for-next
Add CXL RAS Features support. Features include "patrol scrub control", "error check scrub", "perform maintenance", and "memory sparing". Th
Merge branch 'for-6.16/cxl-features-ras' into cxl-for-next
Add CXL RAS Features support. Features include "patrol scrub control", "error check scrub", "perform maintenance", and "memory sparing". This support connects the RAS Featurs to EDAC.
show more ...
|
#
be9b359e |
| 21-May-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/edac: Add CXL memory device soft PPR control feature
Post Package Repair (PPR) maintenance operations may be supported by CXL devices that implement CXL.mem protocol. A PPR maintenance operation
cxl/edac: Add CXL memory device soft PPR control feature
Post Package Repair (PPR) maintenance operations may be supported by CXL devices that implement CXL.mem protocol. A PPR maintenance operation requests the CXL device to perform a repair operation on its media. For example, a CXL device with DRAM components that support PPR features may implement PPR Maintenance operations. DRAM components may support two types of PPR, hard PPR (hPPR), for a permanent row repair, and Soft PPR (sPPR), for a temporary row repair. Soft PPR is much faster than hPPR, but the repair is lost with a power cycle.
During the execution of a PPR Maintenance operation, a CXL memory device: - May or may not retain data - May or may not be able to process CXL.mem requests correctly, including the ones that target the DPA involved in the repair. These CXL Memory Device capabilities are specified by Restriction Flags in the sPPR Feature and hPPR Feature.
Soft PPR maintenance operation may be executed at runtime, if data is retained and CXL.mem requests are correctly processed. For CXL devices with DRAM components, hPPR maintenance operation may be executed only at boot because typically data may not be retained with hPPR maintenance operation.
When a CXL device identifies error on a memory component, the device may inform the host about the need for a PPR maintenance operation by using an Event Record, where the Maintenance Needed flag is set. The Event Record specifies the DPA that should be repaired. A CXL device may not keep track of the requests that have already been sent and the information on which DPA should be repaired may be lost upon power cycle. The userspace tool requests for maintenance operation if the number of corrected error reported on a CXL.mem media exceeds error threshold.
CXL spec 3.2 section 8.2.10.7.1.2 describes the device's sPPR (soft PPR) maintenance operation and section 8.2.10.7.1.3 describes the device's hPPR (hard PPR) maintenance operation feature.
CXL spec 3.2 section 8.2.10.7.2.1 describes the sPPR feature discovery and configuration.
CXL spec 3.2 section 8.2.10.7.2.2 describes the hPPR feature discovery and configuration.
Add support for controlling CXL memory device soft PPR (sPPR) feature. Register with EDAC driver, which gets the memory repair attr descriptors from the EDAC memory repair driver and exposes sysfs repair control attributes for PRR to the userspace. For example CXL PPR control for the CXL mem0 device is exposed in /sys/bus/edac/devices/cxl_mem0/mem_repairX/
Add checks to ensure the memory to be repaired is offline and originates from a CXL DRAM or CXL gen_media error record reported in the current boot, before requesting a PPR operation on the device.
Note: Tested with QEMU patch for CXL PPR feature. https://lore.kernel.org/linux-cxl/20250509172229.726-1-shiju.jose@huawei.com/T/#m70b2b010f43f7f4a6f9acee5ec9008498bf292c3
Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250521124749.817-9-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
#
588ca944 |
| 21-May-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/edac: Add CXL memory device memory sparing control feature
Memory sparing is defined as a repair function that replaces a portion of memory with a portion of functional memory at that same DPA.
cxl/edac: Add CXL memory device memory sparing control feature
Memory sparing is defined as a repair function that replaces a portion of memory with a portion of functional memory at that same DPA. The subclasses for this operation vary in terms of the scope of the sparing being performed. The cacheline sparing subclass refers to a sparing action that can replace a full cacheline. Row sparing is provided as an alternative to PPR sparing functions and its scope is that of a single DDR row. As per CXL r3.2 Table 8-125 foot note 1. Memory sparing is preferred over PPR when possible. Bank sparing allows an entire bank to be replaced. Rank sparing is defined as an operation in which an entire DDR rank is replaced.
Memory sparing maintenance operations may be supported by CXL devices that implement CXL.mem protocol. A sparing maintenance operation requests the CXL device to perform a repair operation on its media. For example, a CXL device with DRAM components that support memory sparing features may implement sparing maintenance operations.
The host may issue a query command by setting query resources flag in the input payload (CXL spec 3.2 Table 8-120) to determine availability of sparing resources for a given address. In response to a query request, the device shall report the resource availability by producing the memory sparing event record (CXL spec 3.2 Table 8-60) in which the Channel, Rank, Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields are a copy of the values specified in the request.
During the execution of a sparing maintenance operation, a CXL memory device: - may not retain data - may not be able to process CXL.mem requests correctly. These CXL memory device capabilities are specified by restriction flags in the memory sparing feature readable attributes.
When a CXL device identifies error on a memory component, the device may inform the host about the need for a memory sparing maintenance operation by using DRAM event record, where the 'maintenance needed' flag may set. The event record contains some of the DPA, Channel, Rank, Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields that should be repaired. The userspace tool requests for maintenance operation if the 'maintenance needed' flag set in the CXL DRAM error record.
CXL spec 3.2 section 8.2.10.7.1.4 describes the device's memory sparing maintenance operation feature.
CXL spec 3.2 section 8.2.10.7.2.3 describes the memory sparing feature discovery and configuration.
Add support for controlling CXL memory device memory sparing feature. Register with EDAC driver, which gets the memory repair attr descriptors from the EDAC memory repair driver and exposes sysfs repair control attributes for memory sparing to the userspace. For example CXL memory sparing control for the CXL mem0 device is exposed in /sys/bus/edac/devices/cxl_mem0/mem_repairX/
Use case ======== 1. CXL device identifies a failure in a memory component, report to userspace in a CXL DRAM trace event with DPA and other attributes of memory to repair such as channel, rank, nibble mask, bank Group, bank, row, column, sub-channel.
2. Rasdaemon process the trace event and may issue query request in sysfs check resources available for memory sparing if either of the following conditions met. - 'maintenance needed' flag set in the event record. - 'threshold event' flag set for CVME threshold feature. - When the number of corrected error reported on a CXL.mem media to the userspace exceeds the threshold value for corrected error count defined by the userspace policy.
3. Rasdaemon process the memory sparing trace event and issue repair request for memory sparing.
Kernel CXL driver shall report memory sparing event record to the userspace with the resource availability in order rasdaemon to process the event record and issue a repair request in sysfs for the memory sparing operation in the CXL device.
Note: Based on the feedbacks from the community 'query' sysfs attribute is removed and reporting memory sparing error record to the userspace are not supported. Instead userspace issues sparing operation and kernel does the same to the CXL memory device, when 'maintenance needed' flag set in the DRAM event record.
Add checks to ensure the memory to be repaired is offline and if online, then originates from a CXL DRAM error record reported in the current boot before requesting a memory sparing operation on the device.
Note: Tested memory sparing feature control with QEMU patch "hw/cxl: Add emulation for memory sparing control feature" https://lore.kernel.org/linux-cxl/20250509172229.726-1-shiju.jose@huawei.com/T/#m5f38512a95670d75739f9dad3ee91b95c7f5c8d6
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250521124749.817-8-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
#
0b5ccb0d |
| 21-May-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/edac: Support for finding memory operation attributes from the current boot
Certain operations on memory, such as memory repair, are permitted only when the address and other attributes for the
cxl/edac: Support for finding memory operation attributes from the current boot
Certain operations on memory, such as memory repair, are permitted only when the address and other attributes for the operation are from the current boot. This is determined by checking whether the memory attributes for the operation match those in the CXL gen_media or CXL DRAM memory event records reported during the current boot.
The CXL event records must be backed up because they are cleared in the hardware after being processed by the kernel.
Support is added for storing CXL gen_media or CXL DRAM memory event records in xarrays. Old records are deleted when they expire or when there is an overflow and which depends on platform correctly report Event Record Timestamp field of CXL spec Table 8-55 Common Event Record Format.
Additionally, helper functions are implemented to find a matching record in the xarray storage based on the memory attributes and repair type.
Add validity check, when matching attributes for sparing, using the validity flag in the DRAM event record, to ensure that all required attributes for a requested repair operation are valid and set.
Reviewed-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250521124749.817-7-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
#
077ee5f7 |
| 21-May-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/edac: Add support for PERFORM_MAINTENANCE command
Add support for PERFORM_MAINTENANCE command.
CXL spec 3.2 section 8.2.10.7.1 describes the Perform Maintenance command. This command requests t
cxl/edac: Add support for PERFORM_MAINTENANCE command
Add support for PERFORM_MAINTENANCE command.
CXL spec 3.2 section 8.2.10.7.1 describes the Perform Maintenance command. This command requests the device to execute the maintenance operation specified by the maintenance operation class and the maintenance operation subclass.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250521124749.817-6-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
#
85fb6a16 |
| 21-May-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/edac: Add CXL memory device ECS control feature
CXL spec 3.2 section 8.2.10.9.11.2 describes the DDR5 ECS (Error Check Scrub) control feature. The Error Check Scrub (ECS) is a feature defined in
cxl/edac: Add CXL memory device ECS control feature
CXL spec 3.2 section 8.2.10.9.11.2 describes the DDR5 ECS (Error Check Scrub) control feature. The Error Check Scrub (ECS) is a feature defined in JEDEC DDR5 SDRAM Specification (JESD79-5) and allows the DRAM to internally read, correct single-bit errors, and write back corrected data bits to the DRAM array while providing transparency to error counts.
The ECS control allows the requester to change the log entry type, the ECS threshold count (provided the request falls within the limits specified in DDR5 mode registers), switch between codeword mode and row count mode, and reset the ECS counter.
Register with EDAC device driver, which retrieves the ECS attribute descriptors from the EDAC ECS and exposes the ECS control attributes to userspace via sysfs. For example, the ECS control for the memory media FRU0 in CXL mem0 device is located at /sys/bus/edac/devices/cxl_mem0/ecs_fru0/
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250521124749.817-5-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
#
0c6e6f13 |
| 21-May-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/edac: Add CXL memory device patrol scrub control feature
CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub control feature. The device patrol scrub proactively locates and mak
cxl/edac: Add CXL memory device patrol scrub control feature
CXL spec 3.2 section 8.2.10.9.11.1 describes the device patrol scrub control feature. The device patrol scrub proactively locates and makes corrections to errors in regular cycle.
Allow specifying the number of hours within which the patrol scrub must be completed, subject to minimum and maximum limits reported by the device. Also allow disabling scrub allowing trade-off error rates against performance.
Add support for patrol scrub control on CXL memory devices. Register with the EDAC device driver, which retrieves the scrub attribute descriptors from EDAC scrub and exposes the sysfs scrub control attributes to userspace. For example, scrub control for the CXL memory device "cxl_mem0" is exposed in /sys/bus/edac/devices/cxl_mem0/scrubX/.
Additionally, add support for region-based CXL memory patrol scrub control. CXL memory regions may be interleaved across one or more CXL memory devices. For example, region-based scrub control for "cxl_region1" is exposed in /sys/bus/edac/devices/cxl_region1/scrubX/.
[dj: A few formatting fixes from Jonathan]
Reviewed-by: Dave Jiang <dave.jiang@intel.com> Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250521124749.817-4-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|