| 95350eff | 14-Jan-2026 |
Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> |
ACPI: extlog: Trace CPER CXL Protocol Error Section
When Firmware First is enabled, BIOS handles errors first and then it makes them available to the kernel via the Common Platform Error Record (CPE
ACPI: extlog: Trace CPER CXL Protocol Error Section
When Firmware First is enabled, BIOS handles errors first and then it makes them available to the kernel via the Common Platform Error Record (CPER) sections (UEFI 2.11 Appendix N.2.13). Linux parses the CPER sections via one of two similar paths, either ELOG or GHES. The errors managed by ELOG are signaled to the BIOS by the I/O Machine Check Architecture (I/O MCA).
Currently, ELOG and GHES show some inconsistencies in how they report to userspace via trace events.
Therefore, make the two mentioned paths act similarly by tracing the CPER CXL Protocol Error Section.
Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> Link: https://patch.msgid.link/20260114101543.85926-6-fabio.m.de.francesco@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
| ba8af8e1 | 14-Jan-2026 |
Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> |
ACPI: APEI: GHES: Add helper to copy CPER CXL protocol error info to work struct
Make a helper out of cxl_cper_post_prot_err() that checks the CXL agent type and copy the CPER CXL protocol errors in
ACPI: APEI: GHES: Add helper to copy CPER CXL protocol error info to work struct
Make a helper out of cxl_cper_post_prot_err() that checks the CXL agent type and copy the CPER CXL protocol errors information to a work data structure.
Export the new symbol for reuse by ELOG.
Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> [ rjw: Subject tweak ] Link: https://patch.msgid.link/20260114101543.85926-5-fabio.m.de.francesco@linux.intel.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
show more ...
|
| f10f46a0 | 17-Jul-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/events: Trace Memory Sparing Event Record
CXL rev 3.2 section 8.2.10.2.1.4 Table 8-60 defines the Memory Sparing Event Record.
Determine if the event read is memory sparing record and if so tra
cxl/events: Trace Memory Sparing Event Record
CXL rev 3.2 section 8.2.10.2.1.4 Table 8-60 defines the Memory Sparing Event Record.
Determine if the event read is memory sparing record and if so trace the record.
Memory device shall produce a memory sparing event record 1. After completion of a PPR maintenance operation if the memory sparing event record enable bit is set (Field: sPPR/hPPR Operation Mode in Table 8-128/Table 8-131). 2. In response to a query request by the host (see section 8.2.10.7.1.4) to determine the availability of sparing resources. The device shall report the resource availability by producing the Memory Sparing Event Record (see Table 8-60) in which the channel, rank, nibble mask, bank group, bank, row, column, sub-channel fields are a copy of the values specified in the request. If the controller does not support reporting whether a resource is available, and a perform maintenance operation for memory sparing is issued with query resources set to 1, the controller shall return invalid input.
Example trace log for produce memory sparing event record on completion of a soft PPR operation, cxl_memory_sparing: memdev=mem1 host=0000:0f:00.0 serial=3 log=Informational : time=55045163029 uuid=e71f3a40-2d29-4092-8a39-4d1c966c7c65 len=128 flags='0x1' handle=1 related_handle=0 maint_op_class=2 maint_op_sub_class=1 ld_id=0 head_id=0 : flags='' result=0 validity_flags='CHANNEL|RANK|NIBBLE|BANK GROUP|BANK|ROW|COLUMN' spare resource avail=1 channel=2 rank=5 nibble_mask=a59c bank_group=2 bank=4 row=13 column=23 sub_channel=0 comp_id=00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 comp_id_pldm_valid_flags='' pldm_entity_id=0x00 pldm_resource_id=0x00
Note: For memory sparing event record, fields 'maintenance operation class' and 'maintenance operation subclass' are defined twice, first in the common event record (Table 8-55) and second in the memory sparing event record (Table 8-60). Thus those in the sparing event record coded as reserved, to be removed when the spec is updated.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250717101817.2104-5-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
| 9b8e73cd | 07-Mar-2025 |
Dave Jiang <dave.jiang@intel.com> |
cxl: Move cxl feature command structs to user header
In preparation for cxl fwctl enabling, move data structures related to cxl feature commands to a user header file.
Reviewed-by; Jonathan Cameron
cxl: Move cxl feature command structs to user header
In preparation for cxl fwctl enabling, move data structures related to cxl feature commands to a user header file.
Reviewed-by; Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://patch.msgid.link/r/20250307205648.1021626-3-dave.jiang@intel.com Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
show more ...
|
| 36f257e3 | 10-Mar-2025 |
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> |
acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
When PCIe AER is in FW-First, OS should process CXL Protocol errors from CPER records. Introduce support for handling and logging CXL Protocol er
acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors
When PCIe AER is in FW-First, OS should process CXL Protocol errors from CPER records. Introduce support for handling and logging CXL Protocol errors.
The defined trace events cxl_aer_uncorrectable_error and cxl_aer_correctable_error trace native CXL AER endpoint errors. Reuse them to trace FW-First Protocol errors.
Since the CXL code is required to be called from process context and GHES is in interrupt context, use workqueues for processing.
Similar to CXL CPER event handling, use kfifo to handle errors as it simplifies queue processing by providing lock free fifo operations.
Add the ability for the CXL sub-system to register a workqueue to process CXL CPER protocol errors.
[DJ: return cxl_cper_register_prot_err_work() directly in cxl_ras_init()]
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://patch.msgid.link/20250310223839.31342-2-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
| a8b773f2 | 20-Feb-2025 |
Dave Jiang <dave.jiang@intel.com> |
cxl: Setup exclusive CXL features that are reserved for the kernel
Certain features will be exclusively used by components such as in kernel RAS driver. Setup an exclusion list that can be used to d
cxl: Setup exclusive CXL features that are reserved for the kernel
Certain features will be exclusively used by components such as in kernel RAS driver. Setup an exclusion list that can be used to detect if a feature is exclusive to the kernel.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Tested-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250220194438.2281088-7-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
| 14d502cc | 20-Feb-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/mbox: Add SET_FEATURE mailbox command
Add support for SET_FEATURE mailbox command.
CXL spec r3.2 section 8.2.9.6 describes optional device specific features. CXL devices supports features with
cxl/mbox: Add SET_FEATURE mailbox command
Add support for SET_FEATURE mailbox command.
CXL spec r3.2 section 8.2.9.6 describes optional device specific features. CXL devices supports features with changeable attributes. The settings of a feature can be optionally modified using Set Feature command. CXL spec r3.2 section 8.2.9.6.3 describes Set Feature command.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250220194438.2281088-6-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
| 5e5ac21f | 20-Feb-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/mbox: Add GET_FEATURE mailbox command
Add support for GET_FEATURE mailbox command.
CXL spec r3.2 section 8.2.9.6 describes optional device specific features. The settings of a feature can be re
cxl/mbox: Add GET_FEATURE mailbox command
Add support for GET_FEATURE mailbox command.
CXL spec r3.2 section 8.2.9.6 describes optional device specific features. The settings of a feature can be retrieved using Get Feature command. CXL spec r3.2 section 8.2.9.6.2 describes Get Feature command.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250220194438.2281088-5-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
| f0e6a232 | 20-Feb-2025 |
Dave Jiang <dave.jiang@intel.com> |
cxl: Add Get Supported Features command for kernel usage
CXL spec r3.2 8.2.9.6.1 Get Supported Features (Opcode 0500h) The command retrieve the list of supported device-specific features (identified
cxl: Add Get Supported Features command for kernel usage
CXL spec r3.2 8.2.9.6.1 Get Supported Features (Opcode 0500h) The command retrieve the list of supported device-specific features (identified by UUID) and general information about each Feature.
The driver will retrieve the Feature entries in order to make checks and provide information for the Get Feature and Set Feature command. One of the main piece of information retrieved are the effects a Set Feature command would have for a particular feature. The retrieved Feature entries are stored in the cxl_mailbox context.
The setup of Features is initiated via devm_cxl_setup_features() during the pci probe function before the cxl_memdev is enumerated.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250220194438.2281088-3-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
| cbbca60a | 20-Feb-2025 |
Dave Jiang <dave.jiang@intel.com> |
cxl: Enumerate feature commands
Add feature commands enumeration code in order to detect and enumerate the 3 feature related commands "get supported features", "get feature", and "set feature". The
cxl: Enumerate feature commands
Add feature commands enumeration code in order to detect and enumerate the 3 feature related commands "get supported features", "get feature", and "set feature". The enumeration will help determine whether the driver can issue any of the 3 commands to the device.
Reviewed-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Li Ming <ming.li@zohomail.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250220194438.2281088-2-dave.jiang@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
| 315c2f0b | 23-Jan-2025 |
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> |
acpi/ghes, cper: Recognize and cache CXL Protocol errors
Add support in GHES to detect and process CXL CPER Protocol errors, as defined in UEFI v2.10, section N.2.13.
Define struct cxl_cper_prot_er
acpi/ghes, cper: Recognize and cache CXL Protocol errors
Add support in GHES to detect and process CXL CPER Protocol errors, as defined in UEFI v2.10, section N.2.13.
Define struct cxl_cper_prot_err_work_data to cache CXL protocol error information, including RAS capabilities and severity, for further handling.
These cached CXL CPER records will later be processed by workqueues within the CXL subsystem.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Gregory Price <gourry@gourry.net> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Link: https://patch.msgid.link/20250123084421.127697-5-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
| 4c6e20eb | 11-Jan-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/events: Update Memory Module Event Record to CXL spec rev 3.1
CXL spec 3.1 section 8.2.9.2.1.3 Table 8-47, Memory Module Event Record has updated with following new fields and new info for Devic
cxl/events: Update Memory Module Event Record to CXL spec rev 3.1
CXL spec 3.1 section 8.2.9.2.1.3 Table 8-47, Memory Module Event Record has updated with following new fields and new info for Device Event Type and Device Health Information fields. 1. Validity Flags 2. Component Identifier 3. Device Event Sub-Type
Update the Memory Module event record and Memory Module trace event for the above spec changes. The new fields are inserted in logical places.
Example trace print of cxl_memory_module trace event,
cxl_memory_module: memdev=mem3 host=0000:0f:00.0 serial=3 log=Fatal : \ time=371709344709 uuid=fe927475-dd59-4339-a586-79bab113b774 len=128 \ flags='0x1' handle=2 related_handle=0 maint_op_class=0 \ maint_op_sub_class=0 : event_type='Temperature Change' \ event_sub_type='Unsupported Config Data' \ health_status='MAINTENANCE_NEEDED|REPLACEMENT_NEEDED' \ media_status='All Data Loss in Event of Power Loss' as_life_used=0x3 \ as_dev_temp=Normal as_cor_vol_err_cnt=Normal as_cor_per_err_cnt=Normal \ life_used=8 device_temp=3 dirty_shutdown_cnt=33 cor_vol_err_cnt=25 \ cor_per_err_cnt=45 validity_flags='COMPONENT|COMPONENT PLDM FORMAT' \ comp_id=03 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \ comp_id_pldm_valid_flags='Resource ID' \ pldm_entity_id=0x00 pldm_resource_id=fc d2 7e 2f
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250111091756.1682-6-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
| 24ec41f7 | 11-Jan-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/events: Update DRAM Event Record to CXL spec rev 3.1
CXL spec 3.1 section 8.2.9.2.1.2 Table 8-46, DRAM Event Record has updated with following new fields and new types for Memory Event Type, Tra
cxl/events: Update DRAM Event Record to CXL spec rev 3.1
CXL spec 3.1 section 8.2.9.2.1.2 Table 8-46, DRAM Event Record has updated with following new fields and new types for Memory Event Type, Transaction Type and Validity Flags fields. 1. Component Identifier 2. Sub-channel 3. Advanced Programmable Corrected Memory Error Threshold Event Flags 4. Corrected Memory Error Count at Event 5. Memory Event Sub-Type
Update DRAM events record and DRAM trace event for the above spec changes. The new fields are inserted in logical places. Includes trivial consistency of white space improvements.
Example trace print of cxl_dram trace event,
cxl_dram: memdev=mem0 host=0000:0f:00.0 serial=3 log=Informational : \ time=54799339519 uuid=601dcbb3-9c06-4eab-b8af-4e9bfb5c9624 len=128 \ flags='0x1' handle=1 related_handle=0 maint_op_class=1 \ maint_op_sub_class=3 : dpa=18680 dpa_flags='' \ descriptor='UNCORRECTABLE_EVENT|THRESHOLD_EVENT' type='Data Path Error' \ sub_type='Media Link CRC Error' transaction_type='Internal Media Scrub' \ channel=3 rank=17 nibble_mask=3b00b2 bank_group=7 bank=11 row=2 \ column=77 cor_mask=21 00 00 00 00 00 00 00 2c 00 00 00 00 00 00 00 37 00 \ 00 00 00 00 00 00 42 00 00 00 00 00 00 00 validity_flags='CHANNEL|RANK|NIBBLE|\ BANK GROUP|BANK|ROW|COLUMN|CORRECTION MASK|COMPONENT|COMPONENT PLDM FORMAT' \ comp_id=01 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \ comp_id_pldm_valid_flags='PLDM Entity ID' pldm_entity_id=74 c5 08 9a 1a 0b \ pldm_resource_id=0x00 hpa=ffffffffffffffff region= \ region_uuid=00000000-0000-0000-0000-000000000000 sub_channel=5 \ cme_threshold_ev_flags='Corrected Memory Errors in Multiple Media Components|\ Exceeded Programmable Threshold' cvme_count=148
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://patch.msgid.link/20250111091756.1682-5-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
| ae834131 | 11-Jan-2025 |
Shiju Jose <shiju.jose@huawei.com> |
cxl/events: Update General Media Event Record to CXL spec rev 3.1
CXL spec rev 3.1 section 8.2.9.2.1.1 Table 8-45, General Media Event Record has updated with following new fields and new types for
cxl/events: Update General Media Event Record to CXL spec rev 3.1
CXL spec rev 3.1 section 8.2.9.2.1.1 Table 8-45, General Media Event Record has updated with following new fields and new types for Memory Event Type and Transaction Type fields. 1. Advanced Programmable Corrected Memory Error Threshold Event Flags 2. Corrected Memory Error Count at Event 3. Memory Event Sub-Type
The format of component identifier has changed (CXL spec 3.1 section 8.2.9.2.1 Table 8-44).
Update the general media event record and general media trace event for the above spec changes. The new fields are inserted in logical places.
Example trace log of cxl_general_media trace event,
cxl_general_media: memdev=mem0 host=0000:0f:00.0 serial=3 log=Fatal : \ time=156831237413 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=128 \ flags='0x1' handle=1 related_handle=0 maint_op_class=2 \ maint_op_sub_class=4 : dpa=30d40 dpa_flags='' \ descriptor='UNCORRECTABLE_EVENT|THRESHOLD_EVENT|POISON_LIST_OVERFLOW' \ type='TE State Violation' sub_type='Media Link Command Training Error' \ transaction_type='Host Inject Poison' channel=3 rank=33 device=5 \ validity_flags='CHANNEL|RANK|DEVICE|COMPONENT|COMPONENT PLDM FORMAT' \ comp_id=03 74 c5 08 9a 1a 0b fc d2 7e 2f 31 9b 3c 81 4d \ comp_id_pldm_valid_flags='PLDM Entity ID | Resource ID' \ pldm_entity_id=74 c5 08 9a 1a 0b pldm_resource_id=fc d2 7e 2f \ hpa=ffffffffffffffff region= \ region_uuid=00000000-0000-0000-0000-000000000000 \ cme_threshold_ev_flags='Corrected Memory Errors in Multiple Media \ Components|Exceeded Programmable Threshold' cme_count=120
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Davidlohr Bueso <dave@stgolabs.net> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Link: https://patch.msgid.link/20250111091756.1682-4-shiju.jose@huawei.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|