Merge tag 'v6.15-rc5' into x86/cpu, to resolve conflicts

 Conflicts:
	tools/arch/x86/include/asm/cpufeatures.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
x86/platform/amd: Move the <asm/amd_node.h> header to <asm/amd/node.h>

Collect AMD specific platform header files in <asm/amd/*.h>.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Link: https://lore.kernel.org/r/20250413084144.3746608-7-mingo@kernel.org
x86/platform/amd: Move the <asm/amd_nb.h> header to <asm/amd/nb.h>

Collect AMD specific platform header files in <asm/amd/*.h>.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mario Limonciello <superm1@kernel.org>
Link: https://lore.kernel.org/r/20250413084144.3746608-4-mingo@kernel.org
RAS/AMD/FMPM: Get masked address

Some operations require checking, or ignoring, specific bits in an address
value. For example, this can be comparing address values to identify unique
structures.

Currently, the full address value is compared when filtering for duplicates.
This results in over counting and creation of extra records. This gives the
impression that more unique events occurred than did in reality.

Mask the address for physical rows on MI300.

  [ bp: Simplify. ]

Fixes: 6f15e617cc99 ("RAS: Introduce a FRU memory poison manager")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
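The duplicate filtering described above can be illustrated with a minimal userspace sketch. This is not the FMPM code: the mask value and the names `ROW_VARIABLE_BITS`, `masked_addr()` and `is_duplicate()` are assumptions chosen for illustration, and the real MI300 bit layout is not given in the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mask: pretend the low 12 bits vary within one physical row. */
#define ROW_VARIABLE_BITS 0xFFFULL

/* Clear the bits that vary inside a row so that only the row identity remains. */
static uint64_t masked_addr(uint64_t addr)
{
	return addr & ~ROW_VARIABLE_BITS;
}

/* Return true if an address from the same row has already been recorded. */
static bool is_duplicate(const uint64_t *recorded, size_t count, uint64_t addr)
{
	for (size_t i = 0; i < count; i++) {
		if (masked_addr(recorded[i]) == masked_addr(addr))
			return true;
	}

	return false;
}
```

Comparing full addresses instead would treat two errors in the same row as distinct events, which is the over-counting the commit describes.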
RAS/AMD/ATL: Include row[13] bit in row retirement

Based on feedback from hardware folks, row[13] is part of the variable
bits within a physical row (along with all column bits).

Only half the physical addresses affected by a row are calculated if
this bit is not included.

Add the row[13] bit to the row retirement flow.

Fixes: 3b566b30b414 ("RAS/AMD/ATL: Add MI300 row retirement support")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250401-fix-fmpm-extra-records-v1-1-840bcf7a8ac5@amd.com
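Why a single missing bit halves the result can be seen in a small sketch that enumerates every address sharing a row. The bit positions (`COL_BITS`, `ROW13_BIT`) and the helper `list_row_addrs()` are hypothetical and do not reflect the actual MI300 address layout or the ATL implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define COL_BITS  0x0FULL        /* four column bits */
#define ROW13_BIT (1ULL << 13)   /* stand-in for the row[13] bit */

/*
 * Produce every combination of the bits in 'variable' on top of 'base'.
 * Stores up to 'max' addresses in 'out' (which may be NULL) and returns
 * the total number of combinations.
 */
static size_t list_row_addrs(uint64_t base, uint64_t variable,
			     uint64_t *out, size_t max)
{
	uint64_t bits = 0;
	size_t n = 0;

	do {
		if (out && n < max)
			out[n] = (base & ~variable) | bits;
		n++;

		/* Step to the next subset of the variable bits. */
		bits = (bits - variable) & variable;
	} while (bits);

	return n;
}
```

With these assumed masks, the column bits alone give 16 addresses, while the column bits plus the extra row bit give 32.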
x86/amd_nb: Move SMN access code to a new amd_node driver

SMN access was bolted into amd_nb mostly as convenience. This has
limitations though that require incurring tech debt to keep it working.

Move SMN access to the newly introduced AMD Node driver.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> # pdx86
Acked-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com> # PMF, PMC
Link: https://lore.kernel.org/r/20241206161210.163701-11-yazen.ghannam@amd.com
RAS/AMD/ATL: Add debug prints for DF register reads

The ATL will fail early if the DF register access fails due to missing
PCI IDs in the amd_nb code. There aren't any clear indicators on why the
ATL will fail to load in this case.

Add a couple of debug print statements to highlight reasons for failure.

A common scenario is missing support for new hardware. If the ATL fails
to load on a system, and there is interest to support it, then dynamic
debugging can be enabled to help find the cause for failure. If there is
no interest in supporting ATL on a new system, then these failures will
be silent.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20241021152158.2525669-1-yazen.ghannam@amd.com
RAS/AMD/ATL: Translate normalized to system physical addresses using PRM

AMD Zen-based systems report memory error addresses through machine
check banks representing Unified Memory Controllers (UMCs) in the form
of UMC relative "normalized" addresses. A normalized address must be
converted to a system physical address to be usable by the OS.

Future AMD platforms will provide a UEFI PRM module that implements a
number of address translation PRM handlers. This will provide an
interface for the OS to call platform specific code without requiring
the use of SMM or other heavy firmware operations.

Add support for the normalized to system physical address translation
PRM handler in the AMD Address Translation Library and prefer it over
native code if available. The GUID and parameter buffer structure are
specific to the normalized to system physical address handler provided
by the address translation PRM module included in future AMD systems.

The address translation PRM module is documented in chapter 22 of the
publicly available "AMD Family 1Ah Models 00h–0Fh and Models 10h–1Fh
ACPI v6.5 Porting Guide".

  [ bp: Massage commit message. ]

Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240730151731.15363-3-john.allen@amd.com
Merge tag 'edac_updates_for_v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - The AMD memory controllers data fabric version 4.5 supports
   non-power-of-2 denormalization in the sense that certain bits of the
   system physical address cannot be reconstructed from the normalized
   address reported by the RAS hardware. Add support for handling such
   addresses

 - Switch the EDAC drivers to the new Intel CPU model defines

 - The usual fixes and cleanups all over the place

* tag 'edac_updates_for_v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC: Add missing MODULE_DESCRIPTION() macros
  EDAC/dmc520: Use devm_platform_ioremap_resource()
  EDAC/igen6: Add Intel Arrow Lake-U/H SoCs support
  RAS/AMD/FMPM: Use atl internal.h for INVALID_SPA
  RAS/AMD/ATL: Implement DF 4.5 NP2 denormalization
  RAS/AMD/ATL: Validate address map when information is gathered
  RAS/AMD/ATL: Expand helpers for adding and removing base and hole
  RAS/AMD/ATL: Read DRAM hole base early
  RAS/AMD/ATL: Add amd_atl pr_fmt() prefix
  RAS/AMD/ATL: Add a missing module description
  EDAC, i10nm: make skx_common.o a separate module
  EDAC/skx: Switch to new Intel CPU model defines
  EDAC/sb_edac: Switch to new Intel CPU model defines
  EDAC, pnd2: Switch to new Intel CPU model defines
  EDAC/i10nm: Switch to new Intel CPU model defines
  EDAC/ghes: Add missing newline to pr_info() statement
  RAS/AMD/ATL: Add missing newline to pr_info() statement
  EDAC/thunderx: Remove unused struct error_syndrome
Merge remote-tracking branches 'ras/edac-amd-atl' and 'ras/edac-misc' into edac-updates

* ras/edac-amd-atl:
  RAS/AMD/FMPM: Use atl internal.h for INVALID_SPA
  RAS/AMD/ATL: Implement DF 4.5 NP2 denormalization
  RAS/AMD/ATL: Validate address map when information is gathered
  RAS/AMD/ATL: Expand helpers for adding and removing base and hole
  RAS/AMD/ATL: Read DRAM hole base early
  RAS/AMD/ATL: Add amd_atl pr_fmt() prefix
  RAS/AMD/ATL: Add a missing module description

* ras/edac-misc:
  EDAC: Add missing MODULE_DESCRIPTION() macros
  EDAC/dmc520: Use devm_platform_ioremap_resource()
  EDAC/igen6: Add Intel Arrow Lake-U/H SoCs support
  EDAC, i10nm: make skx_common.o a separate module
  EDAC/skx: Switch to new Intel CPU model defines
  EDAC/sb_edac: Switch to new Intel CPU model defines
  EDAC, pnd2: Switch to new Intel CPU model defines
  EDAC/i10nm: Switch to new Intel CPU model defines
  EDAC/ghes: Add missing newline to pr_info() statement
  RAS/AMD/ATL: Add missing newline to pr_info() statement
  EDAC/thunderx: Remove unused struct error_syndrome

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
RAS/AMD/ATL: Use system settings for MI300 DRAM to normalized address translation

The currently used normalized address format is not applicable to all
MI300 systems. This leads to incorrect results during address
translation.

Drop the fixed layout and construct the normalized address from system
settings.

Fixes: 87a612375307 ("RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20240607-mi300-dram-xl-fix-v1-2-2f11547a178c@amd.com
RAS/AMD/ATL: Fix MI300 bank hash

Apply the SID bits to the correct offset in the Bank value. Do this in
the temporary value so they don't need to be masked off later.

Fixes: 87a612375307 ("RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/20240607-mi300-dram-xl-fix-v1-1-2f11547a178c@amd.com
RAS/AMD/ATL: Implement DF 4.5 NP2 denormalization

Unlike with previous Data Fabric versions, with Data Fabric 4.5
non-power-of-2 denormalization, there are bits of the system physical
address that can't be fully reconstructed from the normalized address.

To determine the proper combination of missing system physical address
bits, iterate through each possible combination of these bits, normalize
the resulting system physical address, and compare to the original
address that is being translated. If the addresses match, then the
correct permutation of bits has been found.

Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20240606203313.51197-6-john.allen@amd.com
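The search strategy described above (try each combination of the missing bits, normalize, compare) can be sketched as follows. This is a toy model under stated assumptions: `MISSING_BITS`, `toy_normalize()` and `recover_spa()` are hypothetical names, and the normalization function is an invented stand-in, not the Data Fabric 4.5 algorithm.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: pretend bits 8 and 9 of the SPA cannot be recovered directly. */
#define MISSING_BITS 0x300ULL

/*
 * Toy stand-in for the real normalization: drop the missing bits and fold
 * them into the low bits so that each combination gives a distinct result.
 */
static uint64_t toy_normalize(uint64_t spa)
{
	uint64_t missing = (spa & MISSING_BITS) >> 8;

	return (spa & ~MISSING_BITS) ^ (missing * 5);
}

/*
 * Try every combination of the missing bits on top of 'partial_spa'.
 * On a match, store the full address in '*spa' and return true.
 */
static bool recover_spa(uint64_t partial_spa, uint64_t norm_addr, uint64_t *spa)
{
	uint64_t bits = 0;

	do {
		uint64_t candidate = (partial_spa & ~MISSING_BITS) | bits;

		if (toy_normalize(candidate) == norm_addr) {
			*spa = candidate;
			return true;
		}

		/* Step to the next subset of MISSING_BITS. */
		bits = (bits - MISSING_BITS) & MISSING_BITS;
	} while (bits);

	return false;
}
```

The cost is one forward normalization per candidate, so the approach stays cheap as long as only a few bits are unknown.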
RAS/AMD/ATL: Validate address map when information is gathered

Validate address maps at the time the information is gathered as the
address map will not change during translation.

Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20240606203313.51197-5-john.allen@amd.com
RAS/AMD/ATL: Expand helpers for adding and removing base and hole

The ret_addr field in struct addr_ctx contains the intermediate value of
the returned address as it passes through multiple steps in the
translation process. Currently, adding the DRAM base and legacy hole
is only done once, so it operates directly on the intermediate value.

However, for DF 4.5 non-power-of-2 denormalization, adding and removing
the DRAM base and legacy hole needs to be done for multiple temporary
address values. During this process, the intermediate value should not be
lost so the ret_addr value can't be reused.

Update the existing 'add' helper to operate on an arbitrary address
and introduce a new 'remove' helper to do the inverse operations.

Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20240606203313.51197-4-john.allen@amd.com
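The shape of such a helper pair, taking an arbitrary address rather than modifying shared state, might look like the sketch below. It is an assumption-laden illustration: `struct toy_map` and both function names are hypothetical and much simpler than the kernel's struct addr_ctx, and the hole is modeled as ending at 4 GiB.

```c
#include <stdbool.h>
#include <stdint.h>

#define FOUR_GIB (1ULL << 32)

/* Hypothetical, simplified address map; not the kernel's struct addr_ctx. */
struct toy_map {
	uint64_t dram_base;       /* base of this DRAM range */
	uint64_t dram_hole_base;  /* start of the legacy hole below 4 GiB */
	bool     legacy_hole_en;  /* whether the hole applies to this range */
};

/* Add the DRAM base, then skip over the legacy hole if the address reaches it. */
static uint64_t add_base_and_hole(const struct toy_map *map, uint64_t addr)
{
	addr += map->dram_base;

	if (map->legacy_hole_en && addr >= map->dram_hole_base)
		addr += FOUR_GIB - map->dram_hole_base;

	return addr;
}

/* Inverse: undo the hole adjustment first, then subtract the DRAM base. */
static uint64_t remove_base_and_hole(const struct toy_map *map, uint64_t addr)
{
	if (map->legacy_hole_en && addr >= FOUR_GIB)
		addr -= FOUR_GIB - map->dram_hole_base;

	return addr - map->dram_base;
}
```

Because both helpers return a new value instead of updating a stored intermediate, a caller can apply them to any number of temporary addresses without losing the value it is still working on.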
RAS/AMD/ATL: Read DRAM hole base early

Read DRAM hole base when constructing the address map as the value will
not change during run time.

Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Link: https://lore.kernel.org/r/20240606203313.51197-3-john.allen@amd.com
RAS/AMD/ATL: Add amd_atl pr_fmt() prefix

Prefix all AMD ATL pr_* statements with "amd_atl:".

Signed-off-by: John Allen <john.allen@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240606203313.51197-2-john.allen@amd.com
RAS/AMD/ATL: Add a missing module description

Add a missing module description.

  [ bp: Massage commit message. ]

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240604-md-ras-amd-atl-v1-1-d4eb3cf3abe4@quicinc.com
RAS/AMD/ATL: Add missing newline to pr_info() statement

Add a missing newline character even if printk() adds newlines to
non-\n-terminated strings because in the unlikely case a KERN_CONT print
statement is added after the unterminated statement, the two will get
glued together which is not the expected behavior.

  [ bp: Rewrite commit message. ]

Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240517215452.2020680-1-gomonovych@gmail.com
Merge tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC updates from Borislav Petkov:

 - Add a FRU (Field Replaceable Unit) memory poison manager which
   collects and manages previously encountered hw errors in order to
   save them to persistent storage across reboots. Previously recorded
   errors are "replayed" upon reboot in order to poison memory which has
   caused said errors in the past.

   The main use case is stacked, on-chip memory which cannot simply be
   replaced so poisoning faulty areas of it and thus making them
   inaccessible is the only strategy to prolong its lifetime.

 - Add an AMD address translation library glue which converts the
   reported addresses of hw errors into system physical addresses in
   order to be used by other subsystems like memory failure, for
   example. Add support for MI300 accelerators to that library.

 - igen6: Add support for Alder Lake-N SoC

 - i10nm: Add Grand Ridge support

 - The usual fixlets and cleanups

* tag 'edac_updates_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
  EDAC/versal: Convert to platform remove callback returning void
  RAS/AMD/FMPM: Fix off by one when unwinding on error
  RAS/AMD/FMPM: Add debugfs interface to print record entries
  RAS/AMD/FMPM: Save SPA values
  RAS: Export helper to get ras_debugfs_dir
  RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2()
  RAS: Introduce a FRU memory poison manager
  RAS/AMD/ATL: Add MI300 row retirement support
  Documentation: Move RAS section to admin-guide
  EDAC/versal: Make the bit position of injected errors configurable
  EDAC/i10nm: Add Intel Grand Ridge micro-server support
  EDAC/igen6: Add one more Intel Alder Lake-N SoC support
  RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support
  RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300()
  RAS/AMD/ATL: Add MI300 support
  Documentation: RAS: Add index and address translation section
  EDAC/amd64: Use new AMD Address Translation Library
  RAS: Introduce AMD Address Translation Library
  EDAC/synopsys: Convert to devm_platform_ioremap_resource()
RAS/AMD/ATL: Fix bit overflow in denorm_addr_df4_np2()

The hash_pa8 and hashed_bit values in denorm_addr_df4_np2() are
currently defined as u8 types. These variables represent single bits.

'hash_pa8' is set based on logical AND operations using masks with more
than 8 bits. So the calculated value will not fit in this variable. It
will always be '0'. The 'hash_pa8' check later in the function will fail
which produces incorrect results for some cases.

Change these variables to bool type. This clarifies that they are
single bit values. Also, this allows the compiler to ensure they hold
the proper results. Remove an unnecessary shift operation.

  [ bp: Remove the unnecessary brackets in the else-branch of the hash_pa8
    assignment. ]

Fixes: 3f3174996be6 ("RAS: Introduce AMD Address Translation Library")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240222165449.23582-1-yazen.ghannam@amd.com
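The truncation this commit describes is easy to reproduce outside the kernel. The sketch below is not the code from denorm_addr_df4_np2(): `HASH_MASK` and both function names are hypothetical, chosen only to show how an 8-bit variable and a bool behave when assigned the result of a wide AND.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask with a bit above bit 7, as in the situation described. */
#define HASH_MASK (1ULL << 21)

/* Buggy pattern: the AND result does not fit in 8 bits and truncates to 0. */
static uint8_t hash_bit_u8(uint64_t addr)
{
	uint8_t bit = addr & HASH_MASK;

	return bit;
}

/* Fixed pattern: conversion to bool yields true for any non-zero value. */
static bool hash_bit_bool(uint64_t addr)
{
	bool bit = addr & HASH_MASK;

	return bit;
}
```

Conversion to an 8-bit unsigned type keeps only the low eight bits, whereas conversion to bool compares the whole value against zero, so no shift is needed to bring the bit into range.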
RAS/AMD/ATL: Add MI300 row retirement support

DRAM row retirement depends on model-specific information that is best
done within the AMD Address Translation Library.

Export a generic wrapper function for other modules to use. Add any
model-specific helpers here.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240214033516.1344948-2-yazen.ghannam@amd.com
RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support

Zen-based AMD systems report DRAM ECC errors through Unified Memory
Controller (UMC) MCA banks. The value provided in MCA_ADDR is
a "normalized" address which represents the UMC's view of its managed
memory. The normalized address must be translated to a system physical
address for software to take action.

MI300 systems, uniquely, do not provide a normalized address in MCA_ADDR
for DRAM ECC errors. Rather, the "DRAM" address is reported. This value
includes identifiers for the bank, row, column, pseudochannel and stack
of the memory location.

The DRAM address must be converted to a normalized address in order to
be further translated to a system physical address.

Add helper functions to do the DRAM to normalized translation for MI300
systems. The method is based on the fixed hardware layout of the on-chip
memory.

  [ bp: Massage commit message, decapitalize some, rename function. ]

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Co-developed-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Tested-by: Muralidhara M K <muralidhara.mk@amd.com>
Link: https://lore.kernel.org/r/20240131165732.88297-1-yazen.ghannam@amd.com
RAS/AMD/ATL: Fix array overflow in get_logical_coh_st_fabric_id_mi300()

Check against ARRAY_SIZE() which is the number of elements instead of
sizeof() which is the number of bytes.

Fixes: 453f0ae79732 ("RAS/AMD/ATL: Add MI300 support")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/279c8b5e-6c00-467a-9071-9c67926abea4@moroto.mountain
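The difference between the two bounds checks can be shown with a small standalone sketch. The table contents and the names `toy_table`, `index_ok_sizeof()` and `index_ok_array_size()` are hypothetical, and `ARRAY_SIZE()` is defined locally here as a simplified version of the kernel macro.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified local definition; the kernel macro adds a type check. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical lookup table; values are illustrative only. */
static const uint16_t toy_table[] = { 10, 11, 12, 13 };

/* Buggy check: sizeof() counts bytes (8 here), so indexes 4..7 slip through. */
static bool index_ok_sizeof(size_t idx)
{
	return idx < sizeof(toy_table);
}

/* Correct check: ARRAY_SIZE() counts elements (4 here). */
static bool index_ok_array_size(size_t idx)
{
	return idx < ARRAY_SIZE(toy_table);
}
```

The two checks only agree when each element is one byte wide, which is why this class of bug can go unnoticed until the element type changes or a larger index is exercised.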
RAS/AMD/ATL: Add MI300 support

AMD MI300 systems include on-die HBM3 memory and a unique topology. And
they fall under Data Fabric version 4.5 in overall design.

Generally, topology information (IDs, etc.) is gathered from Data Fabric
registers. However, the unique topology for MI300 means that some
topology information is fixed in hardware and follows arbitrary
mappings. Furthermore, not all hardware instances are software-visible,
so register accesses must be adjusted.

Recognize and add helper functions for the new MI300 interleave modes.
Add lookup tables for fixed values where appropriate. Adjust how Die and
Node IDs are found and used.

Also, fix some register bitmasks that were mislabeled.

Signed-off-by: Muralidhara M K <muralidhara.mk@amd.com>
Co-developed-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20240128155950.1434067-1-yazen.ghannam@amd.com