| 059edcc4 | 27-Mar-2026 |
Dan Williams <dan.j.williams@intel.com> |
dax/hmem: Parent dax_hmem devices
For test purposes it is useful to be able to determine which "hmem_platform" device is hosting a given sub-device.
Register hmem devices underneath "hmem_platform".
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260327052821.440749-8-dan.j.williams@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
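Parenting the hmem devices makes the hosting "hmem_platform" device discoverable by walking up the device hierarchy. Below is a minimal userspace sketch of that lookup. It assumes a simplified device struct that has only a name and a parent pointer. struct model_dev and find_host() are hypothetical and are not driver-core APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a device hierarchy; not the driver-core struct. */
struct model_dev {
	const char *name;
	struct model_dev *parent;
};

/* Walk up the parent chain to the first ancestor whose name starts with @host_prefix. */
static struct model_dev *find_host(struct model_dev *dev, const char *host_prefix)
{
	size_t len = strlen(host_prefix);

	for (struct model_dev *p = dev->parent; p; p = p->parent)
		if (strncmp(p->name, host_prefix, len) == 0)
			return p;
	return NULL; /* unparented device: host cannot be determined */
}
```

Without a parent link, as with the previous top-level registration, the lookup has nothing to walk. That is the gap this commit closes.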
|
| f8dc1bde | 27-Mar-2026 |
Dan Williams <dan.j.williams@intel.com> |
dax/hmem: Fix singleton confusion between dax_hmem_work and hmem devices
dax_hmem (ab)uses a platform device to allow for a module to autoload in the presence of "Soft Reserved" resources. The dax_hmem driver had no dependencies on the "hmem_platform" device being a singleton until the recent "dax_hmem vs dax_cxl" takeover solution.
Replace the layering violation in dax_hmem_work, which assumes that there will never be more than one "hmem_platform" device associated with a global work item, with a dax_hmem-local workqueue that can, in principle, support any number of hmem_platform devices.
Fix up the reference counting so that the device is pinned only while its work is live in the queue.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260327052821.440749-7-dan.j.williams@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
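The per-device work item and the "pin only while queued" reference rule can be modelled in plain C. This is a sketch under stated assumptions. struct model_dev, struct model_work, model_queue_work() and model_run_work() are hypothetical. The refcount field stands in for get_device()/put_device(), and no real workqueue is involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the dax_hmem implementation. */
struct model_dev {
	const char *name;
	int refcount;  /* stands in for get_device()/put_device() */
	int processed; /* how many times deferred work ran for this device */
};

/* One work item per hmem_platform device instead of a global singleton. */
struct model_work {
	struct model_dev *dev; /* non-NULL (and pinned) only while queued */
	bool queued;
};

/* Queue work for @dev, taking a reference only if not already queued. */
static bool model_queue_work(struct model_work *w, struct model_dev *dev)
{
	if (w->queued)
		return false;
	dev->refcount++;
	w->dev = dev;
	w->queued = true;
	return true;
}

/* Run queued work, then drop the reference taken at queue time. */
static void model_run_work(struct model_work *w)
{
	struct model_dev *dev = w->dev;

	if (!w->queued)
		return;
	assert(dev->refcount > 0);
	dev->processed++;
	w->queued = false;
	w->dev = NULL;
	dev->refcount--;
}
```

Because each device owns its work item, two platform devices can have work pending at the same time without sharing state.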
|
| 3cba30ee | 27-Mar-2026 |
Dan Williams <dan.j.williams@intel.com> |
dax/hmem: Reduce visibility of dax_cxl coordination symbols
No other module or use case should be using dax_hmem_initial_probe or dax_hmem_flush_work(). Limit their use to dax_hmem and dax_cxl, respectively.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260327052821.440749-6-dan.j.williams@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
| 471d8844 | 27-Mar-2026 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Constify cxl_region_resource_contains()
The call to cxl_region_resource_contains() in hmem_register_cxl_device() need not cast away 'const'. The problem is the use of the bus_for_each_dev() API, which does not mark its @data parameter as 'const'. Switch to bus_find_device(), which does take 'const' @data, and fix up cxl_region_resource_contains() and its caller.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260327052821.440749-5-dan.j.williams@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
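The const-correctness point can be illustrated with a hypothetical userspace analogue shaped like bus_find_device(), whose match callback receives @data as a pointer to const. The names model_find_device(), region_contains() and model_region_contains_resource() are illustrative only. Ranges are assumed to use inclusive bounds, as struct resource does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a bus of region devices; not driver-core types. */
struct range { unsigned long long start, end; }; /* inclusive bounds */

struct model_dev {
	struct range res;
};

/* Match callbacks receive @data as const, so callers never cast it away. */
typedef int (*model_match_t)(struct model_dev *dev, const void *data);

/* Return the first device for which @match succeeds, or NULL. */
static struct model_dev *model_find_device(struct model_dev *devs, size_t n,
					   const void *data, model_match_t match)
{
	for (size_t i = 0; i < n; i++)
		if (match(&devs[i], data))
			return &devs[i];
	return NULL;
}

/* Nonzero if @dev's range fully contains the range passed in @data. */
static int region_contains(struct model_dev *dev, const void *data)
{
	const struct range *r = data;

	return dev->res.start <= r->start && r->end <= dev->res.end;
}

/* The caller passes a pointer to const straight through, with no cast. */
static int model_region_contains_resource(struct model_dev *devs, size_t n,
					  const struct range *r)
{
	return model_find_device(devs, n, r, region_contains) != NULL;
}
```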
|
| b6a61d5b | 27-Mar-2026 |
Dan Williams <dan.j.williams@intel.com> |
cxl/region: Limit visibility of cxl_region_contains_resource()
The dax_hmem dependency on cxl_region_contains_resource() is a one-off special case. It is not suitable for other use cases.
Move the definition next to the other CONFIG_CXL_REGION-guarded definitions in drivers/cxl/cxl.h and include that header via a relative path. This matches what drivers/dax/cxl.c does for its limited private use of CXL core symbols.
Reduce the symbol export visibility from global to just dax_hmem, to further clarify its applicability.
Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://patch.msgid.link/20260327052821.440749-4-dan.j.williams@intel.com Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
| e4de6b91 | 22-Mar-2026 |
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> |
dax/hmem, cxl: Defer and resolve Soft Reserved ownership
The current probe time ownership check for Soft Reserved memory based solely on CXL window intersection is insufficient. dax_hmem probing is not always guaranteed to run after CXL enumeration and region assembly, which can lead to incorrect ownership decisions before the CXL stack has finished publishing windows and assembling committed regions.
Introduce deferred ownership handling for Soft Reserved ranges that intersect CXL windows. When such a range is encountered during the initial dax_hmem probe, schedule deferred work to wait for the CXL stack to complete enumeration and region assembly before deciding ownership.
Once the deferred work runs, evaluate each Soft Reserved range individually: if a CXL region fully contains the range, skip it and let dax_cxl bind; otherwise, register it with dax_hmem. This per-range ownership model avoids the need for CXL region teardown, and resource exclusion in alloc_dax_region() prevents double claiming.
Introduce a boolean flag dax_hmem_initial_probe to live inside device.c so it survives module reload. Ensure dax_cxl defers driver registration until dax_hmem has completed ownership resolution. dax_cxl calls dax_hmem_flush_work() before cxl_driver_register(), which both waits for the deferred work to complete and creates a module symbol dependency that forces dax_hmem.ko to load before dax_cxl.
Co-developed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20260322195343.206900-9-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
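The ownership rules in this commit can be summarised as a small decision function. This is a hedged sketch, not the dax_hmem code. The window and region arrays are plain inputs, and soft_reserved_owner() and the OWNER_* values are hypothetical. It also assumes that a range which intersects no CXL window is registered with dax_hmem immediately during the initial probe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names are illustrative, not dax_hmem symbols. */
struct range { unsigned long long start, end; }; /* inclusive bounds */

enum owner { OWNER_HMEM, OWNER_DEFER, OWNER_CXL };

static bool intersects(const struct range *a, const struct range *b)
{
	return a->start <= b->end && b->start <= a->end;
}

static bool contains(const struct range *outer, const struct range *inner)
{
	return outer->start <= inner->start && inner->end <= outer->end;
}

/*
 * Decide who owns one Soft Reserved range.
 * Initial probe: any intersection with a CXL window defers the decision.
 * Deferred pass (CXL enumeration done): a range fully contained by an
 * assembled CXL region is left to dax_cxl; everything else goes to dax_hmem.
 */
static enum owner soft_reserved_owner(const struct range *sr, bool initial_probe,
				      const struct range *windows, size_t nwin,
				      const struct range *regions, size_t nreg)
{
	if (initial_probe) {
		for (size_t i = 0; i < nwin; i++)
			if (intersects(sr, &windows[i]))
				return OWNER_DEFER;
		return OWNER_HMEM;
	}
	for (size_t i = 0; i < nreg; i++)
		if (contains(&regions[i], sr))
			return OWNER_CXL;
	return OWNER_HMEM;
}
```

A range that only partially overlaps a region falls through to dax_hmem in the deferred pass. In the real code, resource exclusion in alloc_dax_region() keeps that from double claiming.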
|
| edfcf1e2 | 22-Mar-2026 |
Dan Williams <dan.j.williams@intel.com> |
dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
Replace IS_ENABLED(CONFIG_CXL_REGION) with IS_ENABLED(CONFIG_DEV_DAX_CXL) so that HMEM only defers Soft Reserved ranges when CXL DAX support is enabled. This makes the coordination between HMEM and the CXL stack more precise and prevents deferral in unrelated CXL configurations.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Link: https://patch.msgid.link/20260322195343.206900-5-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
| 7b4bcaad | 22-Mar-2026 |
Dan Williams <dan.j.williams@intel.com> |
dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
Ensure cxl_acpi has published CXL Window resources before HMEM walks Soft Reserved ranges.
Replace MODULE_SOFTDEP("pre: cxl_acpi") with an explicit, synchronous request_module("cxl_acpi"). MODULE_SOFTDEP() only guarantees eventual loading; it does not ensure that the dependency has finished init before the current module runs. This can cause HMEM to start before cxl_acpi has populated the resource tree, breaking detection of overlaps between Soft Reserved ranges and CXL Windows.
Also, request cxl_pci before HMEM walks Soft Reserved ranges. Unlike cxl_acpi, cxl_pci attach is asynchronous and creates dependent devices that trigger further module loads. Asynchronous probe flushing (wait_for_device_probe()) is added later in the series in a deferred context before HMEM makes ownership decisions for Soft Reserved ranges.
Add an additional explicit Kconfig ordering so that CXL_ACPI and CXL_PCI must be initialized before DEV_DAX_HMEM. This prevents HMEM from consuming Soft Reserved ranges before CXL drivers have had a chance to claim them.
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Alison Schofield <alison.schofield@intel.com> Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com> Link: https://patch.msgid.link/20260322195343.206900-4-Smita.KoralahalliChannabasappa@amd.com Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com>
|
| 09d09e04 | 10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
cxl/dax: Create dax devices for CXL RAM regions
While platform firmware takes some responsibility for mapping the RAM capacity of CXL devices present at boot, the OS is responsible for mapping the remainder and any hot-added devices. Platform firmware is also responsible for identifying the platform's general purpose memory pool, typically DDR-attached DRAM, and arranging for the remainder to be 'Soft Reserved'. That reservation allows the CXL subsystem to route the memory to core-mm via memory-hotplug (dax_kmem), or leave it for dedicated access (device-dax).
The new 'struct cxl_dax_region' object allows a CXL memory resource (region) to be published, and also allows udev and module policy to act on that event. It also prevents cxl_core.ko from having a module loading dependency on any drivers/dax/ modules.
Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167602003896.1924368.10335442077318970468.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| e9ee9fe3 | 10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
dax: Assign RAM regions to memory-hotplug by default
The default mode for device-dax instances is backwards for RAM regions, as evidenced by the fact that it tends to catch end users by surprise: "Where is my memory?" Recall that platforms are increasingly shipping with performance-differentiated memory pools beyond typical DRAM and NUMA effects. This includes HBM (high-bandwidth-memory) and CXL (dynamic interleave, varied media types, and future fabric attached possibilities).
For this reason the EFI_MEMORY_SP (EFI Special Purpose Memory => Linux 'Soft Reserved') attribute is expected to be applied to all memory-pools that are not the general purpose pool. This designation gives an Operating System a chance to defer usage of a memory pool until later in the boot process where its performance properties can be interrogated and administrator policy can be applied.
'Soft Reserved' memory can be anything from too limited and precious to be part of the general purpose pool (HBM), too slow to host hot kernel data structures (some PMEM media), or anything in between. However, in the absence of an explicit policy, the memory should at least be made usable by default. The current device-dax default hides all non-general-purpose memory behind a device interface.
The expectation is that the distribution of users that want the memory online by default vs device-dedicated-access by default follows the Pareto principle. A small number of enlightened users may want to do userspace memory management through a device, but general users just want the kernel to make the memory available with an option to get more advanced later.
Arrange for all device-dax instances not backed by PMEM to default to attaching to the dax_kmem driver. From there, the baseline memory hotplug policy (CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE / memhp_default_state=) gates whether the memory comes online or stays offline. If it stays offline, it can be reliably converted back to device mode, where it can be partitioned or fronted by a userspace allocator.
So, if someone wants device-dax instances for their 'Soft Reserved' memory:
1/ Build a kernel with CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n or boot with memhp_default_state=offline, or roll the dice and hope that the kernel has not pinned a page in that memory before step 2.
2/ Write a udev rule to convert the target dax device(s) from 'system-ram' mode to 'devdax' mode:
daxctl reconfigure-device $dax -m devdax -f
Cc: Michal Hocko <mhocko@suse.com> Cc: David Hildenbrand <david@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167602003336.1924368.6809503401422267885.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
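The default-mode policy described in this commit reduces to a two-step decision. Below is a minimal model that assumes a boolean stand-in for the memory hotplug default. default_driver(), default_state() and reliably_convertible_to_devdax() are hypothetical names, not kernel or daxctl interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the default policy; names are illustrative. */
enum dax_driver { DRV_DEVICE_DAX, DRV_KMEM };
enum mem_state { MEM_DEVICE, MEM_ONLINE, MEM_OFFLINE };

/* PMEM-backed instances stay device-dax; everything else defaults to kmem. */
static enum dax_driver default_driver(bool pmem_backed)
{
	return pmem_backed ? DRV_DEVICE_DAX : DRV_KMEM;
}

/*
 * With kmem attached, the memory hotplug default (a bool standing in for
 * CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE / memhp_default_state=) decides
 * whether the hot-added memory comes online.
 */
static enum mem_state default_state(bool pmem_backed, bool memhp_online)
{
	if (default_driver(pmem_backed) == DRV_DEVICE_DAX)
		return MEM_DEVICE;
	return memhp_online ? MEM_ONLINE : MEM_OFFLINE;
}

/* Only memory that never came online can be reliably moved back to devdax. */
static bool reliably_convertible_to_devdax(enum mem_state s)
{
	return s != MEM_ONLINE;
}
```

Online memory may still convert if the kernel has not pinned any of its pages. The model reports only the reliable case, which matches the "roll the dice" caveat in step 1/.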
|
| 7dab174e | 10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
dax/hmem: Move hmem device registration to dax_hmem.ko
In preparation for the CXL region driver to take over the responsibility of registering device-dax instances for CXL regions, move the registration of "hmem" devices to dax_hmem.ko.
Previously, the builtin component of this enabling (drivers/dax/hmem/device.o) would register platform devices for each address range and trigger the dax_hmem.ko module to load and attach device-dax instances to those devices. Now, the ranges are collected by walking the HMAT and the EFI memory map, but device creation is deferred. A new "hmem_platform" device is created, which triggers dax_hmem.ko to load and register the platform devices.
Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Link: https://lore.kernel.org/r/167602002771.1924368.5653558226424530127.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| fe098574 | 10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
dax/hmem: Convey the dax range via memregion_info()
In preparation for hmem platform devices to be unregistered, stop using platform_device_add_resources() to convey the address range. The platform_device_add_resources() API causes an existing "Soft Reserved" iomem resource to be re-parented under an inserted platform device resource. When that platform device is deleted it removes the platform device resource and all children.
Instead, it is sufficient to convey just the address range and let request_mem_region() insert resources to indicate the devices active in the range. This allows the "Soft Reserved" resource to be re-enumerated upon the next probe event.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167602002217.1924368.7036275892522551624.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| 84fe17f8 | 10-Feb-2023 |
Dan Williams <dan.j.williams@intel.com> |
dax/hmem: Drop unnecessary dax_hmem_remove()
Empty driver remove callbacks can just be elided.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Gregory Price <gregory.price@memverge.com> Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/167602001664.1924368.9102029637928071240.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
| 5a505603 | 14-Oct-2020 |
Joao Martins <joao.m.martins@oracle.com> |
dax/hmem: introduce dax_hmem.region_idle parameter
Introduce a new module parameter for dax_hmem which initializes all region devices as free, rather than allocating a pagemap for the region by default.
All hmem devices created with dax_hmem.region_idle=1 will have their full size available for creating dynamic dax devices.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Ben Skeggs <bskeggs@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Brice Goglin <Brice.Goglin@inria.fr> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: David Airlie <airlied@linux.ie> Cc: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Hulk Robot <hulkci@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@mellanox.com> Cc: Jason Yan <yanaijie@huawei.com> Cc: Jeff Moyer <jmoyer@redhat.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Jia He <justin.he@arm.com> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Juergen Gross <jgross@suse.com> Cc: kernel test robot <lkp@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Mackerras <paulus@ozlabs.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Wei Yang <richard.weiyang@linux.alibaba.com> Cc: Will Deacon <will@kernel.org> Link: https://lkml.kernel.org/r/159643106460.4062302.5868522341307530091.stgit@dwillia2-desk3.amr.corp.intel.com Link: https://lore.kernel.org/r/20200716172913.19658-4-joao.m.martins@oracle.com Link: https://lkml.kernel.org/r/160106119033.30709.11249962152222193448.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
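Below is a rough model of how region_idle changes the initial capacity accounting. It assumes a region is tracked as a total size plus the bytes claimed by its initial device. struct region_model, region_init() and region_available() are hypothetical and do not correspond to dax bus internals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of initial region accounting under region_idle. */
struct region_model {
	unsigned long long size;      /* total region size in bytes */
	unsigned long long allocated; /* bytes claimed by the initial dax device */
};

static void region_init(struct region_model *r, unsigned long long size,
			bool region_idle)
{
	r->size = size;
	/* Default: the initial device claims the whole region; idle: nothing. */
	r->allocated = region_idle ? 0 : size;
}

/* Capacity left for creating dynamic dax devices. */
static unsigned long long region_available(const struct region_model *r)
{
	return r->size - r->allocated;
}
```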
|