.. SPDX-License-Identifier: GPL-2.0

=======================
Linux Init (Early Boot)
=======================

Linux configuration is split into two major steps: Early-Boot and everything else.

During early boot, Linux sets up immutable resources (such as numa nodes), while
later operations include things like driver probe and memory hotplug.  Linux may
read EFI and ACPI information throughout this process to configure logical
representations of the devices.

During Linux Early Boot stage (functions in the kernel that have the __init
decorator), the system takes the resources created by EFI/BIOS
(:doc:`ACPI tables <../platform/acpi>`) and turns them into resources that the
kernel can consume.


BIOS, Build and Boot Options
============================

There are 4 pre-boot options that need to be considered during kernel build
which dictate how memory will be managed by Linux during early boot.

* EFI_MEMORY_SP

  * BIOS/EFI Option that dictates whether memory is SystemRAM or
    Specific Purpose.  Specific Purpose memory will be deferred to
    drivers to manage - and not immediately exposed as system RAM.

* CONFIG_EFI_SOFT_RESERVE

  * Linux Build config option that dictates whether the kernel supports
    Specific Purpose memory.

* CONFIG_MHP_DEFAULT_ONLINE_TYPE

  * Linux Build config that dictates whether and how Specific Purpose memory
    converted to a dax device should be managed (left as DAX or onlined as
    SystemRAM in ZONE_NORMAL or ZONE_MOVABLE).

* nosoftreserve

  * Linux kernel boot option that dictates whether Soft Reserve should be
    supported.  Similar to CONFIG_EFI_SOFT_RESERVE.

Memory Map Creation
===================

While the kernel parses the EFI memory map, if :code:`Specific Purpose` memory
is supported and detected, it will set this region aside as
:code:`SOFT_RESERVED`.
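
The soft-reserve decision can be modelled in user space as a small predicate.
The following is a minimal sketch, not kernel code: :code:`classify_region`,
:code:`enum region_kind` and the two boolean parameters are hypothetical names
that stand in for the build and boot options listed above.  Only the
:code:`EFI_MEMORY_SP` attribute bit value is taken from the UEFI definition.

.. code-block:: c

   #include <stdbool.h>
   #include <stdint.h>

   /* UEFI memory attribute bit that marks Specific Purpose memory. */
   #define EFI_MEMORY_SP 0x0000000000040000ULL

   /* Hypothetical result type, used by this sketch only. */
   enum region_kind {
       REGION_SYSTEM_RAM,
       REGION_SOFT_RESERVED,
   };

   /*
    * attribute:     attribute bits of one EFI memory descriptor
    * config_soft:   models CONFIG_EFI_SOFT_RESERVE=y (assumption)
    * nosoftreserve: models the nosoftreserve boot option being set (assumption)
    */
   static enum region_kind classify_region(uint64_t attribute,
                                           bool config_soft,
                                           bool nosoftreserve)
   {
       /* Soft Reserve must be built in and not disabled at boot. */
       bool supported = config_soft && !nosoftreserve;

       if (supported && (attribute & EFI_MEMORY_SP))
           return REGION_SOFT_RESERVED;

       /* Otherwise the region is treated as ordinary SystemRAM. */
       return REGION_SYSTEM_RAM;
   }

In this model a region is set aside only when all three conditions hold.
Clearing any one of them makes the region fall through to SystemRAM.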

If :code:`EFI_MEMORY_SP=0`, :code:`CONFIG_EFI_SOFT_RESERVE=n`, or
:code:`nosoftreserve=y`, Linux will default a CXL device memory region to
SystemRAM.  This will expose the memory to the kernel page allocator in
:code:`ZONE_NORMAL`, making it available for most allocations (including
:code:`struct page` and page tables).

If :code:`Specific Purpose` is set and supported, :code:`CONFIG_MHP_DEFAULT_ONLINE_TYPE_*`
dictates whether the memory is onlined by default (:code:`_OFFLINE` or
:code:`_ONLINE_*`), and if online, which zone to online this memory to by default
(:code:`_NORMAL` or :code:`_MOVABLE`).

If placed in :code:`ZONE_MOVABLE`, the memory will not be available for most
kernel allocations (such as :code:`struct page` or page tables).  This may
significantly impact performance depending on the memory capacity of the system.


NUMA Node Reservation
=====================

Linux refers to the proximity domains (:code:`PXM`) defined in the :doc:`SRAT
<../platform/acpi/srat>` to create NUMA nodes in :code:`acpi_numa_init`.
Typically, there is a 1:1 relation between :code:`PXM` and NUMA node IDs.

The SRAT is the only ACPI defined way of defining Proximity Domains.  Linux
chooses to, at most, map those 1:1 with NUMA nodes.
:doc:`CEDT <../platform/acpi/cedt>` adds a description of SPA ranges which
Linux may map to one or more NUMA nodes.

If there are CXL ranges in the CFMWS but not in SRAT, then a fake :code:`PXM`
is created (as of v6.15).  In the future, Linux may reject CFMWS not described
by SRAT due to the ambiguity of proximity domain association.

It is important to note that NUMA node creation cannot be done at runtime.  All
possible NUMA nodes are identified at :code:`__init` time, more specifically
during :code:`mm_init`.  The CEDT and SRAT must contain sufficient :code:`PXM`
data for Linux to identify NUMA nodes and their associated memory regions.
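
To illustrate why every :code:`PXM` has to be known at :code:`__init` time, the
following user-space sketch hands out node IDs from a fixed pool the first time
a proximity domain is seen.  It is an assumption-laden model, not the code in
:code:`srat.c`: :code:`struct pxm_map`, :code:`pxm_map_init`,
:code:`pxm_map_node` and the table sizes are all hypothetical.

.. code-block:: c

   #include <stddef.h>

   #define MAX_PXM   8     /* hypothetical number of proximity domains */
   #define MAX_NODES 4     /* hypothetical number of possible NUMA nodes */
   #define NO_NODE   (-1)

   struct pxm_map {
       int pxm_to_node[MAX_PXM];   /* NO_NODE until the PXM is first seen */
       int next_node;              /* next unused node ID */
   };

   static void pxm_map_init(struct pxm_map *m)
   {
       for (size_t i = 0; i < MAX_PXM; i++)
           m->pxm_to_node[i] = NO_NODE;
       m->next_node = 0;
   }

   /* Return the node for pxm, allocating the next free node on first use. */
   static int pxm_map_node(struct pxm_map *m, int pxm)
   {
       if (pxm < 0 || pxm >= MAX_PXM)
           return NO_NODE;

       if (m->pxm_to_node[pxm] == NO_NODE) {
           /* The pool is fixed, so a late PXM may find no node left. */
           if (m->next_node >= MAX_NODES)
               return NO_NODE;
           m->pxm_to_node[pxm] = m->next_node++;
       }

       return m->pxm_to_node[pxm];
   }

A repeated lookup of the same :code:`PXM` returns the same node, which models
the 1:1 relation.  Once the pool is exhausted, further proximity domains cannot
be given a node in this sketch.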

The relevant code exists in: :code:`linux/drivers/acpi/numa/srat.c`.

See :doc:`Example Platform Configurations <../platform/example-configs>`
for more info.

Memory Tiers Creation
=====================
Memory tiers are a collection of NUMA nodes grouped by performance characteristics.
During :code:`__init`, Linux initializes the system with a default memory tier that
contains all nodes marked :code:`N_MEMORY`.

:code:`memory_tier_init` is called at boot for all nodes with memory online by
default.  :code:`memory_tier_late_init` is called during late-init for nodes set up
during driver configuration.

Nodes are only marked :code:`N_MEMORY` if they have *online* memory.

Tier membership can be inspected in ::

  /sys/devices/virtual/memory_tiering/memory_tierN/nodelist
  0-1

If nodes with a clear difference in performance are grouped together, check the
:doc:`HMAT <../platform/acpi/hmat>` and CDAT information for the CXL nodes.  All
nodes default to the DRAM tier, unless HMAT/CDAT information is reported to the
memory_tier component via :code:`access_coordinates`.

For more, see :doc:`CXL access coordinates documentation
<../linux/access-coordinates>`.

Contiguous Memory Allocation
============================
The contiguous memory allocator (CMA) enables reservation of contiguous memory
regions on NUMA nodes during early boot.  However, CMA cannot reserve memory
on NUMA nodes that are not online during early boot. ::

  void __init hugetlb_cma_reserve(int order) {
    if (!node_online(nid))
      /* do not allow reservations */
  }

This means that if users intend to defer management of CXL memory to the driver,
CMA cannot be used to guarantee huge page allocations.  If enabling CXL memory as
SystemRAM in :code:`ZONE_NORMAL` during early boot, CMA reservations per-node can
be made with the :code:`cma_pernuma` or :code:`numa_cma` kernel command line
parameters.
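
The effect of the :code:`node_online` check can be modelled in user space.
The sketch below is illustrative only and does not reproduce the kernel
function: :code:`reserve_per_node`, :code:`MAX_NODES` and the two arrays are
hypothetical, and dropping the request for an offline node is the assumed
behaviour being demonstrated.

.. code-block:: c

   #include <stdbool.h>
   #include <stddef.h>

   #define MAX_NODES 4     /* hypothetical node count for the sketch */

   /*
    * online[nid]:  whether node nid is online during early boot
    * request[nid]: bytes requested on node nid, zeroed if the node is offline
    *
    * Returns the total number of bytes that would actually be reserved.
    */
   static unsigned long reserve_per_node(const bool online[MAX_NODES],
                                         unsigned long request[MAX_NODES])
   {
       unsigned long total = 0;

       for (size_t nid = 0; nid < MAX_NODES; nid++) {
           if (request[nid] == 0)
               continue;

           /* Offline nodes cannot hold a reservation: drop the request. */
           if (!online[nid]) {
               request[nid] = 0;
               continue;
           }

           total += request[nid];
       }

       return total;
   }

In this model, a request aimed at a CXL node that the driver onlines later is
silently lost, which is why such nodes cannot back guaranteed huge page
allocations.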