=============
NUMA Locality
=============

Some platforms may have multiple types of memory attached to a compute
node. These disparate memory ranges may share some characteristics, such
as CPU cache coherence, but may have different performance. For example,
different media types and buses affect bandwidth and latency.

A system supports such heterogeneous memory by grouping each memory type
under different domains, or "nodes", based on locality and performance
characteristics. Some memory may share the same node as a CPU, and other
memory is provided as memory-only nodes. While memory-only nodes do not
provide CPUs, they may still be local to one or more compute nodes relative
to other nodes. The following diagram shows one such example of two compute
nodes with local memory and a memory-only node for each compute node::

 +------------------+     +------------------+
 | Compute Node 0   +-----+ Compute Node 1   |
 | Local Node0 Mem  |     | Local Node1 Mem  |
 +--------+---------+     +--------+---------+
          |                        |
 +--------+---------+     +--------+---------+
 | Slower Node2 Mem |     | Slower Node3 Mem |
 +------------------+     +------------------+

A "memory initiator" is a node containing one or more devices such as
CPUs or separate memory I/O devices that can initiate memory requests.
A "memory target" is a node containing one or more physical address
ranges accessible from one or more memory initiators.

When multiple memory initiators exist, they may not all have the same
performance when accessing a given memory target. Each initiator-target
pair may be organized into different ranked access classes to represent
this relationship. The highest performing initiator to a given target
is considered to be one of that target's local initiators, and given
the highest access class, 0. Any given target may have one or more
local initiators, and any given initiator may have multiple local
memory targets.
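
The local initiator relationship described above can be illustrated with a
small model. This is only a sketch: the latency figures are invented for the
four-node diagram, and using the lowest latency as the sole measure of
"highest performing" is a simplifying assumption, not a description of how
the kernel or firmware ranks initiators.

```python
from collections import defaultdict

# Hypothetical access latencies in nanoseconds, keyed by (initiator, target).
# Nodes 0 and 1 are the compute nodes from the diagram; nodes 2 and 3 are the
# memory-only nodes. The numbers are made up for illustration.
LATENCY_NS = {
    (0, 0): 80,  (1, 0): 140,
    (0, 1): 140, (1, 1): 80,
    (0, 2): 300, (1, 2): 360,
    (0, 3): 360, (1, 3): 300,
}


def local_initiators(latency_ns):
    """Map each target to the initiators with the lowest latency to it."""
    by_target = defaultdict(dict)
    for (initiator, target), ns in latency_ns.items():
        by_target[target][initiator] = ns
    return {
        target: sorted(i for i, ns in costs.items() if ns == min(costs.values()))
        for target, costs in sorted(by_target.items())
    }


def local_targets(latency_ns):
    """Invert local_initiators(): map each initiator to its local targets."""
    result = defaultdict(list)
    for target, initiators in local_initiators(latency_ns).items():
        for initiator in initiators:
            result[initiator].append(target)
    return dict(result)


if __name__ == "__main__":
    print(local_initiators(LATENCY_NS))
    print(local_targets(LATENCY_NS))
```

With these made-up numbers each memory-only node has exactly one local
initiator, while each compute node ends up with two local targets: its own
memory and the slower node attached to it.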

To aid applications in matching memory targets with their initiators, the
kernel provides symlinks from each to the other. The following example lists
the relationship for the access class "0" memory initiators and targets::

        # symlinks -v /sys/devices/system/node/nodeX/access0/targets/
        relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY

        # symlinks -v /sys/devices/system/node/nodeY/access0/initiators/
        relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX

A memory initiator may have multiple memory targets in the same access
class. The target memory's initiators in a given class indicate the
nodes' access characteristics share the same performance relative to other
linked initiator nodes. The targets within an initiator's access class,
though, do not necessarily perform the same as each other.

The access class "1" is used to allow differentiation between initiators
that are CPUs and hence suitable for generic task scheduling, and
I/O initiators such as GPUs and NICs. Unlike access class 0, only
nodes containing CPUs are considered.

================
NUMA Performance
================

Applications may wish to consider which node they want their memory to
be allocated from based on the node's performance characteristics. If
the system provides these attributes, the kernel exports them under the
node sysfs hierarchy by appending the attributes directory under the
memory node's access class 0 initiators as follows::

        /sys/devices/system/node/nodeY/access0/initiators/

These attributes apply only when accessed from nodes that are linked
under this access class's initiators.
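
An application could walk these directories along the following lines. This
is a minimal sketch rather than a reference implementation: the helper names
``linked_nodes`` and ``initiator_attributes`` are hypothetical, and the code
assumes that linked nodes appear as symlinks named ``nodeN`` and that each
attribute is a regular file holding one decimal integer.

```python
import os
import re

NODE_ROOT = "/sys/devices/system/node"  # sysfs path used throughout this document


def linked_nodes(node_root, node, access_class, kind):
    """Return sorted node IDs linked under nodeN/accessC/<kind>/.

    kind is "initiators" or "targets". A missing directory yields [].
    """
    path = os.path.join(node_root, f"node{node}", f"access{access_class}", kind)
    try:
        entries = os.listdir(path)
    except OSError:
        return []
    ids = []
    for name in entries:
        match = re.fullmatch(r"node(\d+)", name)
        if match and os.path.islink(os.path.join(path, name)):
            ids.append(int(match.group(1)))
    return sorted(ids)


def initiator_attributes(node_root, node, access_class=0):
    """Read every integer-valued regular file in nodeN/accessC/initiators/."""
    path = os.path.join(node_root, f"node{node}", f"access{access_class}",
                        "initiators")
    attrs = {}
    try:
        entries = os.listdir(path)
    except OSError:
        return attrs
    for name in entries:
        full = os.path.join(path, name)
        # Skip the nodeN symlinks; only plain attribute files are read.
        if os.path.islink(full) or not os.path.isfile(full):
            continue
        try:
            with open(full) as f:
                attrs[name] = int(f.read().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric file: ignore it
    return attrs


if __name__ == "__main__":
    print("node0 local initiators:", linked_nodes(NODE_ROOT, 0, 0, "initiators"))
    print("node0 attributes:", initiator_attributes(NODE_ROOT, 0))
```

On a system that does not export this information, both helpers return empty
results instead of raising, so callers can fall back to default placement.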

The performance characteristics the kernel provides for the local initiators
are exported as follows::

        # tree -P "read*|write*" /sys/devices/system/node/nodeY/access0/initiators/
        /sys/devices/system/node/nodeY/access0/initiators/
        |-- read_bandwidth
        |-- read_latency
        |-- write_bandwidth
        `-- write_latency

The bandwidth attributes are provided in MiB/second.

The latency attributes are provided in nanoseconds.

The values reported here correspond to the rated latency and bandwidth
for the platform.

Access class 1 takes the same form but only includes values for CPU to
memory activity.

==========
NUMA Cache
==========

System memory may be constructed in a hierarchy of elements with various
performance characteristics in order to provide a large address space of
slower performing memory cached by a smaller, higher performing memory. The
system physical addresses memory initiators are aware of are provided
by the last memory level in the hierarchy. The system meanwhile uses
higher performing memory to transparently cache access to progressively
slower levels.

The term "far memory" is used to denote the last level memory in the
hierarchy. Each increasing cache level provides higher performing
initiator access, and the term "near memory" represents the fastest
cache provided by the system.

This numbering differs from CPU caches, where the cache level (ex:
L1, L2, L3) uses the CPU-side view and each increased level is lower
performing. In contrast, the memory cache level is centric to the last
level memory, so the higher numbered cache level corresponds to memory
nearer to the CPU, and further from far memory.

The memory-side caches are not directly addressable by software. When
software accesses a system address, the system will return it from the
near memory cache if it is present.
If it is not present, the system
accesses the next level of memory until there is either a hit in that
cache level, or it reaches far memory.

An application does not need to know about caching attributes in order
to use the system. Software may optionally query the memory cache
attributes in order to maximize the performance out of such a setup.
If the system provides a way for the kernel to discover this information,
for example with ACPI HMAT (Heterogeneous Memory Attribute Table),
the kernel will append these attributes to the NUMA node memory target.

When the kernel first registers a memory cache with a node, the kernel
will create the following directory::

        /sys/devices/system/node/nodeX/memory_side_cache/

If that directory is not present, the system either does not provide
a memory-side cache, or that information is not accessible to the kernel.

The attributes for each level of cache are provided under its cache
level index::

        /sys/devices/system/node/nodeX/memory_side_cache/indexA/
        /sys/devices/system/node/nodeX/memory_side_cache/indexB/
        /sys/devices/system/node/nodeX/memory_side_cache/indexC/

Each cache level's directory provides its attributes. For example, the
following shows a single cache level and the attributes available for
software to query::

        # tree /sys/devices/system/node/node0/memory_side_cache/
        /sys/devices/system/node/node0/memory_side_cache/
        |-- index1
        |   |-- indexing
        |   |-- line_size
        |   |-- size
        |   `-- write_policy

The "indexing" will be 0 if it is a direct-mapped cache, and non-zero
for any other index-based, multi-way associativity.

The "line_size" is the number of bytes accessed from the next cache
level on a miss.

The "size" is the number of bytes provided by this cache level.

The "write_policy" will be 0 for write-back, and non-zero for
write-through caching.
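
The cache attributes above could be collected with a short script such as
the following. It is a sketch under stated assumptions: the function names
are hypothetical, each attribute file is assumed to hold a single decimal
integer, and the interpretation of "indexing" and "write_policy" follows
only the zero and non-zero meanings given in this document.

```python
import os

CACHE_ATTRS = ("indexing", "line_size", "size", "write_policy")


def read_memory_side_caches(node_dir):
    """Return {index_name: {attr: int}} for <node_dir>/memory_side_cache/.

    Returns {} when the directory is absent, which per the text means
    either no memory-side cache or no information available to the kernel.
    """
    cache_dir = os.path.join(node_dir, "memory_side_cache")
    caches = {}
    try:
        entries = sorted(os.listdir(cache_dir))
    except OSError:
        return caches
    for entry in entries:
        if not entry.startswith("index"):
            continue
        attrs = {}
        for name in CACHE_ATTRS:
            try:
                with open(os.path.join(cache_dir, entry, name)) as f:
                    attrs[name] = int(f.read().strip())
            except (OSError, ValueError):
                continue  # attribute missing or not a plain integer
        caches[entry] = attrs
    return caches


def describe_cache(attrs):
    """Summarize one cache level using the meanings given in this document."""
    parts = []
    if "indexing" in attrs:
        parts.append("direct-mapped" if attrs["indexing"] == 0
                     else "not direct-mapped")
    if "write_policy" in attrs:
        parts.append("write-back" if attrs["write_policy"] == 0
                     else "write-through")
    if "size" in attrs:
        parts.append(f"{attrs['size']} bytes")
    if "line_size" in attrs:
        parts.append(f"{attrs['line_size']}-byte lines")
    return ", ".join(parts)


if __name__ == "__main__":
    node0 = "/sys/devices/system/node/node0"
    for index, attrs in read_memory_side_caches(node0).items():
        print(index, "->", describe_cache(attrs))
```

Attributes that cannot be read are simply left out of the result, so the
summary only states what the system actually reported.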

========
See Also
========

[1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
- Section 5.2.27