.. _numaperf:

=============
NUMA Locality
=============

Some platforms may have multiple types of memory attached to a compute
node. These disparate memory ranges may share some characteristics, such
as CPU cache coherence, but may have different performance. For example,
different media types and buses affect bandwidth and latency.

A system supports such heterogeneous memory by grouping each memory type
under different domains, or "nodes", based on locality and performance
characteristics. Some memory may share the same node as a CPU, and others
are provided as memory-only nodes. While memory-only nodes do not provide
CPUs, they may still be local to one or more compute nodes relative to
other nodes. The following diagram shows one such example of two compute
nodes with local memory and a memory-only node for each compute node::

 +------------------+     +------------------+
 | Compute Node 0   +-----+ Compute Node 1   |
 | Local Node0 Mem  |     | Local Node1 Mem  |
 +--------+---------+     +--------+---------+
          |                        |
 +--------+---------+     +--------+---------+
 | Slower Node2 Mem |     | Slower Node3 Mem |
 +------------------+     +------------------+

A "memory initiator" is a node containing one or more devices such as
CPUs or separate memory I/O devices that can initiate memory requests.
A "memory target" is a node containing one or more physical address
ranges accessible from one or more memory initiators.

When multiple memory initiators exist, they may not all have the same
performance when accessing a given memory target. Each initiator-target
pair may be organized into different ranked access classes to represent
this relationship. The highest performing initiator to a given target
is considered to be one of that target's local initiators, and is given
the highest access class, 0. Any given target may have one or more
local initiators, and any given initiator may have multiple local
memory targets.
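The initiator/target relationships described above are exported through
sysfs symlinks, shown in the next section. As an illustrative sketch
only (the ``local_targets`` helper and its arguments are invented here,
not a kernel interface), an application could enumerate an initiator
node's linked targets for a given access class like this:

```python
import os


def local_targets(node, access_class=0, sysfs="/sys/devices/system/node"):
    """List the memory target nodes linked to an initiator node for a
    given access class.

    Illustrative sketch only: it reads the nodeN/accessM/targets/
    symlink entries and returns an empty list if the access class
    directory is absent (e.g. the node is not an initiator in that
    class).
    """
    targets_dir = os.path.join(sysfs, "node%d" % node,
                               "access%d" % access_class, "targets")
    if not os.path.isdir(targets_dir):
        return []
    # Each entry is a symlink named after the linked target node.
    return sorted(e for e in os.listdir(targets_dir)
                  if e.startswith("node"))
```

The ``sysfs`` parameter defaults to the real node hierarchy but can be
pointed elsewhere, which also makes the helper easy to exercise against
a fabricated directory tree.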
To aid applications in matching memory targets with their initiators,
the kernel provides symlinks between them. The following example lists
the relationship for the access class "0" memory initiators and
targets::

    # symlinks -v /sys/devices/system/node/nodeX/access0/targets/
    relative: /sys/devices/system/node/nodeX/access0/targets/nodeY -> ../../nodeY

    # symlinks -v /sys/devices/system/node/nodeY/access0/initiators/
    relative: /sys/devices/system/node/nodeY/access0/initiators/nodeX -> ../../nodeX

A memory initiator may have multiple memory targets in the same access
class. All of a target's initiators in a given class share the same
access performance to that target, relative to other linked initiator
nodes. The targets within an initiator's access class, though, do not
necessarily perform the same as each other.

================
NUMA Performance
================

Applications may wish to consider which node they want their memory to
be allocated from based on the node's performance characteristics. If
the system provides these attributes, the kernel exports them under the
node sysfs hierarchy by appending the attributes directory under the
memory node's access class 0 initiators as follows::

    /sys/devices/system/node/nodeY/access0/initiators/

These attributes apply only when accessed from nodes that are linked
under this access class's initiators.

The performance characteristics the kernel provides for the local
initiators are exported as follows::

    # tree -P "read*|write*" /sys/devices/system/node/nodeY/access0/initiators/
    /sys/devices/system/node/nodeY/access0/initiators/
    |-- read_bandwidth
    |-- read_latency
    |-- write_bandwidth
    `-- write_latency

The bandwidth attributes are provided in MiB/second.

The latency attributes are provided in nanoseconds.
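The four attributes above can be read with plain file I/O. The
following Python sketch is illustrative only (the ``read_perf_attrs``
helper is not a kernel API); it assumes the units documented above,
MiB/s for bandwidth and nanoseconds for latency:

```python
import os


def read_perf_attrs(initiators_dir):
    """Read the performance attributes for a memory target, given its
    .../accessN/initiators/ sysfs directory.

    Returns a dict of the attributes that are present, as integers:
    bandwidths in MiB/s, latencies in nanoseconds. Illustrative helper
    only, not a kernel-provided interface.
    """
    attrs = {}
    for name in ("read_bandwidth", "read_latency",
                 "write_bandwidth", "write_latency"):
        path = os.path.join(initiators_dir, name)
        if os.path.exists(path):
            with open(path) as f:
                attrs[name] = int(f.read().strip())
    return attrs
```

For example,
``read_perf_attrs("/sys/devices/system/node/node1/access0/initiators")``
would return the rated figures for node1's local (class 0) initiators
on a system that exports them.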
The values reported here correspond to the rated latency and bandwidth
for the platform.

==========
NUMA Cache
==========

System memory may be constructed in a hierarchy of elements with various
performance characteristics in order to provide a large address space of
slower performing memory cached by a smaller, higher performing memory.
The system physical addresses that memory initiators are aware of are
provided by the last memory level in the hierarchy. The system meanwhile
uses higher performing memory to transparently cache access to
progressively slower levels.

The term "far memory" is used to denote the last level memory in the
hierarchy. Each increasing cache level provides higher performing
initiator access, and the term "near memory" represents the fastest
cache provided by the system.

This numbering is different from CPU caches, where the cache level
(e.g. L1, L2, L3) uses the CPU-side view, in which each increased level
is lower performing. In contrast, the memory cache level is centric to
the last level memory, so the higher numbered cache level corresponds
to memory nearer to the CPU, and further from far memory.

The memory-side caches are not directly addressable by software. When
software accesses a system address, the system will return it from the
near memory cache if it is present. If it is not present, the system
accesses the next level of memory until there is either a hit in that
cache level, or it reaches far memory.

An application does not need to know about caching attributes in order
to use the system. Software may optionally query the memory cache
attributes in order to maximize the performance out of such a setup.
If the system provides a way for the kernel to discover this information,
for example with ACPI HMAT (Heterogeneous Memory Attribute Table),
the kernel will append these attributes to the NUMA node memory target.
When the kernel first registers a memory cache with a node, the kernel
will create the following directory::

    /sys/devices/system/node/nodeX/memory_side_cache/

If that directory is not present, the system either does not provide
a memory-side cache, or that information is not accessible to the kernel.

The attributes for each level of cache are provided under its cache
level index::

    /sys/devices/system/node/nodeX/memory_side_cache/indexA/
    /sys/devices/system/node/nodeX/memory_side_cache/indexB/
    /sys/devices/system/node/nodeX/memory_side_cache/indexC/

Each cache level's directory provides its attributes. For example, the
following shows a single cache level and the attributes available for
software to query::

    # tree /sys/devices/system/node/node0/memory_side_cache/
    /sys/devices/system/node/node0/memory_side_cache/
    |-- index1
    |   |-- indexing
    |   |-- line_size
    |   |-- size
    |   `-- write_policy

The "indexing" will be 0 if it is a direct-mapped cache, and non-zero
for any other indexing scheme, such as a multi-way set-associative
cache.

The "line_size" is the number of bytes accessed from the next cache
level on a miss.

The "size" is the number of bytes provided by this cache level.

The "write_policy" will be 0 for write-back, and non-zero for
write-through caching.

========
See Also
========

[1] https://www.uefi.org/sites/default/files/resources/ACPI_6_2.pdf
- Section 5.2.27
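As a closing illustration, the memory-side cache attributes described
above can be gathered per node with a short script. This sketch is an
example under stated assumptions, not a kernel interface: the helper
name and the decoded boolean fields are invented here, while the
attribute names and their 0/non-zero meanings follow the documentation
above:

```python
import os


def read_memory_side_cache(node_sysfs_dir):
    """Enumerate a node's memory-side cache levels from sysfs.

    Returns a dict mapping cache level (int) to its attributes, with
    the raw "indexing" and "write_policy" values decoded into the
    booleans "direct_mapped" and "write_back". Illustrative helper
    only, not a kernel-provided interface.
    """
    cache_dir = os.path.join(node_sysfs_dir, "memory_side_cache")
    levels = {}
    if not os.path.isdir(cache_dir):
        # No memory-side cache, or the kernel could not discover one.
        return levels
    for entry in os.listdir(cache_dir):
        # Cache levels appear as indexN directories; skip other entries.
        if not (entry.startswith("index") and entry[5:].isdigit()):
            continue
        level = int(entry[5:])
        attrs = {}
        for name in ("indexing", "line_size", "size", "write_policy"):
            with open(os.path.join(cache_dir, entry, name)) as f:
                attrs[name] = int(f.read().strip())
        # 0 means direct-mapped; non-zero means some multi-way indexing.
        attrs["direct_mapped"] = attrs["indexing"] == 0
        # 0 means write-back; non-zero means write-through.
        attrs["write_back"] = attrs["write_policy"] == 0
        levels[level] = attrs
    return levels
```

Calling ``read_memory_side_cache("/sys/devices/system/node/node0")``
on a system with a single-level memory-side cache would return one
entry keyed by its cache level; an empty dict means no cache was
reported for that node.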