.. SPDX-License-Identifier: GPL-2.0

======
Design
======


.. _damon_design_execution_model_and_data_structures:

Execution Model and Data Structures
===================================

The monitoring-related information, including the monitoring request
specification and DAMON-based operation schemes, is stored in a data structure
called DAMON ``context``. DAMON executes each context with a kernel thread
called ``kdamond``. Multiple kdamonds could run in parallel, for different
types of monitoring.


Overall Architecture
====================

The DAMON subsystem is configured with three layers:

- Operations Set: Implements fundamental operations for DAMON that depend on
  the given monitoring target address space and the available set of
  software/hardware primitives,
- Core: Implements core logics including monitoring overhead/accuracy control
  and access-aware system operations on top of the operations set layer, and
- Modules: Implements kernel modules for various purposes that provide
  interfaces for the user space, on top of the core layer.


.. _damon_design_configurable_operations_set:

Configurable Operations Set
---------------------------

For data access monitoring and additional low level work, DAMON needs a set of
implementations for specific operations that are dependent on and optimized for
the given target address space. On the other hand, the accuracy and overhead
tradeoff mechanism, which is the core logic of DAMON, is in the pure logic
space. DAMON separates the two parts in different layers, namely DAMON
Operations Set and DAMON Core Logics Layers, respectively. It further defines
the interface between the layers to allow various operations sets to be
configured with the core logic.

Due to this design, users can extend DAMON for any address space by configuring
the core logic to use the appropriate operations set.
If no appropriate set is available, users can implement one on their own.

For example, physical memory, virtual memory, swap space, those for specific
processes, NUMA nodes, files, and backing memory devices would be supportable.
Also, if some architectures or devices support special optimized access check
primitives, those will be easily configurable.


Programmable Modules
--------------------

The core layer of DAMON is implemented as a framework, and exposes its
application programming interface to all kernel space components such as
subsystems and modules. For common use cases of DAMON, the DAMON subsystem
provides kernel modules that are built on top of the core layer using the API,
and that can be easily used by user space end users.


.. _damon_operations_set:

Operations Set Layer
====================

The monitoring operations are defined in two parts:

1. Identification of the monitoring target address range for the address space.
2. Access check of specific address range in the target space.

DAMON currently provides the three operations sets below. The following two
subsections describe how they work.

 - vaddr: Monitor virtual address spaces of specific processes
 - fvaddr: Monitor fixed virtual address ranges
 - paddr: Monitor the physical address space of the system


.. _damon_design_vaddr_target_regions_construction:

VMA-based Target Address Range Construction
-------------------------------------------

A mechanism of the ``vaddr`` DAMON operations set that automatically
initializes and updates the monitoring target address regions so that the
entire memory mappings of the target processes can be covered.

This mechanism is only for the ``vaddr`` operations set. In cases of
``fvaddr`` and ``paddr`` operations sets, users are asked to manually set the
monitoring target address ranges.

Only small parts in the super-huge virtual address space of the processes are
mapped to the physical memory and accessed. Thus, tracking the unmapped
address regions is just wasteful. However, because DAMON can deal with some
level of noise using the adaptive regions adjustment mechanism, tracking every
mapping is not strictly required and could even incur a high overhead in some
cases. That said, overly large unmapped areas inside the monitoring target
should be removed so that the adaptive mechanism does not waste time on them.

For this reason, this implementation converts the complex mappings to three
distinct regions that cover every mapped area of the address space. The two
gaps between the three regions are the two biggest unmapped areas in the given
address space. In most cases, the two biggest unmapped areas would be the gap
between the heap and the uppermost mmap()-ed region, and the gap between the
lowermost mmap()-ed region and the stack. Because these gaps are
exceptionally huge in usual address spaces, excluding these will be sufficient
to make a reasonable trade-off. Below shows this in detail::

    <heap>
    <BIG UNMAPPED REGION 1>
    <uppermost mmap()-ed region>
    (small mmap()-ed regions and munmap()-ed regions)
    <lowermost mmap()-ed region>
    <BIG UNMAPPED REGION 2>
    <stack>


PTE Accessed-bit Based Access Check
-----------------------------------

Both of the implementations for physical and virtual address spaces use the PTE
Accessed-bit for basic access checks. The only difference is the way of
finding the relevant PTE Accessed bit(s) from the address. While the
implementation for the virtual address walks the page table for the target task
of the address, the implementation for the physical address walks every page
table having a mapping to the address.
In this way, the implementations find and clear the bit(s) for the next
sampling target address and check whether the bit(s) are set again after one
sampling period. This could disturb other kernel subsystems using the Accessed
bits, namely Idle page tracking and the reclaim logic. DAMON does nothing to
avoid disturbing Idle page tracking, so handling the interference is the
responsibility of sysadmins. However, it solves the conflict with the reclaim
logic using ``PG_idle`` and ``PG_young`` page flags, as Idle page tracking
does.


Core Logics
===========


Monitoring
----------

The subsections below describe each of the DAMON core mechanisms and the five
monitoring attributes, ``sampling interval``, ``aggregation interval``,
``update interval``, ``minimum number of regions``, and ``maximum number of
regions``.


Access Frequency Monitoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output of DAMON says which pages are accessed how frequently for a given
duration. The resolution of the access frequency is controlled by setting
``sampling interval`` and ``aggregation interval``. In detail, DAMON checks
access to each page per ``sampling interval`` and aggregates the results. In
other words, it counts the number of accesses to each page. After each
``aggregation interval`` passes, DAMON calls callback functions that were
previously registered by users so that users can read the aggregated results,
and then clears the results.
This can be described in the simple pseudo-code below::

    while monitoring_on:
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        sleep(sampling_interval)

The monitoring overhead of this mechanism grows without bound as the size of
the target workload grows.


.. _damon_design_region_based_sampling:

Region Based Sampling
~~~~~~~~~~~~~~~~~~~~~

To avoid the unbounded increase of the overhead, DAMON groups adjacent pages
that are assumed to have the same access frequencies into a region. As long as
the assumption (pages in a region have the same access frequencies) is kept,
only one page in the region is required to be checked. Thus, for each
``sampling interval``, DAMON randomly picks one page in each region, waits for
one ``sampling interval``, checks whether the page is accessed meanwhile, and
increases the access frequency counter of the region if so. The counter is
called ``nr_accesses`` of the region. Therefore, the monitoring overhead is
controllable by setting the number of regions. DAMON allows users to set the
minimum and the maximum number of regions for the trade-off.

This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed.


Adaptive Regions Adjustment
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even if the initial monitoring target regions are somehow well constructed to
fulfill the assumption (pages in the same region have similar access
frequencies), the data access pattern can change dynamically. This will result
in low monitoring quality. To keep the assumption as much as possible, DAMON
adaptively merges and splits each region based on their access frequency.
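
The merge half of this adjustment can be illustrated with a minimal sketch.
It is not the kernel implementation: the ``Region`` class, the
``merge_regions()`` function, and the ``threshold`` parameter are hypothetical
names, and taking the size-weighted average of the two counters is an
assumption made only for this example.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class Region:
        start: int        # inclusive start address
        end: int          # exclusive end address
        nr_accesses: int  # access frequency counter of the region


    def merge_regions(regions, threshold):
        """Merge address-adjacent regions with similar nr_accesses.

        `regions` must be sorted by address.  Two neighbors are merged
        when their counters differ by at most `threshold`.
        """
        merged = []
        for region in regions:
            prev = merged[-1] if merged else None
            if (prev is not None and prev.end == region.start
                    and abs(prev.nr_accesses - region.nr_accesses) <= threshold):
                # Assumption: weight each counter by the region size.
                prev_sz = prev.end - prev.start
                sz = region.end - region.start
                prev.nr_accesses = (prev.nr_accesses * prev_sz
                                    + region.nr_accesses * sz) // (prev_sz + sz)
                prev.end = region.end
            else:
                # Copy so that the caller's regions are left untouched.
                merged.append(Region(region.start, region.end,
                                     region.nr_accesses))
        return merged

With a threshold of 5, two neighboring regions whose counters are 10 and 12
collapse into one region, while a following region with a counter of 50 stays
separate.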

For each ``aggregation interval``, it compares the access frequencies of
adjacent regions and merges those if the frequency difference is small. Then,
after it reports and clears the aggregated access frequency of each region, it
splits each region into two or three regions if the total number of regions
will not exceed the user-specified maximum number of regions after the split.

In this way, DAMON provides its best-effort quality and minimal overhead while
keeping the bounds users set for their trade-off.


.. _damon_design_age_tracking:

Age Tracking
~~~~~~~~~~~~

By analyzing the monitoring results, users can also find how long the current
access pattern of a region has been maintained. That could be used for a good
understanding of the access pattern. For example, a page placement algorithm
utilizing both the frequency and the recency could be implemented using that.
To make such analysis easier, DAMON maintains yet another counter called
``age`` in each region. For each ``aggregation interval``, DAMON checks if the
region's size and access frequency (``nr_accesses``) have significantly
changed. If so, the counter is reset to zero. Otherwise, the counter is
increased.


Dynamic Target Space Updates Handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The monitoring target address range could be dynamically changed. For example,
virtual memory could be dynamically mapped and unmapped. Physical memory could
be hot-plugged.

As the changes could be quite frequent in some cases, DAMON lets the
monitoring operations check for dynamic changes, including memory mapping
changes, and apply them to monitoring operations-related data structures such
as the abstracted monitoring target memory area only once per user-specified
time interval (``update interval``).


.. _damon_design_damos:

Operation Schemes
-----------------

One common purpose of data access monitoring is access-aware system efficiency
optimizations. For example,

    paging out memory regions that are not accessed for more than two minutes

or

    using THP for memory regions that are larger than 2 MiB and showing a high
    access frequency for more than one minute.

One straightforward approach for such schemes would be profile-guided
optimizations. That is, getting data access monitoring results of the
workloads or the system using DAMON, finding memory regions of special
characteristics by profiling the monitoring results, and making system
operation changes for the regions. The changes could be made by modifying or
providing advice to the software (the application and/or the kernel), or
reconfiguring the hardware. Both offline and online approaches could be
available.

Among those, providing advice to the kernel at runtime would be flexible and
effective, and therefore be widely used. However, implementing such schemes
could impose unnecessary redundancy and inefficiency. The profiling could be
redundant if the type of interest is common. Exchanging the information
including monitoring results and operation advice between kernel and user
spaces could be inefficient.

To allow users to reduce such redundancy and inefficiencies by offloading the
work, DAMON provides a feature called Data Access Monitoring-based Operation
Schemes (DAMOS). It lets users specify their desired schemes at a high
level. For such specifications, DAMON starts monitoring, finds regions having
the access pattern of interest, and applies the user-desired operation actions
to the regions, for every user-specified time interval called
``apply_interval``.


.. _damon_design_damos_action:

Operation Action
~~~~~~~~~~~~~~~~

The management action that the users desire to apply to the regions of their
interest. For example, paging out, prioritizing for next reclamation victim
selection, advising ``khugepaged`` to collapse or split, or doing nothing but
collecting statistics of the regions.

The list of supported actions is defined in DAMOS, but the implementation of
each action is in the DAMON operations set layer because the implementation
normally depends on the monitoring target address space. For example, the code
for paging specific virtual address ranges out would be different from that for
physical address ranges. Also, the monitoring operations implementation sets
are not mandated to support all actions of the list. Hence, the availability
of a specific DAMOS action depends on which operations set is selected to be
used together.

The list of the supported actions, their meaning, and the DAMON operations sets
that support each action are as below.

 - ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``pageout``: Reclaim the region.
   Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
 - ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``lru_prio``: Prioritize the region on its LRU lists.
   Supported by ``paddr`` operations set.
 - ``lru_deprio``: Deprioritize the region on its LRU lists.
   Supported by ``paddr`` operations set.
 - ``stat``: Do nothing but count the statistics.
   Supported by all operations sets.

Applying the actions except ``stat`` to a region is considered as changing the
region's characteristics. Hence, DAMOS resets the age of regions when any such
actions are applied to those.


.. _damon_design_damos_access_pattern:

Target Access Pattern
~~~~~~~~~~~~~~~~~~~~~

The access pattern of the schemes' interest. The patterns are constructed with
the properties that DAMON's monitoring results provide, specifically the size,
the access frequency, and the age. Users can describe their access pattern of
interest by setting minimum and maximum values of the three properties. If a
region's three properties are in the ranges, DAMOS classifies it as one of the
regions that the scheme is interested in.


.. _damon_design_damos_quotas:

Quotas
~~~~~~

DAMOS upper-bound overhead control feature. DAMOS could incur high overhead if
the target access pattern is not properly tuned. For example, if a huge memory
region having the access pattern of interest is found, applying the scheme's
action to all pages of the huge region could consume unacceptably large system
resources. Preventing such issues by tuning the access pattern could be
challenging, especially if the access patterns of the workloads are highly
dynamic.

To mitigate that situation, DAMOS provides an upper-bound overhead control
feature called quotas. It lets users specify an upper limit of time that DAMOS
can use for applying the action, and/or a maximum number of bytes of memory
regions that the action can be applied to within a user-specified time
duration.


.. _damon_design_damos_quotas_prioritization:

Prioritization
^^^^^^^^^^^^^^

A mechanism for making a good decision under the quotas.
When the action cannot be applied to all regions of interest due to the quotas,
DAMOS prioritizes regions and applies the action to only regions having high
enough priorities so that it will not exceed the quotas.

The prioritization mechanism should be different for each action. For example,
rarely accessed (colder) memory regions would be prioritized for the page-out
scheme action. In contrast, the colder regions would be deprioritized for the
huge page collapse scheme action. Hence, the prioritization mechanisms for
each action are implemented in each DAMON operations set, together with the
actions.

Though the implementation is up to the DAMON operations set, it would be common
to calculate the priority using the access pattern properties of the regions.
Some users would want the mechanisms to be personalized for their specific
case. For example, some users would want the mechanism to weigh the recency
(``age``) more than the access frequency (``nr_accesses``). DAMOS allows users
to specify the weight of each access pattern property and passes the
information to the underlying mechanism. Nevertheless, how and even whether
the weight will be respected are up to the underlying prioritization mechanism
implementation.


.. _damon_design_damos_quotas_auto_tuning:

Aim-oriented Feedback-driven Auto-tuning
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Automatic feedback-driven quota tuning. Instead of setting the absolute quota
value, users can specify the metric of their interest, and what target value
they want the metric value to be. DAMOS then automatically tunes the
aggressiveness (the quota) of the corresponding scheme. For example, if DAMOS
is underachieving the goal, DAMOS automatically increases the quota. If DAMOS
is overachieving the goal, it decreases the quota.
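
The feedback idea can be illustrated with a simple proportional controller.
This is only a sketch under stated assumptions, not the formula that DAMOS
uses: ``tune_quota()`` and ``min_quota`` are hypothetical names, the target is
assumed to be positive, and the metric is assumed to grow as the scheme
becomes more aggressive.

.. code-block:: python

    def tune_quota(quota, target_value, current_value, min_quota=1):
        """Return the next quota after one round of feedback.

        Assumes target_value > 0.  A current value below the target
        means the goal is underachieved, so the quota is increased; a
        current value above the target decreases it.
        """
        # Relative distance from the goal; positive when underachieving.
        error = (target_value - current_value) / target_value
        # Scale the quota by the error, but never drop below min_quota.
        return max(min_quota, int(quota * (1 + error)))

Calling the function once per quota reset interval with a fresh measurement
moves the quota toward the point where the metric meets its target. The lower
bound keeps the scheme from being starved permanently.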

The goal can be specified with three parameters, namely ``target_metric``,
``target_value``, and ``current_value``. The auto-tuning mechanism tries to
make ``current_value`` of ``target_metric`` be the same as ``target_value``.
Currently, two ``target_metric`` are provided.

- ``user_input``: User-provided value. Users could use any metric that they
  are interested in for the value. The latency or throughput of the main user
  space workload, or system metrics like free memory ratio or memory pressure
  stall time (PSI), could be examples. Note that users should explicitly set
  ``current_value`` on their own in this case. In other words, users should
  repeatedly provide the feedback.
- ``some_mem_psi_us``: System-wide ``some`` memory pressure stall information
  in microseconds that is measured from the last quota reset to the next quota
  reset. DAMOS does the measurement on its own, so only ``target_value`` needs
  to be set by users at the initial time. In other words, DAMOS does
  self-feedback.


.. _damon_design_damos_watermarks:

Watermarks
~~~~~~~~~~

Conditional DAMOS (de)activation automation. Users might want DAMOS to run
only under certain situations. For example, when a sufficient amount of free
memory is guaranteed, running a scheme for proactive reclamation would only
consume unnecessary system resources. To avoid such consumption, the user would
need to manually monitor some metrics such as free memory ratio, and turn
DAMON/DAMOS on or off.

DAMOS allows users to offload such work using three watermarks. It allows the
users to configure the metric of their interest, and three watermark values,
namely high, middle, and low. If the value of the metric rises above the
high watermark or falls below the low watermark, the scheme is deactivated. If
the metric falls below the middle watermark but stays above the low watermark,
the scheme is activated.
If all schemes are deactivated by the watermarks, the monitoring is also
deactivated. In this case, the DAMON worker thread only periodically checks
the watermarks and therefore incurs nearly zero overhead.


.. _damon_design_damos_filters:

Filters
~~~~~~~

Non-access pattern-based target memory regions filtering. If users run
self-written programs or have good profiling tools, they could know something
more than the kernel, such as future access patterns or some special
requirements for specific types of memory. For example, some users may know
that only anonymous pages can impact their program's performance. They can
also have a list of latency-critical processes.

To let users optimize DAMOS schemes with such special knowledge, DAMOS provides
a feature called DAMOS filters. The feature allows users to set an arbitrary
number of filters for each scheme. Each filter specifies the type of target
memory, and whether it should exclude the memory of the type (filter-out), or
all except the memory of the type (filter-in).

Currently, anonymous page, memory cgroup, address range, and DAMON monitoring
target type filters are supported by the feature. Some filter target types
require additional arguments. The memory cgroup filter type asks users to
specify the file path of the memory cgroup for the filter. The address range
type asks for the start and end addresses of the range. The DAMON monitoring
target type asks for the index of the target from the context's monitoring
targets list. Hence, users can apply specific schemes to only anonymous pages,
non-anonymous pages, pages of specific cgroups, all pages excluding those of
specific cgroups, pages in a specific address range, pages in specific DAMON
monitoring targets, and any combination of those.
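
How filter-out and filter-in combine can be sketched as follows. The ``Page``
and ``Filter`` classes and the ``passes()`` function are hypothetical and only
model the semantics described above. They are not the DAMON data structures.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Callable


    @dataclass
    class Page:
        addr: int
        is_anon: bool
        cgroup: str


    @dataclass
    class Filter:
        # Tells whether a page belongs to the memory type of this filter.
        matches: Callable[[Page], bool]
        # True: exclude pages of the type (filter-out).
        # False: exclude all pages except those of the type (filter-in).
        filter_out: bool


    def passes(page, filters):
        """Return True if no filter excludes the page."""
        for f in filters:
            if f.matches(page) == f.filter_out:
                return False
        return True


    pages = [
        Page(0x1000, True, "/workload"),
        Page(0x2000, False, "/workload"),
        Page(0x3000, True, "/background"),
    ]
    filters = [
        Filter(lambda p: p.is_anon, filter_out=False),
        Filter(lambda p: p.cgroup == "/background", filter_out=True),
    ]
    targets = [p.addr for p in pages if passes(p, filters)]

Here the first filter keeps only anonymous pages and the second one drops
pages of the ``/background`` cgroup, so ``targets`` contains only the page at
``0x1000``.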

To handle filters efficiently, the address range and DAMON monitoring target
type filters are handled by the core layer, while others are handled by the
operations set. If a memory region is filtered by a core layer-handled filter,
it is not counted as a region that the scheme has tried. In contrast, if a
memory region is filtered by an operations set layer-handled filter, it is
counted as a region that the scheme has tried. The difference in accounting
leads to changes in the statistics.


Application Programming Interface
---------------------------------

The programming interface for kernel space data access-aware applications.
DAMON is a framework, so it does nothing by itself. Instead, it only helps
other kernel components such as subsystems and modules build their data
access-aware applications using DAMON's core features. For this, DAMON exposes
all its features to other kernel components via its application programming
interface, namely ``include/linux/damon.h``. Please refer to the API
:doc:`document </mm/damon/api>` for details of the interface.


Modules
=======

Because the core of DAMON is a framework for kernel components, it doesn't
provide any direct interface for the user space. Such interfaces should be
implemented by each DAMON API user kernel component, instead. The DAMON
subsystem itself implements such DAMON API user modules, which are supposed to
be used for general purpose DAMON control and special purpose data access-aware
system operations, and provides stable application binary interfaces (ABI) for
the user space. The user space can build their efficient data access-aware
applications using the interfaces.


General Purpose User Interface Modules
--------------------------------------

DAMON modules that provide user space ABIs for general purpose DAMON usage in
runtime.

DAMON user interface modules, namely 'DAMON sysfs interface' and 'DAMON debugfs
interface', are DAMON API user kernel modules that provide ABIs to the
user space. Please note that the DAMON debugfs interface is currently
deprecated.

Like many other ABIs, the modules create files on sysfs and debugfs, and allow
users to specify their requests to and get the answers from DAMON by writing to
and reading from the files. As a response to such I/O, DAMON user interface
modules control DAMON and retrieve the results as the user requested via the
DAMON API, and return the results to the user space.

The ABIs are designed to be used for user space applications development,
rather than human beings' fingers. Human users are recommended to use such
user space tools. One such Python-written user space tool is available at
GitHub (https://github.com/awslabs/damo), PyPI
(https://pypistats.org/packages/damo), and Fedora
(https://packages.fedoraproject.org/pkgs/python-damo/damo/).

Please refer to the ABI :doc:`document </admin-guide/mm/damon/usage>` for
details of the interfaces.


Special-Purpose Access-aware Kernel Modules
-------------------------------------------

DAMON modules that provide user space ABI for specific purpose DAMON usage.

DAMON sysfs/debugfs user interfaces are for full control of all DAMON features
in runtime. For each special-purpose system-wide data access-aware system
operation such as proactive reclamation or LRU lists balancing, the interfaces
could be simplified by removing unnecessary knobs for the specific purpose, and
extended for boot-time and even compile time control. Default values of DAMON
control parameters for the usage would also need to be optimized for the
purpose.

To support such cases, yet more DAMON API user kernel modules that provide
simpler and more optimized user space interfaces are available.
Currently, two modules for proactive reclamation and LRU lists manipulation
are provided. For more detail, please read the usage documents for those
(:doc:`/admin-guide/mm/damon/reclaim` and
:doc:`/admin-guide/mm/damon/lru_sort`).