.. SPDX-License-Identifier: GPL-2.0

======
Design
======


.. _damon_design_execution_model_and_data_structures:

Execution Model and Data Structures
===================================

The monitoring-related information, including the monitoring request
specification and DAMON-based operation schemes, is stored in a data structure
called DAMON ``context``. DAMON executes each context with a kernel thread
called ``kdamond``. Multiple kdamonds can run in parallel, for different
types of monitoring.

To know how user-space can do the configurations and start/stop DAMON, refer to
:ref:`DAMON sysfs interface <sysfs_interface>` documentation.


Overall Architecture
====================

The DAMON subsystem consists of three layers:

- :ref:`Operations Set <damon_operations_set>`: Implements fundamental
  operations for DAMON that depend on the given monitoring target
  address-space and available set of software/hardware primitives,
- :ref:`Core <damon_core_logic>`: Implements core logics including monitoring
  overhead/accuracy control and access-aware system operations on top of the
  operations set layer, and
- :ref:`Modules <damon_modules>`: Implements kernel modules for various
  purposes that provide interfaces for the user space, on top of the core
  layer.


.. _damon_operations_set:

Operations Set Layer
====================

.. _damon_design_configurable_operations_set:

For data access monitoring and additional low level work, DAMON needs a set of
implementations for specific operations that are dependent on and optimized for
the given target address space. For example, the two operations below for
access monitoring are address-space dependent.

1. Identification of the monitoring target address range for the address space.
2. Access check of specific address range in the target space.

DAMON consolidates these implementations in a layer called DAMON Operations
Set, and defines the interface between it and the upper layer. The upper layer
is dedicated to DAMON's core logics, including the mechanism for control of the
monitoring accuracy and the overhead.

Hence, DAMON can easily be extended for any address space and/or available
hardware features by configuring the core logic to use the appropriate
operations set. If there is no available operations set for a given purpose, a
new operations set can be implemented following the interface between the
layers.

For example, physical memory, virtual memory, swap space, those for specific
processes, NUMA nodes, files, and backing memory devices would be supportable.
Also, if some architectures or devices support special optimized access check
features, those will be easily configurable.

DAMON currently provides the three operations sets below. The following
subsections describe how they work.

 - vaddr: Monitor virtual address spaces of specific processes
 - fvaddr: Monitor fixed virtual address ranges
 - paddr: Monitor the physical address space of the system

To know how user-space can do the configuration via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`operations <sysfs_context>` file part of the
documentation.


.. _damon_design_vaddr_target_regions_construction:

VMA-based Target Address Range Construction
-------------------------------------------

A mechanism of ``vaddr`` DAMON operations set that automatically initializes
and updates the monitoring target address regions so that entire memory
mappings of the target processes can be covered.

This mechanism is only for the ``vaddr`` operations set. In cases of
``fvaddr`` and ``paddr`` operations sets, users are asked to manually set the
monitoring target address ranges.

Only small parts in the super-huge virtual address space of the processes are
mapped to the physical memory and accessed. Thus, tracking the unmapped
address regions is just wasteful. However, because DAMON can deal with some
level of noise using the adaptive regions adjustment mechanism, tracking every
mapping is not strictly required, and could even incur a high overhead in some
cases. That said, excessively large unmapped areas inside the monitoring
target should be removed so that the adaptive mechanism does not spend time on
them.

For this reason, this implementation converts the complex mappings to three
distinct regions that cover every mapped area of the address space. The two
gaps between the three regions are the two biggest unmapped areas in the given
address space. In most cases, the two biggest unmapped areas would be the gap
between the heap and the uppermost mmap()-ed region, and the gap between the
lowermost mmap()-ed region and the stack. Because these gaps are exceptionally
huge in usual address spaces, excluding these will be sufficient to make a
reasonable trade-off. Below shows this in detail::

    <heap>
    <BIG UNMAPPED REGION 1>
    <uppermost mmap()-ed region>
    (small mmap()-ed regions and munmap()-ed regions)
    <lowermost mmap()-ed region>
    <BIG UNMAPPED REGION 2>
    <stack>


PTE Accessed-bit Based Access Check
-----------------------------------

Both of the implementations for physical and virtual address spaces use PTE
Accessed-bit for basic access checks. The only difference is the way of
finding the relevant PTE Accessed bit(s) from the address. While the
implementation for the virtual address walks the page table for the target task
of the address, the implementation for the physical address walks every page
table having a mapping to the address.
In this way, the implementations find and clear the bit(s) for the next
sampling target address and check whether the bit(s) are set again after one
sampling period. This could disturb other kernel subsystems using the Accessed
bits, namely Idle page tracking and the reclaim logic. DAMON does nothing to
avoid disturbing Idle page tracking, so handling the interference is the
responsibility of sysadmins. However, it solves the conflict with the reclaim
logic using ``PG_idle`` and ``PG_young`` page flags, as Idle page tracking
does.

.. _damon_design_addr_unit:

Address Unit
------------

DAMON core layer uses ``unsigned long`` type for monitoring target address
ranges. In some cases, the address space for a given operations set could be
too large to be handled with the type. ARM (32-bit) with large physical
address extension is an example. For such cases, a per-operations set
parameter called ``address unit`` is provided. It represents the scale factor
that needs to be multiplied to the core layer's address for calculating the
real address on the given address space. Support of ``address unit`` parameter
is up to each operations set implementation. ``paddr`` is the only operations
set implementation that supports the parameter.

If the value is smaller than ``PAGE_SIZE``, only a power of two should be used.

.. _damon_core_logic:

Core Logics
===========

.. _damon_design_monitoring:

Monitoring
----------

The four sections below describe each of the DAMON core mechanisms and the five
monitoring attributes, ``sampling interval``, ``aggregation interval``,
``update interval``, ``minimum number of regions``, and ``maximum number of
regions``.

Note that ``minimum number of regions`` must be 3 or higher.
This is because the virtual address space monitoring is designed to handle at
least three regions to accommodate two large unmapped areas commonly found in
normal virtual address spaces. While this restriction might not be strictly
necessary for other operations sets like ``paddr``, it is currently enforced
across all DAMON operations for consistency.

To know how user-space can set the attributes via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`monitoring_attrs <sysfs_monitoring_attrs>`
part of the documentation.


Access Frequency Monitoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output of DAMON says which pages are accessed how frequently for a given
duration. The resolution of the access frequency is controlled by setting
``sampling interval`` and ``aggregation interval``. In detail, DAMON checks
access to each page per ``sampling interval`` and aggregates the results. In
other words, it counts the number of the accesses to each page. After each
``aggregation interval`` passes, DAMON calls callback functions that were
previously registered by users so that users can read the aggregated results,
and then clears the results. This can be described in the simple pseudo-code
below::

    while monitoring_on:
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        sleep(sampling_interval)

The monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.


.. _damon_design_region_based_sampling:

Region Based Sampling
~~~~~~~~~~~~~~~~~~~~~

To avoid the unbounded increase of the overhead, DAMON groups adjacent pages
that are assumed to have the same access frequencies into a region.
As long as the assumption (pages in a region have the same access frequencies)
is kept, only one page in the region is required to be checked. Thus, for each
``sampling interval``, DAMON randomly picks one page in each region, waits for
one ``sampling interval``, checks whether the page is accessed meanwhile, and
increases the access frequency counter of the region if so. The counter is
called ``nr_accesses`` of the region. Therefore, the monitoring overhead is
controllable by setting the number of regions. DAMON allows users to set the
minimum and the maximum number of regions for the trade-off.

This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed.


.. _damon_design_adaptive_regions_adjustment:

Adaptive Regions Adjustment
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even if the initial monitoring target regions are somehow well constructed to
fulfill the assumption (pages in the same region have similar access
frequencies), the data access pattern can change dynamically. This will result
in low monitoring quality. To keep the assumption as much as possible, DAMON
adaptively merges and splits each region based on their access frequency.

For each ``aggregation interval``, it compares the access frequencies
(``nr_accesses``) of adjacent regions. If the difference is small, and if the
sum of the two regions' sizes is smaller than the size of total regions divided
by the ``minimum number of regions``, DAMON merges the two regions. If the
resulting number of total regions is still higher than ``maximum number of
regions``, it repeats the merging with an increasing access frequency
difference threshold until the upper-limit of the number of regions is met, or
the threshold becomes higher than the possible maximum value (``aggregation
interval`` divided by ``sampling interval``).
Then, after it reports and clears the aggregated access frequency of each
region, it splits each region into two or three regions if the total number of
regions will not exceed the user-specified maximum number of regions after the
split.

In this way, DAMON provides its best-effort quality and minimal overhead while
keeping the bounds users set for their trade-off.


.. _damon_design_age_tracking:

Age Tracking
~~~~~~~~~~~~

By analyzing the monitoring results, users can also find how long the current
access pattern of a region has been maintained. That could be used for good
understanding of the access pattern. For example, a page placement algorithm
utilizing both the frequency and the recency could be implemented using that.
To make such analysis of how long an access pattern has been maintained easier,
DAMON maintains yet another counter called ``age`` in each region. For each
``aggregation interval``, DAMON checks if the region's size and access
frequency (``nr_accesses``) have significantly changed. If so, the counter is
reset to zero. Otherwise, the counter is increased.


Dynamic Target Space Updates Handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The monitoring target address range could be dynamically changed. For example,
virtual memory could be dynamically mapped and unmapped. Physical memory could
be hot-plugged.

As the changes could be quite frequent in some cases, DAMON allows the
monitoring operations to check dynamic changes, including memory mapping
changes, and to apply them to monitoring operations-related data structures
such as the abstracted monitoring target memory area, only once per
user-specified time interval (``update interval``).

User-space can get the monitoring results via DAMON sysfs interface and/or
tracepoints.
For more details, please refer to the documentation for
:ref:`DAMOS tried regions <sysfs_schemes_tried_regions>` and :ref:`tracepoint`,
respectively.


.. _damon_design_monitoring_params_tuning_guide:

Monitoring Parameters Tuning Guide
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In short, set ``aggregation interval`` to capture a meaningful amount of
accesses for the purpose. The amount of accesses can be measured using
``nr_accesses`` and ``age`` of regions in the aggregated monitoring results
snapshot. The default value of the interval, ``100ms``, turns out to be too
short in many cases. Set ``sampling interval`` proportional to ``aggregation
interval``. By default, ``1/20`` is recommended as the ratio.

``Aggregation interval`` should be set to a time interval within which the
workload can make the amount of accesses needed for the monitoring purpose.
If the interval is too short, only a small number of accesses is captured. As
a result, the monitoring results look like everything is equally and only
rarely accessed. For many purposes, that would be useless. If it is too long,
however, the time to converge regions with the :ref:`regions adjustment
mechanism <damon_design_adaptive_regions_adjustment>` can be too long,
depending on the time scale of the given purpose. This could happen if the
workload is actually making only rare accesses but the user sets the amount of
accesses for the monitoring purpose too high. For such cases, the target
amount of access to capture per ``aggregation interval`` should be carefully
reconsidered. Also, note that the captured amount of accesses is represented
with not only ``nr_accesses``, but also ``age``. For example, even if every
region in the monitoring results shows zero ``nr_accesses``, regions could
still be distinguished using ``age`` values as the recency information.
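
As a rough illustration of the ``1/20`` ratio mentioned above, the sketch below
derives a ``sampling interval`` from a given ``aggregation interval`` and shows
the resulting upper bound of ``nr_accesses`` (``aggregation interval`` divided
by ``sampling interval``). The helper names are hypothetical and exist only for
this example; this is not how DAMON itself is implemented.

```python
# Hypothetical helpers for illustration only; not part of DAMON.

def derive_sampling_interval_us(aggregation_interval_us, ratio_denominator=20):
    """Return a sampling interval that is 1/ratio_denominator of the
    aggregation interval, never smaller than one microsecond."""
    return max(1, aggregation_interval_us // ratio_denominator)


def max_nr_accesses(aggregation_interval_us, sampling_interval_us):
    """Upper bound of nr_accesses a region can show in one aggregation."""
    return aggregation_interval_us // sampling_interval_us


aggr_us = 100_000  # the 100ms default mentioned in this guide
sample_us = derive_sampling_interval_us(aggr_us)
resolution = max_nr_accesses(aggr_us, sample_us)
print(sample_us, resolution)
```

With the ratio fixed, lengthening ``aggregation interval`` lengthens ``sampling
interval`` too, while the resolution of ``nr_accesses`` per aggregation stays
the same.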

Hence the optimum value of ``aggregation interval`` depends on the access
intensiveness of the workload. The user should tune the interval based on the
amount of access that is captured on each aggregated snapshot of the
monitoring results.

Note that the default value of the interval is 100 milliseconds, which is too
short in many cases, especially on large systems.

``Sampling interval`` defines the resolution of each aggregation. If it is set
too large, monitoring results will look like every region was equally rarely
accessed, or equally frequently accessed. That is, regions become
indistinguishable based on access pattern, and therefore the results will be
useless in many use cases. If ``sampling interval`` is too small, it will not
degrade the resolution, but will increase the monitoring overhead. If it is
already small enough to provide a resolution of the monitoring results that is
sufficient for the given purpose, it shouldn't be lowered further. It is
recommended to set it proportional to ``aggregation interval``. By default,
the ratio is set as ``1/20``, and it is still recommended.

Based on the manual tuning guide, DAMON provides a more intuitive knob-based
intervals auto tuning mechanism. Please refer to :ref:`the design document of
the feature <damon_design_monitoring_intervals_autotuning>` for detail.

Refer to the document below for an example tuning based on the above guide.

.. toctree::
   :maxdepth: 1

   monitoring_intervals_tuning_example


.. _damon_design_monitoring_intervals_autotuning:

Monitoring Intervals Auto-tuning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

DAMON provides automatic tuning of the ``sampling interval`` and ``aggregation
interval`` based on :ref:`the tuning guide idea
<damon_design_monitoring_params_tuning_guide>`.
The tuning mechanism allows users to set the aimed amount of access events to
observe via DAMON within a given time interval. The target can be specified by
the user as a ratio of DAMON-observed access events to the theoretical maximum
amount of the events (``access_bp``) that is measured within a given number of
aggregations (``aggrs``).

The DAMON-observed access events are calculated in byte granularity based on
DAMON :ref:`region assumption <damon_design_region_based_sampling>`. For
example, if a region of size ``X`` bytes of ``Y`` ``nr_accesses`` is found, it
means ``X * Y`` access events are observed by DAMON. Theoretical maximum
access events for the region is calculated in the same way, but replacing ``Y``
with the theoretical maximum ``nr_accesses``, which can be calculated as
``aggregation interval / sampling interval``.

The mechanism calculates the ratio of access events for ``aggrs`` aggregations,
and increases or decreases the ``sampling interval`` and ``aggregation
interval`` in the same ratio, if the observed access ratio is lower or higher
than the target, respectively. The ratio of the intervals change is decided in
proportion to the distance between the current samples ratio and the target
ratio.

The user can further set the minimum and maximum ``sampling interval`` that can
be set by the tuning mechanism using two parameters (``min_sample_us`` and
``max_sample_us``). Because the tuning mechanism always changes ``sampling
interval`` and ``aggregation interval`` in the same ratio, the minimum and
maximum ``aggregation interval`` after each of the tuning changes are
automatically bounded together.

The tuning is turned off by default, and needs to be enabled explicitly by the
user. As a rule of thumb based on the Pareto principle, a 4% access samples
ratio target is recommended. Note that the Pareto principle (80/20 rule) has
been applied twice.
That is, it assumes that a 4% (20% of 20%) DAMON-observed access events ratio
(source) captures 64% (80% multiplied by 80%) of real access events
(outcomes).

To know how user-space can use this feature via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`intervals_goal
<damon_usage_sysfs_monitoring_intervals_goal>` part of the documentation.


.. _damon_design_damos:

Operation Schemes
-----------------

One common purpose of data access monitoring is access-aware system efficiency
optimizations. For example,

    paging out memory regions that are not accessed for more than two minutes

or

    using THP for memory regions that are larger than 2 MiB and showing a high
    access frequency for more than one minute.

One straightforward approach for such schemes would be profile-guided
optimizations. That is, getting data access monitoring results of the
workloads or the system using DAMON, finding memory regions of special
characteristics by profiling the monitoring results, and making system
operation changes for the regions. The changes could be made by modifying or
providing advice to the software (the application and/or the kernel), or
reconfiguring the hardware. Both offline and online approaches could be
available.

Among those, providing advice to the kernel at runtime would be flexible and
effective, and therefore widely used. However, implementing such schemes
could impose unnecessary redundancy and inefficiency. The profiling could be
redundant if the type of interest is common. Exchanging the information
including monitoring results and operation advice between kernel and user
spaces could be inefficient.

To allow users to reduce such redundancy and inefficiencies by offloading the
work, DAMON provides a feature called Data Access Monitoring-based Operation
Schemes (DAMOS).
It lets users specify their desired schemes at a high level. For such
specifications, DAMON starts monitoring, finds regions having the access
pattern of interest, and applies the user-desired operation actions to the
regions, for every user-specified time interval called ``apply_interval``.

To know how user-space can set ``apply_interval`` via :ref:`DAMON sysfs
interface <sysfs_interface>`, refer to :ref:`apply_interval_us <sysfs_scheme>`
part of the documentation.


.. _damon_design_damos_action:

Operation Action
~~~~~~~~~~~~~~~~

The management action that the users desire to apply to the regions of their
interest. For example, paging out, prioritizing for next reclamation victim
selection, advising ``khugepaged`` to collapse or split, or doing nothing but
collecting statistics of the regions.

The list of supported actions is defined in DAMOS, but the implementation of
each action is in the DAMON operations set layer because the implementation
normally depends on the monitoring target address space. For example, the code
for paging specific virtual address ranges out would be different from that for
physical address ranges. Also, the monitoring operations implementation sets
are not mandated to support all actions of the list. Hence, the availability
of a specific DAMOS action depends on what operations set is selected to be
used together.

The list of the supported actions, their meaning, and the DAMON operations sets
that support each action are as below.

- ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
  Supported by ``vaddr`` and ``fvaddr`` operations set.
- ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``.
  Supported by ``vaddr`` and ``fvaddr`` operations set.
- ``pageout``: Reclaim the region.
  Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
- ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``.
  Supported by ``vaddr`` and ``fvaddr`` operations set. When
  TRANSPARENT_HUGEPAGE is disabled, the application of the action will just
  fail.
- ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``.
  Supported by ``vaddr`` and ``fvaddr`` operations set. When
  TRANSPARENT_HUGEPAGE is disabled, the application of the action will just
  fail.
- ``lru_prio``: Prioritize the region on its LRU lists.
  Supported by ``paddr`` operations set.
- ``lru_deprio``: Deprioritize the region on its LRU lists.
  Supported by ``paddr`` operations set.
- ``migrate_hot``: Migrate the regions prioritizing warmer regions.
  Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
- ``migrate_cold``: Migrate the regions prioritizing colder regions.
  Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
- ``stat``: Do nothing but count the statistics.
  Supported by all operations sets.

Applying any action except ``stat`` to a region is considered as changing the
region's characteristics. Hence, DAMOS resets the age of regions when any such
action is applied to them.

To know how user-space can set the action via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`action <sysfs_scheme>` part of the
documentation.


.. _damon_design_damos_access_pattern:

Target Access Pattern
~~~~~~~~~~~~~~~~~~~~~

The access pattern of the schemes' interest. The patterns are constructed with
the properties that DAMON's monitoring results provide, specifically the size,
the access frequency, and the age. Users can describe their access pattern of
interest by setting minimum and maximum values of the three properties. If a
region's three properties are in the ranges, DAMOS classifies it as one of the
regions that the scheme has an interest in.
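
The range check described above can be sketched as below. This is only an
illustration: the ``Region`` and ``AccessPattern`` classes, their field names,
and the choice of inclusive bounds are assumptions made for this example, not
DAMON's actual data structures.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical snapshot of one monitored region."""
    size: int          # bytes
    nr_accesses: int   # access frequency counter
    age: int           # number of aggregation intervals the pattern persisted


@dataclass
class AccessPattern:
    """Hypothetical min/max ranges of the three properties."""
    min_size: int
    max_size: int
    min_nr_accesses: int
    max_nr_accesses: int
    min_age: int
    max_age: int

    def matches(self, region):
        # Assumption: both ends of every range are inclusive.
        return (self.min_size <= region.size <= self.max_size
                and self.min_nr_accesses <= region.nr_accesses
                <= self.max_nr_accesses
                and self.min_age <= region.age <= self.max_age)


# Pattern for "at least 4 KiB, never accessed, unchanged for 10+ aggregations".
cold = AccessPattern(4096, 2**62, 0, 0, 10, 2**31)
idle_region = Region(size=2 * 1024 * 1024, nr_accesses=0, age=25)
busy_region = Region(size=2 * 1024 * 1024, nr_accesses=7, age=25)
print(cold.matches(idle_region), cold.matches(busy_region))
```

A region is of interest only when all three properties fall inside their
ranges; failing any single range excludes it.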

To know how user-space can set the access pattern via :ref:`DAMON sysfs
interface <sysfs_interface>`, refer to :ref:`access_pattern
<sysfs_access_pattern>` part of the documentation.


.. _damon_design_damos_quotas:

Quotas
~~~~~~

DAMOS upper-bound overhead control feature. DAMOS could incur high overhead if
the target access pattern is not properly tuned. For example, if a huge memory
region having the access pattern of interest is found, applying the scheme's
action to all pages of the huge region could consume unacceptably large system
resources. Preventing such issues by tuning the access pattern could be
challenging, especially if the access patterns of the workloads are highly
dynamic.

To mitigate that situation, DAMOS provides an upper-bound overhead control
feature called quotas. It lets users specify an upper limit of time that DAMOS
can use for applying the action, and/or a maximum number of bytes of memory
regions that the action can be applied to, within a user-specified time
duration.

To know how user-space can set the basic quotas via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`quotas <sysfs_quotas>` part of the
documentation.


.. _damon_design_damos_quotas_prioritization:

Prioritization
^^^^^^^^^^^^^^

A mechanism for making a good decision under the quotas. When the action
cannot be applied to all regions of interest due to the quotas, DAMOS
prioritizes regions and applies the action to only regions having high enough
priorities so that it will not exceed the quotas.

The prioritization mechanism should be different for each action. For example,
rarely accessed (colder) memory regions would be prioritized for page-out
scheme action. In contrast, the colder regions would be deprioritized for huge
page collapse scheme action.
Hence, the prioritization mechanisms for each action are implemented in each
DAMON operations set, together with the actions.

Though the implementation is up to the DAMON operations set, it would be common
to calculate the priority using the access pattern properties of the regions.
Some users would want the mechanisms to be personalized for their specific
case. For example, some users would want the mechanism to weigh the recency
(``age``) more than the access frequency (``nr_accesses``). DAMOS allows users
to specify the weight of each access pattern property and passes the
information to the underlying mechanism. Nevertheless, how and even whether
the weight will be respected are up to the underlying prioritization mechanism
implementation.

To know how user-space can set the prioritization weights via :ref:`DAMON sysfs
interface <sysfs_interface>`, refer to :ref:`weights <sysfs_quotas>` part of
the documentation.


.. _damon_design_damos_quotas_auto_tuning:

Aim-oriented Feedback-driven Auto-tuning
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Automatic feedback-driven quota tuning. Instead of setting the absolute quota
value, users can specify the metric of their interest, and what target value
they want the metric value to be. DAMOS then automatically tunes the
aggressiveness (the quota) of the corresponding scheme. For example, if DAMOS
is under-achieving the goal, DAMOS automatically increases the quota. If DAMOS
is over-achieving the goal, it decreases the quota.

There are two such tuning algorithms that users can select as they need.

- ``consist``: A proportional feedback loop based algorithm. Tries to find an
  optimum quota that should be consistently kept, to keep achieving the goal.
  Useful for kernel-only operation on dynamic and long-running environments.
  This is the default selection. If unsure, use this.
- ``temporal``: More straightforward algorithm. Tries to achieve the goal as
  fast as possible, using the maximum allowed quota, but only for a temporary
  short time. While the goal is under-achieved, this algorithm keeps tuning the
  quota to the maximum allowed one. Once the goal is [over]-achieved, this
  sets the quota to zero. Useful for environments that require deterministic
  control.

The goal can be specified with five parameters, namely ``target_metric``,
``target_value``, ``current_value``, ``nid`` and ``path``. The auto-tuning
mechanism tries to make ``current_value`` of ``target_metric`` be the same as
``target_value``. The supported ``target_metric`` values are as below.

- ``user_input``: User-provided value. Users could use any metric that they
  have interest in for the value. User-space main workload's latency or
  throughput, and system metrics like free memory ratio or memory pressure
  stall time (PSI) could be examples. Note that users should explicitly set
  ``current_value`` on their own in this case. In other words, users should
  repeatedly provide the feedback.
- ``some_mem_psi_us``: System-wide ``some`` memory pressure stall information
  in microseconds that is measured from the last quota reset to the next quota
  reset. DAMOS does the measurement on its own, so only ``target_value`` needs
  to be set by users at the initial time. In other words, DAMOS does
  self-feedback.
- ``node_mem_used_bp``: Specific NUMA node's used memory ratio in bp (1/10,000).
- ``node_mem_free_bp``: Specific NUMA node's free memory ratio in bp (1/10,000).
- ``node_memcg_used_bp``: Specific cgroup's node used memory ratio for a
  specific NUMA node, in bp (1/10,000).
- ``node_memcg_free_bp``: Specific cgroup's node unused memory ratio for a
  specific NUMA node, in bp (1/10,000).
- ``active_mem_bp``: Active to active + inactive (LRU) memory size ratio in bp
  (1/10,000).
- ``inactive_mem_bp``: Inactive to active + inactive (LRU) memory size ratio in
  bp (1/10,000).
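
The behaviors of the ``consist`` and ``temporal`` algorithms described earlier
in this section can be pictured with the rough sketch below. The function
names, the exact proportional formula, and the convention that a
``current_value`` below ``target_value`` means under-achieving are all
assumptions made for illustration; the in-kernel implementation may differ.

```python
# Hypothetical sketch; not the in-kernel tuning code.

def tune_consist(quota, current_value, target_value, min_quota=1):
    """Proportional feedback: grow the quota while the goal is
    under-achieved, shrink it while the goal is over-achieved."""
    # Assumption: the adjustment is proportional to the relative distance
    # between the current value and the target value.
    error_ratio = (target_value - current_value) / target_value
    return max(min_quota, int(quota * (1 + error_ratio)))


def tune_temporal(current_value, target_value, max_quota):
    """Use the maximum allowed quota until the goal is achieved,
    then drop the quota to zero."""
    return max_quota if current_value < target_value else 0


# Under-achieving (50 of 100): consist raises the quota by 50%.
print(tune_consist(1000, 50, 100))
# Goal achieved: temporal stops applying the action.
print(tune_temporal(100, 100, max_quota=4096))
```

The sketch also hints at why ``consist`` suits long-running environments: it
converges toward a quota that can be kept, while ``temporal`` only switches
between the two extremes.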

``nid`` is required only for ``node_mem_used_bp``, ``node_mem_free_bp``,
``node_memcg_used_bp`` and ``node_memcg_free_bp``, to specify the NUMA node.

``path`` is required only for ``node_memcg_used_bp`` and
``node_memcg_free_bp``, to specify the path to the cgroup. The value should be
the path of the memory cgroup from the cgroups mount point.

To know how user-space can set the tuning goal metric, the target value, and/or
the current value via :ref:`DAMON sysfs interface <sysfs_interface>`, refer to
:ref:`quota goals <sysfs_schemes_quota_goals>` part of the documentation.


.. _damon_design_damos_watermarks:

Watermarks
~~~~~~~~~~

Conditional DAMOS (de)activation automation. Users might want DAMOS to run
only under certain situations. For example, when a sufficient amount of free
memory is guaranteed, running a scheme for proactive reclamation would only
consume unnecessary system resources. To avoid such consumption, the user would
need to manually monitor some metrics such as free memory ratio, and turn
DAMON/DAMOS on or off.

DAMOS allows users to offload such work using three watermarks. It allows the
users to configure the metric of their interest, and three watermark values,
namely high, middle, and low. If the value of the metric becomes above the
high watermark or below the low watermark, the scheme is deactivated. If the
metric becomes below the mid watermark but above the low watermark, the scheme
is activated. If all schemes are deactivated by the watermarks, the monitoring
is also deactivated. In this case, the DAMON worker thread only periodically
checks the watermarks and therefore incurs nearly zero overhead.

To know how user-space can set the watermarks via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`watermarks <sysfs_watermarks>` part of the
documentation.


.. _damon_design_damos_filters:

Filters
~~~~~~~

Non-access pattern-based target memory regions filtering. If users run
self-written programs or have good profiling tools, they could know something
more than the kernel, such as future access patterns or some special
requirements for specific types of memory. For example, some users may know
only anonymous pages can impact their program's performance. They can also
have a list of latency-critical processes.

To let users optimize DAMOS schemes with such special knowledge, DAMOS provides
a feature called DAMOS filters. The feature allows users to set an arbitrary
number of filters for each scheme. Each filter specifies

- a type of memory (``type``),
- whether it is for the memory of the type or all except the type
  (``matching``), and
- whether it is to allow (include) or reject (exclude) applying
  the scheme's action to the memory (``allow``).

For efficient handling of filters, some types of filters are handled by the
core layer, while others are handled by the operations set. In the latter
case, hence, support of the filter types depends on the DAMON operations set.
For core layer-handled filters, memory regions excluded by the filter are not
counted as regions that the scheme has tried. In contrast, if a memory region
is filtered by an operations set layer-handled filter, it is counted as one
that the scheme has tried. This difference affects the statistics.

When multiple filters are installed, the group of filters that are handled by
the core layer is evaluated first. After that, the group of filters that are
handled by the operations layer is evaluated. Filters in each of the groups
are evaluated in the installed order. If a part of memory matches one of the
filters, the following filters are ignored. If the part passes through the
filters evaluation stage because it does not match any of the filters,
applying the scheme's action to it depends on the last filter's allowance
type. If the last filter was for allowing, the part of memory will be
rejected, and vice versa.

For example, let's assume 1) a filter for allowing anonymous pages and 2)
another filter for rejecting young pages are installed in that order. If a
page of a region that is eligible for the scheme's action is an anonymous page,
the scheme's action will be applied to the page regardless of whether it is
young or not, since it matches the first allow-filter. If the page is
not anonymous but young, the scheme's action will not be applied, since the
second reject-filter blocks it. If the page is neither anonymous nor young,
the page will pass through the filters evaluation stage since there is no
matching filter, and the action will be applied to the page.

The following ``type`` of filters are currently supported.

- Core layer handled
    - addr
        - Applied to pages belonging to a given address range.
    - target
        - Applied to pages belonging to a given DAMON monitoring target.
- Operations layer handled, supported by only ``paddr`` operations set.
    - anon
        - Applied to pages containing data that is not stored in files.
    - active
        - Applied to active pages.
    - memcg
        - Applied to pages belonging to a given cgroup.
    - young
        - Applied to pages that are accessed after the last access check from
          the scheme.
    - hugepage_size
        - Applied to pages that are managed in a given size range.
    - unmapped
        - Applied to pages that are unmapped.

To know how user-space can set the filters via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`filters <sysfs_filters>` part of the
documentation.

..
If the part passes through the filters evaluation stage because it is not
matched to any of the filters, applying the scheme's action to it depends on
the last filter's allowance type. If the last filter was for allowing, the
part of memory will be rejected, and vice versa.

For example, let's assume 1) a filter for allowing anonymous pages and 2)
another filter for rejecting young pages are installed in that order. If a
page of a region that is eligible to apply the scheme's action is an anonymous
page, the scheme's action will be applied to the page regardless of whether it
is young or not, since it matches the first allow-filter. If the page is not
anonymous but young, the scheme's action will not be applied, since the second
reject-filter blocks it. If the page is neither anonymous nor young, the page
will pass through the filters evaluation stage since there is no matching
filter, and the action will be applied to the page.

Below ``type`` of filters are currently supported.

- Core layer handled
    - addr
        - Applied to pages that belong to a given address range.
    - target
        - Applied to pages that belong to a given DAMON monitoring target.
- Operations layer handled, supported by only ``paddr`` operations set.
    - anon
        - Applied to pages that contain data that is not stored in files.
    - active
        - Applied to active pages.
    - memcg
        - Applied to pages that belong to a given cgroup.
    - young
        - Applied to pages that are accessed after the last access check from
          the scheme.
    - hugepage_size
        - Applied to pages that are managed in a given size range.
    - unmapped
        - Applied to pages that are unmapped.

To know how user-space can set the filters via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`filters <sysfs_filters>` part of the
documentation.

.. _damon_design_damos_stat:

Statistics
~~~~~~~~~~

The statistics of DAMOS behaviors that are designed to help monitoring, tuning
and debugging of DAMOS.

DAMOS accounts below statistics for each scheme, from the beginning of the
scheme's execution.

- ``nr_tried``: Total number of regions that the scheme is tried to be applied
  to.
- ``sz_tried``: Total size of regions that the scheme is tried to be applied
  to.
- ``sz_ops_filter_passed``: Total bytes that passed operations set
  layer-handled DAMOS filters.
- ``nr_applied``: Total number of regions that the scheme is applied to.
- ``sz_applied``: Total size of regions that the scheme is applied to.
- ``qt_exceeds``: Total number of times the quota of the scheme has been
  exceeded.
- ``nr_snapshots``: Total number of DAMON snapshots that the scheme is tried to
  be applied to.
- ``max_nr_snapshots``: Upper limit of ``nr_snapshots``.

"A scheme is tried to be applied to a region" means DAMOS core logic determined
the region is eligible to apply the scheme's :ref:`action
<damon_design_damos_action>`. The :ref:`access pattern
<damon_design_damos_access_pattern>`, :ref:`quotas
<damon_design_damos_quotas>`, :ref:`watermarks
<damon_design_damos_watermarks>`, and :ref:`filters
<damon_design_damos_filters>` that are handled by the core logic could affect
this. The core logic will only ask the underlying :ref:`operations set
<damon_operations_set>` to apply the action to the region, so whether the
action is really applied or not is unclear. That's why it is called "tried".

"A scheme is applied to a region" means the :ref:`operations set
<damon_operations_set>` has applied the action to at least a part of the
region. The :ref:`filters <damon_design_damos_filters>` that are handled by
the operations set, and the types of the :ref:`action
<damon_design_damos_action>` and the pages of the region can affect this.
For example, if a filter is set to exclude anonymous pages and the region has
only anonymous pages, or if the action is ``pageout`` while all pages of the
region are unreclaimable, applying the action to the region will fail.

Unlike normal stats, ``max_nr_snapshots`` is set by users. If it is set to a
non-zero value and ``nr_snapshots`` becomes the same as or greater than
``max_nr_snapshots``, the scheme is deactivated.

To know how user-space can read the stats via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`stats <sysfs_stats>` part of the
documentation.

Regions Walking
~~~~~~~~~~~~~~~

DAMOS feature allowing users to access each region that a DAMOS action has just
been applied to. Using this feature, DAMON :ref:`API <damon_design_api>`
allows users to access full properties of the regions including the access
monitoring results and the amount of the region's internal memory that passed
the DAMOS filters. :ref:`DAMON sysfs interface <sysfs_interface>` also allows
users to read the data via special :ref:`files <sysfs_schemes_tried_regions>`.

.. _damon_design_api:

Application Programming Interface
---------------------------------

The programming interface for kernel space data access-aware applications.
DAMON is a framework, so it does nothing by itself. Instead, it only helps
other kernel components such as subsystems and modules build their data
access-aware applications using DAMON's core features. For this, DAMON exposes
all its features to other kernel components via its application programming
interface, namely ``include/linux/damon.h``. Please refer to the API
:doc:`document </mm/damon/api>` for details of the interface.


.. _damon_modules:

Modules
=======

Because the core of DAMON is a framework for kernel components, it doesn't
provide any direct interface for the user space.
Such interfaces should be implemented by each DAMON API user kernel component,
instead. DAMON subsystem itself implements such DAMON API user modules, which
are supposed to be used for general purpose DAMON control and special purpose
data access-aware system operations, and provides stable application binary
interfaces (ABI) for the user space. The user space can build their efficient
data access-aware applications using the interfaces.


General Purpose User Interface Modules
--------------------------------------

DAMON modules that provide user space ABIs for general purpose DAMON usage in
runtime.

Like many other ABIs, the modules create files on pseudo file systems like
'sysfs', and allow users to specify their requests to and get the answers from
DAMON by writing to and reading from the files. As a response to such I/O,
DAMON user interface modules control DAMON and retrieve the results as the user
requested via the DAMON API, and return the results to the user-space.

The ABIs are designed to be used for user space applications development,
rather than human beings' fingers. Human users are recommended to use user
space tools built on the ABIs, instead. One such Python-written user space
tool is available at GitHub (https://github.com/damonitor/damo), PyPI
(https://pypistats.org/packages/damo), and multiple distros
(https://repology.org/project/damo/versions).

Currently, one module for this type, namely 'DAMON sysfs interface' is
available. Please refer to the ABI :ref:`doc <sysfs_interface>` for details of
the interfaces.


.. _damon_modules_special_purpose:

Special-Purpose Access-aware Kernel Modules
-------------------------------------------

DAMON modules that provide user space ABI for specific purpose DAMON usage.

DAMON user interface modules are for full control of all DAMON features in
runtime.
For each special-purpose system-wide data access-aware system operation such as
proactive reclamation or LRU lists balancing, the interfaces could be
simplified by removing unnecessary knobs for the specific purpose, and extended
for boot-time and even compile time control. Default values of DAMON control
parameters for the usage would also need to be optimized for the purpose.

To support such cases, yet more DAMON API user kernel modules that provide
simpler and more optimized user space interfaces are available. Currently,
modules for purposes including proactive reclamation and LRU lists manipulation
are provided. For more detail, please read the usage documents for those
(:doc:`/admin-guide/mm/damon/stat`, :doc:`/admin-guide/mm/damon/reclaim` and
:doc:`/admin-guide/mm/damon/lru_sort`).

.. _damon_design_special_purpose_modules_exclusivity:

Note that these modules currently run in an exclusive manner. If one of those
is already running, others will return ``-EBUSY`` upon start requests.

Sample DAMON Modules
--------------------

DAMON modules that provide example DAMON kernel API usages.

Kernel programmers can build their own special or general purpose DAMON modules
using DAMON kernel API. To help them easily understand how DAMON kernel API
can be used, a few sample modules are provided under ``samples/damon/`` of the
Linux source tree. Please note that these modules are not developed for being
used on real products, but only for showing how DAMON kernel API can be used in
simple ways.