.. SPDX-License-Identifier: GPL-2.0

======
Design
======


.. _damon_design_execution_model_and_data_structures:

Execution Model and Data Structures
===================================

The monitoring-related information, including the monitoring request
specification and DAMON-based operation schemes, is stored in a data structure
called DAMON ``context``. DAMON executes each context with a kernel thread
called ``kdamond``. Multiple kdamonds could run in parallel, for different
types of monitoring.

To know how user-space can do the configurations and start/stop DAMON, refer to
:ref:`DAMON sysfs interface <sysfs_interface>` documentation.


Overall Architecture
====================

The DAMON subsystem consists of three layers:

- :ref:`Operations Set <damon_operations_set>`: Implements fundamental
  operations for DAMON that depend on the given monitoring target
  address-space and available set of software/hardware primitives,
- :ref:`Core <damon_core_logic>`: Implements core logics including monitoring
  overhead/accuracy control and access-aware system operations on top of the
  operations set layer, and
- :ref:`Modules <damon_modules>`: Implements kernel modules for various
  purposes that provide interfaces for the user space, on top of the core
  layer.


.. _damon_operations_set:

Operations Set Layer
====================

.. _damon_design_configurable_operations_set:

For data access monitoring and additional low level work, DAMON needs a set of
implementations for specific operations that are dependent on and optimized for
the given target address space. For example, the following two operations for
access monitoring are address-space dependent.

1. Identification of the monitoring target address range for the address space.
2. Access check of specific address range in the target space.

DAMON consolidates these implementations in a layer called DAMON Operations
Set, and defines the interface between it and the upper layer. The upper layer
is dedicated to DAMON's core logics, including the mechanism for control of the
monitoring accuracy and the overhead.

Hence, DAMON can easily be extended for any address space and/or available
hardware features by configuring the core logic to use the appropriate
operations set. If there is no available operations set for a given purpose, a
new operations set can be implemented following the interface between the
layers.

For example, physical memory, virtual memory, swap space, those for specific
processes, NUMA nodes, files, and backing memory devices would be supportable.
Also, if some architectures or devices support special optimized access check
features, those will be easily configurable.

DAMON currently provides the following three operations sets. The two
subsections below describe how those work.

 - vaddr: Monitor virtual address spaces of specific processes
 - fvaddr: Monitor fixed virtual address ranges
 - paddr: Monitor the physical address space of the system

To know how user-space can do the configuration via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`operations <sysfs_context>` file part of the
documentation.


.. _damon_design_vaddr_target_regions_construction:

VMA-based Target Address Range Construction
-------------------------------------------

A mechanism of the ``vaddr`` DAMON operations set that automatically
initializes and updates the monitoring target address regions so that the
entire memory mappings of the target processes can be covered.

This mechanism is only for the ``vaddr`` operations set. In cases of
``fvaddr`` and ``paddr`` operations sets, users are asked to manually set the
monitoring target address ranges.

Only small parts in the super-huge virtual address space of the processes are
mapped to the physical memory and accessed. Thus, tracking the unmapped
address regions is just wasteful. However, because DAMON can deal with some
level of noise using the adaptive regions adjustment mechanism, tracking every
mapping is not strictly required and could even incur a high overhead in some
cases. That said, excessively large unmapped areas inside the monitoring
target should be removed so that the adaptive mechanism does not spend time on
them.

For this reason, this implementation converts the complex mappings to three
distinct regions that cover every mapped area of the address space. The two
gaps between the three regions are the two biggest unmapped areas in the given
address space. In most cases, the two biggest unmapped areas would be the gap
between the heap and the uppermost mmap()-ed region, and the gap between the
lowermost mmap()-ed region and the stack. Because these gaps are
exceptionally huge in usual address spaces, excluding these will be sufficient
to make a reasonable trade-off. Below shows this in detail::

    <heap>
    <BIG UNMAPPED REGION 1>
    <uppermost mmap()-ed region>
    (small mmap()-ed regions and munmap()-ed regions)
    <lowermost mmap()-ed region>
    <BIG UNMAPPED REGION 2>
    <stack>


PTE Accessed-bit Based Access Check
-----------------------------------

Both of the implementations for physical and virtual address spaces use PTE
Accessed-bit for basic access checks. The only difference is the way of
finding the relevant PTE Accessed bit(s) from the address. While the
implementation for the virtual address walks the page table for the target task
of the address, the implementation for the physical address walks every page
table having a mapping to the address.
In this way, the implementations find
and clear the bit(s) for the next sampling target address and check whether the
bit(s) are set again after one sampling period. This could disturb other
kernel subsystems using the Accessed bits, namely Idle page tracking and the
reclaim logic. DAMON does nothing to avoid disturbing Idle page tracking, so
handling the interference is the responsibility of sysadmins. However, it
solves the conflict with the reclaim logic using ``PG_idle`` and ``PG_young``
page flags, as Idle page tracking does.


.. _damon_core_logic:

Core Logics
===========

.. _damon_design_monitoring:

Monitoring
----------

The four sections below describe each of the DAMON core mechanisms and the five
monitoring attributes, ``sampling interval``, ``aggregation interval``,
``update interval``, ``minimum number of regions``, and ``maximum number of
regions``.

To know how user-space can set the attributes via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`monitoring_attrs <sysfs_monitoring_attrs>`
part of the documentation.


Access Frequency Monitoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output of DAMON says what pages are how frequently accessed for a given
duration. The resolution of the access frequency is controlled by setting
``sampling interval`` and ``aggregation interval``. In detail, DAMON checks
access to each page per ``sampling interval`` and aggregates the results. In
other words, it counts the number of the accesses to each page. After each
``aggregation interval`` passes, DAMON calls callback functions that were
previously registered by users so that users can read the aggregated results,
and then clears the results.
This can be described in the simple pseudo-code below::

    while monitoring_on:
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        sleep(sampling_interval)

The monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.


.. _damon_design_region_based_sampling:

Region Based Sampling
~~~~~~~~~~~~~~~~~~~~~

To avoid the unbounded increase of the overhead, DAMON groups adjacent pages
that are assumed to have the same access frequencies into a region. As long as
the assumption (pages in a region have the same access frequencies) is kept,
only one page in the region is required to be checked. Thus, for each
``sampling interval``, DAMON randomly picks one page in each region, waits for
one ``sampling interval``, checks whether the page is accessed meanwhile, and
increases the access frequency counter of the region if so. The counter is
called ``nr_accesses`` of the region. Therefore, the monitoring overhead is
controllable by setting the number of regions. DAMON allows users to set the
minimum and the maximum number of regions for the trade-off.

This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed.


.. _damon_design_adaptive_regions_adjustment:

Adaptive Regions Adjustment
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even if the initial monitoring target regions are somehow well constructed to
fulfill the assumption (pages in the same region have similar access
frequencies), the data access pattern can be dynamically changed. This will
result in low monitoring quality.
To keep the assumption as much as possible, DAMON
adaptively merges and splits each region based on their access frequency.

For each ``aggregation interval``, it compares the access frequencies
(``nr_accesses``) of adjacent regions. If the difference is small, and if the
sum of the two regions' sizes is smaller than the size of total regions divided
by the ``minimum number of regions``, DAMON merges the two regions. If the
resulting number of total regions is still higher than ``maximum number of
regions``, it repeats the merging with an increasing access frequencies
difference threshold until the upper-limit of the number of regions is met, or
the threshold becomes higher than the possible maximum value (``aggregation
interval`` divided by ``sampling interval``). Then, after it reports and
clears the aggregated access frequency of each region, it splits each region
into two or three regions if the total number of regions will not exceed the
user-specified maximum number of regions after the split.

In this way, DAMON provides its best-effort quality and minimal overhead while
keeping the bounds users set for their trade-off.


.. _damon_design_age_tracking:

Age Tracking
~~~~~~~~~~~~

By analyzing the monitoring results, users can also find how long the current
access pattern of a region has been maintained. That could be used for good
understanding of the access pattern. For example, a page placement algorithm
utilizing both the frequency and the recency could be implemented using that.
To make such access pattern maintained period analysis easier, DAMON maintains
yet another counter called ``age`` in each region. For each ``aggregation
interval``, DAMON checks if the region's size and access frequency
(``nr_accesses``) have significantly changed. If so, the counter is reset to
zero. Otherwise, the counter is increased.
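
The merge condition and the ``age`` counter described above can be sketched as
below. This is only an illustration under stated assumptions, not the kernel
implementation: the ``Region`` class, the function names, the ``threshold``
parameter, the size-weighted averaging of ``nr_accesses`` and ``age`` on merge,
and treating any size change as significant are all hypothetical choices made
for the example.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class Region:
        start: int
        end: int
        nr_accesses: int
        age: int = 0

        @property
        def size(self):
            return self.end - self.start


    def merge_regions(regions, threshold, min_nr_regions):
        """One merge pass over address-sorted regions."""
        total = sum(r.size for r in regions)
        # Merged regions must stay smaller than total / min_nr_regions.
        size_limit = total // min_nr_regions
        merged = [regions[0]]
        for r in regions[1:]:
            prev = merged[-1]
            adjacent = prev.end == r.start
            similar = abs(prev.nr_accesses - r.nr_accesses) <= threshold
            if adjacent and similar and prev.size + r.size < size_limit:
                sz = prev.size + r.size
                # Size-weighted averages are an assumption of this sketch.
                nr = (prev.nr_accesses * prev.size + r.nr_accesses * r.size) // sz
                age = (prev.age * prev.size + r.age * r.size) // sz
                merged[-1] = Region(prev.start, r.end, nr, age)
            else:
                merged.append(r)
        return merged


    def update_age(region, prev_size, prev_nr_accesses, threshold):
        """Reset age on a significant change, otherwise increase it."""
        changed = (region.size != prev_size or
                   abs(region.nr_accesses - prev_nr_accesses) > threshold)
        region.age = 0 if changed else region.age + 1

If the number of regions still exceeds the maximum after one pass, the pass
would be repeated with a larger ``threshold``, as the text above describes.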


Dynamic Target Space Updates Handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The monitoring target address range could be dynamically changed. For example,
virtual memory could be dynamically mapped and unmapped. Physical memory could
be hot-plugged.

As the changes could be quite frequent in some cases, DAMON allows the
monitoring operations to check dynamic changes, including memory mapping
changes, and apply them to monitoring operations-related data structures such
as the abstracted monitoring target memory area only once per user-specified
time interval (``update interval``).

User-space can get the monitoring results via DAMON sysfs interface and/or
tracepoints. For more details, please refer to the documentation for
:ref:`DAMOS tried regions <sysfs_schemes_tried_regions>` and :ref:`tracepoint`,
respectively.


.. _damon_design_monitoring_params_tuning_guide:

Monitoring Parameters Tuning Guide
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In short, set ``aggregation interval`` to capture a meaningful amount of
accesses for the purpose. The amount of accesses can be measured using
``nr_accesses`` and ``age`` of regions in the aggregated monitoring results
snapshot. The default value of the interval, ``100ms``, turns out to be too
short in many cases. Set ``sampling interval`` proportional to ``aggregation
interval``. By default, ``1/20`` is recommended as the ratio.

``Aggregation interval`` should be set as the time interval within which the
workload can make the amount of accesses needed for the monitoring purpose.
If the interval is too short, only a small number of accesses are captured. As
a result, the monitoring results look as if everything is equally rarely
accessed. For many purposes, that would be useless.
If it is too long, however, the time
to converge regions with the :ref:`regions adjustment mechanism
<damon_design_adaptive_regions_adjustment>` can be too long, depending on the
time scale of the given purpose. This could happen if the workload is actually
making only rare accesses but the user thinks the amount of accesses needed for
the monitoring purpose is too high. For such cases, the target amount of
access to capture per ``aggregation interval`` should be carefully
reconsidered. Also, note that the captured amount of accesses is represented
with not only ``nr_accesses``, but also ``age``. For example, even if every
region in the monitoring results shows zero ``nr_accesses``, regions could
still be distinguished using ``age`` values as the recency information.

Hence the optimum value of ``aggregation interval`` depends on the access
intensiveness of the workload. The user should tune the interval based on the
amount of access that is captured on each aggregated snapshot of the monitoring
results.

Note that the default value of the interval is 100 milliseconds, which is too
short in many cases, especially on large systems.

``Sampling interval`` defines the resolution of each aggregation. If it is set
too large, monitoring results will look like every region was equally rarely
accessed, or equally frequently accessed. That is, regions become
indistinguishable based on access pattern, and therefore the results will be
useless in many use cases. If ``sampling interval`` is too small, it will not
degrade the resolution, but will increase the monitoring overhead. If it is
appropriate enough to provide a resolution of the monitoring results that is
sufficient for the given purpose, it shouldn't be unnecessarily further
lowered. It is recommended to be set proportional to ``aggregation interval``.
By default, the ratio is set as ``1/20``, and it is still recommended.
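
As a small worked example of the ratio above, the sketch below derives
``sampling interval`` from ``aggregation interval`` and shows the resulting
maximum ``nr_accesses`` (``aggregation interval`` divided by ``sampling
interval``). The function names and the use of integer microseconds are
assumptions made for the illustration.

.. code-block:: python

    def sampling_interval_us(aggregation_interval_us, ratio_divisor=20):
        """Sampling interval kept at 1/ratio_divisor of the aggregation interval."""
        return aggregation_interval_us // ratio_divisor


    def max_nr_accesses(aggregation_interval_us, sampling_us):
        """Number of access checks that fit in one aggregation interval."""
        return aggregation_interval_us // sampling_us


    # The 100 ms default aggregation interval, and a longer 2 s interval.
    for aggr_us in (100_000, 2_000_000):
        sample_us = sampling_interval_us(aggr_us)
        print(aggr_us, sample_us, max_nr_accesses(aggr_us, sample_us))

With the ``1/20`` ratio, the maximum ``nr_accesses`` stays at 20 regardless of
the ``aggregation interval``, so lengthening the intervals changes how many
real accesses each sample can capture, not the resolution of the counter.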

Based on the manual tuning guide, DAMON provides a more intuitive knob-based
intervals auto-tuning mechanism. Please refer to :ref:`the design document of
the feature <damon_design_monitoring_intervals_autotuning>` for detail.

Refer to the documents below for an example tuning based on the above guide.

.. toctree::
   :maxdepth: 1

   monitoring_intervals_tuning_example


.. _damon_design_monitoring_intervals_autotuning:

Monitoring Intervals Auto-tuning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

DAMON provides automatic tuning of the ``sampling interval`` and ``aggregation
interval`` based on :ref:`the tuning guide idea
<damon_design_monitoring_params_tuning_guide>`. The tuning mechanism allows
users to set the aimed amount of access events to observe via DAMON within a
given time interval. The target can be specified by the user as a ratio of
DAMON-observed access events to the theoretical maximum amount of the events
(``access_bp``) that is measured within a given number of aggregations
(``aggrs``).

The DAMON-observed access events are calculated in byte granularity based on
DAMON :ref:`region assumption <damon_design_region_based_sampling>`. For
example, if a region of size ``X`` bytes of ``Y`` ``nr_accesses`` is found, it
means ``X * Y`` access events are observed by DAMON. Theoretical maximum
access events for the region is calculated in the same way, but replacing ``Y``
with the theoretical maximum ``nr_accesses``, which can be calculated as
``aggregation interval / sampling interval``.

The mechanism calculates the ratio of access events for ``aggrs`` aggregations,
and increases or decreases the ``sampling interval`` and ``aggregation
interval`` in the same ratio, if the observed access ratio is lower or higher
than the target, respectively. The ratio of the intervals change is decided in
proportion to the distance between the current samples ratio and the target
ratio.
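
The access events ratio described above can be sketched as follows. This is a
simplified illustration rather than the kernel code: it assumes a single
aggregation snapshot instead of ``aggrs`` aggregations, represents each region
as a hypothetical ``(size_bytes, nr_accesses)`` tuple, expresses the ratio in
basis points (1/10,000), and only reports the direction of the change, not its
magnitude.

.. code-block:: python

    def access_events_bp(regions, aggregation_interval_us, sampling_interval_us):
        """Observed access events over the theoretical maximum, in bp."""
        max_nr_accesses = aggregation_interval_us // sampling_interval_us
        # A region of X bytes with Y nr_accesses counts as X * Y events.
        observed = sum(size * nr for size, nr in regions)
        maximum = sum(size * max_nr_accesses for size, _ in regions)
        return observed * 10000 // maximum


    def tuning_direction(observed_bp, target_bp):
        """Intervals grow when too few events are observed, shrink otherwise."""
        if observed_bp < target_bp:
            return "increase"
        if observed_bp > target_bp:
            return "decrease"
        return "keep"


    snapshot = [(4096, 10), (8192, 0)]
    ratio = access_events_bp(snapshot, 100_000, 5_000)
    print(ratio, tuning_direction(ratio, target_bp=400))

In this example one third of the monitored bytes show half of the maximum
``nr_accesses``, so the observed ratio is far above a 4% (400 bp) target and
both intervals would be decreased.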

The user can further set the minimum and maximum ``sampling interval`` that can
be set by the tuning mechanism using two parameters (``min_sample_us`` and
``max_sample_us``). Because the tuning mechanism always changes ``sampling
interval`` and ``aggregation interval`` in the same ratio, the minimum and
maximum ``aggregation interval`` after each of the tuning changes are
automatically set together.

The tuning is turned off by default, and needs to be set explicitly by the
user. As a rule of thumb based on the Pareto principle, a 4% access samples
ratio target is recommended. Note that the Pareto principle (80/20 rule) is
applied twice. That is, it assumes a 4% (20% of 20%) DAMON-observed access
events ratio (source) captures 64% (80% multiplied by 80%) of real access
events (outcomes).

To know how user-space can use this feature via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`intervals_goal <sysfs_scheme>` part of
the documentation.


.. _damon_design_damos:

Operation Schemes
-----------------

One common purpose of data access monitoring is access-aware system efficiency
optimizations. For example,

    paging out memory regions that are not accessed for more than two minutes

or

    using THP for memory regions that are larger than 2 MiB and showing a high
    access frequency for more than one minute.

One straightforward approach for such schemes would be profile-guided
optimizations. That is, getting data access monitoring results of the
workloads or the system using DAMON, finding memory regions of special
characteristics by profiling the monitoring results, and making system
operation changes for the regions. The changes could be made by modifying or
providing advice to the software (the application and/or the kernel), or
reconfiguring the hardware. Both offline and online approaches could be
available.

Among those, providing advice to the kernel at runtime would be flexible and
effective, and therefore be widely used. However, implementing such schemes
could impose unnecessary redundancy and inefficiency. The profiling could be
redundant if the type of interest is common. Exchanging the information
including monitoring results and operation advice between kernel and user
spaces could be inefficient.

To allow users to reduce such redundancy and inefficiencies by offloading the
work, DAMON provides a feature called Data Access Monitoring-based Operation
Schemes (DAMOS). It lets users specify their desired schemes at a high
level. For such specifications, DAMON starts monitoring, finds regions having
the access pattern of interest, and applies the user-desired operation actions
to the regions, for every user-specified time interval called
``apply_interval``.

To know how user-space can set ``apply_interval`` via :ref:`DAMON sysfs
interface <sysfs_interface>`, refer to :ref:`apply_interval_us <sysfs_scheme>`
part of the documentation.


.. _damon_design_damos_action:

Operation Action
~~~~~~~~~~~~~~~~

The management action that the users desire to apply to the regions of their
interest. For example, paging out, prioritizing for next reclamation victim
selection, advising ``khugepaged`` to collapse or split, or doing nothing but
collecting statistics of the regions.

The list of supported actions is defined in DAMOS, but the implementation of
each action is in the DAMON operations set layer because the implementation
normally depends on the monitoring target address space. For example, the code
for paging specific virtual address ranges out would be different from that for
physical address ranges. And the monitoring operations implementation sets are
not mandated to support all actions of the list.
Hence, the availability of a
specific DAMOS action depends on which operations set is selected to be used
together.

The list of the supported actions, their meaning, and DAMON operations sets
that support each action are as below.

 - ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``pageout``: Reclaim the region.
   Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
 - ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``lru_prio``: Prioritize the region on its LRU lists.
   Supported by ``paddr`` operations set.
 - ``lru_deprio``: Deprioritize the region on its LRU lists.
   Supported by ``paddr`` operations set.
 - ``migrate_hot``: Migrate the regions prioritizing warmer regions.
   Supported by ``paddr`` operations set.
 - ``migrate_cold``: Migrate the regions prioritizing colder regions.
   Supported by ``paddr`` operations set.
 - ``stat``: Do nothing but count the statistics.
   Supported by all operations sets.

Applying the actions except ``stat`` to a region is considered as changing the
region's characteristics. Hence, DAMOS resets the age of regions when any such
actions are applied to those.

To know how user-space can set the action via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`action <sysfs_scheme>` part of the
documentation.


.. _damon_design_damos_access_pattern:

Target Access Pattern
~~~~~~~~~~~~~~~~~~~~~

The access pattern of the schemes' interest.
The patterns are constructed with
the properties that DAMON's monitoring results provide, specifically the size,
the access frequency, and the age. Users can describe their access pattern of
interest by setting minimum and maximum values of the three properties. If a
region's three properties are in the ranges, DAMOS classifies it as one of the
regions that the scheme is having an interest in.

To know how user-space can set the access pattern via :ref:`DAMON sysfs
interface <sysfs_interface>`, refer to :ref:`access_pattern
<sysfs_access_pattern>` part of the documentation.


.. _damon_design_damos_quotas:

Quotas
~~~~~~

DAMOS upper-bound overhead control feature. DAMOS could incur high overhead if
the target access pattern is not properly tuned. For example, if a huge memory
region having the access pattern of interest is found, applying the scheme's
action to all pages of the huge region could consume unacceptably large system
resources. Preventing such issues by tuning the access pattern could be
challenging, especially if the access patterns of the workloads are highly
dynamic.

To mitigate that situation, DAMOS provides an upper-bound overhead control
feature called quotas. It lets users specify an upper limit of time that DAMOS
can use for applying the action, and/or the maximum bytes of memory regions
that the action can be applied to within a user-specified time duration.

To know how user-space can set the basic quotas via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`quotas <sysfs_quotas>` part of the
documentation.


.. _damon_design_damos_quotas_prioritization:

Prioritization
^^^^^^^^^^^^^^

A mechanism for making a good decision under the quotas.
When the action
cannot be applied to all regions of interest due to the quotas, DAMOS
prioritizes regions and applies the action to only regions having high enough
priorities so that it will not exceed the quotas.

The prioritization mechanism should be different for each action. For example,
rarely accessed (colder) memory regions would be prioritized for page-out
scheme action. In contrast, the colder regions would be deprioritized for huge
page collapse scheme action. Hence, the prioritization mechanisms for each
action are implemented in each DAMON operations set, together with the actions.

Though the implementation is up to the DAMON operations set, it would be common
to calculate the priority using the access pattern properties of the regions.
Some users would want the mechanisms to be personalized for their specific
case. For example, some users would want the mechanism to weigh the recency
(``age``) more than the access frequency (``nr_accesses``). DAMOS allows users
to specify the weight of each access pattern property and passes the
information to the underlying mechanism. Nevertheless, how and even whether
the weight will be respected are up to the underlying prioritization mechanism
implementation.

To know how user-space can set the prioritization weights via :ref:`DAMON sysfs
interface <sysfs_interface>`, refer to :ref:`weights <sysfs_quotas>` part of
the documentation.


.. _damon_design_damos_quotas_auto_tuning:

Aim-oriented Feedback-driven Auto-tuning
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Automatic feedback-driven quota tuning. Instead of setting the absolute quota
value, users can specify the metric of their interest, and what target value
they want the metric value to be. DAMOS then automatically tunes the
aggressiveness (the quota) of the corresponding scheme.
For example, if DAMOS
is under-achieving the goal, DAMOS automatically increases the quota. If DAMOS
is over-achieving the goal, it decreases the quota.

The goal can be specified with four parameters, namely ``target_metric``,
``target_value``, ``current_value`` and ``nid``. The auto-tuning mechanism
tries to make ``current_value`` of ``target_metric`` be the same as
``target_value``. The supported values of ``target_metric`` are as below.

- ``user_input``: User-provided value. Users could use any metric that they
  have interest in for the value. User-space main workload's latency or
  throughput, system metrics like free memory ratio or memory pressure stall
  time (PSI) could be examples. Note that users should explicitly set
  ``current_value`` on their own in this case. In other words, users should
  repeatedly provide the feedback.
- ``some_mem_psi_us``: System-wide ``some`` memory pressure stall information
  in microseconds that is measured from last quota reset to next quota reset.
  DAMOS does the measurement on its own, so only ``target_value`` needs to be
  set by users at the initial time. In other words, DAMOS does self-feedback.
- ``node_mem_used_bp``: Specific NUMA node's used memory ratio in bp (1/10,000).
- ``node_mem_free_bp``: Specific NUMA node's free memory ratio in bp (1/10,000).

``nid`` is required only for ``node_mem_used_bp`` and ``node_mem_free_bp``, to
specify the NUMA node.

To know how user-space can set the tuning goal metric, the target value, and/or
the current value via :ref:`DAMON sysfs interface <sysfs_interface>`, refer to
:ref:`quota goals <sysfs_schemes_quota_goals>` part of the documentation.


.. _damon_design_damos_watermarks:

Watermarks
~~~~~~~~~~

Conditional DAMOS (de)activation automation. Users might want DAMOS to run
only under certain situations.
For example, when a sufficient amount of free
memory is guaranteed, running a scheme for proactive reclamation would only
consume unnecessary system resources. To avoid such consumption, the user
would need to manually monitor some metrics such as free memory ratio, and turn
DAMON/DAMOS on or off.

DAMOS allows users to offload such work using three watermarks. It allows the
users to configure the metric of their interest, and three watermark values,
namely high, middle, and low. If the value of the metric becomes above the
high watermark or below the low watermark, the scheme is deactivated. If the
metric becomes below the middle watermark but above the low watermark, the
scheme is activated. If all schemes are deactivated by the watermarks, the
monitoring is also deactivated. In this case, the DAMON worker thread only
periodically checks the watermarks and therefore incurs nearly zero overhead.

To know how user-space can set the watermarks via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`watermarks <sysfs_watermarks>` part of the
documentation.


.. _damon_design_damos_filters:

Filters
~~~~~~~

Non-access pattern-based target memory regions filtering. If users run
self-written programs or have good profiling tools, they could know something
more than the kernel, such as future access patterns or some special
requirements for specific types of memory. For example, some users may know
only anonymous pages can impact their program's performance. They can also
have a list of latency-critical processes.

To let users optimize DAMOS schemes with such special knowledge, DAMOS provides
a feature called DAMOS filters. The feature allows users to set an arbitrary
number of filters for each scheme.
Each filter specifies

- a type of memory (``type``),
- whether it is for the memory of the type or all except the type
  (``matching``), and
- whether it is to allow (include) or reject (exclude) applying
  the scheme's action to the memory (``allow``).

For efficient handling of filters, some types of filters are handled by the
core layer, while others are handled by the operations set. In the latter
case, hence, support of the filter types depends on the DAMON operations set.
In case of the core layer-handled filters, the memory regions that are excluded
by the filter are not counted as regions that the scheme has tried. In
contrast, if a memory region is filtered by an operations set layer-handled
filter, it is counted as tried by the scheme. This difference affects the
statistics.

When multiple filters are installed, the group of filters that are handled by
the core layer are evaluated first. After that, the group of filters that are
handled by the operations layer are evaluated. Filters in each of the groups
are evaluated in the installed order. If a part of memory is matched to one of
the filters, the next filters are ignored. If the part passes through the
filters evaluation stage because it is not matched to any of the filters,
applying the scheme's action to it depends on the last filter's allowance type.
If the last filter was for allowing, the part of memory will be rejected, and
vice versa.

For example, let's assume 1) a filter for allowing anonymous pages and 2)
another filter for rejecting young pages are installed in the order. If a page
of a region that is eligible to apply the scheme's action is an anonymous page,
the scheme's action will be applied to the page regardless of whether it is
young or not, since it matches with the first allow-filter. If the page is
not anonymous but young, the scheme's action will not be applied, since the
second reject-filter blocks it.
If the page is neither anonymous nor young,
the page will pass through the filters evaluation stage since there is no
matching filter, and the action will be applied to the page.

The following filter ``type`` values are currently supported.

- Core layer handled
    - addr
        - Applied to pages that belong to a given address range.
    - target
        - Applied to pages that belong to a given DAMON monitoring target.
- Operations layer handled, supported by only the ``paddr`` operations set.
    - anon
        - Applied to pages that contain data that is not stored in files.
    - active
        - Applied to active pages.
    - memcg
        - Applied to pages that belong to a given cgroup.
    - young
        - Applied to pages that are accessed after the last access check from the
          scheme.
    - hugepage_size
        - Applied to pages that are managed in a given size range.
    - unmapped
        - Applied to pages that are unmapped.

To know how user-space can set the filters via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`filters <sysfs_filters>` part of the
documentation.

.. _damon_design_damos_stat:

Statistics
~~~~~~~~~~

The statistics of DAMOS behaviors that are designed to help monitoring, tuning
and debugging of DAMOS.

DAMOS accounts the following statistics for each scheme, from the beginning of
the scheme's execution.

- ``nr_tried``: Total number of regions that the scheme is tried to be applied
  to.
- ``sz_tried``: Total size of regions that the scheme is tried to be applied
  to.
- ``sz_ops_filter_passed``: Total bytes that passed operations set
  layer-handled DAMOS filters.
- ``nr_applied``: Total number of regions that the scheme is applied to.
- ``sz_applied``: Total size of regions that the scheme is applied to.
- ``qt_exceeds``: Total number of times the quota of the scheme has been
  exceeded.
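
The filter evaluation order and its effect on these statistics can be sketched
with a small user-space model. This is only an illustrative sketch, not kernel
code: ``Filter``, ``passes()``, ``Stats``, ``apply_scheme()`` and ``PAGE_SIZE``
are hypothetical names, regions are modeled as dictionaries holding a list of
pages, core layer-handled filters are assumed to accept or reject a whole
region, and an empty filter list is assumed to let everything pass.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Callable

    PAGE_SIZE = 4096  # assumed page size, for illustration only


    @dataclass
    class Filter:
        """Hypothetical model of one DAMOS filter."""
        matches: Callable[[dict], bool]  # the ``type`` check
        matching: bool  # True: memory of the type, False: all except the type
        allow: bool     # True: allow the action, False: reject it


    def passes(filters, memory):
        """Return True if the filters let the action reach ``memory``."""
        for f in filters:
            if f.matches(memory) == f.matching:
                return f.allow  # the first matching filter decides
        if not filters:
            return True  # assumption: no filter means nothing is rejected
        # No filter matched: the decision is the opposite of the last filter.
        return not filters[-1].allow


    @dataclass
    class Stats:
        nr_tried: int = 0
        sz_tried: int = 0
        sz_ops_filter_passed: int = 0
        nr_applied: int = 0
        sz_applied: int = 0


    def apply_scheme(regions, core_filters, ops_filters, action, stats):
        """Apply ``action`` to each page that survives both filter groups."""
        for region in regions:
            # Core layer-handled filters run first; a rejected region is not
            # counted as tried.
            if not passes(core_filters, region):
                continue
            pages = region["pages"]
            stats.nr_tried += 1
            stats.sz_tried += len(pages) * PAGE_SIZE
            # Operations layer-handled filters run after the region is tried.
            passed = [p for p in pages if passes(ops_filters, p)]
            stats.sz_ops_filter_passed += len(passed) * PAGE_SIZE
            nr_done = sum(1 for p in passed if action(p))
            if nr_done:  # applied to at least a part of the region
                stats.nr_applied += 1
                stats.sz_applied += nr_done * PAGE_SIZE
        return stats

With an allow-``anon`` filter followed by a reject-``young`` filter, ``passes()``
reproduces the three cases of the filters example: an anonymous page passes, a
young non-anonymous page is rejected, and a page that is neither passes because
the last filter is a reject-filter.
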

"A scheme is tried to be applied to a region" means the DAMOS core logic
determined that the region is eligible for the scheme's :ref:`action
<damon_design_damos_action>`. The :ref:`access pattern
<damon_design_damos_access_pattern>`, :ref:`quotas
<damon_design_damos_quotas>`, :ref:`watermarks
<damon_design_damos_watermarks>`, and :ref:`filters
<damon_design_damos_filters>` that are handled by the core logic could affect
this. The core logic only asks the underlying :ref:`operation set
<damon_operations_set>` to apply the action to the region, so whether the
action is really applied or not is unclear. That's why it is called "tried".

"A scheme is applied to a region" means the :ref:`operation set
<damon_operations_set>` has applied the action to at least a part of the
region. The :ref:`filters <damon_design_damos_filters>` that are handled by the
operation set, and the types of the :ref:`action <damon_design_damos_action>`
and the pages of the region can affect this. For example, if a filter is set
to exclude anonymous pages and the region has only anonymous pages, or if the
action is ``pageout`` while all pages of the region are unreclaimable, applying
the action to the region will fail.

To know how user-space can read the stats via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`stats <sysfs_stats>` part of the
documentation.

Regions Walking
~~~~~~~~~~~~~~~

A DAMOS feature that allows users to access each region to which a DAMOS action
has just been applied. Using this feature, DAMON :ref:`API <damon_design_api>`
allows users to access full properties of the regions including the access
monitoring results and the amount of the region's internal memory that passed
the DAMOS filters. :ref:`DAMON sysfs interface <sysfs_interface>` also allows
users to read the data via special :ref:`files <sysfs_schemes_tried_regions>`.

.. _damon_design_api:

Application Programming Interface
---------------------------------

The programming interface for kernel space data access-aware applications.
DAMON is a framework, so it does nothing by itself. Instead, it only helps
other kernel components such as subsystems and modules build their data
access-aware applications using DAMON's core features. For this, DAMON exposes
all of its features to other kernel components via its application programming
interface, namely ``include/linux/damon.h``. Please refer to the API
:doc:`document </mm/damon/api>` for details of the interface.


.. _damon_modules:

Modules
=======

Because the core of DAMON is a framework for kernel components, it doesn't
provide any direct interface for the user space. Such interfaces should be
implemented by each DAMON API user kernel component, instead. The DAMON
subsystem itself implements such DAMON API user modules, which are supposed to
be used for general purpose DAMON control and special purpose data access-aware
system operations, and provides stable application binary interfaces (ABI) for
the user space. The user space can build efficient data access-aware
applications using the interfaces.


General Purpose User Interface Modules
--------------------------------------

DAMON modules that provide user space ABIs for general purpose DAMON usage in
runtime.

Like many other ABIs, the modules create files on pseudo file systems like
'sysfs', and allow users to specify their requests to and get the answers from
DAMON by writing to and reading from the files. As a response to such I/O,
DAMON user interface modules control DAMON and retrieve the results as the user
requested via the DAMON API, and return the results to the user-space.

The ABIs are designed to be used for user space applications development,
rather than human beings' fingers.
Human users are recommended to use such
user space tools. One such Python-written user space tool is available at
GitHub (https://github.com/damonitor/damo), PyPI
(https://pypistats.org/packages/damo), and Fedora
(https://packages.fedoraproject.org/pkgs/python-damo/damo/).

Currently, one module of this type, namely 'DAMON sysfs interface', is
available. Please refer to the ABI :ref:`doc <sysfs_interface>` for details of
the interfaces.


Special-Purpose Access-aware Kernel Modules
-------------------------------------------

DAMON modules that provide user space ABI for specific purpose DAMON usage.

DAMON user interface modules are for full control of all DAMON features in
runtime. For each special-purpose system-wide data access-aware system
operation such as proactive reclamation or LRU lists balancing, the interfaces
could be simplified by removing unnecessary knobs for the specific purpose, and
extended for boot-time and even compile time control. Default values of DAMON
control parameters for the usage would also need to be optimized for the
purpose.

To support such cases, yet more DAMON API user kernel modules that provide
simpler and optimized user space interfaces are available. Currently, two
modules for proactive reclamation and LRU lists manipulation are provided. For
more detail, please read the usage documents for those
(:doc:`/admin-guide/mm/damon/reclaim` and
:doc:`/admin-guide/mm/damon/lru_sort`).
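
As a rough illustration of how simple such an interface can be, the sketch below
toggles the proactive reclamation module from user space. It assumes the module
exposes an ``enabled`` boolean parameter under
``/sys/module/damon_reclaim/parameters/``, as described in the usage document
above; ``set_enabled()`` and ``is_enabled()`` are hypothetical helper names, and
writing the file is expected to require root privileges on a kernel built with
the module.

.. code-block:: python

    from pathlib import Path

    # Assumed location of the DAMON_RECLAIM module parameters.
    DAMON_RECLAIM_PARAMS = Path("/sys/module/damon_reclaim/parameters")


    def set_enabled(enabled, params_dir=DAMON_RECLAIM_PARAMS):
        """Write ``Y`` or ``N`` to the ``enabled`` parameter file."""
        (Path(params_dir) / "enabled").write_text("Y" if enabled else "N")


    def is_enabled(params_dir=DAMON_RECLAIM_PARAMS):
        """Return True if the ``enabled`` parameter file currently reads ``Y``."""
        return (Path(params_dir) / "enabled").read_text().strip() == "Y"

The ``params_dir`` argument exists only so that the helpers can be exercised
against a scratch directory instead of the real sysfs tree.
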