.. SPDX-License-Identifier: GPL-2.0

======
Design
======


.. _damon_design_execution_model_and_data_structures:

Execution Model and Data Structures
===================================

The monitoring-related information, including the monitoring request
specification and DAMON-based operation schemes, is stored in a data structure
called DAMON ``context``. DAMON executes each context with a kernel thread
called ``kdamond``. Multiple kdamonds can run in parallel, for different
types of monitoring.

To know how user-space can do the configurations and start/stop DAMON, refer to
:ref:`DAMON sysfs interface <sysfs_interface>` documentation.


Overall Architecture
====================

The DAMON subsystem consists of three layers:

- :ref:`Operations Set <damon_operations_set>`: Implements fundamental
  operations for DAMON that depend on the given monitoring target
  address-space and available set of software/hardware primitives,
- :ref:`Core <damon_core_logic>`: Implements core logics including monitoring
  overhead/accuracy control and access-aware system operations on top of the
  operations set layer, and
- :ref:`Modules <damon_modules>`: Implements kernel modules for various
  purposes that provide interfaces for the user space, on top of the core
  layer.


.. _damon_operations_set:

Operations Set Layer
====================

.. _damon_design_configurable_operations_set:

For data access monitoring and additional low level work, DAMON needs a set of
implementations for specific operations that are dependent on and optimized for
the given target address space. For example, the two operations below for
access monitoring are address-space dependent.

1. Identification of the monitoring target address range for the address space.
2. Access check of specific address range in the target space.
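
As a rough illustration, the two address-space dependent operations can be
pictured as methods that each address space has to implement. The sketch
below is illustrative Python, not the kernel's C interface. The names
``OperationsSet``, ``FixedRangeOps``, ``target_ranges()`` and
``was_accessed()`` are hypothetical and exist only for this example.

```python
from abc import ABC, abstractmethod


class OperationsSet(ABC):
    """Hypothetical interface for the two address-space dependent operations."""

    @abstractmethod
    def target_ranges(self):
        """Return the monitoring target address ranges as (start, end) pairs."""

    @abstractmethod
    def was_accessed(self, addr):
        """Return True if the address was accessed since the last check."""


class FixedRangeOps(OperationsSet):
    """Toy implementation: a user-given range and a set of touched addresses."""

    def __init__(self, start, end):
        self.ranges = [(start, end)]
        self.touched = set()

    def target_ranges(self):
        return list(self.ranges)

    def was_accessed(self, addr):
        # Report and clear, so that the next check only sees new accesses.
        accessed = addr in self.touched
        self.touched.discard(addr)
        return accessed


ops = FixedRangeOps(0x1000, 0x9000)
ops.touched.add(0x2000)            # simulate one access
first_check = ops.was_accessed(0x2000)
second_check = ops.was_accessed(0x2000)
```

Code written against ``OperationsSet`` does not need to know which address
space is behind it, which is the point of separating the two layers.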

DAMON consolidates these implementations in a layer called DAMON Operations
Set, and defines the interface between it and the upper layer. The upper layer
is dedicated to DAMON's core logics, including the mechanism for control of the
monitoring accuracy and the overhead.

Hence, DAMON can easily be extended for any address space and/or available
hardware features by configuring the core logic to use the appropriate
operations set. If there is no available operations set for a given purpose, a
new operations set can be implemented following the interface between the
layers.

For example, physical memory, virtual memory, swap space, those for specific
processes, NUMA nodes, files, and backing memory devices would be supportable.
Also, if some architectures or devices support special optimized access check
features, those will be easily configurable.

DAMON currently provides the three operations sets below. The following three
subsections describe how they work.

 - vaddr: Monitor virtual address spaces of specific processes
 - fvaddr: Monitor fixed virtual address ranges
 - paddr: Monitor the physical address space of the system

To know how user-space can do the configuration via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`operations <sysfs_context>` file part of the
documentation.


.. _damon_design_vaddr_target_regions_construction:

VMA-based Target Address Range Construction
-------------------------------------------

A mechanism of the ``vaddr`` DAMON operations set that automatically
initializes and updates the monitoring target address regions so that the
entire memory mappings of the target processes can be covered.

This mechanism is only for the ``vaddr`` operations set. In cases of the
``fvaddr`` and ``paddr`` operations sets, users are asked to manually set the
monitoring target address ranges.
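
The rest of this section explains that the mappings are reduced to three
regions separated by the two biggest unmapped gaps. A minimal sketch of that
idea follows. It assumes the mappings are given as sorted, non-overlapping
``(start, end)`` pairs. The function name ``three_regions()`` and the example
layout are hypothetical, and the kernel implementation may differ in detail.

```python
def three_regions(mappings):
    """Cover sorted, non-overlapping (start, end) mappings with three regions.

    The two largest gaps between neighbouring mappings are left uncovered.
    At least three mappings are required so that two gaps exist.
    """
    if len(mappings) < 3:
        raise ValueError("need at least three mappings")
    # Gap i lies between mappings[i] and mappings[i + 1].
    gaps = [(mappings[i + 1][0] - mappings[i][1], i)
            for i in range(len(mappings) - 1)]
    # Indices of the two biggest gaps, in address order.
    first, second = sorted(i for _, i in sorted(gaps, reverse=True)[:2])
    return [
        (mappings[0][0], mappings[first][1]),
        (mappings[first + 1][0], mappings[second][1]),
        (mappings[second + 1][0], mappings[-1][1]),
    ]


# Hypothetical layout: heap, a cluster of mmap()-ed areas, then the stack.
maps = [(0x1000, 0x3000), (0x700000, 0x701000), (0x702000, 0x705000),
        (0xF00000, 0xF02000)]
regions = three_regions(maps)
```

In this example the small hole between the two mmap()-ed areas stays inside
the middle region, while the two large gaps are excluded.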

Only small parts of the huge virtual address space of a process are mapped to
physical memory and accessed. Thus, tracking the unmapped address regions is
just wasteful. However, because DAMON can deal with some level of noise using
the adaptive regions adjustment mechanism, tracking every mapping is not
strictly required, and could even incur a high overhead in some cases. That
said, excessively large unmapped areas inside the monitoring target should be
removed so that the adaptive mechanism does not spend time on them.

For this reason, this implementation converts the complex mappings to three
distinct regions that cover every mapped area of the address space. The two
gaps between the three regions are the two biggest unmapped areas in the given
address space. In most cases, the two biggest unmapped areas would be the gap
between the heap and the uppermost mmap()-ed region, and the gap between the
lowermost mmap()-ed region and the stack. Because these gaps are
exceptionally huge in usual address spaces, excluding these will be sufficient
to make a reasonable trade-off. Below shows this in detail::

    <heap>
    <BIG UNMAPPED REGION 1>
    <uppermost mmap()-ed region>
    (small mmap()-ed regions and munmap()-ed regions)
    <lowermost mmap()-ed region>
    <BIG UNMAPPED REGION 2>
    <stack>


PTE Accessed-bit Based Access Check
-----------------------------------

Both of the implementations for physical and virtual address spaces use PTE
Accessed-bit for basic access checks. The only difference is the way of
finding the relevant PTE Accessed bit(s) from the address. While the
implementation for the virtual address walks the page table for the target task
of the address, the implementation for the physical address walks every page
table having a mapping to the address.
In this way, the implementations find
and clear the bit(s) for the next sampling target address and check whether the
bit(s) are set again after one sampling period. This could disturb other kernel
subsystems using the Accessed bits, namely Idle page tracking and the reclaim
logic. DAMON does nothing to avoid disturbing Idle page tracking, so handling
the interference is the responsibility of sysadmins. However, it solves the
conflict with the reclaim logic using ``PG_idle`` and ``PG_young`` page flags,
as Idle page tracking does.

.. _damon_design_addr_unit:

Address Unit
------------

The DAMON core layer uses the ``unsigned long`` type for monitoring target
address ranges. In some cases, the address space for a given operations set
could be too large to be handled with the type. ARM (32-bit) with large
physical address extension is an example. For such cases, a per-operations set
parameter called ``address unit`` is provided. It represents the scale factor
that needs to be multiplied to the core layer's address for calculating the
real address on the given address space. Support of the ``address unit``
parameter is up to each operations set implementation. ``paddr`` is the only
operations set implementation that supports the parameter.

.. _damon_core_logic:

Core Logics
===========

.. _damon_design_monitoring:

Monitoring
----------

The following sections describe each of the DAMON core mechanisms and the five
monitoring attributes, ``sampling interval``, ``aggregation interval``,
``update interval``, ``minimum number of regions``, and ``maximum number of
regions``.

To know how user-space can set the attributes via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`monitoring_attrs <sysfs_monitoring_attrs>`
part of the documentation.
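
To keep the five attributes in view while reading the following sections, they
can be pictured as one small configuration object. The sketch below is
illustrative Python. The class and field names are hypothetical, and the
values for the update interval and the numbers of regions are placeholders
chosen for the example. Only the 100 ms aggregation interval and the ``1/20``
sampling ratio come from this document.

```python
from dataclasses import dataclass


@dataclass
class MonitoringAttrs:
    """Hypothetical container for the five monitoring attributes."""
    sampling_interval_us: int = 5_000        # 1/20 of the aggregation interval
    aggregation_interval_us: int = 100_000   # 100 ms
    update_interval_us: int = 1_000_000      # placeholder value
    min_nr_regions: int = 10                 # placeholder value
    max_nr_regions: int = 1_000              # placeholder value

    def validate(self):
        if not 0 < self.sampling_interval_us <= self.aggregation_interval_us:
            raise ValueError("sampling interval must not exceed aggregation interval")
        if not 0 < self.min_nr_regions <= self.max_nr_regions:
            raise ValueError("min_nr_regions must not exceed max_nr_regions")

    def max_nr_accesses(self):
        # One access check is made per sampling interval, so this bounds the
        # access counter within a single aggregation interval.
        return self.aggregation_interval_us // self.sampling_interval_us


attrs = MonitoringAttrs()
attrs.validate()
upper_bound = attrs.max_nr_accesses()
```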


Access Frequency Monitoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output of DAMON says which pages are accessed how frequently for a given
duration. The resolution of the access frequency is controlled by setting
``sampling interval`` and ``aggregation interval``. In detail, DAMON checks
access to each page per ``sampling interval`` and aggregates the results. In
other words, it counts the number of the accesses to each page. After each
``aggregation interval`` passes, DAMON calls callback functions that were
previously registered by users so that users can read the aggregated results,
and then clears the results. This can be described in the simple pseudo-code
below::

    while monitoring_on:
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        sleep(sampling_interval)

The monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.


.. _damon_design_region_based_sampling:

Region Based Sampling
~~~~~~~~~~~~~~~~~~~~~

To avoid the unbounded increase of the overhead, DAMON groups adjacent pages
that are assumed to have the same access frequencies into a region. As long as
the assumption (pages in a region have the same access frequencies) is kept,
only one page in the region is required to be checked. Thus, for each
``sampling interval``, DAMON randomly picks one page in each region, waits for
one ``sampling interval``, checks whether the page is accessed meanwhile, and
increases the access frequency counter of the region if so. The counter is
called ``nr_accesses`` of the region. Therefore, the monitoring overhead is
controllable by setting the number of regions.
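
The sampling step can be sketched as follows. This is a simplified
illustration in Python rather than kernel code. The function
``sample_regions()``, the ``accessed`` callable, and the example workload are
hypothetical, and the wait between picking a page and checking it is collapsed
into a single call.

```python
import random


def sample_regions(regions, nr_accesses, accessed, rng):
    """One sampling step: check one random page per region.

    regions: list of (start_page, end_page) pairs, end exclusive.
    nr_accesses: per-region counters, updated in place.
    accessed: callable(page) -> bool, a stand-in for the real access check.
    """
    for idx, (start, end) in enumerate(regions):
        page = rng.randrange(start, end)
        if accessed(page):
            nr_accesses[idx] += 1


# Hypothetical workload: pages 0..99 are hot, pages 100..999 are idle.
regions = [(0, 100), (100, 1000)]
counts = [0, 0]
rng = random.Random(0)
for _ in range(20):  # 20 sampling intervals within one aggregation interval
    sample_regions(regions, counts, lambda page: page < 100, rng)
```

Only two checks are made per sampling interval here, although 1,000 pages are
covered, so the cost follows the number of regions rather than the size of the
target.
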
DAMON allows users to set the
minimum and the maximum number of regions for the trade-off.

This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed.


.. _damon_design_adaptive_regions_adjustment:

Adaptive Regions Adjustment
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even if the initial monitoring target regions are somehow well constructed to
fulfill the assumption (pages in the same region have similar access
frequencies), the data access pattern can change dynamically. This will result
in low monitoring quality. To keep the assumption as much as possible, DAMON
adaptively merges and splits each region based on their access frequency.

For each ``aggregation interval``, it compares the access frequencies
(``nr_accesses``) of adjacent regions. If the difference is small, and if the
sum of the two regions' sizes is smaller than the size of total regions divided
by the ``minimum number of regions``, DAMON merges the two regions. If the
resulting number of total regions is still higher than ``maximum number of
regions``, it repeats the merging with an increasing access frequencies
difference threshold until the upper-limit of the number of regions is met, or
the threshold becomes higher than the possible maximum value (``aggregation
interval`` divided by ``sampling interval``). Then, after it reports and clears
the aggregated access frequency of each region, it splits each region into two
or three regions if the total number of regions will not exceed the
user-specified maximum number of regions after the split.

In this way, DAMON provides its best-effort quality and minimal overhead while
keeping the bounds users set for their trade-off.


.. _damon_design_age_tracking:

Age Tracking
~~~~~~~~~~~~

By analyzing the monitoring results, users can also find how long the current
access pattern of a region has been maintained.
That could be used for a good
understanding of the access pattern. For example, a page placement algorithm
utilizing both the frequency and the recency could be implemented using that.
To make such analysis of how long an access pattern has been maintained
easier, DAMON maintains yet another counter called ``age`` in each region. For
each ``aggregation interval``, DAMON checks if the region's size and access
frequency (``nr_accesses``) have significantly changed. If so, the counter is
reset to zero. Otherwise, the counter is increased.


Dynamic Target Space Updates Handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The monitoring target address range could dynamically change. For example,
virtual memory could be dynamically mapped and unmapped. Physical memory could
be hot-plugged.

As the changes could be quite frequent in some cases, DAMON allows the
monitoring operations to check dynamic changes, including memory mapping
changes, and apply them to monitoring operations-related data structures such
as the abstracted monitoring target memory area, only once per user-specified
time interval (``update interval``).

User-space can get the monitoring results via DAMON sysfs interface and/or
tracepoints. For more details, please refer to the documentation for
:ref:`DAMOS tried regions <sysfs_schemes_tried_regions>` and :ref:`tracepoint`,
respectively.


.. _damon_design_monitoring_params_tuning_guide:

Monitoring Parameters Tuning Guide
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In short, set ``aggregation interval`` to capture a meaningful amount of
accesses for the purpose. The amount of accesses can be measured using
``nr_accesses`` and ``age`` of regions in the aggregated monitoring results
snapshot. The default value of the interval, ``100ms``, turns out to be too
short in many cases. Set ``sampling interval`` proportional to ``aggregation
interval``.
By
default, ``1/20`` is recommended as the ratio.

``Aggregation interval`` should be set to a time interval within which the
workload can make the amount of accesses needed for the monitoring purpose.
If the interval is too short, only a small number of accesses are captured. As
a result, the monitoring results look as if everything is equally rarely
accessed. For many purposes, that would be useless. If it is too long,
however, the time to converge regions with the :ref:`regions adjustment
mechanism <damon_design_adaptive_regions_adjustment>` can be too long,
depending on the time scale of the given purpose. This could happen if the
workload is actually making only rare accesses but the user sets the amount of
accesses required for the monitoring purpose too high. For such cases, the
target amount of access to capture per ``aggregation interval`` should be
carefully reconsidered. Also, note that the captured amount of accesses is
represented with not only ``nr_accesses``, but also ``age``. For example, even
if every region in the monitoring results shows zero ``nr_accesses``, regions
could still be distinguished using ``age`` values as the recency information.

Hence the optimum value of ``aggregation interval`` depends on the access
intensiveness of the workload. The user should tune the interval based on the
amount of access that is captured on each aggregated snapshot of the
monitoring results.

Note that the default value of the interval is 100 milliseconds, which is too
short in many cases, especially on large systems.

``Sampling interval`` defines the resolution of each aggregation. If it is set
too large, monitoring results will look like every region was equally rarely
accessed, or equally frequently accessed. That is, regions become
indistinguishable based on access pattern, and therefore the results will be
useless in many use cases.
If ``sampling interval`` is too small, it will not
degrade the resolution, but will increase the monitoring overhead. If it is
small enough to provide a resolution of the monitoring results that is
sufficient for the given purpose, it shouldn't be unnecessarily further
lowered. It is recommended to be set proportional to ``aggregation interval``.
By default, the ratio is set as ``1/20``, and it is still recommended.

Based on the manual tuning guide, DAMON provides a more intuitive knob-based
intervals auto-tuning mechanism. Please refer to :ref:`the design document of
the feature <damon_design_monitoring_intervals_autotuning>` for detail.

Refer to the document below for an example tuning based on the above guide.

.. toctree::
   :maxdepth: 1

   monitoring_intervals_tuning_example


.. _damon_design_monitoring_intervals_autotuning:

Monitoring Intervals Auto-tuning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

DAMON provides automatic tuning of the ``sampling interval`` and ``aggregation
interval`` based on :ref:`the tuning guide idea
<damon_design_monitoring_params_tuning_guide>`. The tuning mechanism allows
users to set the aimed amount of access events to observe via DAMON within a
given time interval. The target can be specified by the user as a ratio of
DAMON-observed access events to the theoretical maximum amount of the events
(``access_bp``), measured within a given number of aggregations (``aggrs``).

The DAMON-observed access events are calculated in byte granularity based on
the DAMON :ref:`region assumption <damon_design_region_based_sampling>`. For
example, if a region of size ``X`` bytes of ``Y`` ``nr_accesses`` is found, it
means ``X * Y`` access events are observed by DAMON.
The theoretical maximum
access events for the region is calculated in the same way, but replacing ``Y``
with the theoretical maximum ``nr_accesses``, which can be calculated as
``aggregation interval / sampling interval``.

The mechanism calculates the ratio of access events for ``aggrs`` aggregations,
and increases or decreases the ``sampling interval`` and ``aggregation
interval`` in the same ratio, if the observed access ratio is lower or higher
than the target, respectively. The ratio of the intervals change is decided in
proportion to the distance between the current samples ratio and the target
ratio.

The user can further set the minimum and maximum ``sampling interval`` that can
be set by the tuning mechanism using two parameters (``min_sample_us`` and
``max_sample_us``). Because the tuning mechanism always changes ``sampling
interval`` and ``aggregation interval`` in the same ratio, the minimum and
maximum ``aggregation interval`` after each of the tuning changes are
automatically set together.

The tuning is turned off by default, and needs to be set explicitly by the
user. As a rule of thumb based on the Pareto principle, a 4% access samples
ratio target is recommended. Note that the Pareto principle (80/20 rule) is
applied twice. That is, it assumes that a 4% (20% of 20%) DAMON-observed access
events ratio (source) captures 64% (80% multiplied by 80%) of real access
events (outcomes).

To know how user-space can use this feature via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`intervals_goal <sysfs_scheme>` part of
the documentation.


.. _damon_design_damos:

Operation Schemes
-----------------

One common purpose of data access monitoring is access-aware system efficiency
optimizations.
For example,

    paging out memory regions that are not accessed for more than two minutes

or

    using THP for memory regions that are larger than 2 MiB and showing a high
    access frequency for more than one minute.

One straightforward approach for such schemes would be profile-guided
optimizations. That is, getting data access monitoring results of the
workloads or the system using DAMON, finding memory regions of special
characteristics by profiling the monitoring results, and making system
operation changes for the regions. The changes could be made by modifying or
providing advice to the software (the application and/or the kernel), or
reconfiguring the hardware. Both offline and online approaches could be
available.

Among those, providing advice to the kernel at runtime would be flexible and
effective, and therefore widely used. However, implementing such schemes
could impose unnecessary redundancy and inefficiency. The profiling could be
redundant if the type of interest is common. Exchanging the information
including monitoring results and operation advice between kernel and user
spaces could be inefficient.

To allow users to reduce such redundancy and inefficiencies by offloading the
work, DAMON provides a feature called Data Access Monitoring-based Operation
Schemes (DAMOS). It lets users specify their desired schemes at a high
level. For such specifications, DAMON starts monitoring, finds regions having
the access pattern of interest, and applies the user-desired operation actions
to the regions, for every user-specified time interval called
``apply_interval``.

To know how user-space can set ``apply_interval`` via :ref:`DAMON sysfs
interface <sysfs_interface>`, refer to :ref:`apply_interval_us <sysfs_scheme>`
part of the documentation.


.. _damon_design_damos_action:

Operation Action
~~~~~~~~~~~~~~~~

The management action that the users desire to apply to the regions of their
interest. For example, paging out, prioritizing for next reclamation victim
selection, advising ``khugepaged`` to collapse or split, or doing nothing but
collecting statistics of the regions.

The list of supported actions is defined in DAMOS, but the implementation of
each action is in the DAMON operations set layer because the implementation
normally depends on the monitoring target address space. For example, the code
for paging specific virtual address ranges out would be different from that for
physical address ranges. Also, the monitoring operations implementation sets
are not mandated to support all actions of the list. Hence, the availability
of a specific DAMOS action depends on what operations set is selected to be
used together.

The list of the supported actions, their meaning, and DAMON operations sets
that support each action are as below.

 - ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``pageout``: Reclaim the region.
   Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
 - ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``lru_prio``: Prioritize the region on its LRU lists.
   Supported by ``paddr`` operations set.
 - ``lru_deprio``: Deprioritize the region on its LRU lists.
   Supported by ``paddr`` operations set.
 - ``migrate_hot``: Migrate the regions prioritizing warmer regions.
   Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
 - ``migrate_cold``: Migrate the regions prioritizing colder regions.
   Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
 - ``stat``: Do nothing but count the statistics.
   Supported by all operations sets.

Applying the actions except ``stat`` to a region is considered as changing the
region's characteristics. Hence, DAMOS resets the age of regions when any such
actions are applied to those.

To know how user-space can set the action via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`action <sysfs_scheme>` part of the
documentation.


.. _damon_design_damos_access_pattern:

Target Access Pattern
~~~~~~~~~~~~~~~~~~~~~

The access pattern of the schemes' interest. The patterns are constructed with
the properties that DAMON's monitoring results provide, specifically the size,
the access frequency, and the age. Users can describe their access pattern of
interest by setting minimum and maximum values of the three properties. If a
region's three properties are in the ranges, DAMOS classifies it as one of the
regions that the scheme is interested in.

To know how user-space can set the access pattern via :ref:`DAMON sysfs
interface <sysfs_interface>`, refer to :ref:`access_pattern
<sysfs_access_pattern>` part of the documentation.


.. _damon_design_damos_quotas:

Quotas
~~~~~~

DAMOS upper-bound overhead control feature. DAMOS could incur high overhead if
the target access pattern is not properly tuned. For example, if a huge memory
region having the access pattern of interest is found, applying the scheme's
action to all pages of the huge region could consume unacceptably large system
resources. Preventing such issues by tuning the access pattern could be
challenging, especially if the access patterns of the workloads are highly
dynamic.

To mitigate that situation, DAMOS provides an upper-bound overhead control
feature called quotas. It lets users specify an upper limit of time that DAMOS
can use for applying the action, and/or a maximum number of bytes of memory
regions that the action can be applied to within a user-specified time
duration.

To know how user-space can set the basic quotas via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`quotas <sysfs_quotas>` part of the
documentation.


.. _damon_design_damos_quotas_prioritization:

Prioritization
^^^^^^^^^^^^^^

A mechanism for making a good decision under the quotas. When the action
cannot be applied to all regions of interest due to the quotas, DAMOS
prioritizes regions and applies the action to only regions having high enough
priorities so that it will not exceed the quotas.

The prioritization mechanism should be different for each action. For example,
rarely accessed (colder) memory regions would be prioritized for page-out
scheme action. In contrast, the colder regions would be deprioritized for huge
page collapse scheme action. Hence, the prioritization mechanisms for each
action are implemented in each DAMON operations set, together with the actions.

Though the implementation is up to the DAMON operations set, it would be common
to calculate the priority using the access pattern properties of the regions.
Some users would want the mechanisms to be personalized for their specific
case. For example, some users would want the mechanism to weigh the recency
(``age``) more than the access frequency (``nr_accesses``). DAMOS allows users
to specify the weight of each access pattern property and passes the
information to the underlying mechanism. Nevertheless, how and even whether
the weight will be respected are up to the underlying prioritization mechanism
implementation.
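
A minimal sketch of weighted prioritization under a size quota, for a
page-out style action, could look like the following. It is illustrative
Python, not the scoring that any operations set actually uses. The functions
``priority()`` and ``select_under_quota()``, the scoring formula, and the
example regions are assumptions made for this example.

```python
def priority(region, weights):
    """Coldness score of a region; a higher score is handled first.

    Assumed scoring: idle regions get colder with age, while regions that
    are being accessed get less cold the longer that pattern has lasted.
    """
    coldness = 1.0 / (1 + region["nr_accesses"])
    age = region["age"] if region["nr_accesses"] == 0 else -region["age"]
    return (weights["nr_accesses"] * coldness
            + weights["age"] * age
            + weights["size"] * region["size"])


def select_under_quota(regions, weights, quota_bytes):
    """Pick highest-priority regions whose total size fits in quota_bytes."""
    chosen, used = [], 0
    for region in sorted(regions, key=lambda r: priority(r, weights),
                         reverse=True):
        if used + region["size"] <= quota_bytes:
            chosen.append(region["name"])
            used += region["size"]
    return chosen


regions = [
    {"name": "a", "size": 4096, "nr_accesses": 0, "age": 50},
    {"name": "b", "size": 4096, "nr_accesses": 0, "age": 5},
    {"name": "c", "size": 4096, "nr_accesses": 15, "age": 80},
]
# Weigh recency (age) more than access frequency, and ignore size.
weights = {"size": 0, "nr_accesses": 1, "age": 3}
picked = select_under_quota(regions, weights, quota_bytes=8192)
```

With an 8 KiB quota, only the two idle regions are selected, the one that has
been idle longer first, and the frequently accessed region is left alone.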

To know how user-space can set the prioritization weights via :ref:`DAMON sysfs
interface <sysfs_interface>`, refer to :ref:`weights <sysfs_quotas>` part of
the documentation.


.. _damon_design_damos_quotas_auto_tuning:

Aim-oriented Feedback-driven Auto-tuning
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Automatic feedback-driven quota tuning. Instead of setting the absolute quota
value, users can specify the metric of their interest, and what target value
they want the metric value to be. DAMOS then automatically tunes the
aggressiveness (the quota) of the corresponding scheme. For example, if DAMOS
is under-achieving the goal, DAMOS automatically increases the quota. If DAMOS
is over-achieving the goal, it decreases the quota.

The goal can be specified with four parameters, namely ``target_metric``,
``target_value``, ``current_value`` and ``nid``. The auto-tuning mechanism
tries to make ``current_value`` of ``target_metric`` be the same as
``target_value``. The supported ``target_metric`` values are as below.

- ``user_input``: User-provided value. Users could use any metric that they
  have interest in for the value. User-space main workload's latency or
  throughput, system metrics like free memory ratio or memory pressure stall
  time (PSI) could be examples. Note that users should explicitly set
  ``current_value`` on their own in this case. In other words, users should
  repeatedly provide the feedback.
- ``some_mem_psi_us``: System-wide ``some`` memory pressure stall information
  in microseconds, measured from the last quota reset to the next quota reset.
  DAMOS does the measurement on its own, so only ``target_value`` needs to be
  set by users at the initial time. In other words, DAMOS does self-feedback.
- ``node_mem_used_bp``: Specific NUMA node's used memory ratio in bp (1/10,000).
- ``node_mem_free_bp``: Specific NUMA node's free memory ratio in bp (1/10,000).
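
The feedback loop described above can be sketched as a simple proportional
controller. This is illustrative Python and a deliberate simplification, not
the kernel's tuning algorithm. The function ``tune_quota()``, its formula, and
the feedback values are assumptions, and the metric is assumed to grow as the
scheme does more work, so that a value below the target means the goal is
under-achieved.

```python
def tune_quota(quota, current_value, target_value, min_quota=1):
    """Return the next quota after one proportional feedback step.

    Under-achieving (current_value < target_value) raises the quota and
    over-achieving lowers it, in proportion to the distance from the target.
    """
    adjustment = quota * (target_value - current_value) // target_value
    return max(min_quota, quota + adjustment)


quota = 1_000_000  # bytes per quota window, a hypothetical starting point
history = []
# Hypothetical feedback, e.g. ``current_value`` reported for ``user_input``.
for current in (50, 80, 100, 125):
    quota = tune_quota(quota, current_value=current, target_value=100)
    history.append(quota)
```

The quota grows while the reported value is below the target of 100, stays
unchanged when the target is met, and shrinks once the target is exceeded.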

``nid`` is used only for ``node_mem_used_bp`` and ``node_mem_free_bp``, to
point to the specific NUMA node.

To know how user-space can set the tuning goal metric, the target value, and/or
the current value via :ref:`DAMON sysfs interface <sysfs_interface>`, refer to
:ref:`quota goals <sysfs_schemes_quota_goals>` part of the documentation.


.. _damon_design_damos_watermarks:

Watermarks
~~~~~~~~~~

Conditional DAMOS (de)activation automation. Users might want DAMOS to run
only under certain situations. For example, when a sufficient amount of free
memory is guaranteed, running a scheme for proactive reclamation would only
consume unnecessary system resources. To avoid such consumption, the user would
need to manually monitor some metrics such as free memory ratio, and turn
DAMON/DAMOS on or off.

DAMOS allows users to offload such work using three watermarks. It allows the
users to configure the metric of their interest, and three watermark values,
namely high, middle, and low. If the value of the metric becomes above the
high watermark or below the low watermark, the scheme is deactivated. If the
metric becomes below the mid watermark but above the low watermark, the scheme
is activated. If all schemes are deactivated by the watermarks, the monitoring
is also deactivated. In this case, the DAMON worker thread only periodically
checks the watermarks and therefore incurs nearly zero overhead.

To know how user-space can set the watermarks via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`watermarks <sysfs_watermarks>` part of the
documentation.


.. _damon_design_damos_filters:

Filters
~~~~~~~

Non-access pattern-based target memory regions filtering.
If users run
self-written programs or have good profiling tools, they could know something
more than the kernel, such as future access patterns or some special
requirements for specific types of memory. For example, some users may know
only anonymous pages can impact their program's performance. They can also
have a list of latency-critical processes.

To let users optimize DAMOS schemes with such special knowledge, DAMOS provides
a feature called DAMOS filters. The feature allows users to set an arbitrary
number of filters for each scheme. Each filter specifies

- a type of memory (``type``),
- whether it is for the memory of the type or all except the type
  (``matching``), and
- whether it is to allow (include) or reject (exclude) applying
  the scheme's action to the memory (``allow``).

For efficient handling of filters, some types of filters are handled by the
core layer, while others are handled by the operations set. In the latter
case, hence, support of the filter types depends on the DAMON operations set.
In case of the core layer-handled filters, the memory regions that are excluded
by the filter are not counted as regions the scheme has tried. In contrast, if
a memory region is filtered by an operations set layer-handled filter, it is
counted as a region the scheme has tried. This difference affects the
statistics.

When multiple filters are installed, the group of filters that are handled by
the core layer are evaluated first. After that, the group of filters that are
handled by the operations layer are evaluated. Filters in each of the groups
are evaluated in the installed order. If a part of memory is matched to one of
the filters, the subsequent filters are ignored. If the part passes through the
filters evaluation stage because it is not matched to any of the filters,
applying the scheme's action to it depends on the last filter's allowance type.
If the last filter was for allowing, the part of memory will be rejected, and
vice versa.

For example, let's assume 1) a filter for allowing anonymous pages and 2)
another filter for rejecting young pages are installed in this order. If a
page of a region that is eligible for the scheme's action is an anonymous page,
the scheme's action will be applied to the page regardless of whether it is
young or not, since it matches the first allow-filter. If the page is not
anonymous but young, the scheme's action will not be applied, since the second
reject-filter blocks it. If the page is neither anonymous nor young, the page
will pass through the filters evaluation stage since there is no matching
filter, and the action will be applied to the page.

The below ``type`` of filters are currently supported.

- Core layer handled
    - addr
        - Applied to pages that belong to a given address range.
    - target
        - Applied to pages that belong to a given DAMON monitoring target.
- Operations layer handled, supported by only ``paddr`` operations set.
    - anon
        - Applied to pages that contain data that is not stored in files.
    - active
        - Applied to active pages.
    - memcg
        - Applied to pages that belong to a given cgroup.
    - young
        - Applied to pages that are accessed after the last access check from
          the scheme.
    - hugepage_size
        - Applied to pages that are managed in a given size range.
    - unmapped
        - Applied to pages that are unmapped.

To know how user-space can set the filters via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`filters <sysfs_filters>` part of the
documentation.

.. _damon_design_damos_stat:

Statistics
~~~~~~~~~~

Statistics of DAMOS behaviors that are designed to help with monitoring,
tuning, and debugging of DAMOS.

DAMOS accounts the following statistics for each scheme, from the beginning of
the scheme's execution.

- ``nr_tried``: Total number of regions that the scheme is tried to be applied
  to.
- ``sz_tried``: Total size of regions that the scheme is tried to be applied
  to.
- ``sz_ops_filter_passed``: Total bytes that passed operations set
  layer-handled DAMOS filters.
- ``nr_applied``: Total number of regions that the scheme is applied to.
- ``sz_applied``: Total size of regions that the scheme is applied to.
- ``qt_exceeds``: Total number of times the quota of the scheme has been
  exceeded.

"A scheme is tried to be applied to a region" means the DAMOS core logic
determined that the region is eligible for the scheme's :ref:`action
<damon_design_damos_action>`. The :ref:`access pattern
<damon_design_damos_access_pattern>`, :ref:`quotas
<damon_design_damos_quotas>`, :ref:`watermarks
<damon_design_damos_watermarks>`, and :ref:`filters
<damon_design_damos_filters>` that are handled by the core logic could affect
this. The core logic will only ask the underlying :ref:`operations set
<damon_operations_set>` to apply the action to the region, so whether the
action is really applied or not is unclear. That's why it is called "tried".

"A scheme is applied to a region" means the :ref:`operations set
<damon_operations_set>` has applied the action to at least a part of the
region. The :ref:`filters <damon_design_damos_filters>` that are handled by
the operations set, the types of the :ref:`action <damon_design_damos_action>`,
and the pages of the region can affect this. For example, if a filter is set
to exclude anonymous pages and the region has only anonymous pages, or if the
action is ``pageout`` while all pages of the region are unreclaimable, applying
the action to the region will fail.

To know how user-space can read the stats via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`stats <sysfs_stats>` part of the
documentation.
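
The interaction between the filters and the "tried"/"applied" statistics
described above can be illustrated with a small user-space model. This is only
a sketch of the rules as written in this document, not kernel code: the
``Filter`` class, the ``decide()`` and ``account()`` helpers, the per-page
granularity, and the 4 KiB page size are all assumptions made for illustration.
It also assumes that the action always succeeds for pages that pass the
filters.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Callable, Dict, List, Tuple

    PAGE_SIZE = 4096  # assumed page size, for illustration only
    Page = Dict[str, bool]  # hypothetical page description, e.g. {"anon": True}


    @dataclass
    class Filter:
        is_type: Callable[[Page], bool]  # is the page of the filter's type?
        matching: bool  # True: memory of the type, False: all except the type
        allow: bool     # True: allow (include), False: reject (exclude)
        core: bool      # True: core layer-handled, False: operations set-handled


    def decide(page: Page, filters: List[Filter]) -> Tuple[bool, bool]:
        """Return (allowed, decided_by_core_layer) for a single page."""
        # Core layer-handled filters go first; each group keeps installed order.
        ordered = [f for f in filters if f.core]
        ordered += [f for f in filters if not f.core]
        for f in ordered:
            if f.is_type(page) == f.matching:
                return f.allow, f.core  # the first matching filter decides
        if not ordered:
            return True, False
        # No match: the opposite of the last filter's allowance type.  Using the
        # last *evaluated* filter, and its layer, is an assumption of this sketch.
        return not ordered[-1].allow, ordered[-1].core


    def account(pages: List[Page], filters: List[Filter]) -> Dict[str, int]:
        """Accumulate simplified sz_tried and sz_applied counters."""
        stats = {"sz_tried": 0, "sz_applied": 0}
        for page in pages:
            allowed, by_core = decide(page, filters)
            if not allowed and by_core:
                continue  # excluded by the core layer: not counted as tried
            stats["sz_tried"] += PAGE_SIZE
            if allowed:  # assumes the action itself always succeeds
                stats["sz_applied"] += PAGE_SIZE
        return stats


    # 1) allow anonymous pages, then 2) reject young pages.
    filters = [
        Filter(lambda p: p.get("anon", False), matching=True, allow=True,
               core=False),
        Filter(lambda p: p.get("young", False), matching=True, allow=False,
               core=False),
    ]
    pages = [
        {"anon": True, "young": True},
        {"anon": False, "young": True},
        {"anon": False, "young": False},
    ]
    print([decide(p, filters)[0] for p in pages])
    print(account(pages, filters))

With the two filters from the example in the Filters section, the model counts
all three pages as tried, but applies the action only to the anonymous page and
to the page that is neither anonymous nor young. A page rejected by a core
layer-handled filter would not be counted as tried at all.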

Regions Walking
~~~~~~~~~~~~~~~

A DAMOS feature allowing users to access each region that a DAMOS action has
just been applied to. Using this feature, DAMON :ref:`API <damon_design_api>`
allows users to access full properties of the regions, including the access
monitoring results and the amount of the region's internal memory that passed
the DAMOS filters. :ref:`DAMON sysfs interface <sysfs_interface>` also allows
users to read the data via special :ref:`files <sysfs_schemes_tried_regions>`.

.. _damon_design_api:

Application Programming Interface
---------------------------------

The programming interface for kernel space data access-aware applications.
DAMON is a framework, so it does nothing by itself. Instead, it only helps
other kernel components such as subsystems and modules build their data
access-aware applications using DAMON's core features. For this, DAMON exposes
all of its features to other kernel components via its application programming
interface, namely ``include/linux/damon.h``. Please refer to the API
:doc:`document </mm/damon/api>` for details of the interface.


.. _damon_modules:

Modules
=======

Because the core of DAMON is a framework for kernel components, it doesn't
provide any direct interface for the user space. Such interfaces should be
implemented by each DAMON API user kernel component, instead. The DAMON
subsystem itself implements such DAMON API user modules, which are supposed to
be used for general purpose DAMON control and special purpose data access-aware
system operations, and provides stable application binary interfaces (ABI) for
the user space. The user space can build their efficient data access-aware
applications using the interfaces.


General Purpose User Interface Modules
--------------------------------------

DAMON modules that provide user space ABIs for general purpose DAMON usage in
runtime.

Like many other ABIs, the modules create files on pseudo file systems like
'sysfs', and allow users to specify their requests to and get the answers from
DAMON by writing to and reading from the files. As a response to such I/O,
DAMON user interface modules control DAMON and retrieve the results as the user
requested via the DAMON API, and return the results to the user-space.

The ABIs are designed to be used for user space applications development,
rather than human beings' fingers. Human users are recommended to use such
user space tools. One such Python-written user space tool is available at
GitHub (https://github.com/damonitor/damo), PyPI
(https://pypistats.org/packages/damo), and Fedora
(https://packages.fedoraproject.org/pkgs/python-damo/damo/).

Currently, one module of this type, namely 'DAMON sysfs interface', is
available. Please refer to the ABI :ref:`doc <sysfs_interface>` for details of
the interfaces.


Special-Purpose Access-aware Kernel Modules
-------------------------------------------

DAMON modules that provide user space ABI for specific purpose DAMON usage.

DAMON user interface modules are for full control of all DAMON features in
runtime. For each special-purpose system-wide data access-aware system
operation such as proactive reclamation or LRU lists balancing, the interfaces
could be simplified by removing unnecessary knobs for the specific purpose, and
extended for boot-time and even compile time control. Default values of DAMON
control parameters for the usage would also need to be optimized for the
purpose.

To support such cases, yet more DAMON API user kernel modules that provide
simpler and more optimized user space interfaces are available. Currently, two
modules for proactive reclamation and LRU lists manipulation are provided.
For more detail, please read the usage documents for those
(:doc:`/admin-guide/mm/damon/reclaim` and
:doc:`/admin-guide/mm/damon/lru_sort`).