1.. SPDX-License-Identifier: GPL-2.0
2
3======
4Design
5======
6
7
8.. _damon_design_execution_model_and_data_structures:
9
10Execution Model and Data Structures
11===================================
12
The monitoring-related information, including the monitoring request
specification and DAMON-based operation schemes, is stored in a data structure
called DAMON ``context``.  DAMON executes each context with a kernel thread
called ``kdamond``.  Multiple kdamonds can run in parallel, for different
types of monitoring.
18
19To know how user-space can do the configurations and start/stop DAMON, refer to
20:ref:`DAMON sysfs interface <sysfs_interface>` documentation.
21
22
23Overall Architecture
24====================
25
The DAMON subsystem consists of three layers:

- :ref:`Operations Set <damon_operations_set>`: Implements fundamental
  operations for DAMON that depend on the given monitoring target
  address-space and available set of software/hardware primitives,
- :ref:`Core <damon_core_logic>`: Implements core logics including monitoring
  overhead/accuracy control and access-aware system operations on top of the
  operations set layer, and
- :ref:`Modules <damon_modules>`: Implements kernel modules for various
  purposes that provide interfaces for the user space, on top of the core
  layer.
37
38
39.. _damon_operations_set:
40
41Operations Set Layer
42====================
43
44.. _damon_design_configurable_operations_set:
45
For data access monitoring and additional low level work, DAMON needs a set of
implementations for specific operations that are dependent on and optimized for
the given target address space.  For example, the two operations below for
access monitoring are address-space dependent.

1. Identification of the monitoring target address range for the address space.
2. Access check of a specific address range in the target space.

DAMON consolidates these implementations in a layer called DAMON Operations
Set, and defines the interface between it and the upper layer.  The upper layer
is dedicated to DAMON's core logics including the mechanism for control of the
monitoring accuracy and the overhead.
58
59Hence, DAMON can easily be extended for any address space and/or available
60hardware features by configuring the core logic to use the appropriate
61operations set.  If there is no available operations set for a given purpose, a
62new operations set can be implemented following the interface between the
63layers.
64
65For example, physical memory, virtual memory, swap space, those for specific
66processes, NUMA nodes, files, and backing memory devices would be supportable.
67Also, if some architectures or devices support special optimized access check
68features, those will be easily configurable.
69
DAMON currently provides the three operations sets below.  The following three
subsections describe how they work.
72
73 - vaddr: Monitor virtual address spaces of specific processes
74 - fvaddr: Monitor fixed virtual address ranges
75 - paddr: Monitor the physical address space of the system
76
77To know how user-space can do the configuration via :ref:`DAMON sysfs interface
78<sysfs_interface>`, refer to :ref:`operations <sysfs_context>` file part of the
79documentation.
80
81
.. _damon_design_vaddr_target_regions_construction:
83
84VMA-based Target Address Range Construction
85-------------------------------------------
86
The ``vaddr`` DAMON operations set has a mechanism that automatically
initializes and updates the monitoring target address regions so that the
entire memory mappings of the target processes can be covered.

This mechanism is only for the ``vaddr`` operations set.  In cases of
``fvaddr`` and ``paddr`` operations sets, users are asked to manually set the
monitoring target address ranges.
94
Only small parts of the very large virtual address space of a process are
mapped to physical memory and accessed.  Thus, tracking the unmapped address
regions is just wasteful.  On the other hand, because DAMON can deal with some
level of noise using the adaptive regions adjustment mechanism, tracking every
mapping is not strictly required, and could even incur a high overhead in some
cases.  That said, excessively large unmapped areas inside the monitoring
target should be removed so that the adaptive mechanism does not spend time on
them.

For this reason, this implementation converts the complex mappings to three
distinct regions that cover every mapped area of the address space.  The two
gaps between the three regions are the two biggest unmapped areas in the given
address space.  In most cases, the two biggest unmapped areas are the gap
between the heap and the uppermost mmap()-ed region, and the gap between the
lowermost mmap()-ed region and the stack.  Because these gaps are exceptionally
large in usual address spaces, excluding them is sufficient to make a
reasonable trade-off.  Below shows this in detail::
111
112    <heap>
113    <BIG UNMAPPED REGION 1>
114    <uppermost mmap()-ed region>
115    (small mmap()-ed regions and munmap()-ed regions)
116    <lowermost mmap()-ed region>
117    <BIG UNMAPPED REGION 2>
118    <stack>
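
The conversion described above can be sketched in a few lines of Python.  This
is only an illustration, not the kernel implementation: ``three_regions()`` and
its input format (a list of ``(start, end)`` pairs sorted by start address,
standing in for the mappings of one process) are hypothetical names assumed
for this example::

    def three_regions(mappings):
        """Return three (start, end) regions that cover every mapping."""
        if len(mappings) < 3:
            raise ValueError("need at least three mappings")
        # Gap after each mapping: (gap size, index of the mapping before it).
        gaps = [(mappings[i + 1][0] - mappings[i][1], i)
                for i in range(len(mappings) - 1)]
        # Keep the two biggest gaps, then order them by address.
        first, second = sorted(i for _, i in sorted(gaps, reverse=True)[:2])
        return [
            (mappings[0][0], mappings[first][1]),
            (mappings[first + 1][0], mappings[second][1]),
            (mappings[second + 1][0], mappings[-1][1]),
        ]

With the mappings ``(0x1000, 0x5000)``, ``(0x6000, 0x7000)``,
``(0x900000, 0x920000)`` and ``(0xf00000, 0xf10000)``, the small gap between
the first two mappings stays inside the first region, while the two large gaps
are excluded from the monitoring target.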
119
120
121PTE Accessed-bit Based Access Check
122-----------------------------------
123
Both of the implementations for physical and virtual address spaces use the PTE
Accessed-bit for basic access checks.  The only difference is the way of
finding the relevant PTE Accessed bit(s) from the address.  While the
implementation for the virtual address walks the page table for the target task
of the address, the implementation for the physical address walks every page
table having a mapping to the address.  In this way, the implementations find
and clear the bit(s) for the next sampling target address and check whether the
bit(s) are set again after one sampling period.  This could disturb other
kernel subsystems using the Accessed bits, namely Idle page tracking and the
reclaim logic.  DAMON does nothing to avoid disturbing Idle page tracking, so
handling the interference is the responsibility of sysadmins.  However, it
solves the conflict with the reclaim logic using ``PG_idle`` and ``PG_young``
page flags, as Idle page tracking does.
137
138.. _damon_design_addr_unit:
139
140Address Unit
141------------
142
DAMON core layer uses ``unsigned long`` type for monitoring target address
ranges.  In some cases, the address space for a given operations set could be
too large to be handled with the type.  ARM (32-bit) with large physical
address extension is an example.  For such cases, a per-operations set
parameter called ``address unit`` is provided.  It represents the scale factor
that needs to be multiplied with the core layer's address to calculate the real
address on the given address space.  Support of the ``address unit`` parameter
is up to each operations set implementation.  ``paddr`` is the only operations
set implementation that supports the parameter.
152
153If the value is smaller than ``PAGE_SIZE``, only a power of two should be used.
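
The scaling can be illustrated with the short Python sketch below.  The
function names and the 4 KiB ``PAGE_SIZE`` are assumptions made for this
example only, and accepting any value at or above ``PAGE_SIZE`` is also an
assumption, since the text above only constrains smaller values::

    PAGE_SIZE = 4096  # assumed page size for this example

    def is_valid_addr_unit(addr_unit):
        """Check the power-of-two rule for units smaller than PAGE_SIZE."""
        if addr_unit <= 0:
            return False
        if addr_unit < PAGE_SIZE:
            return addr_unit & (addr_unit - 1) == 0
        return True

    def to_real_address(core_addr, addr_unit):
        """Scale a core-layer address into the real address."""
        return core_addr * addr_unit

For example, with an address unit of 4096, the largest 32-bit core-layer
address ``0xFFFFFFFF`` maps to the real address ``0xFFFFFFFF000``, which is
beyond what a 32-bit ``unsigned long`` can hold directly.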
154
155.. _damon_core_logic:
156
157Core Logics
158===========
159
160.. _damon_design_monitoring:
161
162Monitoring
163----------
164
165Below four sections describe each of the DAMON core mechanisms and the five
166monitoring attributes, ``sampling interval``, ``aggregation interval``,
167``update interval``, ``minimum number of regions``, and ``maximum number of
168regions``.
169
170Note that ``minimum number of regions`` must be 3 or higher. This is because the
171virtual address space monitoring is designed to handle at least three regions to
172accommodate two large unmapped areas commonly found in normal virtual address
173spaces. While this restriction might not be strictly necessary for other
174operation sets like ``paddr``, it is currently enforced across all DAMON
175operations for consistency.
176
177To know how user-space can set the attributes via :ref:`DAMON sysfs interface
178<sysfs_interface>`, refer to :ref:`monitoring_attrs <sysfs_monitoring_attrs>`
179part of the documentation.
180
181
182Access Frequency Monitoring
183~~~~~~~~~~~~~~~~~~~~~~~~~~~
184
The output of DAMON says which pages are accessed how frequently for a given
duration.  The resolution of the access frequency is controlled by setting
``sampling interval`` and ``aggregation interval``.  In detail, DAMON checks
access to each page per ``sampling interval`` and aggregates the results.  In
other words, it counts the number of accesses to each page.  After each
``aggregation interval`` passes, DAMON calls callback functions that were
previously registered by users so that users can read the aggregated results,
and then clears the results.  This can be described in the simple pseudo-code
below::
193
    while monitoring_on:
        # Check every page once per sampling interval.
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        # Report and reset the counters once per aggregation interval.
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        sleep(sampling_interval)
204
205The monitoring overhead of this mechanism will arbitrarily increase as the
206size of the target workload grows.
207
208
209.. _damon_design_region_based_sampling:
210
211Region Based Sampling
212~~~~~~~~~~~~~~~~~~~~~
213
To avoid the unbounded increase of the overhead, DAMON groups adjacent pages
that are assumed to have the same access frequencies into a region.  As long as
the assumption (pages in a region have the same access frequencies) is kept,
only one page in the region needs to be checked.  Thus, for each ``sampling
interval``, DAMON randomly picks one page in each region, waits for one
``sampling interval``, checks whether the page is accessed meanwhile, and
increases the access frequency counter of the region if so.  The counter is
called ``nr_accesses`` of the region.  Therefore, the monitoring overhead is
controllable by setting the number of regions.  DAMON allows users to set the
minimum and the maximum number of regions for the trade-off.
224
225This scheme, however, cannot preserve the quality of the output if the
226assumption is not guaranteed.
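
A minimal Python sketch of one sampling pass is shown below.  It is only an
illustration under assumptions: the ``Region`` class, ``sample_once()`` and
the ``accessed`` callback (which stands in for the operations set's access
check over the last ``sampling interval``) are hypothetical names, and the
4 KiB page size is assumed::

    import random

    PAGE_SIZE = 4096  # assumed page size for this example

    class Region:
        def __init__(self, start, end):
            self.start, self.end = start, end
            self.nr_accesses = 0

    def sample_once(regions, accessed, rng=random):
        """Check one randomly picked page per region, whatever its size."""
        for region in regions:
            nr_pages = (region.end - region.start) // PAGE_SIZE
            page = region.start + rng.randrange(nr_pages) * PAGE_SIZE
            if accessed(page):
                region.nr_accesses += 1

The cost of one pass depends only on the number of regions, not on the number
of pages they cover, which is why bounding the number of regions bounds the
overhead.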
227
228
229.. _damon_design_adaptive_regions_adjustment:
230
231Adaptive Regions Adjustment
232~~~~~~~~~~~~~~~~~~~~~~~~~~~
233
Even if the initial monitoring target regions are somehow well constructed to
fulfill the assumption (pages in the same region have similar access
frequencies), the data access pattern can change dynamically.  This will result
in low monitoring quality.  To keep the assumption as much as possible, DAMON
adaptively merges and splits each region based on their access frequency.

For each ``aggregation interval``, it compares the access frequencies
(``nr_accesses``) of adjacent regions.  If the difference is small, and if the
sum of the two regions' sizes is smaller than the size of total regions divided
by the ``minimum number of regions``, DAMON merges the two regions.  If the
resulting number of total regions is still higher than ``maximum number of
regions``, it repeats the merging with an increasing access frequency
difference threshold until the upper limit of the number of regions is met, or
the threshold becomes higher than the possible maximum value (``aggregation
interval`` divided by ``sampling interval``).  Then, after it reports and
clears the aggregated access frequency of each region, it splits each region
into two or three regions if the total number of regions will not exceed the
user-specified maximum number of regions after the split.
252
253In this way, DAMON provides its best-effort quality and minimal overhead while
254keeping the bounds users set for their trade-off.
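
The merging half of this mechanism can be sketched in Python as below.  This
is a simplified illustration rather than the kernel code.  All names are
hypothetical, and two details are assumptions of this sketch: the merged
region takes the size-weighted average of the two ``nr_accesses`` values, and
the threshold is doubled on each retry::

    from dataclasses import dataclass, replace

    @dataclass
    class Region:
        start: int
        end: int
        nr_accesses: int

        @property
        def size(self):
            return self.end - self.start

    def merge_pass(regions, threshold, size_limit):
        """Merge adjacent regions with similar nr_accesses in one pass."""
        merged = [replace(regions[0])]
        for r in regions[1:]:
            last = merged[-1]
            if (last.end == r.start
                    and abs(last.nr_accesses - r.nr_accesses) <= threshold
                    and last.size + r.size < size_limit):
                # Assumed: the merged counter is the size-weighted average.
                total = last.size + r.size
                last.nr_accesses = (last.nr_accesses * last.size
                                    + r.nr_accesses * r.size) // total
                last.end = r.end
            else:
                merged.append(replace(r))
        return merged

    def merge_regions(regions, threshold, min_nr_regions, max_nr_regions,
                      max_nr_accesses):
        """Repeat merging until the region count fits or the threshold maxes."""
        size_limit = sum(r.size for r in regions) // min_nr_regions
        while True:
            regions = merge_pass(regions, threshold, size_limit)
            if len(regions) <= max_nr_regions or threshold > max_nr_accesses:
                return regions
            # Assumed growth rule: double the threshold on each retry.
            threshold = max(1, threshold * 2)

Here ``max_nr_accesses`` corresponds to ``aggregation interval`` divided by
``sampling interval``.  The split step is omitted from this sketch.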
255
256
257.. _damon_design_age_tracking:
258
259Age Tracking
260~~~~~~~~~~~~
261
By analyzing the monitoring results, users can also find how long the current
access pattern of a region has been maintained.  That could be used for a good
understanding of the access pattern.  For example, a page placement algorithm
utilizing both the frequency and the recency could be implemented using that.
To make such analysis of how long an access pattern has been maintained easier,
DAMON maintains yet another counter called ``age`` in each region.  For each
``aggregation interval``, DAMON checks if the region's size and access
frequency (``nr_accesses``) have significantly changed.  If so, the counter is
reset to zero.  Otherwise, the counter is increased.
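
The update rule can be sketched in Python as follows.  The function name and
both thresholds are hypothetical; what counts as a "significant" change is an
assumption of this example, not a statement about the kernel's criteria::

    def next_age(age, old, new, nr_threshold=1, size_threshold=0):
        """Return the region's age after one aggregation interval.

        `old` and `new` are (size, nr_accesses) pairs of the region before
        and after the interval.  Both thresholds are assumed tuning knobs.
        """
        old_size, old_nr = old
        new_size, new_nr = new
        changed = (abs(new_size - old_size) > size_threshold
                   or abs(new_nr - old_nr) > nr_threshold)
        return 0 if changed else age + 1

A region that keeps its size and access frequency therefore accumulates age
by one per ``aggregation interval``, and any significant change restarts the
count.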
271
272
273Dynamic Target Space Updates Handling
274~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
275
The monitoring target address range could change dynamically.  For example,
virtual memory could be dynamically mapped and unmapped.  Physical memory could
be hot-plugged.

As the changes could be quite frequent in some cases, DAMON lets the monitoring
operations check dynamic changes, including memory mapping changes, and apply
them to monitoring operations-related data structures such as the abstracted
monitoring target memory area only once per user-specified time interval
(``update interval``).
285
286User-space can get the monitoring results via DAMON sysfs interface and/or
287tracepoints.  For more details, please refer to the documentations for
288:ref:`DAMOS tried regions <sysfs_schemes_tried_regions>` and :ref:`tracepoint`,
289respectively.
290
291
292.. _damon_design_monitoring_params_tuning_guide:
293
294Monitoring Parameters Tuning Guide
295~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
296
In short, set ``aggregation interval`` to capture a meaningful amount of
accesses for the purpose.  The amount of accesses can be measured using
``nr_accesses`` and ``age`` of regions in the aggregated monitoring results
snapshot.  The default value of the interval, ``100ms``, turns out to be too
short in many cases.  Set ``sampling interval`` proportional to ``aggregation
interval``.  By default, ``1/20`` is recommended as the ratio.
303
``Aggregation interval`` should be set to a time interval within which the
workload can make the amount of accesses needed for the monitoring purpose.
If the interval is too short, only a small number of accesses are captured.  As
a result, the monitoring results look as if everything is equally rarely
accessed.  For many purposes, that would be useless.  If it is too long,
however, the time to converge regions with the :ref:`regions adjustment
mechanism <damon_design_adaptive_regions_adjustment>` can be too long,
depending on the time scale of the given purpose.  This could happen if the
workload is actually making only rare accesses but the user sets the amount of
accesses required for the monitoring purpose too high.  For such cases, the
target amount of accesses to capture per ``aggregation interval`` should be
carefully reconsidered.  Also, note that the captured amount of accesses is
represented with not only ``nr_accesses``, but also ``age``.  For example, even
if every region in the monitoring results shows zero ``nr_accesses``, regions
could still be distinguished using ``age`` values as the recency information.

Hence the optimum value of ``aggregation interval`` depends on the access
intensiveness of the workload.  The user should tune the interval based on the
amount of accesses that is captured on each aggregated snapshot of the
monitoring results.
324
325Note that the default value of the interval is 100 milliseconds, which is too
326short in many cases, especially on large systems.
327
``Sampling interval`` defines the resolution of each aggregation.  If it is set
too large, monitoring results will look as if every region was equally rarely
accessed, or equally frequently accessed.  That is, regions become
indistinguishable based on access pattern, and therefore the results will be
useless in many use cases.  If ``sampling interval`` is too small, it will not
degrade the resolution, but will increase the monitoring overhead.  If it is
already small enough to provide a resolution of the monitoring results that is
sufficient for the given purpose, it shouldn't be lowered further.  It is
recommended to set it proportional to ``aggregation interval``.  By default,
the ratio is set as ``1/20``, and it is still recommended.
338
Based on the manual tuning guide, DAMON provides a more intuitive knob-based
intervals auto-tuning mechanism.  Please refer to :ref:`the design document of
the feature <damon_design_monitoring_intervals_autotuning>` for detail.

Refer to the document below for an example tuning based on the above guide.
344
345.. toctree::
346   :maxdepth: 1
347
348   monitoring_intervals_tuning_example
349
350
351.. _damon_design_monitoring_intervals_autotuning:
352
353Monitoring Intervals Auto-tuning
354~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
355
DAMON provides automatic tuning of the ``sampling interval`` and ``aggregation
interval`` based on :ref:`the tuning guide idea
<damon_design_monitoring_params_tuning_guide>`.  The tuning mechanism allows
users to set the aimed amount of access events to observe via DAMON within a
given time interval.  The target can be specified by the user as a ratio of
DAMON-observed access events to the theoretical maximum amount of the events
(``access_bp``) that is measured within a given number of aggregations
(``aggrs``).
364
The DAMON-observed access events are calculated in byte granularity based on
DAMON :ref:`region assumption <damon_design_region_based_sampling>`.  For
example, if a region of size ``X`` bytes of ``Y`` ``nr_accesses`` is found, it
means ``X * Y`` access events are observed by DAMON.  The theoretical maximum
access events for the region are calculated in the same way, but replacing
``Y`` with the theoretical maximum ``nr_accesses``, which can be calculated as
``aggregation interval / sampling interval``.
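
The arithmetic above can be written down as a short Python sketch.  The
function name and the ``(size, nr_accesses)`` tuple format are hypothetical,
and the sketch covers a single aggregated snapshot rather than ``aggrs``
aggregations.  The result is expressed in bp (1/10,000)::

    def observed_access_bp(regions, sampling_interval_us,
                           aggregation_interval_us):
        """Ratio of observed to theoretical maximum access events, in bp.

        `regions` is a list of (size_in_bytes, nr_accesses) pairs.
        """
        max_nr_accesses = aggregation_interval_us // sampling_interval_us
        observed = sum(size * nr for size, nr in regions)
        maximum = sum(size * max_nr_accesses for size, _ in regions)
        return observed * 10000 // maximum

For example, with a 5 ms ``sampling interval`` and a 100 ms ``aggregation
interval``, the maximum ``nr_accesses`` is 20.  A 1000-byte region with
``nr_accesses`` of 20 next to a 3000-byte region with zero gives 20,000
observed events out of a possible 80,000, that is, 2500 bp.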
372
The mechanism calculates the ratio of access events for ``aggrs`` aggregations,
and increases or decreases the ``sampling interval`` and ``aggregation
interval`` in the same ratio, if the observed access ratio is lower or higher
than the target, respectively.  The ratio of the intervals change is decided in
proportion to the distance between the current samples ratio and the target
ratio.

The user can further set the minimum and maximum ``sampling interval`` that can
be set by the tuning mechanism using two parameters (``min_sample_us`` and
``max_sample_us``).  Because the tuning mechanism always changes ``sampling
interval`` and ``aggregation interval`` in the same ratio, the minimum and
maximum ``aggregation interval`` after each of the tuning changes are
automatically set together.
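
One tuning step could look like the Python sketch below.  The exact update
rule is an assumption of this example: a linear factor with a hypothetical
lower bound of 0.1 stands in for "in proportion to the distance", and the
function name is invented.  Only the direction of the change, the shared
ratio, and the ``min_sample_us``/``max_sample_us`` clamping follow the text
above::

    def tune_intervals(sample_us, aggr_us, observed_bp, target_bp,
                       min_sample_us, max_sample_us):
        """Return the (sampling, aggregation) intervals after one step."""
        # Assumed rule: grow when under the target, shrink when over it.
        factor = max(0.1, 1 + (target_bp - observed_bp) / target_bp)
        new_sample_us = round(sample_us * factor)
        # Keep the sampling interval within the user-set bounds.
        new_sample_us = min(max(new_sample_us, min_sample_us), max_sample_us)
        # Scale the aggregation interval by the same ratio.
        new_aggr_us = aggr_us * new_sample_us // sample_us
        return new_sample_us, new_aggr_us

Because the aggregation interval is derived from the clamped sampling
interval, bounding the latter bounds the former as well.  In this sketch,
clamping a 5 ms sampling interval to 4 ms turns a 100 ms aggregation interval
into 80 ms.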
385
The tuning is turned off by default, and needs to be set explicitly by the user.
As a rule of thumb based on the Pareto principle, a 4% access samples ratio
target is recommended.  Note that the Pareto principle (80/20 rule) is applied
twice.  That is, it assumes a 4% (20% of 20%) DAMON-observed access events
ratio (source) captures 64% (80% multiplied by 80%) of real access events
(outcomes).
391
392To know how user-space can use this feature via :ref:`DAMON sysfs interface
393<sysfs_interface>`, refer to :ref:`intervals_goal
394<damon_usage_sysfs_monitoring_intervals_goal>` part of the documentation.
395
396
397.. _damon_design_damos:
398
399Operation Schemes
400-----------------
401
402One common purpose of data access monitoring is access-aware system efficiency
403optimizations.  For example,
404
405    paging out memory regions that are not accessed for more than two minutes
406
407or
408
409    using THP for memory regions that are larger than 2 MiB and showing a high
410    access frequency for more than one minute.
411
412One straightforward approach for such schemes would be profile-guided
413optimizations.  That is, getting data access monitoring results of the
414workloads or the system using DAMON, finding memory regions of special
415characteristics by profiling the monitoring results, and making system
416operation changes for the regions.  The changes could be made by modifying or
417providing advice to the software (the application and/or the kernel), or
418reconfiguring the hardware.  Both offline and online approaches could be
419available.
420
Among those, providing advice to the kernel at runtime would be flexible and
effective, and therefore widely used.  However, implementing such schemes
could impose unnecessary redundancy and inefficiency.  The profiling could be
redundant if the type of interest is common.  Exchanging the information
including monitoring results and operation advice between kernel and user
spaces could be inefficient.
427
To allow users to reduce such redundancy and inefficiencies by offloading the
work, DAMON provides a feature called Data Access Monitoring-based Operation
430Schemes (DAMOS).  It lets users specify their desired schemes at a high
431level.  For such specifications, DAMON starts monitoring, finds regions having
432the access pattern of interest, and applies the user-desired operation actions
433to the regions, for every user-specified time interval called
434``apply_interval``.
435
436To know how user-space can set ``apply_interval`` via :ref:`DAMON sysfs
437interface <sysfs_interface>`, refer to :ref:`apply_interval_us <sysfs_scheme>`
438part of the documentation.
439
440
441.. _damon_design_damos_action:
442
443Operation Action
444~~~~~~~~~~~~~~~~
445
446The management action that the users desire to apply to the regions of their
447interest.  For example, paging out, prioritizing for next reclamation victim
448selection, advising ``khugepaged`` to collapse or split, or doing nothing but
449collecting statistics of the regions.
450
451The list of supported actions is defined in DAMOS, but the implementation of
452each action is in the DAMON operations set layer because the implementation
453normally depends on the monitoring target address space.  For example, the code
454for paging specific virtual address ranges out would be different from that for
455physical address ranges.  And the monitoring operations implementation sets are
456not mandated to support all actions of the list.  Hence, the availability of
457specific DAMOS action depends on what operations set is selected to be used
458together.
459
The list of the supported actions, their meaning, and DAMON operations sets
that support each action are as below.
462
463 - ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
464   Supported by ``vaddr`` and ``fvaddr`` operations set.
465 - ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``.
466   Supported by ``vaddr`` and ``fvaddr`` operations set.
467 - ``pageout``: Reclaim the region.
468   Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
469 - ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``.
470   Supported by ``vaddr`` and ``fvaddr`` operations set. When
471   TRANSPARENT_HUGEPAGE is disabled, the application of the action will just
472   fail.
473 - ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``.
474   Supported by ``vaddr`` and ``fvaddr`` operations set. When
475   TRANSPARENT_HUGEPAGE is disabled, the application of the action will just
476   fail.
477 - ``lru_prio``: Prioritize the region on its LRU lists.
478   Supported by ``paddr`` operations set.
479 - ``lru_deprio``: Deprioritize the region on its LRU lists.
480   Supported by ``paddr`` operations set.
481 - ``migrate_hot``: Migrate the regions prioritizing warmer regions.
482   Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
483 - ``migrate_cold``: Migrate the regions prioritizing colder regions.
484   Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
485 - ``stat``: Do nothing but count the statistics.
486   Supported by all operations sets.
487
488Applying the actions except ``stat`` to a region is considered as changing the
489region's characteristics.  Hence, DAMOS resets the age of regions when any such
490actions are applied to those.
491
492To know how user-space can set the action via :ref:`DAMON sysfs interface
493<sysfs_interface>`, refer to :ref:`action <sysfs_scheme>` part of the
494documentation.
495
496
497.. _damon_design_damos_access_pattern:
498
499Target Access Pattern
500~~~~~~~~~~~~~~~~~~~~~
501
502The access pattern of the schemes' interest.  The patterns are constructed with
503the properties that DAMON's monitoring results provide, specifically the size,
504the access frequency, and the age.  Users can describe their access pattern of
505interest by setting minimum and maximum values of the three properties.  If a
506region's three properties are in the ranges, DAMOS classifies it as one of the
507regions that the scheme is having an interest in.
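
The classification can be sketched in Python as below.  The ``AccessPattern``
class and its field names are hypothetical, and treating both ends of each
range as inclusive is an assumption of this example::

    from dataclasses import dataclass

    @dataclass
    class AccessPattern:
        min_size: int
        max_size: int
        min_nr_accesses: int
        max_nr_accesses: int
        min_age: int
        max_age: int

        def matches(self, size, nr_accesses, age):
            """True if all three properties fall within their ranges."""
            return (self.min_size <= size <= self.max_size
                    and self.min_nr_accesses <= nr_accesses
                    <= self.max_nr_accesses
                    and self.min_age <= age <= self.max_age)

For instance, a pattern for cold memory could accept any size of at least
4096 bytes, require ``nr_accesses`` of exactly zero, and require a minimum
``age``, leaving the maximum ``age`` effectively unbounded.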
508
509To know how user-space can set the access pattern via :ref:`DAMON sysfs
510interface <sysfs_interface>`, refer to :ref:`access_pattern
511<sysfs_access_pattern>` part of the documentation.
512
513
514.. _damon_design_damos_quotas:
515
516Quotas
517~~~~~~
518
519DAMOS upper-bound overhead control feature.  DAMOS could incur high overhead if
520the target access pattern is not properly tuned.  For example, if a huge memory
521region having the access pattern of interest is found, applying the scheme's
522action to all pages of the huge region could consume unacceptably large system
523resources.  Preventing such issues by tuning the access pattern could be
524challenging, especially if the access patterns of the workloads are highly
525dynamic.
526
To mitigate that situation, DAMOS provides an upper-bound overhead control
feature called quotas.  It lets users specify an upper limit of time that DAMOS
can use for applying the action, and/or a maximum number of bytes of memory
regions that the action can be applied to, within a user-specified time
duration.
531
532To know how user-space can set the basic quotas via :ref:`DAMON sysfs interface
533<sysfs_interface>`, refer to :ref:`quotas <sysfs_quotas>` part of the
534documentation.
535
536
537.. _damon_design_damos_quotas_prioritization:
538
539Prioritization
540^^^^^^^^^^^^^^
541
542A mechanism for making a good decision under the quotas.  When the action
543cannot be applied to all regions of interest due to the quotas, DAMOS
544prioritizes regions and applies the action to only regions having high enough
545priorities so that it will not exceed the quotas.
546
547The prioritization mechanism should be different for each action.  For example,
548rarely accessed (colder) memory regions would be prioritized for page-out
549scheme action.  In contrast, the colder regions would be deprioritized for huge
550page collapse scheme action.  Hence, the prioritization mechanisms for each
551action are implemented in each DAMON operations set, together with the actions.
552
553Though the implementation is up to the DAMON operations set, it would be common
554to calculate the priority using the access pattern properties of the regions.
555Some users would want the mechanisms to be personalized for their specific
556case.  For example, some users would want the mechanism to weigh the recency
557(``age``) more than the access frequency (``nr_accesses``).  DAMOS allows users
558to specify the weight of each access pattern property and passes the
559information to the underlying mechanism.  Nevertheless, how and even whether
560the weight will be respected are up to the underlying prioritization mechanism
561implementation.
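
As an illustration of prioritization under a size quota, consider the Python
sketch below for a page-out style action.  Everything here is hypothetical:
the scoring formula, the function names, and the choice to sort regions and
skip those that do not fit are assumptions of this example, not a description
of how any DAMON operations set computes or applies priorities::

    def coldness(region, freq_weight=1, age_weight=1):
        """Assumed score: older and less accessed regions score higher."""
        _size, nr_accesses, age = region
        return age_weight * age - freq_weight * nr_accesses

    def select_under_quota(regions, quota_bytes, score=coldness):
        """Pick the highest-scored regions that fit in the size quota.

        `regions` is a list of (size_in_bytes, nr_accesses, age) tuples.
        """
        selected, used = [], 0
        for region in sorted(regions, key=score, reverse=True):
            size = region[0]
            if used + size <= quota_bytes:
                selected.append(region)
                used += size
        return selected

Raising ``age_weight`` relative to ``freq_weight`` makes the recency dominate
the ordering, which is the kind of personalization the weights are meant to
allow.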
562
563To know how user-space can set the prioritization weights via :ref:`DAMON sysfs
564interface <sysfs_interface>`, refer to :ref:`weights <sysfs_quotas>` part of
565the documentation.
566
567
568.. _damon_design_damos_quotas_auto_tuning:
569
570Aim-oriented Feedback-driven Auto-tuning
571^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
572
573Automatic feedback-driven quota tuning.  Instead of setting the absolute quota
574value, users can specify the metric of their interest, and what target value
575they want the metric value to be.  DAMOS then automatically tunes the
576aggressiveness (the quota) of the corresponding scheme.  For example, if DAMOS
is under-achieving the goal, DAMOS automatically increases the quota.  If DAMOS
is over-achieving the goal, it decreases the quota.
579
580There are two such tuning algorithms that users can select as they need.
581
582- ``consist``: A proportional feedback loop based algorithm.  Tries to find an
583  optimum quota that should be consistently kept, to keep achieving the goal.
584  Useful for kernel-only operation on dynamic and long-running environments.
585  This is the default selection.  If unsure, use this.
- ``temporal``: More straightforward algorithm.  Tries to achieve the goal as
  fast as possible, using the maximum allowed quota, but only for a temporary
  short time.  While the goal is under-achieved, this algorithm keeps tuning
  the quota to the maximum allowed one.  Once the goal is [over]-achieved, this
  sets the quota to zero.  Useful for environments that require deterministic
  control.
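
The difference between the two algorithms can be sketched in Python as below.
The function names, the ``min_quota`` floor and the exact proportional formula
are assumptions of this example.  It also assumes a metric for which a larger
value means the goal is closer, so that "under-achieving" means the current
value is below the target::

    def tune_quota_consist(quota, current_value, target_value, min_quota=1):
        """Proportional feedback: move the quota by the relative error."""
        adjust = (target_value - current_value) / target_value
        return max(min_quota, int(quota * (1 + adjust)))

    def tune_quota_temporal(current_value, target_value, max_quota):
        """All-or-nothing: maximum quota until the goal is met, then zero."""
        return max_quota if current_value < target_value else 0

With a target of 100, a current value of 50 raises a quota of 1000 to 1500
under the assumed ``consist`` formula, while the assumed ``temporal`` rule
jumps straight to the maximum allowed quota and drops to zero once the
current value reaches 100.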
591
The goal can be specified with five parameters, namely ``target_metric``,
``target_value``, ``current_value``, ``nid`` and ``path``.  The auto-tuning
mechanism tries to make ``current_value`` of ``target_metric`` be the same as
``target_value``.  The supported values of ``target_metric`` are as below.

- ``user_input``: User-provided value.  Users could use any metric that they
  have interest in for the value.  User-space main workload's latency or
  throughput, or system metrics like free memory ratio or memory pressure stall
  time (PSI) could be examples.  Note that users should explicitly set
  ``current_value`` on their own in this case.  In other words, users should
  repeatedly provide the feedback.
- ``some_mem_psi_us``: System-wide ``some`` memory pressure stall information
  in microseconds that is measured from the last quota reset to the next quota
  reset.  DAMOS does the measurement on its own, so only ``target_value`` needs
  to be set by users at the initial time.  In other words, DAMOS does
  self-feedback.
607- ``node_mem_used_bp``: Specific NUMA node's used memory ratio in bp (1/10,000).
608- ``node_mem_free_bp``: Specific NUMA node's free memory ratio in bp (1/10,000).
609- ``node_memcg_used_bp``: Specific cgroup's node used memory ratio for a
610  specific NUMA node, in bp (1/10,000).
611- ``node_memcg_free_bp``: Specific cgroup's node unused memory ratio for a
612  specific NUMA node, in bp (1/10,000).
613- ``active_mem_bp``: Active to active + inactive (LRU) memory size ratio in bp
614  (1/10,000).
615- ``inactive_mem_bp``: Inactive to active + inactive (LRU) memory size ratio in
616  bp (1/10,000).
617
``nid`` is required only for ``node_mem_used_bp``, ``node_mem_free_bp``,
``node_memcg_used_bp`` and ``node_memcg_free_bp``, to point to the specific
NUMA node.

``path`` is required only for ``node_memcg_used_bp`` and
``node_memcg_free_bp``, to point to the cgroup.  The value should be the path
of the memory cgroup from the cgroups mount point.
625
To know how user-space can set the tuning goal metric, the target value, and/or
the current value via :ref:`DAMON sysfs interface <sysfs_interface>`, refer to
:ref:`quota goals <sysfs_schemes_quota_goals>` part of the documentation.


.. _damon_design_damos_watermarks:

Watermarks
~~~~~~~~~~

Conditional DAMOS (de)activation automation.  Users might want DAMOS to run
only under certain situations.  For example, when a sufficient amount of free
memory is guaranteed, running a scheme for proactive reclamation would only
consume system resources unnecessarily.  To avoid such consumption, the user
would need to manually monitor some metrics such as free memory ratio, and turn
DAMON/DAMOS on or off.

DAMOS allows users to offload such work using three watermarks.  It allows the
users to configure the metric of their interest, and three watermark values,
namely high, mid, and low.  If the value of the metric becomes above the high
watermark or below the low watermark, the scheme is deactivated.  If the metric
becomes below the mid watermark but above the low watermark, the scheme is
activated.  If all schemes are deactivated by the watermarks, the monitoring is
also deactivated.  In this case, the DAMON worker thread only periodically
checks the watermarks and therefore incurs nearly zero overhead.

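The watermark rules can be modeled with a small Python sketch.  The text above
does not say what happens when the metric is between the mid and the high
watermarks, so this sketch assumes the scheme simply keeps its current state
there.  The handling of values exactly equal to a watermark is also an
assumption, and ``next_activation`` is a hypothetical name.

```python
def next_activation(metric: int, high: int, mid: int, low: int,
                    active: bool) -> bool:
    """Return whether the scheme should be active after a watermark check."""
    # Above the high watermark or below the low watermark: deactivate.
    if metric > high or metric < low:
        return False
    # Below the mid watermark (and not below low): activate.
    if metric < mid:
        return True
    # Between mid and high: keep the current state (assumption of this sketch).
    return active
```

With watermarks of 900 (high), 500 (mid) and 100 (low), a metric value of 400
activates an inactive scheme, while values of 950 or 50 deactivate it.
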
To know how user-space can set the watermarks via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`watermarks <sysfs_watermarks>` part of the
documentation.


.. _damon_design_damos_filters:

Filters
~~~~~~~

Non-access pattern-based target memory regions filtering.  If users run
self-written programs or have good profiling tools, they could know something
more than the kernel, such as future access patterns or some special
requirements for specific types of memory.  For example, some users may know
only anonymous pages can impact their program's performance.  They can also
have a list of latency-critical processes.

To let users optimize DAMOS schemes with such special knowledge, DAMOS provides
a feature called DAMOS filters.  The feature allows users to set an arbitrary
number of filters for each scheme.  Each filter specifies

- a type of memory (``type``),
- whether it is for the memory of the type or all except the type
  (``matching``), and
- whether it is to allow (include) or reject (exclude) applying
  the scheme's action to the memory (``allow``).

For efficient handling of filters, some types of filters are handled by the
core layer, while others are handled by the operations set.  In the latter
case, hence, support of the filter types depends on the DAMON operations set.
In the case of core layer-handled filters, the memory regions that are excluded
by the filter are not counted as regions that the scheme has tried to be
applied to.  In contrast, if a memory region is filtered by an operations set
layer-handled filter, it is counted as tried.  This difference affects the
statistics.

When multiple filters are installed, the group of filters that are handled by
the core layer is evaluated first.  After that, the group of filters that are
handled by the operations layer is evaluated.  Filters in each of the groups
are evaluated in the installed order.  If a part of memory matches one of the
filters, the following filters are ignored.  If the part passes through the
filters evaluation stage because it does not match any of the filters, whether
the scheme's action is applied to it depends on the last filter's allowance
type.  If the last filter was for allowing, the part of memory will be
rejected, and vice versa.

For example, let's assume 1) a filter for allowing anonymous pages and 2)
another filter for rejecting young pages are installed in that order.  If a
page of a region that is eligible to apply the scheme's action is an anonymous
page, the scheme's action will be applied to the page regardless of whether it
is young or not, since it matches the first allow-filter.  If the page is not
anonymous but young, the scheme's action will not be applied, since the second
reject-filter blocks it.  If the page is neither anonymous nor young, the page
will pass through the filters evaluation stage since there is no matching
filter, and the action will be applied to the page.

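The evaluation order described above can be sketched in Python as follows.
This is only a model of the rules in this section, not the kernel
implementation.  The ``Filter`` class, the dictionary-based page
representation, and the choice to allow everything when no filter is installed
are assumptions of this sketch.  The split between core layer-handled and
operations layer-handled groups is not modeled; callers are expected to pass
the filters already in their evaluation order.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Filter:
    is_type: Callable[[dict], bool]  # models ``type``: is the page of the type?
    matching: bool                   # True: the type itself, False: all except it
    allow: bool                      # True: allow the action, False: reject it


def action_allowed(page: dict, filters: Sequence[Filter]) -> bool:
    """Return whether the scheme's action may be applied to the page."""
    for f in filters:
        # The first filter that the page matches decides; later ones are ignored.
        if f.is_type(page) == f.matching:
            return f.allow
    if not filters:
        return True  # assumption: nothing installed means nothing is rejected
    # No filter matched: the opposite of the last filter's allowance type.
    return not filters[-1].allow


# The example from the text: allow anonymous pages, then reject young pages.
EXAMPLE_FILTERS = [
    Filter(is_type=lambda p: p["anon"], matching=True, allow=True),
    Filter(is_type=lambda p: p["young"], matching=True, allow=False),
]
```

Only a page that is young but not anonymous is rejected by
``EXAMPLE_FILTERS``, which agrees with the three cases walked through above.
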
The following filter ``type`` values are currently supported.

- Core layer handled
    - addr
        - Applied to pages that belong to a given address range.
    - target
        - Applied to pages that belong to a given DAMON monitoring target.
- Operations layer handled, supported by only ``paddr`` operations set.
    - anon
        - Applied to pages that contain data that is not stored in files.
    - active
        - Applied to active pages.
    - memcg
        - Applied to pages that belong to a given cgroup.
    - young
        - Applied to pages that are accessed after the last access check from the
          scheme.
    - hugepage_size
        - Applied to pages that are managed in a given size range.
    - unmapped
        - Applied to pages that are unmapped.

To know how user-space can set the filters via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`filters <sysfs_filters>` part of the
documentation.

.. _damon_design_damos_stat:

Statistics
~~~~~~~~~~

The statistics of DAMOS behaviors that are designed to help monitoring, tuning
and debugging of DAMOS.

DAMOS accounts the following statistics for each scheme, from the beginning of
the scheme's execution.

- ``nr_tried``: Total number of regions that the scheme is tried to be applied
  to.
- ``sz_tried``: Total size of regions that the scheme is tried to be applied
  to.
- ``sz_ops_filter_passed``: Total bytes that passed operations set
  layer-handled DAMOS filters.
- ``nr_applied``: Total number of regions that the scheme is applied to.
- ``sz_applied``: Total size of regions that the scheme is applied to.
- ``qt_exceeds``: Total number of times the quota of the scheme has been
  exceeded.
- ``nr_snapshots``: Total number of DAMON snapshots that the scheme is tried to
  be applied to.
- ``max_nr_snapshots``: Upper limit of ``nr_snapshots``.

"A scheme is tried to be applied to a region" means DAMOS core logic determined
the region is eligible to apply the scheme's :ref:`action
<damon_design_damos_action>`.  The :ref:`access pattern
<damon_design_damos_access_pattern>`, :ref:`quotas
<damon_design_damos_quotas>`, :ref:`watermarks
<damon_design_damos_watermarks>`, and :ref:`filters
<damon_design_damos_filters>` that are handled by the core logic could affect
this.  The core logic will only ask the underlying :ref:`operations set
<damon_operations_set>` to apply the action to the region, so whether the
action is really applied or not is unclear.  That's why it is called "tried".

"A scheme is applied to a region" means the :ref:`operations set
<damon_operations_set>` has applied the action to at least a part of the
region.  The :ref:`filters <damon_design_damos_filters>` that are handled by
the operations set, and the types of the :ref:`action
<damon_design_damos_action>` and the pages of the region can affect this.  For
example, if a filter is set to exclude anonymous pages and the region has only
anonymous pages, or if the action is ``pageout`` while all pages of the region
are unreclaimable, applying the action to the region will fail.

Unlike normal stats, ``max_nr_snapshots`` is set by users.  If it is set as
non-zero and ``nr_snapshots`` becomes the same as or greater than
``max_nr_snapshots``, the scheme is deactivated.

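The difference between "tried" and "applied", and the role of
``max_nr_snapshots``, can be illustrated with a toy accounting model in Python.
This is a sketch under assumptions, not the kernel code: ``SchemeStats``,
``apply_to_snapshot`` and the ``ops_apply`` callback are hypothetical names,
``sz_applied`` is assumed to accumulate the bytes that the callback reports,
and ``sz_ops_filter_passed`` and ``qt_exceeds`` are left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class SchemeStats:
    nr_tried: int = 0
    sz_tried: int = 0
    nr_applied: int = 0
    sz_applied: int = 0
    nr_snapshots: int = 0
    max_nr_snapshots: int = 0  # set by the user; 0 means no limit

    def deactivated(self) -> bool:
        """True once a non-zero snapshot limit has been reached."""
        return (self.max_nr_snapshots != 0
                and self.nr_snapshots >= self.max_nr_snapshots)


def apply_to_snapshot(stats: SchemeStats, eligible_sizes: Iterable[int],
                      ops_apply: Callable[[int], int]) -> None:
    """Account one snapshot.

    eligible_sizes: sizes of the regions the core logic found eligible.
    ops_apply: stands in for the operations set; takes a region size and
    returns the number of bytes the action was really applied to.
    """
    if stats.deactivated():
        return
    for size in eligible_sizes:
        stats.nr_tried += 1          # the core logic asked for the action
        stats.sz_tried += size
        applied = ops_apply(size)
        if applied > 0:              # at least a part of the region was handled
            stats.nr_applied += 1
            stats.sz_applied += applied
    stats.nr_snapshots += 1
```

For example, if the callback fails for one of two eligible regions,
``nr_tried`` becomes 2 while ``nr_applied`` becomes 1.
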
To know how user-space can read the stats via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`stats <sysfs_stats>` part of the
documentation.

Regions Walking
~~~~~~~~~~~~~~~

A DAMOS feature that allows users to access each region that a DAMOS action has
just been applied to.  Using this feature, DAMON :ref:`API <damon_design_api>`
allows users to access full properties of the regions, including the access
monitoring results and the amount of the region's internal memory that passed
the DAMOS filters.  :ref:`DAMON sysfs interface <sysfs_interface>` also allows
users to read the data via special :ref:`files <sysfs_schemes_tried_regions>`.

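As a hedged illustration of the sysfs side, the Python sketch below reads
per-region files from a directory.  It assumes that the directory contains one
numbered sub-directory per region and that each of them holds files named
``start``, ``end``, ``nr_accesses``, ``age`` and ``sz_filter_passed`` with
plain numeric content.  These names and the layout are this example's
assumptions and should be checked against the sysfs interface document before
use.  ``read_tried_regions`` is a hypothetical helper, not a part of DAMON.

```python
from pathlib import Path

# File names assumed for each region directory; verify against the ABI document.
REGION_FILES = ("start", "end", "nr_accesses", "age", "sz_filter_passed")


def read_tried_regions(tried_regions_dir):
    """Return a list of dicts, one per numbered region sub-directory."""
    root = Path(tried_regions_dir)
    region_dirs = [d for d in root.iterdir() if d.is_dir() and d.name.isdigit()]
    regions = []
    for region_dir in sorted(region_dirs, key=lambda d: int(d.name)):
        region = {}
        for name in REGION_FILES:
            text = (region_dir / name).read_text().strip()
            region[name] = int(text, 0)  # accepts decimal or 0x-prefixed values
        regions.append(region)
    return regions
```

The caller passes the path of the tried regions directory of the scheme of
interest.  Reading the files requires sufficient privileges, and the directory
is expected to be filled by the interface beforehand.
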
.. _damon_design_api:

Application Programming Interface
---------------------------------

The programming interface for kernel space data access-aware applications.
DAMON is a framework, so it does nothing by itself.  Instead, it only helps
other kernel components such as subsystems and modules build their data
access-aware applications using DAMON's core features.  For this, DAMON exposes
all its features to other kernel components via its application programming
interface, namely ``include/linux/damon.h``.  Please refer to the API
:doc:`document </mm/damon/api>` for details of the interface.


.. _damon_modules:

Modules
=======

Because the core of DAMON is a framework for kernel components, it doesn't
provide any direct interface for the user space.  Such interfaces should be
implemented by each DAMON API user kernel component, instead.  The DAMON
subsystem itself implements such DAMON API user modules, which are supposed to
be used for general purpose DAMON control and special purpose data access-aware
system operations, and provides stable application binary interfaces (ABI) for
the user space.  The user space can build efficient data access-aware
applications using the interfaces.


General Purpose User Interface Modules
--------------------------------------

DAMON modules that provide user space ABIs for general purpose DAMON usage in
runtime.

Like many other ABIs, the modules create files on pseudo file systems like
'sysfs', and allow users to specify their requests to and get the answers from
DAMON by writing to and reading from the files.  As a response to such I/O,
DAMON user interface modules control DAMON and retrieve the results as the user
requested via the DAMON API, and return the results to the user-space.

The ABIs are designed to be used for user space applications development,
rather than human beings' fingers.  Human users are recommended to use user
space tools that are built on the ABIs.  One such Python-written user space
tool is available at GitHub (https://github.com/damonitor/damo), PyPI
(https://pypistats.org/packages/damo), and multiple distros
(https://repology.org/project/damo/versions).

Currently, one module of this type, namely 'DAMON sysfs interface', is
available.  Please refer to the ABI :ref:`doc <sysfs_interface>` for details of
the interfaces.


.. _damon_modules_special_purpose:

Special-Purpose Access-aware Kernel Modules
-------------------------------------------

DAMON modules that provide user space ABI for specific purpose DAMON usage.

DAMON user interface modules are for full control of all DAMON features in
runtime.  For each special-purpose system-wide data access-aware system
operation such as proactive reclamation or LRU lists balancing, the interfaces
could be simplified by removing unnecessary knobs for the specific purpose, and
extended for boot-time and even compile time control.  Default values of DAMON
control parameters for the usage would also need to be optimized for the
purpose.

To support such cases, yet more DAMON API user kernel modules that provide
simpler and optimized user space interfaces are available.  Currently, modules
for purposes including proactive reclamation and LRU lists manipulation are
provided.  For more detail, please read the usage documents for those
(:doc:`/admin-guide/mm/damon/stat`, :doc:`/admin-guide/mm/damon/reclaim` and
:doc:`/admin-guide/mm/damon/lru_sort`).

.. _damon_design_special_purpose_modules_exclusivity:

Note that these modules currently run in an exclusive manner.  If one of those
is already running, others will return ``-EBUSY`` upon start requests.

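The exclusive behavior can be pictured with the toy Python model below.  It is
not how the kernel implements the check.  ``ExclusiveModules`` and its methods
are hypothetical names, and treating a repeated start request from the already
running module as a success is an assumption of this sketch.

```python
import errno


class ExclusiveModules:
    """Toy model: at most one special-purpose module runs at a time."""

    def __init__(self):
        self.running = None  # name of the running module, if any

    def start(self, name: str) -> int:
        """Return 0 on success, or -EBUSY if another module is running."""
        if self.running is not None and self.running != name:
            return -errno.EBUSY
        self.running = name
        return 0

    def stop(self, name: str) -> None:
        """Stop the module if it is the one that is running."""
        if self.running == name:
            self.running = None
```

In this model, a start request for a second module fails with ``-EBUSY`` until
the first module is stopped.
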
Sample DAMON Modules
--------------------

DAMON modules that provide example DAMON kernel API usages.

Kernel programmers can build their own special or general purpose DAMON modules
using DAMON kernel API.  To help them easily understand how DAMON kernel API
can be used, a few sample modules are provided under ``samples/damon/`` of the
Linux source tree.  Please note that these modules are not developed for being
used on real products, but only for showing how DAMON kernel API can be used in
simple ways.
