1.. SPDX-License-Identifier: GPL-2.0
2
3======
4Design
5======
6
7
8.. _damon_design_execution_model_and_data_structures:
9
10Execution Model and Data Structures
11===================================
12
13The monitoring-related information including the monitoring request
14specification and DAMON-based operation schemes are stored in a data structure
15called DAMON ``context``.  DAMON executes each context with a kernel thread
16called ``kdamond``.  Multiple kdamonds could run in parallel, for different
17types of monitoring.
18
19To know how user-space can do the configurations and start/stop DAMON, refer to
20:ref:`DAMON sysfs interface <sysfs_interface>` documentation.
21
22
23Overall Architecture
24====================
25
The DAMON subsystem consists of three layers:

- :ref:`Operations Set <damon_operations_set>`: Implements fundamental
  operations for DAMON that depend on the given monitoring target
  address space and the available set of software/hardware primitives,
- :ref:`Core <damon_core_logic>`: Implements core logic including monitoring
  overhead/accuracy control and access-aware system operations on top of the
  operations set layer, and
- :ref:`Modules <damon_modules>`: Implements kernel modules for various
  purposes that provide interfaces for the user space, on top of the core
  layer.
37
38
39.. _damon_operations_set:
40
41Operations Set Layer
42====================
43
44.. _damon_design_configurable_operations_set:
45
For data access monitoring and additional low level work, DAMON needs a set of
implementations for specific operations that are dependent on and optimized for
the given target address space.  For example, the following two operations for
access monitoring are address-space dependent.
50
511. Identification of the monitoring target address range for the address space.
522. Access check of specific address range in the target space.
53
DAMON consolidates these implementations in a layer called DAMON Operations
Set, and defines the interface between it and the upper layer.  The upper layer
is dedicated to DAMON's core logic, including the mechanism for controlling the
monitoring accuracy and the overhead.
58
59Hence, DAMON can easily be extended for any address space and/or available
60hardware features by configuring the core logic to use the appropriate
61operations set.  If there is no available operations set for a given purpose, a
62new operations set can be implemented following the interface between the
63layers.
64
65For example, physical memory, virtual memory, swap space, those for specific
66processes, NUMA nodes, files, and backing memory devices would be supportable.
67Also, if some architectures or devices support special optimized access check
68features, those will be easily configurable.
69
DAMON currently provides the following three operations sets.  The three
subsections below describe how they work.
72
73 - vaddr: Monitor virtual address spaces of specific processes
74 - fvaddr: Monitor fixed virtual address ranges
75 - paddr: Monitor the physical address space of the system
76
77To know how user-space can do the configuration via :ref:`DAMON sysfs interface
78<sysfs_interface>`, refer to :ref:`operations <sysfs_context>` file part of the
79documentation.
80
81
.. _damon_design_vaddr_target_regions_construction:
83
84VMA-based Target Address Range Construction
85-------------------------------------------
86
87A mechanism of ``vaddr`` DAMON operations set that automatically initializes
88and updates the monitoring target address regions so that entire memory
89mappings of the target processes can be covered.
90
91This mechanism is only for the ``vaddr`` operations set.  In cases of
92``fvaddr`` and ``paddr`` operation sets, users are asked to manually set the
93monitoring target address ranges.
94
Only small parts of the huge virtual address space of a process are mapped to
physical memory and accessed.  Thus, tracking the unmapped address regions is
wasteful.  However, because DAMON can deal with some level of noise using the
adaptive regions adjustment mechanism, tracking every mapping is not strictly
required, and could even incur a high overhead in some cases.  That said,
excessively large unmapped areas inside the monitoring target should be removed
so that the adaptive mechanism does not spend time on them.

For this reason, this implementation converts the complex mappings to three
distinct regions that together cover every mapped area of the address space.
The two gaps between the three regions are the two biggest unmapped areas in
the given address space.  In most cases, these are the gap between the heap and
the uppermost mmap()-ed region, and the gap between the lowermost mmap()-ed
region and the stack.  Because these gaps are exceptionally large in typical
address spaces, excluding them is sufficient to make a reasonable trade-off.
The layout is shown below::
111
    <heap>
    <BIG UNMAPPED REGION 1>
    <uppermost mmap()-ed region>
    (small mmap()-ed regions and munmap()-ed regions)
    <lowermost mmap()-ed region>
    <BIG UNMAPPED REGION 2>
    <stack>
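
The conversion to three regions can be illustrated with a minimal Python
sketch.  It is not the kernel implementation: the function name
``three_regions`` and the example layout are hypothetical, and the input is
assumed to be a sorted list of non-overlapping ``(start, end)`` mappings.

.. code-block:: python

    def three_regions(mappings):
        """Cover sorted, non-overlapping (start, end) mappings with three regions.

        The two largest gaps between consecutive mappings are left uncovered.
        """
        if len(mappings) < 3:
            raise ValueError("need at least three mappings")
        # Gap i lies between mappings[i] and mappings[i + 1].
        gaps = [(mappings[i + 1][0] - mappings[i][1], i)
                for i in range(len(mappings) - 1)]
        # Indices of the two biggest gaps, restored to address order.
        first, second = sorted(i for _, i in sorted(gaps, reverse=True)[:2])
        return [
            (mappings[0][0], mappings[first][1]),
            (mappings[first + 1][0], mappings[second][1]),
            (mappings[second + 1][0], mappings[-1][1]),
        ]


    # Hypothetical layout: heap, three mmap()-ed areas, stack.
    maps = [(10, 20), (500, 510), (515, 530), (532, 540), (900, 910)]
    regions = three_regions(maps)

In this example the small gaps between the mmap()-ed areas stay inside the
middle region, and only the two large gaps are excluded.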
119
120
121PTE Accessed-bit Based Access Check
122-----------------------------------
123
Both of the implementations for physical and virtual address spaces use the PTE
Accessed-bit for basic access checks.  The only difference is the way of
finding the relevant PTE Accessed bit(s) from the address.  While the
implementation for the virtual address walks the page table of the target task
of the address, the implementation for the physical address walks every page
table having a mapping to the address.  In this way, the implementations find
and clear the bit(s) for the next sampling target address, and check whether
the bit(s) are set again after one sampling period.  This could disturb other
kernel subsystems using the Accessed bits, namely Idle page tracking and the
reclaim logic.  DAMON does nothing to avoid disturbing Idle page tracking, so
handling the interference is the responsibility of sysadmins.  However, it
solves the conflict with the reclaim logic using the ``PG_idle`` and
``PG_young`` page flags, as Idle page tracking does.
137
138.. _damon_design_addr_unit:
139
140Address Unit
141------------
142
The DAMON core layer uses the ``unsigned long`` type for monitoring target
address ranges.  In some cases, the address space for a given operations set
could be too large to be handled with the type.  32-bit ARM with the large
physical address extension is an example.  For such cases, a per-operations set
parameter called ``address unit`` is provided.  It represents the scale factor
by which the core layer's address needs to be multiplied to calculate the real
address on the given address space.  Support of the ``address unit`` parameter
is up to each operations set implementation.  ``paddr`` is the only operations
set implementation that supports the parameter.
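
The scaling can be shown with a small arithmetic sketch.  The helper names,
the example address and the ``address unit`` value of 4096 are all
hypothetical; only the multiplication rule comes from the description above.

.. code-block:: python

    ULONG_MAX_32 = 2**32 - 1  # largest value of a 32-bit unsigned long


    def core_to_real(core_addr, addr_unit):
        """Scale a core-layer address to an address in the target space."""
        return core_addr * addr_unit


    def real_to_core(real_addr, addr_unit):
        """Scale a real address down so that it fits the core layer's type."""
        return real_addr // addr_unit


    # A 40-bit physical address does not fit in a 32-bit unsigned long.
    real = 0xAB_CDEF_0000
    addr_unit = 4096  # hypothetical choice: one core-layer unit per 4 KiB
    core = real_to_core(real, addr_unit)

With this unit, the core layer stores ``0xABCDEF0``, which fits in 32 bits, and
multiplying it by the unit gives back the original address.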
152
153.. _damon_core_logic:
154
155Core Logics
156===========
157
158.. _damon_design_monitoring:
159
160Monitoring
161----------
162
163Below four sections describe each of the DAMON core mechanisms and the five
164monitoring attributes, ``sampling interval``, ``aggregation interval``,
165``update interval``, ``minimum number of regions``, and ``maximum number of
166regions``.
167
168To know how user-space can set the attributes via :ref:`DAMON sysfs interface
169<sysfs_interface>`, refer to :ref:`monitoring_attrs <sysfs_monitoring_attrs>`
170part of the documentation.
171
172
173Access Frequency Monitoring
174~~~~~~~~~~~~~~~~~~~~~~~~~~~
175
The output of DAMON says which pages are accessed how frequently for a given
duration.  The resolution of the access frequency is controlled by setting
``sampling interval`` and ``aggregation interval``.  In detail, DAMON checks
access to each page per ``sampling interval`` and aggregates the results.  In
other words, it counts the number of accesses to each page.  After each
``aggregation interval`` passes, DAMON calls callback functions that were
previously registered by users so that users can read the aggregated results,
and then clears the results.  This can be described with the simple pseudo-code
below::
184
    while monitoring_on:
        # Check every page once per sampling interval.
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        # At each aggregation interval boundary, report and reset.
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        sleep(sampling_interval)
195
196The monitoring overhead of this mechanism will arbitrarily increase as the
197size of the target workload grows.
198
199
200.. _damon_design_region_based_sampling:
201
202Region Based Sampling
203~~~~~~~~~~~~~~~~~~~~~
204
To avoid the unbounded increase of the overhead, DAMON groups adjacent pages
that are assumed to have the same access frequencies into a region.  As long as
the assumption (pages in a region have the same access frequencies) holds, only
one page in the region needs to be checked.  Thus, for each ``sampling
interval``, DAMON randomly picks one page in each region, waits for one
``sampling interval``, checks whether the page was accessed meanwhile, and
increases the access frequency counter of the region if so.  The counter is
called ``nr_accesses`` of the region.  Therefore, the monitoring overhead is
controllable by setting the number of regions.  DAMON allows users to set the
minimum and the maximum number of regions for the trade-off.
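
A minimal sketch of this sampling loop follows.  The ``Region`` class, the
helper names and the set of accessed pages are hypothetical stand-ins for the
real access check; the point is that each sampling interval costs one check per
region rather than one per page.

.. code-block:: python

    import random


    class Region:
        def __init__(self, start, end):
            self.start, self.end = start, end  # page numbers, end exclusive
            self.nr_accesses = 0
            self.sample = None


    def prepare_checks(regions, rng):
        """Pick one random page per region to represent the whole region."""
        for r in regions:
            r.sample = rng.randrange(r.start, r.end)


    def check_accesses(regions, accessed_pages):
        """After one sampling interval, credit regions whose sample was hit."""
        for r in regions:
            if r.sample in accessed_pages:
                r.nr_accesses += 1


    rng = random.Random(0)
    regions = [Region(0, 100), Region(100, 200)]
    hot_pages = set(range(0, 100))  # hypothetical: only the first region is hot
    for _ in range(20):  # 20 sampling intervals make one aggregation interval
        prepare_checks(regions, rng)
        check_accesses(regions, hot_pages)

Because every page of the first region is accessed and no page of the second
one is, the sampled page is representative and the counters end at 20 and 0.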
215
216This scheme, however, cannot preserve the quality of the output if the
217assumption is not guaranteed.
218
219
220.. _damon_design_adaptive_regions_adjustment:
221
222Adaptive Regions Adjustment
223~~~~~~~~~~~~~~~~~~~~~~~~~~~
224
Even if the initial monitoring target regions are somehow well constructed to
fulfill the assumption (pages in the same region have similar access
frequencies), the data access pattern can change dynamically.  This will result
in low monitoring quality.  To keep the assumption as much as possible, DAMON
adaptively merges and splits each region based on their access frequency.

For each ``aggregation interval``, it compares the access frequencies
(``nr_accesses``) of adjacent regions.  If the difference is small, and if the
sum of the two regions' sizes is smaller than the size of total regions divided
by the ``minimum number of regions``, DAMON merges the two regions.  If the
resulting number of total regions is still higher than the ``maximum number of
regions``, it repeats the merging with an increasing access frequency
difference threshold until the upper limit of the number of regions is met, or
the threshold becomes higher than the possible maximum value (``aggregation
interval`` divided by ``sampling interval``).  Then, after it reports and clears
the aggregated access frequency of each region, it splits each region into two
or three regions if the total number of regions will not exceed the
user-specified maximum number of regions after the split.
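
The merging step can be sketched as below.  This is an illustration under
stated assumptions, not the kernel code: regions are ``(start, end,
nr_accesses)`` tuples, the merged counter is a size-weighted average, and the
threshold is doubled on each retry.  The text above only says that the
threshold increases, so the averaging and the doubling are choices made for
this sketch.  Splitting is not covered.

.. code-block:: python

    def merge_pass(regions, threshold, size_limit):
        """Merge adjacent (start, end, nr_accesses) regions in one sweep."""
        merged = [regions[0]]
        for start, end, nr in regions[1:]:
            p_start, p_end, p_nr = merged[-1]
            similar = abs(p_nr - nr) <= threshold
            small = (end - p_start) < size_limit
            if p_end == start and similar and small:
                # Assumed rule: size-weighted average of the two counters.
                size_p, size_c = p_end - p_start, end - start
                avg = (p_nr * size_p + nr * size_c) // (size_p + size_c)
                merged[-1] = (p_start, end, avg)
            else:
                merged.append((start, end, nr))
        return merged


    def merge_regions(regions, min_nr, max_nr, max_nr_accesses, threshold=1):
        """Repeat merging with a growing threshold until max_nr is met."""
        total = sum(end - start for start, end, _ in regions)
        size_limit = total // min_nr
        while True:
            regions = merge_pass(regions, threshold, size_limit)
            if len(regions) <= max_nr or threshold > max_nr_accesses:
                return regions
            threshold *= 2  # assumed growth rule


    snapshot = [(0, 4, 5), (4, 8, 6), (8, 24, 18), (24, 40, 0)]
    result = merge_regions(snapshot, min_nr=2, max_nr=10, max_nr_accesses=20)

Here only the first two regions are merged: their counters differ by one and
their combined size (8) is below the limit of 20, while the remaining neighbors
differ too much.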
243
244In this way, DAMON provides its best-effort quality and minimal overhead while
245keeping the bounds users set for their trade-off.
246
247
248.. _damon_design_age_tracking:
249
250Age Tracking
251~~~~~~~~~~~~
252
By analyzing the monitoring results, users can also find how long the current
access pattern of a region has been maintained.  That could be used for a
better understanding of the access pattern.  For example, a page placement
algorithm utilizing both the frequency and the recency could be implemented
using that.  To make such analysis of how long an access pattern has been
maintained easier, DAMON maintains yet another counter called ``age`` in each
region.  For each ``aggregation interval``, DAMON checks if the region's size
and access frequency (``nr_accesses``) have significantly changed.  If so, the
counter is reset to zero.  Otherwise, the counter is increased.
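
The counter update can be sketched as follows.  What counts as a significant
change is not specified here, so the two thresholds (``freq_thres`` and
``size_thres``) and the dictionary layout are assumptions of this example.

.. code-block:: python

    def update_age(region, nr_accesses, size, freq_thres, size_thres):
        """Update a region's age after one aggregation interval."""
        changed = (abs(nr_accesses - region["nr_accesses"]) > freq_thres
                   or abs(size - region["size"]) > size_thres)
        # Reset on a significant change, otherwise keep counting.
        region["age"] = 0 if changed else region["age"] + 1
        region["nr_accesses"], region["size"] = nr_accesses, size


    region = {"size": 100, "nr_accesses": 10, "age": 0}
    ages = []
    for nr in (11, 10, 0):  # access frequency seen in three aggregations
        update_age(region, nr, 100, freq_thres=2, size_thres=10)
        ages.append(region["age"])

The age grows while the frequency stays within the threshold and drops back to
zero when the region suddenly becomes idle.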
262
263
264Dynamic Target Space Updates Handling
265~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
266
The monitoring target address range could change dynamically.  For example,
virtual memory could be dynamically mapped and unmapped.  Physical memory could
be hot-plugged.

As the changes could be quite frequent in some cases, DAMON lets the monitoring
operations check dynamic changes, including memory mapping changes, and apply
them to monitoring operations-related data structures, such as the abstracted
monitoring target memory area, only once per user-specified time interval
(``update interval``).
276
User-space can get the monitoring results via DAMON sysfs interface and/or
tracepoints.  For more details, please refer to the documentation for
:ref:`DAMOS tried regions <sysfs_schemes_tried_regions>` and :ref:`tracepoint`,
respectively.
281
282
283.. _damon_design_monitoring_params_tuning_guide:
284
285Monitoring Parameters Tuning Guide
286~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
287
In short, set ``aggregation interval`` to capture a meaningful amount of
accesses for the purpose.  The amount of accesses can be measured using
``nr_accesses`` and ``age`` of regions in the aggregated monitoring results
snapshot.  The default value of the interval, ``100ms``, turns out to be too
short in many cases.  Set ``sampling interval`` proportional to ``aggregation
interval``.  By default, ``1/20`` is recommended as the ratio.
294
``Aggregation interval`` should be set to a time interval within which the
workload can make the amount of accesses needed for the monitoring purpose.
If the interval is too short, only a small number of accesses are captured.  As
a result, the monitoring results look as if everything is equally rarely
accessed.  For many purposes, that would be useless.  If it is too long,
however, the time to converge regions with the :ref:`regions adjustment
mechanism <damon_design_adaptive_regions_adjustment>` can be too long,
depending on the time scale of the given purpose.  This could happen if the
workload is actually making only rare accesses but the user sets the amount of
accesses to capture for the monitoring purpose too high.  For such cases, the
target amount of accesses to capture per ``aggregation interval`` should be
carefully reconsidered.  Also, note that the captured amount of accesses is
represented with not only ``nr_accesses``, but also ``age``.  For example, even
if every region in the monitoring results shows zero ``nr_accesses``, regions
could still be distinguished using ``age`` values as the recency information.

Hence the optimum value of ``aggregation interval`` depends on the access
intensiveness of the workload.  The user should tune the interval based on the
amount of accesses that is captured in each aggregated snapshot of the
monitoring results.
315
316Note that the default value of the interval is 100 milliseconds, which is too
317short in many cases, especially on large systems.
318
``Sampling interval`` defines the resolution of each aggregation.  If it is set
too large, monitoring results will look as if every region was equally rarely
accessed, or equally frequently accessed.  That is, regions become
indistinguishable based on access pattern, and therefore the results will be
useless in many use cases.  If ``sampling interval`` is too small, it will not
degrade the resolution, but will increase the monitoring overhead.  If it
already provides a resolution of the monitoring results that is sufficient for
the given purpose, it shouldn't be lowered further.  It is recommended to set
it in proportion to ``aggregation interval``.  By default, the ratio is set as
``1/20``, and it is still recommended.

Based on the manual tuning guide, DAMON provides a more intuitive knob-based
intervals auto-tuning mechanism.  Please refer to :ref:`the design document of
the feature <damon_design_monitoring_intervals_autotuning>` for details.
333
334Refer to below documents for an example tuning based on the above guide.
335
336.. toctree::
337   :maxdepth: 1
338
339   monitoring_intervals_tuning_example
340
341
342.. _damon_design_monitoring_intervals_autotuning:
343
344Monitoring Intervals Auto-tuning
345~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
346
DAMON provides automatic tuning of the ``sampling interval`` and ``aggregation
interval`` based on :ref:`the tuning guide idea
<damon_design_monitoring_params_tuning_guide>`.  The tuning mechanism allows
users to set the aimed amount of access events to observe via DAMON within a
given time interval.  The target can be specified by the user as a ratio of
DAMON-observed access events to the theoretical maximum amount of the events
(``access_bp``) that is measured within a given number of aggregations
(``aggrs``).
355
The DAMON-observed access events are calculated in byte granularity based on
the DAMON :ref:`region assumption <damon_design_region_based_sampling>`.  For
example, if a region of size ``X`` bytes with ``Y`` ``nr_accesses`` is found,
it means ``X * Y`` access events are observed by DAMON.  The theoretical
maximum number of access events for the region is calculated in the same way,
but replacing ``Y`` with the theoretical maximum ``nr_accesses``, which can be
calculated as ``aggregation interval / sampling interval``.
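
The calculation can be written out as a short sketch.  The function name and
the ``(size_bytes, nr_accesses)`` input format are hypothetical, and for
brevity it covers a single aggregated snapshot rather than ``aggrs``
aggregations.

.. code-block:: python

    def access_bp(regions, sample_us, aggr_us):
        """Observed-to-maximum access events ratio in basis points (1/10,000)."""
        max_nr_accesses = aggr_us // sample_us
        observed = sum(size * nr for size, nr in regions)
        maximum = sum(size * max_nr_accesses for size, _ in regions)
        return observed * 10_000 // maximum


    # 10 hot pages seen on every sample, 90 pages never seen accessed.
    snapshot = [(4096 * 10, 20), (4096 * 90, 0)]
    ratio = access_bp(snapshot, sample_us=5_000, aggr_us=100_000)

With a 5 ms sampling interval and a 100 ms aggregation interval the maximum
``nr_accesses`` is 20, so this snapshot yields 1000 bp (10%).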
363
The mechanism calculates the ratio of access events for ``aggrs`` aggregations,
and increases or decreases the ``sampling interval`` and ``aggregation
interval`` by the same ratio, if the observed access ratio is lower or higher
than the target, respectively.  The ratio of the intervals change is decided in
proportion to the distance between the current samples ratio and the target
ratio.

The user can further set the minimum and maximum ``sampling interval`` that can
be set by the tuning mechanism using two parameters (``min_sample_us`` and
``max_sample_us``).  Because the tuning mechanism always changes ``sampling
interval`` and ``aggregation interval`` by the same ratio, the minimum and
maximum ``aggregation interval`` after each of the tuning changes are
automatically set together.
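
One tuning step could look like the sketch below.  The exact adjustment formula
is not given in this document, so scaling by ``target / observed`` is an
assumption made only for illustration, as is the function name.

.. code-block:: python

    def tune_intervals(sample_us, aggr_us, observed_bp, target_bp,
                       min_sample_us, max_sample_us):
        """Return new (sampling, aggregation) intervals in microseconds."""
        ratio = aggr_us // sample_us  # kept unchanged by the tuning
        # Assumed rule: a low observed ratio lengthens the intervals, a high
        # one shortens them, in proportion to the distance from the target.
        factor = target_bp / max(observed_bp, 1)
        new_sample = round(sample_us * factor)
        # Bounding the sampling interval bounds the aggregation interval too.
        new_sample = max(min_sample_us, min(new_sample, max_sample_us))
        return new_sample, new_sample * ratio


    longer = tune_intervals(5_000, 100_000, 100, 400, 5_000, 10_000_000)
    clamped = tune_intervals(5_000, 100_000, 1_600, 400, 5_000, 10_000_000)

In the first call too few access events were observed, so both intervals grow
fourfold.  In the second call the intervals would shrink, but the minimum
``sampling interval`` keeps them at their current values.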
376
The tuning is turned off by default, and needs to be enabled explicitly by the
user.  As a rule of thumb based on the Pareto principle, a 4% access samples
ratio target is recommended.  Note that the Pareto principle (80/20 rule) is
applied twice.  That is, it assumes that a 4% (20% of 20%) DAMON-observed
access events ratio (source) captures 64% (80% multiplied by 80%) of real
access events (outcomes).
382
383To know how user-space can use this feature via :ref:`DAMON sysfs interface
384<sysfs_interface>`, refer to :ref:`intervals_goal <sysfs_scheme>` part of
385the documentation.
386
387
388.. _damon_design_damos:
389
390Operation Schemes
391-----------------
392
393One common purpose of data access monitoring is access-aware system efficiency
394optimizations.  For example,
395
396    paging out memory regions that are not accessed for more than two minutes
397
398or
399
400    using THP for memory regions that are larger than 2 MiB and showing a high
401    access frequency for more than one minute.
402
403One straightforward approach for such schemes would be profile-guided
404optimizations.  That is, getting data access monitoring results of the
405workloads or the system using DAMON, finding memory regions of special
406characteristics by profiling the monitoring results, and making system
407operation changes for the regions.  The changes could be made by modifying or
408providing advice to the software (the application and/or the kernel), or
409reconfiguring the hardware.  Both offline and online approaches could be
410available.
411
Among those, providing advice to the kernel at runtime would be flexible and
effective, and therefore be widely used.  However, implementing such schemes
could impose unnecessary redundancy and inefficiency.  The profiling could be
redundant if the type of interest is common.  Exchanging the information
including monitoring results and operation advice between kernel and user
spaces could be inefficient.
418
To allow users to reduce such redundancy and inefficiencies by offloading the
work, DAMON provides a feature called Data Access Monitoring-based Operation
Schemes (DAMOS).  It lets users specify their desired schemes at a high
level.  For such specifications, DAMON starts monitoring, finds regions having
the access pattern of interest, and applies the user-desired operation actions
to the regions, for every user-specified time interval called
``apply_interval``.
426
427To know how user-space can set ``apply_interval`` via :ref:`DAMON sysfs
428interface <sysfs_interface>`, refer to :ref:`apply_interval_us <sysfs_scheme>`
429part of the documentation.
430
431
432.. _damon_design_damos_action:
433
434Operation Action
435~~~~~~~~~~~~~~~~
436
437The management action that the users desire to apply to the regions of their
438interest.  For example, paging out, prioritizing for next reclamation victim
439selection, advising ``khugepaged`` to collapse or split, or doing nothing but
440collecting statistics of the regions.
441
The list of supported actions is defined in DAMOS, but the implementation of
each action is in the DAMON operations set layer because the implementation
normally depends on the monitoring target address space.  For example, the code
for paging specific virtual address ranges out would be different from that for
physical address ranges.  Also, the operations set implementations are not
required to support all actions of the list.  Hence, the availability of a
specific DAMOS action depends on which operations set is selected to be used
together.
450
The list of the supported actions, their meaning, and the DAMON operations sets
that support each action are as below.
453
454 - ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
455   Supported by ``vaddr`` and ``fvaddr`` operations set.
456 - ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``.
457   Supported by ``vaddr`` and ``fvaddr`` operations set.
458 - ``pageout``: Reclaim the region.
459   Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
460 - ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``.
461   Supported by ``vaddr`` and ``fvaddr`` operations set.
462 - ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``.
463   Supported by ``vaddr`` and ``fvaddr`` operations set.
464 - ``lru_prio``: Prioritize the region on its LRU lists.
465   Supported by ``paddr`` operations set.
466 - ``lru_deprio``: Deprioritize the region on its LRU lists.
467   Supported by ``paddr`` operations set.
468 - ``migrate_hot``: Migrate the regions prioritizing warmer regions.
469   Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
470 - ``migrate_cold``: Migrate the regions prioritizing colder regions.
471   Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
472 - ``stat``: Do nothing but count the statistics.
473   Supported by all operations sets.
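
The list above can be restated as a lookup table, for example to validate a
scheme configuration before applying it.  The table and the helper
``action_supported`` are illustrative only and are not a DAMON interface.

.. code-block:: python

    ALL_OPS = {"vaddr", "fvaddr", "paddr"}

    # Action name -> operations sets that support it, as listed above.
    SUPPORTED_OPS = {
        "willneed": {"vaddr", "fvaddr"},
        "cold": {"vaddr", "fvaddr"},
        "pageout": ALL_OPS,
        "hugepage": {"vaddr", "fvaddr"},
        "nohugepage": {"vaddr", "fvaddr"},
        "lru_prio": {"paddr"},
        "lru_deprio": {"paddr"},
        "migrate_hot": ALL_OPS,
        "migrate_cold": ALL_OPS,
        "stat": ALL_OPS,
    }


    def action_supported(action, ops):
        """Return True if the operations set `ops` supports `action`."""
        return ops in SUPPORTED_OPS.get(action, set())

For instance, ``action_supported("lru_prio", "vaddr")`` is false, while
``action_supported("pageout", "paddr")`` is true.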
474
475Applying the actions except ``stat`` to a region is considered as changing the
476region's characteristics.  Hence, DAMOS resets the age of regions when any such
477actions are applied to those.
478
479To know how user-space can set the action via :ref:`DAMON sysfs interface
480<sysfs_interface>`, refer to :ref:`action <sysfs_scheme>` part of the
481documentation.
482
483
484.. _damon_design_damos_access_pattern:
485
486Target Access Pattern
487~~~~~~~~~~~~~~~~~~~~~
488
489The access pattern of the schemes' interest.  The patterns are constructed with
490the properties that DAMON's monitoring results provide, specifically the size,
491the access frequency, and the age.  Users can describe their access pattern of
492interest by setting minimum and maximum values of the three properties.  If a
493region's three properties are in the ranges, DAMOS classifies it as one of the
494regions that the scheme is having an interest in.
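
A minimal sketch of this range check follows.  The class and field names are
hypothetical, and the units used in the example (bytes for the size, numbers of
aggregation intervals for the age) are assumptions.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class AccessPattern:
        min_size: int
        max_size: int
        min_nr_accesses: int
        max_nr_accesses: int
        min_age: int
        max_age: int

        def matches(self, size, nr_accesses, age):
            """True if all three properties fall inside their ranges."""
            return (self.min_size <= size <= self.max_size
                    and self.min_nr_accesses <= nr_accesses <= self.max_nr_accesses
                    and self.min_age <= age <= self.max_age)


    # Hypothetical pattern: regions of at least 4 KiB that showed no access
    # for 1200 or more aggregation intervals.
    UNLIMITED = 2**64 - 1
    cold = AccessPattern(4096, UNLIMITED, 0, 0, 1200, UNLIMITED)

A region matches only when the size, the access frequency and the age are all
within the configured bounds.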
495
496To know how user-space can set the access pattern via :ref:`DAMON sysfs
497interface <sysfs_interface>`, refer to :ref:`access_pattern
498<sysfs_access_pattern>` part of the documentation.
499
500
501.. _damon_design_damos_quotas:
502
503Quotas
504~~~~~~
505
506DAMOS upper-bound overhead control feature.  DAMOS could incur high overhead if
507the target access pattern is not properly tuned.  For example, if a huge memory
508region having the access pattern of interest is found, applying the scheme's
509action to all pages of the huge region could consume unacceptably large system
510resources.  Preventing such issues by tuning the access pattern could be
511challenging, especially if the access patterns of the workloads are highly
512dynamic.
513
To mitigate that situation, DAMOS provides an upper-bound overhead control
feature called quotas.  It lets users specify an upper limit of time that DAMOS
can use for applying the action, and/or a maximum number of bytes of memory
regions to which the action can be applied, within a user-specified time
duration.
518
519To know how user-space can set the basic quotas via :ref:`DAMON sysfs interface
520<sysfs_interface>`, refer to :ref:`quotas <sysfs_quotas>` part of the
521documentation.
522
523
524.. _damon_design_damos_quotas_prioritization:
525
526Prioritization
527^^^^^^^^^^^^^^
528
529A mechanism for making a good decision under the quotas.  When the action
530cannot be applied to all regions of interest due to the quotas, DAMOS
531prioritizes regions and applies the action to only regions having high enough
532priorities so that it will not exceed the quotas.
533
534The prioritization mechanism should be different for each action.  For example,
535rarely accessed (colder) memory regions would be prioritized for page-out
536scheme action.  In contrast, the colder regions would be deprioritized for huge
537page collapse scheme action.  Hence, the prioritization mechanisms for each
538action are implemented in each DAMON operations set, together with the actions.
539
540Though the implementation is up to the DAMON operations set, it would be common
541to calculate the priority using the access pattern properties of the regions.
542Some users would want the mechanisms to be personalized for their specific
543case.  For example, some users would want the mechanism to weigh the recency
544(``age``) more than the access frequency (``nr_accesses``).  DAMOS allows users
545to specify the weight of each access pattern property and passes the
546information to the underlying mechanism.  Nevertheless, how and even whether
547the weight will be respected are up to the underlying prioritization mechanism
548implementation.
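
The interplay of weights and quotas can be sketched as below.  Both the scoring
formula and the greedy selection are assumptions made for this example; as
noted above, the real prioritization is up to each operations set.

.. code-block:: python

    def priority(region, weights, max_nr_accesses, max_age, prefer_cold):
        """Weighted score in [0, 1] from a region's nr_accesses and age."""
        freq = region["nr_accesses"] / max_nr_accesses
        if prefer_cold:
            freq = 1 - freq  # e.g. page-out favors rarely accessed regions
        age = min(region["age"], max_age) / max_age
        total = weights["nr_accesses"] + weights["age"]
        return (weights["nr_accesses"] * freq + weights["age"] * age) / total


    def select_under_quota(regions, quota_bytes, score):
        """Pick region names in priority order without exceeding the quota."""
        chosen, used = [], 0
        for region in sorted(regions, key=score, reverse=True):
            if used + region["size"] <= quota_bytes:
                chosen.append(region["name"])
                used += region["size"]
        return chosen


    regions = [
        {"name": "A", "size": 100, "nr_accesses": 0, "age": 50},
        {"name": "B", "size": 100, "nr_accesses": 10, "age": 50},
        {"name": "C", "size": 100, "nr_accesses": 0, "age": 5},
    ]


    def pick(weights):
        def score(region):
            return priority(region, weights, 20, 100, prefer_cold=True)
        return select_under_quota(regions, 200, score)


    equal = pick({"nr_accesses": 1, "age": 1})
    age_heavy = pick({"nr_accesses": 1, "age": 3})

With equal weights, the two never-accessed regions ``A`` and ``C`` win the
200-byte quota.  Weighing ``age`` three times more makes the older region ``B``
outrank the recently changed region ``C``.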
549
550To know how user-space can set the prioritization weights via :ref:`DAMON sysfs
551interface <sysfs_interface>`, refer to :ref:`weights <sysfs_quotas>` part of
552the documentation.
553
554
555.. _damon_design_damos_quotas_auto_tuning:
556
557Aim-oriented Feedback-driven Auto-tuning
558^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
559
Automatic feedback-driven quota tuning.  Instead of setting the absolute quota
value, users can specify the metric of their interest, and what target value
they want the metric value to be.  DAMOS then automatically tunes the
aggressiveness (the quota) of the corresponding scheme.  For example, if DAMOS
is under-achieving the goal, DAMOS automatically increases the quota.  If DAMOS
is over-achieving the goal, it decreases the quota.

The goal can be specified with four parameters, namely ``target_metric``,
``target_value``, ``current_value`` and ``nid``.  The auto-tuning mechanism
tries to make ``current_value`` of ``target_metric`` be the same as
``target_value``.  ``target_metric`` can be one of the following.
571
- ``user_input``: User-provided value.  Users could use any metric that they
  have an interest in for the value.  The latency or throughput of the main
  user-space workload, or system metrics like free memory ratio or memory
  pressure stall time (PSI) could be examples.  Note that users should
  explicitly set ``current_value`` on their own in this case.  In other words,
  users should repeatedly provide the feedback.
- ``some_mem_psi_us``: System-wide ``some`` memory pressure stall information
  in microseconds that is measured from the last quota reset to the next quota
  reset.  DAMOS does the measurement on its own, so only ``target_value`` needs
  to be set by users at the initial time.  In other words, DAMOS does
  self-feedback.
- ``node_mem_used_bp``: Specific NUMA node's used memory ratio in bp (1/10,000).
- ``node_mem_free_bp``: Specific NUMA node's free memory ratio in bp (1/10,000).

``nid`` is required only for ``node_mem_used_bp`` and ``node_mem_free_bp``, to
specify the NUMA node of interest.
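
The feedback loop can be illustrated with a simple proportional controller.
This is a sketch, not the algorithm DAMOS uses: the update rule and the name
``tune_quota`` are assumptions, ``target_value`` is assumed to be non-zero, and
the metric is assumed to grow as the scheme does more work.

.. code-block:: python

    def tune_quota(quota, current_value, target_value, min_quota=1):
        """Grow the quota while under-achieving, shrink it while over-achieving."""
        # Assumed rule: adjust in proportion to the relative distance
        # between the current value and the target value.
        error = (target_value - current_value) / target_value
        return max(min_quota, round(quota * (1 + error)))


    under = tune_quota(1000, current_value=50, target_value=100)
    over = tune_quota(1000, current_value=150, target_value=100)
    met = tune_quota(1000, current_value=100, target_value=100)

With ``user_input``, the caller would supply ``current_value`` before each
step.  With the other metrics, DAMOS measures it on its own.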
587
588To know how user-space can set the tuning goal metric, the target value, and/or
589the current value via :ref:`DAMON sysfs interface <sysfs_interface>`, refer to
590:ref:`quota goals <sysfs_schemes_quota_goals>` part of the documentation.
591
592
593.. _damon_design_damos_watermarks:
594
595Watermarks
596~~~~~~~~~~
597
598Conditional DAMOS (de)activation automation.  Users might want DAMOS to run
599only under certain situations.  For example, when a sufficient amount of free
600memory is guaranteed, running a scheme for proactive reclamation would only
601consume unnecessary system resources.  To avoid such consumption, the user would
602need to manually monitor some metrics such as free memory ratio, and turn
603DAMON/DAMOS on or off.
604
DAMOS allows users to offload such work using three watermarks.  It allows the
users to configure the metric of their interest, and three watermark values,
namely high, mid, and low.  If the value of the metric rises above the high
watermark or falls below the low watermark, the scheme is deactivated.  If the
metric falls below the mid watermark but stays above the low watermark, the
scheme is activated.  If all schemes are deactivated by the watermarks, the
monitoring is also deactivated.  In this case, the DAMON worker thread only
periodically checks the watermarks and therefore incurs nearly zero overhead.
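
The activation rule can be expressed as a small state function.  The function
name and the example values (free memory in per-thousand) are hypothetical.
Keeping the current state between the mid and the high watermark is how this
sketch reads the rule above.

.. code-block:: python

    def next_state(active, metric, high, mid, low):
        """Return whether the scheme is active after checking the metric."""
        if metric > high or metric < low:
            return False  # outside the watermarks: deactivate
        if low < metric < mid:
            return True   # below mid but above low: activate
        return active     # otherwise keep the current state


    # Hypothetical free memory ratio samples, in per-thousand.
    high, mid, low = 500, 400, 50
    active, states = False, []
    for metric in (600, 450, 300, 450, 550, 30):
        active = next_state(active, metric, high, mid, low)
        states.append(active)

The scheme stays off while the metric is between mid and high until the metric
first drops below mid, and then stays on until the metric leaves the range
between low and high.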
613
614To know how user-space can set the watermarks via :ref:`DAMON sysfs interface
615<sysfs_interface>`, refer to :ref:`watermarks <sysfs_watermarks>` part of the
616documentation.
617
618
619.. _damon_design_damos_filters:
620
621Filters
622~~~~~~~
623
624Non-access pattern-based target memory regions filtering.  If users run
625self-written programs or have good profiling tools, they could know something
626more than the kernel, such as future access patterns or some special
627requirements for specific types of memory. For example, some users may know
628only anonymous pages can impact their program's performance.  They can also
629have a list of latency-critical processes.
630
631To let users optimize DAMOS schemes with such special knowledge, DAMOS provides
632a feature called DAMOS filters.  The feature allows users to set an arbitrary
633number of filters for each scheme.  Each filter specifies
634
635- a type of memory (``type``),
636- whether it is for the memory of the type or all except the type
637  (``matching``), and
638- whether it is to allow (include) or reject (exclude) applying
639  the scheme's action to the memory (``allow``).
640
For efficient handling of filters, some types of filters are handled by the
core layer, while others are handled by the operations set.  In the latter
case, hence, support of the filter types depends on the DAMON operations set.
In case of the core layer-handled filters, the memory regions that are excluded
by the filter are not counted as regions that the scheme has tried to be
applied to.  In contrast, if a memory region is filtered by an operations set
layer-handled filter, it is counted as one that the scheme has tried.  This
difference affects the statistics.

When multiple filters are installed, the group of filters that are handled by
the core layer is evaluated first.  After that, the group of filters that are
handled by the operations layer is evaluated.  Filters in each of the groups
are evaluated in the installed order.  If a part of memory matches one of the
filters, the following filters are ignored.  If the part passes through the
filters evaluation stage because it does not match any of the filters, whether
the scheme's action is applied to it depends on the last filter's allowance
type.  If the last filter was for allowing, the part of memory will be
rejected, and vice versa.

For example, let's assume 1) a filter for allowing anonymous pages and 2)
another filter for rejecting young pages are installed in that order.  If a
page of a region that is eligible for the scheme's action is an anonymous page,
the scheme's action will be applied to the page regardless of whether it is
young or not, since it matches the first allow-filter.  If the page is
not anonymous but young, the scheme's action will not be applied, since the
second reject-filter blocks it.  If the page is neither anonymous nor young,
the page will pass through the filters evaluation stage since there is no
matching filter, and the action will be applied to the page.

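The evaluation order and the example above can be sketched as follows.  This is
a simplified model, not the kernel code: ``Filter`` and ``action_allowed()``
are hypothetical names, a page is represented only by the set of filter types
it matches, the core and operations layer groups are flattened into one ordered
list, and allowing everything when no filter is installed is an assumption.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Filter:
    """Hypothetical model of one DAMOS filter."""
    type: str       # e.g. "anon" or "young"
    matching: bool  # True: memory of the type; False: all except the type
    allow: bool     # True: allow the action; False: reject it


def action_allowed(page_types: Set[str], filters: List[Filter]) -> bool:
    """Decide whether the scheme's action may be applied to a page."""
    for f in filters:
        # The first filter that matches decides; later filters are ignored.
        if (f.type in page_types) == f.matching:
            return f.allow
    if not filters:
        # Assumption: with no filters installed, nothing is rejected.
        return True
    # No filter matched: the result is the opposite of the last filter's type.
    return not filters[-1].allow


# 1) allow anonymous pages, then 2) reject young pages.
filters = [Filter("anon", True, True), Filter("young", True, False)]
print(action_allowed({"anon", "young"}, filters))
print(action_allowed({"young"}, filters))
print(action_allowed(set(), filters))
```
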
The following filter ``type`` values are currently supported.

- Core layer handled
    - addr
        - Applied to pages that belong to a given address range.
    - target
        - Applied to pages that belong to a given DAMON monitoring target.
- Operations layer handled, supported by only ``paddr`` operations set.
    - anon
        - Applied to pages containing data that is not stored in files.
    - active
        - Applied to active pages.
    - memcg
        - Applied to pages that belong to a given cgroup.
    - young
        - Applied to pages that are accessed after the last access check from the
          scheme.
    - hugepage_size
        - Applied to pages that are managed in a given size range.
    - unmapped
        - Applied to pages that are unmapped.

To know how user-space can set the filters via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`filters <sysfs_filters>` part of the
documentation.

.. _damon_design_damos_stat:

Statistics
~~~~~~~~~~

The statistics of DAMOS behaviors, which are designed to help monitoring,
tuning, and debugging of DAMOS.

DAMOS accounts the following statistics for each scheme, from the beginning of
the scheme's execution.

- ``nr_tried``: Total number of regions that the scheme is tried to be applied
  to.
- ``sz_tried``: Total size of regions that the scheme is tried to be applied
  to.
- ``sz_ops_filter_passed``: Total bytes that passed operations set
  layer-handled DAMOS filters.
- ``nr_applied``: Total number of regions that the scheme is applied to.
- ``sz_applied``: Total size of regions that the scheme is applied to.
- ``qt_exceeds``: Total number of times the quota of the scheme has been
  exceeded.

"A scheme is tried to be applied to a region" means DAMOS core logic determined
the region is eligible to apply the scheme's :ref:`action
<damon_design_damos_action>`.  The :ref:`access pattern
<damon_design_damos_access_pattern>`, :ref:`quotas
<damon_design_damos_quotas>`, :ref:`watermarks
<damon_design_damos_watermarks>`, and :ref:`filters
<damon_design_damos_filters>` that are handled by the core logic could affect
this.  The core logic will only ask the underlying :ref:`operations set
<damon_operations_set>` to apply the action to the region, so whether the
action is really applied or not is unclear.  That's why it is called "tried".

"A scheme is applied to a region" means the :ref:`operations set
<damon_operations_set>` has applied the action to at least a part of the
region.  The :ref:`filters <damon_design_damos_filters>` that are handled by
the operations set, and the types of the :ref:`action
<damon_design_damos_action>` and the pages of the region can affect this.  For
example, if a filter is set to exclude anonymous pages and the region has only
anonymous pages, or if the action is ``pageout`` while all pages of the region
are unreclaimable, applying the action to the region will fail.

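The difference between "tried" and "applied" can be sketched as below.  This is
a minimal illustration rather than the kernel implementation: ``SchemeStat``
and ``account()`` are hypothetical names, and the sketch assumes the operations
set reports how many bytes of the region the action was really applied to.

```python
from dataclasses import dataclass


@dataclass
class SchemeStat:
    """Hypothetical per-scheme counters named after the statistics above."""
    nr_tried: int = 0
    sz_tried: int = 0
    nr_applied: int = 0
    sz_applied: int = 0


def account(stat: SchemeStat, region_sz: int, applied_sz: int) -> None:
    """Account one region that the core logic asked the operations set to handle.

    region_sz is the size of the region in bytes; applied_sz is the number of
    bytes the operations set reports the action was applied to (assumption).
    """
    # The core logic found the region eligible, so it always counts as tried.
    stat.nr_tried += 1
    stat.sz_tried += region_sz
    # It counts as applied only if at least a part of it was really handled.
    if applied_sz > 0:
        stat.nr_applied += 1
    stat.sz_applied += applied_sz


stat = SchemeStat()
account(stat, region_sz=16384, applied_sz=4096)  # partially applied
account(stat, region_sz=8192, applied_sz=0)      # tried, but nothing applied
print(stat)
```
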
To know how user-space can read the stats via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`stats <sysfs_stats>` part of the
documentation.

Regions Walking
~~~~~~~~~~~~~~~

A DAMOS feature that allows users to access each region that a DAMOS action has
just been applied to.  Using this feature, the DAMON :ref:`API
<damon_design_api>` allows users to access full properties of the regions,
including the access monitoring results and the amount of the region's internal
memory that passed the DAMOS filters.  The :ref:`DAMON sysfs interface
<sysfs_interface>` also allows users to read the data via special :ref:`files
<sysfs_schemes_tried_regions>`.

.. _damon_design_api:

Application Programming Interface
---------------------------------

The programming interface for kernel space data access-aware applications.
DAMON is a framework, so it does nothing by itself.  Instead, it only helps
other kernel components such as subsystems and modules build their data
access-aware applications using DAMON's core features.  For this, DAMON exposes
all of its features to other kernel components via its application programming
interface, namely ``include/linux/damon.h``.  Please refer to the API
:doc:`document </mm/damon/api>` for details of the interface.


.. _damon_modules:

Modules
=======

Because the core of DAMON is a framework for kernel components, it doesn't
provide any direct interface for the user space.  Such interfaces should be
implemented by each DAMON API user kernel component, instead.  The DAMON
subsystem itself implements such DAMON API user modules, which are supposed to
be used for general purpose DAMON control and special purpose data access-aware
system operations, and provides stable application binary interfaces (ABI) for
the user space.  The user space can build efficient data access-aware
applications using the interfaces.


General Purpose User Interface Modules
--------------------------------------

DAMON modules that provide user space ABIs for general purpose DAMON usage at
runtime.

Like many other ABIs, the modules create files on pseudo file systems like
'sysfs', and allow users to specify their requests to and get the answers from
DAMON by writing to and reading from the files.  As a response to such I/O,
DAMON user interface modules control DAMON and retrieve the results as the user
requested via the DAMON API, and return the results to the user-space.

The ABIs are designed to be used for user space applications development,
rather than human beings' fingers.  Human users are recommended to use user
space tools built on the ABIs instead.  One such Python-written user space tool
is available at GitHub (https://github.com/damonitor/damo), PyPI
(https://pypistats.org/packages/damo), and Fedora
(https://packages.fedoraproject.org/pkgs/python-damo/damo/).

Currently, one module of this type, namely 'DAMON sysfs interface', is
available.  Please refer to the ABI :ref:`doc <sysfs_interface>` for details of
the interfaces.


Special-Purpose Access-aware Kernel Modules
-------------------------------------------

DAMON modules that provide user space ABI for specific purpose DAMON usage.

DAMON user interface modules are for full control of all DAMON features at
runtime.  For each special-purpose system-wide data access-aware system
operation such as proactive reclamation or LRU lists balancing, the interfaces
could be simplified by removing unnecessary knobs for the specific purpose, and
extended for boot-time and even compile-time control.  Default values of DAMON
control parameters for the usage would also need to be optimized for the
purpose.

To support such cases, more DAMON API user kernel modules that provide simpler
and optimized user space interfaces are available.  Currently, two
modules for proactive reclamation and LRU lists manipulation are provided.  For
more detail, please read the usage documents for those
(:doc:`/admin-guide/mm/damon/reclaim` and
:doc:`/admin-guide/mm/damon/lru_sort`).
