.. SPDX-License-Identifier: GPL-2.0

======
Design
======


.. _damon_design_execution_model_and_data_structures:

Execution Model and Data Structures
===================================

The monitoring-related information, including the monitoring request
specification and DAMON-based operation schemes, is stored in a data structure
called DAMON ``context``.  DAMON executes each context with a kernel thread
called ``kdamond``.  Multiple kdamonds can run in parallel, for different
types of monitoring.

To know how user-space can do the configurations and start/stop DAMON, refer to
:ref:`DAMON sysfs interface <sysfs_interface>` documentation.


Overall Architecture
====================

The DAMON subsystem consists of three layers:

- :ref:`Operations Set <damon_operations_set>`: Implements fundamental
  operations for DAMON that depend on the given monitoring target
  address-space and available set of software/hardware primitives,
- :ref:`Core <damon_core_logic>`: Implements core logics including monitoring
  overhead/accuracy control and access-aware system operations on top of the
  operations set layer, and
- :ref:`Modules <damon_modules>`: Implements kernel modules for various
  purposes that provide interfaces for the user space, on top of the core
  layer.


.. _damon_operations_set:

Operations Set Layer
====================

.. _damon_design_configurable_operations_set:

For data access monitoring and additional low level work, DAMON needs a set of
implementations for specific operations that are dependent on and optimized for
the given target address space.  For example, the two operations below for
access monitoring are address-space dependent.

1. Identification of the monitoring target address range for the address space.
2. Access check of specific address range in the target space.

DAMON consolidates these implementations in a layer called DAMON Operations
Set, and defines the interface between it and the upper layer.  The upper layer
is dedicated to DAMON's core logics, including the mechanism for control of the
monitoring accuracy and the overhead.

Hence, DAMON can easily be extended for any address space and/or available
hardware features by configuring the core logic to use the appropriate
operations set.  If there is no available operations set for a given purpose, a
new operations set can be implemented following the interface between the
layers.

For example, physical memory, virtual memory, swap space, those for specific
processes, NUMA nodes, files, and backing memory devices would be supportable.
Also, if some architectures or devices support special optimized access check
features, those will be easily configurable.

DAMON currently provides the three operations sets below.  The two subsections
that follow describe how those work.

 - vaddr: Monitor virtual address spaces of specific processes
 - fvaddr: Monitor fixed virtual address ranges
 - paddr: Monitor the physical address space of the system

To know how user-space can do the configuration via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`operations <sysfs_context>` file part of the
documentation.


.. _damon_design_vaddr_target_regions_construction:

VMA-based Target Address Range Construction
-------------------------------------------

This is a mechanism of the ``vaddr`` DAMON operations set that automatically
initializes and updates the monitoring target address regions so that the
entire memory mappings of the target processes can be covered.

This mechanism is only for the ``vaddr`` operations set.  In cases of
``fvaddr`` and ``paddr`` operations sets, users are asked to manually set the
monitoring target address ranges.

Only small parts of the super-huge virtual address space of the processes are
mapped to the physical memory and accessed.  Thus, tracking the unmapped
address regions is just wasteful.  However, because DAMON can deal with some
level of noise using the adaptive regions adjustment mechanism, tracking every
mapping is not strictly required, and could even incur a high overhead in some
cases.  That said, too huge unmapped areas inside the monitoring target should
be removed so that the adaptive mechanism does not waste time on them.

For this reason, this implementation converts the complex mappings to three
distinct regions that cover every mapped area of the address space.  The two
gaps between the three regions are the two biggest unmapped areas in the given
address space.  In most cases, the two biggest unmapped areas would be the gap
between the heap and the uppermost mmap()-ed region, and the gap between the
lowermost mmap()-ed region and the stack.  Because these gaps are
exceptionally huge in usual address spaces, excluding these will be sufficient
to make a reasonable trade-off.  Below shows this in detail::

    <heap>
    <BIG UNMAPPED REGION 1>
    <uppermost mmap()-ed region>
    (small mmap()-ed regions and munmap()-ed regions)
    <lowermost mmap()-ed region>
    <BIG UNMAPPED REGION 2>
    <stack>
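
The three-regions idea above can be sketched in a few lines of Python.  This is
only an illustration under assumptions, not the kernel implementation: the
function name ``three_regions`` and the ``(start, end)`` tuple representation
of the mappings are hypothetical.

```python
def three_regions(mappings):
    """Collapse (start, end) mappings into three regions.

    The regions are split at the two biggest unmapped gaps.
    """
    if len(mappings) < 3:
        raise ValueError("at least three mappings are needed")
    mappings = sorted(mappings)
    # (gap size, index of the mapping right before the gap)
    gaps = [(mappings[i + 1][0] - mappings[i][1], i)
            for i in range(len(mappings) - 1)]
    # Indices of the two biggest gaps, put back in address order.
    first, second = sorted(i for _, i in sorted(gaps, reverse=True)[:2])
    return [
        (mappings[0][0], mappings[first][1]),
        (mappings[first + 1][0], mappings[second][1]),
        (mappings[second + 1][0], mappings[-1][1]),
    ]
```

Small gaps between neighboring mappings stay inside a region, so the regions
can still contain some unmapped memory that the adaptive regions adjustment is
expected to tolerate.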


PTE Accessed-bit Based Access Check
-----------------------------------

Both of the implementations for physical and virtual address spaces use the
PTE Accessed-bit for basic access checks.  The only difference is the way of
finding the relevant PTE Accessed bit(s) from the address.  While the
implementation for the virtual address walks the page table for the target task
of the address, the implementation for the physical address walks every page
table having a mapping to the address.  In this way, the implementations find
and clear the bit(s) for the next sampling target address and check whether
the bit(s) are set again after one sampling period.  This could disturb other
kernel subsystems using the Accessed bits, namely Idle page tracking and the
reclaim logic.  DAMON does nothing to avoid disturbing Idle page tracking, so
handling the interference is the responsibility of sysadmins.  However, it
solves the conflict with the reclaim logic using ``PG_idle`` and ``PG_young``
page flags, as Idle page tracking does.


.. _damon_core_logic:

Core Logics
===========

.. _damon_design_monitoring:

Monitoring
----------

The sections below describe each of the DAMON core mechanisms and the five
monitoring attributes, ``sampling interval``, ``aggregation interval``,
``update interval``, ``minimum number of regions``, and ``maximum number of
regions``.

To know how user-space can set the attributes via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`monitoring_attrs <sysfs_monitoring_attrs>`
part of the documentation.


Access Frequency Monitoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The output of DAMON says which pages are accessed how frequently for a given
duration.  The resolution of the access frequency is controlled by setting
``sampling interval`` and ``aggregation interval``.  In detail, DAMON checks
access to each page per ``sampling interval`` and aggregates the results.  In
other words, it counts the number of the accesses to each page.  After each
``aggregation interval`` passes, DAMON calls callback functions that were
previously registered by users so that users can read the aggregated results,
and then clears the results.  This can be described in the simple pseudo-code
below::

    while monitoring_on:
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        sleep(sampling_interval)

The monitoring overhead of this mechanism will arbitrarily increase as the
size of the target workload grows.


.. _damon_design_region_based_sampling:

Region Based Sampling
~~~~~~~~~~~~~~~~~~~~~

To avoid the unbounded increase of the overhead, DAMON groups adjacent pages
that are assumed to have the same access frequencies into a region.  As long as
the assumption (pages in a region have the same access frequencies) is kept,
only one page in the region is required to be checked.  Thus, for each
``sampling interval``, DAMON randomly picks one page in each region, waits for
one ``sampling interval``, checks whether the page is accessed meanwhile, and
increases the access frequency counter of the region if so.  The counter is
called ``nr_accesses`` of the region.  Therefore, the monitoring overhead is
controllable by setting the number of regions.  DAMON allows users to set the
minimum and the maximum number of regions for the trade-off.

This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed.
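
As a rough illustration of the sampling described above, the Python sketch
below picks one random page per region and bumps the region's counter when
that page was accessed.  The ``Region`` class, the ``PAGE_SIZE`` value, and the
``was_accessed`` callback standing in for the operations set's access check are
all assumptions made for this example.

```python
import random

PAGE_SIZE = 4096  # assumed page size for this sketch


class Region:
    def __init__(self, start, end):
        self.start, self.end = start, end
        self.nr_accesses = 0
        self.sampling_addr = None


def prepare_access_checks(regions, rng=random):
    """Pick one random page in each region as the sampling target."""
    for r in regions:
        nr_pages = (r.end - r.start) // PAGE_SIZE
        r.sampling_addr = r.start + rng.randrange(nr_pages) * PAGE_SIZE


def check_accesses(regions, was_accessed):
    """Count a region as accessed if its sampled page was accessed."""
    for r in regions:
        if was_accessed(r.sampling_addr):
            r.nr_accesses += 1
```

The work per ``sampling interval`` is proportional to the number of regions,
not to the number of pages, which is what makes the overhead controllable.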


.. _damon_design_adaptive_regions_adjustment:

Adaptive Regions Adjustment
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Even if the initial monitoring target regions are somehow well constructed to
fulfill the assumption (pages in the same region have similar access
frequencies), the data access pattern can change dynamically.  This will result
in low monitoring quality.  To keep the assumption as much as possible, DAMON
adaptively merges and splits each region based on their access frequency.

For each ``aggregation interval``, it compares the access frequencies
(``nr_accesses``) of adjacent regions.  If the difference is small, and if the
sum of the two regions' sizes is smaller than the size of total regions divided
by the ``minimum number of regions``, DAMON merges the two regions.  If the
resulting number of total regions is still higher than ``maximum number of
regions``, it repeats the merging with an increasing access frequency
difference threshold until the upper limit of the number of regions is met, or
the threshold becomes higher than the possible maximum value (``aggregation
interval`` divided by ``sampling interval``).  Then, after it reports and
clears the aggregated access frequency of each region, it splits each region
into two or three regions if the total number of regions will not exceed the
user-specified maximum number of regions after the split.

In this way, DAMON provides its best-effort quality and minimal overhead while
keeping the bounds users set for their trade-off.
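
The merge step can be sketched as below.  This is a simplified model, not the
kernel code.  Two details are assumptions that the text above does not
specify: the merged region takes the size-weighted average of the two
``nr_accesses`` values, and the threshold is doubled on each retry.  The names
``Region``, ``merge_once``, and ``merge_regions`` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int
    end: int
    nr_accesses: int

    @property
    def size(self):
        return self.end - self.start


def merge_once(regions, threshold, size_limit):
    """Merge adjacent regions whose nr_accesses differ by at most threshold."""
    merged = [Region(regions[0].start, regions[0].end, regions[0].nr_accesses)]
    for r in regions[1:]:
        last = merged[-1]
        similar = abs(last.nr_accesses - r.nr_accesses) <= threshold
        if last.end == r.start and similar and last.size + r.size < size_limit:
            # Assumption: the merged counter is the size-weighted average.
            last.nr_accesses = ((last.nr_accesses * last.size +
                                 r.nr_accesses * r.size) //
                                (last.size + r.size))
            last.end = r.end
        else:
            merged.append(Region(r.start, r.end, r.nr_accesses))
    return merged


def merge_regions(regions, threshold, min_nr_regions, max_nr_regions,
                  max_nr_accesses):
    """Repeat merging with a growing threshold until the limit is met."""
    size_limit = sum(r.size for r in regions) // min_nr_regions
    while True:
        regions = merge_once(regions, threshold, size_limit)
        if len(regions) <= max_nr_regions or threshold > max_nr_accesses:
            return regions
        threshold = max(1, threshold * 2)  # assumed growth policy
```

Here ``max_nr_accesses`` stands for ``aggregation interval`` divided by
``sampling interval``.  The size limit keeps a single merged region from
growing so large that the minimum number of regions could no longer be met.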


.. _damon_design_age_tracking:

Age Tracking
~~~~~~~~~~~~

By analyzing the monitoring results, users can also find how long the current
access pattern of a region has been maintained.  That could be used for a good
understanding of the access pattern.  For example, a page placement algorithm
utilizing both the frequency and the recency could be implemented using that.
To make such analysis of the access pattern's maintained period easier, DAMON
maintains yet another counter called ``age`` in each region.  For each
``aggregation interval``, DAMON checks if the region's size and access
frequency (``nr_accesses``) have significantly changed.  If so, the counter is
reset to zero.  Otherwise, the counter is increased.
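
A minimal sketch of the ``age`` update follows.  What counts as a significant
change is not defined in this section, so the two threshold parameters and the
function name ``update_age`` are hypothetical.

```python
def update_age(age, last_nr_accesses, nr_accesses, last_size, size,
               nr_accesses_threshold, size_threshold):
    """Return the region's new age after one aggregation interval."""
    changed = (abs(nr_accesses - last_nr_accesses) > nr_accesses_threshold or
               abs(size - last_size) > size_threshold)
    # A significant change restarts the count; otherwise the pattern has
    # been maintained for one more aggregation interval.
    return 0 if changed else age + 1
```

With this rule, ``age`` multiplied by ``aggregation interval`` approximates how
long the region has kept its current access pattern.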


Dynamic Target Space Updates Handling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The monitoring target address range could be dynamically changed.  For example,
virtual memory could be dynamically mapped and unmapped.  Physical memory could
be hot-plugged.

As the changes could be quite frequent in some cases, DAMON allows the
monitoring operations to check dynamic changes, including memory mapping
changes, and to apply them to monitoring operations-related data structures
such as the abstracted monitoring target memory area, only once per
user-specified time interval (``update interval``).

User-space can get the monitoring results via DAMON sysfs interface and/or
tracepoints.  For more details, please refer to the documentation for
:ref:`DAMOS tried regions <sysfs_schemes_tried_regions>` and :ref:`tracepoint`,
respectively.


.. _damon_design_monitoring_params_tuning_guide:

Monitoring Parameters Tuning Guide
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In short, set ``aggregation interval`` to capture a meaningful amount of
accesses for the purpose.  The amount of accesses can be measured using
``nr_accesses`` and ``age`` of regions in the aggregated monitoring results
snapshot.  The default value of the interval, ``100ms``, turns out to be too
short in many cases.  Set ``sampling interval`` proportional to ``aggregation
interval``.  By default, ``1/20`` is recommended as the ratio.

``Aggregation interval`` should be set to a time interval within which the
workload can make the amount of accesses needed for the monitoring purpose.
If the interval is too short, only a small number of accesses are captured.  As
a result, the monitoring results look as if everything is equally rarely
accessed.  For many purposes, that would be useless.  If it is too long,
however, the time to converge regions with the :ref:`regions adjustment
mechanism <damon_design_adaptive_regions_adjustment>` can be too long,
depending on the time scale of the given purpose.  This could happen if the
workload is actually making only rare accesses but the user sets the target
amount of accesses for the monitoring purpose too high.  For such cases, the
target amount of accesses to capture per ``aggregation interval`` should be
carefully reconsidered.  Also, note that the captured amount of accesses is
represented with not only ``nr_accesses``, but also ``age``.  For example, even
if every region on the monitoring results shows zero ``nr_accesses``, regions
could still be distinguished using ``age`` values as the recency information.

Hence the optimum value of ``aggregation interval`` depends on the access
intensiveness of the workload.  The user should tune the interval based on the
amount of accesses that is captured on each aggregated snapshot of the
monitoring results.

Note that the default value of the interval is 100 milliseconds, which is too
short in many cases, especially on large systems.

``Sampling interval`` defines the resolution of each aggregation.  If it is set
too large, monitoring results will look like every region was equally rarely
accessed, or equally frequently accessed.  That is, regions become
indistinguishable based on access pattern, and therefore the results will be
useless in many use cases.  If ``sampling interval`` is too small, it will not
degrade the resolution, but will increase the monitoring overhead.  If it is
appropriate enough to provide a resolution of the monitoring results that is
sufficient for the given purpose, it shouldn't be unnecessarily further
lowered.  It is recommended to be set proportional to ``aggregation interval``.
By default, the ratio is set as ``1/20``, and it is still recommended.

Based on the manual tuning guide, DAMON provides a more intuitive knob-based
intervals auto tuning mechanism.  Please refer to :ref:`the design document of
the feature <damon_design_monitoring_intervals_autotuning>` for detail.

Refer to the document below for an example tuning based on the above guide.

.. toctree::
   :maxdepth: 1

   monitoring_intervals_tuning_example


.. _damon_design_monitoring_intervals_autotuning:

Monitoring Intervals Auto-tuning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

DAMON provides automatic tuning of the ``sampling interval`` and ``aggregation
interval`` based on :ref:`the tuning guide idea
<damon_design_monitoring_params_tuning_guide>`.  The tuning mechanism allows
users to set the aimed amount of access events to observe via DAMON within a
given time interval.  The target can be specified by the user as a ratio of
DAMON-observed access events to the theoretical maximum amount of the events
(``access_bp``) that is measured within a given number of aggregations
(``aggrs``).

The DAMON-observed access events are calculated in byte granularity based on
DAMON :ref:`region assumption <damon_design_region_based_sampling>`.  For
example, if a region of size ``X`` bytes of ``Y`` ``nr_accesses`` is found, it
means ``X * Y`` access events are observed by DAMON.  Theoretical maximum
access events for the region is calculated in the same way, but replacing ``Y``
with the theoretical maximum ``nr_accesses``, which can be calculated as
``aggregation interval / sampling interval``.
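
The calculation above can be written out as a short sketch for one aggregated
snapshot.  It assumes that ``access_bp`` is expressed in basis points
(1/10,000), and it represents regions as hypothetical ``(size_in_bytes,
nr_accesses)`` tuples.  The function name is also made up for this example.

```python
def observed_access_bp(regions, aggregation_interval_us, sampling_interval_us):
    """Ratio of observed to theoretical maximum access events, in bp."""
    max_nr_accesses = aggregation_interval_us // sampling_interval_us
    # X * Y access events for each region of X bytes and Y nr_accesses.
    observed = sum(size * nr_accesses for size, nr_accesses in regions)
    # The same sum, with Y replaced by the theoretical maximum.
    maximum = sum(size * max_nr_accesses for size, _ in regions)
    return observed * 10000 // maximum
```

For example, with a 100 ms ``aggregation interval`` and a 5 ms ``sampling
interval``, one fully hot 4 KiB region next to a fully cold 12 KiB region
yields 2500 bp (25%).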

The mechanism calculates the ratio of access events for ``aggrs`` aggregations,
and increases or decreases the ``sampling interval`` and ``aggregation
interval`` in the same ratio, if the observed access ratio is lower or higher
than the target, respectively.  The ratio of the intervals change is decided in
proportion to the distance between the current samples ratio and the target
ratio.

The user can further set the minimum and maximum ``sampling interval`` that can
be set by the tuning mechanism using two parameters (``min_sample_us`` and
``max_sample_us``).  Because the tuning mechanism always changes ``sampling
interval`` and ``aggregation interval`` in the same ratio, the minimum and
maximum ``aggregation interval`` after each of the tuning changes are
automatically set together.
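
The sketch below illustrates how bounding only ``sampling interval`` also
bounds ``aggregation interval``.  It does not model how the scaling factor is
derived from the distance to the target.  ``factor`` is a hypothetical input
that the caller supplies, and the function name is made up for this example.

```python
def scale_intervals(sample_us, aggr_us, factor, min_sample_us, max_sample_us):
    """Scale both intervals by factor, keeping their ratio unchanged."""
    new_sample = round(sample_us * factor)
    # Clamp the sampling interval to the user-specified bounds.
    new_sample = max(min_sample_us, min(new_sample, max_sample_us))
    # Keeping the aggregation/sampling ratio bounds the aggregation
    # interval implicitly.
    new_aggr = new_sample * aggr_us // sample_us
    return new_sample, new_aggr
```

With a 5 ms ``sampling interval``, a 100 ms ``aggregation interval``, and
bounds of 1 ms and 10 ms, the ``aggregation interval`` can only move between
20 ms and 200 ms.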

The tuning is turned off by default, and needs to be set explicitly by the
user.  As a rule of thumb based on the Pareto principle, a 4% access samples
ratio target is recommended.  Note that the Pareto principle (80/20 rule) is
applied twice.  That is, it assumes that a 4% (20% of 20%) DAMON-observed
access events ratio (source) captures 64% (80% multiplied by 80%) of real
access events (outcomes).

To know how user-space can use this feature via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`intervals_goal <sysfs_scheme>` part of
the documentation.


.. _damon_design_damos:

Operation Schemes
-----------------

One common purpose of data access monitoring is access-aware system efficiency
optimizations.  For example,

    paging out memory regions that are not accessed for more than two minutes

or

    using THP for memory regions that are larger than 2 MiB and showing a high
    access frequency for more than one minute.

One straightforward approach for such schemes would be profile-guided
optimizations.  That is, getting data access monitoring results of the
workloads or the system using DAMON, finding memory regions of special
characteristics by profiling the monitoring results, and making system
operation changes for the regions.  The changes could be made by modifying or
providing advice to the software (the application and/or the kernel), or
reconfiguring the hardware.  Both offline and online approaches could be
available.

Among those, providing advice to the kernel at runtime would be flexible and
effective, and therefore be widely used.  However, implementing such schemes
could impose unnecessary redundancy and inefficiency.  The profiling could be
redundant if the type of interest is common.  Exchanging the information
including monitoring results and operation advice between kernel and user
spaces could be inefficient.

To allow users to reduce such redundancy and inefficiencies by offloading the
work, DAMON provides a feature called Data Access Monitoring-based Operation
Schemes (DAMOS).  It lets users specify their desired schemes at a high
level.  For such specifications, DAMON starts monitoring, finds regions having
the access pattern of interest, and applies the user-desired operation actions
to the regions, for every user-specified time interval called
``apply_interval``.

To know how user-space can set ``apply_interval`` via :ref:`DAMON sysfs
interface <sysfs_interface>`, refer to :ref:`apply_interval_us <sysfs_scheme>`
part of the documentation.


.. _damon_design_damos_action:

Operation Action
~~~~~~~~~~~~~~~~

The management action that the users desire to apply to the regions of their
interest.  For example, paging out, prioritizing for next reclamation victim
selection, advising ``khugepaged`` to collapse or split, or doing nothing but
collecting statistics of the regions.

The list of supported actions is defined in DAMOS, but the implementation of
each action is in the DAMON operations set layer because the implementation
normally depends on the monitoring target address space.  For example, the code
for paging specific virtual address ranges out would be different from that for
physical address ranges.  Also, the monitoring operations implementation sets
are not mandated to support all actions of the list.  Hence, the availability
of a specific DAMOS action depends on which operations set is selected to be
used together.

The list of the supported actions, their meaning, and the DAMON operations sets
that support each action are as below.

 - ``willneed``: Call ``madvise()`` for the region with ``MADV_WILLNEED``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``cold``: Call ``madvise()`` for the region with ``MADV_COLD``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``pageout``: Reclaim the region.
   Supported by ``vaddr``, ``fvaddr`` and ``paddr`` operations set.
 - ``hugepage``: Call ``madvise()`` for the region with ``MADV_HUGEPAGE``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``nohugepage``: Call ``madvise()`` for the region with ``MADV_NOHUGEPAGE``.
   Supported by ``vaddr`` and ``fvaddr`` operations set.
 - ``lru_prio``: Prioritize the region on its LRU lists.
   Supported by ``paddr`` operations set.
 - ``lru_deprio``: Deprioritize the region on its LRU lists.
   Supported by ``paddr`` operations set.
 - ``migrate_hot``: Migrate the regions prioritizing warmer regions.
   Supported by ``paddr`` operations set.
 - ``migrate_cold``: Migrate the regions prioritizing colder regions.
   Supported by ``paddr`` operations set.
 - ``stat``: Do nothing but count the statistics.
   Supported by all operations sets.

Applying the actions except ``stat`` to a region is considered as changing the
region's characteristics.  Hence, DAMOS resets the age of regions when any such
actions are applied to those.

To know how user-space can set the action via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`action <sysfs_scheme>` part of the
documentation.


.. _damon_design_damos_access_pattern:

Target Access Pattern
~~~~~~~~~~~~~~~~~~~~~

The access pattern of the schemes' interest.  The patterns are constructed with
the properties that DAMON's monitoring results provide, specifically the size,
the access frequency, and the age.  Users can describe their access pattern of
interest by setting minimum and maximum values of the three properties.  If a
region's three properties are in the ranges, DAMOS classifies it as one of the
regions that the scheme is having an interest in.
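
The classification above amounts to three range checks, as in the sketch
below.  The ``AccessPattern`` class and its field names are illustrative
assumptions, and treating both ends of each range as inclusive is also an
assumption.

```python
from dataclasses import dataclass


@dataclass
class AccessPattern:
    min_sz: int
    max_sz: int
    min_nr_accesses: int
    max_nr_accesses: int
    min_age: int
    max_age: int

    def matches(self, size, nr_accesses, age):
        """Return True if all three properties are within the ranges."""
        return (self.min_sz <= size <= self.max_sz and
                self.min_nr_accesses <= nr_accesses <= self.max_nr_accesses and
                self.min_age <= age <= self.max_age)
```

For instance, a pattern for cold memory could require zero ``nr_accesses`` and
a large minimum ``age``, while leaving the size range wide open.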

To know how user-space can set the access pattern via :ref:`DAMON sysfs
interface <sysfs_interface>`, refer to :ref:`access_pattern
<sysfs_access_pattern>` part of the documentation.


.. _damon_design_damos_quotas:

Quotas
~~~~~~

DAMOS upper-bound overhead control feature.  DAMOS could incur high overhead if
the target access pattern is not properly tuned.  For example, if a huge memory
region having the access pattern of interest is found, applying the scheme's
action to all pages of the huge region could consume unacceptably large system
resources.  Preventing such issues by tuning the access pattern could be
challenging, especially if the access patterns of the workloads are highly
dynamic.

To mitigate that situation, DAMOS provides an upper-bound overhead control
feature called quotas.  It lets users specify an upper limit of time that DAMOS
can use for applying the action, and/or a maximum number of bytes of memory
regions that the action can be applied to within a user-specified time
duration.

To know how user-space can set the basic quotas via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`quotas <sysfs_quotas>` part of the
documentation.


.. _damon_design_damos_quotas_prioritization:

Prioritization
^^^^^^^^^^^^^^

A mechanism for making a good decision under the quotas.  When the action
cannot be applied to all regions of interest due to the quotas, DAMOS
prioritizes regions and applies the action to only regions having high enough
priorities so that it will not exceed the quotas.

The prioritization mechanism should be different for each action.  For example,
rarely accessed (colder) memory regions would be prioritized for page-out
scheme action.  In contrast, the colder regions would be deprioritized for huge
page collapse scheme action.  Hence, the prioritization mechanisms for each
action are implemented in each DAMON operations set, together with the actions.

Though the implementation is up to the DAMON operations set, it would be common
to calculate the priority using the access pattern properties of the regions.
Some users would want the mechanisms to be personalized for their specific
case.  For example, some users would want the mechanism to weigh the recency
(``age``) more than the access frequency (``nr_accesses``).  DAMOS allows users
to specify the weight of each access pattern property and passes the
information to the underlying mechanism.  Nevertheless, how and even whether
the weight will be respected are up to the underlying prioritization mechanism
implementation.
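
A simplified sketch of weighted prioritization under a size quota is shown
below.  It is not how any operations set actually computes priorities.  The
linear ``coldness`` score, the ``(size, nr_accesses, age)`` tuple layout, and
the greedy selection are all assumptions chosen to keep the example short.

```python
def coldness(region, freq_weight, age_weight):
    """Hypothetical priority score: colder regions score higher."""
    _size, nr_accesses, age = region
    return age_weight * age - freq_weight * nr_accesses


def select_under_quota(regions, quota_bytes, score):
    """Pick the highest-priority regions that fit within quota_bytes."""
    selected, used = [], 0
    for region in sorted(regions, key=score, reverse=True):
        size = region[0]
        if used + size > quota_bytes:
            break  # remaining regions do not have high enough priority
        selected.append(region)
        used += size
    return selected
```

A page-out scheme could pass ``lambda r: coldness(r, 1, 1)`` as ``score``, and
a user who cares more about recency could raise ``age_weight``.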

To know how user-space can set the prioritization weights via :ref:`DAMON sysfs
interface <sysfs_interface>`, refer to :ref:`weights <sysfs_quotas>` part of
the documentation.


.. _damon_design_damos_quotas_auto_tuning:

Aim-oriented Feedback-driven Auto-tuning
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Automatic feedback-driven quota tuning.  Instead of setting the absolute quota
value, users can specify the metric of their interest, and what target value
they want the metric value to be.  DAMOS then automatically tunes the
aggressiveness (the quota) of the corresponding scheme.  For example, if DAMOS
is under-achieving the goal, DAMOS automatically increases the quota.  If DAMOS
is over-achieving the goal, it decreases the quota.

The goal can be specified with four parameters, namely ``target_metric``,
``target_value``, ``current_value`` and ``nid``.  The auto-tuning mechanism
tries to make ``current_value`` of ``target_metric`` be the same as
``target_value``.  The supported values of ``target_metric`` are as below.

- ``user_input``: User-provided value.  Users could use any metric that they
  have interest in for the value.  The user-space main workload's latency or
  throughput, and system metrics like free memory ratio or memory pressure
  stall time (PSI) could be examples.  Note that users should explicitly set
  ``current_value`` on their own in this case.  In other words, users should
  repeatedly provide the feedback.
- ``some_mem_psi_us``: System-wide ``some`` memory pressure stall information
  in microseconds that is measured from the last quota reset to the next quota
  reset.  DAMOS does the measurement on its own, so only ``target_value`` needs
  to be set by users at the initial time.  In other words, DAMOS does
  self-feedback.
- ``node_mem_used_bp``: Specific NUMA node's used memory ratio in bp (1/10,000).
- ``node_mem_free_bp``: Specific NUMA node's free memory ratio in bp (1/10,000).

``nid`` is needed only for ``node_mem_used_bp`` and ``node_mem_free_bp``, to
point to the specific NUMA node.
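
The feedback idea can be illustrated with a small proportional controller.
The update rule below is an assumption made for this sketch, since this
document does not give the formula DAMOS uses.  It also assumes a metric that
rises as the scheme becomes more aggressive.  The function name and the
``min_quota`` parameter are hypothetical.

```python
def next_quota(quota, current_value, target_value, min_quota=1):
    """Grow the quota when under-achieving the goal, shrink it otherwise."""
    # Assumed update rule: adjust in proportion to the relative error.
    adjusted = quota + quota * (target_value - current_value) // target_value
    return max(min_quota, adjusted)
```

With ``user_input``, the caller would measure ``current_value`` and feed it in
before each quota reset.  With the other metrics, DAMOS does the measurement on
its own.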

To know how user-space can set the tuning goal metric, the target value, and/or
the current value via :ref:`DAMON sysfs interface <sysfs_interface>`, refer to
:ref:`quota goals <sysfs_schemes_quota_goals>` part of the documentation.


.. _damon_design_damos_watermarks:

Watermarks
~~~~~~~~~~

Conditional DAMOS (de)activation automation.  Users might want DAMOS to run
only under certain situations.  For example, when a sufficient amount of free
memory is guaranteed, running a scheme for proactive reclamation would only
consume unnecessary system resources.  To avoid such consumption, the user would
need to manually monitor some metrics such as free memory ratio, and turn
DAMON/DAMOS on or off.

DAMOS allows users to offload such work using three watermarks.  It allows the
users to configure the metric of their interest, and three watermark values,
namely high, middle, and low.  If the value of the metric becomes above the
high watermark or below the low watermark, the scheme is deactivated.  If the
metric becomes below the mid watermark but above the low watermark, the scheme
is activated.  If all schemes are deactivated by the watermarks, the monitoring
is also deactivated.  In this case, the DAMON worker thread only periodically
checks the watermarks and therefore incurs nearly zero overhead.
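
The watermark rules can be read as a small state machine, sketched below.  The
text does not say what happens between the mid and high watermarks or exactly
at a watermark value.  Keeping the current state in the former case and the
inequality choices for the latter are assumptions of this sketch, and the
function name is hypothetical.

```python
def next_state(active, metric, high, mid, low):
    """Return whether the scheme should be active after a watermark check."""
    if metric > high or metric < low:
        return False  # above high or below low: deactivate
    if metric < mid:
        return True   # between low and mid: activate
    return active     # between mid and high: keep the current state
```

For a proactive reclamation scheme watching the free memory ratio, this means
the scheme starts once free memory drops below the mid watermark, and stops
when free memory recovers above the high watermark or falls below the low
watermark.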
599
600To know how user-space can set the watermarks via :ref:`DAMON sysfs interface
601<sysfs_interface>`, refer to :ref:`watermarks <sysfs_watermarks>` part of the
602documentation.
603
604
605.. _damon_design_damos_filters:
606
607Filters
608~~~~~~~
609
610Non-access pattern-based target memory regions filtering.  If users run
611self-written programs or have good profiling tools, they could know something
612more than the kernel, such as future access patterns or some special
613requirements for specific types of memory. For example, some users may know
614only anonymous pages can impact their program's performance.  They can also
615have a list of latency-critical processes.
616
617To let users optimize DAMOS schemes with such special knowledge, DAMOS provides
618a feature called DAMOS filters.  The feature allows users to set an arbitrary
619number of filters for each scheme.  Each filter specifies
620
621- a type of memory (``type``),
622- whether it is for the memory of the type or all except the type
623  (``matching``), and
624- whether it is to allow (include) or reject (exclude) applying
625  the scheme's action to the memory (``allow``).
626
627For efficient handling of filters, some types of filters are handled by the
628core layer, while others are handled by operations set.  In the latter case,
629hence, support of the filter types depends on the DAMON operations set.  In
630case of the core layer-handled filters, the memory regions that excluded by the
631filter are not counted as the scheme has tried to the region.  In contrast, if
632a memory regions is filtered by an operations set layer-handled filter, it is
633counted as the scheme has tried.  This difference affects the statistics.
634
When multiple filters are installed, the group of filters that are handled by
the core layer is evaluated first.  After that, the group of filters that are
handled by the operations layer is evaluated.  Filters in each of the groups
are evaluated in the installed order.  If a part of memory matches one of the
filters, the following filters are ignored.  If the part passes through the
filters evaluation stage because it does not match any of the filters, whether
the scheme's action is applied to it depends on the last filter's allowance
type.  If the last filter was for allowing, the part of memory will be
rejected, and vice versa.

For example, let's assume 1) a filter for allowing anonymous pages and 2)
another filter for rejecting young pages are installed in that order.  If a
page of a region that is eligible to apply the scheme's action is an anonymous
page, the scheme's action will be applied to the page regardless of whether it
is young or not, since it matches the first allow-filter.  If the page is not
anonymous but young, the scheme's action will not be applied, since the second
reject-filter blocks it.  If the page is neither anonymous nor young, the page
will pass through the filters evaluation stage since there is no matching
filter, and the action will be applied to the page.

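The evaluation rule and the example above can be modeled in user space as
below.  This is only a minimal sketch, not the kernel implementation: the
``Filter`` class, the ``action_allowed()`` function and the dictionary-based
page model are hypothetical names made up for illustration, and the split
between the core layer-handled and the operations layer-handled groups is
omitted for brevity.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Dict, List


    @dataclass
    class Filter:
        """Hypothetical user-space model of one DAMOS filter."""
        type: str       # page property the filter looks at, e.g. "anon"
        matching: bool  # True: memory of the type, False: all except the type
        allow: bool     # True: allow the action, False: reject the action


    def filter_matches(f: Filter, page: Dict[str, bool]) -> bool:
        # A page is "of the type" if the property is set on it.
        return page.get(f.type, False) == f.matching


    def action_allowed(filters: List[Filter], page: Dict[str, bool]) -> bool:
        # Filters are evaluated in the installed order; the first match wins.
        for f in filters:
            if filter_matches(f, page):
                return f.allow
        # Assumption: with no filter installed, nothing restricts the action.
        if not filters:
            return True
        # No filter matched: the result is the opposite of the last filter.
        return not filters[-1].allow


    # 1) allow anonymous pages, then 2) reject young pages.
    filters = [Filter("anon", True, True), Filter("young", True, False)]

    anon_young = action_allowed(filters, {"anon": True, "young": True})
    file_young = action_allowed(filters, {"anon": False, "young": True})
    file_old = action_allowed(filters, {"anon": False, "young": False})

With the two filters of the example, ``anon_young`` and ``file_old`` evaluate
to ``True`` while ``file_young`` evaluates to ``False``.
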
Below types (``type``) of filters are currently supported.

- Core layer handled
    - addr
        - Applied to pages belonging to a given address range.
    - target
        - Applied to pages belonging to a given DAMON monitoring target.
- Operations layer handled, supported by only ``paddr`` operations set.
    - anon
        - Applied to pages containing data that is not stored in files.
    - active
        - Applied to active pages.
    - memcg
        - Applied to pages belonging to a given cgroup.
    - young
        - Applied to pages that are accessed after the last access check from the
          scheme.
    - hugepage_size
        - Applied to pages that are managed in a given size range.
    - unmapped
        - Applied to pages that are unmapped.

To know how user-space can set the filters via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`filters <sysfs_filters>` part of the
documentation.

.. _damon_design_damos_stat:

Statistics
~~~~~~~~~~

The statistics of DAMOS behaviors that are designed to help monitoring, tuning
and debugging of DAMOS.

DAMOS accounts the below statistics for each scheme, from the beginning of the
scheme's execution.

- ``nr_tried``: Total number of regions that the scheme is tried to be applied
  to.
- ``sz_tried``: Total size of regions that the scheme is tried to be applied
  to.
- ``sz_ops_filter_passed``: Total bytes that passed operations set
  layer-handled DAMOS filters.
- ``nr_applied``: Total number of regions that the scheme is applied to.
- ``sz_applied``: Total size of regions that the scheme is applied to.
- ``qt_exceeds``: Total number of times the quota of the scheme has been
  exceeded.

"A scheme is tried to be applied to a region" means DAMOS core logic determined
the region is eligible to apply the scheme's :ref:`action
<damon_design_damos_action>`.  The :ref:`access pattern
<damon_design_damos_access_pattern>`, :ref:`quotas
<damon_design_damos_quotas>`, :ref:`watermarks
<damon_design_damos_watermarks>`, and :ref:`filters
<damon_design_damos_filters>` that are handled on core logic could affect this.
The core logic will only ask the underlying :ref:`operation set
<damon_operations_set>` to apply the action to the region, so whether the
action is really applied or not is unclear.  That's why it is called "tried".

"A scheme is applied to a region" means the :ref:`operation set
<damon_operations_set>` has applied the action to at least a part of the
region.  The :ref:`filters <damon_design_damos_filters>` that are handled by
the operation set, and the types of the :ref:`action
<damon_design_damos_action>` and the pages of the region can affect this.  For
example, if a filter is set to exclude anonymous pages and the region has only
anonymous pages, or if the action is ``pageout`` while all pages of the region
are unreclaimable, applying the action to the region will fail.

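The difference between "tried" and "applied" can be illustrated with the
below user-space model.  It is a simplified sketch under stated assumptions,
not the kernel code: ``SchemeStats`` and ``account_region()`` are hypothetical
names, and it assumes ``sz_applied`` grows by the number of bytes that the
operations set reports as handled.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class SchemeStats:
        """Hypothetical per-scheme counters mirroring the names above."""
        nr_tried: int = 0
        sz_tried: int = 0
        sz_ops_filter_passed: int = 0
        nr_applied: int = 0
        sz_applied: int = 0


    def account_region(stats: SchemeStats, region_sz: int, core_eligible: bool,
                       sz_ops_passed: int, sz_applied: int) -> None:
        # Regions excluded by the core logic are not counted at all.
        if not core_eligible:
            return
        # The core logic asked the operations set to apply the action.
        stats.nr_tried += 1
        stats.sz_tried += region_sz
        stats.sz_ops_filter_passed += sz_ops_passed
        # "Applied" requires the action to succeed on at least a part.
        if sz_applied > 0:
            stats.nr_applied += 1
            stats.sz_applied += sz_applied


    stats = SchemeStats()
    # Excluded by a core layer-handled filter: neither tried nor applied.
    account_region(stats, 8192, False, 0, 0)
    # Every page rejected by an operations set filter: tried, not applied.
    account_region(stats, 8192, True, 0, 0)
    # Half of the region passed the filters and was handled: tried, applied.
    account_region(stats, 8192, True, 4096, 4096)

After the three calls, two regions are counted as tried while only one is
counted as applied.
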
To know how user-space can read the stats via :ref:`DAMON sysfs interface
<sysfs_interface>`, refer to :ref:`stats <sysfs_stats>` part of the
documentation.

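A user-space tool could collect the statistics roughly as below.  The sketch
assumes that each statistic is exposed as a file of the same name under a
per-scheme stats directory, and that the caller passes the directory path;
``read_damos_stats()`` is a hypothetical helper.  Refer to the sysfs
documentation for the actual path and for how to ask DAMON to update the
files before reading them.

.. code-block:: python

    import os
    from typing import Dict

    # Statistic names as listed in this document.
    STAT_NAMES = ("nr_tried", "sz_tried", "sz_ops_filter_passed",
                  "nr_applied", "sz_applied", "qt_exceeds")


    def read_damos_stats(stats_dir: str) -> Dict[str, int]:
        """Read every available statistic file under stats_dir."""
        stats = {}
        for name in STAT_NAMES:
            try:
                with open(os.path.join(stats_dir, name)) as f:
                    stats[name] = int(f.read().strip())
            except FileNotFoundError:
                # Older kernels may not provide every file; skip those.
                continue
        return stats
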
Regions Walking
~~~~~~~~~~~~~~~

DAMOS feature allowing users to access each region that a DAMOS action has just
been applied to.  Using this feature, DAMON :ref:`API <damon_design_api>`
allows users to access full properties of the regions including the access
monitoring results and the amount of the region's internal memory that passed
the DAMOS filters.  :ref:`DAMON sysfs interface <sysfs_interface>` also allows
users to read the data via special :ref:`files <sysfs_schemes_tried_regions>`.

.. _damon_design_api:

Application Programming Interface
---------------------------------

The programming interface for kernel space data access-aware applications.
DAMON is a framework, so it does nothing by itself.  Instead, it only helps
other kernel components such as subsystems and modules build their data
access-aware applications using DAMON's core features.  For this, DAMON exposes
all of its features to other kernel components via its application programming
interface, namely ``include/linux/damon.h``.  Please refer to the API
:doc:`document </mm/damon/api>` for details of the interface.


.. _damon_modules:

Modules
=======

Because the core of DAMON is a framework for kernel components, it doesn't
provide any direct interface for the user space.  Such interfaces should be
implemented by each DAMON API user kernel component, instead.  DAMON subsystem
itself implements such DAMON API user modules, which are supposed to be used
for general purpose DAMON control and special purpose data access-aware system
operations, and provides stable application binary interfaces (ABI) for the
user space.  The user space can build their efficient data access-aware
applications using the interfaces.


General Purpose User Interface Modules
--------------------------------------

DAMON modules that provide user space ABIs for general purpose DAMON usage in
runtime.

Like many other ABIs, the modules create files on pseudo file systems like
'sysfs', and allow users to specify their requests to and get the answers from
DAMON by writing to and reading from the files.  As a response to such I/O,
DAMON user interface modules control DAMON and retrieve the results as the user
requested via the DAMON API, and return the results to the user-space.

The ABIs are designed to be used for user space applications development,
rather than human beings' fingers.  Human users are recommended to use such
user space tools.  One such Python-written user space tool is available at
GitHub (https://github.com/damonitor/damo), PyPI
(https://pypistats.org/packages/damo), and Fedora
(https://packages.fedoraproject.org/pkgs/python-damo/damo/).

Currently, one module for this type, namely 'DAMON sysfs interface' is
available.  Please refer to the ABI :ref:`doc <sysfs_interface>` for details of
the interfaces.


Special-Purpose Access-aware Kernel Modules
-------------------------------------------

DAMON modules that provide user space ABI for specific purpose DAMON usage.

DAMON user interface modules are for full control of all DAMON features in
runtime.  For each special-purpose system-wide data access-aware system
operation such as proactive reclamation or LRU lists balancing, the interfaces
could be simplified by removing unnecessary knobs for the specific purpose, and
extended for boot-time and even compile time control.  Default values of DAMON
control parameters for the usage would also need to be optimized for the
purpose.

To support such cases, yet more DAMON API user kernel modules that provide
simpler and optimized user space interfaces are available.  Currently, two
modules for proactive reclamation and LRU lists manipulation are provided.  For
more detail, please read the usage documents for those
(:doc:`/admin-guide/mm/damon/reclaim` and
:doc:`/admin-guide/mm/damon/lru_sort`).
