xref: /linux/Documentation/mm/damon/design.rst (revision 80154575849778e40d9d87aa7ab14491ac401948)
1.. SPDX-License-Identifier: GPL-2.0
2
3======
4Design
5======
6
7
8.. _damon_design_execution_model_and_data_structures:
9
10Execution Model and Data Structures
11===================================
12
13The monitoring-related information including the monitoring request
14specification and DAMON-based operation schemes are stored in a data structure
15called DAMON ``context``.  DAMON executes each context with a kernel thread
16called ``kdamond``.  Multiple kdamonds could run in parallel, for different
17types of monitoring.
18
19
20Overall Architecture
21====================
22
23DAMON subsystem is configured with three layers including
24
25- Operations Set: Implements fundamental operations for DAMON that depends on
26  the given monitoring target address-space and available set of
27  software/hardware primitives,
28- Core: Implements core logics including monitoring overhead/accurach control
29  and access-aware system operations on top of the operations set layer, and
30- Modules: Implements kernel modules for various purposes that provides
31  interfaces for the user space, on top of the core layer.
32
33
34Configurable Operations Set
35---------------------------
36
37For data access monitoring and additional low level work, DAMON needs a set of
38implementations for specific operations that are dependent on and optimized for
39the given target address space.  On the other hand, the accuracy and overhead
40tradeoff mechanism, which is the core logic of DAMON, is in the pure logic
41space.  DAMON separates the two parts in different layers, namely DAMON
42Operations Set and DAMON Core Logics Layers, respectively.  It further defines
43the interface between the layers to allow various operations sets to be
44configured with the core logic.
45
46Due to this design, users can extend DAMON for any address space by configuring
47the core logic to use the appropriate operations set.  If any appropriate set
48is unavailable, users can implement one on their own.
49
50For example, physical memory, virtual memory, swap space, those for specific
51processes, NUMA nodes, files, and backing memory devices would be supportable.
52Also, if some architectures or devices supporting special optimized access
53check primitives, those will be easily configurable.
54
55
56Programmable Modules
57--------------------
58
59Core layer of DAMON is implemented as a framework, and exposes its application
60programming interface to all kernel space components such as subsystems and
61modules.  For common use cases of DAMON, DAMON subsystem provides kernel
62modules that built on top of the core layer using the API, which can be easily
63used by the user space end users.
64
65
66Operations Set Layer
67====================
68
69The monitoring operations are defined in two parts:
70
711. Identification of the monitoring target address range for the address space.
722. Access check of specific address range in the target space.
73
74DAMON currently provides the implementations of the operations for the physical
75and virtual address spaces. Below two subsections describe how those work.
76
77
78VMA-based Target Address Range Construction
79-------------------------------------------
80
81This is only for the virtual address space monitoring operations
82implementation.  That for the physical address space simply asks users to
83manually set the monitoring target address ranges.
84
85Only small parts in the super-huge virtual address space of the processes are
86mapped to the physical memory and accessed.  Thus, tracking the unmapped
87address regions is just wasteful.  However, because DAMON can deal with some
88level of noise using the adaptive regions adjustment mechanism, tracking every
89mapping is not strictly required but could even incur a high overhead in some
90cases.  That said, too huge unmapped areas inside the monitoring target should
91be removed to not take the time for the adaptive mechanism.
92
93For the reason, this implementation converts the complex mappings to three
94distinct regions that cover every mapped area of the address space.  The two
95gaps between the three regions are the two biggest unmapped areas in the given
96address space.  The two biggest unmapped areas would be the gap between the
97heap and the uppermost mmap()-ed region, and the gap between the lowermost
98mmap()-ed region and the stack in most of the cases.  Because these gaps are
99exceptionally huge in usual address spaces, excluding these will be sufficient
100to make a reasonable trade-off.  Below shows this in detail::
101
102    <heap>
103    <BIG UNMAPPED REGION 1>
104    <uppermost mmap()-ed region>
105    (small mmap()-ed regions and munmap()-ed regions)
106    <lowermost mmap()-ed region>
107    <BIG UNMAPPED REGION 2>
108    <stack>
109
110
111PTE Accessed-bit Based Access Check
112-----------------------------------
113
114Both of the implementations for physical and virtual address spaces use PTE
115Accessed-bit for basic access checks.  Only one difference is the way of
116finding the relevant PTE Accessed bit(s) from the address.  While the
117implementation for the virtual address walks the page table for the target task
118of the address, the implementation for the physical address walks every page
119table having a mapping to the address.  In this way, the implementations find
120and clear the bit(s) for next sampling target address and checks whether the
121bit(s) set again after one sampling period.  This could disturb other kernel
122subsystems using the Accessed bits, namely Idle page tracking and the reclaim
123logic.  DAMON does nothing to avoid disturbing Idle page tracking, so handling
124the interference is the responsibility of sysadmins.  However, it solves the
125conflict with the reclaim logic using ``PG_idle`` and ``PG_young`` page flags,
126as Idle page tracking does.
127
128
129Core Logics
130===========
131
132
133Monitoring
134----------
135
136Below four sections describe each of the DAMON core mechanisms and the five
137monitoring attributes, ``sampling interval``, ``aggregation interval``,
138``update interval``, ``minimum number of regions``, and ``maximum number of
139regions``.
140
141
142Access Frequency Monitoring
143~~~~~~~~~~~~~~~~~~~~~~~~~~~
144
145The output of DAMON says what pages are how frequently accessed for a given
146duration.  The resolution of the access frequency is controlled by setting
147``sampling interval`` and ``aggregation interval``.  In detail, DAMON checks
148access to each page per ``sampling interval`` and aggregates the results.  In
149other words, counts the number of the accesses to each page.  After each
150``aggregation interval`` passes, DAMON calls callback functions that previously
151registered by users so that users can read the aggregated results and then
152clears the results.  This can be described in below simple pseudo-code::
153
154    while monitoring_on:
155        for page in monitoring_target:
156            if accessed(page):
157                nr_accesses[page] += 1
158        if time() % aggregation_interval == 0:
159            for callback in user_registered_callbacks:
160                callback(monitoring_target, nr_accesses)
161            for page in monitoring_target:
162                nr_accesses[page] = 0
163        sleep(sampling interval)
164
165The monitoring overhead of this mechanism will arbitrarily increase as the
166size of the target workload grows.
167
168
169.. _damon_design_region_based_sampling:
170
171Region Based Sampling
172~~~~~~~~~~~~~~~~~~~~~
173
174To avoid the unbounded increase of the overhead, DAMON groups adjacent pages
175that assumed to have the same access frequencies into a region.  As long as the
176assumption (pages in a region have the same access frequencies) is kept, only
177one page in the region is required to be checked.  Thus, for each ``sampling
178interval``, DAMON randomly picks one page in each region, waits for one
179``sampling interval``, checks whether the page is accessed meanwhile, and
180increases the access frequency counter of the region if so.  The counter is
181called ``nr_regions`` of the region.  Therefore, the monitoring overhead is
182controllable by setting the number of regions.  DAMON allows users to set the
183minimum and the maximum number of regions for the trade-off.
184
185This scheme, however, cannot preserve the quality of the output if the
186assumption is not guaranteed.
187
188
189Adaptive Regions Adjustment
190~~~~~~~~~~~~~~~~~~~~~~~~~~~
191
192Even somehow the initial monitoring target regions are well constructed to
193fulfill the assumption (pages in same region have similar access frequencies),
194the data access pattern can be dynamically changed.  This will result in low
195monitoring quality.  To keep the assumption as much as possible, DAMON
196adaptively merges and splits each region based on their access frequency.
197
198For each ``aggregation interval``, it compares the access frequencies of
199adjacent regions and merges those if the frequency difference is small.  Then,
200after it reports and clears the aggregated access frequency of each region, it
201splits each region into two or three regions if the total number of regions
202will not exceed the user-specified maximum number of regions after the split.
203
204In this way, DAMON provides its best-effort quality and minimal overhead while
205keeping the bounds users set for their trade-off.
206
207
208.. _damon_design_age_tracking:
209
210Age Tracking
211~~~~~~~~~~~~
212
213By analyzing the monitoring results, users can also find how long the current
214access pattern of a region has maintained.  That could be used for good
215understanding of the access pattern.  For example, page placement algorithm
216utilizing both the frequency and the recency could be implemented using that.
217To make such access pattern maintained period analysis easier, DAMON maintains
218yet another counter called ``age`` in each region.  For each ``aggregation
219interval``, DAMON checks if the region's size and access frequency
220(``nr_accesses``) has significantly changed.  If so, the counter is reset to
221zero.  Otherwise, the counter is increased.
222
223
224Dynamic Target Space Updates Handling
225~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
226
227The monitoring target address range could dynamically changed.  For example,
228virtual memory could be dynamically mapped and unmapped.  Physical memory could
229be hot-plugged.
230
231As the changes could be quite frequent in some cases, DAMON allows the
232monitoring operations to check dynamic changes including memory mapping changes
233and applies it to monitoring operations-related data structures such as the
234abstracted monitoring target memory area only for each of a user-specified time
235interval (``update interval``).
236
237
238.. _damon_design_damos:
239
240Operation Schemes
241-----------------
242
243One common purpose of data access monitoring is access-aware system efficiency
244optimizations.  For example,
245
246    paging out memory regions that are not accessed for more than two minutes
247
248or
249
250    using THP for memory regions that are larger than 2 MiB and showing a high
251    access frequency for more than one minute.
252
253One straightforward approach for such schemes would be profile-guided
254optimizations.  That is, getting data access monitoring results of the
255workloads or the system using DAMON, finding memory regions of special
256characteristics by profiling the monitoring results, and making system
257operation changes for the regions.  The changes could be made by modifying or
258providing advice to the software (the application and/or the kernel), or
259reconfiguring the hardware.  Both offline and online approaches could be
260available.
261
262Among those, providing advice to the kernel at runtime would be flexible and
263effective, and therefore widely be used.   However, implementing such schemes
264could impose unnecessary redundancy and inefficiency.  The profiling could be
265redundant if the type of interest is common.  Exchanging the information
266including monitoring results and operation advice between kernel and user
267spaces could be inefficient.
268
269To allow users to reduce such redundancy and inefficiencies by offloading the
270works, DAMON provides a feature called Data Access Monitoring-based Operation
271Schemes (DAMOS).  It lets users specify their desired schemes at a high
272level.  For such specifications, DAMON starts monitoring, finds regions having
273the access pattern of interest, and applies the user-desired operation actions
274to the regions, for every user-specified time interval called
275``apply_interval``.
276
277
278.. _damon_design_damos_action:
279
280Operation Action
281~~~~~~~~~~~~~~~~
282
283The management action that the users desire to apply to the regions of their
284interest.  For example, paging out, prioritizing for next reclamation victim
285selection, advising ``khugepaged`` to collapse or split, or doing nothing but
286collecting statistics of the regions.
287
288The list of supported actions is defined in DAMOS, but the implementation of
289each action is in the DAMON operations set layer because the implementation
290normally depends on the monitoring target address space.  For example, the code
291for paging specific virtual address ranges out would be different from that for
292physical address ranges.  And the monitoring operations implementation sets are
293not mandated to support all actions of the list.  Hence, the availability of
294specific DAMOS action depends on what operations set is selected to be used
295together.
296
297Applying an action to a region is considered as changing the region's
298characteristics.  Hence, DAMOS resets the age of regions when an action is
299applied to those.
300
301
302.. _damon_design_damos_access_pattern:
303
304Target Access Pattern
305~~~~~~~~~~~~~~~~~~~~~
306
307The access pattern of the schemes' interest.  The patterns are constructed with
308the properties that DAMON's monitoring results provide, specifically the size,
309the access frequency, and the age.  Users can describe their access pattern of
310interest by setting minimum and maximum values of the three properties.  If a
311region's three properties are in the ranges, DAMOS classifies it as one of the
312regions that the scheme is having an interest in.
313
314
315.. _damon_design_damos_quotas:
316
317Quotas
318~~~~~~
319
320DAMOS upper-bound overhead control feature.  DAMOS could incur high overhead if
321the target access pattern is not properly tuned.  For example, if a huge memory
322region having the access pattern of interest is found, applying the scheme's
323action to all pages of the huge region could consume unacceptably large system
324resources.  Preventing such issues by tuning the access pattern could be
325challenging, especially if the access patterns of the workloads are highly
326dynamic.
327
328To mitigate that situation, DAMOS provides an upper-bound overhead control
329feature called quotas.  It lets users specify an upper limit of time that DAMOS
330can use for applying the action, and/or a maximum bytes of memory regions that
331the action can be applied within a user-specified time duration.
332
333
334.. _damon_design_damos_quotas_prioritization:
335
336Prioritization
337^^^^^^^^^^^^^^
338
339A mechanism for making a good decision under the quotas.  When the action
340cannot be applied to all regions of interest due to the quotas, DAMOS
341prioritizes regions and applies the action to only regions having high enough
342priorities so that it will not exceed the quotas.
343
344The prioritization mechanism should be different for each action.  For example,
345rarely accessed (colder) memory regions would be prioritized for page-out
346scheme action.  In contrast, the colder regions would be deprioritized for huge
347page collapse scheme action.  Hence, the prioritization mechanisms for each
348action are implemented in each DAMON operations set, together with the actions.
349
350Though the implementation is up to the DAMON operations set, it would be common
351to calculate the priority using the access pattern properties of the regions.
352Some users would want the mechanisms to be personalized for their specific
353case.  For example, some users would want the mechanism to weigh the recency
354(``age``) more than the access frequency (``nr_accesses``).  DAMOS allows users
355to specify the weight of each access pattern property and passes the
356information to the underlying mechanism.  Nevertheless, how and even whether
357the weight will be respected are up to the underlying prioritization mechanism
358implementation.
359
360
361.. _damon_design_damos_quotas_auto_tuning:
362
363Aim-oriented Feedback-driven Auto-tuning
364^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
365
366Automatic feedback-driven quota tuning.  Instead of setting the absolute quota
367value, users can repeatedly provide numbers representing how much of their goal
368for the scheme is achieved as feedback.  DAMOS then automatically tunes the
369aggressiveness (the quota) of the corresponding scheme.  For example, if DAMOS
370is under achieving the goal, DAMOS automatically increases the quota.  If DAMOS
371is over achieving the goal, it decreases the quota.
372
373
374.. _damon_design_damos_watermarks:
375
376Watermarks
377~~~~~~~~~~
378
379Conditional DAMOS (de)activation automation.  Users might want DAMOS to run
380only under certain situations.  For example, when a sufficient amount of free
381memory is guaranteed, running a scheme for proactive reclamation would only
382consume unnecessary system resources.  To avoid such consumption, the user would
383need to manually monitor some metrics such as free memory ratio, and turn
384DAMON/DAMOS on or off.
385
386DAMOS allows users to offload such works using three watermarks.  It allows the
387users to configure the metric of their interest, and three watermark values,
388namely high, middle, and low.  If the value of the metric becomes above the
389high watermark or below the low watermark, the scheme is deactivated.  If the
390metric becomes below the mid watermark but above the low watermark, the scheme
391is activated.  If all schemes are deactivated by the watermarks, the monitoring
392is also deactivated.  In this case, the DAMON worker thread only periodically
393checks the watermarks and therefore incurs nearly zero overhead.
394
395
396.. _damon_design_damos_filters:
397
398Filters
399~~~~~~~
400
401Non-access pattern-based target memory regions filtering.  If users run
402self-written programs or have good profiling tools, they could know something
403more than the kernel, such as future access patterns or some special
404requirements for specific types of memory. For example, some users may know
405only anonymous pages can impact their program's performance.  They can also
406have a list of latency-critical processes.
407
408To let users optimize DAMOS schemes with such special knowledge, DAMOS provides
409a feature called DAMOS filters.  The feature allows users to set an arbitrary
410number of filters for each scheme.  Each filter specifies the type of target
411memory, and whether it should exclude the memory of the type (filter-out), or
412all except the memory of the type (filter-in).
413
414Currently, anonymous page, memory cgroup, address range, and DAMON monitoring
415target type filters are supported by the feature.  Some filter target types
416require additional arguments.  The memory cgroup filter type asks users to
417specify the file path of the memory cgroup for the filter.  The address range
418type asks the start and end addresses of the range.  The DAMON monitoring
419target type asks the index of the target from the context's monitoring targets
420list.  Hence, users can apply specific schemes to only anonymous pages,
421non-anonymous pages, pages of specific cgroups, all pages excluding those of
422specific cgroups, pages in specific address range, pages in specific DAMON
423monitoring targets, and any combination of those.
424
425To handle filters efficiently, the address range and DAMON monitoring target
426type filters are handled by the core layer, while others are handled by
427operations set.  If a memory region is filtered by a core layer-handled filter,
428it is not counted as the scheme has tried to the region.  In contrast, if a
429memory regions is filtered by an operations set layer-handled filter, it is
430counted as the scheme has tried.  The difference in accounting leads to changes
431in the statistics.
432
433
434Application Programming Interface
435---------------------------------
436
437The programming interface for kernel space data access-aware applications.
438DAMON is a framework, so it does nothing by itself.  Instead, it only helps
439other kernel components such as subsystems and modules building their data
440access-aware applications using DAMON's core features.  For this, DAMON exposes
441its all features to other kernel components via its application programming
442interface, namely ``include/linux/damon.h``.  Please refer to the API
443:doc:`document </mm/damon/api>` for details of the interface.
444
445
446Modules
447=======
448
449Because the core of DAMON is a framework for kernel components, it doesn't
450provide any direct interface for the user space.  Such interfaces should be
451implemented by each DAMON API user kernel components, instead.  DAMON subsystem
452itself implements such DAMON API user modules, which are supposed to be used
453for general purpose DAMON control and special purpose data access-aware system
454operations, and provides stable application binary interfaces (ABI) for the
455user space.  The user space can build their efficient data access-aware
456applications using the interfaces.
457
458
459General Purpose User Interface Modules
460--------------------------------------
461
462DAMON modules that provide user space ABIs for general purpose DAMON usage in
463runtime.
464
465DAMON user interface modules, namely 'DAMON sysfs interface' and 'DAMON debugfs
466interface' are DAMON API user kernel modules that provide ABIs to the
467user-space.  Please note that DAMON debugfs interface is currently deprecated.
468
469Like many other ABIs, the modules create files on sysfs and debugfs, allow
470users to specify their requests to and get the answers from DAMON by writing to
471and reading from the files.  As a response to such I/O, DAMON user interface
472modules control DAMON and retrieve the results as user requested via the DAMON
473API, and return the results to the user-space.
474
475The ABIs are designed to be used for user space applications development,
476rather than human beings' fingers.  Human users are recommended to use such
477user space tools.  One such Python-written user space tool is available at
478Github (https://github.com/awslabs/damo), Pypi
479(https://pypistats.org/packages/damo), and Fedora
480(https://packages.fedoraproject.org/pkgs/python-damo/damo/).
481
482Please refer to the ABI :doc:`document </admin-guide/mm/damon/usage>` for
483details of the interfaces.
484
485
486Special-Purpose Access-aware Kernel Modules
487-------------------------------------------
488
489DAMON modules that provide user space ABI for specific purpose DAMON usage.
490
491DAMON sysfs/debugfs user interfaces are for full control of all DAMON features
492in runtime.  For each special-purpose system-wide data access-aware system
493operations such as proactive reclamation or LRU lists balancing, the interfaces
494could be simplified by removing unnecessary knobs for the specific purpose, and
495extended for boot-time and even compile time control.  Default values of DAMON
496control parameters for the usage would also need to be optimized for the
497purpose.
498
499To support such cases, yet more DAMON API user kernel modules that provide more
500simple and optimized user space interfaces are available.  Currently, two
501modules for proactive reclamation and LRU lists manipulation are provided.  For
502more detail, please read the usage documents for those
503(:doc:`/admin-guide/mm/damon/reclaim` and
504:doc:`/admin-guide/mm/damon/lru_sort`).
505