.. SPDX-License-Identifier: GPL-2.0

===============
Physical Memory
===============

Linux is available for a wide range of architectures so there is a need for an
architecture-independent abstraction to represent the physical memory. This
chapter describes the structures used to manage physical memory in a running
system.

The first principal concept prevalent in memory management is
`Non-Uniform Memory Access (NUMA)
<https://en.wikipedia.org/wiki/Non-uniform_memory_access>`_.
With multi-core and multi-socket machines, memory may be arranged into banks
that incur a different cost to access depending on the “distance” from the
processor. For example, there might be a bank of memory assigned to each CPU or
a bank of memory very suitable for DMA near peripheral devices.

Each bank is called a node and the concept is represented under Linux by a
``struct pglist_data`` even if the architecture is UMA. This structure is
always referenced by its typedef ``pg_data_t``. A ``pg_data_t`` structure
for a particular node can be referenced by the ``NODE_DATA(nid)`` macro where
``nid`` is the ID of that node.

For NUMA architectures, the node structures are allocated by the architecture
specific code early during boot. Usually, these structures are allocated
locally on the memory bank they represent. For UMA architectures, only one
static ``pg_data_t`` structure called ``contig_page_data`` is used. Nodes will
be discussed further in Section :ref:`Nodes <nodes>`.

The entire physical address space is partitioned into one or more blocks
called zones which represent ranges within memory. These ranges are usually
determined by architectural constraints for accessing the physical memory.
The memory range within a node that corresponds to a particular zone is
described by a ``struct zone``. Each zone has
one of the types described below.

* ``ZONE_DMA`` and ``ZONE_DMA32`` historically represented memory suitable for
  DMA by peripheral devices that cannot access all of the addressable
  memory. For many years there have been better and more robust interfaces to
  get memory with DMA specific requirements
  (Documentation/core-api/dma-api.rst), but ``ZONE_DMA`` and ``ZONE_DMA32``
  still represent memory ranges that have restrictions on how they can be
  accessed.
  Depending on the architecture, either or even both of these zone types
  can be disabled at build time using ``CONFIG_ZONE_DMA`` and
  ``CONFIG_ZONE_DMA32`` configuration options. Some 64-bit platforms may need
  both zones as they support peripherals with different DMA addressing
  limitations.

* ``ZONE_NORMAL`` is for normal memory that can be accessed by the kernel all
  the time. DMA operations can be performed on pages in this zone if the DMA
  devices support transfers to all addressable memory. ``ZONE_NORMAL`` is
  always enabled.

* ``ZONE_HIGHMEM`` is the part of the physical memory that is not covered by a
  permanent mapping in the kernel page tables. The memory in this zone is only
  accessible to the kernel using temporary mappings. This zone is available
  only on some 32-bit architectures and is enabled with ``CONFIG_HIGHMEM``.

* ``ZONE_MOVABLE`` is for normal accessible memory, just like ``ZONE_NORMAL``.
  The difference is that the contents of most pages in ``ZONE_MOVABLE`` are
  movable. That means that while virtual addresses of these pages do not
  change, their content may move between different physical pages. Often
  ``ZONE_MOVABLE`` is populated during memory hotplug, but it may
  also be populated on boot using one of the ``kernelcore``, ``movablecore``
  and ``movable_node`` kernel command line parameters. See
  Documentation/mm/page_migration.rst and
  Documentation/admin-guide/mm/memory-hotplug.rst for additional details.

* ``ZONE_DEVICE`` represents memory residing on devices such as PMEM and GPU.
  It has different characteristics than RAM zone types and it exists to provide
  :ref:`struct page <Pages>` and memory map services for device driver
  identified physical address ranges. ``ZONE_DEVICE`` is enabled with
  configuration option ``CONFIG_ZONE_DEVICE``.

It is important to note that many kernel operations can only take place using
``ZONE_NORMAL`` so it is the most performance critical zone. Zones are
discussed further in Section :ref:`Zones <zones>`.

The relation between node and zone extents is determined by the physical memory
map reported by the firmware, architectural constraints for memory addressing
and certain parameters in the kernel command line.

For example, with a 32-bit kernel on an x86 UMA machine with 2 Gbytes of RAM
the entire memory will be on node 0 and there will be three zones:
``ZONE_DMA``, ``ZONE_NORMAL`` and ``ZONE_HIGHMEM``::

  0                                                            2G
  +-------------------------------------------------------------+
  |                            node 0                           |
  +-------------------------------------------------------------+

  0         16M                    896M                        2G
  +----------+-----------------------+--------------------------+
  | ZONE_DMA |      ZONE_NORMAL      |       ZONE_HIGHMEM       |
  +----------+-----------------------+--------------------------+


With a kernel built with ``ZONE_DMA`` disabled and ``ZONE_DMA32`` enabled and
booted with the ``movablecore=80%`` parameter on an arm64 machine with
16 Gbytes of RAM equally split between two nodes, there will be ``ZONE_DMA32``,
``ZONE_NORMAL`` and ``ZONE_MOVABLE`` on node 0, and ``ZONE_NORMAL`` and
``ZONE_MOVABLE`` on node 1::


  1G                                9G                        17G
  +--------------------------------+ +--------------------------+
  |             node 0             | |          node 1          |
  +--------------------------------+ +--------------------------+

  1G       4G        4200M          9G          9320M         17G
  +---------+----------+-----------+ +------------+-------------+
  |  DMA32  |  NORMAL  |  MOVABLE  | |   NORMAL   |   MOVABLE   |
  +---------+----------+-----------+ +------------+-------------+


Memory banks may belong to interleaving nodes. In the example below an x86
machine has 16 Gbytes of RAM in 4 memory banks; even banks belong to node 0
and odd banks belong to node 1::


  0              4G              8G              12G          16G
  +-------------+ +-------------+ +-------------+ +-------------+
  |    node 0   | |    node 1   | |    node 0   | |    node 1   |
  +-------------+ +-------------+ +-------------+ +-------------+

  0   16M      4G
  +-----+-------+ +-------------+ +-------------+ +-------------+
  | DMA | DMA32 | |    NORMAL   | |    NORMAL   | |    NORMAL   |
  +-----+-------+ +-------------+ +-------------+ +-------------+

In this case node 0 will span from 0 to 12 Gbytes and node 1 will span from
4 to 16 Gbytes.

.. _nodes:

Nodes
=====

As we have mentioned, each node in memory is described by a ``pg_data_t`` which
is a typedef for a ``struct pglist_data``. When allocating a page, by default
Linux uses a node-local allocation policy to allocate memory from the node
closest to the running CPU. As processes tend to run on the same CPU, it is
likely the memory from the current node will be used. The allocation policy can
be controlled by users as described in
Documentation/admin-guide/mm/numa_memory_policy.rst.

Most NUMA architectures maintain an array of pointers to the node
structures. The actual structures are allocated early during boot when
architecture specific code parses the physical memory map reported by the
firmware. The bulk of the node initialization happens slightly later in the
boot process by the free_area_init() function, described later in Section
:ref:`Initialization <initialization>`.


Along with the node structures, the kernel maintains an array of ``nodemask_t``
bitmasks called ``node_states``.
Each bitmask in this array represents a set of
nodes with particular properties as defined by ``enum node_states``:

``N_POSSIBLE``
  The node could become online at some point.
``N_ONLINE``
  The node is online.
``N_NORMAL_MEMORY``
  The node has regular memory.
``N_HIGH_MEMORY``
  The node has regular or high memory. When ``CONFIG_HIGHMEM`` is disabled,
  it is aliased to ``N_NORMAL_MEMORY``.
``N_MEMORY``
  The node has memory (regular, high, movable).
``N_CPU``
  The node has one or more CPUs.
``N_GENERIC_INITIATOR``
  The node has one or more Generic Initiators.

For each node that has a property described above, the bit corresponding to the
node ID in the ``node_states[<property>]`` bitmask is set.

For example, for node 2 with normal memory and CPUs, bit 2 will be set in ::

	node_states[N_POSSIBLE]
	node_states[N_ONLINE]
	node_states[N_NORMAL_MEMORY]
	node_states[N_HIGH_MEMORY]
	node_states[N_MEMORY]
	node_states[N_CPU]

For various operations possible with nodemasks please refer to
``include/linux/nodemask.h``.

Among other things, nodemasks are used to provide macros for node traversal,
namely ``for_each_node()`` and ``for_each_online_node()``.

For instance, to call a function foo() for each online node::

	for_each_online_node(nid) {
		pg_data_t *pgdat = NODE_DATA(nid);

		foo(pgdat);
	}

Node structure
--------------

The node structure ``struct pglist_data`` is declared in
``include/linux/mmzone.h``. Here we briefly describe fields of this
structure:

General
~~~~~~~

``node_zones``
  The zones for this node. Not all of the zones may be populated, but it is
  the full list. It is referenced by this node's node_zonelists as well as
  other nodes' node_zonelists.

``node_zonelists``
  The list of all zones in all nodes. This list defines the order of zones
  that allocations are preferred from.
  The ``node_zonelists`` is set up by
  ``build_zonelists()`` in ``mm/page_alloc.c`` during the initialization of
  core memory management structures.

``nr_zones``
  Number of populated zones in this node.

``node_mem_map``
  For UMA systems that use the FLATMEM memory model, node 0's
  ``node_mem_map`` is an array of struct pages representing each physical
  frame.

``node_page_ext``
  For UMA systems that use the FLATMEM memory model, node 0's
  ``node_page_ext`` is an array of extensions of struct pages. Available only
  in the kernels built with ``CONFIG_PAGE_EXTENSION`` enabled.

``node_start_pfn``
  The page frame number of the starting page frame in this node.

``node_present_pages``
  Total number of physical pages present in this node.

``node_spanned_pages``
  Total size of physical page range, including holes.

``node_size_lock``
  A lock that protects the fields defining the node extents. Only defined when
  at least one of the ``CONFIG_MEMORY_HOTPLUG`` or
  ``CONFIG_DEFERRED_STRUCT_PAGE_INIT`` configuration options is enabled.
  ``pgdat_resize_lock()`` and ``pgdat_resize_unlock()`` are provided to
  manipulate ``node_size_lock`` without checking for ``CONFIG_MEMORY_HOTPLUG``
  or ``CONFIG_DEFERRED_STRUCT_PAGE_INIT``.

``node_id``
  The Node ID (NID) of the node, starts at 0.

``totalreserve_pages``
  This is a per-node reserve of pages that are not available to userspace
  allocations.

``first_deferred_pfn``
  If memory initialization on large machines is deferred then this is the first
  PFN that needs to be initialized. Defined only when
  ``CONFIG_DEFERRED_STRUCT_PAGE_INIT`` is enabled.

``deferred_split_queue``
  Per-node queue of huge pages whose split was deferred. Defined only when
  ``CONFIG_TRANSPARENT_HUGEPAGE`` is enabled.

``__lruvec``
  Per-node lruvec holding LRU lists and related parameters.
  Used only when
  memory cgroups are disabled. It should not be accessed directly; use
  ``mem_cgroup_lruvec()`` to look up lruvecs instead.

Reclaim control
~~~~~~~~~~~~~~~

See also Documentation/mm/page_reclaim.rst.

``kswapd``
  Per-node instance of kswapd kernel thread.

``kswapd_wait``, ``pfmemalloc_wait``, ``reclaim_wait``
  Wait queues used to synchronize memory reclaim tasks.

``nr_writeback_throttled``
  Number of tasks that are throttled waiting on dirty pages to clean.

``nr_reclaim_start``
  Number of pages written while reclaim is throttled waiting for writeback.

``kswapd_order``
  Controls the order kswapd tries to reclaim.

``kswapd_highest_zoneidx``
  The highest zone index to be reclaimed by kswapd.

``kswapd_failures``
  Number of runs in which kswapd was unable to reclaim any pages.

``min_unmapped_pages``
  Minimal number of unmapped file backed pages that cannot be reclaimed.
  Determined by ``vm.min_unmapped_ratio`` sysctl. Only defined when
  ``CONFIG_NUMA`` is enabled.

``min_slab_pages``
  Minimal number of SLAB pages that cannot be reclaimed. Determined by
  ``vm.min_slab_ratio`` sysctl. Only defined when ``CONFIG_NUMA`` is enabled.

``flags``
  Flags controlling reclaim behavior.

Compaction control
~~~~~~~~~~~~~~~~~~

``kcompactd_max_order``
  Page order that kcompactd should try to achieve.

``kcompactd_highest_zoneidx``
  The highest zone index to be compacted by kcompactd.

``kcompactd_wait``
  Wait queue used to synchronize memory compaction tasks.

``kcompactd``
  Per-node instance of kcompactd kernel thread.

``proactive_compact_trigger``
  Determines if proactive compaction is enabled. Controlled by
  ``vm.compaction_proactiveness`` sysctl.

Statistics
~~~~~~~~~~

``per_cpu_nodestats``
  Per-CPU VM statistics for the node.

``vm_stat``
  VM statistics for the node.
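
Returning to the ``node_states`` bitmasks and the traversal macros described at
the beginning of this section, their behavior can be mimicked by a small
userspace sketch. It is not the kernel implementation: every ``toy_`` and
``TOY_`` name below is hypothetical, the enum lists only a subset of the real
states, and a single 64-bit word stands in for ``nodemask_t``.

```c
/* Hypothetical limit; the kernel derives its own from the configuration. */
#define TOY_MAX_NUMNODES 64

/* Hypothetical subset of enum node_states, for illustration only. */
enum toy_node_states {
	TOY_N_POSSIBLE,
	TOY_N_ONLINE,
	TOY_N_MEMORY,
	TOY_N_CPU,
	TOY_NR_NODE_STATES
};

/* One bitmask per state; bit nid is set when node nid has the property. */
unsigned long long toy_node_states[TOY_NR_NODE_STATES];

/* Mark node nid as having the given property. */
void toy_node_set_state(int nid, enum toy_node_states state)
{
	toy_node_states[state] |= 1ULL << nid;
}

/* Return 1 if node nid has the given property, 0 otherwise. */
int toy_node_state(int nid, enum toy_node_states state)
{
	return (int)((toy_node_states[state] >> nid) & 1);
}

/* Visit every node ID whose bit is set in the mask of the given state. */
#define toy_for_each_node_state(nid, state)			\
	for ((nid) = 0; (nid) < TOY_MAX_NUMNODES; (nid)++)	\
		if (toy_node_state((nid), (state)))

/* Count the nodes that have the given property. */
int toy_num_node_state(enum toy_node_states state)
{
	int nid, count = 0;

	toy_for_each_node_state(nid, state)
		count++;
	return count;
}
```

A traversal such as ``toy_for_each_node_state(nid, TOY_N_ONLINE)`` plays the
role that ``for_each_online_node(nid)`` plays in the kernel: the loop body runs
only for node IDs whose bit is set in the corresponding mask.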

.. _zones:

Zones
=====
As we have mentioned, each zone in memory is described by a ``struct zone``
which is an element of the ``node_zones`` array of the node it belongs to.
``struct zone`` is the core data structure of the page allocator. A zone
represents a range of physical memory and may have holes.

The page allocator uses the GFP flags, see :ref:`mm-api-gfp-flags`, specified by
a memory allocation to determine the highest zone in a node from which the
memory allocation can allocate memory. The page allocator first allocates memory
from that zone. If the page allocator can't allocate the requested amount of
memory from the zone, it will allocate memory from the next lower zone in the
node; the process continues up to and including the lowest zone. For example, if
a node contains ``ZONE_DMA32``, ``ZONE_NORMAL`` and ``ZONE_MOVABLE`` and the
highest zone of a memory allocation is ``ZONE_MOVABLE``, the order of the zones
from which the page allocator allocates memory is ``ZONE_MOVABLE`` >
``ZONE_NORMAL`` > ``ZONE_DMA32``.

At runtime, free pages in a zone are in the Per-CPU Pagesets (PCP) or free areas
of the zone. The Per-CPU Pagesets are a vital mechanism in the kernel's memory
management system. By handling most frequent allocations and frees locally on
each CPU, the Per-CPU Pagesets improve performance and scalability, especially
on systems with many cores. The page allocator in the kernel employs a two-step
strategy for memory allocation, starting with the Per-CPU Pagesets before
falling back to the buddy allocator. Pages are transferred between the Per-CPU
Pagesets and the global free areas (managed by the buddy allocator) in batches.
This minimizes the overhead of frequent interactions with the global buddy
allocator.

Architecture specific code calls free_area_init() to initialize zones.
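
The zone fallback order described above can be illustrated with a minimal
userspace sketch. It is not kernel code: the ``toy_zone`` structure, the
``toy_alloc()`` helper and the ``TOY_`` constants are hypothetical names
invented for this illustration, and the sketch deliberately ignores watermarks,
``lowmem_reserve`` and the Per-CPU Pagesets. Only the walk from the highest
allowed zone down to the lowest one follows the text.

```c
/* Hypothetical zone indexes, ordered from lowest to highest as in the text. */
enum toy_zone_type {
	TOY_ZONE_DMA32,
	TOY_ZONE_NORMAL,
	TOY_ZONE_MOVABLE,
	TOY_NR_ZONES
};

/* Hypothetical stand-in for struct zone: a name and a free page count. */
struct toy_zone {
	const char *name;
	unsigned long free_pages;
};

/*
 * Try the highest zone the allocation may use first, then fall back to each
 * lower zone in turn. Returns the index of the zone that satisfied the
 * request, or -1 if no zone could.
 */
int toy_alloc(struct toy_zone *zones, int highest_zoneidx,
	      unsigned long nr_pages)
{
	int idx;

	for (idx = highest_zoneidx; idx >= 0; idx--) {
		if (zones[idx].free_pages >= nr_pages) {
			zones[idx].free_pages -= nr_pages;
			return idx;
		}
	}
	return -1;
}
```

With ``highest_zoneidx`` set to ``TOY_ZONE_MOVABLE`` the loop visits the zones
in the order Movable, Normal, DMA32, which mirrors the example in the text. An
allocation limited to ``TOY_ZONE_NORMAL`` never touches the movable zone.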

Zone structure
--------------
The zone structure ``struct zone`` is defined in ``include/linux/mmzone.h``.
Here we briefly describe fields of this structure:

General
~~~~~~~

``_watermark``
  The watermarks for this zone. When the amount of free pages in a zone is below
  the min watermark, boosting is ignored and an allocation may trigger direct
  reclaim and direct compaction; the min watermark is also used to throttle
  direct reclaim.
  When the amount of free pages in a zone is below the low watermark, kswapd is
  woken up. When the amount of free pages in a zone is above the high watermark,
  kswapd stops reclaiming (a zone is balanced) when the
  ``NUMA_BALANCING_MEMORY_TIERING`` bit of ``sysctl_numa_balancing_mode`` is not
  set. The promo watermark is used for memory tiering and NUMA balancing. When
  the amount of free pages in a zone is above the promo watermark, kswapd stops
  reclaiming when the ``NUMA_BALANCING_MEMORY_TIERING`` bit of
  ``sysctl_numa_balancing_mode`` is set. The watermarks are set by
  ``__setup_per_zone_wmarks()``. The min watermark is calculated according to
  ``vm.min_free_kbytes`` sysctl. The other three watermarks are set according
  to the distance between two watermarks. The distance itself is calculated
  taking ``vm.watermark_scale_factor`` sysctl into account.

``watermark_boost``
  The number of pages which are used to boost watermarks to increase reclaim
  pressure to reduce the likelihood of future fallbacks and wake kswapd now
  as the node may be balanced overall and kswapd will not wake naturally.

``nr_reserved_highatomic``
  The number of pages which are reserved for high-order atomic allocations.

``nr_free_highatomic``
  The number of free pages in reserved highatomic pageblocks.

``lowmem_reserve``
  The array of the amounts of the memory reserved in this zone for memory
  allocations.
  For example, if the highest zone a memory allocation can
  allocate memory from is ``ZONE_MOVABLE``, the amount of memory reserved in
  this zone for this allocation is ``lowmem_reserve[ZONE_MOVABLE]`` when
  attempting to allocate memory from this zone. This is a mechanism the page
  allocator uses to prevent allocations which could use ``highmem`` from using
  too much ``lowmem``. For some specialised workloads on ``highmem`` machines,
  it is dangerous for the kernel to allow process memory to be allocated from
  the ``lowmem`` zone. This is because that memory could then be pinned via the
  ``mlock()`` system call, or by unavailability of swapspace.
  ``vm.lowmem_reserve_ratio`` sysctl determines how aggressive the kernel is in
  defending these lower zones. This array is recalculated by
  ``setup_per_zone_lowmem_reserve()`` at runtime if ``vm.lowmem_reserve_ratio``
  sysctl changes.

``node``
  The index of the node this zone belongs to. Available only when
  ``CONFIG_NUMA`` is enabled because there is only one node in a UMA system.

``zone_pgdat``
  Pointer to the ``struct pglist_data`` of the node this zone belongs to.

``per_cpu_pageset``
  Pointer to the Per-CPU Pagesets (PCP) allocated and initialized by
  ``setup_zone_pageset()``. By handling most frequent allocations and frees
  locally on each CPU, PCP improves performance and scalability on systems with
  many cores.

``pageset_high_min``
  Copied to the ``high_min`` of the Per-CPU Pagesets for faster access.

``pageset_high_max``
  Copied to the ``high_max`` of the Per-CPU Pagesets for faster access.

``pageset_batch``
  Copied to the ``batch`` of the Per-CPU Pagesets for faster access. The
  ``batch``, ``high_min`` and ``high_max`` of the Per-CPU Pagesets are used to
  calculate the number of elements the Per-CPU Pagesets obtain from the buddy
  allocator under a single hold of the lock for efficiency.
  They are also used
  to decide if the Per-CPU Pagesets return pages to the buddy allocator when
  pages are freed.

``pageblock_flags``
  The pointer to the flags for the pageblocks in the zone (see
  ``include/linux/pageblock-flags.h`` for flags list). The memory is allocated
  in ``setup_usemap()``. Each pageblock occupies ``NR_PAGEBLOCK_BITS`` bits.
  Defined only when ``CONFIG_FLATMEM`` is enabled. The flags are stored in
  ``mem_section`` when ``CONFIG_SPARSEMEM`` is enabled.

``zone_start_pfn``
  The start pfn of the zone. It is initialized by
  ``calculate_node_totalpages()``.

``managed_pages``
  The present pages managed by the buddy system, which is calculated as:
  ``managed_pages`` = ``present_pages`` - ``reserved_pages``, where
  ``reserved_pages`` includes pages allocated by the memblock allocator. It
  should be used by the page allocator and the vm scanner to calculate all
  kinds of watermarks and thresholds.
  It is accessed using ``atomic_long_xxx()`` functions. It is initialized in
  ``free_area_init_core()`` and then is reinitialized when the memblock
  allocator frees pages into the buddy system.

``spanned_pages``
  The total pages spanned by the zone, including holes, which is calculated as:
  ``spanned_pages`` = ``zone_end_pfn`` - ``zone_start_pfn``. It is initialized
  by ``calculate_node_totalpages()``.

``present_pages``
  The physical pages existing within the zone, which is calculated as:
  ``present_pages`` = ``spanned_pages`` - ``absent_pages`` (pages in holes). It
  may be used by memory hotplug or memory power management logic to figure out
  unmanaged pages by checking (``present_pages`` - ``managed_pages``). Write
  access to ``present_pages`` at runtime should be protected by
  ``mem_hotplug_begin/done()``. Any reader who cannot tolerate drift of
  ``present_pages`` should use ``get_online_mems()`` to get a stable value.
  It
  is initialized by ``calculate_node_totalpages()``.

``present_early_pages``
  The present pages existing within the zone located on memory available since
  early boot, excluding hotplugged memory. Defined only when
  ``CONFIG_MEMORY_HOTPLUG`` is enabled and initialized by
  ``calculate_node_totalpages()``.

``cma_pages``
  The pages reserved for CMA use. These pages behave like ``ZONE_MOVABLE`` when
  they are not used for CMA. Defined only when ``CONFIG_CMA`` is enabled.

``name``
  The name of the zone. It is a pointer to the corresponding element of
  the ``zone_names`` array.

``nr_isolate_pageblock``
  Number of isolated pageblocks. It is used to solve the incorrect free page
  counting problem caused by racy retrieval of the migratetype of a pageblock.
  Protected by ``zone->lock``. Defined only when ``CONFIG_MEMORY_ISOLATION`` is
  enabled.

``span_seqlock``
  The seqlock to protect ``zone_start_pfn`` and ``spanned_pages``. It is a
  seqlock because it has to be read outside of ``zone->lock``, and it is done in
  the main allocator path. However, the seqlock is written quite infrequently.
  Defined only when ``CONFIG_MEMORY_HOTPLUG`` is enabled.

``initialized``
  The flag indicating if the zone is initialized. Set by
  ``init_currently_empty_zone()`` during boot.

``free_area``
  The array of free areas, where each element corresponds to a specific order
  which is a power of two. The buddy allocator uses this structure to manage
  free memory efficiently. When allocating, it tries to find the smallest
  sufficient block. If the smallest sufficient block is larger than the
  requested size, it will be recursively split into the next smaller blocks
  until the required size is reached. When a page is freed, it may be merged
  with its buddy to form a larger block. It is initialized by
  ``zone_init_free_lists()``.

``unaccepted_pages``
  The list of pages to be accepted. All pages on the list are of order
  ``MAX_PAGE_ORDER``.
  Defined only when ``CONFIG_UNACCEPTED_MEMORY`` is enabled.

``flags``
  The zone flags. The three least significant bits are used and are defined by
  ``enum zone_flags``. ``ZONE_BOOSTED_WATERMARK`` (bit 0): zone recently boosted
  watermarks. Cleared when kswapd is woken. ``ZONE_RECLAIM_ACTIVE`` (bit 1):
  kswapd may be scanning the zone. ``ZONE_BELOW_HIGH`` (bit 2): zone is below
  high watermark.

``lock``
  The main lock that protects the internal data structures of the page allocator
  specific to the zone; in particular, it protects ``free_area``.

``percpu_drift_mark``
  When free pages are below this point, additional steps are taken when reading
  the number of free pages to avoid per-cpu counter drift allowing watermarks
  to be breached. It is updated in ``refresh_zone_stat_thresholds()``.

Compaction control
~~~~~~~~~~~~~~~~~~

``compact_cached_free_pfn``
  The PFN where compaction free scanner should start in the next scan.

``compact_cached_migrate_pfn``
  The PFNs where compaction migration scanner should start in the next scan.
  This array has two elements: the first one is used in ``MIGRATE_ASYNC`` mode,
  and the other one is used in ``MIGRATE_SYNC`` mode.

``compact_init_migrate_pfn``
  The initial migration PFN which is initialized to 0 at boot time, and to the
  first pageblock with migratable pages in the zone after a full compaction
  finishes. It is used to check if a scan is a whole zone scan or not.

``compact_init_free_pfn``
  The initial free PFN which is initialized to 0 at boot time and to the last
  pageblock with free ``MIGRATE_MOVABLE`` pages in the zone. It is used to check
  if it is the start of a scan.

``compact_considered``
  The number of compactions attempted since last failure.
  It is reset in
  ``defer_compaction()`` when a compaction fails to result in a page allocation
  success. It is increased by 1 in ``compaction_deferred()`` when a compaction
  should be skipped. ``compaction_deferred()`` is called before
  ``compact_zone()`` is called; ``compaction_defer_reset()`` is called when
  ``compact_zone()`` returns ``COMPACT_SUCCESS``; ``defer_compaction()`` is
  called when ``compact_zone()`` returns ``COMPACT_PARTIAL_SKIPPED`` or
  ``COMPACT_COMPLETE``.

``compact_defer_shift``
  The number of compactions skipped before trying again is
  ``1<<compact_defer_shift``. It is increased by 1 in ``defer_compaction()``.
  It is reset in ``compaction_defer_reset()`` when a direct compaction results
  in a page allocation success. Its maximum value is ``COMPACT_MAX_DEFER_SHIFT``.

``compact_order_failed``
  The minimum compaction failed order. It is set in ``compaction_defer_reset()``
  when a compaction succeeds and in ``defer_compaction()`` when a compaction
  fails to result in a page allocation success.

``compact_blockskip_flush``
  Set to true when compaction migration scanner and free scanner meet, which
  means the ``PB_compact_skip`` bits should be cleared.

``contiguous``
  Set to true when the zone is contiguous (in other words, no hole).

Statistics
~~~~~~~~~~

``vm_stat``
  VM statistics for the zone. The items tracked are defined by
  ``enum zone_stat_item``.

``vm_numa_event``
  VM NUMA event statistics for the zone. The items tracked are defined by
  ``enum numa_stat_item``.

``per_cpu_zonestats``
  Per-CPU VM statistics for the zone. It records VM statistics and VM NUMA event
  statistics on a per-CPU basis. It reduces updates to the global ``vm_stat``
  and ``vm_numa_event`` fields of the zone to improve performance.

.. _pages:

Pages
=====

.. admonition:: Stub

   This section is incomplete.
   Please list and describe the appropriate fields.

.. _folios:

Folios
======

.. admonition:: Stub

   This section is incomplete. Please list and describe the appropriate fields.

.. _initialization:

Initialization
==============

.. admonition:: Stub

   This section is incomplete. Please list and describe the appropriate fields.