Currently THP only works for anonymous memory mappings and tmpfs/shmem,
but in the future it can expand to other filesystems.

The first factor is almost completely irrelevant because it also has
the downside of requiring larger clear-page and copy-page operations in
page faults, which is a potentially negative effect. The second,
long-lasting and much more important factor will affect all subsequent
accesses to the memory for the whole runtime of the application.

With virtualization and nested pagetables, the TLB can map a larger
size only if both KVM and the Linux guest are using hugepages, but a
significant speedup already happens if only one of the two is using
hugepages, just because the TLB miss is going to run faster.

Modern kernels support "multi-size THP" (mTHP), which introduces the
ability to allocate memory in blocks that are bigger than a base page
but smaller than traditional PMD-size (as described above), in
increments of a power-of-2 number of pages. mTHP can back anonymous
memory (for example 16K, 32K, 64K, etc.). These THPs continue to be
PTE-mapped, but in many cases can still provide similar benefits to
those outlined above: page faults are significantly reduced (by a
factor of e.g. 4, 8, 16, etc.), but latency spikes are much less
prominent because the size of each page isn't as huge as the PMD-sized
variant and there is less memory to clear in each page fault. Some
architectures also employ TLB compression mechanisms to squeeze more
entries in when a set of PTEs map physically contiguous, naturally
aligned memory, so this benefit is still realized.

THP can be enabled system wide or restricted to certain tasks or even
single memory ranges inside a task's address space. Unless THP is
completely disabled, there is a ``khugepaged`` daemon that scans memory
and collapses sequences of basic pages into PMD-sized huge pages.

Transparent Hugepage Support maximizes the usefulness of free memory
compared to the reservation approach of hugetlbfs by allowing all
unused memory to be used as cache or other movable (or even unmovable)
entities. It doesn't require reservation to prevent hugepage
allocation failures from being noticeable from userland. It allows
paging and all other advanced VM features to be available on the
hugepages. It requires no modifications for applications to take
advantage of it.

Applications, however, can be further optimized to take advantage of
this feature, as for example they've been optimized before to avoid a
flood of mmap system calls for every malloc(4k). Optimizing userland is
by far not mandatory and khugepaged can already take care of long-lived
page allocations even for hugepage-unaware applications that deal with
large amounts of memory.

In certain cases when hugepages are enabled system wide, an application
may end up allocating more memory resources; for this reason it's
possible to disable hugepages system-wide and to only have them inside
MADV_HUGEPAGE madvise regions.

Embedded systems should enable hugepages only inside madvise regions to
eliminate any risk of wasting any precious byte of memory and to only
run faster.

Applications that get a lot of benefit from hugepages and that don't
risk losing memory by using hugepages should use madvise(MADV_HUGEPAGE)
on their critical mmapped regions.

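As an illustration, here is a minimal sketch (not taken from the kernel
sources; the mapping size is arbitrary) of an application opting one
region into THP with madvise(2)::

    #include <sys/mman.h>

    int main(void)
    {
        /* 64M anonymous working set; an arbitrary, illustrative size. */
        size_t len = 64UL << 20;
        void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (buf == MAP_FAILED)
            return 1;
        /* Best-effort hint: the kernel may still use small pages. */
        madvise(buf, len, MADV_HUGEPAGE);
        /* ... touch and use buf ... */
        munmap(buf, len);
        return 0;
    }
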
Global THP controls
-------------------

Transparent Hugepage Support for anonymous memory can be entirely
disabled (mostly for debugging purposes) or only enabled inside
MADV_HUGEPAGE regions (to avoid the risk of consuming more memory
resources) or enabled system wide. This can be achieved
per-supported-THP-size with one of::

    echo always >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
    echo madvise >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled
    echo never >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled

where <size> is the hugepage size being addressed; the available sizes
vary by system.

.. note::
   Setting "never" in all sysfs THP controls does **not** disable
   Transparent Huge Pages globally: ``madvise(..., MADV_COLLAPSE)``
   ignores these settings and collapses ranges to PMD-sized huge pages
   unconditionally.

For example::

    echo always >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled

Alternatively it is possible to specify that a given hugepage size
will inherit the top-level "enabled" value::

    echo inherit >/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled

For example::

    echo inherit >/sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled

The top-level setting (for use with "inherit") can be set by issuing
one of the following commands::
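
    echo always >/sys/kernel/mm/transparent_hugepage/enabled
    echo madvise >/sys/kernel/mm/transparent_hugepage/enabled
    echo never >/sys/kernel/mm/transparent_hugepage/enabled
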
By default, PMD-sized hugepages have enabled="inherit" and all other
hugepage sizes have enabled="never". If enabling multiple hugepage
sizes, the kernel will select the most appropriate enabled size for a
given allocation.

It's also possible to limit defrag efforts in the VM to generate
anonymous hugepages in case they're not immediately free, to madvise
regions only, or to never try to defrag memory and simply fall back to
regular pages unless hugepages are immediately available. Clearly if we
spend CPU time to defrag memory, we would expect to gain even more by
the fact we use hugepages later instead of regular pages. This isn't
always guaranteed, but it may be more likely if the allocation is for a
MADV_HUGEPAGE region. The ``defrag`` control accepts one of::
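
    echo always >/sys/kernel/mm/transparent_hugepage/defrag
    echo defer >/sys/kernel/mm/transparent_hugepage/defrag
    echo defer+madvise >/sys/kernel/mm/transparent_hugepage/defrag
    echo madvise >/sys/kernel/mm/transparent_hugepage/defrag
    echo never >/sys/kernel/mm/transparent_hugepage/defrag
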
always
    means that an application requesting THP will stall on
    allocation failure and directly reclaim pages and compact
    memory in an effort to allocate a THP immediately. This may be
    desirable for virtual machines that benefit heavily from THP
    use and are willing to delay the VM start to utilise them.

defer
    means that an application will wake kswapd in the background
    to reclaim pages and wake kcompactd to compact memory so that
    THP is available in the near future. It's the responsibility
    of khugepaged to then install the THP pages later.

defer+madvise
    will enter direct reclaim and compaction like ``always``, but
    only for regions that have used madvise(MADV_HUGEPAGE); all
    other regions will wake kswapd in the background to reclaim
    pages and wake kcompactd to compact memory so that THP is
    available in the near future.

madvise
    will enter direct reclaim like ``always`` but only for regions
    that have used madvise(MADV_HUGEPAGE). This is the default
    behaviour.

never
    should be self-explanatory. Note that ``madvise(...,
    MADV_COLLAPSE)`` can still cause transparent huge pages to be
    obtained even if this mode is specified everywhere.

By default the kernel tries to use a huge, PMD-mappable zero page on
read page faults to anonymous mappings. It's possible to disable the
huge zero page by writing 0 or enable it back by writing 1::
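
    echo 0 >/sys/kernel/mm/transparent_hugepage/use_zero_page
    echo 1 >/sys/kernel/mm/transparent_hugepage/use_zero_page
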
Some userspace (such as a test program, or an optimized memory
allocation library) may want to know the size (in bytes) of a
PMD-mappable transparent hugepage::
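
    cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
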
All THPs at fault and collapse time will be added to _deferred_list,
and will therefore be split under memory pressure if they are
considered "underused". A THP is underused if the number of zero-filled
pages in the THP is above max_ptes_none (see below). It is possible to
disable this behaviour by writing 0 to shrink_underused, and enable it
by writing 1 to it::
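
    echo 0 > /sys/kernel/mm/transparent_hugepage/shrink_underused
    echo 1 > /sys/kernel/mm/transparent_hugepage/shrink_underused
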
khugepaged will be automatically started when PMD-sized THP is enabled
(when either the per-size anon control or the top-level control is set
to "always" or "madvise"), and it will be automatically shut down when
PMD-sized THP is disabled (when both the per-size anon control and the
top-level control are "never").

Khugepaged controls
-------------------

.. note::
   khugepaged currently only searches for opportunities to collapse to
   PMD-sized THP and no attempt is made to collapse to other THP
   sizes.

khugepaged usually runs at low frequency, so while one may not want to
invoke defrag algorithms synchronously during the page faults, it
should be worth invoking defrag at least in khugepaged. However it's
also possible to disable defrag in khugepaged by writing 0 or enable
defrag in khugepaged by writing 1::
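
    echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag
    echo 1 >/sys/kernel/mm/transparent_hugepage/khugepaged/defrag
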
You can also control how many pages khugepaged should scan at each
pass::
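
    /sys/kernel/mm/transparent_hugepage/khugepaged/pages_to_scan
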
and how many milliseconds to wait in khugepaged between each pass (you
can set this to 0 to run khugepaged at 100% utilization of one core)::
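
    /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs
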
and how many milliseconds to wait in khugepaged if there's a hugepage
allocation failure, to throttle the next allocation attempt::
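
    /sys/kernel/mm/transparent_hugepage/khugepaged/alloc_sleep_millisecs
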
The khugepaged progress can be seen in the number of pages collapsed.
Note that this counter may not be an exact count, since "collapsed"
could mean multiple things: (1) a PTE mapping being replaced by a PMD
mapping, or (2) all 4K physical pages replaced by one 2M hugepage. Each
may happen independently, or together, depending on the type of memory
and the failures that occur. As such, this value should be interpreted
roughly as a sign of progress, and counters in /proc/vmstat consulted
for more accurate accounting::
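
    /sys/kernel/mm/transparent_hugepage/khugepaged/pages_collapsed

and the number of full scans khugepaged has completed is available
in::

    /sys/kernel/mm/transparent_hugepage/khugepaged/full_scans
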
``max_ptes_none`` specifies how many extra small pages (that are
not already mapped) can be allocated when collapsing a group
of small pages into one large page::
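
    /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
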
A higher value makes programs use additional memory; a lower value
reduces the THP performance gain. The value of max_ptes_none has very
little effect on CPU time, so it can usually be ignored.

``max_ptes_swap`` specifies how many pages can be brought in from
swap when collapsing a group of pages into a transparent huge page::
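
    /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_swap

A higher value can cause excessive swap IO and waste memory; a lower
value can prevent THPs from being collapsed, resulting in fewer pages
being collapsed into THPs and lower memory access performance.
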
``max_ptes_shared`` specifies how many pages can be shared across
multiple processes. khugepaged might treat pages of THPs as shared if
any page of that THP is shared. Exceeding the number would block the
collapse::
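
    /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
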
You can change the sysfs boot time default for the top-level "enabled"
control by passing the parameter ``transparent_hugepage=always``,
``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to
the kernel command line.

Alternatively, each supported anonymous THP size can be controlled by
passing ``thp_anon=<size>[KMG],<size>[KMG]:<state>;<size>[KMG]-<size>[KMG]:<state>``,
where ``<size>`` is the THP size (must be a power of 2 of PAGE_SIZE and
a supported anonymous THP size) and ``<state>`` is one of ``always``,
``madvise``, ``never`` or ``inherit``.

For example, the following will set 16K, 32K, 64K THP to ``always``,
set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M
to ``never``::

    thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never

``thp_anon=`` may be specified multiple times to configure all THP
sizes as required. If ``thp_anon=`` is specified at least once, any
anon THP sizes not explicitly configured on the command line are
implicitly set to ``never``.

``transparent_hugepage`` setting only affects the global toggle. If
``thp_anon`` is not specified, PMD_ORDER THP will default to
``inherit``. However, if a valid ``thp_anon`` setting is provided by
the user, the PMD_ORDER THP policy will be overridden; if the policy
for PMD_ORDER is not defined within a valid ``thp_anon``, its policy
will default to ``never``.

Similarly to ``transparent_hugepage``, you can control the hugepage
allocation policy for the internal shmem mount by using the kernel
parameter ``transparent_hugepage_shmem=<policy>``, where ``<policy>``
is one of the six valid policies for shmem (``always``,
``within_size``, ``advise``, ``never``, ``deny``, and ``force``).

Similarly to ``transparent_hugepage_shmem``, you can control the
default hugepage allocation policy for the tmpfs mount by using the
kernel parameter ``transparent_hugepage_tmpfs=<policy>``, where
``<policy>`` is one of the four valid policies for tmpfs (``always``,
``within_size``, ``advise``, ``never``). The tmpfs mount default
policy is ``never``.

``thp_shmem=`` may be specified multiple times to configure all THP
sizes as required. If ``thp_shmem=`` is specified at least once, any
shmem THP sizes not explicitly configured on the command line are
implicitly set to ``never``.

``transparent_hugepage_shmem`` setting only affects the internal shmem
mount. If ``thp_shmem`` is not specified, PMD_ORDER hugepage will
default to ``inherit``. However, if a valid ``thp_shmem`` setting is
provided by the user, the PMD_ORDER hugepage policy will be
overridden; if the policy for PMD_ORDER is not defined within a valid
``thp_shmem``, its policy will default to ``never``.

Traditionally, tmpfs only supported a single huge page size ("PMD").
Today, it also supports smaller sizes just like anonymous memory, often
referred to as "multi-size THP" (mTHP). Huge pages of any size are
commonly represented in the kernel as "large folios".

While there is fine control over the huge page sizes to use for the
internal shmem mount (see below), ordinary tmpfs mounts will make use
of all available huge page sizes without any control over the exact
sizes, behaving more like other file systems.

tmpfs mounts
------------

The THP allocation policy for tmpfs mounts can be adjusted using the
mount option ``huge=``. It can have the following values:

always
    Attempt to allocate huge pages every time we need a new page;

never
    Do not allocate huge pages. Note that ``madvise(..., MADV_COLLAPSE)``
    can still cause transparent huge pages to be obtained even if this
    mode is specified everywhere;

within_size
    Only allocate huge pages if they will be fully within i_size.
    Also respect madvise() hints;

advise
    Only allocate huge pages if requested with madvise();

``mount -o remount,huge= /mountpoint`` works fine after mount:
remounting ``huge=never`` will not attempt to break up huge pages at
all, just stop more from being allocated.

In addition to the policies listed above, the sysfs knob
/sys/kernel/mm/transparent_hugepage/shmem_enabled will affect the
allocation policy of tmpfs mounts when set to the following values:

deny
    For use in emergencies, to force the huge option off from
    all mounts;
force
    Force the huge option on for all - very useful for testing;

shmem / internal tmpfs
----------------------

To control the THP allocation policy for this internal tmpfs mount,
the sysfs knob /sys/kernel/mm/transparent_hugepage/shmem_enabled and
the per-THP-size knobs in
'/sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled'
can be used.

The global knob has the same semantics as the ``huge=`` mount option
for tmpfs mounts, except that the different huge page sizes can be
controlled individually, and the setting of the global knob is only
used when the per-size knob is set to 'inherit'.

always
    Attempt to allocate <size> huge pages every time we need a new page;

inherit
    Inherit the top-level "shmem_enabled" value. By default, PMD-sized
    hugepages have enabled="inherit" and all other hugepage sizes have
    enabled="never";

never
    Do not allocate <size> huge pages. Note that ``madvise(...,
    MADV_COLLAPSE)`` can still cause transparent huge pages to be
    obtained even if this mode is specified everywhere;

within_size
    Only allocate <size> huge pages if they will be fully within i_size.
    Also respect madvise() hints;

advise
    Only allocate <size> huge pages if requested with madvise();

The transparent_hugepage/hugepages-<size>kB/enabled values and the
tmpfs mount option only affect future behavior. So to make them
effective you need to restart any application that could have been
using hugepages. This also applies to the regions registered in
khugepaged.

The number of PMD-sized anonymous transparent huge pages currently
used by the system is available by reading the AnonHugePages field in
``/proc/meminfo``. To identify what applications are using PMD-sized
anonymous transparent huge pages, it is necessary to read
``/proc/PID/smaps`` and count the AnonHugePages fields for each
mapping. (Note that AnonHugePages only applies to traditional
PMD-sized THP for historical reasons and should have been called
AnonHugePmdMapped.)

The number of file transparent huge pages mapped to userspace is
available by reading the ShmemPmdMapped and ShmemHugePages fields in
``/proc/meminfo``. To identify what applications are mapping file
transparent huge pages, it is necessary to read ``/proc/PID/smaps``
and count the FilePmdMapped fields for each mapping.

There are a number of counters in ``/proc/vmstat`` that may be used to
monitor how successfully the system is providing huge pages for use.

thp_fault_alloc
    is incremented every time a huge page is successfully
    allocated and charged to handle a page fault.

thp_collapse_alloc
    is incremented by khugepaged when it has found
    a range of pages to collapse into one huge page and has
    successfully allocated a new huge page to store the data.

thp_fault_fallback
    is incremented if a page fault fails to allocate or charge
    a huge page and instead falls back to using small pages.

thp_fault_fallback_charge
    is incremented if a page fault fails to charge a huge page and
    instead falls back to using small pages even though the
    allocation was successful.

thp_collapse_alloc_failed
    is incremented if khugepaged found a range
    of pages that should be collapsed into one huge page but failed
    the allocation.

thp_file_fallback
    is incremented if a shmem huge page is attempted to be allocated
    but fails and instead falls back to using small pages. (Note that
    despite being named after "file", the counter measures only shmem.)

thp_file_fallback_charge
    is incremented if a shmem huge page cannot be charged and instead
    falls back to using small pages even though the allocation was
    successful. (Note that despite being named after "file", the
    counter measures only shmem.)

thp_split_page_failed
    is incremented if the kernel fails to split a huge
    page. This can happen if the page was pinned by somebody.

thp_deferred_split_page
    is incremented when a huge page is put onto the split
    queue. This happens when a huge page is partially unmapped and
    splitting it would free up some memory. Pages on the split queue
    are going to be split under memory pressure.

thp_zero_page_alloc_failed
    is incremented if the kernel fails to allocate a
    huge zero page and falls back to using small pages.

thp_swpout
    is incremented every time a huge page is swapped out in one
    piece without splitting.

thp_swpout_fallback
    is incremented if a huge page has to be split before swapout,
    usually because the kernel failed to allocate some continuous
    swap space for the huge page.

In /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/stats, there
are also individual counters for each huge page size, which can be
utilized to monitor the system's effectiveness in providing huge pages
for usage. Each counter has its own corresponding file.

anon_fault_alloc
    is incremented every time a huge page is successfully
    allocated and charged to handle a page fault.

anon_fault_fallback
    is incremented if a page fault fails to allocate or charge
    a huge page and instead falls back to using huge pages of
    lower orders or small pages.

anon_fault_fallback_charge
    is incremented if a page fault fails to charge a huge page and
    instead falls back to using huge pages of lower orders or
    small pages even though the allocation was successful.

zswpout
    is incremented every time a huge page is swapped out to zswap in
    one piece without splitting.

swpin
    is incremented every time a huge page is swapped in from a
    non-zswap swap device in one piece.

swpin_fallback
    is incremented if swapin fails to allocate or charge a huge page
    and instead falls back to using huge pages of lower orders or
    small pages.

swpin_fallback_charge
    is incremented if swapin fails to charge a huge page and instead
    falls back to using huge pages of lower orders or small pages
    even though the allocation was successful.

swpout
    is incremented every time a huge page is swapped out to a
    non-zswap swap device in one piece without splitting.

swpout_fallback
    is incremented if a huge page has to be split before swapout,
    usually because the kernel failed to allocate some continuous
    swap space for the huge page.

shmem_fallback
    is incremented if a shmem huge page is attempted to be allocated
    but fails and instead falls back to using small pages.

shmem_fallback_charge
    is incremented if a shmem huge page cannot be charged and instead
    falls back to using small pages even though the allocation was
    successful.

split_failed
    is incremented if the kernel fails to split a huge
    page. This can happen if the page was pinned by somebody.

split_deferred
    is incremented when a huge page is put onto the split queue.
    This happens when a huge page is partially unmapped and splitting
    it would free up some memory. Pages on the split queue are going
    to be split under memory pressure, if splitting is possible.

As the system ages, allocating huge pages may be expensive as the
system uses memory compaction to copy data around memory to free a
huge page for use. There are some counters in ``/proc/vmstat`` to help
monitor this overhead.

compact_stall
    is incremented every time a process stalls to run
    memory compaction so that a huge page is free for use.

compact_success
    is incremented if the system compacted memory and
    freed a huge page for use.

compact_fail
    is incremented if the system tries to compact memory
    but fails.

It is possible to establish how long the stalls were using the
function tracer to record how much time was spent in __alloc_pages()
and using the mm_page_alloc tracepoint to identify which allocations
were for huge pages.

To be guaranteed that the kernel will map a THP immediately in any
memory region, the mmap region has to be hugepage naturally aligned.
posix_memalign() can provide that guarantee.
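
For illustration, a minimal sketch (not from the kernel sources; the
2M alignment is an assumption matching the common x86_64 PMD size, and
in practice it should be read from ``hpage_pmd_size``) combining such
an aligned allocation with the madvise hint::

    #include <stdlib.h>
    #include <sys/mman.h>

    /* Assumption: 2M PMD-sized THP; query hpage_pmd_size at runtime. */
    #define HPAGE_PMD_SIZE (2UL << 20)

    int main(void)
    {
        void *buf;

        /* posix_memalign() guarantees natural hugepage alignment. */
        if (posix_memalign(&buf, HPAGE_PMD_SIZE, 8 * HPAGE_PMD_SIZE))
            return 1;

        /* Ask the kernel to back the region with THP where possible. */
        madvise(buf, 8 * HPAGE_PMD_SIZE, MADV_HUGEPAGE);

        /* ... use buf ... */
        free(buf);
        return 0;
    }
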
You can use hugetlbfs on a kernel that has transparent hugepage
support enabled just fine as always. No difference can be noted in
hugetlbfs other than there will be less overall fragmentation. All
usual features belonging to hugetlbfs are preserved and unaffected.
libhugetlbfs will also work fine as usual.