| #
334fbe73 |
| 15-Apr-2026 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "maple_tree: Replace big node with maple copy" (Liam Howlett)
Merge tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "maple_tree: Replace big node with maple copy" (Liam Howlett)
Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement.
- "mm, swap: swap table phase III: remove swap_map" (Kairui Song)
Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups.
- "mm: memfd_luo: preserve file seals" (Pratyush Yadav)
File seal preservation to LUO's memfd code
- "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen)
Additional userspace stats reportng to zswap
- "arch, mm: consolidate empty_zero_page" (Mike Rapoport)
Some cleanups for our handling of ZERO_PAGE() and zero_pfn
- "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han)
A robustness improvement and some cleanups in the kmemleak code
- "Improve khugepaged scan logic" (Vernon Yang)
Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently
- "Make KHO Stateless" (Jason Miu)
Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel
- "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt)
Enhance vmscan's tracepointing
- "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas)
Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation
- "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)
Fix a WARN() which can be emitted the KHO restores a vmalloc area
- "mm: Remove stray references to pagevec" (Tal Zussman)
Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago
- "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau)
Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page
- "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park)
Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used
- "mm/damon: strictly respect min_nr_regions" (SeongJae Park)
Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter
- "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)
The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued
- "mm: cleanups around unmapping / zapping" (David Hildenbrand)
A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions
- "support batched checking of the young flag for MGLRU" (Baolin Wang)
Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64
- "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)
memcg cleanup and robustness improvements
- "Allow order zero pages in page reporting" (Yuvraj Sakshith)
Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory.
- "mm: vma flag tweaks" (Lorenzo Stoakes)
Cleanup work following from the recent conversion of the VMA flags to a bitmap
- "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park)
Add some more developer-facing debug checks into DAMON core
- "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park)
An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling
- "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park)
Fix a hard-to-hit time overflow issue in DAMON core
- "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park)
A batch of misc/minor improvements and fixups for DAMON
- "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand)
Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required.
- "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)
A somewhat random mix of fixups, recompression cleanups and improvements in the zram code
- "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park)
Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select
- "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)
Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged
- "mm: improve map count checks" (Lorenzo Stoakes)
Provide some cleanups and slight fixes in the mremap, mmap and vma code
- "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park)
Extend the use of DAMON core's addr_unit tunable
- "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)
Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support
- "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)
Code movement and cleanups in the memhotplug and sparsemem code
- "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand)
Rationalize some memhotplug Kconfig support
- "change young flag check functions to return bool" (Baolin Wang)
Cleanups to change all young flag check functions to return bool
- "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park)
Fix a few potential DAMON bugs
- "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes)
Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code.
- "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)
Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers
- "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)
Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed.
* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
show more ...
|
| #
6ebf98d7 |
| 19-Mar-2026 |
David Hildenbrand (Arm) <david@kernel.org> |
mm: introduce CONFIG_NUMA_MIGRATION and simplify CONFIG_MIGRATION
CONFIG_MEMORY_HOTREMOVE, CONFIG_COMPACTION and CONFIG_CMA all select CONFIG_MIGRATION, because they require it to work (users).
Onl
mm: introduce CONFIG_NUMA_MIGRATION and simplify CONFIG_MIGRATION
CONFIG_MEMORY_HOTREMOVE, CONFIG_COMPACTION and CONFIG_CMA all select CONFIG_MIGRATION, because they require it to work (users).
Only CONFIG_NUMA_BALANCING and CONFIG_BALLOON_MIGRATION depend on CONFIG_MIGRATION. CONFIG_BALLOON_MIGRATION is not an actual user, but an implementation of migration support, so the dependency is correct (CONFIG_BALLOON_MIGRATION does not make any sense without CONFIG_MIGRATION).
However, kconfig-language.rst clearly states "In general use select only for non-visible symbols". So far CONFIG_MIGRATION is user-visible ... and the dependencies rather confusing.
The whole reason why CONFIG_MIGRATION is user-visible is because of CONFIG_NUMA: some users might want CONFIG_NUMA but not page migration support.
Let's clean all that up by introducing a dedicated CONFIG_NUMA_MIGRATION config option for that purpose only. Make CONFIG_NUMA_BALANCING that so far depended on CONFIG_NUMA && CONFIG_MIGRATION to depend on CONFIG_MIGRATION instead. CONFIG_NUMA_MIGRATION will depend on CONFIG_NUMA && CONFIG_MMU.
CONFIG_NUMA_MIGRATION is user-visible and will default to "y". We use that default so new configs will automatically enable it, just like it was the case with CONFIG_MIGRATION. The downside is that some configs that used to have CONFIG_MIGRATION=n might get it re-enabled by CONFIG_NUMA_MIGRATION=y, which shouldn't be a problem.
CONFIG_MIGRATION is now a non-visible config option. Any code that select CONFIG_MIGRATION (as before) must depend directly or indirectly on CONFIG_MMU.
CONFIG_NUMA_MIGRATION is responsible for any NUMA migration code, which is mempolicy migration code, memory-tiering code, and move_pages() code in migrate.c. CONFIG_NUMA_BALANCING uses its functionality.
Note that this implies that with CONFIG_NUMA_MIGRATION=n, move_pages() will not be available even though CONFIG_MIGRATION=y, which is an expected change.
In migrate.c, we can remove the CONFIG_NUMA check as both CONFIG_NUMA_MIGRATION and CONFIG_NUMA_BALANCING depend on it.
With this change, CONFIG_MIGRATION is an internal config, all users of migration selects CONFIG_MIGRATION, and only CONFIG_BALLOON_MIGRATION depends on it.
Link: https://lkml.kernel.org/r/20260319-config_migration-v1-2-42270124966f@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alistair Popple <apopple@nvidia.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Byungchul Park <byungchul@sk.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Gregory Price <gourry@gourry.net> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
078f80f9 |
| 19-Mar-2026 |
David Hildenbrand (Arm) <david@kernel.org> |
mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE
Patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION".
While working on memory hotplug code cleanups, I realized
mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE
Patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION".
While working on memory hotplug code cleanups, I realized that CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE is not really required anymore.
Changing that revealed some rather nasty looking CONFIG_MIGRATION handling.
Let's clean that up by introducing a dedicated CONFIG_NUMA_MIGRATION option and reducing the dependencies that CONFIG_MIGRATION has.
This patch (of 2):
All architectures that select CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE also select CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG. So we can just remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE.
For CONFIG_MIGRATION, make it depend on CONFIG_MEMORY_HOTREMOVE instead, and make CONFIG_MEMORY_HOTREMOVE select CONFIG_MIGRATION (just like CONFIG_CMA and CONFIG_COMPACTION already do).
We'll clean up CONFIG_MIGRATION next.
Link: https://lkml.kernel.org/r/20260319-config_migration-v1-0-42270124966f@kernel.org Link: https://lkml.kernel.org/r/20260319-config_migration-v1-1-42270124966f@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com> Reviewed-by: Gregory Price <gourry@gourry.net> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alistair Popple <apopple@nvidia.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Byungchul Park <byungchul@sk.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
fb3c3f5d |
| 20-Mar-2026 |
David Hildenbrand (Arm) <david@kernel.org> |
mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP
Ever since commit f8f03eb5f0f9 ("mm: stop making SPARSEMEM_VMEMMAP user-selectable"), an architecture that supports CONFIG_S
mm/Kconfig: make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP
Ever since commit f8f03eb5f0f9 ("mm: stop making SPARSEMEM_VMEMMAP user-selectable"), an architecture that supports CONFIG_SPARSEMEM_VMEMMAP (by selecting SPARSEMEM_VMEMMAP_ENABLE) can no longer enable CONFIG_SPARSEMEM without CONFIG_SPARSEMEM_VMEMMAP.
Right now, CONFIG_MEMORY_HOTPLUG is guarded by CONFIG_SPARSEMEM.
However, CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG is only enabled by * arm64: which selects SPARSEMEM_VMEMMAP_ENABLE * loongarch: which selects SPARSEMEM_VMEMMAP_ENABLE * powerpc (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE * riscv (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE * s390 with SPARSEMEM: which selects SPARSEMEM_VMEMMAP_ENABLE * x86 (64bit): which selects SPARSEMEM_VMEMMAP_ENABLE
So, we can make CONFIG_MEMORY_HOTPLUG depend on CONFIG_SPARSEMEM_VMEMMAP without affecting any setups.
Link: https://lkml.kernel.org/r/20260320-sparsemem_cleanups-v2-4-096addc8800d@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
17a9399a |
| 24-Mar-2026 |
Jann Horn <jannh@google.com> |
slab,rcu: disable KVFREE_RCU_BATCHED for strict grace period
Disable CONFIG_KVFREE_RCU_BATCHED in CONFIG_RCU_STRICT_GRACE_PERIOD builds so that kernel fuzzers have an easier time finding use-after-f
slab,rcu: disable KVFREE_RCU_BATCHED for strict grace period
Disable CONFIG_KVFREE_RCU_BATCHED in CONFIG_RCU_STRICT_GRACE_PERIOD builds so that kernel fuzzers have an easier time finding use-after-free involving kfree_rcu().
The intent behind CONFIG_RCU_STRICT_GRACE_PERIOD is that RCU should invoke callbacks and free objects as soon as possible (at a large performance cost) so that kernel fuzzers and such have an easier time detecting use-after-free bugs in objects with RCU lifetime.
CONFIG_KVFREE_RCU_BATCHED is a performance optimization that queues RCU-freed objects in ways that CONFIG_RCU_STRICT_GRACE_PERIOD can't expedite; for example, the following testcase doesn't trigger a KASAN splat when CONFIG_KVFREE_RCU_BATCHED is enabled: ``` struct foo_struct { struct rcu_head rcu; int a; }; struct foo_struct *foo = kmalloc(sizeof(*foo), GFP_KERNEL | __GFP_NOFAIL | __GFP_ZERO);
pr_info("%s: calling kfree_rcu()\n", __func__); kfree_rcu(foo, rcu); msleep(10); pr_info("%s: start UAF access\n", __func__); READ_ONCE(foo->a); pr_info("%s: end UAF access\n", __func__); ```
Signed-off-by: Jann Horn <jannh@google.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Acked-by: Harry Yoo (Oracle) <harry@kernel.org> Link: https://patch.msgid.link/20260324-kasan-kfree-rcu-v1-1-ac58a7a13d03@google.com Signed-off-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
show more ...
|
| #
4cff5c05 |
| 12-Feb-2026 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "powerpc/64s: do not re-activate batched TLB flush" makes a
Merge tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "powerpc/64s: do not re-activate batched TLB flush" makes arch_{enter|leave}_lazy_mmu_mode() nest properly (Alexander Gordeev)
It adds a generic enter/leave layer and switches architectures to use it. Various hacks were removed in the process.
- "zram: introduce compressed data writeback" implements data compression for zram writeback (Richard Chang and Sergey Senozhatsky)
- "mm: folio_zero_user: clear page ranges" adds clearing of contiguous page ranges for hugepages. Large improvements during demand faulting are demonstrated (David Hildenbrand)
- "memcg cleanups" tidies up some memcg code (Chen Ridong)
- "mm/damon: introduce {,max_}nr_snapshots and tracepoint for damos stats" improves DAMOS stat's provided information, deterministic control, and readability (SeongJae Park)
- "selftests/mm: hugetlb cgroup charging: robustness fixes" fixes a few issues in the hugetlb cgroup charging selftests (Li Wang)
- "Fix va_high_addr_switch.sh test failure - again" addresses several issues in the va_high_addr_switch test (Chunyu Hu)
- "mm/damon/tests/core-kunit: extend existing test scenarios" improves the KUnit test coverage for DAMON (Shu Anzai)
- "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE" fixes a glitch in khugepaged which was causing madvise(MADV_COLLAPSE) to transiently return -EAGAIN (Shivank Garg)
- "arch, mm: consolidate hugetlb early reservation" reworks and consolidates a pile of straggly code related to reservation of hugetlb memory from bootmem and creation of CMA areas for hugetlb (Mike Rapoport)
- "mm: clean up anon_vma implementation" cleans up the anon_vma implementation in various ways (Lorenzo Stoakes)
- "tweaks for __alloc_pages_slowpath()" does a little streamlining of the page allocator's slowpath code (Vlastimil Babka)
- "memcg: separate private and public ID namespaces" cleans up the memcg ID code and prevents the internal-only private IDs from being exposed to userspace (Shakeel Butt)
- "mm: hugetlb: allocate frozen gigantic folio" cleans up the allocation of frozen folios and avoids some atomic refcount operations (Kefeng Wang)
- "mm/damon: advance DAMOS-based LRU sorting" improves DAMOS's movement of memory betewwn the active and inactive LRUs and adds auto-tuning of the ratio-based quotas and of monitoring intervals (SeongJae Park)
- "Support page table check on PowerPC" makes CONFIG_PAGE_TABLE_CHECK_ENFORCED work on powerpc (Andrew Donnellan)
- "nodemask: align nodes_and{,not} with underlying bitmap ops" makes nodes_and() and nodes_andnot() propagate the return values from the underlying bit operations, enabling some cleanup in calling code (Yury Norov)
- "mm/damon: hide kdamond and kdamond_lock from API callers" cleans up some DAMON internal interfaces (SeongJae Park)
- "mm/khugepaged: cleanups and scan limit fix" does some cleanup work in khupaged and fixes a scan limit accounting issue (Shivank Garg)
- "mm: balloon infrastructure cleanups" goes to town on the balloon infrastructure and its page migration function. Mainly cleanups, also some locking simplification (David Hildenbrand)
- "mm/vmscan: add tracepoint and reason for kswapd_failures reset" adds additional tracepoints to the page reclaim code (Jiayuan Chen)
- "Replace wq users and add WQ_PERCPU to alloc_workqueue() users" is part of Marco's kernel-wide migration from the legacy workqueue APIs over to the preferred unbound workqueues (Marco Crivellari)
- "Various mm kselftests improvements/fixes" provides various unrelated improvements/fixes for the mm kselftests (Kevin Brodsky)
- "mm: accelerate gigantic folio allocation" greatly speeds up gigantic folio allocation, mainly by avoiding unnecessary work in pfn_range_valid_contig() (Kefeng Wang)
- "selftests/damon: improve leak detection and wss estimation reliability" improves the reliability of two of the DAMON selftests (SeongJae Park)
- "mm/damon: cleanup kdamond, damon_call(), damos filter and DAMON_MIN_REGION" does some cleanup work in the core DAMON code (SeongJae Park)
- "Docs/mm/damon: update intro, modules, maintainer profile, and misc" performs maintenance work on the DAMON documentation (SeongJae Park)
- "mm: add and use vma_assert_stabilised() helper" refactors and cleans up the core VMA code. The main aim here is to be able to use the mmap write lock's lockdep state to perform various assertions regarding the locking which the VMA code requires (Lorenzo Stoakes)
- "mm, swap: swap table phase II: unify swapin use" removes some old swap code (swap cache bypassing and swap synchronization) which wasn't working very well. Various other cleanups and simplifications were made. The end result is a 20% speedup in one benchmark (Kairui Song)
- "enable PT_RECLAIM on more 64-bit architectures" makes PT_RECLAIM available on 64-bit alpha, loongarch, mips, parisc, and um. Various cleanups were performed along the way (Qi Zheng)
* tag 'mm-stable-2026-02-11-19-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (325 commits) mm/memory: handle non-split locks correctly in zap_empty_pte_table() mm: move pte table reclaim code to memory.c mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config um: mm: enable MMU_GATHER_RCU_TABLE_FREE parisc: mm: enable MMU_GATHER_RCU_TABLE_FREE mips: mm: enable MMU_GATHER_RCU_TABLE_FREE LoongArch: mm: enable MMU_GATHER_RCU_TABLE_FREE alpha: mm: enable MMU_GATHER_RCU_TABLE_FREE mm: change mm/pt_reclaim.c to use asm/tlb.h instead of asm-generic/tlb.h mm/damon/stat: remove __read_mostly from memory_idle_ms_percentiles zsmalloc: make common caches global mm: add SPDX id lines to some mm source files mm/zswap: use %pe to print error pointers mm/vmscan: use %pe to print error pointers mm/readahead: fix typo in comment mm: khugepaged: fix NR_FILE_PAGES and NR_SHMEM in collapse_file() mm: refactor vma_map_pages to use vm_insert_pages mm/damon: unify address range representation with damon_addr_range mm/cma: replace snprintf with strscpy in cma_new_area ...
show more ...
|
| #
148f95f7 |
| 11-Feb-2026 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'slab-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- The percpu sheaves caching layer was introduced as opt-in in 6.18 an
Merge tag 'slab-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:
- The percpu sheaves caching layer was introduced as opt-in in 6.18 and now we enable it for all caches and remove the previous cpu (partial) slab caching mechanism.
Besides the lower locking overhead and much more likely fastpath when freeing, this removes the rather complicated code related to the cpu slab lockless fastpaths (using this_cpu_try_cmpxchg128/64) and all its complications for PREEMPT_RT or kmalloc_nolock().
The lockless slab freelist+counters update operation using try_cmpxchg128/64 remains and is crucial for freeing remote NUMA objects, and to allow flushing objects from sheaves to slabs mostly without the node list_lock (Vlastimil Babka)
- Eliminate slabobj_ext metadata overhead when possible. Instead of using kmalloc() to allocate the array for memcg and/or allocation profiling tag pointers, use leftover space in a slab or per-object padding due to alignment (Harry Yoo)
- Various followup improvements to the above (Hao Li)
* tag 'slab-for-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (39 commits) slub: let need_slab_obj_exts() return false if SLAB_NO_OBJ_EXT is set mm/slab: only allow SLAB_OBJ_EXT_IN_OBJ for unmergeable caches mm/slab: place slabobj_ext metadata in unused space within s->size mm/slab: move [__]ksize and slab_ksize() to mm/slub.c mm/slab: save memory by allocating slabobj_ext array from leftover mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poison mm/slab: use stride to access slabobj_ext mm/slab: abstract slabobj_ext access via new slab_obj_ext() helper ext4: specify the free pointer offset for ext4_inode_cache mm/slab: allow specifying free pointer offset when using constructor mm/slab: use unsigned long for orig_size to ensure proper metadata align slub: clarify object field layout comments mm/slab: avoid allocating slabobj_ext array from its own slab slub: avoid list_lock contention from __refill_objects_any() mm/slub: cleanup and repurpose some stat items mm/slub: remove DEACTIVATE_TO_* stat items slab: remove frozen slab checks from __slab_free() slab: update overview comments slab: refill sheaves from all nodes slab: remove unused PREEMPT_RT specific macros ...
show more ...
|
| #
9c8c02df |
| 27-Jan-2026 |
Qi Zheng <zhengqi.arch@bytedance.com> |
mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
The PT_RECLAIM can work on all architectures that support MMU_GATHER_RCU_TABLE_FREE, except for those that have selected HAVE_ARCH_TLB_REMOVE
mm: make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE
The PT_RECLAIM can work on all architectures that support MMU_GATHER_RCU_TABLE_FREE, except for those that have selected HAVE_ARCH_TLB_REMOVE_TABLE,so make PT_RECLAIM depends on MMU_GATHER_RCU_TABLE_FREE && !HAVE_ARCH_TLB_REMOVE_TABLE.
BTW, change PT_RECLAIM to be enabled by default, since nobody should want to turn it off.
Link: https://lkml.kernel.org/r/83b034810935a9ff18e425b085e065bb0acb28f3.1769515122.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Andreas Larsson <andreas@gaisler.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
086498ae |
| 27-Jan-2026 |
Qi Zheng <zhengqi.arch@bytedance.com> |
mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config
For architectures that define __HAVE_ARCH_TLB_REMOVE_TABLE, the page tables at the pmd/pud level are generally no
mm: convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config
For architectures that define __HAVE_ARCH_TLB_REMOVE_TABLE, the page tables at the pmd/pud level are generally not of struct ptdesc type, and do not have pt_rcu_head member, thus these architectures cannot support PT_RECLAIM.
In preparation for enabling PT_RECLAIM on more architectures, convert __HAVE_ARCH_TLB_REMOVE_TABLE to CONFIG_HAVE_ARCH_TLB_REMOVE_TABLE config, so that we can make conditional judgments in Kconfig.
Link: https://lkml.kernel.org/r/5ebfa3d4b56e63c6906bda5eccaa9f7194d3a86b.1769515122.git.zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Tested-by: Andreas Larsson <andreas@gaisler.com> [sparc, UP&SMP] Acked-by: Andreas Larsson <andreas@gaisler.com> [sparc] Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Helge Deller <deller@gmx.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Magnus Lindholm <linmag7@gmail.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Richard Henderson <richard.henderson@linaro.org> Cc: Richard Weinberger <richard@nod.at> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
14217580 |
| 20-Jan-2026 |
David Hildenbrand (Red Hat) <david@kernel.org> |
mm: rename CONFIG_MEMORY_BALLOON -> CONFIG_BALLOON
Let's make it consistent with the naming of the files but also with the naming of CONFIG_BALLOON_MIGRATION.
While at it, add a "/* CONFIG_BALLOON
mm: rename CONFIG_MEMORY_BALLOON -> CONFIG_BALLOON
Let's make it consistent with the naming of the files but also with the naming of CONFIG_BALLOON_MIGRATION.
While at it, add a "/* CONFIG_BALLOON */".
Link: https://lkml.kernel.org/r/20260119230133.3551867-24-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
cd8e95d8 |
| 20-Jan-2026 |
David Hildenbrand (Red Hat) <david@kernel.org> |
mm: rename CONFIG_BALLOON_COMPACTION to CONFIG_BALLOON_MIGRATION
While compaction depends on migration, the other direction is not the case. So let's make it clearer that this is all about migratio
mm: rename CONFIG_BALLOON_COMPACTION to CONFIG_BALLOON_MIGRATION
While compaction depends on migration, the other direction is not the case. So let's make it clearer that this is all about migration of balloon pages.
Adjust all comments/docs in the core to talk about "migration" instead of "compaction".
While at it add some "/* CONFIG_BALLOON_MIGRATION */".
Link: https://lkml.kernel.org/r/20260119230133.3551867-23-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
7cf3318a |
| 20-Jan-2026 |
David Hildenbrand (Red Hat) <david@kernel.org> |
mm/kconfig: make BALLOON_COMPACTION depend on MIGRATION
Migration support for balloon memory depends on MIGRATION not COMPACTION. Compaction is simply another user of page migration.
The last depe
mm/kconfig: make BALLOON_COMPACTION depend on MIGRATION
Migration support for balloon memory depends on MIGRATION not COMPACTION. Compaction is simply another user of page migration.
The last dependency on compaction.c was effectively removed with commit 3d388584d599 ("mm: convert "movable" flag in page->mapping to a page flag"). Ever since, everything for handling movable_ops page migration resides in core migration code.
So let's change the dependency and adjust the description + help text.
We'll rename BALLOON_COMPACTION separately next.
Link: https://lkml.kernel.org/r/20260119230133.3551867-22-david@kernel.org Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jason Wang <jasowang@redhat.com> Cc: Jerrin Shaji George <jerrin.shaji-george@broadcom.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: SeongJae Park <sj@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
e323b52c |
| 23-Jan-2026 |
Vlastimil Babka <vbabka@suse.cz> |
slab: remove SLUB_CPU_PARTIAL
We have removed the partial slab usage from allocation paths. Now remove the whole config option and associated code.
Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Rev
slab: remove SLUB_CPU_PARTIAL
We have removed the partial slab usage from allocation paths. Now remove the whole config option and associated code.
Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Reviewed-by: Hao Li <hao.li@linux.dev> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|
| #
ee628d9c |
| 15-Dec-2025 |
Kevin Brodsky <kevin.brodsky@arm.com> |
mm: add basic tests for lazy_mmu
Add basic KUnit tests for the generic aspects of the lazy MMU mode: ensure that it appears active when it should, depending on how enable/disable and pause/resume pa
mm: add basic tests for lazy_mmu
Add basic KUnit tests for the generic aspects of the lazy MMU mode: ensure that it appears active when it should, depending on how enable/disable and pause/resume pairs are nested.
[akpm@linux-foundation.org: export ppc64_tlb_batch and __flush_tlb_pending to modules] [ritesh.list@gmail.com: use EXPORT_SYMBOL_IF_KUNIT()] Link: https://lkml.kernel.org/r/87a4zhkt6h.ritesh.list@gmail.com [kevin.brodsky@arm.com: move MODULE_IMPORT_NS(), add comment] Link: https://lkml.kernel.org/r/20251217163812.2633648-2-kevin.brodsky@arm.com Link: https://lkml.kernel.org/r/20251215150323.2218608-15-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
7303ecbf |
| 15-Dec-2025 |
Kevin Brodsky <kevin.brodsky@arm.com> |
mm: introduce CONFIG_ARCH_HAS_LAZY_MMU_MODE
Architectures currently opt in for implementing lazy_mmu helpers by defining __HAVE_ARCH_ENTER_LAZY_MMU_MODE.
In preparation for introducing a generic la
mm: introduce CONFIG_ARCH_HAS_LAZY_MMU_MODE
Architectures currently opt in for implementing lazy_mmu helpers by defining __HAVE_ARCH_ENTER_LAZY_MMU_MODE.
In preparation for introducing a generic lazy_mmu layer that will require storage in task_struct, let's switch to a cleaner approach: instead of defining a macro, select a CONFIG option.
This patch introduces CONFIG_ARCH_HAS_LAZY_MMU_MODE and has each arch select it when it implements lazy_mmu helpers. __HAVE_ARCH_ENTER_LAZY_MMU_MODE is removed and <linux/pgtable.h> relies on the new CONFIG instead.
On x86, lazy_mmu helpers are only implemented if PARAVIRT_XXL is selected. This creates some complications in arch/x86/boot/, because a few files manually undefine PARAVIRT* options. As a result <asm/paravirt.h> does not define the lazy_mmu helpers, but this breaks the build as <linux/pgtable.h> only defines them if !CONFIG_ARCH_HAS_LAZY_MMU_MODE. There does not seem to be a clean way out of this - let's just undefine that new CONFIG too.
Link: https://lkml.kernel.org/r/20251215150323.2218608-7-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Acked-by: Andreas Larsson <andreas@gaisler.com> [sparc] Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand (Red Hat) <david@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
269031b1 |
| 07-Nov-2025 |
Dan Williams <dan.j.williams@intel.com> |
x86/kaslr: Recognize all ZONE_DEVICE users as physaddr consumers
Commit 7ffb791423c7 ("x86/kaslr: Reduce KASLR entropy on most x86 systems") is too narrow. The effect being mitigated in that commit
x86/kaslr: Recognize all ZONE_DEVICE users as physaddr consumers
Commit 7ffb791423c7 ("x86/kaslr: Reduce KASLR entropy on most x86 systems") is too narrow. The effect being mitigated in that commit is caused by ZONE_DEVICE which PCI_P2PDMA has a dependency. ZONE_DEVICE, in general, lets any physical address be added to the direct-map. I.e. not only ACPI hotplug ranges, CXL Memory Windows, or EFI Specific Purpose Memory, but also any PCI MMIO range for the DEVICE_PRIVATE and PCI_P2PDMA cases. Update the mitigation, limit KASLR entropy, to apply in all ZONE_DEVICE=y cases.
Distro kernels typically have PCI_P2PDMA=y, so the practical exposure of this problem is limited to the PCI_P2PDMA=n case.
A potential path to recover entropy would be to walk ACPI and determine the limits for hotplug and PCI MMIO before kernel_randomize_memory(). On smaller systems that could yield some KASLR address bits. This needs additional investigation to determine if some limited ACPI table scanning can happen this early without an open coded solution like arch/x86/boot/compressed/acpi.c needs to deploy.
Cc: Ingo Molnar <mingo@kernel.org> Cc: Kees Cook <kees@kernel.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Logan Gunthorpe <logang@deltatee.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Fixes: 7ffb791423c7 ("x86/kaslr: Reduce KASLR entropy on most x86 systems") Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Balbir Singh <balbirs@nvidia.com> Tested-by: Yasunori Goto <y-goto@fujitsu.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://patch.msgid.link/692e08b2516d4_261c1100a3@dwillia2-mobl4.notmuch Signed-off-by: Dave Jiang <dave.jiang@intel.com>
show more ...
|
| #
31807483 |
| 19-Nov-2025 |
Xie Yuanbin <xieyuanbin1@huawei.com> |
mm/memory-failure: remove the selection of RAS
commit 97f0b13452198290799f ("tracing: add trace event for memory-failure") introduces the selection of RAS in memory-failure. This commit is just a t
mm/memory-failure: remove the selection of RAS
commit 97f0b13452198290799f ("tracing: add trace event for memory-failure") introduces the selection of RAS in memory-failure. This commit is just a tracing feature; in reality, there is no dependency between memory-failure and RAS. RAS increases the size of the bzImage image by 8k, which is very valuable for embedded devices.
Move the memory-failure traceing code from ras_event.h to memory-failure.h and remove the selection of RAS.
Link: https://lkml.kernel.org/r/20251119095943.67125-1-xieyuanbin1@huawei.com Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
2ec41967 |
| 02-Nov-2025 |
Ankit Agrawal <ankita@nvidia.com> |
mm: handle poisoning of pfn without struct pages
Poison (or ECC) errors can be very common on a large size cluster. The kernel MM currently does not handle ECC errors / poison on a memory region th
mm: handle poisoning of pfn without struct pages
Poison (or ECC) errors can be very common on a large size cluster. The kernel MM currently does not handle ECC errors / poison on a memory region that is not backed by struct pages. If a memory region mapped using remap_pfn_range() for example, but not added to the kernel, MM will not have associated struct pages. Add a new mechanism to handle memory failure on such memory.
Make kernel MM expose a function to allow modules managing the device memory to register the device memory SPA and the address space associated it. MM maintains this information as an interval tree. On poison, MM can search for the range that the poisoned PFN belong and use the address_space to determine the mapping VMA.
In this implementation, kernel MM follows the following sequence that is largely similar to the memory_failure() handler for struct page backed memory:
1. memory_failure() is triggered on reception of a poison error. An absence of struct page is detected and consequently memory_failure_pfn() is executed.
2. memory_failure_pfn() collects the processes mapped to the PFN.
3. memory_failure_pfn() sends SIGBUS to all the processes mapping the faulty PFN using kill_procs().
Note that there is one primary difference versus the handling of the poison on struct pages, which is to skip unmapping to the faulty PFN. This is done to handle the huge PFNMAP support added recently [1] that enables VM_PFNMAP vmas to map at PMD or PUD level. A poison to a PFN mapped in such as way would need breaking the PMD/PUD mapping into PTEs that will get mirrored into the S2. This can greatly increase the cost of table walks and have a major performance impact.
Link: https://lore.kernel.org/all/20240826204353.2228736-1-peterx@redhat.com/ [1] Link: https://lkml.kernel.org/r/20251102184434.2406-3-ankita@nvidia.com Signed-off-by: Ankit Agrawal <ankita@nvidia.com> Cc: Aniket Agashe <aniketa@nvidia.com> Cc: Borislav Betkov <bp@alien8.de> Cc: David Hildenbrand <david@redhat.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Kevin Tian <kevin.tian@intel.com> Cc: Kirti Wankhede <kwankhede@nvidia.com> Cc: Len Brown <lenb@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Matthew R. Ochs <mochs@nvidia.com> Cc: Mauro Carvalho Chehab <mchehab@kernel.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Neo Jia <cjia@nvidia.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuai Xue <xueshuai@linux.alibaba.com> Cc: Smita Koralahalli Channabasappa <smita.koralahallichannabasappa@amd.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Tarun Gupta <targupta@nvidia.com> Cc: Uwe Kleine-König <u.kleine-koenig@baylibre.com> Cc: Vikram Sethi <vsethi@nvidia.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zhi Wang <zhiw@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
e97d7c51 |
| 23-Oct-2025 |
Dmitry Ilvokhin <d@ilvokhin.com> |
mm: shmem/tmpfs hugepage defaults config choice
Allow to override defaults for shemem and tmpfs at config time. This is consistent with how transparent hugepages can be configured.
Same results ca
mm: shmem/tmpfs hugepage defaults config choice
Allow to override defaults for shemem and tmpfs at config time. This is consistent with how transparent hugepages can be configured.
Same results can be achieved with the existing 'transparent_hugepage_shmem' and 'transparent_hugepage_tmpfs' settings in the kernel command line, but it is more convenient to define basic settings at config time instead of changing kernel command line later.
Defaults for shmem and tmpfs were not changed. They are remained the same as before: 'never' for both cases. Options 'deny' and 'force' are omitted intentionally since these are special values and supposed to be used for emergencies or testing and are not expected to be permanent ones.
Primary motivation for adding config option is to enable policy enforcement at build time. In large-scale production environments (Meta's for example), the kernel configuration is often maintained centrally close to the kernel code itself and owned by the kernel engineers, while boot parameters are managed independently (e.g. by provisioning systems). In such setups, the kernel build defines the supported and expected behavior in a single place, but there is no reliable or uniform control over the kernel command line options.
A build-time default allows kernel integrators to enforce a predictable hugepage policy for shmem/tmpfs on a base layer, ensuring reproducible behavior and avoiding configuration drift caused by possible boot-time differences.
In short, primary benefit is mostly operational: it provides a way to codify preferred policy in the kernel configuration, which is versioned, reviewed, and tested as part of the kernel build process, rather than depending on potentially variable boot parameters.
[d@ilvokhin.com: v2] Link: https://lkml.kernel.org/r/aQECPpjd-fU_TC79@shell.ilvokhin.com Link: https://lkml.kernel.org/r/aPpv8sAa2sYgNu3L@shell.ilvokhin.com Signed-off-by: Dmitry Ilvokhin <d@ilvokhin.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: Kiryl Shutsemau <kas@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
5ba2f0a1 |
| 22-Oct-2025 |
Dave Hansen <dave.hansen@linux.intel.com> |
mm: introduce deferred freeing for kernel page tables
This introduces a conditional asynchronous mechanism, enabled by CONFIG_ASYNC_KERNEL_PGTABLE_FREE. When enabled, this mechanism defers the free
mm: introduce deferred freeing for kernel page tables
This introduces a conditional asynchronous mechanism, enabled by CONFIG_ASYNC_KERNEL_PGTABLE_FREE. When enabled, this mechanism defers the freeing of pages that are used as page tables for kernel address mappings. These pages are now queued to a work struct instead of being freed immediately.
This deferred freeing allows for batch-freeing of page tables, providing a safe context for performing a single expensive operation (TLB flush) for a batch of kernel page tables instead of performing that expensive operation for each page table.
Link: https://lkml.kernel.org/r/20251022082635.2462433-8-baolu.lu@linux.intel.com Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Alistair Popple <apopple@nvidia.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Jean-Philippe Brucker <jean-philippe@linaro.org> Cc: Joerg Roedel <joro@8bytes.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robin Murohy <robin.murphy@arm.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Vasant Hegde <vasant.hegde@amd.com> Cc: Vinicius Costa Gomes <vinicius.gomes@intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Cc: Yi Lai <yi1.lai@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
900fcf00 |
| 13-Oct-2025 |
Huacai Chen <chenhuacai@loongson.cn> |
mm: remove the BOUNCE config option
Commit eeadd68e2a5f ("block: remove bounce buffering support") remove block/bounce.c but left the BOUNCE config option. Now this option has no users, so remove i
mm: remove the BOUNCE config option
Commit eeadd68e2a5f ("block: remove bounce buffering support") remove block/bounce.c but left the BOUNCE config option. Now this option has no users, so remove it.
Link: https://lkml.kernel.org/r/20251013095620.1111061-1-chenhuacai@loongson.cn Signed-off-by: Huacai Chen <chenhuacai@loongson.cn> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
39231e8d |
| 14-Nov-2025 |
David Hildenbrand (Red Hat) <david@kernel.org> |
mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb
In the past, CONFIG_ARCH_HAS_GIGANTIC_PAGE indicated that we support runtime allocation of gigantic hugetlb folios. In the meantime it evolve
mm: fix MAX_FOLIO_ORDER on powerpc configs with hugetlb
In the past, CONFIG_ARCH_HAS_GIGANTIC_PAGE indicated that we support runtime allocation of gigantic hugetlb folios. In the meantime it evolved into a generic way for the architecture to state that it supports gigantic hugetlb folios.
In commit fae7d834c43c ("mm: add __dump_folio()") we started using CONFIG_ARCH_HAS_GIGANTIC_PAGE to decide MAX_FOLIO_ORDER: whether we could have folios larger than what the buddy can handle. In the context of that commit, we started using MAX_FOLIO_ORDER to detect page corruptions when dumping tail pages of folios. Before that commit, we assumed that we cannot have folios larger than the highest buddy order, which was obviously wrong.
In commit 7b4f21f5e038 ("mm/hugetlb: check for unreasonable folio sizes when registering hstate"), we used MAX_FOLIO_ORDER to detect inconsistencies, and in fact, we found some now.
Powerpc allows for configs that can allocate gigantic folio during boot (not at runtime), that do not set CONFIG_ARCH_HAS_GIGANTIC_PAGE and can exceed PUD_ORDER.
To fix it, let's make powerpc select CONFIG_ARCH_HAS_GIGANTIC_PAGE with hugetlb on powerpc, and increase the maximum folio size with hugetlb to 16 GiB on 64bit (possible on arm64 and powerpc) and 1 GiB on 32 bit (powerpc). Note that on some powerpc configurations, whether we actually have gigantic pages depends on the setting of CONFIG_ARCH_FORCE_MAX_ORDER, but there is nothing really problematic about setting it unconditionally: we just try to keep the value small so we can better detect problems in __dump_folio() and inconsistencies around the expected largest folio in the system.
Ideally, we'd have a better way to obtain the maximum hugetlb folio size and detect ourselves whether we really end up with gigantic folios. Let's defer bigger changes and fix the warnings first.
While at it, handle gigantic DAX folios more clearly: DAX can only end up creating gigantic folios with HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD.
Add a new Kconfig option HAVE_GIGANTIC_FOLIOS to make both cases clearer. In particular, worry about ARCH_HAS_GIGANTIC_PAGE only with HUGETLB_PAGE.
Note: with enabling CONFIG_ARCH_HAS_GIGANTIC_PAGE on powerpc, we will now also allow for runtime allocations of folios in some more powerpc configs. I don't think this is a problem, but if it is we could handle it through __HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED.
While __dump_page()/__dump_folio was also problematic (not handling dumping of tail pages of such gigantic folios correctly), it doesn't seem critical enough to mark it as a fix.
Link: https://lkml.kernel.org/r/20251114214920.2550676-1-david@kernel.org Fixes: 7b4f21f5e038 ("mm/hugetlb: check for unreasonable folio sizes when registering hstate") Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu> Closes: https://lore.kernel.org/r/3e043453-3f27-48ad-b987-cc39f523060a@csgroup.eu/ Reported-by: Sourabh Jain <sourabhjain@linux.ibm.com> Closes: https://lore.kernel.org/r/94377f5c-d4f0-4c0f-b0f6-5bf1cd7305b1@linux.ibm.com/ Signed-off-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Donet Tom <donettom@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
8804d970 |
| 03-Oct-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "mm, swap: improve cluster scan strategy" from Kairui Song imp
Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation
- "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs
- "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters
- "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps
- "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code
- "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code
- "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero
- "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature
- "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs
- "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code
- "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code
- "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system".
It's a long story - the [1/N] changelog spells out the considerations
- "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see
https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc
- "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path
- "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code
- "test that rmap behaves as expected" from Wei Yang adds some rmap selftests
- "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers
- "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues
- "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks
- "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code
- "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem
- "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/
- "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c
- "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation
- "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc)
- "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code
- "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function
- "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only
- "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code
- "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy
- "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages()
- "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver
- "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code
- "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations
- "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little
- "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code
- "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature
- "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work
- "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem
- "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code
- "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources
- "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON
- "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix
- "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information
- "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma
- "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems
- "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate
- "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters
- "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling
* tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ...
show more ...
|
| #
af92793e |
| 09-Sep-2025 |
Alexei Starovoitov <ast@kernel.org> |
slab: Introduce kmalloc_nolock() and kfree_nolock().
kmalloc_nolock() relies on ability of local_trylock_t to detect the situation when per-cpu kmem_cache is locked.
In !PREEMPT_RT local_(try)lock_
slab: Introduce kmalloc_nolock() and kfree_nolock().
kmalloc_nolock() relies on ability of local_trylock_t to detect the situation when per-cpu kmem_cache is locked.
In !PREEMPT_RT local_(try)lock_irqsave(&s->cpu_slab->lock, flags) disables IRQs and marks s->cpu_slab->lock as acquired. local_lock_is_locked(&s->cpu_slab->lock) returns true when slab is in the middle of manipulating per-cpu cache of that specific kmem_cache.
kmalloc_nolock() can be called from any context and can re-enter into ___slab_alloc(): kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> NMI -> bpf -> kmalloc_nolock() -> ___slab_alloc(cache_B) or kmalloc() -> ___slab_alloc(cache_A) -> irqsave -> tracepoint/kprobe -> bpf -> kmalloc_nolock() -> ___slab_alloc(cache_B)
Hence the caller of ___slab_alloc() checks if &s->cpu_slab->lock can be acquired without a deadlock before invoking the function. If that specific per-cpu kmem_cache is busy the kmalloc_nolock() retries in a different kmalloc bucket. The second attempt will likely succeed, since this cpu locked different kmem_cache.
Similarly, in PREEMPT_RT local_lock_is_locked() returns true when per-cpu rt_spin_lock is locked by current _task_. In this case re-entrance into the same kmalloc bucket is unsafe, and kmalloc_nolock() tries a different bucket that is most likely is not locked by the current task. Though it may be locked by a different task it's safe to rt_spin_lock() and sleep on it.
Similar to alloc_pages_nolock() the kmalloc_nolock() returns NULL immediately if called from hard irq or NMI in PREEMPT_RT.
kfree_nolock() defers freeing to irq_work when local_lock_is_locked() and (in_nmi() or in PREEMPT_RT).
SLUB_TINY config doesn't use local_lock_is_locked() and relies on spin_trylock_irqsave(&n->list_lock) to allocate, while kfree_nolock() always defers to irq_work.
Note, kfree_nolock() must be called _only_ for objects allocated with kmalloc_nolock(). Debug checks (like kmemleak and kfence) were skipped on allocation, hence obj = kmalloc(); kfree_nolock(obj); will miss kmemleak/kfence book keeping and will cause false positives. large_kmalloc is not supported by either kmalloc_nolock() or kfree_nolock().
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
show more ...
|
| #
cde31ecd |
| 22-Sep-2025 |
Xie Yuanbin <xieyuanbin1@huawei.com> |
mm/memory-failure: don't select MEMORY_ISOLATION
We added that "select MEMORY_ISOLATION" in commit ee6f509c3274 ("mm: factor out memory isolate functions"). However, in commit add05cecef80 ("mm: so
mm/memory-failure: don't select MEMORY_ISOLATION
We added that "select MEMORY_ISOLATION" in commit ee6f509c3274 ("mm: factor out memory isolate functions"). However, in commit add05cecef80 ("mm: soft-offline: don't free target page in successful page migration") we remove the need for it, where we removed the calls to set_migratetype_isolate() etc.
What CONFIG_MEMORY_FAILURE soft-offline support wants is migrate_pages() support. But that comes with CONFIG_MIGRATION. And isolate_folio_to_list() has nothing to do with CONFIG_MEMORY_ISOLATION.
Therefore, we can remove "select MEMORY_ISOLATION" of MEMORY_FAILURE.
Link: https://lkml.kernel.org/r/20250922143618.48640-1-xieyuanbin1@huawei.com Signed-off-by: Xie Yuanbin <xieyuanbin1@huawei.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Miaohe Lin <linmiaohe@huawei.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Naoya Horiguchi <nao.horiguchi@gmail.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|