| #
1f57f68c |
| 05-Jun-2026 |
Heiko Carstens <hca@linux.ibm.com> |
s390: Remove GENERIC_LOCKBREAK Kconfig option
s390 selects GENERIC_LOCKBREAK if PREEMPT is enabled. Reason is a historic 18 years old commit [1] which fixed a compile error for PREEMPT enabled kerne
s390: Remove GENERIC_LOCKBREAK Kconfig option
s390 selects GENERIC_LOCKBREAK if PREEMPT is enabled. Reason is a historic 18 years old commit [1] which fixed a compile error for PREEMPT enabled kernels. Back than only PREEMPT_NONE and PREEMPT_VOLUNTARY kernels were considered to be important for s390. PREEMPT should "just work".
However, since recently PREEMPT is always enabled [2], which also causes GENERIC_LOCKBREAK to be always enabled. For some workloads this leads to massive performance degradation; e.g. a simple kernel compile on machines with many CPUs may take up to four times longer.
To fix this just remove the GENERIC_LOCKBREAK from s390's Kconfig, since the compile error from 18 years ago does not exist anymore.
[1] commit b6b40c532a36 ("[S390] Define GENERIC_LOCKBREAK.") [2] commit 7dadeaa6e851 ("sched: Further restrict the preemption modes")
Cc: stable@vger.kernel.org Reported-by: Massimiliano Pellizzer <massimiliano.pellizzer@canonical.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
show more ...
|
| #
2a4c0c11 |
| 22-Apr-2026 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 's390-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add support for CONFIG_PAGE_TABLE_CHECK and enable it in debug_defconf
Merge tag 's390-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Vasily Gorbik:
- Add support for CONFIG_PAGE_TABLE_CHECK and enable it in debug_defconfig. s390 can only tell user from kernel PTEs via the mm, so mm_struct is now passed into pxx_user_accessible_page() callbacks
- Expose the PCI function UID as an arch-specific slot attribute in sysfs so a function can be identified by its user-defined id while still in standby. Introduces a generic ARCH_PCI_SLOT_GROUPS hook in drivers/pci/slot.c
- Refresh s390 PCI documentation to reflect current behavior and cover previously undocumented sysfs attributes
- zcrypt device driver cleanup series: consistent field types, clearer variable naming, a kernel-doc warning fix, and a comment explaining the intentional synchronize_rcu() in pkey_handler_register()
- Provide an s390 arch_raw_cpu_ptr() that avoids the detour via get_lowcore() using alternatives, shrinking defconfig by ~27 kB
- Guard identity-base randomization with kaslr_enabled() so nokaslr keeps the identity mapping at 0 even with RANDOMIZE_IDENTITY_BASE=y
- Build S390_MODULES_SANITY_TEST as a module only by requiring KUNIT && m, since built-in would not exercise module loading
- Remove the permanently commented-out HMCDRV_DEV_CLASS create_class() code in the hmcdrv driver
- Drop stale ident_map_size extern conflicting with asm/page.h
* tag 's390-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/zcrypt: Fix warning about wrong kernel doc comment PCI: s390: Expose the UID as an arch specific PCI slot attribute docs: s390/pci: Improve and update PCI documentation s390/pkey: Add comment about synchronize_rcu() to pkey base s390/hmcdrv: Remove commented out code s390/zcrypt: Slight rework on the agent_id field s390/zcrypt: Explicitly use a card variable in _zcrypt_send_cprb s390/zcrypt: Rework MKVP fields and handling s390/zcrypt: Make apfs a real unsigned int field s390/zcrypt: Rework domain processing within zcrypt device driver s390/zcrypt: Move inline function rng_type6cprb_msgx from header to code s390/percpu: Provide arch_raw_cpu_ptr() s390: Enable page table check for debug_defconfig s390/pgtable: Add s390 support for page table check s390/pgtable: Use set_pmd_bit() to invalidate PMD entry mm/page_table_check: Pass mm_struct to pxx_user_accessible_page() s390/boot: Respect kaslr_enabled() for identity randomization s390/Kconfig: Make modules sanity test a module-only option s390/setup: Drop stale ident_map_size declaration
show more ...
|
| #
9cdca336 |
| 18-Apr-2026 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'integrity-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar: "There are two main changes, one feature removal, some code
Merge tag 'integrity-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity
Pull integrity updates from Mimi Zohar: "There are two main changes, one feature removal, some code cleanup, and a number of bug fixes.
Main changes: - Detecting secure boot mode was limited to IMA. Make detecting secure boot mode accessible to EVM and other LSMs - IMA sigv3 support was limited to fsverity. Add IMA sigv3 support for IMA regular file hashes and EVM portable signatures
Remove: - Remove IMA support for asychronous hash calculation originally added for hardware acceleration
Cleanup: - Remove unnecessary Kconfig CONFIG_MODULE_SIG and CONFIG_KEXEC_SIG tests - Add descriptions of the IMA atomic flags
Bug fixes: - Like IMA, properly limit EVM "fix" mode - Define and call evm_fix_hmac() to update security.evm - Fallback to using i_version to detect file change for filesystems that do not support STATX_CHANGE_COOKIE - Address missing kernel support for configured (new) TPM hash algorithms - Add missing crypto_shash_final() return value"
* tag 'integrity-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: evm: Enforce signatures version 3 with new EVM policy 'bit 3' integrity: Allow sigv3 verification on EVM_XATTR_PORTABLE_DIGSIG ima: add support to require IMA sigv3 signatures ima: add regular file data hash signature version 3 support ima: Define asymmetric_verify_v3() to verify IMA sigv3 signatures ima: remove buggy support for asynchronous hashes integrity: Eliminate weak definition of arch_get_secureboot() ima: Add code comments to explain IMA iint cache atomic_flags ima_fs: Correctly create securityfs files for unsupported hash algos ima: check return value of crypto_shash_final() in boot aggregate ima: Define and use a digest_size field in the ima_algo_desc structure powerpc/ima: Drop unnecessary check for CONFIG_MODULE_SIG ima: efi: Drop unnecessary check for CONFIG_MODULE_SIG/CONFIG_KEXEC_SIG ima: fallback to using i_version to detect file change evm: fix security.evm for a file with IMA signature s390: Drop unnecessary CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT evm: Don't enable fix mode when secure boot is enabled integrity: Make arch_ima_get_secureboot integrity-wide
show more ...
|
| #
078f80f9 |
| 19-Mar-2026 |
David Hildenbrand (Arm) <david@kernel.org> |
mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE
Patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION".
While working on memory hotplug code cleanups, I realized
mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE
Patch series "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION".
While working on memory hotplug code cleanups, I realized that CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE is not really required anymore.
Changing that revealed some rather nasty looking CONFIG_MIGRATION handling.
Let's clean that up by introducing a dedicated CONFIG_NUMA_MIGRATION option and reducing the dependencies that CONFIG_MIGRATION has.
This patch (of 2):
All architectures that select CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE also select CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG. So we can just remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE.
For CONFIG_MIGRATION, make it depend on CONFIG_MEMORY_HOTREMOVE instead, and make CONFIG_MEMORY_HOTREMOVE select CONFIG_MIGRATION (just like CONFIG_CMA and CONFIG_COMPACTION already do).
We'll clean up CONFIG_MIGRATION next.
Link: https://lkml.kernel.org/r/20260319-config_migration-v1-0-42270124966f@kernel.org Link: https://lkml.kernel.org/r/20260319-config_migration-v1-1-42270124966f@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com> Reviewed-by: Gregory Price <gourry@gourry.net> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexandre Ghiti <alex@ghiti.fr> Cc: Alistair Popple <apopple@nvidia.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Byungchul Park <byungchul@sk.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: "Huang, Ying" <ying.huang@linux.alibaba.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
show more ...
|
| #
7caedbb5 |
| 09-Mar-2026 |
Nathan Chancellor <nathan@kernel.org> |
integrity: Eliminate weak definition of arch_get_secureboot()
security/integrity/secure_boot.c contains a single __weak function, which breaks recordmcount when building with clang:
$ make -skj"$
integrity: Eliminate weak definition of arch_get_secureboot()
security/integrity/secure_boot.c contains a single __weak function, which breaks recordmcount when building with clang:
$ make -skj"$(nproc)" ARCH=powerpc LLVM=1 ppc64_defconfig security/integrity/secure_boot.o Cannot find symbol for section 2: .text. security/integrity/secure_boot.o: failed
Introduce a Kconfig symbol, CONFIG_HAVE_ARCH_GET_SECUREBOOT, to indicate that an architecture provides a definition of arch_get_secureboot(). Provide a static inline stub when this symbol is not defined to achieve the same effect as the __weak function, allowing secure_boot.c to be removed altogether. Move the s390 definition of arch_get_secureboot() out of the CONFIG_KEXEC_FILE block to ensure it is always available, as it does not actually depend on KEXEC_FILE.
Reported-by: Arnd Bergmann <arnd@arndb.de> Fixes: 31a6a07eefeb ("integrity: Make arch_ima_get_secureboot integrity-wide") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
show more ...
|
| #
7bbf7013 |
| 13-Mar-2026 |
Vasily Gorbik <gor@linux.ibm.com> |
Merge branch 'page-table-check-support' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux into features
Add s390 support for CONFIG_PAGE_TABLE_CHECK.
* 'page-table-check-support' of git:/
Merge branch 'page-table-check-support' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux into features
Add s390 support for CONFIG_PAGE_TABLE_CHECK.
* 'page-table-check-support' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390: Enable page table check for debug_defconfig s390/pgtable: Add s390 support for page table check s390/pgtable: Use set_pmd_bit() to invalidate PMD entry mm/page_table_check: Pass mm_struct to pxx_user_accessible_page()
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
show more ...
|
| #
7b4dde5e |
| 06-Mar-2026 |
Tobias Huschle <huschle@linux.ibm.com> |
s390/pgtable: Add s390 support for page table check
Add page table check hooks into routines that modify user page tables.
Unlike other architectures s390 does not have means to distinguish between
s390/pgtable: Add s390 support for page table check
Add page table check hooks into routines that modify user page tables.
Unlike other architectures s390 does not have means to distinguish between kernel and user page table entries. Rely on the fact the page table check infrastructure itself operates on non-init_mm memory spaces only.
Use the provided mm_struct to verify that the memory space is not init_mm (aka not the kernel memory space) indeed. That check is supposed to be succeeded already (on some code paths even twice).
If the passed memory space by contrast is init_mm that would be an unexpected semantical change in generic code, so do VM_BUG_ON() in such case.
Unset _SEGMENT_ENTRY_READ bit to indicate that pmdp_invalidate() was applied against a huge PMD and is going to be updated by set_pmd_at() shortly. The hook pmd_user_accessible_page() should skip such entries until that, otherwise the page table accounting falls apart and BUG_ON() gets hit as result.
The invalidated huge PMD entry should not be confused with a PROT_NONE entry as reported by pmd_protnone(), though the entry characteristics exactly match: _SEGMENT_ENTRY_LARGE is set while _SEGMENT_ENTRY_READ is unset. Since pmd_protnone() implementation depends on NUMA_BALANCING configuration option, it should not be used in pmd_user_accessible_page() check, which is expected to be CONFIG_NUMA_BALANCING-agnostic.
Nevertheless, an invalidated huge PMD is technically still pmd_protnone() entry and it should not break other code paths once _SEGMENT_ENTRY_READ is unset. As of now, all pmd_protnone() checks are done under page table locks or exercise GUP-fast and HMM code paths, which are expected to be safe against concurrent page table updates.
Alternative approach would be using the last remaining unused PMD entry bit 0x800 to indicate that pmdp_invalidate() was called on a PMD. That would allow avoiding collisions with pmd_protnone() handling code paths, but saving the bit is more preferable way to go.
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Co-developed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Link: https://lore.kernel.org/r/4db8a681205bd555298d62441cdcfca43317a35a.1772812343.git.agordeev@linux.ibm.com Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
show more ...
|
| #
20216c12 |
| 25-Feb-2026 |
Vasily Gorbik <gor@linux.ibm.com> |
s390/Kconfig: Make modules sanity test a module-only option
The modules sanity test must be built as a module to actually exercise module loading. Require KUNIT && m to prevent built-in builds and a
s390/Kconfig: Make modules sanity test a module-only option
The modules sanity test must be built as a module to actually exercise module loading. Require KUNIT && m to prevent built-in builds and avoid misuse, so it is either 'n' or 'm'.
Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
show more ...
|
| #
a2e507af |
| 13-Feb-2026 |
Coiby Xu <coxu@redhat.com> |
s390: Drop unnecessary CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT
Commit b5ca117365d9 ("ima: prevent kexec_load syscall based on runtime secureboot flag") and commit 268a78404973 ("s390/kexec_file: Disab
s390: Drop unnecessary CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT
Commit b5ca117365d9 ("ima: prevent kexec_load syscall based on runtime secureboot flag") and commit 268a78404973 ("s390/kexec_file: Disable kexec_load when IPLed secure") disabled the kexec_load syscall based on the secureboot mode. Commit 9e2b4be377f0 ("ima: add a new CONFIG for loading arch-specific policies") needed to detect the secure boot mode, not to load an IMA architecture specific policy. Since there is the new CONFIG_INTEGRITY_SECURE_BOOT, drop CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT for s390.
Signed-off-by: Coiby Xu <coxu@redhat.com> Tested-by: Alexander Egorenkov <egorenar@linux.ibm.com> [Vasily Gorbik: Fix missing arch_get_secureboot() prototype warning] link: https://lore.kernel.org/linux-integrity/c00-01.ttbfdx5@ub.hpns/ Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
show more ...
|
| #
98067901 |
| 20-Feb-2026 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 's390-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Make KEXEC_SIG available again for CONFIG_MODULES=n
- The s390 topology
Merge tag 's390-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Heiko Carstens:
- Make KEXEC_SIG available again for CONFIG_MODULES=n
- The s390 topology code used to call rebuild_sched_domains() before common code scheduling domains were setup. This was silently ignored by common code, but now results in a warning. Address by avoiding the early call
- Convert debug area lock from spinlock to raw spinlock to address lockdep warnings
- The recent 3490 tape device driver rework resulted in a different device driver name, which is visible via sysfs for user space. This breaks at least one user space application. Change the device driver name back to its old name to fix this
* tag 's390-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/tape: Fix device driver name s390/debug: Convert debug area lock from a spinlock to a raw spinlock s390/smp: Avoid calling rebuild_sched_domains() early s390/kexec: Make KEXEC_SIG available when CONFIG_MODULES=n
show more ...
|
| #
dd341195 |
| 16-Feb-2026 |
Alexander Egorenkov <egorenar@linux.ibm.com> |
s390/kexec: Make KEXEC_SIG available when CONFIG_MODULES=n
The commit c8424e776b09 ("MODSIGN: Export module signature definitions") replaced the dependency of KEXEC_SIG on SYSTEM_DATA_VERIFICATION w
s390/kexec: Make KEXEC_SIG available when CONFIG_MODULES=n
The commit c8424e776b09 ("MODSIGN: Export module signature definitions") replaced the dependency of KEXEC_SIG on SYSTEM_DATA_VERIFICATION with the dependency on MODULE_SIG_FORMAT. This change disables KEXEC_SIG in s390 kernels built with MODULES=n if nothing else selects MODULE_SIG_FORMAT.
Furthermore, the signature verification in s390 kexec does not require MODULE_SIG_FORMAT because it requires only the struct module_signature and, therefore, does not depend on code in kernel/module_signature.c.
But making ARCH_SUPPORTS_KEXEC_SIG depend on SYSTEM_DATA_VERIFICATION is also incorrect because it makes KEXEC_SIG available on s390 only if some other arbitrary option (for instance a file system or device driver) selects it directly or indirectly.
To properly make KEXEC_SIG available for s390 kernels built with MODULES=y as well as MODULES=n _and_ also not depend on arbitrary options selecting SYSTEM_DATA_VERIFICATION, set ARCH_SUPPORTS_KEXEC_SIG=y for s390 and select SYSTEM_DATA_VERIFICATION when KEXEC_SIG=y.
Fixes: c8424e776b09 ("MODSIGN: Export module signature definitions") Suggested-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
show more ...
|
| #
cb557386 |
| 13-Feb-2026 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini: "Loongarch:
- Add more CPUCFG mask bits
- Improve feature detection
- Add lazy lo
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Paolo Bonzini: "Loongarch:
- Add more CPUCFG mask bits
- Improve feature detection
- Add lazy load support for FPU and binary translation (LBT) register state
- Fix return value for memory reads from and writes to in-kernel devices
- Add support for detecting preemption from within a guest
- Add KVM steal time test case to tools/selftests
ARM:
- Add support for FEAT_IDST, allowing ID registers that are not implemented to be reported as a normal trap rather than as an UNDEF exception
- Add sanitisation of the VTCR_EL2 register, fixing a number of UXN/PXN/XN bugs in the process
- Full handling of RESx bits, instead of only RES0, and resulting in SCTLR_EL2 being added to the list of sanitised registers
- More pKVM fixes for features that are not supposed to be exposed to guests
- Make sure that MTE being disabled on the pKVM host doesn't give it the ability to attack the hypervisor
- Allow pKVM's host stage-2 mappings to use the Force Write Back version of the memory attributes by using the "pass-through' encoding
- Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the guest
- Preliminary work for guest GICv5 support
- A bunch of debugfs fixes, removing pointless custom iterators stored in guest data structures
- A small set of FPSIMD cleanups
- Selftest fixes addressing the incorrect alignment of page allocation
- Other assorted low-impact fixes and spelling fixes
RISC-V:
- Fixes for issues discoverd by KVM API fuzzing in kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and kvm_riscv_vcpu_aia_imsic_update()
- Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM
- Transparent huge page support for hypervisor page tables
- Adjust the number of available guest irq files based on MMIO register sizes found in the device tree or the ACPI tables
- Add RISC-V specific paging modes to KVM selftests
- Detect paging mode at runtime for selftests
s390:
- Performance improvement for vSIE (aka nested virtualization)
- Completely new memory management. s390 was a special snowflake that enlisted help from the architecture's page table management to build hypervisor page tables, in particular enabling sharing the last level of page tables. This however was a lot of code (~3K lines) in order to support KVM, and also blocked several features. The biggest advantages is that the page size of userspace is completely independent of the page size used by the guest: userspace can mix normal pages, THPs and hugetlbfs as it sees fit, and in fact transparent hugepages were not possible before. It's also now possible to have nested guests and guests with huge pages running on the same host
- Maintainership change for s390 vfio-pci
- Small quality of life improvement for protected guests
x86:
- Add support for giving the guest full ownership of PMU hardware (contexted switched around the fastpath run loop) and allowing direct access to data MSRs and PMCs (restricted by the vPMU model).
KVM still intercepts access to control registers, e.g. to enforce event filtering and to prevent the guest from profiling sensitive host state. This is more accurate, since it has no risk of contention and thus dropped events, and also has significantly less overhead.
For more information, see the commit message for merge commit bf2c3138ae36 ("Merge tag 'kvm-x86-pmu-6.20' ...")
- Disallow changing the virtual CPU model if L2 is active, for all the same reasons KVM disallows change the model after the first KVM_RUN
- Fix a bug where KVM would incorrectly reject host accesses to PV MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled, even if those were advertised as supported to userspace,
- Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs, where KVM would attempt to read CR3 configuring an async #PF entry
- Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM (for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL. Only a few exports that are intended for external usage, and those are allowed explicitly
- When checking nested events after a vCPU is unblocked, ignore -EBUSY instead of WARNing. Userspace can sometimes put the vCPU into what should be an impossible state, and spurious exit to userspace on -EBUSY does not really do anything to solve the issue
- Also throw in the towel and drop the WARN on INIT/SIPI being blocked when vCPU is in Wait-For-SIPI, which also resulted in playing whack-a-mole with syzkaller stuffing architecturally impossible states into KVM
- Add support for new Intel instructions that don't require anything beyond enumerating feature flags to userspace
- Grab SRCU when reading PDPTRs in KVM_GET_SREGS2
- Add WARNs to guard against modifying KVM's CPU caps outside of the intended setup flow, as nested VMX in particular is sensitive to unexpected changes in KVM's golden configuration
- Add a quirk to allow userspace to opt-in to actually suppress EOI broadcasts when the suppression feature is enabled by the guest (currently limited to split IRQCHIP, i.e. userspace I/O APIC). Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an option as some userspaces have come to rely on KVM's buggy behavior (KVM advertises Supress EOI Broadcast irrespective of whether or not userspace I/O APIC supports Directed EOIs)
- Clean up KVM's handling of marking mapped vCPU pages dirty
- Drop a pile of *ancient* sanity checks hidden behind in KVM's unused ASSERT() macro, most of which could be trivially triggered by the guest and/or user, and all of which were useless
- Fold "struct dest_map" into its sole user, "struct rtc_status", to make it more obvious what the weird parameter is used for, and to allow fropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n
- Bury all of ioapic.h, i8254.h and related ioctls (including KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y
- Add a regression test for recent APICv update fixes
- Handle "hardware APIC ISR", a.k.a. SVI, updates in kvm_apic_update_apicv() to consolidate the updates, and to co-locate SVI updates with the updates for KVM's own cache of ISR information
- Drop a dead function declaration
- Minor cleanups
x86 (Intel):
- Rework KVM's handling of VMCS updates while L2 is active to temporarily switch to vmcs01 instead of deferring the update until the next nested VM-Exit.
The deferred updates approach directly contributed to several bugs, was proving to be a maintenance burden due to the difficulty in auditing the correctness of deferred updates, and was polluting "struct nested_vmx" with a growing pile of booleans
- Fix an SGX bug where KVM would incorrectly try to handle EPCM page faults, and instead always reflect them into the guest. Since KVM doesn't shadow EPCM entries, EPCM violations cannot be due to KVM interference and can't be resolved by KVM
- Fix a bug where KVM would register its posted interrupt wakeup handler even if loading kvm-intel.ko ultimately failed
- Disallow access to vmcb12 fields that aren't fully supported, mostly to avoid weirdness and complexity for FRED and other features, where KVM wants enable VMCS shadowing for fields that conditionally exist
- Print out the "bad" offsets and values if kvm-intel.ko refuses to load (or refuses to online a CPU) due to a VMCS config mismatch
x86 (AMD):
- Drop a user-triggerable WARN on nested_svm_load_cr3() failure
- Add support for virtualizing ERAPS. Note, correct virtualization of ERAPS relies on an upcoming, publicly announced change in the APM to reduce the set of conditions where hardware (i.e. KVM) *must* flush the RAP
- Ignore nSVM intercepts for instructions that are not supported according to L1's virtual CPU model
- Add support for expedited writes to the fast MMIO bus, a la VMX's fastpath for EPT Misconfig
- Don't set GIF when clearing EFER.SVME, as GIF exists independently of SVM, and allow userspace to restore nested state with GIF=0
- Treat exit_code as an unsigned 64-bit value through all of KVM
- Add support for fetching SNP certificates from userspace
- Fix a bug where KVM would use vmcb02 instead of vmcb01 when emulating VMLOAD or VMSAVE on behalf of L2
- Misc fixes and cleanups
x86 selftests:
- Add a regression test for TPR<=>CR8 synchronization and IRQ masking
- Overhaul selftest's MMU infrastructure to genericize stage-2 MMU support, and extend x86's infrastructure to support EPT and NPT (for L2 guests)
- Extend several nested VMX tests to also cover nested SVM
- Add a selftest for nested VMLOAD/VMSAVE
- Rework the nested dirty log test, originally added as a regression test for PML where KVM logged L2 GPAs instead of L1 GPAs, to improve test coverage and to hopefully make the test easier to understand and maintain
guest_memfd:
- Remove kvm_gmem_populate()'s preparation tracking and half-baked hugepage handling. SEV/SNP was the only user of the tracking and it can do it via the RMP
- Retroactively document and enforce (for SNP) that KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the source page to be 4KiB aligned, to avoid non-trivial complexity for something that no known VMM seems to be doing and to avoid an API special case for in-place conversion, which simply can't support unaligned sources
- When populating guest_memfd memory, GUP the source page in common code and pass the refcounted page to the vendor callback, instead of letting vendor code do the heavy lifting. Doing so avoids a looming deadlock bug with in-place due an AB-BA conflict betwee mmap_lock and guest_memfd's filemap invalidate lock
Generic:
- Fix a bug where KVM would ignore the vCPU's selected address space when creating a vCPU-specific mapping of guest memory. Actually this bug could not be hit even on x86, the only architecture with multiple address spaces, but it's a bug nevertheless"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits) KVM: s390: Increase permitted SE header size to 1 MiB MAINTAINERS: Replace backup for s390 vfio-pci KVM: s390: vsie: Fix race in acquire_gmap_shadow() KVM: s390: vsie: Fix race in walk_guest_tables() KVM: s390: Use guest address to mark guest page dirty irqchip/riscv-imsic: Adjust the number of available guest irq files RISC-V: KVM: Transparent huge page support RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test RISC-V: KVM: Allow Zalasr extensions for Guest/VM KVM: riscv: selftests: Add riscv vm satp modes KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr() RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr() RISC-V: KVM: Remove unnecessary 'ret' assignment KVM: s390: Add explicit padding to struct kvm_s390_keyop KVM: LoongArch: selftests: Add steal time test case LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side LoongArch: KVM: Add paravirt preempt feature in hypervisor side ...
show more ...
|
| #
728b0e21 |
| 04-Feb-2026 |
Claudio Imbrenda <imbrenda@linux.ibm.com> |
KVM: S390: Remove PGSTE code from linux/s390 mm
Remove the PGSTE config option. Remove all code from linux/s390 mm that involves PGSTEs.
Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by:
KVM: S390: Remove PGSTE code from linux/s390 mm
Remove the PGSTE config option. Remove all code from linux/s390 mm that involves PGSTEs.
Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
show more ...
|
| #
e38c884d |
| 04-Feb-2026 |
Claudio Imbrenda <imbrenda@linux.ibm.com> |
KVM: s390: Switch to new gmap
Switch KVM/s390 to use the new gmap code.
Remove includes to <gmap.h> and include "gmap.h" instead; fix all the existing users of the old gmap functions to use the new
KVM: s390: Switch to new gmap
Switch KVM/s390 to use the new gmap code.
Remove includes to <gmap.h> and include "gmap.h" instead; fix all the existing users of the old gmap functions to use the new ones instead.
Fix guest storage key access functions to work with the new gmap.
Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
show more ...
|
| #
0d453ba0 |
| 16-Jan-2026 |
Gerd Bayer <gbayer@linux.ibm.com> |
s390/Kconfig: Define non-zero ILLEGAL_POINTER_VALUE
Define CONFIG_ILLEGAL_POINTER_VALUE to the eye-catching non-zero value of 0xdead000000000000, consistent with other architectures. Assert at compi
s390/Kconfig: Define non-zero ILLEGAL_POINTER_VALUE
Define CONFIG_ILLEGAL_POINTER_VALUE to the eye-catching non-zero value of 0xdead000000000000, consistent with other architectures. Assert at compile-time that the poison pointers that include/linux/poison.h defines based on this illegal pointer are beyond the largest useful virtual addresses. Also, assert at compile-time that the range of poison pointers per include/linux/poison.h (currently a range of less than 0x10000 addresses) does not overlap with the range used for address handles for s390's non-MIO PCI instructions.
This enables s390 to track the DMA mappings by the network stack's page_pool that was introduced with [0]. Other functional changes are not intended.
Other archictectures have introduced this for various other reasons with commit 5c178472af24 ("riscv: define ILLEGAL_POINTER_VALUE for 64bit") commit f6853eb561fb ("powerpc/64: Define ILLEGAL_POINTER_VALUE for 64-bit") commit bf0c4e047324 ("arm64: kconfig: Move LIST_POISON to a safe value") commit a29815a333c6 ("core, x86: make LIST_POISON less deadly")
[0] https://lore.kernel.org/all/20250409-page-pool-track-dma-v9-0-6a9ef2e0cba8@redhat.com/
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
show more ...
|
| #
e5f3e67d |
| 09-Jan-2026 |
Heiko Carstens <hca@linux.ibm.com> |
s390: Add CC_HAS_ASM_IMMEDIATE_STRINGS
Upcoming changes to s390 specific inline assemblies require the usage of strings as immediate input operands. This works only with gcc-9 and newer compilers. W
s390: Add CC_HAS_ASM_IMMEDIATE_STRINGS
Upcoming changes to s390 specific inline assemblies require the usage of strings as immediate input operands. This works only with gcc-9 and newer compilers. With gcc-8 this leads to a compile error:
void bar(void) { asm volatile("" :: "i" ("foo")); }
Results in:
In function 'bar': warning: asm operand 0 probably doesn't match constraints asm volatile("" :: "i" ("foo")); ^~~ error: impossible constraint in 'asm'
Provide a CC_HAS_ASM_IMMEDIATE_STRINGS config option which allows to tell if the compiler supports strings as immediate input operands. Based on that conditional code can be provided.
Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
show more ...
|
| #
eb2606bb |
| 22-Dec-2025 |
Thomas Weißschuh <thomas.weissschuh@linutronix.de> |
s390: Implement ARCH_HAS_CC_CAN_LINK
The generic CC_CAN_LINK detection relies on -m32/-m64 compiler flags. Some s390 toolchains use -m31 instead but that is not supported in the kernel.
Make the lo
s390: Implement ARCH_HAS_CC_CAN_LINK
The generic CC_CAN_LINK detection relies on -m32/-m64 compiler flags. Some s390 toolchains use -m31 instead but that is not supported in the kernel.
Make the logic easier to understand and allow the simplification of the generic CC_CAN_LINK by using a tailored implementation.
Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
show more ...
|
| #
f770950a |
| 04-Dec-2025 |
Tobias Schumacher <ts@linux.ibm.com> |
s390/pci: Migrate s390 IRQ logic to IRQ domain API
s390 is one of the last architectures using the legacy API for setup and teardown of PCI MSI IRQs. Migrate the s390 IRQ allocation and teardown to
s390/pci: Migrate s390 IRQ logic to IRQ domain API
s390 is one of the last architectures using the legacy API for setup and teardown of PCI MSI IRQs. Migrate the s390 IRQ allocation and teardown to the MSI parent domain API. For details, see:
https://lore.kernel.org/lkml/20221111120501.026511281@linutronix.de
In detail, create an MSI parent domain for each PCI domain. When a PCI device sets up MSI or MSI-X IRQs, the library creates a per-device IRQ domain for this device, which is used by the device for allocating and freeing IRQs.
The per-device domain delegates this allocation and freeing to the parent-domain. In the end, the corresponding callbacks of the parent domain are responsible for allocating and freeing the IRQs.
The allocation is split into two parts: - zpci_msi_prepare() is called once for each device and allocates the required resources. On s390, each PCI function has its own airq vector and a summary bit, which must be configured once per function. This is done in prepare(). - zpci_msi_alloc() can be called multiple times for allocating one or more MSI/MSI-X IRQs. This creates a mapping between the virtual IRQ number in the kernel and the hardware IRQ number.
Freeing is split into two counterparts: - zpci_msi_free() reverts the effects of zpci_msi_alloc() and - zpci_msi_teardown() reverts the effects of zpci_msi_prepare(). This is called once when all IRQs are freed before a device is removed.
Since the parent domain in the end allocates the IRQs, the hwirq encoding must be unambiguous for all IRQs of all devices. This is achieved by encoding the hwirq using the devfn and the MSI index.
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Farhan Ali <alifm@linux.ibm.com> Signed-off-by: Tobias Schumacher <ts@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
show more ...
|
| #
eb9780a1 |
| 28-Nov-2025 |
Heiko Carstens <hca@linux.ibm.com> |
s390: Select POSIX_CPU_TIMERS_TASK_WORK
After support for VIRT_XFER_TO_GUEST_WORK is available for s390 it is possible to also select HAVE_POSIX_CPU_TIMERS_TASK_WORK. See [1] for the reasons why it
s390: Select POSIX_CPU_TIMERS_TASK_WORK
After support for VIRT_XFER_TO_GUEST_WORK is available for s390 it is possible to also select HAVE_POSIX_CPU_TIMERS_TASK_WORK. See [1] for the reasons why it makes sense, also for architectures which do not support PREEMPT_RT.
[1] https://lore.kernel.org/all/20200716201923.228696399@linutronix.de
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
show more ...
|
| #
2547f79b |
| 03-Dec-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Provide a new interface for dynamic configuration and deconfiguration
Merge tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Provide a new interface for dynamic configuration and deconfiguration of hotplug memory, allowing with and without memmap_on_memory support. This makes the way memory hotplug is handled on s390 much more similar to other architectures
- Remove compat support. There shouldn't be any compat user space around anymore, therefore get rid of a lot of code which also doesn't need to be tested anymore
- Add stackprotector support. GCC 16 will get new compiler options, which allow to generate code required for kernel stackprotector support
- Merge pai_crypto and pai_ext PMU drivers into a new driver. This removes a lot of duplicated code. The new driver is also extendable and allows to support new PMUs
- Add driver override support for AP queues
- Rework and extend zcrypt and AP trace events to allow for tracing of crypto requests
- Support block sizes larger than 65535 bytes for CCW tape devices
- Since the rework of the virtual kernel address space the module area and the kernel image are within the same 4GB area. This eliminates the need of weak per cpu variables. Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
- Various other small improvements and fixes
* tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (92 commits) watchdog: diag288_wdt: Remove KMSG_COMPONENT macro s390/entry: Use lay instead of aghik s390/vdso: Get rid of -m64 flag handling s390/vdso: Rename vdso64 to vdso s390: Rename head64.S to head.S s390/vdso: Use common STABS_DEBUG and DWARF_DEBUG macros s390: Add stackprotector support s390/modules: Simplify module_finalize() slightly s390: Remove KMSG_COMPONENT macro s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU s390/ap: Restrict driver_override versus apmask and aqmask use s390/ap: Rename mutex ap_perms_mutex to ap_attr_mutex s390/ap: Support driver_override for AP queue devices s390/ap: Use all-bits-one apmask/aqmask for vfio in_use() checks s390/debug: Update description of resize operation s390/syscalls: Switch to generic system call table generation s390/syscalls: Remove system call table pointer from thread_struct s390/uapi: Remove 31 bit support from uapi header files s390: Remove compat support tools: Remove s390 compat support ...
show more ...
|
| #
f5730d44 |
| 17-Nov-2025 |
Heiko Carstens <hca@linux.ibm.com> |
s390: Add stackprotector support
Stackprotector support was previously unavailable on s390 because by default compilers generate code which is not suitable for the kernel: the canary value is access
s390: Add stackprotector support
Stackprotector support was previously unavailable on s390 because by default compilers generate code which is not suitable for the kernel: the canary value is accessed via thread local storage, where the address of thread local storage is within access registers 0 and 1.
Using those registers also for the kernel would come with a significant performance impact and more complicated kernel entry/exit code, since access registers contents would have to be exchanged on every kernel entry and exit.
With the upcoming gcc 16 release new compiler options will become available which allow to generate code suitable for the kernel. [1]
Compiler option -mstack-protector-guard=global instructs gcc to generate stackprotector code that refers to a global stackprotector canary value via symbol __stack_chk_guard. Access to this value is guaranteed to occur via larl and lgrl instructions.
Furthermore, compiler option -mstack-protector-guard-record generates a section containing all code addresses that reference the canary value.
To allow for per task canary values the instructions which load the address of __stack_chk_guard are patched so they access a lowcore field instead: a per task canary value is available within the task_struct of each task, and is written to the per-cpu lowcore location on each context switch.
Also add sanity checks and debugging option to be consistent with other kernel code patching mechanisms.
Full debugging output can be enabled with the following kernel command line options:
debug_stackprotector bootdebug ignore_loglevel earlyprintk dyndbg="file stackprotector.c +p"
Example debug output:
stackprot: 0000021e402d4eda: c010005a9ae3 -> c01f00070240
where "<insn address>: <old insn> -> <new insn>".
[1] gcc commit 0cd1f03939d5 ("s390: Support global stack protector")
Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
show more ...
|
| #
e950d1f8 |
| 19-Nov-2025 |
Heiko Carstens <hca@linux.ibm.com> |
s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
Since the rework of the kernel virtual address space [1] the module area and the kernel image are within the same 4GB area. Therefore there is
s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
Since the rework of the kernel virtual address space [1] the module area and the kernel image are within the same 4GB area. Therefore there is no need for the weak per cpu workaround for modules anymore. Remove it.
[1] commit c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces")
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
show more ...
|
| #
8e0b986c |
| 10-Nov-2025 |
Heiko Carstens <hca@linux.ibm.com> |
s390: Remove compat support
There shouldn't be any 31 bit code around anymore that matters. Remove the compat layer support required to run 31 bit code.
Reason for removal is code simplification an
s390: Remove compat support
There shouldn't be any 31 bit code around anymore that matters. Remove the compat layer support required to run 31 bit code.
Reason for removal is code simplification and reduced test effort.
Note that this comes without any deprecation warnings added to config options, or kernel messages, since most likely those would be ignored anyway.
If it turns out there is still a reason to keep the compat layer this can be reverted at any time in the future.
Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
show more ...
|
| #
64e2f60f |
| 30-Oct-2025 |
Heiko Carstens <hca@linux.ibm.com> |
s390: Disable ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
As reported by Luiz Capitulino enabling HVO on s390 leads to reproducible crashes. The problem is that kernel page tables are modified without flushi
s390: Disable ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP
As reported by Luiz Capitulino enabling HVO on s390 leads to reproducible crashes. The problem is that kernel page tables are modified without flushing corresponding TLB entries.
Even if it looks like the empty flush_tlb_all() implementation on s390 is the problem, it is actually a different problem: on s390 it is not allowed to replace an active/valid page table entry with another valid page table entry without the detour over an invalid entry. A direct replacement may lead to random crashes and/or data corruption.
In order to invalidate an entry special instructions have to be used (e.g. ipte or idte). Alternatively there are also special instructions available which allow to replace a valid entry with a different valid entry (e.g. crdte or cspg).
Given that the HVO code currently does not provide the hooks to allow for an implementation which is compliant with the s390 architecture requirements, disable ARCH_WANT_OPTIMIZE_HUGETLB_VMEMMAP again, which is basically a revert of the original patch which enabled it.
Reported-by: Luiz Capitulino <luizcap@redhat.com> Closes: https://lore.kernel.org/all/20251028153930.37107-1-luizcap@redhat.com/ Fixes: 00a34d5a99c0 ("s390: select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP") Cc: stable@vger.kernel.org Tested-by: Luiz Capitulino <luizcap@redhat.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
show more ...
|
| #
8804d970 |
| 03-Oct-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "mm, swap: improve cluster scan strategy" from Kairui Song imp
Merge tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation
- "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs
- "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters
- "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps
- "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code
- "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code
- "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero
- "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature
- "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs
- "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code
- "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code
- "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system".
It's a long story - the [1/N] changelog spells out the considerations
- "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see
https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc
- "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path
- "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code
- "test that rmap behaves as expected" from Wei Yang adds some rmap selftests
- "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers
- "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues
- "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks
- "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code
- "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem
- "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/
- "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c
- "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation
- "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc)
- "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code
- "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function
- "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only
- "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code
- "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy
- "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages()
- "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver
- "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code
- "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations
- "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little
- "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code
- "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature
- "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work
- "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem
- "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code
- "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources
- "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON
- "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix
- "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information
- "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma
- "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems
- "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate
- "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters
- "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling
* tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ...
show more ...
|