History log of /linux/arch/riscv/mm/ptdump.c (Results 1 – 25 of 139)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 36ec807b 20-Sep-2024 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.12 merge window.


Revision tags: v6.11, v6.11-rc7
# f057b572 06-Sep-2024 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'ib/6.11-rc6-matrix-keypad-spitz' into next

Bring in changes removing support for platform data from matrix-keypad
driver.


Revision tags: v6.11-rc6, v6.11-rc5, v6.11-rc4, v6.11-rc3, v6.11-rc2
# 66e72a01 29-Jul-2024 Jerome Brunet <jbrunet@baylibre.com>

Merge tag 'v6.11-rc1' into clk-meson-next

Linux 6.11-rc1


# ee057c8c 14-Aug-2024 Steven Rostedt <rostedt@goodmis.org>

Merge tag 'v6.11-rc3' into trace/ring-buffer/core

The "reserve_mem" kernel command line parameter has been pulled into
v6.11. Merge the latest -rc3 to allow the persistent ring buffer memory to
be a

Merge tag 'v6.11-rc3' into trace/ring-buffer/core

The "reserve_mem" kernel command line parameter has been pulled into
v6.11. Merge the latest -rc3 to allow the persistent ring buffer memory to
be able to be mapped at the address specified by the "reserve_mem" command
line parameter.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

show more ...


# c8faf11c 30-Jul-2024 Tejun Heo <tj@kernel.org>

Merge tag 'v6.11-rc1' into for-6.12

Linux 6.11-rc1


# ed7171ff 16-Aug-2024 Lucas De Marchi <lucas.demarchi@intel.com>

Merge drm/drm-next into drm-xe-next

Get drm-xe-next on v6.11-rc2 and synchronized with drm-intel-next for
the display side. This resolves the current conflict for the
enable_display module parameter

Merge drm/drm-next into drm-xe-next

Get drm-xe-next on v6.11-rc2 and synchronized with drm-intel-next for
the display side. This resolves the current conflict for the
enable_display module parameter and allows further pending refactors.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

show more ...


# 5c61f598 12-Aug-2024 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next

Get drm-misc-next to the state of v6.11-rc2.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


# 3663e2c4 01-Aug-2024 Jani Nikula <jani.nikula@intel.com>

Merge drm/drm-next into drm-intel-next

Sync with v6.11-rc1 in general, and specifically get the new
BACKLIGHT_POWER_ constants for power states.

Signed-off-by: Jani Nikula <jani.nikula@intel.com>


# 4436e6da 02-Aug-2024 Thomas Gleixner <tglx@linutronix.de>

Merge branch 'linus' into x86/mm

Bring x86 and selftests up to date


# a1ff5a7d 30-Jul-2024 Maxime Ripard <mripard@kernel.org>

Merge drm/drm-fixes into drm-misc-fixes

Let's start the new drm-misc-fixes cycle by bringing in 6.11-rc1.

Signed-off-by: Maxime Ripard <mripard@kernel.org>


Revision tags: v6.11-rc1
# f557af08 20-Jul-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'riscv-for-linus-6.11-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

- Support for various new ISA extensions:
* The Zve3

Merge tag 'riscv-for-linus-6.11-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

- Support for various new ISA extensions:
* The Zve32[xf] and Zve64[xfd] sub-extensios of the vector
extension
* Zimop and Zcmop for may-be-operations
* The Zca, Zcf, Zcd and Zcb sub-extensions of the C extension
* Zawrs

- riscv,cpu-intc is now dtschema

- A handful of performance improvements and cleanups to text patching

- Support for memory hot{,un}plug

- The highest user-allocatable virtual address is now visible in
hwprobe

* tag 'riscv-for-linus-6.11-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (58 commits)
riscv: lib: relax assembly constraints in hweight
riscv: set trap vector earlier
KVM: riscv: selftests: Add Zawrs extension to get-reg-list test
KVM: riscv: Support guest wrs.nto
riscv: hwprobe: export Zawrs ISA extension
riscv: Add Zawrs support for spinlocks
dt-bindings: riscv: Add Zawrs ISA extension description
riscv: Provide a definition for 'pause'
riscv: hwprobe: export highest virtual userspace address
riscv: Improve sbi_ecall() code generation by reordering arguments
riscv: Add tracepoints for SBI calls and returns
riscv: Optimize crc32 with Zbc extension
riscv: Enable DAX VMEMMAP optimization
riscv: mm: Add support for ZONE_DEVICE
virtio-mem: Enable virtio-mem for RISC-V
riscv: Enable memory hotplugging for RISC-V
riscv: mm: Take memory hotplug read-lock during kernel page table dump
riscv: mm: Add memory hotplugging support
riscv: mm: Add pfn_to_kaddr() implementation
riscv: mm: Refactor create_linear_mapping_range() for memory hot add
...

show more ...


Revision tags: v6.10, v6.10-rc7, v6.10-rc6
# 60a6707f 26-Jun-2024 Palmer Dabbelt <palmer@rivosinc.com>

Merge patch series "riscv: Memory Hot(Un)Plug support"

Björn Töpel <bjorn@kernel.org> says:

From: Björn Töpel <bjorn@rivosinc.com>

================================================================

Merge patch series "riscv: Memory Hot(Un)Plug support"

Björn Töpel <bjorn@kernel.org> says:

From: Björn Töpel <bjorn@rivosinc.com>

================================================================
Memory Hot(Un)Plug support (and ZONE_DEVICE) for the RISC-V port
================================================================

Introduction
============

To quote "Documentation/admin-guide/mm/memory-hotplug.rst": "Memory
hot(un)plug allows for increasing and decreasing the size of physical
memory available to a machine at runtime."

This series adds memory hot(un)plugging, and ZONE_DEVICE support for
the RISC-V Linux port.

MM configuration
================

RISC-V MM has the following configuration:

* Memory blocks are 128M, analogous to x86-64. It uses PMD
("hugepage") vmemmaps. From that follows that 2M (PMD) worth of
vmemmap spans 32768 pages á 4K which gets us 128M.

* The pageblock size is the minimum minimum virtio_mem size, and on
RISC-V it's 2M (2^9 * 4K).

Implementation
==============

The PGD table on RISC-V is shared/copied between for all processes. To
avoid doing page table synchronization, the first patch (patch 1)
pre-allocated the PGD entries for vmemmap/direct map. By doing that
the init_mm PGD will be fixed at kernel init, and synchronization can
be avoided all together.

The following two patches (patch 2-3) does some preparations, followed
by the actual MHP implementation (patch 4-5). Then, MHP and virtio-mem
are enabled (patch 6-7), and finally ZONE_DEVICE support is added
(patch 8).

MHP and locking
===============

TL;DR: The MHP does not step on any toes, except for ptdump.
Additional locking is required for ptdump.

Long version: For v2 I spent some time digging into init_mm
synchronization/update. Here are my findings, and I'd love them to be
corrected if incorrect.

It's been a gnarly path...

The `init_mm` structure is a special mm (perhaps not a "real" one).
It's a "lazy context" that tracks kernel page table resources, e.g.,
the kernel page table (swapper_pg_dir), a kernel page_table_lock (more
about the usage below), mmap_lock, and such.

`init_mm` does not track/contain any VMAs. Having the `init_mm` is
convenient, so that the regular kernel page table walk/modify
functions can be used.

Now, `init_mm` being special means that the locking for kernel page
tables are special as well.

On RISC-V the PGD (top-level page table structure), similar to x86, is
shared (copied) with user processes. If the kernel PGD is modified, it
has to be synched to user-mode processes PGDs. This is avoided by
pre-populating the PGD, so it'll be fixed from boot.

The in-kernel pgd regions are documented in
`Documentation/arch/riscv/vm-layout.rst`.

The distinct regions are:
* vmemmap
* vmalloc/ioremap space
* direct mapping of all physical memory
* kasan
* modules, BPF
* kernel

Memory hotplug is the process of adding/removing memory to/from the
kernel.

Adding is done in two phases:
1. Add the memory to the kernel
2. Online memory, making it available to the page allocator.

Step 1 is partially architecture dependent, and updates the init_mm
page table:
* Update the direct map page tables. The direct map is a linear map,
representing all physical memory: `virt = phys + PAGE_OFFSET`
* Add a `struct page` for each added page of memory. Update the
vmemmap (virtual mapping to the `struct page`, so we can easily
transform a kernel virtual address to a `struct page *` address.

From an MHP perspective, there are two regions of the PGD that are
updated:
* vmemmap
* direct mapping of all physical memory

The `struct mm_struct` has a couple of locks in play:
* `spinlock_t page_table_lock` protects the page table, and some
counters
* `struct rw_semaphore mmap_lock` protect an mm's VMAs

Note again that `init_mm` does not contain any VMAs, but still uses
the mmap_lock in some places.

The `page_table_lock` was originally used to to protect all pages
tables, but more recently a split page table lock has been introduced.
The split lock has a per-table lock for the PTE and PMD tables. If
split lock is disabled, all tables are guarded by
`mm->page_table_lock` (for user processes). Split page table locks are
not used for init_mm.

MHP operations is typically synchronized using
`DEFINE_STATIC_PERCPU_RWSEM(mem_hotplug_lock)`.

Actors
------

The following non-MHP actors in the kernel traverses (read), and/or
modifies the kernel PGD.

* `ptdump`

Walks the entire `init_mm`, via `ptdump_walk_pgd()` with the
`mmap_write_lock(init_mm)` taken.

Observation: ptdump can race with MHP, and needs additional locking
to avoid crashes/races.

* `set_direct_*` / `arch/riscv/mm/pageattr.c`

The `set_direct_*` functionality is used to "synchronize" the
direct map to other kernel mappings, e.g. modules/kernel text. The
direct map is using "as large huge table mappings as possible",
which means that the `set_direct_*` might need to split the direct
map.

The `set_direct_*` functions operates with the
`mmap_write_lock(init_mm)` taken.

Observation: `set_direct_*` uses the direct map, but will never
modify the same entry as MHP. If there is a mapping, that entry will
never race with MHP. Further, MHP acts when memory is offline.

* HVO / `mm/hugetlb_vmemmap`

HVO optimizes the backing `struct page` for hugetlb pages, which
means changing the "vmemmap" region. HVO can split (merge?) a
vmemmap pmd. However, it will never race with MHP, since HVO only
operates at online memory. HVO cannot touch memory being MHP added
or removed.

* `apply_to_page_range`

Walks a range, creates pages and applies a callback (setting
permissions) for the page.

When creating a table, it might use `int __pte_alloc_kernel(pmd_t
*pmd)` which takes the `init_mm.page_table_lock` to synchronize pmd
populate.

Used by: `mm/vmalloc.c` and `mm/kasan/shadow.c`. The KASAN callback
takes the `init_mm.page_table_lock` to synchronize pte creation.

Observations: `apply_to_page_range` applies to the "vmalloc/ioremap
space" region, and "kasan" region. *Not* affected by MHP.

* `apply_to_existing_page_range`

Walks a range, applies a callback (setting permissions) for the
page (no page creation).

Used by: `kernel/bpf/arena.c` and `mm/kasan/shadow.c`. The KASAN
callback takes the `init_mm.page_table_lock` to synchronize pte
creation. *Not* affected by MHP regions.

* `apply_to_existing_page_range` applies to the "vmalloc/ioremap
space" region, and "kasan" region. *Not* affected by MHP regions.

* `ioremap_page_range` and `vmap_page_range`

Uses the same internal function, and might create table entries at
the "vmalloc/ioremap space" region. Can call
`__pte_alloc_kernel()` which takes the `init_mm.page_table_lock`
synchronizing pmd populate in the region. *Not* affected by MHP
regions.

Summary:
* MHP add will never modify the same page table entries, as any of
the other actors.
* MHP remove is done when memory is offlined, and will not clash
with any of the actors.
* Functions that walk the entire kernel page table need
synchronization

* It's sufficient to add the MHP lock ptdump.

Testing
=======

This series adds basic DT supported hotplugging. There is a QEMU
series enabling MHP for the RISC-V "virt" machine here: [1]

ACPI/MSI support is still in the making for RISC-V, and prior proper
(ACPI) PCI MSI support lands [2] and NUMA SRAT support [3], it hard to
try it out.

I've prepared a QEMU branch with proper ACPI GED/PC-DIMM support [4],
and a this series with the required prerequisites [5] (AIA, ACPI AIA
MADT, ACPI NUMA SRAT).

To test with virtio-mem, e.g.:
| qemu-system-riscv64 \
| -machine virt,aia=aplic-imsic \
| -cpu rv64,v=true,vlen=256,elen=64,h=true,zbkb=on,zbkc=on,zbkx=on,zkr=on,zkt=on,svinval=on,svnapot=on,svpbmt=on \
| -nodefaults \
| -nographic -smp 8 -kernel rv64-u-boot.bin \
| -drive file=rootfs.img,format=raw,if=virtio \
| -device virtio-rng-pci \
| -m 16G,slots=3,maxmem=32G \
| -object memory-backend-ram,id=mem0,size=16G \
| -numa node,nodeid=0,memdev=mem0 \
| -serial chardev:char0 \
| -mon chardev=char0,mode=readline \
| -chardev stdio,mux=on,id=char0 \
| -device pci-serial,id=serial0,chardev=char0 \
| -object memory-backend-ram,id=vmem0,size=2G \
| -device virtio-mem-pci,id=vm0,memdev=vmem0,node=0

where "rv64-u-boot.bin" is U-boot with EFI/ACPI-support (use [6] if
you're lazy).

In the QEMU monitor:
| (qemu) info memory-devices
| (qemu) qom-set vm0 requested-size 1G

...to test DAX/KMEM, use the follow QEMU parameters:
| -object memory-backend-file,id=mem1,share=on,mem-path=virtio_pmem.img,size=4G \
| -device virtio-pmem-pci,memdev=mem1,id=nv1

and the regular ndctl/daxctl dance.

If you're brave to try the ACPI branch, add "acpi=on" to "-machine
virt", and test PC-DIMM MHP (in addition to virtio-{p},mem):

In the QEMU monitor:
| (qemu) object_add memory-backend-ram,id=mem1,size=1G
| (qemu) device_add pc-dimm,id=dimm1,memdev=mem1

You can also try hot-remove with some QEMU options, say:
| -object memory-backend-file,id=mem-1,size=256M,mem-path=/pagesize-2MB
| -device pc-dimm,id=mem1,memdev=mem-1
| -object memory-backend-file,id=mem-2,size=1G,mem-path=/pagesize-1GB
| -device pc-dimm,id=mem2,memdev=mem-2
| -object memory-backend-file,id=mem-3,size=256M,mem-path=/pagesize-2MB
| -device pc-dimm,id=mem3,memdev=mem-3

Remove "acpi=on" to run with DT.

Thanks to Alex, Andrew, David, and Oscar for all
comments/tests/fixups.

References
==========

[1] https://lore.kernel.org/qemu-devel/20240521105635.795211-1-bjorn@kernel.org/
[2] https://lore.kernel.org/linux-riscv/20240501121742.1215792-1-sunilvl@ventanamicro.com/
[3] https://lore.kernel.org/linux-riscv/cover.1713778236.git.haibo1.xu@intel.com/
[4] https://github.com/bjoto/qemu/commits/virtio-mem-pc-dimm-mhp-acpi-v2/
[5] https://github.com/bjoto/linux/commits/mhp-v4-acpi
[6] https://github.com/bjoto/riscv-rootfs-utils/tree/acpi

* b4-shazam-merge:
riscv: Enable DAX VMEMMAP optimization
riscv: mm: Add support for ZONE_DEVICE
virtio-mem: Enable virtio-mem for RISC-V
riscv: Enable memory hotplugging for RISC-V
riscv: mm: Take memory hotplug read-lock during kernel page table dump
riscv: mm: Add memory hotplugging support
riscv: mm: Add pfn_to_kaddr() implementation
riscv: mm: Refactor create_linear_mapping_range() for memory hot add
riscv: mm: Change attribute from __init to __meminit for page functions
riscv: mm: Pre-allocate vmemmap/direct map/kasan PGD entries
riscv: mm: Properly forward vmemmap_populate() altmap parameter

Link: https://lore.kernel.org/r/20240605114100.315918-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

show more ...


Revision tags: v6.10-rc5, v6.10-rc4, v6.10-rc3
# 37992b7f 05-Jun-2024 Björn Töpel <bjorn@rivosinc.com>

riscv: mm: Take memory hotplug read-lock during kernel page table dump

During memory hot remove, the ptdump functionality can end up touching
stale data. Avoid any potential crashes (or worse), by h

riscv: mm: Take memory hotplug read-lock during kernel page table dump

During memory hot remove, the ptdump functionality can end up touching
stale data. Avoid any potential crashes (or worse), by holding the
memory hotplug read-lock while traversing the page table.

This change is analogous to arm64's commit bf2b59f60ee1 ("arm64/mm:
Hold memory hotplug lock while walking for kernel page table dump").

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20240605114100.315918-8-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>

show more ...


# a23e1966 15-Jul-2024 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge branch 'next' into for-linus

Prepare input updates for 6.11 merge window.


Revision tags: v6.10-rc2
# 6f47c7ae 28-May-2024 Dmitry Torokhov <dmitry.torokhov@gmail.com>

Merge tag 'v6.9' into next

Sync up with the mainline to bring in the new cleanup API.


Revision tags: v6.10-rc1
# 60a2f25d 16-May-2024 Tvrtko Ursulin <tursulin@ursulin.net>

Merge drm/drm-next into drm-intel-gt-next

Some display refactoring patches are needed in order to allow conflict-
less merging.

Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>


# 594ce0b8 10-Jun-2024 Russell King (Oracle) <rmk+kernel@armlinux.org.uk>

Merge topic branches 'clkdev' and 'fixes' into for-linus


Revision tags: v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4
# 79790b68 12-Apr-2024 Thomas Hellström <thomas.hellstrom@linux.intel.com>

Merge drm/drm-next into drm-xe-next

Backmerging drm-next in order to get up-to-date and in particular
to access commit 9ca5facd0400f610f3f7f71aeb7fc0b949a48c67.

Signed-off-by: Thomas Hellström <tho

Merge drm/drm-next into drm-xe-next

Backmerging drm-next in order to get up-to-date and in particular
to access commit 9ca5facd0400f610f3f7f71aeb7fc0b949a48c67.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>

show more ...


# 3e5a516f 08-Apr-2024 Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

Merge tag 'phy_dp_modes_6.10' into msm-next-lumag

Merge DisplayPort subnode API in order to allow DisplayPort driver to
configure the PHYs either to the DP or eDP mode, depending on hardware
configu

Merge tag 'phy_dp_modes_6.10' into msm-next-lumag

Merge DisplayPort subnode API in order to allow DisplayPort driver to
configure the PHYs either to the DP or eDP mode, depending on hardware
configuration.

Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

show more ...


Revision tags: v6.9-rc3
# 5add703f 02-Apr-2024 Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-intel-next

Catching up on 6.9-rc2

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


# 0d21364c 02-Apr-2024 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next

Backmerging to get v6.9-rc2 changes into drm-misc-next.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


Revision tags: v6.9-rc2
# b7e1e969 26-Mar-2024 Takashi Iwai <tiwai@suse.de>

Merge branch 'topic/sound-devel-6.10' into for-next


Revision tags: v6.9-rc1
# 537c2e91 22-Mar-2024 Jakub Kicinski <kuba@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>


# f4566a1e 25-Mar-2024 Ingo Molnar <mingo@kernel.org>

Merge tag 'v6.9-rc1' into sched/core, to pick up fixes and to refresh the branch

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 100c8542 05-Apr-2024 Takashi Iwai <tiwai@suse.de>

Merge tag 'asoc-fix-v6.9-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.9

A relatively large set of fixes here, the biggest piece of it is a

Merge tag 'asoc-fix-v6.9-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.9

A relatively large set of fixes here, the biggest piece of it is a
series correcting some problems with the delay reporting for Intel SOF
cards but there's a bunch of other things. Everything here is driver
specific except for a fix in the core for an issue with sign extension
handling volume controls.

show more ...


123456