History log of /linux/virt/kvm/kvm_main.c (Results 1 – 25 of 4113)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
Revision tags: v6.12-rc5
# fc5ced75 25-Oct-2024 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-fixes into drm-misc-fixes

Backmerging to get the latest fixes from upstream.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


# d1293776 21-Oct-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"ARM64:

- Fix the guest view of the ID registers, making the relevant fields
writable

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"ARM64:

- Fix the guest view of the ID registers, making the relevant fields
writable from userspace (affecting ID_AA64DFR0_EL1 and
ID_AA64PFR1_EL1)

- Correcly expose S1PIE to guests, fixing a regression introduced in
6.12-rc1 with the S1POE support

- Fix the recycling of stage-2 shadow MMUs by tracking the context
(are we allowed to block or not) as well as the recycling state

- Address a couple of issues with the vgic when userspace
misconfigures the emulation, resulting in various splats. Headaches
courtesy of our Syzkaller friends

- Stop wasting space in the HYP idmap, as we are dangerously close to
the 4kB limit, and this has already exploded in -next

- Fix another race in vgic_init()

- Fix a UBSAN error when faking the cache topology with MTE enabled

RISCV:

- RISCV: KVM: use raw_spinlock for critical section in imsic

x86:

- A bandaid for lack of XCR0 setup in selftests, which causes trouble
if the compiler is configured to have x86-64-v3 (with AVX) as the
default ISA. Proper XCR0 setup will come in the next merge window.

- Fix an issue where KVM would not ignore low bits of the nested CR3
and potentially leak up to 31 bytes out of the guest memory's
bounds

- Fix case in which an out-of-date cached value for the segments
could by returned by KVM_GET_SREGS.

- More cleanups for KVM_X86_QUIRK_SLOT_ZAP_ALL

- Override MTRR state for KVM confidential guests, making it WB by
default as is already the case for Hyper-V guests.

Generic:

- Remove a couple of unused functions"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (27 commits)
RISCV: KVM: use raw_spinlock for critical section in imsic
KVM: selftests: Fix out-of-bounds reads in CPUID test's array lookups
KVM: selftests: x86: Avoid using SSE/AVX instructions
KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory
KVM: VMX: reset the segment cache after segment init in vmx_vcpu_reset()
KVM: x86: Clean up documentation for KVM_X86_QUIRK_SLOT_ZAP_ALL
KVM: x86/mmu: Add lockdep assert to enforce safe usage of kvm_unmap_gfn_range()
KVM: x86/mmu: Zap only SPs that shadow gPTEs when deleting memslot
x86/kvm: Override default caching mode for SEV-SNP and TDX
KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic
KVM: Remove unused kvm_vcpu_gfn_to_pfn
KVM: arm64: Ensure vgic_ready() is ordered against MMIO registration
KVM: arm64: vgic: Don't check for vgic_ready() when setting NR_IRQS
KVM: arm64: Fix shift-out-of-bounds bug
KVM: arm64: Shave a few bytes from the EL2 idmap code
KVM: arm64: Don't eagerly teardown the vgic on init error
KVM: arm64: Expose S1PIE to guests
KVM: arm64: nv: Clarify safety of allowing TLBI unmaps to reschedule
KVM: arm64: nv: Punt stage-2 recycling to a vCPU request
KVM: arm64: nv: Do not block when unmapping stage-2 if disallowed
...

show more ...


Revision tags: v6.12-rc4
# 2b4d2501 20-Oct-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduling fixes from Borislav Petkov:

- Add PREEMPT_RT maintainers

- Fix another aspect of d

Merge tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduling fixes from Borislav Petkov:

- Add PREEMPT_RT maintainers

- Fix another aspect of delayed dequeued tasks wrt determining their
state, i.e., whether they're runnable or blocked

- Handle delayed dequeued tasks and their migration wrt PSI properly

- Fix the situation where a delayed dequeue task gets enqueued into a
new class, which should not happen

- Fix a case where memory allocation would happen while the runqueue
lock is held, which is a no-no

- Do not over-schedule when tasks with shorter slices preempt the
currently running task

- Make sure delayed to deque entities are properly handled before
unthrottling

- Other smaller cleanups and improvements

* tag 'sched_urgent_for_v6.12_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Add an entry for PREEMPT_RT.
sched/fair: Fix external p->on_rq users
sched/psi: Fix mistaken CPU pressure indication after corrupted task state bug
sched/core: Dequeue PSI signals for blocked tasks that are delayed
sched: Fix delayed_dequeue vs switched_from_fair()
sched/core: Disable page allocation in task_tick_mm_cid()
sched/deadline: Use hrtick_enabled_dl() before start_hrtick_dl()
sched/eevdf: Fix wakeup-preempt by checking cfs_rq->nr_running
sched: Fix sched_delayed vs cfs_bandwidth

show more ...


Revision tags: v6.12-rc3, v6.12-rc2
# bc07eea2 01-Oct-2024 Dr. David Alan Gilbert <linux@treblig.org>

KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic

The last use of kvm_vcpu_gfn_to_pfn_atomic was removed by commit
1bbc60d0c7e5 ("KVM: x86/mmu: Remove MMU auditing")

Remove it.

Signed-off-by: Dr. Davi

KVM: Remove unused kvm_vcpu_gfn_to_pfn_atomic

The last use of kvm_vcpu_gfn_to_pfn_atomic was removed by commit
1bbc60d0c7e5 ("KVM: x86/mmu: Remove MMU auditing")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Message-ID: <20241001141354.18009-3-linux@treblig.org>
[Adjust Documentation/virt/kvm/locking.rst. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# 88a387cf 01-Oct-2024 Dr. David Alan Gilbert <linux@treblig.org>

KVM: Remove unused kvm_vcpu_gfn_to_pfn

The last use of kvm_vcpu_gfn_to_pfn was removed by commit
b1624f99aa8f ("KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()")

Remove it.

Signed-of

KVM: Remove unused kvm_vcpu_gfn_to_pfn

The last use of kvm_vcpu_gfn_to_pfn was removed by commit
b1624f99aa8f ("KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page()")

Remove it.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Message-ID: <20241001141354.18009-2-linux@treblig.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# cd9626e9 10-Oct-2024 Peter Zijlstra <peterz@infradead.org>

sched/fair: Fix external p->on_rq users

Sean noted that ever since commit 152e11f6df29 ("sched/fair: Implement
delayed dequeue") KVM's preemption notifiers have started
mis-classifying preemption vs

sched/fair: Fix external p->on_rq users

Sean noted that ever since commit 152e11f6df29 ("sched/fair: Implement
delayed dequeue") KVM's preemption notifiers have started
mis-classifying preemption vs blocking.

Notably p->on_rq is no longer sufficient to determine if a task is
runnable or blocked -- the aforementioned commit introduces tasks that
remain on the runqueue even through they will not run again, and
should be considered blocked for many cases.

Add the task_is_runnable() helper to classify things and audit all
external users of the p->on_rq state. Also add a few comments.

Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue")
Reported-by: Sean Christopherson <seanjc@google.com>
Tested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20241010091843.GK33184@noisy.programming.kicks-ass.net

show more ...


Revision tags: v6.12-rc1
# 52c996d3 27-Sep-2024 Arnaldo Carvalho de Melo <acme@redhat.com>

Merge remote-tracking branch 'torvalds/master' into perf-tools

To pick up changes in other trees that may affect perf, such as libbpf
and in general the header files that perf has copies of, so that

Merge remote-tracking branch 'torvalds/master' into perf-tools

To pick up changes in other trees that may affect perf, such as libbpf
and in general the header files that perf has copies of, so that we can
do the sync with the kernel sources.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

show more ...


# c8d430db 06-Oct-2024 Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvmarm-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #1

- Fix pKVM error path on init, making sure we do not chang

Merge tag 'kvmarm-fixes-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.12, take #1

- Fix pKVM error path on init, making sure we do not change critical
system registers as we're about to fail

- Make sure that the host's vector length is at capped by a value
common to all CPUs

- Fix kvm_has_feat*() handling of "negative" features, as the current
code is pretty broken

- Promote Joey to the status of official reviewer, while James steps
down -- hopefully only temporarly

show more ...


# 0c436dfe 02-Oct-2024 Takashi Iwai <tiwai@suse.de>

Merge tag 'asoc-fix-v6.12-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.12

A bunch of fixes here that came in during the merge window and t

Merge tag 'asoc-fix-v6.12-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.12

A bunch of fixes here that came in during the merge window and the first
week of release, plus some new quirks and device IDs. There's nothing
major here, it's a bit bigger than it might've been due to there being
no fixes sent during the merge window due to your vacation.

show more ...


# 2cd86f02 01-Oct-2024 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixes

Required for a panthor fix that broke when
FOP_UNSIGNED_OFFSET was added in place of FMODE_UNSIGNED_OFFSET.

Signed-off-by: Maarten L

Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixes

Required for a panthor fix that broke when
FOP_UNSIGNED_OFFSET was added in place of FMODE_UNSIGNED_OFFSET.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

show more ...


# 3a39d672 27-Sep-2024 Paolo Abeni <pabeni@redhat.com>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Cross-merge networking fixes after downstream PR.

No conflicts and no adjacent changes.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>


# 3efc5736 28-Sep-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull x86 kvm updates from Paolo Bonzini:
"x86:

- KVM currently invalidates the entirety of the page tables, not just
thos

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull x86 kvm updates from Paolo Bonzini:
"x86:

- KVM currently invalidates the entirety of the page tables, not just
those for the memslot being touched, when a memslot is moved or
deleted.

This does not traditionally have particularly noticeable overhead,
but Intel's TDX will require the guest to re-accept private pages
if they are dropped from the secure EPT, which is a non starter.

Actually, the only reason why this is not already being done is a
bug which was never fully investigated and caused VM instability
with assigned GeForce GPUs, so allow userspace to opt into the new
behavior.

- Advertise AVX10.1 to userspace (effectively prep work for the
"real" AVX10 functionality that is on the horizon)

- Rework common MSR handling code to suppress errors on userspace
accesses to unsupported-but-advertised MSRs

This will allow removing (almost?) all of KVM's exemptions for
userspace access to MSRs that shouldn't exist based on the vCPU
model (the actual cleanup is non-trivial future work)

- Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC)
splits the 64-bit value into the legacy ICR and ICR2 storage,
whereas Intel (APICv) stores the entire 64-bit value at the ICR
offset

- Fix a bug where KVM would fail to exit to userspace if one was
triggered by a fastpath exit handler

- Add fastpath handling of HLT VM-Exit to expedite re-entering the
guest when there's already a pending wake event at the time of the
exit

- Fix a WARN caused by RSM entering a nested guest from SMM with
invalid guest state, by forcing the vCPU out of guest mode prior to
signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the
nested guest)

- Overhaul the "unprotect and retry" logic to more precisely identify
cases where retrying is actually helpful, and to harden all retry
paths against putting the guest into an infinite retry loop

- Add support for yielding, e.g. to honor NEED_RESCHED, when zapping
rmaps in the shadow MMU

- Refactor pieces of the shadow MMU related to aging SPTEs in
prepartion for adding multi generation LRU support in KVM

- Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is
enabled, i.e. when the CPU has already flushed the RSB

- Trace the per-CPU host save area as a VMCB pointer to improve
readability and cleanup the retrieval of the SEV-ES host save area

- Remove unnecessary accounting of temporary nested VMCB related
allocations

- Set FINAL/PAGE in the page fault error code for EPT violations if
and only if the GVA is valid. If the GVA is NOT valid, there is no
guest-side page table walk and so stuffing paging related metadata
is nonsensical

- Fix a bug where KVM would incorrectly synthesize a nested VM-Exit
instead of emulating posted interrupt delivery to L2

- Add a lockdep assertion to detect unsafe accesses of vmcs12
structures

- Harden eVMCS loading against an impossible NULL pointer deref
(really truly should be impossible)

- Minor SGX fix and a cleanup

- Misc cleanups

Generic:

- Register KVM's cpuhp and syscore callbacks when enabling
virtualization in hardware, as the sole purpose of said callbacks
is to disable and re-enable virtualization as needed

- Enable virtualization when KVM is loaded, not right before the
first VM is created

Together with the previous change, this simplifies a lot the logic
of the callbacks, because their very existence implies
virtualization is enabled

- Fix a bug that results in KVM prematurely exiting to userspace for
coalesced MMIO/PIO in many cases, clean up the related code, and
add a testcase

- Fix a bug in kvm_clear_guest() where it would trigger a buffer
overflow _if_ the gpa+len crosses a page boundary, which thankfully
is guaranteed to not happen in the current code base. Add WARNs in
more helpers that read/write guest memory to detect similar bugs

Selftests:

- Fix a goof that caused some Hyper-V tests to be skipped when run on
bare metal, i.e. NOT in a VM

- Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES
guest

- Explicitly include one-off assets in .gitignore. Past Sean was
completely wrong about not being able to detect missing .gitignore
entries

- Verify userspace single-stepping works when KVM happens to handle a
VM-Exit in its fastpath

- Misc cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
Documentation: KVM: fix warning in "make htmldocs"
s390: Enable KVM_S390_UCONTROL config in debug_defconfig
selftests: kvm: s390: Add VM run test case
KVM: SVM: let alternatives handle the cases when RSB filling is required
KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid
KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
KVM: x86: Update retry protection fields when forcing retry on emulation failure
KVM: x86: Apply retry protection to "unprotect on failure" path
KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
...

show more ...


Revision tags: v6.11
# 7056c4e2 14-Sep-2024 Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD

KVK generic changes for 6.12:

- Fix a bug that results in KVM prematurely exiting to userspace for coalesced
MMIO/

Merge tag 'kvm-x86-generic-6.12' of https://github.com/kvm-x86/linux into HEAD

KVK generic changes for 6.12:

- Fix a bug that results in KVM prematurely exiting to userspace for coalesced
MMIO/PIO in many cases, clean up the related code, and add a testcase.

- Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_
the gpa+len crosses a page boundary, which thankfully is guaranteed to not
happen in the current code base. Add WARNs in more helpers that read/write
guest memory to detect similar bugs.

show more ...


# c09dd2bb 12-Sep-2024 Paolo Bonzini <pbonzini@redhat.com>

Merge branch 'kvm-redo-enable-virt' into HEAD

Register KVM's cpuhp and syscore callbacks when enabling virtualization in
hardware, as the sole purpose of said callbacks is to disable and re-enable
v

Merge branch 'kvm-redo-enable-virt' into HEAD

Register KVM's cpuhp and syscore callbacks when enabling virtualization in
hardware, as the sole purpose of said callbacks is to disable and re-enable
virtualization as needed.

The primary motivation for this series is to simplify dealing with enabling
virtualization for Intel's TDX, which needs to enable virtualization
when kvm-intel.ko is loaded, i.e. long before the first VM is created.

That said, this is a nice cleanup on its own. By registering the callbacks
on-demand, the callbacks themselves don't need to check kvm_usage_count,
because their very existence implies a non-zero count.

Patch 1 (re)adds a dedicated lock for kvm_usage_count. This avoids a
lock ordering issue between cpus_read_lock() and kvm_lock. The lock
ordering issue still exist in very rare cases, and will be fixed for
good by switching vm_list to an (S)RCU-protected list.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


Revision tags: v6.11-rc7, v6.11-rc6
# 025dde58 29-Aug-2024 Sean Christopherson <seanjc@google.com>

KVM: Harden guest memory APIs against out-of-bounds accesses

When reading or writing a guest page, WARN and bail if offset+len would
result in a read to a different page so that KVM bugs are more li

KVM: Harden guest memory APIs against out-of-bounds accesses

When reading or writing a guest page, WARN and bail if offset+len would
result in a read to a different page so that KVM bugs are more likely to
be detected, and so that any such bugs are less likely to escalate to an
out-of-bounds access. E.g. if userspace isn't using guard pages and the
target page is at the end of a memslot.

Note, KVM already hardens itself in similar APIs, e.g. in the "cached"
variants, it's just the vanilla APIs that are playing with fire.

Link: https://lore.kernel.org/r/20240829191413.900740-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

show more ...


# ec495f2a 29-Aug-2024 Sean Christopherson <seanjc@google.com>

KVM: Write the per-page "segment" when clearing (part of) a guest page

Pass "seg" instead of "len" when writing guest memory in kvm_clear_guest(),
as "seg" holds the number of bytes to write for the

KVM: Write the per-page "segment" when clearing (part of) a guest page

Pass "seg" instead of "len" when writing guest memory in kvm_clear_guest(),
as "seg" holds the number of bytes to write for the current page, while
"len" holds the total bytes remaining.

Luckily, all users of kvm_clear_guest() are guaranteed to not cross a page
boundary, and so the bug is unhittable in the current code base.

Fixes: 2f5414423ef5 ("KVM: remove kvm_clear_guest_page")
Reported-by: zyr_ms@outlook.com
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219104
Link: https://lore.kernel.org/r/20240829191413.900740-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

show more ...


# b67107a2 30-Aug-2024 Sean Christopherson <seanjc@google.com>

KVM: Add arch hooks for enabling/disabling virtualization

Add arch hooks that are invoked when KVM enables/disable virtualization.
x86 will use the hooks to register an "emergency disable" callback,

KVM: Add arch hooks for enabling/disabling virtualization

Add arch hooks that are invoked when KVM enables/disable virtualization.
x86 will use the hooks to register an "emergency disable" callback, which
is essentially an x86-specific shutdown notifier that is used when the
kernel is doing an emergency reboot/shutdown/kexec.

Add comments for the declarations to help arch code understand exactly
when the callbacks are invoked. Alternatively, the APIs themselves could
communicate most of the same info, but kvm_arch_pre_enable_virtualization()
and kvm_arch_post_disable_virtualization() are a bit cumbersome, and make
it a bit less obvious that they are intended to be implemented as a pair.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240830043600.127750-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# b4886fab 30-Aug-2024 Sean Christopherson <seanjc@google.com>

KVM: Add a module param to allow enabling virtualization when KVM is loaded

Add an on-by-default module param, enable_virt_at_load, to let userspace
force virtualization to be enabled in hardware wh

KVM: Add a module param to allow enabling virtualization when KVM is loaded

Add an on-by-default module param, enable_virt_at_load, to let userspace
force virtualization to be enabled in hardware when KVM is initialized,
i.e. just before /dev/kvm is exposed to userspace. Enabling virtualization
during KVM initialization allows userspace to avoid the additional latency
when creating/destroying the first/last VM (or more specifically, on the
0=>1 and 1=>0 edges of creation/destruction).

Now that KVM uses the cpuhp framework to do per-CPU enabling, the latency
could be non-trivial as the cpuhup bringup/teardown is serialized across
CPUs, e.g. the latency could be problematic for use case that need to spin
up VMs quickly.

Prior to commit 10474ae8945c ("KVM: Activate Virtualization On Demand"),
KVM _unconditionally_ enabled virtualization during load, i.e. there's no
fundamental reason KVM needs to dynamically toggle virtualization. These
days, the only known argument for not enabling virtualization is to allow
KVM to be autoloaded without blocking other out-of-tree hypervisors, and
such use cases can simply change the module param, e.g. via command line.

Note, the aforementioned commit also mentioned that enabling SVM (AMD's
virtualization extensions) can result in "using invalid TLB entries".
It's not clear whether the changelog was referring to a KVM bug, a CPU
bug, or something else entirely. Regardless, leaving virtualization off
by default is not a robust "fix", as any protection provided is lost the
instant userspace creates the first VM.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240830043600.127750-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# 071f24ad 30-Aug-2024 Sean Christopherson <seanjc@google.com>

KVM: Rename arch hooks related to per-CPU virtualization enabling

Rename the per-CPU hooks used to enable virtualization in hardware to
align with the KVM-wide helpers in kvm_main.c, and to better c

KVM: Rename arch hooks related to per-CPU virtualization enabling

Rename the per-CPU hooks used to enable virtualization in hardware to
align with the KVM-wide helpers in kvm_main.c, and to better capture that
the callbacks are invoked on every online CPU.

No functional change intended.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Message-ID: <20240830043600.127750-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# 70c01943 30-Aug-2024 Sean Christopherson <seanjc@google.com>

KVM: Rename symbols related to enabling virtualization hardware

Rename the various functions (and a variable) that enable virtualization
to prepare for upcoming changes, and to clean up artifacts of

KVM: Rename symbols related to enabling virtualization hardware

Rename the various functions (and a variable) that enable virtualization
to prepare for upcoming changes, and to clean up artifacts of KVM's
previous behavior, which required manually juggling locks around
kvm_usage_count.

Drop the "nolock" qualifier from per-CPU functions now that there are no
"nolock" implementations of the "all" variants, i.e. now that calling a
non-nolock function from a nolock function isn't confusing (unlike this
sentence).

Drop "all" from the outer helpers as they no longer manually iterate
over all CPUs, and because it might not be obvious what "all" refers to.

In lieu of the above qualifiers, append "_cpu" to the end of the functions
that are per-CPU helpers for the outer APIs.

Opportunistically prepend "kvm" to all functions to help make it clear
that they are KVM helpers, but mostly because there's no reason not to.

Lastly, use "virtualization" instead of "hardware", because while the
functions do enable virtualization in hardware, there are a _lot_ of
things that KVM enables in hardware.

Defer renaming the arch hooks to future patches, purely to reduce the
amount of churn in a single commit.

Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240830043600.127750-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# 9a798b13 30-Aug-2024 Sean Christopherson <seanjc@google.com>

KVM: Register cpuhp and syscore callbacks when enabling hardware

Register KVM's cpuhp and syscore callback when enabling virtualization
in hardware instead of registering the callbacks during initia

KVM: Register cpuhp and syscore callbacks when enabling hardware

Register KVM's cpuhp and syscore callback when enabling virtualization
in hardware instead of registering the callbacks during initialization,
and let the CPU up/down framework invoke the inner enable/disable
functions. Registering the callbacks during initialization makes things
more complex than they need to be, as KVM needs to be very careful about
handling races between enabling CPUs being onlined/offlined and hardware
being enabled/disabled.

Intel TDX support will require KVM to enable virtualization during KVM
initialization, i.e. will add another wrinkle to things, at which point
sorting out the potential races with kvm_usage_count would become even
more complex.

Note, using the cpuhp framework has a subtle behavioral change: enabling
will be done serially across all CPUs, whereas KVM currently sends an IPI
to all CPUs in parallel. While serializing virtualization enabling could
create undesirable latency, the issue is limited to the 0=>1 transition of
VM creation. And even that can be mitigated, e.g. by letting userspace
force virtualization to be enabled when KVM is initialized.

Cc: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240830043600.127750-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# 44d17459 30-Aug-2024 Sean Christopherson <seanjc@google.com>

KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock

Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock
on x86 due to a chain of locks and SRCU synchronizati

KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock

Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock
on x86 due to a chain of locks and SRCU synchronizations. Translating the
below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on
CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the
fairness of r/w semaphores).

CPU0 CPU1 CPU2
1 lock(&kvm->slots_lock);
2 lock(&vcpu->mutex);
3 lock(&kvm->srcu);
4 lock(cpu_hotplug_lock);
5 lock(kvm_lock);
6 lock(&kvm->slots_lock);
7 lock(cpu_hotplug_lock);
8 sync(&kvm->srcu);

Note, there are likely more potential deadlocks in KVM x86, e.g. the same
pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with
__kvmclock_cpufreq_notifier():

cpuhp_cpufreq_online()
|
-> cpufreq_online()
|
-> cpufreq_gov_performance_limits()
|
-> __cpufreq_driver_target()
|
-> __target_index()
|
-> cpufreq_freq_transition_begin()
|
-> cpufreq_notify_transition()
|
-> ... __kvmclock_cpufreq_notifier()

But, actually triggering such deadlocks is beyond rare due to the
combination of dependencies and timings involved. E.g. the cpufreq
notifier is only used on older CPUs without a constant TSC, mucking with
the NX hugepage mitigation while VMs are running is very uncommon, and
doing so while also onlining/offlining a CPU (necessary to generate
contention on cpu_hotplug_lock) would be even more unusual.

The most robust solution to the general cpu_hotplug_lock issue is likely
to switch vm_list to be an RCU-protected list, e.g. so that x86's cpufreq
notifier doesn't to take kvm_lock. For now, settle for fixing the most
blatant deadlock, as switching to an RCU-protected list is a much more
involved change, but add a comment in locking.rst to call out that care
needs to be taken when walking holding kvm_lock and walking vm_list.

======================================================
WARNING: possible circular locking dependency detected
6.10.0-smp--c257535a0c9d-pip #330 Tainted: G S O
------------------------------------------------------
tee/35048 is trying to acquire lock:
ff6a80eced71e0a8 (&kvm->slots_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x179/0x1e0 [kvm]

but task is already holding lock:
ffffffffc07abb08 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (kvm_lock){+.+.}-{3:3}:
__mutex_lock+0x6a/0xb40
mutex_lock_nested+0x1f/0x30
kvm_dev_ioctl+0x4fb/0xe50 [kvm]
__se_sys_ioctl+0x7b/0xd0
__x64_sys_ioctl+0x21/0x30
x64_sys_call+0x15d0/0x2e60
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #2 (cpu_hotplug_lock){++++}-{0:0}:
cpus_read_lock+0x2e/0xb0
static_key_slow_inc+0x16/0x30
kvm_lapic_set_base+0x6a/0x1c0 [kvm]
kvm_set_apic_base+0x8f/0xe0 [kvm]
kvm_set_msr_common+0x9ae/0xf80 [kvm]
vmx_set_msr+0xa54/0xbe0 [kvm_intel]
__kvm_set_msr+0xb6/0x1a0 [kvm]
kvm_arch_vcpu_ioctl+0xeca/0x10c0 [kvm]
kvm_vcpu_ioctl+0x485/0x5b0 [kvm]
__se_sys_ioctl+0x7b/0xd0
__x64_sys_ioctl+0x21/0x30
x64_sys_call+0x15d0/0x2e60
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #1 (&kvm->srcu){.+.+}-{0:0}:
__synchronize_srcu+0x44/0x1a0
synchronize_srcu_expedited+0x21/0x30
kvm_swap_active_memslots+0x110/0x1c0 [kvm]
kvm_set_memslot+0x360/0x620 [kvm]
__kvm_set_memory_region+0x27b/0x300 [kvm]
kvm_vm_ioctl_set_memory_region+0x43/0x60 [kvm]
kvm_vm_ioctl+0x295/0x650 [kvm]
__se_sys_ioctl+0x7b/0xd0
__x64_sys_ioctl+0x21/0x30
x64_sys_call+0x15d0/0x2e60
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e

-> #0 (&kvm->slots_lock){+.+.}-{3:3}:
__lock_acquire+0x15ef/0x2e30
lock_acquire+0xe0/0x260
__mutex_lock+0x6a/0xb40
mutex_lock_nested+0x1f/0x30
set_nx_huge_pages+0x179/0x1e0 [kvm]
param_attr_store+0x93/0x100
module_attr_store+0x22/0x40
sysfs_kf_write+0x81/0xb0
kernfs_fop_write_iter+0x133/0x1d0
vfs_write+0x28d/0x380
ksys_write+0x70/0xe0
__x64_sys_write+0x1f/0x30
x64_sys_call+0x281b/0x2e60
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e

Cc: Chao Gao <chao.gao@intel.com>
Fixes: 0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")
Cc: stable@vger.kernel.org
Reviewed-by: Kai Huang <kai.huang@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20240830043600.127750-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

show more ...


# cb787f4a 27-Sep-2024 Al Viro <viro@zeniv.linux.org.uk>

[tree-wide] finally take no_llseek out

no_llseek had been defined to NULL two years ago, in commit 868941b14441
("fs: remove no_llseek")

To quote that commit,

At -rc1 we'll need do a mechanical

[tree-wide] finally take no_llseek out

no_llseek had been defined to NULL two years ago, in commit 868941b14441
("fs: remove no_llseek")

To quote that commit,

At -rc1 we'll need do a mechanical removal of no_llseek -

git grep -l -w no_llseek | grep -v porting.rst | while read i; do
sed -i '/\<no_llseek\>/d' $i
done

would do it.

Unfortunately, that hadn't been done. Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
.llseek = no_llseek,
so it's obviously safe.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

show more ...


Revision tags: v6.11-rc5
# 87ee9981 19-Aug-2024 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 6.11-rc4 into driver-core-next

We need the driver core build fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


Revision tags: v6.11-rc4
# 0c80bdfc 12-Aug-2024 Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge 6.11-rc3 into driver-core-next

We need the driver core fixes in here as well to build on top of.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>


12345678910>>...165