.. SPDX-License-Identifier: GPL-2.0

====================
Protected KVM (pKVM)
====================

**NOTE**: pKVM is currently an experimental, development feature and
subject to breaking changes as new isolation features are implemented.
Please reach out to the developers at kvmarm@lists.linux.dev if you have
any questions.

Overview
========

Booting a host kernel with '``kvm-arm.mode=protected``' enables
"Protected KVM" (pKVM). During boot, pKVM installs a stage-2 identity
map page-table for the host and uses it to isolate the hypervisor
running at EL2 from the rest of the host running at EL1/0.

pKVM permits creation of protected virtual machines (pVMs) by passing
the ``KVM_VM_TYPE_ARM_PROTECTED`` machine type identifier to the
``KVM_CREATE_VM`` ioctl(). The hypervisor isolates pVMs from the host by
unmapping pages from the stage-2 identity map as they are accessed by a
pVM. Hypercalls are provided for a pVM to share specific regions of its
IPA space back with the host, allowing for communication with the VMM.
A Linux guest must be configured with ``CONFIG_ARM_PKVM_GUEST=y`` in
order to issue these hypercalls.

See hypercalls.rst for more details.
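
As an illustration of the pVM creation step described above, a VMM might
request a protected VM roughly as follows. This is a minimal sketch, not
a reference implementation: the helper name ``pkvm_create_pvm()`` is
hypothetical, and the fallback value given for
``KVM_VM_TYPE_ARM_PROTECTED`` is an assumed placeholder for UAPI headers
that do not define it; the definition shipped with the target kernel
should be used instead.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/kvm.h>

/*
 * ASSUMPTION: fallback for UAPI headers that do not define the pVM machine
 * type. The value is a placeholder for illustration only; take the real
 * definition from the headers of the kernel being targeted.
 */
#ifndef KVM_VM_TYPE_ARM_PROTECTED
#define KVM_VM_TYPE_ARM_PROTECTED	(1UL << 31)
#endif

/*
 * Hypothetical helper: open the KVM device node at kvm_path (typically
 * "/dev/kvm") and ask for a protected VM.
 *
 * Returns the new VM file descriptor, or -1 with errno set.
 */
static int pkvm_create_pvm(const char *kvm_path)
{
	int kvm_fd, vm_fd, saved_errno;

	kvm_fd = open(kvm_path, O_RDWR);
	if (kvm_fd < 0)
		return -1;

	/* The machine type identifier is passed as the ioctl() argument. */
	vm_fd = ioctl(kvm_fd, KVM_CREATE_VM, KVM_VM_TYPE_ARM_PROTECTED);
	saved_errno = errno;

	/*
	 * This sketch has no further use for the device fd. A real VMM
	 * would normally keep it open for other system-level ioctl()s.
	 */
	close(kvm_fd);
	errno = saved_errno;

	return vm_fd;
}
```

Calling ``pkvm_create_pvm("/dev/kvm")`` on a host that was not booted
with '``kvm-arm.mode=protected``' is expected to fail, so the caller
should check the return value and ``errno`` before going any further.
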

Isolation mechanisms
====================

pKVM relies on a number of mechanisms to isolate pVMs from the host:

CPU memory isolation
--------------------

Status: Isolation of anonymous memory and metadata pages.

Metadata pages (e.g. page-table pages and '``struct kvm_vcpu``' pages)
are donated from the host to the hypervisor during pVM creation and
are consequently unmapped from the stage-2 identity map until the pVM is
destroyed.

Similarly to regular KVM, pages are lazily mapped into the guest in
response to stage-2 page faults handled by the host. However, when
running a pVM, these pages are first pinned and then unmapped from the
stage-2 identity map as part of the donation procedure. This gives rise
to some user-visible differences when compared to non-protected VMs,
largely due to the lack of MMU notifiers:

* Memslots cannot be moved or deleted once the pVM has started running.
* Read-only memslots and dirty logging are not supported.
* With the exception of swap, file-backed pages cannot be mapped into a
  pVM.
* Donated pages are accounted against ``RLIMIT_MEMLOCK`` and so the VMM
  must have a sufficient resource limit or be granted ``CAP_IPC_LOCK``.
  The lack of a runtime reclaim mechanism means that memory locked for
  a pVM will remain locked until the pVM is destroyed.
* Changes to the VMM address space (e.g. a ``MAP_FIXED`` mmap() over a
  mapping associated with a memslot) are not reflected in the guest and
  may lead to loss of coherency.
* Accessing pVM memory that has not been shared back will result in the
  delivery of a SIGSEGV.
* If a system call accesses pVM memory that has not been shared back
  then it will either return ``-EFAULT`` or forcefully reclaim the
  memory pages. Reclaimed memory is zeroed by the hypervisor and a
  subsequent attempt to access it in the pVM will return ``-EFAULT``
  from the ``KVM_RUN`` ioctl().

CPU state isolation
-------------------

Status: **Unimplemented.**

DMA isolation using an IOMMU
----------------------------

Status: **Unimplemented.**

Proxying of TrustZone services
------------------------------

Status: FF-A and PSCI calls from the host are proxied by the pKVM
hypervisor.

The FF-A proxy ensures that the host cannot share pVM or hypervisor
memory with TrustZone as part of a "confused deputy" attack.

The PSCI proxy ensures that CPUs always have the stage-2 identity map
installed when they are executing in the host.

Protected VM firmware (pvmfw)
-----------------------------

Status: **Unimplemented.**

Resources
=========

Quentin Perret's KVM Forum 2022 talk entitled "Protected KVM on arm64: A
technical deep dive" remains a good resource for learning more about
pKVM, despite some of the details having changed in the meantime:

https://www.youtube.com/watch?v=9npebeVFbFw