.. SPDX-License-Identifier: GPL-2.0

=================================
The PPC KVM paravirtual interface
=================================

The basic execution principle by which KVM on PowerPC works is to run all kernel
space code with PR=1, which is user space (problem state). This way we trap all
privileged instructions and can emulate them accordingly.

Unfortunately, that is also its downfall. There are quite a few privileged
instructions that needlessly return us to the hypervisor even though they
could be handled differently.

This is what the PPC PV interface helps with. It takes privileged instructions
and transforms them into unprivileged ones with some help from the hypervisor.
This cuts down virtualization costs by about 50% on some of my benchmarks.

The code for that interface can be found in the arch/powerpc/kernel/kvm* files.

Querying for existence
======================

To find out whether we're running on KVM, we leverage the device tree. When
Linux is running on KVM, a node /hypervisor exists. That node contains a
compatible property with the value "linux,kvm".
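
As a minimal userspace sketch of this check (not the kernel's own implementation, which uses its device tree helpers), the code below assumes the device tree is exposed under /proc/device-tree. A compatible property is a list of NUL-terminated strings, so each entry is compared separately rather than comparing the whole buffer. The function names are hypothetical.

.. code-block:: c

	#include <stdbool.h>
	#include <stdio.h>
	#include <string.h>

	/*
	 * Return true if @prop, a device tree "compatible" value of @len bytes
	 * (a list of NUL-terminated strings), contains the exact entry @want.
	 */
	static bool dt_compatible_has(const char *prop, size_t len, const char *want)
	{
		size_t want_len = strlen(want);
		size_t off = 0;

		while (off < len) {
			/* Find the end of the current entry without running past @len. */
			const char *nul = memchr(prop + off, '\0', len - off);
			size_t n = nul ? (size_t)(nul - (prop + off)) : len - off;

			if (n == want_len && memcmp(prop + off, want, n) == 0)
				return true;
			off += n + 1;	/* skip the entry and its terminating NUL */
		}
		return false;
	}

	/*
	 * Userspace illustration only: read the /hypervisor node's compatible
	 * property from @path (assumed location:
	 * /proc/device-tree/hypervisor/compatible) and look for "linux,kvm".
	 * Values longer than the buffer are truncated.
	 */
	static bool running_on_kvm(const char *path)
	{
		char buf[256];
		size_t len;
		FILE *f = fopen(path, "rb");

		if (!f)
			return false;	/* property not readable: assume not on KVM */
		len = fread(buf, 1, sizeof(buf), f);
		fclose(f);
		return dt_compatible_has(buf, len, "linux,kvm");
	}

On a KVM guest, running_on_kvm("/proc/device-tree/hypervisor/compatible") would be expected to return true; without a /hypervisor node carrying "linux,kvm", it returns false.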

Once you have determined that you're running under a PV-capable KVM, you can
use hypercalls as described below.

KVM hypercalls
==============

Inside the device tree's /hypervisor node there's a property called
'hypercall-instructions'. This property contains at most 4 opcodes that make
up the hypercall. To issue a hypercall, execute these instructions.

The parameters are as follows:

  ======== ================ ================
  Register IN               OUT
  ======== ================ ================
  r0       -                volatile
  r3       1st parameter    Return code
  r4       2nd parameter    1st output value
  r5       3rd parameter    2nd output value
  r6       4th parameter    3rd output value
  r7       5th parameter    4th output value
  r8       6th parameter    5th output value
  r9       7th parameter    6th output value
  r10      8th parameter    7th output value
  r11      hypercall number 8th output value
  r12      -                volatile
  ======== ================ ================

Hypercall definitions are shared in generic code, so the same hypercall numbers
apply to x86 and powerpc alike, with the exception that each KVM hypercall
number also needs to be ORed with the KVM vendor code, (42 << 16), before it is
placed in r11.

The possible return codes are:

  ==== =========================
  Code Meaning
  ==== =========================
  0    Success
  12   Hypercall not implemented
  <0   Error
  ==== =========================

The magic page
==============

To enable communication between the hypervisor and the guest, there is a new
shared page that contains parts of the supervisor-visible register state. The
guest can map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.

Once this hypercall has been issued, the guest always has the magic page mapped
at the desired location. The first parameter indicates the effective address
when the MMU is enabled. The second parameter indicates the address in real
mode, if applicable to the target. For now, we always map the page to -4096.
This way we can access it using absolute load and store instructions. The
following instruction reads the first field of the magic page::

	ld	rX, -4096(0)

The interface is designed to be extensible, should there be a need later to add
registers to the magic page. If you add fields to the magic page, also define a
new hypercall feature to indicate that the host can give you more registers.
Only make use of the additional fields if the host supports the corresponding
feature.

The magic page layout is described by struct kvm_vcpu_arch_shared
in arch/powerpc/include/uapi/asm/kvm_para.h.

Magic page features
===================

When mapping the magic page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE,
a second return value is passed to the guest in r4, in addition to the return
code in r3. This second return value contains a bitmap of the features
available inside the magic page.

The following enhancements to the magic page are currently available:

  ============================ =======================================
  KVM_MAGIC_FEAT_SR            Maps SR registers r/w in the magic page
  KVM_MAGIC_FEAT_MAS0_TO_SPRG7 Maps MASn, ESR, PIR and high SPRGs
  ============================ =======================================

Before using any of these enhanced magic page features, check that the host
advertises them in the feature bitmap!

Magic page flags
================

In addition to features, which indicate whether the host is capable of a
particular feature, we also have a channel for the guest to tell the host
whether it is capable of something. This is what we call "flags".

Flags are passed to the host in the low 12 bits of the magic page's effective
address.
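
Since the magic page is mapped at a 4 KiB-aligned address (-4096), the low 12 bits of that address are zero and can carry flags. The sketch below packs flags into such an address and splits them out again, as the receiving side would. The numeric flag value and the helper names are assumptions for illustration only.

.. code-block:: c

	/* The low 12 bits of a 4 KiB-aligned address are free for flags. */
	#define MAGIC_PAGE_FLAG_MASK		0xfffUL

	/* Assumed value for illustration; check the kernel headers. */
	#define MAGIC_PAGE_FLAG_NOT_MAPPED_NX	0x1UL

	/* Hypothetical helper: combine a page-aligned address with guest flags. */
	static unsigned long magic_page_pack(unsigned long ea, unsigned long flags)
	{
		return (ea & ~MAGIC_PAGE_FLAG_MASK) | (flags & MAGIC_PAGE_FLAG_MASK);
	}

	/* Receiving side (hypothetical): strip the flags to recover the address. */
	static unsigned long magic_page_addr(unsigned long arg)
	{
		return arg & ~MAGIC_PAGE_FLAG_MASK;
	}

	/* Receiving side (hypothetical): extract the flags the guest announced. */
	static unsigned long magic_page_flags(unsigned long arg)
	{
		return arg & MAGIC_PAGE_FLAG_MASK;
	}

Masking on both sides keeps the address and the flags independent, so adding a flag never changes the page the host maps.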

The following flags are currently available for a guest to expose:

  ============================= ==============================================================
  MAGIC_PAGE_FLAG_NOT_MAPPED_NX Guest handles NX bits correctly with respect to the magic page
  ============================= ==============================================================

MSR bits
========

The MSR contains bits that require hypervisor intervention and bits that do
not require direct hypervisor intervention because they only get interpreted
when entering the guest or don't have any impact on the hypervisor's behavior.

The following bits are safe to set inside the guest:

  - MSR_EE
  - MSR_RI

If any other MSR bit changes, still use mtmsr(d).

Patched instructions
====================

On 32-bit systems, the "ld" and "std" instructions are transformed into "lwz"
and "stw" instructions respectively, with an added offset of 4 to account for
big endianness (the low 32 bits of a 64-bit magic page field live at offset 4).

The following is a list of mappings the Linux kernel performs when running as
a guest. Implementing any of those mappings is optional, as the instruction
traps also act on the shared page, so executing privileged instructions still
works as before.

======================= ================================
From                    To
======================= ================================
mfmsr rX                ld rX, magic_page->msr
mfsprg rX, 0            ld rX, magic_page->sprg0
mfsprg rX, 1            ld rX, magic_page->sprg1
mfsprg rX, 2            ld rX, magic_page->sprg2
mfsprg rX, 3            ld rX, magic_page->sprg3
mfsrr0 rX               ld rX, magic_page->srr0
mfsrr1 rX               ld rX, magic_page->srr1
mfdar rX                ld rX, magic_page->dar
mfdsisr rX              lwz rX, magic_page->dsisr

mtmsr rX                std rX, magic_page->msr
mtsprg 0, rX            std rX, magic_page->sprg0
mtsprg 1, rX            std rX, magic_page->sprg1
mtsprg 2, rX            std rX, magic_page->sprg2
mtsprg 3, rX            std rX, magic_page->sprg3
mtsrr0 rX               std rX, magic_page->srr0
mtsrr1 rX               std rX, magic_page->srr1
mtdar rX                std rX, magic_page->dar
mtdsisr rX              stw rX, magic_page->dsisr

tlbsync                 nop

mtmsrd rX, 0            b <special mtmsr section>
mtmsr rX                b <special mtmsr section>

mtmsrd rX, 1            b <special mtmsrd section>

[Book3S only]
mtsrin rX, rY           b <special mtsrin section>

[BookE only]
wrteei [0|1]            b <special wrteei section>
======================= ================================

Some instructions require more logic to determine what's going on than a load
or store instruction can deliver. To enable patching of those, we keep some
RAM around into which we can translate instructions at run time. What happens
is the following:

  1) copy emulation code to memory
  2) patch that code to fit the emulated instruction
  3) patch that code to return to the original pc + 4
  4) patch the original instruction to branch to the new code

That way we can inject an arbitrary amount of code as a replacement for a
single instruction. This allows us, for example, to check for pending
interrupts when setting EE=1.

Hypercall ABIs in KVM on PowerPC
=================================

1) KVM hypercalls (ePAPR)

These are the ePAPR-compliant hypercalls described above. Even generic
hypercalls are implemented here, like the ePAPR idle hcall. These are
available on all targets.
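
The KVM hypercall token described earlier (the generic hypercall number ORed with the KVM vendor code, 42 << 16, and passed in r11) can be sketched as below. The value used for KVM_HC_PPC_MAP_MAGIC_PAGE is assumed and should be checked against include/uapi/linux/kvm_para.h; the helper names are hypothetical, and the sketch assumes hypercall numbers fit in the low 16 bits.

.. code-block:: c

	/* KVM vendor code from the "KVM hypercalls" section: 42 << 16. */
	#define KVM_VENDOR_ID			42UL

	/* Assumed hypercall number; check include/uapi/linux/kvm_para.h. */
	#define KVM_HC_PPC_MAP_MAGIC_PAGE	4UL

	/*
	 * Hypothetical helper: build the value loaded into r11 for a KVM
	 * hypercall, i.e. the generic hypercall number ORed with (42 << 16).
	 */
	static unsigned long kvm_hcall_token(unsigned long nr)
	{
		return (KVM_VENDOR_ID << 16) | nr;
	}

	/* Split a token back into its vendor code ... */
	static unsigned long hcall_vendor(unsigned long token)
	{
		return token >> 16;
	}

	/* ... and its hypercall number. */
	static unsigned long hcall_nr(unsigned long token)
	{
		return token & 0xffffUL;
	}

With these assumptions, kvm_hcall_token(KVM_HC_PPC_MAP_MAGIC_PAGE) evaluates to 0x2a0004.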

2) PAPR hypercalls

PAPR hypercalls are needed to run server PowerPC PAPR guests (-M pseries in
QEMU). These are the same hypercalls that pHyp, the POWER hypervisor,
implements. Some of them are handled in the kernel, some are handled in user
space. This is only available on book3s_64.

3) OSI hypercalls

Mac-on-Linux is another user of KVM on PowerPC; it has its own hypercall
interface, which predates KVM. KVM supports it to maintain compatibility. All
these hypercalls get forwarded to user space. This is only useful on
book3s_32, but can be used with book3s_64 as well.