.. SPDX-License-Identifier: GPL-2.0

=================================
The PPC KVM paravirtual interface
=================================

The basic execution principle by which KVM on PowerPC works is to run all guest
kernel space code with PR=1, which is user space. This way all privileged
instructions trap and can be emulated accordingly.

Unfortunately, that is also the downfall. Quite a few privileged instructions
needlessly return us to the hypervisor even though they could be handled
differently.

This is what the PPC PV interface helps with. It takes privileged instructions
and transforms them into unprivileged ones with some help from the hypervisor.
This cuts down virtualization costs by about 50% on some of my benchmarks.

The code for that interface can be found in arch/powerpc/kernel/kvm*

Querying for existence
======================

To find out if we're running on KVM or not, we leverage the device tree. When
Linux is running on KVM, a node /hypervisor exists. That node contains a
compatible property with the value "linux,kvm".
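
As a rough illustration of that check, the following C helper scans the raw
value of a compatible property for the "linux,kvm" entry. It assumes the usual
device tree encoding of a string list (NUL-terminated strings stored back to
back). The function name is made up for this sketch; a guest kernel would use
its own device tree accessors instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if a device tree "compatible" property value contains the
 * entry "linux,kvm". `prop` holds NUL-terminated strings stored back to
 * back and `len` is the total length of the value in bytes.
 * (Hypothetical helper, not part of the kernel.)
 */
static bool compatible_has_kvm(const char *prop, size_t len)
{
    static const char want[] = "linux,kvm";
    size_t pos = 0;

    while (pos < len) {
        /* Find the end of the current entry without running past `len`. */
        const char *end = memchr(prop + pos, '\0', len - pos);
        size_t n = end ? (size_t)(end - (prop + pos)) : len - pos;

        if (n == sizeof(want) - 1 && memcmp(prop + pos, want, n) == 0)
            return true;
        pos += n + 1; /* skip the entry and its NUL terminator */
    }
    return false;
}
```

A user space tool could feed this helper the bytes read from the /hypervisor
node's compatible property, for example through the procfs or sysfs view of the
device tree if the system exposes one.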

Once you have determined that you're running under a PV-capable KVM, you can
use hypercalls as described below.

KVM hypercalls
==============

Inside the device tree's /hypervisor node there's a property called
'hypercall-instructions'. This property contains at most 4 opcodes that make
up the hypercall. To issue a hypercall, execute these instructions.

The parameters are as follows:

        ========        ================        ================
        Register        IN                      OUT
        ========        ================        ================
        r0              -                       volatile
        r3              1st parameter           Return code
        r4              2nd parameter           1st output value
        r5              3rd parameter           2nd output value
        r6              4th parameter           3rd output value
        r7              5th parameter           4th output value
        r8              6th parameter           5th output value
        r9              7th parameter           6th output value
        r10             8th parameter           7th output value
        r11             hypercall number        8th output value
        r12             -                       volatile
        ========        ================        ================

Hypercall definitions are shared in generic code, so the same hypercall numbers
apply to x86 and powerpc alike, with the exception that each KVM hypercall
number also needs to be ORed with the KVM vendor code, which is (42 << 16).
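
The value that ends up in r11 can be sketched in C as follows. The macro and
function names are hypothetical and only serve this example; the vendor code 42
and the shift by 16 come from the paragraph above.

```c
#include <stdint.h>

/* Vendor code for KVM as given in the text (hypothetical macro name). */
#define KVM_VENDOR_ID 42u

/* Combine the vendor code with a generic KVM hypercall number. */
static inline uint32_t kvm_hcall_token(uint32_t nr)
{
    return (KVM_VENDOR_ID << 16) | nr;
}
```

For instance, a generic hypercall number of 4 would be passed in r11 as
0x2a0004.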

Return codes can be as follows:

        ====            =========================
        Code            Meaning
        ====            =========================
        0               Success
        12              Hypercall not implemented
        <0              Error
        ====            =========================
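
A caller might fold the table above into a small classifier like the one below.
This is only a sketch: the enum and function names are invented here, and
positive values other than 12 are reported separately because the table does
not describe them.

```c
/* Hypothetical classification of the value returned in r3. */
enum hcall_status {
    HCALL_OK,            /* 0 */
    HCALL_UNIMPLEMENTED, /* 12 */
    HCALL_ERROR,         /* any negative value */
    HCALL_OTHER          /* not covered by the table */
};

static enum hcall_status classify_hcall_return(long r3)
{
    if (r3 == 0)
        return HCALL_OK;
    if (r3 == 12)
        return HCALL_UNIMPLEMENTED;
    if (r3 < 0)
        return HCALL_ERROR;
    return HCALL_OTHER;
}
```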

The magic page
==============

To enable communication between the hypervisor and guest there is a new shared
page that contains parts of supervisor visible register state. The guest can
map this shared page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE.

With this hypercall issued the guest always gets the magic page mapped at the
desired location. The first parameter indicates the effective address when the
MMU is enabled. The second parameter indicates the address in real mode, if
applicable to the target. For now, we always map the page to -4096. This way we
can access it using absolute load and store instructions. The following
instruction reads the first field of the magic page::

        ld      rX, -4096(0)
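
The reason -4096 works without a base register is that a base register field of
0 in a load or store means the constant 0, so the effective address is just the
sign-extended 16-bit displacement. The following C sketch (function names are
made up for illustration) computes that address for 64-bit and 32-bit guests
and shows that it lands on the last 4 KiB page of the address space.

```c
#include <stdint.h>

/*
 * Effective address of an access written as "disp(0)": the base is the
 * constant 0, so only the sign-extended displacement remains.
 */
static uint64_t ea_base0_64(int16_t disp)
{
    return (uint64_t)(int64_t)disp;
}

static uint32_t ea_base0_32(int16_t disp)
{
    return (uint32_t)(int32_t)disp;
}
```

With a displacement of -4096 this yields 0xfffffffffffff000 on 64-bit and
0xfffff000 on 32-bit, so every field of the page is reachable with a
displacement between -4096 and -1.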

The interface is designed to be extensible should there be need later to add
additional registers to the magic page. If you add fields to the magic page,
also define a new hypercall feature to indicate that the host can give you more
registers. Only if the host supports the additional features, make use of them.

The magic page layout is described by struct kvm_vcpu_arch_shared
in arch/powerpc/include/uapi/asm/kvm_para.h.

Magic page features
===================

When mapping the magic page using the KVM hypercall KVM_HC_PPC_MAP_MAGIC_PAGE,
a second return value is passed to the guest. This second return value contains
a bitmap of available features inside the magic page.

The following enhancements to the magic page are currently available:

  ============================  =======================================
  KVM_MAGIC_FEAT_SR             Maps SR registers r/w in the magic page
  KVM_MAGIC_FEAT_MAS0_TO_SPRG7  Maps MASn, ESR, PIR and high SPRGs
  ============================  =======================================

For enhanced features in the magic page, please check for the existence of the
feature before using it!
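
Such a check could look roughly like this in C. The bit positions used for the
two feature macros are an assumption made for this sketch; the authoritative
definitions are the ones in the kernel's kvm_para.h. The helper name is
hypothetical as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define KVM_MAGIC_FEAT_SR            (1u << 0)
#define KVM_MAGIC_FEAT_MAS0_TO_SPRG7 (1u << 1)

/*
 * `features` is the bitmap returned as the second return value of the
 * map hypercall. Only use a feature when all of its bits are set.
 */
static bool magic_has_feature(uint32_t features, uint32_t feat)
{
    return (features & feat) == feat;
}
```

A guest would call magic_has_feature(features, KVM_MAGIC_FEAT_SR) before
patching any instruction that relies on the SR registers being present in the
page.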

Magic page flags
================

In addition to features that indicate whether a host is capable of a particular
feature we also have a channel for a guest to tell the host whether it's capable
of something. This is what we call "flags".

Flags are passed to the host in the low 12 bits of the Effective Address.

The following flags are currently available for a guest to expose:

  MAGIC_PAGE_FLAG_NOT_MAPPED_NX Guest handles NX bits correctly with respect to
  the magic page
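
Because the magic page is 4 KiB aligned, the low 12 bits of its address are
always zero and are free to carry flags. The C sketch below packs and unpacks
such a value. The helper names and the flag's bit position are assumptions made
for this example, not taken from the kernel sources.

```c
#include <stdint.h>

#define MAGIC_PAGE_FLAG_MASK          0xfffull   /* low 12 bits */
#define MAGIC_PAGE_FLAG_NOT_MAPPED_NX (1ull << 0) /* assumed bit position */

/* Combine a page-aligned effective address with guest flags. */
static uint64_t magic_page_pack(uint64_t page_ea, uint64_t flags)
{
    return (page_ea & ~MAGIC_PAGE_FLAG_MASK) | (flags & MAGIC_PAGE_FLAG_MASK);
}

/* Host side view: split the value back into address and flags. */
static uint64_t magic_page_ea(uint64_t packed)
{
    return packed & ~MAGIC_PAGE_FLAG_MASK;
}

static uint64_t magic_page_flags(uint64_t packed)
{
    return packed & MAGIC_PAGE_FLAG_MASK;
}
```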

MSR bits
========

The MSR contains bits that require hypervisor intervention and bits that do
not require direct hypervisor intervention because they only get interpreted
when entering the guest or don't have any impact on the hypervisor's behavior.

The following bits are safe to be set inside the guest:

  - MSR_EE
  - MSR_RI

If any other bit changes in the MSR, please still use mtmsr(d).
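
The rule above amounts to a simple mask test on the bits that differ between
the old and the new MSR value. A minimal C sketch, with a hypothetical function
name and the architectural bit values of MSR_EE and MSR_RI:

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_EE 0x8000ull /* external interrupt enable */
#define MSR_RI 0x0002ull /* recoverable interrupt */

/*
 * Return true if going from old_msr to new_msr only touches bits that the
 * guest may change by itself; otherwise a real mtmsr(d) is still required.
 */
static bool msr_change_is_guest_safe(uint64_t old_msr, uint64_t new_msr)
{
    uint64_t changed = old_msr ^ new_msr;

    return (changed & ~(MSR_EE | MSR_RI)) == 0;
}
```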

Patched instructions
====================

The "ld" and "std" instructions are transformed to "lwz" and "stw" instructions
respectively on 32-bit systems with an added offset of 4 to account for big
endianness.
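
The offset of 4 follows from the byte order: in a big-endian 64-bit field the
low 32 bits, which are all a 32-bit guest needs, occupy bytes 4 to 7. The
following C sketch models a field as a byte array to show this; all function
names are invented for the example.

```c
#include <stdint.h>

/* Store a 64-bit value in big-endian byte order. */
static void store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Load a 32-bit value in big-endian byte order. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* What the patched "lwz rX, field + 4" sees on a 32-bit guest. */
static uint32_t read_field_low32(const uint8_t *field)
{
    return load_be32(field + 4);
}
```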

The following is a list of mappings the Linux kernel performs when running as
guest. Implementing any of those mappings is optional, as the instruction traps
also act on the shared page. So calling privileged instructions still works as
before.

======================= ================================
From                    To
======================= ================================
mfmsr   rX              ld      rX, magic_page->msr
mfsprg  rX, 0           ld      rX, magic_page->sprg0
mfsprg  rX, 1           ld      rX, magic_page->sprg1
mfsprg  rX, 2           ld      rX, magic_page->sprg2
mfsprg  rX, 3           ld      rX, magic_page->sprg3
mfsrr0  rX              ld      rX, magic_page->srr0
mfsrr1  rX              ld      rX, magic_page->srr1
mfdar   rX              ld      rX, magic_page->dar
mfdsisr rX              lwz     rX, magic_page->dsisr

mtmsr   rX              std     rX, magic_page->msr
mtsprg  0, rX           std     rX, magic_page->sprg0
mtsprg  1, rX           std     rX, magic_page->sprg1
mtsprg  2, rX           std     rX, magic_page->sprg2
mtsprg  3, rX           std     rX, magic_page->sprg3
mtsrr0  rX              std     rX, magic_page->srr0
mtsrr1  rX              std     rX, magic_page->srr1
mtdar   rX              std     rX, magic_page->dar
mtdsisr rX              stw     rX, magic_page->dsisr

tlbsync                 nop

mtmsrd  rX, 0           b       <special mtmsr section>
mtmsr   rX              b       <special mtmsr section>

mtmsrd  rX, 1           b       <special mtmsrd section>

[Book3S only]
mtsrin  rX, rY          b       <special mtsrin section>

[BookE only]
wrteei  [0|1]           b       <special wrteei section>
======================= ================================
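
To make the first row of the table concrete, here is a simplified C sketch of
how an "mfmsr rX" opcode could be rewritten into "ld rX, offset(0)" pointing
into the magic page at -4096. The function names are hypothetical and the
field offset is passed in by the caller rather than assumed; the kernel's real
patching code lives in arch/powerpc/kernel/kvm* and handles many more cases.

```c
#include <stdint.h>

#define INST_MFMSR      0x7c0000a6u /* mfmsr with the target register cleared */
#define INST_RT_MASK    0x03e00000u /* target register field */
#define INST_LD         0xe8000000u /* ld rt, ds(ra) with rt = ra = ds = 0 */
#define MAGIC_PAGE_BASE (-4096)

/* Build "ld rt, (MAGIC_PAGE_BASE + field_off)(0)". */
static uint32_t make_magic_ld(uint32_t rt, uint32_t field_off)
{
    /* ld is DS-form: the two low displacement bits must stay zero. */
    uint32_t disp = (uint32_t)(MAGIC_PAGE_BASE + (int32_t)field_off) & 0xfffcu;

    return INST_LD | (rt << 21) | disp;
}

/*
 * If *inst is "mfmsr rX", replace it with a load of the MSR field located
 * field_off bytes into the magic page. Returns 1 if the opcode was patched.
 */
static int patch_mfmsr(uint32_t *inst, uint32_t field_off)
{
    if ((*inst & ~INST_RT_MASK) != INST_MFMSR)
        return 0;
    *inst = make_magic_ld((*inst & INST_RT_MASK) >> 21, field_off);
    return 1;
}
```

The other load and store rows follow the same pattern with a different source
opcode, target opcode and field offset.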

Some instructions require more logic to determine what's going on than a load
or store instruction can deliver. To enable patching of those, we keep some
RAM around into which we can live-translate instructions. What happens is the
following:

        1) copy emulation code to memory
        2) patch that code to fit the emulated instruction
        3) patch that code to return to the original pc + 4
        4) patch the original instruction to branch to the new code

That way we can inject an arbitrary amount of code as replacement for a single
instruction. This allows us to check for pending interrupts when setting EE=1
for example.
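
The four steps can be sketched in C on a toy model where memory is an array of
32-bit words and addresses are word indices times 4. Everything here is an
illustration under that assumption: the function names, the template layout
(one slot that receives the register number, last word reserved for the branch
back) and the omission of instruction cache maintenance, which real code would
still need, are simplifications for this example.

```c
#include <stdint.h>
#include <string.h>

#define INST_B      0x48000000u /* relative branch, AA=0, LK=0 */
#define INST_B_MASK 0x03fffffcu /* branch offset field */

/* Encode "b" from byte address `from` to byte address `to`. */
static uint32_t make_branch(uint32_t from, uint32_t to)
{
    return INST_B | ((to - from) & INST_B_MASK);
}

/*
 * Redirect the instruction at word index `pc` to a copy of `tmpl` placed at
 * word index `dst`. `reg_slot` is the template word that takes the register
 * number of the original instruction.
 */
static void patch_with_trampoline(uint32_t *mem, uint32_t pc, uint32_t dst,
                                  const uint32_t *tmpl, uint32_t len,
                                  uint32_t reg_slot)
{
    uint32_t rt = (mem[pc] >> 21) & 0x1f;

    /* 1) copy emulation code to memory */
    memcpy(&mem[dst], tmpl, len * sizeof(*tmpl));
    /* 2) patch that code to fit the emulated instruction */
    mem[dst + reg_slot] |= rt << 21;
    /* 3) patch that code to return to the original pc + 4 */
    mem[dst + len - 1] = make_branch((dst + len - 1) * 4, (pc + 1) * 4);
    /* 4) patch the original instruction to branch to the new code */
    mem[pc] = make_branch(pc * 4, dst * 4);
}
```

The order matters: the original instruction is only redirected once the
replacement code is complete, so the guest never branches into a half-written
sequence.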

Hypercall ABIs in KVM on PowerPC
=================================

1) KVM hypercalls (ePAPR)

This is the ePAPR-compliant hypercall implementation (described above). Even
generic hypercalls are implemented here, like the ePAPR idle hcall. These are
available on all targets.

2) PAPR hypercalls

PAPR hypercalls are needed to run server PowerPC PAPR guests (-M pseries in
QEMU). These are the same hypercalls that pHyp, the POWER hypervisor,
implements. Some of them are handled in the kernel, some are handled in user
space. This is only available on book3s_64.

3) OSI hypercalls

Mac-on-Linux is another user of KVM on PowerPC, which has had its own hypercall
interface since long before KVM. This is supported to maintain compatibility.
All these hypercalls get forwarded to user space. This is only useful on
book3s_32, but can be used with book3s_64 as well.