.. SPDX-License-Identifier: GPL-2.0

===============================================
RISC-V Kernel Boot Requirements and Constraints
===============================================

:Author: Alexandre Ghiti <alexghiti@rivosinc.com>
:Date: 23 May 2023

This document describes what the RISC-V kernel expects from bootloaders and
firmware, and also the constraints that any developer must keep in mind when
touching the early boot process. For the purposes of this document, the
``early boot process`` refers to any code that runs before the final virtual
mapping is set up.

Pre-kernel Requirements and Constraints
=======================================

The RISC-V kernel expects the following of bootloaders and platform firmware:

Register state
--------------

The RISC-V kernel expects:

  * ``$a0`` to contain the hartid of the current core.
  * ``$a1`` to contain the address of the devicetree in memory.

CSR state
---------

The RISC-V kernel expects:

  * ``$satp = 0``: the MMU, if present, must be disabled.
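
The register and CSR expectations above can be illustrated with a small
host-side check. This is only a sketch: ``struct boot_state`` and
``boot_state_ok()`` are hypothetical names that do not exist in the kernel, and
the snapshot is assumed to have been captured by some earlier boot stage. The
only external fact relied upon is that a flattened devicetree blob starts with
the big-endian magic ``0xd00dfeed``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Magic number stored big-endian at offset 0 of a flattened devicetree. */
#define FDT_MAGIC 0xd00dfeedu

/* Hypothetical snapshot of the state handed over by the previous stage. */
struct boot_state {
	unsigned long hartid;	/* value of a0 */
	const uint8_t *dtb;	/* value of a1 */
	unsigned long satp;	/* value of the satp CSR */
};

/* Read a 32-bit big-endian value, independent of host endianness. */
static uint32_t read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return true if the snapshot matches the expectations listed above. */
static bool boot_state_ok(const struct boot_state *s)
{
	assert(s != NULL);

	if (s->satp != 0)	/* the MMU must be disabled */
		return false;
	if (s->dtb == NULL)	/* a1 must point to a devicetree */
		return false;
	return read_be32(s->dtb) == FDT_MAGIC;
}
```

Any hartid value is accepted here, since the text only requires ``$a0`` to hold
the hartid of the current core.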

Reserved memory for resident firmware
-------------------------------------

The RISC-V kernel must not map any resident memory, or memory protected with
PMPs, in the direct mapping, so the firmware must correctly mark those regions
as per the devicetree specification and/or the UEFI specification.

Kernel location
---------------

The RISC-V kernel expects to be placed at a PMD boundary (2MB aligned for rv64
and 4MB aligned for rv32). Note that the EFI stub will physically relocate the
kernel if that's not the case.
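
The alignment rule can be expressed as a short check that a bootloader might
run before jumping to the kernel. This is a minimal sketch: the function names
are hypothetical, and rounding up to the next boundary is shown only to
illustrate the arithmetic, not to describe how the EFI stub chooses a new
location.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_2M (UINT64_C(2) << 20)
#define SZ_4M (UINT64_C(4) << 20)

/* Required load alignment for a given XLEN (32 or 64), per the text above. */
static uint64_t kernel_load_align(unsigned int xlen)
{
	assert(xlen == 32 || xlen == 64);
	return xlen == 64 ? SZ_2M : SZ_4M;
}

/* Return true if the physical address sits on a suitable boundary. */
static bool kernel_load_addr_ok(uint64_t pa, unsigned int xlen)
{
	return (pa & (kernel_load_align(xlen) - 1)) == 0;
}

/* Round a physical address up to the next suitable boundary. */
static uint64_t kernel_load_addr_round_up(uint64_t pa, unsigned int xlen)
{
	uint64_t align = kernel_load_align(xlen);

	return (pa + align - 1) & ~(align - 1);
}
```

For example, ``0x80200000`` is acceptable for rv64 but not for rv32, where the
next suitable address is ``0x80400000``.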

Hardware description
--------------------

The firmware can pass either a devicetree or ACPI tables to the RISC-V kernel.

The devicetree is either passed directly to the kernel from the previous stage
using the ``$a1`` register, or when booting with UEFI, it can be passed using the
EFI configuration table.

The ACPI tables are passed to the kernel using the EFI configuration table. In
this case, a tiny devicetree is still created by the EFI stub. Please refer to
the "EFI stub and devicetree" section below for details about this devicetree.

Kernel entry
------------

On SMP systems, there are two methods to enter the kernel:

- ``RISCV_BOOT_SPINWAIT``: the firmware releases all harts into the kernel; one
  hart wins a lottery and executes the early boot code while the other harts
  are parked, waiting for the initialization to finish. This method is mostly
  used to support older firmware without the SBI HSM extension and M-mode
  RISC-V kernels.
- ``Ordered booting``: the firmware releases only one hart, which executes the
  initialization phase and then starts all other harts using the SBI HSM
  extension. Ordered booting is the preferred method for booting the RISC-V
  kernel because it can support CPU hotplug and kexec.
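
The lottery used by the spinwait method can be sketched with a single atomic
counter: every hart increments it, and only the hart that observes the initial
value proceeds with the early boot code. The code below is an illustration in
portable C11 under that assumption; the names are hypothetical and it does not
reproduce the kernel's actual entry code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Shared counter, zero-initialized; hypothetical name. */
static atomic_int hart_lottery_sketch;

/*
 * Called once by every hart released by the firmware. Exactly one caller
 * sees the initial value 0 and therefore wins; all others must park.
 */
static bool hart_wins_lottery(void)
{
	return atomic_fetch_add(&hart_lottery_sketch, 1) == 0;
}
```

A losing hart would then spin until the winning hart signals that the
initialization has finished.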

UEFI
----

UEFI memory map
~~~~~~~~~~~~~~~

When booting with UEFI, the RISC-V kernel will use only the EFI memory map to
populate the system memory.

The UEFI firmware must parse the subnodes of the ``/reserved-memory`` devicetree
node and abide by the devicetree specification to convert the attributes of
those subnodes (``no-map`` and ``reusable``) into their correct EFI equivalent
(refer to section "3.5.4 /reserved-memory and UEFI" of the devicetree
specification v0.4-rc1).
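
As an illustration of that conversion, the sketch below maps a static reserved
region to an EFI memory type. It assumes the rule in the referenced section is
that ``no-map`` regions are reported as ``EfiReservedMemoryType`` and all other
static reserved regions as ``EfiBootServicesData``; check the specification
before relying on it. The enum and function names are hypothetical, and the
``reusable`` attribute is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of EFI_MEMORY_TYPE, with the values assigned by the UEFI spec. */
enum efi_memory_type_sketch {
	EFI_RESERVED_MEMORY_TYPE = 0,
	EFI_BOOT_SERVICES_DATA = 4,
};

/*
 * Pick the EFI memory type for a static /reserved-memory subnode, assuming
 * only the presence or absence of the no-map property matters.
 */
static enum efi_memory_type_sketch reserved_region_efi_type(bool no_map)
{
	return no_map ? EFI_RESERVED_MEMORY_TYPE : EFI_BOOT_SERVICES_DATA;
}
```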

RISCV_EFI_BOOT_PROTOCOL
~~~~~~~~~~~~~~~~~~~~~~~

When booting with UEFI, the EFI stub requires the boot hartid in order to pass
it to the RISC-V kernel in ``$a0``. The EFI stub retrieves the boot hartid using
one of the following methods:

- ``RISCV_EFI_BOOT_PROTOCOL`` (**preferred**).
- ``boot-hartid`` property of the ``/chosen`` devicetree node (**deprecated**).

Any new firmware must implement ``RISCV_EFI_BOOT_PROTOCOL``, as the
devicetree-based approach is now deprecated.
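
The choice between the two sources can be sketched as a simple fallback. The
callback type and all function names below are hypothetical stand-ins, not the
EFI stub's API, and trying the preferred source first is an assumption of this
sketch rather than a statement about the order the stub actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical source: returns true and fills *hartid on success. */
typedef bool (*hartid_source_fn)(unsigned long *hartid);

/* Try the protocol first, then fall back to the devicetree. */
static bool sketch_get_boot_hartid(hartid_source_fn from_protocol,
				   hartid_source_fn from_devicetree,
				   unsigned long *hartid)
{
	assert(hartid != NULL);

	if (from_protocol != NULL && from_protocol(hartid))
		return true;
	if (from_devicetree != NULL && from_devicetree(hartid))
		return true;
	return false;
}

/* Example sources, for illustration only. */
static bool protocol_reports_hart_3(unsigned long *hartid)
{
	*hartid = 3;
	return true;
}

static bool devicetree_reports_hart_1(unsigned long *hartid)
{
	*hartid = 1;
	return true;
}

static bool source_unavailable(unsigned long *hartid)
{
	(void)hartid;
	return false;
}
```

With both example sources available, the value reported by the protocol is
used; the devicetree value is only consulted when the protocol source fails or
is absent.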

Early Boot Requirements and Constraints
=======================================

The RISC-V kernel's early boot process operates under the following constraints:

EFI stub and devicetree
-----------------------

When booting with UEFI, the devicetree is supplemented (or created) by the EFI
stub with the same parameters as on arm64, which are described in the
"UEFI kernel support on ARM" section of Documentation/arch/arm/uefi.rst.

Virtual mapping installation
----------------------------

The installation of the virtual mapping is done in two steps in the RISC-V
kernel:

1. ``setup_vm()`` installs a temporary kernel mapping in ``early_pg_dir`` which
   allows discovery of the system memory. Only the kernel text/data are mapped
   at this point. When establishing this mapping, no allocation can be done
   (since the system memory is not known yet), so the ``early_pg_dir`` page
   table is statically allocated (using only one table for each level).

2. ``setup_vm_final()`` creates the final kernel mapping in ``swapper_pg_dir``
   and takes advantage of the discovered system memory to create the linear
   mapping. When establishing this mapping, the kernel can allocate memory but
   cannot access it directly (since the direct mapping is not present yet), so
   it uses temporary mappings in the fixmap region to be able to access the
   newly allocated page table levels.

For ``virt_to_phys()`` and ``phys_to_virt()`` to be able to correctly convert
direct mapping addresses to physical addresses, they need to know the start of
the DRAM. This happens after step 1, right before step 2 installs the direct
mapping (see the ``setup_bootmem()`` function in arch/riscv/mm/init.c). Any
usage of those macros before the final virtual mapping is installed must be
carefully examined.
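
The dependency on the start of the DRAM follows from the arithmetic of a linear
mapping, which the sketch below spells out. The structure, the function names
and the example addresses are hypothetical and cover direct mapping addresses
only; they illustrate the arithmetic rather than the kernel's actual macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a linear (direct) mapping. */
struct linear_map_sketch {
	uint64_t page_offset;	/* virtual start of the direct mapping */
	uint64_t phys_ram_base;	/* physical start of DRAM, once discovered */
};

/* Convert a direct mapping virtual address to a physical address. */
static uint64_t sketch_virt_to_phys(const struct linear_map_sketch *m,
				    uint64_t va)
{
	assert(va >= m->page_offset);
	return va - m->page_offset + m->phys_ram_base;
}

/* Convert a physical address to its direct mapping virtual address. */
static uint64_t sketch_phys_to_virt(const struct linear_map_sketch *m,
				    uint64_t pa)
{
	assert(pa >= m->phys_ram_base);
	return pa - m->phys_ram_base + m->page_offset;
}
```

If ``phys_ram_base`` still holds a placeholder value such as 0 when a
conversion is attempted, the result is off by the real DRAM start, which is why
early uses of these conversions deserve scrutiny.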

Devicetree mapping via fixmap
-----------------------------

As the ``reserved_mem`` array is initialized with virtual addresses established
by ``setup_vm()``, and used with the mapping established by
``setup_vm_final()``, the RISC-V kernel uses the fixmap region to map the
devicetree. This ensures that the devicetree remains accessible through both
virtual mappings.

Pre-MMU execution
-----------------

A few pieces of code need to run before even the first virtual mapping is
established. These are the installation of the first virtual mapping itself,
patching of early alternatives and the early parsing of the kernel command line.
That code must be compiled very carefully, under the following constraints:

- ``-fno-pie``: This is needed for relocatable kernels which use ``-fPIE``,
  since otherwise, any access to a global symbol would go through the GOT which
  is only relocated virtually.
- ``-mcmodel=medany``: Any access to a global symbol must be PC-relative to
  avoid any relocation happening before the MMU is set up.
- *all* instrumentation must also be disabled (that includes KASAN, ftrace and
  others).

Because using a symbol from a different compilation unit requires that unit to
be compiled with those flags as well, we advise avoiding external symbols as
much as possible.