.. SPDX-License-Identifier: GPL-2.0

===============================================
RISC-V Kernel Boot Requirements and Constraints
===============================================

:Author: Alexandre Ghiti <alexghiti@rivosinc.com>
:Date: 23 May 2023

This document describes what the RISC-V kernel expects from bootloaders and
firmware, and also the constraints that any developer must have in mind when
touching the early boot process. For the purposes of this document, the
``early boot process`` refers to any code that runs before the final virtual
mapping is set up.

Pre-kernel Requirements and Constraints
=======================================

The RISC-V kernel expects the following of bootloaders and platform firmware:

Register state
--------------

The RISC-V kernel expects:

  * ``$a0`` to contain the hartid of the current core.
  * ``$a1`` to contain the address of the devicetree in memory.

CSR state
---------

The RISC-V kernel expects:

  * ``$satp = 0``: the MMU, if present, must be disabled.

Reserved memory for resident firmware
-------------------------------------

The RISC-V kernel must not map any resident memory, or memory protected with
PMPs, in the direct mapping, so the firmware must correctly mark those regions
as per the devicetree specification and/or the UEFI specification.

Kernel location
---------------

The RISC-V kernel expects to be placed at a PMD boundary (2MB aligned for rv64
and 4MB aligned for rv32). Note that the EFI stub will physically relocate the
kernel if that's not the case.

Hardware description
--------------------

The firmware can pass either a devicetree or ACPI tables to the RISC-V kernel.

The devicetree is either passed directly to the kernel from the previous stage
using the ``$a1`` register, or when booting with UEFI, it can be passed using the
EFI configuration table.
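
Two of the requirements above can be checked mechanically by a boot stage
before it jumps to the kernel: ``satp`` must be cleared and the kernel image
must sit on a PMD boundary. The following is only an illustrative sketch of
such a check. ``struct handoff_state`` and ``handoff_state_ok()`` are
hypothetical names that do not exist in the kernel or in any firmware, and the
PMD sizes are the ones quoted in "Kernel location".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PMD sizes quoted in "Kernel location": 2MB on rv64, 4MB on rv32. */
#define SKETCH_PMD_SIZE_RV64 (UINT64_C(2) << 20)
#define SKETCH_PMD_SIZE_RV32 (UINT64_C(4) << 20)

/* Hypothetical snapshot of the state a boot stage is about to hand over. */
struct handoff_state {
    uint64_t satp;      /* value of the satp CSR at kernel entry */
    uint64_t load_addr; /* physical address where the kernel image was placed */
    bool is_rv64;       /* true for rv64, false for rv32 */
};

/* Return true if addr is a multiple of align (align must be a power of two). */
static bool is_aligned(uint64_t addr, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Check the MMU-disabled and PMD-alignment requirements. */
bool handoff_state_ok(const struct handoff_state *s)
{
    uint64_t pmd = s->is_rv64 ? SKETCH_PMD_SIZE_RV64 : SKETCH_PMD_SIZE_RV32;

    return s->satp == 0 && is_aligned(s->load_addr, pmd);
}
```

For example, a load address of ``0x80200000`` passes the rv64 check but fails
the rv32 one, since it is 2MB aligned but not 4MB aligned.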

The ACPI tables are passed to the kernel using the EFI configuration table. In
this case, a tiny devicetree is still created by the EFI stub. Please refer to
the "EFI stub and devicetree" section below for details about this devicetree.

Kernel entry
------------

On SMP systems, there are 2 methods to enter the kernel:

- ``RISCV_BOOT_SPINWAIT``: the firmware releases all harts in the kernel, one hart
  wins a lottery and executes the early boot code while the other harts are
  parked waiting for the initialization to finish. This method is mostly used to
  support older firmware without the SBI HSM extension and M-mode RISC-V kernels.
- ``Ordered booting``: the firmware releases only one hart that will execute the
  initialization phase and then will start all other harts using the SBI HSM
  extension. The ordered booting method is the preferred booting method for
  booting the RISC-V kernel because it can support CPU hotplug and kexec.

UEFI
----

UEFI memory map
~~~~~~~~~~~~~~~

When booting with UEFI, the RISC-V kernel will use only the EFI memory map to
populate the system memory.

The UEFI firmware must parse the subnodes of the ``/reserved-memory`` devicetree
node and abide by the devicetree specification to convert the attributes of
those subnodes (``no-map`` and ``reusable``) into their correct EFI equivalent
(refer to section "3.5.4 /reserved-memory and UEFI" of the devicetree
specification v0.4-rc1).

RISCV_EFI_BOOT_PROTOCOL
~~~~~~~~~~~~~~~~~~~~~~~

When booting with UEFI, the EFI stub requires the boot hartid in order to pass
it to the RISC-V kernel in ``$a0``. The EFI stub retrieves the boot hartid using
one of the following methods:

- ``RISCV_EFI_BOOT_PROTOCOL`` (**preferred**).
- ``boot-hartid`` devicetree subnode (**deprecated**).

Any new firmware must implement ``RISCV_EFI_BOOT_PROTOCOL`` as the devicetree
based approach is now deprecated.
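
The preference between the two boot hartid sources can be sketched as a simple
fallback. This is an assumption-laden illustration, not the EFI stub code:
``struct hartid_sources``, ``enum hartid_origin`` and ``pick_boot_hartid()``
are hypothetical names, and the sketch does not model the actual
``RISCV_EFI_BOOT_PROTOCOL`` interface or any devicetree parsing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of what each source reported. */
struct hartid_sources {
    bool protocol_present;    /* RISCV_EFI_BOOT_PROTOCOL answered */
    uint64_t protocol_hartid;
    bool dt_present;          /* deprecated boot-hartid devicetree value found */
    uint64_t dt_hartid;
};

enum hartid_origin {
    HARTID_NONE,          /* no source available: the boot cannot proceed */
    HARTID_FROM_PROTOCOL, /* preferred source */
    HARTID_FROM_DT,       /* deprecated fallback */
};

/* Prefer the protocol, fall back to the devicetree, else report failure. */
enum hartid_origin pick_boot_hartid(const struct hartid_sources *src,
                                    uint64_t *hartid)
{
    assert(src != NULL && hartid != NULL);

    if (src->protocol_present) {
        *hartid = src->protocol_hartid;
        return HARTID_FROM_PROTOCOL;
    }
    if (src->dt_present) {
        *hartid = src->dt_hartid;
        return HARTID_FROM_DT;
    }
    return HARTID_NONE; /* *hartid is left untouched */
}
```

When both sources are available, the value from the protocol is used and the
devicetree value is ignored.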

Early Boot Requirements and Constraints
=======================================

The RISC-V kernel's early boot process operates under the following constraints:

EFI stub and devicetree
-----------------------

When booting with UEFI, the devicetree is supplemented (or created) by the EFI
stub with the same parameters as arm64, which are described in the paragraph
"UEFI kernel support on ARM" in Documentation/arch/arm/uefi.rst.

Virtual mapping installation
----------------------------

The installation of the virtual mapping is done in 2 steps in the RISC-V kernel:

1. ``setup_vm()`` installs a temporary kernel mapping in ``early_pg_dir`` which
   allows discovery of the system memory. Only the kernel text/data are mapped
   at this point. When establishing this mapping, no allocation can be done
   (since the system memory is not known yet), so the ``early_pg_dir`` page
   table is statically allocated (using only one table for each level).

2. ``setup_vm_final()`` creates the final kernel mapping in ``swapper_pg_dir``
   and takes advantage of the discovered system memory to create the linear
   mapping. When establishing this mapping, the kernel can allocate memory but
   cannot access it directly (since the direct mapping is not present yet), so
   it uses temporary mappings in the fixmap region to be able to access the
   newly allocated page table levels.

For ``virt_to_phys()`` and ``phys_to_virt()`` to be able to correctly convert
direct mapping addresses to physical addresses, they need to know the start of
the DRAM. This happens after step 1, right before step 2 installs the direct
mapping (see the ``setup_bootmem()`` function in arch/riscv/mm/init.c). Any
usage of those macros before the final virtual mapping is installed must be
carefully examined.
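
The dependency of these conversions on the start of the DRAM can be seen in a
simplified model where the direct mapping is a single constant offset. This is
only a sketch under that assumption and not the kernel's implementation:
``struct linear_map``, ``sketch_virt_to_phys()`` and ``sketch_phys_to_virt()``
are hypothetical names, and the addresses used with them are arbitrary example
values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the direct (linear) mapping as one constant offset. */
struct linear_map {
    uint64_t page_offset;   /* virtual address where the direct mapping starts */
    uint64_t phys_ram_base; /* start of the DRAM, unknown until it is discovered */
};

/* Convert a direct mapping virtual address to a physical address. */
uint64_t sketch_virt_to_phys(const struct linear_map *m, uint64_t va)
{
    assert(m != NULL && va >= m->page_offset);
    return va - m->page_offset + m->phys_ram_base;
}

/* Convert a physical address to its direct mapping virtual address. */
uint64_t sketch_phys_to_virt(const struct linear_map *m, uint64_t pa)
{
    assert(m != NULL && pa >= m->phys_ram_base);
    return pa - m->phys_ram_base + m->page_offset;
}
```

In this model, if ``phys_ram_base`` still holds a placeholder such as 0 while
the DRAM actually starts at ``0x80000000``, every converted address is off by
``0x80000000``, which is why early users of these conversions need care.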

Devicetree mapping via fixmap
-----------------------------

As the ``reserved_mem`` array is initialized with virtual addresses established
by ``setup_vm()``, and used with the mapping established by
``setup_vm_final()``, the RISC-V kernel uses the fixmap region to map the
devicetree. This ensures that the devicetree remains accessible by both virtual
mappings.

Pre-MMU execution
-----------------

A few pieces of code need to run before even the first virtual mapping is
established. These are the installation of the first virtual mapping itself,
patching of early alternatives and the early parsing of the kernel command line.
That code must be compiled very carefully, with the following constraints:

- ``-fno-pie``: This is needed for relocatable kernels which use ``-fPIE``,
  since otherwise, any access to a global symbol would go through the GOT which
  is only relocated virtually.
- ``-mcmodel=medany``: Any access to a global symbol must be PC-relative so
  that no relocation is needed before the MMU is set up.
- *all* instrumentation must also be disabled (that includes KASAN, ftrace and
  others).

As using a symbol from a different compilation unit requires this unit to be
compiled with those flags, we advise, as much as possible, not to use external
symbols.
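
The reason PC-relative accesses work before the MMU is enabled can be shown
with a small address calculation. The sketch below is a simplified model and
not kernel code: ``struct image_layout``, ``absolute_ref()`` and
``pc_relative_ref()`` are hypothetical names, and the model assumes the whole
image is loaded contiguously, so that the distance between an instruction and
a symbol is the same at link time and at run time.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of where an image was linked and where it runs. */
struct image_layout {
    uint64_t link_base; /* virtual address the image was linked at */
    uint64_t load_base; /* physical address it executes from before the MMU is on */
    uint64_t size;      /* size of the image in bytes */
};

/* Address produced by an absolute reference: the link-time virtual address. */
uint64_t absolute_ref(const struct image_layout *img, uint64_t sym_off)
{
    assert(sym_off < img->size);
    return img->link_base + sym_off;
}

/* Address produced by a PC-relative reference executed at image offset pc_off. */
uint64_t pc_relative_ref(const struct image_layout *img, uint64_t pc_off,
                         uint64_t sym_off)
{
    assert(pc_off < img->size && sym_off < img->size);

    uint64_t pc_runtime = img->load_base + pc_off; /* where the code really runs */
    uint64_t delta = sym_off - pc_off;             /* fixed at link time, may wrap */

    return pc_runtime + delta; /* unsigned wrap-around handles negative distances */
}
```

With an image linked at ``0xffffffff80000000`` and running from ``0x80200000``
(both arbitrary example values), a PC-relative reference to the symbol at
offset ``0x2000`` yields ``0x80202000``, which is valid physical memory, while
an absolute reference yields ``0xffffffff80002000``, which is not mapped yet.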