.. SPDX-License-Identifier: GPL-2.0

===================================
Using AutoFDO with the Linux kernel
===================================

This enables AutoFDO build support for the kernel when using
the Clang compiler. AutoFDO (Auto-Feedback-Directed Optimization)
is a type of profile-guided optimization (PGO) used to enhance the
performance of binary executables. It gathers information about the
frequency of execution of various code paths within a binary using
hardware sampling. This data is then used to guide the compiler's
optimization decisions, resulting in a more efficient binary. AutoFDO
is a powerful optimization technique, and data indicates that it can
significantly improve kernel performance. It's especially beneficial
for workloads affected by front-end stalls.

For AutoFDO builds, unlike non-FDO builds, the user must supply a
profile. Acquiring an AutoFDO profile can be done in several ways.
AutoFDO profiles are created by converting hardware samples collected
with the "perf" tool. It is crucial that the workload used to create
these perf files is representative; it must exhibit runtime
characteristics similar to the workloads that are intended to be
optimized. Failure to do so will result in the compiler optimizing
for the wrong objective.

The AutoFDO profile often encapsulates the program's behavior. If the
performance-critical code is architecture-independent, the profile
can be applied across platforms to achieve performance gains. For
instance, using the profile generated on Intel architecture to build
a kernel for AMD architecture can also yield performance improvements.

There are two methods for acquiring a representative profile:

(1) Sample real workloads using a production environment.
(2) Generate the profile using a representative load test.


When enabling the AutoFDO build configuration without providing an
AutoFDO profile, the compiler only modifies the DWARF information in
the kernel without impacting runtime performance. It's advisable to
use a kernel binary built with the same AutoFDO configuration to
collect the perf profile. While it's possible to use a kernel built
with different options, it may result in inferior performance.

Profiles can also be collected from an AutoFDO build of a previous
kernel. AutoFDO employs relative line numbers to match the profiles,
offering some tolerance for source changes. This mode is commonly used
in a production environment for profile collection.

In a profile collection based on a load test, the AutoFDO collection
process consists of the following steps:

#. Initial build: The kernel is built with AutoFDO options
   without a profile.

#. Profiling: The above kernel is then run with a representative
   workload to gather execution frequency data. This data is
   collected using hardware sampling, via perf. AutoFDO is most
   effective on platforms supporting advanced PMU features like
   LBR on Intel machines.

#. AutoFDO profile generation: The perf output file is converted to
   the AutoFDO profile via offline tools.

AutoFDO support requires Clang from LLVM 17 or later.
Currently supported architectures include x86/x86_64 (via LBR) and
arm64 (via SPE or ETM).


Preparation
===========

Configure the kernel with::

     CONFIG_AUTOFDO_CLANG=y

Customization
=============

The default CONFIG_AUTOFDO_CLANG setting covers kernel space objects for
AutoFDO builds. One can, however, enable or disable AutoFDO build for
individual files and directories by adding a line similar to the following
to the respective kernel Makefile:

- For enabling a single file (e.g.
  foo.o) ::

     AUTOFDO_PROFILE_foo.o := y

- For enabling all files in one directory ::

     AUTOFDO_PROFILE := y

- For disabling one file ::

     AUTOFDO_PROFILE_foo.o := n

- For disabling all files in one directory ::

     AUTOFDO_PROFILE := n

Workflow
========

Here is an example workflow for an AutoFDO kernel:

1) Build the kernel on the host machine with LLVM enabled,
   for example, ::

      $ make menuconfig LLVM=1

   Turn on AutoFDO build config::

      CONFIG_AUTOFDO_CLANG=y

   With a configuration that has LLVM enabled, use the following command::

      $ scripts/config -e AUTOFDO_CLANG

   After getting the config, build with ::

      $ make LLVM=1

2) Install the kernel on the test machine.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number, like 500009,
   for this purpose.

   - For Intel platforms::

       $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   - For AMD platforms:

     The supported systems are: Zen3 with BRS, or Zen4 with amd_lbr_v2. To check:

     For Zen3::

       $ cat /proc/cpuinfo | grep " brs"

     For Zen4::

       $ cat /proc/cpuinfo | grep amd_lbr_v2

     The following command generates the perf data file::

       $ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   - For arm64 with SPE:

     There are a few kernel features that must be enabled to collect SPE profiles on Arm.
     Below is a list of the required features:

     - CONFIG_ARM_SPE_PMU=y
     - CONFIG_PID_IN_CONTEXTIDR=y
     - kpti=off (kernel command-line parameter)

     Use the following command to generate the SPE perf data file::

       $ perf record -e 'arm_spe_0/branch_filter=1,load_filter=0,store_filter=0/' -a -c <count> -N --no-switch-events -o <perf_file> -- <loadtest>

   - For arm64 with ETM trace:

     Follow the instructions in `Linaro OpenCSD document
     <https://github.com/Linaro/OpenCSD/blob/master/decoder/tests/auto-fdo/autofdo.md>`_
     to record ETM traces for AutoFDO::

       $ perf record -e cs_etm/@tmc_etr0/k -a -o <etm_perf_file> -- <loadtest>
       $ perf inject -i <etm_perf_file> -o <perf_file> --itrace=i500009il

     For ARM platforms running Android, follow the instructions in `Android simpleperf
     document <https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline/gki/aarch64/afdo>`_
     to record ETM traces for AutoFDO::

       $ simpleperf record -e cs-etm:k -a -o <etm_perf_file> -- <loadtest>
       $ simpleperf inject -i <etm_perf_file> -o <text_perf_file> --symdir <vmlinux_dir>

4) (Optional) Download the raw perf file to the host machine.

5) To generate an AutoFDO profile, two offline tools are available:
   create_llvm_prof and llvm-profgen. The create_llvm_prof tool is part
   of the AutoFDO project and can be found on GitHub
   (https://github.com/google/autofdo), version v0.30.1 or later.
   The llvm-profgen tool is included in the LLVM compiler itself. The
   version of llvm-profgen doesn't need to match the version of Clang,
   but it needs to be from the LLVM 19 release or later, or built from
   the LLVM trunk.
   ::

      $ llvm-profgen --kernel --binary=<vmlinux> --perfdata=<perf_file> -o <profile_file>

   or ::

      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --format=extbinary --out=<profile_file>

   Note that multiple AutoFDO profile files can be merged into one via::

      $ llvm-profdata merge -o <profile_file> <profile_1> <profile_2> ... <profile_n>

   For arm64 SPE, use the following command::

      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> --profiler=perf_spe --format=extbinary --out=<profile_file>

   For arm64 ETM, use the following command::

      $ create_llvm_prof --binary=<vmlinux> --profile=<text_perf_file> --profiler=text --format=extbinary --out=<profile_file>

6) Rebuild the kernel using the AutoFDO profile file with the same config as step 1
   (note that CONFIG_AUTOFDO_CLANG needs to be enabled)::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file>
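
The steps of this workflow can be tied together in a small helper script.
The following is only a minimal sketch for the Intel LBR case with
llvm-profgen, not part of the kernel tree. The function names, the
``DRY_RUN`` switch, and the default file names (``perf.data``,
``kernel.afdo``, ``./run_loadtest.sh``) are hypothetical placeholders chosen
for illustration; only the commands themselves are taken from the steps
above.

.. code-block:: sh

   #!/bin/sh
   # Sketch of the load-test workflow (Intel LBR + llvm-profgen).
   # All defaults below are hypothetical placeholders; adjust as needed.
   DRY_RUN="${DRY_RUN:-1}"                    # 1: print commands, 0: run them
   PERIOD="${PERIOD:-500009}"                 # prime sample period (step 3)
   PERF_FILE="${PERF_FILE:-perf.data}"
   PROFILE="${PROFILE:-kernel.afdo}"
   VMLINUX="${VMLINUX:-vmlinux}"
   LOADTEST="${LOADTEST:-./run_loadtest.sh}"  # assumed load-test command

   # Print the command when DRY_RUN=1, otherwise execute it.
   run() {
       if [ "$DRY_RUN" = 1 ]; then
           echo "+ $*"
       else
           "$@"
       fi
   }

   # Step 1 (host): enable AutoFDO and build without a profile.
   initial_build() {
       run scripts/config -e AUTOFDO_CLANG
       run make LLVM=1
   }

   # Step 3 (test machine): sample taken branches while the load test runs.
   # $LOADTEST is left unquoted on purpose so it may carry arguments.
   collect_profile() {
       run perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b \
           -c "$PERIOD" -o "$PERF_FILE" -- $LOADTEST
   }

   # Step 5 (host): convert the perf data into an AutoFDO profile.
   generate_profile() {
       run llvm-profgen --kernel --binary="$VMLINUX" \
           --perfdata="$PERF_FILE" -o "$PROFILE"
   }

   # Step 6 (host): rebuild with the profile and the same config as step 1.
   rebuild() {
       run make LLVM=1 CLANG_AUTOFDO_PROFILE="$PROFILE"
   }

With the default ``DRY_RUN=1``, sourcing the script and calling a function
such as ``collect_profile`` only prints the command it would run. Setting
``DRY_RUN=0`` executes it. The functions are kept separate because steps 1, 5
and 6 run on the host machine while step 3 runs on the test machine.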