.. SPDX-License-Identifier: GPL-2.0

=====================================
Using Propeller with the Linux kernel
=====================================

This enables Propeller build support for the kernel when using the Clang
compiler. Propeller is a profile-guided optimization (PGO) method used
to optimize binary executables. Like AutoFDO, it utilizes hardware
sampling to gather information about the frequency of execution of
different code paths within a binary. Unlike AutoFDO, this information
is then used right before the linking phase to optimize, among other
things, block layout within and across functions.

A few important notes about adopting Propeller optimization:

#. Although it can be used as a standalone optimization step, it is
   strongly recommended to apply Propeller on top of AutoFDO,
   AutoFDO+ThinLTO or instrumentation-based FDO (iFDO). The rest of
   this document assumes this paradigm.

#. Propeller uses another round of profiling on top of
   AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves
   "build-afdo - train-afdo - build-propeller - train-propeller -
   build-optimized".

#. Propeller requires the LLVM 19 release or later for Clang/Clang++
   and the linker (ld.lld).

#. In addition to the LLVM toolchain, Propeller requires a profiling
   conversion tool: https://github.com/google/llvm-propeller.

Currently supported architectures include x86/x86_64 (via LBR)
and arm64 (via SPE).

The Propeller optimization process involves the following steps:

#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as
   you would normally do, but with a set of compile-time / link-time
   flags, so that a special metadata section is created within the
   kernel binary. The special section is only intended to be used by
   the profiling tool; it is not part of the runtime image, nor does
   it change the kernel's run-time text sections.

#. Profiling: The above kernel is then run with a representative
   workload to gather execution frequency data. This data is collected
   using hardware sampling, via perf. Propeller is most effective on
   platforms supporting advanced PMU features like LBR on Intel
   machines. This step is the same as profiling the kernel for AutoFDO
   (the exact perf parameters can be different).

#. Propeller profile generation: The perf output file is converted to
   a pair of Propeller profiles via an offline tool.

#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized
   binary as you would normally do, but with a compile-time /
   link-time flag to pick up the Propeller compile-time and link-time
   profiles. This build step uses three profiles: the AutoFDO profile,
   the Propeller compile-time profile and the Propeller link-time
   profile.

#. Deployment: The optimized kernel binary is deployed and used
   in production environments, providing improved performance
   and reduced latency.

Preparation
===========

Configure the kernel with::

   CONFIG_AUTOFDO_CLANG=y
   CONFIG_PROPELLER_CLANG=y

Customization
=============

The default CONFIG_PROPELLER_CLANG setting covers kernel space objects
for Propeller builds. One can, however, enable or disable the Propeller
build for individual files and directories by adding a line similar to
the following to the respective kernel Makefile:

- For enabling a single file (e.g. foo.o)::

   PROPELLER_PROFILE_foo.o := y

- For enabling all files in one directory::

   PROPELLER_PROFILE := y

- For disabling one file::

   PROPELLER_PROFILE_foo.o := n

- For disabling all files in one directory::

   PROPELLER_PROFILE := n


Workflow
========

Here is an example workflow for building an AutoFDO+Propeller kernel:

1) Assuming an AutoFDO profile is already collected following the
   instructions in the AutoFDO document, build the kernel on the host
   machine, with AutoFDO and Propeller build configs ::

      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y

   and ::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name>

2) Install the kernel on the test machine.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number, like 500009,
   for this purpose.

   - For Intel platforms::

      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   - For AMD platforms::

      $ perf record --pfm-event RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   - For arm64 with SPE:

     A few kernel settings must be in place to collect SPE profiles
     on Arm. The required settings are:

     - CONFIG_ARM_SPE_PMU=y
     - CONFIG_PID_IN_CONTEXTIDR=y
     - kpti=off (kernel command-line parameter)

     Use the following command to generate the SPE perf data file::

      $ perf record -e 'arm_spe_0/branch_filter=1,load_filter=0,store_filter=0/' -a -N -c <count> --no-switch-events -o <perf_file> -- <loadtest>

   Note that you can repeat the above steps to collect multiple
   <perf_file>s.

4) (Optional) Download the raw perf file(s) to the host machine.

5) Use the generate_propeller_profiles tool
   (https://github.com/google/llvm-propeller) to generate the
   Propeller profiles::

      $ generate_propeller_profiles \
        --binary=<vmlinux> --profile=<perf_file> \
        --format=propeller --propeller_output_module_name \
        --out=<propeller_profile_prefix>_cc_profile.txt \
        --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

   "<propeller_profile_prefix>" can be something like
   "/home/user/dir/any_string".

   This command generates a pair of Propeller profiles:
   "<propeller_profile_prefix>_cc_profile.txt" and
   "<propeller_profile_prefix>_ld_profile.txt".

   If more than one perf file was collected in the previous steps,
   you can create a temporary list file "<perf_file_list>" with each
   line containing one perf file name and run::

      $ generate_propeller_profiles \
        --binary=<vmlinux> --profile=@<perf_file_list> \
        --format=propeller --propeller_output_module_name \
        --out=<propeller_profile_prefix>_cc_profile.txt \
        --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

   For arm64 SPE, add the option '--profiler=perf_spe', like::

      $ generate_propeller_profiles \
        --binary=<vmlinux> --profile=<perf_file> \
        --profiler=perf_spe \
        --format=propeller --propeller_output_module_name \
        --out=<propeller_profile_prefix>_cc_profile.txt \
        --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

6) Rebuild the kernel using the AutoFDO and Propeller
   profiles. ::

      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y

   and ::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
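
When several perf files were collected in step 3, the list file passed
as '--profile=@<perf_file_list>' in step 5 can be produced with a small
shell helper. The following is only a minimal sketch and is not part of
the kernel tree or of the llvm-propeller tooling: the function name
"make_perf_list" and the "perf*.data" file naming pattern are
assumptions made for illustration, so adjust them to however the perf
files were actually named::

   # Hypothetical helper: write every regular file matching
   # <dir>/perf*.data to <list>, one path per line.
   make_perf_list() {
       dir=$1
       list=$2
       for f in "$dir"/perf*.data; do
           # Skip the unexpanded pattern when nothing matches.
           if [ -f "$f" ]; then
               printf '%s\n' "$f"
           fi
       done > "$list"
   }

For example, "make_perf_list /path/to/perf/dir perf_file_list" writes
the list file, which can then be passed to generate_propeller_profiles
as '--profile=@perf_file_list'.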