.. SPDX-License-Identifier: GPL-2.0

=====================================
Using Propeller with the Linux kernel
=====================================

This document describes how to enable Propeller build support for the
kernel when using the Clang compiler. Propeller is a profile-guided
optimization (PGO) method used to optimize binary executables. Like
AutoFDO, it utilizes hardware sampling to gather information about the
frequency of execution of different code paths within a binary. Unlike
AutoFDO, this information is then used right before the linking phase
to optimize (among other things) block layout within and across
functions.

A few important notes about adopting Propeller optimization:

#. Although it can be used as a standalone optimization step, it is
   strongly recommended to apply Propeller on top of AutoFDO,
   AutoFDO+ThinLTO or Instrumented FDO (iFDO). The rest of this
   document assumes this paradigm.

#. Propeller uses another round of profiling on top of
   AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves
   "build-afdo - train-afdo - build-propeller - train-propeller -
   build-optimized".

#. Propeller requires the LLVM 19 release or later for Clang/Clang++
   and the linker (ld.lld).

#. In addition to the LLVM toolchain, Propeller requires a profiling
   conversion tool: https://github.com/google/autofdo with a release
   after v0.30.1: https://github.com/google/autofdo/releases/tag/v0.30.1.
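
The LLVM 19 requirement above can be checked on the build host before
starting. The following is a minimal sketch, assuming a POSIX shell
with grep -E/-o support and assuming that the first X.Y.Z number in the
output of "clang --version" or "ld.lld --version" is the LLVM release.
The function name llvm_version_ok is hypothetical and not part of the
kernel tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel tree): succeed if the
# first X.Y.Z version number in a tool's "--version" output has a
# major version of at least 19, the minimum LLVM release for Propeller.
llvm_version_ok() {
    # $1: the complete output of e.g. "clang --version" or "ld.lld --version"
    _major=$(printf '%s\n' "$1" |
        grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1 | cut -d. -f1)
    # Fail if no version number was found or the major version is < 19
    [ -n "$_major" ] && [ "$_major" -ge 19 ]
}

# Example usage on the build host:
#   llvm_version_ok "$(clang --version)" &&
#       llvm_version_ok "$(ld.lld --version)" ||
#       echo "Propeller needs clang and ld.lld from LLVM 19 or later" >&2
```

Both Clang and ld.lld are checked separately because the requirement
applies to the compiler and the linker alike.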

The Propeller optimization process involves the following steps:

#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as
   you would normally do, but with a set of compile-time / link-time
   flags, so that a special metadata section is created within the
   kernel binary. The special section is only intended to be used by
   the profiling tool; it is not part of the runtime image, nor does it
   change the kernel's runtime text sections.

#. Profiling: The above kernel is then run with a representative
   workload to gather execution frequency data. This data is collected
   using hardware sampling, via perf. Propeller is most effective on
   platforms supporting advanced PMU features like LBR on Intel
   machines. This step is the same as profiling the kernel for AutoFDO
   (the exact perf parameters can be different).

#. Propeller profile generation: The perf output file is converted to
   a pair of Propeller profiles via an offline tool.

#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized
   binary as you would normally do, but with a compile-time /
   link-time flag to pick up the Propeller compile-time and link-time
   profiles. This build step uses three profiles: the AutoFDO profile,
   the Propeller compile-time profile and the Propeller link-time
   profile.

#. Deployment: The optimized kernel binary is deployed and used
   in production environments, providing improved performance
   and reduced latency.
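
The optimized build step depends on three profile files being present.
The following is a minimal sketch, assuming a POSIX shell and the
"<propeller_profile_prefix>_cc_profile.txt" /
"<propeller_profile_prefix>_ld_profile.txt" naming used in the Workflow
section below. It checks that all three files exist and then prints the
make command from step 6 of that section for review. The function name
propeller_make_cmd is hypothetical and not part of the kernel tree.

```shell
#!/bin/sh
# Hypothetical helper (not part of the kernel tree): verify that the
# AutoFDO profile and both Propeller profiles exist, then print the
# make command line used for the optimized build.
#   $1: AutoFDO profile file
#   $2: Propeller profile prefix (as passed to create_llvm_prof)
propeller_make_cmd() {
    for f in "$1" "${2}_cc_profile.txt" "${2}_ld_profile.txt"; do
        if [ ! -f "$f" ]; then
            printf 'missing profile: %s\n' "$f" >&2
            return 1
        fi
    done
    printf '%s\n' "make LLVM=1 CLANG_AUTOFDO_PROFILE=$1 CLANG_PROPELLER_PROFILE_PREFIX=$2"
}

# Example usage:
#   propeller_make_cmd afdo.prof /home/user/dir/any_string
# prints the make command line once all three files exist.
```

Printing the command rather than running it keeps the sketch free of
side effects; the printed line can be pasted into the build shell.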

Preparation
===========

Configure the kernel with::

   CONFIG_AUTOFDO_CLANG=y
   CONFIG_PROPELLER_CLANG=y

Customization
=============

The default CONFIG_PROPELLER_CLANG setting covers kernel space objects
for Propeller builds. One can, however, enable or disable Propeller build
for individual files and directories by adding a line similar to the
following to the respective kernel Makefile:

- For enabling a single file (e.g. foo.o)::

   PROPELLER_PROFILE_foo.o := y

- For enabling all files in one directory::

   PROPELLER_PROFILE := y

- For disabling one file::

   PROPELLER_PROFILE_foo.o := n

- For disabling all files in one directory::

   PROPELLER_PROFILE := n


Workflow
========

Here is an example workflow for building an AutoFDO+Propeller kernel:

1) Assuming an AutoFDO profile has already been collected following the
   instructions in the AutoFDO document
   (Documentation/dev-tools/autofdo.rst), build the kernel on the host
   machine with the AutoFDO and Propeller build configs ::

      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y

   and ::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name>

2) Install the kernel on the test machine.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number, like 500009,
   for this purpose.

   - For Intel platforms::

      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   - For AMD platforms::

      $ perf record --pfm-event RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   Note that you can repeat the above steps to collect multiple
   <perf_file>s.

4) (Optional) Download the raw perf file(s) to the host machine.

5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to
   generate the Propeller profiles::

      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
          --format=propeller --propeller_output_module_name \
          --out=<propeller_profile_prefix>_cc_profile.txt \
          --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

   "<propeller_profile_prefix>" can be something like
   "/home/user/dir/any_string".

   This command generates a pair of Propeller profiles:
   "<propeller_profile_prefix>_cc_profile.txt" and
   "<propeller_profile_prefix>_ld_profile.txt".

   If more than one perf_file was collected in the previous step, you
   can create a temporary list file "<perf_file_list>" with one perf
   file name per line and run::

      $ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list> \
          --format=propeller --propeller_output_module_name \
          --out=<propeller_profile_prefix>_cc_profile.txt \
          --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

6) Rebuild the kernel using the AutoFDO and Propeller profiles, with
   the configs ::

      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y

   and ::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
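
Parts of steps 3 and 5 can be scripted. The following is a minimal
sketch, assuming a POSIX shell and an x86 /proc/cpuinfo whose vendor_id
field reads GenuineIntel or AuthenticAMD. It only selects the perf
event arguments shown in step 3 and writes the "<perf_file_list>" used
in step 5; perf and create_llvm_prof are still run by hand. The
function names propeller_perf_event and propeller_perf_list are
hypothetical and not part of the kernel tree.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the kernel tree) for workflow
# steps 3 and 5.

# Print the perf event arguments from step 3 for the CPU vendor given
# in $1, i.e. the vendor_id field of /proc/cpuinfo.
propeller_perf_event() {
    case "$1" in
        GenuineIntel) printf '%s\n' "-e BR_INST_RETIRED.NEAR_TAKEN:k" ;;
        AuthenticAMD) printf '%s\n' "--pfm-event RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k" ;;
        *) printf 'unsupported vendor: %s\n' "$1" >&2; return 1 ;;
    esac
}

# Write the <perf_file_list> for step 5: the list file is $1 and each
# remaining argument (one collected perf file) becomes one line.
propeller_perf_list() {
    if [ "$#" -lt 2 ]; then
        printf 'usage: propeller_perf_list <list_file> <perf_file>...\n' >&2
        return 1
    fi
    _list=$1
    shift
    printf '%s\n' "$@" > "$_list"
}

# Example usage (file names are illustrative):
#   vendor=$(awk -F': ' '/^vendor_id/ { print $2; exit }' /proc/cpuinfo)
#   perf record $(propeller_perf_event "$vendor") -a -N -b -c 500009 \
#       -o perf.1.data -- <loadtest>
#   propeller_perf_list perf_file_list.txt perf.*.data
#   create_llvm_prof --binary=vmlinux --profile=@perf_file_list.txt \
#       --format=propeller --propeller_output_module_name \
#       --out=<propeller_profile_prefix>_cc_profile.txt \
#       --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
```

The command substitution in the perf record line is left unquoted on
purpose so that the event option and its value become two separate
arguments.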