.. SPDX-License-Identifier: GPL-2.0

=====================================
Using Propeller with the Linux kernel
=====================================

This document describes how to enable Propeller build support for the
kernel when using the Clang compiler. Propeller is a profile-guided
optimization (PGO) method used to optimize binary executables. Like
AutoFDO, it uses hardware sampling to gather information about how
frequently different code paths within a binary execute. Unlike AutoFDO,
this information is then used right before the linking phase to
optimize, among other things, block layout within and across functions.

A few important notes about adopting Propeller optimization:

#. Although it can be used as a standalone optimization step, it is
   strongly recommended to apply Propeller on top of AutoFDO,
   AutoFDO+ThinLTO or Instrumented FDO (iFDO). The rest of this
   document assumes this paradigm.

#. Propeller uses another round of profiling on top of
   AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves
   "build-afdo - train-afdo - build-propeller - train-propeller -
   build-optimized".

#. Propeller requires the LLVM 19 release or later for Clang/Clang++
   and the linker (ld.lld).

#. In addition to the LLVM toolchain, Propeller requires a profile
   conversion tool: https://github.com/google/autofdo with a release
   after v0.30.1: https://github.com/google/autofdo/releases/tag/v0.30.1.
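
The LLVM version requirement above can be checked before starting a
build. The following is a minimal sketch, not part of the kernel tree:
the helper names ``clang_major`` and ``propeller_toolchain_ok`` are
hypothetical, and the parsing assumes the usual ``clang version X.Y.Z``
banner printed by ``clang --version``. It does not cover ld.lld, which
needs a separate check::

   # Extract the major version from "clang --version"-style output on stdin.
   clang_major() {
       sed -n 's/.*clang version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
   }

   # Succeed only if the given major version meets the LLVM 19 requirement.
   propeller_toolchain_ok() {
       [ "${1:-0}" -ge 19 ]
   }

One possible use is
``propeller_toolchain_ok "$(clang --version | clang_major)" || echo "Clang is too old for Propeller"``.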

The Propeller optimization process involves the following steps:

#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as
   you would normally do, but with a set of compile-time / link-time
   flags, so that a special metadata section is created within the
   kernel binary. The special section is only intended to be used by
   the profiling tool; it is not part of the runtime image, nor does
   it change the kernel's runtime text sections.

#. Profiling: The above kernel is then run with a representative
   workload to gather execution frequency data. This data is collected
   using hardware sampling, via perf. Propeller is most effective on
   platforms supporting advanced PMU features like LBR on Intel
   machines. This step is the same as profiling the kernel for AutoFDO
   (the exact perf parameters can be different).

#. Propeller profile generation: The perf output file is converted to
   a pair of Propeller profiles via an offline tool.

#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized
   binary as you would normally do, but with compile-time / link-time
   flags that pick up the Propeller compile-time and link-time
   profiles. This build step uses three profiles: the AutoFDO profile,
   the Propeller compile-time profile and the Propeller link-time
   profile.

#. Deployment: The optimized kernel binary is deployed and used
   in production environments, providing improved performance
   and reduced latency.

Preparation
===========

Configure the kernel with::

   CONFIG_AUTOFDO_CLANG=y
   CONFIG_PROPELLER_CLANG=y

Customization
=============

The default CONFIG_PROPELLER_CLANG setting covers kernel space objects
for Propeller builds. One can, however, enable or disable Propeller builds
for individual files and directories by adding a line similar to the
following to the respective kernel Makefile:

- For enabling a single file (e.g. foo.o)::

   PROPELLER_PROFILE_foo.o := y

- For enabling all files in one directory::

   PROPELLER_PROFILE := y

- For disabling one file::

   PROPELLER_PROFILE_foo.o := n

- For disabling all files in one directory::

   PROPELLER_PROFILE := n
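
When several Makefiles carry such overrides, it can help to list them
before a Propeller build. The following is a minimal sketch under stated
assumptions: the helper name ``list_propeller_overrides`` is
hypothetical, GNU grep is assumed for ``-r`` and ``--include``, and only
files named ``Makefile`` or ``Kbuild`` are searched::

   # Print every Makefile/Kbuild line under directory $1 that sets a
   # Propeller override, either directory-wide or for a single object file.
   list_propeller_overrides() {
       grep -rnE --include=Makefile --include=Kbuild \
           '^[[:space:]]*PROPELLER_PROFILE(_[^[:space:]:=]+)?[[:space:]]*:?=' "$1"
   }

For example, ``list_propeller_overrides drivers/`` prints each matching
line prefixed with its file name and line number. A misspelled variable
name does not match the pattern, so it will not appear in the output.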

Workflow
========

Here is an example workflow for building an AutoFDO+Propeller kernel:

1) Assuming an AutoFDO profile has already been collected following
   the instructions in the AutoFDO document, build the kernel on the
   host machine, with AutoFDO and Propeller build configs ::

      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y

   and ::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name>

2) Install the kernel on the test machine.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number, like 500009,
   for this purpose.

   - For Intel platforms::

      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   - For AMD platforms::

      $ perf record --pfm-event RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   Note you can repeat the above steps to collect multiple <perf_file>s.

4) (Optional) Download the raw perf file(s) to the host machine.

5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to
   generate the Propeller profiles. ::

      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
                         --format=propeller --propeller_output_module_name \
                         --out=<propeller_profile_prefix>_cc_profile.txt \
                         --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

   "<propeller_profile_prefix>" can be something like "/home/user/dir/any_string".

   This command generates a pair of Propeller profiles:
   "<propeller_profile_prefix>_cc_profile.txt" and
   "<propeller_profile_prefix>_ld_profile.txt".

   If more than one perf file was collected in the previous step,
   you can create a temporary list file "<perf_file_list>" with each
   line containing one perf file name and run::

      $ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list> \
                         --format=propeller --propeller_output_module_name \
                         --out=<propeller_profile_prefix>_cc_profile.txt \
                         --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

6) Rebuild the kernel using the AutoFDO and Propeller
   profiles. ::

      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y

   and ::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
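
The profile conversion in step 5 can be scripted when several perf files
were collected. The following is a minimal sketch: the helper names
``write_perf_list`` and ``propeller_convert_cmd`` are hypothetical, the
paths are assumed to contain no whitespace, and the command line uses
only the create_llvm_prof flags shown in step 5. The second helper only
prints the command so it can be reviewed before it is run::

   # Write one perf file name per line to the list file $1 (overwriting it).
   # The remaining arguments are the perf files collected in step 3.
   write_perf_list() {
       list=$1
       shift
       printf '%s\n' "$@" > "$list"
   }

   # Print the create_llvm_prof command for vmlinux $1, list file $2 and
   # profile prefix $3.
   propeller_convert_cmd() {
       printf '%s ' create_llvm_prof \
           "--binary=$1" "--profile=@$2" \
           --format=propeller --propeller_output_module_name \
           "--out=${3}_cc_profile.txt"
       printf '%s\n' "--propeller_symorder=${3}_ld_profile.txt"
   }

For example, ``write_perf_list perf_files.txt perf1.data perf2.data``
followed by
``propeller_convert_cmd vmlinux perf_files.txt /home/user/dir/any_string``
prints a single command line that writes the two profiles consumed by
CLANG_PROPELLER_PROFILE_PREFIX in step 6.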