.. SPDX-License-Identifier: GPL-2.0

=====================================
Using Propeller with the Linux kernel
=====================================

This enables Propeller build support for the kernel when using the Clang
compiler. Propeller is a profile-guided optimization (PGO) method used
to optimize binary executables. Like AutoFDO, it utilizes hardware
sampling to gather information about the frequency of execution of
different code paths within a binary. Unlike AutoFDO, this information
is then used right before the linking phase to optimize (among other
things) block layout within and across functions.

A few important notes about adopting Propeller optimization:

#. Although it can be used as a standalone optimization step, it is
   strongly recommended to apply Propeller on top of AutoFDO,
   AutoFDO+ThinLTO or instrumented FDO (iFDO). The rest of this
   document assumes this paradigm.

#. Propeller uses another round of profiling on top of
   AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves
   "build-afdo - train-afdo - build-propeller - train-propeller -
   build-optimized".

#. Propeller requires the LLVM 19 release or later for Clang/Clang++
   and the linker (ld.lld).

#. In addition to the LLVM toolchain, Propeller requires a profiling
   conversion tool: https://github.com/google/llvm-propeller.

Currently supported architectures include x86/x86_64 (via LBR)
and arm64 (via SPE).

The Propeller optimization process involves the following steps:

#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as
   you would normally do, but with a set of compile-time / link-time
   flags, so that a special metadata section is created within the
   kernel binary. The special section is only intended to be used by
   the profiling tool; it is not part of the runtime image, nor does
   it change the kernel's run-time text sections.

#. Profiling: The above kernel is then run with a representative
   workload to gather execution frequency data. This data is collected
   using hardware sampling, via perf. Propeller is most effective on
   platforms supporting advanced PMU features like LBR on Intel
   machines. This step is the same as profiling the kernel for AutoFDO
   (the exact perf parameters can be different).

#. Propeller profile generation: The perf output file is converted to
   a pair of Propeller profiles via an offline tool.

#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized
   binary as you would normally do, but with a compile-time /
   link-time flag to pick up the Propeller compile-time and link-time
   profiles. This build step uses three profiles: the AutoFDO profile,
   the Propeller compile-time profile and the Propeller link-time
   profile.

#. Deployment: The optimized kernel binary is deployed and used
   in production environments, providing improved performance
   and reduced latency.

Preparation
===========

Configure the kernel with::

   CONFIG_AUTOFDO_CLANG=y
   CONFIG_PROPELLER_CLANG=y
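
As a quick sanity check before starting a build, the two options above
can be verified with a small shell helper. This is only a sketch under
stated assumptions: "check_propeller_config" is a hypothetical function
name, not something shipped in the kernel tree, and the configuration
file is assumed to be ".config" in the current directory unless a path
is passed as the first argument::

   #!/bin/sh
   # Hypothetical helper (not part of the kernel tree): succeeds only
   # when both options listed above are set to "y" in the given file.
   check_propeller_config() {
       config_file="${1:-.config}"
       for opt in CONFIG_AUTOFDO_CLANG CONFIG_PROPELLER_CLANG; do
           # Match the exact "OPTION=y" line; "# OPTION is not set"
           # and "OPTION=m" do not count.
           if ! grep -q "^${opt}=y\$" "$config_file"; then
               echo "missing: ${opt}=y in ${config_file}" >&2
               return 1
           fi
       done
       return 0
   }

A non-zero exit status means at least one of the two options is not
enabled, in which case the Propeller build steps described below should
not be expected to take effect.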

Customization
=============

The default CONFIG_PROPELLER_CLANG setting covers kernel space objects
for Propeller builds. One can, however, enable or disable Propeller build
for individual files and directories by adding a line similar to the
following to the respective kernel Makefile:

- For enabling a single file (e.g. foo.o)::

   PROPELLER_PROFILE_foo.o := y

- For enabling all files in one directory::

   PROPELLER_PROFILE := y

- For disabling one file::

   PROPELLER_PROFILE_foo.o := n

- For disabling all files in one directory::

   PROPELLER_PROFILE := n


Workflow
========

Here is an example workflow for building an AutoFDO+Propeller kernel:

1) Assuming an AutoFDO profile is already collected following
   instructions in the AutoFDO document, build the kernel on the host
   machine, with AutoFDO and Propeller build configs ::

      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y

   and ::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name>

2) Install the kernel on the test machine.

3) Run the load tests. The '-c' option in perf specifies the sample
   event period. We suggest using a suitable prime number, like 500009,
   for this purpose.

   - For Intel platforms::

      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   - For AMD platforms::

      $ perf record --pfm-event RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>

   - For arm64 with SPE:

     A few kernel features must be enabled to collect SPE profiles on
     Arm. The first two are kernel configuration options; the last is
     a kernel command-line parameter:

     - CONFIG_ARM_SPE_PMU=y
     - CONFIG_PID_IN_CONTEXTIDR=y
     - kpti=off

     Use the following command to generate the SPE perf data file::

      $ perf record -e 'arm_spe_0/branch_filter=1,load_filter=0,store_filter=0/' -a -N -c <count> --no-switch-events -o <perf_file> -- <loadtest>

     Note that you can repeat the above steps to collect multiple <perf_file>s.

4) (Optional) Download the raw perf file(s) to the host machine.

5) Use the generate_propeller_profiles tool (https://github.com/google/llvm-propeller) to
   generate the Propeller profiles::

      $ generate_propeller_profiles \
             --binary=<vmlinux> --profile=<perf_file> \
             --format=propeller --propeller_output_module_name \
             --out=<propeller_profile_prefix>_cc_profile.txt \
             --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

   "<propeller_profile_prefix>" can be something like "/home/user/dir/any_string".

   This command generates a pair of Propeller profiles:
   "<propeller_profile_prefix>_cc_profile.txt" and
   "<propeller_profile_prefix>_ld_profile.txt".

   If more than one perf file was collected in step 3, you can create
   a temporary list file "<perf_file_list>" with each line containing
   one perf file name and run::

      $ generate_propeller_profiles \
             --binary=<vmlinux> --profile=@<perf_file_list> \
             --format=propeller --propeller_output_module_name \
             --out=<propeller_profile_prefix>_cc_profile.txt \
             --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

   For arm64 SPE, add the option '--profiler=perf_spe', like::

      $ generate_propeller_profiles \
             --binary=<vmlinux> --profile=<perf_file> \
             --profiler=perf_spe \
             --format=propeller --propeller_output_module_name \
             --out=<propeller_profile_prefix>_cc_profile.txt \
             --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

6) Rebuild the kernel using the AutoFDO and Propeller
   profiles. ::

      CONFIG_AUTOFDO_CLANG=y
      CONFIG_PROPELLER_CLANG=y

   and ::

      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
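
Two small pieces of glue around steps 5 and 6 can be sketched in shell.
Both function names below are hypothetical and are not provided by the
kernel tree or by the llvm-propeller project. The first writes the
"<perf_file_list>" file with one perf file name per line, as expected
by '--profile=@<perf_file_list>'. The second checks that both profiles
derived from "<propeller_profile_prefix>" exist and are non-empty
before the final rebuild. The "_cc_profile.txt" and "_ld_profile.txt"
suffixes are taken from the commands in step 5::

   #!/bin/sh
   # Hypothetical helper: write each perf file name given after the
   # first argument on its own line of the list file (first argument).
   make_perf_file_list() {
       if [ "$#" -lt 2 ]; then
           echo "usage: make_perf_file_list <list_file> <perf_file>..." >&2
           return 1
       fi
       list_file="$1"
       shift
       printf '%s\n' "$@" > "$list_file"
   }

   # Hypothetical helper: succeed only if both Propeller profiles
   # named after the given prefix exist and are non-empty.
   check_propeller_profiles() {
       prefix="$1"
       for suffix in _cc_profile.txt _ld_profile.txt; do
           if [ ! -s "${prefix}${suffix}" ]; then
               echo "missing or empty: ${prefix}${suffix}" >&2
               return 1
           fi
       done
       return 0
   }

Running the second helper with the same prefix that is later passed as
CLANG_PROPELLER_PROFILE_PREFIX catches a mistyped prefix before the
rebuild starts.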