| 875612a7 | 27-Jan-2026 |
Hari Bathini <hbathini@linux.ibm.com> |
powerpc64/ftrace: fix OOL stub count with clang
The total number of out-of-line (OOL) stubs required for function tracing is determined using the following command:
$(OBJDUMP) -r -j __patchable
powerpc64/ftrace: fix OOL stub count with clang
The total number of out-of-line (OOL) stubs required for function tracing is determined using the following command:
$(OBJDUMP) -r -j __patchable_function_entries vmlinux.o
While this works correctly with GNU objdump, llvm-objdump does not list the expected relocation records for this section. Fix this by using the -d option and counting R_PPC64_ADDR64 relocation entries. This works as desired with both objdump and llvm-objdump.
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.ibm.com> Link: https://patch.msgid.link/20260127084926.34497-3-hbathini@linux.ibm.com
show more ...
|
| e717754f | 30-Oct-2024 |
Naveen N Rao <naveen@kernel.org> |
powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
Implement support for DYNAMIC_FTRACE_WITH_CALL_OPS similar to the arm64 implementation.
This works by patching-in a pointer to an associ
powerpc/ftrace: Add support for DYNAMIC_FTRACE_WITH_CALL_OPS
Implement support for DYNAMIC_FTRACE_WITH_CALL_OPS similar to the arm64 implementation.
This works by patching-in a pointer to an associated ftrace_ops structure before each traceable function. If multiple ftrace_ops are associated with a call site, then a special ftrace_list_ops is used to enable iterating over all the registered ftrace_ops. If no ftrace_ops are associated with a call site, then a special ftrace_nop_ops structure is used to render the ftrace call as a no-op. ftrace trampoline can then read the associated ftrace_ops for a call site by loading from an offset from the LR, and branch directly to the associated function.
The primary advantage with this approach is that we don't have to iterate over all the registered ftrace_ops for call sites that have a single ftrace_ops registered. This is the equivalent of implementing support for dynamic ftrace trampolines, which set up a special ftrace trampoline for each registered ftrace_ops and have individual call sites branch into those directly.
A secondary advantage is that this gives us a way to add support for direct ftrace callers without having to resort to using stubs. The address of the direct call trampoline can be loaded from the ftrace_ops structure.
To support this, we reserve a nop before each function on 32-bit powerpc. For 64-bit powerpc, two nops are reserved before each out-of-line stub. During ftrace activation, we update this location with the associated ftrace_ops pointer. Then, on ftrace entry, we load from this location and call into ftrace_ops->func().
For 64-bit powerpc, we ensure that the out-of-line stub area is doubleword aligned so that ftrace_ops address can be updated atomically.
Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-15-hbathini@linux.ibm.com
show more ...
|
| cf9bc0ef | 30-Oct-2024 |
Naveen N Rao <naveen@kernel.org> |
powerpc64/ftrace: Support .text larger than 32MB with out-of-line stubs
We are restricted to a .text size of ~32MB when using out-of-line function profile sequence. Allow this to be extended up to t
powerpc64/ftrace: Support .text larger than 32MB with out-of-line stubs
We are restricted to a .text size of ~32MB when using out-of-line function profile sequence. Allow this to be extended up to the previous limit of ~64MB by reserving space in the middle of .text.
A new config option CONFIG_PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE is introduced to specify the number of function stubs that are reserved in .text. On boot, ftrace utilizes stubs from this area first before using the stub area at the end of .text.
A ppc64le defconfig has ~44k functions that can be traced. A more conservative value of 32k functions is chosen as the default value of PPC_FTRACE_OUT_OF_LINE_NUM_RESERVE so that we do not allot more space than necessary by default. If building a kernel that only has 32k trace-able functions, we won't allot any more space at the end of .text during the pass on vmlinux.o. Otherwise, only the remaining functions get space for stubs at the end of .text. This default value should help cover a .text size of ~48MB in total (including space reserved at the end of .text which can cover up to 32MB), which should be sufficient for most common builds. For a very small kernel build, this can be set to 0. Or, this can be bumped up to a larger value to support vmlinux .text size up to ~64MB.
Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-14-hbathini@linux.ibm.com
show more ...
|
| eec37961 | 30-Oct-2024 |
Naveen N Rao <naveen@kernel.org> |
powerpc64/ftrace: Move ftrace sequence out of line
Function profile sequence on powerpc includes two instructions at the beginning of each function: mflr r0 bl ftrace_caller
The call to ftrace_ca
powerpc64/ftrace: Move ftrace sequence out of line
Function profile sequence on powerpc includes two instructions at the beginning of each function: mflr r0 bl ftrace_caller
The call to ftrace_caller() gets nop'ed out during kernel boot and is patched in when ftrace is enabled.
Given the sequence, we cannot return from ftrace_caller with 'blr' as we need to keep LR and r0 intact. This results in link stack (return address predictor) imbalance when ftrace is enabled. To address that, we would like to use a three instruction sequence: mflr r0 bl ftrace_caller mtlr r0
Further more, to support DYNAMIC_FTRACE_WITH_CALL_OPS, we need to reserve two instruction slots before the function. This results in a total of five instruction slots to be reserved for ftrace use on each function that is traced.
Move the function profile sequence out-of-line to minimize its impact. To do this, we reserve a single nop at function entry using -fpatchable-function-entry=1 and add a pass on vmlinux.o to determine the total number of functions that can be traced. This is then used to generate a .S file reserving the appropriate amount of space for use as ftrace stubs, which is built and linked into vmlinux.
On bootup, the stub space is split into separate stubs per function and populated with the proper instruction sequence. A pointer to the associated stub is maintained in dyn_arch_ftrace.
For modules, space for ftrace stubs is reserved from the generic module stub space.
This is restricted to and enabled by default only on 64-bit powerpc, though there are some changes to accommodate 32-bit powerpc. This is done so that 32-bit powerpc could choose to opt into this based on further tests and benchmarks.
As an example, after this patch, kernel functions will have a single nop at function entry: <kernel_clone>: addis r2,r12,467 addi r2,r2,-16028 nop mfocrf r11,8 ...
When ftrace is enabled, the nop is converted to an unconditional branch to the stub associated with that function: <kernel_clone>: addis r2,r12,467 addi r2,r2,-16028 b ftrace_ool_stub_text_end+0x11b28 mfocrf r11,8 ...
The associated stub: <ftrace_ool_stub_text_end+0x11b28>: mflr r0 bl ftrace_caller mtlr r0 b kernel_clone+0xc ...
This change showed an improvement of ~10% in null_syscall benchmark on a Power 10 system with ftrace enabled.
Signed-off-by: Naveen N Rao <naveen@kernel.org> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://patch.msgid.link/20241030070850.1361304-13-hbathini@linux.ibm.com
show more ...
|
| e95ad5f2 | 16-Aug-2021 |
Michael Ellerman <mpe@ellerman.id.au> |
powerpc/head_check: Fix shellcheck errors
Replace "cat file | grep pattern" with "grep pattern file", and quote a few variables. Together that fixes all shellcheck errors.
Signed-off-by: Michael El
powerpc/head_check: Fix shellcheck errors
Replace "cat file | grep pattern" with "grep pattern file", and quote a few variables. Together that fixes all shellcheck errors.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210817125154.3369884-1-mpe@ellerman.id.au
show more ...
|
| 6b1992bc | 12-Aug-2020 |
Stephen Rothwell <sfr@canb.auug.org.au> |
powerpc: unrel_branch_check.sh: enable the use of llvm-objdump v9, 10 or 11
Currently, using llvm-objtool, this script just silently succeeds without actually do the intended checking. So this upda
powerpc: unrel_branch_check.sh: enable the use of llvm-objdump v9, 10 or 11
Currently, using llvm-objtool, this script just silently succeeds without actually do the intended checking. So this updates it to work properly.
Firstly, llvm-objdump does not add target symbol names to the end of branches in its asm output, so we have to drop the branch to __start_initialization_multiplatform using its address.
Secondly, v9 and 10 specify branch targets as .+<offset>, so we convert those to actual addresses.
Thirdly, v10 and 11 error out on a vmlinux if given the -R option complaining that it is "not a dynamic object". The -R does not make any difference to the asm output, so remove it.
Lastly, v11 produces asm that is very similar to Gnu objtool (at least as far as branches are concerned), so no further changes are necessary to make it work.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200812081036.7969-3-sfr@canb.auug.org.au
show more ...
|
| b71dca98 | 12-Aug-2020 |
Stephen Rothwell <sfr@canb.auug.org.au> |
powerpc: unrel_branch_check.sh: use nm to find symbol value
This is considerably faster then parsing the objdump asm output. It will also make the enabling of llvm-objdump a little easier.
Signed-
powerpc: unrel_branch_check.sh: use nm to find symbol value
This is considerably faster then parsing the objdump asm output. It will also make the enabling of llvm-objdump a little easier.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200812081036.7969-2-sfr@canb.auug.org.au
show more ...
|
| af13a224 | 11-Aug-2020 |
Stephen Rothwell <sfr@canb.auug.org.au> |
powerpc: unrel_branch_check.sh: exit silently for early errors
If we can't find the address of __end_interrupts, then we still exit successfully as that is the current behaviour.
Signed-off-by: Ste
powerpc: unrel_branch_check.sh: exit silently for early errors
If we can't find the address of __end_interrupts, then we still exit successfully as that is the current behaviour.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200811140435.20957-8-sfr@canb.auug.org.au
show more ...
|
| 3745ae63 | 11-Aug-2020 |
Stephen Rothwell <sfr@canb.auug.org.au> |
powerpc: unrel_branch_check.sh: fix up the file header
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/202
powerpc: unrel_branch_check.sh: fix up the file header
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200811140435.20957-7-sfr@canb.auug.org.au
show more ...
|
| b84eaab6 | 11-Aug-2020 |
Stephen Rothwell <sfr@canb.auug.org.au> |
powerpc: unrel_branch_check.sh: simplify and tidy up the final loop
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.ker
powerpc: unrel_branch_check.sh: simplify and tidy up the final loop
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200811140435.20957-6-sfr@canb.auug.org.au
show more ...
|
| 3d97abbc | 11-Aug-2020 |
Stephen Rothwell <sfr@canb.auug.org.au> |
powerpc: unrel_branch_check.sh: convert grep | sed | awk to just sed
Also start using sed -E and make all the separate expressions into a single one with comments. Pull the stripping of condition r
powerpc: unrel_branch_check.sh: convert grep | sed | awk to just sed
Also start using sed -E and make all the separate expressions into a single one with comments. Pull the stripping of condition registers back into the sed command.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200811140435.20957-5-sfr@canb.auug.org.au
show more ...
|
| 4e71106c | 11-Aug-2020 |
Stephen Rothwell <sfr@canb.auug.org.au> |
powerpc: unrel_branch_check.sh: simplify objdump's asm output
We don't use the raw hex instruction dump, so elide it and adjust the following expressions.
Also use \s instead of [[:space:]] everywh
powerpc: unrel_branch_check.sh: simplify objdump's asm output
We don't use the raw hex instruction dump, so elide it and adjust the following expressions.
Also use \s instead of [[:space:]] everywhere.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200811140435.20957-4-sfr@canb.auug.org.au
show more ...
|
| 20ff8ec1 | 11-Aug-2020 |
Stephen Rothwell <sfr@canb.auug.org.au> |
powerpc: unrel_branch_check.sh: simplify and combine some executions
Also some minor style changes.
There should still be no change in behaviour.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org
powerpc: unrel_branch_check.sh: simplify and combine some executions
Also some minor style changes.
There should still be no change in behaviour.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20200811140435.20957-3-sfr@canb.auug.org.au
show more ...
|