# ae28ed45 | 01-Oct-2025 | Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Pull bpf updates from Alexei Starovoitov:
- Support pulling non-linear xdp data with bpf_xdp_pull_data() kfunc (Amery Hung)
Applied as a stable branch in bpf-next and net-next trees.
- Support reading skb metadata via bpf_dynptr (Jakub Sitnicki)
Also a stable branch in bpf-next and net-next trees.
- Enforce expected_attach_type for tailcall compatibility (Daniel Borkmann)
- Replace path-sensitive with path-insensitive live stack analysis in the verifier (Eduard Zingerman)
This is a significant change in the verification logic. More details, motivation, and long-term plans are in the cover letter/merge commit.
- Support signed BPF programs (KP Singh)
This is another major feature that took years to materialize.
Algorithm details are in the cover letter/merge commit.
- Add support for may_goto instruction to s390 JIT (Ilya Leoshkevich)
- Add support for may_goto instruction to arm64 JIT (Puranjay Mohan)
- Fix USDT SIB argument handling in libbpf (Jiawei Zhao)
- Allow uprobe-bpf program to change context registers (Jiri Olsa)
- Support signed loads from BPF arena (Kumar Kartikeya Dwivedi and Puranjay Mohan)
- Allow access to union arguments in tracing programs (Leon Hwang)
- Optimize rcu_read_lock() + migrate_disable() combination where it's used in BPF subsystem (Menglong Dong)
- Introduce bpf_task_work_schedule*() kfuncs to schedule deferred execution of BPF callback in the context of a specific task using the kernel’s task_work infrastructure (Mykyta Yatsenko)
- Enforce RCU protection for KF_RCU_PROTECTED kfuncs (Kumar Kartikeya Dwivedi)
- Add stress test for rqspinlock in NMI (Kumar Kartikeya Dwivedi)
- Improve the precision of tnum multiplier verifier operation (Nandakumar Edamana)
- Use tnums to improve is_branch_taken() logic (Paul Chaignon)
- Add support for atomic operations in arena in riscv JIT (Pu Lehui)
- Report arena faults to BPF error stream (Puranjay Mohan)
- Search for tracefs at /sys/kernel/tracing first in bpftool (Quentin Monnet)
- Add bpf_strcasecmp() kfunc (Rong Tao)
- Support lookup_and_delete_elem command in BPF_MAP_STACK_TRACE (Tao Chen)
* tag 'bpf-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (197 commits)
  libbpf: Replace AF_ALG with open coded SHA-256
  selftests/bpf: Add stress test for rqspinlock in NMI
  selftests/bpf: Add test case for different expected_attach_type
  bpf: Enforce expected_attach_type for tailcall compatibility
  bpftool: Remove duplicate string.h header
  bpf: Remove duplicate crypto/sha2.h header
  libbpf: Fix error when st-prefix_ops and ops from differ btf
  selftests/bpf: Test changing packet data from kfunc
  selftests/bpf: Add stacktrace map lookup_and_delete_elem test case
  selftests/bpf: Refactor stacktrace_map case with skeleton
  bpf: Add lookup_and_delete_elem for BPF_MAP_STACK_TRACE
  selftests/bpf: Fix flaky bpf_cookie selftest
  selftests/bpf: Test changing packet data from global functions with a kfunc
  bpf: Emit struct bpf_xdp_sock type in vmlinux BTF
  selftests/bpf: Task_work selftest cleanup fixes
  MAINTAINERS: Delete inactive maintainers from AF_XDP
  bpf: Mark kfuncs as __noclone
  selftests/bpf: Add kprobe multi write ctx attach test
  selftests/bpf: Add kprobe write ctx attach test
  selftests/bpf: Add uprobe context ip register change test
  ...
# 815276db | 19-Sep-2025 | Alexei Starovoitov <ast@kernel.org>
Merge branch 'bpf-replace-path-sensitive-with-path-insensitive-live-stack-analysis'
Eduard Zingerman says:
==================== bpf: replace path-sensitive with path-insensitive live stack analysis
Consider the following program, assuming checkpoint is created for a state at instruction (3):
  1: call bpf_get_prandom_u32()
  2: *(u64 *)(r10 - 8) = 42
  -- checkpoint #1 --
  3: if r0 != 0 goto +1
  4: exit;
  5: r0 = *(u64 *)(r10 - 8)
  6: exit
The verifier processes this program by exploring two paths:
- 1 -> 2 -> 3 -> 4
- 1 -> 2 -> 3 -> 5 -> 6
When instruction (5) is processed, the current liveness tracking mechanism moves up the register parent links and records a "read" mark for stack slot -8 at checkpoint #1, stopping because of the "write" mark recorded at instruction (2).
This patch set replaces the existing liveness tracking mechanism with a path-insensitive data flow analysis. The program above is processed as follows:
- a data structure representing live stack slots for instructions 1-6 in frame #0 is allocated;
- when instruction (2) is processed, record that slot -8 is written at instruction (2) in frame #0;
- when instruction (5) is processed, record that slot -8 is read at instruction (5) in frame #0;
- when instruction (6) is processed, propagate read mark for slot -8 up the control flow graph to instructions 3 and 2.
The key difference is that the new mechanism operates on a control flow graph and associates read and write marks with pairs of (call chain, instruction index). In contrast, the old mechanism operates on verifier states and register parent links, associating read and write marks with verifier states.
Motivation
==========
As it stands, this patch set makes liveness tracking slightly less precise, as it no longer distinguishes individual program paths taken by the verifier during symbolic execution. See the "Impact on verification performance" section for details.
However, this change is intended as a stepping stone toward the following goals:
- Short term, integrate precision tracking into liveness analysis and remove the following code:
  - verifier backedge states accumulation in is_state_visited();
  - most of the logic for precision tracking;
  - jump history tracking.
- Long term, help with more efficient loop verification handling.
Why integrating precision tracking?
-----------------------------------
In a sense, precision tracking is very similar to liveness tracking. The data flow equations for liveness tracking look as follows:
live_after = U [state[s].live_before for s in insn_successors(i)]
state[i].live_before = (live_after / state[i].must_write) U state[i].may_read
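For illustration, applying these equations to the six-instruction example above (tracking only slot -8) gives:

  insn 5 (r0 = *(u64 *)(r10 - 8)): may_read = {-8}, must_write = {}
    live_before(5) = ({} / {}) U {-8} = {-8}
  insn 3 (if r0 != 0 goto +1):     successors are (4) and (5)
    live_after(3)  = live_before(4) U live_before(5) = {} U {-8} = {-8}
    live_before(3) = ({-8} / {}) U {} = {-8}
  insn 2 (*(u64 *)(r10 - 8) = 42): must_write = {-8}
    live_before(2) = ({-8} / {-8}) U {} = {}

So slot -8 is live on entry to instruction (3) (and therefore after instruction (2)), but the write at (2) kills it, so it is not live on entry to (2).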
While data flow equations for precision tracking look as follows:
precise_after = U [state[s].precise_before for s in insn_successors(i)]
// if some of the instruction outputs are precise,
// assume its inputs to be precise
induced_precise =
  ⎧ state[i].may_read   if (state[i].may_write ∩ precise_after) ≠ ∅
  ⎨
  ⎩ ∅                    otherwise
state[i].precise_before = (precise_after / state[i].must_write) ∪ induced_precise
Where:
- `may_read` set represents a union of all possibly read slots (any slot in `may_read` set might be read by the instruction);
- `must_write` set represents an intersection of all possibly written slots (any slot in `must_write` set is guaranteed to be written by the instruction);
- `may_write` set represents a union of all possibly written slots (any slot in `may_write` set might be written by the instruction).
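For example (an illustrative snippet, not taken from the patch set; assume r1 is known only to be either -8 or -16):

  r2 = r10
  r2 += r1                  /* r1 is either -8 or -16 */
  *(u64 *)(r2 + 0) = 0      /* may_write = {-8, -16}, must_write = {}   */
  *(u64 *)(r10 - 8) = 0     /* may_write = {-8},      must_write = {-8} */
  r0 = *(u64 *)(r2 + 0)     /* may_read  = {-8, -16}                    */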
This means that precision tracking can be implemented as a logical extension of liveness tracking:
- track registers as well as stack slots;
- add bit masks to represent `precise_before` and `may_write`;
- add above equations for `precise_before` computation;
- (linked registers require some additional consideration).
Such extension would allow removal of:
- precision propagation logic in verifier.c:
  - backtrack_insn()
  - mark_chain_precision()
  - propagate_{precision,backedges}()
- push_jmp_history() and related data structures, which are only used by precision tracking;
- add_scc_backedge() and related backedge state accumulation in is_state_visited(), superseded by per-callchain function state accumulated by liveness analysis.
The hope here is that unifying liveness and precision tracking will reduce overall amount of code and make it easier to reason about.
How this helps with loops?
--------------------------
As it stands, this patch set shares the same deficiency as the current liveness tracking mechanism. Liveness marks on stack slots cannot be used to prune states when processing iterator-based loops:
- such states still have branches to be explored;
- meaning that not all stack slot reads have been discovered.
For example:
  1: while(iter_next()) {
  2:   if (...)
  3:     r0 = *(u64 *)(r10 - 8)
  4:   if (...)
  5:     r0 = *(u64 *)(r10 - 16)
  6:   ...
  7: }
For any checkpoint state created at instruction (1), it is only possible to rely on read marks for slots fp[-8] and fp[-16] once all child states of (1) have been explored. Thus, when the verifier transitions from (7) to (1), it cannot rely on read marks.
However, sacrificing path-sensitivity makes it possible to run analysis defined in this patch set before main verification pass, if estimates for value ranges are available. E.g. for the following program:
  1: while(iter_next()) {
  2:   r0 = r10
  3:   r0 += r2
  4:   r0 = *(u64 *)(r0 + 0)
  5:   ...
  6: }
If an estimate for `r2` range is available before the main verification pass, it can be used to populate read marks at instruction (4) and run the liveness analysis. Thus making conservative liveness information available during loops verification.
Such estimates can be provided by some form of value range analysis. Value range analysis is also necessary to address loop verification from another angle: computing boundaries for loop induction variables and iteration counts.
The hope here is that the new liveness tracking mechanism will support the broader goal of making loop verification more efficient.
Validation
==========
The change was tested on three program sets:
- bpf selftests
- sched_ext
- Meta's internal set of programs
Commit [#8] enables a special mode where both the current and new liveness analyses are enabled simultaneously. This mode signals an error if the new algorithm considers a stack slot dead while the current algorithm assumes it is alive. This mode was very useful for debugging. At the time of posting, no such errors have been reported for the above program sets.
[#8] "bpf: signal error if old liveness is more conservative than new"
Impact on memory consumption
============================
Debug patch [1] extends the kernel and veristat to count the amount of memory allocated for storing analysis data. This patch is not included in the submission. The maximal observed impact for the above program sets is 2.6Mb.
Data below is shown in bytes.
For bpf selftests top 5 consumers look as follows:
File                     Program           liveness mem
-----------------------  ----------------  ------------
pyperf180.bpf.o          on_event               2629740
pyperf600.bpf.o          on_event               2287662
pyperf100.bpf.o          on_event               1427022
test_verif_scale3.bpf.o  balancer_ingress       1121283
pyperf_subprogs.bpf.o    on_event                756900
For sched_ext top 5 consumers look as follows:
File       Program           liveness mem
---------  ----------------  ------------
bpf.bpf.o  lavd_enqueue            164686
bpf.bpf.o  lavd_select_cpu         157393
bpf.bpf.o  layered_enqueue         154817
bpf.bpf.o  lavd_init               127865
bpf.bpf.o  layered_dispatch        110129
For Meta's internal set of programs top consumer is 1Mb.
[1] https://github.com/kernel-patches/bpf/commit/085588e787b7998a296eb320666897d80bca7c08
Impact on verification performance
==================================
Veristat results below are reported using `-f insns_pct>1 -f !insns<500` filter and -t option (BPF_F_TEST_STATE_FREQ flag).
master vs patch-set, selftests (out of ~4K programs)
----------------------------------------------------
File                              Program                                  Insns (A)  Insns (B)  Insns (DIFF)
--------------------------------  ---------------------------------------  ---------  ---------  ---------------
cpumask_success.bpf.o             test_global_mask_nested_deep_array_rcu        1622       1655  +33 (+2.03%)
strobemeta_bpf_loop.bpf.o         on_event                                       2163       2684  +521 (+24.09%)
test_cls_redirect.bpf.o           cls_redirect                                  36001      42515  +6514 (+18.09%)
test_cls_redirect_dynptr.bpf.o    cls_redirect                                   2299       2339  +40 (+1.74%)
test_cls_redirect_subprogs.bpf.o  cls_redirect                                  69545      78497  +8952 (+12.87%)
test_l4lb_noinline.bpf.o          balancer_ingress                               2993       3084  +91 (+3.04%)
test_xdp_noinline.bpf.o           balancer_ingress_v4                            3539       3616  +77 (+2.18%)
test_xdp_noinline.bpf.o           balancer_ingress_v6                            3608       3685  +77 (+2.13%)
master vs patch-set, sched_ext (out of 148 programs)
----------------------------------------------------
File       Program           Insns (A)  Insns (B)  Insns (DIFF)
---------  ----------------  ---------  ---------  ---------------
bpf.bpf.o  chaos_dispatch         2257       2287  +30 (+1.33%)
bpf.bpf.o  lavd_enqueue          20735      22101  +1366 (+6.59%)
bpf.bpf.o  lavd_select_cpu       22100      24409  +2309 (+10.45%)
bpf.bpf.o  layered_dispatch      25051      25606  +555 (+2.22%)
bpf.bpf.o  p2dq_dispatch           961        990  +29 (+3.02%)
bpf.bpf.o  rusty_quiescent         526        534  +8 (+1.52%)
bpf.bpf.o  rusty_runnable          541        547  +6 (+1.11%)
Perf report
===========
In relative terms, the analysis does not consume much CPU time. For example, here is a perf report collected for pyperf180 selftest:
# Children      Self  Command   Shared Object        Symbol
# ........  ........  ........  ...................  ........................................
...
     1.22%     1.22%  veristat  [kernel.kallsyms]    [k] bpf_update_live_stack
...
Changelog
=========
v1: https://lore.kernel.org/bpf/20250911010437.2779173-1-eddyz87@gmail.com/T/
v1 -> v2:
- compute_postorder() fixed to handle jumps with offset -1 (syzbot).
- is_state_visited() in patch #9 fixed access to uninitialized `err` (kernel test robot, Dan Carpenter).
- Selftests added.
- Fixed bug with write marks propagation from callee to caller, see verifier_live_stack.c:caller_stack_write() test case.
- Added a patch for __not_msg() annotation for test_loader based tests.
v2: https://lore.kernel.org/bpf/20250918-callchain-sensitive-liveness-v2-0-214ed2653eee@gmail.com/
v2 -> v3:
- Added __diag_ignore_all("-Woverride-init", ...) in liveness.c for bpf_insn_successors() (suggested by Alexei).
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
====================
Link: https://patch.msgid.link/20250918-callchain-sensitive-liveness-v3-0-c3cd27bacc60@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
# 79f047c7 | 19-Sep-2025 | Eduard Zingerman <eddyz87@gmail.com>
bpf: table based bpf_insn_successors()
Converting bpf_insn_successors() to use a lookup table makes it ~1.5 times faster.
Also remove unnecessary conditionals:
- `idx + 1 < prog->len` is unnecessary because after check_cfg() all jump targets are guaranteed to be within a program;
- `i == 0 || succ[0] != dst` is unnecessary because any client of bpf_insn_successors() can handle duplicate edges:
  - compute_live_registers()
  - compute_scc()
Moving bpf_insn_successors() to liveness.c allows it to be inlined in liveness.c:__update_stack_liveness(). Such inlining speeds up __update_stack_liveness() by ~40%. bpf_insn_successors() is used in both verifier.c and liveness.c; perf shows the move does not negatively impact its users in verifier.c, as those run only once before the main verification pass, unlike __update_stack_liveness(), which can be triggered multiple times.
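For illustration only, here is a simplified, self-contained sketch of the table-driven idea; the opcode classes, names and struct layout below are made up, and the real implementation in kernel/bpf/liveness.c covers the full BPF instruction set (calls, may_goto, exceptions, etc.):

  /* Toy table-driven successor computation (illustrative only). */
  #include <stdint.h>

  #define SUCC_FALLTHROUGH 0x1   /* insn may continue at idx + 1 */
  #define SUCC_JUMPS       0x2   /* insn may branch to idx + off + 1 */

  enum { OPC_ALU, OPC_COND_JMP, OPC_UNCOND_JMP, OPC_EXIT };

  struct toy_insn { uint8_t opclass; int16_t off; };

  /* one table entry per opcode class: which successor kinds it has */
  static const uint8_t succ_table[] = {
      [OPC_ALU]        = SUCC_FALLTHROUGH,
      [OPC_COND_JMP]   = SUCC_FALLTHROUGH | SUCC_JUMPS,
      [OPC_UNCOND_JMP] = SUCC_JUMPS,
      [OPC_EXIT]       = 0,
  };

  static int toy_insn_successors(const struct toy_insn *insns, uint32_t idx,
                                 uint32_t succ[2])
  {
      uint8_t kinds = succ_table[insns[idx].opclass];
      int cnt = 0;

      if (kinds & SUCC_FALLTHROUGH)
          succ[cnt++] = idx + 1;
      if (kinds & SUCC_JUMPS)
          succ[cnt++] = idx + insns[idx].off + 1;
      /* duplicate edges (e.g. a conditional jump with off == 0) are
       * returned as-is: callers tolerate them */
      return cnt;
  }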
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-10-c3cd27bacc60@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
# b3698c35 | 19-Sep-2025 | Eduard Zingerman <eddyz87@gmail.com>
bpf: callchain sensitive stack liveness tracking using CFG
This commit adds a flow-sensitive, context-sensitive, path-insensitive data flow analysis for live stack slots:
- flow-sensitive: uses program control flow graph to compute data flow values;
- context-sensitive: collects data flow values for each possible call chain in a program;
- path-insensitive: does not distinguish between separate control flow graph paths reaching the same instruction.
Compared to the current path-sensitive analysis, this approach trades some precision for not having to enumerate every path in the program. This gives a theoretical capability to run the analysis before main verification pass. See cover letter for motivation.
The basic idea is as follows:
- Data flow values indicate stack slots that might be read and stack slots that are definitely written.
- Data flow values are collected for each (call chain, instruction number) combination in the program.
- Within a subprogram, data flow values are propagated using control flow graph.
- Data flow values are transferred from entry instructions of callee subprograms to call sites in caller subprograms.
In other words, a tree of all possible call chains is constructed. Each node of this tree represents a subprogram. Read and write marks are collected for each instruction of each node. Live stack slots are first computed for lower level nodes. Then, information about outer stack slots that might be read or are definitely written by a subprogram is propagated one level up, to the corresponding call instructions of the upper nodes. Procedure repeats until root node is processed.
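A rough sketch of the bookkeeping this implies (hypothetical names and layout, not the actual structures in kernel/bpf/liveness.c):

  /* Toy per-call-chain node (illustrative only): read and "must write"
   * masks are kept per instruction of the subprogram the chain ends in;
   * children are the call chains extended by one more call site.
   */
  #include <stdint.h>

  #define TOY_MAX_FRAMES 8

  struct toy_callchain_node {
      uint32_t callsites[TOY_MAX_FRAMES];  /* call insn indexes forming the chain */
      struct toy_callchain_node *child;    /* first chain extending this one */
      struct toy_callchain_node *sibling;  /* next chain sharing the same prefix */
      uint32_t insn_cnt;                   /* instructions in this subprogram */
      uint64_t *may_read;                  /* per-insn mask: slots possibly read */
      uint64_t *must_write;                /* per-insn mask: slots definitely written */
      uint64_t *live_before;               /* per-insn result of the analysis */
  };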
In the absence of value range analysis, stack read/write marks are collected during main verification pass, and data flow computation is triggered each time verifier.c:states_equal() needs to query the information.
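As a sketch of that flow (hypothetical names only, not the kernel API), a liveness query might look like:

  #include <stdbool.h>
  #include <stdint.h>

  struct toy_liveness {
      uint64_t **live_before;  /* [call chain][insn] bitmask of live slots */
  };

  /* propagate accumulated read/write marks over the CFG to a fixed point;
   * assumed to be defined elsewhere in this sketch */
  void toy_propagate_marks(struct toy_liveness *lv);

  static bool toy_stack_slot_live(struct toy_liveness *lv, uint32_t chain,
                                  uint32_t insn, int slot_off)
  {
      /* recompute before answering; the real code would avoid redundant
       * recomputation when no new marks were recorded */
      toy_propagate_marks(lv);
      /* slots are 8 bytes wide: offset -8 -> bit 0, -16 -> bit 1, ... */
      return lv->live_before[chain][insn] & (1ull << ((-slot_off) / 8 - 1));
  }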
Implementation details are documented in kernel/bpf/liveness.c. Quantitative data about verification performance changes and memory consumption is in the cover letter.
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250918-callchain-sensitive-liveness-v3-6-c3cd27bacc60@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>