Revision tags: v6.3-rc5 |
|
#
a033907e |
| 01-Apr-2023 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'Enable RCU semantics for task kptrs'
David Vernet says:
====================
In commit 22df776a9a86 ("tasks: Extract rcu_users out of union"), the 'refcount_t rcu_users' field was ex
Merge branch 'Enable RCU semantics for task kptrs'
David Vernet says:
====================
In commit 22df776a9a86 ("tasks: Extract rcu_users out of union"), the 'refcount_t rcu_users' field was extracted out of a union with the 'struct rcu_head rcu' field. This allows us to use the field for refcounting struct task_struct with RCU protection, as the RCU callback no longer flips rcu_users to be nonzero after the callback is scheduled.
This patch set leverages this to do a few things:
1. Marks struct task_struct as RCU safe in the verifier, allowing referenced kptr tasks stored in maps to be accessed in an RCU read region without acquiring a reference (with just a NULL check). 2. Makes bpf_task_acquire() a KF_ACQUIRE | KF_RCU | KF_RET_NULL kfunc. 3. Removes bpf_task_kptr_get() and bpf_task_acquire_not_zero(), as they're now redundant with the above two changes. 4. Updates selftests and documentation accordingly. --- Changelog: v1: https://lore.kernel.org/all/20230331005733.406202-1-void@manifault.com/ v1 -> v2: - Remove testcases validating nested trust inheritance. The first version used 'struct task_struct __rcu *parent', but because that field has the __rcu tag it functions differently on gcc and llvm and causes gcc selftests to fail. Alexei is reworking nested trust, anyways so let's leave it off for now (Alexei). ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
f85671c6 |
| 31-Mar-2023 |
David Vernet <void@manifault.com> |
bpf: Remove now-defunct task kfuncs
In commit 22df776a9a86 ("tasks: Extract rcu_users out of union"), the 'refcount_t rcu_users' field was extracted out of a union with the 'struct rcu_head rcu' fie
bpf: Remove now-defunct task kfuncs
In commit 22df776a9a86 ("tasks: Extract rcu_users out of union"), the 'refcount_t rcu_users' field was extracted out of a union with the 'struct rcu_head rcu' field. This allows us to safely perform a refcount_inc_not_zero() on task->rcu_users when acquiring a reference on a task struct. A prior patch leveraged this by making struct task_struct an RCU-protected object in the verifier, and by bpf_task_acquire() to use the task->rcu_users field for synchronization.
Now that we can use RCU to protect tasks, we no longer need bpf_task_kptr_get(), or bpf_task_acquire_not_zero(). bpf_task_kptr_get() is truly completely unnecessary, as we can just use RCU to get the object. bpf_task_acquire_not_zero() is now equivalent to bpf_task_acquire().
In addition to these changes, this patch also updates the associated selftests to no longer use these kfuncs.
Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230331195733.699708-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
d02c48fa |
| 31-Mar-2023 |
David Vernet <void@manifault.com> |
bpf: Make struct task_struct an RCU-safe type
struct task_struct objects are a bit interesting in terms of how their lifetime is protected by refcounts. task structs have two refcount fields:
1. re
bpf: Make struct task_struct an RCU-safe type
struct task_struct objects are a bit interesting in terms of how their lifetime is protected by refcounts. task structs have two refcount fields:
1. refcount_t usage: Protects the memory backing the task struct. When this refcount drops to 0, the task is immediately freed, without waiting for an RCU grace period to elapse. This is the field that most callers in the kernel currently use to ensure that a task remains valid while it's being referenced, and is what's currently tracked with bpf_task_acquire() and bpf_task_release().
2. refcount_t rcu_users: A refcount field which, when it drops to 0, schedules an RCU callback that drops a reference held on the 'usage' field above (which is acquired when the task is first created). This field therefore provides a form of RCU protection on the task by ensuring that at least one 'usage' refcount will be held until an RCU grace period has elapsed. The qualifier "a form of" is important here, as a task can remain valid after task->rcu_users has dropped to 0 and the subsequent RCU gp has elapsed.
In terms of BPF, we want to use task->rcu_users to protect tasks that function as referenced kptrs, and to allow tasks stored as referenced kptrs in maps to be accessed with RCU protection.
Let's first determine whether we can safely use task->rcu_users to protect tasks stored in maps. All of the bpf_task* kfuncs can only be called from tracepoint, struct_ops, or BPF_PROG_TYPE_SCHED_CLS, program types. For tracepoint and struct_ops programs, the struct task_struct passed to a program handler will always be trusted, so it will always be safe to call bpf_task_acquire() with any task passed to a program. Note, however, that we must update bpf_task_acquire() to be KF_RET_NULL, as it is possible that the task has exited by the time the program is invoked, even if the pointer is still currently valid because the main kernel holds a task->usage refcount. For BPF_PROG_TYPE_SCHED_CLS, tasks should never be passed as an argument to the any program handlers, so it should not be relevant.
The second question is whether it's safe to use RCU to access a task that was acquired with bpf_task_acquire(), and stored in a map. Because bpf_task_acquire() now uses task->rcu_users, it follows that if the task is present in the map, that it must have had at least one task->rcu_users refcount by the time the current RCU cs was started. Therefore, it's safe to access that task until the end of the current RCU cs.
With all that said, this patch makes struct task_struct is an RCU-protected object. In doing so, we also change bpf_task_acquire() to be KF_ACQUIRE | KF_RCU | KF_RET_NULL, and adjust any selftests as necessary. A subsequent patch will remove bpf_task_kptr_get(), and bpf_task_acquire_not_zero() respectively.
Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230331195733.699708-2-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
cecdd52a |
| 28-Mar-2023 |
Rodrigo Vivi <rodrigo.vivi@intel.com> |
Merge drm/drm-next into drm-intel-next
Catch up with 6.3-rc cycle...
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
|
Revision tags: v6.3-rc4 |
|
#
496f4f1b |
| 26-Mar-2023 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'Don't invoke KPTR_REF destructor on NULL xchg'
David Vernet says:
====================
When a map value is being freed, we loop over all of the fields of the corresponding BPF object
Merge branch 'Don't invoke KPTR_REF destructor on NULL xchg'
David Vernet says:
====================
When a map value is being freed, we loop over all of the fields of the corresponding BPF object and issue the appropriate cleanup calls corresponding to the field's type. If the field is a referenced kptr, we atomically xchg the value out of the map, and invoke the kptr's destructor on whatever was there before.
Currently, we always invoke the destructor (or bpf_obj_drop() for a local kptr) on any kptr, including if no value was xchg'd out of the map. This means that any function serving as the kptr's KF_RELEASE destructor must always treat the argument as possibly NULL, and we invoke unnecessary (and seemingly unsafe) cleanup logic for the local kptr path as well.
This is an odd requirement -- KF_RELEASE kfuncs that are invoked by BPF programs do not have this restriction, and the verifier will fail to load the program if the register containing the to-be-released type has any untrusted modifiers (e.g. PTR_UNTRUSTED or PTR_MAYBE_NULL). So as to simplify the expectations required for a KF_RELEASE kfunc, this patch set updates the KPTR_REF destructor logic to only be invoked when a non-NULL value is xchg'd out of the map.
Additionally, the patch removes now-unnecessary KF_RELEASE calls from several kfuncs, and finally, updates the verifier to have KF_RELEASE automatically imply KF_TRUSTED_ARGS. This restriction was already implicitly happening because of the aforementioned logic in the verifier to reject any regs with untrusted modifiers, and to enforce that KF_RELEASE args are passed with a 0 offset. This change just updates the behavior to match that of other trusted args. This patch is left to the end of the series in case it happens to be controversial, as it arguably is slightly orthogonal to the purpose of the rest of the series. ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
fb2211a5 |
| 25-Mar-2023 |
David Vernet <void@manifault.com> |
bpf: Remove now-unnecessary NULL checks for KF_RELEASE kfuncs
Now that we're not invoking kfunc destructors when the kptr in a map was NULL, we no longer require NULL checks in many of our KF_RELEAS
bpf: Remove now-unnecessary NULL checks for KF_RELEASE kfuncs
Now that we're not invoking kfunc destructors when the kptr in a map was NULL, we no longer require NULL checks in many of our KF_RELEASE kfuncs. This patch removes those NULL checks.
Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230325213144.486885-3-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
e752ab11 |
| 20-Mar-2023 |
Rob Clark <robdclark@chromium.org> |
Merge remote-tracking branch 'drm/drm-next' into msm-next
Merge drm-next into msm-next to pick up external clk and PM dependencies for improved a6xx GPU reset sequence.
Signed-off-by: Rob Clark <ro
Merge remote-tracking branch 'drm/drm-next' into msm-next
Merge drm-next into msm-next to pick up external clk and PM dependencies for improved a6xx GPU reset sequence.
Signed-off-by: Rob Clark <robdclark@chromium.org>
show more ...
|
Revision tags: v6.3-rc3 |
|
#
d26a3a6c |
| 17-Mar-2023 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v6.3-rc2' into next
Merge with mainline to get of_property_present() and other newer APIs.
|
#
283b40c5 |
| 14-Mar-2023 |
Martin KaFai Lau <martin.lau@kernel.org> |
Merge branch 'bpf: Allow helpers access ptr_to_btf_id.'
Alexei Starovoitov says:
====================
From: Alexei Starovoitov <ast@kernel.org>
Allow code like: bpf_strncmp(task->comm, 16, "foo")
Merge branch 'bpf: Allow helpers access ptr_to_btf_id.'
Alexei Starovoitov says:
====================
From: Alexei Starovoitov <ast@kernel.org>
Allow code like: bpf_strncmp(task->comm, 16, "foo"); ====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
show more ...
|
#
c9267aa8 |
| 14-Mar-2023 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Fix bpf_strncmp proto.
bpf_strncmp() doesn't write into its first argument. Make sure that the verifier knows about it.
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David Verne
bpf: Fix bpf_strncmp proto.
bpf_strncmp() doesn't write into its first argument. Make sure that the verifier knows about it.
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230313235845.61029-2-alexei.starovoitov@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
show more ...
|
#
b3c9a041 |
| 13-Mar-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-fixes into drm-misc-fixes
Backmerging to get latest upstream.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
#
a1eccc57 |
| 13-Mar-2023 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-next into drm-misc-next
Backmerging to get v6.3-rc1 and sync with the other DRM trees.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
|
Revision tags: v6.3-rc2 |
|
#
49b5300f |
| 11-Mar-2023 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'Support stashing local kptrs with bpf_kptr_xchg'
Dave Marchevsky says:
====================
Local kptrs are kptrs allocated via bpf_obj_new with a type specified in program BTF. A BP
Merge branch 'Support stashing local kptrs with bpf_kptr_xchg'
Dave Marchevsky says:
====================
Local kptrs are kptrs allocated via bpf_obj_new with a type specified in program BTF. A BPF program which creates a local kptr has exclusive control of the lifetime of the kptr, and, prior to terminating, must:
* free the kptr via bpf_obj_drop * If the kptr is a {list,rbtree} node, add the node to a {list, rbtree}, thereby passing control of the lifetime to the collection
This series adds a third option:
* stash the kptr in a map value using bpf_kptr_xchg
As indicated by the use of "stash" to describe this behavior, the intended use of this feature is temporary storage of local kptrs. For example, a sched_ext ([0]) scheduler may want to create an rbtree node for each new cgroup on cgroup init, but to add that node to the rbtree as part of a separate program which runs on enqueue. Stashing the node in a map_value allows its lifetime to outlive the execution of the cgroup_init program.
Behavior:
There is no semantic difference between adding a kptr to a graph collection and "stashing" it in a map. In both cases exclusive ownership of the kptr's lifetime is passed to some containing data structure, which is responsible for bpf_obj_drop'ing it when the container goes away.
Since graph collections also expect exclusive ownership of the nodes they contain, graph nodes cannot be both stashed in a map_value and contained by their corresponding collection.
Implementation:
Two observations simplify the verifier changes for this feature. First, kptrs ("referenced kptrs" until a recent renaming) require registration of a dtor function as part of their acquire/release semantics, so that a referenced kptr which is placed in a map_value is properly released when the map goes away. We want this exact behavior for local kptrs, but with bpf_obj_drop as the dtor instead of a per-btf_id dtor.
The second observation is that, in terms of identification, "referenced kptr" and "local kptr" already don't interfere with one another. Consider the following example:
struct node_data { long key; long data; struct bpf_rb_node node; };
struct map_value { struct node_data __kptr *node; };
struct { __uint(type, BPF_MAP_TYPE_ARRAY); __type(key, int); __type(value, struct map_value); __uint(max_entries, 1); } some_nodes SEC(".maps");
struct map_value *mapval; struct node_data *res; int key = 0;
res = bpf_obj_new(typeof(*res)); if (!res) { /* err handling */ }
mapval = bpf_map_lookup_elem(&some_nodes, &key); if (!mapval) { /* err handling */ }
res = bpf_kptr_xchg(&mapval->node, res); if (res) bpf_obj_drop(res);
The __kptr tag identifies map_value's node as a referenced kptr, while the PTR_TO_BTF_ID which bpf_obj_new returns - a type in some non-vmlinux, non-module BTF - identifies res as a local kptr. Type tag on the pointer indicates referenced kptr, while the type of the pointee indicates local kptr. So using existing facilities we can tell the verifier about a "referenced kptr" pointer to a "local kptr" pointee.
When kptr_xchg'ing a kptr into a map_value, the verifier can recognize local kptr types and treat them like referenced kptrs with a properly-typed bpf_obj_drop as a dtor.
Other implementation notes: * We don't need to do anything special to enforce "graph nodes cannot be both stashed in a map_value and contained by their corresponding collection" * bpf_kptr_xchg both returns and takes as input a (possibly-null) owning reference. It does not accept non-owning references as input by virtue of requiring a ref_obj_id. By definition, if a program has an owning ref to a node, the node isn't in a collection, so it's safe to pass ownership via bpf_kptr_xchg.
Summary of patches:
* Patch 1 modifies BTF plumbing to support using bpf_obj_drop as a dtor * Patch 2 adds verifier plumbing to support MEM_ALLOC-flagged param for bpf_kptr_xchg * Patch 3 adds selftests exercising the new behavior
Changelog:
v1 -> v2: https://lore.kernel.org/bpf/20230309180111.1618459-1-davemarchevsky@fb.com/
Patch #s used below refer to the patch's position in v1 unless otherwise specified.
Patches 1-3 were applied and are not included in v2. Rebase onto latest bpf-next: "libbpf: Revert poisoning of strlcpy"
Patch 4: "bpf: Support __kptr to local kptrs" * Remove !btf_is_kernel(btf) check, WARN_ON_ONCE instead (Alexei)
Patch 6: "selftests/bpf: Add local kptr stashing test" * Add test which stashes 2 nodes and later unstashes one of them using a separate BPF program (Alexei) * Fix incorrect runner subtest name for original test (was "rbtree_add_nodes") ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
c8e18754 |
| 11-Mar-2023 |
Dave Marchevsky <davemarchevsky@fb.com> |
bpf: Support __kptr to local kptrs
If a PTR_TO_BTF_ID type comes from program BTF - not vmlinux or module BTF - it must have been allocated by bpf_obj_new and therefore must be free'd with bpf_obj_d
bpf: Support __kptr to local kptrs
If a PTR_TO_BTF_ID type comes from program BTF - not vmlinux or module BTF - it must have been allocated by bpf_obj_new and therefore must be free'd with bpf_obj_drop. Such a PTR_TO_BTF_ID is considered a "local kptr" and is tagged with MEM_ALLOC type tag by bpf_obj_new.
This patch adds support for treating __kptr-tagged pointers to "local kptrs" as having an implicit bpf_obj_drop destructor for referenced kptr acquire / release semantics. Consider the following example:
struct node_data { long key; long data; struct bpf_rb_node node; };
struct map_value { struct node_data __kptr *node; };
struct { __uint(type, BPF_MAP_TYPE_ARRAY); __type(key, int); __type(value, struct map_value); __uint(max_entries, 1); } some_nodes SEC(".maps");
If struct node_data had a matching definition in kernel BTF, the verifier would expect a destructor for the type to be registered. Since struct node_data does not match any type in kernel BTF, the verifier knows that there is no kfunc that provides a PTR_TO_BTF_ID to this type, and that such a PTR_TO_BTF_ID can only come from bpf_obj_new. So instead of searching for a registered dtor, a bpf_obj_drop dtor can be assumed.
This allows the runtime to properly destruct such kptrs in bpf_obj_free_fields, which enables maps to clean up map_vals w/ such kptrs when going away.
Implementation notes: * "kernel_btf" variable is renamed to "kptr_btf" in btf_parse_kptr. Before this patch, the variable would only ever point to vmlinux or module BTFs, but now it can point to some program BTF for local kptr type. It's later used to populate the (btf, btf_id) pair in kptr btf field. * It's necessary to btf_get the program BTF when populating btf_field for local kptr. btf_record_free later does a btf_put. * Behavior for non-local referenced kptrs is not modified, as bpf_find_btf_id helper only searches vmlinux and module BTFs for matching BTF type. If such a type is found, btf_field_kptr's btf will pass btf_is_kernel check, and the associated release function is some one-argument dtor. If btf_is_kernel check fails, associated release function is two-arg bpf_obj_drop_impl. Before this patch only btf_field_kptr's w/ kernel or module BTFs were created.
Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230310230743.2320707-2-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
b8fa3e38 |
| 10-Mar-2023 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
Merge remote-tracking branch 'acme/perf-tools' into perf-tools-next
To pick up perf-tools fixes just merged upstream.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
23e403b3 |
| 09-Mar-2023 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'BPF open-coded iterators'
Andrii Nakryiko says:
====================
Add support for open-coded (aka inline) iterators in BPF world. This is a next evolution of gradually allowing mo
Merge branch 'BPF open-coded iterators'
Andrii Nakryiko says:
====================
Add support for open-coded (aka inline) iterators in BPF world. This is a next evolution of gradually allowing more powerful and less restrictive looping and iteration capabilities to BPF programs.
We set up a framework for implementing all kinds of iterators (e.g., cgroup, task, file, etc, iterators), but this patch set only implements numbers iterator, which is used to implement ergonomic bpf_for() for-like construct (see patches #4-#5). We also add bpf_for_each(), which is a generic foreach-like construct that will work with any kind of open-coded iterator implementation, as long as we stick with bpf_iter_<type>_{new,next,destroy}() naming pattern (which we now enforce on the kernel side).
Patch #1 is preparatory refactoring for easier way to check for special kfunc calls. Patch #2 is adding iterator kfunc registration and validation logic, which is mostly independent from the rest of open-coded iterator logic, so is separated out for easier reviewing.
The meat of verifier-side logic is in patch #3. Patch #4 implements numbers iterator. I kept them separate to have clean reference for how to integrate new iterator types (now even simpler to do than in v1 of this patch set). Patch #5 adds bpf_for(), bpf_for_each(), and bpf_repeat() macros to bpf_misc.h, and also adds yet another pyperf test variant, now with bpf_for() loop. Patch #6 is verification tests, based on numbers iterator (as the only available right now). Patch #7 actually tests runtime behavior of numbers iterator.
Finally, with changes in v2, it's possible and trivial to implement custom iterators completely in kernel modules, which we showcase and test by adding a simple iterator returning same number a given number of times to bpf_testmod. Patch #8 is where all this happens and is tested.
Most of the relevant details are in corresponding commit messages or code comments.
v4->v5: - fixing missed inner for() in is_iter_reg_valid_uninit, and fixed return false (kernel test robot); - typo fixes and comment/commit description improvements throughout the patch set; v3->v4: - remove unused variable from is_iter_reg_valid_init (kernel test robot); v2->v3: - remove special kfunc leftovers for bpf_iter_num_{new,next,destroy}; - add iters/testmod_seq* to DENYLIST.s390x, it doesn't support kfuncs in modules yet (CI); v1->v2: - rebased on latest, dropping previously landed preparatory patches; - each iterator type now have its own `struct bpf_iter_<type>` which allows each iterator implementation to use exactly as much stack space as necessary, allowing to avoid runtime allocations (Alexei); - reworked how iterator kfuncs are defined, no verifier changes are required when adding new iterator type; - added bpf_testmod-based iterator implementation; - address the rest of feedback, comments, commit message adjustment, etc.
Cc: Tejun Heo <tj@kernel.org> ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
6018e1f4 |
| 08-Mar-2023 |
Andrii Nakryiko <andrii@kernel.org> |
bpf: implement numbers iterator
Implement the first open-coded iterator type over a range of integers.
It's public API consists of: - bpf_iter_num_new() constructor, which accepts [start, end) ra
bpf: implement numbers iterator
Implement the first open-coded iterator type over a range of integers.
It's public API consists of: - bpf_iter_num_new() constructor, which accepts [start, end) range (that is, start is inclusive, end is exclusive). - bpf_iter_num_next() which will keep returning read-only pointer to int until the range is exhausted, at which point NULL will be returned. If bpf_iter_num_next() is kept calling after this, NULL will be persistently returned. - bpf_iter_num_destroy() destructor, which needs to be called at some point to clean up iterator state. BPF verifier enforces that iterator destructor is called at some point before BPF program exits.
Note that `start = end = X` is a valid combination to setup an empty iterator. bpf_iter_num_new() will return 0 (success) for any such combination.
If bpf_iter_num_new() detects invalid combination of input arguments, it returns error, resets iterator state to, effectively, empty iterator, so any subsequent call to bpf_iter_num_next() will keep returning NULL.
BPF verifier has no knowledge that returned integers are in the [start, end) value range, as both `start` and `end` are not statically known and enforced: they are runtime values.
While the implementation is pretty trivial, some care needs to be taken to avoid overflows and underflows. Subsequent selftests will validate correctness of [start, end) semantics, especially around extremes (INT_MIN and INT_MAX).
Similarly to bpf_loop(), we enforce that no more than BPF_MAX_LOOPS can be specified.
bpf_iter_num_{new,next,destroy}() is a logical evolution from bounded BPF loops and bpf_loop() helper and is the basis for implementing ergonomic BPF loops with no statically known or verified bounds. Subsequent patches implement bpf_for() macro, demonstrating how this can be wrapped into something that works and feels like a normal for() loop in C language.
Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230308184121.1165081-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
36e5e391 |
| 07-Mar-2023 |
Jakub Kicinski <kuba@kernel.org> |
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
==================== pull-request: bpf-next 2023-03-06
We've added 85 non-merge commits
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
==================== pull-request: bpf-next 2023-03-06
We've added 85 non-merge commits during the last 13 day(s) which contain a total of 131 files changed, 7102 insertions(+), 1792 deletions(-).
The main changes are:
1) Add skb and XDP typed dynptrs which allow BPF programs for more ergonomic and less brittle iteration through data and variable-sized accesses, from Joanne Koong.
2) Bigger batch of BPF verifier improvements to prepare for upcoming BPF open-coded iterators allowing for less restrictive looping capabilities, from Andrii Nakryiko.
3) Rework RCU enforcement in the verifier, add kptr_rcu and enforce BPF programs to NULL-check before passing such pointers into kfunc, from Alexei Starovoitov.
4) Add support for kptrs in percpu hashmaps, percpu LRU hashmaps and in local storage maps, from Kumar Kartikeya Dwivedi.
5) Add BPF verifier support for ST instructions in convert_ctx_access() which will help new -mcpu=v4 clang flag to start emitting them, from Eduard Zingerman.
6) Make uprobe attachment Android APK aware by supporting attachment to functions inside ELF objects contained in APKs via function names, from Daniel Müller.
7) Add a new flag BPF_F_TIMER_ABS flag for bpf_timer_start() helper to start the timer with absolute expiration value instead of relative one, from Tero Kristo.
8) Add a new kfunc bpf_cgroup_from_id() to look up cgroups via id, from Tejun Heo.
9) Extend libbpf to support users manually attaching kprobes/uprobes in the legacy/perf/link mode, from Menglong Dong.
10) Implement workarounds in the mips BPF JIT for DADDI/R4000, from Jiaxun Yang.
11) Enable mixing bpf2bpf and tailcalls for the loongarch BPF JIT, from Hengqi Chen.
12) Extend BPF instruction set doc with describing the encoding of BPF instructions in terms of how bytes are stored under big/little endian, from Jose E. Marchesi.
13) Follow-up to enable kfunc support for riscv BPF JIT, from Pu Lehui.
14) Fix bpf_xdp_query() backwards compatibility on old kernels, from Yonghong Song.
15) Fix BPF selftest cross compilation with CLANG_CROSS_FLAGS, from Florent Revest.
16) Improve bpf_cpumask_ma to only allocate one bpf_mem_cache, from Hou Tao.
17) Fix BPF verifier's check_subprogs to not unnecessarily mark a subprogram with has_tail_call, from Ilya Leoshkevich.
18) Fix arm syscall regs spec in libbpf's bpf_tracing.h, from Puranjay Mohan.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (85 commits) selftests/bpf: Add test for legacy/perf kprobe/uprobe attach mode selftests/bpf: Split test_attach_probe into multi subtests libbpf: Add support to set kprobe/uprobe attach mode tools/resolve_btfids: Add /libsubcmd to .gitignore bpf: add support for fixed-size memory pointer returns for kfuncs bpf: generalize dynptr_get_spi to be usable for iters bpf: mark PTR_TO_MEM as non-null register type bpf: move kfunc_call_arg_meta higher in the file bpf: ensure that r0 is marked scratched after any function call bpf: fix visit_insn()'s detection of BPF_FUNC_timer_set_callback helper bpf: clean up visit_insn()'s instruction processing selftests/bpf: adjust log_fixup's buffer size for proper truncation bpf: honor env->test_state_freq flag in is_state_visited() selftests/bpf: enhance align selftest's expected log matching bpf: improve regsafe() checks for PTR_TO_{MEM,BUF,TP_BUFFER} bpf: improve stack slot state printing selftests/bpf: Disassembler tests for verifier.c:convert_ctx_access() selftests/bpf: test if pointer type is tracked for BPF_ST_MEM bpf: allow ctx writes using BPF_ST_MEM instruction bpf: Use separate RCU callbacks for freeing selem ... ====================
Link: https://lore.kernel.org/r/20230307004346.27578-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
Revision tags: v6.3-rc1 |
|
#
db55174d |
| 03-Mar-2023 |
Daniel Borkmann <daniel@iogearbox.net> |
Merge branch 'bpf-kptr-rcu'
Alexei Starovoitov says:
==================== v4->v5: fix typos, add acks.
v3->v4: - patch 3 got much cleaner after BPF_KPTR_RCU was removed as suggested by David.
- m
Merge branch 'bpf-kptr-rcu'
Alexei Starovoitov says:
==================== v4->v5: fix typos, add acks.
v3->v4: - patch 3 got much cleaner after BPF_KPTR_RCU was removed as suggested by David.
- make KF_RCU stronger and require that bpf program checks for NULL before passing such pointers into kfunc. The prog has to do that anyway to access fields and it aligns with BTF_TYPE_SAFE_RCU allowlist.
- New patch 6: refactor RCU enforcement in the verifier. The patches 2,3,6 are part of one feature. The 2 and 3 alone are incomplete, since RCU pointers are barely useful without bpf_rcu_read_lock/unlock in GCC compiled kernel. Even if GCC lands support for btf_type_tag today it will take time to mandate that version for kernel builds. Hence go with allow list approach. See patch 6 for details. This allows to start strict enforcement of TRUSTED | UNTRUSTED in one part of PTR_TO_BTF_ID accesses. One step closer to KF_TRUSTED_ARGS by default.
v2->v3: - Instead of requiring bpf progs to tag fields with __kptr_rcu teach the verifier to infer RCU properties based on the type. BPF_KPTR_RCU becomes kernel internal type of struct btf_field. - Add patch 2 to tag cgroups and dfl_cgrp as trusted. That bug was spotted by BPF CI on clang compiler kernels, since patch 3 is doing: static bool in_rcu_cs(struct bpf_verifier_env *env) { return env->cur_state->active_rcu_lock || !env->prog->aux->sleepable; } which makes all non-sleepable programs behave like they have implicit rcu_read_lock around them. Which is the case in practice. It was fine on gcc compiled kernels where task->cgroup deference was producing PTR_TO_BTF_ID, but on clang compiled kernels task->cgroup deference was producing PTR_TO_BTF_ID | MEM_RCU | MAYBE_NULL, which is more correct, but selftests were failing. Patch 2 fixes this discrepancy. With few more patches like patch 2 we can make KF_TRUSTED_ARGS default for kfuncs and helpers. - Add comment in selftest patch 5 that it's verifier only check.
v1->v2: Instead of agressively allow dereferenced kptr_rcu pointers into KF_TRUSTED_ARGS kfuncs only allow them into KF_RCU funcs. The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with KF_RCU expect either PTR_TRUSTED or MEM_RCU arguments. The verifier guarantees that the objects are valid and there is no use-after-free, but the pointers maybe NULL and pointee object's reference count could have reached zero, hence kfuncs must do != NULL check and consider refcnt==0 case when accessing such arguments. No changes in patch 1. Patches 2,3,4 adjusted with above behavior.
v1: The __kptr_ref turned out to be too limited, since any "trusted" pointer access requires bpf_kptr_xchg() which is impractical when the same pointer needs to be dereferenced by multiple cpus. The __kptr "untrusted" only access isn't very useful in practice. Rename __kptr to __kptr_untrusted with eventual goal to deprecate it, and rename __kptr_ref to __kptr, since that looks to be more common use of kptrs. Introduce __kptr_rcu that can be directly dereferenced and used similar to native kernel C code. Once bpf_cpumask and task_struct kfuncs are converted to observe RCU GP when refcnt goes to zero, both __kptr and __kptr_untrusted can be deprecated and __kptr_rcu can become the only __kptr tag. ====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
show more ...
|
#
20c09d92 |
| 03-Mar-2023 |
Alexei Starovoitov <ast@kernel.org> |
bpf: Introduce kptr_rcu.
The life time of certain kernel structures like 'struct cgroup' is protected by RCU. Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps. Th
bpf: Introduce kptr_rcu.
The life time of certain kernel structures like 'struct cgroup' is protected by RCU. Hence it's safe to dereference them directly from __kptr tagged pointers in bpf maps. The resulting pointer is MEM_RCU and can be passed to kfuncs that expect KF_RCU. Derefrence of other kptr-s returns PTR_UNTRUSTED.
For example: struct map_value { struct cgroup __kptr *cgrp; };
SEC("tp_btf/cgroup_mkdir") int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path) { struct cgroup *cg, *cg2;
cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0 bpf_kptr_xchg(&v->cgrp, cg);
cg2 = v->cgrp; // This is new feature introduced by this patch. // cg2 is PTR_MAYBE_NULL | MEM_RCU. // When cg2 != NULL, it's a valid cgroup, but its percpu_ref could be zero
if (cg2) bpf_cgroup_ancestor(cg2, level); // safe to do. }
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230303041446.3630-4-alexei.starovoitov@gmail.com
show more ...
|
#
f71f8530 |
| 02-Mar-2023 |
Tero Kristo <tero.kristo@linux.intel.com> |
bpf: Add support for absolute value BPF timers
Add a new flag BPF_F_TIMER_ABS that can be passed to bpf_timer_start() to start an absolute value timer instead of the default relative value. This mak
bpf: Add support for absolute value BPF timers
Add a new flag BPF_F_TIMER_ABS that can be passed to bpf_timer_start() to start an absolute value timer instead of the default relative value. This makes the timer expire at an exact point in time, instead of a time with latencies induced by both the BPF and timer subsystems.
Suggested-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: Tero Kristo <tero.kristo@linux.intel.com> Link: https://lore.kernel.org/r/20230302114614.2985072-2-tero.kristo@linux.intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
c501bf55 |
| 02-Mar-2023 |
Tejun Heo <tj@kernel.org> |
bpf: Make bpf_get_current_[ancestor_]cgroup_id() available for all program types
These helpers are safe to call from any context and there's no reason to restrict access to them. Remove them from bp
bpf: Make bpf_get_current_[ancestor_]cgroup_id() available for all program types
These helpers are safe to call from any context and there's no reason to restrict access to them. Remove them from bpf_trace and filter lists and add to bpf_base_func_proto() under perfmon_capable().
v2: After consulting with Andrii, relocated in bpf_base_func_proto() so that they require bpf_capable() but not perfomon_capable() as it doesn't read from or affect others on the system.
Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/ZAD8QyoszMZiTzBY@slm.duckdns.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
c45eac53 |
| 02-Mar-2023 |
Joanne Koong <joannelkoong@gmail.com> |
bpf: Fix bpf_dynptr_slice{_rdwr} to return NULL instead of 0
Change bpf_dynptr_slice and bpf_dynptr_slice_rdwr to return NULL instead of 0, in accordance with the codebase guidelines.
Fixes: 66e3a1
bpf: Fix bpf_dynptr_slice{_rdwr} to return NULL instead of 0
Change bpf_dynptr_slice and bpf_dynptr_slice_rdwr to return NULL instead of 0, in accordance with the codebase guidelines.
Fixes: 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230302053014.1726219-1-joannelkoong@gmail.com
show more ...
|
#
7ce60b11 |
| 01-Mar-2023 |
David Vernet <void@manifault.com> |
bpf: Fix doxygen comments for dynptr slice kfuncs
In commit 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr"), the bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() kfuncs were added
bpf: Fix doxygen comments for dynptr slice kfuncs
In commit 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr"), the bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() kfuncs were added to BPF. These kfuncs included doxygen headers, but unfortunately those headers are not properly formatted according to [0], and causes the following warnings during the docs build:
./kernel/bpf/helpers.c:2225: warning: \ Excess function parameter 'returns' description in 'bpf_dynptr_slice' ./kernel/bpf/helpers.c:2303: warning: \ Excess function parameter 'returns' description in 'bpf_dynptr_slice_rdwr' ...
This patch fixes those doxygen comments.
[0]: https://docs.kernel.org/doc-guide/kernel-doc.html#function-documentation
Fixes: 66e3a13e7c2c ("bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr") Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230301194910.602738-1-void@manifault.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|
#
c4b5c5ba |
| 01-Mar-2023 |
Alexei Starovoitov <ast@kernel.org> |
Merge branch 'Add skb + xdp dynptrs'
Joanne Koong says:
====================
This patchset is the 2nd in the dynptr series. The 1st can be found here [0].
This patchset adds skb and xdp type dynp
Merge branch 'Add skb + xdp dynptrs'
Joanne Koong says:
====================
This patchset is the 2nd in the dynptr series. The 1st can be found here [0].
This patchset adds skb and xdp type dynptrs, which have two main benefits for packet parsing: * allowing operations on sizes that are not statically known at compile-time (eg variable-sized accesses). * more ergonomic and less brittle iteration through data (eg does not need manual if checking for being within bounds of data_end)
When comparing the differences in runtime for packet parsing without dynptrs vs. with dynptrs, there is no noticeable difference. Patch 9 contains more details as well as examples of how to use skb and xdp dynptrs.
[0] https://lore.kernel.org/bpf/20220523210712.3641569-1-joannelkoong@gmail.com/ --- Changelog:
v12 = https://lore.kernel.org/bpf/20230226085120.3907863-1-joannelkoong@gmail.com/ v12 -> v13: * Fix missing { } for case statement
v11 = https://lore.kernel.org/bpf/20230222060747.2562549-1-joannelkoong@gmail.com/ v11 -> v12: * Change constant mem size checking to use "__szk" kfunc annotation for slices * Use autoloading for success selftests
v10 = https://lore.kernel.org/bpf/20230216225524.1192789-1-joannelkoong@gmail.com/ v10 -> v11: * Reject bpf_dynptr_slice_rdwr() for non-writable progs at load time instead of runtime * Add additional patch (__uninit kfunc annotation) * Expand on documentation * Add bpf_dynptr_write() calls for persisting writes in tests
v9 = https://lore.kernel.org/bpf/20230127191703.3864860-1-joannelkoong@gmail.com/ v9 -> v10: * Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr interface * Add some more tests * Split up patchset into more parts to make it easier to review
v8 = https://lore.kernel.org/bpf/20230126233439.3739120-1-joannelkoong@gmail.com/ v8 -> v9: * Fix dynptr_get_type() to check non-stack dynptrs
v7 = https://lore.kernel.org/bpf/20221021011510.1890852-1-joannelkoong@gmail.com/ v7 -> v8: * Change helpers to kfuncs * Add 2 new patches (1/5 and 2/5)
v6 = https://lore.kernel.org/bpf/20220907183129.745846-1-joannelkoong@gmail.com/ v6 -> v7 * Change bpf_dynptr_data() to return read-only data slices if the skb prog is read-only (Martin) * Add test "skb_invalid_write" to test that writes to rd-only data slices are rejected
v5 = https://lore.kernel.org/bpf/20220831183224.3754305-1-joannelkoong@gmail.com/ v5 -> v6 * Address kernel test robot errors by static inlining
v4 = https://lore.kernel.org/bpf/20220822235649.2218031-1-joannelkoong@gmail.com/ v4 -> v5 * Address kernel test robot errors for configs w/out CONFIG_NET set * For data slices, return PTR_TO_MEM instead of PTR_TO_PACKET (Kumar) * Split selftests into subtests (Andrii) * Remove insn patching. Use rdonly and rdwr protos for dynptr skb construction (Andrii) * bpf_dynptr_data() returns NULL for rd-only dynptrs. There will be a separate bpf_dynptr_data_rdonly() added later (Andrii and Kumar)
v3 = https://lore.kernel.org/bpf/20220822193442.657638-1-joannelkoong@gmail.com/ v3 -> v4 * Forgot to commit --amend the kernel test robot error fixups
v2 = https://lore.kernel.org/bpf/20220811230501.2632393-1-joannelkoong@gmail.com/ v2 -> v3 * Fix kernel test robot build test errors
v1 = https://lore.kernel.org/bpf/20220726184706.954822-1-joannelkoong@gmail.com/ v1 -> v2 * Return data slices to rd-only skb dynptrs (Martin) * bpf_dynptr_write allows writes to frags for skb dynptrs, but always invalidates associated data slices (Martin) * Use switch casing instead of ifs (Andrii) * Use 0xFD for experimental kind number in the selftest (Zvi) * Put selftest conversions w/ dynptrs into new files (Alexei) * Add new selftest "test_cls_redirect_dynptr.c" ====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
show more ...
|