token.c - OpenGrok history log for /linux/kernel/bpf/token.c

Revision (<<< Hide revision tags) (Show revision tags >>>)	Date	Author	Comments
# a23e1966	15-Jul-2024	Dmitry Torokhov <dmitry.torokhov@gmail.com>	Merge branch 'next' into for-linus Prepare input updates for 6.11 merge window.
Revision tags: v6.10, v6.10-rc7, v6.10-rc6, v6.10-rc5, v6.10-rc4, v6.10-rc3, v6.10-rc2
# 6f47c7ae	28-May-2024	Dmitry Torokhov <dmitry.torokhov@gmail.com>	Merge tag 'v6.9' into next Sync up with the mainline to bring in the new cleanup API.
Revision tags: v6.10-rc1
# 60a2f25d	16-May-2024	Tvrtko Ursulin <tursulin@ursulin.net>	Merge drm/drm-next into drm-intel-gt-next Some display refactoring patches are needed in order to allow conflict- less merging. Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
# 594ce0b8	10-Jun-2024	Russell King (Oracle) <rmk+kernel@armlinux.org.uk>	Merge topic branches 'clkdev' and 'fixes' into for-linus
Revision tags: v6.9, v6.9-rc7, v6.9-rc6, v6.9-rc5, v6.9-rc4, v6.9-rc3, v6.9-rc2, v6.9-rc1
# b228ab57	18-Mar-2024	Andrew Morton <akpm@linux-foundation.org>	Merge branch 'master' into mm-stable
# 79790b68	12-Apr-2024	Thomas Hellström <thomas.hellstrom@linux.intel.com>	Merge drm/drm-next into drm-xe-next Backmerging drm-next in order to get up-to-date and in particular to access commit 9ca5facd0400f610f3f7f71aeb7fc0b949a48c67. Signed-off-by: Thomas Hellström <tho Merge drm/drm-next into drm-xe-next Backmerging drm-next in order to get up-to-date and in particular to access commit 9ca5facd0400f610f3f7f71aeb7fc0b949a48c67. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> show more ...
# 3e5a516f	08-Apr-2024	Dmitry Baryshkov <dmitry.baryshkov@linaro.org>	Merge tag 'phy_dp_modes_6.10' into msm-next-lumag Merge DisplayPort subnode API in order to allow DisplayPort driver to configure the PHYs either to the DP or eDP mode, depending on hardware configu Merge tag 'phy_dp_modes_6.10' into msm-next-lumag Merge DisplayPort subnode API in order to allow DisplayPort driver to configure the PHYs either to the DP or eDP mode, depending on hardware configuration. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> show more ...
# 5add703f	02-Apr-2024	Rodrigo Vivi <rodrigo.vivi@intel.com>	Merge drm/drm-next into drm-intel-next Catching up on 6.9-rc2 Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
# 0d21364c	02-Apr-2024	Thomas Zimmermann <tzimmermann@suse.de>	Merge drm/drm-next into drm-misc-next Backmerging to get v6.9-rc2 changes into drm-misc-next. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
# b7e1e969	26-Mar-2024	Takashi Iwai <tiwai@suse.de>	Merge branch 'topic/sound-devel-6.10' into for-next
# f4566a1e	25-Mar-2024	Ingo Molnar <mingo@kernel.org>	Merge tag 'v6.9-rc1' into sched/core, to pick up fixes and to refresh the branch Signed-off-by: Ingo Molnar <mingo@kernel.org>
# 100c8542	05-Apr-2024	Takashi Iwai <tiwai@suse.de>	Merge tag 'asoc-fix-v6.9-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.9 A relatively large set of fixes here, the biggest piece of it is a Merge tag 'asoc-fix-v6.9-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus ASoC: Fixes for v6.9 A relatively large set of fixes here, the biggest piece of it is a series correcting some problems with the delay reporting for Intel SOF cards but there's a bunch of other things. Everything here is driver specific except for a fix in the core for an issue with sign extension handling volume controls. show more ...
# 36a1818f	25-Mar-2024	Thomas Zimmermann <tzimmermann@suse.de>	Merge drm/drm-fixes into drm-misc-fixes Backmerging to get drm-misc-fixes to the state of v6.9-rc1. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
# 9187210e	13-Mar-2024	Linus Torvalds <torvalds@linux-foundation.org>	Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Large effort by Eric to lower rtnl_lo Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Large effort by Eric to lower rtnl_lock pressure and remove locks: - Make commonly used parts of rtnetlink (address, route dumps etc) lockless, protected by RCU instead of rtnl_lock. - Add a netns exit callback which already holds rtnl_lock, allowing netns exit to take rtnl_lock once in the core instead of once for each driver / callback. - Remove locks / serialization in the socket diag interface. - Remove 6 calls to synchronize_rcu() while holding rtnl_lock. - Remove the dev_base_lock, depend on RCU where necessary. - Support busy polling on a per-epoll context basis. Poll length and budget parameters can be set independently of system defaults. - Introduce struct net_hotdata, to make sure read-mostly global config variables fit in as few cache lines as possible. - Add optional per-nexthop statistics to ease monitoring / debug of ECMP imbalance problems. - Support TCP_NOTSENT_LOWAT in MPTCP. - Ensure that IPv6 temporary addresses' preferred lifetimes are long enough, compared to other configured lifetimes, and at least 2 sec. - Support forwarding of ICMP Error messages in IPSec, per RFC 4301. - Add support for the independent control state machine for bonding per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled control state machine. - Add "network ID" to MCTP socket APIs to support hosts with multiple disjoint MCTP networks. - Re-use the mono_delivery_time skbuff bit for packets which user space wants to be sent at a specified time. Maintain the timing information while traversing veth links, bridge etc. - Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets. - Simplify many places iterating over netdevs by using an xarray instead of a hash table walk (hash table remains in place, for use on fastpaths). - Speed up scanning for expired routes by keeping a dedicated list. - Speed up "generic" XDP by trying harder to avoid large allocations. - Support attaching arbitrary metadata to netconsole messages. Things we sprinkled into general kernel code: - Enforce VM_IOREMAP flag and range in ioremap_page_range and introduce VM_SPARSE kind and vm_area_[un]map_pages (used by bpf_arena). - Rework selftest harness to enable the use of the full range of ksft exit code (pass, fail, skip, xfail, xpass). Netfilter: - Allow userspace to define a table that is exclusively owned by a daemon (via netlink socket aliveness) without auto-removing this table when the userspace program exits. Such table gets marked as orphaned and a restarting management daemon can re-attach/regain ownership. - Speed up element insertions to nftables' concatenated-ranges set type. Compact a few related data structures. BPF: - Add BPF token support for delegating a subset of BPF subsystem functionality from privileged system-wide daemons such as systemd through special mount options for userns-bound BPF fs to a trusted & unprivileged application. - Introduce bpf_arena which is sparse shared memory region between BPF program and user space where structures inside the arena can have pointers to other areas of the arena, and pointers work seamlessly for both user-space programs and BPF programs. - Introduce may_goto instruction that is a contract between the verifier and the program. The verifier allows the program to loop assuming it's behaving well, but reserves the right to terminate it. - Extend the BPF verifier to enable static subprog calls in spin lock critical sections. - Support registration of struct_ops types from modules which helps projects like fuse-bpf that seeks to implement a new struct_ops type. - Add support for retrieval of cookies for perf/kprobe multi links. - Support arbitrary TCP SYN cookie generation / validation in the TC layer with BPF to allow creating SYN flood handling in BPF firewalls. - Add code generation to inline the bpf_kptr_xchg() helper which improves performance when stashing/popping the allocated BPF objects. Wireless: - Add SPP (signaling and payload protected) AMSDU support. - Support wider bandwidth OFDMA, as required for EHT operation. Driver API: - Major overhaul of the Energy Efficient Ethernet internals to support new link modes (2.5GE, 5GE), share more code between drivers (especially those using phylib), and encourage more uniform behavior. Convert and clean up drivers. - Define an API for querying per netdev queue statistics from drivers. - IPSec: account in global stats for fully offloaded sessions. - Create a concept of Ethernet PHY Packages at the Device Tree level, to allow parameterizing the existing PHY package code. - Enable Rx hashing (RSS) on GTP protocol fields. Misc: - Improvements and refactoring all over networking selftests. - Create uniform module aliases for TC classifiers, actions, and packet schedulers to simplify creating modprobe policies. - Address all missing MODULE_DESCRIPTION() warnings in networking. - Extend the Netlink descriptions in YAML to cover message encapsulation or "Netlink polymorphism", where interpretation of nested attributes depends on link type, classifier type or some other "class type". Drivers: - Ethernet high-speed NICs: - Add a new driver for Marvell's Octeon PCI Endpoint NIC VF. - Intel (100G, ice, idpf): - support E825-C devices - nVidia/Mellanox: - support devices with one port and multiple PCIe links - Broadcom (bnxt): - support n-tuple filters - support configuring the RSS key - Wangxun (ngbe/txgbe): - implement irq_domain for TXGBE's sub-interrupts - Pensando/AMD: - support XDP - optimize queue submission and wakeup handling (+17% bps) - optimize struct layout, saving 28% of memory on queues - Ethernet NICs embedded and virtual: - Google cloud vNIC: - refactor driver to perform memory allocations for new queue config before stopping and freeing the old queue memory - Synopsys (stmmac): - obey queueMaxSDU and implement counters required by 802.1Qbv - Renesas (ravb): - support packet checksum offload - suspend to RAM and runtime PM support - Ethernet switches: - nVidia/Mellanox: - support for nexthop group statistics - Microchip: - ksz8: implement PHY loopback - add support for KSZ8567, a 7-port 10/100Mbps switch - PTP: - New driver for RENESAS FemtoClock3 Wireless clock generator. - Support OCP PTP cards designed and built by Adva. - CAN: - Support recvmsg() flags for own, local and remote traffic on CAN BCM sockets. - Support for esd GmbH PCIe/402 CAN device family. - m_can: - Rx/Tx submission coalescing - wake on frame Rx - WiFi: - Intel (iwlwifi): - enable signaling and payload protected A-MSDUs - support wider-bandwidth OFDMA - support for new devices - bump FW API to 89 for AX devices; 90 for BZ/SC devices - MediaTek (mt76): - mt7915: newer ADIE version support - mt7925: radio temperature sensor support - Qualcomm (ath11k): - support 6 GHz station power modes: Low Power Indoor (LPI), Standard Power) SP and Very Low Power (VLP) - QCA6390 & WCN6855: support 2 concurrent station interfaces - QCA2066 support - Qualcomm (ath12k): - refactoring in preparation for Multi-Link Operation (MLO) support - 1024 Block Ack window size support - firmware-2.bin support - support having multiple identical PCI devices (firmware needs to have ATH12K_FW_FEATURE_MULTI_QRTR_ID) - QCN9274: support split-PHY devices - WCN7850: enable Power Save Mode in station mode - WCN7850: P2P support - RealTek: - rtw88: support for more rtw8811cu and rtw8821cu devices - rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL - rtlwifi: speed up USB firmware initialization - rtwl8xxxu: - RTL8188F: concurrent interface support - Channel Switch Announcement (CSA) support in AP mode - Broadcom (brcmfmac): - per-vendor feature support - per-vendor SAE password setup - DMI nvram filename quirk for ACEPC W5 Pro" * tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits) nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y nexthop: Fix out-of-bounds access during attribute validation nexthop: Only parse NHA_OP_FLAGS for dump messages that require it nexthop: Only parse NHA_OP_FLAGS for get messages that require it bpf: move sleepable flag from bpf_prog_aux to bpf_prog bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes() selftests/bpf: Add kprobe multi triggering benchmarks ptp: Move from simple ida to xarray vxlan: Remove generic .ndo_get_stats64 vxlan: Do not alloc tstats manually devlink: Add comments to use netlink gen tool nfp: flower: handle acti_netdevs allocation failure net/packet: Add getsockopt support for PACKET_COPY_THRESH net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID selftests/bpf: Add bpf_arena_htab test. selftests/bpf: Add bpf_arena_list test. selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages bpf: Add helper macro bpf_addr_space_cast() libbpf: Recognize __arena global variables. bpftool: Recognize arena map type ... show more ...
Revision tags: v6.8, v6.8-rc7
# 4b2765ae	03-Mar-2024	Jakub Kicinski <kuba@kernel.org>	Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-02-29 We've added 119 non-merge commit Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-02-29 We've added 119 non-merge commits during the last 32 day(s) which contain a total of 150 files changed, 3589 insertions(+), 995 deletions(-). The main changes are: 1) Extend the BPF verifier to enable static subprog calls in spin lock critical sections, from Kumar Kartikeya Dwivedi. 2) Fix confusing and incorrect inference of PTR_TO_CTX argument type in BPF global subprogs, from Andrii Nakryiko. 3) Larger batch of riscv BPF JIT improvements and enabling inlining of the bpf_kptr_xchg() for RV64, from Pu Lehui. 4) Allow skeleton users to change the values of the fields in struct_ops maps at runtime, from Kui-Feng Lee. 5) Extend the verifier's capabilities of tracking scalars when they are spilled to stack, especially when the spill or fill is narrowing, from Maxim Mikityanskiy & Eduard Zingerman. 6) Various BPF selftest improvements to fix errors under gcc BPF backend, from Jose E. Marchesi. 7) Avoid module loading failure when the module trying to register a struct_ops has its BTF section stripped, from Geliang Tang. 8) Annotate all kfuncs in .BTF_ids section which eventually allows for automatic kfunc prototype generation from bpftool, from Daniel Xu. 9) Several updates to the instruction-set.rst IETF standardization document, from Dave Thaler. 10) Shrink the size of struct bpf_map resp. bpf_array, from Alexei Starovoitov. 11) Initial small subset of BPF verifier prepwork for sleepable bpf_timer, from Benjamin Tissoires. 12) Fix bpftool to be more portable to musl libc by using POSIX's basename(), from Arnaldo Carvalho de Melo. 13) Add libbpf support to gcc in CORE macro definitions, from Cupertino Miranda. 14) Remove a duplicate type check in perf_event_bpf_event, from Florian Lehner. 15) Fix bpf_spin_{un,}lock BPF helpers to actually annotate them with notrace correctly, from Yonghong Song. 16) Replace the deprecated bpf_lpm_trie_key 0-length array with flexible array to fix build warnings, from Kees Cook. 17) Fix resolve_btfids cross-compilation to non host-native endianness, from Viktor Malik. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits) selftests/bpf: Test if shadow types work correctly. bpftool: Add an example for struct_ops map and shadow type. bpftool: Generated shadow variables for struct_ops maps. libbpf: Convert st_ops->data to shadow type. libbpf: Set btf_value_type_id of struct bpf_map for struct_ops. bpf: Replace bpf_lpm_trie_key 0-length array with flexible array bpf, arm64: use bpf_prog_pack for memory management arm64: patching: implement text_poke API bpf, arm64: support exceptions arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT bpf: add is_async_callback_calling_insn() helper bpf: introduce in_sleepable() helper bpf: allow more maps in sleepable bpf programs selftests/bpf: Test case for lacking CFI stub functions. bpf: Check cfi_stubs before registering a struct_ops type. bpf: Clarify batch lookup/lookup_and_delete semantics bpf, docs: specify which BPF_ABS and BPF_IND fields were zero bpf, docs: Fix typos in instruction-set.rst selftests/bpf: update tcp_custom_syncookie to use scalar packet offset bpf: Shrink size of struct bpf_map/bpf_array. ... ==================== Link: https://lore.kernel.org/r/20240301001625.8800-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> show more ...
Revision tags: v6.8-rc6, v6.8-rc5, v6.8-rc4, v6.8-rc3, v6.8-rc2
# 6668e818	27-Jan-2024	Haiyue Wang <haiyue.wang@intel.com>	bpf,token: Use BIT_ULL() to convert the bit mask Replace the '(1ULL << )' with the macro BIT_ULL(nr). Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Signed-off-by: Andrii Nakryiko <andrii@kern bpf,token: Use BIT_ULL() to convert the bit mask Replace the '(1ULL << )' with the macro BIT_ULL(nr). Signed-off-by: Haiyue Wang <haiyue.wang@intel.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240127134901.3698613-1-haiyue.wang@intel.com show more ...
# 92046e83	27-Jan-2024	Jakub Kicinski <kuba@kernel.org>	Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-01-26 We've added 107 non-merge commit Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2024-01-26 We've added 107 non-merge commits during the last 4 day(s) which contain a total of 101 files changed, 6009 insertions(+), 1260 deletions(-). The main changes are: 1) Add BPF token support to delegate a subset of BPF subsystem functionality from privileged system-wide daemons such as systemd through special mount options for userns-bound BPF fs to a trusted & unprivileged application. With addressed changes from Christian and Linus' reviews, from Andrii Nakryiko. 2) Support registration of struct_ops types from modules which helps projects like fuse-bpf that seeks to implement a new struct_ops type, from Kui-Feng Lee. 3) Add support for retrieval of cookies for perf/kprobe multi links, from Jiri Olsa. 4) Bigger batch of prep-work for the BPF verifier to eventually support preserving boundaries and tracking scalars on narrowing fills, from Maxim Mikityanskiy. 5) Extend the tc BPF flavor to support arbitrary TCP SYN cookies to help with the scenario of SYN floods, from Kuniyuki Iwashima. 6) Add code generation to inline the bpf_kptr_xchg() helper which improves performance when stashing/popping the allocated BPF objects, from Hou Tao. 7) Extend BPF verifier to track aligned ST stores as imprecise spilled registers, from Yonghong Song. 8) Several fixes to BPF selftests around inline asm constraints and unsupported VLA code generation, from Jose E. Marchesi. 9) Various updates to the BPF IETF instruction set draft document such as the introduction of conformance groups for instructions, from Dave Thaler. 10) Fix BPF verifier to make infinite loop detection in is_state_visited() exact to catch some too lax spill/fill corner cases, from Eduard Zingerman. 11) Refactor the BPF verifier pointer ALU check to allow ALU explicitly instead of implicitly for various register types, from Hao Sun. 12) Fix the flaky tc_redirect_dtime BPF selftest due to slowness in neighbor advertisement at setup time, from Martin KaFai Lau. 13) Change BPF selftests to skip callback tests for the case when the JIT is disabled, from Tiezhu Yang. 14) Add a small extension to libbpf which allows to auto create a map-in-map's inner map, from Andrey Grafin. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (107 commits) selftests/bpf: Add missing line break in test_verifier bpf, docs: Clarify definitions of various instructions bpf: Fix error checks against bpf_get_btf_vmlinux(). bpf: One more maintainer for libbpf and BPF selftests selftests/bpf: Incorporate LSM policy to token-based tests selftests/bpf: Add tests for LIBBPF_BPF_TOKEN_PATH envvar libbpf: Support BPF token path setting through LIBBPF_BPF_TOKEN_PATH envvar selftests/bpf: Add tests for BPF object load with implicit token selftests/bpf: Add BPF object loading tests with explicit token passing libbpf: Wire up BPF token support at BPF object level libbpf: Wire up token_fd into feature probing logic libbpf: Move feature detection code into its own file libbpf: Further decouple feature checking logic from bpf_object libbpf: Split feature detectors definitions from cached results selftests/bpf: Utilize string values for delegate_xxx mount options bpf: Support symbolic BPF FS delegation mount options bpf: Fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS bpf,selinux: Allocate bpf_security_struct per BPF token selftests/bpf: Add BPF token-enabled tests libbpf: Add BPF token support to bpf_prog_load() API ... ==================== Link: https://lore.kernel.org/r/20240126215710.19855-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> show more ...
# c8632acf	24-Jan-2024	Andrii Nakryiko <andrii@kernel.org>	Merge branch 'bpf-token' Andrii Nakryiko says: ==================== BPF token This patch set is a combination of three BPF token-related patch sets ([0], [1], [2]) with fixes ([3]) to kernel-side Merge branch 'bpf-token' Andrii Nakryiko says: ==================== BPF token This patch set is a combination of three BPF token-related patch sets ([0], [1], [2]) with fixes ([3]) to kernel-side token_fd passing APIs incorporated into relevant patches, bpf_token_capable() changes requested by Christian Brauner, and necessary libbpf and BPF selftests side adjustments. This patch set introduces an ability to delegate a subset of BPF subsystem functionality from privileged system-wide daemon (e.g., systemd or any other container manager) through special mount options for userns-bound BPF FS to a trusted unprivileged application. Trust is the key here. This functionality is not about allowing unconditional unprivileged BPF usage. Establishing trust, though, is completely up to the discretion of respective privileged application that would create and mount a BPF FS instance with delegation enabled, as different production setups can and do achieve it through a combination of different means (signing, LSM, code reviews, etc), and it's undesirable and infeasible for kernel to enforce any particular way of validating trustworthiness of particular process. The main motivation for this work is a desire to enable containerized BPF applications to be used together with user namespaces. This is currently impossible, as CAP_BPF, required for BPF subsystem usage, cannot be namespaced or sandboxed, as a general rule. E.g., tracing BPF programs, thanks to BPF helpers like bpf_probe_read_kernel() and bpf_probe_read_user() can safely read arbitrary memory, and it's impossible to ensure that they only read memory of processes belonging to any given namespace. This means that it's impossible to have a mechanically verifiable namespace-aware CAP_BPF capability, and as such another mechanism to allow safe usage of BPF functionality is necessary. BPF FS delegation mount options and BPF token derived from such BPF FS instance is such a mechanism. Kernel makes no assumption about what "trusted" constitutes in any particular case, and it's up to specific privileged applications and their surrounding infrastructure to decide that. What kernel provides is a set of APIs to setup and mount special BPF FS instance and derive BPF tokens from it. BPF FS and BPF token are both bound to its owning userns and in such a way are constrained inside intended container. Users can then pass BPF token FD to privileged bpf() syscall commands, like BPF map creation and BPF program loading, to perform such operations without having init userns privileges. This version incorporates feedback and suggestions ([4]) received on earlier iterations of BPF token approach, and instead of allowing to create BPF tokens directly assuming capable(CAP_SYS_ADMIN), we instead enhance BPF FS to accept a few new delegation mount options. If these options are used and BPF FS itself is properly created, set up, and mounted inside the user namespaced container, user application is able to derive a BPF token object from BPF FS instance, and pass that token to bpf() syscall. As explained in patch #3, BPF token itself doesn't grant access to BPF functionality, but instead allows kernel to do namespaced capabilities checks (ns_capable() vs capable()) for CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, and CAP_SYS_ADMIN, as applicable. So it forms one half of a puzzle and allows container managers and sys admins to have safe and flexible configuration options: determining which containers get delegation of BPF functionality through BPF FS, and then which applications within such containers are allowed to perform bpf() commands, based on namespaces capabilities. Previous attempt at addressing this very same problem ([5]) attempted to utilize authoritative LSM approach, but was conclusively rejected by upstream LSM maintainers. BPF token concept is not changing anything about LSM approach, but can be combined with LSM hooks for very fine-grained security policy. Some ideas about making BPF token more convenient to use with LSM (in particular custom BPF LSM programs) was briefly described in recent LSF/MM/BPF 2023 presentation ([6]). E.g., an ability to specify user-provided data (context), which in combination with BPF LSM would allow implementing a very dynamic and fine-granular custom security policies on top of BPF token. In the interest of minimizing API surface area and discussions this was relegated to follow up patches, as it's not essential to the fundamental concept of delegatable BPF token. It should be noted that BPF token is conceptually quite similar to the idea of /dev/bpf device file, proposed by Song a while ago ([7]). The biggest difference is the idea of using virtual anon_inode file to hold BPF token and allowing multiple independent instances of them, each (potentially) with its own set of restrictions. And also, crucially, BPF token approach is not using any special stateful task-scoped flags. Instead, bpf() syscall accepts token_fd parameters explicitly for each relevant BPF command. This addresses main concerns brought up during the /dev/bpf discussion, and fits better with overall BPF subsystem design. Second part of this patch set adds full support for BPF token in libbpf's BPF object high-level API. Good chunk of the changes rework libbpf feature detection internals, which are the most affected by BPF token presence. Besides internal refactorings, libbpf allows to pass location of BPF FS from which BPF token should be created by libbpf. This can be done explicitly though a new bpf_object_open_opts.bpf_token_path field. But we also add implicit BPF token creation logic to BPF object load step, even without any explicit involvement of the user. If the environment is setup properly, BPF token will be created transparently and used implicitly. This allows for all existing application to gain BPF token support by just linking with latest version of libbpf library. No source code modifications are required. All that under assumption that privileged container management agent properly set up default BPF FS instance at /sys/bpf/fs to allow BPF token creation. libbpf adds support to override default BPF FS location for BPF token creation through LIBBPF_BPF_TOKEN_PATH envvar knowledge. This allows admins or container managers to mount BPF token-enabled BPF FS at non-standard location without the need to coordinate with applications. LIBBPF_BPF_TOKEN_PATH can also be used to disable BPF token implicit creation by setting it to an empty value. [0] https://patchwork.kernel.org/project/netdevbpf/list/?series=805707&state=* [1] https://patchwork.kernel.org/project/netdevbpf/list/?series=810260&state=* [2] https://patchwork.kernel.org/project/netdevbpf/list/?series=809800&state=* [3] https://patchwork.kernel.org/project/netdevbpf/patch/20231219053150.336991-1-andrii@kernel.org/ [4] https://lore.kernel.org/bpf/20230704-hochverdient-lehne-eeb9eeef785e@brauner/ [5] https://lore.kernel.org/bpf/20230412043300.360803-1-andrii@kernel.org/ [6] http://vger.kernel.org/bpfconf2023_material/Trusted_unprivileged_BPF_LSFMM2023.pdf [7] https://lore.kernel.org/bpf/20190627201923.2589391-2-songliubraving@fb.com/ v1->v2: - disable BPF token creation in init userns, and simplify bpf_token_capable() logic (Christian); - use kzalloc/kfree instead of kvzalloc/kvfree (Linus); - few more selftest cases to validate LSM and BPF token interations. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> ==================== Link: https://lore.kernel.org/r/20240124022127.2379740-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> show more ...
# aeaa97b0	24-Jan-2024	Andrii Nakryiko <andrii@kernel.org>	bpf: Fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS It's quite confusing in practice when it's possible to successfully create a BPF token from BPF FS that didn't have any of delega bpf: Fail BPF_TOKEN_CREATE if no delegation option was set on BPF FS It's quite confusing in practice when it's possible to successfully create a BPF token from BPF FS that didn't have any of delegate_xxx mount options set up. While it's not wrong, it's actually more meaningful to reject BPF_TOKEN_CREATE with specific error code (-ENOENT) to let user-space know that no token delegation is setup up. So, instead of creating empty BPF token that will be always ignored because it doesn't have any of the allow_xxx bits set, reject it with -ENOENT. If we ever need empty BPF token to be possible, we can support that with extra flag passed into BPF_TOKEN_CREATE. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-19-andrii@kernel.org show more ...
# f568a3d4	24-Jan-2024	Andrii Nakryiko <andrii@kernel.org>	bpf,lsm: Add BPF token LSM hooks Wire up bpf_token_create and bpf_token_free LSM hooks, which allow to allocate LSM security blob (we add `void security` field to struct bpf_token for that), but al bpf,lsm: Add BPF token LSM hooks Wire up bpf_token_create and bpf_token_free LSM hooks, which allow to allocate LSM security blob (we add `void security` field to struct bpf_token for that), but also control who can instantiate BPF token. This follows existing pattern for BPF map and BPF prog. Also add security_bpf_token_allow_cmd() and security_bpf_token_capable() LSM hooks that allow LSM implementation to control and negate (if necessary) BPF token's delegation of a specific bpf_cmd and capability, respectively. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/bpf/20240124022127.2379740-12-andrii@kernel.org show more ...
# caf8f28e	24-Jan-2024	Andrii Nakryiko <andrii@kernel.org>	bpf: Add BPF token support to BPF_PROG_LOAD command Add basic support of BPF token to BPF_PROG_LOAD. BPF_F_TOKEN_FD flag should be set in prog_flags field when providing prog_token_fd. Wire through bpf: Add BPF token support to BPF_PROG_LOAD command Add basic support of BPF token to BPF_PROG_LOAD. BPF_F_TOKEN_FD flag should be set in prog_flags field when providing prog_token_fd. Wire through a set of allowed BPF program types and attach types, derived from BPF FS at BPF token creation time. Then make sure we perform bpf_token_capable() checks everywhere where it's relevant. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-7-andrii@kernel.org show more ...
# a177fc2b	24-Jan-2024	Andrii Nakryiko <andrii@kernel.org>	bpf: Add BPF token support to BPF_MAP_CREATE command Allow providing token_fd for BPF_MAP_CREATE command to allow controlled BPF map creation from unprivileged process through delegated BPF token. N bpf: Add BPF token support to BPF_MAP_CREATE command Allow providing token_fd for BPF_MAP_CREATE command to allow controlled BPF map creation from unprivileged process through delegated BPF token. New BPF_F_TOKEN_FD flag is added to specify together with BPF token FD for BPF_MAP_CREATE command. Wire through a set of allowed BPF map types to BPF token, derived from BPF FS at BPF token creation time. This, in combination with allowed_cmds allows to create a narrowly-focused BPF token (controlled by privileged agent) with a restrictive set of BPF maps that application can attempt to create. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-5-andrii@kernel.org show more ...
# 35f96de0	24-Jan-2024	Andrii Nakryiko <andrii@kernel.org>	bpf: Introduce BPF token object Add new kind of BPF kernel object, BPF token. BPF token is meant to allow delegating privileged BPF functionality, like loading a BPF program or creating a BPF map, f bpf: Introduce BPF token object Add new kind of BPF kernel object, BPF token. BPF token is meant to allow delegating privileged BPF functionality, like loading a BPF program or creating a BPF map, from privileged process to a trusted unprivileged process, all while having a good amount of control over which privileged operations could be performed using provided BPF token. This is achieved through mounting BPF FS instance with extra delegation mount options, which determine what operations are delegatable, and also constraining it to the owning user namespace (as mentioned in the previous patch). BPF token itself is just a derivative from BPF FS and can be created through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts BPF FS FD, which can be attained through open() API by opening BPF FS mount point. Currently, BPF token "inherits" delegated command, map types, prog type, and attach type bit sets from BPF FS as is. In the future, having an BPF token as a separate object with its own FD, we can allow to further restrict BPF token's allowable set of things either at the creation time or after the fact, allowing the process to guard itself further from unintentionally trying to load undesired kind of BPF programs. But for now we keep things simple and just copy bit sets as is. When BPF token is created from BPF FS mount, we take reference to the BPF super block's owning user namespace, and then use that namespace for checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN} capabilities that are normally only checked against init userns (using capable()), but now we check them using ns_capable() instead (if BPF token is provided). See bpf_token_capable() for details. Such setup means that BPF token in itself is not sufficient to grant BPF functionality. User namespaced process has to also have necessary combination of capabilities inside that user namespace. So while previously CAP_BPF was useless when granted within user namespace, now it gains a meaning and allows container managers and sys admins to have a flexible control over which processes can and need to use BPF functionality within the user namespace (i.e., container in practice). And BPF FS delegation mount options and derived BPF tokens serve as a per-container "flag" to grant overall ability to use bpf() (plus further restrict on which parts of bpf() syscalls are treated as namespaced). Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF) within the BPF FS owning user namespace, rounding up the ns_capable() story of BPF token. Also creating BPF token in init user namespace is currently not supported, given BPF token doesn't have any effect in init user namespace anyways. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/bpf/20240124022127.2379740-4-andrii@kernel.org show more ...