| 48103896 | 03-Apr-2026 |
Daniel Borkmann <daniel@iogearbox.net> |
netkit: Add single device mode for netkit
Add a single device mode for netkit instead of netkit pairs. The primary target for the paired devices is to connect network namespaces, of course, and support has been implemented in projects like Cilium [0]. For the rxq leasing the plan is to support two main scenarios related to single device mode:
* For the io_uring zero-copy use case, the control plane can either set up a netkit pair in which the peer device performs rxq leasing, which is then tied to the lifetime of the peer device, or it can use a regular netkit pair to connect the hostns to a Pod/container and dynamically add/remove rxq leasing through a single device without having to interrupt the device pair. In the case of io_uring, the memory pool is used as skb non-linear pages, and thus the skb makes its way through the regular stack into netkit. Things like the netkit policy when no BPF is attached, skb scrubbing, etc. apply as-is when the paired devices are used, or when the backend memory is tied to the single device and traffic goes through a paired device.
* For the AF_XDP use case, the control plane needs to use netkit in single device mode. Single device mode currently enforces only a pass policy when no BPF is attached, and does not yet support BPF link attachments for AF_XDP. skbs sent to that device are currently dropped. Given that AF_XDP operates at a lower layer of the stack, tying this to the netkit pair did not make sense. In the future, the plan is to allow BPF at the XDP layer which can: i) process traffic coming from the AF_XDP application (e.g. QEMU with an AF_XDP backend) to filter egress traffic or to push selected egress traffic up via the single netkit device to the local stack (e.g. DHCP requests), and ii) vice versa, push skbs sent to the single netkit device into the AF_XDP application (e.g. DHCP replies). Also, the control plane can dynamically manage rxq leasing for the single netkit device without having to interrupt (e.g. down/up cycle) the main netkit pair for the Pod which has traffic going in and out.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Jordan Rife <jordan@jrife.io>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://docs.cilium.io/en/stable/operations/performance/tuning/#netkit-device-mode [0]
Link: https://patch.msgid.link/20260402231031.447597-11-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| 7789c6bb | 03-Apr-2026 |
Daniel Borkmann <daniel@iogearbox.net> |
net: Add queue-create operation
Add a ynl netdev family operation called queue-create that creates a new queue on a netdevice:
    name: queue-create
    attribute-set: queue
    flags: [admin-perm]
    do:
      request:
        attributes:
          - ifindex
          - type
          - lease
      reply: &queue-create-op
        attributes:
          - id
This is a generic operation so that it can be extended for various use cases in the future. Right now it is mandatory to specify the ifindex, the queue type (which is enforced to be rx), and a lease. The newly created queue id is returned to the caller.
A queue from a virtual device can have a lease which refers to another queue from a physical device. This is useful for memory providers and AF_XDP operations which take an ifindex and queue id, as it allows applications to bind against virtual devices in containers. The lease couples both queues together and allows operations to be proxied from a virtual device in a container to the physical device.
In the future, the nested lease attribute can be lifted and made optional for other use cases such as dynamic queue creation for physical netdevs. The lack of a lease and the specification of the physical device as an ifindex will imply that a real queue needs to be allocated. Similarly, the queue type enforcement to rx can then be lifted as well to support tx.
An early implementation had only driver-specific integration [0], but in order for other virtual devices to reuse it, it makes sense to have this as a generic API in core net.
For leasing queues, the virtual netdev must have real_num_rx_queues less than num_rx_queues at the time of calling queue-create. The queue type must be rx, as only rx queues are supported for leasing for now. We also enforce that the queue-create ifindex must point to a virtual device, and that the nested lease attribute's ifindex must point to a physical device. The nested lease attribute set contains an optional netns-id attribute which can specify a netns-id relative to the caller's netns. It requires cap_net_admin, and if the netns-id attribute is not specified, the lease ifindex will be retrieved from the current netns. Also, it is modeled as an s32 type, as is done elsewhere in the stack.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://bpfconf.ebpf.io/bpfconf2025/bpfconf2025_material/lsfmmbpf_2025_netkit_borkmann.pdf [0]
Link: https://patch.msgid.link/20260402231031.447597-2-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
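The validation rules described in the queue-create commit message above can be modeled in a few lines of Python. This is only an illustrative sketch and not the kernel implementation: the `NetDev` and `Lease` classes, the `queue_create` function, the `queue_id` field, and the choice of returning the old `real_num_rx_queues` value as the new queue id are all assumptions made for the example. The netns-id lookup and the cap_net_admin check are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NetDev:
    """Hypothetical stand-in for a netdevice; not a kernel structure."""
    ifindex: int
    is_virtual: bool
    num_rx_queues: int        # allocated rx queue slots
    real_num_rx_queues: int   # rx queues currently in use


@dataclass
class Lease:
    """Hypothetical model of the nested lease attribute."""
    ifindex: int                    # must refer to a physical device
    queue_id: int                   # assumed field: the leased physical queue
    netns_id: Optional[int] = None  # optional; None means the caller's netns


def queue_create(devs: Dict[int, NetDev], ifindex: int, qtype: str,
                 lease: Optional[Lease]) -> int:
    """Return the new queue id, or raise ValueError if a described check fails."""
    if lease is None:
        raise ValueError("a lease is mandatory for now")
    if qtype != "rx":
        raise ValueError("only rx queues can be leased for now")
    virt = devs[ifindex]
    phys = devs[lease.ifindex]
    if not virt.is_virtual:
        raise ValueError("ifindex must point to a virtual device")
    if phys.is_virtual:
        raise ValueError("lease ifindex must point to a physical device")
    if virt.real_num_rx_queues >= virt.num_rx_queues:
        raise ValueError("no spare rx queue slot on the virtual device")
    # Assumption for this sketch: the new queue takes the next free slot.
    new_id = virt.real_num_rx_queues
    virt.real_num_rx_queues += 1
    return new_id
```

With a virtual device that has two rx slots and one in use, a first call succeeds and a second call fails the `real_num_rx_queues < num_rx_queues` check.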
|
| 1bc45341 | 07-Apr-2026 |
Or Har-Toov <ohartoov@nvidia.com> |
devlink: Add resource scope filtering to resource dump
Allow filtering the resource dump to device-level or port-level resources using the 'scope' option.
Example - dump only device-level resources:
    $ devlink resource show scope dev
    pci/0000:03:00.0:
      name max_local_SFs size 128 unit entry dpipe_tables none
      name max_external_SFs size 128 unit entry dpipe_tables none
    pci/0000:03:00.1:
      name max_local_SFs size 128 unit entry dpipe_tables none
      name max_external_SFs size 128 unit entry dpipe_tables none
Example - dump only port-level resources:
    $ devlink resource show scope port
    pci/0000:03:00.0/196608:
      name max_SFs size 128 unit entry dpipe_tables none
    pci/0000:03:00.0/196609:
      name max_SFs size 128 unit entry dpipe_tables none
    pci/0000:03:00.1/196708:
      name max_SFs size 128 unit entry dpipe_tables none
    pci/0000:03:00.1/196709:
      name max_SFs size 128 unit entry dpipe_tables none
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260407194107.148063-11-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
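The scope filter in the commit above amounts to keeping or skipping records depending on whether they belong to a port. A minimal Python sketch of that idea follows; the `RESOURCES` tuples and the `dump` function are hypothetical names for illustration only, and the behavior with no scope (returning both kinds of resources) is an assumption rather than something the commit message states.

```python
from typing import List, Optional, Tuple

# Hypothetical flattened records: (handle, port index or None, resource name).
Record = Tuple[str, Optional[int], str]

RESOURCES: List[Record] = [
    ("pci/0000:03:00.0", None, "max_local_SFs"),
    ("pci/0000:03:00.0", None, "max_external_SFs"),
    ("pci/0000:03:00.0", 196608, "max_SFs"),
    ("pci/0000:03:00.0", 196609, "max_SFs"),
]


def dump(resources: List[Record], scope: Optional[str] = None) -> List[Record]:
    """Filter a resource dump by scope: 'dev', 'port', or None for no filter."""
    if scope not in (None, "dev", "port"):
        raise ValueError(f"unknown scope: {scope!r}")
    out = []
    for handle, port, name in resources:
        if scope == "dev" and port is not None:
            continue  # skip port-level resources
        if scope == "port" and port is None:
            continue  # skip device-level resources
        out.append((handle, port, name))
    return out
```

Calling `dump(RESOURCES, "dev")` keeps only the two device-level records, mirroring the first example output above.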
|
| 7511ff14 | 07-Apr-2026 |
Or Har-Toov <ohartoov@nvidia.com> |
devlink: Add port-specific option to resource dump doit
Allow querying devlink resources per-port via the resource-dump doit handler. When a port-index attribute is provided, only that port's resources are returned. When no port-index is given, only device-level resources are returned, preserving backward compatibility.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260407194107.148063-8-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
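The doit semantics described above (a port-index selects that port's resources, and its absence selects device-level resources only) can be sketched as a single comparison. The record layout and the `resource_get` name below are hypothetical and chosen only for this illustration; the real handler works on netlink attributes.

```python
from typing import List, Optional, Tuple

# Hypothetical records: (handle, port index or None, resource name).
RESOURCES: List[Tuple[str, Optional[int], str]] = [
    ("pci/0000:03:00.0", None, "max_local_SFs"),
    ("pci/0000:03:00.0", None, "max_external_SFs"),
    ("pci/0000:03:00.0", 196608, "max_SFs"),
]


def resource_get(resources, handle: str,
                 port_index: Optional[int] = None) -> List[str]:
    """Return resource names for one device, or for one of its ports.

    With port_index=None only device-level records match, which models the
    backward-compatible behavior for requests that carry no port-index.
    """
    return [name for h, port, name in resources
            if h == handle and port == port_index]
```

A request without a port index therefore never sees `max_SFs`, so callers written before per-port resources existed get the same answer as before.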
|
| c8eee00c | 03-Apr-2026 |
Daniel Zahka <daniel.zahka@gmail.com> |
psp: add missing device stats to get-stats reply attributes
Commit f05d26198cf2 ("psp: add stats from psp spec to driver facing api") added device statistics (rx-packets, rx-bytes, rx-auth-fail, rx-error, rx-bad, tx-packets, tx-bytes, tx-error) to the stats attribute-set but did not add them to the get-stats operation reply attributes. The kernel reports these attributes in the reply, so list them in the spec to match.
Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://patch.msgid.link/20260403-psp-yaml-fix-v1-1-dacee0663903@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
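The mismatch fixed by the psp commit above is the kind of drift a simple consistency check can surface: attributes defined in an attribute-set but absent from the operation's reply list. The sketch below is not part of the ynl tooling. The function name is hypothetical, and `already-listed` is a placeholder standing in for whatever attributes the reply listed before the fix. Only the eight statistic names come from the commit message.

```python
from typing import List

# Statistic names taken from the commit message.
DEVICE_STATS = [
    "rx-packets", "rx-bytes", "rx-auth-fail", "rx-error", "rx-bad",
    "tx-packets", "tx-bytes", "tx-error",
]

# "already-listed" is a hypothetical placeholder, not a real attribute name.
STATS_ATTRIBUTE_SET: List[str] = ["already-listed"] + DEVICE_STATS
REPLY_BEFORE_FIX: List[str] = ["already-listed"]


def missing_from_reply(attribute_set: List[str], reply: List[str]) -> List[str]:
    """Return attributes defined in the set but not listed in the reply,
    preserving the attribute-set order."""
    listed = set(reply)
    return [attr for attr in attribute_set if attr not in listed]
```

Run against the pre-fix lists, the check reports exactly the eight device statistics; after they are added to the reply, it reports nothing.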
|
| d85a8af5 | 12-Mar-2026 |
Jiri Pirko <jiri@nvidia.com> |
devlink: allow to use devlink index as a command handle
Currently devlink instances are addressed by a bus_name/dev_name tuple. Allow the newly introduced DEVLINK_ATTR_INDEX to be used as an alternative handle for all devlink commands.
When DEVLINK_ATTR_INDEX is present in the request, use it for a direct xarray lookup instead of iterating over all instances comparing bus_name/dev_name strings.
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260312100407.551173-5-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
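The difference between the two lookup paths in the commit above can be illustrated with a small Python model. This is a sketch under stated assumptions: a `dict` stands in for the kernel xarray, and the `Devlink` and `Registry` names are invented for the example and do not correspond to kernel symbols.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Devlink:
    """Hypothetical model of a devlink instance."""
    index: int
    bus_name: str
    dev_name: str


class Registry:
    def __init__(self) -> None:
        # A dict keyed by index stands in for the kernel xarray.
        self._by_index: Dict[int, Devlink] = {}

    def register(self, dl: Devlink) -> None:
        self._by_index[dl.index] = dl

    def get(self, index: Optional[int] = None,
            bus_name: Optional[str] = None,
            dev_name: Optional[str] = None) -> Optional[Devlink]:
        if index is not None:
            # Index handle present: direct lookup, no string comparisons.
            return self._by_index.get(index)
        # Fallback: walk all instances comparing the name tuple.
        for dl in self._by_index.values():
            if dl.bus_name == bus_name and dl.dev_name == dev_name:
                return dl
        return None
```

Both handles resolve to the same instance. The index path simply avoids the walk over every registered instance.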
|
| 568b370f | 03-Mar-2026 |
Remy D. Farley <one-d-wide@protonmail.com> |
doc/netlink: nftables: Fill out operation attributes
Filled out operation attributes:
- newtable
- gettable
- deltable
- destroytable
- newchain
- getchain
- delchain
- destroychain
- newrule
- getrule
- getrule-reset
- delrule
- destroyrule
- newset
- getset
- delset
- destroyset
- newsetelem
- getsetelem
- getsetelem-reset
- delsetelem
- destroysetelem
- getgen
- newobj
- getobj
- delobj
- destroyobj
- newflowtable
- getflowtable
- delflowtable
- destroyflowtable
Signed-off-by: Remy D. Farley <one-d-wide@protonmail.com>
Link: https://patch.msgid.link/20260303195638.381642-6-one-d-wide@protonmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| 27c7ee6d | 03-Mar-2026 |
Remy D. Farley <one-d-wide@protonmail.com> |
doc/netlink: nftables: Add sub-messages
New sub-messages:
- log
- match
- numgen
- range
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Remy D. Farley <one-d-wide@protonmail.com>
Link: https://patch.msgid.link/20260303195638.381642-5-one-d-wide@protonmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|