| #
4fe18ddd |
| 29-Apr-2026 |
Arnd Bergmann <arnd@arndb.de> |
ne2k: fold drivers/net/Space.c into ne.c
drivers/net/Space.c is the last remnant of the linux-2.4.x driver model that required each subsystem and device driver init function to be called from init/main.c explicitly, before the introduction of initcall levels.
In linux-7.0, this was only used for a handful of ISA network drivers, with the ne2000 driver being the last one.
Fold the code into ne.c directly, with minimal changes to preserve the existing command line parsing.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> # m68k
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260429145624.2948432-2-arnd@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| #
6d543155 |
| 16-Apr-2026 |
Jakub Kicinski <kuba@kernel.org> |
caif: remove CAIF NETWORK LAYER
Remove CAIF (Communication CPU to Application CPU Interface), the ST-Ericsson modem protocol. The subsystem has been orphaned since 2013. The last meaningful changes from the maintainers were in March 2013:
a8c7687bf216 ("caif_virtio: Check that vringh_config is not null")
b2273be8d2df ("caif_virtio: Use vringh_notify_enable correctly")
0d2e1a2926b1 ("caif_virtio: Introduce caif over virtio")
Not-so-coincidentally, according to "the Internet" ST-Ericsson officially shut down its modem joint venture in Aug 2013.
If anyone is using this code please yell!
In the 13 years since, the code has accumulated 200 non-merge commits, of which 71 were cross-tree API changes, 21 carried Fixes: tags, and the remaining ~110 were cleanups, doc conversions, treewide refactors, and one partial removal (caif_hsi, ca75bcf0a83b).
We are still getting fixes for this code; in the last 10 days there were 3 reports on security@ about CAIF that I have been CCed on.
UAPI constants (AF_CAIF, ARPHRD_CAIF, N_CAIF, VIRTIO_ID_CAIF) and the SELinux classmap entry are intentionally kept for ABI stability.
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260416182829.1440262-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| #
309b905d |
| 25-Mar-2026 |
Fernando Fernandez Mancera <fmancera@suse.de> |
ipv6: convert CONFIG_IPV6 to built-in only and clean up Kconfigs
While a modular IPv6 stack offers image size savings for specific setups, this benefit is outweighed by the architectural burden it imposes on the subsystems in terms of implementation and maintenance. Therefore, drop it.
Change CONFIG_IPV6 from tristate to bool. Remove all Kconfig dependencies across the tree that explicitly checked for IPV6=m. In addition, remove MODULE_DESCRIPTION(), MODULE_ALIAS(), MODULE_AUTHOR() and MODULE_LICENSE().
This also replaces module_init() with device_initcall(). It is not possible to use fs_initcall() as IPv4 does, because that creates a race condition on IPv6 addrconf.
Finally, modify the default configs from CONFIG_IPV6=m to CONFIG_IPV6=y, except for m68k: according to bloat-o-meter, the image grows by ~330KB there, which isn't acceptable. Instead, disable IPv6 on this architecture by default. This is aligned with m68k RAM requirements and recommendations [1].
[1] http://www.linux-m68k.org/faq/ram.html
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Tested-by: Ricardo B. Marlière <rbm@suse.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org> # arm64
Link: https://patch.msgid.link/20260325120928.15848-2-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| #
24fbd396 |
| 10-Mar-2026 |
Vishwanath Seshagiri <vishs@meta.com> |
virtio_net: add page_pool support for buffer allocation
Use page_pool for RX buffer allocation in mergeable and small buffer modes to enable page recycling and avoid repeated page allocator calls. skb_mark_for_recycle() enables page reuse in the network stack.
Big packets mode is unchanged because it uses page->private for linked list chaining of multiple pages per buffer, which conflicts with page_pool's internal use of page->private.
Implement conditional DMA premapping using virtqueue_dma_dev():
- When non-NULL (vhost, virtio-pci): use PP_FLAG_DMA_MAP with page_pool handling DMA mapping, submit via virtqueue_add_inbuf_premapped()
- When NULL (VDUSE, direct physical): page_pool handles allocation only, submit via virtqueue_add_inbuf_ctx()
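The two-way branch above can be modelled as a small decision function. This is a minimal userspace sketch, not driver code: `struct rx_setup`, `choose_rx_setup()`, `PP_FLAG_DMA_MAP_MODEL` and the `enum rx_submit` values are hypothetical stand-ins, and the `dma_dev` argument only models the result of virtqueue_dma_dev().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for page_pool's PP_FLAG_DMA_MAP flag. */
#define PP_FLAG_DMA_MAP_MODEL 0x1u

enum rx_submit {
	SUBMIT_INBUF_PREMAPPED, /* models virtqueue_add_inbuf_premapped() */
	SUBMIT_INBUF_CTX,       /* models virtqueue_add_inbuf_ctx() */
};

struct rx_setup {
	unsigned int pp_flags; /* flags the page_pool would be created with */
	enum rx_submit submit; /* how RX buffers are handed to the virtqueue */
};

/*
 * dma_dev models virtqueue_dma_dev(): non-NULL for vhost/virtio-pci,
 * NULL for VDUSE or direct physical addressing.
 */
static struct rx_setup choose_rx_setup(const void *dma_dev)
{
	struct rx_setup s;

	if (dma_dev) {
		/* page_pool owns the DMA mapping; buffers go in premapped */
		s.pp_flags = PP_FLAG_DMA_MAP_MODEL;
		s.submit = SUBMIT_INBUF_PREMAPPED;
	} else {
		/* page_pool only allocates pages; submit via the _ctx variant */
		s.pp_flags = 0;
		s.submit = SUBMIT_INBUF_CTX;
	}
	/* Invariant: the DMA-map flag and the premapped path go together. */
	assert((s.pp_flags != 0) == (s.submit == SUBMIT_INBUF_PREMAPPED));
	return s;
}
```

Keeping the flag choice and the submit path in one place makes the invariant explicit: a pool that maps pages itself must always be paired with the premapped submission path.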
This preserves the DMA premapping optimization from commit 31f3cd4e5756b ("virtio-net: rq submits premapped per-buffer") while adding page_pool support as a prerequisite for future zero-copy features (devmem TCP, io_uring ZCRX).
Page pools are created in probe and destroyed in remove (not open/close), following existing driver behavior where RX buffers remain in virtqueues across interface state changes.
Signed-off-by: Vishwanath Seshagiri <vishs@meta.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://patch.msgid.link/20260310183107.2822016-1-vishs@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| #
636fd32d |
| 13-Feb-2026 |
Arnd Bergmann <arnd@arndb.de> |
printk: add CONFIG_PRINTK dependency for netconsole
The 'select PRINTK_EXECUTION_CTX' line now causes a harmless warning when NETCONSOLE_DYNAMIC is enabled but PRINTK is not:
WARNING: unmet direct dependencies detected for PRINTK_EXECUTION_CTX
  Depends on [n]: PRINTK [=n]
  Selected by [y]:
  - NETCONSOLE_DYNAMIC [=y] && NETDEVICES [=y] && NET_CORE [=y] && NETCONSOLE [=y] && SYSFS [=y] && CONFIGFS_FS [=y] && (NETCONSOLE [=y]!=y [=y] || CONFIGFS_FS [=y]!=m [=m])
In that configuration the netconsole driver is useless anyway, so avoid the warning with an added dependency that prevents CONFIG_NETCONSOLE from being enabled without CONFIG_PRINTK.
Fixes: 60325c27d3cf ("printk: Add execution context (task name/CPU) to printk_info")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260213074431.1729627-1-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
| #
60325c27 |
| 06-Feb-2026 |
Breno Leitao <leitao@debian.org> |
printk: Add execution context (task name/CPU) to printk_info
Extend struct printk_info to include the task name, pid, and CPU number where printk messages originate. This information is captured at vprintk_store() time and propagated through printk_message to nbcon_write_context, making it available to nbcon console drivers.
This is useful for consoles like netconsole that want to include execution context in their output, allowing correlation of messages with specific tasks and CPUs regardless of where the console driver actually runs.
The feature is controlled by CONFIG_PRINTK_EXECUTION_CTX, which is automatically selected by CONFIG_NETCONSOLE_DYNAMIC. When disabled, the helper functions compile to no-ops with no overhead.
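The "compile to no-ops" property is the usual Kconfig-gated helper pattern. A minimal sketch, assuming a hypothetical trimmed-down `struct log_info` and helper `log_info_set_ctx()`; only the CONFIG_PRINTK_EXECUTION_CTX symbol comes from the text, and the real struct printk_info layout and helper names differ. The option is defined at the top purely for illustration; in a kernel build it would come from Kconfig.

```c
#include <assert.h>
#include <string.h>

/* Enabled here for illustration; a kernel build gets this from Kconfig. */
#define CONFIG_PRINTK_EXECUTION_CTX 1

#define CTX_COMM_LEN 16 /* assumption: sized like TASK_COMM_LEN */

/* Hypothetical record; not the kernel's struct printk_info. */
struct log_info {
	unsigned long long ts_nsec;
#ifdef CONFIG_PRINTK_EXECUTION_CTX
	char comm[CTX_COMM_LEN];
	int pid;
	int cpu;
#endif
};

#ifdef CONFIG_PRINTK_EXECUTION_CTX
/* Record the originating task and CPU alongside the message metadata. */
static inline void log_info_set_ctx(struct log_info *info, const char *comm,
				    int pid, int cpu)
{
	assert(info != NULL && comm != NULL);
	strncpy(info->comm, comm, CTX_COMM_LEN - 1);
	info->comm[CTX_COMM_LEN - 1] = '\0'; /* always NUL-terminated */
	info->pid = pid;
	info->cpu = cpu;
}
#else
/* Option disabled: no extra fields, and the empty inline compiles away. */
static inline void log_info_set_ctx(struct log_info *info, const char *comm,
				    int pid, int cpu)
{
	(void)info;
	(void)comm;
	(void)pid;
	(void)cpu;
}
#endif
```

Callers invoke the helper unconditionally. When the option is off, both the storage and the work disappear without any #ifdef at the call sites.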
Suggested-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Link: https://patch.msgid.link/20260206-nbcon-v7-1-62bda69b1b41@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| #
d8f87aa5 |
| 19-Jan-2026 |
Ethan Nelson-Moore <enelsonmoore@gmail.com> |
net: remove HIPPI support and RoadRunner HIPPI driver
HIPPI has not been relevant for over two decades. It was rapidly eclipsed by Fibre Channel, and even when it was new, it was confined to very high-end hardware. The HIPPI code has only received tree-wide changes and fixes by inspection in the entire Git history. Remove HIPPI support and the rrunner HIPPI driver, and move the former maintainer to the CREDITS file. Keep the include/uapi/linux/if_hippi.h header because it is used by the TUN code, and to avoid breaking userspace, however unlikely that may be.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260119022451.22344-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| #
54e7bb6a |
| 06-Sep-2025 |
Eric Biggers <ebiggers@kernel.org> |
wireguard: kconfig: simplify crypto kconfig selections
Simplify the kconfig entry for WIREGUARD:
- Drop the selections of the arch-optimized ChaCha20, Poly1305, BLAKE2s, and Curve25519 code. These options no longer exist, as lib/crypto/ now enables the arch-optimized code automatically.
- Drop the selection of CRYPTO. This was needed only to make the arch-optimized options visible. lib/crypto/ now handles these options internally, without any dependency on CRYPTO.
- Drop the dependency on !KMSAN. This was needed only to avoid selecting arch-optimized code that isn't compatible with KMSAN. lib/crypto/ now handles the !KMSAN dependencies internally.
- Add a selection of CRYPTO_LIB_UTILS, since WireGuard directly calls crypto_memneq(). This gets selected indirectly by CRYPTO_LIB_CURVE25519 and CRYPTO_LIB_CHACHA20POLY1305 anyway, but it's best to make this dependency explicit.
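crypto_memneq() exists so that comparing secrets (such as authentication tags) does not leak, through timing, where the first mismatching byte is. The sketch below illustrates the idea only. `ct_memneq()` is a hypothetical name, and the kernel's implementation differs: it works in word-sized chunks and uses compiler barriers rather than relying on `volatile`. The return convention (0 when equal, 1 otherwise) matches crypto_memneq().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 0 if the buffers are equal and 1 otherwise. Every byte is visited
 * regardless of where the first mismatch is, so the running time depends
 * only on size, not on the contents.
 */
static int ct_memneq(const void *a, const void *b, size_t size)
{
	const volatile uint8_t *pa = a; /* volatile: discourage early-exit rewrites */
	const volatile uint8_t *pb = b;
	uint8_t diff = 0;

	assert(size == 0 || (a != NULL && b != NULL));
	for (size_t i = 0; i < size; i++)
		diff |= pa[i] ^ pb[i]; /* accumulate differences, never branch */
	return diff != 0;
}
```

A plain memcmp() may return as soon as it finds a difference. Folding all differences into one accumulator removes that data-dependent exit.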
Link: https://lore.kernel.org/r/20250906213523.84915-13-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
|
| #
11851cbd |
| 15-Apr-2025 |
Antonio Quartulli <antonio@openvpn.net> |
ovpn: implement TCP transport
With this change ovpn is allowed to communicate to peers also via TCP. Parsing of incoming messages is implemented through the strparser API.
Note that ovpn redefines sk_prot and sk_socket->ops for the TCP socket used to communicate with the peer. For this reason it needs to access inet6_stream_ops, which is declared as extern in the IPv6 module, but it is not fully exported.
Therefore this patch is also adding EXPORT_SYMBOL_GPL(inet6_stream_ops) to net/ipv6/af_inet6.c.
Cc: David Ahern <dsahern@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Simon Horman <horms@kernel.org>
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Link: https://patch.msgid.link/20250415-b4-ovpn-v26-11-577f6097b964@openvpn.net
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
| #
8534731d |
| 15-Apr-2025 |
Antonio Quartulli <antonio@openvpn.net> |
ovpn: implement packet processing
This change implements encryption/decryption and encapsulation/decapsulation of OpenVPN packets.
Support for generic crypto state is added along with a wrapper for the AEAD crypto kernel API.
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Link: https://patch.msgid.link/20250415-b4-ovpn-v26-9-577f6097b964@openvpn.net
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
| #
08857b5e |
| 15-Apr-2025 |
Antonio Quartulli <antonio@openvpn.net> |
ovpn: implement basic TX path (UDP)
Packets sent over the ovpn interface are processed and transmitted to the connected peer, if any.
Implementation is UDP only. TCP will be added by a later patch.
Note: no crypto/encapsulation exists yet. Packets are just captured and sent.
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Link: https://patch.msgid.link/20250415-b4-ovpn-v26-7-577f6097b964@openvpn.net
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
| #
80747cae |
| 15-Apr-2025 |
Antonio Quartulli <antonio@openvpn.net> |
ovpn: introduce the ovpn_peer object
An ovpn_peer object holds the whole status of a remote peer (regardless of whether it is a server or a client).
This includes status for crypto, tx/rx buffers, napi, etc.
Only support for one peer is introduced (P2P mode). Multi peer support is introduced with a later patch.
Along with the ovpn_peer, the ovpn_bind object is also introduced, as the two are strictly related. An ovpn_bind object wraps a sockaddr representing the local coordinates being used to talk to a specific peer.
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Link: https://patch.msgid.link/20250415-b4-ovpn-v26-5-577f6097b964@openvpn.net
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
| #
9f23d943 |
| 15-Apr-2025 |
Antonio Quartulli <antonio@openvpn.net> |
net: introduce OpenVPN Data Channel Offload (ovpn)
OpenVPN is userspace software, around since roughly 2005, that allows users to create secure tunnels.
So far OpenVPN has implemented all operations in userspace, which implies several round trips between kernel and user land in order to process packets (encapsulate/decapsulate, encrypt/decrypt, rerouting...).
With `ovpn` we intend to move the fast path (data channel) entirely into kernel space and thus improve user-measured throughput over the tunnel.
`ovpn` is implemented as a simple virtual network device driver that can be manipulated by means of the standard RTNL APIs. A device of kind `ovpn` allows only IPv4/6 traffic and can be of type:
* P2P (peer-to-peer): any packet sent over the interface will be encapsulated and transmitted to the other side (typical OpenVPN client or peer-to-peer behaviour);
* P2MP (point-to-multipoint): packets sent over the interface are transmitted to peers based on existing routes (typical OpenVPN server behaviour).
After the interface has been created, OpenVPN in userspace can configure it using a new Netlink API. Specifically it is possible to manage peers and their keys.
The OpenVPN control channel is multiplexed over the same transport socket by means of OP codes. Anything that is not DATA_V2 (OpenVPN OP code for data traffic) is sent to userspace and handled there. This way the `ovpn` codebase is kept as compact as possible while focusing on handling data traffic only (fast path).
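The demultiplexing rule (DATA_V2 stays in the kernel, everything else goes to userspace) can be sketched as a check on the first byte of each packet. Two details are assumptions taken from the OpenVPN wire format rather than from this commit: the opcode occupies the top 5 bits of the first byte (the low 3 bits carry the key ID), and DATA_V2 has opcode value 9. `ovpn_demux()` and the macro names are hypothetical and not the module's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: first byte = (opcode << 3) | key_id, per the OpenVPN wire format. */
#define OVPN_OP_SHIFT_MODEL   3
#define OVPN_OP_DATA_V2_MODEL 9

enum ovpn_dest {
	OVPN_TO_KERNEL,    /* DATA_V2: decrypt and deliver in the fast path */
	OVPN_TO_USERSPACE, /* anything else: control channel, handled by the daemon */
};

static enum ovpn_dest ovpn_demux(const uint8_t *pkt, size_t len)
{
	assert(pkt != NULL || len == 0);
	/* Sketch's choice: an empty packet carries no opcode, so pass it up. */
	if (len < 1)
		return OVPN_TO_USERSPACE;
	if ((pkt[0] >> OVPN_OP_SHIFT_MODEL) == OVPN_OP_DATA_V2_MODEL)
		return OVPN_TO_KERNEL;
	return OVPN_TO_USERSPACE;
}
```

Because only one opcode is handled in the kernel, the classification stays a single comparison. This matches the stated goal of keeping the kernel side limited to data traffic.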
Any OpenVPN control feature (like cipher negotiation, TLS handshake, rekeying, etc.) is still fully handled by the userspace process.
When userspace establishes a new connection with a peer, it first performs the handshake and then passes the socket to the `ovpn` kernel module, which takes ownership. From this moment on `ovpn` will handle data traffic for the new peer. When control packets are received on the link, they are forwarded to userspace through the same transport socket they were received on, as userspace is still listening to them.
Some events (like peer deletion) are sent to a Netlink multicast group.
Although it wasn't easy to convince the community, `ovpn` implements only a limited number of the data-channel features supported by the userspace program.
Each feature that made it into `ovpn` was carefully vetted to avoid carrying too much legacy along with us (and to make a clean break from old and probably-not-so-useful features).
Notably, only encryption using AEAD ciphers (specifically ChaCha20Poly1305 and AES-GCM) was implemented. Supporting any other cipher out there was not deemed useful.
Both UDP and TCP sockets are supported.
As explained above, in case of P2MP mode, OpenVPN will use the main system routing table to decide which packet goes to which peer. This implies that no routing table was re-implemented in the `ovpn` kernel module.
This kernel module can be enabled by selecting the CONFIG_OVPN entry in the networking drivers section.
NOTE: this first patch introduces only the very basic framework. Features are then added patch by patch; although each patch compiles and should not break at runtime, the ovpn module is expected to be fully working only after the full set has been applied.
Cc: steffen.klassert@secunet.com
Cc: antony.antony@secunet.com
Signed-off-by: Antonio Quartulli <antonio@openvpn.net>
Link: https://patch.msgid.link/20250415-b4-ovpn-v26-1-577f6097b964@openvpn.net
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
| #
3fed9fda |
| 12-Mar-2025 |
Arnd Bergmann <arnd@arndb.de> |
net: remove sb1000 cable modem driver
This one is hilariously outdated: it provided a faster downlink over TV cable for users of analog modems in the 1990s, through an ISA card.
The web page for the userspace tools has been broken for 25 years, and the driver has only ever seen mechanical updates.
Link: http://web.archive.org/web/20000611165545/http://home.adelphia.net:80/~siglercm/sb1000.html
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250312085236.2531870-1-arnd@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
| #
b3ea4164 |
| 09-Oct-2024 |
Paolo Abeni <pabeni@redhat.com> |
testing: net-drv: add basic shaper test
Leverage a basic/dummy netdevsim implementation to do functional coverage for the NL interface.
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Link: https://patch.msgid.link/43092afbf38365c796088bf8fc155e523ab434ae.1728460186.git.pabeni@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| #
94e2a19a |
| 17-Apr-2024 |
Jakub Kicinski <kuba@kernel.org> |
net: netdevsim: select PAGE_POOL in Kconfig
The build bot points out that I forgot to add the PAGE_POOL config dependency when adding page pool support to netdevsim.
Fixes: 1580cbcbfe77 ("net: netdevsim: add some fake page pool use")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202404170348.thxrboF1-lkp@intel.com/
Closes: https://lore.kernel.org/oe-kbuild-all/202404170527.LIAPSyMB-lkp@intel.com/
Link: https://lore.kernel.org/r/20240416232137.2022058-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| #
a29689e6 |
| 05-Apr-2024 |
Niklas Schnelle <schnelle@linux.ibm.com> |
net: handle HAS_IOPORT dependencies
In a future patch HAS_IOPORT=n will disable inb()/outb() and friends at compile time. We thus need to add HAS_IOPORT as a dependency for those drivers requiring them. For the DEFXX driver the use of I/O ports is optional and we only need to fence specific code paths. It also turns out that with HAS_IOPORT handled explicitly, HAMRADIO does not need the !S390 dependency and successfully builds the bpqether driver.
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Maciej W. Rozycki <macro@orcam.me.uk>
Co-developed-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| #
76c8764e |
| 27-Mar-2024 |
Wojciech Drewek <wojciech.drewek@intel.com> |
pfcp: add PFCP module
Packet Forwarding Control Protocol (PFCP) is a 3GPP Protocol used between the control plane and the user plane function. It is specified in TS 29.244[1].
Note that this module is not designed to support this protocol in kernel space. There is no support for parsing any PFCP messages. There is no API that could be used by any userspace daemon. Basically it does not support PFCP. This protocol is sophisticated and there is no need for implementing it in the kernel. The purpose of this module is to allow users to set up software and hardware offload of PFCP packets using the tc tool.
When the user requests creation of a PFCP device, a new socket is created. The socket is set up with port number 8805, which is specific to PFCP [29.244 4.2.2]. This allows receiving PFCP request messages; response messages use other ports.
Note that only one PFCP netdev can be created.
Only IPv4 is supported at this time.
[1] https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3111
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| #
62087995 |
| 11-Dec-2023 |
Heng Qi <hengqi@linux.alibaba.com> |
virtio-net: support rx netdim
By comparing the traffic information in the complete napi processes, let the virtio-net driver automatically adjust the coalescing moderation parameters of each receive queue.
Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| #
35dfaad7 |
| 24-Oct-2023 |
Daniel Borkmann <daniel@iogearbox.net> |
netkit, bpf: Add bpf programmable net device
This work adds a new, minimal BPF-programmable device called "netkit" (former PoC code-name "meta") we recently presented at LSF/MM/BPF. The core idea is that BPF programs are executed within the driver's xmit routine, which, for example in the case of containers/Pods, moves BPF processing closer to the source.
One of the goals was that, for Pod egress traffic, BPF programs can be moved from hostns tcx ingress into the device itself, providing earlier drop or forward mechanisms. For example, if the BPF program determines that the skb must be sent out of the node, a redirect to the physical device can take place directly without going through the per-CPU backlog queue. This helps to shift processing for such traffic from softirq to process context, leading to better scheduling decisions/performance (see measurements in the slides).
In this initial version, the netkit device ships as a pair, but we plan to extend this further so it can also operate in single device mode. The pair comes with a primary and a peer device. Only the primary device, typically residing in hostns, can manage BPF programs for itself and its peer. The peer device is designated for containers/Pods and cannot attach/detach BPF programs. Upon the device creation, the user can set the default policy to 'pass' or 'drop' for the case when no BPF program is attached.
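The pair semantics described above (only the primary manages programs for itself and its peer, and a default pass/drop policy applies when nothing is attached) can be captured in a few lines. This is a hypothetical model: `struct nk_dev`, `nk_attach()`, `nk_xmit()` and `prog_pass_all()` are illustrative names, and real netkit attachment goes through bpf_mprog and the BPF link/attach APIs, not a function pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum nk_verdict { NK_PASS, NK_DROP };

typedef enum nk_verdict (*nk_prog_t)(const void *skb);

/* Hypothetical model of one end of a netkit pair. */
struct nk_dev {
	bool primary;               /* hostns side: may manage programs */
	enum nk_verdict def_policy; /* used when no program is attached */
	nk_prog_t prog;             /* attached program, or NULL */
	struct nk_dev *peer;
};

/* Example program used in the usage checks: lets everything through. */
static enum nk_verdict prog_pass_all(const void *skb)
{
	(void)skb;
	return NK_PASS;
}

/* Only the primary may attach, and only to itself or its peer. */
static int nk_attach(struct nk_dev *caller, struct nk_dev *target, nk_prog_t prog)
{
	if (!caller->primary)
		return -EPERM;
	if (target != caller && target != caller->peer)
		return -EINVAL;
	target->prog = prog;
	return 0;
}

/* Transmit-time decision: run the program if present, else the default. */
static enum nk_verdict nk_xmit(const struct nk_dev *dev, const void *skb)
{
	assert(dev != NULL);
	return dev->prog ? dev->prog(skb) : dev->def_policy;
}
```

With the peer defaulting to drop, a freshly created Pod-side device forwards nothing until the primary side attaches a program.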
Additionally, the device can be operated in L3 (default) or L2 mode. The management of BPF programs is done via bpf_mprog, so that multi-attach is supported right from the beginning with similar API and dependency controls as tcx. For details on the latter see commit 053c8e1f235d ("bpf: Add generic attach/detach/query API for multi-progs"). tc BPF compatibility is provided, so that existing programs can be easily migrated.
Going forward, we plan to use netkit devices in Cilium as the main device type for connecting Pods. They will be operated in L3 mode in order to simplify a Pod's neighbor management, and the peer will operate in default drop mode, so that no traffic leaves between the time a Pod is brought up by the CNI plugin and the time programs are attached by the agent. Additionally, the programs we attach via tcx on the physical devices use bpf_redirect_peer() for inbound traffic into the netkit device, hence the latter also supports the ndo_get_peer_dev callback. Similarly, we use bpf_redirect_neigh() for the way out, pushing from the netkit peer to the phys device directly. Also, BIG TCP is supported on the netkit device. For the follow-up work in single device mode, we plan to convert Cilium's cilium_host/_net devices into a single one.
An extensive test suite for checking device operations and the BPF program and link management API comes as BPF selftests in this series.
Co-developed-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://github.com/borkmann/iproute2/tree/pr/netkit
Link: http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf (24ff.)
Link: https://lore.kernel.org/r/20231024214904.29825-2-daniel@iogearbox.net
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
|
| #
fad361a2 |
| 11-Aug-2023 |
Breno Leitao <leitao@debian.org> |
netconsole: Enable compile time configuration
Enable netconsole features to be set at compilation time. Create two Kconfig options that allow users to set extended logs and release prepending features at compilation time.
Right now, the user needs to pass command line parameters to netconsole, such as "+"/"r" to enable extended logs and version prepending features.
With these two options, the user can set the default values for the features at compile time and does not need to pass them on the command line to enable them, simplifying the command line.
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230811093158.1678322-3-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| #
54f00cce |
| 10-Aug-2023 |
William Tu <u9012063@gmail.com> |
vmxnet3: Add XDP support.
The patch adds native-mode XDP support: XDP DROP, PASS, TX, and REDIRECT.
Background: The vmxnet3 rx consists of three rings: ring0, ring1, and dataring. For r0 and r1, buffers at r0 are allocated using alloc_skb APIs and dma mapped to the ring's descriptor. If LRO is enabled and the packet size is larger than 3K (VMXNET3_MAX_SKB_BUF_SIZE), then r1 is used to map the rest of the buffer beyond VMXNET3_MAX_SKB_BUF_SIZE. Each buffer in r1 is allocated using alloc_page. So for LRO packets, the payload will be in one buffer from r0 and multiple from r1; for non-LRO packets with size less than 3K, only one descriptor in r0 is used.
When receiving a packet, the first descriptor will have the sop (start of packet) bit set, and the last descriptor will have the eop (end of packet) bit set. Non-LRO packets will have only one descriptor with both sop and eop set.
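Grouping descriptors into packets with the sop/eop bits works as follows: a packet starts at a descriptor with sop set and ends at the next one with eop set. A minimal sketch, assuming a hypothetical `struct rx_desc` with just those two bits and a length. The real vmxnet3 completion descriptor and its processing loop look different.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down RX completion descriptor. */
struct rx_desc {
	bool sop; /* start of packet */
	bool eop; /* end of packet */
	unsigned int len;
};

/*
 * Walk completed descriptors and count complete packets. total_len[i]
 * receives the byte count of packet i (for up to max_pkts packets).
 * Returns the number of complete packets, or -1 on a malformed sequence.
 */
static int count_packets(const struct rx_desc *d, size_t n,
			 unsigned int *total_len, size_t max_pkts)
{
	size_t pkts = 0;
	bool in_pkt = false;
	unsigned int acc = 0;

	assert(d != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (d[i].sop) {
			if (in_pkt)
				return -1; /* new sop before previous eop */
			in_pkt = true;
			acc = 0;
		} else if (!in_pkt) {
			return -1; /* continuation without a preceding sop */
		}
		acc += d[i].len;
		if (d[i].eop) {
			if (pkts < max_pkts)
				total_len[pkts] = acc;
			pkts++;
			in_pkt = false;
		}
	}
	return (int)pkts;
}
```

A non-LRO packet is the single-descriptor case where sop and eop are both set. An LRO packet is one r0 descriptor with sop set, followed by r1 descriptors, the last of which has eop set.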
Other than r0 and r1, the vmxnet3 dataring is specifically designed for handling small packets, usually 128 bytes as defined by VMXNET3_DEF_RXDATA_DESC_SIZE, by simply copying the packet from the backend driver in ESXi to the ring's memory region in the front-end vmxnet3 driver, in order to avoid memory mapping/unmapping overhead. In summary, by packet size:
A. < 128B: use dataring
B. 128B - 3K: use ring0 (VMXNET3_RX_BUF_SKB)
C. > 3K: use ring0 and ring1 (VMXNET3_RX_BUF_SKB + VMXNET3_RX_BUF_PAGE)
As a result, the patch adds XDP support for packets using the dataring and r0 (cases A and B), but not for the large packets used when LRO is enabled.
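The A/B/C summary maps directly onto a classification function. This sketch follows the summary's boundaries literally (below 128 B, 128 B up to and including 3 KiB, above 3 KiB). The exact boundary handling in the driver is an assumption, as are the hypothetical names `classify_rx()` and `xdp_native_supported()` and the local copies of the two size constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Local models of the thresholds named in the text above. */
#define RXDATA_DESC_SIZE 128u        /* models VMXNET3_DEF_RXDATA_DESC_SIZE */
#define MAX_SKB_BUF_SIZE (3u * 1024) /* models VMXNET3_MAX_SKB_BUF_SIZE (3K) */

enum rx_path {
	RX_DATARING,    /* A: copied into the dataring */
	RX_RING0,       /* B: single ring0 skb buffer */
	RX_RING0_RING1, /* C: ring0 head plus ring1 pages (LRO) */
};

static enum rx_path classify_rx(unsigned int len)
{
	assert(len > 0);
	if (len < RXDATA_DESC_SIZE)
		return RX_DATARING;
	if (len <= MAX_SKB_BUF_SIZE)
		return RX_RING0;
	return RX_RING0_RING1;
}

/* Native XDP in this patch covers cases A and B only. */
static bool xdp_native_supported(enum rx_path p)
{
	return p == RX_DATARING || p == RX_RING0;
}
```

Cases A and B each fit the packet into a single buffer, which is why they are the ones handled by native XDP.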
XDP Implementation: When the user loads an XDP prog, the vmxnet3 driver checks configurations, such as mtu and lro, and re-allocates the rx buffer size to reserve the extra headroom, XDP_PACKET_HEADROOM, for the XDP frame. The XDP prog will then be associated with every rx queue of the device. Note that when using the dataring for small packets, vmxnet3 (the front-end driver) doesn't control the buffer allocation; as a result, we allocate a new page and copy the packet from the dataring to the XDP frame.
The receive side of XDP is implemented for cases A and B by invoking the bpf program at vmxnet3_rq_rx_complete and handling its returned action. The vmxnet3_process_xdp() and vmxnet3_process_xdp_small() functions handle the ring0 and dataring cases separately, and decide the next journey of the packet afterward.
For TX, vmxnet3 has a split header design. Outgoing packets are parsed first and protocol headers (L2/L3/L4) are copied to the backend. The rest of the payload is dma mapped. Since XDP_TX does not parse the packet protocol, the entire XDP frame is dma mapped for transmission and transmitted in a batch. Later on, the frame is freed and recycled back to the memory pool.
Performance: Tested using two VMs inside one ESXi vSphere 7.0 machine, using a single core on each vmxnet3 device, with the sender using DPDK testpmd tx-mode attached to the vmxnet3 device, sending 64B or 512B UDP packets.
VM1 txgen:
$ dpdk-testpmd -l 0-3 -n 1 -- -i --nb-cores=3 \
    --forward-mode=txonly --eth-peer=0,<mac addr of vm2>
option: add "--txonly-multi-flow"
option: use --txpkts=512 or 64 byte
VM2 running XDP:
$ ./samples/bpf/xdp_rxq_info -d ens160 -a <options> --skb-mode
$ ./samples/bpf/xdp_rxq_info -d ens160 -a <options>
options: XDP_DROP, XDP_PASS, XDP_TX
To test REDIRECT to cpu 0, use
$ ./samples/bpf/xdp_redirect_cpu -d ens160 -c 0 -e drop
Single core performance comparison with skb-mode.
64B:            skb-mode -> native-mode
XDP_DROP:       1.6Mpps  -> 2.4Mpps
XDP_PASS:       338Kpps  -> 367Kpps
XDP_TX:         1.1Mpps  -> 2.3Mpps
REDIRECT-drop:  1.3Mpps  -> 2.3Mpps

512B:           skb-mode -> native-mode
XDP_DROP:       863Kpps  -> 1.3Mpps
XDP_PASS:       275Kpps  -> 376Kpps
XDP_TX:         554Kpps  -> 1.2Mpps
REDIRECT-drop:  659Kpps  -> 1.2Mpps
Demo: https://youtu.be/4lm1CSCi78Q
Future work:
- XDP frag support
- use napi_consume_skb() instead of dev_kfree_skb_any at unmap
- stats using u64_stats_t
- using bitfield macro BIT()
- optimization for DMA synchronization using actual frame length, instead of always max_len
Signed-off-by: William Tu <u9012063@gmail.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| #
b63e78fc |
| 07-Aug-2023 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: netdevsim: use mock PHC driver
I'd like to make netdevsim offload tc-taprio, but currently this Qdisc emits an ETHTOOL_GET_TS_INFO call to the driver to make sure that it has a PTP clock, so that it is reasonably capable of offloading the schedule.
By using the mock PHC driver, that becomes possible.
Hardware timestamping is not necessary, and netdevsim does not support packet I/O anyway.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Link: https://lore.kernel.org/r/20230807193324.4128292-8-vladimir.oltean@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
| #
5e316a81 |
| 09-May-2023 |
Lorenzo Bianconi <lorenzo@kernel.org> |
net: veth: make PAGE_POOL_STATS optional
Since veth is very likely to be enabled and there are some drivers (e.g. mlx5) where CONFIG_PAGE_POOL_STATS is optional, make CONFIG_PAGE_POOL_STATS optional for veth too in order to keep it optional instead of required.
Suggested-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| #
4fc41805 |
| 22-Apr-2023 |
Lorenzo Bianconi <lorenzo@kernel.org> |
net: veth: add page_pool stats
Introduce page_pool stats support to report info about the local page_pool through ethtool.
Tested-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|