#
07fdad3a |
| 03-Oct-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni: "Core & protocols:
- Improve drop account scalability on NUM
Merge tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni: "Core & protocols:
- Improve drop account scalability on NUMA hosts for RAW and UDP sockets and the backlog, almost doubling the Pps capacity under DoS
- Optimize the UDP RX performance under stress, reducing contention, revisiting the binary layout of the involved data structs and implementing NUMA-aware locking. This improves UDP RX performance by an additional 50%, even more under extreme conditions
- Add support for PSP encryption of TCP connections; this mechanism has some similarities with IPsec and TLS, but offers superior HW offloads capabilities
- Ongoing work to support Accurate ECN for TCP. AccECN allows more than one congestion notification signal per RTT and is a building block for Low Latency, Low Loss, and Scalable Throughput (L4S)
- Reorganize the TCP socket binary layout for data locality, reducing the number of touched cachelines in the fastpath
- Refactor skb deferral free to better scale on large multi-NUMA hosts, this improves TCP and UDP RX performances significantly on such HW
- Increase the default socket memory buffer limits from 256K to 4M to better fit modern link speeds
- Improve handling of setups with a large number of nexthop, making dump operating scaling linearly and avoiding unneeded synchronize_rcu() on delete
- Improve bridge handling of VLAN FDB, storing a single entry per bridge instead of one entry per port; this makes the dump order of magnitude faster on large switches
- Restore IP ID correctly for encapsulated packets at GSO segmentation time, allowing GRO to merge packets in more scenarios
- Improve netfilter matching performance on large sets
- Improve MPTCP receive path performance by leveraging recently introduced core infrastructure (skb deferral free) and adopting recent TCP autotuning changes
- Allow bridges to redirect to a backup port when the bridge port is administratively down
- Introduce MPTCP 'laminar' endpoint that con be used only once per connection and simplify common MPTCP setups
- Add RCU safety to dst->dev, closing a lot of possible races
- A significant crypto library API for SCTP, MPTCP and IPv6 SR, reducing code duplication
- Supports pulling data from an skb frag into the linear area of an XDP buffer
Things we sprinkled into general kernel code:
- Generate netlink documentation from YAML using an integrated YAML parser
Driver API:
- Support using IPv6 Flow Label in Rx hash computation and RSS queue selection
- Introduce API for fetching the DMA device for a given queue, allowing TCP zerocopy RX on more H/W setups
- Make XDP helpers compatible with unreadable memory, allowing more easily building DevMem-enabled drivers with a unified XDP/skbs datapath
- Add a new dedicated ethtool callback enabling drivers to provide the number of RX rings directly, improving efficiency and clarity in RX ring queries and RSS configuration
- Introduce a burst period for the health reporter, allowing better handling of multiple errors due to the same root cause
- Support for DPLL phase offset exponential moving average, controlling the average smoothing factor
Device drivers:
- Add a new Huawei driver for 3rd gen NIC (hinic3)
- Add a new SpacemiT driver for K1 ethernet MAC
- Add a generic abstraction for shared memory communication devices (dibps)
- Ethernet high-speed NICs: - nVidia/Mellanox: - Use multiple per-queue doorbell, to avoid MMIO contention issues - support adjacent functions, allowing them to delegate their SR-IOV VFs to sibling PFs - support RSS for IPSec offload - support exposing raw cycle counters in PTP and mlx5 - support for disabling host PFs. - Intel (100G, ice, idpf): - ice: support for SRIOV VFs over an Active-Active link aggregate - ice: support for firmware logging via debugfs - ice: support for Earliest TxTime First (ETF) hardware offload - idpf: support basic XDP functionalities and XSk - Broadcom (bnxt): - support Hyper-V VF ID - dynamic SRIOV resource allocations for RoCE - Meta (fbnic): - support queue API, zero-copy Rx and Tx - support basic XDP functionalities - devlink health support for FW crashes and OTP mem corruptions - expand hardware stats coverage to FEC, PHY, and Pause - Wangxun: - support ethtool coalesce options - support for multiple RSS contexts
- Ethernet virtual: - Macsec: - replace custom netlink attribute checks with policy-level checks - Bonding: - support aggregator selection based on port priority - Microsoft vNIC: - use page pool fragments for RX buffers instead of full pages to improve memory efficiency
- Ethernet NICs consumer, and embedded: - Qualcomm: support Ethernet function for IPQ9574 SoC - Airoha: implement wlan offloading via NPU - Freescale - enetc: add NETC timer PTP driver and add PTP support - fec: enable the Jumbo frame support for i.MX8QM - Renesas (R-Car S4): - support HW offloading for layer 2 switching - support for RZ/{T2H, N2H} SoCs - Cadence (macb): support TAPRIO traffic scheduling - TI: - support for Gigabit ICSS ethernet SoC (icssm-prueth) - Synopsys (stmmac): a lot of cleanups
- Ethernet PHYs: - Support 10g-qxgmi phy-mode for AQR412C, Felix DSA and Lynx PCS driver - Support bcm63268 GPHY power control - Support for Micrel lan8842 PHY and PTP - Support for Aquantia AQR412 and AQR115
- CAN: - a large CAN-XL preparation work - reorganize raw_sock and uniqframe struct to minimize memory usage - rcar_canfd: update the CAN-FD handling
- WiFi: - extended Neighbor Awareness Networking (NAN) support - S1G channel representation cleanup - improve S1G support
- WiFi drivers: - Intel (iwlwifi): - major refactor and cleanup - Broadcom (brcm80211): - support for AP isolation - RealTek (rtw88/89) rtw88/89: - preparation work for RTL8922DE support - MediaTek (mt76): - HW restart improvements - MLO support - Qualcomm/Atheros (ath10k): - GTK rekey fixes
- Bluetooth drivers: - btusb: support for several new IDs for MT7925 - btintel: support for BlazarIW core - btintel_pcie: support for _suspend() / _resume() - btintel_pcie: support for Scorpious, Panther Lake-H484 IDs"
* tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1536 commits) net: stmmac: Add support for Allwinner A523 GMAC200 dt-bindings: net: sun8i-emac: Add A523 GMAC200 compatible Revert "Documentation: net: add flow control guide and document ethtool API" octeontx2-pf: fix bitmap leak octeontx2-vf: fix bitmap leak net/mlx5e: Use extack in set rxfh callback net/mlx5e: Introduce mlx5e_rss_params for RSS configuration net/mlx5e: Introduce mlx5e_rss_init_params net/mlx5e: Remove unused mdev param from RSS indir init net/mlx5: Improve QoS error messages with actual depth values net/mlx5e: Prevent entering switchdev mode with inconsistent netns net/mlx5: HWS, Generalize complex matchers net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs selftests/net: add tcp_port_share to .gitignore Revert "net/mlx5e: Update and set Xon/Xoff upon MTU set" net: add NUMA awareness to skb_attempt_defer_free() net: use llist for sd->defer_list net: make softnet_data.defer_count an atomic selftests: drv-net: psp: add tests for destroying devices selftests: drv-net: psp: add test for auto-adjusting TCP MSS ...
show more ...
|
Revision tags: v6.17, v6.17-rc7, v6.17-rc6 |
|
#
203e3beb |
| 12-Sep-2025 |
Jakub Kicinski <kuba@kernel.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.17-rc8).
Conflicts:
drivers/net/can/spi/hi311x.c 6b6968084721 ("can: hi311x
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.17-rc8).
Conflicts:
drivers/net/can/spi/hi311x.c 6b6968084721 ("can: hi311x: fix null pointer dereference when resuming from sleep before interface was enabled") 27ce71e1ce81 ("net: WQ_PERCPU added to alloc_workqueue users") https://lore.kernel.org/72ce7599-1b5b-464a-a5de-228ff9724701@kernel.org
net/smc/smc_loopback.c drivers/dibs/dibs_loopback.c a35c04de2565 ("net/smc: fix warning in smc_rx_splice() when calling get_page()") cc21191b584c ("dibs: Move data path to dibs layer") https://lore.kernel.org/74368a5c-48ac-4f8e-a198-40ec1ed3cf5f@kernel.org
Adjacent changes:
drivers/net/dsa/lantiq/lantiq_gswip.c c0054b25e2f1 ("net: dsa: lantiq_gswip: move gswip_add_single_port_br() call to port_setup()") 7a1eaef0a791 ("net: dsa: lantiq_gswip: support model-specific mac_select_pcs()")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
df152675 |
| 23-Sep-2025 |
Paolo Abeni <pabeni@redhat.com> |
Merge branch 'dibs-direct-internal-buffer-sharing'
Alexandra Winter says:
==================== dibs - Direct Internal Buffer Sharing
This series introduces a generic abstraction of existing compon
Merge branch 'dibs-direct-internal-buffer-sharing'
Alexandra Winter says:
==================== dibs - Direct Internal Buffer Sharing
This series introduces a generic abstraction of existing components like: - the s390 specific ISM device (Internal Shared Memory), - the SMC-D loopback mechanism (Shared Memory Communication - Direct) - the client interface of the SMC-D module to the transport devices This generic shim layer can be extended with more devices, more clients and more features in the future.
This layer is called 'dibs' for Direct Internal Buffer Sharing based on the common scheme that these mechanisms enable controlled sharing of memory buffers within some containing entity such as a hypervisor or a Linux instance.
Benefits: - Cleaner separation of ISM and SMC-D functionality - simpler and less module dependencies - Clear interface definition. - Extendable for future devices and clients.
An overview was given at the Netdev 0x19 conference, recordings and slides are available [1].
Background / Status quo: ------------------------ Currently s390 hardware provides virtual PCI ISM devices (Internal Shared Memory). Their driver is in drivers/s390/net/ism_drv.c. The main user is SMC-D (net/smc). The ism driver offers a client interface so other users/protocols can also use them, but it is still heavily intermingled with the smc code. Namely, the ism module cannot be used without the smc module, which feels artificial.
There is ongoing work to extend the ISM concept of shared buffers that can be accessed directly by another instance on the same hardware: [2] proposed a loopback interface (ism_lo), that can be used on non-s390 architectures (e.g. between containers or to test SMC-D). A minimal implementation went upstream with [3]: ism_lo currently is a part of the smc protocol and rather hidden.
[4] proposed a virtio definition of ism (ism_virtio) that can be used between kvm guests.
We will shortly send an RFC for an dibs client that uses dibs as transport for TTY.
Concept: -------- Create a shim layer in net/dibs that contains common definitions and code for all dibs devices and all dibs clients. Any device or client module only needs to depend on this dibs layer module and any device or client code only needs to include the definitions in include/linux/dibs.h.
The name dibs was chosen to clearly distinguish it from the existing s390 ism devices. And to emphasize that it is not about sharing whole memory regions with anybody, but dedicating single buffers for another system.
Implementation: --------------- The end result of this series is: A dibs shim layer with - One dibs client: smc-d - Two dibs device drivers: ism and dibs-loopback - Everything prepared to add more clients and more device drivers.
Patches 1-2 contain some issues that were found along the way. They make sense on their own, but also enable a better structured dibs series.
There are three components that exist today: a) smc module (especially SMC-D functionality, which is an ism client today) b) ism device driver (supports multiple ism clients today) c) smc-loopback (integrated with smc today) In order to preserve existing functionality at each step, these are not moved to dibs layer by component, instead: - the dibs layer is established in parallel to existing code [patches 3-6] - then some service functions are moved to the dibs layer [patches 7-12] - the actual data movement is moved to the dibs layer [patch 13] - and last event handling is moved to the dibs layer [patch 14]
Future: ------- Items that are not part of this patchset but can be added later: - dynamically add or remove dibs_loopback. That will be allow for simple testing of add_dev()/del_dev() - handle_irq(): Call clients without interrupt context. e.g using threaded interrupts. I left this for a follow-on, because it includes conceptual changes for the smcd receive code. - Any improvements of locking scopes. I mainly moved some of the the existing locks to dibs layer. I have the feeling there is room for improvements. - The device drivers should not loop through the client array - dibs_dev_op.*_dmb() functions reveal unnecessary details of the internal dmb struct to the clients - Check whether client calls to dibs_dev_ops should be replaced by interface functions that provide additional value - Check whether device driver calls to dibs_client_ops should be replaced by interface functions that provide additional value.
Link: [1] https://netdevconf.info/0x19/sessions/talk/communication-via-internal-shared-memory-ism-time-to-open-up.html Link: [2] https://lore.kernel.org/netdev/1695568613-125057-1-git-send-email-guwen@linux.alibaba.com/ Link: [3] https://lore.kernel.org/linux-kernel//20240428060738.60843-1-guwen@linux.alibaba.com/ Link: [4] https://groups.oasis-open.org/communities/community-home/digestviewer/viewthread?GroupId=3973&MessageKey=c060ecf9-ea1a-49a2-9827-c92f0e6447b2&CommunityKey=2f26be99-3aa1-48f6-93a5-018dce262226&hlmlt=VT ====================
Link: https://patch.msgid.link/20250918110500.1731261-1-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
cc21191b |
| 18-Sep-2025 |
Alexandra Winter <wintera@linux.ibm.com> |
dibs: Move data path to dibs layer
Use struct dibs_dmb instead of struct smc_dmb and move the corresponding client tables to dibs_dev. Leave driver specific implementation details like sba in the de
dibs: Move data path to dibs layer
Use struct dibs_dmb instead of struct smc_dmb and move the corresponding client tables to dibs_dev. Leave driver specific implementation details like sba in the device drivers.
Register and unregister dmbs via dibs_dev_ops. A dmb is dedicated to a single client, but a dibs device can have dmbs for more than one client.
Trigger dibs clients via dibs_client_ops->handle_irq(), when data is received into a dmb. For dibs_loopback replace scheduling an smcd receive tasklet with calling dibs_client_ops->handle_irq().
For loopback devices attach_dmb(), detach_dmb() and move_data() need to access the dmb tables, so move those to dibs_dev_ops in this patch as well.
Remove remaining definitions of smc_loopback as they are no longer required, now that everything is in dibs_loopback.
Note that struct ism_client and struct ism_dev are still required in smc until a follow-on patch moves event handling to dibs. (Loopback does not use events).
Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-14-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
719c3b67 |
| 18-Sep-2025 |
Alexandra Winter <wintera@linux.ibm.com> |
dibs: Move query_remote_gid() to dibs_dev_ops
Provide the dibs_dev_ops->query_remote_gid() in ism and dibs_loopback dibs_devices. And call it in smc dibs_client.
Reviewed-by: Julian Ruess <julianr@
dibs: Move query_remote_gid() to dibs_dev_ops
Provide the dibs_dev_ops->query_remote_gid() in ism and dibs_loopback dibs_devices. And call it in smc dibs_client.
Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-13-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
05e68d8d |
| 18-Sep-2025 |
Alexandra Winter <wintera@linux.ibm.com> |
dibs: Local gid for dibs devices
Define a uuid_t GID attribute to identify a dibs device.
SMC uses 64 Bit and 128 Bit Global Identifiers (GIDs) per device, that need to be sent via the SMC protocol
dibs: Local gid for dibs devices
Define a uuid_t GID attribute to identify a dibs device.
SMC uses 64 Bit and 128 Bit Global Identifiers (GIDs) per device, that need to be sent via the SMC protocol. Because the smc code uses integers, network endianness and host endianness need to be considered. Avoid this in the dibs layer by using uuid_t byte arrays. Future patches could change SMC to use uuid_t. For now conversion helper functions are introduced.
ISM devices provide 64 Bit GIDs. Map them to dibs uuid_t GIDs like this: _________________________________________ | 64 Bit ISM-vPCI GID | 00000000_00000000 | ----------------------------------------- If interpreted as UUID [1], this would be interpreted as the UIID variant, that is reserved for NCS backward compatibility. So it will not collide with UUIDs that were generated according to the standard.
smc_loopback already uses version 4 UUIDs as 128 Bit GIDs, move that to dibs loopback. A temporary change to smc_lo_query_rgid() is required, that will be moved to dibs_loopback with a follow-on patch.
Provide gid of a dibs device as sysfs read-only attribute.
Link: https://datatracker.ietf.org/doc/html/rfc4122 [1] Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Julian Ruess <julianr@linux.ibm.com> Reviewed-by: Mahanta Jambigi <mjambigi@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-11-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
845c334a |
| 18-Sep-2025 |
Julian Ruess <julianr@linux.ibm.com> |
dibs: Move struct device to dibs_dev
Move struct device from ism_dev and smc_lo_dev to dibs_dev, and define a corresponding release function. Free ism_dev in ism_remove() and smc_lo_dev in smc_lo_de
dibs: Move struct device to dibs_dev
Move struct device from ism_dev and smc_lo_dev to dibs_dev, and define a corresponding release function. Free ism_dev in ism_remove() and smc_lo_dev in smc_lo_dev_remove().
Replace smcd->ops->get_dev(smcd) by using dibs->dev directly.
An alternative design would be to embed dibs_dev as a field in ism_dev and do the same for other dibs device driver specific structs. However that would have the disadvantage that each dibs device driver needs to allocate dibs_dev and each dibs device driver needs a different device release function. The advantage would be that ism_dev and other device driver specific structs would be covered by device reference counts.
Signed-off-by: Julian Ruess <julianr@linux.ibm.com> Co-developed-by: Alexandra Winter <wintera@linux.ibm.com> Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Mahanta Jambigi <mjambigi@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-9-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
69baaac9 |
| 18-Sep-2025 |
Alexandra Winter <wintera@linux.ibm.com> |
dibs: Define dibs_client_ops and dibs_dev_ops
Move the device add() and remove() functions from ism_client to dibs_client_ops and call add_dev()/del_dev() for ism devices and dibs_loopback devices.
dibs: Define dibs_client_ops and dibs_dev_ops
Move the device add() and remove() functions from ism_client to dibs_client_ops and call add_dev()/del_dev() for ism devices and dibs_loopback devices. dibs_client_ops->add_dev() = smcd_register_dev() for the smc_dibs_client. This is the first step to handle ism and loopback devices alike (as dibs devices) in the smc dibs client.
Define dibs_dev->ops and move smcd_ops->get_chid to dibs_dev_ops->get_fabric_id() for ism and loopback devices. See below for why this needs to be in the same patch as dibs_client_ops->add_dev().
The following changes contain intermediate steps, that will be obsoleted by follow-on patches, once more functionality has been moved to dibs:
Use different smcd_ops and max_dmbs for ism and loopback. Follow-on patches will change SMC-D to directly use dibs_ops instead of smcd_ops.
In smcd_register_dev() it is now necessary to identify a dibs_loopback device before smcd_dev and smcd_ops->get_chid() are available. So provide dibs_dev_ops->get_fabric_id() in this patch and evaluate it in smc_ism_is_loopback().
Call smc_loopback_init() in smcd_register_dev() and call smc_loopback_exit() in smcd_unregister_dev() to handle the functionality that is still in smc_loopback. Follow-on patches will move all smc_loopback code to dibs_loopback.
In smcd_[un]register_dev() use only ism device name, this will be replaced by dibs device name by a follow-on patch.
End of changes with intermediate parts.
Allocate an smcd event workqueue for all dibs devices, although dibs_loopback does not generate events.
Use kernel memory instead of devres memory for smcd_dev and smcd->conn. Since commit a72178cfe855 ("net/smc: Fix dependency of SMC on ISM") an ism device and its driver can have a longer lifetime than the smc module, so smc should not rely on devres to free its resources [1]. It is now the responsibility of the smc client to free smcd and smcd->conn for all dibs devices, ism devices as well as loopback. Call client->ops->del_dev() for all existing dibs devices in dibs_unregister_client(), so all device related structures can be freed in the client.
When dibs_unregister_client() is called in the context of smc_exit() or smc_core_reboot_event(), these functions have already called smc_lgrs_shutdown() which calls smc_smcd_terminate_all(smcd) and sets going_away. This is done a second time in smcd_unregister_dev(). This is analogous to how smcr is handled in these functions, by calling first smc_lgrs_shutdown() and then smc_ib_unregister_client() > smc_ib_remove_dev(), so leave it that way. It may be worth investigating, whether smc_lgrs_shutdown() is still required or useful.
Remove CONFIG_SMC_LO. CONFIG_DIBS_LO now controls whether a dibs loopback device exists or not.
Link: https://www.kernel.org/doc/Documentation/driver-model/devres.txt [1] Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Reviewed-by: Mahanta Jambigi <mjambigi@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-8-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
#
cb990a45 |
| 18-Sep-2025 |
Alexandra Winter <wintera@linux.ibm.com> |
dibs: Define dibs loopback
The first stage of loopback-ism was implemented as part of the SMC module [1]. Now that we have the dibs layer, provide access to a dibs_loopback device to all dibs client
dibs: Define dibs loopback
The first stage of loopback-ism was implemented as part of the SMC module [1]. Now that we have the dibs layer, provide access to a dibs_loopback device to all dibs clients.
This is the first step of moving loopback-ism from net/smc/smc_loopback.* to drivers/dibs/dibs_loopback.*. One global structure lo_dev is allocated and added to the dibs devices. Follow-on patches will move functionality.
Same as smc_loopback, dibs_loopback is provided by a config option. Note that there is no way to dynamically add or remove the loopback device. That could be a future improvement.
When moving code to drivers/dibs, replace ism_ prefix with dibs_ prefix. As this is mostly a move of existing code, copyright and authors are unchanged.
Link: https://lore.kernel.org/lkml/20240428060738.60843-1-guwen@linux.alibaba.com/ [1] Signed-off-by: Alexandra Winter <wintera@linux.ibm.com> Link: https://patch.msgid.link/20250918110500.1731261-7-wintera@linux.ibm.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|