#
169e7776 |
| 24-Mar-2022 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski: "The sprinkling of SPI drivers is because we added a new one
Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski: "The sprinkling of SPI drivers is because we added a new one and Mark sent us a SPI driver interface conversion pull request.
Core ----
- Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO).
- Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order.
- Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect).
- Continue the work of annotating packet drop reasons throughout the stack.
- Switch netdev error counters from an atomic to dynamically allocated per-CPU counters.
- Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT.
- Extend the ref_tracker to allow catching use-after-free bugs.
BPF ---
- Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split.
- Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers.
- Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency.
- Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator.
- Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep.
- Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later).
- Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra.
- Allow cgroup BPF progs to return custom errors to user space.
- Add support for AF_UNIX iterator batching.
- Allow iterator programs to use sleepable helpers.
- Support JIT of add, and, or, xor and xchg atomic ops on arm64.
- Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info.
- Large number of libbpf API improvements, cleanups and deprecations.
Protocols ---------
- Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev.
- Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames.
- Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior.
- VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge.
- Support inserting IPv6 IOAM information to a fraction of frames.
- Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.)
- Support setting socket and IPv6 options via cmsg on ping6 sockets.
- Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules.
- Add support for locked bridge ports (for 802.1X).
- tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios.
- IPv6 extension header handling in Open vSwitch.
- Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor.
- SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY
- MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237)
- Multi-BSSID beacon handling in AP mode for WiFi.
- Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events
- Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements
- Increase the max PDU size in CAN ISOTP to 64 kB.
Driver API ----------
- Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels.
- Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8.
- Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks.
- Add header/data split indication to guide user space enabling of TCP zero-copy Rx.
- Allow configuring completion queue event size.
- Refactor page_pool to enable fragmenting after allocation.
- Add allocation and page reuse statistics to page_pool.
- Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches.
- DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering
New hardware / drivers ----------------------
- Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches
- WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6
- Mobile: - iosm: Intel M.2 7360 WWAN card
Drivers -------
- Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases.
- Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support
- Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions
- Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP
- Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support
- Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports
- Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload
- NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches
- Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock
- Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets
- Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode
- Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices
- MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915
- RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan
- Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS)
- Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup"
* tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits) llc: fix netdevice reference leaks in llc_ui_bind() drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool ice: don't allow to run ice_send_event_to_aux() in atomic ctx ice: fix 'scheduling while atomic' on aux critical err interrupt net/sched: fix incorrect vlan_push_eth dest field net: bridge: mst: Restrict info size queries to bridge ports net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init() drivers: net: xgene: Fix regression in CRC stripping net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT net: dsa: fix missing host-filtered multicast addresses net/mlx5e: Fix build warning, detected write beyond size of field iwlwifi: mvm: Don't fail if PPAG isn't supported selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" netdevice: add missing dm_private kdoc net: bridge: mst: prevent NULL deref in br_mst_info_size() selftests: forwarding: Use same VRF for port and VLAN upper ...
show more ...
|
#
82e94d41 |
| 18-Mar-2022 |
Jakub Kicinski <kuba@kernel.org> |
Merge branch 'net-bridge-multiple-spanning-trees'
Tobias Waldekranz says:
==================== net: bridge: Multiple Spanning Trees
The bridge has had per-VLAN STP support for a while now, since:
Merge branch 'net-bridge-multiple-spanning-trees'
Tobias Waldekranz says:
==================== net: bridge: Multiple Spanning Trees
The bridge has had per-VLAN STP support for a while now, since:
https://lore.kernel.org/netdev/20200124114022.10883-1-nikolay@cumulusnetworks.com/
The current implementation has some problems:
- The mapping from VLAN to STP state is fixed as 1:1, i.e. each VLAN is managed independently. This is awkward from an MSTP (802.1Q-2018, Clause 13.5) point of view, where the model is that multiple VLANs are grouped into MST instances.
Because of the way that the standard is written, presumably, this is also reflected in hardware implementations. It is not uncommon for a switch to support the full 4k range of VIDs, but that the pool of MST instances is much smaller. Some examples:
Marvell LinkStreet (mv88e6xxx): 4k VLANs, but only 64 MSTIs Marvell Prestera: 4k VLANs, but only 128 MSTIs Microchip SparX-5i: 4k VLANs, but only 128 MSTIs
- By default, the feature is enabled, and there is no way to disable it. This makes it hard to add offloading in a backwards compatible way, since any underlying switchdevs have no way to refuse the function if the hardware does not support it
- The port-global STP state has precedence over per-VLAN states. In MSTP, as far as I understand it, all VLANs will use the common spanning tree (CST) by default - through traffic engineering you can then optimize your network to group subsets of VLANs to use different trees (MSTI). To my understanding, the way this is typically managed in silicon is roughly:
Incoming packet: .----.----.--------------.----.------------- | DA | SA | 802.1Q VID=X | ET | Payload ... '----'----'--------------'----'------------- | '->|\ .----------------------------. | +--> | VID | Members | ... | MSTI | PVID -->|/ |-----|---------|-----|------| | 1 | 0001001 | ... | 0 | | 2 | 0001010 | ... | 10 | | 3 | 0001100 | ... | 10 | '----------------------------' | .-----------------------------' | .------------------------. '->| MSTI | Fwding | Lrning | |------|--------|--------| | 0 | 111110 | 111110 | | 10 | 110111 | 110111 | '------------------------'
What this is trying to show is that the STP state (whether MSTP is used, or ye olde STP) is always accessed via the VLAN table. If STP is running, all MSTI pointers in that table will reference the same index in the STP stable - if MSTP is running, some VLANs may point to other trees (like in this example).
The fact that in the Linux bridge, the global state (think: index 0 in most hardware implementations) is supposed to override the per-VLAN state, is very awkward to offload. In effect, this means that when the global state changes to blocking, drivers will have to iterate over all MSTIs in use, and alter them all to match. This also means that you have to cache whether the hardware state is currently tracking the global state or the per-VLAN state. In the first case, you also have to cache the per-VLAN state so that you can restore it if the global state transitions back to forwarding.
This series adds a new mst_enable bridge setting (as suggested by Nik) that can only be changed when no VLANs are configured on the bridge. Enabling this mode has the following effect:
- The port-global STP state is used to represent the CST (Common Spanning Tree) (1/15)
- Ingress STP filtering is deferred until the frame's VLAN has been resolved (1/15)
- The preexisting per-VLAN states can no longer be controlled directly (1/15). They are instead placed under the MST module's control, which is managed using a new netlink interface (described in 3/15)
- VLANs can br mapped to MSTIs in an arbitrary M:N fashion, using a new global VLAN option (2/15)
Switchdev notifications are added so that a driver can track: - MST enabled state - VID to MSTI mappings - MST port states
An offloading implementation is this provided for mv88e6xxx. ====================
Link: https://lore.kernel.org/r/20220316150857.2442916-1-tobias@waldekranz.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
7414af30 |
| 16-Mar-2022 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: dsa: Handle MST state changes
Add the usual trampoline functionality from the generic DSA layer down to the drivers for MST state changes.
When a state changes to disabled/blocking/listening,
net: dsa: Handle MST state changes
Add the usual trampoline functionality from the generic DSA layer down to the drivers for MST state changes.
When a state changes to disabled/blocking/listening, make sure to fast age any dynamic entries in the affected VLANs (those controlled by the MSTI in question).
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
8e6598a7 |
| 16-Mar-2022 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: dsa: Pass VLAN MSTI migration notifications to driver
Add the usual trampoline functionality from the generic DSA layer down to the drivers for VLAN MSTI migrations.
Signed-off-by: Tobias Wald
net: dsa: Pass VLAN MSTI migration notifications to driver
Add the usual trampoline functionality from the generic DSA layer down to the drivers for VLAN MSTI migrations.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
332afc4c |
| 16-Mar-2022 |
Tobias Waldekranz <tobias@waldekranz.com> |
net: dsa: Validate hardware support for MST
When joining a bridge where MST is enabled, we validate that the proper offloading support is in place, otherwise we fallback to software bridging.
When
net: dsa: Validate hardware support for MST
When joining a bridge where MST is enabled, we validate that the proper offloading support is in place, otherwise we fallback to software bridging.
When then mode is changed on a bridge in which we are members, we refuse the change if offloading is not supported.
At the moment we only check for configurable learning, but this will be further restricted as we support more MST related switchdev events.
Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
a8253684 |
| 17-Mar-2022 |
Thomas Zimmermann <tzimmermann@suse.de> |
Merge drm/drm-fixes into drm-misc-fixes
Backmerging drm/drm-fixes for commit 3755d35ee1d2 ("drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDP").
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.d
Merge drm/drm-fixes into drm-misc-fixes
Backmerging drm/drm-fixes for commit 3755d35ee1d2 ("drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDP").
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
show more ...
|
#
ccdbf33c |
| 15-Mar-2022 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'v5.17-rc8' into sched/core, to pick up fixes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
ce835633 |
| 15-Mar-2022 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v5.17-rc8' into next
Sync up with mainline to again get the latest changes in HID subsystem.
|
#
65eab2bc |
| 14-Mar-2022 |
Arnaldo Carvalho de Melo <acme@redhat.com> |
Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes that went thru perf/urgent.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
#
411472ae |
| 14-Mar-2022 |
Ingo Molnar <mingo@kernel.org> |
Merge tag 'v5.17-rc8' into irq/core, to fix conflicts
Conflicts: drivers/pinctrl/pinctrl-starfive.c
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
#
6fb8661c |
| 03-Mar-2022 |
David S. Miller <davem@davemloft.net> |
Merge branch 'dsa-unicast-filtering'
Vladimir Oltean says:
==================== DSA unicast filtering
This series doesn't attempt anything extremely brave, it just changes the way in which standal
Merge branch 'dsa-unicast-filtering'
Vladimir Oltean says:
==================== DSA unicast filtering
This series doesn't attempt anything extremely brave, it just changes the way in which standalone ports which support FDB isolation work.
Up until now, DSA has recommended that switch drivers configure standalone ports in a separate VID/FID with learning disabled, and with the CPU port as the only destination, reached trivially via flooding. That works, except that standalone ports will deliver all packets to the CPU. We can leverage the hardware FDB as a MAC DA filter, and disable flooding towards the CPU port, to force the dropping of packets with unknown MAC DA.
We handle port promiscuity by re-enabling flooding towards the CPU port. This is relevant because the bridge puts its automatic (learning + flooding) ports in promiscuous mode, and this makes some things work automagically, like for example bridging with a foreign interface. We don't delve yet into the territory of managing CPU flooding more aggressively while under a bridge.
The only switch driver that benefits from this work right now is the NXP LS1028A switch (felix). The others need to implement FDB isolation first, before DSA is going to install entries to the port's standalone database. Otherwise, these entries might collide with bridge FDB/MDB entries.
This work was done mainly to have all the required features in place before somebody starts seriously architecting DSA support for multiple CPU ports. Otherwise it is much more difficult to bolt these features on top of multiple CPU ports. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
5e8a1e03 |
| 02-Mar-2022 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: dsa: install secondary unicast and multicast addresses as host FDB/MDB
In preparation of disabling flooding towards the CPU in standalone ports mode, identify the addresses requested by upper i
net: dsa: install secondary unicast and multicast addresses as host FDB/MDB
In preparation of disabling flooding towards the CPU in standalone ports mode, identify the addresses requested by upper interfaces and use the new API for DSA FDB isolation to request the hardware driver to offload these as FDB or MDB objects. The objects belong to the user port's database, and are installed pointing towards the CPU port.
Because dev_uc_add()/dev_mc_add() is VLAN-unaware, we offload to the port standalone database addresses with VID 0 (also VLAN-unaware). So this excludes switches with global VLAN filtering from supporting unicast filtering, because there, it is possible for a port of a switch to join a VLAN-aware bridge, and this changes the VLAN awareness of standalone ports, requiring VLAN-aware standalone host FDB entries. For the same reason, hellcreek, which requires VLAN awareness in standalone mode, is also exempted from unicast filtering.
We create "standalone" variants of dsa_port_host_fdb_add() and dsa_port_host_mdb_add() (and the _del coresponding functions).
We also create a separate work item type for handling deferred standalone host FDB/MDB entries compared to the switchdev one. This is done for the purpose of clarity - the procedure for offloading a bridge FDB entry is different than offloading a standalone one, and the switchdev event work handles only FDBs anyway, not MDBs. Deferral is needed for standalone entries because ndo_set_rx_mode runs in atomic context. We could probably optimize things a little by first queuing up all entries that need to be offloaded, and scheduling the work item just once, but the data structures that we can pass through __dev_uc_sync() and __dev_mc_sync() are limiting (there is nothing like a void *priv), so we'd have to keep the list of queued events somewhere in struct dsa_switch, and possibly a lock for it. Too complicated for now.
Adding the address to the master is handled by dev_uc_sync(), adding it to the hardware is handled by __dev_uc_sync(). So this is the reason why dsa_port_standalone_host_fdb_add() does not call dev_uc_add(). Not that it had the rtnl_mutex anyway - ndo_set_rx_mode has it, but is atomic.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
68d6d71e |
| 02-Mar-2022 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: dsa: rename the host FDB and MDB methods to contain the "bridge" namespace
We are preparing to add API in port.c that adds FDB and MDB entries that correspond to the port's standalone database.
net: dsa: rename the host FDB and MDB methods to contain the "bridge" namespace
We are preparing to add API in port.c that adds FDB and MDB entries that correspond to the port's standalone database. Rename the existing methods to make it clear that the FDB and MDB entries offloaded come from the bridge database.
Since the function names lengthen in dsa_slave_switchdev_event_work(), we place "addr" and "vid" in temporary variables, to shorten those.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
1136fa0c |
| 01-Mar-2022 |
Dmitry Torokhov <dmitry.torokhov@gmail.com> |
Merge tag 'v5.17-rc4' into for-linus
Merge with mainline to get the Intel ASoC generic helpers header and other changes.
|
#
d4ab5487 |
| 28-Feb-2022 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge 5.17-rc6 into tty-next
We need the tty/serial fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
ca9400ef |
| 28-Feb-2022 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge 5.17-rc6 into usb-next
We need the USB fixes in here, and it resolves a merge conflict in: drivers/usb/dwc3/dwc3-pci.c
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
dbbe23c3 |
| 28-Feb-2022 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge 5.17-rc6 into staging-next
We need the staging fixes in here as well.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
4a248f85 |
| 28-Feb-2022 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge 5.17-rc6 into driver-core-next
We need the driver core fix in here as well for future changes.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
085686fb |
| 28-Feb-2022 |
Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
Merge 5.17-rc6 into char-misc-next
We need the char-misc fixes in here.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
#
6c64ae22 |
| 28-Feb-2022 |
Dave Airlie <airlied@redhat.com> |
Backmerge tag 'v5.17-rc6' into drm-next
This backmerges v5.17-rc6 so I can merge some amdgpu and some tegra changes on top.
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
Revision tags: v5.17-rc6 |
|
#
b42a738e |
| 27-Feb-2022 |
David S. Miller <davem@davemloft.net> |
Merge branch 'dsa-fdb-isolation'
Vladimir Oltean says:
==================== DSA FDB isolation
There are use cases which need FDB isolation between standalone ports and bridged ports, as well as is
Merge branch 'dsa-fdb-isolation'
Vladimir Oltean says:
==================== DSA FDB isolation
There are use cases which need FDB isolation between standalone ports and bridged ports, as well as isolation between ports of different bridges. Most of these use cases are a result of the fact that packets can now be partially forwarded by the software bridge, so one port might need to send a packet to the CPU but its FDB lookup will see that it can forward it directly to a bridge port where that packet was autonomously learned. So the source port will attempt to shortcircuit the CPU and forward autonomously, which it can't due to the forwarding isolation we have in place. So we will have packet drops instead of proper operation.
Additionally, before DSA can implement IFF_UNICAST_FLT for standalone ports, we must have control over which database we install FDB entries corresponding to port MAC addresses in. We don't want to hinder the operation of the bridging layer.
DSA does not have a driver API that encourages FDB isolation, so this needs to be created. The basis for this is a new struct dsa_db which annotates each FDB and MDB entry with the database it belongs to.
The sja1105 and felix drivers are modified to observe the dsa_db argument, and therefore, enforce the FDB isolation.
Compared to the previous RFC patch series from August: https://patchwork.kernel.org/project/netdevbpf/cover/20210818120150.892647-1-vladimir.oltean@nxp.com/
what is different is that I stopped trying to make SWITCHDEV_FDB_{ADD,DEL}_TO_DEVICE blocking, instead I'm making use of the fact that DSA waits for switchdev FDB work items to finish before a port leaves the bridge. This is possible since: https://patchwork.kernel.org/project/netdevbpf/patch/20211024171757.3753288-7-vladimir.oltean@nxp.com/
Additionally, v2 is also rebased over the DSA LAG FDB work. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
06b9cce4 |
| 25-Feb-2022 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: dsa: pass extack to .port_bridge_join driver methods
As FDB isolation cannot be enforced between VLAN-aware bridges in lack of hardware assistance like extra FID bits, it seems plausible that m
net: dsa: pass extack to .port_bridge_join driver methods
As FDB isolation cannot be enforced between VLAN-aware bridges in lack of hardware assistance like extra FID bits, it seems plausible that many DSA switches cannot do it. Therefore, they need to reject configurations with multiple VLAN-aware bridges from the two code paths that can transition towards that state:
- joining a VLAN-aware bridge - toggling VLAN awareness on an existing bridge
The .port_vlan_filtering method already propagates the netlink extack to the driver, let's propagate it from .port_bridge_join too, to make sure that the driver can use the same function for both.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
c2693363 |
| 25-Feb-2022 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: dsa: request drivers to perform FDB isolation
For DSA, to encourage drivers to perform FDB isolation simply means to track which bridge does each FDB and MDB entry belong to. It then becomes th
net: dsa: request drivers to perform FDB isolation
For DSA, to encourage drivers to perform FDB isolation simply means to track which bridge does each FDB and MDB entry belong to. It then becomes the driver responsibility to use something that makes the FDB entry from one bridge not match the FDB lookup of ports from other bridges.
The top-level functions where the bridge is determined are: - dsa_port_fdb_{add,del} - dsa_port_host_fdb_{add,del} - dsa_port_mdb_{add,del} - dsa_port_host_mdb_{add,del}
aka the pre-crosschip-notifier functions.
Changing the API to pass a reference to a bridge is not superfluous, and looking at the passed bridge argument is not the same as having the driver look at dsa_to_port(ds, port)->bridge from the ->port_fdb_add() method.
DSA installs FDB and MDB entries on shared (CPU and DSA) ports as well, and those do not have any dp->bridge information to retrieve, because they are not in any bridge - they are merely the pipes that serve the user ports that are in one or multiple bridges.
The struct dsa_bridge associated with each FDB/MDB entry is encapsulated in a larger "struct dsa_db" database. Although only databases associated to bridges are notified for now, this API will be the starting point for implementing IFF_UNICAST_FLT in DSA. There, the idea is to install FDB entries on the CPU port which belong to the corresponding user port's port database. These are supposed to match only when the port is standalone.
It is better to introduce the API in its expected final form than to introduce it for bridges first, then to have to change drivers which may have made one or more assumptions.
Drivers can use the provided bridge.num, but they can also use a different numbering scheme that is more convenient.
DSA must perform refcounting on the CPU and DSA ports by also taking into account the bridge number. So if two bridges request the same local address, DSA must notify the driver twice, once for each bridge.
In fact, if the driver supports FDB isolation, DSA must perform refcounting per bridge, but if the driver doesn't, DSA must refcount host addresses across all bridges, otherwise it would be telling the driver to delete an FDB entry for a bridge and the driver would delete it for all bridges. So introduce a bool fdb_isolation in drivers which would make all bridge databases passed to the cross-chip notifier have the same number (0). This makes dsa_mac_addr_find() -> dsa_db_equal() say that all bridge databases are the same database - which is essentially the legacy behavior.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
53110c67 |
| 25-Feb-2022 |
Jakub Kicinski <kuba@kernel.org> |
Merge branch 'fdb-entries-on-dsa-lag-interfaces'
Vladimir Oltean says:
==================== FDB entries on DSA LAG interfaces
This work permits having static and local FDB entries on LAG interface
Merge branch 'fdb-entries-on-dsa-lag-interfaces'
Vladimir Oltean says:
==================== FDB entries on DSA LAG interfaces
This work permits having static and local FDB entries on LAG interfaces that are offloaded by DSA ports. New API needs to be introduced in drivers. To maintain consistency with the bridging offload code, I've taken the liberty to reorganize the data structures added by Tobias in the DSA core a little bit.
Tested on NXP LS1028A (felix switch). Would appreciate feedback/testing on other platforms too. Testing procedure was the one described here: https://patchwork.kernel.org/project/netdevbpf/cover/20210205130240.4072854-1-vladimir.oltean@nxp.com/
with this script:
ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link del br0 ip link add br0 type bridge && ip link set br0 up ip link set br0 arp off ip link set bond0 master br0 && ip link set bond0 up ip link set swp0 master br0 && ip link set swp0 up ip link set dev bond0 type bridge_slave flood off learning off bridge fdb add dev bond0 <mac address of other eno0> master static
I'm noticing a problem in 'bridge fdb dump' with the 'self' entries, and I didn't solve this. On Ocelot, an entry learned on a LAG is reported as being on the first member port of it (so instead of saying 'self bond0', it says 'self swp1'). This is better than not seeing the entry at all, but when DSA queries for the FDBs on a port via ds->ops->port_fdb_dump, it never queries for FDBs on a LAG. Not clear what we should do there, we aren't in control of the ->ndo_fdb_dump of the bonding/team drivers. Alternatively, we could just consider the 'self' entries reported via ndo_fdb_dump as "better than nothing", and concentrate on the 'master' entries that are in sync with the bridge when packets are flooded to software. ====================
Link: https://lore.kernel.org/r/20220223140054.3379617-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
#
e212fa7c |
| 23-Feb-2022 |
Vladimir Oltean <vladimir.oltean@nxp.com> |
net: dsa: support FDB events on offloaded LAG interfaces
This change introduces support for installing static FDB entries towards a bridge port that is a LAG of multiple DSA switch ports, as well as
net: dsa: support FDB events on offloaded LAG interfaces
This change introduces support for installing static FDB entries towards a bridge port that is a LAG of multiple DSA switch ports, as well as support for filtering towards the CPU local FDB entries emitted for LAG interfaces that are bridge ports.
Conceptually, host addresses on LAG ports are identical to what we do for plain bridge ports. Whereas FDB entries _towards_ a LAG can't simply be replicated towards all member ports like we do for multicast, or VLAN. Instead we need new driver API. Hardware usually considers a LAG to be a "logical port", and sets the entire LAG as the forwarding destination. The physical egress port selection within the LAG is made by hashing policy, as usual.
To represent the logical port corresponding to the LAG, we pass by value a copy of the dsa_lag structure to all switches in the tree that have at least one port in that LAG.
To illustrate why a refcounted list of FDB entries is needed in struct dsa_lag, it is enough to say that: - a LAG may be a bridge port and may therefore receive FDB events even while it isn't yet offloaded by any DSA interface - DSA interfaces may be removed from a LAG while that is a bridge port; we don't want FDB entries lingering around, but we don't want to remove entries that are still in use, either
For all the cases below to work, the idea is to always keep an FDB entry on a LAG with a reference count equal to the DSA member ports. So: - if a port joins a LAG, it requests the bridge to replay the FDB, and the FDB entries get created, or their refcount gets bumped by one - if a port leaves a LAG, the FDB replay deletes or decrements refcount by one - if an FDB is installed towards a LAG with ports already present, that entry is created (if it doesn't exist) and its refcount is bumped by the amount of ports already present in the LAG
echo "Adding FDB entry to bond with existing ports" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static
ip link del br0 ip link del bond0
echo "Adding FDB entry to empty bond" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
ip link del br0 ip link del bond0
echo "Adding FDB entry to empty bond, then removing ports one by one" ip link del bond0 ip link add bond0 type bond mode 802.3ad ip link del br0 ip link add br0 type bridge ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link set swp1 down && ip link set swp1 master bond0 && ip link set swp1 up ip link set swp2 down && ip link set swp2 master bond0 && ip link set swp2 up
ip link set swp1 nomaster ip link set swp2 nomaster ip link del br0 ip link del bond0
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|