.. SPDX-License-Identifier: GPL-2.0

=================
Ethernet Bridging
=================

Introduction
============

The IEEE 802.1Q-2022 (Bridges and Bridged Networks) standard defines the
operation of bridges in computer networks. A bridge, in the context of this
standard, is a device that connects two or more network segments and operates
at the data link layer (Layer 2) of the OSI (Open Systems Interconnection)
model. The purpose of a bridge is to filter and forward frames between
different segments based on the destination MAC (Media Access Control) address.
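
As a rough illustration of this filter-and-forward behavior, the following
Python sketch models a learning bridge. It is a simplified model and not the
kernel implementation: the ``LearningBridge`` class, its method names and the
port names are hypothetical, and ageing, VLANs and multicast are ignored.

.. code-block:: python

  class LearningBridge:
      """Toy model of MAC learning and forwarding on a bridge."""

      def __init__(self, ports):
          self.ports = set(ports)
          self.fdb = {}  # MAC address -> port on which it was last seen

      def receive(self, in_port, src_mac, dst_mac):
          """Return the set of ports the frame is sent out of."""
          self.fdb[src_mac] = in_port        # learn the source address
          out_port = self.fdb.get(dst_mac)
          if out_port is None:               # unknown destination: flood
              return self.ports - {in_port}
          if out_port == in_port:            # same segment: filter the frame
              return set()
          return {out_port}                  # known destination: forward


  bridge = LearningBridge(["eth0", "eth1", "eth2"])
  # The first frame goes to an unknown destination and is flooded.
  flooded = bridge.receive("eth0", "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02")
  # The reply goes to an address that has been learned on eth0.
  forwarded = bridge.receive("eth1", "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01")

The first frame is flooded to every port except the one it arrived on, and the
reply is forwarded only to ``eth0``.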

Bridge kAPI
===========

Here are some core structures of the bridge code. Note that the kAPI is
*unstable* and can be changed at any time.

.. kernel-doc:: net/bridge/br_private.h
   :identifiers: net_bridge_vlan

Bridge uAPI
===========

The modern Linux bridge uAPI is accessed via the Netlink interface. The files
below define the bridge and bridge port netlink attributes.

Bridge netlink attributes
-------------------------

.. kernel-doc:: include/uapi/linux/if_link.h
   :doc: Bridge enum definition

Bridge port netlink attributes
------------------------------

.. kernel-doc:: include/uapi/linux/if_link.h
   :doc: Bridge port enum definition

Bridge sysfs
------------

The sysfs interface is deprecated and should not be extended if new
options are added.

STP
===

The STP (Spanning Tree Protocol) implementation in the Linux bridge driver
is a critical feature that helps prevent loops and broadcast storms in
Ethernet networks by identifying and disabling redundant links. In a Linux
bridge context, STP is crucial for network stability and availability.

STP is a Layer 2 protocol that operates at the Data Link Layer of the OSI
model. It was originally developed as IEEE 802.1D and has since evolved into
multiple versions, including Rapid Spanning Tree Protocol (RSTP) and
`Multiple Spanning Tree Protocol (MSTP)
<https://lore.kernel.org/netdev/20220316150857.2442916-1-tobias@waldekranz.com/>`_.

The 802.1D-2004 revision removed the original Spanning Tree Protocol and
incorporated the Rapid Spanning Tree Protocol (RSTP) instead. By 2014, all the
functionality defined by IEEE 802.1D had been incorporated into either
IEEE 802.1Q (Bridges and Bridged Networks) or IEEE 802.1AC (MAC Service
Definition). 802.1D was officially withdrawn in 2022.

Bridge Ports and STP States
---------------------------

In the context of STP, bridge ports can be in one of the following states:
  * Blocking: The port is disabled for data traffic and only listens for
    BPDUs (Bridge Protocol Data Units) from other devices to determine the
    network topology.
  * Listening: The port begins to participate in the STP process and listens
    for BPDUs.
  * Learning: The port continues to listen for BPDUs and begins to learn MAC
    addresses from incoming frames but does not forward data frames.
  * Forwarding: The port is fully operational and forwards both BPDUs and
    data frames.
  * Disabled: The port is administratively disabled and does not participate
    in the STP process. Forwarding of data frames is also disabled.
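
The states above can be summarized by what a port does in each of them. The
sketch below is illustrative Python, not kernel code: the ``PortState`` enum
and the helper names are made up for this example, and it assumes, as in
classic STP, that a forwarding port also keeps learning addresses.

.. code-block:: python

  from enum import Enum


  class PortState(Enum):
      DISABLED = "disabled"
      BLOCKING = "blocking"
      LISTENING = "listening"
      LEARNING = "learning"
      FORWARDING = "forwarding"


  def processes_bpdus(state):
      """Every state except Disabled takes part in the STP process."""
      return state is not PortState.DISABLED


  def learns_addresses(state):
      """MAC learning starts in Learning and continues in Forwarding."""
      return state in (PortState.LEARNING, PortState.FORWARDING)


  def forwards_data(state):
      """Only a Forwarding port passes data frames."""
      return state is PortState.FORWARDING


  # Print one summary line per state.
  for state in PortState:
      print(state.value, processes_bpdus(state), learns_addresses(state),
            forwards_data(state))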

Root Bridge and Convergence
---------------------------

In the context of networking and Ethernet bridging in Linux, the root bridge
is a designated switch in a bridged network that serves as a reference point
for the spanning tree algorithm to create a loop-free topology.

Here is how STP works and how the root bridge is chosen:
  1. Bridge Priority: Each bridge running a spanning tree protocol has a
     configurable Bridge Priority value. The lower the value, the higher the
     priority. By default, the Bridge Priority is set to a standard value
     (e.g., 32768).
  2. Bridge ID: The Bridge ID is composed of two components: Bridge Priority
     and the MAC address of the bridge. It uniquely identifies each bridge
     in the network. The Bridge ID is used to compare the priorities of
     different bridges.
  3. Bridge Election: When the network starts, all bridges initially assume
     that they are the root bridge. They start advertising Bridge Protocol
     Data Units (BPDU) to their neighbors, containing their Bridge ID and
     other information.
  4. BPDU Comparison: Bridges exchange BPDUs to determine the root bridge.
     Each bridge compares the Bridge ID advertised in the received BPDUs with
     its own to determine whether it should keep claiming to be the root
     bridge. The bridge with the lowest Bridge ID will become the root bridge.
  5. Root Bridge Announcement: Once the root bridge is determined, it sends
     BPDUs with information about the root bridge to all other bridges in the
     network. This information is used by other bridges to calculate the
     shortest path to the root bridge and, in doing so, create a loop-free
     topology.
  6. Forwarding Ports: After the root bridge is selected and the spanning tree
     topology is established, each bridge determines which of its ports should
     be in the forwarding state (used for data traffic) and which should be in
     the blocking state (used to prevent loops). The root bridge's ports are
     all in the forwarding state, while other bridges have some ports in the
     blocking state to avoid loops.
  7. Root Ports: After the root bridge is selected and the spanning tree
     topology is established, each non-root bridge processes incoming
     BPDUs and determines which of its ports provides the shortest path to the
     root bridge based on the information in the received BPDUs. This port is
     designated as the root port and is placed in the Forwarding state,
     allowing it to actively forward network traffic.
  8. Designated ports: A designated port is the port through which the non-root
     bridge will forward traffic towards the designated segment. Designated ports
     are placed in the Forwarding state. All other ports on the non-root
     bridge that are not designated for specific segments are placed in the
     Blocking state to prevent network loops.
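
The comparison in steps 1 to 4 can be sketched in a few lines of Python. This
is only an illustration of "lowest Bridge ID wins": the function names and the
example bridges are hypothetical, and the BPDU exchange itself is not modeled.

.. code-block:: python

  def bridge_id(priority, mac):
      """Bridge ID as a comparable tuple: priority first, then MAC address."""
      return (priority, bytes.fromhex(mac.replace(":", "")))


  def elect_root(bridges):
      """Return the name of the bridge with the numerically lowest Bridge ID."""
      return min(bridges, key=lambda name: bridge_id(*bridges[name]))


  bridges = {
      "sw1": (32768, "00:11:22:33:44:55"),
      "sw2": (32768, "00:11:22:33:44:01"),
      "sw3": (4096, "00:11:22:33:44:ff"),
  }

  # sw3 has the lowest priority value, so its MAC address does not matter.
  root = elect_root(bridges)

  # With equal priorities, the lower MAC address breaks the tie.
  without_sw3 = {name: bid for name, bid in bridges.items() if name != "sw3"}
  tie_break = elect_root(without_sw3)

Here ``root`` is ``"sw3"``, and ``tie_break`` is ``"sw2"`` because the two
remaining bridges share the default priority.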

STP ensures network convergence by calculating the shortest path and disabling
redundant links. When network topology changes occur (e.g., a link failure),
STP recalculates the network topology to restore connectivity while avoiding loops.

Proper configuration of STP parameters, such as the bridge priority, can
influence network performance, path selection and which bridge becomes the
Root Bridge.

User space STP helper
---------------------

The user space STP helper *bridge-stp* is a program that controls whether
user mode spanning tree is used. The kernel calls
``/sbin/bridge-stp <bridge> <start|stop>`` when STP is enabled or disabled on
a bridge (via ``brctl stp <bridge> <on|off>`` or ``ip link set <bridge> type
bridge stp_state <0|1>``). The kernel enables user_stp mode if that command
returns 0, or enables kernel_stp mode if that command returns any other value.

STP mode selection
------------------

The ``IFLA_BR_STP_MODE`` bridge attribute allows explicit control over how
STP operates when enabled, bypassing the ``/sbin/bridge-stp`` helper
entirely for the ``user`` and ``kernel`` modes.

.. kernel-doc:: include/uapi/linux/if_link.h
   :doc: Bridge STP mode values

The default mode is ``BR_STP_MODE_AUTO``, which preserves the traditional
behavior of invoking the ``/sbin/bridge-stp`` helper. The ``user`` and
``kernel`` modes are particularly useful in network namespace environments
where the helper mechanism is not available, as ``call_usermodehelper()``
is restricted to the initial network namespace.

Example::

  ip link set dev br0 type bridge stp_mode user stp_state 1

The mode can only be changed while STP is disabled.
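
The interaction between the STP mode and the helper's exit status can be
summarized as a small decision function. This is an illustrative Python
sketch, not kernel code: the function name, the string mode names and the use
of ``None`` to model a helper that could not be run are assumptions of this
example.

.. code-block:: python

  def effective_stp(mode, helper_exit_code=None):
      """Return "user" or "kernel" for a bridge whose STP is being enabled.

      helper_exit_code is the exit status of
      "/sbin/bridge-stp <bridge> start"; it is only consulted in "auto" mode.
      None models a helper that could not be run (assumption of this sketch).
      """
      if mode in ("user", "kernel"):
          return mode                    # explicit modes bypass the helper
      if mode != "auto":
          raise ValueError(f"unknown STP mode: {mode!r}")
      # Exit status 0 selects user space STP, anything else kernel STP.
      return "user" if helper_exit_code == 0 else "kernel"


  auto_ok = effective_stp("auto", helper_exit_code=0)
  auto_failed = effective_stp("auto", helper_exit_code=1)
  forced_user = effective_stp("user")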

VLAN
====

A LAN (Local Area Network) is a network that covers a small geographic area,
typically within a single building or a campus. LANs are used to connect
computers, servers, printers, and other networked devices within a localized
area. LANs can be wired (using Ethernet cables) or wireless (using Wi-Fi).

A VLAN (Virtual Local Area Network) is a logical segmentation of a physical
network into multiple isolated broadcast domains. VLANs are used to divide
a single physical LAN into multiple virtual LANs, allowing different groups of
devices to communicate as if they were on separate physical networks.

Typically there are two VLAN implementations, IEEE 802.1Q and IEEE 802.1ad
(also known as QinQ). IEEE 802.1Q is a standard for VLAN tagging in Ethernet
networks. It allows network administrators to create logical VLANs on a
physical network and tag Ethernet frames with VLAN information; such frames
are called *VLAN-tagged frames*. IEEE 802.1ad, commonly known as QinQ or Double
VLAN, is an extension of the IEEE 802.1Q standard. QinQ allows for the
stacking of multiple VLAN tags within a single Ethernet frame. The Linux
bridge supports both the IEEE 802.1Q and `IEEE 802.1ad
<https://lore.kernel.org/netdev/1402401565-15423-1-git-send-email-makita.toshiaki@lab.ntt.co.jp/>`_
protocols for VLAN tagging.
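
To make the tag stacking concrete, the following Python sketch extracts the
VLAN tags from a raw Ethernet frame. The TPID values (``0x8100`` for an
802.1Q tag, ``0x88A8`` for an 802.1ad service tag) and the 3/1/12-bit split of
the tag control field come from the standards; the function name and the
sample frame are hypothetical, and this is not how the kernel parses frames.

.. code-block:: python

  import struct

  TPID_8021Q = 0x8100   # customer tag (C-tag)
  TPID_8021AD = 0x88A8  # service tag (S-tag), the outer tag in QinQ


  def parse_vlan_tags(frame):
      """Return a list of (tpid, pcp, dei, vid), outermost tag first."""
      tags = []
      offset = 12  # skip the destination and source MAC addresses
      while offset + 4 <= len(frame):
          tpid, tci = struct.unpack_from("!HH", frame, offset)
          if tpid not in (TPID_8021Q, TPID_8021AD):
              break  # reached the real EtherType
          # TCI layout: 3 bits priority, 1 bit drop eligible, 12 bits VLAN ID
          tags.append((tpid, tci >> 13, (tci >> 12) & 1, tci & 0x0FFF))
          offset += 4
      return tags


  # Zeroed MAC addresses, S-tag with VID 100, C-tag with priority 5 and
  # VID 10, then the IPv4 EtherType.
  frame = bytes(12) + struct.pack("!HHHHH", TPID_8021AD, 100,
                                  TPID_8021Q, (5 << 13) | 10, 0x0800)
  tags = parse_vlan_tags(frame)

For this frame, ``tags`` holds the outer S-tag (VID 100) followed by the inner
C-tag (priority 5, VID 10).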

`VLAN filtering <https://lore.kernel.org/netdev/1360792820-14116-1-git-send-email-vyasevic@redhat.com/>`_
on a bridge is disabled by default. After enabling VLAN filtering on a bridge,
it will start forwarding frames to appropriate destinations based on their
destination MAC address and VLAN tag (both must match).
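
The "both must match" rule can be pictured as a forwarding lookup keyed by the
pair of MAC address and VLAN ID. The Python sketch below is a simplified
model, not the kernel data structure: the ``egress_port`` function is
hypothetical, and using VLAN ID 0 as the key when filtering is disabled is an
assumption of this example.

.. code-block:: python

  def egress_port(fdb, dst_mac, vid, vlan_filtering):
      """Return the port a unicast frame is forwarded to, or None if unknown."""
      key = (dst_mac, vid if vlan_filtering else 0)
      return fdb.get(key)


  MAC = "52:54:00:12:34:56"

  # Filtering enabled: an address is only known within the VLANs it was seen in.
  fdb = {(MAC, 10): "eth1", (MAC, 20): "eth2"}
  in_vlan_10 = egress_port(fdb, MAC, 10, vlan_filtering=True)
  in_vlan_30 = egress_port(fdb, MAC, 30, vlan_filtering=True)

  # Filtering disabled: the VLAN tag plays no part in the lookup.
  flat_fdb = {(MAC, 0): "eth1"}
  unfiltered = egress_port(flat_fdb, MAC, 30, vlan_filtering=False)

With filtering enabled, the lookup for VLAN 30 finds no entry even though the
MAC address is known in VLANs 10 and 20.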

Multicast
=========

The Linux bridge driver has multicast support allowing it to process Internet
Group Management Protocol (IGMP) or Multicast Listener Discovery (MLD)
messages, and to efficiently forward multicast data packets. The bridge
driver supports IGMPv2/IGMPv3 and MLDv1/MLDv2.

Multicast snooping
------------------

Multicast snooping is a networking technology that allows network switches
to intelligently manage multicast traffic within a local area network (LAN).

The switch maintains a multicast group table, which records the association
between multicast group addresses and the ports where hosts have joined these
groups. The group table is dynamically updated based on the IGMP/MLD messages
received. With the multicast group information gathered through snooping, the
switch optimizes the forwarding of multicast traffic. Instead of blindly
broadcasting the multicast traffic to all ports, it sends the multicast
traffic based on the destination MAC address only to ports which have
subscribed to the respective destination multicast group.

Linux bridge devices have multicast snooping enabled by default when they are
created. Each bridge maintains a multicast forwarding database (MDB) which
keeps track of port and group relationships.
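
A group table of this kind can be modeled as a mapping from group address to
the set of subscribed ports. The following Python sketch is a toy model, not
the kernel MDB: the class and method names are hypothetical, multicast router
ports and queriers are ignored, and flooding traffic for unknown groups is a
simplification made for this example.

.. code-block:: python

  class SnoopingBridge:
      """Toy multicast group table keyed by group address."""

      def __init__(self, ports):
          self.ports = set(ports)
          self.mdb = {}  # group address -> set of subscribed ports

      def report(self, port, group):
          """A membership report for `group` was seen on `port`."""
          self.mdb.setdefault(group, set()).add(port)

      def leave(self, port, group):
          """The last listener behind `port` left `group`."""
          members = self.mdb.get(group, set())
          members.discard(port)
          if not members:
              self.mdb.pop(group, None)

      def egress_ports(self, in_port, group):
          """Return the ports a multicast frame for `group` is sent to."""
          members = self.mdb.get(group)
          if members is None:
              return self.ports - {in_port}  # unknown group: flood
          return members - {in_port}


  br = SnoopingBridge(["eth0", "eth1", "eth2", "eth3"])
  br.report("eth1", "239.1.1.1")
  br.report("eth2", "239.1.1.1")
  snooped = br.egress_ports("eth0", "239.1.1.1")
  br.leave("eth1", "239.1.1.1")
  after_leave = br.egress_ports("eth0", "239.1.1.1")

Traffic for ``239.1.1.1`` reaches only ``eth1`` and ``eth2`` at first, and
only ``eth2`` once ``eth1`` has left the group.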

IGMPv3/MLDv2 EHT support
------------------------

The Linux bridge supports IGMPv3/MLDv2 EHT (Explicit Host Tracking), which
was added by `474ddb37fa3a ("net: bridge: multicast: add EHT allow/block handling")
<https://lore.kernel.org/netdev/20210120145203.1109140-1-razor@blackwall.org/>`_.

Explicit host tracking enables the device to keep track of each
individual host that is joined to a particular group or channel. The main
benefit of explicit host tracking in IGMP is to allow minimal leave
latencies when a host leaves a multicast group or channel.

The length of time between a host wanting to leave and a device stopping
traffic forwarding is called the IGMP leave latency. A device configured
with IGMPv3 or MLDv2 and explicit tracking can immediately stop forwarding
traffic if the last host to request to receive traffic from the device
indicates that it no longer wants to receive traffic. The leave latency
is thus bounded only by the packet transmission latencies in the multiaccess
network and the processing time in the device.
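
The reason explicit tracking shortens the leave latency is that the device
knows exactly when the last interested host is gone. The Python sketch below
illustrates that idea only: the ``TrackedGroup`` class is hypothetical, it
tracks hosts for a single group on a single port, and it does not model
source lists or timers.

.. code-block:: python

  class TrackedGroup:
      """Hosts behind one port that have joined one multicast group."""

      def __init__(self):
          self.hosts = set()

      def join(self, host):
          self.hosts.add(host)

      def leave(self, host):
          """Return True if forwarding to this port can stop immediately."""
          self.hosts.discard(host)
          return not self.hosts


  group = TrackedGroup()
  group.join("192.0.2.10")
  group.join("192.0.2.11")
  stop_after_first = group.leave("192.0.2.10")   # another host is still joined
  stop_after_second = group.leave("192.0.2.11")  # the last host has left

Forwarding continues after the first leave and can stop right after the
second one, without waiting for a query to time out.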

Other multicast features
------------------------

The Linux bridge also supports `per-VLAN multicast snooping
<https://lore.kernel.org/netdev/20210719170637.435541-1-razor@blackwall.org/>`_,
which is disabled by default but can be enabled, and `Multicast Router Discovery
<https://lore.kernel.org/netdev/20190121062628.2710-1-linus.luessing@c0d3.blue/>`_,
which helps identify the location of multicast routers.

Switchdev
=========

Linux Bridge Switchdev is a feature in the Linux kernel that extends the
capabilities of the traditional Linux bridge to work more efficiently with
hardware switches that support switchdev. With Linux Bridge Switchdev, certain
networking functions like forwarding, filtering, and learning of Ethernet
frames can be offloaded to a hardware switch. This offloading reduces the
burden on the Linux kernel and CPU, leading to improved network performance
and lower latency.

To use Linux Bridge Switchdev, you need hardware switches that support the
switchdev interface. This means that the switch hardware needs to have the
necessary drivers and functionality to work in conjunction with the Linux
kernel.

Please see the :ref:`switchdev` document for more details.

Netfilter
=========

The bridge netfilter module is a legacy feature that allows bridged packets
to be filtered with iptables and ip6tables. Its use is discouraged. Users
should consider using nftables for packet filtering.

The older ebtables tool is more feature-limited than nftables, but just like
nftables it does not need this module to function.

The br_netfilter module intercepts packets entering the bridge, performs
minimal sanity tests on IPv4 and IPv6 packets and then pretends that
these packets are being routed, not bridged. br_netfilter then calls
the IPv4 and IPv6 netfilter hooks from the bridge layer, i.e. ip(6)tables
rulesets will also see these packets.

br_netfilter is also the reason for the iptables *physdev* match:
this match is the only way to reliably tell routed and bridged packets
apart in an iptables ruleset.

Note that ebtables and nftables will work fine without the br_netfilter module.
iptables/ip6tables/arptables do not work for bridged traffic because they
hook into the routing stack. nftables rules in ip/ip6/inet/arp families won't
see traffic that is forwarded by a bridge either, but that's very much how it
should be.

Historically, the feature set of ebtables was very limited (it still is), so
this module was added to pretend packets are routed and invoke the IPv4/IPv6
netfilter hooks from the bridge, giving users access to the more feature-rich
iptables matching capabilities (including conntrack). nftables doesn't have
this limitation; pretty much all features work regardless of the protocol
family.

So, br_netfilter is only needed if users, for some reason, need to use
ip(6)tables to filter packets forwarded by the bridge, or to NAT bridged
traffic. For pure link layer filtering, this module isn't needed.

Other Features
==============

The Linux bridge also supports `IEEE 802.11 Proxy ARP
<https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=958501163ddd6ea22a98f94fa0e7ce6d4734e5c4>`_,
`Media Redundancy Protocol (MRP)
<https://lore.kernel.org/netdev/20200426132208.3232-1-horatiu.vultur@microchip.com/>`_,
`Media Redundancy Protocol (MRP) LC mode
<https://lore.kernel.org/r/20201124082525.273820-1-horatiu.vultur@microchip.com>`_,
`IEEE 802.1X port authentication
<https://lore.kernel.org/netdev/20220218155148.2329797-1-schultz.hans+netdev@gmail.com/>`_,
and `MAC Authentication Bypass (MAB)
<https://lore.kernel.org/netdev/20221101193922.2125323-2-idosch@nvidia.com/>`_.

FAQ
===

What does a bridge do?
----------------------

A bridge transparently forwards traffic between multiple network interfaces.
In plain English this means that a bridge connects two or more physical
Ethernet networks, to form one larger (logical) Ethernet network.

Is it L3 protocol independent?
------------------------------

Yes. The bridge sees all frames, but it *uses* only L2 headers/information.
As such, the bridging functionality is protocol independent, and there should
be no trouble forwarding IPX, NetBEUI, IP, IPv6, etc.

Contact Info
============

The code is currently maintained by Roopa Prabhu <roopa@nvidia.com> and
Nikolay Aleksandrov <razor@blackwall.org>. Bridge bugs and enhancements
are discussed on the linux-netdev mailing list netdev@vger.kernel.org and
bridge@lists.linux.dev.

The list is open to anyone interested: http://vger.kernel.org/vger-lists.html#netdev

External Links
==============

The old documentation for Linux bridging is at:
https://wiki.linuxfoundation.org/networking/bridge
358