.. SPDX-License-Identifier: GPL-2.0

=====================================================
Netdev features mess and how to get out from it alive
=====================================================

Author:
	Michał Mirosław <mirq-linux@rere.qmqm.pl>

Part I: Feature sets
====================

Long gone are the days when a network card would just take and give packets
verbatim.  Today's devices add multiple features and bugs (read: offloads)
that relieve an OS of various tasks like generating and checking checksums,
splitting packets, and classifying them.  Those capabilities and their state
are commonly referred to as netdev features in the Linux kernel world.

Each netdevice currently has three main sets of features; the first two
are initialized by the driver:

 1. The netdev->hw_features set contains features whose state may be
    changed (enabled or disabled) for a particular device at the user's
    request.  Drivers normally initialize this set before registration or
    in the ndo_init callback.  Changes after registration should be made
    very carefully, as other parts of the code may assume hw_features is
    static.  At the very least, changes must be made under rtnl_lock and
    the netdev instance lock, and must be followed by a call to
    netdev_update_features().

 2. The netdev->features set contains features which are currently enabled
    for a device.  This should be changed only by the network core or in
    error paths of the ndo_set_features callback.

 3. The netdev->wanted_features set contains the feature set requested by
    the user.  This set is filtered by the ndo_fix_features callback
    whenever it or some device-specific conditions change.  This set is
    internal to the networking core and should not be referenced in drivers.

On top of those three main sets, each netdev has:

 1. Sets which control features inherited by child devices (VLAN, MPLS,
    hw_enc for L3/L4 tunnels).  These sets allow the driver to limit which
    netdev->features are propagated, in case the HW cannot perform the
    offloads with the extra headers present.

 2. netdev->mangleid_features, TSO features which are supported only when
    the IP ID field can be mangled (constant instead of incrementing)
    during TSO.

 3. netdev->gso_partial_features, additional TSO features which the HW can
    support via NETIF_F_GSO_PARTIAL.

Part II: Controlling enabled features
=====================================

When the current feature set (netdev->features) is to be changed, a new set
is calculated and filtered by calling the ndo_fix_features callback
and netdev_fix_features().  If the resulting set differs from the current
set, it is passed to the ndo_set_features callback and (if the callback
returns success) replaces the value stored in netdev->features.
A NETDEV_FEAT_CHANGE notification is issued after that whenever the current
set might have changed.
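
The recalculation described above can be modelled in plain userspace C.
This is a minimal sketch, not kernel code: ``feat_t``, ``struct model_dev``,
the ``F_*`` bits, ``core_fix()`` and ``model_update_features()`` are
hypothetical names, and the "TSO needs SG" rule merely stands in for whatever
limitations netdev_fix_features() imposes.  How the user's request is merged
with bits outside hw_features is likewise an assumption of the model.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t feat_t;            /* stand-in for netdev_features_t */

/* Hypothetical feature bits; the values are not the kernel's. */
#define F_SG   (1ULL << 0)
#define F_CSUM (1ULL << 1)
#define F_TSO  (1ULL << 2)

struct model_dev {
	feat_t hw_features;             /* changeable on user request */
	feat_t features;                /* currently enabled */
	feat_t wanted_features;         /* requested by the user */
	feat_t (*fix)(struct model_dev *dev, feat_t f);  /* models ndo_fix_features */
	int (*set)(struct model_dev *dev, feat_t f);     /* models ndo_set_features */
	int notifications;              /* models NETDEV_FEAT_CHANGE events */
};

/* Assumed core-imposed limit for this model: TSO needs SG. */
static feat_t core_fix(feat_t f)
{
	if ((f & F_TSO) && !(f & F_SG))
		f &= ~F_TSO;
	return f;
}

/* Returns the set callback's result, or 0 if nothing had to change. */
static int model_update_features(struct model_dev *dev)
{
	/* User requests apply only to bits in hw_features. */
	feat_t f = (dev->features & ~dev->hw_features) |
		   (dev->wanted_features & dev->hw_features);
	int err = 0;

	if (dev->fix)                   /* a missing callback counts as success */
		f = dev->fix(dev, f);
	f = core_fix(f);

	if (f == dev->features)         /* nothing to reconfigure */
		return 0;

	if (dev->set)
		err = dev->set(dev, f);
	if (err == 0)
		dev->features = f;
	dev->notifications++;           /* the current set might have changed */
	return err;
}
```

In this model, a request for TSO without SG leaves the enabled set untouched
and produces no notification; requesting both enables both.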

The following events trigger recalculation:

 1. device registration, after ndo_init has returned success
 2. a user request to change the state of features
 3. a call to netdev_update_features()

ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks
are treated as always returning success.

A driver that wants to trigger recalculation must do so by calling
netdev_update_features() while holding rtnl_lock. If the device uses the
netdev instance lock, that lock must be held as well. This should not be
done from ndo_*_features callbacks. A driver should not modify
netdev->features directly; it influences the enabled set through the
ndo_fix_features callback.

ndo_features_check is called for each skb before that skb is passed to
ndo_start_xmit. The driver may perform any non-trivial checks (e.g. exact
header geometry / length) and withdraw features like HW_CSUM or TSO,
requesting the networking stack to fall back to the software implementation.
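
A per-packet check of this kind might look like the following userspace
sketch.  Everything here is hypothetical: ``struct model_pkt``, the ``F_*``
bits, the 128-byte header limit and ``model_features_check()`` are invented
for illustration and do not reflect the real ndo_features_check signature.

```c
#include <stdint.h>

typedef uint64_t feat_t;            /* stand-in for netdev_features_t */

/* Hypothetical feature bits; the values are not the kernel's. */
#define F_HW_CSUM (1ULL << 0)
#define F_TSO     (1ULL << 1)
#define F_SG      (1ULL << 2)

/* Assumed hardware limit on total header length, in bytes. */
#define MODEL_MAX_HDR_LEN 128u

struct model_pkt {
	unsigned int hdr_len;           /* total header length of this packet */
};

/*
 * Return the features usable for this one packet.  Withdrawing a bit
 * asks the stack to do that work in software for this packet only.
 */
static feat_t model_features_check(const struct model_pkt *pkt, feat_t features)
{
	if (pkt->hdr_len > MODEL_MAX_HDR_LEN)
		features &= ~(F_HW_CSUM | F_TSO);
	return features;
}
```

The device-wide feature set is not touched; only the set applied to the
packet at hand shrinks.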

Part III: Implementation hints
==============================

 * ndo_fix_features:

All dependencies between features should be resolved here. The resulting
set can be reduced further by limitations imposed by the networking core (as
coded in netdev_fix_features()). For this reason it is safer to disable a
feature when its dependencies are not met instead of forcing the dependency on.

This callback should not modify hardware or driver state (it should be
stateless).  It can be called multiple times between successive
ndo_set_features calls.
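
As a rough illustration of both rules, the sketch below resolves the
dependency of Hardware GRO on RXCSUM (described under rx-gro-hw in Part IV)
by dropping the dependent feature.  It is a userspace model only:
``feat_t``, the ``F_*`` bits and ``model_fix_features()`` are hypothetical,
and a real callback also receives the device as an argument.

```c
#include <stdint.h>

typedef uint64_t feat_t;            /* stand-in for netdev_features_t */

/* Hypothetical feature bits; the values are not the kernel's. */
#define F_RXCSUM (1ULL << 0)
#define F_GRO_HW (1ULL << 1)
#define F_SG     (1ULL << 2)

/*
 * Pure function of its input: no hardware access, no driver state.
 * Calling it any number of times with the same input is harmless.
 */
static feat_t model_fix_features(feat_t features)
{
	/* Dependency not met: drop GRO_HW rather than forcing RXCSUM on. */
	if (!(features & F_RXCSUM))
		features &= ~F_GRO_HW;
	return features;
}
```

Forcing RXCSUM on instead could be undone by a later core-imposed
restriction, leaving GRO_HW enabled without its dependency.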

The callback must not alter features contained in NETIF_F_SOFT_FEATURES or
NETIF_F_NEVER_CHANGE, except that NETIF_F_VLAN_CHALLENGED may be changed.
Care must be taken, as changes to NETIF_F_VLAN_CHALLENGED won't affect already
configured VLANs.

 * ndo_set_features:

Hardware should be reconfigured to match the passed feature set. The set
should not be altered unless some error condition happens that can't
be reliably detected in ndo_fix_features. In this case, the callback
should update netdev->features to match the resulting hardware state.
Errors returned are not (and cannot be) propagated anywhere except dmesg.
(Note: a successful return is zero; a return value >0 means a silent error.)
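
The error path can be sketched with a simulated device.  This is a userspace
model under stated assumptions: ``struct model_hw``, ``struct model_dev``,
``model_hw_write()`` and ``model_set_features()`` are hypothetical, and the
``broken`` mask exists only to provoke a failure partway through.

```c
#include <errno.h>
#include <stdint.h>

typedef uint64_t feat_t;            /* stand-in for netdev_features_t */

/* Hypothetical feature bits; the values are not the kernel's. */
#define F_A (1ULL << 0)
#define F_B (1ULL << 1)
#define F_C (1ULL << 2)

struct model_hw {
	feat_t programmed;              /* what the simulated hardware does now */
	feat_t broken;                  /* bits the hardware refuses to change */
};

struct model_dev {
	feat_t features;                /* models netdev->features */
	struct model_hw hw;
};

/* Simulated register write: fails for bits marked as broken. */
static int model_hw_write(struct model_hw *hw, feat_t bit, int on)
{
	if (hw->broken & bit)
		return -EIO;
	if (on)
		hw->programmed |= bit;
	else
		hw->programmed &= ~bit;
	return 0;
}

/* Models ndo_set_features: 0 on success, negative errno on failure. */
static int model_set_features(struct model_dev *dev, feat_t wanted)
{
	feat_t changed = dev->features ^ wanted;
	int err = 0;

	/* Reprogram only the bits that differ; stop at the first failure. */
	for (feat_t bit = 1; bit != 0 && !err; bit <<= 1) {
		if (changed & bit)
			err = model_hw_write(&dev->hw, bit, (wanted & bit) != 0);
	}

	/* On error, report the state the hardware actually ended up in. */
	if (err)
		dev->features = dev->hw.programmed;
	return err;
}
```

On success the model leaves ``features`` alone, since storing the new set is
the caller's job, as Part II describes.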

Part IV: Features
=================

For the current list of features, see include/linux/netdev_features.h.
This section describes the semantics of some of them.

 * Transmit checksumming

For a complete description, see comments near the top of include/linux/skbuff.h.

Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
It means that the device can fill in a TCP/UDP-like checksum anywhere in the
packet, whatever headers there might be.
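
The superset relation can be expressed as a small decision function.  This
is a simplified userspace sketch: the ``F_*`` bits, ``enum model_l3`` and
``model_can_offload_csum()`` are hypothetical, and it assumes that the two
narrower flags cover only plain TCP or UDP directly over IPv4 and IPv6
respectively.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t feat_t;            /* stand-in for netdev_features_t */

/* Hypothetical feature bits; the values are not the kernel's. */
#define F_IP_CSUM   (1ULL << 0)
#define F_IPV6_CSUM (1ULL << 1)
#define F_HW_CSUM   (1ULL << 2)

enum model_l3 { MODEL_IPV4, MODEL_IPV6, MODEL_OTHER };

/*
 * plain_tcp_udp: true if the packet is TCP or UDP directly over the
 * network header, with no extra headers in between (model assumption).
 */
static bool model_can_offload_csum(feat_t features, enum model_l3 l3,
				   bool plain_tcp_udp)
{
	if (features & F_HW_CSUM)       /* generic: any headers will do */
		return true;
	if (!plain_tcp_udp)
		return false;
	if (l3 == MODEL_IPV4)
		return (features & F_IP_CSUM) != 0;
	if (l3 == MODEL_IPV6)
		return (features & F_IPV6_CSUM) != 0;
	return false;
}
```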

 * Transmit TCP segmentation offload

NETIF_F_TSO_ECN means that hardware can properly split packets with the CWR
bit set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).

 * Transmit UDP segmentation offload

NETIF_F_GSO_UDP_L4 accepts a single UDP header with a payload that exceeds
gso_size. On segmentation, it segments the payload on gso_size boundaries and
replicates the network and UDP headers (fixing up the last one if its payload
is shorter than gso_size).
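
The arithmetic behind this split is simple enough to write down.  The sketch
below is illustrative userspace C; ``struct model_seg_plan`` and
``model_udp_gso_plan()`` are hypothetical names.  The only outside fact it
relies on is that a UDP header is 8 bytes long and that its length field
covers header plus payload.

```c
#include <stddef.h>

#define MODEL_UDP_HDR_LEN 8u        /* a UDP header is 8 bytes */

struct model_seg_plan {
	size_t nsegs;                   /* number of resulting datagrams */
	size_t last_payload;            /* payload bytes in the last datagram */
	size_t last_udp_len;            /* UDP length field of the last datagram */
};

/* Split payload_len bytes on gso_size boundaries. */
static struct model_seg_plan model_udp_gso_plan(size_t payload_len,
						size_t gso_size)
{
	struct model_seg_plan p = { 0, 0, 0 };

	if (payload_len == 0 || gso_size == 0)
		return p;

	/* Every segment but the last carries exactly gso_size bytes. */
	p.nsegs = payload_len / gso_size + (payload_len % gso_size != 0);
	p.last_payload = payload_len - (p.nsegs - 1) * gso_size;
	p.last_udp_len = MODEL_UDP_HDR_LEN + p.last_payload;
	return p;
}
```

For a 3000-byte payload and a gso_size of 1400, the model yields three
datagrams, the last carrying 200 bytes of payload.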

 * Transmit DMA from high memory

On platforms where this is relevant, NETIF_F_HIGHDMA signals that
ndo_start_xmit can handle skbs with frags in high memory.

 * Transmit scatter-gather

These features indicate that ndo_start_xmit can handle fragmented skbs:
NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST ---
chained skbs (skb->next/prev list).

 * Software features

Features contained in NETIF_F_SOFT_FEATURES are features of the networking
stack. Drivers should not change behaviour based on them.

 * VLAN challenged

NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
headers. Some drivers set this because the cards can't handle the bigger MTU.
[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
VLANs. This may not be useful, though.]

*  rx-fcs

This requests that the NIC append the Ethernet Frame Check Sequence (FCS)
to the end of the skb data.  This allows sniffers and other tools to
read the CRC recorded by the NIC on receipt of the packet.

*  rx-all

This requests that the NIC receive all possible frames, including errored
frames (such as frames with a bad FCS).  This can be helpful when sniffing a
link with bad packets on it.  Some NICs may receive more packets if also put
into normal PROMISC mode.

*  rx-gro-hw

This requests that the NIC enable Hardware GRO (generic receive offload).
Hardware GRO is basically the exact reverse of TSO, and is generally
stricter than Hardware LRO.  A packet stream merged by Hardware GRO must
be re-segmentable by GSO or TSO back to the exact original packet stream.
Hardware GRO is dependent on RXCSUM since every packet successfully merged
by hardware must also have the checksum verified by hardware.

* hsr-tag-ins-offload

This should be set for devices which insert an HSR (High-availability Seamless
Redundancy) or PRP (Parallel Redundancy Protocol) tag automatically.

* hsr-tag-rm-offload

This should be set for devices which remove HSR (High-availability Seamless
Redundancy) or PRP (Parallel Redundancy Protocol) tags automatically.

* hsr-fwd-offload

This should be set for devices which forward HSR (High-availability Seamless
Redundancy) frames from one port to another in hardware.

* hsr-dup-offload

This should be set for devices which duplicate outgoing HSR (High-availability
Seamless Redundancy) or PRP (Parallel Redundancy Protocol) frames
automatically in hardware.

Part V: Related device flags
============================

* netdev->netmem_tx

This is not a netdev feature bit. Drivers support netmem TX by setting
netdev->netmem_tx to one of the values in enum netmem_tx_mode.
See Documentation/networking/netmem.rst.