.. SPDX-License-Identifier: GPL-2.0

=====================================================
Netdev features mess and how to get out from it alive
=====================================================

Author:
        Michał Mirosław <mirq-linux@rere.qmqm.pl>



Part I: Feature sets
====================

Long gone are the days when a network card would just take and give packets
verbatim. Today's devices add multiple features and bugs (read: offloads)
that relieve an OS of various tasks like generating and checking checksums,
splitting packets, and classifying them. Those capabilities and their state
are commonly referred to as netdev features in the Linux kernel world.

There are currently three main sets of features on each netdevice; the
first and second are initialized by the driver:

 1. netdev->hw_features set contains features whose state may possibly
    be changed (enabled or disabled) for a particular device at the user's
    request. Drivers normally initialize this set before registration or
    in the ndo_init callback. Changes after registration should be made
    very carefully, as other parts of the code may assume hw_features are
    static. At the very least, changes must be made under rtnl_lock and
    the netdev instance lock, and followed by netdev_update_features().

 2. netdev->features set contains features which are currently enabled
    for a device. This should be changed only by the network core or in
    error paths of the ndo_set_features callback.

 3. netdev->wanted_features set contains the feature set requested by the
    user. This set is filtered by the ndo_fix_features callback whenever
    it or some device-specific conditions change. This set is internal to
    the networking core and should not be referenced in drivers.

On top of those three main sets, each netdev has:

 1. Sets which control features inherited by child devices (VLAN, MPLS,
    hw_enc for L3/L4 tunnels).
    These sets allow the driver to limit which netdev->features are
    propagated, in case HW cannot perform the offloads with the extra
    headers present.

 2. netdev->mangleid_features, TSO features which are supported only when
    the IP ID field can be mangled (constant instead of incrementing)
    during TSO.

 3. netdev->gso_partial_features, additional TSO features which HW can
    support via NETIF_F_GSO_PARTIAL.

Part II: Controlling enabled features
=====================================

When the current feature set (netdev->features) is to be changed, a new set
is calculated and filtered by calling the ndo_fix_features callback
and netdev_fix_features(). If the resulting set differs from the current
set, it is passed to the ndo_set_features callback and (if the callback
returns success) replaces the value stored in netdev->features.
A NETDEV_FEAT_CHANGE notification is issued after that whenever the current
set might have changed.

The following events trigger recalculation:

 1. registration of the device, after ndo_init returns success
 2. user-requested changes to the feature state
 3. a call to netdev_update_features()

ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks
are treated as always returning success.

A driver that wants to trigger recalculation must do so by calling
netdev_update_features() while holding rtnl_lock. If the device uses the
netdev instance lock, that lock must be held as well. This should not be
done from ndo_*_features callbacks. netdev->features should not be modified
by the driver except by means of the ndo_fix_features callback.

For "ops locked" drivers (see Documentation/networking/netdevices.rst),
ethtool callbacks that may end up invoking netdev_update_features() must
opt back into rtnl_lock by setting the matching ETHTOOL_OP_NEEDS_RTNL_*
bit in ``ethtool_ops::op_needs_rtnl``.
The ethtool core then keeps rtnl_lock held across those SET callbacks, so
the contract above still holds.

ndo_features_check is called for each skb before that skb is passed to
ndo_start_xmit. The driver may perform any non-trivial checks (e.g. exact
header geometry / length) and withdraw features like HW_CSUM or TSO,
requesting the networking stack to fall back to the software implementation.

Part III: Implementation hints
==============================

 * ndo_fix_features:

All dependencies between features should be resolved here. The resulting
set can be reduced further by limitations imposed by the networking core
(as coded in netdev_fix_features()). For this reason it is safer to disable
a feature when its dependencies are not met instead of forcing the
dependency on.

This callback should modify neither hardware nor driver state (it should be
stateless). It can be called multiple times between successive
ndo_set_features calls.

The callback must not alter features contained in NETIF_F_SOFT_FEATURES or
NETIF_F_NEVER_CHANGE, except that NETIF_F_VLAN_CHALLENGED may be changed.
Care must be taken, as changes to NETIF_F_VLAN_CHALLENGED won't affect
already configured VLANs.

 * ndo_set_features:

Hardware should be reconfigured to match the passed feature set. The set
should not be altered unless some error condition happens that can't
be reliably detected in ndo_fix_features. In this case, the callback
should update netdev->features to match the resulting hardware state.
Errors returned are not (and cannot be) propagated anywhere except dmesg.
(Note: successful return is zero, >0 means silent error.)



Part IV: Features
=================

For the current list of features, see include/linux/netdev_features.h.
This section describes the semantics of some of them.

 * Transmit checksumming

For a complete description, see comments near the top of include/linux/skbuff.h.
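
The skbuff.h comments describe the CHECKSUM_PARTIAL contract. The device
checksums the packet from skb->csum_start to the end and writes the result
at skb->csum_start + skb->csum_offset. The C sketch below models that job in
userspace using RFC 1071 ones' complement arithmetic. It is not kernel code.
csum_fold_sum() and hw_csum_fill() are hypothetical names. The model also
assumes the stack has already seeded the checksum field (for TCP and UDP,
typically with the pseudo-header sum), so the device only has to sum and
complement.

.. code-block:: c

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /*
     * Ones' complement sum (RFC 1071) of buf[0..len) taken as big-endian
     * 16-bit words; an odd trailing byte is padded with a zero byte.
     * Returns the folded 16-bit sum (not yet complemented).
     */
    static uint16_t csum_fold_sum(const uint8_t *buf, size_t len)
    {
        uint32_t sum = 0;
        size_t i;

        for (i = 0; i + 1 < len; i += 2)
            sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
        if (len & 1)
            sum += (uint32_t)buf[len - 1] << 8;

        /* Fold the carries back into the low 16 bits (end-around carry). */
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
    }

    /*
     * Hypothetical userspace model of a NETIF_F_HW_CSUM device handling a
     * CHECKSUM_PARTIAL frame: checksum everything from csum_start to the
     * end of the frame (including the pre-seeded checksum field) and store
     * the complemented result, big-endian, at csum_start + csum_offset.
     * Protocol-specific rules (e.g. UDP sending a zero result as 0xffff)
     * are deliberately left out.
     */
    static void hw_csum_fill(uint8_t *frame, size_t len,
                             size_t csum_start, size_t csum_offset)
    {
        size_t pos = csum_start + csum_offset;
        uint16_t csum;

        /* A driver may sanity-check the offsets against the frame length. */
        assert(csum_start <= len && pos + 2 <= len);

        csum = (uint16_t)~csum_fold_sum(frame + csum_start, len - csum_start);
        frame[pos] = (uint8_t)(csum >> 8);
        frame[pos + 1] = (uint8_t)(csum & 0xff);
    }

With a zero seed, summing the same region again after hw_csum_fill() yields
0xffff, which is the usual receive-side check.
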

Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
It means that the device can fill in a TCP/UDP-like checksum anywhere in the
packet, whatever headers there might be.

 * Transmit TCP segmentation offload

NETIF_F_TSO_ECN means that hardware can properly split packets with the CWR
bit set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).

 * Transmit UDP segmentation offload

NETIF_F_GSO_UDP_L4 accepts a single UDP header with a payload that exceeds
gso_size. On segmentation, it segments the payload on gso_size boundaries and
replicates the network and UDP headers (fixing up the headers of the last
segment if its payload is shorter than gso_size).

 * Transmit DMA from high memory

On platforms where this is relevant, NETIF_F_HIGHDMA signals that
ndo_start_xmit can handle skbs with frags in high memory.

 * Transmit scatter-gather

These features indicate that ndo_start_xmit can handle fragmented skbs:
NETIF_F_SG for paged skbs (skb_shinfo()->frags) and NETIF_F_FRAGLIST for
chained skbs (skb->next/prev list).

 * Software features

Features contained in NETIF_F_SOFT_FEATURES are features of the networking
stack. Drivers should not change behaviour based on them.

 * VLAN challenged

NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
headers. Some drivers set this because the cards can't handle the bigger MTU.
[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
VLANs. This may not be useful, though.]

* rx-fcs

This requests that the NIC append the Ethernet Frame Check Sequence (FCS)
to the end of the skb data. This allows sniffers and other tools to
read the CRC recorded by the NIC on receipt of the packet.

* rx-all

This requests that the NIC receive all possible frames, including errored
frames (such as those with a bad FCS). This can be helpful when sniffing a
link with bad packets on it.
Some NICs may receive more packets if also put into normal
PROMISC mode.

* rx-gro-hw

This requests that the NIC enable Hardware GRO (generic receive offload).
Hardware GRO is basically the exact reverse of TSO, and is generally
stricter than Hardware LRO. A packet stream merged by Hardware GRO must
be re-segmentable by GSO or TSO back to the exact original packet stream.
Hardware GRO is dependent on RXCSUM since every packet successfully merged
by hardware must also have the checksum verified by hardware.

* hsr-tag-ins-offload

This should be set for devices which insert an HSR (High-availability Seamless
Redundancy) or PRP (Parallel Redundancy Protocol) tag automatically.

* hsr-tag-rm-offload

This should be set for devices which remove HSR (High-availability Seamless
Redundancy) or PRP (Parallel Redundancy Protocol) tags automatically.

* hsr-fwd-offload

This should be set for devices which forward HSR (High-availability Seamless
Redundancy) frames from one port to another in hardware.

* hsr-dup-offload

This should be set for devices which duplicate outgoing HSR (High-availability
Seamless Redundancy) or PRP (Parallel Redundancy Protocol) frames
automatically in hardware.

Part V: Related device flags
============================

* netdev->netmem_tx

This is not a netdev feature bit. Drivers support netmem TX by setting
netdev->netmem_tx to one of the values in enum netmem_tx_mode.
See Documentation/networking/netmem.rst.