.. SPDX-License-Identifier: GPL-2.0

=====================================================
Netdev features mess and how to get out from it alive
=====================================================

Author:
	Michał Mirosław <mirq-linux@rere.qmqm.pl>



Part I: Feature sets
====================

Long gone are the days when a network card would just take and give packets
verbatim. Today's devices add multiple features and bugs (read: offloads)
that relieve an OS of various tasks like generating and checking checksums,
splitting packets, and classifying them. Those capabilities and their state
are commonly referred to as netdev features in the Linux kernel world.

There are currently three sets of features relevant to the driver, and
one used internally by the network core:

 1. netdev->hw_features set contains features whose state may possibly
    be changed (enabled or disabled) for a particular device by user's
    request. This set should be initialized in the ndo_init callback and
    not changed later.

 2. netdev->features set contains features which are currently enabled
    for a device. This should be changed only by the network core or in
    error paths of the ndo_set_features callback.

 3. netdev->vlan_features set contains features whose state is inherited
    by child VLAN devices (limits netdev->features set). This is currently
    used for all VLAN devices whether tags are stripped or inserted in
    hardware or software.

 4. netdev->wanted_features set contains the feature set requested by the
    user. This set is filtered by the ndo_fix_features callback whenever it
    or some device-specific conditions change. This set is internal to
    the networking core and should not be referenced in drivers.
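
The relationship between the four sets can be illustrated with a small
user-space model. This is only a sketch, not kernel code: the flag values,
``struct model_netdev`` and both helper functions are hypothetical names
invented for the example. It assumes that a user request may toggle only
bits present in hw_features, and that the limit for a VLAN child is the
intersection of features and vlan_features, which simplifies what the
network core actually does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for netdev_features_t and a few NETIF_F_* bits. */
typedef uint64_t features_t;
#define F_SG      (1ULL << 0)
#define F_IP_CSUM (1ULL << 1)
#define F_TSO     (1ULL << 2)
#define F_RXCSUM  (1ULL << 3)

/* Simplified model of the four feature sets described in Part I. */
struct model_netdev {
	features_t hw_features;     /* user-changeable, fixed after init */
	features_t features;        /* currently enabled */
	features_t vlan_features;   /* inheritable by child VLAN devices */
	features_t wanted_features; /* core-internal record of user requests */
};

/*
 * Record a user request: "mask" selects the bits to change and "values"
 * gives their requested state. Bits outside hw_features are ignored
 * (assumption of this model).
 */
static void model_user_request(struct model_netdev *dev,
			       features_t mask, features_t values)
{
	assert(dev);
	mask &= dev->hw_features;
	dev->wanted_features = (dev->wanted_features & ~mask) | (values & mask);
}

/* Upper bound on what a child VLAN device may enable (model assumption). */
static features_t model_vlan_child_limit(const struct model_netdev *dev)
{
	assert(dev);
	return dev->features & dev->vlan_features;
}
```

In this model, a request to enable a bit that is not in hw_features
leaves wanted_features untouched for that bit, which mirrors the rule that
only hw_features may be changed at the user's request.
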



Part II: Controlling enabled features
=====================================

When the current feature set (netdev->features) is to be changed, a new set
is calculated and filtered by calling the ndo_fix_features callback
and netdev_fix_features(). If the resulting set differs from the current
set, it is passed to the ndo_set_features callback and (if the callback
returns success) replaces the value stored in netdev->features.
A NETDEV_FEAT_CHANGE notification is issued after that whenever the current
set might have changed.

The following events trigger recalculation:

 1. device's registration, after ndo_init returned success
 2. user requested changes in features state
 3. netdev_update_features() is called

ndo_*_features callbacks are called with rtnl_lock held. Missing callbacks
are treated as always returning success.

A driver that wants to trigger recalculation must do so by calling
netdev_update_features() while holding rtnl_lock. This should not be done
from ndo_*_features callbacks. netdev->features should not be modified by
the driver except by means of the ndo_fix_features callback.



Part III: Implementation hints
==============================

 * ndo_fix_features:

All dependencies between features should be resolved here. The resulting
set can be reduced further by limitations imposed by the networking core (as
coded in netdev_fix_features()). For this reason it is safer to disable a
feature when its dependencies are not met instead of forcing the dependency on.

This callback should not modify hardware nor driver state (should be
stateless). It can be called multiple times between successive
ndo_set_features calls.

The callback must not alter features contained in NETIF_F_SOFT_FEATURES or
NETIF_F_NEVER_CHANGE sets. The exception is NETIF_F_VLAN_CHALLENGED but
care must be taken as the change won't affect already configured VLANs.
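
The recalculation flow from Part II and the ndo_fix_features hints above
can be sketched as a user-space model. This is not kernel code and not the
core's actual implementation: the flag values, ``struct model_netdev`` and
the ``model_*`` functions are hypothetical. The two dependencies used
(TSO needs scatter-gather, hardware GRO needs RXCSUM) are only illustrative
choices; the point is that the fix step is a pure function that disables a
feature whose dependency is missing rather than forcing the dependency on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for netdev_features_t and a few NETIF_F_* bits. */
typedef uint64_t features_t;
#define F_SG      (1ULL << 0)
#define F_TSO     (1ULL << 1)
#define F_RXCSUM  (1ULL << 2)
#define F_GRO_HW  (1ULL << 3)

struct model_netdev {
	features_t features;        /* currently enabled set */
	features_t wanted_features; /* set requested by the user */
	int set_calls;              /* counts hardware reconfigurations */
};

/*
 * Model of a fix_features step: stateless, touches no device state,
 * and drops features whose dependencies are not met.
 */
static features_t model_fix_features(features_t f)
{
	if (!(f & F_SG))
		f &= ~F_TSO;
	if (!(f & F_RXCSUM))
		f &= ~F_GRO_HW;
	return f;
}

/* Model of a set_features step: "reprograms hardware", returns 0 on success. */
static int model_set_features(struct model_netdev *dev, features_t f)
{
	(void)f;
	dev->set_calls++;
	return 0;
}

/*
 * Model of the recalculation: fix the wanted set, and only if the result
 * differs from the current set, pass it on and store it on success.
 * Returns 1 if the current set changed, 0 otherwise.
 */
static int model_update_features(struct model_netdev *dev)
{
	features_t next;

	assert(dev);
	next = model_fix_features(dev->wanted_features);
	if (next == dev->features)
		return 0;
	if (model_set_features(dev, next) != 0)
		return 0;
	dev->features = next;
	return 1;
}
```

Because the fix step is stateless, calling ``model_update_features()``
twice in a row reconfigures the modelled hardware only once; the second
call finds the fixed set equal to the current one and does nothing.
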

 * ndo_set_features:

Hardware should be reconfigured to match the passed feature set. The set
should not be altered unless some error condition happens that can't
be reliably detected in ndo_fix_features. In this case, the callback
should update netdev->features to match the resulting hardware state.
Errors returned are not (and cannot be) propagated anywhere except dmesg.
(Note: successful return is zero, >0 means silent error.)



Part IV: Features
=================

For the current list of features, see include/linux/netdev_features.h.
This section describes the semantics of some of them.

 * Transmit checksumming

For a complete description, see comments near the top of include/linux/skbuff.h.

Note: NETIF_F_HW_CSUM is a superset of NETIF_F_IP_CSUM + NETIF_F_IPV6_CSUM.
It means that the device can fill a TCP/UDP-like checksum anywhere in the
packets whatever headers there might be.

 * Transmit TCP segmentation offload

NETIF_F_TSO_ECN means that hardware can properly split packets with the CWR
bit set, be it TCPv4 (when NETIF_F_TSO is enabled) or TCPv6 (NETIF_F_TSO6).

 * Transmit UDP segmentation offload

NETIF_F_GSO_UDP_L4 accepts a single UDP header with a payload that exceeds
gso_size. On segmentation, it segments the payload on gso_size boundaries and
replicates the network and UDP headers (fixing up the last one if less than
gso_size).

 * Transmit DMA from high memory

On platforms where this is relevant, NETIF_F_HIGHDMA signals that
ndo_start_xmit can handle skbs with frags in high memory.

 * Transmit scatter-gather

Those features say that ndo_start_xmit can handle fragmented skbs:
NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST ---
chained skbs (skb->next/prev list).

 * Software features

Features contained in NETIF_F_SOFT_FEATURES are features of the networking
stack.
Drivers should not change behaviour based on them.

 * VLAN challenged

NETIF_F_VLAN_CHALLENGED should be set for devices which can't cope with VLAN
headers. Some drivers set this because the cards can't handle the bigger MTU.
[FIXME: Those cases could be fixed in VLAN code by allowing only reduced-MTU
VLANs. This may not be useful, though.]

* rx-fcs

This requests that the NIC append the Ethernet Frame Checksum (FCS)
to the end of the skb data. This allows sniffers and other tools to
read the CRC recorded by the NIC on receipt of the packet.

* rx-all

This requests that the NIC receive all possible frames, including errored
frames (such as bad FCS, etc). This can be helpful when sniffing a link with
bad packets on it. Some NICs may receive more packets if also put into normal
PROMISC mode.

* rx-gro-hw

This requests that the NIC enable Hardware GRO (generic receive offload).
Hardware GRO is basically the exact reverse of TSO, and is generally
stricter than Hardware LRO. A packet stream merged by Hardware GRO must
be re-segmentable by GSO or TSO back to the exact original packet stream.
Hardware GRO is dependent on RXCSUM since every packet successfully merged
by hardware must also have the checksum verified by hardware.

* hsr-tag-ins-offload

This should be set for devices which insert an HSR (High-availability Seamless
Redundancy) or PRP (Parallel Redundancy Protocol) tag automatically.

* hsr-tag-rm-offload

This should be set for devices which remove HSR (High-availability Seamless
Redundancy) or PRP (Parallel Redundancy Protocol) tags automatically.

* hsr-fwd-offload

This should be set for devices which forward HSR (High-availability Seamless
Redundancy) frames from one port to another in hardware.

* hsr-dup-offload

This should be set for devices which automatically duplicate outgoing HSR
(High-availability Seamless Redundancy) or PRP (Parallel Redundancy Protocol)
frames in hardware.