.. SPDX-License-Identifier: GPL-2.0

=================
Checksum Offloads
=================


Introduction
============

This document describes a set of techniques in the Linux networking stack to
take advantage of checksum offload capabilities of various NICs.

The following technologies are described:

* TX Checksum Offload
* LCO: Local Checksum Offload
* RCO: Remote Checksum Offload

Things that should be documented here but aren't yet:

* RX Checksum Offload
* CHECKSUM_UNNECESSARY conversion


TX Checksum Offload
===================

The interface for offloading a transmit checksum to a device is explained in
detail in comments near the top of include/linux/skbuff.h.

In brief, it allows the stack to request that the device fill in a single
ones-complement checksum defined by the sk_buff fields skb->csum_start and
skb->csum_offset. The device should compute the 16-bit ones-complement
checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the
packet, and fill in the result at (csum_start + csum_offset).
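
The request described above can be illustrated with a small userspace sketch.
This is not kernel code: the helpers ones_sum() and emulate_tx_csum() are
hypothetical names introduced only for this example, the packet is a plain
byte array rather than an sk_buff, and csum_offset is assumed to be even.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): 16-bit ones-complement sum
 * of len bytes, starting from an initial value and folded to 16 bits.
 * The result is NOT complemented. Intended for packet-sized buffers only.
 */
static uint16_t ones_sum(const uint8_t *data, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)data[i] << 8 | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8; /* pad odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* end-around carry */
    return (uint16_t)sum;
}

/*
 * Hypothetical software stand-in for what the device is asked to do:
 * checksum everything from csum_start to the end of the packet and
 * store the complemented result at (csum_start + csum_offset).
 * Whatever was in the field beforehand is part of the sum.
 */
static void emulate_tx_csum(uint8_t *pkt, size_t len,
                            size_t csum_start, size_t csum_offset)
{
    uint16_t csum;

    assert(csum_offset % 2 == 0); /* assumption of this sketch */
    assert(csum_start + csum_offset + 2 <= len);

    csum = (uint16_t)~ones_sum(pkt + csum_start, len - csum_start, 0);
    pkt[csum_start + csum_offset] = (uint8_t)(csum >> 8);
    pkt[csum_start + csum_offset + 1] = (uint8_t)(csum & 0xff);
}
```

In this sketch the caller writes a seed value (for UDP or TCP, typically the
pseudo-header sum) into the checksum field before calling emulate_tx_csum().
A receiver that adds the same seed to the sum of the checksummed region
should then obtain 0xffff.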

Because csum_offset cannot be negative, this ensures that the previous value of
the checksum field is included in the checksum computation, thus it can be used
to supply any needed corrections to the checksum (such as the sum of the
pseudo-header for UDP or TCP).

This interface only allows a single checksum to be offloaded. Where
encapsulation is used, the packet may have multiple checksum fields in
different header layers, and the rest will have to be handled by another
mechanism such as LCO or RCO.

CRC32c can also be offloaded using this interface, by means of filling
skb->csum_start and skb->csum_offset as described above, and setting
skb->csum_not_inet: see skbuff.h comment (section 'D') for more details.

No offloading of the IP header checksum is performed; it is always done in
software. This is OK because when we build the IP header, we obviously have it
in cache, so summing it isn't expensive. It's also rather short.

The requirements for GSO are more complicated, because when segmenting an
encapsulated packet both the inner and outer checksums may need to be edited or
recomputed for each resulting segment. See the skbuff.h comment (section 'E')
for more details.

A driver declares its offload capabilities in netdev->hw_features; see
Documentation/networking/netdev-features.rst for more.
Note that a device
which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and
csum_offset given in the SKB; if it tries to deduce these itself in hardware
(as some NICs do) the driver should check that the values in the SKB match
those which the hardware will deduce, and if not, fall back to checksumming in
software instead (with skb_csum_hwoffload_help() or one of the
skb_checksum_help() / skb_crc32c_csum_help() functions, as mentioned in
include/linux/skbuff.h).

The stack should, for the most part, assume that checksum offload is supported
by the underlying device. The only place that should check is
validate_xmit_skb(), and the functions it calls directly or indirectly. That
function compares the offload features requested by the SKB (which may include
other offloads besides TX Checksum Offload) and, if they are not supported or
enabled on the device (determined by netdev->features), performs the
corresponding offload in software. In the case of TX Checksum Offload, that
means calling skb_csum_hwoffload_help(skb, features).


LCO: Local Checksum Offload
===========================

LCO is a technique for efficiently computing the outer checksum of an
encapsulated datagram when the inner checksum is due to be offloaded.
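
As a rough illustration of what LCO computes (the reasoning is given in the
paragraphs that follow), here is a userspace sketch. It is not the kernel's
lco_csum(), which operates on an sk_buff; ones_sum(), lco_outer_csum() and
fill_inner_csum() are hypothetical names used only here, the packet is a plain
byte array, and all offsets are assumed to be even.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: folded 16-bit ones-complement sum of len bytes,
 * starting from an initial value. The result is NOT complemented.
 */
static uint16_t ones_sum(const uint8_t *data, size_t len, uint32_t sum)
{
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)data[i] << 8 | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8; /* pad odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* end-around carry */
    return (uint16_t)sum;
}

/*
 * Hypothetical LCO sketch: compute the outer checksum without reading
 * anything past csum_start. Only the outer header bytes in
 * [outer_start, csum_start) are summed, plus the complement of the seed
 * currently stored in the inner checksum field.
 */
static uint16_t lco_outer_csum(const uint8_t *pkt, size_t outer_start,
                               size_t csum_start, size_t csum_offset)
{
    const uint8_t *field = pkt + csum_start + csum_offset;
    uint16_t seed = (uint16_t)(field[0] << 8 | field[1]);
    uint16_t sum;

    assert(outer_start <= csum_start);
    sum = ones_sum(pkt + outer_start, csum_start - outer_start,
                   (uint16_t)~seed);
    return (uint16_t)~sum;
}

/*
 * Hypothetical stand-in for the later inner checksum fill (done by the
 * device, or in software): sum from csum_start to the end of the packet
 * and store the complement in the inner checksum field.
 */
static void fill_inner_csum(uint8_t *pkt, size_t len,
                            size_t csum_start, size_t csum_offset)
{
    uint16_t csum;

    assert(csum_start + csum_offset + 2 <= len);
    csum = (uint16_t)~ones_sum(pkt + csum_start, len - csum_start, 0);
    pkt[csum_start + csum_offset] = (uint8_t)(csum >> 8);
    pkt[csum_start + csum_offset + 1] = (uint8_t)(csum & 0xff);
}
```

In this sketch the outer checksum is written first, while the inner field
still holds only its seed. Once fill_inner_csum() has run, summing the whole
packet from outer_start together with the outer seed should yield 0xffff,
even though the payload was never read when the outer checksum was computed.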

The ones-complement sum of a correctly checksummed TCP or UDP packet is equal
to the complement of the sum of the pseudo header, because everything else gets
'cancelled out' by the checksum field. This is because the sum was
complemented before being written to the checksum field.

More generally, this holds in any case where the 'IP-style' ones complement
checksum is used, and thus any checksum that TX Checksum Offload supports.

That is, if we have set up TX Checksum Offload with a start/offset pair, we
know that after the device has filled in that checksum, the ones complement sum
from csum_start to the end of the packet will be equal to the complement of
whatever value we put in the checksum field beforehand. This allows us to
compute the outer checksum without looking at the payload: we simply stop
summing when we get to csum_start, then add the complement of the 16-bit word
at (csum_start + csum_offset).

Then, when the true inner checksum is filled in (either by hardware or by
skb_checksum_help()), the outer checksum will become correct by virtue of the
arithmetic.

LCO is performed by the stack when constructing an outer UDP header for an
encapsulation such as VXLAN or GENEVE, in udp_set_csum(). Similarly for the
IPv6 equivalents, in udp6_set_csum().

It is also performed when constructing an IPv4 GRE header, in
net/ipv4/ip_gre.c:build_header().
It is *not* currently performed when
constructing an IPv6 GRE header; the GRE checksum is computed over the whole
packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use
LCO here as IPv6 GRE still uses an IP-style checksum.

All of the LCO implementations use a helper function lco_csum(), in
include/linux/skbuff.h.

LCO can safely be used for nested encapsulations; in this case, the outer
encapsulation layer will sum over both its own header and the 'middle' header.
This does mean that the 'middle' header will get summed multiple times, but
there doesn't seem to be a way to avoid that without incurring bigger costs
(e.g. in SKB bloat).


RCO: Remote Checksum Offload
============================

RCO is a technique for eliding the inner checksum of an encapsulated datagram,
allowing the outer checksum to be offloaded. It does, however, involve a
change to the encapsulation protocols, which the receiver must also support.
For this reason, it is disabled by default.

RCO is detailed in the following Internet-Drafts:

* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

In Linux, RCO is implemented individually in each encapsulation protocol, and
most tunnel types have flags controlling its use.
For instance, VXLAN has the
flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be
used when transmitting to a given remote destination.