.. SPDX-License-Identifier: GPL-2.0

===============================================
Checksum Offloads in the Linux Networking Stack
===============================================


Introduction
============

This document describes a set of techniques in the Linux networking stack to
take advantage of checksum offload capabilities of various NICs.

The following technologies are described:

* TX Checksum Offload
* LCO: Local Checksum Offload
* RCO: Remote Checksum Offload

Things that should be documented here but aren't yet:

* RX Checksum Offload
* CHECKSUM_UNNECESSARY conversion


TX Checksum Offload
===================

The interface for offloading a transmit checksum to a device is explained in
detail in comments near the top of include/linux/skbuff.h.

In brief, it allows the stack to request that the device fill in a single
ones-complement checksum defined by the sk_buff fields skb->csum_start and
skb->csum_offset.  The device should compute the 16-bit ones-complement
checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the
packet, and fill in the result at (csum_start + csum_offset).
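
The device-side behaviour described above can be sketched as a small
user-space model.  This is only an illustration, not kernel code: the names
csum_partial_model(), csum_fold_model() and tx_csum_offload_model() are
hypothetical, the packet is a flat byte array rather than an sk_buff, and
csum_start is taken as an offset from the start of that array.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: add the buffer to 'sum' as big-endian 16-bit words.
 * An odd trailing byte is treated as the high byte of a zero-padded word.
 * Carries accumulate in the upper 16 bits (fine for packets up to 64 KiB).
 */
static uint32_t csum_partial_model(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return sum;
}

/* Hypothetical helper: fold the carries back in, then complement. */
static uint16_t csum_fold_model(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Model of the device: checksum everything from csum_start to the end of
 * the packet, including whatever value is currently in the checksum field,
 * and store the result at (csum_start + csum_offset) in network byte order.
 * Assumes csum_start + csum_offset + 2 <= len.
 */
static void tx_csum_offload_model(uint8_t *pkt, size_t len,
				  size_t csum_start, size_t csum_offset)
{
	uint16_t csum = csum_fold_model(
		csum_partial_model(pkt + csum_start, len - csum_start, 0));

	pkt[csum_start + csum_offset] = csum >> 8;
	pkt[csum_start + csum_offset + 1] = csum & 0xff;
}
```

In this model the old contents of the checksum field take part in the sum, so
a caller can seed the field (for example with a pseudo-header sum) before
calling tx_csum_offload_model().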

Because csum_offset cannot be negative, the previous value of the checksum
field is always included in the checksum computation, so it can be used to
supply any needed corrections to the checksum (such as the sum of the
pseudo-header for UDP or TCP).

This interface only allows a single checksum to be offloaded.  Where
encapsulation is used, the packet may have multiple checksum fields in
different header layers, and the rest will have to be handled by another
mechanism such as LCO or RCO.

CRC32c can also be offloaded using this interface, by means of filling
skb->csum_start and skb->csum_offset as described above, and setting
skb->csum_not_inet: see skbuff.h comment (section 'D') for more details.

No offloading of the IP header checksum is performed; it is always done in
software.  This is OK because when we build the IP header, we obviously have it
in cache, so summing it isn't expensive.  It's also rather short.

The requirements for GSO are more complicated, because when segmenting an
encapsulated packet both the inner and outer checksums may need to be edited or
recomputed for each resulting segment.  See the skbuff.h comment (section 'E')
for more details.

A driver declares its offload capabilities in netdev->hw_features; see
Documentation/networking/netdev-features.txt for more.
Note that a device which only advertises NETIF_F_IP[V6]_CSUM must still obey
the csum_start and csum_offset given in the SKB; if it tries to deduce these
itself in hardware (as some NICs do) the driver should check that the values in
the SKB match those which the hardware will deduce, and if not, fall back to
checksumming in software instead (with skb_csum_hwoffload_help() or one of the
skb_checksum_help() / skb_crc32c_csum_help() functions, as mentioned in
include/linux/skbuff.h).

The stack should, for the most part, assume that checksum offload is supported
by the underlying device.  The only place that should check is
validate_xmit_skb(), and the functions it calls directly or indirectly.  That
function compares the offload features requested by the SKB (which may include
other offloads besides TX Checksum Offload) and, if they are not supported or
enabled on the device (determined by netdev->features), performs the
corresponding offload in software.  In the case of TX Checksum Offload, that
means calling skb_csum_hwoffload_help(skb, features).


LCO: Local Checksum Offload
===========================

LCO is a technique for efficiently computing the outer checksum of an
encapsulated datagram when the inner checksum is due to be offloaded.

The ones-complement sum of a correctly checksummed TCP or UDP packet is equal
to the complement of the sum of the pseudo header, because everything else gets
'cancelled out' by the checksum field.  This is because the sum was
complemented before being written to the checksum field.

More generally, this holds in any case where the 'IP-style' ones-complement
checksum is used, and thus any checksum that TX Checksum Offload supports.

That is, if we have set up TX Checksum Offload with a start/offset pair, we
know that after the device has filled in that checksum, the ones-complement sum
from csum_start to the end of the packet will be equal to the complement of
whatever value we put in the checksum field beforehand.  This allows us to
compute the outer checksum without looking at the payload: we simply stop
summing when we get to csum_start, then add the complement of the 16-bit word
at (csum_start + csum_offset).

Then, when the true inner checksum is filled in (either by hardware or by
skb_checksum_help()), the outer checksum will become correct by virtue of the
arithmetic.

LCO is performed by the stack when constructing an outer UDP header for an
encapsulation such as VXLAN or GENEVE, in udp_set_csum().  Similarly for the
IPv6 equivalents, in udp6_set_csum().
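
The LCO arithmetic described above can be sketched as a user-space model.
This is an illustration under simplifying assumptions, not the kernel's
lco_csum(): the names csum_partial_model(), csum_fold_model() and
lco_csum_model() are hypothetical, the packet is a flat byte array, all
offsets are even, and the outer checksum field is assumed to already hold
whatever adjustment the caller wants included (zero in the simplest case).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add the buffer to 'sum' as big-endian 16-bit words. */
static uint32_t csum_partial_model(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return sum;
}

/* Hypothetical helper: fold the carries back in, then complement. */
static uint16_t csum_fold_model(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/*
 * Model of LCO: return the unfolded sum for the outer checksum without
 * reading anything past the inner checksum field.
 *
 * outer_start: offset of the first byte covered by the outer checksum
 * csum_start:  offset where the (offloaded) inner checksum region begins
 * csum_offset: offset of the inner checksum field within that region
 */
static uint32_t lco_csum_model(const uint8_t *pkt, size_t outer_start,
			       size_t csum_start, size_t csum_offset)
{
	const uint8_t *field = pkt + csum_start + csum_offset;

	/*
	 * Once the inner checksum is filled in, the inner region will sum to
	 * the complement of the value currently seeded in its checksum field.
	 */
	uint32_t partial = (uint16_t)~(((uint32_t)field[0] << 8) | field[1]);

	/* Add the outer headers, stopping at csum_start. */
	return csum_partial_model(pkt + outer_start, csum_start - outer_start,
				  partial);
}
```

A caller would fold the result with csum_fold_model() and store it in the
outer checksum field.  When the inner checksum is later computed over the
region starting at csum_start, the sum over the whole packet comes out
consistent without the payload having been read for the outer checksum.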

LCO is also performed when constructing an IPv4 GRE header, in
net/ipv4/ip_gre.c:build_header().  It is *not* currently performed when
constructing an IPv6 GRE header; the GRE checksum is computed over the whole
packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use
LCO here as IPv6 GRE still uses an IP-style checksum.

All of the LCO implementations use a helper function lco_csum(), in
include/linux/skbuff.h.

LCO can safely be used for nested encapsulations; in this case, the outer
encapsulation layer will sum over both its own header and the 'middle' header.
This does mean that the 'middle' header will get summed multiple times, but
there doesn't seem to be a way to avoid that without incurring bigger costs
(e.g. in SKB bloat).


RCO: Remote Checksum Offload
============================

RCO is a technique for eliding the inner checksum of an encapsulated datagram,
allowing the outer checksum to be offloaded.  It does, however, involve a
change to the encapsulation protocols, which the receiver must also support.
For this reason, it is disabled by default.

RCO is detailed in the following Internet-Drafts:

* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

In Linux, RCO is implemented individually in each encapsulation protocol, and
most tunnel types have flags controlling its use.  For instance, VXLAN has the
flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be
used when transmitting to a given remote destination.