.. SPDX-License-Identifier: GPL-2.0

===============================================
Checksum Offloads in the Linux Networking Stack
===============================================


Introduction
============

This document describes a set of techniques in the Linux networking stack to
take advantage of checksum offload capabilities of various NICs.

The following technologies are described:

* TX Checksum Offload
* LCO: Local Checksum Offload
* RCO: Remote Checksum Offload

Things that should be documented here but aren't yet:

* RX Checksum Offload
* CHECKSUM_UNNECESSARY conversion


TX Checksum Offload
===================

The interface for offloading a transmit checksum to a device is explained in
detail in comments near the top of include/linux/skbuff.h.

In brief, it allows the stack to request that the device fill in a single
ones-complement checksum defined by the sk_buff fields skb->csum_start and
skb->csum_offset.  The device should compute the 16-bit ones-complement
checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the
packet, and fill in the result at (csum_start + csum_offset).

Because csum_offset cannot be negative, the previous value of the checksum
field is always included in the checksum computation.  That field can therefore
be used to supply any needed corrections to the checksum (such as the sum of
the pseudo-header for UDP or TCP).

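The device-side behaviour described above can be modelled in user space.  The
following is a minimal sketch and not kernel code: the names csum_add_bytes(),
csum_fold16(), model_tx_csum() and demo_tx_csum(), as well as the example
addresses and ports, are assumptions made for illustration.  Offsets are taken
from the start of the buffer, whereas skb->csum_start is relative to skb->head.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Add len bytes to a running sum as big-endian 16-bit words.
     * The 32-bit accumulator is only adequate for short example buffers.
     */
    static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *buf, size_t len)
    {
        size_t i;

        for (i = 0; i + 1 < len; i += 2)
            sum += (uint32_t)buf[i] << 8 | buf[i + 1];
        if (i < len)
            sum += (uint32_t)buf[i] << 8;   /* pad an odd trailing byte */
        return sum;
    }

    /* Fold the carries back in, leaving a 16-bit ones-complement sum. */
    static uint16_t csum_fold16(uint32_t sum)
    {
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
    }

    /* Hypothetical model of the device: sum from csum_start to the end of
     * the packet (including whatever is already in the checksum field) and
     * write the complement at csum_start + csum_offset.
     */
    static void model_tx_csum(uint8_t *pkt, size_t len,
                              size_t csum_start, size_t csum_offset)
    {
        uint16_t csum = (uint16_t)~csum_fold16(
            csum_add_bytes(0, pkt + csum_start, len - csum_start));

        pkt[csum_start + csum_offset] = csum >> 8;
        pkt[csum_start + csum_offset + 1] = csum & 0xff;
    }

    /* Seed a UDP checksum field with the pseudo-header sum, let the model
     * fill in the checksum, then verify it the way a receiver would.
     * A correct checksum makes the verification sum 0xffff.
     */
    static uint16_t demo_tx_csum(void)
    {
        /* 192.0.2.1 -> 192.0.2.2, protocol 17 (UDP), UDP length 12 */
        const uint8_t pseudo[12] = { 192, 0, 2, 1, 192, 0, 2, 2,
                                     0, 17, 0, 12 };
        /* UDP header (ports 1000 -> 2000, length 12) + 4 payload bytes */
        uint8_t pkt[12] = { 0x03, 0xe8, 0x07, 0xd0, 0x00, 0x0c, 0x00, 0x00,
                            'd', 'a', 't', 'a' };
        uint32_t pseudo_sum = csum_add_bytes(0, pseudo, sizeof(pseudo));
        uint16_t seed = csum_fold16(pseudo_sum);

        pkt[6] = seed >> 8;                 /* correction goes in the field */
        pkt[7] = seed & 0xff;
        model_tx_csum(pkt, sizeof(pkt), 0, 6);

        return csum_fold16(csum_add_bytes(pseudo_sum, pkt, sizeof(pkt)));
    }

Note that the model never looks at the pseudo-header itself; it only sees the
seed that was left in the checksum field, which is the point of the interface.
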
This interface only allows a single checksum to be offloaded.  Where
encapsulation is used, the packet may have multiple checksum fields in
different header layers, and the rest will have to be handled by another
mechanism such as LCO or RCO.

CRC32c can also be offloaded using this interface, by means of filling
skb->csum_start and skb->csum_offset as described above, and setting
skb->csum_not_inet: see skbuff.h comment (section 'D') for more details.

No offloading of the IP header checksum is performed; it is always done in
software.  This is OK because when we build the IP header, we obviously have it
in cache, so summing it isn't expensive.  It's also rather short.

The requirements for GSO are more complicated, because when segmenting an
encapsulated packet both the inner and outer checksums may need to be edited or
recomputed for each resulting segment.  See the skbuff.h comment (section 'E')
for more details.

A driver declares its offload capabilities in netdev->hw_features; see
Documentation/networking/netdev-features.txt for more.  Note that a device
which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and
csum_offset given in the SKB; if it tries to deduce these itself in hardware
(as some NICs do) the driver should check that the values in the SKB match
those which the hardware will deduce, and if not, fall back to checksumming in
software instead (with skb_csum_hwoffload_help() or one of the
skb_checksum_help() / skb_crc32c_csum_help() functions, as mentioned in
include/linux/skbuff.h).

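The check a driver has to make for such hardware can be illustrated with a
user-space model.  This is only a sketch under stated assumptions: struct
model_skb, model_hw_csum_matches() and demo_match() are hypothetical names,
the imagined hardware understands nothing but untagged Ethernet + IPv4 with
TCP or UDP, and csum_start is counted from the start of the frame rather than
from skb->head.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-in for the relevant sk_buff state. */
    struct model_skb {
        const uint8_t *data;    /* start of the Ethernet header */
        size_t len;
        size_t csum_start;      /* relative to data in this model */
        size_t csum_offset;
    };

    /* Return true only if the checksum location the imagined hardware would
     * deduce by parsing the frame agrees with what the stack requested.
     */
    static bool model_hw_csum_matches(const struct model_skb *skb)
    {
        size_t ihl, l4_off;

        if (skb->len < 14 + 20)
            return false;
        if (skb->data[12] != 0x08 || skb->data[13] != 0x00)
            return false;               /* EtherType is not IPv4 */
        if ((skb->data[14] >> 4) != 4)
            return false;

        ihl = (size_t)(skb->data[14] & 0x0f) * 4;
        if (ihl < 20)
            return false;

        switch (skb->data[14 + 9]) {    /* IPv4 protocol field */
        case 6:
            l4_off = 16;                /* TCP checksum field */
            break;
        case 17:
            l4_off = 6;                 /* UDP checksum field */
            break;
        default:
            return false;
        }

        return skb->csum_start == 14 + ihl && skb->csum_offset == l4_off;
    }

    /* Build an Ethernet + IPv4 (IHL 5) + UDP frame and ask whether a given
     * start/offset pair is one the imagined hardware can honour.
     */
    static bool demo_match(size_t csum_start, size_t csum_offset)
    {
        uint8_t frame[128] = { 0 };
        struct model_skb skb = { frame, sizeof(frame),
                                 csum_start, csum_offset };

        frame[12] = 0x08;               /* EtherType 0x0800 */
        frame[13] = 0x00;
        frame[14] = 0x45;               /* version 4, IHL 5 */
        frame[23] = 17;                 /* protocol UDP */
        return model_hw_csum_matches(&skb);
    }

In this model demo_match(34, 6) succeeds, because the request points at the
outer UDP checksum, while a request for an inner checksum further into the
frame (for example demo_match(84, 6)) does not.  In the second case a real
driver would be expected to fall back to one of the software helpers named
above before handing the packet to the device.
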
The stack should, for the most part, assume that checksum offload is supported
by the underlying device.  The only place that should check is
validate_xmit_skb(), and the functions it calls directly or indirectly.  That
function compares the offload features requested by the SKB (which may include
other offloads besides TX Checksum Offload) and, if they are not supported or
enabled on the device (determined by netdev->features), performs the
corresponding offload in software.  In the case of TX Checksum Offload, that
means calling skb_csum_hwoffload_help(skb, features).


LCO: Local Checksum Offload
===========================

LCO is a technique for efficiently computing the outer checksum of an
encapsulated datagram when the inner checksum is due to be offloaded.

The ones-complement sum of a correctly checksummed TCP or UDP packet is equal
to the complement of the sum of the pseudo header, because everything else gets
'cancelled out' by the checksum field.  This is because the sum was
complemented before being written to the checksum field.

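The cancellation can be written out explicitly.  The notation here is
introduced only for this illustration: :math:`\oplus` is 16-bit
ones-complement addition, :math:`\overline{x}` is the bitwise complement,
:math:`P` is the sum of the pseudo header, :math:`D` is the sum of the
transport header and payload with the checksum field taken as zero, and
:math:`C` is the value written to the checksum field.  Because
:math:`x \oplus \overline{x}` is ``0xFFFF``, which is a representation of zero
in ones-complement arithmetic, the complement acts as an additive inverse:

.. math::

   C = \overline{P \oplus D}

.. math::

   C \oplus D = \overline{P \oplus D} \oplus D = \overline{P}

The left-hand side of the second line is the sum of the packet as transmitted,
so that sum depends only on the pseudo header.
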
More generally, this holds in any case where the 'IP-style' ones-complement
checksum is used, and thus any checksum that TX Checksum Offload supports.

That is, if we have set up TX Checksum Offload with a start/offset pair, we
know that after the device has filled in that checksum, the ones-complement sum
from csum_start to the end of the packet will be equal to the complement of
whatever value we put in the checksum field beforehand.  This allows us to
compute the outer checksum without looking at the payload: we simply stop
summing when we get to csum_start, then add the complement of the 16-bit word
at (csum_start + csum_offset).

Then, when the true inner checksum is filled in (either by hardware or by
skb_checksum_help()), the outer checksum will become correct by virtue of the
arithmetic.

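The arithmetic can be checked with a small user-space model.  This is a sketch
under stated assumptions and not the kernel's lco_csum(): every function name
below, the packet layout, the addresses and the seed value 0x1234 are made up
for illustration, and offsets are relative to the start of the outer UDP
header.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Add len bytes to a running sum as big-endian 16-bit words.
     * The 32-bit accumulator is only adequate for short example buffers.
     */
    static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *buf, size_t len)
    {
        size_t i;

        for (i = 0; i + 1 < len; i += 2)
            sum += (uint32_t)buf[i] << 8 | buf[i + 1];
        if (i < len)
            sum += (uint32_t)buf[i] << 8;   /* pad an odd trailing byte */
        return sum;
    }

    /* Fold the carries back in, leaving a 16-bit ones-complement sum. */
    static uint16_t csum_fold16(uint32_t sum)
    {
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
    }

    /* Hypothetical model of the device filling in the inner checksum. */
    static void model_tx_csum(uint8_t *pkt, size_t len,
                              size_t csum_start, size_t csum_offset)
    {
        uint16_t csum = (uint16_t)~csum_fold16(
            csum_add_bytes(0, pkt + csum_start, len - csum_start));

        pkt[csum_start + csum_offset] = csum >> 8;
        pkt[csum_start + csum_offset + 1] = csum & 0xff;
    }

    /* Outer checksum via LCO: sum the outer pseudo-header and the bytes up
     * to csum_start, add the complement of the seed currently stored in the
     * inner checksum field, and complement the result.  Nothing at or after
     * csum_start is read apart from the seed word.
     */
    static uint16_t model_lco_outer_csum(const uint8_t *pkt, size_t csum_start,
                                         size_t csum_offset,
                                         uint32_t outer_pseudo_sum)
    {
        uint16_t seed = (uint16_t)(pkt[csum_start + csum_offset] << 8 |
                                   pkt[csum_start + csum_offset + 1]);
        uint32_t sum = outer_pseudo_sum + (uint16_t)~seed;

        sum = csum_add_bytes(sum, pkt, csum_start);
        return (uint16_t)~csum_fold16(sum);
    }

    /* Compute the outer checksum first, fill in the inner one afterwards,
     * then verify the outer checksum over the whole datagram.  A correct
     * outer checksum makes the verification sum 0xffff.
     */
    static uint16_t demo_lco(void)
    {
        /* 198.51.100.1 -> 198.51.100.2, protocol 17, UDP length 38 */
        const uint8_t outer_pseudo[12] = { 198, 51, 100, 1, 198, 51, 100, 2,
                                           0, 17, 0, 38 };
        uint8_t pkt[38] = {
            /* outer UDP header: ports 4000 -> 4789, length 38, checksum 0 */
            0x0f, 0xa0, 0x12, 0xb5, 0x00, 0x26, 0x00, 0x00,
            /* 16 opaque bytes standing in for tunnel and inner L2/L3 headers */
            0x08, 0x00, 0x00, 0x00, 0x00, 0x00, 0x2a, 0x00,
            1, 2, 3, 4, 5, 6, 7, 8,
            /* inner UDP header, checksum field seeded with 0x1234 */
            0x03, 0xe8, 0x07, 0xd0, 0x00, 0x0e, 0x12, 0x34,
            /* inner payload */
            'p', 'a', 'y', 'l', 'o', 'a',
        };
        uint32_t pseudo_sum = csum_add_bytes(0, outer_pseudo,
                                             sizeof(outer_pseudo));
        uint16_t outer = model_lco_outer_csum(pkt, 24, 6, pseudo_sum);

        pkt[6] = outer >> 8;
        pkt[7] = outer & 0xff;

        model_tx_csum(pkt, sizeof(pkt), 24, 6);     /* inner filled in later */

        return csum_fold16(csum_add_bytes(pseudo_sum, pkt, sizeof(pkt)));
    }

The outer checksum is written before the inner one exists, yet it verifies
once the inner checksum has been filled in, which is the property the stack
relies on.
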
LCO is performed by the stack when constructing an outer UDP header for an
encapsulation such as VXLAN or GENEVE, in udp_set_csum().  Similarly for the
IPv6 equivalents, in udp6_set_csum().

It is also performed when constructing an IPv4 GRE header, in
net/ipv4/ip_gre.c:build_header().  It is *not* currently performed when
constructing an IPv6 GRE header; the GRE checksum is computed over the whole
packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use
LCO here as IPv6 GRE still uses an IP-style checksum.

All of the LCO implementations use a helper function lco_csum(), in
include/linux/skbuff.h.

LCO can safely be used for nested encapsulations; in this case, the outer
encapsulation layer will sum over both its own header and the 'middle' header.
This does mean that the 'middle' header will get summed multiple times, but
there doesn't seem to be a way to avoid that without incurring bigger costs
(e.g. in SKB bloat).


RCO: Remote Checksum Offload
============================

RCO is a technique for eliding the inner checksum of an encapsulated datagram,
allowing the outer checksum to be offloaded.  It does, however, involve a
change to the encapsulation protocols, which the receiver must also support.
For this reason, it is disabled by default.

RCO is detailed in the following Internet-Drafts:

* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

In Linux, RCO is implemented individually in each encapsulation protocol, and
most tunnel types have flags controlling its use.  For instance, VXLAN has the
flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be
used when transmitting to a given remote destination.