.. SPDX-License-Identifier: GPL-2.0

=================
Checksum Offloads
=================


Introduction
============

This document describes a set of techniques in the Linux networking stack to
take advantage of checksum offload capabilities of various NICs.

The following technologies are described:

* TX Checksum Offload
* LCO: Local Checksum Offload
* RCO: Remote Checksum Offload

Things that should be documented here but aren't yet:

* RX Checksum Offload
* CHECKSUM_UNNECESSARY conversion


TX Checksum Offload
===================

The interface for offloading a transmit checksum to a device is explained in
detail in comments near the top of include/linux/skbuff.h.

In brief, it allows the stack to request that the device fill in a single
ones-complement checksum defined by the sk_buff fields skb->csum_start and
skb->csum_offset.  The device should compute the 16-bit ones-complement
checksum (i.e. the 'IP-style' checksum) from csum_start to the end of the
packet, and fill in the result at (csum_start + csum_offset).

Because csum_offset cannot be negative, the checksum field always lies within
the summed range, so its previous value is included in the checksum
computation.  That previous value can therefore be used to supply any needed
corrections to the checksum (such as the sum of the pseudo-header for UDP or
TCP).
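
To make the contract concrete, here is a small userspace model of both halves,
written for illustration rather than taken from the kernel.
device_fill_csum() plays the NIC: it sums from csum_start to the end of the
buffer (including whatever is currently in the checksum field) and stores the
complement at csum_start + csum_offset.  stack_seed_pseudo_hdr() plays the
stack: it first writes the folded but uncomplemented IPv4 pseudo-header sum
into the checksum field.  All function names are hypothetical, offsets are
measured from the start of the buffer rather than from skb->head, and 16-bit
words are read in network byte order.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Add bytes [buf, buf + len) to sum as big-endian 16-bit words;
     * an odd trailing byte is padded with a zero low byte (RFC 1071). */
    static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *buf, size_t len)
    {
        size_t i;

        for (i = 0; i + 1 < len; i += 2)
            sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
        if (len & 1)
            sum += (uint32_t)buf[len - 1] << 8;
        return sum;
    }

    /* Fold the carries back into the low 16 bits (end-around carry). */
    static uint16_t csum_fold16(uint32_t sum)
    {
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
    }

    static void put_be16(uint8_t *p, uint16_t v)
    {
        p[0] = (uint8_t)(v >> 8);
        p[1] = (uint8_t)(v & 0xff);
    }

    /* Hypothetical NIC model: sum csum_start..end (old field value included),
     * complement the folded result, store it at csum_start + csum_offset. */
    static void device_fill_csum(uint8_t *pkt, size_t len,
                                 size_t csum_start, size_t csum_offset)
    {
        uint32_t sum = csum_add_bytes(0, pkt + csum_start, len - csum_start);

        put_be16(pkt + csum_start + csum_offset, (uint16_t)~csum_fold16(sum));
    }

    /* Hypothetical stack model: seed the L4 checksum field with the folded,
     * uncomplemented IPv4 pseudo-header sum (saddr, daddr, zero byte,
     * protocol, L4 length) so that the device's sum picks it up. */
    static void stack_seed_pseudo_hdr(uint8_t *l4, size_t csum_offset,
                                      const uint8_t saddr[4],
                                      const uint8_t daddr[4],
                                      uint8_t proto, uint16_t l4_len)
    {
        uint8_t ph[12];

        memcpy(ph, saddr, 4);
        memcpy(ph + 4, daddr, 4);
        ph[8] = 0;
        ph[9] = proto;
        put_be16(ph + 10, l4_len);
        put_be16(l4 + csum_offset, csum_fold16(csum_add_bytes(0, ph, sizeof(ph))));
    }

Seeding a UDP datagram (checksum at offset 6) with stack_seed_pseudo_hdr() and
then calling device_fill_csum() with csum_start 0 should give the same
checksum as complementing a software sum over the pseudo-header followed by
the datagram, which is the point of letting the previous field value take
part in the sum.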

This interface only allows a single checksum to be offloaded.  Where
encapsulation is used, the packet may have multiple checksum fields in
different header layers, and the rest will have to be handled by another
mechanism such as LCO or RCO.

CRC32c can also be offloaded using this interface, by filling in
skb->csum_start and skb->csum_offset as described above and setting
skb->csum_not_inet; see the skbuff.h comment (section 'D') for more details.
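
For the CRC32c case the device runs a different algorithm over the same range.
The sketch below is a bitwise (table-free) CRC32c using the reflected
Castagnoli polynomial 0x82F63B78.  device_fill_crc32c() is a hypothetical NIC
model that, following SCTP's convention (RFC 4960), computes the CRC with the
4-byte field zeroed and stores the result least-significant byte first.  Those
conventions are assumptions to check against the skbuff.h comment and the SCTP
code, not a description of any particular hardware.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Bitwise CRC32c (Castagnoli): reflected, init and final XOR 0xffffffff. */
    static uint32_t crc32c(const uint8_t *buf, size_t len)
    {
        uint32_t crc = 0xffffffffu;
        size_t i;
        int k;

        for (i = 0; i < len; i++) {
            crc ^= buf[i];
            for (k = 0; k < 8; k++)
                crc = (crc >> 1) ^ (0x82f63b78u & (0u - (crc & 1u)));
        }
        return ~crc;
    }

    /* Hypothetical NIC model for csum_not_inet packets: zero the 32-bit field,
     * CRC the range csum_start..end, store the CRC least-significant byte
     * first (assumed SCTP-style layout). */
    static void device_fill_crc32c(uint8_t *pkt, size_t len,
                                   size_t csum_start, size_t csum_offset)
    {
        uint8_t *field = pkt + csum_start + csum_offset;
        uint32_t crc;
        int i;

        for (i = 0; i < 4; i++)
            field[i] = 0;
        crc = crc32c(pkt + csum_start, len - csum_start);
        for (i = 0; i < 4; i++)
            field[i] = (uint8_t)(crc >> (8 * i));
    }

For an SCTP common header the checksum sits at byte 8, so the model would be
called with csum_offset 8 and csum_start pointing at the SCTP header.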

No offloading of the IP header checksum is performed; it is always done in
software.  This is OK because when we build the IP header, we obviously have it
in cache, so summing it isn't expensive.  It's also rather short.
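
For reference, the software computation is small: zero the 16-bit checksum
field (bytes 10 and 11 of the IPv4 header), sum the header as 16-bit words,
fold and complement.  The sketch below does this in portable userspace C; in
the kernel the equivalent is roughly ip_send_check(), which relies on the
architecture-optimized ip_fast_csum().  The name ipv4_header_csum() is made up
for illustration.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Compute and store the IPv4 header checksum in software.  iph points at
     * a complete IPv4 header whose length is given by its IHL field. */
    static void ipv4_header_csum(uint8_t *iph)
    {
        size_t hdr_len = (size_t)(iph[0] & 0x0f) * 4;  /* IHL is in 32-bit words */
        uint32_t sum = 0;
        uint16_t csum;
        size_t i;

        iph[10] = 0;                       /* checksum field must be zero */
        iph[11] = 0;
        for (i = 0; i < hdr_len; i += 2)
            sum += ((uint32_t)iph[i] << 8) | iph[i + 1];
        while (sum >> 16)                  /* end-around carry */
            sum = (sum & 0xffff) + (sum >> 16);
        csum = (uint16_t)~sum;
        iph[10] = (uint8_t)(csum >> 8);
        iph[11] = (uint8_t)(csum & 0xff);
    }

With the commonly cited example header 45 00 00 73 00 00 40 00 40 11 xx xx
c0 a8 00 01 c0 a8 00 c7, the function stores b8 61 in the checksum field.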

The requirements for GSO are more complicated, because when segmenting an
encapsulated packet both the inner and outer checksums may need to be edited or
recomputed for each resulting segment.  See the skbuff.h comment (section 'E')
for more details.

A driver declares its offload capabilities in netdev->hw_features; see
Documentation/networking/netdev-features.rst for more.  Note that a device
which only advertises NETIF_F_IP[V6]_CSUM must still obey the csum_start and
csum_offset given in the SKB; if it tries to deduce these itself in hardware
(as some NICs do), the driver should check that the values in the SKB match
those which the hardware will deduce, and if not, fall back to checksumming in
software instead (with skb_csum_hwoffload_help() or one of the
skb_checksum_help() / skb_crc32c_csum_help() functions, as mentioned in
include/linux/skbuff.h).
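
As an illustration of that check, the sketch below models a hypothetical NIC
that can only locate the checksum by parsing an IPv4 header followed by TCP or
UDP (whose checksum fields sit at bytes 16 and 6 of their headers).
struct tx_desc_model and hw_deduction_matches() are invented names; for
simplicity csum_start is measured from the start of the IP header, whereas the
real skb->csum_start is an offset from skb->head.  A real driver would make
this decision from the sk_buff and, on a mismatch, call a software helper such
as skb_checksum_help() before handing the packet to the hardware.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical stand-in for the sk_buff fields a driver would inspect. */
    struct tx_desc_model {
        const uint8_t *iph;   /* start of the IPv4 header */
        size_t csum_start;    /* measured from iph (simplification) */
        size_t csum_offset;   /* as in skb->csum_offset */
    };

    /*
     * Hypothetical NIC that locates the L4 checksum itself by parsing IPv4
     * followed by TCP or UDP.  Returns true only when that deduction matches
     * what the stack asked for; otherwise the driver must fall back to a
     * software checksum instead of trusting the hardware.
     */
    static bool hw_deduction_matches(const struct tx_desc_model *d)
    {
        size_t l4_off = (size_t)(d->iph[0] & 0x0f) * 4;  /* IHL in bytes */
        size_t check_off;

        if ((d->iph[0] >> 4) != 4)
            return false;                 /* hardware only parses IPv4 */
        switch (d->iph[9]) {              /* IPv4 protocol field */
        case 6:                           /* TCP: check is at byte 16 */
            check_off = 16;
            break;
        case 17:                          /* UDP: check is at byte 6 */
            check_off = 6;
            break;
        default:
            return false;                 /* nothing the hardware can deduce */
        }
        return d->csum_start == l4_off && d->csum_offset == check_off;
    }

Comparing against what the hardware would deduce, rather than trusting either
side, is what keeps a NIC that ignores csum_start and csum_offset from writing
a checksum into the wrong place.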

The stack should, for the most part, assume that checksum offload is supported
by the underlying device.  The only place that should check is
validate_xmit_skb(), and the functions it calls directly or indirectly.  That
function compares the offload features requested by the SKB (which may include
other offloads besides TX Checksum Offload) against those supported and enabled
on the device (as determined by netdev->features) and, for any that are not,
performs the corresponding offload in software.  In the case of TX Checksum
Offload, that means calling skb_csum_hwoffload_help(skb, features).
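
The checksum part of that decision can be modelled in a few lines.  This is
loosely patterned on skb_csum_hwoffload_help(), but the feature bits and the
enum below are hypothetical stand-ins: the real code tests netdev_features_t
flags such as NETIF_F_SCTP_CRC and the NETIF_F_CSUM_MASK group, and calls
skb_crc32c_csum_help() or skb_checksum_help() itself rather than returning an
action.

.. code-block:: c

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical stand-ins for the relevant netdev feature bits. */
    #define FEAT_INET_CSUM (1u << 0)   /* can fill an IP-style checksum */
    #define FEAT_SCTP_CRC  (1u << 1)   /* can fill a CRC32c */

    enum tx_csum_action {
        TX_CSUM_NONE_NEEDED,  /* no checksum left pending for the device */
        TX_CSUM_TO_DEVICE,    /* leave it for the hardware */
        TX_CSUM_SW_INET,      /* fall back to a software ones-complement sum */
        TX_CSUM_SW_CRC32C,    /* fall back to a software CRC32c */
    };

    /* csum_partial: the stack left a checksum for the device to fill in.
     * csum_not_inet: that checksum is CRC32c rather than ones-complement. */
    static enum tx_csum_action tx_csum_decide(bool csum_partial, bool csum_not_inet,
                                              uint32_t dev_features)
    {
        if (!csum_partial)
            return TX_CSUM_NONE_NEEDED;
        if (csum_not_inet)
            return (dev_features & FEAT_SCTP_CRC) ? TX_CSUM_TO_DEVICE
                                                  : TX_CSUM_SW_CRC32C;
        return (dev_features & FEAT_INET_CSUM) ? TX_CSUM_TO_DEVICE
                                               : TX_CSUM_SW_INET;
    }

Keeping this decision in one place is what allows the rest of the stack to
build packets as if checksum offload were always available.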


LCO: Local Checksum Offload
===========================

LCO is a technique for efficiently computing the outer checksum of an
encapsulated datagram when the inner checksum is due to be offloaded.

The ones-complement sum of a correctly checksummed TCP or UDP packet is equal
to the complement of the sum of the pseudo-header, because everything else gets
'cancelled out' by the checksum field.  This is because the sum was
complemented before being written to the checksum field.

More generally, this holds in any case where the 'IP-style' ones-complement
checksum is used, and thus for any checksum that TX Checksum Offload supports.
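
The cancellation can be written out.  Let :math:`\oplus` denote
ones-complement (end-around-carry) addition of 16-bit words and
:math:`\overline{x}` the bitwise complement, which in ones-complement
arithmetic is the additive inverse, so :math:`x \oplus \overline{x}` is a
representation of zero (0x0000 or 0xFFFF).  If :math:`P` is the value seeded
into the checksum field (the pseudo-header sum for TCP or UDP) and :math:`D` is
the sum of the rest of the checksummed range, the checksum written is
:math:`C`, and the range then sums to the complement of the seed:

.. math::

   C = \overline{P \oplus D}

   D \oplus C = D \oplus \overline{P} \oplus \overline{D} = \overline{P}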

That is, if we have set up TX Checksum Offload with a start/offset pair, we
know that after the device has filled in that checksum, the ones-complement sum
from csum_start to the end of the packet will be equal to the complement of
whatever value we put in the checksum field beforehand.  This allows us to
compute the outer checksum without looking at the payload: we simply stop
summing when we get to csum_start, then add the complement of the 16-bit word
at (csum_start + csum_offset).

Then, when the true inner checksum is filled in (either by hardware or by
skb_checksum_help()), the outer checksum will become correct by virtue of the
arithmetic.
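
A userspace sketch of that arithmetic, in the spirit of lco_csum() but not a
copy of it: lco_sum() sums the outer (and any middle) headers from l4_start up
to csum_start, then adds the complement of the word currently in the inner
checksum field instead of reading the inner header and payload.  The function
name, the big-endian word order and the assumption that all offsets are even
are choices made for this illustration.  A caller would then combine the
result with the outer pseudo-header sum and complement it, roughly as
udp_set_csum() does via udp_v4_check().

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /* Add bytes [buf, buf + len) to sum as big-endian 16-bit words;
     * an odd trailing byte is padded with a zero low byte. */
    static uint32_t csum_add_bytes(uint32_t sum, const uint8_t *buf, size_t len)
    {
        size_t i;

        for (i = 0; i + 1 < len; i += 2)
            sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
        if (len & 1)
            sum += (uint32_t)buf[len - 1] << 8;
        return sum;
    }

    /* Fold the carries back into the low 16 bits (end-around carry). */
    static uint16_t csum_fold16(uint32_t sum)
    {
        while (sum >> 16)
            sum = (sum & 0xffff) + (sum >> 16);
        return (uint16_t)sum;
    }

    /*
     * Folded ones-complement sum from l4_start to the end of the packet,
     * computed without reading the inner header or payload apart from the
     * inner checksum field: sum [l4_start, csum_start), then add the
     * complement of the value seeded at csum_start + csum_offset.  Once the
     * inner checksum is filled in, the inner range sums (in ones-complement
     * terms) to that complement.  Offsets are assumed to be even.
     */
    static uint16_t lco_sum(const uint8_t *pkt, size_t l4_start,
                            size_t csum_start, size_t csum_offset)
    {
        const uint8_t *field = pkt + csum_start + csum_offset;
        uint16_t seed = (uint16_t)((field[0] << 8) | field[1]);
        uint32_t sum = csum_add_bytes(0, pkt + l4_start, csum_start - l4_start);

        return csum_fold16(sum + (uint16_t)~seed);
    }

If a model NIC then fills in the inner checksum as described under TX Checksum
Offload, a full sum from l4_start to the end of the packet should fold to the
same 16-bit value that lco_sum() returned beforehand.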

LCO is performed by the stack when constructing an outer UDP header for an
encapsulation such as VXLAN or GENEVE, in udp_set_csum(), and likewise for the
IPv6 equivalents, in udp6_set_csum().

It is also performed when constructing an IPv4 GRE header, in
net/ipv4/ip_gre.c:build_header().  It is *not* currently performed when
constructing an IPv6 GRE header; the GRE checksum is computed over the whole
packet in net/ipv6/ip6_gre.c:ip6gre_xmit2(), but it should be possible to use
LCO here as IPv6 GRE still uses an IP-style checksum.

All of the LCO implementations use a helper function lco_csum(), in
include/linux/skbuff.h.

LCO can safely be used for nested encapsulations; in this case, the outer
encapsulation layer will sum over both its own header and the 'middle' header.
This does mean that the 'middle' header will get summed multiple times, but
there doesn't seem to be a way to avoid that without incurring bigger costs
(e.g. in SKB bloat).


RCO: Remote Checksum Offload
============================

RCO is a technique for eliding the inner checksum of an encapsulated datagram,
allowing the outer checksum to be offloaded.  It does, however, involve a
change to the encapsulation protocols, which the receiver must also support.
For this reason, it is disabled by default.

RCO is detailed in the following Internet-Drafts:

* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

In Linux, RCO is implemented individually in each encapsulation protocol, and
most tunnel types have flags controlling its use.  For instance, VXLAN has the
flag VXLAN_F_REMCSUM_TX (per struct vxlan_rdst) to indicate that RCO should be
used when transmitting to a given remote destination.