.. SPDX-License-Identifier: GPL-2.0

=================
Checksum Offloads
=================


Introduction
============

This document describes a set of techniques in the Linux networking stack that
take advantage of the checksum offload capabilities of various NICs.

The following technologies are described:

* TX Checksum Offload
* LCO: Local Checksum Offload
* RCO: Remote Checksum Offload
* RX Checksum Offload

Things that should be documented here but aren't yet:

* CHECKSUM_UNNECESSARY conversion


TX Checksum Offload
===================

In brief, TX checksum offload allows the stack to request that the device fill
in a single ones-complement checksum defined by the sk_buff fields
skb->csum_start and skb->csum_offset.  The device should compute the 16-bit
ones-complement checksum (i.e. the 'IP-style' checksum) from csum_start to the
end of the packet, and fill in the result at (csum_start + csum_offset).

Because csum_offset cannot be negative, the previous value of the checksum
field is always included in the checksum computation.  It can therefore be used
to supply any needed corrections to the checksum (such as the sum of the
pseudo-header for UDP or TCP).
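
The role of the seed value can be illustrated with a small userspace model.
This is only a sketch under stated assumptions, not kernel or driver code:
ones_sum() and model_tx_csum() are hypothetical helpers, offsets are relative
to the start of a plain byte buffer rather than to skb->head, and csum_offset
is assumed to be even.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of big-endian 16-bit words, folded to 16 bits.
 * An odd trailing byte is padded with a zero byte.
 */
static uint16_t ones_sum(const uint8_t *buf, size_t len, uint32_t init)
{
	uint32_t sum = init;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Hypothetical model of the device's job: checksum [csum_start, len) and
 * store the complemented result at csum_start + csum_offset.  The old value
 * of the field lies inside the summed range, so a seed placed there (for
 * example the pseudo-header sum) is folded into the result.
 */
static void model_tx_csum(uint8_t *pkt, size_t len,
			  size_t csum_start, size_t csum_offset)
{
	size_t field = csum_start + csum_offset;
	uint16_t csum;

	/* Assumed: the field is word-aligned and lies inside the packet. */
	assert(csum_offset % 2 == 0 && field + 2 <= len);

	csum = (uint16_t)~ones_sum(pkt + csum_start, len - csum_start, 0);
	pkt[field] = (uint8_t)(csum >> 8);
	pkt[field + 1] = (uint8_t)(csum & 0xff);
}
```

If the checksum field is seeded with the (uncomplemented) ones-complement sum
of the pseudo-header before model_tx_csum() runs, a receiver that sums the
pseudo-header plus the range from csum_start to the end of the packet obtains
0xffff.  The model ignores protocol details such as UDP's special treatment of
a zero checksum.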

This interface only allows a single checksum to be offloaded.  Where
encapsulation is used, the packet may have multiple checksum fields in
different header layers, and the rest will have to be handled by another
mechanism such as LCO or RCO.

SCTP CRC32c can also be offloaded using this interface, by means of filling
skb->csum_start and skb->csum_offset as described above, setting
skb->csum_not_inet, and advertising NETIF_F_SCTP_CRC. Drivers must not treat
ordinary IP checksum offload as SCTP CRC32c support.

No offloading of the IP header checksum is performed; it is always done in
software.  This is OK because when we build the IP header, we obviously have it
in cache, so summing it isn't expensive.  It's also rather short.

The requirements for GSO are more complicated, because when segmenting an
encapsulated packet both the inner and outer checksums may need to be edited or
recomputed for each resulting segment.

A driver declares its offload capabilities in netdev->hw_features; see
Documentation/networking/netdev-features.rst for more. NETIF_F_IP_CSUM and
NETIF_F_IPV6_CSUM are restricted legacy features and are being deprecated in
favor of NETIF_F_HW_CSUM. New devices should use NETIF_F_HW_CSUM to advertise
generic checksum offload. The skb_csum_hwoffload_help() helper can resolve
CHECKSUM_PARTIAL according to the device's advertised checksum capabilities,
falling back to software when needed.

The stack should, for the most part, assume that checksum offload is supported
by the underlying device.  The only place that should check is
validate_xmit_skb(), and the functions it calls directly or indirectly.  That
function compares the offload features requested by the SKB (which may include
other offloads besides TX Checksum Offload) and, if they are not supported or
enabled on the device (determined by netdev->features), performs the
corresponding offload in software.  In the case of TX Checksum Offload, that
means calling skb_csum_hwoffload_help(skb, features).
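
The decision described above can be modelled roughly as follows.  This is a
deliberately simplified userspace sketch, not the body of
skb_csum_hwoffload_help(): every model_* name and both feature bits are
hypothetical stand-ins, and the restricted legacy features are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for netdev feature bits; not the kernel's values. */
enum model_feature {
	MODEL_F_HW_CSUM  = 1u << 0,	/* generic start/offset checksum */
	MODEL_F_SCTP_CRC = 1u << 1,	/* SCTP CRC32c */
};

enum model_action {
	MODEL_NOTHING_TO_DO,	/* no checksum offload was requested */
	MODEL_LEAVE_TO_DEVICE,	/* device fills in the checksum */
	MODEL_SW_CHECKSUM,	/* stack computes it before transmission */
};

/* Hypothetical, heavily reduced view of the relevant sk_buff state. */
struct model_skb {
	bool csum_partial;	/* a checksum offload was requested */
	bool csum_not_inet;	/* the requested checksum is SCTP CRC32c */
};

static enum model_action model_resolve_csum(const struct model_skb *skb,
					     unsigned int features)
{
	assert(skb);

	if (!skb->csum_partial)
		return MODEL_NOTHING_TO_DO;

	/* Generic checksum offload does not imply SCTP CRC32c support. */
	if (skb->csum_not_inet)
		return (features & MODEL_F_SCTP_CRC) ?
			MODEL_LEAVE_TO_DEVICE : MODEL_SW_CHECKSUM;

	return (features & MODEL_F_HW_CSUM) ?
		MODEL_LEAVE_TO_DEVICE : MODEL_SW_CHECKSUM;
}
```

In this model a device that only advertises the generic feature still gets a
software fallback for an SCTP packet, which mirrors the rule that ordinary
checksum offload must not be treated as CRC32c support.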


LCO: Local Checksum Offload
===========================

LCO is a technique for efficiently computing the outer checksum of an
encapsulated datagram when the inner checksum is due to be offloaded.

The ones-complement sum of a correctly checksummed TCP or UDP packet is equal
to the complement of the sum of the pseudo header, because everything else gets
'cancelled out' by the checksum field.  This is because the sum was
complemented before being written to the checksum field.
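
The cancellation can be written out explicitly.  The notation is an assumption
introduced here for illustration: P is the ones-complement sum of the pseudo
header, D is the sum of the packet with its checksum field zeroed, and C is the
value written to the checksum field.  All arithmetic is 16-bit ones-complement,
where complementing a value is the same as negating it.

```latex
\begin{aligned}
C     &= \mathord{\sim}(P + D) = -(P + D) \\
D + C &= D - (P + D) = -P = \mathord{\sim}P
\end{aligned}
```

The left-hand side of the second line is the sum over the whole packet with
the checksum field filled in, which therefore depends only on P.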

More generally, this holds in any case where the 'IP-style' ones-complement
checksum is used, and thus any checksum that TX Checksum Offload supports.

That is, if we have set up TX Checksum Offload with a start/offset pair, we
know that after the device has filled in that checksum, the ones-complement sum
from csum_start to the end of the packet will be equal to the complement of
whatever value we put in the checksum field beforehand.  This allows us to
compute the outer checksum without looking at the payload: we simply stop
summing when we get to csum_start, then add the complement of the 16-bit word
at (csum_start + csum_offset).

Then, when the true inner checksum is filled in (either by hardware or by
skb_checksum_help()), the outer checksum will become correct by virtue of the
arithmetic.
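
The arithmetic can be checked with a small userspace model.  This is a sketch
under stated assumptions and not the kernel's lco_csum(): ones_sum(),
model_lco_sum() and model_fill_inner() are hypothetical helpers working on a
plain byte buffer, and the headers being summed are assumed to span a whole
number of 16-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones-complement sum of big-endian 16-bit words, folded to 16 bits. */
static uint16_t ones_sum(const uint8_t *buf, size_t len, uint32_t init)
{
	uint32_t sum = init;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* LCO: predict the sum of [outer_start, len) as it will be once the inner
 * checksum has been filled in.  Only the outer headers and the 16-bit seed
 * in the inner checksum field are read; the payload is never touched.
 */
static uint16_t model_lco_sum(const uint8_t *pkt, size_t outer_start,
			      size_t csum_start, size_t csum_offset)
{
	size_t field = csum_start + csum_offset;
	uint16_t seed = (uint16_t)((pkt[field] << 8) | pkt[field + 1]);

	/* Assumed: word-aligned header span and checksum field. */
	assert(outer_start <= csum_start);
	assert((csum_start - outer_start) % 2 == 0 && csum_offset % 2 == 0);

	/* Stop summing at csum_start, then add the complement of the seed. */
	return ones_sum(pkt + outer_start, csum_start - outer_start,
			(uint16_t)~seed);
}

/* Stand-in for the device (or a software fallback) filling in the inner
 * checksum at a later point.
 */
static void model_fill_inner(uint8_t *pkt, size_t len,
			     size_t csum_start, size_t csum_offset)
{
	size_t field = csum_start + csum_offset;
	uint16_t csum;

	assert(field + 2 <= len);
	csum = (uint16_t)~ones_sum(pkt + csum_start, len - csum_start, 0);
	pkt[field] = (uint8_t)(csum >> 8);
	pkt[field + 1] = (uint8_t)(csum & 0xff);
}
```

Calling model_lco_sum() first and model_fill_inner() afterwards, a full sum
over the finished packet from outer_start to the end matches the predicted
value.  Since 0x0000 and 0xffff both represent zero in ones-complement
arithmetic, a strict equality comparison can differ in that one degenerate
case.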

LCO is performed by the stack when constructing an outer UDP header for an
encapsulation such as VXLAN or GENEVE, in udp_set_csum().  The IPv6 equivalent
is performed in udp6_set_csum().

It is also performed when constructing GRE headers with the shared
gre_build_header() helper in include/net/gre.h, which is used by both IPv4 and
IPv6 GRE.

All of the LCO implementations use a helper function lco_csum(), in
include/linux/skbuff.h.

LCO can safely be used for nested encapsulations; in this case, the outer
encapsulation layer will sum over both its own header and the 'middle' header.
This does mean that the 'middle' header will get summed multiple times, but
there doesn't seem to be a way to avoid that without incurring bigger costs
(e.g. in SKB bloat).


RCO: Remote Checksum Offload
============================

RCO is a technique for eliding the inner checksum of an encapsulated datagram,
allowing the outer checksum to be offloaded.  It does, however, involve a
change to the encapsulation protocols, which the receiver must also support.
For this reason, it is disabled by default.

RCO is detailed in the following Internet-Drafts:

* https://tools.ietf.org/html/draft-herbert-remotecsumoffload-00
* https://tools.ietf.org/html/draft-herbert-vxlan-rco-00

In Linux, RCO is implemented individually in each encapsulation protocol, and
most tunnel types have flags controlling its use. For instance, VXLAN has the
configuration flag VXLAN_F_REMCSUM_TX to indicate that RCO should be used when
transmitting.


RX Checksum Offload
===================

RX checksum offload is controlled via NETIF_F_RXCSUM. When it is disabled, the
driver must not set skb->ip_summed on ingress packets. As mentioned, the IPv4
header checksum is not offloaded; the RXCSUM feature controls the offload of
verification of transport layer checksums.

Note that packets with bad TCP/UDP checksums must still be passed
to the stack. skb->ip_summed of such packets can be set to ``CHECKSUM_COMPLETE``
or left at ``CHECKSUM_NONE``. Drivers **must not discard** packets with
a bad TCP/UDP checksum and must not configure the device to drop them.
Checksum validation is relatively inexpensive and having bad packets reflected
in SNMP counters is crucial for network monitoring.
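
The receive-side rules can be summarised in a toy model.  This is a sketch
under stated assumptions rather than driver code: the model_* names are
hypothetical, the ``CHECKSUM_COMPLETE`` option is left out for brevity, and
reporting a verified packet as ``CHECKSUM_UNNECESSARY`` is taken from general
kernel convention rather than from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the skb->ip_summed states used in this model. */
enum model_ip_summed {
	MODEL_CHECKSUM_NONE,		/* stack verifies in software */
	MODEL_CHECKSUM_UNNECESSARY,	/* device verified the checksum */
};

struct model_rx_verdict {
	bool deliver;			/* pass the packet up the stack? */
	enum model_ip_summed ip_summed;
};

/*
 * rxcsum_enabled: whether the RXCSUM feature is currently enabled.
 * hw_csum_ok:     whether the device checked the transport checksum and
 *                 found it correct.
 */
static struct model_rx_verdict model_rx_csum(bool rxcsum_enabled,
					      bool hw_csum_ok)
{
	struct model_rx_verdict v = {
		.deliver = true,
		.ip_summed = MODEL_CHECKSUM_NONE,
	};

	/* Only report a device verdict while the feature is enabled. */
	if (rxcsum_enabled && hw_csum_ok)
		v.ip_summed = MODEL_CHECKSUM_UNNECESSARY;

	/* Invariant: a bad checksum never causes a drop here. */
	assert(v.deliver);
	return v;
}
```

A packet the device considers bad is still delivered with the "none" state, so
the stack validates it in software and accounts for the failure itself.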

skb checksum documentation
==========================

.. kernel-doc:: include/linux/skbuff.h
   :doc: skb checksums