.. SPDX-License-Identifier: GPL-2.0
.. _xfrm_device:

===============================================
XFRM device - offloading the IPsec computations
===============================================

Shannon Nelson <shannon.nelson@oracle.com>
Leon Romanovsky <leonro@nvidia.com>


Overview
========

IPsec is a useful feature for securing network traffic, but the
computational cost is high: a 10Gbps link can easily be brought down
to under 1Gbps, depending on the traffic and link configuration.
Luckily, there are NICs that offer a hardware-based IPsec offload which
can radically increase throughput and decrease CPU utilization.  The XFRM
Device interface allows NIC drivers to give the stack access to the
hardware offload.

Right now, there are two types of hardware offload that the kernel supports:

 * IPsec crypto offload:

   * NIC performs encrypt/decrypt
   * Kernel does everything else

 * IPsec packet offload:

   * NIC performs encrypt/decrypt
   * NIC does encapsulation
   * Kernel and NIC have SA and policy in-sync
   * NIC handles the SA and policy states
   * The kernel talks to the key manager

Userland access to the offload is typically through a system such as
libreswan or KAME/racoon, but the iproute2 'ip xfrm' command set can
be handy when experimenting.  An example command might look something
like this for crypto offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
     aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload dev eth4 dir in

and for packet offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
     reqid 0x07 replay-window 32 \
     aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
     sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
     offload packet dev eth4 dir in

  ip x p add src 14.0.0.70 dst 14.0.0.52 offload packet dev eth4 dir in \
     tmpl src 14.0.0.70 dst 14.0.0.52 proto esp reqid 10000 mode transport

Yes, that's ugly, but that's what shell scripts and/or libreswan are for.



Callbacks to implement
======================

::

  /* from include/linux/netdevice.h */
  struct xfrmdev_ops {
        /* Crypto and Packet offload callbacks */
        int     (*xdo_dev_state_add)(struct net_device *dev,
                                     struct xfrm_state *x,
                                     struct netlink_ext_ack *extack);
        void    (*xdo_dev_state_delete)(struct net_device *dev,
                                        struct xfrm_state *x);
        void    (*xdo_dev_state_free)(struct net_device *dev,
                                      struct xfrm_state *x);
        bool    (*xdo_dev_offload_ok) (struct sk_buff *skb,
                                       struct xfrm_state *x);
        void    (*xdo_dev_state_advance_esn) (struct xfrm_state *x);
        void    (*xdo_dev_state_update_stats) (struct xfrm_state *x);

        /* Solely packet offload callbacks */
        int     (*xdo_dev_policy_add) (struct xfrm_policy *x, struct netlink_ext_ack *extack);
        void    (*xdo_dev_policy_delete) (struct xfrm_policy *x);
        void    (*xdo_dev_policy_free) (struct xfrm_policy *x);
  };

The NIC driver offering IPsec offload will need to implement the callbacks
relevant to the supported offload type to make the offload available to the
network stack's XFRM subsystem. Additionally, the feature bits NETIF_F_HW_ESP
and NETIF_F_HW_ESP_TX_CSUM will signal the availability of the offload.



Flow
====

At probe time and before the call to register_netdev(), the driver should
set up local data structures and XFRM callbacks, and set the feature bits.
The XFRM code's listener will finish the setup on NETDEV_REGISTER.

::

        adapter->netdev->xfrmdev_ops = &ixgbe_xfrmdev_ops;
        adapter->netdev->features |= NETIF_F_HW_ESP;
        adapter->netdev->hw_enc_features |= NETIF_F_HW_ESP;

When new SAs are set up with a request for the "offload" feature, the
driver's xdo_dev_state_add() will be given the new SA to be offloaded
and an indication of whether it is for Rx or Tx.  The driver should

        - verify the algorithm is supported for offloads
        - store the SA information (key, salt, target-ip, protocol, etc)
        - enable the HW offload of the SA
        - return status value:

                ===========   ======================================
                0             success
                -EOPNOTSUPP   offload not supported, try SW IPsec,
                              not applicable for packet offload mode
                other         fail the request
                ===========   ======================================

The driver can also set an offload_handle in the SA, an opaque handle
(stored as an unsigned long, so it can hold an index or a cast pointer)
that can be used to convey context into the fast-path offload requests::

        xs->xso.offload_handle = context;


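The checks and return codes described above can be modelled in plain
userspace C. The following is only a sketch under stated assumptions: every
``mock_*`` type, the two-slot "hardware" table, and the choice of ``-EINVAL``
and ``-ENOSPC`` as the "other" failures are hypothetical stand-ins, not kernel
definitions and not what any particular driver does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for kernel objects; not the real struct xfrm_state. */
enum mock_offload_type { MOCK_OFFLOAD_CRYPTO, MOCK_OFFLOAD_PACKET };

struct mock_sa {
    const char *aead_name;          /* e.g. "rfc4106(gcm(aes))" */
    enum mock_offload_type type;
    unsigned long offload_handle;   /* models xs->xso.offload_handle */
};

#define MOCK_HW_SLOTS 2             /* assumed size of the HW SA table */

struct mock_hw {
    const struct mock_sa *slot[MOCK_HW_SLOTS];  /* NULL means free */
};

/* Models the decision flow of an xdo_dev_state_add() implementation. */
static int mock_state_add(struct mock_hw *hw, struct mock_sa *sa)
{
    size_t i;

    assert(hw != NULL && sa != NULL && sa->aead_name != NULL);

    /* Verify the algorithm is supported for offloads. */
    if (strcmp(sa->aead_name, "rfc4106(gcm(aes))") != 0) {
        /*
         * Crypto offload can fall back to SW IPsec (-EOPNOTSUPP).
         * That fallback does not apply to packet offload, so this
         * model fails the request with an assumed -EINVAL instead.
         */
        return sa->type == MOCK_OFFLOAD_CRYPTO ? -EOPNOTSUPP : -EINVAL;
    }

    /* Store the SA and "enable" the offload by claiming a free slot. */
    for (i = 0; i < MOCK_HW_SLOTS; i++) {
        if (hw->slot[i] == NULL) {
            hw->slot[i] = sa;
            sa->offload_handle = i + 1;  /* 0 is kept as "no handle" */
            return 0;
        }
    }

    return -ENOSPC;  /* "other": fail the request */
}
```

In this model the handle is simply the slot index plus one, which shows why an
opaque handle is convenient: the fast path can find the hardware entry again
without repeating the lookup.
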
When the network stack is preparing an IPsec packet for an SA that has
been set up for offload, it first calls into xdo_dev_offload_ok() with
the skb and the intended offload state to ask the driver if the offload
will be serviceable.  This can check the packet information to be sure the
offload can be supported (e.g. IPv4 or IPv6, no IPv4 options, etc) and
return true or false to signify its support. If the driver doesn't implement
this callback, the stack provides reasonable defaults.

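As an illustration of the kind of per-packet check meant here, the sketch
below inspects raw network-layer bytes in userspace and accepts IPv4 without
options or IPv6. The function name ``mock_offload_ok()`` and its byte-buffer
interface are assumptions made for this example; a real callback receives an
skb and an xfrm_state, and its actual criteria depend on the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace model of an xdo_dev_offload_ok()-style decision.
 * l3 points at the first byte of the IP header, len is the number
 * of bytes available.
 */
static bool mock_offload_ok(const uint8_t *l3, size_t len)
{
    uint8_t version;

    assert(l3 != NULL || len == 0);
    if (len == 0)
        return false;

    version = l3[0] >> 4;

    /* IPv4: IHL counts 32-bit words, 5 means 20 bytes and no options. */
    if (version == 4)
        return len >= 20 && (l3[0] & 0x0f) == 5;

    /* IPv6: only require a complete fixed header in this model. */
    if (version == 6)
        return len >= 40;

    return false;
}
```

A packet whose first byte is ``0x45`` (IPv4, IHL 5) is accepted, while ``0x46``
(IPv4 with one word of options) is rejected and would be handled in software.
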
Crypto offload mode:
When ready to send, the driver needs to inspect the Tx packet for the
offload information, including the opaque context, and set up the packet
send accordingly::

        xs = xfrm_input_state(skb);
        context = xs->xso.offload_handle;
        /* set up HW for send */

The stack has already inserted the appropriate IPsec headers in the
packet data; the offload just needs to do the encryption and fix up the
header values.


When a packet is received and the HW has indicated that it offloaded a
decryption, the driver needs to add a reference to the decoded SA into
the packet's skb.  At this point the data should be decrypted but the
IPsec headers are still in the packet data; they are removed later up
the stack in xfrm_input().

1. Find and hold the SA that was used for the Rx skb::

        /* get spi, protocol, and destination IP from packet headers */
        xs = find xs from (spi, protocol, dest_IP)
        xfrm_state_hold(xs);

2. Store the state information into the skb::

        sp = secpath_set(skb);
        if (!sp)
                return;
        sp->xvec[sp->len++] = xs;
        sp->olen++;

3. Indicate the success and/or error status of the offload::

        xo = xfrm_offload(skb);
        xo->flags = CRYPTO_DONE;
        xo->status = crypto_status;

4. Hand the packet to napi_gro_receive() as usual.

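The bookkeeping in the Rx steps above can be exercised outside the kernel with
a small model. This is a sketch only: the ``mock_*`` structures, the depth
limit of 6, and the flag value are assumptions chosen for illustration, and
the function adds a bounds check that the snippets above leave implicit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_DEPTH   6   /* assumed bound on SAs recorded per packet */
#define MOCK_CRYPTO_DONE 1u  /* stand-in for the CRYPTO_DONE flag */

struct mock_sa {
    uint32_t spi;
    int refcnt;
};

struct mock_sec_path {
    struct mock_sa *xvec[MOCK_MAX_DEPTH];
    int len;    /* number of SAs recorded */
    int olen;   /* number of those handled by the offload */
};

struct mock_skb {
    struct mock_sec_path sp;
    unsigned int flags;
    int status;
};

/* Models steps 1-3: hold the SA, record it in the skb, report the status. */
static bool mock_rx_attach_sa(struct mock_skb *skb, struct mock_sa *sa,
                              int crypto_status)
{
    assert(skb != NULL && sa != NULL);

    if (skb->sp.len >= MOCK_MAX_DEPTH)
        return false;               /* no room, leave the skb untouched */

    sa->refcnt++;                   /* models xfrm_state_hold() */
    skb->sp.xvec[skb->sp.len++] = sa;
    skb->sp.olen++;

    skb->flags = MOCK_CRYPTO_DONE;
    skb->status = crypto_status;
    return true;
}
```

The reference is taken before the SA pointer is stored, so the SA stays valid
for as long as the packet carries a pointer to it.
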
In ESN mode, xdo_dev_state_advance_esn() is called from
xfrm_replay_advance_esn() for RX, and xfrm_replay_overflow_offload_esn() for TX.
The driver checks the packet sequence number and updates the HW ESN state
machine if needed.

Packet offload mode:
HW adds and deletes XFRM headers. So in the RX path, the XFRM stack is bypassed
if the HW reported success. In the TX path, the packet leaves the kernel without
the extra header and unencrypted; the HW is responsible for adding the header
and encrypting the packet.

When the SA is removed by the user, the driver's xdo_dev_state_delete()
and xdo_dev_policy_delete() are asked to disable the offload.  Later,
xdo_dev_state_free() and xdo_dev_policy_free() are called from a garbage
collection routine after all reference counts to the state and policy
have been removed and any remaining resources can be cleared for the
offload state.  How these are used by the driver will depend on specific
hardware needs.

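The two-stage teardown (disable first, release resources once the last
reference is gone) can be sketched with a simple reference-counted object. All
names below are hypothetical, and ``hw_ctx`` stands for whatever private data
a driver might keep; the real callbacks take kernel structures and are invoked
by the XFRM core rather than by a ``put`` helper like this one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct mock_sa {
    int refcnt;
    bool hw_enabled;    /* is the offload active in "hardware"? */
    void *hw_ctx;       /* hypothetical driver-private context */
};

/* Models xdo_dev_state_delete(): stop offloading, keep resources. */
static void mock_state_delete(struct mock_sa *sa)
{
    assert(sa != NULL);
    sa->hw_enabled = false;
}

/* Models xdo_dev_state_free(): release what is left. */
static void mock_state_free(struct mock_sa *sa)
{
    assert(sa != NULL && sa->refcnt == 0);
    free(sa->hw_ctx);
    sa->hw_ctx = NULL;
}

/* Drops one reference; frees only when it was the last one. */
static bool mock_sa_put(struct mock_sa *sa)
{
    assert(sa != NULL && sa->refcnt > 0);
    if (--sa->refcnt > 0)
        return false;
    mock_state_free(sa);
    return true;
}
```

Keeping the context alive between delete and free matters because packets that
still hold a reference to the SA may be in flight when the user removes it.
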
When a netdev is set to DOWN, the XFRM stack's netdev listener will call
xdo_dev_state_delete(), xdo_dev_policy_delete(), xdo_dev_state_free() and
xdo_dev_policy_free() on any remaining offloaded states.

Because the HW handles the packets, the XFRM core can't count hard and soft
limits. The HW/driver are responsible for counting them and for providing
accurate data when xdo_dev_state_update_stats() is called. If one of these
limits is reached, the driver needs to call xfrm_state_check_expire() to make
sure that XFRM performs the rekeying sequence.
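
The division of labour for lifetime limits can be illustrated with a small
model in which the "hardware" packet counter is copied into the SA and then
compared against the limits. This is a sketch under assumptions: the
``mock_*`` names and the packet-only limits are invented for the example, and
``mock_check_expire()`` only imitates the idea behind
xfrm_state_check_expire(), not its real signature or side effects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum mock_expire { MOCK_EXPIRE_NONE, MOCK_EXPIRE_SOFT, MOCK_EXPIRE_HARD };

struct mock_sa {
    uint64_t soft_packet_limit;
    uint64_t hard_packet_limit;
    uint64_t packets;   /* last value reported by the "hardware" */
};

/* Models xdo_dev_state_update_stats(): publish the HW counter to the SA. */
static void mock_update_stats(struct mock_sa *sa, uint64_t hw_packets)
{
    assert(sa != NULL);
    sa->packets = hw_packets;
}

/* Reports which limit, if any, the published counter has reached. */
static enum mock_expire mock_check_expire(const struct mock_sa *sa)
{
    assert(sa != NULL);
    if (sa->packets >= sa->hard_packet_limit)
        return MOCK_EXPIRE_HARD;
    if (sa->packets >= sa->soft_packet_limit)
        return MOCK_EXPIRE_SOFT;
    return MOCK_EXPIRE_NONE;
}
```

With a soft limit of 100 and a hard limit of 200 packets, a counter of 150
reports the soft expiry, which is the point where rekeying would be started
while traffic still flows on the old SA.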