.. SPDX-License-Identifier: GPL-2.0
.. _xfrm_device:

===============================================
XFRM device - offloading the IPsec computations
===============================================

Shannon Nelson <shannon.nelson@oracle.com>
Leon Romanovsky <leonro@nvidia.com>


Overview
========

IPsec is a useful feature for securing network traffic, but the
computational cost is high: a 10Gbps link can easily be brought down
to under 1Gbps, depending on the traffic and link configuration.
Luckily, there are NICs that offer a hardware based IPsec offload which
can radically increase throughput and decrease CPU utilization.  The XFRM
Device interface allows NIC drivers to offer to the stack access to the
hardware offload.

Right now, there are two types of hardware offload that the kernel supports:

 * IPsec crypto offload:

   * NIC performs encrypt/decrypt
   * Kernel does everything else

 * IPsec packet offload:

   * NIC performs encrypt/decrypt
   * NIC does encapsulation
   * Kernel and NIC have SA and policy in-sync
   * NIC handles the SA and policy states
   * The Kernel talks to the keymanager

Userland access to the offload is typically through a system such as
libreswan or KAME/racoon, but the iproute2 'ip xfrm' command set can
be handy when experimenting.
An example command might look something
like this for crypto offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
    reqid 0x07 replay-window 32 \
    aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
    sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
    offload dev eth4 dir in

and for packet offload::

  ip x s add proto esp dst 14.0.0.70 src 14.0.0.52 spi 0x07 mode transport \
    reqid 0x07 replay-window 32 \
    aead 'rfc4106(gcm(aes))' 0x44434241343332312423222114131211f4f3f2f1 128 \
    sel src 14.0.0.52/24 dst 14.0.0.70/24 proto tcp \
    offload packet dev eth4 dir in

  ip x p add src 14.0.0.70 dst 14.0.0.52 offload packet dev eth4 dir in \
    tmpl src 14.0.0.70 dst 14.0.0.52 proto esp reqid 10000 mode transport

Yes, that's ugly, but that's what shell scripts and/or libreswan are for.



Callbacks to implement
======================

::

  /* from include/linux/netdevice.h */
  struct xfrmdev_ops {
        /* Crypto and Packet offload callbacks */
        int     (*xdo_dev_state_add)(struct net_device *dev,
                                     struct xfrm_state *x,
                                     struct netlink_ext_ack *extack);
        void    (*xdo_dev_state_delete)(struct net_device *dev,
                                        struct xfrm_state *x);
        void    (*xdo_dev_state_free)(struct net_device *dev,
                                      struct xfrm_state *x);
        bool    (*xdo_dev_offload_ok)(struct sk_buff *skb,
                                      struct xfrm_state *x);
        void    (*xdo_dev_state_advance_esn)(struct xfrm_state *x);
        void    (*xdo_dev_state_update_stats)(struct xfrm_state *x);

        /* Solely packet offload callbacks */
        int     (*xdo_dev_policy_add)(struct xfrm_policy *x, struct netlink_ext_ack *extack);
        void    (*xdo_dev_policy_delete)(struct xfrm_policy *x);
        void    (*xdo_dev_policy_free)(struct xfrm_policy *x);
  };

The NIC driver offering IPsec offload will need to implement the callbacks
relevant to the offload types it supports to make the offload available to
the network stack's XFRM subsystem.  Additionally, the feature bits
NETIF_F_HW_ESP and NETIF_F_HW_ESP_TX_CSUM will signal the availability of
the offload.
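
As an illustration of implementing only the callbacks relevant to the
supported offload, the following userspace sketch models an ops table in
which a crypto-only driver leaves the policy callbacks NULL.  All ``ops_``
names are hypothetical stand-ins that merely mirror the shape of
struct xfrmdev_ops; they are not kernel definitions, and treating a NULL
policy callback as "packet offload unavailable" is an assumption of this
sketch rather than a statement about the XFRM core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct ops_state { int id; };
struct ops_policy { int id; };

/* Mirrors the grouping of struct xfrmdev_ops, heavily simplified. */
struct ops_table {
	/* crypto and packet offload */
	int (*state_add)(struct ops_state *x);
	void (*state_delete)(struct ops_state *x);
	void (*state_free)(struct ops_state *x);
	/* packet offload only */
	int (*policy_add)(struct ops_policy *p);
	void (*policy_delete)(struct ops_policy *p);
	void (*policy_free)(struct ops_policy *p);
};

/* Trivial callbacks so that the table below has something to point at. */
int ops_state_add(struct ops_state *x) { (void)x; return 0; }
void ops_state_delete(struct ops_state *x) { (void)x; }
void ops_state_free(struct ops_state *x) { (void)x; }

/* A crypto-only driver: the policy callbacks are left NULL. */
const struct ops_table ops_crypto_only = {
	.state_add = ops_state_add,
	.state_delete = ops_state_delete,
	.state_free = ops_state_free,
};

/* Assumption: the state callbacks are the minimum for crypto offload. */
bool ops_has_crypto_offload(const struct ops_table *ops)
{
	assert(ops != NULL);
	return ops->state_add && ops->state_delete && ops->state_free;
}

/* Assumption: packet offload additionally needs the policy callbacks. */
bool ops_has_packet_offload(const struct ops_table *ops)
{
	return ops_has_crypto_offload(ops) &&
	       ops->policy_add && ops->policy_delete && ops->policy_free;
}
```

With this table, ops_has_crypto_offload() reports true and
ops_has_packet_offload() reports false, which matches a driver that only
implements the "Crypto and Packet offload callbacks" group.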



Flow
====

At probe time and before the call to register_netdev(), the driver should
set up local data structures and XFRM callbacks, and set the feature bits.
The XFRM code's listener will finish the setup on NETDEV_REGISTER.

::

  adapter->netdev->xfrmdev_ops = &ixgbe_xfrmdev_ops;
  adapter->netdev->features |= NETIF_F_HW_ESP;
  adapter->netdev->hw_enc_features |= NETIF_F_HW_ESP;

When new SAs are set up with a request for the "offload" feature, the
driver's xdo_dev_state_add() will be given the new SA to be offloaded
and an indication of whether it is for Rx or Tx.
The driver should

 - verify the algorithm is supported for offloads
 - store the SA information (key, salt, target-ip, protocol, etc)
 - enable the HW offload of the SA
 - return status value:

   ===========  ===================================
   0            success
   -EOPNOTSUPP  offload not supported, try SW IPsec,
                not applicable for packet offload mode
   other        fail the request
   ===========  ===================================

The driver can also set an offload_handle in the SA, an opaque void pointer
that can be used to convey context into the fast-path offload requests::

  xs->xso.offload_handle = context;


When the network stack is preparing an IPsec packet for an SA that has
been set up for offload, it first calls into xdo_dev_offload_ok() with
the skb and the intended offload state to ask the driver if the offload
will be serviceable.  The driver can check the packet information to be
sure the offload can be supported (e.g. IPv4 or IPv6, no IPv4 options, etc)
and return true or false to signify its support.  If the driver does not
implement this callback, the stack provides reasonable defaults.
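
The xdo_dev_state_add() steps and return values described above can be
modelled in plain userspace C.  This is only a sketch under stated
assumptions: the ``sa_`` types, the two-slot table and the choice of -ENOSPC
for "other" failures are hypothetical, and the model accepts a single
algorithm name purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define SA_TABLE_SIZE 2	/* hypothetical number of HW SA slots */

/* Hypothetical, simplified view of an SA; not struct xfrm_state. */
struct sa_demo {
	const char *aead_name;	/* algorithm requested for the SA */
	unsigned int spi;
	void *offload_handle;	/* opaque driver context, set on success */
};

/* Hypothetical model of the NIC's SA table. */
struct sa_hw {
	const struct sa_demo *slot[SA_TABLE_SIZE];
};

/* Follows the listed steps: verify, store, enable, return status. */
int sa_demo_state_add(struct sa_hw *hw, struct sa_demo *sa)
{
	size_t i;

	assert(hw && sa);

	/* Verify the algorithm; this model only offloads AES-GCM. */
	if (!sa->aead_name || strcmp(sa->aead_name, "rfc4106(gcm(aes))") != 0)
		return -EOPNOTSUPP;	/* caller may fall back to SW IPsec */

	/* Store the SA in a free slot, which stands in for enabling it. */
	for (i = 0; i < SA_TABLE_SIZE; i++) {
		if (!hw->slot[i]) {
			hw->slot[i] = sa;
			/* Context for the fast path, as with offload_handle. */
			sa->offload_handle = &hw->slot[i];
			return 0;
		}
	}

	/* Any other error fails the request outright. */
	return -ENOSPC;
}
```

An unsupported algorithm yields -EOPNOTSUPP without touching the table,
while a full table yields a different error, so a caller can tell "try
software" apart from "fail the request".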

Crypto offload mode:
When ready to send, the driver needs to inspect the Tx packet for the
offload information, including the opaque context, and set up the packet
send accordingly::

  xs = xfrm_input_state(skb);
  context = xs->xso.offload_handle;
  set up HW for send

The stack has already inserted the appropriate IPsec headers in the
packet data; the offload just needs to do the encryption and fix up the
header values.


When a packet is received and the HW has indicated that it offloaded a
decryption, the driver needs to add a reference to the decoded SA into
the packet's skb.  At this point the data should be decrypted but the
IPsec headers are still in the packet data; they are removed later up
the stack in xfrm_input().

1. Find and hold the SA that was used for the Rx skb::

     /* get spi, protocol, and destination IP from packet headers */
     xs = find xs from (spi, protocol, dest_IP)
     xfrm_state_hold(xs);

2. Store the state information into the skb::

     sp = secpath_set(skb);
     if (!sp) return;
     sp->xvec[sp->len++] = xs;
     sp->olen++;

3. Indicate the success and/or error status of the offload::

     xo = xfrm_offload(skb);
     xo->flags = CRYPTO_DONE;
     xo->status = crypto_status;

4. Hand the packet to napi_gro_receive() as usual.

In ESN mode, xdo_dev_state_advance_esn() is called from
xfrm_replay_advance_esn() for RX, and xfrm_replay_overflow_offload_esn()
for TX.  The driver checks the packet sequence number and updates the HW
ESN state machine if needed.

Packet offload mode:
The HW adds and removes the XFRM headers.  In the RX path, the XFRM stack
is therefore bypassed if the HW reported success.  In the TX path, the
packet leaves the kernel unencrypted and without the extra header; the HW
is responsible for adding the header and encrypting the packet.

When the SA is removed by the user, the driver's xdo_dev_state_delete()
and xdo_dev_policy_delete() are asked to disable the offload.  Later,
xdo_dev_state_free() and xdo_dev_policy_free() are called from a garbage
collection routine after all reference counts to the state and policy
have been removed and any remaining resources can be cleared for the
offload state.  How these are used by the driver will depend on specific
hardware needs.

When a netdev is set to DOWN, the XFRM stack's netdev listener calls
xdo_dev_state_delete(), xdo_dev_policy_delete(), xdo_dev_state_free() and
xdo_dev_policy_free() on any remaining offloaded states.
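
The two-phase teardown described above (delete first, free once the last
reference is gone) can be sketched as a small userspace model.  The ``life_``
names and fields are hypothetical.  As a simplification, the last put calls
the free step directly, whereas the text above says the kernel does this
from a garbage collection routine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an offloaded SA; not struct xfrm_state. */
struct life_state {
	int refcnt;		/* outstanding references */
	bool hw_enabled;	/* offload currently active in the NIC */
	bool resources_held;	/* driver memory or HW slot still allocated */
};

/* Models xdo_dev_state_delete(): stop offloading, keep the bookkeeping. */
void life_state_delete(struct life_state *s)
{
	s->hw_enabled = false;
}

/* Models xdo_dev_state_free(): release whatever is still allocated. */
void life_state_free(struct life_state *s)
{
	/* Free is only valid after delete and after the last reference. */
	assert(s->refcnt == 0 && !s->hw_enabled);
	s->resources_held = false;
}

/* Drop one reference; the last put triggers the free step. */
void life_state_put(struct life_state *s)
{
	assert(s->refcnt > 0);
	if (--s->refcnt == 0)
		life_state_free(s);
}
```

After life_state_delete() the offload is disabled but the resources stay
allocated, so packets still holding a reference remain safe to process; the
resources only go away with the final life_state_put().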

Because the HW handles the packets, the XFRM core cannot count hard and
soft limits.  The HW/driver is responsible for doing so and for providing
accurate data when xdo_dev_state_update_stats() is called.  If one of these
limits is reached, the driver needs to call xfrm_state_check_expire() to
make sure that XFRM performs the rekeying sequence.
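
A driver-side limit check of the kind described above might look like the
following userspace sketch.  The ``lim_`` names, the packet-count-only
lifetime and the convention that 0 means "no limit" are assumptions made
for illustration.  In a real driver, a result other than LIM_OK would be
the point at which xfrm_state_check_expire() is called.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packet-count lifetime limits; 0 means "no limit". */
struct lim_lifetime {
	uint64_t soft_packet_limit;
	uint64_t hard_packet_limit;
};

enum lim_result { LIM_OK, LIM_SOFT_REACHED, LIM_HARD_REACHED };

/* Compare the HW packet counter against the configured limits. */
enum lim_result lim_check(uint64_t hw_packets, const struct lim_lifetime *lt)
{
	assert(lt);

	/* The hard limit takes precedence when both have been reached. */
	if (lt->hard_packet_limit && hw_packets >= lt->hard_packet_limit)
		return LIM_HARD_REACHED;
	if (lt->soft_packet_limit && hw_packets >= lt->soft_packet_limit)
		return LIM_SOFT_REACHED;
	return LIM_OK;
}
```

Checking the hard limit first means a counter that jumps past both
thresholds between two polls is still reported as a hard expiry.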