.. _kernel_tls:

==========
Kernel TLS
==========

Overview
========

Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over
TCP. TLS provides end-to-end data integrity and confidentiality.

User interface
==============

Creating a TLS connection
-------------------------

First create a new TCP socket and, once the connection is established, set the
TLS ULP.

.. code-block:: c

  sock = socket(AF_INET, SOCK_STREAM, 0);
  connect(sock, addr, addrlen);
  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

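The same step with error handling might look like the sketch below. The
helper name ``ktls_attach_ulp`` is hypothetical, ``IPPROTO_TCP`` is used as
an equivalent of ``SOL_TCP``, and the fallback value for ``TCP_ULP`` is an
assumption for C libraries whose headers predate the option.

```c
#include <errno.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_ULP
#define TCP_ULP 31      /* assumed fallback for older libc headers */
#endif

/*
 * Hypothetical helper: attach the "tls" ULP to a connected TCP socket.
 * Returns 0 on success or a negative errno value on failure.
 */
static int ktls_attach_ulp(int sock)
{
        if (setsockopt(sock, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
                return -errno;
        return 0;
}
```

A caller that gets an error here can keep using its userspace record layer
on the same socket.
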
Setting the TLS ULP allows us to set/get TLS socket options. Currently
only the symmetric encryption is handled in the kernel.  After the TLS
handshake is complete, we have all the parameters required to move the
data-path to the kernel. Separate socket options move the transmit and
the receive paths into the kernel.

.. code-block:: c

  /* From linux/tls.h */
  struct tls_crypto_info {
          unsigned short version;
          unsigned short cipher_type;
  };

  struct tls12_crypto_info_aes_gcm_128 {
          struct tls_crypto_info info;
          unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
          unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
          unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
          unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
  };


  struct tls12_crypto_info_aes_gcm_128 crypto_info;

  crypto_info.info.version = TLS_1_2_VERSION;
  crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
  memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE);
  memcpy(crypto_info.rec_seq, seq_number_write,
         TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
  memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
  memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE);

  setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));

Transmit and receive are set separately, but the setup is the same, using either
TLS_TX or TLS_RX.

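Because both directions use the same structure, one helper can serve both.
The following is a minimal sketch, not a definitive implementation:
``struct ktls_keys`` and both function names are hypothetical, and the
fallback value for ``SOL_TLS`` is an assumption for C libraries that do not
define it.

```c
#include <string.h>
#include <sys/socket.h>
#include <linux/tls.h>

#ifndef SOL_TLS
#define SOL_TLS 282     /* assumed fallback for older libc headers */
#endif

/* Hypothetical container for the key material of one direction. */
struct ktls_keys {
        unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
        unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
        unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
        unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
};

/* Fill a TLS 1.2 AES-GCM-128 crypto_info from the given key material. */
static void ktls_fill_aes_gcm_128(struct tls12_crypto_info_aes_gcm_128 *ci,
                                  const struct ktls_keys *k)
{
        memset(ci, 0, sizeof(*ci));
        ci->info.version = TLS_1_2_VERSION;
        ci->info.cipher_type = TLS_CIPHER_AES_GCM_128;
        memcpy(ci->key, k->key, sizeof(ci->key));
        memcpy(ci->iv, k->iv, sizeof(ci->iv));
        memcpy(ci->salt, k->salt, sizeof(ci->salt));
        memcpy(ci->rec_seq, k->rec_seq, sizeof(ci->rec_seq));
}

/* Install keys for one direction; direction is TLS_TX or TLS_RX.
 * Returns the setsockopt() result: 0 on success, -1 with errno set. */
static int ktls_install_aes_gcm_128(int sock, int direction,
                                    const struct ktls_keys *k)
{
        struct tls12_crypto_info_aes_gcm_128 ci;

        ktls_fill_aes_gcm_128(&ci, k);
        return setsockopt(sock, SOL_TLS, direction, &ci, sizeof(ci));
}
```

A caller would invoke ``ktls_install_aes_gcm_128()`` once with ``TLS_TX`` and
the write keys, and once with ``TLS_RX`` and the read keys.
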
Sending TLS application data
----------------------------

After setting the TLS_TX socket option all application data sent over this
socket is encrypted using TLS and the parameters provided in the socket option.
For example, we can send an encrypted hello world record as follows:

.. code-block:: c

  const char *msg = "hello world\n";
  send(sock, msg, strlen(msg), 0);

Where possible, send() data is encrypted directly from the provided userspace
buffer into the kernel send buffer.

The sendfile system call will send the file's data over TLS records of maximum
length (2^14).

.. code-block:: c

  file = open(filename, O_RDONLY);
  fstat(file, &stat);
  sendfile(sock, file, &offset, stat.st_size);

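As a rough illustration of the record size limit, the sketch below estimates
how many records a payload of a given length needs, assuming every record
except possibly the last is filled to the 2^14 byte maximum. The macro and
function names are hypothetical.

```c
#include <stddef.h>

#define TLS_MAX_RECORD_PAYLOAD 16384u   /* 2^14 bytes of plaintext */

/* Number of maximum-length records needed to carry len bytes. */
static size_t tls_records_needed(size_t len)
{
        return len / TLS_MAX_RECORD_PAYLOAD +
               (len % TLS_MAX_RECORD_PAYLOAD != 0);
}
```

Under that assumption a 1 MiB file is carried in 64 records.
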
TLS records are created and sent after each send() call, unless
MSG_MORE is passed.  MSG_MORE will delay creation of a record until
MSG_MORE is not passed, or the maximum record size is reached.

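The sketch below shows one way an application might use MSG_MORE to place a
header and a body in the same record. The function name is hypothetical, and
handling of short writes is omitted for brevity.

```c
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: send hdr and body back to back so that, on a socket
 * with TLS_TX set, both can end up in one record.
 * Returns the total number of bytes accepted, or -1 on error.
 */
static ssize_t send_coalesced(int sock, const void *hdr, size_t hdr_len,
                              const void *body, size_t body_len)
{
        ssize_t n1, n2;

        /* MSG_MORE: more data follows, so record creation is delayed. */
        n1 = send(sock, hdr, hdr_len, MSG_MORE);
        if (n1 < 0)
                return -1;

        /* No MSG_MORE: the pending data is turned into a record and sent. */
        n2 = send(sock, body, body_len, 0);
        if (n2 < 0)
                return -1;

        return n1 + n2;
}
```

If the combined length exceeds the maximum record size, the data is still
split across several records.
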
The kernel will need to allocate a buffer for the encrypted data.
This buffer is allocated at the time send() is called, such that
either the entire send() call will return -ENOMEM (or block waiting
for memory), or the encryption will always succeed.  If send() returns
-ENOMEM and some data was left on the socket buffer from a previous
call using MSG_MORE, the MSG_MORE data is left on the socket buffer.

Receiving TLS application data
------------------------------

After setting the TLS_RX socket option, data returned by all recv family
socket calls is decrypted using the TLS parameters provided.  A full TLS
record must be received before decryption can happen.

.. code-block:: c

  char buffer[16384];
  recv(sock, buffer, 16384, 0);

Received data is decrypted directly into the user buffer if it is
large enough, and no additional allocations occur.  If the userspace
buffer is too small, data is decrypted in the kernel and copied to
userspace.

``EINVAL`` is returned if the TLS version in the received message does not
match the version passed in setsockopt.

``EMSGSIZE`` is returned if the received message is too big.

``EBADMSG`` is returned if decryption failed for any other reason.

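An application may want to turn these error codes into something it can log
or act on. The sketch below is one possible mapping; the enum and function
names are hypothetical, and ``EINVAL`` can also have causes unrelated to TLS,
such as invalid arguments.

```c
#include <errno.h>

/* Hypothetical classification of errno values seen after a failed recv(). */
enum ktls_rx_fault {
        KTLS_RX_VERSION_MISMATCH,       /* EINVAL: record version differs */
        KTLS_RX_RECORD_TOO_BIG,         /* EMSGSIZE: record is too big */
        KTLS_RX_DECRYPT_FAILED,         /* EBADMSG: any other decrypt failure */
        KTLS_RX_OTHER                   /* not one of the TLS-specific codes */
};

static enum ktls_rx_fault ktls_classify_recv_errno(int err)
{
        switch (err) {
        case EINVAL:
                return KTLS_RX_VERSION_MISMATCH;
        case EMSGSIZE:
                return KTLS_RX_RECORD_TOO_BIG;
        case EBADMSG:
                return KTLS_RX_DECRYPT_FAILED;
        default:
                return KTLS_RX_OTHER;
        }
}
```
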
Sending TLS control messages
----------------------------

Other than application data, TLS has control messages such as alert
messages (record type 21) and handshake messages (record type 22), etc.
These messages can be sent over the socket by providing the TLS record type
via a CMSG. For example, the following function sends @data of @length bytes
using a record of type @record_type.

.. code-block:: c

  /* send TLS control message using record_type */
  static int klts_send_ctrl_message(int sock, unsigned char record_type,
                                    void *data, size_t length)
  {
        struct msghdr msg = {0};
        int cmsg_len = sizeof(record_type);
        struct cmsghdr *cmsg;
        char buf[CMSG_SPACE(cmsg_len)];
        struct iovec msg_iov;   /* Vector of data to send/receive into.  */

        msg.msg_control = buf;
        msg.msg_controllen = sizeof(buf);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_TLS;
        cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
        cmsg->cmsg_len = CMSG_LEN(cmsg_len);
        *CMSG_DATA(cmsg) = record_type;
        msg.msg_controllen = cmsg->cmsg_len;

        msg_iov.iov_base = data;
        msg_iov.iov_len = length;
        msg.msg_iov = &msg_iov;
        msg.msg_iovlen = 1;

        return sendmsg(sock, &msg, 0);
  }

Control message data should be provided unencrypted, and will be
encrypted by the kernel.

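As an illustration of unencrypted control message data, the sketch below
builds the two byte body of a ``close_notify`` alert (alert level followed by
alert description, as defined by the TLS specification). The macro and
function names are hypothetical.

```c
#include <stddef.h>

#define TLS_RECORD_TYPE_ALERT   21      /* record type for alert messages */
#define TLS_ALERT_LEVEL_WARNING 1
#define TLS_ALERT_CLOSE_NOTIFY  0

/* Write a plaintext close_notify alert into out; returns its length. */
static size_t tls_build_close_notify(unsigned char out[2])
{
        out[0] = TLS_ALERT_LEVEL_WARNING;
        out[1] = TLS_ALERT_CLOSE_NOTIFY;
        return 2;
}
```

The resulting bytes would be passed as the data of a control message with
record type 21, leaving encryption to the kernel.
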
Receiving TLS control messages
------------------------------

TLS control messages are passed in the userspace buffer, with message
type passed via cmsg.  If no cmsg buffer is provided, an error is
returned if a control message is received.  Data messages may be
received without a cmsg buffer set.

.. code-block:: c

  char buffer[16384];
  char cmsg_buf[CMSG_SPACE(sizeof(unsigned char))];
  struct msghdr msg = {0};
  msg.msg_control = cmsg_buf;
  msg.msg_controllen = sizeof(cmsg_buf);

  struct iovec msg_iov;
  msg_iov.iov_base = buffer;
  msg_iov.iov_len = sizeof(buffer);

  msg.msg_iov = &msg_iov;
  msg.msg_iovlen = 1;

  int ret = recvmsg(sock, &msg, 0 /* flags */);

  /* CMSG_FIRSTHDR() returns NULL when no control data was received. */
  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  if (cmsg && cmsg->cmsg_level == SOL_TLS &&
      cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
      int record_type = *((unsigned char *)CMSG_DATA(cmsg));
      // Do something with record_type, and control message data in
      // buffer.
      //
      // Note that record_type may equal application data (23).
  } else {
      // Buffer contains application data.
  }

recv will never return data from mixed types of TLS records.

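Since each call returns data of a single record type, the type obtained from
the cmsg can drive a simple dispatch. The sketch below names the record types
mentioned in this document; the enum and function names are hypothetical, and
type 20 (change_cipher_spec) is taken from the TLS specification rather than
from this document.

```c
/* TLS record content types. */
enum tls_content_type {
        TLS_CT_CHANGE_CIPHER_SPEC = 20,
        TLS_CT_ALERT              = 21,
        TLS_CT_HANDSHAKE          = 22,
        TLS_CT_APPLICATION_DATA   = 23
};

/* Human-readable name for a record type, for logging. */
static const char *tls_record_type_name(unsigned char type)
{
        switch (type) {
        case TLS_CT_CHANGE_CIPHER_SPEC: return "change_cipher_spec";
        case TLS_CT_ALERT:              return "alert";
        case TLS_CT_HANDSHAKE:          return "handshake";
        case TLS_CT_APPLICATION_DATA:   return "application_data";
        default:                        return "unknown";
        }
}

/* Non-zero if the record must go to the TLS library, not the application. */
static int tls_record_is_control(unsigned char type)
{
        return type != TLS_CT_APPLICATION_DATA;
}
```
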
TLS 1.3 Key Updates
-------------------

In TLS 1.3, KeyUpdate handshake messages signal that the sender is
updating its TX key. Any message sent after a KeyUpdate will be
encrypted using the new key. The userspace library can pass the new
key to the kernel using the TLS_TX and TLS_RX socket options, as for
the initial keys. TLS version and cipher cannot be changed.

To prevent attempting to decrypt incoming records using the wrong key,
decryption will be paused when a KeyUpdate message is received by the
kernel, until the new key has been provided using the TLS_RX socket
option. Any read occurring after the KeyUpdate has been read and
before the new key is provided will fail with EKEYEXPIRED. poll() will
not report any read events from the socket until the new key is
provided. There is no pausing on the transmit side.

Userspace should make sure that the crypto_info provided has been set
properly. In particular, the kernel will not check for key/nonce
reuse.

The number of successful and failed key updates is tracked in the
``TlsTxRekeyOk``, ``TlsRxRekeyOk``, ``TlsTxRekeyError``,
``TlsRxRekeyError`` statistics. The ``TlsRxRekeyReceived`` statistic
counts KeyUpdate handshake messages that have been received.

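A read path that tolerates a pending key update could look like the sketch
below. It is only an illustration: ``ktls_rekey_fn`` and ``recv_with_rekey``
are hypothetical names, and the callback stands in for whatever the TLS
library does to derive the next RX key and install it with the TLS_RX socket
option.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical callback: install the next RX key; returns 0 on success. */
typedef int (*ktls_rekey_fn)(int sock, void *ctx);

/*
 * recv() wrapper that installs a new RX key when the kernel reports
 * EKEYEXPIRED, then retries. Any other result is returned unchanged.
 */
static ssize_t recv_with_rekey(int sock, void *buf, size_t len,
                               ktls_rekey_fn rekey, void *ctx)
{
        for (;;) {
                ssize_t n = recv(sock, buf, len, 0);

                if (n >= 0 || errno != EKEYEXPIRED)
                        return n;

                /* Decryption is paused until a new RX key is provided. */
                if (rekey(sock, ctx) != 0)
                        return -1;
        }
}
```

The callback remains responsible for providing a correct crypto_info, since
the kernel does not check for key/nonce reuse.
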
Integrating into a userspace TLS library
----------------------------------------

At a high level, the kernel TLS ULP is a replacement for the record
layer of a userspace TLS library.

A patchset to OpenSSL to use ktls as the record layer is
`here <https://github.com/Mellanox/openssl/commits/tls_rx2>`_.

`An example <https://github.com/ktls/af_ktls-tool/commits/RX>`_
of calling send directly after a handshake using gnutls.
Since it doesn't implement a full record layer, control
messages are not supported.

Optional optimizations
----------------------

There are certain condition-specific optimizations the TLS ULP can make,
if requested. Those optimizations are either not universally beneficial
or may impact correctness, hence they require an opt-in.
All options are set per-socket using setsockopt(), and their
state can be checked using getsockopt() and via socket diag (``ss``).

TLS_TX_ZEROCOPY_RO
~~~~~~~~~~~~~~~~~~

For device offload only. Allow sendfile() data to be transmitted directly
to the NIC without making an in-kernel copy. This allows true zero-copy
behavior when device offload is enabled.

The application must make sure that the data is not modified between being
submitted and transmission completing. In other words this is mostly
applicable if the data sent on a socket via sendfile() is read-only.

Modifying the data may result in different versions of the data being used
for the original TCP transmission and TCP retransmissions. To the receiver
this will look like TLS records had been tampered with and will result
in record authentication failures.

TLS_RX_EXPECT_NO_PAD
~~~~~~~~~~~~~~~~~~~~

TLS 1.3 only. Expect the sender to not pad records. This allows the data
to be decrypted directly into user space buffers with TLS 1.3.

This optimization is safe to enable only if the remote end is trusted,
otherwise it is an attack vector for doubling the TLS processing cost.

If the decrypted record turns out to have been padded or is not a data
record, it will be decrypted again into a kernel buffer without zero copy.
Such events are counted in the ``TlsDecryptRetry`` statistic.

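Opting in to either optimization is a single setsockopt() call on a socket
that already has the TLS ULP attached. The sketch below assumes the options
take an integer flag; the helper names are hypothetical, and the fallback
constants are assumptions for systems whose headers predate these options.

```c
#include <sys/socket.h>
#include <linux/tls.h>

#ifndef SOL_TLS
#define SOL_TLS 282                     /* assumed fallback value */
#endif
#ifndef TLS_TX_ZEROCOPY_RO
#define TLS_TX_ZEROCOPY_RO 3            /* assumed fallback value */
#endif
#ifndef TLS_RX_EXPECT_NO_PAD
#define TLS_RX_EXPECT_NO_PAD 4          /* assumed fallback value */
#endif

/* Enable or disable one optional optimization; 0 on success, -1 on error. */
static int ktls_set_opt(int sock, int optname, int enable)
{
        unsigned int val = enable ? 1 : 0;

        return setsockopt(sock, SOL_TLS, optname, &val, sizeof(val));
}

/* Query one option; returns 1 if set, 0 if clear, -1 on error. */
static int ktls_get_opt(int sock, int optname)
{
        unsigned int val = 0;
        socklen_t len = sizeof(val);

        if (getsockopt(sock, SOL_TLS, optname, &val, &len) < 0)
                return -1;
        return val != 0;
}
```

For example, ``ktls_set_opt(sock, TLS_RX_EXPECT_NO_PAD, 1)`` would request
the no-padding optimization, which is only appropriate for a trusted peer.
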
Statistics
==========

TLS implementation exposes the following per-namespace statistics
(``/proc/net/tls_stat``):

- ``TlsCurrTxSw``, ``TlsCurrRxSw`` -
  number of TX and RX sessions currently installed where host handles
  cryptography

- ``TlsCurrTxDevice``, ``TlsCurrRxDevice`` -
  number of TX and RX sessions currently installed where NIC handles
  cryptography

- ``TlsTxSw``, ``TlsRxSw`` -
  number of TX and RX sessions opened with host cryptography

- ``TlsTxDevice``, ``TlsRxDevice`` -
  number of TX and RX sessions opened with NIC cryptography

- ``TlsDecryptError`` -
  record decryption failed (e.g. due to incorrect authentication tag)

- ``TlsDeviceRxResync`` -
  number of RX resyncs sent to NICs handling cryptography

- ``TlsDecryptRetry`` -
  number of RX records which had to be re-decrypted due to
  ``TLS_RX_EXPECT_NO_PAD`` mis-prediction. Note that this counter will
  also increment for non-data records.

- ``TlsRxNoPadViolation`` -
  number of data RX records which had to be re-decrypted due to
  ``TLS_RX_EXPECT_NO_PAD`` mis-prediction.

- ``TlsTxRekeyOk``, ``TlsRxRekeyOk`` -
  number of successful rekeys on existing sessions for TX and RX

- ``TlsTxRekeyError``, ``TlsRxRekeyError`` -
  number of failed rekeys on existing sessions for TX and RX

- ``TlsRxRekeyReceived`` -
  number of received KeyUpdate handshake messages, requiring userspace
  to provide a new RX key

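The counters listed above can be read programmatically. The sketch below
assumes each line of ``/proc/net/tls_stat`` holds a counter name followed by
whitespace and a decimal value; the function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Parse one "Name <whitespace> value" line. Returns 0 on success. */
static int tls_stat_parse_line(const char *line, char name[64],
                               unsigned long *value)
{
        return sscanf(line, "%63s %lu", name, value) == 2 ? 0 : -1;
}

/* Find one counter by name in an open stream. Returns 0 if found. */
static int tls_stat_lookup(FILE *f, const char *want, unsigned long *value)
{
        char line[128], name[64];
        unsigned long v;

        while (fgets(line, sizeof(line), f)) {
                if (tls_stat_parse_line(line, name, &v) == 0 &&
                    strcmp(name, want) == 0) {
                        *value = v;
                        return 0;
                }
        }
        return -1;
}
```

A monitoring tool might open ``/proc/net/tls_stat`` with fopen() and call
``tls_stat_lookup(f, "TlsDecryptError", &count)`` to watch for decryption
failures.
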