.. Source: /linux/Documentation/networking/tls.rst (revision 95f68e06b41b9e88291796efa3969409d13fdd4c)
.. _kernel_tls:

==========
Kernel TLS
==========

Overview
========

Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over
TCP. TLS provides end-to-end data integrity and confidentiality.

User interface
==============

Creating a TLS connection
-------------------------

First create a new TCP socket and, once the connection is established, set
the TLS ULP.

.. code-block:: c

  sock = socket(AF_INET, SOCK_STREAM, 0);
  connect(sock, addr, addrlen);
  setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));

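The snippet above does not handle errors. The sketch below wraps the ULP step in a hypothetical helper, ``enable_tls_ulp()``, and assumes ``sock`` is an already-connected TCP socket. It passes ``IPPROTO_TCP``, which has the same value as ``SOL_TCP``, and supplies a fallback ``TCP_ULP`` definition for older libc headers.

.. code-block:: c

  #include <errno.h>
  #include <stdio.h>
  #include <string.h>
  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <sys/socket.h>

  #ifndef TCP_ULP
  #define TCP_ULP 31 /* value from linux/tcp.h, for older libc headers */
  #endif

  /* Hypothetical helper: attach the "tls" ULP to a connected TCP socket.
   * IPPROTO_TCP has the same value as SOL_TCP.
   * Returns 0 on success, or -1 with errno preserved on failure. */
  static int enable_tls_ulp(int sock)
  {
      if (setsockopt(sock, IPPROTO_TCP, TCP_ULP, "tls", sizeof("tls")) < 0) {
          int saved = errno;

          /* ENOENT typically means the "tls" ULP could not be found or loaded. */
          fprintf(stderr, "TCP_ULP tls: %s\n", strerror(saved));
          errno = saved;
          return -1;
      }
      return 0;
  }
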
Setting the TLS ULP allows us to set and get TLS socket options. Only the
symmetric encryption is handled in the kernel. The handshake stays in
userspace. After the TLS handshake is complete, we have all the parameters
required to move the data path to the kernel. There are separate socket
options for moving the transmit and the receive paths into the kernel.

.. code-block:: c

  /* From linux/tls.h */
  struct tls_crypto_info {
          unsigned short version;
          unsigned short cipher_type;
  };

  struct tls12_crypto_info_aes_gcm_128 {
          struct tls_crypto_info info;
          unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
          unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
          unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
          unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
  };

  struct tls12_crypto_info_aes_gcm_128 crypto_info;

  crypto_info.info.version = TLS_1_2_VERSION;
  crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
  memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE);
  memcpy(crypto_info.rec_seq, seq_number_write,
         TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
  memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
  memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE);

  setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));

Transmit and receive are set separately, but the setup is the same, using either
TLS_TX or TLS_RX.

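The TX and RX setup differ only in the option name, so a userspace library might use one helper that takes the direction as a parameter. The sketch below assumes TLS 1.2 with AES-GCM-128. It uses a hypothetical ``struct tls12_gcm128_keys`` to hold key material exported from the handshake. The fallback ``SOL_TLS`` value mirrors the kernel's definition and covers libc headers that lack it.

.. code-block:: c

  #include <string.h>
  #include <sys/socket.h>
  #include <linux/tls.h>

  #ifndef SOL_TLS
  #define SOL_TLS 282 /* from the kernel's linux/socket.h */
  #endif

  /* Hypothetical container for key material exported by a userspace
   * TLS library after the handshake; field names are illustrative. */
  struct tls12_gcm128_keys {
      unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
      unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE]; /* implicit IV */
      unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
      unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
  };

  /* Fill the kernel's crypto_info structure from exported key material. */
  static void fill_gcm128_info(struct tls12_crypto_info_aes_gcm_128 *ci,
                               const struct tls12_gcm128_keys *k)
  {
      memset(ci, 0, sizeof(*ci));
      ci->info.version = TLS_1_2_VERSION;
      ci->info.cipher_type = TLS_CIPHER_AES_GCM_128;
      memcpy(ci->key, k->key, sizeof(ci->key));
      memcpy(ci->salt, k->salt, sizeof(ci->salt));
      memcpy(ci->iv, k->iv, sizeof(ci->iv));
      memcpy(ci->rec_seq, k->rec_seq, sizeof(ci->rec_seq));
  }

  /* Install keys for one direction; dir is TLS_TX or TLS_RX.
   * Returns the setsockopt() result (0, or -1 with errno set). */
  static int install_gcm128(int sock, int dir, const struct tls12_gcm128_keys *k)
  {
      struct tls12_crypto_info_aes_gcm_128 ci;

      fill_gcm128_info(&ci, k);
      return setsockopt(sock, SOL_TLS, dir, &ci, sizeof(ci));
  }

A caller would invoke ``install_gcm128(sock, TLS_TX, &tx_keys)`` and ``install_gcm128(sock, TLS_RX, &rx_keys)`` with each direction's own key material.
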
Sending TLS application data
----------------------------

After the TLS_TX socket option is set, all application data sent over this
socket is encrypted using TLS and the parameters provided in the socket option.
For example, we can send an encrypted hello world record as follows:

.. code-block:: c

  const char *msg = "hello world\n";
  send(sock, msg, strlen(msg), 0);

Data passed to send() is encrypted directly from the provided userspace
buffer into the kernel's encrypted send buffer, if possible.

The sendfile() system call sends the file's data in TLS records of maximum
length (2^14 bytes).

.. code-block:: c

  off_t offset = 0;
  struct stat st;
  file = open(filename, O_RDONLY);
  fstat(file, &st);
  sendfile(sock, file, &offset, st.st_size);

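The snippet above ignores short transfers, because ``sendfile()`` may send fewer bytes than requested. Below is a minimal sketch of a complete loop. It uses a hypothetical ``send_file_all()`` helper that retries until the whole file has been handed to the socket.

.. code-block:: c

  #include <errno.h>
  #include <fcntl.h>
  #include <sys/sendfile.h>
  #include <sys/stat.h>
  #include <sys/types.h>
  #include <unistd.h>

  /* Hypothetical helper: send a whole file over a (kTLS) socket with
   * sendfile(), retrying on short writes and EINTR.
   * Returns the number of bytes sent, or -1 on error. */
  static long long send_file_all(int sock, const char *path)
  {
      struct stat st;
      off_t offset = 0;
      int fd = open(path, O_RDONLY);

      if (fd < 0)
          return -1;
      if (fstat(fd, &st) < 0) {
          close(fd);
          return -1;
      }
      while (offset < st.st_size) {
          /* sendfile() advances offset by the number of bytes sent. */
          ssize_t n = sendfile(sock, fd, &offset, st.st_size - offset);

          if (n < 0 && errno == EINTR)
              continue;
          if (n <= 0) { /* error, or the file shrank underneath us */
              close(fd);
              return -1;
          }
      }
      close(fd);
      return (long long)offset;
  }
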
TLS records are created and sent after each send() call, unless
MSG_MORE is passed. MSG_MORE delays creation of a record until a call
without MSG_MORE is made, or the maximum record size is reached.

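One way to use ``MSG_MORE`` is to let a small protocol header and its body share a TLS record, instead of producing one record per send() call. The sketch below uses hypothetical ``send_all()`` and ``send_hdr_body()`` helpers. If the header and body together exceed the maximum record size, the data still spans several records.

.. code-block:: c

  #include <errno.h>
  #include <stddef.h>
  #include <sys/socket.h>
  #include <sys/types.h>

  /* Hypothetical helper: write all of buf, retrying on short writes.
   * flags are passed to every send() call. */
  static int send_all(int sock, const void *buf, size_t len, int flags)
  {
      const char *p = buf;

      while (len > 0) {
          ssize_t n = send(sock, p, len, flags);

          if (n < 0) {
              if (errno == EINTR)
                  continue;
              return -1;
          }
          p += n;
          len -= (size_t)n;
      }
      return 0;
  }

  /* MSG_MORE on the header defers record creation; the final send()
   * without MSG_MORE lets the kernel close the record. */
  static int send_hdr_body(int sock, const void *hdr, size_t hlen,
                           const void *body, size_t blen)
  {
      if (send_all(sock, hdr, hlen, MSG_MORE) < 0)
          return -1;
      return send_all(sock, body, blen, 0);
  }
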
The kernel needs to allocate a buffer for the encrypted data.
This buffer is allocated when send() is called, so that either the
entire send() call fails with ENOMEM (or blocks waiting for memory),
or the encryption always succeeds. If send() fails with ENOMEM and
some data was left on the socket buffer by a previous call using
MSG_MORE, that MSG_MORE data remains on the socket buffer.

Receiving TLS application data
------------------------------

After the TLS_RX socket option is set, all recv family socket calls
are decrypted using the TLS parameters provided. A full TLS record must
be received before decryption can happen.

.. code-block:: c

  char buffer[16384];
  recv(sock, buffer, sizeof(buffer), 0);

Received data is decrypted directly into the user buffer if it is
large enough, and no additional allocations occur. If the userspace
buffer is too small, data is decrypted in the kernel and copied to
userspace.

``EINVAL`` is returned if the TLS version in the received message does not
match the version passed in setsockopt.

``EMSGSIZE`` is returned if the received message is too big.

``EBADMSG`` is returned if decryption failed for any other reason.

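The sketch below shows a receive helper that reports the TLS-specific errors listed above. ``tls_read()`` and ``tls_rx_error_desc()`` are hypothetical names. ``EINVAL`` can also have other causes, so treat the mapping only as a hint.

.. code-block:: c

  #include <errno.h>
  #include <stdio.h>
  #include <sys/socket.h>
  #include <sys/types.h>

  /* Map the receive errors listed above to a short description. */
  static const char *tls_rx_error_desc(int err)
  {
      switch (err) {
      case EINVAL:
          return "record TLS version does not match setsockopt";
      case EMSGSIZE:
          return "received record too big";
      case EBADMSG:
          return "record decryption failed";
      default:
          return "other error";
      }
  }

  /* Read decrypted data. Callers are expected to pass a buffer of at
   * least 16384 bytes (one full record of plaintext), so the kernel can
   * decrypt straight into it. Returns bytes read, 0 on EOF, -1 on error. */
  static ssize_t tls_read(int sock, char *buf, size_t len)
  {
      ssize_t n;

      do {
          n = recv(sock, buf, len, 0);
      } while (n < 0 && errno == EINTR);

      if (n < 0) {
          int saved = errno;

          fprintf(stderr, "recv: %s\n", tls_rx_error_desc(saved));
          errno = saved;
      }
      return n;
  }
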
Send TLS control messages
-------------------------

Other than application data, TLS has control messages such as alert
messages (record type 21) and handshake messages (record type 22).
These messages can be sent over the socket by providing the TLS record type
via a CMSG. For example, the following function sends @data of @length bytes
using a record of type @record_type.

.. code-block:: c

  /* send TLS control message using record_type */
  static int klts_send_ctrl_message(int sock, unsigned char record_type,
                                    void *data, size_t length)
  {
        struct msghdr msg = {0};
        struct cmsghdr *cmsg;
        char buf[CMSG_SPACE(sizeof(record_type))];
        struct iovec msg_iov;   /* Vector of data to send.  */

        msg.msg_control = buf;
        msg.msg_controllen = sizeof(buf);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_TLS;
        cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
        cmsg->cmsg_len = CMSG_LEN(sizeof(record_type));
        *CMSG_DATA(cmsg) = record_type;
        msg.msg_controllen = cmsg->cmsg_len;

        msg_iov.iov_base = data;
        msg_iov.iov_len = length;
        msg.msg_iov = &msg_iov;
        msg.msg_iovlen = 1;

        return sendmsg(sock, &msg, 0);
  }

Control message data should be provided unencrypted, and will be
encrypted by the kernel.

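As a concrete example of a control record, the sketch below sends a close_notify alert (alert level warning = 1, description close_notify = 0) in a record of type 21. It builds the cmsg the same way as the function above. The helper name is hypothetical, and the ``SOL_TLS`` and ``TLS_SET_RECORD_TYPE`` fallback defines cover older headers. The union keeps the cmsg buffer suitably aligned.

.. code-block:: c

  #include <string.h>
  #include <sys/socket.h>
  #include <sys/types.h>
  #include <sys/uio.h>

  #ifndef SOL_TLS
  #define SOL_TLS 282
  #endif
  #ifndef TLS_SET_RECORD_TYPE
  #define TLS_SET_RECORD_TYPE 1 /* from linux/tls.h */
  #endif

  #define TLS_RECORD_TYPE_ALERT 21 /* ContentType alert */

  /* Hypothetical helper: send close_notify as an alert record. */
  static ssize_t send_close_notify(int sock)
  {
      unsigned char alert[2] = { 1, 0 }; /* warning, close_notify */
      union {
          char buf[CMSG_SPACE(sizeof(unsigned char))];
          struct cmsghdr align; /* forces cmsghdr alignment of buf */
      } u;
      struct iovec iov = { .iov_base = alert, .iov_len = sizeof(alert) };
      struct msghdr msg;
      struct cmsghdr *cmsg;

      memset(&u, 0, sizeof(u));
      memset(&msg, 0, sizeof(msg));
      msg.msg_iov = &iov;
      msg.msg_iovlen = 1;
      msg.msg_control = u.buf;
      msg.msg_controllen = sizeof(u.buf);

      cmsg = CMSG_FIRSTHDR(&msg);
      cmsg->cmsg_level = SOL_TLS;
      cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
      cmsg->cmsg_len = CMSG_LEN(sizeof(unsigned char));
      *CMSG_DATA(cmsg) = TLS_RECORD_TYPE_ALERT;
      msg.msg_controllen = cmsg->cmsg_len;

      return sendmsg(sock, &msg, 0);
  }
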
Receiving TLS control messages
------------------------------

TLS control messages are passed in the userspace buffer, with the message
type passed via cmsg. If no cmsg buffer is provided, an error is
returned when a control message is received. Data messages may be
received without a cmsg buffer set.

.. code-block:: c

  char buffer[16384];
  char cmsg_buf[CMSG_SPACE(sizeof(unsigned char))];
  struct msghdr msg = {0};
  msg.msg_control = cmsg_buf;
  msg.msg_controllen = sizeof(cmsg_buf);

  struct iovec msg_iov;
  msg_iov.iov_base = buffer;
  msg_iov.iov_len = sizeof(buffer);

  msg.msg_iov = &msg_iov;
  msg.msg_iovlen = 1;

  int ret = recvmsg(sock, &msg, 0 /* flags */);
  struct cmsghdr *cmsg = ret < 0 ? NULL : CMSG_FIRSTHDR(&msg);

  if (cmsg && cmsg->cmsg_level == SOL_TLS &&
      cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
      int record_type = *((unsigned char *)CMSG_DATA(cmsg));
      // Do something with record_type, and control message data in
      // buffer.
      //
      // Note that record_type may equal application data (23).
  } else if (ret >= 0) {
      // Buffer contains application data.
  }

A single recv call never returns data from more than one type of TLS record.

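The record-type check above can be moved into a small helper. This sketch walks all cmsgs with ``CMSG_NXTHDR()`` instead of checking only the first one. It treats a missing record-type cmsg as application data (23), which is an assumption based on the text above. The helper name is hypothetical.

.. code-block:: c

  #include <string.h>
  #include <sys/socket.h>

  #ifndef SOL_TLS
  #define SOL_TLS 282
  #endif
  #ifndef TLS_GET_RECORD_TYPE
  #define TLS_GET_RECORD_TYPE 2 /* from linux/tls.h */
  #endif

  #define TLS_RECORD_TYPE_DATA 23 /* application_data */

  /* Return the record type reported by recvmsg() in msg, or
   * TLS_RECORD_TYPE_DATA if no SOL_TLS record-type cmsg is present. */
  static int tls_record_type(struct msghdr *msg)
  {
      struct cmsghdr *cmsg;

      for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
          if (cmsg->cmsg_level == SOL_TLS &&
              cmsg->cmsg_type == TLS_GET_RECORD_TYPE &&
              cmsg->cmsg_len >= CMSG_LEN(sizeof(unsigned char)))
              return *CMSG_DATA(cmsg);
      }
      return TLS_RECORD_TYPE_DATA;
  }
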
TLS 1.3 Key Updates
-------------------

In TLS 1.3, KeyUpdate handshake messages signal that the sender is
updating its TX key. Any message sent after a KeyUpdate is
encrypted using the new key. The userspace library can pass the new
key to the kernel using the TLS_TX and TLS_RX socket options, as for
the initial keys. The TLS version and cipher cannot be changed.

To avoid decrypting incoming records with the wrong key, the kernel
pauses decryption when it receives a KeyUpdate message, until the new
key has been provided using the TLS_RX socket option. Any read made
after the KeyUpdate has been read and before the new key is provided
fails with EKEYEXPIRED. poll() does not report any read events on the
socket until the new key is provided. There is no pausing on the
transmit side.

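When a handshake record (type 22) arrives, the library has to recognize a KeyUpdate before it supplies the new RX key. Below is a minimal parser based on the TLS 1.3 message layout from RFC 8446: handshake type 24, a 3-byte length, and a one-byte request_update field. It assumes the whole message arrived in one record and does not handle coalesced or fragmented handshake messages. The function name is hypothetical.

.. code-block:: c

  #include <stddef.h>

  #define TLS_HS_KEY_UPDATE 24 /* HandshakeType key_update */

  /* Inspect the payload of a handshake record. Returns 1 if it holds a
   * KeyUpdate and stores request_update (0 = update_not_requested,
   * 1 = update_requested); returns 0 otherwise. */
  static int parse_key_update(const unsigned char *buf, size_t len,
                              int *request_update)
  {
      size_t body_len;

      /* Handshake header: msg_type (1 byte) + length (3 bytes). */
      if (len < 4 || buf[0] != TLS_HS_KEY_UPDATE)
          return 0;
      body_len = ((size_t)buf[1] << 16) | ((size_t)buf[2] << 8) | buf[3];
      /* A KeyUpdate body is a single KeyUpdateRequest byte. */
      if (body_len != 1 || len < 5 || buf[4] > 1)
          return 0;
      *request_update = buf[4];
      return 1;
  }

After a successful parse, the library would derive the next RX key and install it with ``setsockopt(sock, SOL_TLS, TLS_RX, ...)``, as for the initial keys. Until then, reads fail with EKEYEXPIRED. If ``request_update`` is 1, RFC 8446 also requires the receiver to send its own KeyUpdate, after which it would rekey TLS_TX.
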
Userspace should make sure that the crypto_info provided has been set
properly. In particular, the kernel does not check for key/nonce
reuse.

The number of successful and failed key updates is tracked in the
``TlsTxRekeyOk``, ``TlsRxRekeyOk``, ``TlsTxRekeyError``, and
``TlsRxRekeyError`` statistics. The ``TlsRxRekeyReceived`` statistic
counts KeyUpdate handshake messages that have been received.

Integrating into a userspace TLS library
----------------------------------------

At a high level, the kernel TLS ULP is a replacement for the record
layer of a userspace TLS library.

A patchset to OpenSSL to use ktls as the record layer is
`here <https://github.com/Mellanox/openssl/commits/tls_rx2>`_.

`An example <https://github.com/ktls/af_ktls-tool/commits/RX>`_
shows calling send directly after a handshake using gnutls.
Since it does not implement a full record layer, control
messages are not supported.

Optional optimizations
----------------------

There are certain condition-specific optimizations the TLS ULP can make,
if requested. These optimizations are either not universally beneficial
or may impact correctness, so they require an opt-in.
All options are set per-socket using setsockopt(), and their
state can be checked using getsockopt() and via socket diag (``ss``).

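The sketch below turns on one of the boolean options described in this section and reads its state back with getsockopt(). The helper is hypothetical. The fallback option values are copied from ``linux/tls.h`` for older headers, and the sketch assumes the options take an ``int`` value.

.. code-block:: c

  #include <sys/socket.h>
  #include <linux/tls.h>

  #ifndef SOL_TLS
  #define SOL_TLS 282
  #endif
  #ifndef TLS_TX_ZEROCOPY_RO
  #define TLS_TX_ZEROCOPY_RO 3 /* values from linux/tls.h */
  #endif
  #ifndef TLS_RX_EXPECT_NO_PAD
  #define TLS_RX_EXPECT_NO_PAD 4
  #endif

  /* Enable (on != 0) or disable a boolean SOL_TLS option, then read it
   * back. Returns the state reported by getsockopt(), or -1 on error. */
  static int tls_set_opt(int sock, int optname, int on)
  {
      int val = on ? 1 : 0;
      socklen_t len = sizeof(val);

      if (setsockopt(sock, SOL_TLS, optname, &val, sizeof(val)) < 0)
          return -1;
      if (getsockopt(sock, SOL_TLS, optname, &val, &len) < 0)
          return -1;
      return val;
  }
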
TLS_TX_ZEROCOPY_RO
~~~~~~~~~~~~~~~~~~

For device offload only. Allow sendfile() data to be transmitted directly
to the NIC without making an in-kernel copy. This allows true zero-copy
behavior when device offload is enabled.

The application must make sure that the data is not modified between being
submitted and transmission completing. In other words, this is mostly
applicable if the data sent on a socket via sendfile() is read-only.

Modifying the data may result in different versions of the data being used
for the original TCP transmission and TCP retransmissions. To the receiver
this will look like the TLS records were tampered with, and will result
in record authentication failures.

TLS_RX_EXPECT_NO_PAD
~~~~~~~~~~~~~~~~~~~~

TLS 1.3 only. Expect the sender not to pad records. This allows the data
to be decrypted directly into user space buffers with TLS 1.3.

This optimization is safe to enable only if the remote end is trusted.
Otherwise it is an attack vector for doubling the TLS processing cost.

If a decrypted record turns out to have been padded, or is not a data
record, it is decrypted again into a kernel buffer without zero copy.
Such events are counted in the ``TlsDecryptRetry`` statistic.

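The two conditions for ``TLS_RX_EXPECT_NO_PAD`` (TLS 1.3 and a trusted peer) can be checked explicitly before opting in. In this sketch ``peer_trusted`` is an application-level decision, and both helper names are hypothetical. The sketch also assumes the option is set after TLS_RX has been configured, so that the kernel already knows the session is TLS 1.3.

.. code-block:: c

  #include <stdbool.h>
  #include <sys/socket.h>
  #include <linux/tls.h>

  #ifndef SOL_TLS
  #define SOL_TLS 282
  #endif
  #ifndef TLS_RX_EXPECT_NO_PAD
  #define TLS_RX_EXPECT_NO_PAD 4 /* from linux/tls.h */
  #endif
  #ifndef TLS_1_3_VERSION
  #define TLS_1_3_VERSION 0x0304
  #endif

  /* Policy check mirroring the rules above. */
  static bool want_expect_no_pad(unsigned short version, bool peer_trusted)
  {
      return version == TLS_1_3_VERSION && peer_trusted;
  }

  /* Returns 0 if the option was not wanted or was set successfully,
   * -1 if setsockopt() failed. */
  static int maybe_expect_no_pad(int sock, unsigned short version,
                                 bool peer_trusted)
  {
      int one = 1;

      if (!want_expect_no_pad(version, peer_trusted))
          return 0;
      return setsockopt(sock, SOL_TLS, TLS_RX_EXPECT_NO_PAD, &one,
                        sizeof(one));
  }
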
Statistics
==========

The TLS implementation exposes the following per-namespace statistics
(``/proc/net/tls_stat``):

- ``TlsCurrTxSw``, ``TlsCurrRxSw`` -
  number of TX and RX sessions currently installed where host handles
  cryptography

- ``TlsCurrTxDevice``, ``TlsCurrRxDevice`` -
  number of TX and RX sessions currently installed where NIC handles
  cryptography

- ``TlsTxSw``, ``TlsRxSw`` -
  number of TX and RX sessions opened with host cryptography

- ``TlsTxDevice``, ``TlsRxDevice`` -
  number of TX and RX sessions opened with NIC cryptography

- ``TlsDecryptError`` -
  record decryption failed (e.g. due to incorrect authentication tag)

- ``TlsDeviceRxResync`` -
  number of RX resyncs sent to NICs handling cryptography

- ``TlsDecryptRetry`` -
  number of RX records which had to be re-decrypted due to
  ``TLS_RX_EXPECT_NO_PAD`` mis-prediction. Note that this counter will
  also increment for non-data records.

- ``TlsRxNoPadViolation`` -
  number of data RX records which had to be re-decrypted due to
  ``TLS_RX_EXPECT_NO_PAD`` mis-prediction.

- ``TlsTxRekeyOk``, ``TlsRxRekeyOk`` -
  number of successful rekeys on existing sessions for TX and RX

- ``TlsTxRekeyError``, ``TlsRxRekeyError`` -
  number of failed rekeys on existing sessions for TX and RX

- ``TlsRxRekeyReceived`` -
  number of received KeyUpdate handshake messages, requiring userspace
  to provide a new RX key

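The counters can also be read programmatically. This sketch assumes that each line of ``/proc/net/tls_stat`` holds a counter name, then whitespace, then its value. The helper names are hypothetical.

.. code-block:: c

  #include <stdio.h>
  #include <string.h>

  /* Look up one counter (e.g. "TlsDecryptError") in a stream laid out
   * as "<name> <value>" per line. Returns 0 and stores the value on
   * success, -1 if the name is not found. */
  static int tls_stat_lookup(FILE *f, const char *name, unsigned long long *out)
  {
      char key[64];
      unsigned long long val;

      while (fscanf(f, "%63s %llu", key, &val) == 2) {
          if (strcmp(key, name) == 0) {
              *out = val;
              return 0;
          }
      }
      return -1;
  }

  /* Convenience wrapper that reads the current namespace's counters. */
  static int tls_stat_read(const char *name, unsigned long long *out)
  {
      FILE *f = fopen("/proc/net/tls_stat", "r");
      int ret;

      if (!f)
          return -1;
      ret = tls_stat_lookup(f, name, out);
      fclose(f);
      return ret;
  }
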