#
b2e60773 |
| 27-Aug-2019 |
John Baldwin <jhb@FreeBSD.org> |
Add kernel-side support for in-kernel TLS.
KTLS adds support for in-kernel framing and encryption of Transport Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports offload of TLS for tr
Add kernel-side support for in-kernel TLS.
KTLS adds support for in-kernel framing and encryption of Transport Layer Security (1.0-1.2) data on TCP sockets. KTLS only supports offload of TLS for transmitted data. Key negotation must still be performed in userland. Once completed, transmit session keys for a connection are provided to the kernel via a new TCP_TXTLS_ENABLE socket option. All subsequent data transmitted on the socket is placed into TLS frames and encrypted using the supplied keys.
Any data written to a KTLS-enabled socket via write(2), aio_write(2), or sendfile(2) is assumed to be application data and is encoded in TLS frames with an application data type. Individual records can be sent with a custom type (e.g. handshake messages) via sendmsg(2) with a new control message (TLS_SET_RECORD_TYPE) specifying the record type.
At present, rekeying is not supported though the in-kernel framework should support rekeying.
KTLS makes use of the recently added unmapped mbufs to store TLS frames in the socket buffer. Each TLS frame is described by a single ext_pgs mbuf. The ext_pgs structure contains the header of the TLS record (and trailer for encrypted records) as well as references to the associated TLS session.
KTLS supports two primary methods of encrypting TLS frames: software TLS and ifnet TLS.
Software TLS marks mbufs holding socket data as not ready via M_NOTREADY similar to sendfile(2) when TLS framing information is added to an unmapped mbuf in ktls_frame(). ktls_enqueue() is then called to schedule TLS frames for encryption. In the case of sendfile_iodone() calls ktls_enqueue() instead of pru_ready() leaving the mbufs marked M_NOTREADY until encryption is completed. For other writes (vn_sendfile when pages are available, write(2), etc.), the PRUS_NOTREADY is set when invoking pru_send() along with invoking ktls_enqueue().
A pool of worker threads (the "KTLS" kernel process) encrypts TLS frames queued via ktls_enqueue(). Each TLS frame is temporarily mapped using the direct map and passed to a software encryption backend to perform the actual encryption.
(Note: The use of PHYS_TO_DMAP could be replaced with sf_bufs if someone wished to make this work on architectures without a direct map.)
KTLS supports pluggable software encryption backends. Internally, Netflix uses proprietary pure-software backends. This commit includes a simple backend in a new ktls_ocf.ko module that uses the kernel's OpenCrypto framework to provide AES-GCM encryption of TLS frames. As a result, software TLS is now a bit of a misnomer as it can make use of hardware crypto accelerators.
Once software encryption has finished, the TLS frame mbufs are marked ready via pru_ready(). At this point, the encrypted data appears as regular payload to the TCP stack stored in unmapped mbufs.
ifnet TLS permits a NIC to offload the TLS encryption and TCP segmentation. In this mode, a new send tag type (IF_SND_TAG_TYPE_TLS) is allocated on the interface a socket is routed over and associated with a TLS session. TLS records for a TLS session using ifnet TLS are not marked M_NOTREADY but are passed down the stack unencrypted. The ip_output_send() and ip6_output_send() helper functions that apply send tags to outbound IP packets verify that the send tag of the TLS record matches the outbound interface. If so, the packet is tagged with the TLS send tag and sent to the interface. The NIC device driver must recognize packets with the TLS send tag and schedule them for TLS encryption and TCP segmentation. If the the outbound interface does not match the interface in the TLS send tag, the packet is dropped. In addition, a task is scheduled to refresh the TLS send tag for the TLS session. If a new TLS send tag cannot be allocated, the connection is dropped. If a new TLS send tag is allocated, however, subsequent packets will be tagged with the correct TLS send tag. (This latter case has been tested by configuring both ports of a Chelsio T6 in a lagg and failing over from one port to another. As the connections migrated to the new port, new TLS send tags were allocated for the new port and connections resumed without being dropped.)
ifnet TLS can be enabled and disabled on supported network interfaces via new '[-]txtls[46]' options to ifconfig(8). ifnet TLS is supported across both vlan devices and lagg interfaces using failover, lacp with flowid enabled, or lacp with flowid enabled.
Applications may request the current KTLS mode of a connection via a new TCP_TXTLS_MODE socket option. They can also use this socket option to toggle between software and ifnet TLS modes.
In addition, a testing tool is available in tools/tools/switch_tls. This is modeled on tcpdrop and uses similar syntax. However, instead of dropping connections, -s is used to force KTLS connections to switch to software TLS and -i is used to switch to ifnet TLS.
Various sysctls and counters are available under the kern.ipc.tls sysctl node. The kern.ipc.tls.enable node must be set to true to enable KTLS (it is off by default). The use of unmapped mbufs must also be enabled via kern.ipc.mb_use_ext_pgs to enable KTLS.
KTLS is enabled via the KERN_TLS kernel option.
This patch is the culmination of years of work by several folks including Scott Long and Randall Stewart for the original design and implementation; Drew Gallatin for several optimizations including the use of ext_pgs mbufs, the M_NOTREADY mechanism for TLS records awaiting software encryption, and pluggable software crypto backends; and John Baldwin for modifications to support hardware TLS offload.
Reviewed by: gallatin, hselasky, rrs Obtained from: Netflix Sponsored by: Netflix, Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21277
show more ...
|
#
20abea66 |
| 01-Aug-2019 |
Randall Stewart <rrs@FreeBSD.org> |
This adds the third step in getting BBR into the tree. BBR and an updated rack depend on having access to the new ratelimit api in this commit.
Sponsored by: Netflix Inc. Differential Revision: http
This adds the third step in getting BBR into the tree. BBR and an updated rack depend on having access to the new ratelimit api in this commit.
Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D20953
show more ...
|
Revision tags: release/11.3.0 |
|
#
0269ae4c |
| 06-Jun-2019 |
Alan Somers <asomers@FreeBSD.org> |
MFHead @348740
Sponsored by: The FreeBSD Foundation
|
#
e2e050c8 |
| 20-May-2019 |
Conrad Meyer <cem@FreeBSD.org> |
Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces hea
Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change).
No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
show more ...
|
#
7648bc9f |
| 13-May-2019 |
Alan Somers <asomers@FreeBSD.org> |
MFHead @347527
Sponsored by: The FreeBSD Foundation
|
#
7687707d |
| 22-Apr-2019 |
Andrew Gallatin <gallatin@FreeBSD.org> |
Track device's NUMA domain in ifnet & alloc ifnet from NUMA local memory
This commit adds new if_alloc_domain() and if_alloc_dev() methods to allocate ifnets. When called with a domain on a NUMA ma
Track device's NUMA domain in ifnet & alloc ifnet from NUMA local memory
This commit adds new if_alloc_domain() and if_alloc_dev() methods to allocate ifnets. When called with a domain on a NUMA machine, ifalloc_domain() will record the NUMA domain in the ifnet, and it will allocate the ifnet struct from memory which is local to that NUMA node. Similarly, if_alloc_dev() is a wrapper for if_alloc_domain which uses a driver supplied device_t to call ifalloc_domain() with the appropriate domain.
Note that the new if_numa_domain field fits in an alignment pad in struct ifnet, and so does not alter the size of the structure.
Reviewed by: glebius, kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19930
show more ...
|
#
c2c227a5 |
| 03-Feb-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r343571 through r343711.
|
#
b252313f |
| 01-Feb-2019 |
Gleb Smirnoff <glebius@FreeBSD.org> |
New pfil(9) KPI together with newborn pfil API and control utility.
The KPI have been reviewed and cleansed of features that were planned back 20 years ago and never implemented. The pfil(9) intern
New pfil(9) KPI together with newborn pfil API and control utility.
The KPI have been reviewed and cleansed of features that were planned back 20 years ago and never implemented. The pfil(9) internals have been made opaque to protocols with only returned types and function declarations exposed. The KPI is made more strict, but at the same time more extensible, as kernel uses same command structures that userland ioctl uses.
In nutshell [KA]PI is about declaring filtering points, declaring filters and linking and unlinking them together.
New [KA]PI makes it possible to reconfigure pfil(9) configuration: change order of hooks, rehook filter from one filtering point to a different one, disconnect a hook on output leaving it on input only, prepend/append a filter to existing list of filters.
Now it possible for a single packet filter to provide multiple rulesets that may be linked to different points. Think of per-interface ACLs in Cisco or Juniper. None of existing packet filters yet support that, however limited usage is already possible, e.g. default ruleset can be moved to single interface, as soon as interface would pride their filtering points.
Another future feature is possiblity to create pfil heads, that provide not an mbuf pointer but just a memory pointer with length. That would allow filtering at very early stages of a packet lifecycle, e.g. when packet has just been received by a NIC and no mbuf was yet allocated.
Differential Revision: https://reviews.freebsd.org/D18951
show more ...
|
#
a68cc388 |
| 09-Jan-2019 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Mechanical cleanup of epoch(9) usage in network stack.
- Remove macros that covertly create epoch_tracker on thread stack. Such macros a quite unsafe, e.g. will produce a buggy code if same macro
Mechanical cleanup of epoch(9) usage in network stack.
- Remove macros that covertly create epoch_tracker on thread stack. Such macros a quite unsafe, e.g. will produce a buggy code if same macro is used in embedded scopes. Explicitly declare epoch_tracker always.
- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and what's more important they now no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.
This is non functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively.
Discussed with: jtl, gallatin
show more ...
|
Revision tags: release/12.0.0 |
|
#
6149ed01 |
| 14-Nov-2018 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r340368 through r340426.
|
#
b79aa45e |
| 13-Nov-2018 |
Gleb Smirnoff <glebius@FreeBSD.org> |
For compatibility KPI functions like if_addr_rlock() that used to have mutexes but now are converted to epoch(9) use thread-private epoch_tracker. Embedding tracker into ifnet(9) or ifnet derived str
For compatibility KPI functions like if_addr_rlock() that used to have mutexes but now are converted to epoch(9) use thread-private epoch_tracker. Embedding tracker into ifnet(9) or ifnet derived structures creates a non reentrable function, that will fail miserably if called simultaneously from two different contexts. A thread private tracker will provide a single tracker that would allow to call these functions safely. It doesn't allow nested call, but this is not expected from compatibility KPIs.
Reviewed by: markj
show more ...
|
#
c6879c6c |
| 23-Oct-2018 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r339015 through r339669.
|
#
64d63b1e |
| 21-Oct-2018 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Add ifaddr_event_ext event. It is similar to ifaddr_event, but the handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL, and the pointer to ifaddr. Also ifaddr_event now is implemente
Add ifaddr_event_ext event. It is similar to ifaddr_event, but the handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL, and the pointer to ifaddr. Also ifaddr_event now is implemented using ifaddr_event_ext handler.
MFC after: 3 weeks Sponsored by: Yandex LLC Differential Revision: https://reviews.freebsd.org/D17100
show more ...
|
#
01d4e214 |
| 05-Oct-2018 |
Glen Barber <gjb@FreeBSD.org> |
MFH r338661 through r339200.
Sponsored by: The FreeBSD Foundation
|
#
ab530bf0 |
| 29-Sep-2018 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r338988 through r339014.
|
#
1687b1ab |
| 29-Sep-2018 |
Michael Tuexen <tuexen@FreeBSD.org> |
For changing the MTU on tun/tap devices, it should not matter whether it is done via using ifconfig, which uses a SIOCSIFMTU ioctl() command, or doing it using a TUNSIFINFO/TAPSIFINFO ioctl() command
For changing the MTU on tun/tap devices, it should not matter whether it is done via using ifconfig, which uses a SIOCSIFMTU ioctl() command, or doing it using a TUNSIFINFO/TAPSIFINFO ioctl() command. Without this patch, for IPv6 the new MTU is not used when creating routes. Especially, when initiating TCP connections after increasing the MTU, the old MTU is still used to compute the MSS. Thanks to ae@ and bz@ for helping to improve the patch.
Reviewed by: ae@, bz@ Approved by: re (kib@) MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D17180
show more ...
|
#
ce44d808 |
| 27-Sep-2018 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r338731 through r338987.
|
#
b08d611d |
| 21-Sep-2018 |
Matt Macy <mmacy@FreeBSD.org> |
fix vlan locking to permit sx acquisition in ioctl calls
- update vlan(9) to handle changes earlier this year in multicast locking
Tested by: np@, darkfiberu at gmail.com
PR: 230510 Reviewed by: m
fix vlan locking to permit sx acquisition in ioctl calls
- update vlan(9) to handle changes earlier this year in multicast locking
Tested by: np@, darkfiberu at gmail.com
PR: 230510 Reviewed by: mjoras@, shurd@, sbruno@ Approved by: re (gjb@) Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16808
show more ...
|
#
3611ec60 |
| 18-Aug-2018 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r337646 through r338014.
|
#
f9be0386 |
| 15-Aug-2018 |
Matt Macy <mmacy@FreeBSD.org> |
Fix in6_multi double free
This is actually several different bugs: - The code is not designed to handle inpcb deletion after interface deletion - add reference for inpcb membership - The multicast
Fix in6_multi double free
This is actually several different bugs: - The code is not designed to handle inpcb deletion after interface deletion - add reference for inpcb membership - The multicast address has to be removed from interface lists when the refcount goes to zero OR when the interface goes away - decouple list disconnect from refcount (v6 only for now) - ifmultiaddr can exist past being on interface lists - add flag for tracking whether or not it's enqueued - deferring freeing moptions makes the incpb cleanup code simpler but opens the door wider still to races - call inp_gcmoptions synchronously after dropping the the inpcb lock
Fundamentally multicast needs a rewrite - but keep applying band-aids for now.
Tested by: kp Reported by: novel, kp, lwhsu
show more ...
|
#
98a8fdf6 |
| 09-Jul-2018 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Deduplicate the code.
Add generic function if_tunnel_check_nesting() that does check for allowed nesting level for tunneling interfaces and also does loop detection. Use it in gif(4), gre(4) and me(
Deduplicate the code.
Add generic function if_tunnel_check_nesting() that does check for allowed nesting level for tunneling interfaces and also does loop detection. Use it in gif(4), gre(4) and me(4) interfaces.
Differential Revision: https://reviews.freebsd.org/D16162
show more ...
|
#
6573d758 |
| 04-Jul-2018 |
Matt Macy <mmacy@FreeBSD.org> |
epoch(9): allow preemptible epochs to compose
- Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simpli
epoch(9): allow preemptible epochs to compose
- Add tracker argument to preemptible epochs - Inline epoch read path in kernel and tied modules - Change in_epoch to take an epoch as argument - Simplify tfb_tcp_do_segment to not take a ti_locked argument, there's no longer any benefit to dropping the pcbinfo lock and trying to do so just adds an error prone branchfest to these functions - Remove cases of same function recursion on the epoch as recursing is no longer free. - Remove the the TAILQ_ENTRY and epoch_section from struct thread as the tracker field is now stack or heap allocated as appropriate.
Tested by: pho and Limelight Networks Reviewed by: kbowling at llnw dot com Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D16066
show more ...
|
Revision tags: release/11.2.0 |
|
#
0f8d79d9 |
| 25-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
CK: update consumers to use CK macros across the board
r334189 changed the fields to have names distinct from those in queue.h in order to expose the oversights as compile time errors
|
#
4f6c66cc |
| 23-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
UDP: further performance improvements on tx
Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps
Single stream throughput incre
UDP: further performance improvements on tx
Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps
Single stream throughput increases from 910kpps to 1.18Mpps
Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg
- Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg
- Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg
- Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg
A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW.
Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409
show more ...
|
#
fd04260d |
| 21-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
ck: simplify interface with libkvm consumers by defining ck_queue types as their queue.h equivalents if !_KERNEL
|