#
22c81cc5 |
| 06-Nov-2022 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
tcp: add AccECN CE packet counters to tcpinfo
Provide diagnostics information around AccECN into the tcpinfo struct.
Event: IETF 115 Hackathon Reviewed By: tuexen, #transport Sponsored by: NetA
tcp: add AccECN CE packet counters to tcpinfo
Provide diagnostics information around AccECN into the tcpinfo struct.
Event: IETF 115 Hackathon Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D37280
show more ...
|
#
3708c3d3 |
| 04-Nov-2022 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
tcp: reserve tcp_info counters for AccECN
Marking all new fields unused (__xxx).
No functional change.
Reviewed By: tuexen, rrs, #transport Sponsored by: NetApp, Inc. Differential Revision: http
tcp: reserve tcp_info counters for AccECN
Marking all new fields unused (__xxx).
No functional change.
Reviewed By: tuexen, rrs, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D37016
show more ...
|
Revision tags: release/13.1.0 |
|
#
93e28d6e |
| 01-Feb-2022 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
tcp: LRO code to deal with all 12 TCP header flags
TCP per RFC793 has 4 reserved flag bits for future use. One of those bits may be used for Accurate ECN. This patch is to include these bits in the
tcp: LRO code to deal with all 12 TCP header flags
TCP per RFC793 has 4 reserved flag bits for future use. One of those bits may be used for Accurate ECN. This patch is to include these bits in the LRO code to ease the extensibility if/when these bits are used.
Reviewed By: hselasky, rrs, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D34127
show more ...
|
Revision tags: release/12.3.0 |
|
#
e2833083 |
| 26-Oct-2021 |
Peter Lei <peterlei@netflix.com> |
tcp: socket option to get stack alias name
TCP stack sysctl nodes are currently inserted using the stack name alias. Allow the user to get the current stack's alias to allow for programatic sysctl a
tcp: socket option to get stack alias name
TCP stack sysctl nodes are currently inserted using the stack name alias. Allow the user to get the current stack's alias to allow for programatic sysctl access.
Obtained from: Netflix
show more ...
|
#
4e4c84f8 |
| 22-Oct-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Add hystart-plus to cc_newreno and rack.
TCP Hystart draft version -03: https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-hystartplusplus
Is a new version of hystart that allows one to car
tcp: Add hystart-plus to cc_newreno and rack.
TCP Hystart draft version -03: https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-hystartplusplus
Is a new version of hystart that allows one to carefully exit slow start if the RTT spikes too much. The newer version has a slower-slow-start so to speak that then kicks in for five round trips. To see if you exited too early, if not into congestion avoidance. This commit will add that feature to our newreno CC and add the needed bits in rack to be able to enable it.
Reviewed by: tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32373
show more ...
|
#
5baf32c9 |
| 17-Aug-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Add support for DSACK based reordering window to rack.
The rack stack, with respect to the rack bits in it, was originally built based on an early I-D of rack. In fact at that time the TLP bits
tcp: Add support for DSACK based reordering window to rack.
The rack stack, with respect to the rack bits in it, was originally built based on an early I-D of rack. In fact at that time the TLP bits were in a separate I-D. The dynamic reordering window based on DSACK events was not present in rack at that time. It is now part of the RFC and we need to update our stack to include these features. However we want to have a way to control the feature so that we can, if the admin decides, make it stay the same way system wide as well as via socket option. The new sysctl and socket option has the following meaning for setting:
00 (0) - Keep the old way, i.e. reordering window is 1 and do not use DSACK bytes to add to reorder window 01 (1) - Change the Reordering window to 1/4 of an RTT but do not use DSACK bytes to add to reorder window 10 (2) - Keep the reordering window as 1, but do use SACK bytes to add additional 1/4 RTT delay to the reorder window 11 (3) - reordering window is 1/4 of an RTT and add additional DSACK bytes to increase the reordering window (RFC behavior)
The default currently in the sysctl is 3 so we get standards based behavior. Reviewed by: tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D31506
show more ...
|
#
8e1864ed |
| 20-May-2021 |
Kristof Provost <kp@FreeBSD.org> |
pf: syncookie support
Import OpenBSD's syncookie support for pf. This feature help pf resist TCP SYN floods by only creating states once the remote host completes the TCP handshake rather than when
pf: syncookie support
Import OpenBSD's syncookie support for pf. This feature help pf resist TCP SYN floods by only creating states once the remote host completes the TCP handshake rather than when the initial SYN packet is received.
This is accomplished by using the initial sequence numbers to encode a cookie (hence the name) in the SYN+ACK response and verifying this on receipt of the client ACK.
Reviewed by: kbowling Obtained from: OpenBSD MFC after: 1 week Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D31138
show more ...
|
#
4f3addd9 |
| 26-May-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Add a socket option to rack so we can test various changes to the slop value in timers.
Timer_slop, in TCP, has been 200ms for a long time. This value dates back a long time when delayed ack ti
tcp: Add a socket option to rack so we can test various changes to the slop value in timers.
Timer_slop, in TCP, has been 200ms for a long time. This value dates back a long time when delayed ack timers were longer and links were slower. A 200ms timer slop allows 1 MSS to be sent over a 60kbps link. Its possible that lowering this value to something more in line with todays delayed ack values (40ms) might improve TCP. This bit of code makes it so rack can, via a socket option, adjust the timer slop.
Reviewed by: mtuexen Sponsered by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30249
show more ...
|
#
0471a8c7 |
| 10-May-2021 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
tcp: SACK Lost Retransmission Detection (LRD)
Recover from excessive losses without reverting to a retransmission timeout (RTO). Disabled by default, enable with sysctl net.inet.tcp.do_lrd=1
Review
tcp: SACK Lost Retransmission Detection (LRD)
Recover from excessive losses without reverting to a retransmission timeout (RTO). Disabled by default, enable with sysctl net.inet.tcp.do_lrd=1
Reviewed By: #transport, rrs, tuexen, #manpages Sponsored by: Netapp, Inc. Differential Revision: https://reviews.freebsd.org/D28931
show more ...
|
#
5d8fd932 |
| 06-May-2021 |
Randall Stewart <rrs@FreeBSD.org> |
This brings into sync FreeBSD with the netflix versions of rack and bbr. This fixes several breakages (panics) since the tcp_lro code was committed that have been reported. Quite a few new features a
This brings into sync FreeBSD with the netflix versions of rack and bbr. This fixes several breakages (panics) since the tcp_lro code was committed that have been reported. Quite a few new features are now in rack (prefecting of DGP -- Dynamic Goodput Pacing among the largest). There is also support for ack-war prevention. Documents comming soon on rack..
Sponsored by: Netflix Reviewed by: rscheff, mtuexen Differential Revision: https://reviews.freebsd.org/D30036
show more ...
|
#
9e644c23 |
| 18-Apr-2021 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: add support for TCP over UDP
Adding support for TCP over UDP allows communication with TCP stacks which can be implemented in userspace without requiring special priviledges or specific support
tcp: add support for TCP over UDP
Adding support for TCP over UDP allows communication with TCP stacks which can be implemented in userspace without requiring special priviledges or specific support by the OS. This is joint work with rrs.
Reviewed by: rrs Sponsored by: Netflix, Inc. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29469
show more ...
|
Revision tags: release/13.0.0 |
|
#
a034518a |
| 19-Dec-2020 |
Andrew Gallatin <gallatin@FreeBSD.org> |
Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain
In order to efficiently serve web traffic on a NUMA machine, one must avoid as many NUMA domain crossings as possible. With SO
Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain
In order to efficiently serve web traffic on a NUMA machine, one must avoid as many NUMA domain crossings as possible. With SO_REUSEPORT_LB, a number of workers can share a listen socket. However, even if a worker sets affinity to a core or set of cores on a NUMA domain, it will receive connections associated with all NUMA domains in the system. This will lead to cross-domain traffic when the server writes to the socket or calls sendfile(), and memory is allocated on the server's local NUMA node, but transmitted on the NUMA node associated with the TCP connection. Similarly, when the server reads from the socket, he will likely be reading memory allocated on the NUMA domain associated with the TCP connection.
This change provides a new socket ioctl, TCP_REUSPORT_LB_NUMA. A server can now tell the kernel to filter traffic so that only incoming connections associated with the desired NUMA domain are given to the server. (Of course, in the case where there are no servers sharing the listen socket on some domain, then as a fallback, traffic will be hashed as normal to all servers sharing the listen socket regardless of domain). This allows a server to deal only with traffic that is local to its NUMA domain, and avoids cross-domain traffic in most cases.
This patch, and a corresponding small patch to nginx to use TCP_REUSPORT_LB_NUMA allows us to serve 190Gb/s of kTLS encrypted https media content from dual-socket Xeons with only 13% (as measured by pcm.x) cross domain traffic on the memory controller.
Reviewed by: jhb, bz (earlier version), bcr (man page) Tested by: gonzo Sponsored by: Netfix Differential Revision: https://reviews.freebsd.org/D21636
show more ...
|
Revision tags: release/12.2.0 |
|
#
e3995661 |
| 25-Sep-2020 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
TCP: send full initial window when timestamps are in use
The fastpath in tcp_output tries to send out full segments, and avoid sending partial segments by comparing against the static t_maxseg varia
TCP: send full initial window when timestamps are in use
The fastpath in tcp_output tries to send out full segments, and avoid sending partial segments by comparing against the static t_maxseg variable. That value does not consider tcp options like timestamps, while the initial window calculation is using the correct dynamic tcp_maxseg() function.
Due to this interaction, the last, full size segment is considered too short and not sent out immediately.
Reviewed by: tuexen MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26478
show more ...
|
#
662c1305 |
| 01-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
net: clean up empty lines in .c and .h files
|
Revision tags: release/11.4.0 |
|
#
f1f93475 |
| 28-Apr-2020 |
John Baldwin <jhb@FreeBSD.org> |
Initial support for kernel offload of TLS receive.
- Add a new TCP_RXTLS_ENABLE socket option to set the encryption and authentication algorithms and keys as well as the initial sequence number.
Initial support for kernel offload of TLS receive.
- Add a new TCP_RXTLS_ENABLE socket option to set the encryption and authentication algorithms and keys as well as the initial sequence number.
- When reading from a socket using KTLS receive, applications must use recvmsg(). Each successful call to recvmsg() will return a single TLS record. A new TCP control message, TLS_GET_RECORD, will contain the TLS record header of the decrypted record. The regular message buffer passed to recvmsg() will receive the decrypted payload. This is similar to the interface used by Linux's KTLS RX except that Linux does not return the full TLS header in the control message.
- Add plumbing to the TOE KTLS interface to request either transmit or receive KTLS sessions.
- When a socket is using receive KTLS, redirect reads from soreceive_stream() into soreceive_generic().
- Note that this interface is currently only defined for TLS 1.1 and 1.2, though I believe we will be able to reuse the same interface and structures for 1.3.
show more ...
|
#
e570d231 |
| 27-Apr-2020 |
Randall Stewart <rrs@FreeBSD.org> |
This change does a small prepratory step in getting the latest rack and bbr in from the NF repo. When those come in the OOB data handling will be fixed where Skyzaller crashes.
Differential Revision
This change does a small prepratory step in getting the latest rack and bbr in from the NF repo. When those come in the OOB data handling will be fixed where Skyzaller crashes.
Differential Revision: https://reviews.freebsd.org/D24575
show more ...
|
#
44e86fbd |
| 13-Feb-2020 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r357662 through r357854.
|
#
481be5de |
| 12-Feb-2020 |
Randall Stewart <rrs@FreeBSD.org> |
White space cleanup -- remove trailing tab's or spaces from any line.
Sponsored by: Netflix Inc.
|
#
493c98c6 |
| 31-Dec-2019 |
Michael Tuexen <tuexen@FreeBSD.org> |
Add flags for upcoming patches related to improved ECN handling. No functional change. Submitted by: Richard Scheffenegger Reviewed by: rgrimes@, tuexen@ Differential Revision: https://reviews.free
Add flags for upcoming patches related to improved ECN handling. No functional change. Submitted by: Richard Scheffenegger Reviewed by: rgrimes@, tuexen@ Differential Revision: https://reviews.freebsd.org/D22429
show more ...
|
#
adc56f5a |
| 02-Dec-2019 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Make use of the stats(3) framework in the TCP stack.
This makes it possible to retrieve per-connection statistical information such as the receive window size, RTT, or goodput, using a newly added T
Make use of the stats(3) framework in the TCP stack.
This makes it possible to retrieve per-connection statistical information such as the receive window size, RTT, or goodput, using a newly added TCP_STATS getsockopt(3) option, and extract them using the stats_voistat_fetch(3) API.
See the net/tcprtt port for an example consumer of this API.
Compared to the existing TCP_INFO system, the main differences are that this mechanism is easy to extend without breaking ABI, and provides statistical information instead of raw "snapshots" of values at a given point in time. stats(3) is more generic and can be used in both userland and the kernel.
Reviewed by: thj Tested by: thj Obtained from: Netflix Relnotes: yes Sponsored by: Klara Inc, Netflix Differential Revision: https://reviews.freebsd.org/D20655
show more ...
|
Revision tags: release/12.1.0 |
|
#
9122aeea |
| 09-Oct-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r353316 through r353350.
|
#
9e14430d |
| 08-Oct-2019 |
John Baldwin <jhb@FreeBSD.org> |
Add a TOE KTLS mode and a TOE hook for allocating TLS sessions.
This adds the glue to allocate TLS sessions and invokes it from the TLS enable socket option handler. This also adds some counters fo
Add a TOE KTLS mode and a TOE hook for allocating TLS sessions.
This adds the glue to allocate TLS sessions and invokes it from the TLS enable socket option handler. This also adds some counters for active TOE sessions.
The TOE KTLS mode is returned by getsockopt(TLSTX_TLS_MODE) when TOE KTLS is in use on a socket, but cannot be set via setsockopt().
To simplify various checks, a TLS session now includes an explicit 'mode' member set to the value returned by TLSTX_TLS_MODE. Various places that used to check 'sw_encrypt' against NULL to determine software vs ifnet (NIC) TLS now check 'mode' instead.
Reviewed by: np, gallatin Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D21891
show more ...
|
#
668ee101 |
| 26-Sep-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r352587 through r352763.
|
#
35c7bb34 |
| 24-Sep-2019 |
Randall Stewart <rrs@FreeBSD.org> |
This commit adds BBR (Bottleneck Bandwidth and RTT) congestion control. This is a completely separate TCP stack (tcp_bbr.ko) that will be built only if you add the make options WITH_EXTRA_TCP_STACKS=
This commit adds BBR (Bottleneck Bandwidth and RTT) congestion control. This is a completely separate TCP stack (tcp_bbr.ko) that will be built only if you add the make options WITH_EXTRA_TCP_STACKS=1 and also include the option TCPHPTS. You can also include the RATELIMIT option if you have a NIC interface that supports hardware pacing, BBR understands how to use such a feature.
Note that this commit also adds in a general purpose time-filter which allows you to have a min-filter or max-filter. A filter allows you to have a low (or high) value for some period of time and degrade slowly to another value has time passes. You can find out the details of BBR by looking at the original paper at:
https://queue.acm.org/detail.cfm?id=3022184
or consult many other web resources you can find on the web referenced by "BBR congestion control". It should be noted that BBRv1 (which this is) does tend to unfairness in cases of small buffered paths, and it will usually get less bandwidth in the case of large BDP paths(when competing with new-reno or cubic flows). BBR is still an active research area and we do plan on implementing V2 of BBR to see if it is an improvement over V1.
Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D21582
show more ...
|
#
c5c3ba6b |
| 03-Sep-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r351317 through r351731.
|