#
3f169c54 |
| 10-Feb-2022 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
tcp: Add/update AccECN related statistics and numbers
Reserve counters in the tcps struct in preparation for AccECN, extend the debugging output for TF2 flags, and optimize the syncache flags from individual bits to a codepoint for the specific ECN handshake.
This is in preparation for AccECN.
No functional change except for extended debug output capabilities.
Reviewed By: #transport, rrs Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D34161
|
#
fd7daa72 |
| 08-Feb-2022 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: make tcp_ctloutput_set() non-static
tcp_ctloutput_set() will be used via the sysctl interface in an upcoming command line tool, tcpsso.
Reviewed by: glebius, rscheff Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D34164
|
#
1ebf4607 |
| 03-Feb-2022 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
tcp: Access all 12 TCP header flags via inline function
In order to consistently provide access to all TCP header flag bits (including the reserved ones), use the accessor functions tcp_get_flags() and tcp_set_flags(). Also expand any flag variable from uint8_t / char to uint16_t.
Reviewed By: hselasky, tuexen, glebius, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D34130
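To illustrate the bit arithmetic behind a 12-bit flag accessor, here is a minimal sketch. It assumes, as a simplification, that the four reserved bits live in their own small field next to the classic 8-bit flags byte; struct tcphdr_sketch is hypothetical (not FreeBSD's struct tcphdr), and the signatures of tcp_get_flags()/tcp_set_flags() are assumed for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified header fields: 4 reserved flag bits (th_x2)
 * plus the 8 classic flag bits. Not the real struct tcphdr layout.
 */
struct tcphdr_sketch {
	uint8_t th_x2;    /* low 4 bits: reserved TCP flag bits */
	uint8_t th_flags; /* FIN, SYN, RST, PSH, ACK, URG, ECE, CWR */
};

/* Return all 12 flag bits: reserved bits in 8..11, classic bits in 0..7. */
static inline uint16_t
tcp_get_flags(const struct tcphdr_sketch *th)
{
	return ((uint16_t)(((th->th_x2 & 0x0f) << 8) | th->th_flags));
}

/* Store a 12-bit flag value, splitting it across the two fields. */
static inline void
tcp_set_flags(struct tcphdr_sketch *th, uint16_t flags)
{
	th->th_x2 = (uint8_t)((flags >> 8) & 0x0f);
	th->th_flags = (uint8_t)(flags & 0xff);
}
```

Because both helpers take and return a uint16_t, callers that previously kept flags in a uint8_t or char would silently lose the reserved bits, which is why the commit also widens those variables.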
|
#
3b3c08c1 |
| 02-Feb-2022 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: cleanup functions related to socket option handling
Consistently pass only the inp and the sopt around. Don't pass the so around, since in an upcoming commit tcp_ctloutput_set() will be called from a context different from setsockopt(). Also expect the inp to be locked when calling tcp_ctloutput_[gs]et(); this is also required for the upcoming use by tcpsso, a command line tool to set socket options.
Reviewed by: glebius, rscheff Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D34151
|
#
68e623c3 |
| 27-Jan-2022 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
tcp: Rewind erroneous RTO only while performing RTO retransmissions
Under rare circumstances, a spurious retransmission is incorrectly detected and rewound, messing up various tcpcb values, which can lead to a panic when SACK is in use.
Reviewed By: tuexen, chengc_netapp.com, #transport MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D33979
|
#
89128ff3 |
| 03-Jan-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protocols: init with standard SYSINIT(9) or VNET_SYSINIT
The historical BSD network stack loop that rolls over domains and over protocols has no advantages over more modern SYSINIT(9). While doing the sweep, split global and per-VNET initializers.
Getting rid of pr_init achieves several things:
o Gets rid of ifdefs that protect against a double foo_init() when both INET and INET6 are compiled in.
o Isolates initializers statically to the module they initialize.
o Makes the code easier to understand and maintain.
Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33537
|
#
f64dc2ab |
| 26-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: TCP output method can request tcp_drop
The advanced TCP stacks (bbr, rack) may decide to drop a TCP connection when they do output on it. The default stack never does this, thus the existing framework expects tcp_output() to always return a locked and valid tcpcb.
Provide KPI extension to satisfy demands of advanced stacks. If the output method returns negative error code, it means that caller must call tcp_drop().
In tcp_var.h provide three inline methods to call tcp_output():
- tcp_output() is a drop-in replacement for the default stack, so that the default stack can continue using it internally without modifications. For advanced stacks it performs tcp_drop() and unlock, and reports that with a negative error code.
- tcp_output_unlock() handles the negative code, always converts it to a positive one, and always unlocks.
- tcp_output_nodrop() just calls the method and leaves the responsibility to drop to the caller.
Sweep over the advanced stacks and use new KPI instead of using HPTS delayed drop queue for that.
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33370
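The contract of the three wrappers can be modelled in userland. This is a minimal sketch only: struct tcpcb_sketch, tcp_drop_sketch() and the *_sketch wrapper names are hypothetical stand-ins, and a plain bool replaces the real inpcb lock. It shows how a negative return from the per-stack output method is interpreted by each wrapper.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct tcpcb: lock state, drop state, method. */
struct tcpcb_sketch {
	bool locked;
	bool dropped;
	int  (*output)(struct tcpcb_sketch *); /* per-stack output method */
};

/* Hypothetical tcp_drop(): marks the connection dropped and unlocks it. */
static void
tcp_drop_sketch(struct tcpcb_sketch *tp)
{
	tp->dropped = true;
	tp->locked = false;
}

/* Drop-in wrapper: on a negative return, drop + unlock, pass the code on. */
static int
tcp_output_sketch(struct tcpcb_sketch *tp)
{
	int error = tp->output(tp);

	if (error < 0)
		tcp_drop_sketch(tp); /* tp must not be used by the caller now */
	return (error);
}

/* Always unlocks; a "please drop" code becomes a positive errno. */
static int
tcp_output_unlock_sketch(struct tcpcb_sketch *tp)
{
	int error = tp->output(tp);

	if (error < 0) {
		tcp_drop_sketch(tp);
		error = -error;
	} else
		tp->locked = false;
	return (error);
}

/* Just calls the method; dropping is left entirely to the caller. */
static int
tcp_output_nodrop_sketch(struct tcpcb_sketch *tp)
{
	return (tp->output(tp));
}
```

Encoding "the caller must drop" as the sign of the return value keeps the existing int-returning output method signature, so the default stack, which never returns a negative value, is unaffected.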
|
#
5b08b46a |
| 26-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: welcome back tcp_output() as the right way to run output on tcpcb.
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33365
|
#
2a28b045 |
| 20-Dec-2021 |
Robert Wing <rew@FreeBSD.org> |
tcp_twrespond: send signed segment when connection is TCP-MD5
When a connection is established to use TCP-MD5, tcp_twrespond() doesn't respond with a signed segment. As a result, the host performing the active close remains in the TIME_WAIT state and the other host in the LAST_ACK state. Fix this by sending a signed segment when the connection is established to use TCP-MD5.
Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D33490
|
#
71d2d5ad |
| 19-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcptw: count how many times a tcptw was actually useful
This will allow a sysadmin to lower net.inet.tcp.msl and see how long tcptw entries actually remain useful.
|
#
cb377263 |
| 19-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcptw: remove unused fields
The structure goes away anyway, but it would be interesting to know how much memory we used to save with it. So for the record, structure size with this revision is 64 bytes.
|
#
db0ac6de |
| 02-Dec-2021 |
Cy Schubert <cy@FreeBSD.org> |
Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"
This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.
A mismerge of a merge to catch up to main resulted in files being committed which should not have been.
|
#
266f97b5 |
| 02-Dec-2021 |
Cy Schubert <cy@FreeBSD.org> |
wpa: Import wpa_supplicant/hostapd commit 14ab4a816
This is the November update to vendor/wpa committed upstream 2021-11-26.
MFC after: 1 month
|
#
de2d4784 |
| 02-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
SMR protection for inpcbs
With the introduction of epoch(9) synchronization to the network stack, the inpcb database became protected by the network epoch together with static network data (interfaces, addresses, etc.). However, inpcbs aren't static in nature; they are created and destroyed all the time, which creates some traffic on the epoch(9) garbage collector.
A fairly new feature of uma(9), Safe Memory Reclamation (SMR), allows memory to be freed safely in page-sized batches, with virtually zero overhead compared to uma_zfree(). However, unlike epoch(9), it puts stricter requirements on access to the protected memory, requiring a critical(9) section to access it. Details:
- The database is already built on CK lists, thanks to epoch(9).
- For write access nothing is changed.
- For a lookup in the database an SMR section is now required. Once the desired inpcb is found, we need to transition from the SMR section to an r/w lock on the inpcb itself, with a check that the inpcb isn't yet freed. This requires some complexity, since the SMR section itself is a critical(9) section. The complexity is hidden from KPI users in inp_smr_lock().
- For an inpcb list traversal (a pcblist sysctl, or broadcast notification) a new KPI is also provided that hides the internals of the database: inp_next(struct inp_iterator *).
Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33022
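The lookup step described above (find the inpcb inside the SMR section, switch to the per-inpcb lock, then re-check that it was not freed) can be sketched in userland. Everything here is hypothetical and heavily simplified: a C11 atomic_flag stands in for the inpcb r/w lock, smr_enter_sketch()/smr_exit_sketch() are empty placeholders for the SMR/critical(9) section, and a plain array replaces the CK-list database. It is not the inp_smr_lock() implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-simplified inpcb: a key, a lock and a "freed" flag. */
struct inpcb_sketch {
	int         key;
	bool        freed; /* set while the pcb is being torn down */
	atomic_flag lock;  /* stand-in for the inpcb r/w lock */
};

static bool
inp_trylock_sketch(struct inpcb_sketch *inp)
{
	return (!atomic_flag_test_and_set(&inp->lock));
}

static void
inp_unlock_sketch(struct inpcb_sketch *inp)
{
	atomic_flag_clear(&inp->lock);
}

static void smr_enter_sketch(void) { /* would enter the SMR section */ }
static void smr_exit_sketch(void) { /* would leave the SMR section */ }

static struct inpcb_sketch *
inp_lookup_sketch(struct inpcb_sketch *tbl, size_t n, int key)
{
	struct inpcb_sketch *inp = NULL;

	smr_enter_sketch();
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].key == key) {
			inp = &tbl[i];
			break;
		}
	}
	/* An SMR section must not sleep, so only a trylock is attempted. */
	if (inp != NULL && !inp_trylock_sketch(inp))
		inp = NULL; /* contended case: not handled in this sketch */
	smr_exit_sketch();

	/* Holding the pcb lock now: reject it if it was freed meanwhile. */
	if (inp != NULL && inp->freed) {
		inp_unlock_sketch(inp);
		inp = NULL;
	}
	return (inp);
}
```

The freed re-check is what makes the transition safe: SMR only guarantees the memory stays valid while inside the section, not that the object is still live once the lock is held.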
|
#
ff945008 |
| 19-Nov-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Add tcp_freecb() - single place to free tcpcb.
Until this change there were two places where we would free a tcpcb: tcp_discardcb() in case all timers were drained, and tcp_timer_discard() otherwise. They were pretty much copy-and-paste, except that in the default case we would run tcp_hc_update(). Merge these into a single function, tcp_freecb(), and move the new short version of tcp_timer_discard() to tcp_timer.c and make it static.
Reviewed by: rrs, hselasky Differential revision: https://reviews.freebsd.org/D32965
|
#
f581a26e |
| 26-Oct-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Factor out tcp6_use_min_mtu() to handle IPV6_USE_MIN_MTU by TCP.
Pass control for IP/IP6 level options from generic tcp_ctloutput_set() down to per-stack ctloutput.
Call tcp6_use_min_mtu() from the default TCP stack's tcp_default_ctloutput().
Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D32655
|
#
e2833083 |
| 26-Oct-2021 |
Peter Lei <peterlei@netflix.com> |
tcp: socket option to get stack alias name
TCP stack sysctl nodes are currently inserted using the stack name alias. Allow the user to get the current stack's alias to allow for programmatic sysctl access.
Obtained from: Netflix
|
#
a36230f7 |
| 01-Oct-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Make dsack stats available in netstat and also make sure it's aware of TLPs.
DSACK accounting has for quite some time been under a NETFLIX_STATS ifdef. Statistics on DSACKs, however, are very useful in figuring out how many bad retransmissions you are doing. This is further complicated by stacks that do TLP: a TLP that discovers a lost ACK on the reverse path will cause the generation of a DSACK. For this situation we introduce a new dsack-tlp-bytes counter as well as the more traditional dsack-bytes and dsack-packets. These will now all be displayed by netstat -p tcp -s. This also updates all stacks that are currently built to keep track of these stats.
Reviewed by: tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32158
|
#
e3e7d953 |
| 15-Sep-2021 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
tcp: Avoid division by zero when KERN_TLS is enabled in tcp_account_for_send().
If the "len" variable is non-zero, we can assume that the sum of "tp->t_snd_rxt_bytes + tp->t_sndbytes" is also non-zero.
It is also assumed that the 64-bit byte counters will never wrap around.
Differential Revision: https://reviews.freebsd.org/D31959 Reviewed by: gallatin, rrs and tuexen Found by: "I told you so", also called hselasky MFC after: 1 week Sponsored by: NVIDIA Networking
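The reasoning above (a non-zero len implies a non-zero counter sum) can be shown with a small sketch. The counter names t_sndbytes and t_snd_rxt_bytes come from the commit; struct tcpcb_sketch, account_for_send_sketch(), the rexmit parameter, and the interpretation of t_sndbytes as "new data only" are assumptions made for illustration, not the tcp_account_for_send() code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of tcpcb byte counters (names from the commit). */
struct tcpcb_sketch {
	uint64_t t_sndbytes;      /* assumed here: bytes of new data sent */
	uint64_t t_snd_rxt_bytes; /* bytes retransmitted */
};

/*
 * Account for a send of len bytes and return the retransmit percentage.
 * Dividing only when len != 0 is safe: after adding len to one counter,
 * the sum is at least len (assuming the 64-bit counters never wrap).
 */
static unsigned
account_for_send_sketch(struct tcpcb_sketch *tp, uint32_t len, bool rexmit)
{
	if (len == 0)
		return (0);
	if (rexmit)
		tp->t_snd_rxt_bytes += len;
	else
		tp->t_sndbytes += len;
	return ((unsigned)((tp->t_snd_rxt_bytes * 100) /
	    (tp->t_snd_rxt_bytes + tp->t_sndbytes)));
}
```

For example, after 900 new bytes and then a 100-byte retransmission, the second call returns 100 * 100 / 1000 = 10.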
|
#
739de953 |
| 06-Aug-2021 |
Andrew Gallatin <gallatin@FreeBSD.org> |
ktls: Move KERN_TLS ifdef to tcp_var.h
This allows us to remove stubs in ktls.h and allows us to sort the function prototypes.
Reviewed by: jhb Sponsored by: Netflix
|
#
ca1a7e10 |
| 13-Jul-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: TCP_LRO getting bad checksums and sending them into TCP incorrectly.
In reviewing tcp_lro.c we found a possibility that some drivers may send an mbuf into LRO without making sure that the checksum passes. Some drivers are aware of this and do not call LRO when the checksum check failed; others do not, and thus could end up sending data up that we think has a passing checksum when it does not.
This change fixes that situation by properly verifying that the mbuf has the correct markings (the CSUM VALID bits set, and the csum in the mbuf header set to 0xffff).
Reviewed by: tuexen, hselasky, gallatin Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D31155
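The check described above combines two conditions, and a minimal sketch makes that explicit. It assumes the "CSUM VALID bits" correspond to a data-valid flag plus a pseudo-header flag; the SK_CSUM_* values and struct pkthdr_sketch are illustrative placeholders, not the real constants or mbuf header from FreeBSD's sys/mbuf.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative placeholder flags; not the real CSUM_* values. */
#define SK_CSUM_DATA_VALID 0x01u
#define SK_CSUM_PSEUDO_HDR 0x02u

/* Hypothetical subset of the mbuf packet header. */
struct pkthdr_sketch {
	uint32_t csum_flags;
	uint16_t csum_data;
};

/*
 * Treat a segment as checksum-verified only if the driver set both
 * valid bits AND the checksum value in the header is 0xffff.
 */
static bool
lro_csum_ok_sketch(const struct pkthdr_sketch *ph)
{
	const uint32_t need = SK_CSUM_DATA_VALID | SK_CSUM_PSEUDO_HDR;

	return ((ph->csum_flags & need) == need && ph->csum_data == 0xffff);
}
```

Requiring both the flags and the 0xffff value means a driver that marks a packet as offloaded but leaves a failing checksum in csum_data is no longer trusted.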
|
#
28d0a740 |
| 06-Jul-2021 |
Andrew Gallatin <gallatin@FreeBSD.org> |
ktls: auto-disable ifnet (inline hw) kTLS
Ifnet (inline) hw kTLS NICs typically keep state within a TLS record, so that when transmitting in-order, they can continue encryption on each segment sent without DMA'ing extra state from the host.
This breaks down when transmits are out of order (e.g., TCP retransmits). In this case, the NIC must re-DMA the entire TLS record up to and including the segment being retransmitted. This means that when retransmitting the last 1448-byte segment of a TLS record, the NIC will have to re-DMA the entire 16KB TLS record. This can lead to the NIC running out of PCIe bus bandwidth well before it saturates the network link if a lot of TCP connections have a high retransmit rate.
This change introduces a new sysctl (kern.ipc.tls.ifnet_max_rexmit_pct); TCP connections with a retransmit rate higher than this are switched to SW kTLS so as to conserve PCIe bandwidth.
Reviewed by: hselasky, markj, rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D30908
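The cost above is easy to quantify: retransmitting the final 1448-byte segment of a 16384-byte record means re-DMAing 16384 bytes, about 16384 / 1448 ≈ 11.3 times the bytes actually put on the wire. The per-connection decision can then be sketched as a threshold comparison. ktls_prefer_sw_sketch() and its parameters are hypothetical, and treating sent_bytes as all bytes sent (including retransmits) is an assumption; this is not the kernel's implementation of kern.ipc.tls.ifnet_max_rexmit_pct.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical policy check: prefer software kTLS once the connection's
 * retransmit percentage exceeds max_pct.
 */
static bool
ktls_prefer_sw_sketch(uint64_t sent_bytes, uint64_t rexmit_bytes,
    unsigned max_pct)
{
	if (sent_bytes == 0)
		return (false); /* nothing sent yet: no basis for a decision */
	return (rexmit_bytes * 100 / sent_bytes > max_pct);
}
```

With max_pct = 10, a connection that retransmitted 200 of 1000 bytes (20%) would be moved to software, while one at exactly 10% would stay on the NIC.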
|
#
9e4d9e4c |
| 25-Jun-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Preparation for allowing hardware TLS to be able to kick a tcp connection that is retransmitting too much out of hardware and back to software.
Hardware TLS is now supported by some interface cards, and it works well, except that connections that retransmit a lot get us into trouble with all the retransmits. This prep step makes way for a change that Drew will be making so that we can "kick out" a session from hardware TLS.
Reviewed by: mtuexen, gallatin Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30895
|
#
67e89281 |
| 10-Jun-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Mbuf leak while holding a socket buffer lock.
When running at NF the current Rack and BBR changes, together with the recent commits from Richard that cause the socket buffer lock to be held over the ip_output() call and finally culminate in a call to tcp_handle_wakeup(), we get a lot of leaked mbufs. I don't think that this leak is actually caused by holding the lock or by what Richard has done; rather, it is exposing some other bug that has probably been lying dormant for a long time. I will continue to look (using his changes) at what is going on to try to root-cause the issue.
In the meantime I can't leave the leaks in place for everyone else. So this commit reverts all of Richard's changes and moves both Rack and BBR back to just doing the old sorwakeup_locked() calls after messing with the so_rcv buffer.
We may want to look at adding Richard's changes back in after I have pinpointed the root cause of the mbuf leak and fixed it.
Reviewed by: mtuexen,rscheff Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30704
|
#
032bf749 |
| 21-May-2021 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
[tcp] Keep socket buffer locked until upcall
r367492 would unlock the socket buffer before eventually calling the upcall. This leads to problematic interaction with NFS kernel server/client components (MP threads) accessing the socket buffer with potentially not correctly updated state.
Reported by: rmacklem Reviewed By: tuexen, #transport Tested by: rmacklem, otis MFC after: 2 weeks Sponsored By: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29690
|