#
f64dc2ab |
| 26-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: TCP output method can request tcp_drop
The advanced TCP stacks (bbr, rack) may decide to drop a TCP connection when they do output on it. The default stack never does this, thus existing framework expects tcp_output() always to return locked and valid tcpcb.
Provide KPI extension to satisfy demands of advanced stacks. If the output method returns negative error code, it means that caller must call tcp_drop().
In tcp_var.h provide three inline methods to call tcp_output():
- tcp_output() is a drop-in replacement for the default stack, so that the default stack can continue using it internally without modifications. For advanced stacks it performs tcp_drop(), unlocks, and reports that with a negative error code.
- tcp_output_unlock() handles the negative code, always converts it to positive, and always unlocks.
- tcp_output_nodrop() just calls the method and leaves the responsibility to drop to the caller.
Sweep over the advanced stacks and use new KPI instead of using HPTS delayed drop queue for that.
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33370
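The calling convention described above can be sketched in user space. This is only an illustration under stated assumptions: `struct mock_tcpcb`, `mock_drop()` and the `mock_output*()` names are hypothetical stand-ins, the lock is modeled as a boolean, and the mock drop always releases the lock, which simplifies whatever the kernel's tcp_drop() does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in; not the kernel's struct tcpcb. */
struct mock_tcpcb {
	bool locked;	/* models the connection lock held by the caller */
	bool dropped;	/* set once mock_drop() has run */
	int so_error;	/* error recorded at drop time */
	int (*output)(struct mock_tcpcb *);	/* per-stack output method */
};

/* Models a drop: record the error, mark dropped, release the lock. */
static void
mock_drop(struct mock_tcpcb *tp, int error)
{
	tp->so_error = error;
	tp->dropped = true;
	tp->locked = false;
}

/* Like tcp_output(): on a negative code, drop and unlock, report it as-is. */
static int
mock_output(struct mock_tcpcb *tp)
{
	int rv;

	assert(tp->locked);
	rv = tp->output(tp);
	if (rv < 0)
		mock_drop(tp, -rv);
	return (rv);
}

/* Like tcp_output_unlock(): always unlocks, never returns a negative code. */
static int
mock_output_unlock(struct mock_tcpcb *tp)
{
	int rv;

	assert(tp->locked);
	rv = tp->output(tp);
	if (rv < 0) {
		rv = -rv;
		mock_drop(tp, rv);
	} else
		tp->locked = false;
	return (rv);
}

/* Like tcp_output_nodrop(): the caller must act on a negative code. */
static int
mock_output_nodrop(struct mock_tcpcb *tp)
{
	assert(tp->locked);
	return (tp->output(tp));
}

/* Sample output methods: one never drops, one requests a drop. */
static int
output_ok(struct mock_tcpcb *tp)
{
	(void)tp;
	return (0);
}

static int
output_wants_drop(struct mock_tcpcb *tp)
{
	(void)tp;
	return (-ECONNRESET);
}
```

In this sketch a caller that gets a negative value back from mock_output() must not touch the connection again, while a caller of mock_output_nodrop() still holds the lock and has to perform the drop itself.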
|
#
dbbcc777 |
| 26-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
rack: rack_do_compressed_ack_processing() can call tcp_drop()
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33369
|
#
66aeb0b5 |
| 26-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
rack: drop connection synchronously, when we can
For all functions that are leaves of tcp_input(), call ctf_do_dropwithreset_conn() instead of ctf_do_dropwithreset(), because we always have tp and we want it to be dropped.
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33368
|
#
40fa3e40 |
| 26-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: mechanically substitute call to tfb_tcp_output to new method.
Made with sed(1) execution:
sed -Ef sed -i "" $(grep --exclude tcp_var.h -lr tcp_output sys/)
sed:
s/tp->t_fb->tfb_tcp_output\(tp\)/tcp_output(tp)/
s/to tfb_tcp_output\(\)/to tcp_output()/
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33366
|
#
9b602965 |
| 15-Dec-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Rack in a rare case we can get stuck sending a very small amount.
If a TLP sending new data fails, and then the peer starts talking to us again, we can be in a situation where the tlp_new_data count is set, we are not in recovery, and we always send one packet every RTT. The failure has to occur in ip_output() when we initially send the TLP, which is rare. But if it occurs the connection is basically stuck.
This fixes it so that we use the new_data count and then clear it, so we know it will be cleared. If a failure occurs, the TLP timer will regenerate a new amount anyway, so there is no need to carry the value forward.
Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D33325
|
#
dadbc042 |
| 06-Dec-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: rack fails to send out a TLP after a MTU change
When rack sends out a TLP it sets up various state to make sure it avoids the cwnd (it's been more than 1 RTT since our last send), and it may at times send new data. If an MTU change has occurred and our cwnd has collapsed, we can have a situation where the must_retran flag is set and we obey the cwnd, thus never sending the TLP and then sitting stuck.
This one-line fix addresses that problem.
Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D33231
|
#
db0ac6de |
| 02-Dec-2021 |
Cy Schubert <cy@FreeBSD.org> |
Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"
This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.
A mismerge of a merge to catch up to main resulted in files being committed which should not have been.
|
#
266f97b5 |
| 02-Dec-2021 |
Cy Schubert <cy@FreeBSD.org> |
wpa: Import wpa_supplicant/hostapd commit 14ab4a816
This is the November update to vendor/wpa committed upstream 2021-11-26.
MFC after: 1 month
|
#
f971e791 |
| 02-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp_hpts: rename input queue to drop queue and trim dead code
The HPTS input queue is in reality used only for "delayed drops". When a TCP stack decides to drop a connection on the output path it can't do that due to locking protocol between main tcp_output() and stacks. So, rack/bbr utilize HPTS to drop the connection in a different context.
In the past the queue could also process input packets in context of HPTS thread, but now no stack uses this, so remove this functionality.
Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33025
|
#
50f081ec |
| 02-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp_hpts: provide tcp_in_hpts().
It will hide some internal HPTS knowledge from the consumers.
Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33023
|
Revision tags: release/12.3.0 |
|
#
1dadeab3 |
| 30-Nov-2021 |
Gordon Bergling <gbe@FreeBSD.org> |
netinet: Fix a common typo in source code comments
- s/segement/segment/
MFC after: 3 days
|
#
97e28f0f |
| 17-Nov-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Rack ack war with a mis-behaving firewall or nat with resets.
Previously we added ack-war prevention for misbehaving firewalls. This is where the f/w or NAT messes up its sequence numbers and causes an ack-war. There is yet another type of ack war that we have found in the wild that is similar to this. Basically the f/w or NAT gets an ack (keep-alive probe or such) and, instead of turning the ack/seq around and adding a TH_RST, it does something really stupid and sends a new packet with seq=0. This of course triggers the challenge ack in the reset processing, which then sends a challenge ack (if the seq=0 is within the range of possible sequence numbers allowed by the challenge), and then we rinse and repeat.
This will add the needed tweaks (similar to the last ack-war prevention using the same sysctls and counters) to prevent it and allow say 5 per second by default.
Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32938
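The "allow say 5 per second" idea can be sketched as a fixed-window counter. This is a minimal illustration, not the rack code: the structure, field names and function name are hypothetical, and the real change is driven by existing sysctls and counters that are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection limiter state. */
struct ackwar_limit {
	uint32_t window_ms;	/* length of the counting window */
	uint32_t max_acks;	/* challenge acks allowed per window */
	uint32_t window_start;	/* ms timestamp when the window opened */
	uint32_t sent;		/* acks sent in the current window */
};

/* Returns true if one more challenge ack may be sent at time now_ms. */
static bool
ackwar_may_send(struct ackwar_limit *l, uint32_t now_ms)
{
	assert(l->window_ms > 0);
	/* Unsigned subtraction stays correct across timestamp wrap. */
	if ((uint32_t)(now_ms - l->window_start) >= l->window_ms) {
		l->window_start = now_ms;
		l->sent = 0;
	}
	if (l->sent >= l->max_acks)
		return (false);
	l->sent++;
	return (true);
}
```

With window_ms = 1000 and max_acks = 5, a peer that keeps sending seq=0 packets gets at most five challenge acks per window, which breaks the loop without disabling challenge acks entirely.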
|
#
26cbd002 |
| 11-Nov-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Rack may still calculate long RTT on persists probes.
When a persists probe is lost, we end up calculating a long RTT based on the initial probe when the response comes from the second (or third, etc.) probe. This means we have a confidence level of at least 3 on an incorrect probe. This commit changes it so that we have one of two options: a) just do not count the RTT of probes where we had a loss, <or> b) still count them but degrade the confidence to 0.
I have set the default to just not measure them, but I am open to having the default be otherwise.
Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32897
|
#
477aeb3d |
| 08-Nov-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Printf should be removed.
There is a printf when a socket option passed down to the CC module fails; this really should not be a printf. In fact this whole option needs to be re-thought in coordination with some other changes in the CC modules (it's just not right, but what it does here if it fails is OK, since it will just use the ECN beta).
Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32894
|
#
c28e39c3 |
| 03-Nov-2021 |
Gordon Bergling <gbe@FreeBSD.org> |
Fix a common typo in sysctl descriptions
- s/maxiumum/maximum/
MFC after: 3 days
|
#
141a53cd |
| 29-Oct-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Rack might retransmit forever.
If we get a Sacked peer with an MTU change we can retransmit forever if the last bytes are sacked and the client goes away (think power off). Then we never see the end condition and continually retransmit.
Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32671
|
#
aeda8527 |
| 29-Oct-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Rack at times can miscalculate the RTT from what it thinks is a persists probe response.
Turns out that if a peer sends in a window update right after rack fires off a persists probe, we can mis-interpret the window update and calculate a bogus RTT (very short). We still process the window update and send the data but we incorrectly generate an RTT. We should be only doing the RTT stuff if the rwnd is still small and has not changed.
Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32717
|
#
5d3bf5b1 |
| 26-Oct-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
rack: Update the fast send block on setsockopt(2)
Rack caches TCP/IP header for fast send, so it doesn't call tcpip_fillheaders(). After certain socket option changes, namely IPV6_TCLASS, IP_TOS and IP_TTL it needs to update its fast block to be in sync with the inpcb.
Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D32655
|
#
f581a26e |
| 26-Oct-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Factor out tcp6_use_min_mtu() to handle IPV6_USE_MIN_MTU by TCP.
Pass control for IP/IP6 level options from generic tcp_ctloutput_set() down to per-stack ctloutput.
Call tcp6_use_min_mtu() from tcp stack tcp_default_ctloutput().
Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D32655
|
#
12752978 |
| 26-Oct-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: The rack stack can incorrectly have an overflow when calculating a burst delay.
If the congestion window is very large, the fact that we multiply it by 1000 (for microseconds) can cause the uint32_t to overflow, and we incorrectly calculate a very small divisor. This will then cause the burst timer to be very large when it should be 0. Instead, let's make the three variables uint64_t and avoid the issue.
Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32668
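The wraparound is easy to reproduce in isolation. The sketch below only shows the multiply-by-1000 step; the function names and the 5,000,000-byte window are hypothetical and the actual burst-delay formula in rack is not reproduced.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_SCALE 1000u	/* the "multiply by 1000" from the message */

/* 32-bit product: wraps modulo 2^32 once cwnd exceeds 4294967. */
static uint32_t
scale_u32(uint32_t cwnd)
{
	return ((uint32_t)(cwnd * USEC_SCALE));
}

/* 64-bit product: widen before multiplying so nothing is lost. */
static uint64_t
scale_u64(uint32_t cwnd)
{
	return ((uint64_t)cwnd * USEC_SCALE);
}

/* Self-check with a hypothetical 5,000,000-byte congestion window. */
static void
check_scaling(void)
{
	/* 5,000,000,000 mod 2^32 = 705,032,704: far too small. */
	assert(scale_u32(5000000u) == 705032704u);
	assert(scale_u64(5000000u) == 5000000000ull);
}
```

Any value derived from the 32-bit product is then several times smaller than intended, which matches the "very small divisor" described in the message.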
|
#
4e4c84f8 |
| 22-Oct-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Add hystart-plus to cc_newreno and rack.
TCP Hystart draft version -03: https://datatracker.ietf.org/doc/html/draft-ietf-tcpm-hystartplusplus
Is a new version of hystart that allows one to carefully exit slow start if the RTT spikes too much. The newer version has a slower slow start, so to speak, that then kicks in for five round trips to see if you exited too early; if not, it moves into congestion avoidance. This commit adds that feature to our newreno CC and adds the needed bits in rack to be able to enable it.
Reviewed by: tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32373
|
#
a36230f7 |
| 01-Oct-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Make dsack stats available in netstat and also make sure its aware of TLP's.
DSACK accounting has been for quite some time under a NETFLIX_STATS ifdef. Statistics on DSACKs, however, are very useful in figuring out how many bad retransmissions you are doing. This is further complicated by stacks that do TLP. A TLP, when discovering a lost ack in the reverse path, will cause the generation of a DSACK. For this situation we introduce a new dsack-tlp-bytes as well as the more traditional dsack-bytes and dsack-packets. These will now all display in netstat -p tcp -s. This also updates all stacks that are currently built to keep track of these stats.
Reviewed by: tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32158
|
#
1ca931a5 |
| 23-Sep-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Rack compressed ack path updates the recv window too easily
The compressed ack path of rack is not following proper procedures in updating the peer's window. It should be checking the seq and ack values before updating, and instead it is blindly updating the values. This could in theory get the wrong window in the connection for some length of time.
Reviewed by: tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32082
|
#
fd69939e |
| 23-Sep-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Two bugs in rack one of which can lead to a panic.
In extensive testing in NF we have found two issues inside the rack stack.
1) An incorrect offset is being generated by the fast send path when a fast send is initiated on the end of the socket buffer and, before the fast send runs, the sb_compress macro adds data to the trailing socket. This fools the fast send code into thinking the sb offset changed, and it miscalculates an "updated offset". It should only do that when the mbuf in question got smaller, i.e. an ack was processed. This can lead to a panic deref'ing a NULL mbuf if that packet is ever retransmitted. In the best case it leads to invalid data being sent to the client, which usually terminates the connection. The fix is to have the proper logic (that is in the rsm fast path) to make sure we only update the offset when the mbuf shrinks.
2) The other issue is more bothersome. The timestamp check in rack needs to use the msec timestamp when comparing the timestamp echo to now. It was using a microsecond timestamp, which ends up giving error-prone results but causes only small harm in trying to identify which send to use in RTT calculations if it's a retransmit.
Reviewed by: tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D32062
|
#
5baf32c9 |
| 17-Aug-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Add support for DSACK based reordering window to rack.
The rack stack, with respect to the rack bits in it, was originally built based on an early I-D of rack. In fact at that time the TLP bits were in a separate I-D. The dynamic reordering window based on DSACK events was not present in rack at that time. It is now part of the RFC and we need to update our stack to include these features. However we want to have a way to control the feature so that we can, if the admin decides, make it stay the same way system wide as well as via socket option. The new sysctl and socket option has the following meaning for setting:
00 (0) - Keep the old way, i.e. reordering window is 1 and do not use DSACK bytes to add to reorder window
01 (1) - Change the reordering window to 1/4 of an RTT but do not use DSACK bytes to add to reorder window
10 (2) - Keep the reordering window as 1, but do use DSACK bytes to add additional 1/4 RTT delay to the reorder window
11 (3) - Reordering window is 1/4 of an RTT and add additional DSACK bytes to increase the reordering window (RFC behavior)
The default currently in the sysctl is 3 so we get standards based behavior.
Reviewed by: tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D31506
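The four settings read naturally as two independent bits. A minimal decoding sketch follows; the macro, structure and function names are hypothetical and are not the identifiers used in rack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for the two bits of the setting. */
#define REORD_WND_QUARTER_RTT	0x1u	/* bit 0: window is 1/4 RTT, not 1 */
#define REORD_USE_DSACK		0x2u	/* bit 1: DSACK bytes extend window */

struct reord_policy {
	bool quarter_rtt_window;
	bool dsack_extends_window;
};

/* Split a setting in the range 0..3 into its two independent choices. */
static struct reord_policy
reord_decode(uint32_t setting)
{
	struct reord_policy p;

	assert(setting <= 3);	/* only values 0..3 are described */
	p.quarter_rtt_window = (setting & REORD_WND_QUARTER_RTT) != 0;
	p.dsack_extends_window = (setting & REORD_USE_DSACK) != 0;
	return (p);
}
```

Under this reading the default of 3 enables both behaviors, and 0 disables both to keep the old way.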
|