#
b1e806c0 |
| 07-Jul-2021 |
Andrew Gallatin <gallatin@FreeBSD.org> |
tcp: fix alternate stack build with LINT-NO{INET,INET6,IP}
When fixing another bug, I noticed that the alternate TCP stacks do not build when various combinations of ipv4 and ipv6 are disabled.
Rev
tcp: fix alternate stack build with LINT-NO{INET,INET6,IP}
When fixing another bug, I noticed that the alternate TCP stacks do not build when various combinations of ipv4 and ipv6 are disabled.
Reviewed by: rrs, tuexen Differential Revision: https://reviews.freebsd.org/D31094 Sponsored by: Netflix
show more ...
|
#
d7955cc0 |
| 06-Jul-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: HPTS performance enhancements
HPTS drives both rack and bbr, and yet there have been many complaints about performance. This bit of work restructures hpts to help reduce CPU overhead. It does t
tcp: HPTS performance enhancements
HPTS drives both rack and bbr, and yet there have been many complaints about performance. This bit of work restructures hpts to help reduce CPU overhead. It does this by now instead of relying on the timer/callout to drive it instead use user return from a system call as well as lro flushes to drive hpts. The timer becomes a backstop that dynamically adjusts based on how "late" we are.
Reviewed by: tuexen, glebius Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D31083
show more ...
|
#
e834f9a4 |
| 06-Jul-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Address goodput and TLP edge cases.
There are several cases where we make a goodput measurement and we are running out of data when we decide to make the measurement. In reality we should not m
tcp: Address goodput and TLP edge cases.
There are several cases where we make a goodput measurement and we are running out of data when we decide to make the measurement. In reality we should not make such a measurement if there is no chance we can have "enough" data. There is also some corner case TLP's that end up not registering as a TLP like they should, we fix this by pushing the doing_tlp setup to the actual timeout that knows it did a TLP. This makes it so we always have the appropriate flag on the sendmap indicating a TLP being done as well as count correctly so we make no more that two TLP's.
In addressing the goodput lets also add a "quality" metric that can be viewed via blackbox logs so that a casual observer does not have to figure out how good of a measurement it is. This is needed due to the fact that we may still make a measurement that is of a poorer quality as we run out of data but still have a minimal amount of data to make a measurement.
Reviewed by: tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D31076
show more ...
|
#
9e4d9e4c |
| 25-Jun-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Preparation for allowing hardware TLS to be able to kick a tcp connection that is retransmitting too much out of hardware and back to software.
Hardware TLS is now supported in some interface c
tcp: Preparation for allowing hardware TLS to be able to kick a tcp connection that is retransmitting too much out of hardware and back to software.
Hardware TLS is now supported in some interface cards and it works well. Except that when we have connections that retransmit a lot we get into trouble with all the retransmits. This prep step makes way for change that Drew will be making so that we can "kick out" a session from hardware TLS.
Reviewed by: mtuexen, gallatin Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30895
show more ...
|
#
66aec14a |
| 24-Jun-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Rack not being very friendly with V6:4 socket and having a connection from V4
There were two bugs that prevented V4 sockets from connecting to a rack server running a V4/V6 socket. As well as a
tcp: Rack not being very friendly with V6:4 socket and having a connection from V4
There were two bugs that prevented V4 sockets from connecting to a rack server running a V4/V6 socket. As well as a bug that stops the mapped v4 in V6 address from working.
Reviewed by: mtuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30885
show more ...
|
#
f1536bb5 |
| 11-Jun-2021 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: remove debug output from RACK
Reported by: iron.udjin@gmail.com, Marek Zarychta Reviewed by: rrs PR: 256538 MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D30723 Spon
tcp: remove debug output from RACK
Reported by: iron.udjin@gmail.com, Marek Zarychta Reviewed by: rrs PR: 256538 MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D30723 Sponsored by: Netflix, Inc.
show more ...
|
#
ba1b3e48 |
| 11-Jun-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Missing mfree in rack and bbr
Recently (Nov) we added logic that protects against a peer negotiating a timestamp, and then not including a timestamp. This involved in the input path doing a got
tcp: Missing mfree in rack and bbr
Recently (Nov) we added logic that protects against a peer negotiating a timestamp, and then not including a timestamp. This involved in the input path doing a goto done_with_input label. Now I suspect the code was cribbed from one in Rack that has to do with the SYN. This had a bug, i.e. it should have a m_freem(m) before going to the label (bbr had this missing m_freem() but rack did not). This then caused the missing m_freem to show up in both BBR and Rack. Also looking at the code referencing m->m_pkthdr.lro_nsegs later (after processing) is not a good idea, even though its only for logging. Best to copy that off before any frees can take place.
Reviewed by: mtuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30727
show more ...
|
#
224cf7b3 |
| 11-Jun-2021 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: fix compilation of IPv4-only builds
PR: 256538 Reported by: iron.udjin@gmail.com MFC after: 3 days Sponsored by: Netflix, Inc.
|
#
67e89281 |
| 10-Jun-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Mbuf leak while holding a socket buffer lock.
When running at NF the current Rack and BBR changes with the recent commits from Richard that cause the socket buffer lock to be held over the ip_o
tcp: Mbuf leak while holding a socket buffer lock.
When running at NF the current Rack and BBR changes with the recent commits from Richard that cause the socket buffer lock to be held over the ip_output() call and then finally culminating in a call to tcp_handle_wakeup() we get a lot of leaked mbufs. I don't think that this leak is actually caused by holding the lock or what Richard has done, but is exposing some other bug that has probably been lying dormant for a long time. I will continue to look (using his changes) at what is going on to try to root cause out the issue.
In the meantime I can't leave the leaks out for everyone else. So this commit will revert all of Richards changes and move both Rack and BBR back to just doing the old sorwakeup_locked() calls after messing with the so_rcv buffer.
We may want to look at adding back in Richards changes after I have pinpointed the root cause of the mbuf leak and fixed it.
Reviewed by: mtuexen,rscheff Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30704
show more ...
|
#
4747500d |
| 04-Jun-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: A better fix for the previously attempted fix of the ack-war issue with tcp.
So it turns out that my fix before was not correct. It ended with us failing some of the "improved" SYN tests, since
tcp: A better fix for the previously attempted fix of the ack-war issue with tcp.
So it turns out that my fix before was not correct. It ended with us failing some of the "improved" SYN tests, since we are not in the correct states. With more digging I have figured out the root of the problem is that when we receive a SYN|FIN the reassembly code made it so we create a segq entry to hold the FIN. In the established state where we were not in order this would be correct i.e. a 0 len with a FIN would need to be accepted. But if you are in a front state we need to strip the FIN so we correctly handle the ACK but ignore the FIN. This gets us into the proper states and avoids the previous ack war.
I back out some of the previous changes but then add a new change here in tcp_reass() that fixes the root cause of the issue. We still leave the rack panic fixes in place however.
Reviewed by: mtuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30627
show more ...
|
#
8c69d988 |
| 27-May-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: When we have an out-of-order FIN we do want to strip off the FIN bit.
The last set of commits fixed both a panic (in rack) and an ACK-war (in freebsd and bbr). However there was a missing case,
tcp: When we have an out-of-order FIN we do want to strip off the FIN bit.
The last set of commits fixed both a panic (in rack) and an ACK-war (in freebsd and bbr). However there was a missing case, i.e. where we get an out-of-order FIN by itself. In such a case we don't want to leave the FIN bit set, otherwise we will do the wrong thing and ack the FIN incorrectly. Instead we need to go through the tcp_reasm() code and that way the FIN will be stripped and all will be well.
Reviewed by: mtuexen,rscheff Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30497
show more ...
|
#
4f3addd9 |
| 26-May-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Add a socket option to rack so we can test various changes to the slop value in timers.
Timer_slop, in TCP, has been 200ms for a long time. This value dates back a long time when delayed ack ti
tcp: Add a socket option to rack so we can test various changes to the slop value in timers.
Timer_slop, in TCP, has been 200ms for a long time. This value dates back a long time when delayed ack timers were longer and links were slower. A 200ms timer slop allows 1 MSS to be sent over a 60kbps link. Its possible that lowering this value to something more in line with todays delayed ack values (40ms) might improve TCP. This bit of code makes it so rack can, via a socket option, adjust the timer slop.
Reviewed by: mtuexen Sponsered by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30249
show more ...
|
#
13c0e198 |
| 25-May-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Fix bugs related to the PUSH bit and rack and an ack war
Michaels testing with UDP tunneling found an issue with the push bit, which was only partly fixed in the last commit. The problem is the
tcp: Fix bugs related to the PUSH bit and rack and an ack war
Michaels testing with UDP tunneling found an issue with the push bit, which was only partly fixed in the last commit. The problem is the left edge gets transmitted before the adjustments are done to the send_map, this means that right edge bits must be considered to be added only if the entire RSM is being retransmitted.
Now syzkaller also continued to find a crash, which Michael sent me the reproducer for. Turns out that the reproducer on default (freebsd) stack made the stack get into an ack-war with itself. After fixing the reference issues in rack the same ack-war was found in rack (and bbr). Basically what happens is we go into the reassembly code and lose the FIN bit. The trick here is we should not be going into the reassembly code if tlen == 0 i.e. the peer never sent you anything. That then gets the proper action on the FIN bit but then you end up in LAST_ACK with no timers running. This is because the usrclosed function gets called and the FIN's and such have already been exchanged. So when we should be entering FIN_WAIT2 (or even FIN_WAIT1) we get stuck in LAST_ACK. Fixing this means tweaking the usrclosed function so that we properly recognize the condition and drop into FIN_WAIT2 where a timer will allow at least TP_MAXIDLE before closing (to allow time for the peer to retransmit its FIN if the ack is lost). Setting the fast_finwait2 timer can speed this up in testing.
Reviewed by: mtuexen,rscheff Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30451
show more ...
|
#
9bbd1a8f |
| 24-May-2021 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: fix a RACK socket buffer lock issue
Fix a missing socket buffer unlocking of the socket receive buffer.
Reviewed by: gallatin, rrs MFC after: 1 week Sponsored by: Netflix, Inc. Differential
tcp: fix a RACK socket buffer lock issue
Fix a missing socket buffer unlocking of the socket receive buffer.
Reviewed by: gallatin, rrs MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D30402
show more ...
|
#
631449d5 |
| 24-May-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Fix an issue with the PUSH bit as well as fill in the missing mtu change for fsb's
The push bit itself was also not actually being properly moved to the right edge. The FIN bit was incorrectly
tcp: Fix an issue with the PUSH bit as well as fill in the missing mtu change for fsb's
The push bit itself was also not actually being properly moved to the right edge. The FIN bit was incorrectly on the left edge. We fix these two issues as well as plumb in the mtu_change for alternate stacks.
Reviewed by: mtuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30413
show more ...
|
#
8923ce63 |
| 22-May-2021 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: Handle stack switch while processing socket options
Handle the case where during socket option processing, the user switches a stack such that processing the stack specific socket option does n
tcp: Handle stack switch while processing socket options
Handle the case where during socket option processing, the user switches a stack such that processing the stack specific socket option does not make sense anymore. Return an error in this case.
MFC after: 1 week Reviewed by: markj Reported by: syzbot+a6e1d91f240ad5d72cd1@syzkaller.appspotmail.com Sponsored by: Netflix, Inc. Differential revision: https://reviews.freebsd.org/D30395
show more ...
|
#
39756885 |
| 22-May-2021 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
rack: honor prior socket buffer lock when doing the upcall
While partially reverting D24237 with D29690, due to introducing some unintended effects for in-kernel TCP consumers, the preexisting lock
rack: honor prior socket buffer lock when doing the upcall
While partially reverting D24237 with D29690, due to introducing some unintended effects for in-kernel TCP consumers, the preexisting lock on the socket send buffer was not considered properly.
Found by: markj MFC after: 2 weeks Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D30390
show more ...
|
#
032bf749 |
| 21-May-2021 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
[tcp] Keep socket buffer locked until upcall
r367492 would unlock the socket buffer before eventually calling the upcall. This leads to problematic interaction with NFS kernel server/client componen
[tcp] Keep socket buffer locked until upcall
r367492 would unlock the socket buffer before eventually calling the upcall. This leads to problematic interaction with NFS kernel server/client components (MP threads) accessing the socket buffer with potentially not correctly updated state.
Reported by: rmacklem Reviewed By: tuexen, #transport Tested by: rmacklem, otis MFC after: 2 weeks Sponsored By: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D29690
show more ...
|
#
500eb6dd |
| 21-May-2021 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: Fix sending of TCP segments with IP level options
When bringing in TCP over UDP support in https://cgit.FreeBSD.org/src/commit/?id=9e644c23000c2f5028b235f6263d17ffb24d3605, the length of IP lev
tcp: Fix sending of TCP segments with IP level options
When bringing in TCP over UDP support in https://cgit.FreeBSD.org/src/commit/?id=9e644c23000c2f5028b235f6263d17ffb24d3605, the length of IP level options was considered when locating the transport header. This was incorrect and is fixed by this patch.
X-MFC with: https://cgit.FreeBSD.org/src/commit/?id=9e644c23000c2f5028b235f6263d17ffb24d3605 MFC after: 3 days Reviewed by: markj, rscheff Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D30358
show more ...
|
#
02cffbc2 |
| 13-May-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Incorrect KASSERT causes a panic in rack
Skyzall found an interesting panic in rack. When a SYN and FIN are both sent together a KASSERT gets tripped where it is validating that a mbuf pointer
tcp: Incorrect KASSERT causes a panic in rack
Skyzall found an interesting panic in rack. When a SYN and FIN are both sent together a KASSERT gets tripped where it is validating that a mbuf pointer is in the sendmap. But a SYN and FIN often will not have a mbuf pointer. So the fix is two fold a) make sure that the SYN and FIN split the right way when cloning an RSM SYN on left edge and FIN on right. And also make sure the KASSERT properly accounts for the case that we have a SYN or FIN so we don't panic.
Reviewed by: mtuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D30241
show more ...
|
#
251842c6 |
| 12-May-2021 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp rack: improve initialisation of retransmit timeout
When the TCP is in the front states, don't take the slop variable into account. This improves consistency with the base stack.
Reviewed by: r
tcp rack: improve initialisation of retransmit timeout
When the TCP is in the front states, don't take the slop variable into account. This improves consistency with the base stack.
Reviewed by: rrs@ Differential Revision: https://reviews.freebsd.org/D30230 MFC after: 1 week Sponsored by: Netflix, Inc.
show more ...
|
#
4b86a24a |
| 11-May-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: In rack, we must only convert restored rtt when the hostcache does restore them.
Rack now after the previous commit is very careful to translate any value in the hostcache for srtt/rttvar into
tcp: In rack, we must only convert restored rtt when the hostcache does restore them.
Rack now after the previous commit is very careful to translate any value in the hostcache for srtt/rttvar into its proper format. However there is a snafu here in that if tp->srtt is 0 is the only time that the HC will actually restore the srtt. We need to then only convert the srtt restored when it is actually restored. We do this by making sure it was zero before the call to cc_conn_init and it is non-zero afterwards.
Reviewed by: Michael Tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30213
show more ...
|
#
9867224b |
| 10-May-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp:Host cache and rack ending up with incorrect values.
The hostcache up to now as been updated in the discard callback but without checking if we are all done (the race where there are more than o
tcp:Host cache and rack ending up with incorrect values.
The hostcache up to now as been updated in the discard callback but without checking if we are all done (the race where there are more than one calls and the counter has not yet reached zero). This means that when the race occurs, we end up calling the hc_upate more than once. Also alternate stacks can keep there srtt/rttvar in different formats (example rack keeps its values in microseconds). Since we call the hc_update *before* the stack fini() then the values will be in the wrong format.
Rack on the other hand, needs to convert items pulled from the hostcache into its internal format else it may end up with very much incorrect values from the hostcache. In the process lets commonize the update mechanism for srtt/rttvar since we now have more than one place that needs to call it.
Reviewed by: Michael Tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30172
show more ...
|
#
a16cee02 |
| 07-May-2021 |
Randall Stewart <rrs@FreeBSD.org> |
Fix a UDP tunneling issue with rack. Basically there are two issues. A) Not enough hdrlen was being calculated when a UDP tunnel is in place. and B) Not enough memory is allocated in racks fsb. We
Fix a UDP tunneling issue with rack. Basically there are two issues. A) Not enough hdrlen was being calculated when a UDP tunnel is in place. and B) Not enough memory is allocated in racks fsb. We need to overbook the fsb to include a udphdr just in case.
Submitted by: Peter Lei Reviewed by: Michael Tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D30157
show more ...
|
#
5d8fd932 |
| 06-May-2021 |
Randall Stewart <rrs@FreeBSD.org> |
This brings into sync FreeBSD with the netflix versions of rack and bbr. This fixes several breakages (panics) since the tcp_lro code was committed that have been reported. Quite a few new features a
This brings into sync FreeBSD with the netflix versions of rack and bbr. This fixes several breakages (panics) since the tcp_lro code was committed that have been reported. Quite a few new features are now in rack (prefecting of DGP -- Dynamic Goodput Pacing among the largest). There is also support for ack-war prevention. Documents comming soon on rack..
Sponsored by: Netflix Reviewed by: rscheff, mtuexen Differential Revision: https://reviews.freebsd.org/D30036
show more ...
|