#
4e8a20a7 |
| 19-Apr-2023 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: rack the request level logging is a bit too noisy when doing point logging.
When doing request level BB logging the hybrid_bw_log() does not have proper screening to minimize logging when point
tcp: rack the request level logging is a bit too noisy when doing point logging.
When doing request level BB logging the hybrid_bw_log() does not have proper screening to minimize logging when point level logging is in use. Lets fix it properly so you have to have the proper knobs set to get the more noisy logging.
Reviewed by: tuexen Sponsored by: Netflix Inc Differential Revision:https://reviews.freebsd.org/D39699
show more ...
|
#
7a842346 |
| 19-Apr-2023 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Rack can crash with the new non-TSO fix..
Turns out the location of the check to see if we can do output is in the wrong place. We need to jump off to the compressed acks before handling that c
tcp: Rack can crash with the new non-TSO fix..
Turns out the location of the check to see if we can do output is in the wrong place. We need to jump off to the compressed acks before handling that case since th is NULL in the compressed ack case which is handled differently anyway.
Reviewed by: tuexen Sponsored by: Netflix Inc Differential Revision:https://reviews.freebsd.org/D39690
show more ...
|
#
2ad584c5 |
| 17-Apr-2023 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Inconsistent use of hpts_calling flag
Gleb has noticed there were some inconsistency's in the way the inp_hpts_calls flag was being used. One such inconsistency results in a bug when we can't a
tcp: Inconsistent use of hpts_calling flag
Gleb has noticed there were some inconsistency's in the way the inp_hpts_calls flag was being used. One such inconsistency results in a bug when we can't allocate enough sendmap entries to entertain a call to rack_output().. basically a timer won't get started like it should. Also in cleaning this up I find that the "no_output" side of input needs to be adjusted to make sure we don't try to re-pace too quickly outside the hpts assurance of 250useconds.
Another thing here is we end up with duplicate calls to tcp_output() which we should not. If packets go from hpts for processing the input side of tcp will call the output side of tcp on the last packet if it is needed. This means that when that occurs a second call to tcp_output would be made that is not needed and if pacing is going on may be harmful.
Lets fix all this and explicitly state the contract that hpts is making with transports that care about the flag.
Reviewed by: tuexen, glebius Sponsored by: Netflix Inc Differential Revision:https://reviews.freebsd.org/D39653
show more ...
|
#
a540cdca |
| 17-Apr-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp_hpts: use queue(9) STAILQ for the input queue
Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D39574
|
#
3cc7b667 |
| 14-Apr-2023 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: stack unloading crash in rack and bbr
Its possible to induce a crash in either rack or bbr. This would be done if the rack stack were say the default and bbr was being used by a connection. If
tcp: stack unloading crash in rack and bbr
Its possible to induce a crash in either rack or bbr. This would be done if the rack stack were say the default and bbr was being used by a connection. If the bbr stack is then unloaded and it was active, we will trigger a MPASS assert in tcp_hpts since the new stack (default rack) would start a timer, and the old stack (bbr) would have the inp already in hpts.
Reviewed by: tuexen Sponsored by: Netflix Inc Differential Revision:https://reviews.freebsd.org/D39576
show more ...
|
#
9903bf34 |
| 14-Apr-2023 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: rack pacing has some caveats that need to be obeyed when LRO is missing
n further non-LRO testing I found a case where rack is supposed to be waking up but it is not now. In this special case i
tcp: rack pacing has some caveats that need to be obeyed when LRO is missing
n further non-LRO testing I found a case where rack is supposed to be waking up but it is not now. In this special case it sets the flag rc_ack_can_sendout_data. When that is set we should not prohibit output.
Reviewed by: tuexen Sponsored by: Netflix Inc Differential Revision:https://reviews.freebsd.org/D39565
show more ...
|
#
a2b33c9a |
| 10-Apr-2023 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Rack - in the absence of LRO fixed rate pacing (loopback or interfaces with no LRO) does not work correctly.
Rack is capable of fixed rate or dynamic rate pacing. Both of these can get mixed up
tcp: Rack - in the absence of LRO fixed rate pacing (loopback or interfaces with no LRO) does not work correctly.
Rack is capable of fixed rate or dynamic rate pacing. Both of these can get mixed up when LRO is not available. This is because LRO will hold off waking up the tcp connection to processing the inbound packets until the pacing timer is up. Without LRO the pacing only sort-of works. Sometimes we pace correctly, other times not so much.
This set of changes will make it so pacing works properly in the absence of LRO.
Reviewed by: tuexen Sponsored by: Netflix Inc Differential Revision:https://reviews.freebsd.org/D39494
show more ...
|
#
e2224617 |
| 10-Apr-2023 |
John Baldwin <jhb@FreeBSD.org> |
rack: mask and tclass are only used for INET6.
This fixes the LINT-NOINET6 build.
|
#
66fbc19f |
| 07-Apr-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: pass tcpcb in the tfb_tcp_ctloutput() method instead of inpcb
Just matches rest of the KPI.
Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D39435
|
#
35bc0bcc |
| 07-Apr-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: reduce argument list to functions that pass a segment
The socket argument is superfluous, as a tcpcb always has one and only one socket.
Reviewed by: rrs Differential Revision: https://review
tcp: reduce argument list to functions that pass a segment
The socket argument is superfluous, as a tcpcb always has one and only one socket.
Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D39434
show more ...
|
#
945f9a7c |
| 07-Apr-2023 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: misc cleanup of options for rack as well as socket option logging.
Both BBR and Rack have the ability to log socket options, which is currently disabled. Rack has an experimental SaD (Sack Atta
tcp: misc cleanup of options for rack as well as socket option logging.
Both BBR and Rack have the ability to log socket options, which is currently disabled. Rack has an experimental SaD (Sack Attack Detection) algorithm that should be made available. Also there is a t_maxpeak_rate that needs to be removed (its un-used).
Reviewed by: tuexen, cc Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D39427
show more ...
|
Revision tags: release/13.2.0 |
|
#
84b42df8 |
| 05-Apr-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
rack: fix build on powerpc
|
#
030434ac |
| 04-Apr-2023 |
Randall Stewart <rrs@FreeBSD.org> |
Update rack to the latest code used at NF.
There have been many changes to rack over the last couple of years, including: a) Ability when switching stacks to have one stack query another.
Update rack to the latest code used at NF.
There have been many changes to rack over the last couple of years, including: a) Ability when switching stacks to have one stack query another. b) Internal use of micro-second timers instead of ticks. c) Many changes to pacing in forms of 1) Improvements to Dynamic Goodput Pacing (DGP) 2) Improvements to fixed rate paciing 3) A new feature called hybrid pacing where the requestor can get a combination of DGP and fixed rate pacing with deadlines for delivery that can dynamically speed things up. d) All kinds of bugs found during extensive testing and use of the rack stack for streaming video and in fact all data transferred by NF
Reviewed by: glebius, gallatin, tuexen Sponsored By: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D39402
show more ...
|
#
73ee5756 |
| 01-Apr-2023 |
Randall Stewart <rrs@FreeBSD.org> |
Fixes in the tcp infrastructure with respect to stack changes as well as other infrastructure updates for incoming rack features.
So stack switching as always been a bit of a issue. We currently use
Fixes in the tcp infrastructure with respect to stack changes as well as other infrastructure updates for incoming rack features.
So stack switching as always been a bit of a issue. We currently use a break before make setup which means that if something goes wrong you have to try to get back to a stack. This patch among a lot of other things changes that so that it is a make before break. We also expand some of the function blocks in prep for new features in rack that will allow more controlled pacing. We also add other abilities such as the pathway for a stack to query a previous stack to acquire from it critical state information so things in flight don't get dropped or mis-handled when switching stacks. We also add the concept of a timer granularity. This allows an alternate stack to change from the old ticks granularity to microseconds and of course this even gives us a pathway to go to nanosecond timekeeping if we need to (something for the data center to consider for sure).
Once all this lands I will then update rack to begin using all these new features.
Reviewed by: tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D39210
show more ...
|
#
69c7c811 |
| 16-Mar-2023 |
Randall Stewart <rrs@FreeBSD.org> |
Move access to tcp's t_logstate into inline functions and provide new tracepoint and bbpoint capabilities.
The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and t
Move access to tcp's t_logstate into inline functions and provide new tracepoint and bbpoint capabilities.
The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and the new bbpoints we need to move to using the new inline functions. This adds them and moves rack to now use the tcp_tracepoints.
Reviewed by: tuexen, gallatin Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D38831
show more ...
|
#
c0e4090e |
| 08-Feb-2023 |
Andrew Gallatin <gallatin@FreeBSD.org> |
ktls: Accurately track if ifnet ktls is enabled
This allows us to avoid spurious calls to ktls_disable_ifnet()
When we implemented ifnet kTLSe, we set a flag in the tx socket buffer (SB_TLS_IFNET)
ktls: Accurately track if ifnet ktls is enabled
This allows us to avoid spurious calls to ktls_disable_ifnet()
When we implemented ifnet kTLSe, we set a flag in the tx socket buffer (SB_TLS_IFNET) to indicate ifnet kTLS. This flag meant that now, or in the past, ifnet ktls was active on a socket. Later, I added code to switch ifnet ktls sessions to software in the case of lossy TCP connections that have a high retransmit rate. Because TCP was using SB_TLS_IFNET to know if it needed to do math to calculate the retransmit ratio and potentially call into ktls_disable_ifnet(), it was doing unneeded work long after a session was moved to software.
This patch carefully tracks whether or not ifnet ktls is still enabled on a TCP connection. Because the inp is now embedded in the tcpcb, and because TCP is the most frequent accessor of this state, it made sense to move this from the socket buffer flags to the tcpcb. Because we now need reliable access to the tcbcb, we take a ref on the inp when creating a tx ktls session.
While here, I noticed that rack/bbr were incorrectly implementing tfb_hwtls_change(), and applying the change to all pending sends, when it should apply only to future sends.
This change reduces spurious calls to ktls_disable_ifnet() by 95% or so in a Netflix CDN environment.
Reviewed by: markj, rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D38380
show more ...
|
#
18b83b62 |
| 26-Jan-2023 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
tcp: reduce the size of t_rttupdated in tcpcb
During tcp session start, various mechanisms need to track a few initial RTTs before becoming active. Prevent overflows of the corresponding tracking co
tcp: reduce the size of t_rttupdated in tcpcb
During tcp session start, various mechanisms need to track a few initial RTTs before becoming active. Prevent overflows of the corresponding tracking counter and reduce the size of tcpcb simultaneously.
Reviewed By: #transport, tuexen, guest-ccui Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D21117
show more ...
|
#
73e994a9 |
| 19-Jan-2023 |
Gordon Bergling <gbe@FreeBSD.org> |
extra_tcp_stacks: Fix a common typo in source code comments
- s/orginal/original/
MFC after: 3 days
|
#
432a398d |
| 11-Jan-2023 |
Gordon Bergling <gbe@FreeBSD.org> |
tcp_rack(4): Fix a typo in a source code comment
- s/postion/position/
MFC after: 3 days
|
#
8ea41829 |
| 10-Jan-2023 |
Andrew Gallatin <gallatin@FreeBSD.org> |
tcp: Build RACK and BBR stacks as a part of LINT
When RACK and BBR were added to the kernel, they were put behind 'WITH_EXTRA_TCP_STACKS=1'. Unfortunately that was never added to any NOTES file, s
tcp: Build RACK and BBR stacks as a part of LINT
When RACK and BBR were added to the kernel, they were put behind 'WITH_EXTRA_TCP_STACKS=1'. Unfortunately that was never added to any NOTES file, so RACK & BBR were not compiled with the various LINT-NOINET, LINT-NOINET6, and LINT-NOIP kernels. This lead to the stacks sometimes being broken.
This change:
- Fixes RACK so that it compiles with the various LINT-NO* kernels - Adds WITH_EXTRA_TCP_STACKS=1 to all NOTES kernels so that RACK and BBR are compile tested regularly
Sponsored by: Netflix Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D37903
show more ...
|
#
2e2a1c31 |
| 14-Dec-2022 |
Randall Stewart <rrs@FreeBSD.org> |
Opps take out a stray left behind printf that was for debugging.. Sorry.
|
#
e2e088ae |
| 14-Dec-2022 |
Randall Stewart <rrs@FreeBSD.org> |
Rack cannot be loaded without cc_newreno compiled into the kernel.
Right now rack will fail to load due to its hack in accessing symbol names in cc_newreno. This was fine when newreno was always com
Rack cannot be loaded without cc_newreno compiled into the kernel.
Right now rack will fail to load due to its hack in accessing symbol names in cc_newreno. This was fine when newreno was always compiled into the kernel but now ... not so much. Instead lets fix up rack to use the socket option queries to get the information it wants and set the parameters. We also fix the CC parameter so they are always settable.
Reviewed by: tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D37622
show more ...
|
#
eaabc937 |
| 14-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: retire TCPDEBUG
This subsystem is superseded by modern debugging facilities, e.g. DTrace probes and TCP black box logging.
We intentionally leave SO_DEBUG in place, as many utilities may set i
tcp: retire TCPDEBUG
This subsystem is superseded by modern debugging facilities, e.g. DTrace probes and TCP black box logging.
We intentionally leave SO_DEBUG in place, as many utilities may set it on a socket. Also the tcp::debug DTrace probes look at this flag on a socket.
Reviewed by: gnn, tuexen Discussed with: rscheff, rrs, jtl Differential revision: https://reviews.freebsd.org/D37694
show more ...
|
#
e6fc01f6 |
| 14-Dec-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
tcp: whack the stale declaration of rack_timer_stop
Sponsored by: Rubicon Communications, LLC ("Netgate")
|
#
446ccdd0 |
| 07-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: use single locked callout per tcpcb for the TCP timers
Use only one callout structure per tcpcb that is responsible for handling all five TCP timeouts. Use locked version of callout, of course
tcp: use single locked callout per tcpcb for the TCP timers
Use only one callout structure per tcpcb that is responsible for handling all five TCP timeouts. Use locked version of callout, of course. The callout function tcp_timer_enter() chooses soonest timer and executes it with lock held. Unless the timer reports that the tcpcb has been freed, the callout is rescheduled for next soonest timer, if there is any.
With single callout per tcpcb on connection teardown we should be able to fully stop the callout and immediately free it, avoiding use of callout_async_drain(). There is one gotcha here: callout_stop() can actually touch our memory when a rare race condition happens. See comment above tcp_timer_stop(). Synchronous stop of the callout makes tcp_discardcb() the single entry point for tcpcb destructor, merging the tcp_freecb() to the end of the function.
While here, also remove lots of lingering checks in the beginning of TCP timer functions. With a locked callout they are unnecessary.
While here, clean unused parts of timer KPI for the pluggable TCP stacks.
While here, remove TCPDEBUG from tcp_timer.c, as this allows for more simplification of TCP timers. The TCPDEBUG is scheduled for removal.
Move the DTrace probes in timers to the beginning of a function, where a tcpcb is always existing.
Discussed with: rrs, tuexen, rscheff (the TCP part of the diff) Reviewed by: hselasky, kib, mav (the callout part) Differential revision: https://reviews.freebsd.org/D37321
show more ...
|