#
07285bb4 |
| 10-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: utilize new solisten_clone() and solisten_enqueue()
This streamlines cloning of a socket from a listener. Now we do not drop the inpcb lock during creation of a new socket, do not do useless s
tcp: utilize new solisten_clone() and solisten_enqueue()
This streamlines cloning of a socket from a listener. Now we do not drop the inpcb lock during creation of a new socket, do not do useless state transitions, and put a fully initialized socket+inpcb+tcpcb into the listen queue.
Before this change, first we would allocate the socket and inpcb+tcpcb via tcp_usr_attach() as TCPS_CLOSED, link them into global list of pcbs, unlock pcb and put this onto incomplete queue (see 6f3caa6d815). Then, after sonewconn() we would lock it again, transition into TCPS_SYN_RECEIVED, insert into inpcb hash, finalize initialization of tcpcb. And then, in call into tcp_do_segment() and upon transition to TCPS_ESTABLISHED call soisconnected(). This call would lock the listening socket once again with a LOR protection sequence and then we would relocate the socket onto the complete queue and only now it is ready for accept(2).
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D36064
show more ...
|
#
c7a62c92 |
| 10-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: gather v4/v6 handling code into in_pcballoc() from protocols
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D36062
|
#
1b91978f |
| 07-Jul-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: remove a condition in tcp_usr_detach() that never happens
The comment from Robert Watson doubts that this condition ever happens. Our analysis confirm that. Also, we found that if you manage t
tcp: remove a condition in tcp_usr_detach() that never happens
The comment from Robert Watson doubts that this condition ever happens. Our analysis confirm that. Also, we found that if you manage to create such a connection with help of some other bug, then after the "second case" code is executed, the kernel will panic in other part of the stack.
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D35714
show more ...
|
#
d8596171 |
| 04-Jul-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: use only soref()/sorele() as socket reference count
o Retire SS_FDREF as it is basically a debug flag on top of already existing soref()/sorele(). o Convert SS_PROTOREF into soref()/sorel
sockets: use only soref()/sorele() as socket reference count
o Retire SS_FDREF as it is basically a debug flag on top of already existing soref()/sorele(). o Convert SS_PROTOREF into soref()/sorele(). o Change reference model for the listen queues, see below. o Make sofree() private. The correct KPI to use is only sorele(). o Make soabort() respect the model and sorele() instead of sofree().
Note on listening queues. Until now the sockets on a queue had zero reference count. And the reference were given only upon accept(2). The assumption was that there is no way to see the queued socket from anywhere except its head. This is not true, since queued sockets already have pcbs, which are linked at least into the global pcb lists. With this change we put the reference right in the sonewconn() and on accept(2) path we just hand the existing reference to the file descriptor.
Differential revision: https://reviews.freebsd.org/D35679
show more ...
|
#
74703901 |
| 04-Jul-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: use a TCP flag to check if connection has been close(2)d
The flag SS_NOFDREF is a private flag of the socket layer. It also is supposed to be read with SOCK_LOCK(), which we don't own here.
R
tcp: use a TCP flag to check if connection has been close(2)d
The flag SS_NOFDREF is a private flag of the socket layer. It also is supposed to be read with SOCK_LOCK(), which we don't own here.
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D35663
show more ...
|
#
97453e5e |
| 23-Jun-2022 |
Claudio Jeker <claudio@openbsd.org> |
Unlock inp when handling TCP_MD5SIG socket options
Unlock the inp when hanlding TCP_MD5SIG socket options. tcp_ipsec_pcbctl handles locking the inp when the option is being modified.
This was found
Unlock inp when handling TCP_MD5SIG socket options
Unlock the inp when hanlding TCP_MD5SIG socket options. tcp_ipsec_pcbctl handles locking the inp when the option is being modified.
This was found by Claudio Jeker while working on the OpenBGPd port.
On 14 we get a panic when trying to call getsockopt, on 13.1 the process locks up using 100% CPU.
Reviewed by: rscheff (transport), tuexen MFC after: 3 days Sponsored by: Klara Inc. Differential Revision: https://reviews.freebsd.org/D35532
show more ...
|
Revision tags: release/13.1.0 |
|
#
b338b1fd |
| 19-Apr-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
tcp: plug set-but-not-used vars
Sponsored by: Rubicon Communications, LLC ("Netgate")
|
#
ea9017fb |
| 21-Feb-2022 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Congestion control move to using reference counting.
In the transport call on 12/3 Gleb asked to move the CC modules towards using reference counting to prevent folks from unloading a module in
tcp: Congestion control move to using reference counting.
In the transport call on 12/3 Gleb asked to move the CC modules towards using reference counting to prevent folks from unloading a module in use. It was also agreed that Michael would do a user space utility like tcp_drop that could be used to move all connections that are using a specific CC to some other CC.
This is the half I committed to doing, making it so that we maintain a refcount on a cc module every time a pcb refers to it and decrementing that every time a pcb no longer uses a cc module. This also helps us simplify the whole unloading process by getting rid of tcp_ccunload() which munged through all the tcb's. Instead we mark a module as being removed and prevent further references to it. We also make sure that if a module is marked as being removed it cannot be made as the default and also the opposite of that, if its a default it fails and does not mark it as being removed.
Reviewed by: Michael Tuexen, Gleb Smirnoff Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D33249
show more ...
|
#
3f169c54 |
| 10-Feb-2022 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
tcp: Add/update AccECN related statistics and numbers
Reserve couters in the tcps struct in preparation for AccECN, extend the debugging output for TF2 flags, optimize the syncache flags from indivi
tcp: Add/update AccECN related statistics and numbers
Reserve couters in the tcps struct in preparation for AccECN, extend the debugging output for TF2 flags, optimize the syncache flags from individual bits to a codepoint for the specifc ECN handshake.
This is in preparation of AccECN.
No functional chance except for extended debug output capabilities.
Reviewed By: #transport, rrs Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D34161
show more ...
|
#
528c7649 |
| 09-Feb-2022 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: fix compliation when KERN_TLS is not defined
Reported by: Gary Jennejohn Fixes: fd7daa727126 - main - tcp: make tcp_ctloutput_set() non-static Sponsored by: Netflix, Inc.
|
#
fd7daa72 |
| 08-Feb-2022 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: make tcp_ctloutput_set() non-static
tcp_ctloutput_set() will be used via the sysctl interface in a upcoming command line tool tcpsso.
Reviewed by: glebius, rscheff Sponsored by: Netflix, Inc
tcp: make tcp_ctloutput_set() non-static
tcp_ctloutput_set() will be used via the sysctl interface in a upcoming command line tool tcpsso.
Reviewed by: glebius, rscheff Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D34164
show more ...
|
#
3b0ee680 |
| 03-Feb-2022 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
tcp: Prevent setting of ECN bits with setsockopt()
setsockopt() grants full access to the deprecated TOS byte. For TCP, mask out the ECN codepoint, so that only the DSCP portion can be adjusted.
Re
tcp: Prevent setting of ECN bits with setsockopt()
setsockopt() grants full access to the deprecated TOS byte. For TCP, mask out the ECN codepoint, so that only the DSCP portion can be adjusted.
Reviewed By: tuexen, hselasky, #manpages, #transport, debdrup Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D34154
show more ...
|
#
3b3c08c1 |
| 02-Feb-2022 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: cleanup functions related to socket option handling
Consistently only pass the inp and the sopt around. Don't pass the so around, since in a upcoming commit tcp_ctloutput_set() will be called f
tcp: cleanup functions related to socket option handling
Consistently only pass the inp and the sopt around. Don't pass the so around, since in a upcoming commit tcp_ctloutput_set() will be called from a context different from setsockopt(). Also expect the inp to be locked when calling tcp_ctloutput_[gs]et(), this is also required for the upcoming use by tcpsso, a command line tool to set socket options. Reviewed by: glebius, rscheff Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D34151
show more ...
|
#
aac52f94 |
| 18-Jan-2022 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Warning cleanup from new compiler.
The clang compiler recently got an update that generates warnings of unused variables where they were set, and then never used. This revision goes through the
tcp: Warning cleanup from new compiler.
The clang compiler recently got an update that generates warnings of unused variables where they were set, and then never used. This revision goes through the tcp stack and cleans all of those up.
Reviewed by: Michael Tuexen, Gleb Smirnoff Sponsored by: Netflix Inc. Differential Revision:
show more ...
|
#
1d41a494 |
| 13-Jan-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp_usr_connect: report actual error code when stack requests drop
|
#
4287aa56 |
| 28-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp_usr_shutdown: don't cast inp_ppcb to tcpcb before checking inp_flags
While here move out one more erroneous condition out of the epoch and common return. The only functional change is that if w
tcp_usr_shutdown: don't cast inp_ppcb to tcpcb before checking inp_flags
While here move out one more erroneous condition out of the epoch and common return. The only functional change is that if we send control on a shut down socket we would get EINVAL instead of ECONNRESET.
Reviewed by: tuexen Reported by: syzbot+8388cf7f401a7b6bece6@syzkaller.appspotmail.com Fixes: f64dc2ab5be38e5366271ef85ea90d8cb1c7841a
show more ...
|
#
0af4ce45 |
| 28-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp_usr_shutdown: don't cast inp_ppcb to tcpcb before checking inp_flags
Fixes: f64dc2ab5be38e5366271ef85ea90d8cb1c7841a
|
#
37a7f557 |
| 27-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp_usr_rcvd: don't cast inp_ppcb to tcpcb before checking inp_flags
Fixes: f64dc2ab5be38e5366271ef85ea90d8cb1c7841a
|
#
a370832b |
| 26-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: remove delayed drop KPI
No longer needed after tcp_output() can ask caller to drop.
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33371
|
#
f64dc2ab |
| 26-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: TCP output method can request tcp_drop
The advanced TCP stacks (bbr, rack) may decide to drop a TCP connection when they do output on it. The default stack never does this, thus existing frame
tcp: TCP output method can request tcp_drop
The advanced TCP stacks (bbr, rack) may decide to drop a TCP connection when they do output on it. The default stack never does this, thus existing framework expects tcp_output() always to return locked and valid tcpcb.
Provide KPI extension to satisfy demands of advanced stacks. If the output method returns negative error code, it means that caller must call tcp_drop().
In tcp_var() provide three inline methods to call tcp_output(): - tcp_output() is a drop-in replacement for the default stack, so that default stack can continue using it internally without modifications. For advanced stacks it would perform tcp_drop() and unlock and report that with negative error code. - tcp_output_unlock() handles the negative code and always converts it to positive and always unlocks. - tcp_output_nodrop() just calls the method and leaves the responsibility to drop on the caller.
Sweep over the advanced stacks and use new KPI instead of using HPTS delayed drop queue for that.
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33370
show more ...
|
#
40fa3e40 |
| 26-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: mechanically substitute call to tfb_tcp_output to new method.
Made with sed(1) execution:
sed -Ef sed -i "" $(grep --exclude tcp_var.h -lr tcp_output sys/)
sed: s/tp->t_fb->tfb_tcp_output\(tp
tcp: mechanically substitute call to tfb_tcp_output to new method.
Made with sed(1) execution:
sed -Ef sed -i "" $(grep --exclude tcp_var.h -lr tcp_output sys/)
sed: s/tp->t_fb->tfb_tcp_output\(tp\)/tcp_output(tp)/ s/to tfb_tcp_output\(\)/to tcp_output()/
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D33366
show more ...
|
Revision tags: release/12.3.0 |
|
#
ef396441 |
| 10-Nov-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp_usr_detach: revert debugging piece from f5cf1e5f5a500.
The code was probably useful during the problem being chased down, but for brevity makes sense just to return to the original KASSERT.
Rev
tcp_usr_detach: revert debugging piece from f5cf1e5f5a500.
The code was probably useful during the problem being chased down, but for brevity makes sense just to return to the original KASSERT.
Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D32968
show more ...
|
#
df07bfda |
| 12-Nov-2021 |
Michael Tuexen <tuexen@FreeBSD.org> |
tcp: Fix a locking issue
INP_WLOCK_RECHECK_CLEANUP() and INP_WLOCK_RECHECK() might return from the function, so any locks held must be released.
Reported by: syzbot+b1a888df08efaa7b4bf1@syzkaller.
tcp: Fix a locking issue
INP_WLOCK_RECHECK_CLEANUP() and INP_WLOCK_RECHECK() might return from the function, so any locks held must be released.
Reported by: syzbot+b1a888df08efaa7b4bf1@syzkaller.appspotmail.com Reviewed by: markj Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D32975
show more ...
|
#
b8d60729 |
| 11-Nov-2021 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Congestion control cleanup.
NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!!
This patch does a bit of cleanup on TCP congestion control modules. There were s
tcp: Congestion control cleanup.
NOTE: HEADS UP read the note below if your kernel config is not including GENERIC!!
This patch does a bit of cleanup on TCP congestion control modules. There were some rather interesting surprises that one could get i.e. where you use a socket option to change from one CC (say cc_cubic) to another CC (say cc_vegas) and you could in theory get a memory failure and end up on cc_newreno. This is not what one would expect. The new code fixes this by requiring a cc_data_sz() function so we can malloc with M_WAITOK and pass in to the init function preallocated memory. The CC init is expected in this case *not* to fail but if it does and a module does break the "no fail with memory given" contract we do fall back to the CC that was in place at the time.
This also fixes up a set of common newreno utilities that can be shared amongst other CC modules instead of the other CC modules reaching into newreno and executing what they think is a "common and understood" function. Lets put these functions in cc.c and that way we have a common place that is easily findable by future developers or bug fixers. This also allows newreno to evolve and grow support for its features i.e. ABE and HYSTART++ without having to dance through hoops for other CC modules, instead both newreno and the other modules just call into the common functions if they desire that behavior or roll there own if that makes more sense.
Note: This commit changes the kernel configuration!! If you are not using GENERIC in some form you must add a CC module option (one of CC_NEWRENO, CC_VEGAS, CC_CUBIC, CC_CDG, CC_CHD, CC_DCTCP, CC_HTCP, CC_HD). You can have more than one defined as well if you desire. Note that if you create a kernel configuration that does not define a congestion control module and includes INET or INET6 the kernel compile will break. Also you need to define a default, generic adds 'options CC_DEFAULT=\"newreno\" but you can specify any string that represents the name of the CC module (same names that show up in the CC module list under net.inet.tcp.cc). If you fail to add the options CC_DEFAULT in your kernel configuration the kernel build will also break.
Reviewed by: Michael Tuexen Sponsored by: Netflix Inc. RELNOTES:YES Differential Revision: https://reviews.freebsd.org/D32693
show more ...
|
#
f581a26e |
| 26-Oct-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Factor out tcp6_use_min_mtu() to handle IPV6_USE_MIN_MTU by TCP.
Pass control for IP/IP6 level options from generic tcp_ctloutput_set() down to per-stack ctloutput.
Call tcp6_use_min_mtu() from tcp
Factor out tcp6_use_min_mtu() to handle IPV6_USE_MIN_MTU by TCP.
Pass control for IP/IP6 level options from generic tcp_ctloutput_set() down to per-stack ctloutput.
Call tcp6_use_min_mtu() from tcp stack tcp_default_ctloutput().
Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D32655
show more ...
|