#
94387f25 |
| 03-Feb-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: remove unused pr_sopoll_notsupp()
|
#
815f2a61 |
| 03-Feb-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: removed unused argument from sopoll()
|
Revision tags: release/14.1.0-p7, release/14.2.0-p1, release/13.4.0-p3 |
|
#
7cbb6b6e |
| 23-Jan-2025 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Close some SO_REUSEPORT_LB races, part 2
Suppose a thread adds a socket to an existing TCP lbgroup that is actively accepting connections. It has to perform the following operations:
1. set SO_REUSEPORT_LB on the socket
2. bind() the socket to the shared address/port
3. call listen()
Step 2 makes the inpcb visible to incoming connection requests. However, at this point the inpcb cannot accept new connections. If in_pcblookup() matches it, the remote end will see ECONNREFUSED even when other listening sockets are present in the lbgroup. This means that dynamically adding inpcbs to an lbgroup (e.g., by starting up new workers) can trigger spurious connection failures for no good reason. (A similar problem exists when removing inpcbs from an lbgroup, but that is harder to fix and is not addressed by this patch; see the review for a bit more commentary.)
Fix this by augmenting each lbgroup with a linked list of inpcbs that are pending a listen() call. When adding an inpcb to an lbgroup, keep the inpcb on this list if listen() hasn't been called, so it is not yet visible to the lookup path. Then, add a new in_pcblisten() routine which makes the inpcb visible within the lbgroup now that it's safe to let it handle new connections.
Add a regression test which verifies that we don't get spurious connection errors while adding sockets to an LB group.
Reviewed by: glebius MFC after: 1 month Sponsored by: Klara, Inc. Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D48544
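The pending-list idea described above can be modelled in a few lines of userland C. This is only a sketch under stated assumptions: `struct sock_model`, `struct lbgroup_model`, `lbgroup_add()`, `lbgroup_listen()` and `lbgroup_lookup()` are hypothetical names invented for illustration, not the kernel's inpcb structures or the real in_pcblisten() routine, and locking is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

#define LB_MAX 8	/* hypothetical group capacity for this sketch */

struct sock_model {
	int id;
	int listening;			/* set once listen() has been called */
	struct sock_model *pending_next;
};

struct lbgroup_model {
	struct sock_model *active[LB_MAX];	/* visible to the lookup path */
	size_t nactive;
	struct sock_model *pending;		/* bound, but listen() not called yet */
};

/* bind() step: park a non-listening socket on the pending list. */
static int
lbgroup_add(struct lbgroup_model *g, struct sock_model *s)
{
	assert(g != NULL && s != NULL);
	if (s->listening) {
		if (g->nactive == LB_MAX)
			return (-1);
		g->active[g->nactive++] = s;
		return (0);
	}
	s->pending_next = g->pending;
	g->pending = s;
	return (0);
}

/* listen() step: move the socket from the pending list to the active set. */
static int
lbgroup_listen(struct lbgroup_model *g, struct sock_model *s)
{
	struct sock_model **pp;

	if (g->nactive == LB_MAX)
		return (-1);
	for (pp = &g->pending; *pp != NULL; pp = &(*pp)->pending_next) {
		if (*pp == s) {
			*pp = s->pending_next;
			s->pending_next = NULL;
			s->listening = 1;
			g->active[g->nactive++] = s;
			return (0);
		}
	}
	return (-1);	/* not on the pending list */
}

/* Lookup path: only sockets that have called listen() can be chosen. */
static struct sock_model *
lbgroup_lookup(const struct lbgroup_model *g, unsigned int hash)
{
	if (g->nactive == 0)
		return (NULL);
	return (g->active[hash % g->nactive]);
}
```

In this model a socket that has been bound but has not yet called listen() can never be returned by the lookup, which is the property the commit is after.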
|
Revision tags: release/14.2.0, release/13.4.0, release/14.1.0 |
|
#
9576fc16 |
| 21-Apr-2024 |
Gordon Bergling <gbe@FreeBSD.org> |
uipc_domain: Fix a typo in a source code comment
- s/cant/can't/
MFC after: 3 days
|
Revision tags: release/13.3.0 |
|
#
5bba2728 |
| 16-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: make pr_shutdown fully protocol specific method
Disassemble the one-for-all soshutdown() into protocol specific methods. This creates a small amount of copy & paste, but makes the code a lot more self-documenting, as each protocol specific method executes only the code that is relevant to that protocol and nothing else. This also fixes a couple of recent regressions and reduces the risk of future regressions. The extended KPI for the new pr_shutdown removes the need for the extra pr_flush, which was added for the sake of SCTP, which could not perform its shutdown properly with the old one. Particularly for SCTP this change streamlines a lot of code.
Some notes on why certain parts of the code were or were not copied to certain protocols:
* The (SS_ISCONNECTED | SS_ISCONNECTING | SS_ISDISCONNECTING) check is needed only for those protocols that may be connected or disconnected.
* The above reduces to only SS_ISCONNECTED for those protocols that always connect instantly.
* The ENOTCONN and continue processing hack is left only for datagram protocols.
* The SOLISTENING(so) block is copied to those protocols that listen(2).
* sorflush() on SHUT_RD is copied to almost every protocol, but that will be refactored later.
* wakeup(&so->so_timeo) is copied to protocols that can make a non-instant connect(2), can SO_LINGER or can accept(2).
There are three protocols (netgraph(4), Bluetooth, SDP) that did not have pr_shutdown, but the old soshutdown() would still perform sorflush() on SHUT_RD for them and also wakeup(9). Those protocols partially supported shutdown(2), returning EOPNOTSUPP for SHUT_WR/SHUT_RDWR; now they have fully lost shutdown(2) support. I'm pretty sure netgraph(4) and Bluetooth are okay about that and SDP is almost abandoned anyway.
Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D43413
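As a rough illustration of what "protocol specific" shutdown methods mean in practice, the sketch below models two of the notes above: a stream-like protocol refuses to shut down an unconnected socket, while a datagram-like protocol reports ENOTCONN but still continues processing. All names here (`struct so_model`, `stream_shutdown()`, `dgram_shutdown()`, `soshutdown_model()`, the `ST_*` and `HOW_*` constants) are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>

#define ST_CONNECTED	0x1	/* hypothetical state bits for this sketch */
#define ST_CANTRCV	0x2
#define ST_CANTSEND	0x4

enum { HOW_RD, HOW_WR, HOW_RDWR };	/* stand-ins for SHUT_RD/WR/RDWR */

struct so_model;
typedef int shutdown_t(struct so_model *, int);

struct so_model {
	int state;
	shutdown_t *pr_shutdown;	/* per-protocol method */
};

static void
mark_shutdown(struct so_model *so, int how)
{
	if (how != HOW_WR)
		so->state |= ST_CANTRCV;
	if (how != HOW_RD)
		so->state |= ST_CANTSEND;
}

/* Stream-like protocol: an unconnected socket cannot be shut down. */
static int
stream_shutdown(struct so_model *so, int how)
{
	if ((so->state & ST_CONNECTED) == 0)
		return (ENOTCONN);
	mark_shutdown(so, how);
	return (0);
}

/* Datagram-like protocol: report ENOTCONN but keep processing. */
static int
dgram_shutdown(struct so_model *so, int how)
{
	int error = (so->state & ST_CONNECTED) ? 0 : ENOTCONN;

	mark_shutdown(so, how);
	return (error);
}

/* Generic entry point: validate the argument, then defer to the protocol. */
static int
soshutdown_model(struct so_model *so, int how)
{
	if (how < HOW_RD || how > HOW_RDWR)
		return (EINVAL);
	return (so->pr_shutdown(so, how));
}
```

The generic wrapper shrinks to argument validation plus one indirect call, and each protocol method contains only the checks that apply to it.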
|
#
0fac350c |
| 30-Nov-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: don't malloc/free sockaddr memory on getpeername/getsockname
Just like it was done for accept(2) in cfb1e92912b4, use the same approach for two simpler syscalls that return socket addresses. Although these two syscalls aren't performance critical, this change shares some code between the 3 syscalls, trimming code size.
Following the example of accept(2), provide VNET-aware and INVARIANT-checking wrappers sopeeraddr() and sosockaddr() around protosw methods.
Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D42694
|
#
cfb1e929 |
| 30-Nov-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: don't malloc/free sockaddr memory on accept(2)
Let the accept functions provide stack memory for protocols to fill it in. Generic code should provide sockaddr_storage, specialized code may provide smaller structure.
While rewriting accept(2), make 'addrlen' a true in/out parameter, reporting the required length when the provided length was insufficient. Our manual page accept(2) and POSIX don't explicitly require that, but one can read the text as if they do. Linux also does that. Update tests accordingly.
Reviewed by: rscheff, tuexen, zlei, dchagin Differential Revision: https://reviews.freebsd.org/D42635
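The in/out 'addrlen' semantics can be sketched in userland C as follows. The helper name `copy_addr_out()` is hypothetical and only models the copy-out step; it is not the kernel routine. The assumption illustrated is that the caller supplies the buffer (typically a `struct sockaddr_storage` on its stack), the copy is truncated to the supplied length, and the required length is always reported back.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Copy an address into caller-provided memory.  On entry *addrlen is the
 * size of dst; on return it is the size the address actually needs, even
 * when the copy had to be truncated.
 */
static void
copy_addr_out(const struct sockaddr *src, socklen_t srclen, void *dst,
    socklen_t *addrlen)
{
	socklen_t n = srclen < *addrlen ? srclen : *addrlen;

	memcpy(dst, src, n);
	*addrlen = srclen;
}
```

A caller can compare the returned length with the size of its buffer to detect truncation, with no allocation or free on either side.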
|
#
fdafd315 |
| 24-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate no-longer-needed sys/cdefs.h includes as well as now-empty blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
|
#
29363fb4 |
| 23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree using automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script.
Sponsored by: Netflix
|
Revision tags: release/14.0.0 |
|
#
685dc743 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
Revision tags: release/13.2.0, release/12.4.0 |
|
#
f6696856 |
| 27-Sep-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
protocols: make socket buffers ioctl handler changeable
Allow setting custom per-protocol handlers for the socket buffer ioctls by introducing a pr_setsbopt callback whose default value is the currently-used sbsetopt().
Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D36746
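The "callback with a default" pattern described here can be sketched as below. Everything in the block is a simplified model: `struct protosw_model`, `generic_sbsetopt()`, `clamped_sbsetopt()` and `protosw_register_model()` are hypothetical names, and the real pr_setsbopt signature is not reproduced.

```c
#include <assert.h>
#include <stddef.h>

struct sockbuf_model {
	int hiwat;
};

typedef int setsbopt_t(struct sockbuf_model *, int);

struct protosw_model {
	const char *pr_name;
	setsbopt_t *pr_setsbopt;	/* NULL means "use the generic handler" */
};

/* Generic handler, standing in for the default behaviour. */
static int
generic_sbsetopt(struct sockbuf_model *sb, int value)
{
	if (value <= 0)
		return (-1);
	sb->hiwat = value;
	return (0);
}

/* A protocol-specific handler that caps the buffer size (hypothetical). */
static int
clamped_sbsetopt(struct sockbuf_model *sb, int value)
{
	return (generic_sbsetopt(sb, value > 4096 ? 4096 : value));
}

/* At registration time, fill in the default for protocols that set none. */
static void
protosw_register_model(struct protosw_model *pr)
{
	if (pr->pr_setsbopt == NULL)
		pr->pr_setsbopt = generic_sbsetopt;
}
```

Filling in the default at registration keeps the call site free of NULL checks: callers always invoke `pr->pr_setsbopt` directly.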
|
#
24af7808 |
| 31-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: repair protocol selection logic in socket(2)
Pointy hat to: glebius Fixes: 61f7427f02a307d28af674a12c45dd546e3898e4
|
#
61f7427f |
| 31-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: cleanup protocols that existed merely to provide pr_input
Since 4.4BSD the protosw was used to implement socket types created by the socket(2) syscall and at the same time to demultiplex incoming IPv4 datagrams (later copied to IPv6). This story ended with 78b1fc05b20.
The entries in inetsw (e.g. IPPROTO_ICMP) that were added to catch packets in ip_input() would also be returned by pffindproto() if the user called socket(AF_INET, SOCK_RAW, IPPROTO_ICMP). Thus, for raw sockets to work correctly, all these entries pointed at raw_usrreq, differing only in the value of pr_protocol.
With 78b1fc05b20 all these entries are no longer needed, as ip_protox is independent of protosw. Any socket syscall requesting the SOCK_RAW type ends up with rip_protosw. This protosw has its pr_protocol set to 0, allowing a socket to be marked with any protocol.
For IPv6 raw sockets the change required two small fixes:
o Validate the user provided protocol value.
o Always use the protocol number stored in inp in rip6_attach, instead of the protosw value, which is now always 0.
Differential revision: https://reviews.freebsd.org/D36380
|
#
244e1aea |
| 30-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
domains: merge domain_init() into domain_add()
domain_init(), called at SI_SUB_PROTO_DOMAIN/SI_ORDER_SECOND, always runs right after domain_add(), which is called at SI_ORDER_FIRST. Note that protocols aren't initialized yet at this point, since they are usually scheduled to initialize at SI_ORDER_THIRD.
After this merge it becomes clear that DOMF_SUPPORTED / DOMF_INITED can be garbage collected as they are set & checked in the same function.
For initialization of the domain system itself it is now clear that domaininit() can be garbage collected and static initializer is enough.
|
#
e18c5816 |
| 30-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
domains: use queue(9) SLIST for linked list of domains
|
#
d7574c74 |
| 30-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
domains: init pr_domain in pr_init()
|
#
c414347b |
| 30-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
mbufs: isolate max_linkhdr and max_protohdr handling in the mbuf code
o Statically initialize max_linkhdr to a default value without relying on domain(9) code doing that.
o Statically initialize max_protohdr to a sane value, without relying on TCP being always compiled in.
o Retire max_datalen. Set, but not used.
o Don't make the domain(9) system responsible for validating these values and updating max_hdr. Instead provide the KPI max_linkhdr_grow() and max_protohdr_grow().
o Call max_linkhdr_grow() from IEEE802.11 and max_protohdr_grow() from TCP. Those are the only protocols today that may want to grow.
Reviewed by: tuexen Differential revision: https://reviews.freebsd.org/D36376
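A grow-only KPI of the kind listed above might look roughly like this. The sketch assumes the initial values 16 and 40 purely for illustration, uses `_model` suffixes to make clear these are not the kernel functions, and omits the locking and sanity assertions a real implementation would need.

```c
#include <assert.h>

/* Statically initialized defaults (values assumed for this sketch). */
static int max_linkhdr = 16;
static int max_protohdr = 40;
static int max_hdr = 16 + 40;	/* always max_linkhdr + max_protohdr */

/* Raise max_linkhdr if a link layer needs more room; never shrink it. */
static void
max_linkhdr_grow_model(int new_size)
{
	if (new_size > max_linkhdr) {
		max_linkhdr = new_size;
		max_hdr = max_linkhdr + max_protohdr;
	}
}

/* Same idea for the protocol header budget. */
static void
max_protohdr_grow_model(int new_size)
{
	if (new_size > max_protohdr) {
		max_protohdr = new_size;
		max_hdr = max_linkhdr + max_protohdr;
	}
}
```

Because the setters only ever grow the values and recompute the sum themselves, no other subsystem has to validate or update `max_hdr`.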
|
#
837b7203 |
| 26-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
domains: use struct domain as argument
|
#
7c04ca1f |
| 26-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: for stat(2) on a socket don't report hiwat as block size
The code appeared in d8392c6c39eb without a good explanation. It is very unlikely that any software in the world needs it.
Differential revision: https://reviews.freebsd.org/D36283
|
#
e7d02be1 |
| 17-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: refactor protosw and domain static declaration and load
o Assert that every protosw has pr_attach. Now this structure is only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw. This was suggested in 1996 by wollman@ (see 7b187005d18ef), and later reiterated in 2006 by rwatson@ (see 6fbb9cf860dcd).
o Make struct domain hold a variable sized array of protosw pointers. For most protocols these pointers are initialized statically. Those domains that may have loadable protocols have spacers. IPv4 and IPv6 have 8 spacers each (andre@ dff3237ee54ea).
o For inetsw and inet6sw leave a comment noting that many protosw entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c
Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36232
|
#
81a34d37 |
| 17-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: retire pr_drain and use EVENTHANDLER(9) directly
The method was called for two different conditions: 1) the VM layer is low on pages or 2) one of the UMA zones of the mbuf allocator is exhausted. This change moves 2) into a new event handler, but all affected network subsystems are modified to subscribe to both, so this change shall not bring functional changes under different low memory situations.
There were three subsystems still using pr_drain: TCP, SCTP and frag6. The latter had its protosw entry for the only reason to register its pr_drain method.
Reviewed by: tuexen, melifaro Differential revision: https://reviews.freebsd.org/D36164
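The subscription model described above, where one drain routine is hooked to two separate low-memory events, can be sketched with a tiny handler table. This is not the EVENTHANDLER(9) API: `ev_register()`, `ev_invoke()`, the `EV_*` event names and `count_drain()` are all hypothetical, chosen only to show why subscribing to both events preserves the old behaviour.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLERS 8	/* hypothetical per-event limit */

enum lowmem_event { EV_VM_LOWMEM, EV_MBUF_LOWMEM, EV_COUNT };

typedef void drain_fn(void *);

struct handler {
	drain_fn *fn;
	void *arg;
};

static struct handler handlers[EV_COUNT][MAX_HANDLERS];
static int nhandlers[EV_COUNT];

/* Subscribe fn(arg) to one event; returns -1 if the table is full. */
static int
ev_register(enum lowmem_event ev, drain_fn *fn, void *arg)
{
	if (nhandlers[ev] == MAX_HANDLERS)
		return (-1);
	handlers[ev][nhandlers[ev]].fn = fn;
	handlers[ev][nhandlers[ev]].arg = arg;
	nhandlers[ev]++;
	return (0);
}

/* Fire an event: call every subscriber in registration order. */
static void
ev_invoke(enum lowmem_event ev)
{
	for (int i = 0; i < nhandlers[ev]; i++)
		handlers[ev][i].fn(handlers[ev][i].arg);
}

/* Example drain routine: just counts how often it ran. */
static void
count_drain(void *arg)
{
	++*(int *)arg;
}
```

A subsystem that registers the same routine for both events is drained in either low-memory situation, just as it was when a single method covered both.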
|
#
1922eb3e |
| 17-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: retire pr_slowtimo and pr_fasttimo
They were useful many years ago, when the callwheel was not efficient and the kernel tried to keep as few callout entries scheduled as possible.
Reviewed by: tuexen, melifaro Differential revision: https://reviews.freebsd.org/D36163
|
#
78b1fc05 |
| 17-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: separate pr_input and pr_ctlinput out of protosw
The protosw KPI has historically implemented two quite orthogonal things: protocols that implement a certain kind of socket, and protocols that are IPv4/IPv6 protocols. These two things do not have a one-to-one correspondence. The pr_input and pr_ctlinput methods were utilized only in IP protocols. This strange duality required IP protocols that don't have a socket to declare a protosw, e.g. carp(4). On the other hand, developers of socket protocols thought that they always needed to define pr_input/pr_ctlinput, which led to strange dead code, e.g. div_input() or sdp_ctlinput().
With this change pr_input and pr_ctlinput disappear from protosw, and IPv4/IPv6 get their own private single level protocol switch tables, ip_protox[] and ip6_protox[] respectively, pointing at arrays of ipproto_input_t functions. The pr_ctlinput that was used for control input coming from the network (ICMP, ICMPv6) is now represented by ip_ctlprotox[] and ip6_ctlprotox[].
ipproto_register() becomes the only official way to register in the table. Protocols that were always static, and that nobody is likely to want to make loadable, are now registered by ip_init() and ip6_init(). An IP protocol that considers itself unloadable shall register itself within its own private SYSINIT().
Reviewed by: tuexen, melifaro Differential revision: https://reviews.freebsd.org/D36157
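A single level protocol switch table of this shape can be sketched as an array of function pointers indexed by the 8-bit protocol number. The code below is an illustrative model only: `protox[]`, `proto_register_model()`, `proto_unregister_model()`, `proto_dispatch()` and the handler signature are hypothetical, and the error codes are an assumption rather than the documented behaviour of ipproto_register().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define NPROTO 256	/* one slot per possible protocol number */

typedef int proto_input_t(const void *, size_t);

static proto_input_t *protox[NPROTO];

/* Fallback for protocols nobody registered (models a catch-all input). */
static int
default_input(const void *pkt, size_t len)
{
	(void)pkt;
	(void)len;
	return (-1);
}

/* Example handler used below: reports the packet length it was given. */
static int
echo_len_input(const void *pkt, size_t len)
{
	(void)pkt;
	return ((int)len);
}

static int
proto_register_model(uint8_t proto, proto_input_t *fn)
{
	if (fn == NULL)
		return (EINVAL);
	if (protox[proto] != NULL)
		return (EEXIST);	/* slot already taken */
	protox[proto] = fn;
	return (0);
}

static int
proto_unregister_model(uint8_t proto)
{
	if (protox[proto] == NULL)
		return (ENOENT);
	protox[proto] = NULL;
	return (0);
}

/* Input path: one table lookup, no protosw involved. */
static int
proto_dispatch(uint8_t proto, const void *pkt, size_t len)
{
	proto_input_t *fn = protox[proto] != NULL ? protox[proto] : default_input;

	return (fn(pkt, len));
}
```

With such a table, a protocol without sockets only needs to claim its slot; it no longer has to declare a socket protocol entry just to receive packets.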
|
#
9b967bd6 |
| 12-Aug-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
domains: allow domains to be unloaded
Add a domain_remove() SYSUNINIT callback that removes the domain from the domain list if it has the DOMF_UNLOADABLE flag set. This change is required to support netlink (D36002).
Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D36173
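The add/remove behaviour can be illustrated with the queue(3) SLIST macros, which a later commit in this history adopts for the domain list. The sketch below is a userland model: `struct domain_model`, `DOMF_UNLOADABLE_MODEL` and the `*_model()` functions are hypothetical, and returning EBUSY for a non-unloadable domain is an assumption made so the behaviour can be observed, not something the commit message states.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/queue.h>

#define DOMF_UNLOADABLE_MODEL 0x1	/* stand-in for DOMF_UNLOADABLE */

struct domain_model {
	int dom_family;
	int dom_flags;
	SLIST_ENTRY(domain_model) dom_next;
};

static SLIST_HEAD(, domain_model) domains = SLIST_HEAD_INITIALIZER(domains);

static void
domain_add_model(struct domain_model *dp)
{
	SLIST_INSERT_HEAD(&domains, dp, dom_next);
}

/* Only domains flagged as unloadable may leave the list. */
static int
domain_remove_model(struct domain_model *dp)
{
	if ((dp->dom_flags & DOMF_UNLOADABLE_MODEL) == 0)
		return (EBUSY);
	SLIST_REMOVE(&domains, dp, domain_model, dom_next);
	return (0);
}

static struct domain_model *
domain_find_model(int family)
{
	struct domain_model *dp;

	SLIST_FOREACH(dp, &domains, dom_next)
		if (dp->dom_family == family)
			return (dp);
	return (NULL);
}
```

Domains without the flag stay on the list for the lifetime of the system, so code that looked them up earlier never sees them disappear.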
|
#
948f31d7 |
| 12-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netinet: do not broadcast PRC_REDIRECT_HOST on ICMP redirect
This is an expensive and useless call. It has been useless since Alexander (melifaro@) moved the forwarding table to nexthops with passive invalidation. What happens now is that a cached route in an inpcb gets invalidated on the next ip_output().
These were the last users of pfctlinput(), so garbage collect it.
Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36156
|