#
da806e8d |
| 06-Feb-2025 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Add FIB-aware inpcb lookup
Allow protocol layers to look up an inpcb belonging to a particular FIB. This is indicated by setting INPLOOKUP_FIB; if it is set, the FIB to be used is obtained from the specified mbuf or ifnet.
No functional change intended.
Reviewed by: glebius, melifaro MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D48662
|
#
bbd0084b |
| 06-Feb-2025 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Add a flags parameter to in_pcbbind()
Add a flag, INPBIND_FIB, which means that the inpcb is local to its FIB number. When this flag is specified, duplicate bindings are permitted, so long as each FIB contains at most one inpcb bound to the same address/port. If an inpcb is bound with this flag, it'll have the INP_BOUNDFIB flag set.
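The binding rule above can be illustrated with a small user-space sketch. It is not the kernel code: `toy_pcb`, `toy_table`, `toy_bind()`, and `TOY_BIND_FIB` are hypothetical stand-ins for the inpcb, the PCB database, in_pcbbind(), and INPBIND_FIB, and the table is a flat array rather than a hash. The sketch also assumes that a FIB-local binding conflicts with an existing non-FIB-local binding of the same address/port, which the commit message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_BIND_FIB 0x1    /* hypothetical stand-in for INPBIND_FIB */
#define TOY_MAX_PCBS 16

struct toy_pcb {
    uint32_t addr;      /* local IPv4 address */
    uint16_t port;      /* local port */
    int      fib;       /* FIB number the socket belongs to */
    bool     boundfib;  /* models the INP_BOUNDFIB flag */
};

struct toy_table {
    struct toy_pcb pcbs[TOY_MAX_PCBS];
    size_t         count;
};

/* Returns 0 on success, -1 if the binding conflicts or the table is full. */
static int
toy_bind(struct toy_table *t, uint32_t addr, uint16_t port, int fib, int flags)
{
    assert(t != NULL);
    bool want_fib = (flags & TOY_BIND_FIB) != 0;

    for (size_t i = 0; i < t->count; i++) {
        const struct toy_pcb *p = &t->pcbs[i];

        if (p->addr != addr || p->port != port)
            continue;
        /* Two FIB-local bindings may coexist if their FIBs differ. */
        if (want_fib && p->boundfib && p->fib != fib)
            continue;
        return (-1);
    }
    if (t->count == TOY_MAX_PCBS)
        return (-1);
    t->pcbs[t->count++] = (struct toy_pcb){ addr, port, fib, want_fib };
    return (0);
}
```

With this model, binding the same address/port once per FIB succeeds, while a second binding in the same FIB is rejected.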
No functional change intended.
Reviewed by: glebius MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D48661
|
#
9a413162 |
| 06-Feb-2025 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Imbue in(6)_pcblookup_local() with a FIB parameter
This is to enable a mode where duplicate inpcb bindings are permitted, and we want to look up an inpcb with a particular FIB. Thus, add a "fib" parameter to in_pcblookup() and related functions, and plumb it through.
A fib value of RT_ALL_FIBS indicates that the lookup should ignore FIB numbers when searching. Otherwise, it should refer to a valid FIB number, and the returned inpcb should belong to the specific FIB. For now, just add the fib parameter where needed, as there are several layers to plumb through.
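As a rough illustration of the fib parameter's semantics, the following user-space sketch filters a flat array of entries. `toy_pcb`, `toy_lookup_local()`, and `TOY_ALL_FIBS` are hypothetical names chosen for this example; `TOY_ALL_FIBS` plays the role of RT_ALL_FIBS, and the real lookup walks hash chains rather than an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ALL_FIBS (-1)   /* hypothetical stand-in for RT_ALL_FIBS */

struct toy_pcb {
    uint16_t lport;  /* local port */
    int      fib;    /* FIB number the entry belongs to */
};

/*
 * Return the first entry bound to lport. If fib is TOY_ALL_FIBS the FIB
 * number is ignored; otherwise the entry must belong to that FIB.
 */
static const struct toy_pcb *
toy_lookup_local(const struct toy_pcb *pcbs, size_t n, uint16_t lport, int fib)
{
    assert(pcbs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (pcbs[i].lport != lport)
            continue;
        if (fib == TOY_ALL_FIBS || pcbs[i].fib == fib)
            return (&pcbs[i]);
    }
    return (NULL);
}
```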
No functional change intended.
Reviewed by: glebius MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D48660
|
Revision tags: release/14.1.0-p7, release/14.2.0-p1, release/13.4.0-p3, release/14.2.0 |
|
#
52ef944b |
| 14-Nov-2024 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Constify address parameters to in6 pcb lookup routines
No functional change intended.
MFC after: 1 week Sponsored by: Klara, Inc. Sponsored by: Stormshield
|
Revision tags: release/13.4.0, release/14.1.0, release/13.3.0 |
|
#
0fac350c |
| 30-Nov-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: don't malloc/free sockaddr memory on getpeername/getsockname
Just as was done for accept(2) in cfb1e92912b4, use the same approach for two simpler syscalls that return socket addresses. Although these two syscalls aren't performance critical, this change generalizes some code among the three syscalls, trimming code size.
Following the example of accept(2), provide VNET-aware and INVARIANT-checking wrappers sopeeraddr() and sosockaddr() around protosw methods.
Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D42694
|
#
cfb1e929 |
| 30-Nov-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: don't malloc/free sockaddr memory on accept(2)
Let the accept functions provide stack memory for protocols to fill in. Generic code should provide sockaddr_storage; specialized code may provide a smaller structure.
While rewriting accept(2), make 'addrlen' a true in/out parameter, reporting the required length if the provided length was insufficient. Our accept(2) manual page and POSIX don't explicitly require that, but the text can be read as if they do. Linux also does that. Update tests accordingly.
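The in/out behaviour of 'addrlen' can be modelled in a few lines of user-space C. This is only a sketch of the value-result convention described above: `toy_copyout_addr()` is a hypothetical helper, it uses `size_t` instead of `socklen_t`, and it does not reproduce the kernel's copyout path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most *lenp bytes of the address into dst, then store the full
 * (required) length in *lenp so that the caller can detect truncation.
 */
static void
toy_copyout_addr(const void *src, size_t srclen, void *dst, size_t *lenp)
{
    assert(src != NULL && dst != NULL && lenp != NULL);
    size_t ncopy = srclen < *lenp ? srclen : *lenp;

    memcpy(dst, src, ncopy);
    *lenp = srclen;  /* report the required length, not the copied length */
}
```

A caller that passes a buffer that is too small gets a truncated address and can compare the returned length with the buffer size to notice it.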
Reviewed by: rscheff, tuexen, zlei, dchagin Differential Revision: https://reviews.freebsd.org/D42635
|
#
29363fb4 |
| 23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree using automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script.
Sponsored by: Netflix
|
Revision tags: release/14.0.0 |
|
#
2ff63af9 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .h pattern
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
|
#
7b92493a |
| 20-Apr-2023 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Avoid inp_cred dereferences in SMR-protected lookup
The SMR-protected inpcb lookup algorithm currently has to check whether a matching inpcb belongs to a jail, in order to prioritize jailed bound sockets. To do this it has to maintain a ucred reference, and for this to be safe, the reference can't be released until the UMA destructor is called, and this will not happen within any bounded time period.
Changing SMR to periodically recycle garbage is not trivial. Instead, let's implement SMR-synchronized lookup without needing to dereference inp_cred. This will allow the inpcb code to free the inp_cred reference immediately when a PCB is freed, ensuring that ucred (and thus jail) references are released promptly.
Commit 220d89212943 ("inpcb: immediately return matching pcb on lookup") gets us part of the way there. This patch goes further to handle lookups of unconnected sockets. Here, the strategy is to maintain a well-defined order of items within a hash chain so that a wild lookup can simply return the first match and preserve existing semantics. This makes insertion of listening sockets more complicated in order to make lookup simpler, which seems like the right tradeoff anyway given that bind() is already a fairly expensive operation and lookups are more common.
In particular, when inserting an unconnected socket, in_pcbinhash() now keeps the following ordering:
- jailed sockets before non-jailed sockets,
- specified local addresses before unspecified local addresses.
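The effect of this ordering can be sketched with a singly linked chain in user space. Everything here is hypothetical (`toy_pcb`, `toy_rank()`, `toy_insert()`, `toy_lookup_wild()`); in particular, treating the jail rule as the primary sort key and the address rule as the secondary key is an assumption of the sketch, and the jail visibility checks of the real lookup are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_pcb {
    struct toy_pcb *next;
    bool     jailed;
    uint32_t laddr;  /* 0 models an unspecified (wildcard) local address */
    uint16_t lport;
};

/* Lower rank sorts earlier: jailed first, then specified addresses. */
static int
toy_rank(const struct toy_pcb *p)
{
    return ((p->jailed ? 0 : 2) + (p->laddr != 0 ? 0 : 1));
}

/* Insert p so that the chain stays sorted by rank. */
static void
toy_insert(struct toy_pcb **head, struct toy_pcb *p)
{
    assert(head != NULL && p != NULL);
    struct toy_pcb **pp = head;

    while (*pp != NULL && toy_rank(*pp) <= toy_rank(p))
        pp = &(*pp)->next;
    p->next = *pp;
    *pp = p;
}

/* Because the chain is ordered, the first match is the preferred one. */
static struct toy_pcb *
toy_lookup_wild(struct toy_pcb *head, uint32_t laddr, uint16_t lport)
{
    for (struct toy_pcb *p = head; p != NULL; p = p->next) {
        if (p->lport == lport && (p->laddr == laddr || p->laddr == 0))
            return (p);
    }
    return (NULL);
}
```

The extra work happens once at insertion time, so the lookup never has to compare candidates against each other.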
Most of the change adds a separate SMR-based lookup path for inpcb hash lookups. When a match is found, we try to lock the inpcb and re-validate its connection info. In the common case, this works well and we can simply return the inpcb. If this fails, typically because something is concurrently modifying the inpcb, we go to the slow path, which performs a serialized lookup.
Note, I did not touch lbgroup lookup, since there the credential reference is formally synchronized by net_epoch, not SMR. In particular, lbgroups are rarely allocated or freed.
I think it is possible to simplify in_pcblookup_hash_wild_locked() now, but I didn't do it in this patch.
Discussed with: glebius Tested by: glebius Sponsored by: Klara, Inc. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D38572
|
Revision tags: release/13.2.0 |
|
#
96871af0 |
| 15-Feb-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: use family specific sockaddr argument for bind functions
Do the cast from sockaddr to either the IPv4 or IPv6 sockaddr in the protocol's pr_bind method, and from there on go down the call stack with a family-specific argument.
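The pattern of casting once at the protocol entry point can be sketched in user space as follows. `toy_pr_bind()` and `toy_in_pcbbind()` are hypothetical names, and the specific checks and error codes are assumptions of this example rather than a description of the kernel's pr_bind implementations.

```c
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Family-specific layer: it only ever sees a sockaddr_in. */
static int
toy_in_pcbbind(const struct sockaddr_in *sin, in_port_t *boundp)
{
    *boundp = sin->sin_port;  /* kept in network byte order */
    return (0);
}

/* Entry point: the one place where the generic pointer is checked and cast. */
static int
toy_pr_bind(const struct sockaddr *sa, socklen_t len, in_port_t *boundp)
{
    assert(sa != NULL && boundp != NULL);
    if (len != sizeof(struct sockaddr_in))
        return (EINVAL);
    if (sa->sa_family != AF_INET)
        return (EAFNOSUPPORT);
    return (toy_in_pcbbind((const struct sockaddr_in *)sa, boundp));
}
```

Functions below the entry point get type checking from the compiler and no longer need to repeat the family or length checks.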
Reviewed by: zlei, melifaro, markj Differential Revision: https://reviews.freebsd.org/D38601
|
#
4130ea61 |
| 09-Feb-2023 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Split in_pcblookup_hash_locked() and clean up a bit
Split the in_pcblookup_hash_locked() function into several independent subroutine calls, each of which does some kind of hash table lookup. This refactoring makes it easier to introduce variants of the lookup algorithm that behave differently depending on whether they are synchronized by SMR or the PCB database hash lock.
While here, do some related cleanup:
- Remove an unused ifnet parameter from internal functions. Keep it in external functions so that it can be used in the future to derive a v6 scopeid.
- Reorder the parameters to in_pcblookup_lbgroup() to be consistent with the other lookup functions.
- Remove an always-true check from in_pcblookup_lbgroup(): we can assume that we're performing a wildcard match.
No functional change intended.
Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D38364
|
#
a9d22cce |
| 03-Feb-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: use family specific sockaddr argument for connect functions
Do the cast from sockaddr to either the IPv4 or IPv6 sockaddr in the protocol's pr_connect method, and from there on go down the call stack with a family-specific argument.
Reviewed by: markj Differential revision: https://reviews.freebsd.org/D38356
|
#
221b9e3d |
| 03-Feb-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: merge two versions of in6_pcbconnect() into one
No functional change.
Reviewed by: markj Differential revision: https://reviews.freebsd.org/D38354
|
Revision tags: release/12.4.0 |
|
#
43d39ca7 |
| 04-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netinet*: de-void control input IP protocol methods
After the decoupling of protosw(9) and IP wire protocols in 78b1fc05b205, for IPv4 we got the vector ip_ctlprotox[], which is executed only from icmp_input(), and for IPv6 we got ip6_ctlprotox[], which is executed only from icmp6_input(). This allows the use of protocol-specific argument types in these methods instead of struct sockaddr and void.
Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36727
|
Revision tags: release/13.1.0 |
|
#
db0ac6de |
| 02-Dec-2021 |
Cy Schubert <cy@FreeBSD.org> |
Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"
This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.
A mismerge of a merge to catch up to main resulted in files being committed which should not have been.
|
#
266f97b5 |
| 02-Dec-2021 |
Cy Schubert <cy@FreeBSD.org> |
wpa: Import wpa_supplicant/hostapd commit 14ab4a816
This is the November update to vendor/wpa committed upstream 2021-11-26.
MFC after: 1 month
|
#
93c67567 |
| 02-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove "options PCBGROUP"
With upcoming changes to the inpcb synchronisation it is going to be broken. Even its current status after the move of PCB synchronization to the network epoch is very questionable.
This experimental feature was sponsored by Juniper but ended up never being used at Juniper and doesn't exist in their source tree [sjg@, stevek@, jtl@]. In the past (AFAIK, pre-epoch times) it was tried out at Netflix [gallatin@, rrs@] with no positive result and at Yandex [ae@, melifaro@].
I'm open to resurrecting it if there is any interest from anybody.
Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33020
|
Revision tags: release/12.3.0, release/13.0.0 |
|
#
a034518a |
| 19-Dec-2020 |
Andrew Gallatin <gallatin@FreeBSD.org> |
Filter TCP connections to SO_REUSEPORT_LB listen sockets by NUMA domain
In order to efficiently serve web traffic on a NUMA machine, one must avoid as many NUMA domain crossings as possible. With SO_REUSEPORT_LB, a number of workers can share a listen socket. However, even if a worker sets affinity to a core or set of cores on a NUMA domain, it will receive connections associated with all NUMA domains in the system. This will lead to cross-domain traffic when the server writes to the socket or calls sendfile(), and memory is allocated on the server's local NUMA node, but transmitted on the NUMA node associated with the TCP connection. Similarly, when the server reads from the socket, it will likely be reading memory allocated on the NUMA domain associated with the TCP connection.
This change provides a new socket option, TCP_REUSPORT_LB_NUMA. A server can now tell the kernel to filter traffic so that only incoming connections associated with the desired NUMA domain are given to the server. (Of course, in the case where there are no servers sharing the listen socket on some domain, then as a fallback, traffic will be hashed as normal to all servers sharing the listen socket regardless of domain.) This allows a server to deal only with traffic that is local to its NUMA domain, and avoids cross-domain traffic in most cases.
This patch, and a corresponding small patch to nginx to use TCP_REUSPORT_LB_NUMA, allows us to serve 190Gb/s of kTLS encrypted HTTPS media content from dual-socket Xeons with only 13% (as measured by pcm.x) cross-domain traffic on the memory controller.
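The filter-then-fallback selection described above can be approximated by the following user-space sketch. `toy_listener` and `toy_pick_listener()` are hypothetical, and how the kernel derives a connection's NUMA domain and flow hash is outside the scope of this example; both are simply passed in as arguments here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_LISTENERS 8

struct toy_listener {
    int id;
    int numa_domain;  /* domain this worker asked to be filtered to */
};

/*
 * Prefer listeners in the connection's NUMA domain; if there are none,
 * fall back to hashing across all listeners. Returns NULL only if n == 0.
 */
static const struct toy_listener *
toy_pick_listener(const struct toy_listener *ls, size_t n, uint32_t hash,
    int domain)
{
    assert(n <= TOY_MAX_LISTENERS);
    const struct toy_listener *local[TOY_MAX_LISTENERS];
    size_t nlocal = 0;

    for (size_t i = 0; i < n; i++) {
        if (ls[i].numa_domain == domain)
            local[nlocal++] = &ls[i];
    }
    if (nlocal > 0)
        return (local[hash % nlocal]);
    if (n == 0)
        return (NULL);
    return (&ls[hash % n]);  /* fallback: ignore the domain */
}
```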
Reviewed by: jhb, bz (earlier version), bcr (man page) Tested by: gonzo Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D21636
|
Revision tags: release/12.2.0, release/11.4.0 |
|
#
25102351 |
| 19-May-2020 |
Mike Karels <karels@FreeBSD.org> |
Allow TCP to reuse local port with different destinations
Previously, tcp_connect() would bind a local port before connecting, forcing the local port to be unique across all outgoing TCP connections for the address family. Instead, choose a local port after selecting the destination and the local address, requiring only that the tuple is unique and does not match a wildcard binding.
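A simplified model of the new port selection rule is sketched below. The names `toy_conn`, `toy_port_usable()`, and `toy_pick_port()` are hypothetical, a flat array replaces the PCB hash, and the sequential scan over a port range is an assumption made to keep the example deterministic; it is not a description of how the kernel iterates over candidate ports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_conn {
    uint32_t laddr, faddr;  /* faddr == 0 and fport == 0: wildcard binding */
    uint16_t lport, fport;
};

/* A port is usable if the 4-tuple is unique and no wildcard binding matches. */
static bool
toy_port_usable(const struct toy_conn *tbl, size_t n, uint32_t laddr,
    uint16_t lport, uint32_t faddr, uint16_t fport)
{
    for (size_t i = 0; i < n; i++) {
        const struct toy_conn *c = &tbl[i];

        if (c->lport != lport)
            continue;
        if (c->faddr == 0 && c->fport == 0) {
            /* Wildcard binding on this port and a matching address. */
            if (c->laddr == 0 || c->laddr == laddr)
                return (false);
            continue;
        }
        if (c->laddr == laddr && c->faddr == faddr && c->fport == fport)
            return (false);
    }
    return (true);
}

/* Returns the first usable port in [first, last], or 0 if there is none. */
static uint16_t
toy_pick_port(const struct toy_conn *tbl, size_t n, uint32_t laddr,
    uint32_t faddr, uint16_t fport, uint16_t first, uint16_t last)
{
    assert(first != 0 && first <= last);
    for (uint32_t p = first; p <= last; p++) {
        if (toy_port_usable(tbl, n, laddr, (uint16_t)p, faddr, fport))
            return ((uint16_t)p);
    }
    return (0);
}
```

Because the destination is known before the port is chosen, the same local port can serve connections to different destinations.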
Reviewed by: tuexen (rscheff, rrs previous version) MFC after: 1 month Sponsored by: Forcepoint LLC Differential Revision: https://reviews.freebsd.org/D24781
|
#
fe1274ee |
| 12-Jan-2020 |
Michael Tuexen <tuexen@FreeBSD.org> |
Fix race when accepting TCP connections.
When expanding a SYN-cache entry to a socket/inp, a two-step approach was taken:
1) The local address was filled in, then the inp was added to the hash table.
2) The remote address was filled in and the inp was relocated in the hash table.
Before the epoch changes, a write lock was held while this happened, and the code looking up entries held a corresponding read lock. Since the read lock went away with the introduction of epochs, the half-populated inp could be found during lookup. This resulted in processing TCP segments in the context of the wrong TCP connection. This patch changes the above procedure so that the inp is fully populated before it is inserted into the hash table.
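The "populate fully, then publish" idea can be shown with a one-slot table in user space. This sketch swaps in C11 atomics with release/acquire ordering in place of the kernel's epoch-protected hash lists, and `toy_inp`, `toy_slot`, `toy_insert_full()`, and `toy_lookup()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct toy_inp {
    uint32_t laddr, faddr;
    uint16_t lport, fport;
};

/* A one-bucket "hash table" that lockless readers may inspect at any time. */
static _Atomic(struct toy_inp *) toy_slot;

/* Fill in every field first; make the entry visible only as the last step. */
static void
toy_insert_full(struct toy_inp *inp, uint32_t laddr, uint16_t lport,
    uint32_t faddr, uint16_t fport)
{
    assert(inp != NULL);
    inp->laddr = laddr;
    inp->lport = lport;
    inp->faddr = faddr;
    inp->fport = fport;
    atomic_store_explicit(&toy_slot, inp, memory_order_release);
}

/* A reader either sees no entry or a completely populated one. */
static struct toy_inp *
toy_lookup(uint32_t laddr, uint16_t lport, uint32_t faddr, uint16_t fport)
{
    struct toy_inp *inp =
        atomic_load_explicit(&toy_slot, memory_order_acquire);

    if (inp != NULL && inp->laddr == laddr && inp->lport == lport &&
        inp->faddr == faddr && inp->fport == fport)
        return (inp);
    return (NULL);
}
```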
Thanks to Paul <devgs@ukr.net> for reporting the issue on the net@ mailing list and for testing the patch!
Reviewed by: rrs@ MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D22971
show more ...
|
Revision tags: release/12.1.0 |
|
#
0ecd976e |
| 02-Aug-2019 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
IPv6 cleanup: kernel
Finish what was started a few years ago and harmonize IPv6 and IPv4 kernel names. We are down to very few places now, so it is feasible to do the change for everything remaining without causing too much disturbance.
Remove "aliases" for IPv6 names which confusingly could indicate that we are talking about a different data structure or field or have two fields, one for each address family. Try to follow common conventions used in FreeBSD.
* Rename sin6p to sin6 as that is how it is spelt in most places.
* Remove "aliases" (#defines) for:
  - in6pcb which really is an inpcb and nothing separate
  - sotoin6pcb which is sotoinpcb (as per above)
  - in6p_sp which is inp_sp
  - in6p_flowinfo which is inp_flow
* Try to use ia6 for in6_addr rather than in6p.
* With all these gone also rename the in6p variables to inp as that is what we call it in most of the network stack including parts of netinet6.
The reasons behind this cleanup are that we try to further unify netinet and netinet6 code where possible, and that people will be less likely to overlook one or the other protocol family when making code changes, as they may not have spotted places due to different names for the same thing.
No functional changes.
Discussed with: tuexen (SCTP changes) MFC after: 3 months Sponsored by: Netflix
|
Revision tags: release/11.3.0, release/12.0.0, release/11.2.0 |
|
#
82725ba9 |
| 23-Nov-2017 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Merge ^/head r325999 through r326131.
|
#
51369649 |
| 20-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well-known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
|
Revision tags: release/10.4.0, release/11.1.0 |
|
#
dce33a45 |
| 06-Mar-2017 |
Ermal Luçi <eri@FreeBSD.org> |
The patch provides the same socket option as Linux IP_ORIGDSTADDR. Unfortunately they will have different integer values because the Linux value is already assigned in FreeBSD.
The patch is similar to IP_RECVDSTADDR but also provides the destination port value to the application.
This enables, or improves, the implementation of transparent proxies on UDP sockets, since the complete destination information of forwarded packets is available.
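From an application's point of view, using the option could look roughly like the sketch below: enable it on a UDP socket, then scan the control messages returned by recvmsg(2) for a `struct sockaddr_in` carrying the original destination address and port. `enable_origdst()` and `find_origdst()` are hypothetical helper names for this example, and error handling is reduced to return codes.

```c
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Ask the stack to attach the original destination to received datagrams. */
static int
enable_origdst(int s)
{
    int on = 1;

    return (setsockopt(s, IPPROTO_IP, IP_ORIGDSTADDR, &on, sizeof(on)));
}

/*
 * Scan the control messages of a received datagram for IP_ORIGDSTADDR and
 * copy out the original destination. Returns 0 on success, -1 if absent.
 */
static int
find_origdst(struct msghdr *msg, struct sockaddr_in *out)
{
    assert(msg != NULL && out != NULL);
    for (struct cmsghdr *c = CMSG_FIRSTHDR(msg); c != NULL;
        c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == IPPROTO_IP &&
            c->cmsg_type == IP_ORIGDSTADDR &&
            c->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return (0);
        }
    }
    return (-1);
}
```

A transparent UDP proxy would call `find_origdst()` after each recvmsg(2) and use the returned address and port to decide where to forward the datagram.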
Reviewed by: adrian, aw Approved by: ae (mentor) Sponsored by: rsync.net Differential Revision: D9235
|
#
348238db |
| 01-Mar-2017 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r314420 through r314481.
|