#
5f539170 |
| 07-Mar-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: retire two-level port hash database
This structure originates from the pre-FreeBSD times when system RAM was measured in single digits of MB and Internet speeds were measured in Kb. At first
inpcb: retire two-level port hash database
This structure originates from the pre-FreeBSD times when system RAM was measured in single digits of MB and Internet speeds were measured in Kb. At first level the database hashes the port value only to calculate index into array of pointers to lazily allocated headers that hold lists of inpcbs with the same local port. This design apparently was made to preserve kernel memory.
In the modern kernel size of the first level of the hash is derived from maxsockets, which is derived from maxfiles, which in its turn is derived from amount of physical memory. Then the size of the hash is capped by IPPORT_MAX, cause it doesn't make any sense to have hash table larger then the set of possible values. In practice this cap works even on my laptop. I haven't done precise calculation or experiments, but my guess is that any system with > 8 Gb of RAM will be autotuned to IPPORT_MAX sized hash. Apparently, this hash is a degenerate one: it never has more than one entries in any slot. You can check this with kgdb:
set $i = 0 while ($i <= tcbinfo->ipi_porthashmask) set $p = tcbinfo->ipi_porthashbase[$i].clh_first set $c = 0 while ($p != 0) set $c = $c + 1 set $p = $p->phd_hash.cle_next end if ($c > 1) printf "Slot %u count %u", $i, $c end set $i = $i + 1 end
Retiring the two level hash we remove a lot of complexity at the cost of only one comparison 'inp->inp_lport != lport' in the lookup cycle, which is going to be always false on most machines anyway. This comparison definitely shall be cheaper than extra pointer traversal.
Another positive change to be singled out is that now we no longer need to allocate memory in non-sleepable context in in_pcbinshash(), so a potential ENOMEM on connect(2) is removed.
Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D49151
show more ...
|
#
79fb0d24 |
| 07-Mar-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: make inpcb hash insertion/removal functions private
|
Revision tags: release/13.5.0 |
|
#
8b3d2c19 |
| 23-Feb-2025 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Fix reuseport lbgroup array resizing
in_pcblisten() moves an inpcb from the per-group list into the array, at which point it becomes visible to inpcb lookups in the datapath. It assumes that
inpcb: Fix reuseport lbgroup array resizing
in_pcblisten() moves an inpcb from the per-group list into the array, at which point it becomes visible to inpcb lookups in the datapath. It assumes that there is space in the array for this, but that's not guaranteed, since in_pcbinslbgrouphash() doesn't reserve space in the array if the inpcb isn't associated with a listening socket.
We could resize the array in in_pcblisten(), but that would introduce a failure case where there currently is none. Instead, keep track of the number of pending inpcbs as well, and modify in_pcbinslbgrouphash() to reserve space for each pending (i.e., not-yet-listening) inpcb.
Add a regression test.
Reviewed by: glebius Reported by: netchild Fixes: 7cbb6b6e28db ("inpcb: Close some SO_REUSEPORT_LB races, part 2") Differential Revision: https://reviews.freebsd.org/D49100
show more ...
|
Revision tags: release/14.2.0-p2, release/14.1.0-p8, release/13.4.0-p4 |
|
#
bafe022b |
| 18-Feb-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: add const qualifiers on functions that select address/port
There are several functions that keep database locked and do address and port selection before a caller commits the changes to the i
inpcb: add const qualifiers on functions that select address/port
There are several functions that keep database locked and do address and port selection before a caller commits the changes to the inpcb. Mark the inpcb argument with a good documenting const.
show more ...
|
Revision tags: release/14.1.0-p7, release/14.2.0-p1, release/13.4.0-p3 |
|
#
ca94f92c |
| 23-Jan-2025 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Move the definition of struct inpcblbgroup to in_pcb_var.h
It's only needed for in_pcb.c and in6_pcb.c, so can go to the private header.
No functional change intended.
Reported by: glebius
inpcb: Move the definition of struct inpcblbgroup to in_pcb_var.h
It's only needed for in_pcb.c and in6_pcb.c, so can go to the private header.
No functional change intended.
Reported by: glebius MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Stormshield
show more ...
|
#
9a413162 |
| 06-Feb-2025 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Imbue in(6)_pcblookup_local() with a FIB parameter
This is to enable a mode where duplicate inpcb bindings are permitted, and we want to look up an inpcb with a particular FIB. Thus, add a "
inpcb: Imbue in(6)_pcblookup_local() with a FIB parameter
This is to enable a mode where duplicate inpcb bindings are permitted, and we want to look up an inpcb with a particular FIB. Thus, add a "fib" parameter to in_pcblookup() and related functions, and plumb it through.
A fib value of RT_ALL_FIBS indicates that the lookup should ignore FIB numbers when searching. Otherwise, it should refer to a valid FIB number, and the returned inpcb should belong to the specific FIB. For now, just add the fib parameter where needed, as there are several layers to plumb through.
No functional change intended.
Reviewed by: glebius MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D48660
show more ...
|
Revision tags: release/14.2.0, release/13.4.0, release/14.1.0, release/13.3.0 |
|
#
29363fb4 |
| 23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl s
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script.
Sponsored by: Netflix
show more ...
|
Revision tags: release/14.0.0 |
|
#
2ff63af9 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .h pattern
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
|
#
7b92493a |
| 20-Apr-2023 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Avoid inp_cred dereferences in SMR-protected lookup
The SMR-protected inpcb lookup algorithm currently has to check whether a matching inpcb belongs to a jail, in order to prioritize jailed b
inpcb: Avoid inp_cred dereferences in SMR-protected lookup
The SMR-protected inpcb lookup algorithm currently has to check whether a matching inpcb belongs to a jail, in order to prioritize jailed bound sockets. To do this it has to maintain a ucred reference, and for this to be safe, the reference can't be released until the UMA destructor is called, and this will not happen within any bounded time period.
Changing SMR to periodically recycle garbage is not trivial. Instead, let's implement SMR-synchronized lookup without needing to dereference inp_cred. This will allow the inpcb code to free the inp_cred reference immediately when a PCB is freed, ensuring that ucred (and thus jail) references are released promptly.
Commit 220d89212943 ("inpcb: immediately return matching pcb on lookup") gets us part of the way there. This patch goes further to handle lookups of unconnected sockets. Here, the strategy is to maintain a well-defined order of items within a hash chain so that a wild lookup can simply return the first match and preserve existing semantics. This makes insertion of listening sockets more complicated in order to make lookup simpler, which seems like the right tradeoff anyway given that bind() is already a fairly expensive operation and lookups are more common.
In particular, when inserting an unconnected socket, in_pcbinhash() now keeps the following ordering: - jailed sockets before non-jailed sockets, - specified local addresses before unspecified local addresses.
Most of the change adds a separate SMR-based lookup path for inpcb hash lookups. When a match is found, we try to lock the inpcb and re-validate its connection info. In the common case, this works well and we can simply return the inpcb. If this fails, typically because something is concurrently modifying the inpcb, we go to the slow path, which performs a serialized lookup.
Note, I did not touch lbgroup lookup, since there the credential reference is formally synchronized by net_epoch, not SMR. In particular, lbgroups are rarely allocated or freed.
I think it is possible to simplify in_pcblookup_hash_wild_locked() now, but I didn't do it in this patch.
Discussed with: glebius Tested by: glebius Sponsored by: Klara, Inc. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D38572
show more ...
|
Revision tags: release/13.2.0, release/12.4.0, release/13.1.0 |
|
#
a0577692 |
| 26-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
in_pcb: use jenkins hash over the entire IPv6 (or IPv4) address
The intent is to provide more entropy than can be provided by just the 32-bits of the IPv6 address which overlaps with 6to4 tunnels.
in_pcb: use jenkins hash over the entire IPv6 (or IPv4) address
The intent is to provide more entropy than can be provided by just the 32-bits of the IPv6 address which overlaps with 6to4 tunnels. This is needed to mitigate potential algorithmic complexity attacks from attackers who can control large numbers of IPv6 addresses.
Together with: gallatin Reviewed by: dwmalone, rscheff Differential revision: https://reviews.freebsd.org/D33254
show more ...
|
#
db0ac6de |
| 02-Dec-2021 |
Cy Schubert <cy@FreeBSD.org> |
Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"
This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.
A mism
Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"
This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.
A mismerge of a merge to catch up to main resulted in files being committed which should not have been.
show more ...
|
#
266f97b5 |
| 02-Dec-2021 |
Cy Schubert <cy@FreeBSD.org> |
wpa: Import wpa_supplicant/hostapd commit 14ab4a816
This is the November update to vendor/wpa committed upstream 2021-11-26.
MFC after: 1 month
|
#
de2d4784 |
| 02-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
SMR protection for inpcbs
With introduction of epoch(9) synchronization to network stack the inpcb database became protected by the network epoch together with static network data (interfaces, addre
SMR protection for inpcbs
With introduction of epoch(9) synchronization to network stack the inpcb database became protected by the network epoch together with static network data (interfaces, addresses, etc). However, inpcb aren't static in nature, they are created and destroyed all the time, which creates some traffic on the epoch(9) garbage collector.
Fairly new feature of uma(9) - Safe Memory Reclamation allows to safely free memory in page-sized batches, with virtually zero overhead compared to uma_zfree(). However, unlike epoch(9), it puts stricter requirement on the access to the protected memory, needing the critical(9) section to access it. Details:
- The database is already build on CK lists, thanks to epoch(9). - For write access nothing is changed. - For a lookup in the database SMR section is now required. Once the desired inpcb is found we need to transition from SMR section to r/w lock on the inpcb itself, with a check that inpcb isn't yet freed. This requires some compexity, since SMR section itself is a critical(9) section. The complexity is hidden from KPI users in inp_smr_lock(). - For a inpcb list traversal (a pcblist sysctl, or broadcast notification) also a new KPI is provided, that hides internals of the database - inp_next(struct inp_iterator *).
Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33022
show more ...
|
Revision tags: release/12.3.0 |
|
#
0f617ae4 |
| 18-Oct-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Add in_pcb_var.h for KPIs that are private to in_pcb.c and in6_pcb.c.
|