History log of /freebsd/sys/netinet/in_pcb_var.h (Results 1 – 14 of 14)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 5f539170 07-Mar-2025 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: retire two-level port hash database

This structure originates from the pre-FreeBSD times when system RAM was
measured in single digits of MB and Internet speeds were measured in Kb.
At first

inpcb: retire two-level port hash database

This structure originates from the pre-FreeBSD times when system RAM was
measured in single digits of MB and Internet speeds were measured in Kb.
At first level the database hashes the port value only to calculate index
into array of pointers to lazily allocated headers that hold lists of
inpcbs with the same local port. This design apparently was made to
preserve kernel memory.

In the modern kernel size of the first level of the hash is derived from
maxsockets, which is derived from maxfiles, which in its turn is derived
from amount of physical memory. Then the size of the hash is capped by
IPPORT_MAX, cause it doesn't make any sense to have hash table larger then
the set of possible values. In practice this cap works even on my laptop.
I haven't done precise calculation or experiments, but my guess is that
any system with > 8 Gb of RAM will be autotuned to IPPORT_MAX sized hash.
Apparently, this hash is a degenerate one: it never has more than one
entries in any slot. You can check this with kgdb:

set $i = 0
while ($i <= tcbinfo->ipi_porthashmask)
set $p = tcbinfo->ipi_porthashbase[$i].clh_first
set $c = 0
while ($p != 0)
set $c = $c + 1
set $p = $p->phd_hash.cle_next
end
if ($c > 1)
printf "Slot %u count %u", $i, $c
end
set $i = $i + 1
end

Retiring the two level hash we remove a lot of complexity at the cost of
only one comparison 'inp->inp_lport != lport' in the lookup cycle, which
is going to be always false on most machines anyway. This comparison
definitely shall be cheaper than extra pointer traversal.

Another positive change to be singled out is that now we no longer need to
allocate memory in non-sleepable context in in_pcbinshash(), so a
potential ENOMEM on connect(2) is removed.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D49151

show more ...


# 79fb0d24 07-Mar-2025 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: make inpcb hash insertion/removal functions private


Revision tags: release/13.5.0
# 8b3d2c19 23-Feb-2025 Mark Johnston <markj@FreeBSD.org>

inpcb: Fix reuseport lbgroup array resizing

in_pcblisten() moves an inpcb from the per-group list into the array, at
which point it becomes visible to inpcb lookups in the datapath. It
assumes that

inpcb: Fix reuseport lbgroup array resizing

in_pcblisten() moves an inpcb from the per-group list into the array, at
which point it becomes visible to inpcb lookups in the datapath. It
assumes that there is space in the array for this, but that's not
guaranteed, since in_pcbinslbgrouphash() doesn't reserve space in the
array if the inpcb isn't associated with a listening socket.

We could resize the array in in_pcblisten(), but that would introduce a
failure case where there currently is none. Instead, keep track of the
number of pending inpcbs as well, and modify in_pcbinslbgrouphash() to
reserve space for each pending (i.e., not-yet-listening) inpcb.

Add a regression test.

Reviewed by: glebius
Reported by: netchild
Fixes: 7cbb6b6e28db ("inpcb: Close some SO_REUSEPORT_LB races, part 2")
Differential Revision: https://reviews.freebsd.org/D49100

show more ...


Revision tags: release/14.2.0-p2, release/14.1.0-p8, release/13.4.0-p4
# bafe022b 18-Feb-2025 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: add const qualifiers on functions that select address/port

There are several functions that keep database locked and do address
and port selection before a caller commits the changes to the i

inpcb: add const qualifiers on functions that select address/port

There are several functions that keep database locked and do address
and port selection before a caller commits the changes to the inpcb.
Mark the inpcb argument with a good documenting const.

show more ...


Revision tags: release/14.1.0-p7, release/14.2.0-p1, release/13.4.0-p3
# ca94f92c 23-Jan-2025 Mark Johnston <markj@FreeBSD.org>

inpcb: Move the definition of struct inpcblbgroup to in_pcb_var.h

It's only needed for in_pcb.c and in6_pcb.c, so can go to the private
header.

No functional change intended.

Reported by: glebius

inpcb: Move the definition of struct inpcblbgroup to in_pcb_var.h

It's only needed for in_pcb.c and in6_pcb.c, so can go to the private
header.

No functional change intended.

Reported by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield

show more ...


# 9a413162 06-Feb-2025 Mark Johnston <markj@FreeBSD.org>

inpcb: Imbue in(6)_pcblookup_local() with a FIB parameter

This is to enable a mode where duplicate inpcb bindings are permitted,
and we want to look up an inpcb with a particular FIB. Thus, add a
"

inpcb: Imbue in(6)_pcblookup_local() with a FIB parameter

This is to enable a mode where duplicate inpcb bindings are permitted,
and we want to look up an inpcb with a particular FIB. Thus, add a
"fib" parameter to in_pcblookup() and related functions, and plumb it
through.

A fib value of RT_ALL_FIBS indicates that the lookup should ignore FIB
numbers when searching. Otherwise, it should refer to a valid FIB
number, and the returned inpcb should belong to the specific FIB. For
now, just add the fib parameter where needed, as there are several
layers to plumb through.

No functional change intended.

Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48660

show more ...


Revision tags: release/14.2.0, release/13.4.0, release/14.1.0, release/13.3.0
# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl s

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix

show more ...


Revision tags: release/14.0.0
# 2ff63af9 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .h pattern

Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/


# 7b92493a 20-Apr-2023 Mark Johnston <markj@FreeBSD.org>

inpcb: Avoid inp_cred dereferences in SMR-protected lookup

The SMR-protected inpcb lookup algorithm currently has to check whether
a matching inpcb belongs to a jail, in order to prioritize jailed
b

inpcb: Avoid inp_cred dereferences in SMR-protected lookup

The SMR-protected inpcb lookup algorithm currently has to check whether
a matching inpcb belongs to a jail, in order to prioritize jailed
bound sockets. To do this it has to maintain a ucred reference, and for
this to be safe, the reference can't be released until the UMA
destructor is called, and this will not happen within any bounded time
period.

Changing SMR to periodically recycle garbage is not trivial. Instead,
let's implement SMR-synchronized lookup without needing to dereference
inp_cred. This will allow the inpcb code to free the inp_cred reference
immediately when a PCB is freed, ensuring that ucred (and thus jail)
references are released promptly.

Commit 220d89212943 ("inpcb: immediately return matching pcb on lookup")
gets us part of the way there. This patch goes further to handle
lookups of unconnected sockets. Here, the strategy is to maintain a
well-defined order of items within a hash chain so that a wild lookup
can simply return the first match and preserve existing semantics. This
makes insertion of listening sockets more complicated in order to make
lookup simpler, which seems like the right tradeoff anyway given that
bind() is already a fairly expensive operation and lookups are more
common.

In particular, when inserting an unconnected socket, in_pcbinhash() now
keeps the following ordering:
- jailed sockets before non-jailed sockets,
- specified local addresses before unspecified local addresses.

Most of the change adds a separate SMR-based lookup path for inpcb hash
lookups. When a match is found, we try to lock the inpcb and
re-validate its connection info. In the common case, this works well
and we can simply return the inpcb. If this fails, typically because
something is concurrently modifying the inpcb, we go to the slow path,
which performs a serialized lookup.

Note, I did not touch lbgroup lookup, since there the credential
reference is formally synchronized by net_epoch, not SMR. In
particular, lbgroups are rarely allocated or freed.

I think it is possible to simplify in_pcblookup_hash_wild_locked() now,
but I didn't do it in this patch.

Discussed with: glebius
Tested by: glebius
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D38572

show more ...


Revision tags: release/13.2.0, release/12.4.0, release/13.1.0
# a0577692 26-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

in_pcb: use jenkins hash over the entire IPv6 (or IPv4) address

The intent is to provide more entropy than can be provided
by just the 32-bits of the IPv6 address which overlaps with
6to4 tunnels.

in_pcb: use jenkins hash over the entire IPv6 (or IPv4) address

The intent is to provide more entropy than can be provided
by just the 32-bits of the IPv6 address which overlaps with
6to4 tunnels. This is needed to mitigate potential algorithmic
complexity attacks from attackers who can control large
numbers of IPv6 addresses.

Together with: gallatin
Reviewed by: dwmalone, rscheff
Differential revision: https://reviews.freebsd.org/D33254

show more ...


# db0ac6de 02-Dec-2021 Cy Schubert <cy@FreeBSD.org>

Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"

This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing
changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.

A mism

Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"

This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing
changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.

A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.

show more ...


# 266f97b5 02-Dec-2021 Cy Schubert <cy@FreeBSD.org>

wpa: Import wpa_supplicant/hostapd commit 14ab4a816

This is the November update to vendor/wpa committed upstream 2021-11-26.

MFC after: 1 month


# de2d4784 02-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

SMR protection for inpcbs

With introduction of epoch(9) synchronization to network stack the
inpcb database became protected by the network epoch together with
static network data (interfaces, addre

SMR protection for inpcbs

With introduction of epoch(9) synchronization to network stack the
inpcb database became protected by the network epoch together with
static network data (interfaces, addresses, etc). However, inpcb
aren't static in nature, they are created and destroyed all the
time, which creates some traffic on the epoch(9) garbage collector.

Fairly new feature of uma(9) - Safe Memory Reclamation allows to
safely free memory in page-sized batches, with virtually zero
overhead compared to uma_zfree(). However, unlike epoch(9), it
puts stricter requirement on the access to the protected memory,
needing the critical(9) section to access it. Details:

- The database is already build on CK lists, thanks to epoch(9).
- For write access nothing is changed.
- For a lookup in the database SMR section is now required.
Once the desired inpcb is found we need to transition from SMR
section to r/w lock on the inpcb itself, with a check that inpcb
isn't yet freed. This requires some compexity, since SMR section
itself is a critical(9) section. The complexity is hidden from
KPI users in inp_smr_lock().
- For a inpcb list traversal (a pcblist sysctl, or broadcast
notification) also a new KPI is provided, that hides internals of
the database - inp_next(struct inp_iterator *).

Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D33022

show more ...


Revision tags: release/12.3.0
# 0f617ae4 18-Oct-2021 Gleb Smirnoff <glebius@FreeBSD.org>

Add in_pcb_var.h for KPIs that are private to in_pcb.c and in6_pcb.c.