History log of /freebsd/sys/netinet6/in6_pcb.c (Results 1 – 25 of 450)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 5f539170 07-Mar-2025 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: retire two-level port hash database

This structure originates from the pre-FreeBSD times when system RAM was
measured in single digits of MB and Internet speeds were measured in Kb.
At first

inpcb: retire two-level port hash database

This structure originates from the pre-FreeBSD times when system RAM was
measured in single digits of MB and Internet speeds were measured in Kb.
At first level the database hashes the port value only to calculate index
into array of pointers to lazily allocated headers that hold lists of
inpcbs with the same local port. This design apparently was made to
preserve kernel memory.

In the modern kernel size of the first level of the hash is derived from
maxsockets, which is derived from maxfiles, which in its turn is derived
from amount of physical memory. Then the size of the hash is capped by
IPPORT_MAX, cause it doesn't make any sense to have hash table larger then
the set of possible values. In practice this cap works even on my laptop.
I haven't done precise calculation or experiments, but my guess is that
any system with > 8 Gb of RAM will be autotuned to IPPORT_MAX sized hash.
Apparently, this hash is a degenerate one: it never has more than one
entries in any slot. You can check this with kgdb:

set $i = 0
while ($i <= tcbinfo->ipi_porthashmask)
set $p = tcbinfo->ipi_porthashbase[$i].clh_first
set $c = 0
while ($p != 0)
set $c = $c + 1
set $p = $p->phd_hash.cle_next
end
if ($c > 1)
printf "Slot %u count %u", $i, $c
end
set $i = $i + 1
end

Retiring the two level hash we remove a lot of complexity at the cost of
only one comparison 'inp->inp_lport != lport' in the lookup cycle, which
is going to be always false on most machines anyway. This comparison
definitely shall be cheaper than extra pointer traversal.

Another positive change to be singled out is that now we no longer need to
allocate memory in non-sleepable context in in_pcbinshash(), so a
potential ENOMEM on connect(2) is removed.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D49151

show more ...


Revision tags: release/13.5.0, release/14.2.0-p2, release/14.1.0-p8, release/13.4.0-p4
# da806e8d 06-Feb-2025 Mark Johnston <markj@FreeBSD.org>

inpcb: Add FIB-aware inpcb lookup

Allow protocol layers to look up an inpcb belonging to a particular FIB.
This is indicated by setting INPLOOKUP_FIB; if it is set, the FIB to be
used is obtained fr

inpcb: Add FIB-aware inpcb lookup

Allow protocol layers to look up an inpcb belonging to a particular FIB.
This is indicated by setting INPLOOKUP_FIB; if it is set, the FIB to be
used is obtained from the specificed mbuf or ifnet.

No functional change intended.

Reviewed by: glebius, melifaro
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48662

show more ...


# bbd0084b 06-Feb-2025 Mark Johnston <markj@FreeBSD.org>

inpcb: Add a flags parameter to in_pcbbind()

Add a flag, INPBIND_FIB, which means that the inpcb is local to its FIB
number. When this flag is specified, duplicate bindings are permitted,
so long a

inpcb: Add a flags parameter to in_pcbbind()

Add a flag, INPBIND_FIB, which means that the inpcb is local to its FIB
number. When this flag is specified, duplicate bindings are permitted,
so long as each FIB contains at most one inpcb bound to the same
address/port. If an inpcb is bound with this flag, it'll have the
INP_BOUNDFIB flag set.

No functional change intended.

Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48661

show more ...


# 9a413162 06-Feb-2025 Mark Johnston <markj@FreeBSD.org>

inpcb: Imbue in(6)_pcblookup_local() with a FIB parameter

This is to enable a mode where duplicate inpcb bindings are permitted,
and we want to look up an inpcb with a particular FIB. Thus, add a
"

inpcb: Imbue in(6)_pcblookup_local() with a FIB parameter

This is to enable a mode where duplicate inpcb bindings are permitted,
and we want to look up an inpcb with a particular FIB. Thus, add a
"fib" parameter to in_pcblookup() and related functions, and plumb it
through.

A fib value of RT_ALL_FIBS indicates that the lookup should ignore FIB
numbers when searching. Otherwise, it should refer to a valid FIB
number, and the returned inpcb should belong to the specific FIB. For
now, just add the fib parameter where needed, as there are several
layers to plumb through.

No functional change intended.

Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48660

show more ...


Revision tags: release/14.1.0-p7, release/14.2.0-p1, release/13.4.0-p3
# c9756953 23-Dec-2024 Mark Johnston <markj@FreeBSD.org>

inpcb: Further restrict binding to a port owned by a different UID

See commit 4f02a7d739b3 for more background.

I cannot see a good reason to continue ignoring mismatching UIDs when
binding to INAD

inpcb: Further restrict binding to a port owned by a different UID

See commit 4f02a7d739b3 for more background.

I cannot see a good reason to continue ignoring mismatching UIDs when
binding to INADDR_ANY. Looking at the sdr.V2.4a7n sources (mentioned in
bugzilla PR 7713), there is a CANT_MCAST_BIND hack wherein the
application binds to INADDR_ANY instead of a multicast address, but
CANT_MCAST_BIND isn't defined for FreeBSD builds.

It seems unlikely that we still have a use-case for allowing sockets
from different UIDs to bind to the same port when binding to the
unspecified address. And, as noted in D47832, applications like sdr
would have been broken by the inverted SO_REUSEPORT check removed in
that revision, apparently without any bug reports. Let's break
compatibility and simply disallow this case outright.

Also, add some comments, remove a hack in a regression test which tests
this funtionality, and add a new regression test to exercise the
remaining checks that were added in commit 4658dc8325e03.

MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47870

show more ...


# 4f02a7d7 12-Dec-2024 Mark Johnston <markj@FreeBSD.org>

inpcb: Remove bogus SO_REUSEPORT(_LB) checks in in_pcbbind()

This check for SO_REUSEPORT was added way back in commit 52b65dbe85faf.
Per the commit log, this commit restricted this port-stealing che

inpcb: Remove bogus SO_REUSEPORT(_LB) checks in in_pcbbind()

This check for SO_REUSEPORT was added way back in commit 52b65dbe85faf.
Per the commit log, this commit restricted this port-stealing check to
unicast addresses, and then only if the existing socket does not have
SO_REUSEPORT set. In other words, if there exists a socket bound to
INADDR_ANY, and we bind a socket to INADDR_ANY with the same port, then
the two sockets need not be owned by the same user if the existing
socket has SO_REUSEPORT set.

This is a surprising semantic; bugzilla PR 7713 gives some additional
context. That PR makes a case for the behaviour described above when
binding to a multicast address. But, the SO_REUSEPORT check is only
applied when binding to a non-multicast address, so it doesn't really
make sense. In the PR the committer notes that "unicast applications
don't set SO_REUSEPORT", which makes some sense, but also refers to
"multicast applications that bind to INADDR_ANY", which sounds a bit
suspicious.

OpenBSD performs the multicast check, but not the SO_REUSEPORT check.
DragonflyBSD removed the SO_REUSEPORT (and INADDR_ANY) checks back in
2014 (commit 0323d5fde12a4). NetBSD explicitly copied our logic and
still has it.

The plot thickens: 20 years later, SO_REUSEPORT_LB was ported from
DragonflyBSD: this option provides similar semantics to SO_REUSEPORT,
but for unicast addresses it causes incoming connections/datagrams to be
distributed among all sockets in the group. This commit (1a43cff92a20d)
inverted the check for SO_REUSEPORT while adding one for
SO_REUSEPORT_LB; this appears to have been inadvertent. However:
- apparently no one has noticed that the semantics were changed;
- sockets belonging to different users can now be bound to the same port
so long as they belong to a single lbgroup bound to INADDR_ANY, which
is not correct.

Simply remove the SO_REUSEPORT(_LB) checks, as their original
justification was dubious and their current implementation is wrong; add
some tests.

Reviewed by: glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47832

show more ...


# a600aabe 12-Dec-2024 Mark Johnston <markj@FreeBSD.org>

inpcb: Close some SO_REUSEPORT_LB races

For a long time, the inpcb lookup path has been lockless in the common
case: we use net_epoch to synchronize lookups. However, the routines
which update lbgr

inpcb: Close some SO_REUSEPORT_LB races

For a long time, the inpcb lookup path has been lockless in the common
case: we use net_epoch to synchronize lookups. However, the routines
which update lbgroups were not careful to synchronize with unlocked
lookups. I believe that in the worst case this can result in spurious
connection aborts (I have a regression test case to exercise this), but
it's hard to be certain.

Modify in_pcblbgroup* routines to synchronize with unlocked lookup:
- When removing inpcbs from an lbgroup, do not shrink the array.
The maximum number of lbgroup entries is INPCBLBGROUP_SIZMAX (256),
and it doesn't seem worth the complexity to shrink the array when a
socket is removed.
- When resizing an lbgroup, do not insert it into the hash table until
it is fully initialized; otherwise lookups may observe a partially
constructed lbgroup.
- When adding an inpcb to the group, increment the counter after adding
the array entry, using a release store. Otherwise it's possible for
lookups to observe a null array slot.
- When looking up an entry, use a corresponding acquire load.

Reviewed by: ae, glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48020

show more ...


# ffb3d384 05-Dec-2024 Mark Johnston <markj@FreeBSD.org>

inpcb: Fix the GENERIC-NODEBUG build

Fixes: 01f8ce83242d ("inpcb: Factor out parts of in6_pcbbind() and in_pcbbind_setup()")


# 01f8ce83 05-Dec-2024 Mark Johnston <markj@FreeBSD.org>

inpcb: Factor out parts of in6_pcbbind() and in_pcbbind_setup()

A large portion of these functions just determines whether the inpcb can
bind to the address/port. This portion has no side effects,

inpcb: Factor out parts of in6_pcbbind() and in_pcbbind_setup()

A large portion of these functions just determines whether the inpcb can
bind to the address/port. This portion has no side effects, so is a
good candidate to move into its own helper function. This patch does
so, making the callers less complicated and reducing indentation.

While moving this code, also make some changes:
- Load socket options (SO_REUSEADDR etc.) only once. There is nothing
preventing another thread from toggling the socket options, so make
this function easier to reason about by avoiding races.
- When checking whether the bind address is an interface address, make a
separate sockaddr rather than temporarily modifying the one passed to
in_pcbbind().

Reviewed by: ae, glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47590

show more ...


Revision tags: release/14.2.0
# 52ef944b 14-Nov-2024 Mark Johnston <markj@FreeBSD.org>

inpcb: Constify address parameters to in6 pcb lookup routines

No functional change intended.

MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Stormshield


# 45a77bf2 14-Nov-2024 Mark Johnston <markj@FreeBSD.org>

inpcb: Make some cosmetic improvements to in_pcbbind()

- Use the local var "laddr" instead of sin->sin_addr in one block.
- Use in_nullhost() instead of explicit comparisons with INADDR_ANY.
- Combi

inpcb: Make some cosmetic improvements to in_pcbbind()

- Use the local var "laddr" instead of sin->sin_addr in one block.
- Use in_nullhost() instead of explicit comparisons with INADDR_ANY.
- Combine multiple socket options checks into one.
- Fix indentation.
- Remove some unhelpful comments.

This is in preparation for some simplification and bug-fixing.

No functional change intended.

Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47451

show more ...


# 21d7ac8c 08-Nov-2024 Mark Johnston <markj@FreeBSD.org>

inpcb: Remove some unused parameters in internal hash lookup functions

in_pcblookup_hash_wild_* looks up unconnected inpcbs, so there is no
point in passing the foreign address and port, and indeed

inpcb: Remove some unused parameters in internal hash lookup functions

in_pcblookup_hash_wild_* looks up unconnected inpcbs, so there is no
point in passing the foreign address and port, and indeed those
parameters are not used. So, remove them.

No functional change intended.

MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D47385

show more ...


Revision tags: release/13.4.0
# 0c605af3 29-Aug-2024 Mark Johnston <markj@FreeBSD.org>

netinet: Explicitly disallow connections to the unspecified address

If the V_connect_ifaddr_wild sysctl says that we shouldn't infer a
destination address, return an error. Otherwise it's possible

netinet: Explicitly disallow connections to the unspecified address

If the V_connect_ifaddr_wild sysctl says that we shouldn't infer a
destination address, return an error. Otherwise it's possible for use
of an unspecified foreign address to trigger a subsequent assertion
failure, for example in in_pcblookup_hash_locked().

Similarly, if no interface addresses are assigned, fail quickly upon an
attempt to connect to the unspecified address.

Reported by: Shawn Webb <shawn.webb@hardenedbsd.org>
MFC after: 2 weeks
Reviewed by: zlei, allanjude, emaste
Differential Revision: https://reviews.freebsd.org/D46454

show more ...


# 417b35a9 20-Aug-2024 Mark Johnston <markj@FreeBSD.org>

netinet: Add a sysctl to allow disabling connections to INADDR_ANY

See the discussion in Bugzilla PR 280705 for context.

PR: 280705
MFC after: 1 week
Differential Revision: https://reviews.freebsd

netinet: Add a sysctl to allow disabling connections to INADDR_ANY

See the discussion in Bugzilla PR 280705 for context.

PR: 280705
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D46259

show more ...


Revision tags: release/14.1.0
# 042fb58d 12-Apr-2024 Lexi Winter <lexi@le-Fay.ORG>

sys/netinet6/in6_pcb.c: fix compile without INET

in6_mapped_sockaddr() and in6_mapped_peeraddr() both define a local
variable named 'inp', but in the non-INET case, this variable is set
and never us

sys/netinet6/in6_pcb.c: fix compile without INET

in6_mapped_sockaddr() and in6_mapped_peeraddr() both define a local
variable named 'inp', but in the non-INET case, this variable is set
and never used, causing a compiler error:

/src/freebsd/src/lf/sys/netinet6/in6_pcb.c:547:16: error:
variable 'inp' set but not used [-Werror,-Wunused-but-set-variable]
547 | struct inpcb *inp;
| ^
/src/freebsd/src/lf/sys/netinet6/in6_pcb.c:573:16: error:
variable 'inp' set but not used [-Werror,-Wunused-but-set-variable]
573 | struct inpcb *inp;

Fix this by guarding all the INET-specific logic, including the variable
definition, behind #ifdef INET.

While here, tweak formatting in in6_mapped_peeraddr() so both functions
are the same.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1155

show more ...


Revision tags: release/13.3.0
# 0fac350c 30-Nov-2023 Gleb Smirnoff <glebius@FreeBSD.org>

sockets: don't malloc/free sockaddr memory on getpeername/getsockname

Just like it was done for accept(2) in cfb1e92912b4, use same approach
for two simplier syscalls that return socket addresses.

sockets: don't malloc/free sockaddr memory on getpeername/getsockname

Just like it was done for accept(2) in cfb1e92912b4, use same approach
for two simplier syscalls that return socket addresses. Although,
these two syscalls aren't performance critical, this change generalizes
some code between 3 syscalls trimming code size.

Following example of accept(2), provide VNET-aware and INVARIANT-checking
wrappers sopeeraddr() and sosockaddr() around protosw methods.

Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D42694

show more ...


# cfb1e929 30-Nov-2023 Gleb Smirnoff <glebius@FreeBSD.org>

sockets: don't malloc/free sockaddr memory on accept(2)

Let the accept functions provide stack memory for protocols to fill it in.
Generic code should provide sockaddr_storage, specialized code may

sockets: don't malloc/free sockaddr memory on accept(2)

Let the accept functions provide stack memory for protocols to fill it in.
Generic code should provide sockaddr_storage, specialized code may provide
smaller structure.

While rewriting accept(2) make 'addrlen' a true in/out parameter, reporting
required length in case if provided length was insufficient. Our manual
page accept(2) and POSIX don't explicitly require that, but one can read
the text as they do. Linux also does that. Update tests accordingly.

Reviewed by: rscheff, tuexen, zlei, dchagin
Differential Revision: https://reviews.freebsd.org/D42635

show more ...


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl s

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix

show more ...


Revision tags: release/14.0.0
# 0bf5377b 14-Sep-2023 Andrey V. Elsukov <ae@FreeBSD.org>

Avoid IPv6 source address selection on accepting TCP connections

When an application listens IPv6 TCP socket, due to ipfw
forwarding tag it may handle connections for addresses that do not
belongs t

Avoid IPv6 source address selection on accepting TCP connections

When an application listens IPv6 TCP socket, due to ipfw
forwarding tag it may handle connections for addresses that do not
belongs to the jail or even current host (transparent proxy).
Syncache code can successfully handle TCP handshake for such connections.
When syncache finally accepts connection it uses in6_pcbconnect() to
properly initlize new connection info.

For IPv4 this scenario just works, but for IPv6 it fails when
local address doesn't belongs to the jail. This check occurs when
in6_pcbladdr() applies IPv6 SAS algorithm.
We need IPv6 SAS when we are connection initiator, but in the above
case connection is already established and both source and destination
addresses are known.

Use unused argument to notify in6_pcbconnect() when we don't need
source address selection. This will fix `ipfw fwd` to jailed IPv6
address.

When we are connection initiator, we stil use IPv6 SAS algorithm and
apply all related restrictions.

MFC after: 1 month
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D41685

show more ...


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# e3ba0d6a 27-Jul-2023 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: do not copy so_options into inp_flags2

Since f71cb9f74808 socket stays connnected with inpcb through latter's
lifetime and there is no reason to complicate things and copy these
flags.

Revie

inpcb: do not copy so_options into inp_flags2

Since f71cb9f74808 socket stays connnected with inpcb through latter's
lifetime and there is no reason to complicate things and copy these
flags.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D41198

show more ...


# a306ed50 30-May-2023 Mark Johnston <markj@FreeBSD.org>

inpcb: Restore missing validation of local addresses for jailed sockets

When looking up a listening socket, the SMR-protected lookup routine may
return a jailed socket with no local address. This h

inpcb: Restore missing validation of local addresses for jailed sockets

When looking up a listening socket, the SMR-protected lookup routine may
return a jailed socket with no local address. This happens when using
classic jails with more than one IP address; in a single-IP classic
jail, a bound socket's local address is always rewritten to be that of
the jail.

After commit 7b92493ab1d4, the lookup path failed to check whether the
jail corresponding to a matched wildcard socket actually owns the
address, and would return the match regardless. Restore the omitted
checks.

Fixes: 7b92493ab1d4 ("inpcb: Avoid inp_cred dereferences in SMR-protected lookup")
Reported by: peter
Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D40268

show more ...


# 7b92493a 20-Apr-2023 Mark Johnston <markj@FreeBSD.org>

inpcb: Avoid inp_cred dereferences in SMR-protected lookup

The SMR-protected inpcb lookup algorithm currently has to check whether
a matching inpcb belongs to a jail, in order to prioritize jailed
b

inpcb: Avoid inp_cred dereferences in SMR-protected lookup

The SMR-protected inpcb lookup algorithm currently has to check whether
a matching inpcb belongs to a jail, in order to prioritize jailed
bound sockets. To do this it has to maintain a ucred reference, and for
this to be safe, the reference can't be released until the UMA
destructor is called, and this will not happen within any bounded time
period.

Changing SMR to periodically recycle garbage is not trivial. Instead,
let's implement SMR-synchronized lookup without needing to dereference
inp_cred. This will allow the inpcb code to free the inp_cred reference
immediately when a PCB is freed, ensuring that ucred (and thus jail)
references are released promptly.

Commit 220d89212943 ("inpcb: immediately return matching pcb on lookup")
gets us part of the way there. This patch goes further to handle
lookups of unconnected sockets. Here, the strategy is to maintain a
well-defined order of items within a hash chain so that a wild lookup
can simply return the first match and preserve existing semantics. This
makes insertion of listening sockets more complicated in order to make
lookup simpler, which seems like the right tradeoff anyway given that
bind() is already a fairly expensive operation and lookups are more
common.

In particular, when inserting an unconnected socket, in_pcbinhash() now
keeps the following ordering:
- jailed sockets before non-jailed sockets,
- specified local addresses before unspecified local addresses.

Most of the change adds a separate SMR-based lookup path for inpcb hash
lookups. When a match is found, we try to lock the inpcb and
re-validate its connection info. In the common case, this works well
and we can simply return the inpcb. If this fails, typically because
something is concurrently modifying the inpcb, we go to the slow path,
which performs a serialized lookup.

Note, I did not touch lbgroup lookup, since there the credential
reference is formally synchronized by net_epoch, not SMR. In
particular, lbgroups are rarely allocated or freed.

I think it is possible to simplify in_pcblookup_hash_wild_locked() now,
but I didn't do it in this patch.

Discussed with: glebius
Tested by: glebius
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D38572

show more ...


# 3e98dcb3 20-Apr-2023 Mark Johnston <markj@FreeBSD.org>

inpcb: Move inpcb matching logic into separate functions

These functions will get some additional callers in future revisions.

No functional change intended.

Discussed with: glebius
Tested by: gle

inpcb: Move inpcb matching logic into separate functions

These functions will get some additional callers in future revisions.

No functional change intended.

Discussed with: glebius
Tested by: glebius
Sponsored by: Modirum MDPay
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D38571

show more ...


# fdb987be 20-Apr-2023 Mark Johnston <markj@FreeBSD.org>

inpcb: Split PCB hash tables

Currently we use a single hash table per PCB database for connected and
bound PCBs. Since we started using net_epoch to synchronize hash table
lookups, there's been a b

inpcb: Split PCB hash tables

Currently we use a single hash table per PCB database for connected and
bound PCBs. Since we started using net_epoch to synchronize hash table
lookups, there's been a bug, noted in a comment above in_pcbrehash():
connecting a socket can cause an inpcb to move between hash chains, and
this can cause a concurrent lookup to follow the wrong linkage pointers.
I believe this could cause rare, spurious ECONNREFUSED errors in the
worse case.

Address the problem by introducing a second hash table and adding more
linkage pointers to struct inpcb. Now the database has one table each
for connected and unconnected sockets.

When inserting an inpcb into the hash table, in_pcbinhash() now looks at
the foreign address of the inpcb to figure out which table to use. This
ensures that queue linkage pointers are stable until the socket is
disconnected, so the problem described above goes away. There is also a
small benefit in that in_pcblookup_*() can now search just one of the
two possible hash buckets.

I also made the "rehash" parameter of in(6)_pcbconnect() unused. This
parameter seems confusing and it is simpler to let the inpcb code figure
out what to do using the existing INP_INHASHLIST flag.

UDP sockets pose a special problem since they can be connected and
disconnected multiple times during their lifecycle. To handle this, the
patch plugs a hole in the inpcb structure and uses it to store an SMR
sequence number. When an inpcb is disconnected - an operation which
requires the global PCB database hash lock - the write sequence number
is advanced, and in order to reconnect, the connecting thread must wait
for readers to drain before reusing the inpcb's hash chain linkage
pointers.

raw_ip (ab)uses the hash table without using the corresponding
accessors. Since there are now two hash tables, it arbitrarily uses the
"connected" table for all of its PCBs. This will be addressed in some
way in the future.

inp interators which specify a hash bucket will only visit connected
PCBs. This is not really correct, but nothing in the tree uses that
functionality except raw_ip, which as mentioned above places all of its
PCBs in the "connected" table and so is unaffected.

Discussed with: glebius
Tested by: glebius
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D38569

show more ...


12345678910>>...18