History log of /freebsd/sys/netinet/in_pcb.h (Results 1 – 25 of 510)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 7cbb6b6e 23-Jan-2025 Mark Johnston <markj@FreeBSD.org>

inpcb: Close some SO_REUSEPORT_LB races, part 2

Suppose a thread is adds a socket to an existing TCP lbgroup that is
actively accepting connections. It has to do the following operations:
1. set SO

inpcb: Close some SO_REUSEPORT_LB races, part 2

Suppose a thread is adds a socket to an existing TCP lbgroup that is
actively accepting connections. It has to do the following operations:
1. set SO_REUSEPORT_LB on the socket
2. bind() the socket to the shared address/port
3. call listen()

Step 2 makes the inpcb visible to incoming connection requests.
However, at this point the inpcb cannot accept new connections. If
in_pcblookup() matches it, the remote end will see ECONNREFUSED even
when other listening sockets are present in the lbgroup. This means
that dynamically adding inpcbs to an lbgroup (e.g., by starting up new
workers) can trigger spurious connection failures for no good reason.
(A similar problem exists when removing inpcbs from an lbgroup, but that
is harder to fix and is not addressed by this patch; see the review for
a bit more commentary.)

Fix this by augmenting each lbgroup with a linked list of inpcbs that
are pending a listen() call. When adding an inpcb to an lbgroup, keep
the inpcb on this list if listen() hasn't been called, so it is not yet
visible to the lookup path. Then, add a new in_pcblisten() routine which
makes the inpcb visible within the lbgroup now that it's safe to let it
handle new connections.

Add a regression test which verifies that we don't get spurious
connection errors while adding sockets to an LB group.

Reviewed by: glebius
MFC after: 1 month
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Differential Revision: https://reviews.freebsd.org/D48544

show more ...


Revision tags: release/14.2.0
# 0b4539ee 14-Nov-2024 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: gc unused argument of in_pcbconnect()


Revision tags: release/13.4.0, release/14.1.0
# 1a8d1764 29-Mar-2024 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: fully retire inp_ppcb pointer

Before a protocol specific control block started to embed inpcb in self
(see 0aa120d52f3c, e68b3792440c, 483fe96511ec) this pointer used to point
at it.

Retain

inpcb: fully retire inp_ppcb pointer

Before a protocol specific control block started to embed inpcb in self
(see 0aa120d52f3c, e68b3792440c, 483fe96511ec) this pointer used to point
at it.

Retain kf_sock_inpcb field in the struct kinfo_file in <sys/user.h>. The
exp-run detected a minimal use of the field in ports:
* sysutils/lsof - patched upstream
* net-mgmt/netdata - patch accepted upstream
* emulators/qemu-user-static - upstream master branch seems not using
the field anymore
We can keep the field around for some time, but eventually it may be
reused for something else.

PR: 277659 (exp-run)
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D44491

show more ...


# 027fda80 18-Mar-2024 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: remove unused KPIs to manipulate inpcbs

These KPIs were added in 9d29c635daa69 and through 15 years had zero use.
They slightly remind what IfAPI does for struct ifnet. But IfAPI does
that f

inpcb: remove unused KPIs to manipulate inpcbs

These KPIs were added in 9d29c635daa69 and through 15 years had zero use.
They slightly remind what IfAPI does for struct ifnet. But IfAPI does
that for the sake of large collection of NIC drivers not being aware of
struct ifnet. For the inpcb it is unclear what could be a large
collection of externally written kernel modules that need extensively use
inpcb and not be aware of its internals at the same time. This isolation
of a structure knowledge requires a lot of work, and just throwing in a
few KPIs isn't helpful.

Reviewed by: kib, bz, markj
Differential Revision: https://reviews.freebsd.org/D44310

show more ...


Revision tags: release/13.3.0
# a13039e2 27-Dec-2023 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: reoder inpcb destruction

First, merge in_pcbdetach() with in_pcbfree(). The comment for
in_pcbdetach() was no longer correct. Then, make sure we remove
the inpcb from the hash before we com

inpcb: reoder inpcb destruction

First, merge in_pcbdetach() with in_pcbfree(). The comment for
in_pcbdetach() was no longer correct. Then, make sure we remove
the inpcb from the hash before we commit any destructive actions
on it. There are couple functions that rely on the hash lock
skipping SMR + inpcb lock to lookup an inpcb. Although there are
no known functions that similarly rely on the global inpcb list
lock, also do list removal before destructive actions.

PR: 273890
Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D43122

show more ...


# 0fac350c 30-Nov-2023 Gleb Smirnoff <glebius@FreeBSD.org>

sockets: don't malloc/free sockaddr memory on getpeername/getsockname

Just like it was done for accept(2) in cfb1e92912b4, use same approach
for two simplier syscalls that return socket addresses.

sockets: don't malloc/free sockaddr memory on getpeername/getsockname

Just like it was done for accept(2) in cfb1e92912b4, use same approach
for two simplier syscalls that return socket addresses. Although,
these two syscalls aren't performance critical, this change generalizes
some code between 3 syscalls trimming code size.

Following example of accept(2), provide VNET-aware and INVARIANT-checking
wrappers sopeeraddr() and sosockaddr() around protosw methods.

Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D42694

show more ...


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl s

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix

show more ...


# bbbd7aab 20-Nov-2023 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: garbage collect in_pcbnotifyall()


Revision tags: release/14.0.0
# 2ff63af9 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .h pattern

Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/


# e3ba0d6a 27-Jul-2023 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: do not copy so_options into inp_flags2

Since f71cb9f74808 socket stays connnected with inpcb through latter's
lifetime and there is no reason to complicate things and copy these
flags.

Revie

inpcb: do not copy so_options into inp_flags2

Since f71cb9f74808 socket stays connnected with inpcb through latter's
lifetime and there is no reason to complicate things and copy these
flags.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D41198

show more ...


# a43e7a96 27-Jul-2023 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: use internal flag to mark pcbs that are inserted into lbgroup

Using INP_REUSEPORT_LB is unsafe, as it is basically a copy of socket's
SO_REUSEPORT_LB flag, which can be cleared by userland af

inpcb: use internal flag to mark pcbs that are inserted into lbgroup

Using INP_REUSEPORT_LB is unsafe, as it is basically a copy of socket's
SO_REUSEPORT_LB flag, which can be cleared by userland after bind().

Reviewed by: markj
Reported by: syzbot+e7d2e451f89fb444319b@syzkaller.appspotmail.com
Differential Revision: https://reviews.freebsd.org/D41197

show more ...


# c3c20de3 25-Apr-2023 Gleb Smirnoff <glebius@FreeBSD.org>

tcp: move HPTS/LRO flags out of inpcb to tcpcb

These flags are TCP specific. While here, make also several LRO
internal functions to pass tcpcb pointer instead of inpcb one.

Reviewed by: rrs
Diff

tcp: move HPTS/LRO flags out of inpcb to tcpcb

These flags are TCP specific. While here, make also several LRO
internal functions to pass tcpcb pointer instead of inpcb one.

Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39698

show more ...


# c2a69e84 25-Apr-2023 Gleb Smirnoff <glebius@FreeBSD.org>

tcp_hpts: move HPTS related fields from inpcb to tcpcb

This makes inpcb lighter and allows future cache line optimizations
of tcpcb. The reason why HPTS originally used inpcb is the compressed
TIME

tcp_hpts: move HPTS related fields from inpcb to tcpcb

This makes inpcb lighter and allows future cache line optimizations
of tcpcb. The reason why HPTS originally used inpcb is the compressed
TIME-WAIT state (see 0d7445193ab), that used to free a tcpcb, while the
associated connection is still on the HPTS ring.

Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D39697

show more ...


# 7b92493a 20-Apr-2023 Mark Johnston <markj@FreeBSD.org>

inpcb: Avoid inp_cred dereferences in SMR-protected lookup

The SMR-protected inpcb lookup algorithm currently has to check whether
a matching inpcb belongs to a jail, in order to prioritize jailed
b

inpcb: Avoid inp_cred dereferences in SMR-protected lookup

The SMR-protected inpcb lookup algorithm currently has to check whether
a matching inpcb belongs to a jail, in order to prioritize jailed
bound sockets. To do this it has to maintain a ucred reference, and for
this to be safe, the reference can't be released until the UMA
destructor is called, and this will not happen within any bounded time
period.

Changing SMR to periodically recycle garbage is not trivial. Instead,
let's implement SMR-synchronized lookup without needing to dereference
inp_cred. This will allow the inpcb code to free the inp_cred reference
immediately when a PCB is freed, ensuring that ucred (and thus jail)
references are released promptly.

Commit 220d89212943 ("inpcb: immediately return matching pcb on lookup")
gets us part of the way there. This patch goes further to handle
lookups of unconnected sockets. Here, the strategy is to maintain a
well-defined order of items within a hash chain so that a wild lookup
can simply return the first match and preserve existing semantics. This
makes insertion of listening sockets more complicated in order to make
lookup simpler, which seems like the right tradeoff anyway given that
bind() is already a fairly expensive operation and lookups are more
common.

In particular, when inserting an unconnected socket, in_pcbinhash() now
keeps the following ordering:
- jailed sockets before non-jailed sockets,
- specified local addresses before unspecified local addresses.

Most of the change adds a separate SMR-based lookup path for inpcb hash
lookups. When a match is found, we try to lock the inpcb and
re-validate its connection info. In the common case, this works well
and we can simply return the inpcb. If this fails, typically because
something is concurrently modifying the inpcb, we go to the slow path,
which performs a serialized lookup.

Note, I did not touch lbgroup lookup, since there the credential
reference is formally synchronized by net_epoch, not SMR. In
particular, lbgroups are rarely allocated or freed.

I think it is possible to simplify in_pcblookup_hash_wild_locked() now,
but I didn't do it in this patch.

Discussed with: glebius
Tested by: glebius
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D38572

show more ...


# fdb987be 20-Apr-2023 Mark Johnston <markj@FreeBSD.org>

inpcb: Split PCB hash tables

Currently we use a single hash table per PCB database for connected and
bound PCBs. Since we started using net_epoch to synchronize hash table
lookups, there's been a b

inpcb: Split PCB hash tables

Currently we use a single hash table per PCB database for connected and
bound PCBs. Since we started using net_epoch to synchronize hash table
lookups, there's been a bug, noted in a comment above in_pcbrehash():
connecting a socket can cause an inpcb to move between hash chains, and
this can cause a concurrent lookup to follow the wrong linkage pointers.
I believe this could cause rare, spurious ECONNREFUSED errors in the
worse case.

Address the problem by introducing a second hash table and adding more
linkage pointers to struct inpcb. Now the database has one table each
for connected and unconnected sockets.

When inserting an inpcb into the hash table, in_pcbinhash() now looks at
the foreign address of the inpcb to figure out which table to use. This
ensures that queue linkage pointers are stable until the socket is
disconnected, so the problem described above goes away. There is also a
small benefit in that in_pcblookup_*() can now search just one of the
two possible hash buckets.

I also made the "rehash" parameter of in(6)_pcbconnect() unused. This
parameter seems confusing and it is simpler to let the inpcb code figure
out what to do using the existing INP_INHASHLIST flag.

UDP sockets pose a special problem since they can be connected and
disconnected multiple times during their lifecycle. To handle this, the
patch plugs a hole in the inpcb structure and uses it to store an SMR
sequence number. When an inpcb is disconnected - an operation which
requires the global PCB database hash lock - the write sequence number
is advanced, and in order to reconnect, the connecting thread must wait
for readers to drain before reusing the inpcb's hash chain linkage
pointers.

raw_ip (ab)uses the hash table without using the corresponding
accessors. Since there are now two hash tables, it arbitrarily uses the
"connected" table for all of its PCBs. This will be addressed in some
way in the future.

inp interators which specify a hash bucket will only visit connected
PCBs. This is not really correct, but nothing in the tree uses that
functionality except raw_ip, which as mentioned above places all of its
PCBs in the "connected" table and so is unaffected.

Discussed with: glebius
Tested by: glebius
Sponsored by: Klara, Inc.
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D38569

show more ...


Revision tags: release/13.2.0
# 317fa516 28-Feb-2023 Mark Johnston <markj@FreeBSD.org>

netinet: Remove the IP(V6)_RSS_LISTEN_BUCKET socket option

It has no effect, and an exp-run revealed that it is not in use.

PR: 261398 (exp-run)
Reviewed by: mjg, glebius
Sponsored by: Klara, Inc.

netinet: Remove the IP(V6)_RSS_LISTEN_BUCKET socket option

It has no effect, and an exp-run revealed that it is not in use.

PR: 261398 (exp-run)
Reviewed by: mjg, glebius
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D38822

show more ...


# 3aff4ccd 27-Feb-2023 Mark Johnston <markj@FreeBSD.org>

netinet: Remove IP(V6)_BINDMULTI

This option was added in commit 0a100a6f1ee5 but was never completed.
In particular, there is no logic to map flowids to different listening
sockets, so it accomplis

netinet: Remove IP(V6)_BINDMULTI

This option was added in commit 0a100a6f1ee5 but was never completed.
In particular, there is no logic to map flowids to different listening
sockets, so it accomplishes basically the same thing as SO_REUSEPORT.
Meanwhile, we've since added SO_REUSEPORT_LB, which at least tries to
balance among listening sockets using a hash of the 4-tuple and some
optional NUMA policy.

The option was never documented or completed, and an exp-run revealed
nothing using it in the ports tree. Moreover, it complicates the
already very complicated in_pcbbind_setup(), and the checking in
in_pcbbind_check_bindmulti() is insufficient. So, let's remove it.

PR: 261398 (exp-run)
Reviewed by: glebius
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D38574

show more ...


# 96871af0 15-Feb-2023 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: use family specific sockaddr argument for bind functions

Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the
protocol's pr_bind method and from there on go down the call
stack wi

inpcb: use family specific sockaddr argument for bind functions

Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the
protocol's pr_bind method and from there on go down the call
stack with family specific argument.

Reviewed by: zlei, melifaro, markj
Differential Revision: https://reviews.freebsd.org/D38601

show more ...


# 09d3671b 03-Feb-2023 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: better document INP_ANONPORT flag

The name is pretty self explaining, but it is unclear why we need this
flag, as kernel only sets it and never reads.


# 9e46ff4d 03-Feb-2023 Gleb Smirnoff <glebius@FreeBSD.org>

netinet: don't return conflicting inpcb in in_pcbconnect_setup()

Last time this inpcb was actually used was in tcp_connect()
before c94c54e4df9a.


# a9d22cce 03-Feb-2023 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: use family specific sockaddr argument for connect functions

Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the
protocol's pr_connect method and from there on go down the call
st

inpcb: use family specific sockaddr argument for connect functions

Do the cast from sockaddr to either IPv4 or IPv6 sockaddr in the
protocol's pr_connect method and from there on go down the call
stack with family specific argument.

Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D38356

show more ...


# 0aa120d5 02-Dec-2022 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: allow to provide protocol specific pcb size

The protocol specific structure shall start with inpcb.

Differential revision: https://reviews.freebsd.org/D37126


Revision tags: release/12.4.0
# d93ec8cb 02-Nov-2022 Mark Johnston <markj@FreeBSD.org>

inpcb: Allow SO_REUSEPORT_LB to be used in jails

Currently SO_REUSEPORT_LB silently does nothing when set by a jailed
process. It is trivial to support this option in VNET jails, but it's
also usef

inpcb: Allow SO_REUSEPORT_LB to be used in jails

Currently SO_REUSEPORT_LB silently does nothing when set by a jailed
process. It is trivial to support this option in VNET jails, but it's
also useful in traditional jails.

This patch enables LB groups in jails with the following semantics:
- all PCBs in a group must belong to the same jail,
- PCB lookup prefers jailed groups to non-jailed groups

This is a straightforward extension of the semantics used for individual
listening sockets. One pre-existing quirk of the lbgroup implementation
is that non-jailed lbgroups are searched before jailed listening
sockets; that is preserved with this change.

Discussed with: glebius
MFC after: 1 month
Sponsored by: Modirum MDPay
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D37029

show more ...


# 19acc506 31-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: retire suppresion of randomization of ephemeral ports

The suppresion was added in 5f311da2ccb with no explanation in the
commit message of the exact problem that was fixed. In the BSDCan
2006

inpcb: retire suppresion of randomization of ephemeral ports

The suppresion was added in 5f311da2ccb with no explanation in the
commit message of the exact problem that was fixed. In the BSDCan
2006 talk [1], slides 12 to 14, we can find that it seems that there
was some problem with the TIME_WAIT state not properly being handled
on the remote side (also FreeBSD!), and this switching off the
suppression had hidden the problem. The rationale of the change was
that other stacks may also be buggy wrt the TIME_WAIT.

I did not find the actual problem in TIME_WAIT that the suppression
has hidden, neither a commit that would fix it. However, since that
time we started to handle SYNs with RFC5961 instead of RFC793, see
3220a2121cc. We also now have the tcp-testsuite [2], that has full
coverage of all possible scenarios of receiving SYN in TIME_WAIT.

This effectively reverts 5f311da2ccb6c216b79049172be840af4778129a
and 6ee79c59d2c081f837a11cc78908be54a6dbe09d.

[1] https://www.bsdcan.org/2006/papers/ImprovingTCPIP.pdf
[2] https://github.com/freebsd-net/tcp-testsuite

Reviewed by: rscheff
Discussed with: rscheff, rrs, tuexen
Differential revision: https://reviews.freebsd.org/D37042

show more ...


# 24cf7a8d 20-Oct-2022 Gleb Smirnoff <glebius@FreeBSD.org>

inpcb: provide pcbinfo pointer argument to inp_apply_all()

Allows to clear inpcb layer of TCP knowledge.


12345678910>>...21