History log of /freebsd/sys/kern/uipc_sockbuf.c (Results 1 – 25 of 527)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# b93e930c 25-Jul-2025 Gleb Smirnoff <glebius@FreeBSD.org>

sendfile: retire M_BLOCKED

Follow unix(4) commit 51ac5ee0d57f and retire M_BLOCKED for TCP sockets as
well. The M_BLOCKED flag was introduced back 2016 together with non-
blocking sendfile(2). It

sendfile: retire M_BLOCKED

Follow unix(4) commit 51ac5ee0d57f and retire M_BLOCKED for TCP sockets as
well. The M_BLOCKED flag was introduced back 2016 together with non-
blocking sendfile(2). It marked mbufs in a sending socket buffer that
could be ready to sent, but are sitting behind an M_NOTREADY mbuf(s), that
blocks them.

You may consider this flag as an INVARIANT flag that helped to ensure
socket buffer consistency. Or maybe the socket code was so convoluted
back then, that it was unclear if sbfree() may be called on an mbuf that
is in the middle of the buffer, and I decided to introduce the flag to
protect against that. With today state of socket buffer code it became
clear that the latter cannot happen. And this commit adds an assertion
proving that.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D50728

show more ...


# f2c2ed7d 25-Jul-2025 Gleb Smirnoff <glebius@FreeBSD.org>

sendfile: don't hack sb_lowat for sockets that manage the watermark

In the sendfile(2) we carry an old hack (originating from d99b0dd2c5297)
to help dumb benchmarks and applications to achieve highe

sendfile: don't hack sb_lowat for sockets that manage the watermark

In the sendfile(2) we carry an old hack (originating from d99b0dd2c5297)
to help dumb benchmarks and applications to achieve higher performance. We
would modify low watermark on the socket send buffer to avoid socket being
reported as writable too early, which would result in lots of small
writes.

Skip that hack for applications that do setsockopt(SO_SNDLOWAT) or that
register the socket in kevent(2) with NOTE_LOWAT feature. First, we don't
want the hack to rewrite the watermark value explicitly specified by the
user. Second, in certain cases that can lead to real performance
regressions. A kevent(2) with NOTE_LOWAT would report socket as writable,
but then sendfile(2) would write 0 bytes and return EAGAIN.

The change also disables the hack for unix(4) sockets, leaving only TCP.

Reviewed by: rrs
Differential Revision: https://reviews.freebsd.org/D50581

show more ...


Revision tags: release/14.3.0-p1, release/14.2.0-p4, release/13.5.0-p2, release/14.3.0, release/13.4.0-p5, release/13.5.0-p1, release/14.2.0-p3
# 67c1c4df 24-Mar-2025 Gleb Smirnoff <glebius@FreeBSD.org>

sockbuf: provide sbunreserve_locked() which is a complement to sbreserve()

The sbreserve() works only on protocol-independent parts of the sockbuf,
but sbrelease() also clears the generic sockbuf mb

sockbuf: provide sbunreserve_locked() which is a complement to sbreserve()

The sbreserve() works only on protocol-independent parts of the sockbuf,
but sbrelease() also clears the generic sockbuf mbuf chain. Calling the
latter to undo changes done by the former is not correct. The new
function is the right thing.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D49364

show more ...


# 371392bc 24-Mar-2025 Gleb Smirnoff <glebius@FreeBSD.org>

sockbuf: remove sbflush_internal() and sbrelease_internal() shims

This functions serve just one purpose - allow to call sbdestroy() from
sofree() without triggering unlocked mutex assertions. Let's

sockbuf: remove sbflush_internal() and sbrelease_internal() shims

This functions serve just one purpose - allow to call sbdestroy() from
sofree() without triggering unlocked mutex assertions. Let's just don't
save on locking with INVARIANTS kernel and this will allow to clean up all
these shims. Should be no functional changes.

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D49363

show more ...


Revision tags: release/13.5.0, release/14.2.0-p2, release/14.1.0-p8, release/13.4.0-p4
# 9d7fb768 01-Feb-2025 Gleb Smirnoff <glebius@FreeBSD.org>

sockets: garbage collect SB_NOINTR

Not used. All socket buffer sleeps are interruptible.


Revision tags: release/14.1.0-p7, release/14.2.0-p1, release/13.4.0-p3, release/14.2.0, release/13.4.0
# a1da7dc1 10-Sep-2024 Mark Johnston <markj@FreeBSD.org>

socket: Implement SO_SPLICE

This is a feature which allows one to splice two TCP sockets together
such that data which arrives on one socket is automatically pushed into
the send buffer of the splic

socket: Implement SO_SPLICE

This is a feature which allows one to splice two TCP sockets together
such that data which arrives on one socket is automatically pushed into
the send buffer of the spliced socket. This can be used to make TCP
proxying more efficient as it eliminates the need to copy data into and
out of userspace.

The interface is copied from OpenBSD, and this implementation aims to be
compatible. Splicing is enabled by setting the SO_SPLICE socket option.
When spliced, data that arrives on the receive buffer is automatically
forwarded to the other socket. In particular, splicing is a
unidirectional operation; to splice a socket pair in both directions,
SO_SPLICE needs to be applied to both sockets. More concretely, when
setting the option one passes the following struct:

struct splice {
int fd;
off_t max;
struct timveval idle;
};

where "fd" refers to the socket to which the first socket is to be
spliced, and two setsockopt(SO_SPLICE) calls are required to set up a
bi-directional splice.

select(), poll() and kevent() do not return when data arrives in the
receive buffer of a spliced socket, as such data is expected to be
removed automatically once space is available in the corresponding send
buffer. Userspace can perform I/O on spliced sockets, but it will be
unpredictably interleaved with splice I/O.

A splice can be configured to unsplice once a certain number of bytes
have been transmitted, or after a given time period. Once unspliced,
the socket behaves normally from userspace's perspective. The number of
bytes transmitted via the splice can be retrieved using
getsockopt(SO_SPLICE); this works after unsplicing as well, up until the
socket is closed or spliced again. Userspace can also manually trigger
unsplicing by splicing to -1.

Splicing work is handled by dedicated threads, similar to KTLS. A
worker thread is assigned at splice creation time. At some point it
would be nice to have a direct dispatch mode, wherein the thread which
places data into a receive buffer is also responsible for pushing it
into the sink, but this requires tighter integration with the protocol
stack in order to avoid reentrancy problems.

Currently, sowakeup() and related functions will signal the worker
thread assigned to a spliced socket. so_splice_xfer() does the hard
work of moving data between socket buffers.

Co-authored by: gallatin
Reviewed by: brooks (interface bits)
MFC after: 3 months
Sponsored by: Klara, Inc.
Sponsored by: Stormshield
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D46411

show more ...


Revision tags: release/14.1.0, release/13.3.0
# 30f8cb81 02-Feb-2024 Mark Johnston <markj@FreeBSD.org>

socket: Don't assume m0 != NULL in sbappendcontrol_locked()

Some callers (e.g., ktls_decrypt()) violate this assumption and thus
could trigger a NULL pointer dereference in KMSAN kernels.

Reported

socket: Don't assume m0 != NULL in sbappendcontrol_locked()

Some callers (e.g., ktls_decrypt()) violate this assumption and thus
could trigger a NULL pointer dereference in KMSAN kernels.

Reported by: glebius
Fixes: ec45f952a232 ("sockbuf: Add KMSAN checks to sbappend*()")
MFC after: 1 week

show more ...


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl s

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix

show more ...


Revision tags: release/14.0.0
# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# ec45f952 27-Apr-2023 Mark Johnston <markj@FreeBSD.org>

sockbuf: Add KMSAN checks to sbappend*()

Otherwise KMSAN only detects uninitialized memory when the contents of
the buffer are copied out to userspace or transmitted to a network
interface. At that

sockbuf: Add KMSAN checks to sbappend*()

Otherwise KMSAN only detects uninitialized memory when the contents of
the buffer are copied out to userspace or transmitted to a network
interface. At that point the KMSAN violation will be far removed from
its origin, so let's try to make debugging such problems a bit easier.

Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D38101

show more ...


Revision tags: release/13.2.0, release/12.4.0
# 7b660faa 27-Sep-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

sockbufs: add sbreserve_locked_limit() with custom maxsockbuf limit.

Protocols such as netlink may need a large socket receive buffer,
measured in tens of megabytes. This change allows netlink to

sockbufs: add sbreserve_locked_limit() with custom maxsockbuf limit.

Protocols such as netlink may need a large socket receive buffer,
measured in tens of megabytes. This change allows netlink to
set larger socket buffers (given the privs are in place), without
requiring user to manuall bump maxsockbuf.

Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D36747

show more ...


# f6696856 27-Sep-2022 Alexander V. Chernikov <melifaro@FreeBSD.org>

protocols: make socket buffers ioctl handler changeable

Allow to set custom per-protocol handlers for the socket buffers
ioctls by introducing pr_setsbopt callback with the default value
set to th

protocols: make socket buffers ioctl handler changeable

Allow to set custom per-protocol handlers for the socket buffers
ioctls by introducing pr_setsbopt callback with the default value
set to the currently-used sbsetopt().

Reviewed by: glebius
Differential Revision: https://reviews.freebsd.org/D36746

show more ...


Revision tags: release/13.1.0
# fe8c78f0 23-Apr-2022 Hans Petter Selasky <hselasky@FreeBSD.org>

ktls: Add full support for TLS RX offloading via network interface.

Basic TLS RX offloading uses the "csum_flags" field in the mbuf packet
header to figure out if an incoming mbuf has been fully off

ktls: Add full support for TLS RX offloading via network interface.

Basic TLS RX offloading uses the "csum_flags" field in the mbuf packet
header to figure out if an incoming mbuf has been fully offloaded or
not. This information follows the packet stream via the LRO engine, IP
stack and finally to the TCP stack. The TCP stack preserves the mbuf
packet header also when re-assembling packets after packet loss. When
the mbuf goes into the socket buffer the packet header is demoted and
the offload information is transferred to "m_flags" . Later on a
worker thread will analyze the mbuf flags and decide if the mbufs
making up a TLS record indicate a fully-, partially- or not decrypted
TLS record. Based on these three cases the worker thread will either
pass the packet on as-is or recrypt the decrypted bits, if any, or
decrypt the packet as usual.

During packet loss the kernel TLS code will call back into the network
driver using the send tag, informing about the TCP starting sequence
number of every TLS record that is not fully decrypted by the network
interface. The network interface then stores this information in a
compressed table and starts asking the hardware if it has found a
valid TLS header in the TCP data payload. If the hardware has found a
valid TLS header and the referred TLS header is at a valid TCP
sequence number according to the TCP sequence numbers provided by the
kernel TLS code, the network driver then informs the hardware that it
can resume decryption.

Care has been taken to not merge encrypted and decrypted mbuf chains,
in the LRO engine and when appending mbufs to the socket buffer.

The mbuf's leaf network interface pointer is used to figure out from
which network interface the offloading rule should be allocated. Also
this pointer is used to track route changes.

Currently mbuf send tags are used in both transmit and receive
direction, due to convenience, but may get a new name in the future to
better reflect their usage.

Reviewed by: jhb@ and gallatin@
Differential revision: https://reviews.freebsd.org/D32356
Sponsored by: NVIDIA Networking

show more ...


# d59bc188 27-May-2022 Gleb Smirnoff <glebius@FreeBSD.org>

sockbuf: remove unused mbuf counter and cluster counter

With M_EXTPG mbufs these two counters already do not represent the
reality. As we are moving towards protocol independent socket buffers,
whi

sockbuf: remove unused mbuf counter and cluster counter

With M_EXTPG mbufs these two counters already do not represent the
reality. As we are moving towards protocol independent socket buffers,
which may not even use mbufs at all, the counters become less and less
relevant. The only userland seeing them was 'netstat -x'.

PR: 264181 (exp-run)
Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35334

show more ...


# ad51c47f 25-May-2022 Gleb Smirnoff <glebius@FreeBSD.org>

sockbuf: fix assertion in sbcreatecontrol()

Fixes: 6890b588141a8298fc8a63700aeeea4ba36ca3f9


# 6890b588 17-May-2022 Gleb Smirnoff <glebius@FreeBSD.org>

sockbuf: improve sbcreatecontrol()

o Constify memory pointer. Make length unsigned.
o Make it never fail with M_WAITOK and assert that length is sane.


# b46667c6 17-May-2022 Gleb Smirnoff <glebius@FreeBSD.org>

sockbuf: merge two versions of sbcreatecontrol() into one

No functional change.


# 43283184 12-May-2022 Gleb Smirnoff <glebius@FreeBSD.org>

sockets: use socket buffer mutexes in struct socket directly

Since c67f3b8b78e the sockbuf mutexes belong to the containing socket,
and socket buffers just point to it. In 74a68313b50 macros that a

sockets: use socket buffer mutexes in struct socket directly

Since c67f3b8b78e the sockbuf mutexes belong to the containing socket,
and socket buffers just point to it. In 74a68313b50 macros that access
this mutex directly were added. Go over the core socket code and
eliminate code that reaches the mutex by dereferencing the sockbuf
compatibility pointer.

This change requires a KPI change, as some functions were given the
sockbuf pointer only without any hint if it is a receive or send buffer.

This change doesn't cover the whole kernel, many protocols still use
compatibility pointers internally. However, it allows operation of a
protocol that doesn't use them.

Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35152

show more ...


# 7db54446 09-May-2022 Gleb Smirnoff <glebius@FreeBSD.org>

sockbufs: make sbrelease_internal() private


# 17cbcf33 26-Jan-2022 Hans Petter Selasky <hselasky@FreeBSD.org>

mbuf(9): Assert receive mbufs don't carry a send tag.

Else we would start leaking reference counts.

Discussed with: jhb@
MFC after: 1 week
Sponsored by: NVIDIA Networking


# fe27f1db 26-Dec-2021 Alexander Motin <mav@FreeBSD.org>

kern: Remove CTLFLAG_NEEDGIANT from some sysctls.

MFC after: 2 weeks


Revision tags: release/12.3.0
# f94acf52 07-Sep-2021 Mark Johnston <markj@FreeBSD.org>

socket: Rename sb(un)lock() and interlock with listen(2)

In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LO

socket: Rename sb(un)lock() and interlock with listen(2)

In preparation for moving sockbuf locks into the containing socket,
provide alternative macros for the sockbuf I/O locks:
SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK(). These operate on a
socket rather than a socket buffer. Note that these locks are used only
to prevent concurrent readers and writters from interleaving I/O.

When locking for I/O, return an error if the socket is a listening
socket. Currently the check is racy since the sockbuf sx locks are
destroyed during the transition to a listening socket, but that will no
longer be true after some follow-up changes.

Modify a few places to check for errors from
sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before. In
particular, add checks to sendfile() and sorflush().

Reviewed by: tuexen, gallatin
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31657

show more ...


# 7045b160 28-Jul-2021 Roy Marples <roy@marples.name>

socket: Implement SO_RERROR

SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they

socket: Implement SO_RERROR

SO_RERROR indicates that receive buffer overflows should be handled as
errors. Historically receive buffer overflows have been ignored and
programs could not tell if they missed messages or messages had been
truncated because of overflows. Since programs historically do not
expect to get receive overflow errors, this behavior is not the
default.

This is really really important for programs that use route(4) to keep
in sync with the system. If we loose a message then we need to reload
the full system state, otherwise the behaviour from that point is
undefined and can lead to chasing bogus bug reports.

Reviewed by: philip (network), kbowling (transport), gbe (manpages)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D26652

show more ...


# a1002174 14-Jun-2021 Mark Johnston <markj@FreeBSD.org>

Consistently use the SOCKBUF_MTX() and SOCK_MTX() macros

This makes it easier to change the socket locking protocols. No
functional change intended.

MFC after: 1 week
Sponsored by: The FreeBSD Fou

Consistently use the SOCKBUF_MTX() and SOCK_MTX() macros

This makes it easier to change the socket locking protocols. No
functional change intended.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation

show more ...


Revision tags: release/13.0.0
# 924d1c9a 08-Feb-2021 Alexander V. Chernikov <melifaro@FreeBSD.org>

Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors."
Wrong version of the change was pushed inadvertenly.

This reverts commit 4a01b854ca5c2e5124958363b3326708b913a

Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors."
Wrong version of the change was pushed inadvertenly.

This reverts commit 4a01b854ca5c2e5124958363b3326708b913af71.

show more ...


12345678910>>...22