#
b93e930c |
| 25-Jul-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sendfile: retire M_BLOCKED
Follow unix(4) commit 51ac5ee0d57f and retire M_BLOCKED for TCP sockets as well. The M_BLOCKED flag was introduced back 2016 together with non- blocking sendfile(2). It
sendfile: retire M_BLOCKED
Follow unix(4) commit 51ac5ee0d57f and retire M_BLOCKED for TCP sockets as well. The M_BLOCKED flag was introduced back 2016 together with non- blocking sendfile(2). It marked mbufs in a sending socket buffer that could be ready to sent, but are sitting behind an M_NOTREADY mbuf(s), that blocks them.
You may consider this flag as an INVARIANT flag that helped to ensure socket buffer consistency. Or maybe the socket code was so convoluted back then, that it was unclear if sbfree() may be called on an mbuf that is in the middle of the buffer, and I decided to introduce the flag to protect against that. With today state of socket buffer code it became clear that the latter cannot happen. And this commit adds an assertion proving that.
Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D50728
show more ...
|
#
f2c2ed7d |
| 25-Jul-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sendfile: don't hack sb_lowat for sockets that manage the watermark
In the sendfile(2) we carry an old hack (originating from d99b0dd2c5297) to help dumb benchmarks and applications to achieve highe
sendfile: don't hack sb_lowat for sockets that manage the watermark
In the sendfile(2) we carry an old hack (originating from d99b0dd2c5297) to help dumb benchmarks and applications to achieve higher performance. We would modify low watermark on the socket send buffer to avoid socket being reported as writable too early, which would result in lots of small writes.
Skip that hack for applications that do setsockopt(SO_SNDLOWAT) or that register the socket in kevent(2) with NOTE_LOWAT feature. First, we don't want the hack to rewrite the watermark value explicitly specified by the user. Second, in certain cases that can lead to real performance regressions. A kevent(2) with NOTE_LOWAT would report socket as writable, but then sendfile(2) would write 0 bytes and return EAGAIN.
The change also disables the hack for unix(4) sockets, leaving only TCP.
Reviewed by: rrs Differential Revision: https://reviews.freebsd.org/D50581
show more ...
|
Revision tags: release/14.3.0-p1, release/14.2.0-p4, release/13.5.0-p2, release/14.3.0, release/13.4.0-p5, release/13.5.0-p1, release/14.2.0-p3 |
|
#
67c1c4df |
| 24-Mar-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockbuf: provide sbunreserve_locked() which is a complement to sbreserve()
The sbreserve() works only on protocol-independent parts of the sockbuf, but sbrelease() also clears the generic sockbuf mb
sockbuf: provide sbunreserve_locked() which is a complement to sbreserve()
The sbreserve() works only on protocol-independent parts of the sockbuf, but sbrelease() also clears the generic sockbuf mbuf chain. Calling the latter to undo changes done by the former is not correct. The new function is the right thing.
Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D49364
show more ...
|
#
371392bc |
| 24-Mar-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockbuf: remove sbflush_internal() and sbrelease_internal() shims
This functions serve just one purpose - allow to call sbdestroy() from sofree() without triggering unlocked mutex assertions. Let's
sockbuf: remove sbflush_internal() and sbrelease_internal() shims
This functions serve just one purpose - allow to call sbdestroy() from sofree() without triggering unlocked mutex assertions. Let's just don't save on locking with INVARIANTS kernel and this will allow to clean up all these shims. Should be no functional changes.
Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D49363
show more ...
|
Revision tags: release/13.5.0, release/14.2.0-p2, release/14.1.0-p8, release/13.4.0-p4 |
|
#
9d7fb768 |
| 01-Feb-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: garbage collect SB_NOINTR
Not used. All socket buffer sleeps are interruptible.
|
Revision tags: release/14.1.0-p7, release/14.2.0-p1, release/13.4.0-p3, release/14.2.0, release/13.4.0 |
|
#
a1da7dc1 |
| 10-Sep-2024 |
Mark Johnston <markj@FreeBSD.org> |
socket: Implement SO_SPLICE
This is a feature which allows one to splice two TCP sockets together such that data which arrives on one socket is automatically pushed into the send buffer of the splic
socket: Implement SO_SPLICE
This is a feature which allows one to splice two TCP sockets together such that data which arrives on one socket is automatically pushed into the send buffer of the spliced socket. This can be used to make TCP proxying more efficient as it eliminates the need to copy data into and out of userspace.
The interface is copied from OpenBSD, and this implementation aims to be compatible. Splicing is enabled by setting the SO_SPLICE socket option. When spliced, data that arrives on the receive buffer is automatically forwarded to the other socket. In particular, splicing is a unidirectional operation; to splice a socket pair in both directions, SO_SPLICE needs to be applied to both sockets. More concretely, when setting the option one passes the following struct:
struct splice { int fd; off_t max; struct timveval idle; };
where "fd" refers to the socket to which the first socket is to be spliced, and two setsockopt(SO_SPLICE) calls are required to set up a bi-directional splice.
select(), poll() and kevent() do not return when data arrives in the receive buffer of a spliced socket, as such data is expected to be removed automatically once space is available in the corresponding send buffer. Userspace can perform I/O on spliced sockets, but it will be unpredictably interleaved with splice I/O.
A splice can be configured to unsplice once a certain number of bytes have been transmitted, or after a given time period. Once unspliced, the socket behaves normally from userspace's perspective. The number of bytes transmitted via the splice can be retrieved using getsockopt(SO_SPLICE); this works after unsplicing as well, up until the socket is closed or spliced again. Userspace can also manually trigger unsplicing by splicing to -1.
Splicing work is handled by dedicated threads, similar to KTLS. A worker thread is assigned at splice creation time. At some point it would be nice to have a direct dispatch mode, wherein the thread which places data into a receive buffer is also responsible for pushing it into the sink, but this requires tighter integration with the protocol stack in order to avoid reentrancy problems.
Currently, sowakeup() and related functions will signal the worker thread assigned to a spliced socket. so_splice_xfer() does the hard work of moving data between socket buffers.
Co-authored by: gallatin Reviewed by: brooks (interface bits) MFC after: 3 months Sponsored by: Klara, Inc. Sponsored by: Stormshield Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D46411
show more ...
|
Revision tags: release/14.1.0, release/13.3.0 |
|
#
30f8cb81 |
| 02-Feb-2024 |
Mark Johnston <markj@FreeBSD.org> |
socket: Don't assume m0 != NULL in sbappendcontrol_locked()
Some callers (e.g., ktls_decrypt()) violate this assumption and thus could trigger a NULL pointer dereference in KMSAN kernels.
Reported
socket: Don't assume m0 != NULL in sbappendcontrol_locked()
Some callers (e.g., ktls_decrypt()) violate this assumption and thus could trigger a NULL pointer dereference in KMSAN kernels.
Reported by: glebius Fixes: ec45f952a232 ("sockbuf: Add KMSAN checks to sbappend*()") MFC after: 1 week
show more ...
|
#
29363fb4 |
| 23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl s
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script.
Sponsored by: Netflix
show more ...
|
Revision tags: release/14.0.0 |
|
#
685dc743 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
ec45f952 |
| 27-Apr-2023 |
Mark Johnston <markj@FreeBSD.org> |
sockbuf: Add KMSAN checks to sbappend*()
Otherwise KMSAN only detects uninitialized memory when the contents of the buffer are copied out to userspace or transmitted to a network interface. At that
sockbuf: Add KMSAN checks to sbappend*()
Otherwise KMSAN only detects uninitialized memory when the contents of the buffer are copied out to userspace or transmitted to a network interface. At that point the KMSAN violation will be far removed from its origin, so let's try to make debugging such problems a bit easier.
Reviewed by: glebius MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D38101
show more ...
|
Revision tags: release/13.2.0, release/12.4.0 |
|
#
7b660faa |
| 27-Sep-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
sockbufs: add sbreserve_locked_limit() with custom maxsockbuf limit.
Protocols such as netlink may need a large socket receive buffer, measured in tens of megabytes. This change allows netlink to
sockbufs: add sbreserve_locked_limit() with custom maxsockbuf limit.
Protocols such as netlink may need a large socket receive buffer, measured in tens of megabytes. This change allows netlink to set larger socket buffers (given the privs are in place), without requiring user to manuall bump maxsockbuf.
Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D36747
show more ...
|
#
f6696856 |
| 27-Sep-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
protocols: make socket buffers ioctl handler changeable
Allow to set custom per-protocol handlers for the socket buffers ioctls by introducing pr_setsbopt callback with the default value set to th
protocols: make socket buffers ioctl handler changeable
Allow to set custom per-protocol handlers for the socket buffers ioctls by introducing pr_setsbopt callback with the default value set to the currently-used sbsetopt().
Reviewed by: glebius Differential Revision: https://reviews.freebsd.org/D36746
show more ...
|
Revision tags: release/13.1.0 |
|
#
fe8c78f0 |
| 23-Apr-2022 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
ktls: Add full support for TLS RX offloading via network interface.
Basic TLS RX offloading uses the "csum_flags" field in the mbuf packet header to figure out if an incoming mbuf has been fully off
ktls: Add full support for TLS RX offloading via network interface.
Basic TLS RX offloading uses the "csum_flags" field in the mbuf packet header to figure out if an incoming mbuf has been fully offloaded or not. This information follows the packet stream via the LRO engine, IP stack and finally to the TCP stack. The TCP stack preserves the mbuf packet header also when re-assembling packets after packet loss. When the mbuf goes into the socket buffer the packet header is demoted and the offload information is transferred to "m_flags" . Later on a worker thread will analyze the mbuf flags and decide if the mbufs making up a TLS record indicate a fully-, partially- or not decrypted TLS record. Based on these three cases the worker thread will either pass the packet on as-is or recrypt the decrypted bits, if any, or decrypt the packet as usual.
During packet loss the kernel TLS code will call back into the network driver using the send tag, informing about the TCP starting sequence number of every TLS record that is not fully decrypted by the network interface. The network interface then stores this information in a compressed table and starts asking the hardware if it has found a valid TLS header in the TCP data payload. If the hardware has found a valid TLS header and the referred TLS header is at a valid TCP sequence number according to the TCP sequence numbers provided by the kernel TLS code, the network driver then informs the hardware that it can resume decryption.
Care has been taken to not merge encrypted and decrypted mbuf chains, in the LRO engine and when appending mbufs to the socket buffer.
The mbuf's leaf network interface pointer is used to figure out from which network interface the offloading rule should be allocated. Also this pointer is used to track route changes.
Currently mbuf send tags are used in both transmit and receive direction, due to convenience, but may get a new name in the future to better reflect their usage.
Reviewed by: jhb@ and gallatin@ Differential revision: https://reviews.freebsd.org/D32356 Sponsored by: NVIDIA Networking
show more ...
|
#
d59bc188 |
| 27-May-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockbuf: remove unused mbuf counter and cluster counter
With M_EXTPG mbufs these two counters already do not represent the reality. As we are moving towards protocol independent socket buffers, whi
sockbuf: remove unused mbuf counter and cluster counter
With M_EXTPG mbufs these two counters already do not represent the reality. As we are moving towards protocol independent socket buffers, which may not even use mbufs at all, the counters become less and less relevant. The only userland seeing them was 'netstat -x'.
PR: 264181 (exp-run) Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35334
show more ...
|
#
ad51c47f |
| 25-May-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockbuf: fix assertion in sbcreatecontrol()
Fixes: 6890b588141a8298fc8a63700aeeea4ba36ca3f9
|
#
6890b588 |
| 17-May-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockbuf: improve sbcreatecontrol()
o Constify memory pointer. Make length unsigned. o Make it never fail with M_WAITOK and assert that length is sane.
|
#
b46667c6 |
| 17-May-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockbuf: merge two versions of sbcreatecontrol() into one
No functional change.
|
#
43283184 |
| 12-May-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: use socket buffer mutexes in struct socket directly
Since c67f3b8b78e the sockbuf mutexes belong to the containing socket, and socket buffers just point to it. In 74a68313b50 macros that a
sockets: use socket buffer mutexes in struct socket directly
Since c67f3b8b78e the sockbuf mutexes belong to the containing socket, and socket buffers just point to it. In 74a68313b50 macros that access this mutex directly were added. Go over the core socket code and eliminate code that reaches the mutex by dereferencing the sockbuf compatibility pointer.
This change requires a KPI change, as some functions were given the sockbuf pointer only without any hint if it is a receive or send buffer.
This change doesn't cover the whole kernel, many protocols still use compatibility pointers internally. However, it allows operation of a protocol that doesn't use them.
Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35152
show more ...
|
#
7db54446 |
| 09-May-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockbufs: make sbrelease_internal() private
|
#
17cbcf33 |
| 26-Jan-2022 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
mbuf(9): Assert receive mbufs don't carry a send tag.
Else we would start leaking reference counts.
Discussed with: jhb@ MFC after: 1 week Sponsored by: NVIDIA Networking
|
#
fe27f1db |
| 26-Dec-2021 |
Alexander Motin <mav@FreeBSD.org> |
kern: Remove CTLFLAG_NEEDGIANT from some sysctls.
MFC after: 2 weeks
|
Revision tags: release/12.3.0 |
|
#
f94acf52 |
| 07-Sep-2021 |
Mark Johnston <markj@FreeBSD.org> |
socket: Rename sb(un)lock() and interlock with listen(2)
In preparation for moving sockbuf locks into the containing socket, provide alternative macros for the sockbuf I/O locks: SOCK_IO_SEND_(UN)LO
socket: Rename sb(un)lock() and interlock with listen(2)
In preparation for moving sockbuf locks into the containing socket, provide alternative macros for the sockbuf I/O locks: SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK(). These operate on a socket rather than a socket buffer. Note that these locks are used only to prevent concurrent readers and writters from interleaving I/O.
When locking for I/O, return an error if the socket is a listening socket. Currently the check is racy since the sockbuf sx locks are destroyed during the transition to a listening socket, but that will no longer be true after some follow-up changes.
Modify a few places to check for errors from sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before. In particular, add checks to sendfile() and sorflush().
Reviewed by: tuexen, gallatin MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31657
show more ...
|
#
7045b160 |
| 28-Jul-2021 |
Roy Marples <roy@marples.name> |
socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they
socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default.
This is really really important for programs that use route(4) to keep in sync with the system. If we loose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports.
Reviewed by: philip (network), kbowling (transport), gbe (manpages) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26652
show more ...
|
#
a1002174 |
| 14-Jun-2021 |
Mark Johnston <markj@FreeBSD.org> |
Consistently use the SOCKBUF_MTX() and SOCK_MTX() macros
This makes it easier to change the socket locking protocols. No functional change intended.
MFC after: 1 week Sponsored by: The FreeBSD Fou
Consistently use the SOCKBUF_MTX() and SOCK_MTX() macros
This makes it easier to change the socket locking protocols. No functional change intended.
MFC after: 1 week Sponsored by: The FreeBSD Foundation
show more ...
|
Revision tags: release/13.0.0 |
|
#
924d1c9a |
| 08-Feb-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors." Wrong version of the change was pushed inadvertenly.
This reverts commit 4a01b854ca5c2e5124958363b3326708b913a
Revert "SO_RERROR indicates that receive buffer overflows should be handled as errors." Wrong version of the change was pushed inadvertenly.
This reverts commit 4a01b854ca5c2e5124958363b3326708b913af71.
show more ...
|