#
314cb279 |
| 31-Oct-2024 |
John Baldwin <jhb@FreeBSD.org> |
mbuf: Don't force all M_EXTPG mbufs to be read-only
Some M_EXTPG mbufs are read-only (e.g. those backing sendfile requests), but others are not. Add a flags argument to mb_alloc_ext_pgs that can be used to set M_RDONLY when needed rather than setting it unconditionally. Update mb_unmapped_to_ext to preserve M_RDONLY from the unmapped mbuf.
Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D46783
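The flag handling described in this change can be modelled in a few lines of userland C. This is only a sketch: `toy_mbuf`, `toy_alloc_ext_pgs()`, `toy_unmapped_to_ext()` and the `TOY_M_*` bit values are hypothetical stand-ins, not the kernel's definitions or signatures.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative flag bits; the values are not taken from sys/mbuf.h. */
#define TOY_M_EXT	0x1	/* mapped external storage */
#define TOY_M_EXTPG	0x2	/* unmapped, page-based storage */
#define TOY_M_RDONLY	0x4	/* payload must not be modified */

struct toy_mbuf {
	int m_flags;
};

/*
 * Allocate an unmapped buffer.  The caller decides whether it is
 * read-only instead of the allocator forcing TOY_M_RDONLY on.
 */
static struct toy_mbuf *
toy_alloc_ext_pgs(int flags)
{
	struct toy_mbuf *m;

	/* Only the read-only bit may be requested by the caller. */
	assert((flags & ~TOY_M_RDONLY) == 0);
	m = calloc(1, sizeof(*m));
	if (m == NULL)
		return (NULL);
	m->m_flags = TOY_M_EXTPG | flags;
	return (m);
}

/*
 * Produce a mapped buffer from an unmapped one, carrying over the
 * read-only bit from the source rather than dropping or forcing it.
 */
static struct toy_mbuf *
toy_unmapped_to_ext(const struct toy_mbuf *src)
{
	struct toy_mbuf *m;

	assert((src->m_flags & TOY_M_EXTPG) != 0);
	m = calloc(1, sizeof(*m));
	if (m == NULL)
		return (NULL);
	m->m_flags = TOY_M_EXT | (src->m_flags & TOY_M_RDONLY);
	return (m);
}
```

A sendfile-like caller would pass `TOY_M_RDONLY`, while a caller that owns its pages would pass `0`.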
|
Revision tags: release/13.4.0, release/14.1.0 |
|
#
0020e1b6 |
| 10-Apr-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Revert "sendfile: mark it explicitly as a TCP only feature"
This reverts commit 3b7aa842e27dcf07181f161b1abde0067ed51e97.
|
#
3b7aa842 |
| 08-Apr-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sendfile: mark it explicitly as a TCP only feature
Back in 2015 when it turned non-blocking, it was working with PF_UNIX and it may still work. However, the usefulness of such an application of sendfile(2) is questionable. Disable the feature while unix/stream is under refactoring.
Relnotes: yes
|
Revision tags: release/13.3.0 |
|
#
61cc4830 |
| 18-Jan-2024 |
Alfredo Mazzinghi <am2419@cl.cam.ac.uk> |
Abstract UIO allocation and deallocation.
Introduce the allocuio() and freeuio() functions to allocate and deallocate struct uio. This hides the actual allocator interface, so it is easier to modify the sub-allocation layout of struct uio and the corresponding iovec array.
Obtained from: CheriBSD Reviewed by: kib, markj MFC after: 2 weeks Sponsored by: CHaOS, EPSRC grant EP/V000292/1 Differential Revision: https://reviews.freebsd.org/D43711
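One way to picture what such an allocator wrapper hides is a single allocation that holds both the uio header and its iovec array. The sketch below is a userland approximation under that assumption; `toy_uio`, `toy_allocuio()`, `toy_freeuio()` and `TOY_MAXIOV` are hypothetical names, and the kernel's actual layout and signatures may differ.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/uio.h>

#define TOY_MAXIOV	1024	/* illustrative upper bound on iovec count */

/* Reduced stand-in for struct uio: only the fields the sketch needs. */
struct toy_uio {
	struct iovec *uio_iov;	/* points into the same allocation */
	int uio_iovcnt;
};

/*
 * Allocate the header and its iovec array as one block.  Callers never
 * see how the two are laid out, so the layout can change later.
 */
static struct toy_uio *
toy_allocuio(unsigned int iovcnt)
{
	struct toy_uio *uio;

	assert(iovcnt <= TOY_MAXIOV);
	uio = calloc(1, sizeof(*uio) + (size_t)iovcnt * sizeof(struct iovec));
	if (uio == NULL)
		return (NULL);
	uio->uio_iov = (struct iovec *)(uio + 1);
	uio->uio_iovcnt = (int)iovcnt;
	return (uio);
}

/* Release the header and the iovec array together. */
static void
toy_freeuio(struct toy_uio *uio)
{
	free(uio);
}
```

Because callers only ever pair `toy_allocuio()` with `toy_freeuio()`, switching to two separate allocations would not require touching them.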
|
#
d0adc2f2 |
| 26-Dec-2023 |
Mark Johnston <markj@FreeBSD.org> |
sendfile: Explicitly ignore errors from copyout()
There is a documented bug in sendfile.2 which notes that sendfile(2) does not raise an error if it fails to copy out the number of bytes written. Explicitly ignore the error from copyout() calls in preparation for annotating copyout() with __result_use_check.
Reviewed by: glebius, kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D43129
|
Revision tags: release/14.0.0 |
|
#
685dc743 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
Revision tags: release/13.2.0 |
|
#
73ee5756 |
| 01-Apr-2023 |
Randall Stewart <rrs@FreeBSD.org> |
Fixes in the tcp infrastructure with respect to stack changes as well as other infrastructure updates for incoming rack features.
So stack switching has always been a bit of an issue. We currently use a break-before-make setup, which means that if something goes wrong you have to try to get back to a stack. This patch, among a lot of other things, changes that so that it is make-before-break. We also expand some of the function blocks in prep for new features in rack that will allow more controlled pacing. We also add other abilities, such as the pathway for a stack to query a previous stack to acquire from it critical state information so things in flight don't get dropped or mishandled when switching stacks. We also add the concept of a timer granularity. This allows an alternate stack to change from the old ticks granularity to microseconds, and of course this even gives us a pathway to go to nanosecond timekeeping if we need to (something for the data center to consider for sure).
Once all this lands I will then update rack to begin using all these new features.
Reviewed by: tuexen Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D39210
|
Revision tags: release/12.4.0 |
|
#
f45feecf |
| 22-Sep-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add vn_getsize
getattr is very expensive and in important cases only gets called to get the size. This can be optimized with a dedicated routine which obtains that statistic.
As a step towards that goal make size-only consumers use a dedicated routine.
Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D37885
|
#
3212ad15 |
| 07-Sep-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
Add getsock
All but one consumer of getsock_cap passes only 4 arguments. Take advantage of it.
|
#
e7d02be1 |
| 17-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: refactor protosw and domain static declaration and load
o Assert that every protosw has pr_attach. Now this structure is only for socket protocols declarations and nothing else.
o Merge struct pr_usrreqs into struct protosw. This was suggested in 1996 by wollman@ (see 7b187005d18ef), and later reiterated in 2006 by rwatson@ (see 6fbb9cf860dcd).
o Make struct domain hold a variable sized array of protosw pointers. For most protocols these pointers are initialized statically. Those domains that may have loadable protocols have spacers. IPv4 and IPv6 have 8 spacers each (andre@ dff3237ee54ea).
o For inetsw and inet6sw leave a comment noting that many protosw entries very likely are dead code.
o Refactor pf_proto_[un]register() into protosw_[un]register().
o Isolate pr_*_notsupp() methods into uipc_domain.c
Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36232
|
#
43283184 |
| 12-May-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: use socket buffer mutexes in struct socket directly
Since c67f3b8b78e the sockbuf mutexes belong to the containing socket, and socket buffers just point to it. In 74a68313b50 macros that access this mutex directly were added. Go over the core socket code and eliminate code that reaches the mutex by dereferencing the sockbuf compatibility pointer.
This change requires a KPI change, as some functions were given the sockbuf pointer only without any hint if it is a receive or send buffer.
This change doesn't cover the whole kernel, many protocols still use compatibility pointers internally. However, it allows operation of a protocol that doesn't use them.
Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35152
|
Revision tags: release/13.1.0 |
|
#
fe27f1db |
| 26-Dec-2021 |
Alexander Motin <mav@FreeBSD.org> |
kern: Remove CTLFLAG_NEEDGIANT from some sysctls.
MFC after: 2 weeks
|
Revision tags: release/12.3.0 |
|
#
e3ba94d4 |
| 09-Nov-2021 |
John Baldwin <jhb@FreeBSD.org> |
Don't require the socket lock for sorele().
Previously, sorele() always required the socket lock and dropped the lock if the released reference was not the last reference. Many callers locked the socket lock just before calling sorele() resulting in a wasted lock/unlock when not dropping the last reference.
Move the previous implementation of sorele() into a new sorele_locked() function and use it instead of sorele() for various places in uipc_socket.c that called sorele() while already holding the socket lock.
The sorele() macro now uses refcount_release_if_not_last() to try to drop the socket reference without locking the socket. If that shortcut fails, it locks the socket and calls sorele_locked().
Reviewed by: kib, markj Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D32741
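The lock-free fast path described here can be approximated in userland with C11 atomics. This is a sketch under stated assumptions: `toy_socket`, `toy_release_if_not_last()`, `toy_sorele()` and `toy_sorele_locked()` are hypothetical names, and the "lock" is a plain flag with a counter so the example can show when it is taken; none of this is the kernel's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_socket {
	atomic_uint so_count;		/* reference count */
	bool locked;			/* stand-in for the socket lock */
	unsigned int lock_acquisitions;	/* how often the lock was taken */
	bool freed;			/* stand-in for socket teardown */
};

static void
toy_lock(struct toy_socket *so)
{
	assert(!so->locked);
	so->locked = true;
	so->lock_acquisitions++;
}

static void
toy_unlock(struct toy_socket *so)
{
	assert(so->locked);
	so->locked = false;
}

/* Drop one reference unless it is the last one; report whether it did. */
static bool
toy_release_if_not_last(atomic_uint *count)
{
	unsigned int old = atomic_load(count);

	for (;;) {
		assert(old > 0);
		if (old == 1)
			return (false);
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return (true);
	}
}

/* Slow path: called with the lock held, always returns with it dropped. */
static void
toy_sorele_locked(struct toy_socket *so)
{
	assert(so->locked);
	if (atomic_fetch_sub(&so->so_count, 1) == 1)
		so->freed = true;
	toy_unlock(so);
}

/* Fast path first; take the lock only if this might be the last reference. */
static void
toy_sorele(struct toy_socket *so)
{
	if (toy_release_if_not_last(&so->so_count))
		return;
	toy_lock(so);
	toy_sorele_locked(so);
}
```

With two references held, the first `toy_sorele()` call never touches the lock; only the call that might drop the final reference does.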
|
#
523d58aa |
| 07-Sep-2021 |
Mark Johnston <markj@FreeBSD.org> |
socket: Remove unneeded SOLISTENING checks
Now that SOCK_IO_*_LOCK() checks for listening sockets, we can eliminate some racy SOLISTENING() checks. No functional change intended.
Reviewed by: tuexen MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31660
|
#
f94acf52 |
| 07-Sep-2021 |
Mark Johnston <markj@FreeBSD.org> |
socket: Rename sb(un)lock() and interlock with listen(2)
In preparation for moving sockbuf locks into the containing socket, provide alternative macros for the sockbuf I/O locks: SOCK_IO_SEND_(UN)LOCK() and SOCK_IO_RECV_(UN)LOCK(). These operate on a socket rather than a socket buffer. Note that these locks are used only to prevent concurrent readers and writers from interleaving I/O.
When locking for I/O, return an error if the socket is a listening socket. Currently the check is racy since the sockbuf sx locks are destroyed during the transition to a listening socket, but that will no longer be true after some follow-up changes.
Modify a few places to check for errors from sblock()/SOCK_IO_(SEND|RECV)_LOCK() where they were not before. In particular, add checks to sendfile() and sorflush().
Reviewed by: tuexen, gallatin MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31657
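The calling pattern this change asks of consumers such as sendfile() can be sketched as follows. Everything here is a simplified model: `toy_socket`, `toy_io_send_lock()`, `toy_io_send_unlock()` and `toy_sendfile()` are hypothetical, the lock is a plain flag, and the choice of `ENOTCONN` as the error value is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct toy_socket {
	bool listening;		/* socket has transitioned to listen(2) */
	bool send_io_locked;	/* stand-in for the send I/O lock */
	unsigned int bytes_sent;
};

/* Take the send I/O lock, but refuse if the socket is listening. */
static int
toy_io_send_lock(struct toy_socket *so)
{
	assert(!so->send_io_locked);
	so->send_io_locked = true;
	if (so->listening) {
		/* Do not leave the lock held on the error path. */
		so->send_io_locked = false;
		return (ENOTCONN);	/* assumed error value */
	}
	return (0);
}

static void
toy_io_send_unlock(struct toy_socket *so)
{
	assert(so->send_io_locked);
	so->send_io_locked = false;
}

/* A consumer must now check the lock result before doing any I/O. */
static int
toy_sendfile(struct toy_socket *so, unsigned int nbytes)
{
	int error;

	error = toy_io_send_lock(so);
	if (error != 0)
		return (error);
	so->bytes_sent += nbytes;	/* stand-in for the transfer */
	toy_io_send_unlock(so);
	return (0);
}
```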
|
#
916c61a5 |
| 21-May-2021 |
Mark Johnston <markj@FreeBSD.org> |
Fix handling of errors from pru_send(PRUS_NOTREADY)
PRUS_NOTREADY indicates that the caller has not yet populated the chain with data, and so it is not ready for transmission. This is used by sendfile (for async I/O) and KTLS (for encryption). In particular, if pru_send returns an error, the caller is responsible for freeing the chain since other implicit references to the data buffers exist.
For async sendfile, it happens that an error will only be returned if the connection was dropped, in which case tcp_usr_ready() will handle freeing the chain. But since KTLS can be used in conjunction with the regular socket I/O system calls, many more error cases - which do not result in the connection being dropped - are reachable. In these cases, KTLS was effectively assuming success.
So:
- Change sosend_generic() to free the mbuf chain if pru_send(PRUS_NOTREADY) fails. Nothing else owns a reference to the chain at that point.
- Similarly, in vn_sendfile() change the !async I/O && KTLS case to free the chain.
- If async I/O is still outstanding when pru_send fails in vn_sendfile(), set an error in the sfio structure so that the connection is aborted and the mbuf chain is freed.
Reviewed by: gallatin, tuexen Discussed with: jhb MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30349
|
Revision tags: release/13.0.0 |
|
#
52a99c72 |
| 02-Apr-2021 |
Mark Johnston <markj@FreeBSD.org> |
sendfile: Fix error initialization in sendfile_getobj()
Reviewed by: chs, kib Reported by: jhb Fixes: faa998f6ff695 MFC after: 1 day Differential Revision: https://reviews.freebsd.org/D29540
|
#
faa998f6 |
| 25-Feb-2021 |
Mark Johnston <markj@FreeBSD.org> |
sendfile: Use the pager size to determine the file extent when possible
Previously sendfile would issue a VOP_GETATTR and use the returned size, i.e., the file size. When paging in file data, sendfile_swapin() will use the pager to determine whether it needs to zero-fill, most often because of a hole in a sparse file. An attempt to page in beyond the end of a file is treated this way, and occurs when the requested page is past the end of the pager. In other words, both the file size and pager size were used interchangeably.
With ZFS, updates to the pager and file sizes are not synchronized by the exclusive vnode lock, at least partially due to its use of MNTK_SHARED_WRITES. In particular, the pager size is updated after the file size, so in the presence of a writer concurrently extending the file, sendfile could incorrectly instantiate "holes" in the page cache pages backing the file, which manifests as data corruption when reading the file back from the page cache. The on-disk copy is unaffected.
Fix this by consistently using the pager size when available.
Reported by: dumbbell Reviewed by: chs, kib Tested by: dumbbell, pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28811
|
#
214257da |
| 03-Jan-2021 |
Mark Johnston <markj@FreeBSD.org> |
sendfile: Clear page pointers when handling a pager error
When INVARIANTS is configured, the sendfile_iodone() callback verifies that pages attached to the sendfile header are wired, but we unwire all such pages after a synchronous pager error, before calling sendfile_iodone().
Reported by: pho Tested by: pho Sponsored by: The FreeBSD Foundation
|
#
26b23f07 |
| 26-Dec-2020 |
Mark Johnston <markj@FreeBSD.org> |
sendfile: Ensure that sfio->npages is initialized
We initialize sfio->npages only when some I/O is required to satisfy the request. However, sendfile_iodone() contains an INVARIANTS-only check that references sfio->npages, and this check is executed even if no I/O is performed, so the check may use an uninitialized value.
Fix the problem by initializing sfio->npages earlier. Note that sendfile_swapin() always initializes the page array. In some rare cases we need to trim the page array so ensure that sfio->npages gets updated accordingly.
Reported by: syzkaller (with KASAN) Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27726
|
#
cd853791 |
| 28-Nov-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allows several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (*). Overall, we save a significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to the desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initializes maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embedded structures, get a private MAXPHYS-like constant; their conversion is out of scope for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis were either submitted by, or based on changes by, mav.
Suggested by: mav (*) Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225
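The reason for the extra b_pages[] slot follows from simple page arithmetic: a buffer of maxphys bytes that does not start on a page boundary touches one more page than an aligned one. The sketch below works this out in userland C, assuming 4 KiB pages and a 1 MiB maxphys; the `toy_` helpers are hypothetical and only mimic what atop() computes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT	12	/* assumes 4 KiB pages */
#define TOY_PAGE_SIZE	((size_t)1 << TOY_PAGE_SHIFT)
#define TOY_PAGE_MASK	(TOY_PAGE_SIZE - 1)

/* Bytes to whole pages, rounding down (what atop() does in this model). */
static size_t
toy_atop(size_t bytes)
{
	return (bytes >> TOY_PAGE_SHIFT);
}

/* Number of distinct pages touched by [addr, addr + len). */
static size_t
toy_pages_spanned(uintptr_t addr, size_t len)
{
	size_t off = (size_t)(addr & TOY_PAGE_MASK);

	assert(len > 0);
	/* Round the end of the range up to a page boundary. */
	return ((off + len + TOY_PAGE_MASK) >> TOY_PAGE_SHIFT);
}
```

For a 1 MiB buffer, `toy_atop()` gives 256 pages; the same buffer starting 512 bytes into a page spans 257, which is the `atop(maxphys) + 1` case.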
|
Revision tags: release/12.2.0 |
|
#
6fed89b1 |
| 02-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
kern: clean up empty lines in .c and .h files
|
Revision tags: release/11.4.0 |
|
#
c2ea3d44 |
| 06-Jun-2020 |
Chuck Silvers <chs@FreeBSD.org> |
Fix hang due to missing unbusy in sendfile when an async data I/O fails.
r359473 removed the page unbusy logic from sendfile_iodone() because when vm_pager_get_pages_async() would return an error after failing to start the async I/O (e.g. because VOP_BMAP failed), sendfile_swapin() would also unbusy the pages, and it was wrong to unbusy twice. However, this breaks the case where vm_pager_get_pages_async() succeeds in starting an async I/O and the async I/O is what fails. In this case, sendfile_iodone() must unbusy the pages, and because sendfile_iodone() doesn't know which case it is in, sendfile_iodone() must always unbusy pages and relookup pages which have been substituted with bogus_page, which in turn means that sendfile_swapin() must never do unbusy or relookup for pages which have been given to vm_pager_get_pages_async(), even if there is an error.
Reviewed by: kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25136
|
#
61664ee7 |
| 03-May-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Step 4.2: start divorce of M_EXT and M_EXTPG
They have more differences than similarities. For now there is lots of code that would check for M_EXT only and work correctly on M_EXTPG buffers, so still carry M_EXT bit together with M_EXTPG. However, prepare some code for explicit check for M_EXTPG.
Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598
|
#
6edfd179 |
| 03-May-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Step 4.1: mechanically rename M_NOMAP to M_EXTPG
Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598
|