#
ccb973da |
| 26-Nov-2024 |
Kyle Evans <kevans@FreeBSD.org> |
kern: restore signal mask before ast() for pselect/ppoll
It's possible to take a signal after pselect/ppoll have set their return value, but before we actually return to userland. This results in taking a signal without reflecting it in the return value, which weakens the guarantees provided by these functions.
Switch both to restore the signal mask before we would deliver signals on return to userland. If a signal was received after the wait was over, then we'll just have the signal queued up for the next time it comes unblocked. The modified signal mask is retained if we were interrupted so that ast() actually handles the signal, at which point the signal mask is restored.
des@ has a test case demonstrating the issue in D47738 which will follow.
Note for MFC: TDA_PSELECT is a KBI break, we should just inline ast_sigsuspend() in pselect/ppoll for stable branches. It's not exactly the same, but it will be close enough.
Reported by: des Reviewed by: des (earlier version), kib Sponsored by: Klara, Inc. Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D47741
|
Revision tags: release/13.4.0, release/14.1.0 |
|
#
5b3e5c6c |
| 29-Apr-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
kcmp_pget(): do not accept TIDs
Otherwise pget() might still look up and hold the current process.
Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days
|
#
1e01650a |
| 29-Apr-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
kcmp_pget(): add an assert that we did not hold the current process
Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days
|
#
47ad4f2d |
| 05-Mar-2024 |
Kyle Evans <kevans@FreeBSD.org> |
ktrace: log genio events on failed write
Visibility into the contents of the buffer when a write(2) has failed can be immensely useful in debugging IPC issues -- pushing this to discuss the idea, or maybe an alternative where we can set a flag like KTRFAC_ERRIO to enable it.
When a genio event is potentially raised after an error, currently we'll just free the uio and return. However, such data can be useful when debugging communication between processes to, e.g., understand what the remote side should have grabbed before closing a pipe. Tap out the entire buffer on failure rather than simply discarding it.
Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D43799
|
#
b5d2165b |
| 05-Mar-2024 |
Kyle Evans <kevans@FreeBSD.org> |
kern: poll: tap out the pollfd array on successful return
We do this in kern_poll() to include freebsd32 but exclude the linux compat layer. The ABI should be the same, but the POLL constants are probably different or should be assumed so.
Reviewed by: bapt, jhb Differential Revision: https://reviews.freebsd.org/D44158
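For context, the pollfd array is where poll(2) reports its results: the kernel fills in the revents field of each entry before returning. The sketch below shows that from userland using a pipe; the helper name is hypothetical and the example says nothing about how the kernel records the array for tracing.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Puts one byte into a pipe, polls the read end, and returns the
 * revents value the kernel wrote back, or -1 on any failure.
 * Illustrative helper only.
 */
int
demo_poll_pipe_revents(void)
{
	struct pollfd pfd;
	int fds[2];
	int n, revents;

	if (pipe(fds) != 0)
		return (-1);
	if (write(fds[1], "x", 1) != 1) {
		close(fds[0]);
		close(fds[1]);
		return (-1);
	}

	pfd.fd = fds[0];
	pfd.events = POLLIN;
	pfd.revents = 0;
	n = poll(&pfd, 1, 1000);	/* wait at most one second */
	revents = (n == 1) ? pfd.revents : -1;

	close(fds[0]);
	close(fds[1]);
	return (revents);
}
```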
|
Revision tags: release/13.3.0 |
|
#
61cc4830 |
| 18-Jan-2024 |
Alfredo Mazzinghi <am2419@cl.cam.ac.uk> |
Abstract UIO allocation and deallocation.
Introduce the allocuio() and freeuio() functions to allocate and deallocate struct uio. This hides the actual allocator interface, so it is easier to modify the sub-allocation layout of struct uio and the corresponding iovec array.
Obtained from: CheriBSD Reviewed by: kib, markj MFC after: 2 weeks Sponsored by: CHaOS, EPSRC grant EP/V000292/1 Differential Revision: https://reviews.freebsd.org/D43711
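The benefit of hiding the allocator can be illustrated with a userland analogy. In this sketch the struct, the demo_ prefix, and both function signatures are assumptions made for illustration; the real allocuio() and freeuio() are kernel functions and may differ. The point is only that callers never see how the header and the iovec array are laid out in memory.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/uio.h>	/* struct iovec */

/* Hypothetical stand-in for the kernel's struct uio. */
struct demo_uio {
	struct iovec	*uio_iov;	/* points into the same allocation */
	int		 uio_iovcnt;
	size_t		 uio_resid;
};

/*
 * Allocate the header and its iovec array as one block.  Callers only
 * see the pointer, so the layout can change without touching them.
 */
struct demo_uio *
demo_allocuio(int iovcnt)
{
	struct demo_uio *uio;

	if (iovcnt < 0)
		return (NULL);
	uio = calloc(1, sizeof(*uio) + (size_t)iovcnt * sizeof(struct iovec));
	if (uio == NULL)
		return (NULL);
	uio->uio_iov = (struct iovec *)(uio + 1);
	uio->uio_iovcnt = iovcnt;
	return (uio);
}

/* The matching release; one free() covers header and array. */
void
demo_freeuio(struct demo_uio *uio)
{
	free(uio);
}
```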
|
#
f28526e9 |
| 19-Jan-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
kcmp(2): implement for generic file types
Reviewed by: brooks, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43518
|
#
d8decc9a |
| 19-Jan-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
Add kcmp(2) kernel bits
This is based purely on reading the Linux kcmp(2) man page. In addition to the Linux set of comparators, I also added KCMP_FILEOBJ to compare the underlying file objects.
Tested by: manu Reviewed by: brooks, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43518
|
#
29363fb4 |
| 23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree using automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script.
Sponsored by: Netflix
|
Revision tags: release/14.0.0 |
|
#
685dc743 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
Revision tags: release/13.2.0 |
|
#
7a2c93b8 |
| 14-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: provide sousrsend() that does socket specific error handling
Sockets have special handling for EPIPE on a write, which was spread out over several places. The treatment of transient errors is also special: if the protocol is atomic, then we should ignore any changes to uio_resid, because a transient error means the write failed completely (see d2b3a0ed31e).
- Provide sousrsend() that expects a valid uio, and leave sosend() for kernel consumers only. Do all special error handling right here.
- In dofilewrite() don't do special handling of error for DTYPE_SOCKET.
- For send(2), write(2) and aio_write(2) call into sousrsend() and remove error handling for kern_sendit(), soo_write() and soaio_process_job().
PR: 265087 Reported by: rz-rpi03 at h-ka.de Reviewed by: markj Differential revision: https://reviews.freebsd.org/D35863
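The userland-visible side of the EPIPE case can be sketched as follows: sending on a stream socket whose peer has gone away fails with EPIPE. The helper name is hypothetical, and MSG_NOSIGNAL is used so that the example observes the error code instead of being killed by SIGPIPE. This only shows the behavior an application sees, not the kernel code path the commit consolidates.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Returns the errno produced by send() on a connected stream socket
 * after the peer end has been closed, 0 if the send unexpectedly
 * succeeded, or -1 if the socket pair could not be created.
 */
int
demo_send_after_peer_close(void)
{
	int sv[2];
	ssize_t n;
	int err;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return (-1);
	close(sv[1]);				/* the peer goes away */

	n = send(sv[0], "x", 1, MSG_NOSIGNAL);	/* suppress SIGPIPE */
	err = (n == -1) ? errno : 0;

	close(sv[0]);
	return (err);
}
```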
|
Revision tags: release/12.4.0 |
|
#
c6d31b83 |
| 18-Jul-2022 |
Konstantin Belousov <kib@FreeBSD.org> |
AST: rework
Make most AST handlers dynamically registered. This allows subsystem-specific handler source to live in the subsystem files, instead of making subr_trap.c aware of it. For instance, signal delivery code on return to userspace is now moved to kern_sig.c.
Also, it allows some handlers to be designated as the cleanup (kclear) type, which are called both at AST and on thread/process exit. For instance, ast(), exit1(), and the NFS server no longer need to be aware of UFS softdep processing.
The dynamic registration also allows third-party modules to register AST handlers if needed. There is one caveat with loadable modules: the code makes no effort to ensure that a module is not unloaded before all threads have been processed through its AST handler. In fact, this is already the existing behavior for hwpmc.ko and ufs.ko. I do not think it is worth the effort and the runtime overhead to try to fix it.
Reviewed by: markj Tested by: emaste (arm64), pho Discussed with: jhb Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35888
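The registration idea described above can be sketched as a small table of function pointers, where some entries are flagged as cleanup handlers that also run at exit. Everything in this sketch (the demo_ names, the flag, the slot numbering, and the two sample handlers) is hypothetical and written for userland; it is not the kernel's actual interface.

```c
#include <stddef.h>

#define DEMO_AST_MAX	8
#define DEMO_F_CLEANUP	0x1	/* handler also runs at thread exit */

typedef void demo_ast_fn(void *td);

struct demo_ast_slot {
	demo_ast_fn	*fn;
	int		 flags;
};

static struct demo_ast_slot demo_ast_table[DEMO_AST_MAX];

/* A subsystem claims a slot; returns 0 on success, -1 on misuse. */
int
demo_ast_register(int slot, int flags, demo_ast_fn *fn)
{
	if (slot < 0 || slot >= DEMO_AST_MAX || fn == NULL ||
	    demo_ast_table[slot].fn != NULL)
		return (-1);
	demo_ast_table[slot].fn = fn;
	demo_ast_table[slot].flags = flags;
	return (0);
}

/*
 * Run every registered handler whose bit is set in 'pending'.  When
 * 'exiting' is nonzero only cleanup-type handlers run.  Returns the
 * number of handlers invoked.
 */
int
demo_ast_run(unsigned int pending, int exiting, void *td)
{
	int ran = 0;

	for (int i = 0; i < DEMO_AST_MAX; i++) {
		struct demo_ast_slot *s = &demo_ast_table[i];

		if (s->fn == NULL || (pending & (1u << i)) == 0)
			continue;
		if (exiting && (s->flags & DEMO_F_CLEANUP) == 0)
			continue;
		s->fn(td);
		ran++;
	}
	return (ran);
}

/* Sample handlers that just record that they ran. */
void demo_signal_handler(void *td) { *(int *)td += 1; }
void demo_softdep_handler(void *td) { *(int *)td += 10; }
```

With this shape the dispatcher needs no knowledge of what the handlers do, which mirrors the stated goal of keeping subsystem logic out of the common trap code.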
|
Revision tags: release/13.1.0 |
|
#
91e7bdcd |
| 25-Apr-2022 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Add timespecvalid_interval macro and use it.
Reviewed by: jhb, imp (early rev) Differential revision: https://reviews.freebsd.org/D34848 MFC after: 2 weeks
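The commit message does not show the macro body, so the following is only an assumption about what a validity check for a timespec used as an interval has to cover: a non-negative seconds field and a nanoseconds field within one second. The function name is hypothetical and is deliberately not the macro's name.

```c
#include <time.h>

/*
 * Returns nonzero if *ts is usable as a time interval: no negative
 * seconds and nanoseconds in the range [0, 1e9).
 */
int
demo_timespec_interval_ok(const struct timespec *ts)
{
	return (ts->tv_sec >= 0 && ts->tv_nsec >= 0 &&
	    ts->tv_nsec < 1000000000L);
}
```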
|
#
f17ef286 |
| 22-Feb-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: rename fget*_locked to fget*_noref
This gets rid of the error prone naming where fget_unlocked returns with a ref held, while fget_locked requires a lock but provides nothing in terms of making sure the file lives past unlock.
No functional changes.
|
#
513c7a6e |
| 11-Feb-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
fd: make fget_unlocked take a thread argument
Just like other fget routines. This enables embedding fd table pointer in struct thread, avoiding taking a trip through proc.
|
Revision tags: release/12.3.0 |
|
#
04c91ac4 |
| 14-Oct-2021 |
Brooks Davis <brooks@FreeBSD.org> |
selsocket: handle sopoll() errors correctly
Without this change, unmounting smbfs filesystems with an INVARIANTS kernel would panic after 10e64782ed59727e8c9fe4a5c7e17f497903c8eb.
Found by: markj Reviewed by: markj, jhb Obtained from: CheriBSD MFC after: 3 days Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D32492
|
#
0dc332bf |
| 05-Aug-2021 |
Ka Ho Ng <khng@FreeBSD.org> |
Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9).
fspacectl(2) is a system call to provide space management support to userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the deallocation. vn_deallocate(9) is a public KPI for kmods' use.
The purpose of proposing a new system call, a KPI and a VOP call is to allow bhyve or other hypervisor monitors to emulate the behavior of SCSI UNMAP/NVMe DEALLOCATE on a plain file.
fspacectl(2) takes cmd and flags parameters that specify the space management operation to be performed. Currently cmd has to be SPACECTL_DEALLOC, and flags has to be 0.
fo_fspacectl is added to fileops. VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation of VOP_DEALLOCATE(9) is provided.
Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28347
|
#
cae3f9dd |
| 23-Jul-2021 |
Mark Johnston <markj@FreeBSD.org> |
select: Define select_flags[] as const
MFC after: 1 week Sponsored by: The FreeBSD Foundation
|
#
e884512a |
| 10-Jun-2021 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Split kern_poll() into two counterparts.
The kern_poll_kfds() operates on clear kernel data, kfds points to an array in the kernel, while kern_poll() operates on user supplied pollfd. Move nfds check to kern_poll_maxfds().
No functional changes, it's for future use in the Linux emulation layer.
Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D30690 MFC after: 2 weeks
|
Revision tags: release/13.0.0 |
|
#
45e1f854 |
| 29-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
poll: use fget_unlocked or fget_only_user when feasible
This follows select by eliminating the use of the filedesc lock. This is a win for single-threaded processes and a mixed bag for others, as at small concurrency it is faster to take the lock instead of refing/unrefing each file descriptor.
Nonetheless, removal of shared lock usage is a step towards a mtx-protected fd table.
|
#
6affe1b7 |
| 29-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
select: employ fget_only_user
Since most select users are single-threaded, this avoids a lot of work in the common case.
For example, select of 16 fds (ops/s):
before: 2114536
after: 2991010
|
#
7a202823 |
| 23-Dec-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Expose eventfd in the native API/ABI using a new __specialfd syscall
eventfd is a Linux system call that produces special file descriptors for event notification. When porting Linux software, it is currently usually emulated by epoll-shim on top of kqueues. Unfortunately, kqueues are not passable between processes. And, as noted by the author of epoll-shim, even if they were, the library state would also have to be passed somehow. This came up when debugging strange HW video decode failures in Firefox. A native implementation would avoid these problems and help with porting Linux software.
Since we now already have an eventfd implementation in the kernel (for the Linuxulator), it's pretty easy to expose it natively, which is what this patch does.
Submitted by: greg@unrelenting.technology Reviewed by: markj (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26668
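For readers unfamiliar with eventfd, the descriptor holds a 64-bit counter: writes add to it and a read returns the accumulated value and resets it. The sketch below assumes the eventfd() wrapper from <sys/eventfd.h>, which is how the interface is normally consumed from C; the helper name is illustrative, and nothing here shows the underlying __specialfd syscall itself.

```c
#include <stdint.h>
#include <sys/types.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Writes 1 and then 2 to a fresh eventfd and returns the value read
 * back, which is the sum of the writes.  Returns 0 on any failure.
 */
uint64_t
demo_eventfd_sum(void)
{
	uint64_t v, out = 0;
	int fd;

	fd = eventfd(0, 0);		/* initial counter 0, no flags */
	if (fd < 0)
		return (0);

	v = 1;
	if (write(fd, &v, sizeof(v)) != (ssize_t)sizeof(v))
		goto done;
	v = 2;
	if (write(fd, &v, sizeof(v)) != (ssize_t)sizeof(v))
		goto done;
	if (read(fd, &out, sizeof(out)) != (ssize_t)sizeof(out))
		out = 0;
done:
	close(fd);
	return (out);
}
```

Because the result is an ordinary file descriptor, it can be passed between processes like any other, which is the limitation of the kqueue-based emulation that the commit message calls out.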
|
#
10e64782 |
| 02-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
select: make sure there are no wakeup attempts after selfdfree returns
Prior to the patch, a returning selfdfree could still be racing against doselwakeup, which has set sf_si = NULL and now locks stp to wake up the other thread.
A sufficiently unlucky pair can end up going all the way down to freeing select-related structures before the lock/wakeup/unlock finishes.
This started manifesting itself as crashes since select data started getting freed in r367714.
|
#
31b2ac4b |
| 16-Nov-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
select: replace reference counting with memory barriers in selfd
Refcounting was added to combat a race between selfdfree and doselwakeup, but it adds avoidable overhead.
selfdfree detects it can free the object by ->sf_si == NULL, thus we can ensure that the condition only holds after all accesses are completed.
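The ordering argument can be sketched with C11 atomics: the waking side finishes all of its accesses and only then publishes NULL with release semantics, while the freeing side checks the pointer with acquire semantics. The struct layout, field names other than sf_si, and both function names are assumptions for illustration; the kernel uses its own atomic primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stddef.h>

struct demo_selinfo {
	int	si_dummy;
};

struct demo_selfd {
	_Atomic(struct demo_selinfo *) sf_si;	/* NULL means "done" */
	int	sf_woken;			/* plain field set by the waker */
};

/*
 * Waker side: perform every access to the object first, then publish
 * completion.  The release store keeps those accesses from being
 * reordered after the NULL becomes visible.
 */
void
demo_wakeup(struct demo_selfd *sfp)
{
	sfp->sf_woken = 1;
	atomic_store_explicit(&sfp->sf_si, NULL, memory_order_release);
}

/*
 * Freeing side: an acquire load that observes NULL also observes
 * everything the waker did before storing it, so the object can go.
 */
int
demo_can_free(struct demo_selfd *sfp)
{
	return (atomic_load_explicit(&sfp->sf_si,
	    memory_order_acquire) == NULL);
}
```

The scheme is only sound if the NULL store really is the waker's last touch of the object, which is the condition the commit message states.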
|
#
ea33cca9 |
| 05-Nov-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
poll/select: change selfd_zone into a malloc type
On a sample box vmstat -z shows:
ITEM    SIZE  LIMIT     USED     FREE         REQ
64:       64,     0, 1043784, 4367538, 3698187229
selfd:    64,     0,    1520,   13726,  182729008
But at the same time:
vm.uma.selfd.keg.domain.1.pages: 121
vm.uma.selfd.keg.domain.0.pages: 121
Thus 242 pages got pulled even though the malloc zone would likely accommodate the load without using extra memory.
|