#
779f106a |
| 08-Jun-2017 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Listening sockets improvements.
o Separate fields of struct socket that belong to listening from fields that belong to normal dataflow, and unionize them. This shrinks the structure a bit. -
Listening sockets improvements.
o Separate fields of struct socket that belong to listening from fields that belong to normal dataflow, and unionize them. This shrinks the structure a bit. - Take out selinfo's from the socket buffers into the socket. The first reason is to support braindamaged scenario when a socket is added to kevent(2) and then listen(2) is cast on it. The second reason is that there is future plan to make socket buffers pluggable, so that for a dataflow socket a socket buffer can be changed, and in this case we also want to keep same selinfos through the lifetime of a socket. - Remove struct struct so_accf. Since now listening stuff no longer affects struct socket size, just move its fields into listening part of the union. - Provide sol_upcall field and enforce that so_upcall_set() may be called only on a dataflow socket, which has buffers, and for listening sockets provide solisten_upcall_set().
o Remove ACCEPT_LOCK() global. - Add a mutex to socket, to be used instead of socket buffer lock to lock fields of struct socket that don't belong to a socket buffer. - Allow to acquire two socket locks, but the first one must belong to a listening socket. - Make soref()/sorele() to use atomic(9). This allows in some situations to do soref() without owning socket lock. There is place for improvement here, it is possible to make sorele() also to lock optionally. - Most protocols aren't touched by this change, except UNIX local sockets. See below for more information.
o Reduce copy-and-paste in kernel modules that accept connections from listening sockets: provide function solisten_dequeue(), and use it in the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4), infiniband, rpc.
o UNIX local sockets. - Removal of ACCEPT_LOCK() global uncovered several races in the UNIX local sockets. Most races exist around spawning a new socket, when we are connecting to a local listening socket. To cover them, we need to hold locks on both PCBs when spawning a third one. This means holding them across sonewconn(). This creates a LOR between pcb locks and unp_list_lock. - To fix the new LOR, abandon the global unp_list_lock in favor of global unp_link_lock. Indeed, separating these two locks didn't provide us any extra parralelism in the UNIX sockets. - Now call into uipc_attach() may happen with unp_link_lock hold if, we are accepting, or without unp_link_lock in case if we are just creating a socket. - Another problem in UNIX sockets is that uipc_close() basicly did nothing for a listening socket. The vnode remained opened for connections. This is fixed by removing vnode in uipc_close(). Maybe the right way would be to do it for all sockets (not only listening), simply move the vnode teardown from uipc_detach() to uipc_close()?
Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D9770
show more ...
|
#
a773cead |
| 30-May-2017 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r318964 through r319164.
|
#
95b97895 |
| 27-May-2017 |
Conrad Meyer <cem@FreeBSD.org> |
procstat(1): Add TCP socket send/recv buffer size
Add TCP socket send and receive buffer size to procstat -f output.
Reviewed by: kib, markj Sponsored by: Dell EMC Isilon Differential Revision: htt
procstat(1): Add TCP socket send/recv buffer size
Add TCP socket send and receive buffer size to procstat -f output.
Reviewed by: kib, markj Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D10689
show more ...
|
#
d02c951f |
| 26-May-2017 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r318658 through r318963.
|
#
69921123 |
| 23-May-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_na
Commit the 64-bit inode project.
Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_namlen to 16-bits, and change the required alignment. Increase struct statfs f_mntfromname[] and f_mntonname[] array length MNAMELEN to 1024.
ABI breakage is mitigated by providing compatibility using versioned symbols, ingenious use of the existing padding in structures, and by employing other tricks. Unfortunately, not everything can be fixed, especially outside the base system. For instance, third-party APIs which pass struct stat around are broken in backward and forward incompatible ways.
Kinfo sysctl MIBs ABI is changed in backward-compatible way, but there is no general mechanism to handle other sysctl MIBS which return structures where the layout has changed. It was considered that the breakage is either in the management interfaces, where we usually allow ABI slip, or is not important.
Struct xvnode changed layout, no compat shims are provided.
For struct xtty, dev_t tty device member was reduced to uint32_t. It was decided that keeping ABI compat in this case is more useful than reporting 64-bit dev_t, for the sake of pstat.
Update note: strictly follow the instructions in UPDATING. Build and install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot, and only then install new world.
Credits: The 64-bit inode project, also known as ino64, started life many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick (mckusick) then picked up and updated the patch, and acted as a flag-waver. Feedback, suggestions, and discussions were carried by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial ports investigation followed by an exp-run by Antoine Brodin (antoine). Essential and all-embracing testing was done by Peter Holm (pho). The heavy lifting of coordinating all these efforts and bringing the project to completion were done by Konstantin Belousov (kib).
Sponsored by: The FreeBSD Foundation (emaste, kib) Differential revision: https://reviews.freebsd.org/D10439
show more ...
|
#
69415bc5 |
| 08-Jan-2017 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r311546 through r311683.
|
#
14da48cb |
| 07-Jan-2017 |
John Baldwin <jhb@FreeBSD.org> |
Set MORETOCOME for AIO write requests on a socket.
Add a MSG_MOREOTOCOME message flag. When this flag is set, sosend* set PRUS_MOREOTOCOME when invoking the protocol send method. The aio worker task
Set MORETOCOME for AIO write requests on a socket.
Add a MSG_MOREOTOCOME message flag. When this flag is set, sosend* set PRUS_MOREOTOCOME when invoking the protocol send method. The aio worker tasks for sending on a socket set this flag when there are additional write jobs waiting on the socket buffer.
Reviewed by: adrian MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D8955
show more ...
|
Revision tags: release/11.0.1, release/11.0.0 |
|
#
93badfa1 |
| 16-Sep-2016 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r305687 through r305890.
|
#
69a28758 |
| 15-Sep-2016 |
Ed Maste <emaste@FreeBSD.org> |
Renumber license clauses in sys/kern to avoid skipping #3
|
#
b1012d80 |
| 22-Jun-2016 |
John Baldwin <jhb@FreeBSD.org> |
Account for AIO socket operations in thread/process resource usage.
File and disk-backed I/O requests store counts of read/written disk blocks in each AIO job so that they can be charged to the thre
Account for AIO socket operations in thread/process resource usage.
File and disk-backed I/O requests store counts of read/written disk blocks in each AIO job so that they can be charged to the thread that completes an AIO request via aio_return() or aio_waitcomplete(). This change extends AIO jobs to store counts of received/sent messages and updates socket backends to set these counts accordingly. Note that the socket backends are careful to only charge a single messages for each AIO request even though a single request on a blocking socket might invoke sosend or soreceive multiple times. This is to mimic the resource accounting of synchronous read/write.
Adjust the UNIX socketpair AIO test to verify that the message resource usage counts update accordingly for aio_read and aio_write.
Approved by: re (hrs) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D6911
show more ...
|
#
fe0bdd1d |
| 15-Jun-2016 |
John Baldwin <jhb@FreeBSD.org> |
Move backend-specific fields of kaiocb into a union.
This reduces the size of kaiocb slightly. I've also added some generic fields that other backends can use in place of the BIO-specific fields.
C
Move backend-specific fields of kaiocb into a union.
This reduces the size of kaiocb slightly. I've also added some generic fields that other backends can use in place of the BIO-specific fields.
Change the socket and Chelsio DDP backends to use 'backend3' instead of abusing _aiocb_private.status directly. This confines the use of _aiocb_private to the AIO internals in vfs_aio.c.
Reviewed by: kib (earlier version) Approved by: re (gjb) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D6547
show more ...
|
#
778ce4f2 |
| 24-May-2016 |
John Baldwin <jhb@FreeBSD.org> |
Return the correct status when a partially completed request is cancelled.
After the previous changes to fix requests on blocking sockets to complete across multiple operations, an edge case exists
Return the correct status when a partially completed request is cancelled.
After the previous changes to fix requests on blocking sockets to complete across multiple operations, an edge case exists where a request can be cancelled after it has partially completed. POSIX doesn't appear to dictate exactly how to handle this case, but in general I feel that aio_cancel() should arrange to cancel any request it can, but that any partially completed requests should return a partial completion rather than ECANCELED. To that end, fix the socket AIO cancellation routine to return a short read/write if a partially completed request is cancelled rather than ECANCELED.
Sponsored by: Chelsio Communications
show more ...
|
#
1717b68a |
| 24-May-2016 |
John Baldwin <jhb@FreeBSD.org> |
Don't prematurely return short completions on blocking sockets.
Always requeue an AIO job at the head of the socket buffer's queue if sosend() or soreceive() returns EWOULDBLOCK on a blocking socket
Don't prematurely return short completions on blocking sockets.
Always requeue an AIO job at the head of the socket buffer's queue if sosend() or soreceive() returns EWOULDBLOCK on a blocking socket. Previously, requests were only requeued if they returned EWOULDBLOCK and completed no data. Now after a partial completion on a blocking socket the request is queued and the remaining request is retried when the socket is ready. This allows writes larger than the currently available space on a blocking socket to fully complete. Reads on a blocking socket that satifsy the low watermark can still return a short read (same as read()).
In order to track previously completed data, the internal 'status' field of the AIO job is used to store the amount of previously computed data.
Non-blocking sockets continue to return short completions for both reads and writes.
Add a test for a "large" AIO write on a blocking socket that writes twice the socket buffer size to a UNIX domain socket.
Sponsored by: Chelsio Communications
show more ...
|
#
f0ec1740 |
| 20-May-2016 |
John Baldwin <jhb@FreeBSD.org> |
Consistently set status to -1 when completing an AIO request with an error.
Sponsored by: Chelsio Communications
|
#
5163d2ec |
| 29-Apr-2016 |
John Baldwin <jhb@FreeBSD.org> |
Expose soaio_enqueue().
This can be used by protocol-specific AIO handlers to queue work to the socket AIO daemon pool.
Sponsored by: Chelsio Communications
|
#
8722384b |
| 29-Apr-2016 |
John Baldwin <jhb@FreeBSD.org> |
Introduce a new protocol hook pru_aio_queue.
This allows a protocol to claim individual AIO requests instead of using the default socket AIO handling.
Sponsored by: Chelsio Communications
|
Revision tags: release/10.3.0 |
|
#
82aa34e6 |
| 04-Mar-2016 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r296007 through r296368.
|
#
52259a98 |
| 02-Mar-2016 |
Glen Barber <gjb@FreeBSD.org> |
MFH
Sponsored by: The FreeBSD Foundation
|
#
f3215338 |
| 01-Mar-2016 |
John Baldwin <jhb@FreeBSD.org> |
Refactor the AIO subsystem to permit file-type-specific handling and improve cancellation robustness.
Introduce a new file operation, fo_aio_queue, which is responsible for queueing and completing a
Refactor the AIO subsystem to permit file-type-specific handling and improve cancellation robustness.
Introduce a new file operation, fo_aio_queue, which is responsible for queueing and completing an asynchronous I/O request for a given file. The AIO subystem now exports library of routines to manipulate AIO requests as well as the ability to run a handler function in the "default" pool of AIO daemons to service a request.
A default implementation for file types which do not include an fo_aio_queue method queues requests to the "default" pool invoking the fo_read or fo_write methods as before.
The AIO subsystem permits file types to install a private "cancel" routine when a request is queued to permit safe dequeueing and cleanup of cancelled requests.
Sockets now use their own pool of AIO daemons and service per-socket requests in FIFO order. Socket requests will not block indefinitely permitting timely cancellation of all requests.
Due to the now-tight coupling of the AIO subsystem with file types, the AIO subsystem is now a standard part of all kernels. The VFS_AIO kernel option and aio.ko module are gone.
Many file types may block indefinitely in their fo_read or fo_write callbacks resulting in a hung AIO daemon. This can result in hung user processes (when processes attempt to cancel all outstanding requests during exit) or a hung system. To protect against this, AIO requests are only permitted for known "safe" files by default. AIO requests for all file types can be enabled by setting the new vfs.aio.enable_usafe sysctl to a non-zero value. The AIO tests have been updated to skip operations on unsafe file types if the sysctl is zero.
Currently, AIO requests on sockets and raw disks are considered safe and are enabled by default. aio_mlock() is also enabled by default.
Reviewed by: cem, jilles Discussed with: kib (earlier version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5289
show more ...
|
Revision tags: release/10.2.0 |
|
#
d899be7d |
| 19-Jan-2015 |
Glen Barber <gjb@FreeBSD.org> |
Reintegrate head: r274132-r277384
Sponsored by: The FreeBSD Foundation
|
#
8f0ea33f |
| 13-Jan-2015 |
Glen Barber <gjb@FreeBSD.org> |
Reintegrate head revisions r273096-r277147
Sponsored by: The FreeBSD Foundation
|
#
4d56c133 |
| 21-Nov-2014 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Sync to HEAD@r274766
|
#
9268022b |
| 19-Nov-2014 |
Simon J. Gerraty <sjg@FreeBSD.org> |
Merge from head@274682
|
#
cfa6009e |
| 12-Nov-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
In preparation of merging projects/sendfile, transform bare access to sb_cc member of struct sockbuf to a couple of inline functions:
sbavail() and sbused()
Right now they are equal, but once notio
In preparation of merging projects/sendfile, transform bare access to sb_cc member of struct sockbuf to a couple of inline functions:
sbavail() and sbused()
Right now they are equal, but once notion of "not ready socket buffer data", will be checked in, they are going to be different.
Sponsored by: Netflix Sponsored by: Nginx, Inc.
show more ...
|
Revision tags: release/10.1.0 |
|
#
1ce4b357 |
| 04-Oct-2014 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Sync to HEAD@r272516.
|