History log of /freebsd/sys/kern/sys_socket.c (Results 51 – 75 of 275)
# 779f106a 08-Jun-2017 Gleb Smirnoff <glebius@FreeBSD.org>

Listening sockets improvements.

o Separate fields of struct socket that belong to listening from
fields that belong to normal dataflow, and unionize them. This
shrinks the structure a bit.
- Take out selinfo's from the socket buffers into the socket. The
first reason is to support braindamaged scenario when a socket is
added to kevent(2) and then listen(2) is cast on it. The second
reason is that there is future plan to make socket buffers pluggable,
so that for a dataflow socket a socket buffer can be changed, and
in this case we also want to keep same selinfos through the lifetime
of a socket.
- Remove struct so_accf. Since now listening stuff no longer
affects struct socket size, just move its fields into listening part
of the union.
- Provide sol_upcall field and enforce that so_upcall_set() may be called
only on a dataflow socket, which has buffers, and for listening sockets
provide solisten_upcall_set().

o Remove ACCEPT_LOCK() global.
- Add a mutex to socket, to be used instead of socket buffer lock to lock
fields of struct socket that don't belong to a socket buffer.
- Allow to acquire two socket locks, but the first one must belong to a
listening socket.
- Make soref()/sorele() use atomic(9). This allows soref() in some
situations without owning the socket lock. There is room for improvement
here: sorele() could also be made to lock optionally.
- Most protocols aren't touched by this change, except UNIX local sockets.
See below for more information.

o Reduce copy-and-paste in kernel modules that accept connections from
listening sockets: provide function solisten_dequeue(), and use it in
the following modules: ctl(4), iscsi(4), ng_btsocket(4), ng_ksocket(4),
infiniband, rpc.

o UNIX local sockets.
- Removal of ACCEPT_LOCK() global uncovered several races in the UNIX
local sockets. Most races exist around spawning a new socket, when we
are connecting to a local listening socket. To cover them, we need to
hold locks on both PCBs when spawning a third one. This means holding
them across sonewconn(). This creates a LOR between pcb locks and
unp_list_lock.
- To fix the new LOR, abandon the global unp_list_lock in favor of global
unp_link_lock. Indeed, separating these two locks didn't provide us any
extra parallelism in the UNIX sockets.
- A call into uipc_attach() may now happen with unp_link_lock held, if we
are accepting, or without unp_link_lock, if we are just creating
a socket.
- Another problem in UNIX sockets is that uipc_close() basically did nothing
for a listening socket. The vnode remained opened for connections. This
is fixed by removing vnode in uipc_close(). Maybe the right way would be
to do it for all sockets (not only listening), simply move the vnode
teardown from uipc_detach() to uipc_close()?

Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D9770
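
The soref()/sorele() item in this commit can be illustrated with a small userspace sketch. This is not the kernel code: `struct obj`, `obj_ref()` and `obj_rele()` are hypothetical names, and C11 `<stdatomic.h>` stands in here for the kernel's atomic(9) primitives. The point is only that taking a reference becomes a single atomic increment that needs no lock on the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a reference-counted object such as a socket. */
struct obj {
    atomic_uint refs;
};

/* Take a reference without holding any lock on the object. */
static void
obj_ref(struct obj *o)
{
    atomic_fetch_add_explicit(&o->refs, 1, memory_order_relaxed);
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool
obj_rele(struct obj *o)
{
    unsigned int old;

    old = atomic_fetch_sub_explicit(&o->refs, 1, memory_order_acq_rel);
    assert(old > 0);    /* releasing an unreferenced object is a bug */
    return (old == 1);
}
```

A caller that sees `obj_rele()` return true would be the one responsible for teardown; whether that teardown needs a lock is the "lock optionally" question the commit message leaves open.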


# a773cead 30-May-2017 Dimitry Andric <dim@FreeBSD.org>

Merge ^/head r318964 through r319164.


# 95b97895 27-May-2017 Conrad Meyer <cem@FreeBSD.org>

procstat(1): Add TCP socket send/recv buffer size

Add TCP socket send and receive buffer size to procstat -f output.

Reviewed by: kib, markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D10689


# d02c951f 26-May-2017 Dimitry Andric <dim@FreeBSD.org>

Merge ^/head r318658 through r318963.


# 69921123 23-May-2017 Konstantin Belousov <kib@FreeBSD.org>

Commit the 64-bit inode project.

Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment. Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks. Unfortunately, not everything can be
fixed, especially outside the base system. For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

The kinfo sysctl MIBs ABI is changed in a backward-compatible way, but
there is no general mechanism to handle other sysctl MIBs which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING. Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver. Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion was done by Konstantin Belousov (kib).

Sponsored by: The FreeBSD Foundation (emaste, kib)
Differential revision: https://reviews.freebsd.org/D10439


# 69415bc5 08-Jan-2017 Dimitry Andric <dim@FreeBSD.org>

Merge ^/head r311546 through r311683.


# 14da48cb 07-Jan-2017 John Baldwin <jhb@FreeBSD.org>

Set MORETOCOME for AIO write requests on a socket.

Add a MSG_MORETOCOME message flag. When this flag is set, sosend*
set PRUS_MORETOCOME when invoking the protocol send method. The aio
worker tasks for sending on a socket set this flag when there are
additional write jobs waiting on the socket buffer.

Reviewed by: adrian
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D8955


Revision tags: release/11.0.1, release/11.0.0
# 93badfa1 16-Sep-2016 Dimitry Andric <dim@FreeBSD.org>

Merge ^/head r305687 through r305890.


# 69a28758 15-Sep-2016 Ed Maste <emaste@FreeBSD.org>

Renumber license clauses in sys/kern to avoid skipping #3


# b1012d80 22-Jun-2016 John Baldwin <jhb@FreeBSD.org>

Account for AIO socket operations in thread/process resource usage.

File and disk-backed I/O requests store counts of read/written disk
blocks in each AIO job so that they can be charged to the thread that
completes an AIO request via aio_return() or aio_waitcomplete(). This
change extends AIO jobs to store counts of received/sent messages and
updates socket backends to set these counts accordingly. Note that
the socket backends are careful to only charge a single message for
each AIO request even though a single request on a blocking socket might
invoke sosend or soreceive multiple times. This is to mimic the
resource accounting of synchronous read/write.

Adjust the UNIX socketpair AIO test to verify that the message resource
usage counts update accordingly for aio_read and aio_write.

Approved by: re (hrs)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D6911


# fe0bdd1d 15-Jun-2016 John Baldwin <jhb@FreeBSD.org>

Move backend-specific fields of kaiocb into a union.

This reduces the size of kaiocb slightly. I've also added some generic
fields that other backends can use in place of the BIO-specific fields.

Change the socket and Chelsio DDP backends to use 'backend3' instead of
abusing _aiocb_private.status directly. This confines the use of
_aiocb_private to the AIO internals in vfs_aio.c.

Reviewed by: kib (earlier version)
Approved by: re (gjb)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D6547
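
The size saving described in this commit comes from overlapping fields that are never in use at the same time. The sketch below shows the idea with made-up types: `struct bio_state`, `struct generic_state`, `struct job_flat` and `struct job_union` are hypothetical and do not reproduce the real kaiocb layout.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for a disk-backed request. */
struct bio_state {
    void *bp;
    long  nbytes;
    int   error;
};

/* Hypothetical generic slots that any other backend may use. */
struct generic_state {
    void *backend1;
    void *backend2;
    long  backend3;
    int   backend4;
};

/* Every backend's fields laid out side by side. */
struct job_flat {
    int flags;
    struct bio_state bio;
    struct generic_state gen;
};

/* Only one backend owns a job at a time, so the fields can overlap. */
struct job_union {
    int flags;
    union {
        struct bio_state bio;
        struct generic_state gen;
    } u;
};

/* The union costs only as much as its largest member. */
static_assert(sizeof(struct job_union) < sizeof(struct job_flat),
    "overlapping backend state should shrink the job");
```

The trade-off is that nothing in the type system records which member is live, so each backend must touch only its own view of the union.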


# 778ce4f2 24-May-2016 John Baldwin <jhb@FreeBSD.org>

Return the correct status when a partially completed request is cancelled.

After the previous changes to fix requests on blocking sockets to complete
across multiple operations, an edge case exists where a request can be
cancelled after it has partially completed. POSIX doesn't appear to
dictate exactly how to handle this case, but in general I feel that
aio_cancel() should arrange to cancel any request it can, but that any
partially completed requests should return a partial completion rather
than ECANCELED. To that end, fix the socket AIO cancellation routine to
return a short read/write if a partially completed request is cancelled
rather than ECANCELED.

Sponsored by: Chelsio Communications


# 1717b68a 24-May-2016 John Baldwin <jhb@FreeBSD.org>

Don't prematurely return short completions on blocking sockets.

Always requeue an AIO job at the head of the socket buffer's queue if
sosend() or soreceive() returns EWOULDBLOCK on a blocking socket.
Previously, requests were only requeued if they returned EWOULDBLOCK
and completed no data. Now after a partial completion on a blocking
socket the request is queued and the remaining request is retried when
the socket is ready. This allows writes larger than the currently
available space on a blocking socket to fully complete. Reads on a
blocking socket that satisfy the low watermark can still return a short
read (same as read()).

In order to track previously completed data, the internal 'status'
field of the AIO job is used to store the amount of previously
completed data.

Non-blocking sockets continue to return short completions for both
reads and writes.

Add a test for a "large" AIO write on a blocking socket that writes
twice the socket buffer size to a UNIX domain socket.

Sponsored by: Chelsio Communications
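
One reading of the requeue rule in this commit, written as a decision function. The names `enum aio_action` and `next_action()` are hypothetical, and the non-blocking branch (requeue only when no data has moved) is an assumption drawn from the "previously" sentence in the message rather than from the code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical actions the AIO worker can take after one attempt. */
enum aio_action {
    AIO_COMPLETE,    /* finish the job with what has been done so far */
    AIO_REQUEUE      /* put the job back at the head of the queue */
};

/*
 * error: result of the last send/receive attempt (0 or an errno value).
 * done:  total bytes transferred for this job so far.
 */
static enum aio_action
next_action(bool nonblocking, int error, size_t done)
{
    assert(error >= 0);    /* errno-style codes are never negative */
    if (error != EWOULDBLOCK)
        return (AIO_COMPLETE);
    /* Blocking sockets always retry until the request is satisfied. */
    if (!nonblocking)
        return (AIO_REQUEUE);
    /* Non-blocking sockets retry only if nothing has moved yet. */
    return (done == 0 ? AIO_REQUEUE : AIO_COMPLETE);
}
```

For example, a large write on a blocking socket that fills the buffer halfway yields `AIO_REQUEUE`, while the same situation on a non-blocking socket yields `AIO_COMPLETE` with a short count.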


# f0ec1740 20-May-2016 John Baldwin <jhb@FreeBSD.org>

Consistently set status to -1 when completing an AIO request with an error.

Sponsored by: Chelsio Communications


# 5163d2ec 29-Apr-2016 John Baldwin <jhb@FreeBSD.org>

Expose soaio_enqueue().

This can be used by protocol-specific AIO handlers to queue work to the
socket AIO daemon pool.

Sponsored by: Chelsio Communications


# 8722384b 29-Apr-2016 John Baldwin <jhb@FreeBSD.org>

Introduce a new protocol hook pru_aio_queue.

This allows a protocol to claim individual AIO requests instead of using
the default socket AIO handling.

Sponsored by: Chelsio Communications


Revision tags: release/10.3.0
# 82aa34e6 04-Mar-2016 Dimitry Andric <dim@FreeBSD.org>

Merge ^/head r296007 through r296368.


# 52259a98 02-Mar-2016 Glen Barber <gjb@FreeBSD.org>

MFH

Sponsored by: The FreeBSD Foundation


# f3215338 01-Mar-2016 John Baldwin <jhb@FreeBSD.org>

Refactor the AIO subsystem to permit file-type-specific handling and
improve cancellation robustness.

Introduce a new file operation, fo_aio_queue, which is responsible for
queueing and completing an asynchronous I/O request for a given file.
The AIO subsystem now exports a library of routines to manipulate AIO
requests as well as the ability to run a handler function in the
"default" pool of AIO daemons to service a request.

A default implementation for file types which do not include an
fo_aio_queue method queues requests to the "default" pool invoking the
fo_read or fo_write methods as before.

The AIO subsystem permits file types to install a private "cancel"
routine when a request is queued to permit safe dequeueing and cleanup
of cancelled requests.

Sockets now use their own pool of AIO daemons and service per-socket
requests in FIFO order. Socket requests will not block indefinitely
permitting timely cancellation of all requests.

Due to the now-tight coupling of the AIO subsystem with file types,
the AIO subsystem is now a standard part of all kernels. The VFS_AIO
kernel option and aio.ko module are gone.

Many file types may block indefinitely in their fo_read or fo_write
callbacks resulting in a hung AIO daemon. This can result in hung
user processes (when processes attempt to cancel all outstanding
requests during exit) or a hung system. To protect against this, AIO
requests are only permitted for known "safe" files by default. AIO
requests for all file types can be enabled by setting the new
vfs.aio.enable_unsafe sysctl to a non-zero value. The AIO tests have
been updated to skip operations on unsafe file types if the sysctl is
zero.

Currently, AIO requests on sockets and raw disks are considered safe
and are enabled by default. aio_mlock() is also enabled by default.

Reviewed by: cem, jilles
Discussed with: kib (earlier version)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D5289


Revision tags: release/10.2.0
# d899be7d 19-Jan-2015 Glen Barber <gjb@FreeBSD.org>

Reintegrate head: r274132-r277384

Sponsored by: The FreeBSD Foundation


# 8f0ea33f 13-Jan-2015 Glen Barber <gjb@FreeBSD.org>

Reintegrate head revisions r273096-r277147

Sponsored by: The FreeBSD Foundation


# 4d56c133 21-Nov-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Sync to HEAD@r274766


# 9268022b 19-Nov-2014 Simon J. Gerraty <sjg@FreeBSD.org>

Merge from head@274682


# cfa6009e 12-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

In preparation of merging projects/sendfile, transform bare access to
sb_cc member of struct sockbuf to a couple of inline functions:

sbavail() and sbused()

Right now they are equal, but once the notion of "not ready socket buffer
data" is checked in, they are going to be different.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.
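
The accessor pattern this commit introduces can be sketched as follows. The struct and function names carry a `_sketch` suffix because they are hypothetical stand-ins, not the kernel definitions; only the `sb_cc` field name and the idea of two accessors come from the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down socket buffer. */
struct sockbuf_sketch {
    unsigned int sb_cc;    /* bytes currently held in the buffer */
};

/* Bytes a reader could consume right now. */
static inline unsigned int
sbavail_sketch(const struct sockbuf_sketch *sb)
{
    assert(sb != NULL);
    return (sb->sb_cc);
}

/* Bytes that count against the buffer's space limit. */
static inline unsigned int
sbused_sketch(const struct sockbuf_sketch *sb)
{
    assert(sb != NULL);
    return (sb->sb_cc);
}
```

Both return the same field at this stage, as the message says. Routing callers through two names lets a later change make them diverge without touching every call site again.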


Revision tags: release/10.1.0
# 1ce4b357 04-Oct-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Sync to HEAD@r272516.

