History log of /linux/io_uring/net.c (Results 126 – 150 of 539)
Revision  Date  Author  Comments
# 0e6ebfd1 09-Apr-2024 Ingo Molnar <mingo@kernel.org>

Merge tag 'v6.9-rc3' into x86/cpu, to pick up fixes

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 9b4e5285 03-Apr-2024 Ingo Molnar <mingo@kernel.org>

Merge tag 'v6.9-rc2' into perf/core, to pick up dependent commits

Pick up fixes that followup patches are going to depend on.

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 6a2bcf92 03-Apr-2024 Ingo Molnar <mingo@kernel.org>

Merge tag 'v6.9-rc2' into x86/percpu, to pick up fixes and resolve conflict

Conflicts:
arch/x86/Kconfig

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# f4566a1e 25-Mar-2024 Ingo Molnar <mingo@kernel.org>

Merge tag 'v6.9-rc1' into sched/core, to pick up fixes and to refresh the branch

Signed-off-by: Ingo Molnar <mingo@kernel.org>


# 9961a785 13-May-2024 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux

Pull io_uring updates from Jens Axboe:

- Greatly improve send zerocopy performance, by enabling coalescing of
sent buffers.

MSG_ZEROCOPY already does this with send(2) and sendmsg(2), but the
io_uring side did not. In local testing, the crossover point for send
zerocopy being faster is now around 3000 byte packets, and it
performs better than the sync syscall variants as well.

This feature relies on a shared branch with net-next, which was
pulled into both branches.

- Unification of how async preparation is done across opcodes.

Previously, opcodes that required extra memory for async retry would
allocate that as needed, using on-stack state until that was the
case. If async retry was needed, the on-stack state was adjusted
appropriately for a retry and then copied to the allocated memory.

This led to some fragile and ugly code, particularly for read/write
handling, and made storage retries more difficult than they needed to
be. Allocate the memory upfront, as it's cheap from our pools, and
use that state consistently both initially and also from the retry
side.

- Move away from using remap_pfn_range() for mapping the rings.

This is really not the right interface to use and can cause lifetime
issues or leaks. Additionally, it means the ring sq/cq arrays need to
be physically contiguous, which can cause problems in production with
larger rings when services are restarted, as memory can be very
fragmented at that point.

Move to using vm_insert_page(s) for the ring sq/cq arrays, and apply
the same treatment to mapped ring provided buffers. This also helps
unify the code we have dealing with allocating and mapping memory.

Hard to see in the diffstat as we're adding a few features as well,
but this also kills about 400 lines of code from the codebase.

- Add support for bundles for send/recv.

When used with provided buffers, bundles support sending or receiving
more than one buffer at a time, improving the efficiency by only
needing to call into the networking stack once for multiple sends or
receives.

- Tweaks for our accept operations, supporting both a DONTWAIT flag for
skipping poll arm and retry if we can, and a POLLFIRST flag that the
application can use to skip the initial accept attempt and rely
purely on poll for triggering the operation. Both of these have
identical flags on the receive side already.

- Make the task_work ctx locking unconditional.

We had various code paths here that would do a mix of lock/trylock
and set the task_work state to whether or not it was locked. All of
that goes away, we lock it unconditionally and get rid of the state
flag indicating whether it's locked or not.

The state struct still exists as an empty type, and can go away in the
future.

- Add support for specifying NOP completion values, allowing it to be
used for error handling testing.

- Use set/test bit for io-wq worker flags. Not strictly needed, but
also doesn't hurt and helps silence a KCSAN warning.

- Cleanups for io-wq locking and work assignments, closing a tiny race
where cancelations would not be able to find the work item reliably.

- Misc fixes, cleanups, and improvements

* tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux: (97 commits)
io_uring: support to inject result for NOP
io_uring: fail NOP if non-zero op flags is passed in
io_uring/net: add IORING_ACCEPT_POLL_FIRST flag
io_uring/net: add IORING_ACCEPT_DONTWAIT flag
io_uring/filetable: don't unnecessarily clear/reset bitmap
io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring
io_uring: Require zeroed sqe->len on provided-buffers send
io_uring/notif: disable LAZY_WAKE for linked notifs
io_uring/net: fix sendzc lazy wake polling
io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it
io_uring/rw: reinstate thread check for retries
io_uring/notif: implement notification stacking
io_uring/notif: simplify io_notif_flush()
net: add callback for setting a ubuf_info to skb
net: extend ubuf_info callback to ops structure
io_uring/net: support bundles for recv
io_uring/net: support bundles for send
io_uring/kbuf: add helpers for getting/peeking multiple buffers
io_uring/net: add provided buffer support for IORING_OP_SEND
...


# d3da8e98 08-May-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: add IORING_ACCEPT_POLL_FIRST flag

Similarly to how polling first is supported for receive, it makes sense
to provide the same for accept. An accept operation does a lot of
expensive setup, like allocating an fd, a socket/inode, etc. If no
connection request is already pending, this is wasted and will just be
cleaned up and freed, only to retry via the usual poll trigger.

Add IORING_ACCEPT_POLL_FIRST, which tells accept to only initiate the
accept request if poll says we have something to accept.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
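
For reference, a minimal userspace sketch of what this could look like with liburing; it assumes the io_uring-specific accept flags are passed in the SQE's ioprio field, the same spot IORING_ACCEPT_MULTISHOT uses, which is an assumption about the uAPI rather than something stated in this log.

    #include <liburing.h>

    /* Sketch: arm an accept that skips the up-front attempt and relies on
     * poll to trigger the actual accept (assumes liburing and that the
     * io_uring accept flags travel in sqe->ioprio). */
    static void queue_accept_poll_first(struct io_uring *ring, int listen_fd)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        /* No peer address requested; accept4(2)-style flags stay 0. */
        io_uring_prep_accept(sqe, listen_fd, NULL, NULL, 0);
        sqe->ioprio |= IORING_ACCEPT_POLL_FIRST;
    }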


# 7dcc758c 07-May-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: add IORING_ACCEPT_DONTWAIT flag

This allows the caller to perform a non-blocking attempt, similarly to
how recvmsg has MSG_DONTWAIT. If set, and we get -EAGAIN on a connection
attempt, propagate the result to userspace rather than arm poll and
wait for a retry.

Suggested-by: Norman Maurer <norman_maurer@apple.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
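
And a rough sketch of the non-blocking variant; again this assumes liburing and that the flag sits in sqe->ioprio next to the other io_uring accept flags, so treat it as illustration rather than the definitive uAPI.

    #include <liburing.h>

    /* Sketch: one non-blocking accept attempt; if no connection is pending,
     * the CQE carries -EAGAIN instead of io_uring arming poll and retrying. */
    static void queue_accept_dontwait(struct io_uring *ring, int listen_fd)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        io_uring_prep_accept(sqe, listen_fd, NULL, NULL, 0);
        sqe->ioprio |= IORING_ACCEPT_DONTWAIT;  /* assumed flag placement */
    }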


# 79996b45 01-May-2024 Gabriel Krisman Bertazi <krisman@suse.de>

io_uring: Require zeroed sqe->len on provided-buffers send

When sending from a provided buffer, we set sr->len to be the smallest
between the actual buffer size and sqe->len. But, now that we
disconnect the buffer from the submission request, we can get in a
situation where the buffers and requests mismatch, and only part of a
buffer gets sent. Assume:

* buf[1]->len = 128; buf[2]->len = 256
* sqe[1]->len = 128; sqe[2]->len = 256

If sqe[1] runs first, it picks buf[1] and it's all good. But, if sqe[2]
runs first, it picks buf[1], sqe[1] then picks buf[2], and the last half
of buf[2] is never sent.

While arguably the use-case of different-length sends is questionable,
it has already raised confusion with potential users of this
feature. Let's make the interface less tricky by forcing the length to
only come from the buffer ring entry itself.

Fixes: ac5f71a3d9d7 ("io_uring/net: add provided buffer support for IORING_OP_SEND")
Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# ef42b85a 30-Apr-2024 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: fix sendzc lazy wake polling

SEND[MSG]_ZC produces multiple CQEs via notifications, which LAZY_WAKE
doesn't handle, so disable LAZY_WAKE for sendzc polling. It should be
fine, as sends are not likely to be polled in the first place.

Fixes: 6ce4a93dbb5b ("io_uring/poll: use IOU_F_TWQ_LAZY_WAKE for wakeups")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5b360fb352d91e3aec751d75c87dfb4753a084ee.1714488419.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 2f9c9515 06-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: support bundles for recv

If IORING_OP_RECV is used with provided buffers, the caller may also set
IORING_RECVSEND_BUNDLE to turn it into a multi-buffer recv. This grabs
buffers available and receives into them, posting a single completion for
all of it.

This can be used with multishot receive as well, or without it.

Now that both send and receive support bundles, add a feature flag for
it as well. If IORING_FEAT_RECVSEND_BUNDLE is set after registering the
ring, then the kernel supports bundles for recv and send.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
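
A rough userspace sketch of a bundled receive, assuming liburing, an already-populated provided buffer ring, and that IORING_RECVSEND_BUNDLE is passed in sqe->ioprio like the other IORING_RECVSEND_* flags; per the commit, support can be probed via IORING_FEAT_RECVSEND_BUNDLE after ring setup.

    #include <liburing.h>

    /* Sketch: receive into as many provided buffers as incoming data fills,
     * with one CQE covering all of them (assumes a registered buffer group
     * and the flag placement noted above). */
    static void queue_bundle_recv(struct io_uring *ring, int sockfd,
                                  unsigned short bgid)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        io_uring_prep_recv(sqe, sockfd, NULL, 0, 0);
        sqe->flags |= IOSQE_BUFFER_SELECT;
        sqe->buf_group = bgid;
        sqe->ioprio |= IORING_RECVSEND_BUNDLE;
    }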


# a05d1f62 05-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: support bundles for send

If IORING_OP_SEND is used with provided buffers, the caller may also
set IORING_RECVSEND_BUNDLE to turn it into a multi-buffer send. The idea
is that an application can fill outgoing buffers in a provided buffer
group, and then arm a single send that will service them all. Once
there are no more buffers to send, or if the requested length has
been sent, the request posts a single completion for all the buffers.

This only enables it for IORING_OP_SEND, IORING_OP_SENDMSG is coming
in a separate patch. However, this patch does do a lot of the prep
work that makes wiring up the sendmsg variant pretty trivial. They
share the prep side.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
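
For the send side the shape is the same from userspace: fill outgoing data into the buffer ring, then arm one send. The sketch below assumes liburing (io_uring_buf_ring_add()/io_uring_buf_ring_advance() for the ring, the bundle flag in sqe->ioprio) and is only meant to show the flow.

    #include <liburing.h>

    /* Sketch: push two pending payloads into a provided buffer ring, then
     * arm a single bundled send that services both and posts one CQE
     * (assumes a buffer ring 'br' created with io_uring_setup_buf_ring()). */
    static void queue_bundle_send(struct io_uring *ring,
                                  struct io_uring_buf_ring *br,
                                  unsigned ring_entries, unsigned short bgid,
                                  int sockfd,
                                  void *bufA, unsigned lenA,
                                  void *bufB, unsigned lenB)
    {
        int mask = io_uring_buf_ring_mask(ring_entries);
        struct io_uring_sqe *sqe;

        /* Queue the outgoing data as provided buffers, in send order. */
        io_uring_buf_ring_add(br, bufA, lenA, 0, mask, 0);
        io_uring_buf_ring_add(br, bufB, lenB, 1, mask, 1);
        io_uring_buf_ring_advance(br, 2);

        sqe = io_uring_get_sqe(ring);
        io_uring_prep_send(sqe, sockfd, NULL, 0, 0);
        sqe->flags |= IOSQE_BUFFER_SELECT;
        sqe->buf_group = bgid;
        sqe->ioprio |= IORING_RECVSEND_BUNDLE;
    }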


Revision tags: v6.8-rc6
# ac5f71a3 19-Feb-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: add provided buffer support for IORING_OP_SEND

It's pretty trivial to wire up provided buffer support for the send
side, just like how it's done the receive side. This enables setting up
a buffer ring that an application can use to push pending sends to,
and then have a send pick a buffer from that ring.

One of the challenges with async IO and networking sends is that you
can get into reordering conditions if you have more than one inflight
at the same time. Consider the following scenario where everything is
fine:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, completes successfully, posts CQE
5) sendB is issued, completes successfully, posts CQE

All is fine. Requests are always issued in-order, and both complete
inline as most sends do.

However, if we're flooding socket1 with sends, the following could
also result from the same sequence:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, completes successfully, posts CQE
7) sendA is retried, completes successfully, posts CQE

Now we've sent sendB before sendA, which can make things unhappy. If
both sendA and sendB had been using provided buffers, then it would look
as follows instead:

1) App queues dataA for sendA, queues sendA for socket1
2) App queues dataB for sendB, queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, picks first buffer (dataA), completes successfully,
posts CQE (which says "I sent dataA")
7) sendA is retried, picks first buffer (dataB), completes successfully,
posts CQE (which says "I sent dataB")

Now we've sent the data in order, and everybody is happy.

It's worth noting that this also opens the door for supporting multishot
sends, as provided buffers would be a prerequisite for that. Those can
trigger either when new buffers are added to the outgoing ring, or (if
stalled due to lack of space) when space frees up in the socket.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
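
As a sketch of the usage pattern described above (liburing assumed; io_uring_setup_buf_ring() and the group id are illustrative choices, not mandated by this commit):

    #include <liburing.h>

    #define SEND_BGID  0   /* arbitrary buffer group id for illustration */

    /* Sketch: create a buffer ring for outgoing data. The application later
     * pushes each payload into it (io_uring_buf_ring_add() + advance) and
     * queues sends with IOSQE_BUFFER_SELECT, so whichever send runs next
     * always picks the oldest queued payload and data order is preserved. */
    static struct io_uring_buf_ring *setup_send_ring(struct io_uring *ring,
                                                     unsigned entries)
    {
        int err;

        return io_uring_setup_buf_ring(ring, entries, SEND_BGID, 0, &err);
    }

    static void queue_send(struct io_uring *ring, int sockfd)
    {
        struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

        /* Data address and length both come from the selected buffer. */
        io_uring_prep_send(sqe, sockfd, NULL, 0, 0);
        sqe->flags |= IOSQE_BUFFER_SELECT;
        sqe->buf_group = SEND_BGID;
    }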


# 3e747ded 25-Feb-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: add generic multishot retry helper

This is just moving io_recv_prep_retry() higher up so it can get used
for sends as well, and renaming it to be generically useful for both
sends and receives.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d285da7d 08-Apr-2024 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: set MSG_ZEROCOPY for sendzc in advance

We can set MSG_ZEROCOPY at the preparation step, do it so we don't have
to care about it later in the issue callback.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c2c22aaa577624977f045979a6db2b9fb2e5648c.1712534031.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6b7f864b 08-Apr-2024 Pavel Begunkov <asml.silence@gmail.com>

io_uring/net: get rid of io_notif_complete_tw_ext

io_notif_complete_tw_ext() can be removed and combined with
io_notif_complete_tw to make it simpler without sacrificing
anything.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/025a124a5e20e2474a57e2f04f16c422eb83063c.1712534031.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 414d0f45 20-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/alloc_cache: switch to array based caching

Currently lists are being used to manage this, but best practice is
usually to have these in an array instead as that is cheaper to manage.

Outside of that detail, games are also played with KASAN as the list
is inside the cached entry itself.

Finally, all users of this need a struct io_cache_entry embedded in
their struct, which is union'ized with something else in there that
isn't used across the free -> realloc cycle.

Get rid of all of that, and simply have it be an array. This will not
change the memory used, as we're just trading an 8-byte member entry
for the per-elem array size.

This reduces the overhead of the recycled allocations, and it reduces
the amount of code needed to support recycling to about half of
what it currently is.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
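
The shape of an array-backed cache, shown purely as an illustration of the trade-off (this is generic C, not the kernel's io_alloc_cache):

    #include <stdlib.h>

    /* Illustration only: a free-object cache backed by an array of pointers
     * rather than a linked list threaded through the objects themselves. */
    struct obj_cache {
        void **entries;      /* cached (free) objects */
        unsigned int nr;     /* how many are currently cached */
        unsigned int max;    /* array capacity */
        size_t obj_size;
    };

    static void *cache_get(struct obj_cache *c)
    {
        if (c->nr)
            return c->entries[--c->nr];   /* pop: O(1), no pointer chasing */
        return malloc(c->obj_size);       /* cache empty, allocate fresh */
    }

    static void cache_put(struct obj_cache *c, void *obj)
    {
        if (c->nr < c->max)
            c->entries[c->nr++] = obj;    /* recycle for the next get */
        else
            free(obj);                    /* cache full, really free it */
    }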


# e2ea5a70 19-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: move connect to always using async data

While doing that, get rid of io_async_connect and just use the generic
io_async_msghdr. Both of them have a struct sockaddr_storage in there,
and while io_async_msghdr is bigger, if the same type can be used then
the netmsg_cache can get reused for connect as well.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# d80f9407 18-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: drop 'kmsg' parameter from io_req_msg_cleanup()

Now that iovec recycling is being done, the iovec is no longer being
freed in there. Hence the kmsg parameter is now useless.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 75191341 16-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: add iovec recycling

Right now the io_async_msghdr is recycled to avoid the overhead of
allocating+freeing it for every request. But the iovec is not included,
hence that will be allocated and freed for each transfer regardless.
This commit enables recycling of the iovec between io_async_msghdr
recycles. This avoids alloc+free for each one if an iovec is used, and
on top of that, it extends the cache hot nature of msg to the iovec as
well.

Also enables KASAN for the iovec entries, so that reuse can be detected
even while they are in the cache.

The io_async_msghdr also shrinks from 376 -> 288 bytes, an 88 byte
saving (or ~23% smaller), as the fast_iovec entry is dropped from 8
entries to a single entry. There's no point keeping a big fast iovec
entry, if iovecs aren't being allocated and freed continually.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 9f8539fe 21-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: remove (now) dead code in io_netmsg_recycle()

All net commands have async data at this point, there's no reason to
check if this is the case or not.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 6498c5c9 18-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring: kill io_msg_alloc_async_prep()

We now ONLY call io_msg_alloc_async() from inside prep handling, which
is always locked. No need for this helper anymore, or the check in
io_msg_alloc_async() on whether the ring is locked or not.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 50220d6a 18-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: get rid of ->prep_async() for send side

Move the io_async_msghdr out of the issue path and into prep handling,
since it's now done unconditionally and hence does not need to be part
of the issue path. This means any usage of io_sendrecv_prep_async() and
io_sendmsg_prep_async() goes away, and hence the forced async setup path
is now unified with the normal prep setup.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# c6f32c7d 18-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: get rid of ->prep_async() for receive side

Move the io_async_msghdr out of the issue path and into prep handling,
since it's now done unconditionally and hence does not need to be part
of the issue path. This reduces the footprint of the multishot fast
path of multiple invocations of ->issue() per prep, and also means that
using ->prep_async() can be dropped for recvmsg as this is now done via
setup on the prep side.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 3ba8345a 12-Apr-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: always set kmsg->msg.msg_control_user before issue

We currently set this separately for async/sync entry, but let's just
move it to a generic pre-issue spot and eliminate the difference
between the two.

Signed-off-by: Jens Axboe <axboe@kernel.dk>


# 790b68b3 17-Mar-2024 Jens Axboe <axboe@kernel.dk>

io_uring/net: always setup an io_async_msghdr

Rather than use an on-stack one and then need to allocate and copy if
async execution is required, always grab one upfront. This should be
very cheap, and potentially even have cache hotness benefits for
back-to-back send/recv requests.

For any recv type of request, this is probably a good choice in general,
as it's expected that no data is available initially. For send this is
not necessarily the case, as space in the socket buffer is expected to
be available. However, getting a cached io_async_msghdr is very cheap,
and as it should be cache hot, probably the difference here is negligible,
if any.

A nice side benefit is that io_setup_async_msg can get killed
completely, which has some nasty iovec manipulation code.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
