History log of /freebsd/sys/kern/kern_sendfile.c (Results 26 – 50 of 133)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# faa998f6 25-Feb-2021 Mark Johnston <markj@FreeBSD.org>

sendfile: Use the pager size to determine the file extent when possible

Previously sendfile would issue a VOP_GETATTR and use the returned size,
i.e., the file size. When paging in file data, sendf

sendfile: Use the pager size to determine the file extent when possible

Previously sendfile would issue a VOP_GETATTR and use the returned size,
i.e., the file size. When paging in file data, sendfile_swapin() will
use the pager to determine whether it needs to zero-fill, most often
because of a hole in a sparse file. An attempt to page in beyond the
end of a file is treated this way, and occurs when the requested page is
past the end of the pager. In other words, both the file size and pager
size were used interchangeably.

With ZFS, updates to the pager and file sizes are not synchronized by
the exclusive vnode lock, at least partially due to its use of
MNTK_SHARED_WRITES. In particular, the pager size is updated after the
file size, so in the presence of a writer concurrently extending the
file, sendfile could incorrectly instantiate "holes" in the page cache
pages backing the file, which manifests as data corruption when reading
the file back from the page cache. The on-disk copy is unaffected.

Fix this by consistently using the pager size when available.

Reported by: dumbbell
Reviewed by: chs, kib
Tested by: dumbbell, pho
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28811

show more ...


# 214257da 03-Jan-2021 Mark Johnston <markj@FreeBSD.org>

sendfile: Clear page pointers when handling a pager error

When INVARIANTS is configred, the sendfile_iodone() callback verifies
that pages attached to the sendfile header are wired, but we unwire al

sendfile: Clear page pointers when handling a pager error

When INVARIANTS is configred, the sendfile_iodone() callback verifies
that pages attached to the sendfile header are wired, but we unwire all
such pages after a synchronous pager error, before calling
sendfile_iodone().

Reported by: pho
Tested by: pho
Sponsored by: The FreeBSD Foundation

show more ...


# 26b23f07 26-Dec-2020 Mark Johnston <markj@FreeBSD.org>

sendfile: Ensure that sfio->npages is initialized

We initialize sfio->npages only when some I/O is required to satisfy the
request. However, sendfile_iodone() contains an INVARIANTS-only check
that

sendfile: Ensure that sfio->npages is initialized

We initialize sfio->npages only when some I/O is required to satisfy the
request. However, sendfile_iodone() contains an INVARIANTS-only check
that references sfio->npages, and this check is executed even if no I/O
is performed, so the check may use an uninitialized value.

Fix the problem by initializing sfio->npages earlier. Note that
sendfile_swapin() always initializes the page array. In some rare cases
we need to trim the page array so ensure that sfio->npages gets updated
accordingly.

Reported by: syzkaller (with KASAN)
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D27726

show more ...


# cd853791 28-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

Make MAXPHYS tunable. Bump MAXPHYS to 1M.

Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pag

Make MAXPHYS tunable. Bump MAXPHYS to 1M.

Replace MAXPHYS by runtime variable maxphys. It is initialized from
MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.

Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer
cache buffers exactly to atop(maxbcachebuf) (currently it is sized to
atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1.
The +1 for pbufs allow several pbuf consumers, among them vmapbuf(),
to use unaligned buffers still sized to maxphys, esp. when such
buffers come from userspace (*). Overall, we save significant amount
of otherwise wasted memory in b_pages[] for buffer cache buffers,
while bumping MAXPHYS to desired high value.

Eliminate all direct uses of the MAXPHYS constant in kernel and driver
sources, except a place which initialize maxphys. Some random (and
arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted
straight. Some drivers, which use MAXPHYS to size embeded structures,
get private MAXPHYS-like constant; their convertion is out of scope
for this work.

Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs,
dev/siis, where either submitted by, or based on changes by mav.

Suggested by: mav (*)
Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D27225

show more ...


Revision tags: release/12.2.0
# 6fed89b1 02-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

kern: clean up empty lines in .c and .h files


Revision tags: release/11.4.0
# c2ea3d44 06-Jun-2020 Chuck Silvers <chs@FreeBSD.org>

Fix hang due to missing unbusy in sendfile when an async data I/O fails.

r359473 removed the page unbusy logic from sendfile_iodone() because when
vm_pager_get_pages_async() would return an error af

Fix hang due to missing unbusy in sendfile when an async data I/O fails.

r359473 removed the page unbusy logic from sendfile_iodone() because when
vm_pager_get_pages_async() would return an error after failing to start
the async I/O (eg. because VOP_BMAP failed), sendfile_swapin() would also
unbusy the pages, and it was wrong to unbusy twice. However this breaks
the case where vm_pager_get_pages_async() succeeds in starting an async I/O
and the async I/O is what fails. In this case, sendfile_iodone() must
unbusy the pages, and because sendfile_iodone() doesn't know which case
it is in, sendfile_iodone() must always unbusy pages and relookup pages
which have been substituted with bogus_page, which in turn means that
sendfile_swapin() must never do unbusy or relookup for pages which have
been given to vm_pager_get_pages_async(), even if there is an error.

Reviewed by: kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25136

show more ...


# 61664ee7 03-May-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Step 4.2: start divorce of M_EXT and M_EXTPG

They have more differencies than similarities. For now there is lots
of code that would check for M_EXT only and work correctly on M_EXTPG
buffers, so st

Step 4.2: start divorce of M_EXT and M_EXTPG

They have more differencies than similarities. For now there is lots
of code that would check for M_EXT only and work correctly on M_EXTPG
buffers, so still carry M_EXT bit together with M_EXTPG. However,
prepare some code for explicit check for M_EXTPG.

Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598

show more ...


# 6edfd179 03-May-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Step 4.1: mechanically rename M_NOMAP to M_EXTPG

Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598


# 7b6c99d0 03-May-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf
within m_epg namespace.
All edits except the 'struct mbuf' declaration and mb_dupcl() were done
mechanically with sed:

Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf
within m_epg namespace.
All edits except the 'struct mbuf' declaration and mb_dupcl() were done
mechanically with sed:

s/->m_ext_pgs.nrdy/->m_epg_nrdy/g
s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g
s/->m_ext_pgs.trail_len/->m_epg_trllen/g
s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g
s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g
s/->m_ext_pgs.flags/->m_epg_flags/g
s/->m_ext_pgs.record_type/->m_epg_record_type/g
s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g
s/->m_ext_pgs.tls/->m_epg_tls/g
s/->m_ext_pgs.so/->m_epg_so/g
s/->m_ext_pgs.seqno/->m_epg_seqno/g
s/->m_ext_pgs.stailq/->m_epg_stailq/g

Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598

show more ...


# bccf6e26 03-May-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Step 2.5: Stop using 'struct mbuf_ext_pgs' in the kernel itself.

Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598


# 0c103266 03-May-2020 Gleb Smirnoff <glebius@FreeBSD.org>

Continuation of multi page mbuf redesign from r359919.

The following series of patches addresses three things:

Now that array of pages is embedded into mbuf, we no longer need
separate structure to

Continuation of multi page mbuf redesign from r359919.

The following series of patches addresses three things:

Now that array of pages is embedded into mbuf, we no longer need
separate structure to pass around, so struct mbuf_ext_pgs is an
artifact of the first implementation. And struct mbuf_ext_pgs_data
is a crutch to accomodate the main idea r359919 with minimal churn.

Also, M_EXT of type EXT_PGS are just a synonym of M_NOMAP.

The namespace for the newfeature is somewhat inconsistent and
sometimes has a lengthy prefixes. In these patches we will
gradually bring the namespace to "m_epg" prefix for all mbuf
fields and most functions.

Step 1 of 4:

o Anonymize mbuf_ext_pgs_data, embed in m_ext
o Embed mbuf_ext_pgs
o Start documenting all this entanglement

Reviewed by: gallatin
Differential Revision: https://reviews.freebsd.org/D24598

show more ...


# b7eae758 28-Apr-2020 Mark Johnston <markj@FreeBSD.org>

Make sendfile(SF_SYNC)'s CV wait interruptible.

Otherwise, since the CV is not signalled until data is drained from the
socket, it is trivial to create an unkillable process using
sendfile(SF_SYNC)

Make sendfile(SF_SYNC)'s CV wait interruptible.

Otherwise, since the CV is not signalled until data is drained from the
socket, it is trivial to create an unkillable process using
sendfile(SF_SYNC) and a process-private PF_LOCAL socket pair. In
particular, the cv_wait() in sendfile() does not get interrupted until
data is drained from the receiving socket buffer.

Reported by: pho
Discussed with: kib
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation

show more ...


# acf65eef 18-Apr-2020 Konstantin Belousov <kib@FreeBSD.org>

sendfile: When all io finished, assert that sfio->pa[] is in expected state.

It must contain fully restored contigous run of the wired pages from
the object, except possible trimmed tail.

Tested by

sendfile: When all io finished, assert that sfio->pa[] is in expected state.

It must contain fully restored contigous run of the wired pages from
the object, except possible trimmed tail.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks

show more ...


# e795a040 18-Apr-2020 Konstantin Belousov <kib@FreeBSD.org>

The pa argument for sendfile_iodone() is not necessary a slice of sfio->pa.

It is true for zfs, but it is not for e.g. vnode or buffer pagers.
When fixing bogus pages, fix them in both places. Rely

The pa argument for sendfile_iodone() is not necessary a slice of sfio->pa.

It is true for zfs, but it is not for e.g. vnode or buffer pagers.
When fixing bogus pages, fix them in both places. Rely on the fact
that pa[0] must have been invalid so it cannot be bogus.

Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks

show more ...


# 23feb563 14-Apr-2020 Andrew Gallatin <gallatin@FreeBSD.org>

KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.

While the original implementation of unmapped mbufs was a large
step forward in terms of reducing cache misses by enabling mbufs
to

KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.

While the original implementation of unmapped mbufs was a large
step forward in terms of reducing cache misses by enabling mbufs
to carry more than a single page for sendfile, they are rather
cache unfriendly when accessing the ext_pgs metadata and
data. This is because the ext_pgs part of the mbuf is allocated
separately, and almost guaranteed to be cold in cache.

This change takes advantage of the fact that unmapped mbufs
are never used at the same time as pkthdr mbufs. Given this
fact, we can overlap the ext_pgs metadata with the mbuf
pkthdr, and carry the ext_pgs meta directly in the mbuf itself.
Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array
of pages) directly after the existing m_ext.

In order to be able to carry 5 pages (which is the minimum
required for a 16K TLS record which is not perfectly aligned) on
LP64, I've had to steal ext_arg2. The only user of this in the
xmit path is sendfile, and I've adjusted it to use arg1 when
using unmapped mbufs.

This change is almost entirely mechanical, except that we
change mb_alloc_ext_pgs() to no longer allow allocating
pkthdrs, the change to avoid ext_arg2 as mentioned above,
and the removal of the ext_pgs zone,

This change saves roughly 2% "raw" CPU (~59% -> 57%), or over
3% "scaled" CPU on a Netflix 100% software kTLS workload at
90+ Gb/s on Broadwell Xeons.

In a follow-on commit, I plan to remove some hacks to avoid
access ext_pgs fields of mbufs, since they will now be in
cache.

Many thanks to glebius for helping to make this better in
the Netflix tree.

Reviewed by: hselasky, jhb, rrs, glebius (early version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D24213

show more ...


# d25f1b21 12-Apr-2020 Konstantin Belousov <kib@FreeBSD.org>

sendfile_iodone: correct calculation of the page index for relookup.

This is yet another bug in r359473.

Reported and tested by: delphij
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# f709eee6 10-Apr-2020 Konstantin Belousov <kib@FreeBSD.org>

Do not pass bogus page to mbufs.

This is a bug in r359473.

Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# c506a638 31-Mar-2020 Konstantin Belousov <kib@FreeBSD.org>

kern_sendfile.c: fix bugs with handling of busy page states.

- Do not call into a vnode pager while leaving some pages from the
same block as the current run, xbusy. This immediately deadlocks if

kern_sendfile.c: fix bugs with handling of busy page states.

- Do not call into a vnode pager while leaving some pages from the
same block as the current run, xbusy. This immediately deadlocks if
pager needs to instantiate the buffer.
- Only relookup bogus pages after io finished, otherwise we might
obliterate the valid pages by out of date disk content. While there,
expand the comment explaining this pecularity.
- Do not double-unbusy on error. Split unbusy for error case, which
is left in the sendfile_swapin(), from the more properly coded
normal case in sendfile_iodone().
- Add an XXXKIB comment explaining the serious bug in the validation
algorithm, not fixed by this patch series.

PR: 244713
Reviewed by: glebius, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24038

show more ...


# d8663536 31-Mar-2020 Konstantin Belousov <kib@FreeBSD.org>

kern_sendfile.c: do not release sfio reference on error.

It is already done by sendfile_iodone(), now consistently for all errors.
This de-facto reverts r358597, after r359466.

Reviewed by: glebius

kern_sendfile.c: do not release sfio reference on error.

It is already done by sendfile_iodone(), now consistently for all errors.
This de-facto reverts r358597, after r359466.

Reviewed by: glebius, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24038

show more ...


# 59e1ac9d 30-Mar-2020 Konstantin Belousov <kib@FreeBSD.org>

kern_sendfile.c: wait for all in-flight ios completion before unwiring pages.

Reviewed by: glebius, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision:

kern_sendfile.c: wait for all in-flight ios completion before unwiring pages.

Reviewed by: glebius, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24038

show more ...


# 8f0a223c 30-Mar-2020 Konstantin Belousov <kib@FreeBSD.org>

kern_sendfile.c: add specific malloc type.

Now sfio leaks are more easily seen in the malloc statistics than
e.g. just wired or busy pages leak.

Reviewed by: glebius, markj
Tested by: pho
Sponsored

kern_sendfile.c: add specific malloc type.

Now sfio leaks are more easily seen in the malloc statistics than
e.g. just wired or busy pages leak.

Reviewed by: glebius, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24038

show more ...


# 0ac8511a 30-Mar-2020 Konstantin Belousov <kib@FreeBSD.org>

kern_sendfile.c style: order headers alphabetically.

Reviewed by: glebius, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.

kern_sendfile.c style: order headers alphabetically.

Reviewed by: glebius, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D24038

show more ...


# db4493f7 13-Mar-2020 Michael Tuexen <tuexen@FreeBSD.org>

sendfile() does currently not support SCTP sockets.
Therefore, fail the call.

Reviewed by: markj@
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24059


# e43d33d2 05-Mar-2020 Dimitry Andric <dim@FreeBSD.org>

Merge ^/head r358466 through r358677.


# 37bf88e7 04-Mar-2020 Chuck Silvers <chs@FreeBSD.org>

if vm_pager_get_pages_async() returns an error, release the sfio->nios
refcount that we took earlier that represents the I/O that ended up
not being started.

Reviewed by: glebius
Approved by: imp (m

if vm_pager_get_pages_async() returns an error, release the sfio->nios
refcount that we took earlier that represents the I/O that ended up
not being started.

Reviewed by: glebius
Approved by: imp (mentor)
Sponsored by: Netflix

show more ...


123456