#
faa998f6 |
| 25-Feb-2021 |
Mark Johnston <markj@FreeBSD.org> |
sendfile: Use the pager size to determine the file extent when possible
Previously sendfile would issue a VOP_GETATTR and use the returned size, i.e., the file size. When paging in file data, sendf
sendfile: Use the pager size to determine the file extent when possible
Previously sendfile would issue a VOP_GETATTR and use the returned size, i.e., the file size. When paging in file data, sendfile_swapin() will use the pager to determine whether it needs to zero-fill, most often because of a hole in a sparse file. An attempt to page in beyond the end of a file is treated this way, and occurs when the requested page is past the end of the pager. In other words, both the file size and pager size were used interchangeably.
With ZFS, updates to the pager and file sizes are not synchronized by the exclusive vnode lock, at least partially due to its use of MNTK_SHARED_WRITES. In particular, the pager size is updated after the file size, so in the presence of a writer concurrently extending the file, sendfile could incorrectly instantiate "holes" in the page cache pages backing the file, which manifests as data corruption when reading the file back from the page cache. The on-disk copy is unaffected.
Fix this by consistently using the pager size when available.
Reported by: dumbbell Reviewed by: chs, kib Tested by: dumbbell, pho MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28811
show more ...
|
#
214257da |
| 03-Jan-2021 |
Mark Johnston <markj@FreeBSD.org> |
sendfile: Clear page pointers when handling a pager error
When INVARIANTS is configred, the sendfile_iodone() callback verifies that pages attached to the sendfile header are wired, but we unwire al
sendfile: Clear page pointers when handling a pager error
When INVARIANTS is configred, the sendfile_iodone() callback verifies that pages attached to the sendfile header are wired, but we unwire all such pages after a synchronous pager error, before calling sendfile_iodone().
Reported by: pho Tested by: pho Sponsored by: The FreeBSD Foundation
show more ...
|
#
26b23f07 |
| 26-Dec-2020 |
Mark Johnston <markj@FreeBSD.org> |
sendfile: Ensure that sfio->npages is initialized
We initialize sfio->npages only when some I/O is required to satisfy the request. However, sendfile_iodone() contains an INVARIANTS-only check that
sendfile: Ensure that sfio->npages is initialized
We initialize sfio->npages only when some I/O is required to satisfy the request. However, sendfile_iodone() contains an INVARIANTS-only check that references sfio->npages, and this check is executed even if no I/O is performed, so the check may use an uninitialized value.
Fix the problem by initializing sfio->npages earlier. Note that sendfile_swapin() always initializes the page array. In some rare cases we need to trim the page array so ensure that sfio->npages gets updated accordingly.
Reported by: syzkaller (with KASAN) Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D27726
show more ...
|
#
cd853791 |
| 28-Nov-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pag
Make MAXPHYS tunable. Bump MAXPHYS to 1M.
Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys.
Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (*). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value.
Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embeded structures, get private MAXPHYS-like constant; their convertion is out of scope for this work.
Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, where either submitted by, or based on changes by mav.
Suggested by: mav (*) Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225
show more ...
|
Revision tags: release/12.2.0 |
|
#
6fed89b1 |
| 02-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
kern: clean up empty lines in .c and .h files
|
Revision tags: release/11.4.0 |
|
#
c2ea3d44 |
| 06-Jun-2020 |
Chuck Silvers <chs@FreeBSD.org> |
Fix hang due to missing unbusy in sendfile when an async data I/O fails.
r359473 removed the page unbusy logic from sendfile_iodone() because when vm_pager_get_pages_async() would return an error af
Fix hang due to missing unbusy in sendfile when an async data I/O fails.
r359473 removed the page unbusy logic from sendfile_iodone() because when vm_pager_get_pages_async() would return an error after failing to start the async I/O (eg. because VOP_BMAP failed), sendfile_swapin() would also unbusy the pages, and it was wrong to unbusy twice. However this breaks the case where vm_pager_get_pages_async() succeeds in starting an async I/O and the async I/O is what fails. In this case, sendfile_iodone() must unbusy the pages, and because sendfile_iodone() doesn't know which case it is in, sendfile_iodone() must always unbusy pages and relookup pages which have been substituted with bogus_page, which in turn means that sendfile_swapin() must never do unbusy or relookup for pages which have been given to vm_pager_get_pages_async(), even if there is an error.
Reviewed by: kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25136
show more ...
|
#
61664ee7 |
| 03-May-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Step 4.2: start divorce of M_EXT and M_EXTPG
They have more differencies than similarities. For now there is lots of code that would check for M_EXT only and work correctly on M_EXTPG buffers, so st
Step 4.2: start divorce of M_EXT and M_EXTPG
They have more differencies than similarities. For now there is lots of code that would check for M_EXT only and work correctly on M_EXTPG buffers, so still carry M_EXT bit together with M_EXTPG. However, prepare some code for explicit check for M_EXTPG.
Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598
show more ...
|
#
6edfd179 |
| 03-May-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Step 4.1: mechanically rename M_NOMAP to M_EXTPG
Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598
|
#
7b6c99d0 |
| 03-May-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf within m_epg namespace. All edits except the 'struct mbuf' declaration and mb_dupcl() were done mechanically with sed:
Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf within m_epg namespace. All edits except the 'struct mbuf' declaration and mb_dupcl() were done mechanically with sed:
s/->m_ext_pgs.nrdy/->m_epg_nrdy/g s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g s/->m_ext_pgs.trail_len/->m_epg_trllen/g s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g s/->m_ext_pgs.flags/->m_epg_flags/g s/->m_ext_pgs.record_type/->m_epg_record_type/g s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g s/->m_ext_pgs.tls/->m_epg_tls/g s/->m_ext_pgs.so/->m_epg_so/g s/->m_ext_pgs.seqno/->m_epg_seqno/g s/->m_ext_pgs.stailq/->m_epg_stailq/g
Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598
show more ...
|
#
bccf6e26 |
| 03-May-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Step 2.5: Stop using 'struct mbuf_ext_pgs' in the kernel itself.
Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598
|
#
0c103266 |
| 03-May-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Continuation of multi page mbuf redesign from r359919.
The following series of patches addresses three things:
Now that array of pages is embedded into mbuf, we no longer need separate structure to
Continuation of multi page mbuf redesign from r359919.
The following series of patches addresses three things:
Now that array of pages is embedded into mbuf, we no longer need separate structure to pass around, so struct mbuf_ext_pgs is an artifact of the first implementation. And struct mbuf_ext_pgs_data is a crutch to accomodate the main idea r359919 with minimal churn.
Also, M_EXT of type EXT_PGS are just a synonym of M_NOMAP.
The namespace for the newfeature is somewhat inconsistent and sometimes has a lengthy prefixes. In these patches we will gradually bring the namespace to "m_epg" prefix for all mbuf fields and most functions.
Step 1 of 4:
o Anonymize mbuf_ext_pgs_data, embed in m_ext o Embed mbuf_ext_pgs o Start documenting all this entanglement
Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598
show more ...
|
#
b7eae758 |
| 28-Apr-2020 |
Mark Johnston <markj@FreeBSD.org> |
Make sendfile(SF_SYNC)'s CV wait interruptible.
Otherwise, since the CV is not signalled until data is drained from the socket, it is trivial to create an unkillable process using sendfile(SF_SYNC)
Make sendfile(SF_SYNC)'s CV wait interruptible.
Otherwise, since the CV is not signalled until data is drained from the socket, it is trivial to create an unkillable process using sendfile(SF_SYNC) and a process-private PF_LOCAL socket pair. In particular, the cv_wait() in sendfile() does not get interrupted until data is drained from the receiving socket buffer.
Reported by: pho Discussed with: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
show more ...
|
#
acf65eef |
| 18-Apr-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
sendfile: When all io finished, assert that sfio->pa[] is in expected state.
It must contain fully restored contigous run of the wired pages from the object, except possible trimmed tail.
Tested by
sendfile: When all io finished, assert that sfio->pa[] is in expected state.
It must contain fully restored contigous run of the wired pages from the object, except possible trimmed tail.
Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
show more ...
|
#
e795a040 |
| 18-Apr-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
The pa argument for sendfile_iodone() is not necessary a slice of sfio->pa.
It is true for zfs, but it is not for e.g. vnode or buffer pagers. When fixing bogus pages, fix them in both places. Rely
The pa argument for sendfile_iodone() is not necessary a slice of sfio->pa.
It is true for zfs, but it is not for e.g. vnode or buffer pagers. When fixing bogus pages, fix them in both places. Rely on the fact that pa[0] must have been invalid so it cannot be bogus.
Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
show more ...
|
#
23feb563 |
| 14-Apr-2020 |
Andrew Gallatin <gallatin@FreeBSD.org> |
KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.
While the original implementation of unmapped mbufs was a large step forward in terms of reducing cache misses by enabling mbufs to
KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.
While the original implementation of unmapped mbufs was a large step forward in terms of reducing cache misses by enabling mbufs to carry more than a single page for sendfile, they are rather cache unfriendly when accessing the ext_pgs metadata and data. This is because the ext_pgs part of the mbuf is allocated separately, and almost guaranteed to be cold in cache.
This change takes advantage of the fact that unmapped mbufs are never used at the same time as pkthdr mbufs. Given this fact, we can overlap the ext_pgs metadata with the mbuf pkthdr, and carry the ext_pgs meta directly in the mbuf itself. Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array of pages) directly after the existing m_ext.
In order to be able to carry 5 pages (which is the minimum required for a 16K TLS record which is not perfectly aligned) on LP64, I've had to steal ext_arg2. The only user of this in the xmit path is sendfile, and I've adjusted it to use arg1 when using unmapped mbufs.
This change is almost entirely mechanical, except that we change mb_alloc_ext_pgs() to no longer allow allocating pkthdrs, the change to avoid ext_arg2 as mentioned above, and the removal of the ext_pgs zone,
This change saves roughly 2% "raw" CPU (~59% -> 57%), or over 3% "scaled" CPU on a Netflix 100% software kTLS workload at 90+ Gb/s on Broadwell Xeons.
In a follow-on commit, I plan to remove some hacks to avoid access ext_pgs fields of mbufs, since they will now be in cache.
Many thanks to glebius for helping to make this better in the Netflix tree.
Reviewed by: hselasky, jhb, rrs, glebius (early version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24213
show more ...
|
#
d25f1b21 |
| 12-Apr-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
sendfile_iodone: correct calculation of the page index for relookup.
This is yet another bug in r359473.
Reported and tested by: delphij Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
f709eee6 |
| 10-Apr-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Do not pass bogus page to mbufs.
This is a bug in r359473.
Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
c506a638 |
| 31-Mar-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
kern_sendfile.c: fix bugs with handling of busy page states.
- Do not call into a vnode pager while leaving some pages from the same block as the current run, xbusy. This immediately deadlocks if
kern_sendfile.c: fix bugs with handling of busy page states.
- Do not call into a vnode pager while leaving some pages from the same block as the current run, xbusy. This immediately deadlocks if pager needs to instantiate the buffer. - Only relookup bogus pages after io finished, otherwise we might obliterate the valid pages by out of date disk content. While there, expand the comment explaining this pecularity. - Do not double-unbusy on error. Split unbusy for error case, which is left in the sendfile_swapin(), from the more properly coded normal case in sendfile_iodone(). - Add an XXXKIB comment explaining the serious bug in the validation algorithm, not fixed by this patch series.
PR: 244713 Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038
show more ...
|
#
d8663536 |
| 31-Mar-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
kern_sendfile.c: do not release sfio reference on error.
It is already done by sendfile_iodone(), now consistently for all errors. This de-facto reverts r358597, after r359466.
Reviewed by: glebius
kern_sendfile.c: do not release sfio reference on error.
It is already done by sendfile_iodone(), now consistently for all errors. This de-facto reverts r358597, after r359466.
Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038
show more ...
|
#
59e1ac9d |
| 30-Mar-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
kern_sendfile.c: wait for all in-flight ios completion before unwiring pages.
Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision:
kern_sendfile.c: wait for all in-flight ios completion before unwiring pages.
Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038
show more ...
|
#
8f0a223c |
| 30-Mar-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
kern_sendfile.c: add specific malloc type.
Now sfio leaks are more easily seen in the malloc statistics than e.g. just wired or busy pages leak.
Reviewed by: glebius, markj Tested by: pho Sponsored
kern_sendfile.c: add specific malloc type.
Now sfio leaks are more easily seen in the malloc statistics than e.g. just wired or busy pages leak.
Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038
show more ...
|
#
0ac8511a |
| 30-Mar-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
kern_sendfile.c style: order headers alphabetically.
Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.
kern_sendfile.c style: order headers alphabetically.
Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038
show more ...
|
#
db4493f7 |
| 13-Mar-2020 |
Michael Tuexen <tuexen@FreeBSD.org> |
sendfile() does currently not support SCTP sockets. Therefore, fail the call.
Reviewed by: markj@ MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D24059
|
#
e43d33d2 |
| 05-Mar-2020 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r358466 through r358677.
|
#
37bf88e7 |
| 04-Mar-2020 |
Chuck Silvers <chs@FreeBSD.org> |
if vm_pager_get_pages_async() returns an error, release the sfio->nios refcount that we took earlier that represents the I/O that ended up not being started.
Reviewed by: glebius Approved by: imp (m
if vm_pager_get_pages_async() returns an error, release the sfio->nios refcount that we took earlier that represents the I/O that ended up not being started.
Reviewed by: glebius Approved by: imp (mentor) Sponsored by: Netflix
show more ...
|