#
3e867f24 |
| 21-Mar-2018 |
Warner Losh <imp@FreeBSD.org> |
bufshutdown is no longer called with Giant held, so there's no need to drop or pick up Giant anymore. Remove that code and adjust comments.
|
#
2acde6a8 |
| 20-Mar-2018 |
Justin Hibbits <jhibbits@FreeBSD.org> |
Cast through uintptr_t to narrow the buf domain pointer on 32-bit archs
arg2 is an intmax_t, which on 32-bit architectures is 64 bits, wider than a pointer. When &bdomain[i] is added to arg2 it widens from uintptr_t to intmax_t, then gcc whines when it gets cast to a pointer. Casting through uintptr_t silences this warning.
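A minimal standalone sketch of the pattern (the struct name is a stand-in for the real buf-domain type, not the committed diff):

    #include <stdint.h>

    struct bufdomain;   /* opaque stand-in for the real type */

    static struct bufdomain *
    arg2_to_domain(intmax_t arg2)
    {
        /*
         * On 32-bit targets intmax_t is 64 bits, wider than a pointer.
         * Narrowing explicitly through uintptr_t before forming the
         * pointer silences gcc's int-to-pointer-cast warning.
         */
        return ((struct bufdomain *)(uintptr_t)arg2);
    }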
|
#
0eb50f9c |
| 18-Mar-2018 |
Mark Johnston <markj@FreeBSD.org> |
Have vm_page_{deactivate,launder}() requeue already-queued pages.
In many cases the page is not enqueued so the change will have no effect. However, the change is needed to support an optimization in the fault handler and in some cases (sendfile, the buffer cache) it was being emulated by the caller anyway.
Reviewed by: alc Tested by: pho MFC after: 2 weeks X-Differential Revision: https://reviews.freebsd.org/D14625
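A pseudocode sketch of the new semantics; the types and helpers below are invented for illustration and are not the vm_page API:

    enum pq_sketch { PQ_NONE, PQ_ACTIVE, PQ_INACTIVE, PQ_LAUNDRY };

    struct page_sketch { enum pq_sketch queue; };

    /* Hypothetical queue primitives, stubbed for the sketch. */
    static void requeue_tail(struct page_sketch *m, enum pq_sketch q) { m->queue = q; }
    static void enqueue_tail(struct page_sketch *m, enum pq_sketch q) { m->queue = q; }

    static void
    deactivate_sketch(struct page_sketch *m)
    {
        if (m->queue == PQ_INACTIVE)
            requeue_tail(m, PQ_INACTIVE);   /* previously a no-op */
        else
            enqueue_tail(m, PQ_INACTIVE);
    }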
|
#
3cec5c77 |
| 17-Mar-2018 |
Jeff Roberson <jeff@FreeBSD.org> |
Move the dirty queues inside the per-domain structure. This resolves a bug where we had not hit global dirty limits but a single queue was starved for space by dirty buffers. A single buf_daemon is maintained for now.
Add a bd_speedup() when we are low on bufspace. This can happen due to SUJ keeping many bufs locked until a cg block is written. Document this with a comment.
Fix sysctls to work with per-domain variables. Add more ddb debugging.
Reported by: pho Reviewed by: kib Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14705
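The shape of the change, sketched with assumed field names (not the committed layout):

    struct bufqueue_sketch {
        /* lock, buf TAILQ, length ... */
        int bq_len;
    };

    struct bufdomain_sketch {
        struct bufqueue_sketch bd_cleanq;   /* already per-domain */
        struct bufqueue_sketch bd_dirtyq;   /* moved in by this change */
        long bd_numdirtybuffers;            /* previously global counters */
        long bd_lodirtybuffers;
        long bd_hidirtybuffers;
    };

With the counters per-domain, a single queue filling with dirty buffers trips that domain's limits (and its bd_speedup()) instead of disappearing into the global total.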
|
#
330b675f |
| 14-Mar-2018 |
Conrad Meyer <cem@FreeBSD.org> |
vfs_bio.c: Apply cleanups motivated by Coverity analysis
It is believed that the conditions Coverity indicated were actually impossible to hit. So this patch just adds a cleanup to only compute v_mount once in brelse(), and in vfs_bio_getpages() always initializes error to zero to appease the static analyzer.
No functional change intended.
Submitted by: Darrick Lew <darrick.freebsd AT gmail.com> Reviewed by: kib Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14613
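A before/after sketch of the two cleanups, with stand-in types and assumed local names:

    struct mount_s;                              /* stand-ins for the kernel types */
    struct vnode_s { struct mount_s *v_mount; };
    struct buf_s   { struct vnode_s *b_vp; };

    static void
    brelse_fragment_sketch(struct buf_s *bp)
    {
        struct mount_s *v_mnt;

        /* Read bp->b_vp->v_mount once instead of on every check. */
        v_mnt = (bp->b_vp != NULL) ? bp->b_vp->v_mount : NULL;
        (void)v_mnt;
    }

    static int
    getpages_fragment_sketch(void)
    {
        int error = 0;  /* initialized on every path, for the analyzer */

        return (error);
    }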
|
#
1c2529ab |
| 25-Feb-2018 |
Jeff Roberson <jeff@FreeBSD.org> |
Fix issues with sparse cpu allocation. Consistently use mp_maxid + 1.
Reported by: pho Reviewed by: markj Sponsored by: Netflix, Dell/EMC Isilon
|
#
a0c722bd |
| 22-Feb-2018 |
Mateusz Guzik <mjg@FreeBSD.org> |
Fix up sysctl vfs.buffercache broken in r329612
Sample problem: top: sysctl(vfs.bufspace...) expected 8, got 4
Reported by: O. Hartmann <ohartmann walstatt.org>
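Illustrative only, not the actual fix: the class of bug here is a sysctl whose exported width changed, so top(1) asked for 8 bytes and got 4. A handler that exports the value at its full width might look like:

    #include <sys/param.h>
    #include <sys/kernel.h>
    #include <sys/sysctl.h>

    static long bufspace_sketch;

    static int
    sysctl_bufspace_sketch(SYSCTL_HANDLER_ARGS)
    {
        long val;

        val = bufspace_sketch;
        /* Exporting this as an int would hand back 4 bytes on 64-bit
         * platforms and break consumers expecting 8; use the long handler. */
        return (sysctl_handle_long(oidp, &val, 0, req));
    }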
|
#
683ca3a4 |
| 20-Feb-2018 |
Jeff Roberson <jeff@FreeBSD.org> |
Fix the broken subqueue assignment for the cleanq.
Reported by: pho Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon
|
#
06220fa7 |
| 20-Feb-2018 |
Jeff Roberson <jeff@FreeBSD.org> |
Further parallelize the buffer cache.
Provide multiple clean queues partitioned into 'domains'. Each domain manages its own bufspace and has its own bufspace daemon. Each domain has a set of subqueues indexed by the current cpuid to reduce lock contention on the cleanq.
Refine the sleep/wakeup around the bufspace daemon to use atomics as much as possible.
Add a B_REUSE flag that is used to requeue bufs during the scan to approximate LRU rather than locking the queue on every use of a frequently accessed buf.
Implement bufspace_reserve with only atomic_fetchadd to avoid loop restarts.
Reviewed by: markj Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14274
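A sketch of the atomic_fetchadd() reservation idea, with assumed names and limits (not the committed code). The old pattern re-checked a counter under a lock and restarted on races; here a single fetchadd claims the space and backs out on overshoot:

    #include <sys/types.h>
    #include <sys/errno.h>
    #include <machine/atomic.h>

    struct bufdomain_sketch {
        volatile u_long bd_bufspace;
        u_long bd_maxbufspace;
    };

    static int
    bufspace_reserve_sketch(struct bufdomain_sketch *bd, int size)
    {
        u_long space;

        /* One atomic claims the space; no loop restarts. */
        space = atomic_fetchadd_long(&bd->bd_bufspace, size);
        if (space + size > bd->bd_maxbufspace) {
            /* Over the limit: give the space back and fail. */
            atomic_subtract_long(&bd->bd_bufspace, size);
            return (ENOSPC);
        }
        return (0);
    }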
|
#
e958ad4c |
| 12-Feb-2018 |
Jeff Roberson <jeff@FreeBSD.org> |
Make v_wire_count a per-cpu counter(9) counter. This eliminates a significant source of cache line contention from vm_page_alloc(). Use accessors and vm_page_unwire_noq() so that the mechanism can be easily changed in the future.
Reviewed by: markj Discussed with: kib, glebius Tested by: pho (earlier version) Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14273
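A minimal counter(9) usage sketch showing why the conversion helps: each update hits a per-CPU slot instead of a shared cache line, and only a full fetch walks all CPUs. The variable and function names are illustrative:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/counter.h>
    #include <sys/malloc.h>

    static counter_u64_t wire_count_sketch;

    static void
    wire_count_init(void)
    {
        wire_count_sketch = counter_u64_alloc(M_WAITOK);
    }

    static void
    wire_one_page(void)
    {
        counter_u64_add(wire_count_sketch, 1);  /* cheap, per-CPU update */
    }

    static uint64_t
    wire_count_read(void)
    {
        return (counter_u64_fetch(wire_count_sketch)); /* sums all CPUs */
    }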
|
#
13a025d5 |
| 09-Feb-2018 |
Kirk McKusick <mckusick@FreeBSD.org> |
Merge biodone_finish() back into biodone(). The primary purpose is to make the order of operations clearer to avoid the race condition that was fixed in r328914. In particular, this commit corrects a similar race that existed in the soft updates callback.
Doing some sleuthing through the SVN repository, it appears that bufdone_finish() was added to support XFS:
------------------------------------------------------------------------
r153192 | rodrigc | 2005-12-06 19:39:08 -0800 (Tue, 06 Dec 2005) | 13 lines
Changes imported from XFS for FreeBSD project:
- add fields to struct buf (needed by XFS)
  - 3 private fields: b_fsprivate1, b_fsprivate2, b_fsprivate3
  - b_pin_count, count of pinned buffer
- add new B_MANAGED flag
- add breada() function to initiate asynchronous I/O on read-ahead blocks.
- add bufdone_finish(), bpin(), bunpin_wait() functions
Patches provided by: kan Reviewed by: phk Silence on: arch@
------------------------------------------------------------------------
It does not appear to ever have been used for anything else. XFS was disconnected in r241607:
------------------------------------------------------------------------
r241607 | attilio | 2012-10-16 03:04:00 -0700 (Tue, 16 Oct 2012) | 5 lines
Disconnect non-MPSAFE XFS from the build in preparation for dropping GIANT from VFS.
This is not targeted for MFC.
------------------------------------------------------------------------
and removed entirely in r247631:
------------------------------------------------------------------------
r247631 | attilio | 2013-03-02 07:33:54 -0800 (Sat, 02 Mar 2013) | 5 lines
Garbage collect XFS bits which are now already completely disconnected from the tree since few months.
This is not targeted for MFC.
------------------------------------------------------------------------
Since XFS support is gone, there is no reason to retain biodone_finish().
Suggested by: Warner Losh (imp) Discussed with: cem, kib Tested by: Peter Holm (pho)
|
#
1d3a1bcf |
| 07-Feb-2018 |
Mark Johnston <markj@FreeBSD.org> |
Dequeue wired pages lazily.
Previously, wiring a page would cause it to be removed from its page queue. In the common case, unwiring causes it to be enqueued at the tail of that page queue. This change modifies vm_page_wire() to not dequeue the page, thus avoiding the highly contended page queue locks. Instead, vm_page_unwire() takes care of requeuing the page as a single operation, and the page daemon dequeues wired pages as they are encountered during a queue scan to avoid needlessly revisiting them later. For pages in PQ_ACTIVE we do even better, since a requeue is unnecessary.
The change improves scalability for some common workloads. For instance, threads wiring pages into the buffer cache no longer need to modify global page queues, and unwiring is usually done by the bufspace thread, so concurrency is not as much of an issue. As another example, many sysctl handlers wire the output buffer to avoid faults on copyout, and since the buffer is likely to be in PQ_ACTIVE, we now entirely avoid modifying the page queue in this case.
The change also adds a block comment describing some properties of struct vm_page's reference counters, and the busy lock.
Reviewed by: jeff Discussed with: alc, kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D11943
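A pseudocode sketch of the lazy-dequeue idea; the stand-in type and fields below approximate the real accounting and are not the kernel API:

    struct page_sketch {
        unsigned int wire_count;
        int on_queue;
    };

    static void
    wire_sketch(struct page_sketch *m)
    {
        /* Bump the count only; the page stays on its queue, so the
         * highly contended page queue lock is not taken at all. */
        m->wire_count++;
    }

    static void
    scan_sketch(struct page_sketch *m)
    {
        if (m->wire_count != 0) {
            /* The page daemon dequeues wired pages lazily as its scan
             * reaches them, so later passes need not revisit them. */
            m->on_queue = 0;
            return;
        }
        /* ... normal inactive/laundry processing ... */
    }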
|
#
47806d1b |
| 06-Feb-2018 |
Kirk McKusick <mckusick@FreeBSD.org> |
Occasional cylinder-group check-hash errors were being reported on systems running with a heavy filesystem load. Tracking down this bug was difficult because there were actually two problems. Sometimes the in-memory check hash was wrong and sometimes the check hash computed when doing the read was wrong. The occurrence of either error caused a check-hash mismatch to be reported.
The first error was that the check hash in the in-memory cylinder group was incorrect. This error was caused by the following sequence of events:
- We read a cylinder-group buffer and the check hash is valid.
- We update its cg_time and cg_old_time which makes the in-memory check-hash value invalid but we do not mark the cylinder group dirty.
- We do not make any other changes to the cylinder group, so we never mark it dirty, thus do not write it out, and hence never update the incorrect check hash for the in-memory buffer.
- Later, the buffer gets freed, but the page with the old incorrect check hash is still in the VM cache.
- Later, we read the cylinder group again, and the first page with the old check hash is still in the VM cache, but some other pages are not, so we have to do a read.
- The read does not actually get the first page from disk, but rather from the VM cache, resulting in the old check hash in the buffer.
- The value computed after doing the read does not match causing the error to be printed.
The fix for this problem is to only set cg_time and cg_old_time as the cylinder group is being written to disk. This keeps the in-memory check-hash valid unless the cylinder group has had other modifications which will require it to be written with a new check hash calculated. It also requires that the check hash be recalculated in the in-memory cylinder group when it is marked clean after doing a background write.
The second problem was that the check hash computed at the end of the read was incorrect because the calculation of the check hash on completion of the read was being done too soon.
- When a read completes we had the following sequence:
  - bufdone()
  -- b_ckhashcalc (calculates check hash)
  -- bufdone_finish()
  --- vfs_vmio_iodone() (replaces bogus pages with the cached ones)
- When we are reading a buffer where one or more pages are already in memory (but not all pages, or we wouldn't be doing the read), the I/O is done with bogus_page mapped in for the pages that exist in the VM cache. This mapping is done to avoid corrupting the cached pages if there is any I/O overrun. The vfs_vmio_iodone() function is responsible for replacing the bogus_page(s) with the cached ones. But we were calculating the check hash before the bogus_page(s) were replaced. Hence, when we were calculating the check hash, we were partly reading from bogus_page, which means we calculated a bad check hash (e.g., because multiple pages have been mapped to bogus_page, so its contents are indeterminate).
The second fix is to move the check-hash calculation from bufdone() to bufdone_finish() after the call to vfs_vmio_iodone() so that it computes the check hash over the correct set of pages.
With these two changes, the occasional cylinder-group check-hash errors are gone.
Submitted by: David Pfitzner <dpfitzner@netflix.com> Reviewed by: kib Tested by: David Pfitzner
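A sketch of the corrected completion order. The function and field names follow the commit text; the surrounding buf layer is stubbed out here, so this is illustrative rather than the committed code:

    struct buf_sketch;
    typedef void ckhashfn_t(struct buf_sketch *);

    struct buf_sketch {
        int b_flags;
        ckhashfn_t *b_ckhashcalc;
    };
    #define B_CKHASH_SKETCH 0x00000001  /* stand-in flag value */

    static void vfs_vmio_iodone_stub(struct buf_sketch *bp) { (void)bp; }

    static void
    bufdone_finish_sketch(struct buf_sketch *bp)
    {
        /* Replace bogus_page mappings with the real cached pages... */
        vfs_vmio_iodone_stub(bp);
        /* ...and only then compute the hash, so it covers the real
         * page contents rather than bogus_page. */
        if ((bp->b_flags & B_CKHASH_SKETCH) != 0)
            (*bp->b_ckhashcalc)(bp);
    }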
|
#
3e4f610d |
| 18-Jan-2018 |
Andriy Gapon <avg@FreeBSD.org> |
correct read-ahead calculations in vfs_bio_getpages
Previously the calculations were done as if the requested region ended at the start of the last requested page, not its end. The problem was actually quite minor, as it affected only stats and page prefaulting, not the actual page data, and only with specific parameters.
Reviewed by: kib (previous version) MFC after: 2 weeks
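The arithmetic in miniature, with assumed names (illustrative only, not the committed diff):

    #include <sys/types.h>

    #define PAGE_SIZE_SKETCH 4096

    /* End offset of a request covering 'count' pages starting at 'start'. */
    static off_t
    request_end_sketch(off_t start, int count)
    {
        /* Buggy form stopped at the start of the last page:
         *     return (start + (off_t)(count - 1) * PAGE_SIZE_SKETCH);
         * Correct form runs to the end of the last page: */
        return (start + (off_t)count * PAGE_SIZE_SKETCH);
    }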
|
#
72bfb31a |
| 13-Jan-2018 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r327886 through r327930.
|
#
ab3185d1 |
| 13-Jan-2018 |
Jeff Roberson <jeff@FreeBSD.org> |
Implement NUMA support in uma(9) and malloc(9). Allocations from specific domains can be done by the _domain() API variants. UMA also supports a first-touch policy via the NUMA zone flag.
The slab layer is now segregated by VM domains and is precise. It handles iteration for round-robin directly. The per-cpu cache layer remains a mix of domains according to where memory is allocated and freed. Well behaved clients can achieve perfect locality with no performance penalty.
The direct domain allocation functions have to visit the slab layer and so require per-zone locks which come at some expense.
Reviewed by: Attilio (a slightly older version) Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon
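A usage sketch of the _domain() variants named above. The calls follow the naming in the commit text; the exact signatures are best-effort assumptions for this era of the tree:

    #include <sys/param.h>
    #include <sys/malloc.h>
    #include <vm/uma.h>

    MALLOC_DECLARE(M_SKETCH);

    static void *
    alloc_on_domain_sketch(uma_zone_t zone, int domain)
    {
        void *p;

        /* malloc(9) variant: take memory from one specific VM domain. */
        p = malloc_domain(256, M_SKETCH, domain, M_WAITOK | M_ZERO);
        free_domain(p, M_SKETCH);

        /* uma(9) variant: per-domain zone allocation. */
        return (uma_zalloc_domain(zone, NULL, domain, M_WAITOK));
    }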
|
#
8a36da99 |
| 27-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use the BSD 2-Clause license; however, the tool I was using misidentified many licenses, so this was mostly a manual, error-prone task.
The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts.
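The form of the tag as adopted, placed at the top of each file above the retained license text:

    /*-
     * SPDX-License-Identifier: BSD-2-Clause-FreeBSD
     *
     * Copyright (c) [year] [copyright holder]
     * ...the full existing license text remains below the tag...
     */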
|
#
82725ba9 |
| 23-Nov-2017 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Merge ^/head r325999 through r326131.
|
#
cab229b2 |
| 20-Nov-2017 |
Scott Long <scottl@FreeBSD.org> |
Update a comment in brelse() to match reality.
|
#
f8190300 |
| 10-Nov-2017 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Merge ^/head r325505 through r325662.
|
#
8d6fbbb8 |
| 08-Nov-2017 |
Jeff Roberson <jeff@FreeBSD.org> |
Replace many instances of VM_WAIT with blocking page allocation flags similar to the kernel memory allocator.
This simplifies NUMA allocation because the domain will be known at wait time and races between failure and sleeping are eliminated. This also reduces boilerplate code and simplifies callers.
A wait primitive is supplied for uma zones for similar reasons. This eliminates some non-specific VM_WAIT calls in favor of more explicit sleeps that may be satisfied without new pages.
Reviewed by: alc, kib, markj Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon
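A before/after sketch of the pattern this change removes. VM_WAIT and VM_ALLOC_WAITOK are the real names from this era; the surrounding function is illustrative:

    #include <sys/param.h>
    #include <vm/vm.h>
    #include <vm/vm_object.h>
    #include <vm/vm_page.h>

    static vm_page_t
    alloc_page_sketch(vm_object_t obj, vm_pindex_t pindex)
    {
        /* Old pattern, racing between failure and sleeping:
         *
         *     while ((m = vm_page_alloc(obj, pindex,
         *         VM_ALLOC_NORMAL)) == NULL)
         *         VM_WAIT;
         */

        /* New pattern: the allocator sleeps internally, in the
         * correct domain, until it can satisfy the request. */
        return (vm_page_alloc(obj, pindex,
            VM_ALLOC_NORMAL | VM_ALLOC_WAITOK));
    }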
|
#
c2c014f2 |
| 07-Nov-2017 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Merge ^/head r323559 through r325504.
|
Revision tags: release/10.4.0 |
|
#
e5d34ca9 |
| 23-Sep-2017 |
Enji Cooper <ngie@FreeBSD.org> |
MFhead@r320180
|
#
75e3597a |
| 22-Sep-2017 |
Kirk McKusick <mckusick@FreeBSD.org> |
Continuing efforts to provide hardening of FFS, this change adds a check hash to cylinder groups. If a check hash fails when a cylinder group is read, no further allocations are attempted in that cylinder group until it has been fixed by fsck. This avoids a class of filesystem panics related to corrupted cylinder group maps. The hash is done using crc32c.
Check hashes are added only to UFS2 and not to UFS1 as UFS1 is primarily used in embedded systems with small memories and low-powered processors which need as light-weight a filesystem as possible.
Specifics of the changes:
sys/sys/buf.h: Add BX_FSPRIV to reserve a set of eight b_xflags that may be used by individual filesystems for their own purpose. Their specific definitions are found in the header files for each filesystem that uses them. Also add fields to struct buf as noted below.
sys/kern/vfs_bio.c: It is only necessary to compute a check hash for a cylinder group when it is actually read from disk. When calling bread, you do not know whether the buffer was found in the cache or read. So a new flag (GB_CKHASH) and a pointer to a function to perform the hash have been added to breadn_flags to say that the function should be called to calculate a hash if the data has been read. The check hash is placed in b_ckhash and the B_CKHASH flag is set to indicate that a read was done and a check hash calculated. Though a rather elaborate mechanism, it should also work for check hashing other metadata in the future. A kernel internal API change was to change breada into a static function and add flags and a function pointer to a check-hash function.
sys/ufs/ffs/fs.h: Add flags for types of check hashes; stored in a new word in the superblock. Define corresponding BX_ flags for the different types of check hashes. Add a check hash word in the cylinder group.
sys/ufs/ffs/ffs_alloc.c: In ffs_getcg do the dance with breadn_flags to get a check hash and if one is provided, check it.
sys/ufs/ffs/ffs_vfsops.c: Copy across the BX_FFSTYPES flags in background writes. Update the check hash when writing out buffers that need them.
sys/ufs/ffs/ffs_snapshot.c: Recompute check hash when updating snapshot cylinder groups.
sys/libkern/crc32.c:
lib/libufs/Makefile:
lib/libufs/libufs.h:
lib/libufs/cgroup.c:
Include libkern/crc32.c in libufs and use it to compute check hashes when updating cylinder groups.
Four utilities are affected:
sbin/newfs/mkfs.c: Add the check hashes when building the cylinder groups.
sbin/fsck_ffs/fsck.h:
sbin/fsck_ffs/fsutil.c:
Verify and update check hashes when checking and writing cylinder groups.
sbin/fsck_ffs/pass5.c: Offer to add check hashes to existing filesystems. Precompute check hashes when rebuilding a cylinder group (although this will be done when it is written in fsutil.c, it is necessary to do it early, before comparing with the old cylinder group).
sbin/dumpfs/dumpfs.c: Print out the new check hash flag(s).
sbin/fsdb/Makefile: Needs to add libufs now used by pass5.c imported from fsck_ffs.
Reviewed by: kib Tested by: Peter Holm (pho)
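A caller-side sketch of the GB_CKHASH mechanism described above. The flag, b_ckhash, and the callback argument come from the commit text; the exact breadn_flags() parameter list here is a best-effort assumption for this era, and the helper names are hypothetical (the real caller is ffs_getcg()):

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/buf.h>
    #include <sys/vnode.h>
    #include <sys/ucred.h>

    static void
    ckhash_cg_sketch(struct buf *bp)
    {
        /* Compute crc32c over the cylinder group in bp->b_data and
         * store it in bp->b_ckhash; the buf layer sets B_CKHASH to
         * record that a real read (and hash) took place. */
        (void)bp;
    }

    static int
    read_cg_sketch(struct vnode *devvp, daddr_t blkno, int size,
        struct buf **bpp)
    {
        /* GB_CKHASH: invoke the callback only when the data actually
         * came from disk, not when the buffer was found in the cache. */
        return (breadn_flags(devvp, blkno, size, NULL, NULL, 0, NOCRED,
            GB_CKHASH, ckhash_cg_sketch, bpp));
    }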
|
#
b754c279 |
| 13-Sep-2017 |
Navdeep Parhar <np@FreeBSD.org> |
MFH @ r323558.
|