Revision tags: release/14.3.0-p1, release/14.2.0-p4, release/13.5.0-p2, release/14.3.0, release/13.4.0-p5, release/13.5.0-p1, release/14.2.0-p3, release/13.5.0, release/14.2.0-p2, release/14.1.0-p8, release/13.4.0-p4, release/14.1.0-p7, release/14.2.0-p1, release/13.4.0-p3 |
|
#
095f6305 |
| 06-Jan-2025 |
Mark Johnston <markj@FreeBSD.org> |
vm_pageout: Scan inactive dirty pages less aggressively
Consider a database workload where the bulk of RAM is used for a fixed-size file-backed cache. Any leftover pages are used for filesystem cac
vm_pageout: Scan inactive dirty pages less aggressively
Consider a database workload where the bulk of RAM is used for a fixed-size file-backed cache. Any leftover pages are used for filesystem caching or anonymous memory. In particular, there is little memory pressure and the inactive queue is scanned rarely.
Once in a while, the free page count dips a bit below the setpoint, triggering an inactive queue scan. Since almost all of the memory there is used by the database cache, the scan encounters only referenced and/or dirty pages, moving them to the active and laundry queues. In particular, it ends up completely depleting the inactive queue, even for a small, non-urgent free page shortage.
This scan might process many gigabytes worth of pages in one go, triggering VM object lock contention (on the DB cache file's VM object) and consuming CPU, which can cause application latency spikes.
Observing this behaviour, my observation is that we should abort scanning once we've encountered many dirty pages without meeting the shortage. In general we've tried to make the page daemon control loops avoid large bursts of work, and if a scan fails to turn up clean pages, there's not much use in moving everything to laundry queue at once. In particular, pacing this work ensures that the page daemon isn't frequently acquiring and releasing the VM object lock over long periods, especially when multiple page daemon threads are in use.
Modify the inactive scan to abort early if we encounter enough dirty pages without meeting the shortage. If the shortage hasn't been met, this will trigger shortfall laundering, wherein the laundry thread will clean as many pages as needed to meet the instantaneous shortfall. Laundered pages will be placed near the head of the inactive queue, so will be immediately visible to the page daemon during its next scan of the inactive queue.
Reviewed by: alc, kib MFC after: 1 month Sponsored by: Modirum MDPay Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D48337
show more ...
|
#
78546fb0 |
| 27-Jun-2025 |
Mark Johnston <markj@FreeBSD.org> |
vm_pageout: Make the OOM killer less aggressive
A problem can arise if we enter a shortfall of clean, inactive pages. The PID controller will attempt to overshoot the reclamation target because repe
vm_pageout: Make the OOM killer less aggressive
A problem can arise if we enter a shortfall of clean, inactive pages. The PID controller will attempt to overshoot the reclamation target because repeated scans of the inactive queue are just moving pages to the laundry queue, so inactive queue scans fail to address an instantaneous page shortage. The laundry thread will launder pages and move them back to the head of the inactive queue to be reclaimed, but this does not happen immediately, so the integral term of the PID controller grows and the page daemon tries to reclaim pages in excess of the setpoint. However, the laundry thread will only launder enough pages to meet the shortfall: vm_laundry_target(), which is the same as the setpoint.
Oonce the shortfall is addressed by the laundry thread, no more clean pages will appear in the inactive queue, but the page daemon may keep scanning dirty pages due to this overshooting. This can result in a spurious OOM kill.
Thus, reset the sequence counter if we observe that there is no instantaneous free page shortage.
Reviewed by: alc, kib MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D51015
show more ...
|
#
e3bc87ab |
| 31-May-2025 |
Doug Moore <dougm@FreeBSD.org> |
vm_pageout: fix pageout_flush
A change just made to vm_pageout_flush wrongly dismissed the variable 'runlen' and used 'count' in its place, with the unintended consequence of terminating the main lo
vm_pageout: fix pageout_flush
A change just made to vm_pageout_flush wrongly dismissed the variable 'runlen' and used 'count' in its place, with the unintended consequence of terminating the main loop of the function prematurely when the first VM_PAGER_AGAIN pageout status was encountered. Reintroduce that variable, so that the loop runs to completion.
Reported by: alc Reviewed by: alc Fixes: f2a193a967e3 ("vm_pageout: reduce number of flush() params") Differential Revision: https://reviews.freebsd.org/D50622
show more ...
|
#
f2a193a9 |
| 30-May-2025 |
Doug Moore <dougm@FreeBSD.org> |
vm_pageout: reduce number of flush() params
vm_pageout_flush is called in two places, and in both, the mreq parameter is 0. Drop that parameter, and simplify the calculations that use it.
The prunl
vm_pageout: reduce number of flush() params
vm_pageout_flush is called in two places, and in both, the mreq parameter is 0. Drop that parameter, and simplify the calculations that use it.
The prunlen and eio parameters are either both NULL, or neither NULL. Drop the prunlen parameter and, when eio is NULL, return the runlen value instead of the numpagedout parameter, which the caller ignores.
Change a param from boolean_t* to bool*.
Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D50568
show more ...
|
#
d8b03c59 |
| 18-Apr-2025 |
Mark Johnston <markj@FreeBSD.org> |
vm_pageout: Disallow invalid values for act_scan_laundry_weight
PR: 234167 MFC after: 2 weeks
|
#
98372394 |
| 17-Apr-2025 |
Doug Moore <dougm@FreeBSD.org> |
vm_pageout: rewrite cluster()
Implement vm_pageout_cluster using iterators instead of vm_page_next() and vm_page_prev(), and without gotos.
Reviewed by: kib Differential Revision: https://reviews.f
vm_pageout: rewrite cluster()
Implement vm_pageout_cluster using iterators instead of vm_page_next() and vm_page_prev(), and without gotos.
Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D49848
show more ...
|
#
55b343f4 |
| 09-Jan-2025 |
Mark Johnston <markj@FreeBSD.org> |
vm_pageout: Add a chicken switch for multithreaded PQ_INACTIVE scanning
Right now we have the vm.pageout_cpus_per_thread tunable which controls the number of threads to start up per CPU per NUMA dom
vm_pageout: Add a chicken switch for multithreaded PQ_INACTIVE scanning
Right now we have the vm.pageout_cpus_per_thread tunable which controls the number of threads to start up per CPU per NUMA domain, but after booting, it's not possible to disable multi-threaded scanning.
There is at least one workload where this mechanism doesn't work well; let's make it possible to disable it without a reboot, to simplify troubleshooting.
Reviewed by: dougm, kib MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D48377
show more ...
|
#
fe1165df |
| 09-Jan-2025 |
Mark Johnston <markj@FreeBSD.org> |
vm_pageout: Make vmd_oom a bool
No functional change intended.
Reviewed by: dougm, kib MFC after: 1 week Sponsored by: Klara, Inc. Sponsored by: Modirum MDPay Differential Revision: https://reviews
vm_pageout: Make vmd_oom a bool
No functional change intended.
Reviewed by: dougm, kib MFC after: 1 week Sponsored by: Klara, Inc. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D48376
show more ...
|
#
c5b19cef |
| 07-Dec-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
vm_map: wrap map->system_map checks into wrapper
Reviewed by: alc Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D47934
|
Revision tags: release/14.2.0, release/13.4.0 |
|
#
8bb6b413 |
| 05-Aug-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
vm/vm_pageout.c: eliminate write-only variable
PR: 280631 Fixes: acb4cb33d35838e3e86412202cd63d9021b21ce2 (non-debug builds) Sponsored by: The FreeBSD Foundation
|
#
acb4cb33 |
| 04-Aug-2024 |
Doug Moore <dougm@FreeBSD.org> |
vm_pageout: simplify pageout_cluster
Rewrite vm_pageout_cluster to eliminate redundant variables and duplicated code.
Remove tests on pindex to check for object boundary conditions, since the page_
vm_pageout: simplify pageout_cluster
Rewrite vm_pageout_cluster to eliminate redundant variables and duplicated code.
Remove tests on pindex to check for object boundary conditions, since the page_next and page_prev functions return NULL at the object boundaries. Fix an alignment error that could happen if pindex is aligned, and the first of vm_pageout_page_count flushable pages, and the page at pindex-1 is also flushable.
Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D46217
show more ...
|
#
f6ed52c1 |
| 01-Aug-2024 |
Alan Cox <alc@FreeBSD.org> |
vm: Stop reducing vm_pageout_page_count at startup
Attempting to reduce vm_pageout_page_count at startup when the machine has less than 8MB of physical memory is pointless, since we haven't run on m
vm: Stop reducing vm_pageout_page_count at startup
Attempting to reduce vm_pageout_page_count at startup when the machine has less than 8MB of physical memory is pointless, since we haven't run on machines with so little memory in ages.
Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D46206
show more ...
|
#
6d86bdf1 |
| 02-Aug-2024 |
Doug Moore <dougm@FreeBSD.org> |
vm_pageout: shrink pageout array
The array passed to vm_pageout_flush, and constructed in a middle-out fashion, can never use array element zero. Shrink the array by one, and reduce indices by one,
vm_pageout: shrink pageout array
The array passed to vm_pageout_flush, and constructed in a middle-out fashion, can never use array element zero. Shrink the array by one, and reduce indices by one, to save that bit of stack space. In the vm_object version, make the accounting look more like the pageout version.
Reported by: alc Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D46208
show more ...
|
#
e24a6552 |
| 29-Jul-2024 |
Mark Johnston <markj@FreeBSD.org> |
thread: Remove kernel stack swapping support, part 4
- Remove the IS_SWAPPED thread inhibitor state. - Remove all uses of TD_IS_SWAPPED() in the kernel. - Remove the TDF_CANSWAP flag. - Remove the P
thread: Remove kernel stack swapping support, part 4
- Remove the IS_SWAPPED thread inhibitor state. - Remove all uses of TD_IS_SWAPPED() in the kernel. - Remove the TDF_CANSWAP flag. - Remove the P_SWAPPINGOUT and P_SWAPPINGIN flags.
Tested by: pho Reviewed by: alc, imp, kib Differential Revision: https://reviews.freebsd.org/D46115
show more ...
|
#
8370e9df |
| 29-Jul-2024 |
Mark Johnston <markj@FreeBSD.org> |
vm: Remove kernel stack swapping support, part 3
- Modify PHOLD() to no longer fault in the process. - Remove _PHOLD_LITE(), which is now the same as _PHOLD(), fix up consumers. - Remove faultin()
vm: Remove kernel stack swapping support, part 3
- Modify PHOLD() to no longer fault in the process. - Remove _PHOLD_LITE(), which is now the same as _PHOLD(), fix up consumers. - Remove faultin() and its callees.
Tested by: pho Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D46114
show more ...
|
#
13a1129d |
| 29-Jul-2024 |
Mark Johnston <markj@FreeBSD.org> |
vm: Remove kernel stack swapping support, part 1
- Disconnect the swapout daemon from the page daemon. - Remove swapout() and swapout_procs().
Tested by: pho Reviewed by: alc, imp, kib Differential
vm: Remove kernel stack swapping support, part 1
- Disconnect the swapout daemon from the page daemon. - Remove swapout() and swapout_procs().
Tested by: pho Reviewed by: alc, imp, kib Differential Revision: https://reviews.freebsd.org/D46112
show more ...
|
Revision tags: release/14.1.0 |
|
#
a216e311 |
| 24-May-2024 |
Ryan Libby <rlibby@FreeBSD.org> |
vm_pageout_scan_inactive: take a lock break
In vm_pageout_scan_inactive, release the object lock when we go to refill the scan batch queue so that someone else has a chance to acquire it. This impr
vm_pageout_scan_inactive: take a lock break
In vm_pageout_scan_inactive, release the object lock when we go to refill the scan batch queue so that someone else has a chance to acquire it. This improves access latency to the object when the pagedaemon is processing many consecutive pages from a single object, and also in any case avoids a hiccup during refill for the last touched object.
Reviewed by: alc, markj (previous version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D45288
show more ...
|
Revision tags: release/13.3.0 |
|
#
29363fb4 |
| 23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl s
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script.
Sponsored by: Netflix
show more ...
|
Revision tags: release/14.0.0 |
|
#
685dc743 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
Revision tags: release/13.2.0 |
|
#
1cac76c9 |
| 14-Dec-2022 |
Andrew Gallatin <gallatin@FreeBSD.org> |
vm: reduce lock contention when processing vm batchqueues
Rather than waiting until the batchqueue is full to acquire the lock & process the queue, we now start trying to acquire the lock using tryl
vm: reduce lock contention when processing vm batchqueues
Rather than waiting until the batchqueue is full to acquire the lock & process the queue, we now start trying to acquire the lock using trylocks when the batchqueue is 1/2 full. This removes almost all contention on the vm pagequeue mutex for for our busy sendfile() based web workload. It also greadly reduces the amount of time a network driver ithread remains blocked on a mutex, and eliminates some packet drops under heavy load.
So that the system does not loose the benefit of processing large batchqueues, I've doubled the size of the batchqueues. This way, when there is no contention, we process the same batch size as before.
This has been run for several months on a busy Netflix server, as well as on my personal desktop.
Reviewed by: markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D37305
show more ...
|
Revision tags: release/12.4.0 |
|
#
0cb2610e |
| 16-Jul-2022 |
Mark Johnston <markj@FreeBSD.org> |
vm: Remove handling for OBJT_DEFAULT objects
Now that OBJT_DEFAULT objects can't be instantiated, we can simplify checks of the form object->type == OBJT_DEFAULT || (object->flags & OBJ_SWAP) != 0.
vm: Remove handling for OBJT_DEFAULT objects
Now that OBJT_DEFAULT objects can't be instantiated, we can simplify checks of the form object->type == OBJT_DEFAULT || (object->flags & OBJ_SWAP) != 0. No functional change intended.
Reviewed by: alc, kib Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35788
show more ...
|
#
e123264e |
| 20-Jun-2022 |
Mark Johnston <markj@FreeBSD.org> |
vm: Fix racy checks for swap objects
Commit 4b8365d752ef introduced the ability to dynamically register VM object types, for use by tmpfs, which creates swap-backed objects. As a part of this, check
vm: Fix racy checks for swap objects
Commit 4b8365d752ef introduced the ability to dynamically register VM object types, for use by tmpfs, which creates swap-backed objects. As a part of this, checks for such objects changed from
object->type == OBJT_DEFAULT || object->type == OBJT_SWAP
to
object->type == OBJT_DEFAULT || (object->flags & OBJ_SWAP) != 0
In particular, objects of type OBJT_DEFAULT do not have OBJ_SWAP set; the swap pager sets this flag when converting from OBJT_DEFAULT to OBJT_SWAP.
A few of these checks are done without the object lock held. It turns out that this can result in false negatives since the swap pager converts objects like so:
object->type = OBJT_SWAP; object->flags |= OBJ_SWAP;
Fix the problem by adding explicit tests for OBJT_SWAP objects in unlocked checks.
PR: 258932 Fixes: 4b8365d752ef ("Add OBJT_SWAP_TMPFS pager") Reported by: bdrewery Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35470
show more ...
|
Revision tags: release/13.1.0 |
|
#
b51927b7 |
| 10-Feb-2022 |
Konstantin Belousov <kib@FreeBSD.org> |
Revert "vm_pageout_scans: correct detection of active object"
This reverts commit 3de96d664aaaf8e3fb1ca4fc4bd864d2cf734b24.
Problem is that it is possible to reach the state with ref_count == 1 for
Revert "vm_pageout_scans: correct detection of active object"
This reverts commit 3de96d664aaaf8e3fb1ca4fc4bd864d2cf734b24.
Problem is that it is possible to reach the state with ref_count == 1 for the mapped non-anonymous object. For instance, anonymous posix shmfd or linux shmfs object could be mapped, and then corresponding file descriptor closed, dropping the object reference owned by the shmfd/shmfs file. Then the check in inactive scan assumes that the object and page are not mapped and frees the page, while they are not.
PR: 261707 Discussed with: markj Sponsored by: The FreeBSD Foundation MFC after: now
show more ...
|
#
3de96d66 |
| 16-Jan-2022 |
Konstantin Belousov <kib@FreeBSD.org> |
vm_pageout_scans: correct detection of active object
For non-anonymous swap objects, there is always a reference from the owner to the object to keep it from recycling. Account for it when deciding
vm_pageout_scans: correct detection of active object
For non-anonymous swap objects, there is always a reference from the owner to the object to keep it from recycling. Account for it when deciding should we query pmap for hardware active references for the page.
As result, we avoid unneeded calls to pmap_ts_referenced(), which for non-mapped page means avoiding unneccessary lock and unlock of the pv list.
Reviewed by: markj Discussed with: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33924
show more ...
|
#
4a864f62 |
| 14-Jan-2022 |
Mark Johnston <markj@FreeBSD.org> |
vm_pageout: Print a more accurate message to the console before an OOM kill
Previously we'd always print "out of swap space." This can be misleading, as there are other reasons an OOM kill can be t
vm_pageout: Print a more accurate message to the console before an OOM kill
Previously we'd always print "out of swap space." This can be misleading, as there are other reasons an OOM kill can be triggered. In particular, it's entirely possible to trigger an OOM kill on a system with plenty of free swap space.
Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33810
show more ...
|