#
commit 0012f373 | 15-Oct-2019 | Jeff Roberson <jeff@FreeBSD.org>
(4/6) Protect page valid with the busy lock.
Atomics are used for page busy and valid state when the shared busy is held. The details of the locking protocol and valid and dirty synchronization are in the updated vm_page.h comments.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21594
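A sketch of the consumer-side rule described above. The busy assertions are existing vm_page KPIs; vm_page_all_valid() is assumed to be the accessor this series introduces, so treat the exact name as illustrative.

    vm_page_assert_sbusied(m);
    if (vm_page_all_valid(m)) {
            /* All DEV_BSIZE blocks are valid; safe to read the page. */
    } else {
            /*
             * Setting valid bits with only a shared busy held is done
             * with atomics, since other shared busy holders may be
             * updating the field concurrently.
             */
    }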
#
commit 205be21d | 15-Oct-2019 | Jeff Roberson <jeff@FreeBSD.org>
(3/6) Add a shared object busy synchronization mechanism that blocks new page busy acquires while held.
This allows code that would need to acquire and release a very large number of page busy locks to use the old mechanism where busy is only checked and not held. This comes at the cost of false positives but never false negatives which the single consumer, vm_fault_soft_fast(), handles.
Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21592
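The pattern this enables, sketched under the assumption that the interface is a vm_object_busy()/vm_object_unbusy() pair; obj, ma, npages, i, and error are hypothetical locals.

    vm_object_busy(obj);            /* blocks new page busy acquires */
    error = 0;
    for (i = 0; i < npages; i++) {
            if (vm_page_busied(ma[i])) {
                    error = EBUSY;  /* may be a false positive */
                    break;
            }
            /* Safe to consume ma[i] without busying it. */
    }
    vm_object_unbusy(obj);
    if (error != 0) {
            /* Fall back to the slow path that busies each page. */
    }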
#
commit 8da1c098 | 15-Oct-2019 | Jeff Roberson <jeff@FreeBSD.org>
(2/6) Don't release xbusy in vm_page_remove(), defer to vm_page_free_prep().
This persists busy state across operations like rename and replace.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21549
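The resulting lifetime, as a sketch; the assertion is a real vm_page KPI, the call sequence is illustrative.

    vm_page_assert_xbusied(m);
    vm_page_remove(m);      /* m loses its object but stays xbusied */
    /* Rename or replace can act while we remain the busy owner. */
    vm_page_free(m);        /* vm_page_free_prep() finally drops xbusy */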
#
commit 63e97555 | 15-Oct-2019 | Jeff Roberson <jeff@FreeBSD.org>
(1/6) Replace busy checks with acquires where it is trivial to do so.
This is the first in a series of patches that promotes the page busy field to a first class lock that no longer requires the object lock for consistency.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21548
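The shape of the substitution, sketched; vm_page_tryxbusy() is assumed to be the try-acquire primitive used by the series.

    /* Before: a check whose stability depended on the object lock. */
    if (vm_page_busied(m))
            return (EBUSY);

    /* After: a real acquire; the busy lock itself ensures consistency. */
    if (!vm_page_tryxbusy(m))
            return (EBUSY);
    /* ... operate on m ... */
    vm_page_xunbusy(m);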
#
commit c31cec45 | 13-Oct-2019 | Konstantin Belousov <kib@FreeBSD.org>
Restore nofaulting operations after r352807
The TDP_NOFAULTING flag should be checked in vm_fault(), not in vm_fault_trap(). Otherwise kernel accesses to userspace, like vn_io_fault(), enter vm locking when they should not.
Reported and tested by: pho
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D21992
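The shape of the fix, heavily simplified; only the placement of the flag check is the point here.

    int
    vm_fault(vm_map_t map, vm_offset_t vaddr, vm_prot_t fault_type,
        int fault_flags, vm_page_t *m_hold)
    {
            /*
             * Checked here rather than in vm_fault_trap(), so callers
             * such as vn_io_fault() that enter through vm_fault()
             * still honor the no-faulting mode.
             */
            if ((curthread->td_pflags & TDP_NOFAULTING) != 0)
                    return (KERN_PROTECTION_FAILURE);
            /* ... normal fault processing ... */
    }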
#
commit 8b3bc70a | 08-Oct-2019 | Dimitry Andric <dim@FreeBSD.org>
Merge ^/head r352764 through r353315.
#
commit df08823d | 27-Sep-2019 | Konstantin Belousov <kib@FreeBSD.org>
Improve MD page fault handlers.
Centralize the calculation of the signal and ucode delivered on an unhandled page fault in the new function vm_fault_trap(). MD trap_pfault() now almost always uses the signal numbers and error codes calculated in a consistent MI way.
This introduces the protection fault compatibility sysctls to all non-x86 architectures which did not have that bug, but apparently they were already much more wrong in selecting delivered signals on protection violations.
Change the delivered signal for accesses to a mapped area after the backing object was truncated. According to the POSIX description of mmap(2): The system shall always zero-fill any partial page at the end of an object. Further, the system shall never write out any modified portions of the last page of an object which are beyond its end. References within the address range starting at pa and continuing for len bytes to whole pages following the end of an object shall result in delivery of a SIGBUS signal.
An implementation may generate SIGBUS signals when a reference would cause an error in the mapped object, such as out-of-space condition. Adjust according to the description, keeping the existing compatibility code for SIGSEGV/SIGBUS on protection failures.
For situations where kernel cannot handle page fault due to resource limit enforcement, SIGBUS with a new error code BUS_OBJERR is delivered. Also, provide a new error code SEGV_PKUERR for SIGSEGV on amd64 due to protection key access violation.
vm_fault_hold() is renamed to vm_fault(). Fixed some nits in trap_pfault()s like mis-interpreting Mach errors as errnos. Removed unneeded truncations of the fault addresses reported by hardware.
PR: 211924
Reviewed by: alc
Discussed with: jilles, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21566
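Sketch of the MD caller pattern implied above; the signatures of the MD pieces vary by architecture, and the map/ftype derivations are elided.

    static void
    trap_pfault(struct trapframe *frame, bool usermode, vm_offset_t eva)
    {
            vm_map_t map;           /* kernel or process map; elided */
            vm_prot_t ftype;        /* derived from the error code; elided */
            int rv, signo, ucode;

            /* The MI layer computes both the disposition and the signal. */
            rv = vm_fault_trap(map, eva, ftype, VM_FAULT_NORMAL,
                &signo, &ucode);
            if (rv == KERN_SUCCESS)
                    return;
            if (usermode) {
                    /* Deliver signo/ucode, e.g. through a ksiginfo. */
            } else {
                    /* Kernel fault: try pcb_onfault, otherwise panic. */
            }
    }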
#
commit e8bcf696 | 16-Sep-2019 | Mark Johnston <markj@FreeBSD.org>
Revert r352406, which contained changes I didn't intend to commit.
#
commit 41fd4b94 | 16-Sep-2019 | Mark Johnston <markj@FreeBSD.org>
Fix a couple of nits in r352110.
- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.
Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639
#
commit 61c1328e | 13-Sep-2019 | Dimitry Andric <dim@FreeBSD.org>
Merge ^/head r352105 through r352307.
#
commit 11b57401 | 12-Sep-2019 | Hans Petter Selasky <hselasky@FreeBSD.org>
Use REFCOUNT_COUNT() to obtain refcount where appropriate.
Refcount waiting will set some flag bits in the refcount value. Make sure these bits get cleared by using the REFCOUNT_COUNT() macro to obtain the actual refcount.
Differential Revision: https://reviews.freebsd.org/D21620
Reviewed by: kib@, markj@
MFC after: 1 week
Sponsored by: Mellanox Technologies
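The mechanism, sketched; the flag layout below is illustrative rather than copied from sys/refcount.h, and obj/old are hypothetical.

    /* One high bit of the counter word marks the presence of waiters. */
    #define REFCOUNT_WAITER    (1U << 31)                /* illustrative */
    #define REFCOUNT_COUNT(x)  ((x) & ~REFCOUNT_WAITER)  /* true count */

    old = obj->ref_count;                   /* wrong: may include the flag */
    old = REFCOUNT_COUNT(obj->ref_count);   /* right: flag masked off */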
#
commit 4cdea4a8 | 10-Sep-2019 | Jeff Roberson <jeff@FreeBSD.org>
Use the sleepq lock rather than the page lock to protect against wakeup races with page busy state.
The object lock is still used as an interlock to ensure that the identity stays valid. Most callers should use vm_page_sleep_if_busy() to handle the locking particulars.
Reviewed by: alc, kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21255
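The recommended caller pattern, sketched; object, pindex, and the sleep message are illustrative.

    VM_OBJECT_WLOCK(object);
    for (;;) {
            m = vm_page_lookup(object, pindex);
            if (m == NULL || !vm_page_sleep_if_busy(m, "vmsbsy"))
                    break;
            /*
             * Slept: the object lock was dropped and re-acquired, and
             * the page identity may have changed, so look it up again.
             */
    }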
#
commit fee2a2fa | 09-Sep-2019 | Mark Johnston <markj@FreeBSD.org>
Change synchronization rules for vm_page reference counting.
There are several mechanisms by which a vm_page reference is held, preventing the page from being freed back to the page allocator. In particular, holding the page's object lock is sufficient to prevent the page from being freed; holding the busy lock or a wiring is sufficient as well. These references are protected by the page lock, which must therefore be acquired for many per-page operations. This results in false sharing since the page locks are external to the vm_page structures themselves and each lock protects multiple structures.
Transition to using an atomically updated per-page reference counter. The object's reference is counted using a flag bit in the counter. A second flag bit is used to atomically block new references via pmap_extract_and_hold() while removing managed mappings of a page. Thus, the reference count of a page is guaranteed not to increase if the page is unbusied, unmapped, and the object's write lock is held. As a consequence of this, the page lock no longer protects a page's identity; operations which move pages between objects are now synchronized solely by the objects' locks.
The vm_page_wire() and vm_page_unwire() KPIs are changed. The former requires that either the object lock or the busy lock is held. The latter no longer has a return value and may free the page if it releases the last reference to that page. vm_page_unwire_noq() behaves the same as before; the caller is responsible for checking its return value and freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is introduced for use in pmap_extract_and_hold(). It fails if the page is concurrently being unmapped, typically triggering a fallback to the fault handler. vm_page_wire() no longer requires the page lock and vm_page_unwire() now internally acquires the page lock when releasing the last wiring of a page (since the page lock still protects a page's queue state). In particular, synchronization details are no longer leaked into the caller.
The change excises the page lock from several frequently executed code paths. In particular, vm_object_terminate() no longer bounces between page locks as it releases an object's pages, and direct I/O and sendfile(SF_NOCACHE) completions no longer require the page lock. In these latter cases we now get linear scalability in the common scenario where different threads are operating on different files.
__FreeBSD_version is bumped. The DRM ports have been updated to accommodate the KPI changes.
Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
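The new rules condensed into a sketch; the calls follow the description above, with locals and context treated as illustrative.

    /* Wiring needs the object lock or the busy lock, not the page lock. */
    VM_OBJECT_RLOCK(object);
    vm_page_wire(m);                /* page cannot be freed from here on */
    VM_OBJECT_RUNLOCK(object);

    /* ... use the page with no object or page lock held ... */

    vm_page_unwire(m, PQ_ACTIVE);   /* void; may itself free the page */

    /* pmap_extract_and_hold()-style code uses the guarded variant: */
    if (!vm_page_wire_mapped(m))
            m = NULL;               /* concurrently being unmapped; the
                                       caller falls back to the fault
                                       handler */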
#
commit 245139c6 | 16-Aug-2019 | Konstantin Belousov <kib@FreeBSD.org>
Fix OOM handling of some corner cases.
In addition to the pagedaemon initiating OOM, also do it from the vm_fault() internals. Namely, if a thread waits for a free page to satisfy a page fault for more than a preconfigured amount of time, trigger OOM. These triggers are rate-limited to account for the usual case of several threads of the same multi-threaded process entering the fault handler simultaneously. The faults from pagedaemon threads participate in the calculation of the OOM rate but are not subject to the limit.
Reviewed by: markj (previous version)
Tested by: pho
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D13671
#
commit a63915c2 | 28-Jul-2019 | Alan Somers <asomers@FreeBSD.org>
MFHead @r350386
Sponsored by: The FreeBSD Foundation
#
commit eeacb3b0 | 08-Jul-2019 | Mark Johnston <markj@FreeBSD.org>
Merge the vm_page hold and wire mechanisms.
The hold_count and wire_count fields of struct vm_page are separate reference counters with similar semantics. The remaining essential differences are that holds are not counted as a reference with respect to LRU, and holds have an implicit free-on-last unhold semantic whereas vm_page_unwire() callers must explicitly determine whether to free the page once the last reference to the page is released.
This change removes the KPIs which directly manipulate hold_count. Functions such as vm_fault_quick_hold_pages() now return wired pages instead. Since r328977 the overhead of maintaining LRU for wired pages is lower, and in many cases vm_fault_quick_hold_pages() callers would swap holds for wirings on the returned pages anyway, so with this change we remove a number of page lock acquisitions.
No functional change is intended. __FreeBSD_version is bumped.
Reviewed by: alc, kib
Discussed with: jeff
Discussed with: jhb, np (cxgbe)
Tested by: pho (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19247
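For callers, the visible contract is unchanged apart from the pages coming back wired; a sketch, with uaddr and len as hypothetical locals.

    vm_page_t ma[4];
    int n;

    n = vm_fault_quick_hold_pages(&curproc->p_vmspace->vm_map,
        uaddr, len, VM_PROT_READ, ma, nitems(ma));
    if (n > 0) {
            /* ... access the pages (sf_buf, direct map, ...) ... */
            vm_page_unhold_pages(ma, n);    /* now releases the wirings */
    }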
Revision tags: release/11.3.0
#
commit 7f49ce7a | 28-Jun-2019 | Alan Somers <asomers@FreeBSD.org>
MFHead @349476
Sponsored by: The FreeBSD Foundation
#
commit 0fd977b3 | 26-Jun-2019 | Mark Johnston <markj@FreeBSD.org>
Add a return value to vm_page_remove().
Use it to indicate whether the page may be safely freed following its removal from the object. Also change vm_page_remove() to assume that the page's object pointer is non-NULL, and have callers perform this check instead.
This is a step towards an implementation of an atomic reference counter for each physical page structure.
Reviewed by: alc, dougm, kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20758
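The new division of labor, sketched; whether callers spell the free exactly like this is an assumption.

    /* The caller, not vm_page_remove(), now checks the object pointer
       and decides about freeing. */
    KASSERT(m->object != NULL, ("page %p has no object", m));
    if (vm_page_remove(m))
            vm_page_free(m);        /* no references remained */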
#
commit 0269ae4c | 06-Jun-2019 | Alan Somers <asomers@FreeBSD.org>
MFHead @348740
Sponsored by: The FreeBSD Foundation
#
commit d842aa51 | 02-Jun-2019 | Mark Johnston <markj@FreeBSD.org>
Add a vm_page_wired() predicate.
Use it instead of accessing the wire_count field directly. No functional change intended.
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20485
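The substitution in callers, as a sketch:

    /* Before: the representation leaked into callers. */
    if (m->wire_count > 0)
            return;
    /* After: the predicate encapsulates it, which eases the
       reference-count rework above. */
    if (vm_page_wired(m))
            return;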
#
commit 7648bc9f | 13-May-2019 | Alan Somers <asomers@FreeBSD.org>
MFHead @347527
Sponsored by: The FreeBSD Foundation
#
commit 78022527 | 05-May-2019 | Konstantin Belousov <kib@FreeBSD.org>
Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition.
The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as the MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on map entry removal.
Operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writechk() is now racy and its use was eliminated everywhere except access(). An atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around the VOP_SETATTR() call, the lack of which is arguably a bug on its own.
nullfs always bypasses v_writecount to the lower vnode, so the nullfs vnode keeps its own v_writecount correct, and the lower vnode gets all references, since object->handle is always the lower vnode.
On the text vnode's vm object dealloc, the v_writecount value is reset to zero, and the deadfs vop_unset_text short-circuits the operation. Reclamation of the lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left.
Reviewed by: markj, trasz
Tested by: mjg, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D19923
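The core invariant can be sketched as follows; this is not the committed VOP code, only the rule that v_writecount > 0 (writers) and v_writecount <= -1 (text references) exclude each other under the vnode interlock.

    VI_LOCK(vp);
    if (vp->v_writecount > 0) {
            VI_UNLOCK(vp);
            return (ETXTBSY);       /* writers exist: cannot become text */
    }
    vp->v_writecount--;             /* take a text ref: value goes <= -1 */
    VI_UNLOCK(vp);
    return (0);

The mirror-image check makes an attempt to add a write reference fail while v_writecount is negative.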
#
commit 415e34c4 | 29-Mar-2019 | Alan Somers <asomers@FreeBSD.org>
MFHead@r345677
#
commit 64087fd7 | 21-Mar-2019 | Mark Johnston <markj@FreeBSD.org>
Disallow preemptive creation of wired superpage mappings.
There are some unusual cases where a process may cause an mlock()ed range of memory to be unmapped. If the application subsequently faults on that region, the handler may attempt to create a superpage mapping backed by the resident, wired pages. However, the pmap code responsible for creating such a mapping (pmap_enter_pde() on i386 and amd64) does not ensure that a leaf page table page is available if the superpage is later demoted; the demotion operation must therefore perform a non-blocking page allocation and must unmap the entire superpage if the allocation fails. The pmap layer ensures that this can never happen for wired mappings, and so the case described above breaks that invariant.
For now, simply ensure that the MI fault handler never attempts to create a wired superpage except via promotion.
Reviewed by: kib
Reported by: syzbot+292d3b0416c27c131505@syzkaller.appspotmail.com
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19670
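The guard, sketched against the description; the exact condition in the committed vm_fault_soft_fast() may differ, and fs, m_super, and psind are illustrative.

    /*
     * Never create a wired superpage mapping preemptively: a later
     * demotion could fail its page-table-page allocation and would
     * then have to unmap the wired range.
     */
    if (m_super != NULL && (fs->entry->eflags & MAP_ENTRY_WIRED) == 0)
            psind = m_super->psind; /* otherwise map at small page size */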
#
commit f9856d08 | 21-Mar-2019 | Alan Somers <asomers@FreeBSD.org>
MFHead @345353