History log of /freebsd/sys/vm/vm_fault.c (Results 101 – 125 of 937)
# 0012f373 15-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

(4/6) Protect page valid with the busy lock.

Atomics are used for page busy and valid state when the shared busy is
held. The details of the locking protocol and valid and dirty
synchronization are in the updated vm_page.h comments.

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21594
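
The protocol is easy to model in miniature. The sketch below uses
hypothetical names and C11 atomics standing in for the kernel's
atomic(9) primitives; it illustrates the rule, it is not the committed
code. An exclusive busy holder is the sole writer and may update the
valid bits with plain loads and stores, while a shared busy holder
must use an atomic read-modify-write because other shared holders may
update the field concurrently.

    /* Model of the busy/valid rule; hypothetical names, C11 atomics. */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    struct page_model {
            _Atomic uint8_t valid;  /* validity bitmask, one bit per chunk */
            bool            xbusy;  /* caller holds the exclusive busy lock */
    };

    static void
    page_set_valid(struct page_model *m, uint8_t bits)
    {
            if (m->xbusy) {
                    /* Exclusive busy: no concurrent writer exists. */
                    uint8_t v = atomic_load_explicit(&m->valid,
                        memory_order_relaxed);
                    atomic_store_explicit(&m->valid, v | bits,
                        memory_order_relaxed);
            } else {
                    /* Shared busy: other shared holders may race. */
                    atomic_fetch_or_explicit(&m->valid, bits,
                        memory_order_release);
            }
    }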


# 205be21d 15-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

(3/6) Add a shared object busy synchronization mechanism that blocks new page
busy acquires while held.

This allows code that would need to acquire and release a very large number
of page busy locks to use the old mechanism where busy is only checked and
not held. This comes at the cost of false positives but never false
negatives, which the single consumer, vm_fault_soft_fast(), handles.

Reviewed by: kib
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21592
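
One way to picture the mechanism, as a sketch under assumed names
rather than the actual vm_object code: the object carries an atomic
blocker count, a bulk operation raises the count instead of busying
every page, and a page-busy acquire that observes a nonzero count
fails and retries. The failure may be spurious (a false positive), but
an acquire can never succeed while a blocker is held (no false
negatives).

    /* Sketch of an object-wide busy blocker; names are hypothetical. */
    #include <stdatomic.h>
    #include <stdbool.h>

    struct object_model {
            _Atomic int busy_blockers;  /* > 0 blocks new busy acquires */
    };

    static void
    object_busy_enter(struct object_model *obj)
    {
            atomic_fetch_add(&obj->busy_blockers, 1);
    }

    static void
    object_busy_exit(struct object_model *obj)
    {
            atomic_fetch_sub(&obj->busy_blockers, 1);
    }

    static bool
    page_try_busy(struct object_model *obj)
    {
            /* The object-wide blocker is checked before the page lock. */
            if (atomic_load(&obj->busy_blockers) > 0)
                    return (false); /* possibly spurious; caller retries */
            /* ... acquire the per-page busy lock here ... */
            return (true);
    }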


# 8da1c098 15-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

(2/6) Don't release xbusy in vm_page_remove(), defer to vm_page_free_prep().

This persists busy state across operations like rename and replace.

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21549


# 63e97555 15-Oct-2019 Jeff Roberson <jeff@FreeBSD.org>

(1/6) Replace busy checks with acquires where it is trivial to do so.

This is the first in a series of patches that promotes the page busy field
to a first class lock that no longer requires the object lock for
consistency.

Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix, Intel
Differential Revision: https://reviews.freebsd.org/D21548
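
The difference between a busy check and a busy acquire can be modeled
as follows; the layout is hypothetical, since the real busy word also
packs waiter flags. A check merely loads the state, which may go stale
immediately; an acquire installs the new state with a compare-and-swap,
so it cannot change underneath the caller.

    /* Sketch: shared-busy tryacquire via CAS; layout is hypothetical. */
    #include <stdatomic.h>
    #include <stdbool.h>

    #define BUSY_XBUSY      (-1)    /* word value: exclusively busied */

    static bool
    page_try_sbusy(_Atomic int *busy)
    {
            int old = atomic_load(busy);

            do {
                    if (old == BUSY_XBUSY)
                            return (false); /* held exclusively elsewhere */
            } while (!atomic_compare_exchange_weak(busy, &old, old + 1));
            return (true);  /* shared-holder count bumped atomically */
    }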


# c31cec45 13-Oct-2019 Konstantin Belousov <kib@FreeBSD.org>

Restore nofaulting operations after r352807

The TDP_NOFAULTING flag should be checked in vm_fault(), not in
vm_fault_trap(). Otherwise kernel accesses to userspace, like
vn_io_fault(), enter vm locking when they should not.

Reported and tested by: pho
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
Differential revision: https://reviews.freebsd.org/D21992
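
The fix presumably amounts to an early bail-out at the top of
vm_fault() itself, along these lines (a sketch of the shape, not the
verbatim diff):

    /* Sketch of the early check in vm_fault(); not the verbatim diff. */
    int
    vm_fault(vm_map_t map, vm_offset_t vaddr, vm_prot_t fault_type,
        int fault_flags, vm_page_t *m_hold)
    {
            /*
             * Honor TDP_NOFAULTING before any VM locks are taken, so
             * that vn_io_fault()-style kernel accesses to userspace
             * fail fast instead of entering VM locking.
             */
            if ((curthread->td_pflags & TDP_NOFAULTING) != 0)
                    return (KERN_PROTECTION_FAILURE);
            /* ... normal fault processing follows ... */
    }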


# 8b3bc70a 08-Oct-2019 Dimitry Andric <dim@FreeBSD.org>

Merge ^/head r352764 through r353315.


# df08823d 27-Sep-2019 Konstantin Belousov <kib@FreeBSD.org>

Improve MD page fault handlers.

Centralize calculation of signal and ucode delivered on unhandled page
fault in new function vm_fault_trap(). MD trap_pfault() now almost
always uses the signal numbers and error codes calculated in
consistent MI way.

This introduces the protection fault compatibility sysctls to all
non-x86 architectures which did not have that bug, but apparently they
were already much more wrong in selecting delivered signals on
protection violations.

Change the delivered signal for accesses to a mapped area after the
backing object was truncated. According to the POSIX description of
mmap(2):

    The system shall always zero-fill any partial page at the end of an
    object. Further, the system shall never write out any modified
    portions of the last page of an object which are beyond its
    end. References within the address range starting at pa and
    continuing for len bytes to whole pages following the end of an
    object shall result in delivery of a SIGBUS signal.

    An implementation may generate SIGBUS signals when a reference
    would cause an error in the mapped object, such as out-of-space
    condition.

Adjust according to the description, keeping the existing
compatibility code for SIGSEGV/SIGBUS on protection failures.

For situations where kernel cannot handle page fault due to resource
limit enforcement, SIGBUS with a new error code BUS_OBJERR is
delivered. Also, provide a new error code SEGV_PKUERR for SIGSEGV on
amd64 due to protection key access violation.

vm_fault_hold() is renamed to vm_fault(). Fixed some nits in
trap_pfault()s like mis-interpreting Mach errors as errnos. Removed
unneeded truncations of the fault addresses reported by hardware.

PR: 211924
Reviewed by: alc
Discussed with: jilles, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21566
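
A hedged sketch of the centralized translation, restricted to the
cases the message names; the authoritative table, including the
compatibility sysctls and the amd64 SEGV_PKUERR case, lives in
vm_fault_trap().

    /* Sketch of the MI signal/ucode selection; not the verbatim code. */
    static void
    fault_to_signal(int result, int *signo, int *ucode)
    {
            switch (result) {
            case KERN_FAILURE:
            case KERN_INVALID_ADDRESS:
                    *signo = SIGSEGV;   /* no mapping at the address */
                    *ucode = SEGV_MAPERR;
                    break;
            case KERN_PROTECTION_FAILURE:
                    *signo = SIGSEGV;   /* compat sysctls may pick SIGBUS */
                    *ucode = SEGV_ACCERR;
                    break;
            case KERN_OUT_OF_BOUNDS:
                    /* Access past the end of a truncated object. */
                    *signo = SIGBUS;
                    *ucode = BUS_OBJERR;
                    break;
            case KERN_RESOURCE_SHORTAGE:
                    /* Resource limits prevented handling the fault. */
                    *signo = SIGBUS;
                    *ucode = BUS_OBJERR;
                    break;
            default:
                    *signo = SIGSEGV;
                    *ucode = SEGV_MAPERR;
                    break;
            }
    }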


# e8bcf696 16-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Revert r352406, which contained changes I didn't intend to commit.


# 41fd4b94 16-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Fix a couple of nits in r352110.

- Remove a dead variable from the amd64 pmap_extract_and_hold().
- Fix grammar in the vm_page_wire man page.

Reported by: alc
Reviewed by: alc, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21639


# 61c1328e 13-Sep-2019 Dimitry Andric <dim@FreeBSD.org>

Merge ^/head r352105 through r352307.


# 11b57401 12-Sep-2019 Hans Petter Selasky <hselasky@FreeBSD.org>

Use REFCOUNT_COUNT() to obtain refcount where appropriate.
Refcount waiting will set some flag bits in the refcount value.
Make sure these bits get cleared by using the REFCOUNT_COUNT()
macro to obtain the actual refcount.

Differential Revision: https://reviews.freebsd.org/D21620
Reviewed by: kib@, markj@
MFC after: 1 week
Sponsored by: Mellanox Technologies
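
The hazard is easy to see in miniature: if the waiter flag lives in
the high bit of the refcount word, a raw read can look like an
enormous count, and the masking macro recovers the true value. The
following sketch shows the layout; see sys/refcount.h for the
authoritative definitions.

    /* Sketch of the flag-in-counter layout; consult sys/refcount.h. */
    #include <stdbool.h>

    #define REFCOUNT_WAITER         (1U << 31) /* a waiter is sleeping */
    #define REFCOUNT_COUNT(x)       ((x) & ~REFCOUNT_WAITER)

    static bool
    is_last_reference(unsigned int raw)
    {
            /* Mask the flag bit out before interpreting the count. */
            return (REFCOUNT_COUNT(raw) == 1);
    }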


# 4cdea4a8 10-Sep-2019 Jeff Roberson <jeff@FreeBSD.org>

Use the sleepq lock rather than the page lock to protect against wakeup
races with page busy state. The object lock is still used as an interlock
to ensure that the identity stays valid. Most callers should use
vm_page_sleep_if_busy() to handle the locking particulars.

Reviewed by: alc, kib, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21255


# fee2a2fa 09-Sep-2019 Mark Johnston <markj@FreeBSD.org>

Change synchronization rules for vm_page reference counting.

There are several mechanisms by which a vm_page reference is held,
preventing the page from being freed back to the page allocator. In
particular, holding the page's object lock is sufficient to prevent the
page from being freed; holding the busy lock or a wiring is sufficient as
well. These references are protected by the page lock, which must
therefore be acquired for many per-page operations. This results in
false sharing since the page locks are external to the vm_page
structures themselves and each lock protects multiple structures.

Transition to using an atomically updated per-page reference counter.
The object's reference is counted using a flag bit in the counter. A
second flag bit is used to atomically block new references via
pmap_extract_and_hold() while removing managed mappings of a page.
Thus, the reference count of a page is guaranteed not to increase if the
page is unbusied, unmapped, and the object's write lock is held. As
a consequence of this, the page lock no longer protects a page's
identity; operations which move pages between objects are now
synchronized solely by the objects' locks.

The vm_page_wire() and vm_page_unwire() KPIs are changed. The former
requires that either the object lock or the busy lock is held. The
latter no longer has a return value and may free the page if it releases
the last reference to that page. vm_page_unwire_noq() behaves the same
as before; the caller is responsible for checking its return value and
freeing or enqueuing the page as appropriate. vm_page_wire_mapped() is
introduced for use in pmap_extract_and_hold(). It fails if the page is
concurrently being unmapped, typically triggering a fallback to the
fault handler. vm_page_wire() no longer requires the page lock and
vm_page_unwire() now internally acquires the page lock when releasing
the last wiring of a page (since the page lock still protects a page's
queue state). In particular, synchronization details are no longer
leaked into the caller.

The change excises the page lock from several frequently executed code
paths. In particular, vm_object_terminate() no longer bounces between
page locks as it releases an object's pages, and direct I/O and
sendfile(SF_NOCACHE) completions no longer require the page lock. In
these latter cases we now get linear scalability in the common scenario
where different threads are operating on different files.

__FreeBSD_version is bumped. The DRM ports have been updated to
accommodate the KPI changes.

Reviewed by: jeff (earlier version)
Tested by: gallatin (earlier version), pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20486
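
A sketch of the counter layout and of the mapped-wire attempt the
message describes; the flag names follow the description above, C11
atomics stand in for the kernel primitives, and vm_page.h holds the
real encoding.

    /* Model of the per-page reference counter; not the committed code. */
    #include <stdatomic.h>
    #include <stdbool.h>

    #define VPRC_OBJREF     (1U << 31)  /* the object holds a reference */
    #define VPRC_BLOCKED    (1U << 30)  /* refs via pmap lookup blocked */

    struct page_model {
            _Atomic unsigned int ref_count;
    };

    /*
     * Try to wire a page found via a pmap lookup.  Fails if the page
     * is concurrently being unmapped (VPRC_BLOCKED is set), in which
     * case the caller falls back to the fault handler.
     */
    static bool
    page_wire_mapped(struct page_model *m)
    {
            unsigned int old = atomic_load(&m->ref_count);

            do {
                    if ((old & VPRC_BLOCKED) != 0)
                            return (false);
            } while (!atomic_compare_exchange_weak(&m->ref_count, &old,
                old + 1));
            return (true);
    }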


# 245139c6 16-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

Fix OOM handling of some corner cases.

In addition to pagedaemon initiating OOM, also do it from the
vm_fault() internals. Namely, if a thread waits for a free page to
satisfy a page fault for some preconfigured amount of time, trigger OOM.
These triggers are rate-limited, since it is common for several
threads of the same multi-threaded process to enter the fault handler
simultaneously. Faults from pagedaemon threads participate in the
calculation of the OOM rate but are not subject to the limit.

Reviewed by: markj (previous version)
Tested by: pho
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D13671
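
The rate limit matters because a multi-threaded process typically
drives many threads into the fault handler at once; without it, every
waiter would declare OOM independently. A minimal model of such a
limiter follows; the names are hypothetical and the kernel's actual
mechanism differs in detail.

    /* Sketch of a rate-limited OOM trigger; names are hypothetical. */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <time.h>

    static _Atomic time_t oom_last_trigger;
    static const time_t oom_min_interval = 10;  /* seconds between OOMs */

    static bool
    oom_should_trigger(void)
    {
            time_t now = time(NULL);
            time_t last = atomic_load(&oom_last_trigger);

            /* Only the first of a burst of waiting threads fires OOM. */
            if (now - last < oom_min_interval)
                    return (false);
            return (atomic_compare_exchange_strong(&oom_last_trigger,
                &last, now));
    }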


# a63915c2 28-Jul-2019 Alan Somers <asomers@FreeBSD.org>

MFHead @r350386

Sponsored by: The FreeBSD Foundation


# eeacb3b0 08-Jul-2019 Mark Johnston <markj@FreeBSD.org>

Merge the vm_page hold and wire mechanisms.

The hold_count and wire_count fields of struct vm_page are separate
reference counters with similar semantics. The remaining essential
differences are that holds are not counted as a reference with respect
to LRU, and holds have an implicit free-on-last unhold semantic whereas
vm_page_unwire() callers must explicitly determine whether to free the
page once the last reference to the page is released.

This change removes the KPIs which directly manipulate hold_count.
Functions such as vm_fault_quick_hold_pages() now return wired pages
instead. Since r328977 the overhead of maintaining LRU for wired pages
is lower, and in many cases vm_fault_quick_hold_pages() callers would
swap holds for wirings on the returned pages anyway, so with this change
we remove a number of page lock acquisitions.

No functional change is intended. __FreeBSD_version is bumped.

Reviewed by: alc, kib
Discussed with: jeff
Discussed with: jhb, np (cxgbe)
Tested by: pho (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19247


Revision tags: release/11.3.0
# 7f49ce7a 28-Jun-2019 Alan Somers <asomers@FreeBSD.org>

MFHead @349476

Sponsored by: The FreeBSD Foundation


# 0fd977b3 26-Jun-2019 Mark Johnston <markj@FreeBSD.org>

Add a return value to vm_page_remove().

Use it to indicate whether the page may be safely freed following
its removal from the object. Also change vm_page_remove() to assume
that the page's object pointer is non-NULL, and have callers perform
this check instead.

This is a step towards an implementation of an atomic reference counter
for each physical page structure.

Reviewed by: alc, dougm, kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20758
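
The new contract presumably reduces to the following shape (a sketch,
not the committed diff): assert the object pointer that callers now
guarantee, unlink the page, and report whether the caller may free it.
The vm_page_wired() predicate appears in the d842aa51 entry below.

    /* Sketch of the new contract; not the committed diff. */
    bool
    vm_page_remove(vm_page_t m)
    {

            KASSERT(m->object != NULL,
                ("vm_page_remove: page %p has no object", m));
            /* ... unlink the page from its object ... */

            /* Tell the caller whether the page may be safely freed. */
            return (!vm_page_wired(m));
    }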


# 0269ae4c 06-Jun-2019 Alan Somers <asomers@FreeBSD.org>

MFHead @348740

Sponsored by: The FreeBSD Foundation


# d842aa51 02-Jun-2019 Mark Johnston <markj@FreeBSD.org>

Add a vm_page_wired() predicate.

Use it instead of accessing the wire_count field directly. No
functional change intended.

Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20485
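
The predicate itself is presumably a one-line wrapper over the field,
along these lines; the point is that callers stop depending on the raw
wire_count layout.

    /* Sketch of the accessor; hides the raw wire_count field. */
    static inline bool
    vm_page_wired(vm_page_t m)
    {

            return (m->wire_count > 0);
    }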


# 7648bc9f 13-May-2019 Alan Somers <asomers@FreeBSD.org>

MFHead @347527

Sponsored by: The FreeBSD Foundation


# 78022527 05-May-2019 Konstantin Belousov <kib@FreeBSD.org>

Switch to use shared vnode locks for text files during image activation.

kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount. To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as the MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on map entry removal.

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change. vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP. vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs always bypasses v_writecount to the lower vnode, so the nullfs
vnode keeps its own v_writecount correct, and the lower vnode receives
all references, since object->handle is always the lower vnode.

On the text vnode's VM object deallocation, the v_writecount value is
reset to zero, and the deadfs vop_unset_text short-circuits the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by: markj, trasz
Tested by: mjg, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D19923
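
The invariant can be shown with a small sketch: writers drive
v_writecount positive, text mappings drive it negative, and each
transition refuses to mix the two. The helper names here are
hypothetical; the real checks live in VOP_ADD_WRITECOUNT() and
VOP_SET_TEXT() and run under the vnode interlock.

    /* Model of the v_writecount sign convention; hypothetical names. */
    #include <errno.h>

    struct vnode_model {
            int v_writecount;   /* > 0: writers; <= -1: text references */
    };

    static int
    vnode_add_writer(struct vnode_model *vp)
    {
            if (vp->v_writecount < 0)
                    return (ETXTBSY);   /* active text image: deny write */
            vp->v_writecount++;
            return (0);
    }

    static int
    vnode_set_text(struct vnode_model *vp)
    {
            if (vp->v_writecount > 0)
                    return (EBUSY);     /* writers present: deny text map */
            vp->v_writecount--;         /* each text ref decrements */
            return (0);
    }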


# 415e34c4 29-Mar-2019 Alan Somers <asomers@FreeBSD.org>

MFHead@r345677


# 64087fd7 21-Mar-2019 Mark Johnston <markj@FreeBSD.org>

Disallow preemptive creation of wired superpage mappings.

There are some unusual cases where a process may cause an mlock()ed
range of memory to be unmapped. If the application subsequently
faults on that region, the handler may attempt to create a superpage
mapping backed by the resident, wired pages. However, the pmap code
responsible for creating such a mapping (pmap_enter_pde() on i386
and amd64) does not ensure that a leaf page table page is available
if the superpage is later demoted; the demotion operation must therefore
perform a non-blocking page allocation and must unmap the entire
superpage if the allocation fails. The pmap layer ensures that this
can never happen for wired mappings, and so the case described above
breaks that invariant.

For now, simply ensure that the MI fault handler never attempts to
create a wired superpage except via promotion.

Reviewed by: kib
Reported by: syzbot+292d3b0416c27c131505@syzkaller.appspotmail.com
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19670
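
In the MI handler the change presumably reduces to a guard of this
shape in the soft-fault path (a sketch of the condition, not the
verbatim diff):

    /* Sketch: psind selects the page size index of the mapping. */
    psind = 0;
    if (m_super != NULL && !wired) {
            /*
             * Install a superpage mapping only for unwired faults;
             * wired superpages may arise only via promotion in pmap.
             */
            psind = m_super->psind;
            m = m_super;
    }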


# f9856d08 21-Mar-2019 Alan Somers <asomers@FreeBSD.org>

MFHead @345353

