#
77131528 |
| 14-Jun-2019 |
Doug Moore <dougm@FreeBSD.org> |
Avoid using the prev field of vm_map_entry_t in two functions that iterate over consecutive vm_map entries, and that can easily just 'remember' the prev value instead of looking it up.
Approved by:
Avoid using the prev field of vm_map_entry_t in two functions that iterate over consecutive vm_map entries, and that can easily just 'remember' the prev value instead of looking it up.
Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20628
show more ...
|
#
af1d6d6a |
| 13-Jun-2019 |
Doug Moore <dougm@FreeBSD.org> |
Create a function for creating objects to back map entries, and one for giving cred to a map entry backed by an object, and use them instead of the code duplicated inline now.
Approved by: kib (ment
Create a function for creating objects to back map entries, and one for giving cred to a map entry backed by an object, and use them instead of the code duplicated inline now.
Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20370
show more ...
|
#
e65d58a0 |
| 12-Jun-2019 |
Doug Moore <dougm@FreeBSD.org> |
To test to see if a free space is big enough compare the required length to the difference of the two offsets that define the gap, to avoid overflow, rather that adding the length to an offset and co
To test to see if a free space is big enough compare the required length to the difference of the two offsets that define the gap, to avoid overflow, rather that adding the length to an offset and comparing that to another offset.
This addresses an overflow issue reported by Peter Holm on i386.
Reported by: pho Tested by: pho Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20594
show more ...
|
#
5a0879da |
| 10-Jun-2019 |
Doug Moore <dougm@FreeBSD.org> |
The computations of vm_map_splay_split and vm_map_splay_merge touch both children of every entry on the search path as part of updating values of the max_free field. By comparing the max_free values
The computations of vm_map_splay_split and vm_map_splay_merge touch both children of every entry on the search path as part of updating values of the max_free field. By comparing the max_free values of an entry and its child on the search path, the code can avoid accessing the child off the path in cases where the max_free value decreases along the path.
Specifically, this patch changes splay_split so that the max_free field of every entry on the search path is replaced, temporarily, by the max_free field from its child not on the search path or, if the child in that direction is NULL, then a difference between start and end values of two pointers already available in the split code, without following any next or prev pointers. However, to find that max_free value does not require looking toward that other child if either the child on the search path has a lower max_free value, or the current max_free value is zero, because in either case we know that the value of max_free for the other child is the value we already have. So, the changes to vm_entry_splay_split make sure that we know all the off-search-path entries we will need to complete the splay, without looking at all of them. There is an exception at the bottom of the search path where we cannot rely on the max_free value in the direction of the NULL pointer that ends the search, because of the behavior of entry-clipping code.
The corresponding change to vm_splay_entry_merge makes it simpler, since it's just reversing pointers and updating running maxima.
In a test intended to exercise vigorously the vm_map implementation, the effect of this change was to reduce the data cache miss rate by 10-14% and the running time by 5-7%.
Tested by: pho Reviewed by: alc Approved by: kib (mentor) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D19826
show more ...
|
#
0b96ca33 |
| 10-Jun-2019 |
John Baldwin <jhb@FreeBSD.org> |
Remove an overly-aggressive assertion.
While it is true that the new vmspace passed to vmspace_switch_aio will always have a valid reference due to the AIO job or the extra reference on the original
Remove an overly-aggressive assertion.
While it is true that the new vmspace passed to vmspace_switch_aio will always have a valid reference due to the AIO job or the extra reference on the original vmspace in the worker thread, it is not true that the old vmspace being switched away from will have more than one reference.
Specifically, when a process with queued AIO jobs exits, the exit hook in aio_proc_rundown will only ensure that all of the AIO jobs have completed or been cancelled. However, the last AIO job might have completed and woken up the exiting process before the worker thread servicing that job has switched back to its original vmspace. In that case, the process might finish exiting dropping its reference to the vmspace before the worker thread resulting in the worker thread dropping the last reference.
Reported by: np Reviewed by: alc, markj, np, imp MFC after: 2 weeks Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D20542
show more ...
|
#
0269ae4c |
| 06-Jun-2019 |
Alan Somers <asomers@FreeBSD.org> |
MFHead @348740
Sponsored by: The FreeBSD Foundation
|
#
32d2014d |
| 05-Jun-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
In vm_map_entry_set_vnode_text(), tolerate tmpfs mappings for which vnode is no longer resident.
Mapping of tmpfs file does not bump use count on the vnode, because backing object has swap type. As
In vm_map_entry_set_vnode_text(), tolerate tmpfs mappings for which vnode is no longer resident.
Mapping of tmpfs file does not bump use count on the vnode, because backing object has swap type. As result, even during normal operations, and of course on forced unmount, we might end up with text mapping from tmpfs node which has no vnode in memory. In this case, there is no v_writecount to clear (this was done during reclaim), and no reason to assert that the vnode is present.
Restructure the code to silently ignore OBJ_SWAP objects with OBJ_TMPFS_NODE flag set, but OBJ_TMPFS flag clear.
Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
show more ...
|
#
73f11451 |
| 23-May-2019 |
Doug Moore <dougm@FreeBSD.org> |
Fix typo from r348128: _func__ -> __func__
Reported by: LINT
|
#
fa581662 |
| 23-May-2019 |
Doug Moore <dougm@FreeBSD.org> |
Cleanups made necessary by r348115, or reactions to it: 1. Change size_t to vm_size_t in some places. 2. Rename vm_map_entry_resize_free to drop the _free part. 3. Fix whitespace errors. 4. Fix screw
Cleanups made necessary by r348115, or reactions to it: 1. Change size_t to vm_size_t in some places. 2. Rename vm_map_entry_resize_free to drop the _free part. 3. Fix whitespace errors. 4. Fix screwups in patch-conflict-management that left out important changes related to growing and shrinking objects.
Reviewed by: alc Approved by: kib (mentor)
show more ...
|
#
1895f520 |
| 22-May-2019 |
Doug Moore <dougm@FreeBSD.org> |
Passing a parameter to vm_map_entry_resize_free that describes the amount of resizing reduces the number of functions changing the vm_map invariants regarding the max_free field of map entries.
Revi
Passing a parameter to vm_map_entry_resize_free that describes the amount of resizing reduces the number of functions changing the vm_map invariants regarding the max_free field of map entries.
Reviewed by: markj (mentor) Approved by: kib (mentor) Differential Revision: https://reviews.freebsd.org/D20356
show more ...
|
#
7648bc9f |
| 13-May-2019 |
Alan Somers <asomers@FreeBSD.org> |
MFHead @347527
Sponsored by: The FreeBSD Foundation
|
#
54a3a114 |
| 13-May-2019 |
Mark Johnston <markj@FreeBSD.org> |
Provide separate accounting for user-wired pages.
Historically we have not distinguished between kernel wirings and user wirings for accounting purposes. User wirings (via mlock(2)) were subject to
Provide separate accounting for user-wired pages.
Historically we have not distinguished between kernel wirings and user wirings for accounting purposes. User wirings (via mlock(2)) were subject to a global limit on the number of wired pages, so if large swaths of physical memory were wired by the kernel, as happens with the ZFS ARC among other things, the limit could be exceeded, causing user wirings to fail.
The change adds a new counter, v_user_wire_count, which counts the number of virtual pages wired by user processes via mlock(2) and mlockall(2). Only user-wired pages are subject to the system-wide limit which helps provide some safety against deadlocks. In particular, while sources of kernel wirings typically support some backpressure mechanism, there is no way to reclaim user-wired pages shorting of killing the wiring process. The limit is exported as vm.max_user_wired, renamed from vm.max_wired, and changed from u_int to u_long.
The choice to count virtual user-wired pages rather than physical pages was done for simplicity. There are mechanisms that can cause user-wired mappings to be destroyed while maintaining a wiring of the backing physical page; these make it difficult to accurately track user wirings at the physical page layer.
The change also closes some holes which allowed user wirings to succeed even when they would cause the system limit to be exceeded. For instance, mmap() may now fail with ENOMEM in a process that has called mlockall(MCL_FUTURE) if the new mapping would cause the user wiring limit to be exceeded.
Note that bhyve -S is subject to the user wiring limit, which defaults to 1/3 of physical RAM. Users that wish to exceed the limit must tune vm.max_user_wired.
Reviewed by: kib, ngie (mlock() test changes) Tested by: pho (earlier version) MFC after: 45 days Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19908
show more ...
|
#
78022527 |
| 05-May-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_w
Switch to use shared vnode locks for text files during image activation.
kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition.
The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal
The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. Atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around VOP_SETATTR() call, lack of which is arguably a bug on its own.
nullfs bypasses v_writecount to the lower vnode always, so nullfs vnode has its own v_writecount correct, and lower vnode gets all references, since object->handle is always lower vnode.
On the text vnode' vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuit the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left.
Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19923
show more ...
|
#
19f5d9f2 |
| 01-May-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix another race between vm_map_protect() and vm_map_wire().
vm_map_wire() increments entry->wire_count, after that it drops the map lock both for faulting in the entry' pages, and for marking next
Fix another race between vm_map_protect() and vm_map_wire().
vm_map_wire() increments entry->wire_count, after that it drops the map lock both for faulting in the entry' pages, and for marking next entry in the requested region as IN_TRANSITION. Only after all entries are faulted in, MAP_ENTRY_USER_WIRE flag is set.
This makes it possible for vm_map_protect() to run while other entry' MAP_ENTRY_IN_TRANSITION flag is handled, and vm_map_busy() lock does not prevent it. In particular, if the call to vm_map_protect() adds VM_PROT_WRITE to CoW entry, it would fail to call vm_fault_copy_entry(). There are at least two consequences of the race: the top object in the shadow chain is not populated with writeable pages, and second, the entry eventually get contradictory flags MAP_ENTRY_NEEDS_COPY | MAP_ENTRY_USER_WIRED with VM_PROT_WRITE set.
Handle it by waiting for all MAP_ENTRY_IN_TRANSITION flags to go away in vm_map_protect(), which does not drop map lock afterwards. Note that vm_map_busy_wait() is left as is.
Reported and tested by: pho (previous version) Reviewed by: Doug Moore <dougm@rice.edu>, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D20091
show more ...
|
#
c4e5de7e |
| 22-Apr-2019 |
Mark Johnston <markj@FreeBSD.org> |
Disable vm map consistency checking by default on INVARIANTS kernels.
The checks are too expensive for a general-purpose kernel. Enable the checks when DIAGNOSTIC is defined and provide a sysctl to
Disable vm map consistency checking by default on INVARIANTS kernels.
The checks are too expensive for a general-purpose kernel. Enable the checks when DIAGNOSTIC is defined and provide a sysctl to enable the checks in a non-DIAGNOSTIC INVARIANTS kernel.
Reviewed by: kib Discussed with: Doug Moore <dougm@rice.edu> MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D19999
show more ...
|
#
a5a02ef4 |
| 05-Apr-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix mis-merge.
Amusingly, it is nop.
Noted by: trasz Sponsored by: The FreeBSD Foundation MFC after: 1 week X-MFC-rev: r345702
|
#
9a696dc6 |
| 04-Apr-2019 |
Alan Somers <asomers@FreeBSD.org> |
MFHead@r345880
|
#
9f701172 |
| 29-Mar-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Eliminate adj_free field from vm_map_entry.
Drop the adj_free field from vm_map_entry_t. Refine the max_free field so that p->max_free is the size of the largest gap with one endpoint in the subtree
Eliminate adj_free field from vm_map_entry.
Drop the adj_free field from vm_map_entry_t. Refine the max_free field so that p->max_free is the size of the largest gap with one endpoint in the subtree rooted at p. Change vm_map_findspace so that, first, the address-based splay is restricted to tree nodes with large-enough max_free value, to avoid searching for the right starting point in a subtree where all the gaps are too small. Second, when the address search leads to a tree search for the first large-enough gap, that gap is the subject of a splay-search that brings the gap to the top of the tree, so that an immediate insertion will take constant time.
Break up the splay code into separate components, one for searching and breaking up the tree and another for reassembling it. Use these components, and not splay itself, for linking and unlinking. Drop the after-where parameter to link, as it is computed as a side-effect of the splay search.
Submitted by: Doug Moore <dougm@rice.edu> Reviewed by: markj Tested by: pho MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D17794
show more ...
|
#
415e34c4 |
| 29-Mar-2019 |
Alan Somers <asomers@FreeBSD.org> |
MFHead@r345677
|
#
5019dac9 |
| 23-Mar-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
ASLR: check for max_addr after applying randomization, not before.
Otherwise resulting address from vm_map_find() migh not satisfy the upper limit. For instance, it could affect MAP_32BIT flag from
ASLR: check for max_addr after applying randomization, not before.
Otherwise resulting address from vm_map_find() migh not satisfy the upper limit. For instance, it could affect MAP_32BIT flag from 64bit processes.
Found by: Doug Moore <dougm@rice.edu> Reviewed by: alc, Doug Moore <dougm@rice.edu> Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D19688
show more ...
|
#
18b18078 |
| 25-Feb-2019 |
Enji Cooper <ngie@FreeBSD.org> |
MFhead@r344527
|
#
a8fe8db4 |
| 25-Feb-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r344178 through r344512.
|
#
e7a9df16 |
| 20-Feb-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Add kernel support for Intel userspace protection keys feature on Skylake Xeons.
See SDM rev. 68 Vol 3 4.6.2 Protection Keys and the description of the RDPKRU and WRPKRU instructions.
Reviewed by:
Add kernel support for Intel userspace protection keys feature on Skylake Xeons.
See SDM rev. 68 Vol 3 4.6.2 Protection Keys and the description of the RDPKRU and WRPKRU instructions.
Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D18893
show more ...
|
#
30e009fc |
| 19-Feb-2019 |
Enji Cooper <ngie@FreeBSD.org> |
MFhead@r344270
|
#
c981cbbd |
| 15-Feb-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r343956 through r344177.
|