userfaultfd.rst - OpenGrok cross reference for /linux/Documentation/admin-guide/mm/userfaultfd.rst

8 Userfaults allow the implementation of on-demand paging from userland
12 For example, userfaults allow a proper and more optimal implementation
18 Userspace creates a new userfaultfd, initializes it, and registers one or more
20 region(s) result in a message being delivered to the userfaultfd, notifying
26 1) ``read/POLLIN`` protocol to notify a userland thread of the faults
38 Vmas are not suitable for page- (or hugepage) granular fault tracking
43 passed using unix domain sockets to a manager process, so the same
44 manager process could handle the userfaults of a multitude of
48 is a corner case that would currently return ``-EBUSY``).
53 Creating a userfaultfd
54 ----------------------
56 There are two ways to create a new userfaultfd, each of which provides ways to
58 handle kernel page faults have been a useful tool for exploiting the kernel).
63 - Any user can always create a userfaultfd which traps userspace page faults
64   only. Such a userfaultfd can be created using the userfaultfd(2) syscall
67 - In order to also trap kernel page faults for the address space, either the
73 /dev/userfaultfd and issuing a USERFAULTFD_IOC_NEW ioctl to it. This method
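The two creation paths above can be sketched roughly as follows. This is a minimal illustration, not the kernel's reference code: ``open_uffd()`` is a hypothetical helper name, and the fallback to ``/dev/userfaultfd`` is compiled in only when the installed headers define ``USERFAULTFD_IOC_NEW``.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Older kernel headers may lack this flag; the value matches the uapi header. */
#ifndef UFFD_USER_MODE_ONLY
#define UFFD_USER_MODE_ONLY 1
#endif

/*
 * open_uffd() is a hypothetical helper. It first tries the userfaultfd(2)
 * syscall restricted to userspace page faults. If that fails, it falls back
 * to /dev/userfaultfd, which (subject to the device's permissions) yields a
 * userfaultfd that can also trap kernel page faults.
 * Returns a file descriptor, or -1 with errno set.
 */
static int open_uffd(void)
{
	int flags = O_CLOEXEC | O_NONBLOCK;
	int fd = (int)syscall(SYS_userfaultfd, flags | UFFD_USER_MODE_ONLY);

	if (fd >= 0)
		return fd;

#ifdef USERFAULTFD_IOC_NEW
	int dev = open("/dev/userfaultfd", O_RDWR | O_CLOEXEC);

	if (dev >= 0) {
		/* The ioctl argument carries the same flags as the syscall. */
		fd = ioctl(dev, USERFAULTFD_IOC_NEW, flags);
		int saved = errno;

		close(dev);
		errno = saved;
		return fd;
	}
#endif
	return -1;
}
```

A caller would typically check the return value and report ``errno`` on failure, since either path may be disabled by system policy.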
83 Initializing a userfaultfd
84 --------------------------
87 ``UFFDIO_API`` ioctl specifying a ``uffdio_api.api`` value set to ``UFFD_API`` (or
88 a later API version) which will specify the ``read/POLLIN`` protocol
101 - The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
103   detail below in the `Non-cooperative userfaultfd`_ section.
105 - ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
111 - ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
116 - ``UFFD_FEATURE_MOVE`` indicates that the kernel supports moving an
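As a hedged sketch of the ``UFFDIO_API`` handshake and feature negotiation described above: ``uffd_handshake()`` is a hypothetical helper name, and which feature bits to request is left to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * uffd_handshake() is a hypothetical helper. It performs the UFFDIO_API
 * handshake, requesting the feature bits in `want`. On success it stores
 * the feature bitmask reported back by the kernel in *got and returns 0.
 * On failure it returns -1 with errno set by ioctl().
 */
static int uffd_handshake(int uffd, uint64_t want, uint64_t *got)
{
	struct uffdio_api api;

	assert(got != NULL);

	memset(&api, 0, sizeof(api));
	api.api = UFFD_API;
	api.features = want;

	if (ioctl(uffd, UFFDIO_API, &api) == -1)
		return -1;

	*got = api.features;
	return 0;
}
```

For example, a caller interested in hugetlbfs support might pass ``UFFD_FEATURE_MISSING_HUGETLBFS`` as ``want`` and treat a failed handshake as "feature unavailable".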
125 bitmask) to register a memory range in the ``userfaultfd`` by setting the
136 memory from the ``userfaultfd`` registered range). This means a userfault
138 user-faulted page.
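Registration of a range might look like the sketch below. ``uffd_register()`` is a hypothetical helper; the caller is assumed to pass a page-aligned address and length and a bitmask of ``UFFDIO_REGISTER_MODE_*`` values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * uffd_register() is a hypothetical helper. `mode` is a bitmask of
 * UFFDIO_REGISTER_MODE_* values. On success, *ioctls receives the bitmask
 * of ioctls the kernel allows on the registered range and 0 is returned.
 * On failure it returns -1 with errno set by ioctl().
 */
static int uffd_register(int uffd, void *addr, size_t len,
			 uint64_t mode, uint64_t *ioctls)
{
	struct uffdio_register reg;

	assert(ioctls != NULL);

	memset(&reg, 0, sizeof(reg));
	reg.range.start = (uint64_t)(uintptr_t)addr;
	reg.range.len = len;
	reg.mode = mode;

	if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
		return -1;

	*ioctls = reg.ioctls;
	return 0;
}
```

Checking the returned ``ioctls`` bitmask before relying on a particular resolution ioctl is a cautious design choice, since the set depends on the memory type and the registered modes.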
141 --------------------
145 - ``UFFDIO_COPY`` atomically copies some existing page contents from
148 - ``UFFDIO_ZEROPAGE`` atomically zeros the new page.
150 - ``UFFDIO_CONTINUE`` maps an existing, previously-populated page.
153 see a half-populated page, since readers will keep userfaulting until the
157 They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates
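The resolution ioctls and the ``DONTWAKE`` flag can be illustrated with the sketch below. The helper names are hypothetical; the explicit wakeup uses the ``UFFDIO_WAKE`` ioctl, and ``dst`` is assumed to be page-aligned and inside a registered range.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: fill one page at dst from src with UFFDIO_COPY. */
static int uffd_copy_page(int uffd, uint64_t dst, const void *src,
			  size_t page_size, int dontwake)
{
	struct uffdio_copy copy;

	assert(src != NULL);

	memset(&copy, 0, sizeof(copy));
	copy.dst = dst;
	copy.src = (uint64_t)(uintptr_t)src;
	copy.len = page_size;
	/* With DONTWAKE the faulting thread stays blocked until woken. */
	copy.mode = dontwake ? UFFDIO_COPY_MODE_DONTWAKE : 0;

	return ioctl(uffd, UFFDIO_COPY, &copy) == -1 ? -1 : 0;
}

/* Hypothetical helper: map a zeroed page at dst with UFFDIO_ZEROPAGE. */
static int uffd_zero_page(int uffd, uint64_t dst, size_t page_size,
			  int dontwake)
{
	struct uffdio_zeropage zp;

	memset(&zp, 0, sizeof(zp));
	zp.range.start = dst;
	zp.range.len = page_size;
	zp.mode = dontwake ? UFFDIO_ZEROPAGE_MODE_DONTWAKE : 0;

	return ioctl(uffd, UFFDIO_ZEROPAGE, &zp) == -1 ? -1 : 0;
}

/* Hypothetical helper: wake threads blocked on faults in [start, start+len). */
static int uffd_wake(int uffd, uint64_t start, size_t len)
{
	struct uffdio_range range;

	memset(&range, 0, sizeof(range));
	range.start = start;
	range.len = len;

	return ioctl(uffd, UFFDIO_WAKE, &range) == -1 ? -1 : 0;
}
```

One plausible use of ``dontwake`` is to populate several pages in a batch and then issue a single ``uffd_wake()`` over the whole range.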
163 - For ``UFFDIO_REGISTER_MODE_MISSING`` faults, the fault needs to be
164   resolved by either providing a new page (``UFFDIO_COPY``), or mapping
166   the zero page for a missing fault. With userfaultfd, userspace can
169 - For ``UFFDIO_REGISTER_MODE_MINOR`` faults, there is an existing page (in
177 - You can tell which kind of fault occurred by examining
181 - None of the page-delivering ioctls default to the range that you
185 - You get the address of the access that triggered the missing page
186   event out of a struct uffd_msg that you read in the thread from the
188   Keep in mind that unless you used DONTWAKE then the first of any of
191 - Be sure to test for all errors including
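The notes above about reading a ``struct uffd_msg`` and checking for errors could be sketched as follows. ``page_base()`` and ``uffd_next_fault()`` are hypothetical helpers; the page size is assumed to be a power of two supplied by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Round a fault address down to the start of its page. */
static uint64_t page_base(uint64_t addr, uint64_t page_size)
{
	/* page_size must be a non-zero power of two. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr & ~(page_size - 1);
}

/*
 * Read one message from the userfaultfd.
 * Returns 1 and fills *page (page-aligned fault address) and *flags
 * (UFFD_PAGEFAULT_FLAG_* bits) when a page-fault event was read,
 * 0 when there is nothing to resolve (EAGAIN on a non-blocking fd, or an
 * event other than a page fault), and -1 on error with errno set.
 */
static int uffd_next_fault(int uffd, uint64_t page_size,
			   uint64_t *page, uint64_t *flags)
{
	struct uffd_msg msg;
	ssize_t n = read(uffd, &msg, sizeof(msg));

	if (n == -1)
		return errno == EAGAIN ? 0 : -1;
	if (n != (ssize_t)sizeof(msg)) {
		/* A short read or EOF is treated as an I/O error here. */
		errno = EIO;
		return -1;
	}
	if (msg.event != UFFD_EVENT_PAGEFAULT)
		return 0;

	*page = page_base(msg.arg.pagefault.address, page_size);
	*flags = msg.arg.pagefault.flags;
	return 1;
}
```

The returned ``flags`` let the caller tell the kind of fault apart before choosing which resolution ioctl to issue on the page-aligned address.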
196 ---------------------------
198 This is equivalent to (but faster than) using mprotect and a SIGSEGV
201 Firstly you need to register a range with ``UFFDIO_REGISTER_MODE_WP``.
218 which you supply a page and undo write protect.  Note that there is a
219 difference between writes into a WP area and into a !WP area.  The
222 you still need to supply a page when ``UFFDIO_REGISTER_MODE_MISSING`` was
225 Userfaultfd write-protect mode currently behaves differently on none ptes
229 (e.g. when pages are missing and not populated).  For file-backed memory
230 like shmem and hugetlbfs, none ptes will be write protected just like a
231 present pte.  In other words, there will be a userfaultfd write fault
232 message generated when writing to a missing page on file-backed memory,
233 as long as the page range was write-protected before.  Such a message will
237 memory, one can pre-populate the memory with e.g. MADV_POPULATE_READ.  On
246 write-protected (so future writes will also result in a WP fault). These ioctls
247 support a mode flag (``UFFDIO_COPY_MODE_WP`` or ``UFFDIO_CONTINUE_MODE_WP``
251 any vma registered with write-protection will work in async mode rather
254 In async mode, there will be no message generated when a write operation
255 happens, meanwhile the write-protection will be resolved automatically by
256 the kernel.  It can be seen as a more accurate version of soft-dirty
257 tracking and it can be different in a few ways:
259   - The dirty result will not be affected by vma changes (e.g. vma
262   - It supports range operations by default, so one can enable tracking on
265   - Dirty information will not get lost if the pte was zapped due to
266     various reasons (e.g. during split of a shmem transparent huge page).
268   - Due to an inverted meaning relative to soft-dirty (page clean when uffd-wp bit
269     set; dirty when uffd-wp bit cleared), it has different semantics on
271     anonymous (or ``MADV_REMOVE`` on a file mapping) will be treated as
272     dirtying of memory by dropping uffd-wp bit during the procedure.
275 uffd-wp bit for the pages of interest in /proc/pagemap.
277 A page is not tracked by uffd-wp async mode until it is
278 explicitly write-protected by ``ioctl(UFFDIO_WRITEPROTECT)`` with the mode
279 flag ``UFFDIO_WRITEPROTECT_MODE_WP`` set.  Trying to resolve a page fault
280 that was tracked by async mode userfaultfd-wp is invalid.
282 When userfaultfd-wp async mode is used alone, it can be applied to all
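Write-protecting a range with ``ioctl(UFFDIO_WRITEPROTECT)`` might be sketched as below. ``uffd_set_wp()`` is a hypothetical helper, and the range is assumed to be page-aligned and registered with ``UFFDIO_REGISTER_MODE_WP``.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * uffd_set_wp() is a hypothetical helper. With protect != 0 it sets
 * UFFDIO_WRITEPROTECT_MODE_WP on [start, start + len); with protect == 0
 * it clears write protection on that range (mode 0).
 * Returns 0 on success, -1 with errno set by ioctl() on failure.
 */
static int uffd_set_wp(int uffd, uint64_t start, uint64_t len, int protect)
{
	struct uffdio_writeprotect wp;

	assert(len != 0);

	memset(&wp, 0, sizeof(wp));
	wp.range.start = start;
	wp.range.len = len;
	wp.mode = protect ? UFFDIO_WRITEPROTECT_MODE_WP : 0;

	return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp) == -1 ? -1 : 0;
}
```

In a dirty-tracking loop, a caller would typically re-apply protection with ``protect`` set after collecting the uffd-wp bits for the range.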
286 ---------------------------
288 In response to a fault (either missing or minor), an action userspace can
289 take to "resolve" it is to issue a ``UFFDIO_POISON``. This will cause any
290 future faulters to either get a SIGBUS, or in KVM's case the guest will
293 This is used to emulate hardware memory poisoning. Imagine a VM running on a
294 machine which experiences a real hardware memory error. Later, we live migrate
297 still poisoned, even though it's on a new physical host which ostensibly
298 doesn't have a memory error in the exact same spot.
303 QEMU/KVM is using the ``userfaultfd`` syscall to implement postcopy live
304 migration. Postcopy live migration is one form of memory
305 externalization consisting of a virtual machine running with part or
306 all of its memory residing on a different node in the cloud. The
307 ``userfaultfd`` abstraction is generic enough that not a single line of
308 KVM kernel code had to be modified in order to add postcopy live
314 aren't waiting for userfaults (i.e. network bound) can keep running in
317 It is generally beneficial to run one pass of precopy live migration
318 just before starting postcopy live migration, in order to avoid
321 The implementation of postcopy live migration currently uses one
330 guest (``UFFDIO_ZEROPAGE`` is used if the source page was a zero page).
332 A different postcopy thread in the destination node listens with
333 poll() to the ``userfaultfd`` in parallel. When a ``POLLIN`` event is
334 generated after a userfault triggers, the postcopy thread calls read() on
335 the ``userfaultfd`` and receives the fault address (or ``-EAGAIN`` in case the
336 userfault was already resolved and woken by a ``UFFDIO_COPY|ZEROPAGE`` run
349 requested through a userfault).
352 doesn't need to keep any per-page state bitmap relative to the live
353 migration around and a single per-page bitmap has to be maintained in
363 Non-cooperative userfaultfd
382 	non-cooperative process moves a virtual memory area to a
402 ``userfaultfd``, and if a page fault occurs in that area it will be
414 asynchronously and the non-cooperative process resumes execution as
418 return ``-ESRCH`` when the monitored process exits at the time of
419 ``UFFDIO_COPY``, and ``-ENOENT``, when the non-cooperative process has changed
424 single threaded non-cooperative ``userfaultfd`` manager implementations. A
425 synchronous event delivery model can be added later as a new