=====================
Restartable Sequences
=====================

Restartable Sequences allow a thread to register a per-thread userspace
memory area to be used as an ABI between kernel and userspace for three
purposes:

 * userspace restartable sequences

 * quick access to read the current CPU number, node ID from userspace

 * scheduler time slice extensions

Restartable sequences (per-cpu atomics)
---------------------------------------

Restartable sequences allow userspace to perform update operations on
per-cpu data without requiring heavyweight atomic operations. The actual
ABI is unfortunately only available in the code and selftests.

Quick access to CPU number, node ID
-----------------------------------

Allows per-CPU data to be implemented efficiently. Documentation is in code
and selftests. :(

Scheduler time slice extensions
-------------------------------

This allows a thread to request a time slice extension when it enters a
critical section, to avoid the contention on a resource that arises when
the thread is scheduled out inside of the critical section.

The prerequisites for this functionality are:

 * Enabled in Kconfig

 * Enabled at boot time (default is enabled)

 * A rseq userspace pointer has been registered for the thread

The thread has to enable the functionality via prctl(2)::

    prctl(PR_RSEQ_SLICE_EXTENSION, PR_RSEQ_SLICE_EXTENSION_SET,
          PR_RSEQ_SLICE_EXT_ENABLE, 0, 0);

prctl() returns 0 on success. Otherwise it fails with one of the following
error codes:

========= ==============================================================
Errorcode Meaning
========= ==============================================================
EINVAL    Functionality not available or invalid function arguments.
          Note: arg4 and arg5 must be zero
ENOTSUPP  Functionality was disabled on the kernel command line
ENXIO     Available, but no rseq user struct registered
========= ==============================================================

The state can also be queried via prctl(2)::

    prctl(PR_RSEQ_SLICE_EXTENSION, PR_RSEQ_SLICE_EXTENSION_GET, 0, 0, 0);

prctl() returns ``PR_RSEQ_SLICE_EXT_ENABLE`` when it is enabled or 0 if
disabled. Otherwise it fails with one of the following error codes:

========= ==============================================================
Errorcode Meaning
========= ==============================================================
EINVAL    Functionality not available or invalid function arguments.
          Note: arg3, arg4 and arg5 must be zero
========= ==============================================================

The availability and status are also exposed in the flags field of the rseq
ABI struct via the ``RSEQ_CS_FLAG_SLICE_EXT_AVAILABLE_BIT`` and the
``RSEQ_CS_FLAG_SLICE_EXT_ENABLED_BIT``. These bits are read-only for user
space and only for informational purposes.

If the mechanism was enabled via prctl(), the thread can request a time
slice extension by setting rseq::slice_ctrl::request to 1. If the thread is
interrupted and the interrupt results in a reschedule request in the
kernel, then the kernel can grant a time slice extension and return to
userspace instead of scheduling out. The length of the extension is
determined by debugfs:rseq/slice_ext_nsec. The default value is 5 usecs,
which is also the minimum value. It can be increased up to 50 usecs, but
doing so can affect the minimum scheduling latency.

The kernel indicates the grant by clearing rseq::slice_ctrl::request and
setting rseq::slice_ctrl::granted to 1. If there is a reschedule of the
thread after granting the extension, the kernel clears the granted bit to
indicate that to userspace.
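
The request/grant handshake described above can be modelled in plain C to
make the bit transitions explicit. This is only a sketch under stated
assumptions: ``struct slice_ctrl_model`` and the ``model_*()`` helpers are
hypothetical names invented for this illustration. They are not the rseq ABI
layout and do not interact with the kernel. The model also grants every
pending request, whereas the real kernel merely *can* grant one.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two control bits; NOT the rseq ABI layout. */
struct slice_ctrl_model {
	unsigned char request;	/* Set by userspace before the critical section */
	unsigned char granted;	/* Set and cleared by the modelled kernel */
};

/*
 * Models an interrupt which raises a reschedule request. A pending
 * request is converted into a grant. Returns true if the thread keeps
 * the CPU, false if it would be scheduled out as usual.
 */
static bool model_resched_interrupt(struct slice_ctrl_model *sc)
{
	if (!sc->request)
		return false;

	sc->request = 0;
	sc->granted = 1;
	return true;
}

/* Models a reschedule of the thread after a grant: the grant is revoked. */
static void model_reschedule(struct slice_ctrl_model *sc)
{
	sc->granted = 0;
}
```

In this model a thread which observes both bits cleared has either never
asked for an extension or has lost a grant through a reschedule.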

If the request bit is still set when leaving the critical section,
userspace can clear it and continue.

If the granted bit is set, then userspace invokes rseq_slice_yield(2) when
leaving the critical section to relinquish the CPU. The kernel enforces
this by arming a timer to prevent misbehaving userspace from abusing this
mechanism.

If both the request bit and the granted bit are false when leaving the
critical section, then this indicates that a grant was revoked and no
further action is required by userspace.

The required code flow is as follows::

    rseq->slice_ctrl.request = 1;
    barrier();  // Prevent compiler reordering
    critical_section();
    barrier();  // Prevent compiler reordering
    rseq->slice_ctrl.request = 0;
    if (rseq->slice_ctrl.granted)
            rseq_slice_yield();

As all of this is strictly CPU local, there are no atomicity requirements.
Checking the granted state is racy, but that cannot be avoided::

    if (rseq->slice_ctrl.granted)
      -> Interrupt results in schedule and grant revocation
            rseq_slice_yield();

So there is no point in pretending that this might be solved by an atomic
operation.

If the thread issues a syscall other than rseq_slice_yield(2) within the
granted time slice extension, the grant is also revoked and the CPU is
relinquished immediately when entering the kernel. This is required as
syscalls might consume arbitrary CPU time until they reach a scheduling
point when the preemption model is either NONE or VOLUNTARY and therefore
might exceed the grant by far.

The preferred solution for user space is to use rseq_slice_yield(2), which
is side effect free. The support for arbitrary syscalls is required to
support applications with an onion layer architecture, where the code
handling the critical section and requesting the time slice extension has
no control over the code within the critical section.
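
The three cases which userspace can observe when leaving the critical
section can be summarized as a small decision helper. This is a sketch for
illustration only: ``enum slice_exit_action`` and ``slice_exit_decide()``
are hypothetical names and not part of any kernel or libc interface. The
sketch assumes that the request and granted bits are never set at the same
time, as the kernel clears the request bit when it grants an extension.

```c
/* Hypothetical names, used only for this illustration. */
enum slice_exit_action {
	SLICE_EXIT_CLEAR_REQUEST,	/* Request not consumed: clear it and continue */
	SLICE_EXIT_YIELD,		/* Grant active: invoke rseq_slice_yield(2) */
	SLICE_EXIT_NOTHING,		/* Grant was revoked: no further action */
};

/*
 * Maps the bit values sampled at the end of the critical section to the
 * action described in the text. The granted bit is checked first, which
 * mirrors the only conditional in the required code flow.
 */
static enum slice_exit_action slice_exit_decide(unsigned int request,
						unsigned int granted)
{
	if (granted)
		return SLICE_EXIT_YIELD;
	if (request)
		return SLICE_EXIT_CLEAR_REQUEST;
	return SLICE_EXIT_NOTHING;
}
```

The required code flow does not need such a helper because it clears the
request bit unconditionally, which is harmless when the bit is already
clear. The helper only makes the case analysis explicit.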

The kernel enforces flag consistency and terminates the thread with SIGSEGV
if it detects a violation.