===================================
NT synchronization primitive driver
===================================

This page documents the user-space API for the ntsync driver.

ntsync is a support driver for emulation of NT synchronization
primitives by user-space NT emulators. It exists because implementation
in user-space, using existing tools, cannot match Windows performance
while offering accurate semantics. It is implemented entirely in
software, and does not drive any hardware device.

This interface is meant as a compatibility tool only, and should not
be used for general synchronization. Instead use generic, versatile
interfaces such as futex(2) and poll(2).

Synchronization primitives
==========================

The ntsync driver exposes three types of synchronization primitives:
semaphores, mutexes, and events.

A semaphore holds a single volatile 32-bit counter, and a static 32-bit
integer denoting the maximum value. It is considered signaled (that is,
can be acquired without contention, or will wake up a waiting thread)
when the counter is nonzero. The counter is decremented by one when a
wait is satisfied. Both the initial and maximum count are established
when the semaphore is created.

A mutex holds a volatile 32-bit recursion count, and a volatile 32-bit
identifier denoting its owner. A mutex is considered signaled when its
owner is zero (indicating that it is not owned). The recursion count is
incremented when a wait is satisfied, and ownership is set to the given
identifier.

A mutex also holds an internal flag denoting whether its previous owner
has died; such a mutex is said to be abandoned. Owner death is not
tracked automatically based on thread death, but rather must be
communicated using ``NTSYNC_IOC_MUTEX_KILL``. An abandoned mutex is
inherently considered unowned.

Except for the "unowned" semantics of zero, the actual value of the
owner identifier is not interpreted by the ntsync driver at all. The
intended use is to store a thread identifier; however, the ntsync
driver does not actually validate that a calling thread provides
consistent or unique identifiers.

An event is similar to a semaphore with a maximum count of one. It holds
a volatile boolean state denoting whether it is signaled or not. There
are two types of events, auto-reset and manual-reset. An auto-reset
event is designaled when a wait is satisfied; a manual-reset event is
not. The event type is specified when the event is created.

Unless specified otherwise, all operations on an object are atomic and
totally ordered with respect to other operations on the same object.

Objects are represented by files. When all file descriptors to an
object are closed, that object is deleted.

Char device
===========

The ntsync driver creates a single char device /dev/ntsync. Each file
description opened on the device represents a unique instance intended
to back an individual NT virtual machine. Objects created by one ntsync
instance may only be used with other objects created by the same
instance.
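
As a minimal sketch, one instance could be opened as shown below. The
helper name ``open_ntsync_instance`` is hypothetical, and the open flags
are an assumption, since this page does not mandate any particular
flags.

.. code-block:: c

   #include <fcntl.h>
   #include <unistd.h>

   /*
    * Hypothetical helper: open one ntsync instance.
    * Returns a file descriptor, or -1 with errno set (for example when
    * the driver is not available on the running system).
    * The flags are an assumption, not a documented requirement.
    */
   static int open_ntsync_instance(void)
   {
       return open("/dev/ntsync", O_CLOEXEC | O_RDONLY);
   }

Each successful call yields a separate instance, so objects created
through one returned descriptor should not be combined with objects
created through another.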

ioctl reference
===============

All operations on the device are done through ioctls. There are four
structures used in ioctl calls::

   struct ntsync_sem_args {
   	__u32 count;
   	__u32 max;
   };

   struct ntsync_mutex_args {
   	__u32 owner;
   	__u32 count;
   };

   struct ntsync_event_args {
   	__u32 signaled;
   	__u32 manual;
   };

   struct ntsync_wait_args {
   	__u64 timeout;
   	__u64 objs;
   	__u32 count;
   	__u32 owner;
   	__u32 index;
   	__u32 alert;
   	__u32 flags;
   	__u32 pad;
   };

Depending on the ioctl, members of the structure may be used as input,
output, or not at all.

The ioctls on the device file are as follows:

.. c:macro:: NTSYNC_IOC_CREATE_SEM

  Create a semaphore object. Takes a pointer to struct
  :c:type:`ntsync_sem_args`, which is used as follows:

  .. list-table::

     * - ``count``
       - Initial count of the semaphore.
     * - ``max``
       - Maximum count of the semaphore.

  Fails with ``EINVAL`` if ``count`` is greater than ``max``.
  On success, returns a file descriptor for the created semaphore.

.. c:macro:: NTSYNC_IOC_CREATE_MUTEX

  Create a mutex object. Takes a pointer to struct
  :c:type:`ntsync_mutex_args`, which is used as follows:

  .. list-table::

     * - ``count``
       - Initial recursion count of the mutex.
     * - ``owner``
       - Initial owner of the mutex.

  If ``owner`` is nonzero and ``count`` is zero, or if ``owner`` is
  zero and ``count`` is nonzero, the function fails with ``EINVAL``.
  On success, returns a file descriptor for the created mutex.
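
  The argument checks for this ioctl and for ``NTSYNC_IOC_CREATE_SEM``
  can be modeled in plain C roughly as follows. This is only a sketch
  of the rules stated above, not the driver's code, and both function
  names are hypothetical.

  .. code-block:: c

     #include <errno.h>
     #include <stdint.h>

     /* Model of the semaphore check: count must not exceed max. */
     static int model_check_sem_args(uint32_t count, uint32_t max)
     {
         return count > max ? -EINVAL : 0;
     }

     /*
      * Model of the mutex check: owner and count must be either both
      * zero (unowned) or both nonzero (owned).
      */
     static int model_check_mutex_args(uint32_t owner, uint32_t count)
     {
         return (!owner != !count) ? -EINVAL : 0;
     }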

.. c:macro:: NTSYNC_IOC_CREATE_EVENT

  Create an event object. Takes a pointer to struct
  :c:type:`ntsync_event_args`, which is used as follows:

  .. list-table::

     * - ``signaled``
       - If nonzero, the event is initially signaled, otherwise
         nonsignaled.
     * - ``manual``
       - If nonzero, the event is a manual-reset event, otherwise
         auto-reset.

  On success, returns a file descriptor for the created event.

The ioctls on the individual objects are as follows:

.. c:macro:: NTSYNC_IOC_SEM_POST

  Post to a semaphore object. Takes a pointer to a 32-bit integer,
  which on input holds the count to be added to the semaphore, and on
  output contains its previous count.

  If adding to the semaphore's current count would raise the latter
  past the semaphore's maximum count, the ioctl fails with
  ``EOVERFLOW`` and the semaphore is not affected. If raising the
  semaphore's count causes it to become signaled, eligible threads
  waiting on this semaphore will be woken and the semaphore's count
  decremented appropriately.
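
  The post rule can be illustrated with a small single-threaded model.
  It is a sketch of the semantics described above, not the driver's
  implementation; ``struct model_sem`` and ``model_sem_post`` are
  hypothetical names, and waking of waiters is not modeled.

  .. code-block:: c

     #include <errno.h>
     #include <stdint.h>

     /* Hypothetical model object; assumes count <= max at all times. */
     struct model_sem {
         uint32_t count;
         uint32_t max;
     };

     /*
      * Add *value to the count and store the previous count in *value.
      * On overflow past max, return -EOVERFLOW and change nothing.
      */
     static int model_sem_post(struct model_sem *sem, uint32_t *value)
     {
         uint32_t prev = sem->count;

         if (*value > sem->max - prev)
             return -EOVERFLOW;
         sem->count = prev + *value;
         *value = prev;
         return 0;
     }

  The comparison is written as ``*value > max - prev`` so that the
  check itself cannot wrap around in 32-bit arithmetic.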

.. c:macro:: NTSYNC_IOC_MUTEX_UNLOCK

  Release a mutex object. Takes a pointer to struct
  :c:type:`ntsync_mutex_args`, which is used as follows:

  .. list-table::

     * - ``owner``
       - Specifies the owner trying to release this mutex.
     * - ``count``
       - On output, contains the previous recursion count.

  If ``owner`` is zero, the ioctl fails with ``EINVAL``. If ``owner``
  is not the current owner of the mutex, the ioctl fails with
  ``EPERM``.

  The mutex's count will be decremented by one. If decrementing the
  mutex's count causes it to become zero, the mutex is marked as
  unowned and signaled, and eligible threads waiting on it will be
  woken as appropriate.
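
  A single-threaded model of the unlock rule might look like the
  following. It only mirrors the text above and is not the driver's
  code; ``struct model_mutex`` and ``model_mutex_unlock`` are
  hypothetical names, and waking of waiters is left out.

  .. code-block:: c

     #include <errno.h>
     #include <stdint.h>

     /* Hypothetical model object; owner == 0 means unowned. */
     struct model_mutex {
         uint32_t owner;
         uint32_t count;
     };

     /*
      * Release one level of recursion on behalf of owner. On success
      * the previous recursion count is stored in *prev_count.
      */
     static int model_mutex_unlock(struct model_mutex *mutex,
                                   uint32_t owner, uint32_t *prev_count)
     {
         if (!owner)
             return -EINVAL;
         if (mutex->owner != owner)
             return -EPERM;

         *prev_count = mutex->count;
         if (--mutex->count == 0)
             mutex->owner = 0;   /* now unowned, hence signaled */
         return 0;
     }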

.. c:macro:: NTSYNC_IOC_SET_EVENT

  Signal an event object. Takes a pointer to a 32-bit integer, which on
  output contains the previous state of the event.

  Eligible threads will be woken, and auto-reset events will be
  designaled appropriately.

.. c:macro:: NTSYNC_IOC_RESET_EVENT

  Designal an event object. Takes a pointer to a 32-bit integer, which
  on output contains the previous state of the event.

.. c:macro:: NTSYNC_IOC_PULSE_EVENT

  Wake threads waiting on an event object while leaving it in an
  unsignaled state. Takes a pointer to a 32-bit integer, which on
  output contains the previous state of the event.

  A pulse operation can be thought of as a set followed by a reset,
  performed as a single atomic operation. If two threads are waiting on
  an auto-reset event which is pulsed, only one will be woken. If two
  threads are waiting on a manual-reset event which is pulsed, both will
  be woken. However, in both cases, the event will be unsignaled
  afterwards, and a simultaneous read operation will always report the
  event as unsignaled.
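
  The difference between set and pulse, and between the two event
  types, can be sketched with a single-threaded model that merely
  counts waiters. All names below are hypothetical, and the model does
  not attempt to reproduce the atomicity that the driver provides.

  .. code-block:: c

     #include <stdbool.h>
     #include <stdint.h>

     struct model_event {
         bool manual;           /* manual-reset if true, else auto-reset */
         bool signaled;
         unsigned int waiters;  /* threads blocked on this event */
     };

     /* Wake eligible waiters of a signaled event; returns how many. */
     static unsigned int model_event_wake(struct model_event *ev)
     {
         unsigned int woken = 0;

         if (!ev->signaled)
             return 0;
         if (ev->manual) {
             woken = ev->waiters;    /* every waiter is satisfied */
         } else if (ev->waiters) {
             woken = 1;              /* one waiter consumes the signal */
             ev->signaled = false;
         }
         ev->waiters -= woken;
         return woken;
     }

     /* Set: report the previous state, signal, then wake waiters. */
     static unsigned int model_event_set(struct model_event *ev,
                                         uint32_t *prev)
     {
         *prev = ev->signaled;
         ev->signaled = true;
         return model_event_wake(ev);
     }

     /* Pulse: behaves like set, but always ends unsignaled. */
     static unsigned int model_event_pulse(struct model_event *ev,
                                           uint32_t *prev)
     {
         unsigned int woken = model_event_set(ev, prev);

         ev->signaled = false;
         return woken;
     }

  With two waiters, pulsing an auto-reset event in this model wakes
  one of them and pulsing a manual-reset event wakes both, matching
  the description above.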

.. c:macro:: NTSYNC_IOC_READ_SEM

  Read the current state of a semaphore object. Takes a pointer to
  struct :c:type:`ntsync_sem_args`, which is used as follows:

  .. list-table::

     * - ``count``
       - On output, contains the current count of the semaphore.
     * - ``max``
       - On output, contains the maximum count of the semaphore.

.. c:macro:: NTSYNC_IOC_READ_MUTEX

  Read the current state of a mutex object. Takes a pointer to struct
  :c:type:`ntsync_mutex_args`, which is used as follows:

  .. list-table::

     * - ``owner``
       - On output, contains the current owner of the mutex, or zero
         if the mutex is not currently owned.
     * - ``count``
       - On output, contains the current recursion count of the mutex.

  If the mutex is marked as abandoned, the function fails with
  ``EOWNERDEAD``. In this case, ``count`` and ``owner`` are set to
  zero.

.. c:macro:: NTSYNC_IOC_READ_EVENT

  Read the current state of an event object. Takes a pointer to struct
  :c:type:`ntsync_event_args`, which is used as follows:

  .. list-table::

     * - ``signaled``
       - On output, contains the current state of the event.
     * - ``manual``
       - On output, contains 1 if the event is a manual-reset event,
         and 0 otherwise.

.. c:macro:: NTSYNC_IOC_KILL_OWNER

  Mark a mutex as unowned and abandoned if it is owned by the given
  owner. Takes an input-only pointer to a 32-bit integer denoting the
  owner. If the owner is zero, the ioctl fails with ``EINVAL``. If the
  owner does not own the mutex, the function fails with ``EPERM``.

  Eligible threads waiting on the mutex will be woken as appropriate
  (and such waits will fail with ``EOWNERDEAD``, as described below).

.. c:macro:: NTSYNC_IOC_WAIT_ANY

  Poll on any of a list of objects, atomically acquiring at most one.
  Takes a pointer to struct :c:type:`ntsync_wait_args`, which is
  used as follows:

  .. list-table::

     * - ``timeout``
       - Absolute timeout in nanoseconds. If ``NTSYNC_WAIT_REALTIME``
         is set, the timeout is measured against the REALTIME clock;
         otherwise it is measured against the MONOTONIC clock. If the
         timeout is equal to or earlier than the current time, the
         function returns immediately without sleeping. If ``timeout``
         is U64_MAX, the function will sleep until an object is
         signaled, and will not fail with ``ETIMEDOUT``.
     * - ``objs``
       - Pointer to an array of ``count`` file descriptors
         (specified as an integer so that the structure has the same
         size regardless of architecture). If any object is
         invalid, the function fails with ``EINVAL``.
     * - ``count``
       - Number of objects specified in the ``objs`` array.
         If greater than ``NTSYNC_MAX_WAIT_COUNT``, the function fails
         with ``EINVAL``.
     * - ``owner``
       - Mutex owner identifier. If any object in ``objs`` is a mutex,
         the ioctl will attempt to acquire that mutex on behalf of
         ``owner``. If ``owner`` is zero, the ioctl fails with
         ``EINVAL``.
     * - ``index``
       - On success, contains the index (into ``objs``) of the object
         which was signaled. If ``alert`` was signaled instead,
         this contains ``count``.
     * - ``alert``
       - Optional event object file descriptor. If nonzero, this
         specifies an "alert" event object which, if signaled, will
         terminate the wait. If nonzero, the identifier must point to a
         valid event.
     * - ``flags``
       - Zero or more flags. Currently the only flag is
         ``NTSYNC_WAIT_REALTIME``, which causes the timeout to be
         measured against the REALTIME clock instead of MONOTONIC.
     * - ``pad``
       - Unused, must be set to zero.
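
  Because ``timeout`` is absolute and ``objs`` carries a pointer in a
  64-bit integer, callers typically need two small conversions. The
  helpers below are a sketch with hypothetical names. Mapping the
  MONOTONIC and REALTIME clocks to ``CLOCK_MONOTONIC`` and
  ``CLOCK_REALTIME`` is an assumption made for this example.

  .. code-block:: c

     #define _POSIX_C_SOURCE 200809L
     #include <stdint.h>
     #include <time.h>

     /*
      * Turn a relative delay into an absolute deadline in nanoseconds
      * on the given clock. Returns 0 if the clock cannot be read; a
      * deadline of 0 lies in the past, so a wait using it would
      * return without sleeping.
      */
     static uint64_t model_abs_deadline_ns(clockid_t clock, uint64_t rel_ns)
     {
         struct timespec now;

         if (clock_gettime(clock, &now) != 0)
             return 0;
         return (uint64_t)now.tv_sec * 1000000000ULL +
                (uint64_t)now.tv_nsec + rel_ns;
     }

     /* Store a user-space pointer in a 64-bit field such as objs. */
     static uint64_t model_ptr_to_u64(const void *ptr)
     {
         return (uint64_t)(uintptr_t)ptr;
     }

  A caller that wants to wait indefinitely would skip the deadline
  helper and pass the all-ones 64-bit value (``UINT64_MAX`` in user
  space) as ``timeout``.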

  This function attempts to acquire one of the given objects. If unable
  to do so, it sleeps until an object becomes signaled, subsequently
  acquiring it, or the timeout expires. In the latter case the ioctl
  fails with ``ETIMEDOUT``. The function only acquires one object, even
  if multiple objects are signaled.

  A semaphore is considered to be signaled if its count is nonzero, and
  is acquired by decrementing its count by one. A mutex is considered
  to be signaled if it is unowned or if its owner matches the ``owner``
  argument, and is acquired by incrementing its recursion count by one
  and setting its owner to the ``owner`` argument. An auto-reset event
  is acquired by designaling it; a manual-reset event is not affected
  by acquisition.
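
  The per-type rules can be summarized in a single-threaded model of
  one non-blocking pass over the object list. All names are
  hypothetical, and scanning in array order so that the lowest
  signaled index wins is an assumption of this sketch rather than a
  statement about the driver.

  .. code-block:: c

     #include <stdbool.h>
     #include <stdint.h>

     enum model_type { MODEL_SEM, MODEL_MUTEX, MODEL_EVENT };

     struct model_obj {
         enum model_type type;
         uint32_t count;    /* semaphore count or mutex recursion count */
         uint32_t owner;    /* mutex owner, zero if unowned */
         bool manual;       /* event: manual-reset if true */
         bool signaled;     /* event state */
     };

     static bool model_is_signaled(const struct model_obj *obj,
                                   uint32_t owner)
     {
         switch (obj->type) {
         case MODEL_SEM:
             return obj->count != 0;
         case MODEL_MUTEX:
             return obj->owner == 0 || obj->owner == owner;
         case MODEL_EVENT:
             return obj->signaled;
         }
         return false;
     }

     static void model_acquire(struct model_obj *obj, uint32_t owner)
     {
         switch (obj->type) {
         case MODEL_SEM:
             obj->count--;
             break;
         case MODEL_MUTEX:
             obj->count++;
             obj->owner = owner;
             break;
         case MODEL_EVENT:
             if (!obj->manual)
                 obj->signaled = false;
             break;
         }
     }

     /* Acquire at most one object; returns its index, or -1 if none. */
     static int model_try_wait_any(struct model_obj *objs, uint32_t count,
                                   uint32_t owner)
     {
         for (uint32_t i = 0; i < count; i++) {
             if (model_is_signaled(&objs[i], owner)) {
                 model_acquire(&objs[i], owner);
                 return (int)i;
             }
         }
         return -1;
     }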

  Acquisition is atomic and totally ordered with respect to other
  operations on the same object. If two wait operations (with different
  ``owner`` identifiers) are queued on the same mutex, only one is
  signaled. If two wait operations are queued on the same semaphore,
  and a value of one is posted to it, only one is signaled.

  If an abandoned mutex is acquired, the ioctl fails with
  ``EOWNERDEAD``. Although this is a failure return, the function may
  otherwise be considered successful. The mutex is marked as owned by
  the given owner (with a recursion count of 1) and as no longer
  abandoned, and ``index`` is still set to the index of the mutex.
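
  The interaction between ``NTSYNC_IOC_KILL_OWNER`` and a later
  acquisition can be sketched as follows. This is a single-threaded
  model with hypothetical names; the ``-EAGAIN`` return for a mutex
  held by another owner exists only in the model, where the real ioctl
  would sleep instead.

  .. code-block:: c

     #include <errno.h>
     #include <stdbool.h>
     #include <stdint.h>

     struct model_mutex {
         uint32_t owner;    /* zero if unowned */
         uint32_t count;    /* recursion count */
         bool abandoned;    /* previous owner was reported dead */
     };

     /* Mark the mutex as unowned and abandoned if owner holds it. */
     static int model_mutex_kill(struct model_mutex *mutex, uint32_t owner)
     {
         if (!owner)
             return -EINVAL;
         if (mutex->owner != owner)
             return -EPERM;
         mutex->owner = 0;
         mutex->count = 0;
         mutex->abandoned = true;
         return 0;
     }

     /*
      * Try to acquire on behalf of owner. Returns 0 on a normal
      * acquisition and -EOWNERDEAD when an abandoned mutex was
      * acquired; in both cases the caller now owns the mutex.
      */
     static int model_mutex_acquire(struct model_mutex *mutex,
                                    uint32_t owner)
     {
         bool was_abandoned;

         if (mutex->owner && mutex->owner != owner)
             return -EAGAIN;    /* model only: real waits block here */

         was_abandoned = mutex->abandoned;
         mutex->abandoned = false;
         mutex->owner = owner;
         mutex->count++;
         return was_abandoned ? -EOWNERDEAD : 0;
     }

  A caller therefore treats ``EOWNERDEAD`` as "acquired, but the
  protected state may be inconsistent" rather than as a plain failure.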

  The ``alert`` argument is an "extra" event which can terminate the
  wait, independently of all other objects.

  It is valid to pass the same object more than once, including by
  passing the same event in the ``objs`` array and in ``alert``. If a
  wakeup occurs due to that object being signaled, ``index`` is set to
  the lowest index corresponding to that object.

  The function may fail with ``EINTR`` if a signal is received.

.. c:macro:: NTSYNC_IOC_WAIT_ALL

  Poll on a list of objects, atomically acquiring all of them. Takes a
  pointer to struct :c:type:`ntsync_wait_args`, which is used
  identically to ``NTSYNC_IOC_WAIT_ANY``, except that ``index`` is
  always filled with zero on success if not woken via alert.

  This function attempts to simultaneously acquire all of the given
  objects. If unable to do so, it sleeps until all objects become
  simultaneously signaled, subsequently acquiring them, or the timeout
  expires. In the latter case the ioctl fails with ``ETIMEDOUT`` and no
  objects are modified.

  Objects may become signaled and subsequently designaled (through
  acquisition by other threads) while this thread is sleeping. Only
  once all objects are simultaneously signaled does the ioctl acquire
  them and return. The entire acquisition is atomic and totally ordered
  with respect to other operations on any of the given objects.
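
  The all-or-nothing property can be shown with a deliberately small
  model restricted to semaphores. The function name is hypothetical,
  and the model is single-threaded, so it illustrates only the rule
  that nothing is modified unless every object is signaled.

  .. code-block:: c

     #include <stdbool.h>
     #include <stdint.h>

     /*
      * One non-blocking pass of an "all" wait over semaphore counts.
      * Either every count is decremented, or none is touched.
      */
     static bool model_try_wait_all(uint32_t *sem_counts, uint32_t n)
     {
         /* First pass: check that every semaphore is signaled. */
         for (uint32_t i = 0; i < n; i++) {
             if (sem_counts[i] == 0)
                 return false;
         }
         /* Second pass: acquire all of them. */
         for (uint32_t i = 0; i < n; i++)
             sem_counts[i]--;
         return true;
     }

  Separating the check from the acquisition is what keeps a partially
  signaled set untouched; in the driver the two steps additionally
  happen as one atomic operation.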

  If an abandoned mutex is acquired, the ioctl fails with
  ``EOWNERDEAD``. Similarly to ``NTSYNC_IOC_WAIT_ANY``, all objects are
  nevertheless marked as acquired. Note that if multiple mutex objects
  are specified, there is no way to know which were marked as
  abandoned.

  As with "any" waits, the ``alert`` argument is an "extra" event which
  can terminate the wait. Critically, however, an "all" wait will
  succeed if all members in ``objs`` are signaled, *or* if ``alert`` is
  signaled. In the latter case ``index`` will be set to ``count``. As
  with "any" waits, if both conditions are met, the former takes
  priority, and objects in ``objs`` will be acquired.

  Unlike ``NTSYNC_IOC_WAIT_ANY``, it is not valid to pass the same
  object more than once, nor is it valid to pass the same object in
  ``objs`` and in ``alert``. If this is attempted, the function fails
  with ``EINVAL``.
