.\" Copyright (c) 2016 The FreeBSD Foundation
.\"
.\" This documentation was written by
.\" Konstantin Belousov <kib@FreeBSD.org> under sponsorship
.\" from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd November 23, 2020
.Dt _UMTX_OP 2
.Os
.Sh NAME
.Nm _umtx_op
.Nd interface for implementation of userspace threading synchronization primitives
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/umtx.h
.Ft int
.Fn _umtx_op "void *obj" "int op" "u_long val" "void *uaddr" "void *uaddr2"
.Sh DESCRIPTION
The
.Fn _umtx_op
system call provides kernel support for userspace implementation of
the threading synchronization primitives.
The
.Lb libthr
uses the syscall to implement
.St -p1003.1-2001
pthread locks, like mutexes, condition variables and so on.
51.Ss STRUCTURES
52The operations, performed by the
53.Fn _umtx_op
54syscall, operate on userspace objects which are described
55by the following structures.
56Reserved fields and paddings are omitted.
57All objects require ABI-mandated alignment, but this is not currently
58enforced consistently on all architectures.
59.Pp
60The following flags are defined for flag fields of all structures:
61.Bl -tag -width indent
62.It Dv USYNC_PROCESS_SHARED
63Allow selection of the process-shared sleep queue for the thread sleep
64container, when the lock ownership cannot be granted immediately,
65and the operation must sleep.
66The process-shared or process-private sleep queue is selected based on
67the attributes of the memory mapping which contains the first byte of
68the structure, see
69.Xr mmap 2 .
70Otherwise, if the flag is not specified, the process-private sleep queue
71is selected regardless of the memory mapping attributes, as an optimization.
72.Pp
73See the
74.Sx SLEEP QUEUES
75subsection below for more details on sleep queues.
76.El
77.Bl -hang -offset indent
78.It Sy Mutex
.Bd -literal
struct umutex {
	volatile lwpid_t m_owner;
	uint32_t         m_flags;
	uint32_t         m_ceilings[2];
	uintptr_t        m_rb_lnk;
};
.Ed
87.Pp
88The
89.Dv m_owner
90field is the actual lock.
91It contains either the thread identifier of the lock owner in the
92locked state, or zero when the lock is unowned.
93The highest bit set indicates that there is contention on the lock.
94The constants are defined for special values:
95.Bl -tag -width indent
96.It Dv UMUTEX_UNOWNED
97Zero, the value stored in the unowned lock.
98.It Dv UMUTEX_CONTESTED
99The contention indicator.
100.It Dv UMUTEX_RB_OWNERDEAD
101A thread owning the robust mutex terminated.
102The mutex is in unlocked state.
103.It Dv UMUTEX_RB_NOTRECOV
104The robust mutex is in a non-recoverable state.
105It cannot be locked until reinitialized.
106.El
107.Pp
108The
109.Dv m_flags
110field may contain the following umutex-specific flags, in addition to
111the common flags:
112.Bl -tag -width indent
113.It Dv UMUTEX_PRIO_INHERIT
114Mutex implements
115.Em Priority Inheritance
116protocol.
117.It Dv UMUTEX_PRIO_PROTECT
118Mutex implements
119.Em Priority Protection
120protocol.
121.It Dv UMUTEX_ROBUST
122Mutex is robust, as described in the
123.Sx ROBUST UMUTEXES
124section below.
125.It Dv UMUTEX_NONCONSISTENT
126Robust mutex is in a transient non-consistent state.
127Not used by kernel.
128.El
129.Pp
In this manual page, mutexes that have neither the
.Dv UMUTEX_PRIO_INHERIT
nor the
.Dv UMUTEX_PRIO_PROTECT
flag set are called normal mutexes.
135Each type of mutex
136.Pq normal, priority-inherited, and priority-protected
137has a separate sleep queue associated
138with the given key.
139.Pp
140For priority protected mutexes, the
141.Dv m_ceilings
142array contains priority ceiling values.
143The
144.Dv m_ceilings[0]
145is the ceiling value for the mutex, as specified by
146.St -p1003.1-2008
147for the
148.Em Priority Protected
149mutex protocol.
150The
151.Dv m_ceilings[1]
152is used only for the unlock of a priority protected mutex, when
153unlock is done in an order other than the reversed lock order.
154In this case,
155.Dv m_ceilings[1]
156must contain the ceiling value for the last locked priority protected
157mutex, for proper priority reassignment.
If, instead, the mutex being unlocked was the last priority protected
mutex locked by the thread,
.Dv m_ceilings[1]
should contain \-1.
This is required because the kernel does not maintain the ordered lock list.
163.It Sy Condition variable
.Bd -literal
struct ucond {
	volatile uint32_t c_has_waiters;
	uint32_t          c_flags;
	uint32_t          c_clockid;
};
.Ed
171.Pp
172A non-zero
173.Dv c_has_waiters
174value indicates that there are in-kernel waiters for the condition,
175executing the
176.Dv UMTX_OP_CV_WAIT
177request.
178.Pp
179The
180.Dv c_flags
181field contains flags.
182Only the common flags
183.Pq Dv USYNC_PROCESS_SHARED
184are defined for ucond.
185.Pp
186The
187.Dv c_clockid
188member provides the clock identifier to use for timeout, when the
189.Dv UMTX_OP_CV_WAIT
190request has both the
191.Dv CVWAIT_CLOCKID
192flag and the timeout specified.
193Valid clock identifiers are a subset of those for
194.Xr clock_gettime 2 :
195.Bl -bullet -compact
196.It
197.Dv CLOCK_MONOTONIC
198.It
199.Dv CLOCK_MONOTONIC_FAST
200.It
201.Dv CLOCK_MONOTONIC_PRECISE
202.It
203.Dv CLOCK_PROF
204.It
205.Dv CLOCK_REALTIME
206.It
207.Dv CLOCK_REALTIME_FAST
208.It
209.Dv CLOCK_REALTIME_PRECISE
210.It
211.Dv CLOCK_SECOND
212.It
213.Dv CLOCK_UPTIME
214.It
215.Dv CLOCK_UPTIME_FAST
216.It
217.Dv CLOCK_UPTIME_PRECISE
218.It
219.Dv CLOCK_VIRTUAL
220.El
221.It Sy Reader/writer lock
.Bd -literal
struct urwlock {
	volatile int32_t rw_state;
	uint32_t         rw_flags;
	uint32_t         rw_blocked_readers;
	uint32_t         rw_blocked_writers;
};
.Ed
230.Pp
231The
232.Dv rw_state
233field is the actual lock.
234It contains both the flags and counter of the read locks which were
235granted.
The
.Dv rw_state
bits are named as follows:
239.Bl -tag -width indent
240.It Dv URWLOCK_WRITE_OWNER
241Write lock was granted.
242.It Dv URWLOCK_WRITE_WAITERS
243There are write lock waiters.
244.It Dv URWLOCK_READ_WAITERS
245There are read lock waiters.
246.It Dv URWLOCK_READER_COUNT(c)
247Returns the count of currently granted read locks.
248.El
249.Pp
At any given time, either a single thread holds the write lock on the
.Vt struct urwlock
and no thread holds a read lock, or up to
.Dv URWLOCK_MAX_READERS
threads hold the read lock simultaneously and the write lock is
not granted to any thread.
258.Pp
259The following flags for the
260.Dv rw_flags
261member of
262.Vt struct urwlock
263are defined, in addition to the common flags:
264.Bl -tag -width indent
265.It Dv URWLOCK_PREFER_READER
266If specified, immediately grant read lock requests when
267.Dv urwlock
268is already read-locked, even in presence of unsatisfied write
269lock requests.
270By default, if there is a write lock waiter, further read requests are
271not granted, to prevent unfair write lock waiter starvation.
272.El
273.Pp
274The
275.Dv rw_blocked_readers
276and
277.Dv rw_blocked_writers
278members contain the count of threads which are sleeping in kernel,
279waiting for the associated request type to be granted.
The fields are used by the kernel to update the
.Dv URWLOCK_READ_WAITERS
and
.Dv URWLOCK_WRITE_WAITERS
flags of the
.Dv rw_state
word after the requesting thread is woken up.
287.It Sy Semaphore
.Bd -literal
struct _usem2 {
	volatile uint32_t _count;
	uint32_t          _flags;
};
.Ed
294.Pp
295The
296.Dv _count
297word represents a counting semaphore.
298A non-zero value indicates an unlocked (posted) semaphore, while zero
299represents the locked state.
300The maximal supported semaphore count is
301.Dv USEM_MAX_COUNT .
302.Pp
303The
304.Dv _count
305word, besides the counter of posts (unlocks), also contains the
306.Dv USEM_HAS_WAITERS
bit, which indicates that a locked semaphore has waiting threads.
308.Pp
309The
310.Dv USEM_COUNT()
311macro, applied to the
312.Dv _count
313word, returns the current semaphore counter, which is the number of posts
314issued on the semaphore.
315.Pp
316The following bits for the
317.Dv _flags
318member of
319.Vt struct _usem2
320are defined, in addition to the common flags:
321.Bl -tag -width indent
322.It Dv USEM_NAMED
323Flag is ignored by kernel.
324.El
325.It Sy Timeout parameter
.Bd -literal
struct _umtx_time {
	struct timespec _timeout;
	uint32_t        _flags;
	uint32_t        _clockid;
};
.Ed
333.Pp
334Several
335.Fn _umtx_op
336operations allow the blocking time to be limited, failing the request
337if it cannot be satisfied in the specified time period.
338The timeout is specified by passing either the address of
339.Vt struct timespec ,
340or its extended variant,
341.Vt struct _umtx_time ,
342as the
343.Fa uaddr2
344argument of
345.Fn _umtx_op .
346They are distinguished by the
347.Fa uaddr
348value, which must be equal to the size of the structure pointed to by
349.Fa uaddr2 ,
cast to
351.Vt uintptr_t .
352.Pp
353The
354.Dv _timeout
355member specifies the time when the timeout should occur.
356Legal values for clock identifier
357.Dv _clockid
358are shared with the
359.Fa clock_id
360argument to the
361.Xr clock_gettime 2
362function,
363and use the same underlying clocks.
364The specified clock is used to obtain the current time value.
365Interval counting is always performed by the monotonic wall clock.
366.Pp
367The
368.Dv _flags
369argument allows the following flags to further define the timeout behaviour:
370.Bl -tag -width indent
371.It Dv UMTX_ABSTIME
372The
373.Dv _timeout
374value is the absolute time.
The thread will be unblocked and the request will fail when the specified
clock value equals or exceeds the
.Dv _timeout .
378.Pp
379If the flag is absent, the timeout value is relative, that is the amount
380of time, measured by the monotonic wall clock from the moment of the request
381start.
382.El
383.El
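.Pp
The lock words described above pack flag bits and a counter or owner id
into one 32-bit value.
The following is a minimal sketch of how userspace code might decode them.
The constant values are assumptions believed to mirror
.In sys/umtx.h
and should be checked against the header; on
.Fx
the header would be included instead of the local definitions.
The helper function names are hypothetical, and the special robust values
.Dv UMUTEX_RB_OWNERDEAD
and
.Dv UMUTEX_RB_NOTRECOV
are not handled here.
.Bd -literal
```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; verify against <sys/umtx.h> on the target system. */
#define UMUTEX_CONTESTED        0x80000000U
#define URWLOCK_WRITE_OWNER     0x80000000U
#define URWLOCK_WRITE_WAITERS   0x40000000U
#define URWLOCK_READ_WAITERS    0x20000000U
#define URWLOCK_MAX_READERS     0x1fffffffU
#define URWLOCK_READER_COUNT(c) ((c) & URWLOCK_MAX_READERS)
#define USEM_HAS_WAITERS        0x80000000U
#define USEM_MAX_COUNT          0x7fffffffU
#define USEM_COUNT(c)           ((c) & USEM_MAX_COUNT)

/* Owner thread id stored in m_owner, with the contention bit masked off. */
static inline uint32_t
umutex_owner_tid(uint32_t m_owner)
{
	return (m_owner & ~UMUTEX_CONTESTED);
}

/* True when the highest bit of m_owner signals contention. */
static inline bool
umutex_contested(uint32_t m_owner)
{
	return ((m_owner & UMUTEX_CONTESTED) != 0);
}

/* True when rw_state records a granted write lock. */
static inline bool
urwlock_write_owned(uint32_t rw_state)
{
	return ((rw_state & URWLOCK_WRITE_OWNER) != 0);
}

/* Number of read locks currently granted according to rw_state. */
static inline uint32_t
urwlock_readers(uint32_t rw_state)
{
	return (URWLOCK_READER_COUNT(rw_state));
}

/* Number of posts recorded in the _count word of a semaphore. */
static inline uint32_t
usem_posts(uint32_t count)
{
	return (USEM_COUNT(count));
}
```
.Ed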
384.Ss SLEEP QUEUES
385When a locking request cannot be immediately satisfied, the thread is
386typically put to
387.Em sleep ,
388which is a non-runnable state terminated by the
389.Em wake
390operation.
391Lock operations include a
392.Em try
393variant which returns an error rather than sleeping if the lock cannot
394be obtained.
395Also,
396.Fn _umtx_op
397provides requests which explicitly put the thread to sleep.
398.Pp
399Wakes need to know which threads to make runnable, so sleeping threads
400are grouped into containers called
401.Em sleep queues .
402A sleep queue is identified by a key, which for
403.Fn _umtx_op
404is defined as the physical address of some variable.
Note that the
.Em physical
address is used, which means that the same variable mapped multiple
times yields a single key value.
409This mechanism enables the construction of
410.Em process-shared
411locks.
412.Pp
413A related attribute of the key is shareability.
414Some requests always interpret keys as private for the current process,
415creating sleep queues with the scope of the current process even if
416the memory is shared.
417Others either select the shareability automatically from the
418mapping attributes, or take additional input as the
419.Dv USYNC_PROCESS_SHARED
420common flag.
This is done as an optimization, allowing the lock scope to be limited
regardless of the kind of backing memory.
423.Pp
Only the address of the first byte of the variable specified as the key is
important for determining the corresponding sleep queue.
426The size of the variable does not matter, so, for example, sleep on the same
427address interpreted as
428.Vt uint32_t
429and
430.Vt long
431on a little-endian 64-bit platform would collide.
432.Pp
The last attribute of the key is the object type.
Simple wait requests, mutexes, rwlocks, condvars and other primitives
each use their own sleep queue, even when the physical address of the
key is the same.
437.Pp
438When waking up a limited number of threads from a given sleep queue,
439the highest priority threads that have been blocked for the longest on
440the queue are selected.
441.Ss ROBUST UMUTEXES
442The
443.Em robust umutexes
444are provided as a substrate for a userspace library to implement
445.Tn POSIX
446robust mutexes.
447A robust umutex must have the
448.Dv UMUTEX_ROBUST
449flag set.
450.Pp
451On thread termination, the kernel walks two lists of mutexes.
The head addresses of the two lists must be provided by a prior
.Dv UMTX_OP_ROBUST_LISTS
request.
455The lists are singly-linked.
The link to the next element is provided by the
457.Dv m_rb_lnk
458member of the
459.Vt struct umutex .
460.Pp
461Robust list processing is aborted if the kernel finds a mutex
462with any of the following conditions:
463.Bl -dash -offset indent -compact
464.It
465the
466.Dv UMUTEX_ROBUST
467flag is not set
468.It
469not owned by the current thread, except when the mutex is pointed to
470by the
471.Dv robust_inactive
472member of the
473.Vt struct umtx_robust_lists_params ,
474registered for the current thread
475.It
476the combination of mutex flags is invalid
477.It
478read of the umutex memory faults
479.It
480the list length limit described in
481.Xr libthr 3
482is reached.
483.El
484.Pp
485Every mutex in both lists is unlocked as if the
486.Dv UMTX_OP_MUTEX_UNLOCK
487request is performed on it, but instead of the
488.Dv UMUTEX_UNOWNED
489value, the
490.Dv m_owner
491field is written with the
492.Dv UMUTEX_RB_OWNERDEAD
493value.
When a mutex in the
.Dv UMUTEX_RB_OWNERDEAD
state is locked by the kernel in response to a
.Dv UMTX_OP_MUTEX_TRYLOCK
or
.Dv UMTX_OP_MUTEX_LOCK
request, the lock is granted and the
.Er EOWNERDEAD
error is returned.
503.Pp
Also, the kernel handles the
.Dv UMUTEX_RB_NOTRECOV
value of the
.Dv m_owner
field specially, always returning the
.Er ENOTRECOVERABLE
error for lock attempts, without granting the lock.
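.Pp
The outcomes described in this section can be summarized by a small,
non-atomic model of a try-lock on the
.Dv m_owner
word.
This is an illustrative sketch only, not the kernel implementation.
The function name
.Fn model_robust_trylock
is hypothetical, and the constant values are assumptions believed to mirror
.In sys/umtx.h .
The real operation is performed atomically by the kernel.
.Bd -literal
```c
#include <errno.h>
#include <stdint.h>

/* Assumed values; verify against <sys/umtx.h> on the target system. */
#define UMUTEX_UNOWNED      0x0U
#define UMUTEX_CONTESTED    0x80000000U
#define UMUTEX_RB_OWNERDEAD (UMUTEX_CONTESTED | 0x10U)
#define UMUTEX_RB_NOTRECOV  (UMUTEX_CONTESTED | 0x11U)

/*
 * Model of a try-lock on a robust umutex owner word.
 * Returns 0, EOWNERDEAD (lock granted), ENOTRECOVERABLE or EBUSY
 * (lock not granted).
 */
static inline int
model_robust_trylock(uint32_t *m_owner, uint32_t tid)
{
	uint32_t cur = *m_owner;

	/* Non-recoverable state: never grant the lock. */
	if (cur == UMUTEX_RB_NOTRECOV)
		return (ENOTRECOVERABLE);

	/* Previous owner died: grant the lock but report it. */
	if (cur == UMUTEX_RB_OWNERDEAD) {
		*m_owner = tid | UMUTEX_CONTESTED;
		return (EOWNERDEAD);
	}

	/* Unowned: take the lock, preserving the contention bit. */
	if ((cur & ~UMUTEX_CONTESTED) == UMUTEX_UNOWNED) {
		*m_owner = tid | (cur & UMUTEX_CONTESTED);
		return (0);
	}

	/* Owned by some thread. */
	return (EBUSY);
}
```
.Ed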
511.Ss OPERATIONS
512The following operations, requested by the
513.Fa op
514argument to the function, are implemented:
515.Bl -tag -width indent
516.It Dv UMTX_OP_WAIT
517Wait.
518The arguments for the request are:
519.Bl -tag -width "obj"
520.It Fa obj
521Pointer to a variable of type
522.Vt long .
523.It Fa val
524Current value of the
525.Dv *obj .
526.El
527.Pp
528The current value of the variable pointed to by the
529.Fa obj
530argument is compared with the
531.Fa val .
532If they are equal, the requesting thread is put to interruptible sleep
533until woken up or the optionally specified timeout expires.
534.Pp
535The comparison and sleep are atomic.
536In other words, if another thread writes a new value to
537.Dv *obj
538and then issues
539.Dv UMTX_OP_WAKE ,
540the request is guaranteed to not miss the wakeup,
541which might otherwise happen between comparison and blocking.
542.Pp
543The physical address of memory where the
544.Fa *obj
545variable is located, is used as a key to index sleeping threads.
546.Pp
547The read of the current value of the
548.Dv *obj
549variable is not guarded by barriers.
550In particular, it is the user's duty to ensure the lock acquire
551and release memory semantics, if the
552.Dv UMTX_OP_WAIT
553and
554.Dv UMTX_OP_WAKE
555requests are used as a substrate for implementing a simple lock.
556.Pp
557The request is not restartable.
558An unblocked signal delivered during the wait always results in sleep
559interruption and
560.Er EINTR
561error.
562.Pp
563Optionally, a timeout for the request may be specified.
564.It Dv UMTX_OP_WAKE
565Wake the threads possibly sleeping due to
566.Dv UMTX_OP_WAIT .
567The arguments for the request are:
568.Bl -tag -width "obj"
569.It Fa obj
570Pointer to a variable, used as a key to find sleeping threads.
571.It Fa val
572Up to
573.Fa val
574threads are woken up by this request.
575Specify
576.Dv INT_MAX
577to wake up all waiters.
578.El
579.It Dv UMTX_OP_MUTEX_TRYLOCK
580Try to lock umutex.
581The arguments to the request are:
582.Bl -tag -width "obj"
583.It Fa obj
584Pointer to the umutex.
585.El
586.Pp
Operates the same as the
588.Dv UMTX_OP_MUTEX_LOCK
589request, but returns
590.Er EBUSY
591instead of sleeping if the lock cannot be obtained immediately.
592.It Dv UMTX_OP_MUTEX_LOCK
593Lock umutex.
594The arguments to the request are:
595.Bl -tag -width "obj"
596.It Fa obj
597Pointer to the umutex.
598.El
599.Pp
600Locking is performed by writing the current thread id into the
601.Dv m_owner
602word of the
603.Vt struct umutex .
604The write is atomic, preserves the
605.Dv UMUTEX_CONTESTED
606contention indicator, and provides the acquire barrier for
607lock entrance semantic.
608.Pp
If the lock cannot be obtained immediately because another thread owns
the lock, the
.Dv UMUTEX_CONTESTED
bit is set and the current thread is put to sleep.
612bit set before.
613Upon wake up, the lock conditions are re-tested.
614.Pp
615The request adheres to the priority protection or inheritance protocol
616of the mutex, specified by the
617.Dv UMUTEX_PRIO_PROTECT
618or
619.Dv UMUTEX_PRIO_INHERIT
620flag, respectively.
621.Pp
622Optionally, a timeout for the request may be specified.
623.Pp
624A request with a timeout specified is not restartable.
625An unblocked signal delivered during the wait always results in sleep
626interruption and
627.Er EINTR
628error.
629A request without timeout specified is always restarted after return
630from a signal handler.
631.It Dv UMTX_OP_MUTEX_UNLOCK
632Unlock umutex.
633The arguments to the request are:
634.Bl -tag -width "obj"
635.It Fa obj
636Pointer to the umutex.
637.El
638.Pp
639Unlocks the mutex, by writing
640.Dv UMUTEX_UNOWNED
641(zero) value into
642.Dv m_owner
643word of the
644.Vt struct umutex .
645The write is done with a release barrier, to provide lock leave semantic.
646.Pp
647If there are threads sleeping in the sleep queue associated with the
648umutex, one thread is woken up.
649If more than one thread sleeps in the sleep queue, the
650.Dv UMUTEX_CONTESTED
651bit is set together with the write of the
652.Dv UMUTEX_UNOWNED
653value into
654.Dv m_owner .
655.Pp
656The request adheres to the priority protection or inheritance protocol
657of the mutex, specified by the
658.Dv UMUTEX_PRIO_PROTECT
659or
660.Dv UMUTEX_PRIO_INHERIT
661flag, respectively.
662See description of the
663.Dv m_ceilings
664member of the
665.Vt struct umutex
666structure for additional details of the request operation on the
667priority protected protocol mutex.
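.Pp
Because the lock state lives in the userspace
.Dv m_owner
word, a threading library can try an uncontended lock or unlock of a
normal mutex with a single atomic operation and fall back to the
.Dv UMTX_OP_MUTEX_LOCK
and
.Dv UMTX_OP_MUTEX_UNLOCK
requests only when that fails.
The following is a minimal sketch of such a fast path using C11 atomics.
The function names are hypothetical, treating
.Dv m_owner
as an
.Vt _Atomic uint32_t
is an assumption made for the example, and the constant value is
believed to mirror
.In sys/umtx.h .
.Bd -literal
```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define UMUTEX_UNOWNED 0x0U	/* assumed value */

/*
 * Try to take an unowned, uncontested lock word.
 * On false the caller would issue UMTX_OP_MUTEX_LOCK.
 */
static inline bool
umutex_fast_lock(_Atomic uint32_t *m_owner, uint32_t tid)
{
	uint32_t expected = UMUTEX_UNOWNED;

	return (atomic_compare_exchange_strong_explicit(m_owner,
	    &expected, tid, memory_order_acquire, memory_order_relaxed));
}

/*
 * Release the lock if the word holds exactly our thread id, that is,
 * the contention bit is clear.
 * On false the caller would issue UMTX_OP_MUTEX_UNLOCK.
 */
static inline bool
umutex_fast_unlock(_Atomic uint32_t *m_owner, uint32_t tid)
{
	uint32_t expected = tid;

	return (atomic_compare_exchange_strong_explicit(m_owner,
	    &expected, UMUTEX_UNOWNED, memory_order_release,
	    memory_order_relaxed));
}
```
.Ed
The acquire and release orderings match the barriers that the kernel
provides on the slow path.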
668.It Dv UMTX_OP_SET_CEILING
669Set ceiling for the priority protected umutex.
670The arguments to the request are:
671.Bl -tag -width "uaddr"
672.It Fa obj
673Pointer to the umutex.
674.It Fa val
675New ceiling value.
676.It Fa uaddr
677Address of a variable of type
678.Vt uint32_t .
679If not
680.Dv NULL
681and the update was successful, the previous ceiling value is
682written to the location pointed to by
683.Fa uaddr .
684.El
685.Pp
686The request locks the umutex pointed to by the
687.Fa obj
688parameter, waiting for the lock if not immediately available.
689After the lock is obtained, the new ceiling value
690.Fa val
691is written to the
692.Dv m_ceilings[0]
693member of the
.Vt struct umutex ,
695after which the umutex is unlocked.
696.Pp
697The locking does not adhere to the priority protect protocol,
698to conform to the
699.Tn POSIX
700requirements for the
701.Xr pthread_mutex_setprioceiling 3
702interface.
703.It Dv UMTX_OP_CV_WAIT
704Wait for a condition.
705The arguments to the request are:
706.Bl -tag -width "uaddr2"
707.It Fa obj
708Pointer to the
709.Vt struct ucond .
710.It Fa val
711Request flags, see below.
712.It Fa uaddr
713Pointer to the umutex.
714.It Fa uaddr2
715Optional pointer to a
716.Vt struct timespec
717for timeout specification.
718.El
719.Pp
720The request must be issued by the thread owning the mutex pointed to
721by the
722.Fa uaddr
723argument.
724The
.Dv c_has_waiters
726member of the
727.Vt struct ucond ,
728pointed to by the
729.Fa obj
730argument, is set to an arbitrary non-zero value, after which the
731.Fa uaddr
732mutex is unlocked (following the appropriate protocol), and
733the current thread is put to sleep on the sleep queue keyed by
734the
735.Fa obj
736argument.
737The operations are performed atomically.
738It is guaranteed to not miss a wakeup from
739.Dv UMTX_OP_CV_SIGNAL
740or
741.Dv UMTX_OP_CV_BROADCAST
742sent between mutex unlock and putting the current thread on the sleep queue.
743.Pp
744Upon wakeup, if the timeout expired and no other threads are sleeping in
745the same sleep queue, the
.Dv c_has_waiters
747member is cleared.
748After wakeup, the
749.Fa uaddr
750umutex is not relocked.
751.Pp
752The following flags are defined:
753.Bl -tag -width "CVWAIT_CLOCKID"
754.It Dv CVWAIT_ABSTIME
755Timeout is absolute.
756.It Dv CVWAIT_CLOCKID
757Clockid is provided.
758.El
759.Pp
760Optionally, a timeout for the request may be specified.
761Unlike other requests, the timeout value is specified directly by a
762.Vt struct timespec ,
763pointed to by the
764.Fa uaddr2
765argument.
766If the
767.Dv CVWAIT_CLOCKID
768flag is provided, the timeout uses the clock from the
769.Dv c_clockid
770member of the
771.Vt struct ucond ,
772pointed to by
773.Fa obj
774argument.
775Otherwise,
776.Dv CLOCK_REALTIME
777is used, regardless of the clock identifier possibly specified in the
778.Vt struct _umtx_time .
779If the
780.Dv CVWAIT_ABSTIME
flag is supplied, the timeout specifies an absolute time value; otherwise
it denotes a relative time interval.
783.Pp
784The request is not restartable.
785An unblocked signal delivered during
786the wait always results in sleep interruption and
787.Er EINTR
788error.
789.It Dv UMTX_OP_CV_SIGNAL
790Wake up one condition waiter.
791The arguments to the request are:
792.Bl -tag -width "obj"
793.It Fa obj
794Pointer to
795.Vt struct ucond .
796.El
797.Pp
798The request wakes up at most one thread sleeping on the sleep queue keyed
799by the
800.Fa obj
801argument.
802If the woken up thread was the last on the sleep queue, the
803.Dv c_has_waiters
804member of the
805.Vt struct ucond
806is cleared.
807.It Dv UMTX_OP_CV_BROADCAST
808Wake up all condition waiters.
809The arguments to the request are:
810.Bl -tag -width "obj"
811.It Fa obj
812Pointer to
813.Vt struct ucond .
814.El
815.Pp
816The request wakes up all threads sleeping on the sleep queue keyed by the
817.Fa obj
818argument.
819The
820.Dv c_has_waiters
821member of the
822.Vt struct ucond
823is cleared.
824.It Dv UMTX_OP_WAIT_UINT
825Same as
826.Dv UMTX_OP_WAIT ,
827but the type of the variable pointed to by
828.Fa obj
829is
830.Vt u_int
831.Pq a 32-bit integer .
832.It Dv UMTX_OP_RW_RDLOCK
833Read-lock a
.Vt struct urwlock
835lock.
836The arguments to the request are:
837.Bl -tag -width "obj"
838.It Fa obj
839Pointer to the lock (of type
.Vt struct urwlock )
841to be read-locked.
842.It Fa val
843Additional flags to augment locking behaviour.
844The valid flags in the
845.Fa val
846argument are:
847.Bl -tag -width indent
848.It Dv URWLOCK_PREFER_READER
849.El
850.El
851.Pp
852The request obtains the read lock on the specified
.Vt struct urwlock
854by incrementing the count of readers in the
855.Dv rw_state
856word of the structure.
857If the
858.Dv URWLOCK_WRITE_OWNER
859bit is set in the word
860.Dv rw_state ,
861the lock was granted to a writer which has not yet relinquished
862its ownership.
863In this case the current thread is put to sleep until it makes sense to
864retry.
865.Pp
866If the
867.Dv URWLOCK_PREFER_READER
868flag is set either in the
869.Dv rw_flags
870word of the structure, or in the
871.Fa val
872argument of the request, the presence of the threads trying to obtain
873the write lock on the same structure does not prevent the current thread
874from trying to obtain the read lock.
875Otherwise, if the flag is not set, and the
876.Dv URWLOCK_WRITE_WAITERS
877flag is set in
878.Dv rw_state ,
the current thread does not attempt to obtain the read lock.
Instead it sets the
.Dv URWLOCK_READ_WAITERS
bit in the
.Dv rw_state
word and puts itself to sleep on the corresponding sleep queue.
885Upon wakeup, the locking conditions are re-evaluated.
886.Pp
887Optionally, a timeout for the request may be specified.
888.Pp
889The request is not restartable.
890An unblocked signal delivered during the wait always results in sleep
891interruption and
892.Er EINTR
893error.
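.Pp
The conditions under which a read lock attempt proceeds or sleeps, as
described above, can be expressed as a small predicate.
This is a sketch of the documented rules only, not the kernel code.
The function name is hypothetical and the constant values are assumptions
believed to mirror
.In sys/umtx.h .
.Bd -literal
```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; verify against <sys/umtx.h> on the target system. */
#define URWLOCK_PREFER_READER 0x0002U
#define URWLOCK_WRITE_OWNER   0x80000000U
#define URWLOCK_WRITE_WAITERS 0x40000000U

/*
 * Returns true when a read-lock attempt may proceed, false when the
 * requesting thread would be put to sleep instead.
 * "val" is the flags argument of the UMTX_OP_RW_RDLOCK request.
 */
static inline bool
urwlock_may_read_lock(uint32_t rw_state, uint32_t rw_flags, uint32_t val)
{
	bool prefer_reader =
	    ((rw_flags | val) & URWLOCK_PREFER_READER) != 0;

	/* A writer holds the lock: readers must wait. */
	if ((rw_state & URWLOCK_WRITE_OWNER) != 0)
		return (false);

	/* Writers are waiting and readers are not preferred. */
	if (!prefer_reader && (rw_state & URWLOCK_WRITE_WAITERS) != 0)
		return (false);

	return (true);
}
```
.Ed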
894.It Dv UMTX_OP_RW_WRLOCK
895Write-lock a
.Vt struct urwlock
897lock.
898The arguments to the request are:
899.Bl -tag -width "obj"
900.It Fa obj
901Pointer to the lock (of type
.Vt struct urwlock )
903to be write-locked.
904.El
905.Pp
906The request obtains a write lock on the specified
.Vt struct urwlock ,
908by setting the
909.Dv URWLOCK_WRITE_OWNER
910bit in the
911.Dv rw_state
912word of the structure.
913If there is already a write lock owner, as indicated by the
914.Dv URWLOCK_WRITE_OWNER
915bit being set, or there are read lock owners, as indicated
916by the read-lock counter, the current thread does not attempt to
917obtain the write-lock.
Instead it sets the
.Dv URWLOCK_WRITE_WAITERS
bit in the
.Dv rw_state
word and puts itself to sleep on the corresponding sleep queue.
923Upon wakeup, the locking conditions are re-evaluated.
924.Pp
925Optionally, a timeout for the request may be specified.
926.Pp
927The request is not restartable.
928An unblocked signal delivered during the wait always results in sleep
929interruption and
930.Er EINTR
931error.
932.It Dv UMTX_OP_RW_UNLOCK
933Unlock rwlock.
934The arguments to the request are:
935.Bl -tag -width "obj"
936.It Fa obj
937Pointer to the lock (of type
.Vt struct urwlock )
939to be unlocked.
940.El
941.Pp
942The unlock type (read or write) is determined by the
943current lock state.
944Note that the
.Vt struct urwlock
946does not save information about the identity of the thread which
947acquired the lock.
948.Pp
949If there are pending writers after the unlock, and the
950.Dv URWLOCK_PREFER_READER
951flag is not set in the
952.Dv rw_flags
953member of the
954.Fa *obj
955structure, one writer is woken up, selected as described in the
956.Sx SLEEP QUEUES
957subsection.
958If the
959.Dv URWLOCK_PREFER_READER
flag is set, a pending writer is woken up only if there are
no pending readers.
.Pp
If there are no pending writers, or if the
.Dv URWLOCK_PREFER_READER
flag is set, all pending readers are woken up by the unlock.
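.Pp
The choice of threads to wake on unlock can be summarized by the
following decision function.
It is a sketch of the rules stated above, with hypothetical names, and
takes the pending reader and writer counts as plain arguments rather
than reading a
.Vt struct urwlock .
.Bd -literal
```c
#include <stdbool.h>
#include <stdint.h>

/* Which sleepers the unlock wakes up in this model. */
enum urw_wake {
	URW_WAKE_NONE,		/* nobody is waiting */
	URW_WAKE_ONE_WRITER,	/* wake a single pending writer */
	URW_WAKE_ALL_READERS	/* wake every pending reader */
};

static inline enum urw_wake
urwlock_unlock_wake(uint32_t pending_writers, uint32_t pending_readers,
    bool prefer_reader)
{
	if (!prefer_reader) {
		/* Default policy: writers first, readers otherwise. */
		if (pending_writers > 0)
			return (URW_WAKE_ONE_WRITER);
		if (pending_readers > 0)
			return (URW_WAKE_ALL_READERS);
	} else {
		/* Reader preference: writers only when no reader waits. */
		if (pending_readers > 0)
			return (URW_WAKE_ALL_READERS);
		if (pending_writers > 0)
			return (URW_WAKE_ONE_WRITER);
	}
	return (URW_WAKE_NONE);
}
```
.Ed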
966.It Dv UMTX_OP_WAIT_UINT_PRIVATE
967Same as
968.Dv UMTX_OP_WAIT_UINT ,
969but unconditionally select the process-private sleep queue.
970.It Dv UMTX_OP_WAKE_PRIVATE
971Same as
972.Dv UMTX_OP_WAKE ,
973but unconditionally select the process-private sleep queue.
974.It Dv UMTX_OP_MUTEX_WAIT
975Wait for mutex availability.
976The arguments to the request are:
977.Bl -tag -width "obj"
978.It Fa obj
979Address of the mutex.
980.El
981.Pp
982Similarly to the
983.Dv UMTX_OP_MUTEX_LOCK ,
984put the requesting thread to sleep if the mutex lock cannot be obtained
985immediately.
986The
987.Dv UMUTEX_CONTESTED
988bit is set in the
989.Dv m_owner
990word of the mutex to indicate that there is a waiter, before the thread
991is added to the sleep queue.
992Unlike the
993.Dv UMTX_OP_MUTEX_LOCK
994request, the lock is not obtained.
995.Pp
996The operation is not implemented for priority protected and
997priority inherited protocol mutexes.
998.Pp
999Optionally, a timeout for the request may be specified.
1000.Pp
1001A request with a timeout specified is not restartable.
1002An unblocked signal delivered during the wait always results in sleep
1003interruption and
1004.Er EINTR
1005error.
1006A request without a timeout automatically restarts if the signal disposition
1007requested restart via the
1008.Dv SA_RESTART
1009flag in
1010.Vt struct sigaction
1011member
1012.Dv sa_flags .
1013.It Dv UMTX_OP_NWAKE_PRIVATE
1014Wake up a batch of sleeping threads.
1015The arguments to the request are:
1016.Bl -tag -width "obj"
1017.It Fa obj
1018Pointer to the array of pointers.
1019.It Fa val
1020Number of elements in the array pointed to by
1021.Fa obj .
1022.El
1023.Pp
1024For each element in the array pointed to by
1025.Fa obj ,
1026wakes up all threads waiting on the
1027.Em private
1028sleep queue with the key
1029being the byte addressed by the array element.
1030.It Dv UMTX_OP_MUTEX_WAKE
1031Check if a normal umutex is unlocked and wake up a waiter.
1032The arguments for the request are:
1033.Bl -tag -width "obj"
1034.It Fa obj
1035Pointer to the umutex.
1036.El
1037.Pp
1038If the
1039.Dv m_owner
1040word of the mutex pointed to by the
1041.Fa obj
argument indicates an unowned mutex, which has its contention indicator bit
1043.Dv UMUTEX_CONTESTED
1044set, clear the bit and wake up one waiter in the sleep queue associated
1045with the byte addressed by the
1046.Fa obj ,
1047if any.
1048Only normal mutexes are supported by the request.
The sleep queue used is always the one for the normal mutex type.
1050.Pp
1051This request is deprecated in favor of
1052.Dv UMTX_OP_MUTEX_WAKE2
1053since mutexes using it cannot synchronize their own destruction.
1054That is, the
1055.Dv m_owner
1056word has already been set to
1057.Dv UMUTEX_UNOWNED
1058when this request is made,
1059so that another thread can lock, unlock and destroy the mutex
1060(if no other thread uses the mutex afterwards).
1061Clearing the
1062.Dv UMUTEX_CONTESTED
1063bit may then modify freed memory.
1064.It Dv UMTX_OP_MUTEX_WAKE2
1065Check if a umutex is unlocked and wake up a waiter.
1066The arguments for the request are:
1067.Bl -tag -width "obj"
1068.It Fa obj
1069Pointer to the umutex.
1070.It Fa val
1071The umutex flags.
1072.El
1073.Pp
1074The request does not read the
1075.Dv m_flags
1076member of the
1077.Vt struct umutex ;
1078instead, the
1079.Fa val
1080argument supplies flag information, in particular, to determine the
1081sleep queue where the waiters are found for wake up.
1082.Pp
1083If the mutex is unowned, one waiter is woken up.
1084.Pp
1085If the mutex memory cannot be accessed, all waiters are woken up.
1086.Pp
1087If there is more than one waiter on the sleep queue, or there is only
1088one waiter but the mutex is owned by a thread, the
1089.Dv UMUTEX_CONTESTED
1090bit is set in the
1091.Dv m_owner
1092word of the
1093.Vt struct umutex .
.It Dv UMTX_OP_SEM2_WAIT
Wait until the semaphore is available.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the semaphore (of type
.Vt struct _usem2 ) .
.It Fa uaddr
Size of the memory passed in via the
.Fa uaddr2
argument.
.It Fa uaddr2
Optional pointer to a structure of type
.Vt struct _umtx_time ,
which may be followed by a structure of type
.Vt struct timespec .
.El
.Pp
Put the requesting thread onto a sleep queue if the semaphore counter
is zero.
If the thread is put to sleep, the
.Dv USEM_HAS_WAITERS
bit is set in the
.Dv _count
word to indicate waiters.
The function returns either due to
.Dv _count
indicating the semaphore is available (non-zero count due to post),
or due to a wakeup.
The return does not guarantee that the semaphore is available,
nor does it consume the semaphore lock on successful return.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
A request with a non-absolute timeout value is not restartable.
An unblocked signal delivered during such a wait results in sleep
interruption and an
.Er EINTR
error.
.Pp
If
.Dv UMTX_ABSTIME
was not set, the operation was interrupted, and the caller passed in a
.Fa uaddr2
buffer large enough to hold a
.Vt struct timespec
following the initial
.Vt struct _umtx_time ,
then the
.Vt struct timespec
is updated to contain the unslept amount.
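.Pp
The timeout buffer layout described above can be sketched as follows.
This is an illustration under stated assumptions, not a reference
implementation.
The wrapper
.Vt struct sem2_timeout
and the helper
.Fn sem2_timeout_init
are hypothetical names, and the stand-in definition of
.Vt struct _umtx_time
used outside
.Fx
assumes the member layout shown in the code.
The range check is deliberately conservative.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/umtx.h>
#else
/* Stand-in with the assumed layout, so the sketch compiles elsewhere. */
struct _umtx_time {
	struct timespec	_timeout;
	uint32_t	_flags;
	uint32_t	_clockid;
};
#endif

/* Hypothetical wrapper: the timeout followed by room for the remainder. */
struct sem2_timeout {
	struct _umtx_time	tmo;
	struct timespec		remain;
};

/* The remainder must start right where struct _umtx_time ends. */
_Static_assert(offsetof(struct sem2_timeout, remain) ==
    sizeof(struct _umtx_time), "unexpected padding");

/*
 * Fill in a relative timeout.  Returns the size to pass in uaddr,
 * or 0 if the values are out of range.
 */
size_t
sem2_timeout_init(struct sem2_timeout *st, time_t sec, long nsec,
    uint32_t clockid)
{
	if (sec < 0 || nsec < 0 || nsec >= 1000000000L)
		return (0);
	st->tmo._timeout.tv_sec = sec;
	st->tmo._timeout.tv_nsec = nsec;
	st->tmo._flags = 0;		/* UMTX_ABSTIME not set */
	st->tmo._clockid = clockid;
	st->remain.tv_sec = 0;
	st->remain.tv_nsec = 0;
	return (sizeof(*st));
}
```
.Ed
.Pp
On
.Fx ,
the returned size would then be passed, cast to a pointer, as
.Fa uaddr ,
and the address of the structure as
.Fa uaddr2 .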
.It Dv UMTX_OP_SEM2_WAKE
Wake up waiters on semaphore lock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the semaphore (of type
.Vt struct _usem2 ) .
.El
.Pp
The request wakes up one waiter for the semaphore lock.
The function does not increment the semaphore lock count.
If the
.Dv USEM_HAS_WAITERS
bit was set in the
.Dv _count
word, and the last sleeping thread was woken up, the bit is cleared.
.It Dv UMTX_OP_SHM
Manage anonymous
.Tn POSIX
shared memory objects (see
.Xr shm_open 2 ) ,
which can be attached to a byte of physical memory, mapped into the
process address space.
The objects are used to implement process-shared locks in
.Dv libthr .
.Pp
The
.Fa val
argument specifies the sub-request of the
.Dv UMTX_OP_SHM
request:
.Bl -tag -width indent
.It Dv UMTX_SHM_CREAT
Creates the anonymous shared memory object, which can be looked up
with the specified key
.Fa uaddr .
If the object associated with the
.Fa uaddr
key already exists, it is returned instead of creating a new object.
The object's size is one page.
On success, the file descriptor referencing the object is returned.
The descriptor can be used for mapping the object using
.Xr mmap 2 ,
or for other shared memory operations.
.It Dv UMTX_SHM_LOOKUP
Same as the
.Dv UMTX_SHM_CREAT
request, but if there is no shared memory object associated with
the specified key
.Fa uaddr ,
an error is returned, and no new object is created.
.It Dv UMTX_SHM_DESTROY
De-associate the shared object from the specified key
.Fa uaddr .
The object is destroyed after the last open file descriptor is closed
and the last mapping for it is destroyed.
.It Dv UMTX_SHM_ALIVE
Checks whether there is a live shared object associated with the
supplied key
.Fa uaddr .
Returns zero if there is, and an error otherwise.
This request is an optimization of the
.Dv UMTX_SHM_LOOKUP
request.
It is cheaper when only the liveness of the associated object is asked
for, since no file descriptor is installed in the process fd table
on success.
.El
.Pp
The
.Fa uaddr
argument specifies the virtual address whose backing physical memory
byte identity is used as the key for the anonymous shared object
creation or lookup.
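.Pp
The create-and-map sequence described above might look like the following
sketch.
The helper name
.Fn shm_page_for_key
is hypothetical, and error handling is kept minimal.
Outside
.Fx
the function only reports
.Er ENOSYS ,
so that the example still compiles there.
.Bd -literal
```c
#include <errno.h>
#include <stddef.h>

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/umtx.h>
#include <unistd.h>
#endif

/*
 * Map the one-page shared object keyed by the memory behind key,
 * creating the object if needed.  Returns NULL on failure with
 * errno set.
 */
void *
shm_page_for_key(void *key)
{
#ifdef __FreeBSD__
	void *page;
	int fd, saved;

	fd = _umtx_op(NULL, UMTX_OP_SHM, UMTX_SHM_CREAT, key, NULL);
	if (fd == -1)
		return (NULL);
	page = mmap(NULL, (size_t)sysconf(_SC_PAGESIZE),
	    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	saved = errno;
	close(fd);	/* the mapping stays valid after close */
	errno = saved;
	return (page == MAP_FAILED ? NULL : page);
#else
	/* No _umtx_op() here; report that and let the caller decide. */
	(void)key;
	errno = ENOSYS;
	return (NULL);
#endif
}
```
.Ed
.Pp
A lookup-only variant would pass
.Dv UMTX_SHM_LOOKUP
instead of
.Dv UMTX_SHM_CREAT
and treat a failure as meaning that no object exists for the key.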
.It Dv UMTX_OP_ROBUST_LISTS
Register the list heads for the current thread's robust mutex lists.
The arguments to the request are:
.Bl -tag -width "uaddr"
.It Fa val
Size of the structure passed in the
.Fa uaddr
argument.
.It Fa uaddr
Pointer to the structure of type
.Vt struct umtx_robust_lists_params .
.El
.Pp
The structure is defined as
.Bd -literal
struct umtx_robust_lists_params {
	uintptr_t	robust_list_offset;
	uintptr_t	robust_priv_list_offset;
	uintptr_t	robust_inact_offset;
};
.Ed
.Pp
The
.Dv robust_list_offset
member contains the address of the first element in the list of locked
robust shared mutexes.
The
.Dv robust_priv_list_offset
member contains the address of the first element in the list of locked
robust private mutexes.
The private and shared robust locked lists are split to allow fast
termination of the shared list on fork, in the child.
.Pp
The
.Dv robust_inact_offset
member contains a pointer to the mutex that might be locked in the near
future, or might have just been unlocked.
It is typically set by the lock or unlock mutex implementation code
around the whole operation, since the lists can only be changed race-free
while the thread owns the mutex.
The kernel inspects the
.Dv robust_inact_offset
in addition to walking the shared and private lists.
Also, the mutex pointed to by
.Dv robust_inact_offset
is handled more loosely at thread termination time
than other mutexes on the list.
That mutex is allowed not to be owned by the current thread,
in which case list processing is continued.
See the
.Sx ROBUST UMUTEXES
subsection for details.
.It Dv UMTX_OP_GET_MIN_TIMEOUT
Writes out the current value of the minimal umtx operations timeout,
in nanoseconds, into the long integer variable pointed to by
.Fa uaddr .
.It Dv UMTX_OP_SET_MIN_TIMEOUT
Set the minimal amount of time, in nanoseconds, the thread is required
to sleep for umtx operations specifying a timeout using absolute clocks.
The value is taken from the
.Fa val
argument of the call.
Zero means no minimum.
.El
.Pp
The
.Fa op
argument may be a bitwise OR of a single command from above with one or more of
the following flags:
.Bl -tag -width indent
.It Dv UMTX_OP__I386
Request i386 ABI compatibility from the native
.Nm
system call.
Specifically, this implies that:
.Bl -hang -offset indent
.It
.Fa obj
arguments that point to a word, point to a 32-bit integer.
.It
The
.Dv UMTX_OP_NWAKE_PRIVATE
.Fa obj
argument is a pointer to an array of 32-bit pointers.
.It
The
.Dv m_rb_lnk
member of
.Vt struct umutex
is a 32-bit pointer.
.It
.Vt struct timespec
uses a 32-bit time_t.
.El
.Pp
.Dv UMTX_OP__32BIT
has no effect if this flag is set.
This flag is valid for all architectures, but it is ignored on i386.
.It Dv UMTX_OP__32BIT
Request non-i386, 32-bit ABI compatibility from the native
.Nm
system call.
Specifically, this implies that:
.Bl -hang -offset indent
.It
.Fa obj
arguments that point to a word, point to a 32-bit integer.
.It
The
.Dv UMTX_OP_NWAKE_PRIVATE
.Fa obj
argument is a pointer to an array of 32-bit pointers.
.It
The
.Dv m_rb_lnk
member of
.Vt struct umutex
is a 32-bit pointer.
.It
.Vt struct timespec
uses a 64-bit time_t.
.El
.Pp
This flag has no effect if
.Dv UMTX_OP__I386
is set.
This flag is valid for all architectures.
.El
.Pp
Note that if any 32-bit ABI compatibility is being requested, then care must be
taken with robust lists.
A single thread may not mix 32-bit compatible robust lists with native
robust lists.
The first
.Dv UMTX_OP_ROBUST_LISTS
call in a given thread determines which ABI that thread will use for robust
lists going forward.
.Sh RETURN VALUES
If successful,
all requests, except
.Dv UMTX_SHM_CREAT
and
.Dv UMTX_SHM_LOOKUP
sub-requests of the
.Dv UMTX_OP_SHM
request, will return zero.
The
.Dv UMTX_SHM_CREAT
and
.Dv UMTX_SHM_LOOKUP
sub-requests return a shared memory file descriptor on success.
On error \-1 is returned, and the
.Va errno
variable is set to indicate the error.
.Sh ERRORS
The
.Fn _umtx_op
operations can fail with the following errors:
.Bl -tag -width "[ETIMEDOUT]"
.It Bq Er EFAULT
One of the arguments points to invalid memory.
.It Bq Er EINVAL
The clock identifier, specified for the
.Vt struct _umtx_time
timeout parameter, or in the
.Dv c_clockid
member of
.Vt struct ucond ,
is invalid.
.It Bq Er EINVAL
The type of the mutex, encoded by the
.Dv m_flags
member of
.Vt struct umutex ,
is invalid.
.It Bq Er EINVAL
The
.Dv m_owner
member of the
.Vt struct umutex
has changed the lock owner thread identifier during unlock.
.It Bq Er EINVAL
The
.Dv timeout.tv_sec
or
.Dv timeout.tv_nsec
member of
.Vt struct _umtx_time
is less than zero, or
.Dv timeout.tv_nsec
is greater than or equal to 1000000000.
.It Bq Er EINVAL
The
.Fa op
argument specifies an invalid operation.
.It Bq Er EINVAL
The
.Fa val
argument for the
.Dv UMTX_OP_SHM
request specifies an invalid sub-request.
.It Bq Er EINVAL
The
.Dv UMTX_OP_SET_CEILING
request specifies a mutex that is not priority protected.
.It Bq Er EINVAL
The new ceiling value for the
.Dv UMTX_OP_SET_CEILING
request, or one or more of the values read from the
.Dv m_ceilings
array during lock or unlock operations, is greater than
.Dv RTP_PRIO_MAX .
.It Bq Er EPERM
Unlock attempted on an object not owned by the current thread.
.It Bq Er EOWNERDEAD
The lock was requested on a umutex where the
.Dv m_owner
field was set to the
.Dv UMUTEX_RB_OWNERDEAD
value, indicating a terminated robust mutex.
The lock was granted to the caller, so this error in fact
indicates success with additional conditions.
.It Bq Er ENOTRECOVERABLE
The lock was requested on a umutex whose
.Dv m_owner
field is equal to the
.Dv UMUTEX_RB_NOTRECOV
value, indicating a robust mutex abandoned after termination.
The lock was not granted to the caller.
.It Bq Er ENOTTY
The shared memory object, associated with the address passed to the
.Dv UMTX_SHM_ALIVE
sub-request of
.Dv UMTX_OP_SHM
request, was destroyed.
.It Bq Er ESRCH
For the
.Dv UMTX_SHM_LOOKUP ,
.Dv UMTX_SHM_DESTROY ,
and
.Dv UMTX_SHM_ALIVE
sub-requests of the
.Dv UMTX_OP_SHM
request, there is no shared memory object associated with the provided key.
.It Bq Er ENOMEM
The
.Dv UMTX_SHM_CREAT
sub-request of the
.Dv UMTX_OP_SHM
request cannot be satisfied, because allocation of the shared memory object
would exceed the
.Dv RLIMIT_UMTXP
resource limit, see
.Xr setrlimit 2 .
.It Bq Er EAGAIN
The maximum number of readers
.Dv ( URWLOCK_MAX_READERS )
were already granted ownership of the given
.Vt struct urwlock
for read.
.It Bq Er EBUSY
A try mutex lock operation was not able to obtain the lock.
.It Bq Er ETIMEDOUT
The request specified a timeout in the
.Fa uaddr
and
.Fa uaddr2
arguments, and timed out before obtaining the lock or being woken up.
.It Bq Er EINTR
A signal was delivered during wait, for a non-restartable operation.
Operations with timeouts are typically non-restartable, but timeouts
specified in absolute time may be restartable.
.It Bq Er ERESTART
A signal was delivered during wait, for a restartable operation.
Mutex lock requests without timeout specified are restartable.
The error is not returned to userspace code since restart
is handled by usual adjustment of the instruction counter.
.El
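.Pp
Because
.Er EOWNERDEAD
reports a granted lock, a caller cannot treat every \-1 return from a
lock request as a failure.
The following sketch shows one way to sort the outcomes.
The enumeration and the function
.Fn classify_lock_result
are hypothetical names chosen for illustration.
.Bd -literal
```c
#include <errno.h>

/* Hypothetical classification of a umutex lock attempt. */
enum lock_outcome {
	LOCK_GRANTED,		/* plain success */
	LOCK_GRANTED_OWNERDEAD,	/* granted, previous owner terminated */
	LOCK_INTERRUPTED,	/* signal delivered during the wait */
	LOCK_FAILED		/* lock not held */
};

/*
 * Interpret the return value rv of a lock request and the errno
 * value err observed right after it.
 */
enum lock_outcome
classify_lock_result(int rv, int err)
{
	if (rv == 0)
		return (LOCK_GRANTED);
	switch (err) {
	case EOWNERDEAD:
		return (LOCK_GRANTED_OWNERDEAD);
	case EINTR:
		return (LOCK_INTERRUPTED);
	default:
		/* ENOTRECOVERABLE, EBUSY, ETIMEDOUT, EINVAL and others */
		return (LOCK_FAILED);
	}
}
```
.Ed
.Pp
Whether an interrupted request is retried is left to the caller.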
.Sh SEE ALSO
.Xr clock_gettime 2 ,
.Xr mmap 2 ,
.Xr setrlimit 2 ,
.Xr shm_open 2 ,
.Xr sigaction 2 ,
.Xr thr_exit 2 ,
.Xr thr_kill 2 ,
.Xr thr_kill2 2 ,
.Xr thr_new 2 ,
.Xr thr_self 2 ,
.Xr thr_set_name 2 ,
.Xr signal 3
.Sh STANDARDS
The
.Fn _umtx_op
system call is non-standard and is used by the
.Lb libthr
to implement
.St -p1003.1-2001
.Xr pthread 3
functionality.
.Sh BUGS
A window between unlocking a robust mutex and resetting the pointer in the
.Dv robust_inact_offset
member of the registered
.Vt struct umtx_robust_lists_params
allows another thread to destroy the mutex, thus making the kernel inspect
freed or reused memory.
The
.Li libthr
implementation is only vulnerable to this race when operating on
a shared mutex.
A possible fix for the current implementation is to strengthen the checks
for shared mutexes before terminating them, in particular, verifying
that the mutex memory is mapped from a shared memory object allocated
by the
.Dv UMTX_OP_SHM
request.
This is not done because it is believed that the race is adequately
covered by other consistency checks, while adding the check would
prevent alternative implementations of
.Li libpthread .