.\" Copyright (c) 2016 The FreeBSD Foundation
.\"
.\" This documentation was written by
.\" Konstantin Belousov <kib@FreeBSD.org> under sponsorship
.\" from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd November 23, 2020
.Dt _UMTX_OP 2
.Os
.Sh NAME
.Nm _umtx_op
.Nd interface for implementation of userspace threading synchronization primitives
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/umtx.h
.Ft int
.Fn _umtx_op "void *obj" "int op" "u_long val" "void *uaddr" "void *uaddr2"
.Sh DESCRIPTION
The
.Fn _umtx_op
system call provides kernel support for the userspace implementation of
threading synchronization primitives.
The
.Lb libthr
uses this system call to implement
.St -p1003.1-2001
pthread synchronization objects such as mutexes and condition variables.
.Ss STRUCTURES
The operations performed by the
.Fn _umtx_op
system call act on userspace objects, which are described
by the following structures.
Reserved fields and padding are omitted.
All objects require ABI-mandated alignment, but this is not currently
enforced consistently on all architectures.
.Pp
The following flags are defined for the flag fields of all structures:
.Bl -tag -width indent
.It Dv USYNC_PROCESS_SHARED
Allow selection of the process-shared sleep queue as the container
for the sleeping thread, when lock ownership cannot be granted
immediately and the operation must sleep.
The process-shared or process-private sleep queue is selected based on
the attributes of the memory mapping which contains the first byte of
the structure; see
.Xr mmap 2 .
If the flag is not specified, the process-private sleep queue
is selected regardless of the memory mapping attributes, as an optimization.
.Pp
See the
.Sx SLEEP QUEUES
subsection below for more details on sleep queues.
.El
.Bl -hang -offset indent
.It Sy Mutex
.Bd -literal
struct umutex {
	volatile lwpid_t m_owner;
	uint32_t         m_flags;
	uint32_t         m_ceilings[2];
	uintptr_t        m_rb_lnk;
};
.Ed
.Pp
The
.Dv m_owner
field is the actual lock.
It contains either the thread identifier of the lock owner in the
locked state, or zero when the lock is unowned.
The highest bit, when set, indicates contention on the lock.
The following constants are defined for special values:
.Bl -tag -width indent
.It Dv UMUTEX_UNOWNED
Zero, the value stored in the unowned lock.
.It Dv UMUTEX_CONTESTED
The contention indicator.
.It Dv UMUTEX_RB_OWNERDEAD
A thread owning the robust mutex terminated.
The mutex is in the unlocked state.
.It Dv UMUTEX_RB_NOTRECOV
The robust mutex is in a non-recoverable state.
It cannot be locked until reinitialized.
.El
.Pp
The
.Dv m_flags
field may contain the following umutex-specific flags, in addition to
the common flags:
.Bl -tag -width indent
.It Dv UMUTEX_PRIO_INHERIT
The mutex implements the
.Em Priority Inheritance
protocol.
.It Dv UMUTEX_PRIO_PROTECT
The mutex implements the
.Em Priority Protection
protocol.
.It Dv UMUTEX_ROBUST
The mutex is robust, as described in the
.Sx ROBUST UMUTEXES
subsection below.
.It Dv UMUTEX_NONCONSISTENT
The robust mutex is in a transient non-consistent state.
Not used by the kernel.
.El
.Pp
In this manual page, mutexes that have neither the
.Dv UMUTEX_PRIO_INHERIT
nor the
.Dv UMUTEX_PRIO_PROTECT
flag set are called normal mutexes.
Each type of mutex
.Pq normal, priority-inherited, and priority-protected
has a separate sleep queue associated
with the given key.
.Pp
For priority protected mutexes, the
.Dv m_ceilings
array contains priority ceiling values.
The
.Dv m_ceilings[0]
element is the ceiling value for the mutex, as specified by
.St -p1003.1-2008
for the
.Em Priority Protected
mutex protocol.
The
.Dv m_ceilings[1]
element is used only when a priority protected mutex is unlocked
in an order other than the reverse of the lock order.
In this case,
.Dv m_ceilings[1]
must contain the ceiling value of the last locked priority protected
mutex, for proper priority reassignment.
If, instead, the mutex being unlocked was the last priority protected
mutex held by the thread,
.Dv m_ceilings[1]
should contain \-1.
This is required because the kernel does not maintain an ordered list
of the locks held by a thread.
.It Sy Condition variable
.Bd -literal
struct ucond {
	volatile uint32_t c_has_waiters;
	uint32_t          c_flags;
	uint32_t          c_clockid;
};
.Ed
.Pp
A non-zero
.Dv c_has_waiters
value indicates that there are in-kernel waiters for the condition,
executing the
.Dv UMTX_OP_CV_WAIT
request.
.Pp
The
.Dv c_flags
field contains flags.
Only the common flags
.Pq Dv USYNC_PROCESS_SHARED
are defined for ucond.
.Pp
The
.Dv c_clockid
member provides the clock identifier to use for the timeout, when the
.Dv UMTX_OP_CV_WAIT
request has both the
.Dv CVWAIT_CLOCKID
flag and a timeout specified.
Valid clock identifiers are a subset of those for
.Xr clock_gettime 2 :
.Bl -bullet -compact
.It
.Dv CLOCK_MONOTONIC
.It
.Dv CLOCK_MONOTONIC_FAST
.It
.Dv CLOCK_MONOTONIC_PRECISE
.It
.Dv CLOCK_PROF
.It
.Dv CLOCK_REALTIME
.It
.Dv CLOCK_REALTIME_FAST
.It
.Dv CLOCK_REALTIME_PRECISE
.It
.Dv CLOCK_SECOND
.It
.Dv CLOCK_TAI
.It
.Dv CLOCK_UPTIME
.It
.Dv CLOCK_UPTIME_FAST
.It
.Dv CLOCK_UPTIME_PRECISE
.It
.Dv CLOCK_VIRTUAL
.El
.It Sy Reader/writer lock
.Bd -literal
struct urwlock {
	volatile int32_t rw_state;
	uint32_t         rw_flags;
	uint32_t         rw_blocked_readers;
	uint32_t         rw_blocked_writers;
};
.Ed
.Pp
The
.Dv rw_state
field is the actual lock.
It contains both the flags and the count of read locks which have been
granted.
The
.Dv rw_state
bits are named as follows:
.Bl -tag -width indent
.It Dv URWLOCK_WRITE_OWNER
Write lock was granted.
.It Dv URWLOCK_WRITE_WAITERS
There are write lock waiters.
.It Dv URWLOCK_READ_WAITERS
There are read lock waiters.
.It Dv URWLOCK_READER_COUNT(c)
Returns the count of currently granted read locks.
.El
.Pp
At any given time, the write lock on a
.Vt struct urwlock
may be granted to at most one thread, in which case no thread holds a
read lock.
Alternatively, up to
.Dv URWLOCK_MAX_READERS
threads may hold the read lock simultaneously, while the write lock is
not granted to any thread.
.Pp
The following flags for the
.Dv rw_flags
member of
.Vt struct urwlock
are defined, in addition to the common flags:
.Bl -tag -width indent
.It Dv URWLOCK_PREFER_READER
If specified, immediately grant read lock requests when the
.Dv urwlock
is already read-locked, even in the presence of unsatisfied write
lock requests.
By default, if there is a write lock waiter, further read requests are
not granted, to prevent starvation of the write lock waiter.
.El
.Pp
The
.Dv rw_blocked_readers
and
.Dv rw_blocked_writers
members contain the count of threads which are sleeping in the kernel,
waiting for the associated request type to be granted.
The kernel uses these fields to update the
.Dv URWLOCK_READ_WAITERS
and
.Dv URWLOCK_WRITE_WAITERS
flags of the
.Dv rw_state
lock after a requesting thread is woken up.
.It Sy Semaphore
.Bd -literal
struct _usem2 {
	volatile uint32_t _count;
	uint32_t          _flags;
};
.Ed
.Pp
The
.Dv _count
word represents a counting semaphore.
A non-zero count indicates an unlocked (posted) semaphore, while zero
represents the locked state.
The maximum supported semaphore count is
.Dv USEM_MAX_COUNT .
.Pp
Besides the counter of posts (unlocks), the
.Dv _count
word also contains the
.Dv USEM_HAS_WAITERS
bit, which indicates that the locked semaphore has waiting threads.
.Pp
The
.Dv USEM_COUNT()
macro, applied to the
.Dv _count
word, returns the current semaphore counter, which is the number of posts
issued on the semaphore.
.Pp
The following bits for the
.Dv _flags
member of
.Vt struct _usem2
are defined, in addition to the common flags:
.Bl -tag -width indent
.It Dv USEM_NAMED
The flag is ignored by the kernel.
.El
.It Sy Timeout parameter
.Bd -literal
struct _umtx_time {
	struct timespec _timeout;
	uint32_t        _flags;
	uint32_t        _clockid;
};
.Ed
.Pp
Several
.Fn _umtx_op
operations allow the blocking time to be limited, failing the request
if it cannot be satisfied in the specified time period.
The timeout is specified by passing the address of either a
.Vt struct timespec
or its extended variant,
.Vt struct _umtx_time ,
as the
.Fa uaddr2
argument of
.Fn _umtx_op .
The two are distinguished by the
.Fa uaddr
value, which must be equal to the size of the structure pointed to by
.Fa uaddr2 ,
cast to
.Vt uintptr_t .
.Pp
The
.Dv _timeout
member specifies the time when the timeout should occur.
Legal values for the clock identifier
.Dv _clockid
are shared with the
.Fa clock_id
argument to the
.Xr clock_gettime 2
function,
and use the same underlying clocks.
The specified clock is used to obtain the current time value.
Interval counting is always performed by the monotonic wall clock.
.Pp
The
.Dv _flags
member allows the following flags to further define the timeout behaviour:
.Bl -tag -width indent
.It Dv UMTX_ABSTIME
The
.Dv _timeout
value is the absolute time.
The thread is unblocked and the request fails when the value of the
specified clock equals or exceeds
.Dv _timeout .
.Pp
If the flag is absent, the timeout value is relative, that is, the amount
of time measured by the monotonic wall clock from the start of the
request.
.El
.El
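.Pp
As an illustration of the size convention, the following sketch builds a
relative
.Vt struct _umtx_time
and passes its size in
.Fa uaddr .
It uses the
.Dv UMTX_OP_WAIT_UINT_PRIVATE
request described in the
.Sx OPERATIONS
subsection; the helper
.Fn wait_uint_timed
is hypothetical and returns the
.Va errno
value on failure, for example
.Er ETIMEDOUT .
A zero return does not by itself mean that the word changed, so callers
still re-check it.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/umtx.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: sleep while *word == expected, at most ms. */
static int
wait_uint_timed(u_int *word, u_int expected, long ms)
{
	struct _umtx_time tm;

	tm._timeout.tv_sec = ms / 1000;
	tm._timeout.tv_nsec = (ms % 1000) * 1000000;
	tm._flags = 0;			/* relative: no UMTX_ABSTIME */
	tm._clockid = CLOCK_MONOTONIC;

	/* uaddr carries the size of the object that uaddr2 points to. */
	if (_umtx_op(word, UMTX_OP_WAIT_UINT_PRIVATE, expected,
	    (void *)(uintptr_t)sizeof(tm), &tm) == -1)
		return (errno);
	return (0);
}
.Ed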
.Ss SLEEP QUEUES
When a locking request cannot be immediately satisfied, the thread is
typically put to
.Em sleep ,
which is a non-runnable state terminated by the
.Em wake
operation.
Lock operations include a
.Em try
variant which returns an error rather than sleeping if the lock cannot
be obtained.
Also,
.Fn _umtx_op
provides requests which explicitly put the thread to sleep.
.Pp
Wakes need to know which threads to make runnable, so sleeping threads
are grouped into containers called
.Em sleep queues .
A sleep queue is identified by a key, which for
.Fn _umtx_op
is defined as the physical address of some variable.
Note that the
.Em physical
address is used, which means that the same variable mapped multiple
times yields a single key value.
This mechanism enables the construction of
.Em process-shared
locks.
.Pp
A related attribute of the key is shareability.
Some requests always interpret keys as private to the current process,
creating sleep queues with the scope of the current process even if
the memory is shared.
Others either select the shareability automatically from the
mapping attributes, or take additional input in the form of the
.Dv USYNC_PROCESS_SHARED
common flag.
This is done as an optimization, allowing the lock scope to be limited
regardless of the kind of backing memory.
.Pp
Only the address of the first byte of the variable specified as the key
matters for determining the corresponding sleep queue.
The size of the variable does not matter, so, for example, sleeps on the
same address interpreted as
.Vt uint32_t
and
.Vt long
on a little-endian 64-bit platform would collide.
.Pp
The last attribute of the key is the object type.
Simple wait requests, mutexes, rwlocks, condition variables and other
primitives each use their own sleep queue, even when the physical
address of the key is the same.
.Pp
When waking up a limited number of threads from a given sleep queue,
the highest priority threads that have been blocked for the longest on
the queue are selected.
.Ss ROBUST UMUTEXES
The
.Em robust umutexes
are provided as a substrate for a userspace library to implement
.Tn POSIX
robust mutexes.
A robust umutex must have the
.Dv UMUTEX_ROBUST
flag set.
.Pp
On thread termination, the kernel walks two lists of mutexes.
The head addresses of the two lists must be provided by a prior
.Dv UMTX_OP_ROBUST_LISTS
request.
The lists are singly-linked.
The link to the next element is provided by the
.Dv m_rb_lnk
member of the
.Vt struct umutex .
.Pp
Robust list processing is aborted if the kernel finds a mutex
with any of the following conditions:
.Bl -dash -offset indent -compact
.It
the
.Dv UMUTEX_ROBUST
flag is not set
.It
not owned by the current thread, except when the mutex is pointed to
by the
.Dv robust_inact_offset
member of the
.Vt struct umtx_robust_lists_params ,
registered for the current thread
.It
the combination of mutex flags is invalid
.It
reading the umutex memory faults
.It
the list length limit described in
.Xr libthr 3
is reached.
.El
.Pp
Every mutex in both lists is unlocked as if the
.Dv UMTX_OP_MUTEX_UNLOCK
request were performed on it, but instead of the
.Dv UMUTEX_UNOWNED
value, the
.Dv m_owner
field is written with the
.Dv UMUTEX_RB_OWNERDEAD
value.
When a mutex in the
.Dv UMUTEX_RB_OWNERDEAD
state is locked by the kernel as the result of a
.Dv UMTX_OP_MUTEX_TRYLOCK
or
.Dv UMTX_OP_MUTEX_LOCK
request, the lock is granted and the
.Er EOWNERDEAD
error is returned.
.Pp
Also, the kernel handles the
.Dv UMUTEX_RB_NOTRECOV
value of the
.Dv m_owner
field specially, always returning the
.Er ENOTRECOVERABLE
error for lock attempts, without granting the lock.
.Ss OPERATIONS
The following operations, requested by the
.Fa op
argument to the function, are implemented:
.Bl -tag -width indent
.It Dv UMTX_OP_WAIT
Wait.
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to a variable of type
.Vt long .
.It Fa val
Current value of the
.Dv *obj .
.El
.Pp
The current value of the variable pointed to by the
.Fa obj
argument is compared with the
.Fa val .
If they are equal, the requesting thread is put to interruptible sleep
until woken up or the optionally specified timeout expires.
.Pp
The comparison and sleep are atomic.
In other words, if another thread writes a new value to
.Dv *obj
and then issues
.Dv UMTX_OP_WAKE ,
the request is guaranteed not to miss the wakeup,
which might otherwise happen between comparison and blocking.
.Pp
The physical address of the memory where the
.Fa *obj
variable is located is used as a key to index sleeping threads.
.Pp
The read of the current value of the
.Dv *obj
variable is not guarded by barriers.
In particular, it is the caller's responsibility to provide lock acquire
and release memory semantics if the
.Dv UMTX_OP_WAIT
and
.Dv UMTX_OP_WAKE
requests are used as a substrate for implementing a simple lock.
.Pp
The request is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
.Pp
Optionally, a timeout for the request may be specified.
.It Dv UMTX_OP_WAKE
Wake the threads possibly sleeping due to
.Dv UMTX_OP_WAIT .
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to a variable, used as a key to find sleeping threads.
.It Fa val
Up to
.Fa val
threads are woken up by this request.
Specify
.Dv INT_MAX
to wake up all waiters.
.El
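.Pp
As a sketch of the division of labour described above, the following
one-shot event uses C11 atomics for the memory ordering that
.Dv UMTX_OP_WAIT
does not provide, and enters the kernel only to sleep and to wake.
The names
.Va event_word ,
.Fn event_wait
and
.Fn event_set
are hypothetical.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/umtx.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical one-shot event: 0 = not set, 1 = set. */
static atomic_long event_word;

static void
event_wait(void)
{
	/* Acquire pairs with the release store in event_set(). */
	while (atomic_load_explicit(&event_word,
	    memory_order_acquire) == 0) {
		/*
		 * Sleeps only if the word still holds 0; a wakeup or
		 * EINTR simply makes the loop re-check the word.
		 */
		(void)_umtx_op((void *)&event_word, UMTX_OP_WAIT, 0,
		    NULL, NULL);
	}
}

static void
event_set(void)
{
	atomic_store_explicit(&event_word, 1, memory_order_release);
	/* Wake every thread sleeping on the word's key. */
	(void)_umtx_op((void *)&event_word, UMTX_OP_WAKE, INT_MAX,
	    NULL, NULL);
}
.Ed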
.It Dv UMTX_OP_MUTEX_TRYLOCK
Try to lock umutex.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
Operates the same as the
.Dv UMTX_OP_MUTEX_LOCK
request, but returns
.Er EBUSY
instead of sleeping if the lock cannot be obtained immediately.
.It Dv UMTX_OP_MUTEX_LOCK
Lock umutex.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
Locking is performed by writing the current thread id into the
.Dv m_owner
word of the
.Vt struct umutex .
The write is atomic, preserves the
.Dv UMUTEX_CONTESTED
contention indicator, and provides the acquire barrier for
lock entrance semantics.
.Pp
If the lock cannot be obtained immediately because another thread owns
the lock, the
.Dv UMUTEX_CONTESTED
bit is set and the current thread is then put to sleep.
Upon wakeup, the lock conditions are re-tested.
.Pp
The request adheres to the priority protection or inheritance protocol
of the mutex, specified by the
.Dv UMUTEX_PRIO_PROTECT
or
.Dv UMUTEX_PRIO_INHERIT
flag, respectively.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
A request with a timeout specified is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
A request without timeout specified is always restarted after return
from a signal handler.
.It Dv UMTX_OP_MUTEX_UNLOCK
Unlock umutex.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
Unlocks the mutex by writing the
.Dv UMUTEX_UNOWNED
(zero) value into the
.Dv m_owner
word of the
.Vt struct umutex .
The write is done with a release barrier, to provide lock release semantics.
.Pp
If there are threads sleeping in the sleep queue associated with the
umutex, one thread is woken up.
If more than one thread sleeps in the sleep queue, the
.Dv UMUTEX_CONTESTED
bit is set together with the write of the
.Dv UMUTEX_UNOWNED
value into
.Dv m_owner .
.Pp
The request adheres to the priority protection or inheritance protocol
of the mutex, specified by the
.Dv UMUTEX_PRIO_PROTECT
or
.Dv UMUTEX_PRIO_INHERIT
flag, respectively.
See the description of the
.Dv m_ceilings
member of the
.Vt struct umutex
structure for additional details of the request operation on
priority protected mutexes.
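.Pp
A userspace library typically tries to acquire and release an
uncontested normal umutex with a single atomic operation on
.Dv m_owner ,
and issues
.Dv UMTX_OP_MUTEX_LOCK
or
.Dv UMTX_OP_MUTEX_UNLOCK
only when that fails.
The sketch below assumes a normal, non-robust umutex, the
.Fn atomic_cmpset_acq_32
and
.Fn atomic_cmpset_rel_32
primitives from
.In machine/atomic.h ,
and
.Xr thr_self 2
for the thread id; the helper names are hypothetical.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/thr.h>
#include <sys/umtx.h>
#include <machine/atomic.h>
#include <errno.h>
#include <stdint.h>

/* The caller's thread id, the value stored in m_owner. */
static uint32_t
self_id(void)
{
	long id;

	(void)thr_self(&id);
	return ((uint32_t)id);
}

/* Hypothetical helper: lock a normal umutex. */
static int
umutex_lock(struct umutex *m)
{
	/* Fast path: UMUTEX_UNOWNED -> our id, with acquire semantics. */
	if (atomic_cmpset_acq_32((volatile uint32_t *)&m->m_owner,
	    UMUTEX_UNOWNED, self_id()))
		return (0);
	/* Slow path: the kernel sets UMUTEX_CONTESTED and sleeps. */
	if (_umtx_op(m, UMTX_OP_MUTEX_LOCK, 0, NULL, NULL) == -1)
		return (errno);
	return (0);
}

/* Hypothetical helper: unlock a normal umutex held by the caller. */
static int
umutex_unlock(struct umutex *m)
{
	/* Fast path: no contention recorded, our id -> UMUTEX_UNOWNED. */
	if (atomic_cmpset_rel_32((volatile uint32_t *)&m->m_owner,
	    self_id(), UMUTEX_UNOWNED))
		return (0);
	/* Contested or not ours: the kernel unlocks and wakes a waiter. */
	if (_umtx_op(m, UMTX_OP_MUTEX_UNLOCK, 0, NULL, NULL) == -1)
		return (errno);
	return (0);
}
.Ed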
.It Dv UMTX_OP_SET_CEILING
Set ceiling for the priority protected umutex.
The arguments to the request are:
.Bl -tag -width "uaddr"
.It Fa obj
Pointer to the umutex.
.It Fa val
New ceiling value.
.It Fa uaddr
Address of a variable of type
.Vt uint32_t .
If not
.Dv NULL
and the update was successful, the previous ceiling value is
written to the location pointed to by
.Fa uaddr .
.El
.Pp
The request locks the umutex pointed to by the
.Fa obj
parameter, waiting for the lock if not immediately available.
After the lock is obtained, the new ceiling value
.Fa val
is written to the
.Dv m_ceilings[0]
member of the
.Vt struct umutex ,
after which the umutex is unlocked.
.Pp
The locking does not adhere to the priority protect protocol,
to conform to the
.Tn POSIX
requirements for the
.Xr pthread_mutex_setprioceiling 3
interface.
.It Dv UMTX_OP_CV_WAIT
Wait for a condition.
The arguments to the request are:
.Bl -tag -width "uaddr2"
.It Fa obj
Pointer to the
.Vt struct ucond .
.It Fa val
Request flags, see below.
.It Fa uaddr
Pointer to the umutex.
.It Fa uaddr2
Optional pointer to a
.Vt struct timespec
for timeout specification.
.El
.Pp
The request must be issued by the thread owning the mutex pointed to
by the
.Fa uaddr
argument.
The
.Dv c_has_waiters
member of the
.Vt struct ucond ,
pointed to by the
.Fa obj
argument, is set to an arbitrary non-zero value, after which the
.Fa uaddr
mutex is unlocked (following the appropriate protocol), and
the current thread is put to sleep on the sleep queue keyed by
the
.Fa obj
argument.
These operations are performed atomically.
The request is guaranteed not to miss a wakeup from
.Dv UMTX_OP_CV_SIGNAL
or
.Dv UMTX_OP_CV_BROADCAST
sent between the mutex unlock and putting the current thread on the sleep queue.
.Pp
Upon wakeup, if the timeout expired and no other threads are sleeping in
the same sleep queue, the
.Dv c_has_waiters
member is cleared.
After wakeup, the
.Fa uaddr
umutex is not relocked.
.Pp
The following flags are defined:
.Bl -tag -width "CVWAIT_CLOCKID"
.It Dv CVWAIT_ABSTIME
Timeout is absolute.
.It Dv CVWAIT_CLOCKID
Clockid is provided.
.El
.Pp
Optionally, a timeout for the request may be specified.
Unlike other requests, the timeout value is specified directly by a
.Vt struct timespec ,
pointed to by the
.Fa uaddr2
argument.
If the
.Dv CVWAIT_CLOCKID
flag is provided, the timeout uses the clock from the
.Dv c_clockid
member of the
.Vt struct ucond ,
pointed to by the
.Fa obj
argument.
Otherwise,
.Dv CLOCK_REALTIME
is used, regardless of the clock identifier possibly specified in the
.Vt struct _umtx_time .
If the
.Dv CVWAIT_ABSTIME
flag is supplied, the timeout specifies an absolute time value, otherwise
it denotes a relative time interval.
.Pp
The request is not restartable.
An unblocked signal delivered during
the wait always results in sleep interruption and
.Er EINTR
error.
.It Dv UMTX_OP_CV_SIGNAL
Wake up one condition waiter.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to
.Vt struct ucond .
.El
.Pp
The request wakes up at most one thread sleeping on the sleep queue keyed
by the
.Fa obj
argument.
If the woken up thread was the last on the sleep queue, the
.Dv c_has_waiters
member of the
.Vt struct ucond
is cleared.
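.Pp
The following sketch shows the usual pattern around
.Dv UMTX_OP_CV_WAIT
and
.Dv UMTX_OP_CV_SIGNAL .
The waiter re-checks its predicate in a loop and relocks the mutex
itself, since the kernel does not, and the signaller, holding the mutex,
skips the system call while
.Dv c_has_waiters
is zero.
It assumes a normal umutex, relocked here through
.Dv UMTX_OP_MUTEX_LOCK
for brevity; the helper names are hypothetical.
Note that the relative timeout restarts on each loop iteration.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/umtx.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper: with m held, wait until *ready is true.
 * m is held again on return.  rel may be NULL for no timeout.
 */
static int
cond_wait_until(struct ucond *cv, struct umutex *m, const bool *ready,
    const struct timespec *rel)
{
	int error;

	while (!*ready) {
		/* Atomically: set c_has_waiters, unlock m, sleep on cv. */
		error = 0;
		if (_umtx_op(cv, UMTX_OP_CV_WAIT, 0, m, (void *)rel) == -1)
			error = errno;
		/* The kernel does not relock m after wakeup. */
		if (_umtx_op(m, UMTX_OP_MUTEX_LOCK, 0, NULL, NULL) == -1)
			return (errno);
		if (error != 0 && error != EINTR)
			return (error);
	}
	return (0);
}

/* Hypothetical helper: wake one waiter; call with the mutex held. */
static void
cond_signal(struct ucond *cv)
{
	if (cv->c_has_waiters != 0)
		(void)_umtx_op(cv, UMTX_OP_CV_SIGNAL, 0, NULL, NULL);
}
.Ed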
.It Dv UMTX_OP_CV_BROADCAST
Wake up all condition waiters.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to
.Vt struct ucond .
.El
.Pp
The request wakes up all threads sleeping on the sleep queue keyed by the
.Fa obj
argument.
The
.Dv c_has_waiters
member of the
.Vt struct ucond
is cleared.
.It Dv UMTX_OP_WAIT_UINT
Same as
.Dv UMTX_OP_WAIT ,
but the type of the variable pointed to by
.Fa obj
is
.Vt u_int
.Pq a 32-bit integer .
.It Dv UMTX_OP_RW_RDLOCK
Read-lock a
.Vt struct urwlock
lock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the lock (of type
.Vt struct urwlock )
to be read-locked.
.It Fa val
Additional flags to augment locking behaviour.
The valid flags in the
.Fa val
argument are:
.Bl -tag -width indent
.It Dv URWLOCK_PREFER_READER
Prefer readers for this request; see below.
.El
.El
.Pp
The request obtains the read lock on the specified
.Vt struct urwlock
by incrementing the count of readers in the
.Dv rw_state
word of the structure.
If the
.Dv URWLOCK_WRITE_OWNER
bit is set in the
.Dv rw_state
word, the lock was granted to a writer which has not yet relinquished
its ownership.
In this case the current thread is put to sleep until it makes sense to
retry.
.Pp
If the
.Dv URWLOCK_PREFER_READER
flag is set either in the
.Dv rw_flags
word of the structure, or in the
.Fa val
argument of the request, the presence of threads trying to obtain
the write lock on the same structure does not prevent the current thread
from trying to obtain the read lock.
Otherwise, if the flag is not set, and the
.Dv URWLOCK_WRITE_WAITERS
flag is set in
.Dv rw_state ,
the current thread does not attempt to obtain the read lock.
Instead, it sets the
.Dv URWLOCK_READ_WAITERS
bit in the
.Dv rw_state
word and puts itself to sleep on the corresponding sleep queue.
Upon wakeup, the locking conditions are re-evaluated.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
The request is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
.It Dv UMTX_OP_RW_WRLOCK
Write-lock a
.Vt struct urwlock
lock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the lock (of type
.Vt struct urwlock )
to be write-locked.
.El
.Pp
The request obtains a write lock on the specified
.Vt struct urwlock ,
by setting the
.Dv URWLOCK_WRITE_OWNER
bit in the
.Dv rw_state
word of the structure.
If there is already a write lock owner, as indicated by the
.Dv URWLOCK_WRITE_OWNER
bit being set, or there are read lock owners, as indicated
by the read-lock counter, the current thread does not attempt to
obtain the write lock.
Instead, it sets the
.Dv URWLOCK_WRITE_WAITERS
bit in the
.Dv rw_state
word and puts itself to sleep on the corresponding sleep queue.
Upon wakeup, the locking conditions are re-evaluated.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
The request is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
.It Dv UMTX_OP_RW_UNLOCK
Unlock rwlock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the lock (of type
.Vt struct urwlock )
to be unlocked.
.El
.Pp
The unlock type (read or write) is determined by the
current lock state.
Note that the
.Vt struct urwlock
does not save information about the identity of the thread which
acquired the lock.
.Pp
If there are pending writers after the unlock, and the
.Dv URWLOCK_PREFER_READER
flag is not set in the
.Dv rw_flags
member of the
.Fa *obj
structure, one writer is woken up, selected as described in the
.Sx SLEEP QUEUES
subsection.
If the
.Dv URWLOCK_PREFER_READER
flag is set, a pending writer is woken up only if there are
no pending readers.
.Pp
If there are no pending writers, or if the
.Dv URWLOCK_PREFER_READER
flag is set, all pending readers are woken up by the unlock.
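.Pp
The read-lock rules above can be mirrored in userspace so that the
kernel is entered only when a reader must wait.
The following sketch assumes
.Fn atomic_cmpset_acq_32
from
.In machine/atomic.h ;
the helper name is hypothetical, and unlocking is left to
.Dv UMTX_OP_RW_UNLOCK .
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/umtx.h>
#include <machine/atomic.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: read-lock rw, sleeping in the kernel if needed. */
static int
urwlock_rdlock(struct urwlock *rw)
{
	uint32_t blockers, state;

	/* Writer waiters block new readers unless readers are preferred. */
	blockers = URWLOCK_WRITE_OWNER;
	if ((rw->rw_flags & URWLOCK_PREFER_READER) == 0)
		blockers |= URWLOCK_WRITE_WAITERS;

	state = (uint32_t)rw->rw_state;
	while ((state & blockers) == 0 &&
	    URWLOCK_READER_COUNT(state) < URWLOCK_MAX_READERS) {
		/* Increment the reader count with acquire semantics. */
		if (atomic_cmpset_acq_32((volatile uint32_t *)&rw->rw_state,
		    state, state + 1))
			return (0);
		state = (uint32_t)rw->rw_state;
	}
	/* Blocked: the kernel sets URWLOCK_READ_WAITERS and sleeps. */
	if (_umtx_op(rw, UMTX_OP_RW_RDLOCK, 0, NULL, NULL) == -1)
		return (errno);
	return (0);
}
.Ed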
.It Dv UMTX_OP_WAIT_UINT_PRIVATE
Same as
.Dv UMTX_OP_WAIT_UINT ,
but unconditionally select the process-private sleep queue.
.It Dv UMTX_OP_WAKE_PRIVATE
Same as
.Dv UMTX_OP_WAKE ,
but unconditionally select the process-private sleep queue.
.It Dv UMTX_OP_MUTEX_WAIT
Wait for mutex availability.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Address of the mutex.
.El
.Pp
Similarly to the
.Dv UMTX_OP_MUTEX_LOCK
request, puts the requesting thread to sleep if the mutex lock cannot be
obtained immediately.
The
.Dv UMUTEX_CONTESTED
bit is set in the
.Dv m_owner
word of the mutex to indicate that there is a waiter, before the thread
is added to the sleep queue.
Unlike the
.Dv UMTX_OP_MUTEX_LOCK
request, the lock is not obtained.
.Pp
The operation is not implemented for priority protected and
priority inherited protocol mutexes.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
A request with a timeout specified is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
A request without a timeout automatically restarts if the signal disposition
requested restart via the
.Dv SA_RESTART
flag in the
.Dv sa_flags
member of
.Vt struct sigaction .
.It Dv UMTX_OP_NWAKE_PRIVATE
Wake up a batch of sleeping threads.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the array of pointers.
.It Fa val
Number of elements in the array pointed to by
.Fa obj .
.El
.Pp
For each element in the array pointed to by
.Fa obj ,
wakes up all threads waiting on the
.Em private
sleep queue with the key
being the byte addressed by the array element.
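.Pp
For example, a thread that has updated several words waited on with
.Dv UMTX_OP_WAIT_UINT_PRIVATE
could wake all of their sleepers with one system call instead of one
.Dv UMTX_OP_WAKE_PRIVATE
per word.
In the sketch below, the helper name is hypothetical.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/umtx.h>
#include <errno.h>

/* Hypothetical helper: wake all private waiters on each of n addresses. */
static int
wake_addrs(void *addrs[], int n)
{
	/* obj is the array of addresses, val its number of elements. */
	if (_umtx_op(addrs, UMTX_OP_NWAKE_PRIVATE, n, NULL, NULL) == -1)
		return (errno);
	return (0);
}
.Ed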
.It Dv UMTX_OP_MUTEX_WAKE
Check if a normal umutex is unlocked and wake up a waiter.
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
If the
.Dv m_owner
word of the mutex pointed to by the
.Fa obj
argument indicates an unowned mutex which has its contention indicator bit
.Dv UMUTEX_CONTESTED
set, the request clears the bit and wakes up one waiter in the sleep queue
associated with the byte addressed by
.Fa obj ,
if any.
Only normal mutexes are supported by the request, so the sleep queue
used is always the one for the normal mutex type.
.Pp
This request is deprecated in favor of
.Dv UMTX_OP_MUTEX_WAKE2
since mutexes using it cannot synchronize their own destruction.
That is, the
.Dv m_owner
word has already been set to
.Dv UMUTEX_UNOWNED
when this request is made,
so another thread can lock, unlock and destroy the mutex
(if no other thread uses the mutex afterwards).
Clearing the
.Dv UMUTEX_CONTESTED
bit may then modify freed memory.
.It Dv UMTX_OP_MUTEX_WAKE2
Check if a umutex is unlocked and wake up a waiter.
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.It Fa val
The umutex flags.
.El
.Pp
The request does not read the
.Dv m_flags
member of the
.Vt struct umutex ;
instead, the
.Fa val
argument supplies flag information, in particular, to determine the
sleep queue where the waiters are found for wake up.
.Pp
If the mutex is unowned, one waiter is woken up.
.Pp
If the mutex memory cannot be accessed, all waiters are woken up.
.Pp
If there is more than one waiter on the sleep queue, or there is only
one waiter but the mutex is owned by a thread, the
.Dv UMUTEX_CONTESTED
bit is set in the
.Dv m_owner
word of the
.Vt struct umutex .
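.Pp
Together,
.Dv UMTX_OP_MUTEX_WAIT
and
.Dv UMTX_OP_MUTEX_WAKE2
let userspace perform every ownership change of a normal umutex itself,
while the kernel only manages the sleep queue.
The following sketch of that pattern assumes a normal, non-robust
umutex and the
.Fn atomic_cmpset_acq_32
and
.Fn atomic_cmpset_rel_32
primitives from
.In machine/atomic.h ;
.Fa id
is the caller's thread id, and the helper names are hypothetical.
The flags are read before unlocking, so that userspace does not touch
the mutex memory after it may have been freed.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/umtx.h>
#include <machine/atomic.h>
#include <stdint.h>

/* Hypothetical helper: lock a normal umutex for thread id. */
static void
umutex_lock_wait(struct umutex *m, uint32_t id)
{
	uint32_t owner;

	for (;;) {
		owner = (uint32_t)m->m_owner;
		/* Unowned, contested or not: take it, keeping the bit. */
		if ((owner & ~UMUTEX_CONTESTED) == 0 &&
		    atomic_cmpset_acq_32((volatile uint32_t *)&m->m_owner,
		    owner, id | owner))
			return;
		/* Sets UMUTEX_CONTESTED and sleeps; does not take the lock. */
		(void)_umtx_op(m, UMTX_OP_MUTEX_WAIT, 0, NULL, NULL);
	}
}

/* Hypothetical helper: unlock a normal umutex held by the caller. */
static void
umutex_unlock_wake2(struct umutex *m)
{
	uint32_t flags, owner;

	flags = m->m_flags;	/* read before the mutex may be freed */
	do {
		owner = (uint32_t)m->m_owner;
	} while (!atomic_cmpset_rel_32((volatile uint32_t *)&m->m_owner,
	    owner, UMUTEX_UNOWNED));
	if ((owner & UMUTEX_CONTESTED) != 0)
		(void)_umtx_op(m, UMTX_OP_MUTEX_WAKE2, flags, NULL, NULL);
}
.Ed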
.It Dv UMTX_OP_SEM2_WAIT
Wait until the semaphore is available.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the semaphore (of type
.Vt struct _usem2 ) .
.It Fa uaddr
Size of the memory passed in via the
.Fa uaddr2
argument.
.It Fa uaddr2
Optional pointer to a structure of type
.Vt struct _umtx_time ,
which may be followed by a structure of type
.Vt struct timespec .
.El
.Pp
Put the requesting thread onto a sleep queue if the semaphore counter
is zero.
If the thread is put to sleep, the
.Dv USEM_HAS_WAITERS
bit is set in the
.Dv _count
word to indicate waiters.
The request returns either because
.Dv _count
indicates that the semaphore is available (a non-zero count due to a post),
or because of a wakeup.
The return does not guarantee that the semaphore is available,
nor does a successful return decrement the semaphore count.
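.Pp
A return from
.Dv UMTX_OP_SEM2_WAIT
neither guarantees availability nor consumes the count.
A userspace waiter would therefore typically retry the decrement itself
after each return.
The sketch below models that decrement with C11 atomics.
.Dv MODEL_HAS_WAITERS
and
.Dv MODEL_COUNT_MASK
are hypothetical stand-ins for
.Dv USEM_HAS_WAITERS
and the count bits of
.Dv _count .
The kernel call itself is not shown.
.Bd -literal
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for USEM_HAS_WAITERS and the count bits. */
#define	MODEL_HAS_WAITERS	0x80000000u
#define	MODEL_COUNT_MASK	0x7fffffffu

/*
 * Try to take one unit from the semaphore counter without sleeping,
 * preserving the waiters bit.  Returns false if the count is zero,
 * in which case the caller would issue the wait request and retry.
 */
static bool
model_sem_trywait(_Atomic uint32_t *count)
{
	uint32_t old = atomic_load(count);

	while ((old & MODEL_COUNT_MASK) > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old - 1))
			return (true);
	}
	return (false);
}
.Ed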
.Pp
Optionally, a timeout for the request may be specified.
.Pp
A request with a relative (non-absolute) timeout is not restartable.
An unblocked signal delivered during such a wait interrupts the sleep,
and the request fails with the
.Er EINTR
error.
.Pp
If
.Dv UMTX_ABSTIME
was not set, the operation was interrupted, and the caller passed in a
.Fa uaddr2
buffer large enough to hold a
.Vt struct timespec
following the initial
.Vt struct _umtx_time ,
then the
.Vt struct timespec
is updated to contain the unslept amount.
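.Pp
The following sketch shows one possible layout of the
.Fa uaddr2
buffer for a relative wait that asks for the unslept time.
It also shows a range check modelled on the
.Er EINVAL
conditions for
.Vt struct _umtx_time .
.Vt struct model_umtx_time
is a hypothetical mirror of
.Vt struct _umtx_time ,
written for illustration; real code should include
.In sys/umtx.h
instead.
The size of the whole buffer, for example
.Li sizeof(struct model_wait_buf)
cast to a pointer, is what would be passed in
.Fa uaddr .
The check assumes that
.Dv tv_nsec
must lie in the range 0 to 999999999.
.Bd -literal
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical mirror of struct _umtx_time; use the real header. */
struct model_umtx_time {
	struct timespec	_timeout;
	uint32_t	_flags;		/* UMTX_ABSTIME not set here */
	uint32_t	_clockid;	/* e.g. CLOCK_MONOTONIC */
};

/* The timespec that receives the unslept time follows the header. */
struct model_wait_buf {
	struct model_umtx_time	t;
	struct timespec		rem;	/* updated on interruption */
};

/* Reject negative fields and tv_nsec values of one second or more. */
static int
model_check_timeout(const struct timespec *ts)
{
	if (ts->tv_sec < 0 || ts->tv_nsec < 0 || ts->tv_nsec >= 1000000000L)
		return (EINVAL);
	return (0);
}
.Ed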
.It Dv UMTX_OP_SEM2_WAKE
Wake up a waiter on the semaphore.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the semaphore (of type
.Vt struct _usem2 ) .
.El
.Pp
The request wakes up one waiter for the semaphore.
It does not increment the semaphore count.
If the
.Dv USEM_HAS_WAITERS
bit was set in the
.Dv _count
word, and the last sleeping thread was woken up, the bit is cleared.
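.Pp
.Dv UMTX_OP_SEM2_WAKE
does not increment the count.
A poster therefore has to do that in userspace and then decide whether
the wake request is needed at all.
A minimal sketch with C11 atomics follows.
It uses hypothetical stand-in constants
.Dv MODEL_HAS_WAITERS
and
.Dv MODEL_COUNT_MASK
in place of
.Dv USEM_HAS_WAITERS
and the count bits of
.Dv _count .
.Bd -literal
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for USEM_HAS_WAITERS and the count bits. */
#define	MODEL_HAS_WAITERS	0x80000000u
#define	MODEL_COUNT_MASK	0x7fffffffu

/*
 * Increment the count; on success, *need_wake tells the caller
 * whether a UMTX_OP_SEM2_WAKE request should follow.  Returns false,
 * without posting, if the count would overflow.
 */
static bool
model_sem_post(_Atomic uint32_t *count, bool *need_wake)
{
	uint32_t old = atomic_load(count);

	do {
		if ((old & MODEL_COUNT_MASK) == MODEL_COUNT_MASK)
			return (false);
	} while (!atomic_compare_exchange_weak(count, &old, old + 1));
	*need_wake = (old & MODEL_HAS_WAITERS) != 0;
	return (true);
}
.Ed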
.It Dv UMTX_OP_SHM
Manage anonymous
.Tn POSIX
shared memory objects (see
.Xr shm_open 2 ) ,
which can be associated with a byte of physical memory mapped into the
process address space.
The objects are used to implement process-shared locks in
.Li libthr .
.Pp
The
.Fa val
argument specifies the sub-request of the
.Dv UMTX_OP_SHM
request:
.Bl -tag -width indent
.It Dv UMTX_SHM_CREAT
Create an anonymous shared memory object that can be looked up
with the specified key
.Fa uaddr .
If an object associated with the
.Fa uaddr
key already exists, it is returned instead of creating a new object.
The object's size is one page.
On success, the file descriptor referencing the object is returned.
The descriptor can be used for mapping the object using
.Xr mmap 2 ,
or for other shared memory operations.
.It Dv UMTX_SHM_LOOKUP
Same as the
.Dv UMTX_SHM_CREAT
request, but if there is no shared memory object associated with
the specified key
.Fa uaddr ,
an error is returned, and no new object is created.
.It Dv UMTX_SHM_DESTROY
Disassociate the shared object from the specified key
.Fa uaddr .
The object is destroyed after the last open file descriptor is closed
and the last mapping for it is destroyed.
.It Dv UMTX_SHM_ALIVE
Check whether there is a live shared object associated with the
supplied key
.Fa uaddr .
Return zero if there is, and an error otherwise.
This request is an optimization of the
.Dv UMTX_SHM_LOOKUP
request.
It is cheaper when only the liveness of the associated object is of
interest, since no file descriptor is installed in the process file
descriptor table on success.
.El
.Pp
The
.Fa uaddr
argument specifies a virtual address.
The identity of the physical memory byte backing that address is used as
the key for creating or looking up the anonymous shared object.
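.Pp
A sketch of how the sub-request in
.Fa val
might be validated.
It assumes that exactly one of the four
.Dv UMTX_SHM_*
sub-requests must be selected, and that any other combination is rejected
with
.Er EINVAL .
The
.Dv MODEL_SHM_*
bit values are hypothetical placeholders, not the values from
.In sys/umtx.h .
.Bd -literal
#include <errno.h>
#include <stdint.h>

/* Hypothetical placeholder bits for the UMTX_SHM_* sub-requests. */
#define	MODEL_SHM_CREAT		0x0001u
#define	MODEL_SHM_LOOKUP	0x0002u
#define	MODEL_SHM_DESTROY	0x0004u
#define	MODEL_SHM_ALIVE		0x0008u
#define	MODEL_SHM_ALL	(MODEL_SHM_CREAT | MODEL_SHM_LOOKUP | MODEL_SHM_DESTROY | MODEL_SHM_ALIVE)

/* Accept val only if it selects exactly one sub-request. */
static int
model_shm_check(uint32_t val)
{
	uint32_t sel = val & MODEL_SHM_ALL;

	/* A single bit is set iff sel is non-zero and a power of two. */
	if (sel == 0 || (sel & (sel - 1)) != 0)
		return (EINVAL);
	return (0);
}
.Ed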
.It Dv UMTX_OP_ROBUST_LISTS
Register the list heads for the current thread's robust mutex lists.
The arguments to the request are:
.Bl -tag -width "uaddr"
.It Fa val
Size of the structure passed in the
.Fa uaddr
argument.
.It Fa uaddr
Pointer to the structure of type
.Vt struct umtx_robust_lists_params .
.El
.Pp
The structure is defined as
.Bd -literal
struct umtx_robust_lists_params {
	uintptr_t	robust_list_offset;
	uintptr_t	robust_priv_list_offset;
	uintptr_t	robust_inact_offset;
};
.Ed
.Pp
The
.Dv robust_list_offset
member contains the address of the first element in the list of locked
robust shared mutexes.
The
.Dv robust_priv_list_offset
member contains the address of the first element in the list of locked
robust private mutexes.
The private and shared robust locked lists are kept separate so that,
in the child after a fork, the shared list can be terminated quickly.
.Pp
The
.Dv robust_inact_offset
member contains a pointer to a mutex which might be locked in the near
future, or might have just been unlocked.
It is typically set by the mutex lock or unlock implementation for the
duration of the whole operation, since the lists can only be changed
race-free while the thread owns the mutex.
The kernel inspects
.Dv robust_inact_offset
in addition to walking the shared and private lists.
Also, at thread termination time the mutex pointed to by
.Dv robust_inact_offset
is handled more loosely than the other mutexes on the lists:
it is allowed not to be owned by the current thread,
in which case list processing continues.
See the
.Sx ROBUST UMUTEXES
subsection for details.
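.Pp
The thread-termination handling described above can be modelled in
portable C as follows.
This is an illustrative sketch under several assumptions, not the
kernel algorithm:
.Bl -bullet
.It
Each mutex is reduced to an owner word and a list link standing in for
.Dv m_rb_lnk .
.It
.Dv MODEL_CONTESTED
and
.Dv MODEL_OWNERDEAD
are hypothetical stand-ins for
.Dv UMUTEX_CONTESTED
and
.Dv UMUTEX_RB_OWNERDEAD .
.It
A mutex not owned by the exiting thread ends the walk of its list,
unless it is the
.Dv robust_inact_offset
mutex.
.It
Waking of waiters is omitted.
.El
.Bd -literal
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real values live in <sys/umtx.h>. */
#define	MODEL_CONTESTED		0x80000000u
#define	MODEL_OWNERDEAD		(MODEL_CONTESTED | 0x10u)

/* Simplified robust umutex: owner word plus list link. */
struct model_mutex {
	uint32_t		 owner;	/* thread id, maybe | CONTESTED */
	struct model_mutex	*next;	/* models m_rb_lnk */
};

/* Mark m as owner-dead if tid owns it; report whether it did. */
static bool
model_mark_dead(struct model_mutex *m, uint32_t tid)
{
	if ((m->owner & ~MODEL_CONTESTED) != tid)
		return (false);
	m->owner = MODEL_OWNERDEAD;
	return (true);
}

/*
 * Walk one list.  A mutex not owned by tid stops the walk, except
 * the inact mutex, which is skipped; *inact is cleared once seen.
 */
static void
model_walk(struct model_mutex *head, uint32_t tid,
    struct model_mutex **inact)
{
	struct model_mutex *m, *next;

	for (m = head; m != NULL; m = next) {
		bool is_inact = (m == *inact);

		next = m->next;		/* read the link before marking */
		if (is_inact)
			*inact = NULL;
		if (!model_mark_dead(m, tid) && !is_inact)
			break;
	}
}

/* Process both lists, then the inact mutex if it was not on them. */
static void
model_thread_exit(struct model_mutex *shared, struct model_mutex *priv,
    struct model_mutex *inact, uint32_t tid)
{
	model_walk(shared, tid, &inact);
	model_walk(priv, tid, &inact);
	if (inact != NULL)
		(void)model_mark_dead(inact, tid);
}
.Ed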
.It Dv UMTX_OP_GET_MIN_TIMEOUT
Write the current value of the minimal timeout for umtx operations,
in nanoseconds, into the long integer variable pointed to by
.Fa uaddr .
.It Dv UMTX_OP_SET_MIN_TIMEOUT
Set the minimal amount of time, in nanoseconds, the thread is required
to sleep for umtx operations specifying a timeout using absolute clocks.
The value is taken from the
.Fa val
argument of the call.
Zero means no minimum.
1283Zero means no minimum.
1284.El
1285.Pp
1286The
1287.Fa op
1288argument may be a bitwise OR of a single command from above with one or more of
1289the following flags:
1290.Bl -tag -width indent
1291.It Dv UMTX_OP__I386
1292Request i386 ABI compatibility from the native
1293.Nm
1294system call.
1295Specifically, this implies that:
1296.Bl -hang -offset indent
1297.It
1298.Fa obj
1299arguments that point to a word, point to a 32-bit integer.
1300.It
1301The
1302.Dv UMTX_OP_NWAKE_PRIVATE
1303.Fa obj
1304argument is a pointer to an array of 32-bit pointers.
1305.It
1306The
1307.Dv m_rb_lnk
1308member of
1309.Vt struct umutex
1310is a 32-bit pointer.
1311.It
1312.Vt struct timespec
1313uses a 32-bit time_t.
1314.El
1315.Pp
1316.Dv UMTX_OP__32BIT
1317has no effect if this flag is set.
1318This flag is valid for all architectures, but it is ignored on i386.
1319.It Dv UMTX_OP__32BIT
1320Request non-i386, 32-bit ABI compatibility from the native
1321.Nm
1322system call.
1323Specifically, this implies that:
1324.Bl -hang -offset indent
1325.It
1326.Fa obj
1327arguments that point to a word, point to a 32-bit integer.
1328.It
1329The
1330.Dv UMTX_OP_NWAKE_PRIVATE
1331.Fa obj
1332argument is a pointer to an array of 32-bit pointers.
1333.It
1334The
1335.Dv m_rb_lnk
1336member of
1337.Vt struct umutex
1338is a 32-bit pointer.
1339.It
1340.Vt struct timespec
1341uses a 64-bit time_t.
1342.El
1343.Pp
1344This flag has no effect if
1345.Dv UMTX_OP__I386
1346is set.
1347This flag is valid for all architectures.
1348.El
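.Pp
The precedence rules for these flags can be summarised by the following
sketch.
.Dv MODEL_OP__I386
and
.Dv MODEL_OP__32BIT
are placeholder values standing in for
.Dv UMTX_OP__I386
and
.Dv UMTX_OP__32BIT .
The real values should be taken from
.In sys/umtx.h .
The
.Fa native_is_i386
parameter of the hypothetical
.Fn model_decode_op
function replaces a compile-time architecture check.
.Bd -literal
#include <stdint.h>

/* Hypothetical placeholders; take real values from <sys/umtx.h>. */
#define	MODEL_OP__I386		0x40000000u
#define	MODEL_OP__32BIT		0x80000000u
#define	MODEL_OP_FLAGS		(MODEL_OP__I386 | MODEL_OP__32BIT)

enum model_abi { MODEL_ABI_NATIVE, MODEL_ABI_I386, MODEL_ABI_32BIT };

/*
 * Split op into its command and the ABI it requests: the i386 flag
 * overrides the 32-bit flag, and is itself ignored on native i386.
 */
static enum model_abi
model_decode_op(uint32_t op, int native_is_i386, uint32_t *cmd)
{
	*cmd = op & ~MODEL_OP_FLAGS;
	if ((op & MODEL_OP__I386) != 0)
		return (native_is_i386 ? MODEL_ABI_NATIVE : MODEL_ABI_I386);
	if ((op & MODEL_OP__32BIT) != 0)
		return (MODEL_ABI_32BIT);
	return (MODEL_ABI_NATIVE);
}
.Ed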
.Pp
Note that if any 32-bit ABI compatibility is being requested, then care must be
taken with robust lists.
A single thread may not mix 32-bit compatible robust lists with native
robust lists.
The first
.Dv UMTX_OP_ROBUST_LISTS
call in a given thread determines which ABI that thread will use for robust
lists going forward.
.Sh RETURN VALUES
On success, all requests except the
.Dv UMTX_SHM_CREAT
and
.Dv UMTX_SHM_LOOKUP
sub-requests of the
.Dv UMTX_OP_SHM
request return zero.
The
.Dv UMTX_SHM_CREAT
and
.Dv UMTX_SHM_LOOKUP
sub-requests return a shared memory file descriptor on success.
On error, \-1 is returned, and the
.Va errno
variable is set to indicate the error.
.Sh ERRORS
The
.Fn _umtx_op
operations can fail with the following errors:
.Bl -tag -width "[ETIMEDOUT]"
.It Bq Er EFAULT
One of the arguments points to invalid memory.
.It Bq Er EINVAL
The clock identifier, specified for the
.Vt struct _umtx_time
timeout parameter, or in the
.Dv c_clockid
member of
.Vt struct ucond ,
is invalid.
.It Bq Er EINVAL
The type of the mutex, encoded by the
.Dv m_flags
member of
.Vt struct umutex ,
is invalid.
.It Bq Er EINVAL
The lock owner thread identifier in the
.Dv m_owner
member of the
.Vt struct umutex
changed during unlock.
.It Bq Er EINVAL
The
.Dv timeout.tv_sec
or
.Dv timeout.tv_nsec
member of
.Vt struct _umtx_time
is less than zero, or
.Dv timeout.tv_nsec
is greater than or equal to 1000000000.
.It Bq Er EINVAL
The
.Fa op
argument specifies an invalid operation.
.It Bq Er EINVAL
The
.Fa val
argument for the
.Dv UMTX_OP_SHM
request specifies an invalid sub-request.
.It Bq Er EINVAL
The
.Dv UMTX_OP_SET_CEILING
request specifies a mutex that is not priority protected.
.It Bq Er EINVAL
The new ceiling value for the
.Dv UMTX_OP_SET_CEILING
request, or one or more of the values read from the
.Dv m_ceilings
array during lock or unlock operations, is greater than
.Dv RTP_PRIO_MAX .
.It Bq Er EPERM
Unlock attempted on an object not owned by the current thread.
.It Bq Er EOWNERDEAD
The lock was requested on an umutex where the
.Dv m_owner
field was set to the
.Dv UMUTEX_RB_OWNERDEAD
value, indicating a robust mutex whose owner terminated.
The lock was granted to the caller, so this error in fact
indicates success with additional conditions.
.It Bq Er ENOTRECOVERABLE
The lock was requested on an umutex whose
.Dv m_owner
field is equal to the
.Dv UMUTEX_RB_NOTRECOV
value, indicating a robust mutex that was abandoned after its owner
terminated.
The lock was not granted to the caller.
.It Bq Er ENOTTY
The shared memory object, associated with the address passed to the
.Dv UMTX_SHM_ALIVE
sub-request of the
.Dv UMTX_OP_SHM
request, was destroyed.
.It Bq Er ESRCH
For the
.Dv UMTX_SHM_LOOKUP ,
.Dv UMTX_SHM_DESTROY ,
and
.Dv UMTX_SHM_ALIVE
sub-requests of the
.Dv UMTX_OP_SHM
request, there is no shared memory object associated with the provided key.
.It Bq Er ENOMEM
The
.Dv UMTX_SHM_CREAT
sub-request of the
.Dv UMTX_OP_SHM
request cannot be satisfied because allocating the shared memory object
would exceed the
.Dv RLIMIT_UMTXP
resource limit; see
.Xr setrlimit 2 .
.It Bq Er EAGAIN
The maximum number of readers
.Dv ( URWLOCK_MAX_READERS )
had already been granted read ownership of the given
.Vt struct urwlock .
.It Bq Er EBUSY
A trylock operation on a mutex was not able to obtain the lock.
.It Bq Er ETIMEDOUT
The request specified a timeout in the
.Fa uaddr
and
.Fa uaddr2
arguments, and timed out before obtaining the lock or being woken up.
.It Bq Er EINTR
A signal was delivered during a wait in a non-restartable operation.
Operations with timeouts are typically non-restartable, but timeouts
specified in absolute time may be restartable.
.It Bq Er ERESTART
A signal was delivered during a wait in a restartable operation.
Mutex lock requests without a timeout are restartable.
This error is not returned to userspace code, since the restart
is handled by the usual adjustment of the program counter.
.El
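.Pp
.Er EOWNERDEAD
reports a granted lock.
A caller of a lock request therefore has to treat it differently from
the other errors.
A minimal sketch of that decision follows.
The hypothetical
.Fn model_lock_acquired
function takes the
.Va errno
value from a failed call, or zero on success.
.Bd -literal
#include <errno.h>
#include <stdbool.h>

/*
 * Return true if the caller now owns the mutex.  *needs_recovery is
 * set when the lock was inherited from a terminated owner, so the
 * state it protects may be inconsistent.
 */
static bool
model_lock_acquired(int error, bool *needs_recovery)
{
	*needs_recovery = (error == EOWNERDEAD);
	return (error == 0 || error == EOWNERDEAD);
}
.Ed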
.Sh SEE ALSO
.Xr clock_gettime 2 ,
.Xr mmap 2 ,
.Xr setrlimit 2 ,
.Xr shm_open 2 ,
.Xr sigaction 2 ,
.Xr thr_exit 2 ,
.Xr thr_kill 2 ,
.Xr thr_kill2 2 ,
.Xr thr_new 2 ,
.Xr thr_self 2 ,
.Xr thr_set_name 2 ,
.Xr signal 3
.Sh STANDARDS
The
.Fn _umtx_op
system call is non-standard and is used by the
.Lb libthr
to implement
.St -p1003.1-2001
.Xr pthread 3
functionality.
.Sh BUGS
A window between unlocking a robust mutex and resetting the pointer in the
.Dv robust_inact_offset
member of the registered
.Vt struct umtx_robust_lists_params
allows another thread to destroy the mutex, thus making the kernel inspect
freed or reused memory.
The
.Li libthr
implementation is only vulnerable to this race when operating on
a shared mutex.
A possible fix for the current implementation is to strengthen the checks
for shared mutexes before terminating them, in particular, verifying
that the mutex memory is mapped from a shared memory object allocated
by the
.Dv UMTX_OP_SHM
request.
This is not done because it is believed that the race is adequately
covered by other consistency checks, while adding the check would
prevent alternative implementations of
.Li libpthread .