.\" Copyright (c) 2016 The FreeBSD Foundation
.\"
.\" This documentation was written by
.\" Konstantin Belousov <kib@FreeBSD.org> under sponsorship
.\" from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHORS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHORS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd November 23, 2020
.Dt _UMTX_OP 2
.Os
.Sh NAME
.Nm _umtx_op
.Nd interface for implementation of userspace threading synchronization primitives
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/umtx.h
.Ft int
.Fn _umtx_op "void *obj" "int op" "u_long val" "void *uaddr" "void *uaddr2"
.Sh DESCRIPTION
The
.Fn _umtx_op
system call provides kernel support for userspace implementation of
the threading synchronization primitives.
The
.Lb libthr
uses the syscall to implement
.St -p1003.1-2001
pthread locks, like mutexes, condition variables and so on.
.Ss STRUCTURES
The operations, performed by the
.Fn _umtx_op
syscall, operate on userspace objects which are described
by the following structures.
Reserved fields and paddings are omitted.
All objects require ABI-mandated alignment, but this is not currently
enforced consistently on all architectures.
.Pp
The following flags are defined for flag fields of all structures:
.Bl -tag -width indent
.It Dv USYNC_PROCESS_SHARED
Allow selection of the process-shared sleep queue for the thread sleep
container, when the lock ownership cannot be granted immediately,
and the operation must sleep.
The process-shared or process-private sleep queue is selected based on
the attributes of the memory mapping which contains the first byte of
the structure, see
.Xr mmap 2 .
Otherwise, if the flag is not specified, the process-private sleep queue
is selected regardless of the memory mapping attributes, as an optimization.
.Pp
See the
.Sx SLEEP QUEUES
subsection below for more details on sleep queues.
.El
.Bl -hang -offset indent
.It Sy Mutex
.Bd -literal
struct umutex {
	volatile lwpid_t	m_owner;
	uint32_t		m_flags;
	uint32_t		m_ceilings[2];
	uintptr_t		m_rb_lnk;
};
.Ed
.Pp
The
.Dv m_owner
field is the actual lock.
It contains either the thread identifier of the lock owner in the
locked state, or zero when the lock is unowned.
The highest bit set indicates that there is contention on the lock.
The following constants are defined for special values:
.Bl -tag -width indent
.It Dv UMUTEX_UNOWNED
Zero, the value stored in the unowned lock.
.It Dv UMUTEX_CONTESTED
The contention indicator.
.It Dv UMUTEX_RB_OWNERDEAD
A thread owning the robust mutex terminated.
The mutex is in the unlocked state.
.It Dv UMUTEX_RB_NOTRECOV
The robust mutex is in a non-recoverable state.
It cannot be locked until reinitialized.
.El
.Pp
The
.Dv m_flags
field may contain the following umutex-specific flags, in addition to
the common flags:
.Bl -tag -width indent
.It Dv UMUTEX_PRIO_INHERIT
Mutex implements the
.Em Priority Inheritance
protocol.
.It Dv UMUTEX_PRIO_PROTECT
Mutex implements the
.Em Priority Protection
protocol.
.It Dv UMUTEX_ROBUST
Mutex is robust, as described in the
.Sx ROBUST UMUTEXES
section below.
.It Dv UMUTEX_NONCONSISTENT
Robust mutex is in a transient non-consistent state.
Not used by the kernel.
.El
.Pp
In this manual page, mutexes that have neither the
.Dv UMUTEX_PRIO_INHERIT
nor the
.Dv UMUTEX_PRIO_PROTECT
flag set are called normal mutexes.
Each type of mutex
.Pq normal, priority-inherited, and priority-protected
has a separate sleep queue associated
with the given key.
.Pp
For priority protected mutexes, the
.Dv m_ceilings
array contains priority ceiling values.
The
.Dv m_ceilings[0]
is the ceiling value for the mutex, as specified by
.St -p1003.1-2008
for the
.Em Priority Protected
mutex protocol.
The
.Dv m_ceilings[1]
is used only for the unlock of a priority protected mutex, when
unlock is done in an order other than the reversed lock order.
In this case,
.Dv m_ceilings[1]
must contain the ceiling value for the last locked priority protected
mutex, for proper priority reassignment.
If, instead, the unlocking mutex was the last priority propagated
mutex locked by the thread,
.Dv m_ceilings[1]
should contain \-1.
This is required because the kernel does not maintain the ordered lock list.
.It Sy Condition variable
.Bd -literal
struct ucond {
	volatile uint32_t	c_has_waiters;
	uint32_t		c_flags;
	uint32_t		c_clockid;
};
.Ed
.Pp
A non-zero
.Dv c_has_waiters
value indicates that there are in-kernel waiters for the condition,
executing the
.Dv UMTX_OP_CV_WAIT
request.
.Pp
The
.Dv c_flags
field contains flags.
Only the common flags
.Pq Dv USYNC_PROCESS_SHARED
are defined for ucond.
.Pp
The
.Dv c_clockid
member provides the clock identifier to use for timeout, when the
.Dv UMTX_OP_CV_WAIT
request has both the
.Dv CVWAIT_CLOCKID
flag and the timeout specified.
Valid clock identifiers are a subset of those for
.Xr clock_gettime 2 :
.Bl -bullet -compact
.It
.Dv CLOCK_MONOTONIC
.It
.Dv CLOCK_MONOTONIC_FAST
.It
.Dv CLOCK_MONOTONIC_PRECISE
.It
.Dv CLOCK_PROF
.It
.Dv CLOCK_REALTIME
.It
.Dv CLOCK_REALTIME_FAST
.It
.Dv CLOCK_REALTIME_PRECISE
.It
.Dv CLOCK_SECOND
.It
.Dv CLOCK_TAI
.It
.Dv CLOCK_UPTIME
.It
.Dv CLOCK_UPTIME_FAST
.It
.Dv CLOCK_UPTIME_PRECISE
.It
.Dv CLOCK_VIRTUAL
.El
.It Sy Reader/writer lock
.Bd -literal
struct urwlock {
	volatile int32_t	rw_state;
	uint32_t		rw_flags;
	uint32_t		rw_blocked_readers;
	uint32_t		rw_blocked_writers;
};
.Ed
.Pp
The
.Dv rw_state
field is the actual lock.
It contains both the flags and counter of the read locks which were
granted.
The
.Dv rw_state
bits are named as follows:
.Bl -tag -width indent
.It Dv URWLOCK_WRITE_OWNER
Write lock was granted.
.It Dv URWLOCK_WRITE_WAITERS
There are write lock waiters.
.It Dv URWLOCK_READ_WAITERS
There are read lock waiters.
.It Dv URWLOCK_READER_COUNT(c)
Returns the count of currently granted read locks.
.El
.Pp
At any given time there may be only one thread to which the writer lock
is granted on the
.Vt struct urwlock ,
and no threads are granted read lock.
Or, at the given time, up to
.Dv URWLOCK_MAX_READERS
threads may be granted the read lock simultaneously, but write lock is
not granted to any thread.
.Pp
The following flags for the
.Dv rw_flags
member of
.Vt struct urwlock
are defined, in addition to the common flags:
.Bl -tag -width indent
.It Dv URWLOCK_PREFER_READER
If specified, immediately grant read lock requests when
.Dv urwlock
is already read-locked, even in presence of unsatisfied write
lock requests.
By default, if there is a write lock waiter, further read requests are
not granted, to prevent unfair write lock waiter starvation.
.El
.Pp
The
.Dv rw_blocked_readers
and
.Dv rw_blocked_writers
members contain the count of threads which are sleeping in kernel,
waiting for the associated request type to be granted.
The fields are used by the kernel to update the
.Dv URWLOCK_READ_WAITERS
and
.Dv URWLOCK_WRITE_WAITERS
flags of the
.Dv rw_state
lock after the requesting thread was woken up.
.It Sy Semaphore
.Bd -literal
struct _usem2 {
	volatile uint32_t	_count;
	uint32_t		_flags;
};
.Ed
.Pp
The
.Dv _count
word represents a counting semaphore.
A non-zero value indicates an unlocked (posted) semaphore, while zero
represents the locked state.
The maximal supported semaphore count is
.Dv USEM_MAX_COUNT .
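.Pp
As an illustration only, the
.Dv _count
word of
.Vt struct _usem2
might be decoded in userspace as sketched here.
The numeric values of
.Dv USEM_HAS_WAITERS
and
.Dv USEM_MAX_COUNT
are assumptions made for this sketch (waiters indicator in the top bit,
counter in the remaining 31 bits), and the helper functions are
hypothetical; consult
.In sys/umtx.h
for the real definitions.
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for this sketch; see <sys/umtx.h> for the real ones. */
#define	USEM_HAS_WAITERS	0x80000000U
#define	USEM_MAX_COUNT		0x7fffffffU
#define	USEM_COUNT(c)		((c) & USEM_MAX_COUNT)

/* Hypothetical helper: true when at least one post is available. */
static bool
usem_is_posted(uint32_t count_word)
{
	return (USEM_COUNT(count_word) != 0);
}

/* Hypothetical helper: true when the waiters indicator is set. */
static bool
usem_has_waiters(uint32_t count_word)
{
	return ((count_word & USEM_HAS_WAITERS) != 0);
}
```
.Ed
.Pp
Under these assumptions a word holding only the waiters indicator is
still a locked semaphore, since its counter part is zero.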
.Pp
The
.Dv _count
word, besides the counter of posts (unlocks), also contains the
.Dv USEM_HAS_WAITERS
bit, which indicates that the locked semaphore has waiting threads.
.Pp
The
.Dv USEM_COUNT()
macro, applied to the
.Dv _count
word, returns the current semaphore counter, which is the number of posts
issued on the semaphore.
.Pp
The following bits for the
.Dv _flags
member of
.Vt struct _usem2
are defined, in addition to the common flags:
.Bl -tag -width indent
.It Dv USEM_NAMED
Flag is ignored by the kernel.
.El
.It Sy Timeout parameter
.Bd -literal
struct _umtx_time {
	struct timespec	_timeout;
	uint32_t	_flags;
	uint32_t	_clockid;
};
.Ed
.Pp
Several
.Fn _umtx_op
operations allow the blocking time to be limited, failing the request
if it cannot be satisfied in the specified time period.
The timeout is specified by passing either the address of
.Vt struct timespec ,
or its extended variant,
.Vt struct _umtx_time ,
as the
.Fa uaddr2
argument of
.Fn _umtx_op .
They are distinguished by the
.Fa uaddr
value, which must be equal to the size of the structure pointed to by
.Fa uaddr2 ,
cast to
.Vt uintptr_t .
.Pp
The
.Dv _timeout
member specifies the time when the timeout should occur.
Legal values for clock identifier
.Dv _clockid
are shared with the
.Fa clock_id
argument to the
.Xr clock_gettime 2
function,
and use the same underlying clocks.
The specified clock is used to obtain the current time value.
Interval counting is always performed by the monotonic wall clock.
.Pp
The
.Dv _flags
argument allows the following flags to further define the timeout behaviour:
.Bl -tag -width indent
.It Dv UMTX_ABSTIME
The
.Dv _timeout
value is the absolute time.
The thread will be unblocked and the request failed when the specified
clock value equals or exceeds the
.Dv _timeout
value.
.Pp
If the flag is absent, the timeout value is relative, that is the amount
of time, measured by the monotonic wall clock from the moment of the request
start.
.El
.El
.Ss SLEEP QUEUES
When a locking request cannot be immediately satisfied, the thread is
typically put to
.Em sleep ,
which is a non-runnable state terminated by the
.Em wake
operation.
Lock operations include a
.Em try
variant which returns an error rather than sleeping if the lock cannot
be obtained.
Also,
.Fn _umtx_op
provides requests which explicitly put the thread to sleep.
.Pp
Wakes need to know which threads to make runnable, so sleeping threads
are grouped into containers called
.Em sleep queues .
A sleep queue is identified by a key, which for
.Fn _umtx_op
is defined as the physical address of some variable.
Note that the
.Em physical
address is used, which means that the same variable mapped multiple
times will give one key value.
This mechanism enables the construction of
.Em process-shared
locks.
.Pp
A related attribute of the key is shareability.
Some requests always interpret keys as private for the current process,
creating sleep queues with the scope of the current process even if
the memory is shared.
Others either select the shareability automatically from the
mapping attributes, or take additional input as the
.Dv USYNC_PROCESS_SHARED
common flag.
This is done as an optimization, allowing the lock scope to be limited
regardless of the kind of backing memory.
.Pp
Only the address of the start byte of the variable specified as key is
important for determining the corresponding sleep queue.
The size of the variable does not matter, so, for example, sleep on the same
address interpreted as
.Vt uint32_t
and
.Vt long
on a little-endian 64-bit platform would collide.
.Pp
The last attribute of the key is the object type.
The sleep queue to which a sleeping thread is assigned is an individual
one for simple wait requests, mutexes, rwlocks, condvars and other
primitives, even when the physical address of the key is the same.
.Pp
When waking up a limited number of threads from a given sleep queue,
the highest priority threads that have been blocked for the longest on
the queue are selected.
.Ss ROBUST UMUTEXES
The
.Em robust umutexes
are provided as a substrate for a userspace library to implement
.Tn POSIX
robust mutexes.
A robust umutex must have the
.Dv UMUTEX_ROBUST
flag set.
.Pp
On thread termination, the kernel walks two lists of mutexes.
The head addresses of the two lists must be provided by a prior
.Dv UMTX_OP_ROBUST_LISTS
request.
The lists are singly-linked.
The link to the next element is provided by the
.Dv m_rb_lnk
member of the
.Vt struct umutex .
.Pp
Robust list processing is aborted if the kernel finds a mutex
with any of the following conditions:
.Bl -dash -offset indent -compact
.It
the
.Dv UMUTEX_ROBUST
flag is not set
.It
not owned by the current thread, except when the mutex is pointed to
by the
.Dv robust_inact_offset
member of the
.Vt struct umtx_robust_lists_params ,
registered for the current thread
.It
the combination of mutex flags is invalid
.It
read of the umutex memory faults
.It
the list length limit described in
.Xr libthr 3
is reached.
.El
.Pp
Every mutex in both lists is unlocked as if the
.Dv UMTX_OP_MUTEX_UNLOCK
request is performed on it, but instead of the
.Dv UMUTEX_UNOWNED
value, the
.Dv m_owner
field is written with the
.Dv UMUTEX_RB_OWNERDEAD
value.
When a mutex in the
.Dv UMUTEX_RB_OWNERDEAD
state is locked by the kernel due to the
.Dv UMTX_OP_MUTEX_TRYLOCK
and
.Dv UMTX_OP_MUTEX_LOCK
requests, the lock is granted and the
.Er EOWNERDEAD
error is returned.
.Pp
Also, the kernel handles the
.Dv UMUTEX_RB_NOTRECOV
value of the
.Dv m_owner
field specially, always returning the
.Er ENOTRECOVERABLE
error for lock attempts, without granting the lock.
.Ss OPERATIONS
The following operations, requested by the
.Fa op
argument to the function, are implemented:
.Bl -tag -width indent
.It Dv UMTX_OP_WAIT
Wait.
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to a variable of type
.Vt long .
.It Fa val
Current value of the
.Dv *obj .
.El
.Pp
The current value of the variable pointed to by the
.Fa obj
argument is compared with the
.Fa val .
If they are equal, the requesting thread is put to interruptible sleep
until woken up or the optionally specified timeout expires.
.Pp
The comparison and sleep are atomic.
In other words, if another thread writes a new value to
.Dv *obj
and then issues
.Dv UMTX_OP_WAKE ,
the request is guaranteed to not miss the wakeup,
which might otherwise happen between comparison and blocking.
.Pp
The physical address of memory where the
.Fa *obj
variable is located, is used as a key to index sleeping threads.
.Pp
The read of the current value of the
.Dv *obj
variable is not guarded by barriers.
In particular, it is the user's duty to ensure the lock acquire
and release memory semantics, if the
.Dv UMTX_OP_WAIT
and
.Dv UMTX_OP_WAKE
requests are used as a substrate for implementing a simple lock.
.Pp
The request is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
.Pp
Optionally, a timeout for the request may be specified.
.It Dv UMTX_OP_WAKE
Wake the threads possibly sleeping due to
.Dv UMTX_OP_WAIT .
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to a variable, used as a key to find sleeping threads.
.It Fa val
Up to
.Fa val
threads are woken up by this request.
Specify
.Dv INT_MAX
to wake up all waiters.
.El
.It Dv UMTX_OP_MUTEX_TRYLOCK
Try to lock umutex.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
Operates the same as the
.Dv UMTX_OP_MUTEX_LOCK
request, but returns
.Er EBUSY
instead of sleeping if the lock cannot be obtained immediately.
.It Dv UMTX_OP_MUTEX_LOCK
Lock umutex.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
Locking is performed by writing the current thread id into the
.Dv m_owner
word of the
.Vt struct umutex .
The write is atomic, preserves the
.Dv UMUTEX_CONTESTED
contention indicator, and provides the acquire barrier for
lock entrance semantic.
.Pp
If the lock cannot be obtained immediately because another thread owns
the lock, the current thread is put to sleep, with the
.Dv UMUTEX_CONTESTED
bit set beforehand.
Upon wake up, the lock conditions are re-tested.
.Pp
The request adheres to the priority protection or inheritance protocol
of the mutex, specified by the
.Dv UMUTEX_PRIO_PROTECT
or
.Dv UMUTEX_PRIO_INHERIT
flag, respectively.
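.Pp
The write to
.Dv m_owner
described above can be pictured with a userspace model built on C11
atomics.
This is a minimal sketch for a normal mutex, not the kernel or
.Lb libthr
code: the
.Vt struct demo_umutex
type and the
.Fn demo_trylock
and
.Fn demo_owner
helpers are hypothetical, the thread id is simplified to an unsigned
32-bit value, and the numeric value of
.Dv UMUTEX_CONTESTED
is assumed to be the highest bit.
.Bd -literal
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for this sketch; see <sys/umtx.h> for the real ones. */
#define	UMUTEX_UNOWNED		0x0U
#define	UMUTEX_CONTESTED	0x80000000U

/* Hypothetical stand-in for the m_owner word of struct umutex. */
struct demo_umutex {
	_Atomic uint32_t	m_owner;
};

/*
 * Try to store tid into an unowned lock word with acquire semantics.
 * On failure a real caller would fall back to UMTX_OP_MUTEX_LOCK.
 */
static bool
demo_trylock(struct demo_umutex *m, uint32_t tid)
{
	uint32_t old = UMUTEX_UNOWNED;

	if (atomic_compare_exchange_strong_explicit(&m->m_owner, &old,
	    tid, memory_order_acquire, memory_order_relaxed))
		return (true);
	/* Unowned but contested: keep the contention indicator. */
	old = UMUTEX_CONTESTED;
	return (atomic_compare_exchange_strong_explicit(&m->m_owner,
	    &old, tid | UMUTEX_CONTESTED, memory_order_acquire,
	    memory_order_relaxed));
}

/* Owner thread id with the contention indicator masked off. */
static uint32_t
demo_owner(struct demo_umutex *m)
{
	return (atomic_load_explicit(&m->m_owner,
	    memory_order_relaxed) & ~UMUTEX_CONTESTED);
}
```
.Ed
.Pp
In this model an uncontended lock never enters the kernel, and a thread
that takes over a contested word leaves the indicator set so that a later
unlock still knows to wake the sleepers.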
.Pp
Optionally, a timeout for the request may be specified.
.Pp
A request with a timeout specified is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
A request without a timeout specified is always restarted after return
from a signal handler.
.It Dv UMTX_OP_MUTEX_UNLOCK
Unlock umutex.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
Unlocks the mutex, by writing the
.Dv UMUTEX_UNOWNED
(zero) value into the
.Dv m_owner
word of the
.Vt struct umutex .
The write is done with a release barrier, to provide lock leave semantic.
.Pp
If there are threads sleeping in the sleep queue associated with the
umutex, one thread is woken up.
If more than one thread sleeps in the sleep queue, the
.Dv UMUTEX_CONTESTED
bit is set together with the write of the
.Dv UMUTEX_UNOWNED
value into
.Dv m_owner .
.Pp
The request adheres to the priority protection or inheritance protocol
of the mutex, specified by the
.Dv UMUTEX_PRIO_PROTECT
or
.Dv UMUTEX_PRIO_INHERIT
flag, respectively.
See the description of the
.Dv m_ceilings
member of the
.Vt struct umutex
structure for additional details of the request operation on the
priority protected protocol mutex.
.It Dv UMTX_OP_SET_CEILING
Set ceiling for the priority protected umutex.
The arguments to the request are:
.Bl -tag -width "uaddr"
.It Fa obj
Pointer to the umutex.
.It Fa val
New ceiling value.
.It Fa uaddr
Address of a variable of type
.Vt uint32_t .
If not
.Dv NULL
and the update was successful, the previous ceiling value is
written to the location pointed to by
.Fa uaddr .
.El
.Pp
The request locks the umutex pointed to by the
.Fa obj
parameter, waiting for the lock if not immediately available.
After the lock is obtained, the new ceiling value
.Fa val
is written to the
.Dv m_ceilings[0]
member of the
.Vt struct umutex ,
after which the umutex is unlocked.
.Pp
The locking does not adhere to the priority protect protocol,
to conform to the
.Tn POSIX
requirements for the
.Xr pthread_mutex_setprioceiling 3
interface.
.It Dv UMTX_OP_CV_WAIT
Wait for a condition.
The arguments to the request are:
.Bl -tag -width "uaddr2"
.It Fa obj
Pointer to the
.Vt struct ucond .
.It Fa val
Request flags, see below.
.It Fa uaddr
Pointer to the umutex.
.It Fa uaddr2
Optional pointer to a
.Vt struct timespec
for timeout specification.
.El
.Pp
The request must be issued by the thread owning the mutex pointed to
by the
.Fa uaddr
argument.
The
.Dv c_has_waiters
member of the
.Vt struct ucond ,
pointed to by the
.Fa obj
argument, is set to an arbitrary non-zero value, after which the
.Fa uaddr
mutex is unlocked (following the appropriate protocol), and
the current thread is put to sleep on the sleep queue keyed by
the
.Fa obj
argument.
The operations are performed atomically.
It is guaranteed to not miss a wakeup from
.Dv UMTX_OP_CV_SIGNAL
or
.Dv UMTX_OP_CV_BROADCAST
sent between mutex unlock and putting the current thread on the sleep queue.
.Pp
Upon wakeup, if the timeout expired and no other threads are sleeping in
the same sleep queue, the
.Dv c_has_waiters
member is cleared.
After wakeup, the
.Fa uaddr
umutex is not relocked.
.Pp
The following flags are defined:
.Bl -tag -width "CVWAIT_CLOCKID"
.It Dv CVWAIT_ABSTIME
Timeout is absolute.
.It Dv CVWAIT_CLOCKID
Clockid is provided.
.El
.Pp
Optionally, a timeout for the request may be specified.
Unlike other requests, the timeout value is specified directly by a
.Vt struct timespec ,
pointed to by the
.Fa uaddr2
argument.
If the
.Dv CVWAIT_CLOCKID
flag is provided, the timeout uses the clock from the
.Dv c_clockid
member of the
.Vt struct ucond ,
pointed to by the
.Fa obj
argument.
Otherwise,
.Dv CLOCK_REALTIME
is used, regardless of the clock identifier possibly specified in the
.Vt struct _umtx_time .
If the
.Dv CVWAIT_ABSTIME
flag is supplied, the timeout specifies an absolute time value, otherwise
it denotes a relative time interval.
.Pp
The request is not restartable.
An unblocked signal delivered during
the wait always results in sleep interruption and
.Er EINTR
error.
.It Dv UMTX_OP_CV_SIGNAL
Wake up one condition waiter.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to
.Vt struct ucond .
.El
.Pp
The request wakes up at most one thread sleeping on the sleep queue keyed
by the
.Fa obj
argument.
If the woken up thread was the last on the sleep queue, the
.Dv c_has_waiters
member of the
.Vt struct ucond
is cleared.
.It Dv UMTX_OP_CV_BROADCAST
Wake up all condition waiters.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to
.Vt struct ucond .
.El
.Pp
The request wakes up all threads sleeping on the sleep queue keyed by the
.Fa obj
argument.
The
.Dv c_has_waiters
member of the
.Vt struct ucond
is cleared.
.It Dv UMTX_OP_WAIT_UINT
Same as
.Dv UMTX_OP_WAIT ,
but the type of the variable pointed to by
.Fa obj
is
.Vt u_int
.Pq a 32-bit integer .
.It Dv UMTX_OP_RW_RDLOCK
Read-lock a
.Vt struct urwlock
lock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the lock (of type
.Vt struct urwlock )
to be read-locked.
.It Fa val
Additional flags to augment locking behaviour.
The valid flags in the
.Fa val
argument are:
.Bl -tag -width indent
.It Dv URWLOCK_PREFER_READER
.El
.El
.Pp
The request obtains the read lock on the specified
.Vt struct urwlock
by incrementing the count of readers in the
.Dv rw_state
word of the structure.
If the
.Dv URWLOCK_WRITE_OWNER
bit is set in the word
.Dv rw_state ,
the lock was granted to a writer which has not yet relinquished
its ownership.
In this case the current thread is put to sleep until it makes sense to
retry.
.Pp
If the
.Dv URWLOCK_PREFER_READER
flag is set either in the
.Dv rw_flags
word of the structure, or in the
.Fa val
argument of the request, the presence of the threads trying to obtain
the write lock on the same structure does not prevent the current thread
from trying to obtain the read lock.
Otherwise, if the flag is not set, and the
.Dv URWLOCK_WRITE_WAITERS
flag is set in
.Dv rw_state ,
the current thread does not attempt to obtain the read lock.
Instead it sets the
.Dv URWLOCK_READ_WAITERS
bit in the
.Dv rw_state
word and puts itself to sleep on the corresponding sleep queue.
Upon wakeup, the locking conditions are re-evaluated.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
The request is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
.It Dv UMTX_OP_RW_WRLOCK
Write-lock a
.Vt struct urwlock
lock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the lock (of type
.Vt struct urwlock )
to be write-locked.
.El
.Pp
The request obtains a write lock on the specified
.Vt struct urwlock ,
by setting the
.Dv URWLOCK_WRITE_OWNER
bit in the
.Dv rw_state
word of the structure.
If there is already a write lock owner, as indicated by the
.Dv URWLOCK_WRITE_OWNER
bit being set, or there are read lock owners, as indicated
by the read-lock counter, the current thread does not attempt to
obtain the write lock.
Instead it sets the
.Dv URWLOCK_WRITE_WAITERS
bit in the
.Dv rw_state
word and puts itself to sleep on the corresponding sleep queue.
Upon wakeup, the locking conditions are re-evaluated.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
The request is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
.It Dv UMTX_OP_RW_UNLOCK
Unlock rwlock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the lock (of type
.Vt struct urwlock )
to be unlocked.
.El
.Pp
The unlock type (read or write) is determined by the
current lock state.
Note that the
.Vt struct urwlock
does not save information about the identity of the thread which
acquired the lock.
.Pp
If there are pending writers after the unlock, and the
.Dv URWLOCK_PREFER_READER
flag is not set in the
.Dv rw_flags
member of the
.Fa *obj
structure, one writer is woken up, selected as described in the
.Sx SLEEP QUEUES
subsection.
If the
.Dv URWLOCK_PREFER_READER
flag is set, a pending writer is woken up only if there are
no pending readers.
.Pp
If there are no pending writers, or if the
.Dv URWLOCK_PREFER_READER
flag is set, all pending readers are woken up by the unlock.
.It Dv UMTX_OP_WAIT_UINT_PRIVATE
Same as
.Dv UMTX_OP_WAIT_UINT ,
but unconditionally select the process-private sleep queue.
.It Dv UMTX_OP_WAKE_PRIVATE
Same as
.Dv UMTX_OP_WAKE ,
but unconditionally select the process-private sleep queue.
.It Dv UMTX_OP_MUTEX_WAIT
Wait for mutex availability.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Address of the mutex.
.El
.Pp
Similarly to the
.Dv UMTX_OP_MUTEX_LOCK ,
put the requesting thread to sleep if the mutex lock cannot be obtained
immediately.
The
.Dv UMUTEX_CONTESTED
bit is set in the
.Dv m_owner
word of the mutex to indicate that there is a waiter, before the thread
is added to the sleep queue.
Unlike the
.Dv UMTX_OP_MUTEX_LOCK
request, the lock is not obtained.
.Pp
The operation is not implemented for priority protected and
priority inherited protocol mutexes.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
A request with a timeout specified is not restartable.
An unblocked signal delivered during the wait always results in sleep
interruption and
.Er EINTR
error.
A request without a timeout automatically restarts if the signal disposition
requested restart via the
.Dv SA_RESTART
flag in
.Vt struct sigaction
member
.Dv sa_flags .
.It Dv UMTX_OP_NWAKE_PRIVATE
Wake up a batch of sleeping threads.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the array of pointers.
.It Fa val
Number of elements in the array pointed to by
.Fa obj .
.El
.Pp
For each element in the array pointed to by
.Fa obj ,
wakes up all threads waiting on the
.Em private
sleep queue with the key
being the byte addressed by the array element.
.It Dv UMTX_OP_MUTEX_WAKE
Check if a normal umutex is unlocked and wake up a waiter.
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.El
.Pp
If the
.Dv m_owner
word of the mutex pointed to by the
.Fa obj
argument indicates an unowned mutex, which has its contention indicator bit
.Dv UMUTEX_CONTESTED
set, clear the bit and wake up one waiter in the sleep queue associated
with the byte addressed by the
.Fa obj ,
if any.
Only normal mutexes are supported by the request.
The sleep queue is always the one for the normal mutex type.
.Pp
This request is deprecated in favor of
.Dv UMTX_OP_MUTEX_WAKE2
since mutexes using it cannot synchronize their own destruction.
That is, the
.Dv m_owner
word has already been set to
.Dv UMUTEX_UNOWNED
when this request is made,
so that another thread can lock, unlock and destroy the mutex
(if no other thread uses the mutex afterwards).
Clearing the
.Dv UMUTEX_CONTESTED
bit may then modify freed memory.
.It Dv UMTX_OP_MUTEX_WAKE2
Check if a umutex is unlocked and wake up a waiter.
The arguments for the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the umutex.
.It Fa val
The umutex flags.
.El
.Pp
The request does not read the
.Dv m_flags
member of the
.Vt struct umutex ;
instead, the
.Fa val
argument supplies flag information, in particular, to determine the
sleep queue where the waiters are found for wake up.
.Pp
If the mutex is unowned, one waiter is woken up.
.Pp
If the mutex memory cannot be accessed, all waiters are woken up.
.Pp
If there is more than one waiter on the sleep queue, or there is only
one waiter but the mutex is owned by a thread, the
.Dv UMUTEX_CONTESTED
bit is set in the
.Dv m_owner
word of the
.Vt struct umutex .
.It Dv UMTX_OP_SEM2_WAIT
Wait until semaphore is available.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the semaphore (of type
.Vt struct _usem2 ) .
.It Fa uaddr
Size of the memory passed in via the
.Fa uaddr2
argument.
.It Fa uaddr2
Optional pointer to a structure of type
.Vt struct _umtx_time ,
which may be followed by a structure of type
.Vt struct timespec .
.El
.Pp
Put the requesting thread onto a sleep queue if the semaphore counter
is zero.
If the thread is put to sleep, the
.Dv USEM_HAS_WAITERS
bit is set in the
.Dv _count
word to indicate waiters.
The function returns either due to
.Dv _count
indicating the semaphore is available (non-zero count due to post),
or due to a wakeup.
The return does not guarantee that the semaphore is available,
nor does it consume the semaphore lock on successful return.
.Pp
Optionally, a timeout for the request may be specified.
.Pp
A request with non-absolute timeout value is not restartable.
An unblocked signal delivered during such wait results in sleep
interruption and
.Er EINTR
error.
.Pp
If
.Dv UMTX_ABSTIME
was not set, and the operation was interrupted and the caller passed in a
.Fa uaddr2
large enough to hold a
.Vt struct timespec
following the initial
.Vt struct _umtx_time ,
then the
.Vt struct timespec
is updated to contain the unslept amount.
.It Dv UMTX_OP_SEM2_WAKE
Wake up waiters on semaphore lock.
The arguments to the request are:
.Bl -tag -width "obj"
.It Fa obj
Pointer to the semaphore (of type
.Vt struct _usem2 ) .
.El
.Pp
The request wakes up one waiter for the semaphore lock.
The function does not increment the semaphore lock count.
If the
.Dv USEM_HAS_WAITERS
bit was set in the
.Dv _count
word, and the last sleeping thread was woken up, the bit is cleared.
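.Pp
Because
.Dv UMTX_OP_SEM2_WAIT
does not consume the semaphore and
.Dv UMTX_OP_SEM2_WAKE
does not increment it, the counter itself has to be updated in userspace.
A minimal sketch of that division of labour, using C11 atomics, follows.
The
.Vt struct demo_usem
type and the
.Fn demo_sem_trywait
and
.Fn demo_sem_post
helpers are hypothetical, the constant values are assumptions, and
overflow checking against
.Dv USEM_MAX_COUNT
is left out for brevity.
.Bd -literal
```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for this sketch; see <sys/umtx.h> for the real ones. */
#define	USEM_HAS_WAITERS	0x80000000U
#define	USEM_MAX_COUNT		0x7fffffffU
#define	USEM_COUNT(c)		((c) & USEM_MAX_COUNT)

/* Hypothetical stand-in for the _count word of struct _usem2. */
struct demo_usem {
	_Atomic uint32_t	_count;
};

/*
 * Consume one post without sleeping.  Returns false when the counter
 * is zero; a real caller would then issue UMTX_OP_SEM2_WAIT and retry.
 */
static bool
demo_sem_trywait(struct demo_usem *s)
{
	uint32_t c;

	c = atomic_load_explicit(&s->_count, memory_order_relaxed);
	while (USEM_COUNT(c) > 0) {
		/* On failure c is reloaded and the loop re-tests it. */
		if (atomic_compare_exchange_weak_explicit(&s->_count,
		    &c, c - 1, memory_order_acquire,
		    memory_order_relaxed))
			return (true);
	}
	return (false);
}

/*
 * Add one post.  Returns true when the waiters indicator was set,
 * in which case the caller should issue UMTX_OP_SEM2_WAKE.
 */
static bool
demo_sem_post(struct demo_usem *s)
{
	uint32_t c;

	c = atomic_fetch_add_explicit(&s->_count, 1,
	    memory_order_release);
	return ((c & USEM_HAS_WAITERS) != 0);
}
```
.Ed
.Pp
The decrement leaves the waiters indicator untouched, so only the kernel
clears it once the last sleeper has been woken.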
.It Dv UMTX_OP_SHM
Manage anonymous
.Tn POSIX
shared memory objects (see
.Xr shm_open 2 ) ,
which can be attached to a byte of physical memory, mapped into the
process address space.
The objects are used to implement process-shared locks in
.Dv libthr .
.Pp
The
.Fa val
argument specifies the sub-request of the
.Dv UMTX_OP_SHM
request:
.Bl -tag -width indent
.It Dv UMTX_SHM_CREAT
Creates the anonymous shared memory object, which can be looked up
with the specified key
.Fa uaddr .
If the object associated with the
.Fa uaddr
key already exists, it is returned instead of creating a new object.
The object's size is one page.
On success, the file descriptor referencing the object is returned.
The descriptor can be used for mapping the object using
.Xr mmap 2 ,
or for other shared memory operations.
.It Dv UMTX_SHM_LOOKUP
Same as the
.Dv UMTX_SHM_CREAT
request, but if there is no shared memory object associated with
the specified key
.Fa uaddr ,
an error is returned, and no new object is created.
.It Dv UMTX_SHM_DESTROY
De-associate the shared object from the specified key
.Fa uaddr .
The object is destroyed after the last open file descriptor is closed
and the last mapping for it is destroyed.
.It Dv UMTX_SHM_ALIVE
Checks whether there is a live shared object associated with the
supplied key
.Fa uaddr .
Returns zero if there is, and an error otherwise.
This request is an optimization of the
.Dv UMTX_SHM_LOOKUP
request.
It is cheaper when only the liveness of the associated object is asked
for, since no file descriptor is installed in the process fd table
on success.
.El
.Pp
The
.Fa uaddr
argument specifies the virtual address whose backing physical memory
byte identity is used as a key for the anonymous shared object
creation or lookup.
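.Pp
The create or lookup step followed by
.Xr mmap 2
might look like the following minimal sketch.
The helper name
.Fn shm_sketch_map
and its
.Fa create
parameter are hypothetical, and closing the descriptor right after
mapping is a choice made for the example.
.Bd -literal
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/umtx.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the object keyed by the address "key",
 * creating the object first when "create" is non-zero.
 * Returns NULL on failure with errno set by the failing call.
 */
static void *
shm_sketch_map(void *key, int create)
{
	void *p;
	int fd;

	/* The sub-request goes in val, the key address in uaddr. */
	fd = _umtx_op(NULL, UMTX_OP_SHM,
	    create ? UMTX_SHM_CREAT : UMTX_SHM_LOOKUP, key, NULL);
	if (fd == -1)
		return (NULL);
	/* The object's size is one page. */
	p = mmap(NULL, (size_t)getpagesize(), PROT_READ | PROT_WRITE,
	    MAP_SHARED, fd, 0);
	/* The descriptor is no longer needed once the mapping exists. */
	close(fd);
	return (p == MAP_FAILED ? NULL : p);
}
.Ed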
.It Dv UMTX_OP_ROBUST_LISTS
Register the list heads for the current thread's robust mutex lists.
The arguments to the request are:
.Bl -tag -width "uaddr"
.It Fa val
Size of the structure passed in the
.Fa uaddr
argument.
.It Fa uaddr
Pointer to the structure of type
.Vt struct umtx_robust_lists_params .
.El
.Pp
The structure is defined as
.Bd -literal
struct umtx_robust_lists_params {
	uintptr_t	robust_list_offset;
	uintptr_t	robust_priv_list_offset;
	uintptr_t	robust_inact_offset;
};
.Ed
.Pp
The
.Dv robust_list_offset
member contains the address of the first element in the list of locked
robust shared mutexes.
The
.Dv robust_priv_list_offset
member contains the address of the first element in the list of locked
robust private mutexes.
The private and shared robust locked lists are split to allow fast
termination of the shared list on fork, in the child.
.Pp
The
.Dv robust_inact_offset
member contains a pointer to the mutex which might be locked in the
near future, or might have just been unlocked.
It is typically set by the lock or unlock mutex implementation code
around the whole operation, since lists can only be changed race-free
when the thread owns the mutex.
The kernel inspects the
.Dv robust_inact_offset
in addition to walking the shared and private lists.
Also, the mutex pointed to by
.Dv robust_inact_offset
is handled more loosely at thread termination time
than other mutexes on the list.
That mutex is allowed to be not owned by the current thread,
in which case list processing is continued.
See the
.Sx ROBUST UMUTEXES
subsection for details.
.It Dv UMTX_OP_GET_MIN_TIMEOUT
Writes out the current value of the minimal umtx operations timeout,
in nanoseconds, into the long integer variable pointed to by
.Fa uaddr .
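.Pp
As a minimal sketch, the value might be read as follows.
The helper name
.Fn min_timeout_sketch
is hypothetical, and passing
.Dv NULL
and zero for the arguments that the request does not describe is an
assumption of the example.
.Bd -literal
#include <sys/types.h>
#include <sys/umtx.h>
#include <stddef.h>

/* Hypothetical helper: returns the minimum in nanoseconds, -1 on error. */
static long
min_timeout_sketch(void)
{
	long ns;

	/* The result is stored through the fourth argument. */
	if (_umtx_op(NULL, UMTX_OP_GET_MIN_TIMEOUT, 0, &ns, NULL) == -1)
		return (-1);
	return (ns);
}
.Ed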
.It Dv UMTX_OP_SET_MIN_TIMEOUT
Set the minimal amount of time, in nanoseconds, the thread is required
to sleep for umtx operations specifying a timeout using absolute clocks.
The value is taken from the
.Fa val
argument of the call.
Zero means no minimum.
.El
.Pp
The
.Fa op
argument may be a bitwise OR of a single command from above with one or more of
the following flags:
.Bl -tag -width indent
.It Dv UMTX_OP__I386
Request i386 ABI compatibility from the native
.Nm
system call.
Specifically, this implies that:
.Bl -hang -offset indent
.It
.Fa obj
arguments that point to a word, point to a 32-bit integer.
.It
The
.Dv UMTX_OP_NWAKE_PRIVATE
.Fa obj
argument is a pointer to an array of 32-bit pointers.
.It
The
.Dv m_rb_lnk
member of
.Vt struct umutex
is a 32-bit pointer.
.It
.Vt struct timespec
uses a 32-bit time_t.
.El
.Pp
.Dv UMTX_OP__32BIT
has no effect if this flag is set.
This flag is valid for all architectures, but it is ignored on i386.
.It Dv UMTX_OP__32BIT
Request non-i386, 32-bit ABI compatibility from the native
.Nm
system call.
Specifically, this implies that:
.Bl -hang -offset indent
.It
.Fa obj
arguments that point to a word, point to a 32-bit integer.
.It
The
.Dv UMTX_OP_NWAKE_PRIVATE
.Fa obj
argument is a pointer to an array of 32-bit pointers.
.It
The
.Dv m_rb_lnk
member of
.Vt struct umutex
is a 32-bit pointer.
.It
.Vt struct timespec
uses a 64-bit time_t.
.El
.Pp
This flag has no effect if
.Dv UMTX_OP__I386
is set.
This flag is valid for all architectures.
.El
.Pp
Note that if any 32-bit ABI compatibility is being requested, then care must be
taken with robust lists.
A single thread may not mix 32-bit compatible robust lists with native
robust lists.
The first
.Dv UMTX_OP_ROBUST_LISTS
call in a given thread determines which ABI that thread will use for robust
lists going forward.
.Sh RETURN VALUES
If successful,
all requests, except
.Dv UMTX_SHM_CREAT
and
.Dv UMTX_SHM_LOOKUP
sub-requests of the
.Dv UMTX_OP_SHM
request, will return zero.
The
.Dv UMTX_SHM_CREAT
and
.Dv UMTX_SHM_LOOKUP
return a shared memory file descriptor on success.
On error \-1 is returned, and the
.Va errno
variable is set to indicate the error.
.Sh ERRORS
The
.Fn _umtx_op
operations can fail with the following errors:
.Bl -tag -width "[ETIMEDOUT]"
.It Bq Er EFAULT
One of the arguments points to invalid memory.
.It Bq Er EINVAL
The clock identifier, specified for the
.Vt struct _umtx_time
timeout parameter, or in the
.Dv c_clockid
member of
.Vt struct ucond ,
is invalid.
.It Bq Er EINVAL
The type of the mutex, encoded by the
.Dv m_flags
member of
.Vt struct umutex ,
is invalid.
.It Bq Er EINVAL
The
.Dv m_owner
member of the
.Vt struct umutex
has changed the lock owner thread identifier during unlock.
.It Bq Er EINVAL
The
.Dv timeout.tv_sec
or
.Dv timeout.tv_nsec
member of
.Vt struct _umtx_time
is less than zero, or
.Dv timeout.tv_nsec
is greater than or equal to 1000000000.
.It Bq Er EINVAL
The
.Fa op
argument specifies an invalid operation.
.It Bq Er EINVAL
The
.Fa val
argument for the
.Dv UMTX_OP_SHM
request specifies an invalid sub-request.
.It Bq Er EINVAL
The
.Dv UMTX_OP_SET_CEILING
request specifies a mutex that is not priority-protected.
.It Bq Er EINVAL
The new ceiling value for the
.Dv UMTX_OP_SET_CEILING
request, or one or more of the values read from the
.Dv m_ceilings
array during lock or unlock operations, is greater than
.Dv RTP_PRIO_MAX .
.It Bq Er EPERM
Unlock attempted on an object not owned by the current thread.
.It Bq Er EOWNERDEAD
The lock was requested on a umutex where the
.Dv m_owner
field was set to the
.Dv UMUTEX_RB_OWNERDEAD
value, indicating a terminated robust mutex.
The lock was granted to the caller, so this error in fact
indicates success with additional conditions.
.It Bq Er ENOTRECOVERABLE
The lock was requested on a umutex whose
.Dv m_owner
field is equal to the
.Dv UMUTEX_RB_NOTRECOV
value, indicating a robust mutex abandoned after termination.
The lock was not granted to the caller.
.It Bq Er ENOTTY
The shared memory object, associated with the address passed to the
.Dv UMTX_SHM_ALIVE
sub-request of
.Dv UMTX_OP_SHM
request, was destroyed.
.It Bq Er ESRCH
For the
.Dv UMTX_SHM_LOOKUP ,
.Dv UMTX_SHM_DESTROY ,
and
.Dv UMTX_SHM_ALIVE
sub-requests of the
.Dv UMTX_OP_SHM
request, there is no shared memory object associated with the provided key.
.It Bq Er ENOMEM
The
.Dv UMTX_SHM_CREAT
sub-request of the
.Dv UMTX_OP_SHM
request cannot be satisfied, because allocation of the shared memory object
would exceed the
.Dv RLIMIT_UMTXP
resource limit, see
.Xr setrlimit 2 .
.It Bq Er EAGAIN
The maximum number of readers
.Dv ( URWLOCK_MAX_READERS )
was already granted ownership of the given
.Vt struct urwlock
for read.
.It Bq Er EBUSY
A try mutex lock operation was not able to obtain the lock.
.It Bq Er ETIMEDOUT
The request specified a timeout in the
.Fa uaddr
and
.Fa uaddr2
arguments, and timed out before obtaining the lock or being woken up.
.It Bq Er EINTR
A signal was delivered during wait, for a non-restartable operation.
Operations with timeouts are typically non-restartable, but timeouts
specified in absolute time may be restartable.
.It Bq Er ERESTART
A signal was delivered during wait, for a restartable operation.
Mutex lock requests without timeout specified are restartable.
The error is not returned to userspace code since restart
is handled by the usual adjustment of the instruction counter.
.El
.Sh SEE ALSO
.Xr clock_gettime 2 ,
.Xr mmap 2 ,
.Xr setrlimit 2 ,
.Xr shm_open 2 ,
.Xr sigaction 2 ,
.Xr thr_exit 2 ,
.Xr thr_kill 2 ,
.Xr thr_kill2 2 ,
.Xr thr_new 2 ,
.Xr thr_self 2 ,
.Xr thr_set_name 2 ,
.Xr signal 3
.Sh STANDARDS
The
.Fn _umtx_op
system call is non-standard and is used by the
.Lb libthr
to implement
.St -p1003.1-2001
.Xr pthread 3
functionality.
.Sh BUGS
A window between unlocking a robust mutex and resetting the pointer in the
.Dv robust_inact_offset
member of the registered
.Vt struct umtx_robust_lists_params
allows another thread to destroy the mutex, thus making the kernel inspect
freed or reused memory.
The
.Li libthr
implementation is only vulnerable to this race when operating on
a shared mutex.
A possible fix for the current implementation is to strengthen the checks
for shared mutexes before terminating them, in particular, verifying
that the mutex memory is mapped from a shared memory object allocated
by the
.Dv UMTX_OP_SHM
request.
This is not done because it is believed that the race is adequately
covered by other consistency checks, while adding the check would
prevent alternative implementations of
.Li libpthread .