/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License, Version 1.0 only
 * (the "License"). You may not use this file except in compliance
 * with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2004 Sun Microsystems, Inc. All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident "%Z%%M% %I% %E% SMI"

/*
 * Big Theory Statement for turnstiles.
 *
 * Turnstiles provide blocking and wakeup support, including priority
 * inheritance, for synchronization primitives (e.g. mutexes and rwlocks).
 * Typical usage is as follows:
 *
 * To block on lock 'lp' for read access in foo_enter():
 *
 *	ts = turnstile_lookup(lp);
 *	[ If the lock is still held, set the waiters bit.
 *	turnstile_block(ts, TS_READER_Q, lp, &foo_sobj_ops, NULL, NULL);
 *
 * To wake threads waiting for write access to lock 'lp' in foo_exit():
 *
 *	ts = turnstile_lookup(lp);
 *	[ Either drop the lock (change owner to NULL) or perform a direct
 *	[ handoff (change owner to one of the threads we're about to wake).
 *	[ If we're going to wake the last waiter, clear the waiters bit.
 *	turnstile_wakeup(ts, TS_WRITER_Q, nwaiters, new_owner or NULL);
 *
 * turnstile_lookup() returns holding the turnstile hash chain lock for lp.
 * Both turnstile_block() and turnstile_wakeup() drop the turnstile lock.
 * To abort a turnstile operation, the client must call turnstile_exit().
 *
 * Requirements of the client:
 *
 * (1) The lock's waiters indicator may be manipulated *only* while
 *     holding the turnstile hash chain lock (i.e. under turnstile_lookup()).
 *
 * (2) Once the lock is marked as having waiters, the owner may be
 *     changed *only* while holding the turnstile hash chain lock.
 *
 * (3) The caller must never block on an unheld lock.
 *
 * Consequences of these assumptions include the following:
 *
 * (a) It is impossible for a lock to be unheld but have waiters.
 *
 * (b) The priority inheritance code can safely assume that an active
 *     turnstile's ts_inheritor never changes until the inheritor calls
 *     turnstile_pi_waive().
 *
 * These assumptions simplify the implementation of both turnstiles and
 * their clients.
 *
 * Background on priority inheritance:
 *
 * Priority inheritance allows a thread to "will" its dispatch priority
 * to all the threads blocking it, directly or indirectly. This prevents
 * situations called priority inversions in which a high-priority thread
 * needs a lock held by a low-priority thread, which cannot run because
 * of medium-priority threads. Without PI, the medium-priority threads
 * can starve out the high-priority thread indefinitely. With PI, the
 * low-priority thread becomes high-priority until it releases whatever
 * synchronization object the real high-priority thread is waiting for.
 *
 * How turnstiles work:
 *
 * All active turnstiles reside in a global hash table, turnstile_table[].
 * The address of a synchronization object determines its hash index.
 * Each hash chain is protected by its own dispatcher lock, acquired
 * by turnstile_lookup(). This lock protects the hash chain linkage, the
 * contents of all turnstiles on the hash chain, and the waiters bits of
 * every synchronization object in the system that hashes to the same chain.
 * Giving the lock such broad scope simplifies the interactions between
 * the turnstile code and its clients considerably. The blocking path
 * is rare enough that this has no impact on scalability. (If it ever
 * does, it's almost surely a second-order effect -- the real problem
 * is that some synchronization object is *very* heavily contended.)
 *
 * Each thread has an attached turnstile in case it needs to block.
 * A thread cannot block on more than one lock at a time, so one
 * turnstile per thread is the most we ever need. The first thread
 * to block on a lock donates its attached turnstile and adds it to
 * the appropriate hash chain in turnstile_table[]. This becomes the
 * "active turnstile" for the lock. Each subsequent thread that blocks
 * on the same lock discovers that the lock already has an active
 * turnstile, so it stashes its own turnstile on the active turnstile's
 * freelist. As threads wake up, the process is reversed.
 *
 * turnstile_block() puts the current thread to sleep on the active
 * turnstile for the desired lock, walks the blocking chain to apply
 * priority inheritance to everyone in its way, and yields the CPU.
 *
 * turnstile_wakeup() waives any priority the owner may have inherited
 * and wakes the specified number of waiting threads. If the caller is
 * doing direct handoff of ownership (rather than just dropping the lock),
 * the new owner automatically inherits priority from any existing waiters.
 */

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/thread.h>
#include <sys/proc.h>
#include <sys/debug.h>
#include <sys/cpuvar.h>
#include <sys/turnstile.h>
#include <sys/t_lock.h>
#include <sys/disp.h>
#include <sys/sobject.h>
#include <sys/cmn_err.h>
#include <sys/sysmacros.h>
#include <sys/lockstat.h>
#include <sys/lwp_upimutex_impl.h>
#include <sys/schedctl.h>
#include <sys/cpu.h>
#include <sys/sdt.h>
#include <sys/cpupart.h>

extern upib_t upimutextab[UPIMUTEX_TABSIZE];

#define	IS_UPI(sobj)	\
	((uintptr_t)(sobj) - (uintptr_t)upimutextab < sizeof (upimutextab))

/*
 * The turnstile hash table is partitioned into two halves: the lower half
 * is used for upimutextab[] locks, the upper half for everything else.
 * The reason for the distinction is that SOBJ_USER_PI locks present a
 * unique problem: the upimutextab[] lock passed to turnstile_block()
 * cannot be dropped until the calling thread has blocked on its
 * SOBJ_USER_PI lock and willed its priority down the blocking chain.
 * At that point, the caller's t_lockp will be one of the turnstile locks.
 * If mutex_exit() discovers that the upimutextab[] lock has waiters, it
 * must wake them, which forces a lock ordering on us: the turnstile lock
 * for the upimutextab[] lock will be acquired in mutex_vector_exit(),
 * which will eventually call into turnstile_pi_waive(), which will then
 * acquire the caller's thread lock, which in this case is the turnstile
 * lock for the SOBJ_USER_PI lock. In general, when two turnstile locks
 * must be held at the same time, the lock order must be the address order.
 * Therefore, to prevent deadlock in turnstile_pi_waive(), we must ensure
 * that upimutextab[] locks *always* hash to lower addresses than any
 * other locks. You think this is cheesy? Let's see you do better.
 */
#define	TURNSTILE_HASH_SIZE	128		/* must be power of 2 */
#define	TURNSTILE_HASH_MASK	(TURNSTILE_HASH_SIZE - 1)
#define	TURNSTILE_SOBJ_HASH(sobj)	\
	((((ulong_t)sobj >> 2) + ((ulong_t)sobj >> 9)) & TURNSTILE_HASH_MASK)
#define	TURNSTILE_SOBJ_BUCKET(sobj)	\
	((IS_UPI(sobj) ? 0 : TURNSTILE_HASH_SIZE) + TURNSTILE_SOBJ_HASH(sobj))
#define	TURNSTILE_CHAIN(sobj)	turnstile_table[TURNSTILE_SOBJ_BUCKET(sobj)]

typedef struct turnstile_chain {
	turnstile_t	*tc_first;	/* first turnstile on hash chain */
	disp_lock_t	tc_lock;	/* lock for this hash chain */
} turnstile_chain_t;

turnstile_chain_t	turnstile_table[2 * TURNSTILE_HASH_SIZE];

static	lock_t	turnstile_loser_lock;

/*
 * Make 'inheritor' inherit priority from this turnstile.
 */
static void
turnstile_pi_inherit(turnstile_t *ts, kthread_t *inheritor, pri_t epri)
{
	ASSERT(THREAD_LOCK_HELD(inheritor));
	ASSERT(DISP_LOCK_HELD(&TURNSTILE_CHAIN(ts->ts_sobj).tc_lock));

	if (epri <= inheritor->t_pri)
		return;

	if (ts->ts_inheritor == NULL) {
		ts->ts_inheritor = inheritor;
		ts->ts_epri = epri;
		disp_lock_enter_high(&inheritor->t_pi_lock);
		ts->ts_prioinv = inheritor->t_prioinv;
		inheritor->t_prioinv = ts;
		disp_lock_exit_high(&inheritor->t_pi_lock);
	} else {
		/*
		 * 'inheritor' is already inheriting from this turnstile,
		 * so just adjust its priority.
		 */
		ASSERT(ts->ts_inheritor == inheritor);
		if (ts->ts_epri < epri)
			ts->ts_epri = epri;
	}

	if (epri > DISP_PRIO(inheritor))
		thread_change_epri(inheritor, epri);
}

/*
 * If turnstile is non-NULL, remove it from inheritor's t_prioinv list.
 * Compute new inherited priority, and return it.
 */
static pri_t
turnstile_pi_tsdelete(turnstile_t *ts, kthread_t *inheritor)
{
	turnstile_t **tspp, *tsp;
	pri_t new_epri = 0;

	disp_lock_enter_high(&inheritor->t_pi_lock);
	tspp = &inheritor->t_prioinv;
	while ((tsp = *tspp) != NULL) {
		if (tsp == ts)
			*tspp = tsp->ts_prioinv;
		else
			new_epri = MAX(new_epri, tsp->ts_epri);
		tspp = &tsp->ts_prioinv;
	}
	disp_lock_exit_high(&inheritor->t_pi_lock);
	return (new_epri);
}

/*
 * Remove turnstile from inheritor's t_prioinv list, compute
 * new priority, and change the inheritor's effective priority if
 * necessary. Keep in synch with turnstile_pi_recalc().
 */
static void
turnstile_pi_waive(turnstile_t *ts)
{
	kthread_t *inheritor = ts->ts_inheritor;
	pri_t new_epri;

	ASSERT(inheritor == curthread);

	thread_lock_high(inheritor);
	new_epri = turnstile_pi_tsdelete(ts, inheritor);
	if (new_epri != DISP_PRIO(inheritor))
		thread_change_epri(inheritor, new_epri);
	ts->ts_inheritor = NULL;
	if (DISP_MUST_SURRENDER(inheritor))
		cpu_surrender(inheritor);
	thread_unlock_high(inheritor);
}

/*
 * Compute caller's new inherited priority, and change its effective
 * priority if necessary. Necessary only for SOBJ_USER_PI, because of
 * its interruptibility characteristic.
 */
void
turnstile_pi_recalc(void)
{
	kthread_t *inheritor = curthread;
	pri_t new_epri;

	thread_lock(inheritor);
	new_epri = turnstile_pi_tsdelete(NULL, inheritor);
	if (new_epri != DISP_PRIO(inheritor))
		thread_change_epri(inheritor, new_epri);
	if (DISP_MUST_SURRENDER(inheritor))
		cpu_surrender(inheritor);
	thread_unlock(inheritor);
}

/*
 * Grab the lock protecting the hash chain for sobj
 * and return the active turnstile for sobj, if any.
 */
turnstile_t *
turnstile_lookup(void *sobj)
{
	turnstile_t *ts;
	turnstile_chain_t *tc = &TURNSTILE_CHAIN(sobj);

	disp_lock_enter(&tc->tc_lock);

	for (ts = tc->tc_first; ts != NULL; ts = ts->ts_next)
		if (ts->ts_sobj == sobj)
			break;

	return (ts);
}

/*
 * Drop the lock protecting the hash chain for sobj.
 */
void
turnstile_exit(void *sobj)
{
	disp_lock_exit(&TURNSTILE_CHAIN(sobj).tc_lock);
}

/*
 * When we apply priority inheritance, we must grab the owner's thread lock
 * while already holding the waiter's thread lock. If both thread locks are
 * turnstile locks, this can lead to deadlock: while we hold L1 and try to
 * grab L2, some unrelated thread may be applying priority inheritance to
 * some other blocking chain, holding L2 and trying to grab L1. The most
 * obvious solution -- do a lock_try() for the owner lock -- isn't quite
 * sufficient because it can cause livelock: each thread may hold one lock,
 * try to grab the other, fail, bail out, and try again, looping forever.
 * To prevent livelock we must define a winner, i.e. define an arbitrary
 * lock ordering on the turnstile locks. For simplicity we declare that
 * virtual address order defines lock order, i.e. if L1 < L2, then the
 * correct lock ordering is L1, L2. Thus the thread that holds L1 and
 * wants L2 should spin until L2 is available, but the thread that holds
 * L2 and can't get L1 on the first try must drop L2 and return failure.
 * Moreover, the losing thread must not reacquire L2 until the winning
 * thread has had a chance to grab it; to ensure this, the losing thread
 * must grab L1 after dropping L2, thus spinning until the winner is done.
 * Complicating matters further, note that the owner's thread lock pointer
 * can change (i.e. be pointed at a different lock) while we're trying to
 * grab it. If that happens, we must unwind our state and try again.
 *
 * On success, returns 1 with both locks held.
 * On failure, returns 0 with neither lock held.
 */
static int
turnstile_interlock(lock_t *wlp, lock_t *volatile *olpp)
{
	ASSERT(LOCK_HELD(wlp));

	for (;;) {
		volatile lock_t *olp = *olpp;

		/*
		 * If the locks are identical, there's nothing to do.
		 */
		if (olp == wlp)
			return (1);
		if (lock_try((lock_t *)olp)) {
			/*
			 * If 'olp' is still the right lock, return success.
			 * Otherwise, drop 'olp' and try the dance again.
			 */
			if (olp == *olpp)
				return (1);
			lock_clear((lock_t *)olp);
		} else {
			uint_t spin_count = 1;
			/*
			 * If we're grabbing the locks out of order, we lose.
			 * Drop the waiter's lock, and then grab and release
			 * the owner's lock to ensure that we won't retry
			 * until the winner is done (as described above).
			 */
			if (olp >= (lock_t *)turnstile_table && olp < wlp) {
				lock_clear(wlp);
				lock_set((lock_t *)olp);
				lock_clear((lock_t *)olp);
				return (0);
			}
			/*
			 * We're grabbing the locks in the right order,
			 * so spin until the owner's lock either becomes
			 * available or spontaneously changes.
			 */
			while (olp == *olpp && LOCK_HELD(olp)) {
				if (panicstr)
					return (1);
				spin_count++;
				SMT_PAUSE();
			}
			LOCKSTAT_RECORD(LS_TURNSTILE_INTERLOCK_SPIN,
			    olp, spin_count);
		}
	}
}

/*
 * Block the current thread on a synchronization object.
 *
 * Turnstiles implement both kernel and user-level priority inheritance.
 * To avoid missed wakeups in the user-level case, lwp_upimutex_lock() calls
 * turnstile_block() holding the appropriate lock in the upimutextab (see
 * the block comment in lwp_upimutex_lock() for details). The held lock is
 * passed to turnstile_block() as the "mp" parameter, and will be dropped
 * after priority has been willed, but before the thread actually sleeps
 * (this locking behavior leads to some subtle ordering issues; see the
 * block comment on turnstile hashing for details). This _must_ be the only
 * lock held when calling turnstile_block() with a SOBJ_USER_PI sobj; holding
 * other locks can result in panics due to cycles in the blocking chain.
 *
 * turnstile_block() always succeeds for kernel synchronization objects.
 * For SOBJ_USER_PI locks the possible errors are EINTR for signals, and
 * EDEADLK for cycles in the blocking chain. A return code of zero indicates
 * *either* that the lock is now held, or that this is a spurious wake-up, or
 * that the lock can never be held due to an ENOTRECOVERABLE error.
 * It is up to lwp_upimutex_lock() to sort this all out.
 */

int
turnstile_block(turnstile_t *ts, int qnum, void *sobj, sobj_ops_t *sobj_ops,
    kmutex_t *mp, lwp_timer_t *lwptp)
{
	kthread_t *owner;
	kthread_t *t = curthread;
	proc_t *p = ttoproc(t);
	klwp_t *lwp = ttolwp(t);
	turnstile_chain_t *tc = &TURNSTILE_CHAIN(sobj);
	int error = 0;
	int loser = 0;

	ASSERT(DISP_LOCK_HELD(&tc->tc_lock));
	ASSERT(mp == NULL || IS_UPI(mp));
	ASSERT((SOBJ_TYPE(sobj_ops) == SOBJ_USER_PI) ^ (mp == NULL));

	thread_lock_high(t);

	if (ts == NULL) {
		/*
		 * This is the first thread to block on this sobj.
		 * Take its attached turnstile and add it to the hash chain.
		 */
		ts = t->t_ts;
		ts->ts_sobj = sobj;
		ts->ts_next = tc->tc_first;
		tc->tc_first = ts;
		ASSERT(ts->ts_waiters == 0);
	} else {
		/*
		 * Another thread has already donated its turnstile
		 * to block on this sobj, so ours isn't needed.
		 * Stash it on the active turnstile's freelist.
		 */
		turnstile_t *myts = t->t_ts;
		myts->ts_free = ts->ts_free;
		ts->ts_free = myts;
		t->t_ts = ts;
		ASSERT(ts->ts_sobj == sobj);
		ASSERT(ts->ts_waiters > 0);
	}

	/*
	 * Put the thread to sleep.
	 */
	ASSERT(t != CPU->cpu_idle_thread);
	ASSERT(CPU_ON_INTR(CPU) == 0);
	ASSERT(t->t_wchan0 == NULL && t->t_wchan == NULL);
	ASSERT(t->t_state == TS_ONPROC);

	if (SOBJ_TYPE(sobj_ops) == SOBJ_USER_PI) {
		curthread->t_flag |= T_WAKEABLE;
	}
	CL_SLEEP(t);		/* assign kernel priority */
	THREAD_SLEEP(t, &tc->tc_lock);
	t->t_wchan = sobj;
	t->t_sobj_ops = sobj_ops;
	DTRACE_SCHED(sleep);

	if (lwp != NULL) {
		lwp->lwp_ru.nvcsw++;
		(void) new_mstate(t, LMS_SLEEP);
		if (SOBJ_TYPE(sobj_ops) == SOBJ_USER_PI) {
			lwp->lwp_asleep = 1;
			lwp->lwp_sysabort = 0;
			/*
			 * make wchan0 non-zero to conform to the rule that
			 * threads blocking for user-level objects have a
			 * non-zero wchan0: this prevents spurious wake-ups
			 * by, for example, /proc.
			 */
			t->t_wchan0 = (caddr_t)1;
		}
	}
	ts->ts_waiters++;
	sleepq_insert(&ts->ts_sleepq[qnum], t);

	if (SOBJ_TYPE(sobj_ops) == SOBJ_MUTEX &&
	    SOBJ_OWNER(sobj_ops, sobj) == NULL)
		panic("turnstile_block(%p): unowned mutex", (void *)ts);

	/*
	 * Follow the blocking chain to its end, willing our priority to
	 * everyone who's in our way.
	 */
	while (t->t_sobj_ops != NULL &&
	    (owner = SOBJ_OWNER(t->t_sobj_ops, t->t_wchan)) != NULL) {
		if (owner == curthread) {
			if (SOBJ_TYPE(sobj_ops) != SOBJ_USER_PI) {
				panic("Deadlock: cycle in blocking chain");
			}
			/*
			 * If the cycle we've encountered ends in mp,
			 * then we know it isn't a 'real' cycle because
			 * we're going to drop mp before we go to sleep.
			 * Moreover, since we've come full circle we know
			 * that we must have willed priority to everyone
			 * in our way. Therefore, we can break out now.
			 */
			if (t->t_wchan == (void *)mp)
				break;

			if (loser)
				lock_clear(&turnstile_loser_lock);
			/*
			 * For SOBJ_USER_PI, a cycle is an application
			 * deadlock which needs to be communicated
			 * back to the application.
			 */
			thread_unlock_nopreempt(t);
			if (lwptp->lwpt_id != 0) {
				/*
				 * We enqueued a timeout, we are
				 * holding curthread->t_delay_lock.
				 * Drop it and dequeue the timeout.
				 */
				mutex_exit(&curthread->t_delay_lock);
				(void) lwp_timer_dequeue(lwptp);
			}
			mutex_exit(mp);
			setrun(curthread);
			swtch(); /* necessary to transition state */
			curthread->t_flag &= ~T_WAKEABLE;
			setallwatch();
			lwp->lwp_asleep = 0;
			lwp->lwp_sysabort = 0;
			return (EDEADLK);
		}
		if (!turnstile_interlock(t->t_lockp, &owner->t_lockp)) {
			/*
			 * If we failed to grab the owner's thread lock,
			 * turnstile_interlock() will have dropped t's
			 * thread lock, so at this point we don't even know
			 * that 't' exists anymore.
			 * The simplest solution
			 * is to restart the entire priority inheritance dance
			 * from the beginning of the blocking chain, since
			 * we *do* know that 'curthread' still exists.
			 * Application of priority inheritance is idempotent,
			 * so it's OK that we're doing it more than once.
			 * Note also that since we've dropped our thread lock,
			 * we may already have been woken up; if so, our
			 * t_sobj_ops will be NULL, the loop will terminate,
			 * and the call to swtch() will be a no-op.  Phew.
			 *
			 * There is one further complication: if two (or more)
			 * threads keep trying to grab the turnstile locks out
			 * of order and keep losing the race to another thread,
			 * these "dueling losers" can livelock the system.
			 * Therefore, once we get into this rare situation,
			 * we serialize all the losers.
			 */
			if (loser == 0) {
				loser = 1;
				lock_set(&turnstile_loser_lock);
			}
			t = curthread;
			thread_lock_high(t);
			continue;
		}

		/*
		 * We now have the owner's thread lock.
		 * If we are traversing
		 * from non-SOBJ_USER_PI ops to SOBJ_USER_PI ops, then we know
		 * that we have caught the thread while in the TS_SLEEP state,
		 * but holding mp.  We know that this situation is transient
		 * (mp will be dropped before the holder actually sleeps on
		 * the SOBJ_USER_PI sobj), so we will spin waiting for mp to
		 * be dropped.  Then, as in the turnstile_interlock() failure
		 * case, we will restart the priority inheritance dance.
		 */
		if (SOBJ_TYPE(t->t_sobj_ops) != SOBJ_USER_PI &&
		    owner->t_sobj_ops != NULL &&
		    SOBJ_TYPE(owner->t_sobj_ops) == SOBJ_USER_PI) {
			kmutex_t *upi_lock = (kmutex_t *)t->t_wchan;

			ASSERT(IS_UPI(upi_lock));
			ASSERT(SOBJ_TYPE(t->t_sobj_ops) == SOBJ_MUTEX);

			if (t->t_lockp != owner->t_lockp)
				thread_unlock_high(owner);
			thread_unlock_high(t);
			if (loser)
				lock_clear(&turnstile_loser_lock);

			while (mutex_owner(upi_lock) == owner) {
				SMT_PAUSE();
				continue;
			}

			if (loser)
				lock_set(&turnstile_loser_lock);
			t = curthread;
			thread_lock_high(t);
			continue;
		}

		turnstile_pi_inherit(t->t_ts, owner, DISP_PRIO(t));
		if (t->t_lockp != owner->t_lockp)
			thread_unlock_high(t);
		t = owner;
	}

	if (loser)
		lock_clear(&turnstile_loser_lock);

	/*
	 * Note: 't' and 'curthread' were synonymous before the loop above,
	 * but now they may be different.  ('t' is now the last thread in
	 * the blocking chain.)
	 */
	if (SOBJ_TYPE(sobj_ops) == SOBJ_USER_PI) {
		ushort_t s = curthread->t_oldspl;
		int timedwait = 0;
		clock_t tim = -1;

		thread_unlock_high(t);
		if (lwptp->lwpt_id != 0) {
			/*
			 * We enqueued a timeout and we are
			 * holding curthread->t_delay_lock.
			 */
			mutex_exit(&curthread->t_delay_lock);
			timedwait = 1;
		}
		mutex_exit(mp);
		splx(s);

		if (ISSIG(curthread, JUSTLOOKING) ||
		    MUSTRETURN(p, curthread) || lwptp->lwpt_imm_timeout)
			setrun(curthread);
		swtch();
		curthread->t_flag &= ~T_WAKEABLE;
		if (timedwait)
			tim = lwp_timer_dequeue(lwptp);
		setallwatch();
		if (ISSIG(curthread, FORREAL) || lwp->lwp_sysabort ||
		    MUSTRETURN(p, curthread))
			error = EINTR;
		else if (lwptp->lwpt_imm_timeout || (timedwait && tim == -1))
			error = ETIME;
		lwp->lwp_sysabort = 0;
		lwp->lwp_asleep = 0;
	} else {
		thread_unlock_nopreempt(t);
		swtch();
	}

	return (error);
}

/*
 * Remove thread from specified turnstile sleep queue; retrieve its
 * free turnstile; if it is the last waiter, delete the turnstile
 * from the turnstile chain and if there is an inheritor, delete it
 * from the inheritor's t_prioinv chain.
 */
static void
turnstile_dequeue(kthread_t *t)
{
	turnstile_t *ts = t->t_ts;
	turnstile_chain_t *tc = &TURNSTILE_CHAIN(ts->ts_sobj);
	turnstile_t *tsfree, **tspp;

	ASSERT(DISP_LOCK_HELD(&tc->tc_lock));
	ASSERT(t->t_lockp == &tc->tc_lock);

	if ((tsfree = ts->ts_free) != NULL) {
		ASSERT(ts->ts_waiters > 1);
		ASSERT(tsfree->ts_waiters == 0);
		t->t_ts = tsfree;
		ts->ts_free = tsfree->ts_free;
		tsfree->ts_free = NULL;
	} else {
		/*
		 * The active turnstile's freelist is empty, so this
		 * must be the last waiter.  Remove the turnstile
		 * from the hash chain and leave the now-inactive
		 * turnstile attached to the thread we're waking.
		 * Note that the ts_inheritor for the turnstile
		 * may be NULL.  If one exists, its t_prioinv
		 * chain has to be updated.
		 */
		ASSERT(ts->ts_waiters == 1);
		if (ts->ts_inheritor != NULL) {
			(void) turnstile_pi_tsdelete(ts, ts->ts_inheritor);
			/*
			 * If we ever do a "disinherit" or "unboost", we need
			 * to do it only if "t" is a thread at the head of the
			 * sleep queue.  Since the sleep queue is prioritized,
			 * the disinherit is necessary only if the interrupted
			 * thread is the highest priority thread.
			 * Otherwise, there is a higher priority thread blocked
			 * on the turnstile, whose inheritance cannot be
			 * disinherited.  However, disinheriting is explicitly
			 * not done here, since it would require holding the
			 * inheritor's thread lock (see turnstile_unsleep()).
			 */
			ts->ts_inheritor = NULL;
		}
		tspp = &tc->tc_first;
		while (*tspp != ts)
			tspp = &(*tspp)->ts_next;
		*tspp = ts->ts_next;
		ASSERT(t->t_ts == ts);
	}
	ts->ts_waiters--;
	sleepq_dequeue(t);
	t->t_sobj_ops = NULL;
	t->t_wchan = NULL;
	t->t_wchan0 = NULL;
	ASSERT(t->t_state == TS_SLEEP);
}

/*
 * Wake threads that are blocked in a turnstile.
 */
void
turnstile_wakeup(turnstile_t *ts, int qnum, int nthreads, kthread_t *owner)
{
	turnstile_chain_t *tc = &TURNSTILE_CHAIN(ts->ts_sobj);
	sleepq_t *sqp = &ts->ts_sleepq[qnum];

	ASSERT(DISP_LOCK_HELD(&tc->tc_lock));

	/*
	 * Waive any priority we may have inherited from this turnstile.
	 */
	if (ts->ts_inheritor != NULL) {
		turnstile_pi_waive(ts);
	}
	while (nthreads-- > 0) {
		kthread_t *t = sqp->sq_first;
		ASSERT(t->t_ts == ts);
		ASSERT(ts->ts_waiters > 1 || ts->ts_inheritor == NULL);
		DTRACE_SCHED1(wakeup, kthread_t *, t);
		turnstile_dequeue(t);
		CL_WAKEUP(t); /* previous thread lock, tc_lock, not dropped */
		/*
		 * If the caller did direct handoff of ownership,
		 * make the new owner inherit from this turnstile.
		 */
		if (t == owner) {
			kthread_t *wp = ts->ts_sleepq[TS_WRITER_Q].sq_first;
			kthread_t *rp = ts->ts_sleepq[TS_READER_Q].sq_first;
			pri_t wpri = wp ? DISP_PRIO(wp) : 0;
			pri_t rpri = rp ? DISP_PRIO(rp) : 0;
			turnstile_pi_inherit(ts, t, MAX(wpri, rpri));
			owner = NULL;
		}
		thread_unlock_high(t);		/* drop run queue lock */
	}
	if (owner != NULL)
		panic("turnstile_wakeup: owner %p not woken", owner);
	disp_lock_exit(&tc->tc_lock);
}

/*
 * Change priority of a thread sleeping in a turnstile.
 */
void
turnstile_change_pri(kthread_t *t, pri_t pri, pri_t *t_prip)
{
	sleepq_t *sqp = t->t_sleepq;

	sleepq_dequeue(t);
	*t_prip = pri;
	sleepq_insert(sqp, t);
}

/*
 * We don't allow spurious wakeups of threads blocked in turnstiles
 * for synch objects whose sobj_ops vector is initialized with the
 * following routine (e.g. kernel synchronization objects).
 * This is vital to the correctness of direct-handoff logic in some
 * synchronization primitives, and it also simplifies the PI logic.
 */
/* ARGSUSED */
void
turnstile_stay_asleep(kthread_t *t)
{
}

/*
 * Wake up a thread blocked in a turnstile. Used to enable interruptibility
 * of threads blocked on a SOBJ_USER_PI sobj.
 *
 * The implications of this interface are:
 *
 * 1. turnstile_block() may return with an EINTR.
 * 2. When the owner of an sobj releases it, but no turnstile is found (i.e.
 *    no waiters), the (prior) owner must call turnstile_pi_recalc() to
 *    waive any priority inherited from interrupted waiters.
 *
 * When a waiter is interrupted, disinheriting its willed priority from the
 * inheritor would require holding the inheritor's thread lock, while also
 * holding the waiter's thread lock which is a turnstile lock. If the
 * inheritor's thread lock is not free, and is also a turnstile lock that
 * is out of lock order, the waiter's thread lock would have to be dropped.
 * This leads to complications for the caller of turnstile_unsleep(), since
 * the caller holds the waiter's thread lock. So, instead of disinheriting
 * on waiter interruption, the owner is required to follow rule 2 above.
 *
 * Avoiding disinherit on waiter interruption seems acceptable because
 * the owner runs at an unnecessarily high priority only while sobj is held,
 * which it would have done in any case, if the waiter had not been interrupted.
 */
void
turnstile_unsleep(kthread_t *t)
{
	turnstile_dequeue(t);
	THREAD_TRANSITION(t);
	CL_SETRUN(t);
}
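
/*
 * Illustrative sketch only (not part of the kernel): a minimal user-space
 * model of the blocking-chain walk in turnstile_block().  It shows how a
 * blocking thread wills its priority to each owner in its way, why
 * repeating the walk is harmless (raising to a maximum is idempotent),
 * and how a chain that leads back to the caller is reported as EDEADLK,
 * as is done for SOBJ_USER_PI.  The names toy_thread_t, toy_lock_t and
 * toy_will_priority() are hypothetical stand-ins.  All locking (thread
 * locks, turnstile_interlock(), turnstile_loser_lock) and the mp
 * special case are deliberately omitted.
 */

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's kthread_t or sobj types. */
typedef struct toy_lock toy_lock_t;

typedef struct toy_thread {
	int		tt_pri;		/* effective (dispatch) priority */
	toy_lock_t	*tt_blocked_on;	/* lock this thread waits for, or NULL */
} toy_thread_t;

struct toy_lock {
	toy_thread_t	*tl_owner;	/* current holder, or NULL if unowned */
};

/*
 * Walk the blocking chain starting at 'self', raising each owner's
 * priority to at least that of the thread blocked behind it.
 * Returns 0 when the chain ends, or EDEADLK if it leads back to 'self'.
 * Like the loop it models, this only detects cycles that include 'self'.
 */
static int
toy_will_priority(toy_thread_t *self)
{
	toy_thread_t *t = self;
	toy_thread_t *owner;

	assert(self != NULL);

	while (t->tt_blocked_on != NULL &&
	    (owner = t->tt_blocked_on->tl_owner) != NULL) {
		if (owner == self)
			return (EDEADLK);
		/* Raising to a maximum is idempotent, so a restart is safe. */
		if (owner->tt_pri < t->tt_pri)
			owner->tt_pri = t->tt_pri;
		t = owner;
	}
	return (0);
}
```

/*
 * In this model, a thread at priority 60 blocked behind owners at
 * priorities 10 and 5 leaves both owners at 60; calling the function a
 * second time changes nothing.
 */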