xref: /titanic_44/usr/src/uts/common/os/turnstile.c (revision 7c478bd95313f5f23a4c958a745db2134aa03244)
/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License, Version 1.0 only
 * (the "License").  You may not use this file except in compliance
 * with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */
/*
 * Copyright 2004 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

#pragma ident	"%Z%%M%	%I%	%E% SMI"

/*
 * Big Theory Statement for turnstiles.
 *
 * Turnstiles provide blocking and wakeup support, including priority
 * inheritance, for synchronization primitives (e.g. mutexes and rwlocks).
 * Typical usage is as follows:
 *
 * To block on lock 'lp' for read access in foo_enter():
 *
 *	ts = turnstile_lookup(lp);
 *	[ If the lock is still held, set the waiters bit
 *	turnstile_block(ts, TS_READER_Q, lp, &foo_sobj_ops, NULL, NULL);
 *
 * To wake threads waiting for write access to lock 'lp' in foo_exit():
 *
 *	ts = turnstile_lookup(lp);
 *	[ Either drop the lock (change owner to NULL) or perform a direct
 *	[ handoff (change owner to one of the threads we're about to wake).
 *	[ If we're going to wake the last waiter, clear the waiters bit.
 *	turnstile_wakeup(ts, TS_WRITER_Q, nwaiters, new_owner or NULL);
 *
 * turnstile_lookup() returns holding the turnstile hash chain lock for lp.
 * Both turnstile_block() and turnstile_wakeup() drop the turnstile lock.
 * To abort a turnstile operation, the client must call turnstile_exit().
 *
 * Requirements of the client:
 *
 * (1)	The lock's waiters indicator may be manipulated *only* while
 *	holding the turnstile hash chain lock (i.e. under turnstile_lookup()).
 *
 * (2)	Once the lock is marked as having waiters, the owner may be
 *	changed *only* while holding the turnstile hash chain lock.
 *
 * (3)	The caller must never block on an unheld lock.
 *
 * Consequences of these assumptions include the following:
 *
 * (a)	It is impossible for a lock to be unheld but have waiters.
 *
 * (b)	The priority inheritance code can safely assume that an active
 *	turnstile's ts_inheritor never changes until the inheritor calls
 *	turnstile_pi_waive().
 *
 * These assumptions simplify the implementation of both turnstiles and
 * their clients.
 *
 * Background on priority inheritance:
 *
 * Priority inheritance allows a thread to "will" its dispatch priority
 * to all the threads blocking it, directly or indirectly.  This prevents
 * situations called priority inversions in which a high-priority thread
 * needs a lock held by a low-priority thread, which cannot run because
 * of medium-priority threads.  Without PI, the medium-priority threads
 * can starve out the high-priority thread indefinitely.  With PI, the
 * low-priority thread becomes high-priority until it releases whatever
 * synchronization object the real high-priority thread is waiting for.
 *
 * How turnstiles work:
 *
 * All active turnstiles reside in a global hash table, turnstile_table[].
 * The address of a synchronization object determines its hash index.
 * Each hash chain is protected by its own dispatcher lock, acquired
 * by turnstile_lookup().  This lock protects the hash chain linkage, the
 * contents of all turnstiles on the hash chain, and the waiters bits of
 * every synchronization object in the system that hashes to the same chain.
 * Giving the lock such broad scope simplifies the interactions between
 * the turnstile code and its clients considerably.  The blocking path
 * is rare enough that this has no impact on scalability.  (If it ever
 * does, it's almost surely a second-order effect -- the real problem
 * is that some synchronization object is *very* heavily contended.)
 *
 * Each thread has an attached turnstile in case it needs to block.
 * A thread cannot block on more than one lock at a time, so one
 * turnstile per thread is the most we ever need.  The first thread
 * to block on a lock donates its attached turnstile and adds it to
 * the appropriate hash chain in turnstile_table[].  This becomes the
 * "active turnstile" for the lock.  Each subsequent thread that blocks
 * on the same lock discovers that the lock already has an active
 * turnstile, so it stashes its own turnstile on the active turnstile's
 * freelist.  As threads wake up, the process is reversed.
 *
 * turnstile_block() puts the current thread to sleep on the active
 * turnstile for the desired lock, walks the blocking chain to apply
 * priority inheritance to everyone in its way, and yields the CPU.
 *
 * turnstile_wakeup() waives any priority the owner may have inherited
 * and wakes the specified number of waiting threads.  If the caller is
 * doing direct handoff of ownership (rather than just dropping the lock),
 * the new owner automatically inherits priority from any existing waiters.
 */
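
/*
 * Editorial sketch (not part of the original file): the foo_enter() outline
 * from the Big Theory Statement above, fleshed out to show how the client
 * requirements (1)-(3) are met.  foo_try_enter(), foo_owner(), foo_sobj_ops
 * and the 'waiters' field are hypothetical names for whatever the client
 * primitive actually provides; only the turnstile_* calls are real.
 *
 *	foo_enter(foo_lock_t *lp)
 *	{
 *		for (;;) {
 *			if (foo_try_enter(lp))
 *				return;			[ got the lock
 *			ts = turnstile_lookup(lp);	[ chain lock now held
 *			if (foo_owner(lp) == NULL) {
 *				turnstile_exit(lp);	[ abort: lock was just dropped
 *				continue;
 *			}
 *			lp->waiters = 1;		[ rule (1): only under chain lock
 *			(void) turnstile_block(ts, TS_READER_Q, lp,
 *			    &foo_sobj_ops, NULL, NULL);
 *		}					[ woken up: retry from the top
 *	}
 */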

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/thread.h>
#include <sys/proc.h>
#include <sys/debug.h>
#include <sys/cpuvar.h>
#include <sys/turnstile.h>
#include <sys/t_lock.h>
#include <sys/disp.h>
#include <sys/sobject.h>
#include <sys/cmn_err.h>
#include <sys/sysmacros.h>
#include <sys/lockstat.h>
#include <sys/lwp_upimutex_impl.h>
#include <sys/schedctl.h>
#include <sys/cpu.h>
#include <sys/sdt.h>
#include <sys/cpupart.h>

extern upib_t upimutextab[UPIMUTEX_TABSIZE];

#define	IS_UPI(sobj)	\
	((uintptr_t)(sobj) - (uintptr_t)upimutextab < sizeof (upimutextab))

/*
 * The turnstile hash table is partitioned into two halves: the lower half
 * is used for upimutextab[] locks, the upper half for everything else.
 * The reason for the distinction is that SOBJ_USER_PI locks present a
 * unique problem: the upimutextab[] lock passed to turnstile_block()
 * cannot be dropped until the calling thread has blocked on its
 * SOBJ_USER_PI lock and willed its priority down the blocking chain.
 * At that point, the caller's t_lockp will be one of the turnstile locks.
 * If mutex_exit() discovers that the upimutextab[] lock has waiters, it
 * must wake them, which forces a lock ordering on us: the turnstile lock
 * for the upimutextab[] lock will be acquired in mutex_vector_exit(),
 * which will eventually call into turnstile_pi_waive(), which will then
 * acquire the caller's thread lock, which in this case is the turnstile
 * lock for the SOBJ_USER_PI lock.  In general, when two turnstile locks
 * must be held at the same time, the lock order must be the address order.
 * Therefore, to prevent deadlock in turnstile_pi_waive(), we must ensure
 * that upimutextab[] locks *always* hash to lower addresses than any
 * other locks.  You think this is cheesy?  Let's see you do better.
 */
#define	TURNSTILE_HASH_SIZE	128		/* must be power of 2 */
#define	TURNSTILE_HASH_MASK	(TURNSTILE_HASH_SIZE - 1)
#define	TURNSTILE_SOBJ_HASH(sobj)	\
	((((ulong_t)sobj >> 2) + ((ulong_t)sobj >> 9)) & TURNSTILE_HASH_MASK)
#define	TURNSTILE_SOBJ_BUCKET(sobj)		\
	((IS_UPI(sobj) ? 0 : TURNSTILE_HASH_SIZE) + TURNSTILE_SOBJ_HASH(sobj))
#define	TURNSTILE_CHAIN(sobj)	turnstile_table[TURNSTILE_SOBJ_BUCKET(sobj)]

typedef struct turnstile_chain {
	turnstile_t	*tc_first;	/* first turnstile on hash chain */
	disp_lock_t	tc_lock;	/* lock for this hash chain */
} turnstile_chain_t;

turnstile_chain_t	turnstile_table[2 * TURNSTILE_HASH_SIZE];
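
/*
 * Editorial note (not part of the original file): a worked consequence of
 * the bucket macros above.  IS_UPI() steers upimutextab[] locks into bucket
 * indices [0, TURNSTILE_HASH_SIZE - 1] and everything else into
 * [TURNSTILE_HASH_SIZE, 2 * TURNSTILE_HASH_SIZE - 1].  Because
 * turnstile_table[] is one contiguous array, every tc_lock in the lower
 * half sits at a lower address than every tc_lock in the upper half --
 * exactly the address ordering the hashing comment above requires:
 *
 *	[ for any upimutextab[] lock 'u' and any other sobj 's'
 *	ASSERT(&TURNSTILE_CHAIN(u).tc_lock < &TURNSTILE_CHAIN(s).tc_lock);
 */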

static	lock_t	turnstile_loser_lock;

/*
 * Make 'inheritor' inherit priority from this turnstile.
 */
static void
turnstile_pi_inherit(turnstile_t *ts, kthread_t *inheritor, pri_t epri)
{
	ASSERT(THREAD_LOCK_HELD(inheritor));
	ASSERT(DISP_LOCK_HELD(&TURNSTILE_CHAIN(ts->ts_sobj).tc_lock));

	if (epri <= inheritor->t_pri)
		return;

	if (ts->ts_inheritor == NULL) {
		ts->ts_inheritor = inheritor;
		ts->ts_epri = epri;
		disp_lock_enter_high(&inheritor->t_pi_lock);
		ts->ts_prioinv = inheritor->t_prioinv;
		inheritor->t_prioinv = ts;
		disp_lock_exit_high(&inheritor->t_pi_lock);
	} else {
		/*
		 * 'inheritor' is already inheriting from this turnstile,
		 * so just adjust its priority.
		 */
		ASSERT(ts->ts_inheritor == inheritor);
		if (ts->ts_epri < epri)
			ts->ts_epri = epri;
	}

	if (epri > DISP_PRIO(inheritor))
		thread_change_epri(inheritor, epri);
}
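
/*
 * Editorial note (not part of the original file): the shape of the
 * t_prioinv list built above.  A thread that owns several contended locks
 * has one active turnstile per lock linked on its t_prioinv chain, and its
 * effective priority tracks the largest ts_epri on that chain.  With two
 * made-up priorities:
 *
 *	owner->t_prioinv --> ts_A (ts_epri == 60) --> ts_B (ts_epri == 40)
 *
 *	DISP_PRIO(owner) == 60 (assuming its own t_pri is lower); after
 *	turnstile_pi_waive(ts_A), turnstile_pi_tsdelete() recomputes the
 *	maximum over the rest of the chain and the owner drops to 40.
 */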

/*
 * If turnstile is non-NULL, remove it from inheritor's t_prioinv list.
 * Compute new inherited priority, and return it.
 */
static pri_t
turnstile_pi_tsdelete(turnstile_t *ts, kthread_t *inheritor)
{
	turnstile_t **tspp, *tsp;
	pri_t new_epri = 0;

	disp_lock_enter_high(&inheritor->t_pi_lock);
	tspp = &inheritor->t_prioinv;
	while ((tsp = *tspp) != NULL) {
		if (tsp == ts)
			*tspp = tsp->ts_prioinv;
		else
			new_epri = MAX(new_epri, tsp->ts_epri);
		tspp = &tsp->ts_prioinv;
	}
	disp_lock_exit_high(&inheritor->t_pi_lock);
	return (new_epri);
}

/*
 * Remove turnstile from inheritor's t_prioinv list, compute
 * new priority, and change the inheritor's effective priority if
 * necessary. Keep in synch with turnstile_pi_recalc().
 */
static void
turnstile_pi_waive(turnstile_t *ts)
{
	kthread_t *inheritor = ts->ts_inheritor;
	pri_t new_epri;

	ASSERT(inheritor == curthread);

	thread_lock_high(inheritor);
	new_epri = turnstile_pi_tsdelete(ts, inheritor);
	if (new_epri != DISP_PRIO(inheritor))
		thread_change_epri(inheritor, new_epri);
	ts->ts_inheritor = NULL;
	if (DISP_MUST_SURRENDER(inheritor))
		cpu_surrender(inheritor);
	thread_unlock_high(inheritor);
}

/*
 * Compute caller's new inherited priority, and change its effective
 * priority if necessary. Necessary only for SOBJ_USER_PI, because of
 * its interruptibility characteristic.
 */
void
turnstile_pi_recalc(void)
{
	kthread_t *inheritor = curthread;
	pri_t new_epri;

	thread_lock(inheritor);
	new_epri = turnstile_pi_tsdelete(NULL, inheritor);
	if (new_epri != DISP_PRIO(inheritor))
		thread_change_epri(inheritor, new_epri);
	if (DISP_MUST_SURRENDER(inheritor))
		cpu_surrender(inheritor);
	thread_unlock(inheritor);
}
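
/*
 * Editorial sketch (not part of the original file): why a SOBJ_USER_PI
 * owner calls turnstile_pi_recalc() on its release path.  If a waiter was
 * interrupted, its willed priority is deliberately not disinherited (see
 * the turnstile_unsleep() comment below), so an owner that finds no
 * turnstile at release time must still shed any stale inherited priority.
 * A hypothetical release path, in the style of the Big Theory Statement:
 *
 *	ts = turnstile_lookup(upi_sobj);
 *	if (ts == NULL) {
 *		turnstile_exit(upi_sobj);	[ nobody left to wake...
 *		turnstile_pi_recalc();		[ ...but waive stale inheritance
 *	} else {
 *		[ clear the waiters bit / hand off as usual
 *		turnstile_wakeup(ts, TS_WRITER_Q, nwaiters, new_owner);
 *	}
 */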

/*
 * Grab the lock protecting the hash chain for sobj
 * and return the active turnstile for sobj, if any.
 */
turnstile_t *
turnstile_lookup(void *sobj)
{
	turnstile_t *ts;
	turnstile_chain_t *tc = &TURNSTILE_CHAIN(sobj);

	disp_lock_enter(&tc->tc_lock);

	for (ts = tc->tc_first; ts != NULL; ts = ts->ts_next)
		if (ts->ts_sobj == sobj)
			break;

	return (ts);
}

/*
 * Drop the lock protecting the hash chain for sobj.
 */
void
turnstile_exit(void *sobj)
{
	disp_lock_exit(&TURNSTILE_CHAIN(sobj).tc_lock);
}

/*
 * When we apply priority inheritance, we must grab the owner's thread lock
 * while already holding the waiter's thread lock.  If both thread locks are
 * turnstile locks, this can lead to deadlock: while we hold L1 and try to
 * grab L2, some unrelated thread may be applying priority inheritance to
 * some other blocking chain, holding L2 and trying to grab L1.  The most
 * obvious solution -- do a lock_try() for the owner lock -- isn't quite
 * sufficient because it can cause livelock: each thread may hold one lock,
 * try to grab the other, fail, bail out, and try again, looping forever.
 * To prevent livelock we must define a winner, i.e. define an arbitrary
 * lock ordering on the turnstile locks.  For simplicity we declare that
 * virtual address order defines lock order, i.e. if L1 < L2, then the
 * correct lock ordering is L1, L2.  Thus the thread that holds L1 and
 * wants L2 should spin until L2 is available, but the thread that holds
 * L2 and can't get L1 on the first try must drop L2 and return failure.
 * Moreover, the losing thread must not reacquire L2 until the winning
 * thread has had a chance to grab it; to ensure this, the losing thread
 * must grab L1 after dropping L2, thus spinning until the winner is done.
 * Complicating matters further, note that the owner's thread lock pointer
 * can change (i.e. be pointed at a different lock) while we're trying to
 * grab it.  If that happens, we must unwind our state and try again.
 *
 * On success, returns 1 with both locks held.
 * On failure, returns 0 with neither lock held.
 */
static int
turnstile_interlock(lock_t *wlp, lock_t *volatile *olpp)
{
	ASSERT(LOCK_HELD(wlp));

	for (;;) {
		volatile lock_t *olp = *olpp;

		/*
		 * If the locks are identical, there's nothing to do.
		 */
		if (olp == wlp)
			return (1);
		if (lock_try((lock_t *)olp)) {
			/*
			 * If 'olp' is still the right lock, return success.
			 * Otherwise, drop 'olp' and try the dance again.
			 */
			if (olp == *olpp)
				return (1);
			lock_clear((lock_t *)olp);
		} else {
			uint_t spin_count = 1;
			/*
			 * If we're grabbing the locks out of order, we lose.
			 * Drop the waiter's lock, and then grab and release
			 * the owner's lock to ensure that we won't retry
			 * until the winner is done (as described above).
			 */
			if (olp >= (lock_t *)turnstile_table && olp < wlp) {
				lock_clear(wlp);
				lock_set((lock_t *)olp);
				lock_clear((lock_t *)olp);
				return (0);
			}
			/*
			 * We're grabbing the locks in the right order,
			 * so spin until the owner's lock either becomes
			 * available or spontaneously changes.
			 */
			while (olp == *olpp && LOCK_HELD(olp)) {
				if (panicstr)
					return (1);
				spin_count++;
				SMT_PAUSE();
			}
			LOCKSTAT_RECORD(LS_TURNSTILE_INTERLOCK_SPIN,
			    olp, spin_count);
		}
	}
}
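
/*
 * Editorial note (not part of the original file): the caller-side contract,
 * as exercised by turnstile_block() below.  On failure neither lock is held,
 * so the caller cannot trust its old position in the blocking chain and
 * restarts the priority-inheritance walk from curthread:
 *
 *	if (!turnstile_interlock(t->t_lockp, &owner->t_lockp)) {
 *		t = curthread;			[ restart the PI walk
 *		thread_lock_high(t);
 *		continue;
 *	}
 *	[ success: both t's and owner's thread locks are now held
 */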

/*
 * Block the current thread on a synchronization object.
 *
 * Turnstiles implement both kernel and user-level priority inheritance.
 * To avoid missed wakeups in the user-level case, lwp_upimutex_lock() calls
 * turnstile_block() holding the appropriate lock in the upimutextab (see
 * the block comment in lwp_upimutex_lock() for details).  The held lock is
 * passed to turnstile_block() as the "mp" parameter, and will be dropped
 * after priority has been willed, but before the thread actually sleeps
 * (this locking behavior leads to some subtle ordering issues; see the
 * block comment on turnstile hashing for details).  This _must_ be the only
 * lock held when calling turnstile_block() with a SOBJ_USER_PI sobj; holding
 * other locks can result in panics due to cycles in the blocking chain.
 *
 * turnstile_block() always succeeds for kernel synchronization objects.
 * For SOBJ_USER_PI locks the possible errors are EINTR for signals, ETIME
 * for timeouts, and EDEADLK for cycles in the blocking chain.  A return
 * code of zero indicates *either* that the lock is now held, or that this
 * is a spurious wake-up, or that the lock can never be held due to an
 * ENOTRECOVERABLE error.  It is up to lwp_upimutex_lock() to sort this
 * all out.
 */

int
turnstile_block(turnstile_t *ts, int qnum, void *sobj, sobj_ops_t *sobj_ops,
    kmutex_t *mp, lwp_timer_t *lwptp)
{
	kthread_t *owner;
	kthread_t *t = curthread;
	proc_t *p = ttoproc(t);
	klwp_t *lwp = ttolwp(t);
	turnstile_chain_t *tc = &TURNSTILE_CHAIN(sobj);
	int error = 0;
	int loser = 0;

	ASSERT(DISP_LOCK_HELD(&tc->tc_lock));
	ASSERT(mp == NULL || IS_UPI(mp));
	ASSERT((SOBJ_TYPE(sobj_ops) == SOBJ_USER_PI) ^ (mp == NULL));

	thread_lock_high(t);

	if (ts == NULL) {
		/*
		 * This is the first thread to block on this sobj.
		 * Take its attached turnstile and add it to the hash chain.
		 */
		ts = t->t_ts;
		ts->ts_sobj = sobj;
		ts->ts_next = tc->tc_first;
		tc->tc_first = ts;
		ASSERT(ts->ts_waiters == 0);
	} else {
		/*
		 * Another thread has already donated its turnstile
		 * to block on this sobj, so ours isn't needed.
		 * Stash it on the active turnstile's freelist.
		 */
		turnstile_t *myts = t->t_ts;
		myts->ts_free = ts->ts_free;
		ts->ts_free = myts;
		t->t_ts = ts;
		ASSERT(ts->ts_sobj == sobj);
		ASSERT(ts->ts_waiters > 0);
	}

	/*
	 * Put the thread to sleep.
	 */
	ASSERT(t != CPU->cpu_idle_thread);
	ASSERT(CPU_ON_INTR(CPU) == 0);
	ASSERT(t->t_wchan0 == NULL && t->t_wchan == NULL);
	ASSERT(t->t_state == TS_ONPROC);

	if (SOBJ_TYPE(sobj_ops) == SOBJ_USER_PI) {
		curthread->t_flag |= T_WAKEABLE;
	}
	CL_SLEEP(t);		/* assign kernel priority */
	THREAD_SLEEP(t, &tc->tc_lock);
	t->t_wchan = sobj;
	t->t_sobj_ops = sobj_ops;
	DTRACE_SCHED(sleep);

	if (lwp != NULL) {
		lwp->lwp_ru.nvcsw++;
		(void) new_mstate(t, LMS_SLEEP);
		if (SOBJ_TYPE(sobj_ops) == SOBJ_USER_PI) {
			lwp->lwp_asleep = 1;
			lwp->lwp_sysabort = 0;
			/*
			 * make wchan0 non-zero to conform to the rule that
			 * threads blocking for user-level objects have a
			 * non-zero wchan0: this prevents spurious wake-ups
			 * by, for example, /proc.
			 */
			t->t_wchan0 = (caddr_t)1;
		}
	}
	ts->ts_waiters++;
	sleepq_insert(&ts->ts_sleepq[qnum], t);

	if (SOBJ_TYPE(sobj_ops) == SOBJ_MUTEX &&
	    SOBJ_OWNER(sobj_ops, sobj) == NULL)
		panic("turnstile_block(%p): unowned mutex", (void *)ts);

	/*
	 * Follow the blocking chain to its end, willing our priority to
	 * everyone who's in our way.
	 */
	while (t->t_sobj_ops != NULL &&
	    (owner = SOBJ_OWNER(t->t_sobj_ops, t->t_wchan)) != NULL) {
		if (owner == curthread) {
			if (SOBJ_TYPE(sobj_ops) != SOBJ_USER_PI) {
				panic("Deadlock: cycle in blocking chain");
			}
			/*
			 * If the cycle we've encountered ends in mp,
			 * then we know it isn't a 'real' cycle because
			 * we're going to drop mp before we go to sleep.
			 * Moreover, since we've come full circle we know
			 * that we must have willed priority to everyone
			 * in our way.  Therefore, we can break out now.
			 */
			if (t->t_wchan == (void *)mp)
				break;

			if (loser)
				lock_clear(&turnstile_loser_lock);
			/*
			 * For SOBJ_USER_PI, a cycle is an application
			 * deadlock which needs to be communicated
			 * back to the application.
			 */
			thread_unlock_nopreempt(t);
			if (lwptp->lwpt_id != 0) {
				/*
				 * We enqueued a timeout and we are
				 * holding curthread->t_delay_lock.
				 * Drop it and dequeue the timeout.
				 */
				mutex_exit(&curthread->t_delay_lock);
				(void) lwp_timer_dequeue(lwptp);
			}
			mutex_exit(mp);
			setrun(curthread);
			swtch(); /* necessary to transition state */
			curthread->t_flag &= ~T_WAKEABLE;
			setallwatch();
			lwp->lwp_asleep = 0;
			lwp->lwp_sysabort = 0;
			return (EDEADLK);
		}
		if (!turnstile_interlock(t->t_lockp, &owner->t_lockp)) {
			/*
			 * If we failed to grab the owner's thread lock,
			 * turnstile_interlock() will have dropped t's
			 * thread lock, so at this point we don't even know
			 * that 't' exists anymore.  The simplest solution
			 * is to restart the entire priority inheritance dance
			 * from the beginning of the blocking chain, since
			 * we *do* know that 'curthread' still exists.
			 * Application of priority inheritance is idempotent,
			 * so it's OK that we're doing it more than once.
			 * Note also that since we've dropped our thread lock,
			 * we may already have been woken up; if so, our
			 * t_sobj_ops will be NULL, the loop will terminate,
			 * and the call to swtch() will be a no-op.  Phew.
			 *
			 * There is one further complication: if two (or more)
			 * threads keep trying to grab the turnstile locks out
			 * of order and keep losing the race to another thread,
			 * these "dueling losers" can livelock the system.
			 * Therefore, once we get into this rare situation,
			 * we serialize all the losers.
			 */
			if (loser == 0) {
				loser = 1;
				lock_set(&turnstile_loser_lock);
			}
			t = curthread;
			thread_lock_high(t);
			continue;
		}

		/*
		 * We now have the owner's thread lock.  If we are traversing
		 * from non-SOBJ_USER_PI ops to SOBJ_USER_PI ops, then we know
		 * that we have caught the thread while in the TS_SLEEP state,
		 * but holding mp.  We know that this situation is transient
		 * (mp will be dropped before the holder actually sleeps on
		 * the SOBJ_USER_PI sobj), so we will spin waiting for mp to
		 * be dropped.  Then, as in the turnstile_interlock() failure
		 * case, we will restart the priority inheritance dance.
		 */
		if (SOBJ_TYPE(t->t_sobj_ops) != SOBJ_USER_PI &&
		    owner->t_sobj_ops != NULL &&
		    SOBJ_TYPE(owner->t_sobj_ops) == SOBJ_USER_PI) {
			kmutex_t *upi_lock = (kmutex_t *)t->t_wchan;

			ASSERT(IS_UPI(upi_lock));
			ASSERT(SOBJ_TYPE(t->t_sobj_ops) == SOBJ_MUTEX);

			if (t->t_lockp != owner->t_lockp)
				thread_unlock_high(owner);
			thread_unlock_high(t);
			if (loser)
				lock_clear(&turnstile_loser_lock);

			while (mutex_owner(upi_lock) == owner) {
				SMT_PAUSE();
				continue;
			}

			if (loser)
				lock_set(&turnstile_loser_lock);
			t = curthread;
			thread_lock_high(t);
			continue;
		}

		turnstile_pi_inherit(t->t_ts, owner, DISP_PRIO(t));
		if (t->t_lockp != owner->t_lockp)
			thread_unlock_high(t);
		t = owner;
	}

	if (loser)
		lock_clear(&turnstile_loser_lock);

	/*
	 * Note: 't' and 'curthread' were synonymous before the loop above,
	 * but now they may be different.  ('t' is now the last thread in
	 * the blocking chain.)
	 */
	if (SOBJ_TYPE(sobj_ops) == SOBJ_USER_PI) {
		ushort_t s = curthread->t_oldspl;
		int timedwait = 0;
		clock_t tim = -1;

		thread_unlock_high(t);
		if (lwptp->lwpt_id != 0) {
			/*
			 * We enqueued a timeout and we are
			 * holding curthread->t_delay_lock.
			 */
			mutex_exit(&curthread->t_delay_lock);
			timedwait = 1;
		}
		mutex_exit(mp);
		splx(s);

		if (ISSIG(curthread, JUSTLOOKING) ||
		    MUSTRETURN(p, curthread) || lwptp->lwpt_imm_timeout)
			setrun(curthread);
		swtch();
		curthread->t_flag &= ~T_WAKEABLE;
		if (timedwait)
			tim = lwp_timer_dequeue(lwptp);
		setallwatch();
		if (ISSIG(curthread, FORREAL) || lwp->lwp_sysabort ||
		    MUSTRETURN(p, curthread))
			error = EINTR;
		else if (lwptp->lwpt_imm_timeout || (timedwait && tim == -1))
			error = ETIME;
		lwp->lwp_sysabort = 0;
		lwp->lwp_asleep = 0;
	} else {
		thread_unlock_nopreempt(t);
		swtch();
	}

	return (error);
}
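
/*
 * Editorial note (not part of the original file): the two call shapes
 * implied by the ASSERTs at the top of turnstile_block().  Kernel
 * synchronization objects pass neither an upimutextab[] lock nor an lwp
 * timer; SOBJ_USER_PI callers pass both.  Names other than the parameters
 * themselves are illustrative:
 *
 *	[ kernel sobj (e.g. a mutex or rwlock slow path): cannot fail
 *	(void) turnstile_block(ts, TS_WRITER_Q, lp, &foo_sobj_ops,
 *	    NULL, NULL);
 *
 *	[ SOBJ_USER_PI sobj: may return 0, EINTR, ETIME or EDEADLK
 *	error = turnstile_block(ts, TS_WRITER_Q, upi_sobj, &upi_sobj_ops,
 *	    mp, lwptp);		[ mp is the held upimutextab[] lock
 */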

/*
 * Remove thread from specified turnstile sleep queue; retrieve its
 * free turnstile; if it is the last waiter, delete the turnstile
 * from the turnstile chain and if there is an inheritor, delete it
 * from the inheritor's t_prioinv chain.
 */
static void
turnstile_dequeue(kthread_t *t)
{
	turnstile_t *ts = t->t_ts;
	turnstile_chain_t *tc = &TURNSTILE_CHAIN(ts->ts_sobj);
	turnstile_t *tsfree, **tspp;

	ASSERT(DISP_LOCK_HELD(&tc->tc_lock));
	ASSERT(t->t_lockp == &tc->tc_lock);

	if ((tsfree = ts->ts_free) != NULL) {
		ASSERT(ts->ts_waiters > 1);
		ASSERT(tsfree->ts_waiters == 0);
		t->t_ts = tsfree;
		ts->ts_free = tsfree->ts_free;
		tsfree->ts_free = NULL;
	} else {
		/*
		 * The active turnstile's freelist is empty, so this
		 * must be the last waiter.  Remove the turnstile
		 * from the hash chain and leave the now-inactive
		 * turnstile attached to the thread we're waking.
		 * Note that the ts_inheritor for the turnstile
		 * may be NULL. If one exists, its t_prioinv
		 * chain has to be updated.
		 */
		ASSERT(ts->ts_waiters == 1);
		if (ts->ts_inheritor != NULL) {
			(void) turnstile_pi_tsdelete(ts, ts->ts_inheritor);
			/*
			 * If we ever do a "disinherit" or "unboost", we need
			 * to do it only if "t" is a thread at the head of the
			 * sleep queue. Since the sleep queue is prioritized,
			 * the disinherit is necessary only if the interrupted
			 * thread is the highest priority thread.
			 * Otherwise, there is a higher priority thread blocked
			 * on the turnstile, whose inheritance cannot be
			 * disinherited. However, disinheriting is explicitly
			 * not done here, since it would require holding the
			 * inheritor's thread lock (see turnstile_unsleep()).
			 */
			ts->ts_inheritor = NULL;
		}
		tspp = &tc->tc_first;
		while (*tspp != ts)
			tspp = &(*tspp)->ts_next;
		*tspp = ts->ts_next;
		ASSERT(t->t_ts == ts);
	}
	ts->ts_waiters--;
	sleepq_dequeue(t);
	t->t_sobj_ops = NULL;
	t->t_wchan = NULL;
	t->t_wchan0 = NULL;
	ASSERT(t->t_state == TS_SLEEP);
}

/*
 * Wake threads that are blocked in a turnstile.
 */
void
turnstile_wakeup(turnstile_t *ts, int qnum, int nthreads, kthread_t *owner)
{
	turnstile_chain_t *tc = &TURNSTILE_CHAIN(ts->ts_sobj);
	sleepq_t *sqp = &ts->ts_sleepq[qnum];

	ASSERT(DISP_LOCK_HELD(&tc->tc_lock));

	/*
	 * Waive any priority we may have inherited from this turnstile.
	 */
	if (ts->ts_inheritor != NULL) {
		turnstile_pi_waive(ts);
	}
	while (nthreads-- > 0) {
		kthread_t *t = sqp->sq_first;
		ASSERT(t->t_ts == ts);
		ASSERT(ts->ts_waiters > 1 || ts->ts_inheritor == NULL);
		DTRACE_SCHED1(wakeup, kthread_t *, t);
		turnstile_dequeue(t);
		CL_WAKEUP(t); /* previous thread lock, tc_lock, not dropped */
		/*
		 * If the caller did direct handoff of ownership,
		 * make the new owner inherit from this turnstile.
		 */
		if (t == owner) {
			kthread_t *wp = ts->ts_sleepq[TS_WRITER_Q].sq_first;
			kthread_t *rp = ts->ts_sleepq[TS_READER_Q].sq_first;
			pri_t wpri = wp ? DISP_PRIO(wp) : 0;
			pri_t rpri = rp ? DISP_PRIO(rp) : 0;
			turnstile_pi_inherit(ts, t, MAX(wpri, rpri));
			owner = NULL;
		}
		thread_unlock_high(t);		/* drop run queue lock */
	}
	if (owner != NULL)
		panic("turnstile_wakeup: owner %p not woken", owner);
	disp_lock_exit(&tc->tc_lock);
}
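
/*
 * Editorial sketch (not part of the original file): the foo_exit() outline
 * from the Big Theory Statement with direct handoff spelled out.  As in
 * that outline, 'foo' and its 'waiters' field are hypothetical stand-ins;
 * only the turnstile_* calls and the TS_WRITER_Q queue are real.
 *
 *	foo_exit(foo_lock_t *lp)
 *	{
 *		[ fast path: no waiters bit set -- just drop the lock
 *		ts = turnstile_lookup(lp);
 *		[ pick new_owner: the first waiter on ts->ts_sleepq[TS_WRITER_Q]
 *		if (ts->ts_waiters == 1)
 *			lp->waiters = 0;		[ waking the last waiter
 *		foo_set_owner(lp, new_owner);		[ direct handoff
 *		turnstile_wakeup(ts, TS_WRITER_Q, 1, new_owner);
 *		[ turnstile_wakeup() drops the chain lock; new_owner will
 *		[ inherit from any remaining waiters
 *	}
 */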

/*
 * Change priority of a thread sleeping in a turnstile.
 */
void
turnstile_change_pri(kthread_t *t, pri_t pri, pri_t *t_prip)
{
	sleepq_t *sqp = t->t_sleepq;

	sleepq_dequeue(t);
	*t_prip = pri;
	sleepq_insert(sqp, t);
}

/*
 * We don't allow spurious wakeups of threads blocked in turnstiles
 * for synch objects whose sobj_ops vector is initialized with the
 * following routine (e.g. kernel synchronization objects).
 * This is vital to the correctness of direct-handoff logic in some
 * synchronization primitives, and it also simplifies the PI logic.
 */
/* ARGSUSED */
void
turnstile_stay_asleep(kthread_t *t)
{
}

/*
 * Wake up a thread blocked in a turnstile. Used to enable interruptibility
 * of threads blocked on a SOBJ_USER_PI sobj.
 *
 * The implications of this interface are:
 *
 * 1. turnstile_block() may return EINTR.
 * 2. When the owner of an sobj releases it, but no turnstile is found (i.e.
 *    no waiters), the (prior) owner must call turnstile_pi_recalc() to
 *    waive any priority inherited from interrupted waiters.
 *
 * When a waiter is interrupted, disinheriting its willed priority from the
 * inheritor would require holding the inheritor's thread lock, while also
 * holding the waiter's thread lock which is a turnstile lock. If the
 * inheritor's thread lock is not free, and is also a turnstile lock that
 * is out of lock order, the waiter's thread lock would have to be dropped.
 * This leads to complications for the caller of turnstile_unsleep(), since
 * the caller holds the waiter's thread lock. So, instead of disinheriting
 * on waiter interruption, the owner is required to follow rule 2 above.
 *
 * Avoiding disinherit on waiter interruption seems acceptable because
 * the owner runs at an unnecessarily high priority only while sobj is held,
 * which it would have done in any case, if the waiter had not been interrupted.
 */
void
turnstile_unsleep(kthread_t *t)
{
	turnstile_dequeue(t);
	THREAD_TRANSITION(t);
	CL_SETRUN(t);
}