/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
 * Use is subject to license terms.
 */

/*
 * Big Theory Statement for turnstiles.
 *
 * Turnstiles provide blocking and wakeup support, including priority
 * inheritance, for synchronization primitives (e.g. mutexes and rwlocks).
 * Typical usage is as follows:
 *
 * To block on lock 'lp' for read access in foo_enter():
 *
 *	ts = turnstile_lookup(lp);
 *	[ If the lock is still held, set the waiters bit
 *	turnstile_block(ts, TS_READER_Q, lp, &foo_sobj_ops);
 *
 * To wake threads waiting for write access to lock 'lp' in foo_exit():
 *
 *	ts = turnstile_lookup(lp);
 *	[ Either drop the lock (change owner to NULL) or perform a direct
 *	[ handoff (change owner to one of the threads we're about to wake).
 *	[ If we're going to wake the last waiter, clear the waiters bit.
 *	turnstile_wakeup(ts, TS_WRITER_Q, nwaiters, new_owner or NULL);
 *
 * turnstile_lookup() returns holding the turnstile hash chain lock for lp.
 * Both turnstile_block() and turnstile_wakeup() drop the turnstile lock.
 * To abort a turnstile operation, the client must call turnstile_exit().
 *
 * Requirements of the client:
 *
 * (1)	The lock's waiters indicator may be manipulated *only* while
 *	holding the turnstile hash chain lock (i.e. under turnstile_lookup()).
 *
 * (2)	Once the lock is marked as having waiters, the owner may be
 *	changed *only* while holding the turnstile hash chain lock.
 *
 * (3)	The caller must never block on an unheld lock.
 *
 * Consequences of these assumptions include the following:
 *
 * (a)	It is impossible for a lock to be unheld but have waiters.
 *
 * (b)	The priority inheritance code can safely assume that an active
 *	turnstile's ts_inheritor never changes until the inheritor calls
 *	turnstile_pi_waive().
 *
 * These assumptions simplify the implementation of both turnstiles and
 * their clients.
 *
 * Background on priority inheritance:
 *
 * Priority inheritance allows a thread to "will" its dispatch priority
 * to all the threads blocking it, directly or indirectly.  This prevents
 * situations called priority inversions in which a high-priority thread
 * needs a lock held by a low-priority thread, which cannot run because
 * of medium-priority threads.  Without PI, the medium-priority threads
 * can starve out the high-priority thread indefinitely.  With PI, the
 * low-priority thread becomes high-priority until it releases whatever
 * synchronization object the real high-priority thread is waiting for.
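 *
 * As a concrete illustration (the numbers are arbitrary): suppose a
 * priority-60 thread blocks on a mutex held by a priority-10 thread while
 * several priority-40 threads are runnable.  Without PI, the priority-40
 * threads can keep the priority-10 holder -- and therefore the priority-60
 * waiter -- off CPU indefinitely.  With PI, the holder runs at priority 60
 * until it drops the mutex, at which point it reverts to its own priority.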
 *
 * How turnstiles work:
 *
 * All active turnstiles reside in a global hash table, turnstile_table[].
 * The address of a synchronization object determines its hash index.
 * Each hash chain is protected by its own dispatcher lock, acquired
 * by turnstile_lookup().  This lock protects the hash chain linkage, the
 * contents of all turnstiles on the hash chain, and the waiters bits of
 * every synchronization object in the system that hashes to the same chain.
 * Giving the lock such broad scope simplifies the interactions between
 * the turnstile code and its clients considerably.  The blocking path
 * is rare enough that this has no impact on scalability.  (If it ever
 * does, it's almost surely a second-order effect -- the real problem
 * is that some synchronization object is *very* heavily contended.)
 *
 * Each thread has an attached turnstile in case it needs to block.
 * A thread cannot block on more than one lock at a time, so one
 * turnstile per thread is the most we ever need.  The first thread
 * to block on a lock donates its attached turnstile and adds it to
 * the appropriate hash chain in turnstile_table[].  This becomes the
 * "active turnstile" for the lock.  Each subsequent thread that blocks
 * on the same lock discovers that the lock already has an active
 * turnstile, so it stashes its own turnstile on the active turnstile's
 * freelist.  As threads wake up, the process is reversed.
 *
 * turnstile_block() puts the current thread to sleep on the active
 * turnstile for the desired lock, walks the blocking chain to apply
 * priority inheritance to everyone in its way, and yields the CPU.
 *
 * turnstile_wakeup() waives any priority the owner may have inherited
 * and wakes the specified number of waiting threads.  If the caller is
 * doing direct handoff of ownership (rather than just dropping the lock),
 * the new owner automatically inherits priority from any existing waiters.
 */

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/thread.h>
#include <sys/proc.h>
#include <sys/debug.h>
#include <sys/cpuvar.h>
#include <sys/turnstile.h>
#include <sys/t_lock.h>
#include <sys/disp.h>
#include <sys/sobject.h>
#include <sys/cmn_err.h>
#include <sys/sysmacros.h>
#include <sys/lockstat.h>
#include <sys/lwp_upimutex_impl.h>
#include <sys/schedctl.h>
#include <sys/cpu.h>
#include <sys/sdt.h>
#include <sys/cpupart.h>

extern upib_t upimutextab[UPIMUTEX_TABSIZE];

#define	IS_UPI(sobj)	\
	((uintptr_t)(sobj) - (uintptr_t)upimutextab < sizeof (upimutextab))

/*
 * The turnstile hash table is partitioned into two halves: the lower half
 * is used for upimutextab[] locks, the upper half for everything else.
 * The reason for the distinction is that SOBJ_USER_PI locks present a
 * unique problem: the upimutextab[] lock passed to turnstile_block()
 * cannot be dropped until the calling thread has blocked on its
 * SOBJ_USER_PI lock and willed its priority down the blocking chain.
 * At that point, the caller's t_lockp will be one of the turnstile locks.
 * If mutex_exit() discovers that the upimutextab[] lock has waiters, it
 * must wake them, which forces a lock ordering on us: the turnstile lock
 * for the upimutextab[] lock will be acquired in mutex_vector_exit(),
 * which will eventually call into turnstile_pi_waive(), which will then
 * acquire the caller's thread lock, which in this case is the turnstile
 * lock for the SOBJ_USER_PI lock.  In general, when two turnstile locks
 * must be held at the same time, the lock order must be the address order.
 * Therefore, to prevent deadlock in turnstile_pi_waive(), we must ensure
 * that upimutextab[] locks *always* hash to lower addresses than any
 * other locks.  You think this is cheesy?  Let's see you do better.
 */
#define	TURNSTILE_HASH_SIZE	128		/* must be power of 2 */
#define	TURNSTILE_HASH_MASK	(TURNSTILE_HASH_SIZE - 1)
#define	TURNSTILE_SOBJ_HASH(sobj)	\
	((((ulong_t)sobj >> 2) + ((ulong_t)sobj >> 9)) & TURNSTILE_HASH_MASK)
#define	TURNSTILE_SOBJ_BUCKET(sobj)	\
	((IS_UPI(sobj) ? 0 : TURNSTILE_HASH_SIZE) + TURNSTILE_SOBJ_HASH(sobj))
#define	TURNSTILE_CHAIN(sobj)	turnstile_table[TURNSTILE_SOBJ_BUCKET(sobj)]

typedef struct turnstile_chain {
	turnstile_t	*tc_first;	/* first turnstile on hash chain */
	disp_lock_t	tc_lock;	/* lock for this hash chain */
} turnstile_chain_t;

turnstile_chain_t	turnstile_table[2 * TURNSTILE_HASH_SIZE];

static lock_t	turnstile_loser_lock;

/*
 * Make 'inheritor' inherit priority from this turnstile.
 */
static void
turnstile_pi_inherit(turnstile_t *ts, kthread_t *inheritor, pri_t epri)
{
	ASSERT(THREAD_LOCK_HELD(inheritor));
	ASSERT(DISP_LOCK_HELD(&TURNSTILE_CHAIN(ts->ts_sobj).tc_lock));

	if (epri <= inheritor->t_pri)
		return;

	if (ts->ts_inheritor == NULL) {
		ts->ts_inheritor = inheritor;
		ts->ts_epri = epri;
		disp_lock_enter_high(&inheritor->t_pi_lock);
		ts->ts_prioinv = inheritor->t_prioinv;
		inheritor->t_prioinv = ts;
		disp_lock_exit_high(&inheritor->t_pi_lock);
	} else {
		/*
		 * 'inheritor' is already inheriting from this turnstile,
		 * so just adjust its priority.
		 */
		ASSERT(ts->ts_inheritor == inheritor);
		if (ts->ts_epri < epri)
			ts->ts_epri = epri;
	}

	if (epri > DISP_PRIO(inheritor))
		thread_change_epri(inheritor, epri);
}

/*
 * If turnstile is non-NULL, remove it from inheritor's t_prioinv list.
 * Compute new inherited priority, and return it.
 */
static pri_t
turnstile_pi_tsdelete(turnstile_t *ts, kthread_t *inheritor)
{
	turnstile_t **tspp, *tsp;
	pri_t new_epri = 0;

	disp_lock_enter_high(&inheritor->t_pi_lock);
	tspp = &inheritor->t_prioinv;
	while ((tsp = *tspp) != NULL) {
		if (tsp == ts)
			*tspp = tsp->ts_prioinv;
		else
			new_epri = MAX(new_epri, tsp->ts_epri);
		tspp = &tsp->ts_prioinv;
	}
	disp_lock_exit_high(&inheritor->t_pi_lock);
	return (new_epri);
}

/*
 * Remove turnstile from inheritor's t_prioinv list, compute
 * new priority, and change the inheritor's effective priority if
 * necessary.  Keep in synch with turnstile_pi_recalc().
 */
static void
turnstile_pi_waive(turnstile_t *ts)
{
	kthread_t *inheritor = ts->ts_inheritor;
	pri_t new_epri;

	ASSERT(inheritor == curthread);

	thread_lock_high(inheritor);
	new_epri = turnstile_pi_tsdelete(ts, inheritor);
	if (new_epri != DISP_PRIO(inheritor))
		thread_change_epri(inheritor, new_epri);
	ts->ts_inheritor = NULL;
	if (DISP_MUST_SURRENDER(inheritor))
		cpu_surrender(inheritor);
	thread_unlock_high(inheritor);
}

/*
 * Compute caller's new inherited priority, and change its effective
 * priority if necessary.  Necessary only for SOBJ_USER_PI, because of
 * its interruptibility characteristic.
 */
void
turnstile_pi_recalc(void)
{
	kthread_t *inheritor = curthread;
	pri_t new_epri;

	thread_lock(inheritor);
	new_epri = turnstile_pi_tsdelete(NULL, inheritor);
	if (new_epri != DISP_PRIO(inheritor))
		thread_change_epri(inheritor, new_epri);
	if (DISP_MUST_SURRENDER(inheritor))
		cpu_surrender(inheritor);
	thread_unlock(inheritor);
}

/*
 * Grab the lock protecting the hash chain for sobj
 * and return the active turnstile for sobj, if any.
 */
turnstile_t *
turnstile_lookup(void *sobj)
{
	turnstile_t *ts;
	turnstile_chain_t *tc = &TURNSTILE_CHAIN(sobj);

	disp_lock_enter(&tc->tc_lock);

	for (ts = tc->tc_first; ts != NULL; ts = ts->ts_next)
		if (ts->ts_sobj == sobj)
			break;

	return (ts);
}

/*
 * Drop the lock protecting the hash chain for sobj.
 */
void
turnstile_exit(void *sobj)
{
	disp_lock_exit(&TURNSTILE_CHAIN(sobj).tc_lock);
}

/*
 * When we apply priority inheritance, we must grab the owner's thread lock
 * while already holding the waiter's thread lock.  If both thread locks are
 * turnstile locks, this can lead to deadlock: while we hold L1 and try to
 * grab L2, some unrelated thread may be applying priority inheritance to
 * some other blocking chain, holding L2 and trying to grab L1.  The most
 * obvious solution -- do a lock_try() for the owner lock -- isn't quite
 * sufficient because it can cause livelock: each thread may hold one lock,
 * try to grab the other, fail, bail out, and try again, looping forever.
 * To prevent livelock we must define a winner, i.e. define an arbitrary
 * lock ordering on the turnstile locks.  For simplicity we declare that
 * virtual address order defines lock order, i.e. if L1 < L2, then the
 * correct lock ordering is L1, L2.  Thus the thread that holds L1 and
 * wants L2 should spin until L2 is available, but the thread that holds
 * L2 and can't get L1 on the first try must drop L2 and return failure.
 * Moreover, the losing thread must not reacquire L2 until the winning
 * thread has had a chance to grab it; to ensure this, the losing thread
 * must grab L1 after dropping L2, thus spinning until the winner is done.
 * Complicating matters further, note that the owner's thread lock pointer
 * can change (i.e. be pointed at a different lock) while we're trying to
 * grab it.  If that happens, we must unwind our state and try again.
 *
 * On success, returns 1 with both locks held.
 * On failure, returns 0 with neither lock held.
 */
static int
turnstile_interlock(lock_t *wlp, lock_t *volatile *olpp)
{
	ASSERT(LOCK_HELD(wlp));

	for (;;) {
		volatile lock_t *olp = *olpp;

		/*
		 * If the locks are identical, there's nothing to do.
		 */
		if (olp == wlp)
			return (1);
		if (lock_try((lock_t *)olp)) {
			/*
			 * If 'olp' is still the right lock, return success.
			 * Otherwise, drop 'olp' and try the dance again.
			 */
			if (olp == *olpp)
				return (1);
			lock_clear((lock_t *)olp);
		} else {
			hrtime_t spin_time = 0;
			/*
			 * If we're grabbing the locks out of order, we lose.
			 * Drop the waiter's lock, and then grab and release
			 * the owner's lock to ensure that we won't retry
			 * until the winner is done (as described above).
			 */
			if (olp >= (lock_t *)turnstile_table && olp < wlp) {
				lock_clear(wlp);
				lock_set((lock_t *)olp);
				lock_clear((lock_t *)olp);
				return (0);
			}
			/*
			 * We're grabbing the locks in the right order,
			 * so spin until the owner's lock either becomes
			 * available or spontaneously changes.
			 */
			spin_time =
			    LOCKSTAT_START_TIME(LS_TURNSTILE_INTERLOCK_SPIN);
			while (olp == *olpp && LOCK_HELD(olp)) {
				if (panicstr)
					return (1);
				SMT_PAUSE();
			}
			LOCKSTAT_RECORD_TIME(LS_TURNSTILE_INTERLOCK_SPIN,
			    olp, spin_time);
		}
	}
}

/*
 * Block the current thread on a synchronization object.
 *
 * Turnstiles implement both kernel and user-level priority inheritance.
 * To avoid missed wakeups in the user-level case, lwp_upimutex_lock() calls
 * turnstile_block() holding the appropriate lock in the upimutextab (see
 * the block comment in lwp_upimutex_lock() for details).  The held lock is
 * passed to turnstile_block() as the "mp" parameter, and will be dropped
 * after priority has been willed, but before the thread actually sleeps
 * (this locking behavior leads to some subtle ordering issues; see the
 * block comment on turnstile hashing for details).  This _must_ be the only
 * lock held when calling turnstile_block() with a SOBJ_USER_PI sobj; holding
 * other locks can result in panics due to cycles in the blocking chain.
 *
 * turnstile_block() always succeeds for kernel synchronization objects.
 * For SOBJ_USER_PI locks the possible errors are EINTR for signals, and
 * EDEADLK for cycles in the blocking chain.  A return code of zero indicates
 * *either* that the lock is now held, or that this is a spurious wake-up, or
 * that the lock can never be held due to an ENOTRECOVERABLE error.
 * It is up to lwp_upimutex_lock() to sort this all out.
 */

int
turnstile_block(turnstile_t *ts, int qnum, void *sobj, sobj_ops_t *sobj_ops,
    kmutex_t *mp, lwp_timer_t *lwptp)
{
	kthread_t *owner;
	kthread_t *t = curthread;
	proc_t *p = ttoproc(t);
	klwp_t *lwp = ttolwp(t);
	turnstile_chain_t *tc = &TURNSTILE_CHAIN(sobj);
	int error = 0;
	int loser = 0;

	ASSERT(DISP_LOCK_HELD(&tc->tc_lock));
	ASSERT(mp == NULL || IS_UPI(mp));
	ASSERT((SOBJ_TYPE(sobj_ops) == SOBJ_USER_PI) ^ (mp == NULL));

	thread_lock_high(t);

	if (ts == NULL) {
		/*
		 * This is the first thread to block on this sobj.
		 * Take its attached turnstile and add it to the hash chain.
		 */
		ts = t->t_ts;
		ts->ts_sobj = sobj;
		ts->ts_next = tc->tc_first;
		tc->tc_first = ts;
		ASSERT(ts->ts_waiters == 0);
	} else {
		/*
		 * Another thread has already donated its turnstile
		 * to block on this sobj, so ours isn't needed.
		 * Stash it on the active turnstile's freelist.
		 */
		turnstile_t *myts = t->t_ts;
		myts->ts_free = ts->ts_free;
		ts->ts_free = myts;
		t->t_ts = ts;
		ASSERT(ts->ts_sobj == sobj);
		ASSERT(ts->ts_waiters > 0);
	}

	/*
	 * Put the thread to sleep.
	 */
	ASSERT(t != CPU->cpu_idle_thread);
	ASSERT(CPU_ON_INTR(CPU) == 0);
	ASSERT(t->t_wchan0 == NULL && t->t_wchan == NULL);
	ASSERT(t->t_state == TS_ONPROC);

	if (SOBJ_TYPE(sobj_ops) == SOBJ_USER_PI) {
		curthread->t_flag |= T_WAKEABLE;
	}
	CL_SLEEP(t);		/* assign kernel priority */
	THREAD_SLEEP(t, &tc->tc_lock);
	t->t_wchan = sobj;
	t->t_sobj_ops = sobj_ops;
	DTRACE_SCHED(sleep);

	if (lwp != NULL) {
		lwp->lwp_ru.nvcsw++;
		(void) new_mstate(t, LMS_SLEEP);
		if (SOBJ_TYPE(sobj_ops) == SOBJ_USER_PI) {
			lwp->lwp_asleep = 1;
			lwp->lwp_sysabort = 0;
			/*
			 * make wchan0 non-zero to conform to the rule that
			 * threads blocking for user-level objects have a
			 * non-zero wchan0: this prevents spurious wake-ups
			 * by, for example, /proc.
			 */
			t->t_wchan0 = (caddr_t)1;
		}
	}
	ts->ts_waiters++;
	sleepq_insert(&ts->ts_sleepq[qnum], t);

	if (SOBJ_TYPE(sobj_ops) == SOBJ_MUTEX &&
	    SOBJ_OWNER(sobj_ops, sobj) == NULL)
		panic("turnstile_block(%p): unowned mutex", (void *)ts);

	/*
	 * Follow the blocking chain to its end, willing our priority to
	 * everyone who's in our way.
	 */
	while (t->t_sobj_ops != NULL &&
	    (owner = SOBJ_OWNER(t->t_sobj_ops, t->t_wchan)) != NULL) {
		if (owner == curthread) {
			if (SOBJ_TYPE(sobj_ops) != SOBJ_USER_PI) {
				panic("Deadlock: cycle in blocking chain");
			}
			/*
			 * If the cycle we've encountered ends in mp,
			 * then we know it isn't a 'real' cycle because
			 * we're going to drop mp before we go to sleep.
			 * Moreover, since we've come full circle we know
			 * that we must have willed priority to everyone
			 * in our way.  Therefore, we can break out now.
			 */
			if (t->t_wchan == (void *)mp)
				break;

			if (loser)
				lock_clear(&turnstile_loser_lock);
			/*
			 * For SOBJ_USER_PI, a cycle is an application
			 * deadlock which needs to be communicated
			 * back to the application.
			 */
			thread_unlock_nopreempt(t);
			mutex_exit(mp);
			setrun(curthread);
			swtch(); /* necessary to transition state */
			curthread->t_flag &= ~T_WAKEABLE;
			if (lwptp->lwpt_id != 0)
				(void) lwp_timer_dequeue(lwptp);
			setallwatch();
			lwp->lwp_asleep = 0;
			lwp->lwp_sysabort = 0;
			return (EDEADLK);
		}
		if (!turnstile_interlock(t->t_lockp, &owner->t_lockp)) {
			/*
			 * If we failed to grab the owner's thread lock,
			 * turnstile_interlock() will have dropped t's
			 * thread lock, so at this point we don't even know
			 * that 't' exists anymore.  The simplest solution
			 * is to restart the entire priority inheritance dance
			 * from the beginning of the blocking chain, since
			 * we *do* know that 'curthread' still exists.
			 * Application of priority inheritance is idempotent,
			 * so it's OK that we're doing it more than once.
			 * Note also that since we've dropped our thread lock,
			 * we may already have been woken up; if so, our
			 * t_sobj_ops will be NULL, the loop will terminate,
			 * and the call to swtch() will be a no-op.  Phew.
			 *
			 * There is one further complication: if two (or more)
			 * threads keep trying to grab the turnstile locks out
			 * of order and keep losing the race to another thread,
			 * these "dueling losers" can livelock the system.
			 * Therefore, once we get into this rare situation,
			 * we serialize all the losers.
			 */
			if (loser == 0) {
				loser = 1;
				lock_set(&turnstile_loser_lock);
			}
			t = curthread;
			thread_lock_high(t);
			continue;
		}

		/*
		 * We now have the owner's thread lock.  If we are traversing
		 * from non-SOBJ_USER_PI ops to SOBJ_USER_PI ops, then we know
		 * that we have caught the thread while in the TS_SLEEP state,
		 * but holding mp.  We know that this situation is transient
		 * (mp will be dropped before the holder actually sleeps on
		 * the SOBJ_USER_PI sobj), so we will spin waiting for mp to
		 * be dropped.  Then, as in the turnstile_interlock() failure
		 * case, we will restart the priority inheritance dance.
		 */
		if (SOBJ_TYPE(t->t_sobj_ops) != SOBJ_USER_PI &&
		    owner->t_sobj_ops != NULL &&
		    SOBJ_TYPE(owner->t_sobj_ops) == SOBJ_USER_PI) {
			kmutex_t *upi_lock = (kmutex_t *)t->t_wchan;

			ASSERT(IS_UPI(upi_lock));
			ASSERT(SOBJ_TYPE(t->t_sobj_ops) == SOBJ_MUTEX);

			if (t->t_lockp != owner->t_lockp)
				thread_unlock_high(owner);
			thread_unlock_high(t);
			if (loser)
				lock_clear(&turnstile_loser_lock);

			while (mutex_owner(upi_lock) == owner) {
				SMT_PAUSE();
				continue;
			}

			if (loser)
				lock_set(&turnstile_loser_lock);
			t = curthread;
			thread_lock_high(t);
			continue;
		}

		turnstile_pi_inherit(t->t_ts, owner, DISP_PRIO(t));
		if (t->t_lockp != owner->t_lockp)
			thread_unlock_high(t);
		t = owner;
	}

	if (loser)
		lock_clear(&turnstile_loser_lock);

	/*
	 * Note: 't' and 'curthread' were synonymous before the loop above,
	 * but now they may be different.  ('t' is now the last thread in
	 * the blocking chain.)
	 */
	if (SOBJ_TYPE(sobj_ops) == SOBJ_USER_PI) {
		ushort_t s = curthread->t_oldspl;
		int timedwait = 0;
		uint_t imm_timeout = 0;
		clock_t tim = -1;

		thread_unlock_high(t);
		if (lwptp->lwpt_id != 0) {
			/*
			 * We enqueued a timeout.  If it has already fired,
			 * lwptp->lwpt_imm_timeout has been set with cas,
			 * so fetch it with cas.
			 */
			timedwait = 1;
			imm_timeout =
			    atomic_cas_uint(&lwptp->lwpt_imm_timeout, 0, 0);
		}
		mutex_exit(mp);
		splx(s);

		if (ISSIG(curthread, JUSTLOOKING) ||
		    MUSTRETURN(p, curthread) || imm_timeout)
			setrun(curthread);
		swtch();
		curthread->t_flag &= ~T_WAKEABLE;
		if (timedwait)
			tim = lwp_timer_dequeue(lwptp);
		setallwatch();
		if (ISSIG(curthread, FORREAL) || lwp->lwp_sysabort ||
		    MUSTRETURN(p, curthread))
			error = EINTR;
		else if (imm_timeout || (timedwait && tim == -1))
			error = ETIME;
		lwp->lwp_sysabort = 0;
		lwp->lwp_asleep = 0;
	} else {
		thread_unlock_nopreempt(t);
		swtch();
	}

	return (error);
}

/*
 * Remove thread from specified turnstile sleep queue; retrieve its
 * free turnstile; if it is the last waiter, delete the turnstile
 * from the turnstile chain and if there is an inheritor, delete it
 * from the inheritor's t_prioinv chain.
 */
static void
turnstile_dequeue(kthread_t *t)
{
	turnstile_t *ts = t->t_ts;
	turnstile_chain_t *tc = &TURNSTILE_CHAIN(ts->ts_sobj);
	turnstile_t *tsfree, **tspp;

	ASSERT(DISP_LOCK_HELD(&tc->tc_lock));
	ASSERT(t->t_lockp == &tc->tc_lock);

	if ((tsfree = ts->ts_free) != NULL) {
		ASSERT(ts->ts_waiters > 1);
		ASSERT(tsfree->ts_waiters == 0);
		t->t_ts = tsfree;
		ts->ts_free = tsfree->ts_free;
		tsfree->ts_free = NULL;
	} else {
		/*
		 * The active turnstile's freelist is empty, so this
		 * must be the last waiter.  Remove the turnstile
		 * from the hash chain and leave the now-inactive
		 * turnstile attached to the thread we're waking.
		 * Note that the ts_inheritor for the turnstile
		 * may be NULL.  If one exists, its t_prioinv
		 * chain has to be updated.
		 */
		ASSERT(ts->ts_waiters == 1);
		if (ts->ts_inheritor != NULL) {
			(void) turnstile_pi_tsdelete(ts, ts->ts_inheritor);
			/*
			 * If we ever do a "disinherit" or "unboost", we need
			 * to do it only if "t" is a thread at the head of the
			 * sleep queue.  Since the sleep queue is prioritized,
			 * the disinherit is necessary only if the interrupted
			 * thread is the highest priority thread.
			 * Otherwise, there is a higher priority thread blocked
			 * on the turnstile, whose inheritance cannot be
			 * disinherited.  However, disinheriting is explicitly
			 * not done here, since it would require holding the
			 * inheritor's thread lock (see turnstile_unsleep()).
			 */
			ts->ts_inheritor = NULL;
		}
		tspp = &tc->tc_first;
		while (*tspp != ts)
			tspp = &(*tspp)->ts_next;
		*tspp = ts->ts_next;
		ASSERT(t->t_ts == ts);
	}
	ts->ts_waiters--;
	sleepq_dequeue(t);
	t->t_sobj_ops = NULL;
	t->t_wchan = NULL;
	t->t_wchan0 = NULL;
	ASSERT(t->t_state == TS_SLEEP);
}

/*
 * Wake threads that are blocked in a turnstile.
 */
void
turnstile_wakeup(turnstile_t *ts, int qnum, int nthreads, kthread_t *owner)
{
	turnstile_chain_t *tc = &TURNSTILE_CHAIN(ts->ts_sobj);
	sleepq_t *sqp = &ts->ts_sleepq[qnum];

	ASSERT(DISP_LOCK_HELD(&tc->tc_lock));

	/*
	 * Waive any priority we may have inherited from this turnstile.
	 */
	if (ts->ts_inheritor != NULL) {
		turnstile_pi_waive(ts);
	}
	while (nthreads-- > 0) {
		kthread_t *t = sqp->sq_first;
		ASSERT(t->t_ts == ts);
		ASSERT(ts->ts_waiters > 1 || ts->ts_inheritor == NULL);
		DTRACE_SCHED1(wakeup, kthread_t *, t);
		turnstile_dequeue(t);
		CL_WAKEUP(t); /* previous thread lock, tc_lock, not dropped */
		/*
		 * If the caller did direct handoff of ownership,
		 * make the new owner inherit from this turnstile.
		 */
		if (t == owner) {
			kthread_t *wp = ts->ts_sleepq[TS_WRITER_Q].sq_first;
			kthread_t *rp = ts->ts_sleepq[TS_READER_Q].sq_first;
			pri_t wpri = wp ? DISP_PRIO(wp) : 0;
			pri_t rpri = rp ? DISP_PRIO(rp) : 0;
			turnstile_pi_inherit(ts, t, MAX(wpri, rpri));
			owner = NULL;
		}
		thread_unlock_high(t);		/* drop run queue lock */
	}
	if (owner != NULL)
		panic("turnstile_wakeup: owner %p not woken", (void *)owner);
	disp_lock_exit(&tc->tc_lock);
}

/*
 * Change priority of a thread sleeping in a turnstile.
 */
void
turnstile_change_pri(kthread_t *t, pri_t pri, pri_t *t_prip)
{
	sleepq_t *sqp = t->t_sleepq;

	sleepq_dequeue(t);
	*t_prip = pri;
	sleepq_insert(sqp, t);
}

/*
 * We don't allow spurious wakeups of threads blocked in turnstiles
 * for synch objects whose sobj_ops vector is initialized with the
 * following routine (e.g. kernel synchronization objects).
 * This is vital to the correctness of direct-handoff logic in some
 * synchronization primitives, and it also simplifies the PI logic.
 */
/* ARGSUSED */
void
turnstile_stay_asleep(kthread_t *t)
{
}

/*
 * Wake up a thread blocked in a turnstile.  Used to enable interruptibility
 * of threads blocked on a SOBJ_USER_PI sobj.
 *
 * The implications of this interface are:
 *
 * 1. turnstile_block() may return with an EINTR.
 * 2. When the owner of an sobj releases it, but no turnstile is found (i.e.
 *    no waiters), the (prior) owner must call turnstile_pi_recalc() to
 *    waive any priority inherited from interrupted waiters.
 *
 * When a waiter is interrupted, disinheriting its willed priority from the
 * inheritor would require holding the inheritor's thread lock, while also
 * holding the waiter's thread lock which is a turnstile lock.  If the
 * inheritor's thread lock is not free, and is also a turnstile lock that
 * is out of lock order, the waiter's thread lock would have to be dropped.
 * This leads to complications for the caller of turnstile_unsleep(), since
 * the caller holds the waiter's thread lock.  So, instead of disinheriting
 * on waiter interruption, the owner is required to follow rule 2 above.
 *
 * Avoiding disinherit on waiter interruption seems acceptable because
 * the owner runs at an unnecessarily high priority only while sobj is held,
 * which it would have done in any case, if the waiter had not been
 * interrupted.
 */
void
turnstile_unsleep(kthread_t *t)
{
	turnstile_dequeue(t);
	THREAD_TRANSITION(t);
	CL_SETRUN(t);
}
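
/*
 * Illustrative sketch (not part of the original implementation, compiled
 * out via #if 0): a minimal, hypothetical client of the turnstile
 * interfaces, following the "Typical usage" outline in the Big Theory
 * Statement above.  The foo_* names and the foo_lock_t layout are invented
 * for illustration only; real clients (e.g. mutex_vector_enter() and
 * mutex_vector_exit()) are considerably more involved and use an atomic
 * fast path that encodes the owner and the waiters bit in a single word.
 * Here every owner/waiters update is done under the turnstile chain lock,
 * which trivially satisfies client rules (1) and (2) above.
 */
#if 0
typedef struct foo_lock {
	kthread_t	*foo_owner;	/* NULL when unheld */
	int		foo_waiters;	/* touched only under the chain lock */
} foo_lock_t;

static kthread_t *
foo_lock_owner(void *lp)
{
	return (((foo_lock_t *)lp)->foo_owner);
}

/* Same shape as the kernel mutex sobj_ops: no spurious wakeups, PI-aware. */
static sobj_ops_t foo_sobj_ops = {
	SOBJ_MUTEX, foo_lock_owner, turnstile_stay_asleep, turnstile_change_pri
};

static void
foo_enter(foo_lock_t *lp)
{
	turnstile_t *ts;

	for (;;) {
		ts = turnstile_lookup(lp);	/* returns with chain lock */
		if (lp->foo_owner == NULL) {
			lp->foo_owner = curthread;
			turnstile_exit(lp);	/* drop the chain lock */
			return;
		}
		lp->foo_waiters = 1;		/* legal: chain lock is held */
		(void) turnstile_block(ts, TS_WRITER_Q, lp, &foo_sobj_ops,
		    NULL, NULL);		/* drops the chain lock */
	}
}

static void
foo_exit(foo_lock_t *lp)
{
	turnstile_t *ts;

	ts = turnstile_lookup(lp);		/* returns with chain lock */
	lp->foo_owner = NULL;			/* drop the lock, no handoff */
	if (ts == NULL) {
		turnstile_exit(lp);		/* no waiters */
	} else {
		if (ts->ts_waiters == 1)
			lp->foo_waiters = 0;	/* waking the last waiter */
		turnstile_wakeup(ts, TS_WRITER_Q, 1, NULL); /* drops lock */
	}
}
#endif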