1 /* 2 * CDDL HEADER START 3 * 4 * The contents of this file are subject to the terms of the 5 * Common Development and Distribution License (the "License"). 6 * You may not use this file except in compliance with the License. 7 * 8 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE 9 * or http://www.opensolaris.org/os/licensing. 10 * See the License for the specific language governing permissions 11 * and limitations under the License. 12 * 13 * When distributing Covered Code, include this CDDL HEADER in each 14 * file and include the License file at usr/src/OPENSOLARIS.LICENSE. 15 * If applicable, add the following below this CDDL HEADER, with the 16 * fields enclosed by brackets "[]" replaced with your own identifying 17 * information: Portions Copyright [yyyy] [name of copyright owner] 18 * 19 * CDDL HEADER END 20 */ 21 /* 22 * Copyright 2008 Sun Microsystems, Inc. All rights reserved. 23 * Use is subject to license terms. 24 */ 25 26 #pragma ident "%Z%%M% %I% %E% SMI" 27 28 /* 29 * The Cyclic Subsystem 30 * -------------------- 31 * 32 * Prehistory 33 * 34 * Historically, most computer architectures have specified interval-based 35 * timer parts (e.g. SPARCstation's counter/timer; Intel's i8254). While 36 * these parts deal in relative (i.e. not absolute) time values, they are 37 * typically used by the operating system to implement the abstraction of 38 * absolute time. As a result, these parts cannot typically be reprogrammed 39 * without introducing error in the system's notion of time. 40 * 41 * Starting in about 1994, chip architectures began specifying high resolution 42 * timestamp registers. As of this writing (1999), all major chip families 43 * (UltraSPARC, PentiumPro, MIPS, PowerPC, Alpha) have high resolution 44 * timestamp registers, and two (UltraSPARC and MIPS) have added the capacity 45 * to interrupt based on timestamp values. These timestamp-compare registers 46 * present a time-based interrupt source which can be reprogrammed arbitrarily 47 * often without introducing error. Given the low cost of implementing such a 48 * timestamp-compare register (and the tangible benefit of eliminating 49 * discrete timer parts), it is reasonable to expect that future chip 50 * architectures will adopt this feature. 51 * 52 * The cyclic subsystem has been designed to take advantage of chip 53 * architectures with the capacity to interrupt based on absolute, high 54 * resolution values of time. 55 * 56 * Subsystem Overview 57 * 58 * The cyclic subsystem is a low-level kernel subsystem designed to provide 59 * arbitrarily high resolution, per-CPU interval timers (to avoid colliding 60 * with existing terms, we dub such an interval timer a "cyclic"). Cyclics 61 * can be specified to fire at high, lock or low interrupt level, and may be 62 * optionally bound to a CPU or a CPU partition. A cyclic's CPU or CPU 63 * partition binding may be changed dynamically; the cyclic will be "juggled" 64 * to a CPU which satisfies the new binding. Alternatively, a cyclic may 65 * be specified to be "omnipresent", denoting firing on all online CPUs. 66 * 67 * Cyclic Subsystem Interface Overview 68 * ----------------------------------- 69 * 70 * The cyclic subsystem has interfaces with the kernel at-large, with other 71 * kernel subsystems (e.g. the processor management subsystem, the checkpoint 72 * resume subsystem) and with the platform (the cyclic backend). Each 73 * of these interfaces is given a brief synopsis here, and is described 74 * in full above the interface's implementation. 75 * 76 * The following diagram displays the cyclic subsystem's interfaces to 77 * other kernel components. The arrows denote a "calls" relationship, with 78 * the large arrow indicating the cyclic subsystem's consumer interface. 79 * Each arrow is labeled with the section in which the corresponding 80 * interface is described. 81 * 82 * Kernel at-large consumers 83 * -----------++------------ 84 * || 85 * || 86 * _||_ 87 * \ / 88 * \/ 89 * +---------------------+ 90 * | | 91 * | Cyclic subsystem |<----------- Other kernel subsystems 92 * | | 93 * +---------------------+ 94 * ^ | 95 * | | 96 * | | 97 * | v 98 * +---------------------+ 99 * | | 100 * | Cyclic backend | 101 * | (platform specific) | 102 * | | 103 * +---------------------+ 104 * 105 * 106 * Kernel At-Large Interfaces 107 * 108 * cyclic_add() <-- Creates a cyclic 109 * cyclic_add_omni() <-- Creates an omnipresent cyclic 110 * cyclic_remove() <-- Removes a cyclic 111 * cyclic_bind() <-- Change a cyclic's CPU or partition binding 112 * 113 * Inter-subsystem Interfaces 114 * 115 * cyclic_juggle() <-- Juggles cyclics away from a CPU 116 * cyclic_offline() <-- Offlines cyclic operation on a CPU 117 * cyclic_online() <-- Reenables operation on an offlined CPU 118 * cyclic_move_in() <-- Notifies subsystem of change in CPU partition 119 * cyclic_move_out() <-- Notifies subsystem of change in CPU partition 120 * cyclic_suspend() <-- Suspends the cyclic subsystem on all CPUs 121 * cyclic_resume() <-- Resumes the cyclic subsystem on all CPUs 122 * 123 * Backend Interfaces 124 * 125 * cyclic_init() <-- Initializes the cyclic subsystem 126 * cyclic_fire() <-- CY_HIGH_LEVEL interrupt entry point 127 * cyclic_softint() <-- CY_LOCK/LOW_LEVEL soft interrupt entry point 128 * 129 * The backend-supplied interfaces (through the cyc_backend structure) are 130 * documented in detail in <sys/cyclic_impl.h> 131 * 132 * 133 * Cyclic Subsystem Implementation Overview 134 * ---------------------------------------- 135 * 136 * The cyclic subsystem is designed to minimize interference between cyclics 137 * on different CPUs. Thus, all of the cyclic subsystem's data structures 138 * hang off of a per-CPU structure, cyc_cpu. 139 * 140 * Each cyc_cpu has a power-of-two sized array of cyclic structures (the 141 * cyp_cyclics member of the cyc_cpu structure). If cyclic_add() is called 142 * and there does not exist a free slot in the cyp_cyclics array, the size of 143 * the array will be doubled. The array will never shrink. Cyclics are 144 * referred to by their index in the cyp_cyclics array, which is of type 145 * cyc_index_t. 146 * 147 * The cyclics are kept sorted by expiration time in the cyc_cpu's heap. The 148 * heap is keyed by cyclic expiration time, with parents expiring earlier 149 * than their children. 150 * 151 * Heap Management 152 * 153 * The heap is managed primarily by cyclic_fire(). Upon entry, cyclic_fire() 154 * compares the root cyclic's expiration time to the current time. If the 155 * expiration time is in the past, cyclic_expire() is called on the root 156 * cyclic. Upon return from cyclic_expire(), the cyclic's new expiration time 157 * is derived by adding its interval to its old expiration time, and a 158 * downheap operation is performed. After the downheap, cyclic_fire() 159 * examines the (potentially changed) root cyclic, repeating the 160 * cyclic_expire()/add interval/cyclic_downheap() sequence until the root 161 * cyclic has an expiration time in the future. This expiration time 162 * (guaranteed to be the earliest in the heap) is then communicated to the 163 * backend via cyb_reprogram. Optimal backends will next call cyclic_fire() 164 * shortly after the root cyclic's expiration time. 165 * 166 * To allow efficient, deterministic downheap operations, we implement the 167 * heap as an array (the cyp_heap member of the cyc_cpu structure), with each 168 * element containing an index into the CPU's cyp_cyclics array. 169 * 170 * The heap is laid out in the array according to the following: 171 * 172 * 1. The root of the heap is always in the 0th element of the heap array 173 * 2. The left and right children of the nth element are element 174 * (((n + 1) << 1) - 1) and element ((n + 1) << 1), respectively. 175 * 176 * This layout is standard (see, e.g., Cormen's "Algorithms"); the proof 177 * that these constraints correctly lay out a heap (or indeed, any binary 178 * tree) is trivial and left to the reader. 179 * 180 * To see the heap by example, assume our cyclics array has the following 181 * members (at time t): 182 * 183 * cy_handler cy_level cy_expire 184 * --------------------------------------------- 185 * [ 0] clock() LOCK t+10000000 186 * [ 1] deadman() HIGH t+1000000000 187 * [ 2] clock_highres_fire() LOW t+100 188 * [ 3] clock_highres_fire() LOW t+1000 189 * [ 4] clock_highres_fire() LOW t+500 190 * [ 5] (free) -- -- 191 * [ 6] (free) -- -- 192 * [ 7] (free) -- -- 193 * 194 * The heap array could be: 195 * 196 * [0] [1] [2] [3] [4] [5] [6] [7] 197 * +-----+-----+-----+-----+-----+-----+-----+-----+ 198 * | | | | | | | | | 199 * | 2 | 3 | 4 | 0 | 1 | x | x | x | 200 * | | | | | | | | | 201 * +-----+-----+-----+-----+-----+-----+-----+-----+ 202 * 203 * Graphically, this array corresponds to the following (excuse the ASCII art): 204 * 205 * 2 206 * | 207 * +------------------+------------------+ 208 * 3 4 209 * | 210 * +---------+--------+ 211 * 0 1 212 * 213 * Note that the heap is laid out by layer: all nodes at a given depth are 214 * stored in consecutive elements of the array. Moreover, layers of 215 * consecutive depths are in adjacent element ranges. This property 216 * guarantees high locality of reference during downheap operations. 217 * Specifically, we are guaranteed that we can downheap to a depth of 218 * 219 * lg (cache_line_size / sizeof (cyc_index_t)) 220 * 221 * nodes with at most one cache miss. On UltraSPARC (64 byte e-cache line 222 * size), this corresponds to a depth of four nodes. Thus, if there are 223 * fewer than sixteen cyclics in the heap, downheaps on UltraSPARC miss at 224 * most once in the e-cache. 225 * 226 * Downheaps are required to compare siblings as they proceed down the 227 * heap. For downheaps proceeding beyond the one-cache-miss depth, every 228 * access to a left child could potentially miss in the cache. However, 229 * if we assume 230 * 231 * (cache_line_size / sizeof (cyc_index_t)) > 2, 232 * 233 * then all siblings are guaranteed to be on the same cache line. Thus, the 234 * miss on the left child will guarantee a hit on the right child; downheaps 235 * will incur at most one cache miss per layer beyond the one-cache-miss 236 * depth. The total number of cache misses for heap management during a 237 * downheap operation is thus bounded by 238 * 239 * lg (n) - lg (cache_line_size / sizeof (cyc_index_t)) 240 * 241 * Traditional pointer-based heaps are implemented without regard to 242 * locality. Downheaps can thus incur two cache misses per layer (one for 243 * each child), but at most one cache miss at the root. This yields a bound 244 * of 245 * 246 * 2 * lg (n) - 1 247 * 248 * on the total cache misses. 249 * 250 * This difference may seem theoretically trivial (the difference is, after 251 * all, constant), but can become substantial in practice -- especially for 252 * caches with very large cache lines and high miss penalties (e.g. TLBs). 253 * 254 * Heaps must always be full, balanced trees. Heap management must therefore 255 * track the next point-of-insertion into the heap. In pointer-based heaps, 256 * recomputing this point takes O(lg (n)). Given the layout of the 257 * array-based implementation, however, the next point-of-insertion is 258 * always: 259 * 260 * heap[number_of_elements] 261 * 262 * We exploit this property by implementing the free-list in the usused 263 * heap elements. Heap insertion, therefore, consists only of filling in 264 * the cyclic at cyp_cyclics[cyp_heap[number_of_elements]], incrementing 265 * the number of elements, and performing an upheap. Heap deletion consists 266 * of decrementing the number of elements, swapping the to-be-deleted element 267 * with the element at cyp_heap[number_of_elements], and downheaping. 268 * 269 * Filling in more details in our earlier example: 270 * 271 * +--- free list head 272 * | 273 * V 274 * 275 * [0] [1] [2] [3] [4] [5] [6] [7] 276 * +-----+-----+-----+-----+-----+-----+-----+-----+ 277 * | | | | | | | | | 278 * | 2 | 3 | 4 | 0 | 1 | 5 | 6 | 7 | 279 * | | | | | | | | | 280 * +-----+-----+-----+-----+-----+-----+-----+-----+ 281 * 282 * To insert into this heap, we would just need to fill in the cyclic at 283 * cyp_cyclics[5], bump the number of elements (from 5 to 6) and perform 284 * an upheap. 285 * 286 * If we wanted to remove, say, cyp_cyclics[3], we would first scan for it 287 * in the cyp_heap, and discover it at cyp_heap[1]. We would then decrement 288 * the number of elements (from 5 to 4), swap cyp_heap[1] with cyp_heap[4], 289 * and perform a downheap from cyp_heap[1]. The linear scan is required 290 * because the cyclic does not keep a backpointer into the heap. This makes 291 * heap manipulation (e.g. downheaps) faster at the expense of removal 292 * operations. 293 * 294 * Expiry processing 295 * 296 * As alluded to above, cyclic_expire() is called by cyclic_fire() at 297 * CY_HIGH_LEVEL to expire a cyclic. Cyclic subsystem consumers are 298 * guaranteed that for an arbitrary time t in the future, their cyclic 299 * handler will have been called (t - cyt_when) / cyt_interval times. Thus, 300 * there must be a one-to-one mapping between a cyclic's expiration at 301 * CY_HIGH_LEVEL and its execution at the desired level (either CY_HIGH_LEVEL, 302 * CY_LOCK_LEVEL or CY_LOW_LEVEL). 303 * 304 * For CY_HIGH_LEVEL cyclics, this is trivial; cyclic_expire() simply needs 305 * to call the handler. 306 * 307 * For CY_LOCK_LEVEL and CY_LOW_LEVEL cyclics, however, there exists a 308 * potential disconnect: if the CPU is at an interrupt level less than 309 * CY_HIGH_LEVEL but greater than the level of a cyclic for a period of 310 * time longer than twice the cyclic's interval, the cyclic will be expired 311 * twice before it can be handled. 312 * 313 * To maintain the one-to-one mapping, we track the difference between the 314 * number of times a cyclic has been expired and the number of times it's 315 * been handled in a "pending count" (the cy_pend field of the cyclic 316 * structure). cyclic_expire() thus increments the cy_pend count for the 317 * expired cyclic and posts a soft interrupt at the desired level. In the 318 * cyclic subsystem's soft interrupt handler, cyclic_softint(), we repeatedly 319 * call the cyclic handler and decrement cy_pend until we have decremented 320 * cy_pend to zero. 321 * 322 * The Producer/Consumer Buffer 323 * 324 * If we wish to avoid a linear scan of the cyclics array at soft interrupt 325 * level, cyclic_softint() must be able to quickly determine which cyclics 326 * have a non-zero cy_pend count. We thus introduce a per-soft interrupt 327 * level producer/consumer buffer shared with CY_HIGH_LEVEL. These buffers 328 * are encapsulated in the cyc_pcbuffer structure, and, like cyp_heap, are 329 * implemented as cyc_index_t arrays (the cypc_buf member of the cyc_pcbuffer 330 * structure). 331 * 332 * The producer (cyclic_expire() running at CY_HIGH_LEVEL) enqueues a cyclic 333 * by storing the cyclic's index to cypc_buf[cypc_prodndx] and incrementing 334 * cypc_prodndx. The consumer (cyclic_softint() running at either 335 * CY_LOCK_LEVEL or CY_LOW_LEVEL) dequeues a cyclic by loading from 336 * cypc_buf[cypc_consndx] and bumping cypc_consndx. The buffer is empty when 337 * cypc_prodndx == cypc_consndx. 338 * 339 * To bound the size of the producer/consumer buffer, cyclic_expire() only 340 * enqueues a cyclic if its cy_pend was zero (if the cyclic's cy_pend is 341 * non-zero, cyclic_expire() only bumps cy_pend). Symmetrically, 342 * cyclic_softint() only consumes a cyclic after it has decremented the 343 * cy_pend count to zero. 344 * 345 * Returning to our example, here is what the CY_LOW_LEVEL producer/consumer 346 * buffer might look like: 347 * 348 * cypc_consndx ---+ +--- cypc_prodndx 349 * | | 350 * V V 351 * 352 * [0] [1] [2] [3] [4] [5] [6] [7] 353 * +-----+-----+-----+-----+-----+-----+-----+-----+ 354 * | | | | | | | | | 355 * | x | x | 3 | 2 | 4 | x | x | x | <== cypc_buf 356 * | | | . | . | . | | | | 357 * +-----+-----+- | -+- | -+- | -+-----+-----+-----+ 358 * | | | 359 * | | | cy_pend cy_handler 360 * | | | ------------------------- 361 * | | | [ 0] 1 clock() 362 * | | | [ 1] 0 deadman() 363 * | +---- | -------> [ 2] 3 clock_highres_fire() 364 * +---------- | -------> [ 3] 1 clock_highres_fire() 365 * +--------> [ 4] 1 clock_highres_fire() 366 * [ 5] - (free) 367 * [ 6] - (free) 368 * [ 7] - (free) 369 * 370 * In particular, note that clock()'s cy_pend is 1 but that it is _not_ in 371 * this producer/consumer buffer; it would be enqueued in the CY_LOCK_LEVEL 372 * producer/consumer buffer. 373 * 374 * Locking 375 * 376 * Traditionally, access to per-CPU data structures shared between 377 * interrupt levels is serialized by manipulating programmable interrupt 378 * level: readers and writers are required to raise their interrupt level 379 * to that of the highest level writer. 380 * 381 * For the producer/consumer buffers (shared between cyclic_fire()/ 382 * cyclic_expire() executing at CY_HIGH_LEVEL and cyclic_softint() executing 383 * at one of CY_LOCK_LEVEL or CY_LOW_LEVEL), forcing cyclic_softint() to raise 384 * programmable interrupt level is undesirable: aside from the additional 385 * latency incurred by manipulating interrupt level in the hot cy_pend 386 * processing path, this would create the potential for soft level cy_pend 387 * processing to delay CY_HIGH_LEVEL firing and expiry processing. 388 * CY_LOCK/LOW_LEVEL cyclics could thereby induce jitter in CY_HIGH_LEVEL 389 * cyclics. 390 * 391 * To minimize jitter, then, we would like the cyclic_fire()/cyclic_expire() 392 * and cyclic_softint() code paths to be lock-free. 393 * 394 * For cyclic_fire()/cyclic_expire(), lock-free execution is straightforward: 395 * because these routines execute at a higher interrupt level than 396 * cyclic_softint(), their actions on the producer/consumer buffer appear 397 * atomic. In particular, the increment of cy_pend appears to occur 398 * atomically with the increment of cypc_prodndx. 399 * 400 * For cyclic_softint(), however, lock-free execution requires more delicacy. 401 * When cyclic_softint() discovers a cyclic in the producer/consumer buffer, 402 * it calls the cyclic's handler and attempts to atomically decrement the 403 * cy_pend count with a compare&swap operation. 404 * 405 * If the compare&swap operation succeeds, cyclic_softint() behaves 406 * conditionally based on the value it atomically wrote to cy_pend: 407 * 408 * - If the cy_pend was decremented to 0, the cyclic has been consumed; 409 * cyclic_softint() increments the cypc_consndx and checks for more 410 * enqueued work. 411 * 412 * - If the count was decremented to a non-zero value, there is more work 413 * to be done on the cyclic; cyclic_softint() calls the cyclic handler 414 * and repeats the atomic decrement process. 415 * 416 * If the compare&swap operation fails, cyclic_softint() knows that 417 * cyclic_expire() has intervened and bumped the cy_pend count (resizes 418 * and removals complicate this, however -- see the sections on their 419 * operation, below). cyclic_softint() thus reloads cy_pend, and re-attempts 420 * the atomic decrement. 421 * 422 * Recall that we bound the size of the producer/consumer buffer by 423 * having cyclic_expire() only enqueue the specified cyclic if its 424 * cy_pend count is zero; this assures that each cyclic is enqueued at 425 * most once. This leads to a critical constraint on cyclic_softint(), 426 * however: after the compare&swap operation which successfully decrements 427 * cy_pend to zero, cyclic_softint() must _not_ re-examine the consumed 428 * cyclic. In part to obey this constraint, cyclic_softint() calls the 429 * cyclic handler before decrementing cy_pend. 430 * 431 * Resizing 432 * 433 * All of the discussion thus far has assumed a static number of cyclics. 434 * Obviously, static limitations are not practical; we need the capacity 435 * to resize our data structures dynamically. 436 * 437 * We resize our data structures lazily, and only on a per-CPU basis. 438 * The size of the data structures always doubles and never shrinks. We 439 * serialize adds (and thus resizes) on cpu_lock; we never need to deal 440 * with concurrent resizes. Resizes should be rare; they may induce jitter 441 * on the CPU being resized, but should not affect cyclic operation on other 442 * CPUs. Pending cyclics may not be dropped during a resize operation. 443 * 444 * Three key cyc_cpu data structures need to be resized: the cyclics array, 445 * the heap array and the producer/consumer buffers. Resizing the first two 446 * is relatively straightforward: 447 * 448 * 1. The new, larger arrays are allocated in cyclic_expand() (called 449 * from cyclic_add()). 450 * 2. cyclic_expand() cross calls cyclic_expand_xcall() on the CPU 451 * undergoing the resize. 452 * 3. cyclic_expand_xcall() raises interrupt level to CY_HIGH_LEVEL 453 * 4. The contents of the old arrays are copied into the new arrays. 454 * 5. The old cyclics array is bzero()'d 455 * 6. The pointers are updated. 456 * 457 * The producer/consumer buffer is dicier: cyclic_expand_xcall() may have 458 * interrupted cyclic_softint() in the middle of consumption. To resize the 459 * producer/consumer buffer, we implement up to two buffers per soft interrupt 460 * level: a hard buffer (the buffer being produced into by cyclic_expire()) 461 * and a soft buffer (the buffer from which cyclic_softint() is consuming). 462 * During normal operation, the hard buffer and soft buffer point to the 463 * same underlying producer/consumer buffer. 464 * 465 * During a resize, however, cyclic_expand_xcall() changes the hard buffer 466 * to point to the new, larger producer/consumer buffer; all future 467 * cyclic_expire()'s will produce into the new buffer. cyclic_expand_xcall() 468 * then posts a CY_LOCK_LEVEL soft interrupt, landing in cyclic_softint(). 469 * 470 * As under normal operation, cyclic_softint() will consume cyclics from 471 * its soft buffer. After the soft buffer is drained, however, 472 * cyclic_softint() will see that the hard buffer has changed. At that time, 473 * cyclic_softint() will change its soft buffer to point to the hard buffer, 474 * and repeat the producer/consumer buffer draining procedure. 475 * 476 * After the new buffer is drained, cyclic_softint() will determine if both 477 * soft levels have seen their new producer/consumer buffer. If both have, 478 * cyclic_softint() will post on the semaphore cyp_modify_wait. If not, a 479 * soft interrupt will be generated for the remaining level. 480 * 481 * cyclic_expand() blocks on the cyp_modify_wait semaphore (a semaphore is 482 * used instead of a condition variable because of the race between the 483 * sema_p() in cyclic_expand() and the sema_v() in cyclic_softint()). This 484 * allows cyclic_expand() to know when the resize operation is complete; 485 * all of the old buffers (the heap, the cyclics array and the producer/ 486 * consumer buffers) can be freed. 487 * 488 * A final caveat on resizing: we described step (5) in the 489 * cyclic_expand_xcall() procedure without providing any motivation. This 490 * step addresses the problem of a cyclic_softint() attempting to decrement 491 * a cy_pend count while interrupted by a cyclic_expand_xcall(). Because 492 * cyclic_softint() has already called the handler by the time cy_pend is 493 * decremented, we want to assure that it doesn't decrement a cy_pend 494 * count in the old cyclics array. By zeroing the old cyclics array in 495 * cyclic_expand_xcall(), we are zeroing out every cy_pend count; when 496 * cyclic_softint() attempts to compare&swap on the cy_pend count, it will 497 * fail and recognize that the count has been zeroed. cyclic_softint() will 498 * update its stale copy of the cyp_cyclics pointer, re-read the cy_pend 499 * count from the new cyclics array, and re-attempt the compare&swap. 500 * 501 * Removals 502 * 503 * Cyclic removals should be rare. To simplify the implementation (and to 504 * allow optimization for the cyclic_fire()/cyclic_expire()/cyclic_softint() 505 * path), we force removals and adds to serialize on cpu_lock. 506 * 507 * Cyclic removal is complicated by a guarantee made to the consumer of 508 * the cyclic subsystem: after cyclic_remove() returns, the cyclic handler 509 * has returned and will never again be called. 510 * 511 * Here is the procedure for cyclic removal: 512 * 513 * 1. cyclic_remove() calls cyclic_remove_xcall() on the CPU undergoing 514 * the removal. 515 * 2. cyclic_remove_xcall() raises interrupt level to CY_HIGH_LEVEL 516 * 3. The current expiration time for the removed cyclic is recorded. 517 * 4. If the cy_pend count on the removed cyclic is non-zero, it 518 * is copied into cyp_rpend and subsequently zeroed. 519 * 5. The cyclic is removed from the heap 520 * 6. If the root of the heap has changed, the backend is reprogrammed. 521 * 7. If the cy_pend count was non-zero cyclic_remove() blocks on the 522 * cyp_modify_wait semaphore. 523 * 524 * The motivation for step (3) is explained in "Juggling", below. 525 * 526 * The cy_pend count is decremented in cyclic_softint() after the cyclic 527 * handler returns. Thus, if we find a cy_pend count of zero in step 528 * (4), we know that cyclic_remove() doesn't need to block. 529 * 530 * If the cy_pend count is non-zero, however, we must block in cyclic_remove() 531 * until cyclic_softint() has finished calling the cyclic handler. To let 532 * cyclic_softint() know that this cyclic has been removed, we zero the 533 * cy_pend count. This will cause cyclic_softint()'s compare&swap to fail. 534 * When cyclic_softint() sees the zero cy_pend count, it knows that it's been 535 * caught during a resize (see "Resizing", above) or that the cyclic has been 536 * removed. In the latter case, it calls cyclic_remove_pend() to call the 537 * cyclic handler cyp_rpend - 1 times, and posts on cyp_modify_wait. 538 * 539 * Juggling 540 * 541 * At first glance, cyclic juggling seems to be a difficult problem. The 542 * subsystem must guarantee that a cyclic doesn't execute simultaneously on 543 * different CPUs, while also assuring that a cyclic fires exactly once 544 * per interval. We solve this problem by leveraging a property of the 545 * platform: gethrtime() is required to increase in lock-step across 546 * multiple CPUs. Therefore, to juggle a cyclic, we remove it from its 547 * CPU, recording its expiration time in the remove cross call (step (3) 548 * in "Removing", above). We then add the cyclic to the new CPU, explicitly 549 * setting its expiration time to the time recorded in the removal. This 550 * leverages the existing cyclic expiry processing, which will compensate 551 * for any time lost while juggling. 552 * 553 */ 554 #include <sys/cyclic_impl.h> 555 #include <sys/sysmacros.h> 556 #include <sys/systm.h> 557 #include <sys/atomic.h> 558 #include <sys/kmem.h> 559 #include <sys/cmn_err.h> 560 #include <sys/ddi.h> 561 #include <sys/sdt.h> 562 563 #ifdef CYCLIC_TRACE 564 565 /* 566 * cyc_trace_enabled is for the benefit of kernel debuggers. 567 */ 568 int cyc_trace_enabled = 1; 569 static cyc_tracebuf_t cyc_ptrace; 570 static cyc_coverage_t cyc_coverage[CY_NCOVERAGE]; 571 572 /* 573 * Seen this anywhere? 574 */ 575 static uint_t 576 cyclic_coverage_hash(char *p) 577 { 578 unsigned int g; 579 uint_t hval; 580 581 hval = 0; 582 while (*p) { 583 hval = (hval << 4) + *p++; 584 if ((g = (hval & 0xf0000000)) != 0) 585 hval ^= g >> 24; 586 hval &= ~g; 587 } 588 return (hval); 589 } 590 591 static void 592 cyclic_coverage(char *why, int level, uint64_t arg0, uint64_t arg1) 593 { 594 uint_t ndx, orig; 595 596 for (ndx = orig = cyclic_coverage_hash(why) % CY_NCOVERAGE; ; ) { 597 if (cyc_coverage[ndx].cyv_why == why) 598 break; 599 600 if (cyc_coverage[ndx].cyv_why != NULL || 601 casptr(&cyc_coverage[ndx].cyv_why, NULL, why) != NULL) { 602 603 if (++ndx == CY_NCOVERAGE) 604 ndx = 0; 605 606 if (ndx == orig) 607 panic("too many cyclic coverage points"); 608 continue; 609 } 610 611 /* 612 * If we're here, we have successfully swung our guy into 613 * the position at "ndx". 614 */ 615 break; 616 } 617 618 if (level == CY_PASSIVE_LEVEL) 619 cyc_coverage[ndx].cyv_passive_count++; 620 else 621 cyc_coverage[ndx].cyv_count[level]++; 622 623 cyc_coverage[ndx].cyv_arg0 = arg0; 624 cyc_coverage[ndx].cyv_arg1 = arg1; 625 } 626 627 #define CYC_TRACE(cpu, level, why, arg0, arg1) \ 628 CYC_TRACE_IMPL(&cpu->cyp_trace[level], level, why, arg0, arg1) 629 630 #define CYC_PTRACE(why, arg0, arg1) \ 631 CYC_TRACE_IMPL(&cyc_ptrace, CY_PASSIVE_LEVEL, why, arg0, arg1) 632 633 #define CYC_TRACE_IMPL(buf, level, why, a0, a1) { \ 634 if (panicstr == NULL) { \ 635 int _ndx = (buf)->cyt_ndx; \ 636 cyc_tracerec_t *_rec = &(buf)->cyt_buf[_ndx]; \ 637 (buf)->cyt_ndx = (++_ndx == CY_NTRACEREC) ? 0 : _ndx; \ 638 _rec->cyt_tstamp = gethrtime_unscaled(); \ 639 _rec->cyt_why = (why); \ 640 _rec->cyt_arg0 = (uint64_t)(uintptr_t)(a0); \ 641 _rec->cyt_arg1 = (uint64_t)(uintptr_t)(a1); \ 642 cyclic_coverage(why, level, \ 643 (uint64_t)(uintptr_t)(a0), (uint64_t)(uintptr_t)(a1)); \ 644 } \ 645 } 646 647 #else 648 649 static int cyc_trace_enabled = 0; 650 651 #define CYC_TRACE(cpu, level, why, arg0, arg1) 652 #define CYC_PTRACE(why, arg0, arg1) 653 654 #endif 655 656 #define CYC_TRACE0(cpu, level, why) CYC_TRACE(cpu, level, why, 0, 0) 657 #define CYC_TRACE1(cpu, level, why, arg0) CYC_TRACE(cpu, level, why, arg0, 0) 658 659 #define CYC_PTRACE0(why) CYC_PTRACE(why, 0, 0) 660 #define CYC_PTRACE1(why, arg0) CYC_PTRACE(why, arg0, 0) 661 662 static kmem_cache_t *cyclic_id_cache; 663 static cyc_id_t *cyclic_id_head; 664 static hrtime_t cyclic_resolution; 665 static cyc_backend_t cyclic_backend; 666 667 /* 668 * Returns 1 if the upheap propagated to the root, 0 if it did not. This 669 * allows the caller to reprogram the backend only when the root has been 670 * modified. 671 */ 672 static int 673 cyclic_upheap(cyc_cpu_t *cpu, cyc_index_t ndx) 674 { 675 cyclic_t *cyclics; 676 cyc_index_t *heap; 677 cyc_index_t heap_parent, heap_current = ndx; 678 cyc_index_t parent, current; 679 680 if (heap_current == 0) 681 return (1); 682 683 heap = cpu->cyp_heap; 684 cyclics = cpu->cyp_cyclics; 685 heap_parent = CYC_HEAP_PARENT(heap_current); 686 687 for (;;) { 688 current = heap[heap_current]; 689 parent = heap[heap_parent]; 690 691 /* 692 * We have an expiration time later than our parent; we're 693 * done. 694 */ 695 if (cyclics[current].cy_expire >= cyclics[parent].cy_expire) 696 return (0); 697 698 /* 699 * We need to swap with our parent, and continue up the heap. 700 */ 701 heap[heap_parent] = current; 702 heap[heap_current] = parent; 703 704 /* 705 * If we just reached the root, we're done. 706 */ 707 if (heap_parent == 0) 708 return (1); 709 710 heap_current = heap_parent; 711 heap_parent = CYC_HEAP_PARENT(heap_current); 712 } 713 } 714 715 static void 716 cyclic_downheap(cyc_cpu_t *cpu, cyc_index_t ndx) 717 { 718 cyclic_t *cyclics = cpu->cyp_cyclics; 719 cyc_index_t *heap = cpu->cyp_heap; 720 721 cyc_index_t heap_left, heap_right, heap_me = ndx; 722 cyc_index_t left, right, me; 723 cyc_index_t nelems = cpu->cyp_nelems; 724 725 for (;;) { 726 /* 727 * If we don't have a left child (i.e., we're a leaf), we're 728 * done. 729 */ 730 if ((heap_left = CYC_HEAP_LEFT(heap_me)) >= nelems) 731 return; 732 733 left = heap[heap_left]; 734 me = heap[heap_me]; 735 736 heap_right = CYC_HEAP_RIGHT(heap_me); 737 738 /* 739 * Even if we don't have a right child, we still need to compare 740 * our expiration time against that of our left child. 741 */ 742 if (heap_right >= nelems) 743 goto comp_left; 744 745 right = heap[heap_right]; 746 747 /* 748 * We have both a left and a right child. We need to compare 749 * the expiration times of the children to determine which 750 * expires earlier. 751 */ 752 if (cyclics[right].cy_expire < cyclics[left].cy_expire) { 753 /* 754 * Our right child is the earlier of our children. 755 * We'll now compare our expiration time to its; if 756 * ours is the earlier, we're done. 757 */ 758 if (cyclics[me].cy_expire <= cyclics[right].cy_expire) 759 return; 760 761 /* 762 * Our right child expires earlier than we do; swap 763 * with our right child, and descend right. 764 */ 765 heap[heap_right] = me; 766 heap[heap_me] = right; 767 heap_me = heap_right; 768 continue; 769 } 770 771 comp_left: 772 /* 773 * Our left child is the earlier of our children (or we have 774 * no right child). We'll now compare our expiration time 775 * to its; if ours is the earlier, we're done. 776 */ 777 if (cyclics[me].cy_expire <= cyclics[left].cy_expire) 778 return; 779 780 /* 781 * Our left child expires earlier than we do; swap with our 782 * left child, and descend left. 783 */ 784 heap[heap_left] = me; 785 heap[heap_me] = left; 786 heap_me = heap_left; 787 } 788 } 789 790 static void 791 cyclic_expire(cyc_cpu_t *cpu, cyc_index_t ndx, cyclic_t *cyclic) 792 { 793 cyc_backend_t *be = cpu->cyp_backend; 794 cyc_level_t level = cyclic->cy_level; 795 796 /* 797 * If this is a CY_HIGH_LEVEL cyclic, just call the handler; we don't 798 * need to worry about the pend count for CY_HIGH_LEVEL cyclics. 799 */ 800 if (level == CY_HIGH_LEVEL) { 801 cyc_func_t handler = cyclic->cy_handler; 802 void *arg = cyclic->cy_arg; 803 804 CYC_TRACE(cpu, CY_HIGH_LEVEL, "handler-in", handler, arg); 805 DTRACE_PROBE1(cyclic__start, cyclic_t *, cyclic); 806 807 (*handler)(arg); 808 809 DTRACE_PROBE1(cyclic__end, cyclic_t *, cyclic); 810 CYC_TRACE(cpu, CY_HIGH_LEVEL, "handler-out", handler, arg); 811 812 return; 813 } 814 815 /* 816 * We're at CY_HIGH_LEVEL; this modification to cy_pend need not 817 * be atomic (the high interrupt level assures that it will appear 818 * atomic to any softint currently running). 819 */ 820 if (cyclic->cy_pend++ == 0) { 821 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[level]; 822 cyc_pcbuffer_t *pc = &softbuf->cys_buf[softbuf->cys_hard]; 823 824 /* 825 * We need to enqueue this cyclic in the soft buffer. 826 */ 827 CYC_TRACE(cpu, CY_HIGH_LEVEL, "expire-enq", cyclic, 828 pc->cypc_prodndx); 829 pc->cypc_buf[pc->cypc_prodndx++ & pc->cypc_sizemask] = ndx; 830 831 ASSERT(pc->cypc_prodndx != pc->cypc_consndx); 832 } else { 833 /* 834 * If the pend count is zero after we incremented it, then 835 * we've wrapped (i.e. we had a cy_pend count of over four 836 * billion. In this case, we clamp the pend count at 837 * UINT32_MAX. Yes, cyclics can be lost in this case. 838 */ 839 if (cyclic->cy_pend == 0) { 840 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "expire-wrap", cyclic); 841 cyclic->cy_pend = UINT32_MAX; 842 } 843 844 CYC_TRACE(cpu, CY_HIGH_LEVEL, "expire-bump", cyclic, 0); 845 } 846 847 be->cyb_softint(be->cyb_arg, cyclic->cy_level); 848 } 849 850 /* 851 * cyclic_fire(cpu_t *) 852 * 853 * Overview 854 * 855 * cyclic_fire() is the cyclic subsystem's CY_HIGH_LEVEL interrupt handler. 856 * Called by the cyclic backend. 857 * 858 * Arguments and notes 859 * 860 * The only argument is the CPU on which the interrupt is executing; 861 * backends must call into cyclic_fire() on the specified CPU. 862 * 863 * cyclic_fire() may be called spuriously without ill effect. Optimal 864 * backends will call into cyclic_fire() at or shortly after the time 865 * requested via cyb_reprogram(). However, calling cyclic_fire() 866 * arbitrarily late will only manifest latency bubbles; the correctness 867 * of the cyclic subsystem does not rely on the timeliness of the backend. 868 * 869 * cyclic_fire() is wait-free; it will not block or spin. 870 * 871 * Return values 872 * 873 * None. 874 * 875 * Caller's context 876 * 877 * cyclic_fire() must be called from CY_HIGH_LEVEL interrupt context. 878 */ 879 void 880 cyclic_fire(cpu_t *c) 881 { 882 cyc_cpu_t *cpu = c->cpu_cyclic; 883 cyc_backend_t *be = cpu->cyp_backend; 884 cyc_index_t *heap = cpu->cyp_heap; 885 cyclic_t *cyclic, *cyclics = cpu->cyp_cyclics; 886 void *arg = be->cyb_arg; 887 hrtime_t now = gethrtime(); 888 hrtime_t exp; 889 890 CYC_TRACE(cpu, CY_HIGH_LEVEL, "fire", now, 0); 891 892 if (cpu->cyp_nelems == 0) { 893 /* 894 * This is a spurious fire. Count it as such, and blow 895 * out of here. 896 */ 897 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "fire-spurious"); 898 return; 899 } 900 901 for (;;) { 902 cyc_index_t ndx = heap[0]; 903 904 cyclic = &cyclics[ndx]; 905 906 ASSERT(!(cyclic->cy_flags & CYF_FREE)); 907 908 CYC_TRACE(cpu, CY_HIGH_LEVEL, "fire-check", cyclic, 909 cyclic->cy_expire); 910 911 if ((exp = cyclic->cy_expire) > now) 912 break; 913 914 cyclic_expire(cpu, ndx, cyclic); 915 916 /* 917 * If this cyclic will be set to next expire in the distant 918 * past, we have one of two situations: 919 * 920 * a) This is the first firing of a cyclic which had 921 * cy_expire set to 0. 922 * 923 * b) We are tragically late for a cyclic -- most likely 924 * due to being in the debugger. 925 * 926 * In either case, we set the new expiration time to be the 927 * the next interval boundary. This assures that the 928 * expiration time modulo the interval is invariant. 929 * 930 * We arbitrarily define "distant" to be one second (one second 931 * is chosen because it's shorter than any foray to the 932 * debugger while still being longer than any legitimate 933 * stretch at CY_HIGH_LEVEL). 934 */ 935 exp += cyclic->cy_interval; 936 937 if (now - exp > NANOSEC) { 938 hrtime_t interval = cyclic->cy_interval; 939 940 CYC_TRACE(cpu, CY_HIGH_LEVEL, exp == interval ? 941 "fire-first" : "fire-swing", now, exp); 942 943 exp += ((now - exp) / interval + 1) * interval; 944 } 945 946 cyclic->cy_expire = exp; 947 cyclic_downheap(cpu, 0); 948 } 949 950 /* 951 * Now we have a cyclic in the root slot which isn't in the past; 952 * reprogram the interrupt source. 953 */ 954 be->cyb_reprogram(arg, exp); 955 } 956 957 static void 958 cyclic_remove_pend(cyc_cpu_t *cpu, cyc_level_t level, cyclic_t *cyclic) 959 { 960 cyc_func_t handler = cyclic->cy_handler; 961 void *arg = cyclic->cy_arg; 962 uint32_t i, rpend = cpu->cyp_rpend - 1; 963 964 ASSERT(cyclic->cy_flags & CYF_FREE); 965 ASSERT(cyclic->cy_pend == 0); 966 ASSERT(cpu->cyp_state == CYS_REMOVING); 967 ASSERT(cpu->cyp_rpend > 0); 968 969 CYC_TRACE(cpu, level, "remove-rpend", cyclic, cpu->cyp_rpend); 970 971 /* 972 * Note that we only call the handler cyp_rpend - 1 times; this is 973 * to account for the handler call in cyclic_softint(). 974 */ 975 for (i = 0; i < rpend; i++) { 976 CYC_TRACE(cpu, level, "rpend-in", handler, arg); 977 DTRACE_PROBE1(cyclic__start, cyclic_t *, cyclic); 978 979 (*handler)(arg); 980 981 DTRACE_PROBE1(cyclic__end, cyclic_t *, cyclic); 982 CYC_TRACE(cpu, level, "rpend-out", handler, arg); 983 } 984 985 /* 986 * We can now let the remove operation complete. 987 */ 988 sema_v(&cpu->cyp_modify_wait); 989 } 990 991 /* 992 * cyclic_softint(cpu_t *cpu, cyc_level_t level) 993 * 994 * Overview 995 * 996 * cyclic_softint() is the cyclic subsystem's CY_LOCK_LEVEL and CY_LOW_LEVEL 997 * soft interrupt handler. Called by the cyclic backend. 998 * 999 * Arguments and notes 1000 * 1001 * The first argument to cyclic_softint() is the CPU on which the interrupt 1002 * is executing; backends must call into cyclic_softint() on the specified 1003 * CPU. The second argument is the level of the soft interrupt; it must 1004 * be one of CY_LOCK_LEVEL or CY_LOW_LEVEL. 1005 * 1006 * cyclic_softint() will call the handlers for cyclics pending at the 1007 * specified level. cyclic_softint() will not return until all pending 1008 * cyclics at the specified level have been dealt with; intervening 1009 * CY_HIGH_LEVEL interrupts which enqueue cyclics at the specified level 1010 * may therefore prolong cyclic_softint(). 1011 * 1012 * cyclic_softint() never disables interrupts, and, if neither a 1013 * cyclic_add() nor a cyclic_remove() is pending on the specified CPU, is 1014 * lock-free. This assures that in the common case, cyclic_softint() 1015 * completes without blocking, and never starves cyclic_fire(). If either 1016 * cyclic_add() or cyclic_remove() is pending, cyclic_softint() may grab 1017 * a dispatcher lock. 1018 * 1019 * While cyclic_softint() is designed for bounded latency, it is obviously 1020 * at the mercy of its cyclic handlers. Because cyclic handlers may block 1021 * arbitrarily, callers of cyclic_softint() should not rely upon 1022 * deterministic completion. 1023 * 1024 * cyclic_softint() may be called spuriously without ill effect. 1025 * 1026 * Return value 1027 * 1028 * None. 1029 * 1030 * Caller's context 1031 * 1032 * The caller must be executing in soft interrupt context at either 1033 * CY_LOCK_LEVEL or CY_LOW_LEVEL. The level passed to cyclic_softint() 1034 * must match the level at which it is executing. On optimal backends, 1035 * the caller will hold no locks. In any case, the caller may not hold 1036 * cpu_lock or any lock acquired by any cyclic handler or held across 1037 * any of cyclic_add(), cyclic_remove(), cyclic_bind() or cyclic_juggle(). 1038 */ 1039 void 1040 cyclic_softint(cpu_t *c, cyc_level_t level) 1041 { 1042 cyc_cpu_t *cpu = c->cpu_cyclic; 1043 cyc_softbuf_t *softbuf; 1044 int soft, *buf, consndx, resized = 0, intr_resized = 0; 1045 cyc_pcbuffer_t *pc; 1046 cyclic_t *cyclics = cpu->cyp_cyclics; 1047 int sizemask; 1048 1049 CYC_TRACE(cpu, level, "softint", cyclics, 0); 1050 1051 ASSERT(level < CY_LOW_LEVEL + CY_SOFT_LEVELS); 1052 1053 softbuf = &cpu->cyp_softbuf[level]; 1054 top: 1055 soft = softbuf->cys_soft; 1056 ASSERT(soft == 0 || soft == 1); 1057 1058 pc = &softbuf->cys_buf[soft]; 1059 buf = pc->cypc_buf; 1060 consndx = pc->cypc_consndx; 1061 sizemask = pc->cypc_sizemask; 1062 1063 CYC_TRACE(cpu, level, "softint-top", cyclics, pc); 1064 1065 while (consndx != pc->cypc_prodndx) { 1066 int pend, npend, opend; 1067 int consmasked = consndx & sizemask; 1068 cyclic_t *cyclic = &cyclics[buf[consmasked]]; 1069 cyc_func_t handler = cyclic->cy_handler; 1070 void *arg = cyclic->cy_arg; 1071 1072 ASSERT(buf[consmasked] < cpu->cyp_size); 1073 CYC_TRACE(cpu, level, "consuming", consndx, cyclic); 1074 1075 /* 1076 * We have found this cyclic in the pcbuffer. We know that 1077 * one of the following is true: 1078 * 1079 * (a) The pend is non-zero. We need to execute the handler 1080 * at least once. 1081 * 1082 * (b) The pend _was_ non-zero, but it's now zero due to a 1083 * resize. We will call the handler once, see that we 1084 * are in this case, and read the new cyclics buffer 1085 * (and hence the old non-zero pend). 1086 * 1087 * (c) The pend _was_ non-zero, but it's now zero due to a 1088 * removal. We will call the handler once, see that we 1089 * are in this case, and call into cyclic_remove_pend() 1090 * to call the cyclic rpend times. We will take into 1091 * account that we have already called the handler once. 1092 * 1093 * Point is: it's safe to call the handler without first 1094 * checking the pend. 1095 */ 1096 do { 1097 CYC_TRACE(cpu, level, "handler-in", handler, arg); 1098 DTRACE_PROBE1(cyclic__start, cyclic_t *, cyclic); 1099 1100 (*handler)(arg); 1101 1102 DTRACE_PROBE1(cyclic__end, cyclic_t *, cyclic); 1103 CYC_TRACE(cpu, level, "handler-out", handler, arg); 1104 reread: 1105 pend = cyclic->cy_pend; 1106 npend = pend - 1; 1107 1108 if (pend == 0) { 1109 if (cpu->cyp_state == CYS_REMOVING) { 1110 /* 1111 * This cyclic has been removed while 1112 * it had a non-zero pend count (we 1113 * know it was non-zero because we 1114 * found this cyclic in the pcbuffer). 1115 * There must be a non-zero rpend for 1116 * this CPU, and there must be a remove 1117 * operation blocking; we'll call into 1118 * cyclic_remove_pend() to clean this 1119 * up, and break out of the pend loop. 1120 */ 1121 cyclic_remove_pend(cpu, level, cyclic); 1122 break; 1123 } 1124 1125 /* 1126 * We must have had a resize interrupt us. 1127 */ 1128 CYC_TRACE(cpu, level, "resize-int", cyclics, 0); 1129 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1130 ASSERT(cyclics != cpu->cyp_cyclics); 1131 ASSERT(resized == 0); 1132 ASSERT(intr_resized == 0); 1133 intr_resized = 1; 1134 cyclics = cpu->cyp_cyclics; 1135 cyclic = &cyclics[buf[consmasked]]; 1136 ASSERT(cyclic->cy_handler == handler); 1137 ASSERT(cyclic->cy_arg == arg); 1138 goto reread; 1139 } 1140 1141 if ((opend = 1142 cas32(&cyclic->cy_pend, pend, npend)) != pend) { 1143 /* 1144 * Our cas32 can fail for one of several 1145 * reasons: 1146 * 1147 * (a) An intervening high level bumped up the 1148 * pend count on this cyclic. In this 1149 * case, we will see a higher pend. 1150 * 1151 * (b) The cyclics array has been yanked out 1152 * from underneath us by a resize 1153 * operation. In this case, pend is 0 and 1154 * cyp_state is CYS_EXPANDING. 1155 * 1156 * (c) The cyclic has been removed by an 1157 * intervening remove-xcall. In this case, 1158 * pend will be 0, the cyp_state will be 1159 * CYS_REMOVING, and the cyclic will be 1160 * marked CYF_FREE. 1161 * 1162 * The assertion below checks that we are 1163 * in one of the above situations. The 1164 * action under all three is to return to 1165 * the top of the loop. 1166 */ 1167 CYC_TRACE(cpu, level, "cas-fail", opend, pend); 1168 ASSERT(opend > pend || (opend == 0 && 1169 ((cyclics != cpu->cyp_cyclics && 1170 cpu->cyp_state == CYS_EXPANDING) || 1171 (cpu->cyp_state == CYS_REMOVING && 1172 (cyclic->cy_flags & CYF_FREE))))); 1173 goto reread; 1174 } 1175 1176 /* 1177 * Okay, so we've managed to successfully decrement 1178 * pend. If we just decremented the pend to 0, we're 1179 * done. 1180 */ 1181 } while (npend > 0); 1182 1183 pc->cypc_consndx = ++consndx; 1184 } 1185 1186 /* 1187 * If the high level handler is no longer writing to the same 1188 * buffer, then we've had a resize. We need to switch our soft 1189 * index, and goto top. 1190 */ 1191 if (soft != softbuf->cys_hard) { 1192 /* 1193 * We can assert that the other buffer has grown by exactly 1194 * one factor of two. 1195 */ 1196 CYC_TRACE(cpu, level, "buffer-grow", 0, 0); 1197 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1198 ASSERT(softbuf->cys_buf[softbuf->cys_hard].cypc_sizemask == 1199 (softbuf->cys_buf[soft].cypc_sizemask << 1) + 1 || 1200 softbuf->cys_buf[soft].cypc_sizemask == 0); 1201 ASSERT(softbuf->cys_hard == (softbuf->cys_soft ^ 1)); 1202 1203 /* 1204 * If our cached cyclics pointer doesn't match cyp_cyclics, 1205 * then we took a resize between our last iteration of the 1206 * pend loop and the check against softbuf->cys_hard. 1207 */ 1208 if (cpu->cyp_cyclics != cyclics) { 1209 CYC_TRACE1(cpu, level, "resize-int-int", consndx); 1210 cyclics = cpu->cyp_cyclics; 1211 } 1212 1213 softbuf->cys_soft = softbuf->cys_hard; 1214 1215 ASSERT(resized == 0); 1216 resized = 1; 1217 goto top; 1218 } 1219 1220 /* 1221 * If we were interrupted by a resize operation, then we must have 1222 * seen the hard index change. 1223 */ 1224 ASSERT(!(intr_resized == 1 && resized == 0)); 1225 1226 if (resized) { 1227 uint32_t lev, nlev; 1228 1229 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1230 1231 do { 1232 lev = cpu->cyp_modify_levels; 1233 nlev = lev + 1; 1234 } while (cas32(&cpu->cyp_modify_levels, lev, nlev) != lev); 1235 1236 /* 1237 * If we are the last soft level to see the modification, 1238 * post on cyp_modify_wait. Otherwise, (if we're not 1239 * already at low level), post down to the next soft level. 1240 */ 1241 if (nlev == CY_SOFT_LEVELS) { 1242 CYC_TRACE0(cpu, level, "resize-kick"); 1243 sema_v(&cpu->cyp_modify_wait); 1244 } else { 1245 ASSERT(nlev < CY_SOFT_LEVELS); 1246 if (level != CY_LOW_LEVEL) { 1247 cyc_backend_t *be = cpu->cyp_backend; 1248 1249 CYC_TRACE0(cpu, level, "resize-post"); 1250 be->cyb_softint(be->cyb_arg, level - 1); 1251 } 1252 } 1253 } 1254 } 1255 1256 static void 1257 cyclic_expand_xcall(cyc_xcallarg_t *arg) 1258 { 1259 cyc_cpu_t *cpu = arg->cyx_cpu; 1260 cyc_backend_t *be = cpu->cyp_backend; 1261 cyb_arg_t bar = be->cyb_arg; 1262 cyc_cookie_t cookie; 1263 cyc_index_t new_size = arg->cyx_size, size = cpu->cyp_size, i; 1264 cyc_index_t *new_heap = arg->cyx_heap; 1265 cyclic_t *cyclics = cpu->cyp_cyclics, *new_cyclics = arg->cyx_cyclics; 1266 1267 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1268 1269 /* 1270 * This is a little dicey. First, we'll raise our interrupt level 1271 * to CY_HIGH_LEVEL. This CPU already has a new heap, cyclic array, 1272 * etc.; we just need to bcopy them across. As for the softint 1273 * buffers, we'll switch the active buffers. The actual softints will 1274 * take care of consuming any pending cyclics in the old buffer. 1275 */ 1276 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 1277 1278 CYC_TRACE(cpu, CY_HIGH_LEVEL, "expand", new_size, 0); 1279 1280 /* 1281 * Assert that the new size is a power of 2. 1282 */ 1283 ASSERT((new_size & new_size - 1) == 0); 1284 ASSERT(new_size == (size << 1)); 1285 ASSERT(cpu->cyp_heap != NULL && cpu->cyp_cyclics != NULL); 1286 1287 bcopy(cpu->cyp_heap, new_heap, sizeof (cyc_index_t) * size); 1288 bcopy(cyclics, new_cyclics, sizeof (cyclic_t) * size); 1289 1290 /* 1291 * Now run through the old cyclics array, setting pend to 0. To 1292 * softints (which are executing at a lower priority level), the 1293 * pends dropping to 0 will appear atomic with the cyp_cyclics 1294 * pointer changing. 1295 */ 1296 for (i = 0; i < size; i++) 1297 cyclics[i].cy_pend = 0; 1298 1299 /* 1300 * Set up the free list, and set all of the new cyclics to be CYF_FREE. 1301 */ 1302 for (i = size; i < new_size; i++) { 1303 new_heap[i] = i; 1304 new_cyclics[i].cy_flags = CYF_FREE; 1305 } 1306 1307 /* 1308 * We can go ahead and plow the value of cyp_heap and cyp_cyclics; 1309 * cyclic_expand() has kept a copy. 1310 */ 1311 cpu->cyp_heap = new_heap; 1312 cpu->cyp_cyclics = new_cyclics; 1313 cpu->cyp_size = new_size; 1314 1315 /* 1316 * We've switched over the heap and the cyclics array. Now we need 1317 * to switch over our active softint buffer pointers. 1318 */ 1319 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 1320 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 1321 uchar_t hard = softbuf->cys_hard; 1322 1323 /* 1324 * Assert that we're not in the middle of a resize operation. 1325 */ 1326 ASSERT(hard == softbuf->cys_soft); 1327 ASSERT(hard == 0 || hard == 1); 1328 ASSERT(softbuf->cys_buf[hard].cypc_buf != NULL); 1329 1330 softbuf->cys_hard = hard ^ 1; 1331 1332 /* 1333 * The caller (cyclic_expand()) is responsible for setting 1334 * up the new producer-consumer buffer; assert that it's 1335 * been done correctly. 1336 */ 1337 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_buf != NULL); 1338 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_prodndx == 0); 1339 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_consndx == 0); 1340 } 1341 1342 /* 1343 * That's all there is to it; now we just need to postdown to 1344 * get the softint chain going. 1345 */ 1346 be->cyb_softint(bar, CY_HIGH_LEVEL - 1); 1347 be->cyb_restore_level(bar, cookie); 1348 } 1349 1350 /* 1351 * cyclic_expand() will cross call onto the CPU to perform the actual 1352 * expand operation. 1353 */ 1354 static void 1355 cyclic_expand(cyc_cpu_t *cpu) 1356 { 1357 cyc_index_t new_size, old_size; 1358 cyc_index_t *new_heap, *old_heap; 1359 cyclic_t *new_cyclics, *old_cyclics; 1360 cyc_xcallarg_t arg; 1361 cyc_backend_t *be = cpu->cyp_backend; 1362 char old_hard; 1363 int i; 1364 1365 ASSERT(MUTEX_HELD(&cpu_lock)); 1366 ASSERT(cpu->cyp_state == CYS_ONLINE); 1367 1368 cpu->cyp_state = CYS_EXPANDING; 1369 1370 old_heap = cpu->cyp_heap; 1371 old_cyclics = cpu->cyp_cyclics; 1372 1373 if ((new_size = ((old_size = cpu->cyp_size) << 1)) == 0) { 1374 new_size = CY_DEFAULT_PERCPU; 1375 ASSERT(old_heap == NULL && old_cyclics == NULL); 1376 } 1377 1378 /* 1379 * Check that the new_size is a power of 2. 1380 */ 1381 ASSERT((new_size - 1 & new_size) == 0); 1382 1383 new_heap = kmem_alloc(sizeof (cyc_index_t) * new_size, KM_SLEEP); 1384 new_cyclics = kmem_zalloc(sizeof (cyclic_t) * new_size, KM_SLEEP); 1385 1386 /* 1387 * We know that no other expansions are in progress (they serialize 1388 * on cpu_lock), so we can safely read the softbuf metadata. 1389 */ 1390 old_hard = cpu->cyp_softbuf[0].cys_hard; 1391 1392 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 1393 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 1394 char hard = softbuf->cys_hard; 1395 cyc_pcbuffer_t *pc = &softbuf->cys_buf[hard ^ 1]; 1396 1397 ASSERT(hard == old_hard); 1398 ASSERT(hard == softbuf->cys_soft); 1399 ASSERT(pc->cypc_buf == NULL); 1400 1401 pc->cypc_buf = 1402 kmem_alloc(sizeof (cyc_index_t) * new_size, KM_SLEEP); 1403 pc->cypc_prodndx = pc->cypc_consndx = 0; 1404 pc->cypc_sizemask = new_size - 1; 1405 } 1406 1407 arg.cyx_cpu = cpu; 1408 arg.cyx_heap = new_heap; 1409 arg.cyx_cyclics = new_cyclics; 1410 arg.cyx_size = new_size; 1411 1412 cpu->cyp_modify_levels = 0; 1413 1414 be->cyb_xcall(be->cyb_arg, cpu->cyp_cpu, 1415 (cyc_func_t)cyclic_expand_xcall, &arg); 1416 1417 /* 1418 * Now block, waiting for the resize operation to complete. 1419 */ 1420 sema_p(&cpu->cyp_modify_wait); 1421 ASSERT(cpu->cyp_modify_levels == CY_SOFT_LEVELS); 1422 1423 /* 1424 * The operation is complete; we can now free the old buffers. 1425 */ 1426 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 1427 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 1428 char hard = softbuf->cys_hard; 1429 cyc_pcbuffer_t *pc = &softbuf->cys_buf[hard ^ 1]; 1430 1431 ASSERT(hard == (old_hard ^ 1)); 1432 ASSERT(hard == softbuf->cys_soft); 1433 1434 if (pc->cypc_buf == NULL) 1435 continue; 1436 1437 ASSERT(pc->cypc_sizemask == ((new_size - 1) >> 1)); 1438 1439 kmem_free(pc->cypc_buf, 1440 sizeof (cyc_index_t) * (pc->cypc_sizemask + 1)); 1441 pc->cypc_buf = NULL; 1442 } 1443 1444 if (old_cyclics != NULL) { 1445 ASSERT(old_heap != NULL); 1446 ASSERT(old_size != 0); 1447 kmem_free(old_cyclics, sizeof (cyclic_t) * old_size); 1448 kmem_free(old_heap, sizeof (cyc_index_t) * old_size); 1449 } 1450 1451 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1452 cpu->cyp_state = CYS_ONLINE; 1453 } 1454 1455 /* 1456 * cyclic_pick_cpu will attempt to pick a CPU according to the constraints 1457 * specified by the partition, bound CPU, and flags. Additionally, 1458 * cyclic_pick_cpu() will not pick the avoid CPU; it will return NULL if 1459 * the avoid CPU is the only CPU which satisfies the constraints. 1460 * 1461 * If CYF_CPU_BOUND is set in flags, the specified CPU must be non-NULL. 1462 * If CYF_PART_BOUND is set in flags, the specified partition must be non-NULL. 1463 * If both CYF_CPU_BOUND and CYF_PART_BOUND are set, the specified CPU must 1464 * be in the specified partition. 1465 */ 1466 static cyc_cpu_t * 1467 cyclic_pick_cpu(cpupart_t *part, cpu_t *bound, cpu_t *avoid, uint16_t flags) 1468 { 1469 cpu_t *c, *start = (part != NULL) ? part->cp_cpulist : CPU; 1470 cpu_t *online = NULL; 1471 uintptr_t offset; 1472 1473 CYC_PTRACE("pick-cpu", part, bound); 1474 1475 ASSERT(!(flags & CYF_CPU_BOUND) || bound != NULL); 1476 ASSERT(!(flags & CYF_PART_BOUND) || part != NULL); 1477 1478 /* 1479 * If we're bound to our CPU, there isn't much choice involved. We 1480 * need to check that the CPU passed as bound is in the cpupart, and 1481 * that the CPU that we're binding to has been configured. 1482 */ 1483 if (flags & CYF_CPU_BOUND) { 1484 CYC_PTRACE("pick-cpu-bound", bound, avoid); 1485 1486 if ((flags & CYF_PART_BOUND) && bound->cpu_part != part) 1487 panic("cyclic_pick_cpu: " 1488 "CPU binding contradicts partition binding"); 1489 1490 if (bound == avoid) 1491 return (NULL); 1492 1493 if (bound->cpu_cyclic == NULL) 1494 panic("cyclic_pick_cpu: " 1495 "attempt to bind to non-configured CPU"); 1496 1497 return (bound->cpu_cyclic); 1498 } 1499 1500 if (flags & CYF_PART_BOUND) { 1501 CYC_PTRACE("pick-part-bound", bound, avoid); 1502 offset = offsetof(cpu_t, cpu_next_part); 1503 } else { 1504 offset = offsetof(cpu_t, cpu_next_onln); 1505 } 1506 1507 c = start; 1508 do { 1509 if (c->cpu_cyclic == NULL) 1510 continue; 1511 1512 if (c->cpu_cyclic->cyp_state == CYS_OFFLINE) 1513 continue; 1514 1515 if (c == avoid) 1516 continue; 1517 1518 if (c->cpu_flags & CPU_ENABLE) 1519 goto found; 1520 1521 if (online == NULL) 1522 online = c; 1523 } while ((c = *(cpu_t **)((uintptr_t)c + offset)) != start); 1524 1525 /* 1526 * If we're here, we're in one of two situations: 1527 * 1528 * (a) We have a partition-bound cyclic, and there is no CPU in 1529 * our partition which is CPU_ENABLE'd. If we saw another 1530 * non-CYS_OFFLINE CPU in our partition, we'll go with it. 1531 * If not, the avoid CPU must be the only non-CYS_OFFLINE 1532 * CPU in the partition; we're forced to return NULL. 1533 * 1534 * (b) We have a partition-unbound cyclic, in which case there 1535 * must only be one CPU CPU_ENABLE'd, and it must be the one 1536 * we're trying to avoid. If cyclic_juggle()/cyclic_offline() 1537 * are called appropriately, this generally shouldn't happen 1538 * (the offline should fail before getting to this code). 1539 * At any rate: we can't avoid the avoid CPU, so we return 1540 * NULL. 1541 */ 1542 if (!(flags & CYF_PART_BOUND)) { 1543 ASSERT(avoid->cpu_flags & CPU_ENABLE); 1544 return (NULL); 1545 } 1546 1547 CYC_PTRACE("pick-no-intr", part, avoid); 1548 1549 if ((c = online) != NULL) 1550 goto found; 1551 1552 CYC_PTRACE("pick-fail", part, avoid); 1553 ASSERT(avoid->cpu_part == start->cpu_part); 1554 return (NULL); 1555 1556 found: 1557 CYC_PTRACE("pick-cpu-found", c, avoid); 1558 ASSERT(c != avoid); 1559 ASSERT(c->cpu_cyclic != NULL); 1560 1561 return (c->cpu_cyclic); 1562 } 1563 1564 static void 1565 cyclic_add_xcall(cyc_xcallarg_t *arg) 1566 { 1567 cyc_cpu_t *cpu = arg->cyx_cpu; 1568 cyc_handler_t *hdlr = arg->cyx_hdlr; 1569 cyc_time_t *when = arg->cyx_when; 1570 cyc_backend_t *be = cpu->cyp_backend; 1571 cyc_index_t ndx, nelems; 1572 cyc_cookie_t cookie; 1573 cyb_arg_t bar = be->cyb_arg; 1574 cyclic_t *cyclic; 1575 1576 ASSERT(cpu->cyp_nelems < cpu->cyp_size); 1577 1578 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 1579 1580 CYC_TRACE(cpu, CY_HIGH_LEVEL, 1581 "add-xcall", when->cyt_when, when->cyt_interval); 1582 1583 nelems = cpu->cyp_nelems++; 1584 1585 if (nelems == 0) { 1586 /* 1587 * If this is the first element, we need to enable the 1588 * backend on this CPU. 1589 */ 1590 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "enabled"); 1591 be->cyb_enable(bar); 1592 } 1593 1594 ndx = cpu->cyp_heap[nelems]; 1595 cyclic = &cpu->cyp_cyclics[ndx]; 1596 1597 ASSERT(cyclic->cy_flags == CYF_FREE); 1598 cyclic->cy_interval = when->cyt_interval; 1599 1600 if (when->cyt_when == 0) { 1601 /* 1602 * If a start time hasn't been explicitly specified, we'll 1603 * start on the next interval boundary. 1604 */ 1605 cyclic->cy_expire = (gethrtime() / cyclic->cy_interval + 1) * 1606 cyclic->cy_interval; 1607 } else { 1608 cyclic->cy_expire = when->cyt_when; 1609 } 1610 1611 cyclic->cy_handler = hdlr->cyh_func; 1612 cyclic->cy_arg = hdlr->cyh_arg; 1613 cyclic->cy_level = hdlr->cyh_level; 1614 cyclic->cy_flags = arg->cyx_flags; 1615 1616 if (cyclic_upheap(cpu, nelems)) { 1617 hrtime_t exp = cyclic->cy_expire; 1618 1619 CYC_TRACE(cpu, CY_HIGH_LEVEL, "add-reprog", cyclic, exp); 1620 1621 /* 1622 * If our upheap propagated to the root, we need to 1623 * reprogram the interrupt source. 1624 */ 1625 be->cyb_reprogram(bar, exp); 1626 } 1627 be->cyb_restore_level(bar, cookie); 1628 1629 arg->cyx_ndx = ndx; 1630 } 1631 1632 static cyc_index_t 1633 cyclic_add_here(cyc_cpu_t *cpu, cyc_handler_t *hdlr, 1634 cyc_time_t *when, uint16_t flags) 1635 { 1636 cyc_backend_t *be = cpu->cyp_backend; 1637 cyb_arg_t bar = be->cyb_arg; 1638 cyc_xcallarg_t arg; 1639 1640 CYC_PTRACE("add-cpu", cpu, hdlr->cyh_func); 1641 ASSERT(MUTEX_HELD(&cpu_lock)); 1642 ASSERT(cpu->cyp_state == CYS_ONLINE); 1643 ASSERT(!(cpu->cyp_cpu->cpu_flags & CPU_OFFLINE)); 1644 ASSERT(when->cyt_when >= 0 && when->cyt_interval > 0); 1645 1646 if (cpu->cyp_nelems == cpu->cyp_size) { 1647 /* 1648 * This is expensive; it will cross call onto the other 1649 * CPU to perform the expansion. 1650 */ 1651 cyclic_expand(cpu); 1652 ASSERT(cpu->cyp_nelems < cpu->cyp_size); 1653 } 1654 1655 /* 1656 * By now, we know that we're going to be able to successfully 1657 * perform the add. Now cross call over to the CPU of interest to 1658 * actually add our cyclic. 1659 */ 1660 arg.cyx_cpu = cpu; 1661 arg.cyx_hdlr = hdlr; 1662 arg.cyx_when = when; 1663 arg.cyx_flags = flags; 1664 1665 be->cyb_xcall(bar, cpu->cyp_cpu, (cyc_func_t)cyclic_add_xcall, &arg); 1666 1667 CYC_PTRACE("add-cpu-done", cpu, arg.cyx_ndx); 1668 1669 return (arg.cyx_ndx); 1670 } 1671 1672 static void 1673 cyclic_remove_xcall(cyc_xcallarg_t *arg) 1674 { 1675 cyc_cpu_t *cpu = arg->cyx_cpu; 1676 cyc_backend_t *be = cpu->cyp_backend; 1677 cyb_arg_t bar = be->cyb_arg; 1678 cyc_cookie_t cookie; 1679 cyc_index_t ndx = arg->cyx_ndx, nelems = cpu->cyp_nelems, i; 1680 cyc_index_t *heap = cpu->cyp_heap, last; 1681 cyclic_t *cyclic; 1682 #ifdef DEBUG 1683 cyc_index_t root; 1684 #endif 1685 1686 ASSERT(cpu->cyp_state == CYS_REMOVING); 1687 ASSERT(nelems > 0); 1688 1689 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 1690 1691 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "remove-xcall", ndx); 1692 1693 cyclic = &cpu->cyp_cyclics[ndx]; 1694 1695 /* 1696 * Grab the current expiration time. If this cyclic is being 1697 * removed as part of a juggling operation, the expiration time 1698 * will be used when the cyclic is added to the new CPU. 1699 */ 1700 if (arg->cyx_when != NULL) { 1701 arg->cyx_when->cyt_when = cyclic->cy_expire; 1702 arg->cyx_when->cyt_interval = cyclic->cy_interval; 1703 } 1704 1705 if (cyclic->cy_pend != 0) { 1706 /* 1707 * The pend is non-zero; this cyclic is currently being 1708 * executed (or will be executed shortly). If the caller 1709 * refuses to wait, we must return (doing nothing). Otherwise, 1710 * we will stash the pend value * in this CPU's rpend, and 1711 * then zero it out. The softint in the pend loop will see 1712 * that we have zeroed out pend, and will call the cyclic 1713 * handler rpend times. The caller will wait until the 1714 * softint has completed calling the cyclic handler. 1715 */ 1716 if (arg->cyx_wait == CY_NOWAIT) { 1717 arg->cyx_wait = CY_WAIT; 1718 goto out; 1719 } 1720 1721 ASSERT(cyclic->cy_level != CY_HIGH_LEVEL); 1722 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "remove-pend", cyclic->cy_pend); 1723 cpu->cyp_rpend = cyclic->cy_pend; 1724 cyclic->cy_pend = 0; 1725 } 1726 1727 /* 1728 * Now set the flags to CYF_FREE. We don't need a membar_enter() 1729 * between zeroing pend and setting the flags because we're at 1730 * CY_HIGH_LEVEL (that is, the zeroing of pend and the setting 1731 * of cy_flags appear atomic to softints). 1732 */ 1733 cyclic->cy_flags = CYF_FREE; 1734 1735 for (i = 0; i < nelems; i++) { 1736 if (heap[i] == ndx) 1737 break; 1738 } 1739 1740 if (i == nelems) 1741 panic("attempt to remove non-existent cyclic"); 1742 1743 cpu->cyp_nelems = --nelems; 1744 1745 if (nelems == 0) { 1746 /* 1747 * If we just removed the last element, then we need to 1748 * disable the backend on this CPU. 1749 */ 1750 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "disabled"); 1751 be->cyb_disable(bar); 1752 } 1753 1754 if (i == nelems) { 1755 /* 1756 * If we just removed the last element of the heap, then 1757 * we don't have to downheap. 1758 */ 1759 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-bottom"); 1760 goto out; 1761 } 1762 1763 #ifdef DEBUG 1764 root = heap[0]; 1765 #endif 1766 1767 /* 1768 * Swap the last element of the heap with the one we want to 1769 * remove, and downheap (this has the implicit effect of putting 1770 * the newly freed element on the free list). 1771 */ 1772 heap[i] = (last = heap[nelems]); 1773 heap[nelems] = ndx; 1774 1775 if (i == 0) { 1776 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-root"); 1777 cyclic_downheap(cpu, 0); 1778 } else { 1779 if (cyclic_upheap(cpu, i) == 0) { 1780 /* 1781 * The upheap didn't propagate to the root; if it 1782 * didn't propagate at all, we need to downheap. 1783 */ 1784 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-no-root"); 1785 if (heap[i] == last) { 1786 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-no-up"); 1787 cyclic_downheap(cpu, i); 1788 } 1789 ASSERT(heap[0] == root); 1790 goto out; 1791 } 1792 } 1793 1794 /* 1795 * We're here because we changed the root; we need to reprogram 1796 * the clock source. 1797 */ 1798 cyclic = &cpu->cyp_cyclics[heap[0]]; 1799 1800 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-reprog"); 1801 1802 ASSERT(nelems != 0); 1803 be->cyb_reprogram(bar, cyclic->cy_expire); 1804 out: 1805 be->cyb_restore_level(bar, cookie); 1806 } 1807 1808 static int 1809 cyclic_remove_here(cyc_cpu_t *cpu, cyc_index_t ndx, cyc_time_t *when, int wait) 1810 { 1811 cyc_backend_t *be = cpu->cyp_backend; 1812 cyc_xcallarg_t arg; 1813 cyclic_t *cyclic = &cpu->cyp_cyclics[ndx]; 1814 cyc_level_t level = cyclic->cy_level; 1815 1816 ASSERT(MUTEX_HELD(&cpu_lock)); 1817 ASSERT(cpu->cyp_rpend == 0); 1818 ASSERT(wait == CY_WAIT || wait == CY_NOWAIT); 1819 1820 arg.cyx_ndx = ndx; 1821 arg.cyx_cpu = cpu; 1822 arg.cyx_when = when; 1823 arg.cyx_wait = wait; 1824 1825 ASSERT(cpu->cyp_state == CYS_ONLINE); 1826 cpu->cyp_state = CYS_REMOVING; 1827 1828 be->cyb_xcall(be->cyb_arg, cpu->cyp_cpu, 1829 (cyc_func_t)cyclic_remove_xcall, &arg); 1830 1831 /* 1832 * If the cyclic we removed wasn't at CY_HIGH_LEVEL, then we need to 1833 * check the cyp_rpend. If it's non-zero, then we need to wait here 1834 * for all pending cyclic handlers to run. 1835 */ 1836 ASSERT(!(level == CY_HIGH_LEVEL && cpu->cyp_rpend != 0)); 1837 ASSERT(!(wait == CY_NOWAIT && cpu->cyp_rpend != 0)); 1838 ASSERT(!(arg.cyx_wait == CY_NOWAIT && cpu->cyp_rpend != 0)); 1839 1840 if (wait != arg.cyx_wait) { 1841 /* 1842 * We are being told that we must wait if we want to 1843 * remove this cyclic; put the CPU back in the CYS_ONLINE 1844 * state and return failure. 1845 */ 1846 ASSERT(wait == CY_NOWAIT && arg.cyx_wait == CY_WAIT); 1847 ASSERT(cpu->cyp_state == CYS_REMOVING); 1848 cpu->cyp_state = CYS_ONLINE; 1849 1850 return (0); 1851 } 1852 1853 if (cpu->cyp_rpend != 0) 1854 sema_p(&cpu->cyp_modify_wait); 1855 1856 ASSERT(cpu->cyp_state == CYS_REMOVING); 1857 1858 cpu->cyp_rpend = 0; 1859 cpu->cyp_state = CYS_ONLINE; 1860 1861 return (1); 1862 } 1863 1864 /* 1865 * cyclic_juggle_one_to() should only be called when the source cyclic 1866 * can be juggled and the destination CPU is known to be able to accept 1867 * it. 1868 */ 1869 static void 1870 cyclic_juggle_one_to(cyc_id_t *idp, cyc_cpu_t *dest) 1871 { 1872 cyc_cpu_t *src = idp->cyi_cpu; 1873 cyc_index_t ndx = idp->cyi_ndx; 1874 cyc_time_t when; 1875 cyc_handler_t hdlr; 1876 cyclic_t *cyclic; 1877 uint16_t flags; 1878 hrtime_t delay; 1879 1880 ASSERT(MUTEX_HELD(&cpu_lock)); 1881 ASSERT(src != NULL && idp->cyi_omni_list == NULL); 1882 ASSERT(!(dest->cyp_cpu->cpu_flags & (CPU_QUIESCED | CPU_OFFLINE))); 1883 CYC_PTRACE("juggle-one-to", idp, dest); 1884 1885 cyclic = &src->cyp_cyclics[ndx]; 1886 1887 flags = cyclic->cy_flags; 1888 ASSERT(!(flags & CYF_CPU_BOUND) && !(flags & CYF_FREE)); 1889 1890 hdlr.cyh_func = cyclic->cy_handler; 1891 hdlr.cyh_level = cyclic->cy_level; 1892 hdlr.cyh_arg = cyclic->cy_arg; 1893 1894 /* 1895 * Before we begin the juggling process, see if the destination 1896 * CPU requires an expansion. If it does, we'll perform the 1897 * expansion before removing the cyclic. This is to prevent us 1898 * from blocking while a system-critical cyclic (notably, the clock 1899 * cyclic) isn't on a CPU. 1900 */ 1901 if (dest->cyp_nelems == dest->cyp_size) { 1902 CYC_PTRACE("remove-expand", idp, dest); 1903 cyclic_expand(dest); 1904 ASSERT(dest->cyp_nelems < dest->cyp_size); 1905 } 1906 1907 /* 1908 * Remove the cyclic from the source. As mentioned above, we cannot 1909 * block during this operation; if we cannot remove the cyclic 1910 * without waiting, we spin for a time shorter than the interval, and 1911 * reattempt the (non-blocking) removal. If we continue to fail, 1912 * we will exponentially back off (up to half of the interval). 1913 * Note that the removal will ultimately succeed -- even if the 1914 * cyclic handler is blocked on a resource held by a thread which we 1915 * have preempted, priority inheritance assures that the preempted 1916 * thread will preempt us and continue to progress. 1917 */ 1918 for (delay = NANOSEC / MICROSEC; ; delay <<= 1) { 1919 /* 1920 * Before we begin this operation, disable kernel preemption. 1921 */ 1922 kpreempt_disable(); 1923 if (cyclic_remove_here(src, ndx, &when, CY_NOWAIT)) 1924 break; 1925 1926 /* 1927 * The operation failed; enable kernel preemption while 1928 * spinning. 1929 */ 1930 kpreempt_enable(); 1931 1932 CYC_PTRACE("remove-retry", idp, src); 1933 1934 if (delay > (cyclic->cy_interval >> 1)) 1935 delay = cyclic->cy_interval >> 1; 1936 1937 drv_usecwait((clock_t)(delay / (NANOSEC / MICROSEC))); 1938 } 1939 1940 /* 1941 * Now add the cyclic to the destination. This won't block; we 1942 * performed any necessary (blocking) expansion of the destination 1943 * CPU before removing the cyclic from the source CPU. 1944 */ 1945 idp->cyi_ndx = cyclic_add_here(dest, &hdlr, &when, flags); 1946 idp->cyi_cpu = dest; 1947 kpreempt_enable(); 1948 } 1949 1950 static int 1951 cyclic_juggle_one(cyc_id_t *idp) 1952 { 1953 cyc_index_t ndx = idp->cyi_ndx; 1954 cyc_cpu_t *cpu = idp->cyi_cpu, *dest; 1955 cyclic_t *cyclic = &cpu->cyp_cyclics[ndx]; 1956 cpu_t *c = cpu->cyp_cpu; 1957 cpupart_t *part = c->cpu_part; 1958 1959 CYC_PTRACE("juggle-one", idp, cpu); 1960 ASSERT(MUTEX_HELD(&cpu_lock)); 1961 ASSERT(!(c->cpu_flags & CPU_OFFLINE)); 1962 ASSERT(cpu->cyp_state == CYS_ONLINE); 1963 ASSERT(!(cyclic->cy_flags & CYF_FREE)); 1964 1965 if ((dest = cyclic_pick_cpu(part, c, c, cyclic->cy_flags)) == NULL) { 1966 /* 1967 * Bad news: this cyclic can't be juggled. 1968 */ 1969 CYC_PTRACE("juggle-fail", idp, cpu) 1970 return (0); 1971 } 1972 1973 cyclic_juggle_one_to(idp, dest); 1974 1975 return (1); 1976 } 1977 1978 static void 1979 cyclic_unbind_cpu(cyclic_id_t id) 1980 { 1981 cyc_id_t *idp = (cyc_id_t *)id; 1982 cyc_cpu_t *cpu = idp->cyi_cpu; 1983 cpu_t *c = cpu->cyp_cpu; 1984 cyclic_t *cyclic = &cpu->cyp_cyclics[idp->cyi_ndx]; 1985 1986 CYC_PTRACE("unbind-cpu", id, cpu); 1987 ASSERT(MUTEX_HELD(&cpu_lock)); 1988 ASSERT(cpu->cyp_state == CYS_ONLINE); 1989 ASSERT(!(cyclic->cy_flags & CYF_FREE)); 1990 ASSERT(cyclic->cy_flags & CYF_CPU_BOUND); 1991 1992 cyclic->cy_flags &= ~CYF_CPU_BOUND; 1993 1994 /* 1995 * If we were bound to CPU which has interrupts disabled, we need 1996 * to juggle away. This can only fail if we are bound to a 1997 * processor set, and if every CPU in the processor set has 1998 * interrupts disabled. 1999 */ 2000 if (!(c->cpu_flags & CPU_ENABLE)) { 2001 int res = cyclic_juggle_one(idp); 2002 2003 ASSERT((res && idp->cyi_cpu != cpu) || 2004 (!res && (cyclic->cy_flags & CYF_PART_BOUND))); 2005 } 2006 } 2007 2008 static void 2009 cyclic_bind_cpu(cyclic_id_t id, cpu_t *d) 2010 { 2011 cyc_id_t *idp = (cyc_id_t *)id; 2012 cyc_cpu_t *dest = d->cpu_cyclic, *cpu = idp->cyi_cpu; 2013 cpu_t *c = cpu->cyp_cpu; 2014 cyclic_t *cyclic = &cpu->cyp_cyclics[idp->cyi_ndx]; 2015 cpupart_t *part = c->cpu_part; 2016 2017 CYC_PTRACE("bind-cpu", id, dest); 2018 ASSERT(MUTEX_HELD(&cpu_lock)); 2019 ASSERT(!(d->cpu_flags & CPU_OFFLINE)); 2020 ASSERT(!(c->cpu_flags & CPU_OFFLINE)); 2021 ASSERT(cpu->cyp_state == CYS_ONLINE); 2022 ASSERT(dest != NULL); 2023 ASSERT(dest->cyp_state == CYS_ONLINE); 2024 ASSERT(!(cyclic->cy_flags & CYF_FREE)); 2025 ASSERT(!(cyclic->cy_flags & CYF_CPU_BOUND)); 2026 2027 dest = cyclic_pick_cpu(part, d, NULL, cyclic->cy_flags | CYF_CPU_BOUND); 2028 2029 if (dest != cpu) { 2030 cyclic_juggle_one_to(idp, dest); 2031 cyclic = &dest->cyp_cyclics[idp->cyi_ndx]; 2032 } 2033 2034 cyclic->cy_flags |= CYF_CPU_BOUND; 2035 } 2036 2037 static void 2038 cyclic_unbind_cpupart(cyclic_id_t id) 2039 { 2040 cyc_id_t *idp = (cyc_id_t *)id; 2041 cyc_cpu_t *cpu = idp->cyi_cpu; 2042 cpu_t *c = cpu->cyp_cpu; 2043 cyclic_t *cyc = &cpu->cyp_cyclics[idp->cyi_ndx]; 2044 2045 CYC_PTRACE("unbind-part", idp, c->cpu_part); 2046 ASSERT(MUTEX_HELD(&cpu_lock)); 2047 ASSERT(cpu->cyp_state == CYS_ONLINE); 2048 ASSERT(!(cyc->cy_flags & CYF_FREE)); 2049 ASSERT(cyc->cy_flags & CYF_PART_BOUND); 2050 2051 cyc->cy_flags &= ~CYF_PART_BOUND; 2052 2053 /* 2054 * If we're on a CPU which has interrupts disabled (and if this cyclic 2055 * isn't bound to the CPU), we need to juggle away. 2056 */ 2057 if (!(c->cpu_flags & CPU_ENABLE) && !(cyc->cy_flags & CYF_CPU_BOUND)) { 2058 int res = cyclic_juggle_one(idp); 2059 2060 ASSERT(res && idp->cyi_cpu != cpu); 2061 } 2062 } 2063 2064 static void 2065 cyclic_bind_cpupart(cyclic_id_t id, cpupart_t *part) 2066 { 2067 cyc_id_t *idp = (cyc_id_t *)id; 2068 cyc_cpu_t *cpu = idp->cyi_cpu, *dest; 2069 cpu_t *c = cpu->cyp_cpu; 2070 cyclic_t *cyc = &cpu->cyp_cyclics[idp->cyi_ndx]; 2071 2072 CYC_PTRACE("bind-part", idp, part); 2073 ASSERT(MUTEX_HELD(&cpu_lock)); 2074 ASSERT(!(c->cpu_flags & CPU_OFFLINE)); 2075 ASSERT(cpu->cyp_state == CYS_ONLINE); 2076 ASSERT(!(cyc->cy_flags & CYF_FREE)); 2077 ASSERT(!(cyc->cy_flags & CYF_PART_BOUND)); 2078 ASSERT(part->cp_ncpus > 0); 2079 2080 dest = cyclic_pick_cpu(part, c, NULL, cyc->cy_flags | CYF_PART_BOUND); 2081 2082 if (dest != cpu) { 2083 cyclic_juggle_one_to(idp, dest); 2084 cyc = &dest->cyp_cyclics[idp->cyi_ndx]; 2085 } 2086 2087 cyc->cy_flags |= CYF_PART_BOUND; 2088 } 2089 2090 static void 2091 cyclic_configure(cpu_t *c) 2092 { 2093 cyc_cpu_t *cpu = kmem_zalloc(sizeof (cyc_cpu_t), KM_SLEEP); 2094 cyc_backend_t *nbe = kmem_zalloc(sizeof (cyc_backend_t), KM_SLEEP); 2095 int i; 2096 2097 CYC_PTRACE1("configure", cpu); 2098 ASSERT(MUTEX_HELD(&cpu_lock)); 2099 2100 if (cyclic_id_cache == NULL) 2101 cyclic_id_cache = kmem_cache_create("cyclic_id_cache", 2102 sizeof (cyc_id_t), 0, NULL, NULL, NULL, NULL, NULL, 0); 2103 2104 cpu->cyp_cpu = c; 2105 2106 sema_init(&cpu->cyp_modify_wait, 0, NULL, SEMA_DEFAULT, NULL); 2107 2108 cpu->cyp_size = 1; 2109 cpu->cyp_heap = kmem_zalloc(sizeof (cyc_index_t), KM_SLEEP); 2110 cpu->cyp_cyclics = kmem_zalloc(sizeof (cyclic_t), KM_SLEEP); 2111 cpu->cyp_cyclics->cy_flags = CYF_FREE; 2112 2113 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 2114 /* 2115 * We don't need to set the sizemask; it's already zero 2116 * (which is the appropriate sizemask for a size of 1). 2117 */ 2118 cpu->cyp_softbuf[i].cys_buf[0].cypc_buf = 2119 kmem_alloc(sizeof (cyc_index_t), KM_SLEEP); 2120 } 2121 2122 cpu->cyp_state = CYS_OFFLINE; 2123 2124 /* 2125 * Setup the backend for this CPU. 2126 */ 2127 bcopy(&cyclic_backend, nbe, sizeof (cyc_backend_t)); 2128 nbe->cyb_arg = nbe->cyb_configure(c); 2129 cpu->cyp_backend = nbe; 2130 2131 /* 2132 * On platforms where stray interrupts may be taken during startup, 2133 * the CPU's cpu_cyclic pointer serves as an indicator that the 2134 * cyclic subsystem for this CPU is prepared to field interrupts. 2135 */ 2136 membar_producer(); 2137 2138 c->cpu_cyclic = cpu; 2139 } 2140 2141 static void 2142 cyclic_unconfigure(cpu_t *c) 2143 { 2144 cyc_cpu_t *cpu = c->cpu_cyclic; 2145 cyc_backend_t *be = cpu->cyp_backend; 2146 cyb_arg_t bar = be->cyb_arg; 2147 int i; 2148 2149 CYC_PTRACE1("unconfigure", cpu); 2150 ASSERT(MUTEX_HELD(&cpu_lock)); 2151 ASSERT(cpu->cyp_state == CYS_OFFLINE); 2152 ASSERT(cpu->cyp_nelems == 0); 2153 2154 /* 2155 * Let the backend know that the CPU is being yanked, and free up 2156 * the backend structure. 2157 */ 2158 be->cyb_unconfigure(bar); 2159 kmem_free(be, sizeof (cyc_backend_t)); 2160 cpu->cyp_backend = NULL; 2161 2162 /* 2163 * Free up the producer/consumer buffers at each of the soft levels. 2164 */ 2165 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 2166 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 2167 uchar_t hard = softbuf->cys_hard; 2168 cyc_pcbuffer_t *pc = &softbuf->cys_buf[hard]; 2169 size_t bufsize = sizeof (cyc_index_t) * (pc->cypc_sizemask + 1); 2170 2171 /* 2172 * Assert that we're not in the middle of a resize operation. 2173 */ 2174 ASSERT(hard == softbuf->cys_soft); 2175 ASSERT(hard == 0 || hard == 1); 2176 ASSERT(pc->cypc_buf != NULL); 2177 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_buf == NULL); 2178 2179 kmem_free(pc->cypc_buf, bufsize); 2180 pc->cypc_buf = NULL; 2181 } 2182 2183 /* 2184 * Finally, clean up our remaining dynamic structures and NULL out 2185 * the cpu_cyclic pointer. 2186 */ 2187 kmem_free(cpu->cyp_cyclics, cpu->cyp_size * sizeof (cyclic_t)); 2188 kmem_free(cpu->cyp_heap, cpu->cyp_size * sizeof (cyc_index_t)); 2189 kmem_free(cpu, sizeof (cyc_cpu_t)); 2190 2191 c->cpu_cyclic = NULL; 2192 } 2193 2194 static int 2195 cyclic_cpu_setup(cpu_setup_t what, int id) 2196 { 2197 /* 2198 * We are guaranteed that there is still/already an entry in the 2199 * cpu array for this CPU. 2200 */ 2201 cpu_t *c = cpu[id]; 2202 cyc_cpu_t *cyp = c->cpu_cyclic; 2203 2204 ASSERT(MUTEX_HELD(&cpu_lock)); 2205 2206 switch (what) { 2207 case CPU_CONFIG: 2208 ASSERT(cyp == NULL); 2209 cyclic_configure(c); 2210 break; 2211 2212 case CPU_UNCONFIG: 2213 ASSERT(cyp != NULL && cyp->cyp_state == CYS_OFFLINE); 2214 cyclic_unconfigure(c); 2215 break; 2216 2217 default: 2218 break; 2219 } 2220 2221 return (0); 2222 } 2223 2224 static void 2225 cyclic_suspend_xcall(cyc_xcallarg_t *arg) 2226 { 2227 cyc_cpu_t *cpu = arg->cyx_cpu; 2228 cyc_backend_t *be = cpu->cyp_backend; 2229 cyc_cookie_t cookie; 2230 cyb_arg_t bar = be->cyb_arg; 2231 2232 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 2233 2234 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "suspend-xcall", cpu->cyp_nelems); 2235 ASSERT(cpu->cyp_state == CYS_ONLINE || cpu->cyp_state == CYS_OFFLINE); 2236 2237 /* 2238 * We won't disable this CPU unless it has a non-zero number of 2239 * elements (cpu_lock assures that no one else may be attempting 2240 * to disable this CPU). 2241 */ 2242 if (cpu->cyp_nelems > 0) { 2243 ASSERT(cpu->cyp_state == CYS_ONLINE); 2244 be->cyb_disable(bar); 2245 } 2246 2247 if (cpu->cyp_state == CYS_ONLINE) 2248 cpu->cyp_state = CYS_SUSPENDED; 2249 2250 be->cyb_suspend(bar); 2251 be->cyb_restore_level(bar, cookie); 2252 } 2253 2254 static void 2255 cyclic_resume_xcall(cyc_xcallarg_t *arg) 2256 { 2257 cyc_cpu_t *cpu = arg->cyx_cpu; 2258 cyc_backend_t *be = cpu->cyp_backend; 2259 cyc_cookie_t cookie; 2260 cyb_arg_t bar = be->cyb_arg; 2261 cyc_state_t state = cpu->cyp_state; 2262 2263 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 2264 2265 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "resume-xcall", cpu->cyp_nelems); 2266 ASSERT(state == CYS_SUSPENDED || state == CYS_OFFLINE); 2267 2268 be->cyb_resume(bar); 2269 2270 /* 2271 * We won't enable this CPU unless it has a non-zero number of 2272 * elements. 2273 */ 2274 if (cpu->cyp_nelems > 0) { 2275 cyclic_t *cyclic = &cpu->cyp_cyclics[cpu->cyp_heap[0]]; 2276 hrtime_t exp = cyclic->cy_expire; 2277 2278 CYC_TRACE(cpu, CY_HIGH_LEVEL, "resume-reprog", cyclic, exp); 2279 ASSERT(state == CYS_SUSPENDED); 2280 be->cyb_enable(bar); 2281 be->cyb_reprogram(bar, exp); 2282 } 2283 2284 if (state == CYS_SUSPENDED) 2285 cpu->cyp_state = CYS_ONLINE; 2286 2287 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "resume-done", cpu->cyp_nelems); 2288 be->cyb_restore_level(bar, cookie); 2289 } 2290 2291 static void 2292 cyclic_omni_start(cyc_id_t *idp, cyc_cpu_t *cpu) 2293 { 2294 cyc_omni_handler_t *omni = &idp->cyi_omni_hdlr; 2295 cyc_omni_cpu_t *ocpu = kmem_alloc(sizeof (cyc_omni_cpu_t), KM_SLEEP); 2296 cyc_handler_t hdlr; 2297 cyc_time_t when; 2298 2299 CYC_PTRACE("omni-start", cpu, idp); 2300 ASSERT(MUTEX_HELD(&cpu_lock)); 2301 ASSERT(cpu->cyp_state == CYS_ONLINE); 2302 ASSERT(idp->cyi_cpu == NULL); 2303 2304 hdlr.cyh_func = NULL; 2305 hdlr.cyh_arg = NULL; 2306 hdlr.cyh_level = CY_LEVELS; 2307 2308 when.cyt_when = 0; 2309 when.cyt_interval = 0; 2310 2311 omni->cyo_online(omni->cyo_arg, cpu->cyp_cpu, &hdlr, &when); 2312 2313 ASSERT(hdlr.cyh_func != NULL); 2314 ASSERT(hdlr.cyh_level < CY_LEVELS); 2315 ASSERT(when.cyt_when >= 0 && when.cyt_interval > 0); 2316 2317 ocpu->cyo_cpu = cpu; 2318 ocpu->cyo_arg = hdlr.cyh_arg; 2319 ocpu->cyo_ndx = cyclic_add_here(cpu, &hdlr, &when, 0); 2320 ocpu->cyo_next = idp->cyi_omni_list; 2321 idp->cyi_omni_list = ocpu; 2322 } 2323 2324 static void 2325 cyclic_omni_stop(cyc_id_t *idp, cyc_cpu_t *cpu) 2326 { 2327 cyc_omni_handler_t *omni = &idp->cyi_omni_hdlr; 2328 cyc_omni_cpu_t *ocpu = idp->cyi_omni_list, *prev = NULL; 2329 2330 CYC_PTRACE("omni-stop", cpu, idp); 2331 ASSERT(MUTEX_HELD(&cpu_lock)); 2332 ASSERT(cpu->cyp_state == CYS_ONLINE); 2333 ASSERT(idp->cyi_cpu == NULL); 2334 ASSERT(ocpu != NULL); 2335 2336 while (ocpu != NULL && ocpu->cyo_cpu != cpu) { 2337 prev = ocpu; 2338 ocpu = ocpu->cyo_next; 2339 } 2340 2341 /* 2342 * We _must_ have found an cyc_omni_cpu which corresponds to this 2343 * CPU -- the definition of an omnipresent cyclic is that it runs 2344 * on all online CPUs. 2345 */ 2346 ASSERT(ocpu != NULL); 2347 2348 if (prev == NULL) { 2349 idp->cyi_omni_list = ocpu->cyo_next; 2350 } else { 2351 prev->cyo_next = ocpu->cyo_next; 2352 } 2353 2354 (void) cyclic_remove_here(ocpu->cyo_cpu, ocpu->cyo_ndx, NULL, CY_WAIT); 2355 2356 /* 2357 * The cyclic has been removed from this CPU; time to call the 2358 * omnipresent offline handler. 2359 */ 2360 if (omni->cyo_offline != NULL) 2361 omni->cyo_offline(omni->cyo_arg, cpu->cyp_cpu, ocpu->cyo_arg); 2362 2363 kmem_free(ocpu, sizeof (cyc_omni_cpu_t)); 2364 } 2365 2366 static cyc_id_t * 2367 cyclic_new_id() 2368 { 2369 cyc_id_t *idp; 2370 2371 ASSERT(MUTEX_HELD(&cpu_lock)); 2372 2373 idp = kmem_cache_alloc(cyclic_id_cache, KM_SLEEP); 2374 2375 /* 2376 * The cyi_cpu field of the cyc_id_t structure tracks the CPU 2377 * associated with the cyclic. If and only if this field is NULL, the 2378 * cyc_id_t is an omnipresent cyclic. Note that cyi_omni_list may be 2379 * NULL for an omnipresent cyclic while the cyclic is being created 2380 * or destroyed. 2381 */ 2382 idp->cyi_cpu = NULL; 2383 idp->cyi_ndx = 0; 2384 2385 idp->cyi_next = cyclic_id_head; 2386 idp->cyi_prev = NULL; 2387 idp->cyi_omni_list = NULL; 2388 2389 if (cyclic_id_head != NULL) { 2390 ASSERT(cyclic_id_head->cyi_prev == NULL); 2391 cyclic_id_head->cyi_prev = idp; 2392 } 2393 2394 cyclic_id_head = idp; 2395 2396 return (idp); 2397 } 2398 2399 /* 2400 * cyclic_id_t cyclic_add(cyc_handler_t *, cyc_time_t *) 2401 * 2402 * Overview 2403 * 2404 * cyclic_add() will create an unbound cyclic with the specified handler and 2405 * interval. The cyclic will run on a CPU which both has interrupts enabled 2406 * and is in the system CPU partition. 2407 * 2408 * Arguments and notes 2409 * 2410 * As its first argument, cyclic_add() takes a cyc_handler, which has the 2411 * following members: 2412 * 2413 * cyc_func_t cyh_func <-- Cyclic handler 2414 * void *cyh_arg <-- Argument to cyclic handler 2415 * cyc_level_t cyh_level <-- Level at which to fire; must be one of 2416 * CY_LOW_LEVEL, CY_LOCK_LEVEL or CY_HIGH_LEVEL 2417 * 2418 * Note that cyh_level is _not_ an ipl or spl; it must be one the 2419 * CY_*_LEVELs. This layer of abstraction allows the platform to define 2420 * the precise interrupt priority levels, within the following constraints: 2421 * 2422 * CY_LOCK_LEVEL must map to LOCK_LEVEL 2423 * CY_HIGH_LEVEL must map to an ipl greater than LOCK_LEVEL 2424 * CY_LOW_LEVEL must map to an ipl below LOCK_LEVEL 2425 * 2426 * In addition to a cyc_handler, cyclic_add() takes a cyc_time, which 2427 * has the following members: 2428 * 2429 * hrtime_t cyt_when <-- Absolute time, in nanoseconds since boot, at 2430 * which to start firing 2431 * hrtime_t cyt_interval <-- Length of interval, in nanoseconds 2432 * 2433 * gethrtime() is the time source for nanoseconds since boot. If cyt_when 2434 * is set to 0, the cyclic will start to fire when cyt_interval next 2435 * divides the number of nanoseconds since boot. 2436 * 2437 * The cyt_interval field _must_ be filled in by the caller; one-shots are 2438 * _not_ explicitly supported by the cyclic subsystem (cyclic_add() will 2439 * assert that cyt_interval is non-zero). The maximum value for either 2440 * field is INT64_MAX; the caller is responsible for assuring that 2441 * cyt_when + cyt_interval <= INT64_MAX. Neither field may be negative. 2442 * 2443 * For an arbitrary time t in the future, the cyclic handler is guaranteed 2444 * to have been called (t - cyt_when) / cyt_interval times. This will 2445 * be true even if interrupts have been disabled for periods greater than 2446 * cyt_interval nanoseconds. In order to compensate for such periods, 2447 * the cyclic handler may be called a finite number of times with an 2448 * arbitrarily small interval. 2449 * 2450 * The cyclic subsystem will not enforce any lower bound on the interval; 2451 * if the interval is less than the time required to process an interrupt, 2452 * the CPU will wedge. It's the responsibility of the caller to assure that 2453 * either the value of the interval is sane, or that its caller has 2454 * sufficient privilege to deny service (i.e. its caller is root). 2455 * 2456 * The cyclic handler is guaranteed to be single threaded, even while the 2457 * cyclic is being juggled between CPUs (see cyclic_juggle(), below). 2458 * That is, a given cyclic handler will never be executed simultaneously 2459 * on different CPUs. 2460 * 2461 * Return value 2462 * 2463 * cyclic_add() returns a cyclic_id_t, which is guaranteed to be a value 2464 * other than CYCLIC_NONE. cyclic_add() cannot fail. 2465 * 2466 * Caller's context 2467 * 2468 * cpu_lock must be held by the caller, and the caller must not be in 2469 * interrupt context. cyclic_add() will perform a KM_SLEEP kernel 2470 * memory allocation, so the usual rules (e.g. p_lock cannot be held) 2471 * apply. A cyclic may be added even in the presence of CPUs that have 2472 * not been configured with respect to the cyclic subsystem, but only 2473 * configured CPUs will be eligible to run the new cyclic. 2474 * 2475 * Cyclic handler's context 2476 * 2477 * Cyclic handlers will be executed in the interrupt context corresponding 2478 * to the specified level (i.e. either high, lock or low level). The 2479 * usual context rules apply. 2480 * 2481 * A cyclic handler may not grab ANY locks held by the caller of any of 2482 * cyclic_add(), cyclic_remove() or cyclic_bind(); the implementation of 2483 * these functions may require blocking on cyclic handler completion. 2484 * Moreover, cyclic handlers may not make any call back into the cyclic 2485 * subsystem. 2486 */ 2487 cyclic_id_t 2488 cyclic_add(cyc_handler_t *hdlr, cyc_time_t *when) 2489 { 2490 cyc_id_t *idp = cyclic_new_id(); 2491 2492 ASSERT(MUTEX_HELD(&cpu_lock)); 2493 ASSERT(when->cyt_when >= 0 && when->cyt_interval > 0); 2494 2495 idp->cyi_cpu = cyclic_pick_cpu(NULL, NULL, NULL, 0); 2496 idp->cyi_ndx = cyclic_add_here(idp->cyi_cpu, hdlr, when, 0); 2497 2498 return ((uintptr_t)idp); 2499 } 2500 2501 /* 2502 * cyclic_id_t cyclic_add_omni(cyc_omni_handler_t *) 2503 * 2504 * Overview 2505 * 2506 * cyclic_add_omni() will create an omnipresent cyclic with the specified 2507 * online and offline handlers. Omnipresent cyclics run on all online 2508 * CPUs, including CPUs which have unbound interrupts disabled. 2509 * 2510 * Arguments 2511 * 2512 * As its only argument, cyclic_add_omni() takes a cyc_omni_handler, which 2513 * has the following members: 2514 * 2515 * void (*cyo_online)() <-- Online handler 2516 * void (*cyo_offline)() <-- Offline handler 2517 * void *cyo_arg <-- Argument to be passed to on/offline handlers 2518 * 2519 * Online handler 2520 * 2521 * The cyo_online member is a pointer to a function which has the following 2522 * four arguments: 2523 * 2524 * void * <-- Argument (cyo_arg) 2525 * cpu_t * <-- Pointer to CPU about to be onlined 2526 * cyc_handler_t * <-- Pointer to cyc_handler_t; must be filled in 2527 * by omni online handler 2528 * cyc_time_t * <-- Pointer to cyc_time_t; must be filled in by 2529 * omni online handler 2530 * 2531 * The omni cyclic online handler is always called _before_ the omni 2532 * cyclic begins to fire on the specified CPU. As the above argument 2533 * description implies, the online handler must fill in the two structures 2534 * passed to it: the cyc_handler_t and the cyc_time_t. These are the 2535 * same two structures passed to cyclic_add(), outlined above. This 2536 * allows the omni cyclic to have maximum flexibility; different CPUs may 2537 * optionally 2538 * 2539 * (a) have different intervals 2540 * (b) be explicitly in or out of phase with one another 2541 * (c) have different handlers 2542 * (d) have different handler arguments 2543 * (e) fire at different levels 2544 * 2545 * Of these, (e) seems somewhat dubious, but is nonetheless allowed. 2546 * 2547 * The omni online handler is called in the same context as cyclic_add(), 2548 * and has the same liberties: omni online handlers may perform KM_SLEEP 2549 * kernel memory allocations, and may grab locks which are also acquired 2550 * by cyclic handlers. However, omni cyclic online handlers may _not_ 2551 * call back into the cyclic subsystem, and should be generally careful 2552 * about calling into arbitrary kernel subsystems. 2553 * 2554 * Offline handler 2555 * 2556 * The cyo_offline member is a pointer to a function which has the following 2557 * three arguments: 2558 * 2559 * void * <-- Argument (cyo_arg) 2560 * cpu_t * <-- Pointer to CPU about to be offlined 2561 * void * <-- CPU's cyclic argument (that is, value 2562 * to which cyh_arg member of the cyc_handler_t 2563 * was set in the omni online handler) 2564 * 2565 * The omni cyclic offline handler is always called _after_ the omni 2566 * cyclic has ceased firing on the specified CPU. Its purpose is to 2567 * allow cleanup of any resources dynamically allocated in the omni cyclic 2568 * online handler. The context of the offline handler is identical to 2569 * that of the online handler; the same constraints and liberties apply. 2570 * 2571 * The offline handler is optional; it may be NULL. 2572 * 2573 * Return value 2574 * 2575 * cyclic_add_omni() returns a cyclic_id_t, which is guaranteed to be a 2576 * value other than CYCLIC_NONE. cyclic_add_omni() cannot fail. 2577 * 2578 * Caller's context 2579 * 2580 * The caller's context is identical to that of cyclic_add(), specified 2581 * above. 2582 */ 2583 cyclic_id_t 2584 cyclic_add_omni(cyc_omni_handler_t *omni) 2585 { 2586 cyc_id_t *idp = cyclic_new_id(); 2587 cyc_cpu_t *cpu; 2588 cpu_t *c; 2589 2590 ASSERT(MUTEX_HELD(&cpu_lock)); 2591 ASSERT(omni != NULL && omni->cyo_online != NULL); 2592 2593 idp->cyi_omni_hdlr = *omni; 2594 2595 c = cpu_list; 2596 do { 2597 if ((cpu = c->cpu_cyclic) == NULL) 2598 continue; 2599 2600 if (cpu->cyp_state != CYS_ONLINE) { 2601 ASSERT(cpu->cyp_state == CYS_OFFLINE); 2602 continue; 2603 } 2604 2605 cyclic_omni_start(idp, cpu); 2606 } while ((c = c->cpu_next) != cpu_list); 2607 2608 /* 2609 * We must have found at least one online CPU on which to run 2610 * this cyclic. 2611 */ 2612 ASSERT(idp->cyi_omni_list != NULL); 2613 ASSERT(idp->cyi_cpu == NULL); 2614 2615 return ((uintptr_t)idp); 2616 } 2617 2618 /* 2619 * void cyclic_remove(cyclic_id_t) 2620 * 2621 * Overview 2622 * 2623 * cyclic_remove() will remove the specified cyclic from the system. 2624 * 2625 * Arguments and notes 2626 * 2627 * The only argument is a cyclic_id returned from either cyclic_add() or 2628 * cyclic_add_omni(). 2629 * 2630 * By the time cyclic_remove() returns, the caller is guaranteed that the 2631 * removed cyclic handler has completed execution (this is the same 2632 * semantic that untimeout() provides). As a result, cyclic_remove() may 2633 * need to block, waiting for the removed cyclic to complete execution. 2634 * This leads to an important constraint on the caller: no lock may be 2635 * held across cyclic_remove() that also may be acquired by a cyclic 2636 * handler. 2637 * 2638 * Return value 2639 * 2640 * None; cyclic_remove() always succeeds. 2641 * 2642 * Caller's context 2643 * 2644 * cpu_lock must be held by the caller, and the caller must not be in 2645 * interrupt context. The caller may not hold any locks which are also 2646 * grabbed by any cyclic handler. See "Arguments and notes", above. 2647 */ 2648 void 2649 cyclic_remove(cyclic_id_t id) 2650 { 2651 cyc_id_t *idp = (cyc_id_t *)id; 2652 cyc_id_t *prev = idp->cyi_prev, *next = idp->cyi_next; 2653 cyc_cpu_t *cpu = idp->cyi_cpu; 2654 2655 CYC_PTRACE("remove", idp, idp->cyi_cpu); 2656 ASSERT(MUTEX_HELD(&cpu_lock)); 2657 2658 if (cpu != NULL) { 2659 (void) cyclic_remove_here(cpu, idp->cyi_ndx, NULL, CY_WAIT); 2660 } else { 2661 ASSERT(idp->cyi_omni_list != NULL); 2662 while (idp->cyi_omni_list != NULL) 2663 cyclic_omni_stop(idp, idp->cyi_omni_list->cyo_cpu); 2664 } 2665 2666 if (prev != NULL) { 2667 ASSERT(cyclic_id_head != idp); 2668 prev->cyi_next = next; 2669 } else { 2670 ASSERT(cyclic_id_head == idp); 2671 cyclic_id_head = next; 2672 } 2673 2674 if (next != NULL) 2675 next->cyi_prev = prev; 2676 2677 kmem_cache_free(cyclic_id_cache, idp); 2678 } 2679 2680 /* 2681 * void cyclic_bind(cyclic_id_t, cpu_t *, cpupart_t *) 2682 * 2683 * Overview 2684 * 2685 * cyclic_bind() atomically changes the CPU and CPU partition bindings 2686 * of a cyclic. 2687 * 2688 * Arguments and notes 2689 * 2690 * The first argument is a cyclic_id retuned from cyclic_add(). 2691 * cyclic_bind() may _not_ be called on a cyclic_id returned from 2692 * cyclic_add_omni(). 2693 * 2694 * The second argument specifies the CPU to which to bind the specified 2695 * cyclic. If the specified cyclic is bound to a CPU other than the one 2696 * specified, it will be unbound from its bound CPU. Unbinding the cyclic 2697 * from its CPU may cause it to be juggled to another CPU. If the specified 2698 * CPU is non-NULL, the cyclic will be subsequently rebound to the specified 2699 * CPU. 2700 * 2701 * If a CPU with bound cyclics is transitioned into the P_NOINTR state, 2702 * only cyclics not bound to the CPU can be juggled away; CPU-bound cyclics 2703 * will continue to fire on the P_NOINTR CPU. A CPU with bound cyclics 2704 * cannot be offlined (attempts to offline the CPU will return EBUSY). 2705 * Likewise, cyclics may not be bound to an offline CPU; if the caller 2706 * attempts to bind a cyclic to an offline CPU, the cyclic subsystem will 2707 * panic. 2708 * 2709 * The third argument specifies the CPU partition to which to bind the 2710 * specified cyclic. If the specified cyclic is bound to a CPU partition 2711 * other than the one specified, it will be unbound from its bound 2712 * partition. Unbinding the cyclic from its CPU partition may cause it 2713 * to be juggled to another CPU. If the specified CPU partition is 2714 * non-NULL, the cyclic will be subsequently rebound to the specified CPU 2715 * partition. 2716 * 2717 * It is the caller's responsibility to assure that the specified CPU 2718 * partition contains a CPU. If it does not, the cyclic subsystem will 2719 * panic. A CPU partition with bound cyclics cannot be destroyed (attempts 2720 * to destroy the partition will return EBUSY). If a CPU with 2721 * partition-bound cyclics is transitioned into the P_NOINTR state, cyclics 2722 * bound to the CPU's partition (but not bound to the CPU) will be juggled 2723 * away only if there exists another CPU in the partition in the P_ONLINE 2724 * state. 2725 * 2726 * It is the caller's responsibility to assure that the specified CPU and 2727 * CPU partition are self-consistent. If both parameters are non-NULL, 2728 * and the specified CPU partition does not contain the specified CPU, the 2729 * cyclic subsystem will panic. 2730 * 2731 * It is the caller's responsibility to assure that the specified CPU has 2732 * been configured with respect to the cyclic subsystem. Generally, this 2733 * is always true for valid, on-line CPUs. The only periods of time during 2734 * which this may not be true are during MP boot (i.e. after cyclic_init() 2735 * is called but before cyclic_mp_init() is called) or during dynamic 2736 * reconfiguration; cyclic_bind() should only be called with great care 2737 * from these contexts. 2738 * 2739 * Return value 2740 * 2741 * None; cyclic_bind() always succeeds. 2742 * 2743 * Caller's context 2744 * 2745 * cpu_lock must be held by the caller, and the caller must not be in 2746 * interrupt context. The caller may not hold any locks which are also 2747 * grabbed by any cyclic handler. 2748 */ 2749 void 2750 cyclic_bind(cyclic_id_t id, cpu_t *d, cpupart_t *part) 2751 { 2752 cyc_id_t *idp = (cyc_id_t *)id; 2753 cyc_cpu_t *cpu = idp->cyi_cpu; 2754 cpu_t *c; 2755 uint16_t flags; 2756 2757 CYC_PTRACE("bind", d, part); 2758 ASSERT(MUTEX_HELD(&cpu_lock)); 2759 ASSERT(part == NULL || d == NULL || d->cpu_part == part); 2760 2761 if (cpu == NULL) { 2762 ASSERT(idp->cyi_omni_list != NULL); 2763 panic("attempt to change binding of omnipresent cyclic"); 2764 } 2765 2766 c = cpu->cyp_cpu; 2767 flags = cpu->cyp_cyclics[idp->cyi_ndx].cy_flags; 2768 2769 if (c != d && (flags & CYF_CPU_BOUND)) 2770 cyclic_unbind_cpu(id); 2771 2772 /* 2773 * Reload our cpu (we may have migrated). We don't have to reload 2774 * the flags field here; if we were CYF_PART_BOUND on entry, we are 2775 * CYF_PART_BOUND now. 2776 */ 2777 cpu = idp->cyi_cpu; 2778 c = cpu->cyp_cpu; 2779 2780 if (part != c->cpu_part && (flags & CYF_PART_BOUND)) 2781 cyclic_unbind_cpupart(id); 2782 2783 /* 2784 * Now reload the flags field, asserting that if we are CPU bound, 2785 * the CPU was specified (and likewise, if we are partition bound, 2786 * the partition was specified). 2787 */ 2788 cpu = idp->cyi_cpu; 2789 c = cpu->cyp_cpu; 2790 flags = cpu->cyp_cyclics[idp->cyi_ndx].cy_flags; 2791 ASSERT(!(flags & CYF_CPU_BOUND) || c == d); 2792 ASSERT(!(flags & CYF_PART_BOUND) || c->cpu_part == part); 2793 2794 if (!(flags & CYF_CPU_BOUND) && d != NULL) 2795 cyclic_bind_cpu(id, d); 2796 2797 if (!(flags & CYF_PART_BOUND) && part != NULL) 2798 cyclic_bind_cpupart(id, part); 2799 } 2800 2801 hrtime_t 2802 cyclic_getres() 2803 { 2804 return (cyclic_resolution); 2805 } 2806 2807 void 2808 cyclic_init(cyc_backend_t *be, hrtime_t resolution) 2809 { 2810 ASSERT(MUTEX_HELD(&cpu_lock)); 2811 2812 CYC_PTRACE("init", be, resolution); 2813 cyclic_resolution = resolution; 2814 2815 /* 2816 * Copy the passed cyc_backend into the backend template. This must 2817 * be done before the CPU can be configured. 2818 */ 2819 bcopy(be, &cyclic_backend, sizeof (cyc_backend_t)); 2820 2821 /* 2822 * It's safe to look at the "CPU" pointer without disabling kernel 2823 * preemption; cyclic_init() is called only during startup by the 2824 * cyclic backend. 2825 */ 2826 cyclic_configure(CPU); 2827 cyclic_online(CPU); 2828 } 2829 2830 /* 2831 * It is assumed that cyclic_mp_init() is called some time after cyclic 2832 * init (and therefore, after cpu0 has been initialized). We grab cpu_lock, 2833 * find the already initialized CPU, and initialize every other CPU with the 2834 * same backend. Finally, we register a cpu_setup function. 2835 */ 2836 void 2837 cyclic_mp_init() 2838 { 2839 cpu_t *c; 2840 2841 mutex_enter(&cpu_lock); 2842 2843 c = cpu_list; 2844 do { 2845 if (c->cpu_cyclic == NULL) { 2846 cyclic_configure(c); 2847 cyclic_online(c); 2848 } 2849 } while ((c = c->cpu_next) != cpu_list); 2850 2851 register_cpu_setup_func((cpu_setup_func_t *)cyclic_cpu_setup, NULL); 2852 mutex_exit(&cpu_lock); 2853 } 2854 2855 /* 2856 * int cyclic_juggle(cpu_t *) 2857 * 2858 * Overview 2859 * 2860 * cyclic_juggle() juggles as many cyclics as possible away from the 2861 * specified CPU; all remaining cyclics on the CPU will either be CPU- 2862 * or partition-bound. 2863 * 2864 * Arguments and notes 2865 * 2866 * The only argument to cyclic_juggle() is the CPU from which cyclics 2867 * should be juggled. CPU-bound cyclics are never juggled; partition-bound 2868 * cyclics are only juggled if the specified CPU is in the P_NOINTR state 2869 * and there exists a P_ONLINE CPU in the partition. The cyclic subsystem 2870 * assures that a cyclic will never fire late or spuriously, even while 2871 * being juggled. 2872 * 2873 * Return value 2874 * 2875 * cyclic_juggle() returns a non-zero value if all cyclics were able to 2876 * be juggled away from the CPU, and zero if one or more cyclics could 2877 * not be juggled away. 2878 * 2879 * Caller's context 2880 * 2881 * cpu_lock must be held by the caller, and the caller must not be in 2882 * interrupt context. The caller may not hold any locks which are also 2883 * grabbed by any cyclic handler. While cyclic_juggle() _may_ be called 2884 * in any context satisfying these constraints, it _must_ be called 2885 * immediately after clearing CPU_ENABLE (i.e. before dropping cpu_lock). 2886 * Failure to do so could result in an assertion failure in the cyclic 2887 * subsystem. 2888 */ 2889 int 2890 cyclic_juggle(cpu_t *c) 2891 { 2892 cyc_cpu_t *cpu = c->cpu_cyclic; 2893 cyc_id_t *idp; 2894 int all_juggled = 1; 2895 2896 CYC_PTRACE1("juggle", c); 2897 ASSERT(MUTEX_HELD(&cpu_lock)); 2898 2899 /* 2900 * We'll go through each cyclic on the CPU, attempting to juggle 2901 * each one elsewhere. 2902 */ 2903 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 2904 if (idp->cyi_cpu != cpu) 2905 continue; 2906 2907 if (cyclic_juggle_one(idp) == 0) { 2908 all_juggled = 0; 2909 continue; 2910 } 2911 2912 ASSERT(idp->cyi_cpu != cpu); 2913 } 2914 2915 return (all_juggled); 2916 } 2917 2918 /* 2919 * int cyclic_offline(cpu_t *) 2920 * 2921 * Overview 2922 * 2923 * cyclic_offline() offlines the cyclic subsystem on the specified CPU. 2924 * 2925 * Arguments and notes 2926 * 2927 * The only argument to cyclic_offline() is a CPU to offline. 2928 * cyclic_offline() will attempt to juggle cyclics away from the specified 2929 * CPU. 2930 * 2931 * Return value 2932 * 2933 * cyclic_offline() returns 1 if all cyclics on the CPU were juggled away 2934 * and the cyclic subsystem on the CPU was successfully offlines. 2935 * cyclic_offline returns 0 if some cyclics remain, blocking the cyclic 2936 * offline operation. All remaining cyclics on the CPU will either be 2937 * CPU- or partition-bound. 2938 * 2939 * See the "Arguments and notes" of cyclic_juggle(), below, for more detail 2940 * on cyclic juggling. 2941 * 2942 * Caller's context 2943 * 2944 * The only caller of cyclic_offline() should be the processor management 2945 * subsystem. It is expected that the caller of cyclic_offline() will 2946 * offline the CPU immediately after cyclic_offline() returns success (i.e. 2947 * before dropping cpu_lock). Moreover, it is expected that the caller will 2948 * fail the CPU offline operation if cyclic_offline() returns failure. 2949 */ 2950 int 2951 cyclic_offline(cpu_t *c) 2952 { 2953 cyc_cpu_t *cpu = c->cpu_cyclic; 2954 cyc_id_t *idp; 2955 2956 CYC_PTRACE1("offline", cpu); 2957 ASSERT(MUTEX_HELD(&cpu_lock)); 2958 2959 if (!cyclic_juggle(c)) 2960 return (0); 2961 2962 /* 2963 * This CPU is headed offline; we need to now stop omnipresent 2964 * cyclic firing on this CPU. 2965 */ 2966 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 2967 if (idp->cyi_cpu != NULL) 2968 continue; 2969 2970 /* 2971 * We cannot possibly be offlining the last CPU; cyi_omni_list 2972 * must be non-NULL. 2973 */ 2974 ASSERT(idp->cyi_omni_list != NULL); 2975 cyclic_omni_stop(idp, cpu); 2976 } 2977 2978 ASSERT(cpu->cyp_state == CYS_ONLINE); 2979 cpu->cyp_state = CYS_OFFLINE; 2980 2981 return (1); 2982 } 2983 2984 /* 2985 * void cyclic_online(cpu_t *) 2986 * 2987 * Overview 2988 * 2989 * cyclic_online() onlines a CPU previously offlined with cyclic_offline(). 2990 * 2991 * Arguments and notes 2992 * 2993 * cyclic_online()'s only argument is a CPU to online. The specified 2994 * CPU must have been previously offlined with cyclic_offline(). After 2995 * cyclic_online() returns, the specified CPU will be eligible to execute 2996 * cyclics. 2997 * 2998 * Return value 2999 * 3000 * None; cyclic_online() always succeeds. 3001 * 3002 * Caller's context 3003 * 3004 * cyclic_online() should only be called by the processor management 3005 * subsystem; cpu_lock must be held. 3006 */ 3007 void 3008 cyclic_online(cpu_t *c) 3009 { 3010 cyc_cpu_t *cpu = c->cpu_cyclic; 3011 cyc_id_t *idp; 3012 3013 CYC_PTRACE1("online", cpu); 3014 ASSERT(c->cpu_flags & CPU_ENABLE); 3015 ASSERT(MUTEX_HELD(&cpu_lock)); 3016 ASSERT(cpu->cyp_state == CYS_OFFLINE); 3017 3018 cpu->cyp_state = CYS_ONLINE; 3019 3020 /* 3021 * Now that this CPU is open for business, we need to start firing 3022 * all omnipresent cyclics on it. 3023 */ 3024 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3025 if (idp->cyi_cpu != NULL) 3026 continue; 3027 3028 cyclic_omni_start(idp, cpu); 3029 } 3030 } 3031 3032 /* 3033 * void cyclic_move_in(cpu_t *) 3034 * 3035 * Overview 3036 * 3037 * cyclic_move_in() is called by the CPU partition code immediately after 3038 * the specified CPU has moved into a new partition. 3039 * 3040 * Arguments and notes 3041 * 3042 * The only argument to cyclic_move_in() is a CPU which has moved into a 3043 * new partition. If the specified CPU is P_ONLINE, and every other 3044 * CPU in the specified CPU's new partition is P_NOINTR, cyclic_move_in() 3045 * will juggle all partition-bound, CPU-unbound cyclics to the specified 3046 * CPU. 3047 * 3048 * Return value 3049 * 3050 * None; cyclic_move_in() always succeeds. 3051 * 3052 * Caller's context 3053 * 3054 * cyclic_move_in() should _only_ be called immediately after a CPU has 3055 * moved into a new partition, with cpu_lock held. As with other calls 3056 * into the cyclic subsystem, no lock may be held which is also grabbed 3057 * by any cyclic handler. 3058 */ 3059 void 3060 cyclic_move_in(cpu_t *d) 3061 { 3062 cyc_id_t *idp; 3063 cyc_cpu_t *dest = d->cpu_cyclic; 3064 cyclic_t *cyclic; 3065 cpupart_t *part = d->cpu_part; 3066 3067 CYC_PTRACE("move-in", dest, part); 3068 ASSERT(MUTEX_HELD(&cpu_lock)); 3069 3070 /* 3071 * Look for CYF_PART_BOUND cyclics in the new partition. If 3072 * we find one, check to see if it is currently on a CPU which has 3073 * interrupts disabled. If it is (and if this CPU currently has 3074 * interrupts enabled), we'll juggle those cyclics over here. 3075 */ 3076 if (!(d->cpu_flags & CPU_ENABLE)) { 3077 CYC_PTRACE1("move-in-none", dest); 3078 return; 3079 } 3080 3081 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3082 cyc_cpu_t *cpu = idp->cyi_cpu; 3083 cpu_t *c; 3084 3085 /* 3086 * Omnipresent cyclics are exempt from juggling. 3087 */ 3088 if (cpu == NULL) 3089 continue; 3090 3091 c = cpu->cyp_cpu; 3092 3093 if (c->cpu_part != part || (c->cpu_flags & CPU_ENABLE)) 3094 continue; 3095 3096 cyclic = &cpu->cyp_cyclics[idp->cyi_ndx]; 3097 3098 if (cyclic->cy_flags & CYF_CPU_BOUND) 3099 continue; 3100 3101 /* 3102 * We know that this cyclic is bound to its processor set 3103 * (otherwise, it would not be on a CPU with interrupts 3104 * disabled); juggle it to our CPU. 3105 */ 3106 ASSERT(cyclic->cy_flags & CYF_PART_BOUND); 3107 cyclic_juggle_one_to(idp, dest); 3108 } 3109 3110 CYC_PTRACE1("move-in-done", dest); 3111 } 3112 3113 /* 3114 * int cyclic_move_out(cpu_t *) 3115 * 3116 * Overview 3117 * 3118 * cyclic_move_out() is called by the CPU partition code immediately before 3119 * the specified CPU is to move out of its partition. 3120 * 3121 * Arguments and notes 3122 * 3123 * The only argument to cyclic_move_out() is a CPU which is to move out of 3124 * its partition. 3125 * 3126 * cyclic_move_out() will attempt to juggle away all partition-bound 3127 * cyclics. If the specified CPU is the last CPU in a partition with 3128 * partition-bound cyclics, cyclic_move_out() will fail. If there exists 3129 * a partition-bound cyclic which is CPU-bound to the specified CPU, 3130 * cyclic_move_out() will fail. 3131 * 3132 * Note that cyclic_move_out() will _only_ attempt to juggle away 3133 * partition-bound cyclics; CPU-bound cyclics which are not partition-bound 3134 * and unbound cyclics are not affected by changing the partition 3135 * affiliation of the CPU. 3136 * 3137 * Return value 3138 * 3139 * cyclic_move_out() returns 1 if all partition-bound cyclics on the CPU 3140 * were juggled away; 0 if some cyclics remain. 3141 * 3142 * Caller's context 3143 * 3144 * cyclic_move_out() should _only_ be called immediately before a CPU has 3145 * moved out of its partition, with cpu_lock held. It is expected that 3146 * the caller of cyclic_move_out() will change the processor set affiliation 3147 * of the specified CPU immediately after cyclic_move_out() returns 3148 * success (i.e. before dropping cpu_lock). Moreover, it is expected that 3149 * the caller will fail the CPU repartitioning operation if cyclic_move_out() 3150 * returns failure. As with other calls into the cyclic subsystem, no lock 3151 * may be held which is also grabbed by any cyclic handler. 3152 */ 3153 int 3154 cyclic_move_out(cpu_t *c) 3155 { 3156 cyc_id_t *idp; 3157 cyc_cpu_t *cpu = c->cpu_cyclic, *dest; 3158 cyclic_t *cyclic, *cyclics = cpu->cyp_cyclics; 3159 cpupart_t *part = c->cpu_part; 3160 3161 CYC_PTRACE1("move-out", cpu); 3162 ASSERT(MUTEX_HELD(&cpu_lock)); 3163 3164 /* 3165 * If there are any CYF_PART_BOUND cyclics on this CPU, we need 3166 * to try to juggle them away. 3167 */ 3168 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3169 3170 if (idp->cyi_cpu != cpu) 3171 continue; 3172 3173 cyclic = &cyclics[idp->cyi_ndx]; 3174 3175 if (!(cyclic->cy_flags & CYF_PART_BOUND)) 3176 continue; 3177 3178 dest = cyclic_pick_cpu(part, c, c, cyclic->cy_flags); 3179 3180 if (dest == NULL) { 3181 /* 3182 * We can't juggle this cyclic; we need to return 3183 * failure (we won't bother trying to juggle away 3184 * other cyclics). 3185 */ 3186 CYC_PTRACE("move-out-fail", cpu, idp); 3187 return (0); 3188 } 3189 cyclic_juggle_one_to(idp, dest); 3190 } 3191 3192 CYC_PTRACE1("move-out-done", cpu); 3193 return (1); 3194 } 3195 3196 /* 3197 * void cyclic_suspend() 3198 * 3199 * Overview 3200 * 3201 * cyclic_suspend() suspends all cyclic activity throughout the cyclic 3202 * subsystem. It should be called only by subsystems which are attempting 3203 * to suspend the entire system (e.g. checkpoint/resume, dynamic 3204 * reconfiguration). 3205 * 3206 * Arguments and notes 3207 * 3208 * cyclic_suspend() takes no arguments. Each CPU with an active cyclic 3209 * disables its backend (offline CPUs disable their backends as part of 3210 * the cyclic_offline() operation), thereby disabling future CY_HIGH_LEVEL 3211 * interrupts. 3212 * 3213 * Note that disabling CY_HIGH_LEVEL interrupts does not completely preclude 3214 * cyclic handlers from being called after cyclic_suspend() returns: if a 3215 * CY_LOCK_LEVEL or CY_LOW_LEVEL interrupt thread was blocked at the time 3216 * of cyclic_suspend(), cyclic handlers at its level may continue to be 3217 * called after the interrupt thread becomes unblocked. The 3218 * post-cyclic_suspend() activity is bounded by the pend count on all 3219 * cyclics at the time of cyclic_suspend(). Callers concerned with more 3220 * than simply disabling future CY_HIGH_LEVEL interrupts must check for 3221 * this condition. 3222 * 3223 * On most platforms, timestamps from gethrtime() and gethrestime() are not 3224 * guaranteed to monotonically increase between cyclic_suspend() and 3225 * cyclic_resume(). However, timestamps are guaranteed to monotonically 3226 * increase across the entire cyclic_suspend()/cyclic_resume() operation. 3227 * That is, every timestamp obtained before cyclic_suspend() will be less 3228 * than every timestamp obtained after cyclic_resume(). 3229 * 3230 * Return value 3231 * 3232 * None; cyclic_suspend() always succeeds. 3233 * 3234 * Caller's context 3235 * 3236 * The cyclic subsystem must be configured on every valid CPU; 3237 * cyclic_suspend() may not be called during boot or during dynamic 3238 * reconfiguration. Additionally, cpu_lock must be held, and the caller 3239 * cannot be in high-level interrupt context. However, unlike most other 3240 * cyclic entry points, cyclic_suspend() may be called with locks held 3241 * which are also acquired by CY_LOCK_LEVEL or CY_LOW_LEVEL cyclic 3242 * handlers. 3243 */ 3244 void 3245 cyclic_suspend() 3246 { 3247 cpu_t *c; 3248 cyc_cpu_t *cpu; 3249 cyc_xcallarg_t arg; 3250 cyc_backend_t *be; 3251 3252 CYC_PTRACE0("suspend"); 3253 ASSERT(MUTEX_HELD(&cpu_lock)); 3254 c = cpu_list; 3255 3256 do { 3257 cpu = c->cpu_cyclic; 3258 be = cpu->cyp_backend; 3259 arg.cyx_cpu = cpu; 3260 3261 be->cyb_xcall(be->cyb_arg, c, 3262 (cyc_func_t)cyclic_suspend_xcall, &arg); 3263 } while ((c = c->cpu_next) != cpu_list); 3264 } 3265 3266 /* 3267 * void cyclic_resume() 3268 * 3269 * cyclic_resume() resumes all cyclic activity throughout the cyclic 3270 * subsystem. It should be called only by system-suspending subsystems. 3271 * 3272 * Arguments and notes 3273 * 3274 * cyclic_resume() takes no arguments. Each CPU with an active cyclic 3275 * reenables and reprograms its backend (offline CPUs are not reenabled). 3276 * On most platforms, timestamps from gethrtime() and gethrestime() are not 3277 * guaranteed to monotonically increase between cyclic_suspend() and 3278 * cyclic_resume(). However, timestamps are guaranteed to monotonically 3279 * increase across the entire cyclic_suspend()/cyclic_resume() operation. 3280 * That is, every timestamp obtained before cyclic_suspend() will be less 3281 * than every timestamp obtained after cyclic_resume(). 3282 * 3283 * Return value 3284 * 3285 * None; cyclic_resume() always succeeds. 3286 * 3287 * Caller's context 3288 * 3289 * The cyclic subsystem must be configured on every valid CPU; 3290 * cyclic_resume() may not be called during boot or during dynamic 3291 * reconfiguration. Additionally, cpu_lock must be held, and the caller 3292 * cannot be in high-level interrupt context. However, unlike most other 3293 * cyclic entry points, cyclic_resume() may be called with locks held which 3294 * are also acquired by CY_LOCK_LEVEL or CY_LOW_LEVEL cyclic handlers. 3295 */ 3296 void 3297 cyclic_resume() 3298 { 3299 cpu_t *c; 3300 cyc_cpu_t *cpu; 3301 cyc_xcallarg_t arg; 3302 cyc_backend_t *be; 3303 3304 CYC_PTRACE0("resume"); 3305 ASSERT(MUTEX_HELD(&cpu_lock)); 3306 3307 c = cpu_list; 3308 3309 do { 3310 cpu = c->cpu_cyclic; 3311 be = cpu->cyp_backend; 3312 arg.cyx_cpu = cpu; 3313 3314 be->cyb_xcall(be->cyb_arg, c, 3315 (cyc_func_t)cyclic_resume_xcall, &arg); 3316 } while ((c = c->cpu_next) != cpu_list); 3317 } 3318