1 /* 2 * CDDL HEADER START 3 * 4 * The contents of this file are subject to the terms of the 5 * Common Development and Distribution License (the "License"). 6 * You may not use this file except in compliance with the License. 7 * 8 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE 9 * or http://www.opensolaris.org/os/licensing. 10 * See the License for the specific language governing permissions 11 * and limitations under the License. 12 * 13 * When distributing Covered Code, include this CDDL HEADER in each 14 * file and include the License file at usr/src/OPENSOLARIS.LICENSE. 15 * If applicable, add the following below this CDDL HEADER, with the 16 * fields enclosed by brackets "[]" replaced with your own identifying 17 * information: Portions Copyright [yyyy] [name of copyright owner] 18 * 19 * CDDL HEADER END 20 */ 21 /* 22 * Copyright 2008 Sun Microsystems, Inc. All rights reserved. 23 * Use is subject to license terms. 24 */ 25 26 /* 27 * Copyright 2019 Joyent, Inc. 28 */ 29 30 /* 31 * The Cyclic Subsystem 32 * -------------------- 33 * 34 * Prehistory 35 * 36 * Historically, most computer architectures have specified interval-based 37 * timer parts (e.g. SPARCstation's counter/timer; Intel's i8254). While 38 * these parts deal in relative (i.e. not absolute) time values, they are 39 * typically used by the operating system to implement the abstraction of 40 * absolute time. As a result, these parts cannot typically be reprogrammed 41 * without introducing error in the system's notion of time. 42 * 43 * Starting in about 1994, chip architectures began specifying high resolution 44 * timestamp registers. As of this writing (1999), all major chip families 45 * (UltraSPARC, PentiumPro, MIPS, PowerPC, Alpha) have high resolution 46 * timestamp registers, and two (UltraSPARC and MIPS) have added the capacity 47 * to interrupt based on timestamp values. These timestamp-compare registers 48 * present a time-based interrupt source which can be reprogrammed arbitrarily 49 * often without introducing error. Given the low cost of implementing such a 50 * timestamp-compare register (and the tangible benefit of eliminating 51 * discrete timer parts), it is reasonable to expect that future chip 52 * architectures will adopt this feature. 53 * 54 * The cyclic subsystem has been designed to take advantage of chip 55 * architectures with the capacity to interrupt based on absolute, high 56 * resolution values of time. 57 * 58 * Subsystem Overview 59 * 60 * The cyclic subsystem is a low-level kernel subsystem designed to provide 61 * arbitrarily high resolution, per-CPU interval timers (to avoid colliding 62 * with existing terms, we dub such an interval timer a "cyclic"). Cyclics 63 * can be specified to fire at high, lock or low interrupt level, and may be 64 * optionally bound to a CPU or a CPU partition. A cyclic's CPU or CPU 65 * partition binding may be changed dynamically; the cyclic will be "juggled" 66 * to a CPU which satisfies the new binding. Alternatively, a cyclic may 67 * be specified to be "omnipresent", denoting firing on all online CPUs. 68 * 69 * Cyclic Subsystem Interface Overview 70 * ----------------------------------- 71 * 72 * The cyclic subsystem has interfaces with the kernel at-large, with other 73 * kernel subsystems (e.g. the processor management subsystem, the checkpoint 74 * resume subsystem) and with the platform (the cyclic backend). Each 75 * of these interfaces is given a brief synopsis here, and is described 76 * in full above the interface's implementation. 77 * 78 * The following diagram displays the cyclic subsystem's interfaces to 79 * other kernel components. The arrows denote a "calls" relationship, with 80 * the large arrow indicating the cyclic subsystem's consumer interface. 81 * Each arrow is labeled with the section in which the corresponding 82 * interface is described. 83 * 84 * Kernel at-large consumers 85 * -----------++------------ 86 * || 87 * || 88 * _||_ 89 * \ / 90 * \/ 91 * +---------------------+ 92 * | | 93 * | Cyclic subsystem |<----------- Other kernel subsystems 94 * | | 95 * +---------------------+ 96 * ^ | 97 * | | 98 * | | 99 * | v 100 * +---------------------+ 101 * | | 102 * | Cyclic backend | 103 * | (platform specific) | 104 * | | 105 * +---------------------+ 106 * 107 * 108 * Kernel At-Large Interfaces 109 * 110 * cyclic_add() <-- Creates a cyclic 111 * cyclic_add_omni() <-- Creates an omnipresent cyclic 112 * cyclic_remove() <-- Removes a cyclic 113 * cyclic_bind() <-- Change a cyclic's CPU or partition binding 114 * cyclic_reprogram() <-- Reprogram a cyclic's expiration 115 * cyclic_move_here() <-- Shuffle cyclic to current CPU 116 * 117 * Inter-subsystem Interfaces 118 * 119 * cyclic_juggle() <-- Juggles cyclics away from a CPU 120 * cyclic_offline() <-- Offlines cyclic operation on a CPU 121 * cyclic_online() <-- Reenables operation on an offlined CPU 122 * cyclic_move_in() <-- Notifies subsystem of change in CPU partition 123 * cyclic_move_out() <-- Notifies subsystem of change in CPU partition 124 * cyclic_suspend() <-- Suspends the cyclic subsystem on all CPUs 125 * cyclic_resume() <-- Resumes the cyclic subsystem on all CPUs 126 * 127 * Backend Interfaces 128 * 129 * cyclic_init() <-- Initializes the cyclic subsystem 130 * cyclic_fire() <-- CY_HIGH_LEVEL interrupt entry point 131 * cyclic_softint() <-- CY_LOCK/LOW_LEVEL soft interrupt entry point 132 * 133 * The backend-supplied interfaces (through the cyc_backend structure) are 134 * documented in detail in <sys/cyclic_impl.h> 135 * 136 * 137 * Cyclic Subsystem Implementation Overview 138 * ---------------------------------------- 139 * 140 * The cyclic subsystem is designed to minimize interference between cyclics 141 * on different CPUs. Thus, all of the cyclic subsystem's data structures 142 * hang off of a per-CPU structure, cyc_cpu. 143 * 144 * Each cyc_cpu has a power-of-two sized array of cyclic structures (the 145 * cyp_cyclics member of the cyc_cpu structure). If cyclic_add() is called 146 * and there does not exist a free slot in the cyp_cyclics array, the size of 147 * the array will be doubled. The array will never shrink. Cyclics are 148 * referred to by their index in the cyp_cyclics array, which is of type 149 * cyc_index_t. 150 * 151 * The cyclics are kept sorted by expiration time in the cyc_cpu's heap. The 152 * heap is keyed by cyclic expiration time, with parents expiring earlier 153 * than their children. 154 * 155 * Heap Management 156 * 157 * The heap is managed primarily by cyclic_fire(). Upon entry, cyclic_fire() 158 * compares the root cyclic's expiration time to the current time. If the 159 * expiration time is in the past, cyclic_expire() is called on the root 160 * cyclic. Upon return from cyclic_expire(), the cyclic's new expiration time 161 * is derived by adding its interval to its old expiration time, and a 162 * downheap operation is performed. After the downheap, cyclic_fire() 163 * examines the (potentially changed) root cyclic, repeating the 164 * cyclic_expire()/add interval/cyclic_downheap() sequence until the root 165 * cyclic has an expiration time in the future. This expiration time 166 * (guaranteed to be the earliest in the heap) is then communicated to the 167 * backend via cyb_reprogram. Optimal backends will next call cyclic_fire() 168 * shortly after the root cyclic's expiration time. 169 * 170 * To allow efficient, deterministic downheap operations, we implement the 171 * heap as an array (the cyp_heap member of the cyc_cpu structure), with each 172 * element containing an index into the CPU's cyp_cyclics array. 173 * 174 * The heap is laid out in the array according to the following: 175 * 176 * 1. The root of the heap is always in the 0th element of the heap array 177 * 2. The left and right children of the nth element are element 178 * (((n + 1) << 1) - 1) and element ((n + 1) << 1), respectively. 179 * 180 * This layout is standard (see, e.g., Cormen's "Algorithms"); the proof 181 * that these constraints correctly lay out a heap (or indeed, any binary 182 * tree) is trivial and left to the reader. 183 * 184 * To see the heap by example, assume our cyclics array has the following 185 * members (at time t): 186 * 187 * cy_handler cy_level cy_expire 188 * --------------------------------------------- 189 * [ 0] clock() LOCK t+10000000 190 * [ 1] deadman() HIGH t+1000000000 191 * [ 2] clock_highres_fire() LOW t+100 192 * [ 3] clock_highres_fire() LOW t+1000 193 * [ 4] clock_highres_fire() LOW t+500 194 * [ 5] (free) -- -- 195 * [ 6] (free) -- -- 196 * [ 7] (free) -- -- 197 * 198 * The heap array could be: 199 * 200 * [0] [1] [2] [3] [4] [5] [6] [7] 201 * +-----+-----+-----+-----+-----+-----+-----+-----+ 202 * | | | | | | | | | 203 * | 2 | 3 | 4 | 0 | 1 | x | x | x | 204 * | | | | | | | | | 205 * +-----+-----+-----+-----+-----+-----+-----+-----+ 206 * 207 * Graphically, this array corresponds to the following (excuse the ASCII art): 208 * 209 * 2 210 * | 211 * +------------------+------------------+ 212 * 3 4 213 * | 214 * +---------+--------+ 215 * 0 1 216 * 217 * Note that the heap is laid out by layer: all nodes at a given depth are 218 * stored in consecutive elements of the array. Moreover, layers of 219 * consecutive depths are in adjacent element ranges. This property 220 * guarantees high locality of reference during downheap operations. 221 * Specifically, we are guaranteed that we can downheap to a depth of 222 * 223 * lg (cache_line_size / sizeof (cyc_index_t)) 224 * 225 * nodes with at most one cache miss. On UltraSPARC (64 byte e-cache line 226 * size), this corresponds to a depth of four nodes. Thus, if there are 227 * fewer than sixteen cyclics in the heap, downheaps on UltraSPARC miss at 228 * most once in the e-cache. 229 * 230 * Downheaps are required to compare siblings as they proceed down the 231 * heap. For downheaps proceeding beyond the one-cache-miss depth, every 232 * access to a left child could potentially miss in the cache. However, 233 * if we assume 234 * 235 * (cache_line_size / sizeof (cyc_index_t)) > 2, 236 * 237 * then all siblings are guaranteed to be on the same cache line. Thus, the 238 * miss on the left child will guarantee a hit on the right child; downheaps 239 * will incur at most one cache miss per layer beyond the one-cache-miss 240 * depth. The total number of cache misses for heap management during a 241 * downheap operation is thus bounded by 242 * 243 * lg (n) - lg (cache_line_size / sizeof (cyc_index_t)) 244 * 245 * Traditional pointer-based heaps are implemented without regard to 246 * locality. Downheaps can thus incur two cache misses per layer (one for 247 * each child), but at most one cache miss at the root. This yields a bound 248 * of 249 * 250 * 2 * lg (n) - 1 251 * 252 * on the total cache misses. 253 * 254 * This difference may seem theoretically trivial (the difference is, after 255 * all, constant), but can become substantial in practice -- especially for 256 * caches with very large cache lines and high miss penalties (e.g. TLBs). 257 * 258 * Heaps must always be full, balanced trees. Heap management must therefore 259 * track the next point-of-insertion into the heap. In pointer-based heaps, 260 * recomputing this point takes O(lg (n)). Given the layout of the 261 * array-based implementation, however, the next point-of-insertion is 262 * always: 263 * 264 * heap[number_of_elements] 265 * 266 * We exploit this property by implementing the free-list in the usused 267 * heap elements. Heap insertion, therefore, consists only of filling in 268 * the cyclic at cyp_cyclics[cyp_heap[number_of_elements]], incrementing 269 * the number of elements, and performing an upheap. Heap deletion consists 270 * of decrementing the number of elements, swapping the to-be-deleted element 271 * with the element at cyp_heap[number_of_elements], and downheaping. 272 * 273 * Filling in more details in our earlier example: 274 * 275 * +--- free list head 276 * | 277 * V 278 * 279 * [0] [1] [2] [3] [4] [5] [6] [7] 280 * +-----+-----+-----+-----+-----+-----+-----+-----+ 281 * | | | | | | | | | 282 * | 2 | 3 | 4 | 0 | 1 | 5 | 6 | 7 | 283 * | | | | | | | | | 284 * +-----+-----+-----+-----+-----+-----+-----+-----+ 285 * 286 * To insert into this heap, we would just need to fill in the cyclic at 287 * cyp_cyclics[5], bump the number of elements (from 5 to 6) and perform 288 * an upheap. 289 * 290 * If we wanted to remove, say, cyp_cyclics[3], we would first scan for it 291 * in the cyp_heap, and discover it at cyp_heap[1]. We would then decrement 292 * the number of elements (from 5 to 4), swap cyp_heap[1] with cyp_heap[4], 293 * and perform a downheap from cyp_heap[1]. The linear scan is required 294 * because the cyclic does not keep a backpointer into the heap. This makes 295 * heap manipulation (e.g. downheaps) faster at the expense of removal 296 * operations. 297 * 298 * Expiry processing 299 * 300 * As alluded to above, cyclic_expire() is called by cyclic_fire() at 301 * CY_HIGH_LEVEL to expire a cyclic. Cyclic subsystem consumers are 302 * guaranteed that for an arbitrary time t in the future, their cyclic 303 * handler will have been called (t - cyt_when) / cyt_interval times. Thus, 304 * there must be a one-to-one mapping between a cyclic's expiration at 305 * CY_HIGH_LEVEL and its execution at the desired level (either CY_HIGH_LEVEL, 306 * CY_LOCK_LEVEL or CY_LOW_LEVEL). 307 * 308 * For CY_HIGH_LEVEL cyclics, this is trivial; cyclic_expire() simply needs 309 * to call the handler. 310 * 311 * For CY_LOCK_LEVEL and CY_LOW_LEVEL cyclics, however, there exists a 312 * potential disconnect: if the CPU is at an interrupt level less than 313 * CY_HIGH_LEVEL but greater than the level of a cyclic for a period of 314 * time longer than twice the cyclic's interval, the cyclic will be expired 315 * twice before it can be handled. 316 * 317 * To maintain the one-to-one mapping, we track the difference between the 318 * number of times a cyclic has been expired and the number of times it's 319 * been handled in a "pending count" (the cy_pend field of the cyclic 320 * structure). cyclic_expire() thus increments the cy_pend count for the 321 * expired cyclic and posts a soft interrupt at the desired level. In the 322 * cyclic subsystem's soft interrupt handler, cyclic_softint(), we repeatedly 323 * call the cyclic handler and decrement cy_pend until we have decremented 324 * cy_pend to zero. 325 * 326 * The Producer/Consumer Buffer 327 * 328 * If we wish to avoid a linear scan of the cyclics array at soft interrupt 329 * level, cyclic_softint() must be able to quickly determine which cyclics 330 * have a non-zero cy_pend count. We thus introduce a per-soft interrupt 331 * level producer/consumer buffer shared with CY_HIGH_LEVEL. These buffers 332 * are encapsulated in the cyc_pcbuffer structure, and, like cyp_heap, are 333 * implemented as cyc_index_t arrays (the cypc_buf member of the cyc_pcbuffer 334 * structure). 335 * 336 * The producer (cyclic_expire() running at CY_HIGH_LEVEL) enqueues a cyclic 337 * by storing the cyclic's index to cypc_buf[cypc_prodndx] and incrementing 338 * cypc_prodndx. The consumer (cyclic_softint() running at either 339 * CY_LOCK_LEVEL or CY_LOW_LEVEL) dequeues a cyclic by loading from 340 * cypc_buf[cypc_consndx] and bumping cypc_consndx. The buffer is empty when 341 * cypc_prodndx == cypc_consndx. 342 * 343 * To bound the size of the producer/consumer buffer, cyclic_expire() only 344 * enqueues a cyclic if its cy_pend was zero (if the cyclic's cy_pend is 345 * non-zero, cyclic_expire() only bumps cy_pend). Symmetrically, 346 * cyclic_softint() only consumes a cyclic after it has decremented the 347 * cy_pend count to zero. 348 * 349 * Returning to our example, here is what the CY_LOW_LEVEL producer/consumer 350 * buffer might look like: 351 * 352 * cypc_consndx ---+ +--- cypc_prodndx 353 * | | 354 * V V 355 * 356 * [0] [1] [2] [3] [4] [5] [6] [7] 357 * +-----+-----+-----+-----+-----+-----+-----+-----+ 358 * | | | | | | | | | 359 * | x | x | 3 | 2 | 4 | x | x | x | <== cypc_buf 360 * | | | . | . | . | | | | 361 * +-----+-----+- | -+- | -+- | -+-----+-----+-----+ 362 * | | | 363 * | | | cy_pend cy_handler 364 * | | | ------------------------- 365 * | | | [ 0] 1 clock() 366 * | | | [ 1] 0 deadman() 367 * | +---- | -------> [ 2] 3 clock_highres_fire() 368 * +---------- | -------> [ 3] 1 clock_highres_fire() 369 * +--------> [ 4] 1 clock_highres_fire() 370 * [ 5] - (free) 371 * [ 6] - (free) 372 * [ 7] - (free) 373 * 374 * In particular, note that clock()'s cy_pend is 1 but that it is _not_ in 375 * this producer/consumer buffer; it would be enqueued in the CY_LOCK_LEVEL 376 * producer/consumer buffer. 377 * 378 * Locking 379 * 380 * Traditionally, access to per-CPU data structures shared between 381 * interrupt levels is serialized by manipulating programmable interrupt 382 * level: readers and writers are required to raise their interrupt level 383 * to that of the highest level writer. 384 * 385 * For the producer/consumer buffers (shared between cyclic_fire()/ 386 * cyclic_expire() executing at CY_HIGH_LEVEL and cyclic_softint() executing 387 * at one of CY_LOCK_LEVEL or CY_LOW_LEVEL), forcing cyclic_softint() to raise 388 * programmable interrupt level is undesirable: aside from the additional 389 * latency incurred by manipulating interrupt level in the hot cy_pend 390 * processing path, this would create the potential for soft level cy_pend 391 * processing to delay CY_HIGH_LEVEL firing and expiry processing. 392 * CY_LOCK/LOW_LEVEL cyclics could thereby induce jitter in CY_HIGH_LEVEL 393 * cyclics. 394 * 395 * To minimize jitter, then, we would like the cyclic_fire()/cyclic_expire() 396 * and cyclic_softint() code paths to be lock-free. 397 * 398 * For cyclic_fire()/cyclic_expire(), lock-free execution is straightforward: 399 * because these routines execute at a higher interrupt level than 400 * cyclic_softint(), their actions on the producer/consumer buffer appear 401 * atomic. In particular, the increment of cy_pend appears to occur 402 * atomically with the increment of cypc_prodndx. 403 * 404 * For cyclic_softint(), however, lock-free execution requires more delicacy. 405 * When cyclic_softint() discovers a cyclic in the producer/consumer buffer, 406 * it calls the cyclic's handler and attempts to atomically decrement the 407 * cy_pend count with a compare&swap operation. 408 * 409 * If the compare&swap operation succeeds, cyclic_softint() behaves 410 * conditionally based on the value it atomically wrote to cy_pend: 411 * 412 * - If the cy_pend was decremented to 0, the cyclic has been consumed; 413 * cyclic_softint() increments the cypc_consndx and checks for more 414 * enqueued work. 415 * 416 * - If the count was decremented to a non-zero value, there is more work 417 * to be done on the cyclic; cyclic_softint() calls the cyclic handler 418 * and repeats the atomic decrement process. 419 * 420 * If the compare&swap operation fails, cyclic_softint() knows that 421 * cyclic_expire() has intervened and bumped the cy_pend count (resizes 422 * and removals complicate this, however -- see the sections on their 423 * operation, below). cyclic_softint() thus reloads cy_pend, and re-attempts 424 * the atomic decrement. 425 * 426 * Recall that we bound the size of the producer/consumer buffer by 427 * having cyclic_expire() only enqueue the specified cyclic if its 428 * cy_pend count is zero; this assures that each cyclic is enqueued at 429 * most once. This leads to a critical constraint on cyclic_softint(), 430 * however: after the compare&swap operation which successfully decrements 431 * cy_pend to zero, cyclic_softint() must _not_ re-examine the consumed 432 * cyclic. In part to obey this constraint, cyclic_softint() calls the 433 * cyclic handler before decrementing cy_pend. 434 * 435 * Resizing 436 * 437 * All of the discussion thus far has assumed a static number of cyclics. 438 * Obviously, static limitations are not practical; we need the capacity 439 * to resize our data structures dynamically. 440 * 441 * We resize our data structures lazily, and only on a per-CPU basis. 442 * The size of the data structures always doubles and never shrinks. We 443 * serialize adds (and thus resizes) on cpu_lock; we never need to deal 444 * with concurrent resizes. Resizes should be rare; they may induce jitter 445 * on the CPU being resized, but should not affect cyclic operation on other 446 * CPUs. Pending cyclics may not be dropped during a resize operation. 447 * 448 * Three key cyc_cpu data structures need to be resized: the cyclics array, 449 * the heap array and the producer/consumer buffers. Resizing the first two 450 * is relatively straightforward: 451 * 452 * 1. The new, larger arrays are allocated in cyclic_expand() (called 453 * from cyclic_add()). 454 * 2. cyclic_expand() cross calls cyclic_expand_xcall() on the CPU 455 * undergoing the resize. 456 * 3. cyclic_expand_xcall() raises interrupt level to CY_HIGH_LEVEL 457 * 4. The contents of the old arrays are copied into the new arrays. 458 * 5. The old cyclics array is bzero()'d 459 * 6. The pointers are updated. 460 * 461 * The producer/consumer buffer is dicier: cyclic_expand_xcall() may have 462 * interrupted cyclic_softint() in the middle of consumption. To resize the 463 * producer/consumer buffer, we implement up to two buffers per soft interrupt 464 * level: a hard buffer (the buffer being produced into by cyclic_expire()) 465 * and a soft buffer (the buffer from which cyclic_softint() is consuming). 466 * During normal operation, the hard buffer and soft buffer point to the 467 * same underlying producer/consumer buffer. 468 * 469 * During a resize, however, cyclic_expand_xcall() changes the hard buffer 470 * to point to the new, larger producer/consumer buffer; all future 471 * cyclic_expire()'s will produce into the new buffer. cyclic_expand_xcall() 472 * then posts a CY_LOCK_LEVEL soft interrupt, landing in cyclic_softint(). 473 * 474 * As under normal operation, cyclic_softint() will consume cyclics from 475 * its soft buffer. After the soft buffer is drained, however, 476 * cyclic_softint() will see that the hard buffer has changed. At that time, 477 * cyclic_softint() will change its soft buffer to point to the hard buffer, 478 * and repeat the producer/consumer buffer draining procedure. 479 * 480 * After the new buffer is drained, cyclic_softint() will determine if both 481 * soft levels have seen their new producer/consumer buffer. If both have, 482 * cyclic_softint() will post on the semaphore cyp_modify_wait. If not, a 483 * soft interrupt will be generated for the remaining level. 484 * 485 * cyclic_expand() blocks on the cyp_modify_wait semaphore (a semaphore is 486 * used instead of a condition variable because of the race between the 487 * sema_p() in cyclic_expand() and the sema_v() in cyclic_softint()). This 488 * allows cyclic_expand() to know when the resize operation is complete; 489 * all of the old buffers (the heap, the cyclics array and the producer/ 490 * consumer buffers) can be freed. 491 * 492 * A final caveat on resizing: we described step (5) in the 493 * cyclic_expand_xcall() procedure without providing any motivation. This 494 * step addresses the problem of a cyclic_softint() attempting to decrement 495 * a cy_pend count while interrupted by a cyclic_expand_xcall(). Because 496 * cyclic_softint() has already called the handler by the time cy_pend is 497 * decremented, we want to assure that it doesn't decrement a cy_pend 498 * count in the old cyclics array. By zeroing the old cyclics array in 499 * cyclic_expand_xcall(), we are zeroing out every cy_pend count; when 500 * cyclic_softint() attempts to compare&swap on the cy_pend count, it will 501 * fail and recognize that the count has been zeroed. cyclic_softint() will 502 * update its stale copy of the cyp_cyclics pointer, re-read the cy_pend 503 * count from the new cyclics array, and re-attempt the compare&swap. 504 * 505 * Removals 506 * 507 * Cyclic removals should be rare. To simplify the implementation (and to 508 * allow optimization for the cyclic_fire()/cyclic_expire()/cyclic_softint() 509 * path), we force removals and adds to serialize on cpu_lock. 510 * 511 * Cyclic removal is complicated by a guarantee made to the consumer of 512 * the cyclic subsystem: after cyclic_remove() returns, the cyclic handler 513 * has returned and will never again be called. 514 * 515 * Here is the procedure for cyclic removal: 516 * 517 * 1. cyclic_remove() calls cyclic_remove_xcall() on the CPU undergoing 518 * the removal. 519 * 2. cyclic_remove_xcall() raises interrupt level to CY_HIGH_LEVEL 520 * 3. The current expiration time for the removed cyclic is recorded. 521 * 4. If the cy_pend count on the removed cyclic is non-zero, it 522 * is copied into cyp_rpend and subsequently zeroed. 523 * 5. The cyclic is removed from the heap 524 * 6. If the root of the heap has changed, the backend is reprogrammed. 525 * 7. If the cy_pend count was non-zero cyclic_remove() blocks on the 526 * cyp_modify_wait semaphore. 527 * 528 * The motivation for step (3) is explained in "Juggling", below. 529 * 530 * The cy_pend count is decremented in cyclic_softint() after the cyclic 531 * handler returns. Thus, if we find a cy_pend count of zero in step 532 * (4), we know that cyclic_remove() doesn't need to block. 533 * 534 * If the cy_pend count is non-zero, however, we must block in cyclic_remove() 535 * until cyclic_softint() has finished calling the cyclic handler. To let 536 * cyclic_softint() know that this cyclic has been removed, we zero the 537 * cy_pend count. This will cause cyclic_softint()'s compare&swap to fail. 538 * When cyclic_softint() sees the zero cy_pend count, it knows that it's been 539 * caught during a resize (see "Resizing", above) or that the cyclic has been 540 * removed. In the latter case, it calls cyclic_remove_pend() to call the 541 * cyclic handler cyp_rpend - 1 times, and posts on cyp_modify_wait. 542 * 543 * Juggling 544 * 545 * At first glance, cyclic juggling seems to be a difficult problem. The 546 * subsystem must guarantee that a cyclic doesn't execute simultaneously on 547 * different CPUs, while also assuring that a cyclic fires exactly once 548 * per interval. We solve this problem by leveraging a property of the 549 * platform: gethrtime() is required to increase in lock-step across 550 * multiple CPUs. Therefore, to juggle a cyclic, we remove it from its 551 * CPU, recording its expiration time in the remove cross call (step (3) 552 * in "Removing", above). We then add the cyclic to the new CPU, explicitly 553 * setting its expiration time to the time recorded in the removal. This 554 * leverages the existing cyclic expiry processing, which will compensate 555 * for any time lost while juggling. 556 * 557 * Reprogramming 558 * 559 * Normally, after a cyclic fires, its next expiration is computed from 560 * the current time and the cyclic interval. But there are situations when 561 * the next expiration needs to be reprogrammed by the kernel subsystem that 562 * is using the cyclic. cyclic_reprogram() allows this to be done. This, 563 * unlike the other kernel at-large cyclic API functions, is permitted to 564 * be called from the cyclic handler. This is because it does not use the 565 * cpu_lock to serialize access. 566 * 567 * When cyclic_reprogram() is called for an omni-cyclic, the operation is 568 * applied to the omni-cyclic's component on the current CPU. 569 * 570 * If a high-level cyclic handler reprograms its own cyclic, then 571 * cyclic_fire() detects that and does not recompute the cyclic's next 572 * expiration. However, for a lock-level or a low-level cyclic, the 573 * actual cyclic handler will execute at the lower PIL only after 574 * cyclic_fire() is done with all expired cyclics. To deal with this, such 575 * cyclics can be specified with a special interval of CY_INFINITY (INT64_MAX). 576 * cyclic_fire() recognizes this special value and recomputes the next 577 * expiration to CY_INFINITY. This effectively moves the cyclic to the 578 * bottom of the heap and prevents it from going off until its handler has 579 * had a chance to reprogram it. Infact, this is the way to create and reuse 580 * "one-shot" timers in the context of the cyclic subsystem without using 581 * cyclic_remove(). 582 * 583 * Here is the procedure for cyclic reprogramming: 584 * 585 * 1. cyclic_reprogram() calls cyclic_reprogram_xcall() on the CPU 586 * that houses the cyclic. 587 * 2. cyclic_reprogram_xcall() raises interrupt level to CY_HIGH_LEVEL 588 * 3. The cyclic is located in the cyclic heap. The search for this is 589 * done from the bottom of the heap to the top as reprogrammable cyclics 590 * would be located closer to the bottom than the top. 591 * 4. The cyclic expiration is set and the cyclic is moved to its 592 * correct position in the heap (up or down depending on whether the 593 * new expiration is less than or greater than the old one). 594 * 5. If the cyclic move modified the root of the heap, the backend is 595 * reprogrammed. 596 * 597 * Reprogramming can be a frequent event (see the callout subsystem). So, 598 * the serialization used has to be efficient. As with all other cyclic 599 * operations, the interrupt level is raised during reprogramming. Plus, 600 * during reprogramming, the cyclic must not be juggled (regular cyclic) 601 * or stopped (omni-cyclic). The implementation defines a per-cyclic 602 * reader-writer lock to accomplish this. This lock is acquired in the 603 * reader mode by cyclic_reprogram() and writer mode by cyclic_juggle() and 604 * cyclic_omni_stop(). The reader-writer lock makes it efficient if 605 * an omni-cyclic is reprogrammed on different CPUs frequently. 606 * 607 * Note that since the cpu_lock is not used during reprogramming, it is 608 * the responsibility of the user of the reprogrammable cyclic to make sure 609 * that the cyclic is not removed via cyclic_remove() during reprogramming. 610 * This is not an unreasonable requirement as the user will typically have 611 * some sort of synchronization for its cyclic-related activities. This 612 * little caveat exists because the cyclic ID is not really an ID. It is 613 * implemented as a pointer to a structure. 614 * 615 * For cyclics which reprogram themselves during their own handler function, 616 * avoiding the potential race with cyclic_remove() can be a challenge. If a 617 * handler is running and a remote thread issues a cyclic_remove() on its 618 * cyclic (interrupting the handler with the removal xcall), subsequent 619 * attempts to reprogram the cyclics from within the handler will result in a 620 * failure return code from cyclic_reprogram(). 621 */ 622 #include <sys/cyclic_impl.h> 623 #include <sys/sysmacros.h> 624 #include <sys/systm.h> 625 #include <sys/atomic.h> 626 #include <sys/kmem.h> 627 #include <sys/cmn_err.h> 628 #include <sys/ddi.h> 629 #include <sys/sdt.h> 630 631 #ifdef CYCLIC_TRACE 632 633 /* 634 * cyc_trace_enabled is for the benefit of kernel debuggers. 635 */ 636 int cyc_trace_enabled = 1; 637 static cyc_tracebuf_t cyc_ptrace; 638 static cyc_coverage_t cyc_coverage[CY_NCOVERAGE]; 639 640 /* 641 * Seen this anywhere? 642 */ 643 static uint_t 644 cyclic_coverage_hash(char *p) 645 { 646 unsigned int g; 647 uint_t hval; 648 649 hval = 0; 650 while (*p) { 651 hval = (hval << 4) + *p++; 652 if ((g = (hval & 0xf0000000)) != 0) 653 hval ^= g >> 24; 654 hval &= ~g; 655 } 656 return (hval); 657 } 658 659 static void 660 cyclic_coverage(char *why, int level, uint64_t arg0, uint64_t arg1) 661 { 662 uint_t ndx, orig; 663 664 for (ndx = orig = cyclic_coverage_hash(why) % CY_NCOVERAGE; ; ) { 665 if (cyc_coverage[ndx].cyv_why == why) 666 break; 667 668 if (cyc_coverage[ndx].cyv_why != NULL || 669 atomic_cas_ptr(&cyc_coverage[ndx].cyv_why, NULL, why) != 670 NULL) { 671 672 if (++ndx == CY_NCOVERAGE) 673 ndx = 0; 674 675 if (ndx == orig) 676 panic("too many cyclic coverage points"); 677 continue; 678 } 679 680 /* 681 * If we're here, we have successfully swung our guy into 682 * the position at "ndx". 683 */ 684 break; 685 } 686 687 if (level == CY_PASSIVE_LEVEL) 688 cyc_coverage[ndx].cyv_passive_count++; 689 else 690 cyc_coverage[ndx].cyv_count[level]++; 691 692 cyc_coverage[ndx].cyv_arg0 = arg0; 693 cyc_coverage[ndx].cyv_arg1 = arg1; 694 } 695 696 #define CYC_TRACE(cpu, level, why, arg0, arg1) \ 697 CYC_TRACE_IMPL(&cpu->cyp_trace[level], level, why, arg0, arg1) 698 699 #define CYC_PTRACE(why, arg0, arg1) \ 700 CYC_TRACE_IMPL(&cyc_ptrace, CY_PASSIVE_LEVEL, why, arg0, arg1) 701 702 #define CYC_TRACE_IMPL(buf, level, why, a0, a1) { \ 703 if (panicstr == NULL) { \ 704 int _ndx = (buf)->cyt_ndx; \ 705 cyc_tracerec_t *_rec = &(buf)->cyt_buf[_ndx]; \ 706 (buf)->cyt_ndx = (++_ndx == CY_NTRACEREC) ? 0 : _ndx; \ 707 _rec->cyt_tstamp = gethrtime_unscaled(); \ 708 _rec->cyt_why = (why); \ 709 _rec->cyt_arg0 = (uint64_t)(uintptr_t)(a0); \ 710 _rec->cyt_arg1 = (uint64_t)(uintptr_t)(a1); \ 711 cyclic_coverage(why, level, \ 712 (uint64_t)(uintptr_t)(a0), (uint64_t)(uintptr_t)(a1)); \ 713 } \ 714 } 715 716 #else 717 718 static int cyc_trace_enabled = 0; 719 720 #define CYC_TRACE(cpu, level, why, arg0, arg1) 721 #define CYC_PTRACE(why, arg0, arg1) 722 723 #endif 724 725 #define CYC_TRACE0(cpu, level, why) CYC_TRACE(cpu, level, why, 0, 0) 726 #define CYC_TRACE1(cpu, level, why, arg0) CYC_TRACE(cpu, level, why, arg0, 0) 727 728 #define CYC_PTRACE0(why) CYC_PTRACE(why, 0, 0) 729 #define CYC_PTRACE1(why, arg0) CYC_PTRACE(why, arg0, 0) 730 731 static kmem_cache_t *cyclic_id_cache; 732 static cyc_id_t *cyclic_id_head; 733 static hrtime_t cyclic_resolution; 734 static cyc_backend_t cyclic_backend; 735 736 /* 737 * Returns 1 if the upheap propagated to the root, 0 if it did not. This 738 * allows the caller to reprogram the backend only when the root has been 739 * modified. 740 */ 741 static int 742 cyclic_upheap(cyc_cpu_t *cpu, cyc_index_t ndx) 743 { 744 cyclic_t *cyclics; 745 cyc_index_t *heap; 746 cyc_index_t heap_parent, heap_current = ndx; 747 cyc_index_t parent, current; 748 749 if (heap_current == 0) 750 return (1); 751 752 heap = cpu->cyp_heap; 753 cyclics = cpu->cyp_cyclics; 754 heap_parent = CYC_HEAP_PARENT(heap_current); 755 756 for (;;) { 757 current = heap[heap_current]; 758 parent = heap[heap_parent]; 759 760 /* 761 * We have an expiration time later than our parent; we're 762 * done. 763 */ 764 if (cyclics[current].cy_expire >= cyclics[parent].cy_expire) 765 return (0); 766 767 /* 768 * We need to swap with our parent, and continue up the heap. 769 */ 770 heap[heap_parent] = current; 771 heap[heap_current] = parent; 772 773 /* 774 * If we just reached the root, we're done. 775 */ 776 if (heap_parent == 0) 777 return (1); 778 779 heap_current = heap_parent; 780 heap_parent = CYC_HEAP_PARENT(heap_current); 781 } 782 } 783 784 static void 785 cyclic_downheap(cyc_cpu_t *cpu, cyc_index_t ndx) 786 { 787 cyclic_t *cyclics = cpu->cyp_cyclics; 788 cyc_index_t *heap = cpu->cyp_heap; 789 790 cyc_index_t heap_left, heap_right, heap_me = ndx; 791 cyc_index_t left, right, me; 792 cyc_index_t nelems = cpu->cyp_nelems; 793 794 for (;;) { 795 /* 796 * If we don't have a left child (i.e., we're a leaf), we're 797 * done. 798 */ 799 if ((heap_left = CYC_HEAP_LEFT(heap_me)) >= nelems) 800 return; 801 802 left = heap[heap_left]; 803 me = heap[heap_me]; 804 805 heap_right = CYC_HEAP_RIGHT(heap_me); 806 807 /* 808 * Even if we don't have a right child, we still need to compare 809 * our expiration time against that of our left child. 810 */ 811 if (heap_right >= nelems) 812 goto comp_left; 813 814 right = heap[heap_right]; 815 816 /* 817 * We have both a left and a right child. We need to compare 818 * the expiration times of the children to determine which 819 * expires earlier. 820 */ 821 if (cyclics[right].cy_expire < cyclics[left].cy_expire) { 822 /* 823 * Our right child is the earlier of our children. 824 * We'll now compare our expiration time to its; if 825 * ours is the earlier, we're done. 826 */ 827 if (cyclics[me].cy_expire <= cyclics[right].cy_expire) 828 return; 829 830 /* 831 * Our right child expires earlier than we do; swap 832 * with our right child, and descend right. 833 */ 834 heap[heap_right] = me; 835 heap[heap_me] = right; 836 heap_me = heap_right; 837 continue; 838 } 839 840 comp_left: 841 /* 842 * Our left child is the earlier of our children (or we have 843 * no right child). We'll now compare our expiration time 844 * to its; if ours is the earlier, we're done. 845 */ 846 if (cyclics[me].cy_expire <= cyclics[left].cy_expire) 847 return; 848 849 /* 850 * Our left child expires earlier than we do; swap with our 851 * left child, and descend left. 852 */ 853 heap[heap_left] = me; 854 heap[heap_me] = left; 855 heap_me = heap_left; 856 } 857 } 858 859 static void 860 cyclic_expire(cyc_cpu_t *cpu, cyc_index_t ndx, cyclic_t *cyclic) 861 { 862 cyc_backend_t *be = cpu->cyp_backend; 863 cyc_level_t level = cyclic->cy_level; 864 865 /* 866 * If this is a CY_HIGH_LEVEL cyclic, just call the handler; we don't 867 * need to worry about the pend count for CY_HIGH_LEVEL cyclics. 868 */ 869 if (level == CY_HIGH_LEVEL) { 870 cyc_func_t handler = cyclic->cy_handler; 871 void *arg = cyclic->cy_arg; 872 873 CYC_TRACE(cpu, CY_HIGH_LEVEL, "handler-in", handler, arg); 874 DTRACE_PROBE1(cyclic__start, cyclic_t *, cyclic); 875 876 (*handler)(arg); 877 878 DTRACE_PROBE1(cyclic__end, cyclic_t *, cyclic); 879 CYC_TRACE(cpu, CY_HIGH_LEVEL, "handler-out", handler, arg); 880 881 return; 882 } 883 884 /* 885 * We're at CY_HIGH_LEVEL; this modification to cy_pend need not 886 * be atomic (the high interrupt level assures that it will appear 887 * atomic to any softint currently running). 888 */ 889 if (cyclic->cy_pend++ == 0) { 890 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[level]; 891 cyc_pcbuffer_t *pc = &softbuf->cys_buf[softbuf->cys_hard]; 892 893 /* 894 * We need to enqueue this cyclic in the soft buffer. 895 */ 896 CYC_TRACE(cpu, CY_HIGH_LEVEL, "expire-enq", cyclic, 897 pc->cypc_prodndx); 898 pc->cypc_buf[pc->cypc_prodndx++ & pc->cypc_sizemask] = ndx; 899 900 ASSERT(pc->cypc_prodndx != pc->cypc_consndx); 901 } else { 902 /* 903 * If the pend count is zero after we incremented it, then 904 * we've wrapped (i.e. we had a cy_pend count of over four 905 * billion. In this case, we clamp the pend count at 906 * UINT32_MAX. Yes, cyclics can be lost in this case. 907 */ 908 if (cyclic->cy_pend == 0) { 909 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "expire-wrap", cyclic); 910 cyclic->cy_pend = UINT32_MAX; 911 } 912 913 CYC_TRACE(cpu, CY_HIGH_LEVEL, "expire-bump", cyclic, 0); 914 } 915 916 be->cyb_softint(be->cyb_arg, cyclic->cy_level); 917 } 918 919 /* 920 * cyclic_fire(cpu_t *) 921 * 922 * Overview 923 * 924 * cyclic_fire() is the cyclic subsystem's CY_HIGH_LEVEL interrupt handler. 925 * Called by the cyclic backend. 926 * 927 * Arguments and notes 928 * 929 * The only argument is the CPU on which the interrupt is executing; 930 * backends must call into cyclic_fire() on the specified CPU. 931 * 932 * cyclic_fire() may be called spuriously without ill effect. Optimal 933 * backends will call into cyclic_fire() at or shortly after the time 934 * requested via cyb_reprogram(). However, calling cyclic_fire() 935 * arbitrarily late will only manifest latency bubbles; the correctness 936 * of the cyclic subsystem does not rely on the timeliness of the backend. 937 * 938 * cyclic_fire() is wait-free; it will not block or spin. 939 * 940 * Return values 941 * 942 * None. 943 * 944 * Caller's context 945 * 946 * cyclic_fire() must be called from CY_HIGH_LEVEL interrupt context. 947 */ 948 void 949 cyclic_fire(cpu_t *c) 950 { 951 cyc_cpu_t *cpu = c->cpu_cyclic; 952 cyc_backend_t *be = cpu->cyp_backend; 953 cyc_index_t *heap = cpu->cyp_heap; 954 cyclic_t *cyclic, *cyclics = cpu->cyp_cyclics; 955 void *arg = be->cyb_arg; 956 hrtime_t now = gethrtime(); 957 hrtime_t exp; 958 959 CYC_TRACE(cpu, CY_HIGH_LEVEL, "fire", now, 0); 960 961 if (cpu->cyp_nelems == 0) { 962 /* 963 * This is a spurious fire. Count it as such, and blow 964 * out of here. 965 */ 966 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "fire-spurious"); 967 return; 968 } 969 970 for (;;) { 971 cyc_index_t ndx = heap[0]; 972 973 cyclic = &cyclics[ndx]; 974 975 ASSERT(!(cyclic->cy_flags & CYF_FREE)); 976 977 CYC_TRACE(cpu, CY_HIGH_LEVEL, "fire-check", cyclic, 978 cyclic->cy_expire); 979 980 if ((exp = cyclic->cy_expire) > now) 981 break; 982 983 cyclic_expire(cpu, ndx, cyclic); 984 985 /* 986 * If the handler reprogrammed the cyclic, then don't 987 * recompute the expiration. Then, if the interval is 988 * infinity, set the expiration to infinity. This can 989 * be used to create one-shot timers. 990 */ 991 if (exp != cyclic->cy_expire) { 992 /* 993 * If a hi level cyclic reprograms itself, 994 * the heap adjustment and reprogramming of the 995 * clock source have already been done at this 996 * point. So, we can continue. 997 */ 998 continue; 999 } 1000 1001 if (cyclic->cy_interval == CY_INFINITY) 1002 exp = CY_INFINITY; 1003 else 1004 exp += cyclic->cy_interval; 1005 1006 /* 1007 * If this cyclic will be set to next expire in the distant 1008 * past, we have one of two situations: 1009 * 1010 * a) This is the first firing of a cyclic which had 1011 * cy_expire set to 0. 1012 * 1013 * b) We are tragically late for a cyclic -- most likely 1014 * due to being in the debugger. 1015 * 1016 * In either case, we set the new expiration time to be the 1017 * the next interval boundary. This assures that the 1018 * expiration time modulo the interval is invariant. 1019 * 1020 * We arbitrarily define "distant" to be one second (one second 1021 * is chosen because it's shorter than any foray to the 1022 * debugger while still being longer than any legitimate 1023 * stretch at CY_HIGH_LEVEL). 1024 */ 1025 1026 if (now - exp > NANOSEC) { 1027 hrtime_t interval = cyclic->cy_interval; 1028 1029 CYC_TRACE(cpu, CY_HIGH_LEVEL, exp == interval ? 1030 "fire-first" : "fire-swing", now, exp); 1031 1032 exp += ((now - exp) / interval + 1) * interval; 1033 } 1034 1035 cyclic->cy_expire = exp; 1036 cyclic_downheap(cpu, 0); 1037 } 1038 1039 /* 1040 * Now we have a cyclic in the root slot which isn't in the past; 1041 * reprogram the interrupt source. 1042 */ 1043 be->cyb_reprogram(arg, exp); 1044 } 1045 1046 static void 1047 cyclic_remove_pend(cyc_cpu_t *cpu, cyc_level_t level, cyclic_t *cyclic) 1048 { 1049 cyc_func_t handler = cyclic->cy_handler; 1050 void *arg = cyclic->cy_arg; 1051 uint32_t i, rpend = cpu->cyp_rpend - 1; 1052 1053 ASSERT(cyclic->cy_flags & CYF_FREE); 1054 ASSERT(cyclic->cy_pend == 0); 1055 ASSERT(cpu->cyp_state == CYS_REMOVING); 1056 ASSERT(cpu->cyp_rpend > 0); 1057 1058 CYC_TRACE(cpu, level, "remove-rpend", cyclic, cpu->cyp_rpend); 1059 1060 /* 1061 * Note that we only call the handler cyp_rpend - 1 times; this is 1062 * to account for the handler call in cyclic_softint(). 1063 */ 1064 for (i = 0; i < rpend; i++) { 1065 CYC_TRACE(cpu, level, "rpend-in", handler, arg); 1066 DTRACE_PROBE1(cyclic__start, cyclic_t *, cyclic); 1067 1068 (*handler)(arg); 1069 1070 DTRACE_PROBE1(cyclic__end, cyclic_t *, cyclic); 1071 CYC_TRACE(cpu, level, "rpend-out", handler, arg); 1072 } 1073 1074 /* 1075 * We can now let the remove operation complete. 1076 */ 1077 sema_v(&cpu->cyp_modify_wait); 1078 } 1079 1080 /* 1081 * cyclic_softint(cpu_t *cpu, cyc_level_t level) 1082 * 1083 * Overview 1084 * 1085 * cyclic_softint() is the cyclic subsystem's CY_LOCK_LEVEL and CY_LOW_LEVEL 1086 * soft interrupt handler. Called by the cyclic backend. 1087 * 1088 * Arguments and notes 1089 * 1090 * The first argument to cyclic_softint() is the CPU on which the interrupt 1091 * is executing; backends must call into cyclic_softint() on the specified 1092 * CPU. The second argument is the level of the soft interrupt; it must 1093 * be one of CY_LOCK_LEVEL or CY_LOW_LEVEL. 1094 * 1095 * cyclic_softint() will call the handlers for cyclics pending at the 1096 * specified level. cyclic_softint() will not return until all pending 1097 * cyclics at the specified level have been dealt with; intervening 1098 * CY_HIGH_LEVEL interrupts which enqueue cyclics at the specified level 1099 * may therefore prolong cyclic_softint(). 1100 * 1101 * cyclic_softint() never disables interrupts, and, if neither a 1102 * cyclic_add() nor a cyclic_remove() is pending on the specified CPU, is 1103 * lock-free. This assures that in the common case, cyclic_softint() 1104 * completes without blocking, and never starves cyclic_fire(). If either 1105 * cyclic_add() or cyclic_remove() is pending, cyclic_softint() may grab 1106 * a dispatcher lock. 1107 * 1108 * While cyclic_softint() is designed for bounded latency, it is obviously 1109 * at the mercy of its cyclic handlers. Because cyclic handlers may block 1110 * arbitrarily, callers of cyclic_softint() should not rely upon 1111 * deterministic completion. 1112 * 1113 * cyclic_softint() may be called spuriously without ill effect. 1114 * 1115 * Return value 1116 * 1117 * None. 1118 * 1119 * Caller's context 1120 * 1121 * The caller must be executing in soft interrupt context at either 1122 * CY_LOCK_LEVEL or CY_LOW_LEVEL. The level passed to cyclic_softint() 1123 * must match the level at which it is executing. On optimal backends, 1124 * the caller will hold no locks. In any case, the caller may not hold 1125 * cpu_lock or any lock acquired by any cyclic handler or held across 1126 * any of cyclic_add(), cyclic_remove(), cyclic_bind() or cyclic_juggle(). 1127 */ 1128 void 1129 cyclic_softint(cpu_t *c, cyc_level_t level) 1130 { 1131 cyc_cpu_t *cpu = c->cpu_cyclic; 1132 cyc_softbuf_t *softbuf; 1133 int soft, *buf, consndx, resized = 0, intr_resized = 0; 1134 cyc_pcbuffer_t *pc; 1135 cyclic_t *cyclics = cpu->cyp_cyclics; 1136 int sizemask; 1137 1138 CYC_TRACE(cpu, level, "softint", cyclics, 0); 1139 1140 ASSERT(level < CY_LOW_LEVEL + CY_SOFT_LEVELS); 1141 1142 softbuf = &cpu->cyp_softbuf[level]; 1143 top: 1144 soft = softbuf->cys_soft; 1145 ASSERT(soft == 0 || soft == 1); 1146 1147 pc = &softbuf->cys_buf[soft]; 1148 buf = pc->cypc_buf; 1149 consndx = pc->cypc_consndx; 1150 sizemask = pc->cypc_sizemask; 1151 1152 CYC_TRACE(cpu, level, "softint-top", cyclics, pc); 1153 1154 while (consndx != pc->cypc_prodndx) { 1155 uint32_t pend, npend, opend; 1156 int consmasked = consndx & sizemask; 1157 cyclic_t *cyclic = &cyclics[buf[consmasked]]; 1158 cyc_func_t handler = cyclic->cy_handler; 1159 void *arg = cyclic->cy_arg; 1160 1161 ASSERT(buf[consmasked] < cpu->cyp_size); 1162 CYC_TRACE(cpu, level, "consuming", consndx, cyclic); 1163 1164 /* 1165 * We have found this cyclic in the pcbuffer. We know that 1166 * one of the following is true: 1167 * 1168 * (a) The pend is non-zero. We need to execute the handler 1169 * at least once. 1170 * 1171 * (b) The pend _was_ non-zero, but it's now zero due to a 1172 * resize. We will call the handler once, see that we 1173 * are in this case, and read the new cyclics buffer 1174 * (and hence the old non-zero pend). 1175 * 1176 * (c) The pend _was_ non-zero, but it's now zero due to a 1177 * removal. We will call the handler once, see that we 1178 * are in this case, and call into cyclic_remove_pend() 1179 * to call the cyclic rpend times. We will take into 1180 * account that we have already called the handler once. 1181 * 1182 * Point is: it's safe to call the handler without first 1183 * checking the pend. 1184 */ 1185 do { 1186 CYC_TRACE(cpu, level, "handler-in", handler, arg); 1187 DTRACE_PROBE1(cyclic__start, cyclic_t *, cyclic); 1188 1189 (*handler)(arg); 1190 1191 DTRACE_PROBE1(cyclic__end, cyclic_t *, cyclic); 1192 CYC_TRACE(cpu, level, "handler-out", handler, arg); 1193 reread: 1194 pend = cyclic->cy_pend; 1195 npend = pend - 1; 1196 1197 if (pend == 0) { 1198 if (cpu->cyp_state == CYS_REMOVING) { 1199 /* 1200 * This cyclic has been removed while 1201 * it had a non-zero pend count (we 1202 * know it was non-zero because we 1203 * found this cyclic in the pcbuffer). 1204 * There must be a non-zero rpend for 1205 * this CPU, and there must be a remove 1206 * operation blocking; we'll call into 1207 * cyclic_remove_pend() to clean this 1208 * up, and break out of the pend loop. 1209 */ 1210 cyclic_remove_pend(cpu, level, cyclic); 1211 break; 1212 } 1213 1214 /* 1215 * We must have had a resize interrupt us. 1216 */ 1217 CYC_TRACE(cpu, level, "resize-int", cyclics, 0); 1218 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1219 ASSERT(cyclics != cpu->cyp_cyclics); 1220 ASSERT(resized == 0); 1221 ASSERT(intr_resized == 0); 1222 intr_resized = 1; 1223 cyclics = cpu->cyp_cyclics; 1224 cyclic = &cyclics[buf[consmasked]]; 1225 ASSERT(cyclic->cy_handler == handler); 1226 ASSERT(cyclic->cy_arg == arg); 1227 goto reread; 1228 } 1229 1230 if ((opend = 1231 atomic_cas_32(&cyclic->cy_pend, pend, npend)) != 1232 pend) { 1233 /* 1234 * Our atomic_cas_32 can fail for one of several 1235 * reasons: 1236 * 1237 * (a) An intervening high level bumped up the 1238 * pend count on this cyclic. In this 1239 * case, we will see a higher pend. 1240 * 1241 * (b) The cyclics array has been yanked out 1242 * from underneath us by a resize 1243 * operation. In this case, pend is 0 and 1244 * cyp_state is CYS_EXPANDING. 1245 * 1246 * (c) The cyclic has been removed by an 1247 * intervening remove-xcall. In this case, 1248 * pend will be 0, the cyp_state will be 1249 * CYS_REMOVING, and the cyclic will be 1250 * marked CYF_FREE. 1251 * 1252 * The assertion below checks that we are 1253 * in one of the above situations. The 1254 * action under all three is to return to 1255 * the top of the loop. 1256 */ 1257 CYC_TRACE(cpu, level, "cas-fail", opend, pend); 1258 ASSERT(opend > pend || (opend == 0 && 1259 ((cyclics != cpu->cyp_cyclics && 1260 cpu->cyp_state == CYS_EXPANDING) || 1261 (cpu->cyp_state == CYS_REMOVING && 1262 (cyclic->cy_flags & CYF_FREE))))); 1263 goto reread; 1264 } 1265 1266 /* 1267 * Okay, so we've managed to successfully decrement 1268 * pend. If we just decremented the pend to 0, we're 1269 * done. 1270 */ 1271 } while (npend > 0); 1272 1273 pc->cypc_consndx = ++consndx; 1274 } 1275 1276 /* 1277 * If the high level handler is no longer writing to the same 1278 * buffer, then we've had a resize. We need to switch our soft 1279 * index, and goto top. 1280 */ 1281 if (soft != softbuf->cys_hard) { 1282 /* 1283 * We can assert that the other buffer has grown by exactly 1284 * one factor of two. 1285 */ 1286 CYC_TRACE(cpu, level, "buffer-grow", 0, 0); 1287 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1288 ASSERT(softbuf->cys_buf[softbuf->cys_hard].cypc_sizemask == 1289 (softbuf->cys_buf[soft].cypc_sizemask << 1) + 1 || 1290 softbuf->cys_buf[soft].cypc_sizemask == 0); 1291 ASSERT(softbuf->cys_hard == (softbuf->cys_soft ^ 1)); 1292 1293 /* 1294 * If our cached cyclics pointer doesn't match cyp_cyclics, 1295 * then we took a resize between our last iteration of the 1296 * pend loop and the check against softbuf->cys_hard. 1297 */ 1298 if (cpu->cyp_cyclics != cyclics) { 1299 CYC_TRACE1(cpu, level, "resize-int-int", consndx); 1300 cyclics = cpu->cyp_cyclics; 1301 } 1302 1303 softbuf->cys_soft = softbuf->cys_hard; 1304 1305 ASSERT(resized == 0); 1306 resized = 1; 1307 goto top; 1308 } 1309 1310 /* 1311 * If we were interrupted by a resize operation, then we must have 1312 * seen the hard index change. 1313 */ 1314 ASSERT(!(intr_resized == 1 && resized == 0)); 1315 1316 if (resized) { 1317 uint32_t lev, nlev; 1318 1319 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1320 1321 do { 1322 lev = cpu->cyp_modify_levels; 1323 nlev = lev + 1; 1324 } while (atomic_cas_32(&cpu->cyp_modify_levels, lev, nlev) != 1325 lev); 1326 1327 /* 1328 * If we are the last soft level to see the modification, 1329 * post on cyp_modify_wait. Otherwise, (if we're not 1330 * already at low level), post down to the next soft level. 1331 */ 1332 if (nlev == CY_SOFT_LEVELS) { 1333 CYC_TRACE0(cpu, level, "resize-kick"); 1334 sema_v(&cpu->cyp_modify_wait); 1335 } else { 1336 ASSERT(nlev < CY_SOFT_LEVELS); 1337 if (level != CY_LOW_LEVEL) { 1338 cyc_backend_t *be = cpu->cyp_backend; 1339 1340 CYC_TRACE0(cpu, level, "resize-post"); 1341 be->cyb_softint(be->cyb_arg, level - 1); 1342 } 1343 } 1344 } 1345 } 1346 1347 static void 1348 cyclic_expand_xcall(cyc_xcallarg_t *arg) 1349 { 1350 cyc_cpu_t *cpu = arg->cyx_cpu; 1351 cyc_backend_t *be = cpu->cyp_backend; 1352 cyb_arg_t bar = be->cyb_arg; 1353 cyc_cookie_t cookie; 1354 cyc_index_t new_size = arg->cyx_size, size = cpu->cyp_size, i; 1355 cyc_index_t *new_heap = arg->cyx_heap; 1356 cyclic_t *cyclics = cpu->cyp_cyclics, *new_cyclics = arg->cyx_cyclics; 1357 1358 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1359 1360 /* 1361 * This is a little dicey. First, we'll raise our interrupt level 1362 * to CY_HIGH_LEVEL. This CPU already has a new heap, cyclic array, 1363 * etc.; we just need to bcopy them across. As for the softint 1364 * buffers, we'll switch the active buffers. The actual softints will 1365 * take care of consuming any pending cyclics in the old buffer. 1366 */ 1367 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 1368 1369 CYC_TRACE(cpu, CY_HIGH_LEVEL, "expand", new_size, 0); 1370 1371 /* 1372 * Assert that the new size is a power of 2. 1373 */ 1374 ASSERT((new_size & new_size - 1) == 0); 1375 ASSERT(new_size == (size << 1)); 1376 ASSERT(cpu->cyp_heap != NULL && cpu->cyp_cyclics != NULL); 1377 1378 bcopy(cpu->cyp_heap, new_heap, sizeof (cyc_index_t) * size); 1379 bcopy(cyclics, new_cyclics, sizeof (cyclic_t) * size); 1380 1381 /* 1382 * Now run through the old cyclics array, setting pend to 0. To 1383 * softints (which are executing at a lower priority level), the 1384 * pends dropping to 0 will appear atomic with the cyp_cyclics 1385 * pointer changing. 1386 */ 1387 for (i = 0; i < size; i++) 1388 cyclics[i].cy_pend = 0; 1389 1390 /* 1391 * Set up the free list, and set all of the new cyclics to be CYF_FREE. 1392 */ 1393 for (i = size; i < new_size; i++) { 1394 new_heap[i] = i; 1395 new_cyclics[i].cy_flags = CYF_FREE; 1396 } 1397 1398 /* 1399 * We can go ahead and plow the value of cyp_heap and cyp_cyclics; 1400 * cyclic_expand() has kept a copy. 1401 */ 1402 cpu->cyp_heap = new_heap; 1403 cpu->cyp_cyclics = new_cyclics; 1404 cpu->cyp_size = new_size; 1405 1406 /* 1407 * We've switched over the heap and the cyclics array. Now we need 1408 * to switch over our active softint buffer pointers. 1409 */ 1410 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 1411 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 1412 uchar_t hard = softbuf->cys_hard; 1413 1414 /* 1415 * Assert that we're not in the middle of a resize operation. 1416 */ 1417 ASSERT(hard == softbuf->cys_soft); 1418 ASSERT(hard == 0 || hard == 1); 1419 ASSERT(softbuf->cys_buf[hard].cypc_buf != NULL); 1420 1421 softbuf->cys_hard = hard ^ 1; 1422 1423 /* 1424 * The caller (cyclic_expand()) is responsible for setting 1425 * up the new producer-consumer buffer; assert that it's 1426 * been done correctly. 1427 */ 1428 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_buf != NULL); 1429 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_prodndx == 0); 1430 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_consndx == 0); 1431 } 1432 1433 /* 1434 * That's all there is to it; now we just need to postdown to 1435 * get the softint chain going. 1436 */ 1437 be->cyb_softint(bar, CY_HIGH_LEVEL - 1); 1438 be->cyb_restore_level(bar, cookie); 1439 } 1440 1441 /* 1442 * cyclic_expand() will cross call onto the CPU to perform the actual 1443 * expand operation. 1444 */ 1445 static void 1446 cyclic_expand(cyc_cpu_t *cpu) 1447 { 1448 cyc_index_t new_size, old_size; 1449 cyc_index_t *new_heap, *old_heap; 1450 cyclic_t *new_cyclics, *old_cyclics; 1451 cyc_xcallarg_t arg; 1452 cyc_backend_t *be = cpu->cyp_backend; 1453 char old_hard; 1454 int i; 1455 1456 ASSERT(MUTEX_HELD(&cpu_lock)); 1457 ASSERT(cpu->cyp_state == CYS_ONLINE); 1458 1459 cpu->cyp_state = CYS_EXPANDING; 1460 1461 old_heap = cpu->cyp_heap; 1462 old_cyclics = cpu->cyp_cyclics; 1463 1464 if ((new_size = ((old_size = cpu->cyp_size) << 1)) == 0) { 1465 new_size = CY_DEFAULT_PERCPU; 1466 ASSERT(old_heap == NULL && old_cyclics == NULL); 1467 } 1468 1469 /* 1470 * Check that the new_size is a power of 2. 1471 */ 1472 ASSERT((new_size - 1 & new_size) == 0); 1473 1474 new_heap = kmem_alloc(sizeof (cyc_index_t) * new_size, KM_SLEEP); 1475 new_cyclics = kmem_zalloc(sizeof (cyclic_t) * new_size, KM_SLEEP); 1476 1477 /* 1478 * We know that no other expansions are in progress (they serialize 1479 * on cpu_lock), so we can safely read the softbuf metadata. 1480 */ 1481 old_hard = cpu->cyp_softbuf[0].cys_hard; 1482 1483 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 1484 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 1485 char hard = softbuf->cys_hard; 1486 cyc_pcbuffer_t *pc = &softbuf->cys_buf[hard ^ 1]; 1487 1488 ASSERT(hard == old_hard); 1489 ASSERT(hard == softbuf->cys_soft); 1490 ASSERT(pc->cypc_buf == NULL); 1491 1492 pc->cypc_buf = 1493 kmem_alloc(sizeof (cyc_index_t) * new_size, KM_SLEEP); 1494 pc->cypc_prodndx = pc->cypc_consndx = 0; 1495 pc->cypc_sizemask = new_size - 1; 1496 } 1497 1498 arg.cyx_cpu = cpu; 1499 arg.cyx_heap = new_heap; 1500 arg.cyx_cyclics = new_cyclics; 1501 arg.cyx_size = new_size; 1502 1503 cpu->cyp_modify_levels = 0; 1504 1505 be->cyb_xcall(be->cyb_arg, cpu->cyp_cpu, 1506 (cyc_func_t)cyclic_expand_xcall, &arg); 1507 1508 /* 1509 * Now block, waiting for the resize operation to complete. 1510 */ 1511 sema_p(&cpu->cyp_modify_wait); 1512 ASSERT(cpu->cyp_modify_levels == CY_SOFT_LEVELS); 1513 1514 /* 1515 * The operation is complete; we can now free the old buffers. 1516 */ 1517 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 1518 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 1519 char hard = softbuf->cys_hard; 1520 cyc_pcbuffer_t *pc = &softbuf->cys_buf[hard ^ 1]; 1521 1522 ASSERT(hard == (old_hard ^ 1)); 1523 ASSERT(hard == softbuf->cys_soft); 1524 1525 if (pc->cypc_buf == NULL) 1526 continue; 1527 1528 ASSERT(pc->cypc_sizemask == ((new_size - 1) >> 1)); 1529 1530 kmem_free(pc->cypc_buf, 1531 sizeof (cyc_index_t) * (pc->cypc_sizemask + 1)); 1532 pc->cypc_buf = NULL; 1533 } 1534 1535 if (old_cyclics != NULL) { 1536 ASSERT(old_heap != NULL); 1537 ASSERT(old_size != 0); 1538 kmem_free(old_cyclics, sizeof (cyclic_t) * old_size); 1539 kmem_free(old_heap, sizeof (cyc_index_t) * old_size); 1540 } 1541 1542 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1543 cpu->cyp_state = CYS_ONLINE; 1544 } 1545 1546 /* 1547 * cyclic_pick_cpu will attempt to pick a CPU according to the constraints 1548 * specified by the partition, bound CPU, and flags. Additionally, 1549 * cyclic_pick_cpu() will not pick the avoid CPU; it will return NULL if 1550 * the avoid CPU is the only CPU which satisfies the constraints. 1551 * 1552 * If CYF_CPU_BOUND is set in flags, the specified CPU must be non-NULL. 1553 * If CYF_PART_BOUND is set in flags, the specified partition must be non-NULL. 1554 * If both CYF_CPU_BOUND and CYF_PART_BOUND are set, the specified CPU must 1555 * be in the specified partition. 1556 */ 1557 static cyc_cpu_t * 1558 cyclic_pick_cpu(cpupart_t *part, cpu_t *bound, cpu_t *avoid, uint16_t flags) 1559 { 1560 cpu_t *c, *start = (part != NULL) ? part->cp_cpulist : CPU; 1561 cpu_t *online = NULL; 1562 uintptr_t offset; 1563 1564 CYC_PTRACE("pick-cpu", part, bound); 1565 1566 ASSERT(!(flags & CYF_CPU_BOUND) || bound != NULL); 1567 ASSERT(!(flags & CYF_PART_BOUND) || part != NULL); 1568 1569 /* 1570 * If we're bound to our CPU, there isn't much choice involved. We 1571 * need to check that the CPU passed as bound is in the cpupart, and 1572 * that the CPU that we're binding to has been configured. 1573 */ 1574 if (flags & CYF_CPU_BOUND) { 1575 CYC_PTRACE("pick-cpu-bound", bound, avoid); 1576 1577 if ((flags & CYF_PART_BOUND) && bound->cpu_part != part) 1578 panic("cyclic_pick_cpu: " 1579 "CPU binding contradicts partition binding"); 1580 1581 if (bound == avoid) 1582 return (NULL); 1583 1584 if (bound->cpu_cyclic == NULL) 1585 panic("cyclic_pick_cpu: " 1586 "attempt to bind to non-configured CPU"); 1587 1588 return (bound->cpu_cyclic); 1589 } 1590 1591 if (flags & CYF_PART_BOUND) { 1592 CYC_PTRACE("pick-part-bound", bound, avoid); 1593 offset = offsetof(cpu_t, cpu_next_part); 1594 } else { 1595 offset = offsetof(cpu_t, cpu_next_onln); 1596 } 1597 1598 c = start; 1599 do { 1600 if (c->cpu_cyclic == NULL) 1601 continue; 1602 1603 if (c->cpu_cyclic->cyp_state == CYS_OFFLINE) 1604 continue; 1605 1606 if (c == avoid) 1607 continue; 1608 1609 if (c->cpu_flags & CPU_ENABLE) 1610 goto found; 1611 1612 if (online == NULL) 1613 online = c; 1614 } while ((c = *(cpu_t **)((uintptr_t)c + offset)) != start); 1615 1616 /* 1617 * If we're here, we're in one of two situations: 1618 * 1619 * (a) We have a partition-bound cyclic, and there is no CPU in 1620 * our partition which is CPU_ENABLE'd. If we saw another 1621 * non-CYS_OFFLINE CPU in our partition, we'll go with it. 1622 * If not, the avoid CPU must be the only non-CYS_OFFLINE 1623 * CPU in the partition; we're forced to return NULL. 1624 * 1625 * (b) We have a partition-unbound cyclic, in which case there 1626 * must only be one CPU CPU_ENABLE'd, and it must be the one 1627 * we're trying to avoid. If cyclic_juggle()/cyclic_offline() 1628 * are called appropriately, this generally shouldn't happen 1629 * (the offline should fail before getting to this code). 1630 * At any rate: we can't avoid the avoid CPU, so we return 1631 * NULL. 1632 */ 1633 if (!(flags & CYF_PART_BOUND)) { 1634 ASSERT(avoid->cpu_flags & CPU_ENABLE); 1635 return (NULL); 1636 } 1637 1638 CYC_PTRACE("pick-no-intr", part, avoid); 1639 1640 if ((c = online) != NULL) 1641 goto found; 1642 1643 CYC_PTRACE("pick-fail", part, avoid); 1644 ASSERT(avoid->cpu_part == start->cpu_part); 1645 return (NULL); 1646 1647 found: 1648 CYC_PTRACE("pick-cpu-found", c, avoid); 1649 ASSERT(c != avoid); 1650 ASSERT(c->cpu_cyclic != NULL); 1651 1652 return (c->cpu_cyclic); 1653 } 1654 1655 static void 1656 cyclic_add_xcall(cyc_xcallarg_t *arg) 1657 { 1658 cyc_cpu_t *cpu = arg->cyx_cpu; 1659 cyc_handler_t *hdlr = arg->cyx_hdlr; 1660 cyc_time_t *when = arg->cyx_when; 1661 cyc_backend_t *be = cpu->cyp_backend; 1662 cyc_index_t ndx, nelems; 1663 cyc_cookie_t cookie; 1664 cyb_arg_t bar = be->cyb_arg; 1665 cyclic_t *cyclic; 1666 1667 ASSERT(cpu->cyp_nelems < cpu->cyp_size); 1668 1669 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 1670 1671 CYC_TRACE(cpu, CY_HIGH_LEVEL, 1672 "add-xcall", when->cyt_when, when->cyt_interval); 1673 1674 nelems = cpu->cyp_nelems++; 1675 1676 if (nelems == 0) { 1677 /* 1678 * If this is the first element, we need to enable the 1679 * backend on this CPU. 1680 */ 1681 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "enabled"); 1682 be->cyb_enable(bar); 1683 } 1684 1685 ndx = cpu->cyp_heap[nelems]; 1686 cyclic = &cpu->cyp_cyclics[ndx]; 1687 1688 ASSERT(cyclic->cy_flags == CYF_FREE); 1689 cyclic->cy_interval = when->cyt_interval; 1690 1691 if (when->cyt_when == 0) { 1692 /* 1693 * If a start time hasn't been explicitly specified, we'll 1694 * start on the next interval boundary. 1695 */ 1696 cyclic->cy_expire = (gethrtime() / cyclic->cy_interval + 1) * 1697 cyclic->cy_interval; 1698 } else { 1699 cyclic->cy_expire = when->cyt_when; 1700 } 1701 1702 cyclic->cy_handler = hdlr->cyh_func; 1703 cyclic->cy_arg = hdlr->cyh_arg; 1704 cyclic->cy_level = hdlr->cyh_level; 1705 cyclic->cy_flags = arg->cyx_flags; 1706 1707 if (cyclic_upheap(cpu, nelems)) { 1708 hrtime_t exp = cyclic->cy_expire; 1709 1710 CYC_TRACE(cpu, CY_HIGH_LEVEL, "add-reprog", cyclic, exp); 1711 1712 /* 1713 * If our upheap propagated to the root, we need to 1714 * reprogram the interrupt source. 1715 */ 1716 be->cyb_reprogram(bar, exp); 1717 } 1718 be->cyb_restore_level(bar, cookie); 1719 1720 arg->cyx_ndx = ndx; 1721 } 1722 1723 static cyc_index_t 1724 cyclic_add_here(cyc_cpu_t *cpu, cyc_handler_t *hdlr, 1725 cyc_time_t *when, uint16_t flags) 1726 { 1727 cyc_backend_t *be = cpu->cyp_backend; 1728 cyb_arg_t bar = be->cyb_arg; 1729 cyc_xcallarg_t arg; 1730 1731 CYC_PTRACE("add-cpu", cpu, hdlr->cyh_func); 1732 ASSERT(MUTEX_HELD(&cpu_lock)); 1733 ASSERT(cpu->cyp_state == CYS_ONLINE); 1734 ASSERT(!(cpu->cyp_cpu->cpu_flags & CPU_OFFLINE)); 1735 ASSERT(when->cyt_when >= 0 && when->cyt_interval > 0); 1736 1737 if (cpu->cyp_nelems == cpu->cyp_size) { 1738 /* 1739 * This is expensive; it will cross call onto the other 1740 * CPU to perform the expansion. 1741 */ 1742 cyclic_expand(cpu); 1743 ASSERT(cpu->cyp_nelems < cpu->cyp_size); 1744 } 1745 1746 /* 1747 * By now, we know that we're going to be able to successfully 1748 * perform the add. Now cross call over to the CPU of interest to 1749 * actually add our cyclic. 1750 */ 1751 arg.cyx_cpu = cpu; 1752 arg.cyx_hdlr = hdlr; 1753 arg.cyx_when = when; 1754 arg.cyx_flags = flags; 1755 1756 be->cyb_xcall(bar, cpu->cyp_cpu, (cyc_func_t)cyclic_add_xcall, &arg); 1757 1758 CYC_PTRACE("add-cpu-done", cpu, arg.cyx_ndx); 1759 1760 return (arg.cyx_ndx); 1761 } 1762 1763 static void 1764 cyclic_remove_xcall(cyc_xcallarg_t *arg) 1765 { 1766 cyc_cpu_t *cpu = arg->cyx_cpu; 1767 cyc_backend_t *be = cpu->cyp_backend; 1768 cyb_arg_t bar = be->cyb_arg; 1769 cyc_cookie_t cookie; 1770 cyc_index_t ndx = arg->cyx_ndx, nelems, i; 1771 cyc_index_t *heap, last; 1772 cyclic_t *cyclic; 1773 #ifdef DEBUG 1774 cyc_index_t root; 1775 #endif 1776 1777 ASSERT(cpu->cyp_state == CYS_REMOVING); 1778 1779 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 1780 1781 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "remove-xcall", ndx); 1782 1783 heap = cpu->cyp_heap; 1784 nelems = cpu->cyp_nelems; 1785 ASSERT(nelems > 0); 1786 cyclic = &cpu->cyp_cyclics[ndx]; 1787 1788 /* 1789 * Grab the current expiration time. If this cyclic is being 1790 * removed as part of a juggling operation, the expiration time 1791 * will be used when the cyclic is added to the new CPU. 1792 */ 1793 if (arg->cyx_when != NULL) { 1794 arg->cyx_when->cyt_when = cyclic->cy_expire; 1795 arg->cyx_when->cyt_interval = cyclic->cy_interval; 1796 } 1797 1798 if (cyclic->cy_pend != 0) { 1799 /* 1800 * The pend is non-zero; this cyclic is currently being 1801 * executed (or will be executed shortly). If the caller 1802 * refuses to wait, we must return (doing nothing). Otherwise, 1803 * we will stash the pend value * in this CPU's rpend, and 1804 * then zero it out. The softint in the pend loop will see 1805 * that we have zeroed out pend, and will call the cyclic 1806 * handler rpend times. The caller will wait until the 1807 * softint has completed calling the cyclic handler. 1808 */ 1809 if (arg->cyx_wait == CY_NOWAIT) { 1810 arg->cyx_wait = CY_WAIT; 1811 goto out; 1812 } 1813 1814 ASSERT(cyclic->cy_level != CY_HIGH_LEVEL); 1815 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "remove-pend", cyclic->cy_pend); 1816 cpu->cyp_rpend = cyclic->cy_pend; 1817 cyclic->cy_pend = 0; 1818 } 1819 1820 /* 1821 * Now set the flags to CYF_FREE. We don't need a membar_enter() 1822 * between zeroing pend and setting the flags because we're at 1823 * CY_HIGH_LEVEL (that is, the zeroing of pend and the setting 1824 * of cy_flags appear atomic to softints). 1825 */ 1826 cyclic->cy_flags = CYF_FREE; 1827 1828 for (i = 0; i < nelems; i++) { 1829 if (heap[i] == ndx) 1830 break; 1831 } 1832 1833 if (i == nelems) 1834 panic("attempt to remove non-existent cyclic"); 1835 1836 cpu->cyp_nelems = --nelems; 1837 1838 if (nelems == 0) { 1839 /* 1840 * If we just removed the last element, then we need to 1841 * disable the backend on this CPU. 1842 */ 1843 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "disabled"); 1844 be->cyb_disable(bar); 1845 } 1846 1847 if (i == nelems) { 1848 /* 1849 * If we just removed the last element of the heap, then 1850 * we don't have to downheap. 1851 */ 1852 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-bottom"); 1853 goto out; 1854 } 1855 1856 #ifdef DEBUG 1857 root = heap[0]; 1858 #endif 1859 1860 /* 1861 * Swap the last element of the heap with the one we want to 1862 * remove, and downheap (this has the implicit effect of putting 1863 * the newly freed element on the free list). 1864 */ 1865 heap[i] = (last = heap[nelems]); 1866 heap[nelems] = ndx; 1867 1868 if (i == 0) { 1869 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-root"); 1870 cyclic_downheap(cpu, 0); 1871 } else { 1872 if (cyclic_upheap(cpu, i) == 0) { 1873 /* 1874 * The upheap didn't propagate to the root; if it 1875 * didn't propagate at all, we need to downheap. 1876 */ 1877 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-no-root"); 1878 if (heap[i] == last) { 1879 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-no-up"); 1880 cyclic_downheap(cpu, i); 1881 } 1882 ASSERT(heap[0] == root); 1883 goto out; 1884 } 1885 } 1886 1887 /* 1888 * We're here because we changed the root; we need to reprogram 1889 * the clock source. 1890 */ 1891 cyclic = &cpu->cyp_cyclics[heap[0]]; 1892 1893 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-reprog"); 1894 1895 ASSERT(nelems != 0); 1896 be->cyb_reprogram(bar, cyclic->cy_expire); 1897 out: 1898 be->cyb_restore_level(bar, cookie); 1899 } 1900 1901 static int 1902 cyclic_remove_here(cyc_cpu_t *cpu, cyc_index_t ndx, cyc_time_t *when, int wait) 1903 { 1904 cyc_backend_t *be = cpu->cyp_backend; 1905 cyc_xcallarg_t arg; 1906 cyclic_t *cyclic = &cpu->cyp_cyclics[ndx]; 1907 cyc_level_t level = cyclic->cy_level; 1908 1909 ASSERT(MUTEX_HELD(&cpu_lock)); 1910 ASSERT(cpu->cyp_rpend == 0); 1911 ASSERT(wait == CY_WAIT || wait == CY_NOWAIT); 1912 1913 arg.cyx_ndx = ndx; 1914 arg.cyx_cpu = cpu; 1915 arg.cyx_when = when; 1916 arg.cyx_wait = wait; 1917 1918 ASSERT(cpu->cyp_state == CYS_ONLINE); 1919 cpu->cyp_state = CYS_REMOVING; 1920 1921 be->cyb_xcall(be->cyb_arg, cpu->cyp_cpu, 1922 (cyc_func_t)cyclic_remove_xcall, &arg); 1923 1924 /* 1925 * If the cyclic we removed wasn't at CY_HIGH_LEVEL, then we need to 1926 * check the cyp_rpend. If it's non-zero, then we need to wait here 1927 * for all pending cyclic handlers to run. 1928 */ 1929 ASSERT(!(level == CY_HIGH_LEVEL && cpu->cyp_rpend != 0)); 1930 ASSERT(!(wait == CY_NOWAIT && cpu->cyp_rpend != 0)); 1931 ASSERT(!(arg.cyx_wait == CY_NOWAIT && cpu->cyp_rpend != 0)); 1932 1933 if (wait != arg.cyx_wait) { 1934 /* 1935 * We are being told that we must wait if we want to 1936 * remove this cyclic; put the CPU back in the CYS_ONLINE 1937 * state and return failure. 1938 */ 1939 ASSERT(wait == CY_NOWAIT && arg.cyx_wait == CY_WAIT); 1940 ASSERT(cpu->cyp_state == CYS_REMOVING); 1941 cpu->cyp_state = CYS_ONLINE; 1942 1943 return (0); 1944 } 1945 1946 if (cpu->cyp_rpend != 0) 1947 sema_p(&cpu->cyp_modify_wait); 1948 1949 ASSERT(cpu->cyp_state == CYS_REMOVING); 1950 1951 cpu->cyp_rpend = 0; 1952 cpu->cyp_state = CYS_ONLINE; 1953 1954 return (1); 1955 } 1956 1957 /* 1958 * If cyclic_reprogram() is called on the same CPU as the cyclic's CPU, then 1959 * it calls this function directly. Else, it invokes this function through 1960 * an X-call to the cyclic's CPU. 1961 */ 1962 static boolean_t 1963 cyclic_reprogram_cyclic(cyc_cpu_t *cpu, cyc_index_t ndx, hrtime_t expire, 1964 boolean_t is_local) 1965 { 1966 cyc_backend_t *be = cpu->cyp_backend; 1967 cyb_arg_t bar = be->cyb_arg; 1968 cyc_cookie_t cookie; 1969 cyc_index_t nelems, i; 1970 cyc_index_t *heap; 1971 cyclic_t *cyclic; 1972 hrtime_t oexpire; 1973 int reprog; 1974 1975 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 1976 1977 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "reprog-xcall", ndx); 1978 1979 nelems = cpu->cyp_nelems; 1980 ASSERT(nelems > 0); 1981 heap = cpu->cyp_heap; 1982 1983 /* 1984 * Reprogrammed cyclics are typically one-shot ones that get 1985 * set to infinity on every expiration. We shorten the search by 1986 * searching from the bottom of the heap to the top instead of the 1987 * other way around. 1988 */ 1989 for (i = nelems - 1; i >= 0; i--) { 1990 if (heap[i] == ndx) 1991 break; 1992 } 1993 if (i < 0) { 1994 /* 1995 * Report failure (rather than panicking) if and only if the 1996 * cyclic_reprogram() is occurring on the CPU which the cyclic 1997 * resides upon, and there is evidence that a pending cyclic 1998 * was removed from that CPU. 1999 * 2000 * This covers the race where a cyclic is removed out from 2001 * under its running handler, which then attempts a reprogram. 2002 */ 2003 if (is_local && 2004 cpu->cyp_state == CYS_REMOVING && cpu->cyp_rpend > 0) { 2005 return (B_FALSE); 2006 } 2007 panic("attempt to reprogram non-existent cyclic"); 2008 } 2009 2010 cyclic = &cpu->cyp_cyclics[ndx]; 2011 oexpire = cyclic->cy_expire; 2012 cyclic->cy_expire = expire; 2013 2014 reprog = (i == 0); 2015 if (expire > oexpire) { 2016 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "reprog-down", i); 2017 cyclic_downheap(cpu, i); 2018 } else if (i > 0) { 2019 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "reprog-up", i); 2020 reprog = cyclic_upheap(cpu, i); 2021 } 2022 2023 if (reprog && (cpu->cyp_state != CYS_SUSPENDED)) { 2024 /* 2025 * The root changed. Reprogram the clock source. 2026 */ 2027 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "reprog-root"); 2028 cyclic = &cpu->cyp_cyclics[heap[0]]; 2029 be->cyb_reprogram(bar, cyclic->cy_expire); 2030 } 2031 2032 be->cyb_restore_level(bar, cookie); 2033 return (B_TRUE); 2034 } 2035 2036 static void 2037 cyclic_reprogram_xcall(cyc_xcallarg_t *arg) 2038 { 2039 /* 2040 * Cross-call reprogram operations should never fail due to racing 2041 * cyclic removal, as they cannot occur from the handler itself. 2042 */ 2043 VERIFY(cyclic_reprogram_cyclic(arg->cyx_cpu, arg->cyx_ndx, 2044 arg->cyx_when->cyt_when, B_FALSE)); 2045 } 2046 2047 static void 2048 cyclic_reprogram_here(cyc_cpu_t *cpu, cyc_index_t ndx, hrtime_t expiration) 2049 { 2050 cyc_backend_t *be = cpu->cyp_backend; 2051 cyc_xcallarg_t arg; 2052 cyc_time_t when; 2053 2054 ASSERT(expiration > 0); 2055 2056 arg.cyx_ndx = ndx; 2057 arg.cyx_cpu = cpu; 2058 arg.cyx_when = &when; 2059 when.cyt_when = expiration; 2060 2061 be->cyb_xcall(be->cyb_arg, cpu->cyp_cpu, 2062 (cyc_func_t)cyclic_reprogram_xcall, &arg); 2063 } 2064 2065 /* 2066 * cyclic_juggle_one_to() should only be called when the source cyclic 2067 * can be juggled and the destination CPU is known to be able to accept 2068 * it. 2069 */ 2070 static void 2071 cyclic_juggle_one_to(cyc_id_t *idp, cyc_cpu_t *dest) 2072 { 2073 cyc_cpu_t *src = idp->cyi_cpu; 2074 cyc_index_t ndx = idp->cyi_ndx; 2075 cyc_time_t when; 2076 cyc_handler_t hdlr; 2077 cyclic_t *cyclic; 2078 uint16_t flags; 2079 hrtime_t delay; 2080 2081 ASSERT(MUTEX_HELD(&cpu_lock)); 2082 ASSERT(src != NULL && idp->cyi_omni_list == NULL); 2083 ASSERT(!(dest->cyp_cpu->cpu_flags & (CPU_QUIESCED | CPU_OFFLINE))); 2084 CYC_PTRACE("juggle-one-to", idp, dest); 2085 2086 cyclic = &src->cyp_cyclics[ndx]; 2087 2088 flags = cyclic->cy_flags; 2089 ASSERT(!(flags & CYF_CPU_BOUND) && !(flags & CYF_FREE)); 2090 2091 hdlr.cyh_func = cyclic->cy_handler; 2092 hdlr.cyh_level = cyclic->cy_level; 2093 hdlr.cyh_arg = cyclic->cy_arg; 2094 2095 /* 2096 * Before we begin the juggling process, see if the destination 2097 * CPU requires an expansion. If it does, we'll perform the 2098 * expansion before removing the cyclic. This is to prevent us 2099 * from blocking while a system-critical cyclic (notably, the clock 2100 * cyclic) isn't on a CPU. 2101 */ 2102 if (dest->cyp_nelems == dest->cyp_size) { 2103 CYC_PTRACE("remove-expand", idp, dest); 2104 cyclic_expand(dest); 2105 ASSERT(dest->cyp_nelems < dest->cyp_size); 2106 } 2107 2108 /* 2109 * Prevent a reprogram of this cyclic while we are relocating it. 2110 * Otherwise, cyclic_reprogram_here() will end up sending an X-call 2111 * to the wrong CPU. 2112 */ 2113 rw_enter(&idp->cyi_lock, RW_WRITER); 2114 2115 /* 2116 * Remove the cyclic from the source. As mentioned above, we cannot 2117 * block during this operation; if we cannot remove the cyclic 2118 * without waiting, we spin for a time shorter than the interval, and 2119 * reattempt the (non-blocking) removal. If we continue to fail, 2120 * we will exponentially back off (up to half of the interval). 2121 * Note that the removal will ultimately succeed -- even if the 2122 * cyclic handler is blocked on a resource held by a thread which we 2123 * have preempted, priority inheritance assures that the preempted 2124 * thread will preempt us and continue to progress. 2125 */ 2126 for (delay = NANOSEC / MICROSEC; ; delay <<= 1) { 2127 /* 2128 * Before we begin this operation, disable kernel preemption. 2129 */ 2130 kpreempt_disable(); 2131 if (cyclic_remove_here(src, ndx, &when, CY_NOWAIT)) 2132 break; 2133 2134 /* 2135 * The operation failed; enable kernel preemption while 2136 * spinning. 2137 */ 2138 kpreempt_enable(); 2139 2140 CYC_PTRACE("remove-retry", idp, src); 2141 2142 if (delay > (cyclic->cy_interval >> 1)) 2143 delay = cyclic->cy_interval >> 1; 2144 2145 /* 2146 * Drop the RW lock to avoid a deadlock with the cyclic 2147 * handler (because it can potentially call cyclic_reprogram(). 2148 */ 2149 rw_exit(&idp->cyi_lock); 2150 drv_usecwait((clock_t)(delay / (NANOSEC / MICROSEC))); 2151 rw_enter(&idp->cyi_lock, RW_WRITER); 2152 } 2153 2154 /* 2155 * Now add the cyclic to the destination. This won't block; we 2156 * performed any necessary (blocking) expansion of the destination 2157 * CPU before removing the cyclic from the source CPU. 2158 */ 2159 idp->cyi_ndx = cyclic_add_here(dest, &hdlr, &when, flags); 2160 idp->cyi_cpu = dest; 2161 kpreempt_enable(); 2162 2163 /* 2164 * Now that we have successfully relocated the cyclic, allow 2165 * it to be reprogrammed. 2166 */ 2167 rw_exit(&idp->cyi_lock); 2168 } 2169 2170 static int 2171 cyclic_juggle_one(cyc_id_t *idp) 2172 { 2173 cyc_index_t ndx = idp->cyi_ndx; 2174 cyc_cpu_t *cpu = idp->cyi_cpu, *dest; 2175 cyclic_t *cyclic = &cpu->cyp_cyclics[ndx]; 2176 cpu_t *c = cpu->cyp_cpu; 2177 cpupart_t *part = c->cpu_part; 2178 2179 CYC_PTRACE("juggle-one", idp, cpu); 2180 ASSERT(MUTEX_HELD(&cpu_lock)); 2181 ASSERT(!(c->cpu_flags & CPU_OFFLINE)); 2182 ASSERT(cpu->cyp_state == CYS_ONLINE); 2183 ASSERT(!(cyclic->cy_flags & CYF_FREE)); 2184 2185 if ((dest = cyclic_pick_cpu(part, c, c, cyclic->cy_flags)) == NULL) { 2186 /* 2187 * Bad news: this cyclic can't be juggled. 2188 */ 2189 CYC_PTRACE("juggle-fail", idp, cpu) 2190 return (0); 2191 } 2192 2193 cyclic_juggle_one_to(idp, dest); 2194 2195 return (1); 2196 } 2197 2198 static void 2199 cyclic_unbind_cpu(cyclic_id_t id) 2200 { 2201 cyc_id_t *idp = (cyc_id_t *)id; 2202 cyc_cpu_t *cpu = idp->cyi_cpu; 2203 cpu_t *c = cpu->cyp_cpu; 2204 cyclic_t *cyclic = &cpu->cyp_cyclics[idp->cyi_ndx]; 2205 2206 CYC_PTRACE("unbind-cpu", id, cpu); 2207 ASSERT(MUTEX_HELD(&cpu_lock)); 2208 ASSERT(cpu->cyp_state == CYS_ONLINE); 2209 ASSERT(!(cyclic->cy_flags & CYF_FREE)); 2210 ASSERT(cyclic->cy_flags & CYF_CPU_BOUND); 2211 2212 cyclic->cy_flags &= ~CYF_CPU_BOUND; 2213 2214 /* 2215 * If we were bound to CPU which has interrupts disabled, we need 2216 * to juggle away. This can only fail if we are bound to a 2217 * processor set, and if every CPU in the processor set has 2218 * interrupts disabled. 2219 */ 2220 if (!(c->cpu_flags & CPU_ENABLE)) { 2221 int res = cyclic_juggle_one(idp); 2222 2223 ASSERT((res && idp->cyi_cpu != cpu) || 2224 (!res && (cyclic->cy_flags & CYF_PART_BOUND))); 2225 } 2226 } 2227 2228 static void 2229 cyclic_bind_cpu(cyclic_id_t id, cpu_t *d) 2230 { 2231 cyc_id_t *idp = (cyc_id_t *)id; 2232 cyc_cpu_t *dest = d->cpu_cyclic, *cpu = idp->cyi_cpu; 2233 cpu_t *c = cpu->cyp_cpu; 2234 cyclic_t *cyclic = &cpu->cyp_cyclics[idp->cyi_ndx]; 2235 cpupart_t *part = c->cpu_part; 2236 2237 CYC_PTRACE("bind-cpu", id, dest); 2238 ASSERT(MUTEX_HELD(&cpu_lock)); 2239 ASSERT(!(d->cpu_flags & CPU_OFFLINE)); 2240 ASSERT(!(c->cpu_flags & CPU_OFFLINE)); 2241 ASSERT(cpu->cyp_state == CYS_ONLINE); 2242 ASSERT(dest != NULL); 2243 ASSERT(dest->cyp_state == CYS_ONLINE); 2244 ASSERT(!(cyclic->cy_flags & CYF_FREE)); 2245 ASSERT(!(cyclic->cy_flags & CYF_CPU_BOUND)); 2246 2247 dest = cyclic_pick_cpu(part, d, NULL, cyclic->cy_flags | CYF_CPU_BOUND); 2248 2249 if (dest != cpu) { 2250 cyclic_juggle_one_to(idp, dest); 2251 cyclic = &dest->cyp_cyclics[idp->cyi_ndx]; 2252 } 2253 2254 cyclic->cy_flags |= CYF_CPU_BOUND; 2255 } 2256 2257 static void 2258 cyclic_unbind_cpupart(cyclic_id_t id) 2259 { 2260 cyc_id_t *idp = (cyc_id_t *)id; 2261 cyc_cpu_t *cpu = idp->cyi_cpu; 2262 cpu_t *c = cpu->cyp_cpu; 2263 cyclic_t *cyc = &cpu->cyp_cyclics[idp->cyi_ndx]; 2264 2265 CYC_PTRACE("unbind-part", idp, c->cpu_part); 2266 ASSERT(MUTEX_HELD(&cpu_lock)); 2267 ASSERT(cpu->cyp_state == CYS_ONLINE); 2268 ASSERT(!(cyc->cy_flags & CYF_FREE)); 2269 ASSERT(cyc->cy_flags & CYF_PART_BOUND); 2270 2271 cyc->cy_flags &= ~CYF_PART_BOUND; 2272 2273 /* 2274 * If we're on a CPU which has interrupts disabled (and if this cyclic 2275 * isn't bound to the CPU), we need to juggle away. 2276 */ 2277 if (!(c->cpu_flags & CPU_ENABLE) && !(cyc->cy_flags & CYF_CPU_BOUND)) { 2278 int res = cyclic_juggle_one(idp); 2279 2280 ASSERT(res && idp->cyi_cpu != cpu); 2281 } 2282 } 2283 2284 static void 2285 cyclic_bind_cpupart(cyclic_id_t id, cpupart_t *part) 2286 { 2287 cyc_id_t *idp = (cyc_id_t *)id; 2288 cyc_cpu_t *cpu = idp->cyi_cpu, *dest; 2289 cpu_t *c = cpu->cyp_cpu; 2290 cyclic_t *cyc = &cpu->cyp_cyclics[idp->cyi_ndx]; 2291 2292 CYC_PTRACE("bind-part", idp, part); 2293 ASSERT(MUTEX_HELD(&cpu_lock)); 2294 ASSERT(!(c->cpu_flags & CPU_OFFLINE)); 2295 ASSERT(cpu->cyp_state == CYS_ONLINE); 2296 ASSERT(!(cyc->cy_flags & CYF_FREE)); 2297 ASSERT(!(cyc->cy_flags & CYF_PART_BOUND)); 2298 ASSERT(part->cp_ncpus > 0); 2299 2300 dest = cyclic_pick_cpu(part, c, NULL, cyc->cy_flags | CYF_PART_BOUND); 2301 2302 if (dest != cpu) { 2303 cyclic_juggle_one_to(idp, dest); 2304 cyc = &dest->cyp_cyclics[idp->cyi_ndx]; 2305 } 2306 2307 cyc->cy_flags |= CYF_PART_BOUND; 2308 } 2309 2310 static void 2311 cyclic_configure(cpu_t *c) 2312 { 2313 cyc_cpu_t *cpu = kmem_zalloc(sizeof (cyc_cpu_t), KM_SLEEP); 2314 cyc_backend_t *nbe = kmem_zalloc(sizeof (cyc_backend_t), KM_SLEEP); 2315 int i; 2316 2317 CYC_PTRACE1("configure", cpu); 2318 ASSERT(MUTEX_HELD(&cpu_lock)); 2319 2320 if (cyclic_id_cache == NULL) 2321 cyclic_id_cache = kmem_cache_create("cyclic_id_cache", 2322 sizeof (cyc_id_t), 0, NULL, NULL, NULL, NULL, NULL, 0); 2323 2324 cpu->cyp_cpu = c; 2325 2326 sema_init(&cpu->cyp_modify_wait, 0, NULL, SEMA_DEFAULT, NULL); 2327 2328 cpu->cyp_size = 1; 2329 cpu->cyp_heap = kmem_zalloc(sizeof (cyc_index_t), KM_SLEEP); 2330 cpu->cyp_cyclics = kmem_zalloc(sizeof (cyclic_t), KM_SLEEP); 2331 cpu->cyp_cyclics->cy_flags = CYF_FREE; 2332 2333 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 2334 /* 2335 * We don't need to set the sizemask; it's already zero 2336 * (which is the appropriate sizemask for a size of 1). 2337 */ 2338 cpu->cyp_softbuf[i].cys_buf[0].cypc_buf = 2339 kmem_alloc(sizeof (cyc_index_t), KM_SLEEP); 2340 } 2341 2342 cpu->cyp_state = CYS_OFFLINE; 2343 2344 /* 2345 * Setup the backend for this CPU. 2346 */ 2347 bcopy(&cyclic_backend, nbe, sizeof (cyc_backend_t)); 2348 nbe->cyb_arg = nbe->cyb_configure(c); 2349 cpu->cyp_backend = nbe; 2350 2351 /* 2352 * On platforms where stray interrupts may be taken during startup, 2353 * the CPU's cpu_cyclic pointer serves as an indicator that the 2354 * cyclic subsystem for this CPU is prepared to field interrupts. 2355 */ 2356 membar_producer(); 2357 2358 c->cpu_cyclic = cpu; 2359 } 2360 2361 static void 2362 cyclic_unconfigure(cpu_t *c) 2363 { 2364 cyc_cpu_t *cpu = c->cpu_cyclic; 2365 cyc_backend_t *be = cpu->cyp_backend; 2366 cyb_arg_t bar = be->cyb_arg; 2367 int i; 2368 2369 CYC_PTRACE1("unconfigure", cpu); 2370 ASSERT(MUTEX_HELD(&cpu_lock)); 2371 ASSERT(cpu->cyp_state == CYS_OFFLINE); 2372 ASSERT(cpu->cyp_nelems == 0); 2373 2374 /* 2375 * Let the backend know that the CPU is being yanked, and free up 2376 * the backend structure. 2377 */ 2378 be->cyb_unconfigure(bar); 2379 kmem_free(be, sizeof (cyc_backend_t)); 2380 cpu->cyp_backend = NULL; 2381 2382 /* 2383 * Free up the producer/consumer buffers at each of the soft levels. 2384 */ 2385 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 2386 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 2387 uchar_t hard = softbuf->cys_hard; 2388 cyc_pcbuffer_t *pc = &softbuf->cys_buf[hard]; 2389 size_t bufsize = sizeof (cyc_index_t) * (pc->cypc_sizemask + 1); 2390 2391 /* 2392 * Assert that we're not in the middle of a resize operation. 2393 */ 2394 ASSERT(hard == softbuf->cys_soft); 2395 ASSERT(hard == 0 || hard == 1); 2396 ASSERT(pc->cypc_buf != NULL); 2397 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_buf == NULL); 2398 2399 kmem_free(pc->cypc_buf, bufsize); 2400 pc->cypc_buf = NULL; 2401 } 2402 2403 /* 2404 * Finally, clean up our remaining dynamic structures and NULL out 2405 * the cpu_cyclic pointer. 2406 */ 2407 kmem_free(cpu->cyp_cyclics, cpu->cyp_size * sizeof (cyclic_t)); 2408 kmem_free(cpu->cyp_heap, cpu->cyp_size * sizeof (cyc_index_t)); 2409 kmem_free(cpu, sizeof (cyc_cpu_t)); 2410 2411 c->cpu_cyclic = NULL; 2412 } 2413 2414 static int 2415 cyclic_cpu_setup(cpu_setup_t what, int id, void *arg __unused) 2416 { 2417 /* 2418 * We are guaranteed that there is still/already an entry in the 2419 * cpu array for this CPU. 2420 */ 2421 cpu_t *c = cpu[id]; 2422 cyc_cpu_t *cyp = c->cpu_cyclic; 2423 2424 ASSERT(MUTEX_HELD(&cpu_lock)); 2425 2426 switch (what) { 2427 case CPU_CONFIG: 2428 ASSERT(cyp == NULL); 2429 cyclic_configure(c); 2430 break; 2431 2432 case CPU_UNCONFIG: 2433 ASSERT(cyp != NULL && cyp->cyp_state == CYS_OFFLINE); 2434 cyclic_unconfigure(c); 2435 break; 2436 2437 default: 2438 break; 2439 } 2440 2441 return (0); 2442 } 2443 2444 static void 2445 cyclic_suspend_xcall(cyc_xcallarg_t *arg) 2446 { 2447 cyc_cpu_t *cpu = arg->cyx_cpu; 2448 cyc_backend_t *be = cpu->cyp_backend; 2449 cyc_cookie_t cookie; 2450 cyb_arg_t bar = be->cyb_arg; 2451 2452 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 2453 2454 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "suspend-xcall", cpu->cyp_nelems); 2455 ASSERT(cpu->cyp_state == CYS_ONLINE || cpu->cyp_state == CYS_OFFLINE); 2456 2457 /* 2458 * We won't disable this CPU unless it has a non-zero number of 2459 * elements (cpu_lock assures that no one else may be attempting 2460 * to disable this CPU). 2461 */ 2462 if (cpu->cyp_nelems > 0) { 2463 ASSERT(cpu->cyp_state == CYS_ONLINE); 2464 be->cyb_disable(bar); 2465 } 2466 2467 if (cpu->cyp_state == CYS_ONLINE) 2468 cpu->cyp_state = CYS_SUSPENDED; 2469 2470 be->cyb_suspend(bar); 2471 be->cyb_restore_level(bar, cookie); 2472 } 2473 2474 static void 2475 cyclic_resume_xcall(cyc_xcallarg_t *arg) 2476 { 2477 cyc_cpu_t *cpu = arg->cyx_cpu; 2478 cyc_backend_t *be = cpu->cyp_backend; 2479 cyc_cookie_t cookie; 2480 cyb_arg_t bar = be->cyb_arg; 2481 cyc_state_t state = cpu->cyp_state; 2482 2483 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 2484 2485 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "resume-xcall", cpu->cyp_nelems); 2486 ASSERT(state == CYS_SUSPENDED || state == CYS_OFFLINE); 2487 2488 be->cyb_resume(bar); 2489 2490 /* 2491 * We won't enable this CPU unless it has a non-zero number of 2492 * elements. 2493 */ 2494 if (cpu->cyp_nelems > 0) { 2495 cyclic_t *cyclic = &cpu->cyp_cyclics[cpu->cyp_heap[0]]; 2496 hrtime_t exp = cyclic->cy_expire; 2497 2498 CYC_TRACE(cpu, CY_HIGH_LEVEL, "resume-reprog", cyclic, exp); 2499 ASSERT(state == CYS_SUSPENDED); 2500 be->cyb_enable(bar); 2501 be->cyb_reprogram(bar, exp); 2502 } 2503 2504 if (state == CYS_SUSPENDED) 2505 cpu->cyp_state = CYS_ONLINE; 2506 2507 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "resume-done", cpu->cyp_nelems); 2508 be->cyb_restore_level(bar, cookie); 2509 } 2510 2511 static void 2512 cyclic_omni_start(cyc_id_t *idp, cyc_cpu_t *cpu) 2513 { 2514 cyc_omni_handler_t *omni = &idp->cyi_omni_hdlr; 2515 cyc_omni_cpu_t *ocpu = kmem_alloc(sizeof (cyc_omni_cpu_t), KM_SLEEP); 2516 cyc_handler_t hdlr; 2517 cyc_time_t when; 2518 2519 CYC_PTRACE("omni-start", cpu, idp); 2520 ASSERT(MUTEX_HELD(&cpu_lock)); 2521 ASSERT(cpu->cyp_state == CYS_ONLINE); 2522 ASSERT(idp->cyi_cpu == NULL); 2523 2524 hdlr.cyh_func = NULL; 2525 hdlr.cyh_arg = NULL; 2526 hdlr.cyh_level = CY_LEVELS; 2527 2528 when.cyt_when = 0; 2529 when.cyt_interval = 0; 2530 2531 omni->cyo_online(omni->cyo_arg, cpu->cyp_cpu, &hdlr, &when); 2532 2533 ASSERT(hdlr.cyh_func != NULL); 2534 ASSERT(hdlr.cyh_level < CY_LEVELS); 2535 ASSERT(when.cyt_when >= 0 && when.cyt_interval > 0); 2536 2537 ocpu->cyo_cpu = cpu; 2538 ocpu->cyo_arg = hdlr.cyh_arg; 2539 ocpu->cyo_ndx = cyclic_add_here(cpu, &hdlr, &when, 0); 2540 ocpu->cyo_next = idp->cyi_omni_list; 2541 idp->cyi_omni_list = ocpu; 2542 } 2543 2544 static void 2545 cyclic_omni_stop(cyc_id_t *idp, cyc_cpu_t *cpu) 2546 { 2547 cyc_omni_handler_t *omni = &idp->cyi_omni_hdlr; 2548 cyc_omni_cpu_t *ocpu = idp->cyi_omni_list, *prev = NULL; 2549 clock_t delay; 2550 int ret; 2551 2552 CYC_PTRACE("omni-stop", cpu, idp); 2553 ASSERT(MUTEX_HELD(&cpu_lock)); 2554 ASSERT(cpu->cyp_state == CYS_ONLINE); 2555 ASSERT(idp->cyi_cpu == NULL); 2556 ASSERT(ocpu != NULL); 2557 2558 /* 2559 * Prevent a reprogram of this cyclic while we are removing it. 2560 * Otherwise, cyclic_reprogram_here() will end up sending an X-call 2561 * to the offlined CPU. 2562 */ 2563 rw_enter(&idp->cyi_lock, RW_WRITER); 2564 2565 while (ocpu != NULL && ocpu->cyo_cpu != cpu) { 2566 prev = ocpu; 2567 ocpu = ocpu->cyo_next; 2568 } 2569 2570 /* 2571 * We _must_ have found an cyc_omni_cpu which corresponds to this 2572 * CPU -- the definition of an omnipresent cyclic is that it runs 2573 * on all online CPUs. 2574 */ 2575 ASSERT(ocpu != NULL); 2576 2577 if (prev == NULL) { 2578 idp->cyi_omni_list = ocpu->cyo_next; 2579 } else { 2580 prev->cyo_next = ocpu->cyo_next; 2581 } 2582 2583 /* 2584 * Remove the cyclic from the source. We cannot block during this 2585 * operation because we are holding the cyi_lock which can be held 2586 * by the cyclic handler via cyclic_reprogram(). 2587 * 2588 * If we cannot remove the cyclic without waiting, we spin for a time, 2589 * and reattempt the (non-blocking) removal. If the handler is blocked 2590 * on the cyi_lock, then we let go of it in the spin loop to give 2591 * the handler a chance to run. Note that the removal will ultimately 2592 * succeed -- even if the cyclic handler is blocked on a resource 2593 * held by a thread which we have preempted, priority inheritance 2594 * assures that the preempted thread will preempt us and continue 2595 * to progress. 2596 */ 2597 for (delay = 1; ; delay <<= 1) { 2598 /* 2599 * Before we begin this operation, disable kernel preemption. 2600 */ 2601 kpreempt_disable(); 2602 ret = cyclic_remove_here(ocpu->cyo_cpu, ocpu->cyo_ndx, NULL, 2603 CY_NOWAIT); 2604 /* 2605 * Enable kernel preemption while spinning. 2606 */ 2607 kpreempt_enable(); 2608 2609 if (ret) 2610 break; 2611 2612 CYC_PTRACE("remove-omni-retry", idp, ocpu->cyo_cpu); 2613 2614 /* 2615 * Drop the RW lock to avoid a deadlock with the cyclic 2616 * handler (because it can potentially call cyclic_reprogram(). 2617 */ 2618 rw_exit(&idp->cyi_lock); 2619 drv_usecwait(delay); 2620 rw_enter(&idp->cyi_lock, RW_WRITER); 2621 } 2622 2623 /* 2624 * Now that we have successfully removed the cyclic, allow the omni 2625 * cyclic to be reprogrammed on other CPUs. 2626 */ 2627 rw_exit(&idp->cyi_lock); 2628 2629 /* 2630 * The cyclic has been removed from this CPU; time to call the 2631 * omnipresent offline handler. 2632 */ 2633 if (omni->cyo_offline != NULL) 2634 omni->cyo_offline(omni->cyo_arg, cpu->cyp_cpu, ocpu->cyo_arg); 2635 2636 kmem_free(ocpu, sizeof (cyc_omni_cpu_t)); 2637 } 2638 2639 static cyc_id_t * 2640 cyclic_new_id() 2641 { 2642 cyc_id_t *idp; 2643 2644 ASSERT(MUTEX_HELD(&cpu_lock)); 2645 2646 idp = kmem_cache_alloc(cyclic_id_cache, KM_SLEEP); 2647 2648 /* 2649 * The cyi_cpu field of the cyc_id_t structure tracks the CPU 2650 * associated with the cyclic. If and only if this field is NULL, the 2651 * cyc_id_t is an omnipresent cyclic. Note that cyi_omni_list may be 2652 * NULL for an omnipresent cyclic while the cyclic is being created 2653 * or destroyed. 2654 */ 2655 idp->cyi_cpu = NULL; 2656 idp->cyi_ndx = 0; 2657 rw_init(&idp->cyi_lock, NULL, RW_DEFAULT, NULL); 2658 2659 idp->cyi_next = cyclic_id_head; 2660 idp->cyi_prev = NULL; 2661 idp->cyi_omni_list = NULL; 2662 2663 if (cyclic_id_head != NULL) { 2664 ASSERT(cyclic_id_head->cyi_prev == NULL); 2665 cyclic_id_head->cyi_prev = idp; 2666 } 2667 2668 cyclic_id_head = idp; 2669 2670 return (idp); 2671 } 2672 2673 /* 2674 * cyclic_id_t cyclic_add(cyc_handler_t *, cyc_time_t *) 2675 * 2676 * Overview 2677 * 2678 * cyclic_add() will create an unbound cyclic with the specified handler and 2679 * interval. The cyclic will run on a CPU which both has interrupts enabled 2680 * and is in the system CPU partition. 2681 * 2682 * Arguments and notes 2683 * 2684 * As its first argument, cyclic_add() takes a cyc_handler, which has the 2685 * following members: 2686 * 2687 * cyc_func_t cyh_func <-- Cyclic handler 2688 * void *cyh_arg <-- Argument to cyclic handler 2689 * cyc_level_t cyh_level <-- Level at which to fire; must be one of 2690 * CY_LOW_LEVEL, CY_LOCK_LEVEL or CY_HIGH_LEVEL 2691 * 2692 * Note that cyh_level is _not_ an ipl or spl; it must be one the 2693 * CY_*_LEVELs. This layer of abstraction allows the platform to define 2694 * the precise interrupt priority levels, within the following constraints: 2695 * 2696 * CY_LOCK_LEVEL must map to LOCK_LEVEL 2697 * CY_HIGH_LEVEL must map to an ipl greater than LOCK_LEVEL 2698 * CY_LOW_LEVEL must map to an ipl below LOCK_LEVEL 2699 * 2700 * In addition to a cyc_handler, cyclic_add() takes a cyc_time, which 2701 * has the following members: 2702 * 2703 * hrtime_t cyt_when <-- Absolute time, in nanoseconds since boot, at 2704 * which to start firing 2705 * hrtime_t cyt_interval <-- Length of interval, in nanoseconds 2706 * 2707 * gethrtime() is the time source for nanoseconds since boot. If cyt_when 2708 * is set to 0, the cyclic will start to fire when cyt_interval next 2709 * divides the number of nanoseconds since boot. 2710 * 2711 * The cyt_interval field _must_ be filled in by the caller; one-shots are 2712 * _not_ explicitly supported by the cyclic subsystem (cyclic_add() will 2713 * assert that cyt_interval is non-zero). The maximum value for either 2714 * field is INT64_MAX; the caller is responsible for assuring that 2715 * cyt_when + cyt_interval <= INT64_MAX. Neither field may be negative. 2716 * 2717 * For an arbitrary time t in the future, the cyclic handler is guaranteed 2718 * to have been called (t - cyt_when) / cyt_interval times. This will 2719 * be true even if interrupts have been disabled for periods greater than 2720 * cyt_interval nanoseconds. In order to compensate for such periods, 2721 * the cyclic handler may be called a finite number of times with an 2722 * arbitrarily small interval. 2723 * 2724 * The cyclic subsystem will not enforce any lower bound on the interval; 2725 * if the interval is less than the time required to process an interrupt, 2726 * the CPU will wedge. It's the responsibility of the caller to assure that 2727 * either the value of the interval is sane, or that its caller has 2728 * sufficient privilege to deny service (i.e. its caller is root). 2729 * 2730 * The cyclic handler is guaranteed to be single threaded, even while the 2731 * cyclic is being juggled between CPUs (see cyclic_juggle(), below). 2732 * That is, a given cyclic handler will never be executed simultaneously 2733 * on different CPUs. 2734 * 2735 * Return value 2736 * 2737 * cyclic_add() returns a cyclic_id_t, which is guaranteed to be a value 2738 * other than CYCLIC_NONE. cyclic_add() cannot fail. 2739 * 2740 * Caller's context 2741 * 2742 * cpu_lock must be held by the caller, and the caller must not be in 2743 * interrupt context. cyclic_add() will perform a KM_SLEEP kernel 2744 * memory allocation, so the usual rules (e.g. p_lock cannot be held) 2745 * apply. A cyclic may be added even in the presence of CPUs that have 2746 * not been configured with respect to the cyclic subsystem, but only 2747 * configured CPUs will be eligible to run the new cyclic. 2748 * 2749 * Cyclic handler's context 2750 * 2751 * Cyclic handlers will be executed in the interrupt context corresponding 2752 * to the specified level (i.e. either high, lock or low level). The 2753 * usual context rules apply. 2754 * 2755 * A cyclic handler may not grab ANY locks held by the caller of any of 2756 * cyclic_add(), cyclic_remove() or cyclic_bind(); the implementation of 2757 * these functions may require blocking on cyclic handler completion. 2758 * Moreover, cyclic handlers may not make any call back into the cyclic 2759 * subsystem. 2760 */ 2761 cyclic_id_t 2762 cyclic_add(cyc_handler_t *hdlr, cyc_time_t *when) 2763 { 2764 cyc_id_t *idp = cyclic_new_id(); 2765 2766 ASSERT(MUTEX_HELD(&cpu_lock)); 2767 ASSERT(when->cyt_when >= 0 && when->cyt_interval > 0); 2768 2769 idp->cyi_cpu = cyclic_pick_cpu(NULL, NULL, NULL, 0); 2770 idp->cyi_ndx = cyclic_add_here(idp->cyi_cpu, hdlr, when, 0); 2771 2772 return ((uintptr_t)idp); 2773 } 2774 2775 /* 2776 * cyclic_id_t cyclic_add_omni(cyc_omni_handler_t *) 2777 * 2778 * Overview 2779 * 2780 * cyclic_add_omni() will create an omnipresent cyclic with the specified 2781 * online and offline handlers. Omnipresent cyclics run on all online 2782 * CPUs, including CPUs which have unbound interrupts disabled. 2783 * 2784 * Arguments 2785 * 2786 * As its only argument, cyclic_add_omni() takes a cyc_omni_handler, which 2787 * has the following members: 2788 * 2789 * void (*cyo_online)() <-- Online handler 2790 * void (*cyo_offline)() <-- Offline handler 2791 * void *cyo_arg <-- Argument to be passed to on/offline handlers 2792 * 2793 * Online handler 2794 * 2795 * The cyo_online member is a pointer to a function which has the following 2796 * four arguments: 2797 * 2798 * void * <-- Argument (cyo_arg) 2799 * cpu_t * <-- Pointer to CPU about to be onlined 2800 * cyc_handler_t * <-- Pointer to cyc_handler_t; must be filled in 2801 * by omni online handler 2802 * cyc_time_t * <-- Pointer to cyc_time_t; must be filled in by 2803 * omni online handler 2804 * 2805 * The omni cyclic online handler is always called _before_ the omni 2806 * cyclic begins to fire on the specified CPU. As the above argument 2807 * description implies, the online handler must fill in the two structures 2808 * passed to it: the cyc_handler_t and the cyc_time_t. These are the 2809 * same two structures passed to cyclic_add(), outlined above. This 2810 * allows the omni cyclic to have maximum flexibility; different CPUs may 2811 * optionally 2812 * 2813 * (a) have different intervals 2814 * (b) be explicitly in or out of phase with one another 2815 * (c) have different handlers 2816 * (d) have different handler arguments 2817 * (e) fire at different levels 2818 * 2819 * Of these, (e) seems somewhat dubious, but is nonetheless allowed. 2820 * 2821 * The omni online handler is called in the same context as cyclic_add(), 2822 * and has the same liberties: omni online handlers may perform KM_SLEEP 2823 * kernel memory allocations, and may grab locks which are also acquired 2824 * by cyclic handlers. However, omni cyclic online handlers may _not_ 2825 * call back into the cyclic subsystem, and should be generally careful 2826 * about calling into arbitrary kernel subsystems. 2827 * 2828 * Offline handler 2829 * 2830 * The cyo_offline member is a pointer to a function which has the following 2831 * three arguments: 2832 * 2833 * void * <-- Argument (cyo_arg) 2834 * cpu_t * <-- Pointer to CPU about to be offlined 2835 * void * <-- CPU's cyclic argument (that is, value 2836 * to which cyh_arg member of the cyc_handler_t 2837 * was set in the omni online handler) 2838 * 2839 * The omni cyclic offline handler is always called _after_ the omni 2840 * cyclic has ceased firing on the specified CPU. Its purpose is to 2841 * allow cleanup of any resources dynamically allocated in the omni cyclic 2842 * online handler. The context of the offline handler is identical to 2843 * that of the online handler; the same constraints and liberties apply. 2844 * 2845 * The offline handler is optional; it may be NULL. 2846 * 2847 * Return value 2848 * 2849 * cyclic_add_omni() returns a cyclic_id_t, which is guaranteed to be a 2850 * value other than CYCLIC_NONE. cyclic_add_omni() cannot fail. 2851 * 2852 * Caller's context 2853 * 2854 * The caller's context is identical to that of cyclic_add(), specified 2855 * above. 2856 */ 2857 cyclic_id_t 2858 cyclic_add_omni(cyc_omni_handler_t *omni) 2859 { 2860 cyc_id_t *idp = cyclic_new_id(); 2861 cyc_cpu_t *cpu; 2862 cpu_t *c; 2863 2864 ASSERT(MUTEX_HELD(&cpu_lock)); 2865 ASSERT(omni != NULL && omni->cyo_online != NULL); 2866 2867 idp->cyi_omni_hdlr = *omni; 2868 2869 c = cpu_list; 2870 do { 2871 if ((cpu = c->cpu_cyclic) == NULL) 2872 continue; 2873 2874 if (cpu->cyp_state != CYS_ONLINE) { 2875 ASSERT(cpu->cyp_state == CYS_OFFLINE); 2876 continue; 2877 } 2878 2879 cyclic_omni_start(idp, cpu); 2880 } while ((c = c->cpu_next) != cpu_list); 2881 2882 /* 2883 * We must have found at least one online CPU on which to run 2884 * this cyclic. 2885 */ 2886 ASSERT(idp->cyi_omni_list != NULL); 2887 ASSERT(idp->cyi_cpu == NULL); 2888 2889 return ((uintptr_t)idp); 2890 } 2891 2892 /* 2893 * void cyclic_remove(cyclic_id_t) 2894 * 2895 * Overview 2896 * 2897 * cyclic_remove() will remove the specified cyclic from the system. 2898 * 2899 * Arguments and notes 2900 * 2901 * The only argument is a cyclic_id returned from either cyclic_add() or 2902 * cyclic_add_omni(). 2903 * 2904 * By the time cyclic_remove() returns, the caller is guaranteed that the 2905 * removed cyclic handler has completed execution (this is the same 2906 * semantic that untimeout() provides). As a result, cyclic_remove() may 2907 * need to block, waiting for the removed cyclic to complete execution. 2908 * This leads to an important constraint on the caller: no lock may be 2909 * held across cyclic_remove() that also may be acquired by a cyclic 2910 * handler. 2911 * 2912 * Return value 2913 * 2914 * None; cyclic_remove() always succeeds. 2915 * 2916 * Caller's context 2917 * 2918 * cpu_lock must be held by the caller, and the caller must not be in 2919 * interrupt context. The caller may not hold any locks which are also 2920 * grabbed by any cyclic handler. See "Arguments and notes", above. 2921 */ 2922 void 2923 cyclic_remove(cyclic_id_t id) 2924 { 2925 cyc_id_t *idp = (cyc_id_t *)id; 2926 cyc_id_t *prev = idp->cyi_prev, *next = idp->cyi_next; 2927 cyc_cpu_t *cpu = idp->cyi_cpu; 2928 2929 CYC_PTRACE("remove", idp, idp->cyi_cpu); 2930 ASSERT(MUTEX_HELD(&cpu_lock)); 2931 2932 if (cpu != NULL) { 2933 (void) cyclic_remove_here(cpu, idp->cyi_ndx, NULL, CY_WAIT); 2934 } else { 2935 ASSERT(idp->cyi_omni_list != NULL); 2936 while (idp->cyi_omni_list != NULL) 2937 cyclic_omni_stop(idp, idp->cyi_omni_list->cyo_cpu); 2938 } 2939 2940 if (prev != NULL) { 2941 ASSERT(cyclic_id_head != idp); 2942 prev->cyi_next = next; 2943 } else { 2944 ASSERT(cyclic_id_head == idp); 2945 cyclic_id_head = next; 2946 } 2947 2948 if (next != NULL) 2949 next->cyi_prev = prev; 2950 2951 kmem_cache_free(cyclic_id_cache, idp); 2952 } 2953 2954 /* 2955 * void cyclic_bind(cyclic_id_t, cpu_t *, cpupart_t *) 2956 * 2957 * Overview 2958 * 2959 * cyclic_bind() atomically changes the CPU and CPU partition bindings 2960 * of a cyclic. 2961 * 2962 * Arguments and notes 2963 * 2964 * The first argument is a cyclic_id retuned from cyclic_add(). 2965 * cyclic_bind() may _not_ be called on a cyclic_id returned from 2966 * cyclic_add_omni(). 2967 * 2968 * The second argument specifies the CPU to which to bind the specified 2969 * cyclic. If the specified cyclic is bound to a CPU other than the one 2970 * specified, it will be unbound from its bound CPU. Unbinding the cyclic 2971 * from its CPU may cause it to be juggled to another CPU. If the specified 2972 * CPU is non-NULL, the cyclic will be subsequently rebound to the specified 2973 * CPU. 2974 * 2975 * If a CPU with bound cyclics is transitioned into the P_NOINTR state, 2976 * only cyclics not bound to the CPU can be juggled away; CPU-bound cyclics 2977 * will continue to fire on the P_NOINTR CPU. A CPU with bound cyclics 2978 * cannot be offlined (attempts to offline the CPU will return EBUSY). 2979 * Likewise, cyclics may not be bound to an offline CPU; if the caller 2980 * attempts to bind a cyclic to an offline CPU, the cyclic subsystem will 2981 * panic. 2982 * 2983 * The third argument specifies the CPU partition to which to bind the 2984 * specified cyclic. If the specified cyclic is bound to a CPU partition 2985 * other than the one specified, it will be unbound from its bound 2986 * partition. Unbinding the cyclic from its CPU partition may cause it 2987 * to be juggled to another CPU. If the specified CPU partition is 2988 * non-NULL, the cyclic will be subsequently rebound to the specified CPU 2989 * partition. 2990 * 2991 * It is the caller's responsibility to assure that the specified CPU 2992 * partition contains a CPU. If it does not, the cyclic subsystem will 2993 * panic. A CPU partition with bound cyclics cannot be destroyed (attempts 2994 * to destroy the partition will return EBUSY). If a CPU with 2995 * partition-bound cyclics is transitioned into the P_NOINTR state, cyclics 2996 * bound to the CPU's partition (but not bound to the CPU) will be juggled 2997 * away only if there exists another CPU in the partition in the P_ONLINE 2998 * state. 2999 * 3000 * It is the caller's responsibility to assure that the specified CPU and 3001 * CPU partition are self-consistent. If both parameters are non-NULL, 3002 * and the specified CPU partition does not contain the specified CPU, the 3003 * cyclic subsystem will panic. 3004 * 3005 * It is the caller's responsibility to assure that the specified CPU has 3006 * been configured with respect to the cyclic subsystem. Generally, this 3007 * is always true for valid, on-line CPUs. The only periods of time during 3008 * which this may not be true are during MP boot (i.e. after cyclic_init() 3009 * is called but before cyclic_mp_init() is called) or during dynamic 3010 * reconfiguration; cyclic_bind() should only be called with great care 3011 * from these contexts. 3012 * 3013 * Return value 3014 * 3015 * None; cyclic_bind() always succeeds. 3016 * 3017 * Caller's context 3018 * 3019 * cpu_lock must be held by the caller, and the caller must not be in 3020 * interrupt context. The caller may not hold any locks which are also 3021 * grabbed by any cyclic handler. 3022 */ 3023 void 3024 cyclic_bind(cyclic_id_t id, cpu_t *d, cpupart_t *part) 3025 { 3026 cyc_id_t *idp = (cyc_id_t *)id; 3027 cyc_cpu_t *cpu = idp->cyi_cpu; 3028 cpu_t *c; 3029 uint16_t flags; 3030 3031 CYC_PTRACE("bind", d, part); 3032 ASSERT(MUTEX_HELD(&cpu_lock)); 3033 ASSERT(part == NULL || d == NULL || d->cpu_part == part); 3034 3035 if (cpu == NULL) { 3036 ASSERT(idp->cyi_omni_list != NULL); 3037 panic("attempt to change binding of omnipresent cyclic"); 3038 } 3039 3040 c = cpu->cyp_cpu; 3041 flags = cpu->cyp_cyclics[idp->cyi_ndx].cy_flags; 3042 3043 if (c != d && (flags & CYF_CPU_BOUND)) 3044 cyclic_unbind_cpu(id); 3045 3046 /* 3047 * Reload our cpu (we may have migrated). We don't have to reload 3048 * the flags field here; if we were CYF_PART_BOUND on entry, we are 3049 * CYF_PART_BOUND now. 3050 */ 3051 cpu = idp->cyi_cpu; 3052 c = cpu->cyp_cpu; 3053 3054 if (part != c->cpu_part && (flags & CYF_PART_BOUND)) 3055 cyclic_unbind_cpupart(id); 3056 3057 /* 3058 * Now reload the flags field, asserting that if we are CPU bound, 3059 * the CPU was specified (and likewise, if we are partition bound, 3060 * the partition was specified). 3061 */ 3062 cpu = idp->cyi_cpu; 3063 c = cpu->cyp_cpu; 3064 flags = cpu->cyp_cyclics[idp->cyi_ndx].cy_flags; 3065 ASSERT(!(flags & CYF_CPU_BOUND) || c == d); 3066 ASSERT(!(flags & CYF_PART_BOUND) || c->cpu_part == part); 3067 3068 if (!(flags & CYF_CPU_BOUND) && d != NULL) 3069 cyclic_bind_cpu(id, d); 3070 3071 if (!(flags & CYF_PART_BOUND) && part != NULL) 3072 cyclic_bind_cpupart(id, part); 3073 } 3074 3075 int 3076 cyclic_reprogram(cyclic_id_t id, hrtime_t expiration) 3077 { 3078 cyc_id_t *idp = (cyc_id_t *)id; 3079 cyc_cpu_t *cpu; 3080 cyc_omni_cpu_t *ocpu; 3081 cyc_index_t ndx; 3082 int res = 1; 3083 3084 ASSERT(expiration > 0); 3085 3086 CYC_PTRACE("reprog", idp, idp->cyi_cpu); 3087 3088 kpreempt_disable(); 3089 3090 /* 3091 * Prevent the cyclic from moving or disappearing while we reprogram. 3092 */ 3093 rw_enter(&idp->cyi_lock, RW_READER); 3094 3095 if (idp->cyi_cpu == NULL) { 3096 ASSERT(curthread->t_preempt > 0); 3097 cpu = CPU->cpu_cyclic; 3098 3099 /* 3100 * For an omni cyclic, we reprogram the cyclic corresponding 3101 * to the current CPU. Look for it in the list. 3102 */ 3103 ocpu = idp->cyi_omni_list; 3104 while (ocpu != NULL) { 3105 if (ocpu->cyo_cpu == cpu) 3106 break; 3107 ocpu = ocpu->cyo_next; 3108 } 3109 3110 if (ocpu == NULL) { 3111 /* 3112 * Didn't find it. This means that CPU offline 3113 * must have removed it racing with us. So, 3114 * nothing to do. 3115 */ 3116 rw_exit(&idp->cyi_lock); 3117 3118 kpreempt_enable(); 3119 3120 return (0); 3121 } 3122 ndx = ocpu->cyo_ndx; 3123 } else { 3124 cpu = idp->cyi_cpu; 3125 ndx = idp->cyi_ndx; 3126 } 3127 3128 if (cpu->cyp_cpu == CPU) { 3129 /* 3130 * If this reprogram is being done as part of a running cyclic 3131 * handler, it is possible that a racing cyclic_remove() on a 3132 * remote CPU will cause it to fail. 3133 */ 3134 if (!cyclic_reprogram_cyclic(cpu, ndx, expiration, B_TRUE)) { 3135 res = 0; 3136 } 3137 } else { 3138 cyclic_reprogram_here(cpu, ndx, expiration); 3139 } 3140 3141 /* 3142 * Allow the cyclic to be moved or removed. 3143 */ 3144 rw_exit(&idp->cyi_lock); 3145 3146 kpreempt_enable(); 3147 3148 return (res); 3149 } 3150 3151 /* 3152 * void cyclic_move_here(cyclic_id_t) 3153 * 3154 * Overview 3155 * 3156 * cyclic_move_here() attempts to shuffle a cyclic onto the current CPU. 3157 * 3158 * Arguments and notes 3159 * 3160 * The first argument is a cyclic_id returned from cyclic_add(). 3161 * cyclic_move_here() may _not_ be called on a cyclic_id returned from 3162 * cyclic_add_omni() or one bound to a CPU or partition via cyclic_bind(). 3163 * 3164 * This cyclic shuffling is performed on a best-effort basis. If for some 3165 * reason the current CPU is unsuitable or the thread migrates between CPUs 3166 * during the call, the function may return with the cyclic residing on some 3167 * other CPU. 3168 * 3169 * Return value 3170 * 3171 * None; cyclic_move_here() always reports success. 3172 * 3173 * Caller's context 3174 * 3175 * cpu_lock must be held by the caller, and the caller must not be in 3176 * interrupt context. The caller may not hold any locks which are also 3177 * grabbed by any cyclic handler. 3178 */ 3179 void 3180 cyclic_move_here(cyclic_id_t id) 3181 { 3182 cyc_id_t *idp = (cyc_id_t *)id; 3183 cyc_cpu_t *cc = idp->cyi_cpu; 3184 cpu_t *dest = CPU; 3185 3186 ASSERT(MUTEX_HELD(&cpu_lock)); 3187 CYC_PTRACE("move_here", idp, dest); 3188 VERIFY3P(cc, !=, NULL); 3189 VERIFY3U(cc->cyp_cyclics[idp->cyi_ndx].cy_flags & 3190 (CYF_CPU_BOUND|CYF_PART_BOUND), ==, 0); 3191 3192 if (cc->cyp_cpu == dest) { 3193 return; 3194 } 3195 3196 /* Is the destination CPU suitable for a migration target? */ 3197 if (dest->cpu_cyclic == NULL || 3198 dest->cpu_cyclic->cyp_state == CYS_OFFLINE || 3199 (dest->cpu_flags & CPU_ENABLE) == 0) { 3200 return; 3201 } 3202 3203 cyclic_juggle_one_to(idp, dest->cpu_cyclic); 3204 } 3205 3206 hrtime_t 3207 cyclic_getres() 3208 { 3209 return (cyclic_resolution); 3210 } 3211 3212 void 3213 cyclic_init(cyc_backend_t *be, hrtime_t resolution) 3214 { 3215 ASSERT(MUTEX_HELD(&cpu_lock)); 3216 3217 CYC_PTRACE("init", be, resolution); 3218 cyclic_resolution = resolution; 3219 3220 /* 3221 * Copy the passed cyc_backend into the backend template. This must 3222 * be done before the CPU can be configured. 3223 */ 3224 bcopy(be, &cyclic_backend, sizeof (cyc_backend_t)); 3225 3226 /* 3227 * It's safe to look at the "CPU" pointer without disabling kernel 3228 * preemption; cyclic_init() is called only during startup by the 3229 * cyclic backend. 3230 */ 3231 cyclic_configure(CPU); 3232 cyclic_online(CPU); 3233 } 3234 3235 /* 3236 * It is assumed that cyclic_mp_init() is called some time after cyclic 3237 * init (and therefore, after cpu0 has been initialized). We grab cpu_lock, 3238 * find the already initialized CPU, and initialize every other CPU with the 3239 * same backend. Finally, we register a cpu_setup function. 3240 */ 3241 void 3242 cyclic_mp_init() 3243 { 3244 cpu_t *c; 3245 3246 mutex_enter(&cpu_lock); 3247 3248 c = cpu_list; 3249 do { 3250 if (c->cpu_cyclic == NULL) { 3251 cyclic_configure(c); 3252 cyclic_online(c); 3253 } 3254 } while ((c = c->cpu_next) != cpu_list); 3255 3256 register_cpu_setup_func(cyclic_cpu_setup, NULL); 3257 mutex_exit(&cpu_lock); 3258 } 3259 3260 /* 3261 * int cyclic_juggle(cpu_t *) 3262 * 3263 * Overview 3264 * 3265 * cyclic_juggle() juggles as many cyclics as possible away from the 3266 * specified CPU; all remaining cyclics on the CPU will either be CPU- 3267 * or partition-bound. 3268 * 3269 * Arguments and notes 3270 * 3271 * The only argument to cyclic_juggle() is the CPU from which cyclics 3272 * should be juggled. CPU-bound cyclics are never juggled; partition-bound 3273 * cyclics are only juggled if the specified CPU is in the P_NOINTR state 3274 * and there exists a P_ONLINE CPU in the partition. The cyclic subsystem 3275 * assures that a cyclic will never fire late or spuriously, even while 3276 * being juggled. 3277 * 3278 * Return value 3279 * 3280 * cyclic_juggle() returns a non-zero value if all cyclics were able to 3281 * be juggled away from the CPU, and zero if one or more cyclics could 3282 * not be juggled away. 3283 * 3284 * Caller's context 3285 * 3286 * cpu_lock must be held by the caller, and the caller must not be in 3287 * interrupt context. The caller may not hold any locks which are also 3288 * grabbed by any cyclic handler. While cyclic_juggle() _may_ be called 3289 * in any context satisfying these constraints, it _must_ be called 3290 * immediately after clearing CPU_ENABLE (i.e. before dropping cpu_lock). 3291 * Failure to do so could result in an assertion failure in the cyclic 3292 * subsystem. 3293 */ 3294 int 3295 cyclic_juggle(cpu_t *c) 3296 { 3297 cyc_cpu_t *cpu = c->cpu_cyclic; 3298 cyc_id_t *idp; 3299 int all_juggled = 1; 3300 3301 CYC_PTRACE1("juggle", c); 3302 ASSERT(MUTEX_HELD(&cpu_lock)); 3303 3304 /* 3305 * We'll go through each cyclic on the CPU, attempting to juggle 3306 * each one elsewhere. 3307 */ 3308 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3309 if (idp->cyi_cpu != cpu) 3310 continue; 3311 3312 if (cyclic_juggle_one(idp) == 0) { 3313 all_juggled = 0; 3314 continue; 3315 } 3316 3317 ASSERT(idp->cyi_cpu != cpu); 3318 } 3319 3320 return (all_juggled); 3321 } 3322 3323 /* 3324 * int cyclic_offline(cpu_t *) 3325 * 3326 * Overview 3327 * 3328 * cyclic_offline() offlines the cyclic subsystem on the specified CPU. 3329 * 3330 * Arguments and notes 3331 * 3332 * The only argument to cyclic_offline() is a CPU to offline. 3333 * cyclic_offline() will attempt to juggle cyclics away from the specified 3334 * CPU. 3335 * 3336 * Return value 3337 * 3338 * cyclic_offline() returns 1 if all cyclics on the CPU were juggled away 3339 * and the cyclic subsystem on the CPU was successfully offlines. 3340 * cyclic_offline returns 0 if some cyclics remain, blocking the cyclic 3341 * offline operation. All remaining cyclics on the CPU will either be 3342 * CPU- or partition-bound. 3343 * 3344 * See the "Arguments and notes" of cyclic_juggle(), below, for more detail 3345 * on cyclic juggling. 3346 * 3347 * Caller's context 3348 * 3349 * The only caller of cyclic_offline() should be the processor management 3350 * subsystem. It is expected that the caller of cyclic_offline() will 3351 * offline the CPU immediately after cyclic_offline() returns success (i.e. 3352 * before dropping cpu_lock). Moreover, it is expected that the caller will 3353 * fail the CPU offline operation if cyclic_offline() returns failure. 3354 */ 3355 int 3356 cyclic_offline(cpu_t *c) 3357 { 3358 cyc_cpu_t *cpu = c->cpu_cyclic; 3359 cyc_id_t *idp; 3360 3361 CYC_PTRACE1("offline", cpu); 3362 ASSERT(MUTEX_HELD(&cpu_lock)); 3363 3364 if (!cyclic_juggle(c)) 3365 return (0); 3366 3367 /* 3368 * This CPU is headed offline; we need to now stop omnipresent 3369 * cyclic firing on this CPU. 3370 */ 3371 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3372 if (idp->cyi_cpu != NULL) 3373 continue; 3374 3375 /* 3376 * We cannot possibly be offlining the last CPU; cyi_omni_list 3377 * must be non-NULL. 3378 */ 3379 ASSERT(idp->cyi_omni_list != NULL); 3380 cyclic_omni_stop(idp, cpu); 3381 } 3382 3383 ASSERT(cpu->cyp_state == CYS_ONLINE); 3384 cpu->cyp_state = CYS_OFFLINE; 3385 3386 return (1); 3387 } 3388 3389 /* 3390 * void cyclic_online(cpu_t *) 3391 * 3392 * Overview 3393 * 3394 * cyclic_online() onlines a CPU previously offlined with cyclic_offline(). 3395 * 3396 * Arguments and notes 3397 * 3398 * cyclic_online()'s only argument is a CPU to online. The specified 3399 * CPU must have been previously offlined with cyclic_offline(). After 3400 * cyclic_online() returns, the specified CPU will be eligible to execute 3401 * cyclics. 3402 * 3403 * Return value 3404 * 3405 * None; cyclic_online() always succeeds. 3406 * 3407 * Caller's context 3408 * 3409 * cyclic_online() should only be called by the processor management 3410 * subsystem; cpu_lock must be held. 3411 */ 3412 void 3413 cyclic_online(cpu_t *c) 3414 { 3415 cyc_cpu_t *cpu = c->cpu_cyclic; 3416 cyc_id_t *idp; 3417 3418 CYC_PTRACE1("online", cpu); 3419 ASSERT(c->cpu_flags & CPU_ENABLE); 3420 ASSERT(MUTEX_HELD(&cpu_lock)); 3421 ASSERT(cpu->cyp_state == CYS_OFFLINE); 3422 3423 cpu->cyp_state = CYS_ONLINE; 3424 3425 /* 3426 * Now that this CPU is open for business, we need to start firing 3427 * all omnipresent cyclics on it. 3428 */ 3429 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3430 if (idp->cyi_cpu != NULL) 3431 continue; 3432 3433 cyclic_omni_start(idp, cpu); 3434 } 3435 } 3436 3437 /* 3438 * void cyclic_move_in(cpu_t *) 3439 * 3440 * Overview 3441 * 3442 * cyclic_move_in() is called by the CPU partition code immediately after 3443 * the specified CPU has moved into a new partition. 3444 * 3445 * Arguments and notes 3446 * 3447 * The only argument to cyclic_move_in() is a CPU which has moved into a 3448 * new partition. If the specified CPU is P_ONLINE, and every other 3449 * CPU in the specified CPU's new partition is P_NOINTR, cyclic_move_in() 3450 * will juggle all partition-bound, CPU-unbound cyclics to the specified 3451 * CPU. 3452 * 3453 * Return value 3454 * 3455 * None; cyclic_move_in() always succeeds. 3456 * 3457 * Caller's context 3458 * 3459 * cyclic_move_in() should _only_ be called immediately after a CPU has 3460 * moved into a new partition, with cpu_lock held. As with other calls 3461 * into the cyclic subsystem, no lock may be held which is also grabbed 3462 * by any cyclic handler. 3463 */ 3464 void 3465 cyclic_move_in(cpu_t *d) 3466 { 3467 cyc_id_t *idp; 3468 cyc_cpu_t *dest = d->cpu_cyclic; 3469 cyclic_t *cyclic; 3470 cpupart_t *part = d->cpu_part; 3471 3472 CYC_PTRACE("move-in", dest, part); 3473 ASSERT(MUTEX_HELD(&cpu_lock)); 3474 3475 /* 3476 * Look for CYF_PART_BOUND cyclics in the new partition. If 3477 * we find one, check to see if it is currently on a CPU which has 3478 * interrupts disabled. If it is (and if this CPU currently has 3479 * interrupts enabled), we'll juggle those cyclics over here. 3480 */ 3481 if (!(d->cpu_flags & CPU_ENABLE)) { 3482 CYC_PTRACE1("move-in-none", dest); 3483 return; 3484 } 3485 3486 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3487 cyc_cpu_t *cpu = idp->cyi_cpu; 3488 cpu_t *c; 3489 3490 /* 3491 * Omnipresent cyclics are exempt from juggling. 3492 */ 3493 if (cpu == NULL) 3494 continue; 3495 3496 c = cpu->cyp_cpu; 3497 3498 if (c->cpu_part != part || (c->cpu_flags & CPU_ENABLE)) 3499 continue; 3500 3501 cyclic = &cpu->cyp_cyclics[idp->cyi_ndx]; 3502 3503 if (cyclic->cy_flags & CYF_CPU_BOUND) 3504 continue; 3505 3506 /* 3507 * We know that this cyclic is bound to its processor set 3508 * (otherwise, it would not be on a CPU with interrupts 3509 * disabled); juggle it to our CPU. 3510 */ 3511 ASSERT(cyclic->cy_flags & CYF_PART_BOUND); 3512 cyclic_juggle_one_to(idp, dest); 3513 } 3514 3515 CYC_PTRACE1("move-in-done", dest); 3516 } 3517 3518 /* 3519 * int cyclic_move_out(cpu_t *) 3520 * 3521 * Overview 3522 * 3523 * cyclic_move_out() is called by the CPU partition code immediately before 3524 * the specified CPU is to move out of its partition. 3525 * 3526 * Arguments and notes 3527 * 3528 * The only argument to cyclic_move_out() is a CPU which is to move out of 3529 * its partition. 3530 * 3531 * cyclic_move_out() will attempt to juggle away all partition-bound 3532 * cyclics. If the specified CPU is the last CPU in a partition with 3533 * partition-bound cyclics, cyclic_move_out() will fail. If there exists 3534 * a partition-bound cyclic which is CPU-bound to the specified CPU, 3535 * cyclic_move_out() will fail. 3536 * 3537 * Note that cyclic_move_out() will _only_ attempt to juggle away 3538 * partition-bound cyclics; CPU-bound cyclics which are not partition-bound 3539 * and unbound cyclics are not affected by changing the partition 3540 * affiliation of the CPU. 3541 * 3542 * Return value 3543 * 3544 * cyclic_move_out() returns 1 if all partition-bound cyclics on the CPU 3545 * were juggled away; 0 if some cyclics remain. 3546 * 3547 * Caller's context 3548 * 3549 * cyclic_move_out() should _only_ be called immediately before a CPU has 3550 * moved out of its partition, with cpu_lock held. It is expected that 3551 * the caller of cyclic_move_out() will change the processor set affiliation 3552 * of the specified CPU immediately after cyclic_move_out() returns 3553 * success (i.e. before dropping cpu_lock). Moreover, it is expected that 3554 * the caller will fail the CPU repartitioning operation if cyclic_move_out() 3555 * returns failure. As with other calls into the cyclic subsystem, no lock 3556 * may be held which is also grabbed by any cyclic handler. 3557 */ 3558 int 3559 cyclic_move_out(cpu_t *c) 3560 { 3561 cyc_id_t *idp; 3562 cyc_cpu_t *cpu = c->cpu_cyclic, *dest; 3563 cyclic_t *cyclic, *cyclics = cpu->cyp_cyclics; 3564 cpupart_t *part = c->cpu_part; 3565 3566 CYC_PTRACE1("move-out", cpu); 3567 ASSERT(MUTEX_HELD(&cpu_lock)); 3568 3569 /* 3570 * If there are any CYF_PART_BOUND cyclics on this CPU, we need 3571 * to try to juggle them away. 3572 */ 3573 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3574 3575 if (idp->cyi_cpu != cpu) 3576 continue; 3577 3578 cyclic = &cyclics[idp->cyi_ndx]; 3579 3580 if (!(cyclic->cy_flags & CYF_PART_BOUND)) 3581 continue; 3582 3583 dest = cyclic_pick_cpu(part, c, c, cyclic->cy_flags); 3584 3585 if (dest == NULL) { 3586 /* 3587 * We can't juggle this cyclic; we need to return 3588 * failure (we won't bother trying to juggle away 3589 * other cyclics). 3590 */ 3591 CYC_PTRACE("move-out-fail", cpu, idp); 3592 return (0); 3593 } 3594 cyclic_juggle_one_to(idp, dest); 3595 } 3596 3597 CYC_PTRACE1("move-out-done", cpu); 3598 return (1); 3599 } 3600 3601 /* 3602 * void cyclic_suspend() 3603 * 3604 * Overview 3605 * 3606 * cyclic_suspend() suspends all cyclic activity throughout the cyclic 3607 * subsystem. It should be called only by subsystems which are attempting 3608 * to suspend the entire system (e.g. checkpoint/resume, dynamic 3609 * reconfiguration). 3610 * 3611 * Arguments and notes 3612 * 3613 * cyclic_suspend() takes no arguments. Each CPU with an active cyclic 3614 * disables its backend (offline CPUs disable their backends as part of 3615 * the cyclic_offline() operation), thereby disabling future CY_HIGH_LEVEL 3616 * interrupts. 3617 * 3618 * Note that disabling CY_HIGH_LEVEL interrupts does not completely preclude 3619 * cyclic handlers from being called after cyclic_suspend() returns: if a 3620 * CY_LOCK_LEVEL or CY_LOW_LEVEL interrupt thread was blocked at the time 3621 * of cyclic_suspend(), cyclic handlers at its level may continue to be 3622 * called after the interrupt thread becomes unblocked. The 3623 * post-cyclic_suspend() activity is bounded by the pend count on all 3624 * cyclics at the time of cyclic_suspend(). Callers concerned with more 3625 * than simply disabling future CY_HIGH_LEVEL interrupts must check for 3626 * this condition. 3627 * 3628 * On most platforms, timestamps from gethrtime() and gethrestime() are not 3629 * guaranteed to monotonically increase between cyclic_suspend() and 3630 * cyclic_resume(). However, timestamps are guaranteed to monotonically 3631 * increase across the entire cyclic_suspend()/cyclic_resume() operation. 3632 * That is, every timestamp obtained before cyclic_suspend() will be less 3633 * than every timestamp obtained after cyclic_resume(). 3634 * 3635 * Return value 3636 * 3637 * None; cyclic_suspend() always succeeds. 3638 * 3639 * Caller's context 3640 * 3641 * The cyclic subsystem must be configured on every valid CPU; 3642 * cyclic_suspend() may not be called during boot or during dynamic 3643 * reconfiguration. Additionally, cpu_lock must be held, and the caller 3644 * cannot be in high-level interrupt context. However, unlike most other 3645 * cyclic entry points, cyclic_suspend() may be called with locks held 3646 * which are also acquired by CY_LOCK_LEVEL or CY_LOW_LEVEL cyclic 3647 * handlers. 3648 */ 3649 void 3650 cyclic_suspend() 3651 { 3652 cpu_t *c; 3653 cyc_cpu_t *cpu; 3654 cyc_xcallarg_t arg; 3655 cyc_backend_t *be; 3656 3657 CYC_PTRACE0("suspend"); 3658 ASSERT(MUTEX_HELD(&cpu_lock)); 3659 c = cpu_list; 3660 3661 do { 3662 cpu = c->cpu_cyclic; 3663 be = cpu->cyp_backend; 3664 arg.cyx_cpu = cpu; 3665 3666 be->cyb_xcall(be->cyb_arg, c, 3667 (cyc_func_t)cyclic_suspend_xcall, &arg); 3668 } while ((c = c->cpu_next) != cpu_list); 3669 } 3670 3671 /* 3672 * void cyclic_resume() 3673 * 3674 * cyclic_resume() resumes all cyclic activity throughout the cyclic 3675 * subsystem. It should be called only by system-suspending subsystems. 3676 * 3677 * Arguments and notes 3678 * 3679 * cyclic_resume() takes no arguments. Each CPU with an active cyclic 3680 * reenables and reprograms its backend (offline CPUs are not reenabled). 3681 * On most platforms, timestamps from gethrtime() and gethrestime() are not 3682 * guaranteed to monotonically increase between cyclic_suspend() and 3683 * cyclic_resume(). However, timestamps are guaranteed to monotonically 3684 * increase across the entire cyclic_suspend()/cyclic_resume() operation. 3685 * That is, every timestamp obtained before cyclic_suspend() will be less 3686 * than every timestamp obtained after cyclic_resume(). 3687 * 3688 * Return value 3689 * 3690 * None; cyclic_resume() always succeeds. 3691 * 3692 * Caller's context 3693 * 3694 * The cyclic subsystem must be configured on every valid CPU; 3695 * cyclic_resume() may not be called during boot or during dynamic 3696 * reconfiguration. Additionally, cpu_lock must be held, and the caller 3697 * cannot be in high-level interrupt context. However, unlike most other 3698 * cyclic entry points, cyclic_resume() may be called with locks held which 3699 * are also acquired by CY_LOCK_LEVEL or CY_LOW_LEVEL cyclic handlers. 3700 */ 3701 void 3702 cyclic_resume() 3703 { 3704 cpu_t *c; 3705 cyc_cpu_t *cpu; 3706 cyc_xcallarg_t arg; 3707 cyc_backend_t *be; 3708 3709 CYC_PTRACE0("resume"); 3710 ASSERT(MUTEX_HELD(&cpu_lock)); 3711 3712 c = cpu_list; 3713 3714 do { 3715 cpu = c->cpu_cyclic; 3716 be = cpu->cyp_backend; 3717 arg.cyx_cpu = cpu; 3718 3719 be->cyb_xcall(be->cyb_arg, c, 3720 (cyc_func_t)cyclic_resume_xcall, &arg); 3721 } while ((c = c->cpu_next) != cpu_list); 3722 } 3723