1 /* 2 * CDDL HEADER START 3 * 4 * The contents of this file are subject to the terms of the 5 * Common Development and Distribution License (the "License"). 6 * You may not use this file except in compliance with the License. 7 * 8 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE 9 * or http://www.opensolaris.org/os/licensing. 10 * See the License for the specific language governing permissions 11 * and limitations under the License. 12 * 13 * When distributing Covered Code, include this CDDL HEADER in each 14 * file and include the License file at usr/src/OPENSOLARIS.LICENSE. 15 * If applicable, add the following below this CDDL HEADER, with the 16 * fields enclosed by brackets "[]" replaced with your own identifying 17 * information: Portions Copyright [yyyy] [name of copyright owner] 18 * 19 * CDDL HEADER END 20 */ 21 /* 22 * Copyright 2008 Sun Microsystems, Inc. All rights reserved. 23 * Use is subject to license terms. 24 */ 25 26 /* 27 * The Cyclic Subsystem 28 * -------------------- 29 * 30 * Prehistory 31 * 32 * Historically, most computer architectures have specified interval-based 33 * timer parts (e.g. SPARCstation's counter/timer; Intel's i8254). While 34 * these parts deal in relative (i.e. not absolute) time values, they are 35 * typically used by the operating system to implement the abstraction of 36 * absolute time. As a result, these parts cannot typically be reprogrammed 37 * without introducing error in the system's notion of time. 38 * 39 * Starting in about 1994, chip architectures began specifying high resolution 40 * timestamp registers. As of this writing (1999), all major chip families 41 * (UltraSPARC, PentiumPro, MIPS, PowerPC, Alpha) have high resolution 42 * timestamp registers, and two (UltraSPARC and MIPS) have added the capacity 43 * to interrupt based on timestamp values. These timestamp-compare registers 44 * present a time-based interrupt source which can be reprogrammed arbitrarily 45 * often without introducing error. Given the low cost of implementing such a 46 * timestamp-compare register (and the tangible benefit of eliminating 47 * discrete timer parts), it is reasonable to expect that future chip 48 * architectures will adopt this feature. 49 * 50 * The cyclic subsystem has been designed to take advantage of chip 51 * architectures with the capacity to interrupt based on absolute, high 52 * resolution values of time. 53 * 54 * Subsystem Overview 55 * 56 * The cyclic subsystem is a low-level kernel subsystem designed to provide 57 * arbitrarily high resolution, per-CPU interval timers (to avoid colliding 58 * with existing terms, we dub such an interval timer a "cyclic"). Cyclics 59 * can be specified to fire at high, lock or low interrupt level, and may be 60 * optionally bound to a CPU or a CPU partition. A cyclic's CPU or CPU 61 * partition binding may be changed dynamically; the cyclic will be "juggled" 62 * to a CPU which satisfies the new binding. Alternatively, a cyclic may 63 * be specified to be "omnipresent", denoting firing on all online CPUs. 64 * 65 * Cyclic Subsystem Interface Overview 66 * ----------------------------------- 67 * 68 * The cyclic subsystem has interfaces with the kernel at-large, with other 69 * kernel subsystems (e.g. the processor management subsystem, the checkpoint 70 * resume subsystem) and with the platform (the cyclic backend). Each 71 * of these interfaces is given a brief synopsis here, and is described 72 * in full above the interface's implementation. 73 * 74 * The following diagram displays the cyclic subsystem's interfaces to 75 * other kernel components. The arrows denote a "calls" relationship, with 76 * the large arrow indicating the cyclic subsystem's consumer interface. 77 * Each arrow is labeled with the section in which the corresponding 78 * interface is described. 79 * 80 * Kernel at-large consumers 81 * -----------++------------ 82 * || 83 * || 84 * _||_ 85 * \ / 86 * \/ 87 * +---------------------+ 88 * | | 89 * | Cyclic subsystem |<----------- Other kernel subsystems 90 * | | 91 * +---------------------+ 92 * ^ | 93 * | | 94 * | | 95 * | v 96 * +---------------------+ 97 * | | 98 * | Cyclic backend | 99 * | (platform specific) | 100 * | | 101 * +---------------------+ 102 * 103 * 104 * Kernel At-Large Interfaces 105 * 106 * cyclic_add() <-- Creates a cyclic 107 * cyclic_add_omni() <-- Creates an omnipresent cyclic 108 * cyclic_remove() <-- Removes a cyclic 109 * cyclic_bind() <-- Change a cyclic's CPU or partition binding 110 * cyclic_reprogram() <-- Reprogram a cyclic's expiration 111 * 112 * Inter-subsystem Interfaces 113 * 114 * cyclic_juggle() <-- Juggles cyclics away from a CPU 115 * cyclic_offline() <-- Offlines cyclic operation on a CPU 116 * cyclic_online() <-- Reenables operation on an offlined CPU 117 * cyclic_move_in() <-- Notifies subsystem of change in CPU partition 118 * cyclic_move_out() <-- Notifies subsystem of change in CPU partition 119 * cyclic_suspend() <-- Suspends the cyclic subsystem on all CPUs 120 * cyclic_resume() <-- Resumes the cyclic subsystem on all CPUs 121 * 122 * Backend Interfaces 123 * 124 * cyclic_init() <-- Initializes the cyclic subsystem 125 * cyclic_fire() <-- CY_HIGH_LEVEL interrupt entry point 126 * cyclic_softint() <-- CY_LOCK/LOW_LEVEL soft interrupt entry point 127 * 128 * The backend-supplied interfaces (through the cyc_backend structure) are 129 * documented in detail in <sys/cyclic_impl.h> 130 * 131 * 132 * Cyclic Subsystem Implementation Overview 133 * ---------------------------------------- 134 * 135 * The cyclic subsystem is designed to minimize interference between cyclics 136 * on different CPUs. Thus, all of the cyclic subsystem's data structures 137 * hang off of a per-CPU structure, cyc_cpu. 138 * 139 * Each cyc_cpu has a power-of-two sized array of cyclic structures (the 140 * cyp_cyclics member of the cyc_cpu structure). If cyclic_add() is called 141 * and there does not exist a free slot in the cyp_cyclics array, the size of 142 * the array will be doubled. The array will never shrink. Cyclics are 143 * referred to by their index in the cyp_cyclics array, which is of type 144 * cyc_index_t. 145 * 146 * The cyclics are kept sorted by expiration time in the cyc_cpu's heap. The 147 * heap is keyed by cyclic expiration time, with parents expiring earlier 148 * than their children. 149 * 150 * Heap Management 151 * 152 * The heap is managed primarily by cyclic_fire(). Upon entry, cyclic_fire() 153 * compares the root cyclic's expiration time to the current time. If the 154 * expiration time is in the past, cyclic_expire() is called on the root 155 * cyclic. Upon return from cyclic_expire(), the cyclic's new expiration time 156 * is derived by adding its interval to its old expiration time, and a 157 * downheap operation is performed. After the downheap, cyclic_fire() 158 * examines the (potentially changed) root cyclic, repeating the 159 * cyclic_expire()/add interval/cyclic_downheap() sequence until the root 160 * cyclic has an expiration time in the future. This expiration time 161 * (guaranteed to be the earliest in the heap) is then communicated to the 162 * backend via cyb_reprogram. Optimal backends will next call cyclic_fire() 163 * shortly after the root cyclic's expiration time. 164 * 165 * To allow efficient, deterministic downheap operations, we implement the 166 * heap as an array (the cyp_heap member of the cyc_cpu structure), with each 167 * element containing an index into the CPU's cyp_cyclics array. 168 * 169 * The heap is laid out in the array according to the following: 170 * 171 * 1. The root of the heap is always in the 0th element of the heap array 172 * 2. The left and right children of the nth element are element 173 * (((n + 1) << 1) - 1) and element ((n + 1) << 1), respectively. 174 * 175 * This layout is standard (see, e.g., Cormen's "Algorithms"); the proof 176 * that these constraints correctly lay out a heap (or indeed, any binary 177 * tree) is trivial and left to the reader. 178 * 179 * To see the heap by example, assume our cyclics array has the following 180 * members (at time t): 181 * 182 * cy_handler cy_level cy_expire 183 * --------------------------------------------- 184 * [ 0] clock() LOCK t+10000000 185 * [ 1] deadman() HIGH t+1000000000 186 * [ 2] clock_highres_fire() LOW t+100 187 * [ 3] clock_highres_fire() LOW t+1000 188 * [ 4] clock_highres_fire() LOW t+500 189 * [ 5] (free) -- -- 190 * [ 6] (free) -- -- 191 * [ 7] (free) -- -- 192 * 193 * The heap array could be: 194 * 195 * [0] [1] [2] [3] [4] [5] [6] [7] 196 * +-----+-----+-----+-----+-----+-----+-----+-----+ 197 * | | | | | | | | | 198 * | 2 | 3 | 4 | 0 | 1 | x | x | x | 199 * | | | | | | | | | 200 * +-----+-----+-----+-----+-----+-----+-----+-----+ 201 * 202 * Graphically, this array corresponds to the following (excuse the ASCII art): 203 * 204 * 2 205 * | 206 * +------------------+------------------+ 207 * 3 4 208 * | 209 * +---------+--------+ 210 * 0 1 211 * 212 * Note that the heap is laid out by layer: all nodes at a given depth are 213 * stored in consecutive elements of the array. Moreover, layers of 214 * consecutive depths are in adjacent element ranges. This property 215 * guarantees high locality of reference during downheap operations. 216 * Specifically, we are guaranteed that we can downheap to a depth of 217 * 218 * lg (cache_line_size / sizeof (cyc_index_t)) 219 * 220 * nodes with at most one cache miss. On UltraSPARC (64 byte e-cache line 221 * size), this corresponds to a depth of four nodes. Thus, if there are 222 * fewer than sixteen cyclics in the heap, downheaps on UltraSPARC miss at 223 * most once in the e-cache. 224 * 225 * Downheaps are required to compare siblings as they proceed down the 226 * heap. For downheaps proceeding beyond the one-cache-miss depth, every 227 * access to a left child could potentially miss in the cache. However, 228 * if we assume 229 * 230 * (cache_line_size / sizeof (cyc_index_t)) > 2, 231 * 232 * then all siblings are guaranteed to be on the same cache line. Thus, the 233 * miss on the left child will guarantee a hit on the right child; downheaps 234 * will incur at most one cache miss per layer beyond the one-cache-miss 235 * depth. The total number of cache misses for heap management during a 236 * downheap operation is thus bounded by 237 * 238 * lg (n) - lg (cache_line_size / sizeof (cyc_index_t)) 239 * 240 * Traditional pointer-based heaps are implemented without regard to 241 * locality. Downheaps can thus incur two cache misses per layer (one for 242 * each child), but at most one cache miss at the root. This yields a bound 243 * of 244 * 245 * 2 * lg (n) - 1 246 * 247 * on the total cache misses. 248 * 249 * This difference may seem theoretically trivial (the difference is, after 250 * all, constant), but can become substantial in practice -- especially for 251 * caches with very large cache lines and high miss penalties (e.g. TLBs). 252 * 253 * Heaps must always be full, balanced trees. Heap management must therefore 254 * track the next point-of-insertion into the heap. In pointer-based heaps, 255 * recomputing this point takes O(lg (n)). Given the layout of the 256 * array-based implementation, however, the next point-of-insertion is 257 * always: 258 * 259 * heap[number_of_elements] 260 * 261 * We exploit this property by implementing the free-list in the usused 262 * heap elements. Heap insertion, therefore, consists only of filling in 263 * the cyclic at cyp_cyclics[cyp_heap[number_of_elements]], incrementing 264 * the number of elements, and performing an upheap. Heap deletion consists 265 * of decrementing the number of elements, swapping the to-be-deleted element 266 * with the element at cyp_heap[number_of_elements], and downheaping. 267 * 268 * Filling in more details in our earlier example: 269 * 270 * +--- free list head 271 * | 272 * V 273 * 274 * [0] [1] [2] [3] [4] [5] [6] [7] 275 * +-----+-----+-----+-----+-----+-----+-----+-----+ 276 * | | | | | | | | | 277 * | 2 | 3 | 4 | 0 | 1 | 5 | 6 | 7 | 278 * | | | | | | | | | 279 * +-----+-----+-----+-----+-----+-----+-----+-----+ 280 * 281 * To insert into this heap, we would just need to fill in the cyclic at 282 * cyp_cyclics[5], bump the number of elements (from 5 to 6) and perform 283 * an upheap. 284 * 285 * If we wanted to remove, say, cyp_cyclics[3], we would first scan for it 286 * in the cyp_heap, and discover it at cyp_heap[1]. We would then decrement 287 * the number of elements (from 5 to 4), swap cyp_heap[1] with cyp_heap[4], 288 * and perform a downheap from cyp_heap[1]. The linear scan is required 289 * because the cyclic does not keep a backpointer into the heap. This makes 290 * heap manipulation (e.g. downheaps) faster at the expense of removal 291 * operations. 292 * 293 * Expiry processing 294 * 295 * As alluded to above, cyclic_expire() is called by cyclic_fire() at 296 * CY_HIGH_LEVEL to expire a cyclic. Cyclic subsystem consumers are 297 * guaranteed that for an arbitrary time t in the future, their cyclic 298 * handler will have been called (t - cyt_when) / cyt_interval times. Thus, 299 * there must be a one-to-one mapping between a cyclic's expiration at 300 * CY_HIGH_LEVEL and its execution at the desired level (either CY_HIGH_LEVEL, 301 * CY_LOCK_LEVEL or CY_LOW_LEVEL). 302 * 303 * For CY_HIGH_LEVEL cyclics, this is trivial; cyclic_expire() simply needs 304 * to call the handler. 305 * 306 * For CY_LOCK_LEVEL and CY_LOW_LEVEL cyclics, however, there exists a 307 * potential disconnect: if the CPU is at an interrupt level less than 308 * CY_HIGH_LEVEL but greater than the level of a cyclic for a period of 309 * time longer than twice the cyclic's interval, the cyclic will be expired 310 * twice before it can be handled. 311 * 312 * To maintain the one-to-one mapping, we track the difference between the 313 * number of times a cyclic has been expired and the number of times it's 314 * been handled in a "pending count" (the cy_pend field of the cyclic 315 * structure). cyclic_expire() thus increments the cy_pend count for the 316 * expired cyclic and posts a soft interrupt at the desired level. In the 317 * cyclic subsystem's soft interrupt handler, cyclic_softint(), we repeatedly 318 * call the cyclic handler and decrement cy_pend until we have decremented 319 * cy_pend to zero. 320 * 321 * The Producer/Consumer Buffer 322 * 323 * If we wish to avoid a linear scan of the cyclics array at soft interrupt 324 * level, cyclic_softint() must be able to quickly determine which cyclics 325 * have a non-zero cy_pend count. We thus introduce a per-soft interrupt 326 * level producer/consumer buffer shared with CY_HIGH_LEVEL. These buffers 327 * are encapsulated in the cyc_pcbuffer structure, and, like cyp_heap, are 328 * implemented as cyc_index_t arrays (the cypc_buf member of the cyc_pcbuffer 329 * structure). 330 * 331 * The producer (cyclic_expire() running at CY_HIGH_LEVEL) enqueues a cyclic 332 * by storing the cyclic's index to cypc_buf[cypc_prodndx] and incrementing 333 * cypc_prodndx. The consumer (cyclic_softint() running at either 334 * CY_LOCK_LEVEL or CY_LOW_LEVEL) dequeues a cyclic by loading from 335 * cypc_buf[cypc_consndx] and bumping cypc_consndx. The buffer is empty when 336 * cypc_prodndx == cypc_consndx. 337 * 338 * To bound the size of the producer/consumer buffer, cyclic_expire() only 339 * enqueues a cyclic if its cy_pend was zero (if the cyclic's cy_pend is 340 * non-zero, cyclic_expire() only bumps cy_pend). Symmetrically, 341 * cyclic_softint() only consumes a cyclic after it has decremented the 342 * cy_pend count to zero. 343 * 344 * Returning to our example, here is what the CY_LOW_LEVEL producer/consumer 345 * buffer might look like: 346 * 347 * cypc_consndx ---+ +--- cypc_prodndx 348 * | | 349 * V V 350 * 351 * [0] [1] [2] [3] [4] [5] [6] [7] 352 * +-----+-----+-----+-----+-----+-----+-----+-----+ 353 * | | | | | | | | | 354 * | x | x | 3 | 2 | 4 | x | x | x | <== cypc_buf 355 * | | | . | . | . | | | | 356 * +-----+-----+- | -+- | -+- | -+-----+-----+-----+ 357 * | | | 358 * | | | cy_pend cy_handler 359 * | | | ------------------------- 360 * | | | [ 0] 1 clock() 361 * | | | [ 1] 0 deadman() 362 * | +---- | -------> [ 2] 3 clock_highres_fire() 363 * +---------- | -------> [ 3] 1 clock_highres_fire() 364 * +--------> [ 4] 1 clock_highres_fire() 365 * [ 5] - (free) 366 * [ 6] - (free) 367 * [ 7] - (free) 368 * 369 * In particular, note that clock()'s cy_pend is 1 but that it is _not_ in 370 * this producer/consumer buffer; it would be enqueued in the CY_LOCK_LEVEL 371 * producer/consumer buffer. 372 * 373 * Locking 374 * 375 * Traditionally, access to per-CPU data structures shared between 376 * interrupt levels is serialized by manipulating programmable interrupt 377 * level: readers and writers are required to raise their interrupt level 378 * to that of the highest level writer. 379 * 380 * For the producer/consumer buffers (shared between cyclic_fire()/ 381 * cyclic_expire() executing at CY_HIGH_LEVEL and cyclic_softint() executing 382 * at one of CY_LOCK_LEVEL or CY_LOW_LEVEL), forcing cyclic_softint() to raise 383 * programmable interrupt level is undesirable: aside from the additional 384 * latency incurred by manipulating interrupt level in the hot cy_pend 385 * processing path, this would create the potential for soft level cy_pend 386 * processing to delay CY_HIGH_LEVEL firing and expiry processing. 387 * CY_LOCK/LOW_LEVEL cyclics could thereby induce jitter in CY_HIGH_LEVEL 388 * cyclics. 389 * 390 * To minimize jitter, then, we would like the cyclic_fire()/cyclic_expire() 391 * and cyclic_softint() code paths to be lock-free. 392 * 393 * For cyclic_fire()/cyclic_expire(), lock-free execution is straightforward: 394 * because these routines execute at a higher interrupt level than 395 * cyclic_softint(), their actions on the producer/consumer buffer appear 396 * atomic. In particular, the increment of cy_pend appears to occur 397 * atomically with the increment of cypc_prodndx. 398 * 399 * For cyclic_softint(), however, lock-free execution requires more delicacy. 400 * When cyclic_softint() discovers a cyclic in the producer/consumer buffer, 401 * it calls the cyclic's handler and attempts to atomically decrement the 402 * cy_pend count with a compare&swap operation. 403 * 404 * If the compare&swap operation succeeds, cyclic_softint() behaves 405 * conditionally based on the value it atomically wrote to cy_pend: 406 * 407 * - If the cy_pend was decremented to 0, the cyclic has been consumed; 408 * cyclic_softint() increments the cypc_consndx and checks for more 409 * enqueued work. 410 * 411 * - If the count was decremented to a non-zero value, there is more work 412 * to be done on the cyclic; cyclic_softint() calls the cyclic handler 413 * and repeats the atomic decrement process. 414 * 415 * If the compare&swap operation fails, cyclic_softint() knows that 416 * cyclic_expire() has intervened and bumped the cy_pend count (resizes 417 * and removals complicate this, however -- see the sections on their 418 * operation, below). cyclic_softint() thus reloads cy_pend, and re-attempts 419 * the atomic decrement. 420 * 421 * Recall that we bound the size of the producer/consumer buffer by 422 * having cyclic_expire() only enqueue the specified cyclic if its 423 * cy_pend count is zero; this assures that each cyclic is enqueued at 424 * most once. This leads to a critical constraint on cyclic_softint(), 425 * however: after the compare&swap operation which successfully decrements 426 * cy_pend to zero, cyclic_softint() must _not_ re-examine the consumed 427 * cyclic. In part to obey this constraint, cyclic_softint() calls the 428 * cyclic handler before decrementing cy_pend. 429 * 430 * Resizing 431 * 432 * All of the discussion thus far has assumed a static number of cyclics. 433 * Obviously, static limitations are not practical; we need the capacity 434 * to resize our data structures dynamically. 435 * 436 * We resize our data structures lazily, and only on a per-CPU basis. 437 * The size of the data structures always doubles and never shrinks. We 438 * serialize adds (and thus resizes) on cpu_lock; we never need to deal 439 * with concurrent resizes. Resizes should be rare; they may induce jitter 440 * on the CPU being resized, but should not affect cyclic operation on other 441 * CPUs. Pending cyclics may not be dropped during a resize operation. 442 * 443 * Three key cyc_cpu data structures need to be resized: the cyclics array, 444 * the heap array and the producer/consumer buffers. Resizing the first two 445 * is relatively straightforward: 446 * 447 * 1. The new, larger arrays are allocated in cyclic_expand() (called 448 * from cyclic_add()). 449 * 2. cyclic_expand() cross calls cyclic_expand_xcall() on the CPU 450 * undergoing the resize. 451 * 3. cyclic_expand_xcall() raises interrupt level to CY_HIGH_LEVEL 452 * 4. The contents of the old arrays are copied into the new arrays. 453 * 5. The old cyclics array is bzero()'d 454 * 6. The pointers are updated. 455 * 456 * The producer/consumer buffer is dicier: cyclic_expand_xcall() may have 457 * interrupted cyclic_softint() in the middle of consumption. To resize the 458 * producer/consumer buffer, we implement up to two buffers per soft interrupt 459 * level: a hard buffer (the buffer being produced into by cyclic_expire()) 460 * and a soft buffer (the buffer from which cyclic_softint() is consuming). 461 * During normal operation, the hard buffer and soft buffer point to the 462 * same underlying producer/consumer buffer. 463 * 464 * During a resize, however, cyclic_expand_xcall() changes the hard buffer 465 * to point to the new, larger producer/consumer buffer; all future 466 * cyclic_expire()'s will produce into the new buffer. cyclic_expand_xcall() 467 * then posts a CY_LOCK_LEVEL soft interrupt, landing in cyclic_softint(). 468 * 469 * As under normal operation, cyclic_softint() will consume cyclics from 470 * its soft buffer. After the soft buffer is drained, however, 471 * cyclic_softint() will see that the hard buffer has changed. At that time, 472 * cyclic_softint() will change its soft buffer to point to the hard buffer, 473 * and repeat the producer/consumer buffer draining procedure. 474 * 475 * After the new buffer is drained, cyclic_softint() will determine if both 476 * soft levels have seen their new producer/consumer buffer. If both have, 477 * cyclic_softint() will post on the semaphore cyp_modify_wait. If not, a 478 * soft interrupt will be generated for the remaining level. 479 * 480 * cyclic_expand() blocks on the cyp_modify_wait semaphore (a semaphore is 481 * used instead of a condition variable because of the race between the 482 * sema_p() in cyclic_expand() and the sema_v() in cyclic_softint()). This 483 * allows cyclic_expand() to know when the resize operation is complete; 484 * all of the old buffers (the heap, the cyclics array and the producer/ 485 * consumer buffers) can be freed. 486 * 487 * A final caveat on resizing: we described step (5) in the 488 * cyclic_expand_xcall() procedure without providing any motivation. This 489 * step addresses the problem of a cyclic_softint() attempting to decrement 490 * a cy_pend count while interrupted by a cyclic_expand_xcall(). Because 491 * cyclic_softint() has already called the handler by the time cy_pend is 492 * decremented, we want to assure that it doesn't decrement a cy_pend 493 * count in the old cyclics array. By zeroing the old cyclics array in 494 * cyclic_expand_xcall(), we are zeroing out every cy_pend count; when 495 * cyclic_softint() attempts to compare&swap on the cy_pend count, it will 496 * fail and recognize that the count has been zeroed. cyclic_softint() will 497 * update its stale copy of the cyp_cyclics pointer, re-read the cy_pend 498 * count from the new cyclics array, and re-attempt the compare&swap. 499 * 500 * Removals 501 * 502 * Cyclic removals should be rare. To simplify the implementation (and to 503 * allow optimization for the cyclic_fire()/cyclic_expire()/cyclic_softint() 504 * path), we force removals and adds to serialize on cpu_lock. 505 * 506 * Cyclic removal is complicated by a guarantee made to the consumer of 507 * the cyclic subsystem: after cyclic_remove() returns, the cyclic handler 508 * has returned and will never again be called. 509 * 510 * Here is the procedure for cyclic removal: 511 * 512 * 1. cyclic_remove() calls cyclic_remove_xcall() on the CPU undergoing 513 * the removal. 514 * 2. cyclic_remove_xcall() raises interrupt level to CY_HIGH_LEVEL 515 * 3. The current expiration time for the removed cyclic is recorded. 516 * 4. If the cy_pend count on the removed cyclic is non-zero, it 517 * is copied into cyp_rpend and subsequently zeroed. 518 * 5. The cyclic is removed from the heap 519 * 6. If the root of the heap has changed, the backend is reprogrammed. 520 * 7. If the cy_pend count was non-zero cyclic_remove() blocks on the 521 * cyp_modify_wait semaphore. 522 * 523 * The motivation for step (3) is explained in "Juggling", below. 524 * 525 * The cy_pend count is decremented in cyclic_softint() after the cyclic 526 * handler returns. Thus, if we find a cy_pend count of zero in step 527 * (4), we know that cyclic_remove() doesn't need to block. 528 * 529 * If the cy_pend count is non-zero, however, we must block in cyclic_remove() 530 * until cyclic_softint() has finished calling the cyclic handler. To let 531 * cyclic_softint() know that this cyclic has been removed, we zero the 532 * cy_pend count. This will cause cyclic_softint()'s compare&swap to fail. 533 * When cyclic_softint() sees the zero cy_pend count, it knows that it's been 534 * caught during a resize (see "Resizing", above) or that the cyclic has been 535 * removed. In the latter case, it calls cyclic_remove_pend() to call the 536 * cyclic handler cyp_rpend - 1 times, and posts on cyp_modify_wait. 537 * 538 * Juggling 539 * 540 * At first glance, cyclic juggling seems to be a difficult problem. The 541 * subsystem must guarantee that a cyclic doesn't execute simultaneously on 542 * different CPUs, while also assuring that a cyclic fires exactly once 543 * per interval. We solve this problem by leveraging a property of the 544 * platform: gethrtime() is required to increase in lock-step across 545 * multiple CPUs. Therefore, to juggle a cyclic, we remove it from its 546 * CPU, recording its expiration time in the remove cross call (step (3) 547 * in "Removing", above). We then add the cyclic to the new CPU, explicitly 548 * setting its expiration time to the time recorded in the removal. This 549 * leverages the existing cyclic expiry processing, which will compensate 550 * for any time lost while juggling. 551 * 552 * Reprogramming 553 * 554 * Normally, after a cyclic fires, its next expiration is computed from 555 * the current time and the cyclic interval. But there are situations when 556 * the next expiration needs to be reprogrammed by the kernel subsystem that 557 * is using the cyclic. cyclic_reprogram() allows this to be done. This, 558 * unlike the other kernel at-large cyclic API functions, is permitted to 559 * be called from the cyclic handler. This is because it does not use the 560 * cpu_lock to serialize access. 561 * 562 * When cyclic_reprogram() is called for an omni-cyclic, the operation is 563 * applied to the omni-cyclic's component on the current CPU. 564 * 565 * If a high-level cyclic handler reprograms its own cyclic, then 566 * cyclic_fire() detects that and does not recompute the cyclic's next 567 * expiration. However, for a lock-level or a low-level cyclic, the 568 * actual cyclic handler will execute at the lower PIL only after 569 * cyclic_fire() is done with all expired cyclics. To deal with this, such 570 * cyclics can be specified with a special interval of CY_INFINITY (INT64_MAX). 571 * cyclic_fire() recognizes this special value and recomputes the next 572 * expiration to CY_INFINITY. This effectively moves the cyclic to the 573 * bottom of the heap and prevents it from going off until its handler has 574 * had a chance to reprogram it. Infact, this is the way to create and reuse 575 * "one-shot" timers in the context of the cyclic subsystem without using 576 * cyclic_remove(). 577 * 578 * Here is the procedure for cyclic reprogramming: 579 * 580 * 1. cyclic_reprogram() calls cyclic_reprogram_xcall() on the CPU 581 * that houses the cyclic. 582 * 2. cyclic_reprogram_xcall() raises interrupt level to CY_HIGH_LEVEL 583 * 3. The cyclic is located in the cyclic heap. The search for this is 584 * done from the bottom of the heap to the top as reprogrammable cyclics 585 * would be located closer to the bottom than the top. 586 * 4. The cyclic expiration is set and the cyclic is moved to its 587 * correct position in the heap (up or down depending on whether the 588 * new expiration is less than or greater than the old one). 589 * 5. If the cyclic move modified the root of the heap, the backend is 590 * reprogrammed. 591 * 592 * Reprogramming can be a frequent event (see the callout subsystem). So, 593 * the serialization used has to be efficient. As with all other cyclic 594 * operations, the interrupt level is raised during reprogramming. Plus, 595 * during reprogramming, the cyclic must not be juggled (regular cyclic) 596 * or stopped (omni-cyclic). The implementation defines a per-cyclic 597 * reader-writer lock to accomplish this. This lock is acquired in the 598 * reader mode by cyclic_reprogram() and writer mode by cyclic_juggle() and 599 * cyclic_omni_stop(). The reader-writer lock makes it efficient if 600 * an omni-cyclic is reprogrammed on different CPUs frequently. 601 * 602 * Note that since the cpu_lock is not used during reprogramming, it is 603 * the responsibility of the user of the reprogrammable cyclic to make sure 604 * that the cyclic is not removed via cyclic_remove() during reprogramming. 605 * This is not an unreasonable requirement as the user will typically have 606 * some sort of synchronization for its cyclic-related activities. This 607 * little caveat exists because the cyclic ID is not really an ID. It is 608 * implemented as a pointer to a structure. 609 */ 610 #include <sys/cyclic_impl.h> 611 #include <sys/sysmacros.h> 612 #include <sys/systm.h> 613 #include <sys/atomic.h> 614 #include <sys/kmem.h> 615 #include <sys/cmn_err.h> 616 #include <sys/ddi.h> 617 #include <sys/sdt.h> 618 619 #ifdef CYCLIC_TRACE 620 621 /* 622 * cyc_trace_enabled is for the benefit of kernel debuggers. 623 */ 624 int cyc_trace_enabled = 1; 625 static cyc_tracebuf_t cyc_ptrace; 626 static cyc_coverage_t cyc_coverage[CY_NCOVERAGE]; 627 628 /* 629 * Seen this anywhere? 630 */ 631 static uint_t 632 cyclic_coverage_hash(char *p) 633 { 634 unsigned int g; 635 uint_t hval; 636 637 hval = 0; 638 while (*p) { 639 hval = (hval << 4) + *p++; 640 if ((g = (hval & 0xf0000000)) != 0) 641 hval ^= g >> 24; 642 hval &= ~g; 643 } 644 return (hval); 645 } 646 647 static void 648 cyclic_coverage(char *why, int level, uint64_t arg0, uint64_t arg1) 649 { 650 uint_t ndx, orig; 651 652 for (ndx = orig = cyclic_coverage_hash(why) % CY_NCOVERAGE; ; ) { 653 if (cyc_coverage[ndx].cyv_why == why) 654 break; 655 656 if (cyc_coverage[ndx].cyv_why != NULL || 657 casptr(&cyc_coverage[ndx].cyv_why, NULL, why) != NULL) { 658 659 if (++ndx == CY_NCOVERAGE) 660 ndx = 0; 661 662 if (ndx == orig) 663 panic("too many cyclic coverage points"); 664 continue; 665 } 666 667 /* 668 * If we're here, we have successfully swung our guy into 669 * the position at "ndx". 670 */ 671 break; 672 } 673 674 if (level == CY_PASSIVE_LEVEL) 675 cyc_coverage[ndx].cyv_passive_count++; 676 else 677 cyc_coverage[ndx].cyv_count[level]++; 678 679 cyc_coverage[ndx].cyv_arg0 = arg0; 680 cyc_coverage[ndx].cyv_arg1 = arg1; 681 } 682 683 #define CYC_TRACE(cpu, level, why, arg0, arg1) \ 684 CYC_TRACE_IMPL(&cpu->cyp_trace[level], level, why, arg0, arg1) 685 686 #define CYC_PTRACE(why, arg0, arg1) \ 687 CYC_TRACE_IMPL(&cyc_ptrace, CY_PASSIVE_LEVEL, why, arg0, arg1) 688 689 #define CYC_TRACE_IMPL(buf, level, why, a0, a1) { \ 690 if (panicstr == NULL) { \ 691 int _ndx = (buf)->cyt_ndx; \ 692 cyc_tracerec_t *_rec = &(buf)->cyt_buf[_ndx]; \ 693 (buf)->cyt_ndx = (++_ndx == CY_NTRACEREC) ? 0 : _ndx; \ 694 _rec->cyt_tstamp = gethrtime_unscaled(); \ 695 _rec->cyt_why = (why); \ 696 _rec->cyt_arg0 = (uint64_t)(uintptr_t)(a0); \ 697 _rec->cyt_arg1 = (uint64_t)(uintptr_t)(a1); \ 698 cyclic_coverage(why, level, \ 699 (uint64_t)(uintptr_t)(a0), (uint64_t)(uintptr_t)(a1)); \ 700 } \ 701 } 702 703 #else 704 705 static int cyc_trace_enabled = 0; 706 707 #define CYC_TRACE(cpu, level, why, arg0, arg1) 708 #define CYC_PTRACE(why, arg0, arg1) 709 710 #endif 711 712 #define CYC_TRACE0(cpu, level, why) CYC_TRACE(cpu, level, why, 0, 0) 713 #define CYC_TRACE1(cpu, level, why, arg0) CYC_TRACE(cpu, level, why, arg0, 0) 714 715 #define CYC_PTRACE0(why) CYC_PTRACE(why, 0, 0) 716 #define CYC_PTRACE1(why, arg0) CYC_PTRACE(why, arg0, 0) 717 718 static kmem_cache_t *cyclic_id_cache; 719 static cyc_id_t *cyclic_id_head; 720 static hrtime_t cyclic_resolution; 721 static cyc_backend_t cyclic_backend; 722 723 /* 724 * Returns 1 if the upheap propagated to the root, 0 if it did not. This 725 * allows the caller to reprogram the backend only when the root has been 726 * modified. 727 */ 728 static int 729 cyclic_upheap(cyc_cpu_t *cpu, cyc_index_t ndx) 730 { 731 cyclic_t *cyclics; 732 cyc_index_t *heap; 733 cyc_index_t heap_parent, heap_current = ndx; 734 cyc_index_t parent, current; 735 736 if (heap_current == 0) 737 return (1); 738 739 heap = cpu->cyp_heap; 740 cyclics = cpu->cyp_cyclics; 741 heap_parent = CYC_HEAP_PARENT(heap_current); 742 743 for (;;) { 744 current = heap[heap_current]; 745 parent = heap[heap_parent]; 746 747 /* 748 * We have an expiration time later than our parent; we're 749 * done. 750 */ 751 if (cyclics[current].cy_expire >= cyclics[parent].cy_expire) 752 return (0); 753 754 /* 755 * We need to swap with our parent, and continue up the heap. 756 */ 757 heap[heap_parent] = current; 758 heap[heap_current] = parent; 759 760 /* 761 * If we just reached the root, we're done. 762 */ 763 if (heap_parent == 0) 764 return (1); 765 766 heap_current = heap_parent; 767 heap_parent = CYC_HEAP_PARENT(heap_current); 768 } 769 } 770 771 static void 772 cyclic_downheap(cyc_cpu_t *cpu, cyc_index_t ndx) 773 { 774 cyclic_t *cyclics = cpu->cyp_cyclics; 775 cyc_index_t *heap = cpu->cyp_heap; 776 777 cyc_index_t heap_left, heap_right, heap_me = ndx; 778 cyc_index_t left, right, me; 779 cyc_index_t nelems = cpu->cyp_nelems; 780 781 for (;;) { 782 /* 783 * If we don't have a left child (i.e., we're a leaf), we're 784 * done. 785 */ 786 if ((heap_left = CYC_HEAP_LEFT(heap_me)) >= nelems) 787 return; 788 789 left = heap[heap_left]; 790 me = heap[heap_me]; 791 792 heap_right = CYC_HEAP_RIGHT(heap_me); 793 794 /* 795 * Even if we don't have a right child, we still need to compare 796 * our expiration time against that of our left child. 797 */ 798 if (heap_right >= nelems) 799 goto comp_left; 800 801 right = heap[heap_right]; 802 803 /* 804 * We have both a left and a right child. We need to compare 805 * the expiration times of the children to determine which 806 * expires earlier. 807 */ 808 if (cyclics[right].cy_expire < cyclics[left].cy_expire) { 809 /* 810 * Our right child is the earlier of our children. 811 * We'll now compare our expiration time to its; if 812 * ours is the earlier, we're done. 813 */ 814 if (cyclics[me].cy_expire <= cyclics[right].cy_expire) 815 return; 816 817 /* 818 * Our right child expires earlier than we do; swap 819 * with our right child, and descend right. 820 */ 821 heap[heap_right] = me; 822 heap[heap_me] = right; 823 heap_me = heap_right; 824 continue; 825 } 826 827 comp_left: 828 /* 829 * Our left child is the earlier of our children (or we have 830 * no right child). We'll now compare our expiration time 831 * to its; if ours is the earlier, we're done. 832 */ 833 if (cyclics[me].cy_expire <= cyclics[left].cy_expire) 834 return; 835 836 /* 837 * Our left child expires earlier than we do; swap with our 838 * left child, and descend left. 839 */ 840 heap[heap_left] = me; 841 heap[heap_me] = left; 842 heap_me = heap_left; 843 } 844 } 845 846 static void 847 cyclic_expire(cyc_cpu_t *cpu, cyc_index_t ndx, cyclic_t *cyclic) 848 { 849 cyc_backend_t *be = cpu->cyp_backend; 850 cyc_level_t level = cyclic->cy_level; 851 852 /* 853 * If this is a CY_HIGH_LEVEL cyclic, just call the handler; we don't 854 * need to worry about the pend count for CY_HIGH_LEVEL cyclics. 855 */ 856 if (level == CY_HIGH_LEVEL) { 857 cyc_func_t handler = cyclic->cy_handler; 858 void *arg = cyclic->cy_arg; 859 860 CYC_TRACE(cpu, CY_HIGH_LEVEL, "handler-in", handler, arg); 861 DTRACE_PROBE1(cyclic__start, cyclic_t *, cyclic); 862 863 (*handler)(arg); 864 865 DTRACE_PROBE1(cyclic__end, cyclic_t *, cyclic); 866 CYC_TRACE(cpu, CY_HIGH_LEVEL, "handler-out", handler, arg); 867 868 return; 869 } 870 871 /* 872 * We're at CY_HIGH_LEVEL; this modification to cy_pend need not 873 * be atomic (the high interrupt level assures that it will appear 874 * atomic to any softint currently running). 875 */ 876 if (cyclic->cy_pend++ == 0) { 877 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[level]; 878 cyc_pcbuffer_t *pc = &softbuf->cys_buf[softbuf->cys_hard]; 879 880 /* 881 * We need to enqueue this cyclic in the soft buffer. 882 */ 883 CYC_TRACE(cpu, CY_HIGH_LEVEL, "expire-enq", cyclic, 884 pc->cypc_prodndx); 885 pc->cypc_buf[pc->cypc_prodndx++ & pc->cypc_sizemask] = ndx; 886 887 ASSERT(pc->cypc_prodndx != pc->cypc_consndx); 888 } else { 889 /* 890 * If the pend count is zero after we incremented it, then 891 * we've wrapped (i.e. we had a cy_pend count of over four 892 * billion. In this case, we clamp the pend count at 893 * UINT32_MAX. Yes, cyclics can be lost in this case. 894 */ 895 if (cyclic->cy_pend == 0) { 896 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "expire-wrap", cyclic); 897 cyclic->cy_pend = UINT32_MAX; 898 } 899 900 CYC_TRACE(cpu, CY_HIGH_LEVEL, "expire-bump", cyclic, 0); 901 } 902 903 be->cyb_softint(be->cyb_arg, cyclic->cy_level); 904 } 905 906 /* 907 * cyclic_fire(cpu_t *) 908 * 909 * Overview 910 * 911 * cyclic_fire() is the cyclic subsystem's CY_HIGH_LEVEL interrupt handler. 912 * Called by the cyclic backend. 913 * 914 * Arguments and notes 915 * 916 * The only argument is the CPU on which the interrupt is executing; 917 * backends must call into cyclic_fire() on the specified CPU. 918 * 919 * cyclic_fire() may be called spuriously without ill effect. Optimal 920 * backends will call into cyclic_fire() at or shortly after the time 921 * requested via cyb_reprogram(). However, calling cyclic_fire() 922 * arbitrarily late will only manifest latency bubbles; the correctness 923 * of the cyclic subsystem does not rely on the timeliness of the backend. 924 * 925 * cyclic_fire() is wait-free; it will not block or spin. 926 * 927 * Return values 928 * 929 * None. 930 * 931 * Caller's context 932 * 933 * cyclic_fire() must be called from CY_HIGH_LEVEL interrupt context. 934 */ 935 void 936 cyclic_fire(cpu_t *c) 937 { 938 cyc_cpu_t *cpu = c->cpu_cyclic; 939 cyc_backend_t *be = cpu->cyp_backend; 940 cyc_index_t *heap = cpu->cyp_heap; 941 cyclic_t *cyclic, *cyclics = cpu->cyp_cyclics; 942 void *arg = be->cyb_arg; 943 hrtime_t now = gethrtime(); 944 hrtime_t exp; 945 946 CYC_TRACE(cpu, CY_HIGH_LEVEL, "fire", now, 0); 947 948 if (cpu->cyp_nelems == 0) { 949 /* 950 * This is a spurious fire. Count it as such, and blow 951 * out of here. 952 */ 953 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "fire-spurious"); 954 return; 955 } 956 957 for (;;) { 958 cyc_index_t ndx = heap[0]; 959 960 cyclic = &cyclics[ndx]; 961 962 ASSERT(!(cyclic->cy_flags & CYF_FREE)); 963 964 CYC_TRACE(cpu, CY_HIGH_LEVEL, "fire-check", cyclic, 965 cyclic->cy_expire); 966 967 if ((exp = cyclic->cy_expire) > now) 968 break; 969 970 cyclic_expire(cpu, ndx, cyclic); 971 972 /* 973 * If the handler reprogrammed the cyclic, then don't 974 * recompute the expiration. Then, if the interval is 975 * infinity, set the expiration to infinity. This can 976 * be used to create one-shot timers. 977 */ 978 if (exp != cyclic->cy_expire) { 979 /* 980 * If a hi level cyclic reprograms itself, 981 * the heap adjustment and reprogramming of the 982 * clock source have already been done at this 983 * point. So, we can continue. 984 */ 985 continue; 986 } 987 988 if (cyclic->cy_interval == CY_INFINITY) 989 exp = CY_INFINITY; 990 else 991 exp += cyclic->cy_interval; 992 993 /* 994 * If this cyclic will be set to next expire in the distant 995 * past, we have one of two situations: 996 * 997 * a) This is the first firing of a cyclic which had 998 * cy_expire set to 0. 999 * 1000 * b) We are tragically late for a cyclic -- most likely 1001 * due to being in the debugger. 1002 * 1003 * In either case, we set the new expiration time to be the 1004 * the next interval boundary. This assures that the 1005 * expiration time modulo the interval is invariant. 1006 * 1007 * We arbitrarily define "distant" to be one second (one second 1008 * is chosen because it's shorter than any foray to the 1009 * debugger while still being longer than any legitimate 1010 * stretch at CY_HIGH_LEVEL). 1011 */ 1012 1013 if (now - exp > NANOSEC) { 1014 hrtime_t interval = cyclic->cy_interval; 1015 1016 CYC_TRACE(cpu, CY_HIGH_LEVEL, exp == interval ? 1017 "fire-first" : "fire-swing", now, exp); 1018 1019 exp += ((now - exp) / interval + 1) * interval; 1020 } 1021 1022 cyclic->cy_expire = exp; 1023 cyclic_downheap(cpu, 0); 1024 } 1025 1026 /* 1027 * Now we have a cyclic in the root slot which isn't in the past; 1028 * reprogram the interrupt source. 1029 */ 1030 be->cyb_reprogram(arg, exp); 1031 } 1032 1033 static void 1034 cyclic_remove_pend(cyc_cpu_t *cpu, cyc_level_t level, cyclic_t *cyclic) 1035 { 1036 cyc_func_t handler = cyclic->cy_handler; 1037 void *arg = cyclic->cy_arg; 1038 uint32_t i, rpend = cpu->cyp_rpend - 1; 1039 1040 ASSERT(cyclic->cy_flags & CYF_FREE); 1041 ASSERT(cyclic->cy_pend == 0); 1042 ASSERT(cpu->cyp_state == CYS_REMOVING); 1043 ASSERT(cpu->cyp_rpend > 0); 1044 1045 CYC_TRACE(cpu, level, "remove-rpend", cyclic, cpu->cyp_rpend); 1046 1047 /* 1048 * Note that we only call the handler cyp_rpend - 1 times; this is 1049 * to account for the handler call in cyclic_softint(). 1050 */ 1051 for (i = 0; i < rpend; i++) { 1052 CYC_TRACE(cpu, level, "rpend-in", handler, arg); 1053 DTRACE_PROBE1(cyclic__start, cyclic_t *, cyclic); 1054 1055 (*handler)(arg); 1056 1057 DTRACE_PROBE1(cyclic__end, cyclic_t *, cyclic); 1058 CYC_TRACE(cpu, level, "rpend-out", handler, arg); 1059 } 1060 1061 /* 1062 * We can now let the remove operation complete. 1063 */ 1064 sema_v(&cpu->cyp_modify_wait); 1065 } 1066 1067 /* 1068 * cyclic_softint(cpu_t *cpu, cyc_level_t level) 1069 * 1070 * Overview 1071 * 1072 * cyclic_softint() is the cyclic subsystem's CY_LOCK_LEVEL and CY_LOW_LEVEL 1073 * soft interrupt handler. Called by the cyclic backend. 1074 * 1075 * Arguments and notes 1076 * 1077 * The first argument to cyclic_softint() is the CPU on which the interrupt 1078 * is executing; backends must call into cyclic_softint() on the specified 1079 * CPU. The second argument is the level of the soft interrupt; it must 1080 * be one of CY_LOCK_LEVEL or CY_LOW_LEVEL. 1081 * 1082 * cyclic_softint() will call the handlers for cyclics pending at the 1083 * specified level. cyclic_softint() will not return until all pending 1084 * cyclics at the specified level have been dealt with; intervening 1085 * CY_HIGH_LEVEL interrupts which enqueue cyclics at the specified level 1086 * may therefore prolong cyclic_softint(). 1087 * 1088 * cyclic_softint() never disables interrupts, and, if neither a 1089 * cyclic_add() nor a cyclic_remove() is pending on the specified CPU, is 1090 * lock-free. This assures that in the common case, cyclic_softint() 1091 * completes without blocking, and never starves cyclic_fire(). If either 1092 * cyclic_add() or cyclic_remove() is pending, cyclic_softint() may grab 1093 * a dispatcher lock. 1094 * 1095 * While cyclic_softint() is designed for bounded latency, it is obviously 1096 * at the mercy of its cyclic handlers. Because cyclic handlers may block 1097 * arbitrarily, callers of cyclic_softint() should not rely upon 1098 * deterministic completion. 1099 * 1100 * cyclic_softint() may be called spuriously without ill effect. 1101 * 1102 * Return value 1103 * 1104 * None. 1105 * 1106 * Caller's context 1107 * 1108 * The caller must be executing in soft interrupt context at either 1109 * CY_LOCK_LEVEL or CY_LOW_LEVEL. The level passed to cyclic_softint() 1110 * must match the level at which it is executing. On optimal backends, 1111 * the caller will hold no locks. In any case, the caller may not hold 1112 * cpu_lock or any lock acquired by any cyclic handler or held across 1113 * any of cyclic_add(), cyclic_remove(), cyclic_bind() or cyclic_juggle(). 1114 */ 1115 void 1116 cyclic_softint(cpu_t *c, cyc_level_t level) 1117 { 1118 cyc_cpu_t *cpu = c->cpu_cyclic; 1119 cyc_softbuf_t *softbuf; 1120 int soft, *buf, consndx, resized = 0, intr_resized = 0; 1121 cyc_pcbuffer_t *pc; 1122 cyclic_t *cyclics = cpu->cyp_cyclics; 1123 int sizemask; 1124 1125 CYC_TRACE(cpu, level, "softint", cyclics, 0); 1126 1127 ASSERT(level < CY_LOW_LEVEL + CY_SOFT_LEVELS); 1128 1129 softbuf = &cpu->cyp_softbuf[level]; 1130 top: 1131 soft = softbuf->cys_soft; 1132 ASSERT(soft == 0 || soft == 1); 1133 1134 pc = &softbuf->cys_buf[soft]; 1135 buf = pc->cypc_buf; 1136 consndx = pc->cypc_consndx; 1137 sizemask = pc->cypc_sizemask; 1138 1139 CYC_TRACE(cpu, level, "softint-top", cyclics, pc); 1140 1141 while (consndx != pc->cypc_prodndx) { 1142 int pend, npend, opend; 1143 int consmasked = consndx & sizemask; 1144 cyclic_t *cyclic = &cyclics[buf[consmasked]]; 1145 cyc_func_t handler = cyclic->cy_handler; 1146 void *arg = cyclic->cy_arg; 1147 1148 ASSERT(buf[consmasked] < cpu->cyp_size); 1149 CYC_TRACE(cpu, level, "consuming", consndx, cyclic); 1150 1151 /* 1152 * We have found this cyclic in the pcbuffer. We know that 1153 * one of the following is true: 1154 * 1155 * (a) The pend is non-zero. We need to execute the handler 1156 * at least once. 1157 * 1158 * (b) The pend _was_ non-zero, but it's now zero due to a 1159 * resize. We will call the handler once, see that we 1160 * are in this case, and read the new cyclics buffer 1161 * (and hence the old non-zero pend). 1162 * 1163 * (c) The pend _was_ non-zero, but it's now zero due to a 1164 * removal. We will call the handler once, see that we 1165 * are in this case, and call into cyclic_remove_pend() 1166 * to call the cyclic rpend times. We will take into 1167 * account that we have already called the handler once. 1168 * 1169 * Point is: it's safe to call the handler without first 1170 * checking the pend. 1171 */ 1172 do { 1173 CYC_TRACE(cpu, level, "handler-in", handler, arg); 1174 DTRACE_PROBE1(cyclic__start, cyclic_t *, cyclic); 1175 1176 (*handler)(arg); 1177 1178 DTRACE_PROBE1(cyclic__end, cyclic_t *, cyclic); 1179 CYC_TRACE(cpu, level, "handler-out", handler, arg); 1180 reread: 1181 pend = cyclic->cy_pend; 1182 npend = pend - 1; 1183 1184 if (pend == 0) { 1185 if (cpu->cyp_state == CYS_REMOVING) { 1186 /* 1187 * This cyclic has been removed while 1188 * it had a non-zero pend count (we 1189 * know it was non-zero because we 1190 * found this cyclic in the pcbuffer). 1191 * There must be a non-zero rpend for 1192 * this CPU, and there must be a remove 1193 * operation blocking; we'll call into 1194 * cyclic_remove_pend() to clean this 1195 * up, and break out of the pend loop. 1196 */ 1197 cyclic_remove_pend(cpu, level, cyclic); 1198 break; 1199 } 1200 1201 /* 1202 * We must have had a resize interrupt us. 1203 */ 1204 CYC_TRACE(cpu, level, "resize-int", cyclics, 0); 1205 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1206 ASSERT(cyclics != cpu->cyp_cyclics); 1207 ASSERT(resized == 0); 1208 ASSERT(intr_resized == 0); 1209 intr_resized = 1; 1210 cyclics = cpu->cyp_cyclics; 1211 cyclic = &cyclics[buf[consmasked]]; 1212 ASSERT(cyclic->cy_handler == handler); 1213 ASSERT(cyclic->cy_arg == arg); 1214 goto reread; 1215 } 1216 1217 if ((opend = 1218 cas32(&cyclic->cy_pend, pend, npend)) != pend) { 1219 /* 1220 * Our cas32 can fail for one of several 1221 * reasons: 1222 * 1223 * (a) An intervening high level bumped up the 1224 * pend count on this cyclic. In this 1225 * case, we will see a higher pend. 1226 * 1227 * (b) The cyclics array has been yanked out 1228 * from underneath us by a resize 1229 * operation. In this case, pend is 0 and 1230 * cyp_state is CYS_EXPANDING. 1231 * 1232 * (c) The cyclic has been removed by an 1233 * intervening remove-xcall. In this case, 1234 * pend will be 0, the cyp_state will be 1235 * CYS_REMOVING, and the cyclic will be 1236 * marked CYF_FREE. 1237 * 1238 * The assertion below checks that we are 1239 * in one of the above situations. The 1240 * action under all three is to return to 1241 * the top of the loop. 1242 */ 1243 CYC_TRACE(cpu, level, "cas-fail", opend, pend); 1244 ASSERT(opend > pend || (opend == 0 && 1245 ((cyclics != cpu->cyp_cyclics && 1246 cpu->cyp_state == CYS_EXPANDING) || 1247 (cpu->cyp_state == CYS_REMOVING && 1248 (cyclic->cy_flags & CYF_FREE))))); 1249 goto reread; 1250 } 1251 1252 /* 1253 * Okay, so we've managed to successfully decrement 1254 * pend. If we just decremented the pend to 0, we're 1255 * done. 1256 */ 1257 } while (npend > 0); 1258 1259 pc->cypc_consndx = ++consndx; 1260 } 1261 1262 /* 1263 * If the high level handler is no longer writing to the same 1264 * buffer, then we've had a resize. We need to switch our soft 1265 * index, and goto top. 1266 */ 1267 if (soft != softbuf->cys_hard) { 1268 /* 1269 * We can assert that the other buffer has grown by exactly 1270 * one factor of two. 1271 */ 1272 CYC_TRACE(cpu, level, "buffer-grow", 0, 0); 1273 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1274 ASSERT(softbuf->cys_buf[softbuf->cys_hard].cypc_sizemask == 1275 (softbuf->cys_buf[soft].cypc_sizemask << 1) + 1 || 1276 softbuf->cys_buf[soft].cypc_sizemask == 0); 1277 ASSERT(softbuf->cys_hard == (softbuf->cys_soft ^ 1)); 1278 1279 /* 1280 * If our cached cyclics pointer doesn't match cyp_cyclics, 1281 * then we took a resize between our last iteration of the 1282 * pend loop and the check against softbuf->cys_hard. 1283 */ 1284 if (cpu->cyp_cyclics != cyclics) { 1285 CYC_TRACE1(cpu, level, "resize-int-int", consndx); 1286 cyclics = cpu->cyp_cyclics; 1287 } 1288 1289 softbuf->cys_soft = softbuf->cys_hard; 1290 1291 ASSERT(resized == 0); 1292 resized = 1; 1293 goto top; 1294 } 1295 1296 /* 1297 * If we were interrupted by a resize operation, then we must have 1298 * seen the hard index change. 1299 */ 1300 ASSERT(!(intr_resized == 1 && resized == 0)); 1301 1302 if (resized) { 1303 uint32_t lev, nlev; 1304 1305 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1306 1307 do { 1308 lev = cpu->cyp_modify_levels; 1309 nlev = lev + 1; 1310 } while (cas32(&cpu->cyp_modify_levels, lev, nlev) != lev); 1311 1312 /* 1313 * If we are the last soft level to see the modification, 1314 * post on cyp_modify_wait. Otherwise, (if we're not 1315 * already at low level), post down to the next soft level. 1316 */ 1317 if (nlev == CY_SOFT_LEVELS) { 1318 CYC_TRACE0(cpu, level, "resize-kick"); 1319 sema_v(&cpu->cyp_modify_wait); 1320 } else { 1321 ASSERT(nlev < CY_SOFT_LEVELS); 1322 if (level != CY_LOW_LEVEL) { 1323 cyc_backend_t *be = cpu->cyp_backend; 1324 1325 CYC_TRACE0(cpu, level, "resize-post"); 1326 be->cyb_softint(be->cyb_arg, level - 1); 1327 } 1328 } 1329 } 1330 } 1331 1332 static void 1333 cyclic_expand_xcall(cyc_xcallarg_t *arg) 1334 { 1335 cyc_cpu_t *cpu = arg->cyx_cpu; 1336 cyc_backend_t *be = cpu->cyp_backend; 1337 cyb_arg_t bar = be->cyb_arg; 1338 cyc_cookie_t cookie; 1339 cyc_index_t new_size = arg->cyx_size, size = cpu->cyp_size, i; 1340 cyc_index_t *new_heap = arg->cyx_heap; 1341 cyclic_t *cyclics = cpu->cyp_cyclics, *new_cyclics = arg->cyx_cyclics; 1342 1343 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1344 1345 /* 1346 * This is a little dicey. First, we'll raise our interrupt level 1347 * to CY_HIGH_LEVEL. This CPU already has a new heap, cyclic array, 1348 * etc.; we just need to bcopy them across. As for the softint 1349 * buffers, we'll switch the active buffers. The actual softints will 1350 * take care of consuming any pending cyclics in the old buffer. 1351 */ 1352 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 1353 1354 CYC_TRACE(cpu, CY_HIGH_LEVEL, "expand", new_size, 0); 1355 1356 /* 1357 * Assert that the new size is a power of 2. 1358 */ 1359 ASSERT((new_size & new_size - 1) == 0); 1360 ASSERT(new_size == (size << 1)); 1361 ASSERT(cpu->cyp_heap != NULL && cpu->cyp_cyclics != NULL); 1362 1363 bcopy(cpu->cyp_heap, new_heap, sizeof (cyc_index_t) * size); 1364 bcopy(cyclics, new_cyclics, sizeof (cyclic_t) * size); 1365 1366 /* 1367 * Now run through the old cyclics array, setting pend to 0. To 1368 * softints (which are executing at a lower priority level), the 1369 * pends dropping to 0 will appear atomic with the cyp_cyclics 1370 * pointer changing. 1371 */ 1372 for (i = 0; i < size; i++) 1373 cyclics[i].cy_pend = 0; 1374 1375 /* 1376 * Set up the free list, and set all of the new cyclics to be CYF_FREE. 1377 */ 1378 for (i = size; i < new_size; i++) { 1379 new_heap[i] = i; 1380 new_cyclics[i].cy_flags = CYF_FREE; 1381 } 1382 1383 /* 1384 * We can go ahead and plow the value of cyp_heap and cyp_cyclics; 1385 * cyclic_expand() has kept a copy. 1386 */ 1387 cpu->cyp_heap = new_heap; 1388 cpu->cyp_cyclics = new_cyclics; 1389 cpu->cyp_size = new_size; 1390 1391 /* 1392 * We've switched over the heap and the cyclics array. Now we need 1393 * to switch over our active softint buffer pointers. 1394 */ 1395 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 1396 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 1397 uchar_t hard = softbuf->cys_hard; 1398 1399 /* 1400 * Assert that we're not in the middle of a resize operation. 1401 */ 1402 ASSERT(hard == softbuf->cys_soft); 1403 ASSERT(hard == 0 || hard == 1); 1404 ASSERT(softbuf->cys_buf[hard].cypc_buf != NULL); 1405 1406 softbuf->cys_hard = hard ^ 1; 1407 1408 /* 1409 * The caller (cyclic_expand()) is responsible for setting 1410 * up the new producer-consumer buffer; assert that it's 1411 * been done correctly. 1412 */ 1413 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_buf != NULL); 1414 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_prodndx == 0); 1415 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_consndx == 0); 1416 } 1417 1418 /* 1419 * That's all there is to it; now we just need to postdown to 1420 * get the softint chain going. 1421 */ 1422 be->cyb_softint(bar, CY_HIGH_LEVEL - 1); 1423 be->cyb_restore_level(bar, cookie); 1424 } 1425 1426 /* 1427 * cyclic_expand() will cross call onto the CPU to perform the actual 1428 * expand operation. 1429 */ 1430 static void 1431 cyclic_expand(cyc_cpu_t *cpu) 1432 { 1433 cyc_index_t new_size, old_size; 1434 cyc_index_t *new_heap, *old_heap; 1435 cyclic_t *new_cyclics, *old_cyclics; 1436 cyc_xcallarg_t arg; 1437 cyc_backend_t *be = cpu->cyp_backend; 1438 char old_hard; 1439 int i; 1440 1441 ASSERT(MUTEX_HELD(&cpu_lock)); 1442 ASSERT(cpu->cyp_state == CYS_ONLINE); 1443 1444 cpu->cyp_state = CYS_EXPANDING; 1445 1446 old_heap = cpu->cyp_heap; 1447 old_cyclics = cpu->cyp_cyclics; 1448 1449 if ((new_size = ((old_size = cpu->cyp_size) << 1)) == 0) { 1450 new_size = CY_DEFAULT_PERCPU; 1451 ASSERT(old_heap == NULL && old_cyclics == NULL); 1452 } 1453 1454 /* 1455 * Check that the new_size is a power of 2. 1456 */ 1457 ASSERT((new_size - 1 & new_size) == 0); 1458 1459 new_heap = kmem_alloc(sizeof (cyc_index_t) * new_size, KM_SLEEP); 1460 new_cyclics = kmem_zalloc(sizeof (cyclic_t) * new_size, KM_SLEEP); 1461 1462 /* 1463 * We know that no other expansions are in progress (they serialize 1464 * on cpu_lock), so we can safely read the softbuf metadata. 1465 */ 1466 old_hard = cpu->cyp_softbuf[0].cys_hard; 1467 1468 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 1469 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 1470 char hard = softbuf->cys_hard; 1471 cyc_pcbuffer_t *pc = &softbuf->cys_buf[hard ^ 1]; 1472 1473 ASSERT(hard == old_hard); 1474 ASSERT(hard == softbuf->cys_soft); 1475 ASSERT(pc->cypc_buf == NULL); 1476 1477 pc->cypc_buf = 1478 kmem_alloc(sizeof (cyc_index_t) * new_size, KM_SLEEP); 1479 pc->cypc_prodndx = pc->cypc_consndx = 0; 1480 pc->cypc_sizemask = new_size - 1; 1481 } 1482 1483 arg.cyx_cpu = cpu; 1484 arg.cyx_heap = new_heap; 1485 arg.cyx_cyclics = new_cyclics; 1486 arg.cyx_size = new_size; 1487 1488 cpu->cyp_modify_levels = 0; 1489 1490 be->cyb_xcall(be->cyb_arg, cpu->cyp_cpu, 1491 (cyc_func_t)cyclic_expand_xcall, &arg); 1492 1493 /* 1494 * Now block, waiting for the resize operation to complete. 1495 */ 1496 sema_p(&cpu->cyp_modify_wait); 1497 ASSERT(cpu->cyp_modify_levels == CY_SOFT_LEVELS); 1498 1499 /* 1500 * The operation is complete; we can now free the old buffers. 1501 */ 1502 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 1503 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 1504 char hard = softbuf->cys_hard; 1505 cyc_pcbuffer_t *pc = &softbuf->cys_buf[hard ^ 1]; 1506 1507 ASSERT(hard == (old_hard ^ 1)); 1508 ASSERT(hard == softbuf->cys_soft); 1509 1510 if (pc->cypc_buf == NULL) 1511 continue; 1512 1513 ASSERT(pc->cypc_sizemask == ((new_size - 1) >> 1)); 1514 1515 kmem_free(pc->cypc_buf, 1516 sizeof (cyc_index_t) * (pc->cypc_sizemask + 1)); 1517 pc->cypc_buf = NULL; 1518 } 1519 1520 if (old_cyclics != NULL) { 1521 ASSERT(old_heap != NULL); 1522 ASSERT(old_size != 0); 1523 kmem_free(old_cyclics, sizeof (cyclic_t) * old_size); 1524 kmem_free(old_heap, sizeof (cyc_index_t) * old_size); 1525 } 1526 1527 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1528 cpu->cyp_state = CYS_ONLINE; 1529 } 1530 1531 /* 1532 * cyclic_pick_cpu will attempt to pick a CPU according to the constraints 1533 * specified by the partition, bound CPU, and flags. Additionally, 1534 * cyclic_pick_cpu() will not pick the avoid CPU; it will return NULL if 1535 * the avoid CPU is the only CPU which satisfies the constraints. 1536 * 1537 * If CYF_CPU_BOUND is set in flags, the specified CPU must be non-NULL. 1538 * If CYF_PART_BOUND is set in flags, the specified partition must be non-NULL. 1539 * If both CYF_CPU_BOUND and CYF_PART_BOUND are set, the specified CPU must 1540 * be in the specified partition. 1541 */ 1542 static cyc_cpu_t * 1543 cyclic_pick_cpu(cpupart_t *part, cpu_t *bound, cpu_t *avoid, uint16_t flags) 1544 { 1545 cpu_t *c, *start = (part != NULL) ? part->cp_cpulist : CPU; 1546 cpu_t *online = NULL; 1547 uintptr_t offset; 1548 1549 CYC_PTRACE("pick-cpu", part, bound); 1550 1551 ASSERT(!(flags & CYF_CPU_BOUND) || bound != NULL); 1552 ASSERT(!(flags & CYF_PART_BOUND) || part != NULL); 1553 1554 /* 1555 * If we're bound to our CPU, there isn't much choice involved. We 1556 * need to check that the CPU passed as bound is in the cpupart, and 1557 * that the CPU that we're binding to has been configured. 1558 */ 1559 if (flags & CYF_CPU_BOUND) { 1560 CYC_PTRACE("pick-cpu-bound", bound, avoid); 1561 1562 if ((flags & CYF_PART_BOUND) && bound->cpu_part != part) 1563 panic("cyclic_pick_cpu: " 1564 "CPU binding contradicts partition binding"); 1565 1566 if (bound == avoid) 1567 return (NULL); 1568 1569 if (bound->cpu_cyclic == NULL) 1570 panic("cyclic_pick_cpu: " 1571 "attempt to bind to non-configured CPU"); 1572 1573 return (bound->cpu_cyclic); 1574 } 1575 1576 if (flags & CYF_PART_BOUND) { 1577 CYC_PTRACE("pick-part-bound", bound, avoid); 1578 offset = offsetof(cpu_t, cpu_next_part); 1579 } else { 1580 offset = offsetof(cpu_t, cpu_next_onln); 1581 } 1582 1583 c = start; 1584 do { 1585 if (c->cpu_cyclic == NULL) 1586 continue; 1587 1588 if (c->cpu_cyclic->cyp_state == CYS_OFFLINE) 1589 continue; 1590 1591 if (c == avoid) 1592 continue; 1593 1594 if (c->cpu_flags & CPU_ENABLE) 1595 goto found; 1596 1597 if (online == NULL) 1598 online = c; 1599 } while ((c = *(cpu_t **)((uintptr_t)c + offset)) != start); 1600 1601 /* 1602 * If we're here, we're in one of two situations: 1603 * 1604 * (a) We have a partition-bound cyclic, and there is no CPU in 1605 * our partition which is CPU_ENABLE'd. If we saw another 1606 * non-CYS_OFFLINE CPU in our partition, we'll go with it. 1607 * If not, the avoid CPU must be the only non-CYS_OFFLINE 1608 * CPU in the partition; we're forced to return NULL. 1609 * 1610 * (b) We have a partition-unbound cyclic, in which case there 1611 * must only be one CPU CPU_ENABLE'd, and it must be the one 1612 * we're trying to avoid. If cyclic_juggle()/cyclic_offline() 1613 * are called appropriately, this generally shouldn't happen 1614 * (the offline should fail before getting to this code). 1615 * At any rate: we can't avoid the avoid CPU, so we return 1616 * NULL. 1617 */ 1618 if (!(flags & CYF_PART_BOUND)) { 1619 ASSERT(avoid->cpu_flags & CPU_ENABLE); 1620 return (NULL); 1621 } 1622 1623 CYC_PTRACE("pick-no-intr", part, avoid); 1624 1625 if ((c = online) != NULL) 1626 goto found; 1627 1628 CYC_PTRACE("pick-fail", part, avoid); 1629 ASSERT(avoid->cpu_part == start->cpu_part); 1630 return (NULL); 1631 1632 found: 1633 CYC_PTRACE("pick-cpu-found", c, avoid); 1634 ASSERT(c != avoid); 1635 ASSERT(c->cpu_cyclic != NULL); 1636 1637 return (c->cpu_cyclic); 1638 } 1639 1640 static void 1641 cyclic_add_xcall(cyc_xcallarg_t *arg) 1642 { 1643 cyc_cpu_t *cpu = arg->cyx_cpu; 1644 cyc_handler_t *hdlr = arg->cyx_hdlr; 1645 cyc_time_t *when = arg->cyx_when; 1646 cyc_backend_t *be = cpu->cyp_backend; 1647 cyc_index_t ndx, nelems; 1648 cyc_cookie_t cookie; 1649 cyb_arg_t bar = be->cyb_arg; 1650 cyclic_t *cyclic; 1651 1652 ASSERT(cpu->cyp_nelems < cpu->cyp_size); 1653 1654 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 1655 1656 CYC_TRACE(cpu, CY_HIGH_LEVEL, 1657 "add-xcall", when->cyt_when, when->cyt_interval); 1658 1659 nelems = cpu->cyp_nelems++; 1660 1661 if (nelems == 0) { 1662 /* 1663 * If this is the first element, we need to enable the 1664 * backend on this CPU. 1665 */ 1666 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "enabled"); 1667 be->cyb_enable(bar); 1668 } 1669 1670 ndx = cpu->cyp_heap[nelems]; 1671 cyclic = &cpu->cyp_cyclics[ndx]; 1672 1673 ASSERT(cyclic->cy_flags == CYF_FREE); 1674 cyclic->cy_interval = when->cyt_interval; 1675 1676 if (when->cyt_when == 0) { 1677 /* 1678 * If a start time hasn't been explicitly specified, we'll 1679 * start on the next interval boundary. 1680 */ 1681 cyclic->cy_expire = (gethrtime() / cyclic->cy_interval + 1) * 1682 cyclic->cy_interval; 1683 } else { 1684 cyclic->cy_expire = when->cyt_when; 1685 } 1686 1687 cyclic->cy_handler = hdlr->cyh_func; 1688 cyclic->cy_arg = hdlr->cyh_arg; 1689 cyclic->cy_level = hdlr->cyh_level; 1690 cyclic->cy_flags = arg->cyx_flags; 1691 1692 if (cyclic_upheap(cpu, nelems)) { 1693 hrtime_t exp = cyclic->cy_expire; 1694 1695 CYC_TRACE(cpu, CY_HIGH_LEVEL, "add-reprog", cyclic, exp); 1696 1697 /* 1698 * If our upheap propagated to the root, we need to 1699 * reprogram the interrupt source. 1700 */ 1701 be->cyb_reprogram(bar, exp); 1702 } 1703 be->cyb_restore_level(bar, cookie); 1704 1705 arg->cyx_ndx = ndx; 1706 } 1707 1708 static cyc_index_t 1709 cyclic_add_here(cyc_cpu_t *cpu, cyc_handler_t *hdlr, 1710 cyc_time_t *when, uint16_t flags) 1711 { 1712 cyc_backend_t *be = cpu->cyp_backend; 1713 cyb_arg_t bar = be->cyb_arg; 1714 cyc_xcallarg_t arg; 1715 1716 CYC_PTRACE("add-cpu", cpu, hdlr->cyh_func); 1717 ASSERT(MUTEX_HELD(&cpu_lock)); 1718 ASSERT(cpu->cyp_state == CYS_ONLINE); 1719 ASSERT(!(cpu->cyp_cpu->cpu_flags & CPU_OFFLINE)); 1720 ASSERT(when->cyt_when >= 0 && when->cyt_interval > 0); 1721 1722 if (cpu->cyp_nelems == cpu->cyp_size) { 1723 /* 1724 * This is expensive; it will cross call onto the other 1725 * CPU to perform the expansion. 1726 */ 1727 cyclic_expand(cpu); 1728 ASSERT(cpu->cyp_nelems < cpu->cyp_size); 1729 } 1730 1731 /* 1732 * By now, we know that we're going to be able to successfully 1733 * perform the add. Now cross call over to the CPU of interest to 1734 * actually add our cyclic. 1735 */ 1736 arg.cyx_cpu = cpu; 1737 arg.cyx_hdlr = hdlr; 1738 arg.cyx_when = when; 1739 arg.cyx_flags = flags; 1740 1741 be->cyb_xcall(bar, cpu->cyp_cpu, (cyc_func_t)cyclic_add_xcall, &arg); 1742 1743 CYC_PTRACE("add-cpu-done", cpu, arg.cyx_ndx); 1744 1745 return (arg.cyx_ndx); 1746 } 1747 1748 static void 1749 cyclic_remove_xcall(cyc_xcallarg_t *arg) 1750 { 1751 cyc_cpu_t *cpu = arg->cyx_cpu; 1752 cyc_backend_t *be = cpu->cyp_backend; 1753 cyb_arg_t bar = be->cyb_arg; 1754 cyc_cookie_t cookie; 1755 cyc_index_t ndx = arg->cyx_ndx, nelems, i; 1756 cyc_index_t *heap, last; 1757 cyclic_t *cyclic; 1758 #ifdef DEBUG 1759 cyc_index_t root; 1760 #endif 1761 1762 ASSERT(cpu->cyp_state == CYS_REMOVING); 1763 1764 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 1765 1766 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "remove-xcall", ndx); 1767 1768 heap = cpu->cyp_heap; 1769 nelems = cpu->cyp_nelems; 1770 ASSERT(nelems > 0); 1771 cyclic = &cpu->cyp_cyclics[ndx]; 1772 1773 /* 1774 * Grab the current expiration time. If this cyclic is being 1775 * removed as part of a juggling operation, the expiration time 1776 * will be used when the cyclic is added to the new CPU. 1777 */ 1778 if (arg->cyx_when != NULL) { 1779 arg->cyx_when->cyt_when = cyclic->cy_expire; 1780 arg->cyx_when->cyt_interval = cyclic->cy_interval; 1781 } 1782 1783 if (cyclic->cy_pend != 0) { 1784 /* 1785 * The pend is non-zero; this cyclic is currently being 1786 * executed (or will be executed shortly). If the caller 1787 * refuses to wait, we must return (doing nothing). Otherwise, 1788 * we will stash the pend value * in this CPU's rpend, and 1789 * then zero it out. The softint in the pend loop will see 1790 * that we have zeroed out pend, and will call the cyclic 1791 * handler rpend times. The caller will wait until the 1792 * softint has completed calling the cyclic handler. 1793 */ 1794 if (arg->cyx_wait == CY_NOWAIT) { 1795 arg->cyx_wait = CY_WAIT; 1796 goto out; 1797 } 1798 1799 ASSERT(cyclic->cy_level != CY_HIGH_LEVEL); 1800 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "remove-pend", cyclic->cy_pend); 1801 cpu->cyp_rpend = cyclic->cy_pend; 1802 cyclic->cy_pend = 0; 1803 } 1804 1805 /* 1806 * Now set the flags to CYF_FREE. We don't need a membar_enter() 1807 * between zeroing pend and setting the flags because we're at 1808 * CY_HIGH_LEVEL (that is, the zeroing of pend and the setting 1809 * of cy_flags appear atomic to softints). 1810 */ 1811 cyclic->cy_flags = CYF_FREE; 1812 1813 for (i = 0; i < nelems; i++) { 1814 if (heap[i] == ndx) 1815 break; 1816 } 1817 1818 if (i == nelems) 1819 panic("attempt to remove non-existent cyclic"); 1820 1821 cpu->cyp_nelems = --nelems; 1822 1823 if (nelems == 0) { 1824 /* 1825 * If we just removed the last element, then we need to 1826 * disable the backend on this CPU. 1827 */ 1828 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "disabled"); 1829 be->cyb_disable(bar); 1830 } 1831 1832 if (i == nelems) { 1833 /* 1834 * If we just removed the last element of the heap, then 1835 * we don't have to downheap. 1836 */ 1837 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-bottom"); 1838 goto out; 1839 } 1840 1841 #ifdef DEBUG 1842 root = heap[0]; 1843 #endif 1844 1845 /* 1846 * Swap the last element of the heap with the one we want to 1847 * remove, and downheap (this has the implicit effect of putting 1848 * the newly freed element on the free list). 1849 */ 1850 heap[i] = (last = heap[nelems]); 1851 heap[nelems] = ndx; 1852 1853 if (i == 0) { 1854 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-root"); 1855 cyclic_downheap(cpu, 0); 1856 } else { 1857 if (cyclic_upheap(cpu, i) == 0) { 1858 /* 1859 * The upheap didn't propagate to the root; if it 1860 * didn't propagate at all, we need to downheap. 1861 */ 1862 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-no-root"); 1863 if (heap[i] == last) { 1864 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-no-up"); 1865 cyclic_downheap(cpu, i); 1866 } 1867 ASSERT(heap[0] == root); 1868 goto out; 1869 } 1870 } 1871 1872 /* 1873 * We're here because we changed the root; we need to reprogram 1874 * the clock source. 1875 */ 1876 cyclic = &cpu->cyp_cyclics[heap[0]]; 1877 1878 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-reprog"); 1879 1880 ASSERT(nelems != 0); 1881 be->cyb_reprogram(bar, cyclic->cy_expire); 1882 out: 1883 be->cyb_restore_level(bar, cookie); 1884 } 1885 1886 static int 1887 cyclic_remove_here(cyc_cpu_t *cpu, cyc_index_t ndx, cyc_time_t *when, int wait) 1888 { 1889 cyc_backend_t *be = cpu->cyp_backend; 1890 cyc_xcallarg_t arg; 1891 cyclic_t *cyclic = &cpu->cyp_cyclics[ndx]; 1892 cyc_level_t level = cyclic->cy_level; 1893 1894 ASSERT(MUTEX_HELD(&cpu_lock)); 1895 ASSERT(cpu->cyp_rpend == 0); 1896 ASSERT(wait == CY_WAIT || wait == CY_NOWAIT); 1897 1898 arg.cyx_ndx = ndx; 1899 arg.cyx_cpu = cpu; 1900 arg.cyx_when = when; 1901 arg.cyx_wait = wait; 1902 1903 ASSERT(cpu->cyp_state == CYS_ONLINE); 1904 cpu->cyp_state = CYS_REMOVING; 1905 1906 be->cyb_xcall(be->cyb_arg, cpu->cyp_cpu, 1907 (cyc_func_t)cyclic_remove_xcall, &arg); 1908 1909 /* 1910 * If the cyclic we removed wasn't at CY_HIGH_LEVEL, then we need to 1911 * check the cyp_rpend. If it's non-zero, then we need to wait here 1912 * for all pending cyclic handlers to run. 1913 */ 1914 ASSERT(!(level == CY_HIGH_LEVEL && cpu->cyp_rpend != 0)); 1915 ASSERT(!(wait == CY_NOWAIT && cpu->cyp_rpend != 0)); 1916 ASSERT(!(arg.cyx_wait == CY_NOWAIT && cpu->cyp_rpend != 0)); 1917 1918 if (wait != arg.cyx_wait) { 1919 /* 1920 * We are being told that we must wait if we want to 1921 * remove this cyclic; put the CPU back in the CYS_ONLINE 1922 * state and return failure. 1923 */ 1924 ASSERT(wait == CY_NOWAIT && arg.cyx_wait == CY_WAIT); 1925 ASSERT(cpu->cyp_state == CYS_REMOVING); 1926 cpu->cyp_state = CYS_ONLINE; 1927 1928 return (0); 1929 } 1930 1931 if (cpu->cyp_rpend != 0) 1932 sema_p(&cpu->cyp_modify_wait); 1933 1934 ASSERT(cpu->cyp_state == CYS_REMOVING); 1935 1936 cpu->cyp_rpend = 0; 1937 cpu->cyp_state = CYS_ONLINE; 1938 1939 return (1); 1940 } 1941 1942 /* 1943 * If cyclic_reprogram() is called on the same CPU as the cyclic's CPU, then 1944 * it calls this function directly. Else, it invokes this function through 1945 * an X-call to the cyclic's CPU. 1946 */ 1947 static void 1948 cyclic_reprogram_cyclic(cyc_cpu_t *cpu, cyc_index_t ndx, hrtime_t expire) 1949 { 1950 cyc_backend_t *be = cpu->cyp_backend; 1951 cyb_arg_t bar = be->cyb_arg; 1952 cyc_cookie_t cookie; 1953 cyc_index_t nelems, i; 1954 cyc_index_t *heap; 1955 cyclic_t *cyclic; 1956 hrtime_t oexpire; 1957 int reprog; 1958 1959 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 1960 1961 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "reprog-xcall", ndx); 1962 1963 nelems = cpu->cyp_nelems; 1964 ASSERT(nelems > 0); 1965 heap = cpu->cyp_heap; 1966 1967 /* 1968 * Reprogrammed cyclics are typically one-shot ones that get 1969 * set to infinity on every expiration. We shorten the search by 1970 * searching from the bottom of the heap to the top instead of the 1971 * other way around. 1972 */ 1973 for (i = nelems - 1; i >= 0; i--) { 1974 if (heap[i] == ndx) 1975 break; 1976 } 1977 if (i < 0) 1978 panic("attempt to reprogram non-existent cyclic"); 1979 1980 cyclic = &cpu->cyp_cyclics[ndx]; 1981 oexpire = cyclic->cy_expire; 1982 cyclic->cy_expire = expire; 1983 1984 reprog = (i == 0); 1985 if (expire > oexpire) { 1986 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "reprog-down", i); 1987 cyclic_downheap(cpu, i); 1988 } else if (i > 0) { 1989 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "reprog-up", i); 1990 reprog = cyclic_upheap(cpu, i); 1991 } 1992 1993 if (reprog && (cpu->cyp_state != CYS_SUSPENDED)) { 1994 /* 1995 * The root changed. Reprogram the clock source. 1996 */ 1997 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "reprog-root"); 1998 cyclic = &cpu->cyp_cyclics[heap[0]]; 1999 be->cyb_reprogram(bar, cyclic->cy_expire); 2000 } 2001 2002 be->cyb_restore_level(bar, cookie); 2003 } 2004 2005 static void 2006 cyclic_reprogram_xcall(cyc_xcallarg_t *arg) 2007 { 2008 cyclic_reprogram_cyclic(arg->cyx_cpu, arg->cyx_ndx, 2009 arg->cyx_when->cyt_when); 2010 } 2011 2012 static void 2013 cyclic_reprogram_here(cyc_cpu_t *cpu, cyc_index_t ndx, hrtime_t expiration) 2014 { 2015 cyc_backend_t *be = cpu->cyp_backend; 2016 cyc_xcallarg_t arg; 2017 cyc_time_t when; 2018 2019 ASSERT(expiration > 0); 2020 2021 arg.cyx_ndx = ndx; 2022 arg.cyx_cpu = cpu; 2023 arg.cyx_when = &when; 2024 when.cyt_when = expiration; 2025 2026 be->cyb_xcall(be->cyb_arg, cpu->cyp_cpu, 2027 (cyc_func_t)cyclic_reprogram_xcall, &arg); 2028 } 2029 2030 /* 2031 * cyclic_juggle_one_to() should only be called when the source cyclic 2032 * can be juggled and the destination CPU is known to be able to accept 2033 * it. 2034 */ 2035 static void 2036 cyclic_juggle_one_to(cyc_id_t *idp, cyc_cpu_t *dest) 2037 { 2038 cyc_cpu_t *src = idp->cyi_cpu; 2039 cyc_index_t ndx = idp->cyi_ndx; 2040 cyc_time_t when; 2041 cyc_handler_t hdlr; 2042 cyclic_t *cyclic; 2043 uint16_t flags; 2044 hrtime_t delay; 2045 2046 ASSERT(MUTEX_HELD(&cpu_lock)); 2047 ASSERT(src != NULL && idp->cyi_omni_list == NULL); 2048 ASSERT(!(dest->cyp_cpu->cpu_flags & (CPU_QUIESCED | CPU_OFFLINE))); 2049 CYC_PTRACE("juggle-one-to", idp, dest); 2050 2051 cyclic = &src->cyp_cyclics[ndx]; 2052 2053 flags = cyclic->cy_flags; 2054 ASSERT(!(flags & CYF_CPU_BOUND) && !(flags & CYF_FREE)); 2055 2056 hdlr.cyh_func = cyclic->cy_handler; 2057 hdlr.cyh_level = cyclic->cy_level; 2058 hdlr.cyh_arg = cyclic->cy_arg; 2059 2060 /* 2061 * Before we begin the juggling process, see if the destination 2062 * CPU requires an expansion. If it does, we'll perform the 2063 * expansion before removing the cyclic. This is to prevent us 2064 * from blocking while a system-critical cyclic (notably, the clock 2065 * cyclic) isn't on a CPU. 2066 */ 2067 if (dest->cyp_nelems == dest->cyp_size) { 2068 CYC_PTRACE("remove-expand", idp, dest); 2069 cyclic_expand(dest); 2070 ASSERT(dest->cyp_nelems < dest->cyp_size); 2071 } 2072 2073 /* 2074 * Prevent a reprogram of this cyclic while we are relocating it. 2075 * Otherwise, cyclic_reprogram_here() will end up sending an X-call 2076 * to the wrong CPU. 2077 */ 2078 rw_enter(&idp->cyi_lock, RW_WRITER); 2079 2080 /* 2081 * Remove the cyclic from the source. As mentioned above, we cannot 2082 * block during this operation; if we cannot remove the cyclic 2083 * without waiting, we spin for a time shorter than the interval, and 2084 * reattempt the (non-blocking) removal. If we continue to fail, 2085 * we will exponentially back off (up to half of the interval). 2086 * Note that the removal will ultimately succeed -- even if the 2087 * cyclic handler is blocked on a resource held by a thread which we 2088 * have preempted, priority inheritance assures that the preempted 2089 * thread will preempt us and continue to progress. 2090 */ 2091 for (delay = NANOSEC / MICROSEC; ; delay <<= 1) { 2092 /* 2093 * Before we begin this operation, disable kernel preemption. 2094 */ 2095 kpreempt_disable(); 2096 if (cyclic_remove_here(src, ndx, &when, CY_NOWAIT)) 2097 break; 2098 2099 /* 2100 * The operation failed; enable kernel preemption while 2101 * spinning. 2102 */ 2103 kpreempt_enable(); 2104 2105 CYC_PTRACE("remove-retry", idp, src); 2106 2107 if (delay > (cyclic->cy_interval >> 1)) 2108 delay = cyclic->cy_interval >> 1; 2109 2110 /* 2111 * Drop the RW lock to avoid a deadlock with the cyclic 2112 * handler (because it can potentially call cyclic_reprogram(). 2113 */ 2114 rw_exit(&idp->cyi_lock); 2115 drv_usecwait((clock_t)(delay / (NANOSEC / MICROSEC))); 2116 rw_enter(&idp->cyi_lock, RW_WRITER); 2117 } 2118 2119 /* 2120 * Now add the cyclic to the destination. This won't block; we 2121 * performed any necessary (blocking) expansion of the destination 2122 * CPU before removing the cyclic from the source CPU. 2123 */ 2124 idp->cyi_ndx = cyclic_add_here(dest, &hdlr, &when, flags); 2125 idp->cyi_cpu = dest; 2126 kpreempt_enable(); 2127 2128 /* 2129 * Now that we have successfully relocated the cyclic, allow 2130 * it to be reprogrammed. 2131 */ 2132 rw_exit(&idp->cyi_lock); 2133 } 2134 2135 static int 2136 cyclic_juggle_one(cyc_id_t *idp) 2137 { 2138 cyc_index_t ndx = idp->cyi_ndx; 2139 cyc_cpu_t *cpu = idp->cyi_cpu, *dest; 2140 cyclic_t *cyclic = &cpu->cyp_cyclics[ndx]; 2141 cpu_t *c = cpu->cyp_cpu; 2142 cpupart_t *part = c->cpu_part; 2143 2144 CYC_PTRACE("juggle-one", idp, cpu); 2145 ASSERT(MUTEX_HELD(&cpu_lock)); 2146 ASSERT(!(c->cpu_flags & CPU_OFFLINE)); 2147 ASSERT(cpu->cyp_state == CYS_ONLINE); 2148 ASSERT(!(cyclic->cy_flags & CYF_FREE)); 2149 2150 if ((dest = cyclic_pick_cpu(part, c, c, cyclic->cy_flags)) == NULL) { 2151 /* 2152 * Bad news: this cyclic can't be juggled. 2153 */ 2154 CYC_PTRACE("juggle-fail", idp, cpu) 2155 return (0); 2156 } 2157 2158 cyclic_juggle_one_to(idp, dest); 2159 2160 return (1); 2161 } 2162 2163 static void 2164 cyclic_unbind_cpu(cyclic_id_t id) 2165 { 2166 cyc_id_t *idp = (cyc_id_t *)id; 2167 cyc_cpu_t *cpu = idp->cyi_cpu; 2168 cpu_t *c = cpu->cyp_cpu; 2169 cyclic_t *cyclic = &cpu->cyp_cyclics[idp->cyi_ndx]; 2170 2171 CYC_PTRACE("unbind-cpu", id, cpu); 2172 ASSERT(MUTEX_HELD(&cpu_lock)); 2173 ASSERT(cpu->cyp_state == CYS_ONLINE); 2174 ASSERT(!(cyclic->cy_flags & CYF_FREE)); 2175 ASSERT(cyclic->cy_flags & CYF_CPU_BOUND); 2176 2177 cyclic->cy_flags &= ~CYF_CPU_BOUND; 2178 2179 /* 2180 * If we were bound to CPU which has interrupts disabled, we need 2181 * to juggle away. This can only fail if we are bound to a 2182 * processor set, and if every CPU in the processor set has 2183 * interrupts disabled. 2184 */ 2185 if (!(c->cpu_flags & CPU_ENABLE)) { 2186 int res = cyclic_juggle_one(idp); 2187 2188 ASSERT((res && idp->cyi_cpu != cpu) || 2189 (!res && (cyclic->cy_flags & CYF_PART_BOUND))); 2190 } 2191 } 2192 2193 static void 2194 cyclic_bind_cpu(cyclic_id_t id, cpu_t *d) 2195 { 2196 cyc_id_t *idp = (cyc_id_t *)id; 2197 cyc_cpu_t *dest = d->cpu_cyclic, *cpu = idp->cyi_cpu; 2198 cpu_t *c = cpu->cyp_cpu; 2199 cyclic_t *cyclic = &cpu->cyp_cyclics[idp->cyi_ndx]; 2200 cpupart_t *part = c->cpu_part; 2201 2202 CYC_PTRACE("bind-cpu", id, dest); 2203 ASSERT(MUTEX_HELD(&cpu_lock)); 2204 ASSERT(!(d->cpu_flags & CPU_OFFLINE)); 2205 ASSERT(!(c->cpu_flags & CPU_OFFLINE)); 2206 ASSERT(cpu->cyp_state == CYS_ONLINE); 2207 ASSERT(dest != NULL); 2208 ASSERT(dest->cyp_state == CYS_ONLINE); 2209 ASSERT(!(cyclic->cy_flags & CYF_FREE)); 2210 ASSERT(!(cyclic->cy_flags & CYF_CPU_BOUND)); 2211 2212 dest = cyclic_pick_cpu(part, d, NULL, cyclic->cy_flags | CYF_CPU_BOUND); 2213 2214 if (dest != cpu) { 2215 cyclic_juggle_one_to(idp, dest); 2216 cyclic = &dest->cyp_cyclics[idp->cyi_ndx]; 2217 } 2218 2219 cyclic->cy_flags |= CYF_CPU_BOUND; 2220 } 2221 2222 static void 2223 cyclic_unbind_cpupart(cyclic_id_t id) 2224 { 2225 cyc_id_t *idp = (cyc_id_t *)id; 2226 cyc_cpu_t *cpu = idp->cyi_cpu; 2227 cpu_t *c = cpu->cyp_cpu; 2228 cyclic_t *cyc = &cpu->cyp_cyclics[idp->cyi_ndx]; 2229 2230 CYC_PTRACE("unbind-part", idp, c->cpu_part); 2231 ASSERT(MUTEX_HELD(&cpu_lock)); 2232 ASSERT(cpu->cyp_state == CYS_ONLINE); 2233 ASSERT(!(cyc->cy_flags & CYF_FREE)); 2234 ASSERT(cyc->cy_flags & CYF_PART_BOUND); 2235 2236 cyc->cy_flags &= ~CYF_PART_BOUND; 2237 2238 /* 2239 * If we're on a CPU which has interrupts disabled (and if this cyclic 2240 * isn't bound to the CPU), we need to juggle away. 2241 */ 2242 if (!(c->cpu_flags & CPU_ENABLE) && !(cyc->cy_flags & CYF_CPU_BOUND)) { 2243 int res = cyclic_juggle_one(idp); 2244 2245 ASSERT(res && idp->cyi_cpu != cpu); 2246 } 2247 } 2248 2249 static void 2250 cyclic_bind_cpupart(cyclic_id_t id, cpupart_t *part) 2251 { 2252 cyc_id_t *idp = (cyc_id_t *)id; 2253 cyc_cpu_t *cpu = idp->cyi_cpu, *dest; 2254 cpu_t *c = cpu->cyp_cpu; 2255 cyclic_t *cyc = &cpu->cyp_cyclics[idp->cyi_ndx]; 2256 2257 CYC_PTRACE("bind-part", idp, part); 2258 ASSERT(MUTEX_HELD(&cpu_lock)); 2259 ASSERT(!(c->cpu_flags & CPU_OFFLINE)); 2260 ASSERT(cpu->cyp_state == CYS_ONLINE); 2261 ASSERT(!(cyc->cy_flags & CYF_FREE)); 2262 ASSERT(!(cyc->cy_flags & CYF_PART_BOUND)); 2263 ASSERT(part->cp_ncpus > 0); 2264 2265 dest = cyclic_pick_cpu(part, c, NULL, cyc->cy_flags | CYF_PART_BOUND); 2266 2267 if (dest != cpu) { 2268 cyclic_juggle_one_to(idp, dest); 2269 cyc = &dest->cyp_cyclics[idp->cyi_ndx]; 2270 } 2271 2272 cyc->cy_flags |= CYF_PART_BOUND; 2273 } 2274 2275 static void 2276 cyclic_configure(cpu_t *c) 2277 { 2278 cyc_cpu_t *cpu = kmem_zalloc(sizeof (cyc_cpu_t), KM_SLEEP); 2279 cyc_backend_t *nbe = kmem_zalloc(sizeof (cyc_backend_t), KM_SLEEP); 2280 int i; 2281 2282 CYC_PTRACE1("configure", cpu); 2283 ASSERT(MUTEX_HELD(&cpu_lock)); 2284 2285 if (cyclic_id_cache == NULL) 2286 cyclic_id_cache = kmem_cache_create("cyclic_id_cache", 2287 sizeof (cyc_id_t), 0, NULL, NULL, NULL, NULL, NULL, 0); 2288 2289 cpu->cyp_cpu = c; 2290 2291 sema_init(&cpu->cyp_modify_wait, 0, NULL, SEMA_DEFAULT, NULL); 2292 2293 cpu->cyp_size = 1; 2294 cpu->cyp_heap = kmem_zalloc(sizeof (cyc_index_t), KM_SLEEP); 2295 cpu->cyp_cyclics = kmem_zalloc(sizeof (cyclic_t), KM_SLEEP); 2296 cpu->cyp_cyclics->cy_flags = CYF_FREE; 2297 2298 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 2299 /* 2300 * We don't need to set the sizemask; it's already zero 2301 * (which is the appropriate sizemask for a size of 1). 2302 */ 2303 cpu->cyp_softbuf[i].cys_buf[0].cypc_buf = 2304 kmem_alloc(sizeof (cyc_index_t), KM_SLEEP); 2305 } 2306 2307 cpu->cyp_state = CYS_OFFLINE; 2308 2309 /* 2310 * Setup the backend for this CPU. 2311 */ 2312 bcopy(&cyclic_backend, nbe, sizeof (cyc_backend_t)); 2313 nbe->cyb_arg = nbe->cyb_configure(c); 2314 cpu->cyp_backend = nbe; 2315 2316 /* 2317 * On platforms where stray interrupts may be taken during startup, 2318 * the CPU's cpu_cyclic pointer serves as an indicator that the 2319 * cyclic subsystem for this CPU is prepared to field interrupts. 2320 */ 2321 membar_producer(); 2322 2323 c->cpu_cyclic = cpu; 2324 } 2325 2326 static void 2327 cyclic_unconfigure(cpu_t *c) 2328 { 2329 cyc_cpu_t *cpu = c->cpu_cyclic; 2330 cyc_backend_t *be = cpu->cyp_backend; 2331 cyb_arg_t bar = be->cyb_arg; 2332 int i; 2333 2334 CYC_PTRACE1("unconfigure", cpu); 2335 ASSERT(MUTEX_HELD(&cpu_lock)); 2336 ASSERT(cpu->cyp_state == CYS_OFFLINE); 2337 ASSERT(cpu->cyp_nelems == 0); 2338 2339 /* 2340 * Let the backend know that the CPU is being yanked, and free up 2341 * the backend structure. 2342 */ 2343 be->cyb_unconfigure(bar); 2344 kmem_free(be, sizeof (cyc_backend_t)); 2345 cpu->cyp_backend = NULL; 2346 2347 /* 2348 * Free up the producer/consumer buffers at each of the soft levels. 2349 */ 2350 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 2351 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 2352 uchar_t hard = softbuf->cys_hard; 2353 cyc_pcbuffer_t *pc = &softbuf->cys_buf[hard]; 2354 size_t bufsize = sizeof (cyc_index_t) * (pc->cypc_sizemask + 1); 2355 2356 /* 2357 * Assert that we're not in the middle of a resize operation. 2358 */ 2359 ASSERT(hard == softbuf->cys_soft); 2360 ASSERT(hard == 0 || hard == 1); 2361 ASSERT(pc->cypc_buf != NULL); 2362 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_buf == NULL); 2363 2364 kmem_free(pc->cypc_buf, bufsize); 2365 pc->cypc_buf = NULL; 2366 } 2367 2368 /* 2369 * Finally, clean up our remaining dynamic structures and NULL out 2370 * the cpu_cyclic pointer. 2371 */ 2372 kmem_free(cpu->cyp_cyclics, cpu->cyp_size * sizeof (cyclic_t)); 2373 kmem_free(cpu->cyp_heap, cpu->cyp_size * sizeof (cyc_index_t)); 2374 kmem_free(cpu, sizeof (cyc_cpu_t)); 2375 2376 c->cpu_cyclic = NULL; 2377 } 2378 2379 static int 2380 cyclic_cpu_setup(cpu_setup_t what, int id) 2381 { 2382 /* 2383 * We are guaranteed that there is still/already an entry in the 2384 * cpu array for this CPU. 2385 */ 2386 cpu_t *c = cpu[id]; 2387 cyc_cpu_t *cyp = c->cpu_cyclic; 2388 2389 ASSERT(MUTEX_HELD(&cpu_lock)); 2390 2391 switch (what) { 2392 case CPU_CONFIG: 2393 ASSERT(cyp == NULL); 2394 cyclic_configure(c); 2395 break; 2396 2397 case CPU_UNCONFIG: 2398 ASSERT(cyp != NULL && cyp->cyp_state == CYS_OFFLINE); 2399 cyclic_unconfigure(c); 2400 break; 2401 2402 default: 2403 break; 2404 } 2405 2406 return (0); 2407 } 2408 2409 static void 2410 cyclic_suspend_xcall(cyc_xcallarg_t *arg) 2411 { 2412 cyc_cpu_t *cpu = arg->cyx_cpu; 2413 cyc_backend_t *be = cpu->cyp_backend; 2414 cyc_cookie_t cookie; 2415 cyb_arg_t bar = be->cyb_arg; 2416 2417 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 2418 2419 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "suspend-xcall", cpu->cyp_nelems); 2420 ASSERT(cpu->cyp_state == CYS_ONLINE || cpu->cyp_state == CYS_OFFLINE); 2421 2422 /* 2423 * We won't disable this CPU unless it has a non-zero number of 2424 * elements (cpu_lock assures that no one else may be attempting 2425 * to disable this CPU). 2426 */ 2427 if (cpu->cyp_nelems > 0) { 2428 ASSERT(cpu->cyp_state == CYS_ONLINE); 2429 be->cyb_disable(bar); 2430 } 2431 2432 if (cpu->cyp_state == CYS_ONLINE) 2433 cpu->cyp_state = CYS_SUSPENDED; 2434 2435 be->cyb_suspend(bar); 2436 be->cyb_restore_level(bar, cookie); 2437 } 2438 2439 static void 2440 cyclic_resume_xcall(cyc_xcallarg_t *arg) 2441 { 2442 cyc_cpu_t *cpu = arg->cyx_cpu; 2443 cyc_backend_t *be = cpu->cyp_backend; 2444 cyc_cookie_t cookie; 2445 cyb_arg_t bar = be->cyb_arg; 2446 cyc_state_t state = cpu->cyp_state; 2447 2448 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 2449 2450 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "resume-xcall", cpu->cyp_nelems); 2451 ASSERT(state == CYS_SUSPENDED || state == CYS_OFFLINE); 2452 2453 be->cyb_resume(bar); 2454 2455 /* 2456 * We won't enable this CPU unless it has a non-zero number of 2457 * elements. 2458 */ 2459 if (cpu->cyp_nelems > 0) { 2460 cyclic_t *cyclic = &cpu->cyp_cyclics[cpu->cyp_heap[0]]; 2461 hrtime_t exp = cyclic->cy_expire; 2462 2463 CYC_TRACE(cpu, CY_HIGH_LEVEL, "resume-reprog", cyclic, exp); 2464 ASSERT(state == CYS_SUSPENDED); 2465 be->cyb_enable(bar); 2466 be->cyb_reprogram(bar, exp); 2467 } 2468 2469 if (state == CYS_SUSPENDED) 2470 cpu->cyp_state = CYS_ONLINE; 2471 2472 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "resume-done", cpu->cyp_nelems); 2473 be->cyb_restore_level(bar, cookie); 2474 } 2475 2476 static void 2477 cyclic_omni_start(cyc_id_t *idp, cyc_cpu_t *cpu) 2478 { 2479 cyc_omni_handler_t *omni = &idp->cyi_omni_hdlr; 2480 cyc_omni_cpu_t *ocpu = kmem_alloc(sizeof (cyc_omni_cpu_t), KM_SLEEP); 2481 cyc_handler_t hdlr; 2482 cyc_time_t when; 2483 2484 CYC_PTRACE("omni-start", cpu, idp); 2485 ASSERT(MUTEX_HELD(&cpu_lock)); 2486 ASSERT(cpu->cyp_state == CYS_ONLINE); 2487 ASSERT(idp->cyi_cpu == NULL); 2488 2489 hdlr.cyh_func = NULL; 2490 hdlr.cyh_arg = NULL; 2491 hdlr.cyh_level = CY_LEVELS; 2492 2493 when.cyt_when = 0; 2494 when.cyt_interval = 0; 2495 2496 omni->cyo_online(omni->cyo_arg, cpu->cyp_cpu, &hdlr, &when); 2497 2498 ASSERT(hdlr.cyh_func != NULL); 2499 ASSERT(hdlr.cyh_level < CY_LEVELS); 2500 ASSERT(when.cyt_when >= 0 && when.cyt_interval > 0); 2501 2502 ocpu->cyo_cpu = cpu; 2503 ocpu->cyo_arg = hdlr.cyh_arg; 2504 ocpu->cyo_ndx = cyclic_add_here(cpu, &hdlr, &when, 0); 2505 ocpu->cyo_next = idp->cyi_omni_list; 2506 idp->cyi_omni_list = ocpu; 2507 } 2508 2509 static void 2510 cyclic_omni_stop(cyc_id_t *idp, cyc_cpu_t *cpu) 2511 { 2512 cyc_omni_handler_t *omni = &idp->cyi_omni_hdlr; 2513 cyc_omni_cpu_t *ocpu = idp->cyi_omni_list, *prev = NULL; 2514 clock_t delay; 2515 int ret; 2516 2517 CYC_PTRACE("omni-stop", cpu, idp); 2518 ASSERT(MUTEX_HELD(&cpu_lock)); 2519 ASSERT(cpu->cyp_state == CYS_ONLINE); 2520 ASSERT(idp->cyi_cpu == NULL); 2521 ASSERT(ocpu != NULL); 2522 2523 /* 2524 * Prevent a reprogram of this cyclic while we are removing it. 2525 * Otherwise, cyclic_reprogram_here() will end up sending an X-call 2526 * to the offlined CPU. 2527 */ 2528 rw_enter(&idp->cyi_lock, RW_WRITER); 2529 2530 while (ocpu != NULL && ocpu->cyo_cpu != cpu) { 2531 prev = ocpu; 2532 ocpu = ocpu->cyo_next; 2533 } 2534 2535 /* 2536 * We _must_ have found an cyc_omni_cpu which corresponds to this 2537 * CPU -- the definition of an omnipresent cyclic is that it runs 2538 * on all online CPUs. 2539 */ 2540 ASSERT(ocpu != NULL); 2541 2542 if (prev == NULL) { 2543 idp->cyi_omni_list = ocpu->cyo_next; 2544 } else { 2545 prev->cyo_next = ocpu->cyo_next; 2546 } 2547 2548 /* 2549 * Remove the cyclic from the source. We cannot block during this 2550 * operation because we are holding the cyi_lock which can be held 2551 * by the cyclic handler via cyclic_reprogram(). 2552 * 2553 * If we cannot remove the cyclic without waiting, we spin for a time, 2554 * and reattempt the (non-blocking) removal. If the handler is blocked 2555 * on the cyi_lock, then we let go of it in the spin loop to give 2556 * the handler a chance to run. Note that the removal will ultimately 2557 * succeed -- even if the cyclic handler is blocked on a resource 2558 * held by a thread which we have preempted, priority inheritance 2559 * assures that the preempted thread will preempt us and continue 2560 * to progress. 2561 */ 2562 for (delay = 1; ; delay <<= 1) { 2563 /* 2564 * Before we begin this operation, disable kernel preemption. 2565 */ 2566 kpreempt_disable(); 2567 ret = cyclic_remove_here(ocpu->cyo_cpu, ocpu->cyo_ndx, NULL, 2568 CY_NOWAIT); 2569 /* 2570 * Enable kernel preemption while spinning. 2571 */ 2572 kpreempt_enable(); 2573 2574 if (ret) 2575 break; 2576 2577 CYC_PTRACE("remove-omni-retry", idp, ocpu->cyo_cpu); 2578 2579 /* 2580 * Drop the RW lock to avoid a deadlock with the cyclic 2581 * handler (because it can potentially call cyclic_reprogram(). 2582 */ 2583 rw_exit(&idp->cyi_lock); 2584 drv_usecwait(delay); 2585 rw_enter(&idp->cyi_lock, RW_WRITER); 2586 } 2587 2588 /* 2589 * Now that we have successfully removed the cyclic, allow the omni 2590 * cyclic to be reprogrammed on other CPUs. 2591 */ 2592 rw_exit(&idp->cyi_lock); 2593 2594 /* 2595 * The cyclic has been removed from this CPU; time to call the 2596 * omnipresent offline handler. 2597 */ 2598 if (omni->cyo_offline != NULL) 2599 omni->cyo_offline(omni->cyo_arg, cpu->cyp_cpu, ocpu->cyo_arg); 2600 2601 kmem_free(ocpu, sizeof (cyc_omni_cpu_t)); 2602 } 2603 2604 static cyc_id_t * 2605 cyclic_new_id() 2606 { 2607 cyc_id_t *idp; 2608 2609 ASSERT(MUTEX_HELD(&cpu_lock)); 2610 2611 idp = kmem_cache_alloc(cyclic_id_cache, KM_SLEEP); 2612 2613 /* 2614 * The cyi_cpu field of the cyc_id_t structure tracks the CPU 2615 * associated with the cyclic. If and only if this field is NULL, the 2616 * cyc_id_t is an omnipresent cyclic. Note that cyi_omni_list may be 2617 * NULL for an omnipresent cyclic while the cyclic is being created 2618 * or destroyed. 2619 */ 2620 idp->cyi_cpu = NULL; 2621 idp->cyi_ndx = 0; 2622 rw_init(&idp->cyi_lock, NULL, RW_DEFAULT, NULL); 2623 2624 idp->cyi_next = cyclic_id_head; 2625 idp->cyi_prev = NULL; 2626 idp->cyi_omni_list = NULL; 2627 2628 if (cyclic_id_head != NULL) { 2629 ASSERT(cyclic_id_head->cyi_prev == NULL); 2630 cyclic_id_head->cyi_prev = idp; 2631 } 2632 2633 cyclic_id_head = idp; 2634 2635 return (idp); 2636 } 2637 2638 /* 2639 * cyclic_id_t cyclic_add(cyc_handler_t *, cyc_time_t *) 2640 * 2641 * Overview 2642 * 2643 * cyclic_add() will create an unbound cyclic with the specified handler and 2644 * interval. The cyclic will run on a CPU which both has interrupts enabled 2645 * and is in the system CPU partition. 2646 * 2647 * Arguments and notes 2648 * 2649 * As its first argument, cyclic_add() takes a cyc_handler, which has the 2650 * following members: 2651 * 2652 * cyc_func_t cyh_func <-- Cyclic handler 2653 * void *cyh_arg <-- Argument to cyclic handler 2654 * cyc_level_t cyh_level <-- Level at which to fire; must be one of 2655 * CY_LOW_LEVEL, CY_LOCK_LEVEL or CY_HIGH_LEVEL 2656 * 2657 * Note that cyh_level is _not_ an ipl or spl; it must be one the 2658 * CY_*_LEVELs. This layer of abstraction allows the platform to define 2659 * the precise interrupt priority levels, within the following constraints: 2660 * 2661 * CY_LOCK_LEVEL must map to LOCK_LEVEL 2662 * CY_HIGH_LEVEL must map to an ipl greater than LOCK_LEVEL 2663 * CY_LOW_LEVEL must map to an ipl below LOCK_LEVEL 2664 * 2665 * In addition to a cyc_handler, cyclic_add() takes a cyc_time, which 2666 * has the following members: 2667 * 2668 * hrtime_t cyt_when <-- Absolute time, in nanoseconds since boot, at 2669 * which to start firing 2670 * hrtime_t cyt_interval <-- Length of interval, in nanoseconds 2671 * 2672 * gethrtime() is the time source for nanoseconds since boot. If cyt_when 2673 * is set to 0, the cyclic will start to fire when cyt_interval next 2674 * divides the number of nanoseconds since boot. 2675 * 2676 * The cyt_interval field _must_ be filled in by the caller; one-shots are 2677 * _not_ explicitly supported by the cyclic subsystem (cyclic_add() will 2678 * assert that cyt_interval is non-zero). The maximum value for either 2679 * field is INT64_MAX; the caller is responsible for assuring that 2680 * cyt_when + cyt_interval <= INT64_MAX. Neither field may be negative. 2681 * 2682 * For an arbitrary time t in the future, the cyclic handler is guaranteed 2683 * to have been called (t - cyt_when) / cyt_interval times. This will 2684 * be true even if interrupts have been disabled for periods greater than 2685 * cyt_interval nanoseconds. In order to compensate for such periods, 2686 * the cyclic handler may be called a finite number of times with an 2687 * arbitrarily small interval. 2688 * 2689 * The cyclic subsystem will not enforce any lower bound on the interval; 2690 * if the interval is less than the time required to process an interrupt, 2691 * the CPU will wedge. It's the responsibility of the caller to assure that 2692 * either the value of the interval is sane, or that its caller has 2693 * sufficient privilege to deny service (i.e. its caller is root). 2694 * 2695 * The cyclic handler is guaranteed to be single threaded, even while the 2696 * cyclic is being juggled between CPUs (see cyclic_juggle(), below). 2697 * That is, a given cyclic handler will never be executed simultaneously 2698 * on different CPUs. 2699 * 2700 * Return value 2701 * 2702 * cyclic_add() returns a cyclic_id_t, which is guaranteed to be a value 2703 * other than CYCLIC_NONE. cyclic_add() cannot fail. 2704 * 2705 * Caller's context 2706 * 2707 * cpu_lock must be held by the caller, and the caller must not be in 2708 * interrupt context. cyclic_add() will perform a KM_SLEEP kernel 2709 * memory allocation, so the usual rules (e.g. p_lock cannot be held) 2710 * apply. A cyclic may be added even in the presence of CPUs that have 2711 * not been configured with respect to the cyclic subsystem, but only 2712 * configured CPUs will be eligible to run the new cyclic. 2713 * 2714 * Cyclic handler's context 2715 * 2716 * Cyclic handlers will be executed in the interrupt context corresponding 2717 * to the specified level (i.e. either high, lock or low level). The 2718 * usual context rules apply. 2719 * 2720 * A cyclic handler may not grab ANY locks held by the caller of any of 2721 * cyclic_add(), cyclic_remove() or cyclic_bind(); the implementation of 2722 * these functions may require blocking on cyclic handler completion. 2723 * Moreover, cyclic handlers may not make any call back into the cyclic 2724 * subsystem. 2725 */ 2726 cyclic_id_t 2727 cyclic_add(cyc_handler_t *hdlr, cyc_time_t *when) 2728 { 2729 cyc_id_t *idp = cyclic_new_id(); 2730 2731 ASSERT(MUTEX_HELD(&cpu_lock)); 2732 ASSERT(when->cyt_when >= 0 && when->cyt_interval > 0); 2733 2734 idp->cyi_cpu = cyclic_pick_cpu(NULL, NULL, NULL, 0); 2735 idp->cyi_ndx = cyclic_add_here(idp->cyi_cpu, hdlr, when, 0); 2736 2737 return ((uintptr_t)idp); 2738 } 2739 2740 /* 2741 * cyclic_id_t cyclic_add_omni(cyc_omni_handler_t *) 2742 * 2743 * Overview 2744 * 2745 * cyclic_add_omni() will create an omnipresent cyclic with the specified 2746 * online and offline handlers. Omnipresent cyclics run on all online 2747 * CPUs, including CPUs which have unbound interrupts disabled. 2748 * 2749 * Arguments 2750 * 2751 * As its only argument, cyclic_add_omni() takes a cyc_omni_handler, which 2752 * has the following members: 2753 * 2754 * void (*cyo_online)() <-- Online handler 2755 * void (*cyo_offline)() <-- Offline handler 2756 * void *cyo_arg <-- Argument to be passed to on/offline handlers 2757 * 2758 * Online handler 2759 * 2760 * The cyo_online member is a pointer to a function which has the following 2761 * four arguments: 2762 * 2763 * void * <-- Argument (cyo_arg) 2764 * cpu_t * <-- Pointer to CPU about to be onlined 2765 * cyc_handler_t * <-- Pointer to cyc_handler_t; must be filled in 2766 * by omni online handler 2767 * cyc_time_t * <-- Pointer to cyc_time_t; must be filled in by 2768 * omni online handler 2769 * 2770 * The omni cyclic online handler is always called _before_ the omni 2771 * cyclic begins to fire on the specified CPU. As the above argument 2772 * description implies, the online handler must fill in the two structures 2773 * passed to it: the cyc_handler_t and the cyc_time_t. These are the 2774 * same two structures passed to cyclic_add(), outlined above. This 2775 * allows the omni cyclic to have maximum flexibility; different CPUs may 2776 * optionally 2777 * 2778 * (a) have different intervals 2779 * (b) be explicitly in or out of phase with one another 2780 * (c) have different handlers 2781 * (d) have different handler arguments 2782 * (e) fire at different levels 2783 * 2784 * Of these, (e) seems somewhat dubious, but is nonetheless allowed. 2785 * 2786 * The omni online handler is called in the same context as cyclic_add(), 2787 * and has the same liberties: omni online handlers may perform KM_SLEEP 2788 * kernel memory allocations, and may grab locks which are also acquired 2789 * by cyclic handlers. However, omni cyclic online handlers may _not_ 2790 * call back into the cyclic subsystem, and should be generally careful 2791 * about calling into arbitrary kernel subsystems. 2792 * 2793 * Offline handler 2794 * 2795 * The cyo_offline member is a pointer to a function which has the following 2796 * three arguments: 2797 * 2798 * void * <-- Argument (cyo_arg) 2799 * cpu_t * <-- Pointer to CPU about to be offlined 2800 * void * <-- CPU's cyclic argument (that is, value 2801 * to which cyh_arg member of the cyc_handler_t 2802 * was set in the omni online handler) 2803 * 2804 * The omni cyclic offline handler is always called _after_ the omni 2805 * cyclic has ceased firing on the specified CPU. Its purpose is to 2806 * allow cleanup of any resources dynamically allocated in the omni cyclic 2807 * online handler. The context of the offline handler is identical to 2808 * that of the online handler; the same constraints and liberties apply. 2809 * 2810 * The offline handler is optional; it may be NULL. 2811 * 2812 * Return value 2813 * 2814 * cyclic_add_omni() returns a cyclic_id_t, which is guaranteed to be a 2815 * value other than CYCLIC_NONE. cyclic_add_omni() cannot fail. 2816 * 2817 * Caller's context 2818 * 2819 * The caller's context is identical to that of cyclic_add(), specified 2820 * above. 2821 */ 2822 cyclic_id_t 2823 cyclic_add_omni(cyc_omni_handler_t *omni) 2824 { 2825 cyc_id_t *idp = cyclic_new_id(); 2826 cyc_cpu_t *cpu; 2827 cpu_t *c; 2828 2829 ASSERT(MUTEX_HELD(&cpu_lock)); 2830 ASSERT(omni != NULL && omni->cyo_online != NULL); 2831 2832 idp->cyi_omni_hdlr = *omni; 2833 2834 c = cpu_list; 2835 do { 2836 if ((cpu = c->cpu_cyclic) == NULL) 2837 continue; 2838 2839 if (cpu->cyp_state != CYS_ONLINE) { 2840 ASSERT(cpu->cyp_state == CYS_OFFLINE); 2841 continue; 2842 } 2843 2844 cyclic_omni_start(idp, cpu); 2845 } while ((c = c->cpu_next) != cpu_list); 2846 2847 /* 2848 * We must have found at least one online CPU on which to run 2849 * this cyclic. 2850 */ 2851 ASSERT(idp->cyi_omni_list != NULL); 2852 ASSERT(idp->cyi_cpu == NULL); 2853 2854 return ((uintptr_t)idp); 2855 } 2856 2857 /* 2858 * void cyclic_remove(cyclic_id_t) 2859 * 2860 * Overview 2861 * 2862 * cyclic_remove() will remove the specified cyclic from the system. 2863 * 2864 * Arguments and notes 2865 * 2866 * The only argument is a cyclic_id returned from either cyclic_add() or 2867 * cyclic_add_omni(). 2868 * 2869 * By the time cyclic_remove() returns, the caller is guaranteed that the 2870 * removed cyclic handler has completed execution (this is the same 2871 * semantic that untimeout() provides). As a result, cyclic_remove() may 2872 * need to block, waiting for the removed cyclic to complete execution. 2873 * This leads to an important constraint on the caller: no lock may be 2874 * held across cyclic_remove() that also may be acquired by a cyclic 2875 * handler. 2876 * 2877 * Return value 2878 * 2879 * None; cyclic_remove() always succeeds. 2880 * 2881 * Caller's context 2882 * 2883 * cpu_lock must be held by the caller, and the caller must not be in 2884 * interrupt context. The caller may not hold any locks which are also 2885 * grabbed by any cyclic handler. See "Arguments and notes", above. 2886 */ 2887 void 2888 cyclic_remove(cyclic_id_t id) 2889 { 2890 cyc_id_t *idp = (cyc_id_t *)id; 2891 cyc_id_t *prev = idp->cyi_prev, *next = idp->cyi_next; 2892 cyc_cpu_t *cpu = idp->cyi_cpu; 2893 2894 CYC_PTRACE("remove", idp, idp->cyi_cpu); 2895 ASSERT(MUTEX_HELD(&cpu_lock)); 2896 2897 if (cpu != NULL) { 2898 (void) cyclic_remove_here(cpu, idp->cyi_ndx, NULL, CY_WAIT); 2899 } else { 2900 ASSERT(idp->cyi_omni_list != NULL); 2901 while (idp->cyi_omni_list != NULL) 2902 cyclic_omni_stop(idp, idp->cyi_omni_list->cyo_cpu); 2903 } 2904 2905 if (prev != NULL) { 2906 ASSERT(cyclic_id_head != idp); 2907 prev->cyi_next = next; 2908 } else { 2909 ASSERT(cyclic_id_head == idp); 2910 cyclic_id_head = next; 2911 } 2912 2913 if (next != NULL) 2914 next->cyi_prev = prev; 2915 2916 kmem_cache_free(cyclic_id_cache, idp); 2917 } 2918 2919 /* 2920 * void cyclic_bind(cyclic_id_t, cpu_t *, cpupart_t *) 2921 * 2922 * Overview 2923 * 2924 * cyclic_bind() atomically changes the CPU and CPU partition bindings 2925 * of a cyclic. 2926 * 2927 * Arguments and notes 2928 * 2929 * The first argument is a cyclic_id retuned from cyclic_add(). 2930 * cyclic_bind() may _not_ be called on a cyclic_id returned from 2931 * cyclic_add_omni(). 2932 * 2933 * The second argument specifies the CPU to which to bind the specified 2934 * cyclic. If the specified cyclic is bound to a CPU other than the one 2935 * specified, it will be unbound from its bound CPU. Unbinding the cyclic 2936 * from its CPU may cause it to be juggled to another CPU. If the specified 2937 * CPU is non-NULL, the cyclic will be subsequently rebound to the specified 2938 * CPU. 2939 * 2940 * If a CPU with bound cyclics is transitioned into the P_NOINTR state, 2941 * only cyclics not bound to the CPU can be juggled away; CPU-bound cyclics 2942 * will continue to fire on the P_NOINTR CPU. A CPU with bound cyclics 2943 * cannot be offlined (attempts to offline the CPU will return EBUSY). 2944 * Likewise, cyclics may not be bound to an offline CPU; if the caller 2945 * attempts to bind a cyclic to an offline CPU, the cyclic subsystem will 2946 * panic. 2947 * 2948 * The third argument specifies the CPU partition to which to bind the 2949 * specified cyclic. If the specified cyclic is bound to a CPU partition 2950 * other than the one specified, it will be unbound from its bound 2951 * partition. Unbinding the cyclic from its CPU partition may cause it 2952 * to be juggled to another CPU. If the specified CPU partition is 2953 * non-NULL, the cyclic will be subsequently rebound to the specified CPU 2954 * partition. 2955 * 2956 * It is the caller's responsibility to assure that the specified CPU 2957 * partition contains a CPU. If it does not, the cyclic subsystem will 2958 * panic. A CPU partition with bound cyclics cannot be destroyed (attempts 2959 * to destroy the partition will return EBUSY). If a CPU with 2960 * partition-bound cyclics is transitioned into the P_NOINTR state, cyclics 2961 * bound to the CPU's partition (but not bound to the CPU) will be juggled 2962 * away only if there exists another CPU in the partition in the P_ONLINE 2963 * state. 2964 * 2965 * It is the caller's responsibility to assure that the specified CPU and 2966 * CPU partition are self-consistent. If both parameters are non-NULL, 2967 * and the specified CPU partition does not contain the specified CPU, the 2968 * cyclic subsystem will panic. 2969 * 2970 * It is the caller's responsibility to assure that the specified CPU has 2971 * been configured with respect to the cyclic subsystem. Generally, this 2972 * is always true for valid, on-line CPUs. The only periods of time during 2973 * which this may not be true are during MP boot (i.e. after cyclic_init() 2974 * is called but before cyclic_mp_init() is called) or during dynamic 2975 * reconfiguration; cyclic_bind() should only be called with great care 2976 * from these contexts. 2977 * 2978 * Return value 2979 * 2980 * None; cyclic_bind() always succeeds. 2981 * 2982 * Caller's context 2983 * 2984 * cpu_lock must be held by the caller, and the caller must not be in 2985 * interrupt context. The caller may not hold any locks which are also 2986 * grabbed by any cyclic handler. 2987 */ 2988 void 2989 cyclic_bind(cyclic_id_t id, cpu_t *d, cpupart_t *part) 2990 { 2991 cyc_id_t *idp = (cyc_id_t *)id; 2992 cyc_cpu_t *cpu = idp->cyi_cpu; 2993 cpu_t *c; 2994 uint16_t flags; 2995 2996 CYC_PTRACE("bind", d, part); 2997 ASSERT(MUTEX_HELD(&cpu_lock)); 2998 ASSERT(part == NULL || d == NULL || d->cpu_part == part); 2999 3000 if (cpu == NULL) { 3001 ASSERT(idp->cyi_omni_list != NULL); 3002 panic("attempt to change binding of omnipresent cyclic"); 3003 } 3004 3005 c = cpu->cyp_cpu; 3006 flags = cpu->cyp_cyclics[idp->cyi_ndx].cy_flags; 3007 3008 if (c != d && (flags & CYF_CPU_BOUND)) 3009 cyclic_unbind_cpu(id); 3010 3011 /* 3012 * Reload our cpu (we may have migrated). We don't have to reload 3013 * the flags field here; if we were CYF_PART_BOUND on entry, we are 3014 * CYF_PART_BOUND now. 3015 */ 3016 cpu = idp->cyi_cpu; 3017 c = cpu->cyp_cpu; 3018 3019 if (part != c->cpu_part && (flags & CYF_PART_BOUND)) 3020 cyclic_unbind_cpupart(id); 3021 3022 /* 3023 * Now reload the flags field, asserting that if we are CPU bound, 3024 * the CPU was specified (and likewise, if we are partition bound, 3025 * the partition was specified). 3026 */ 3027 cpu = idp->cyi_cpu; 3028 c = cpu->cyp_cpu; 3029 flags = cpu->cyp_cyclics[idp->cyi_ndx].cy_flags; 3030 ASSERT(!(flags & CYF_CPU_BOUND) || c == d); 3031 ASSERT(!(flags & CYF_PART_BOUND) || c->cpu_part == part); 3032 3033 if (!(flags & CYF_CPU_BOUND) && d != NULL) 3034 cyclic_bind_cpu(id, d); 3035 3036 if (!(flags & CYF_PART_BOUND) && part != NULL) 3037 cyclic_bind_cpupart(id, part); 3038 } 3039 3040 int 3041 cyclic_reprogram(cyclic_id_t id, hrtime_t expiration) 3042 { 3043 cyc_id_t *idp = (cyc_id_t *)id; 3044 cyc_cpu_t *cpu; 3045 cyc_omni_cpu_t *ocpu; 3046 cyc_index_t ndx; 3047 3048 ASSERT(expiration > 0); 3049 3050 CYC_PTRACE("reprog", idp, idp->cyi_cpu); 3051 3052 kpreempt_disable(); 3053 3054 /* 3055 * Prevent the cyclic from moving or disappearing while we reprogram. 3056 */ 3057 rw_enter(&idp->cyi_lock, RW_READER); 3058 3059 if (idp->cyi_cpu == NULL) { 3060 ASSERT(curthread->t_preempt > 0); 3061 cpu = CPU->cpu_cyclic; 3062 3063 /* 3064 * For an omni cyclic, we reprogram the cyclic corresponding 3065 * to the current CPU. Look for it in the list. 3066 */ 3067 ocpu = idp->cyi_omni_list; 3068 while (ocpu != NULL) { 3069 if (ocpu->cyo_cpu == cpu) 3070 break; 3071 ocpu = ocpu->cyo_next; 3072 } 3073 3074 if (ocpu == NULL) { 3075 /* 3076 * Didn't find it. This means that CPU offline 3077 * must have removed it racing with us. So, 3078 * nothing to do. 3079 */ 3080 rw_exit(&idp->cyi_lock); 3081 3082 kpreempt_enable(); 3083 3084 return (0); 3085 } 3086 ndx = ocpu->cyo_ndx; 3087 } else { 3088 cpu = idp->cyi_cpu; 3089 ndx = idp->cyi_ndx; 3090 } 3091 3092 if (cpu->cyp_cpu == CPU) 3093 cyclic_reprogram_cyclic(cpu, ndx, expiration); 3094 else 3095 cyclic_reprogram_here(cpu, ndx, expiration); 3096 3097 /* 3098 * Allow the cyclic to be moved or removed. 3099 */ 3100 rw_exit(&idp->cyi_lock); 3101 3102 kpreempt_enable(); 3103 3104 return (1); 3105 } 3106 3107 hrtime_t 3108 cyclic_getres() 3109 { 3110 return (cyclic_resolution); 3111 } 3112 3113 void 3114 cyclic_init(cyc_backend_t *be, hrtime_t resolution) 3115 { 3116 ASSERT(MUTEX_HELD(&cpu_lock)); 3117 3118 CYC_PTRACE("init", be, resolution); 3119 cyclic_resolution = resolution; 3120 3121 /* 3122 * Copy the passed cyc_backend into the backend template. This must 3123 * be done before the CPU can be configured. 3124 */ 3125 bcopy(be, &cyclic_backend, sizeof (cyc_backend_t)); 3126 3127 /* 3128 * It's safe to look at the "CPU" pointer without disabling kernel 3129 * preemption; cyclic_init() is called only during startup by the 3130 * cyclic backend. 3131 */ 3132 cyclic_configure(CPU); 3133 cyclic_online(CPU); 3134 } 3135 3136 /* 3137 * It is assumed that cyclic_mp_init() is called some time after cyclic 3138 * init (and therefore, after cpu0 has been initialized). We grab cpu_lock, 3139 * find the already initialized CPU, and initialize every other CPU with the 3140 * same backend. Finally, we register a cpu_setup function. 3141 */ 3142 void 3143 cyclic_mp_init() 3144 { 3145 cpu_t *c; 3146 3147 mutex_enter(&cpu_lock); 3148 3149 c = cpu_list; 3150 do { 3151 if (c->cpu_cyclic == NULL) { 3152 cyclic_configure(c); 3153 cyclic_online(c); 3154 } 3155 } while ((c = c->cpu_next) != cpu_list); 3156 3157 register_cpu_setup_func((cpu_setup_func_t *)cyclic_cpu_setup, NULL); 3158 mutex_exit(&cpu_lock); 3159 } 3160 3161 /* 3162 * int cyclic_juggle(cpu_t *) 3163 * 3164 * Overview 3165 * 3166 * cyclic_juggle() juggles as many cyclics as possible away from the 3167 * specified CPU; all remaining cyclics on the CPU will either be CPU- 3168 * or partition-bound. 3169 * 3170 * Arguments and notes 3171 * 3172 * The only argument to cyclic_juggle() is the CPU from which cyclics 3173 * should be juggled. CPU-bound cyclics are never juggled; partition-bound 3174 * cyclics are only juggled if the specified CPU is in the P_NOINTR state 3175 * and there exists a P_ONLINE CPU in the partition. The cyclic subsystem 3176 * assures that a cyclic will never fire late or spuriously, even while 3177 * being juggled. 3178 * 3179 * Return value 3180 * 3181 * cyclic_juggle() returns a non-zero value if all cyclics were able to 3182 * be juggled away from the CPU, and zero if one or more cyclics could 3183 * not be juggled away. 3184 * 3185 * Caller's context 3186 * 3187 * cpu_lock must be held by the caller, and the caller must not be in 3188 * interrupt context. The caller may not hold any locks which are also 3189 * grabbed by any cyclic handler. While cyclic_juggle() _may_ be called 3190 * in any context satisfying these constraints, it _must_ be called 3191 * immediately after clearing CPU_ENABLE (i.e. before dropping cpu_lock). 3192 * Failure to do so could result in an assertion failure in the cyclic 3193 * subsystem. 3194 */ 3195 int 3196 cyclic_juggle(cpu_t *c) 3197 { 3198 cyc_cpu_t *cpu = c->cpu_cyclic; 3199 cyc_id_t *idp; 3200 int all_juggled = 1; 3201 3202 CYC_PTRACE1("juggle", c); 3203 ASSERT(MUTEX_HELD(&cpu_lock)); 3204 3205 /* 3206 * We'll go through each cyclic on the CPU, attempting to juggle 3207 * each one elsewhere. 3208 */ 3209 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3210 if (idp->cyi_cpu != cpu) 3211 continue; 3212 3213 if (cyclic_juggle_one(idp) == 0) { 3214 all_juggled = 0; 3215 continue; 3216 } 3217 3218 ASSERT(idp->cyi_cpu != cpu); 3219 } 3220 3221 return (all_juggled); 3222 } 3223 3224 /* 3225 * int cyclic_offline(cpu_t *) 3226 * 3227 * Overview 3228 * 3229 * cyclic_offline() offlines the cyclic subsystem on the specified CPU. 3230 * 3231 * Arguments and notes 3232 * 3233 * The only argument to cyclic_offline() is a CPU to offline. 3234 * cyclic_offline() will attempt to juggle cyclics away from the specified 3235 * CPU. 3236 * 3237 * Return value 3238 * 3239 * cyclic_offline() returns 1 if all cyclics on the CPU were juggled away 3240 * and the cyclic subsystem on the CPU was successfully offlines. 3241 * cyclic_offline returns 0 if some cyclics remain, blocking the cyclic 3242 * offline operation. All remaining cyclics on the CPU will either be 3243 * CPU- or partition-bound. 3244 * 3245 * See the "Arguments and notes" of cyclic_juggle(), below, for more detail 3246 * on cyclic juggling. 3247 * 3248 * Caller's context 3249 * 3250 * The only caller of cyclic_offline() should be the processor management 3251 * subsystem. It is expected that the caller of cyclic_offline() will 3252 * offline the CPU immediately after cyclic_offline() returns success (i.e. 3253 * before dropping cpu_lock). Moreover, it is expected that the caller will 3254 * fail the CPU offline operation if cyclic_offline() returns failure. 3255 */ 3256 int 3257 cyclic_offline(cpu_t *c) 3258 { 3259 cyc_cpu_t *cpu = c->cpu_cyclic; 3260 cyc_id_t *idp; 3261 3262 CYC_PTRACE1("offline", cpu); 3263 ASSERT(MUTEX_HELD(&cpu_lock)); 3264 3265 if (!cyclic_juggle(c)) 3266 return (0); 3267 3268 /* 3269 * This CPU is headed offline; we need to now stop omnipresent 3270 * cyclic firing on this CPU. 3271 */ 3272 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3273 if (idp->cyi_cpu != NULL) 3274 continue; 3275 3276 /* 3277 * We cannot possibly be offlining the last CPU; cyi_omni_list 3278 * must be non-NULL. 3279 */ 3280 ASSERT(idp->cyi_omni_list != NULL); 3281 cyclic_omni_stop(idp, cpu); 3282 } 3283 3284 ASSERT(cpu->cyp_state == CYS_ONLINE); 3285 cpu->cyp_state = CYS_OFFLINE; 3286 3287 return (1); 3288 } 3289 3290 /* 3291 * void cyclic_online(cpu_t *) 3292 * 3293 * Overview 3294 * 3295 * cyclic_online() onlines a CPU previously offlined with cyclic_offline(). 3296 * 3297 * Arguments and notes 3298 * 3299 * cyclic_online()'s only argument is a CPU to online. The specified 3300 * CPU must have been previously offlined with cyclic_offline(). After 3301 * cyclic_online() returns, the specified CPU will be eligible to execute 3302 * cyclics. 3303 * 3304 * Return value 3305 * 3306 * None; cyclic_online() always succeeds. 3307 * 3308 * Caller's context 3309 * 3310 * cyclic_online() should only be called by the processor management 3311 * subsystem; cpu_lock must be held. 3312 */ 3313 void 3314 cyclic_online(cpu_t *c) 3315 { 3316 cyc_cpu_t *cpu = c->cpu_cyclic; 3317 cyc_id_t *idp; 3318 3319 CYC_PTRACE1("online", cpu); 3320 ASSERT(c->cpu_flags & CPU_ENABLE); 3321 ASSERT(MUTEX_HELD(&cpu_lock)); 3322 ASSERT(cpu->cyp_state == CYS_OFFLINE); 3323 3324 cpu->cyp_state = CYS_ONLINE; 3325 3326 /* 3327 * Now that this CPU is open for business, we need to start firing 3328 * all omnipresent cyclics on it. 3329 */ 3330 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3331 if (idp->cyi_cpu != NULL) 3332 continue; 3333 3334 cyclic_omni_start(idp, cpu); 3335 } 3336 } 3337 3338 /* 3339 * void cyclic_move_in(cpu_t *) 3340 * 3341 * Overview 3342 * 3343 * cyclic_move_in() is called by the CPU partition code immediately after 3344 * the specified CPU has moved into a new partition. 3345 * 3346 * Arguments and notes 3347 * 3348 * The only argument to cyclic_move_in() is a CPU which has moved into a 3349 * new partition. If the specified CPU is P_ONLINE, and every other 3350 * CPU in the specified CPU's new partition is P_NOINTR, cyclic_move_in() 3351 * will juggle all partition-bound, CPU-unbound cyclics to the specified 3352 * CPU. 3353 * 3354 * Return value 3355 * 3356 * None; cyclic_move_in() always succeeds. 3357 * 3358 * Caller's context 3359 * 3360 * cyclic_move_in() should _only_ be called immediately after a CPU has 3361 * moved into a new partition, with cpu_lock held. As with other calls 3362 * into the cyclic subsystem, no lock may be held which is also grabbed 3363 * by any cyclic handler. 3364 */ 3365 void 3366 cyclic_move_in(cpu_t *d) 3367 { 3368 cyc_id_t *idp; 3369 cyc_cpu_t *dest = d->cpu_cyclic; 3370 cyclic_t *cyclic; 3371 cpupart_t *part = d->cpu_part; 3372 3373 CYC_PTRACE("move-in", dest, part); 3374 ASSERT(MUTEX_HELD(&cpu_lock)); 3375 3376 /* 3377 * Look for CYF_PART_BOUND cyclics in the new partition. If 3378 * we find one, check to see if it is currently on a CPU which has 3379 * interrupts disabled. If it is (and if this CPU currently has 3380 * interrupts enabled), we'll juggle those cyclics over here. 3381 */ 3382 if (!(d->cpu_flags & CPU_ENABLE)) { 3383 CYC_PTRACE1("move-in-none", dest); 3384 return; 3385 } 3386 3387 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3388 cyc_cpu_t *cpu = idp->cyi_cpu; 3389 cpu_t *c; 3390 3391 /* 3392 * Omnipresent cyclics are exempt from juggling. 3393 */ 3394 if (cpu == NULL) 3395 continue; 3396 3397 c = cpu->cyp_cpu; 3398 3399 if (c->cpu_part != part || (c->cpu_flags & CPU_ENABLE)) 3400 continue; 3401 3402 cyclic = &cpu->cyp_cyclics[idp->cyi_ndx]; 3403 3404 if (cyclic->cy_flags & CYF_CPU_BOUND) 3405 continue; 3406 3407 /* 3408 * We know that this cyclic is bound to its processor set 3409 * (otherwise, it would not be on a CPU with interrupts 3410 * disabled); juggle it to our CPU. 3411 */ 3412 ASSERT(cyclic->cy_flags & CYF_PART_BOUND); 3413 cyclic_juggle_one_to(idp, dest); 3414 } 3415 3416 CYC_PTRACE1("move-in-done", dest); 3417 } 3418 3419 /* 3420 * int cyclic_move_out(cpu_t *) 3421 * 3422 * Overview 3423 * 3424 * cyclic_move_out() is called by the CPU partition code immediately before 3425 * the specified CPU is to move out of its partition. 3426 * 3427 * Arguments and notes 3428 * 3429 * The only argument to cyclic_move_out() is a CPU which is to move out of 3430 * its partition. 3431 * 3432 * cyclic_move_out() will attempt to juggle away all partition-bound 3433 * cyclics. If the specified CPU is the last CPU in a partition with 3434 * partition-bound cyclics, cyclic_move_out() will fail. If there exists 3435 * a partition-bound cyclic which is CPU-bound to the specified CPU, 3436 * cyclic_move_out() will fail. 3437 * 3438 * Note that cyclic_move_out() will _only_ attempt to juggle away 3439 * partition-bound cyclics; CPU-bound cyclics which are not partition-bound 3440 * and unbound cyclics are not affected by changing the partition 3441 * affiliation of the CPU. 3442 * 3443 * Return value 3444 * 3445 * cyclic_move_out() returns 1 if all partition-bound cyclics on the CPU 3446 * were juggled away; 0 if some cyclics remain. 3447 * 3448 * Caller's context 3449 * 3450 * cyclic_move_out() should _only_ be called immediately before a CPU has 3451 * moved out of its partition, with cpu_lock held. It is expected that 3452 * the caller of cyclic_move_out() will change the processor set affiliation 3453 * of the specified CPU immediately after cyclic_move_out() returns 3454 * success (i.e. before dropping cpu_lock). Moreover, it is expected that 3455 * the caller will fail the CPU repartitioning operation if cyclic_move_out() 3456 * returns failure. As with other calls into the cyclic subsystem, no lock 3457 * may be held which is also grabbed by any cyclic handler. 3458 */ 3459 int 3460 cyclic_move_out(cpu_t *c) 3461 { 3462 cyc_id_t *idp; 3463 cyc_cpu_t *cpu = c->cpu_cyclic, *dest; 3464 cyclic_t *cyclic, *cyclics = cpu->cyp_cyclics; 3465 cpupart_t *part = c->cpu_part; 3466 3467 CYC_PTRACE1("move-out", cpu); 3468 ASSERT(MUTEX_HELD(&cpu_lock)); 3469 3470 /* 3471 * If there are any CYF_PART_BOUND cyclics on this CPU, we need 3472 * to try to juggle them away. 3473 */ 3474 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3475 3476 if (idp->cyi_cpu != cpu) 3477 continue; 3478 3479 cyclic = &cyclics[idp->cyi_ndx]; 3480 3481 if (!(cyclic->cy_flags & CYF_PART_BOUND)) 3482 continue; 3483 3484 dest = cyclic_pick_cpu(part, c, c, cyclic->cy_flags); 3485 3486 if (dest == NULL) { 3487 /* 3488 * We can't juggle this cyclic; we need to return 3489 * failure (we won't bother trying to juggle away 3490 * other cyclics). 3491 */ 3492 CYC_PTRACE("move-out-fail", cpu, idp); 3493 return (0); 3494 } 3495 cyclic_juggle_one_to(idp, dest); 3496 } 3497 3498 CYC_PTRACE1("move-out-done", cpu); 3499 return (1); 3500 } 3501 3502 /* 3503 * void cyclic_suspend() 3504 * 3505 * Overview 3506 * 3507 * cyclic_suspend() suspends all cyclic activity throughout the cyclic 3508 * subsystem. It should be called only by subsystems which are attempting 3509 * to suspend the entire system (e.g. checkpoint/resume, dynamic 3510 * reconfiguration). 3511 * 3512 * Arguments and notes 3513 * 3514 * cyclic_suspend() takes no arguments. Each CPU with an active cyclic 3515 * disables its backend (offline CPUs disable their backends as part of 3516 * the cyclic_offline() operation), thereby disabling future CY_HIGH_LEVEL 3517 * interrupts. 3518 * 3519 * Note that disabling CY_HIGH_LEVEL interrupts does not completely preclude 3520 * cyclic handlers from being called after cyclic_suspend() returns: if a 3521 * CY_LOCK_LEVEL or CY_LOW_LEVEL interrupt thread was blocked at the time 3522 * of cyclic_suspend(), cyclic handlers at its level may continue to be 3523 * called after the interrupt thread becomes unblocked. The 3524 * post-cyclic_suspend() activity is bounded by the pend count on all 3525 * cyclics at the time of cyclic_suspend(). Callers concerned with more 3526 * than simply disabling future CY_HIGH_LEVEL interrupts must check for 3527 * this condition. 3528 * 3529 * On most platforms, timestamps from gethrtime() and gethrestime() are not 3530 * guaranteed to monotonically increase between cyclic_suspend() and 3531 * cyclic_resume(). However, timestamps are guaranteed to monotonically 3532 * increase across the entire cyclic_suspend()/cyclic_resume() operation. 3533 * That is, every timestamp obtained before cyclic_suspend() will be less 3534 * than every timestamp obtained after cyclic_resume(). 3535 * 3536 * Return value 3537 * 3538 * None; cyclic_suspend() always succeeds. 3539 * 3540 * Caller's context 3541 * 3542 * The cyclic subsystem must be configured on every valid CPU; 3543 * cyclic_suspend() may not be called during boot or during dynamic 3544 * reconfiguration. Additionally, cpu_lock must be held, and the caller 3545 * cannot be in high-level interrupt context. However, unlike most other 3546 * cyclic entry points, cyclic_suspend() may be called with locks held 3547 * which are also acquired by CY_LOCK_LEVEL or CY_LOW_LEVEL cyclic 3548 * handlers. 3549 */ 3550 void 3551 cyclic_suspend() 3552 { 3553 cpu_t *c; 3554 cyc_cpu_t *cpu; 3555 cyc_xcallarg_t arg; 3556 cyc_backend_t *be; 3557 3558 CYC_PTRACE0("suspend"); 3559 ASSERT(MUTEX_HELD(&cpu_lock)); 3560 c = cpu_list; 3561 3562 do { 3563 cpu = c->cpu_cyclic; 3564 be = cpu->cyp_backend; 3565 arg.cyx_cpu = cpu; 3566 3567 be->cyb_xcall(be->cyb_arg, c, 3568 (cyc_func_t)cyclic_suspend_xcall, &arg); 3569 } while ((c = c->cpu_next) != cpu_list); 3570 } 3571 3572 /* 3573 * void cyclic_resume() 3574 * 3575 * cyclic_resume() resumes all cyclic activity throughout the cyclic 3576 * subsystem. It should be called only by system-suspending subsystems. 3577 * 3578 * Arguments and notes 3579 * 3580 * cyclic_resume() takes no arguments. Each CPU with an active cyclic 3581 * reenables and reprograms its backend (offline CPUs are not reenabled). 3582 * On most platforms, timestamps from gethrtime() and gethrestime() are not 3583 * guaranteed to monotonically increase between cyclic_suspend() and 3584 * cyclic_resume(). However, timestamps are guaranteed to monotonically 3585 * increase across the entire cyclic_suspend()/cyclic_resume() operation. 3586 * That is, every timestamp obtained before cyclic_suspend() will be less 3587 * than every timestamp obtained after cyclic_resume(). 3588 * 3589 * Return value 3590 * 3591 * None; cyclic_resume() always succeeds. 3592 * 3593 * Caller's context 3594 * 3595 * The cyclic subsystem must be configured on every valid CPU; 3596 * cyclic_resume() may not be called during boot or during dynamic 3597 * reconfiguration. Additionally, cpu_lock must be held, and the caller 3598 * cannot be in high-level interrupt context. However, unlike most other 3599 * cyclic entry points, cyclic_resume() may be called with locks held which 3600 * are also acquired by CY_LOCK_LEVEL or CY_LOW_LEVEL cyclic handlers. 3601 */ 3602 void 3603 cyclic_resume() 3604 { 3605 cpu_t *c; 3606 cyc_cpu_t *cpu; 3607 cyc_xcallarg_t arg; 3608 cyc_backend_t *be; 3609 3610 CYC_PTRACE0("resume"); 3611 ASSERT(MUTEX_HELD(&cpu_lock)); 3612 3613 c = cpu_list; 3614 3615 do { 3616 cpu = c->cpu_cyclic; 3617 be = cpu->cyp_backend; 3618 arg.cyx_cpu = cpu; 3619 3620 be->cyb_xcall(be->cyb_arg, c, 3621 (cyc_func_t)cyclic_resume_xcall, &arg); 3622 } while ((c = c->cpu_next) != cpu_list); 3623 } 3624