1 /* 2 * CDDL HEADER START 3 * 4 * The contents of this file are subject to the terms of the 5 * Common Development and Distribution License (the "License"). 6 * You may not use this file except in compliance with the License. 7 * 8 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE 9 * or http://www.opensolaris.org/os/licensing. 10 * See the License for the specific language governing permissions 11 * and limitations under the License. 12 * 13 * When distributing Covered Code, include this CDDL HEADER in each 14 * file and include the License file at usr/src/OPENSOLARIS.LICENSE. 15 * If applicable, add the following below this CDDL HEADER, with the 16 * fields enclosed by brackets "[]" replaced with your own identifying 17 * information: Portions Copyright [yyyy] [name of copyright owner] 18 * 19 * CDDL HEADER END 20 */ 21 /* 22 * Copyright 2008 Sun Microsystems, Inc. All rights reserved. 23 * Use is subject to license terms. 24 */ 25 26 /* 27 * Copyright (c) 2012, Joyent Inc. All rights reserved. 28 */ 29 30 /* 31 * The Cyclic Subsystem 32 * -------------------- 33 * 34 * Prehistory 35 * 36 * Historically, most computer architectures have specified interval-based 37 * timer parts (e.g. SPARCstation's counter/timer; Intel's i8254). While 38 * these parts deal in relative (i.e. not absolute) time values, they are 39 * typically used by the operating system to implement the abstraction of 40 * absolute time. As a result, these parts cannot typically be reprogrammed 41 * without introducing error in the system's notion of time. 42 * 43 * Starting in about 1994, chip architectures began specifying high resolution 44 * timestamp registers. As of this writing (1999), all major chip families 45 * (UltraSPARC, PentiumPro, MIPS, PowerPC, Alpha) have high resolution 46 * timestamp registers, and two (UltraSPARC and MIPS) have added the capacity 47 * to interrupt based on timestamp values. These timestamp-compare registers 48 * present a time-based interrupt source which can be reprogrammed arbitrarily 49 * often without introducing error. Given the low cost of implementing such a 50 * timestamp-compare register (and the tangible benefit of eliminating 51 * discrete timer parts), it is reasonable to expect that future chip 52 * architectures will adopt this feature. 53 * 54 * The cyclic subsystem has been designed to take advantage of chip 55 * architectures with the capacity to interrupt based on absolute, high 56 * resolution values of time. 57 * 58 * Subsystem Overview 59 * 60 * The cyclic subsystem is a low-level kernel subsystem designed to provide 61 * arbitrarily high resolution, per-CPU interval timers (to avoid colliding 62 * with existing terms, we dub such an interval timer a "cyclic"). Cyclics 63 * can be specified to fire at high, lock or low interrupt level, and may be 64 * optionally bound to a CPU or a CPU partition. A cyclic's CPU or CPU 65 * partition binding may be changed dynamically; the cyclic will be "juggled" 66 * to a CPU which satisfies the new binding. Alternatively, a cyclic may 67 * be specified to be "omnipresent", denoting firing on all online CPUs. 68 * 69 * Cyclic Subsystem Interface Overview 70 * ----------------------------------- 71 * 72 * The cyclic subsystem has interfaces with the kernel at-large, with other 73 * kernel subsystems (e.g. the processor management subsystem, the checkpoint 74 * resume subsystem) and with the platform (the cyclic backend). 
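 *
 * To orient the reader before the synopses below, here is a brief sketch of
 * how a kernel at-large consumer might use the subsystem.  The structure
 * members (cyh_func, cyh_arg, cyh_level, cyt_when, cyt_interval) are the
 * ones manipulated later in this file; the handler and argument names are
 * purely illustrative, and <sys/cyclic.h> remains the authoritative source
 * for the prototypes:
 *
 *     cyc_handler_t hdlr;
 *     cyc_time_t when;
 *     cyclic_id_t id;
 *
 *     hdlr.cyh_func = my_handler;     <-- hypothetical consumer handler
 *     hdlr.cyh_arg = my_arg;          <-- hypothetical handler argument
 *     hdlr.cyh_level = CY_LOW_LEVEL;
 *
 *     when.cyt_when = 0;              <-- 0 means "start on the next
 *                                         interval boundary"
 *     when.cyt_interval = NANOSEC;    <-- fire once per second
 *
 *     mutex_enter(&cpu_lock);         <-- adds and removes serialize on
 *     id = cyclic_add(&hdlr, &when);      cpu_lock
 *     mutex_exit(&cpu_lock);
 *
 *     ...
 *
 *     mutex_enter(&cpu_lock);
 *     cyclic_remove(id);
 *     mutex_exit(&cpu_lock);
 *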
Each 75 * of these interfaces is given a brief synopsis here, and is described 76 * in full above the interface's implementation. 77 * 78 * The following diagram displays the cyclic subsystem's interfaces to 79 * other kernel components. The arrows denote a "calls" relationship, with 80 * the large arrow indicating the cyclic subsystem's consumer interface. 81 * Each arrow is labeled with the section in which the corresponding 82 * interface is described. 83 * 84 * Kernel at-large consumers 85 * -----------++------------ 86 * || 87 * || 88 * _||_ 89 * \ / 90 * \/ 91 * +---------------------+ 92 * | | 93 * | Cyclic subsystem |<----------- Other kernel subsystems 94 * | | 95 * +---------------------+ 96 * ^ | 97 * | | 98 * | | 99 * | v 100 * +---------------------+ 101 * | | 102 * | Cyclic backend | 103 * | (platform specific) | 104 * | | 105 * +---------------------+ 106 * 107 * 108 * Kernel At-Large Interfaces 109 * 110 * cyclic_add() <-- Creates a cyclic 111 * cyclic_add_omni() <-- Creates an omnipresent cyclic 112 * cyclic_remove() <-- Removes a cyclic 113 * cyclic_bind() <-- Change a cyclic's CPU or partition binding 114 * cyclic_reprogram() <-- Reprogram a cyclic's expiration 115 * 116 * Inter-subsystem Interfaces 117 * 118 * cyclic_juggle() <-- Juggles cyclics away from a CPU 119 * cyclic_offline() <-- Offlines cyclic operation on a CPU 120 * cyclic_online() <-- Reenables operation on an offlined CPU 121 * cyclic_move_in() <-- Notifies subsystem of change in CPU partition 122 * cyclic_move_out() <-- Notifies subsystem of change in CPU partition 123 * cyclic_suspend() <-- Suspends the cyclic subsystem on all CPUs 124 * cyclic_resume() <-- Resumes the cyclic subsystem on all CPUs 125 * 126 * Backend Interfaces 127 * 128 * cyclic_init() <-- Initializes the cyclic subsystem 129 * cyclic_fire() <-- CY_HIGH_LEVEL interrupt entry point 130 * cyclic_softint() <-- CY_LOCK/LOW_LEVEL soft interrupt entry point 131 * 132 * The backend-supplied interfaces (through the cyc_backend structure) are 133 * documented in detail in <sys/cyclic_impl.h> 134 * 135 * 136 * Cyclic Subsystem Implementation Overview 137 * ---------------------------------------- 138 * 139 * The cyclic subsystem is designed to minimize interference between cyclics 140 * on different CPUs. Thus, all of the cyclic subsystem's data structures 141 * hang off of a per-CPU structure, cyc_cpu. 142 * 143 * Each cyc_cpu has a power-of-two sized array of cyclic structures (the 144 * cyp_cyclics member of the cyc_cpu structure). If cyclic_add() is called 145 * and there does not exist a free slot in the cyp_cyclics array, the size of 146 * the array will be doubled. The array will never shrink. Cyclics are 147 * referred to by their index in the cyp_cyclics array, which is of type 148 * cyc_index_t. 149 * 150 * The cyclics are kept sorted by expiration time in the cyc_cpu's heap. The 151 * heap is keyed by cyclic expiration time, with parents expiring earlier 152 * than their children. 153 * 154 * Heap Management 155 * 156 * The heap is managed primarily by cyclic_fire(). Upon entry, cyclic_fire() 157 * compares the root cyclic's expiration time to the current time. If the 158 * expiration time is in the past, cyclic_expire() is called on the root 159 * cyclic. Upon return from cyclic_expire(), the cyclic's new expiration time 160 * is derived by adding its interval to its old expiration time, and a 161 * downheap operation is performed. 
After the downheap, cyclic_fire() 162 * examines the (potentially changed) root cyclic, repeating the 163 * cyclic_expire()/add interval/cyclic_downheap() sequence until the root 164 * cyclic has an expiration time in the future. This expiration time 165 * (guaranteed to be the earliest in the heap) is then communicated to the 166 * backend via cyb_reprogram. Optimal backends will next call cyclic_fire() 167 * shortly after the root cyclic's expiration time. 168 * 169 * To allow efficient, deterministic downheap operations, we implement the 170 * heap as an array (the cyp_heap member of the cyc_cpu structure), with each 171 * element containing an index into the CPU's cyp_cyclics array. 172 * 173 * The heap is laid out in the array according to the following: 174 * 175 * 1. The root of the heap is always in the 0th element of the heap array 176 * 2. The left and right children of the nth element are element 177 * (((n + 1) << 1) - 1) and element ((n + 1) << 1), respectively. 178 * 179 * This layout is standard (see, e.g., Cormen's "Algorithms"); the proof 180 * that these constraints correctly lay out a heap (or indeed, any binary 181 * tree) is trivial and left to the reader. 182 * 183 * To see the heap by example, assume our cyclics array has the following 184 * members (at time t): 185 * 186 * cy_handler cy_level cy_expire 187 * --------------------------------------------- 188 * [ 0] clock() LOCK t+10000000 189 * [ 1] deadman() HIGH t+1000000000 190 * [ 2] clock_highres_fire() LOW t+100 191 * [ 3] clock_highres_fire() LOW t+1000 192 * [ 4] clock_highres_fire() LOW t+500 193 * [ 5] (free) -- -- 194 * [ 6] (free) -- -- 195 * [ 7] (free) -- -- 196 * 197 * The heap array could be: 198 * 199 * [0] [1] [2] [3] [4] [5] [6] [7] 200 * +-----+-----+-----+-----+-----+-----+-----+-----+ 201 * | | | | | | | | | 202 * | 2 | 3 | 4 | 0 | 1 | x | x | x | 203 * | | | | | | | | | 204 * +-----+-----+-----+-----+-----+-----+-----+-----+ 205 * 206 * Graphically, this array corresponds to the following (excuse the ASCII art): 207 * 208 * 2 209 * | 210 * +------------------+------------------+ 211 * 3 4 212 * | 213 * +---------+--------+ 214 * 0 1 215 * 216 * Note that the heap is laid out by layer: all nodes at a given depth are 217 * stored in consecutive elements of the array. Moreover, layers of 218 * consecutive depths are in adjacent element ranges. This property 219 * guarantees high locality of reference during downheap operations. 220 * Specifically, we are guaranteed that we can downheap to a depth of 221 * 222 * lg (cache_line_size / sizeof (cyc_index_t)) 223 * 224 * nodes with at most one cache miss. On UltraSPARC (64 byte e-cache line 225 * size), this corresponds to a depth of four nodes. Thus, if there are 226 * fewer than sixteen cyclics in the heap, downheaps on UltraSPARC miss at 227 * most once in the e-cache. 228 * 229 * Downheaps are required to compare siblings as they proceed down the 230 * heap. For downheaps proceeding beyond the one-cache-miss depth, every 231 * access to a left child could potentially miss in the cache. However, 232 * if we assume 233 * 234 * (cache_line_size / sizeof (cyc_index_t)) > 2, 235 * 236 * then all siblings are guaranteed to be on the same cache line. Thus, the 237 * miss on the left child will guarantee a hit on the right child; downheaps 238 * will incur at most one cache miss per layer beyond the one-cache-miss 239 * depth. 
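 *
 * The index arithmetic implied by constraints (1) and (2) is what the
 * CYC_HEAP_PARENT(), CYC_HEAP_LEFT() and CYC_HEAP_RIGHT() macros used by
 * the heap code below must compute (they are not defined in this file); a
 * sketch consistent with the constraints above:
 *
 *     parent(n) = ((n + 1) >> 1) - 1
 *     left(n)   = ((n + 1) << 1) - 1
 *     right(n)  =  (n + 1) << 1
 *
 * In the example heap above, heap element [1] (cyclic 3) therefore has its
 * parent at heap element [0] (cyclic 2) and its children at heap elements
 * [3] and [4] (cyclics 0 and 1).  Returning to the cache behavior of this
 * layout:
 *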
 * The total number of cache misses for heap management during a
 * downheap operation is thus bounded by
 *
 *     lg (n) - lg (cache_line_size / sizeof (cyc_index_t))
 *
 * Traditional pointer-based heaps are implemented without regard to
 * locality.  Downheaps can thus incur two cache misses per layer (one for
 * each child), but at most one cache miss at the root.  This yields a bound
 * of
 *
 *     2 * lg (n) - 1
 *
 * on the total cache misses.
 *
 * This difference may seem theoretically trivial (the difference is, after
 * all, constant), but can become substantial in practice -- especially for
 * caches with very large cache lines and high miss penalties (e.g. TLBs).
 *
 * Heaps must always be full, balanced trees.  Heap management must therefore
 * track the next point-of-insertion into the heap.  In pointer-based heaps,
 * recomputing this point takes O(lg (n)).  Given the layout of the
 * array-based implementation, however, the next point-of-insertion is
 * always:
 *
 *     heap[number_of_elements]
 *
 * We exploit this property by implementing the free-list in the unused
 * heap elements.  Heap insertion, therefore, consists only of filling in
 * the cyclic at cyp_cyclics[cyp_heap[number_of_elements]], incrementing
 * the number of elements, and performing an upheap.  Heap deletion consists
 * of decrementing the number of elements, swapping the to-be-deleted element
 * with the element at cyp_heap[number_of_elements], and downheaping.
 *
 * Filling in more details in our earlier example:
 *
 *                                      +--- free list head
 *                                      |
 *                                      V
 *
 *    [0]   [1]   [2]   [3]   [4]   [5]   [6]   [7]
 *  +-----+-----+-----+-----+-----+-----+-----+-----+
 *  |     |     |     |     |     |     |     |     |
 *  |  2  |  3  |  4  |  0  |  1  |  5  |  6  |  7  |
 *  |     |     |     |     |     |     |     |     |
 *  +-----+-----+-----+-----+-----+-----+-----+-----+
 *
 * To insert into this heap, we would just need to fill in the cyclic at
 * cyp_cyclics[5], bump the number of elements (from 5 to 6) and perform
 * an upheap.
 *
 * If we wanted to remove, say, cyp_cyclics[3], we would first scan for it
 * in the cyp_heap, and discover it at cyp_heap[1].  We would then decrement
 * the number of elements (from 5 to 4), swap cyp_heap[1] with cyp_heap[4],
 * and perform a downheap from cyp_heap[1].  The linear scan is required
 * because the cyclic does not keep a backpointer into the heap.  This makes
 * heap manipulation (e.g. downheaps) faster at the expense of removal
 * operations.
 *
 * Expiry processing
 *
 * As alluded to above, cyclic_expire() is called by cyclic_fire() at
 * CY_HIGH_LEVEL to expire a cyclic.  Cyclic subsystem consumers are
 * guaranteed that for an arbitrary time t in the future, their cyclic
 * handler will have been called (t - cyt_when) / cyt_interval times.  Thus,
 * there must be a one-to-one mapping between a cyclic's expiration at
 * CY_HIGH_LEVEL and its execution at the desired level (either CY_HIGH_LEVEL,
 * CY_LOCK_LEVEL or CY_LOW_LEVEL).
 *
 * For CY_HIGH_LEVEL cyclics, this is trivial; cyclic_expire() simply needs
 * to call the handler.
 *
 * For CY_LOCK_LEVEL and CY_LOW_LEVEL cyclics, however, there exists a
 * potential disconnect: if the CPU is at an interrupt level less than
 * CY_HIGH_LEVEL but greater than the level of a cyclic for a period of
 * time longer than twice the cyclic's interval, the cyclic will be expired
 * twice before it can be handled.
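 *
 * To make this disconnect concrete (the numbers are purely illustrative):
 * consider a CY_LOW_LEVEL cyclic with a cyt_interval of 10ms on a CPU that
 * spends 35ms at CY_LOCK_LEVEL.  cyclic_fire() continues to run at
 * CY_HIGH_LEVEL during that window and will expire the cyclic three times
 * before the CY_LOW_LEVEL soft interrupt can run; to honor the
 * (t - cyt_when) / cyt_interval guarantee, the handler must nonetheless be
 * called three times once the CPU's interrupt level drops.
 *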
315 * 316 * To maintain the one-to-one mapping, we track the difference between the 317 * number of times a cyclic has been expired and the number of times it's 318 * been handled in a "pending count" (the cy_pend field of the cyclic 319 * structure). cyclic_expire() thus increments the cy_pend count for the 320 * expired cyclic and posts a soft interrupt at the desired level. In the 321 * cyclic subsystem's soft interrupt handler, cyclic_softint(), we repeatedly 322 * call the cyclic handler and decrement cy_pend until we have decremented 323 * cy_pend to zero. 324 * 325 * The Producer/Consumer Buffer 326 * 327 * If we wish to avoid a linear scan of the cyclics array at soft interrupt 328 * level, cyclic_softint() must be able to quickly determine which cyclics 329 * have a non-zero cy_pend count. We thus introduce a per-soft interrupt 330 * level producer/consumer buffer shared with CY_HIGH_LEVEL. These buffers 331 * are encapsulated in the cyc_pcbuffer structure, and, like cyp_heap, are 332 * implemented as cyc_index_t arrays (the cypc_buf member of the cyc_pcbuffer 333 * structure). 334 * 335 * The producer (cyclic_expire() running at CY_HIGH_LEVEL) enqueues a cyclic 336 * by storing the cyclic's index to cypc_buf[cypc_prodndx] and incrementing 337 * cypc_prodndx. The consumer (cyclic_softint() running at either 338 * CY_LOCK_LEVEL or CY_LOW_LEVEL) dequeues a cyclic by loading from 339 * cypc_buf[cypc_consndx] and bumping cypc_consndx. The buffer is empty when 340 * cypc_prodndx == cypc_consndx. 341 * 342 * To bound the size of the producer/consumer buffer, cyclic_expire() only 343 * enqueues a cyclic if its cy_pend was zero (if the cyclic's cy_pend is 344 * non-zero, cyclic_expire() only bumps cy_pend). Symmetrically, 345 * cyclic_softint() only consumes a cyclic after it has decremented the 346 * cy_pend count to zero. 347 * 348 * Returning to our example, here is what the CY_LOW_LEVEL producer/consumer 349 * buffer might look like: 350 * 351 * cypc_consndx ---+ +--- cypc_prodndx 352 * | | 353 * V V 354 * 355 * [0] [1] [2] [3] [4] [5] [6] [7] 356 * +-----+-----+-----+-----+-----+-----+-----+-----+ 357 * | | | | | | | | | 358 * | x | x | 3 | 2 | 4 | x | x | x | <== cypc_buf 359 * | | | . | . | . | | | | 360 * +-----+-----+- | -+- | -+- | -+-----+-----+-----+ 361 * | | | 362 * | | | cy_pend cy_handler 363 * | | | ------------------------- 364 * | | | [ 0] 1 clock() 365 * | | | [ 1] 0 deadman() 366 * | +---- | -------> [ 2] 3 clock_highres_fire() 367 * +---------- | -------> [ 3] 1 clock_highres_fire() 368 * +--------> [ 4] 1 clock_highres_fire() 369 * [ 5] - (free) 370 * [ 6] - (free) 371 * [ 7] - (free) 372 * 373 * In particular, note that clock()'s cy_pend is 1 but that it is _not_ in 374 * this producer/consumer buffer; it would be enqueued in the CY_LOCK_LEVEL 375 * producer/consumer buffer. 376 * 377 * Locking 378 * 379 * Traditionally, access to per-CPU data structures shared between 380 * interrupt levels is serialized by manipulating programmable interrupt 381 * level: readers and writers are required to raise their interrupt level 382 * to that of the highest level writer. 
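 *
 * Before describing how the producer/consumer buffers avoid that
 * discipline, it may help to make the ring mechanics concrete.  The
 * fragments below are simplified from cyclic_expire() and cyclic_softint()
 * later in this file (cypc_sizemask is the power-of-two buffer size minus
 * one, so the masked index simply wraps around the ring); they are
 * illustrative, not the implementation:
 *
 *     producer, at CY_HIGH_LEVEL, only if cy_pend was zero:
 *
 *         pc->cypc_buf[pc->cypc_prodndx++ & pc->cypc_sizemask] = ndx;
 *
 *     consumer, at CY_LOCK_LEVEL or CY_LOW_LEVEL:
 *
 *         while (consndx != pc->cypc_prodndx) {
 *             ndx = pc->cypc_buf[consndx & pc->cypc_sizemask];
 *             ... call the handler, decrement cy_pend to zero ...
 *             pc->cypc_consndx = ++consndx;
 *         }
 *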
383 * 384 * For the producer/consumer buffers (shared between cyclic_fire()/ 385 * cyclic_expire() executing at CY_HIGH_LEVEL and cyclic_softint() executing 386 * at one of CY_LOCK_LEVEL or CY_LOW_LEVEL), forcing cyclic_softint() to raise 387 * programmable interrupt level is undesirable: aside from the additional 388 * latency incurred by manipulating interrupt level in the hot cy_pend 389 * processing path, this would create the potential for soft level cy_pend 390 * processing to delay CY_HIGH_LEVEL firing and expiry processing. 391 * CY_LOCK/LOW_LEVEL cyclics could thereby induce jitter in CY_HIGH_LEVEL 392 * cyclics. 393 * 394 * To minimize jitter, then, we would like the cyclic_fire()/cyclic_expire() 395 * and cyclic_softint() code paths to be lock-free. 396 * 397 * For cyclic_fire()/cyclic_expire(), lock-free execution is straightforward: 398 * because these routines execute at a higher interrupt level than 399 * cyclic_softint(), their actions on the producer/consumer buffer appear 400 * atomic. In particular, the increment of cy_pend appears to occur 401 * atomically with the increment of cypc_prodndx. 402 * 403 * For cyclic_softint(), however, lock-free execution requires more delicacy. 404 * When cyclic_softint() discovers a cyclic in the producer/consumer buffer, 405 * it calls the cyclic's handler and attempts to atomically decrement the 406 * cy_pend count with a compare&swap operation. 407 * 408 * If the compare&swap operation succeeds, cyclic_softint() behaves 409 * conditionally based on the value it atomically wrote to cy_pend: 410 * 411 * - If the cy_pend was decremented to 0, the cyclic has been consumed; 412 * cyclic_softint() increments the cypc_consndx and checks for more 413 * enqueued work. 414 * 415 * - If the count was decremented to a non-zero value, there is more work 416 * to be done on the cyclic; cyclic_softint() calls the cyclic handler 417 * and repeats the atomic decrement process. 418 * 419 * If the compare&swap operation fails, cyclic_softint() knows that 420 * cyclic_expire() has intervened and bumped the cy_pend count (resizes 421 * and removals complicate this, however -- see the sections on their 422 * operation, below). cyclic_softint() thus reloads cy_pend, and re-attempts 423 * the atomic decrement. 424 * 425 * Recall that we bound the size of the producer/consumer buffer by 426 * having cyclic_expire() only enqueue the specified cyclic if its 427 * cy_pend count is zero; this assures that each cyclic is enqueued at 428 * most once. This leads to a critical constraint on cyclic_softint(), 429 * however: after the compare&swap operation which successfully decrements 430 * cy_pend to zero, cyclic_softint() must _not_ re-examine the consumed 431 * cyclic. In part to obey this constraint, cyclic_softint() calls the 432 * cyclic handler before decrementing cy_pend. 433 * 434 * Resizing 435 * 436 * All of the discussion thus far has assumed a static number of cyclics. 437 * Obviously, static limitations are not practical; we need the capacity 438 * to resize our data structures dynamically. 439 * 440 * We resize our data structures lazily, and only on a per-CPU basis. 441 * The size of the data structures always doubles and never shrinks. We 442 * serialize adds (and thus resizes) on cpu_lock; we never need to deal 443 * with concurrent resizes. Resizes should be rare; they may induce jitter 444 * on the CPU being resized, but should not affect cyclic operation on other 445 * CPUs. Pending cyclics may not be dropped during a resize operation. 
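 *
 * Because the resize and removal protocols described below both hinge on
 * this compare&swap failing when cy_pend has been zeroed out from
 * underneath the consumer, it is worth making the consumer side concrete.
 * This is a simplified sketch of what cyclic_softint() does for each cyclic
 * it finds in the producer/consumer buffer; the real loop, below, also
 * copes with resizes and removals at the point where the sketch rereads
 * cy_pend:
 *
 *     do {
 *         (*handler)(arg);            <-- handler called before decrement
 *
 *         do {
 *             pend = cyclic->cy_pend; <-- reread after a failed cas
 *         } while (atomic_cas_32(&cyclic->cy_pend,
 *             pend, pend - 1) != pend);
 *     } while (pend - 1 > 0);         <-- more expirations still pending
 *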
446 * 447 * Three key cyc_cpu data structures need to be resized: the cyclics array, 448 * the heap array and the producer/consumer buffers. Resizing the first two 449 * is relatively straightforward: 450 * 451 * 1. The new, larger arrays are allocated in cyclic_expand() (called 452 * from cyclic_add()). 453 * 2. cyclic_expand() cross calls cyclic_expand_xcall() on the CPU 454 * undergoing the resize. 455 * 3. cyclic_expand_xcall() raises interrupt level to CY_HIGH_LEVEL 456 * 4. The contents of the old arrays are copied into the new arrays. 457 * 5. The old cyclics array is bzero()'d 458 * 6. The pointers are updated. 459 * 460 * The producer/consumer buffer is dicier: cyclic_expand_xcall() may have 461 * interrupted cyclic_softint() in the middle of consumption. To resize the 462 * producer/consumer buffer, we implement up to two buffers per soft interrupt 463 * level: a hard buffer (the buffer being produced into by cyclic_expire()) 464 * and a soft buffer (the buffer from which cyclic_softint() is consuming). 465 * During normal operation, the hard buffer and soft buffer point to the 466 * same underlying producer/consumer buffer. 467 * 468 * During a resize, however, cyclic_expand_xcall() changes the hard buffer 469 * to point to the new, larger producer/consumer buffer; all future 470 * cyclic_expire()'s will produce into the new buffer. cyclic_expand_xcall() 471 * then posts a CY_LOCK_LEVEL soft interrupt, landing in cyclic_softint(). 472 * 473 * As under normal operation, cyclic_softint() will consume cyclics from 474 * its soft buffer. After the soft buffer is drained, however, 475 * cyclic_softint() will see that the hard buffer has changed. At that time, 476 * cyclic_softint() will change its soft buffer to point to the hard buffer, 477 * and repeat the producer/consumer buffer draining procedure. 478 * 479 * After the new buffer is drained, cyclic_softint() will determine if both 480 * soft levels have seen their new producer/consumer buffer. If both have, 481 * cyclic_softint() will post on the semaphore cyp_modify_wait. If not, a 482 * soft interrupt will be generated for the remaining level. 483 * 484 * cyclic_expand() blocks on the cyp_modify_wait semaphore (a semaphore is 485 * used instead of a condition variable because of the race between the 486 * sema_p() in cyclic_expand() and the sema_v() in cyclic_softint()). This 487 * allows cyclic_expand() to know when the resize operation is complete; 488 * all of the old buffers (the heap, the cyclics array and the producer/ 489 * consumer buffers) can be freed. 490 * 491 * A final caveat on resizing: we described step (5) in the 492 * cyclic_expand_xcall() procedure without providing any motivation. This 493 * step addresses the problem of a cyclic_softint() attempting to decrement 494 * a cy_pend count while interrupted by a cyclic_expand_xcall(). Because 495 * cyclic_softint() has already called the handler by the time cy_pend is 496 * decremented, we want to assure that it doesn't decrement a cy_pend 497 * count in the old cyclics array. By zeroing the old cyclics array in 498 * cyclic_expand_xcall(), we are zeroing out every cy_pend count; when 499 * cyclic_softint() attempts to compare&swap on the cy_pend count, it will 500 * fail and recognize that the count has been zeroed. cyclic_softint() will 501 * update its stale copy of the cyp_cyclics pointer, re-read the cy_pend 502 * count from the new cyclics array, and re-attempt the compare&swap. 503 * 504 * Removals 505 * 506 * Cyclic removals should be rare. 
To simplify the implementation (and to 507 * allow optimization for the cyclic_fire()/cyclic_expire()/cyclic_softint() 508 * path), we force removals and adds to serialize on cpu_lock. 509 * 510 * Cyclic removal is complicated by a guarantee made to the consumer of 511 * the cyclic subsystem: after cyclic_remove() returns, the cyclic handler 512 * has returned and will never again be called. 513 * 514 * Here is the procedure for cyclic removal: 515 * 516 * 1. cyclic_remove() calls cyclic_remove_xcall() on the CPU undergoing 517 * the removal. 518 * 2. cyclic_remove_xcall() raises interrupt level to CY_HIGH_LEVEL 519 * 3. The current expiration time for the removed cyclic is recorded. 520 * 4. If the cy_pend count on the removed cyclic is non-zero, it 521 * is copied into cyp_rpend and subsequently zeroed. 522 * 5. The cyclic is removed from the heap 523 * 6. If the root of the heap has changed, the backend is reprogrammed. 524 * 7. If the cy_pend count was non-zero cyclic_remove() blocks on the 525 * cyp_modify_wait semaphore. 526 * 527 * The motivation for step (3) is explained in "Juggling", below. 528 * 529 * The cy_pend count is decremented in cyclic_softint() after the cyclic 530 * handler returns. Thus, if we find a cy_pend count of zero in step 531 * (4), we know that cyclic_remove() doesn't need to block. 532 * 533 * If the cy_pend count is non-zero, however, we must block in cyclic_remove() 534 * until cyclic_softint() has finished calling the cyclic handler. To let 535 * cyclic_softint() know that this cyclic has been removed, we zero the 536 * cy_pend count. This will cause cyclic_softint()'s compare&swap to fail. 537 * When cyclic_softint() sees the zero cy_pend count, it knows that it's been 538 * caught during a resize (see "Resizing", above) or that the cyclic has been 539 * removed. In the latter case, it calls cyclic_remove_pend() to call the 540 * cyclic handler cyp_rpend - 1 times, and posts on cyp_modify_wait. 541 * 542 * Juggling 543 * 544 * At first glance, cyclic juggling seems to be a difficult problem. The 545 * subsystem must guarantee that a cyclic doesn't execute simultaneously on 546 * different CPUs, while also assuring that a cyclic fires exactly once 547 * per interval. We solve this problem by leveraging a property of the 548 * platform: gethrtime() is required to increase in lock-step across 549 * multiple CPUs. Therefore, to juggle a cyclic, we remove it from its 550 * CPU, recording its expiration time in the remove cross call (step (3) 551 * in "Removing", above). We then add the cyclic to the new CPU, explicitly 552 * setting its expiration time to the time recorded in the removal. This 553 * leverages the existing cyclic expiry processing, which will compensate 554 * for any time lost while juggling. 555 * 556 * Reprogramming 557 * 558 * Normally, after a cyclic fires, its next expiration is computed from 559 * the current time and the cyclic interval. But there are situations when 560 * the next expiration needs to be reprogrammed by the kernel subsystem that 561 * is using the cyclic. cyclic_reprogram() allows this to be done. This, 562 * unlike the other kernel at-large cyclic API functions, is permitted to 563 * be called from the cyclic handler. This is because it does not use the 564 * cpu_lock to serialize access. 565 * 566 * When cyclic_reprogram() is called for an omni-cyclic, the operation is 567 * applied to the omni-cyclic's component on the current CPU. 
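 *
 * Taken together with the CY_INFINITY convention described below, this is
 * enough to build a reusable "one-shot" timer.  A hedged sketch (it assumes
 * the cyclic_reprogram(id, expiration) form and a consumer-supplied
 * my_oneshot_handler(); the full description above cyclic_reprogram()'s
 * implementation, later in this file, remains authoritative):
 *
 *     add the cyclic (under cpu_lock) with:
 *
 *         hdlr.cyh_func = my_oneshot_handler;
 *         when.cyt_when = gethrtime() + first_delay;
 *         when.cyt_interval = CY_INFINITY;   <-- never refires on its own
 *
 *     then arm it again, from the handler or from elsewhere:
 *
 *         cyclic_reprogram(id, gethrtime() + next_delay);
 *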
 *
 * If a high-level cyclic handler reprograms its own cyclic, then
 * cyclic_fire() detects that and does not recompute the cyclic's next
 * expiration.  However, for a lock-level or a low-level cyclic, the
 * actual cyclic handler will execute at the lower PIL only after
 * cyclic_fire() is done with all expired cyclics.  To deal with this, such
 * cyclics can be specified with a special interval of CY_INFINITY
 * (INT64_MAX).  cyclic_fire() recognizes this special value and recomputes
 * the next expiration to CY_INFINITY.  This effectively moves the cyclic to
 * the bottom of the heap and prevents it from going off until its handler
 * has had a chance to reprogram it.  In fact, this is the way to create and
 * reuse "one-shot" timers in the context of the cyclic subsystem without
 * using cyclic_remove().
 *
 * Here is the procedure for cyclic reprogramming:
 *
 *    1.  cyclic_reprogram() calls cyclic_reprogram_xcall() on the CPU
 *        that houses the cyclic.
 *    2.  cyclic_reprogram_xcall() raises interrupt level to CY_HIGH_LEVEL
 *    3.  The cyclic is located in the cyclic heap.  The search for this is
 *        done from the bottom of the heap to the top as reprogrammable
 *        cyclics would be located closer to the bottom than the top.
 *    4.  The cyclic expiration is set and the cyclic is moved to its
 *        correct position in the heap (up or down depending on whether the
 *        new expiration is less than or greater than the old one).
 *    5.  If the cyclic move modified the root of the heap, the backend is
 *        reprogrammed.
 *
 * Reprogramming can be a frequent event (see the callout subsystem).  So,
 * the serialization used has to be efficient.  As with all other cyclic
 * operations, the interrupt level is raised during reprogramming.  Plus,
 * during reprogramming, the cyclic must not be juggled (regular cyclic)
 * or stopped (omni-cyclic).  The implementation defines a per-cyclic
 * reader-writer lock to accomplish this.  This lock is acquired in the
 * reader mode by cyclic_reprogram() and writer mode by cyclic_juggle() and
 * cyclic_omni_stop().  The reader-writer lock makes it efficient if
 * an omni-cyclic is reprogrammed on different CPUs frequently.
 *
 * Note that since the cpu_lock is not used during reprogramming, it is
 * the responsibility of the user of the reprogrammable cyclic to make sure
 * that the cyclic is not removed via cyclic_remove() during reprogramming.
 * This is not an unreasonable requirement as the user will typically have
 * some sort of synchronization for its cyclic-related activities.  This
 * little caveat exists because the cyclic ID is not really an ID.  It is
 * implemented as a pointer to a structure.
 */

#include <sys/cyclic_impl.h>
#include <sys/sysmacros.h>
#include <sys/systm.h>
#include <sys/atomic.h>
#include <sys/kmem.h>
#include <sys/cmn_err.h>
#include <sys/ddi.h>
#include <sys/sdt.h>

#ifdef CYCLIC_TRACE

/*
 * cyc_trace_enabled is for the benefit of kernel debuggers.
 */
int cyc_trace_enabled = 1;
static cyc_tracebuf_t cyc_ptrace;
static cyc_coverage_t cyc_coverage[CY_NCOVERAGE];

/*
 * Seen this anywhere?
634 */ 635 static uint_t 636 cyclic_coverage_hash(char *p) 637 { 638 unsigned int g; 639 uint_t hval; 640 641 hval = 0; 642 while (*p) { 643 hval = (hval << 4) + *p++; 644 if ((g = (hval & 0xf0000000)) != 0) 645 hval ^= g >> 24; 646 hval &= ~g; 647 } 648 return (hval); 649 } 650 651 static void 652 cyclic_coverage(char *why, int level, uint64_t arg0, uint64_t arg1) 653 { 654 uint_t ndx, orig; 655 656 for (ndx = orig = cyclic_coverage_hash(why) % CY_NCOVERAGE; ; ) { 657 if (cyc_coverage[ndx].cyv_why == why) 658 break; 659 660 if (cyc_coverage[ndx].cyv_why != NULL || 661 atomic_cas_ptr(&cyc_coverage[ndx].cyv_why, NULL, why) != 662 NULL) { 663 664 if (++ndx == CY_NCOVERAGE) 665 ndx = 0; 666 667 if (ndx == orig) 668 panic("too many cyclic coverage points"); 669 continue; 670 } 671 672 /* 673 * If we're here, we have successfully swung our guy into 674 * the position at "ndx". 675 */ 676 break; 677 } 678 679 if (level == CY_PASSIVE_LEVEL) 680 cyc_coverage[ndx].cyv_passive_count++; 681 else 682 cyc_coverage[ndx].cyv_count[level]++; 683 684 cyc_coverage[ndx].cyv_arg0 = arg0; 685 cyc_coverage[ndx].cyv_arg1 = arg1; 686 } 687 688 #define CYC_TRACE(cpu, level, why, arg0, arg1) \ 689 CYC_TRACE_IMPL(&cpu->cyp_trace[level], level, why, arg0, arg1) 690 691 #define CYC_PTRACE(why, arg0, arg1) \ 692 CYC_TRACE_IMPL(&cyc_ptrace, CY_PASSIVE_LEVEL, why, arg0, arg1) 693 694 #define CYC_TRACE_IMPL(buf, level, why, a0, a1) { \ 695 if (panicstr == NULL) { \ 696 int _ndx = (buf)->cyt_ndx; \ 697 cyc_tracerec_t *_rec = &(buf)->cyt_buf[_ndx]; \ 698 (buf)->cyt_ndx = (++_ndx == CY_NTRACEREC) ? 0 : _ndx; \ 699 _rec->cyt_tstamp = gethrtime_unscaled(); \ 700 _rec->cyt_why = (why); \ 701 _rec->cyt_arg0 = (uint64_t)(uintptr_t)(a0); \ 702 _rec->cyt_arg1 = (uint64_t)(uintptr_t)(a1); \ 703 cyclic_coverage(why, level, \ 704 (uint64_t)(uintptr_t)(a0), (uint64_t)(uintptr_t)(a1)); \ 705 } \ 706 } 707 708 #else 709 710 static int cyc_trace_enabled = 0; 711 712 #define CYC_TRACE(cpu, level, why, arg0, arg1) 713 #define CYC_PTRACE(why, arg0, arg1) 714 715 #endif 716 717 #define CYC_TRACE0(cpu, level, why) CYC_TRACE(cpu, level, why, 0, 0) 718 #define CYC_TRACE1(cpu, level, why, arg0) CYC_TRACE(cpu, level, why, arg0, 0) 719 720 #define CYC_PTRACE0(why) CYC_PTRACE(why, 0, 0) 721 #define CYC_PTRACE1(why, arg0) CYC_PTRACE(why, arg0, 0) 722 723 static kmem_cache_t *cyclic_id_cache; 724 static cyc_id_t *cyclic_id_head; 725 static hrtime_t cyclic_resolution; 726 static cyc_backend_t cyclic_backend; 727 728 /* 729 * Returns 1 if the upheap propagated to the root, 0 if it did not. This 730 * allows the caller to reprogram the backend only when the root has been 731 * modified. 732 */ 733 static int 734 cyclic_upheap(cyc_cpu_t *cpu, cyc_index_t ndx) 735 { 736 cyclic_t *cyclics; 737 cyc_index_t *heap; 738 cyc_index_t heap_parent, heap_current = ndx; 739 cyc_index_t parent, current; 740 741 if (heap_current == 0) 742 return (1); 743 744 heap = cpu->cyp_heap; 745 cyclics = cpu->cyp_cyclics; 746 heap_parent = CYC_HEAP_PARENT(heap_current); 747 748 for (;;) { 749 current = heap[heap_current]; 750 parent = heap[heap_parent]; 751 752 /* 753 * We have an expiration time later than our parent; we're 754 * done. 755 */ 756 if (cyclics[current].cy_expire >= cyclics[parent].cy_expire) 757 return (0); 758 759 /* 760 * We need to swap with our parent, and continue up the heap. 761 */ 762 heap[heap_parent] = current; 763 heap[heap_current] = parent; 764 765 /* 766 * If we just reached the root, we're done. 
767 */ 768 if (heap_parent == 0) 769 return (1); 770 771 heap_current = heap_parent; 772 heap_parent = CYC_HEAP_PARENT(heap_current); 773 } 774 } 775 776 static void 777 cyclic_downheap(cyc_cpu_t *cpu, cyc_index_t ndx) 778 { 779 cyclic_t *cyclics = cpu->cyp_cyclics; 780 cyc_index_t *heap = cpu->cyp_heap; 781 782 cyc_index_t heap_left, heap_right, heap_me = ndx; 783 cyc_index_t left, right, me; 784 cyc_index_t nelems = cpu->cyp_nelems; 785 786 for (;;) { 787 /* 788 * If we don't have a left child (i.e., we're a leaf), we're 789 * done. 790 */ 791 if ((heap_left = CYC_HEAP_LEFT(heap_me)) >= nelems) 792 return; 793 794 left = heap[heap_left]; 795 me = heap[heap_me]; 796 797 heap_right = CYC_HEAP_RIGHT(heap_me); 798 799 /* 800 * Even if we don't have a right child, we still need to compare 801 * our expiration time against that of our left child. 802 */ 803 if (heap_right >= nelems) 804 goto comp_left; 805 806 right = heap[heap_right]; 807 808 /* 809 * We have both a left and a right child. We need to compare 810 * the expiration times of the children to determine which 811 * expires earlier. 812 */ 813 if (cyclics[right].cy_expire < cyclics[left].cy_expire) { 814 /* 815 * Our right child is the earlier of our children. 816 * We'll now compare our expiration time to its; if 817 * ours is the earlier, we're done. 818 */ 819 if (cyclics[me].cy_expire <= cyclics[right].cy_expire) 820 return; 821 822 /* 823 * Our right child expires earlier than we do; swap 824 * with our right child, and descend right. 825 */ 826 heap[heap_right] = me; 827 heap[heap_me] = right; 828 heap_me = heap_right; 829 continue; 830 } 831 832 comp_left: 833 /* 834 * Our left child is the earlier of our children (or we have 835 * no right child). We'll now compare our expiration time 836 * to its; if ours is the earlier, we're done. 837 */ 838 if (cyclics[me].cy_expire <= cyclics[left].cy_expire) 839 return; 840 841 /* 842 * Our left child expires earlier than we do; swap with our 843 * left child, and descend left. 844 */ 845 heap[heap_left] = me; 846 heap[heap_me] = left; 847 heap_me = heap_left; 848 } 849 } 850 851 static void 852 cyclic_expire(cyc_cpu_t *cpu, cyc_index_t ndx, cyclic_t *cyclic) 853 { 854 cyc_backend_t *be = cpu->cyp_backend; 855 cyc_level_t level = cyclic->cy_level; 856 857 /* 858 * If this is a CY_HIGH_LEVEL cyclic, just call the handler; we don't 859 * need to worry about the pend count for CY_HIGH_LEVEL cyclics. 860 */ 861 if (level == CY_HIGH_LEVEL) { 862 cyc_func_t handler = cyclic->cy_handler; 863 void *arg = cyclic->cy_arg; 864 865 CYC_TRACE(cpu, CY_HIGH_LEVEL, "handler-in", handler, arg); 866 DTRACE_PROBE1(cyclic__start, cyclic_t *, cyclic); 867 868 (*handler)(arg); 869 870 DTRACE_PROBE1(cyclic__end, cyclic_t *, cyclic); 871 CYC_TRACE(cpu, CY_HIGH_LEVEL, "handler-out", handler, arg); 872 873 return; 874 } 875 876 /* 877 * We're at CY_HIGH_LEVEL; this modification to cy_pend need not 878 * be atomic (the high interrupt level assures that it will appear 879 * atomic to any softint currently running). 880 */ 881 if (cyclic->cy_pend++ == 0) { 882 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[level]; 883 cyc_pcbuffer_t *pc = &softbuf->cys_buf[softbuf->cys_hard]; 884 885 /* 886 * We need to enqueue this cyclic in the soft buffer. 
887 */ 888 CYC_TRACE(cpu, CY_HIGH_LEVEL, "expire-enq", cyclic, 889 pc->cypc_prodndx); 890 pc->cypc_buf[pc->cypc_prodndx++ & pc->cypc_sizemask] = ndx; 891 892 ASSERT(pc->cypc_prodndx != pc->cypc_consndx); 893 } else { 894 /* 895 * If the pend count is zero after we incremented it, then 896 * we've wrapped (i.e. we had a cy_pend count of over four 897 * billion. In this case, we clamp the pend count at 898 * UINT32_MAX. Yes, cyclics can be lost in this case. 899 */ 900 if (cyclic->cy_pend == 0) { 901 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "expire-wrap", cyclic); 902 cyclic->cy_pend = UINT32_MAX; 903 } 904 905 CYC_TRACE(cpu, CY_HIGH_LEVEL, "expire-bump", cyclic, 0); 906 } 907 908 be->cyb_softint(be->cyb_arg, cyclic->cy_level); 909 } 910 911 /* 912 * cyclic_fire(cpu_t *) 913 * 914 * Overview 915 * 916 * cyclic_fire() is the cyclic subsystem's CY_HIGH_LEVEL interrupt handler. 917 * Called by the cyclic backend. 918 * 919 * Arguments and notes 920 * 921 * The only argument is the CPU on which the interrupt is executing; 922 * backends must call into cyclic_fire() on the specified CPU. 923 * 924 * cyclic_fire() may be called spuriously without ill effect. Optimal 925 * backends will call into cyclic_fire() at or shortly after the time 926 * requested via cyb_reprogram(). However, calling cyclic_fire() 927 * arbitrarily late will only manifest latency bubbles; the correctness 928 * of the cyclic subsystem does not rely on the timeliness of the backend. 929 * 930 * cyclic_fire() is wait-free; it will not block or spin. 931 * 932 * Return values 933 * 934 * None. 935 * 936 * Caller's context 937 * 938 * cyclic_fire() must be called from CY_HIGH_LEVEL interrupt context. 939 */ 940 void 941 cyclic_fire(cpu_t *c) 942 { 943 cyc_cpu_t *cpu = c->cpu_cyclic; 944 cyc_backend_t *be = cpu->cyp_backend; 945 cyc_index_t *heap = cpu->cyp_heap; 946 cyclic_t *cyclic, *cyclics = cpu->cyp_cyclics; 947 void *arg = be->cyb_arg; 948 hrtime_t now = gethrtime(); 949 hrtime_t exp; 950 951 CYC_TRACE(cpu, CY_HIGH_LEVEL, "fire", now, 0); 952 953 if (cpu->cyp_nelems == 0) { 954 /* 955 * This is a spurious fire. Count it as such, and blow 956 * out of here. 957 */ 958 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "fire-spurious"); 959 return; 960 } 961 962 for (;;) { 963 cyc_index_t ndx = heap[0]; 964 965 cyclic = &cyclics[ndx]; 966 967 ASSERT(!(cyclic->cy_flags & CYF_FREE)); 968 969 CYC_TRACE(cpu, CY_HIGH_LEVEL, "fire-check", cyclic, 970 cyclic->cy_expire); 971 972 if ((exp = cyclic->cy_expire) > now) 973 break; 974 975 cyclic_expire(cpu, ndx, cyclic); 976 977 /* 978 * If the handler reprogrammed the cyclic, then don't 979 * recompute the expiration. Then, if the interval is 980 * infinity, set the expiration to infinity. This can 981 * be used to create one-shot timers. 982 */ 983 if (exp != cyclic->cy_expire) { 984 /* 985 * If a hi level cyclic reprograms itself, 986 * the heap adjustment and reprogramming of the 987 * clock source have already been done at this 988 * point. So, we can continue. 989 */ 990 continue; 991 } 992 993 if (cyclic->cy_interval == CY_INFINITY) 994 exp = CY_INFINITY; 995 else 996 exp += cyclic->cy_interval; 997 998 /* 999 * If this cyclic will be set to next expire in the distant 1000 * past, we have one of two situations: 1001 * 1002 * a) This is the first firing of a cyclic which had 1003 * cy_expire set to 0. 1004 * 1005 * b) We are tragically late for a cyclic -- most likely 1006 * due to being in the debugger. 1007 * 1008 * In either case, we set the new expiration time to be the 1009 * the next interval boundary. 
This assures that the 1010 * expiration time modulo the interval is invariant. 1011 * 1012 * We arbitrarily define "distant" to be one second (one second 1013 * is chosen because it's shorter than any foray to the 1014 * debugger while still being longer than any legitimate 1015 * stretch at CY_HIGH_LEVEL). 1016 */ 1017 1018 if (now - exp > NANOSEC) { 1019 hrtime_t interval = cyclic->cy_interval; 1020 1021 CYC_TRACE(cpu, CY_HIGH_LEVEL, exp == interval ? 1022 "fire-first" : "fire-swing", now, exp); 1023 1024 exp += ((now - exp) / interval + 1) * interval; 1025 } 1026 1027 cyclic->cy_expire = exp; 1028 cyclic_downheap(cpu, 0); 1029 } 1030 1031 /* 1032 * Now we have a cyclic in the root slot which isn't in the past; 1033 * reprogram the interrupt source. 1034 */ 1035 be->cyb_reprogram(arg, exp); 1036 } 1037 1038 static void 1039 cyclic_remove_pend(cyc_cpu_t *cpu, cyc_level_t level, cyclic_t *cyclic) 1040 { 1041 cyc_func_t handler = cyclic->cy_handler; 1042 void *arg = cyclic->cy_arg; 1043 uint32_t i, rpend = cpu->cyp_rpend - 1; 1044 1045 ASSERT(cyclic->cy_flags & CYF_FREE); 1046 ASSERT(cyclic->cy_pend == 0); 1047 ASSERT(cpu->cyp_state == CYS_REMOVING); 1048 ASSERT(cpu->cyp_rpend > 0); 1049 1050 CYC_TRACE(cpu, level, "remove-rpend", cyclic, cpu->cyp_rpend); 1051 1052 /* 1053 * Note that we only call the handler cyp_rpend - 1 times; this is 1054 * to account for the handler call in cyclic_softint(). 1055 */ 1056 for (i = 0; i < rpend; i++) { 1057 CYC_TRACE(cpu, level, "rpend-in", handler, arg); 1058 DTRACE_PROBE1(cyclic__start, cyclic_t *, cyclic); 1059 1060 (*handler)(arg); 1061 1062 DTRACE_PROBE1(cyclic__end, cyclic_t *, cyclic); 1063 CYC_TRACE(cpu, level, "rpend-out", handler, arg); 1064 } 1065 1066 /* 1067 * We can now let the remove operation complete. 1068 */ 1069 sema_v(&cpu->cyp_modify_wait); 1070 } 1071 1072 /* 1073 * cyclic_softint(cpu_t *cpu, cyc_level_t level) 1074 * 1075 * Overview 1076 * 1077 * cyclic_softint() is the cyclic subsystem's CY_LOCK_LEVEL and CY_LOW_LEVEL 1078 * soft interrupt handler. Called by the cyclic backend. 1079 * 1080 * Arguments and notes 1081 * 1082 * The first argument to cyclic_softint() is the CPU on which the interrupt 1083 * is executing; backends must call into cyclic_softint() on the specified 1084 * CPU. The second argument is the level of the soft interrupt; it must 1085 * be one of CY_LOCK_LEVEL or CY_LOW_LEVEL. 1086 * 1087 * cyclic_softint() will call the handlers for cyclics pending at the 1088 * specified level. cyclic_softint() will not return until all pending 1089 * cyclics at the specified level have been dealt with; intervening 1090 * CY_HIGH_LEVEL interrupts which enqueue cyclics at the specified level 1091 * may therefore prolong cyclic_softint(). 1092 * 1093 * cyclic_softint() never disables interrupts, and, if neither a 1094 * cyclic_add() nor a cyclic_remove() is pending on the specified CPU, is 1095 * lock-free. This assures that in the common case, cyclic_softint() 1096 * completes without blocking, and never starves cyclic_fire(). If either 1097 * cyclic_add() or cyclic_remove() is pending, cyclic_softint() may grab 1098 * a dispatcher lock. 1099 * 1100 * While cyclic_softint() is designed for bounded latency, it is obviously 1101 * at the mercy of its cyclic handlers. Because cyclic handlers may block 1102 * arbitrarily, callers of cyclic_softint() should not rely upon 1103 * deterministic completion. 1104 * 1105 * cyclic_softint() may be called spuriously without ill effect. 1106 * 1107 * Return value 1108 * 1109 * None. 
1110 * 1111 * Caller's context 1112 * 1113 * The caller must be executing in soft interrupt context at either 1114 * CY_LOCK_LEVEL or CY_LOW_LEVEL. The level passed to cyclic_softint() 1115 * must match the level at which it is executing. On optimal backends, 1116 * the caller will hold no locks. In any case, the caller may not hold 1117 * cpu_lock or any lock acquired by any cyclic handler or held across 1118 * any of cyclic_add(), cyclic_remove(), cyclic_bind() or cyclic_juggle(). 1119 */ 1120 void 1121 cyclic_softint(cpu_t *c, cyc_level_t level) 1122 { 1123 cyc_cpu_t *cpu = c->cpu_cyclic; 1124 cyc_softbuf_t *softbuf; 1125 int soft, *buf, consndx, resized = 0, intr_resized = 0; 1126 cyc_pcbuffer_t *pc; 1127 cyclic_t *cyclics = cpu->cyp_cyclics; 1128 int sizemask; 1129 1130 CYC_TRACE(cpu, level, "softint", cyclics, 0); 1131 1132 ASSERT(level < CY_LOW_LEVEL + CY_SOFT_LEVELS); 1133 1134 softbuf = &cpu->cyp_softbuf[level]; 1135 top: 1136 soft = softbuf->cys_soft; 1137 ASSERT(soft == 0 || soft == 1); 1138 1139 pc = &softbuf->cys_buf[soft]; 1140 buf = pc->cypc_buf; 1141 consndx = pc->cypc_consndx; 1142 sizemask = pc->cypc_sizemask; 1143 1144 CYC_TRACE(cpu, level, "softint-top", cyclics, pc); 1145 1146 while (consndx != pc->cypc_prodndx) { 1147 uint32_t pend, npend, opend; 1148 int consmasked = consndx & sizemask; 1149 cyclic_t *cyclic = &cyclics[buf[consmasked]]; 1150 cyc_func_t handler = cyclic->cy_handler; 1151 void *arg = cyclic->cy_arg; 1152 1153 ASSERT(buf[consmasked] < cpu->cyp_size); 1154 CYC_TRACE(cpu, level, "consuming", consndx, cyclic); 1155 1156 /* 1157 * We have found this cyclic in the pcbuffer. We know that 1158 * one of the following is true: 1159 * 1160 * (a) The pend is non-zero. We need to execute the handler 1161 * at least once. 1162 * 1163 * (b) The pend _was_ non-zero, but it's now zero due to a 1164 * resize. We will call the handler once, see that we 1165 * are in this case, and read the new cyclics buffer 1166 * (and hence the old non-zero pend). 1167 * 1168 * (c) The pend _was_ non-zero, but it's now zero due to a 1169 * removal. We will call the handler once, see that we 1170 * are in this case, and call into cyclic_remove_pend() 1171 * to call the cyclic rpend times. We will take into 1172 * account that we have already called the handler once. 1173 * 1174 * Point is: it's safe to call the handler without first 1175 * checking the pend. 1176 */ 1177 do { 1178 CYC_TRACE(cpu, level, "handler-in", handler, arg); 1179 DTRACE_PROBE1(cyclic__start, cyclic_t *, cyclic); 1180 1181 (*handler)(arg); 1182 1183 DTRACE_PROBE1(cyclic__end, cyclic_t *, cyclic); 1184 CYC_TRACE(cpu, level, "handler-out", handler, arg); 1185 reread: 1186 pend = cyclic->cy_pend; 1187 npend = pend - 1; 1188 1189 if (pend == 0) { 1190 if (cpu->cyp_state == CYS_REMOVING) { 1191 /* 1192 * This cyclic has been removed while 1193 * it had a non-zero pend count (we 1194 * know it was non-zero because we 1195 * found this cyclic in the pcbuffer). 1196 * There must be a non-zero rpend for 1197 * this CPU, and there must be a remove 1198 * operation blocking; we'll call into 1199 * cyclic_remove_pend() to clean this 1200 * up, and break out of the pend loop. 1201 */ 1202 cyclic_remove_pend(cpu, level, cyclic); 1203 break; 1204 } 1205 1206 /* 1207 * We must have had a resize interrupt us. 
1208 */ 1209 CYC_TRACE(cpu, level, "resize-int", cyclics, 0); 1210 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1211 ASSERT(cyclics != cpu->cyp_cyclics); 1212 ASSERT(resized == 0); 1213 ASSERT(intr_resized == 0); 1214 intr_resized = 1; 1215 cyclics = cpu->cyp_cyclics; 1216 cyclic = &cyclics[buf[consmasked]]; 1217 ASSERT(cyclic->cy_handler == handler); 1218 ASSERT(cyclic->cy_arg == arg); 1219 goto reread; 1220 } 1221 1222 if ((opend = 1223 atomic_cas_32(&cyclic->cy_pend, pend, npend)) != 1224 pend) { 1225 /* 1226 * Our atomic_cas_32 can fail for one of several 1227 * reasons: 1228 * 1229 * (a) An intervening high level bumped up the 1230 * pend count on this cyclic. In this 1231 * case, we will see a higher pend. 1232 * 1233 * (b) The cyclics array has been yanked out 1234 * from underneath us by a resize 1235 * operation. In this case, pend is 0 and 1236 * cyp_state is CYS_EXPANDING. 1237 * 1238 * (c) The cyclic has been removed by an 1239 * intervening remove-xcall. In this case, 1240 * pend will be 0, the cyp_state will be 1241 * CYS_REMOVING, and the cyclic will be 1242 * marked CYF_FREE. 1243 * 1244 * The assertion below checks that we are 1245 * in one of the above situations. The 1246 * action under all three is to return to 1247 * the top of the loop. 1248 */ 1249 CYC_TRACE(cpu, level, "cas-fail", opend, pend); 1250 ASSERT(opend > pend || (opend == 0 && 1251 ((cyclics != cpu->cyp_cyclics && 1252 cpu->cyp_state == CYS_EXPANDING) || 1253 (cpu->cyp_state == CYS_REMOVING && 1254 (cyclic->cy_flags & CYF_FREE))))); 1255 goto reread; 1256 } 1257 1258 /* 1259 * Okay, so we've managed to successfully decrement 1260 * pend. If we just decremented the pend to 0, we're 1261 * done. 1262 */ 1263 } while (npend > 0); 1264 1265 pc->cypc_consndx = ++consndx; 1266 } 1267 1268 /* 1269 * If the high level handler is no longer writing to the same 1270 * buffer, then we've had a resize. We need to switch our soft 1271 * index, and goto top. 1272 */ 1273 if (soft != softbuf->cys_hard) { 1274 /* 1275 * We can assert that the other buffer has grown by exactly 1276 * one factor of two. 1277 */ 1278 CYC_TRACE(cpu, level, "buffer-grow", 0, 0); 1279 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1280 ASSERT(softbuf->cys_buf[softbuf->cys_hard].cypc_sizemask == 1281 (softbuf->cys_buf[soft].cypc_sizemask << 1) + 1 || 1282 softbuf->cys_buf[soft].cypc_sizemask == 0); 1283 ASSERT(softbuf->cys_hard == (softbuf->cys_soft ^ 1)); 1284 1285 /* 1286 * If our cached cyclics pointer doesn't match cyp_cyclics, 1287 * then we took a resize between our last iteration of the 1288 * pend loop and the check against softbuf->cys_hard. 1289 */ 1290 if (cpu->cyp_cyclics != cyclics) { 1291 CYC_TRACE1(cpu, level, "resize-int-int", consndx); 1292 cyclics = cpu->cyp_cyclics; 1293 } 1294 1295 softbuf->cys_soft = softbuf->cys_hard; 1296 1297 ASSERT(resized == 0); 1298 resized = 1; 1299 goto top; 1300 } 1301 1302 /* 1303 * If we were interrupted by a resize operation, then we must have 1304 * seen the hard index change. 1305 */ 1306 ASSERT(!(intr_resized == 1 && resized == 0)); 1307 1308 if (resized) { 1309 uint32_t lev, nlev; 1310 1311 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1312 1313 do { 1314 lev = cpu->cyp_modify_levels; 1315 nlev = lev + 1; 1316 } while (atomic_cas_32(&cpu->cyp_modify_levels, lev, nlev) != 1317 lev); 1318 1319 /* 1320 * If we are the last soft level to see the modification, 1321 * post on cyp_modify_wait. Otherwise, (if we're not 1322 * already at low level), post down to the next soft level. 
1323 */ 1324 if (nlev == CY_SOFT_LEVELS) { 1325 CYC_TRACE0(cpu, level, "resize-kick"); 1326 sema_v(&cpu->cyp_modify_wait); 1327 } else { 1328 ASSERT(nlev < CY_SOFT_LEVELS); 1329 if (level != CY_LOW_LEVEL) { 1330 cyc_backend_t *be = cpu->cyp_backend; 1331 1332 CYC_TRACE0(cpu, level, "resize-post"); 1333 be->cyb_softint(be->cyb_arg, level - 1); 1334 } 1335 } 1336 } 1337 } 1338 1339 static void 1340 cyclic_expand_xcall(cyc_xcallarg_t *arg) 1341 { 1342 cyc_cpu_t *cpu = arg->cyx_cpu; 1343 cyc_backend_t *be = cpu->cyp_backend; 1344 cyb_arg_t bar = be->cyb_arg; 1345 cyc_cookie_t cookie; 1346 cyc_index_t new_size = arg->cyx_size, size = cpu->cyp_size, i; 1347 cyc_index_t *new_heap = arg->cyx_heap; 1348 cyclic_t *cyclics = cpu->cyp_cyclics, *new_cyclics = arg->cyx_cyclics; 1349 1350 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1351 1352 /* 1353 * This is a little dicey. First, we'll raise our interrupt level 1354 * to CY_HIGH_LEVEL. This CPU already has a new heap, cyclic array, 1355 * etc.; we just need to bcopy them across. As for the softint 1356 * buffers, we'll switch the active buffers. The actual softints will 1357 * take care of consuming any pending cyclics in the old buffer. 1358 */ 1359 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 1360 1361 CYC_TRACE(cpu, CY_HIGH_LEVEL, "expand", new_size, 0); 1362 1363 /* 1364 * Assert that the new size is a power of 2. 1365 */ 1366 ASSERT((new_size & new_size - 1) == 0); 1367 ASSERT(new_size == (size << 1)); 1368 ASSERT(cpu->cyp_heap != NULL && cpu->cyp_cyclics != NULL); 1369 1370 bcopy(cpu->cyp_heap, new_heap, sizeof (cyc_index_t) * size); 1371 bcopy(cyclics, new_cyclics, sizeof (cyclic_t) * size); 1372 1373 /* 1374 * Now run through the old cyclics array, setting pend to 0. To 1375 * softints (which are executing at a lower priority level), the 1376 * pends dropping to 0 will appear atomic with the cyp_cyclics 1377 * pointer changing. 1378 */ 1379 for (i = 0; i < size; i++) 1380 cyclics[i].cy_pend = 0; 1381 1382 /* 1383 * Set up the free list, and set all of the new cyclics to be CYF_FREE. 1384 */ 1385 for (i = size; i < new_size; i++) { 1386 new_heap[i] = i; 1387 new_cyclics[i].cy_flags = CYF_FREE; 1388 } 1389 1390 /* 1391 * We can go ahead and plow the value of cyp_heap and cyp_cyclics; 1392 * cyclic_expand() has kept a copy. 1393 */ 1394 cpu->cyp_heap = new_heap; 1395 cpu->cyp_cyclics = new_cyclics; 1396 cpu->cyp_size = new_size; 1397 1398 /* 1399 * We've switched over the heap and the cyclics array. Now we need 1400 * to switch over our active softint buffer pointers. 1401 */ 1402 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 1403 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 1404 uchar_t hard = softbuf->cys_hard; 1405 1406 /* 1407 * Assert that we're not in the middle of a resize operation. 1408 */ 1409 ASSERT(hard == softbuf->cys_soft); 1410 ASSERT(hard == 0 || hard == 1); 1411 ASSERT(softbuf->cys_buf[hard].cypc_buf != NULL); 1412 1413 softbuf->cys_hard = hard ^ 1; 1414 1415 /* 1416 * The caller (cyclic_expand()) is responsible for setting 1417 * up the new producer-consumer buffer; assert that it's 1418 * been done correctly. 1419 */ 1420 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_buf != NULL); 1421 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_prodndx == 0); 1422 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_consndx == 0); 1423 } 1424 1425 /* 1426 * That's all there is to it; now we just need to postdown to 1427 * get the softint chain going. 
1428 */ 1429 be->cyb_softint(bar, CY_HIGH_LEVEL - 1); 1430 be->cyb_restore_level(bar, cookie); 1431 } 1432 1433 /* 1434 * cyclic_expand() will cross call onto the CPU to perform the actual 1435 * expand operation. 1436 */ 1437 static void 1438 cyclic_expand(cyc_cpu_t *cpu) 1439 { 1440 cyc_index_t new_size, old_size; 1441 cyc_index_t *new_heap, *old_heap; 1442 cyclic_t *new_cyclics, *old_cyclics; 1443 cyc_xcallarg_t arg; 1444 cyc_backend_t *be = cpu->cyp_backend; 1445 char old_hard; 1446 int i; 1447 1448 ASSERT(MUTEX_HELD(&cpu_lock)); 1449 ASSERT(cpu->cyp_state == CYS_ONLINE); 1450 1451 cpu->cyp_state = CYS_EXPANDING; 1452 1453 old_heap = cpu->cyp_heap; 1454 old_cyclics = cpu->cyp_cyclics; 1455 1456 if ((new_size = ((old_size = cpu->cyp_size) << 1)) == 0) { 1457 new_size = CY_DEFAULT_PERCPU; 1458 ASSERT(old_heap == NULL && old_cyclics == NULL); 1459 } 1460 1461 /* 1462 * Check that the new_size is a power of 2. 1463 */ 1464 ASSERT((new_size - 1 & new_size) == 0); 1465 1466 new_heap = kmem_alloc(sizeof (cyc_index_t) * new_size, KM_SLEEP); 1467 new_cyclics = kmem_zalloc(sizeof (cyclic_t) * new_size, KM_SLEEP); 1468 1469 /* 1470 * We know that no other expansions are in progress (they serialize 1471 * on cpu_lock), so we can safely read the softbuf metadata. 1472 */ 1473 old_hard = cpu->cyp_softbuf[0].cys_hard; 1474 1475 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 1476 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 1477 char hard = softbuf->cys_hard; 1478 cyc_pcbuffer_t *pc = &softbuf->cys_buf[hard ^ 1]; 1479 1480 ASSERT(hard == old_hard); 1481 ASSERT(hard == softbuf->cys_soft); 1482 ASSERT(pc->cypc_buf == NULL); 1483 1484 pc->cypc_buf = 1485 kmem_alloc(sizeof (cyc_index_t) * new_size, KM_SLEEP); 1486 pc->cypc_prodndx = pc->cypc_consndx = 0; 1487 pc->cypc_sizemask = new_size - 1; 1488 } 1489 1490 arg.cyx_cpu = cpu; 1491 arg.cyx_heap = new_heap; 1492 arg.cyx_cyclics = new_cyclics; 1493 arg.cyx_size = new_size; 1494 1495 cpu->cyp_modify_levels = 0; 1496 1497 be->cyb_xcall(be->cyb_arg, cpu->cyp_cpu, 1498 (cyc_func_t)cyclic_expand_xcall, &arg); 1499 1500 /* 1501 * Now block, waiting for the resize operation to complete. 1502 */ 1503 sema_p(&cpu->cyp_modify_wait); 1504 ASSERT(cpu->cyp_modify_levels == CY_SOFT_LEVELS); 1505 1506 /* 1507 * The operation is complete; we can now free the old buffers. 1508 */ 1509 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 1510 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 1511 char hard = softbuf->cys_hard; 1512 cyc_pcbuffer_t *pc = &softbuf->cys_buf[hard ^ 1]; 1513 1514 ASSERT(hard == (old_hard ^ 1)); 1515 ASSERT(hard == softbuf->cys_soft); 1516 1517 if (pc->cypc_buf == NULL) 1518 continue; 1519 1520 ASSERT(pc->cypc_sizemask == ((new_size - 1) >> 1)); 1521 1522 kmem_free(pc->cypc_buf, 1523 sizeof (cyc_index_t) * (pc->cypc_sizemask + 1)); 1524 pc->cypc_buf = NULL; 1525 } 1526 1527 if (old_cyclics != NULL) { 1528 ASSERT(old_heap != NULL); 1529 ASSERT(old_size != 0); 1530 kmem_free(old_cyclics, sizeof (cyclic_t) * old_size); 1531 kmem_free(old_heap, sizeof (cyc_index_t) * old_size); 1532 } 1533 1534 ASSERT(cpu->cyp_state == CYS_EXPANDING); 1535 cpu->cyp_state = CYS_ONLINE; 1536 } 1537 1538 /* 1539 * cyclic_pick_cpu will attempt to pick a CPU according to the constraints 1540 * specified by the partition, bound CPU, and flags. Additionally, 1541 * cyclic_pick_cpu() will not pick the avoid CPU; it will return NULL if 1542 * the avoid CPU is the only CPU which satisfies the constraints. 
1543 * 1544 * If CYF_CPU_BOUND is set in flags, the specified CPU must be non-NULL. 1545 * If CYF_PART_BOUND is set in flags, the specified partition must be non-NULL. 1546 * If both CYF_CPU_BOUND and CYF_PART_BOUND are set, the specified CPU must 1547 * be in the specified partition. 1548 */ 1549 static cyc_cpu_t * 1550 cyclic_pick_cpu(cpupart_t *part, cpu_t *bound, cpu_t *avoid, uint16_t flags) 1551 { 1552 cpu_t *c, *start = (part != NULL) ? part->cp_cpulist : CPU; 1553 cpu_t *online = NULL; 1554 uintptr_t offset; 1555 1556 CYC_PTRACE("pick-cpu", part, bound); 1557 1558 ASSERT(!(flags & CYF_CPU_BOUND) || bound != NULL); 1559 ASSERT(!(flags & CYF_PART_BOUND) || part != NULL); 1560 1561 /* 1562 * If we're bound to our CPU, there isn't much choice involved. We 1563 * need to check that the CPU passed as bound is in the cpupart, and 1564 * that the CPU that we're binding to has been configured. 1565 */ 1566 if (flags & CYF_CPU_BOUND) { 1567 CYC_PTRACE("pick-cpu-bound", bound, avoid); 1568 1569 if ((flags & CYF_PART_BOUND) && bound->cpu_part != part) 1570 panic("cyclic_pick_cpu: " 1571 "CPU binding contradicts partition binding"); 1572 1573 if (bound == avoid) 1574 return (NULL); 1575 1576 if (bound->cpu_cyclic == NULL) 1577 panic("cyclic_pick_cpu: " 1578 "attempt to bind to non-configured CPU"); 1579 1580 return (bound->cpu_cyclic); 1581 } 1582 1583 if (flags & CYF_PART_BOUND) { 1584 CYC_PTRACE("pick-part-bound", bound, avoid); 1585 offset = offsetof(cpu_t, cpu_next_part); 1586 } else { 1587 offset = offsetof(cpu_t, cpu_next_onln); 1588 } 1589 1590 c = start; 1591 do { 1592 if (c->cpu_cyclic == NULL) 1593 continue; 1594 1595 if (c->cpu_cyclic->cyp_state == CYS_OFFLINE) 1596 continue; 1597 1598 if (c == avoid) 1599 continue; 1600 1601 if (c->cpu_flags & CPU_ENABLE) 1602 goto found; 1603 1604 if (online == NULL) 1605 online = c; 1606 } while ((c = *(cpu_t **)((uintptr_t)c + offset)) != start); 1607 1608 /* 1609 * If we're here, we're in one of two situations: 1610 * 1611 * (a) We have a partition-bound cyclic, and there is no CPU in 1612 * our partition which is CPU_ENABLE'd. If we saw another 1613 * non-CYS_OFFLINE CPU in our partition, we'll go with it. 1614 * If not, the avoid CPU must be the only non-CYS_OFFLINE 1615 * CPU in the partition; we're forced to return NULL. 1616 * 1617 * (b) We have a partition-unbound cyclic, in which case there 1618 * must only be one CPU CPU_ENABLE'd, and it must be the one 1619 * we're trying to avoid. If cyclic_juggle()/cyclic_offline() 1620 * are called appropriately, this generally shouldn't happen 1621 * (the offline should fail before getting to this code). 1622 * At any rate: we can't avoid the avoid CPU, so we return 1623 * NULL. 
1624 */ 1625 if (!(flags & CYF_PART_BOUND)) { 1626 ASSERT(avoid->cpu_flags & CPU_ENABLE); 1627 return (NULL); 1628 } 1629 1630 CYC_PTRACE("pick-no-intr", part, avoid); 1631 1632 if ((c = online) != NULL) 1633 goto found; 1634 1635 CYC_PTRACE("pick-fail", part, avoid); 1636 ASSERT(avoid->cpu_part == start->cpu_part); 1637 return (NULL); 1638 1639 found: 1640 CYC_PTRACE("pick-cpu-found", c, avoid); 1641 ASSERT(c != avoid); 1642 ASSERT(c->cpu_cyclic != NULL); 1643 1644 return (c->cpu_cyclic); 1645 } 1646 1647 static void 1648 cyclic_add_xcall(cyc_xcallarg_t *arg) 1649 { 1650 cyc_cpu_t *cpu = arg->cyx_cpu; 1651 cyc_handler_t *hdlr = arg->cyx_hdlr; 1652 cyc_time_t *when = arg->cyx_when; 1653 cyc_backend_t *be = cpu->cyp_backend; 1654 cyc_index_t ndx, nelems; 1655 cyc_cookie_t cookie; 1656 cyb_arg_t bar = be->cyb_arg; 1657 cyclic_t *cyclic; 1658 1659 ASSERT(cpu->cyp_nelems < cpu->cyp_size); 1660 1661 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 1662 1663 CYC_TRACE(cpu, CY_HIGH_LEVEL, 1664 "add-xcall", when->cyt_when, when->cyt_interval); 1665 1666 nelems = cpu->cyp_nelems++; 1667 1668 if (nelems == 0) { 1669 /* 1670 * If this is the first element, we need to enable the 1671 * backend on this CPU. 1672 */ 1673 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "enabled"); 1674 be->cyb_enable(bar); 1675 } 1676 1677 ndx = cpu->cyp_heap[nelems]; 1678 cyclic = &cpu->cyp_cyclics[ndx]; 1679 1680 ASSERT(cyclic->cy_flags == CYF_FREE); 1681 cyclic->cy_interval = when->cyt_interval; 1682 1683 if (when->cyt_when == 0) { 1684 /* 1685 * If a start time hasn't been explicitly specified, we'll 1686 * start on the next interval boundary. 1687 */ 1688 cyclic->cy_expire = (gethrtime() / cyclic->cy_interval + 1) * 1689 cyclic->cy_interval; 1690 } else { 1691 cyclic->cy_expire = when->cyt_when; 1692 } 1693 1694 cyclic->cy_handler = hdlr->cyh_func; 1695 cyclic->cy_arg = hdlr->cyh_arg; 1696 cyclic->cy_level = hdlr->cyh_level; 1697 cyclic->cy_flags = arg->cyx_flags; 1698 1699 if (cyclic_upheap(cpu, nelems)) { 1700 hrtime_t exp = cyclic->cy_expire; 1701 1702 CYC_TRACE(cpu, CY_HIGH_LEVEL, "add-reprog", cyclic, exp); 1703 1704 /* 1705 * If our upheap propagated to the root, we need to 1706 * reprogram the interrupt source. 1707 */ 1708 be->cyb_reprogram(bar, exp); 1709 } 1710 be->cyb_restore_level(bar, cookie); 1711 1712 arg->cyx_ndx = ndx; 1713 } 1714 1715 static cyc_index_t 1716 cyclic_add_here(cyc_cpu_t *cpu, cyc_handler_t *hdlr, 1717 cyc_time_t *when, uint16_t flags) 1718 { 1719 cyc_backend_t *be = cpu->cyp_backend; 1720 cyb_arg_t bar = be->cyb_arg; 1721 cyc_xcallarg_t arg; 1722 1723 CYC_PTRACE("add-cpu", cpu, hdlr->cyh_func); 1724 ASSERT(MUTEX_HELD(&cpu_lock)); 1725 ASSERT(cpu->cyp_state == CYS_ONLINE); 1726 ASSERT(!(cpu->cyp_cpu->cpu_flags & CPU_OFFLINE)); 1727 ASSERT(when->cyt_when >= 0 && when->cyt_interval > 0); 1728 1729 if (cpu->cyp_nelems == cpu->cyp_size) { 1730 /* 1731 * This is expensive; it will cross call onto the other 1732 * CPU to perform the expansion. 1733 */ 1734 cyclic_expand(cpu); 1735 ASSERT(cpu->cyp_nelems < cpu->cyp_size); 1736 } 1737 1738 /* 1739 * By now, we know that we're going to be able to successfully 1740 * perform the add. Now cross call over to the CPU of interest to 1741 * actually add our cyclic. 
	 */
	arg.cyx_cpu = cpu;
	arg.cyx_hdlr = hdlr;
	arg.cyx_when = when;
	arg.cyx_flags = flags;

	be->cyb_xcall(bar, cpu->cyp_cpu, (cyc_func_t)cyclic_add_xcall, &arg);

	CYC_PTRACE("add-cpu-done", cpu, arg.cyx_ndx);

	return (arg.cyx_ndx);
}

static void
cyclic_remove_xcall(cyc_xcallarg_t *arg)
{
	cyc_cpu_t *cpu = arg->cyx_cpu;
	cyc_backend_t *be = cpu->cyp_backend;
	cyb_arg_t bar = be->cyb_arg;
	cyc_cookie_t cookie;
	cyc_index_t ndx = arg->cyx_ndx, nelems, i;
	cyc_index_t *heap, last;
	cyclic_t *cyclic;
#ifdef DEBUG
	cyc_index_t root;
#endif

	ASSERT(cpu->cyp_state == CYS_REMOVING);

	cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL);

	CYC_TRACE1(cpu, CY_HIGH_LEVEL, "remove-xcall", ndx);

	heap = cpu->cyp_heap;
	nelems = cpu->cyp_nelems;
	ASSERT(nelems > 0);
	cyclic = &cpu->cyp_cyclics[ndx];

	/*
	 * Grab the current expiration time. If this cyclic is being
	 * removed as part of a juggling operation, the expiration time
	 * will be used when the cyclic is added to the new CPU.
	 */
	if (arg->cyx_when != NULL) {
		arg->cyx_when->cyt_when = cyclic->cy_expire;
		arg->cyx_when->cyt_interval = cyclic->cy_interval;
	}

	if (cyclic->cy_pend != 0) {
		/*
		 * The pend is non-zero; this cyclic is currently being
		 * executed (or will be executed shortly). If the caller
		 * refuses to wait, we must return (doing nothing). Otherwise,
		 * we will stash the pend value in this CPU's rpend, and
		 * then zero it out. The softint in the pend loop will see
		 * that we have zeroed out pend, and will call the cyclic
		 * handler rpend times. The caller will wait until the
		 * softint has completed calling the cyclic handler.
		 */
		if (arg->cyx_wait == CY_NOWAIT) {
			arg->cyx_wait = CY_WAIT;
			goto out;
		}

		ASSERT(cyclic->cy_level != CY_HIGH_LEVEL);
		CYC_TRACE1(cpu, CY_HIGH_LEVEL, "remove-pend", cyclic->cy_pend);
		cpu->cyp_rpend = cyclic->cy_pend;
		cyclic->cy_pend = 0;
	}

	/*
	 * Now set the flags to CYF_FREE. We don't need a membar_enter()
	 * between zeroing pend and setting the flags because we're at
	 * CY_HIGH_LEVEL (that is, the zeroing of pend and the setting
	 * of cy_flags appear atomic to softints).
	 */
	cyclic->cy_flags = CYF_FREE;

	for (i = 0; i < nelems; i++) {
		if (heap[i] == ndx)
			break;
	}

	if (i == nelems)
		panic("attempt to remove non-existent cyclic");

	cpu->cyp_nelems = --nelems;

	if (nelems == 0) {
		/*
		 * If we just removed the last element, then we need to
		 * disable the backend on this CPU.
		 */
		CYC_TRACE0(cpu, CY_HIGH_LEVEL, "disabled");
		be->cyb_disable(bar);
	}

	if (i == nelems) {
		/*
		 * If we just removed the last element of the heap, then
		 * we don't have to downheap.
		 */
		CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-bottom");
		goto out;
	}

#ifdef DEBUG
	root = heap[0];
#endif

	/*
	 * Swap the last element of the heap with the one we want to
	 * remove, and downheap (this has the implicit effect of putting
	 * the newly freed element on the free list).
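	 * (Heap entries at indices at or above cyp_nelems constitute the
	 * free list:  cyclic_add_xcall() allocates a slot by taking
	 * cyp_heap[nelems], so writing ndx into heap[nelems] below is what
	 * actually frees the slot.)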
1856 */ 1857 heap[i] = (last = heap[nelems]); 1858 heap[nelems] = ndx; 1859 1860 if (i == 0) { 1861 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-root"); 1862 cyclic_downheap(cpu, 0); 1863 } else { 1864 if (cyclic_upheap(cpu, i) == 0) { 1865 /* 1866 * The upheap didn't propagate to the root; if it 1867 * didn't propagate at all, we need to downheap. 1868 */ 1869 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-no-root"); 1870 if (heap[i] == last) { 1871 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-no-up"); 1872 cyclic_downheap(cpu, i); 1873 } 1874 ASSERT(heap[0] == root); 1875 goto out; 1876 } 1877 } 1878 1879 /* 1880 * We're here because we changed the root; we need to reprogram 1881 * the clock source. 1882 */ 1883 cyclic = &cpu->cyp_cyclics[heap[0]]; 1884 1885 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "remove-reprog"); 1886 1887 ASSERT(nelems != 0); 1888 be->cyb_reprogram(bar, cyclic->cy_expire); 1889 out: 1890 be->cyb_restore_level(bar, cookie); 1891 } 1892 1893 static int 1894 cyclic_remove_here(cyc_cpu_t *cpu, cyc_index_t ndx, cyc_time_t *when, int wait) 1895 { 1896 cyc_backend_t *be = cpu->cyp_backend; 1897 cyc_xcallarg_t arg; 1898 cyclic_t *cyclic = &cpu->cyp_cyclics[ndx]; 1899 cyc_level_t level = cyclic->cy_level; 1900 1901 ASSERT(MUTEX_HELD(&cpu_lock)); 1902 ASSERT(cpu->cyp_rpend == 0); 1903 ASSERT(wait == CY_WAIT || wait == CY_NOWAIT); 1904 1905 arg.cyx_ndx = ndx; 1906 arg.cyx_cpu = cpu; 1907 arg.cyx_when = when; 1908 arg.cyx_wait = wait; 1909 1910 ASSERT(cpu->cyp_state == CYS_ONLINE); 1911 cpu->cyp_state = CYS_REMOVING; 1912 1913 be->cyb_xcall(be->cyb_arg, cpu->cyp_cpu, 1914 (cyc_func_t)cyclic_remove_xcall, &arg); 1915 1916 /* 1917 * If the cyclic we removed wasn't at CY_HIGH_LEVEL, then we need to 1918 * check the cyp_rpend. If it's non-zero, then we need to wait here 1919 * for all pending cyclic handlers to run. 1920 */ 1921 ASSERT(!(level == CY_HIGH_LEVEL && cpu->cyp_rpend != 0)); 1922 ASSERT(!(wait == CY_NOWAIT && cpu->cyp_rpend != 0)); 1923 ASSERT(!(arg.cyx_wait == CY_NOWAIT && cpu->cyp_rpend != 0)); 1924 1925 if (wait != arg.cyx_wait) { 1926 /* 1927 * We are being told that we must wait if we want to 1928 * remove this cyclic; put the CPU back in the CYS_ONLINE 1929 * state and return failure. 1930 */ 1931 ASSERT(wait == CY_NOWAIT && arg.cyx_wait == CY_WAIT); 1932 ASSERT(cpu->cyp_state == CYS_REMOVING); 1933 cpu->cyp_state = CYS_ONLINE; 1934 1935 return (0); 1936 } 1937 1938 if (cpu->cyp_rpend != 0) 1939 sema_p(&cpu->cyp_modify_wait); 1940 1941 ASSERT(cpu->cyp_state == CYS_REMOVING); 1942 1943 cpu->cyp_rpend = 0; 1944 cpu->cyp_state = CYS_ONLINE; 1945 1946 return (1); 1947 } 1948 1949 /* 1950 * If cyclic_reprogram() is called on the same CPU as the cyclic's CPU, then 1951 * it calls this function directly. Else, it invokes this function through 1952 * an X-call to the cyclic's CPU. 1953 */ 1954 static void 1955 cyclic_reprogram_cyclic(cyc_cpu_t *cpu, cyc_index_t ndx, hrtime_t expire) 1956 { 1957 cyc_backend_t *be = cpu->cyp_backend; 1958 cyb_arg_t bar = be->cyb_arg; 1959 cyc_cookie_t cookie; 1960 cyc_index_t nelems, i; 1961 cyc_index_t *heap; 1962 cyclic_t *cyclic; 1963 hrtime_t oexpire; 1964 int reprog; 1965 1966 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 1967 1968 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "reprog-xcall", ndx); 1969 1970 nelems = cpu->cyp_nelems; 1971 ASSERT(nelems > 0); 1972 heap = cpu->cyp_heap; 1973 1974 /* 1975 * Reprogrammed cyclics are typically one-shot ones that get 1976 * set to infinity on every expiration. 
We shorten the search by 1977 * searching from the bottom of the heap to the top instead of the 1978 * other way around. 1979 */ 1980 for (i = nelems - 1; i >= 0; i--) { 1981 if (heap[i] == ndx) 1982 break; 1983 } 1984 if (i < 0) 1985 panic("attempt to reprogram non-existent cyclic"); 1986 1987 cyclic = &cpu->cyp_cyclics[ndx]; 1988 oexpire = cyclic->cy_expire; 1989 cyclic->cy_expire = expire; 1990 1991 reprog = (i == 0); 1992 if (expire > oexpire) { 1993 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "reprog-down", i); 1994 cyclic_downheap(cpu, i); 1995 } else if (i > 0) { 1996 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "reprog-up", i); 1997 reprog = cyclic_upheap(cpu, i); 1998 } 1999 2000 if (reprog && (cpu->cyp_state != CYS_SUSPENDED)) { 2001 /* 2002 * The root changed. Reprogram the clock source. 2003 */ 2004 CYC_TRACE0(cpu, CY_HIGH_LEVEL, "reprog-root"); 2005 cyclic = &cpu->cyp_cyclics[heap[0]]; 2006 be->cyb_reprogram(bar, cyclic->cy_expire); 2007 } 2008 2009 be->cyb_restore_level(bar, cookie); 2010 } 2011 2012 static void 2013 cyclic_reprogram_xcall(cyc_xcallarg_t *arg) 2014 { 2015 cyclic_reprogram_cyclic(arg->cyx_cpu, arg->cyx_ndx, 2016 arg->cyx_when->cyt_when); 2017 } 2018 2019 static void 2020 cyclic_reprogram_here(cyc_cpu_t *cpu, cyc_index_t ndx, hrtime_t expiration) 2021 { 2022 cyc_backend_t *be = cpu->cyp_backend; 2023 cyc_xcallarg_t arg; 2024 cyc_time_t when; 2025 2026 ASSERT(expiration > 0); 2027 2028 arg.cyx_ndx = ndx; 2029 arg.cyx_cpu = cpu; 2030 arg.cyx_when = &when; 2031 when.cyt_when = expiration; 2032 2033 be->cyb_xcall(be->cyb_arg, cpu->cyp_cpu, 2034 (cyc_func_t)cyclic_reprogram_xcall, &arg); 2035 } 2036 2037 /* 2038 * cyclic_juggle_one_to() should only be called when the source cyclic 2039 * can be juggled and the destination CPU is known to be able to accept 2040 * it. 2041 */ 2042 static void 2043 cyclic_juggle_one_to(cyc_id_t *idp, cyc_cpu_t *dest) 2044 { 2045 cyc_cpu_t *src = idp->cyi_cpu; 2046 cyc_index_t ndx = idp->cyi_ndx; 2047 cyc_time_t when; 2048 cyc_handler_t hdlr; 2049 cyclic_t *cyclic; 2050 uint16_t flags; 2051 hrtime_t delay; 2052 2053 ASSERT(MUTEX_HELD(&cpu_lock)); 2054 ASSERT(src != NULL && idp->cyi_omni_list == NULL); 2055 ASSERT(!(dest->cyp_cpu->cpu_flags & (CPU_QUIESCED | CPU_OFFLINE))); 2056 CYC_PTRACE("juggle-one-to", idp, dest); 2057 2058 cyclic = &src->cyp_cyclics[ndx]; 2059 2060 flags = cyclic->cy_flags; 2061 ASSERT(!(flags & CYF_CPU_BOUND) && !(flags & CYF_FREE)); 2062 2063 hdlr.cyh_func = cyclic->cy_handler; 2064 hdlr.cyh_level = cyclic->cy_level; 2065 hdlr.cyh_arg = cyclic->cy_arg; 2066 2067 /* 2068 * Before we begin the juggling process, see if the destination 2069 * CPU requires an expansion. If it does, we'll perform the 2070 * expansion before removing the cyclic. This is to prevent us 2071 * from blocking while a system-critical cyclic (notably, the clock 2072 * cyclic) isn't on a CPU. 2073 */ 2074 if (dest->cyp_nelems == dest->cyp_size) { 2075 CYC_PTRACE("remove-expand", idp, dest); 2076 cyclic_expand(dest); 2077 ASSERT(dest->cyp_nelems < dest->cyp_size); 2078 } 2079 2080 /* 2081 * Prevent a reprogram of this cyclic while we are relocating it. 2082 * Otherwise, cyclic_reprogram_here() will end up sending an X-call 2083 * to the wrong CPU. 2084 */ 2085 rw_enter(&idp->cyi_lock, RW_WRITER); 2086 2087 /* 2088 * Remove the cyclic from the source. As mentioned above, we cannot 2089 * block during this operation; if we cannot remove the cyclic 2090 * without waiting, we spin for a time shorter than the interval, and 2091 * reattempt the (non-blocking) removal. 
	 * If we continue to fail,
	 * we will exponentially back off (up to half of the interval).
	 * Note that the removal will ultimately succeed -- even if the
	 * cyclic handler is blocked on a resource held by a thread which we
	 * have preempted, priority inheritance assures that the preempted
	 * thread will preempt us and continue to progress.
	 */
	for (delay = NANOSEC / MICROSEC; ; delay <<= 1) {
		/*
		 * Before we begin this operation, disable kernel preemption.
		 */
		kpreempt_disable();
		if (cyclic_remove_here(src, ndx, &when, CY_NOWAIT))
			break;

		/*
		 * The operation failed; enable kernel preemption while
		 * spinning.
		 */
		kpreempt_enable();

		CYC_PTRACE("remove-retry", idp, src);

		if (delay > (cyclic->cy_interval >> 1))
			delay = cyclic->cy_interval >> 1;

		/*
		 * Drop the RW lock to avoid a deadlock with the cyclic
		 * handler (because it can potentially call cyclic_reprogram()).
		 */
		rw_exit(&idp->cyi_lock);
		drv_usecwait((clock_t)(delay / (NANOSEC / MICROSEC)));
		rw_enter(&idp->cyi_lock, RW_WRITER);
	}

	/*
	 * Now add the cyclic to the destination. This won't block; we
	 * performed any necessary (blocking) expansion of the destination
	 * CPU before removing the cyclic from the source CPU.
	 */
	idp->cyi_ndx = cyclic_add_here(dest, &hdlr, &when, flags);
	idp->cyi_cpu = dest;
	kpreempt_enable();

	/*
	 * Now that we have successfully relocated the cyclic, allow
	 * it to be reprogrammed.
	 */
	rw_exit(&idp->cyi_lock);
}

static int
cyclic_juggle_one(cyc_id_t *idp)
{
	cyc_index_t ndx = idp->cyi_ndx;
	cyc_cpu_t *cpu = idp->cyi_cpu, *dest;
	cyclic_t *cyclic = &cpu->cyp_cyclics[ndx];
	cpu_t *c = cpu->cyp_cpu;
	cpupart_t *part = c->cpu_part;

	CYC_PTRACE("juggle-one", idp, cpu);
	ASSERT(MUTEX_HELD(&cpu_lock));
	ASSERT(!(c->cpu_flags & CPU_OFFLINE));
	ASSERT(cpu->cyp_state == CYS_ONLINE);
	ASSERT(!(cyclic->cy_flags & CYF_FREE));

	if ((dest = cyclic_pick_cpu(part, c, c, cyclic->cy_flags)) == NULL) {
		/*
		 * Bad news: this cyclic can't be juggled.
		 */
		CYC_PTRACE("juggle-fail", idp, cpu)
		return (0);
	}

	cyclic_juggle_one_to(idp, dest);

	return (1);
}

static void
cyclic_unbind_cpu(cyclic_id_t id)
{
	cyc_id_t *idp = (cyc_id_t *)id;
	cyc_cpu_t *cpu = idp->cyi_cpu;
	cpu_t *c = cpu->cyp_cpu;
	cyclic_t *cyclic = &cpu->cyp_cyclics[idp->cyi_ndx];

	CYC_PTRACE("unbind-cpu", id, cpu);
	ASSERT(MUTEX_HELD(&cpu_lock));
	ASSERT(cpu->cyp_state == CYS_ONLINE);
	ASSERT(!(cyclic->cy_flags & CYF_FREE));
	ASSERT(cyclic->cy_flags & CYF_CPU_BOUND);

	cyclic->cy_flags &= ~CYF_CPU_BOUND;

	/*
	 * If we were bound to a CPU which has interrupts disabled, we need
	 * to juggle away. This can only fail if we are bound to a
	 * processor set, and if every CPU in the processor set has
	 * interrupts disabled.
2191 */ 2192 if (!(c->cpu_flags & CPU_ENABLE)) { 2193 int res = cyclic_juggle_one(idp); 2194 2195 ASSERT((res && idp->cyi_cpu != cpu) || 2196 (!res && (cyclic->cy_flags & CYF_PART_BOUND))); 2197 } 2198 } 2199 2200 static void 2201 cyclic_bind_cpu(cyclic_id_t id, cpu_t *d) 2202 { 2203 cyc_id_t *idp = (cyc_id_t *)id; 2204 cyc_cpu_t *dest = d->cpu_cyclic, *cpu = idp->cyi_cpu; 2205 cpu_t *c = cpu->cyp_cpu; 2206 cyclic_t *cyclic = &cpu->cyp_cyclics[idp->cyi_ndx]; 2207 cpupart_t *part = c->cpu_part; 2208 2209 CYC_PTRACE("bind-cpu", id, dest); 2210 ASSERT(MUTEX_HELD(&cpu_lock)); 2211 ASSERT(!(d->cpu_flags & CPU_OFFLINE)); 2212 ASSERT(!(c->cpu_flags & CPU_OFFLINE)); 2213 ASSERT(cpu->cyp_state == CYS_ONLINE); 2214 ASSERT(dest != NULL); 2215 ASSERT(dest->cyp_state == CYS_ONLINE); 2216 ASSERT(!(cyclic->cy_flags & CYF_FREE)); 2217 ASSERT(!(cyclic->cy_flags & CYF_CPU_BOUND)); 2218 2219 dest = cyclic_pick_cpu(part, d, NULL, cyclic->cy_flags | CYF_CPU_BOUND); 2220 2221 if (dest != cpu) { 2222 cyclic_juggle_one_to(idp, dest); 2223 cyclic = &dest->cyp_cyclics[idp->cyi_ndx]; 2224 } 2225 2226 cyclic->cy_flags |= CYF_CPU_BOUND; 2227 } 2228 2229 static void 2230 cyclic_unbind_cpupart(cyclic_id_t id) 2231 { 2232 cyc_id_t *idp = (cyc_id_t *)id; 2233 cyc_cpu_t *cpu = idp->cyi_cpu; 2234 cpu_t *c = cpu->cyp_cpu; 2235 cyclic_t *cyc = &cpu->cyp_cyclics[idp->cyi_ndx]; 2236 2237 CYC_PTRACE("unbind-part", idp, c->cpu_part); 2238 ASSERT(MUTEX_HELD(&cpu_lock)); 2239 ASSERT(cpu->cyp_state == CYS_ONLINE); 2240 ASSERT(!(cyc->cy_flags & CYF_FREE)); 2241 ASSERT(cyc->cy_flags & CYF_PART_BOUND); 2242 2243 cyc->cy_flags &= ~CYF_PART_BOUND; 2244 2245 /* 2246 * If we're on a CPU which has interrupts disabled (and if this cyclic 2247 * isn't bound to the CPU), we need to juggle away. 2248 */ 2249 if (!(c->cpu_flags & CPU_ENABLE) && !(cyc->cy_flags & CYF_CPU_BOUND)) { 2250 int res = cyclic_juggle_one(idp); 2251 2252 ASSERT(res && idp->cyi_cpu != cpu); 2253 } 2254 } 2255 2256 static void 2257 cyclic_bind_cpupart(cyclic_id_t id, cpupart_t *part) 2258 { 2259 cyc_id_t *idp = (cyc_id_t *)id; 2260 cyc_cpu_t *cpu = idp->cyi_cpu, *dest; 2261 cpu_t *c = cpu->cyp_cpu; 2262 cyclic_t *cyc = &cpu->cyp_cyclics[idp->cyi_ndx]; 2263 2264 CYC_PTRACE("bind-part", idp, part); 2265 ASSERT(MUTEX_HELD(&cpu_lock)); 2266 ASSERT(!(c->cpu_flags & CPU_OFFLINE)); 2267 ASSERT(cpu->cyp_state == CYS_ONLINE); 2268 ASSERT(!(cyc->cy_flags & CYF_FREE)); 2269 ASSERT(!(cyc->cy_flags & CYF_PART_BOUND)); 2270 ASSERT(part->cp_ncpus > 0); 2271 2272 dest = cyclic_pick_cpu(part, c, NULL, cyc->cy_flags | CYF_PART_BOUND); 2273 2274 if (dest != cpu) { 2275 cyclic_juggle_one_to(idp, dest); 2276 cyc = &dest->cyp_cyclics[idp->cyi_ndx]; 2277 } 2278 2279 cyc->cy_flags |= CYF_PART_BOUND; 2280 } 2281 2282 static void 2283 cyclic_configure(cpu_t *c) 2284 { 2285 cyc_cpu_t *cpu = kmem_zalloc(sizeof (cyc_cpu_t), KM_SLEEP); 2286 cyc_backend_t *nbe = kmem_zalloc(sizeof (cyc_backend_t), KM_SLEEP); 2287 int i; 2288 2289 CYC_PTRACE1("configure", cpu); 2290 ASSERT(MUTEX_HELD(&cpu_lock)); 2291 2292 if (cyclic_id_cache == NULL) 2293 cyclic_id_cache = kmem_cache_create("cyclic_id_cache", 2294 sizeof (cyc_id_t), 0, NULL, NULL, NULL, NULL, NULL, 0); 2295 2296 cpu->cyp_cpu = c; 2297 2298 sema_init(&cpu->cyp_modify_wait, 0, NULL, SEMA_DEFAULT, NULL); 2299 2300 cpu->cyp_size = 1; 2301 cpu->cyp_heap = kmem_zalloc(sizeof (cyc_index_t), KM_SLEEP); 2302 cpu->cyp_cyclics = kmem_zalloc(sizeof (cyclic_t), KM_SLEEP); 2303 cpu->cyp_cyclics->cy_flags = CYF_FREE; 2304 2305 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + 
CY_SOFT_LEVELS; i++) { 2306 /* 2307 * We don't need to set the sizemask; it's already zero 2308 * (which is the appropriate sizemask for a size of 1). 2309 */ 2310 cpu->cyp_softbuf[i].cys_buf[0].cypc_buf = 2311 kmem_alloc(sizeof (cyc_index_t), KM_SLEEP); 2312 } 2313 2314 cpu->cyp_state = CYS_OFFLINE; 2315 2316 /* 2317 * Setup the backend for this CPU. 2318 */ 2319 bcopy(&cyclic_backend, nbe, sizeof (cyc_backend_t)); 2320 nbe->cyb_arg = nbe->cyb_configure(c); 2321 cpu->cyp_backend = nbe; 2322 2323 /* 2324 * On platforms where stray interrupts may be taken during startup, 2325 * the CPU's cpu_cyclic pointer serves as an indicator that the 2326 * cyclic subsystem for this CPU is prepared to field interrupts. 2327 */ 2328 membar_producer(); 2329 2330 c->cpu_cyclic = cpu; 2331 } 2332 2333 static void 2334 cyclic_unconfigure(cpu_t *c) 2335 { 2336 cyc_cpu_t *cpu = c->cpu_cyclic; 2337 cyc_backend_t *be = cpu->cyp_backend; 2338 cyb_arg_t bar = be->cyb_arg; 2339 int i; 2340 2341 CYC_PTRACE1("unconfigure", cpu); 2342 ASSERT(MUTEX_HELD(&cpu_lock)); 2343 ASSERT(cpu->cyp_state == CYS_OFFLINE); 2344 ASSERT(cpu->cyp_nelems == 0); 2345 2346 /* 2347 * Let the backend know that the CPU is being yanked, and free up 2348 * the backend structure. 2349 */ 2350 be->cyb_unconfigure(bar); 2351 kmem_free(be, sizeof (cyc_backend_t)); 2352 cpu->cyp_backend = NULL; 2353 2354 /* 2355 * Free up the producer/consumer buffers at each of the soft levels. 2356 */ 2357 for (i = CY_LOW_LEVEL; i < CY_LOW_LEVEL + CY_SOFT_LEVELS; i++) { 2358 cyc_softbuf_t *softbuf = &cpu->cyp_softbuf[i]; 2359 uchar_t hard = softbuf->cys_hard; 2360 cyc_pcbuffer_t *pc = &softbuf->cys_buf[hard]; 2361 size_t bufsize = sizeof (cyc_index_t) * (pc->cypc_sizemask + 1); 2362 2363 /* 2364 * Assert that we're not in the middle of a resize operation. 2365 */ 2366 ASSERT(hard == softbuf->cys_soft); 2367 ASSERT(hard == 0 || hard == 1); 2368 ASSERT(pc->cypc_buf != NULL); 2369 ASSERT(softbuf->cys_buf[hard ^ 1].cypc_buf == NULL); 2370 2371 kmem_free(pc->cypc_buf, bufsize); 2372 pc->cypc_buf = NULL; 2373 } 2374 2375 /* 2376 * Finally, clean up our remaining dynamic structures and NULL out 2377 * the cpu_cyclic pointer. 2378 */ 2379 kmem_free(cpu->cyp_cyclics, cpu->cyp_size * sizeof (cyclic_t)); 2380 kmem_free(cpu->cyp_heap, cpu->cyp_size * sizeof (cyc_index_t)); 2381 kmem_free(cpu, sizeof (cyc_cpu_t)); 2382 2383 c->cpu_cyclic = NULL; 2384 } 2385 2386 static int 2387 cyclic_cpu_setup(cpu_setup_t what, int id, void *arg __unused) 2388 { 2389 /* 2390 * We are guaranteed that there is still/already an entry in the 2391 * cpu array for this CPU. 
2392 */ 2393 cpu_t *c = cpu[id]; 2394 cyc_cpu_t *cyp = c->cpu_cyclic; 2395 2396 ASSERT(MUTEX_HELD(&cpu_lock)); 2397 2398 switch (what) { 2399 case CPU_CONFIG: 2400 ASSERT(cyp == NULL); 2401 cyclic_configure(c); 2402 break; 2403 2404 case CPU_UNCONFIG: 2405 ASSERT(cyp != NULL && cyp->cyp_state == CYS_OFFLINE); 2406 cyclic_unconfigure(c); 2407 break; 2408 2409 default: 2410 break; 2411 } 2412 2413 return (0); 2414 } 2415 2416 static void 2417 cyclic_suspend_xcall(cyc_xcallarg_t *arg) 2418 { 2419 cyc_cpu_t *cpu = arg->cyx_cpu; 2420 cyc_backend_t *be = cpu->cyp_backend; 2421 cyc_cookie_t cookie; 2422 cyb_arg_t bar = be->cyb_arg; 2423 2424 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 2425 2426 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "suspend-xcall", cpu->cyp_nelems); 2427 ASSERT(cpu->cyp_state == CYS_ONLINE || cpu->cyp_state == CYS_OFFLINE); 2428 2429 /* 2430 * We won't disable this CPU unless it has a non-zero number of 2431 * elements (cpu_lock assures that no one else may be attempting 2432 * to disable this CPU). 2433 */ 2434 if (cpu->cyp_nelems > 0) { 2435 ASSERT(cpu->cyp_state == CYS_ONLINE); 2436 be->cyb_disable(bar); 2437 } 2438 2439 if (cpu->cyp_state == CYS_ONLINE) 2440 cpu->cyp_state = CYS_SUSPENDED; 2441 2442 be->cyb_suspend(bar); 2443 be->cyb_restore_level(bar, cookie); 2444 } 2445 2446 static void 2447 cyclic_resume_xcall(cyc_xcallarg_t *arg) 2448 { 2449 cyc_cpu_t *cpu = arg->cyx_cpu; 2450 cyc_backend_t *be = cpu->cyp_backend; 2451 cyc_cookie_t cookie; 2452 cyb_arg_t bar = be->cyb_arg; 2453 cyc_state_t state = cpu->cyp_state; 2454 2455 cookie = be->cyb_set_level(bar, CY_HIGH_LEVEL); 2456 2457 CYC_TRACE1(cpu, CY_HIGH_LEVEL, "resume-xcall", cpu->cyp_nelems); 2458 ASSERT(state == CYS_SUSPENDED || state == CYS_OFFLINE); 2459 2460 be->cyb_resume(bar); 2461 2462 /* 2463 * We won't enable this CPU unless it has a non-zero number of 2464 * elements. 
	 */
	if (cpu->cyp_nelems > 0) {
		cyclic_t *cyclic = &cpu->cyp_cyclics[cpu->cyp_heap[0]];
		hrtime_t exp = cyclic->cy_expire;

		CYC_TRACE(cpu, CY_HIGH_LEVEL, "resume-reprog", cyclic, exp);
		ASSERT(state == CYS_SUSPENDED);
		be->cyb_enable(bar);
		be->cyb_reprogram(bar, exp);
	}

	if (state == CYS_SUSPENDED)
		cpu->cyp_state = CYS_ONLINE;

	CYC_TRACE1(cpu, CY_HIGH_LEVEL, "resume-done", cpu->cyp_nelems);
	be->cyb_restore_level(bar, cookie);
}

static void
cyclic_omni_start(cyc_id_t *idp, cyc_cpu_t *cpu)
{
	cyc_omni_handler_t *omni = &idp->cyi_omni_hdlr;
	cyc_omni_cpu_t *ocpu = kmem_alloc(sizeof (cyc_omni_cpu_t), KM_SLEEP);
	cyc_handler_t hdlr;
	cyc_time_t when;

	CYC_PTRACE("omni-start", cpu, idp);
	ASSERT(MUTEX_HELD(&cpu_lock));
	ASSERT(cpu->cyp_state == CYS_ONLINE);
	ASSERT(idp->cyi_cpu == NULL);

	hdlr.cyh_func = NULL;
	hdlr.cyh_arg = NULL;
	hdlr.cyh_level = CY_LEVELS;

	when.cyt_when = 0;
	when.cyt_interval = 0;

	omni->cyo_online(omni->cyo_arg, cpu->cyp_cpu, &hdlr, &when);

	ASSERT(hdlr.cyh_func != NULL);
	ASSERT(hdlr.cyh_level < CY_LEVELS);
	ASSERT(when.cyt_when >= 0 && when.cyt_interval > 0);

	ocpu->cyo_cpu = cpu;
	ocpu->cyo_arg = hdlr.cyh_arg;
	ocpu->cyo_ndx = cyclic_add_here(cpu, &hdlr, &when, 0);
	ocpu->cyo_next = idp->cyi_omni_list;
	idp->cyi_omni_list = ocpu;
}

static void
cyclic_omni_stop(cyc_id_t *idp, cyc_cpu_t *cpu)
{
	cyc_omni_handler_t *omni = &idp->cyi_omni_hdlr;
	cyc_omni_cpu_t *ocpu = idp->cyi_omni_list, *prev = NULL;
	clock_t delay;
	int ret;

	CYC_PTRACE("omni-stop", cpu, idp);
	ASSERT(MUTEX_HELD(&cpu_lock));
	ASSERT(cpu->cyp_state == CYS_ONLINE);
	ASSERT(idp->cyi_cpu == NULL);
	ASSERT(ocpu != NULL);

	/*
	 * Prevent a reprogram of this cyclic while we are removing it.
	 * Otherwise, cyclic_reprogram_here() will end up sending an X-call
	 * to the offlined CPU.
	 */
	rw_enter(&idp->cyi_lock, RW_WRITER);

	while (ocpu != NULL && ocpu->cyo_cpu != cpu) {
		prev = ocpu;
		ocpu = ocpu->cyo_next;
	}

	/*
	 * We _must_ have found a cyc_omni_cpu which corresponds to this
	 * CPU -- the definition of an omnipresent cyclic is that it runs
	 * on all online CPUs.
	 */
	ASSERT(ocpu != NULL);

	if (prev == NULL) {
		idp->cyi_omni_list = ocpu->cyo_next;
	} else {
		prev->cyo_next = ocpu->cyo_next;
	}

	/*
	 * Remove the cyclic from the source. We cannot block during this
	 * operation because we are holding the cyi_lock which can be held
	 * by the cyclic handler via cyclic_reprogram().
	 *
	 * If we cannot remove the cyclic without waiting, we spin for a time,
	 * and reattempt the (non-blocking) removal. If the handler is blocked
	 * on the cyi_lock, then we let go of it in the spin loop to give
	 * the handler a chance to run. Note that the removal will ultimately
	 * succeed -- even if the cyclic handler is blocked on a resource
	 * held by a thread which we have preempted, priority inheritance
	 * assures that the preempted thread will preempt us and continue
	 * to progress.
	 */
	for (delay = 1; ; delay <<= 1) {
		/*
		 * Before we begin this operation, disable kernel preemption.
		 */
		kpreempt_disable();
		ret = cyclic_remove_here(ocpu->cyo_cpu, ocpu->cyo_ndx, NULL,
		    CY_NOWAIT);
		/*
		 * Enable kernel preemption while spinning.
		 */
		kpreempt_enable();

		if (ret)
			break;

		CYC_PTRACE("remove-omni-retry", idp, ocpu->cyo_cpu);

		/*
		 * Drop the RW lock to avoid a deadlock with the cyclic
		 * handler (because it can potentially call cyclic_reprogram()).
		 */
		rw_exit(&idp->cyi_lock);
		drv_usecwait(delay);
		rw_enter(&idp->cyi_lock, RW_WRITER);
	}

	/*
	 * Now that we have successfully removed the cyclic, allow the omni
	 * cyclic to be reprogrammed on other CPUs.
	 */
	rw_exit(&idp->cyi_lock);

	/*
	 * The cyclic has been removed from this CPU; time to call the
	 * omnipresent offline handler.
	 */
	if (omni->cyo_offline != NULL)
		omni->cyo_offline(omni->cyo_arg, cpu->cyp_cpu, ocpu->cyo_arg);

	kmem_free(ocpu, sizeof (cyc_omni_cpu_t));
}

static cyc_id_t *
cyclic_new_id()
{
	cyc_id_t *idp;

	ASSERT(MUTEX_HELD(&cpu_lock));

	idp = kmem_cache_alloc(cyclic_id_cache, KM_SLEEP);

	/*
	 * The cyi_cpu field of the cyc_id_t structure tracks the CPU
	 * associated with the cyclic. If and only if this field is NULL, the
	 * cyc_id_t is an omnipresent cyclic. Note that cyi_omni_list may be
	 * NULL for an omnipresent cyclic while the cyclic is being created
	 * or destroyed.
	 */
	idp->cyi_cpu = NULL;
	idp->cyi_ndx = 0;
	rw_init(&idp->cyi_lock, NULL, RW_DEFAULT, NULL);

	idp->cyi_next = cyclic_id_head;
	idp->cyi_prev = NULL;
	idp->cyi_omni_list = NULL;

	if (cyclic_id_head != NULL) {
		ASSERT(cyclic_id_head->cyi_prev == NULL);
		cyclic_id_head->cyi_prev = idp;
	}

	cyclic_id_head = idp;

	return (idp);
}

/*
 *  cyclic_id_t cyclic_add(cyc_handler_t *, cyc_time_t *)
 *
 *  Overview
 *
 *    cyclic_add() will create an unbound cyclic with the specified handler and
 *    interval. The cyclic will run on a CPU which both has interrupts enabled
 *    and is in the system CPU partition.
 *
 *  Arguments and notes
 *
 *    As its first argument, cyclic_add() takes a cyc_handler, which has the
 *    following members:
 *
 *      cyc_func_t cyh_func    <-- Cyclic handler
 *      void *cyh_arg          <-- Argument to cyclic handler
 *      cyc_level_t cyh_level  <-- Level at which to fire; must be one of
 *                                 CY_LOW_LEVEL, CY_LOCK_LEVEL or CY_HIGH_LEVEL
 *
 *    Note that cyh_level is _not_ an ipl or spl; it must be one of the
 *    CY_*_LEVELs. This layer of abstraction allows the platform to define
 *    the precise interrupt priority levels, within the following constraints:
 *
 *      CY_LOCK_LEVEL must map to LOCK_LEVEL
 *      CY_HIGH_LEVEL must map to an ipl greater than LOCK_LEVEL
 *      CY_LOW_LEVEL must map to an ipl below LOCK_LEVEL
 *
 *    In addition to a cyc_handler, cyclic_add() takes a cyc_time, which
 *    has the following members:
 *
 *      hrtime_t cyt_when      <-- Absolute time, in nanoseconds since boot, at
 *                                 which to start firing
 *      hrtime_t cyt_interval  <-- Length of interval, in nanoseconds
 *
 *    gethrtime() is the time source for nanoseconds since boot. If cyt_when
 *    is set to 0, the cyclic will start to fire when cyt_interval next
 *    divides the number of nanoseconds since boot.
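 *
 *    As an illustrative sketch (my_handler and my_arg are hypothetical
 *    names, not part of the interface), a consumer that wants
 *    my_handler(my_arg) called every 10 milliseconds at low level, starting
 *    on the next 10 millisecond boundary, might do:
 *
 *      cyc_handler_t hdlr;
 *      cyc_time_t when;
 *      cyclic_id_t id;
 *
 *      hdlr.cyh_func = my_handler;
 *      hdlr.cyh_arg = my_arg;
 *      hdlr.cyh_level = CY_LOW_LEVEL;
 *
 *      when.cyt_when = 0;
 *      when.cyt_interval = 10000000;
 *
 *      mutex_enter(&cpu_lock);
 *      id = cyclic_add(&hdlr, &when);
 *      mutex_exit(&cpu_lock);
 *
 *    With these values, a cyclic added when gethrtime() is (say) 25
 *    milliseconds since boot will first fire at 30 milliseconds since boot,
 *    and every 10 milliseconds thereafter.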
2682 * 2683 * The cyt_interval field _must_ be filled in by the caller; one-shots are 2684 * _not_ explicitly supported by the cyclic subsystem (cyclic_add() will 2685 * assert that cyt_interval is non-zero). The maximum value for either 2686 * field is INT64_MAX; the caller is responsible for assuring that 2687 * cyt_when + cyt_interval <= INT64_MAX. Neither field may be negative. 2688 * 2689 * For an arbitrary time t in the future, the cyclic handler is guaranteed 2690 * to have been called (t - cyt_when) / cyt_interval times. This will 2691 * be true even if interrupts have been disabled for periods greater than 2692 * cyt_interval nanoseconds. In order to compensate for such periods, 2693 * the cyclic handler may be called a finite number of times with an 2694 * arbitrarily small interval. 2695 * 2696 * The cyclic subsystem will not enforce any lower bound on the interval; 2697 * if the interval is less than the time required to process an interrupt, 2698 * the CPU will wedge. It's the responsibility of the caller to assure that 2699 * either the value of the interval is sane, or that its caller has 2700 * sufficient privilege to deny service (i.e. its caller is root). 2701 * 2702 * The cyclic handler is guaranteed to be single threaded, even while the 2703 * cyclic is being juggled between CPUs (see cyclic_juggle(), below). 2704 * That is, a given cyclic handler will never be executed simultaneously 2705 * on different CPUs. 2706 * 2707 * Return value 2708 * 2709 * cyclic_add() returns a cyclic_id_t, which is guaranteed to be a value 2710 * other than CYCLIC_NONE. cyclic_add() cannot fail. 2711 * 2712 * Caller's context 2713 * 2714 * cpu_lock must be held by the caller, and the caller must not be in 2715 * interrupt context. cyclic_add() will perform a KM_SLEEP kernel 2716 * memory allocation, so the usual rules (e.g. p_lock cannot be held) 2717 * apply. A cyclic may be added even in the presence of CPUs that have 2718 * not been configured with respect to the cyclic subsystem, but only 2719 * configured CPUs will be eligible to run the new cyclic. 2720 * 2721 * Cyclic handler's context 2722 * 2723 * Cyclic handlers will be executed in the interrupt context corresponding 2724 * to the specified level (i.e. either high, lock or low level). The 2725 * usual context rules apply. 2726 * 2727 * A cyclic handler may not grab ANY locks held by the caller of any of 2728 * cyclic_add(), cyclic_remove() or cyclic_bind(); the implementation of 2729 * these functions may require blocking on cyclic handler completion. 2730 * Moreover, cyclic handlers may not make any call back into the cyclic 2731 * subsystem. 2732 */ 2733 cyclic_id_t 2734 cyclic_add(cyc_handler_t *hdlr, cyc_time_t *when) 2735 { 2736 cyc_id_t *idp = cyclic_new_id(); 2737 2738 ASSERT(MUTEX_HELD(&cpu_lock)); 2739 ASSERT(when->cyt_when >= 0 && when->cyt_interval > 0); 2740 2741 idp->cyi_cpu = cyclic_pick_cpu(NULL, NULL, NULL, 0); 2742 idp->cyi_ndx = cyclic_add_here(idp->cyi_cpu, hdlr, when, 0); 2743 2744 return ((uintptr_t)idp); 2745 } 2746 2747 /* 2748 * cyclic_id_t cyclic_add_omni(cyc_omni_handler_t *) 2749 * 2750 * Overview 2751 * 2752 * cyclic_add_omni() will create an omnipresent cyclic with the specified 2753 * online and offline handlers. Omnipresent cyclics run on all online 2754 * CPUs, including CPUs which have unbound interrupts disabled. 
2755 * 2756 * Arguments 2757 * 2758 * As its only argument, cyclic_add_omni() takes a cyc_omni_handler, which 2759 * has the following members: 2760 * 2761 * void (*cyo_online)() <-- Online handler 2762 * void (*cyo_offline)() <-- Offline handler 2763 * void *cyo_arg <-- Argument to be passed to on/offline handlers 2764 * 2765 * Online handler 2766 * 2767 * The cyo_online member is a pointer to a function which has the following 2768 * four arguments: 2769 * 2770 * void * <-- Argument (cyo_arg) 2771 * cpu_t * <-- Pointer to CPU about to be onlined 2772 * cyc_handler_t * <-- Pointer to cyc_handler_t; must be filled in 2773 * by omni online handler 2774 * cyc_time_t * <-- Pointer to cyc_time_t; must be filled in by 2775 * omni online handler 2776 * 2777 * The omni cyclic online handler is always called _before_ the omni 2778 * cyclic begins to fire on the specified CPU. As the above argument 2779 * description implies, the online handler must fill in the two structures 2780 * passed to it: the cyc_handler_t and the cyc_time_t. These are the 2781 * same two structures passed to cyclic_add(), outlined above. This 2782 * allows the omni cyclic to have maximum flexibility; different CPUs may 2783 * optionally 2784 * 2785 * (a) have different intervals 2786 * (b) be explicitly in or out of phase with one another 2787 * (c) have different handlers 2788 * (d) have different handler arguments 2789 * (e) fire at different levels 2790 * 2791 * Of these, (e) seems somewhat dubious, but is nonetheless allowed. 2792 * 2793 * The omni online handler is called in the same context as cyclic_add(), 2794 * and has the same liberties: omni online handlers may perform KM_SLEEP 2795 * kernel memory allocations, and may grab locks which are also acquired 2796 * by cyclic handlers. However, omni cyclic online handlers may _not_ 2797 * call back into the cyclic subsystem, and should be generally careful 2798 * about calling into arbitrary kernel subsystems. 2799 * 2800 * Offline handler 2801 * 2802 * The cyo_offline member is a pointer to a function which has the following 2803 * three arguments: 2804 * 2805 * void * <-- Argument (cyo_arg) 2806 * cpu_t * <-- Pointer to CPU about to be offlined 2807 * void * <-- CPU's cyclic argument (that is, value 2808 * to which cyh_arg member of the cyc_handler_t 2809 * was set in the omni online handler) 2810 * 2811 * The omni cyclic offline handler is always called _after_ the omni 2812 * cyclic has ceased firing on the specified CPU. Its purpose is to 2813 * allow cleanup of any resources dynamically allocated in the omni cyclic 2814 * online handler. The context of the offline handler is identical to 2815 * that of the online handler; the same constraints and liberties apply. 2816 * 2817 * The offline handler is optional; it may be NULL. 2818 * 2819 * Return value 2820 * 2821 * cyclic_add_omni() returns a cyclic_id_t, which is guaranteed to be a 2822 * value other than CYCLIC_NONE. cyclic_add_omni() cannot fail. 2823 * 2824 * Caller's context 2825 * 2826 * The caller's context is identical to that of cyclic_add(), specified 2827 * above. 
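 *
 *    As an illustrative sketch (my_online, my_percpu_handler and
 *    my_percpu_arg are hypothetical names, not part of the interface), an
 *    omni online handler which arranges for a handler to run once per
 *    second at lock level on each CPU might look like:
 *
 *      static void
 *      my_online(void *arg, cpu_t *c, cyc_handler_t *hdlr, cyc_time_t *when)
 *      {
 *              hdlr->cyh_func = my_percpu_handler;
 *              hdlr->cyh_arg = my_percpu_arg(arg, c);
 *              hdlr->cyh_level = CY_LOCK_LEVEL;
 *
 *              when->cyt_when = 0;
 *              when->cyt_interval = NANOSEC;
 *      }
 *
 *    The cyc_omni_handler_t (cyo_online, cyo_offline and cyo_arg filled in)
 *    is then passed to cyclic_add_omni() with cpu_lock held.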
 */
cyclic_id_t
cyclic_add_omni(cyc_omni_handler_t *omni)
{
	cyc_id_t *idp = cyclic_new_id();
	cyc_cpu_t *cpu;
	cpu_t *c;

	ASSERT(MUTEX_HELD(&cpu_lock));
	ASSERT(omni != NULL && omni->cyo_online != NULL);

	idp->cyi_omni_hdlr = *omni;

	c = cpu_list;
	do {
		if ((cpu = c->cpu_cyclic) == NULL)
			continue;

		if (cpu->cyp_state != CYS_ONLINE) {
			ASSERT(cpu->cyp_state == CYS_OFFLINE);
			continue;
		}

		cyclic_omni_start(idp, cpu);
	} while ((c = c->cpu_next) != cpu_list);

	/*
	 * We must have found at least one online CPU on which to run
	 * this cyclic.
	 */
	ASSERT(idp->cyi_omni_list != NULL);
	ASSERT(idp->cyi_cpu == NULL);

	return ((uintptr_t)idp);
}

/*
 *  void cyclic_remove(cyclic_id_t)
 *
 *  Overview
 *
 *    cyclic_remove() will remove the specified cyclic from the system.
 *
 *  Arguments and notes
 *
 *    The only argument is a cyclic_id returned from either cyclic_add() or
 *    cyclic_add_omni().
 *
 *    By the time cyclic_remove() returns, the caller is guaranteed that the
 *    removed cyclic handler has completed execution (this is the same
 *    semantic that untimeout() provides). As a result, cyclic_remove() may
 *    need to block, waiting for the removed cyclic to complete execution.
 *    This leads to an important constraint on the caller: no lock may be
 *    held across cyclic_remove() that also may be acquired by a cyclic
 *    handler.
 *
 *  Return value
 *
 *    None; cyclic_remove() always succeeds.
 *
 *  Caller's context
 *
 *    cpu_lock must be held by the caller, and the caller must not be in
 *    interrupt context. The caller may not hold any locks which are also
 *    grabbed by any cyclic handler. See "Arguments and notes", above.
 */
void
cyclic_remove(cyclic_id_t id)
{
	cyc_id_t *idp = (cyc_id_t *)id;
	cyc_id_t *prev = idp->cyi_prev, *next = idp->cyi_next;
	cyc_cpu_t *cpu = idp->cyi_cpu;

	CYC_PTRACE("remove", idp, idp->cyi_cpu);
	ASSERT(MUTEX_HELD(&cpu_lock));

	if (cpu != NULL) {
		(void) cyclic_remove_here(cpu, idp->cyi_ndx, NULL, CY_WAIT);
	} else {
		ASSERT(idp->cyi_omni_list != NULL);
		while (idp->cyi_omni_list != NULL)
			cyclic_omni_stop(idp, idp->cyi_omni_list->cyo_cpu);
	}

	if (prev != NULL) {
		ASSERT(cyclic_id_head != idp);
		prev->cyi_next = next;
	} else {
		ASSERT(cyclic_id_head == idp);
		cyclic_id_head = next;
	}

	if (next != NULL)
		next->cyi_prev = prev;

	kmem_cache_free(cyclic_id_cache, idp);
}

/*
 *  void cyclic_bind(cyclic_id_t, cpu_t *, cpupart_t *)
 *
 *  Overview
 *
 *    cyclic_bind() atomically changes the CPU and CPU partition bindings
 *    of a cyclic.
 *
 *  Arguments and notes
 *
 *    The first argument is a cyclic_id returned from cyclic_add().
 *    cyclic_bind() may _not_ be called on a cyclic_id returned from
 *    cyclic_add_omni().
 *
 *    The second argument specifies the CPU to which to bind the specified
 *    cyclic. If the specified cyclic is bound to a CPU other than the one
 *    specified, it will be unbound from its bound CPU. Unbinding the cyclic
 *    from its CPU may cause it to be juggled to another CPU. If the specified
 *    CPU is non-NULL, the cyclic will be subsequently rebound to the specified
 *    CPU.
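 *
 *    As an illustrative sketch (assuming "cp" is a valid, configured cpu_t
 *    pointer and "id" was returned from cyclic_add()), a caller binds the
 *    cyclic to cp, and later dissolves that binding, with:
 *
 *      mutex_enter(&cpu_lock);
 *      cyclic_bind(id, cp, NULL);
 *      mutex_exit(&cpu_lock);
 *
 *      ...
 *
 *      mutex_enter(&cpu_lock);
 *      cyclic_bind(id, NULL, NULL);
 *      mutex_exit(&cpu_lock);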
2946 * 2947 * If a CPU with bound cyclics is transitioned into the P_NOINTR state, 2948 * only cyclics not bound to the CPU can be juggled away; CPU-bound cyclics 2949 * will continue to fire on the P_NOINTR CPU. A CPU with bound cyclics 2950 * cannot be offlined (attempts to offline the CPU will return EBUSY). 2951 * Likewise, cyclics may not be bound to an offline CPU; if the caller 2952 * attempts to bind a cyclic to an offline CPU, the cyclic subsystem will 2953 * panic. 2954 * 2955 * The third argument specifies the CPU partition to which to bind the 2956 * specified cyclic. If the specified cyclic is bound to a CPU partition 2957 * other than the one specified, it will be unbound from its bound 2958 * partition. Unbinding the cyclic from its CPU partition may cause it 2959 * to be juggled to another CPU. If the specified CPU partition is 2960 * non-NULL, the cyclic will be subsequently rebound to the specified CPU 2961 * partition. 2962 * 2963 * It is the caller's responsibility to assure that the specified CPU 2964 * partition contains a CPU. If it does not, the cyclic subsystem will 2965 * panic. A CPU partition with bound cyclics cannot be destroyed (attempts 2966 * to destroy the partition will return EBUSY). If a CPU with 2967 * partition-bound cyclics is transitioned into the P_NOINTR state, cyclics 2968 * bound to the CPU's partition (but not bound to the CPU) will be juggled 2969 * away only if there exists another CPU in the partition in the P_ONLINE 2970 * state. 2971 * 2972 * It is the caller's responsibility to assure that the specified CPU and 2973 * CPU partition are self-consistent. If both parameters are non-NULL, 2974 * and the specified CPU partition does not contain the specified CPU, the 2975 * cyclic subsystem will panic. 2976 * 2977 * It is the caller's responsibility to assure that the specified CPU has 2978 * been configured with respect to the cyclic subsystem. Generally, this 2979 * is always true for valid, on-line CPUs. The only periods of time during 2980 * which this may not be true are during MP boot (i.e. after cyclic_init() 2981 * is called but before cyclic_mp_init() is called) or during dynamic 2982 * reconfiguration; cyclic_bind() should only be called with great care 2983 * from these contexts. 2984 * 2985 * Return value 2986 * 2987 * None; cyclic_bind() always succeeds. 2988 * 2989 * Caller's context 2990 * 2991 * cpu_lock must be held by the caller, and the caller must not be in 2992 * interrupt context. The caller may not hold any locks which are also 2993 * grabbed by any cyclic handler. 2994 */ 2995 void 2996 cyclic_bind(cyclic_id_t id, cpu_t *d, cpupart_t *part) 2997 { 2998 cyc_id_t *idp = (cyc_id_t *)id; 2999 cyc_cpu_t *cpu = idp->cyi_cpu; 3000 cpu_t *c; 3001 uint16_t flags; 3002 3003 CYC_PTRACE("bind", d, part); 3004 ASSERT(MUTEX_HELD(&cpu_lock)); 3005 ASSERT(part == NULL || d == NULL || d->cpu_part == part); 3006 3007 if (cpu == NULL) { 3008 ASSERT(idp->cyi_omni_list != NULL); 3009 panic("attempt to change binding of omnipresent cyclic"); 3010 } 3011 3012 c = cpu->cyp_cpu; 3013 flags = cpu->cyp_cyclics[idp->cyi_ndx].cy_flags; 3014 3015 if (c != d && (flags & CYF_CPU_BOUND)) 3016 cyclic_unbind_cpu(id); 3017 3018 /* 3019 * Reload our cpu (we may have migrated). We don't have to reload 3020 * the flags field here; if we were CYF_PART_BOUND on entry, we are 3021 * CYF_PART_BOUND now. 
3022 */ 3023 cpu = idp->cyi_cpu; 3024 c = cpu->cyp_cpu; 3025 3026 if (part != c->cpu_part && (flags & CYF_PART_BOUND)) 3027 cyclic_unbind_cpupart(id); 3028 3029 /* 3030 * Now reload the flags field, asserting that if we are CPU bound, 3031 * the CPU was specified (and likewise, if we are partition bound, 3032 * the partition was specified). 3033 */ 3034 cpu = idp->cyi_cpu; 3035 c = cpu->cyp_cpu; 3036 flags = cpu->cyp_cyclics[idp->cyi_ndx].cy_flags; 3037 ASSERT(!(flags & CYF_CPU_BOUND) || c == d); 3038 ASSERT(!(flags & CYF_PART_BOUND) || c->cpu_part == part); 3039 3040 if (!(flags & CYF_CPU_BOUND) && d != NULL) 3041 cyclic_bind_cpu(id, d); 3042 3043 if (!(flags & CYF_PART_BOUND) && part != NULL) 3044 cyclic_bind_cpupart(id, part); 3045 } 3046 3047 int 3048 cyclic_reprogram(cyclic_id_t id, hrtime_t expiration) 3049 { 3050 cyc_id_t *idp = (cyc_id_t *)id; 3051 cyc_cpu_t *cpu; 3052 cyc_omni_cpu_t *ocpu; 3053 cyc_index_t ndx; 3054 3055 ASSERT(expiration > 0); 3056 3057 CYC_PTRACE("reprog", idp, idp->cyi_cpu); 3058 3059 kpreempt_disable(); 3060 3061 /* 3062 * Prevent the cyclic from moving or disappearing while we reprogram. 3063 */ 3064 rw_enter(&idp->cyi_lock, RW_READER); 3065 3066 if (idp->cyi_cpu == NULL) { 3067 ASSERT(curthread->t_preempt > 0); 3068 cpu = CPU->cpu_cyclic; 3069 3070 /* 3071 * For an omni cyclic, we reprogram the cyclic corresponding 3072 * to the current CPU. Look for it in the list. 3073 */ 3074 ocpu = idp->cyi_omni_list; 3075 while (ocpu != NULL) { 3076 if (ocpu->cyo_cpu == cpu) 3077 break; 3078 ocpu = ocpu->cyo_next; 3079 } 3080 3081 if (ocpu == NULL) { 3082 /* 3083 * Didn't find it. This means that CPU offline 3084 * must have removed it racing with us. So, 3085 * nothing to do. 3086 */ 3087 rw_exit(&idp->cyi_lock); 3088 3089 kpreempt_enable(); 3090 3091 return (0); 3092 } 3093 ndx = ocpu->cyo_ndx; 3094 } else { 3095 cpu = idp->cyi_cpu; 3096 ndx = idp->cyi_ndx; 3097 } 3098 3099 if (cpu->cyp_cpu == CPU) 3100 cyclic_reprogram_cyclic(cpu, ndx, expiration); 3101 else 3102 cyclic_reprogram_here(cpu, ndx, expiration); 3103 3104 /* 3105 * Allow the cyclic to be moved or removed. 3106 */ 3107 rw_exit(&idp->cyi_lock); 3108 3109 kpreempt_enable(); 3110 3111 return (1); 3112 } 3113 3114 hrtime_t 3115 cyclic_getres() 3116 { 3117 return (cyclic_resolution); 3118 } 3119 3120 void 3121 cyclic_init(cyc_backend_t *be, hrtime_t resolution) 3122 { 3123 ASSERT(MUTEX_HELD(&cpu_lock)); 3124 3125 CYC_PTRACE("init", be, resolution); 3126 cyclic_resolution = resolution; 3127 3128 /* 3129 * Copy the passed cyc_backend into the backend template. This must 3130 * be done before the CPU can be configured. 3131 */ 3132 bcopy(be, &cyclic_backend, sizeof (cyc_backend_t)); 3133 3134 /* 3135 * It's safe to look at the "CPU" pointer without disabling kernel 3136 * preemption; cyclic_init() is called only during startup by the 3137 * cyclic backend. 3138 */ 3139 cyclic_configure(CPU); 3140 cyclic_online(CPU); 3141 } 3142 3143 /* 3144 * It is assumed that cyclic_mp_init() is called some time after cyclic 3145 * init (and therefore, after cpu0 has been initialized). We grab cpu_lock, 3146 * find the already initialized CPU, and initialize every other CPU with the 3147 * same backend. Finally, we register a cpu_setup function. 
 */
void
cyclic_mp_init()
{
	cpu_t *c;

	mutex_enter(&cpu_lock);

	c = cpu_list;
	do {
		if (c->cpu_cyclic == NULL) {
			cyclic_configure(c);
			cyclic_online(c);
		}
	} while ((c = c->cpu_next) != cpu_list);

	register_cpu_setup_func(cyclic_cpu_setup, NULL);
	mutex_exit(&cpu_lock);
}

/*
 *  int cyclic_juggle(cpu_t *)
 *
 *  Overview
 *
 *    cyclic_juggle() juggles as many cyclics as possible away from the
 *    specified CPU; all remaining cyclics on the CPU will either be CPU-
 *    or partition-bound.
 *
 *  Arguments and notes
 *
 *    The only argument to cyclic_juggle() is the CPU from which cyclics
 *    should be juggled. CPU-bound cyclics are never juggled; partition-bound
 *    cyclics are only juggled if the specified CPU is in the P_NOINTR state
 *    and there exists a P_ONLINE CPU in the partition. The cyclic subsystem
 *    assures that a cyclic will never fire late or spuriously, even while
 *    being juggled.
 *
 *  Return value
 *
 *    cyclic_juggle() returns a non-zero value if all cyclics were able to
 *    be juggled away from the CPU, and zero if one or more cyclics could
 *    not be juggled away.
 *
 *  Caller's context
 *
 *    cpu_lock must be held by the caller, and the caller must not be in
 *    interrupt context. The caller may not hold any locks which are also
 *    grabbed by any cyclic handler. While cyclic_juggle() _may_ be called
 *    in any context satisfying these constraints, it _must_ be called
 *    immediately after clearing CPU_ENABLE (i.e. before dropping cpu_lock).
 *    Failure to do so could result in an assertion failure in the cyclic
 *    subsystem.
 */
int
cyclic_juggle(cpu_t *c)
{
	cyc_cpu_t *cpu = c->cpu_cyclic;
	cyc_id_t *idp;
	int all_juggled = 1;

	CYC_PTRACE1("juggle", c);
	ASSERT(MUTEX_HELD(&cpu_lock));

	/*
	 * We'll go through each cyclic on the CPU, attempting to juggle
	 * each one elsewhere.
	 */
	for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) {
		if (idp->cyi_cpu != cpu)
			continue;

		if (cyclic_juggle_one(idp) == 0) {
			all_juggled = 0;
			continue;
		}

		ASSERT(idp->cyi_cpu != cpu);
	}

	return (all_juggled);
}

/*
 *  int cyclic_offline(cpu_t *)
 *
 *  Overview
 *
 *    cyclic_offline() offlines the cyclic subsystem on the specified CPU.
 *
 *  Arguments and notes
 *
 *    The only argument to cyclic_offline() is a CPU to offline.
 *    cyclic_offline() will attempt to juggle cyclics away from the specified
 *    CPU.
 *
 *  Return value
 *
 *    cyclic_offline() returns 1 if all cyclics on the CPU were juggled away
 *    and the cyclic subsystem on the CPU was successfully offlined.
 *    cyclic_offline() returns 0 if some cyclics remain, blocking the cyclic
 *    offline operation. All remaining cyclics on the CPU will either be
 *    CPU- or partition-bound.
 *
 *    See the "Arguments and notes" of cyclic_juggle(), above, for more detail
 *    on cyclic juggling.
 *
 *  Caller's context
 *
 *    The only caller of cyclic_offline() should be the processor management
 *    subsystem. It is expected that the caller of cyclic_offline() will
 *    offline the CPU immediately after cyclic_offline() returns success (i.e.
 *    before dropping cpu_lock).
Moreover, it is expected that the caller will 3261 * fail the CPU offline operation if cyclic_offline() returns failure. 3262 */ 3263 int 3264 cyclic_offline(cpu_t *c) 3265 { 3266 cyc_cpu_t *cpu = c->cpu_cyclic; 3267 cyc_id_t *idp; 3268 3269 CYC_PTRACE1("offline", cpu); 3270 ASSERT(MUTEX_HELD(&cpu_lock)); 3271 3272 if (!cyclic_juggle(c)) 3273 return (0); 3274 3275 /* 3276 * This CPU is headed offline; we need to now stop omnipresent 3277 * cyclic firing on this CPU. 3278 */ 3279 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3280 if (idp->cyi_cpu != NULL) 3281 continue; 3282 3283 /* 3284 * We cannot possibly be offlining the last CPU; cyi_omni_list 3285 * must be non-NULL. 3286 */ 3287 ASSERT(idp->cyi_omni_list != NULL); 3288 cyclic_omni_stop(idp, cpu); 3289 } 3290 3291 ASSERT(cpu->cyp_state == CYS_ONLINE); 3292 cpu->cyp_state = CYS_OFFLINE; 3293 3294 return (1); 3295 } 3296 3297 /* 3298 * void cyclic_online(cpu_t *) 3299 * 3300 * Overview 3301 * 3302 * cyclic_online() onlines a CPU previously offlined with cyclic_offline(). 3303 * 3304 * Arguments and notes 3305 * 3306 * cyclic_online()'s only argument is a CPU to online. The specified 3307 * CPU must have been previously offlined with cyclic_offline(). After 3308 * cyclic_online() returns, the specified CPU will be eligible to execute 3309 * cyclics. 3310 * 3311 * Return value 3312 * 3313 * None; cyclic_online() always succeeds. 3314 * 3315 * Caller's context 3316 * 3317 * cyclic_online() should only be called by the processor management 3318 * subsystem; cpu_lock must be held. 3319 */ 3320 void 3321 cyclic_online(cpu_t *c) 3322 { 3323 cyc_cpu_t *cpu = c->cpu_cyclic; 3324 cyc_id_t *idp; 3325 3326 CYC_PTRACE1("online", cpu); 3327 ASSERT(c->cpu_flags & CPU_ENABLE); 3328 ASSERT(MUTEX_HELD(&cpu_lock)); 3329 ASSERT(cpu->cyp_state == CYS_OFFLINE); 3330 3331 cpu->cyp_state = CYS_ONLINE; 3332 3333 /* 3334 * Now that this CPU is open for business, we need to start firing 3335 * all omnipresent cyclics on it. 3336 */ 3337 for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) { 3338 if (idp->cyi_cpu != NULL) 3339 continue; 3340 3341 cyclic_omni_start(idp, cpu); 3342 } 3343 } 3344 3345 /* 3346 * void cyclic_move_in(cpu_t *) 3347 * 3348 * Overview 3349 * 3350 * cyclic_move_in() is called by the CPU partition code immediately after 3351 * the specified CPU has moved into a new partition. 3352 * 3353 * Arguments and notes 3354 * 3355 * The only argument to cyclic_move_in() is a CPU which has moved into a 3356 * new partition. If the specified CPU is P_ONLINE, and every other 3357 * CPU in the specified CPU's new partition is P_NOINTR, cyclic_move_in() 3358 * will juggle all partition-bound, CPU-unbound cyclics to the specified 3359 * CPU. 3360 * 3361 * Return value 3362 * 3363 * None; cyclic_move_in() always succeeds. 3364 * 3365 * Caller's context 3366 * 3367 * cyclic_move_in() should _only_ be called immediately after a CPU has 3368 * moved into a new partition, with cpu_lock held. As with other calls 3369 * into the cyclic subsystem, no lock may be held which is also grabbed 3370 * by any cyclic handler. 3371 */ 3372 void 3373 cyclic_move_in(cpu_t *d) 3374 { 3375 cyc_id_t *idp; 3376 cyc_cpu_t *dest = d->cpu_cyclic; 3377 cyclic_t *cyclic; 3378 cpupart_t *part = d->cpu_part; 3379 3380 CYC_PTRACE("move-in", dest, part); 3381 ASSERT(MUTEX_HELD(&cpu_lock)); 3382 3383 /* 3384 * Look for CYF_PART_BOUND cyclics in the new partition. If 3385 * we find one, check to see if it is currently on a CPU which has 3386 * interrupts disabled. 

/*
 *  void cyclic_online(cpu_t *)
 *
 *  Overview
 *
 *    cyclic_online() onlines a CPU previously offlined with cyclic_offline().
 *
 *  Arguments and notes
 *
 *    cyclic_online()'s only argument is a CPU to online.  The specified
 *    CPU must have been previously offlined with cyclic_offline().  After
 *    cyclic_online() returns, the specified CPU will be eligible to execute
 *    cyclics.
 *
 *  Return value
 *
 *    None; cyclic_online() always succeeds.
 *
 *  Caller's context
 *
 *    cyclic_online() should only be called by the processor management
 *    subsystem; cpu_lock must be held.
 */
void
cyclic_online(cpu_t *c)
{
        cyc_cpu_t *cpu = c->cpu_cyclic;
        cyc_id_t *idp;

        CYC_PTRACE1("online", cpu);
        ASSERT(c->cpu_flags & CPU_ENABLE);
        ASSERT(MUTEX_HELD(&cpu_lock));
        ASSERT(cpu->cyp_state == CYS_OFFLINE);

        cpu->cyp_state = CYS_ONLINE;

        /*
         * Now that this CPU is open for business, we need to start firing
         * all omnipresent cyclics on it.
         */
        for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) {
                if (idp->cyi_cpu != NULL)
                        continue;

                cyclic_omni_start(idp, cpu);
        }
}
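
/*
 *  A hypothetical sketch of the corresponding online path (illustrative
 *  only; cyclic_online() asserts that CPU_ENABLE is already set, so the
 *  caller reenables interrupts on the CPU before calling in):
 *
 *        mutex_enter(&cpu_lock);
 *        c->cpu_flags |= CPU_ENABLE;       (c rejoins the interrupt set)
 *        cyclic_online(c);                 (omnipresent cyclics resume
 *                                           firing on c)
 *        mutex_exit(&cpu_lock);
 */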

/*
 *  void cyclic_move_in(cpu_t *)
 *
 *  Overview
 *
 *    cyclic_move_in() is called by the CPU partition code immediately after
 *    the specified CPU has moved into a new partition.
 *
 *  Arguments and notes
 *
 *    The only argument to cyclic_move_in() is a CPU which has moved into a
 *    new partition.  If the specified CPU is P_ONLINE, and every other
 *    CPU in the specified CPU's new partition is P_NOINTR, cyclic_move_in()
 *    will juggle all partition-bound, CPU-unbound cyclics to the specified
 *    CPU.
 *
 *  Return value
 *
 *    None; cyclic_move_in() always succeeds.
 *
 *  Caller's context
 *
 *    cyclic_move_in() should _only_ be called immediately after a CPU has
 *    moved into a new partition, with cpu_lock held.  As with other calls
 *    into the cyclic subsystem, no lock may be held which is also grabbed
 *    by any cyclic handler.
 */
void
cyclic_move_in(cpu_t *d)
{
        cyc_id_t *idp;
        cyc_cpu_t *dest = d->cpu_cyclic;
        cyclic_t *cyclic;
        cpupart_t *part = d->cpu_part;

        CYC_PTRACE("move-in", dest, part);
        ASSERT(MUTEX_HELD(&cpu_lock));

        /*
         * Look for CYF_PART_BOUND cyclics in the new partition.  If
         * we find one, check to see if it is currently on a CPU which has
         * interrupts disabled.  If it is (and if this CPU currently has
         * interrupts enabled), we'll juggle those cyclics over here.
         */
        if (!(d->cpu_flags & CPU_ENABLE)) {
                CYC_PTRACE1("move-in-none", dest);
                return;
        }

        for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) {
                cyc_cpu_t *cpu = idp->cyi_cpu;
                cpu_t *c;

                /*
                 * Omnipresent cyclics are exempt from juggling.
                 */
                if (cpu == NULL)
                        continue;

                c = cpu->cyp_cpu;

                if (c->cpu_part != part || (c->cpu_flags & CPU_ENABLE))
                        continue;

                cyclic = &cpu->cyp_cyclics[idp->cyi_ndx];

                if (cyclic->cy_flags & CYF_CPU_BOUND)
                        continue;

                /*
                 * We know that this cyclic is bound to its processor set
                 * (otherwise, it would not be on a CPU with interrupts
                 * disabled); juggle it to our CPU.
                 */
                ASSERT(cyclic->cy_flags & CYF_PART_BOUND);
                cyclic_juggle_one_to(idp, dest);
        }

        CYC_PTRACE1("move-in-done", dest);
}

/*
 *  int cyclic_move_out(cpu_t *)
 *
 *  Overview
 *
 *    cyclic_move_out() is called by the CPU partition code immediately before
 *    the specified CPU is to move out of its partition.
 *
 *  Arguments and notes
 *
 *    The only argument to cyclic_move_out() is a CPU which is to move out of
 *    its partition.
 *
 *    cyclic_move_out() will attempt to juggle away all partition-bound
 *    cyclics.  If the specified CPU is the last CPU in a partition with
 *    partition-bound cyclics, cyclic_move_out() will fail.  If there exists
 *    a partition-bound cyclic which is CPU-bound to the specified CPU,
 *    cyclic_move_out() will fail.
 *
 *    Note that cyclic_move_out() will _only_ attempt to juggle away
 *    partition-bound cyclics; CPU-bound cyclics which are not partition-bound
 *    and unbound cyclics are not affected by changing the partition
 *    affiliation of the CPU.
 *
 *  Return value
 *
 *    cyclic_move_out() returns 1 if all partition-bound cyclics on the CPU
 *    were juggled away; 0 if some cyclics remain.
 *
 *  Caller's context
 *
 *    cyclic_move_out() should _only_ be called immediately before a CPU
 *    moves out of its partition, with cpu_lock held.  It is expected that
 *    the caller of cyclic_move_out() will change the processor set
 *    affiliation of the specified CPU immediately after cyclic_move_out()
 *    returns success (i.e. before dropping cpu_lock).  Moreover, it is
 *    expected that the caller will fail the CPU repartitioning operation if
 *    cyclic_move_out() returns failure.  As with other calls into the cyclic
 *    subsystem, no lock may be held which is also grabbed by any cyclic
 *    handler.
 */
int
cyclic_move_out(cpu_t *c)
{
        cyc_id_t *idp;
        cyc_cpu_t *cpu = c->cpu_cyclic, *dest;
        cyclic_t *cyclic, *cyclics = cpu->cyp_cyclics;
        cpupart_t *part = c->cpu_part;

        CYC_PTRACE1("move-out", cpu);
        ASSERT(MUTEX_HELD(&cpu_lock));

        /*
         * If there are any CYF_PART_BOUND cyclics on this CPU, we need
         * to try to juggle them away.
         */
        for (idp = cyclic_id_head; idp != NULL; idp = idp->cyi_next) {

                if (idp->cyi_cpu != cpu)
                        continue;

                cyclic = &cyclics[idp->cyi_ndx];

                if (!(cyclic->cy_flags & CYF_PART_BOUND))
                        continue;

                dest = cyclic_pick_cpu(part, c, c, cyclic->cy_flags);

                if (dest == NULL) {
                        /*
                         * We can't juggle this cyclic; we need to return
                         * failure (we won't bother trying to juggle away
                         * other cyclics).
                         */
                        CYC_PTRACE("move-out-fail", cpu, idp);
                        return (0);
                }
                cyclic_juggle_one_to(idp, dest);
        }

        CYC_PTRACE1("move-out-done", cpu);
        return (1);
}
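
/*
 *  A hypothetical sketch of the repartitioning sequence implied by the
 *  cyclic_move_out()/cyclic_move_in() contracts above (illustrative only;
 *  the actual CPU partition code lives elsewhere):
 *
 *        mutex_enter(&cpu_lock);
 *        if (!cyclic_move_out(c)) {
 *                mutex_exit(&cpu_lock);
 *                return (EBUSY);           (partition-bound cyclics remain;
 *                                           fail the repartitioning)
 *        }
 *        ... change c's partition affiliation (c->cpu_part) ...
 *        cyclic_move_in(c);                (after the move, still under
 *                                           cpu_lock)
 *        mutex_exit(&cpu_lock);
 */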

/*
 *  void cyclic_suspend()
 *
 *  Overview
 *
 *    cyclic_suspend() suspends all cyclic activity throughout the cyclic
 *    subsystem.  It should be called only by subsystems which are attempting
 *    to suspend the entire system (e.g. checkpoint/resume, dynamic
 *    reconfiguration).
 *
 *  Arguments and notes
 *
 *    cyclic_suspend() takes no arguments.  Each CPU with an active cyclic
 *    disables its backend (offline CPUs disable their backends as part of
 *    the cyclic_offline() operation), thereby disabling future CY_HIGH_LEVEL
 *    interrupts.
 *
 *    Note that disabling CY_HIGH_LEVEL interrupts does not completely
 *    preclude cyclic handlers from being called after cyclic_suspend()
 *    returns:  if a CY_LOCK_LEVEL or CY_LOW_LEVEL interrupt thread was
 *    blocked at the time of cyclic_suspend(), cyclic handlers at its level
 *    may continue to be called after the interrupt thread becomes unblocked.
 *    The post-cyclic_suspend() activity is bounded by the pend count on all
 *    cyclics at the time of cyclic_suspend().  Callers concerned with more
 *    than simply disabling future CY_HIGH_LEVEL interrupts must check for
 *    this condition.
 *
 *    On most platforms, timestamps from gethrtime() and gethrestime() are not
 *    guaranteed to monotonically increase between cyclic_suspend() and
 *    cyclic_resume().  However, timestamps are guaranteed to monotonically
 *    increase across the entire cyclic_suspend()/cyclic_resume() operation.
 *    That is, every timestamp obtained before cyclic_suspend() will be less
 *    than every timestamp obtained after cyclic_resume().
 *
 *  Return value
 *
 *    None; cyclic_suspend() always succeeds.
 *
 *  Caller's context
 *
 *    The cyclic subsystem must be configured on every valid CPU;
 *    cyclic_suspend() may not be called during boot or during dynamic
 *    reconfiguration.  Additionally, cpu_lock must be held, and the caller
 *    cannot be in high-level interrupt context.  However, unlike most other
 *    cyclic entry points, cyclic_suspend() may be called with locks held
 *    which are also acquired by CY_LOCK_LEVEL or CY_LOW_LEVEL cyclic
 *    handlers.
 */
void
cyclic_suspend()
{
        cpu_t *c;
        cyc_cpu_t *cpu;
        cyc_xcallarg_t arg;
        cyc_backend_t *be;

        CYC_PTRACE0("suspend");
        ASSERT(MUTEX_HELD(&cpu_lock));
        c = cpu_list;

        do {
                cpu = c->cpu_cyclic;
                be = cpu->cyp_backend;
                arg.cyx_cpu = cpu;

                be->cyb_xcall(be->cyb_arg, c,
                    (cyc_func_t)cyclic_suspend_xcall, &arg);
        } while ((c = c->cpu_next) != cpu_list);
}

/*
 *  void cyclic_resume()
 *
 *  Overview
 *
 *    cyclic_resume() resumes all cyclic activity throughout the cyclic
 *    subsystem.  It should be called only by system-suspending subsystems.
 *
 *  Arguments and notes
 *
 *    cyclic_resume() takes no arguments.  Each CPU with an active cyclic
 *    reenables and reprograms its backend (offline CPUs are not reenabled).
 *    On most platforms, timestamps from gethrtime() and gethrestime() are not
 *    guaranteed to monotonically increase between cyclic_suspend() and
 *    cyclic_resume().  However, timestamps are guaranteed to monotonically
 *    increase across the entire cyclic_suspend()/cyclic_resume() operation.
 *    That is, every timestamp obtained before cyclic_suspend() will be less
 *    than every timestamp obtained after cyclic_resume().
 *
 *  Return value
 *
 *    None; cyclic_resume() always succeeds.
 *
 *  Caller's context
 *
 *    The cyclic subsystem must be configured on every valid CPU;
 *    cyclic_resume() may not be called during boot or during dynamic
 *    reconfiguration.  Additionally, cpu_lock must be held, and the caller
 *    cannot be in high-level interrupt context.  However, unlike most other
 *    cyclic entry points, cyclic_resume() may be called with locks held which
 *    are also acquired by CY_LOCK_LEVEL or CY_LOW_LEVEL cyclic handlers.
 */
void
cyclic_resume()
{
        cpu_t *c;
        cyc_cpu_t *cpu;
        cyc_xcallarg_t arg;
        cyc_backend_t *be;

        CYC_PTRACE0("resume");
        ASSERT(MUTEX_HELD(&cpu_lock));

        c = cpu_list;

        do {
                cpu = c->cpu_cyclic;
                be = cpu->cyp_backend;
                arg.cyx_cpu = cpu;

                be->cyb_xcall(be->cyb_arg, c,
                    (cyc_func_t)cyclic_resume_xcall, &arg);
        } while ((c = c->cpu_next) != cpu_list);
}
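
/*
 *  A hypothetical sketch of a system-suspending consumer (e.g. a
 *  checkpoint/resume or dynamic reconfiguration path; illustrative only):
 *
 *        mutex_enter(&cpu_lock);
 *        cyclic_suspend();                 (no further CY_HIGH_LEVEL firings)
 *        mutex_exit(&cpu_lock);
 *
 *        ... quiesce and suspend the rest of the system, then resume it ...
 *
 *        mutex_enter(&cpu_lock);
 *        cyclic_resume();                  (backends reenabled and
 *                                           reprogrammed)
 *        mutex_exit(&cpu_lock);
 *
 *  Note that pending CY_LOCK_LEVEL/CY_LOW_LEVEL handlers may still run after
 *  cyclic_suspend() returns, as described in cyclic_suspend()'s "Arguments
 *  and notes" above.
 */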