/*
 * CDDL HEADER START
 *
 * The contents of this file are subject to the terms of the
 * Common Development and Distribution License (the "License").
 * You may not use this file except in compliance with the License.
 *
 * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 * or http://www.opensolaris.org/os/licensing.
 * See the License for the specific language governing permissions
 * and limitations under the License.
 *
 * When distributing Covered Code, include this CDDL HEADER in each
 * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 * If applicable, add the following below this CDDL HEADER, with the
 * fields enclosed by brackets "[]" replaced with your own identifying
 * information: Portions Copyright [yyyy] [name of copyright owner]
 *
 * CDDL HEADER END
 */

/*
 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
 * Copyright 2020 Joyent, Inc.
 * Copyright 2015 Garrett D'Amore <garrett@damore.org>
 * Copyright 2020 RackTop Systems, Inc.
 */

/*
 * MAC Services Module
 *
 * The GLDv3 framework locking - The MAC layer
 * --------------------------------------------
 *
 * The MAC layer is central to the GLD framework and can provide the locking
 * framework needed for itself and for the use of MAC clients. MAC end points
 * are fairly disjoint and don't share a lot of state. So a coarse grained
 * multi-threading scheme is to single thread all create/modify/delete or set
 * type of control operations on a per mac end point while allowing data
 * threads to proceed concurrently.
 *
 * Control operations (set) that modify a mac end point are always serialized
 * on a per mac end point basis. We have at most one such thread per mac end
 * point at a time.
 *
 * All other operations that are not serialized are essentially multi-threaded.
 * For example, a control operation (get) like getting statistics may not care
 * about reading values atomically, and data threads may send or receive data
 * concurrently. Mostly these types of operations don't modify the control
 * state. Any state these operations care about is protected using traditional
 * locks.
 *
 * The perimeter only serializes serial operations. It does not imply there
 * aren't any other concurrent operations. However a serialized operation may
 * sometimes need to make sure it is the only thread. In this case it needs
 * to use reference counting mechanisms to cv_wait until any current data
 * threads are done.
 *
 * The mac layer itself does not hold any locks across a call to another layer.
 * The perimeter is however held across a down call to the driver to make the
 * whole control operation atomic with respect to other control operations.
 * Also the data path and get type control operations may proceed concurrently.
 * These operations synchronize with the single serial operation on a given mac
 * end point using regular locks. The perimeter ensures that conflicting
 * operations, such as a mac_multicast_add and a mac_multicast_remove on the
 * same mac end point, don't interfere with each other, and also ensures that
 * the changes in the mac layer and the call to the underlying driver to, say,
 * add a multicast address are done atomically without interference from a
 * thread trying to delete the same address.
 *
 * For example, consider
 * mac_multicst_add()
 * {
 *	mac_perimeter_enter();		serialize all control operations
 *
 *	grab list lock			protect against access by data threads
 *	add to list
 *	drop list lock
 *
 *	call driver's mi_multicst
 *
 *	mac_perimeter_exit();
 * }
 *
 * To lessen the number of serialization locks and simplify the lock hierarchy,
 * we serialize all the control operations on a per mac end point by using a
 * single serialization lock called the perimeter. We allow recursive entry
 * into the perimeter to facilitate use of this mechanism by both the mac
 * client and the MAC layer itself.
 *
 * MAC client means an entity that does an operation on a mac handle
 * obtained from a mac_open/mac_client_open. Similarly MAC driver means
 * an entity that does an operation on a mac handle obtained from a
 * mac_register. An entity could be both client and driver but on different
 * handles (e.g. aggr) and should only make the corresponding mac interface
 * calls, i.e. mac driver interface or mac client interface, as appropriate
 * for that mac handle.
 *
 * General rules.
 * -------------
 *
 * R1. The lock order of upcall threads is naturally opposite to downcall
 * threads. Hence upcalls must not hold any locks across layers for fear of
 * recursive lock enter and lock order violation. This applies to all layers.
 *
 * R2. The perimeter is just another lock. Since it is held in the down
 * direction, acquiring the perimeter in an upcall is prohibited as it would
 * cause a deadlock. This applies to all layers.
 *
 * Note that upcalls that need to grab the mac perimeter (for example
 * mac_notify upcalls) can still achieve that by posting the request to a
 * thread, which can then grab all the required perimeters and locks in the
 * right global order. Note that in the above example the mac layer itself
 * won't grab the mac perimeter in the mac_notify upcall, instead the upcall
 * to the client must do that. Please see the aggr code for an example.
 *
 * MAC client rules
 * ----------------
 *
 * R3. A MAC client may use the MAC provided perimeter facility to serialize
 * control operations on a per mac end point. It does this by acquiring
 * and holding the perimeter across a sequence of calls to the mac layer.
 * This ensures atomicity across the entire block of mac calls. In this
 * model the MAC client must not hold any client locks across the calls to
 * the mac layer. This model is the preferred solution.
 *
 * R4. However if a MAC client has a lot of global state across all mac end
 * points the per mac end point serialization may not be sufficient. In this
 * case the client may choose to use global locks or use its own serialization.
 * To avoid deadlocks, these client layer locks held across the mac calls
 * in the control path must never be acquired by the data path for the reason
 * mentioned below.
 *
 * (Assume that a control operation that holds a client lock blocks in the
 * mac layer waiting for upcall reference counts to drop to zero. If an upcall
 * data thread that holds this reference count, tries to acquire the same
 * client lock subsequently it will deadlock).
 *
 * A MAC client may follow either the R3 model or the R4 model, but can't
 * mix both. In the former, the hierarchy is Perim -> client locks, but in
 * the latter it is client locks -> Perim.
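 *
 * As an illustrative sketch of the R3 model (the calls shown are real mac
 * client interfaces, but the sequence itself is hypothetical and the
 * arguments are elided):
 *
 *	mac_perim_enter_by_mh(mh, &mph);	serialize control operations
 *	err = mac_unicast_add(...);		this block of mac calls is
 *	err = mac_multicast_add(...);		atomic w.r.t. other control
 *	mac_perim_exit(mph);			operations on this end point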
 *
 * R5. MAC clients must make MAC calls (excluding data calls) in a cv_wait'able
 * context since they may block while trying to acquire the perimeter.
 * In addition some calls may block waiting for upcall refcnts to come down to
 * zero.
 *
 * R6. MAC clients must make sure that they are single threaded and all threads
 * from the top (in particular data threads) have finished before calling
 * mac_client_close. The MAC framework does not track the number of client
 * threads using the mac client handle. Also mac clients must make sure
 * they have undone all the control operations before calling mac_client_close.
 * For example mac_unicast_remove/mac_multicast_remove to undo the corresponding
 * mac_unicast_add/mac_multicast_add.
 *
 * MAC framework rules
 * -------------------
 *
 * R7. The mac layer itself must not hold any mac layer locks (except the mac
 * perimeter) across a call to any other layer from the mac layer. The call to
 * any other layer could be via mi_* entry points, classifier entry points into
 * the driver or via upcall pointers into layers above. The mac perimeter may
 * be acquired or held only in the down direction, e.g. when calling into
 * a mi_* driver entry point to provide atomicity of the operation.
 *
 * R8. Since it is not guaranteed (see R14) that drivers won't hold locks across
 * mac driver interfaces, the MAC layer must provide a cut out for control
 * interfaces like upcall notifications and start them in a separate thread.
 *
 * R9. Note that locking order also implies a plumbing order. For example
 * VNICs are allowed to be created over aggrs, but not vice-versa. An attempt
 * to plumb in any other order must be failed at mac_open time, otherwise it
 * could lead to deadlocks due to inverse locking order.
 *
 * R10. MAC driver interfaces must not block since the driver could call them
 * in interrupt context.
 *
 * R11. Walkers must preferably not hold any locks while calling walker
 * callbacks. Instead these can operate on reference counts. In simple
 * callbacks it may be ok to hold a lock and call the callbacks, but this is
 * harder to maintain in the general case of arbitrary callbacks.
 *
 * R12. The MAC layer must protect upcall notification callbacks using reference
 * counts rather than holding locks across the callbacks.
 *
 * R13. Given the variety of drivers, it is preferable if the MAC layer can make
 * sure that any pointers (such as mac ring pointers) it passes to the driver
 * remain valid until mac unregister time. Currently the mac layer achieves
 * this by using generation numbers for rings and freeing the mac rings only
 * at unregister time. The MAC layer must provide a layer of indirection and
 * must not expose underlying driver rings or driver data structures/pointers
 * directly to MAC clients.
 *
 * MAC driver rules
 * ----------------
 *
 * R14. It would be preferable if MAC drivers don't hold any locks across any
 * mac call. However at a minimum they must not hold any locks across data
 * upcalls. They must also make sure that all references to mac data structures
 * are cleaned up and that it is single threaded at mac_unregister time.
 *
 * R15. MAC driver interfaces don't block, so any lengthy action may be done
 * asynchronously in a separate thread, as for example with handling
 * notifications. The driver must not assume that the action is complete when
 * the call returns.
 *
 * R16. Drivers must maintain a generation number per Rx ring, and pass it
 * back to mac_rx_ring(); They are expected to increment the generation
 * number whenever the ring's stop routine is invoked.
 * See comments in mac_rx_ring();
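 *
 * A sketch of R16 from the driver's side (the softc, ring fields and the
 * harvest helper are purely illustrative; mac_rx_ring() is the real entry
 * point):
 *
 *	xx_intr()
 *	{
 *		mp_chain = xx_harvest_descriptors(ring);
 *		mac_rx_ring(sc->xx_mh, ring->xx_rh, mp_chain, ring->xx_gen);
 *	}
 *
 * with ring->xx_gen incremented each time the ring's stop routine runs, so
 * that mac can discard packets belonging to a stale start/stop cycle.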
 *
 * R17. Similarly mi_stop is another synchronization point and the driver must
 * ensure that all upcalls are done and there won't be any future upcall
 * before returning from mi_stop.
 *
 * R18. The driver may assume that all set/modify control operations via
 * the mi_* entry points are single threaded on a per mac end point basis.
 *
 * Lock and Perimeter hierarchy scenarios
 * ---------------------------------------
 *
 * i_mac_impl_lock -> mi_rw_lock -> srs_lock -> s_ring_lock[i_mac_tx_srs_notify]
 *
 * ft_lock -> fe_lock [mac_flow_lookup]
 *
 * mi_rw_lock -> fe_lock [mac_bcast_send]
 *
 * srs_lock -> mac_bw_lock [mac_rx_srs_drain_bw]
 *
 * cpu_lock -> mac_srs_g_lock -> srs_lock -> s_ring_lock [mac_walk_srs_and_bind]
 *
 * i_dls_devnet_lock -> mac layer locks [dls_devnet_rename]
 *
 * Perimeters are ordered P1 -> P2 -> P3 from top to bottom in order of mac
 * client to driver. In the case of clients that explicitly use the mac
 * provided perimeter mechanism for their serialization, the hierarchy is
 * Perimeter -> mac layer locks, since the client never holds any locks across
 * the mac calls. In the case of clients that use their own locks the hierarchy
 * is Client locks -> Mac Perim -> Mac layer locks. The client never explicitly
 * calls mac_perim_enter/exit in this case.
 *
 * Subflow creation rules
 * ---------------------------
 * o In case of a user specified cpulist present on the underlying link and
 *   flows, the flow's cpulist must be a subset of the underlying link's.
 * o In case of a user specified fanout mode present on link and flow, the
 *   subflow fanout count has to be less than or equal to that of the
 *   underlying link. The cpu-bindings for the subflows will be a subset of
 *   the underlying link.
 * o If no cpulist is specified on either the underlying link or the flow,
 *   the underlying link relies on a MAC tunable to provide out of the box
 *   fanout. The subflow will have no cpulist (the subflow will be unbound).
 * o If no cpulist is specified on the underlying link, a subflow can carry
 *   either a user-specified cpulist or fanout count. The cpu-bindings for
 *   the subflow will not adhere to the restriction that they need to be a
 *   subset of the underlying link.
 * o Where the underlying link carries either a user specified cpulist or
 *   fanout mode and the subflow is unspecified, the subflow will be created
 *   unbound.
 * o While creating unbound subflows, bandwidth mode changes attempt to
 *   figure out a right fanout count. In such cases the fanout count will
 *   override the unbound cpu-binding behavior.
 * o In addition to this, while cycling between flow and link properties, we
 *   impose a restriction that if a link property has a subflow with
 *   user-specified attributes, we will not allow changing the link property.
 *   The administrator needs to reset all the user specified properties for
 *   the subflows before attempting a link property change.
 * Some of the above rules can be overridden by specifying additional command
 * line options while creating or modifying link or subflow properties.
 *
 * Datapath
 * --------
 *
 * For information on the datapath, the world of soft rings, hardware rings,
 * how it is structured, and the path of an mblk_t between a driver and a mac
 * client, see mac_sched.c.
 */

#include <sys/types.h>
#include <sys/conf.h>
#include <sys/id_space.h>
#include <sys/esunddi.h>
#include <sys/stat.h>
#include <sys/mkdev.h>
#include <sys/stream.h>
#include <sys/strsun.h>
#include <sys/strsubr.h>
#include <sys/dlpi.h>
#include <sys/list.h>
#include <sys/modhash.h>
#include <sys/mac_provider.h>
#include <sys/mac_client_impl.h>
#include <sys/mac_soft_ring.h>
#include <sys/mac_stat.h>
#include <sys/mac_impl.h>
#include <sys/mac.h>
#include <sys/dls.h>
#include <sys/dld.h>
#include <sys/modctl.h>
#include <sys/fs/dv_node.h>
#include <sys/thread.h>
#include <sys/proc.h>
#include <sys/callb.h>
#include <sys/cpuvar.h>
#include <sys/atomic.h>
#include <sys/bitmap.h>
#include <sys/sdt.h>
#include <sys/mac_flow.h>
#include <sys/ddi_intr_impl.h>
#include <sys/disp.h>
#include <sys/vnic.h>
#include <sys/vnic_impl.h>
#include <sys/vlan.h>
#include <inet/ip.h>
#include <inet/ip6.h>
#include <sys/exacct.h>
#include <sys/exacct_impl.h>
#include <inet/nd.h>
#include <sys/ethernet.h>
#include <sys/pool.h>
#include <sys/pool_pset.h>
#include <sys/cpupart.h>
#include <inet/wifi_ioctl.h>
#include <net/wpa.h>
#include <sys/mac_ether.h>

#define	IMPL_HASHSZ	67	/* prime */

kmem_cache_t		*i_mac_impl_cachep;
mod_hash_t		*i_mac_impl_hash;
krwlock_t		i_mac_impl_lock;
uint_t			i_mac_impl_count;
static kmem_cache_t	*mac_ring_cache;
static id_space_t	*minor_ids;
static uint32_t		minor_count;
static pool_event_cb_t	mac_pool_event_reg;

/*
 * Logging stuff. Perhaps mac_logging_interval could be broken into
 * mac_flow_log_interval and mac_link_log_interval if we want to be
 * able to schedule them differently.
 */
uint_t			mac_logging_interval;
boolean_t		mac_flow_log_enable;
boolean_t		mac_link_log_enable;
timeout_id_t		mac_logging_timer;

#define	MACTYPE_KMODDIR	"mac"
#define	MACTYPE_HASHSZ	67
static mod_hash_t	*i_mactype_hash;
/*
 * i_mactype_lock synchronizes threads that obtain references to mactype_t
 * structures through i_mactype_getplugin().
 */
static kmutex_t		i_mactype_lock;

/*
 * mac_tx_percpu_cnt
 *
 * Number of per cpu locks per mac_client_impl_t. Used by the transmit side
 * in mac_tx to reduce lock contention. This is sized at boot time in mac_init.
 * mac_tx_percpu_cnt_max is settable in /etc/system and must be a power of 2.
 * Per cpu locks may be disabled by setting mac_tx_percpu_cnt_max to 1.
 */
int mac_tx_percpu_cnt;
int mac_tx_percpu_cnt_max = 128;

/*
 * Call back functions for the bridge module. These are guaranteed to be valid
 * when holding a reference on a link or when holding mip->mi_bridge_lock and
 * mi_bridge_link is non-NULL.
 */
mac_bridge_tx_t mac_bridge_tx_cb;
mac_bridge_rx_t mac_bridge_rx_cb;
mac_bridge_ref_t mac_bridge_ref_cb;
mac_bridge_ls_t mac_bridge_ls_cb;

static int i_mac_constructor(void *, void *, int);
static void i_mac_destructor(void *, void *);
static int i_mac_ring_ctor(void *, void *, int);
static void i_mac_ring_dtor(void *, void *);
static mblk_t *mac_rx_classify(mac_impl_t *, mac_resource_handle_t, mblk_t *);
void mac_tx_client_flush(mac_client_impl_t *);
void mac_tx_client_block(mac_client_impl_t *);
static void mac_rx_ring_quiesce(mac_ring_t *, uint_t);
static int mac_start_group_and_rings(mac_group_t *);
static void mac_stop_group_and_rings(mac_group_t *);
static void mac_pool_event_cb(pool_event_t, int, void *);

typedef struct netinfo_s {
	list_node_t	ni_link;
	void		*ni_record;
	int		ni_size;
	int		ni_type;
} netinfo_t;

/*
 * Module initialization functions.
 */

void
mac_init(void)
{
	mac_tx_percpu_cnt = ((boot_max_ncpus == -1) ? max_ncpus :
	    boot_max_ncpus);

	/* Upper bound is mac_tx_percpu_cnt_max */
	if (mac_tx_percpu_cnt > mac_tx_percpu_cnt_max)
		mac_tx_percpu_cnt = mac_tx_percpu_cnt_max;

	if (mac_tx_percpu_cnt < 1) {
		/* Someone set mac_tx_percpu_cnt_max to 0 or less */
		mac_tx_percpu_cnt = 1;
	}

	ASSERT(mac_tx_percpu_cnt >= 1);
	mac_tx_percpu_cnt = (1 << highbit(mac_tx_percpu_cnt - 1));
	/*
	 * Make it of the form 2**N - 1 in the range
	 * [0 .. mac_tx_percpu_cnt_max - 1]
	 */
	mac_tx_percpu_cnt--;
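	/*
	 * For example (illustrative values only): with 6 CPUs at boot,
	 * highbit(5) == 3, so the count is first rounded up to 1 << 3 == 8
	 * and then decremented to 7, a 2**N - 1 value usable as an index
	 * mask.
	 */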

	i_mac_impl_cachep = kmem_cache_create("mac_impl_cache",
	    sizeof (mac_impl_t), 0, i_mac_constructor, i_mac_destructor,
	    NULL, NULL, NULL, 0);
	ASSERT(i_mac_impl_cachep != NULL);

	mac_ring_cache = kmem_cache_create("mac_ring_cache",
	    sizeof (mac_ring_t), 0, i_mac_ring_ctor, i_mac_ring_dtor, NULL,
	    NULL, NULL, 0);
	ASSERT(mac_ring_cache != NULL);

	i_mac_impl_hash = mod_hash_create_extended("mac_impl_hash",
	    IMPL_HASHSZ, mod_hash_null_keydtor, mod_hash_null_valdtor,
	    mod_hash_bystr, NULL, mod_hash_strkey_cmp, KM_SLEEP);
	rw_init(&i_mac_impl_lock, NULL, RW_DEFAULT, NULL);

	mac_flow_init();
	mac_soft_ring_init();
	mac_bcast_init();
	mac_client_init();

	i_mac_impl_count = 0;

	i_mactype_hash = mod_hash_create_extended("mactype_hash",
	    MACTYPE_HASHSZ,
	    mod_hash_null_keydtor, mod_hash_null_valdtor,
	    mod_hash_bystr, NULL, mod_hash_strkey_cmp, KM_SLEEP);

	/*
	 * Allocate an id space to manage minor numbers. The range of the
	 * space will be from MAC_MAX_MINOR+1 to MAC_PRIVATE_MINOR-1. This
	 * leaves half of the 32-bit minors available for driver private use.
	 */
	minor_ids = id_space_create("mac_minor_ids", MAC_MAX_MINOR+1,
	    MAC_PRIVATE_MINOR-1);
	ASSERT(minor_ids != NULL);
	minor_count = 0;

	/* Let's default to 20 seconds */
	mac_logging_interval = 20;
	mac_flow_log_enable = B_FALSE;
	mac_link_log_enable = B_FALSE;
	mac_logging_timer = NULL;

	/* Register to be notified of noteworthy pools events */
	mac_pool_event_reg.pec_func = mac_pool_event_cb;
	mac_pool_event_reg.pec_arg = NULL;
	pool_event_cb_register(&mac_pool_event_reg);
}

int
mac_fini(void)
{

	if (i_mac_impl_count > 0 || minor_count > 0)
		return (EBUSY);

	pool_event_cb_unregister(&mac_pool_event_reg);

	id_space_destroy(minor_ids);
	mac_flow_fini();

	mod_hash_destroy_hash(i_mac_impl_hash);
	rw_destroy(&i_mac_impl_lock);

	mac_client_fini();
	kmem_cache_destroy(mac_ring_cache);

	mod_hash_destroy_hash(i_mactype_hash);
	mac_soft_ring_finish();

	return (0);
}

/*
 * Initialize a GLDv3 driver's device ops. A driver that manages its own ops
 * (e.g. softmac) may pass in a NULL ops argument.
 */
void
mac_init_ops(struct dev_ops *ops, const char *name)
{
	major_t major = ddi_name_to_major((char *)name);

	/*
	 * By returning on error below, we are not letting the driver continue
	 * in an undefined context. The mac_register() function will fail if
	 * DN_GLDV3_DRIVER isn't set.
	 */
	if (major == DDI_MAJOR_T_NONE)
		return;
	LOCK_DEV_OPS(&devnamesp[major].dn_lock);
	devnamesp[major].dn_flags |= (DN_GLDV3_DRIVER | DN_NETWORK_DRIVER);
	UNLOCK_DEV_OPS(&devnamesp[major].dn_lock);
	if (ops != NULL)
		dld_init_ops(ops, name);
}

void
mac_fini_ops(struct dev_ops *ops)
{
	dld_fini_ops(ops);
}

/*ARGSUSED*/
static int
i_mac_constructor(void *buf, void *arg, int kmflag)
{
	mac_impl_t *mip = buf;

	bzero(buf, sizeof (mac_impl_t));

	mip->mi_linkstate = LINK_STATE_UNKNOWN;

	rw_init(&mip->mi_rw_lock, NULL, RW_DRIVER, NULL);
	mutex_init(&mip->mi_notify_lock, NULL, MUTEX_DRIVER, NULL);
	mutex_init(&mip->mi_promisc_lock, NULL, MUTEX_DRIVER, NULL);
	mutex_init(&mip->mi_ring_lock, NULL, MUTEX_DEFAULT, NULL);

	mip->mi_notify_cb_info.mcbi_lockp = &mip->mi_notify_lock;
	cv_init(&mip->mi_notify_cb_info.mcbi_cv, NULL, CV_DRIVER, NULL);
	mip->mi_promisc_cb_info.mcbi_lockp = &mip->mi_promisc_lock;
	cv_init(&mip->mi_promisc_cb_info.mcbi_cv, NULL, CV_DRIVER, NULL);

	mutex_init(&mip->mi_bridge_lock, NULL, MUTEX_DEFAULT, NULL);

	return (0);
}

/*ARGSUSED*/
static void
i_mac_destructor(void *buf, void *arg)
{
	mac_impl_t *mip = buf;
	mac_cb_info_t *mcbi;

	ASSERT(mip->mi_ref == 0);
	ASSERT(mip->mi_active == 0);
	ASSERT(mip->mi_linkstate == LINK_STATE_UNKNOWN);
	ASSERT(mip->mi_devpromisc == 0);
	ASSERT(mip->mi_ksp == NULL);
	ASSERT(mip->mi_kstat_count == 0);
	ASSERT(mip->mi_nclients == 0);
	ASSERT(mip->mi_nactiveclients == 0);
	ASSERT(mip->mi_single_active_client == NULL);
	ASSERT(mip->mi_state_flags == 0);
	ASSERT(mip->mi_factory_addr == NULL);
	ASSERT(mip->mi_factory_addr_num == 0);
	ASSERT(mip->mi_default_tx_ring == NULL);

	mcbi = &mip->mi_notify_cb_info;
	ASSERT(mcbi->mcbi_del_cnt == 0 && mcbi->mcbi_walker_cnt == 0);
	ASSERT(mip->mi_notify_bits == 0);
	ASSERT(mip->mi_notify_thread == NULL);
	ASSERT(mcbi->mcbi_lockp == &mip->mi_notify_lock);
	mcbi->mcbi_lockp = NULL;

	mcbi = &mip->mi_promisc_cb_info;
	ASSERT(mcbi->mcbi_del_cnt == 0 && mip->mi_promisc_list == NULL);
	ASSERT(mip->mi_promisc_list == NULL);
	ASSERT(mcbi->mcbi_lockp == &mip->mi_promisc_lock);
	mcbi->mcbi_lockp = NULL;

	ASSERT(mip->mi_bcast_ngrps == 0 && mip->mi_bcast_grp == NULL);
	ASSERT(mip->mi_perim_owner == NULL && mip->mi_perim_ocnt == 0);

	rw_destroy(&mip->mi_rw_lock);

	mutex_destroy(&mip->mi_promisc_lock);
	cv_destroy(&mip->mi_promisc_cb_info.mcbi_cv);
	mutex_destroy(&mip->mi_notify_lock);
	cv_destroy(&mip->mi_notify_cb_info.mcbi_cv);
	mutex_destroy(&mip->mi_ring_lock);

	ASSERT(mip->mi_bridge_link == NULL);
}
/* ARGSUSED */
static int
i_mac_ring_ctor(void *buf, void *arg, int kmflag)
{
	mac_ring_t *ring = (mac_ring_t *)buf;

	bzero(ring, sizeof (mac_ring_t));
	cv_init(&ring->mr_cv, NULL, CV_DEFAULT, NULL);
	mutex_init(&ring->mr_lock, NULL, MUTEX_DEFAULT, NULL);
	ring->mr_state = MR_FREE;
	return (0);
}

/* ARGSUSED */
static void
i_mac_ring_dtor(void *buf, void *arg)
{
	mac_ring_t *ring = (mac_ring_t *)buf;

	cv_destroy(&ring->mr_cv);
	mutex_destroy(&ring->mr_lock);
}

/*
 * Common functions to do mac callback addition and deletion. Currently this
 * is used by promisc callbacks and notify callbacks. List addition and
 * deletion need to take care of list walkers. List walkers, in general,
 * can't hold list locks and make upcall callbacks due to potential lock
 * order and recursive reentry issues. Instead list walkers increment the
 * list walker count to mark the presence of a walker thread. Addition can
 * be carefully done to ensure that the list walker always sees either the
 * old list or the new list. However the deletion can't be done while the
 * walker is active, instead the deleting thread simply marks the entry as
 * logically deleted. The last walker physically deletes and frees up the
 * logically deleted entries when the walk is complete.
 */
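/*
 * A sketch of the resulting walker pattern on the caller's side (the actual
 * callback invocation and error handling are elided; see the users of
 * mac_callback_walker_enter() for real examples):
 *
 *	mac_callback_walker_enter(mcbi);
 *	for (mcb = *mcb_head; mcb != NULL; mcb = mcb->mcb_nextp) {
 *		if ((mcb->mcb_flags & MCB_CONDEMNED) == 0)
 *			deliver the callback for mcb->mcb_objp, no locks held
 *	}
 *	mac_callback_walker_exit(mcbi, mcb_head, B_FALSE);
 */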
void
mac_callback_add(mac_cb_info_t *mcbi, mac_cb_t **mcb_head,
    mac_cb_t *mcb_elem)
{
	mac_cb_t	*p;
	mac_cb_t	**pp;

	/* Verify it is not already in the list */
	for (pp = mcb_head; (p = *pp) != NULL; pp = &p->mcb_nextp) {
		if (p == mcb_elem)
			break;
	}
	VERIFY(p == NULL);

	/*
	 * Add it to the head of the callback list. The membar ensures that
	 * the following list pointer manipulations reach global visibility
	 * in exactly the program order below.
	 */
	ASSERT(MUTEX_HELD(mcbi->mcbi_lockp));

	mcb_elem->mcb_nextp = *mcb_head;
	membar_producer();
	*mcb_head = mcb_elem;
}

/*
 * Mark the entry as logically deleted. If there aren't any walkers unlink
 * from the list. In either case return the corresponding status.
 */
boolean_t
mac_callback_remove(mac_cb_info_t *mcbi, mac_cb_t **mcb_head,
    mac_cb_t *mcb_elem)
{
	mac_cb_t	*p;
	mac_cb_t	**pp;

	ASSERT(MUTEX_HELD(mcbi->mcbi_lockp));
	/*
	 * Search the callback list for the entry to be removed
	 */
	for (pp = mcb_head; (p = *pp) != NULL; pp = &p->mcb_nextp) {
		if (p == mcb_elem)
			break;
	}
	VERIFY(p != NULL);

	/*
	 * If there are walkers just mark it as deleted and the last walker
	 * will remove from the list and free it.
	 */
	if (mcbi->mcbi_walker_cnt != 0) {
		p->mcb_flags |= MCB_CONDEMNED;
		mcbi->mcbi_del_cnt++;
		return (B_FALSE);
	}

	ASSERT(mcbi->mcbi_del_cnt == 0);
	*pp = p->mcb_nextp;
	p->mcb_nextp = NULL;
	return (B_TRUE);
}

/*
 * Wait for all pending callback removals to be completed
 */
void
mac_callback_remove_wait(mac_cb_info_t *mcbi)
{
	ASSERT(MUTEX_HELD(mcbi->mcbi_lockp));
	while (mcbi->mcbi_del_cnt != 0) {
		DTRACE_PROBE1(need_wait, mac_cb_info_t *, mcbi);
		cv_wait(&mcbi->mcbi_cv, mcbi->mcbi_lockp);
	}
}

void
mac_callback_barrier(mac_cb_info_t *mcbi)
{
	ASSERT(MUTEX_HELD(mcbi->mcbi_lockp));
	ASSERT3U(mcbi->mcbi_barrier_cnt, <, UINT_MAX);

	if (mcbi->mcbi_walker_cnt == 0) {
		return;
	}

	mcbi->mcbi_barrier_cnt++;
	do {
		cv_wait(&mcbi->mcbi_cv, mcbi->mcbi_lockp);
	} while (mcbi->mcbi_walker_cnt > 0);
	mcbi->mcbi_barrier_cnt--;
	cv_broadcast(&mcbi->mcbi_cv);
}

void
mac_callback_walker_enter(mac_cb_info_t *mcbi)
{
	mutex_enter(mcbi->mcbi_lockp);
	/*
	 * Incoming walkers should give precedence to timely clean-up of
	 * deleted callback entries and requested barriers.
	 */
	while (mcbi->mcbi_del_cnt > 0 || mcbi->mcbi_barrier_cnt > 0) {
		cv_wait(&mcbi->mcbi_cv, mcbi->mcbi_lockp);
	}
	mcbi->mcbi_walker_cnt++;
	mutex_exit(mcbi->mcbi_lockp);
}

/*
 * The last mac callback walker does the cleanup. Walk the list and unlink
 * all the logically deleted entries and construct a temporary list of
 * removed entries. Return the list of removed entries to the caller.
 */
static mac_cb_t *
mac_callback_walker_cleanup(mac_cb_info_t *mcbi, mac_cb_t **mcb_head)
{
	mac_cb_t	*p;
	mac_cb_t	**pp;
	mac_cb_t	*rmlist = NULL;		/* List of removed elements */
	int	cnt = 0;

	ASSERT(MUTEX_HELD(mcbi->mcbi_lockp));
	ASSERT(mcbi->mcbi_del_cnt != 0 && mcbi->mcbi_walker_cnt == 0);

	pp = mcb_head;
	while (*pp != NULL) {
		if ((*pp)->mcb_flags & MCB_CONDEMNED) {
			p = *pp;
			*pp = p->mcb_nextp;
			p->mcb_nextp = rmlist;
			rmlist = p;
			cnt++;
			continue;
		}
		pp = &(*pp)->mcb_nextp;
	}

	ASSERT(mcbi->mcbi_del_cnt == cnt);
	mcbi->mcbi_del_cnt = 0;
	return (rmlist);
}
void
mac_callback_walker_exit(mac_cb_info_t *mcbi, mac_cb_t **headp,
    boolean_t is_promisc)
{
	boolean_t do_wake = B_FALSE;

	mutex_enter(mcbi->mcbi_lockp);

	/* If walkers remain, nothing more can be done for now */
	if (--mcbi->mcbi_walker_cnt != 0) {
		mutex_exit(mcbi->mcbi_lockp);
		return;
	}

	if (mcbi->mcbi_del_cnt != 0) {
		mac_cb_t *rmlist;

		rmlist = mac_callback_walker_cleanup(mcbi, headp);

		if (!is_promisc) {
			/* The "normal" non-promisc callback clean-up */
			mac_callback_free(rmlist);
		} else {
			mac_cb_t *mcb, *mcb_next;

			/*
			 * The promisc callbacks are in 2 lists, one off the
			 * 'mip' and another off the 'mcip' threaded by
			 * mpi_mi_link and mpi_mci_link respectively. There
			 * is, however, only a single shared total walker
			 * count, and an entry cannot be physically unlinked
			 * if a walker is active on either list. The last
			 * walker does this cleanup of logically deleted
			 * entries.
			 *
			 * With a list of callbacks deleted from above from
			 * mi_promisc_list (headp), remove the corresponding
			 * entry from mci_promisc_list (headp_pair) and free
			 * the structure.
			 */
			for (mcb = rmlist; mcb != NULL; mcb = mcb_next) {
				mac_promisc_impl_t *mpip;
				mac_client_impl_t *mcip;

				mcb_next = mcb->mcb_nextp;
				mpip = (mac_promisc_impl_t *)mcb->mcb_objp;
				mcip = mpip->mpi_mcip;

				ASSERT3P(&mcip->mci_mip->mi_promisc_cb_info,
				    ==, mcbi);
				ASSERT3P(&mcip->mci_mip->mi_promisc_list,
				    ==, headp);

				VERIFY(mac_callback_remove(mcbi,
				    &mcip->mci_promisc_list,
				    &mpip->mpi_mci_link));
				mcb->mcb_flags = 0;
				mcb->mcb_nextp = NULL;
				kmem_cache_free(mac_promisc_impl_cache, mpip);
			}
		}

		/*
		 * Wake any walker threads that could be waiting in
		 * mac_callback_walker_enter() until deleted items have been
		 * cleaned from the list.
		 */
		do_wake = B_TRUE;
	}

	if (mcbi->mcbi_barrier_cnt != 0) {
		/*
		 * One or more threads are waiting for all walkers to exit the
		 * callback list. Notify them, now that the list is clear.
		 */
		do_wake = B_TRUE;
	}

	if (do_wake) {
		cv_broadcast(&mcbi->mcbi_cv);
	}
	mutex_exit(mcbi->mcbi_lockp);
}

static boolean_t
mac_callback_lookup(mac_cb_t **mcb_headp, mac_cb_t *mcb_elem)
{
	mac_cb_t	*mcb;

	/* Search the callback list for the element */
	for (mcb = *mcb_headp; mcb != NULL; mcb = mcb->mcb_nextp) {
		if (mcb == mcb_elem)
			return (B_TRUE);
	}

	return (B_FALSE);
}

static boolean_t
mac_callback_find(mac_cb_info_t *mcbi, mac_cb_t **mcb_headp,
    mac_cb_t *mcb_elem)
{
	boolean_t	found;

	mutex_enter(mcbi->mcbi_lockp);
	found = mac_callback_lookup(mcb_headp, mcb_elem);
	mutex_exit(mcbi->mcbi_lockp);

	return (found);
}

/* Free the list of removed callbacks */
void
mac_callback_free(mac_cb_t *rmlist)
{
	mac_cb_t	*mcb;
	mac_cb_t	*mcb_next;

	for (mcb = rmlist; mcb != NULL; mcb = mcb_next) {
		mcb_next = mcb->mcb_nextp;
		kmem_free(mcb->mcb_objp, mcb->mcb_objsize);
	}
}

void
i_mac_notify(mac_impl_t *mip, mac_notify_type_t type)
{
	mac_cb_info_t *mcbi;

	/*
	 * Signal the notify thread even after mi_ref has become zero and
	 * mi_disabled is set. The synchronization with the notify thread
	 * happens in mac_unregister and that implies the driver must make
	 * sure it is single-threaded (with respect to mac calls) and that
	 * all pending mac calls have returned before it calls mac_unregister.
	 */
	rw_enter(&i_mac_impl_lock, RW_READER);
	if (mip->mi_state_flags & MIS_DISABLED)
		goto exit;

	/*
	 * Guard against incorrect notifications. (Running a newer
	 * mac client against an older implementation?)
	 */
	if (type >= MAC_NNOTE)
		goto exit;

	mcbi = &mip->mi_notify_cb_info;
	mutex_enter(mcbi->mcbi_lockp);
	mip->mi_notify_bits |= (1 << type);
	cv_broadcast(&mcbi->mcbi_cv);
	mutex_exit(mcbi->mcbi_lockp);

exit:
	rw_exit(&i_mac_impl_lock);
}
/*
 * Mac serialization primitives. Please see the block comment at the
 * top of the file.
 */
void
i_mac_perim_enter(mac_impl_t *mip)
{
	mac_client_impl_t *mcip;

	if (mip->mi_state_flags & MIS_IS_VNIC) {
		/*
		 * This is a VNIC. Use the lower mac since that is what
		 * we want to serialize on.
		 */
		mcip = mac_vnic_lower(mip);
		mip = mcip->mci_mip;
	}

	mutex_enter(&mip->mi_perim_lock);
	if (mip->mi_perim_owner == curthread) {
		mip->mi_perim_ocnt++;
		mutex_exit(&mip->mi_perim_lock);
		return;
	}

	while (mip->mi_perim_owner != NULL)
		cv_wait(&mip->mi_perim_cv, &mip->mi_perim_lock);

	mip->mi_perim_owner = curthread;
	ASSERT(mip->mi_perim_ocnt == 0);
	mip->mi_perim_ocnt++;
#ifdef DEBUG
	mip->mi_perim_stack_depth = getpcstack(mip->mi_perim_stack,
	    MAC_PERIM_STACK_DEPTH);
#endif
	mutex_exit(&mip->mi_perim_lock);
}

int
i_mac_perim_enter_nowait(mac_impl_t *mip)
{
	/*
	 * The vnic is a special case, since the serialization is done based
	 * on the lower mac. If the lower mac is busy, it does not imply the
	 * vnic can't be unregistered. But in the case of other drivers,
	 * a busy perimeter or open mac handles implies that the mac is busy
	 * and can't be unregistered.
	 */
	if (mip->mi_state_flags & MIS_IS_VNIC) {
		i_mac_perim_enter(mip);
		return (0);
	}

	mutex_enter(&mip->mi_perim_lock);
	if (mip->mi_perim_owner != NULL) {
		mutex_exit(&mip->mi_perim_lock);
		return (EBUSY);
	}
	ASSERT(mip->mi_perim_ocnt == 0);
	mip->mi_perim_owner = curthread;
	mip->mi_perim_ocnt++;
	mutex_exit(&mip->mi_perim_lock);

	return (0);
}

void
i_mac_perim_exit(mac_impl_t *mip)
{
	mac_client_impl_t *mcip;

	if (mip->mi_state_flags & MIS_IS_VNIC) {
		/*
		 * This is a VNIC. Use the lower mac since that is what
		 * we want to serialize on.
		 */
		mcip = mac_vnic_lower(mip);
		mip = mcip->mci_mip;
	}

	ASSERT(mip->mi_perim_owner == curthread && mip->mi_perim_ocnt != 0);

	mutex_enter(&mip->mi_perim_lock);
	if (--mip->mi_perim_ocnt == 0) {
		mip->mi_perim_owner = NULL;
		cv_signal(&mip->mi_perim_cv);
	}
	mutex_exit(&mip->mi_perim_lock);
}

/*
 * Returns whether the current thread holds the mac perimeter. Used in making
 * assertions.
 */
boolean_t
mac_perim_held(mac_handle_t mh)
{
	mac_impl_t *mip = (mac_impl_t *)mh;
	mac_client_impl_t *mcip;

	if (mip->mi_state_flags & MIS_IS_VNIC) {
		/*
		 * This is a VNIC. Use the lower mac since that is what
		 * we want to serialize on.
		 */
		mcip = mac_vnic_lower(mip);
		mip = mcip->mci_mip;
	}
	return (mip->mi_perim_owner == curthread);
}

/*
 * mac client interfaces to enter the mac perimeter of a mac end point, given
 * its mac handle, or macname or linkid.
 */
void
mac_perim_enter_by_mh(mac_handle_t mh, mac_perim_handle_t *mphp)
{
	mac_impl_t	*mip = (mac_impl_t *)mh;

	i_mac_perim_enter(mip);
	/*
	 * The mac_perim_handle_t returned encodes the 'mip' and whether a
	 * mac_open has been done internally while entering the perimeter.
	 * This information is used in mac_perim_exit.
	 */
	MAC_ENCODE_MPH(*mphp, mip, 0);
}
int
mac_perim_enter_by_macname(const char *name, mac_perim_handle_t *mphp)
{
	int		err;
	mac_handle_t	mh;

	if ((err = mac_open(name, &mh)) != 0)
		return (err);

	mac_perim_enter_by_mh(mh, mphp);
	MAC_ENCODE_MPH(*mphp, mh, 1);
	return (0);
}

int
mac_perim_enter_by_linkid(datalink_id_t linkid, mac_perim_handle_t *mphp)
{
	int		err;
	mac_handle_t	mh;

	if ((err = mac_open_by_linkid(linkid, &mh)) != 0)
		return (err);

	mac_perim_enter_by_mh(mh, mphp);
	MAC_ENCODE_MPH(*mphp, mh, 1);
	return (0);
}

void
mac_perim_exit(mac_perim_handle_t mph)
{
	mac_impl_t	*mip;
	boolean_t	need_close;

	MAC_DECODE_MPH(mph, mip, need_close);
	i_mac_perim_exit(mip);
	if (need_close)
		mac_close((mac_handle_t)mip);
}

int
mac_hold(const char *macname, mac_impl_t **pmip)
{
	mac_impl_t	*mip;
	int		err;

	/*
	 * Check the device name length to make sure it won't overflow our
	 * buffer.
	 */
	if (strlen(macname) >= MAXNAMELEN)
		return (EINVAL);

	/*
	 * Look up its entry in the global hash table.
	 */
	rw_enter(&i_mac_impl_lock, RW_WRITER);
	err = mod_hash_find(i_mac_impl_hash, (mod_hash_key_t)macname,
	    (mod_hash_val_t *)&mip);

	if (err != 0) {
		rw_exit(&i_mac_impl_lock);
		return (ENOENT);
	}

	if (mip->mi_state_flags & MIS_DISABLED) {
		rw_exit(&i_mac_impl_lock);
		return (ENOENT);
	}

	if (mip->mi_state_flags & MIS_EXCLUSIVE_HELD) {
		rw_exit(&i_mac_impl_lock);
		return (EBUSY);
	}

	mip->mi_ref++;
	rw_exit(&i_mac_impl_lock);

	*pmip = mip;
	return (0);
}

void
mac_rele(mac_impl_t *mip)
{
	rw_enter(&i_mac_impl_lock, RW_WRITER);
	ASSERT(mip->mi_ref != 0);
	if (--mip->mi_ref == 0) {
		ASSERT(mip->mi_nactiveclients == 0 &&
		    !(mip->mi_state_flags & MIS_EXCLUSIVE));
	}
	rw_exit(&i_mac_impl_lock);
}
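/*
 * For example (sketch), a mac_hold()/mac_rele() pair brackets any use of
 * the looked-up mac_impl_t:
 *
 *	mac_impl_t *mip;
 *
 *	if (mac_hold(macname, &mip) == 0) {
 *		... use mip; mi_ref keeps it from going away ...
 *		mac_rele(mip);
 *	}
 */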
/*
 * Private GLDv3 function to start a MAC instance.
 */
int
mac_start(mac_handle_t mh)
{
	mac_impl_t	*mip = (mac_impl_t *)mh;
	int		err = 0;
	mac_group_t	*defgrp;

	ASSERT(MAC_PERIM_HELD((mac_handle_t)mip));
	ASSERT(mip->mi_start != NULL);

	/*
	 * Check whether the device is already started.
	 */
	if (mip->mi_active++ == 0) {
		mac_ring_t *ring = NULL;

		/*
		 * Start the device.
		 */
		err = mip->mi_start(mip->mi_driver);
		if (err != 0) {
			mip->mi_active--;
			return (err);
		}

		/*
		 * Start the default tx ring.
		 */
		if (mip->mi_default_tx_ring != NULL) {

			ring = (mac_ring_t *)mip->mi_default_tx_ring;
			if (ring->mr_state != MR_INUSE) {
				err = mac_start_ring(ring);
				if (err != 0) {
					mip->mi_active--;
					return (err);
				}
			}
		}

		if ((defgrp = MAC_DEFAULT_RX_GROUP(mip)) != NULL) {
			/*
			 * Start the default group which is responsible
			 * for receiving broadcast and multicast
			 * traffic for both primary and non-primary
			 * MAC clients.
			 */
			ASSERT(defgrp->mrg_state == MAC_GROUP_STATE_REGISTERED);
			err = mac_start_group_and_rings(defgrp);
			if (err != 0) {
				mip->mi_active--;
				if ((ring != NULL) &&
				    (ring->mr_state == MR_INUSE))
					mac_stop_ring(ring);
				return (err);
			}
			mac_set_group_state(defgrp, MAC_GROUP_STATE_SHARED);
		}
	}

	return (err);
}

/*
 * Private GLDv3 function to stop a MAC instance.
 */
void
mac_stop(mac_handle_t mh)
{
	mac_impl_t	*mip = (mac_impl_t *)mh;
	mac_group_t	*grp;

	ASSERT(mip->mi_stop != NULL);
	ASSERT(MAC_PERIM_HELD((mac_handle_t)mip));

	/*
	 * Check whether the device is still needed.
	 */
	ASSERT(mip->mi_active != 0);
	if (--mip->mi_active == 0) {
		if ((grp = MAC_DEFAULT_RX_GROUP(mip)) != NULL) {
			/*
			 * There should be no more active clients since the
			 * MAC is being stopped. Stop the default RX group
			 * and transition it back to registered state.
			 *
			 * When clients are torn down, the groups
			 * are released via mac_release_rx_group which
			 * knows that the default group is always in
			 * started mode since broadcast uses it. So
			 * we can assert that there are no clients
			 * (since mac_bcast_add doesn't register itself
			 * as a client) and the group is in SHARED state.
			 */
			ASSERT(grp->mrg_state == MAC_GROUP_STATE_SHARED);
			ASSERT(MAC_GROUP_NO_CLIENT(grp) &&
			    mip->mi_nactiveclients == 0);
			mac_stop_group_and_rings(grp);
			mac_set_group_state(grp, MAC_GROUP_STATE_REGISTERED);
		}

		if (mip->mi_default_tx_ring != NULL) {
			mac_ring_t *ring;

			ring = (mac_ring_t *)mip->mi_default_tx_ring;
			if (ring->mr_state == MR_INUSE) {
				mac_stop_ring(ring);
				ring->mr_flag = 0;
			}
		}

		/*
		 * Stop the device.
		 */
		mip->mi_stop(mip->mi_driver);
	}
}

int
i_mac_promisc_set(mac_impl_t *mip, boolean_t on)
{
	int		err = 0;

	ASSERT(MAC_PERIM_HELD((mac_handle_t)mip));
	ASSERT(mip->mi_setpromisc != NULL);

	if (on) {
		/*
		 * Enable promiscuous mode on the device if not yet enabled.
		 */
		if (mip->mi_devpromisc++ == 0) {
			err = mip->mi_setpromisc(mip->mi_driver, B_TRUE);
			if (err != 0) {
				mip->mi_devpromisc--;
				return (err);
			}
			i_mac_notify(mip, MAC_NOTE_DEVPROMISC);
		}
	} else {
		if (mip->mi_devpromisc == 0)
			return (EPROTO);

		/*
		 * Disable promiscuous mode on the device if this is the last
		 * enabling.
		 */
		if (--mip->mi_devpromisc == 0) {
			err = mip->mi_setpromisc(mip->mi_driver, B_FALSE);
			if (err != 0) {
				mip->mi_devpromisc++;
				return (err);
			}
			i_mac_notify(mip, MAC_NOTE_DEVPROMISC);
		}
	}

	return (0);
}

/*
 * The promiscuity state can change any time. If the caller needs to take
 * actions that are atomic with the promiscuity state, then the caller needs
 * to bracket the entire sequence with mac_perim_enter/exit.
 */
boolean_t
mac_promisc_get(mac_handle_t mh)
{
	mac_impl_t	*mip = (mac_impl_t *)mh;

	/*
	 * Return the current promiscuity.
	 */
	return (mip->mi_devpromisc != 0);
}
/*
 * Invoked at MAC instance attach time to initialize the list
 * of factory MAC addresses supported by a MAC instance. This function
 * builds a local cache in the mac_impl_t for the MAC addresses
 * supported by the underlying hardware. The MAC clients themselves
 * use the mac_addr_factory*() functions to query and reserve
 * factory MAC addresses.
 */
void
mac_addr_factory_init(mac_impl_t *mip)
{
	mac_capab_multifactaddr_t capab;
	uint8_t *addr;
	int i;

	/*
	 * First round to see how many factory MAC addresses are available.
	 */
	bzero(&capab, sizeof (capab));
	if (!i_mac_capab_get((mac_handle_t)mip, MAC_CAPAB_MULTIFACTADDR,
	    &capab) || (capab.mcm_naddr == 0)) {
		/*
		 * The MAC instance doesn't support multiple factory
		 * MAC addresses, we're done here.
		 */
		return;
	}

	/*
	 * Allocate the space and get all the factory addresses.
	 */
	addr = kmem_alloc(capab.mcm_naddr * MAXMACADDRLEN, KM_SLEEP);
	capab.mcm_getaddr(mip->mi_driver, capab.mcm_naddr, addr);

	mip->mi_factory_addr_num = capab.mcm_naddr;
	mip->mi_factory_addr = kmem_zalloc(mip->mi_factory_addr_num *
	    sizeof (mac_factory_addr_t), KM_SLEEP);

	for (i = 0; i < capab.mcm_naddr; i++) {
		bcopy(addr + i * MAXMACADDRLEN,
		    mip->mi_factory_addr[i].mfa_addr,
		    mip->mi_type->mt_addr_length);
		mip->mi_factory_addr[i].mfa_in_use = B_FALSE;
	}

	kmem_free(addr, capab.mcm_naddr * MAXMACADDRLEN);
}

void
mac_addr_factory_fini(mac_impl_t *mip)
{
	if (mip->mi_factory_addr == NULL) {
		ASSERT(mip->mi_factory_addr_num == 0);
		return;
	}

	kmem_free(mip->mi_factory_addr, mip->mi_factory_addr_num *
	    sizeof (mac_factory_addr_t));

	mip->mi_factory_addr = NULL;
	mip->mi_factory_addr_num = 0;
}

/*
 * Reserve a factory MAC address. If *slot is set to -1, the function
 * attempts to reserve any of the available factory MAC addresses and
 * returns the reserved slot id. If no slots are available, the function
 * returns ENOSPC. If *slot is not set to -1, the function reserves
 * the specified slot if it is available, or returns EBUSY if the slot
 * is already used. Returns ENOTSUP if the underlying MAC does not
 * support multiple factory addresses. If the slot number is not -1 but
 * is invalid, returns EINVAL.
 */
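/*
 * For example (sketch), a client reserving whichever slot is free and
 * releasing it once done:
 *
 *	int slot = -1;
 *
 *	if (mac_addr_factory_reserve(mch, &slot) == 0) {
 *		... use the factory address in 'slot' ...
 *		mac_addr_factory_release(mch, slot);
 *	}
 */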
int
mac_addr_factory_reserve(mac_client_handle_t mch, int *slot)
{
	mac_client_impl_t *mcip = (mac_client_impl_t *)mch;
	mac_impl_t *mip = mcip->mci_mip;
	int i, ret = 0;

	i_mac_perim_enter(mip);
	/*
	 * Protect against concurrent readers that may need a self-consistent
	 * view of the factory addresses.
	 */
	rw_enter(&mip->mi_rw_lock, RW_WRITER);

	if (mip->mi_factory_addr_num == 0) {
		ret = ENOTSUP;
		goto bail;
	}

	if (*slot != -1) {
		/* check the specified slot */
		if (*slot < 1 || *slot > mip->mi_factory_addr_num) {
			ret = EINVAL;
			goto bail;
		}
		if (mip->mi_factory_addr[*slot-1].mfa_in_use) {
			ret = EBUSY;
			goto bail;
		}
	} else {
		/* pick the next available slot */
		for (i = 0; i < mip->mi_factory_addr_num; i++) {
			if (!mip->mi_factory_addr[i].mfa_in_use)
				break;
		}

		if (i == mip->mi_factory_addr_num) {
			ret = ENOSPC;
			goto bail;
		}
		*slot = i+1;
	}

	mip->mi_factory_addr[*slot-1].mfa_in_use = B_TRUE;
	mip->mi_factory_addr[*slot-1].mfa_client = mcip;

bail:
	rw_exit(&mip->mi_rw_lock);
	i_mac_perim_exit(mip);
	return (ret);
}

/*
 * Release the specified factory MAC address slot.
 */
void
mac_addr_factory_release(mac_client_handle_t mch, uint_t slot)
{
	mac_client_impl_t *mcip = (mac_client_impl_t *)mch;
	mac_impl_t *mip = mcip->mci_mip;

	i_mac_perim_enter(mip);
	/*
	 * Protect against concurrent readers that may need a self-consistent
	 * view of the factory addresses.
	 */
	rw_enter(&mip->mi_rw_lock, RW_WRITER);

	ASSERT(slot > 0 && slot <= mip->mi_factory_addr_num);
	ASSERT(mip->mi_factory_addr[slot-1].mfa_in_use);

	mip->mi_factory_addr[slot-1].mfa_in_use = B_FALSE;

	rw_exit(&mip->mi_rw_lock);
	i_mac_perim_exit(mip);
}

/*
 * Stores in mac_addr the value of the specified MAC address slot and, if the
 * slot is in use, stores the name of the reserving client in client_name.
 * The caller must provide a string of at least MAXNAMELEN bytes for
 * client_name.
 */
void
mac_addr_factory_value(mac_handle_t mh, int slot, uchar_t *mac_addr,
    uint_t *addr_len, char *client_name, boolean_t *in_use_arg)
{
	mac_impl_t *mip = (mac_impl_t *)mh;
	boolean_t in_use;

	ASSERT(slot > 0 && slot <= mip->mi_factory_addr_num);

	/*
	 * Readers need to hold mi_rw_lock. Writers need to hold the mac
	 * perimeter and mi_rw_lock.
	 */
	rw_enter(&mip->mi_rw_lock, RW_READER);
	bcopy(mip->mi_factory_addr[slot-1].mfa_addr, mac_addr, MAXMACADDRLEN);
	*addr_len = mip->mi_type->mt_addr_length;
	in_use = mip->mi_factory_addr[slot-1].mfa_in_use;
	if (in_use && client_name != NULL) {
		bcopy(mip->mi_factory_addr[slot-1].mfa_client->mci_name,
		    client_name, MAXNAMELEN);
	}
	if (in_use_arg != NULL)
		*in_use_arg = in_use;
	rw_exit(&mip->mi_rw_lock);
}
/*
 * Returns the number of factory MAC addresses (in addition to the
 * primary MAC address), 0 if the underlying MAC doesn't support
 * that feature.
 */
uint_t
mac_addr_factory_num(mac_handle_t mh)
{
	mac_impl_t *mip = (mac_impl_t *)mh;

	return (mip->mi_factory_addr_num);
}


void
mac_rx_group_unmark(mac_group_t *grp, uint_t flag)
{
	mac_ring_t	*ring;

	for (ring = grp->mrg_rings; ring != NULL; ring = ring->mr_next)
		ring->mr_flag &= ~flag;
}

/*
 * The following mac_hwrings_xxx() functions are private mac client functions
 * used by the aggr driver to access and control the underlying HW Rx group
 * and rings. In this case, the aggr driver has exclusive control of the
 * underlying HW Rx group/rings, and it calls the following functions to
 * start/stop the HW Rx rings, disable/enable polling, add/remove MAC
 * addresses, or set up the Rx callback.
 */
/* ARGSUSED */
static void
mac_hwrings_rx_process(void *arg, mac_resource_handle_t srs,
    mblk_t *mp_chain, boolean_t loopback)
{
	mac_soft_ring_set_t	*mac_srs = (mac_soft_ring_set_t *)srs;
	mac_srs_rx_t		*srs_rx = &mac_srs->srs_rx;
	mac_direct_rx_t		proc;
	void			*arg1;
	mac_resource_handle_t	arg2;

	proc = srs_rx->sr_func;
	arg1 = srs_rx->sr_arg1;
	arg2 = mac_srs->srs_mrh;

	proc(arg1, arg2, mp_chain, NULL);
}

/*
 * This function is called to get the list of HW rings that are reserved by
 * an exclusive mac client.
 *
 * Return value: the number of HW rings.
 */
int
mac_hwrings_get(mac_client_handle_t mch, mac_group_handle_t *hwgh,
    mac_ring_handle_t *hwrh, mac_ring_type_t rtype)
{
	mac_client_impl_t *mcip = (mac_client_impl_t *)mch;
	flow_entry_t *flent = mcip->mci_flent;
	mac_group_t *grp;
	mac_ring_t *ring;
	int cnt = 0;

	if (rtype == MAC_RING_TYPE_RX) {
		grp = flent->fe_rx_ring_group;
	} else if (rtype == MAC_RING_TYPE_TX) {
		grp = flent->fe_tx_ring_group;
	} else {
		ASSERT(B_FALSE);
		return (-1);
	}

	/*
	 * The MAC client did not reserve a group, return directly.
	 * This is probably because the underlying MAC does not support
	 * any groups.
	 */
	if (hwgh != NULL)
		*hwgh = NULL;
	if (grp == NULL)
		return (0);
	/*
	 * This group must be reserved by this MAC client.
	 */
	ASSERT((grp->mrg_state == MAC_GROUP_STATE_RESERVED) &&
	    (mcip == MAC_GROUP_ONLY_CLIENT(grp)));

	for (ring = grp->mrg_rings; ring != NULL;
	    ring = ring->mr_next, cnt++) {
		ASSERT(cnt < MAX_RINGS_PER_GROUP);
		hwrh[cnt] = (mac_ring_handle_t)ring;
	}
	if (hwgh != NULL)
		*hwgh = (mac_group_handle_t)grp;

	return (cnt);
}
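/*
 * For example (sketch), an exclusive client such as aggr could start every
 * ring of its reserved Rx group:
 *
 *	mac_ring_handle_t hwrh[MAX_RINGS_PER_GROUP];
 *	mac_group_handle_t hwgh;
 *	int i, cnt;
 *
 *	cnt = mac_hwrings_get(mch, &hwgh, hwrh, MAC_RING_TYPE_RX);
 *	for (i = 0; i < cnt; i++)
 *		(void) mac_hwring_start(hwrh[i]);
 */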
/*
 * Get the HW ring handles of the given group index. If the MAC
 * doesn't have a group at this index, or any groups at all, then 0 is
 * returned and hwgh is set to NULL. This is a private client API. The
 * MAC perimeter must be held when calling this function.
 *
 * mh: A handle to the MAC that owns the group.
 *
 * idx: The index of the HW group to be read.
 *
 * hwgh: If non-NULL, contains a handle to the HW group on return.
 *
 * hwrh: An array of ring handles pointing to the HW rings in the
 * group. The array must be large enough to hold a handle to each ring
 * in the group. To be safe, this array should be of size
 * MAX_RINGS_PER_GROUP.
 *
 * rtype: Used to determine if we are fetching Rx or Tx rings.
 *
 * Returns the number of rings in the group.
 */
uint_t
mac_hwrings_idx_get(mac_handle_t mh, uint_t idx, mac_group_handle_t *hwgh,
    mac_ring_handle_t *hwrh, mac_ring_type_t rtype)
{
	mac_impl_t *mip = (mac_impl_t *)mh;
	mac_group_t *grp;
	mac_ring_t *ring;
	uint_t cnt = 0;

	/*
	 * The MAC perimeter must be held when accessing the
	 * mi_{rx,tx}_groups fields.
	 */
	ASSERT(MAC_PERIM_HELD(mh));
	ASSERT(rtype == MAC_RING_TYPE_RX || rtype == MAC_RING_TYPE_TX);

	if (rtype == MAC_RING_TYPE_RX) {
		grp = mip->mi_rx_groups;
	} else {
		ASSERT(rtype == MAC_RING_TYPE_TX);
		grp = mip->mi_tx_groups;
	}

	while (grp != NULL && grp->mrg_index != idx)
		grp = grp->mrg_next;

	/*
	 * If the MAC doesn't have a group at this index or doesn't
	 * implement the RINGS capab, then set hwgh to NULL and return 0.
	 */
	if (hwgh != NULL)
		*hwgh = NULL;

	if (grp == NULL)
		return (0);

	ASSERT3U(idx, ==, grp->mrg_index);

	for (ring = grp->mrg_rings; ring != NULL;
	    ring = ring->mr_next, cnt++) {
		ASSERT3U(cnt, <, MAX_RINGS_PER_GROUP);
		hwrh[cnt] = (mac_ring_handle_t)ring;
	}

	/* A group should always have at least one ring. */
	ASSERT3U(cnt, >, 0);

	if (hwgh != NULL)
		*hwgh = (mac_group_handle_t)grp;

	return (cnt);
}

/*
 * This function is called to get info about Tx/Rx rings.
 *
 * Return value: returns uint_t which will have various bits set
 * that indicate different properties of the ring.
 */
uint_t
mac_hwring_getinfo(mac_ring_handle_t rh)
{
	mac_ring_t *ring = (mac_ring_t *)rh;
	mac_ring_info_t *info = &ring->mr_info;

	return (info->mri_flags);
}

/*
 * Set the passthru callback on the hardware ring.
 */
void
mac_hwring_set_passthru(mac_ring_handle_t hwrh, mac_rx_t fn, void *arg1,
    mac_resource_handle_t arg2)
{
	mac_ring_t *hwring = (mac_ring_t *)hwrh;

	ASSERT3S(hwring->mr_type, ==, MAC_RING_TYPE_RX);

	hwring->mr_classify_type = MAC_PASSTHRU_CLASSIFIER;

	hwring->mr_pt_fn = fn;
	hwring->mr_pt_arg1 = arg1;
	hwring->mr_pt_arg2 = arg2;
}
/*
 * Clear the passthru callback on the hardware ring.
 */
void
mac_hwring_clear_passthru(mac_ring_handle_t hwrh)
{
	mac_ring_t *hwring = (mac_ring_t *)hwrh;

	ASSERT3S(hwring->mr_type, ==, MAC_RING_TYPE_RX);

	hwring->mr_classify_type = MAC_NO_CLASSIFIER;

	hwring->mr_pt_fn = NULL;
	hwring->mr_pt_arg1 = NULL;
	hwring->mr_pt_arg2 = NULL;
}

void
mac_client_set_flow_cb(mac_client_handle_t mch, mac_rx_t func, void *arg1)
{
	mac_client_impl_t	*mcip = (mac_client_impl_t *)mch;
	flow_entry_t		*flent = mcip->mci_flent;

	mutex_enter(&flent->fe_lock);
	flent->fe_cb_fn = (flow_fn_t)func;
	flent->fe_cb_arg1 = arg1;
	flent->fe_cb_arg2 = NULL;
	flent->fe_flags &= ~FE_MC_NO_DATAPATH;
	mutex_exit(&flent->fe_lock);
}

void
mac_client_clear_flow_cb(mac_client_handle_t mch)
{
	mac_client_impl_t	*mcip = (mac_client_impl_t *)mch;
	flow_entry_t		*flent = mcip->mci_flent;

	mutex_enter(&flent->fe_lock);
	flent->fe_cb_fn = (flow_fn_t)mac_rx_def;
	flent->fe_cb_arg1 = NULL;
	flent->fe_cb_arg2 = NULL;
	flent->fe_flags |= FE_MC_NO_DATAPATH;
	mutex_exit(&flent->fe_lock);
}

/*
 * Export ddi interrupt handles from the HW ring to the pseudo ring and
 * set up the RX callback of the mac client which exclusively controls
 * the HW ring.
 */
void
mac_hwring_setup(mac_ring_handle_t hwrh, mac_resource_handle_t prh,
    mac_ring_handle_t pseudo_rh)
{
	mac_ring_t		*hw_ring = (mac_ring_t *)hwrh;
	mac_ring_t		*pseudo_ring;
	mac_soft_ring_set_t	*mac_srs = hw_ring->mr_srs;

	if (pseudo_rh != NULL) {
		pseudo_ring = (mac_ring_t *)pseudo_rh;
		/* Export the ddi handles to pseudo ring */
		pseudo_ring->mr_info.mri_intr.mi_ddi_handle =
		    hw_ring->mr_info.mri_intr.mi_ddi_handle;
		pseudo_ring->mr_info.mri_intr.mi_ddi_shared =
		    hw_ring->mr_info.mri_intr.mi_ddi_shared;
		/*
		 * Save a pointer to pseudo ring in the hw ring. If
		 * interrupt handle changes, the hw ring will be
		 * notified of the change (see mac_ring_intr_set())
		 * and the appropriate change has to be made to
		 * the pseudo ring that has exported the ddi handle.
		 */
		hw_ring->mr_prh = pseudo_rh;
	}

	if (hw_ring->mr_type == MAC_RING_TYPE_RX) {
		ASSERT(!(mac_srs->srs_type & SRST_TX));
		mac_srs->srs_mrh = prh;
		mac_srs->srs_rx.sr_lower_proc = mac_hwrings_rx_process;
	}
}

void
mac_hwring_teardown(mac_ring_handle_t hwrh)
{
	mac_ring_t		*hw_ring = (mac_ring_t *)hwrh;
	mac_soft_ring_set_t	*mac_srs;

	if (hw_ring == NULL)
		return;
	hw_ring->mr_prh = NULL;
	if (hw_ring->mr_type == MAC_RING_TYPE_RX) {
		mac_srs = hw_ring->mr_srs;
		ASSERT(!(mac_srs->srs_type & SRST_TX));
		mac_srs->srs_rx.sr_lower_proc = mac_rx_srs_process;
		mac_srs->srs_mrh = NULL;
	}
}

int
mac_hwring_disable_intr(mac_ring_handle_t rh)
{
	mac_ring_t *rr_ring = (mac_ring_t *)rh;
	mac_intr_t *intr = &rr_ring->mr_info.mri_intr;

	return (intr->mi_disable(intr->mi_handle));
}

int
mac_hwring_enable_intr(mac_ring_handle_t rh)
{
	mac_ring_t *rr_ring = (mac_ring_t *)rh;
	mac_intr_t *intr = &rr_ring->mr_info.mri_intr;

	return (intr->mi_enable(intr->mi_handle));
}
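/*
 * For example (sketch), a client temporarily moving a ring from interrupt
 * to poll mode might do:
 *
 *	if (mac_hwring_disable_intr(rh) == 0) {
 *		mp_chain = mac_hwring_poll(rh, bytes_to_pickup);
 *		... process mp_chain ...
 *		(void) mac_hwring_enable_intr(rh);
 *	}
 */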
1839 * 1840 * This is used by special MAC clients that are MAC themselves and 1841 * need to exert control over the underlying HW rings of the NIC. 1842 */ 1843 int 1844 mac_hwring_start(mac_ring_handle_t rh) 1845 { 1846 mac_ring_t *rr_ring = (mac_ring_t *)rh; 1847 int rv = 0; 1848 1849 if (rr_ring->mr_state != MR_INUSE) 1850 rv = mac_start_ring(rr_ring); 1851 1852 return (rv); 1853 } 1854 1855 /* 1856 * Stop the HW ring pointed to by rh. Also see mac_hwring_start(). 1857 */ 1858 void 1859 mac_hwring_stop(mac_ring_handle_t rh) 1860 { 1861 mac_ring_t *rr_ring = (mac_ring_t *)rh; 1862 1863 if (rr_ring->mr_state != MR_FREE) 1864 mac_stop_ring(rr_ring); 1865 } 1866 1867 /* 1868 * Remove the quiesced flag from the HW ring pointed to by rh. 1869 * 1870 * This is used by special MAC clients that are MAC themselves and 1871 * need to exert control over the underlying HW rings of the NIC. 1872 */ 1873 int 1874 mac_hwring_activate(mac_ring_handle_t rh) 1875 { 1876 mac_ring_t *rr_ring = (mac_ring_t *)rh; 1877 1878 MAC_RING_UNMARK(rr_ring, MR_QUIESCE); 1879 return (0); 1880 } 1881 1882 /* 1883 * Quiesce the HW ring pointed to by rh. Also see mac_hwring_activate(). 1884 */ 1885 void 1886 mac_hwring_quiesce(mac_ring_handle_t rh) 1887 { 1888 mac_ring_t *rr_ring = (mac_ring_t *)rh; 1889 1890 mac_rx_ring_quiesce(rr_ring, MR_QUIESCE); 1891 } 1892 1893 mblk_t * 1894 mac_hwring_poll(mac_ring_handle_t rh, int bytes_to_pickup) 1895 { 1896 mac_ring_t *rr_ring = (mac_ring_t *)rh; 1897 mac_ring_info_t *info = &rr_ring->mr_info; 1898 1899 return (info->mri_poll(info->mri_driver, bytes_to_pickup)); 1900 } 1901 1902 /* 1903 * Send packets through a selected tx ring. 1904 */ 1905 mblk_t * 1906 mac_hwring_tx(mac_ring_handle_t rh, mblk_t *mp) 1907 { 1908 mac_ring_t *ring = (mac_ring_t *)rh; 1909 mac_ring_info_t *info = &ring->mr_info; 1910 1911 ASSERT(ring->mr_type == MAC_RING_TYPE_TX && 1912 ring->mr_state >= MR_INUSE); 1913 return (info->mri_tx(info->mri_driver, mp)); 1914 } 1915 1916 /* 1917 * Query stats for a particular rx/tx ring. 1918 */ 1919 int 1920 mac_hwring_getstat(mac_ring_handle_t rh, uint_t stat, uint64_t *val) 1921 { 1922 mac_ring_t *ring = (mac_ring_t *)rh; 1923 mac_ring_info_t *info = &ring->mr_info; 1924 1925 return (info->mri_stat(info->mri_driver, stat, val)); 1926 } 1927 1928 /* 1929 * Private function that is only used by aggr to send packets through 1930 * a port/Tx ring. Since aggr exposes a pseudo Tx ring even for ports 1931 * that do not expose Tx rings, the aggr_ring_tx() entry point needs 1932 * access to the mac_impl_t to send packets through the m_tx() entry point. 1933 * It accomplishes this by calling the mac_hwring_send_priv() function. 1934 */ 1935 mblk_t * 1936 mac_hwring_send_priv(mac_client_handle_t mch, mac_ring_handle_t rh, mblk_t *mp) 1937 { 1938 mac_client_impl_t *mcip = (mac_client_impl_t *)mch; 1939 mac_impl_t *mip = mcip->mci_mip; 1940 1941 return (mac_provider_tx(mip, rh, mp, mcip)); 1942 } 1943 1944 /* 1945 * Private function that is only used by aggr to update the default transmission 1946 * ring. Because aggr exposes a pseudo Tx ring even for ports that may 1947 * temporarily be down, it may need to update the default ring that is used by 1948 * MAC such that it refers to a link that can actively be used to send traffic. 1949 * Note that this is different from the case where the port has been removed 1950 * from the group. In that case, all of the rings will be torn down because 1951 * they no longer exist.
It's important to give aggr a case where the 1952 * rings can still exist such that it may be able to continue to send LACP PDUs 1953 * to potentially restore the link. 1954 */ 1955 void 1956 mac_hwring_set_default(mac_handle_t mh, mac_ring_handle_t rh) 1957 { 1958 mac_impl_t *mip = (mac_impl_t *)mh; 1959 mac_ring_t *ring = (mac_ring_t *)rh; 1960 1961 ASSERT(MAC_PERIM_HELD(mh)); 1962 VERIFY(mip->mi_state_flags & MIS_IS_AGGR); 1963 1964 /* 1965 * We used to condition this assignment on the ring's 1966 * 'mr_state' being 'MR_INUSE'. However, there are 1967 * cases where this is called before the ring has any active 1968 * clients, and therefore is not marked as in use. Since the 1969 * sole purpose of this function is for aggr to make sure 1970 * 'mi_default_tx_ring' matches 'lg_tx_ports[0]', it's 1971 * imperative that we update its value regardless of ring 1972 * state. Otherwise, we can end up in a state where 1973 * 'mi_default_tx_ring' points to a pseudo ring of a downed 1974 * port, even when 'lg_tx_ports[0]' points to a port that is 1975 * up. 1976 */ 1977 mip->mi_default_tx_ring = rh; 1978 } 1979 1980 int 1981 mac_hwgroup_addmac(mac_group_handle_t gh, const uint8_t *addr) 1982 { 1983 mac_group_t *group = (mac_group_t *)gh; 1984 1985 return (mac_group_addmac(group, addr)); 1986 } 1987 1988 int 1989 mac_hwgroup_remmac(mac_group_handle_t gh, const uint8_t *addr) 1990 { 1991 mac_group_t *group = (mac_group_t *)gh; 1992 1993 return (mac_group_remmac(group, addr)); 1994 } 1995 1996 /* 1997 * Program the group's HW VLAN filter if it has such support. 1998 * Otherwise, the group will implicitly accept tagged traffic and 1999 * there is nothing to do. 2000 */ 2001 int 2002 mac_hwgroup_addvlan(mac_group_handle_t gh, uint16_t vid) 2003 { 2004 mac_group_t *group = (mac_group_t *)gh; 2005 2006 if (!MAC_GROUP_HW_VLAN(group)) 2007 return (0); 2008 2009 return (mac_group_addvlan(group, vid)); 2010 } 2011 2012 int 2013 mac_hwgroup_remvlan(mac_group_handle_t gh, uint16_t vid) 2014 { 2015 mac_group_t *group = (mac_group_t *)gh; 2016 2017 if (!MAC_GROUP_HW_VLAN(group)) 2018 return (0); 2019 2020 return (mac_group_remvlan(group, vid)); 2021 } 2022 2023 /* 2024 * Determine if a MAC has HW VLAN support. This is a private API 2025 * consumed by aggr. In the future it might be nice to have a bitfield 2026 * in mac_capab_rings_t to track which forms of HW filtering are 2027 * supported by the MAC. 2028 */ 2029 boolean_t 2030 mac_has_hw_vlan(mac_handle_t mh) 2031 { 2032 mac_impl_t *mip = (mac_impl_t *)mh; 2033 2034 return (MAC_GROUP_HW_VLAN(mip->mi_rx_groups)); 2035 } 2036 2037 /* 2038 * Get the number of Rx HW groups on this MAC. 2039 */ 2040 uint_t 2041 mac_get_num_rx_groups(mac_handle_t mh) 2042 { 2043 mac_impl_t *mip = (mac_impl_t *)mh; 2044 2045 ASSERT(MAC_PERIM_HELD(mh)); 2046 return (mip->mi_rx_group_count); 2047 } 2048 2049 int 2050 mac_set_promisc(mac_handle_t mh, boolean_t value) 2051 { 2052 mac_impl_t *mip = (mac_impl_t *)mh; 2053 2054 ASSERT(MAC_PERIM_HELD(mh)); 2055 return (i_mac_promisc_set(mip, value)); 2056 } 2057 2058 /* 2059 * Set the RX group to be shared/reserved. Note that the group must be 2060 * started/stopped outside of this function. 2061 */ 2062 void 2063 mac_set_group_state(mac_group_t *grp, mac_group_state_t state) 2064 { 2065 /* 2066 * If there is no change in the group state, just return. 2067 */ 2068 if (grp->mrg_state == state) 2069 return; 2070 2071 switch (state) { 2072 case MAC_GROUP_STATE_RESERVED: 2073 /* 2074 * Successfully reserved the group.
2075 * 2076 * Given that there is an exclusive client controlling this 2077 * group, we enable the group level polling when available, 2078 * so that SRSs get to turn on/off individual rings they're 2079 * assigned to. 2080 */ 2081 ASSERT(MAC_PERIM_HELD(grp->mrg_mh)); 2082 2083 if (grp->mrg_type == MAC_RING_TYPE_RX && 2084 GROUP_INTR_DISABLE_FUNC(grp) != NULL) { 2085 GROUP_INTR_DISABLE_FUNC(grp)(GROUP_INTR_HANDLE(grp)); 2086 } 2087 break; 2088 2089 case MAC_GROUP_STATE_SHARED: 2090 /* 2091 * Set all rings of this group to software classified. 2092 * If the group has an overriding interrupt, then re-enable it. 2093 */ 2094 ASSERT(MAC_PERIM_HELD(grp->mrg_mh)); 2095 2096 if (grp->mrg_type == MAC_RING_TYPE_RX && 2097 GROUP_INTR_ENABLE_FUNC(grp) != NULL) { 2098 GROUP_INTR_ENABLE_FUNC(grp)(GROUP_INTR_HANDLE(grp)); 2099 } 2100 /* The ring is not available for reservations any more */ 2101 break; 2102 2103 case MAC_GROUP_STATE_REGISTERED: 2104 /* Also callable from mac_register, perim is not held */ 2105 break; 2106 2107 default: 2108 ASSERT(B_FALSE); 2109 break; 2110 } 2111 2112 grp->mrg_state = state; 2113 } 2114 2115 /* 2116 * Quiesce future hardware classified packets for the specified Rx ring. 2117 */ 2118 static void 2119 mac_rx_ring_quiesce(mac_ring_t *rx_ring, uint_t ring_flag) 2120 { 2121 ASSERT(rx_ring->mr_classify_type == MAC_HW_CLASSIFIER); 2122 ASSERT(ring_flag == MR_CONDEMNED || ring_flag == MR_QUIESCE); 2123 2124 mutex_enter(&rx_ring->mr_lock); 2125 rx_ring->mr_flag |= ring_flag; 2126 while (rx_ring->mr_refcnt != 0) 2127 cv_wait(&rx_ring->mr_cv, &rx_ring->mr_lock); 2128 mutex_exit(&rx_ring->mr_lock); 2129 } 2130 2131 /* 2132 * Please see mac_tx for details about the per-CPU locking scheme. 2133 */ 2134 static void 2135 mac_tx_lock_all(mac_client_impl_t *mcip) 2136 { 2137 int i; 2138 2139 for (i = 0; i <= mac_tx_percpu_cnt; i++) 2140 mutex_enter(&mcip->mci_tx_pcpu[i].pcpu_tx_lock); 2141 } 2142 2143 static void 2144 mac_tx_unlock_all(mac_client_impl_t *mcip) 2145 { 2146 int i; 2147 2148 for (i = mac_tx_percpu_cnt; i >= 0; i--) 2149 mutex_exit(&mcip->mci_tx_pcpu[i].pcpu_tx_lock); 2150 } 2151 2152 static void 2153 mac_tx_unlock_allbutzero(mac_client_impl_t *mcip) 2154 { 2155 int i; 2156 2157 for (i = mac_tx_percpu_cnt; i > 0; i--) 2158 mutex_exit(&mcip->mci_tx_pcpu[i].pcpu_tx_lock); 2159 } 2160 2161 static int 2162 mac_tx_sum_refcnt(mac_client_impl_t *mcip) 2163 { 2164 int i; 2165 int refcnt = 0; 2166 2167 for (i = 0; i <= mac_tx_percpu_cnt; i++) 2168 refcnt += mcip->mci_tx_pcpu[i].pcpu_tx_refcnt; 2169 2170 return (refcnt); 2171 } 2172 2173 /* 2174 * Stop future Tx packets coming down from the client in preparation for 2175 * quiescing the Tx side. This is needed for dynamic reclaim and reassignment 2176 * of rings between clients. 2177 */ 2178 void 2179 mac_tx_client_block(mac_client_impl_t *mcip) 2180 { 2181 mac_tx_lock_all(mcip); 2182 mcip->mci_tx_flag |= MCI_TX_QUIESCE; 2183 while (mac_tx_sum_refcnt(mcip) != 0) { 2184 mac_tx_unlock_allbutzero(mcip); 2185 cv_wait(&mcip->mci_tx_cv, &mcip->mci_tx_pcpu[0].pcpu_tx_lock); 2186 mutex_exit(&mcip->mci_tx_pcpu[0].pcpu_tx_lock); 2187 mac_tx_lock_all(mcip); 2188 } 2189 mac_tx_unlock_all(mcip); 2190 } 2191 2192 void 2193 mac_tx_client_unblock(mac_client_impl_t *mcip) 2194 { 2195 mac_tx_lock_all(mcip); 2196 mcip->mci_tx_flag &= ~MCI_TX_QUIESCE; 2197 mac_tx_unlock_all(mcip); 2198 /* 2199 * We may fail to disable flow control for the last MAC_NOTE_TX 2200 * notification because the MAC client is quiesced. Send the 2201 * notification again.
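 *
 * (For reference, a sketch of the data path half of this handshake,
 * simplified from mac_tx(); "cpu" is the calling CPU's slot:
 *
 *	mutex_enter(&mcip->mci_tx_pcpu[cpu].pcpu_tx_lock);
 *	if (mcip->mci_tx_flag & MCI_TX_QUIESCE)
 *		bail out, the client is blocked
 *	mcip->mci_tx_pcpu[cpu].pcpu_tx_refcnt++;
 *	mutex_exit(&mcip->mci_tx_pcpu[cpu].pcpu_tx_lock);
 *	... transmit ...
 *	drop pcpu_tx_refcnt under the same lock, cv_signal mci_tx_cv
 *	if a blocker may be waiting
 *
 * mac_tx_client_block() above simply waits until the sum of the
 * per-CPU refcnts drains to zero.)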
2202 */ 2203 i_mac_notify(mcip->mci_mip, MAC_NOTE_TX); 2204 } 2205 2206 /* 2207 * Wait for an SRS to quiesce. The SRS worker will signal us when the 2208 * quiesce is done. 2209 */ 2210 static void 2211 mac_srs_quiesce_wait(mac_soft_ring_set_t *srs, uint_t srs_flag) 2212 { 2213 mutex_enter(&srs->srs_lock); 2214 while (!(srs->srs_state & srs_flag)) 2215 cv_wait(&srs->srs_quiesce_done_cv, &srs->srs_lock); 2216 mutex_exit(&srs->srs_lock); 2217 } 2218 2219 /* 2220 * Quiescing an Rx SRS is achieved by the following sequence. The protocol 2221 * works bottom up by cutting off packet flow from the bottommost point in the 2222 * mac, then the SRS, and then the soft rings. There are two use cases of this 2223 * mechanism. One is a temporary quiesce of the SRS, such as while changing 2224 * the Rx callbacks. Another use case is Rx SRS teardown. In the former case 2225 * the QUIESCE prefix/suffix is used and in the latter CONDEMNED is used 2226 * for the SRS and MR flags. In the former case the threads pause waiting for 2227 * a restart, while in the latter case the threads exit. The Tx SRS teardown 2228 * is mostly similar to the above. 2229 * 2230 * 1. Stop future hardware classified packets at the lowest level in the mac. 2231 * Remove any hardware classification rule (CONDEMNED case) and mark the 2232 * rings as CONDEMNED or QUIESCE as appropriate. This prevents the mr_refcnt 2233 * from increasing. Upcalls from the driver that come through hardware 2234 * classification will be dropped in mac_rx from now on. Then we wait for 2235 * the mr_refcnt to drop to zero. When the mr_refcnt reaches zero we are 2236 * sure there aren't any upcall threads from the driver through hardware 2237 * classification. In the case of SRS teardown we also remove the 2238 * classification rule in the driver. 2239 * 2240 * 2. Stop future software classified packets by marking the flow entry with 2241 * FE_QUIESCE or FE_CONDEMNED as appropriate which prevents the refcnt from 2242 * increasing. We also remove the flow entry from the table in the latter 2243 * case. Then wait for the fe_refcnt to reach an appropriate quiescent value 2244 * that indicates there aren't any active threads using that flow entry. 2245 * 2246 * 3. Quiesce the SRS and softrings by signaling the SRS. The SRS poll thread, 2247 * SRS worker thread, and the soft ring threads are quiesced in sequence 2248 * with the SRS worker thread serving as a master controller. This 2249 * mechanism is explained in mac_srs_worker_quiesce(). 2250 * 2251 * The restart mechanism to reactivate the SRS and softrings is explained 2252 * in mac_srs_worker_restart(). Here we just signal the SRS worker to start the 2253 * restart sequence.
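 *
 * Seen from a MAC client, a temporary quiesce/restart cycle around,
 * say, an Rx callback change reduces to the following sketch, with
 * the mac perimeter held throughout:
 *
 *	mac_rx_client_quiesce(mch);	runs steps 1-3 on each Rx SRS
 *	... change the Rx callbacks ...
 *	mac_rx_client_restart(mch);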
2254 */ 2255 void 2256 mac_rx_srs_quiesce(mac_soft_ring_set_t *srs, uint_t srs_quiesce_flag) 2257 { 2258 flow_entry_t *flent = srs->srs_flent; 2259 uint_t mr_flag, srs_done_flag; 2260 2261 ASSERT(MAC_PERIM_HELD((mac_handle_t)FLENT_TO_MIP(flent))); 2262 ASSERT(!(srs->srs_type & SRST_TX)); 2263 2264 if (srs_quiesce_flag == SRS_CONDEMNED) { 2265 mr_flag = MR_CONDEMNED; 2266 srs_done_flag = SRS_CONDEMNED_DONE; 2267 if (srs->srs_type & SRST_CLIENT_POLL_ENABLED) 2268 mac_srs_client_poll_disable(srs->srs_mcip, srs); 2269 } else { 2270 ASSERT(srs_quiesce_flag == SRS_QUIESCE); 2271 mr_flag = MR_QUIESCE; 2272 srs_done_flag = SRS_QUIESCE_DONE; 2273 if (srs->srs_type & SRST_CLIENT_POLL_ENABLED) 2274 mac_srs_client_poll_quiesce(srs->srs_mcip, srs); 2275 } 2276 2277 if (srs->srs_ring != NULL) { 2278 mac_rx_ring_quiesce(srs->srs_ring, mr_flag); 2279 } else { 2280 /* 2281 * SRS is driven by software classification. In case 2282 * of CONDEMNED, the top level teardown functions will 2283 * deal with flow removal. 2284 */ 2285 if (srs_quiesce_flag != SRS_CONDEMNED) { 2286 FLOW_MARK(flent, FE_QUIESCE); 2287 mac_flow_wait(flent, FLOW_DRIVER_UPCALL); 2288 } 2289 } 2290 2291 /* 2292 * Signal the SRS to quiesce itself, and then cv_wait for the 2293 * SRS quiesce to complete. The SRS worker thread will wake us 2294 * up when the quiesce is complete. 2295 */ 2296 mac_srs_signal(srs, srs_quiesce_flag); 2297 mac_srs_quiesce_wait(srs, srs_done_flag); 2298 } 2299 2300 /* 2301 * Remove an SRS. 2302 */ 2303 void 2304 mac_rx_srs_remove(mac_soft_ring_set_t *srs) 2305 { 2306 flow_entry_t *flent = srs->srs_flent; 2307 int i; 2308 2309 mac_rx_srs_quiesce(srs, SRS_CONDEMNED); 2310 /* 2311 * Locate and remove our entry in the fe_rx_srs[] array, and 2312 * adjust the fe_rx_srs array entries and array count by 2313 * moving the last entry into the vacated spot. 2314 */ 2315 mutex_enter(&flent->fe_lock); 2316 for (i = 0; i < flent->fe_rx_srs_cnt; i++) { 2317 if (flent->fe_rx_srs[i] == srs) 2318 break; 2319 } 2320 2321 ASSERT(i != 0 && i < flent->fe_rx_srs_cnt); 2322 if (i != flent->fe_rx_srs_cnt - 1) { 2323 flent->fe_rx_srs[i] = 2324 flent->fe_rx_srs[flent->fe_rx_srs_cnt - 1]; 2325 i = flent->fe_rx_srs_cnt - 1; 2326 } 2327 2328 flent->fe_rx_srs[i] = NULL; 2329 flent->fe_rx_srs_cnt--; 2330 mutex_exit(&flent->fe_lock); 2331 2332 mac_srs_free(srs); 2333 } 2334 2335 static void 2336 mac_srs_clear_flag(mac_soft_ring_set_t *srs, uint_t flag) 2337 { 2338 mutex_enter(&srs->srs_lock); 2339 srs->srs_state &= ~flag; 2340 mutex_exit(&srs->srs_lock); 2341 } 2342 2343 void 2344 mac_rx_srs_restart(mac_soft_ring_set_t *srs) 2345 { 2346 flow_entry_t *flent = srs->srs_flent; 2347 mac_ring_t *mr; 2348 2349 ASSERT(MAC_PERIM_HELD((mac_handle_t)FLENT_TO_MIP(flent))); 2350 ASSERT((srs->srs_type & SRST_TX) == 0); 2351 2352 /* 2353 * This handles a change in the number of SRSs between the quiesce 2354 * and restart operations of a flow. 2355 */ 2356 if (!SRS_QUIESCED(srs)) 2357 return; 2358 2359 /* 2360 * Signal the SRS to restart itself. Wait for the restart to complete. 2361 * Note that we only restart the SRS if it is not marked as 2362 * permanently quiesced.
2363 */ 2364 if (!SRS_QUIESCED_PERMANENT(srs)) { 2365 mac_srs_signal(srs, SRS_RESTART); 2366 mac_srs_quiesce_wait(srs, SRS_RESTART_DONE); 2367 mac_srs_clear_flag(srs, SRS_RESTART_DONE); 2368 2369 mac_srs_client_poll_restart(srs->srs_mcip, srs); 2370 } 2371 2372 /* Finally clear the flags to let the packets in */ 2373 mr = srs->srs_ring; 2374 if (mr != NULL) { 2375 MAC_RING_UNMARK(mr, MR_QUIESCE); 2376 /* In case the ring was stopped, safely restart it */ 2377 if (mr->mr_state != MR_INUSE) 2378 (void) mac_start_ring(mr); 2379 } else { 2380 FLOW_UNMARK(flent, FE_QUIESCE); 2381 } 2382 } 2383 2384 /* 2385 * Temporary quiesce of a flow and associated Rx SRS. 2386 * Please see block comment above mac_rx_classify_flow_rem. 2387 */ 2388 /* ARGSUSED */ 2389 int 2390 mac_rx_classify_flow_quiesce(flow_entry_t *flent, void *arg) 2391 { 2392 int i; 2393 2394 for (i = 0; i < flent->fe_rx_srs_cnt; i++) { 2395 mac_rx_srs_quiesce((mac_soft_ring_set_t *)flent->fe_rx_srs[i], 2396 SRS_QUIESCE); 2397 } 2398 return (0); 2399 } 2400 2401 /* 2402 * Restart a flow and associated Rx SRS that has been quiesced temporarily. 2403 * Please see the block comment above mac_rx_classify_flow_rem. 2404 */ 2405 /* ARGSUSED */ 2406 int 2407 mac_rx_classify_flow_restart(flow_entry_t *flent, void *arg) 2408 { 2409 int i; 2410 2411 for (i = 0; i < flent->fe_rx_srs_cnt; i++) 2412 mac_rx_srs_restart((mac_soft_ring_set_t *)flent->fe_rx_srs[i]); 2413 2414 return (0); 2415 } 2416 2417 void 2418 mac_srs_perm_quiesce(mac_client_handle_t mch, boolean_t on) 2419 { 2420 mac_client_impl_t *mcip = (mac_client_impl_t *)mch; 2421 flow_entry_t *flent = mcip->mci_flent; 2422 mac_impl_t *mip = mcip->mci_mip; 2423 mac_soft_ring_set_t *mac_srs; 2424 int i; 2425 2426 ASSERT(MAC_PERIM_HELD((mac_handle_t)mip)); 2427 2428 if (flent == NULL) 2429 return; 2430 2431 for (i = 0; i < flent->fe_rx_srs_cnt; i++) { 2432 mac_srs = flent->fe_rx_srs[i]; 2433 mutex_enter(&mac_srs->srs_lock); 2434 if (on) 2435 mac_srs->srs_state |= SRS_QUIESCE_PERM; 2436 else 2437 mac_srs->srs_state &= ~SRS_QUIESCE_PERM; 2438 mutex_exit(&mac_srs->srs_lock); 2439 } 2440 } 2441 2442 void 2443 mac_rx_client_quiesce(mac_client_handle_t mch) 2444 { 2445 mac_client_impl_t *mcip = (mac_client_impl_t *)mch; 2446 mac_impl_t *mip = mcip->mci_mip; 2447 2448 ASSERT(MAC_PERIM_HELD((mac_handle_t)mip)); 2449 2450 if (MCIP_DATAPATH_SETUP(mcip)) { 2451 (void) mac_rx_classify_flow_quiesce(mcip->mci_flent, 2452 NULL); 2453 (void) mac_flow_walk_nolock(mcip->mci_subflow_tab, 2454 mac_rx_classify_flow_quiesce, NULL); 2455 } 2456 } 2457 2458 void 2459 mac_rx_client_restart(mac_client_handle_t mch) 2460 { 2461 mac_client_impl_t *mcip = (mac_client_impl_t *)mch; 2462 mac_impl_t *mip = mcip->mci_mip; 2463 2464 ASSERT(MAC_PERIM_HELD((mac_handle_t)mip)); 2465 2466 if (MCIP_DATAPATH_SETUP(mcip)) { 2467 (void) mac_rx_classify_flow_restart(mcip->mci_flent, NULL); 2468 (void) mac_flow_walk_nolock(mcip->mci_subflow_tab, 2469 mac_rx_classify_flow_restart, NULL); 2470 } 2471 } 2472 2473 /* 2474 * This function only quiesces the Tx SRS and softring worker threads. Callers 2475 * need to make sure that there aren't any mac client threads doing current or 2476 * future transmits in the mac before calling this function.
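 *
 * mac_tx_client_block() provides exactly that guarantee, which is why
 * the usual sequence is the one i_mac_tx_client_quiesce() below
 * implements (sketch):
 *
 *	mac_tx_client_block(mcip);	stop current and future Tx
 *	mac_tx_srs_quiesce(MCIP_TX_SRS(mcip), SRS_QUIESCE);
 *	... quiesce the Tx SRS of each subflow as well ...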
2477 */ 2478 void 2479 mac_tx_srs_quiesce(mac_soft_ring_set_t *srs, uint_t srs_quiesce_flag) 2480 { 2481 mac_client_impl_t *mcip = srs->srs_mcip; 2482 2483 ASSERT(MAC_PERIM_HELD((mac_handle_t)mcip->mci_mip)); 2484 2485 ASSERT(srs->srs_type & SRST_TX); 2486 ASSERT(srs_quiesce_flag == SRS_CONDEMNED || 2487 srs_quiesce_flag == SRS_QUIESCE); 2488 2489 /* 2490 * Signal the SRS to quiesce itself, and then cv_wait for the 2491 * SRS quiesce to complete. The SRS worker thread will wake us 2492 * up when the quiesce is complete. 2493 */ 2494 mac_srs_signal(srs, srs_quiesce_flag); 2495 mac_srs_quiesce_wait(srs, srs_quiesce_flag == SRS_QUIESCE ? 2496 SRS_QUIESCE_DONE : SRS_CONDEMNED_DONE); 2497 } 2498 2499 void 2500 mac_tx_srs_restart(mac_soft_ring_set_t *srs) 2501 { 2502 /* 2503 * Resizing the fanout could result in creation of new SRSs. 2504 * They may not necessarily be in the quiesced state, in which 2505 * case they need not be restarted. 2506 */ 2507 if (!SRS_QUIESCED(srs)) 2508 return; 2509 2510 mac_srs_signal(srs, SRS_RESTART); 2511 mac_srs_quiesce_wait(srs, SRS_RESTART_DONE); 2512 mac_srs_clear_flag(srs, SRS_RESTART_DONE); 2513 } 2514 2515 /* 2516 * Temporary quiesce of a flow and its associated Tx SRS. 2517 * Please see the block comment above mac_rx_srs_quiesce. 2518 */ 2519 /* ARGSUSED */ 2520 int 2521 mac_tx_flow_quiesce(flow_entry_t *flent, void *arg) 2522 { 2523 /* 2524 * The fe_tx_srs is null for a subflow on an interface that is 2525 * not plumbed. 2526 */ 2527 if (flent->fe_tx_srs != NULL) 2528 mac_tx_srs_quiesce(flent->fe_tx_srs, SRS_QUIESCE); 2529 return (0); 2530 } 2531 2532 /* ARGSUSED */ 2533 int 2534 mac_tx_flow_restart(flow_entry_t *flent, void *arg) 2535 { 2536 /* 2537 * The fe_tx_srs is null for a subflow on an interface that is 2538 * not plumbed. 2539 */ 2540 if (flent->fe_tx_srs != NULL) 2541 mac_tx_srs_restart(flent->fe_tx_srs); 2542 return (0); 2543 } 2544 2545 static void 2546 i_mac_tx_client_quiesce(mac_client_handle_t mch, uint_t srs_quiesce_flag) 2547 { 2548 mac_client_impl_t *mcip = (mac_client_impl_t *)mch; 2549 2550 ASSERT(MAC_PERIM_HELD((mac_handle_t)mcip->mci_mip)); 2551 2552 mac_tx_client_block(mcip); 2553 if (MCIP_TX_SRS(mcip) != NULL) { 2554 mac_tx_srs_quiesce(MCIP_TX_SRS(mcip), srs_quiesce_flag); 2555 (void) mac_flow_walk_nolock(mcip->mci_subflow_tab, 2556 mac_tx_flow_quiesce, NULL); 2557 } 2558 } 2559 2560 void 2561 mac_tx_client_quiesce(mac_client_handle_t mch) 2562 { 2563 i_mac_tx_client_quiesce(mch, SRS_QUIESCE); 2564 } 2565 2566 void 2567 mac_tx_client_condemn(mac_client_handle_t mch) 2568 { 2569 i_mac_tx_client_quiesce(mch, SRS_CONDEMNED); 2570 } 2571 2572 void 2573 mac_tx_client_restart(mac_client_handle_t mch) 2574 { 2575 mac_client_impl_t *mcip = (mac_client_impl_t *)mch; 2576 2577 ASSERT(MAC_PERIM_HELD((mac_handle_t)mcip->mci_mip)); 2578 2579 mac_tx_client_unblock(mcip); 2580 if (MCIP_TX_SRS(mcip) != NULL) { 2581 mac_tx_srs_restart(MCIP_TX_SRS(mcip)); 2582 (void) mac_flow_walk_nolock(mcip->mci_subflow_tab, 2583 mac_tx_flow_restart, NULL); 2584 } 2585 } 2586 2587 void 2588 mac_tx_client_flush(mac_client_impl_t *mcip) 2589 { 2590 ASSERT(MAC_PERIM_HELD((mac_handle_t)mcip->mci_mip)); 2591 2592 mac_tx_client_quiesce((mac_client_handle_t)mcip); 2593 mac_tx_client_restart((mac_client_handle_t)mcip); 2594 } 2595 2596 void 2597 mac_client_quiesce(mac_client_impl_t *mcip) 2598 { 2599 mac_rx_client_quiesce((mac_client_handle_t)mcip); 2600 mac_tx_client_quiesce((mac_client_handle_t)mcip); 2601 } 2602 2603 void 2604 mac_client_restart(mac_client_impl_t *mcip) 2605 { 2606
mac_rx_client_restart((mac_client_handle_t)mcip); 2607 mac_tx_client_restart((mac_client_handle_t)mcip); 2608 } 2609 2610 /* 2611 * Allocate a minor number. 2612 */ 2613 minor_t 2614 mac_minor_hold(boolean_t sleep) 2615 { 2616 id_t id; 2617 2618 /* 2619 * Grab a value from the arena. 2620 */ 2621 atomic_inc_32(&minor_count); 2622 2623 if (sleep) 2624 return ((uint_t)id_alloc(minor_ids)); 2625 2626 if ((id = id_alloc_nosleep(minor_ids)) == -1) { 2627 atomic_dec_32(&minor_count); 2628 return (0); 2629 } 2630 2631 return ((uint_t)id); 2632 } 2633 2634 /* 2635 * Release a previously allocated minor number. 2636 */ 2637 void 2638 mac_minor_rele(minor_t minor) 2639 { 2640 /* 2641 * Return the value to the arena. 2642 */ 2643 id_free(minor_ids, minor); 2644 atomic_dec_32(&minor_count); 2645 } 2646 2647 uint32_t 2648 mac_no_notification(mac_handle_t mh) 2649 { 2650 mac_impl_t *mip = (mac_impl_t *)mh; 2651 2652 return (((mip->mi_state_flags & MIS_LEGACY) != 0) ? 2653 mip->mi_capab_legacy.ml_unsup_note : 0); 2654 } 2655 2656 /* 2657 * Prevent any new opens of this mac in preparation for unregister. 2658 */ 2659 int 2660 i_mac_disable(mac_impl_t *mip) 2661 { 2662 mac_client_impl_t *mcip; 2663 2664 rw_enter(&i_mac_impl_lock, RW_WRITER); 2665 if (mip->mi_state_flags & MIS_DISABLED) { 2666 /* Already disabled, return success */ 2667 rw_exit(&i_mac_impl_lock); 2668 return (0); 2669 } 2670 /* 2671 * See if there are any other references to this mac_t (e.g., VLANs). 2672 * If so, return failure. If all the other checks below pass, then 2673 * set mi_disabled atomically under the i_mac_impl_lock to prevent 2674 * any new VLANs from being created or new mac client opens of this 2675 * mac end point. 2676 */ 2677 if (mip->mi_ref > 0) { 2678 rw_exit(&i_mac_impl_lock); 2679 return (EBUSY); 2680 } 2681 2682 /* 2683 * mac clients must delete all multicast groups they join before 2684 * closing. bcast groups are reference counted; the last client 2685 * to delete the group will wait till the group is physically 2686 * deleted. Since all clients have closed this mac end point, 2687 * mi_bcast_ngrps must be zero at this point. 2688 */ 2689 ASSERT(mip->mi_bcast_ngrps == 0); 2690 2691 /* 2692 * Don't let go of this if it has some flows. 2693 * All other code guarantees no flows are added to a disabled 2694 * mac, therefore it is sufficient to check for the flow table 2695 * only here. 2696 */ 2697 mcip = mac_primary_client_handle(mip); 2698 if ((mcip != NULL) && mac_link_has_flows((mac_client_handle_t)mcip)) { 2699 rw_exit(&i_mac_impl_lock); 2700 return (ENOTEMPTY); 2701 } 2702 2703 mip->mi_state_flags |= MIS_DISABLED; 2704 rw_exit(&i_mac_impl_lock); 2705 return (0); 2706 } 2707 2708 int 2709 mac_disable_nowait(mac_handle_t mh) 2710 { 2711 mac_impl_t *mip = (mac_impl_t *)mh; 2712 int err; 2713 2714 if ((err = i_mac_perim_enter_nowait(mip)) != 0) 2715 return (err); 2716 err = i_mac_disable(mip); 2717 i_mac_perim_exit(mip); 2718 return (err); 2719 } 2720 2721 int 2722 mac_disable(mac_handle_t mh) 2723 { 2724 mac_impl_t *mip = (mac_impl_t *)mh; 2725 int err; 2726 2727 i_mac_perim_enter(mip); 2728 err = i_mac_disable(mip); 2729 i_mac_perim_exit(mip); 2730 2731 /* 2732 * Clean up notification thread and wait for it to exit. 2733 */ 2734 if (err == 0) 2735 i_mac_notify_exit(mip); 2736 2737 return (err); 2738 } 2739 2740 /* 2741 * Called when the MAC instance has a non-empty flow table, to de-multiplex 2742 * incoming packets to the right flow.
2743 */ 2744 /* ARGSUSED */ 2745 static mblk_t * 2746 mac_rx_classify(mac_impl_t *mip, mac_resource_handle_t mrh, mblk_t *mp) 2747 { 2748 flow_entry_t *flent = NULL; 2749 uint_t flags = FLOW_INBOUND; 2750 int err; 2751 2752 err = mac_flow_lookup(mip->mi_flow_tab, mp, flags, &flent); 2753 if (err != 0) { 2754 /* no registered receive function */ 2755 return (mp); 2756 } else { 2757 mac_client_impl_t *mcip; 2758 2759 /* 2760 * This flent might just be an additional one on the MAC client, 2761 * i.e. for classification purposes (different fdesc); however, 2762 * the resources, SRS et al., are in the mci_flent, so if 2763 * this isn't the mci_flent, we need to get it. 2764 */ 2765 if ((mcip = flent->fe_mcip) != NULL && 2766 mcip->mci_flent != flent) { 2767 FLOW_REFRELE(flent); 2768 flent = mcip->mci_flent; 2769 FLOW_TRY_REFHOLD(flent, err); 2770 if (err != 0) 2771 return (mp); 2772 } 2773 (flent->fe_cb_fn)(flent->fe_cb_arg1, flent->fe_cb_arg2, mp, 2774 B_FALSE); 2775 FLOW_REFRELE(flent); 2776 } 2777 return (NULL); 2778 } 2779 2780 mblk_t * 2781 mac_rx_flow(mac_handle_t mh, mac_resource_handle_t mrh, mblk_t *mp_chain) 2782 { 2783 mac_impl_t *mip = (mac_impl_t *)mh; 2784 mblk_t *bp, *bp1, **bpp, *list = NULL; 2785 2786 /* 2787 * We walk the chain and attempt to classify each packet. 2788 * The packets that couldn't be classified will be returned 2789 * to the caller. 2790 */ 2791 bp = mp_chain; 2792 bpp = &list; 2793 while (bp != NULL) { 2794 bp1 = bp; 2795 bp = bp->b_next; 2796 bp1->b_next = NULL; 2797 2798 if (mac_rx_classify(mip, mrh, bp1) != NULL) { 2799 *bpp = bp1; 2800 bpp = &bp1->b_next; 2801 } 2802 } 2803 return (list); 2804 } 2805 2806 static int 2807 mac_tx_flow_srs_wakeup(flow_entry_t *flent, void *arg) 2808 { 2809 mac_ring_handle_t ring = arg; 2810 2811 if (flent->fe_tx_srs) 2812 mac_tx_srs_wakeup(flent->fe_tx_srs, ring); 2813 return (0); 2814 } 2815 2816 void 2817 i_mac_tx_srs_notify(mac_impl_t *mip, mac_ring_handle_t ring) 2818 { 2819 mac_client_impl_t *cclient; 2820 mac_soft_ring_set_t *mac_srs; 2821 2822 /* 2823 * After grabbing the mi_rw_lock, the list of clients can't change. 2824 * If there are any clients, mi_disabled must be B_FALSE and can't 2825 * get set since there are clients. If there aren't any clients we 2826 * don't do anything. In any case the mip has to be valid. The driver 2827 * must make sure that it goes single threaded (with respect to mac 2828 * calls) and wait for all pending mac calls to finish before calling 2829 * mac_unregister. 2830 */ 2831 rw_enter(&i_mac_impl_lock, RW_READER); 2832 if (mip->mi_state_flags & MIS_DISABLED) { 2833 rw_exit(&i_mac_impl_lock); 2834 return; 2835 } 2836 2837 /* 2838 * Get MAC tx srs from walking mac_client_handle list. 2839 */ 2840 rw_enter(&mip->mi_rw_lock, RW_READER); 2841 for (cclient = mip->mi_clients_list; cclient != NULL; 2842 cclient = cclient->mci_client_next) { 2843 if ((mac_srs = MCIP_TX_SRS(cclient)) != NULL) { 2844 mac_tx_srs_wakeup(mac_srs, ring); 2845 } else { 2846 /* 2847 * Aggr opens underlying ports in exclusive mode 2848 * and registers flow control callbacks using 2849 * mac_tx_client_notify(). When opened in 2850 * exclusive mode, Tx SRS won't be created 2851 * during mac_unicast_add().
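 *
 * In that case the wakeup below reaches the client through the
 * registered callback list rather than through an SRS; the call flow
 * is roughly (the aggr-side handler name is illustrative):
 *
 *	mac_tx_invoke_callbacks(cclient, (mac_tx_cookie_t)ring)
 *	    -> aggr's registered notify callback(arg, cookie)
 *		-> aggr resumes transmitting on the unblocked ring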
2852 */ 2853 if (cclient->mci_state_flags & MCIS_EXCLUSIVE) { 2854 mac_tx_invoke_callbacks(cclient, 2855 (mac_tx_cookie_t)ring); 2856 } 2857 } 2858 (void) mac_flow_walk(cclient->mci_subflow_tab, 2859 mac_tx_flow_srs_wakeup, ring); 2860 } 2861 rw_exit(&mip->mi_rw_lock); 2862 rw_exit(&i_mac_impl_lock); 2863 } 2864 2865 /* ARGSUSED */ 2866 void 2867 mac_multicast_refresh(mac_handle_t mh, mac_multicst_t refresh, void *arg, 2868 boolean_t add) 2869 { 2870 mac_impl_t *mip = (mac_impl_t *)mh; 2871 2872 i_mac_perim_enter((mac_impl_t *)mh); 2873 /* 2874 * If no specific refresh function was given then default to the 2875 * driver's m_multicst entry point. 2876 */ 2877 if (refresh == NULL) { 2878 refresh = mip->mi_multicst; 2879 arg = mip->mi_driver; 2880 } 2881 2882 mac_bcast_refresh(mip, refresh, arg, add); 2883 i_mac_perim_exit((mac_impl_t *)mh); 2884 } 2885 2886 void 2887 mac_promisc_refresh(mac_handle_t mh, mac_setpromisc_t refresh, void *arg) 2888 { 2889 mac_impl_t *mip = (mac_impl_t *)mh; 2890 2891 /* 2892 * If no specific refresh function was given then default to the 2893 * driver's m_promisc entry point. 2894 */ 2895 if (refresh == NULL) { 2896 refresh = mip->mi_setpromisc; 2897 arg = mip->mi_driver; 2898 } 2899 ASSERT(refresh != NULL); 2900 2901 /* 2902 * Call the refresh function with the current promiscuity. 2903 */ 2904 refresh(arg, (mip->mi_devpromisc != 0)); 2905 } 2906 2907 /* 2908 * The mac client requests that the mac not change its margin size to 2909 * less than the specified value. If "current" is B_TRUE, then the client 2910 * requests that the mac not change its margin size to smaller than the 2911 * current size. Further, return the current margin size value in this case. 2912 * 2913 * We keep every requested size in an ordered list from largest to smallest. 2914 */ 2915 int 2916 mac_margin_add(mac_handle_t mh, uint32_t *marginp, boolean_t current) 2917 { 2918 mac_impl_t *mip = (mac_impl_t *)mh; 2919 mac_margin_req_t **pp, *p; 2920 int err = 0; 2921 2922 rw_enter(&(mip->mi_rw_lock), RW_WRITER); 2923 if (current) 2924 *marginp = mip->mi_margin; 2925 2926 /* 2927 * If the current margin value cannot satisfy the margin requested, 2928 * return ENOTSUP directly. 2929 */ 2930 if (*marginp > mip->mi_margin) { 2931 err = ENOTSUP; 2932 goto done; 2933 } 2934 2935 /* 2936 * Check whether the given margin is already in the list. If so, 2937 * bump the reference count. 2938 */ 2939 for (pp = &mip->mi_mmrp; (p = *pp) != NULL; pp = &p->mmr_nextp) { 2940 if (p->mmr_margin == *marginp) { 2941 /* 2942 * The margin requested is already in the list, 2943 * so just bump the reference count. 2944 */ 2945 p->mmr_ref++; 2946 goto done; 2947 } 2948 if (p->mmr_margin < *marginp) 2949 break; 2950 } 2951 2952 2953 p = kmem_zalloc(sizeof (mac_margin_req_t), KM_SLEEP); 2954 p->mmr_margin = *marginp; 2955 p->mmr_ref++; 2956 p->mmr_nextp = *pp; 2957 *pp = p; 2958 2959 done: 2960 rw_exit(&(mip->mi_rw_lock)); 2961 return (err); 2962 } 2963 2964 /* 2965 * The mac client requests to cancel its previous mac_margin_add() request. 2966 * We remove the requested margin size from the list. 2967 */ 2968 int 2969 mac_margin_remove(mac_handle_t mh, uint32_t margin) 2970 { 2971 mac_impl_t *mip = (mac_impl_t *)mh; 2972 mac_margin_req_t **pp, *p; 2973 int err = 0; 2974 2975 rw_enter(&(mip->mi_rw_lock), RW_WRITER); 2976 /* 2977 * Find the entry in the list for the given margin.
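 *
 * (Illustrative list shape: after mac_margin_add() requests of 8, 4,
 * 4 and 0, the list is mi_mmrp -> {8, ref 1} -> {4, ref 2} ->
 * {0, ref 1}. Removing margin 4 once merely drops that entry's
 * reference count to 1, and mac_margin_update() cannot lower
 * mi_margin below 8 until the head entry goes away.)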
2978 */ 2979 for (pp = &(mip->mi_mmrp); (p = *pp) != NULL; pp = &(p->mmr_nextp)) { 2980 if (p->mmr_margin == margin) { 2981 if (--p->mmr_ref == 0) 2982 break; 2983 2984 /* 2985 * There is still a reference to this margin so 2986 * there's nothing more to do. 2987 */ 2988 goto done; 2989 } 2990 } 2991 2992 /* 2993 * We did not find an entry for the given margin. 2994 */ 2995 if (p == NULL) { 2996 err = ENOENT; 2997 goto done; 2998 } 2999 3000 ASSERT(p->mmr_ref == 0); 3001 3002 /* 3003 * Remove it from the list. 3004 */ 3005 *pp = p->mmr_nextp; 3006 kmem_free(p, sizeof (mac_margin_req_t)); 3007 done: 3008 rw_exit(&(mip->mi_rw_lock)); 3009 return (err); 3010 } 3011 3012 boolean_t 3013 mac_margin_update(mac_handle_t mh, uint32_t margin) 3014 { 3015 mac_impl_t *mip = (mac_impl_t *)mh; 3016 uint32_t margin_needed = 0; 3017 3018 rw_enter(&(mip->mi_rw_lock), RW_WRITER); 3019 3020 if (mip->mi_mmrp != NULL) 3021 margin_needed = mip->mi_mmrp->mmr_margin; 3022 3023 if (margin_needed <= margin) 3024 mip->mi_margin = margin; 3025 3026 rw_exit(&(mip->mi_rw_lock)); 3027 3028 if (margin_needed <= margin) 3029 i_mac_notify(mip, MAC_NOTE_MARGIN); 3030 3031 return (margin_needed <= margin); 3032 } 3033 3034 /* 3035 * MAC clients use this interface to request that a MAC device not change its 3036 * MTU below the specified amount. At this time, that amount must be within the 3037 * range of the device's current minimum and the device's current maximum. e.g. a 3038 * client cannot request a 3000 byte MTU when the device's MTU is currently 3039 * 2000. 3040 * 3041 * If "current" is set to B_TRUE, then the request is simply to reserve the 3042 * current underlying mac's maximum for this mac client and return it in mtup. 3043 */ 3044 int 3045 mac_mtu_add(mac_handle_t mh, uint32_t *mtup, boolean_t current) 3046 { 3047 mac_impl_t *mip = (mac_impl_t *)mh; 3048 mac_mtu_req_t *prev, *cur; 3049 mac_propval_range_t mpr; 3050 int err; 3051 3052 i_mac_perim_enter(mip); 3053 rw_enter(&mip->mi_rw_lock, RW_WRITER); 3054 3055 if (current == B_TRUE) 3056 *mtup = mip->mi_sdu_max; 3057 mpr.mpr_count = 1; 3058 err = mac_prop_info(mh, MAC_PROP_MTU, "mtu", NULL, 0, &mpr, NULL); 3059 if (err != 0) { 3060 rw_exit(&mip->mi_rw_lock); 3061 i_mac_perim_exit(mip); 3062 return (err); 3063 } 3064 3065 if (*mtup > mip->mi_sdu_max || 3066 *mtup < mpr.mpr_range_uint32[0].mpur_min) { 3067 rw_exit(&mip->mi_rw_lock); 3068 i_mac_perim_exit(mip); 3069 return (ENOTSUP); 3070 } 3071 3072 prev = NULL; 3073 for (cur = mip->mi_mtrp; cur != NULL; cur = cur->mtr_nextp) { 3074 if (*mtup == cur->mtr_mtu) { 3075 cur->mtr_ref++; 3076 rw_exit(&mip->mi_rw_lock); 3077 i_mac_perim_exit(mip); 3078 return (0); 3079 } 3080 3081 if (*mtup > cur->mtr_mtu) 3082 break; 3083 3084 prev = cur; 3085 } 3086 3087 cur = kmem_alloc(sizeof (mac_mtu_req_t), KM_SLEEP); 3088 cur->mtr_mtu = *mtup; 3089 cur->mtr_ref = 1; 3090 if (prev != NULL) { 3091 cur->mtr_nextp = prev->mtr_nextp; 3092 prev->mtr_nextp = cur; 3093 } else { 3094 cur->mtr_nextp = mip->mi_mtrp; 3095 mip->mi_mtrp = cur; 3096 } 3097 3098 rw_exit(&mip->mi_rw_lock); 3099 i_mac_perim_exit(mip); 3100 return (0); 3101 } 3102 3103 int 3104 mac_mtu_remove(mac_handle_t mh, uint32_t mtu) 3105 { 3106 mac_impl_t *mip = (mac_impl_t *)mh; 3107 mac_mtu_req_t *cur, *prev; 3108 3109 i_mac_perim_enter(mip); 3110 rw_enter(&mip->mi_rw_lock, RW_WRITER); 3111 3112 prev = NULL; 3113 for (cur = mip->mi_mtrp; cur != NULL; cur = cur->mtr_nextp) { 3114 if (cur->mtr_mtu == mtu) { 3115 ASSERT(cur->mtr_ref > 0); 3116 cur->mtr_ref--; 3117 if
(cur->mtr_ref == 0) { 3118 if (prev == NULL) { 3119 mip->mi_mtrp = cur->mtr_nextp; 3120 } else { 3121 prev->mtr_nextp = cur->mtr_nextp; 3122 } 3123 kmem_free(cur, sizeof (mac_mtu_req_t)); 3124 } 3125 rw_exit(&mip->mi_rw_lock); 3126 i_mac_perim_exit(mip); 3127 return (0); 3128 } 3129 3130 prev = cur; 3131 } 3132 3133 rw_exit(&mip->mi_rw_lock); 3134 i_mac_perim_exit(mip); 3135 return (ENOENT); 3136 } 3137 3138 /* 3139 * MAC Type Plugin functions. 3140 */ 3141 3142 mactype_t * 3143 mactype_getplugin(const char *pname) 3144 { 3145 mactype_t *mtype = NULL; 3146 boolean_t tried_modload = B_FALSE; 3147 3148 mutex_enter(&i_mactype_lock); 3149 3150 find_registered_mactype: 3151 if (mod_hash_find(i_mactype_hash, (mod_hash_key_t)pname, 3152 (mod_hash_val_t *)&mtype) != 0) { 3153 if (!tried_modload) { 3154 /* 3155 * If the plugin has not yet been loaded, then 3156 * attempt to load it now. If modload() succeeds, 3157 * the plugin should have registered using 3158 * mactype_register(), in which case we can go back 3159 * and attempt to find it again. 3160 */ 3161 if (modload(MACTYPE_KMODDIR, (char *)pname) != -1) { 3162 tried_modload = B_TRUE; 3163 goto find_registered_mactype; 3164 } 3165 } 3166 } else { 3167 /* 3168 * Note that there's no danger that the plugin we've loaded 3169 * could be unloaded between the modload() step and the 3170 * reference count bump here, as we're holding 3171 * i_mactype_lock, which mactype_unregister() also holds. 3172 */ 3173 atomic_inc_32(&mtype->mt_ref); 3174 } 3175 3176 mutex_exit(&i_mactype_lock); 3177 return (mtype); 3178 } 3179 3180 mactype_register_t * 3181 mactype_alloc(uint_t mactype_version) 3182 { 3183 mactype_register_t *mtrp; 3184 3185 /* 3186 * Make sure there isn't a version mismatch between the plugin and 3187 * the framework. In the future, if multiple versions are 3188 * supported, this check could become more sophisticated. 3189 */ 3190 if (mactype_version != MACTYPE_VERSION) 3191 return (NULL); 3192 3193 mtrp = kmem_zalloc(sizeof (mactype_register_t), KM_SLEEP); 3194 mtrp->mtr_version = mactype_version; 3195 return (mtrp); 3196 } 3197 3198 void 3199 mactype_free(mactype_register_t *mtrp) 3200 { 3201 kmem_free(mtrp, sizeof (mactype_register_t)); 3202 } 3203 3204 int 3205 mactype_register(mactype_register_t *mtrp) 3206 { 3207 mactype_t *mtp; 3208 mactype_ops_t *ops = mtrp->mtr_ops; 3209 3210 /* Do some sanity checking before we register this MAC type. */ 3211 if (mtrp->mtr_ident == NULL || ops == NULL) 3212 return (EINVAL); 3213 3214 /* 3215 * Verify that all mandatory callbacks are set in the ops 3216 * vector. 
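 *
 * For reference, a plugin registers itself along these lines (a
 * sketch; the "foo" names are hypothetical):
 *
 *	mtrp = mactype_alloc(MACTYPE_VERSION);
 *	mtrp->mtr_ident = "foo";
 *	mtrp->mtr_ops = &foo_type_ops;	all mandatory ops filled in
 *	mtrp->mtr_mactype = DL_OTHER;
 *	mtrp->mtr_nativetype = DL_OTHER;
 *	mtrp->mtr_addrlen = FOO_ADDRL;
 *	err = mactype_register(mtrp);
 *	mactype_free(mtrp);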
3217 */ 3218 if (ops->mtops_unicst_verify == NULL || 3219 ops->mtops_multicst_verify == NULL || 3220 ops->mtops_sap_verify == NULL || 3221 ops->mtops_header == NULL || 3222 ops->mtops_header_info == NULL) { 3223 return (EINVAL); 3224 } 3225 3226 mtp = kmem_zalloc(sizeof (*mtp), KM_SLEEP); 3227 mtp->mt_ident = mtrp->mtr_ident; 3228 mtp->mt_ops = *ops; 3229 mtp->mt_type = mtrp->mtr_mactype; 3230 mtp->mt_nativetype = mtrp->mtr_nativetype; 3231 mtp->mt_addr_length = mtrp->mtr_addrlen; 3232 if (mtrp->mtr_brdcst_addr != NULL) { 3233 mtp->mt_brdcst_addr = kmem_alloc(mtrp->mtr_addrlen, KM_SLEEP); 3234 bcopy(mtrp->mtr_brdcst_addr, mtp->mt_brdcst_addr, 3235 mtrp->mtr_addrlen); 3236 } 3237 3238 mtp->mt_stats = mtrp->mtr_stats; 3239 mtp->mt_statcount = mtrp->mtr_statcount; 3240 3241 mtp->mt_mapping = mtrp->mtr_mapping; 3242 mtp->mt_mappingcount = mtrp->mtr_mappingcount; 3243 3244 if (mod_hash_insert(i_mactype_hash, 3245 (mod_hash_key_t)mtp->mt_ident, (mod_hash_val_t)mtp) != 0) { 3246 kmem_free(mtp->mt_brdcst_addr, mtp->mt_addr_length); 3247 kmem_free(mtp, sizeof (*mtp)); 3248 return (EEXIST); 3249 } 3250 return (0); 3251 } 3252 3253 int 3254 mactype_unregister(const char *ident) 3255 { 3256 mactype_t *mtp; 3257 mod_hash_val_t val; 3258 int err; 3259 3260 /* 3261 * Let's not allow MAC drivers to use this plugin while we're 3262 * trying to unregister it. Holding i_mactype_lock also prevents a 3263 * plugin from unregistering while a MAC driver is attempting to 3264 * hold a reference to it in i_mactype_getplugin(). 3265 */ 3266 mutex_enter(&i_mactype_lock); 3267 3268 if ((err = mod_hash_find(i_mactype_hash, (mod_hash_key_t)ident, 3269 (mod_hash_val_t *)&mtp)) != 0) { 3270 /* A plugin is trying to unregister, but it never registered. */ 3271 err = ENXIO; 3272 goto done; 3273 } 3274 3275 if (mtp->mt_ref != 0) { 3276 err = EBUSY; 3277 goto done; 3278 } 3279 3280 err = mod_hash_remove(i_mactype_hash, (mod_hash_key_t)ident, &val); 3281 ASSERT(err == 0); 3282 if (err != 0) { 3283 /* This should never happen, thus the ASSERT() above. */ 3284 err = EINVAL; 3285 goto done; 3286 } 3287 ASSERT(mtp == (mactype_t *)val); 3288 3289 if (mtp->mt_brdcst_addr != NULL) 3290 kmem_free(mtp->mt_brdcst_addr, mtp->mt_addr_length); 3291 kmem_free(mtp, sizeof (mactype_t)); 3292 done: 3293 mutex_exit(&i_mactype_lock); 3294 return (err); 3295 } 3296 3297 /* 3298 * Checks the size of the value specified for a property as 3299 * part of a property operation. Returns B_TRUE if the size is 3300 * correct, B_FALSE otherwise.
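 *
 * A property ioctl path would typically validate its buffer with this
 * before acting on the property, e.g. (sketch, perimeter held):
 *
 *	if (!mac_prop_check_size(id, valsize, is_range))
 *		return (EINVAL);
 *	err = mac_set_prop(mh, id, name, val, valsize);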
3301 */ 3302 boolean_t 3303 mac_prop_check_size(mac_prop_id_t id, uint_t valsize, boolean_t is_range) 3304 { 3305 uint_t minsize = 0; 3306 3307 if (is_range) 3308 return (valsize >= sizeof (mac_propval_range_t)); 3309 3310 switch (id) { 3311 case MAC_PROP_ZONE: 3312 minsize = sizeof (dld_ioc_zid_t); 3313 break; 3314 case MAC_PROP_AUTOPUSH: 3315 if (valsize != 0) 3316 minsize = sizeof (struct dlautopush); 3317 break; 3318 case MAC_PROP_TAGMODE: 3319 minsize = sizeof (link_tagmode_t); 3320 break; 3321 case MAC_PROP_RESOURCE: 3322 case MAC_PROP_RESOURCE_EFF: 3323 minsize = sizeof (mac_resource_props_t); 3324 break; 3325 case MAC_PROP_DUPLEX: 3326 minsize = sizeof (link_duplex_t); 3327 break; 3328 case MAC_PROP_SPEED: 3329 minsize = sizeof (uint64_t); 3330 break; 3331 case MAC_PROP_STATUS: 3332 minsize = sizeof (link_state_t); 3333 break; 3334 case MAC_PROP_AUTONEG: 3335 case MAC_PROP_EN_AUTONEG: 3336 minsize = sizeof (uint8_t); 3337 break; 3338 case MAC_PROP_MTU: 3339 case MAC_PROP_LLIMIT: 3340 case MAC_PROP_LDECAY: 3341 minsize = sizeof (uint32_t); 3342 break; 3343 case MAC_PROP_FLOWCTRL: 3344 minsize = sizeof (link_flowctrl_t); 3345 break; 3346 case MAC_PROP_ADV_FEC_CAP: 3347 case MAC_PROP_EN_FEC_CAP: 3348 minsize = sizeof (link_fec_t); 3349 break; 3350 case MAC_PROP_ADV_400GFDX_CAP: 3351 case MAC_PROP_EN_400GFDX_CAP: 3352 case MAC_PROP_ADV_200GFDX_CAP: 3353 case MAC_PROP_EN_200GFDX_CAP: 3354 case MAC_PROP_ADV_100GFDX_CAP: 3355 case MAC_PROP_EN_100GFDX_CAP: 3356 case MAC_PROP_ADV_50GFDX_CAP: 3357 case MAC_PROP_EN_50GFDX_CAP: 3358 case MAC_PROP_ADV_40GFDX_CAP: 3359 case MAC_PROP_EN_40GFDX_CAP: 3360 case MAC_PROP_ADV_25GFDX_CAP: 3361 case MAC_PROP_EN_25GFDX_CAP: 3362 case MAC_PROP_ADV_10GFDX_CAP: 3363 case MAC_PROP_EN_10GFDX_CAP: 3364 case MAC_PROP_ADV_5000FDX_CAP: 3365 case MAC_PROP_EN_5000FDX_CAP: 3366 case MAC_PROP_ADV_2500FDX_CAP: 3367 case MAC_PROP_EN_2500FDX_CAP: 3368 case MAC_PROP_ADV_1000HDX_CAP: 3369 case MAC_PROP_EN_1000HDX_CAP: 3370 case MAC_PROP_ADV_100FDX_CAP: 3371 case MAC_PROP_EN_100FDX_CAP: 3372 case MAC_PROP_ADV_100T4_CAP: 3373 case MAC_PROP_EN_100T4_CAP: 3374 case MAC_PROP_ADV_100HDX_CAP: 3375 case MAC_PROP_EN_100HDX_CAP: 3376 case MAC_PROP_ADV_10FDX_CAP: 3377 case MAC_PROP_EN_10FDX_CAP: 3378 case MAC_PROP_ADV_10HDX_CAP: 3379 case MAC_PROP_EN_10HDX_CAP: 3380 minsize = sizeof (uint8_t); 3381 break; 3382 case MAC_PROP_PVID: 3383 minsize = sizeof (uint16_t); 3384 break; 3385 case MAC_PROP_IPTUN_HOPLIMIT: 3386 minsize = sizeof (uint32_t); 3387 break; 3388 case MAC_PROP_IPTUN_ENCAPLIMIT: 3389 minsize = sizeof (uint32_t); 3390 break; 3391 case MAC_PROP_MAX_TX_RINGS_AVAIL: 3392 case MAC_PROP_MAX_RX_RINGS_AVAIL: 3393 case MAC_PROP_MAX_RXHWCLNT_AVAIL: 3394 case MAC_PROP_MAX_TXHWCLNT_AVAIL: 3395 minsize = sizeof (uint_t); 3396 break; 3397 case MAC_PROP_WL_ESSID: 3398 minsize = sizeof (wl_linkstatus_t); 3399 break; 3400 case MAC_PROP_WL_BSSID: 3401 minsize = sizeof (wl_bssid_t); 3402 break; 3403 case MAC_PROP_WL_BSSTYPE: 3404 minsize = sizeof (wl_bss_type_t); 3405 break; 3406 case MAC_PROP_WL_LINKSTATUS: 3407 minsize = sizeof (wl_linkstatus_t); 3408 break; 3409 case MAC_PROP_WL_DESIRED_RATES: 3410 minsize = sizeof (wl_rates_t); 3411 break; 3412 case MAC_PROP_WL_SUPPORTED_RATES: 3413 minsize = sizeof (wl_rates_t); 3414 break; 3415 case MAC_PROP_WL_AUTH_MODE: 3416 minsize = sizeof (wl_authmode_t); 3417 break; 3418 case MAC_PROP_WL_ENCRYPTION: 3419 minsize = sizeof (wl_encryption_t); 3420 break; 3421 case MAC_PROP_WL_RSSI: 3422 minsize = sizeof (wl_rssi_t); 3423 break; 3424 case 
MAC_PROP_WL_PHY_CONFIG: 3425 minsize = sizeof (wl_phy_conf_t); 3426 break; 3427 case MAC_PROP_WL_CAPABILITY: 3428 minsize = sizeof (wl_capability_t); 3429 break; 3430 case MAC_PROP_WL_WPA: 3431 minsize = sizeof (wl_wpa_t); 3432 break; 3433 case MAC_PROP_WL_SCANRESULTS: 3434 minsize = sizeof (wl_wpa_ess_t); 3435 break; 3436 case MAC_PROP_WL_POWER_MODE: 3437 minsize = sizeof (wl_ps_mode_t); 3438 break; 3439 case MAC_PROP_WL_RADIO: 3440 minsize = sizeof (wl_radio_t); 3441 break; 3442 case MAC_PROP_WL_ESS_LIST: 3443 minsize = sizeof (wl_ess_list_t); 3444 break; 3445 case MAC_PROP_WL_KEY_TAB: 3446 minsize = sizeof (wl_wep_key_tab_t); 3447 break; 3448 case MAC_PROP_WL_CREATE_IBSS: 3449 minsize = sizeof (wl_create_ibss_t); 3450 break; 3451 case MAC_PROP_WL_SETOPTIE: 3452 minsize = sizeof (wl_wpa_ie_t); 3453 break; 3454 case MAC_PROP_WL_DELKEY: 3455 minsize = sizeof (wl_del_key_t); 3456 break; 3457 case MAC_PROP_WL_KEY: 3458 minsize = sizeof (wl_key_t); 3459 break; 3460 case MAC_PROP_WL_MLME: 3461 minsize = sizeof (wl_mlme_t); 3462 break; 3463 case MAC_PROP_VN_PROMISC_FILTERED: 3464 minsize = sizeof (boolean_t); 3465 break; 3466 case MAC_PROP_MEDIA: 3467 /* 3468 * Our assumption is that each class of device uses an enum and 3469 * that all enums will be the same size so it is OK to use a 3470 * single one. 3471 */ 3472 minsize = sizeof (mac_ether_media_t); 3473 break; 3474 } 3475 3476 return (valsize >= minsize); 3477 } 3478 3479 /* 3480 * mac_set_prop() sets MAC or hardware driver properties: 3481 * 3482 * - MAC-managed properties such as resource properties include maxbw, 3483 * priority, and cpu binding list, as well as the default port VID 3484 * used by bridging. These properties are consumed by the MAC layer 3485 * itself and not passed down to the driver. For resource control 3486 * properties, this function invokes mac_set_resources() which will 3487 * cache the property value in mac_impl_t and may call 3488 * mac_client_set_resource() to update property value of the primary 3489 * mac client, if it exists. 3490 * 3491 * - Properties which act on the hardware and must be passed to the 3492 * driver, such as MTU, through the driver's mc_setprop() entry point. 
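 *
 * For example, setting the MTU from a control path looks like this
 * sketch (the perimeter is required, as asserted below):
 *
 *	i_mac_perim_enter(mip);
 *	err = mac_set_prop(mh, MAC_PROP_MTU, "mtu", &mtu, sizeof (mtu));
 *	i_mac_perim_exit(mip);
 *
 * which for MAC_PROP_MTU ends up in mac_set_mtu() and, from there, in
 * the driver.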
3493 */ 3494 int 3495 mac_set_prop(mac_handle_t mh, mac_prop_id_t id, char *name, void *val, 3496 uint_t valsize) 3497 { 3498 int err = ENOTSUP; 3499 mac_impl_t *mip = (mac_impl_t *)mh; 3500 3501 ASSERT(MAC_PERIM_HELD(mh)); 3502 3503 switch (id) { 3504 case MAC_PROP_RESOURCE: { 3505 mac_resource_props_t *mrp; 3506 3507 /* call mac_set_resources() for MAC properties */ 3508 ASSERT(valsize >= sizeof (mac_resource_props_t)); 3509 mrp = kmem_zalloc(sizeof (*mrp), KM_SLEEP); 3510 bcopy(val, mrp, sizeof (*mrp)); 3511 err = mac_set_resources(mh, mrp); 3512 kmem_free(mrp, sizeof (*mrp)); 3513 break; 3514 } 3515 3516 case MAC_PROP_PVID: 3517 ASSERT(valsize >= sizeof (uint16_t)); 3518 if (mip->mi_state_flags & MIS_IS_VNIC) 3519 return (EINVAL); 3520 err = mac_set_pvid(mh, *(uint16_t *)val); 3521 break; 3522 3523 case MAC_PROP_MTU: { 3524 uint32_t mtu; 3525 3526 ASSERT(valsize >= sizeof (uint32_t)); 3527 bcopy(val, &mtu, sizeof (mtu)); 3528 err = mac_set_mtu(mh, mtu, NULL); 3529 break; 3530 } 3531 3532 case MAC_PROP_LLIMIT: 3533 case MAC_PROP_LDECAY: { 3534 uint32_t learnval; 3535 3536 if (valsize < sizeof (learnval) || 3537 (mip->mi_state_flags & MIS_IS_VNIC)) 3538 return (EINVAL); 3539 bcopy(val, &learnval, sizeof (learnval)); 3540 if (learnval == 0 && id == MAC_PROP_LDECAY) 3541 return (EINVAL); 3542 if (id == MAC_PROP_LLIMIT) 3543 mip->mi_llimit = learnval; 3544 else 3545 mip->mi_ldecay = learnval; 3546 err = 0; 3547 break; 3548 } 3549 3550 case MAC_PROP_ADV_FEC_CAP: 3551 case MAC_PROP_EN_FEC_CAP: { 3552 link_fec_t fec; 3553 3554 ASSERT(valsize >= sizeof (link_fec_t)); 3555 3556 /* 3557 * fec cannot be zero, and auto must be set exclusively. 3558 */ 3559 bcopy(val, &fec, sizeof (link_fec_t)); 3560 if (fec == 0) 3561 return (EINVAL); 3562 if ((fec & LINK_FEC_AUTO) != 0 && (fec & ~LINK_FEC_AUTO) != 0) 3563 return (EINVAL); 3564 3565 if (mip->mi_callbacks->mc_callbacks & MC_SETPROP) { 3566 err = mip->mi_callbacks->mc_setprop(mip->mi_driver, 3567 name, id, valsize, val); 3568 } 3569 break; 3570 } 3571 3572 default: 3573 /* For other driver properties, call driver's callback */ 3574 if (mip->mi_callbacks->mc_callbacks & MC_SETPROP) { 3575 err = mip->mi_callbacks->mc_setprop(mip->mi_driver, 3576 name, id, valsize, val); 3577 } 3578 } 3579 return (err); 3580 } 3581 3582 /* 3583 * mac_get_prop() gets MAC or device driver properties. 3584 * 3585 * If the property is a driver property, mac_get_prop() calls driver's callback 3586 * entry point to get it. 3587 * If the property is a MAC property, mac_get_prop() invokes mac_get_resources() 3588 * which returns the cached value in mac_impl_t. 
3589 */ 3590 int 3591 mac_get_prop(mac_handle_t mh, mac_prop_id_t id, char *name, void *val, 3592 uint_t valsize) 3593 { 3594 int err = ENOTSUP; 3595 mac_impl_t *mip = (mac_impl_t *)mh; 3596 uint_t rings; 3597 uint_t vlinks; 3598 3599 bzero(val, valsize); 3600 3601 switch (id) { 3602 case MAC_PROP_RESOURCE: { 3603 mac_resource_props_t *mrp; 3604 3605 /* If mac property, read from cache */ 3606 ASSERT(valsize >= sizeof (mac_resource_props_t)); 3607 mrp = kmem_zalloc(sizeof (*mrp), KM_SLEEP); 3608 mac_get_resources(mh, mrp); 3609 bcopy(mrp, val, sizeof (*mrp)); 3610 kmem_free(mrp, sizeof (*mrp)); 3611 return (0); 3612 } 3613 case MAC_PROP_RESOURCE_EFF: { 3614 mac_resource_props_t *mrp; 3615 3616 /* If mac effective property, read from client */ 3617 ASSERT(valsize >= sizeof (mac_resource_props_t)); 3618 mrp = kmem_zalloc(sizeof (*mrp), KM_SLEEP); 3619 mac_get_effective_resources(mh, mrp); 3620 bcopy(mrp, val, sizeof (*mrp)); 3621 kmem_free(mrp, sizeof (*mrp)); 3622 return (0); 3623 } 3624 3625 case MAC_PROP_PVID: 3626 ASSERT(valsize >= sizeof (uint16_t)); 3627 if (mip->mi_state_flags & MIS_IS_VNIC) 3628 return (EINVAL); 3629 *(uint16_t *)val = mac_get_pvid(mh); 3630 return (0); 3631 3632 case MAC_PROP_LLIMIT: 3633 case MAC_PROP_LDECAY: 3634 ASSERT(valsize >= sizeof (uint32_t)); 3635 if (mip->mi_state_flags & MIS_IS_VNIC) 3636 return (EINVAL); 3637 if (id == MAC_PROP_LLIMIT) 3638 bcopy(&mip->mi_llimit, val, sizeof (mip->mi_llimit)); 3639 else 3640 bcopy(&mip->mi_ldecay, val, sizeof (mip->mi_ldecay)); 3641 return (0); 3642 3643 case MAC_PROP_MTU: { 3644 uint32_t sdu; 3645 3646 ASSERT(valsize >= sizeof (uint32_t)); 3647 mac_sdu_get2(mh, NULL, &sdu, NULL); 3648 bcopy(&sdu, val, sizeof (sdu)); 3649 3650 return (0); 3651 } 3652 case MAC_PROP_STATUS: { 3653 link_state_t link_state; 3654 3655 if (valsize < sizeof (link_state)) 3656 return (EINVAL); 3657 link_state = mac_link_get(mh); 3658 bcopy(&link_state, val, sizeof (link_state)); 3659 3660 return (0); 3661 } 3662 3663 case MAC_PROP_MAX_RX_RINGS_AVAIL: 3664 case MAC_PROP_MAX_TX_RINGS_AVAIL: 3665 ASSERT(valsize >= sizeof (uint_t)); 3666 rings = id == MAC_PROP_MAX_RX_RINGS_AVAIL ? 3667 mac_rxavail_get(mh) : mac_txavail_get(mh); 3668 bcopy(&rings, val, sizeof (uint_t)); 3669 return (0); 3670 3671 case MAC_PROP_MAX_RXHWCLNT_AVAIL: 3672 case MAC_PROP_MAX_TXHWCLNT_AVAIL: 3673 ASSERT(valsize >= sizeof (uint_t)); 3674 vlinks = id == MAC_PROP_MAX_RXHWCLNT_AVAIL ? 3675 mac_rxhwlnksavail_get(mh) : mac_txhwlnksavail_get(mh); 3676 bcopy(&vlinks, val, sizeof (uint_t)); 3677 return (0); 3678 3679 case MAC_PROP_RXRINGSRANGE: 3680 case MAC_PROP_TXRINGSRANGE: 3681 /* 3682 * The values for these properties are returned through 3683 * the MAC_PROP_RESOURCE property. 3684 */ 3685 return (0); 3686 3687 default: 3688 break; 3689 3690 } 3691 3692 /* If driver property, request from driver */ 3693 if (mip->mi_callbacks->mc_callbacks & MC_GETPROP) { 3694 err = mip->mi_callbacks->mc_getprop(mip->mi_driver, name, id, 3695 valsize, val); 3696 } 3697 3698 return (err); 3699 } 3700 3701 /* 3702 * Helper function to initialize the range structure for use in 3703 * mac_prop_info. If the type can be other than uint32, we can 3704 * pass that as an arg.
3705 */ 3706 static void 3707 _mac_set_range(mac_propval_range_t *range, uint32_t min, uint32_t max) 3708 { 3709 range->mpr_count = 1; 3710 range->mpr_type = MAC_PROPVAL_UINT32; 3711 range->mpr_range_uint32[0].mpur_min = min; 3712 range->mpr_range_uint32[0].mpur_max = max; 3713 } 3714 3715 /* 3716 * Returns information about the specified property, such as default 3717 * values or permissions. 3718 */ 3719 int 3720 mac_prop_info(mac_handle_t mh, mac_prop_id_t id, char *name, 3721 void *default_val, uint_t default_size, mac_propval_range_t *range, 3722 uint_t *perm) 3723 { 3724 mac_prop_info_state_t state; 3725 mac_impl_t *mip = (mac_impl_t *)mh; 3726 uint_t max; 3727 3728 /* 3729 * A property is read/write by default unless the driver says 3730 * otherwise. 3731 */ 3732 if (perm != NULL) 3733 *perm = MAC_PROP_PERM_RW; 3734 3735 if (default_val != NULL) 3736 bzero(default_val, default_size); 3737 3738 /* 3739 * First, handle framework properties for which we don't need to 3740 * involve the driver. 3741 */ 3742 switch (id) { 3743 case MAC_PROP_RESOURCE: 3744 case MAC_PROP_PVID: 3745 case MAC_PROP_LLIMIT: 3746 case MAC_PROP_LDECAY: 3747 return (0); 3748 3749 case MAC_PROP_MAX_RX_RINGS_AVAIL: 3750 case MAC_PROP_MAX_TX_RINGS_AVAIL: 3751 case MAC_PROP_MAX_RXHWCLNT_AVAIL: 3752 case MAC_PROP_MAX_TXHWCLNT_AVAIL: 3753 if (perm != NULL) 3754 *perm = MAC_PROP_PERM_READ; 3755 return (0); 3756 3757 case MAC_PROP_RXRINGSRANGE: 3758 case MAC_PROP_TXRINGSRANGE: 3759 /* 3760 * Currently, we support ranges for the RX and TX rings properties. 3761 * When we extend this support to maxbw, cpus and priority, 3762 * we should move this to mac_get_resources. 3763 * There is no default value for RX or TX rings. 3764 */ 3765 if ((mip->mi_state_flags & MIS_IS_VNIC) && 3766 mac_is_vnic_primary(mh)) { 3767 /* 3768 * We don't support setting rings for a VLAN 3769 * data link because it shares its ring with the 3770 * primary MAC client. 3771 */ 3772 if (perm != NULL) 3773 *perm = MAC_PROP_PERM_READ; 3774 if (range != NULL) 3775 range->mpr_count = 0; 3776 } else if (range != NULL) { 3777 if (mip->mi_state_flags & MIS_IS_VNIC) 3778 mh = mac_get_lower_mac_handle(mh); 3779 mip = (mac_impl_t *)mh; 3780 if ((id == MAC_PROP_RXRINGSRANGE && 3781 mip->mi_rx_group_type == MAC_GROUP_TYPE_STATIC) || 3782 (id == MAC_PROP_TXRINGSRANGE && 3783 mip->mi_tx_group_type == MAC_GROUP_TYPE_STATIC)) { 3784 if (id == MAC_PROP_RXRINGSRANGE) { 3785 if ((mac_rxhwlnksavail_get(mh) + 3786 mac_rxhwlnksrsvd_get(mh)) <= 1) { 3787 /* 3788 * doesn't support groups or 3789 * rings 3790 */ 3791 range->mpr_count = 0; 3792 } else { 3793 /* 3794 * supports specifying groups, 3795 * but not rings 3796 */ 3797 _mac_set_range(range, 0, 0); 3798 } 3799 } else { 3800 if ((mac_txhwlnksavail_get(mh) + 3801 mac_txhwlnksrsvd_get(mh)) <= 1) { 3802 /* 3803 * doesn't support groups or 3804 * rings 3805 */ 3806 range->mpr_count = 0; 3807 } else { 3808 /* 3809 * supports specifying groups, 3810 * but not rings 3811 */ 3812 _mac_set_range(range, 0, 0); 3813 } 3814 } 3815 } else { 3816 max = id == MAC_PROP_RXRINGSRANGE ? 3817 mac_rxavail_get(mh) + mac_rxrsvd_get(mh) : 3818 mac_txavail_get(mh) + mac_txrsvd_get(mh); 3819 if (max <= 1) { 3820 /* 3821 * doesn't support groups or 3822 * rings 3823 */ 3824 range->mpr_count = 0; 3825 } else { 3826 /* 3827 * -1 because we have to leave out the 3828 * default ring.
3829 */ 3830 _mac_set_range(range, 1, max - 1); 3831 } 3832 } 3833 } 3834 return (0); 3835 3836 case MAC_PROP_STATUS: 3837 case MAC_PROP_MEDIA: 3838 if (perm != NULL) 3839 *perm = MAC_PROP_PERM_READ; 3840 return (0); 3841 } 3842 3843 /* 3844 * Get the property info from the driver if it implements the 3845 * property info entry point. 3846 */ 3847 bzero(&state, sizeof (state)); 3848 3849 if (mip->mi_callbacks->mc_callbacks & MC_PROPINFO) { 3850 state.pr_default = default_val; 3851 state.pr_default_size = default_size; 3852 3853 /* 3854 * The caller specifies the maximum number of ranges 3855 * it can accommodate using mpr_count. We don't touch 3856 * this value until the driver returns from its 3857 * mc_propinfo() callback, and ensure we don't exceed 3858 * this number of ranges as the driver defines 3859 * its supported ranges from its mc_propinfo(). 3860 * 3861 * pr_range_cur_count keeps track of how many ranges 3862 * were defined by the driver from its mc_propinfo() 3863 * entry point. 3864 * 3865 * On exit, the user-specified range mpr_count returns 3866 * the number of ranges specified by the driver on 3867 * success, or the number of ranges it wanted to 3868 * define if that number of ranges could not be 3869 * accommodated by the specified range structure. In 3870 * the latter case, the caller will be able to 3871 * allocate a larger range structure, and query the 3872 * property again. 3873 */ 3874 state.pr_range_cur_count = 0; 3875 state.pr_range = range; 3876 3877 mip->mi_callbacks->mc_propinfo(mip->mi_driver, name, id, 3878 (mac_prop_info_handle_t)&state); 3879 3880 if (state.pr_flags & MAC_PROP_INFO_RANGE) 3881 range->mpr_count = state.pr_range_cur_count; 3882 3883 /* 3884 * The operation could fail if the buffer supplied by 3885 * the user was too small for the range or default 3886 * value of the property. 3887 */ 3888 if (state.pr_errno != 0) 3889 return (state.pr_errno); 3890 3891 if (perm != NULL && state.pr_flags & MAC_PROP_INFO_PERM) 3892 *perm = state.pr_perm; 3893 } 3894 3895 /* 3896 * The MAC layer may want to provide default values or allowed 3897 * ranges for properties if the driver does not provide a 3898 * property info entry point, or that entry point exists, but 3899 * it did not provide a default value or allowed ranges for 3900 * that property.
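 * (As an illustration, for MAC_PROP_MTU below the framework falls
 * back to the device's current SDU, as a degenerate [sdu, sdu]
 * range and/or as the default value, for whichever piece the
 * driver left unspecified.)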
3901 */ 3902 switch (id) { 3903 case MAC_PROP_MTU: { 3904 uint32_t sdu; 3905 3906 mac_sdu_get2(mh, NULL, &sdu, NULL); 3907 3908 if (range != NULL && !(state.pr_flags & 3909 MAC_PROP_INFO_RANGE)) { 3910 /* MTU range */ 3911 _mac_set_range(range, sdu, sdu); 3912 } 3913 3914 if (default_val != NULL && !(state.pr_flags & 3915 MAC_PROP_INFO_DEFAULT)) { 3916 if (mip->mi_info.mi_media == DL_ETHER) 3917 sdu = ETHERMTU; 3918 /* default MTU value */ 3919 bcopy(&sdu, default_val, sizeof (sdu)); 3920 } 3921 } 3922 } 3923 3924 return (0); 3925 } 3926 3927 int 3928 mac_fastpath_disable(mac_handle_t mh) 3929 { 3930 mac_impl_t *mip = (mac_impl_t *)mh; 3931 3932 if ((mip->mi_state_flags & MIS_LEGACY) == 0) 3933 return (0); 3934 3935 return (mip->mi_capab_legacy.ml_fastpath_disable(mip->mi_driver)); 3936 } 3937 3938 void 3939 mac_fastpath_enable(mac_handle_t mh) 3940 { 3941 mac_impl_t *mip = (mac_impl_t *)mh; 3942 3943 if ((mip->mi_state_flags & MIS_LEGACY) == 0) 3944 return; 3945 3946 mip->mi_capab_legacy.ml_fastpath_enable(mip->mi_driver); 3947 } 3948 3949 void 3950 mac_register_priv_prop(mac_impl_t *mip, char **priv_props) 3951 { 3952 uint_t nprops, i; 3953 3954 if (priv_props == NULL) 3955 return; 3956 3957 nprops = 0; 3958 while (priv_props[nprops] != NULL) 3959 nprops++; 3960 if (nprops == 0) 3961 return; 3962 3963 3964 mip->mi_priv_prop = kmem_zalloc(nprops * sizeof (char *), KM_SLEEP); 3965 3966 for (i = 0; i < nprops; i++) { 3967 mip->mi_priv_prop[i] = kmem_zalloc(MAXLINKPROPNAME, KM_SLEEP); 3968 (void) strlcpy(mip->mi_priv_prop[i], priv_props[i], 3969 MAXLINKPROPNAME); 3970 } 3971 3972 mip->mi_priv_prop_count = nprops; 3973 } 3974 3975 void 3976 mac_unregister_priv_prop(mac_impl_t *mip) 3977 { 3978 uint_t i; 3979 3980 if (mip->mi_priv_prop_count == 0) { 3981 ASSERT(mip->mi_priv_prop == NULL); 3982 return; 3983 } 3984 3985 for (i = 0; i < mip->mi_priv_prop_count; i++) 3986 kmem_free(mip->mi_priv_prop[i], MAXLINKPROPNAME); 3987 kmem_free(mip->mi_priv_prop, mip->mi_priv_prop_count * 3988 sizeof (char *)); 3989 3990 mip->mi_priv_prop = NULL; 3991 mip->mi_priv_prop_count = 0; 3992 } 3993 3994 /* 3995 * mac_ring_t 'mr' macros. Some rogue drivers may access the ring structure 3996 * (by invoking mac_rx()) even after processing mac_stop_ring(). In such 3997 * cases, if MAC frees the ring structure after mac_stop_ring(), any 3998 * illegal access to the ring structure coming from the driver will panic 3999 * the system. In order to protect the system from such inadvertent access, 4000 * we maintain a cache of rings in the mac_impl_t after they get freed up. 4001 * When packets are received on freed-up rings, MAC (through the generation 4002 * count mechanism) will drop such packets.
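 *
 * A rough sketch of that mechanism (not the literal code): the
 * driver is handed the ring's current mr_gen_num via mr_start(),
 * and mac_stop_ring() increments it, so on receive the framework
 * can do something like
 *
 *	if (generation passed up by the driver != ring->mr_gen_num)
 *		drop the packet
 *
 * and traffic arriving on a stopped or recycled ring is rejected.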
4003 */ 4004 static mac_ring_t * 4005 mac_ring_alloc(mac_impl_t *mip) 4006 { 4007 mac_ring_t *ring; 4008 4009 mutex_enter(&mip->mi_ring_lock); 4010 if (mip->mi_ring_freelist != NULL) { 4011 ring = mip->mi_ring_freelist; 4012 mip->mi_ring_freelist = ring->mr_next; 4013 bzero(ring, sizeof (mac_ring_t)); 4014 mutex_exit(&mip->mi_ring_lock); 4015 } else { 4016 mutex_exit(&mip->mi_ring_lock); 4017 ring = kmem_cache_alloc(mac_ring_cache, KM_SLEEP); 4018 } 4019 ASSERT((ring != NULL) && (ring->mr_state == MR_FREE)); 4020 return (ring); 4021 } 4022 4023 static void 4024 mac_ring_free(mac_impl_t *mip, mac_ring_t *ring) 4025 { 4026 ASSERT(ring->mr_state == MR_FREE); 4027 4028 mutex_enter(&mip->mi_ring_lock); 4029 ring->mr_state = MR_FREE; 4030 ring->mr_flag = 0; 4031 ring->mr_next = mip->mi_ring_freelist; 4032 ring->mr_mip = NULL; 4033 mip->mi_ring_freelist = ring; 4034 mac_ring_stat_delete(ring); 4035 mutex_exit(&mip->mi_ring_lock); 4036 } 4037 4038 static void 4039 mac_ring_freeall(mac_impl_t *mip) 4040 { 4041 mac_ring_t *ring_next; 4042 mutex_enter(&mip->mi_ring_lock); 4043 mac_ring_t *ring = mip->mi_ring_freelist; 4044 while (ring != NULL) { 4045 ring_next = ring->mr_next; 4046 kmem_cache_free(mac_ring_cache, ring); 4047 ring = ring_next; 4048 } 4049 mip->mi_ring_freelist = NULL; 4050 mutex_exit(&mip->mi_ring_lock); 4051 } 4052 4053 int 4054 mac_start_ring(mac_ring_t *ring) 4055 { 4056 int rv = 0; 4057 4058 ASSERT(ring->mr_state == MR_FREE); 4059 4060 if (ring->mr_start != NULL) { 4061 rv = ring->mr_start(ring->mr_driver, ring->mr_gen_num); 4062 if (rv != 0) 4063 return (rv); 4064 } 4065 4066 ring->mr_state = MR_INUSE; 4067 return (rv); 4068 } 4069 4070 void 4071 mac_stop_ring(mac_ring_t *ring) 4072 { 4073 ASSERT(ring->mr_state == MR_INUSE); 4074 4075 if (ring->mr_stop != NULL) 4076 ring->mr_stop(ring->mr_driver); 4077 4078 ring->mr_state = MR_FREE; 4079 4080 /* 4081 * Increment the ring generation number for this ring. 4082 */ 4083 ring->mr_gen_num++; 4084 } 4085 4086 int 4087 mac_start_group(mac_group_t *group) 4088 { 4089 int rv = 0; 4090 4091 if (group->mrg_start != NULL) 4092 rv = group->mrg_start(group->mrg_driver); 4093 4094 return (rv); 4095 } 4096 4097 void 4098 mac_stop_group(mac_group_t *group) 4099 { 4100 if (group->mrg_stop != NULL) 4101 group->mrg_stop(group->mrg_driver); 4102 } 4103 4104 /* 4105 * Called from mac_start() on the default Rx group. Broadcast and multicast 4106 * packets are received only on the default group. Hence the default group 4107 * needs to be up even if the primary client is not up, for the other groups 4108 * to be functional. We do this by calling this function at mac_start time 4109 * itself. However the broadcast packets that are received can't make their 4110 * way beyond mac_rx until a mac client creates a broadcast flow. 4111 */ 4112 static int 4113 mac_start_group_and_rings(mac_group_t *group) 4114 { 4115 mac_ring_t *ring; 4116 int rv = 0; 4117 4118 ASSERT(group->mrg_state == MAC_GROUP_STATE_REGISTERED); 4119 if ((rv = mac_start_group(group)) != 0) 4120 return (rv); 4121 4122 for (ring = group->mrg_rings; ring != NULL; ring = ring->mr_next) { 4123 ASSERT(ring->mr_state == MR_FREE); 4124 4125 if ((rv = mac_start_ring(ring)) != 0) 4126 goto error; 4127 4128 /* 4129 * When aggr_set_port_sdu() is called, it will remove 4130 * the port client's unicast address. This will cause 4131 * MAC to stop the default group's rings on the port 4132 * MAC. After it modifies the SDU, it will then re-add 4133 * the unicast address. 
At that point, this function is 4134 * called to start the default group's rings. Normally 4135 * this function would set the classify type to 4136 * MAC_SW_CLASSIFIER; but that will break aggr which 4137 * relies on the passthru classify mode being set for 4138 * correct delivery (see mac_rx_common()). To avoid 4139 * that, we check for a passthru callback and set the 4140 * classify type to MAC_PASSTHRU_CLASSIFIER, as it was 4141 * before the rings were stopped. 4142 */ 4143 ring->mr_classify_type = (ring->mr_pt_fn != NULL) ? 4144 MAC_PASSTHRU_CLASSIFIER : MAC_SW_CLASSIFIER; 4145 } 4146 return (0); 4147 4148 error: 4149 mac_stop_group_and_rings(group); 4150 return (rv); 4151 } 4152 4153 /* Called from mac_stop() on the default Rx group */ 4154 static void 4155 mac_stop_group_and_rings(mac_group_t *group) 4156 { 4157 mac_ring_t *ring; 4158 4159 for (ring = group->mrg_rings; ring != NULL; ring = ring->mr_next) { 4160 if (ring->mr_state != MR_FREE) { 4161 mac_stop_ring(ring); 4162 ring->mr_flag = 0; 4163 ring->mr_classify_type = MAC_NO_CLASSIFIER; 4164 } 4165 } 4166 mac_stop_group(group); 4167 } 4168 4169 4170 static mac_ring_t * 4171 mac_init_ring(mac_impl_t *mip, mac_group_t *group, int index, 4172 mac_capab_rings_t *cap_rings) 4173 { 4174 mac_ring_t *ring, *rnext; 4175 mac_ring_info_t ring_info; 4176 ddi_intr_handle_t ddi_handle; 4177 4178 ring = mac_ring_alloc(mip); 4179 4180 /* Prepare basic information of ring */ 4181 4182 /* 4183 * Ring index is numbered to be unique across a particular device. 4184 * Ring index computation makes the following assumptions: 4185 * - For drivers with static grouping (e.g. ixgbe, bge), 4186 * the ring index exchanged with the driver (e.g. during mr_rget) 4187 * is unique only across the group the ring belongs to. 4188 * - Drivers with dynamic grouping (e.g. nxge) start 4189 * with a single group (mrg_index = 0). 4190 */ 4191 ring->mr_index = group->mrg_index * group->mrg_info.mgi_count + index; 4192 ring->mr_type = group->mrg_type; 4193 ring->mr_gh = (mac_group_handle_t)group; 4194 4195 /* Insert the new ring to the list. */ 4196 ring->mr_next = group->mrg_rings; 4197 group->mrg_rings = ring; 4198 4199 /* Zero to reuse the info data structure */ 4200 bzero(&ring_info, sizeof (ring_info)); 4201 4202 /* Query ring information from driver */ 4203 cap_rings->mr_rget(mip->mi_driver, group->mrg_type, group->mrg_index, 4204 index, &ring_info, (mac_ring_handle_t)ring); 4205 4206 ring->mr_info = ring_info; 4207 4208 /* 4209 * The interrupt handle could be shared among multiple rings. 4210 * Thus if there is a bunch of rings sharing an 4211 * interrupt, then only one ring among the bunch will be made 4212 * available for interrupt re-targeting; the rest will have the 4213 * ddi_shared flag set to TRUE and will not be available for 4214 * interrupt re-targeting. 4215 */ 4216 if ((ddi_handle = ring_info.mri_intr.mi_ddi_handle) != NULL) { 4217 rnext = ring->mr_next; 4218 while (rnext != NULL) { 4219 if (rnext->mr_info.mri_intr.mi_ddi_handle == 4220 ddi_handle) { 4221 /* 4222 * If the default ring (mr_index == 0) is part 4223 * of a group of rings sharing an 4224 * interrupt, then set the ddi_shared flag for 4225 * the default ring and give another ring 4226 * the chance to be re-targeted.
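 *
 * For example, if the default ring 0 and ring 1 were found to
 * share one interrupt handle here, ring 0 is the one marked
 * mi_ddi_shared, which leaves ring 1 eligible for re-targeting.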
4227 */ 4228 if (rnext->mr_index == 0 && 4229 !rnext->mr_info.mri_intr.mi_ddi_shared) { 4230 rnext->mr_info.mri_intr.mi_ddi_shared = 4231 B_TRUE; 4232 } else { 4233 ring->mr_info.mri_intr.mi_ddi_shared = 4234 B_TRUE; 4235 } 4236 break; 4237 } 4238 rnext = rnext->mr_next; 4239 } 4240 /* 4241 * If rnext is NULL, then no matching ddi_handle was found. 4242 * Rx rings get registered first. So if this is a Tx ring, 4243 * then go through all the Rx rings and see if there is a 4244 * matching ddi handle. 4245 */ 4246 if (rnext == NULL && ring->mr_type == MAC_RING_TYPE_TX) { 4247 mac_compare_ddi_handle(mip->mi_rx_groups, 4248 mip->mi_rx_group_count, ring); 4249 } 4250 } 4251 4252 /* Update ring's status */ 4253 ring->mr_state = MR_FREE; 4254 ring->mr_flag = 0; 4255 4256 /* Update the ring count of the group */ 4257 group->mrg_cur_count++; 4258 4259 /* Create per ring kstats */ 4260 if (ring->mr_stat != NULL) { 4261 ring->mr_mip = mip; 4262 mac_ring_stat_create(ring); 4263 } 4264 4265 return (ring); 4266 } 4267 4268 /* 4269 * Rings are chained together for easy regrouping. 4270 */ 4271 static void 4272 mac_init_group(mac_impl_t *mip, mac_group_t *group, int size, 4273 mac_capab_rings_t *cap_rings) 4274 { 4275 int index; 4276 4277 /* 4278 * Initialize all ring members of this group. A size of zero will not 4279 * enter the loop, so it's safe for initializing an empty group. 4280 */ 4281 for (index = size - 1; index >= 0; index--) 4282 (void) mac_init_ring(mip, group, index, cap_rings); 4283 } 4284 4285 int 4286 mac_init_rings(mac_impl_t *mip, mac_ring_type_t rtype) 4287 { 4288 mac_capab_rings_t *cap_rings; 4289 mac_group_t *group; 4290 mac_group_t *groups; 4291 mac_group_info_t group_info; 4292 uint_t group_free = 0; 4293 uint_t ring_left; 4294 mac_ring_t *ring; 4295 int g; 4296 int err = 0; 4297 uint_t grpcnt; 4298 boolean_t pseudo_txgrp = B_FALSE; 4299 4300 switch (rtype) { 4301 case MAC_RING_TYPE_RX: 4302 ASSERT(mip->mi_rx_groups == NULL); 4303 4304 cap_rings = &mip->mi_rx_rings_cap; 4305 cap_rings->mr_type = MAC_RING_TYPE_RX; 4306 break; 4307 case MAC_RING_TYPE_TX: 4308 ASSERT(mip->mi_tx_groups == NULL); 4309 4310 cap_rings = &mip->mi_tx_rings_cap; 4311 cap_rings->mr_type = MAC_RING_TYPE_TX; 4312 break; 4313 default: 4314 ASSERT(B_FALSE); 4315 } 4316 4317 if (!i_mac_capab_get((mac_handle_t)mip, MAC_CAPAB_RINGS, cap_rings)) 4318 return (0); 4319 grpcnt = cap_rings->mr_gnum; 4320 4321 /* 4322 * If we have multiple TX rings, but only one TX group, we can 4323 * create pseudo TX groups (one per TX ring) in the MAC layer, 4324 * except for an aggr. For an aggr we currently maintain only 4325 * one group with all the rings (for all its ports); going 4326 * forward we might change this. 4327 */ 4328 if (rtype == MAC_RING_TYPE_TX && 4329 cap_rings->mr_gnum == 0 && cap_rings->mr_rnum > 0 && 4330 (mip->mi_state_flags & MIS_IS_AGGR) == 0) { 4331 /* 4332 * The -1 here is because we create a default TX group 4333 * with all the rings in it. 4334 */ 4335 grpcnt = cap_rings->mr_rnum - 1; 4336 pseudo_txgrp = B_TRUE; 4337 } 4338 4339 /* 4340 * Allocate a contiguous buffer for all groups. 4341 */ 4342 groups = kmem_zalloc(sizeof (mac_group_t) * (grpcnt + 1), KM_SLEEP); 4343 4344 ring_left = cap_rings->mr_rnum; 4345 4346 /* 4347 * Get all ring groups if any, and get their ring members 4348 * if any.
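 *
 * As a hypothetical illustration: a driver advertising two static
 * RX groups of four rings each would leave the loop below with two
 * registered groups, four rings chained off each, and mr_index
 * values 0-3 and 4-7 (see the index computation in mac_init_ring()).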
4349 */ 4350 for (g = 0; g < grpcnt; g++) { 4351 group = groups + g; 4352 4353 /* Prepare basic information of the group */ 4354 group->mrg_index = g; 4355 group->mrg_type = rtype; 4356 group->mrg_state = MAC_GROUP_STATE_UNINIT; 4357 group->mrg_mh = (mac_handle_t)mip; 4358 group->mrg_next = group + 1; 4359 4360 /* Zero to reuse the info data structure */ 4361 bzero(&group_info, sizeof (group_info)); 4362 4363 if (pseudo_txgrp) { 4364 /* 4365 * This is a pseudo group that we created; apart 4366 * from setting the state there is nothing to be 4367 * done. 4368 */ 4369 group->mrg_state = MAC_GROUP_STATE_REGISTERED; 4370 group_free++; 4371 continue; 4372 } 4373 /* Query group information from driver */ 4374 cap_rings->mr_gget(mip->mi_driver, rtype, g, &group_info, 4375 (mac_group_handle_t)group); 4376 4377 switch (cap_rings->mr_group_type) { 4378 case MAC_GROUP_TYPE_DYNAMIC: 4379 if (cap_rings->mr_gaddring == NULL || 4380 cap_rings->mr_gremring == NULL) { 4381 DTRACE_PROBE3( 4382 mac__init__rings_no_addremring, 4383 char *, mip->mi_name, 4384 mac_group_add_ring_t, 4385 cap_rings->mr_gaddring, 4386 mac_group_add_ring_t, 4387 cap_rings->mr_gremring); 4388 err = EINVAL; 4389 goto bail; 4390 } 4391 4392 switch (rtype) { 4393 case MAC_RING_TYPE_RX: 4394 /* 4395 * The first RX group must have a non-zero 4396 * number of rings, and the following groups 4397 * must have zero rings. 4398 */ 4399 if (g == 0 && group_info.mgi_count == 0) { 4400 DTRACE_PROBE1( 4401 mac__init__rings__rx__def__zero, 4402 char *, mip->mi_name); 4403 err = EINVAL; 4404 goto bail; 4405 } 4406 if (g > 0 && group_info.mgi_count != 0) { 4407 DTRACE_PROBE3( 4408 mac__init__rings__rx__nonzero, 4409 char *, mip->mi_name, 4410 int, g, int, group_info.mgi_count); 4411 err = EINVAL; 4412 goto bail; 4413 } 4414 break; 4415 case MAC_RING_TYPE_TX: 4416 /* 4417 * All TX ring groups must have zero rings. 4418 */ 4419 if (group_info.mgi_count != 0) { 4420 DTRACE_PROBE3( 4421 mac__init__rings__tx__nonzero, 4422 char *, mip->mi_name, 4423 int, g, int, group_info.mgi_count); 4424 err = EINVAL; 4425 goto bail; 4426 } 4427 break; 4428 } 4429 break; 4430 case MAC_GROUP_TYPE_STATIC: 4431 /* 4432 * Note that an empty group is allowed, e.g., an aggr 4433 * would start with an empty group. 4434 */ 4435 break; 4436 default: 4437 /* unknown group type */ 4438 DTRACE_PROBE2(mac__init__rings__unknown__type, 4439 char *, mip->mi_name, 4440 int, cap_rings->mr_group_type); 4441 err = EINVAL; 4442 goto bail; 4443 } 4444 4445 4446 /* 4447 * The driver must register some form of hardware MAC 4448 * filter in order for Rx groups to support multiple 4449 * MAC addresses. 4450 */ 4451 if (rtype == MAC_RING_TYPE_RX && 4452 (group_info.mgi_addmac == NULL || 4453 group_info.mgi_remmac == NULL)) { 4454 DTRACE_PROBE1(mac__init__rings__no__mac__filter, 4455 char *, mip->mi_name); 4456 err = EINVAL; 4457 goto bail; 4458 } 4459 4460 /* Cache driver-supplied information */ 4461 group->mrg_info = group_info; 4462 4463 /* Update the group's status and group count.
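 * (The group is marked REGISTERED before its rings are set up by
 * the mac_init_group() call just below.)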
*/ 4464 mac_set_group_state(group, MAC_GROUP_STATE_REGISTERED); 4465 group_free++; 4466 4467 group->mrg_rings = NULL; 4468 group->mrg_cur_count = 0; 4469 mac_init_group(mip, group, group_info.mgi_count, cap_rings); 4470 ring_left -= group_info.mgi_count; 4471 4472 /* The current group size should be equal to the default value */ 4473 ASSERT(group->mrg_cur_count == group_info.mgi_count); 4474 } 4475 4476 /* Build up a dummy group for free resources as a pool */ 4477 group = groups + grpcnt; 4478 4479 /* Prepare basic information of the group */ 4480 group->mrg_index = -1; 4481 group->mrg_type = rtype; 4482 group->mrg_state = MAC_GROUP_STATE_UNINIT; 4483 group->mrg_mh = (mac_handle_t)mip; 4484 group->mrg_next = NULL; 4485 4486 /* 4487 * If there are ungrouped rings, allocate a contiguous buffer for 4488 * the remaining resources. 4489 */ 4490 if (ring_left != 0) { 4491 group->mrg_rings = NULL; 4492 group->mrg_cur_count = 0; 4493 mac_init_group(mip, group, ring_left, cap_rings); 4494 4495 /* The current group size should be equal to ring_left */ 4496 ASSERT(group->mrg_cur_count == ring_left); 4497 4498 ring_left = 0; 4499 4500 /* Update this group's status */ 4501 mac_set_group_state(group, MAC_GROUP_STATE_REGISTERED); 4502 } else { 4503 group->mrg_rings = NULL; 4504 } 4505 4506 ASSERT(ring_left == 0); 4507 4508 bail: 4509 4510 /* Cache other important information to finalize the initialization */ 4511 switch (rtype) { 4512 case MAC_RING_TYPE_RX: 4513 mip->mi_rx_group_type = cap_rings->mr_group_type; 4514 mip->mi_rx_group_count = cap_rings->mr_gnum; 4515 mip->mi_rx_groups = groups; 4516 mip->mi_rx_donor_grp = groups; 4517 if (mip->mi_rx_group_type == MAC_GROUP_TYPE_DYNAMIC) { 4518 /* 4519 * The default ring is reserved since it is 4520 * used for sending broadcast packets, etc. 4521 */ 4522 mip->mi_rxrings_avail = 4523 mip->mi_rx_groups->mrg_cur_count - 1; 4524 mip->mi_rxrings_rsvd = 1; 4525 } 4526 /* 4527 * The default group cannot be reserved. It is used by 4528 * all the clients that do not have an exclusive group. 4529 */ 4530 mip->mi_rxhwclnt_avail = mip->mi_rx_group_count - 1; 4531 mip->mi_rxhwclnt_used = 1; 4532 break; 4533 case MAC_RING_TYPE_TX: 4534 mip->mi_tx_group_type = pseudo_txgrp ? MAC_GROUP_TYPE_DYNAMIC : 4535 cap_rings->mr_group_type; 4536 mip->mi_tx_group_count = grpcnt; 4537 mip->mi_tx_group_free = group_free; 4538 mip->mi_tx_groups = groups; 4539 4540 group = groups + grpcnt; 4541 ring = group->mrg_rings; 4542 /* 4543 * The ring can be NULL in the case of aggr. Aggr will 4544 * have an empty Tx group which will get populated 4545 * later when pseudo Tx rings are added after 4546 * mac_register() is done. 4547 */ 4548 if (ring == NULL) { 4549 ASSERT(mip->mi_state_flags & MIS_IS_AGGR); 4550 /* 4551 * Pass the group to aggr so it can add Tx 4552 * rings to the group later. 4553 */ 4554 cap_rings->mr_gget(mip->mi_driver, rtype, 0, NULL, 4555 (mac_group_handle_t)group); 4556 /* 4557 * Even though there are no rings at this time 4558 * (rings will come later), set the group 4559 * state to registered. 4560 */ 4561 group->mrg_state = MAC_GROUP_STATE_REGISTERED; 4562 } else { 4563 /* 4564 * Ring 0 is used as the default one and it could be 4565 * assigned to a client as well.
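 * The walk below simply locates ring 0 in this group and records
 * it as mi_default_tx_ring.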
4566 */ 4567 while ((ring->mr_index != 0) && (ring->mr_next != NULL)) 4568 ring = ring->mr_next; 4569 ASSERT(ring->mr_index == 0); 4570 mip->mi_default_tx_ring = (mac_ring_handle_t)ring; 4571 } 4572 if (mip->mi_tx_group_type == MAC_GROUP_TYPE_DYNAMIC) { 4573 mip->mi_txrings_avail = group->mrg_cur_count - 1; 4574 /* 4575 * The default ring cannot be reserved. 4576 */ 4577 mip->mi_txrings_rsvd = 1; 4578 } 4579 /* 4580 * The default group cannot be reserved. It will be shared 4581 * by clients that do not have an exclusive group. 4582 */ 4583 mip->mi_txhwclnt_avail = mip->mi_tx_group_count; 4584 mip->mi_txhwclnt_used = 1; 4585 break; 4586 default: 4587 ASSERT(B_FALSE); 4588 } 4589 4590 if (err != 0) 4591 mac_free_rings(mip, rtype); 4592 4593 return (err); 4594 } 4595 4596 /* 4597 * The ddi interrupt handle could be shared among rings. If so, compare 4598 * the new ring's ddi handle with the existing ones and set the ddi_shared 4599 * flag. 4600 */ 4601 void 4602 mac_compare_ddi_handle(mac_group_t *groups, uint_t grpcnt, mac_ring_t *cring) 4603 { 4604 mac_group_t *group; 4605 mac_ring_t *ring; 4606 ddi_intr_handle_t ddi_handle; 4607 int g; 4608 4609 ddi_handle = cring->mr_info.mri_intr.mi_ddi_handle; 4610 for (g = 0; g < grpcnt; g++) { 4611 group = groups + g; 4612 for (ring = group->mrg_rings; ring != NULL; 4613 ring = ring->mr_next) { 4614 if (ring == cring) 4615 continue; 4616 if (ring->mr_info.mri_intr.mi_ddi_handle == 4617 ddi_handle) { 4618 if (cring->mr_type == MAC_RING_TYPE_RX && 4619 ring->mr_index == 0 && 4620 !ring->mr_info.mri_intr.mi_ddi_shared) { 4621 ring->mr_info.mri_intr.mi_ddi_shared = 4622 B_TRUE; 4623 } else { 4624 cring->mr_info.mri_intr.mi_ddi_shared = 4625 B_TRUE; 4626 } 4627 return; 4628 } 4629 } 4630 } 4631 } 4632 4633 /* 4634 * Called to free all groups of a particular type (RX or TX). It's assumed that 4635 * no clients are using these groups. 4636 */ 4637 void 4638 mac_free_rings(mac_impl_t *mip, mac_ring_type_t rtype) 4639 { 4640 mac_group_t *group, *groups; 4641 uint_t group_count; 4642 4643 switch (rtype) { 4644 case MAC_RING_TYPE_RX: 4645 if (mip->mi_rx_groups == NULL) 4646 return; 4647 4648 groups = mip->mi_rx_groups; 4649 group_count = mip->mi_rx_group_count; 4650 4651 mip->mi_rx_groups = NULL; 4652 mip->mi_rx_donor_grp = NULL; 4653 mip->mi_rx_group_count = 0; 4654 break; 4655 case MAC_RING_TYPE_TX: 4656 ASSERT(mip->mi_tx_group_count == mip->mi_tx_group_free); 4657 4658 if (mip->mi_tx_groups == NULL) 4659 return; 4660 4661 groups = mip->mi_tx_groups; 4662 group_count = mip->mi_tx_group_count; 4663 4664 mip->mi_tx_groups = NULL; 4665 mip->mi_tx_group_count = 0; 4666 mip->mi_tx_group_free = 0; 4667 mip->mi_default_tx_ring = NULL; 4668 break; 4669 default: 4670 ASSERT(B_FALSE); 4671 } 4672 4673 for (group = groups; group != NULL; group = group->mrg_next) { 4674 mac_ring_t *ring; 4675 4676 if (group->mrg_cur_count == 0) 4677 continue; 4678 4679 ASSERT(group->mrg_rings != NULL); 4680 4681 while ((ring = group->mrg_rings) != NULL) { 4682 group->mrg_rings = ring->mr_next; 4683 mac_ring_free(mip, ring); 4684 } 4685 } 4686 4687 /* Free all the cached rings */ 4688 mac_ring_freeall(mip); 4689 /* Free the block of group data structures */ 4690 kmem_free(groups, sizeof (mac_group_t) * (group_count + 1)); 4691 } 4692 4693 /* 4694 * Associate the VLAN filter with the receive group.
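 * This resolves to the driver's mgi_addvlan() entry point; note
 * that the VID is first passed through MAC_VLAN_UNTAGGED_VID()
 * below.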
4695 */ 4696 int 4697 mac_group_addvlan(mac_group_t *group, uint16_t vlan) 4698 { 4699 VERIFY3S(group->mrg_type, ==, MAC_RING_TYPE_RX); 4700 VERIFY3P(group->mrg_info.mgi_addvlan, !=, NULL); 4701 4702 if (vlan > VLAN_ID_MAX) 4703 return (EINVAL); 4704 4705 vlan = MAC_VLAN_UNTAGGED_VID(vlan); 4706 return (group->mrg_info.mgi_addvlan(group->mrg_info.mgi_driver, vlan)); 4707 } 4708 4709 /* 4710 * Dissociate the VLAN from the receive group. 4711 */ 4712 int 4713 mac_group_remvlan(mac_group_t *group, uint16_t vlan) 4714 { 4715 VERIFY3S(group->mrg_type, ==, MAC_RING_TYPE_RX); 4716 VERIFY3P(group->mrg_info.mgi_remvlan, !=, NULL); 4717 4718 if (vlan > VLAN_ID_MAX) 4719 return (EINVAL); 4720 4721 vlan = MAC_VLAN_UNTAGGED_VID(vlan); 4722 return (group->mrg_info.mgi_remvlan(group->mrg_info.mgi_driver, vlan)); 4723 } 4724 4725 /* 4726 * Associate a MAC address with a receive group. 4727 * 4728 * The return value of this function should always be checked properly, because 4729 * any type of failure could cause unexpected results. A MAC address can be 4730 * added to or removed from a group only after the group has been reserved. 4731 * Ideally, a successful reservation always leads to calling mac_group_addmac() 4732 * to steer desired traffic. Failure to add a unicast MAC address doesn't 4733 * always imply that the group is functioning abnormally. 4734 * 4735 * Currently this function is called everywhere, and it reflects assumptions 4736 * about MAC addresses in the implementation. CR 6735196. 4737 */ 4738 int 4739 mac_group_addmac(mac_group_t *group, const uint8_t *addr) 4740 { 4741 VERIFY3S(group->mrg_type, ==, MAC_RING_TYPE_RX); 4742 VERIFY3P(group->mrg_info.mgi_addmac, !=, NULL); 4743 4744 return (group->mrg_info.mgi_addmac(group->mrg_info.mgi_driver, addr)); 4745 } 4746 4747 /* 4748 * Remove the association between MAC address and receive group. 4749 */ 4750 int 4751 mac_group_remmac(mac_group_t *group, const uint8_t *addr) 4752 { 4753 VERIFY3S(group->mrg_type, ==, MAC_RING_TYPE_RX); 4754 VERIFY3P(group->mrg_info.mgi_remmac, !=, NULL); 4755 4756 return (group->mrg_info.mgi_remmac(group->mrg_info.mgi_driver, addr)); 4757 } 4758 4759 /* 4760 * This is the entry point for packets transmitted through the bridge 4761 * code. If no bridge is in place, mac_ring_tx() transmits via the tx 4762 * ring. The 'rh' pointer may be NULL to select the default ring. 4763 */ 4764 mblk_t * 4765 mac_bridge_tx(mac_impl_t *mip, mac_ring_handle_t rh, mblk_t *mp) 4766 { 4767 mac_handle_t mh; 4768 4769 /* 4770 * Once we take a reference on the bridge link, the bridge 4771 * module itself can't unload, so the callback pointers are 4772 * stable. 4773 */ 4774 mutex_enter(&mip->mi_bridge_lock); 4775 if ((mh = mip->mi_bridge_link) != NULL) 4776 mac_bridge_ref_cb(mh, B_TRUE); 4777 mutex_exit(&mip->mi_bridge_lock); 4778 if (mh == NULL) { 4779 mp = mac_ring_tx((mac_handle_t)mip, rh, mp); 4780 } else { 4781 /* 4782 * The bridge may place this mblk on a provider's Tx 4783 * path, a mac's Rx path, or both. Since we don't have 4784 * enough information at this point, we can't be sure 4785 * that the destination(s) are capable of handling the 4786 * hardware offloads requested by the mblk. We emulate 4787 * them here as it is the safest choice. In the 4788 * future, if bridge performance becomes a priority, 4789 * we can elide the emulation here and leave the 4790 * choice up to bridge. 4791 * 4792 * We don't clear the DB_CKSUMFLAGS here because 4793 * HCK_IPV4_HDRCKSUM (Tx) and HCK_IPV4_HDRCKSUM_OK 4794 * (Rx) still have the same value.
If the bridge 4795 * receives a packet from a HCKSUM_IPHDRCKSUM NIC then 4796 * the mac(s) it is forwarded on may calculate the 4797 * checksum again, but incorrectly (because the 4798 * checksum field is not zero). Until the 4799 * HCK_IPV4_HDRCKSUM/HCK_IPV4_HDRCKSUM_OK issue is 4800 * resolved, we leave the flag clearing in bridge 4801 * itself. 4802 */ 4803 if ((DB_CKSUMFLAGS(mp) & (HCK_TX_FLAGS | HW_LSO_FLAGS)) != 0) { 4804 mac_hw_emul(&mp, NULL, NULL, MAC_ALL_EMULS); 4805 } 4806 4807 mp = mac_bridge_tx_cb(mh, rh, mp); 4808 mac_bridge_ref_cb(mh, B_FALSE); 4809 } 4810 4811 return (mp); 4812 } 4813 4814 /* 4815 * Find a ring from its index. 4816 */ 4817 mac_ring_handle_t 4818 mac_find_ring(mac_group_handle_t gh, int index) 4819 { 4820 mac_group_t *group = (mac_group_t *)gh; 4821 mac_ring_t *ring = group->mrg_rings; 4822 4823 for (ring = group->mrg_rings; ring != NULL; ring = ring->mr_next) 4824 if (ring->mr_index == index) 4825 break; 4826 4827 return ((mac_ring_handle_t)ring); 4828 } 4829 /* 4830 * Add a ring to an existing group. 4831 * 4832 * The ring must be either passed directly (for example if the ring 4833 * movement is initiated by the framework), or specified through a driver 4834 * index (for example when the ring is added by the driver). 4835 * 4836 * The caller needs to call mac_perim_enter() before calling this function. 4837 */ 4838 int 4839 i_mac_group_add_ring(mac_group_t *group, mac_ring_t *ring, int index) 4840 { 4841 mac_impl_t *mip = (mac_impl_t *)group->mrg_mh; 4842 mac_capab_rings_t *cap_rings; 4843 boolean_t driver_call = (ring == NULL); 4844 mac_group_type_t group_type; 4845 int ret = 0; 4846 flow_entry_t *flent; 4847 4848 ASSERT(MAC_PERIM_HELD((mac_handle_t)mip)); 4849 4850 switch (group->mrg_type) { 4851 case MAC_RING_TYPE_RX: 4852 cap_rings = &mip->mi_rx_rings_cap; 4853 group_type = mip->mi_rx_group_type; 4854 break; 4855 case MAC_RING_TYPE_TX: 4856 cap_rings = &mip->mi_tx_rings_cap; 4857 group_type = mip->mi_tx_group_type; 4858 break; 4859 default: 4860 ASSERT(B_FALSE); 4861 } 4862 4863 /* 4864 * There should be no ring with the same ring index in the target 4865 * group. 4866 */ 4867 ASSERT(mac_find_ring((mac_group_handle_t)group, 4868 driver_call ? index : ring->mr_index) == NULL); 4869 4870 if (driver_call) { 4871 /* 4872 * The function is called as a result of a request from 4873 * a driver to add a ring to an existing group, for example 4874 * from the aggregation driver. Allocate a new mac_ring_t 4875 * for that ring. 4876 */ 4877 ring = mac_init_ring(mip, group, index, cap_rings); 4878 ASSERT(group->mrg_state > MAC_GROUP_STATE_UNINIT); 4879 } else { 4880 /* 4881 * The function is called as a result of a MAC layer request 4882 * to add a ring to an existing group. In this case the 4883 * ring is being moved between groups, which requires 4884 * the underlying driver to support dynamic grouping, 4885 * and the mac_ring_t already exists. 4886 */ 4887 ASSERT(group_type == MAC_GROUP_TYPE_DYNAMIC); 4888 ASSERT(group->mrg_driver == NULL || 4889 cap_rings->mr_gaddring != NULL); 4890 ASSERT(ring->mr_gh == NULL); 4891 } 4892 4893 /* 4894 * At this point the ring should not be in use, and it should be 4895 * of the right type for the target group. 4896 */ 4897 ASSERT(ring->mr_state < MR_INUSE); 4898 ASSERT(ring->mr_srs == NULL); 4899 ASSERT(ring->mr_type == group->mrg_type); 4900 4901 if (!driver_call) { 4902 /* 4903 * Add the driver level hardware ring if the process was not 4904 * initiated by the driver, and the target group is not the 4905 *
4906 */ 4907 if (group->mrg_driver != NULL) { 4908 cap_rings->mr_gaddring(group->mrg_driver, 4909 ring->mr_driver, ring->mr_type); 4910 } 4911 4912 /* 4913 * Insert the ring ahead existing rings. 4914 */ 4915 ring->mr_next = group->mrg_rings; 4916 group->mrg_rings = ring; 4917 ring->mr_gh = (mac_group_handle_t)group; 4918 group->mrg_cur_count++; 4919 } 4920 4921 /* 4922 * If the group has not been actively used, we're done. 4923 */ 4924 if (group->mrg_index != -1 && 4925 group->mrg_state < MAC_GROUP_STATE_RESERVED) 4926 return (0); 4927 4928 /* 4929 * Start the ring if needed. Failure causes to undo the grouping action. 4930 */ 4931 if (ring->mr_state != MR_INUSE) { 4932 if ((ret = mac_start_ring(ring)) != 0) { 4933 if (!driver_call) { 4934 cap_rings->mr_gremring(group->mrg_driver, 4935 ring->mr_driver, ring->mr_type); 4936 } 4937 group->mrg_cur_count--; 4938 group->mrg_rings = ring->mr_next; 4939 4940 ring->mr_gh = NULL; 4941 4942 if (driver_call) 4943 mac_ring_free(mip, ring); 4944 4945 return (ret); 4946 } 4947 } 4948 4949 /* 4950 * Set up SRS/SR according to the ring type. 4951 */ 4952 switch (ring->mr_type) { 4953 case MAC_RING_TYPE_RX: 4954 /* 4955 * Setup an SRS on top of the new ring if the group is 4956 * reserved for someone's exclusive use. 4957 */ 4958 if (group->mrg_state == MAC_GROUP_STATE_RESERVED) { 4959 mac_client_impl_t *mcip = MAC_GROUP_ONLY_CLIENT(group); 4960 4961 VERIFY3P(mcip, !=, NULL); 4962 flent = mcip->mci_flent; 4963 VERIFY3S(flent->fe_rx_srs_cnt, >, 0); 4964 mac_rx_srs_group_setup(mcip, flent, SRST_LINK); 4965 mac_fanout_setup(mcip, flent, MCIP_RESOURCE_PROPS(mcip), 4966 mac_rx_deliver, mcip, NULL, NULL); 4967 } else { 4968 ring->mr_classify_type = MAC_SW_CLASSIFIER; 4969 } 4970 break; 4971 case MAC_RING_TYPE_TX: 4972 { 4973 mac_grp_client_t *mgcp = group->mrg_clients; 4974 mac_client_impl_t *mcip; 4975 mac_soft_ring_set_t *mac_srs; 4976 mac_srs_tx_t *tx; 4977 4978 if (MAC_GROUP_NO_CLIENT(group)) { 4979 if (ring->mr_state == MR_INUSE) 4980 mac_stop_ring(ring); 4981 ring->mr_flag = 0; 4982 break; 4983 } 4984 /* 4985 * If the rings are being moved to a group that has 4986 * clients using it, then add the new rings to the 4987 * clients SRS. 4988 */ 4989 while (mgcp != NULL) { 4990 boolean_t is_aggr; 4991 4992 mcip = mgcp->mgc_client; 4993 flent = mcip->mci_flent; 4994 is_aggr = (mcip->mci_state_flags & MCIS_IS_AGGR_CLIENT); 4995 mac_srs = MCIP_TX_SRS(mcip); 4996 tx = &mac_srs->srs_tx; 4997 mac_tx_client_quiesce((mac_client_handle_t)mcip); 4998 /* 4999 * If we are growing from 1 to multiple rings. 5000 */ 5001 if (tx->st_mode == SRS_TX_BW || 5002 tx->st_mode == SRS_TX_SERIALIZE || 5003 tx->st_mode == SRS_TX_DEFAULT) { 5004 mac_ring_t *tx_ring = tx->st_arg2; 5005 5006 tx->st_arg2 = NULL; 5007 mac_tx_srs_stat_recreate(mac_srs, B_TRUE); 5008 mac_tx_srs_add_ring(mac_srs, tx_ring); 5009 if (mac_srs->srs_type & SRST_BW_CONTROL) { 5010 tx->st_mode = is_aggr ? SRS_TX_BW_AGGR : 5011 SRS_TX_BW_FANOUT; 5012 } else { 5013 tx->st_mode = is_aggr ? SRS_TX_AGGR : 5014 SRS_TX_FANOUT; 5015 } 5016 tx->st_func = mac_tx_get_func(tx->st_mode); 5017 } 5018 mac_tx_srs_add_ring(mac_srs, ring); 5019 mac_fanout_setup(mcip, flent, MCIP_RESOURCE_PROPS(mcip), 5020 mac_rx_deliver, mcip, NULL, NULL); 5021 mac_tx_client_restart((mac_client_handle_t)mcip); 5022 mgcp = mgcp->mgc_next; 5023 } 5024 break; 5025 } 5026 default: 5027 ASSERT(B_FALSE); 5028 } 5029 /* 5030 * For aggr, the default ring will be NULL to begin with. 
If it 5031 * is NULL, then pick the first ring that gets added as the 5032 * default ring. Any ring in an aggregation can be removed at 5033 * any time (by the user action of removing a link) and if the 5034 * current default ring gets removed, then a new one gets 5035 * picked (see i_mac_group_rem_ring()). 5036 */ 5037 if (mip->mi_state_flags & MIS_IS_AGGR && 5038 mip->mi_default_tx_ring == NULL && 5039 ring->mr_type == MAC_RING_TYPE_TX) { 5040 mip->mi_default_tx_ring = (mac_ring_handle_t)ring; 5041 } 5042 5043 MAC_RING_UNMARK(ring, MR_INCIPIENT); 5044 return (0); 5045 } 5046 5047 /* 5048 * Remove a ring from its current group. MAC internal function for dynamic 5049 * grouping. 5050 * 5051 * The caller needs to call mac_perim_enter() before calling this function. 5052 */ 5053 void 5054 i_mac_group_rem_ring(mac_group_t *group, mac_ring_t *ring, 5055 boolean_t driver_call) 5056 { 5057 mac_impl_t *mip = (mac_impl_t *)group->mrg_mh; 5058 mac_capab_rings_t *cap_rings = NULL; 5059 mac_group_type_t group_type; 5060 5061 ASSERT(MAC_PERIM_HELD((mac_handle_t)mip)); 5062 5063 ASSERT(mac_find_ring((mac_group_handle_t)group, 5064 ring->mr_index) == (mac_ring_handle_t)ring); 5065 ASSERT((mac_group_t *)ring->mr_gh == group); 5066 ASSERT(ring->mr_type == group->mrg_type); 5067 5068 if (ring->mr_state == MR_INUSE) 5069 mac_stop_ring(ring); 5070 switch (ring->mr_type) { 5071 case MAC_RING_TYPE_RX: 5072 group_type = mip->mi_rx_group_type; 5073 cap_rings = &mip->mi_rx_rings_cap; 5074 5075 /* 5076 * Only hardware classified packets hold a reference to the 5077 * ring all the way up the Rx path. mac_rx_srs_remove() 5078 * will take care of quiescing the Rx path and removing the 5079 * SRS. The software classified path neither holds a reference 5080 * nor any association with the ring in mac_rx. 5081 */ 5082 if (ring->mr_srs != NULL) { 5083 mac_rx_srs_remove(ring->mr_srs); 5084 ring->mr_srs = NULL; 5085 } 5086 5087 break; 5088 case MAC_RING_TYPE_TX: 5089 { 5090 mac_grp_client_t *mgcp; 5091 mac_client_impl_t *mcip; 5092 mac_soft_ring_set_t *mac_srs; 5093 mac_srs_tx_t *tx; 5094 mac_ring_t *rem_ring; 5095 mac_group_t *defgrp; 5096 uint_t ring_info = 0; 5097 5098 /* 5099 * For TX this function is invoked in three 5100 * cases: 5101 * 5102 * 1) In the case of a failure during the 5103 * initial creation of a group when a share is 5104 * associated with a MAC client. So the SRS is not 5105 * yet set up, and will be set up later after the 5106 * group has been reserved and populated. 5107 * 5108 * 2) From mac_release_tx_group() when freeing 5109 * a TX SRS. 5110 * 5111 * 3) In the case of aggr, when a port gets removed, 5112 * the pseudo Tx rings that it exposed get removed. 5113 * 5114 * In the first two cases the SRS and its soft 5115 * rings are already quiesced. 5116 */ 5117 if (driver_call) { 5118 mac_client_impl_t *mcip; 5119 mac_soft_ring_set_t *mac_srs; 5120 mac_soft_ring_t *sringp; 5121 mac_srs_tx_t *srs_tx; 5122 5123 if (mip->mi_state_flags & MIS_IS_AGGR && 5124 mip->mi_default_tx_ring == 5125 (mac_ring_handle_t)ring) { 5126 /* pick a new default Tx ring */ 5127 mip->mi_default_tx_ring = 5128 (group->mrg_rings != ring) ?
5129 (mac_ring_handle_t)group->mrg_rings : 5130 (mac_ring_handle_t)(ring->mr_next); 5131 } 5132 /* Presently only the aggr case comes here */ 5133 if (group->mrg_state != MAC_GROUP_STATE_RESERVED) 5134 break; 5135 5136 mcip = MAC_GROUP_ONLY_CLIENT(group); 5137 ASSERT(mcip != NULL); 5138 ASSERT(mcip->mci_state_flags & MCIS_IS_AGGR_CLIENT); 5139 mac_srs = MCIP_TX_SRS(mcip); 5140 ASSERT(mac_srs->srs_tx.st_mode == SRS_TX_AGGR || 5141 mac_srs->srs_tx.st_mode == SRS_TX_BW_AGGR); 5142 srs_tx = &mac_srs->srs_tx; 5143 /* 5144 * Wake up any callers blocked on this 5145 * Tx ring due to flow control. 5146 */ 5147 sringp = srs_tx->st_soft_rings[ring->mr_index]; 5148 ASSERT(sringp != NULL); 5149 mac_tx_invoke_callbacks(mcip, (mac_tx_cookie_t)sringp); 5150 mac_tx_client_quiesce((mac_client_handle_t)mcip); 5151 mac_tx_srs_del_ring(mac_srs, ring); 5152 mac_tx_client_restart((mac_client_handle_t)mcip); 5153 break; 5154 } 5155 ASSERT(ring != (mac_ring_t *)mip->mi_default_tx_ring); 5156 group_type = mip->mi_tx_group_type; 5157 cap_rings = &mip->mi_tx_rings_cap; 5158 /* 5159 * See if we need to take it out of the MAC clients using 5160 * this group. 5161 */ 5162 if (MAC_GROUP_NO_CLIENT(group)) 5163 break; 5164 mgcp = group->mrg_clients; 5165 defgrp = MAC_DEFAULT_TX_GROUP(mip); 5166 while (mgcp != NULL) { 5167 mcip = mgcp->mgc_client; 5168 mac_srs = MCIP_TX_SRS(mcip); 5169 tx = &mac_srs->srs_tx; 5170 mac_tx_client_quiesce((mac_client_handle_t)mcip); 5171 /* 5172 * If we are here when removing rings from the 5173 * defgroup, mac_reserve_tx_ring would have 5174 * already deleted the ring from the MAC 5175 * clients in the group. 5176 */ 5177 if (group != defgrp) { 5178 mac_tx_invoke_callbacks(mcip, 5179 (mac_tx_cookie_t) 5180 mac_tx_srs_get_soft_ring(mac_srs, ring)); 5181 mac_tx_srs_del_ring(mac_srs, ring); 5182 } 5183 /* 5184 * Additionally, if we are left with only 5185 * one ring in the group after this, we need 5186 * to modify the mode etc. accordingly. (We haven't 5187 * yet taken the ring out, so we check against 2.) 5188 */ 5189 if (group->mrg_cur_count == 2) { 5190 if (ring->mr_next == NULL) 5191 rem_ring = group->mrg_rings; 5192 else 5193 rem_ring = ring->mr_next; 5194 mac_tx_invoke_callbacks(mcip, 5195 (mac_tx_cookie_t) 5196 mac_tx_srs_get_soft_ring(mac_srs, 5197 rem_ring)); 5198 mac_tx_srs_del_ring(mac_srs, rem_ring); 5199 if (rem_ring->mr_state != MR_INUSE) { 5200 (void) mac_start_ring(rem_ring); 5201 } 5202 tx->st_arg2 = (void *)rem_ring; 5203 mac_tx_srs_stat_recreate(mac_srs, B_FALSE); 5204 ring_info = mac_hwring_getinfo( 5205 (mac_ring_handle_t)rem_ring); 5206 /* 5207 * We are shrinking from multiple 5208 * to 1 ring. 5209 */ 5210 if (mac_srs->srs_type & SRST_BW_CONTROL) { 5211 tx->st_mode = SRS_TX_BW; 5212 } else if (mac_tx_serialize || 5213 (ring_info & MAC_RING_TX_SERIALIZE)) { 5214 tx->st_mode = SRS_TX_SERIALIZE; 5215 } else { 5216 tx->st_mode = SRS_TX_DEFAULT; 5217 } 5218 tx->st_func = mac_tx_get_func(tx->st_mode); 5219 } 5220 mac_tx_client_restart((mac_client_handle_t)mcip); 5221 mgcp = mgcp->mgc_next; 5222 } 5223 break; 5224 } 5225 default: 5226 ASSERT(B_FALSE); 5227 } 5228 5229 /* 5230 * Remove the ring from the group.
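 * This is a plain singly-linked-list unlink: either the group's
 * head pointer or the predecessor's mr_next is advanced past the
 * ring below.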
5231 */ 5232 if (ring == group->mrg_rings) 5233 group->mrg_rings = ring->mr_next; 5234 else { 5235 mac_ring_t *pre; 5236 5237 pre = group->mrg_rings; 5238 while (pre->mr_next != ring) 5239 pre = pre->mr_next; 5240 pre->mr_next = ring->mr_next; 5241 } 5242 group->mrg_cur_count--; 5243 5244 if (!driver_call) { 5245 ASSERT(group_type == MAC_GROUP_TYPE_DYNAMIC); 5246 ASSERT(group->mrg_driver == NULL || 5247 cap_rings->mr_gremring != NULL); 5248 5249 /* 5250 * Remove the driver level hardware ring. 5251 */ 5252 if (group->mrg_driver != NULL) { 5253 cap_rings->mr_gremring(group->mrg_driver, 5254 ring->mr_driver, ring->mr_type); 5255 } 5256 } 5257 5258 ring->mr_gh = NULL; 5259 if (driver_call) 5260 mac_ring_free(mip, ring); 5261 else 5262 ring->mr_flag = 0; 5263 } 5264 5265 /* 5266 * Move a ring to the target group. If needed, remove the ring from the group 5267 * that it currently belongs to. 5268 * 5269 * The caller needs to enter the MAC perimeter by calling mac_perim_enter(). 5270 */ 5271 static int 5272 mac_group_mov_ring(mac_impl_t *mip, mac_group_t *d_group, mac_ring_t *ring) 5273 { 5274 mac_group_t *s_group = (mac_group_t *)ring->mr_gh; 5275 int rv; 5276 5277 ASSERT(MAC_PERIM_HELD((mac_handle_t)mip)); 5278 ASSERT(d_group != NULL); 5279 ASSERT(s_group == NULL || s_group->mrg_mh == d_group->mrg_mh); 5280 5281 if (s_group == d_group) 5282 return (0); 5283 5284 /* 5285 * Remove it from current group first. 5286 */ 5287 if (s_group != NULL) 5288 i_mac_group_rem_ring(s_group, ring, B_FALSE); 5289 5290 /* 5291 * Add it to the new group. 5292 */ 5293 rv = i_mac_group_add_ring(d_group, ring, 0); 5294 if (rv != 0) { 5295 /* 5296 * Failed to add the ring to the destination group; try to 5297 * add it back to the source group. If that also fails, the 5298 * ring is stuck in limbo; log a message. 5299 */ 5300 if (i_mac_group_add_ring(s_group, ring, 0)) { 5301 cmn_err(CE_WARN, "%s: failed to move ring %p\n", 5302 mip->mi_name, (void *)ring); 5303 } 5304 } 5305 5306 return (rv); 5307 } 5308 5309 /* 5310 * Find a MAC address according to its value. 5311 */ 5312 mac_address_t * 5313 mac_find_macaddr(mac_impl_t *mip, uint8_t *mac_addr) 5314 { 5315 mac_address_t *map; 5316 5317 ASSERT(MAC_PERIM_HELD((mac_handle_t)mip)); 5318 5319 for (map = mip->mi_addresses; map != NULL; map = map->ma_next) { 5320 if (bcmp(mac_addr, map->ma_addr, map->ma_len) == 0) 5321 break; 5322 } 5323 5324 return (map); 5325 } 5326 5327 /* 5328 * Check whether the MAC address is shared by multiple clients. 5329 */ 5330 boolean_t 5331 mac_check_macaddr_shared(mac_address_t *map) 5332 { 5333 ASSERT(MAC_PERIM_HELD((mac_handle_t)map->ma_mip)); 5334 5335 return (map->ma_nusers > 1); 5336 } 5337 5338 /* 5339 * Remove the specified MAC address from the MAC address list and free it.
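 * The VERIFYs below insist that the address is still on the
 * mi_addresses list, has no remaining users, and carries no VLANs
 * before it is unlinked and freed.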
5339 */ 5340 static void 5341 mac_free_macaddr(mac_address_t *map) 5342 { 5343 mac_impl_t *mip = map->ma_mip; 5344 5345 ASSERT(MAC_PERIM_HELD((mac_handle_t)mip)); 5346 VERIFY3P(mip->mi_addresses, !=, NULL); 5347 5348 VERIFY3P(map, ==, mac_find_macaddr(mip, map->ma_addr)); 5349 VERIFY3P(map, !=, NULL); 5350 VERIFY3S(map->ma_nusers, ==, 0); 5351 VERIFY3P(map->ma_vlans, ==, NULL); 5352 5353 if (map == mip->mi_addresses) { 5354 mip->mi_addresses = map->ma_next; 5355 } else { 5356 mac_address_t *pre; 5357 5358 pre = mip->mi_addresses; 5359 while (pre->ma_next != map) 5360 pre = pre->ma_next; 5361 pre->ma_next = map->ma_next; 5362 } 5363 5364 kmem_free(map, sizeof (mac_address_t)); 5365 } 5366 5367 static mac_vlan_t * 5368 mac_find_vlan(mac_address_t *map, uint16_t vid) 5369 { 5370 mac_vlan_t *mvp; 5371 5372 for (mvp = map->ma_vlans; mvp != NULL; mvp = mvp->mv_next) { 5373 if (mvp->mv_vid == vid) 5374 return (mvp); 5375 } 5376 5377 return (NULL); 5378 } 5379 5380 static mac_vlan_t * 5381 mac_add_vlan(mac_address_t *map, uint16_t vid) 5382 { 5383 mac_vlan_t *mvp; 5384 5385 /* 5386 * We should never add the same {addr, VID} tuple more 5387 * than once, but let's be sure. 5388 */ 5389 for (mvp = map->ma_vlans; mvp != NULL; mvp = mvp->mv_next) 5390 VERIFY3U(mvp->mv_vid, !=, vid); 5391 5392 /* Add the VLAN to the head of the VLAN list. */ 5393 mvp = kmem_zalloc(sizeof (mac_vlan_t), KM_SLEEP); 5394 mvp->mv_vid = vid; 5395 mvp->mv_next = map->ma_vlans; 5396 map->ma_vlans = mvp; 5397 5398 return (mvp); 5399 } 5400 5401 static void 5402 mac_rem_vlan(mac_address_t *map, mac_vlan_t *mvp) 5403 { 5404 mac_vlan_t *pre; 5405 5406 if (map->ma_vlans == mvp) { 5407 map->ma_vlans = mvp->mv_next; 5408 } else { 5409 pre = map->ma_vlans; 5410 while (pre->mv_next != mvp) { 5411 pre = pre->mv_next; 5412 5413 /* 5414 * If pre is NULL, we've walked off the end of 5415 * the list without finding mvp. 5416 */ 5417 VERIFY3P(pre, !=, NULL); 5418 } 5419 pre->mv_next = mvp->mv_next; 5420 } 5421 5422 kmem_free(mvp, sizeof (mac_vlan_t)); 5423 } 5424 5425 /* 5426 * Create a new mac_address_t if this is the first use of the address 5427 * or add a VID to an existing address. In either case, the 5428 * mac_address_t acts as a list of {addr, VID} tuples where each tuple 5429 * shares the same addr. If group is non-NULL then attempt to program 5430 * the MAC's HW filters for this group. Otherwise, if group is NULL, 5431 * then the MAC has no rings and there is nothing to program. 5432 */ 5433 int 5434 mac_add_macaddr_vlan(mac_impl_t *mip, mac_group_t *group, uint8_t *addr, 5435 uint16_t vid, boolean_t use_hw) 5436 { 5437 mac_address_t *map; 5438 mac_vlan_t *mvp; 5439 int err = 0; 5440 boolean_t allocated_map = B_FALSE; 5441 boolean_t hw_mac = B_FALSE; 5442 boolean_t hw_vlan = B_FALSE; 5443 5444 ASSERT(MAC_PERIM_HELD((mac_handle_t)mip)); 5445 5446 map = mac_find_macaddr(mip, addr); 5447 5448 /* 5449 * If this is the first use of this MAC address then allocate 5450 * and initialize a new structure. 5451 */ 5452 if (map == NULL) { 5453 map = kmem_zalloc(sizeof (mac_address_t), KM_SLEEP); 5454 map->ma_len = mip->mi_type->mt_addr_length; 5455 bcopy(addr, map->ma_addr, map->ma_len); 5456 map->ma_nusers = 0; 5457 map->ma_group = group; 5458 map->ma_mip = mip; 5459 map->ma_untagged = B_FALSE; 5460 5461 /* Add the new MAC address to the head of the address list.
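 * (mi_addresses is the per-MAC list of mac_address_t entries that
 * mac_find_macaddr() walks.)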
*/ 5462 map->ma_next = mip->mi_addresses; 5463 mip->mi_addresses = map; 5464 5465 allocated_map = B_TRUE; 5466 } 5467 5468 VERIFY(map->ma_group == NULL || map->ma_group == group); 5469 if (map->ma_group == NULL) 5470 map->ma_group = group; 5471 5472 if (vid == VLAN_ID_NONE) { 5473 map->ma_untagged = B_TRUE; 5474 mvp = NULL; 5475 } else { 5476 mvp = mac_add_vlan(map, vid); 5477 } 5478 5479 /* 5480 * Set the VLAN HW filter if: 5481 * 5482 * o the MAC's VLAN HW filtering is enabled, and 5483 * o the address does not currently rely on promisc mode. 5484 * 5485 * This is called even when the client specifies an untagged 5486 * address (VLAN_ID_NONE) because some MAC providers require 5487 * setting additional bits to accept untagged traffic when 5488 * VLAN HW filtering is enabled. 5489 */ 5490 if (MAC_GROUP_HW_VLAN(group) && 5491 map->ma_type != MAC_ADDRESS_TYPE_UNICAST_PROMISC) { 5492 if ((err = mac_group_addvlan(group, vid)) != 0) 5493 goto bail; 5494 5495 hw_vlan = B_TRUE; 5496 } 5497 5498 VERIFY3S(map->ma_nusers, >=, 0); 5499 map->ma_nusers++; 5500 5501 /* 5502 * If this MAC address already has a HW filter then simply 5503 * increment the counter. 5504 */ 5505 if (map->ma_nusers > 1) 5506 return (0); 5507 5508 /* 5509 * All logic from here on out is executed during initial 5510 * creation only. 5511 */ 5512 VERIFY3S(map->ma_nusers, ==, 1); 5513 5514 /* 5515 * Activate this MAC address by adding it to the reserved group. 5516 */ 5517 if (group != NULL) { 5518 err = mac_group_addmac(group, (const uint8_t *)addr); 5519 5520 /* 5521 * If the driver is out of filters then we can 5522 * continue and use promisc mode. For any other error, 5523 * assume the driver is in a state where we can't 5524 * program the filters or use promisc mode; so we must 5525 * bail. 5526 */ 5527 if (err != 0 && err != ENOSPC) { 5528 map->ma_nusers--; 5529 goto bail; 5530 } 5531 5532 hw_mac = (err == 0); 5533 } 5534 5535 if (hw_mac) { 5536 map->ma_type = MAC_ADDRESS_TYPE_UNICAST_CLASSIFIED; 5537 return (0); 5538 } 5539 5540 /* 5541 * The MAC address addition failed. If the client requires a 5542 * hardware classified MAC address, fail the operation. This 5543 * feature is only used by sun4v vsw. 5544 */ 5545 if (use_hw && !hw_mac) { 5546 err = ENOSPC; 5547 map->ma_nusers--; 5548 goto bail; 5549 } 5550 5551 /* 5552 * If we reach this point then either the MAC doesn't have 5553 * RINGS capability or we are out of MAC address HW filters. 5554 * In any case we must put the MAC into promiscuous mode. 5555 */ 5556 VERIFY(group == NULL || !hw_mac); 5557 5558 /* 5559 * The one exception is the primary address. A non-RINGS 5560 * driver filters the primary address by default; promisc mode 5561 * is not needed. 5562 */ 5563 if ((group == NULL) && 5564 (bcmp(map->ma_addr, mip->mi_addr, map->ma_len) == 0)) { 5565 map->ma_type = MAC_ADDRESS_TYPE_UNICAST_CLASSIFIED; 5566 return (0); 5567 } 5568 5569 /* 5570 * Enable promiscuous mode in order to receive traffic to the 5571 * new MAC address. All existing HW filters still send their 5572 * traffic to their respective group/SRSes. But with promisc 5573 * enabled all unknown traffic is delivered to the default 5574 * group where it is SW classified via mac_rx_classify(). 5575 */ 5576 if ((err = i_mac_promisc_set(mip, B_TRUE)) == 0) { 5577 map->ma_type = MAC_ADDRESS_TYPE_UNICAST_PROMISC; 5578 return (0); 5579 } 5580 5581 /* 5582 * We failed to set promisc mode and we are about to free 'map'. 
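 * Clearing ma_nusers below keeps the VERIFY in mac_free_macaddr()
 * satisfied on the bail path.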
5583 */ 5584 map->ma_nusers = 0; 5585 5586 bail: 5587 if (hw_vlan) { 5588 int err2 = mac_group_remvlan(group, vid); 5589 5590 if (err2 != 0) { 5591 cmn_err(CE_WARN, "Failed to remove VLAN %u from group" 5592 " %d on MAC %s: %d.", vid, group->mrg_index, 5593 mip->mi_name, err2); 5594 } 5595 } 5596 5597 if (mvp != NULL) 5598 mac_rem_vlan(map, mvp); 5599 5600 if (allocated_map) 5601 mac_free_macaddr(map); 5602 5603 return (err); 5604 } 5605 5606 int 5607 mac_remove_macaddr_vlan(mac_address_t *map, uint16_t vid) 5608 { 5609 mac_vlan_t *mvp; 5610 mac_impl_t *mip = map->ma_mip; 5611 mac_group_t *group = map->ma_group; 5612 int err = 0; 5613 5614 ASSERT(MAC_PERIM_HELD((mac_handle_t)mip)); 5615 VERIFY3P(map, ==, mac_find_macaddr(mip, map->ma_addr)); 5616 5617 if (vid == VLAN_ID_NONE) { 5618 map->ma_untagged = B_FALSE; 5619 mvp = NULL; 5620 } else { 5621 mvp = mac_find_vlan(map, vid); 5622 VERIFY3P(mvp, !=, NULL); 5623 } 5624 5625 if (MAC_GROUP_HW_VLAN(group) && 5626 map->ma_type == MAC_ADDRESS_TYPE_UNICAST_CLASSIFIED && 5627 ((err = mac_group_remvlan(group, vid)) != 0)) 5628 return (err); 5629 5630 if (mvp != NULL) 5631 mac_rem_vlan(map, mvp); 5632 5633 /* 5634 * If this isn't the last client using the MAC address, just update 5635 * the client count and return. 5636 */ 5637 map->ma_nusers--; 5638 if (map->ma_nusers > 0) 5639 return (0); 5640 5641 VERIFY3S(map->ma_nusers, ==, 0); 5642 5643 /* 5644 * The MAC address is no longer used by any MAC client, so 5645 * remove it from its associated group. Turn off promiscuous 5646 * mode if this is the last address relying on it. 5647 */ 5648 switch (map->ma_type) { 5649 case MAC_ADDRESS_TYPE_UNICAST_CLASSIFIED: 5650 /* 5651 * Don't free the preset primary address for drivers that 5652 * don't advertise RINGS capability. 5653 */ 5654 if (group == NULL) 5655 return (0); 5656 5657 if ((err = mac_group_remmac(group, map->ma_addr)) != 0) { 5658 if (vid == VLAN_ID_NONE) 5659 map->ma_untagged = B_TRUE; 5660 else 5661 (void) mac_add_vlan(map, vid); 5662 5663 /* 5664 * If we fail to remove the MAC address HW 5665 * filter but then also fail to re-add the 5666 * VLAN HW filter then we are in a busted 5667 * state. We do our best by logging a warning 5668 * and returning the original 'err' that got 5669 * us here. At this point, traffic for this 5670 * address + VLAN combination will be dropped 5671 * until the user reboots the system. In the 5672 * future, it would be nice to have a system 5673 * that can compare the expected state of 5674 * classification according to mac with the 5675 * actual state of the provider, and report 5676 * and fix any inconsistencies. 5677 */ 5678 if (MAC_GROUP_HW_VLAN(group)) { 5679 int err2; 5680 5681 err2 = mac_group_addvlan(group, vid); 5682 if (err2 != 0) { 5683 cmn_err(CE_WARN, "Failed to re-add VLAN" 5684 " %u to group %d on MAC %s: %d.", 5685 vid, group->mrg_index, mip->mi_name, 5686 err2); 5687 } 5688 } 5689 5690 map->ma_nusers = 1; 5691 return (err); 5692 } 5693 5694 map->ma_group = NULL; 5695 break; 5696 case MAC_ADDRESS_TYPE_UNICAST_PROMISC: 5697 err = i_mac_promisc_set(mip, B_FALSE); 5698 break; 5699 default: 5700 panic("Unexpected ma_type 0x%x, file: %s, line %d", 5701 map->ma_type, __FILE__, __LINE__); 5702 } 5703 5704 if (err != 0) { 5705 map->ma_nusers = 1; 5706 return (err); 5707 } 5708 5709 /* 5710 * We created the MAC address for the primary one at registration, so we 5711 * won't free it here. mac_fini_macaddr() will take care of it.
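 * (That primary entry was created by mac_init_macaddr() at
 * registration time; see below.)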
5712 */ 5713 if (bcmp(map->ma_addr, mip->mi_addr, map->ma_len) != 0) 5714 mac_free_macaddr(map); 5715 5716 return (0); 5717 } 5718 5719 /* 5720 * Update an existing MAC address. The caller needs to make sure that the new 5721 * value is not in use. 5722 */ 5723 int 5724 mac_update_macaddr(mac_address_t *map, uint8_t *mac_addr) 5725 { 5726 mac_impl_t *mip = map->ma_mip; 5727 int err = 0; 5728 5729 ASSERT(MAC_PERIM_HELD((mac_handle_t)mip)); 5730 ASSERT(mac_find_macaddr(mip, mac_addr) == NULL); 5731 5732 switch (map->ma_type) { 5733 case MAC_ADDRESS_TYPE_UNICAST_CLASSIFIED: 5734 /* 5735 * Update the primary address for drivers that are not 5736 * RINGS capable. 5737 */ 5738 if (mip->mi_rx_groups == NULL) { 5739 err = mip->mi_unicst(mip->mi_driver, (const uint8_t *) 5740 mac_addr); 5741 if (err != 0) 5742 return (err); 5743 break; 5744 } 5745 5746 /* 5747 * If this MAC address is not currently in use, 5748 * simply break out and update the value. 5749 */ 5750 if (map->ma_nusers == 0) 5751 break; 5752 5753 /* 5754 * Need to replace the MAC address associated with a group. 5755 */ 5756 err = mac_group_remmac(map->ma_group, map->ma_addr); 5757 if (err != 0) 5758 return (err); 5759 5760 err = mac_group_addmac(map->ma_group, mac_addr); 5761 5762 /* 5763 * A failure hints at a hardware error. The MAC layer needs 5764 * an error notification facility to handle this. 5765 * For now, simply try to restore the old value. 5766 */ 5767 if (err != 0) 5768 (void) mac_group_addmac(map->ma_group, map->ma_addr); 5769 5770 break; 5771 case MAC_ADDRESS_TYPE_UNICAST_PROMISC: 5772 /* 5773 * Nothing more to do if in promiscuous mode. 5774 */ 5775 break; 5776 default: 5777 ASSERT(B_FALSE); 5778 } 5779 5780 /* 5781 * Successfully replaced the MAC address. 5782 */ 5783 if (err == 0) 5784 bcopy(mac_addr, map->ma_addr, map->ma_len); 5785 5786 return (err); 5787 } 5788 5789 /* 5790 * Freshen the MAC address with the new value. Its caller must have updated the 5791 * hardware MAC address before calling this function. 5792 * This function is supposed to be used to handle MAC address change 5793 * notifications from underlying drivers. 5794 */ 5795 void 5796 mac_freshen_macaddr(mac_address_t *map, uint8_t *mac_addr) 5797 { 5798 mac_impl_t *mip = map->ma_mip; 5799 5800 ASSERT(MAC_PERIM_HELD((mac_handle_t)mip)); 5801 ASSERT(mac_find_macaddr(mip, mac_addr) == NULL); 5802 5803 /* 5804 * Freshen the MAC address with the new value. 5805 */ 5806 bcopy(mac_addr, map->ma_addr, map->ma_len); 5807 bcopy(mac_addr, mip->mi_addr, map->ma_len); 5808 5809 /* 5810 * Update all MAC clients that share this MAC address. 5811 */ 5812 mac_unicast_update_clients(mip, map); 5813 } 5814 5815 /* 5816 * Set up the primary MAC address. 5817 */ 5818 void 5819 mac_init_macaddr(mac_impl_t *mip) 5820 { 5821 mac_address_t *map; 5822 5823 /* 5824 * The reference count is initialized to zero, until it's really 5825 * activated. 5826 */ 5827 map = kmem_zalloc(sizeof (mac_address_t), KM_SLEEP); 5828 map->ma_len = mip->mi_type->mt_addr_length; 5829 bcopy(mip->mi_addr, map->ma_addr, map->ma_len); 5830 5831 /* 5832 * If the driver advertises the RINGS capability, it shouldn't have 5833 * initialized its primary MAC address. For other drivers, including 5834 * VNIC, the primary address must work after registration. 5835 */ 5836 if (mip->mi_rx_groups == NULL) 5837 map->ma_type = MAC_ADDRESS_TYPE_UNICAST_CLASSIFIED; 5838 5839 map->ma_mip = mip; 5840 5841 mip->mi_addresses = map; 5842 } 5843 5844 /* 5845 * Clean up the primary MAC address.
Note, only one primary MAC address 5846 * is allowed. All other MAC addresses must have been freed appropriately. 5847 */ 5848 void 5849 mac_fini_macaddr(mac_impl_t *mip) 5850 { 5851 mac_address_t *map = mip->mi_addresses; 5852 5853 if (map == NULL) 5854 return; 5855 5856 /* 5857 * If mi_addresses is initialized, there should be exactly one 5858 * entry left on the list with no users. 5859 */ 5860 VERIFY3S(map->ma_nusers, ==, 0); 5861 VERIFY3P(map->ma_next, ==, NULL); 5862 VERIFY3P(map->ma_vlans, ==, NULL); 5863 5864 kmem_free(map, sizeof (mac_address_t)); 5865 mip->mi_addresses = NULL; 5866 } 5867 5868 /* 5869 * Logging related functions. 5870 * 5871 * Note that Kernel statistics have been extended to maintain fine 5872 * granularity of statistics viz. hardware lane, software lane, fanout 5873 * stats etc. However, extended accounting continues to support only 5874 * aggregate statistics like before. 5875 */ 5876 5877 /* Write the flow description to a netinfo_t record */ 5878 static netinfo_t * 5879 mac_write_flow_desc(flow_entry_t *flent, mac_client_impl_t *mcip) 5880 { 5881 netinfo_t *ninfo; 5882 net_desc_t *ndesc; 5883 flow_desc_t *fdesc; 5884 mac_resource_props_t *mrp; 5885 5886 ninfo = kmem_zalloc(sizeof (netinfo_t), KM_NOSLEEP); 5887 if (ninfo == NULL) 5888 return (NULL); 5889 ndesc = kmem_zalloc(sizeof (net_desc_t), KM_NOSLEEP); 5890 if (ndesc == NULL) { 5891 kmem_free(ninfo, sizeof (netinfo_t)); 5892 return (NULL); 5893 } 5894 5895 /* 5896 * Grab the fe_lock to see a self-consistent fe_flow_desc. 5897 * Updates to the fe_flow_desc are done under the fe_lock 5898 */ 5899 mutex_enter(&flent->fe_lock); 5900 fdesc = &flent->fe_flow_desc; 5901 mrp = &flent->fe_resource_props; 5902 5903 ndesc->nd_name = flent->fe_flow_name; 5904 ndesc->nd_devname = mcip->mci_name; 5905 bcopy(fdesc->fd_src_mac, ndesc->nd_ehost, ETHERADDRL); 5906 bcopy(fdesc->fd_dst_mac, ndesc->nd_edest, ETHERADDRL); 5907 ndesc->nd_sap = htonl(fdesc->fd_sap); 5908 ndesc->nd_isv4 = (uint8_t)fdesc->fd_ipversion == IPV4_VERSION; 5909 ndesc->nd_bw_limit = mrp->mrp_maxbw; 5910 if (ndesc->nd_isv4) { 5911 ndesc->nd_saddr[3] = htonl(fdesc->fd_local_addr.s6_addr32[3]); 5912 ndesc->nd_daddr[3] = htonl(fdesc->fd_remote_addr.s6_addr32[3]); 5913 } else { 5914 bcopy(&fdesc->fd_local_addr, ndesc->nd_saddr, IPV6_ADDR_LEN); 5915 bcopy(&fdesc->fd_remote_addr, ndesc->nd_daddr, IPV6_ADDR_LEN); 5916 } 5917 ndesc->nd_sport = htons(fdesc->fd_local_port); 5918 ndesc->nd_dport = htons(fdesc->fd_remote_port); 5919 ndesc->nd_protocol = (uint8_t)fdesc->fd_protocol; 5920 mutex_exit(&flent->fe_lock); 5921 5922 ninfo->ni_record = ndesc; 5923 ninfo->ni_size = sizeof (net_desc_t); 5924 ninfo->ni_type = EX_NET_FLDESC_REC; 5925 5926 return (ninfo); 5927 } 5928 5929 /* Write the flow statistics to a netinfo_t record */ 5930 static netinfo_t * 5931 mac_write_flow_stats(flow_entry_t *flent) 5932 { 5933 netinfo_t *ninfo; 5934 net_stat_t *nstat; 5935 mac_soft_ring_set_t *mac_srs; 5936 mac_rx_stats_t *mac_rx_stat; 5937 mac_tx_stats_t *mac_tx_stat; 5938 int i; 5939 5940 ninfo = kmem_zalloc(sizeof (netinfo_t), KM_NOSLEEP); 5941 if (ninfo == NULL) 5942 return (NULL); 5943 nstat = kmem_zalloc(sizeof (net_stat_t), KM_NOSLEEP); 5944 if (nstat == NULL) { 5945 kmem_free(ninfo, sizeof (netinfo_t)); 5946 return (NULL); 5947 } 5948 5949 nstat->ns_name = flent->fe_flow_name; 5950 for (i = 0; i < flent->fe_rx_srs_cnt; i++) { 5951 mac_srs = (mac_soft_ring_set_t *)flent->fe_rx_srs[i]; 5952 mac_rx_stat = &mac_srs->srs_rx.sr_stat; 5953 5954 nstat->ns_ibytes += 
mac_rx_stat->mrs_intrbytes + 5955 mac_rx_stat->mrs_pollbytes + mac_rx_stat->mrs_lclbytes; 5956 nstat->ns_ipackets += mac_rx_stat->mrs_intrcnt + 5957 mac_rx_stat->mrs_pollcnt + mac_rx_stat->mrs_lclcnt; 5958 nstat->ns_oerrors += mac_rx_stat->mrs_ierrors; 5959 } 5960 5961 mac_srs = (mac_soft_ring_set_t *)(flent->fe_tx_srs); 5962 if (mac_srs != NULL) { 5963 mac_tx_stat = &mac_srs->srs_tx.st_stat; 5964 5965 nstat->ns_obytes = mac_tx_stat->mts_obytes; 5966 nstat->ns_opackets = mac_tx_stat->mts_opackets; 5967 nstat->ns_oerrors = mac_tx_stat->mts_oerrors; 5968 } 5969 5970 ninfo->ni_record = nstat; 5971 ninfo->ni_size = sizeof (net_stat_t); 5972 ninfo->ni_type = EX_NET_FLSTAT_REC; 5973 5974 return (ninfo); 5975 } 5976 5977 /* Write the link description to a netinfo_t record */ 5978 static netinfo_t * 5979 mac_write_link_desc(mac_client_impl_t *mcip) 5980 { 5981 netinfo_t *ninfo; 5982 net_desc_t *ndesc; 5983 flow_entry_t *flent = mcip->mci_flent; 5984 5985 ninfo = kmem_zalloc(sizeof (netinfo_t), KM_NOSLEEP); 5986 if (ninfo == NULL) 5987 return (NULL); 5988 ndesc = kmem_zalloc(sizeof (net_desc_t), KM_NOSLEEP); 5989 if (ndesc == NULL) { 5990 kmem_free(ninfo, sizeof (netinfo_t)); 5991 return (NULL); 5992 } 5993 5994 ndesc->nd_name = mcip->mci_name; 5995 ndesc->nd_devname = mcip->mci_name; 5996 ndesc->nd_isv4 = B_TRUE; 5997 /* 5998 * Grab the fe_lock to see a self-consistent fe_flow_desc. 5999 * Updates to the fe_flow_desc are done under the fe_lock 6000 * after removing the flent from the flow table. 6001 */ 6002 mutex_enter(&flent->fe_lock); 6003 bcopy(flent->fe_flow_desc.fd_src_mac, ndesc->nd_ehost, ETHERADDRL); 6004 mutex_exit(&flent->fe_lock); 6005 6006 ninfo->ni_record = ndesc; 6007 ninfo->ni_size = sizeof (net_desc_t); 6008 ninfo->ni_type = EX_NET_LNDESC_REC; 6009 6010 return (ninfo); 6011 } 6012 6013 /* Write the link statistics to a netinfo_t record */ 6014 static netinfo_t * 6015 mac_write_link_stats(mac_client_impl_t *mcip) 6016 { 6017 netinfo_t *ninfo; 6018 net_stat_t *nstat; 6019 flow_entry_t *flent; 6020 mac_soft_ring_set_t *mac_srs; 6021 mac_rx_stats_t *mac_rx_stat; 6022 mac_tx_stats_t *mac_tx_stat; 6023 int i; 6024 6025 ninfo = kmem_zalloc(sizeof (netinfo_t), KM_NOSLEEP); 6026 if (ninfo == NULL) 6027 return (NULL); 6028 nstat = kmem_zalloc(sizeof (net_stat_t), KM_NOSLEEP); 6029 if (nstat == NULL) { 6030 kmem_free(ninfo, sizeof (netinfo_t)); 6031 return (NULL); 6032 } 6033 6034 nstat->ns_name = mcip->mci_name; 6035 flent = mcip->mci_flent; 6036 if (flent != NULL) { 6037 for (i = 0; i < flent->fe_rx_srs_cnt; i++) { 6038 mac_srs = (mac_soft_ring_set_t *)flent->fe_rx_srs[i]; 6039 mac_rx_stat = &mac_srs->srs_rx.sr_stat; 6040 6041 nstat->ns_ibytes += mac_rx_stat->mrs_intrbytes + 6042 mac_rx_stat->mrs_pollbytes + 6043 mac_rx_stat->mrs_lclbytes; 6044 nstat->ns_ipackets += mac_rx_stat->mrs_intrcnt + 6045 mac_rx_stat->mrs_pollcnt + mac_rx_stat->mrs_lclcnt; 6046 nstat->ns_oerrors += mac_rx_stat->mrs_ierrors; 6047 } 6048 } 6049 6050 mac_srs = (mac_soft_ring_set_t *)(mcip->mci_flent->fe_tx_srs); 6051 if (mac_srs != NULL) { 6052 mac_tx_stat = &mac_srs->srs_tx.st_stat; 6053 6054 nstat->ns_obytes = mac_tx_stat->mts_obytes; 6055 nstat->ns_opackets = mac_tx_stat->mts_opackets; 6056 nstat->ns_oerrors = mac_tx_stat->mts_oerrors; 6057 } 6058 6059 ninfo->ni_record = nstat; 6060 ninfo->ni_size = sizeof (net_stat_t); 6061 ninfo->ni_type = EX_NET_LNSTAT_REC; 6062 6063 return (ninfo); 6064 } 6065 6066 typedef struct i_mac_log_state_s { 6067 boolean_t mi_last; 6068 int mi_fenable; 6069 int mi_lenable; 6070 list_t 
*mi_list; 6071 } i_mac_log_state_t; 6072 6073 /* 6074 * For a given flow, if the description has not been logged before, do it now. 6075 * If it is a VNIC, then we have collected information about it from the MAC 6076 * table, so skip it. 6077 * 6078 * Called through mac_flow_walk_nolock() 6079 * 6080 * Return 0 if successful. 6081 */ 6082 static int 6083 mac_log_flowinfo(flow_entry_t *flent, void *arg) 6084 { 6085 mac_client_impl_t *mcip = flent->fe_mcip; 6086 i_mac_log_state_t *lstate = arg; 6087 netinfo_t *ninfo; 6088 6089 if (mcip == NULL) 6090 return (0); 6091 6092 /* 6093 * If the name starts with "vnic" and FLOW_USER is set (to 6094 * exclude the mcast and active flow entries created implicitly for 6095 * a vnic), it is a VNIC flow; i.e. vnic1 is a VNIC flow, 6096 * vnic/bge1/mcast1 is not and neither is vnic/bge1/active. 6097 */ 6098 if (strncasecmp(flent->fe_flow_name, "vnic", 4) == 0 && 6099 (flent->fe_type & FLOW_USER) != 0) { 6100 return (0); 6101 } 6102 6103 if (!flent->fe_desc_logged) { 6104 /* 6105 * We don't return an error because we want to continue the 6106 * walk in case this is the last walk, which means we 6107 * need to reset fe_desc_logged in all the flows. 6108 */ 6109 if ((ninfo = mac_write_flow_desc(flent, mcip)) == NULL) 6110 return (0); 6111 list_insert_tail(lstate->mi_list, ninfo); 6112 flent->fe_desc_logged = B_TRUE; 6113 } 6114 6115 /* 6116 * Regardless of the error, we want to proceed in case we have to 6117 * reset fe_desc_logged. 6118 */ 6119 ninfo = mac_write_flow_stats(flent); 6120 if (ninfo == NULL) 6121 return (-1); 6122 6123 list_insert_tail(lstate->mi_list, ninfo); 6124 6125 if (mcip != NULL && !(mcip->mci_state_flags & MCIS_DESC_LOGGED)) 6126 flent->fe_desc_logged = B_FALSE; 6127 6128 return (0); 6129 } 6130 6131 /* 6132 * Log the description for each mac client of this mac_impl_t, if it 6133 * hasn't already been done. Additionally, log statistics for the link. 6134 * Walk the flow table and log information for each flow as well. 6135 * If it is the last walk (mi_last), then we turn off MCIS_DESC_LOGGED (and 6136 * also fe_desc_logged, if flow logging is on) since we want to log the 6137 * description if and when logging is restarted. 6138 * 6139 * Return 0 upon success or -1 upon failure 6140 */ 6141 static int 6142 i_mac_impl_log(mac_impl_t *mip, i_mac_log_state_t *lstate) 6143 { 6144 mac_client_impl_t *mcip; 6145 netinfo_t *ninfo; 6146 6147 i_mac_perim_enter(mip); 6148 /* 6149 * Only walk the client list for NIC and etherstub 6150 */ 6151 if ((mip->mi_state_flags & MIS_DISABLED) || 6152 ((mip->mi_state_flags & MIS_IS_VNIC) && 6153 (mac_get_lower_mac_handle((mac_handle_t)mip) != NULL))) { 6154 i_mac_perim_exit(mip); 6155 return (0); 6156 } 6157 6158 for (mcip = mip->mi_clients_list; mcip != NULL; 6159 mcip = mcip->mci_client_next) { 6160 if (!MCIP_DATAPATH_SETUP(mcip)) 6161 continue; 6162 if (lstate->mi_lenable) { 6163 if (!(mcip->mci_state_flags & MCIS_DESC_LOGGED)) { 6164 ninfo = mac_write_link_desc(mcip); 6165 if (ninfo == NULL) { 6166 /* 6167 * We can't terminate the walk if this is the last 6168 * walk, else there might be some links with 6169 * MCIS_DESC_LOGGED set, which means 6170 * their description won't be logged the next 6171 * time logging is started (similarly for the 6172 * flows within such links). We can continue 6173 * without walking the flow table (i.e. to 6174 * set fe_desc_logged to false) because we 6175 * won't have written any flow stuff for this 6176 * link as we haven't logged the link itself.
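 *
 * In sketch form, the policy implemented just below is:
 *
 *	write failed && last walk	exit the perimeter, return 0
 *	write failed && !last walk	exit the perimeter, return -1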
6177 */ 6178 i_mac_perim_exit(mip); 6179 if (lstate->mi_last) 6180 return (0); 6181 else 6182 return (-1); 6183 } 6184 mcip->mci_state_flags |= MCIS_DESC_LOGGED; 6185 list_insert_tail(lstate->mi_list, ninfo); 6186 } 6187 } 6188 6189 ninfo = mac_write_link_stats(mcip); 6190 if (ninfo == NULL && !lstate->mi_last) { 6191 i_mac_perim_exit(mip); 6192 return (-1); 6193 } 6194 list_insert_tail(lstate->mi_list, ninfo); 6195 6196 if (lstate->mi_last) 6197 mcip->mci_state_flags &= ~MCIS_DESC_LOGGED; 6198 6199 if (lstate->mi_fenable) { 6200 if (mcip->mci_subflow_tab != NULL) { 6201 (void) mac_flow_walk_nolock( 6202 mcip->mci_subflow_tab, mac_log_flowinfo, 6203 lstate); 6204 } 6205 } 6206 } 6207 i_mac_perim_exit(mip); 6208 return (0); 6209 } 6210 6211 /* 6212 * modhash walker function to add a mac_impl_t to a list 6213 */ 6214 /*ARGSUSED*/ 6215 static uint_t 6216 i_mac_impl_list_walker(mod_hash_key_t key, mod_hash_val_t *val, void *arg) 6217 { 6218 list_t *list = (list_t *)arg; 6219 mac_impl_t *mip = (mac_impl_t *)val; 6220 6221 if ((mip->mi_state_flags & MIS_DISABLED) == 0) { 6222 list_insert_tail(list, mip); 6223 mip->mi_ref++; 6224 } 6225 6226 return (MH_WALK_CONTINUE); 6227 } 6228 6229 void 6230 i_mac_log_info(list_t *net_log_list, i_mac_log_state_t *lstate) 6231 { 6232 list_t mac_impl_list; 6233 mac_impl_t *mip; 6234 netinfo_t *ninfo; 6235 6236 /* Create list of mac_impls */ 6237 ASSERT(RW_LOCK_HELD(&i_mac_impl_lock)); 6238 list_create(&mac_impl_list, sizeof (mac_impl_t), offsetof(mac_impl_t, 6239 mi_node)); 6240 mod_hash_walk(i_mac_impl_hash, i_mac_impl_list_walker, &mac_impl_list); 6241 rw_exit(&i_mac_impl_lock); 6242 6243 /* Create log entries for each mac_impl */ 6244 for (mip = list_head(&mac_impl_list); mip != NULL; 6245 mip = list_next(&mac_impl_list, mip)) { 6246 if (i_mac_impl_log(mip, lstate) != 0) 6247 continue; 6248 } 6249 6250 /* Remove elements and destroy list of mac_impls */ 6251 rw_enter(&i_mac_impl_lock, RW_WRITER); 6252 while ((mip = list_remove_tail(&mac_impl_list)) != NULL) { 6253 mip->mi_ref--; 6254 } 6255 rw_exit(&i_mac_impl_lock); 6256 list_destroy(&mac_impl_list); 6257 6258 /* 6259 * Write log entries to files outside of locks, free associated 6260 * structures, and remove entries from the list. 6261 */ 6262 while ((ninfo = list_head(net_log_list)) != NULL) { 6263 (void) exacct_commit_netinfo(ninfo->ni_record, ninfo->ni_type); 6264 list_remove(net_log_list, ninfo); 6265 kmem_free(ninfo->ni_record, ninfo->ni_size); 6266 kmem_free(ninfo, sizeof (*ninfo)); 6267 } 6268 list_destroy(net_log_list); 6269 } 6270 6271 /* 6272 * The timer thread that runs every mac_logging_interval seconds and logs 6273 * link and/or flow information. 
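 *
 * In sketch form, the self-rearming pattern used below is:
 *
 *	mac_log_linkinfo(arg)
 *	{
 *		write one round of link and/or flow records
 *		if (mac_flow_log_enable || mac_link_log_enable)
 *			mac_logging_timer = timeout(mac_log_linkinfo,
 *			    NULL, SEC_TO_TICK(mac_logging_interval));
 *	}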
6274 */ 6275 /* ARGSUSED */ 6276 void 6277 mac_log_linkinfo(void *arg) 6278 { 6279 i_mac_log_state_t lstate; 6280 list_t net_log_list; 6281 6282 list_create(&net_log_list, sizeof (netinfo_t), 6283 offsetof(netinfo_t, ni_link)); 6284 6285 rw_enter(&i_mac_impl_lock, RW_READER); 6286 if (!mac_flow_log_enable && !mac_link_log_enable) { 6287 rw_exit(&i_mac_impl_lock); 6288 return; 6289 } 6290 lstate.mi_fenable = mac_flow_log_enable; 6291 lstate.mi_lenable = mac_link_log_enable; 6292 lstate.mi_last = B_FALSE; 6293 lstate.mi_list = &net_log_list; 6294 6295 /* Write log entries for each mac_impl in the list */ 6296 i_mac_log_info(&net_log_list, &lstate); 6297 6298 if (mac_flow_log_enable || mac_link_log_enable) { 6299 mac_logging_timer = timeout(mac_log_linkinfo, NULL, 6300 SEC_TO_TICK(mac_logging_interval)); 6301 } 6302 } 6303 6304 typedef struct i_mac_fastpath_state_s { 6305 boolean_t mf_disable; 6306 int mf_err; 6307 } i_mac_fastpath_state_t; 6308 6309 /* modhash walker function to enable or disable fastpath */ 6310 /*ARGSUSED*/ 6311 static uint_t 6312 i_mac_fastpath_walker(mod_hash_key_t key, mod_hash_val_t *val, 6313 void *arg) 6314 { 6315 i_mac_fastpath_state_t *state = arg; 6316 mac_handle_t mh = (mac_handle_t)val; 6317 6318 if (state->mf_disable) 6319 state->mf_err = mac_fastpath_disable(mh); 6320 else 6321 mac_fastpath_enable(mh); 6322 6323 return (state->mf_err == 0 ? MH_WALK_CONTINUE : MH_WALK_TERMINATE); 6324 } 6325 6326 /* 6327 * Start the logging timer. 6328 */ 6329 int 6330 mac_start_logusage(mac_logtype_t type, uint_t interval) 6331 { 6332 i_mac_fastpath_state_t dstate = {B_TRUE, 0}; 6333 i_mac_fastpath_state_t estate = {B_FALSE, 0}; 6334 int err; 6335 6336 rw_enter(&i_mac_impl_lock, RW_WRITER); 6337 switch (type) { 6338 case MAC_LOGTYPE_FLOW: 6339 if (mac_flow_log_enable) { 6340 rw_exit(&i_mac_impl_lock); 6341 return (0); 6342 } 6343 /* FALLTHRU */ 6344 case MAC_LOGTYPE_LINK: 6345 if (mac_link_log_enable) { 6346 rw_exit(&i_mac_impl_lock); 6347 return (0); 6348 } 6349 break; 6350 default: 6351 ASSERT(0); 6352 } 6353 6354 /* Disable fastpath */ 6355 mod_hash_walk(i_mac_impl_hash, i_mac_fastpath_walker, &dstate); 6356 if ((err = dstate.mf_err) != 0) { 6357 /* Reenable fastpath */ 6358 mod_hash_walk(i_mac_impl_hash, i_mac_fastpath_walker, &estate); 6359 rw_exit(&i_mac_impl_lock); 6360 return (err); 6361 } 6362 6363 switch (type) { 6364 case MAC_LOGTYPE_FLOW: 6365 mac_flow_log_enable = B_TRUE; 6366 /* FALLTHRU */ 6367 case MAC_LOGTYPE_LINK: 6368 mac_link_log_enable = B_TRUE; 6369 break; 6370 } 6371 6372 mac_logging_interval = interval; 6373 rw_exit(&i_mac_impl_lock); 6374 mac_log_linkinfo(NULL); 6375 return (0); 6376 } 6377 6378 /* 6379 * Stop the logging timer if both link and flow logging are turned off. 
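 *
 * A sketch of the expected pairing with mac_start_logusage() above
 * (the extended accounting ioctl path is the assumed caller; the
 * interval value here is illustrative):
 *
 *	(void) mac_start_logusage(MAC_LOGTYPE_LINK, 20);	log every 20s
 *	...
 *	mac_stop_logusage(MAC_LOGTYPE_LINK);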
6380 */ 6381 void 6382 mac_stop_logusage(mac_logtype_t type) 6383 { 6384 i_mac_log_state_t lstate; 6385 i_mac_fastpath_state_t estate = {B_FALSE, 0}; 6386 list_t net_log_list; 6387 6388 list_create(&net_log_list, sizeof (netinfo_t), 6389 offsetof(netinfo_t, ni_link)); 6390 6391 rw_enter(&i_mac_impl_lock, RW_WRITER); 6392 6393 lstate.mi_fenable = mac_flow_log_enable; 6394 lstate.mi_lenable = mac_link_log_enable; 6395 lstate.mi_list = &net_log_list; 6396 6397 /* Last walk */ 6398 lstate.mi_last = B_TRUE; 6399 6400 switch (type) { 6401 case MAC_LOGTYPE_FLOW: 6402 if (lstate.mi_fenable) { 6403 ASSERT(mac_link_log_enable); 6404 mac_flow_log_enable = B_FALSE; 6405 mac_link_log_enable = B_FALSE; 6406 break; 6407 } 6408 /* FALLTHRU */ 6409 case MAC_LOGTYPE_LINK: 6410 if (!lstate.mi_lenable || mac_flow_log_enable) { 6411 rw_exit(&i_mac_impl_lock); 6412 return; 6413 } 6414 mac_link_log_enable = B_FALSE; 6415 break; 6416 default: 6417 ASSERT(0); 6418 } 6419 6420 /* Reenable fastpath */ 6421 mod_hash_walk(i_mac_impl_hash, i_mac_fastpath_walker, &estate); 6422 6423 (void) untimeout(mac_logging_timer); 6424 mac_logging_timer = NULL; 6425 6426 /* Write log entries for each mac_impl in the list */ 6427 i_mac_log_info(&net_log_list, &lstate); 6428 } 6429 6430 /* 6431 * Walk the rx and tx SRS/SRs for a flow and update the priority value. 6432 */ 6433 void 6434 mac_flow_update_priority(mac_client_impl_t *mcip, flow_entry_t *flent) 6435 { 6436 pri_t pri; 6437 int count; 6438 mac_soft_ring_set_t *mac_srs; 6439 6440 if (flent->fe_rx_srs_cnt <= 0) 6441 return; 6442 6443 if (((mac_soft_ring_set_t *)flent->fe_rx_srs[0])->srs_type == 6444 SRST_FLOW) { 6445 pri = FLOW_PRIORITY(mcip->mci_min_pri, 6446 mcip->mci_max_pri, 6447 flent->fe_resource_props.mrp_priority); 6448 } else { 6449 pri = mcip->mci_max_pri; 6450 } 6451 6452 for (count = 0; count < flent->fe_rx_srs_cnt; count++) { 6453 mac_srs = flent->fe_rx_srs[count]; 6454 mac_update_srs_priority(mac_srs, pri); 6455 } 6456 /* 6457 * If we have a Tx SRS, we need to modify all the threads associated 6458 * with it. 6459 */ 6460 if (flent->fe_tx_srs != NULL) 6461 mac_update_srs_priority(flent->fe_tx_srs, pri); 6462 } 6463 6464 /* 6465 * RX and TX rings are reserved according to different semantics depending 6466 * on the requests from the MAC clients and type of rings: 6467 * 6468 * On the Tx side, by default we reserve individual rings, independently from 6469 * the groups. 6470 * 6471 * On the Rx side, the reservation is at the granularity of the group 6472 * of rings, and used for v12n level 1 only. It has a special case for the 6473 * primary client. 6474 * 6475 * If a share is allocated to a MAC client, we allocate a TX group and an 6476 * RX group to the client, and assign TX rings and RX rings to these 6477 * groups according to information gathered from the driver through 6478 * the share capability. 6479 * 6480 * The foreseable evolution of Rx rings will handle v12n level 2 and higher 6481 * to allocate individual rings out of a group and program the hw classifier 6482 * based on IP address or higher level criteria. 6483 */ 6484 6485 /* 6486 * mac_reserve_tx_ring() 6487 * Reserve a unused ring by marking it with MR_INUSE state. 6488 * As reserved, the ring is ready to function. 6489 * 6490 * Notes for Hybrid I/O: 6491 * 6492 * If a specific ring is needed, it is specified through the desired_ring 6493 * argument. Otherwise that argument is set to NULL. 
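 * A sketch of the Hybrid I/O call site (see i_mac_group_allocate_rings()
 * below for the real caller):
 *
 *	tmp_ring = mac_reserve_tx_ring(mip, rings[i]);
 *	if (tmp_ring == NULL)
 *		return (ENOSPC);	ring could not be started
 *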
6494 * If the desired ring was previous allocated to another client, this 6495 * function swaps it with a new ring from the group of unassigned rings. 6496 */ 6497 mac_ring_t * 6498 mac_reserve_tx_ring(mac_impl_t *mip, mac_ring_t *desired_ring) 6499 { 6500 mac_group_t *group; 6501 mac_grp_client_t *mgcp; 6502 mac_client_impl_t *mcip; 6503 mac_soft_ring_set_t *srs; 6504 6505 ASSERT(MAC_PERIM_HELD((mac_handle_t)mip)); 6506 6507 /* 6508 * Find an available ring and start it before changing its status. 6509 * The unassigned rings are at the end of the mi_tx_groups 6510 * array. 6511 */ 6512 group = MAC_DEFAULT_TX_GROUP(mip); 6513 6514 /* Can't take the default ring out of the default group */ 6515 ASSERT(desired_ring != (mac_ring_t *)mip->mi_default_tx_ring); 6516 6517 if (desired_ring->mr_state == MR_FREE) { 6518 ASSERT(MAC_GROUP_NO_CLIENT(group)); 6519 if (mac_start_ring(desired_ring) != 0) 6520 return (NULL); 6521 return (desired_ring); 6522 } 6523 /* 6524 * There are clients using this ring, so let's move the clients 6525 * away from using this ring. 6526 */ 6527 for (mgcp = group->mrg_clients; mgcp != NULL; mgcp = mgcp->mgc_next) { 6528 mcip = mgcp->mgc_client; 6529 mac_tx_client_quiesce((mac_client_handle_t)mcip); 6530 srs = MCIP_TX_SRS(mcip); 6531 ASSERT(mac_tx_srs_ring_present(srs, desired_ring)); 6532 mac_tx_invoke_callbacks(mcip, 6533 (mac_tx_cookie_t)mac_tx_srs_get_soft_ring(srs, 6534 desired_ring)); 6535 mac_tx_srs_del_ring(srs, desired_ring); 6536 mac_tx_client_restart((mac_client_handle_t)mcip); 6537 } 6538 return (desired_ring); 6539 } 6540 6541 /* 6542 * For a non-default group with multiple clients, return the primary client. 6543 */ 6544 static mac_client_impl_t * 6545 mac_get_grp_primary(mac_group_t *grp) 6546 { 6547 mac_grp_client_t *mgcp = grp->mrg_clients; 6548 mac_client_impl_t *mcip; 6549 6550 while (mgcp != NULL) { 6551 mcip = mgcp->mgc_client; 6552 if (mcip->mci_flent->fe_type & FLOW_PRIMARY_MAC) 6553 return (mcip); 6554 mgcp = mgcp->mgc_next; 6555 } 6556 return (NULL); 6557 } 6558 6559 /* 6560 * Hybrid I/O specifies the ring that should be given to a share. 6561 * If the ring is already used by clients, then we need to release 6562 * the ring back to the default group so that we can give it to 6563 * the share. This means the clients using this ring now get a 6564 * replacement ring. If there aren't any replacement rings, this 6565 * function returns a failure. 6566 */ 6567 static int 6568 mac_reclaim_ring_from_grp(mac_impl_t *mip, mac_ring_type_t ring_type, 6569 mac_ring_t *ring, mac_ring_t **rings, int nrings) 6570 { 6571 mac_group_t *group = (mac_group_t *)ring->mr_gh; 6572 mac_resource_props_t *mrp; 6573 mac_client_impl_t *mcip; 6574 mac_group_t *defgrp; 6575 mac_ring_t *tring; 6576 mac_group_t *tgrp; 6577 int i; 6578 int j; 6579 6580 mcip = MAC_GROUP_ONLY_CLIENT(group); 6581 if (mcip == NULL) 6582 mcip = mac_get_grp_primary(group); 6583 ASSERT(mcip != NULL); 6584 ASSERT(mcip->mci_share == 0); 6585 6586 mrp = MCIP_RESOURCE_PROPS(mcip); 6587 if (ring_type == MAC_RING_TYPE_RX) { 6588 defgrp = mip->mi_rx_donor_grp; 6589 if ((mrp->mrp_mask & MRP_RX_RINGS) == 0) { 6590 /* Need to put this mac client in the default group */ 6591 if (mac_rx_switch_group(mcip, group, defgrp) != 0) 6592 return (ENOSPC); 6593 } else { 6594 /* 6595 * Switch this ring with some other ring from 6596 * the default group. 
6597 */ 6598 for (tring = defgrp->mrg_rings; tring != NULL; 6599 tring = tring->mr_next) { 6600 if (tring->mr_index == 0) 6601 continue; 6602 for (j = 0; j < nrings; j++) { 6603 if (rings[j] == tring) 6604 break; 6605 } 6606 if (j >= nrings) 6607 break; 6608 } 6609 if (tring == NULL) 6610 return (ENOSPC); 6611 if (mac_group_mov_ring(mip, group, tring) != 0) 6612 return (ENOSPC); 6613 if (mac_group_mov_ring(mip, defgrp, ring) != 0) { 6614 (void) mac_group_mov_ring(mip, defgrp, tring); 6615 return (ENOSPC); 6616 } 6617 } 6618 ASSERT(ring->mr_gh == (mac_group_handle_t)defgrp); 6619 return (0); 6620 } 6621 6622 defgrp = MAC_DEFAULT_TX_GROUP(mip); 6623 if (ring == (mac_ring_t *)mip->mi_default_tx_ring) { 6624 /* 6625 * See if we can get a spare ring to replace the default 6626 * ring. 6627 */ 6628 if (defgrp->mrg_cur_count == 1) { 6629 /* 6630 * Need to get a ring from another client, see if 6631 * there are any clients that can be moved to 6632 * the default group, thereby freeing some rings. 6633 */ 6634 for (i = 0; i < mip->mi_tx_group_count; i++) { 6635 tgrp = &mip->mi_tx_groups[i]; 6636 if (tgrp->mrg_state == 6637 MAC_GROUP_STATE_REGISTERED) { 6638 continue; 6639 } 6640 mcip = MAC_GROUP_ONLY_CLIENT(tgrp); 6641 if (mcip == NULL) 6642 mcip = mac_get_grp_primary(tgrp); 6643 ASSERT(mcip != NULL); 6644 mrp = MCIP_RESOURCE_PROPS(mcip); 6645 if ((mrp->mrp_mask & MRP_TX_RINGS) == 0) { 6646 ASSERT(tgrp->mrg_cur_count == 1); 6647 /* 6648 * If this ring is part of the 6649 * rings asked by the share we cannot 6650 * use it as the default ring. 6651 */ 6652 for (j = 0; j < nrings; j++) { 6653 if (rings[j] == tgrp->mrg_rings) 6654 break; 6655 } 6656 if (j < nrings) 6657 continue; 6658 mac_tx_client_quiesce( 6659 (mac_client_handle_t)mcip); 6660 mac_tx_switch_group(mcip, tgrp, 6661 defgrp); 6662 mac_tx_client_restart( 6663 (mac_client_handle_t)mcip); 6664 break; 6665 } 6666 } 6667 /* 6668 * All the rings are reserved, can't give up the 6669 * default ring. 6670 */ 6671 if (defgrp->mrg_cur_count <= 1) 6672 return (ENOSPC); 6673 } 6674 /* 6675 * Swap the default ring with another. 6676 */ 6677 for (tring = defgrp->mrg_rings; tring != NULL; 6678 tring = tring->mr_next) { 6679 /* 6680 * If this ring is part of the rings asked by the 6681 * share we cannot use it as the default ring. 6682 */ 6683 for (j = 0; j < nrings; j++) { 6684 if (rings[j] == tring) 6685 break; 6686 } 6687 if (j >= nrings) 6688 break; 6689 } 6690 ASSERT(tring != NULL); 6691 mip->mi_default_tx_ring = (mac_ring_handle_t)tring; 6692 return (0); 6693 } 6694 /* 6695 * The Tx ring is with a group reserved by a MAC client. See if 6696 * we can swap it. 6697 */ 6698 ASSERT(group->mrg_state == MAC_GROUP_STATE_RESERVED); 6699 mcip = MAC_GROUP_ONLY_CLIENT(group); 6700 if (mcip == NULL) 6701 mcip = mac_get_grp_primary(group); 6702 ASSERT(mcip != NULL); 6703 mrp = MCIP_RESOURCE_PROPS(mcip); 6704 mac_tx_client_quiesce((mac_client_handle_t)mcip); 6705 if ((mrp->mrp_mask & MRP_TX_RINGS) == 0) { 6706 ASSERT(group->mrg_cur_count == 1); 6707 /* Put this mac client in the default group */ 6708 mac_tx_switch_group(mcip, group, defgrp); 6709 } else { 6710 /* 6711 * Switch this ring with some other ring from 6712 * the default group. 6713 */ 6714 for (tring = defgrp->mrg_rings; tring != NULL; 6715 tring = tring->mr_next) { 6716 if (tring == (mac_ring_t *)mip->mi_default_tx_ring) 6717 continue; 6718 /* 6719 * If this ring is part of the rings asked by the 6720 * share we cannot use it for swapping. 
6721 */ 6722 for (j = 0; j < nrings; j++) { 6723 if (rings[j] == tring) 6724 break; 6725 } 6726 if (j >= nrings) 6727 break; 6728 } 6729 if (tring == NULL) { 6730 mac_tx_client_restart((mac_client_handle_t)mcip); 6731 return (ENOSPC); 6732 } 6733 if (mac_group_mov_ring(mip, group, tring) != 0) { 6734 mac_tx_client_restart((mac_client_handle_t)mcip); 6735 return (ENOSPC); 6736 } 6737 if (mac_group_mov_ring(mip, defgrp, ring) != 0) { 6738 (void) mac_group_mov_ring(mip, defgrp, tring); 6739 mac_tx_client_restart((mac_client_handle_t)mcip); 6740 return (ENOSPC); 6741 } 6742 } 6743 mac_tx_client_restart((mac_client_handle_t)mcip); 6744 ASSERT(ring->mr_gh == (mac_group_handle_t)defgrp); 6745 return (0); 6746 } 6747 6748 /* 6749 * Populate a zero-ring group with rings. If the share is non-NULL, 6750 * the rings are chosen according to that share. 6751 * Invoked after allocating a new RX or TX group through 6752 * mac_reserve_rx_group() or mac_reserve_tx_group(), respectively. 6753 * Returns zero on success, an errno otherwise. 6754 */ 6755 int 6756 i_mac_group_allocate_rings(mac_impl_t *mip, mac_ring_type_t ring_type, 6757 mac_group_t *src_group, mac_group_t *new_group, mac_share_handle_t share, 6758 uint32_t ringcnt) 6759 { 6760 mac_ring_t **rings, *ring; 6761 uint_t nrings; 6762 int rv = 0, i = 0, j; 6763 6764 ASSERT((ring_type == MAC_RING_TYPE_RX && 6765 mip->mi_rx_group_type == MAC_GROUP_TYPE_DYNAMIC) || 6766 (ring_type == MAC_RING_TYPE_TX && 6767 mip->mi_tx_group_type == MAC_GROUP_TYPE_DYNAMIC)); 6768 6769 /* 6770 * First find the rings to allocate to the group. 6771 */ 6772 if (share != 0) { 6773 /* get rings through ms_squery() */ 6774 mip->mi_share_capab.ms_squery(share, ring_type, NULL, &nrings); 6775 ASSERT(nrings != 0); 6776 rings = kmem_alloc(nrings * sizeof (mac_ring_handle_t), 6777 KM_SLEEP); 6778 mip->mi_share_capab.ms_squery(share, ring_type, 6779 (mac_ring_handle_t *)rings, &nrings); 6780 for (i = 0; i < nrings; i++) { 6781 /* 6782 * If we have given this ring to a non-default 6783 * group, we need to check if we can get this 6784 * ring. 6785 */ 6786 ring = rings[i]; 6787 if (ring->mr_gh != (mac_group_handle_t)src_group || 6788 ring == (mac_ring_t *)mip->mi_default_tx_ring) { 6789 if (mac_reclaim_ring_from_grp(mip, ring_type, 6790 ring, rings, nrings) != 0) { 6791 rv = ENOSPC; 6792 goto bail; 6793 } 6794 } 6795 } 6796 } else { 6797 /* 6798 * Pick one ring from default group. 6799 * 6800 * for now pick the second ring which requires the first ring 6801 * at index 0 to stay in the default group, since it is the 6802 * ring which carries the multicast traffic. 6803 * We need a better way for a driver to indicate this, 6804 * for example a per-ring flag. 
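 *
 * Such a flag might be used as follows (MR_DEFAULT is a hypothetical
 * mr_flag value, not part of the current interface):
 *
 *	if (ring_type == MAC_RING_TYPE_RX &&
 *	    (ring->mr_flag & MR_DEFAULT) != 0)
 *		continue;	driver-designated default ring stays put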
6805 */ 6806 rings = kmem_alloc(ringcnt * sizeof (mac_ring_handle_t), 6807 KM_SLEEP); 6808 for (ring = src_group->mrg_rings; ring != NULL; 6809 ring = ring->mr_next) { 6810 if (ring_type == MAC_RING_TYPE_RX && 6811 ring->mr_index == 0) { 6812 continue; 6813 } 6814 if (ring_type == MAC_RING_TYPE_TX && 6815 ring == (mac_ring_t *)mip->mi_default_tx_ring) { 6816 continue; 6817 } 6818 rings[i++] = ring; 6819 if (i == ringcnt) 6820 break; 6821 } 6822 ASSERT(ring != NULL); 6823 nrings = i; 6824 /* Not enough rings as required */ 6825 if (nrings != ringcnt) { 6826 rv = ENOSPC; 6827 goto bail; 6828 } 6829 } 6830 6831 switch (ring_type) { 6832 case MAC_RING_TYPE_RX: 6833 if (src_group->mrg_cur_count - nrings < 1) { 6834 /* we ran out of rings */ 6835 rv = ENOSPC; 6836 goto bail; 6837 } 6838 6839 /* move receive rings to new group */ 6840 for (i = 0; i < nrings; i++) { 6841 rv = mac_group_mov_ring(mip, new_group, rings[i]); 6842 if (rv != 0) { 6843 /* move rings back on failure */ 6844 for (j = 0; j < i; j++) { 6845 (void) mac_group_mov_ring(mip, 6846 src_group, rings[j]); 6847 } 6848 goto bail; 6849 } 6850 } 6851 break; 6852 6853 case MAC_RING_TYPE_TX: { 6854 mac_ring_t *tmp_ring; 6855 6856 /* move the TX rings to the new group */ 6857 for (i = 0; i < nrings; i++) { 6858 /* get the desired ring */ 6859 tmp_ring = mac_reserve_tx_ring(mip, rings[i]); 6860 if (tmp_ring == NULL) { 6861 rv = ENOSPC; 6862 goto bail; 6863 } 6864 ASSERT(tmp_ring == rings[i]); 6865 rv = mac_group_mov_ring(mip, new_group, rings[i]); 6866 if (rv != 0) { 6867 /* cleanup on failure */ 6868 for (j = 0; j < i; j++) { 6869 (void) mac_group_mov_ring(mip, 6870 MAC_DEFAULT_TX_GROUP(mip), 6871 rings[j]); 6872 } 6873 goto bail; 6874 } 6875 } 6876 break; 6877 } 6878 } 6879 6880 /* add group to share */ 6881 if (share != 0) 6882 mip->mi_share_capab.ms_sadd(share, new_group->mrg_driver); 6883 6884 bail: 6885 /* free temporary array of rings */ 6886 kmem_free(rings, nrings * sizeof (mac_ring_handle_t)); 6887 6888 return (rv); 6889 } 6890 6891 void 6892 mac_group_add_client(mac_group_t *grp, mac_client_impl_t *mcip) 6893 { 6894 mac_grp_client_t *mgcp; 6895 6896 for (mgcp = grp->mrg_clients; mgcp != NULL; mgcp = mgcp->mgc_next) { 6897 if (mgcp->mgc_client == mcip) 6898 break; 6899 } 6900 6901 ASSERT(mgcp == NULL); 6902 6903 mgcp = kmem_zalloc(sizeof (mac_grp_client_t), KM_SLEEP); 6904 mgcp->mgc_client = mcip; 6905 mgcp->mgc_next = grp->mrg_clients; 6906 grp->mrg_clients = mgcp; 6907 } 6908 6909 void 6910 mac_group_remove_client(mac_group_t *grp, mac_client_impl_t *mcip) 6911 { 6912 mac_grp_client_t *mgcp, **pprev; 6913 6914 for (pprev = &grp->mrg_clients, mgcp = *pprev; mgcp != NULL; 6915 pprev = &mgcp->mgc_next, mgcp = *pprev) { 6916 if (mgcp->mgc_client == mcip) 6917 break; 6918 } 6919 6920 ASSERT(mgcp != NULL); 6921 6922 *pprev = mgcp->mgc_next; 6923 kmem_free(mgcp, sizeof (mac_grp_client_t)); 6924 } 6925 6926 /* 6927 * Return true if any client on this group explicitly asked for HW 6928 * rings (of type mask) or have a bound share. 
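 *
 * Typical usage, as in the Rx group reservation path below:
 *
 *	if (!i_mac_clients_hw(grp, MRP_RX_RINGS))
 *		candidate_grp = grp;	group may be vacated later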
6929 */ 6930 static boolean_t 6931 i_mac_clients_hw(mac_group_t *grp, uint32_t mask) 6932 { 6933 mac_grp_client_t *mgcip; 6934 mac_client_impl_t *mcip; 6935 mac_resource_props_t *mrp; 6936 6937 for (mgcip = grp->mrg_clients; mgcip != NULL; mgcip = mgcip->mgc_next) { 6938 mcip = mgcip->mgc_client; 6939 mrp = MCIP_RESOURCE_PROPS(mcip); 6940 if (mcip->mci_share != 0 || (mrp->mrp_mask & mask) != 0) 6941 return (B_TRUE); 6942 } 6943 6944 return (B_FALSE); 6945 } 6946 6947 /* 6948 * Finds an available group and exclusively reserves it for a client. 6949 * The group is chosen to suit the flow's resource controls (bandwidth and 6950 * fanout requirements) and the address type. 6951 * If the requestor is the primary MAC then return the group with the 6952 * largest number of rings, otherwise the default group when available. 6953 */ 6954 mac_group_t * 6955 mac_reserve_rx_group(mac_client_impl_t *mcip, uint8_t *mac_addr, boolean_t move) 6956 { 6957 mac_share_handle_t share = mcip->mci_share; 6958 mac_impl_t *mip = mcip->mci_mip; 6959 mac_group_t *grp = NULL; 6960 int i; 6961 int err = 0; 6962 mac_address_t *map; 6963 mac_resource_props_t *mrp = MCIP_RESOURCE_PROPS(mcip); 6964 int nrings; 6965 int donor_grp_rcnt; 6966 boolean_t need_exclgrp = B_FALSE; 6967 int need_rings = 0; 6968 mac_group_t *candidate_grp = NULL; 6969 mac_client_impl_t *gclient; 6970 mac_group_t *donorgrp = NULL; 6971 boolean_t rxhw = mrp->mrp_mask & MRP_RX_RINGS; 6972 boolean_t unspec = mrp->mrp_mask & MRP_RXRINGS_UNSPEC; 6973 boolean_t isprimary; 6974 6975 ASSERT(MAC_PERIM_HELD((mac_handle_t)mip)); 6976 6977 isprimary = mcip->mci_flent->fe_type & FLOW_PRIMARY_MAC; 6978 6979 /* 6980 * Check if a group already has this MAC address (case of VLANs) 6981 * unless we are moving this MAC client from one group to another. 6982 */ 6983 if (!move && (map = mac_find_macaddr(mip, mac_addr)) != NULL) { 6984 if (map->ma_group != NULL) 6985 return (map->ma_group); 6986 } 6987 6988 if (mip->mi_rx_groups == NULL || mip->mi_rx_group_count == 0) 6989 return (NULL); 6990 6991 /* 6992 * If this client is requesting exclusive MAC access then 6993 * return NULL to ensure the client uses the default group. 6994 */ 6995 if (mcip->mci_state_flags & MCIS_EXCLUSIVE) 6996 return (NULL); 6997 6998 /* For dynamic groups default unspecified to 1 */ 6999 if (rxhw && unspec && 7000 mip->mi_rx_group_type == MAC_GROUP_TYPE_DYNAMIC) { 7001 mrp->mrp_nrxrings = 1; 7002 } 7003 7004 /* 7005 * For static grouping we only allow specifying rings=0 and 7006 * unspecified 7007 */ 7008 if (rxhw && mrp->mrp_nrxrings > 0 && 7009 mip->mi_rx_group_type == MAC_GROUP_TYPE_STATIC) { 7010 return (NULL); 7011 } 7012 7013 if (rxhw) { 7014 /* 7015 * We have explicitly asked for a group (with nrxrings, 7016 * if unspec). 7017 */ 7018 if (unspec || mrp->mrp_nrxrings > 0) { 7019 need_exclgrp = B_TRUE; 7020 need_rings = mrp->mrp_nrxrings; 7021 } else if (mrp->mrp_nrxrings == 0) { 7022 /* 7023 * We have asked for a software group. 7024 */ 7025 return (NULL); 7026 } 7027 } else if (isprimary && mip->mi_nactiveclients == 1 && 7028 mip->mi_rx_group_type == MAC_GROUP_TYPE_DYNAMIC) { 7029 /* 7030 * If the primary is the only active client on this 7031 * mip and we have not asked for any rings, we give 7032 * it the default group so that the primary gets to 7033 * use all the rings. 7034 */ 7035 return (NULL); 7036 } 7037 7038 /* The group that can donate rings */ 7039 donorgrp = mip->mi_rx_donor_grp; 7040 7041 /* 7042 * The number of rings that the default group can donate.
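 * (For example, a donor group currently holding eight rings can donate
 * at most seven; donor_grp_rcnt just below is mrg_cur_count - 1.)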
7043 * We need to leave at least one ring. 7044 */ 7045 donor_grp_rcnt = donorgrp->mrg_cur_count - 1; 7046 7047 /* 7048 * Try to exclusively reserve a RX group. 7049 * 7050 * For flows requiring HW_DEFAULT_RING (unicast flow of the primary 7051 * client), try to reserve a non-default RX group and give 7052 * it all the rings from the donor group, except the default ring 7053 * 7054 * For flows requiring HW_RING (unicast flow of other clients), try 7055 * to reserve non-default RX group with the specified number of 7056 * rings, if available. 7057 * 7058 * For flows that have not asked for software or hardware rings, 7059 * try to reserve a non-default group with 1 ring, if available. 7060 */ 7061 for (i = 1; i < mip->mi_rx_group_count; i++) { 7062 grp = &mip->mi_rx_groups[i]; 7063 7064 DTRACE_PROBE3(rx__group__trying, char *, mip->mi_name, 7065 int, grp->mrg_index, mac_group_state_t, grp->mrg_state); 7066 7067 /* 7068 * Check if this group could be a candidate group for 7069 * eviction if we need a group for this MAC client, 7070 * but there aren't any. A candidate group is one 7071 * that didn't ask for an exclusive group, but got 7072 * one and it has enough rings (combined with what 7073 * the donor group can donate) for the new MAC 7074 * client. 7075 */ 7076 if (grp->mrg_state >= MAC_GROUP_STATE_RESERVED) { 7077 /* 7078 * If the donor group is not the default 7079 * group, don't bother looking for a candidate 7080 * group. If we don't have enough rings we 7081 * will check if the primary group can be 7082 * vacated. 7083 */ 7084 if (candidate_grp == NULL && 7085 donorgrp == MAC_DEFAULT_RX_GROUP(mip)) { 7086 if (!i_mac_clients_hw(grp, MRP_RX_RINGS) && 7087 (unspec || 7088 (grp->mrg_cur_count + donor_grp_rcnt >= 7089 need_rings))) { 7090 candidate_grp = grp; 7091 } 7092 } 7093 continue; 7094 } 7095 /* 7096 * This group could already be SHARED by other multicast 7097 * flows on this client. In that case, the group would 7098 * be shared and has already been started. 7099 */ 7100 ASSERT(grp->mrg_state != MAC_GROUP_STATE_UNINIT); 7101 7102 if ((grp->mrg_state == MAC_GROUP_STATE_REGISTERED) && 7103 (mac_start_group(grp) != 0)) { 7104 continue; 7105 } 7106 7107 if (mip->mi_rx_group_type != MAC_GROUP_TYPE_DYNAMIC) 7108 break; 7109 ASSERT(grp->mrg_cur_count == 0); 7110 7111 /* 7112 * Populate the group. Rings should be taken 7113 * from the donor group. 7114 */ 7115 nrings = rxhw ? need_rings : isprimary ? donor_grp_rcnt: 1; 7116 7117 /* 7118 * If the donor group can't donate, let's just walk and 7119 * see if someone can vacate a group, so that we have 7120 * enough rings for this, unless we have already 7121 * identified a candidate group. 7122 */ 7123 if (nrings <= donor_grp_rcnt) { 7124 err = i_mac_group_allocate_rings(mip, MAC_RING_TYPE_RX, 7125 donorgrp, grp, share, nrings); 7126 if (err == 0) { 7127 /* 7128 * For a share i_mac_group_allocate_rings gets 7129 * the rings from the driver, let's populate 7130 * the property for the client now. 7131 */ 7132 if (share != 0) { 7133 mac_client_set_rings( 7134 (mac_client_handle_t)mcip, 7135 grp->mrg_cur_count, -1); 7136 } 7137 if (mac_is_primary_client(mcip) && !rxhw) 7138 mip->mi_rx_donor_grp = grp; 7139 break; 7140 } 7141 } 7142 7143 DTRACE_PROBE3(rx__group__reserve__alloc__rings, char *, 7144 mip->mi_name, int, grp->mrg_index, int, err); 7145 7146 /* 7147 * It's a dynamic group but the grouping operation 7148 * failed.
7149 */ 7150 mac_stop_group(grp); 7151 } 7152 7153 /* We didn't find an exclusive group for this MAC client */ 7154 if (i >= mip->mi_rx_group_count) { 7155 7156 if (!need_exclgrp) 7157 return (NULL); 7158 7159 /* 7160 * If we found a candidate group then move the 7161 * existing MAC client from the candidate_group to the 7162 * default group and give the candidate_group to the 7163 * new MAC client. If we didn't find a candidate 7164 * group, then check if the primary is in its own 7165 * group and if it can make way for this MAC client. 7166 */ 7167 if (candidate_grp == NULL && 7168 donorgrp != MAC_DEFAULT_RX_GROUP(mip) && 7169 donorgrp->mrg_cur_count >= need_rings) { 7170 candidate_grp = donorgrp; 7171 } 7172 if (candidate_grp != NULL) { 7173 boolean_t prim_grp = B_FALSE; 7174 7175 /* 7176 * Switch the existing MAC client from the 7177 * candidate group to the default group. If 7178 * the candidate group is the donor group, 7179 * then after the switch we need to update the 7180 * donor group too. 7181 */ 7182 grp = candidate_grp; 7183 gclient = grp->mrg_clients->mgc_client; 7184 VERIFY3P(gclient, !=, NULL); 7185 if (grp == mip->mi_rx_donor_grp) 7186 prim_grp = B_TRUE; 7187 if (mac_rx_switch_group(gclient, grp, 7188 MAC_DEFAULT_RX_GROUP(mip)) != 0) { 7189 return (NULL); 7190 } 7191 if (prim_grp) { 7192 mip->mi_rx_donor_grp = 7193 MAC_DEFAULT_RX_GROUP(mip); 7194 donorgrp = MAC_DEFAULT_RX_GROUP(mip); 7195 } 7196 7197 /* 7198 * Now give this group with the required rings 7199 * to this MAC client. 7200 */ 7201 ASSERT(grp->mrg_state == MAC_GROUP_STATE_REGISTERED); 7202 if (mac_start_group(grp) != 0) 7203 return (NULL); 7204 7205 if (mip->mi_rx_group_type != MAC_GROUP_TYPE_DYNAMIC) 7206 return (grp); 7207 7208 donor_grp_rcnt = donorgrp->mrg_cur_count - 1; 7209 ASSERT(grp->mrg_cur_count == 0); 7210 ASSERT(donor_grp_rcnt >= need_rings); 7211 err = i_mac_group_allocate_rings(mip, MAC_RING_TYPE_RX, 7212 donorgrp, grp, share, need_rings); 7213 if (err == 0) { 7214 /* 7215 * For a share i_mac_group_allocate_rings gets 7216 * the rings from the driver, let's populate 7217 * the property for the client now. 7218 */ 7219 if (share != 0) { 7220 mac_client_set_rings( 7221 (mac_client_handle_t)mcip, 7222 grp->mrg_cur_count, -1); 7223 } 7224 DTRACE_PROBE2(rx__group__reserved, 7225 char *, mip->mi_name, int, grp->mrg_index); 7226 return (grp); 7227 } 7228 DTRACE_PROBE3(rx__group__reserve__alloc__rings, char *, 7229 mip->mi_name, int, grp->mrg_index, int, err); 7230 mac_stop_group(grp); 7231 } 7232 return (NULL); 7233 } 7234 ASSERT(grp != NULL); 7235 7236 DTRACE_PROBE2(rx__group__reserved, 7237 char *, mip->mi_name, int, grp->mrg_index); 7238 return (grp); 7239 } 7240 7241 /* 7242 * mac_rx_release_group() 7243 * 7244 * Release the group when it has no remaining clients. The group is 7245 * stopped and its shares are removed and all rings are assigned back 7246 * to default group. This should never be called against the default 7247 * group. 7248 */ 7249 void 7250 mac_release_rx_group(mac_client_impl_t *mcip, mac_group_t *group) 7251 { 7252 mac_impl_t *mip = mcip->mci_mip; 7253 mac_ring_t *ring; 7254 7255 ASSERT(group != MAC_DEFAULT_RX_GROUP(mip)); 7256 ASSERT(MAC_GROUP_NO_CLIENT(group) == B_TRUE); 7257 7258 if (mip->mi_rx_donor_grp == group) 7259 mip->mi_rx_donor_grp = MAC_DEFAULT_RX_GROUP(mip); 7260 7261 /* 7262 * This is the case where there are no clients left. Any 7263 * SRS etc on this group have also be quiesced. 
7264 */ 7265 for (ring = group->mrg_rings; ring != NULL; ring = ring->mr_next) { 7266 if (ring->mr_classify_type == MAC_HW_CLASSIFIER) { 7267 ASSERT(group->mrg_state == MAC_GROUP_STATE_RESERVED); 7268 /* 7269 * Remove the SRS associated with the HW ring. 7270 * As a result, polling will be disabled. 7271 */ 7272 ring->mr_srs = NULL; 7273 } 7274 ASSERT(group->mrg_state < MAC_GROUP_STATE_RESERVED || 7275 ring->mr_state == MR_INUSE); 7276 if (ring->mr_state == MR_INUSE) { 7277 mac_stop_ring(ring); 7278 ring->mr_flag = 0; 7279 } 7280 } 7281 7282 /* remove group from share */ 7283 if (mcip->mci_share != 0) { 7284 mip->mi_share_capab.ms_sremove(mcip->mci_share, 7285 group->mrg_driver); 7286 } 7287 7288 if (mip->mi_rx_group_type == MAC_GROUP_TYPE_DYNAMIC) { 7289 mac_ring_t *ring; 7290 7291 /* 7292 * Rings were dynamically allocated to group. 7293 * Move rings back to default group. 7294 */ 7295 while ((ring = group->mrg_rings) != NULL) { 7296 (void) mac_group_mov_ring(mip, mip->mi_rx_donor_grp, 7297 ring); 7298 } 7299 } 7300 mac_stop_group(group); 7301 /* 7302 * Possible improvement: See if we can assign the group just released 7303 * to another client of the mip 7304 */ 7305 } 7306 7307 /* 7308 * Move the MAC address from fgrp to tgrp. 7309 */ 7310 static int 7311 mac_rx_move_macaddr(mac_client_impl_t *mcip, mac_group_t *fgrp, 7312 mac_group_t *tgrp) 7313 { 7314 mac_impl_t *mip = mcip->mci_mip; 7315 uint8_t maddr[MAXMACADDRLEN]; 7316 int err = 0; 7317 uint16_t vid; 7318 mac_unicast_impl_t *muip; 7319 boolean_t use_hw; 7320 7321 mac_rx_client_quiesce((mac_client_handle_t)mcip); 7322 VERIFY3P(mcip->mci_unicast, !=, NULL); 7323 bcopy(mcip->mci_unicast->ma_addr, maddr, mcip->mci_unicast->ma_len); 7324 7325 /* 7326 * Does the client require MAC address hardware classification? 7327 */ 7328 use_hw = (mcip->mci_state_flags & MCIS_UNICAST_HW) != 0; 7329 vid = i_mac_flow_vid(mcip->mci_flent); 7330 7331 /* 7332 * You can never move an address that is shared by multiple 7333 * clients. mac_datapath_setup() ensures that clients sharing 7334 * an address are placed on the default group. This guarantees 7335 * that a non-default group will only ever have one client and 7336 * thus make full use of HW filters. 7337 */ 7338 if (mac_check_macaddr_shared(mcip->mci_unicast)) 7339 return (EINVAL); 7340 7341 err = mac_remove_macaddr_vlan(mcip->mci_unicast, vid); 7342 7343 if (err != 0) { 7344 mac_rx_client_restart((mac_client_handle_t)mcip); 7345 return (err); 7346 } 7347 7348 /* 7349 * If this isn't the primary MAC address then the 7350 * mac_address_t has been freed by the last call to 7351 * mac_remove_macaddr_vlan(). In any case, NULL the reference 7352 * to avoid a dangling pointer. 7353 */ 7354 mcip->mci_unicast = NULL; 7355 7356 /* 7357 * We also have to NULL all the mui_map references -- sun4v 7358 * strikes again! 7359 */ 7360 rw_enter(&mcip->mci_rw_lock, RW_WRITER); 7361 for (muip = mcip->mci_unicast_list; muip != NULL; muip = muip->mui_next) 7362 muip->mui_map = NULL; 7363 rw_exit(&mcip->mci_rw_lock); 7364 7365 /* 7366 * Program the H/W Classifier first; if this fails we need not 7367 * proceed with the rest.
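 *
 * In sketch form, the failure handling below is:
 *
 *	if (add to tgrp fails)
 *		best-effort re-add to fgrp (warn if that fails too)
 *		restart the client's Rx and return the original error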
7368 */ 7369 if ((err = mac_add_macaddr_vlan(mip, tgrp, maddr, vid, use_hw)) != 0) { 7370 int err2; 7371 7372 /* Revert back the H/W Classifier */ 7373 err2 = mac_add_macaddr_vlan(mip, fgrp, maddr, vid, use_hw); 7374 7375 if (err2 != 0) { 7376 cmn_err(CE_WARN, "Failed to revert HW classification" 7377 " on MAC %s, for client %s: %d.", mip->mi_name, 7378 mcip->mci_name, err2); 7379 } 7380 7381 mac_rx_client_restart((mac_client_handle_t)mcip); 7382 return (err); 7383 } 7384 7385 /* 7386 * Get a reference to the new mac_address_t and update the 7387 * client's reference. Then restart the client and add the 7388 * other clients of this MAC addr (if they exist). 7389 */ 7390 mcip->mci_unicast = mac_find_macaddr(mip, maddr); 7391 rw_enter(&mcip->mci_rw_lock, RW_WRITER); 7392 for (muip = mcip->mci_unicast_list; muip != NULL; muip = muip->mui_next) 7393 muip->mui_map = mcip->mci_unicast; 7394 rw_exit(&mcip->mci_rw_lock); 7395 mac_rx_client_restart((mac_client_handle_t)mcip); 7396 return (0); 7397 } 7398 7399 /* 7400 * Switch the MAC client from one group to another. This means we need 7401 * to remove the MAC address from the group, remove the MAC client, 7402 * tear down the SRSs and revert the group state. Then, we add the client 7403 * to the destination group, set the SRSs, and add the MAC address to the 7404 * group. 7405 */ 7406 int 7407 mac_rx_switch_group(mac_client_impl_t *mcip, mac_group_t *fgrp, 7408 mac_group_t *tgrp) 7409 { 7410 int err; 7411 mac_group_state_t next_state; 7412 mac_client_impl_t *group_only_mcip; 7413 mac_client_impl_t *gmcip; 7414 mac_impl_t *mip = mcip->mci_mip; 7415 mac_grp_client_t *mgcp; 7416 7417 VERIFY3P(fgrp, ==, mcip->mci_flent->fe_rx_ring_group); 7418 7419 if ((err = mac_rx_move_macaddr(mcip, fgrp, tgrp)) != 0) 7420 return (err); 7421 7422 /* 7423 * If the group is marked as reserved and in use by a single 7424 * client, then there is an SRS to tear down. 7425 */ 7426 if (fgrp->mrg_state == MAC_GROUP_STATE_RESERVED && 7427 MAC_GROUP_ONLY_CLIENT(fgrp) != NULL) { 7428 mac_rx_srs_group_teardown(mcip->mci_flent, B_TRUE); 7429 } 7430 7431 /* 7432 * If we are moving the client from a non-default group, then 7433 * we know that any additional clients on this group share the 7434 * same MAC address. Since we moved the MAC address filter, we 7435 * need to move these clients too. 7436 * 7437 * If we are moving the client from the default group and its 7438 * MAC address has VLAN clients, then we must move those 7439 * clients as well. 7440 * 7441 * In both cases the idea is the same: we moved the MAC 7442 * address filter to the tgrp, so we must move all clients 7443 * using that MAC address to tgrp as well. 7444 */ 7445 if (fgrp != MAC_DEFAULT_RX_GROUP(mip)) { 7446 mgcp = fgrp->mrg_clients; 7447 while (mgcp != NULL) { 7448 gmcip = mgcp->mgc_client; 7449 mgcp = mgcp->mgc_next; 7450 mac_group_remove_client(fgrp, gmcip); 7451 mac_group_add_client(tgrp, gmcip); 7452 gmcip->mci_flent->fe_rx_ring_group = tgrp; 7453 } 7454 mac_release_rx_group(mcip, fgrp); 7455 VERIFY3B(MAC_GROUP_NO_CLIENT(fgrp), ==, B_TRUE); 7456 mac_set_group_state(fgrp, MAC_GROUP_STATE_REGISTERED); 7457 } else { 7458 mac_group_remove_client(fgrp, mcip); 7459 mac_group_add_client(tgrp, mcip); 7460 mcip->mci_flent->fe_rx_ring_group = tgrp; 7461 7462 /* 7463 * If there are other clients (VLANs) sharing this address 7464 * then move them too. 7465 */ 7466 if (mac_check_macaddr_shared(mcip->mci_unicast)) { 7467 /* 7468 * We need to move all the clients that are using 7469 * this MAC address.
7470 */ 7471 mgcp = fgrp->mrg_clients; 7472 while (mgcp != NULL) { 7473 gmcip = mgcp->mgc_client; 7474 mgcp = mgcp->mgc_next; 7475 if (mcip->mci_unicast == gmcip->mci_unicast) { 7476 mac_group_remove_client(fgrp, gmcip); 7477 mac_group_add_client(tgrp, gmcip); 7478 gmcip->mci_flent->fe_rx_ring_group = 7479 tgrp; 7480 } 7481 } 7482 } 7483 7484 /* 7485 * The default group still handles multicast and 7486 * broadcast traffic; it won't transition to 7487 * MAC_GROUP_STATE_REGISTERED. 7488 */ 7489 if (fgrp->mrg_state == MAC_GROUP_STATE_RESERVED) 7490 mac_rx_group_unmark(fgrp, MR_CONDEMNED); 7491 mac_set_group_state(fgrp, MAC_GROUP_STATE_SHARED); 7492 } 7493 7494 next_state = mac_group_next_state(tgrp, &group_only_mcip, 7495 MAC_DEFAULT_RX_GROUP(mip), B_TRUE); 7496 mac_set_group_state(tgrp, next_state); 7497 7498 /* 7499 * If the destination group is reserved, then setup the SRSes. 7500 * Otherwise make sure to use SW classification. 7501 */ 7502 if (tgrp->mrg_state == MAC_GROUP_STATE_RESERVED) { 7503 mac_rx_srs_group_setup(mcip, mcip->mci_flent, SRST_LINK); 7504 mac_fanout_setup(mcip, mcip->mci_flent, 7505 MCIP_RESOURCE_PROPS(mcip), mac_rx_deliver, mcip, NULL, 7506 NULL); 7507 mac_rx_group_unmark(tgrp, MR_INCIPIENT); 7508 } else { 7509 mac_rx_switch_grp_to_sw(tgrp); 7510 } 7511 7512 return (0); 7513 } 7514 7515 /* 7516 * Reserves a TX group for the specified share. Invoked by mac_tx_srs_setup() 7517 * when a share was allocated to the client. 7518 */ 7519 mac_group_t * 7520 mac_reserve_tx_group(mac_client_impl_t *mcip, boolean_t move) 7521 { 7522 mac_impl_t *mip = mcip->mci_mip; 7523 mac_group_t *grp = NULL; 7524 int rv; 7525 int i; 7526 int err; 7527 mac_group_t *defgrp; 7528 mac_share_handle_t share = mcip->mci_share; 7529 mac_resource_props_t *mrp = MCIP_RESOURCE_PROPS(mcip); 7530 int nrings; 7531 int defnrings; 7532 boolean_t need_exclgrp = B_FALSE; 7533 int need_rings = 0; 7534 mac_group_t *candidate_grp = NULL; 7535 mac_client_impl_t *gclient; 7536 mac_resource_props_t *gmrp; 7537 boolean_t txhw = mrp->mrp_mask & MRP_TX_RINGS; 7538 boolean_t unspec = mrp->mrp_mask & MRP_TXRINGS_UNSPEC; 7539 boolean_t isprimary; 7540 7541 isprimary = mcip->mci_flent->fe_type & FLOW_PRIMARY_MAC; 7542 7543 /* 7544 * When we come here for a VLAN on the primary (dladm create-vlan), 7545 * we need to pair it along with the primary (to keep it consistent 7546 * with the RX side). So, we check if the primary is already assigned 7547 * to a group and return the group if so. The other way is also 7548 * true, i.e. the VLAN is already created and now we are plumbing 7549 * the primary. 7550 */ 7551 if (!move && isprimary) { 7552 for (gclient = mip->mi_clients_list; gclient != NULL; 7553 gclient = gclient->mci_client_next) { 7554 if (gclient->mci_flent->fe_type & FLOW_PRIMARY_MAC && 7555 gclient->mci_flent->fe_tx_ring_group != NULL) { 7556 return (gclient->mci_flent->fe_tx_ring_group); 7557 } 7558 } 7559 } 7560 7561 if (mip->mi_tx_groups == NULL || mip->mi_tx_group_count == 0) 7562 return (NULL); 7563 7564 /* For dynamic groups, default unspec to 1 */ 7565 if (txhw && unspec && 7566 mip->mi_tx_group_type == MAC_GROUP_TYPE_DYNAMIC) { 7567 mrp->mrp_ntxrings = 1; 7568 } 7569 /* 7570 * For static grouping we allow only specifying rings=0 and 7571 * unspecified 7572 */ 7573 if (txhw && mrp->mrp_ntxrings > 0 && 7574 mip->mi_tx_group_type == MAC_GROUP_TYPE_STATIC) { 7575 return (NULL); 7576 } 7577 7578 if (txhw) { 7579 /* 7580 * We have explicitly asked for a group (with ntxrings, 7581 * if unspec). 
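 *
 * In sketch form, the property combinations handled here are
 * (dynamic groups default an unspecified count to 1 ring, per the
 * code above):
 *
 *	MRP_TX_RINGS | MRP_TXRINGS_UNSPEC	exclusive group
 *	MRP_TX_RINGS, mrp_ntxrings > 0		exclusive group, n rings
 *	MRP_TX_RINGS, mrp_ntxrings == 0		software group, return NULL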
7582 */ 7583 if (unspec || mrp->mrp_ntxrings > 0) { 7584 need_exclgrp = B_TRUE; 7585 need_rings = mrp->mrp_ntxrings; 7586 } else if (mrp->mrp_ntxrings == 0) { 7587 /* 7588 * We have asked for a software group. 7589 */ 7590 return (NULL); 7591 } 7592 } 7593 defgrp = MAC_DEFAULT_TX_GROUP(mip); 7594 /* 7595 * The number of rings that the default group can donate. 7596 * We need to leave at least one ring - the default ring - in 7597 * this group. 7598 */ 7599 defnrings = defgrp->mrg_cur_count - 1; 7600 7601 /* 7602 * Primary gets default group unless explicitly told not 7603 * to (i.e. rings > 0). 7604 */ 7605 if (isprimary && !need_exclgrp) 7606 return (NULL); 7607 7608 nrings = (mrp->mrp_mask & MRP_TX_RINGS) != 0 ? mrp->mrp_ntxrings : 1; 7609 for (i = 0; i < mip->mi_tx_group_count; i++) { 7610 grp = &mip->mi_tx_groups[i]; 7611 if ((grp->mrg_state == MAC_GROUP_STATE_RESERVED) || 7612 (grp->mrg_state == MAC_GROUP_STATE_UNINIT)) { 7613 /* 7614 * Select a candidate for replacement if we don't 7615 * get an exclusive group. A candidate group is one 7616 * that didn't ask for an exclusive group, but got 7617 * one and it has enough rings (combined with what 7618 * the default group can donate) for the new MAC 7619 * client. 7620 */ 7621 if (grp->mrg_state == MAC_GROUP_STATE_RESERVED && 7622 candidate_grp == NULL) { 7623 gclient = MAC_GROUP_ONLY_CLIENT(grp); 7624 VERIFY3P(gclient, !=, NULL); 7625 gmrp = MCIP_RESOURCE_PROPS(gclient); 7626 if (gclient->mci_share == 0 && 7627 (gmrp->mrp_mask & MRP_TX_RINGS) == 0 && 7628 (unspec || 7629 (grp->mrg_cur_count + defnrings) >= 7630 need_rings)) { 7631 candidate_grp = grp; 7632 } 7633 } 7634 continue; 7635 } 7636 /* 7637 * If the default can't donate let's just walk and 7638 * see if someone can vacate a group, so that we have 7639 * enough rings for this. 7640 */ 7641 if (mip->mi_tx_group_type != MAC_GROUP_TYPE_DYNAMIC || 7642 nrings <= defnrings) { 7643 if (grp->mrg_state == MAC_GROUP_STATE_REGISTERED) { 7644 rv = mac_start_group(grp); 7645 ASSERT(rv == 0); 7646 } 7647 break; 7648 } 7649 } 7650 7651 /* The default group */ 7652 if (i >= mip->mi_tx_group_count) { 7653 /* 7654 * If we need an exclusive group and have identified a 7655 * candidate group we switch the MAC client from the 7656 * candidate group to the default group and give the 7657 * candidate group to this client. 7658 */ 7659 if (need_exclgrp && candidate_grp != NULL) { 7660 /* 7661 * Switch the MAC client from the candidate 7662 * group to the default group. We know the 7663 * candidate_grp came from a reserved group 7664 * and thus only has one client. 7665 */ 7666 grp = candidate_grp; 7667 gclient = MAC_GROUP_ONLY_CLIENT(grp); 7668 VERIFY3P(gclient, !=, NULL); 7669 mac_tx_client_quiesce((mac_client_handle_t)gclient); 7670 mac_tx_switch_group(gclient, grp, defgrp); 7671 mac_tx_client_restart((mac_client_handle_t)gclient); 7672 7673 /* 7674 * Give the candidate group with the specified number 7675 * of rings to this MAC client. 7676 */ 7677 ASSERT(grp->mrg_state == MAC_GROUP_STATE_REGISTERED); 7678 rv = mac_start_group(grp); 7679 ASSERT(rv == 0); 7680 7681 if (mip->mi_tx_group_type != MAC_GROUP_TYPE_DYNAMIC) 7682 return (grp); 7683 7684 ASSERT(grp->mrg_cur_count == 0); 7685 ASSERT(defgrp->mrg_cur_count > need_rings); 7686 7687 err = i_mac_group_allocate_rings(mip, MAC_RING_TYPE_TX, 7688 defgrp, grp, share, need_rings); 7689 if (err == 0) { 7690 /* 7691 * For a share i_mac_group_allocate_rings gets 7692 * the rings from the driver, let's populate 7693 * the property for the client now. 
7694 */ 7695 if (share != 0) { 7696 mac_client_set_rings( 7697 (mac_client_handle_t)mcip, -1, 7698 grp->mrg_cur_count); 7699 } 7700 mip->mi_tx_group_free--; 7701 return (grp); 7702 } 7703 DTRACE_PROBE3(tx__group__reserve__alloc__rings, char *, 7704 mip->mi_name, int, grp->mrg_index, int, err); 7705 mac_stop_group(grp); 7706 } 7707 return (NULL); 7708 } 7709 /* 7710 * We got an exclusive group, but it is not dynamic. 7711 */ 7712 if (mip->mi_tx_group_type != MAC_GROUP_TYPE_DYNAMIC) { 7713 mip->mi_tx_group_free--; 7714 return (grp); 7715 } 7716 7717 rv = i_mac_group_allocate_rings(mip, MAC_RING_TYPE_TX, defgrp, grp, 7718 share, nrings); 7719 if (rv != 0) { 7720 DTRACE_PROBE3(tx__group__reserve__alloc__rings, 7721 char *, mip->mi_name, int, grp->mrg_index, int, rv); 7722 mac_stop_group(grp); 7723 return (NULL); 7724 } 7725 /* 7726 * For a share i_mac_group_allocate_rings gets the rings from the 7727 * driver, let's populate the property for the client now. 7728 */ 7729 if (share != 0) { 7730 mac_client_set_rings((mac_client_handle_t)mcip, -1, 7731 grp->mrg_cur_count); 7732 } 7733 mip->mi_tx_group_free--; 7734 return (grp); 7735 } 7736 7737 void 7738 mac_release_tx_group(mac_client_impl_t *mcip, mac_group_t *grp) 7739 { 7740 mac_impl_t *mip = mcip->mci_mip; 7741 mac_share_handle_t share = mcip->mci_share; 7742 mac_ring_t *ring; 7743 mac_soft_ring_set_t *srs = MCIP_TX_SRS(mcip); 7744 mac_group_t *defgrp; 7745 7746 defgrp = MAC_DEFAULT_TX_GROUP(mip); 7747 if (srs != NULL) { 7748 if (srs->srs_soft_ring_count > 0) { 7749 for (ring = grp->mrg_rings; ring != NULL; 7750 ring = ring->mr_next) { 7751 ASSERT(mac_tx_srs_ring_present(srs, ring)); 7752 mac_tx_invoke_callbacks(mcip, 7753 (mac_tx_cookie_t) 7754 mac_tx_srs_get_soft_ring(srs, ring)); 7755 mac_tx_srs_del_ring(srs, ring); 7756 } 7757 } else { 7758 ASSERT(srs->srs_tx.st_arg2 != NULL); 7759 srs->srs_tx.st_arg2 = NULL; 7760 mac_srs_stat_delete(srs); 7761 } 7762 } 7763 if (share != 0) 7764 mip->mi_share_capab.ms_sremove(share, grp->mrg_driver); 7765 7766 /* move the ring back to the pool */ 7767 if (mip->mi_tx_group_type == MAC_GROUP_TYPE_DYNAMIC) { 7768 while ((ring = grp->mrg_rings) != NULL) 7769 (void) mac_group_mov_ring(mip, defgrp, ring); 7770 } 7771 mac_stop_group(grp); 7772 mip->mi_tx_group_free++; 7773 } 7774 7775 /* 7776 * Disassociate a MAC client from a group, i.e go through the rings in the 7777 * group and delete all the soft rings tied to them. 7778 */ 7779 static void 7780 mac_tx_dismantle_soft_rings(mac_group_t *fgrp, flow_entry_t *flent) 7781 { 7782 mac_client_impl_t *mcip = flent->fe_mcip; 7783 mac_soft_ring_set_t *tx_srs; 7784 mac_srs_tx_t *tx; 7785 mac_ring_t *ring; 7786 7787 tx_srs = flent->fe_tx_srs; 7788 tx = &tx_srs->srs_tx; 7789 7790 /* Single ring case we haven't created any soft rings */ 7791 if (tx->st_mode == SRS_TX_BW || tx->st_mode == SRS_TX_SERIALIZE || 7792 tx->st_mode == SRS_TX_DEFAULT) { 7793 tx->st_arg2 = NULL; 7794 mac_srs_stat_delete(tx_srs); 7795 /* Fanout case, where we have to dismantle the soft rings */ 7796 } else { 7797 for (ring = fgrp->mrg_rings; ring != NULL; 7798 ring = ring->mr_next) { 7799 ASSERT(mac_tx_srs_ring_present(tx_srs, ring)); 7800 mac_tx_invoke_callbacks(mcip, 7801 (mac_tx_cookie_t)mac_tx_srs_get_soft_ring(tx_srs, 7802 ring)); 7803 mac_tx_srs_del_ring(tx_srs, ring); 7804 } 7805 ASSERT(tx->st_arg2 == NULL); 7806 } 7807 } 7808 7809 /* 7810 * Switch the MAC client from one group to another. This means we need 7811 * to remove the MAC client, teardown the SRSs and revert the group state. 

/*
 * Switch the MAC client from one group to another. This means we need
 * to remove the MAC client, tear down the SRSs and revert the group state.
 * Then, we add the client to the destination group, set up the SRSs, etc.
 */
void
mac_tx_switch_group(mac_client_impl_t *mcip, mac_group_t *fgrp,
    mac_group_t *tgrp)
{
	mac_client_impl_t	*group_only_mcip;
	mac_impl_t		*mip = mcip->mci_mip;
	flow_entry_t		*flent = mcip->mci_flent;
	mac_group_t		*defgrp;
	mac_grp_client_t	*mgcp;
	mac_client_impl_t	*gmcip;
	flow_entry_t		*gflent;

	defgrp = MAC_DEFAULT_TX_GROUP(mip);
	ASSERT(fgrp == flent->fe_tx_ring_group);

	if (fgrp == defgrp) {
		/*
		 * If this is the primary, we need to find any VLANs on
		 * the primary and move them too.
		 */
		mac_group_remove_client(fgrp, mcip);
		mac_tx_dismantle_soft_rings(fgrp, flent);
		if (mac_check_macaddr_shared(mcip->mci_unicast)) {
			mgcp = fgrp->mrg_clients;
			while (mgcp != NULL) {
				gmcip = mgcp->mgc_client;
				mgcp = mgcp->mgc_next;
				if (mcip->mci_unicast != gmcip->mci_unicast)
					continue;
				mac_tx_client_quiesce(
				    (mac_client_handle_t)gmcip);

				gflent = gmcip->mci_flent;
				mac_group_remove_client(fgrp, gmcip);
				mac_tx_dismantle_soft_rings(fgrp, gflent);

				mac_group_add_client(tgrp, gmcip);
				gflent->fe_tx_ring_group = tgrp;
				/* We could directly set this to SHARED */
				tgrp->mrg_state = mac_group_next_state(tgrp,
				    &group_only_mcip, defgrp, B_FALSE);

				mac_tx_srs_group_setup(gmcip, gflent,
				    SRST_LINK);
				mac_fanout_setup(gmcip, gflent,
				    MCIP_RESOURCE_PROPS(gmcip), mac_rx_deliver,
				    gmcip, NULL, NULL);

				mac_tx_client_restart(
				    (mac_client_handle_t)gmcip);
			}
		}
		if (MAC_GROUP_NO_CLIENT(fgrp)) {
			mac_ring_t	*ring;
			int		cnt;
			int		ringcnt;

			fgrp->mrg_state = MAC_GROUP_STATE_REGISTERED;
			/*
			 * Additionally, we also need to stop all
			 * the rings in the default group, except
			 * the default ring. The reason being
			 * this group won't be released since it is
			 * the default group, so the rings won't
			 * be stopped otherwise.
			 */
			ringcnt = fgrp->mrg_cur_count;
			ring = fgrp->mrg_rings;
			for (cnt = 0; cnt < ringcnt; cnt++) {
				if (ring->mr_state == MR_INUSE &&
				    ring !=
				    (mac_ring_t *)mip->mi_default_tx_ring) {
					mac_stop_ring(ring);
					ring->mr_flag = 0;
				}
				ring = ring->mr_next;
			}
		} else if (MAC_GROUP_ONLY_CLIENT(fgrp) != NULL) {
			fgrp->mrg_state = MAC_GROUP_STATE_RESERVED;
		} else {
			ASSERT(fgrp->mrg_state == MAC_GROUP_STATE_SHARED);
		}
	} else {
		/*
		 * We could have VLANs sharing the non-default group with
		 * the primary.
		 */
		mgcp = fgrp->mrg_clients;
		while (mgcp != NULL) {
			gmcip = mgcp->mgc_client;
			mgcp = mgcp->mgc_next;
			if (gmcip == mcip)
				continue;
			mac_tx_client_quiesce((mac_client_handle_t)gmcip);
			gflent = gmcip->mci_flent;

			mac_group_remove_client(fgrp, gmcip);
			mac_tx_dismantle_soft_rings(fgrp, gflent);

			mac_group_add_client(tgrp, gmcip);
			gflent->fe_tx_ring_group = tgrp;
			/* We could directly set this to SHARED */
			tgrp->mrg_state = mac_group_next_state(tgrp,
			    &group_only_mcip, defgrp, B_FALSE);
			mac_tx_srs_group_setup(gmcip, gflent, SRST_LINK);
			mac_fanout_setup(gmcip, gflent,
			    MCIP_RESOURCE_PROPS(gmcip), mac_rx_deliver,
			    gmcip, NULL, NULL);

			mac_tx_client_restart((mac_client_handle_t)gmcip);
		}
		mac_group_remove_client(fgrp, mcip);
		mac_release_tx_group(mcip, fgrp);
		fgrp->mrg_state = MAC_GROUP_STATE_REGISTERED;
	}

	/* Add it to the tgroup */
	mac_group_add_client(tgrp, mcip);
	flent->fe_tx_ring_group = tgrp;
	tgrp->mrg_state = mac_group_next_state(tgrp, &group_only_mcip,
	    defgrp, B_FALSE);

	mac_tx_srs_group_setup(mcip, flent, SRST_LINK);
	mac_fanout_setup(mcip, flent, MCIP_RESOURCE_PROPS(mcip),
	    mac_rx_deliver, mcip, NULL, NULL);
}
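
/*
 * A caller that moves a client between Tx groups must quiesce the
 * client's Tx path around the switch, as mac_reserve_tx_group() does
 * when it reclaims a candidate group. For illustration, the sequence
 * looks like:
 *
 *	mac_tx_client_quiesce((mac_client_handle_t)mcip);
 *	mac_tx_switch_group(mcip, fgrp, tgrp);
 *	mac_tx_client_restart((mac_client_handle_t)mcip);
 *
 * The quiesce ensures that no Tx thread is still using the SRSs and
 * soft rings being dismantled while the client is moved.
 */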

/*
 * This is a 1-time control path activity initiated by the client (IP).
 * The mac perimeter protects against other simultaneous control activities,
 * for example an ioctl that attempts to change the degree of fanout and
 * increase or decrease the number of softrings associated with this Tx SRS.
 */
static mac_tx_notify_cb_t *
mac_client_tx_notify_add(mac_client_impl_t *mcip,
    mac_tx_notify_t notify, void *arg)
{
	mac_cb_info_t		*mcbi;
	mac_tx_notify_cb_t	*mtnfp;

	ASSERT(MAC_PERIM_HELD((mac_handle_t)mcip->mci_mip));

	mtnfp = kmem_zalloc(sizeof (mac_tx_notify_cb_t), KM_SLEEP);
	mtnfp->mtnf_fn = notify;
	mtnfp->mtnf_arg = arg;
	mtnfp->mtnf_link.mcb_objp = mtnfp;
	mtnfp->mtnf_link.mcb_objsize = sizeof (mac_tx_notify_cb_t);
	mtnfp->mtnf_link.mcb_flags = MCB_TX_NOTIFY_CB_T;

	mcbi = &mcip->mci_tx_notify_cb_info;
	mutex_enter(mcbi->mcbi_lockp);
	mac_callback_add(mcbi, &mcip->mci_tx_notify_cb_list, &mtnfp->mtnf_link);
	mutex_exit(mcbi->mcbi_lockp);
	return (mtnfp);
}

static void
mac_client_tx_notify_remove(mac_client_impl_t *mcip, mac_tx_notify_cb_t *mtnfp)
{
	mac_cb_info_t	*mcbi;
	mac_cb_t	**cblist;

	ASSERT(MAC_PERIM_HELD((mac_handle_t)mcip->mci_mip));

	if (!mac_callback_find(&mcip->mci_tx_notify_cb_info,
	    &mcip->mci_tx_notify_cb_list, &mtnfp->mtnf_link)) {
		cmn_err(CE_WARN,
		    "mac_client_tx_notify_remove: callback not "
		    "found, mcip 0x%p mtnfp 0x%p", (void *)mcip, (void *)mtnfp);
		return;
	}

	mcbi = &mcip->mci_tx_notify_cb_info;
	cblist = &mcip->mci_tx_notify_cb_list;
	mutex_enter(mcbi->mcbi_lockp);
	if (mac_callback_remove(mcbi, cblist, &mtnfp->mtnf_link))
		kmem_free(mtnfp, sizeof (mac_tx_notify_cb_t));
	else
		mac_callback_remove_wait(&mcip->mci_tx_notify_cb_info);
	mutex_exit(mcbi->mcbi_lockp);
}

/*
 * mac_client_tx_notify():
 * call to add or remove a flow-control callback routine.
 */
mac_tx_notify_handle_t
mac_client_tx_notify(mac_client_handle_t mch, mac_tx_notify_t callb_func,
    void *ptr)
{
	mac_client_impl_t	*mcip = (mac_client_impl_t *)mch;
	mac_tx_notify_cb_t	*mtnfp = NULL;

	i_mac_perim_enter(mcip->mci_mip);

	if (callb_func != NULL) {
		/* Add a notify callback */
		mtnfp = mac_client_tx_notify_add(mcip, callb_func, ptr);
	} else {
		mac_client_tx_notify_remove(mcip, (mac_tx_notify_cb_t *)ptr);
	}
	i_mac_perim_exit(mcip->mci_mip);

	return ((mac_tx_notify_handle_t)mtnfp);
}
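
/*
 * For illustration, a client registers a flow-control callback by passing
 * a non-NULL function, and later removes it by passing a NULL function
 * together with the handle that was returned at registration. A sketch,
 * assuming a client callback my_tx_notify() matching mac_tx_notify_t:
 *
 *	mac_tx_notify_handle_t handle;
 *
 *	handle = mac_client_tx_notify(mch, my_tx_notify, my_arg);
 *	...
 *	(void) mac_client_tx_notify(mch, NULL, handle);
 */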

void
mac_bridge_vectors(mac_bridge_tx_t txf, mac_bridge_rx_t rxf,
    mac_bridge_ref_t reff, mac_bridge_ls_t lsf)
{
	mac_bridge_tx_cb = txf;
	mac_bridge_rx_cb = rxf;
	mac_bridge_ref_cb = reff;
	mac_bridge_ls_cb = lsf;
}

int
mac_bridge_set(mac_handle_t mh, mac_handle_t link)
{
	mac_impl_t	*mip = (mac_impl_t *)mh;
	int		retv;

	mutex_enter(&mip->mi_bridge_lock);
	if (mip->mi_bridge_link == NULL) {
		mip->mi_bridge_link = link;
		retv = 0;
	} else {
		retv = EBUSY;
	}
	mutex_exit(&mip->mi_bridge_lock);
	if (retv == 0) {
		mac_poll_state_change(mh, B_FALSE);
		mac_capab_update(mh);
	}
	return (retv);
}

/*
 * Disable bridging on the indicated link.
 */
void
mac_bridge_clear(mac_handle_t mh, mac_handle_t link)
{
	mac_impl_t	*mip = (mac_impl_t *)mh;

	mutex_enter(&mip->mi_bridge_lock);
	ASSERT(mip->mi_bridge_link == link);
	mip->mi_bridge_link = NULL;
	mutex_exit(&mip->mi_bridge_lock);
	mac_poll_state_change(mh, B_TRUE);
	mac_capab_update(mh);
}

void
mac_no_active(mac_handle_t mh)
{
	mac_impl_t *mip = (mac_impl_t *)mh;

	i_mac_perim_enter(mip);
	mip->mi_state_flags |= MIS_NO_ACTIVE;
	i_mac_perim_exit(mip);
}

/*
 * Walk the primary VLAN clients whenever the primary's rings property
 * changes and update the mac_resource_props_t for the VLAN's client.
 * We need to do this since we don't support setting these properties
 * on the primary's VLAN clients, but the VLAN clients have to
 * follow the primary w.r.t. the rings property.
 */
void
mac_set_prim_vlan_rings(mac_impl_t *mip, mac_resource_props_t *mrp)
{
	mac_client_impl_t	*vmcip;
	mac_resource_props_t	*vmrp;

	for (vmcip = mip->mi_clients_list; vmcip != NULL;
	    vmcip = vmcip->mci_client_next) {
		if (!(vmcip->mci_flent->fe_type & FLOW_PRIMARY_MAC) ||
		    mac_client_vid((mac_client_handle_t)vmcip) ==
		    VLAN_ID_NONE) {
			continue;
		}
		vmrp = MCIP_RESOURCE_PROPS(vmcip);

		vmrp->mrp_nrxrings = mrp->mrp_nrxrings;
		if (mrp->mrp_mask & MRP_RX_RINGS)
			vmrp->mrp_mask |= MRP_RX_RINGS;
		else if (vmrp->mrp_mask & MRP_RX_RINGS)
			vmrp->mrp_mask &= ~MRP_RX_RINGS;

		vmrp->mrp_ntxrings = mrp->mrp_ntxrings;
		if (mrp->mrp_mask & MRP_TX_RINGS)
			vmrp->mrp_mask |= MRP_TX_RINGS;
		else if (vmrp->mrp_mask & MRP_TX_RINGS)
			vmrp->mrp_mask &= ~MRP_TX_RINGS;

		if (mrp->mrp_mask & MRP_RXRINGS_UNSPEC)
			vmrp->mrp_mask |= MRP_RXRINGS_UNSPEC;
		else
			vmrp->mrp_mask &= ~MRP_RXRINGS_UNSPEC;

		if (mrp->mrp_mask & MRP_TXRINGS_UNSPEC)
			vmrp->mrp_mask |= MRP_TXRINGS_UNSPEC;
		else
			vmrp->mrp_mask &= ~MRP_TXRINGS_UNSPEC;
	}
}

/*
 * We are adding or removing ring(s) from a group. The source for taking
 * rings is the default group. The destination for giving rings back is
 * the default group.
 */
int
mac_group_ring_modify(mac_client_impl_t *mcip, mac_group_t *group,
    mac_group_t *defgrp)
{
	mac_resource_props_t	*mrp = MCIP_RESOURCE_PROPS(mcip);
	uint_t			modify;
	int			count;
	mac_ring_t		*ring;
	mac_ring_t		*next;
	mac_impl_t		*mip = mcip->mci_mip;
	mac_ring_t		**rings;
	uint_t			ringcnt;
	int			i = 0;
	boolean_t		rx_group = group->mrg_type == MAC_RING_TYPE_RX;
	int			start;
	int			end;
	mac_group_t		*tgrp;
	int			j;
	int			rv = 0;

	/*
	 * If we are asked for just a group, we give 1 ring, else
	 * the specified number of rings.
	 */
	if (rx_group) {
		ringcnt = (mrp->mrp_mask & MRP_RXRINGS_UNSPEC) ? 1 :
		    mrp->mrp_nrxrings;
	} else {
		ringcnt = (mrp->mrp_mask & MRP_TXRINGS_UNSPEC) ? 1 :
		    mrp->mrp_ntxrings;
	}

	/* Don't allow modifying rings for a share for now. */
	ASSERT(mcip->mci_share == 0);

	if (ringcnt == group->mrg_cur_count)
		return (0);

	if (group->mrg_cur_count > ringcnt) {
		modify = group->mrg_cur_count - ringcnt;
		if (rx_group) {
			if (mip->mi_rx_donor_grp == group) {
				ASSERT(mac_is_primary_client(mcip));
				mip->mi_rx_donor_grp = defgrp;
			} else {
				defgrp = mip->mi_rx_donor_grp;
			}
		}
		ring = group->mrg_rings;
		rings = kmem_alloc(modify * sizeof (mac_ring_handle_t),
		    KM_SLEEP);
		j = 0;
		for (count = 0; count < modify; count++) {
			next = ring->mr_next;
			rv = mac_group_mov_ring(mip, defgrp, ring);
			if (rv != 0) {
				/* cleanup on failure */
				for (j = 0; j < count; j++) {
					(void) mac_group_mov_ring(mip, group,
					    rings[j]);
				}
				break;
			}
			rings[j++] = ring;
			ring = next;
		}
		kmem_free(rings, modify * sizeof (mac_ring_handle_t));
		return (rv);
	}
	if (ringcnt >= MAX_RINGS_PER_GROUP)
		return (EINVAL);

	modify = ringcnt - group->mrg_cur_count;

	if (rx_group) {
		if (group != mip->mi_rx_donor_grp) {
			defgrp = mip->mi_rx_donor_grp;
		} else {
			/*
			 * This is the donor group with all the remaining
			 * rings. The default group now gets to be the donor.
			 */
			mip->mi_rx_donor_grp = defgrp;
		}
		start = 1;
		end = mip->mi_rx_group_count;
	} else {
		start = 0;
		end = mip->mi_tx_group_count - 1;
	}
	/*
	 * If the default doesn't have any rings, let's see if we can
	 * take rings given to an h/w client that doesn't need them.
	 * For now, we just see if there is any one client that can donate
	 * all the required rings.
	 */
	if (defgrp->mrg_cur_count < (modify + 1)) {
		for (i = start; i < end; i++) {
			if (rx_group) {
				tgrp = &mip->mi_rx_groups[i];
				if (tgrp == group || tgrp->mrg_state <
				    MAC_GROUP_STATE_RESERVED) {
					continue;
				}
				if (i_mac_clients_hw(tgrp, MRP_RX_RINGS))
					continue;
				mcip = tgrp->mrg_clients->mgc_client;
				VERIFY3P(mcip, !=, NULL);
				if ((tgrp->mrg_cur_count +
				    defgrp->mrg_cur_count) < (modify + 1)) {
					continue;
				}
				if (mac_rx_switch_group(mcip, tgrp,
				    defgrp) != 0) {
					return (ENOSPC);
				}
			} else {
				tgrp = &mip->mi_tx_groups[i];
				if (tgrp == group || tgrp->mrg_state <
				    MAC_GROUP_STATE_RESERVED) {
					continue;
				}
				if (i_mac_clients_hw(tgrp, MRP_TX_RINGS))
					continue;
				mcip = tgrp->mrg_clients->mgc_client;
				VERIFY3P(mcip, !=, NULL);
				if ((tgrp->mrg_cur_count +
				    defgrp->mrg_cur_count) < (modify + 1)) {
					continue;
				}
				/* OK, we can switch this to s/w */
				mac_tx_client_quiesce(
				    (mac_client_handle_t)mcip);
				mac_tx_switch_group(mcip, tgrp, defgrp);
				mac_tx_client_restart(
				    (mac_client_handle_t)mcip);
			}
		}
		if (defgrp->mrg_cur_count < (modify + 1))
			return (ENOSPC);
	}
	if ((rv = i_mac_group_allocate_rings(mip, group->mrg_type, defgrp,
	    group, mcip->mci_share, modify)) != 0) {
		return (rv);
	}
	return (0);
}
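
/*
 * A worked example: suppose this client's group holds 2 rings and its
 * rings property is raised to 4. Then modify = 2, and the default group
 * must hold at least modify + 1 = 3 rings, since it always keeps the
 * default ring for itself. If the default group is short, the loop above
 * looks for one client that did not explicitly ask for h/w rings and
 * whose group, together with the default group, covers the shortfall;
 * that client is switched to the default group so its rings can be
 * donated. If no such client exists, ENOSPC is returned.
 */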

/*
 * Given the poolname in mac_resource_props, find the cpupart
 * that is associated with this pool. The cpupart will be used
 * later for finding the cpus to be bound to the networking threads.
 *
 * use_default is set B_TRUE if pools are enabled and pool_default
 * is returned. This avoids a 2nd lookup to set the poolname
 * for pool-effective.
 *
 * returns:
 *
 *	NULL			- pools are disabled or the 'cpus'
 *				  property is set.
 *	cpupart of pool_default	- pools are enabled and the pool
 *				  is not available or poolname is blank.
 *	cpupart of named pool	- pools are enabled and the pool
 *				  is available.
 */
cpupart_t *
mac_pset_find(mac_resource_props_t *mrp, boolean_t *use_default)
{
	pool_t		*pool;
	cpupart_t	*cpupart;

	*use_default = B_FALSE;

	/* CPUs property is set */
	if (mrp->mrp_mask & MRP_CPUS)
		return (NULL);

	ASSERT(pool_lock_held());

	/* Pools are disabled, no pset */
	if (pool_state == POOL_DISABLED)
		return (NULL);

	/* Pools property is set */
	if (mrp->mrp_mask & MRP_POOL) {
		if ((pool = pool_lookup_pool_by_name(mrp->mrp_pool)) == NULL) {
			/* Pool not found */
			DTRACE_PROBE1(mac_pset_find_no_pool, char *,
			    mrp->mrp_pool);
			*use_default = B_TRUE;
			pool = pool_default;
		}
	/* Pools property is not set */
	} else {
		*use_default = B_TRUE;
		pool = pool_default;
	}

	/* Find the CPU pset that corresponds to the pool */
	mutex_enter(&cpu_lock);
	if ((cpupart = cpupart_find(pool->pool_pset->pset_id)) == NULL) {
		DTRACE_PROBE1(mac_find_pset_no_pset, psetid_t,
		    pool->pool_pset->pset_id);
	}
	mutex_exit(&cpu_lock);

	return (cpupart);
}

void
mac_set_pool_effective(boolean_t use_default, cpupart_t *cpupart,
    mac_resource_props_t *mrp, mac_resource_props_t *emrp)
{
	ASSERT(pool_lock_held());

	if (cpupart != NULL) {
		emrp->mrp_mask |= MRP_POOL;
		if (use_default) {
			(void) strcpy(emrp->mrp_pool, "pool_default");
		} else {
			ASSERT(strlen(mrp->mrp_pool) != 0);
			(void) strcpy(emrp->mrp_pool, mrp->mrp_pool);
		}
	} else {
		emrp->mrp_mask &= ~MRP_POOL;
		bzero(emrp->mrp_pool, MAXPATHLEN);
	}
}
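
/*
 * For illustration, mac_pool_link_update() below uses these two functions
 * together, under the pool lock, to rebind a client's worker threads and
 * record the effective pool:
 *
 *	pool_lock();
 *	cpupart = mac_pset_find(mrp, &use_default);
 *	mac_fanout_setup(mcip, mcip->mci_flent, mrp,
 *	    mac_rx_deliver, mcip, NULL, cpupart);
 *	mac_set_pool_effective(use_default, cpupart, mrp, emrp);
 *	pool_unlock();
 */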

struct mac_pool_arg {
	char		mpa_poolname[MAXPATHLEN];
	pool_event_t	mpa_what;
};

/*ARGSUSED*/
static uint_t
mac_pool_link_update(mod_hash_key_t key, mod_hash_val_t *val, void *arg)
{
	struct mac_pool_arg	*mpa = arg;
	mac_impl_t		*mip = (mac_impl_t *)val;
	mac_client_impl_t	*mcip;
	mac_resource_props_t	*mrp, *emrp;
	boolean_t		pool_update = B_FALSE;
	boolean_t		pool_clear = B_FALSE;
	boolean_t		use_default = B_FALSE;
	cpupart_t		*cpupart = NULL;

	mrp = kmem_zalloc(sizeof (*mrp), KM_SLEEP);
	i_mac_perim_enter(mip);
	for (mcip = mip->mi_clients_list; mcip != NULL;
	    mcip = mcip->mci_client_next) {
		pool_update = B_FALSE;
		pool_clear = B_FALSE;
		use_default = B_FALSE;
		mac_client_get_resources((mac_client_handle_t)mcip, mrp);
		emrp = MCIP_EFFECTIVE_PROPS(mcip);

		/*
		 * When pools are enabled
		 */
		if ((mpa->mpa_what == POOL_E_ENABLE) &&
		    ((mrp->mrp_mask & MRP_CPUS) == 0)) {
			mrp->mrp_mask |= MRP_POOL;
			pool_update = B_TRUE;
		}

		/*
		 * When pools are disabled
		 */
		if ((mpa->mpa_what == POOL_E_DISABLE) &&
		    ((mrp->mrp_mask & MRP_CPUS) == 0)) {
			mrp->mrp_mask |= MRP_POOL;
			pool_clear = B_TRUE;
		}

		/*
		 * Look for links with the pool property set and the poolname
		 * matching the one which is changing.
		 */
		if (strcmp(mrp->mrp_pool, mpa->mpa_poolname) == 0) {
			/*
			 * The pool associated with the link has changed.
			 */
			if (mpa->mpa_what == POOL_E_CHANGE) {
				mrp->mrp_mask |= MRP_POOL;
				pool_update = B_TRUE;
			}
		}

		/*
		 * This link is associated with pool_default and
		 * pool_default has changed.
		 */
		if ((mpa->mpa_what == POOL_E_CHANGE) &&
		    (strcmp(emrp->mrp_pool, "pool_default") == 0) &&
		    (strcmp(mpa->mpa_poolname, "pool_default") == 0)) {
			mrp->mrp_mask |= MRP_POOL;
			pool_update = B_TRUE;
		}

		/*
		 * Get new list of cpus for the pool, bind network
		 * threads to new list of cpus and update resources.
		 */
		if (pool_update) {
			if (MCIP_DATAPATH_SETUP(mcip)) {
				pool_lock();
				cpupart = mac_pset_find(mrp, &use_default);
				mac_fanout_setup(mcip, mcip->mci_flent, mrp,
				    mac_rx_deliver, mcip, NULL, cpupart);
				mac_set_pool_effective(use_default, cpupart,
				    mrp, emrp);
				pool_unlock();
			}
			mac_update_resources(mrp, MCIP_RESOURCE_PROPS(mcip),
			    B_FALSE);
		}

		/*
		 * Clear the effective pool and bind network threads
		 * to any available CPU.
		 */
		if (pool_clear) {
			if (MCIP_DATAPATH_SETUP(mcip)) {
				emrp->mrp_mask &= ~MRP_POOL;
				bzero(emrp->mrp_pool, MAXPATHLEN);
				mac_fanout_setup(mcip, mcip->mci_flent, mrp,
				    mac_rx_deliver, mcip, NULL, NULL);
			}
			mac_update_resources(mrp, MCIP_RESOURCE_PROPS(mcip),
			    B_FALSE);
		}
	}
	i_mac_perim_exit(mip);
	kmem_free(mrp, sizeof (*mrp));
	return (MH_WALK_CONTINUE);
}

static void
mac_pool_update(void *arg)
{
	mod_hash_walk(i_mac_impl_hash, mac_pool_link_update, arg);
	kmem_free(arg, sizeof (struct mac_pool_arg));
}

/*
 * Callback function to be executed when a noteworthy pool event
 * takes place.
 */
/* ARGSUSED */
static void
mac_pool_event_cb(pool_event_t what, poolid_t id, void *arg)
{
	pool_t			*pool;
	char			*poolname = NULL;
	struct mac_pool_arg	*mpa;

	pool_lock();
	mpa = kmem_zalloc(sizeof (struct mac_pool_arg), KM_SLEEP);

	switch (what) {
	case POOL_E_ENABLE:
	case POOL_E_DISABLE:
		break;

	case POOL_E_CHANGE:
		pool = pool_lookup_pool_by_id(id);
		if (pool == NULL) {
			kmem_free(mpa, sizeof (struct mac_pool_arg));
			pool_unlock();
			return;
		}
		pool_get_name(pool, &poolname);
		(void) strlcpy(mpa->mpa_poolname, poolname,
		    sizeof (mpa->mpa_poolname));
		break;

	default:
		kmem_free(mpa, sizeof (struct mac_pool_arg));
		pool_unlock();
		return;
	}
	pool_unlock();

	mpa->mpa_what = what;

	mac_pool_update(mpa);
}
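
/*
 * mac_pool_event_cb() is expected to be registered with the pools
 * framework during MAC module initialization. A sketch, assuming the
 * pool_event_cb_register() interface and a pool_event_cb_t holding the
 * function and argument:
 *
 *	static pool_event_cb_t mac_pool_event_reg;
 *
 *	mac_pool_event_reg.pec_func = mac_pool_event_cb;
 *	mac_pool_event_reg.pec_arg = NULL;
 *	pool_event_cb_register(&mac_pool_event_reg);
 */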

/*
 * Set effective rings property. This could be called from datapath_setup/
 * datapath_teardown or set-linkprop.
 * If the group is reserved we just go ahead and set the effective rings.
 * Additionally, for TX this could mean the default group has lost/gained
 * some rings, so if the default group is reserved, we need to adjust the
 * effective rings for the default group clients. For RX, if we are working
 * with the non-default group, we just need to reset the effective props
 * for the default group clients.
 */
void
mac_set_rings_effective(mac_client_impl_t *mcip)
{
	mac_impl_t		*mip = mcip->mci_mip;
	mac_group_t		*grp;
	mac_group_t		*defgrp;
	flow_entry_t		*flent = mcip->mci_flent;
	mac_resource_props_t	*emrp = MCIP_EFFECTIVE_PROPS(mcip);
	mac_grp_client_t	*mgcp;
	mac_client_impl_t	*gmcip;

	grp = flent->fe_rx_ring_group;
	if (grp != NULL) {
		defgrp = MAC_DEFAULT_RX_GROUP(mip);
		/*
		 * If we have reserved a group, set the effective rings
		 * to the ring count in the group.
		 */
		if (grp->mrg_state == MAC_GROUP_STATE_RESERVED) {
			emrp->mrp_mask |= MRP_RX_RINGS;
			emrp->mrp_nrxrings = grp->mrg_cur_count;
		}

		/*
		 * We go through the clients in the shared group and
		 * reset the effective properties. It is possible this
		 * might have already been done for some client (i.e.
		 * if some client is being moved to a group that is
		 * already shared). The case where the default group is
		 * RESERVED is taken care of above (note in the RX side if
		 * there is a non-default group, the default group is always
		 * SHARED).
		 */
		if (grp != defgrp || grp->mrg_state == MAC_GROUP_STATE_SHARED) {
			if (grp->mrg_state == MAC_GROUP_STATE_SHARED)
				mgcp = grp->mrg_clients;
			else
				mgcp = defgrp->mrg_clients;
			while (mgcp != NULL) {
				gmcip = mgcp->mgc_client;
				emrp = MCIP_EFFECTIVE_PROPS(gmcip);
				if (emrp->mrp_mask & MRP_RX_RINGS) {
					emrp->mrp_mask &= ~MRP_RX_RINGS;
					emrp->mrp_nrxrings = 0;
				}
				mgcp = mgcp->mgc_next;
			}
		}
	}

	/* Now the TX side */
	grp = flent->fe_tx_ring_group;
	if (grp != NULL) {
		defgrp = MAC_DEFAULT_TX_GROUP(mip);

		if (grp->mrg_state == MAC_GROUP_STATE_RESERVED) {
			emrp->mrp_mask |= MRP_TX_RINGS;
			emrp->mrp_ntxrings = grp->mrg_cur_count;
		} else if (grp->mrg_state == MAC_GROUP_STATE_SHARED) {
			mgcp = grp->mrg_clients;
			while (mgcp != NULL) {
				gmcip = mgcp->mgc_client;
				emrp = MCIP_EFFECTIVE_PROPS(gmcip);
				if (emrp->mrp_mask & MRP_TX_RINGS) {
					emrp->mrp_mask &= ~MRP_TX_RINGS;
					emrp->mrp_ntxrings = 0;
				}
				mgcp = mgcp->mgc_next;
			}
		}

		/*
		 * If the group is not the default group and the default
		 * group is reserved, the ring count in the default group
		 * might have changed, update it.
		 */
		if (grp != defgrp &&
		    defgrp->mrg_state == MAC_GROUP_STATE_RESERVED) {
			gmcip = MAC_GROUP_ONLY_CLIENT(defgrp);
			emrp = MCIP_EFFECTIVE_PROPS(gmcip);
			emrp->mrp_ntxrings = defgrp->mrg_cur_count;
		}
	}
	emrp = MCIP_EFFECTIVE_PROPS(mcip);
}

/*
 * Check if the primary is in the default group. If so, see if we
 * can give it an exclusive group now that another client is
 * being configured. We take the primary out of the default group
 * because the multicast/broadcast packets for all the clients
 * will land in the default ring in the default group, which means
 * any client in the default group, even if it is the only one in
 * the group, will lose exclusive access to the rings and, hence,
 * polling.
 */
mac_client_impl_t *
mac_check_primary_relocation(mac_client_impl_t *mcip, boolean_t rxhw)
{
	mac_impl_t		*mip = mcip->mci_mip;
	mac_group_t		*defgrp = MAC_DEFAULT_RX_GROUP(mip);
	flow_entry_t		*flent = mcip->mci_flent;
	mac_resource_props_t	*mrp = MCIP_RESOURCE_PROPS(mcip);
	uint8_t			*mac_addr;
	mac_group_t		*ngrp;

	/*
	 * If the primary is not in the default group, or if it has
	 * explicitly set the RX rings property, there is nothing to
	 * do; return.
	 */
	if (flent->fe_rx_ring_group != defgrp || mrp->mrp_mask & MRP_RX_RINGS)
		return (NULL);

	/*
	 * If the new client needs an exclusive group and we
	 * don't have another for the primary, return.
	 */
	if (rxhw && mip->mi_rxhwclnt_avail < 2)
		return (NULL);

	mac_addr = flent->fe_flow_desc.fd_dst_mac;
	/*
	 * We call this when we are setting up the datapath for
	 * the first non-primary.
	 */
	ASSERT(mip->mi_nactiveclients == 2);

	/*
	 * OK, now we have the primary that needs to be relocated.
	 */
	ngrp = mac_reserve_rx_group(mcip, mac_addr, B_TRUE);
	if (ngrp == NULL)
		return (NULL);
	if (mac_rx_switch_group(mcip, defgrp, ngrp) != 0) {
		mac_stop_group(ngrp);
		return (NULL);
	}
	return (mcip);
}

void
mac_transceiver_init(mac_impl_t *mip)
{
	if (mac_capab_get((mac_handle_t)mip, MAC_CAPAB_TRANSCEIVER,
	    &mip->mi_transceiver)) {
		/*
		 * If the driver set a flag that we don't know about,
		 * warn and ignore this capability.
		 */
		if (mip->mi_transceiver.mct_flags != 0) {
			dev_err(mip->mi_dip, CE_WARN, "driver set transceiver "
			    "flags to invalid value: 0x%x, ignoring "
			    "capability", mip->mi_transceiver.mct_flags);
			bzero(&mip->mi_transceiver,
			    sizeof (mac_capab_transceiver_t));
		}
	} else {
		bzero(&mip->mi_transceiver,
		    sizeof (mac_capab_transceiver_t));
	}
}

int
mac_transceiver_count(mac_handle_t mh, uint_t *countp)
{
	mac_impl_t *mip = (mac_impl_t *)mh;

	ASSERT(MAC_PERIM_HELD(mh));

	if (mip->mi_transceiver.mct_ntransceivers == 0)
		return (ENOTSUP);

	*countp = mip->mi_transceiver.mct_ntransceivers;
	return (0);
}

int
mac_transceiver_info(mac_handle_t mh, uint_t tranid, boolean_t *present,
    boolean_t *usable)
{
	int			ret;
	mac_transceiver_info_t	info;

	mac_impl_t *mip = (mac_impl_t *)mh;

	ASSERT(MAC_PERIM_HELD(mh));

	if (mip->mi_transceiver.mct_info == NULL ||
	    mip->mi_transceiver.mct_ntransceivers == 0)
		return (ENOTSUP);

	if (tranid >= mip->mi_transceiver.mct_ntransceivers)
		return (EINVAL);

	bzero(&info, sizeof (mac_transceiver_info_t));
	if ((ret = mip->mi_transceiver.mct_info(mip->mi_driver, tranid,
	    &info)) != 0) {
		return (ret);
	}

	*present = info.mti_present;
	*usable = info.mti_usable;
	return (0);
}

int
mac_transceiver_read(mac_handle_t mh, uint_t tranid, uint_t page, void *buf,
    size_t nbytes, off_t offset, size_t *nread)
{
	int	ret;
	size_t	nr;
	mac_impl_t *mip = (mac_impl_t *)mh;

	ASSERT(MAC_PERIM_HELD(mh));

	if (mip->mi_transceiver.mct_read == NULL)
		return (ENOTSUP);

	if (tranid >= mip->mi_transceiver.mct_ntransceivers)
		return (EINVAL);

	/*
	 * All supported pages today are 256 bytes wide. Make sure offset +
	 * nbytes never exceeds that.
	 */
	if (offset < 0 || offset >= 256 || nbytes > 256 ||
	    offset + nbytes > 256)
		return (EINVAL);

	if (nread == NULL)
		nread = &nr;
	ret = mip->mi_transceiver.mct_read(mip->mi_driver, tranid, page, buf,
	    nbytes, offset, nread);
	if (ret == 0 && *nread > nbytes) {
		dev_err(mip->mi_dip, CE_PANIC, "driver wrote %lu bytes into "
		    "%lu byte sized buffer, possible memory corruption",
		    *nread, nbytes);
	}

	return (ret);
}
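
/*
 * For illustration, a consumer holding the perimeter might read the first
 * 128 bytes of one page of the first transceiver. The transceiver id and
 * page value here are hypothetical:
 *
 *	uint8_t buf[128];
 *	size_t nread;
 *	int err;
 *
 *	err = mac_transceiver_read(mh, 0, 0xa0, buf, sizeof (buf),
 *	    0, &nread);
 *
 * On success, nread holds the number of bytes the driver actually
 * returned, which the check above guarantees never exceeds sizeof (buf).
 */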

void
mac_led_init(mac_impl_t *mip)
{
	mip->mi_led_modes = MAC_LED_DEFAULT;

	if (!mac_capab_get((mac_handle_t)mip, MAC_CAPAB_LED, &mip->mi_led)) {
		bzero(&mip->mi_led, sizeof (mac_capab_led_t));
		return;
	}

	if (mip->mi_led.mcl_flags != 0) {
		dev_err(mip->mi_dip, CE_WARN, "driver set led capability "
		    "flags to invalid value: 0x%x, ignoring "
		    "capability", mip->mi_led.mcl_flags);
		bzero(&mip->mi_led, sizeof (mac_capab_led_t));
		return;
	}

	if ((mip->mi_led.mcl_modes & ~MAC_LED_ALL) != 0) {
		dev_err(mip->mi_dip, CE_WARN, "driver set led capability "
		    "supported modes to invalid value: 0x%x, ignoring "
		    "capability", mip->mi_led.mcl_modes);
		bzero(&mip->mi_led, sizeof (mac_capab_led_t));
		return;
	}
}

int
mac_led_get(mac_handle_t mh, mac_led_mode_t *supported, mac_led_mode_t *active)
{
	mac_impl_t *mip = (mac_impl_t *)mh;

	ASSERT(MAC_PERIM_HELD(mh));

	if (mip->mi_led.mcl_set == NULL)
		return (ENOTSUP);

	*supported = mip->mi_led.mcl_modes;
	*active = mip->mi_led_modes;

	return (0);
}

/*
 * Update and multiplex the various LED requests. We only ever send one LED to
 * the underlying driver at a time. As such, we end up multiplexing all
 * requested states and picking one to send down to the driver.
 */
int
mac_led_set(mac_handle_t mh, mac_led_mode_t desired)
{
	int		ret;
	mac_led_mode_t	driver;

	mac_impl_t *mip = (mac_impl_t *)mh;

	ASSERT(MAC_PERIM_HELD(mh));

	/*
	 * A desired value of zero is a request to reset to our default
	 * value.
	 */
	if (desired == 0)
		desired = MAC_LED_DEFAULT;

	if (mip->mi_led.mcl_set == NULL)
		return (ENOTSUP);

	/*
	 * Catch both values that we don't know about and those that the
	 * driver doesn't support.
	 */
	if ((desired & ~MAC_LED_ALL) != 0)
		return (EINVAL);

	if ((desired & ~mip->mi_led.mcl_modes) != 0)
		return (ENOTSUP);

	/*
	 * If we have the same value, then there is nothing to do.
	 */
	if (desired == mip->mi_led_modes)
		return (0);

	/*
	 * Based on the desired value, determine what to send to the driver.
	 * We only ever send a single bit to the driver at any given time.
	 * IDENT takes priority over OFF or ON, and OFF takes priority over
	 * the rest.
	 */
	if (desired & MAC_LED_IDENT) {
		driver = MAC_LED_IDENT;
	} else if (desired & MAC_LED_OFF) {
		driver = MAC_LED_OFF;
	} else if (desired & MAC_LED_ON) {
		driver = MAC_LED_ON;
	} else {
		driver = MAC_LED_DEFAULT;
	}

	if ((ret = mip->mi_led.mcl_set(mip->mi_driver, driver, 0)) == 0) {
		mip->mi_led_modes = desired;
	}

	return (ret);
}
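
/*
 * For example, an administrative consumer that wants to blink a port for
 * identification, and later restore the default behaviour, could issue
 * (with the perimeter held):
 *
 *	(void) mac_led_set(mh, MAC_LED_IDENT);
 *	...
 *	(void) mac_led_set(mh, 0);	restores MAC_LED_DEFAULT
 */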
8861 */ 8862 if (desired & MAC_LED_IDENT) { 8863 driver = MAC_LED_IDENT; 8864 } else if (desired & MAC_LED_OFF) { 8865 driver = MAC_LED_OFF; 8866 } else if (desired & MAC_LED_ON) { 8867 driver = MAC_LED_ON; 8868 } else { 8869 driver = MAC_LED_DEFAULT; 8870 } 8871 8872 if ((ret = mip->mi_led.mcl_set(mip->mi_driver, driver, 0)) == 0) { 8873 mip->mi_led_modes = desired; 8874 } 8875 8876 return (ret); 8877 } 8878 8879 /* 8880 * Send packets through the Tx ring ('mrh') or through the default 8881 * handler if no ring is specified. Before passing the packet down to 8882 * the MAC provider, emulate any hardware offloads which have been 8883 * requested but are not supported by the provider. 8884 */ 8885 mblk_t * 8886 mac_ring_tx(mac_handle_t mh, mac_ring_handle_t mrh, mblk_t *mp) 8887 { 8888 mac_impl_t *mip = (mac_impl_t *)mh; 8889 8890 if (mrh == NULL) 8891 mrh = mip->mi_default_tx_ring; 8892 8893 if (mrh == NULL) 8894 return (mip->mi_tx(mip->mi_driver, mp)); 8895 else 8896 return (mac_hwring_tx(mrh, mp)); 8897 } 8898 8899 /* 8900 * This is the final stop before reaching the underlying MAC provider. 8901 * This is also where the bridging hook is inserted. Packets that are 8902 * bridged will return through mac_bridge_tx(), with rh nulled out if 8903 * the bridge chooses to send output on a different link due to 8904 * forwarding. 8905 */ 8906 mblk_t * 8907 mac_provider_tx(mac_impl_t *mip, mac_ring_handle_t rh, mblk_t *mp, 8908 mac_client_impl_t *mcip) 8909 { 8910 /* 8911 * If there is a bound Hybrid I/O share, send packets through 8912 * the default tx ring. When there's a bound Hybrid I/O share, 8913 * the tx rings of this client are mapped in the guest domain 8914 * and not accessible from here. 8915 */ 8916 if (mcip->mci_state_flags & MCIS_SHARE_BOUND) 8917 rh = mip->mi_default_tx_ring; 8918 8919 if (mip->mi_promisc_list != NULL) 8920 mac_promisc_dispatch(mip, mp, mcip, B_FALSE); 8921 8922 if (mip->mi_bridge_link == NULL) 8923 return (mac_ring_tx((mac_handle_t)mip, rh, mp)); 8924 else 8925 return (mac_bridge_tx(mip, rh, mp)); 8926 } 8927