.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2016 Joyent, Inc.
.\"
.Dd March 26, 2017
.Dt MAC 9E
.Os
.Sh NAME
.Nm mac ,
.Nm GLDv3
.Nd MAC networking device driver overview
.Sh SYNOPSIS
.In sys/mac_provider.h
.In sys/mac_ether.h
.Sh INTERFACE LEVEL
illumos DDI specific
.Sh DESCRIPTION
The
.Sy MAC
framework provides a means for implementing high-performance networking
device drivers. It is the successor to the GLD interfaces and is
sometimes referred to as the GLDv3. The remainder of this manual
introduces the aspects of writing device drivers that leverage the MAC
framework. While both the GLDv3 and MAC framework refer to the same thing, in
this manual page we use the term
.Em MAC framework
to refer to the device driver interface.
.Pp
MAC device drivers are character devices. They define the standard
.Xr _init 9E ,
.Xr _fini 9E ,
and
.Xr _info 9E
entry points to initialize the module, as well as
.Xr dev_ops 9S
and
.Xr cb_ops 9S
structures.
.Pp
The main interface with MAC is through a series of callbacks defined in
a
.Xr mac_callbacks 9S
structure. These callbacks control all the aspects of the device. They
range from sending data and getting and setting
properties to controlling MAC address filters and managing
promiscuous mode.
.Pp
The MAC framework takes care of many aspects of the device driver's
management. A device that uses the MAC framework does not have to worry
about creating device nodes or implementing
.Xr open 9E
or
.Xr close 9E
routines.
In addition, all of the work to interact with
.Xr dlpi 7P
is taken care of automatically and transparently.
.Ss Initializing MAC Support
For a device to be used in the framework, it must register with the
framework and take specific actions during
.Xr _init 9E ,
.Xr attach 9E ,
.Xr detach 9E ,
and
.Xr _fini 9E .
.Pp
All device drivers have to define a
.Xr dev_ops 9S
structure which is pointed to by a
.Xr modldrv 9S
structure and the corresponding NULL-terminated
.Xr modlinkage 9S
structure. The
.Xr dev_ops 9S
structure should have a
.Xr cb_ops 9S
structure defined for it; however, it does not need to implement any of
the standard
.Xr cb_ops 9S
entry points.
.Pp
Normally, in a driver's
.Xr _init 9E
entry point, it passes its
.Sy modlinkage
structure directly to
.Xr mod_install 9F .
To properly register with MAC, the driver must call
.Xr mac_init_ops 9F
before it calls
.Xr mod_install 9F .
If for some reason the
.Xr mod_install 9F
function fails, then the driver must undo this by calling
.Xr mac_fini_ops 9F .
.Pp
Conversely, in the driver's
.Xr _fini 9E
routine, it should call
.Xr mac_fini_ops 9F
after it successfully calls
.Xr mod_remove 9F .
For an example of how to use the
.Xr mac_init_ops 9F
and
.Xr mac_fini_ops 9F
functions, see the examples section in
.Xr mac_init_ops 9F .
.Ss Registering with MAC
Every instance of a device should register separately with MAC.
To register with MAC, a driver must allocate a
.Xr mac_register 9S
structure, fill it in, and then call
.Xr mac_register 9F .
The
.Sy mac_register_t
structure contains information about the device and all of the required
function pointers that will be used as callbacks by the framework.
.Pp
These steps should all be taken during a device's
.Xr attach 9E
entry point.
It is recommended that the driver perform this sequence of
steps after the device has finished its initialization of the chipset
and interrupts, though interrupts should not be enabled at that point.
After it calls
.Xr mac_register 9F
it will start receiving callbacks from the MAC framework.
.Pp
To allocate the registration structure, the driver should call
.Xr mac_alloc 9F .
Device drivers should generally always pass the symbol
.Sy MAC_VERSION
as the argument to
.Xr mac_alloc 9F .
Upon successful completion, the driver will receive a
.Sy mac_register_t
structure which it should fill in. The structure and its members are
documented in
.Xr mac_register 9S .
.Pp
The
.Xr mac_callbacks 9S
structure is not allocated as a part of the
.Xr mac_register 9S
structure. In general, device drivers declare this statically. See the
.Sx MAC Callbacks
section for more information on how to fill it out.
.Pp
Once the structure has been filled in, the driver should call
.Xr mac_register 9F
to register itself with MAC. The handle that it uses to register with
should be part of the driver's soft state. It will be used in various
other support functions and callbacks.
.Pp
If the call is successful, then the device driver
should enable interrupts and finish any other initialization required.
If the call to
.Xr mac_register 9F
failed, then it should unwind its initialization and should return
.Sy DDI_FAILURE
from its
.Xr attach 9E
routine.
.Ss MAC Callbacks
The MAC framework interacts with a device driver through a series of
callbacks. These callbacks are described in their individual manual
pages and the collection of callbacks is indicated in the
.Xr mac_callbacks 9S
manual page. This section does not focus on the specific functions, but
rather on interactions between them and the rest of the device driver
framework.
.Pp
A device driver should make no assumptions about when the various
callbacks will be called and whether or not they will be called
simultaneously. For example, a device driver may be asked to
transmit data through a call to its
.Xr mc_tx 9E
entry point while it is being asked to get a device property through a
call to its
.Xr mc_getprop 9E
entry point. As such, while some calls may be serialized to the device,
such as setting properties, the device driver should always presume that
all of its data needs to be protected with locks. While the device is
holding locks, it is safe for it to call the following MAC routines:
.Bl -bullet -offset indent -compact
.It
.Xr mac_hcksum_get 9F
.It
.Xr mac_hcksum_set 9F
.It
.Xr mac_lso_get 9F
.It
.Xr mac_maxsdu_update 9F
.It
.Xr mac_prop_info_set_default_link_flowctrl 9F
.It
.Xr mac_prop_info_set_default_str 9F
.It
.Xr mac_prop_info_set_default_uint8 9F
.It
.Xr mac_prop_info_set_default_uint32 9F
.It
.Xr mac_prop_info_set_default_uint64 9F
.It
.Xr mac_prop_info_set_perm 9F
.It
.Xr mac_prop_info_set_range_uint32 9F
.El
.Pp
Any other MAC related routines should not be called with locks held,
such as
.Xr mac_link_update 9F
or
.Xr mac_rx 9F .
Other routines in the DDI may be called while locks are held; however,
device driver writers should be careful about calling blocking routines
while locks are held or in interrupt context, though it is generally
legal to do so.
.Ss Receiving Data
A device driver will often receive data through the means of an
interrupt. When that interrupt occurs, the device driver will receive
one or more frames with optional metadata. Often each frame has a
corresponding descriptor which has information about whether or not
there were errors or whether or not the device successfully checksummed
the packet.
.Pp
During a single interrupt, a device driver should process a fixed number
of frames. For each frame the device driver should:
.Bl -enum -offset indent
.It
First check whether or not the frame has errors. If errors were
detected, then the frame should not be sent to the operating system. It
is recommended that devices keep kstats (see
.Xr kstat_create 9F
for more information) and bump the counter whenever such an error is
detected. If the device distinguishes between the types of errors, then
separate kstats for each class of error are recommended. See the
.Sx STATISTICS
section for more information on the various error cases that should be
considered.
.It
Once the frame has been determined to be valid, the device driver should
transform the frame into a
.Xr mblk 9S .
See the section
.Sx MBLKS AND DMA
for more information on how to transform and prepare a message block.
.It
If the device supports hardware checksumming (see the
.Sx CAPABILITIES
section for more information on checksumming), then the device driver
should set the corresponding checksumming information with a call to
.Xr mac_hcksum_set 9F .
.It
It should then append this new message block to the
.Em end
of the message block chain, linking it to the
.Sy b_next
pointer. It is vitally important that all the frames be chained in the
order that they were received. If the device driver mistakenly reorders
frames, then it may cause performance impacts in the TCP stack and
potentially impact application correctness.
.El
.Pp
Once all the frames have been processed and assembled, the device driver
should deliver them to the rest of the operating system by calling
.Xr mac_rx 9F .
The device driver should try to give as many mblk_t structures to the
system at once as possible. It
.Em should not
call
.Xr mac_rx 9F
once for every assembled mblk_t.
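.Pp
The per-frame steps above can be sketched as follows.
This is a userland model rather than driver code: the
.Sy frame_t
and
.Sy msg_t
types, the
.Sy rx_collect
function, and the
.Sy RX_BUDGET
limit are all hypothetical, with
.Sy msg_t
standing in for an mblk_t and modeling only its
.Sy b_next
pointer.
A real driver would build message blocks as described in
.Sx MBLKS AND DMA
and would hand the finished chain to
.Xr mac_rx 9F
in a single call.
.Bd -literal -offset indent
#include <assert.h>
#include <stddef.h>

#define	RX_BUDGET	4	/* hypothetical per-interrupt limit */

/* Hypothetical stand-in for a hardware receive descriptor. */
typedef struct frame {
	int f_id;	/* identifies the frame in this model */
	int f_error;	/* non-zero if hardware flagged an error */
} frame_t;

/* Hypothetical stand-in for mblk_t; only b_next is modeled. */
typedef struct msg {
	struct msg *b_next;
	int m_id;
} msg_t;

/*
 * Walk at most RX_BUDGET frames. Frames with errors are counted and
 * skipped; good frames are appended to the tail of the chain so that
 * arrival order is preserved. Returns the head of the chain, which
 * the caller would deliver with one upcall.
 */
static msg_t *
rx_collect(const frame_t *frames, size_t nframes, msg_t *pool,
    unsigned int *errcount)
{
	msg_t *head = NULL;
	msg_t **tailp = &head;
	size_t i;

	for (i = 0; i < nframes && i < RX_BUDGET; i++) {
		if (frames[i].f_error != 0) {
			(*errcount)++;	/* models bumping a kstat */
			continue;
		}
		pool[i].m_id = frames[i].f_id;
		pool[i].b_next = NULL;
		*tailp = &pool[i];
		tailp = &pool[i].b_next;
	}
	return (head);
}
.Ed
.Pp
Keeping a pointer to the tail's
.Sy b_next
field makes each append constant time and keeps frames in the order in
which they were received.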
.Pp
The device driver must not hold any locks across the call to
.Xr mac_rx 9F .
When this function is called, received data will be pushed through the
networking stack and some replies may be generated and given to the
driver to send out.
.Pp
It is not the device driver's responsibility to determine whether or not
the system can keep up with a driver's delivery rate of frames. The rest
of the networking stack will handle issues related to keeping up
appropriately and ensure that kernel memory is not exhausted by packets
that are not being processed.
.Pp
Finally, the device driver should make sure that any other housekeeping
activities required for the ring are taken care of such that more data
can be received.
.Ss Transmitting Data and Back Pressure
A device driver will be asked to transmit a message block chain by
having its
.Xr mc_tx 9E
entry point called. While the driver is processing the message blocks,
it may run out of resources. For example, a transmit descriptor ring may
become full. At that point, the device driver should return the
remaining unprocessed frames. The act of returning frames indicates that
the device has asserted flow control.
Once this has been done, no additional calls will be made to the
driver's transmit entry point and the back pressure will be propagated
throughout the rest of the networking stack.
.Pp
At some point in the future when resources have become available again,
for example after an interrupt indicating that some portion of the
transmit ring has been sent, then the device driver must notify the
system that it can continue transmission. To do this, the
driver should call
.Xr mac_tx_update 9F .
After that point, the driver will receive calls to its
.Xr mc_tx 9E
entry point again.
As mentioned in the section on callbacks, the device
driver should avoid holding any particular locks across the call to
.Xr mac_tx_update 9F .
.Ss Interrupt Coalescing
For devices operating at higher data rates, interrupt coalescing is an
important part of a well functioning device and may impact the
performance of the device. Not all devices support interrupt
coalescing. If interrupt coalescing is supported on the device, it is
recommended that device driver writers provide private properties for
their device to control the interrupt coalescing rate. This will make it
much easier to perform experiments and observe the impact of different
interrupt rates on the rest of the system.
.Ss MAC Address Filter Management
The MAC framework will attempt to use as many MAC address filters as a
device has. To program a multicast address filter, the driver's
.Xr mc_multicst 9E
entry point will be called. If the device driver runs out of filters, it
should not take any special action and just return the appropriate error
as documented in the corresponding manual pages for the entry points.
The framework will ensure that the device is placed in promiscuous mode
if it needs to.
.Ss Link Updates
It is the responsibility of the device driver to keep track of the
data link's state. Many devices provide a means of receiving an
interrupt when the state of the link changes. When such a change
happens, the driver should update its internal data structures and then
call
.Xr mac_link_update 9F
to inform the MAC layer that this has occurred. If the device driver
does not properly inform the system about link changes, then various
features like link aggregations and other mechanisms that leverage the
link state will not work correctly.
.Ss Link Speed and Auto-negotiation
Many networking devices support more than one possible speed that they
can operate at.
The selection of a speed is often performed through
.Em auto-negotiation ,
though some devices allow the user to control what speeds are advertised
and used.
.Pp
Logically, there are two different sets of things that the device driver
needs to keep track of while it's operating:
.Bl -enum
.It
The supported speeds in hardware.
.It
The enabled speeds from the user.
.El
.Pp
By default, when a link first comes up, the device driver should
generally configure the link to support the common set of speeds and
perform auto-negotiation.
.Pp
A user can control what speeds a device advertises via auto-negotiation
and whether or not it performs auto-negotiation at all by using a series
of properties that have
.Sy _EN_
in the name. These are read/write properties and there is one for each
speed supported in the operating system. For a full list of them, see
the
.Sx PROPERTIES
section.
.Pp
In addition to these properties, there is a corresponding set of
properties with
.Sy _ADV_
in the name. These are similar to the
.Sy _EN_
family of properties, but they are read-only and indicate what the
device has actually negotiated. While they are generally similar to the
.Sy _EN_
family of properties, they may change depending on power settings. See
the
.Sy Ethernet Link Properties
section in
.Xr dladm 1M
for more information.
.Pp
It's worth discussing how these different values get used throughout the
different entry points. The first entry point to consider is the
.Xr mc_propinfo 9E
entry point. For a given speed, the driver should consult whether or not
the hardware supports this speed. If it does, it should fill in the
default value that the hardware takes and indicate whether or not the
property is writable.
This holds for both the
.Sy _EN_
and
.Sy _ADV_
family of properties.
.Pp
The next entry point is
.Xr mc_getprop 9E .
Here, the device should first consult whether the given speed is
supported. If it is not, then the driver should return
.Er ENOTSUP .
If it is, then it should return the current value of the property.
.Pp
The last property entry point is
.Xr mc_setprop 9E .
Here, the same logic applies. Before the driver considers
whether or not the property is writable, it should first check whether
or not it's a supported property. If it's not, then it should return
.Er ENOTSUP .
Otherwise, it should proceed to check whether the property is writable
and, if it is and the value is valid, then it should update the property
and restart the link's negotiation.
.Pp
Finally, there is the
.Xr mc_getstat 9E
entry point. Several of the statistics that are queried relate to
auto-negotiation and hardware capabilities. When a statistic relates to
the hardware supporting a given speed, the
.Sy _EN_
properties should be ignored. The only thing that should be consulted is
what the hardware itself supports. Otherwise, the statistics should look
at what is currently being advertised by the device.
.Ss Unregistering from MAC
During a driver's
.Xr detach 9E
routine, it should unregister the device instance from MAC by calling
.Xr mac_unregister 9F
on the handle that it received when it registered. If the call to
.Xr mac_unregister 9F
failed, then the device is likely still in use and the driver should
fail the call to
.Xr detach 9E .
.Ss Interacting with Devices
Administrators always interact with devices through the
.Xr dladm 1M
command line interface.
The state of devices, such as whether the link is
considered
.Sy up
or
.Sy down ,
and various link properties, such as the
.Sy MTU ,
.Sy auto-negotiation
state,
and
.Sy flow control
state,
are all exposed. It is also the preferred way that these properties are
set and configured.
.Pp
While device tunables may be presented in a
.Xr driver.conf 4
file, it is recommended instead to expose such things through
.Xr dladm 1M
private properties, whether explicitly documented or not.
.Sh CAPABILITIES
Capabilities in the MAC framework describe optional hardware features
that a device supports. The two current capabilities that the system
supports are the ability of hardware to perform large send offload (LSO),
often also known as TCP segmentation offload, and the ability of hardware
to calculate and verify the checksums present in IPv4, IPv6, and protocol
headers such as TCP and UDP.
.Pp
The MAC framework will query a device for support of a capability
through the
.Xr mc_getcapab 9E
function. Each capability has its own constant and may have
corresponding data that goes along with it and a specific structure that
the device is required to fill in. Note, the set of capabilities changes
over time and there are also private capabilities in the system. Several
of the capabilities are used in the implementation of the MAC framework.
Others, like
.Sy MAC_CAPAB_RINGS ,
represent features that have not been stabilized and thus neither API nor
binary compatibility for them is guaranteed. It is important that
the device driver handles unknown capabilities correctly. For more
information, see
.Xr mc_getcapab 9E .
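.Pp
One way to handle unknown capabilities is to answer only for the
capabilities that the driver explicitly recognizes and to report
everything else as unsupported.
The sketch below is a userland model under stated assumptions: the
.Sy capab_t
enumeration, the
.Sy CAPAB_
and
.Sy CKSUM_
constants, the
.Sy example_softc_t
structure, and the
.Sy example_getcapab
function are hypothetical stand-ins for the real definitions and for a
driver's
.Xr mc_getcapab 9E
entry point, and their values are placeholders.
.Bd -literal -offset indent
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for capability identifiers. */
typedef enum capab {
	CAPAB_HCKSUM = 1,
	CAPAB_LSO,
	CAPAB_SOMETHING_NEW	/* models a capability added later */
} capab_t;

/* Hypothetical stand-ins for transmit checksum flags. */
#define	CKSUM_PARTIAL	0x1u
#define	CKSUM_IPHDR	0x2u

/* Hypothetical soft state with a tunable for tx checksumming. */
typedef struct example_softc {
	bool es_tx_cksum_enable;
} example_softc_t;

/*
 * Models mc_getcapab(9E): fill in cap_data and return true only for
 * capabilities that are recognized and enabled. Anything else,
 * including identifiers the driver has never heard of, returns false
 * without touching cap_data.
 */
static bool
example_getcapab(void *arg, capab_t capab, void *cap_data)
{
	example_softc_t *sc = arg;

	switch (capab) {
	case CAPAB_HCKSUM:
		if (!sc->es_tx_cksum_enable)
			return (false);
		*(uint32_t *)cap_data = CKSUM_PARTIAL | CKSUM_IPHDR;
		return (true);
	default:
		return (false);
	}
}
.Ed
.Pp
The default case is what keeps the driver correct when the system later
asks about a capability that did not exist when the driver was written.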
.Pp
The following capabilities are
stable and defined in the system:
.Ss MAC_CAPAB_HCKSUM
The
.Sy MAC_CAPAB_HCKSUM
capability indicates to the system that the device driver supports some
amount of checksumming. The specific data for this capability is a
pointer to a
.Sy uint32_t .
To indicate no support for any kind of checksumming, the driver should
either set this value to zero or simply return that it doesn't support
the capability.
.Pp
Note, the values that the driver declares in this capability indicate
what it can do when it transmits data. If the driver can only
verify checksums when receiving data, then it should not indicate that
it supports this capability. The following set of flags may be combined
through a bitwise inclusive OR:
.Bl -tag -width Ds
.It Sy HCKSUM_INET_PARTIAL
This indicates that the hardware can calculate a partial checksum for
both IPv4 and IPv6; however, it requires the pseudo-header checksum be
calculated for it. The pseudo-header checksum will be available for the
mblk_t when calling
.Xr mac_hcksum_get 9F .
Note this does not imply that the hardware is capable of calculating the
IPv4 header checksum. That should be indicated with the
.Sy HCKSUM_IPHDRCKSUM
flag.
.It Sy HCKSUM_INET_FULL_V4
This indicates that the hardware will fully calculate the L4 checksum
for outgoing IPv4 packets and does not require a pseudo-header checksum.
Note this does not imply that the hardware is capable of calculating the
IPv4 header checksum. That should be indicated with the
.Sy HCKSUM_IPHDRCKSUM
flag.
.It Sy HCKSUM_INET_FULL_V6
This indicates that the hardware will fully calculate the L4 checksum
for outgoing IPv6 packets and does not require a pseudo-header checksum.
.It Sy HCKSUM_IPHDRCKSUM
This indicates that the hardware supports calculating the checksum for
the IPv4 header itself.
.El
.Pp
When in a driver's transmit function, the driver will be processing a
single frame. It should call
.Xr mac_hcksum_get 9F
to see what checksum flags are set on it. Note that the flags that are
set on it are different from the ones described above and are documented
in its manual page. These flags indicate how the driver is expected to
program the hardware and what checksumming is required. Not all frames
will require hardware checksumming or will ask the hardware to checksum
them.
.Pp
If a driver supports offloading receive checksum verification,
it should check to see what the hardware indicated was verified. The
driver should then call
.Xr mac_hcksum_set 9F .
The flags used are different from the ones above and are discussed in
detail in the
.Xr mac_hcksum_set 9F
manual page. If there is no checksum information available or the driver
does not support checksumming, then it should simply not call
.Xr mac_hcksum_set 9F .
.Pp
Note that the checksum flags should be set on the first
mblk_t that makes up a given message. In other words, if multiple
mblk_t structures are linked together by the
.Sy b_cont
member to describe a single frame, then
.Xr mac_hcksum_set 9F
should only be called on the
first mblk_t of that set. However, each distinct message should have the
checksum bits set on it, if applicable. In other words, each mblk_t that
is linked together by the
.Sy b_next
pointer may have checksum flags set.
.Pp
It is recommended that device drivers provide a private property or
.Xr driver.conf 4
property to control whether or not checksumming is enabled for both rx
and tx; however, the default disposition is recommended to be enabled
for both. This way if hardware bugs are found in the checksumming
implementation, they can be disabled without requiring software updates.
The transmit property should be checked when determining how to reply to
.Xr mc_getcapab 9E
and the receive property should be checked in the context of the receive
function.
.Ss MAC_CAPAB_LSO
The
.Sy MAC_CAPAB_LSO
capability indicates that the driver supports various forms of large
send offload (LSO). The private data is a pointer to a
.Sy mac_capab_lso_t
structure. At the moment, LSO support is limited to TCP inside of IPv4.
This structure has the following members which are used to indicate
various types of LSO support.
.Bd -literal -offset indent
t_uscalar_t		lso_flags;
lso_basic_tcp_ipv4_t	lso_basic_tcp_ipv4;
.Ed
.Pp
The
.Sy lso_flags
member is used to indicate which members are valid and should be
considered. Each flag represents a different form of LSO. The member
should be set to the bitwise inclusive OR of the following values:
.Bl -tag -width Dv -offset indent
.It Sy LSO_TX_BASIC_TCP_IPV4
This indicates hardware support for performing TCP segmentation
offloading over IPv4. When this flag is set, the
.Sy lso_basic_tcp_ipv4
member must be filled in.
.El
.Pp
The
.Sy lso_basic_tcp_ipv4
member is a structure with the following members:
.Bd -literal -offset indent
t_uscalar_t	lso_max;
.Ed
.Bd -filled -offset indent
The
.Sy lso_max
member should be set to the maximum size of the TCP data
payload that can be offloaded to the hardware.
.Ed
.Pp
As with checksumming, it is recommended that driver writers provide a
means for disabling the support of LSO even if it is enabled by default.
This allows issues with LSO to be worked around without requiring
additional driver work.
.Sh PROPERTIES
Properties in the MAC framework represent aspects of a link. These
include things like the link's current state and MTU.
Many of the
properties in the system are focused around auto-negotiation and
controlling what link speeds are advertised. Information about
properties is covered by three different device entry points. The
.Xr mc_propinfo 9E
entry point obtains metadata about the property. The
.Xr mc_getprop 9E
entry point obtains the property. The
.Xr mc_setprop 9E
entry point updates the property to a new value.
.Pp
Many of the properties listed below are read-only. Each property
indicates whether it's read-only or it's read/write. However, driver
writers may not implement the ability to set all writable properties.
Many of these depend on the card itself. In particular, all properties
that relate to auto-negotiation and are read/write may not be updated
if the hardware in question does not support toggling what link speeds
are auto-negotiated. While copper Ethernet often does not have this
restriction, it often exists with various fiber standards and phys.
.Pp
The following properties are the subset of MAC framework properties that
driver writers should be aware of and handle. While other properties
exist in the system, driver writers should always return an error when a
property not listed below is encountered. See
.Xr mc_getprop 9E
and
.Xr mc_setprop 9E
for more information on how to handle them.
.Bl -hang -width Ds
.It Sy MAC_PROP_DUPLEX
.Bd -filled -compact
Type:
.Sy link_duplex_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_DUPLEX
property is used to indicate whether or not the link is duplex. A duplex
link may have traffic flowing in both directions at the same time. The
.Sy link_duplex_t
is an enumeration which may be set to any of the following values:
.Bl -tag -width Ds
.It Sy LINK_DUPLEX_UNKNOWN
The current state of the link is unknown. This may be because the link
has not negotiated to a specific speed or it is down.
.It Sy LINK_DUPLEX_HALF
The link is running at half duplex. Communication may travel in only one
direction on the link at a given time.
.It Sy LINK_DUPLEX_FULL
The link is running at full duplex. Communication may travel in both
directions on the link simultaneously.
.El
.It Sy MAC_PROP_SPEED
.Bd -filled -compact
Type:
.Sy uint64_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_SPEED
property stores the current link speed in bits per second. A link
that is running at 100 Mbit/s would store the value 100000000ULL. A link
that is running at 40 Gbit/s would store the value 40000000000ULL.
.It Sy MAC_PROP_STATUS
.Bd -filled -compact
Type:
.Sy link_state_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_STATUS
property is used to indicate the current state of the link. It indicates
whether the link is up or down. The
.Sy link_state_t
is an enumeration which may be set to any of the following values:
.Bl -tag -width Ds
.It Sy LINK_STATE_UNKNOWN
The current state of the link is unknown. This may be because the
driver's
.Xr mc_start 9E
entry point has not been called so it has not attempted to start the link.
.It Sy LINK_STATE_DOWN
The link is down. This may be because of a negotiation problem, a cable
problem, or some other device specific issue.
.It Sy LINK_STATE_UP
The link is up. If auto-negotiation is in use, it should have completed.
Traffic should be able to flow over the link, barring other issues.
.El
.It Sy MAC_PROP_AUTONEG
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_AUTONEG
property indicates whether or not the device is currently configured to
perform auto-negotiation. A value of
.Sy 0
indicates that auto-negotiation is disabled. A
.Sy non-zero
value indicates that auto-negotiation is enabled.
Devices should
generally default to enabling auto-negotiation.
.Pp
When getting this property, the device driver should return the current
state. When setting this property, if the device supports operating in
the requested mode, then the device driver should reset the link to
negotiate to the new speed after updating any internal registers.
.It Sy MAC_PROP_MTU
.Bd -filled -compact
Type:
.Sy uint32_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_MTU
property determines the maximum transmission unit (MTU). This indicates
the maximum size packet that the device can transmit, ignoring its own
headers. For an Ethernet device, this would exclude the size of the
Ethernet header and any VLAN headers that would be placed. It is up to
the driver to ensure that any MTU values that it accepts, when adding in
its margin and header sizes, do not exceed its maximum frame size.
.Pp
By default, drivers for Ethernet should initialize this value and the
MTU to
.Sy 1500 .
When getting this property, the driver should return its current
recorded MTU. When setting this property, the driver should first
validate that it is within the device's valid range and then it must
call
.Xr mac_maxsdu_update 9F .
Note that the call may fail. If the call completes successfully, the
driver should update the hardware with the new value of the MTU and
perform any other work needed to handle it.
.Pp
If the device does not support changing the MTU after the device's
.Xr mc_start 9E
entry point has been called, then driver writers should return
.Er EBUSY .
.It Sy MAC_PROP_FLOWCTRL
.Bd -filled -compact
Type:
.Sy link_flowctrl_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_FLOWCTRL
property manages the configuration of pause frames as part of Ethernet
flow control. Note, this only describes what this device will advertise.
What is actually enabled may be different and is subject to the rules of
auto-negotiation. The
.Sy link_flowctrl_t
is an enumeration that may be set to one of the following values:
.Bl -tag -width Ds
.It Sy LINK_FLOWCTRL_NONE
Flow control is disabled. No pause frames should be generated or
honored.
.It Sy LINK_FLOWCTRL_RX
The device can receive pause frames; however, it should not generate
them.
.It Sy LINK_FLOWCTRL_TX
The device can generate pause frames; however, it does not support
receiving them.
.It Sy LINK_FLOWCTRL_BI
The device supports both sending and receiving pause frames.
.El
.Pp
When getting this property, the device driver should return the way that
it has configured the device, not what the device has actually
negotiated. When setting the property, it should update the hardware and
allow the link to potentially perform auto-negotiation again.
.El
.Pp
The remaining properties are all about various auto-negotiation link
speeds. They fall into two different buckets: properties with
.Sy _ADV_
in the name and properties with
.Sy _EN_
in the name. For any given supported speed, there is one of each. The
.Sy _EN_
set of properties are read/write properties that control what should be
advertised by the device. When these are retrieved, they should return
the current value of the property. When they are set, they should change
how the hardware advertises the specific speed and trigger any kind of
link reset and auto-negotiation, if enabled, to occur.
.Pp
The
.Sy _ADV_
set of properties are read-only properties. They are meant to reflect
what has actually been negotiated. These may be different from the
.Sy _EN_
family of properties, especially when different power management
settings are at play.
.Pp
See the
.Sx Link Speed and Auto-negotiation
section for more information.
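.Pp
The split between the
.Sy _EN_
and
.Sy _ADV_
families can be modeled as shown below.
This is a userland sketch, not driver code: the
.Sy speed_prop_t
type and the
.Sy speed_getprop
and
.Sy speed_setprop
functions are hypothetical.
Returning
.Er ENOTSUP
for an attempt to set a read-only
.Sy _ADV_
property, and having the advertised value immediately follow the enabled
value, are simplifying assumptions; see
.Xr mc_setprop 9E
for the errors that a driver should actually return.
.Bd -literal -offset indent
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-speed state kept by a driver. */
typedef struct speed_prop {
	bool sp_hw_supported;	/* hardware can run at this speed */
	uint8_t sp_en;		/* _EN_ value: what the user enabled */
	uint8_t sp_adv;		/* _ADV_ value: what is advertised */
	unsigned int sp_resets;	/* counts modeled renegotiations */
} speed_prop_t;

/* Get either family; unsupported speeds fail with ENOTSUP. */
static int
speed_getprop(const speed_prop_t *sp, bool is_adv, uint8_t *valp)
{
	if (!sp->sp_hw_supported)
		return (ENOTSUP);
	*valp = is_adv ? sp->sp_adv : sp->sp_en;
	return (0);
}

/*
 * Set a property. Hardware support is checked before writability.
 * Only the _EN_ family is writable; a successful set models a
 * restart of negotiation, after which the advertised value follows.
 */
static int
speed_setprop(speed_prop_t *sp, bool is_adv, uint8_t val)
{
	if (!sp->sp_hw_supported)
		return (ENOTSUP);
	if (is_adv)
		return (ENOTSUP);	/* assumption: read-only */
	sp->sp_en = (val != 0) ? 1 : 0;
	sp->sp_adv = sp->sp_en;	/* simplification of renegotiation */
	sp->sp_resets++;
	return (0);
}
.Ed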
.Pp
The properties are ordered in increasing link speed:
.Bl -hang -width Ds
.It Sy MAC_PROP_ADV_10HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_10HDX_CAP
property describes whether or not 10 Mbit/s half-duplex support is
advertised.
.It Sy MAC_PROP_EN_10HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_10HDX_CAP
property describes whether or not 10 Mbit/s half-duplex support is
enabled.
.It Sy MAC_PROP_ADV_10FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_10FDX_CAP
property describes whether or not 10 Mbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_10FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_10FDX_CAP
property describes whether or not 10 Mbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100HDX_CAP
property describes whether or not 100 Mbit/s half-duplex support is
advertised.
.It Sy MAC_PROP_EN_100HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100HDX_CAP
property describes whether or not 100 Mbit/s half-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100FDX_CAP
property describes whether or not 100 Mbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_100FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100FDX_CAP
property describes whether or not 100 Mbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100T4_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
advertised.
.It Sy MAC_PROP_EN_100T4_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
enabled.
.It Sy MAC_PROP_ADV_1000HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_1000HDX_CAP
property describes whether or not 1 Gbit/s half-duplex support is
advertised.
.It Sy MAC_PROP_EN_1000HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_1000HDX_CAP
property describes whether or not 1 Gbit/s half-duplex support is
enabled.
.It Sy MAC_PROP_ADV_1000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_1000FDX_CAP
property describes whether or not 1 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_1000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_1000FDX_CAP
property describes whether or not 1 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_2500FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_2500FDX_CAP
property describes whether or not 2.5 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_2500FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_2500FDX_CAP
property describes whether or not 2.5 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_5000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_5000FDX_CAP
property describes whether or not 5.0 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_5000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_5000FDX_CAP
property describes whether or not 5.0 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_10GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_10GFDX_CAP
property describes whether or not 10 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_10GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_10GFDX_CAP
property describes whether or not 10 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_40GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_40GFDX_CAP
property describes whether or not 40 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_40GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_40GFDX_CAP
property describes whether or not 40 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100GFDX_CAP
property describes whether or not 100 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_100GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100GFDX_CAP
property describes whether or not 100 Gbit/s full-duplex support is
enabled.
.El
.Ss Private Properties
In addition to the defined properties above, drivers are allowed to
define private properties. These private properties are device-specific
properties. All private properties share the same constant,
.Sy MAC_PROP_PRIVATE .
Properties are distinguished by a name, which is a character string. The
list of such private properties is defined when registering with mac in
the
.Sy m_priv_props
member of the
.Xr mac_register 9S
structure.
.Pp
The driver may define whatever semantics it wants for these private
properties. They will not be listed when running
.Xr dladm 1M ,
unless explicitly requested by name. All such properties should start
with a leading underscore character and then consist of alphanumeric
ASCII characters and additional underscores or hyphens.
.Pp
Properties of type
.Sy MAC_PROP_PRIVATE
may show up in all three property related entry points:
.Xr mc_propinfo 9E ,
.Xr mc_getprop 9E ,
and
.Xr mc_setprop 9E .
Device drivers should tell the different properties apart by using the
.Xr strcmp 9F
function to compare the property name to the set of names that they know
about. When encountering properties that it doesn't know, a driver should
treat them like all other unknown properties.
.Sh STATISTICS
The MAC framework defines a couple of different sets of statistics which
are based on various standards for devices to implement. Statistics are
retrieved through the
.Xr mc_getstat 9E
entry point. There are both statistics that are required for all devices
and then there is a separate set of Ethernet specific statistics. Not
all devices will support every statistic. In many cases, several device
registers will need to be combined to create the proper stat.
.Pp
In general, if the device is not keeping track of these statistics, then
it is recommended that the driver store these values as a
.Sy uint64_t
to ensure that overflow does not occur.
.Pp
If a device does not support a specific statistic, then it is fine to
return that it is not supported. The same should be used for
unrecognized statistics. See
.Xr mc_getstat 9E
for more information on the proper way to handle these.
.Ss General Device Statistics
The following statistics are based on MIB-II statistics from both RFC
1213 and RFC 1573.
.Bl -tag -width Ds
.It Sy MAC_STAT_IFSPEED
The device's current speed in bits per second.
.It Sy MAC_STAT_MULTIRCV
The total number of received multicast packets.
.It Sy MAC_STAT_BRDCSTRCV
The total number of received broadcast packets.
.It Sy MAC_STAT_MULTIXMT
The total number of transmitted multicast packets.
.It Sy MAC_STAT_BRDCSTXMT
The total number of transmitted broadcast packets.
.It Sy MAC_STAT_NORCVBUF
The total number of packets discarded by the hardware due to a lack of
receive buffers.
.It Sy MAC_STAT_IERRORS
The total number of errors detected on input.
.It Sy MAC_STAT_UNKNOWNS
The total number of received packets that were discarded because they
were of an unknown protocol.
.It Sy MAC_STAT_NOXMTBUF
The total number of outgoing packets dropped due to a lack of transmit
buffers.
.It Sy MAC_STAT_OERRORS
The total number of outgoing packets that resulted in errors.
.It Sy MAC_STAT_COLLISIONS
Total number of collisions encountered by the transmitter.
.It Sy MAC_STAT_RBYTES
The total number of
.Sy bytes
received by the device, regardless of packet type.
.It Sy MAC_STAT_IPACKETS
The total number of
.Sy packets
received by the device, regardless of packet type.
.It Sy MAC_STAT_OBYTES
The total number of
.Sy bytes
transmitted by the device, regardless of packet type.
.It Sy MAC_STAT_OPACKETS
The total number of
.Sy packets
sent by the device, regardless of packet type.
.It Sy MAC_STAT_UNDERFLOWS
The total number of packets that were smaller than the minimum sized
packet for the device and were therefore dropped.
.It Sy MAC_STAT_OVERFLOWS
The total number of packets that were larger than the maximum sized
packet for the device and were therefore dropped.
.El
.Ss Ethernet Specific Statistics
The following statistics are specific to Ethernet devices. They refer to
values from RFC 1643 and include various MII/GMII specific stats. Many
of these are also defined in IEEE 802.3.
.Bl -tag -width Ds
.It Sy ETHER_STAT_ADV_CAP_1000FDX
Indicates that the device is advertising support for 1 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_1000HDX
Indicates that the device is advertising support for 1 Gbit/s
half-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100FDX
Indicates that the device is advertising support for 100 Mbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100GFDX
Indicates that the device is advertising support for 100 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100HDX
Indicates that the device is advertising support for 100 Mbit/s
half-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100T4
Indicates that the device is advertising support for 100 Mbit/s
100BASE-T4 operation.
.It Sy ETHER_STAT_ADV_CAP_10FDX
Indicates that the device is advertising support for 10 Mbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_10GFDX
Indicates that the device is advertising support for 10 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_10HDX
Indicates that the device is advertising support for 10 Mbit/s
half-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_2500FDX
Indicates that the device is advertising support for 2.5 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_40GFDX
Indicates that the device is advertising support for 40 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_5000FDX
Indicates that the device is advertising support for 5.0 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_ASMPAUSE
Indicates that the device is advertising support for receiving pause
frames.
.It Sy ETHER_STAT_ADV_CAP_AUTONEG
Indicates that the device is advertising support for auto-negotiation.
.It Sy ETHER_STAT_ADV_CAP_PAUSE
Indicates that the device is advertising support for generating pause
frames.
.It Sy ETHER_STAT_ADV_REMFAULT
Indicates that the device is advertising support for detecting faults in
the remote link peer.
.It Sy ETHER_STAT_ALIGN_ERRORS
Indicates the number of times an alignment error was generated by the
Ethernet device. This is a count of packets that were not an integral
number of octets and failed the FCS check.
.It Sy ETHER_STAT_CAP_1000FDX
Indicates the device supports 1 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_1000HDX
Indicates the device supports 1 Gbit/s half-duplex operation.
.It Sy ETHER_STAT_CAP_100FDX
Indicates the device supports 100 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_100GFDX
Indicates the device supports 100 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_100HDX
Indicates the device supports 100 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_CAP_100T4
Indicates the device supports 100 Mbit/s 100BASE-T4 operation.
.It Sy ETHER_STAT_CAP_10FDX
Indicates the device supports 10 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_10GFDX
Indicates the device supports 10 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_10HDX
Indicates the device supports 10 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_CAP_2500FDX
Indicates the device supports 2.5 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_40GFDX
Indicates the device supports 40 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_5000FDX
Indicates the device supports 5.0 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_ASMPAUSE
Indicates that the device supports the ability to receive pause frames.
.It Sy ETHER_STAT_CAP_AUTONEG
Indicates that the device supports the ability to perform link
auto-negotiation.
.It Sy ETHER_STAT_CAP_PAUSE
Indicates that the device supports the ability to transmit pause frames.
.It Sy ETHER_STAT_CAP_REMFAULT
Indicates that the device supports the ability of detecting a remote
fault in a link peer.
.It Sy ETHER_STAT_CARRIER_ERRORS
Indicates the number of times that the Ethernet carrier sense condition
was lost or not asserted.
.It Sy ETHER_STAT_DEFER_XMTS
Indicates the number of frames for which the device was unable to
transmit the frame due to being busy and had to try again.
.It Sy ETHER_STAT_EX_COLLISIONS
Indicates the number of frames that failed to send due to an excessive
number of collisions.
.It Sy ETHER_STAT_FCS_ERRORS
Indicates the number of times that a frame check sequence failed.
.It Sy ETHER_STAT_FIRST_COLLISIONS
Indicates the number of times that a frame was eventually transmitted
successfully, but only after a single collision.
.It Sy ETHER_STAT_JABBER_ERRORS
Indicates the number of frames that were received that were both larger
than the maximum packet size and failed the frame check sequence.
.It Sy ETHER_STAT_LINK_ASMPAUSE
Indicates whether the link is currently configured to accept pause
frames.
.It Sy ETHER_STAT_LINK_AUTONEG
Indicates whether the current link state is a result of
auto-negotiation.
.It Sy ETHER_STAT_LINK_DUPLEX
Indicates the current duplex state of the link. The values used here
should be the same as documented for
.Sy MAC_PROP_DUPLEX .
.It Sy ETHER_STAT_LINK_PAUSE
Indicates whether the link is currently configured to generate pause
frames.
.It Sy ETHER_STAT_LP_CAP_1000FDX
Indicates the remote device supports 1 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_1000HDX
Indicates the remote device supports 1 Gbit/s half-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100FDX
Indicates the remote device supports 100 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100GFDX
Indicates the remote device supports 100 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100HDX
Indicates the remote device supports 100 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100T4
Indicates the remote device supports 100 Mbit/s 100BASE-T4 operation.
.It Sy ETHER_STAT_LP_CAP_10FDX
Indicates the remote device supports 10 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_10GFDX
Indicates the remote device supports 10 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_10HDX
Indicates the remote device supports 10 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_LP_CAP_2500FDX
Indicates the remote device supports 2.5 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_40GFDX
Indicates the remote device supports 40 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_5000FDX
Indicates the remote device supports 5.0 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_ASMPAUSE
Indicates that the remote device supports the ability to receive pause
frames.
.It Sy ETHER_STAT_LP_CAP_AUTONEG
Indicates that the remote device supports the ability to perform link
auto-negotiation.
.It Sy ETHER_STAT_LP_CAP_PAUSE
Indicates that the remote device supports the ability to transmit pause
frames.
.It Sy ETHER_STAT_LP_CAP_REMFAULT
Indicates that the remote device supports the ability of detecting a
remote fault in a link peer.
.It Sy ETHER_STAT_MACRCV_ERRORS
Indicates the number of times that the internal MAC layer encountered an
error when attempting to receive and process a frame.
.It Sy ETHER_STAT_MACXMT_ERRORS
Indicates the number of times that the internal MAC layer encountered an
error when attempting to process and transmit a frame.
.It Sy ETHER_STAT_MULTI_COLLISIONS
Indicates the number of times that a frame was eventually transmitted
successfully, but only after more than one collision.
.It Sy ETHER_STAT_SQE_ERRORS
Indicates the number of times that an SQE error occurred. The specific
conditions for this error are documented in IEEE 802.3.
.It Sy ETHER_STAT_TOOLONG_ERRORS
Indicates the number of frames that were received that were longer than
the maximum frame size supported by the device.
.It Sy ETHER_STAT_TOOSHORT_ERRORS
Indicates the number of frames that were received that were shorter than
the minimum frame size supported by the device.
.It Sy ETHER_STAT_TX_LATE_COLLISIONS
Indicates the number of times a collision was detected late on the
device.
.It Sy ETHER_STAT_XCVR_ADDR
Indicates the address of the MII/GMII transceiver.
.It Sy ETHER_STAT_XCVR_ID
Indicates the id of the MII/GMII transceiver.
.It Sy ETHER_STAT_XCVR_INUSE
Indicates what kind of transceiver is in use. The following values may be
used:
.Bl -tag -width Ds
.It Sy XCVR_UNDEFINED
The transceiver type is undefined by the hardware.
.It Sy XCVR_NONE
There is no transceiver in use by the hardware.
.It Sy XCVR_10
The transceiver supports 10BASE-T operation.
.It Sy XCVR_100T4
The transceiver supports 100BASE-T4 operation.
.It Sy XCVR_100X
The transceiver supports 100BASE-TX operation.
.It Sy XCVR_100T2
The transceiver supports 100BASE-T2 operation.
.It Sy XCVR_1000X
The transceiver supports 1000BASE-X operation. This is used for all fiber
transceivers.
.It Sy XCVR_1000T
The transceiver supports 1000BASE-T operation. This is used for all copper
transceivers.
.El
.El
.Ss Device Specific kstats
In addition to the defined statistics above, if the device driver
maintains additional statistics or the device provides additional
statistics, it should create its own kstats through the
.Xr kstat_create 9F
function to allow operators to observe them.
.Sh TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
Device drivers are the first line of defense for dealing with broken
devices and bugs in their firmware. While most devices will rarely fail,
it is important that particular attention is paid to RAS
(Reliability, Availability, and Serviceability) when designing and
implementing the device driver. While everything
described in this section is optional, it is highly recommended that
all new device drivers follow these guidelines.
.Pp
The Fault Management Architecture (FMA) provides facilities for
detecting and reporting various classes of defects and faults.
Specifically for networking device drivers, issues that should be
detected and reported include:
.Bl -bullet -offset indent
.It
Device internal uncorrectable errors
.It
Device internal correctable errors
.It
PCI and PCI Express transport errors
.It
Device temperature alarms
.It
Device transmission stalls
.It
Device communication timeouts
.It
High invalid interrupts
.El
.Pp
All such errors fall into three primary categories:
.Bl -enum -offset indent
.It
Errors detected by the Fault Management Architecture
.It
Errors detected by the device and indicated to the device driver
.It
Errors detected by the device driver
.El
.Ss Fault Management Setup and Teardown
Drivers should initialize support for the fault management framework by
calling
.Xr ddi_fm_init 9F
from their
.Xr attach 9E
routine. By registering with the fault management framework, a device
driver is given the chance to detect and notice transport errors as well
as report other errors that exist. While a device driver does not need to
indicate that it supports all of the capabilities described in
.Xr ddi_fm_init 9F ,
we suggest that device drivers at least register the
.Sy DDI_FM_EREPORT_CAPABLE
capability so as to allow the driver to report issues that it detects.
.Pp
If the driver registers with the fault management framework during its
.Xr attach 9E
entry point, it must call
.Xr ddi_fm_fini 9F
during its
.Xr detach 9E
entry point.
.Ss Transport Errors
Many modern networking devices leverage PCI or PCI Express.
As such,
there are two primary ways that device drivers access data: they either
memory map device registers and use routines like
.Xr ddi_get8 9F
and
.Xr ddi_put8 9F
or they use direct memory access (DMA). New device drivers should always
enable checking of the transport layer by marking their support in the
.Xr ddi_device_acc_attr_t 9S
structure and using routines like
.Xr ddi_fm_acc_err_get 9F
and
.Xr ddi_fm_dma_err_get 9F
to detect if errors have occurred.
.Ss Device Indicated Errors
Many devices have capabilities to announce to a device driver that a
correctable or uncorrectable error has occurred. Other
devices have the ability to indicate that various physical issues have
occurred such as a fan failing or a temperature sensor having fired.
.Pp
Drivers should wire themselves to receive notifications when these
events occur. The means and capabilities will vary from device to
device. For example, some devices will generate information about these
notifications through special interrupts. Other devices may have a
register that software can poll. In the cases where polling is required,
driver writers should try not to poll too frequently and should
generally only poll when the device is actively being used, e.g. between
calls to the
.Xr mc_start 9E
and
.Xr mc_stop 9E
entry points.
.Ss Driver Transmit Stall Detection
One of the primary responsibilities of a hardened device driver is to
perform transmit stall detection. The core idea behind tx stall
detection is that the driver should record when it last observed that
data had been successfully transmitted. Most devices
should be transmitting data on a regular basis as long as the link is
up. If a device is not, then this may indicate that it is stuck and
needs to be reset.
At this time, the MAC framework does not provide any
resources for performing these checks; however, polling on each
individual transmit ring for the last completion time while something is
actively being transmitted through the use of routines such as
.Xr timeout 9F
may be a reasonable starting point.
.Ss Driver Command Timeout Detection
Each device is programmed in different ways. Some devices are programmed
through asynchronous commands while others are programmed by writing
directly to memory mapped registers. If a device receives asynchronous
replies to commands, then the device driver should set reasonable
timeouts for all such commands and plan on detecting when they expire.
If a timeout
occurs, the driver should presume that there is an issue with the
hardware and proceed to abort the command or reset the device.
.Pp
Many devices do not have such a communication mechanism. However,
whenever there is some activity where the device driver must wait, then
it should be prepared for the fact that the device may never get back to
it and react appropriately by performing some kind of device reset.
.Ss Reacting to Errors
When any of the above categories of errors has been triggered, the
behavior that the device driver should take depends on the kind of
error. If a fatal error has occurred, for example, a transport error or
a transmit stall was detected, or the device indicated that an
uncorrectable error was detected, then it is
important that the driver take the following steps:
.Bl -enum -offset indent
.It
Set a flag in the device driver's state that indicates that it has hit
an error condition. When this error condition flag is asserted,
transmitted packets should be accepted and dropped and actions that would
require writing to the device state should fail with an error. This flag
should remain until the device has been successfully restarted.
.It
If the error was not one that the fault management architecture itself
indicated, e.g. a transport error that it detected, then
the device driver should post an
.Sy ereport
indicating what has occurred with the
.Xr ddi_fm_ereport_post 9F
function.
.It
The device driver should indicate that the device's service was lost
with a call to
.Xr ddi_fm_service_impact 9F
using the symbol
.Sy DDI_SERVICE_LOST .
.It
At this point the device driver should issue a device reset through some
device-specific means.
.It
When the device reset has been completed, then the device driver should
restore all of the programmed state to the device. This includes things
like the current MTU, advertised auto-negotiation speeds, MAC address
filters, and more.
.It
Finally, when service has been restored, the device driver should call
.Xr ddi_fm_service_impact 9F
using the symbol
.Sy DDI_SERVICE_RESTORED .
.El
.Pp
When a non-fatal error occurs, then the device driver should submit an
ereport and should optionally mark the device degraded using
.Xr ddi_fm_service_impact 9F
with the
.Sy DDI_SERVICE_DEGRADED
value depending on the nature of the problem that has occurred.
.Pp
Device drivers should never make the decision to remove a device from
service based on errors that have occurred nor should they panic the
system. Rather, the device driver should always try to notify the
operating system with various ereports and allow its policy decisions to
occur. The decision to retire a device lies in the hands of the fault
management architecture. It knows more about the operator's intent and
the surrounding system's state than the device driver itself does and it
will make the call to offline and retire the device if it is required.
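.Pp
The ordering of the fatal error steps above can be sketched as follows.
This is a hypothetical sketch that compiles outside the kernel, not a
definitive implementation: every
.Sy ex_
name is invented for illustration, the device reset is a stub that is
assumed to succeed, and the points where a real driver would call
.Xr ddi_fm_ereport_post 9F
and
.Xr ddi_fm_service_impact 9F
are only recorded in a log rather than performed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One entry per recovery step, in the order the text lists them. */
typedef enum ex_step {
	EX_STEP_FLAG_SET,
	EX_STEP_EREPORT,
	EX_STEP_SERVICE_LOST,
	EX_STEP_RESET,
	EX_STEP_RESTORE,
	EX_STEP_SERVICE_RESTORED
} ex_step_t;

#define	EX_MAX_STEPS	8

/* Hypothetical per-instance driver state. */
typedef struct ex_softc {
	bool es_error;			/* error condition flag */
	ex_step_t es_steps[EX_MAX_STEPS];	/* log of steps taken */
	size_t es_nsteps;
} ex_softc_t;

static void
ex_record(ex_softc_t *sc, ex_step_t step)
{
	assert(sc->es_nsteps < EX_MAX_STEPS);
	sc->es_steps[sc->es_nsteps++] = step;
}

/* Stand-in for a device-specific reset; assumed to succeed here. */
static bool
ex_hw_reset(ex_softc_t *sc)
{
	ex_record(sc, EX_STEP_RESET);
	return (true);
}

/*
 * Returns true if a frame would be handed to the hardware and false if
 * it would be accepted and dropped because the error flag is set.
 */
bool
ex_tx(const ex_softc_t *sc)
{
	return (!sc->es_error);
}

void
ex_handle_fatal(ex_softc_t *sc, bool fma_reported)
{
	/* Step 1: stop touching the device until it has been restarted. */
	sc->es_error = true;
	ex_record(sc, EX_STEP_FLAG_SET);

	/* Step 2: a real driver would call ddi_fm_ereport_post(9F). */
	if (!fma_reported)
		ex_record(sc, EX_STEP_EREPORT);

	/* Step 3: ddi_fm_service_impact(9F) with DDI_SERVICE_LOST. */
	ex_record(sc, EX_STEP_SERVICE_LOST);

	/* Step 4: reset; on failure the error flag stays asserted. */
	if (!ex_hw_reset(sc))
		return;

	/* Step 5: reprogram MTU, advertised speeds, filters, and so on. */
	ex_record(sc, EX_STEP_RESTORE);
	sc->es_error = false;

	/* Step 6: ddi_fm_service_impact(9F) with DDI_SERVICE_RESTORED. */
	ex_record(sc, EX_STEP_SERVICE_RESTORED);
}
```

The error flag is cleared only after the programmed state has been
restored, so the transmit path keeps dropping frames for the whole
duration of the reset.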
.Ss Device Resets
When resetting a device, a device driver must exercise caution. If a
device driver has not been written to plan for a device reset, then it
may not correctly restore the device's state after such a reset. Such
state should be stored in the instance's private state data as the MAC
framework does not know about device resets and will not inform the
device again about the expected, programmed state.
.Pp
One wrinkle with device resets is that many networking cards show up as
multiple PCI functions on a single device; for example, each port may
show up as a separate function and thus have a separate instance of the
device driver attached. When resetting a function, device driver writers
should carefully read the device programming manuals and verify whether
or not a reset impacts only the stalled function or if it impacts all
functions across the device.
.Pp
If the only way to reset a given function is through the device, then
this may require more coordination and work on the part of the device
driver to ensure that all the other instances are correctly restored.
In cases where this occurs, some devices offer ways of injecting
interrupts onto those other functions to notify them that this is
occurring.
.Sh MBLKS AND DMA
The networking stack manages framed data through the use of the
.Xr mblk 9S
structure. The mblk allows for a single message to be made up of
individual blocks. Each part is linked together through its
.Sy b_cont
member. However, it also allows for multiple messages to be chained
together through the use of the
.Sy b_next
member. While the networking stack works with these structures, device
drivers generally work with DMA regions. There are two different
strategies that device drivers use for handling these two different
cases: copying and binding.
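.Pp
The two kinds of linkage can be illustrated with a short sketch. The
.Sy ex_mblk_t
type below is a simplified stand-in that carries only four of the
members documented in
.Xr mblk 9S ,
and both functions are hypothetical helpers written for this example so
that it compiles outside the kernel. The sketch walks
.Sy b_cont
to total the bytes of one message and walks
.Sy b_next
to visit each message in a chain, unlinking each message before it is
processed.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for mblk_t; member names follow mblk(9S). */
typedef struct ex_mblk {
	struct ex_mblk *b_next;		/* next message in the chain */
	struct ex_mblk *b_cont;		/* next block of this message */
	unsigned char *b_rptr;		/* first valid byte in the block */
	unsigned char *b_wptr;		/* one past the last valid byte */
} ex_mblk_t;

/* Total the data bytes of a single message by following b_cont. */
size_t
ex_msg_len(const ex_mblk_t *mp)
{
	size_t len = 0;

	for (; mp != NULL; mp = mp->b_cont) {
		assert(mp->b_wptr >= mp->b_rptr);
		len += (size_t)(mp->b_wptr - mp->b_rptr);
	}
	return (len);
}

/*
 * Visit every message in a chain by following b_next. Each message is
 * unlinked from the chain before it is examined so that it can be
 * handled on its own.
 */
size_t
ex_chain_bytes(ex_mblk_t *chain, size_t *nmsgs)
{
	size_t total = 0;
	ex_mblk_t *next;

	assert(nmsgs != NULL);
	*nmsgs = 0;
	for (; chain != NULL; chain = next) {
		next = chain->b_next;
		chain->b_next = NULL;
		total += ex_msg_len(chain);
		(*nmsgs)++;
	}
	return (total);
}
```

Saving the
.Sy b_next
pointer before working on a message matters because, once a message has
been handed off or freed, its members may no longer be read safely.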
.Ss Copying Data
The first way that device drivers handle interfacing between the two is
by having two separate regions of memory. One part is memory which has
been allocated for DMA through a call to
.Xr ddi_dma_mem_alloc 9F
and the other is memory associated with the memory block.
.Pp
In this case, a driver will use
.Xr bcopy 9F
to copy memory between the two distinct regions. When transmitting a
packet, it will copy the memory from the mblk_t to the DMA region. When
receiving a packet, it will allocate a mblk_t through the
.Xr allocb 9F
routine, copy the memory across with
.Xr bcopy 9F ,
and then increment the mblk_t's
.Sy b_wptr
member.
.Pp
If, when receiving, memory is not available for a new message block,
then the frame should be skipped and effectively dropped. A kstat should
be bumped when such an occasion occurs.
.Ss Binding Data
An alternative approach to copying data is to use DMA binding. When
using DMA binding, the OS takes care of mapping between DMA memory and
normal device memory. The exact process is a bit different between
transmit and receive.
.Pp
When transmitting, a device driver has an mblk_t and needs to call the
.Xr ddi_dma_addr_bind_handle 9F
function to bind it to an already existing DMA handle. At that point, it
will receive various DMA cookies that it can use to obtain the addresses
to program the device with for transmitting data. Once the transmit is
done, the driver must then make sure to call
.Xr freemsg 9F
to release the data. It must not call
.Xr freemsg 9F
before it receives an interrupt from the device indicating that the data
has been transmitted, otherwise it risks sending arbitrary kernel
memory.
.Pp
When receiving data, the device can perform a similar operation.
First,
it must bind the DMA memory into the kernel's virtual memory address
space through a call to the
.Xr ddi_dma_addr_bind_handle 9F
function if it has not already. Once it has, it must then call
.Xr desballoc 9F
to try to create a new mblk_t which leverages the associated memory. It
can then pass that mblk_t up to the stack.
.Ss Considerations
When deciding which of these options to use, there are many different
considerations that must be made. The answer as to whether to bind
memory or to copy data is not always simple.
.Pp
The first thing to remember is that DMA resources may be finite on a
given platform. Consider the case of receiving data. A device driver
that binds one of its receive descriptors may not get it back for quite
some time, as it may be used by the kernel until an application actually
consumes it. Device drivers that try to bind memory for receive often
work with the constraint that they must be able to replace that DMA
memory with another DMA descriptor. If it were not replaced, then
eventually the device would not be able to receive additional data into
the ring.
.Pp
On the other hand, particularly for larger frames, copying every packet
from one buffer to another can be a source of additional latency and
memory waste in the system. For larger copies, the cost of copying may
dwarf any potential cost of performing DMA binding.
.Pp
Device driver authors who are unsure of what to do should
first employ the copying method to simplify the act of writing the
device driver. The copying method is simpler and also allows the device
driver author not to worry about allocated DMA memory that is still
outstanding when it is asked to unload.
.Pp
If device driver writers are worried about the cost, it is recommended
to make the decision as to whether to copy or bind DMA data
a separate private property for both transmitting and receiving. That
private property should indicate the frame size at which
to switch from one method to the other. This way, data can be gathered
to determine what the impact of each method is on a given platform.
.Sh SEE ALSO
.Xr dladm 1M ,
.Xr driver.conf 4 ,
.Xr ieee802.3 5 ,
.Xr dlpi 7P ,
.Xr _fini 9E ,
.Xr _info 9E ,
.Xr _init 9E ,
.Xr attach 9E ,
.Xr close 9E ,
.Xr detach 9E ,
.Xr mc_close 9E ,
.Xr mc_getcapab 9E ,
.Xr mc_getprop 9E ,
.Xr mc_getstat 9E ,
.Xr mc_multicst 9E ,
.Xr mc_open 9E ,
.Xr mc_propinfo 9E ,
.Xr mc_setpromisc 9E ,
.Xr mc_setprop 9E ,
.Xr mc_start 9E ,
.Xr mc_stop 9E ,
.Xr mc_tx 9E ,
.Xr mc_unicst 9E ,
.Xr open 9E ,
.Xr allocb 9F ,
.Xr bcopy 9F ,
.Xr ddi_dma_addr_bind_handle 9F ,
.Xr ddi_dma_mem_alloc 9F ,
.Xr ddi_fm_acc_err_get 9F ,
.Xr ddi_fm_dma_err_get 9F ,
.Xr ddi_fm_ereport_post 9F ,
.Xr ddi_fm_fini 9F ,
.Xr ddi_fm_init 9F ,
.Xr ddi_fm_service_impact 9F ,
.Xr ddi_get8 9F ,
.Xr ddi_put8 9F ,
.Xr desballoc 9F ,
.Xr freemsg 9F ,
.Xr kstat_create 9F ,
.Xr mac_alloc 9F ,
.Xr mac_fini_ops 9F ,
.Xr mac_hcksum_get 9F ,
.Xr mac_hcksum_set 9F ,
.Xr mac_init_ops 9F ,
.Xr mac_link_update 9F ,
.Xr mac_lso_get 9F ,
.Xr mac_maxsdu_update 9F ,
.Xr mac_prop_info_set_default_link_flowctrl 9F ,
.Xr mac_prop_info_set_default_str 9F ,
.Xr mac_prop_info_set_default_uint32 9F ,
.Xr mac_prop_info_set_default_uint64 9F ,
.Xr mac_prop_info_set_default_uint8 9F ,
.Xr mac_prop_info_set_perm 9F ,
.Xr mac_prop_info_set_range_uint32 9F ,
.Xr mac_register 9F ,
.Xr mac_rx 9F ,
.Xr mac_unregister 9F ,
.Xr mod_install 9F ,
.Xr mod_remove 9F ,
.Xr strcmp 9F ,
.Xr timeout 9F ,
.Xr cb_ops 9S ,
.Xr ddi_device_acc_attr 9S ,
.Xr dev_ops 9S ,
.Xr kstat 9S ,
.Xr mac_callbacks 9S ,
.Xr mac_register 9S ,
.Xr mblk 9S ,
.Xr modldrv 9S ,
.Xr modlinkage 9S
.Rs
.%A McCloghrie, K.
.%A Rose, M.
.%T RFC 1213 Management Information Base for Network Management of
.%T TCP/IP-based internets: MIB-II
.%D March 1991
.Re
.Rs
.%A McCloghrie, K.
.%A Kastenholz, F.
.%T RFC 1573 Evolution of the Interfaces Group of MIB-II
.%D January 1994
.Re
.Rs
.%A Kastenholz, F.
.%T RFC 1643 Definitions of Managed Objects for the Ethernet-like
.%T Interface Types
.%D July 1994
.Re