.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2019 Joyent, Inc.
.\" Copyright 2020 RackTop Systems, Inc.
.\"
.Dd May 11, 2020
.Dt MAC 9E
.Os
.Sh NAME
.Nm mac ,
.Nm GLDv3
.Nd MAC networking device driver overview
.Sh SYNOPSIS
.In sys/mac_provider.h
.In sys/mac_ether.h
.Sh INTERFACE LEVEL
illumos DDI specific
.Sh DESCRIPTION
The
.Sy MAC
framework provides a means for implementing high-performance networking
device drivers.
It is the successor to the GLD interfaces and is sometimes referred to as the
GLDv3.
The remainder of this manual introduces the aspects of writing device drivers
that leverage the MAC framework.
While both the GLDv3 and MAC framework refer to the same thing, in this manual
page we use the term
.Em MAC framework
to refer to the device driver interface.
.Pp
MAC device drivers are character devices.
They define the standard
.Xr _init 9E ,
.Xr _fini 9E ,
and
.Xr _info 9E
entry points to initialize the module, as well as
.Xr dev_ops 9S
and
.Xr cb_ops 9S
structures.
.Pp
The main interface with MAC is through a series of callbacks defined in
a
.Xr mac_callbacks 9S
structure.
These callbacks control all the aspects of the device.
They range from sending data and getting and setting properties to controlling
MAC address filters and managing promiscuous mode.
.Pp
The MAC framework takes care of many aspects of the device driver's
management.
A device that uses the MAC framework does not have to worry about creating
device nodes or implementing
.Xr open 9E
or
.Xr close 9E
routines.
In addition, all of the work to interact with
.Xr dlpi 7P
is taken care of automatically and transparently.
.Ss Initializing MAC Support
For a device to be used in the framework, it must register with the
framework and take specific actions during
.Xr _init 9E ,
.Xr attach 9E ,
.Xr detach 9E ,
and
.Xr _fini 9E .
.Pp
All device drivers have to define a
.Xr dev_ops 9S
structure which is pointed to by a
.Xr modldrv 9S
structure and the corresponding NULL-terminated
.Xr modlinkage 9S
structure.
The
.Xr dev_ops 9S
structure should have a
.Xr cb_ops 9S
structure defined for it; however, it does not need to implement any of
the standard
.Xr cb_ops 9S
entry points.
.Pp
Normally, in a driver's
.Xr _init 9E
entry point, it passes its
.Sy modlinkage
structure directly to
.Xr mod_install 9F .
To properly register with MAC, the driver must call
.Xr mac_init_ops 9F
before it calls
.Xr mod_install 9F .
If for some reason the
.Xr mod_install 9F
function fails, then the driver must undo the earlier call to
.Xr mac_init_ops 9F
by calling
.Xr mac_fini_ops 9F .
.Pp
Conversely, in the driver's
.Xr _fini 9E
routine, it should call
.Xr mac_fini_ops 9F
after it successfully calls
.Xr mod_remove 9F .
For an example of how to use the
.Xr mac_init_ops 9F
and
.Xr mac_fini_ops 9F
functions, see the examples section in
.Xr mac_init_ops 9F .
.Ss Registering with MAC
Every instance of a device should register separately with MAC.
To register with MAC, a driver must allocate a
.Xr mac_register 9S
structure, fill it in, and then call
.Xr mac_register 9F .
The
.Sy mac_register_t
structure contains information about the device and all of the required
function pointers that will be used as callbacks by the framework.
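.Pp
The allocate, fill in, and register sequence can be sketched roughly as
follows.
This is a minimal illustration rather than a reference implementation.
So that it compiles outside the kernel, it supplies its own userland
stand-ins for
.Xr mac_alloc 9F ,
.Xr mac_register 9F ,
and
.Xr mac_free 9F ,
and shows only three members of the
.Sy mac_register_t
structure.
The
.Sy example_
and
.Sy mock_
names, the stand-in bodies, and the constant values are all assumptions made
for the sake of the example.
```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Userland stand-ins so that the sketch compiles outside the kernel.
 * A real driver gets these from <sys/mac_provider.h> and the DDI; the
 * bodies and constant values below are assumptions for illustration.
 */
#define	MAC_VERSION	1
#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)

typedef struct mac_callbacks { int mc_callbacks; } mac_callbacks_t;
typedef struct mac_handle { int mh_unused; } *mac_handle_t;
typedef struct mac_register {
	unsigned int	m_version;
	void		*m_driver;
	mac_callbacks_t	*m_callbacks;
} mac_register_t;

static int mock_register_error;	/* test knob: non-zero forces failure */
static int mock_outstanding;	/* registration structures not yet freed */
static struct mac_handle mock_handle;

static mac_register_t *
mac_alloc(unsigned int version)
{
	mac_register_t *regp = calloc(1, sizeof (*regp));

	if (regp != NULL) {
		regp->m_version = version;
		mock_outstanding++;
	}
	return (regp);
}

static void
mac_free(mac_register_t *regp)
{
	free(regp);
	mock_outstanding--;
}

static int
mac_register(mac_register_t *regp, mac_handle_t *mhp)
{
	assert(regp->m_driver != NULL && regp->m_callbacks != NULL);
	if (mock_register_error != 0)
		return (mock_register_error);
	*mhp = &mock_handle;
	return (0);
}

/* Hypothetical per-instance soft state; the MAC handle is kept here. */
typedef struct example_softc {
	mac_handle_t	es_mh;
} example_softc_t;

/* Declared statically, as most drivers do; left empty in this sketch. */
static mac_callbacks_t example_callbacks;

/* The registration portion of a hypothetical attach(9E) routine. */
static int
example_register(example_softc_t *sc)
{
	mac_register_t *regp;
	int ret;

	if ((regp = mac_alloc(MAC_VERSION)) == NULL)
		return (DDI_FAILURE);

	regp->m_driver = sc;
	regp->m_callbacks = &example_callbacks;

	ret = mac_register(regp, &sc->es_mh);
	/* The registration structure is not needed after this point. */
	mac_free(regp);

	return (ret == 0 ? DDI_SUCCESS : DDI_FAILURE);
}
```
.Pp
In this sketch the registration structure is released on both the success and
failure paths, and only the handle stored in the soft state outlives the call.
A real driver would fill in considerably more of the structure, as documented
in
.Xr mac_register 9S .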
.Pp
These steps should all be taken during a device's
.Xr attach 9E
entry point.
It is recommended that the driver perform this sequence of steps after the
device has finished its initialization of the chipset and interrupts, though
interrupts should not be enabled at that point.
After it calls
.Xr mac_register 9F
it will start receiving callbacks from the MAC framework.
.Pp
To allocate the registration structure, the driver should call
.Xr mac_alloc 9F .
Device drivers should generally pass the symbol
.Sy MAC_VERSION
as the argument to
.Xr mac_alloc 9F .
Upon successful completion, the driver will receive a
.Sy mac_register_t
structure which it should fill in.
The structure and its members are documented in
.Xr mac_register 9S .
.Pp
The
.Xr mac_callbacks 9S
structure is not allocated as a part of the
.Xr mac_register 9S
structure.
In general, device drivers declare this statically.
See the
.Sx MAC Callbacks
section for more information on how to fill it out.
.Pp
Once the structure has been filled in, the driver should call
.Xr mac_register 9F
to register itself with MAC.
The handle that it uses to register with should be part of the driver's soft
state.
It will be used in various other support functions and callbacks.
.Pp
If the call is successful, then the device driver
should enable interrupts and finish any other initialization required.
If the call to
.Xr mac_register 9F
failed, then it should unwind its initialization and should return
.Sy DDI_FAILURE
from its
.Xr attach 9E
routine.
.Pp
The driver does not need to hold onto an allocated
.Xr mac_register 9S
structure after it has called the
.Xr mac_register 9F
function.
Whether the
.Xr mac_register 9F
function returns successfully or not, the driver may free its
.Xr mac_register 9S
structure by calling the
.Xr mac_free 9F
function.
.Ss MAC Callbacks
The MAC framework interacts with a device driver through a series of
callbacks.
These callbacks are described in their individual manual pages and the
collection of callbacks is indicated in the
.Xr mac_callbacks 9S
manual page.
This section does not focus on the specific functions, but rather on
interactions between them and the rest of the device driver framework.
.Pp
A device driver should make no assumptions about when the various
callbacks will be called and whether or not they will be called
simultaneously.
For example, a device driver may be asked to transmit data through a call to its
.Xr mc_tx 9E
entry point while it is being asked to get a device property through a
call to its
.Xr mc_getprop 9E
entry point.
As such, while some calls may be serialized to the device, such as setting
properties, the device driver should always presume that all of its data needs
to be protected with locks.
While the device is holding locks, it is safe for it to call the following MAC
routines:
.Bl -bullet -offset indent -compact
.It
.Xr mac_hcksum_get 9F
.It
.Xr mac_hcksum_set 9F
.It
.Xr mac_lso_get 9F
.It
.Xr mac_maxsdu_update 9F
.It
.Xr mac_prop_info_set_default_link_flowctrl 9F
.It
.Xr mac_prop_info_set_default_str 9F
.It
.Xr mac_prop_info_set_default_uint8 9F
.It
.Xr mac_prop_info_set_default_uint32 9F
.It
.Xr mac_prop_info_set_default_uint64 9F
.It
.Xr mac_prop_info_set_perm 9F
.It
.Xr mac_prop_info_set_range_uint32 9F
.El
.Pp
Any other MAC related routines should not be called with locks held,
such as
.Xr mac_link_update 9F
or
.Xr mac_rx 9F .
Other routines in the DDI may be called while locks are held; however,
device driver writers should be careful about calling blocking routines
while locks are held or in interrupt context, though it is generally
legal to do so.
.Ss Receiving Data
A device driver will often receive data by means of an
interrupt.
When that interrupt occurs, the device driver will receive one or more frames
with optional metadata.
Often each frame has a corresponding descriptor which has information about
whether or not there were errors or whether or not the device successfully
checksummed the packet.
In addition to the per-packet flow described below, there are certain
requirements that drivers must adhere to when programming the hardware
to receive data.
See the section
.Sx RECEIVE DESCRIPTOR LAYOUT
for more information.
.Pp
During a single interrupt, a device driver should process a fixed number
of frames.
For each frame the device driver should:
.Bl -enum -offset indent
.It
First check whether or not the frame has errors.
If errors were detected, then the frame should not be sent to the operating
system.
It is recommended that devices keep kstats (see
.Xr kstat_create 9F
for more information) and bump the counter whenever such an error is
detected.
If the device distinguishes between the types of errors, then separate kstats
for each class of error are recommended.
See the
.Sx STATISTICS
section for more information on the various error cases that should be
considered.
.It
Once the frame has been determined to be valid, the device driver should
transform the frame into a
.Xr mblk 9S .
See the section
.Sx MBLKS AND DMA
for more information on how to transform and prepare a message block.
.It
If the device supports hardware checksumming (see the
.Sx CAPABILITIES
section for more information on checksumming), then the device driver
should set the corresponding checksumming information with a call to
.Xr mac_hcksum_set 9F .
.It
It should then append this new message block to the
.Em end
of the message block chain, linking it to the
.Sy b_next
pointer.
It is vitally important that all the frames be chained in the order that they
were received.
If the device driver mistakenly reorders frames, then it may cause performance
impacts in the TCP stack and potentially impact application correctness.
.El
.Pp
Once all the frames have been processed and assembled, the device driver
should deliver them to the rest of the operating system by calling
.Xr mac_rx 9F .
The device driver should try to give as many mblk_t structures to the
system at once as it can.
It
.Em should not
call
.Xr mac_rx 9F
once for every assembled mblk_t.
.Pp
The device driver must not hold any locks across the call to
.Xr mac_rx 9F .
When this function is called, received data will be pushed through the
networking stack and some replies may be generated and given to the
driver to send out.
.Pp
It is not the device driver's responsibility to determine whether or not
the system can keep up with a driver's delivery rate of frames.
The rest of the networking stack will handle issues related to keeping up
appropriately and ensure that kernel memory is not exhausted by packets
that are not being processed.
.Pp
Finally, the device driver should make sure that any other housekeeping
activities required for the ring are taken care of such that more data
can be received.
.Ss Transmitting Data and Back Pressure
A device driver will be asked to transmit a message block chain by
having its
.Xr mc_tx 9E
entry point called.
While the driver is processing the message blocks, it may run out of resources.
For example, a transmit descriptor ring may become full.
At that point, the device driver should return the remaining unprocessed frames.
The act of returning frames indicates that the device has asserted flow control.
Once this has been done, no additional calls will be made to the
driver's transmit entry point and the back pressure will be propagated
throughout the rest of the networking stack.
.Pp
At some point in the future when resources have become available again,
for example after an interrupt indicating that some portion of the
transmit ring has been sent, the device driver must notify the
system that it can continue transmission.
To do this, the driver should call
.Xr mac_tx_update 9F .
After that point, the driver will receive calls to its
.Xr mc_tx 9E
entry point again.
As mentioned in the section on callbacks, the device driver should avoid holding
any particular locks across the call to
.Xr mac_tx_update 9F .
.Ss Interrupt Coalescing
For devices operating at higher data rates, interrupt coalescing is an
important part of a well-functioning device and may impact the
performance of the device.
Not all devices support interrupt coalescing.
If interrupt coalescing is supported on the device, it is recommended that
device driver writers provide private properties for their device to control the
interrupt coalescing rate.
This will make it much easier to perform experiments and observe the impact of
different interrupt rates on the rest of the system.
.Ss MAC Address Filter Management
The MAC framework will attempt to use as many MAC address filters as a
device has.
To program a multicast address filter, the driver's
.Xr mc_multicst 9E
entry point will be called.
If the device driver runs out of filters, it should not take any special action
and should just return the appropriate error as documented in the corresponding
manual pages for the entry points.
The framework will ensure that the device is placed in promiscuous mode
if it needs to.
.Ss Link Updates
It is the responsibility of the device driver to keep track of the
data link's state.
Many devices provide a means of receiving an interrupt when the state of the
link changes.
When such a change happens, the driver should update its internal data
structures and then call
.Xr mac_link_update 9F
to inform the MAC layer that this has occurred.
If the device driver does not properly inform the system about link changes,
then various features like link aggregations and other mechanisms that leverage
the link state will not work correctly.
.Ss Link Speed and Auto-negotiation
Many networking devices support more than one possible speed that they
can operate at.
The selection of a speed is often performed through
.Em auto-negotiation ,
though some devices allow the user to control what speeds are advertised
and used.
.Pp
Logically, there are two different sets of things that the device driver
needs to keep track of while it's operating:
.Bl -enum
.It
The supported speeds in hardware.
.It
The enabled speeds from the user.
.El
.Pp
By default, when a link first comes up, the device driver should
generally configure the link to support the common set of speeds and
perform auto-negotiation.
.Pp
A user can control what speeds a device advertises via auto-negotiation
and whether or not it performs auto-negotiation at all by using a series
of properties that have
.Sy _EN_
in the name.
These are read/write properties and there is one for each speed supported in the
operating system.
For a full list of them, see the
.Sx PROPERTIES
section.
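.Pp
One way to keep the two sets apart is to track each as a bit mask in the
driver's soft state.
The following is a minimal sketch under several assumptions: the
.Sy EX_SPEED_
bits and the
.Sy example_link_
names are hypothetical and not part of the MAC framework, every supported speed
starts out enabled, and the advertised set is taken to be the intersection of
the supported and enabled sets.
A request to change a speed that the hardware lacks is rejected with
.Er ENOTSUP ,
in line with the handling this manual describes for the property entry points.
```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit assignments; these are not MAC framework constants. */
#define	EX_SPEED_100FDX		(1U << 0)
#define	EX_SPEED_1000FDX	(1U << 1)
#define	EX_SPEED_10GFDX		(1U << 2)

/* Hypothetical per-device link state. */
typedef struct example_link {
	uint32_t	el_hw_speeds;	/* what the hardware supports */
	uint32_t	el_en_speeds;	/* what the user has enabled */
	unsigned int	el_restarts;	/* times negotiation was restarted */
} example_link_t;

/* Assumption: start out with every supported speed enabled. */
static void
example_link_init(example_link_t *lp, uint32_t hw_speeds)
{
	lp->el_hw_speeds = hw_speeds;
	lp->el_en_speeds = hw_speeds;
	lp->el_restarts = 0;
}

/* Assumption: only speeds both supported and enabled are advertised. */
static uint32_t
example_link_advertised(const example_link_t *lp)
{
	return (lp->el_hw_speeds & lp->el_en_speeds);
}

/* Handle a write to one _EN_ property, given the speed bit it maps to. */
static int
example_link_set_en(example_link_t *lp, uint32_t speed, uint8_t val)
{
	/* Exactly one speed bit is expected per property. */
	assert(speed != 0 && (speed & (speed - 1)) == 0);

	if ((lp->el_hw_speeds & speed) == 0)
		return (ENOTSUP);

	if (val != 0)
		lp->el_en_speeds |= speed;
	else
		lp->el_en_speeds &= ~speed;

	/* A real driver would reprogram the PHY and restart negotiation. */
	lp->el_restarts++;
	return (0);
}
```
.Pp
Keeping the hardware mask separate from the user's choices means that a user
request never changes what the driver believes the hardware supports.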
.Pp
In addition to these properties, there is a corresponding set of
properties with
.Sy _ADV_
in the name.
These are similar to the
.Sy _EN_
family of properties, but they are read-only and indicate what the
device has actually negotiated.
While they are generally similar to the
.Sy _EN_
family of properties, they may change depending on power settings.
See the
.Sy Ethernet Link Properties
section in
.Xr dladm 1M
for more information.
.Pp
It's worth discussing how these different values get used throughout the
different entry points.
The first entry point to consider is the
.Xr mc_propinfo 9E
entry point.
For a given speed, the driver should consult whether or not the hardware
supports this speed.
If it does, it should fill in the default value that the hardware takes and
indicate whether or not the property is writable.
This holds for both the
.Sy _EN_
and
.Sy _ADV_
family of properties.
.Pp
The next entry point is
.Xr mc_getprop 9E .
Here, the device should first consult whether the given speed is
supported.
If it is not, then the driver should return
.Er ENOTSUP .
If it is, then it should return the current value of the property.
.Pp
The last property-related entry point is the
.Xr mc_setprop 9E
entry point.
Here, the same logic applies.
Before the driver considers whether or not the property is writable, it should
first check whether or not it's a supported property.
If it's not, then it should return
.Er ENOTSUP .
Otherwise, it should proceed to check whether the property is writable,
and if it is, and the value is valid, then it should update the property and
restart the link's negotiation.
.Pp
Finally, there is the
.Xr mc_getstat 9E
entry point.
Several of the statistics that are queried relate to auto-negotiation and
hardware capabilities.
When a statistic relates to the hardware supporting a given speed, the
.Sy _EN_
properties should be ignored.
The only thing that should be consulted is what the hardware itself supports.
Otherwise, the statistics should look at what is currently being advertised by
the device.
.Ss Unregistering from MAC
During a driver's
.Xr detach 9E
routine, it should unregister the device instance from MAC by calling
.Xr mac_unregister 9F
with the handle that it obtained from
.Xr mac_register 9F .
If the call to
.Xr mac_unregister 9F
failed, then the device is likely still in use and the driver should
fail the call to
.Xr detach 9E .
.Ss Interacting with Devices
Administrators always interact with devices through the
.Xr dladm 1M
command line interface.
The state of devices such as whether the link is considered
.Sy up
or
.Sy down ,
various link properties such as the
.Sy MTU ,
.Sy auto-negotiation
state,
and
.Sy flow control
state,
are all exposed.
It is also the preferred way that these properties are set and configured.
.Pp
While device tunables may be presented in a
.Xr driver.conf 4
file, it is recommended instead to expose such things through
.Xr dladm 1M
private properties, whether explicitly documented or not.
.Sh CAPABILITIES
Capabilities in the MAC framework are optional hardware features that a
device supports.
The two current capabilities that the system supports are the ability of
hardware to perform large send offload (LSO), often also known as TCP
segmentation offload, and the ability of hardware to calculate and verify the
checksums present in IPv4, IPv6, and protocol headers such as TCP and UDP.
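.Pp
As a rough illustration of how a driver might report these two capabilities,
consider the sketch below of a hypothetical
.Xr mc_getcapab 9E
implementation.
The
.Sy MAC_CAPAB_HCKSUM
and
.Sy MAC_CAPAB_LSO
capabilities themselves are detailed later in this section.
The type definitions and constant values are userland stand-ins chosen so that
the sketch compiles on its own; a real driver takes them from the system
headers and their values may differ.
The
.Sy example_
names and the soft state tunables are assumptions, and declining any unknown
capability is how this sketch interprets handling unknown capabilities
correctly.
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userland stand-ins so that the sketch compiles outside the kernel.
 * The numeric values are placeholders, not the system's definitions.
 */
typedef enum { B_FALSE, B_TRUE } boolean_t;
typedef enum { MAC_CAPAB_HCKSUM = 1, MAC_CAPAB_LSO } mac_capab_t;
typedef uint32_t t_uscalar_t;

#define	HCKSUM_INET_PARTIAL	0x01
#define	HCKSUM_IPHDRCKSUM	0x02
#define	LSO_TX_BASIC_TCP_IPV4	0x01

typedef struct lso_basic_tcp_ipv4 {
	t_uscalar_t	lso_max;
} lso_basic_tcp_ipv4_t;

typedef struct mac_capab_lso {
	t_uscalar_t		lso_flags;
	lso_basic_tcp_ipv4_t	lso_basic_tcp_ipv4;
} mac_capab_lso_t;

/* Hypothetical soft state holding the driver's offload tunables. */
typedef struct example_softc {
	boolean_t	es_tx_cksum;	/* transmit checksum offload enabled */
	boolean_t	es_lso;		/* LSO enabled */
	uint32_t	es_lso_max;	/* largest TCP payload for LSO */
} example_softc_t;

static boolean_t
example_m_getcapab(void *arg, mac_capab_t cap, void *cap_data)
{
	example_softc_t *sc = arg;

	switch (cap) {
	case MAC_CAPAB_HCKSUM: {
		uint32_t *flags = cap_data;

		assert(cap_data != NULL);
		if (!sc->es_tx_cksum)
			return (B_FALSE);
		*flags = HCKSUM_INET_PARTIAL | HCKSUM_IPHDRCKSUM;
		return (B_TRUE);
	}
	case MAC_CAPAB_LSO: {
		mac_capab_lso_t *lso = cap_data;

		assert(cap_data != NULL);
		if (!sc->es_lso)
			return (B_FALSE);
		lso->lso_flags = LSO_TX_BASIC_TCP_IPV4;
		lso->lso_basic_tcp_ipv4.lso_max = sc->es_lso_max;
		return (B_TRUE);
	}
	default:
		/* Unknown or unsupported capabilities are declined. */
		return (B_FALSE);
	}
}
```
.Pp
Consulting the tunables in the soft state before answering lets an
administrator turn an offload off without the driver having to be rebuilt.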
.Pp
The MAC framework will query a device for support of a capability
through the
.Xr mc_getcapab 9E
function.
Each capability has its own constant and may have corresponding data that goes
along with it and a specific structure that the device is required to fill in.
Note, the set of capabilities changes over time and there are also private
capabilities in the system.
Several of the capabilities are used in the implementation of the MAC framework.
Others, like
.Sy MAC_CAPAB_RINGS ,
represent features that have not been stabilized and thus neither API nor
binary compatibility is guaranteed for them.
It is important that the device driver handles unknown capabilities correctly.
For more information, see
.Xr mc_getcapab 9E .
.Pp
The following capabilities are
stable and defined in the system:
.Ss MAC_CAPAB_HCKSUM
The
.Sy MAC_CAPAB_HCKSUM
capability indicates to the system that the device driver supports some
amount of checksumming.
The specific data for this capability is a pointer to a
.Sy uint32_t .
To indicate no support for any kind of checksumming, the driver should
either set this value to zero or simply return that it doesn't support
the capability.
.Pp
Note, the values that the driver declares in this capability indicate
what it can do when it transmits data.
If the driver can only verify checksums when receiving data, then it should not
indicate that it supports this capability.
The following set of flags may be combined through a bitwise inclusive OR:
.Bl -tag -width Ds
.It Sy HCKSUM_INET_PARTIAL
This indicates that the hardware can calculate a partial checksum for
both IPv4 and IPv6 UDP and TCP packets; however, it requires the pseudo-header
checksum be calculated for it.
The pseudo-header checksum will be available for the mblk_t when calling
.Xr mac_hcksum_get 9F .
Note this does not imply that the hardware is capable of calculating
the partial checksum for other L4 protocols or the IPv4 header checksum.
Support for the IPv4 header checksum should be indicated with the
.Sy HCKSUM_IPHDRCKSUM
flag.
.It Sy HCKSUM_INET_FULL_V4
This indicates that the hardware will fully calculate the L4 checksum for
outgoing IPv4 UDP or TCP packets only, and does not require a pseudo-header
checksum.
Note this does not imply that the hardware is capable of calculating the
checksum for other L4 protocols or the IPv4 header checksum.
Support for the IPv4 header checksum should be indicated with the
.Sy HCKSUM_IPHDRCKSUM
flag.
.It Sy HCKSUM_INET_FULL_V6
This indicates that the hardware will fully calculate the L4 checksum for
outgoing IPv6 UDP or TCP packets only, and does not require a pseudo-header
checksum.
Note this does not imply that the hardware is capable of calculating the
checksum for any other L4 protocols.
.It Sy HCKSUM_IPHDRCKSUM
This indicates that the hardware supports calculating the checksum for
the IPv4 header itself.
.El
.Pp
When in a driver's transmit function, the driver will be processing a
single frame.
It should call
.Xr mac_hcksum_get 9F
to see what checksum flags are set on it.
Note that the flags that are set on it are different from the ones described
above and are documented in its manual page.
These flags indicate how the driver is expected to program the hardware and what
checksumming is required.
Not all frames will require hardware checksumming or will ask the hardware to
checksum them.
.Pp
If a driver supports offloading the receive checksum and verification,
it should check to see what the hardware indicated was verified.
The driver should then call
.Xr mac_hcksum_set 9F .
The flags used are different from the ones above and are discussed in
detail in the
.Xr mac_hcksum_set 9F
manual page.
If there is no checksum information available or the driver does not support
checksumming, then it should simply not call
.Xr mac_hcksum_set 9F .
.Pp
Note that the checksum flags should be set on the first
mblk_t that makes up a given message.
In other words, if multiple mblk_t structures are linked together by the
.Sy b_cont
member to describe a single frame, then
.Xr mac_hcksum_set 9F
should only be called on the first mblk_t of that set.
However, each distinct message should have the checksum bits set on it, if
applicable.
In other words, each mblk_t that is linked together by the
.Sy b_next
pointer may have checksum flags set.
.Pp
It is recommended that device drivers provide a private property or
.Xr driver.conf 4
property to control whether or not checksumming is enabled for both rx
and tx; however, the default disposition is recommended to be enabled
for both.
This way if hardware bugs are found in the checksumming implementation, they can
be disabled without requiring software updates.
The transmit property should be checked when determining how to reply to
.Xr mc_getcapab 9E
and the receive property should be checked in the context of the receive
function.
.Ss MAC_CAPAB_LSO
The
.Sy MAC_CAPAB_LSO
capability indicates that the driver supports various forms of large
send offload (LSO).
The private data is a pointer to a
.Sy mac_capab_lso_t
structure.
At the moment, LSO support is limited to TCP inside of IPv4.
This structure has the following members which are used to indicate
various types of LSO support.
.Bd -literal -offset indent
t_uscalar_t		lso_flags;
lso_basic_tcp_ipv4_t	lso_basic_tcp_ipv4;
.Ed
.Pp
The
.Sy lso_flags
member is used to indicate which members are valid and should be
considered.
Each flag represents a different form of LSO.
The member should be set to the bitwise inclusive OR of the following values:
.Bl -tag -width Dv -offset indent
.It Sy LSO_TX_BASIC_TCP_IPV4
This indicates hardware support for performing TCP segmentation
offloading over IPv4.
When this flag is set, the
.Sy lso_basic_tcp_ipv4
member must be filled in.
.El
.Pp
The
.Sy lso_basic_tcp_ipv4
member is a structure with the following members:
.Bd -literal -offset indent
t_uscalar_t	lso_max;
.Ed
.Bd -filled -offset indent
The
.Sy lso_max
member should be set to the maximum size of the TCP data
payload that can be offloaded to the hardware.
.Ed
.Pp
As with checksumming, it is recommended that driver writers provide a
means for disabling the support of LSO even if it is enabled by default.
This allows issues that arise with LSO to be worked around without
requiring additional driver work.
.Sh PROPERTIES
Properties in the MAC framework represent aspects of a link.
These include things like the link's current state and MTU.
Many of the properties in the system are focused around auto-negotiation and
controlling what link speeds are advertised.
Information about properties is covered by three different device entry points.
The
.Xr mc_propinfo 9E
entry point obtains metadata about the property.
The
.Xr mc_getprop 9E
entry point obtains the property.
The
.Xr mc_setprop 9E
entry point updates the property to a new value.
.Pp
Many of the properties listed below are read-only.
Each property indicates whether it's read-only or read/write.
However, driver writers may not implement the ability to set all writable
properties.
Many of these depend on the card itself.
In particular, all properties that relate to auto-negotiation and are read/write
may not be updated if the hardware in question does not support toggling what
link speeds are auto-negotiated.
While copper Ethernet often does not have this restriction, it often exists with
various fiber standards and phys.
.Pp
The following properties are the subset of MAC framework properties that
driver writers should be aware of and handle.
While other properties exist in the system, driver writers should always return
an error when a property not listed below is encountered.
See
.Xr mc_getprop 9E
and
.Xr mc_setprop 9E
for more information on how to handle them.
.Bl -hang -width Ds
.It Sy MAC_PROP_DUPLEX
.Bd -filled -compact
Type:
.Sy link_duplex_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_DUPLEX
property is used to indicate whether the link is running at half or full
duplex.
A full-duplex link may have traffic flowing in both directions at the same time.
The
.Sy link_duplex_t
is an enumeration which may be set to any of the following values:
.Bl -tag -width Ds
.It Sy LINK_DUPLEX_UNKNOWN
The current state of the link is unknown.
This may be because the link has not negotiated to a specific speed or it is
down.
.It Sy LINK_DUPLEX_HALF
The link is running at half duplex.
Communication may travel in only one direction on the link at a given time.
.It Sy LINK_DUPLEX_FULL
The link is running at full duplex.
Communication may travel in both directions on the link simultaneously.
.El
.It Sy MAC_PROP_SPEED
.Bd -filled -compact
Type:
.Sy uint64_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_SPEED
property stores the current link speed in bits per second.
A link that is running at 100 Mbit/s would store the value 100000000ULL.
A link that is running at 40 Gbit/s would store the value 40000000000ULL.
.It Sy MAC_PROP_STATUS
.Bd -filled -compact
Type:
.Sy link_state_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_STATUS
property is used to indicate the current state of the link.
It indicates whether the link is up or down.
The
.Sy link_state_t
is an enumeration which may be set to any of the following values:
.Bl -tag -width Ds
.It Sy LINK_STATE_UNKNOWN
The current state of the link is unknown.
This may be because the driver's
.Xr mc_start 9E
entry point has not been called so it has not attempted to start the link.
.It Sy LINK_STATE_DOWN
The link is down.
This may be because of a negotiation problem, a cable problem, or some other
device specific issue.
.It Sy LINK_STATE_UP
The link is up.
If auto-negotiation is in use, it should have completed.
Traffic should be able to flow over the link, barring other issues.
.El
.It Sy MAC_PROP_AUTONEG
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_AUTONEG
property indicates whether or not the device is currently configured to
perform auto-negotiation.
A value of
.Sy 0
indicates that auto-negotiation is disabled.
A
.Sy non-zero
value indicates that auto-negotiation is enabled.
Devices should generally default to enabling auto-negotiation.
.Pp
When getting this property, the device driver should return the current
state.
When setting this property, if the device supports operating in the requested
mode, then the device driver should reset the link to negotiate to the new speed
after updating any internal registers.
.It Sy MAC_PROP_MTU
.Bd -filled -compact
Type:
.Sy uint32_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_MTU
property determines the maximum transmission unit (MTU).
This indicates the maximum size packet that the device can transmit, ignoring
its own headers.
For an Ethernet device, this would exclude the size of the Ethernet header and
any VLAN headers that would be placed.
It is up to the driver to ensure that any MTU value that it accepts, when
combined with its margin and header sizes, does not exceed its maximum frame
size.
.Pp
By default, drivers for Ethernet should initialize this value and the
MTU to
.Sy 1500 .
When getting this property, the driver should return its current
recorded MTU.
When setting this property, the driver should first validate that it is within
the device's valid range and then it must call
.Xr mac_maxsdu_update 9F .
Note that the call may fail.
If the call completes successfully, the driver should update the hardware with
the new value of the MTU and perform any other work needed to handle it.
.Pp
If the device does not support changing the MTU after the device's
.Xr mc_start 9E
entry point has been called, then the driver should return
.Er EBUSY .
.It Sy MAC_PROP_FLOWCTRL
.Bd -filled -compact
Type:
.Sy link_flowctrl_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_FLOWCTRL
property manages the configuration of pause frames as part of Ethernet
flow control.
Note, this only describes what this device will advertise.
What is actually enabled may be different and is subject to the rules of
auto-negotiation.
The
.Sy link_flowctrl_t
is an enumeration that may be set to one of the following values:
.Bl -tag -width Ds
.It Sy LINK_FLOWCTRL_NONE
Flow control is disabled.
No pause frames should be generated or honored.
.It Sy LINK_FLOWCTRL_RX
The device can receive pause frames; however, it should not generate
them.
.It Sy LINK_FLOWCTRL_TX
The device can generate pause frames; however, it does not support
receiving them.
.It Sy LINK_FLOWCTRL_BI
The device supports both sending and receiving pause frames.
.El
.Pp
When getting this property, the device driver should return the way that
it has configured the device, not what the device has actually
negotiated.
When setting the property, it should update the hardware and allow the link to
potentially perform auto-negotiation again.
.It Sy MAC_PROP_EN_FEC_CAP
.Bd -filled -compact
Type:
.Sy link_fec_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_FEC_CAP
property indicates which Forward Error Correction (FEC) code is advertised
by the device.
.Pp
The
.Sy link_fec_t
is an enumeration that may be a combination of the following bit values:
.Bl -tag -width Ds
.It Sy LINK_FEC_NONE
No FEC over the link.
.It Sy LINK_FEC_AUTO
The FEC coding to use is auto-negotiated.
.Sy LINK_FEC_AUTO
cannot be set along with any of the other values.
This is the default setting the device driver should use.
.It Sy LINK_FEC_RS
The link may use Reed-Solomon FEC coding.
.It Sy LINK_FEC_BASE_R
The link may use Base-R coding, also commonly referred to as FireCode.
.El
.Pp
When setting the property, the device driver should update the hardware with
the requested coding or combination of requested codings.
If a particular combination of codings is not supported by the hardware,
the device driver should return
.Er EINVAL .
When retrieving this property, the device driver should return the current
value of the property.
.It Sy MAC_PROP_ADV_FEC_CAP
.Bd -filled -compact
Type:
.Sy link_fec_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_FEC_CAP
has the same values as
.Sy MAC_PROP_EN_FEC_CAP .
The property indicates which Forward Error Correction (FEC) code has been
negotiated over the link.
.El
.Pp
The remaining properties are all about various auto-negotiation link
speeds.
They fall into two different buckets: properties with
.Sy _ADV_
in the name and properties with
.Sy _EN_
in the name.
For any given supported speed, there is one of each.
The
.Sy _EN_
set of properties are read/write properties that control what should be
advertised by the device.
When these are retrieved, they should return the current value of the property.
When they are set, they should change how the hardware advertises the specific
speed and trigger any kind of link reset and auto-negotiation, if enabled, to
occur.
.Pp
The
.Sy _ADV_
set of properties are read-only properties.
They are meant to reflect what has actually been negotiated.
These may be different from the
.Sy _EN_
family of properties, especially when different power management
settings are at play.
.Pp
See the
.Sx Link Speed and Auto-negotiation
section for more information.
.Pp
The properties are ordered in increasing link speed:
.Bl -hang -width Ds
.It Sy MAC_PROP_ADV_10HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_10HDX_CAP
property describes whether or not 10 Mbit/s half-duplex support is
advertised.
.It Sy MAC_PROP_EN_10HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_10HDX_CAP
property describes whether or not 10 Mbit/s half-duplex support is
enabled.
.It Sy MAC_PROP_ADV_10FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_10FDX_CAP
property describes whether or not 10 Mbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_10FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_10FDX_CAP
property describes whether or not 10 Mbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100HDX_CAP
property describes whether or not 100 Mbit/s half-duplex support is
advertised.
.It Sy MAC_PROP_EN_100HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100HDX_CAP
property describes whether or not 100 Mbit/s half-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100FDX_CAP
property describes whether or not 100 Mbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_100FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100FDX_CAP
property describes whether or not 100 Mbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100T4_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
advertised.
.It Sy MAC_PROP_EN_100T4_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
enabled.
.It Sy MAC_PROP_ADV_1000HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_1000HDX_CAP
property describes whether or not 1 Gbit/s half-duplex support is
advertised.
.It Sy MAC_PROP_EN_1000HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_1000HDX_CAP
property describes whether or not 1 Gbit/s half-duplex support is
enabled.
.It Sy MAC_PROP_ADV_1000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_1000FDX_CAP
property describes whether or not 1 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_1000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_1000FDX_CAP
property describes whether or not 1 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_2500FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_2500FDX_CAP
property describes whether or not 2.5 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_2500FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_2500FDX_CAP
property describes whether or not 2.5 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_5000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_5000FDX_CAP
property describes whether or not 5.0 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_5000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_5000FDX_CAP
property describes whether or not 5.0 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_10GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_10GFDX_CAP
property describes whether or not 10 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_10GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_10GFDX_CAP
property describes whether or not 10 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_40GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_40GFDX_CAP
property describes whether or not 40 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_40GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_40GFDX_CAP
property describes whether or not 40 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100GFDX_CAP
property describes whether or not 100 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_100GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100GFDX_CAP
property describes whether or not 100 Gbit/s full-duplex support is
enabled.
.El
.Ss Private Properties
In addition to the defined properties above, drivers are allowed to
define private properties.
These private properties are device-specific properties.
All private properties share the same constant,
.Sy MAC_PROP_PRIVATE .
Properties are distinguished by a name, which is a character string.
The list of such private properties is defined when registering with MAC in the
.Sy m_priv_props
member of the
.Xr mac_register 9S
structure.
.Pp
The driver may define whatever semantics it wants for these private
properties.
They will not be listed when running
.Xr dladm 1M ,
unless explicitly requested by name.
All such properties should start with a leading underscore character and then
consist of alphanumeric ASCII characters and additional underscores or hyphens.
.Pp
Properties of type
.Sy MAC_PROP_PRIVATE
may show up in all three property related entry points:
.Xr mc_propinfo 9E ,
.Xr mc_getprop 9E ,
and
.Xr mc_setprop 9E .
Device drivers should tell the different properties apart by using the
.Xr strcmp 9F
function to compare the property name against the set of properties that they
know about.
When a driver encounters a property that it doesn't know, it should treat it
like all other unknown properties.
.Sh STATISTICS
The MAC framework defines a couple of different sets of statistics which
are based on various standards for devices to implement.
Statistics are retrieved through the
.Xr mc_getstat 9E
entry point.
Some statistics are required for all devices, while a separate set is
specific to Ethernet.
Not all devices will support every statistic.
In many cases, several device registers will need to be combined to create the
proper stat.
.Pp
In general, if the device is not keeping track of these statistics, then
it is recommended that the driver store these values as a
.Sy uint64_t
to ensure that overflow does not occur.
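.Pp
As a minimal sketch of the recommendation above, assume a hypothetical device
that exposes a free-running 32-bit counter register that wraps.
The driver can fold each raw reading into a
.Sy uint64_t
total so that the value it reports does not overflow.
The names
.Sy example_stat_t
and
.Sy example_stat_update
are illustrative only and are not part of the MAC framework.

```c
#include <stdint.h>

/*
 * Hypothetical software copy of one hardware statistic. The device is
 * assumed to expose a free-running 32-bit counter that wraps to zero.
 */
typedef struct example_stat {
	uint64_t es_total;	/* value a driver would report for the stat */
	uint32_t es_last;	/* last raw register value that was read */
} example_stat_t;

/*
 * Fold a new raw register reading into the 64-bit total. Unsigned
 * 32-bit subtraction yields the correct delta across a single wrap.
 */
static void
example_stat_update(example_stat_t *sp, uint32_t raw)
{
	uint32_t delta = raw - sp->es_last;

	sp->es_total += delta;
	sp->es_last = raw;
}
```

This only works if the register is read often enough that it cannot wrap more
than once between readings; how often that is depends on the device.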
.Pp
If a device does not support a specific statistic, then it is fine to
return that it is not supported.
The same approach should be used for unrecognized statistics.
See
.Xr mc_getstat 9E
for more information on the proper way to handle these.
.Ss General Device Statistics
The following statistics are based on MIB-II statistics from both RFC
1213 and RFC 1573.
.Bl -tag -width Ds
.It Sy MAC_STAT_IFSPEED
The device's current speed in bits per second.
.It Sy MAC_STAT_MULTIRCV
The total number of received multicast packets.
.It Sy MAC_STAT_BRDCSTRCV
The total number of received broadcast packets.
.It Sy MAC_STAT_MULTIXMT
The total number of transmitted multicast packets.
.It Sy MAC_STAT_BRDCSTXMT
The total number of transmitted broadcast packets.
.It Sy MAC_STAT_NORCVBUF
The total number of packets discarded by the hardware due to a lack of
receive buffers.
.It Sy MAC_STAT_IERRORS
The total number of errors detected on input.
.It Sy MAC_STAT_UNKNOWNS
The total number of received packets that were discarded because they
were of an unknown protocol.
.It Sy MAC_STAT_NOXMTBUF
The total number of outgoing packets dropped due to a lack of transmit
buffers.
.It Sy MAC_STAT_OERRORS
The total number of outgoing packets that resulted in errors.
.It Sy MAC_STAT_COLLISIONS
Total number of collisions encountered by the transmitter.
.It Sy MAC_STAT_RBYTES
The total number of
.Sy bytes
received by the device, regardless of packet type.
.It Sy MAC_STAT_IPACKETS
The total number of
.Sy packets
received by the device, regardless of packet type.
.It Sy MAC_STAT_OBYTES
The total number of
.Sy bytes
transmitted by the device, regardless of packet type.
.It Sy MAC_STAT_OPACKETS
The total number of
.Sy packets
sent by the device, regardless of packet type.
.It Sy MAC_STAT_UNDERFLOWS
The total number of packets that were smaller than the minimum sized
packet for the device and were therefore dropped.
.It Sy MAC_STAT_OVERFLOWS
The total number of packets that were larger than the maximum sized
packet for the device and were therefore dropped.
.El
.Ss Ethernet Specific Statistics
The following statistics are specific to Ethernet devices.
They refer to values from RFC 1643 and include various MII/GMII specific stats.
Many of these are also defined in IEEE 802.3.
.Bl -tag -width Ds
.It Sy ETHER_STAT_ADV_CAP_1000FDX
Indicates that the device is advertising support for 1 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_1000HDX
Indicates that the device is advertising support for 1 Gbit/s
half-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100FDX
Indicates that the device is advertising support for 100 Mbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100GFDX
Indicates that the device is advertising support for 100 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100HDX
Indicates that the device is advertising support for 100 Mbit/s
half-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100T4
Indicates that the device is advertising support for 100 Mbit/s
100BASE-T4 operation.
.It Sy ETHER_STAT_ADV_CAP_10FDX
Indicates that the device is advertising support for 10 Mbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_10GFDX
Indicates that the device is advertising support for 10 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_10HDX
Indicates that the device is advertising support for 10 Mbit/s
half-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_2500FDX
Indicates that the device is advertising support for 2.5 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_40GFDX
Indicates that the device is advertising support for 40 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_5000FDX
Indicates that the device is advertising support for 5.0 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_ASMPAUSE
Indicates that the device is advertising support for receiving pause
frames.
.It Sy ETHER_STAT_ADV_CAP_AUTONEG
Indicates that the device is advertising support for auto-negotiation.
.It Sy ETHER_STAT_ADV_CAP_PAUSE
Indicates that the device is advertising support for generating pause
frames.
.It Sy ETHER_STAT_ADV_REMFAULT
Indicates that the device is advertising support for detecting faults in
the remote link peer.
.It Sy ETHER_STAT_ALIGN_ERRORS
Indicates the number of times an alignment error was generated by the
Ethernet device.
This is a count of packets that were not an integral number of octets and failed
the FCS check.
.It Sy ETHER_STAT_CAP_1000FDX
Indicates the device supports 1 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_1000HDX
Indicates the device supports 1 Gbit/s half-duplex operation.
.It Sy ETHER_STAT_CAP_100FDX
Indicates the device supports 100 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_100GFDX
Indicates the device supports 100 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_100HDX
Indicates the device supports 100 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_CAP_100T4
Indicates the device supports 100 Mbit/s 100BASE-T4 operation.
.It Sy ETHER_STAT_CAP_10FDX
Indicates the device supports 10 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_10GFDX
Indicates the device supports 10 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_10HDX
Indicates the device supports 10 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_CAP_2500FDX
Indicates the device supports 2.5 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_40GFDX
Indicates the device supports 40 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_5000FDX
Indicates the device supports 5.0 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_ASMPAUSE
Indicates that the device supports the ability to receive pause frames.
.It Sy ETHER_STAT_CAP_AUTONEG
Indicates that the device supports the ability to perform link
auto-negotiation.
.It Sy ETHER_STAT_CAP_PAUSE
Indicates that the device supports the ability to transmit pause frames.
.It Sy ETHER_STAT_CAP_REMFAULT
Indicates that the device supports the ability of detecting a remote
fault in a link peer.
.It Sy ETHER_STAT_CARRIER_ERRORS
Indicates the number of times that the Ethernet carrier sense condition
was lost or not asserted.
.It Sy ETHER_STAT_DEFER_XMTS
Indicates the number of frames for which the device was unable to
transmit the frame due to being busy and had to try again.
.It Sy ETHER_STAT_EX_COLLISIONS
Indicates the number of frames that failed to send due to an excessive
number of collisions.
.It Sy ETHER_STAT_FCS_ERRORS
Indicates the number of times that a frame check sequence failed.
.It Sy ETHER_STAT_FIRST_COLLISIONS
Indicates the number of times that a frame was eventually transmitted
successfully, but only after a single collision.
.It Sy ETHER_STAT_JABBER_ERRORS
Indicates the number of frames that were received that were both larger
than the maximum packet size and failed the frame check sequence.
.It Sy ETHER_STAT_LINK_ASMPAUSE
Indicates whether the link is currently configured to accept pause
frames.
.It Sy ETHER_STAT_LINK_AUTONEG
Indicates whether the current link state is a result of
auto-negotiation.
.It Sy ETHER_STAT_LINK_DUPLEX
Indicates the current duplex state of the link.
The values used here should be the same as documented for
.Sy MAC_PROP_DUPLEX .
.It Sy ETHER_STAT_LINK_PAUSE
Indicates whether the link is currently configured to generate pause
frames.
.It Sy ETHER_STAT_LP_CAP_1000FDX
Indicates the remote device supports 1 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_1000HDX
Indicates the remote device supports 1 Gbit/s half-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100FDX
Indicates the remote device supports 100 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100GFDX
Indicates the remote device supports 100 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100HDX
Indicates the remote device supports 100 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100T4
Indicates the remote device supports 100 Mbit/s 100BASE-T4 operation.
.It Sy ETHER_STAT_LP_CAP_10FDX
Indicates the remote device supports 10 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_10GFDX
Indicates the remote device supports 10 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_10HDX
Indicates the remote device supports 10 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_LP_CAP_2500FDX
Indicates the remote device supports 2.5 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_40GFDX
Indicates the remote device supports 40 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_5000FDX
Indicates the remote device supports 5.0 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_ASMPAUSE
Indicates that the remote device supports the ability to receive pause
frames.
.It Sy ETHER_STAT_LP_CAP_AUTONEG
Indicates that the remote device supports the ability to perform link
auto-negotiation.
.It Sy ETHER_STAT_LP_CAP_PAUSE
Indicates that the remote device supports the ability to transmit pause
frames.
.It Sy ETHER_STAT_LP_CAP_REMFAULT
Indicates that the remote device supports the ability of detecting a
remote fault in a link peer.
.It Sy ETHER_STAT_MACRCV_ERRORS
Indicates the number of times that the internal MAC layer encountered an
error when attempting to receive and process a frame.
.It Sy ETHER_STAT_MACXMT_ERRORS
Indicates the number of times that the internal MAC layer encountered an
error when attempting to process and transmit a frame.
.It Sy ETHER_STAT_MULTI_COLLISIONS
Indicates the number of times that a frame was eventually transmitted
successfully, but only after more than one collision.
.It Sy ETHER_STAT_SQE_ERRORS
Indicates the number of times that an SQE error occurred.
The specific conditions for this error are documented in IEEE 802.3.
.It Sy ETHER_STAT_TOOLONG_ERRORS
Indicates the number of frames that were received that were longer than
the maximum frame size supported by the device.
.It Sy ETHER_STAT_TOOSHORT_ERRORS
Indicates the number of frames that were received that were shorter than
the minimum frame size supported by the device.
.It Sy ETHER_STAT_TX_LATE_COLLISIONS
Indicates the number of times a collision was detected late on the
device.
.It Sy ETHER_STAT_XCVR_ADDR
Indicates the address of the MII/GMII receiver.
.It Sy ETHER_STAT_XCVR_ID
Indicates the ID of the MII/GMII receiver.
.It Sy ETHER_STAT_XCVR_INUSE
Indicates what kind of receiver is in use.
The following values may be used:
.Bl -tag -width Ds
.It Sy XCVR_UNDEFINED
The receiver type is undefined by the hardware.
.It Sy XCVR_NONE
There is no receiver in use by the hardware.
.It Sy XCVR_10
The receiver supports 10BASE-T operation.
.It Sy XCVR_100T4
The receiver supports 100BASE-T4 operation.
.It Sy XCVR_100X
The receiver supports 100BASE-TX operation.
.It Sy XCVR_100T2
The receiver supports 100BASE-T2 operation.
.It Sy XCVR_1000X
The receiver supports 1000BASE-X operation.
This is used for all fiber receivers.
.It Sy XCVR_1000T
The receiver supports 1000BASE-T operation.
This is used for all copper receivers.
.El
.El
.Ss Device Specific kstats
In addition to the defined statistics above, if the device driver
maintains additional statistics or the device provides additional
statistics, it should create its own kstats through the
.Xr kstat_create 9F
function to allow operators to observe them.
.Sh RECEIVE DESCRIPTOR LAYOUT
One of the important things that a device driver must do is lay out DMA
memory, generally in a ring of descriptors, into which received Ethernet
frames will be placed.
When performing this, there are a few things that drivers should
generally do:
.Bl -enum -offset indent
.It
Drivers should lay out memory so that the IP header will be 4-byte
aligned.
The IP stack expects that the beginning of an IP header will be at a
4-byte aligned address; however, a DMA allocation will be at a 4-
or 8-byte aligned address by default.
The IP header is at a 14 byte offset from the beginning of the Ethernet
frame, leaving the IP header at a 2-byte alignment if the Ethernet frame
starts at the beginning of the DMA buffer.
If VLAN tagging is in place, then each VLAN tag adds 4 bytes, which
doesn't change the alignment the IP header is found at.
.Pp
As a solution to this, the driver should program the device to start
placing the received Ethernet frame at two bytes off of the start of the
DMA buffer.
This ensures that the IP header will be 4-byte aligned whether or not
VLAN tags are present.
.It
Drivers should try to allocate the DMA memory used for receiving frames
as a contiguous buffer.
If for some reason that would not be possible, the driver should try to
ensure that there is enough space for the Ethernet header and
any possible layer three and layer four headers
.Pq such as IP, TCP, or UDP
in the initial descriptor.
.It
As discussed in the
.Sx MBLKS AND DMA
section, there are multiple strategies for managing the relationship
between DMA data, receive descriptors, and the operating system
representation of a packet in the
.Xr mblk 9S
structure.
Drivers must limit their resource consumption.
See the
.Sy Considerations
section of
.Sx MBLKS AND DMA
for more on this.
.El
.Sh TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
Device drivers are the first line of defense for dealing with broken
devices and bugs in their firmware.
While most devices will rarely fail, it is important that particular attention
is paid to RAS (Reliability, Availability, and Serviceability) when designing
and implementing the device driver.
While everything described in this section is optional, it is highly recommended
that all new device drivers follow these guidelines.
.Pp
The Fault Management Architecture (FMA) provides facilities for
detecting and reporting various classes of defects and faults.
Specifically for networking device drivers, issues that should be
detected and reported include:
.Bl -bullet -offset indent
.It
Device internal uncorrectable errors
.It
Device internal correctable errors
.It
PCI and PCI Express transport errors
.It
Device temperature alarms
.It
Device transmission stalls
.It
Device communication timeouts
.It
High invalid interrupts
.El
.Pp
All such errors fall into three primary categories:
.Bl -enum -offset indent
.It
Errors detected by the Fault Management Architecture
.It
Errors detected by the device and indicated to the device driver
.It
Errors detected by the device driver
.El
.Ss Fault Management Setup and Teardown
Drivers should initialize support for the fault management framework by
calling
.Xr ddi_fm_init 9F
from their
.Xr attach 9E
routine.
By registering with the fault management framework, a device driver is given the
chance to detect and notice transport errors as well as report other errors that
exist.
While a device driver does not need to register all of the capabilities
described in
.Xr ddi_fm_init 9F ,
we suggest that device drivers at least register the
.Sy DDI_FM_EREPORT_CAPABLE
capability so as to allow the driver to report issues that it detects.
.Pp
If the driver registers with the fault management framework during its
.Xr attach 9E
entry point, it must call
.Xr ddi_fm_fini 9F
during its
.Xr detach 9E
entry point.
.Ss Transport Errors
Many modern networking devices leverage PCI or PCI Express.
As such, there are two primary ways that device drivers access data: they either
memory map device registers and use routines like
.Xr ddi_get8 9F
and
.Xr ddi_put8 9F
or they use direct memory access (DMA).
New device drivers should always enable checking of the transport layer by
marking their support in the
.Xr ddi_device_acc_attr 9S
structure and using routines like
.Xr ddi_fm_acc_err_get 9F
and
.Xr ddi_fm_dma_err_get 9F
to detect if errors have occurred.
.Ss Device Indicated Errors
Many devices have capabilities to announce to a device driver that a
fatal correctable error or uncorrectable error has occurred.
Other devices have the ability to indicate that various physical issues have
occurred such as a fan failing or a temperature sensor having fired.
.Pp
Drivers should wire themselves to receive notifications when these
events occur.
The means and capabilities will vary from device to device.
For example, some devices will generate information about these notifications
through special interrupts.
Other devices may have a register that software can poll.
In the cases where polling is required, driver writers should try not to poll
too frequently and should generally only poll when the device is actively being
used, e.g. between calls to the
.Xr mc_start 9E
and
.Xr mc_stop 9E
entry points.
.Ss Driver Transmit Stall Detection
One of the primary responsibilities of a hardened device driver is to
perform transmit stall detection.
The core idea behind tx stall detection is that the driver should record when
it last observed activity indicating that data was successfully transmitted.
Most devices should be transmitting data on a regular basis as long as the link
is up.
If a device is not, then this may indicate that the device is stuck and needs
to be reset.
At this time, the MAC framework does not provide any resources for performing
these checks; however, polling on each individual transmit ring for the last
completion time while something is actively being transmitted through the use of
routines such as
.Xr timeout 9F
may be a reasonable starting point.
.Ss Driver Command Timeout Detection
Each device is programmed in different ways.
Some devices are programmed through asynchronous commands while others are
programmed by writing directly to memory mapped registers.
If a device receives asynchronous replies to commands, then the device driver
should set reasonable timeouts for all such commands and plan on detecting when
they expire.
If a timeout occurs, the driver should presume that there is an issue with the
hardware and proceed to abort the command or reset the device.
.Pp
Many devices do not have such a communication mechanism.
However, whenever there is some activity where the device driver must wait, then
it should be prepared for the fact that the device may never get back to
it and react appropriately by performing some kind of device reset.
.Ss Reacting to Errors
When any of the above categories of errors has been triggered, the
behavior that the device driver should take depends on the kind of
error.
If a fatal error occurred, for example, a transport error or a transmit stall
was detected, or the device indicated an uncorrectable error, then it is
important that the driver take the following steps:
.Bl -enum -offset indent
.It
Set a flag in the device driver's state that indicates that it has hit
an error condition.
When this error condition flag is asserted, transmitted packets should be
accepted and dropped and actions that would require writing to the device state
should fail with an error.
This flag should remain until the device has been successfully restarted.
.It
If the error was not a transport error that was indicated by the fault
management architecture, e.g. a transport error that was detected, then
the device driver should post an
.Sy ereport
indicating what has occurred with the
.Xr ddi_fm_ereport_post 9F
function.
.It
The device driver should indicate that the device's service was lost
with a call to
.Xr ddi_fm_service_impact 9F
using the symbol
.Sy DDI_SERVICE_LOST .
.It
At this point the device driver should issue a device reset through some
device-specific means.
.It
When the device reset has been completed, then the device driver should
restore all of the programmed state to the device.
This includes things like the current MTU, advertised auto-negotiation speeds,
MAC address filters, and more.
.It
Finally, when service has been restored, the device driver should call
.Xr ddi_fm_service_impact 9F
using the symbol
.Sy DDI_SERVICE_RESTORED .
.El
.Pp
When a non-fatal error occurs, then the device driver should submit an
ereport and should optionally mark the device degraded using
.Xr ddi_fm_service_impact 9F
with the
.Sy DDI_SERVICE_DEGRADED
value depending on the nature of the problem that has occurred.
.Pp
Device drivers should never make the decision to remove a device from
service based on errors that have occurred nor should they panic the
system.
Rather, the device driver should always try to notify the operating system with
various ereports and allow its policy decisions to occur.
The decision to retire a device lies in the hands of the fault management
architecture.
It knows more about the operator's intent and the surrounding system's state
than the device driver itself does and it will make the call to offline and
retire the device if it is required.
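.Pp
The ordering of the fatal error steps in this section can be sketched in a
small userland model.
This is an illustration under stated assumptions, not driver code: every type
and function name below is hypothetical, the hardware is modelled by a single
field, and the DDI calls named in the comments are recorded rather than made.

```c
#include <stdbool.h>

/* Recovery actions, recorded so that their order can be inspected. */
typedef enum {
	EX_EREPORT,	/* real driver: ddi_fm_ereport_post(9F) */
	EX_LOST,	/* real driver: DDI_SERVICE_LOST */
	EX_RESET,	/* real driver: device-specific reset */
	EX_RESTORE,	/* real driver: reprogram MTU, filters, ... */
	EX_RESTORED	/* real driver: DDI_SERVICE_RESTORED */
} example_step_t;

/* Hypothetical per-instance state; not a MAC framework type. */
typedef struct example_dev {
	bool		ed_error;	/* set while the device is unusable */
	unsigned int	ed_mtu;		/* programmed state kept by driver */
	unsigned int	ed_hw_mtu;	/* stand-in for a device register */
	int		ed_nsteps;
	example_step_t	ed_steps[5];
} example_dev_t;

static void
example_note(example_dev_t *dp, example_step_t step)
{
	if (dp->ed_nsteps < 5)
		dp->ed_steps[dp->ed_nsteps++] = step;
}

/* Stand-in for a device reset, which loses the programmed state. */
static bool
example_hw_reset(example_dev_t *dp)
{
	dp->ed_hw_mtu = 0;
	example_note(dp, EX_RESET);
	return (true);
}

/* fma_indicated: the fault management architecture reported the error. */
static bool
example_fatal_error(example_dev_t *dp, bool fma_indicated)
{
	dp->ed_error = true;			/* 1: flag the error */
	if (!fma_indicated)
		example_note(dp, EX_EREPORT);	/* 2: post an ereport */
	example_note(dp, EX_LOST);		/* 3: service lost */
	if (!example_hw_reset(dp))		/* 4: reset the device */
		return (false);			/* flag remains set */
	dp->ed_hw_mtu = dp->ed_mtu;		/* 5: restore state */
	example_note(dp, EX_RESTORE);
	example_note(dp, EX_RESTORED);		/* 6: service restored */
	dp->ed_error = false;
	return (true);
}
```

The error flag is cleared only after the programmed state has been restored,
so a transmit path that checks it keeps dropping packets for the whole
recovery window.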
.Ss Device Resets
When resetting a device, a device driver must exercise caution.
If a device driver has not been written to plan for a device reset, then it
may not correctly restore the device's state after such a reset.
Such state should be stored in the instance's private state data as the MAC
framework does not know about device resets and will not inform the
device again about the expected, programmed state.
.Pp
One wrinkle with device resets is that many networking cards show up as
multiple PCI functions on a single device, for example, each port may
show up as a separate function and thus have a separate instance of the
device driver attached.
When resetting a function, device driver writers should carefully read the
device programming manuals and verify whether a reset impacts only the
stalled function or all functions across the device.
.Pp
If the only way to reset a given function is to reset the entire device, then
this may require more coordination and work on the part of the device
driver to ensure that all the other instances are correctly restored.
In cases where this occurs, some devices offer ways of injecting
interrupts onto those other functions to notify them that this is
occurring.
.Sh MBLKS AND DMA
The networking stack manages framed data through the use of the
.Xr mblk 9S
structure.
The mblk allows for a single message to be made up of individual blocks.
Each part is linked together through its
.Sy b_cont
member.
However, it also allows for multiple messages to be chained together through the
use of the
.Sy b_next
member.
While the networking stack works with these structures, device drivers generally
work with DMA regions.
There are two different strategies that device drivers use for handling these
two different cases: copying and binding.
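.Pp
The difference between the two links can be illustrated with a small
userland sketch.
The structure below is a cut-down stand-in for mblk_t that carries only
the members used here, and the example_ functions are hypothetical
helpers, not part of the MAC framework or the DDI.
Kernel code would use
.Xr bcopy 9F
where this sketch uses memcpy().
.Bd -literal
```c
#include <stddef.h>
#include <string.h>

/* Userland stand-in for the kernel's mblk_t; illustrative only. */
typedef struct mblk {
	struct mblk	*b_next;	/* next message in a chain */
	struct mblk	*b_cont;	/* next block of this message */
	unsigned char	*b_rptr;	/* first byte of valid data */
	unsigned char	*b_wptr;	/* one past the last valid byte */
} mblk_t;

/* Total bytes in one message: every block linked through b_cont. */
static size_t
example_msg_size(const mblk_t *mp)
{
	size_t len = 0;

	for (; mp != NULL; mp = mp->b_cont)
		len += (size_t)(mp->b_wptr - mp->b_rptr);
	return (len);
}

/* Number of separate messages linked through b_next. */
static size_t
example_chain_count(const mblk_t *mp)
{
	size_t n = 0;

	for (; mp != NULL; mp = mp->b_next)
		n++;
	return (n);
}

/*
 * Flatten one message into a contiguous buffer, as a copying driver
 * might do into a DMA region. Returns the number of bytes copied, or
 * 0 if the message does not fit.
 */
static size_t
example_msg_flatten(const mblk_t *mp, unsigned char *buf, size_t buflen)
{
	size_t off = 0;

	if (example_msg_size(mp) > buflen)
		return (0);
	for (; mp != NULL; mp = mp->b_cont) {
		size_t len = (size_t)(mp->b_wptr - mp->b_rptr);

		memcpy(buf + off, mp->b_rptr, len);
		off += len;
	}
	return (off);
}
```
.Ed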
.Ss Copying Data
The first way that device drivers handle interfacing between the two is
by having two separate regions of memory.
One part is memory which has been allocated for DMA through a call to
.Xr ddi_dma_mem_alloc 9F
and the other is memory associated with the memory block.
.Pp
In this case, a driver will use
.Xr bcopy 9F
to copy memory between the two distinct regions.
When transmitting a packet, it will copy the memory from the mblk_t to the DMA
region.
When receiving data, it will allocate an mblk_t through the
.Xr allocb 9F
routine, copy the memory across with
.Xr bcopy 9F ,
and then advance the mblk_t's
.Sy b_wptr
member by the number of bytes copied.
.Pp
If, when receiving, memory is not available for a new message block,
then the frame should be skipped and effectively dropped.
A kstat should be bumped when such an occasion occurs.
.Ss Binding Data
An alternative approach to copying data is to use DMA binding.
When using DMA binding, the OS takes care of mapping between DMA memory and
normal device memory.
The exact process is a bit different between transmit and receive.
.Pp
When transmitting, a device driver has an mblk_t and needs to call the
.Xr ddi_dma_addr_bind_handle 9F
function to bind it to an already existing DMA handle.
At that point, it will receive various DMA cookies that it can use to obtain the
addresses to program the device with for transmitting data.
Once the transmit is done, the driver must then make sure to call
.Xr freemsg 9F
to release the data.
It must not call
.Xr freemsg 9F
before it receives an interrupt from the device indicating that the data
has been transmitted, otherwise it risks sending arbitrary kernel
memory.
.Pp
When receiving data, the device driver can perform a similar operation.
First, it must bind the DMA memory into the kernel's virtual memory address
space through a call to the
.Xr ddi_dma_addr_bind_handle 9F
function if it has not already.
Once it has, it must then call
.Xr desballoc 9F
to try to create a new mblk_t which leverages the associated memory.
It can then pass that mblk_t up to the stack.
.Ss Considerations
When deciding which of these options to use, there are many different
considerations that must be made.
The answer as to whether to bind memory or to copy data is not always simple.
.Pp
The first thing to remember is that DMA resources may be finite on a
given platform.
Consider the case of receiving data.
A device driver that binds one of its receive descriptors may not get it back
for quite some time as it may be used by the kernel until an application
actually consumes it.
Device drivers that try to bind memory for receive often work with the
constraint that they must be able to replace that DMA memory with another DMA
descriptor.
If it were not replaced, then eventually the device would not be able to
receive additional data into the ring.
.Pp
On the other hand, particularly for larger frames, copying every packet
from one buffer to another can be a source of additional latency and
memory waste in the system.
For larger copies, the cost of copying may dwarf any potential cost of
performing DMA binding.
.Pp
Device driver authors who are unsure of what to do should
first employ the copying method to simplify the act of writing the
device driver.
The copying method is simpler and also allows the device driver author not to
worry about allocated DMA memory that is still outstanding when it is asked to
unload.
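.Pp
As a rough illustration of the copying method on the receive side, the
following userland sketch allocates a message block, copies the frame
out of the DMA region, advances the write pointer, and counts a dropped
frame when allocation fails.
The example_mblk_t type, example_allocb(), and the statistics structure
are hypothetical stand-ins for mblk_t,
.Xr allocb 9F ,
and a kstat; a real driver would use
.Xr bcopy 9F
and pass the resulting mblk_t to the stack.
.Bd -literal
```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userland stand-in for an mblk_t and its data; illustrative only. */
typedef struct example_mblk {
	unsigned char	*b_rptr;	/* first byte of valid data */
	unsigned char	*b_wptr;	/* one past the last valid byte */
	unsigned char	b_data[];	/* storage for the frame */
} example_mblk_t;

/* Stand-in for the counters a driver would export through a kstat. */
typedef struct example_rx_stats {
	uint64_t	rx_frames;	/* frames copied successfully */
	uint64_t	rx_alloc_fail;	/* frames dropped, no memory */
} example_rx_stats_t;

/* Stand-in for allocb(9F): returns NULL when memory is unavailable. */
static example_mblk_t *
example_allocb(size_t size)
{
	example_mblk_t *mp;

	if (size > SIZE_MAX - sizeof (*mp))
		return (NULL);
	mp = malloc(sizeof (*mp) + size);
	if (mp == NULL)
		return (NULL);
	mp->b_rptr = mp->b_wptr = mp->b_data;
	return (mp);
}

/*
 * Copy one received frame out of the DMA region. On allocation
 * failure the frame is dropped and the failure counter is bumped.
 */
static example_mblk_t *
example_rx_copy(example_rx_stats_t *st, const unsigned char *dma_buf,
    size_t len)
{
	example_mblk_t *mp = example_allocb(len);

	if (mp == NULL) {
		st->rx_alloc_fail++;
		return (NULL);
	}
	memcpy(mp->b_wptr, dma_buf, len);
	mp->b_wptr += len;
	st->rx_frames++;
	return (mp);
}
```
.Ed
.Pp
Because the frame has been copied, the DMA buffer in this sketch can be
handed back to the device as soon as example_rx_copy() returns.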
.Pp
If device driver writers are worried about the cost, it is recommended
that the decision of whether to copy or bind DMA data be controlled by
a separate private property for each of transmitting and receiving.
That private property should indicate the frame size at which
to switch from one method to the other.
This way, data can be gathered to determine what the impact of each method is on
a given platform.
.Sh SEE ALSO
.Xr dladm 1M ,
.Xr driver.conf 4 ,
.Xr ieee802.3 5 ,
.Xr dlpi 7P ,
.Xr _fini 9E ,
.Xr _info 9E ,
.Xr _init 9E ,
.Xr attach 9E ,
.Xr close 9E ,
.Xr detach 9E ,
.Xr mc_close 9E ,
.Xr mc_getcapab 9E ,
.Xr mc_getprop 9E ,
.Xr mc_getstat 9E ,
.Xr mc_multicst 9E ,
.Xr mc_open 9E ,
.Xr mc_propinfo 9E ,
.Xr mc_setpromisc 9E ,
.Xr mc_setprop 9E ,
.Xr mc_start 9E ,
.Xr mc_stop 9E ,
.Xr mc_tx 9E ,
.Xr mc_unicst 9E ,
.Xr open 9E ,
.Xr allocb 9F ,
.Xr bcopy 9F ,
.Xr ddi_dma_addr_bind_handle 9F ,
.Xr ddi_dma_mem_alloc 9F ,
.Xr ddi_fm_acc_err_get 9F ,
.Xr ddi_fm_dma_err_get 9F ,
.Xr ddi_fm_ereport_post 9F ,
.Xr ddi_fm_fini 9F ,
.Xr ddi_fm_init 9F ,
.Xr ddi_fm_service_impact 9F ,
.Xr ddi_get8 9F ,
.Xr ddi_put8 9F ,
.Xr desballoc 9F ,
.Xr freemsg 9F ,
.Xr kstat_create 9F ,
.Xr mac_alloc 9F ,
.Xr mac_fini_ops 9F ,
.Xr mac_free 9F ,
.Xr mac_hcksum_get 9F ,
.Xr mac_hcksum_set 9F ,
.Xr mac_init_ops 9F ,
.Xr mac_link_update 9F ,
.Xr mac_lso_get 9F ,
.Xr mac_maxsdu_update 9F ,
.Xr mac_prop_info_set_default_link_flowctrl 9F ,
.Xr mac_prop_info_set_default_str 9F ,
.Xr mac_prop_info_set_default_uint32 9F ,
.Xr mac_prop_info_set_default_uint64 9F ,
.Xr mac_prop_info_set_default_uint8 9F ,
.Xr mac_prop_info_set_perm 9F ,
.Xr mac_prop_info_set_range_uint32 9F ,
.Xr mac_register 9F ,
.Xr mac_rx 9F ,
.Xr mac_unregister 9F ,
.Xr mod_install 9F ,
.Xr mod_remove 9F ,
.Xr strcmp 9F ,
.Xr timeout 9F ,
.Xr cb_ops 9S ,
.Xr ddi_device_acc_attr 9S ,
.Xr dev_ops 9S ,
.Xr mac_callbacks 9S ,
.Xr mac_register 9S ,
.Xr mblk 9S ,
.Xr modldrv 9S ,
.Xr modlinkage 9S
.Rs
.%A McCloghrie, K.
.%A Rose, M.
.%T RFC 1213 Management Information Base for Network Management of
.%T TCP/IP-based internets: MIB-II
.%D March 1991
.Re
.Rs
.%A McCloghrie, K.
.%A Kastenholz, F.
.%T RFC 1573 Evolution of the Interfaces Group of MIB-II
.%D January 1994
.Re
.Rs
.%A Kastenholz, F.
.%T RFC 1643 Definitions of Managed Objects for the Ethernet-like
.%T Interface Types
.Re