.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2016 Joyent, Inc.
.\"
.Dd March 26, 2017
.Dt MAC 9E
.Os
.Sh NAME
.Nm mac ,
.Nm GLDv3
.Nd MAC networking device driver overview
.Sh SYNOPSIS
.In sys/mac_provider.h
.In sys/mac_ether.h
.Sh INTERFACE LEVEL
illumos DDI specific
.Sh DESCRIPTION
The
.Sy MAC
framework provides a means for implementing high-performance networking
device drivers.
It is the successor to the GLD interfaces and is sometimes referred to as the
GLDv3.
The remainder of this manual introduces the aspects of writing device drivers
that leverage the MAC framework.
While both the GLDv3 and MAC framework refer to the same thing, this manual
page uses the term
.Em MAC framework
to refer to the device driver interface.
.Pp
MAC device drivers are character devices.
They define the standard
.Xr _init 9E ,
.Xr _fini 9E ,
and
.Xr _info 9E
entry points to initialize the module, as well as
.Xr dev_ops 9S
and
.Xr cb_ops 9S
structures.
.Pp
The main interface with MAC is through a series of callbacks defined in
a
.Xr mac_callbacks 9S
structure.
These callbacks control all the aspects of the device.
They cover sending data, getting and setting properties, controlling MAC
address filters, and managing promiscuous mode.
.Pp
The MAC framework takes care of many aspects of the device driver's
management.
A device that uses the MAC framework does not have to worry about creating
device nodes or implementing
.Xr open 9E
or
.Xr close 9E
routines.
In addition, all of the work to interact with
.Xr dlpi 7P
is taken care of automatically and transparently.
.Ss Initializing MAC Support
For a device to be used in the framework, it must register with the
framework and take specific actions during
.Xr _init 9E ,
.Xr attach 9E ,
.Xr detach 9E ,
and
.Xr _fini 9E .
.Pp
All device drivers have to define a
.Xr dev_ops 9S
structure which is pointed to by a
.Xr modldrv 9S
structure and the corresponding NULL-terminated
.Xr modlinkage 9S
structure.
The
.Xr dev_ops 9S
structure should have a
.Xr cb_ops 9S
structure defined for it; however, it does not need to implement any of
the standard
.Xr cb_ops 9S
entry points.
.Pp
Normally, in a driver's
.Xr _init 9E
entry point, it passes its
.Sy modlinkage
structure directly to
.Xr mod_install 9F .
To properly register with MAC, the driver must call
.Xr mac_init_ops 9F
before it calls
.Xr mod_install 9F .
If the
.Xr mod_install 9F
function fails, then the driver must undo the earlier initialization by
calling
.Xr mac_fini_ops 9F .
.Pp
Conversely, in the driver's
.Xr _fini 9E
routine, it should call
.Xr mac_fini_ops 9F
after it successfully calls
.Xr mod_remove 9F .
For an example of how to use the
.Xr mac_init_ops 9F
and
.Xr mac_fini_ops 9F
functions, see the examples section in
.Xr mac_init_ops 9F .
.Ss Registering with MAC
Every instance of a device should register separately with MAC.
To register with MAC, a driver must allocate a
.Xr mac_register 9S
structure, fill it in, and then call
.Xr mac_register 9F .
The
.Sy mac_register_t
structure contains information about the device and all of the required
function pointers that will be used as callbacks by the framework.
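.Pp
As a rough illustration of that per-instance sequence, the following
userland C model walks through the order of operations that a driver's
.Xr attach 9E
routine might follow.
It is a sketch under stated assumptions, not driver code:
.Sy xx_softc_t ,
.Sy xx_attach ,
and
.Sy sim_register
are hypothetical names, and
.Sy sim_register
merely stands in for the real allocation, fill-in, and
.Xr mac_register 9F
steps so that the logic can be compiled outside the kernel.
.Bd -literal -offset indent
#include <assert.h>
#include <stddef.h>

/* Stand-ins for DDI_SUCCESS and DDI_FAILURE; values are illustrative. */
enum { SIM_DDI_SUCCESS = 0, SIM_DDI_FAILURE = -1 };

/* Hypothetical per-instance soft state. */
typedef struct xx_softc {
	void	*xx_mac_handle;		/* handle saved for later MAC calls */
	int	xx_hw_ready;		/* chipset and interrupts set up */
	int	xx_intr_enabled;	/* interrupts actually enabled */
} xx_softc_t;

/* Stand-in for allocating, filling in, and registering with MAC. */
static int
sim_register(xx_softc_t *sc, int fail)
{
	if (fail)
		return (-1);
	sc->xx_mac_handle = sc;	/* any non-NULL token works for the model */
	return (0);
}

/* Model of the registration portion of attach(9E) for one instance. */
static int
xx_attach(xx_softc_t *sc, int register_fails)
{
	sc->xx_hw_ready = 1;		/* initialize chipset, set up interrupts */
	assert(!sc->xx_intr_enabled);	/* not enabled before registering */

	if (sim_register(sc, register_fails) != 0) {
		sc->xx_hw_ready = 0;	/* unwind the earlier initialization */
		return (SIM_DDI_FAILURE);
	}
	sc->xx_intr_enabled = 1;	/* registration succeeded */
	return (SIM_DDI_SUCCESS);
}
.Ed
.Pp
The model keeps interrupts disabled until registration has succeeded and
leaves nothing initialized when it fails.
Each instance carries its own handle in its own soft state.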
.Pp
These steps should all be taken during a device's
.Xr attach 9E
entry point.
It is recommended that the driver perform this sequence of steps after the
device has finished its initialization of the chipset and interrupts, though
interrupts should not be enabled at that point.
After the driver calls
.Xr mac_register 9F ,
it will start receiving callbacks from the MAC framework.
.Pp
To allocate the registration structure, the driver should call
.Xr mac_alloc 9F .
Device drivers should generally pass the symbol
.Sy MAC_VERSION
as the argument to
.Xr mac_alloc 9F .
Upon successful completion, the driver will receive a
.Sy mac_register_t
structure which it should fill in.
The structure and its members are documented in
.Xr mac_register 9S .
.Pp
The
.Xr mac_callbacks 9S
structure is not allocated as a part of the
.Xr mac_register 9S
structure.
In general, device drivers declare this statically.
See the
.Sx MAC Callbacks
section for more information on how to fill it out.
.Pp
Once the structure has been filled in, the driver should call
.Xr mac_register 9F
to register itself with MAC.
The handle that it uses to register with should be part of the driver's soft
state.
It will be used in various other support functions and callbacks.
.Pp
If the call is successful, then the device driver
should enable interrupts and finish any other initialization required.
If the call to
.Xr mac_register 9F
failed, then it should unwind its initialization and should return
.Sy DDI_FAILURE
from its
.Xr attach 9E
routine.
.Ss MAC Callbacks
The MAC framework interacts with a device driver through a series of
callbacks.
These callbacks are described in their individual manual pages and the
collection of callbacks is indicated in the
.Xr mac_callbacks 9S
manual page.
This section does not focus on the specific functions, but rather on
interactions between them and the rest of the device driver framework.
.Pp
A device driver should make no assumptions about when the various
callbacks will be called and whether or not they will be called
simultaneously.
For example, a device driver may be asked to transmit data through a call to its
.Xr mc_tx 9E
entry point while it is being asked to get a device property through a
call to its
.Xr mc_getprop 9E
entry point.
As such, while some calls may be serialized to the device, such as setting
properties, the device driver should always presume that all of its data needs
to be protected with locks.
While the device is holding locks, it is safe for it to call the following MAC
routines:
.Bl -bullet -offset indent -compact
.It
.Xr mac_hcksum_get 9F
.It
.Xr mac_hcksum_set 9F
.It
.Xr mac_lso_get 9F
.It
.Xr mac_maxsdu_update 9F
.It
.Xr mac_prop_info_set_default_link_flowctrl 9F
.It
.Xr mac_prop_info_set_default_str 9F
.It
.Xr mac_prop_info_set_default_uint8 9F
.It
.Xr mac_prop_info_set_default_uint32 9F
.It
.Xr mac_prop_info_set_default_uint64 9F
.It
.Xr mac_prop_info_set_perm 9F
.It
.Xr mac_prop_info_set_range_uint32 9F
.El
.Pp
Any other MAC-related routines should not be called with locks held,
such as
.Xr mac_link_update 9F
or
.Xr mac_rx 9F .
Other routines in the DDI may be called while locks are held; however,
device driver writers should be careful about calling blocking routines
while locks are held or in interrupt context, though it is generally
legal to do so.
.Ss Receiving Data
A device driver will often receive data through the means of an
interrupt.
When that interrupt occurs, the device driver will receive one or more frames
with optional metadata.
Often each frame has a corresponding descriptor which has information about
whether or not there were errors or whether or not the device successfully
checksummed the packet.
.Pp
During a single interrupt, a device driver should process a fixed number
of frames.
For each frame the device driver should:
.Bl -enum -offset indent
.It
First check whether or not the frame has errors.
If errors were detected, then the frame should not be sent to the operating
system.
It is recommended that devices keep kstats (see
.Xr kstat_create 9F
for more information) and bump the counter whenever such an error is
detected.
If the device distinguishes between the types of errors, then separate kstats
for each class of error are recommended.
See the
.Sx STATISTICS
section for more information on the various error cases that should be
considered.
.It
Once the frame has been determined to be valid, the device driver should
transform the frame into a
.Xr mblk 9S .
See the section
.Sx MBLKS AND DMA
for more information on how to transform and prepare a message block.
.It
If the device supports hardware checksumming (see the
.Sx CAPABILITIES
section for more information on checksumming), then the device driver
should set the corresponding checksumming information with a call to
.Xr mac_hcksum_set 9F .
.It
It should then append this new message block to the
.Em end
of the message block chain, linking it to the
.Sy b_next
pointer.
It is vitally important that all the frames be chained in the order that they
were received.
If the device driver mistakenly reorders frames, then it may cause performance
impacts in the TCP stack and potentially impact application correctness.
.El
.Pp
Once all the frames have been processed and assembled, the device driver
should deliver them to the rest of the operating system by calling
.Xr mac_rx 9F .
The device driver should try to give as many mblk_t structures as possible
to the system at once.
It
.Em should not
call
.Xr mac_rx 9F
once for every assembled mblk_t.
.Pp
The device driver must not hold any locks across the call to
.Xr mac_rx 9F .
When this function is called, received data will be pushed through the
networking stack and some replies may be generated and given to the
driver to send out.
.Pp
It is not the device driver's responsibility to determine whether or not
the system can keep up with a driver's delivery rate of frames.
The rest of the networking stack will handle issues related to keeping up
appropriately and ensure that kernel memory is not exhausted by packets
that are not being processed.
.Pp
Finally, the device driver should make sure that any other housekeeping
activities required for the ring are taken care of such that more data
can be received.
.Ss Transmitting Data and Back Pressure
A device driver will be asked to transmit a message block chain by
having its
.Xr mc_tx 9E
entry point called.
While the driver is processing the message blocks, it may run out of resources.
For example, a transmit descriptor ring may become full.
At that point, the device driver should return the remaining unprocessed frames.
The act of returning frames indicates that the device has asserted flow control.
Once this has been done, no additional calls will be made to the
driver's transmit entry point and the back pressure will be propagated
throughout the rest of the networking stack.
.Pp
At some point in the future when resources have become available again,
for example after an interrupt indicating that some portion of the
transmit ring has been sent, then the device driver must notify the
system that it can continue transmission.
To do this, the driver should call
.Xr mac_tx_update 9F .
After that point, the driver will receive calls to its
.Xr mc_tx 9E
entry point again.
As mentioned in the section on callbacks, the device driver should avoid holding
any particular locks across the call to
.Xr mac_tx_update 9F .
.Ss Interrupt Coalescing
For devices operating at higher data rates, interrupt coalescing is an
important part of a well-functioning device and may impact the
performance of the device.
Not all devices support interrupt coalescing.
If interrupt coalescing is supported on the device, it is recommended that
device driver writers provide private properties for their device to control the
interrupt coalescing rate.
This will make it much easier to perform experiments and observe the impact of
different interrupt rates on the rest of the system.
.Ss MAC Address Filter Management
The MAC framework will attempt to use as many MAC address filters as a
device has.
To program a multicast address filter, the driver's
.Xr mc_multicst 9E
entry point will be called.
If the device driver runs out of filters, it should not take any special action
and just return the appropriate error as documented in the corresponding manual
pages for the entry points.
The framework will ensure that the device is placed in promiscuous mode
if needed.
.Ss Link Updates
It is the responsibility of the device driver to keep track of the
data link's state.
Many devices provide a means of receiving an interrupt when the state of the
link changes.
When such a change happens, the driver should update its internal data
structures and then call
.Xr mac_link_update 9F
to inform the MAC layer that this has occurred.
If the device driver does not properly inform the system about link changes,
then various features like link aggregations and other mechanisms that leverage
the link state will not work correctly.
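.Pp
The link update pattern above can be sketched as a small userland C model.
Everything named here is an assumption made for illustration:
.Sy xx_softc_t ,
.Sy xx_link_intr ,
the
.Sy SIM_LINK_
values, and
.Sy sim_link_update ,
which only stands in for
.Xr mac_link_update 9F .
The integer
.Sy xx_lock_held
models a driver mutex so that the rule about not calling into MAC with
locks held can be checked.
.Bd -literal -offset indent
#include <assert.h>

/* Illustrative link states; the real ones are the link_state_t values. */
typedef enum {
	SIM_LINK_UNKNOWN,
	SIM_LINK_DOWN,
	SIM_LINK_UP
} sim_link_state_t;

/* Hypothetical soft state for one device instance. */
typedef struct xx_softc {
	int			xx_lock_held;	/* models the driver mutex */
	sim_link_state_t	xx_link;	/* state recorded by the driver */
	int			xx_updates;	/* notifications delivered */
} xx_softc_t;

/* Stand-in for mac_link_update(9F); never entered with the lock held. */
static void
sim_link_update(xx_softc_t *sc, sim_link_state_t state)
{
	assert(!sc->xx_lock_held);
	(void) state;
	sc->xx_updates++;
}

/* Called from the link-change interrupt with the state read from hardware. */
static void
xx_link_intr(xx_softc_t *sc, sim_link_state_t hw_state)
{
	int changed;

	sc->xx_lock_held = 1;		/* take the lock */
	changed = (sc->xx_link != hw_state);
	sc->xx_link = hw_state;		/* update internal state first */
	sc->xx_lock_held = 0;		/* drop the lock before calling out */

	if (changed)
		sim_link_update(sc, hw_state);
}
.Ed
.Pp
Notifying only when the recorded state differs from the hardware state is a
design choice of this sketch; it keeps a spurious interrupt from generating
a redundant notification.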
.Ss Link Speed and Auto-negotiation
Many networking devices support more than one possible speed that they
can operate at.
The selection of a speed is often performed through
.Em auto-negotiation ,
though some devices allow the user to control what speeds are advertised
and used.
.Pp
Logically, there are two different sets of things that the device driver
needs to keep track of while it's operating:
.Bl -enum
.It
The supported speeds in hardware.
.It
The enabled speeds from the user.
.El
.Pp
By default, when a link first comes up, the device driver should
generally configure the link to support the common set of speeds and
perform auto-negotiation.
.Pp
A user can control what speeds a device advertises via auto-negotiation
and whether or not it performs auto-negotiation at all by using a series
of properties that have
.Sy _EN_
in the name.
These are read/write properties and there is one for each speed supported in the
operating system.
For a full list of them, see the
.Sx PROPERTIES
section.
.Pp
In addition to these properties, there is a corresponding set of
properties with
.Sy _ADV_
in the name.
These are similar to the
.Sy _EN_
family of properties, but they are read-only and indicate what the
device has actually negotiated.
While they are generally similar to the
.Sy _EN_
family of properties, they may change depending on power settings.
See the
.Sy Ethernet Link Properties
section in
.Xr dladm 1M
for more information.
.Pp
It's worth discussing how these different values get used throughout the
different entry points.
The first entry point to consider is the
.Xr mc_propinfo 9E
entry point.
For a given speed, the driver should consult whether or not the hardware
supports this speed.
If it does, it should fill in the default value that the hardware takes and
indicate whether or not the property is writable.
This holds for both the
.Sy _EN_
and
.Sy _ADV_
family of properties.
.Pp
The next entry point is
.Xr mc_getprop 9E .
Here, the device should first consult whether the given speed is
supported.
If it is not, then the driver should return
.Er ENOTSUP .
If it is, then it should return the current value of the property.
.Pp
The last property entry point is
.Xr mc_setprop 9E .
Here, the same logic applies.
Before the driver considers whether or not the property is writable, it should
first check whether or not it's a supported property.
If it's not, then it should return
.Er ENOTSUP .
Otherwise, it should proceed to check whether the property is writable.
If it is and the requested value is valid, then it should update the
property and restart the link's negotiation.
.Pp
Finally, there is the
.Xr mc_getstat 9E
entry point.
Several of the statistics that are queried relate to auto-negotiation and
hardware capabilities.
When a statistic relates to the hardware supporting a given speed, the
.Sy _EN_
properties should be ignored.
The only thing that should be consulted is what the hardware itself supports.
Otherwise, the statistics should look at what is currently being advertised by
the device.
.Ss Unregistering from MAC
During a driver's
.Xr detach 9E
routine, it should unregister the device instance from MAC by calling
.Xr mac_unregister 9F
on the handle that it obtained from
.Xr mac_register 9F .
If the call to
.Xr mac_unregister 9F
failed, then the device is likely still in use and the driver should
fail the call to
.Xr detach 9E .
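.Pp
A minimal userland C model of that detach ordering is shown below.
The names
.Sy xx_softc_t ,
.Sy xx_detach ,
and
.Sy sim_unregister
are hypothetical, the
.Sy xx_clients
counter is an invented way of modelling a device that is still in use, and
.Sy sim_unregister
only stands in for
.Xr mac_unregister 9F .
.Bd -literal -offset indent
#include <assert.h>
#include <stddef.h>

/* Stand-ins for DDI_SUCCESS and DDI_FAILURE; values are illustrative. */
enum { SIM_DDI_SUCCESS = 0, SIM_DDI_FAILURE = -1 };

/* Hypothetical per-instance soft state. */
typedef struct xx_softc {
	void	*xx_mac_handle;	/* handle obtained at registration time */
	int	xx_clients;	/* models outstanding users of the link */
	int	xx_resources;	/* 1 while rings, interrupts, etc. exist */
} xx_softc_t;

/* Stand-in for mac_unregister(9F): fails while the link is in use. */
static int
sim_unregister(xx_softc_t *sc)
{
	if (sc->xx_clients > 0)
		return (-1);
	sc->xx_mac_handle = NULL;
	return (0);
}

/* Model of detach(9E): unregister first, release resources afterwards. */
static int
xx_detach(xx_softc_t *sc)
{
	assert(sc->xx_mac_handle != NULL);
	if (sim_unregister(sc) != 0)
		return (SIM_DDI_FAILURE);	/* still in use: tear nothing down */
	sc->xx_resources = 0;			/* only now release the device */
	return (SIM_DDI_SUCCESS);
}
.Ed
.Pp
Unregistering before releasing anything else means that a failed detach
leaves the instance fully functional.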
.Ss Interacting with Devices
Administrators always interact with devices through the
.Xr dladm 1M
command line interface.
The state of devices such as whether the link is considered
.Sy up
or
.Sy down ,
various link properties such as the
.Sy MTU ,
.Sy auto-negotiation
state,
and
.Sy flow control
state,
are all exposed.
It is also the preferred way that these properties are set and configured.
.Pp
While device tunables may be presented in a
.Xr driver.conf 4
file, it is recommended instead to expose such things through
.Xr dladm 1M
private properties, whether explicitly documented or not.
.Sh CAPABILITIES
Capabilities in the MAC framework are optional hardware features that a
device may support.
The two current capabilities that the system supports are the ability of
hardware to perform large send offload (LSO), often also known as TCP
segmentation offload, and the ability of hardware to calculate and verify the
checksums present in IPv4, IPv6, and protocol headers such as TCP and UDP.
.Pp
The MAC framework will query a device for support of a capability
through the
.Xr mc_getcapab 9E
function.
Each capability has its own constant and may have corresponding data that goes
along with it and a specific structure that the device is required to fill in.
Note, the set of capabilities changes over time and there are also private
capabilities in the system.
Several of the capabilities are used in the implementation of the MAC framework.
Others, like
.Sy MAC_CAPAB_RINGS ,
represent features that have not been stabilized and thus both API and binary
compatibility for them are not guaranteed.
It is important that the device driver handles unknown capabilities correctly.
For more information, see
.Xr mc_getcapab 9E .
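.Pp
One way to picture the handling of unknown capabilities is the following
userland C model of a capability query.
It is only a sketch: the
.Sy SIM_CAPAB_
identifiers, the
.Sy SIM_HCKSUM_FLAGS
value,
.Sy xx_softc_t ,
and
.Sy xx_getcapab
are all made up for illustration and are not the real constants or the real
.Xr mc_getcapab 9E
signature.
The model assumes a single capability whose data is a pointer to a
.Sy uint32_t
of flags.
.Bd -literal -offset indent
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative capability identifiers; the real ones are MAC_CAPAB_*. */
typedef enum {
	SIM_CAPAB_HCKSUM = 1,
	SIM_CAPAB_LSO = 2,
	SIM_CAPAB_FUTURE = 99	/* something this driver has never heard of */
} sim_capab_t;

#define	SIM_HCKSUM_FLAGS	0x3u	/* made-up checksum flag bits */

/* Hypothetical soft state. */
typedef struct xx_softc {
	bool	xx_tx_cksum_enabled;	/* e.g. set by a private property */
} xx_softc_t;

/* Model of a capability query: true means supported and data filled in. */
static bool
xx_getcapab(xx_softc_t *sc, sim_capab_t capab, void *data)
{
	switch (capab) {
	case SIM_CAPAB_HCKSUM:
		if (!sc->xx_tx_cksum_enabled)
			return (false);
		*(uint32_t *)data = SIM_HCKSUM_FLAGS;
		return (true);
	default:
		/* Unknown, private, or unimplemented: decline, touch nothing. */
		return (false);
	}
}
.Ed
.Pp
Routing everything that is not explicitly handled through the default case
means that capabilities added to the system later are declined rather than
misinterpreted.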
.Pp
The following capabilities are
stable and defined in the system:
.Ss MAC_CAPAB_HCKSUM
The
.Sy MAC_CAPAB_HCKSUM
capability indicates to the system that the device driver supports some
amount of checksumming.
The specific data for this capability is a pointer to a
.Sy uint32_t .
To indicate no support for any kind of checksumming, the driver should
either set this value to zero or simply return that it doesn't support
the capability.
.Pp
Note, the values that the driver declares in this capability indicate
what it can do when it transmits data.
If the driver can only verify checksums when receiving data, then it should not
indicate that it supports this capability.
The following set of flags may be combined through a bitwise inclusive OR:
.Bl -tag -width Ds
.It Sy HCKSUM_INET_PARTIAL
This indicates that the hardware can calculate a partial checksum for
both IPv4 and IPv6; however, it requires the pseudo-header checksum be
calculated for it.
The pseudo-header checksum will be available for the mblk_t when calling
.Xr mac_hcksum_get 9F .
Note this does not imply that the hardware is capable of calculating the
IPv4 header checksum.
That should be indicated with the
.Sy HCKSUM_IPHDRCKSUM
flag.
.It Sy HCKSUM_INET_FULL_V4
This indicates that the hardware will fully calculate the L4 checksum
for outgoing IPv4 packets and does not require a pseudo-header checksum.
Note this does not imply that the hardware is capable of calculating the
IPv4 header checksum.
That should be indicated with the
.Sy HCKSUM_IPHDRCKSUM
flag.
.It Sy HCKSUM_INET_FULL_V6
This indicates that the hardware will fully calculate the L4 checksum
for outgoing IPv6 packets and does not require a pseudo-header checksum.
.It Sy HCKSUM_IPHDRCKSUM
This indicates that the hardware supports calculating the checksum for
the IPv4 header itself.
.El
.Pp
When in a driver's transmit function, the driver will be processing a
single frame.
It should call
.Xr mac_hcksum_get 9F
to see what checksum flags are set on it.
Note that the flags that are set on it are different from the ones described
above and are documented in its manual page.
These flags indicate how the driver is expected to program the hardware and what
checksumming is required.
Not all frames will require hardware checksumming or will ask the hardware to
checksum them.
.Pp
If a driver supports offloading the receive checksum and verification,
it should check to see what the hardware indicated was verified.
The driver should then call
.Xr mac_hcksum_set 9F .
The flags used are different from the ones above and are discussed in
detail in the
.Xr mac_hcksum_set 9F
manual page.
If there is no checksum information available or the driver does not support
checksumming, then it should simply not call
.Xr mac_hcksum_set 9F .
.Pp
Note that the checksum flags should be set on the first
mblk_t that makes up a given message.
In other words, if multiple mblk_t structures are linked together by the
.Sy b_cont
member to describe a single frame, then
.Xr mac_hcksum_set 9F
should only be called on the first mblk_t of that set.
However, each distinct message should have the checksum bits set on it, if
applicable.
In other words, each mblk_t that is linked together by the
.Sy b_next
pointer may have checksum flags set.
.Pp
It is recommended that device drivers provide a private property or
.Xr driver.conf 4
property to control whether or not checksumming is enabled for both rx
and tx; however, the default disposition is recommended to be enabled
for both.
This way if hardware bugs are found in the checksumming implementation,
checksumming can be disabled without requiring software updates.
The transmit property should be checked when determining how to reply to
.Xr mc_getcapab 9E
and the receive property should be checked in the context of the receive
function.
.Ss MAC_CAPAB_LSO
The
.Sy MAC_CAPAB_LSO
capability indicates that the driver supports various forms of large
send offload (LSO).
The private data is a pointer to a
.Sy mac_capab_lso_t
structure.
At the moment, LSO support is limited to TCP over IPv4.
This structure has the following members which are used to indicate
various types of LSO support.
.Bd -literal -offset indent
t_uscalar_t lso_flags;
lso_basic_tcp_ipv4_t lso_basic_tcp_ipv4;
.Ed
.Pp
The
.Sy lso_flags
member is used to indicate which members are valid and should be
considered.
Each flag represents a different form of LSO.
The member should be set to the bitwise inclusive OR of the following values:
.Bl -tag -width Dv -offset indent
.It Sy LSO_TX_BASIC_TCP_IPV4
This indicates hardware support for performing TCP segmentation
offloading over IPv4.
When this flag is set, the
.Sy lso_basic_tcp_ipv4
member must be filled in.
.El
.Pp
The
.Sy lso_basic_tcp_ipv4
member is a structure with the following members:
.Bd -literal -offset indent
t_uscalar_t lso_max;
.Ed
.Bd -filled -offset indent
The
.Sy lso_max
member should be set to the maximum size of the TCP data
payload that can be offloaded to the hardware.
.Ed
.Pp
As with checksumming, it is recommended that driver writers provide a
means for disabling the support of LSO even if it is enabled by default.
This allows issues that arise with LSO to be worked around without requiring
additional driver work.
.Sh PROPERTIES
Properties in the MAC framework represent aspects of a link.
These include things like the link's current state and MTU.
Many of the properties in the system are focused around auto-negotiation and
controlling what link speeds are advertised.
Information about properties is covered by three different device entry points.
The
.Xr mc_propinfo 9E
entry point obtains metadata about the property.
The
.Xr mc_getprop 9E
entry point obtains the property.
The
.Xr mc_setprop 9E
entry point updates the property to a new value.
.Pp
Many of the properties listed below are read-only.
Each property indicates whether it is read-only or read/write.
However, driver writers may not implement the ability to set all writable
properties.
Many of these depend on the card itself.
In particular, all properties that relate to auto-negotiation and are read/write
may not be updated if the hardware in question does not support toggling what
link speeds are auto-negotiated.
While copper Ethernet often does not have this restriction, it often exists with
various fiber standards and PHYs.
.Pp
The following properties are the subset of MAC framework properties that
driver writers should be aware of and handle.
While other properties exist in the system, driver writers should always return
an error when a property not listed below is encountered.
See
.Xr mc_getprop 9E
and
.Xr mc_setprop 9E
for more information on how to handle them.
.Bl -hang -width Ds
.It Sy MAC_PROP_DUPLEX
.Bd -filled -compact
Type:
.Sy link_duplex_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_DUPLEX
property is used to indicate whether the link is operating at half or full
duplex.
A full-duplex link may have traffic flowing in both directions at the same time.
The
.Sy link_duplex_t
is an enumeration which may be set to any of the following values:
.Bl -tag -width Ds
.It Sy LINK_DUPLEX_UNKNOWN
The current state of the link is unknown.
This may be because the link has not negotiated to a specific speed or it is
down.
.It Sy LINK_DUPLEX_HALF
The link is running at half duplex.
Communication may travel in only one direction on the link at a given time.
.It Sy LINK_DUPLEX_FULL
The link is running at full duplex.
Communication may travel in both directions on the link simultaneously.
.El
.It Sy MAC_PROP_SPEED
.Bd -filled -compact
Type:
.Sy uint64_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_SPEED
property stores the current link speed in bits per second.
A link that is running at 100 Mbit/s would store the value 100000000ULL.
A link that is running at 40 Gbit/s would store the value 40000000000ULL.
.It Sy MAC_PROP_STATUS
.Bd -filled -compact
Type:
.Sy link_state_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_STATUS
property is used to indicate the current state of the link.
It indicates whether the link is up or down.
The
.Sy link_state_t
is an enumeration which may be set to any of the following values:
.Bl -tag -width Ds
.It Sy LINK_STATE_UNKNOWN
The current state of the link is unknown.
This may be because the driver's
.Xr mc_start 9E
entry point has not been called so it has not attempted to start the link.
.It Sy LINK_STATE_DOWN
The link is down.
This may be because of a negotiation problem, a cable problem, or some other
device specific issue.
.It Sy LINK_STATE_UP
The link is up.
If auto-negotiation is in use, it should have completed.
Traffic should be able to flow over the link, barring other issues.
.El
.It Sy MAC_PROP_AUTONEG
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_AUTONEG
property indicates whether or not the device is currently configured to
perform auto-negotiation.
A value of
.Sy 0
indicates that auto-negotiation is disabled.
A
.Sy non-zero
value indicates that auto-negotiation is enabled.
Devices should generally default to enabling auto-negotiation.
.Pp
When getting this property, the device driver should return the current
state.
When setting this property, if the device supports operating in the requested
mode, then the device driver should reset the link to negotiate to the new speed
after updating any internal registers.
.It Sy MAC_PROP_MTU
.Bd -filled -compact
Type:
.Sy uint32_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_MTU
property determines the maximum transmission unit (MTU).
This indicates the maximum size packet that the device can transmit, ignoring
its own headers.
For an Ethernet device, this would exclude the size of the Ethernet header and
any VLAN headers that would be placed.
It is up to the driver to ensure that any MTU value that it accepts, once its
margin and header sizes are added in, does not exceed its maximum frame size.
.Pp
By default, drivers for Ethernet should initialize the MTU to
.Sy 1500 .
When getting this property, the driver should return its current
recorded MTU.
When setting this property, the driver should first validate that it is within
the device's valid range and then it must call
.Xr mac_maxsdu_update 9F .
Note that the call may fail.
If the call completes successfully, the driver should update the hardware with
the new value of the MTU and perform any other work needed to handle it.
.Pp
If the device does not support changing the MTU after the device's
.Xr mc_start 9E
entry point has been called, then driver writers should return
.Er EBUSY .
.It Sy MAC_PROP_FLOWCTRL
.Bd -filled -compact
Type:
.Sy link_flowctrl_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_FLOWCTRL
property manages the configuration of pause frames as part of Ethernet
flow control.
Note, this only describes what this device will advertise.
What is actually enabled may be different and is subject to the rules of
auto-negotiation.
The
.Sy link_flowctrl_t
is an enumeration that may be set to one of the following values:
.Bl -tag -width Ds
.It Sy LINK_FLOWCTRL_NONE
Flow control is disabled.
No pause frames should be generated or honored.
.It Sy LINK_FLOWCTRL_RX
The device can receive pause frames; however, it should not generate
them.
.It Sy LINK_FLOWCTRL_TX
The device can generate pause frames; however, it does not support
receiving them.
.It Sy LINK_FLOWCTRL_BI
The device supports both sending and receiving pause frames.
.El
.Pp
When getting this property, the device driver should return the way that
it has configured the device, not what the device has actually
negotiated.
When setting the property, it should update the hardware and allow the link to
potentially perform auto-negotiation again.
.El
.Pp
The remaining properties are all about various auto-negotiation link
speeds.
They fall into two different buckets: properties with
.Sy _ADV_
in the name and properties with
.Sy _EN_
in the name.
For any given supported speed, there is one of each.
The
.Sy _EN_
set of properties are read/write properties that control what should be
advertised by the device.
When these are retrieved, they should return the current value of the property.
When they are set, they should change how the hardware advertises the specific
speed and trigger any kind of link reset and auto-negotiation, if enabled, to
occur.
.Pp
The
.Sy _ADV_
set of properties are read-only properties.
They are meant to reflect what has actually been negotiated.
These may be different from the
.Sy _EN_
family of properties, especially when different power management
settings are at play.
.Pp
See the
.Sx Link Speed and Auto-negotiation
section for more information.
.Pp
The properties are ordered in increasing link speed:
.Bl -hang -width Ds
.It Sy MAC_PROP_ADV_10HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_10HDX_CAP
property describes whether or not 10 Mbit/s half-duplex support is
advertised.
.It Sy MAC_PROP_EN_10HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_10HDX_CAP
property describes whether or not 10 Mbit/s half-duplex support is
enabled.
.It Sy MAC_PROP_ADV_10FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_10FDX_CAP
property describes whether or not 10 Mbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_10FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_10FDX_CAP
property describes whether or not 10 Mbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100HDX_CAP
property describes whether or not 100 Mbit/s half-duplex support is
advertised.
.It Sy MAC_PROP_EN_100HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100HDX_CAP
property describes whether or not 100 Mbit/s half-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100FDX_CAP
property describes whether or not 100 Mbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_100FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100FDX_CAP
property describes whether or not 100 Mbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100T4_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
advertised.
.It Sy MAC_PROP_EN_100T4_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
enabled.
.It Sy MAC_PROP_ADV_1000HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_1000HDX_CAP
property describes whether or not 1 Gbit/s half-duplex support is
advertised.
.It Sy MAC_PROP_EN_1000HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_1000HDX_CAP
property describes whether or not 1 Gbit/s half-duplex support is
enabled.
.It Sy MAC_PROP_ADV_1000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_1000FDX_CAP
property describes whether or not 1 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_1000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_1000FDX_CAP
property describes whether or not 1 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_2500FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_2500FDX_CAP
property describes whether or not 2.5 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_2500FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_2500FDX_CAP
property describes whether or not 2.5 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_5000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_5000FDX_CAP
property describes whether or not 5.0 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_5000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_5000FDX_CAP
property describes whether or not 5.0 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_10GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_10GFDX_CAP
property describes whether or not 10 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_10GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_10GFDX_CAP
property describes whether or not 10 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_40GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_40GFDX_CAP
property describes whether or not 40 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_40GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_40GFDX_CAP
property describes whether or not 40 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100GFDX_CAP
property describes whether or not 100 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_100GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100GFDX_CAP
property describes whether or not 100 Gbit/s full-duplex support is
enabled.
.El
.Ss Private Properties
In addition to the defined properties above, drivers are allowed to
define private properties.
These private properties are device-specific properties.
All private properties share the same constant,
.Sy MAC_PROP_PRIVATE .
Properties are distinguished by a name, which is a character string.
The list of such private properties is defined when registering with mac in the
.Sy m_priv_props
member of the
.Xr mac_register 9S
structure.
.Pp
The driver may define whatever semantics it wants for these private
properties.
They will not be listed when running
.Xr dladm 1M ,
unless explicitly requested by name.
All such properties should start with a leading underscore character and then
consist of alphanumeric ASCII characters and additional underscores or hyphens.
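.Pp
The naming rule and the name-based lookup described above can be sketched in
plain C.
This is an illustration only: the
.Sy xyz_
names and the property names are invented, the list is assumed to be
NULL-terminated, and rejecting a bare underscore is an assumption of this
sketch rather than a rule stated by the framework.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical private property names for an imaginary "xyz" driver.
 * A list like this is the kind of array a driver would point
 * m_priv_props at; it is assumed here to be NULL-terminated.
 */
static const char *xyz_priv_props[] = {
	"_rx_copy_thresh",
	"_tx_copy_thresh",
	"_intr-throttle",
	NULL
};

/*
 * Returns 1 if the name starts with an underscore and otherwise holds
 * only alphanumerics, underscores, or hyphens; returns 0 if not.
 * Rejecting a bare "_" is an assumption made by this sketch.
 */
static int
xyz_priv_prop_name_ok(const char *name)
{
	const char *c;

	assert(name != NULL);
	if (name[0] != '_' || name[1] == '\0')
		return (0);
	for (c = name + 1; *c != '\0'; c++) {
		if (!isalnum((unsigned char)*c) && *c != '_' && *c != '-')
			return (0);
	}
	return (1);
}

/*
 * Returns the index of the name in xyz_priv_props, or -1 when the
 * driver does not know the property.
 */
static int
xyz_priv_prop_lookup(const char *name)
{
	int i;

	assert(name != NULL);
	for (i = 0; xyz_priv_props[i] != NULL; i++) {
		if (strcmp(name, xyz_priv_props[i]) == 0)
			return (i);
	}
	return (-1);
}
```

.Pp
A driver's property entry points could use a lookup of this shape to pick a
handler and fall through to their usual unknown-property handling when it
returns -1.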
.Pp
Properties of type
.Sy MAC_PROP_PRIVATE
may show up in all three property-related entry points:
.Xr mc_propinfo 9E ,
.Xr mc_getprop 9E ,
and
.Xr mc_setprop 9E .
Device drivers should tell the different properties apart by using the
.Xr strcmp 9F
function to compare the property name to the set of names that they know about.
When encountering properties that it doesn't know, a driver should treat them
like all other unknown properties.
.Sh STATISTICS
The MAC framework defines a couple of different sets of statistics which
are based on various standards for devices to implement.
Statistics are retrieved through the
.Xr mc_getstat 9E
entry point.
Some statistics are required for all devices and there is a
separate set of Ethernet specific statistics.
Not all devices will support every statistic.
In many cases, several device registers will need to be combined to create the
proper stat.
.Pp
In general, if the device is not keeping track of these statistics, then
it is recommended that the driver store these values as a
.Sy uint64_t
to ensure that overflow does not occur.
.Pp
If a device does not support a specific statistic, then it is fine to
return that it is not supported.
The same should be used for unrecognized statistics.
See
.Xr mc_getstat 9E
for more information on the proper way to handle these.
.Ss General Device Statistics
The following statistics are based on MIB-II statistics from both RFC
1213 and RFC 1573.
.Bl -tag -width Ds
.It Sy MAC_STAT_IFSPEED
The device's current speed in bits per second.
.It Sy MAC_STAT_MULTIRCV
The total number of received multicast packets.
.It Sy MAC_STAT_BRDCSTRCV
The total number of received broadcast packets.
.It Sy MAC_STAT_MULTIXMT
The total number of transmitted multicast packets.
.It Sy MAC_STAT_BRDCSTXMT
The total number of transmitted broadcast packets.
.It Sy MAC_STAT_NORCVBUF
The total number of packets discarded by the hardware due to a lack of
receive buffers.
.It Sy MAC_STAT_IERRORS
The total number of errors detected on input.
.It Sy MAC_STAT_UNKNOWNS
The total number of received packets that were discarded because they
were of an unknown protocol.
.It Sy MAC_STAT_NOXMTBUF
The total number of outgoing packets dropped due to a lack of transmit
buffers.
.It Sy MAC_STAT_OERRORS
The total number of outgoing packets that resulted in errors.
.It Sy MAC_STAT_COLLISIONS
Total number of collisions encountered by the transmitter.
.It Sy MAC_STAT_RBYTES
The total number of
.Sy bytes
received by the device, regardless of packet type.
.It Sy MAC_STAT_IPACKETS
The total number of
.Sy packets
received by the device, regardless of packet type.
.It Sy MAC_STAT_OBYTES
The total number of
.Sy bytes
transmitted by the device, regardless of packet type.
.It Sy MAC_STAT_OPACKETS
The total number of
.Sy packets
sent by the device, regardless of packet type.
.It Sy MAC_STAT_UNDERFLOWS
The total number of packets that were smaller than the minimum sized
packet for the device and were therefore dropped.
.It Sy MAC_STAT_OVERFLOWS
The total number of packets that were larger than the maximum sized
packet for the device and were therefore dropped.
.El
.Ss Ethernet Specific Statistics
The following statistics are specific to Ethernet devices.
They refer to values from RFC 1643 and include various MII/GMII specific stats.
Many of these are also defined in IEEE 802.3.
.Bl -tag -width Ds
.It Sy ETHER_STAT_ADV_CAP_1000FDX
Indicates that the device is advertising support for 1 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_1000HDX
Indicates that the device is advertising support for 1 Gbit/s
half-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100FDX
Indicates that the device is advertising support for 100 Mbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100GFDX
Indicates that the device is advertising support for 100 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100HDX
Indicates that the device is advertising support for 100 Mbit/s
half-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100T4
Indicates that the device is advertising support for 100 Mbit/s
100BASE-T4 operation.
.It Sy ETHER_STAT_ADV_CAP_10FDX
Indicates that the device is advertising support for 10 Mbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_10GFDX
Indicates that the device is advertising support for 10 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_10HDX
Indicates that the device is advertising support for 10 Mbit/s
half-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_2500FDX
Indicates that the device is advertising support for 2.5 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_40GFDX
Indicates that the device is advertising support for 40 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_5000FDX
Indicates that the device is advertising support for 5.0 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_ASMPAUSE
Indicates that the device is advertising support for receiving pause
frames.
.It Sy ETHER_STAT_ADV_CAP_AUTONEG
Indicates that the device is advertising support for auto-negotiation.
.It Sy ETHER_STAT_ADV_CAP_PAUSE
Indicates that the device is advertising support for generating pause
frames.
.It Sy ETHER_STAT_ADV_REMFAULT
Indicates that the device is advertising support for detecting faults in
the remote link peer.
.It Sy ETHER_STAT_ALIGN_ERRORS
Indicates the number of times an alignment error was generated by the
Ethernet device.
This is a count of packets that were not an integral number of octets and failed
the FCS check.
.It Sy ETHER_STAT_CAP_1000FDX
Indicates the device supports 1 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_1000HDX
Indicates the device supports 1 Gbit/s half-duplex operation.
.It Sy ETHER_STAT_CAP_100FDX
Indicates the device supports 100 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_100GFDX
Indicates the device supports 100 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_100HDX
Indicates the device supports 100 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_CAP_100T4
Indicates the device supports 100 Mbit/s 100BASE-T4 operation.
.It Sy ETHER_STAT_CAP_10FDX
Indicates the device supports 10 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_10GFDX
Indicates the device supports 10 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_10HDX
Indicates the device supports 10 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_CAP_2500FDX
Indicates the device supports 2.5 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_40GFDX
Indicates the device supports 40 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_5000FDX
Indicates the device supports 5.0 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_ASMPAUSE
Indicates that the device supports the ability to receive pause frames.
.It Sy ETHER_STAT_CAP_AUTONEG
Indicates that the device supports the ability to perform link
auto-negotiation.
.It Sy ETHER_STAT_CAP_PAUSE
Indicates that the device supports the ability to transmit pause frames.
.It Sy ETHER_STAT_CAP_REMFAULT
Indicates that the device supports the ability of detecting a remote
fault in a link peer.
.It Sy ETHER_STAT_CARRIER_ERRORS
Indicates the number of times that the Ethernet carrier sense condition
was lost or not asserted.
.It Sy ETHER_STAT_DEFER_XMTS
Indicates the number of frames for which the device was unable to
transmit the frame due to being busy and had to try again.
.It Sy ETHER_STAT_EX_COLLISIONS
Indicates the number of frames that failed to send due to an excessive
number of collisions.
.It Sy ETHER_STAT_FCS_ERRORS
Indicates the number of times that a frame check sequence failed.
.It Sy ETHER_STAT_FIRST_COLLISIONS
Indicates the number of times that a frame was eventually transmitted
successfully, but only after a single collision.
.It Sy ETHER_STAT_JABBER_ERRORS
Indicates the number of frames that were received that were both larger
than the maximum packet size and failed the frame check sequence.
.It Sy ETHER_STAT_LINK_ASMPAUSE
Indicates whether the link is currently configured to accept pause
frames.
.It Sy ETHER_STAT_LINK_AUTONEG
Indicates whether the current link state is a result of
auto-negotiation.
.It Sy ETHER_STAT_LINK_DUPLEX
Indicates the current duplex state of the link.
The values used here should be the same as documented for
.Sy MAC_PROP_DUPLEX .
.It Sy ETHER_STAT_LINK_PAUSE
Indicates whether the link is currently configured to generate pause
frames.
.It Sy ETHER_STAT_LP_CAP_1000FDX
Indicates the remote device supports 1 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_1000HDX
Indicates the remote device supports 1 Gbit/s half-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100FDX
Indicates the remote device supports 100 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100GFDX
Indicates the remote device supports 100 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100HDX
Indicates the remote device supports 100 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100T4
Indicates the remote device supports 100 Mbit/s 100BASE-T4 operation.
.It Sy ETHER_STAT_LP_CAP_10FDX
Indicates the remote device supports 10 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_10GFDX
Indicates the remote device supports 10 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_10HDX
Indicates the remote device supports 10 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_LP_CAP_2500FDX
Indicates the remote device supports 2.5 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_40GFDX
Indicates the remote device supports 40 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_5000FDX
Indicates the remote device supports 5.0 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_ASMPAUSE
Indicates that the remote device supports the ability to receive pause
frames.
.It Sy ETHER_STAT_LP_CAP_AUTONEG
Indicates that the remote device supports the ability to perform link
auto-negotiation.
.It Sy ETHER_STAT_LP_CAP_PAUSE
Indicates that the remote device supports the ability to transmit pause
frames.
.It Sy ETHER_STAT_LP_CAP_REMFAULT
Indicates that the remote device supports the ability of detecting a
remote fault in a link peer.
.It Sy ETHER_STAT_MACRCV_ERRORS
Indicates the number of times that the internal MAC layer encountered an
error when attempting to receive and process a frame.
.It Sy ETHER_STAT_MACXMT_ERRORS
Indicates the number of times that the internal MAC layer encountered an
error when attempting to process and transmit a frame.
.It Sy ETHER_STAT_MULTI_COLLISIONS
Indicates the number of times that a frame was eventually transmitted
successfully, but only after more than one collision.
.It Sy ETHER_STAT_SQE_ERRORS
Indicates the number of times that an SQE error occurred.
The specific conditions for this error are documented in IEEE 802.3.
.It Sy ETHER_STAT_TOOLONG_ERRORS
Indicates the number of frames that were received that were longer than
the maximum frame size supported by the device.
.It Sy ETHER_STAT_TOOSHORT_ERRORS
Indicates the number of frames that were received that were shorter than
the minimum frame size supported by the device.
.It Sy ETHER_STAT_TX_LATE_COLLISIONS
Indicates the number of times a collision was detected late on the
device.
.It Sy ETHER_STAT_XCVR_ADDR
Indicates the address of the MII/GMII transceiver.
.It Sy ETHER_STAT_XCVR_ID
Indicates the id of the MII/GMII transceiver.
.It Sy ETHER_STAT_XCVR_INUSE
Indicates what kind of transceiver is in use.
The following values may be used:
.Bl -tag -width Ds
.It Sy XCVR_UNDEFINED
The transceiver type is undefined by the hardware.
.It Sy XCVR_NONE
There is no transceiver in use by the hardware.
.It Sy XCVR_10
The transceiver supports 10BASE-T operation.
.It Sy XCVR_100T4
The transceiver supports 100BASE-T4 operation.
.It Sy XCVR_100X
The transceiver supports 100BASE-TX operation.
.It Sy XCVR_100T2
The transceiver supports 100BASE-T2 operation.
.It Sy XCVR_1000X
The transceiver supports 1000BASE-X operation.
This is used for all fiber transceivers.
.It Sy XCVR_1000T
The transceiver supports 1000BASE-T operation.
This is used for all copper transceivers.
.El
.El
.Ss Device Specific kstats
In addition to the defined statistics above, if the device driver
maintains additional statistics or the device provides additional
statistics, it should create its own kstats through the
.Xr kstat_create 9F
function to allow operators to observe them.
.Sh TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
Device drivers are the first line of defense for dealing with broken
devices and bugs in their firmware.
While most devices will rarely fail, it is important that particular attention
is paid to RAS (Reliability, Availability, and Serviceability) when designing
and implementing the device driver.
While everything described in this section is optional, it is highly recommended
that all new device drivers follow these guidelines.
.Pp
The Fault Management Architecture (FMA) provides facilities for
detecting and reporting various classes of defects and faults.
Specifically for networking device drivers, issues that should be
detected and reported include:
.Bl -bullet -offset indent
.It
Device internal uncorrectable errors
.It
Device internal correctable errors
.It
PCI and PCI Express transport errors
.It
Device temperature alarms
.It
Device transmission stalls
.It
Device communication timeouts
.It
High rates of invalid interrupts
.El
.Pp
All such errors fall into three primary categories:
.Bl -enum -offset indent
.It
Errors detected by the Fault Management Architecture
.It
Errors detected by the device and indicated to the device driver
.It
Errors detected by the device driver
.El
.Ss Fault Management Setup and Teardown
Drivers should initialize support for the fault management framework by
calling
.Xr ddi_fm_init 9F
from their
.Xr attach 9E
routine.
By registering with the fault management framework, a device driver is given the
chance to detect and notice transport errors as well as report other errors that
exist.
While a device driver does not need to indicate that it is capable of all the
capabilities described in
.Xr ddi_fm_init 9F ,
we suggest that device drivers at least register the
.Sy DDI_FM_EREPORT_CAPABLE
capability so as to allow the driver to report issues that it detects.
.Pp
If the driver registers with the fault management framework during its
.Xr attach 9E
entry point, it must call
.Xr ddi_fm_fini 9F
during its
.Xr detach 9E
entry point.
.Ss Transport Errors
Many modern networking devices leverage PCI or PCI Express.
As such, there are two primary ways that device drivers access data: they either
memory map device registers and use routines like
.Xr ddi_get8 9F
and
.Xr ddi_put8 9F
or they use direct memory access (DMA).
New device drivers should always enable checking of the transport layer by
marking their support in the
.Xr ddi_device_acc_attr 9S
structure and using routines like
.Xr ddi_fm_acc_err_get 9F
and
.Xr ddi_fm_dma_err_get 9F
to detect if errors have occurred.
.Ss Device Indicated Errors
Many devices have capabilities to announce to a device driver that a
correctable error or an uncorrectable error has occurred.
Other devices have the ability to indicate that various physical issues have
occurred such as a fan failing or a temperature sensor having fired.
.Pp
Drivers should wire themselves to receive notifications when these
events occur.
The means and capabilities will vary from device to device.
For example, some devices will generate information about these notifications
through special interrupts.
Other devices may have a register that software can poll.
In the cases where polling is required, driver writers should try not to poll
too frequently and should generally only poll when the device is actively being
used, e.g. between calls to the
.Xr mc_start 9E
and
.Xr mc_stop 9E
entry points.
.Ss Driver Transmit Stall Detection
One of the primary responsibilities of a hardened device driver is to
perform transmit stall detection.
The core idea behind tx stall detection is that the driver should record when
it last saw evidence that data was successfully transmitted.
Most devices should be transmitting data on a regular basis as long as the link
is up.
If a device is not, then this may indicate that the device is stuck and needs
to be reset.
At this time, the MAC framework does not provide any resources for performing
these checks; however, polling on each individual transmit ring for the last
completion time while something is actively being transmitted through the use of
routines such as
.Xr timeout 9F
may be a reasonable starting point.
.Ss Driver Command Timeout Detection
Each device is programmed in different ways.
Some devices are programmed through asynchronous commands while others are
programmed by writing directly to memory mapped registers.
If a device receives asynchronous replies to commands, then the device driver
should set reasonable timeouts for all such commands and plan on detecting when
they expire.
If a timeout occurs, the driver should presume that there is an issue with the
hardware and proceed to abort the command or reset the device.
.Pp
Many devices do not have such a communication mechanism.
However, whenever there is some activity where the device driver must wait, then
it should be prepared for the fact that the device may never get back to
it and react appropriately by performing some kind of device reset.
.Ss Reacting to Errors
When any of the above categories of errors has been triggered, the
behavior that the device driver should take depends on the kind of
error.
If a fatal error occurred, for example, a transport error or a transmit stall
was detected, or the device indicated an uncorrectable error, then it is
important that the driver take the following steps:
.Bl -enum -offset indent
.It
Set a flag in the device driver's state that indicates that it has hit
an error condition.
When this error condition flag is asserted, transmitted packets should be
accepted and dropped and actions that would require writing to the device state
should fail with an error.
This flag should remain until the device has been successfully restarted.
.It
If the error was not already indicated by the fault management
architecture, as a detected transport error would be, then
the device driver should post an
.Sy ereport
indicating what has occurred with the
.Xr ddi_fm_ereport_post 9F
function.
.It
The device driver should indicate that the device's service was lost
with a call to
.Xr ddi_fm_service_impact 9F
using the symbol
.Sy DDI_SERVICE_LOST .
.It
At this point the device driver should issue a device reset through some
device-specific means.
.It
When the device reset has been completed, then the device driver should
restore all of the programmed state to the device.
This includes things like the current MTU, advertised auto-negotiation speeds,
MAC address filters, and more.
.It
Finally, when service has been restored, the device driver should call
.Xr ddi_fm_service_impact 9F
using the symbol
.Sy DDI_SERVICE_RESTORED .
.El
.Pp
When a non-fatal error occurs, then the device driver should submit an
ereport and should optionally mark the device degraded using
.Xr ddi_fm_service_impact 9F
with the
.Sy DDI_SERVICE_DEGRADED
value depending on the nature of the problem that has occurred.
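.Pp
The ordering of the fatal-error steps above, together with a simple transmit
stall check, can be sketched as follows.
This is a user-level illustration, not driver code: every
.Sy xyz_
name is hypothetical, and the small helper functions are stand-ins for the
places where a real driver would call
.Xr ddi_fm_ereport_post 9F
and
.Xr ddi_fm_service_impact 9F
and would reset and reprogram its hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the DDI_SERVICE_* values; not the real DDI definitions. */
typedef enum {
	XYZ_SERVICE_UNAFFECTED,
	XYZ_SERVICE_LOST,
	XYZ_SERVICE_RESTORED
} xyz_impact_t;

/* Hypothetical per-instance soft state. */
typedef struct xyz {
	bool		xyz_error;	/* error condition flag (step 1) */
	unsigned	xyz_ereports;	/* ereports posted so far */
	unsigned	xyz_resets;	/* device resets issued so far */
	xyz_impact_t	xyz_impact;	/* last service impact reported */
} xyz_t;

/* Stand-ins for DDI calls and device-specific reset and restore logic. */
static void xyz_post_ereport(xyz_t *x) { x->xyz_ereports++; }
static void xyz_impact(xyz_t *x, xyz_impact_t i) { x->xyz_impact = i; }
static bool xyz_hw_reset(xyz_t *x) { x->xyz_resets++; return (true); }
static bool xyz_hw_restore(xyz_t *x) { (void) x; return (true); }

/*
 * A ring is treated as stalled when work is pending and no completion
 * has been seen for longer than the timeout.
 */
static bool
xyz_tx_ring_stalled(uint64_t now_ns, uint64_t last_done_ns,
    unsigned pending, uint64_t timeout_ns)
{
	return (pending != 0 && now_ns >= last_done_ns &&
	    now_ns - last_done_ns > timeout_ns);
}

/* While the flag is set, the transmit path accepts and drops packets. */
static bool
xyz_tx_should_drop(const xyz_t *x)
{
	return (x->xyz_error);
}

/* Walks the six steps in order; returns true once service is restored. */
static bool
xyz_fatal_error(xyz_t *x, bool reported_by_fma)
{
	assert(x != NULL);
	x->xyz_error = true;				/* step 1 */
	if (!reported_by_fma)
		xyz_post_ereport(x);			/* step 2 */
	xyz_impact(x, XYZ_SERVICE_LOST);		/* step 3 */
	if (!xyz_hw_reset(x) || !xyz_hw_restore(x))	/* steps 4 and 5 */
		return (false);		/* flag stays set on failure */
	xyz_impact(x, XYZ_SERVICE_RESTORED);		/* step 6 */
	x->xyz_error = false;
	return (true);
}
```

.Pp
In this sketch a periodic check would call the stall test for each ring and
invoke the fatal-error routine when it returns true.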
.Pp
Device drivers should never make the decision to remove a device from
service based on errors that have occurred nor should they panic the
system.
Rather, the device driver should always try to notify the operating system with
various ereports and allow its policy decisions to occur.
The decision to retire a device lies in the hands of the fault management
architecture.
It knows more about the operator's intent and the surrounding system's state
than the device driver itself does and it will make the call to offline and
retire the device if it is required.
.Ss Device Resets
When resetting a device, a device driver must exercise caution.
If a device driver has not been written to plan for a device reset, then it
may not correctly restore the device's state after such a reset.
Such state should be stored in the instance's private state data as the MAC
framework does not know about device resets and will not inform the
device again about the expected, programmed state.
.Pp
One wrinkle with device resets is that many networking cards show up as
multiple PCI functions on a single device.
For example, each port may
show up as a separate function and thus have a separate instance of the
device driver attached.
When resetting a function, device driver writers should carefully read the
device programming manuals and verify whether or not a reset impacts only the
stalled function or if it impacts all functions across the device.
.Pp
If the only way to reset a given function is through the device, then
this may require more coordination and work on the part of the device
driver to ensure that all the other instances are correctly restored.
In cases where this occurs, some devices offer ways of injecting
interrupts onto those other functions to notify them that this is
occurring.
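.Pp
One way to plan for resets is to keep a software copy of everything the driver
has programmed and to replay it afterwards.
The following user-level sketch illustrates that idea only.
All
.Sy xyz_
names are hypothetical, the second structure merely stands in for device
registers, and the filter table size is invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	XYZ_ADDRL	6	/* length of an Ethernet address */
#define	XYZ_MAX_MCAST	8	/* invented filter table size */

/* Everything the driver has been asked to program. */
typedef struct xyz_cfg {
	uint32_t	xc_mtu;
	bool		xc_promisc;
	uint8_t		xc_mcast[XYZ_MAX_MCAST][XYZ_ADDRL];
	unsigned	xc_nmcast;
} xyz_cfg_t;

typedef struct xyz {
	xyz_cfg_t	xyz_saved;	/* what the framework last asked for */
	xyz_cfg_t	xyz_hw;		/* stand-in for device registers */
} xyz_t;

/* Each setter records the value before touching the "hardware". */
static void
xyz_set_mtu(xyz_t *x, uint32_t mtu)
{
	x->xyz_saved.xc_mtu = mtu;
	x->xyz_hw.xc_mtu = mtu;
}

static void
xyz_set_promisc(xyz_t *x, bool on)
{
	x->xyz_saved.xc_promisc = on;
	x->xyz_hw.xc_promisc = on;
}

static bool
xyz_add_mcast(xyz_t *x, const uint8_t *addr)
{
	unsigned n = x->xyz_saved.xc_nmcast;

	if (n == XYZ_MAX_MCAST)
		return (false);
	memcpy(x->xyz_saved.xc_mcast[n], addr, XYZ_ADDRL);
	x->xyz_saved.xc_nmcast = n + 1;
	memcpy(x->xyz_hw.xc_mcast[n], addr, XYZ_ADDRL);
	x->xyz_hw.xc_nmcast = n + 1;
	return (true);
}

/* A reset wipes the device's programmed state. */
static void
xyz_hw_reset(xyz_t *x)
{
	memset(&x->xyz_hw, 0, sizeof (x->xyz_hw));
}

/* Replay the saved settings, since MAC will not send them again. */
static void
xyz_restore(xyz_t *x)
{
	unsigned i;

	assert(x != NULL);
	x->xyz_hw.xc_mtu = x->xyz_saved.xc_mtu;
	x->xyz_hw.xc_promisc = x->xyz_saved.xc_promisc;
	for (i = 0; i < x->xyz_saved.xc_nmcast; i++) {
		memcpy(x->xyz_hw.xc_mcast[i], x->xyz_saved.xc_mcast[i],
		    XYZ_ADDRL);
	}
	x->xyz_hw.xc_nmcast = x->xyz_saved.xc_nmcast;
}
```

.Pp
A real driver would track more than this, for example advertised speeds and
unicast addresses, but the record-then-replay shape stays the same.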
.Sh MBLKS AND DMA
The networking stack manages framed data through the use of the
.Xr mblk 9S
structure.
The mblk allows for a single message to be made up of individual blocks.
Each part is linked together through its
.Sy b_cont
member.
However, it also allows for multiple messages to be chained together through the
use of the
.Sy b_next
member.
While the networking stack works with these structures, device drivers generally
work with DMA regions.
There are two different strategies that device drivers use for handling these
two different cases: copying and binding.
.Ss Copying Data
The first way that device drivers handle interfacing between the two is
by having two separate regions of memory.
One part is memory which has been allocated for DMA through a call to
.Xr ddi_dma_mem_alloc 9F
and the other is memory associated with the memory block.
.Pp
In this case, a driver will use
.Xr bcopy 9F
to copy memory between the two distinct regions.
When transmitting a packet, it will copy the memory from the mblk_t to the DMA
region.
When receiving data, it will allocate an mblk_t through the
.Xr allocb 9F
routine, copy the memory across with
.Xr bcopy 9F ,
and then increment the mblk_t's
.Sy b_wptr
member by the number of bytes copied.
.Pp
If, when receiving, memory is not available for a new message block,
then the frame should be skipped and effectively dropped.
A kstat should be bumped when such an occasion occurs.
.Ss Binding Data
An alternative approach to copying data is to use DMA binding.
When using DMA binding, the OS takes care of mapping between DMA memory and
normal device memory.
The exact process is a bit different between transmit and receive.
.Pp
When transmitting, a device driver has an mblk_t and needs to call the
.Xr ddi_dma_addr_bind_handle 9F
function to bind it to an already existing DMA handle.
At that point, it will receive various DMA cookies that it can use to obtain the
addresses to program the device with for transmitting data.
Once the transmit is done, the driver must then make sure to call
.Xr freemsg 9F
to release the data.
It must not call
.Xr freemsg 9F
before it receives an interrupt from the device indicating that the data
has been transmitted, otherwise it risks sending arbitrary kernel
memory.
.Pp
When receiving data, the device can perform a similar operation.
First, it must bind the DMA memory into the kernel's virtual memory address
space through a call to the
.Xr ddi_dma_addr_bind_handle 9F
function if it has not already.
Once it has, it must then call
.Xr desballoc 9F
to try to create a new mblk_t which leverages the associated memory.
It can then pass that mblk_t up to the stack.
.Ss Considerations
When deciding which of these options to use, there are many different
considerations that must be made.
The answer as to whether to bind memory or to copy data is not always simple.
.Pp
The first thing to remember is that DMA resources may be finite on a
given platform.
Consider the case of receiving data.
A device driver that binds one of its receive descriptors may not get it back
for quite some time as it may be used by the kernel until an application
actually consumes it.
Device drivers that try to bind memory for receive often work with the
constraint that they must be able to replace that DMA memory with another DMA
descriptor.
If it were not replaced, then eventually the device would not be able to
receive additional data into the ring.
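.Pp
One way to honor the replacement constraint is to loan a receive buffer to the
stack only when a spare buffer is available to take its place, and to fall back
to copying otherwise.
That fallback is an assumption of this sketch rather than something the
framework mandates.
The code below is plain user-space C that only models the bookkeeping; the
names xx_rx_ring_t, xx_rx_frame, and xx_rx_return are hypothetical.

```c
#include <stddef.h>

typedef enum {
	XX_RX_LOANED,	/* buffer handed to the stack, spare placed in ring */
	XX_RX_COPIED	/* frame copied out, buffer stays in the ring */
} xx_rx_action_t;

/* Hypothetical accounting for one receive ring. */
typedef struct xx_rx_ring {
	size_t	xr_nfree;	/* spare DMA buffers not in the ring */
	size_t	xr_nloaned;	/* buffers loaned up and not yet returned */
} xx_rx_ring_t;

/* Decide how to deliver one received frame. */
xx_rx_action_t
xx_rx_frame(xx_rx_ring_t *ring)
{
	/*
	 * Without a spare, loaning the buffer would leave a hole in the
	 * ring, so the frame is copied and the buffer is reused.
	 */
	if (ring->xr_nfree == 0)
		return (XX_RX_COPIED);

	ring->xr_nfree--;
	ring->xr_nloaned++;
	return (XX_RX_LOANED);
}

/*
 * Called once the stack is done with a loaned buffer.  In a real driver
 * this is where the free routine supplied at loan time would run.
 */
void
xx_rx_return(xx_rx_ring_t *ring)
{
	ring->xr_nloaned--;
	ring->xr_nfree++;
}
```

With this accounting the ring never runs short of buffers, at the price of
copying whenever the stack is holding every spare.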
.Pp
On the other hand, particularly for larger frames, copying every packet
from one buffer to another can be a source of additional latency and
memory waste in the system.
For larger copies, the cost of copying may dwarf any potential cost of
performing DMA binding.
.Pp
Device driver authors who are unsure of what to do should
first employ the copying method to simplify the act of writing the
device driver.
The copying method is simpler and also allows the device driver author not to
worry about allocated DMA memory that is still outstanding when it is asked to
unload.
.Pp
If device driver writers are worried about the cost, it is recommended
to control the decision of whether to copy or bind DMA data with
separate private properties for transmitting and receiving.
Each private property should indicate the frame size at which
to switch from one method to the other.
This way, data can be gathered to determine what the impact of each method is on
a given platform.
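.Pp
The threshold approach described above can be sketched as follows.
This is a minimal user-space illustration, not driver code: the structure, its
members, and the function names are all hypothetical, and the choice to copy
frames at or below the threshold and bind larger ones is an assumption based on
the cost discussion above.

```c
#include <stddef.h>

typedef enum { XX_DMA_COPY, XX_DMA_BIND } xx_dma_method_t;

/*
 * Hypothetical tunables, one per direction.  A driver might expose each
 * one as a private property so that it can be changed at run time.
 */
typedef struct xx_dma_policy {
	size_t	xdp_tx_copy_thresh;	/* copy TX frames up to this size */
	size_t	xdp_rx_copy_thresh;	/* copy RX frames up to this size */
} xx_dma_policy_t;

/* Small frames are copied; larger ones are bound. */
static xx_dma_method_t
xx_dma_choose(size_t len, size_t thresh)
{
	return (len <= thresh ? XX_DMA_COPY : XX_DMA_BIND);
}

xx_dma_method_t
xx_tx_method(const xx_dma_policy_t *pol, size_t len)
{
	return (xx_dma_choose(len, pol->xdp_tx_copy_thresh));
}

xx_dma_method_t
xx_rx_method(const xx_dma_policy_t *pol, size_t len)
{
	return (xx_dma_choose(len, pol->xdp_rx_copy_thresh));
}
```

Setting a threshold to zero makes that direction bind every non-empty frame,
while a threshold at or above the maximum frame size makes it always copy,
which makes it easy to measure both extremes on a given platform.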
.Sh SEE ALSO
.Xr dladm 1M ,
.Xr driver.conf 4 ,
.Xr ieee802.3 5 ,
.Xr dlpi 7P ,
.Xr _fini 9E ,
.Xr _info 9E ,
.Xr _init 9E ,
.Xr attach 9E ,
.Xr close 9E ,
.Xr detach 9E ,
.Xr mc_close 9E ,
.Xr mc_getcapab 9E ,
.Xr mc_getprop 9E ,
.Xr mc_getstat 9E ,
.Xr mc_multicst 9E ,
.Xr mc_open 9E ,
.Xr mc_propinfo 9E ,
.Xr mc_setpromisc 9E ,
.Xr mc_setprop 9E ,
.Xr mc_start 9E ,
.Xr mc_stop 9E ,
.Xr mc_tx 9E ,
.Xr mc_unicst 9E ,
.Xr open 9E ,
.Xr allocb 9F ,
.Xr bcopy 9F ,
.Xr ddi_dma_addr_bind_handle 9F ,
.Xr ddi_dma_mem_alloc 9F ,
.Xr ddi_fm_acc_err_get 9F ,
.Xr ddi_fm_dma_err_get 9F ,
.Xr ddi_fm_ereport_post 9F ,
.Xr ddi_fm_fini 9F ,
.Xr ddi_fm_init 9F ,
.Xr ddi_fm_service_impact 9F ,
.Xr ddi_get8 9F ,
.Xr ddi_put8 9F ,
.Xr desballoc 9F ,
.Xr freemsg 9F ,
.Xr kstat_create 9F ,
.Xr mac_alloc 9F ,
.Xr mac_fini_ops 9F ,
.Xr mac_hcksum_get 9F ,
.Xr mac_hcksum_set 9F ,
.Xr mac_init_ops 9F ,
.Xr mac_link_update 9F ,
.Xr mac_lso_get 9F ,
.Xr mac_maxsdu_update 9F ,
.Xr mac_prop_info_set_default_link_flowctrl 9F ,
.Xr mac_prop_info_set_default_str 9F ,
.Xr mac_prop_info_set_default_uint32 9F ,
.Xr mac_prop_info_set_default_uint64 9F ,
.Xr mac_prop_info_set_default_uint8 9F ,
.Xr mac_prop_info_set_perm 9F ,
.Xr mac_prop_info_set_range_uint32 9F ,
.Xr mac_register 9F ,
.Xr mac_rx 9F ,
.Xr mac_unregister 9F ,
.Xr mod_install 9F ,
.Xr mod_remove 9F ,
.Xr strcmp 9F ,
.Xr timeout 9F ,
.Xr cb_ops 9S ,
.Xr ddi_device_acc_attr 9S ,
.Xr dev_ops 9S ,
.Xr mac_callbacks 9S ,
.Xr mac_register 9S ,
.Xr mblk 9S ,
.Xr modldrv 9S ,
.Xr modlinkage 9S
.Rs
.%A McCloghrie, K.
.%A Rose, M.
.%T RFC 1213 Management Information Base for Network Management of
.%T TCP/IP-based internets: MIB-II
.%D March 1991
.Re
.Rs
.%A McCloghrie, K.
.%A Kastenholz, F.
.%T RFC 1573 Evolution of the Interfaces Group of MIB-II
.%D January 1994
.Re
.Rs
.%A Kastenholz, F.
.%T RFC 1643 Definitions of Managed Objects for the Ethernet-like
.%T Interface Types
.Re