.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2019 Joyent, Inc.
.\" Copyright 2020 RackTop Systems, Inc.
.\" Copyright 2023 Oxide Computer Company
.\"
.Dd January 30, 2023
.Dt MAC 9E
.Os
.Sh NAME
.Nm mac ,
.Nm GLDv3
.Nd MAC networking device driver overview
.Sh SYNOPSIS
.In sys/mac_provider.h
.In sys/mac_ether.h
.Sh INTERFACE LEVEL
illumos DDI specific
.Sh DESCRIPTION
The
.Sy MAC
framework provides a means for implementing high-performance networking
device drivers.
It is the successor to the GLD interfaces and is sometimes referred to as the
GLDv3.
The remainder of this manual introduces the aspects of writing device drivers
that leverage the MAC framework.
While both the GLDv3 and MAC framework refer to the same thing, in this manual
page we use the term the
.Em MAC framework
to refer to the device driver interface.
.Pp
MAC device drivers are character devices.
They define the standard
.Xr _init 9E ,
.Xr _fini 9E ,
and
.Xr _info 9E
entry points to initialize the module, as well as
.Xr dev_ops 9S
and
.Xr cb_ops 9S
structures.
.Pp
The main interface with MAC is through a series of callbacks defined in
a
.Xr mac_callbacks 9S
structure.
These callbacks control all the aspects of the device.
They cover sending data, getting and setting properties, controlling MAC
address filters, and managing promiscuous mode.
.Pp
The MAC framework takes care of many aspects of the device driver's
management.
A device that uses the MAC framework does not have to worry about creating
device nodes or implementing
.Xr open 9E
or
.Xr close 9E
routines.
In addition, all of the work to interact with
.Xr dlpi 4P
is taken care of automatically and transparently.
.Ss High-Level Design
At a high level, a device driver is chiefly concerned with three general
operations:
.Bl -enum -offset indent
.It
Sending frames
.It
Receiving frames
.It
Managing device configuration and metadata
.El
.Pp
When sending frames, the MAC framework always calls functions registered
in the
.Xr mac_callbacks 9S
structure to have the driver transmit frames on hardware.
When receiving frames, the driver will generally receive an interrupt which will
cause it to check for incoming data and deliver it to the MAC framework.
.Pp
Configuration of a device, such as whether auto-negotiation should be
enabled, the speeds that the device supports, the MTU (maximum
transmission unit), and the generation of pause frames are all driven by
properties.
The functions to get, set, and obtain information about properties are
defined through callback functions specified in the
.Xr mac_callbacks 9S
structure.
The full list of properties and a description of the relevant callbacks
can be found in the
.Sx PROPERTIES
section.
.Pp
The MAC framework is designed to take advantage of various modern
features provided by hardware, such as checksumming, segmentation
offload, and hardware filtering.
The MAC framework assumes none of these advanced features are present
and allows device drivers to negotiate them through a capability system.
Drivers can declare that they support various capabilities by
implementing the optional
.Xr mc_getcapab 9E
entry point.
Each capability has its associated entry points and structures to fill
out.
The capabilities are detailed in the
.Sx CAPABILITIES
section.
.Pp
The following sections describe the flow of a basic device driver.
For advanced device drivers, the flow is generally the same.
The primary distinction is in how frames are sent and received.
.Ss Initializing MAC Support
For a device to be used by the MAC framework, it must register with the
framework and take specific actions during
.Xr _init 9E ,
.Xr attach 9E ,
.Xr detach 9E ,
and
.Xr _fini 9E .
.Pp
All device drivers have to define a
.Xr dev_ops 9S
structure which is pointed to by a
.Xr modldrv 9S
structure and the corresponding NULL-terminated
.Xr modlinkage 9S
structure.
The
.Xr dev_ops 9S
structure should have a
.Xr cb_ops 9S
structure defined for it; however, it does not need to implement any of
the standard
.Xr cb_ops 9S
entry points unless it also exposes a custom set of device nodes not
otherwise managed by the MAC framework.
See the
.Sx Custom Device Nodes
section for more details.
.Pp
Normally, in a driver's
.Xr _init 9E
entry point, it passes its
.Xr modlinkage 9S
structure directly to
.Xr mod_install 9F .
To properly register with MAC, the driver must call
.Xr mac_init_ops 9F
before it calls
.Xr mod_install 9F .
If for some reason the
.Xr mod_install 9F
function fails, then the driver must be removed by a call to
.Xr mac_fini_ops 9F .
.Pp
Conversely, in the driver's
.Xr _fini 9E
routine, it should call
.Xr mac_fini_ops 9F
after it successfully calls
.Xr mod_remove 9F .
For an example of how to use the
.Xr mac_init_ops 9F
and
.Xr mac_fini_ops 9F
functions, see the examples section in
.Xr mac_init_ops 9F .
.Ss Custom Device Nodes
A device may want to provide its own minor nodes as simple character or block
devices backed by the usual
.Xr cb_ops 9S
routines.
The MAC framework allows for this by leaving a portion of the minor
number space available for private driver use.
.Xr mac_private_minor 9F
returns the first minor number a driver may use for its own purposes,
e.g., to pass to
.Xr ddi_create_minor_node 9F .
.Pp
A driver making use of this ability must provide its own
.Xr getinfo 9E
implementation that is aware of any such minor nodes.
It must also delegate back to the MAC framework as appropriate via either
calls to
.Xr mac_getinfo 9F
or
.Xr mac_devt_to_instance 9F
for MAC reserved minor nodes.
It should also take care not to affect MAC reserved minors.
For example, it must not remove all minor nodes associated with a device
with a call such as:
.Bd -literal -offset indent
ddi_remove_minor_node(dip, NULL);
.Ed
.Ss Registering with MAC
Every instance of a device should register separately with MAC.
To register with MAC, a driver must allocate a
.Xr mac_register 9S
structure, fill it in, and then call
.Xr mac_register 9F .
The
.Vt mac_register_t
structure contains information about the device and all of the required
function pointers that will be used as callbacks by the framework.
.Pp
These steps should all be taken during a device's
.Xr attach 9E
entry point.
It is recommended that the driver perform this sequence of steps after the
device has finished its initialization of the chipset and interrupts, though
interrupts should not be enabled at that point.
After it calls
.Xr mac_register 9F
it will start receiving callbacks from the MAC framework.
.Pp
To allocate the registration structure, the driver should call
.Xr mac_alloc 9F .
Device drivers should generally always pass the symbol
.Dv MAC_VERSION
as the argument to
.Xr mac_alloc 9F .
Upon successful completion, the driver will receive a
.Vt mac_register_t
structure which it should fill in.
The structure and its members are documented in
.Xr mac_register 9S .
.Pp
The
.Xr mac_callbacks 9S
structure is not allocated as a part of the
.Xr mac_register 9S
structure.
In general, device drivers declare this statically.
See the
.Sx MAC Callbacks
section for more information on how to fill it out.
.Pp
Once the structure has been filled in, the driver should call
.Xr mac_register 9F
to register itself with MAC.
The MAC handle that this call fills in should be stored in the driver's soft
state.
It will be used in various other support functions and callbacks.
.Pp
If the call is successful, then the device driver
should enable interrupts and finish any other initialization required.
If the call to
.Xr mac_register 9F
failed, then it should unwind its initialization and should return
.Dv DDI_FAILURE
from its
.Xr attach 9E
routine.
.Pp
The driver does not need to hold onto an allocated
.Xr mac_register 9S
structure after it has called the
.Xr mac_register 9F
function.
Whether the
.Xr mac_register 9F
function returns successfully or not, the driver may free its
.Xr mac_register 9S
structure by calling the
.Xr mac_free 9F
function.
.Ss MAC Callbacks
The MAC framework interacts with a device driver through a series of
callbacks.
These callbacks are described in their individual manual pages and the
collection of callbacks is indicated in the
.Xr mac_callbacks 9S
manual page.
This section does not focus on the specific functions, but rather on
interactions between them and the rest of the device driver framework.
.Pp
A device driver should make no assumptions about when the various
callbacks will be called and whether or not they will be called
simultaneously.
For example, a device driver may be asked to transmit data through a call to its
.Xr mc_tx 9E
entry point while it is being asked to get a device property through a
call to its
.Xr mc_getprop 9E
entry point.
As such, while some calls may be serialized to the device, such as setting
properties, the device driver should always presume that all of its data needs
to be protected with locks.
While the device is holding locks, it is safe for it to call the following MAC
routines:
.Bl -bullet -offset indent -compact
.It
.Xr mac_hcksum_get 9F
.It
.Xr mac_hcksum_set 9F
.It
.Xr mac_lso_get 9F
.It
.Xr mac_maxsdu_update 9F
.It
.Xr mac_prop_info_set_default_link_flowctrl 9F
.It
.Xr mac_prop_info_set_default_str 9F
.It
.Xr mac_prop_info_set_default_uint8 9F
.It
.Xr mac_prop_info_set_default_uint32 9F
.It
.Xr mac_prop_info_set_default_uint64 9F
.It
.Xr mac_prop_info_set_perm 9F
.It
.Xr mac_prop_info_set_range_uint32 9F
.El
.Pp
Other MAC-related routines, such as
.Xr mac_link_update 9F
or
.Xr mac_rx 9F ,
should not be called with locks held.
Other routines in the DDI may be called while locks are held; however,
device driver writers should be careful about calling blocking routines
while locks are held or in interrupt context, even when it is
legal to do so, as this may cause all other callers that need a given
lock to back up behind such an operation.
.Ss Receiving Data
A device driver will often receive data through the means of an
interrupt or by being asked to poll for frames.
When this occurs, zero or more frames, each with optional metadata, may
be ready for the device driver to consume.
Often each frame has a corresponding descriptor which has information about
whether or not there were errors or whether or not the device successfully
checksummed the packet.
In addition to the per-packet flow described below, there are certain
requirements that drivers must adhere to when programming the hardware
to receive data.
See the section
.Sx RECEIVE DESCRIPTOR LAYOUT
for more information.
.Pp
During a single interrupt or poll request, a device driver should process
a fixed number of frames.
For each frame the device driver should:
.Bl -enum -offset indent
.It
Ensure that all of the DMA memory for the descriptor ring is synchronized with
the
.Xr ddi_dma_sync 9F
function and check the handle for errors if the device driver has enabled DMA
error reporting as part of the Fault Management Architecture (FMA).
If the driver does not rely on DMA, then it may skip this step.
It is recommended that this is performed once per interrupt or poll for
the entire region and not on a per-packet basis.
.It
First check whether or not the frame has errors.
If errors were detected, then the frame should not be sent to the operating
system.
It is recommended that devices keep kstats (see
.Xr kstat_create 9F
for more information) and bump the counter whenever such an error is
detected.
If the device distinguishes between the types of errors, then separate kstats
for each class of error are recommended.
See the
.Sx STATISTICS
section for more information on the various error cases that should be
considered.
.It
Once the frame has been determined to be valid, the device driver should
transform the frame into a
.Xr mblk 9S .
See the section
.Sx MBLKS AND DMA
for more information on how to transform and prepare a message block.
.It
If the device supports hardware checksumming (see the
.Sx CAPABILITIES
section for more information on checksumming), then the device driver
should set the corresponding checksumming information with a call to
.Xr mac_hcksum_set 9F .
.It
It should then append this new message block to the
.Em end
of the message block chain, linking it to the
.Fa b_next
pointer.
It is vitally important that all the frames be chained in the order that they
were received.
If the device driver mistakenly reorders frames, then it may cause performance
impacts in the TCP stack and potentially impact application correctness.
.El
.Pp
Once all the frames have been processed and assembled, the device driver
should deliver them to the rest of the operating system by calling
.Xr mac_rx 9F .
The device driver should try to give as many mblk_t structures to the
system at once as possible.
It
.Em should not
call
.Xr mac_rx 9F
once for every assembled mblk_t.
.Pp
The device driver must not hold any locks across the call to
.Xr mac_rx 9F .
When this function is called, received data will be pushed through the
networking stack and some replies may be generated and given to the
driver to send out.
.Pp
It is not the device driver's responsibility to determine whether or not
the system can keep up with a driver's delivery rate of frames.
The rest of the networking stack will handle issues related to keeping up
appropriately and ensure that kernel memory is not exhausted by packets
that are not being processed.
.Pp
If the device driver has negotiated the
.Dv MAC_CAPAB_RINGS
capability
.Pq discussed in Xr mac_capab_rings 9E
then it should call
.Xr mac_rx_ring 9F
and not
.Xr mac_rx 9F .
A given interrupt may correspond to more than one ring that needs to be
checked.
The set of rings is likely to span different groups that were registered
with MAC through the
.Xr mr_gget 9E
interface.
In those cases, the driver should follow the above procedure
independently for each ring.
That means it will call
.Xr mac_rx_ring 9F
once for each ring using the handle that it received when MAC
called the driver's
.Xr mr_rget 9E
entry point.
When it is looking at the rings, the driver will need to make sure that
the ring has not had interrupts disabled
.Pq due to a pending change to polling mode .
This is discussed in greater detail in the
.Xr mac_capab_rings 9E
and
.Xr mri_poll 9E
manual pages.
.Pp
Finally, the device driver should make sure that any other housekeeping
activities required for the ring are taken care of such that more data
can be received.
.Ss Transmitting Data and Back Pressure
A device driver will be asked to transmit a message block chain by
having its
.Xr mc_tx 9E
entry point called.
While the driver is processing the message blocks, it may run out of resources.
For example, a transmit descriptor ring may become full.
At that point, the device driver should return the remaining unprocessed frames.
The act of returning frames indicates that the device has asserted flow control.
Once this has been done, no additional calls will be made to the
driver's transmit entry point and the back pressure will be propagated
throughout the rest of the networking stack.
.Pp
At some point in the future when resources have become available again,
for example after an interrupt indicating that some portion of the
transmit ring has been sent, then the device driver must notify the
system that it can continue transmission.
To do this, the driver should call
.Xr mac_tx_update 9F .
After that point, the driver will receive calls to its
.Xr mc_tx 9E
entry point again.
As mentioned in the section on callbacks, the device driver should avoid holding
any particular locks across the call to
.Xr mac_tx_update 9F .
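.Pp
The back-pressure protocol above can be illustrated with a small,
self-contained model.
This is only a sketch under stated assumptions and not driver code: the
.Vt frame_t
and
.Vt tx_ring_t
types and both function names are hypothetical stand-ins, with
.Vt frame_t
taking the place of an mblk_t chained through
.Fa b_next .
.Bd -literal -offset indent
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for mblk_t; only the b_next chain pointer is modeled. */
typedef struct frame {
	struct frame	*b_next;
} frame_t;

/* Hypothetical transmit ring state; a real driver guards it with a lock. */
typedef struct {
	unsigned int	tr_free;	/* free transmit descriptors */
	bool		tr_blocked;	/* frames were returned to MAC */
} tx_ring_t;

/*
 * Models mc_tx(9E): consume frames while descriptors last, then return
 * the unsent remainder of the chain, or NULL if everything was queued.
 */
static frame_t *
example_tx(tx_ring_t *tr, frame_t *chain)
{
	frame_t *mp, *next;

	for (mp = chain; mp != NULL; mp = next) {
		if (tr->tr_free == 0) {
			tr->tr_blocked = true;
			return (mp);	/* mp and every frame after it */
		}
		next = mp->b_next;
		mp->b_next = NULL;	/* the frame now belongs to the ring */
		tr->tr_free--;
	}
	return (NULL);
}

/*
 * Models the transmit-completion path: reclaim descriptors and report
 * whether the driver now owes MAC a call to mac_tx_update(9F).
 */
static bool
example_tx_reclaim(tx_ring_t *tr, unsigned int ndone)
{
	bool notify = tr->tr_blocked && ndone > 0;

	tr->tr_free += ndone;
	if (notify)
		tr->tr_blocked = false;
	return (notify);
}
.Ed
.Pp
In a real driver, the reclaim step would typically run from the
transmit-completion interrupt, and a true result would be followed by a call to
.Xr mac_tx_update 9F
once the driver has dropped its locks.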
.Ss Interrupt Coalescing
For devices operating at higher data rates, interrupt coalescing is an
important part of a well-functioning device and may impact the
performance of the device.
Not all devices support interrupt coalescing.
If interrupt coalescing is supported on the device, it is recommended that
device driver writers provide private properties for their device to control the
interrupt coalescing rate.
This will make it much easier to perform experiments and observe the impact of
different interrupt rates on the rest of the system.
.Ss Polling
Even with interrupt coalescing, above a certain incoming packet rate it
can make more sense to actively poll the device for more packets
rather than constantly taking interrupts.
When a device driver supports the
.Xr mac_capab_rings 9E
capability and therefore polling on receive rings, the MAC framework will ask
the driver to disable interrupts, with its
.Xr mi_disable 9E
entry point, and then subsequently call its polling entry point,
.Xr mri_poll 9E .
.Pp
As long as a device driver implements the needed entry points, then there is
nothing else that it needs to do to take advantage of polling.
A driver should not attempt to spin up its own threads, task queues, or
creatively use timeouts, to try to simulate polling for received packets.
.Ss MAC Address Filter Management
The MAC framework will attempt to use as many MAC address filters as a
device has.
To program a multicast address filter, the driver's
.Xr mc_multicst 9E
entry point will be called.
If the device driver runs out of filters, it should not take any special action
and just return the appropriate error as documented in the corresponding manual
pages for the entry points.
The framework will ensure that the device is placed in promiscuous mode
if it needs to.
.Pp
If the hardware supports more than one unicast filter then the device
driver should consider implementing the
.Dv MAC_CAPAB_RINGS
capability, which exposes a means for multiple unicast MAC address filters to be
used by the broader system.
It is still useful to implement this on hardware which only has a single ring.
See
.Xr mac_capab_rings 9E
for more information.
.Ss Receive Side Scaling
Receive side scaling is where a hardware device supports multiple,
independent queues of frames that can be received.
Each of these queues is generally associated with an independent
interrupt and the hardware usually performs some form of hash across the
queues.
Hardware which supports this should look at implementing the
.Dv MAC_CAPAB_RINGS
capability and see
.Xr mac_capab_rings 9E
for more information.
.Ss Link Updates
It is the responsibility of the device driver to keep track of the
data link's state.
Many devices provide a means of receiving an interrupt when the state of the
link changes.
When such a change happens, the driver should update its internal data
structures and then call
.Xr mac_link_update 9F
to inform the MAC layer that this has occurred.
If the device driver does not properly inform the system about link changes,
then various features like link aggregations and other mechanisms that leverage
the link state will not work correctly.
.Ss Link Speed and Auto-negotiation
Many networking devices support more than one possible speed that they
can operate at.
The selection of a speed is often performed through
.Em auto-negotiation ,
though some devices allow the user to control what speeds are advertised
and used.
.Pp
Logically, there are two different sets of things that the device driver
needs to keep track of while it's operating:
.Bl -enum
.It
The supported speeds in hardware.
.It
The enabled speeds from the user.
.El
.Pp
By default, when a link first comes up, the device driver should
generally configure the link to support the common set of speeds and
perform auto-negotiation.
.Pp
A user can control what speeds a device advertises via auto-negotiation
and whether or not it performs auto-negotiation at all by using a series
of properties that have
.Sy _EN_
in the name.
These are read/write properties and there is one for each speed supported in the
operating system.
For a full list of them, see the
.Sx PROPERTIES
section.
.Pp
In addition to these properties, there is a corresponding set of
properties with
.Sy _ADV_
in the name.
These are similar to the
.Sy _EN_
family of properties, but they are read-only and indicate what the
device has actually negotiated.
While they are generally similar to the
.Sy _EN_
family of properties, they may change depending on power settings.
See the
.Sy Ethernet Link Properties
section in
.Xr dladm 8
for more information.
.Pp
It's worth discussing how these different values get used throughout the
different entry points.
The first entry point to consider is the
.Xr mc_propinfo 9E
entry point.
For a given speed, the driver should consult whether or not the hardware
supports this speed.
If it does, it should fill in the default value that the hardware takes and
indicate whether or not the property is writable.
This holds for both the
.Sy _EN_
and
.Sy _ADV_
family of properties.
.Pp
The next entry point is
.Xr mc_getprop 9E .
Here, the device should first consult whether the given speed is
supported.
If it is not, then the driver should return
.Er ENOTSUP .
If it is, then it should return the current value of the property.
.Pp
The last property-related entry point is the
.Xr mc_setprop 9E
entry point.
Here, the same logic applies.
Before the driver considers whether or not the property is writable, it should
first check whether or not it's a supported property.
If it's not, then it should return
.Er ENOTSUP .
Otherwise, it should proceed to check whether the property is writable.
If it is writable and the requested value is valid, then it should update
the property and restart the link's negotiation.
.Pp
Finally, there is the
.Xr mc_getstat 9E
entry point.
Several of the statistics that are queried relate to auto-negotiation and
hardware capabilities.
When a statistic relates to the hardware supporting a given speed, the
.Sy _EN_
properties should be ignored.
The only thing that should be consulted is what the hardware itself supports.
Otherwise, the statistics should look at what is currently being advertised by
the device.
.Ss Unregistering from MAC
During a driver's
.Xr detach 9E
routine, it should unregister the device instance from MAC by calling
.Xr mac_unregister 9F
with the handle that it received from
.Xr mac_register 9F .
If the call to
.Xr mac_unregister 9F
failed, then the device is likely still in use and the driver should
fail the call to
.Xr detach 9E .
.Ss Interacting with Devices
Administrators always interact with devices through the
.Xr dladm 8
command line interface.
Device state, such as whether the link is considered up or down, and
various link properties, such as the MTU, auto-negotiation state, and
flow control state, are all exposed there.
It is also the preferred way that these properties are set and configured.
.Pp
While device tunables may be presented in a
.Xr driver.conf 5
file, it is recommended instead to expose such things through
.Xr dladm 8
private properties, whether explicitly documented or not.
.Sh CAPABILITIES
Capabilities in the MAC framework are optional and indicate various
hardware features that a device supports.
The two current capabilities that the system supports relate to
performing large send offload (LSO) in hardware, often also known as TCP
segmentation offload, and to the ability of hardware to calculate and verify
the checksums present in IPv4, IPv6, and protocol headers such as TCP and UDP.
.Pp
The MAC framework will query a device for support of a capability
through the
.Xr mc_getcapab 9E
function.
Each capability has its own constant and may have corresponding data that goes
along with it and a specific structure that the device is required to fill in.
Note, the set of capabilities changes over time and there are also private
capabilities in the system.
Several of the capabilities are used in the implementation of the MAC framework.
Others, like
.Dv MAC_CAPAB_RINGS ,
represent features that have not been stabilized and thus both API and binary
compatibility for them is not guaranteed.
It is important that the device driver handles unknown capabilities correctly.
For more information, see
.Xr mc_getcapab 9E .
.Pp
The following capabilities are
stable and defined in the system:
.Ss Dv MAC_CAPAB_HCKSUM
The
.Dv MAC_CAPAB_HCKSUM
capability indicates to the system that the device driver supports some
amount of checksumming.
The specific data for this capability is a pointer to a
.Vt uint32_t .
To indicate no support for any kind of checksumming, the driver should
either set this value to zero or simply return that it doesn't support
the capability.
.Pp
Note, the values that the driver declares in this capability indicate
what it can do when it transmits data.
If the driver can only verify checksums when receiving data, then it should not
indicate that it supports this capability.
The following set of flags may be combined through a bitwise inclusive OR:
.Bl -tag -width Ds
.It Dv HCKSUM_INET_PARTIAL
This indicates that the hardware can calculate a partial checksum for
both IPv4 and IPv6 UDP and TCP packets; however, it requires the pseudo-header
checksum be calculated for it.
The pseudo-header checksum will be available for the mblk_t when calling
.Xr mac_hcksum_get 9F .
Note this does not imply that the hardware is capable of calculating
the partial checksum for other L4 protocols or the IPv4 header checksum.
That should be indicated with the
.Dv HCKSUM_IPHDRCKSUM
flag.
.It Dv HCKSUM_INET_FULL_V4
This indicates that the hardware will fully calculate the L4 checksum for
outgoing IPv4 UDP or TCP packets only, and does not require a pseudo-header
checksum.
Note this does not imply that the hardware is capable of calculating the
checksum for other L4 protocols or the IPv4 header checksum.
That should be indicated with the
.Dv HCKSUM_IPHDRCKSUM
flag.
.It Dv HCKSUM_INET_FULL_V6
This indicates that the hardware will fully calculate the L4 checksum for
outgoing IPv6 UDP or TCP packets only, and does not require a pseudo-header
checksum.
Note this does not imply that the hardware is capable of calculating the
checksum for any other L4 protocols.
.It Dv HCKSUM_IPHDRCKSUM
This indicates that the hardware supports calculating the checksum for
the IPv4 header itself.
.El
.Pp
When in a driver's transmit function, the driver will be processing a
single frame.
It should call
.Xr mac_hcksum_get 9F
to see what checksum flags are set on it.
Note that the flags that are set on it are different from the ones described
above and are documented in its manual page.
These flags indicate how the driver is expected to program the hardware and what
checksumming is required.
Not all frames will require hardware checksumming or will ask the hardware to
checksum them.
.Pp
If a driver supports offloading the receive checksum and verification,
it should check to see what the hardware indicated was verified.
The driver should then call
.Xr mac_hcksum_set 9F .
The flags used are different from the ones above and are discussed in
detail in the
.Xr mac_hcksum_set 9F
manual page.
If there is no checksum information available or the driver does not support
checksumming, then it should simply not call
.Xr mac_hcksum_set 9F .
.Pp
Note that the checksum flags should be set on the first
mblk_t that makes up a given message.
In other words, if multiple mblk_t structures are linked together by the
.Fa b_cont
member to describe a single frame, then
.Xr mac_hcksum_set 9F
should only be called on the
first mblk_t of that set.
However, each distinct message should have the checksum bits set on it, if
applicable.
In other words, each mblk_t that is linked together by the
.Fa b_next
pointer may have checksum flags set.
.Pp
It is recommended that device drivers provide a private property or
.Xr driver.conf 5
property to control whether or not checksumming is enabled for both rx
and tx; however, the default disposition is recommended to be enabled
for both.
This way if hardware bugs are found in the checksumming implementation,
checksumming can be disabled without requiring software updates.
The transmit property should be checked when determining how to reply to
.Xr mc_getcapab 9E
and the receive property should be checked in the context of the receive
function.
.Ss Dv MAC_CAPAB_LSO
The
.Dv MAC_CAPAB_LSO
capability indicates that the driver supports various forms of large
send offload (LSO).
The private data is a pointer to a
.Vt mac_capab_lso_t
structure.
The system currently supports offloading TCP packets over both IPv4 and
IPv6.
This structure has the following members which are used to indicate
various types of LSO support.
.Bd -literal -offset indent
t_uscalar_t lso_flags;
lso_basic_tcp_ipv4_t lso_basic_tcp_ipv4;
lso_basic_tcp_ipv6_t lso_basic_tcp_ipv6;
.Ed
.Pp
The
.Fa lso_flags
member is used to indicate which members are valid and should be
considered.
Each flag represents a different form of LSO.
The member should be set to the bitwise inclusive OR of the following values:
.Bl -tag -width Dv -offset indent
.It Dv LSO_TX_BASIC_TCP_IPV4
This indicates hardware support for performing TCP segmentation
offloading over IPv4.
When this flag is set, the
.Fa lso_basic_tcp_ipv4
member must be filled in.
.It Dv LSO_TX_BASIC_TCP_IPV6
This indicates hardware support for performing TCP segmentation
offloading over IPv6.
The IPv6 packet will have no extension headers present.
When this flag is set, the
.Fa lso_basic_tcp_ipv6
member must be filled in.
.El
.Pp
The
.Fa lso_basic_tcp_ipv4
member is a structure with the following members:
.Bd -literal -offset indent
t_uscalar_t lso_max;
.Ed
.Bd -filled -offset indent
The
.Fa lso_max
member should be set to the maximum size of the TCP data
payload that can be offloaded to the hardware.
.Ed
.Pp
The
.Fa lso_basic_tcp_ipv6
member is a structure with the following members:
.Bd -literal -offset indent
t_uscalar_t lso_max;
.Ed
.Bd -filled -offset indent
The
.Fa lso_max
member should be set to the maximum size of the TCP data
payload that can be offloaded to the hardware.
.Ed
.Pp
As with checksumming, it is recommended that driver writers provide a
means for disabling the support of LSO even if it is enabled by default.
This allows issues that arise with LSO to be worked around without
requiring additional driver work.
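.Pp
The way a driver might answer
.Xr mc_getcapab 9E
for the two capabilities above can be sketched as follows.
This is a minimal illustration under stated assumptions, not a definitive
implementation: the type and constant definitions at the top are stand-ins
with placeholder values so that the sketch builds outside the kernel (a real
driver gets them from
.In sys/mac_provider.h ) ,
and
.Vt example_t
and its members are hypothetical soft state holding the recommended tunables.
.Bd -literal -offset indent
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins with placeholder values; not the system's definitions. */
typedef enum { MAC_CAPAB_HCKSUM = 1, MAC_CAPAB_LSO = 2 } mac_capab_t;
enum { HCKSUM_INET_PARTIAL = 0x1, HCKSUM_IPHDRCKSUM = 0x2 };
enum { LSO_TX_BASIC_TCP_IPV4 = 0x1, LSO_TX_BASIC_TCP_IPV6 = 0x2 };
typedef struct { uint32_t lso_max; } lso_basic_tcp_t;
typedef struct {
	uint32_t	lso_flags;
	lso_basic_tcp_t	lso_basic_tcp_ipv4;
	lso_basic_tcp_t	lso_basic_tcp_ipv6;
} mac_capab_lso_t;

/* Hypothetical per-instance soft state with the recommended tunables. */
typedef struct {
	bool		ex_tx_cksum_enable;
	bool		ex_lso_enable;
	uint32_t	ex_lso_max;	/* largest TCP payload offloadable */
} example_t;

static bool
example_m_getcapab(void *arg, mac_capab_t cap, void *cap_data)
{
	example_t *ep = arg;

	switch (cap) {
	case MAC_CAPAB_HCKSUM: {
		uint32_t *flags = cap_data;

		/* Honor the transmit checksum tunable. */
		if (!ep->ex_tx_cksum_enable)
			return (false);
		*flags = HCKSUM_INET_PARTIAL | HCKSUM_IPHDRCKSUM;
		return (true);
	}
	case MAC_CAPAB_LSO: {
		mac_capab_lso_t *lso = cap_data;

		if (!ep->ex_lso_enable)
			return (false);
		/* Each flag set requires the matching member be filled in. */
		lso->lso_flags = LSO_TX_BASIC_TCP_IPV4 |
		    LSO_TX_BASIC_TCP_IPV6;
		lso->lso_basic_tcp_ipv4.lso_max = ep->ex_lso_max;
		lso->lso_basic_tcp_ipv6.lso_max = ep->ex_lso_max;
		return (true);
	}
	default:
		/* Unknown or unsupported capabilities are declined. */
		return (false);
	}
}
.Ed
.Pp
The default case matters as much as the others: because the set of
capabilities changes over time, declining anything unrecognized keeps the
driver correct when new capabilities are queried.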
.Sh EVOLVING CAPABILITIES
The following capabilities are still evolving in the operating system.
They are documented such that device driver writers may experiment with
them.
However, if such drivers are not present inside the core operating
system repository, they may be subject to API and ABI breakage.
.Ss Dv MAC_CAPAB_RINGS
The
.Dv MAC_CAPAB_RINGS
capability is very important for implementing a high-performing device
driver.
Networking hardware structures the queues of packets to be sent
and received into a ring.
Each entry in this ring has a descriptor, which describes the address
and options for a packet which is going to
be transmitted or received.
While simple networking devices only have a single ring, most high-speed
networking devices have support for many rings.
.Pp
Rings are used for two important purposes.
The first is receive side scaling (RSS), which is the ability to have
the hardware hash the contents of a packet based on some of the protocol
headers, and send it to one of several rings.
These different rings may each have their own interrupt associated with
them, allowing the card to receive traffic in parallel.
Similar logic can be performed when sending traffic, to leverage
multiple hardware resources, thus increasing capacity.
.Pp
The second use of rings is to group them together and apply filtering
rules.
For example, if a packet matches a specific VLAN or MAC address,
then it can be sent to a specific ring or a specific group of rings.
This is especially useful when there are multiple different virtual NICs
or zones in play as the operating system will be able to use the
hardware classification features to already know where a given packet
needs to be delivered internally rather than having to determine that
for each packet.
.Pp
From the MAC framework's perspective, a driver can have one or more
groups.
A group consists of the following:
.Bl -bullet -offset indent
.It
One or more hardware rings.
.It
One or more MAC address or VLAN filters.
.El
.Pp
The details around how a device driver changes when rings are employed,
the data structures that a driver must implement, and more are available
in
.Xr mac_capab_rings 9E .
.Ss Dv MAC_CAPAB_TRANSCEIVER
Many networking devices leverage external transceivers that adhere to
standards such as SFP, QSFP, QSFP-DD, etc., which often contain
standardized information in an EEPROM on the device.
The
.Dv MAC_CAPAB_TRANSCEIVER
capability provides a means of discovering the number of transceivers,
their types, and reading the data from a transceiver.
This allows administrators and users to determine if devices are
present, if the hardware can use them, and in many cases, detailed
information about the device ranging from its manufacturer and
serial numbers to specific information about its health.
Implementing this capability will lead to the operating system being
able to discover and display transceivers as part of its fault
management topology.
.Pp
See
.Xr mac_capab_transceiver 9E
for more details on the capability structure and the various function
entry points that come along with it.
.Ss Dv MAC_CAPAB_LED
The
.Dv MAC_CAPAB_LED
capability provides a means to access and control the LEDs on a network
interface card.
This is then made available to the broader operating system and consumed
by facilities such as the Fault Management Architecture.
See
.Xr mac_capab_led 9E
for more details on the structure and requirements of the capability.
.Sh PROPERTIES
Properties in the MAC framework represent aspects of a link.
These include things like the link's current state and MTU.
Many of the properties in the system are focused around auto-negotiation and
controlling what link speeds are advertised.
Information about properties is covered by three different device entry points.
The
.Xr mc_propinfo 9E
entry point obtains metadata about the property.
The
.Xr mc_getprop 9E
entry point obtains the property.
The
.Xr mc_setprop 9E
entry point updates the property to a new value.
.Pp
Many of the properties listed below are read-only.
Each property indicates whether it's read-only or it's read/write.
However, driver writers may not implement the ability to set all writable
properties.
Many of these depend on the card itself.
In particular, all properties that relate to auto-negotiation and are read/write
may not be updated if the hardware in question does not support toggling what
link speeds are auto-negotiated.
While copper Ethernet often does not have this restriction, it often exists with
various fiber standards and PHYs.
.Pp
The following properties are the subset of MAC framework properties that
driver writers should be aware of and handle.
While other properties exist in the system, driver writers should always return
an error when a property not listed below is encountered.
See
.Xr mc_getprop 9E
and
.Xr mc_setprop 9E
for more information on how to handle them.
.Bl -hang -width Ds
.It Dv MAC_PROP_DUPLEX
.Bd -filled -compact
Type:
.Vt link_duplex_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_DUPLEX
property is used to indicate the duplex mode of the link.
A full-duplex link may have traffic flowing in both directions at the same time.
The
.Vt link_duplex_t
is an enumeration which may be set to any of the following values:
.Bl -tag -width Ds
.It Dv LINK_DUPLEX_UNKNOWN
The current state of the link is unknown.
This may be because the link has not negotiated to a specific speed or it is
down.
.It Dv LINK_DUPLEX_HALF
The link is running at half duplex.
Communication may travel in only one direction on the link at a given time.
.It Dv LINK_DUPLEX_FULL
The link is running at full duplex.
Communication may travel in both directions on the link simultaneously.
.El
.It Dv MAC_PROP_SPEED
.Bd -filled -compact
Type:
.Vt uint64_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_SPEED
property stores the current link speed in bits per second.
A link that is running at 100 Mbit/s would store the value 100000000ULL.
A link that is running at 40 Gbit/s would store the value 40000000000ULL.
.It Dv MAC_PROP_STATUS
.Bd -filled -compact
Type:
.Vt link_state_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_STATUS
property is used to indicate the current state of the link.
It indicates whether the link is up or down.
The
.Vt link_state_t
is an enumeration which may be set to any of the following values:
.Bl -tag -width Ds
.It Dv LINK_STATE_UNKNOWN
The current state of the link is unknown.
This may be because the driver's
.Xr mc_start 9E
entry point has not been called so it has not attempted to start the link.
.It Dv LINK_STATE_DOWN
The link is down.
This may be because of a negotiation problem, a cable problem, or some other
device specific issue.
.It Dv LINK_STATE_UP
The link is up.
If auto-negotiation is in use, it should have completed.
Traffic should be able to flow over the link, barring other issues.
.El
.It Dv MAC_PROP_AUTONEG
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_AUTONEG
property indicates whether or not the device is currently configured to
perform auto-negotiation.
A value of
.Sy 0
indicates that auto-negotiation is disabled.
A
.Sy non-zero
value indicates that auto-negotiation is enabled.
Devices should generally default to enabling auto-negotiation.
.Pp
When getting this property, the device driver should return the current
state.
When setting this property, if the device supports operating in the requested
mode, then the device driver should reset the link to negotiate to the new speed
after updating any internal registers.
.It Dv MAC_PROP_MTU
.Bd -filled -compact
Type:
.Vt uint32_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_MTU
property determines the maximum transmission unit (MTU).
This indicates the maximum size packet that the device can transmit, ignoring
its own headers.
For an Ethernet device, this would exclude the size of the Ethernet header and
any VLAN headers that would be placed.
It is up to the driver to ensure that any MTU value that it accepts, once its
margin and header sizes are added, does not exceed its maximum frame size.
.Pp
By default, drivers for Ethernet should initialize this value and the
MTU to
.Sy 1500 .
When getting this property, the driver should return its current
recorded MTU.
When setting this property, the driver should first validate that it is within
the device's valid range and then it must call
.Xr mac_maxsdu_update 9F .
Note that the call may fail.
If the call completes successfully, the driver should update the hardware with
the new value of the MTU and perform any other work needed to handle it.
.Pp
If the device does not support changing the MTU after the device's
.Xr mc_start 9E
entry point has been called, then driver writers should return
.Er EBUSY .
.It Dv MAC_PROP_FLOWCTRL
.Bd -filled -compact
Type:
.Vt link_flowctrl_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_FLOWCTRL
property manages the configuration of pause frames as part of Ethernet
flow control.
Note, this only describes what this device will advertise.
What is actually enabled may be different and is subject to the rules of
auto-negotiation.
The
.Vt link_flowctrl_t
is an enumeration that may be set to one of the following values:
.Bl -tag -width Ds
.It Dv LINK_FLOWCTRL_NONE
Flow control is disabled.
No pause frames should be generated or honored.
.It Dv LINK_FLOWCTRL_RX
The device can receive pause frames; however, it should not generate
them.
.It Dv LINK_FLOWCTRL_TX
The device can generate pause frames; however, it does not support
receiving them.
.It Dv LINK_FLOWCTRL_BI
The device supports both sending and receiving pause frames.
.El
.Pp
When getting this property, the device driver should return the way that
it has configured the device, not what the device has actually
negotiated.
When setting the property, it should update the hardware and allow the link to
potentially perform auto-negotiation again.
.It Dv MAC_PROP_EN_FEC_CAP
.Bd -filled -compact
Type:
.Vt link_fec_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_FEC_CAP
property indicates which Forward Error Correction (FEC) code is advertised
by the device.
.Pp
The
.Vt link_fec_t
is an enumeration that may be a combination of the following bit values:
.Bl -tag -width Ds
.It Dv LINK_FEC_NONE
No FEC over the link.
.It Dv LINK_FEC_AUTO
The FEC coding to use is auto-negotiated.
.Dv LINK_FEC_AUTO
cannot be set along with any of the other values.
This is the default setting the device driver should use.
.It Dv LINK_FEC_RS
The link may use Reed-Solomon FEC coding.
.It Dv LINK_FEC_BASE_R
The link may use Base-R coding, also commonly referred to as FireCode.
.El
.Pp
When setting the property, the device driver should update the hardware with
the requested coding or combination of requested codings.
If a particular combination of codings is not supported by the hardware,
the device driver should return
.Er EINVAL .
When retrieving this property, the device driver should return the current
value of the property.
.It Dv MAC_PROP_ADV_FEC_CAP
.Bd -filled -compact
Type:
.Vt link_fec_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_FEC_CAP
property has the same values as
.Dv MAC_PROP_EN_FEC_CAP .
The property indicates which Forward Error Correction (FEC) code has been
negotiated over the link.
.El
.Pp
The remaining properties are all about various auto-negotiation link
speeds.
They fall into two different buckets: properties with
.Sy _ADV_
in the name and properties with
.Sy _EN_
in the name.
For any given supported speed, there is one of each.
The
.Sy _EN_
set of properties are read/write properties that control what should be
advertised by the device.
When these are retrieved, they should return the current value of the property.
When they are set, they should change how the hardware advertises the specific
speed and trigger any kind of link reset and auto-negotiation, if enabled, to
occur.
.Pp
The
.Sy _ADV_
set of properties are read-only properties.
They are meant to reflect what has actually been negotiated.
These may be different from the
.Sy _EN_
family of properties, especially when different power management
settings are at play.
.Pp
See the
.Sx Link Speed and Auto-negotiation
section for more information.
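.Pp
The split between the
.Sy _EN_
and
.Sy _ADV_
families can be sketched as follows.
This is a minimal, self-contained illustration and not how any particular
driver is written.
The enumeration stands in for a pair such as
.Dv MAC_PROP_ADV_1000FDX_CAP
and
.Dv MAC_PROP_EN_1000FDX_CAP ;
the
.Vt example_link_t
structure, its members, and both functions are hypothetical.
Returning
.Er ENOTSUP
for a write to the read-only property is an assumption of this sketch, and
the restart counter merely stands in for resetting the link.
.Bd -literal -offset indent
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for one _ADV_/_EN_ property pair. */
typedef enum example_prop {
	EXAMPLE_PROP_ADV_1000FDX,
	EXAMPLE_PROP_EN_1000FDX
} example_prop_t;

/* Hypothetical driver link state. */
typedef struct example_link {
	uint8_t		el_adv_1000fdx;	/* read-only _ADV_ value */
	uint8_t		el_en_1000fdx;	/* read/write _EN_ value */
	int		el_autoneg;	/* non-zero if auto-negotiating */
	unsigned int	el_restarts;	/* negotiation restarts issued */
} example_link_t;

/* Both families simply report their current value when read. */
static int
example_getprop(const example_link_t *el, example_prop_t prop,
    uint8_t *valp)
{
	switch (prop) {
	case EXAMPLE_PROP_ADV_1000FDX:
		*valp = el->el_adv_1000fdx;
		return (0);
	case EXAMPLE_PROP_EN_1000FDX:
		*valp = el->el_en_1000fdx;
		return (0);
	}
	return (ENOTSUP);
}

/* Only the _EN_ family may be written. */
static int
example_setprop(example_link_t *el, example_prop_t prop, uint8_t val)
{
	if (prop != EXAMPLE_PROP_EN_1000FDX)
		return (ENOTSUP);	/* assumed errno for read-only */

	el->el_en_1000fdx = (val != 0);
	if (el->el_autoneg)
		el->el_restarts++;	/* stands in for a link reset */
	return (0);
}
.Ed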
.Pp
The properties are ordered in increasing link speed:
.Bl -hang -width Ds
.It Dv MAC_PROP_ADV_10HDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_10HDX_CAP
property describes whether or not 10 Mbit/s half-duplex support is
advertised.
.It Dv MAC_PROP_EN_10HDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_10HDX_CAP
property describes whether or not 10 Mbit/s half-duplex support is
enabled.
.It Dv MAC_PROP_ADV_10FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_10FDX_CAP
property describes whether or not 10 Mbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_10FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_10FDX_CAP
property describes whether or not 10 Mbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_100HDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_100HDX_CAP
property describes whether or not 100 Mbit/s half-duplex support is
advertised.
.It Dv MAC_PROP_EN_100HDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_100HDX_CAP
property describes whether or not 100 Mbit/s half-duplex support is
enabled.
.It Dv MAC_PROP_ADV_100FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_100FDX_CAP
property describes whether or not 100 Mbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_100FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_100FDX_CAP
property describes whether or not 100 Mbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_100T4_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
advertised.
.It Dv MAC_PROP_EN_100T4_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
enabled.
.It Dv MAC_PROP_ADV_1000HDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_1000HDX_CAP
property describes whether or not 1 Gbit/s half-duplex support is
advertised.
.It Dv MAC_PROP_EN_1000HDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_1000HDX_CAP
property describes whether or not 1 Gbit/s half-duplex support is
enabled.
.It Dv MAC_PROP_ADV_1000FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_1000FDX_CAP
property describes whether or not 1 Gbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_1000FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_1000FDX_CAP
property describes whether or not 1 Gbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_2500FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_2500FDX_CAP
property describes whether or not 2.5 Gbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_2500FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_2500FDX_CAP
property describes whether or not 2.5 Gbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_5000FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_5000FDX_CAP
property describes whether or not 5.0 Gbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_5000FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_5000FDX_CAP
property describes whether or not 5.0 Gbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_10GFDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_10GFDX_CAP
property describes whether or not 10 Gbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_10GFDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_10GFDX_CAP
property describes whether or not 10 Gbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_40GFDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_40GFDX_CAP
property describes whether or not 40 Gbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_40GFDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_40GFDX_CAP
property describes whether or not 40 Gbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_100GFDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_100GFDX_CAP
property describes whether or not 100 Gbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_100GFDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_100GFDX_CAP
property describes whether or not 100 Gbit/s full-duplex support is
enabled.
.El
.Ss Private Properties
In addition to the defined properties above, drivers are allowed to
define private properties.
These private properties are device-specific properties.
All private properties share the same constant,
.Dv MAC_PROP_PRIVATE .
Properties are distinguished by a name, which is a character string.
The list of such private properties is defined when registering with mac in the
.Fa m_priv_props
member of the
.Xr mac_register 9S
structure.
.Pp
The driver may define whatever semantics it wants for these private
properties.
They will not be listed when running
.Xr dladm 8 ,
unless explicitly requested by name.
All such properties should start with a leading underscore character and then
consist of alphanumeric ASCII characters and additional underscores or hyphens.
.Pp
Properties of type
.Dv MAC_PROP_PRIVATE
may show up in all three property related entry points:
.Xr mc_propinfo 9E ,
.Xr mc_getprop 9E ,
and
.Xr mc_setprop 9E .
Device drivers should tell the different properties apart by using the
.Xr strcmp 9F
function to compare the name to the set of properties that it knows about.
When encountering properties that it doesn't know, it should treat them
like all other unknown properties.
.Sh STATISTICS
The MAC framework defines a couple of different sets of statistics which
are based on various standards for devices to implement.
Statistics are retrieved through the
.Xr mc_getstat 9E
entry point.
Some statistics are required for all devices, and there is a
separate set of Ethernet specific statistics.
Not all devices will support every statistic.
In many cases, several device registers will need to be combined to create the
proper stat.
.Pp
In general, if the device is not keeping track of these statistics, then
it is recommended that the driver store these values as a
.Vt uint64_t
to ensure that overflow does not occur.
.Pp
If a device does not support a specific statistic, then it is fine to
return that it is not supported.
The same should be done for unrecognized statistics.
See
.Xr mc_getstat 9E
for more information on the proper way to handle these.
.Ss General Device Statistics
The following statistics are based on MIB-II statistics from both RFC
1213 and RFC 1573.
.Bl -tag -width Ds
.It Dv MAC_STAT_IFSPEED
The device's current speed in bits per second.
.It Dv MAC_STAT_MULTIRCV
The total number of received multicast packets.
.It Dv MAC_STAT_BRDCSTRCV
The total number of received broadcast packets.
.It Dv MAC_STAT_MULTIXMT
The total number of transmitted multicast packets.
.It Dv MAC_STAT_BRDCSTXMT
The total number of transmitted broadcast packets.
.It Dv MAC_STAT_NORCVBUF
The total number of packets discarded by the hardware due to a lack of
receive buffers.
.It Dv MAC_STAT_IERRORS
The total number of errors detected on input.
.It Dv MAC_STAT_UNKNOWNS
The total number of received packets that were discarded because they
were of an unknown protocol.
.It Dv MAC_STAT_NOXMTBUF
The total number of outgoing packets dropped due to a lack of transmit
buffers.
.It Dv MAC_STAT_OERRORS
The total number of outgoing packets that resulted in errors.
.It Dv MAC_STAT_COLLISIONS
Total number of collisions encountered by the transmitter.
.It Dv MAC_STAT_RBYTES
The total number of bytes received by the device, regardless of packet
type.
.It Dv MAC_STAT_IPACKETS
The total number of packets received by the device, regardless of packet type.
.It Dv MAC_STAT_OBYTES
The total number of bytes transmitted by the device, regardless of packet type.
.It Dv MAC_STAT_OPACKETS
The total number of packets sent by the device, regardless of packet type.
.It Dv MAC_STAT_UNDERFLOWS
The total number of packets that were smaller than the minimum sized
packet for the device and were therefore dropped.
.It Dv MAC_STAT_OVERFLOWS
The total number of packets that were larger than the maximum sized
packet for the device and were therefore dropped.
.El
.Ss Ethernet Specific Statistics
The following statistics are specific to Ethernet devices.
They refer to values from RFC 1643 and include various MII/GMII specific stats.
Many of these are also defined in IEEE 802.3.
.Bl -tag -width Ds
.It Dv ETHER_STAT_ADV_CAP_1000FDX
Indicates that the device is advertising support for 1 Gbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_1000HDX
Indicates that the device is advertising support for 1 Gbit/s
half-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_100FDX
Indicates that the device is advertising support for 100 Mbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_100GFDX
Indicates that the device is advertising support for 100 Gbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_100HDX
Indicates that the device is advertising support for 100 Mbit/s
half-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_100T4
Indicates that the device is advertising support for 100 Mbit/s
100BASE-T4 operation.
.It Dv ETHER_STAT_ADV_CAP_10FDX
Indicates that the device is advertising support for 10 Mbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_10GFDX
Indicates that the device is advertising support for 10 Gbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_10HDX
Indicates that the device is advertising support for 10 Mbit/s
half-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_2500FDX
Indicates that the device is advertising support for 2.5 Gbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_40GFDX
Indicates that the device is advertising support for 40 Gbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_5000FDX
Indicates that the device is advertising support for 5.0 Gbit/s
full-duplex operation.
.It Dv ETHER_STAT_ADV_CAP_ASMPAUSE
Indicates that the device is advertising support for receiving pause
frames.
.It Dv ETHER_STAT_ADV_CAP_AUTONEG
Indicates that the device is advertising support for auto-negotiation.
.It Dv ETHER_STAT_ADV_CAP_PAUSE
Indicates that the device is advertising support for generating pause
frames.
.It Dv ETHER_STAT_ADV_REMFAULT
Indicates that the device is advertising support for detecting faults in
the remote link peer.
.It Dv ETHER_STAT_ALIGN_ERRORS
Indicates the number of times an alignment error was generated by the
Ethernet device.
This is a count of packets that were not an integral number of octets and failed
the FCS check.
.It Dv ETHER_STAT_CAP_1000FDX
Indicates the device supports 1 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_1000HDX
Indicates the device supports 1 Gbit/s half-duplex operation.
.It Dv ETHER_STAT_CAP_100FDX
Indicates the device supports 100 Mbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_100GFDX
Indicates the device supports 100 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_100HDX
Indicates the device supports 100 Mbit/s half-duplex operation.
.It Dv ETHER_STAT_CAP_100T4
Indicates the device supports 100 Mbit/s 100BASE-T4 operation.
.It Dv ETHER_STAT_CAP_10FDX
Indicates the device supports 10 Mbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_10GFDX
Indicates the device supports 10 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_10HDX
Indicates the device supports 10 Mbit/s half-duplex operation.
.It Dv ETHER_STAT_CAP_2500FDX
Indicates the device supports 2.5 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_40GFDX
Indicates the device supports 40 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_5000FDX
Indicates the device supports 5.0 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_CAP_ASMPAUSE
Indicates that the device supports the ability to receive pause frames.
.It Dv ETHER_STAT_CAP_AUTONEG
Indicates that the device supports the ability to perform link
auto-negotiation.
.It Dv ETHER_STAT_CAP_PAUSE
Indicates that the device supports the ability to transmit pause frames.
.It Dv ETHER_STAT_CAP_REMFAULT
Indicates that the device supports the ability of detecting a remote
fault in a link peer.
.It Dv ETHER_STAT_CARRIER_ERRORS
Indicates the number of times that the Ethernet carrier sense condition
was lost or not asserted.
.It Dv ETHER_STAT_DEFER_XMTS
Indicates the number of frames for which the device was unable to
transmit the frame due to being busy and had to try again.
.It Dv ETHER_STAT_EX_COLLISIONS
Indicates the number of frames that failed to send due to an excessive
number of collisions.
.It Dv ETHER_STAT_FCS_ERRORS
Indicates the number of times that a frame check sequence failed.
.It Dv ETHER_STAT_FIRST_COLLISIONS
Indicates the number of times that a frame was eventually transmitted
successfully, but only after a single collision.
.It Dv ETHER_STAT_JABBER_ERRORS
Indicates the number of frames that were received that were both larger
than the maximum packet size and failed the frame check sequence.
.It Dv ETHER_STAT_LINK_ASMPAUSE
Indicates whether the link is currently configured to accept pause
frames.
.It Dv ETHER_STAT_LINK_AUTONEG
Indicates whether the current link state is a result of
auto-negotiation.
.It Dv ETHER_STAT_LINK_DUPLEX
Indicates the current duplex state of the link.
The values used here should be the same as documented for
.Dv MAC_PROP_DUPLEX .
.It Dv ETHER_STAT_LINK_PAUSE
Indicates whether the link is currently configured to generate pause
frames.
.It Dv ETHER_STAT_LP_CAP_1000FDX
Indicates the remote device supports 1 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_1000HDX
Indicates the remote device supports 1 Gbit/s half-duplex operation.
.It Dv ETHER_STAT_LP_CAP_100FDX
Indicates the remote device supports 100 Mbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_100GFDX
Indicates the remote device supports 100 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_100HDX
Indicates the remote device supports 100 Mbit/s half-duplex operation.
.It Dv ETHER_STAT_LP_CAP_100T4
Indicates the remote device supports 100 Mbit/s 100BASE-T4 operation.
.It Dv ETHER_STAT_LP_CAP_10FDX
Indicates the remote device supports 10 Mbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_10GFDX
Indicates the remote device supports 10 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_10HDX
Indicates the remote device supports 10 Mbit/s half-duplex operation.
.It Dv ETHER_STAT_LP_CAP_2500FDX
Indicates the remote device supports 2.5 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_40GFDX
Indicates the remote device supports 40 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_5000FDX
Indicates the remote device supports 5.0 Gbit/s full-duplex operation.
.It Dv ETHER_STAT_LP_CAP_ASMPAUSE
Indicates that the remote device supports the ability to receive pause
frames.
.It Dv ETHER_STAT_LP_CAP_AUTONEG
Indicates that the remote device supports the ability to perform link
auto-negotiation.
.It Dv ETHER_STAT_LP_CAP_PAUSE
Indicates that the remote device supports the ability to transmit pause
frames.
.It Dv ETHER_STAT_LP_CAP_REMFAULT
Indicates that the remote device supports the ability of detecting a
remote fault in a link peer.
.It Dv ETHER_STAT_MACRCV_ERRORS
Indicates the number of times that the internal MAC layer encountered an
error when attempting to receive and process a frame.
.It Dv ETHER_STAT_MACXMT_ERRORS
Indicates the number of times that the internal MAC layer encountered an
error when attempting to process and transmit a frame.
.It Dv ETHER_STAT_MULTI_COLLISIONS
Indicates the number of times that a frame was eventually transmitted
successfully, but only after more than one collision.
.It Dv ETHER_STAT_SQE_ERRORS
Indicates the number of times that an SQE error occurred.
The specific conditions for this error are documented in IEEE 802.3.
.It Dv ETHER_STAT_TOOLONG_ERRORS
Indicates the number of frames that were received that were longer than
the maximum frame size supported by the device.
.It Dv ETHER_STAT_TOOSHORT_ERRORS
Indicates the number of frames that were received that were shorter than
the minimum frame size supported by the device.
.It Dv ETHER_STAT_TX_LATE_COLLISIONS
Indicates the number of times a collision was detected late on the
device.
.It Dv ETHER_STAT_XCVR_ADDR
Indicates the address of the MII/GMII transceiver.
.It Dv ETHER_STAT_XCVR_ID
Indicates the ID of the MII/GMII transceiver.
.It Dv ETHER_STAT_XCVR_INUSE
Indicates what kind of transceiver is in use.
The following values may be used:
.Bl -tag -width Ds
.It Dv XCVR_UNDEFINED
The transceiver type is undefined by the hardware.
.It Dv XCVR_NONE
There is no transceiver in use by the hardware.
.It Dv XCVR_10
The transceiver supports 10BASE-T operation.
.It Dv XCVR_100T4
The transceiver supports 100BASE-T4 operation.
.It Dv XCVR_100X
The transceiver supports 100BASE-TX operation.
.It Dv XCVR_100T2
The transceiver supports 100BASE-T2 operation.
.It Dv XCVR_1000X
The transceiver supports 1000BASE-X operation.
This is used for all fiber transceivers.
.It Dv XCVR_1000T
The transceiver supports 1000BASE-T operation.
This is used for all copper transceivers.
.El
.El
.Ss Device Specific kstats
In addition to the defined statistics above, if the device driver
maintains additional statistics or the device provides additional
statistics, it should create its own kstats through the
.Xr kstat_create 9F
function to allow operators to observe them.
.Sh RECEIVE DESCRIPTOR LAYOUT
One of the important things that a device driver must do is lay out DMA
memory, generally in a ring of descriptors, into which received Ethernet
frames will be placed.
When performing this, there are a few things that drivers should
generally do:
.Bl -enum -offset indent
.It
Drivers should lay out memory so that the IP header will be 4-byte
aligned.
The IP stack expects that the beginning of an IP header will be at a
4-byte aligned address; however, a DMA allocation will be at a 4-
or 8-byte aligned address by default.
The IP header is at a 14-byte offset from the beginning of the Ethernet
frame, leaving the IP header at a 2-byte alignment if the Ethernet frame
starts at the beginning of the DMA buffer.
If VLAN tagging is in place, then each VLAN tag adds 4 bytes, which
doesn't change the alignment the IP header is found at.
.Pp
As a solution to this, the driver should program the device to start
placing the received Ethernet frame at two bytes off of the start of the
DMA buffer.
This will make sure that, whether or not VLAN tags are present,
the IP header will be 4-byte aligned.
.It
Drivers should try to allocate the DMA memory used for receiving frames
as a contiguous buffer.
If for some reason that would not be possible, the driver should try to
ensure that there is enough space for the Ethernet header and
any possible layer three and layer four headers
.Pq such as IP, TCP, or UDP
in the initial descriptor.
.It
As discussed in the
.Sx MBLKS AND DMA
section, there are multiple strategies for managing the relationship
between DMA data, receive descriptors, and the operating system
representation of a packet in the
.Xr mblk 9S
structure.
Drivers must limit their resource consumption.
See the
.Sy Considerations
section of
.Sx MBLKS AND DMA
for more on this.
.El
.Sh TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
Device drivers are the first line of defense for dealing with broken
devices and bugs in their firmware.
While most devices will rarely fail, it is important that particular attention
is paid to RAS (Reliability, Availability, and Serviceability) when designing
and implementing the device driver.
While everything described in this section is optional, it is highly recommended
that all new device drivers follow these guidelines.
.Pp
The Fault Management Architecture (FMA) provides facilities for
detecting and reporting various classes of defects and faults.
Specifically for networking device drivers, issues that should be
detected and reported include:
.Bl -bullet -offset indent
.It
Device internal uncorrectable errors
.It
Device internal correctable errors
.It
PCI and PCI Express transport errors
.It
Device temperature alarms
.It
Device transmission stalls
.It
Device communication timeouts
.It
High rates of invalid interrupts
.El
.Pp
All such errors fall into three primary categories:
.Bl -enum -offset indent
.It
Errors detected by the Fault Management Architecture
.It
Errors detected by the device and indicated to the device driver
.It
Errors detected by the device driver
.El
.Ss Fault Management Setup and Teardown
Drivers should initialize support for the fault management framework by
calling
.Xr ddi_fm_init 9F
from their
.Xr attach 9E
routine.
By registering with the fault management framework, a device driver is given the
chance to detect transport errors as well as report other errors that
exist.
While a device driver does not need to declare all of the
capabilities described in
.Xr ddi_fm_init 9F ,
we suggest that device drivers at least register the
.Dv DDI_FM_EREPORT_CAPABLE
capability so as to allow the driver to report issues that it detects.
.Pp
If the driver registers with the fault management framework during its
.Xr attach 9E
entry point, it must call
.Xr ddi_fm_fini 9F
during its
.Xr detach 9E
entry point.
.Ss Transport Errors
Many modern networking devices leverage PCI or PCI Express.
As such, there are two primary ways that device drivers access data: they either
memory map device registers and use routines like
.Xr ddi_get8 9F
and
.Xr ddi_put8 9F
or they use direct memory access (DMA).
New device drivers should always enable checking of the transport layer by
marking their support in the
.Xr ddi_device_acc_attr 9S
structure and using routines like
.Xr ddi_fm_acc_err_get 9F
and
.Xr ddi_fm_dma_err_get 9F
to detect if errors have occurred.
.Ss Device Indicated Errors
Many devices have capabilities to announce to a device driver that a
correctable or uncorrectable error has occurred.
Other devices have the ability to indicate that various physical issues have
occurred such as a fan failing or a temperature sensor having fired.
.Pp
Drivers should wire themselves to receive notifications when these
events occur.
The means and capabilities will vary from device to device.
For example, some devices will generate information about these notifications
through special interrupts.
Other devices may have a register that software can poll.
In the cases where polling is required, driver writers should try not to poll
too frequently and should generally only poll when the device is actively being
used, e.g. between calls to the
.Xr mc_start 9E
and
.Xr mc_stop 9E
entry points.
.Ss Driver Transmit Stall Detection
One of the primary responsibilities of a hardened device driver is to
perform transmit stall detection.
The core idea behind tx stall detection is that the driver should record when
data was last successfully transmitted.
Most devices should be transmitting data on a regular basis as long as the link
is up.
If a device is not, then this may indicate that the device is stuck and needs
to be reset.
At this time, the MAC framework does not provide any resources for performing
these checks; however, polling on each individual transmit ring for the last
completion time while something is actively being transmitted through the use of
routines such as
.Xr timeout 9F
may be a reasonable starting point.
.Ss Driver Command Timeout Detection
Each device is programmed in different ways.
Some devices are programmed through asynchronous commands while others are
programmed by writing directly to memory mapped registers.
If a device receives asynchronous replies to commands, then the device driver
should set reasonable timeouts for all such commands and plan on detecting when
they expire.
If a timeout occurs, the driver should presume that there is an issue with the
hardware and proceed to abort the command or reset the device.
.Pp
Many devices do not have such a communication mechanism.
However, whenever there is some activity where the device driver must wait, then
it should be prepared for the fact that the device may never get back to
it and react appropriately by performing some kind of device reset.
.Ss Reacting to Errors
When any of the above categories of errors has been triggered, the
behavior that the device driver should take depends on the kind of
error.
If a fatal error was detected, for example, a transport error, a transmit
stall, or an uncorrectable error indicated by the device, then it is
important that the driver take the following steps:
.Bl -enum -offset indent
.It
Set a flag in the device driver's state that indicates that it has hit
an error condition.
When this error condition flag is asserted, transmitted packets should be
accepted and dropped and actions that would require writing to the device state
should fail with an error.
This flag should remain until the device has been successfully restarted.
.It
If the error was not one that the fault management architecture itself
indicated, e.g. a transport error that it detected, then
the device driver should post an
.Sy ereport
indicating what has occurred with the
.Xr ddi_fm_ereport_post 9F
function.
.It
The device driver should indicate that the device's service was lost
with a call to
.Xr ddi_fm_service_impact 9F
using the symbol
.Dv DDI_SERVICE_LOST .
.It
At this point the device driver should issue a device reset through some
device-specific means.
.It
When the device reset has been completed, then the device driver should
restore all of the programmed state to the device.
This includes things like the current MTU, advertised auto-negotiation speeds,
MAC address filters, and more.
.It
Finally, when service has been restored, the device driver should call
.Xr ddi_fm_service_impact 9F
using the symbol
.Dv DDI_SERVICE_RESTORED .
.El
.Pp
When a non-fatal error occurs, then the device driver should submit an
ereport and should optionally mark the device degraded using
.Xr ddi_fm_service_impact 9F
with the
.Dv DDI_SERVICE_DEGRADED
value depending on the nature of the problem that has occurred.
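.Pp
The error-condition flag from the first step above can be sketched in plain C.
This is a user-space illustration only, not driver code: the
drv_state_t structure, its members, the drv_tx and drv_set_mtu functions, and
the choice of EIO as the error value are all hypothetical and are not part of
the MAC framework.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-instance driver state; names are illustrative only. */
typedef struct drv_state {
	bool		ds_failed;	/* set on fatal error, cleared after restart */
	unsigned	ds_mtu;		/* saved so it can be reprogrammed after reset */
	unsigned long	ds_tx_dropped;	/* frames dropped while failed */
	unsigned long	ds_tx_sent;	/* frames handed to "hardware" */
} drv_state_t;

/*
 * Transmit path. Returns true when the frame has been consumed.
 * While the error flag is set the frame is accepted and dropped so
 * that the stack does not queue up behind a dead device.
 */
static bool
drv_tx(drv_state_t *ds, size_t len)
{
	assert(len > 0);
	if (ds->ds_failed) {
		ds->ds_tx_dropped++;
		return (true);
	}
	ds->ds_tx_sent++;	/* a real driver would program a descriptor */
	return (true);
}

/*
 * An operation that would write device state. It fails while the
 * error flag is set; EIO is an assumed choice of error value.
 */
static int
drv_set_mtu(drv_state_t *ds, unsigned mtu)
{
	if (ds->ds_failed)
		return (EIO);
	ds->ds_mtu = mtu;
	return (0);
}
```

Keeping the last programmed MTU in the state structure is what later allows
the driver to restore it once the reset has completed.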
.Pp
Device drivers should never make the decision to remove a device from
service based on errors that have occurred nor should they panic the
system.
Rather, the device driver should always try to notify the operating system with
various ereports and allow its policy decisions to occur.
The decision to retire a device lies in the hands of the fault management
architecture.
It knows more about the operator's intent and the surrounding system's state
than the device driver itself does and it will make the call to offline and
retire the device if it is required.
.Ss Device Resets
When resetting a device, a device driver must exercise caution.
If a device driver has not been written to plan for a device reset, then it
may not correctly restore the device's state after such a reset.
Such state should be stored in the instance's private state data as the MAC
framework does not know about device resets and will not inform the
device again about the expected, programmed state.
.Pp
One wrinkle with device resets is that many networking cards show up as
multiple PCI functions on a single device, for example, each port may
show up as a separate function and thus have a separate instance of the
device driver attached.
When resetting a function, device driver writers should carefully read the
device programming manuals and verify whether or not a reset impacts only the
stalled function or if it impacts all functions across the device.
.Pp
If the only way to reset a given function is to reset the whole device, then
this may require more coordination and work on the part of the device
driver to ensure that all the other instances are correctly restored.
In cases where this occurs, some devices offer ways of injecting
interrupts onto those other functions to notify them that this is
occurring.
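.Pp
The check described under
.Sx Driver Transmit Stall Detection
can be sketched as follows.
This is a user-space illustration under stated assumptions: the
tx_ring_watch_t structure, the tx_watch_* functions, and the two-second
threshold are hypothetical.
In a driver, the timestamps would presumably come from a kernel time source and
the periodic check would run from a
.Xr timeout 9F
callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring bookkeeping; none of these names come from MAC. */
typedef struct tx_ring_watch {
	uint64_t trw_last_done_ns;	/* time of the last TX completion */
	uint32_t trw_pending;		/* descriptors owned by hardware */
} tx_ring_watch_t;

/* Assumed policy: work pending with no completion for this long is a stall. */
#define	TX_STALL_THRESHOLD_NS	(2ULL * 1000000000ULL)

/* Called when descriptors are handed to the hardware ring. */
static void
tx_watch_post(tx_ring_watch_t *w, uint64_t now_ns, uint32_t ndesc)
{
	/* Restart the clock on an idle ring so idle time is not counted. */
	if (w->trw_pending == 0)
		w->trw_last_done_ns = now_ns;
	w->trw_pending += ndesc;
}

/* Called from the completion path when hardware reports progress. */
static void
tx_watch_complete(tx_ring_watch_t *w, uint64_t now_ns, uint32_t ndesc)
{
	assert(ndesc <= w->trw_pending);
	w->trw_pending -= ndesc;
	w->trw_last_done_ns = now_ns;
}

/* Periodic check: only a ring with outstanding work can be stalled. */
static bool
tx_watch_stalled(const tx_ring_watch_t *w, uint64_t now_ns)
{
	if (w->trw_pending == 0)
		return (false);
	return (now_ns - w->trw_last_done_ns > TX_STALL_THRESHOLD_NS);
}
```

Checking only while descriptors are outstanding avoids reporting a stall on a
link that is simply idle.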
.Sh MBLKS AND DMA
The networking stack manages framed data through the use of the
.Xr mblk 9S
structure.
The mblk allows for a single message to be made up of individual blocks.
Each part is linked together through its
.Fa b_cont
member.
However, it also allows for multiple messages to be chained together through the
use of the
.Fa b_next
member.
While the networking stack works with these structures, device drivers generally
work with DMA regions.
There are two different strategies that device drivers use for bridging these
two representations: copying and binding.
.Ss Copying Data
The first way that device drivers handle interfacing between the two is
by having two separate regions of memory.
One part is memory which has been allocated for DMA through a call to
.Xr ddi_dma_mem_alloc 9F
and the other is memory associated with the message block.
.Pp
In this case, a driver will use
.Xr bcopy 9F
to copy memory between the two distinct regions.
When transmitting a packet, it will copy the memory from the mblk_t to the DMA
region.
When receiving data, it will allocate a mblk_t through the
.Xr allocb 9F
routine, copy the memory across with
.Xr bcopy 9F ,
and then increment the mblk_t's
.Fa b_wptr
member by the number of bytes copied.
.Pp
If, when receiving, memory is not available for a new message block,
then the frame should be skipped and effectively dropped.
A kstat should be bumped when such an occasion occurs.
.Ss Binding Data
An alternative approach to copying data is to use DMA binding.
When using DMA binding, the OS takes care of mapping between DMA memory and
normal device memory.
The exact process is a bit different between transmit and receive.
.Pp
When transmitting, a device driver has an mblk_t and needs to call the
.Xr ddi_dma_addr_bind_handle 9F
function to bind it to an already existing DMA handle.
At that point, it will receive various DMA cookies that it can use to obtain the
addresses to program the device with for transmitting data.
Once the transmit is done, the driver must then make sure to call
.Xr freemsg 9F
to release the data.
It must not call
.Xr freemsg 9F
before it receives an interrupt from the device indicating that the data
has been transmitted, otherwise it risks sending arbitrary kernel
memory.
.Pp
When receiving data, the device can perform a similar operation.
First, it must bind the DMA memory into the kernel's virtual memory address
space through a call to the
.Xr ddi_dma_addr_bind_handle 9F
function if it has not already.
Once it has, it must then call
.Xr desballoc 9F
to try and create a new mblk_t which leverages the associated memory.
It can then pass that mblk_t up to the stack.
.Ss Considerations
When deciding which of these options to use, there are many different
considerations that must be made.
The answer as to whether to bind memory or to copy data is not always simple.
.Pp
The first thing to remember is that DMA resources may be finite on a
given platform.
Consider the case of receiving data.
A device driver that binds one of its receive descriptors may not get it back
for quite some time as it may be used by the kernel until an application
actually consumes it.
Device drivers that try to bind memory for receive often work with the
constraint that they must be able to replace that DMA memory with another DMA
descriptor.
If they were not replaced, then eventually the device would not be able to
receive additional data into the ring.
.Pp
On the other hand, particularly for larger frames, copying every packet
from one buffer to another can be a source of additional latency and
memory waste in the system.
For larger copies, the cost of copying may dwarf any potential cost of
performing DMA binding.
.Pp
For device driver authors that are unsure of what to do, they should
first employ the copying method to simplify the act of writing the
device driver.
The copying method is simpler and also allows the device driver author not to
worry about allocated DMA memory that is still outstanding when it is asked to
unload.
.Pp
If device driver writers are worried about the cost, it is recommended
to make the decision as to whether or not to copy or bind DMA data
a separate private property for both transmitting and receiving.
That private property should indicate the frame size at which
to switch from one method to the other.
This way, data can be gathered to determine what the impact of each method is on
a given platform.
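.Pp
The size threshold suggested above can be sketched in plain C.
This is a minimal illustration, not a prescribed interface: the
dma_policy_t structure, its members, the dma_pick function, and the
choice that frames at or above the threshold are bound are all assumptions.
In a driver, the two thresholds would be backed by the private properties
described above.

```c
#include <assert.h>
#include <stddef.h>

/* Which buffer strategy to use for a single frame. */
typedef enum buf_strategy {
	BUF_COPY,	/* bcopy between the mblk and a DMA buffer */
	BUF_BIND	/* bind the memory for DMA directly */
} buf_strategy_t;

/*
 * Hypothetical tunables: one threshold per direction so that the
 * transmit and receive paths can be measured independently.
 */
typedef struct dma_policy {
	size_t dp_tx_bind_thresh;	/* bytes; bind at or above this size */
	size_t dp_rx_bind_thresh;	/* bytes; bind at or above this size */
} dma_policy_t;

/* Small frames are copied; frames at or above the threshold are bound. */
static buf_strategy_t
dma_pick(size_t thresh, size_t frame_len)
{
	assert(frame_len > 0);
	return (frame_len >= thresh ? BUF_BIND : BUF_COPY);
}
```

A caller on the transmit path would pass dp_tx_bind_thresh and the length of
the frame being sent, and the receive path would do the same with
dp_rx_bind_thresh.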
.Sh SEE ALSO
.Xr dlpi 4P ,
.Xr driver.conf 5 ,
.Xr ieee802.3 7 ,
.Xr dladm 8 ,
.Xr _fini 9E ,
.Xr _info 9E ,
.Xr _init 9E ,
.Xr attach 9E ,
.Xr close 9E ,
.Xr detach 9E ,
.Xr mac_capab_led 9E ,
.Xr mac_capab_rings 9E ,
.Xr mac_capab_transceiver 9E ,
.Xr mc_close 9E ,
.Xr mc_getcapab 9E ,
.Xr mc_getprop 9E ,
.Xr mc_getstat 9E ,
.Xr mc_multicst 9E ,
.Xr mc_open 9E ,
.Xr mc_propinfo 9E ,
.Xr mc_setpromisc 9E ,
.Xr mc_setprop 9E ,
.Xr mc_start 9E ,
.Xr mc_stop 9E ,
.Xr mc_tx 9E ,
.Xr mc_unicst 9E ,
.Xr open 9E ,
.Xr allocb 9F ,
.Xr bcopy 9F ,
.Xr ddi_dma_addr_bind_handle 9F ,
.Xr ddi_dma_mem_alloc 9F ,
.Xr ddi_fm_acc_err_get 9F ,
.Xr ddi_fm_dma_err_get 9F ,
.Xr ddi_fm_ereport_post 9F ,
.Xr ddi_fm_fini 9F ,
.Xr ddi_fm_init 9F ,
.Xr ddi_fm_service_impact 9F ,
.Xr ddi_get8 9F ,
.Xr ddi_put8 9F ,
.Xr desballoc 9F ,
.Xr freemsg 9F ,
.Xr kstat_create 9F ,
.Xr mac_alloc 9F ,
.Xr mac_devt_to_instance 9F ,
.Xr mac_fini_ops 9F ,
.Xr mac_free 9F ,
.Xr mac_getinfo 9F ,
.Xr mac_hcksum_get 9F ,
.Xr mac_hcksum_set 9F ,
.Xr mac_init_ops 9F ,
.Xr mac_link_update 9F ,
.Xr mac_lso_get 9F ,
.Xr mac_maxsdu_update 9F ,
.Xr mac_private_minor 9F ,
.Xr mac_prop_info_set_default_link_flowctrl 9F ,
.Xr mac_prop_info_set_default_str 9F ,
.Xr mac_prop_info_set_default_uint32 9F ,
.Xr mac_prop_info_set_default_uint64 9F ,
.Xr mac_prop_info_set_default_uint8 9F ,
.Xr mac_prop_info_set_perm 9F ,
.Xr mac_prop_info_set_range_uint32 9F ,
.Xr mac_register 9F ,
.Xr mac_rx 9F ,
.Xr mac_unregister 9F ,
.Xr mod_install 9F ,
.Xr mod_remove 9F ,
.Xr strcmp 9F ,
.Xr timeout 9F ,
.Xr cb_ops 9S ,
.Xr ddi_device_acc_attr 9S ,
.Xr dev_ops 9S ,
.Xr mac_callbacks 9S ,
.Xr mac_register 9S ,
.Xr mblk 9S ,
.Xr modldrv 9S ,
.Xr modlinkage 9S
.Rs
.%A McCloghrie, K.
.%A Rose, M.
.%T RFC 1213 Management Information Base for Network Management of
.%T TCP/IP-based internets: MIB-II
.%D March 1991
.Re
.Rs
.%A McCloghrie, K.
.%A Kastenholz, F.
.%T RFC 1573 Evolution of the Interfaces Group of MIB-II
.%D January 1994
.Re
.Rs
.%A Kastenholz, F.
.%T RFC 1643 Definitions of Managed Objects for the Ethernet-like
.%T Interface Types
.Re