.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2019 Joyent, Inc.
.\" Copyright 2020 RackTop Systems, Inc.
.\" Copyright 2023 Oxide Computer Company
.\"
.Dd March 4, 2023
.Dt MAC 9E
.Os
.Sh NAME
.Nm mac ,
.Nm GLDv3
.Nd MAC networking device driver overview
.Sh SYNOPSIS
.In sys/mac_provider.h
.In sys/mac_ether.h
.Sh INTERFACE LEVEL
illumos DDI specific
.Sh DESCRIPTION
The
.Sy MAC
framework provides a means for implementing high-performance networking
device drivers.
It is the successor to the GLD interfaces and is sometimes referred to as the
GLDv3.
The remainder of this manual introduces the aspects of writing device drivers
that leverage the MAC framework.
While both the GLDv3 and MAC framework refer to the same thing, in this manual
page we use the term
.Em MAC framework
to refer to the device driver interface.
.Pp
MAC device drivers are character devices.
They define the standard
.Xr _init 9E ,
.Xr _fini 9E ,
and
.Xr _info 9E
entry points to initialize the module, as well as
.Xr dev_ops 9S
and
.Xr cb_ops 9S
structures.
.Pp
The main interface with MAC is through a series of callbacks defined in
a
.Xr mac_callbacks 9S
structure.
These callbacks control all the aspects of the device.
They cover sending data, getting and setting properties, controlling MAC
address filters, and managing promiscuous mode.
.Pp
The MAC framework takes care of many aspects of the device driver's
management.
A device that uses the MAC framework does not have to worry about creating
device nodes or implementing
.Xr open 9E
or
.Xr close 9E
routines.
In addition, all of the work to interact with
.Xr dlpi 4P
is taken care of automatically and transparently.
.Ss High-Level Design
At a high level, a device driver is chiefly concerned with three general
operations:
.Bl -enum -offset indent
.It
Sending frames
.It
Receiving frames
.It
Managing device configuration and metadata
.El
.Pp
When sending frames, the MAC framework always calls functions registered
in the
.Xr mac_callbacks 9S
structure to have the driver transmit frames on hardware.
When receiving frames, the driver will generally receive an interrupt which will
cause it to check for incoming data and deliver it to the MAC framework.
.Pp
Configuration of a device, such as whether auto-negotiation should be
enabled, the speeds that the device supports, the MTU (maximum
transmission unit), and the generation of pause frames, is driven by
properties.
The functions to get, set, and obtain information about properties are
defined through callback functions specified in the
.Xr mac_callbacks 9S
structure.
The full list of properties and a description of the relevant callbacks
can be found in the
.Sx PROPERTIES
section.
.Pp
The MAC framework is designed to take advantage of various modern
features provided by hardware, such as checksumming, segmentation
offload, and hardware filtering.
The MAC framework assumes none of these advanced features are present
and allows device drivers to negotiate them through a capability system.
Drivers can declare that they support various capabilities by
implementing the optional
.Xr mc_getcapab 9E
entry point.
Each capability has its associated entry points and structures to fill
out.
The capabilities are detailed in the
.Sx CAPABILITIES
section.
.Pp
The following sections describe the flow of a basic device driver.
For advanced device drivers, the flow is generally the same.
The primary distinction is in how frames are sent and received.
.Ss Initializing MAC Support
For a device to be used by the MAC framework, it must register with the
framework and take specific actions during
.Xr _init 9E ,
.Xr attach 9E ,
.Xr detach 9E ,
and
.Xr _fini 9E .
.Pp
All device drivers have to define a
.Xr dev_ops 9S
structure which is pointed to by a
.Xr modldrv 9S
structure and the corresponding NULL-terminated
.Xr modlinkage 9S
structure.
The
.Xr dev_ops 9S
structure should have a
.Xr cb_ops 9S
structure defined for it; however, it does not need to implement any of
the standard
.Xr cb_ops 9S
entry points unless it also exposes a custom set of device nodes not
otherwise managed by the MAC framework.
See the
.Sx Custom Device Nodes
section for more details.
.Pp
Normally, in a driver's
.Xr _init 9E
entry point, it passes its
.Xr modlinkage 9S
structure directly to
.Xr mod_install 9F .
To properly register with MAC, the driver must call
.Xr mac_init_ops 9F
before it calls
.Xr mod_install 9F .
If for some reason the
.Xr mod_install 9F
function fails, then the driver must undo that step by calling
.Xr mac_fini_ops 9F .
.Pp
Conversely, in the driver's
.Xr _fini 9E
routine, it should call
.Xr mac_fini_ops 9F
after it successfully calls
.Xr mod_remove 9F .
For an example of how to use the
.Xr mac_init_ops 9F
and
.Xr mac_fini_ops 9F
functions, see the examples section in
.Xr mac_init_ops 9F .
.Ss Custom Device Nodes
A device may want to provide its own minor nodes as simple character or block
devices backed by the usual
.Xr cb_ops 9S
routines.
The MAC framework allows for this by leaving a portion of the minor
number space available for private driver use.
.Xr mac_private_minor 9F
returns the first minor number a driver may use for its own purposes,
e.g., to pass to
.Xr ddi_create_minor_node 9F .
.Pp
A driver making use of this ability must provide its own
.Xr getinfo 9E
implementation that is aware of any such minor nodes.
It must also delegate back to the MAC framework as appropriate via either
calls to
.Xr mac_getinfo 9F
or
.Xr mac_devt_to_instance 9F
for MAC reserved minor nodes.
It should also take care not to affect MAC reserved minors.
For example, it should not remove all minor nodes associated with a
device with a call such as:
.Bd -literal -offset indent
ddi_remove_minor_node(dip, NULL);
.Ed
.Ss Registering with MAC
Every instance of a device should register separately with MAC.
To register with MAC, a driver must allocate a
.Xr mac_register 9S
structure, fill it in, and then call
.Xr mac_register 9F .
The
.Vt mac_register_t
structure contains information about the device and all of the required
function pointers that will be used as callbacks by the framework.
.Pp
These steps should all be taken during a device's
.Xr attach 9E
entry point.
It is recommended that the driver perform this sequence of steps after the
device has finished its initialization of the chipset and interrupts, though
interrupts should not be enabled at that point.
After it calls
.Xr mac_register 9F
it will start receiving callbacks from the MAC framework.
.Pp
To allocate the registration structure, the driver should call
.Xr mac_alloc 9F .
Device drivers should generally always pass the symbol
.Dv MAC_VERSION
as the argument to
.Xr mac_alloc 9F .
Upon successful completion, the driver will receive a
.Vt mac_register_t
structure which it should fill in.
The structure and its members are documented in
.Xr mac_register 9S .
.Pp
The
.Xr mac_callbacks 9S
structure is not allocated as a part of the
.Xr mac_register 9S
structure.
In general, device drivers declare this statically.
See the
.Sx MAC Callbacks
section for more information on how to fill it out.
.Pp
Once the structure has been filled in, the driver should call
.Xr mac_register 9F
to register itself with MAC.
The handle that it uses to register with should be part of the driver's soft
state.
It will be used in various other support functions and callbacks.
.Pp
If the call is successful, then the device driver
should enable interrupts and finish any other initialization required.
If the call to
.Xr mac_register 9F
failed, then it should unwind its initialization and should return
.Dv DDI_FAILURE
from its
.Xr attach 9E
routine.
.Pp
The driver does not need to hold onto an allocated
.Xr mac_register 9S
structure after it has called the
.Xr mac_register 9F
function.
Whether the
.Xr mac_register 9F
function returns successfully or not, the driver may free its
.Xr mac_register 9S
structure by calling the
.Xr mac_free 9F
function.
.Ss MAC Callbacks
The MAC framework interacts with a device driver through a series of
callbacks.
These callbacks are described in their individual manual pages and the
collection of callbacks is indicated in the
.Xr mac_callbacks 9S
manual page.
This section does not focus on the specific functions, but rather on
interactions between them and the rest of the device driver framework.
.Pp
A device driver should make no assumptions about when the various
callbacks will be called and whether or not they will be called
simultaneously.
For example, a device driver may be asked to transmit data through a call to its
.Xr mc_tx 9E
entry point while it is being asked to get a device property through a
call to its
.Xr mc_getprop 9E
entry point.
As such, while some calls may be serialized to the device, such as setting
properties, the device driver should always presume that all of its data needs
to be protected with locks.
While the device is holding locks, it is safe for it to call the following MAC
routines:
.Bl -bullet -offset indent -compact
.It
.Xr mac_hcksum_get 9F
.It
.Xr mac_hcksum_set 9F
.It
.Xr mac_lso_get 9F
.It
.Xr mac_maxsdu_update 9F
.It
.Xr mac_prop_info_set_default_link_flowctrl 9F
.It
.Xr mac_prop_info_set_default_str 9F
.It
.Xr mac_prop_info_set_default_uint8 9F
.It
.Xr mac_prop_info_set_default_uint32 9F
.It
.Xr mac_prop_info_set_default_uint64 9F
.It
.Xr mac_prop_info_set_perm 9F
.It
.Xr mac_prop_info_set_range_uint32 9F
.El
.Pp
Any other MAC related routines should not be called with locks held,
such as
.Xr mac_link_update 9F
or
.Xr mac_rx 9F .
Other routines in the DDI may be called while locks are held; however,
device driver writers should be careful about calling blocking routines
while locks are held or in interrupt context, even when it is
legal to do so, as this may cause all other callers that need a given
lock to back up behind such an operation.
.Ss Receiving Data
A device driver will often receive data through the means of an
interrupt or by being asked to poll for frames.
When this occurs, zero or more frames, each with optional metadata, may
be ready for the device driver to consume.
Often each frame has a corresponding descriptor which has information about
whether or not there were errors or whether or not the device successfully
checksummed the packet.
In addition to the per-packet flow described below, there are certain
requirements that drivers must adhere to when programming the hardware
to receive data.
See the section
.Sx RECEIVE DESCRIPTOR LAYOUT
for more information.
.Pp
During a single interrupt or poll request, a device driver should process
a fixed number of frames.
For each frame the device driver should:
.Bl -enum -offset indent
.It
Ensure that all of the DMA memory for the descriptor ring is synchronized with
the
.Xr ddi_dma_sync 9F
function and check the handle for errors if the device driver has enabled DMA
error reporting as part of the Fault Management Architecture (FMA).
If the driver does not rely on DMA, then it may skip this step.
It is recommended that this is performed once per interrupt or poll for
the entire region and not on a per-packet basis.
.It
First check whether or not the frame has errors.
If errors were detected, then the frame should not be sent to the operating
system.
It is recommended that devices keep kstats (see
.Xr kstat_create 9F
for more information) and bump the counter whenever such an error is
detected.
If the device distinguishes between the types of errors, then separate kstats
for each class of error are recommended.
See the
.Sx STATISTICS
section for more information on the various error cases that should be
considered.
.It
Once the frame has been determined to be valid, the device driver should
transform the frame into a
.Xr mblk 9S .
See the section
.Sx MBLKS AND DMA
for more information on how to transform and prepare a message block.
.It
If the device supports hardware checksumming (see the
.Sx CAPABILITIES
section for more information on checksumming), then the device driver
should set the corresponding checksumming information with a call to
.Xr mac_hcksum_set 9F .
.It
It should then append this new message block to the
.Em end
of the message block chain, linking it to the
.Fa b_next
pointer.
It is vitally important that all the frames be chained in the order that they
were received.
If the device driver mistakenly reorders frames, then it may cause performance
impacts in the TCP stack and potentially impact application correctness.
.El
.Pp
Once all the frames have been processed and assembled, the device driver
should deliver them to the rest of the operating system by calling
.Xr mac_rx 9F .
The device driver should try to give as many mblk_t structures as possible
to the system at once.
It
.Em should not
call
.Xr mac_rx 9F
once for every assembled mblk_t.
.Pp
The device driver must not hold any locks across the call to
.Xr mac_rx 9F .
When this function is called, received data will be pushed through the
networking stack and some replies may be generated and given to the
driver to send out.
.Pp
It is not the device driver's responsibility to determine whether or not
the system can keep up with a driver's delivery rate of frames.
The rest of the networking stack will handle issues related to keeping up
appropriately and ensure that kernel memory is not exhausted by packets
that are not being processed.
.Pp
If the device driver has negotiated the
.Dv MAC_CAPAB_RINGS
capability
.Pq discussed in Xr mac_capab_rings 9E
then it should call
.Xr mac_rx_ring 9F
and not
.Xr mac_rx 9F .
A given interrupt may correspond to more than one ring that needs to be
checked.
The set of rings is likely to span different groups that were registered
with MAC through the
.Xr mr_gget 9E
interface.
In those cases, the driver should follow the above procedure
independently for each ring.
That means it will call
.Xr mac_rx_ring 9F
once for each ring using the handle that it received when MAC
called the driver's
.Xr mr_rget 9E
entry point.
When it is looking at the rings, the driver will need to make sure that
the ring has not had interrupts disabled
.Pq due to a pending change to polling mode .
This is discussed in greater detail in the
.Xr mac_capab_rings 9E
and
.Xr mri_poll 9E
manual pages.
.Pp
Finally, the device driver should make sure that any other housekeeping
activities required for the ring are taken care of such that more data
can be received.
.Ss Transmitting Data and Back Pressure
A device driver will be asked to transmit a message block chain by
having its
.Xr mc_tx 9E
entry point called.
While the driver is processing the message blocks, it may run out of resources.
For example, a transmit descriptor ring may become full.
At that point, the device driver should return the remaining unprocessed frames.
The act of returning frames indicates that the device has asserted flow control.
Once this has been done, no additional calls will be made to the
driver's transmit entry point and the back pressure will be propagated
throughout the rest of the networking stack.
.Pp
At some point in the future when resources have become available again,
for example after an interrupt indicating that some portion of the
transmit ring has been sent, the device driver must notify the
system that it can continue transmission.
To do this, the driver should call
.Xr mac_tx_update 9F .
After that point, the driver will receive calls to its
.Xr mc_tx 9E
entry point again.
As mentioned in the section on callbacks, the device driver should avoid holding
any particular locks across the call to
.Xr mac_tx_update 9F .
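.Pp
The back pressure flow described above can be sketched in user-land C.
This is a minimal illustration under stated assumptions, not driver code.
The
.Vt frame_t
and
.Vt tx_ring_t
types, the ring size, and the
.Fn sketch_tx
and
.Fn sketch_tx_reclaim
functions are hypothetical stand-ins for an mblk_t, a hardware descriptor
ring, the
.Xr mc_tx 9E
entry point, and a transmit-completion handler.
A nonzero return from
.Fn sketch_tx_reclaim
marks the point at which a real driver would drop its locks and call
.Xr mac_tx_update 9F .
.Bd -literal -offset indent
```c
#include <stddef.h>

#define	TX_RING_SIZE	4	/* hypothetical descriptor count */

/* Stand-in for mblk_t: frames in a chain are linked by b_next. */
typedef struct frame {
	struct frame *b_next;
} frame_t;

/* Stand-in for a hardware transmit descriptor ring. */
typedef struct tx_ring {
	size_t tr_used;		/* descriptors currently in flight */
	int tr_blocked;		/* nonzero once frames were returned */
} tx_ring_t;

/*
 * Stand-in for mc_tx(9E): queue frames while descriptors remain and
 * return the unprocessed remainder of the chain, or NULL if every
 * frame was queued. Returning frames asserts flow control.
 */
frame_t *
sketch_tx(tx_ring_t *ring, frame_t *chain)
{
	while (chain != NULL) {
		frame_t *next;

		if (ring->tr_used == TX_RING_SIZE) {
			ring->tr_blocked = 1;
			return (chain);
		}
		next = chain->b_next;
		chain->b_next = NULL;
		ring->tr_used++;
		chain = next;
	}
	return (NULL);
}

/*
 * Stand-in for a transmit-completion handler: release descriptors
 * and report whether the framework needs to be told that
 * transmission can resume.
 */
int
sketch_tx_reclaim(tx_ring_t *ring, size_t completed)
{
	int need_update;

	if (completed > ring->tr_used)
		completed = ring->tr_used;
	ring->tr_used -= completed;
	need_update = (ring->tr_blocked && ring->tr_used < TX_RING_SIZE);
	if (need_update)
		ring->tr_blocked = 0;
	return (need_update);
}
```
.Ed
.Pp
The sketch only requests an update when frames were previously returned,
which is one possible policy; a real driver may choose a different
threshold before resuming transmission.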
.Ss Interrupt Coalescing
For devices operating at higher data rates, interrupt coalescing is an
important part of a well-functioning device and may impact the
performance of the device.
Not all devices support interrupt coalescing.
If interrupt coalescing is supported on the device, it is recommended that
device driver writers provide private properties for their device to control the
interrupt coalescing rate.
This will make it much easier to perform experiments and observe the impact of
different interrupt rates on the rest of the system.
.Ss Polling
Even with interrupt coalescing, when there is a certain incoming packet rate it
can make more sense to just actively poll the device, asking for more packets
rather than constantly taking an interrupt.
When a device driver supports the
.Xr mac_capab_rings 9E
capability and therefore polling on receive rings, the MAC framework will ask
the driver to disable interrupts, with its
.Xr mi_disable 9E
entry point, and then subsequently call its polling entry point,
.Xr mri_poll 9E .
.Pp
As long as a device driver implements the needed entry points, then there is
nothing else that it needs to do to take advantage of polling.
A driver should not attempt to spin up its own threads, task queues, or
creatively use timeouts, to try to simulate polling for received packets.
.Ss MAC Address Filter Management
The MAC framework will attempt to use as many MAC address filters as a
device has.
To program a multicast address filter, the driver's
.Xr mc_multicst 9E
entry point will be called.
If the device driver runs out of filters, it should not take any special action
and just return the appropriate error as documented in the corresponding manual
pages for the entry points.
The framework will ensure that the device is placed in promiscuous mode
if it needs to.
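.Pp
The multicast behavior described above can be sketched in user-land C.
This is an illustration under stated assumptions rather than driver code.
The
.Vt mcast_table_t
type, the filter count, and the
.Fn sketch_multicst
function are hypothetical stand-ins for a device's filter table and its
.Xr mc_multicst 9E
entry point.
The use of
.Er ENOSPC
for the out-of-filters case, and the choice to treat removal of an
unknown address as success, are assumptions of this sketch; consult
.Xr mc_multicst 9E
for the errors that the entry point actually documents.
.Bd -literal -offset indent
```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define	ETHERADDRL	6	/* length of an Ethernet address */
#define	NFILTERS	2	/* hypothetical hardware filter count */

/* Stand-in for a device's multicast filter table. */
typedef struct mcast_table {
	unsigned char mt_addr[NFILTERS][ETHERADDRL];
	int mt_inuse[NFILTERS];
} mcast_table_t;

/*
 * Stand-in for mc_multicst(9E): add or remove one multicast address.
 * When the table is full the function takes no special action and
 * only reports an error; enabling promiscuous mode is left to the
 * framework.
 */
int
sketch_multicst(mcast_table_t *mt, int add, const unsigned char *addr)
{
	int i, free_slot = -1;

	for (i = 0; i < NFILTERS; i++) {
		if (mt->mt_inuse[i] &&
		    memcmp(mt->mt_addr[i], addr, ETHERADDRL) == 0) {
			if (!add)
				mt->mt_inuse[i] = 0;
			return (0);
		}
		if (!mt->mt_inuse[i] && free_slot == -1)
			free_slot = i;
	}
	if (!add)
		return (0);	/* nothing to remove (sketch choice) */
	if (free_slot == -1)
		return (ENOSPC);	/* assumed error value */
	memcpy(mt->mt_addr[free_slot], addr, ETHERADDRL);
	mt->mt_inuse[free_slot] = 1;
	return (0);
}
```
.Ed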
.Pp
If the hardware supports more than one unicast filter then the device
driver should consider implementing the
.Dv MAC_CAPAB_RINGS
capability, which exposes a means for multiple unicast MAC address filters to be
used by the broader system.
It is still useful to implement this on hardware which only has a single ring.
See
.Xr mac_capab_rings 9E
for more information.
.Ss Receive Side Scaling
Receive side scaling is where a hardware device supports multiple,
independent queues of frames that can be received.
Each of these queues is generally associated with an independent
interrupt and the hardware usually performs some form of hash across the
queues.
Hardware which supports this should look at implementing the
.Dv MAC_CAPAB_RINGS
capability and see
.Xr mac_capab_rings 9E
for more information.
.Ss Link Updates
It is the responsibility of the device driver to keep track of the
data link's state.
Many devices provide a means of receiving an interrupt when the state of the
link changes.
When such a change happens, the driver should update its internal data
structures and then call
.Xr mac_link_update 9F
to inform the MAC layer that this has occurred.
If the device driver does not properly inform the system about link changes,
then various features like link aggregations and other mechanisms that leverage
the link state will not work correctly.
.Ss Link Speed and Auto-negotiation
Many networking devices support more than one possible speed that they
can operate at.
The selection of a speed is often performed through
.Em auto-negotiation ,
though some devices allow the user to control what speeds are advertised
and used.
.Pp
Logically, there are two different sets of things that the device driver
needs to keep track of while it's operating:
.Bl -enum
.It
The supported speeds in hardware.
.It
The enabled speeds from the user.
.El
.Pp
By default, when a link first comes up, the device driver should
generally configure the link to support the common set of speeds and
perform auto-negotiation.
.Pp
A user can control what speeds a device advertises via auto-negotiation
and whether or not it performs auto-negotiation at all by using a series
of properties that have
.Sy _EN_
in the name.
These are read/write properties and there is one for each speed supported in the
operating system.
For a full list of them, see the
.Sx PROPERTIES
section.
.Pp
In addition to these properties, there is a corresponding set of
properties with
.Sy _ADV_
in the name.
These are similar to the
.Sy _EN_
family of properties, but they are read-only and indicate what the
device has actually negotiated.
While they are generally similar to the
.Sy _EN_
family of properties, they may change depending on power settings.
See the
.Sy Ethernet Link Properties
section in
.Xr dladm 8
for more information.
.Pp
It's worth discussing how these different values get used throughout the
different entry points.
The first entry point to consider is the
.Xr mc_propinfo 9E
entry point.
For a given speed, the driver should consult whether or not the hardware
supports this speed.
If it does, it should fill in the default value that the hardware takes and
indicate whether or not the property is writable.
This holds for both the
.Sy _EN_
and
.Sy _ADV_
family of properties.
.Pp
The next entry point is
.Xr mc_getprop 9E .
Here, the device should first consult whether the given speed is
supported.
If it is not, then the driver should return
.Er ENOTSUP .
If it is, then it should return the current value of the property.
.Pp
The last property entry point is the
.Xr mc_setprop 9E
entry point.
Here, the same logic applies.
Before the driver considers whether or not the property is writable, it should
first check whether or not it's a supported property.
If it's not, then it should return
.Er ENOTSUP .
Otherwise, it should proceed to check whether the property is writable,
and, if it is and the requested value is valid, then it should update the
property and restart the link's negotiation.
.Pp
Finally, there is the
.Xr mc_getstat 9E
entry point.
Several of the statistics that are queried relate to auto-negotiation and
hardware capabilities.
When a statistic relates to the hardware supporting a given speed, the
.Sy _EN_
properties should be ignored.
The only thing that should be consulted is what the hardware itself supports.
Otherwise, the statistics should look at what is currently being advertised by
the device.
.Ss Unregistering from MAC
During a driver's
.Xr detach 9E
routine, it should unregister the device instance from MAC by calling
.Xr mac_unregister 9F
on the handle that it obtained from
.Xr mac_register 9F .
If the call to
.Xr mac_unregister 9F
failed, then the device is likely still in use and the driver should
fail the call to
.Xr detach 9E .
.Ss Interacting with Devices
Administrators always interact with devices through the
.Xr dladm 8
command line interface.
The state of devices such as whether the link is considered up or down,
various link properties such as the MTU, auto-negotiation state, and
flow control state, are all exposed.
It is also the preferred way that these properties are set and configured.
.Pp
While device tunables may be presented in a
.Xr driver.conf 5
file, it is recommended instead to expose such things through
.Xr dladm 8
private properties, whether explicitly documented or not.
.Sh CAPABILITIES
Capabilities in the MAC framework are optional hardware features that a
device driver advertises on behalf of its device.
The two stable capabilities that the system currently supports are large
send offload (LSO), also known as TCP segmentation offload, and the ability
for hardware to calculate and verify the checksums
present in IPv4, IPv6, and protocol headers such as TCP and UDP.
.Pp
The MAC framework will query a device for support of a capability
through the
.Xr mc_getcapab 9E
function.
Each capability has its own constant and may have corresponding data that goes
along with it and a specific structure that the device is required to fill in.
Note, the set of capabilities changes over time and there are also private
capabilities in the system.
Several of the capabilities are used in the implementation of the MAC framework.
Others, like
.Dv MAC_CAPAB_RINGS ,
represent features that have not been stabilized and thus both API and binary
compatibility for them is not guaranteed.
It is important that the device driver handles unknown capabilities correctly.
For more information, see
.Xr mc_getcapab 9E .
.Pp
The following capabilities are
stable and defined in the system:
.Ss Dv MAC_CAPAB_HCKSUM
The
.Dv MAC_CAPAB_HCKSUM
capability indicates to the system that the device driver supports some
amount of checksumming.
The specific data for this capability is a pointer to a
.Vt uint32_t .
To indicate no support for any kind of checksumming, the driver should
either set this value to zero or simply return that it doesn't support
the capability.
.Pp
Note, the values that the driver declares in this capability indicate
what it can do when it transmits data.
If the driver can only verify checksums when receiving data, then it should not
indicate that it supports this capability.
The following set of flags may be combined through a bitwise inclusive OR:
.Bl -tag -width Ds
.It Dv HCKSUM_INET_PARTIAL
This indicates that the hardware can calculate a partial checksum for
both IPv4 and IPv6 UDP and TCP packets; however, it requires the pseudo-header
checksum be calculated for it.
The pseudo-header checksum will be available for the mblk_t when calling
.Xr mac_hcksum_get 9F .
Note this does not imply that the hardware is capable of calculating
the partial checksum for other L4 protocols or the IPv4 header checksum.
That should be indicated with the
.Dv HCKSUM_IPHDRCKSUM
flag.
.It Dv HCKSUM_INET_FULL_V4
This indicates that the hardware will fully calculate the L4 checksum for
outgoing IPv4 UDP or TCP packets only, and does not require a pseudo-header
checksum.
Note this does not imply that the hardware is capable of calculating the
checksum for other L4 protocols or the IPv4 header checksum.
That should be indicated with the
.Dv HCKSUM_IPHDRCKSUM
flag.
.It Dv HCKSUM_INET_FULL_V6
This indicates that the hardware will fully calculate the L4 checksum for
outgoing IPv6 UDP or TCP packets only, and does not require a pseudo-header
checksum.
Note this does not imply that the hardware is capable of calculating the
checksum for any other L4 protocols.
.It Dv HCKSUM_IPHDRCKSUM
This indicates that the hardware supports calculating the checksum for
the IPv4 header itself.
.El
.Pp
When in a driver's transmit function, the driver will be processing a
single frame.
It should call
.Xr mac_hcksum_get 9F
to see what checksum flags are set on it.
Note that the flags that are set on it are different from the ones described
above and are documented in its manual page.
These flags indicate how the driver is expected to program the hardware and what
checksumming is required.
Not all frames will require hardware checksumming or will ask the hardware to
checksum them.
.Pp
If a driver supports offloading the receive checksum and verification,
it should check to see what the hardware indicated was verified.
The driver should then call
.Xr mac_hcksum_set 9F .
The flags used are different from the ones above and are discussed in
detail in the
.Xr mac_hcksum_set 9F
manual page.
If there is no checksum information available or the driver does not support
checksumming, then it should simply not call
.Xr mac_hcksum_set 9F .
.Pp
Note that the checksum flags should be set on the first
mblk_t that makes up a given message.
In other words, if multiple mblk_t structures are linked together by the
.Fa b_cont
member to describe a single frame, then
.Xr mac_hcksum_set 9F
should only be called on the first mblk_t of that set.
However, each distinct message should have the checksum bits set on it, if
applicable.
In other words, each mblk_t that is linked together by the
.Fa b_next
pointer may have checksum flags set.
.Pp
It is recommended that device drivers provide a private property or
.Xr driver.conf 5
property to control whether or not checksumming is enabled for both rx
and tx; however, the default disposition is recommended to be enabled
for both.
This way if hardware bugs are found in the checksumming implementation,
checksumming can be disabled without requiring software updates.
The transmit property should be checked when determining how to reply to
.Xr mc_getcapab 9E
and the receive property should be checked in the context of the receive
function.
.Ss Dv MAC_CAPAB_LSO
The
.Dv MAC_CAPAB_LSO
capability indicates that the driver supports various forms of large
send offload (LSO).
The private data is a pointer to a
.Vt mac_capab_lso_t
structure.
The system currently supports offloading TCP packets over both IPv4 and
IPv6.
This structure has the following members which are used to indicate
various types of LSO support.
.Bd -literal -offset indent
t_uscalar_t lso_flags;
lso_basic_tcp_ipv4_t lso_basic_tcp_ipv4;
lso_basic_tcp_ipv6_t lso_basic_tcp_ipv6;
.Ed
.Pp
The
.Fa lso_flags
member is used to indicate which members are valid and should be
considered.
Each flag represents a different form of LSO.
The member should be set to the bitwise inclusive OR of the following values:
.Bl -tag -width Dv -offset indent
.It Dv LSO_TX_BASIC_TCP_IPV4
This indicates hardware support for performing TCP segmentation
offloading over IPv4.
When this flag is set, the
.Fa lso_basic_tcp_ipv4
member must be filled in.
.It Dv LSO_TX_BASIC_TCP_IPV6
This indicates hardware support for performing TCP segmentation
offloading over IPv6.
The IPv6 packet will have no extension headers present.
When this flag is set, the
.Fa lso_basic_tcp_ipv6
member must be filled in.
.El
.Pp
The
.Fa lso_basic_tcp_ipv4
member is a structure with the following members:
.Bd -literal -offset indent
t_uscalar_t lso_max;
.Ed
.Bd -filled -offset indent
The
.Fa lso_max
member should be set to the maximum size of the TCP data
payload that can be offloaded to the hardware.
.Ed
.Pp
The
.Fa lso_basic_tcp_ipv6
member is a structure with the following members:
.Bd -literal -offset indent
t_uscalar_t lso_max;
.Ed
.Bd -filled -offset indent
The
.Fa lso_max
member should be set to the maximum size of the TCP data
payload that can be offloaded to the hardware.
.Ed
.Pp
As with checksumming, it is recommended that driver writers provide a
means for disabling the support of LSO even if it is enabled by default.
This allows issues that arise with LSO to be worked around without
requiring additional driver work.
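.Pp
The way a driver answers capability queries can be sketched in user-land C.
This is an illustration under stated assumptions rather than driver code.
Every name with an
.Sy sk_
or
.Sy SK_
prefix is a hypothetical local stand-in for the corresponding definition in
.In sys/mac_provider.h ,
and the numeric values are placeholders chosen for the sketch, not the
values used by the system.
The
.Vt sk_drv_t
tunables stand in for the private properties recommended above, and
.Fn sketch_getcapab
stands in for the
.Xr mc_getcapab 9E
entry point.
.Bd -literal -offset indent
```c
#include <stdint.h>

/* Placeholder capability identifiers (not the system's values). */
typedef enum {
	SK_CAPAB_HCKSUM = 1,	/* stands in for MAC_CAPAB_HCKSUM */
	SK_CAPAB_LSO		/* stands in for MAC_CAPAB_LSO */
} sk_capab_t;

/* Placeholder flag values (not the system's values). */
#define	SK_HCKSUM_INET_PARTIAL		0x1
#define	SK_HCKSUM_IPHDRCKSUM		0x2
#define	SK_LSO_TX_BASIC_TCP_IPV4	0x1

typedef struct {
	uint32_t lso_max;
} sk_lso_basic_t;

/* Mirrors the member names of mac_capab_lso_t described above. */
typedef struct {
	uint32_t lso_flags;
	sk_lso_basic_t lso_basic_tcp_ipv4;
	sk_lso_basic_t lso_basic_tcp_ipv6;
} sk_capab_lso_t;

/* Hypothetical per-instance tunables. */
typedef struct {
	int sd_tx_cksum_enable;
	int sd_lso_enable;
	uint32_t sd_lso_max;
} sk_drv_t;

/*
 * Stand-in for mc_getcapab(9E): return nonzero and fill in cap_data
 * when the capability is supported and enabled, zero otherwise.
 */
int
sketch_getcapab(const sk_drv_t *drv, int cap, void *cap_data)
{
	switch (cap) {
	case SK_CAPAB_HCKSUM: {
		uint32_t *flags = cap_data;

		if (!drv->sd_tx_cksum_enable)
			return (0);
		*flags = SK_HCKSUM_INET_PARTIAL | SK_HCKSUM_IPHDRCKSUM;
		return (1);
	}
	case SK_CAPAB_LSO: {
		sk_capab_lso_t *lso = cap_data;

		if (!drv->sd_lso_enable)
			return (0);
		lso->lso_flags = SK_LSO_TX_BASIC_TCP_IPV4;
		lso->lso_basic_tcp_ipv4.lso_max = drv->sd_lso_max;
		return (1);
	}
	default:
		/* Unknown or private capabilities are not supported. */
		return (0);
	}
}
```
.Ed
.Pp
The default case matters as much as the named ones: because the set of
capabilities changes over time, the sketch declines anything it does not
recognize instead of touching the capability data.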
.Sh EVOLVING CAPABILITIES
The following capabilities are still evolving in the operating system.
They are documented such that device driver writers may experiment with
them.
However, if such drivers are not present inside the core operating
system repository, they may be subject to API and ABI breakage.
.Ss Dv MAC_CAPAB_RINGS
The
.Dv MAC_CAPAB_RINGS
capability is very important for implementing a high-performing device
driver.
Networking hardware structures the queues of packets to be sent
and received into a ring.
Each entry in this ring has a descriptor, which describes the address
and options for a packet which is going to
be transmitted or received.
While simple networking devices only have a single ring, most high-speed
networking devices have support for many rings.
.Pp
Rings are used for two important purposes.
The first is receive side scaling (RSS), which is the ability to have
the hardware hash the contents of a packet based on some of the protocol
headers, and send it to one of several rings.
These different rings may each have their own interrupt associated with
them, allowing the card to receive traffic in parallel.
Similar logic can be performed when sending traffic, to leverage
multiple hardware resources, thus increasing capacity.
.Pp
The second use of rings is to group them together and apply filtering
rules.
For example, if a packet matches a specific VLAN or MAC address,
then it can be sent to a specific ring or a specific group of rings.
This is especially useful when there are multiple different virtual NICs
or zones in play as the operating system will be able to use the
hardware classification features to already know where a given packet
needs to be delivered internally rather than having to determine that
for each packet.
.Pp
From the MAC framework's perspective, a driver can have one or more
groups.
A group consists of the following:
.Bl -bullet -offset indent
.It
One or more hardware rings.
.It
One or more MAC address or VLAN filters.
.El
.Pp
The details around how a device driver changes when rings are employed,
the data structures that a driver must implement, and more are available
in
.Xr mac_capab_rings 9E .
.Ss Dv MAC_CAPAB_TRANSCEIVER
Many networking devices leverage external transceivers that adhere to
standards such as SFP, QSFP, QSFP-DD, etc., which often contain
standardized information in an EEPROM on the device.
The
.Dv MAC_CAPAB_TRANSCEIVER
capability provides a means of discovering the number of transceivers,
their types, and reading the data from a transceiver.
This allows administrators and users to determine if devices are
present, if the hardware can use them, and in many cases, detailed
information about the device ranging from its manufacturer and
serial numbers to specific information about its health.
Implementing this capability will lead to the operating system being
able to discover and display transceivers as part of its fault
management topology.
.Pp
See
.Xr mac_capab_transceiver 9E
for more details on the capability structure and the various function
entry points that come along with it.
.Ss Dv MAC_CAPAB_LED
The
.Dv MAC_CAPAB_LED
capability provides a means to access and control the LEDs on a network
interface card.
This is then made available to the broader operating system and consumed
by facilities such as the Fault Management Architecture.
See
.Xr mac_capab_led 9E
for more details on the structure and requirements of the capability.
.Sh PROPERTIES
Properties in the MAC framework represent aspects of a link.
These include things like the link's current state and MTU.
Many of the properties in the system are focused around auto-negotiation and
controlling what link speeds are advertised.
Information about properties is covered by three different device entry points.
The
.Xr mc_propinfo 9E
entry point obtains metadata about the property.
The
.Xr mc_getprop 9E
entry point obtains the property.
The
.Xr mc_setprop 9E
entry point updates the property to a new value.
.Pp
Many of the properties listed below are read-only.
Each property indicates whether it is read-only or read/write.
However, drivers are not required to implement the ability to set every
writable property.
Many of these depend on the card itself.
In particular, read/write properties that relate to auto-negotiation cannot
be updated if the hardware in question does not support toggling what
link speeds are auto-negotiated.
While copper Ethernet often does not have this restriction, it often exists with
various fiber standards and PHYs.
.Pp
The following properties are the subset of MAC framework properties that
driver writers should be aware of and handle.
While other properties exist in the system, driver writers should always return
an error when a property not listed below is encountered.
See
.Xr mc_getprop 9E
and
.Xr mc_setprop 9E
for more information on how to handle them.
.Bl -hang -width Ds
.It Dv MAC_PROP_DUPLEX
.Bd -filled -compact
Type:
.Vt link_duplex_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_DUPLEX
property is used to indicate the duplex mode of the link.
A full-duplex link may have traffic flowing in both directions at the same time.
The
.Vt link_duplex_t
is an enumeration which may be set to any of the following values:
.Bl -tag -width Ds
.It Dv LINK_DUPLEX_UNKNOWN
The current state of the link is unknown.
This may be because the link has not negotiated to a specific speed or it is
down.
.It Dv LINK_DUPLEX_HALF
The link is running at half duplex.
Communication may travel in only one direction on the link at a given time.
.It Dv LINK_DUPLEX_FULL
The link is running at full duplex.
Communication may travel in both directions on the link simultaneously.
.El
.It Dv MAC_PROP_SPEED
.Bd -filled -compact
Type:
.Vt uint64_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_SPEED
property stores the current link speed in bits per second.
A link that is running at 100 Mbit/s would store the value 100000000ULL.
A link that is running at 40 Gbit/s would store the value 40000000000ULL.
.It Dv MAC_PROP_STATUS
.Bd -filled -compact
Type:
.Vt link_state_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_STATUS
property is used to indicate the current state of the link.
It indicates whether the link is up or down.
The
.Vt link_state_t
is an enumeration which may be set to any of the following values:
.Bl -tag -width Ds
.It Dv LINK_STATE_UNKNOWN
The current state of the link is unknown.
This may be because the driver's
.Xr mc_start 9E
entry point has not been called so it has not attempted to start the link.
.It Dv LINK_STATE_DOWN
The link is down.
This may be because of a negotiation problem, a cable problem, or some other
device specific issue.
.It Dv LINK_STATE_UP
The link is up.
If auto-negotiation is in use, it should have completed.
Traffic should be able to flow over the link, barring other issues.
.El
.It Dv MAC_PROP_MEDIA
.Bd -filled -compact
Type:
.Vt uint32_t No (Varies) |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_MEDIA
property indicates the current type of media on the link.
The type of media is class-specific and determined based on the
.Fa m_type_ident
field in the
.Vt mac_register_t
structure used when calling
.Xr mac_register 9F .
The media is always read-only.
This property is not used to control how auto-negotiation should be
performed; the existing speed-based properties are used instead.
This property should be updated after auto-negotiation has completed.
If device hardware and firmware do not provide a way to accurately
determine this, then it is much better to return that the media is
unknown rather than to lie or guess.
A common case where this comes up is when a network card uses an
SFP-based device.
If the underlying negotiated type of the link isn't made available and
therefore the driver can't distinguish between say 40GBASE-SR4 and
40GBASE-LR4, then drivers should return that the media is unknown.
.Pp
Similarly, many types here represent an electrical interface that is
often used between a MAC and a PHY, but also for chip-to-chip
connectivity or on a backplane.
When connecting to a PHY these shouldn't generally be used as the user
is concerned with what is actually on the link they plug in, not the
internals of the device.
.Pp
Currently media values are defined for Ethernet-based devices and use
the enumeration
.Vt mac_ether_media_t .
These are defined in
.In sys/mac_ether.h
and generally follow the IEEE standardized physical medium dependent
.Pq PMD
layer in 802.3.
.Bl -tag -width Ds
.It Dv ETHER_MEDIA_UNKNOWN
This indicates that the type of the link media is unknown to the driver.
This may be because the link is in a state where this information is
unknown or the hardware, firmware, and device driver cannot figure it
out.
If there is no media present and the link is down, use
.Dv ETHER_MEDIA_NONE
instead.
.It Dv ETHER_MEDIA_NONE
Represents the case that there is no specific media in use.
This should generally be used when the link is down.
.It Dv ETHER_MEDIA_10BASE_T
Traditional 10 Mbit/s Ethernet utilizing CAT-3 cabling.
Defined in 802.3i.
.It Dv ETHER_MEDIA_10BASE_T1
A more recent variant of 10 Mbit/s Ethernet that uses a single twisted
pair.
Defined in 802.3cg.
.It Dv ETHER_MEDIA_100BASE_TX
The most common form of 100 Mbit/s Ethernet that utilizes two twisted
pairs over a CAT-5 cable.
Defined in 802.3u.
.It Dv ETHER_MEDIA_100BASE_FX
100 Mbit/s Ethernet operating over multi-mode fiber.
Defined in 802.3u.
.It Dv ETHER_MEDIA_100BASE_X
This is a general term that covers operating in one of the 100BASE-?X
variants.
This is here because some PHYs do not distinguish between operating in
100BASE-TX and 100BASE-FX.
If the driver can determine if it is operating with a BASE-T or fiber
based PHY, prefer the more specific types instead.
.It Dv ETHER_MEDIA_100BASE_T4
This is an uncommon half-duplex variant of 100 Mbit/s Ethernet that
operates over CAT-3 cable using four twisted pairs.
Defined in 802.3u.
.It Dv ETHER_MEDIA_100BASE_T2
This is another uncommon variant of 100 Mbit/s Ethernet that only
requires two twisted pairs, but unlike 100BASE-TX requires CAT-3 cables.
Defined in 802.3y.
.It Dv ETHER_MEDIA_100BASE_T1
A more recent form of 100 Mbit/s Ethernet that requires only a single
twisted pair.
Defined in 802.3bw.
.It Dv ETHER_MEDIA_100_SGMII
This form of 100 Mbit/s Ethernet is generally used for chip-to-chip
connectivity and utilizes the SGMII
.Pq Serial gigabit media-independent interface
specification.
.It Dv ETHER_MEDIA_1000BASE_X
This is a general catch-all for all 1 Gbit/s fiber-based operation.
This is here for compatibility with the generic information returned by
traditional 802.3-compatible PHYs.
When more specific information is available, that should be used
instead.
.It Dv ETHER_MEDIA_1000BASE_T
Traditional 1 Gbit/s Ethernet that utilizes a CAT-5 cable with four
twisted pairs.
Defined in 802.3ab.
.It Dv ETHER_MEDIA_1000BASE_T1
A more recent form of 1 Gbit/s Ethernet that only requires a single
twisted pair.
.It Dv ETHER_MEDIA_1000BASE_KX
This form of 1 Gbit/s Ethernet is designed for operating over a backplane.
Defined in 802.3ap.
.It Dv ETHER_MEDIA_1000BASE_CX
An older form of 1 Gbit/s Ethernet that operates over balanced copper
cables.
Defined in 802.3z.
.It Dv ETHER_MEDIA_1000BASE_SX
1 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
each direction.
.It Dv ETHER_MEDIA_1000BASE_LX
1 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
each direction.
.It Dv ETHER_MEDIA_1000BASE_BX
1 Gbit/s Ethernet operating over a single piece of single-mode fiber.
This media operates bi-directionally as opposed to how 1000BASE-LX and
1000BASE-SX operate.
.It Dv ETHER_MEDIA_1000_SGMII
A form of 1 Gbit/s Ethernet defined by Cisco that is used for
chip-to-chip connectivity.
.It Dv ETHER_MEDIA_2500BASE_T
2.5 Gbit/s Ethernet based on four copper twisted-pairs.
Defined in 802.3bz.
.It Dv ETHER_MEDIA_2500BASE_KX
2.5 Gbit/s Ethernet that is designed for operating over a backplane
interconnect.
Defined in 802.3cb.
.It Dv ETHER_MEDIA_2500BASE_X
This is a variant of 2.5 Gbit/s Ethernet that took the 1000BASE-X IEEE
standard and ran it with a 2.5x faster clock.
It is a de facto standard.
.It Dv ETHER_MEDIA_5000BASE_T
5.0 Gbit/s Ethernet based on four copper twisted-pairs.
Defined in 802.3bz.
.It Dv ETHER_MEDIA_5000BASE_KR
5.0 Gbit/s Ethernet that is designed for operating over a backplane
interconnect.
Defined in 802.3cb.
.It Dv ETHER_MEDIA_10GBASE_T
10 Gbit/s Ethernet operating over four copper twisted pairs utilizing
CAT-6a cables.
Defined in 802.3an.
.It Dv ETHER_MEDIA_10GBASE_SR
10 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
each direction.
Defined in 802.3ae.
.It Dv ETHER_MEDIA_10GBASE_LR
10 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
each direction.
The maximum fiber length is 10km.
Defined in 802.3ae.
.It Dv ETHER_MEDIA_10GBASE_ER
10 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
each direction.
The maximum fiber length is 30km.
Defined in 802.3ae.
.It Dv ETHER_MEDIA_10GBASE_LRM
10 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
each direction.
This has a reach of up to 220m, which is a longer distance than
10GBASE-SR.
Defined in 802.3aq.
.It Dv ETHER_MEDIA_10GBASE_KR
10 Gbit/s Ethernet operating over a single lane backplane.
Defined in 802.3ap.
.It Dv ETHER_MEDIA_10GBASE_CX4
10 Gbit/s Ethernet operating over a group of four shielded copper cables.
Defined in 802.3ak.
.It Dv ETHER_MEDIA_10GBASE_KX4
10 Gbit/s Ethernet operating over a four lane backplane.
Defined in 802.3ap.
.It Dv ETHER_MEDIA_10GBASE_CR
10 Gbit/s Ethernet that is built using a passive copper
SFP-compatible cable.
This is sometimes called 10GSFP+Cu passive.
Defined in SFF-8431.
.It Dv ETHER_MEDIA_10GBASE_AOC
10 Gbit/s Ethernet that is built using a short-range active
optical cable that is SFP+-compatible.
Defined in SFF-8431.
.It Dv ETHER_MEDIA_10GBASE_ACC
10 Gbit/s Ethernet based upon a single lane of copper cable with an
active component that allows it to go longer distances than 10GBASE-CR.
Defined in SFF-8431.
.It Dv ETHER_MEDIA_10G_XAUI
10 Gbit/s signalling that is defined for use between a MAC and PHY.
The name combines the Roman numeral X with attachment unit interface.
Sometimes used for chip-to-chip interconnects.
Defined in 802.3ae.
.It Dv ETHER_MEDIA_10G_SFI
10 Gbit/s signalling that is defined for use between a MAC and an
SFP-based transceiver.
Defined in SFF-8431.
.It Dv ETHER_MEDIA_10G_XFI
10 Gbit/s signalling that is defined for use between a MAC and an
XFP-based transceiver.
Defined in INF-8077i
.Pq XFP MSA .
.It Dv ETHER_MEDIA_25GBASE_T
25 Gbit/s Ethernet based upon four twisted pair cables using CAT-8
cable.
Defined in 802.3bq.
.It Dv ETHER_MEDIA_25GBASE_SR
25 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
each direction.
Defined in 802.3by.
.It Dv ETHER_MEDIA_25GBASE_LR
25 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
each direction.
The maximum fiber length is 10km.
Defined in 802.3cc.
.It Dv ETHER_MEDIA_25GBASE_ER
25 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
each direction.
The maximum fiber length is 30km.
Defined in 802.3cc.
.It Dv ETHER_MEDIA_25GBASE_KR
25 Gbit/s Ethernet operating over a backplane with a single lane.
Defined in 802.3by.
.It Dv ETHER_MEDIA_25GBASE_CR
25 Gbit/s Ethernet operating over a single lane of copper cable.
Generally used with an SFP28 style connector.
Defined in 802.3by.
.It Dv ETHER_MEDIA_25GBASE_AOC
25 Gbit/s Ethernet that is built using a short-range active
optical cable that is SFP28-compatible.
Defined loosely by SFF-8402 and often utilizes 25GBASE-SR.
.It Dv ETHER_MEDIA_25GBASE_ACC
25 Gbit/s Ethernet based upon a single lane of copper cable with an
active component that allows it to go longer distances than 25GBASE-CR.
Defined loosely by SFF-8402.
.It Dv ETHER_MEDIA_25G_AUI
25 Gbit/s signalling that is defined for use between a MAC and PHY and
for chip-to-chip connectivity.
Defined by 802.3by.
.It Dv ETHER_MEDIA_40GBASE_T
40 Gbit/s Ethernet based upon four twisted-pairs of CAT-8 cables.
Defined in 802.3bq.
.It Dv ETHER_MEDIA_40GBASE_CR4
40 Gbit/s Ethernet utilizing four lanes of twinaxial copper cabling
each operating at 10 Gbit/s.
This is generally used with a QSFP+ connector defined in SFF-8635.
Defined in 802.3ba.
.It Dv ETHER_MEDIA_40GBASE_KR4
40 Gbit/s Ethernet utilizing four lanes over a copper backplane each
operating at 10 Gbit/s.
Defined in 802.3ba.
.It Dv ETHER_MEDIA_40GBASE_SR4
40 Gbit/s Ethernet based upon using four pairs of multi-mode fiber, each
operating at 10 Gbit/s, with one fiber in the pair being used for
transmit and the other for receive.
Generally utilizes a QSFP+ connector.
Defined in 802.3ba.
.It Dv ETHER_MEDIA_40GBASE_LR4
40 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
for each direction.
Utilizes wavelength multiplexing as the electrical interface is four 10
Gbit/s signals.
The maximum fiber length is 10km.
Defined in 802.3ba.
.It Dv ETHER_MEDIA_40GBASE_ER4
40 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
for each direction.
Utilizes wavelength multiplexing as the electrical interface is four 10
Gbit/s signals and generally based upon a QSFP+ connector.
The maximum fiber length is 40km.
Defined in 802.3bm.
.It Dv ETHER_MEDIA_40GBASE_LM4
40 Gbit/s Ethernet based upon using one pair of multi-mode fibers, one
for each direction.
Utilizes wavelength multiplexing as the electrical interface is four 10
Gbit/s signals and generally based upon a QSFP+ connector.
Defined by a specific MSA.
.It Dv ETHER_MEDIA_40GBASE_AOC4
40 Gbit/s Ethernet based upon a QSFP+ based cable with built-in
optical transceivers.
The electrical interface is four lanes running at 10 Gbit/s.
.It Dv ETHER_MEDIA_40GBASE_ACC4
40 Gbit/s Ethernet based upon four copper lanes each running at 10
Gbit/s with some additional component compared to 40GBASE-CR4.
.It Dv ETHER_MEDIA_40G_XLAUI
40 Gbit/s signalling operating across four lanes that is defined for use
between a MAC and a PHY or for chip-to-chip connectivity.
Defined by 802.3ba.
.It Dv ETHER_MEDIA_40G_XLPPI
40 Gbit/s signalling operating across four lanes that is designed to
connect between a chip and a module, generally a QSFP+ based device.
Defined in 802.3ba.
.It Dv ETHER_MEDIA_50GBASE_KR2
50 Gbit/s Ethernet which operates over a two lane copper backplane.
Each lane operates at 25 Gbit/s.
Defined by the 25G and 50G Ethernet consortium.
This did not become an IEEE standard.
.It Dv ETHER_MEDIA_50GBASE_CR2
50 Gbit/s Ethernet which operates over two lane copper twinaxial cable,
generally with a QSFP+ connector.
Each lane operates at 25 Gbit/s.
Defined by the 25G and 50G Ethernet consortium.
.It Dv ETHER_MEDIA_50GBASE_SR2
50 Gbit/s Ethernet based upon using two pairs of multi-mode fiber, each
operating at 25 Gbit/s, with one fiber in the pair being used for
transmit and the other for receive.
Generally utilizes a QSFP+ connector.
Defined by the 25G and 50G Ethernet consortium.
.It Dv ETHER_MEDIA_50GBASE_LR2
50 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
for each direction.
Utilizes wavelength multiplexing as the electrical interface is two 25
Gbit/s signals.
Defined by the 25G and 50G Ethernet consortium.
.It Dv ETHER_MEDIA_50GBASE_AOC2
50 Gbit/s Ethernet generally based upon a QSFP+ based cable with built-in
optical transceivers.
The electrical interface is two lanes running at 25 Gbit/s.
.It Dv ETHER_MEDIA_50GBASE_ACC2
50 Gbit/s Ethernet based upon two copper twinaxial lanes each running at
25 Gbit/s with some additional component compared to 50GBASE-CR2.
.It Dv ETHER_MEDIA_50GBASE_KR
50 Gbit/s Ethernet operating over a single lane backplane.
Defined by 802.3cd.
.It Dv ETHER_MEDIA_50GBASE_CR
50 Gbit/s Ethernet operating over a single lane twinaxial copper cable
generally utilizing an SFP56 interface.
Defined by 802.3cd.
.It Dv ETHER_MEDIA_50GBASE_SR
50 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
each direction.
Defined by 802.3cd.
.It Dv ETHER_MEDIA_50GBASE_LR
50 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
each direction.
The maximum fiber length is 10km.
Defined in 802.3cd.
.It Dv ETHER_MEDIA_50GBASE_ER
50 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
each direction.
The maximum fiber length is 40km.
Defined in 802.3cd.
.It Dv ETHER_MEDIA_50GBASE_FR
50 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
each direction.
The maximum fiber length is 2km.
Defined in 802.3cd.
.It Dv ETHER_MEDIA_50GBASE_AOC
50 Gbit/s Ethernet that is built using a short-range active optical
cable that is generally SFP56 compatible.
The electrical interface operates at 25 Gbit/s PAM4 signaling.
.It Dv ETHER_MEDIA_50GBASE_ACC
50 Gbit/s Ethernet that is built using a single lane twinaxial
cable that is generally SFP56 compatible but uses an active component
such as a retimer or redriver when compared to 50GBASE-CR.
.It Dv ETHER_MEDIA_100GBASE_CR10
100 Gbit/s Ethernet operating over ten lanes of shielded twinaxial
copper cable, each operating at 10 Gbit/s.
Defined in 802.3ba.
.It Dv ETHER_MEDIA_100GBASE_SR10
100 Gbit/s Ethernet based upon using ten pairs of multi-mode fiber, each
operating at 10 Gbit/s, with one fiber in the pair being used for
transmit and the other for receive.
.It Dv ETHER_MEDIA_100GBASE_SR4
100 Gbit/s Ethernet based upon using four pairs of multi-mode fiber,
each operating at 25 Gbit/s, with one fiber in the pair being used for
transmit and the other for receive.
Defined by 802.3bm.
.It Dv ETHER_MEDIA_100GBASE_LR4
100 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
for each direction.
Utilizes wavelength multiplexing as the electrical interface is four 25
Gbit/s signals and generally based upon a QSFP28 connector.
The maximum fiber length is 10km.
Defined by 802.3ba.
.It Dv ETHER_MEDIA_100GBASE_ER4
100 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
for each direction.
Utilizes wavelength multiplexing as the electrical interface is four 25
Gbit/s signals and generally based upon a QSFP28 connector.
The maximum fiber length is 40km.
Defined by 802.3ba.
.It Dv ETHER_MEDIA_100GBASE_KR4
100 Gbit/s Ethernet based upon using a four lane copper backplane.
Each lane operates at 25 Gbit/s.
Defined in 802.3bj.
.It Dv ETHER_MEDIA_100GBASE_CAUI4
100 Gbit/s signalling used for chip-to-chip and chip-to-module
connectivity.
Defined in 802.3bm.
.It Dv ETHER_MEDIA_100GBASE_CR4
100 Gbit/s Ethernet based upon using a four lane copper twinaxial cable.
Each lane operates at 25 Gbit/s and generally utilizes a QSFP28
connector.
Defined in 802.3bj.
.It Dv ETHER_MEDIA_100GBASE_AOC4
100 Gbit/s Ethernet that utilizes an active optical cable with
short-range optical transceivers.
Electrically operates as four lanes of 25 Gbit/s and most commonly uses
a QSFP28 connector.
.It Dv ETHER_MEDIA_100GBASE_ACC4
100 Gbit/s Ethernet that utilizes a four lane copper twinaxial cable
that unlike 100GBASE-CR4 has an active component such as a retimer or
redriver.
.It Dv ETHER_MEDIA_100GBASE_KR2
100 Gbit/s Ethernet based upon using a two lane copper backplane.
Each lane operates at 50 Gbit/s.
Defined in 802.3cd.
.It Dv ETHER_MEDIA_100GBASE_CR2
100 Gbit/s Ethernet that utilizes a two lane copper twinaxial cable.
Each lane operates at 50 Gbit/s.
Defined by 802.3cd.
.It Dv ETHER_MEDIA_100GBASE_SR2
100 Gbit/s Ethernet based upon using two pairs of multi-mode fiber,
each operating at 50 Gbit/s, with one fiber in the pair being used for
transmit and the other for receive.
Defined by 802.3cd.
.It Dv ETHER_MEDIA_100GBASE_KR
100 Gbit/s Ethernet operating over a single lane copper backplane.
Defined by 802.3ck.
.It Dv ETHER_MEDIA_100GBASE_CR
100 Gbit/s Ethernet operating over a single lane copper twinaxial cable.
Generally uses an SFP112 connector.
Defined by 802.3ck.
.It Dv ETHER_MEDIA_100GBASE_SR
100 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
transmitting and one for receiving.
The maximum fiber length is 60-100m depending on the fiber type
.Pq OM3, OM4 .
Defined by 802.3db.
.It Dv ETHER_MEDIA_100GBASE_DR
100 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
transmitting and one for receiving.
Designed to be used with a parallel DR4/DR8 interface.
The maximum fiber length is 500m.
Defined by 802.3cd.
.It Dv ETHER_MEDIA_100GBASE_LR
100 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
transmitting and one for receiving.
The maximum fiber length is 10km.
Defined by 802.3cu.
.It Dv ETHER_MEDIA_100GBASE_FR
100 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
transmitting and one for receiving.
The maximum fiber length is 2km.
Defined by 802.3cu.
.It Dv ETHER_MEDIA_200GBASE_CR4
200 Gbit/s Ethernet utilizing a four lane passive copper twinaxial
cable.
Each lane operates at 50 Gbit/s and the connector is generally based on
QSFP56.
Defined by 802.3cd.
.It Dv ETHER_MEDIA_200GBASE_KR4
200 Gbit/s Ethernet utilizing four lanes over a copper backplane each
operating at 50 Gbit/s.
Defined by 802.3cd.
.It Dv ETHER_MEDIA_200GBASE_SR4
200 Gbit/s Ethernet based upon using four pairs of multi-mode fiber,
each operating at 50 Gbit/s, with one fiber in the pair being used for
transmit and the other for receive.
Defined by 802.3cd.
.It Dv ETHER_MEDIA_200GBASE_DR4
200 Gbit/s Ethernet based upon using four pairs of single-mode fiber,
each operating at 50 Gbit/s, with one fiber in the pair being used for
transmit and the other for receive.
Defined by 802.3bs.
.It Dv ETHER_MEDIA_200GBASE_FR4
200 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
for transmitting and one for receiving.
Utilizes wavelength multiplexing as the electrical interface is four 50
Gbit/s signals and generally based upon a QSFP56 connector.
The maximum fiber length is 2km.
Defined by 802.3bs.
.It Dv ETHER_MEDIA_200GBASE_LR4
200 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
for transmitting and one for receiving.
Utilizes wavelength multiplexing as the electrical interface is four 50
Gbit/s signals and generally based upon a QSFP56 connector.
The maximum fiber length is 10km.
Defined by 802.3bs.
.It Dv ETHER_MEDIA_200GBASE_ER4
200 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
for transmitting and one for receiving.
Utilizes wavelength multiplexing as the electrical interface is four 50
Gbit/s signals and generally based upon a QSFP56 connector.
The maximum fiber length is 40km.
Defined by 802.3bs.
.It Dv ETHER_MEDIA_200GAUI_4
200 Gbit/s signalling utilizing four lanes each operating at 50 Gbit/s.
Used for chip-to-chip and chip-to-module connections.
Defined by 802.3bs.
.It Dv ETHER_MEDIA_200GBASE_KR2
200 Gbit/s Ethernet utilizing two lanes over a copper backplane each
operating at 100 Gbit/s.
Defined by 802.3ck.
.It Dv ETHER_MEDIA_200GBASE_CR2
200 Gbit/s Ethernet utilizing a two lane passive copper twinaxial
cable.
Each lane operates at 100 Gbit/s.
Defined by 802.3ck.
.It Dv ETHER_MEDIA_200GBASE_SR2
200 Gbit/s Ethernet based upon using two pairs of multi-mode fiber,
each operating at 100 Gbit/s, with one fiber in the pair being used for
transmit and the other for receive.
Defined by 802.3db.
.It Dv ETHER_MEDIA_200GAUI_2
200 Gbit/s signalling utilizing two lanes each operating at 100 Gbit/s.
Used for chip-to-chip and chip-to-module connections.
Defined by 802.3ck.
.It Dv ETHER_MEDIA_400GBASE_KR8
400 Gbit/s Ethernet utilizing eight lanes over a copper backplane each
operating at 50 Gbit/s.
Defined by the 25/50 Gigabit Ethernet Consortium.
.It Dv ETHER_MEDIA_400GBASE_FR8
400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
for transmitting and one for receiving.
Utilizes wavelength multiplexing as the electrical interface is eight 50
Gbit/s signals and generally based upon a QSFP-DD connector.
The maximum fiber length is 2km.
Defined by 802.3bs.
.It Dv ETHER_MEDIA_400GBASE_LR8
400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
for transmitting and one for receiving.
Utilizes wavelength multiplexing as the electrical interface is eight 50
Gbit/s signals and generally based upon a QSFP-DD connector.
The maximum fiber length is 10km.
Defined by 802.3bs.
.It Dv ETHER_MEDIA_400GBASE_ER8
400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
for transmitting and one for receiving.
Utilizes wavelength multiplexing as the electrical interface is eight 50
Gbit/s signals and generally based upon a QSFP-DD connector.
The maximum fiber length is 40km.
Defined by 802.3cn.
.It Dv ETHER_MEDIA_400GAUI_8
400 Gbit/s signalling utilizing eight lanes each operating at 50 Gbit/s.
Used for chip-to-chip and chip-to-module connections.
Defined by 802.3bs.
.It Dv ETHER_MEDIA_400GBASE_KR4
400 Gbit/s Ethernet utilizing four lanes over a copper backplane each
operating at 100 Gbit/s.
Defined by 802.3ck.
.It Dv ETHER_MEDIA_400GBASE_CR4
400 Gbit/s Ethernet utilizing a four lane passive copper twinaxial
cable.
Each lane operates at 100 Gbit/s and generally uses a QSFP112 connector.
Defined by 802.3ck.
.It Dv ETHER_MEDIA_400GBASE_SR4
400 Gbit/s Ethernet based upon using four pairs of multi-mode fiber,
each operating at 100 Gbit/s, with one fiber in the pair being used for
transmit and the other for receive.
Defined by 802.3db.
.It Dv ETHER_MEDIA_400GBASE_DR4
400 Gbit/s Ethernet based upon using four pairs of single-mode fiber,
each operating at 100 Gbit/s, with one fiber in the pair being used for
transmit and the other for receive.
The maximum fiber length is 500m.
Defined by 802.3bs.
.It Dv ETHER_MEDIA_400GBASE_FR4
400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
for transmitting and one for receiving.
Utilizes wavelength multiplexing as the electrical interface is four 100
Gbit/s signals and generally based upon a QSFP112 connector.
The maximum fiber length is 2km.
Defined by 802.3cu.
.It Dv ETHER_MEDIA_400GAUI_4
400 Gbit/s signalling utilizing four lanes each operating at 100 Gbit/s.
Used for chip-to-chip and chip-to-module connections.
1628Defined by 802.3ck. 1629.El 1630.It Dv MAC_PROP_AUTONEG 1631.Bd -filled -compact 1632Type: 1633.Vt uint8_t | 1634Permissions: 1635.Sy Read/Write 1636.Ed 1637.Pp 1638The 1639.Dv MAC_PROP_AUTONEG 1640property indicates whether or not the device is currently configured to 1641perform auto-negotiation. 1642A value of 1643.Sy 0 1644indicates that auto-negotiation is disabled. 1645A 1646.Sy non-zero 1647value indicates that auto-negotiation is enabled. 1648Devices should generally default to enabling auto-negotiation. 1649.Pp 1650When getting this property, the device driver should return the current 1651state. 1652When setting this property, if the device supports operating in the requested 1653mode, then the device driver should reset the link to negotiate to the new speed 1654after updating any internal registers. 1655.It Dv MAC_PROP_MTU 1656.Bd -filled -compact 1657Type: 1658.Vt uint32_t | 1659Permissions: 1660.Sy Read/Write 1661.Ed 1662.Pp 1663The 1664.Dv MAC_PROP_MTU 1665property determines the maximum transmission unit (MTU). 1666This indicates the maximum size packet that the device can transmit, ignoring 1667its own headers. 1668For an Ethernet device, this would exclude the size of the Ethernet header and 1669any VLAN headers that would be placed. 1670It is up to the driver to ensure that any MTU values that it accepts when adding 1671in its margin and header sizes does not exceed its maximum frame size. 1672.Pp 1673By default, drivers for Ethernet should initialize this value and the 1674MTU to 1675.Sy 1500 . 1676When getting this property, the driver should return its current 1677recorded MTU. 1678When setting this property, the driver should first validate that it is within 1679the device's valid range and then it must call 1680.Xr mac_maxsdu_update 9F . 1681Note that the call may fail. 1682If the call completes successfully, the driver should update the hardware with 1683the new value of the MTU and perform any other work needed to handle it. 
1684.Pp 1685If the device does not support changing the MTU after the device's 1686.Xr mc_start 9E 1687entry point has been called, then driver writers should return 1688.Er EBUSY . 1689.It Dv MAC_PROP_FLOWCTRL 1690.Bd -filled -compact 1691Type: 1692.Vt link_flowctrl_t | 1693Permissions: 1694.Sy Read/Write 1695.Ed 1696.Pp 1697The 1698.Dv MAC_PROP_FLOWCTRL 1699property manages the configuration of pause frames as part of Ethernet 1700flow control. 1701Note, this only describes what this device will advertise. 1702What is actually enabled may be different and is subject to the rules of 1703auto-negotiation. 1704The 1705.Vt link_flowctrl_t 1706is an enumeration that may be set to one of the following values: 1707.Bl -tag -width Ds 1708.It Dv LINK_FLOWCTRL_NONE 1709Flow control is disabled. 1710No pause frames should be generated or honored. 1711.It Dv LINK_FLOWCTRL_RX 1712The device can receive pause frames; however, it should not generate 1713them. 1714.It Dv LINK_FLOWCTRL_TX 1715The device can generate pause frames; however, it does not support 1716receiving them. 1717.It Dv LINK_FLOWCTRL_BI 1718The device supports both sending and receiving pause frames. 1719.El 1720.Pp 1721When getting this property, the device driver should return the way that 1722it has configured the device, not what the device has actually 1723negotiated. 1724When setting the property, it should update the hardware and allow the link to 1725potentially perform auto-negotiation again. 1726.It Dv MAC_PROP_EN_FEC_CAP 1727.Bd -filled -compact 1728Type: 1729.Vt link_fec_t | 1730Permissions: 1731.Sy Read/Write 1732.Ed 1733.Pp 1734The 1735.Dv MAC_PROP_EN_FEC_CAP 1736property indicates which Forward Error Correction (FEC) code is advertised 1737by the device. 1738.Pp 1739The 1740.Vt link_fec_t 1741is an enumeration that may be a combination of the following bit values: 1742.Bl -tag -width Ds 1743.It Dv LINK_FEC_NONE 1744No FEC over the link. 
.It Dv LINK_FEC_AUTO
The FEC coding to use is auto-negotiated.
.Dv LINK_FEC_AUTO
cannot be set along with any of the other values.
This is the default setting the device driver should use.
.It Dv LINK_FEC_RS
The link may use Reed-Solomon FEC coding.
.It Dv LINK_FEC_BASE_R
The link may use Base-R coding, also commonly referred to as FireCode.
.El
.Pp
When setting the property, the device driver should update the hardware
with the requested coding or combination of requested codings.
If a particular combination of codings is not supported by the hardware,
the device driver should return
.Er EINVAL .
When retrieving this property, the device driver should return the current
value of the property.
.It Dv MAC_PROP_ADV_FEC_CAP
.Bd -filled -compact
Type:
.Vt link_fec_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_FEC_CAP
property has the same values as
.Dv MAC_PROP_EN_FEC_CAP .
The property indicates which Forward Error Correction (FEC) code has been
negotiated over the link.
.El
.Pp
The remaining properties are all about various auto-negotiation link
speeds.
They fall into two different buckets: properties with
.Sy _ADV_
in the name and properties with
.Sy _EN_
in the name.
For any given supported speed, there is one of each.
The
.Sy _EN_
set of properties are read/write properties that control what should be
advertised by the device.
When these are retrieved, they should return the current value of the property.
When they are set, they should change how the hardware advertises the specific
speed and trigger any kind of link reset and auto-negotiation, if enabled, to
occur.
.Pp
The
.Sy _ADV_
set of properties are read-only properties.
They are meant to reflect what has actually been negotiated.
1800These may be different from the 1801.Sy _EN_ 1802family of properties, especially when different power management 1803settings are at play. 1804.Pp 1805See the 1806.Sx Link Speed and Auto-negotiation 1807section for more information. 1808.Pp 1809The properties are ordered in increasing link speed: 1810.Bl -hang -width Ds 1811.It Dv MAC_PROP_ADV_10HDX_CAP 1812.Bd -filled -compact 1813Type: 1814.Vt uint8_t | 1815Permissions: 1816.Sy Read-Only 1817.Ed 1818.Pp 1819The 1820.Dv MAC_PROP_ADV_10HDX_CAP 1821property describes whether or not 10 Mbit/s half-duplex support is 1822advertised. 1823.It Dv MAC_PROP_EN_10HDX_CAP 1824.Bd -filled -compact 1825Type: 1826.Vt uint8_t | 1827Permissions: 1828.Sy Read/Write 1829.Ed 1830.Pp 1831The 1832.Dv MAC_PROP_EN_10HDX_CAP 1833property describes whether or not 10 Mbit/s half-duplex support is 1834enabled. 1835.It Dv MAC_PROP_ADV_10FDX_CAP 1836.Bd -filled -compact 1837Type: 1838.Vt uint8_t | 1839Permissions: 1840.Sy Read-Only 1841.Ed 1842.Pp 1843The 1844.Dv MAC_PROP_ADV_10FDX_CAP 1845property describes whether or not 10 Mbit/s full-duplex support is 1846advertised. 1847.It Dv MAC_PROP_EN_10FDX_CAP 1848.Bd -filled -compact 1849Type: 1850.Vt uint8_t | 1851Permissions: 1852.Sy Read/Write 1853.Ed 1854.Pp 1855The 1856.Dv MAC_PROP_EN_10FDX_CAP 1857property describes whether or not 10 Mbit/s full-duplex support is 1858enabled. 1859.It Dv MAC_PROP_ADV_100HDX_CAP 1860.Bd -filled -compact 1861Type: 1862.Vt uint8_t | 1863Permissions: 1864.Sy Read-Only 1865.Ed 1866.Pp 1867The 1868.Dv MAC_PROP_ADV_100HDX_CAP 1869property describes whether or not 100 Mbit/s half-duplex support is 1870advertised. 1871.It Dv MAC_PROP_EN_100HDX_CAP 1872.Bd -filled -compact 1873Type: 1874.Vt uint8_t | 1875Permissions: 1876.Sy Read/Write 1877.Ed 1878.Pp 1879The 1880.Dv MAC_PROP_EN_100HDX_CAP 1881property describes whether or not 100 Mbit/s half-duplex support is 1882enabled. 
.It Dv MAC_PROP_ADV_100FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_100FDX_CAP
property describes whether or not 100 Mbit/s full-duplex support is
advertised.
.It Dv MAC_PROP_EN_100FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_100FDX_CAP
property describes whether or not 100 Mbit/s full-duplex support is
enabled.
.It Dv MAC_PROP_ADV_100T4_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
advertised.
.It Dv MAC_PROP_EN_100T4_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
enabled.
.It Dv MAC_PROP_ADV_1000HDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_1000HDX_CAP
property describes whether or not 1 Gbit/s half-duplex support is
advertised.
.It Dv MAC_PROP_EN_1000HDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Dv MAC_PROP_EN_1000HDX_CAP
property describes whether or not 1 Gbit/s half-duplex support is
enabled.
.It Dv MAC_PROP_ADV_1000FDX_CAP
.Bd -filled -compact
Type:
.Vt uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Dv MAC_PROP_ADV_1000FDX_CAP
property describes whether or not 1 Gbit/s full-duplex support is
advertised.
1969.It Dv MAC_PROP_EN_1000FDX_CAP 1970.Bd -filled -compact 1971Type: 1972.Vt uint8_t | 1973Permissions: 1974.Sy Read/Write 1975.Ed 1976.Pp 1977The 1978.Dv MAC_PROP_EN_1000FDX_CAP 1979property describes whether or not 1 Gbit/s full-duplex support is 1980enabled. 1981.It Dv MAC_PROP_ADV_2500FDX_CAP 1982.Bd -filled -compact 1983Type: 1984.Vt uint8_t | 1985Permissions: 1986.Sy Read-Only 1987.Ed 1988.Pp 1989The 1990.Dv MAC_PROP_ADV_2500FDX_CAP 1991property describes whether or not 2.5 Gbit/s full-duplex support is 1992advertised. 1993.It Dv MAC_PROP_EN_2500FDX_CAP 1994.Bd -filled -compact 1995Type: 1996.Vt uint8_t | 1997Permissions: 1998.Sy Read/Write 1999.Ed 2000.Pp 2001The 2002.Dv MAC_PROP_EN_2500FDX_CAP 2003property describes whether or not 2.5 Gbit/s full-duplex support is 2004enabled. 2005.It Dv MAC_PROP_ADV_5000FDX_CAP 2006.Bd -filled -compact 2007Type: 2008.Vt uint8_t | 2009Permissions: 2010.Sy Read-Only 2011.Ed 2012.Pp 2013The 2014.Dv MAC_PROP_ADV_5000FDX_CAP 2015property describes whether or not 5.0 Gbit/s full-duplex support is 2016advertised. 2017.It Dv MAC_PROP_EN_5000FDX_CAP 2018.Bd -filled -compact 2019Type: 2020.Vt uint8_t | 2021Permissions: 2022.Sy Read/Write 2023.Ed 2024.Pp 2025The 2026.Dv MAC_PROP_EN_5000FDX_CAP 2027property describes whether or not 5.0 Gbit/s full-duplex support is 2028enabled. 2029.It Dv MAC_PROP_ADV_10GFDX_CAP 2030.Bd -filled -compact 2031Type: 2032.Vt uint8_t | 2033Permissions: 2034.Sy Read-Only 2035.Ed 2036.Pp 2037The 2038.Dv MAC_PROP_ADV_10GFDX_CAP 2039property describes whether or not 10 Gbit/s full-duplex support is 2040advertised. 2041.It Dv MAC_PROP_EN_10GFDX_CAP 2042.Bd -filled -compact 2043Type: 2044.Vt uint8_t | 2045Permissions: 2046.Sy Read/Write 2047.Ed 2048.Pp 2049The 2050.Dv MAC_PROP_EN_10GFDX_CAP 2051property describes whether or not 10 Gbit/s full-duplex support is 2052enabled. 
2053.It Dv MAC_PROP_ADV_40GFDX_CAP 2054.Bd -filled -compact 2055Type: 2056.Vt uint8_t | 2057Permissions: 2058.Sy Read-Only 2059.Ed 2060.Pp 2061The 2062.Dv MAC_PROP_ADV_40GFDX_CAP 2063property describes whether or not 40 Gbit/s full-duplex support is 2064advertised. 2065.It Dv MAC_PROP_EN_40GFDX_CAP 2066.Bd -filled -compact 2067Type: 2068.Vt uint8_t | 2069Permissions: 2070.Sy Read/Write 2071.Ed 2072.Pp 2073The 2074.Dv MAC_PROP_EN_40GFDX_CAP 2075property describes whether or not 40 Gbit/s full-duplex support is 2076enabled. 2077.It Dv MAC_PROP_ADV_100GFDX_CAP 2078.Bd -filled -compact 2079Type: 2080.Vt uint8_t | 2081Permissions: 2082.Sy Read-Only 2083.Ed 2084.Pp 2085The 2086.Dv MAC_PROP_ADV_100GFDX_CAP 2087property describes whether or not 100 Gbit/s full-duplex support is 2088advertised. 2089.It Dv MAC_PROP_EN_100GFDX_CAP 2090.Bd -filled -compact 2091Type: 2092.Vt uint8_t | 2093Permissions: 2094.Sy Read/Write 2095.Ed 2096.Pp 2097The 2098.Dv MAC_PROP_EN_100GFDX_CAP 2099property describes whether or not 100 Gbit/s full-duplex support is 2100enabled. 2101.El 2102.Ss Private Properties 2103In addition to the defined properties above, drivers are allowed to 2104define private properties. 2105These private properties are device-specific properties. 2106All private properties share the same constant, 2107.Dv MAC_PROP_PRIVATE . 2108Properties are distinguished by a name, which is a character string. 2109The list of such private properties is defined when registering with mac in the 2110.Fa m_priv_props 2111member of the 2112.Xr mac_register 9S 2113structure. 2114.Pp 2115The driver may define whatever semantics it wants for these private 2116properties. 2117They will not be listed when running 2118.Xr dladm 8 , 2119unless explicitly requested by name. 2120All such properties should start with a leading underscore character and then 2121consist of alphanumeric ASCII characters and additional underscores or hyphens. 
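.Pp
The naming convention above can be illustrated with a small, self-contained
sketch.
It is plain C that compiles outside the kernel and is not taken from any
existing driver.
The property names and the
.Fn example_
helpers are hypothetical, and the NULL-terminated array layout is an assumption
about how a driver would populate
.Fa m_priv_props ;
consult
.Xr mac_register 9S
for the authoritative description.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical private property names for an imaginary driver.  The
 * names are invented for illustration, and the NULL terminator is an
 * assumption about the layout expected for m_priv_props.
 */
static const char *example_priv_props[] = {
	"_rx_intr_throttle",
	"_tx-copy-threshold",
	NULL
};

/* Return true for ASCII letters and digits only, independent of locale. */
static bool
example_is_ascii_alnum(char c)
{
	return ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
	    (c >= '0' && c <= '9'));
}

/*
 * Check one name against the convention described in the text: a
 * leading underscore followed by alphanumeric ASCII characters,
 * underscores, or hyphens.
 */
static bool
example_priv_prop_name_ok(const char *name)
{
	assert(name != NULL);

	if (name[0] != '_')
		return (false);

	for (size_t i = 1; name[i] != '\0'; i++) {
		char c = name[i];

		if (!example_is_ascii_alnum(c) && c != '_' && c != '-')
			return (false);
	}

	return (true);
}

/* Check every entry of a NULL-terminated list of names. */
static bool
example_priv_props_ok(const char **props)
{
	for (size_t i = 0; props[i] != NULL; i++) {
		if (!example_priv_prop_name_ok(props[i]))
			return (false);
	}

	return (true);
}
```

A driver could run a check like this in a debug build before registering, so
that a misnamed property is caught during development rather than by an
operator.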
2122.Pp 2123Properties of type 2124.Dv MAC_PROP_PRIVATE 2125may show up in all three property related entry points: 2126.Xr mc_propinfo 9E , 2127.Xr mc_getprop 9E , 2128and 2129.Xr mc_setprop 9E . 2130Device drivers should tell the different properties apart by using the 2131.Xr strcmp 9F 2132function to compare it to the set of properties that it knows about. 2133When encountering properties that it doesn't know, it should treat them 2134like all other unknown properties. 2135.Sh STATISTICS 2136The MAC framework defines a couple different sets of statistics which 2137are based on various standards for devices to implement. 2138Statistics are retrieved through the 2139.Xr mc_getstat 9E 2140entry point. 2141There are both statistics that are required for all devices and then there is a 2142separate set of Ethernet specific statistics. 2143Not all devices will support every statistic. 2144In many cases, several device registers will need to be combined to create the 2145proper stat. 2146.Pp 2147In general, if the device is not keeping track of these statistics, then 2148it is recommended that the driver store these values as a 2149.Vt uint64_t 2150to ensure that overflow does not occur. 2151.Pp 2152If a device does not support a specific statistic, then it is fine to 2153return that it is not supported. 2154The same should be used for unrecognized statistics. 2155See 2156.Xr mc_getstat 9E 2157for more information on the proper way to handle these. 2158.Ss General Device Statistics 2159The following statistics are based on MIB-II statistics from both RFC 21601213 and RFC 1573. 2161.Bl -tag -width Ds 2162.It Dv MAC_STAT_IFSPEED 2163The device's current speed in bits per second. 2164.It Dv MAC_STAT_MULTIRCV 2165The total number of received multicast packets. 2166.It Dv MAC_STAT_BRDCSTRCV 2167The total number of received broadcast packets. 2168.It Dv MAC_STAT_MULTIXMT 2169The total number of transmitted multicast packets. 
.It Dv MAC_STAT_BRDCSTXMT
The total number of transmitted broadcast packets.
.It Dv MAC_STAT_NORCVBUF
The total number of packets discarded by the hardware due to a lack of
receive buffers.
.It Dv MAC_STAT_IERRORS
The total number of errors detected on input.
.It Dv MAC_STAT_UNKNOWNS
The total number of received packets that were discarded because they
were of an unknown protocol.
.It Dv MAC_STAT_NOXMTBUF
The total number of outgoing packets dropped due to a lack of transmit
buffers.
.It Dv MAC_STAT_OERRORS
The total number of outgoing packets that resulted in errors.
.It Dv MAC_STAT_COLLISIONS
The total number of collisions encountered by the transmitter.
.It Dv MAC_STAT_RBYTES
The total number of bytes received by the device, regardless of packet
type.
.It Dv MAC_STAT_IPACKETS
The total number of packets received by the device, regardless of packet type.
.It Dv MAC_STAT_OBYTES
The total number of bytes transmitted by the device, regardless of packet type.
.It Dv MAC_STAT_OPACKETS
The total number of packets sent by the device, regardless of packet type.
.It Dv MAC_STAT_UNDERFLOWS
The total number of packets that were smaller than the minimum sized
packet for the device and were therefore dropped.
.It Dv MAC_STAT_OVERFLOWS
The total number of packets that were larger than the maximum sized
packet for the device and were therefore dropped.
.El
.Ss Ethernet Specific Statistics
The following statistics are specific to Ethernet devices.
They refer to values from RFC 1643 and include various MII/GMII specific stats.
Many of these are also defined in IEEE 802.3.
.Bl -tag -width Ds
.It Dv ETHER_STAT_ADV_CAP_1000FDX
Indicates that the device is advertising support for 1 Gbit/s
full-duplex operation.
2211.It Dv ETHER_STAT_ADV_CAP_1000HDX 2212Indicates that the device is advertising support for 1 Gbit/s 2213half-duplex operation. 2214.It Dv ETHER_STAT_ADV_CAP_100FDX 2215Indicates that the device is advertising support for 100 Mbit/s 2216full-duplex operation. 2217.It Dv ETHER_STAT_ADV_CAP_100GFDX 2218Indicates that the device is advertising support for 100 Gbit/s 2219full-duplex operation. 2220.It Dv ETHER_STAT_ADV_CAP_100HDX 2221Indicates that the device is advertising support for 100 Mbit/s 2222half-duplex operation. 2223.It Dv ETHER_STAT_ADV_CAP_100T4 2224Indicates that the device is advertising support for 100 Mbit/s 2225100BASE-T4 operation. 2226.It Dv ETHER_STAT_ADV_CAP_10FDX 2227Indicates that the device is advertising support for 10 Mbit/s 2228full-duplex operation. 2229.It Dv ETHER_STAT_ADV_CAP_10GFDX 2230Indicates that the device is advertising support for 10 Gbit/s 2231full-duplex operation. 2232.It Dv ETHER_STAT_ADV_CAP_10HDX 2233Indicates that the device is advertising support for 10 Mbit/s 2234half-duplex operation. 2235.It Dv ETHER_STAT_ADV_CAP_2500FDX 2236Indicates that the device is advertising support for 2.5 Gbit/s 2237full-duplex operation. 2238.It Dv ETHER_STAT_ADV_CAP_40GFDX 2239Indicates that the device is advertising support for 40 Gbit/s 2240full-duplex operation. 2241.It Dv ETHER_STAT_ADV_CAP_5000FDX 2242Indicates that the device is advertising support for 5.0 Gbit/s 2243full-duplex operation. 2244.It Dv ETHER_STAT_ADV_CAP_ASMPAUSE 2245Indicates that the device is advertising support for receiving pause 2246frames. 2247.It Dv ETHER_STAT_ADV_CAP_AUTONEG 2248Indicates that the device is advertising support for auto-negotiation. 2249.It Dv ETHER_STAT_ADV_CAP_PAUSE 2250Indicates that the device is advertising support for generating pause 2251frames. 2252.It Dv ETHER_STAT_ADV_REMFAULT 2253Indicates that the device is advertising support for detecting faults in 2254the remote link peer. 
2255.It Dv ETHER_STAT_ALIGN_ERRORS 2256Indicates the number of times an alignment error was generated by the 2257Ethernet device. 2258This is a count of packets that were not an integral number of octets and failed 2259the FCS check. 2260.It Dv ETHER_STAT_CAP_1000FDX 2261Indicates the device supports 1 Gbit/s full-duplex operation. 2262.It Dv ETHER_STAT_CAP_1000HDX 2263Indicates the device supports 1 Gbit/s half-duplex operation. 2264.It Dv ETHER_STAT_CAP_100FDX 2265Indicates the device supports 100 Mbit/s full-duplex operation. 2266.It Dv ETHER_STAT_CAP_100GFDX 2267Indicates the device supports 100 Gbit/s full-duplex operation. 2268.It Dv ETHER_STAT_CAP_100HDX 2269Indicates the device supports 100 Mbit/s half-duplex operation. 2270.It Dv ETHER_STAT_CAP_100T4 2271Indicates the device supports 100 Mbit/s 100BASE-T4 operation. 2272.It Dv ETHER_STAT_CAP_10FDX 2273Indicates the device supports 10 Mbit/s full-duplex operation. 2274.It Dv ETHER_STAT_CAP_10GFDX 2275Indicates the device supports 10 Gbit/s full-duplex operation. 2276.It Dv ETHER_STAT_CAP_10HDX 2277Indicates the device supports 10 Mbit/s half-duplex operation. 2278.It Dv ETHER_STAT_CAP_2500FDX 2279Indicates the device supports 2.5 Gbit/s full-duplex operation. 2280.It Dv ETHER_STAT_CAP_40GFDX 2281Indicates the device supports 40 Gbit/s full-duplex operation. 2282.It Dv ETHER_STAT_CAP_5000FDX 2283Indicates the device supports 5.0 Gbit/s full-duplex operation. 2284.It Dv ETHER_STAT_CAP_ASMPAUSE 2285Indicates that the device supports the ability to receive pause frames. 2286.It Dv ETHER_STAT_CAP_AUTONEG 2287Indicates that the device supports the ability to perform link 2288auto-negotiation. 2289.It Dv ETHER_STAT_CAP_PAUSE 2290Indicates that the device supports the ability to transmit pause frames. 2291.It Dv ETHER_STAT_CAP_REMFAULT 2292Indicates that the device supports the ability of detecting a remote 2293fault in a link peer. 
2294.It Dv ETHER_STAT_CARRIER_ERRORS 2295Indicates the number of times that the Ethernet carrier sense condition 2296was lost or not asserted. 2297.It Dv ETHER_STAT_DEFER_XMTS 2298Indicates the number of frames for which the device was unable to 2299transmit the frame due to being busy and had to try again. 2300.It Dv ETHER_STAT_EX_COLLISIONS 2301Indicates the number of frames that failed to send due to an excessive 2302number of collisions. 2303.It Dv ETHER_STAT_FCS_ERRORS 2304Indicates the number of times that a frame check sequence failed. 2305.It Dv ETHER_STAT_FIRST_COLLISIONS 2306Indicates the number of times that a frame was eventually transmitted 2307successfully, but only after a single collision. 2308.It Dv ETHER_STAT_JABBER_ERRORS 2309Indicates the number of frames that were received that were both larger 2310than the maximum packet size and failed the frame check sequence. 2311.It Dv ETHER_STAT_LINK_ASMPAUSE 2312Indicates whether the link is currently configured to accept pause 2313frames. 2314.It Dv ETHER_STAT_LINK_AUTONEG 2315Indicates whether the current link state is a result of 2316auto-negotiation. 2317.It Dv ETHER_STAT_LINK_DUPLEX 2318Indicates the current duplex state of the link. 2319The values used here should be the same as documented for 2320.Dv MAC_PROP_DUPLEX . 2321.It Dv ETHER_STAT_LINK_PAUSE 2322Indicates whether the link is currently configured to generate pause 2323frames. 2324.It Dv ETHER_STAT_LP_CAP_1000FDX 2325Indicates the remote device supports 1 Gbit/s full-duplex operation. 2326.It Dv ETHER_STAT_LP_CAP_1000HDX 2327Indicates the remote device supports 1 Gbit/s half-duplex operation. 2328.It Dv ETHER_STAT_LP_CAP_100FDX 2329Indicates the remote device supports 100 Mbit/s full-duplex operation. 2330.It Dv ETHER_STAT_LP_CAP_100GFDX 2331Indicates the remote device supports 100 Gbit/s full-duplex operation. 2332.It Dv ETHER_STAT_LP_CAP_100HDX 2333Indicates the remote device supports 100 Mbit/s half-duplex operation. 
2334.It Dv ETHER_STAT_LP_CAP_100T4 2335Indicates the remote device supports 100 Mbit/s 100BASE-T4 operation. 2336.It Dv ETHER_STAT_LP_CAP_10FDX 2337Indicates the remote device supports 10 Mbit/s full-duplex operation. 2338.It Dv ETHER_STAT_LP_CAP_10GFDX 2339Indicates the remote device supports 10 Gbit/s full-duplex operation. 2340.It Dv ETHER_STAT_LP_CAP_10HDX 2341Indicates the remote device supports 10 Mbit/s half-duplex operation. 2342.It Dv ETHER_STAT_LP_CAP_2500FDX 2343Indicates the remote device supports 2.5 Gbit/s full-duplex operation. 2344.It Dv ETHER_STAT_LP_CAP_40GFDX 2345Indicates the remote device supports 40 Gbit/s full-duplex operation. 2346.It Dv ETHER_STAT_LP_CAP_5000FDX 2347Indicates the remote device supports 5.0 Gbit/s full-duplex operation. 2348.It Dv ETHER_STAT_LP_CAP_ASMPAUSE 2349Indicates that the remote device supports the ability to receive pause 2350frames. 2351.It Dv ETHER_STAT_LP_CAP_AUTONEG 2352Indicates that the remote device supports the ability to perform link 2353auto-negotiation. 2354.It Dv ETHER_STAT_LP_CAP_PAUSE 2355Indicates that the remote device supports the ability to transmit pause 2356frames. 2357.It Dv ETHER_STAT_LP_CAP_REMFAULT 2358Indicates that the remote device supports the ability of detecting a 2359remote fault in a link peer. 2360.It Dv ETHER_STAT_MACRCV_ERRORS 2361Indicates the number of times that the internal MAC layer encountered an 2362error when attempting to receive and process a frame. 2363.It Dv ETHER_STAT_MACXMT_ERRORS 2364Indicates the number of times that the internal MAC layer encountered an 2365error when attempting to process and transmit a frame. 2366.It Dv ETHER_STAT_MULTI_COLLISIONS 2367Indicates the number of times that a frame was eventually transmitted 2368successfully, but only after more than one collision. 2369.It Dv ETHER_STAT_SQE_ERRORS 2370Indicates the number of times that an SQE error occurred. 2371The specific conditions for this error are documented in IEEE 802.3. 
.It Dv ETHER_STAT_TOOLONG_ERRORS
Indicates the number of frames that were received that were longer than
the maximum frame size supported by the device.
.It Dv ETHER_STAT_TOOSHORT_ERRORS
Indicates the number of frames that were received that were shorter than
the minimum frame size supported by the device.
.It Dv ETHER_STAT_TX_LATE_COLLISIONS
Indicates the number of times a collision was detected late on the
device.
.It Dv ETHER_STAT_XCVR_ADDR
Indicates the address of the MII/GMII transceiver.
.It Dv ETHER_STAT_XCVR_ID
Indicates the ID of the MII/GMII transceiver.
.It Dv ETHER_STAT_XCVR_INUSE
Indicates what kind of transceiver is in use.
Use the
.Vt mac_ether_media_t
enumeration values described in the discussion of
.Dv MAC_PROP_MEDIA
above.
These definitions are compatible with the older subset of
XCVR_* macros.
.El
.Ss Device Specific kstats
In addition to the defined statistics above, if the device driver
maintains additional statistics or the device provides additional
statistics, it should create its own kstats through the
.Xr kstat_create 9F
function to allow operators to observe them.
.Sh RECEIVE DESCRIPTOR LAYOUT
One of the important things that a device driver must do is lay out DMA
memory, generally in a ring of descriptors, into which received Ethernet
frames will be placed.
When performing this, there are a few things that drivers should
generally do:
.Bl -enum -offset indent
.It
Drivers should lay out memory so that the IP header will be 4-byte
aligned.
The IP stack expects that the beginning of an IP header will be at a
4-byte aligned address; however, a DMA allocation will be at a 4-
or 8-byte aligned address by default.
The IP header is at a 14-byte offset from the beginning of the Ethernet
frame, leaving the IP header at a 2-byte alignment if the Ethernet frame
starts at the beginning of the DMA buffer.
If VLAN tagging is in place, then each VLAN tag adds 4 bytes, which
doesn't change the alignment the IP header is found at.
.Pp
As a solution to this, the driver should program the device to start
placing the received Ethernet frame at two bytes off of the start of the
DMA buffer.
This ensures that the IP header is 4-byte aligned whether or not VLAN tags
are present.
.It
Drivers should try to allocate the DMA memory used for receiving frames
as a contiguous buffer.
If for some reason that would not be possible, the driver should try to
ensure that there is enough space for the Ethernet header and
any possible layer three and layer four headers
.Pq such as IP, TCP, or UDP
in the initial descriptor.
.It
As discussed in the
.Sx MBLKS AND DMA
section, there are multiple strategies for managing the relationship
between DMA data, receive descriptors, and the operating system
representation of a packet in the
.Xr mblk 9S
structure.
Drivers must limit their resource consumption.
See the
.Sy Considerations
section of
.Sx MBLKS AND DMA
for more on this.
.El
.Sh TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
Device drivers are the first line of defense for dealing with broken
devices and bugs in their firmware.
While most devices will rarely fail, it is important to pay particular
attention to RAS (Reliability, Availability, and Serviceability) when
designing and implementing a device driver.
While everything described in this section is optional, it is highly recommended
that all new device drivers follow these guidelines.
2456.Pp 2457The Fault Management Architecture (FMA) provides facilities for 2458detecting and reporting various classes of defects and faults. 2459Specifically for networking device drivers, issues that should be 2460detected and reported include: 2461.Bl -bullet -offset indent 2462.It 2463Device internal uncorrectable errors 2464.It 2465Device internal correctable errors 2466.It 2467PCI and PCI Express transport errors 2468.It 2469Device temperature alarms 2470.It 2471Device transmission stalls 2472.It 2473Device communication timeouts 2474.It 2475High invalid interrupts 2476.El 2477.Pp 2478All such errors fall into three primary categories: 2479.Bl -enum -offset indent 2480.It 2481Errors detected by the Fault Management Architecture 2482.It 2483Errors detected by the device and indicated to the device driver 2484.It 2485Errors detected by the device driver 2486.El 2487.Ss Fault Management Setup and Teardown 2488Drivers should initialize support for the fault management framework by 2489calling 2490.Xr ddi_fm_init 9F 2491from their 2492.Xr attach 9E 2493routine. 2494By registering with the fault management framework, a device driver is given the 2495chance to detect and notice transport errors as well as report other errors that 2496exist. 2497While a device driver does not need to indicate that it is capable of all such 2498capabilities described in 2499.Xr ddi_fm_init 9F , 2500we suggest that device drivers at least register the 2501.Dv DDI_FM_EREPORT_CAPABLE 2502so as to allow the driver to report issues that it detects. 2503.Pp 2504If the driver registers with the fault management framework during its 2505.Xr attach 9E 2506entry point, it must call 2507.Xr ddi_fm_fini 9F 2508during its 2509.Xr detach 9E 2510entry point. 2511.Ss Transport Errors 2512Many modern networking devices leverage PCI or PCI Express. 
2513As such, there are two primary ways that device drivers access data: they either 2514memory map device registers and use routines like 2515.Xr ddi_get8 9F 2516and 2517.Xr ddi_put8 9F 2518or they use direct memory access (DMA). 2519New device drivers should always enable checking of the transport layer by 2520marking their support in the 2521.Xr ddi_device_acc_attr 9S 2522structure and using routines like 2523.Xr ddi_fm_acc_err_get 9F 2524and 2525.Xr ddi_fm_dma_err_get 9F 2526to detect if errors have occurred. 2527.Ss Device Indicated Errors 2528Many devices have capabilities to announce to a device driver that a 2529fatal correctable error or uncorrectable error has occurred. 2530Other devices have the ability to indicate that various physical issues have 2531occurred such as a fan failing or a temperature sensor having fired. 2532.Pp 2533Drivers should wire themselves to receive notifications when these 2534events occur. 2535The means and capabilities will vary from device to device. 2536For example, some devices will generate information about these notifications 2537through special interrupts. 2538Other devices may have a register that software can poll. 2539In the cases where polling is required, driver writers should try not to poll 2540too frequently and should generally only poll when the device is actively being 2541used, e.g. between calls to the 2542.Xr mc_start 9E 2543and 2544.Xr mc_stop 9E 2545entry points. 2546.Ss Driver Transmit Stall Detection 2547One of the primary responsibilities of a hardened device driver is to 2548perform transmit stall detection. 2549The core idea behind tx stall detection is that the driver should record when 2550it's getting activity related to when data has been successfully transmitted. 2551Most devices should be transmitting data on a regular basis as long as the link 2552is up. 2553If it is not, then this may indicate that the device is stuck and needs to be 2554reset. 
At this time, the MAC framework does not provide any resources for performing
these checks; however, polling on each individual transmit ring for the last
completion time while something is actively being transmitted through the use of
routines such as
.Xr timeout 9F
may be a reasonable starting point.
.Ss Driver Command Timeout Detection
Different devices are programmed in different ways.
Some devices are programmed through asynchronous commands while others are
programmed by writing directly to memory mapped registers.
If a device sends asynchronous replies to commands, then the device driver
should set reasonable timeouts for all such commands and plan on detecting when
they expire.
If a timeout occurs, the driver should presume that there is an issue with the
hardware and proceed to abort the command or reset the device.
.Pp
Many devices do not have such a communication mechanism.
However, whenever there is some activity where the device driver must wait, then
it should be prepared for the fact that the device may never get back to
it and react appropriately by performing some kind of device reset.
.Ss Reacting to Errors
When any of the above categories of errors has been triggered, the
behavior that the device driver should take depends on the kind of
error.
If a fatal error has occurred, for example, a transport error or a transmit
stall was detected, or the device indicated an uncorrectable error, then it is
important that the driver take the following steps:
.Bl -enum -offset indent
.It
Set a flag in the device driver's state that indicates that it has hit
an error condition.
When this error condition flag is asserted, transmitted packets should be
accepted and dropped and actions that would require writing to the device state
should fail with an error.
This flag should remain until the device has been successfully restarted.
.It
If the error was not a transport error that the fault management
architecture itself detected and indicated to the driver, then
the device driver should post an
.Sy ereport
indicating what has occurred with the
.Xr ddi_fm_ereport_post 9F
function.
.It
The device driver should indicate that the device's service was lost
with a call to
.Xr ddi_fm_service_impact 9F
using the symbol
.Dv DDI_SERVICE_LOST .
.It
At this point the device driver should issue a device reset through some
device-specific means.
.It
When the device reset has been completed, then the device driver should
restore all of the programmed state to the device.
This includes things like the current MTU, advertised auto-negotiation speeds,
MAC address filters, and more.
.It
Finally, when service has been restored, the device driver should call
.Xr ddi_fm_service_impact 9F
using the symbol
.Dv DDI_SERVICE_RESTORED .
.El
.Pp
When a non-fatal error occurs, then the device driver should submit an
ereport and should optionally mark the device degraded using
.Xr ddi_fm_service_impact 9F
with the
.Dv DDI_SERVICE_DEGRADED
value depending on the nature of the problem that has occurred.
.Pp
Device drivers should never make the decision to remove a device from
service based on errors that have occurred nor should they panic the
system.
Rather, the device driver should always try to notify the operating system with
various ereports and allow it to make its policy decisions.
The decision to retire a device lies in the hands of the fault management
architecture.
It knows more about the operator's intent and the surrounding system's state
than the device driver itself does and it will make the call to offline and
retire the device if it is required.
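.Pp
The ordering of the fatal-error steps above can be modeled in ordinary C.
The following is a minimal userland sketch, not driver code.
Every type and function in it, such as
.Vt ex_softc_t
and
.Fn ex_fatal_begin ,
is hypothetical, and the calls to
.Xr ddi_fm_ereport_post 9F
and
.Xr ddi_fm_service_impact 9F
are represented only by a counter and an enumeration.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the ddi_fm_service_impact(9F) values. */
typedef enum {
	EX_SERVICE_UNAFFECTED,
	EX_SERVICE_LOST,
	EX_SERVICE_RESTORED
} ex_impact_t;

/* Hypothetical per-instance driver state; every name is illustrative. */
typedef struct ex_softc {
	bool		ex_failed;	/* error-condition flag (step 1) */
	unsigned	ex_mtu;		/* saved copy of the programmed MTU */
	unsigned	ex_hw_mtu;	/* models the MTU held by hardware */
	unsigned	ex_ereports;	/* ereports the driver has posted */
	unsigned	ex_tx_dropped;	/* frames accepted and dropped */
	ex_impact_t	ex_impact;	/* last service impact reported */
} ex_softc_t;

/* Returns true only if the frame would have reached the hardware. */
bool
ex_tx(ex_softc_t *sc)
{
	if (sc->ex_failed) {
		sc->ex_tx_dropped++;	/* accept the frame, then drop it */
		return (false);
	}
	return (true);
}

/* Operations that write device state fail while the flag is set. */
int
ex_set_mtu(ex_softc_t *sc, unsigned mtu)
{
	if (sc->ex_failed)
		return (-1);
	sc->ex_mtu = sc->ex_hw_mtu = mtu;
	return (0);
}

/* Steps 1 through 3: set the flag, report, and mark service lost. */
void
ex_fatal_begin(ex_softc_t *sc, bool already_reported)
{
	sc->ex_failed = true;
	if (!already_reported)
		sc->ex_ereports++;	/* models posting an ereport */
	sc->ex_impact = EX_SERVICE_LOST;
}

/* Steps 4 through 6: reset, reprogram saved state, mark restored. */
void
ex_fatal_finish(ex_softc_t *sc)
{
	sc->ex_hw_mtu = 0;		/* the reset wipes hardware state */
	sc->ex_hw_mtu = sc->ex_mtu;	/* restore from the saved copy */
	sc->ex_impact = EX_SERVICE_RESTORED;
	sc->ex_failed = false;
}
```
.Ed
.Pp
The model splits recovery into two functions so that the window in which
frames are dropped and setters fail is visible to a caller.
A real driver would also need locking around the flag, which this sketch
leaves out.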
.Ss Device Resets
When resetting a device, a device driver must exercise caution.
If a device driver has not been written to plan for a device reset, then it
may not correctly restore the device's state after such a reset.
Such state should be stored in the instance's private state data as the MAC
framework does not know about device resets and will not inform the
device again about the expected, programmed state.
.Pp
One wrinkle with device resets is that many networking cards show up as
multiple PCI functions on a single device, for example, each port may
show up as a separate function and thus have a separate instance of the
device driver attached.
When resetting a function, device driver writers should carefully read the
device programming manuals and verify whether or not a reset impacts only the
stalled function or if it impacts all functions across the device.
.Pp
If the only way to reset a given function is to reset the entire device, then
this may require more coordination and work on the part of the device
driver to ensure that all the other instances are correctly restored.
In cases where this occurs, some devices offer ways of injecting
interrupts onto those other functions to notify them that this is
occurring.
.Sh MBLKS AND DMA
The networking stack manages framed data through the use of the
.Xr mblk 9S
structure.
The mblk allows for a single message to be made up of individual blocks.
Each block is linked to the next through its
.Fa b_cont
member.
However, it also allows for multiple messages to be chained together through the
use of the
.Fa b_next
member.
While the networking stack works with these structures, device drivers generally
work with DMA regions.
There are two different strategies that device drivers use for moving data
between the two: copying and binding.
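.Pp
The two kinds of linkage described above can be illustrated with a small
userland model.
This is a sketch under stated assumptions:
.Vt ex_mblk_t
is a hypothetical, simplified stand-in that borrows only member names from
.Xr mblk 9S ,
and
.Fn ex_msg_len
and
.Fn ex_chain_count
are illustrative helpers, not kernel routines.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for mblk_t.  Only the member
 * names are borrowed from mblk(9S); this is not the kernel layout.
 */
typedef struct ex_mblk {
	struct ex_mblk	*b_next;	/* next message in the chain */
	struct ex_mblk	*b_cont;	/* next block of this message */
	unsigned char	*b_rptr;	/* first valid byte in the block */
	unsigned char	*b_wptr;	/* one past the last valid byte */
} ex_mblk_t;

/* Sum the valid bytes of a single message by following b_cont. */
size_t
ex_msg_len(const ex_mblk_t *mp)
{
	size_t len = 0;

	for (; mp != NULL; mp = mp->b_cont)
		len += (size_t)(mp->b_wptr - mp->b_rptr);
	return (len);
}

/* Count the messages in a chain by following b_next. */
size_t
ex_chain_count(const ex_mblk_t *mp)
{
	size_t n = 0;

	for (; mp != NULL; mp = mp->b_next)
		n++;
	return (n);
}
```
.Ed
.Pp
A transmit routine would typically walk the outer
.Fa b_next
chain one message at a time and, for each message, walk the inner
.Fa b_cont
list to gather every block of that frame.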
.Ss Copying Data
The first way that device drivers handle interfacing between the two is
by having two separate regions of memory.
One part is memory which has been allocated for DMA through a call to
.Xr ddi_dma_mem_alloc 9F
and the other is memory associated with the message block.
.Pp
In this case, a driver will use
.Xr bcopy 9F
to copy memory between the two distinct regions.
When transmitting a packet, it will copy the memory from the mblk_t to the DMA
region.
When receiving data, it will allocate a mblk_t through the
.Xr allocb 9F
routine, copy the memory across with
.Xr bcopy 9F ,
and then advance the mblk_t's
.Fa b_wptr
member by the number of bytes copied.
.Pp
If, when receiving, memory is not available for a new message block,
then the frame should be skipped and effectively dropped.
A kstat should be bumped when such an occasion occurs.
.Ss Binding Data
An alternative approach to copying data is to use DMA binding.
When using DMA binding, the OS takes care of mapping between DMA memory and
normal kernel memory.
The exact process is a bit different between transmit and receive.
.Pp
When transmitting, a device driver has an mblk_t and needs to call the
.Xr ddi_dma_addr_bind_handle 9F
function to bind it to an already existing DMA handle.
At that point, it will receive various DMA cookies that it can use to obtain the
addresses to program the device with for transmitting data.
Once the transmit is done, the driver must then make sure to call
.Xr freemsg 9F
to release the data.
It must not call
.Xr freemsg 9F
before it receives an interrupt from the device indicating that the data
has been transmitted, otherwise it risks sending arbitrary kernel
memory.
.Pp
When receiving data, the device can perform a similar operation.
First, it must bind the DMA memory into the kernel's virtual memory address
space through a call to the
.Xr ddi_dma_addr_bind_handle 9F
function if it has not already.
Once it has, it must then call
.Xr desballoc 9F
to try and create a new mblk_t which leverages the associated memory.
It can then pass that mblk_t up to the stack.
.Ss Considerations
When deciding which of these options to use, there are many different
considerations that must be made.
The answer as to whether to bind memory or to copy data is not always simple.
.Pp
The first thing to remember is that DMA resources may be finite on a
given platform.
Consider the case of receiving data.
A device driver that binds one of its receive descriptors may not get it back
for quite some time as it may be used by the kernel until an application
actually consumes it.
Device drivers that try to bind memory for receive often work with the
constraint that they must be able to replace that DMA memory with another DMA
descriptor.
If they were not replaced, then eventually the device would not be able to
receive additional data into the ring.
.Pp
On the other hand, particularly for larger frames, copying every packet
from one buffer to another can be a source of additional latency and
memory waste in the system.
For larger copies, the cost of copying may dwarf any potential cost of
performing DMA binding.
.Pp
Device driver authors that are unsure of what to do should
first employ the copying method to simplify the act of writing the
device driver.
The copying method is simpler and also allows the device driver author not to
worry about allocated DMA memory that is still outstanding when it is asked to
unload.
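.Pp
As a rough illustration of the copying method recommended above, the
receive side can be modeled in userland C.
This is only a sketch:
.Vt ex_mblk_t ,
.Vt ex_stats_t ,
and
.Fn ex_rx_copy
are hypothetical names, the allocator hook stands in for
.Xr allocb 9F ,
.Fn memcpy
stands in for
.Xr bcopy 9F ,
and the counter stands in for a kstat.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a message block. */
typedef struct ex_mblk {
	unsigned char	*b_rptr;	/* first valid byte */
	unsigned char	*b_wptr;	/* one past the last valid byte */
	unsigned char	b_data[];	/* storage, inline for brevity */
} ex_mblk_t;

/* Hypothetical counter that a driver would expose through a kstat. */
typedef struct ex_stats {
	unsigned long	es_rx_alloc_fail;
} ex_stats_t;

/* Allocator hook so that an allocation failure can be simulated. */
typedef void *(*ex_alloc_f)(size_t);

/*
 * Copy one received frame out of the DMA buffer into a new block.
 * Returns NULL, after bumping the counter, if no memory is available.
 */
ex_mblk_t *
ex_rx_copy(const unsigned char *dma, size_t len, ex_alloc_f alloc,
    ex_stats_t *stats)
{
	ex_mblk_t *mp = alloc(sizeof (*mp) + len);

	if (mp == NULL) {
		stats->es_rx_alloc_fail++;	/* the frame is dropped */
		return (NULL);
	}
	mp->b_rptr = mp->b_wptr = mp->b_data;
	memcpy(mp->b_wptr, dma, len);	/* copy out of the DMA region */
	mp->b_wptr += len;		/* mark the copied bytes valid */
	return (mp);
}

/* Allocator that always fails, used to exercise the drop path. */
void *
ex_alloc_fail(size_t size)
{
	(void) size;
	return (NULL);
}
```
.Ed
.Pp
Because the DMA buffer is never handed to the stack in this model, it can be
given back to the device as soon as the copy completes, which is what keeps
the copying method simple at unload time.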
.Pp
If device driver writers are worried about the cost, it is recommended
to make the decision as to whether or not to copy or bind DMA data
a separate private property for both transmitting and receiving.
That private property should indicate the size of the frame at which
to switch from one method to the other.
This way, data can be gathered to determine what the impact of each method is on
a given platform.
.Sh SEE ALSO
.Xr dlpi 4P ,
.Xr driver.conf 5 ,
.Xr ieee802.3 7 ,
.Xr dladm 8 ,
.Xr _fini 9E ,
.Xr _info 9E ,
.Xr _init 9E ,
.Xr attach 9E ,
.Xr close 9E ,
.Xr detach 9E ,
.Xr mac_capab_led 9E ,
.Xr mac_capab_rings 9E ,
.Xr mac_capab_transceiver 9E ,
.Xr mc_close 9E ,
.Xr mc_getcapab 9E ,
.Xr mc_getprop 9E ,
.Xr mc_getstat 9E ,
.Xr mc_multicst 9E ,
.Xr mc_open 9E ,
.Xr mc_propinfo 9E ,
.Xr mc_setpromisc 9E ,
.Xr mc_setprop 9E ,
.Xr mc_start 9E ,
.Xr mc_stop 9E ,
.Xr mc_tx 9E ,
.Xr mc_unicst 9E ,
.Xr open 9E ,
.Xr allocb 9F ,
.Xr bcopy 9F ,
.Xr ddi_dma_addr_bind_handle 9F ,
.Xr ddi_dma_mem_alloc 9F ,
.Xr ddi_fm_acc_err_get 9F ,
.Xr ddi_fm_dma_err_get 9F ,
.Xr ddi_fm_ereport_post 9F ,
.Xr ddi_fm_fini 9F ,
.Xr ddi_fm_init 9F ,
.Xr ddi_fm_service_impact 9F ,
.Xr ddi_get8 9F ,
.Xr ddi_put8 9F ,
.Xr desballoc 9F ,
.Xr freemsg 9F ,
.Xr kstat_create 9F ,
.Xr mac_alloc 9F ,
.Xr mac_devt_to_instance 9F ,
.Xr mac_fini_ops 9F ,
.Xr mac_free 9F ,
.Xr mac_getinfo 9F ,
.Xr mac_hcksum_get 9F ,
.Xr mac_hcksum_set 9F ,
.Xr mac_init_ops 9F ,
.Xr mac_link_update 9F ,
.Xr mac_lso_get 9F ,
.Xr mac_maxsdu_update 9F ,
.Xr mac_private_minor 9F ,
.Xr mac_prop_info_set_default_link_flowctrl 9F ,
.Xr mac_prop_info_set_default_str 9F ,
.Xr mac_prop_info_set_default_uint32 9F ,
.Xr mac_prop_info_set_default_uint64 9F ,
.Xr mac_prop_info_set_default_uint8 9F ,
.Xr mac_prop_info_set_perm 9F ,
.Xr mac_prop_info_set_range_uint32 9F ,
.Xr mac_register 9F ,
.Xr mac_rx 9F ,
.Xr mac_unregister 9F ,
.Xr mod_install 9F ,
.Xr mod_remove 9F ,
.Xr strcmp 9F ,
.Xr timeout 9F ,
.Xr cb_ops 9S ,
.Xr ddi_device_acc_attr 9S ,
.Xr dev_ops 9S ,
.Xr mac_callbacks 9S ,
.Xr mac_register 9S ,
.Xr mblk 9S ,
.Xr modldrv 9S ,
.Xr modlinkage 9S
.Rs
.%A McCloghrie, K.
.%A Rose, M.
.%T RFC 1213 Management Information Base for Network Management of
.%T TCP/IP-based internets: MIB-II
.%D March 1991
.Re
.Rs
.%A McCloghrie, K.
.%A Kastenholz, F.
.%T RFC 1573 Evolution of the Interfaces Group of MIB-II
.%D January 1994
.Re
.Rs
.%A Kastenholz, F.
.%T RFC 1643 Definitions of Managed Objects for the Ethernet-like
.%T Interface Types
.Re
.Rs
.%A IEEE Computer Standard
.%T IEEE 802.3
.%T IEEE Standard for Ethernet
.%D 2022
.Re