.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2019 Joyent, Inc.
.\" Copyright 2020 RackTop Systems, Inc.
.\" Copyright 2023 Oxide Computer Company
.\" Copyright 2023 Jason King
.\"
.Dd June 22, 2023
.Dt MAC 9E
.Os
.Sh NAME
.Nm mac ,
.Nm GLDv3
.Nd MAC networking device driver overview
.Sh SYNOPSIS
.In sys/mac_provider.h
.In sys/mac_ether.h
.Sh INTERFACE LEVEL
illumos DDI specific
.Sh DESCRIPTION
The
.Sy MAC
framework provides a means for implementing high-performance networking
device drivers.
It is the successor to the GLD interfaces and is sometimes referred to as the
GLDv3.
The remainder of this manual introduces the aspects of writing device drivers
that leverage the MAC framework.
While both the GLDv3 and MAC framework refer to the same thing, in this manual
page we use the term
.Em MAC framework
to refer to the device driver interface.
.Pp
MAC device drivers are character devices.
They define the standard
.Xr _init 9E ,
.Xr _fini 9E ,
and
.Xr _info 9E
entry points to initialize the module, as well as
.Xr dev_ops 9S
and
.Xr cb_ops 9S
structures.
.Pp
The main interface with MAC is through a series of callbacks defined in
a
.Xr mac_callbacks 9S
structure.
These callbacks control all the aspects of the device.
They range from sending data and getting and setting properties to controlling
MAC address filters and managing promiscuous mode.
.Pp
The MAC framework takes care of many aspects of the device driver's
management.
A device that uses the MAC framework does not have to worry about creating
device nodes or implementing
.Xr open 9E
or
.Xr close 9E
routines.
In addition, all of the work to interact with
.Xr dlpi 4P
is taken care of automatically and transparently.
.Ss High-Level Design
At a high level, a device driver is chiefly concerned with three general
operations:
.Bl -enum -offset indent
.It
Sending frames
.It
Receiving frames
.It
Managing device configuration and metadata
.El
.Pp
When sending frames, the MAC framework always calls functions registered
in the
.Xr mac_callbacks 9S
structure to have the driver transmit frames on hardware.
When receiving frames, the driver will generally receive an interrupt which will
cause it to check for incoming data and deliver it to the MAC framework.
.Pp
Configuration of a device, such as whether auto-negotiation should be
enabled, the speeds that the device supports, the MTU (maximum
transmission unit), and the generation of pause frames are all driven by
properties.
The functions to get, set, and obtain information about properties are
defined through callback functions specified in the
.Xr mac_callbacks 9S
structure.
The full list of properties and a description of the relevant callbacks
can be found in the
.Sx PROPERTIES
section.
.Pp
The MAC framework is designed to take advantage of various modern
features provided by hardware, such as checksumming, segmentation
offload, and hardware filtering.
The MAC framework assumes none of these advanced features are present
and allows device drivers to negotiate them through a capability system.
Drivers can declare that they support various capabilities by
implementing the optional
.Xr mc_getcapab 9E
entry point.
Each capability has its associated entry points and structures to fill
out.
The capabilities are detailed in the
.Sx CAPABILITIES
section.
.Pp
The following sections describe the flow of a basic device driver.
For advanced device drivers, the flow is generally the same.
The primary distinction is in how frames are sent and received.
.Ss Initializing MAC Support
For a device to be used by the MAC framework, it must register with the
framework and take specific actions during
.Xr _init 9E ,
.Xr attach 9E ,
.Xr detach 9E ,
and
.Xr _fini 9E .
.Pp
All device drivers have to define a
.Xr dev_ops 9S
structure which is pointed to by a
.Xr modldrv 9S
structure and the corresponding NULL-terminated
.Xr modlinkage 9S
structure.
The
.Xr dev_ops 9S
structure should have a
.Xr cb_ops 9S
structure defined for it; however, it does not need to implement any of
the standard
.Xr cb_ops 9S
entry points unless it also exposes a custom set of device nodes not
otherwise managed by the MAC framework.
See the
.Sx Custom Device Nodes
section for more details.
.Pp
Normally, in a driver's
.Xr _init 9E
entry point, it passes its
.Xr modlinkage 9S
structure directly to
.Xr mod_install 9F .
To properly register with MAC, the driver must call
.Xr mac_init_ops 9F
before it calls
.Xr mod_install 9F .
If for some reason the
.Xr mod_install 9F
function fails, then the driver must be removed by a call to
.Xr mac_fini_ops 9F .
.Pp
Conversely, in the driver's
.Xr _fini 9E
routine, it should call
.Xr mac_fini_ops 9F
after it successfully calls
.Xr mod_remove 9F .
For an example of how to use the
.Xr mac_init_ops 9F
and
.Xr mac_fini_ops 9F
functions, see the examples section in
.Xr mac_init_ops 9F .
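.Pp
The ordering described above can be sketched as follows.
This is a minimal illustration rather than a complete driver: the
.Sy example
name and the
.Va example_dev_ops
and
.Va example_modlinkage
symbols are hypothetical and are assumed to be defined elsewhere in the driver.
.Bd -literal -offset indent
#include <sys/devops.h>
#include <sys/modctl.h>
#include <sys/mac_provider.h>

/* Hypothetical; assumed to be defined elsewhere in the driver. */
extern struct dev_ops example_dev_ops;
extern struct modlinkage example_modlinkage;

int
_init(void)
{
	int ret;

	/* This must happen before mod_install(). */
	mac_init_ops(&example_dev_ops, "example");
	ret = mod_install(&example_modlinkage);
	if (ret != 0) {
		/* Undo mac_init_ops() if the module failed to install. */
		mac_fini_ops(&example_dev_ops);
	}

	return (ret);
}

int
_fini(void)
{
	int ret;

	ret = mod_remove(&example_modlinkage);
	if (ret == 0) {
		/* Only tear down after a successful mod_remove(). */
		mac_fini_ops(&example_dev_ops);
	}

	return (ret);
}
.Ed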
.Ss Custom Device Nodes
A device may want to provide its own minor nodes as simple character or block
devices backed by the usual
.Xr cb_ops 9S
routines.
The MAC framework allows for this by leaving a portion of the minor
number space available for private driver use.
.Xr mac_private_minor 9F
returns the first minor number a driver may use for its own purposes,
e.g., to pass to
.Xr ddi_create_minor_node 9F .
.Pp
A driver making use of this ability must provide its own
.Xr getinfo 9E
implementation that is aware of any such minor nodes.
It must also delegate back to the MAC framework as appropriate via either
calls to
.Xr mac_getinfo 9F
or
.Xr mac_devt_to_instance 9F
for MAC reserved minor nodes.
It should also take care not to affect MAC reserved minors.
For example, it must not remove all minor nodes associated with a device with a
call such as:
.Bd -literal -offset indent
    ddi_remove_minor_node(dip, NULL);
.Ed
.Ss Registering with MAC
Every instance of a device should register separately with MAC.
To register with MAC, a driver must allocate a
.Xr mac_register 9S
structure, fill it in, and then call
.Xr mac_register 9F .
The
.Vt mac_register_t
structure contains information about the device and all of the required
function pointers that will be used as callbacks by the framework.
.Pp
These steps should all be taken during a device's
.Xr attach 9E
entry point.
It is recommended that the driver perform this sequence of steps after the
device has finished its initialization of the chipset and interrupts, though
interrupts should not be enabled at that point.
After it calls
.Xr mac_register 9F
it will start receiving callbacks from the MAC framework.
.Pp
To allocate the registration structure, the driver should call
.Xr mac_alloc 9F .
Device drivers should generally always pass the symbol
.Dv MAC_VERSION
as the argument to
.Xr mac_alloc 9F .
Upon successful completion, the driver will receive a
.Vt mac_register_t
structure which it should fill in.
The structure and its members are documented in
.Xr mac_register 9S .
.Pp
The
.Xr mac_callbacks 9S
structure is not allocated as a part of the
.Xr mac_register 9S
structure.
In general, device drivers declare this statically.
See the
.Sx MAC Callbacks
section for more information on how to fill it out.
.Pp
Once the structure has been filled in, the driver should call
.Xr mac_register 9F
to register itself with MAC.
The handle that
.Xr mac_register 9F
returns should be stored in the driver's soft state.
It will be used in various other support functions and callbacks.
.Pp
If the call is successful, then the device driver
should enable interrupts and finish any other initialization required.
If the call to
.Xr mac_register 9F
failed, then it should unwind its initialization and should return
.Dv DDI_FAILURE
from its
.Xr attach 9E
routine.
.Pp
The driver does not need to hold onto an allocated
.Xr mac_register 9S
structure after it has called the
.Xr mac_register 9F
function.
Whether the
.Xr mac_register 9F
function returns successfully or not, the driver may free its
.Xr mac_register 9S
structure by calling the
.Xr mac_free 9F
function.
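.Pp
The registration sequence above might look like the following sketch.
The
.Vt example_t
soft state structure, its members, and
.Va example_m_callbacks
are hypothetical, and the SDU and margin values assume an Ethernet device that
uses the default MTU and accepts VLAN-tagged frames.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/ethernet.h>
#include <sys/vlan.h>
#include <sys/mac_provider.h>
#include <sys/mac_ether.h>

/* Hypothetical per-instance soft state. */
typedef struct example {
	dev_info_t	*ex_dip;
	mac_handle_t	ex_mac_hdl;
	uint8_t		ex_macaddr[ETHERADDRL];
} example_t;

/* Hypothetical; declared statically elsewhere in the driver. */
extern mac_callbacks_t example_m_callbacks;

static int
example_mac_register(example_t *ex)
{
	int ret;
	mac_register_t *mac;

	if ((mac = mac_alloc(MAC_VERSION)) == NULL)
		return (DDI_FAILURE);

	mac->m_type_ident = MAC_PLUGIN_IDENT_ETHER;
	mac->m_driver = ex;
	mac->m_dip = ex->ex_dip;
	mac->m_src_addr = ex->ex_macaddr;
	mac->m_callbacks = &example_m_callbacks;
	mac->m_min_sdu = 0;
	mac->m_max_sdu = ETHERMTU;
	mac->m_margin = VLAN_TAGSZ;

	ret = mac_register(mac, &ex->ex_mac_hdl);
	/* The structure may be freed whether or not registration worked. */
	mac_free(mac);

	return (ret == 0 ? DDI_SUCCESS : DDI_FAILURE);
}
.Ed
.Pp
A caller in
.Xr attach 9E
would enable interrupts only after this function returns
.Dv DDI_SUCCESS .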
.Ss MAC Callbacks
The MAC framework interacts with a device driver through a series of
callbacks.
These callbacks are described in their individual manual pages and the
collection of callbacks is indicated in the
.Xr mac_callbacks 9S
manual page.
This section does not focus on the specific functions, but rather on
interactions between them and the rest of the device driver framework.
.Pp
A device driver should make no assumptions about when the various
callbacks will be called and whether or not they will be called
simultaneously.
For example, a device driver may be asked to transmit data through a call to its
.Xr mc_tx 9E
entry point while it is being asked to get a device property through a
call to its
.Xr mc_getprop 9E
entry point.
As such, while some calls may be serialized to the device, such as setting
properties, the device driver should always presume that all of its data needs
to be protected with locks.
While the device is holding locks, it is safe for it to call the following MAC
routines:
.Bl -bullet -offset indent -compact
.It
.Xr mac_hcksum_get 9F
.It
.Xr mac_hcksum_set 9F
.It
.Xr mac_lso_get 9F
.It
.Xr mac_maxsdu_update 9F
.It
.Xr mac_prop_info_set_default_link_flowctrl 9F
.It
.Xr mac_prop_info_set_default_str 9F
.It
.Xr mac_prop_info_set_default_uint8 9F
.It
.Xr mac_prop_info_set_default_uint32 9F
.It
.Xr mac_prop_info_set_default_uint64 9F
.It
.Xr mac_prop_info_set_perm 9F
.It
.Xr mac_prop_info_set_range_uint32 9F
.El
.Pp
Any other MAC related routines should not be called with locks held,
such as
.Xr mac_link_update 9F
or
.Xr mac_rx 9F .
Other routines in the DDI may be called while locks are held; however,
device driver writers should be careful about calling blocking routines
while locks are held or in interrupt context, even when it is
legal to do so as this may cause all other callers that need a given
lock to back up behind such an operation.
.Ss Receiving Data
A device driver will often receive data through the means of an
interrupt or by being asked to poll for frames.
When this occurs, zero or more frames, each with optional metadata, may
be ready for the device driver to consume.
Often each frame has a corresponding descriptor which has information about
whether or not there were errors or whether or not the device successfully
checksummed the packet.
In addition to the per-packet flow described below, there are certain
requirements that drivers must adhere to when programming the hardware
to receive data.
See the section
.Sx RECEIVE DESCRIPTOR LAYOUT
for more information.
.Pp
During a single interrupt or poll request, a device driver should process
a fixed number of frames.
For each frame the device driver should:
.Bl -enum -offset indent
.It
Ensure that all of the DMA memory for the descriptor ring is synchronized with
the
.Xr ddi_dma_sync 9F
function and check the handle for errors if the device driver has enabled DMA
error reporting as part of the Fault Management Architecture (FMA).
If the driver does not rely on DMA, then it may skip this step.
It is recommended that this is performed once per interrupt or poll for
the entire region and not on a per-packet basis.
.It
Check whether or not the frame has errors.
If errors were detected, then the frame should not be sent to the operating
system.
It is recommended that devices keep kstats (see
.Xr kstat_create 9F
for more information) and bump the counter whenever such an error is
detected.
If the device distinguishes between the types of errors, then separate kstats
for each class of error are recommended.
See the
.Sx STATISTICS
section for more information on the various error cases that should be
considered.
.It
Once the frame has been determined to be valid, the device driver should
transform the frame into a
.Xr mblk 9S .
See the section
.Sx MBLKS AND DMA
for more information on how to transform and prepare a message block.
.It
If the device supports hardware checksumming (see the
.Sx CAPABILITIES
section for more information on checksumming), then the device driver
should set the corresponding checksumming information with a call to
.Xr mac_hcksum_set 9F .
.It
It should then append this new message block to the
.Em end
of the message block chain, linking it to the
.Fa b_next
pointer.
It is vitally important that all the frames be chained in the order that they
were received.
If the device driver mistakenly reorders frames, then it may cause performance
impacts in the TCP stack and potentially impact application correctness.
.El
.Pp
Once all the frames have been processed and assembled, the device driver
should deliver them to the rest of the operating system by calling
.Xr mac_rx 9F .
The device driver should try to give as many mblk_t structures to the
system at once as possible.
It
.Em should not
call
.Xr mac_rx 9F
once for every assembled mblk_t.
.Pp
The device driver must not hold any locks across the call to
.Xr mac_rx 9F .
When this function is called, received data will be pushed through the
networking stack and some replies may be generated and given to the
driver to send out.
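.Pp
The chaining and delivery rules above can be sketched as follows.
The
.Vt example_t
structure, the
.Dv EXAMPLE_RX_LIMIT
constant, and
.Fn example_rx_next
are hypothetical; the latter stands in for the device-specific work of
validating a descriptor and producing an mblk_t, and is assumed to return
.Dv NULL
when no further frame is ready.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/stream.h>
#include <sys/mac_provider.h>

#define	EXAMPLE_RX_LIMIT	256	/* hypothetical per-interrupt cap */

typedef struct example {
	kmutex_t	ex_rx_lock;
	mac_handle_t	ex_mac_hdl;
} example_t;

/* Hypothetical device-specific helper; see above. */
extern mblk_t *example_rx_next(example_t *);

static void
example_rx_intr(example_t *ex)
{
	mblk_t *head = NULL, *mp;
	mblk_t **tailp = &head;
	uint_t count = 0;

	mutex_enter(&ex->ex_rx_lock);
	while (count < EXAMPLE_RX_LIMIT &&
	    (mp = example_rx_next(ex)) != NULL) {
		/* Append at the tail so that arrival order is preserved. */
		*tailp = mp;
		tailp = &mp->b_next;
		count++;
	}
	mutex_exit(&ex->ex_rx_lock);

	/* One call for the whole chain, made with no locks held. */
	if (head != NULL)
		mac_rx(ex->ex_mac_hdl, NULL, head);
}
.Ed
.Pp
Keeping a pointer to the tail's
.Fa b_next
member makes each append constant time and avoids any chance of reversing the
chain.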
.Pp
It is not the device driver's responsibility to determine whether or not
the system can keep up with a driver's delivery rate of frames.
The rest of the networking stack will handle issues related to keeping up
appropriately and ensure that kernel memory is not exhausted by packets
that are not being processed.
.Pp
If the device driver has negotiated the
.Dv MAC_CAPAB_RINGS
capability
.Pq discussed in Xr mac_capab_rings 9E
then it should call
.Xr mac_rx_ring 9F
and not
.Xr mac_rx 9F .
A given interrupt may correspond to more than one ring that needs to be
checked.
The set of rings is likely to span different groups that were registered
with MAC through the
.Xr mr_gget 9E
interface.
In those cases, the driver should follow the above procedure
independently for each ring.
That means it will call
.Xr mac_rx_ring 9F
once for each ring using the handle that it received when MAC
called the driver's
.Xr mr_rget 9E
entry point.
When it is looking at the rings, the driver will need to make sure that
the ring has not had interrupts disabled
.Pq due to a pending change to polling mode .
This is discussed in greater detail in the
.Xr mac_capab_rings 9E
and
.Xr mri_poll 9E
manual pages.
.Pp
Finally, the device driver should make sure that any other housekeeping
activities required for the ring are taken care of such that more data
can be received.
.Ss Transmitting Data and Back Pressure
A device driver will be asked to transmit a message block chain by
having its
.Xr mc_tx 9E
entry point called.
While the driver is processing the message blocks, it may run out of resources.
For example, a transmit descriptor ring may become full.
At that point, the device driver should return the remaining unprocessed frames.
The act of returning frames indicates that the device has asserted flow control.
Once this has been done, no additional calls will be made to the
driver's transmit entry point and the back pressure will be propagated
throughout the rest of the networking stack.
.Pp
At some point in the future when resources have become available again,
for example after an interrupt indicating that some portion of the
transmit ring has been sent, then the device driver must notify the
system that it can continue transmission.
To do this, the driver should call
.Xr mac_tx_update 9F .
After that point, the driver will receive calls to its
.Xr mc_tx 9E
entry point again.
As mentioned in the section on callbacks, the device driver should avoid holding
any particular locks across the call to
.Xr mac_tx_update 9F .
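.Pp
The following sketch illustrates one way to structure this flow.
It is not taken from any particular driver: the
.Vt example_t
structure and the
.Fn example_tx_one
and
.Fn example_tx_reclaim
helpers are hypothetical.
.Fn example_tx_one
is assumed to return
.Dv B_FALSE ,
leave the frame untouched, and set
.Fa ex_tx_blocked
under
.Fa ex_tx_lock
when no descriptors are available.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/stream.h>
#include <sys/mac_provider.h>

typedef struct example {
	kmutex_t	ex_tx_lock;
	boolean_t	ex_tx_blocked;
	mac_handle_t	ex_mac_hdl;
} example_t;

/* Hypothetical device-specific helpers; see above. */
extern boolean_t example_tx_one(example_t *, mblk_t *);
extern void example_tx_reclaim(example_t *);

static mblk_t *
example_m_tx(void *arg, mblk_t *chain)
{
	example_t *ex = arg;
	mblk_t *mp, *next;

	for (mp = chain; mp != NULL; mp = next) {
		next = mp->b_next;
		mp->b_next = NULL;

		if (!example_tx_one(ex, mp)) {
			/* Out of resources: hand back the unsent frames. */
			mp->b_next = next;
			return (mp);
		}
	}

	return (NULL);
}

/* Called once hardware reports that descriptors have been consumed. */
static void
example_tx_done(example_t *ex)
{
	boolean_t resume;

	mutex_enter(&ex->ex_tx_lock);
	example_tx_reclaim(ex);
	resume = ex->ex_tx_blocked;
	ex->ex_tx_blocked = B_FALSE;
	mutex_exit(&ex->ex_tx_lock);

	/* Notify MAC only after the lock has been dropped. */
	if (resume)
		mac_tx_update(ex->ex_mac_hdl);
}
.Ed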
.Ss Interrupt Coalescing
For devices operating at higher data rates, interrupt coalescing is an
important part of a well functioning device and may impact the
performance of the device.
Not all devices support interrupt coalescing.
If interrupt coalescing is supported on the device, it is recommended that
device driver writers provide private properties for their device to control the
interrupt coalescing rate.
This will make it much easier to perform experiments and observe the impact of
different interrupt rates on the rest of the system.
.Ss Polling
Even with interrupt coalescing, when there is a certain incoming packet rate it
can make more sense to just actively poll the device, asking for more packets
rather than constantly taking an interrupt.
When a device driver supports the
.Xr mac_capab_rings 9E
capability and therefore polling on receive rings, the MAC framework will ask
the driver to disable interrupts, with its
.Xr mi_disable 9E
entry point, and then subsequently call its polling entry point,
.Xr mri_poll 9E .
.Pp
As long as a device driver implements the needed entry points, then there is
nothing else that it needs to do to take advantage of polling.
A driver should not attempt to spin up its own threads, task queues, or
creatively use timeouts, to try to simulate polling for received packets.
.Ss MAC Address Filter Management
The MAC framework will attempt to use as many MAC address filters as a
device has.
To program a multicast address filter, the driver's
.Xr mc_multicst 9E
entry point will be called.
If the device driver runs out of filters, it should not take any special action
and just return the appropriate error as documented in the corresponding manual
pages for the entry points.
The framework will ensure that the device is placed in promiscuous mode
if it needs to.
.Pp
If the hardware supports more than one unicast filter then the device
driver should consider implementing the
.Dv MAC_CAPAB_RINGS
capability, which exposes a means for multiple unicast MAC address filters to be
used by the broader system.
It is still useful to implement this on hardware which only has a single ring.
See
.Xr mac_capab_rings 9E
for more information.
.Ss Receive Side Scaling
Receive side scaling is where a hardware device supports multiple,
independent queues of frames that can be received.
Each of these queues is generally associated with an independent
interrupt and the hardware usually performs some form of hash across the
queues.
Hardware which supports this should look at implementing the
.Dv MAC_CAPAB_RINGS
capability and see
.Xr mac_capab_rings 9E
for more information.
.Ss Link Updates
It is the responsibility of the device driver to keep track of the
data link's state.
Many devices provide a means of receiving an interrupt when the state of the
link changes.
When such a change happens, the driver should update its internal data
structures and then call
.Xr mac_link_update 9F
to inform the MAC layer that this has occurred.
If the device driver does not properly inform the system about link changes,
then various features like link aggregations and other mechanisms that leverage
the link state will not work correctly.
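.Pp
A link state change handler might be structured like the sketch below.
The
.Vt example_t
structure and the
.Fn example_hw_link_is_up
helper, which is assumed to read the link status from a device register, are
hypothetical.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/mac_provider.h>

typedef struct example {
	kmutex_t	ex_lock;
	link_state_t	ex_link_state;
	mac_handle_t	ex_mac_hdl;
} example_t;

/* Hypothetical device-specific helper; see above. */
extern boolean_t example_hw_link_is_up(example_t *);

static void
example_link_intr(example_t *ex)
{
	link_state_t state;

	/* Update the driver's own view of the link under its lock. */
	mutex_enter(&ex->ex_lock);
	state = example_hw_link_is_up(ex) ?
	    LINK_STATE_UP : LINK_STATE_DOWN;
	ex->ex_link_state = state;
	mutex_exit(&ex->ex_lock);

	/* mac_link_update() must not be called with locks held. */
	mac_link_update(ex->ex_mac_hdl, state);
}
.Ed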
.Ss Link Speed and Auto-negotiation
Many networking devices support more than one possible speed that they
can operate at.
The selection of a speed is often performed through
.Em auto-negotiation ,
though some devices allow the user to control what speeds are advertised
and used.
.Pp
Logically, there are two different sets of things that the device driver
needs to keep track of while it's operating:
.Bl -enum
.It
The supported speeds in hardware.
.It
The enabled speeds from the user.
.El
.Pp
By default, when a link first comes up, the device driver should
generally configure the link to support the common set of speeds and
perform auto-negotiation.
.Pp
A user can control what speeds a device advertises via auto-negotiation
and whether or not it performs auto-negotiation at all by using a series
of properties that have
.Sy _EN_
in the name.
These are read/write properties and there is one for each speed supported in the
operating system.
For a full list of them, see the
.Sx PROPERTIES
section.
.Pp
In addition to these properties, there is a corresponding set of
properties with
.Sy _ADV_
in the name.
These are similar to the
.Sy _EN_
family of properties, but they are read-only and indicate what the
device has actually negotiated.
While they are generally similar to the
.Sy _EN_
family of properties, they may change depending on power settings.
See the
.Sy Ethernet Link Properties
section in
.Xr dladm 8
for more information.
.Pp
It's worth discussing how these different values get used throughout the
different entry points.
The first entry point to consider is the
.Xr mc_propinfo 9E
entry point.
For a given speed, the driver should check whether or not the hardware
supports this speed.
If it does, it should fill in the default value that the hardware takes and
indicate whether or not the property is writable.
This holds for both the
.Sy _EN_
and
.Sy _ADV_
family of properties.
.Pp
The next entry point is
.Xr mc_getprop 9E .
Here, the device should first consult whether the given speed is
supported.
If it is not, then the driver should return
.Er ENOTSUP .
If it is, then it should return the current value of the property.
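.Pp
As an illustration of that check, the sketch below handles a single
.Sy _EN_
property.
The
.Vt example_t
structure and its members are hypothetical; a real driver would handle every
speed that the hardware supports.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/errno.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/mac_provider.h>

typedef struct example {
	kmutex_t	ex_lock;
	boolean_t	ex_hw_1000fdx;	/* hardware supports the speed */
	uint8_t		ex_en_1000fdx;	/* current enabled value */
} example_t;

static int
example_m_getprop(void *arg, const char *name, mac_prop_id_t id,
    uint_t size, void *val)
{
	example_t *ex = arg;
	int ret = 0;

	mutex_enter(&ex->ex_lock);
	switch (id) {
	case MAC_PROP_EN_1000FDX_CAP:
		/* First check whether the hardware supports the speed. */
		if (!ex->ex_hw_1000fdx) {
			ret = ENOTSUP;
			break;
		}
		if (size < sizeof (uint8_t)) {
			ret = EOVERFLOW;
			break;
		}
		*(uint8_t *)val = ex->ex_en_1000fdx;
		break;
	default:
		ret = ENOTSUP;
		break;
	}
	mutex_exit(&ex->ex_lock);

	return (ret);
}
.Ed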
.Pp
The last property-related entry point is
.Xr mc_setprop 9E .
Here, the same logic applies.
Before the driver considers whether or not the property is writable, it should
first check whether or not it's a supported property.
If it's not, then it should return
.Er ENOTSUP .
Otherwise, it should proceed to check whether the property is writable.
If it is and the requested value is valid, then it should update the property
and restart the link's negotiation.
.Pp
Finally, there is the
.Xr mc_getstat 9E
entry point.
Several of the statistics that are queried relate to auto-negotiation and
hardware capabilities.
When a statistic relates to the hardware supporting a given speed, the
.Sy _EN_
properties should be ignored.
The only thing that should be consulted is what the hardware itself supports.
Otherwise, the statistics should look at what is currently being advertised by
the device.
.Ss Unregistering from MAC
During a driver's
.Xr detach 9E
routine, it should unregister the device instance from MAC by calling
.Xr mac_unregister 9F
with the handle that it received from
.Xr mac_register 9F .
If the call to
.Xr mac_unregister 9F
failed, then the device is likely still in use and the driver should
fail the call to
.Xr detach 9E .
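.Pp
A minimal sketch of this check follows.
It assumes that the hypothetical
.Vt example_t
soft state was stored with
.Xr ddi_set_driver_private 9F
during
.Xr attach 9E ;
the device-specific teardown is omitted.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/mac_provider.h>

typedef struct example {
	mac_handle_t	ex_mac_hdl;
} example_t;

static int
example_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
{
	example_t *ex;

	if (cmd != DDI_DETACH)
		return (DDI_FAILURE);

	ex = ddi_get_driver_private(dip);

	/* If MAC refuses, the device is likely still in use. */
	if (mac_unregister(ex->ex_mac_hdl) != 0)
		return (DDI_FAILURE);

	/* Device-specific teardown would follow here. */
	return (DDI_SUCCESS);
}
.Ed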
.Ss Interacting with Devices
Administrators always interact with devices through the
.Xr dladm 8
command line interface.
The state of devices such as whether the link is considered up or down,
various link properties such as the MTU, auto-negotiation state, and
flow control state, are all exposed.
It is also the preferred way that these properties are set and configured.
.Pp
While device tunables may be presented in a
.Xr driver.conf 5
file, it is recommended instead to expose such things through
.Xr dladm 8
private properties, whether explicitly documented or not.
.Sh CAPABILITIES
Capabilities in the MAC framework are optional features that a device
driver declares to indicate what its hardware supports.
The two current capabilities that the system supports are the ability to
perform large send offload (LSO) in hardware, often also known as TCP
segmentation offload, and the ability for hardware to calculate and verify the
checksums present in IPv4, IPv6, and protocol headers such as TCP and UDP.
.Pp
The MAC framework will query a device for support of a capability
through the
.Xr mc_getcapab 9E
function.
Each capability has its own constant and may have corresponding data that goes
along with it and a specific structure that the device is required to fill in.
Note, the set of capabilities changes over time and there are also private
capabilities in the system.
Several of the capabilities are used in the implementation of the MAC framework.
Others, like
.Dv MAC_CAPAB_RINGS ,
represent features that have not been stabilized and thus both API and binary
compatibility for them are not guaranteed.
It is important that the device driver handles unknown capabilities correctly.
For more information, see
.Xr mc_getcapab 9E .
.Pp
The following capabilities are
stable and defined in the system:
.Ss Dv MAC_CAPAB_HCKSUM
The
.Dv MAC_CAPAB_HCKSUM
capability indicates to the system that the device driver supports some
amount of checksumming.
The specific data for this capability is a pointer to a
.Vt uint32_t .
To indicate no support for any kind of checksumming, the driver should
either set this value to zero or simply return that it doesn't support
the capability.
.Pp
Note, the values that the driver declares in this capability indicate
what it can do when it transmits data.
If the driver can only verify checksums when receiving data, then it should not
indicate that it supports this capability.
The following set of flags may be combined through a bitwise inclusive OR:
.Bl -tag -width Ds
.It Dv HCKSUM_INET_PARTIAL
This indicates that the hardware can calculate a partial checksum for
both IPv4 and IPv6 UDP and TCP packets; however, it requires the pseudo-header
checksum be calculated for it.
The pseudo-header checksum will be available for the mblk_t when calling
.Xr mac_hcksum_get 9F .
Note this does not imply that the hardware is capable of calculating
the partial checksum for other L4 protocols or the IPv4 header checksum.
That should be indicated with the
.Dv HCKSUM_IPHDRCKSUM
flag.
.It Dv HCKSUM_INET_FULL_V4
This indicates that the hardware will fully calculate the L4 checksum for
outgoing IPv4 UDP or TCP packets only, and does not require a pseudo-header
checksum.
Note this does not imply that the hardware is capable of calculating the
checksum for other L4 protocols or the IPv4 header checksum.
That should be indicated with the
.Dv HCKSUM_IPHDRCKSUM
flag.
.It Dv HCKSUM_INET_FULL_V6
This indicates that the hardware will fully calculate the L4 checksum for
outgoing IPv6 UDP or TCP packets only, and does not require a pseudo-header
checksum.
Note this does not imply that the hardware is capable of calculating the
checksum for any other L4 protocols.
.It Dv HCKSUM_IPHDRCKSUM
This indicates that the hardware supports calculating the checksum for
the IPv4 header itself.
.El
.Pp
When in a driver's transmit function, the driver will be processing a
single frame.
It should call
.Xr mac_hcksum_get 9F
to see what checksum flags are set on it.
Note that the flags that are set on it are different from the ones described
above and are documented in its manual page.
These flags indicate how the driver is expected to program the hardware and what
checksumming is required.
Not all frames will require hardware checksumming or ask the hardware to
checksum them.
.Pp
If a driver supports offloading the receive checksum and verification,
it should check to see what the hardware indicated was verified.
The driver should then call
.Xr mac_hcksum_set 9F .
The flags used are different from the ones above and are discussed in
detail in the
.Xr mac_hcksum_set 9F
manual page.
If there is no checksum information available or the driver does not support
checksumming, then it should simply not call
.Xr mac_hcksum_set 9F .
.Pp
Note that the checksum flags should be set on the first
mblk_t that makes up a given message.
In other words, if multiple mblk_t structures are linked together by the
.Fa b_cont
member to describe a single frame, then
.Xr mac_hcksum_set 9F
should only be called on the first mblk_t of that set.
However, each distinct message should have the checksum bits set on it, if
applicable.
In other words, each mblk_t that is linked together by the
.Fa b_next
pointer may have checksum flags set.
.Pp
It is recommended that device drivers provide a private property or
.Xr driver.conf 5
property to control whether or not checksumming is enabled for both rx
and tx; however, the default disposition is recommended to be enabled
for both.
This way, if hardware bugs are found in the checksumming implementation,
checksumming can be disabled without requiring software updates.
The transmit property should be checked when determining how to reply to
.Xr mc_getcapab 9E
and the receive property should be checked in the context of the receive
function.
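.Pp
The sketch below shows how a driver might answer the
.Dv MAC_CAPAB_HCKSUM
query while honoring such a property.
The
.Vt example_t
structure and its
.Fa ex_tx_cksum_enable
member are hypothetical, and the flags chosen are only an example of what a
device might support.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/mac_provider.h>

typedef struct example {
	boolean_t	ex_tx_cksum_enable;	/* hypothetical tunable */
} example_t;

static boolean_t
example_m_getcapab(void *arg, mac_capab_t cap, void *cap_data)
{
	example_t *ex = arg;

	switch (cap) {
	case MAC_CAPAB_HCKSUM: {
		uint32_t *flags = cap_data;

		/* Honor the transmit checksum tunable. */
		if (!ex->ex_tx_cksum_enable)
			return (B_FALSE);

		*flags = HCKSUM_INET_PARTIAL | HCKSUM_IPHDRCKSUM;
		return (B_TRUE);
	}
	default:
		/* Unknown capabilities are reported as unsupported. */
		return (B_FALSE);
	}
}
.Ed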
.Ss Dv MAC_CAPAB_LSO
The
.Dv MAC_CAPAB_LSO
capability indicates that the driver supports various forms of large
send offload (LSO).
The private data is a pointer to a
.Vt mac_capab_lso_t
structure.
The system currently supports offloading TCP packets over both IPv4 and
IPv6.
This structure has the following members which are used to indicate
various types of LSO support.
.Bd -literal -offset indent
t_uscalar_t		lso_flags;
lso_basic_tcp_ipv4_t	lso_basic_tcp_ipv4;
lso_basic_tcp_ipv6_t	lso_basic_tcp_ipv6;
.Ed
.Pp
The
.Fa lso_flags
member is used to indicate which members are valid and should be
considered.
Each flag represents a different form of LSO.
The member should be set to the bitwise inclusive OR of the following values:
.Bl -tag -width Dv -offset indent
.It Dv LSO_TX_BASIC_TCP_IPV4
This indicates hardware support for performing TCP segmentation
offloading over IPv4.
When this flag is set, the
.Fa lso_basic_tcp_ipv4
member must be filled in.
.It Dv LSO_TX_BASIC_TCP_IPV6
This indicates hardware support for performing TCP segmentation
offloading over IPv6.
The IPv6 packet will have no extension headers present.
When this flag is set, the
.Fa lso_basic_tcp_ipv6
member must be filled in.
.El
.Pp
The
.Fa lso_basic_tcp_ipv4
member is a structure with the following members:
.Bd -literal -offset indent
t_uscalar_t	lso_max;
843.Ed
844.Bd -filled -offset indent
845The
846.Fa lso_max
847member should be set to the maximum size of the TCP data
848payload that can be offloaded to the hardware.
849.Ed
850.Pp
851The
852.Fa lso_basic_tcp_ipv6
853member is a structure with the following members:
854.Bd -literal -offset indent
t_uscalar_t	lso_max;
856.Ed
857.Bd -filled -offset indent
858The
859.Fa lso_max
860member should be set to the maximum size of the TCP data
861payload that can be offloaded to the hardware.
862.Ed
863.Pp
As with checksumming, it is recommended that driver writers provide a
means for disabling the support of LSO even if it is enabled by default.
This allows issues that arise with LSO to be worked around without
requiring additional driver work.
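.Pp
The following sketch illustrates filling in the LSO capability while
honoring such a disable switch.
It is stand-alone C, not kernel code.
The
.Vt ex_capab_lso_t
type mirrors the member names documented above, but it, the
.Dv EX_LSO_*
values, and the
.Fn ex_fill_lso
function are hypothetical stand-ins; a real driver fills in the
.Vt mac_capab_lso_t
passed to
.Xr mc_getcapab 9E
using the flags from the system headers.
.Bd -literal -offset indent
```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in flag values (assumed); real drivers use LSO_TX_BASIC_TCP_*. */
#define	EX_LSO_TX_BASIC_TCP_IPV4	0x01u
#define	EX_LSO_TX_BASIC_TCP_IPV6	0x02u

/* Stand-ins that mirror the documented member names. */
typedef struct ex_lso_basic {
	uint32_t	lso_max;
} ex_lso_basic_t;

typedef struct ex_capab_lso {
	uint32_t	lso_flags;
	ex_lso_basic_t	lso_basic_tcp_ipv4;
	ex_lso_basic_t	lso_basic_tcp_ipv6;
} ex_capab_lso_t;

/*
 * Returns false when LSO has been disabled by the tunable, in which
 * case the capability is not advertised at all. Otherwise each flag is
 * set together with its matching member.
 */
bool
ex_fill_lso(bool lso_enabled, bool hw_ipv6, uint32_t hw_max,
    ex_capab_lso_t *lso)
{
	if (!lso_enabled)
		return (false);

	lso->lso_flags = EX_LSO_TX_BASIC_TCP_IPV4;
	lso->lso_basic_tcp_ipv4.lso_max = hw_max;

	if (hw_ipv6) {
		lso->lso_flags |= EX_LSO_TX_BASIC_TCP_IPV6;
		lso->lso_basic_tcp_ipv6.lso_max = hw_max;
	}
	return (true);
}
```
.Ed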
868.Sh EVOLVING CAPABILITIES
869The following capabilities are still evolving in the operating system.
870They are documented such that device driver writers may experiment with
871them.
872However, if such drivers are not present inside the core operating
873system repository, they may be subject to API and ABI breakage.
874.Ss Dv MAC_CAPAB_RINGS
875The
876.Dv MAC_CAPAB_RINGS
877capability is very important for implementing a high-performing device
878driver.
879Networking hardware structures the queues of packets to be sent
880and received into a ring.
881Each entry in this ring has a descriptor, which describes the address
882and options for a packet which is going to
883be transmitted or received.
884While simple networking devices only have a single ring, most high-speed
885networking devices have support for many rings.
886.Pp
887Rings are used for two important purposes.
888The first is receive side scaling (RSS), which is the ability to have
889the hardware hash the contents of a packet based on some of the protocol
890headers, and send it to one of several rings.
891These different rings may each have their own interrupt associated with
892them, allowing the card to receive traffic in parallel.
893Similar logic can be performed when sending traffic, to leverage
894multiple hardware resources, thus increasing capacity.
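.Pp
To make the idea concrete, the following stand-alone C sketch maps a
flow to a ring index.
It is purely illustrative: the hashing is performed by the hardware, not
by the driver, and real devices use their own hash functions.
The FNV-1a hash is swapped in here only because it is short, and the
.Vt ex_flow_t
type and function names are hypothetical.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the protocol headers that feed the hash. */
typedef struct ex_flow {
	uint32_t ef_src_ip;
	uint32_t ef_dst_ip;
	uint16_t ef_src_port;
	uint16_t ef_dst_port;
} ex_flow_t;

/* Mix the low nbytes bytes of val into a running FNV-1a hash. */
static uint32_t
ex_fnv1a(uint32_t hash, uint32_t val, size_t nbytes)
{
	for (size_t i = 0; i < nbytes; i++) {
		hash ^= (val >> (8 * i)) & 0xffu;
		hash *= 16777619u;
	}
	return (hash);
}

/*
 * Packets of the same flow always hash to the same ring, which keeps
 * them in order while different flows spread across rings. The caller
 * must pass a non-zero ring count.
 */
uint32_t
ex_pick_ring(const ex_flow_t *f, uint32_t nrings)
{
	uint32_t h = 2166136261u;

	h = ex_fnv1a(h, f->ef_src_ip, 4);
	h = ex_fnv1a(h, f->ef_dst_ip, 4);
	h = ex_fnv1a(h, f->ef_src_port, 2);
	h = ex_fnv1a(h, f->ef_dst_port, 2);
	return (h % nrings);
}
```
.Ed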
895.Pp
896The second use of rings is to group them together and apply filtering
897rules.
898For example, if a packet matches a specific VLAN or MAC address,
899then it can be sent to a specific ring or a specific group of rings.
900This is especially useful when there are multiple different virtual NICs
901or zones in play as the operating system will be able to use the
hardware classification features to already know where a given packet
903needs to be delivered internally rather than having to determine that
904for each packet.
905.Pp
906From the MAC framework's perspective, a driver can have one or more
907groups.
908A group consists of the following:
.Bl -bullet -offset indent
910.It
911One or more hardware rings.
912.It
913One or more MAC address or VLAN filters.
914.El
915.Pp
916The details around how a device driver changes when rings are employed,
917the data structures that a driver must implement, and more are available
918in
919.Xr mac_capab_rings 9E .
920.Ss Dv MAC_CAPAB_TRANSCEIVER
921Many networking devices leverage external transceivers that adhere to
922standards such as SFP, QSFP, QSFP-DD, etc., which often contain
standardized information in an EEPROM on the device.
924The
925.Dv MAC_CAPAB_TRANSCEIVER
926capability provides a means of discovering the number of transceivers,
927their types, and reading the data from a transceiver.
928This allows administrators and users to determine if devices are
929present, if the hardware can use them, and in many cases, detailed
930information about the device ranging from its manufacturer and
931serial numbers to specific information about its health.
932Implementing this capability will lead to the operating system being
933able to discover and display transceivers as part of its fault
934management topology.
935.Pp
936See
937.Xr mac_capab_transceiver 9E
938for more details on the capability structure and the various function
939entry points that come along with it.
940.Ss Dv MAC_CAPAB_LED
941The
942.Dv MAC_CAPAB_LED
943capability provides a means to access and control the LEDs on a network
944interface card.
945This is then made available to the broader operating system and consumed
946by facilities such as the Fault Management Architecture.
947See
948.Xr mac_capab_led 9E
949for more details on the structure and requirements of the capability.
950.Sh PROPERTIES
951Properties in the MAC framework represent aspects of a link.
952These include things like the link's current state and MTU.
953Many of the properties in the system are focused around auto-negotiation and
954controlling what link speeds are advertised.
955Information about properties is covered by three different device entry points.
956The
957.Xr mc_propinfo 9E
958entry point obtains metadata about the property.
959The
960.Xr mc_getprop 9E
961entry point obtains the property.
962The
963.Xr mc_setprop 9E
964entry point updates the property to a new value.
965.Pp
966Many of the properties listed below are read-only.
Each property's description indicates whether it is read-only or read/write.
However, a driver is not required to support setting every writable
property.
Whether it can do so often depends on the card itself.
971In particular, all properties that relate to auto-negotiation and are read/write
972may not be updated if the hardware in question does not support toggling what
973link speeds are auto-negotiated.
974While copper Ethernet often does not have this restriction, it often exists with
975various fiber standards and phys.
976.Pp
977The following properties are the subset of MAC framework properties that
978driver writers should be aware of and handle.
979While other properties exist in the system, driver writers should always return
980an error when a property not listed below is encountered.
981See
982.Xr mc_getprop 9E
983and
984.Xr mc_setprop 9E
985for more information on how to handle them.
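.Pp
The following stand-alone C sketch shows the general shape of such
handling: known properties are copied out and everything else results in
an error.
The property identifiers, the
.Vt ex_link_t
structure, the simplified function signature, and the specific error
values chosen are assumptions made for illustration; the real entry
point signature and error conventions are described in
.Xr mc_getprop 9E .
.Bd -literal -offset indent
```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in property identifiers (hypothetical). */
typedef enum ex_prop_id {
	EX_PROP_SPEED,
	EX_PROP_DUPLEX,
	EX_PROP_OTHER		/* anything the driver does not handle */
} ex_prop_id_t;

typedef enum ex_duplex {
	EX_DUPLEX_UNKNOWN,
	EX_DUPLEX_HALF,
	EX_DUPLEX_FULL
} ex_duplex_t;

/* Hypothetical cached link state kept by the driver. */
typedef struct ex_link {
	uint64_t	el_speed_mbps;
	ex_duplex_t	el_duplex;
} ex_link_t;

int
ex_getprop(const ex_link_t *l, ex_prop_id_t id, size_t valsize, void *val)
{
	uint64_t speed;

	switch (id) {
	case EX_PROP_SPEED:
		if (valsize < sizeof (speed))
			return (EOVERFLOW);
		/* The speed is reported in bits per second. */
		speed = l->el_speed_mbps * 1000000ULL;
		memcpy(val, &speed, sizeof (speed));
		return (0);
	case EX_PROP_DUPLEX:
		if (valsize < sizeof (l->el_duplex))
			return (EOVERFLOW);
		memcpy(val, &l->el_duplex, sizeof (l->el_duplex));
		return (0);
	default:
		/* Properties that are not handled produce an error. */
		return (ENOTSUP);
	}
}
```
.Ed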
986.Bl -hang -width Ds
987.It Dv MAC_PROP_DUPLEX
988.Bd -filled -compact
989Type:
990.Vt link_duplex_t |
991Permissions:
992.Sy Read-Only
993.Ed
994.Pp
995The
996.Dv MAC_PROP_DUPLEX
property is used to indicate whether the link is operating in half or
full duplex.
A full-duplex link may have traffic flowing in both directions at the same time.
999The
1000.Vt link_duplex_t
1001is an enumeration which may be set to any of the following values:
1002.Bl -tag -width Ds
1003.It Dv LINK_DUPLEX_UNKNOWN
1004The current state of the link is unknown.
1005This may be because the link has not negotiated to a specific speed or it is
1006down.
1007.It Dv LINK_DUPLEX_HALF
1008The link is running at half duplex.
1009Communication may travel in only one direction on the link at a given time.
1010.It Dv LINK_DUPLEX_FULL
1011The link is running at full duplex.
1012Communication may travel in both directions on the link simultaneously.
1013.El
1014.It Dv MAC_PROP_SPEED
1015.Bd -filled -compact
1016Type:
1017.Vt uint64_t |
1018Permissions:
1019.Sy Read-Only
1020.Ed
1021.Pp
1022The
1023.Dv MAC_PROP_SPEED
1024property stores the current link speed in bits per second.
A link that is running at 100 Mbit/s would store the value 100000000ULL.
1026A link that is running at 40 Gbit/s would store the value 40000000000ULL.
1027.It Dv MAC_PROP_STATUS
1028.Bd -filled -compact
1029Type:
1030.Vt link_state_t |
1031Permissions:
1032.Sy Read-Only
1033.Ed
1034.Pp
1035The
1036.Dv MAC_PROP_STATUS
1037property is used to indicate the current state of the link.
1038It indicates whether the link is up or down.
1039The
1040.Vt link_state_t
1041is an enumeration which may be set to any of the following values:
1042.Bl -tag -width Ds
1043.It Dv LINK_STATE_UNKNOWN
1044The current state of the link is unknown.
1045This may be because the driver's
1046.Xr mc_start 9E
entry point has not been called so it has not attempted to start the link.
1048.It Dv LINK_STATE_DOWN
1049The link is down.
1050This may be because of a negotiation problem, a cable problem, or some other
1051device specific issue.
1052.It Dv LINK_STATE_UP
1053The link is up.
1054If auto-negotiation is in use, it should have completed.
1055Traffic should be able to flow over the link, barring other issues.
1056.El
1057.It Dv MAC_PROP_MEDIA
1058.Bd -filled -compact
1059Type:
1060.Vt uint32_t No (Varies) |
1061Permissions:
1062.Sy Read-Only
1063.Ed
1064.Pp
1065The
1066.Dv MAC_PROP_MEDIA
1067property indicates the current type of media on the link.
1068The type of media is class-specific and determined based on the
1069.Fa m_type_ident
1070field in the
1071.Vt mac_register_t
1072structure used when calling
1073.Xr mac_register 9F .
1074The media is always read-only.
This property is not used to control how auto-negotiation should be
performed; the existing speed-based properties are used instead.
1077This property should be updated after auto-negotiation has completed.
1078If device hardware and firmware do not provide a way to accurately
1079determine this, then it is much better to return that the media is
1080unknown rather than to lie or guess.
1081A common case where this comes up is when a network card uses an
1082SFP-based device.
1083If the underlying negotiated type of the link isn't made available and
1084therefore the driver can't distinguish between say 40GBASE-SR4 and
108540GBASE-LR4, then drivers should return that the media is unknown.
1086.Pp
Similarly, many types here represent an electrical interface that is
1088often used between a MAC and a PHY, but also for chip-to-chip
1089connectivity or on a backplane.
1090When connecting to a PHY these shouldn't generally be used as the user
1091is concerned with what is actually on the link they plug in, not the
1092internals of the device.
1093.Pp
1094Currently media values are defined for Ethernet-based devices and use
1095the enumeration
1096.Vt mac_ether_media_t .
1097These are defined in
1098.In sys/mac_ether.h
1099and generally follow the IEEE standardized physical medium dependent
1100.Pq PMD
1101layer in 802.3.
1102.Bl -tag -width Ds
1103.It Dv ETHER_MEDIA_UNKNOWN
1104This indicates that the type of the link media is unknown to the driver.
1105This may be because the link is in a state where this information is
1106unknown or the hardware, firmware, and device driver cannot figure it
1107out.
1108If there is no media present and the link is down, use
1109.Dv ETHER_MEDIA_NONE
1110instead.
1111.It Dv ETHER_MEDIA_NONE
1112Represents the case that there is no specific media in use.
1113This should generally be used when the link is down.
1114.It Dv ETHER_MEDIA_10BASE_T
Traditional 10 Mbit/s Ethernet utilizing CAT-3 cabling.
1116Defined in 802.3i.
1117.It Dv ETHER_MEDIA_10BASE_T1
1118A more recent variant of 10 Mbit/s Ethernet that uses a single twisted
1119pair.
1120Defined in 802.3cg.
1121.It Dv ETHER_MEDIA_100BASE_TX
1122The most common form of 100 Mbit/s Ethernet that utilizes two twisted
1123pairs over a CAT-5 cable.
1124Defined in 802.3u.
1125.It Dv ETHER_MEDIA_100BASE_FX
1126100 Mbit/s Ethernet operating over multi-mode fiber.
1127Defined in 802.3u.
1128.It Dv ETHER_MEDIA_100BASE_X
1129This is a general term that covers operating in one of the 100BASE-?X
1130variants.
1131This is here because some PHYs do not distinguish between operating in
1132100BASE-TX and 100BASE-FX.
1133If the driver can determine if it is operating with a BASE-T or fiber
1134based PHY, prefer the more specific types instead.
1135.It Dv ETHER_MEDIA_100BASE_T4
1136This is an uncommon half-duplex variant of 100 Mbit/s Ethernet that
1137operates over CAT-3 cable using four twisted pairs.
1138Defined in 802.3u.
1139.It Dv ETHER_MEDIA_100BASE_T2
1140This is another uncommon variant of 100 Mbit/s Ethernet that only
1141requires two twisted pairs, but unlike 100BASE-TX requires CAT-3 cables.
1142Defined in 802.3y.
1143.It Dv ETHER_MEDIA_100BASE_T1
1144A more recent form of 100 Mbit/s Ethernet that requires only a single
1145twisted pair.
1146Defined in 802.3bw.
1147.It Dv ETHER_MEDIA_100_SGMII
1148This form of 100 Mbit/s Ethernet is generally used for chip-to-chip
1149connectivity and utilizes the SGMII
1150.Pq Serial gigabit media-independent interface
1151specification.
1152.It Dv ETHER_MEDIA_1000BASE_X
1153This is a general catch-all for all 1 Gbit/s fiber-based operation.
1154This is here for compatibility with the generic information returned by
1155traditional 802.3-compatible PHYs.
1156When more specific information is available, that should be used
1157instead.
1158.It Dv ETHER_MEDIA_1000BASE_T
1159Traditional 1 Gbit/s Ethernet that utilizes a CAT-5 cable with four
1160twisted pairs.
1161Defined in 802.3ab.
1162.It Dv ETHER_MEDIA_1000BASE_T1
1163A more recent form of 1 Gbit/s Ethernet that only requires a single
1164twisted pair.
1165.It Dv ETHER_MEDIA_1000BASE_KX
1166This form of 1 Gbit/s Ethernet is designed for operating over a backplane.
1167Defined in 802.3ap.
1168.It Dv ETHER_MEDIA_1000BASE_CX
1169An older form of 1 Gbit/s Ethernet that operates over balanced copper
1170cables.
1171Defined in 802.3z.
1172.It Dv ETHER_MEDIA_1000BASE_SX
11731 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1174each direction.
1175.It Dv ETHER_MEDIA_1000BASE_LX
11761 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1177each direction.
1178.It Dv ETHER_MEDIA_1000BASE_BX
11791 Gbit/s Ethernet operating over a single piece of single-mode fiber.
1180This media operates bi-directionally as opposed to how 1000BASE-LX and
11811000BASE-SX operate.
1182.It Dv ETHER_MEDIA_1000_SGMII
1183A form of 1 Gbit/s Ethernet defined by Cisco that is used for
1184chip-to-chip connectivity.
1185.It Dv ETHER_MEDIA_2500BASE_T
11862.5 Gbit/s Ethernet based on four copper twisted-pairs.
1187Defined in 802.3bz.
1188.It Dv ETHER_MEDIA_2500BASE_KX
11892.5 Gbit/s Ethernet that is designed for operating over a backplane
1190interconnect.
1191Defined in 802.3cb.
1192.It Dv ETHER_MEDIA_2500BASE_X
1193This is a variant of 2.5 Gbit/s Ethernet that took the 1000BASE-X IEEE
1194standard and ran it with a 2.5x faster clock.
It is a de facto standard.
1196.It Dv ETHER_MEDIA_5000BASE_T
11975.0 Gbit/s Ethernet based on four copper twisted-pairs.
1198Defined in 802.3bz.
1199.It Dv ETHER_MEDIA_5000BASE_KR
12005.0 Gbit/s Ethernet that is designed for operating over a backplane
1201interconnect.
1202Defined in 802.3cb.
1203.It Dv ETHER_MEDIA_10GBASE_T
120410 Gbit/s Ethernet operating over four copper twisted pairs utilizing
1205CAT-6a cables.
1206Defined in 802.3an.
1207.It Dv ETHER_MEDIA_10GBASE_SR
120810 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1209each direction.
1210Defined in 802.3ae.
1211.It Dv ETHER_MEDIA_10GBASE_LR
121210 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1213each direction.
1214The maximum fiber length is 10km.
1215Defined in 802.3ae.
1216.It Dv ETHER_MEDIA_10GBASE_ER
121710 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1218each direction.
1219The maximum fiber length is 30km.
1220Defined in 802.3ae.
1221.It Dv ETHER_MEDIA_10GBASE_LRM
122210 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1223each direction.
This has a reach of up to 220m, which is a longer distance than
10GBASE-SR.
1226Defined in 802.3aq.
1227.It Dv ETHER_MEDIA_10GBASE_KR
122810 Gbit/s Ethernet operating over a single lane backplane.
Defined in 802.3ap.
1230.It Dv ETHER_MEDIA_10GBASE_CX4
123110 Gbit/s Ethernet operating over a group of four shielded copper cables.
1232Defined in 802.3ak.
1233.It Dv ETHER_MEDIA_10GBASE_KX4
123410 Gbit/s Ethernet operating over a four lane backplane.
Defined in 802.3ap.
1236.It Dv ETHER_MEDIA_10GBASE_CR
123710 Gbit/s Ethernet that is built using a passive copper
1238SFP-compatible cable.
1239This is sometimes called 10GSFP+Cu passive.
1240Defined in SFF-8431.
1241.It Dv ETHER_MEDIA_10GBASE_AOC
124210 Gbit/s Ethernet that is built using a short-range active
1243optical cable that is SFP+-compatible.
1244Defined in SFF-8431.
1245.It Dv ETHER_MEDIA_10GBASE_ACC
124610 Gbit/s Ethernet based upon a single lane of copper cable with an
active component that allows it to go longer distances than 10GBASE-CR.
1248Defined in SFF-8431.
1249.It Dv ETHER_MEDIA_10G_XAUI
125010 Gbit/s signalling that is defined for use between a MAC and PHY.
The name combines the Roman numeral X (ten) with attachment unit interface.
1252Sometimes used for chip-to-chip interconnects.
1253Defined in 802.3ae.
1254.It Dv ETHER_MEDIA_10G_SFI
125510 Gbit/s signalling that is defined for use between a MAC and an
1256SFP-based transceiver.
1257Defined in SFF-8431.
1258.It Dv ETHER_MEDIA_10G_XFI
125910 Gbit/s signalling that is defined for use between a MAC and an
1260XFP-based transceiver.
1261Defined in INF-8077i
1262.Pq XFP MSA .
1263.It Dv ETHER_MEDIA_25GBASE_T
126425 Gbit/s Ethernet based upon four twisted pair cables using CAT-8
1265cable.
1266Defined in 802.3bq.
1267.It Dv ETHER_MEDIA_25GBASE_SR
126825 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1269each direction.
1270Defined in 802.3by.
1271.It Dv ETHER_MEDIA_25GBASE_LR
127225 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1273each direction.
1274The maximum fiber length is 10km.
1275Defined in 802.3cc.
1276.It Dv ETHER_MEDIA_25GBASE_ER
127725 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1278each direction.
1279The maximum fiber length is 30km.
1280Defined in 802.3cc.
1281.It Dv ETHER_MEDIA_25GBASE_KR
128225 Gbit/s Ethernet operating over a backplane with a single lane.
1283Defined in 802.3by.
1284.It Dv ETHER_MEDIA_25GBASE_CR
128525 Gbit/s Ethernet operating over a single lane of copper cable.
1286Generally used with an SFP28 style connector.
1287Defined in 802.3by.
1288.It Dv ETHER_MEDIA_25GBASE_AOC
25 Gbit/s Ethernet that is built using a short-range active
1290optical cable that is SFP28-compatible.
1291Defined loosely by SFF-8402 and often utilizes 25GBASE-SR.
1292.It Dv ETHER_MEDIA_25GBASE_ACC
129325 Gbit/s Ethernet based upon a single lane of copper cable with an
active component that allows it to go longer distances than 25GBASE-CR.
1295Defined loosely by SFF-8402.
1296.It Dv ETHER_MEDIA_25G_AUI
129725 Gbit/s signalling that is defined for use between a MAC and PHY and
1298for chip-to-chip connectivity.
1299Defined by 802.3by.
1300.It Dv ETHER_MEDIA_40GBASE_T
130140 Gbit/s Ethernet based upon four twisted-pairs of CAT-8 cables.
1302Defined in 802.3bq.
1303.It Dv ETHER_MEDIA_40GBASE_CR4
130440 Gbit/s Ethernet utilizing four lanes of twinaxial copper cabling
1305each operating at 10 Gbit/s.
1306This is generally used with a QSFP+ connector defined in SFF-8635.
1307Defined in 802.3ba.
1308.It Dv ETHER_MEDIA_40GBASE_KR4
130940 Gbit/s Ethernet utilizing four lanes over a copper backplane each
1310operating at 10 Gbit/s.
1311Defined in 802.3ba.
1312.It Dv ETHER_MEDIA_40GBASE_SR4
131340 Gbit/s Ethernet based upon using four pairs of multi-mode fiber, each
1314operating at 10 Gbit/s, with one fiber in the pair being used for
1315transmit and the other for receive.
1316Generally utilizes a QSFP+ connector.
1317Defined in 802.3ba.
1318.It Dv ETHER_MEDIA_40GBASE_LR4
131940 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1320for each direction.
1321Utilizes wavelength multiplexing as the electrical interface is four 10
1322Gbit/s signals.
1323The maximum fiber length is 10km.
1324Defined in 802.3ba.
1325.It Dv ETHER_MEDIA_40GBASE_ER4
132640 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1327for each direction.
1328Utilizes wavelength multiplexing as the electrical interface is four 10
1329Gbit/s signals and generally based upon a QSFP+ connector.
1330The maximum fiber length is 40km.
1331Defined in 802.3bm.
1332.It Dv ETHER_MEDIA_40GBASE_LM4
133340 Gbit/s Ethernet based upon using one pair of multi-mode fibers, one
1334for each direction.
1335Utilizes wavelength multiplexing as the electrical interface is four 10
1336Gbit/s signals and generally based upon a QSFP+ connector.
1337Defined by a specific MSA.
1338.It Dv ETHER_MEDIA_40GBASE_AOC4
133940 Gbit/s Ethernet based upon a QSFP+ based cable with built-in
1340optical transceivers.
1341The electrical interface is four lanes running at 10 Gbit/s.
1342.It Dv ETHER_MEDIA_40GBASE_ACC4
134340 Gbit/s Ethernet based upon four copper lanes each running at 10
1344Gbit/s with some additional component compared to 40GBASE-CR4.
1345.It Dv ETHER_MEDIA_40G_XLAUI
134640 Gbit/s signalling operating across four lanes that is defined for use
1347between a MAC and a PHY or for chip-to-chip connectivity.
1348Defined by 802.3ba.
1349.It Dv ETHER_MEDIA_40G_XLPPI
135040 Gbit/s signalling operating across four lanes that is designed to
1351connect between a chip and a module, generally a QSFP+ based device.
1352Defined in 802.3ba.
1353.It Dv ETHER_MEDIA_50GBASE_KR2
135450 Gbit/s Ethernet which operates over a two lane copper backplane.
1355Each lane operates at 25 Gbit/s.
1356Defined by the 25G and 50G Ethernet consortium.
1357This did not become an IEEE standard.
1358.It Dv ETHER_MEDIA_50GBASE_CR2
135950 Gbit/s Ethernet which operates over two lane copper twinaxial cable,
1360generally with a QSFP+ connector.
1361Each lane operates at 25 Gbit/s.
1362Defined by the 25G and 50G Ethernet consortium.
1363.It Dv ETHER_MEDIA_50GBASE_SR2
50 Gbit/s Ethernet based upon using two pairs of multi-mode fiber, each
1365operating at 25 Gbit/s, with one fiber in the pair being used for
1366transmit and the other for receive.
1367Generally utilizes a QSFP+ connector.
1368Defined by the 25G and 50G Ethernet consortium.
1369.It Dv ETHER_MEDIA_50GBASE_LR2
137050 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1371for each direction.
1372Utilizes wavelength multiplexing as the electrical interface is two 25
1373Gbit/s signals.
1374Defined by the 25G and 50G Ethernet consortium.
1375.It Dv ETHER_MEDIA_50GBASE_AOC2
137650 Gbit/s Ethernet generally based upon a QSFP+ based cable with built-in
1377optical transceivers.
1378The electrical interface is two lanes running at 25 Gbit/s.
1379.It Dv ETHER_MEDIA_50GBASE_ACC2
138050 Gbit/s Ethernet based upon two copper twinaxial lanes each running at
138125 Gbit/s with some additional component compared to 50GBASE-CR2.
1382.It Dv ETHER_MEDIA_50GBASE_KR
138350 Gbit/s Ethernet operating over a single lane backplane.
1384Defined by 802.3cd.
1385.It Dv ETHER_MEDIA_50GBASE_CR
138650 Gbit/s Ethernet operating over a single lane twinaxial copper cable
1387generally utilizing an SFP56 interface.
1388Defined by 802.3cd.
1389.It Dv ETHER_MEDIA_50GBASE_SR
139050 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1391each direction.
1392Defined by 802.3cd.
1393.It Dv ETHER_MEDIA_50GBASE_LR
139450 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1395each direction.
1396The maximum fiber length is 10km.
1397Defined in 802.3cd.
1398.It Dv ETHER_MEDIA_50GBASE_ER
139950 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1400each direction.
1401The maximum fiber length is 40km.
1402Defined in 802.3cd.
1403.It Dv ETHER_MEDIA_50GBASE_FR
140450 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1405each direction.
1406The maximum fiber length is 2km.
1407Defined in 802.3cd.
1408.It Dv ETHER_MEDIA_50GBASE_AOC
140950 Gbit/s Ethernet that is built using a short-range active optical
1410cable that is generally SFP56 compatible.
1411The electrical interface operates at 25 Gbit/s PAM4 signaling.
1412.It Dv ETHER_MEDIA_50GBASE_ACC
141350 Gbit/s Ethernet that is built using a single lane twinaxial
1414cable that is generally SFP56 compatible but uses an active component
1415such as a retimer or redriver when compared to 50GBASE-CR.
1416.It Dv ETHER_MEDIA_100GBASE_CR10
1417100 Gbit/s Ethernet operating over ten lanes of shielded twinaxial
1418copper cable, each operating at 10 Gbit/s.
1419Defined in 802.3ba.
1420.It Dv ETHER_MEDIA_100GBASE_SR10
1421100 Gbit/s Ethernet based upon using ten pairs of multi-mode fiber, each
1422operating at 10 Gbit/s, with one fiber in the pair being used for
1423transmit and the other for receive.
1424.It Dv ETHER_MEDIA_100GBASE_SR4
1425100 Gbit/s Ethernet based upon using four pairs of multi-mode fiber,
1426each operating at 25 Gbit/s, with one fiber in the pair being used for
1427transmit and the other for receive.
1428Defined by 802.3bm.
1429.It Dv ETHER_MEDIA_100GBASE_LR4
1430100 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1431for each direction.
1432Utilizes wavelength multiplexing as the electrical interface is four 25
1433Gbit/s signals and generally based upon a QSFP28 connector.
1434The maximum fiber length is 10km.
1435Defined by 802.3ba.
1436.It Dv ETHER_MEDIA_100GBASE_ER4
1437100 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1438for each direction.
1439Utilizes wavelength multiplexing as the electrical interface is four 25
1440Gbit/s signals and generally based upon a QSFP28 connector.
1441The maximum fiber length is 40km.
1442Defined by 802.3ba.
1443.It Dv ETHER_MEDIA_100GBASE_KR4
1444100 Gbit/s Ethernet based upon using a four lane copper backplane.
1445Each lane operates at 25 Gbit/s.
1446Defined in 802.3bj.
1447.It Dv ETHER_MEDIA_100GBASE_CAUI4
1448100 Gbit/s signalling used for chip-to-chip and chip-to-module
1449connectivity.
1450Defined in 802.3bm.
1451.It Dv ETHER_MEDIA_100GBASE_CR4
1452100 Gbit/s Ethernet based upon using a four lane copper twinaxial cable.
1453Each lane operates at 25 Gbit/s and generally utilizes a QSFP28
1454connector.
1455Defined in 802.3bj.
1456.It Dv ETHER_MEDIA_100GBASE_AOC4
1457100 Gbit/s Ethernet that utilizes an active optical cable with
1458short-range optical transceivers.
1459Electrically operates as four lanes of 25 Gbit/s and most commonly uses
1460a QSFP28 connector.
1461.It Dv ETHER_MEDIA_100GBASE_ACC4
1462100 Gbit/s Ethernet that utilizes a four lane copper twinaxial cable
1463that unlike 100GBASE-CR4 has an active component such as a retimer or
1464redriver.
1465.It Dv ETHER_MEDIA_100GBASE_KR2
1466100 Gbit/s Ethernet based upon using a two lane copper backplane.
1467Each lane operates at 50 Gbit/s.
1468Defined in 802.3cd.
1469.It Dv ETHER_MEDIA_100GBASE_CR2
1470100 Gbit/s Ethernet that utilizes a two lane copper twinaxial cable.
1471Each lane operates at 50 Gbit/s.
1472Defined by 802.3cd.
1473.It Dv ETHER_MEDIA_100GBASE_SR2
1474100 Gbit/s Ethernet based upon using two pairs of multi-mode fiber,
1475each operating at 50 Gbit/s, with one fiber in the pair being used for
1476transmit and the other for receive.
1477Defined by 802.3cd.
1478.It Dv ETHER_MEDIA_100GBASE_KR
1479100 Gbit/s Ethernet operating over a single lane copper backplane.
1480Defined by 802.3ck.
1481.It Dv ETHER_MEDIA_100GBASE_CR
1482100 Gbit/s Ethernet operating over a single lane copper twinaxial cable.
1483Generally uses an SFP112 connector.
1484Defined by 802.3ck.
1485.It Dv ETHER_MEDIA_100GBASE_SR
1486100 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1487transmitting and one for receiving.
1488The maximum fiber length is 60-100m depending on the fiber type
1489.Pq OM3, OM4 .
1490Defined by 802.3db.
1491.It Dv ETHER_MEDIA_100GBASE_DR
1492100 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1493transmitting and one for receiving.
1494Designed to be used with a parallel DR4/DR8 interface.
1495The maximum fiber length is 500m.
1496Defined by 802.3cd.
1497.It Dv ETHER_MEDIA_100GBASE_LR
1498100 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1499transmitting and one for receiving.
1500The maximum fiber length is 10km.
1501Defined by 802.3cu.
1502.It Dv ETHER_MEDIA_100GBASE_FR
1503100 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1504transmitting and one for receiving.
1505The maximum fiber length is 2km.
1506Defined by 802.3cu.
1507.It Dv ETHER_MEDIA_200GBASE_CR4
1508200 Gbit/s Ethernet utilizing a four lane passive copper twinaxial
1509cable.
1510Each lane operates at 50 Gbit/s and the connector is generally based on
1511QSFP56.
1512Defined by 802.3cd.
1513.It Dv ETHER_MEDIA_200GBASE_KR4
1514200 Gbit/s Ethernet utilizing four lanes over a copper backplane each
1515operating at 50 Gbit/s.
1516Defined by 802.3cd.
1517.It Dv ETHER_MEDIA_200GBASE_SR4
1518200 Gbit/s Ethernet based upon using four pairs of multi-mode fiber,
1519each operating at 50 Gbit/s, with one fiber in the pair being used for
1520transmit and the other for receive.
1521Defined by 802.3cd.
1522.It Dv ETHER_MEDIA_200GBASE_DR4
1523200 Gbit/s Ethernet based upon using four pairs of single-mode fiber,
1524each operating at 50 Gbit/s, with one fiber in the pair being used for
1525transmit and the other for receive.
1526Defined by 802.3bs.
1527.It Dv ETHER_MEDIA_200GBASE_FR4
1528200 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1529for transmitting and one for receiving.
1530Utilizes wavelength multiplexing as the electrical interface is four 50
1531Gbit/s signals and generally based upon a QSFP56 connector.
1532The maximum fiber length is 2km.
1533Defined by 802.3bs.
1534.It Dv ETHER_MEDIA_200GBASE_LR4
1535200 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1536for transmitting and one for receiving.
1537Utilizes wavelength multiplexing as the electrical interface is four 50
1538Gbit/s signals and generally based upon a QSFP56 connector.
1539The maximum fiber length is 10km.
1540Defined by 802.3bs.
1541.It Dv ETHER_MEDIA_200GBASE_ER4
1542200 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1543for transmitting and one for receiving.
1544Utilizes wavelength multiplexing as the electrical interface is four 50
1545Gbit/s signals and generally based upon a QSFP56 connector.
1546The maximum fiber length is 40km.
1547Defined by 802.3bs.
1548.It Dv ETHER_MEDIA_200GAUI_4
1549200 Gbit/s signalling utilizing four lanes each operating at 50 Gbit/s.
1550Used for chip-to-chip and chip-to-module connections.
1551Defined by 802.3bs.
1552.It Dv ETHER_MEDIA_200GBASE_KR2
1553200 Gbit/s Ethernet utilizing two lanes over a copper backplane each
1554operating at 100 Gbit/s.
1555Defined by 802.3ck.
1556.It Dv ETHER_MEDIA_200GBASE_CR2
1557200 Gbit/s Ethernet utilizing a two lane passive copper twinaxial
1558cable.
1559Each lane operates at 100 Gbit/s.
1560Defined by 802.3ck.
1561.It Dv ETHER_MEDIA_200GBASE_SR2
1562200 Gbit/s Ethernet based upon using two pairs of multi-mode fiber,
1563each operating at 100 Gbit/s, with one fiber in the pair being used for
1564transmit and the other for receive.
1565Defined by 802.3db.
1566.It Dv ETHER_MEDIA_200GAUI_2
1567200 Gbit/s signalling utilizing two lanes each operating at 100 Gbit/s.
1568Used for chip-to-chip and chip-to-module connections.
1569Defined by 802.3ck.
1570.It Dv ETHER_MEDIA_400GBASE_KR8
1571400 Gbit/s Ethernet utilizing eight lanes over a copper backplane each
1572operating at 50 Gbit/s.
1573Defined by the 25/50 Gigabit Ethernet Consortium.
1574.It Dv ETHER_MEDIA_400GBASE_FR8
400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1576for transmitting and one for receiving.
1577Utilizes wavelength multiplexing as the electrical interface is eight 50
1578Gbit/s signals and generally based upon a QSFP-DD connector.
1579The maximum fiber length is 2km.
1580Defined by 802.3bs.
1581.It Dv ETHER_MEDIA_400GBASE_LR8
400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1583for transmitting and one for receiving.
1584Utilizes wavelength multiplexing as the electrical interface is eight 50
1585Gbit/s signals and generally based upon a QSFP-DD connector.
1586The maximum fiber length is 10km.
1587Defined by 802.3bs.
1588.It Dv ETHER_MEDIA_400GBASE_ER8
400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1590for transmitting and one for receiving.
1591Utilizes wavelength multiplexing as the electrical interface is eight 50
1592Gbit/s signals and generally based upon a QSFP-DD connector.
1593The maximum fiber length is 40km.
1594Defined by 802.3cn.
1595.It Dv ETHER_MEDIA_400GAUI_8
1596400 Gbit/s signalling utilizing eight lanes each operating at 50 Gbit/s.
1597Used for chip-to-chip and chip-to-module connections.
1598Defined by 802.3bs.
1599.It Dv ETHER_MEDIA_400GBASE_KR4
1600400 Gbit/s Ethernet utilizing four lanes over a copper backplane each
1601operating at 100 Gbit/s.
1602Defined by 802.3ck.
1603.It Dv ETHER_MEDIA_400GBASE_CR4
400 Gbit/s Ethernet utilizing a four lane passive copper twinaxial
1605cable.
1606Each lane operates at 100 Gbit/s and generally uses a QSFP112 connector.
1607Defined by 802.3ck.
1608.It Dv ETHER_MEDIA_400GBASE_SR4
1609400 Gbit/s Ethernet based upon using four pairs of multi-mode fiber,
1610each operating at 100 Gbit/s, with one fiber in the pair being used for
1611transmit and the other for receive.
1612Defined by 802.3db.
1613.It Dv ETHER_MEDIA_400GBASE_DR4
1614400 Gbit/s Ethernet based upon using four pairs of single-mode fiber,
1615each operating at 100 Gbit/s, with one fiber in the pair being used for
1616transmit and the other for receive.
1617The maximum fiber length is 500m.
1618Defined by 802.3bs.
1619.It Dv ETHER_MEDIA_400GBASE_FR4
1620400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1621for transmitting and one for receiving.
1622Utilizes wavelength multiplexing as the electrical interface is four 100
1623Gbit/s signals and generally based upon a QSFP112 connector.
1624The maximum fiber length is 2km.
1625Defined by 802.3cu.
1626.It Dv ETHER_MEDIA_400GAUI_4
1627400 Gbit/s signalling utilizing four lanes each operating at 100 Gbit/s.
1628Used for chip-to-chip and chip-to-module connections.
1629Defined by 802.3ck.
1630.El
1631.It Dv MAC_PROP_AUTONEG
1632.Bd -filled -compact
1633Type:
1634.Vt uint8_t |
1635Permissions:
1636.Sy Read/Write
1637.Ed
1638.Pp
1639The
1640.Dv MAC_PROP_AUTONEG
1641property indicates whether or not the device is currently configured to
1642perform auto-negotiation.
1643A value of
1644.Sy 0
1645indicates that auto-negotiation is disabled.
1646A
1647.Sy non-zero
1648value indicates that auto-negotiation is enabled.
1649Devices should generally default to enabling auto-negotiation.
1650.Pp
1651When getting this property, the device driver should return the current
1652state.
1653When setting this property, if the device supports operating in the requested
1654mode, then the device driver should reset the link to negotiate to the new speed
1655after updating any internal registers.
1656.It Dv MAC_PROP_MTU
1657.Bd -filled -compact
1658Type:
1659.Vt uint32_t |
1660Permissions:
1661.Sy Read/Write
1662.Ed
1663.Pp
1664The
1665.Dv MAC_PROP_MTU
1666property determines the maximum transmission unit (MTU).
1667This indicates the maximum size packet that the device can transmit, ignoring
1668its own headers.
1669For an Ethernet device, this would exclude the size of the Ethernet header and
1670any VLAN headers that would be placed.
It is up to the driver to ensure that any MTU value that it accepts, when
combined with its margin and header sizes, does not exceed its maximum frame
size.
1673.Pp
1674By default, drivers for Ethernet should initialize this value and the
1675MTU to
1676.Sy 1500 .
1677When getting this property, the driver should return its current
1678recorded MTU.
1679When setting this property, the driver should first validate that it is within
1680the device's valid range and then it must call
1681.Xr mac_maxsdu_update 9F .
1682Note that the call may fail.
1683If the call completes successfully, the driver should update the hardware with
1684the new value of the MTU and perform any other work needed to handle it.
1685.Pp
1686If the device does not support changing the MTU after the device's
1687.Xr mc_start 9E
1688entry point has been called, then driver writers should return
1689.Er EBUSY .
1690.It Dv MAC_PROP_FLOWCTRL
1691.Bd -filled -compact
1692Type:
1693.Vt link_flowctrl_t |
1694Permissions:
1695.Sy Read/Write
1696.Ed
1697.Pp
1698The
1699.Dv MAC_PROP_FLOWCTRL
1700property manages the configuration of pause frames as part of Ethernet
1701flow control.
1702Note, this only describes what this device will advertise.
1703What is actually enabled may be different and is subject to the rules of
1704auto-negotiation.
1705The
1706.Vt link_flowctrl_t
1707is an enumeration that may be set to one of the following values:
1708.Bl -tag -width Ds
1709.It Dv LINK_FLOWCTRL_NONE
1710Flow control is disabled.
1711No pause frames should be generated or honored.
1712.It Dv LINK_FLOWCTRL_RX
1713The device can receive pause frames; however, it should not generate
1714them.
1715.It Dv LINK_FLOWCTRL_TX
1716The device can generate pause frames; however, it does not support
1717receiving them.
1718.It Dv LINK_FLOWCTRL_BI
1719The device supports both sending and receiving pause frames.
1720.El
1721.Pp
1722When getting this property, the device driver should return the way that
1723it has configured the device, not what the device has actually
1724negotiated.
1725When setting the property, it should update the hardware and allow the link to
1726potentially perform auto-negotiation again.
1727.It Dv MAC_PROP_EN_FEC_CAP
1728.Bd -filled -compact
1729Type:
1730.Vt link_fec_t |
1731Permissions:
1732.Sy Read/Write
1733.Ed
1734.Pp
1735The
1736.Dv MAC_PROP_EN_FEC_CAP
1737property indicates which Forward Error Correction (FEC) code is advertised
1738by the device.
1739.Pp
1740The
1741.Vt link_fec_t
1742is an enumeration that may be a combination of the following bit values:
1743.Bl -tag -width Ds
1744.It Dv LINK_FEC_NONE
1745No FEC over the link.
1746.It Dv LINK_FEC_AUTO
The FEC coding to use is auto-negotiated.
1748.Dv LINK_FEC_AUTO
1749cannot be set along with any of the other values.
1750This is the default setting the device driver should use.
1751.It Dv LINK_FEC_RS
1752The link may use Reed-Solomon FEC coding.
1753.It Dv LINK_FEC_BASE_R
The link may use Base-R coding, also commonly referred to as FireCode.
1755.El
1756.Pp
When setting the property, the device driver should update the hardware with
the requested coding or combination of requested codings.
1759If a particular combination of codings is not supported by the hardware,
1760the device driver should return
1761.Er EINVAL .
1762When retrieving this property, the device driver should return the current
1763value of the property.
1764.It Dv MAC_PROP_ADV_FEC_CAP
1765.Bd -filled -compact
1766Type:
1767.Vt link_fec_t |
1768Permissions:
1769.Sy Read-Only
1770.Ed
1771.Pp
1772The
1773.Dv MAC_PROP_ADV_FEC_CAP
1774has the same values as
1775.Dv MAC_PROP_EN_FEC_CAP .
1776The property indicates which Forward Error Correction (FEC) code has been
1777negotiated over the link.
1778.El
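.Pp
The rules described above for setting
.Dv MAC_PROP_MTU
can be sketched in plain C as follows.
This is only an illustration that compiles outside of the kernel: the
example_softc_t structure, the MTU limits, and example_maxsdu_update(), which
stands in for
.Xr mac_maxsdu_update 9F ,
are all hypothetical and are not part of the MAC framework.
```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits for an imaginary device. */
#define	EXAMPLE_MIN_MTU	68
#define	EXAMPLE_MAX_MTU	9000

/* Hypothetical per-instance driver state; not a MAC framework type. */
typedef struct example_softc {
	bool		es_started;	/* mc_start(9E) has been called */
	bool		es_mtu_live;	/* MTU may change while started */
	uint32_t	es_mtu;		/* currently recorded MTU */
	uint32_t	es_hw_mtu;	/* value last programmed into hardware */
	int		es_update_err;	/* result the stand-in below returns */
} example_softc_t;

/*
 * Userland stand-in for mac_maxsdu_update(9F) so that the sketch can be
 * built outside of the kernel.  It only reports a preset result.
 */
int
example_maxsdu_update(const example_softc_t *sc, uint32_t sdu)
{
	(void) sdu;
	return (sc->es_update_err);
}

/* Sketch of the MAC_PROP_MTU case of a driver's mc_setprop(9E). */
int
example_set_mtu(example_softc_t *sc, uint32_t new_mtu)
{
	int ret;

	/* First validate the request against the device's range. */
	if (new_mtu < EXAMPLE_MIN_MTU || new_mtu > EXAMPLE_MAX_MTU)
		return (EINVAL);

	/* Some devices cannot change the MTU once they are started. */
	if (sc->es_started && !sc->es_mtu_live)
		return (EBUSY);

	/* Inform the framework; this call is allowed to fail. */
	ret = example_maxsdu_update(sc, new_mtu);
	if (ret != 0)
		return (ret);

	/* Only now record the value and program the hardware. */
	sc->es_mtu = new_mtu;
	sc->es_hw_mtu = new_mtu;
	return (0);
}
```
Note that the recorded MTU is left untouched on every failure path, so a
later get of the property still reports the value that is actually in use.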
1779.Pp
1780The remaining properties are all about various auto-negotiation link
1781speeds.
1782They fall into two different buckets: properties with
1783.Sy _ADV_
1784in the name and properties with
1785.Sy _EN_
1786in the name.
1787For any given supported speed, there is one of each.
1788The
1789.Sy _EN_
1790set of properties are read/write properties that control what should be
1791advertised by the device.
1792When these are retrieved, they should return the current value of the property.
1793When they are set, they should change how the hardware advertises the specific
1794speed and trigger any kind of link reset and auto-negotiation, if enabled, to
1795occur.
1796.Pp
1797The
1798.Sy _ADV_
1799set of properties are read-only properties.
1800They are meant to reflect what has actually been negotiated.
1801These may be different from the
1802.Sy _EN_
1803family of properties, especially when different power management
1804settings are at play.
1805.Pp
1806See the
1807.Sx Link Speed and Auto-negotiation
1808section for more information.
1809.Pp
1810The properties are ordered in increasing link speed:
1811.Bl -hang -width Ds
1812.It Dv MAC_PROP_ADV_10HDX_CAP
1813.Bd -filled -compact
1814Type:
1815.Vt uint8_t |
1816Permissions:
1817.Sy Read-Only
1818.Ed
1819.Pp
1820The
1821.Dv MAC_PROP_ADV_10HDX_CAP
1822property describes whether or not 10 Mbit/s half-duplex support is
1823advertised.
1824.It Dv MAC_PROP_EN_10HDX_CAP
1825.Bd -filled -compact
1826Type:
1827.Vt uint8_t |
1828Permissions:
1829.Sy Read/Write
1830.Ed
1831.Pp
1832The
1833.Dv MAC_PROP_EN_10HDX_CAP
1834property describes whether or not 10 Mbit/s half-duplex support is
1835enabled.
1836.It Dv MAC_PROP_ADV_10FDX_CAP
1837.Bd -filled -compact
1838Type:
1839.Vt uint8_t |
1840Permissions:
1841.Sy Read-Only
1842.Ed
1843.Pp
1844The
1845.Dv MAC_PROP_ADV_10FDX_CAP
1846property describes whether or not 10 Mbit/s full-duplex support is
1847advertised.
1848.It Dv MAC_PROP_EN_10FDX_CAP
1849.Bd -filled -compact
1850Type:
1851.Vt uint8_t |
1852Permissions:
1853.Sy Read/Write
1854.Ed
1855.Pp
1856The
1857.Dv MAC_PROP_EN_10FDX_CAP
1858property describes whether or not 10 Mbit/s full-duplex support is
1859enabled.
1860.It Dv MAC_PROP_ADV_100HDX_CAP
1861.Bd -filled -compact
1862Type:
1863.Vt uint8_t |
1864Permissions:
1865.Sy Read-Only
1866.Ed
1867.Pp
1868The
1869.Dv MAC_PROP_ADV_100HDX_CAP
1870property describes whether or not 100 Mbit/s half-duplex support is
1871advertised.
1872.It Dv MAC_PROP_EN_100HDX_CAP
1873.Bd -filled -compact
1874Type:
1875.Vt uint8_t |
1876Permissions:
1877.Sy Read/Write
1878.Ed
1879.Pp
1880The
1881.Dv MAC_PROP_EN_100HDX_CAP
1882property describes whether or not 100 Mbit/s half-duplex support is
1883enabled.
1884.It Dv MAC_PROP_ADV_100FDX_CAP
1885.Bd -filled -compact
1886Type:
1887.Vt uint8_t |
1888Permissions:
1889.Sy Read-Only
1890.Ed
1891.Pp
1892The
1893.Dv MAC_PROP_ADV_100FDX_CAP
1894property describes whether or not 100 Mbit/s full-duplex support is
1895advertised.
1896.It Dv MAC_PROP_EN_100FDX_CAP
1897.Bd -filled -compact
1898Type:
1899.Vt uint8_t |
1900Permissions:
1901.Sy Read/Write
1902.Ed
1903.Pp
1904The
1905.Dv MAC_PROP_EN_100FDX_CAP
1906property describes whether or not 100 Mbit/s full-duplex support is
1907enabled.
1908.It Dv MAC_PROP_ADV_100T4_CAP
1909.Bd -filled -compact
1910Type:
1911.Vt uint8_t |
1912Permissions:
1913.Sy Read-Only
1914.Ed
1915.Pp
1916The
1917.Dv MAC_PROP_ADV_100T4_CAP
1918property describes whether or not 100 Mbit/s Ethernet using the
1919100BASE-T4 standard is
1920advertised.
1921.It Dv MAC_PROP_EN_100T4_CAP
1922.Bd -filled -compact
1923Type:
1924.Vt uint8_t |
1925Permissions:
1926.Sy Read/Write
1927.Ed
1928.Pp
1929The
1930.Dv MAC_PROP_EN_100T4_CAP
1931property describes whether or not 100 Mbit/s Ethernet using the
1932100BASE-T4 standard is
1933enabled.
1934.It Dv MAC_PROP_ADV_1000HDX_CAP
1935.Bd -filled -compact
1936Type:
1937.Vt uint8_t |
1938Permissions:
1939.Sy Read-Only
1940.Ed
1941.Pp
1942The
1943.Dv MAC_PROP_ADV_1000HDX_CAP
1944property describes whether or not 1 Gbit/s half-duplex support is
1945advertised.
1946.It Dv MAC_PROP_EN_1000HDX_CAP
1947.Bd -filled -compact
1948Type:
1949.Vt uint8_t |
1950Permissions:
1951.Sy Read/Write
1952.Ed
1953.Pp
1954The
1955.Dv MAC_PROP_EN_1000HDX_CAP
1956property describes whether or not 1 Gbit/s half-duplex support is
1957enabled.
1958.It Dv MAC_PROP_ADV_1000FDX_CAP
1959.Bd -filled -compact
1960Type:
1961.Vt uint8_t |
1962Permissions:
1963.Sy Read-Only
1964.Ed
1965.Pp
1966The
1967.Dv MAC_PROP_ADV_1000FDX_CAP
1968property describes whether or not 1 Gbit/s full-duplex support is
1969advertised.
1970.It Dv MAC_PROP_EN_1000FDX_CAP
1971.Bd -filled -compact
1972Type:
1973.Vt uint8_t |
1974Permissions:
1975.Sy Read/Write
1976.Ed
1977.Pp
1978The
1979.Dv MAC_PROP_EN_1000FDX_CAP
1980property describes whether or not 1 Gbit/s full-duplex support is
1981enabled.
1982.It Dv MAC_PROP_ADV_2500FDX_CAP
1983.Bd -filled -compact
1984Type:
1985.Vt uint8_t |
1986Permissions:
1987.Sy Read-Only
1988.Ed
1989.Pp
1990The
1991.Dv MAC_PROP_ADV_2500FDX_CAP
1992property describes whether or not 2.5 Gbit/s full-duplex support is
1993advertised.
1994.It Dv MAC_PROP_EN_2500FDX_CAP
1995.Bd -filled -compact
1996Type:
1997.Vt uint8_t |
1998Permissions:
1999.Sy Read/Write
2000.Ed
2001.Pp
2002The
2003.Dv MAC_PROP_EN_2500FDX_CAP
2004property describes whether or not 2.5 Gbit/s full-duplex support is
2005enabled.
2006.It Dv MAC_PROP_ADV_5000FDX_CAP
2007.Bd -filled -compact
2008Type:
2009.Vt uint8_t |
2010Permissions:
2011.Sy Read-Only
2012.Ed
2013.Pp
2014The
2015.Dv MAC_PROP_ADV_5000FDX_CAP
2016property describes whether or not 5.0 Gbit/s full-duplex support is
2017advertised.
2018.It Dv MAC_PROP_EN_5000FDX_CAP
2019.Bd -filled -compact
2020Type:
2021.Vt uint8_t |
2022Permissions:
2023.Sy Read/Write
2024.Ed
2025.Pp
2026The
2027.Dv MAC_PROP_EN_5000FDX_CAP
2028property describes whether or not 5.0 Gbit/s full-duplex support is
2029enabled.
2030.It Dv MAC_PROP_ADV_10GFDX_CAP
2031.Bd -filled -compact
2032Type:
2033.Vt uint8_t |
2034Permissions:
2035.Sy Read-Only
2036.Ed
2037.Pp
2038The
2039.Dv MAC_PROP_ADV_10GFDX_CAP
2040property describes whether or not 10 Gbit/s full-duplex support is
2041advertised.
2042.It Dv MAC_PROP_EN_10GFDX_CAP
2043.Bd -filled -compact
2044Type:
2045.Vt uint8_t |
2046Permissions:
2047.Sy Read/Write
2048.Ed
2049.Pp
2050The
2051.Dv MAC_PROP_EN_10GFDX_CAP
2052property describes whether or not 10 Gbit/s full-duplex support is
2053enabled.
2054.It Dv MAC_PROP_ADV_40GFDX_CAP
2055.Bd -filled -compact
2056Type:
2057.Vt uint8_t |
2058Permissions:
2059.Sy Read-Only
2060.Ed
2061.Pp
2062The
2063.Dv MAC_PROP_ADV_40GFDX_CAP
2064property describes whether or not 40 Gbit/s full-duplex support is
2065advertised.
2066.It Dv MAC_PROP_EN_40GFDX_CAP
2067.Bd -filled -compact
2068Type:
2069.Vt uint8_t |
2070Permissions:
2071.Sy Read/Write
2072.Ed
2073.Pp
2074The
2075.Dv MAC_PROP_EN_40GFDX_CAP
2076property describes whether or not 40 Gbit/s full-duplex support is
2077enabled.
2078.It Dv MAC_PROP_ADV_100GFDX_CAP
2079.Bd -filled -compact
2080Type:
2081.Vt uint8_t |
2082Permissions:
2083.Sy Read-Only
2084.Ed
2085.Pp
2086The
2087.Dv MAC_PROP_ADV_100GFDX_CAP
2088property describes whether or not 100 Gbit/s full-duplex support is
2089advertised.
2090.It Dv MAC_PROP_EN_100GFDX_CAP
2091.Bd -filled -compact
2092Type:
2093.Vt uint8_t |
2094Permissions:
2095.Sy Read/Write
2096.Ed
2097.Pp
2098The
2099.Dv MAC_PROP_EN_100GFDX_CAP
2100property describes whether or not 100 Gbit/s full-duplex support is
2101enabled.
2102.El
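.Pp
The split between the
.Sy _EN_
and
.Sy _ADV_
families can be illustrated with the following sketch for a single speed.
The enumeration values, the example_link_t structure, and the renegotiation
counter are hypothetical stand-ins so that the code builds outside of the
kernel.
Returning
.Er ENOTSUP
for an attempt to set the read-only property is an assumption of this sketch.
```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical identifiers standing in for the real property constants. */
typedef enum example_prop {
	EXAMPLE_PROP_ADV_1000FDX,  /* stands in for MAC_PROP_ADV_1000FDX_CAP */
	EXAMPLE_PROP_EN_1000FDX    /* stands in for MAC_PROP_EN_1000FDX_CAP */
} example_prop_t;

/* Hypothetical link state kept by an imaginary driver. */
typedef struct example_link {
	bool		el_autoneg;		/* auto-negotiation enabled */
	uint8_t		el_en_1000fdx;		/* what the administrator set */
	uint8_t		el_adv_1000fdx;		/* what the hardware reports */
	unsigned int	el_renegotiations;	/* simulated link resets */
} example_link_t;

/* Both families can be read; each returns its own recorded value. */
int
example_getprop(const example_link_t *el, example_prop_t prop, uint8_t *valp)
{
	switch (prop) {
	case EXAMPLE_PROP_ADV_1000FDX:
		*valp = el->el_adv_1000fdx;
		return (0);
	case EXAMPLE_PROP_EN_1000FDX:
		*valp = el->el_en_1000fdx;
		return (0);
	default:
		return (ENOTSUP);
	}
}

/* Only the _EN_ property can be written. */
int
example_setprop(example_link_t *el, example_prop_t prop, uint8_t val)
{
	switch (prop) {
	case EXAMPLE_PROP_EN_1000FDX:
		el->el_en_1000fdx = (val != 0) ? 1 : 0;
		/* Restart negotiation so that the change takes effect. */
		if (el->el_autoneg)
			el->el_renegotiations++;
		return (0);
	case EXAMPLE_PROP_ADV_1000FDX:
	default:
		return (ENOTSUP);
	}
}
```
In a real driver the value backing the
.Sy _ADV_
property would be refreshed from the hardware, typically when the link state
changes, rather than being written by the set path.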
2103.Ss Private Properties
2104In addition to the defined properties above, drivers are allowed to
2105define private properties.
2106These private properties are device-specific properties.
2107All private properties share the same constant,
2108.Dv MAC_PROP_PRIVATE .
2109Properties are distinguished by a name, which is a character string.
2110The list of such private properties is defined when registering with mac in the
2111.Fa m_priv_props
2112member of the
2113.Xr mac_register 9S
2114structure.
2115.Pp
2116The driver may define whatever semantics it wants for these private
2117properties.
2118They will not be listed when running
2119.Xr dladm 8 ,
2120unless explicitly requested by name.
2121All such properties should start with a leading underscore character and then
2122consist of alphanumeric ASCII characters and additional underscores or hyphens.
2123.Pp
2124Properties of type
2125.Dv MAC_PROP_PRIVATE
2126may show up in all three property related entry points:
2127.Xr mc_propinfo 9E ,
2128.Xr mc_getprop 9E ,
2129and
2130.Xr mc_setprop 9E .
2131Device drivers should tell the different properties apart by using the
2132.Xr strcmp 9F
function to compare the property name against the set of names that the
driver knows about.
2134When encountering properties that it doesn't know, it should treat them
2135like all other unknown properties.
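.Pp
As a rough sketch of the naming rule and of telling private properties apart
by name, consider the following.
The property names, the example_priv_t structure, and the use of
.Er ENOTSUP
for an unknown name are assumptions made for illustration.
The standard C strcmp() is used here so that the code builds outside of the
kernel; a driver would use
.Xr strcmp 9F .
```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tunables of an imaginary driver. */
typedef struct example_priv {
	uint32_t	ep_rx_intr_throttle;
	uint32_t	ep_tx_copy_thresh;
} example_priv_t;

/*
 * Check the naming convention: a leading underscore followed only by
 * alphanumeric characters, underscores, or hyphens.
 */
bool
example_priv_name_ok(const char *name)
{
	const char *p;

	if (name[0] != '_')
		return (false);
	for (p = name + 1; *p != 0; p++) {
		if (!isalnum((unsigned char)*p) && *p != '_' && *p != '-')
			return (false);
	}
	return (true);
}

/* Look up a private property by comparing its name. */
int
example_priv_getprop(const example_priv_t *ep, const char *name,
    uint32_t *valp)
{
	if (strcmp(name, "_rx_intr_throttle") == 0) {
		*valp = ep->ep_rx_intr_throttle;
		return (0);
	}
	if (strcmp(name, "_tx_copy_thresh") == 0) {
		*valp = ep->ep_tx_copy_thresh;
		return (0);
	}
	/* Unknown names are handled like any other unknown property. */
	return (ENOTSUP);
}
```
The sketch only covers the name comparison; how values are encoded when they
are passed to and from the entry points is described in the corresponding
entry point manual pages.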
2136.Sh STATISTICS
2137The MAC framework defines a couple different sets of statistics which
2138are based on various standards for devices to implement.
2139Statistics are retrieved through the
2140.Xr mc_getstat 9E
2141entry point.
Some statistics are required for all devices, while a separate set is
specific to Ethernet devices.
2144Not all devices will support every statistic.
2145In many cases, several device registers will need to be combined to create the
2146proper stat.
2147.Pp
2148In general, if the device is not keeping track of these statistics, then
2149it is recommended that the driver store these values as a
2150.Vt uint64_t
2151to ensure that overflow does not occur.
2152.Pp
2153If a device does not support a specific statistic, then it is fine to
2154return that it is not supported.
2155The same should be used for unrecognized statistics.
2156See
2157.Xr mc_getstat 9E
2158for more information on the proper way to handle these.
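.Pp
The points above can be sketched as follows: counters are kept as
.Vt uint64_t
values, one statistic is assembled from several counters, and anything else
is reported as not supported.
The statistic identifiers and the example_stats_t structure are hypothetical
stand-ins, and the assumption that the hardware exposes 32-bit
clear-on-read registers is purely illustrative.
```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical identifiers standing in for the real statistic constants. */
typedef enum example_stat {
	EXAMPLE_STAT_IPACKETS,	/* stands in for MAC_STAT_IPACKETS */
	EXAMPLE_STAT_IERRORS,	/* stands in for MAC_STAT_IERRORS */
	EXAMPLE_STAT_COLLISIONS	/* not tracked by this imaginary device */
} example_stat_t;

/* Software totals maintained by an imaginary driver. */
typedef struct example_stats {
	uint64_t	es_rx_ucast;
	uint64_t	es_rx_mcast;
	uint64_t	es_rx_bcast;
	uint64_t	es_rx_crc_err;
	uint64_t	es_rx_len_err;
} example_stats_t;

/*
 * Fold the value read from a 32-bit clear-on-read register into a
 * 64-bit running total so that the total does not wrap.
 */
void
example_stat_accumulate(uint64_t *totalp, uint32_t hw_delta)
{
	*totalp += hw_delta;
}

/* Sketch of the core of an mc_getstat(9E) implementation. */
int
example_getstat(const example_stats_t *es, example_stat_t stat,
    uint64_t *valp)
{
	switch (stat) {
	case EXAMPLE_STAT_IPACKETS:
		/* Several device counters combine into one statistic. */
		*valp = es->es_rx_ucast + es->es_rx_mcast + es->es_rx_bcast;
		return (0);
	case EXAMPLE_STAT_IERRORS:
		*valp = es->es_rx_crc_err + es->es_rx_len_err;
		return (0);
	default:
		/* Unsupported and unrecognized statistics alike. */
		return (ENOTSUP);
	}
}
```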
2159.Ss General Device Statistics
2160The following statistics are based on MIB-II statistics from both RFC
21611213 and RFC 1573.
2162.Bl -tag -width Ds
2163.It Dv MAC_STAT_IFSPEED
2164The device's current speed in bits per second.
2165.It Dv MAC_STAT_MULTIRCV
2166The total number of received multicast packets.
2167.It Dv MAC_STAT_BRDCSTRCV
2168The total number of received broadcast packets.
2169.It Dv MAC_STAT_MULTIXMT
2170The total number of transmitted multicast packets.
2171.It Dv MAC_STAT_BRDCSTXMT
The total number of transmitted broadcast packets.
2173.It Dv MAC_STAT_NORCVBUF
2174The total number of packets discarded by the hardware due to a lack of
2175receive buffers.
2176.It Dv MAC_STAT_IERRORS
2177The total number of errors detected on input.
2178.It Dv MAC_STAT_UNKNOWNS
2179The total number of received packets that were discarded because they
2180were of an unknown protocol.
2181.It Dv MAC_STAT_NOXMTBUF
2182The total number of outgoing packets dropped due to a lack of transmit
2183buffers.
2184.It Dv MAC_STAT_OERRORS
2185The total number of outgoing packets that resulted in errors.
2186.It Dv MAC_STAT_COLLISIONS
2187Total number of collisions encountered by the transmitter.
2188.It Dv MAC_STAT_RBYTES
2189The total number of bytes received by the device, regardless of packet
2190type.
2191.It Dv MAC_STAT_IPACKETS
2192The total number of packets received by the device, regardless of packet type.
2193.It Dv MAC_STAT_OBYTES
2194The total number of bytes transmitted by the device, regardless of packet type.
2195.It Dv MAC_STAT_OPACKETS
2196The total number of packets sent by the device, regardless of packet type.
2197.It Dv MAC_STAT_UNDERFLOWS
2198The total number of packets that were smaller than the minimum sized
2199packet for the device and were therefore dropped.
2200.It Dv MAC_STAT_OVERFLOWS
2201The total number of packets that were larger than the maximum sized
2202packet for the device and were therefore dropped.
2203.El
2204.Ss Ethernet Specific Statistics
2205The following statistics are specific to Ethernet devices.
2206They refer to values from RFC 1643 and include various MII/GMII specific stats.
2207Many of these are also defined in IEEE 802.3.
2208.Bl -tag -width Ds
2209.It Dv ETHER_STAT_ADV_CAP_1000FDX
2210Indicates that the device is advertising support for 1 Gbit/s
2211full-duplex operation.
2212.It Dv ETHER_STAT_ADV_CAP_1000HDX
2213Indicates that the device is advertising support for 1 Gbit/s
2214half-duplex operation.
2215.It Dv ETHER_STAT_ADV_CAP_100FDX
2216Indicates that the device is advertising support for 100 Mbit/s
2217full-duplex operation.
2218.It Dv ETHER_STAT_ADV_CAP_100GFDX
2219Indicates that the device is advertising support for 100 Gbit/s
2220full-duplex operation.
2221.It Dv ETHER_STAT_ADV_CAP_100HDX
2222Indicates that the device is advertising support for 100 Mbit/s
2223half-duplex operation.
2224.It Dv ETHER_STAT_ADV_CAP_100T4
2225Indicates that the device is advertising support for 100 Mbit/s
2226100BASE-T4 operation.
2227.It Dv ETHER_STAT_ADV_CAP_10FDX
2228Indicates that the device is advertising support for 10 Mbit/s
2229full-duplex operation.
2230.It Dv ETHER_STAT_ADV_CAP_10GFDX
2231Indicates that the device is advertising support for 10 Gbit/s
2232full-duplex operation.
2233.It Dv ETHER_STAT_ADV_CAP_10HDX
2234Indicates that the device is advertising support for 10 Mbit/s
2235half-duplex operation.
2236.It Dv ETHER_STAT_ADV_CAP_2500FDX
2237Indicates that the device is advertising support for 2.5 Gbit/s
2238full-duplex operation.
2239.It Dv ETHER_STAT_ADV_CAP_40GFDX
2240Indicates that the device is advertising support for 40 Gbit/s
2241full-duplex operation.
2242.It Dv ETHER_STAT_ADV_CAP_5000FDX
2243Indicates that the device is advertising support for 5.0 Gbit/s
2244full-duplex operation.
2245.It Dv ETHER_STAT_ADV_CAP_ASMPAUSE
2246Indicates that the device is advertising support for receiving pause
2247frames.
2248.It Dv ETHER_STAT_ADV_CAP_AUTONEG
2249Indicates that the device is advertising support for auto-negotiation.
2250.It Dv ETHER_STAT_ADV_CAP_PAUSE
2251Indicates that the device is advertising support for generating pause
2252frames.
2253.It Dv ETHER_STAT_ADV_REMFAULT
2254Indicates that the device is advertising support for detecting faults in
2255the remote link peer.
2256.It Dv ETHER_STAT_ALIGN_ERRORS
2257Indicates the number of times an alignment error was generated by the
2258Ethernet device.
2259This is a count of packets that were not an integral number of octets and failed
2260the FCS check.
2261.It Dv ETHER_STAT_CAP_1000FDX
2262Indicates the device supports 1 Gbit/s full-duplex operation.
2263.It Dv ETHER_STAT_CAP_1000HDX
2264Indicates the device supports 1 Gbit/s half-duplex operation.
2265.It Dv ETHER_STAT_CAP_100FDX
2266Indicates the device supports 100 Mbit/s full-duplex operation.
2267.It Dv ETHER_STAT_CAP_100GFDX
2268Indicates the device supports 100 Gbit/s full-duplex operation.
2269.It Dv ETHER_STAT_CAP_100HDX
2270Indicates the device supports 100 Mbit/s half-duplex operation.
2271.It Dv ETHER_STAT_CAP_100T4
2272Indicates the device supports 100 Mbit/s 100BASE-T4 operation.
2273.It Dv ETHER_STAT_CAP_10FDX
2274Indicates the device supports 10 Mbit/s full-duplex operation.
2275.It Dv ETHER_STAT_CAP_10GFDX
2276Indicates the device supports 10 Gbit/s full-duplex operation.
2277.It Dv ETHER_STAT_CAP_10HDX
2278Indicates the device supports 10 Mbit/s half-duplex operation.
2279.It Dv ETHER_STAT_CAP_2500FDX
2280Indicates the device supports 2.5 Gbit/s full-duplex operation.
2281.It Dv ETHER_STAT_CAP_40GFDX
2282Indicates the device supports 40 Gbit/s full-duplex operation.
2283.It Dv ETHER_STAT_CAP_5000FDX
2284Indicates the device supports 5.0 Gbit/s full-duplex operation.
2285.It Dv ETHER_STAT_CAP_ASMPAUSE
2286Indicates that the device supports the ability to receive pause frames.
2287.It Dv ETHER_STAT_CAP_AUTONEG
2288Indicates that the device supports the ability to perform link
2289auto-negotiation.
2290.It Dv ETHER_STAT_CAP_PAUSE
2291Indicates that the device supports the ability to transmit pause frames.
2292.It Dv ETHER_STAT_CAP_REMFAULT
2293Indicates that the device supports the ability of detecting a remote
2294fault in a link peer.
2295.It Dv ETHER_STAT_CARRIER_ERRORS
2296Indicates the number of times that the Ethernet carrier sense condition
2297was lost or not asserted.
2298.It Dv ETHER_STAT_DEFER_XMTS
2299Indicates the number of frames for which the device was unable to
2300transmit the frame due to being busy and had to try again.
2301.It Dv ETHER_STAT_EX_COLLISIONS
2302Indicates the number of frames that failed to send due to an excessive
2303number of collisions.
2304.It Dv ETHER_STAT_FCS_ERRORS
2305Indicates the number of times that a frame check sequence failed.
2306.It Dv ETHER_STAT_FIRST_COLLISIONS
2307Indicates the number of times that a frame was eventually transmitted
2308successfully, but only after a single collision.
2309.It Dv ETHER_STAT_JABBER_ERRORS
2310Indicates the number of frames that were received that were both larger
2311than the maximum packet size and failed the frame check sequence.
2312.It Dv ETHER_STAT_LINK_ASMPAUSE
2313Indicates whether the link is currently configured to accept pause
2314frames.
2315.It Dv ETHER_STAT_LINK_AUTONEG
2316Indicates whether the current link state is a result of
2317auto-negotiation.
2318.It Dv ETHER_STAT_LINK_DUPLEX
2319Indicates the current duplex state of the link.
2320The values used here should be the same as documented for
2321.Dv MAC_PROP_DUPLEX .
2322.It Dv ETHER_STAT_LINK_PAUSE
2323Indicates whether the link is currently configured to generate pause
2324frames.
2325.It Dv ETHER_STAT_LP_CAP_1000FDX
2326Indicates the remote device supports 1 Gbit/s full-duplex operation.
2327.It Dv ETHER_STAT_LP_CAP_1000HDX
2328Indicates the remote device supports 1 Gbit/s half-duplex operation.
2329.It Dv ETHER_STAT_LP_CAP_100FDX
2330Indicates the remote device supports 100 Mbit/s full-duplex operation.
2331.It Dv ETHER_STAT_LP_CAP_100GFDX
2332Indicates the remote device supports 100 Gbit/s full-duplex operation.
2333.It Dv ETHER_STAT_LP_CAP_100HDX
2334Indicates the remote device supports 100 Mbit/s half-duplex operation.
2335.It Dv ETHER_STAT_LP_CAP_100T4
2336Indicates the remote device supports 100 Mbit/s 100BASE-T4 operation.
2337.It Dv ETHER_STAT_LP_CAP_10FDX
2338Indicates the remote device supports 10 Mbit/s full-duplex operation.
2339.It Dv ETHER_STAT_LP_CAP_10GFDX
2340Indicates the remote device supports 10 Gbit/s full-duplex operation.
2341.It Dv ETHER_STAT_LP_CAP_10HDX
2342Indicates the remote device supports 10 Mbit/s half-duplex operation.
2343.It Dv ETHER_STAT_LP_CAP_2500FDX
2344Indicates the remote device supports 2.5 Gbit/s full-duplex operation.
2345.It Dv ETHER_STAT_LP_CAP_40GFDX
2346Indicates the remote device supports 40 Gbit/s full-duplex operation.
2347.It Dv ETHER_STAT_LP_CAP_5000FDX
2348Indicates the remote device supports 5.0 Gbit/s full-duplex operation.
2349.It Dv ETHER_STAT_LP_CAP_ASMPAUSE
2350Indicates that the remote device supports the ability to receive pause
2351frames.
2352.It Dv ETHER_STAT_LP_CAP_AUTONEG
2353Indicates that the remote device supports the ability to perform link
2354auto-negotiation.
2355.It Dv ETHER_STAT_LP_CAP_PAUSE
2356Indicates that the remote device supports the ability to transmit pause
2357frames.
2358.It Dv ETHER_STAT_LP_CAP_REMFAULT
2359Indicates that the remote device supports the ability of detecting a
2360remote fault in a link peer.
2361.It Dv ETHER_STAT_MACRCV_ERRORS
2362Indicates the number of times that the internal MAC layer encountered an
2363error when attempting to receive and process a frame.
2364.It Dv ETHER_STAT_MACXMT_ERRORS
2365Indicates the number of times that the internal MAC layer encountered an
2366error when attempting to process and transmit a frame.
2367.It Dv ETHER_STAT_MULTI_COLLISIONS
2368Indicates the number of times that a frame was eventually transmitted
2369successfully, but only after more than one collision.
2370.It Dv ETHER_STAT_SQE_ERRORS
2371Indicates the number of times that an SQE error occurred.
2372The specific conditions for this error are documented in IEEE 802.3.
2373.It Dv ETHER_STAT_TOOLONG_ERRORS
2374Indicates the number of frames that were received that were longer than
2375the maximum frame size supported by the device.
2376.It Dv ETHER_STAT_TOOSHORT_ERRORS
2377Indicates the number of frames that were received that were shorter than
2378the minimum frame size supported by the device.
2379.It Dv ETHER_STAT_TX_LATE_COLLISIONS
2380Indicates the number of times a collision was detected late on the
2381device.
2382.It Dv ETHER_STAT_XCVR_ADDR
Indicates the address of the MII/GMII transceiver.
.It Dv ETHER_STAT_XCVR_ID
Indicates the ID of the MII/GMII transceiver.
2386.It Dv ETHER_STAT_XCVR_INUSE
2387Indicates what kind of transceiver is in use.
2388Use the
2389.Vt mac_ether_media_t
2390enumeration values described in the discussion of
2391.Dv MAC_PROP_MEDIA
2392above.
2393These definitions are compatible with the older subset of
2394XCVR_* macros.
2395.El
2396.Ss Device Specific kstats
2397In addition to the defined statistics above, if the device driver
2398maintains additional statistics or the device provides additional
2399statistics, it should create its own kstats through the
2400.Xr kstat_create 9F
2401function to allow operators to observe them.
2402.Sh RECEIVE DESCRIPTOR LAYOUT
2403One of the important things that a device driver must do is lay out DMA
2404memory, generally in a ring of descriptors, into which received Ethernet
2405frames will be placed.
2406When performing this, there are a few things that drivers should
2407generally do:
2408.Bl -enum -offset indent
2409.It
2410Drivers should lay out memory so that the IP header will be 4-byte
2411aligned.
2412The IP stack expects that the beginning of an IP header will be at a
24134-byte aligned address; however, a DMA allocation will be at a 4-
2414or 8-byte aligned address by default.
The IP header is at a 14 byte offset from the beginning of the Ethernet
2416frame, leaving the IP header at a 2-byte alignment if the Ethernet frame
2417starts at the beginning of the DMA buffer.
2418If VLAN tagging is in place, then each VLAN tag adds 4 bytes, which
2419doesn't change the alignment the IP header is found at.
2420.Pp
2421As a solution to this, the driver should program the device to start
2422placing the received Ethernet frame at two bytes off of the start of the
2423DMA buffer.
This ensures that the IP header will be 4-byte aligned regardless of whether
VLAN tags are present.
2426.It
2427Drivers should try to allocate the DMA memory used for receiving frames
as a contiguous buffer.
2429If for some reason that would not be possible, the driver should try to
2430ensure that there is enough space for all of the initial Ethernet and
2431any possible layer three and layer four headers
2432.Pq such as IP, TCP, or UDP
2433in the initial descriptor.
2434.It
2435As discussed in the
2436.Sx MBLKS AND DMA
2437section, there are multiple strategies for managing the relationship
2438between DMA data, receive descriptors, and the operating system
2439representation of a packet in the
2440.Xr mblk 9S
2441structure.
2442Drivers must limit their resource consumption.
2443See the
2444.Sy Considerations
2445section of
2446.Sx MBLKS AND DMA
2447for more on this.
2448.El
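.Pp
The alignment arithmetic from the first item above can be checked with a
small, self-contained sketch.
The macro and function names are hypothetical; the sizes are the 14 byte
Ethernet header and 4 byte VLAN tag described above.
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	EXAMPLE_ETHER_HDR_LEN	14	/* destination, source, and type */
#define	EXAMPLE_VLAN_TAG_LEN	4	/* one VLAN tag */
#define	EXAMPLE_RX_PAD		2	/* bytes skipped at the buffer start */

/* Offset of the IP header from the start of the DMA buffer. */
size_t
example_ip_offset(size_t pad, unsigned int nvlan)
{
	return (pad + EXAMPLE_ETHER_HDR_LEN +
	    (size_t)nvlan * EXAMPLE_VLAN_TAG_LEN);
}

/* Whether the IP header lands on a 4-byte boundary. */
bool
example_ip_aligned(uintptr_t buf_addr, size_t pad, unsigned int nvlan)
{
	return (((buf_addr + example_ip_offset(pad, nvlan)) & 0x3) == 0);
}
```
With a buffer that starts on a 4- or 8-byte boundary, a pad of zero leaves
the IP header at offset 14, which is only 2-byte aligned.
A pad of
.Dv EXAMPLE_RX_PAD
moves it to offset 16, 20, or 24 for zero, one, or two VLAN tags, all of
which are 4-byte aligned.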
2449.Sh TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
2450Device drivers are the first line of defense for dealing with broken
2451devices and bugs in their firmware.
While most devices will rarely fail, it is important to pay particular
attention to RAS (Reliability, Availability, and Serviceability) when designing
and implementing the device driver.
2455While everything described in this section is optional, it is highly recommended
2456that all new device drivers follow these guidelines.
2457.Pp
2458The Fault Management Architecture (FMA) provides facilities for
2459detecting and reporting various classes of defects and faults.
2460Specifically for networking device drivers, issues that should be
2461detected and reported include:
2462.Bl -bullet -offset indent
2463.It
2464Device internal uncorrectable errors
2465.It
2466Device internal correctable errors
2467.It
2468PCI and PCI Express transport errors
2469.It
2470Device temperature alarms
2471.It
2472Device transmission stalls
2473.It
2474Device communication timeouts
2475.It
High rates of invalid interrupts
2477.El
2478.Pp
2479All such errors fall into three primary categories:
2480.Bl -enum -offset indent
2481.It
2482Errors detected by the Fault Management Architecture
2483.It
2484Errors detected by the device and indicated to the device driver
2485.It
2486Errors detected by the device driver
2487.El
.Ss Fault Management Setup and Teardown
Drivers should initialize support for the fault management framework by
calling
.Xr ddi_fm_init 9F
from their
.Xr attach 9E
routine.
By registering with the fault management framework, a device driver gains the
ability to detect transport errors and to report other errors that it finds.
While a device driver does not need to indicate that it supports all of the
capabilities described in
.Xr ddi_fm_init 9F ,
we suggest that device drivers at least register the
.Dv DDI_FM_EREPORT_CAPABLE
capability so as to allow the driver to report issues that it detects.
.Pp
If the driver registers with the fault management framework during its
.Xr attach 9E
entry point, it must call
.Xr ddi_fm_fini 9F
during its
.Xr detach 9E
entry point.
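.Pp
As a rough illustration of pairing these two calls, the sketch below models
setup and teardown in ordinary C so that it can be compiled outside the kernel.
The
.Vt dev_info_t
and
.Vt ddi_iblock_cookie_t
types, both DDI functions, and the constant values are simplified stand-ins.
The
.Vt example_t
structure and the
.Fn example_fm_setup
and
.Fn example_fm_teardown
names are hypothetical.
A real driver uses the definitions supplied by the DDI instead.
.Bd -literal -offset indent
```c
#include <stddef.h>

/* Stand-ins so the sketch builds in userland; not the real definitions. */
typedef struct dev_info { int di_fm_cap; } dev_info_t;
typedef void *ddi_iblock_cookie_t;
#define	DDI_FM_NOT_CAPABLE	0x0	/* placeholder value */
#define	DDI_FM_EREPORT_CAPABLE	0x1	/* placeholder value */

/* Stub: the real routine may lower *capp to what the system supports. */
static void
ddi_fm_init(dev_info_t *dip, int *capp, ddi_iblock_cookie_t *ibcp)
{
	(void) ibcp;
	dip->di_fm_cap = *capp;
}

/* Stub: the real routine releases the framework's resources. */
static void
ddi_fm_fini(dev_info_t *dip)
{
	dip->di_fm_cap = DDI_FM_NOT_CAPABLE;
}

/* Hypothetical per-instance soft state. */
typedef struct example {
	dev_info_t		*ex_dip;
	int			ex_fm_cap;
	ddi_iblock_cookie_t	ex_fm_ibc;
} example_t;

/* Called from attach(9E): ask for at least ereport support. */
static void
example_fm_setup(example_t *ex)
{
	ex->ex_fm_cap = DDI_FM_EREPORT_CAPABLE;
	ddi_fm_init(ex->ex_dip, &ex->ex_fm_cap, &ex->ex_fm_ibc);
}

/* Called from detach(9E): only tear down what was set up. */
static void
example_fm_teardown(example_t *ex)
{
	if (ex->ex_fm_cap != DDI_FM_NOT_CAPABLE) {
		ddi_fm_fini(ex->ex_dip);
		ex->ex_fm_cap = DDI_FM_NOT_CAPABLE;
	}
}
```
.Ed
.Pp
Keeping the negotiated capability in the soft state lets the teardown path skip
.Xr ddi_fm_fini 9F
when nothing was registered.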
.Ss Transport Errors
Many modern networking devices leverage PCI or PCI Express.
As such, there are two primary ways that device drivers access data: they either
memory map device registers and use routines like
.Xr ddi_get8 9F
and
.Xr ddi_put8 9F
or they use direct memory access (DMA).
New device drivers should always enable checking of the transport layer by
marking their support in the
.Xr ddi_device_acc_attr 9S
structure and using routines like
.Xr ddi_fm_acc_err_get 9F
and
.Xr ddi_fm_dma_err_get 9F
to detect if errors have occurred.
.Ss Device Indicated Errors
Many devices have capabilities to announce to a device driver that a
correctable error or a fatal uncorrectable error has occurred.
Other devices have the ability to indicate that various physical issues have
occurred such as a fan failing or a temperature sensor having fired.
.Pp
Drivers should wire themselves to receive notifications when these
events occur.
The means and capabilities will vary from device to device.
For example, some devices will generate information about these notifications
through special interrupts.
Other devices may have a register that software can poll.
In the cases where polling is required, driver writers should try not to poll
too frequently and should generally only poll when the device is actively being
used, for example, between calls to the
.Xr mc_start 9E
and
.Xr mc_stop 9E
entry points.
.Ss Driver Transmit Stall Detection
One of the primary responsibilities of a hardened device driver is to
perform transmit stall detection.
The core idea behind tx stall detection is that the driver should record when
it last saw evidence that data was successfully transmitted, such as a transmit
completion.
Most devices should be transmitting data on a regular basis as long as the link
is up.
If a device is not, then this may indicate that the device is stuck and needs
to be reset.
At this time, the MAC framework does not provide any resources for performing
these checks; however, polling on each individual transmit ring for the last
completion time while something is actively being transmitted through the use of
routines such as
.Xr timeout 9F
may be a reasonable starting point.
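.Pp
One possible shape for such a check is sketched below in ordinary C.
The
.Vt example_tx_ring_t
structure, the function names, and the two second threshold are all
assumptions made for illustration.
Times are passed in as nanosecond values so that the logic can be exercised
without a kernel clock; a driver would obtain them from its own time source and
run the check from a periodic callback.
.Bd -literal -offset indent
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring bookkeeping; times are in nanoseconds. */
typedef struct example_tx_ring {
	uint32_t	etr_pending;	/* descriptors awaiting completion */
	int64_t		etr_last_done;	/* time of the last sign of progress */
} example_tx_ring_t;

/* Assumed threshold: two seconds with work queued and no completions. */
#define	EXAMPLE_TX_STALL_NS	(2LL * 1000 * 1000 * 1000)

/* Transmit path: restart the clock when an idle ring becomes busy. */
static void
example_tx_queued(example_tx_ring_t *ring, int64_t now)
{
	if (ring->etr_pending == 0)
		ring->etr_last_done = now;
	ring->etr_pending++;
}

/* Completion path: record that the device made progress. */
static void
example_tx_done(example_tx_ring_t *ring, uint32_t count, int64_t now)
{
	ring->etr_pending -= count;
	ring->etr_last_done = now;
}

/* Periodic check: a ring with no queued work is never considered stalled. */
static bool
example_tx_stalled(const example_tx_ring_t *ring, int64_t now)
{
	if (ring->etr_pending == 0)
		return (false);
	return (now - ring->etr_last_done > EXAMPLE_TX_STALL_NS);
}
```
.Ed
.Pp
Only checking rings that have outstanding work avoids reporting a stall on a
link that is simply idle.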
.Ss Driver Command Timeout Detection
Each device is programmed in different ways.
Some devices are programmed through asynchronous commands while others are
programmed by writing directly to memory mapped registers.
If a device receives asynchronous replies to commands, then the device driver
should set reasonable timeouts for all such commands and plan on detecting when
they expire.
If a timeout occurs, the driver should presume that there is an issue with the
hardware and proceed to abort the command or reset the device.
.Pp
Many devices do not have such a communication mechanism.
However, whenever there is some activity where the device driver must wait, then
it should be prepared for the fact that the device may never get back to
it and react appropriately by performing some kind of device reset.
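.Pp
The bounded wait described above might look like the following sketch.
Everything in it is hypothetical: the callback types, the
.Fn example_cmd_wait
function, and the small simulated device that is included only so that the
logic can be run outside the kernel.
A real driver would back the callbacks with register reads and a kernel clock,
and would sleep or delay between polls rather than spin.
.Bd -literal -offset indent
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks: has the command finished, and what time is it? */
typedef bool (*example_done_f)(void *arg);
typedef int64_t (*example_now_f)(void *arg);

typedef enum { EXAMPLE_CMD_OK, EXAMPLE_CMD_TIMEOUT } example_cmd_res_t;

/* Wait for a command to complete, but never longer than limit. */
static example_cmd_res_t
example_cmd_wait(example_done_f done, example_now_f now, void *arg,
    int64_t limit)
{
	int64_t deadline = now(arg) + limit;

	for (;;) {
		if (done(arg))
			return (EXAMPLE_CMD_OK);
		if (now(arg) >= deadline)
			return (EXAMPLE_CMD_TIMEOUT);
	}
}

/* Simulated device: fake time advances by one on every clock read. */
typedef struct example_sim {
	int64_t	es_now;		/* current fake time */
	int64_t	es_done_at;	/* fake time at which the command completes */
} example_sim_t;

static int64_t
example_sim_now(void *arg)
{
	example_sim_t *sim = arg;

	return (sim->es_now++);
}

static bool
example_sim_done(void *arg)
{
	example_sim_t *sim = arg;

	return (sim->es_now >= sim->es_done_at);
}
```
.Ed
.Pp
A caller that receives the timeout result would then abort the command or
begin a device reset.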
.Ss Reacting to Errors
When any of the above categories of errors has been triggered, the
behavior that the device driver should take depends on the kind of
error.
If a fatal error occurs, for example, a transport error, a transmit stall, or
an uncorrectable error indicated by the device, then it is
important that the driver take the following steps:
.Bl -enum -offset indent
.It
Set a flag in the device driver's state that indicates that it has hit
an error condition.
When this error condition flag is asserted, transmitted packets should be
accepted and dropped and actions that would require writing to the device state
should fail with an error.
This flag should remain until the device has been successfully restarted.
.It
If the error was not a transport error that the fault management
architecture itself detected and reported, then
the device driver should post an
.Sy ereport
indicating what has occurred with the
.Xr ddi_fm_ereport_post 9F
function.
.It
The device driver should indicate that the device's service was lost
with a call to
.Xr ddi_fm_service_impact 9F
using the symbol
.Dv DDI_SERVICE_LOST .
.It
At this point the device driver should issue a device reset through some
device-specific means.
.It
When the device reset has been completed, then the device driver should
restore all of the programmed state to the device.
This includes things like the current MTU, advertised auto-negotiation speeds,
MAC address filters, and more.
.It
Finally, when service has been restored, the device driver should call
.Xr ddi_fm_service_impact 9F
using the symbol
.Dv DDI_SERVICE_RESTORED .
.El
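.Pp
The ordering of the steps above can be sketched as a small model in ordinary C.
It does not call the DDI; each step is recorded in a log so that the sequence
can be inspected, and a comment names the call a driver would make at that
point.
The
.Vt example_t
structure and every function name are hypothetical, and the split into a
begin routine and a completion routine assumes that the reset finishes
asynchronously.
.Bd -literal -offset indent
```c
#include <stdbool.h>

/* Hypothetical step log standing in for the real DDI and device calls. */
typedef enum {
	EX_STEP_EREPORT,	/* would call ddi_fm_ereport_post(9F) */
	EX_STEP_LOST,		/* ddi_fm_service_impact(9F), DDI_SERVICE_LOST */
	EX_STEP_RESET,		/* device-specific reset */
	EX_STEP_RESTORE,	/* reprogram MTU, filters, and so on */
	EX_STEP_RESTORED	/* ddi_fm_service_impact(9F), DDI_SERVICE_RESTORED */
} example_step_t;

typedef struct example {
	bool		ex_error;	/* set while the device is unusable */
	int		ex_nlog;
	example_step_t	ex_log[8];
} example_t;

static void
example_log(example_t *ex, example_step_t step)
{
	if (ex->ex_nlog < 8)
		ex->ex_log[ex->ex_nlog++] = step;
}

/* Steps 1 to 4; fma_reported is true for FMA-detected transport errors. */
static void
example_fatal_begin(example_t *ex, bool fma_reported)
{
	ex->ex_error = true;
	if (!fma_reported)
		example_log(ex, EX_STEP_EREPORT);
	example_log(ex, EX_STEP_LOST);
	example_log(ex, EX_STEP_RESET);
}

/* Steps 5 and 6, run once the device reports that the reset finished. */
static void
example_reset_done(example_t *ex)
{
	example_log(ex, EX_STEP_RESTORE);
	example_log(ex, EX_STEP_RESTORED);
	ex->ex_error = false;
}

/* Transmit path: accept and drop packets while the flag is set. */
static bool
example_tx_should_drop(const example_t *ex)
{
	return (ex->ex_error);
}
```
.Ed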
.Pp
When a non-fatal error occurs, then the device driver should submit an
ereport and should optionally mark the device degraded using
.Xr ddi_fm_service_impact 9F
with the
.Dv DDI_SERVICE_DEGRADED
value depending on the nature of the problem that has occurred.
.Pp
Device drivers should never make the decision to remove a device from
service based on errors that have occurred nor should they panic the
system.
Rather, the device driver should always try to notify the operating system with
various ereports and allow its policy decisions to occur.
The decision to retire a device lies in the hands of the fault management
architecture.
It knows more about the operator's intent and the surrounding system's state
than the device driver itself does and it will make the call to offline and
retire the device if it is required.
.Ss Device Resets
When resetting a device, a device driver must exercise caution.
If a device driver has not been written to plan for a device reset, then it
may not correctly restore the device's state after such a reset.
Such state should be stored in the instance's private state data as the MAC
framework does not know about device resets and will not inform the
device again about the expected, programmed state.
.Pp
One wrinkle with device resets is that many networking cards show up as
multiple PCI functions on a single device, for example, each port may
show up as a separate function and thus have a separate instance of the
device driver attached.
When resetting a function, device driver writers should carefully read the
device programming manuals and verify whether a reset impacts only the
stalled function or all functions across the device.
.Pp
If the only way to reset a given function is to reset the entire device, then
this may require more coordination and work on the part of the device
driver to ensure that all the other instances are correctly restored.
In cases where this occurs, some devices offer ways of injecting
interrupts onto those other functions to notify them that this is
occurring.
.Sh MBLKS AND DMA
The networking stack manages framed data through the use of the
.Xr mblk 9S
structure.
The mblk allows for a single message to be made up of individual blocks.
Each part is linked together through its
.Fa b_cont
member.
However, it also allows for multiple messages to be chained together through the
use of the
.Fa b_next
member.
While the networking stack works with these structures, device drivers generally
work with DMA regions.
There are two different strategies that device drivers use for bridging these
two representations: copying and binding.
.Ss Copying Data
The first way that device drivers handle interfacing between the two is
by having two separate regions of memory.
One part is memory which has been allocated for DMA through a call to
.Xr ddi_dma_mem_alloc 9F
and the other is memory associated with the memory block.
.Pp
In this case, a driver will use
.Xr bcopy 9F
to copy memory between the two distinct regions.
When transmitting a packet, it will copy the memory from the mblk_t to the DMA
region.
When receiving a packet, it will allocate a mblk_t through the
.Xr allocb 9F
routine, copy the memory across with
.Xr bcopy 9F ,
and then advance the mblk_t's
.Fa b_wptr
member by the number of bytes copied.
.Pp
If, when receiving, memory is not available for a new message block,
then the frame should be skipped and effectively dropped.
A kstat should be bumped when such an occasion occurs.
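.Pp
The receive side of the copying method can be modeled with the short sketch
below.
It is written in ordinary C so that it runs outside the kernel:
.Vt example_mblk_t
is a stand-in carrying only the
.Xr mblk 9S
members used here,
.Fn example_allocb
stands in for
.Xr allocb 9F ,
and
.Fn memcpy
takes the place of
.Xr bcopy 9F .
The statistics structure and all function names are hypothetical.
.Bd -literal -offset indent
```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userland stand-in for the members of mblk(9S) used here. */
typedef struct example_mblk {
	struct example_mblk	*b_cont;
	unsigned char		*b_rptr;	/* first valid byte */
	unsigned char		*b_wptr;	/* one past the last valid byte */
} example_mblk_t;

/* Stand-in for allocb(9F): returns NULL when memory is unavailable. */
static example_mblk_t *
example_allocb(size_t size)
{
	example_mblk_t *mp = malloc(sizeof (*mp) + size);

	if (mp == NULL)
		return (NULL);
	mp->b_cont = NULL;
	mp->b_rptr = mp->b_wptr = (unsigned char *)(mp + 1);
	return (mp);
}

/* Hypothetical receive statistics, exported through a kstat in a driver. */
typedef struct example_rx_stats {
	uint64_t	ers_alloc_fail;
} example_rx_stats_t;

/*
 * Copy one received frame out of the DMA buffer into a fresh message.
 * Returns NULL, and counts the drop, if no message block is available.
 */
static example_mblk_t *
example_rx_copy(const void *dma_buf, size_t len, example_rx_stats_t *stats)
{
	example_mblk_t *mp = example_allocb(len);

	if (mp == NULL) {
		stats->ers_alloc_fail++;
		return (NULL);
	}
	memcpy(mp->b_wptr, dma_buf, len);	/* bcopy(9F) in the kernel */
	mp->b_wptr += len;
	return (mp);
}
```
.Ed
.Pp
Because the data has been copied, the DMA buffer can be handed straight back
to the device once the function returns.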
.Ss Binding Data
An alternative approach to copying data is to use DMA binding.
When using DMA binding, the OS takes care of mapping between DMA memory and
normal device memory.
The exact process is a bit different between transmit and receive.
.Pp
When transmitting, a device driver has an mblk_t and needs to call the
.Xr ddi_dma_addr_bind_handle 9F
function to bind it to an already existing DMA handle.
At that point, it will receive various DMA cookies that it can use to obtain the
addresses to program the device with for transmitting data.
Once the transmit is done, the driver must then make sure to call
.Xr freemsg 9F
to release the data.
It must not call
.Xr freemsg 9F
before it receives an interrupt from the device indicating that the data
has been transmitted, otherwise it risks sending arbitrary kernel
memory.
.Pp
When receiving data, the device driver can perform a similar operation.
First, it must bind the DMA memory into the kernel's virtual memory address
space through a call to the
.Xr ddi_dma_addr_bind_handle 9F
function if it has not already.
Once it has, it must then call
.Xr desballoc 9F
to try to create a new mblk_t which leverages the associated memory.
It can then pass that mblk_t up to the stack.
.Ss Considerations
When deciding which of these options to use, there are many different
considerations that must be made.
The answer as to whether to bind memory or to copy data is not always simple.
.Pp
The first thing to remember is that DMA resources may be finite on a
given platform.
Consider the case of receiving data.
A device driver that binds one of its receive descriptors may not get it back
for quite some time as it may be used by the kernel until an application
actually consumes it.
Device drivers that try to bind memory for receive often work with the
constraint that they must be able to replace that DMA memory with another DMA
descriptor.
If it were not replaced, then eventually the device would not be able to
receive additional data into the ring.
.Pp
On the other hand, particularly for larger frames, copying every packet
from one buffer to another can be a source of additional latency and
memory waste in the system.
For larger copies, the cost of copying may dwarf any potential cost of
performing DMA binding.
.Pp
Device driver authors that are unsure of what to do should
first employ the copying method to simplify the act of writing the
device driver.
The copying method is simpler and also allows the device driver author not to
worry about allocated DMA memory that is still outstanding when it is asked to
unload.
.Pp
If device driver writers are worried about the cost, it is recommended
to control the decision as to whether to copy or bind DMA data
through a separate private property for each of transmitting and receiving.
That private property should indicate the frame size at which
to switch from one method to the other.
This way, data can be gathered to determine what the impact of each method is on
a given platform.
2762a given platform.
2763.Sh SEE ALSO
2764.Xr dlpi 4P ,
2765.Xr driver.conf 5 ,
2766.Xr ieee802.3 7 ,
2767.Xr dladm 8 ,
2768.Xr _fini 9E ,
2769.Xr _info 9E ,
2770.Xr _init 9E ,
2771.Xr attach 9E ,
2772.Xr close 9E ,
2773.Xr detach 9E ,
2774.Xr mac_capab_led 9E ,
2775.Xr mac_capab_rings 9E ,
2776.Xr mac_capab_transceiver 9E ,
2777.Xr mc_close 9E ,
2778.Xr mc_getcapab 9E ,
2779.Xr mc_getprop 9E ,
2780.Xr mc_getstat 9E ,
2781.Xr mc_multicst 9E  ,
2782.Xr mc_open 9E ,
2783.Xr mc_propinfo 9E  ,
2784.Xr mc_setpromisc 9E  ,
2785.Xr mc_setprop 9E ,
2786.Xr mc_start 9E ,
2787.Xr mc_stop 9E ,
2788.Xr mc_tx 9E ,
2789.Xr mc_unicst 9E  ,
2790.Xr open 9E ,
2791.Xr allocb 9F ,
2792.Xr bcopy 9F ,
2793.Xr ddi_dma_addr_bind_handle 9F ,
2794.Xr ddi_dma_mem_alloc 9F ,
2795.Xr ddi_fm_acc_err_get 9F ,
2796.Xr ddi_fm_dma_err_get 9F ,
2797.Xr ddi_fm_ereport_post 9F ,
2798.Xr ddi_fm_fini 9F ,
2799.Xr ddi_fm_init 9F ,
2800.Xr ddi_fm_service_impact 9F ,
2801.Xr ddi_get8 9F ,
2802.Xr ddi_put8 9F ,
2803.Xr desballoc 9F ,
2804.Xr freemsg 9F ,
2805.Xr kstat_create 9F ,
2806.Xr mac_alloc 9F ,
2807.Xr mac_devt_to_instance 9F ,
2808.Xr mac_fini_ops 9F ,
2809.Xr mac_free 9F ,
2810.Xr mac_getinfo 9F ,
2811.Xr mac_hcksum_get 9F ,
2812.Xr mac_hcksum_set 9F ,
2813.Xr mac_init_ops 9F ,
2814.Xr mac_link_update 9F ,
2815.Xr mac_lso_get 9F ,
2816.Xr mac_maxsdu_update 9F ,
2817.Xr mac_private_minor 9F ,
2818.Xr mac_prop_info_set_default_link_flowctrl 9F ,
2819.Xr mac_prop_info_set_default_str 9F ,
2820.Xr mac_prop_info_set_default_uint32 9F ,
2821.Xr mac_prop_info_set_default_uint64 9F ,
2822.Xr mac_prop_info_set_default_uint8 9F ,
2823.Xr mac_prop_info_set_perm 9F ,
2824.Xr mac_prop_info_set_range_uint32 9F ,
2825.Xr mac_register 9F ,
2826.Xr mac_rx 9F ,
2827.Xr mac_unregister 9F ,
2828.Xr mod_install 9F ,
2829.Xr mod_remove 9F ,
2830.Xr strcmp 9F ,
2831.Xr timeout 9F ,
2832.Xr cb_ops 9S ,
2833.Xr ddi_device_acc_attr 9S ,
2834.Xr dev_ops 9S ,
2835.Xr mac_callbacks 9S ,
2836.Xr mac_register 9S ,
2837.Xr mblk 9S ,
2838.Xr modldrv 9S ,
2839.Xr modlinkage 9S
.Rs
.%A McCloghrie, K.
.%A Rose, M.
.%T RFC 1213 Management Information Base for Network Management of
.%T TCP/IP-based internets: MIB-II
.%D March 1991
.Re
.Rs
.%A McCloghrie, K.
.%A Kastenholz, F.
.%T RFC 1573 Evolution of the Interfaces Group of MIB-II
.%D January 1994
.Re
.Rs
.%A Kastenholz, F.
.%T RFC 1643 Definitions of Managed Objects for the Ethernet-like
.%T Interface Types
.Re
.Rs
.%A IEEE Computer Standard
.%T IEEE 802.3
.%T IEEE Standard for Ethernet
.%D 2022
.Re