.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2019 Joyent, Inc.
.\" Copyright 2020 RackTop Systems, Inc.
.\" Copyright 2023 Oxide Computer Company
.\" Copyright 2023 Jason King
.\" Copyright 2023 Peter Tribble
.\"
.Dd July 17, 2023
.Dt MAC 9E
.Os
.Sh NAME
.Nm mac ,
.Nm GLDv3
.Nd MAC networking device driver overview
.Sh SYNOPSIS
.In sys/mac_provider.h
.In sys/mac_ether.h
.Sh INTERFACE LEVEL
illumos DDI specific
.Sh DESCRIPTION
The
.Sy MAC
framework provides a means for implementing high-performance networking
device drivers.
It is the successor to the GLD interfaces and is sometimes referred to as the
GLDv3.
The remainder of this manual introduces the aspects of writing device drivers
that leverage the MAC framework.
While both GLDv3 and the MAC framework refer to the same thing, this manual
page uses the term
.Em MAC framework
to refer to the device driver interface.
.Pp
MAC device drivers are character devices.
They define the standard
.Xr _init 9E ,
.Xr _fini 9E ,
and
.Xr _info 9E
entry points to initialize the module, as well as
.Xr dev_ops 9S
and
.Xr cb_ops 9S
structures.
.Pp
The main interface with MAC is through a series of callbacks defined in
a
.Xr mac_callbacks 9S
structure.
These callbacks control all the aspects of the device.
They cover sending data, getting and setting properties, controlling MAC
address filters, and managing promiscuous mode.
.Pp
The MAC framework takes care of many aspects of the device driver's
management.
A device that uses the MAC framework does not have to worry about creating
device nodes or implementing
.Xr open 9E
or
.Xr close 9E
routines.
In addition, all of the work to interact with
.Xr dlpi 4P
is taken care of automatically and transparently.
.Ss High-Level Design
At a high level, a device driver is chiefly concerned with three general
operations:
.Bl -enum -offset indent
.It
Sending frames
.It
Receiving frames
.It
Managing device configuration and metadata
.El
.Pp
When sending frames, the MAC framework always calls functions registered
in the
.Xr mac_callbacks 9S
structure to have the driver transmit frames on hardware.
When receiving frames, the driver will generally receive an interrupt which will
cause it to check for incoming data and deliver it to the MAC framework.
.Pp
Device configuration, such as whether auto-negotiation is
enabled, the speeds that the device supports, the MTU (maximum
transmission unit), and the generation of pause frames, is driven by
properties.
The functions to get, set, and obtain information about properties are
defined through callback functions specified in the
.Xr mac_callbacks 9S
structure.
The full list of properties and a description of the relevant callbacks
can be found in the
.Sx PROPERTIES
section.
.Pp
The MAC framework is designed to take advantage of various modern
features provided by hardware, such as checksumming, segmentation
offload, and hardware filtering.
The MAC framework assumes none of these advanced features are present
and allows device drivers to negotiate them through a capability system.
Drivers can declare that they support various capabilities by
implementing the optional
.Xr mc_getcapab 9E
entry point.
Each capability has its associated entry points and structures to fill
out.
The capabilities are detailed in the
.Sx CAPABILITIES
section.
.Pp
The following sections describe the flow of a basic device driver.
For advanced device drivers, the flow is generally the same.
The primary distinction is in how frames are sent and received.
.Ss Initializing MAC Support
For a device to be used by the MAC framework, it must register with the
framework and take specific actions during
.Xr _init 9E ,
.Xr attach 9E ,
.Xr detach 9E ,
and
.Xr _fini 9E .
.Pp
All device drivers have to define a
.Xr dev_ops 9S
structure which is pointed to by a
.Xr modldrv 9S
structure and the corresponding NULL-terminated
.Xr modlinkage 9S
structure.
The
.Xr dev_ops 9S
structure should have a
.Xr cb_ops 9S
structure defined for it; however, it does not need to implement any of
the standard
.Xr cb_ops 9S
entry points unless it also exposes a custom set of device nodes not
otherwise managed by the MAC framework.
See the
.Sx Custom Device Nodes
section for more details.
.Pp
Normally, in a driver's
.Xr _init 9E
entry point, it passes its
.Xr modlinkage 9S
structure directly to
.Xr mod_install 9F .
To properly register with MAC, the driver must call
.Xr mac_init_ops 9F
before it calls
.Xr mod_install 9F .
If the
.Xr mod_install 9F
function fails, then the driver must undo the earlier call by calling
.Xr mac_fini_ops 9F .
.Pp
Conversely, in the driver's
.Xr _fini 9E
routine, it should call
.Xr mac_fini_ops 9F
after it successfully calls
.Xr mod_remove 9F .
For an example of how to use the
.Xr mac_init_ops 9F
and
.Xr mac_fini_ops 9F
functions, see the examples section in
.Xr mac_init_ops 9F .
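.Pp
The ordering described in this section can be sketched as follows.
This is a minimal userland model, not real driver code: the two structures
and all four framework functions are stubbed so that the sequence can be
compiled and exercised on its own.
The
.Sy xx
prefix and the names
.Fn xx_init
and
.Fn xx_fini
are hypothetical and stand in for a driver's
.Xr _init 9E
and
.Xr _fini 9E .
.Bd -literal -offset indent
```c
/*
 * Userland stand-ins (assumptions) so the ordering can be exercised.
 * A real driver gets these from <sys/modctl.h> and <sys/mac_provider.h>.
 */
struct dev_ops { int devo_rev; };
struct modlinkage { int ml_rev; };

static int stub_install_rv;	/* value the stubbed mod_install() returns */
static int stub_remove_rv;	/* value the stubbed mod_remove() returns */
static int mac_ops_active;	/* 1 between mac_init_ops and mac_fini_ops */

static void
mac_init_ops(struct dev_ops *ops, const char *name)
{
	(void) ops;
	(void) name;
	mac_ops_active = 1;
}

static void
mac_fini_ops(struct dev_ops *ops)
{
	(void) ops;
	mac_ops_active = 0;
}

static int
mod_install(struct modlinkage *mlp)
{
	(void) mlp;
	return (stub_install_rv);
}

static int
mod_remove(struct modlinkage *mlp)
{
	(void) mlp;
	return (stub_remove_rv);
}

static struct dev_ops xx_dev_ops;
static struct modlinkage xx_modlinkage;

/* Plays the role of _init(9E). */
int
xx_init(void)
{
	int ret;

	mac_init_ops(&xx_dev_ops, "xx");
	if ((ret = mod_install(&xx_modlinkage)) != 0)
		mac_fini_ops(&xx_dev_ops);	/* undo on failure */
	return (ret);
}

/* Plays the role of _fini(9E). */
int
xx_fini(void)
{
	int ret;

	if ((ret = mod_remove(&xx_modlinkage)) == 0)
		mac_fini_ops(&xx_dev_ops);	/* only after success */
	return (ret);
}
```
.Ed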
.Ss Custom Device Nodes
A device may want to provide its own minor nodes as simple character or block
devices backed by the usual
.Xr cb_ops 9S
routines.
The MAC framework allows for this by leaving a portion of the minor
number space available for private driver use.
.Xr mac_private_minor 9F
returns the first minor number a driver may use for its own purposes,
e.g., to pass to
.Xr ddi_create_minor_node 9F .
.Pp
A driver making use of this ability must provide its own
.Xr getinfo 9E
implementation that is aware of any such minor nodes.
It must also delegate back to the MAC framework as appropriate via either
calls to
.Xr mac_getinfo 9F
or
.Xr mac_devt_to_instance 9F
for MAC reserved minor nodes.
It should also take care not to affect MAC reserved minors.
For example, it must not remove all of the minor nodes associated with a
device with a call such as:
.Bd -literal -offset indent
    ddi_remove_minor_node(dip, NULL);
.Ed
.Ss Registering with MAC
Every instance of a device should register separately with MAC.
To register with MAC, a driver must allocate a
.Xr mac_register 9S
structure, fill it in, and then call
.Xr mac_register 9F .
The
.Vt mac_register_t
structure contains information about the device and all of the required
function pointers that will be used as callbacks by the framework.
.Pp
These steps should all be taken during a device's
.Xr attach 9E
entry point.
It is recommended that the driver perform this sequence of steps after the
device has finished its initialization of the chipset and interrupts, though
interrupts should not be enabled at that point.
After it calls
.Xr mac_register 9F
it will start receiving callbacks from the MAC framework.
.Pp
To allocate the registration structure, the driver should call
.Xr mac_alloc 9F .
Device drivers should generally always pass the symbol
.Dv MAC_VERSION
as the argument to
.Xr mac_alloc 9F .
Upon successful completion, the driver will receive a
.Vt mac_register_t
structure which it should fill in.
The structure and its members are documented in
.Xr mac_register 9S .
.Pp
The
.Xr mac_callbacks 9S
structure is not allocated as a part of the
.Xr mac_register 9S
structure.
In general, device drivers declare this statically.
See the
.Sx MAC Callbacks
section for more information on how to fill it out.
.Pp
Once the structure has been filled in, the driver should call
.Xr mac_register 9F
to register itself with MAC.
The handle that
.Xr mac_register 9F
returns should be stored in the driver's soft state.
It will be used in various other support functions and callbacks.
.Pp
If the call is successful, then the device driver
should enable interrupts and finish any other initialization required.
If the call to
.Xr mac_register 9F
failed, then it should unwind its initialization and should return
.Dv DDI_FAILURE
from its
.Xr attach 9E
routine.
.Pp
The driver does not need to hold onto an allocated
.Xr mac_register 9S
structure after it has called the
.Xr mac_register 9F
function.
Whether the
.Xr mac_register 9F
function returns successfully or not, the driver may free its
.Xr mac_register 9S
structure by calling the
.Xr mac_free 9F
function.
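.Pp
The registration sequence can be sketched as follows.
This is a userland model under stated assumptions: the
.Vt mac_register_t
shown here contains only a subset of the real members, the value given for
.Dv MAC_VERSION
is a placeholder, and
.Fn mac_alloc ,
.Fn mac_register ,
and
.Fn mac_free
are stubs with the same shape as the real functions.
The
.Vt xx_t
soft state and
.Fn xx_register_mac
are hypothetical names.
.Bd -literal -offset indent
```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins (assumptions) for definitions from the DDI and MAC headers. */
#define	DDI_SUCCESS		0
#define	DDI_FAILURE		(-1)
#define	MAC_VERSION		1	/* placeholder value */
#define	MAC_PLUGIN_IDENT_ETHER	"mac_ether"

typedef struct mac_callbacks { int mc_callbacks; } mac_callbacks_t;
typedef struct mac_register {	/* subset of the real members */
	const char	*m_type_ident;
	void		*m_driver;
	uint8_t		*m_src_addr;
	mac_callbacks_t	*m_callbacks;
	unsigned int	m_min_sdu;
	unsigned int	m_max_sdu;
} mac_register_t;
typedef void *mac_handle_t;

static int stub_register_rv;	/* value the stubbed mac_register() returns */
static int stub_frees;		/* number of mac_free() calls seen */

static mac_register_t *
mac_alloc(unsigned int version)
{
	(void) version;
	return (calloc(1, sizeof (mac_register_t)));
}

static void
mac_free(mac_register_t *regp)
{
	stub_frees++;
	free(regp);
}

static int
mac_register(mac_register_t *regp, mac_handle_t *mhp)
{
	if (stub_register_rv == 0)
		*mhp = regp->m_driver;	/* any non-NULL handle will do */
	return (stub_register_rv);
}

typedef struct xx {
	mac_handle_t	xx_mh;		/* handle kept in soft state */
	uint8_t		xx_mac[6];
	int		xx_intr_enabled;
} xx_t;

static mac_callbacks_t xx_m_callbacks;	/* declared statically */

/* Called from attach(9E) once the chip and interrupts are set up. */
int
xx_register_mac(xx_t *xx)
{
	mac_register_t *regp;
	int ret;

	if ((regp = mac_alloc(MAC_VERSION)) == NULL)
		return (DDI_FAILURE);
	regp->m_type_ident = MAC_PLUGIN_IDENT_ETHER;
	regp->m_driver = xx;
	regp->m_src_addr = xx->xx_mac;
	regp->m_callbacks = &xx_m_callbacks;
	regp->m_min_sdu = 0;
	regp->m_max_sdu = 1500;

	ret = mac_register(regp, &xx->xx_mh);
	mac_free(regp);		/* allowed whether or not registration worked */
	if (ret != 0)
		return (DDI_FAILURE);	/* caller unwinds attach(9E) */

	xx->xx_intr_enabled = 1;	/* enable interrupts only now */
	return (DDI_SUCCESS);
}
```
.Ed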
.Ss MAC Callbacks
The MAC framework interacts with a device driver through a series of
callbacks.
These callbacks are described in their individual manual pages and the
collection of callbacks is indicated in the
.Xr mac_callbacks 9S
manual page.
This section does not focus on the specific functions, but rather on
interactions between them and the rest of the device driver framework.
.Pp
A device driver should make no assumptions about when the various
callbacks will be called and whether or not they will be called
simultaneously.
For example, a device driver may be asked to transmit data through a call to its
.Xr mc_tx 9E
entry point while it is being asked to get a device property through a
call to its
.Xr mc_getprop 9E
entry point.
As such, while some calls may be serialized to the device, such as setting
properties, the device driver should always presume that all of its data needs
to be protected with locks.
While the device driver is holding locks, it is safe for it to call the
following MAC routines:
.Bl -bullet -offset indent -compact
.It
.Xr mac_hcksum_get 9F
.It
.Xr mac_hcksum_set 9F
.It
.Xr mac_lso_get 9F
.It
.Xr mac_maxsdu_update 9F
.It
.Xr mac_prop_info_set_default_link_flowctrl 9F
.It
.Xr mac_prop_info_set_default_str 9F
.It
.Xr mac_prop_info_set_default_uint8 9F
.It
.Xr mac_prop_info_set_default_uint32 9F
.It
.Xr mac_prop_info_set_default_uint64 9F
.It
.Xr mac_prop_info_set_perm 9F
.It
.Xr mac_prop_info_set_range_uint32 9F
.El
.Pp
Any other MAC related routines should not be called with locks held,
such as
.Xr mac_link_update 9F
or
.Xr mac_rx 9F .
Other routines in the DDI may be called while locks are held; however,
device driver writers should be careful about calling blocking routines
while locks are held or in interrupt context, even when it is
legal to do so as this may cause all other callers that need a given
lock to back up behind such an operation.
.Ss Receiving Data
A device driver will often receive data through the means of an
interrupt or by being asked to poll for frames.
When this occurs, zero or more frames, each with optional metadata, may
be ready for the device driver to consume.
Often each frame has a corresponding descriptor which has information about
whether or not there were errors or whether or not the device successfully
checksummed the packet.
In addition to the per-packet flow described below, there are certain
requirements that drivers must adhere to when programming the hardware
to receive data.
See the section
.Sx RECEIVE DESCRIPTOR LAYOUT
for more information.
.Pp
During a single interrupt or poll request, a device driver should process
a fixed number of frames.
For each frame the device driver should:
.Bl -enum -offset indent
.It
Ensure that all of the DMA memory for the descriptor ring is synchronized with
the
.Xr ddi_dma_sync 9F
function and check the handle for errors if the device driver has enabled DMA
error reporting as part of the Fault Management Architecture (FMA).
If the driver does not rely on DMA, then it may skip this step.
It is recommended that this is performed once per interrupt or poll for
the entire region and not on a per-packet basis.
.It
First check whether or not the frame has errors.
If errors were detected, then the frame should not be sent to the operating
system.
It is recommended that devices keep kstats (see
.Xr kstat_create 9F
for more information) and bump the counter whenever such an error is
detected.
If the device distinguishes between the types of errors, then separate kstats
for each class of error are recommended.
See the
.Sx STATISTICS
section for more information on the various error cases that should be
considered.
.It
Once the frame has been determined to be valid, the device driver should
transform the frame into a
.Xr mblk 9S .
See the section
.Sx MBLKS AND DMA
for more information on how to transform and prepare a message block.
.It
If the device supports hardware checksumming (see the
.Sx CAPABILITIES
section for more information on checksumming), then the device driver
should set the corresponding checksumming information with a call to
.Xr mac_hcksum_set 9F .
.It
It should then append this new message block to the
.Em end
of the message block chain, linking it through the previous block's
.Fa b_next
pointer.
It is vitally important that all the frames be chained in the order that they
were received.
If the device driver mistakenly reorders frames, then it may cause performance
impacts in the TCP stack and potentially impact application correctness.
.El
.Pp
Once all the frames have been processed and assembled, the device driver
should deliver them to the rest of the operating system by calling
.Xr mac_rx 9F .
The device driver should try to give as many mblk_t structures as possible to
the system at once.
It
.Em should not
call
.Xr mac_rx 9F
once for every assembled mblk_t.
.Pp
The device driver must not hold any locks across the call to
.Xr mac_rx 9F .
When this function is called, received data will be pushed through the
networking stack and some replies may be generated and given to the
driver to send out.
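.Pp
The chaining and delivery rules above can be sketched as follows.
This is a userland model, not driver code: the
.Vt mblk_t
here is a cut-down stand-in with an extra model-only
.Fa b_id
member,
.Fn mac_rx
is a stub that records how it was called, and the receive lock is modeled
with a plain flag.
The function
.Fn xx_rx_intr
and its arguments are hypothetical.
.Bd -literal -offset indent
```c
#include <stddef.h>

/* Minimal stand-ins (assumptions) for the STREAMS and MAC definitions. */
typedef struct msgb {
	struct msgb	*b_next;	/* next message (frame) in the chain */
	struct msgb	*b_cont;	/* next block of the same message */
	int		b_id;		/* model-only: arrival order */
} mblk_t;
typedef void *mac_handle_t;
typedef void *mac_resource_handle_t;

static int stub_rx_calls;	/* how many times mac_rx() was called */
static mblk_t *stub_rx_chain;	/* chain handed to the last call */
static int stub_lock_held;	/* model of the driver's receive lock */
static int stub_rx_under_lock;	/* set if mac_rx() ran with the lock held */

static void
mac_rx(mac_handle_t mh, mac_resource_handle_t mrh, mblk_t *mp)
{
	(void) mh;
	(void) mrh;
	stub_rx_calls++;
	stub_rx_chain = mp;
	stub_rx_under_lock |= stub_lock_held;
}

/* Hypothetical per-interrupt receive routine with a frame budget. */
void
xx_rx_intr(mac_handle_t mh, mblk_t *frames, int nframes, int budget)
{
	mblk_t *head = NULL, **tailp = &head;
	int i;

	stub_lock_held = 1;		/* mutex_enter() in a real driver */
	for (i = 0; i < nframes && i < budget; i++) {
		mblk_t *mp = &frames[i];	/* stands in for a built mblk_t */

		mp->b_next = NULL;
		*tailp = mp;		/* append at the tail ... */
		tailp = &mp->b_next;	/* ... to keep arrival order */
	}
	stub_lock_held = 0;		/* mutex_exit() before calling MAC */

	if (head != NULL)
		mac_rx(mh, NULL, head);	/* one call for the whole batch */
}
```
.Ed
Keeping a pointer to the tail's
.Fa b_next
member makes each append constant time and preserves the order in which
the frames arrived.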
.Pp
It is not the device driver's responsibility to determine whether or not
the system can keep up with a driver's delivery rate of frames.
The rest of the networking stack will handle issues related to keeping up
appropriately and ensure that kernel memory is not exhausted by packets
that are not being processed.
.Pp
If the device driver has negotiated the
.Dv MAC_CAPAB_RINGS
capability
.Pq discussed in Xr mac_capab_rings 9E
then it should call
.Xr mac_rx_ring 9F
and not
.Xr mac_rx 9F .
A given interrupt may correspond to more than one ring that needs to be
checked.
The set of rings is likely to span different groups that were registered
with MAC through the
.Xr mr_gget 9E
interface.
In those cases, the driver should follow the above procedure
independently for each ring.
That means it will call
.Xr mac_rx_ring 9F
once for each ring using the handle that it received from when MAC
called the driver's
.Xr mr_rget 9E
entry point.
When it is looking at the rings, the driver will need to make sure that
the ring has not had interrupts disabled
.Pq due to a pending change to polling mode .
This is discussed in greater detail in the
.Xr mac_capab_rings 9E
and
.Xr mri_poll 9E
manual pages.
.Pp
Finally, the device driver should make sure that any other housekeeping
activities required for the ring are taken care of such that more data
can be received.
.Ss Transmitting Data and Back Pressure
A device driver will be asked to transmit a message block chain by
having its
.Xr mc_tx 9E
entry point called.
While the driver is processing the message blocks, it may run out of resources.
For example, a transmit descriptor ring may become full.
At that point, the device driver should return the remaining unprocessed frames.
The act of returning frames indicates that the device has asserted flow control.
Once this has been done, no additional calls will be made to the
driver's transmit entry point and the back pressure will be propagated
throughout the rest of the networking stack.
.Pp
At some point in the future when resources have become available again,
for example after an interrupt indicating that some portion of the
transmit ring has been sent, then the device driver must notify the
system that it can continue transmission.
To do this, the driver should call
.Xr mac_tx_update 9F .
After that point, the driver will receive calls to its
.Xr mc_tx 9E
entry point again.
As mentioned in the section on callbacks, the device driver should avoid holding
any particular locks across the call to
.Xr mac_tx_update 9F .
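.Pp
The back pressure protocol can be sketched as follows.
This is a userland model under stated assumptions: the
.Vt mblk_t
is reduced to its
.Fa b_next
member, the transmit ring is modeled as a simple count of free
descriptors, and
.Fn mac_tx_update
is a stub that counts its calls.
The
.Vt xx_t
structure and the functions
.Fn xx_m_tx
and
.Fn xx_tx_reclaim
are hypothetical.
.Bd -literal -offset indent
```c
#include <stddef.h>

/* Minimal stand-ins (assumptions) for the STREAMS and MAC definitions. */
typedef struct msgb { struct msgb *b_next; } mblk_t;
typedef void *mac_handle_t;

static int stub_tx_updates;	/* number of mac_tx_update() calls seen */

static void
mac_tx_update(mac_handle_t mh)
{
	(void) mh;
	stub_tx_updates++;
}

typedef struct xx {
	mac_handle_t	xx_mh;
	int		xx_tx_free;	/* free transmit descriptors */
	int		xx_tx_blocked;	/* set when frames were returned */
} xx_t;

/* Shaped like mc_tx(9E): returns the frames that could not be queued. */
mblk_t *
xx_m_tx(void *arg, mblk_t *mp_chain)
{
	xx_t *xx = arg;

	while (mp_chain != NULL) {
		mblk_t *next = mp_chain->b_next;

		if (xx->xx_tx_free == 0) {
			xx->xx_tx_blocked = 1;
			return (mp_chain);	/* remainder, still linked */
		}
		mp_chain->b_next = NULL;	/* detach this frame */
		xx->xx_tx_free--;	/* would program a descriptor here */
		mp_chain = next;
	}
	return (NULL);
}

/* Hypothetical transmit completion handler. */
void
xx_tx_reclaim(xx_t *xx, int ndone)
{
	int notify;

	/* a driver lock would be taken here ... */
	xx->xx_tx_free += ndone;
	notify = xx->xx_tx_blocked && xx->xx_tx_free > 0;
	if (notify)
		xx->xx_tx_blocked = 0;
	/* ... and dropped here, before calling into MAC */
	if (notify)
		mac_tx_update(xx->xx_mh);
}
```
.Ed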
.Ss Interrupt Coalescing
For devices operating at higher data rates, interrupt coalescing is an
important part of a well-functioning device and may impact the
performance of the device.
Not all devices support interrupt coalescing.
If interrupt coalescing is supported on the device, it is recommended that
device driver writers provide private properties for their device to control the
interrupt coalescing rate.
This will make it much easier to perform experiments and observe the impact of
different interrupt rates on the rest of the system.
.Ss Polling
Even with interrupt coalescing, above a certain incoming packet rate it
can make more sense to actively poll the device, asking for more packets
rather than constantly taking an interrupt.
When a device driver supports the
.Xr mac_capab_rings 9E
capability and therefore polling on receive rings, the MAC framework will ask
the driver to disable interrupts, with its
.Xr mi_disable 9E
entry point, and then subsequently call its polling entry point,
.Xr mri_poll 9E .
.Pp
As long as a device driver implements the needed entry points, then there is
nothing else that it needs to do to take advantage of polling.
A driver should not attempt to spin up its own threads, task queues, or
creatively use timeouts, to try to simulate polling for received packets.
.Ss MAC Address Filter Management
The MAC framework will attempt to use as many MAC address filters as a
device has.
To program a multicast address filter, the driver's
.Xr mc_multicst 9E
entry point will be called.
If the device driver runs out of filters, it should not take any special action
and just return the appropriate error as documented in the corresponding manual
pages for the entry points.
The framework will ensure that the device is placed in promiscuous mode
if it needs to.
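.Pp
A driver with a fixed number of multicast filters might behave as in the
following sketch.
This is a userland model: the filter table, the
.Vt xx_t
structure, and the function
.Fn xx_m_multicst
are hypothetical, and
.Vt boolean_t
is modeled as an
.Vt int .
The use of
.Er ENOSPC
for an exhausted table and
.Er ENOENT
for removing an unknown address are assumptions made for the sketch;
.Xr mc_multicst 9E
is the authority on which errors to return.
.Bd -literal -offset indent
```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define	XX_NMCAST	2	/* hypothetical number of hardware filters */
#define	ETHERADDRL	6

typedef struct xx {
	uint8_t	xx_mcast[XX_NMCAST][ETHERADDRL];
	int	xx_mcast_used[XX_NMCAST];
} xx_t;

/* Shaped like mc_multicst(9E); boolean_t is modeled as int. */
int
xx_m_multicst(void *arg, int add, const uint8_t *addr)
{
	xx_t *xx = arg;
	int i, slot = -1;

	for (i = 0; i < XX_NMCAST; i++) {
		if (xx->xx_mcast_used[i] &&
		    memcmp(xx->xx_mcast[i], addr, ETHERADDRL) == 0) {
			if (!add)
				xx->xx_mcast_used[i] = 0; /* clear filter */
			return (0);
		}
		if (!xx->xx_mcast_used[i] && slot == -1)
			slot = i;	/* remember the first free filter */
	}
	if (!add)
		return (ENOENT);	/* assumption: nothing to remove */
	if (slot == -1)
		return (ENOSPC);	/* out of filters: just report it */
	memcpy(xx->xx_mcast[slot], addr, ETHERADDRL);
	xx->xx_mcast_used[slot] = 1;	/* would program the filter here */
	return (0);
}
```
.Ed
The driver does not fall back to promiscuous mode on its own when the
table is full; it only reports the error.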
.Pp
If the hardware supports more than one unicast filter then the device
driver should consider implementing the
.Dv MAC_CAPAB_RINGS
capability, which exposes a means for multiple unicast MAC address filters to be
used by the broader system.
It is still useful to implement this on hardware which only has a single ring.
See
.Xr mac_capab_rings 9E
for more information.
.Ss Receive Side Scaling
Receive side scaling is where a hardware device supports multiple,
independent queues of frames that can be received.
Each of these queues is generally associated with an independent
interrupt and the hardware usually performs some form of hash across the
queues.
Drivers for hardware which supports this should consider implementing the
.Dv MAC_CAPAB_RINGS
capability; see
.Xr mac_capab_rings 9E
for more information.
.Ss Link Updates
It is the responsibility of the device driver to keep track of the
data link's state.
Many devices provide a means of receiving an interrupt when the state of the
link changes.
When such a change happens, the driver should update its internal data
structures and then call
.Xr mac_link_update 9F
to inform the MAC layer that this has occurred.
If the device driver does not properly inform the system about link changes,
then various features like link aggregations and other mechanisms that leverage
the link state will not work correctly.
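.Pp
A link change handler can be sketched as follows.
This is a userland model: the driver lock is a plain flag,
.Fn mac_link_update
is a stub that records its arguments, and the
.Vt link_state_t
definition is a stand-in for the one in the system headers.
The
.Vt xx_t
structure and
.Fn xx_link_intr
are hypothetical, and notifying MAC only when the state actually changed is a
choice made for the sketch.
.Bd -literal -offset indent
```c
#include <stddef.h>

/* Stand-ins (assumptions); real definitions live in the system headers. */
typedef enum {
	LINK_STATE_UNKNOWN = -1,
	LINK_STATE_DOWN,
	LINK_STATE_UP
} link_state_t;
typedef void *mac_handle_t;

static int model_lock_held;	/* stands in for a kmutex_t */
static int stub_updates;	/* number of mac_link_update() calls seen */
static int stub_update_under_lock;
static link_state_t stub_last_state = LINK_STATE_UNKNOWN;

static void
mac_link_update(mac_handle_t mh, link_state_t state)
{
	(void) mh;
	stub_updates++;
	stub_last_state = state;
	stub_update_under_lock |= model_lock_held;
}

typedef struct xx {
	mac_handle_t	xx_mh;
	link_state_t	xx_link;
} xx_t;

/* Hypothetical handler for a link status change interrupt. */
void
xx_link_intr(xx_t *xx, int hw_link_up)
{
	link_state_t state = hw_link_up ? LINK_STATE_UP : LINK_STATE_DOWN;
	int changed;

	model_lock_held = 1;		/* mutex_enter() in a real driver */
	changed = (xx->xx_link != state);
	xx->xx_link = state;		/* update internal state first */
	model_lock_held = 0;		/* mutex_exit() before calling MAC */

	if (changed)
		mac_link_update(xx->xx_mh, state);
}
```
.Ed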
.Ss Link Speed and Auto-negotiation
Many networking devices support more than one possible speed that they
can operate at.
The selection of a speed is often performed through
.Em auto-negotiation ,
though some devices allow the user to control what speeds are advertised
and used.
.Pp
Logically, there are two different sets of things that the device driver
needs to keep track of while it's operating:
.Bl -enum
.It
The supported speeds in hardware.
.It
The enabled speeds from the user.
.El
.Pp
By default, when a link first comes up, the device driver should
generally configure the link to support the common set of speeds and
perform auto-negotiation.
.Pp
A user can control what speeds a device advertises via auto-negotiation
and whether or not it performs auto-negotiation at all by using a series
of properties that have
.Sy _EN_
in the name.
These are read/write properties and there is one for each speed supported in the
operating system.
For a full list of them, see the
.Sx PROPERTIES
section.
.Pp
In addition to these properties, there is a corresponding set of
properties with
.Sy _ADV_
in the name.
These are similar to the
.Sy _EN_
family of properties, but they are read-only and indicate what the
device has actually negotiated.
While they are generally similar to the
.Sy _EN_
family of properties, they may change depending on power settings.
See the
.Sy Ethernet Link Properties
section in
.Xr dladm 8
for more information.
.Pp
It's worth discussing how these different values get used throughout the
different entry points.
The first entry point to consider is the
.Xr mc_propinfo 9E
entry point.
For a given speed, the driver should consult whether or not the hardware
supports this speed.
If it does, it should fill in the default value that the hardware takes and
indicate whether or not the property is writable.
This holds for both the
.Sy _EN_
and
.Sy _ADV_
family of properties.
.Pp
The next entry point is
.Xr mc_getprop 9E .
Here, the device should first consult whether the given speed is
supported.
If it is not, then the driver should return
.Er ENOTSUP .
If it is, then it should return the current value of the property.
.Pp
The last property-related entry point is
.Xr mc_setprop 9E .
Here, the same logic applies.
Before the driver considers whether or not the property is writable, it should
first check whether or not it's a supported property.
If it's not, then it should return
.Er ENOTSUP .
Otherwise, it should proceed to check whether the property is writable,
and if it is and the new value is valid, then it should update the property and
restart the link's negotiation.
.Pp
Finally, there is the
.Xr mc_getstat 9E
entry point.
Several of the statistics that are queried relate to auto-negotiation and
hardware capabilities.
When a statistic relates to the hardware supporting a given speed, the
.Sy _EN_
properties should be ignored.
The only thing that should be consulted is what the hardware itself supports.
Otherwise, the statistics should look at what is currently being advertised by
the device.
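.Pp
The order of checks for the speed properties can be sketched as follows.
This is a simplified userland model: the property identifiers, the speed
bits, and the functions
.Fn xx_getprop
and
.Fn xx_setprop
are all hypothetical and take fewer arguments than the real
.Xr mc_getprop 9E
and
.Xr mc_setprop 9E
entry points.
Returning
.Er ENOTSUP
for an attempt to set a read-only property and
.Er EINVAL
for a bad value are assumptions made for the sketch.
.Bd -literal -offset indent
```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical speed bits and property identifiers for this model. */
#define	XX_SPEED_1G	0x1
#define	XX_SPEED_10G	0x2
typedef enum {
	XX_PROP_EN_1G,
	XX_PROP_EN_10G,
	XX_PROP_ADV_1G,
	XX_PROP_ADV_10G
} xx_prop_t;

typedef struct xx {
	uint32_t	xx_hw_speeds;	/* what the hardware supports */
	uint32_t	xx_en_speeds;	/* what the user has enabled */
	uint32_t	xx_adv_speeds;	/* what is currently advertised */
	int		xx_renegotiations;
} xx_t;

static uint32_t
xx_prop_speed(xx_prop_t p)
{
	return ((p == XX_PROP_EN_1G || p == XX_PROP_ADV_1G) ?
	    XX_SPEED_1G : XX_SPEED_10G);
}

static int
xx_prop_is_adv(xx_prop_t p)
{
	return (p == XX_PROP_ADV_1G || p == XX_PROP_ADV_10G);
}

/* Simplified stand-in for mc_getprop(9E). */
int
xx_getprop(xx_t *xx, xx_prop_t p, uint8_t *valp)
{
	uint32_t speed = xx_prop_speed(p);
	uint32_t set;

	if ((xx->xx_hw_speeds & speed) == 0)
		return (ENOTSUP);	/* speed not supported by hardware */
	set = xx_prop_is_adv(p) ? xx->xx_adv_speeds : xx->xx_en_speeds;
	*valp = ((set & speed) != 0);
	return (0);
}

/* Simplified stand-in for mc_setprop(9E). */
int
xx_setprop(xx_t *xx, xx_prop_t p, uint8_t val)
{
	uint32_t speed = xx_prop_speed(p);

	if ((xx->xx_hw_speeds & speed) == 0)
		return (ENOTSUP);	/* support is checked first */
	if (xx_prop_is_adv(p))
		return (ENOTSUP);	/* read-only (error is an assumption) */
	if (val > 1)
		return (EINVAL);
	if (val)
		xx->xx_en_speeds |= speed;
	else
		xx->xx_en_speeds &= ~speed;
	xx->xx_adv_speeds = xx->xx_en_speeds & xx->xx_hw_speeds;
	xx->xx_renegotiations++;	/* would restart negotiation here */
	return (0);
}
```
.Ed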
.Ss Unregistering from MAC
During a driver's
.Xr detach 9E
routine, it should unregister the device instance from MAC by calling
.Xr mac_unregister 9F
with the handle that it received from
.Xr mac_register 9F .
If the call to
.Xr mac_unregister 9F
failed, then the device is likely still in use and the driver should
fail the call to
.Xr detach 9E .
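.Pp
The detach path can be sketched as follows.
This is a userland model:
.Fn mac_unregister
is a stub whose return value is controlled by the test harness, and the
.Vt xx_t
structure and
.Fn xx_detach_instance
are hypothetical.
.Bd -literal -offset indent
```c
#include <stddef.h>

/* Stand-ins (assumptions) for definitions from the DDI and MAC headers. */
#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)
typedef void *mac_handle_t;

static int stub_unregister_rv;	/* value the stubbed call returns */

static int
mac_unregister(mac_handle_t mh)
{
	(void) mh;
	return (stub_unregister_rv);
}

typedef struct xx {
	mac_handle_t	xx_mh;		/* handle saved at registration */
	int		xx_torn_down;
} xx_t;

/* Called from detach(9E) for the DDI_DETACH command. */
int
xx_detach_instance(xx_t *xx)
{
	if (mac_unregister(xx->xx_mh) != 0)
		return (DDI_FAILURE);	/* still in use: leave state intact */

	xx->xx_mh = NULL;
	xx->xx_torn_down = 1;	/* would free rings, interrupts, soft state */
	return (DDI_SUCCESS);
}
```
.Ed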
.Ss Interacting with Devices
Administrators always interact with devices through the
.Xr dladm 8
command line interface.
The state of devices such as whether the link is considered up or down,
various link properties such as the MTU, auto-negotiation state, and
flow control state, are all exposed.
It is also the preferred way that these properties are set and configured.
.Pp
While device tunables may be presented in a
.Xr driver.conf 5
file, it is recommended instead to expose such things through
.Xr dladm 8
private properties, whether explicitly documented or not.
.Sh CAPABILITIES
Capabilities in the MAC framework are optional hardware features that a
device driver can advertise to the system.
The two current capabilities that the system supports are the ability of
hardware to perform large send offload (LSO), often also known as TCP
segmentation offload, and the ability of hardware to calculate and verify the
checksums present in IPv4, IPv6, and protocol headers such as TCP and UDP.
.Pp
The MAC framework will query a device for support of a capability
through the
.Xr mc_getcapab 9E
function.
Each capability has its own constant and may have corresponding data that goes
along with it and a specific structure that the device is required to fill in.
Note, the set of capabilities changes over time and there are also private
capabilities in the system.
Several of the capabilities are used in the implementation of the MAC framework.
Others, like
.Dv MAC_CAPAB_RINGS ,
represent features that have not been stabilized and thus both API and binary
compatibility for them are not guaranteed.
It is important that the device driver handles unknown capabilities correctly.
For more information, see
.Xr mc_getcapab 9E .
.Pp
The following capabilities are
stable and defined in the system:
.Ss Dv MAC_CAPAB_HCKSUM
The
.Dv MAC_CAPAB_HCKSUM
capability indicates to the system that the device driver supports some
amount of checksumming.
The specific data for this capability is a pointer to a
.Vt uint32_t .
To indicate no support for any kind of checksumming, the driver should
either set this value to zero or simply return that it doesn't support
the capability.
.Pp
Note, the values that the driver declares in this capability indicate
what it can do when it transmits data.
If the driver can only verify checksums when receiving data, then it should not
indicate that it supports this capability.
The following set of flags may be combined through a bitwise inclusive OR:
.Bl -tag -width Ds
.It Dv HCKSUM_INET_PARTIAL
This indicates that the hardware can calculate a partial checksum for
both IPv4 and IPv6 UDP and TCP packets; however, it requires the pseudo-header
checksum be calculated for it.
The pseudo-header checksum will be available for the mblk_t when calling
.Xr mac_hcksum_get 9F .
Note this does not imply that the hardware is capable of calculating
the partial checksum for other L4 protocols or the IPv4 header checksum.
That should be indicated with the
.Dv HCKSUM_IPHDRCKSUM
flag.
.It Dv HCKSUM_INET_FULL_V4
This indicates that the hardware will fully calculate the L4 checksum for
outgoing IPv4 UDP or TCP packets only, and does not require a pseudo-header
checksum.
Note this does not imply that the hardware is capable of calculating the
checksum for other L4 protocols or the IPv4 header checksum.
That should be indicated with the
.Dv HCKSUM_IPHDRCKSUM
flag.
.It Dv HCKSUM_INET_FULL_V6
This indicates that the hardware will fully calculate the L4 checksum for
outgoing IPv6 UDP or TCP packets only, and does not require a pseudo-header
checksum.
Note this does not imply that the hardware is capable of calculating the
checksum for any other L4 protocols.
.It Dv HCKSUM_IPHDRCKSUM
This indicates that the hardware supports calculating the checksum for
the IPv4 header itself.
.El
.Pp
When in a driver's transmit function, the driver will be processing a
single frame.
It should call
.Xr mac_hcksum_get 9F
to see what checksum flags are set on it.
Note that the flags that are set on it are different from the ones described
above and are documented in its manual page.
These flags indicate how the driver is expected to program the hardware and what
checksumming is required.
Not all frames will require hardware checksumming or will ask the hardware to
checksum it.
.Pp
If a driver supports offloading the receive checksum and verification,
it should check to see what the hardware indicated was verified.
The driver should then call
.Xr mac_hcksum_set 9F .
The flags used are different from the ones above and are discussed in
detail in the
.Xr mac_hcksum_set 9F
manual page.
If there is no checksum information available or the driver does not support
checksumming, then it should simply not call
.Xr mac_hcksum_set 9F .
.Pp
Note that the checksum flags should be set on the first
mblk_t that makes up a given message.
In other words, if multiple mblk_t structures are linked together by the
.Fa b_cont
member to describe a single frame, then
.Xr mac_hcksum_set 9F
should only be called on the first mblk_t of that set.
However, each distinct message should have the checksum bits set on it, if
applicable.
In other words, each mblk_t that is linked together by the
.Fa b_next
pointer may have checksum flags set.
.Pp
It is recommended that device drivers provide a private property or
.Xr driver.conf 5
property to control whether or not checksumming is enabled for both rx
and tx; however, the default disposition is recommended to be enabled
for both.
This way if hardware bugs are found in the checksumming implementation, they can
be disabled without requiring software updates.
The transmit property should be checked when determining how to reply to
.Xr mc_getcapab 9E
and the receive property should be checked in the context of the receive
function.
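.Pp
The following sketch shows how a capability entry point might answer the
.Dv MAC_CAPAB_HCKSUM
query while honoring a transmit checksum tunable.
It is a userland model: the numeric values given to the flags and
capability constants are placeholders,
.Vt boolean_t
is modeled as an
.Vt int ,
and the
.Vt xx_t
structure, its
.Fa xx_tx_cksum_enable
member, and
.Fn xx_m_getcapab
are hypothetical.
A real driver takes the constants from the system headers.
.Bd -literal -offset indent
```c
#include <stdint.h>

/* Placeholder values (assumptions); use the system headers in a driver. */
#define	HCKSUM_INET_PARTIAL	0x02
#define	HCKSUM_INET_FULL_V4	0x04
#define	HCKSUM_INET_FULL_V6	0x08
#define	HCKSUM_IPHDRCKSUM	0x10
typedef enum { MAC_CAPAB_HCKSUM = 0x01, MAC_CAPAB_LSO = 0x08 } mac_capab_t;
typedef int boolean_t;
#define	B_FALSE	0
#define	B_TRUE	1

typedef struct xx {
	int	xx_tx_cksum_enable;	/* hypothetical private tunable */
} xx_t;

/* Shaped like mc_getcapab(9E). */
boolean_t
xx_m_getcapab(void *arg, mac_capab_t cap, void *cap_data)
{
	xx_t *xx = arg;

	switch (cap) {
	case MAC_CAPAB_HCKSUM: {
		uint32_t *flagsp = cap_data;

		if (!xx->xx_tx_cksum_enable)
			return (B_FALSE);	/* disabled by the tunable */
		/* Flags describe what the hardware can do on transmit. */
		*flagsp = HCKSUM_INET_FULL_V4 | HCKSUM_INET_FULL_V6 |
		    HCKSUM_IPHDRCKSUM;
		return (B_TRUE);
	}
	default:
		return (B_FALSE);	/* unknown or unsupported capability */
	}
}
```
.Ed
Returning
.Dv B_FALSE
from the default case is what lets the driver keep working when the
framework asks about capabilities that the driver has never heard of.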
.Ss Dv MAC_CAPAB_LSO
The
.Dv MAC_CAPAB_LSO
capability indicates that the driver supports various forms of large
send offload (LSO).
The private data is a pointer to a
.Vt mac_capab_lso_t
structure.
The system currently supports offloading TCP packets over both IPv4 and
IPv6.
This structure has the following members which are used to indicate
various types of LSO support.
.Bd -literal -offset indent
t_uscalar_t		lso_flags;
lso_basic_tcp_ipv4_t	lso_basic_tcp_ipv4;
lso_basic_tcp_ipv6_t	lso_basic_tcp_ipv6;
.Ed
.Pp
The
.Fa lso_flags
member is used to indicate which members are valid and should be
considered.
Each flag represents a different form of LSO.
The member should be set to the bitwise inclusive OR of the following values:
.Bl -tag -width Dv -offset indent
.It Dv LSO_TX_BASIC_TCP_IPV4
This indicates hardware support for performing TCP segmentation
offloading over IPv4.
When this flag is set, the
.Fa lso_basic_tcp_ipv4
member must be filled in.
.It Dv LSO_TX_BASIC_TCP_IPV6
This indicates hardware support for performing TCP segmentation
offloading over IPv6.
The IPv6 packet will have no extension headers present.
When this flag is set, the
.Fa lso_basic_tcp_ipv6
member must be filled in.
.El
.Pp
The
.Fa lso_basic_tcp_ipv4
member is a structure with the following members:
842.Bd -literal -offset indent
t_uscalar_t	lso_max;
844.Ed
845.Bd -filled -offset indent
846The
847.Fa lso_max
848member should be set to the maximum size of the TCP data
849payload that can be offloaded to the hardware.
850.Ed
851.Pp
852The
853.Fa lso_basic_tcp_ipv6
854member is a structure with the following members:
855.Bd -literal -offset indent
t_uscalar_t	lso_max;
857.Ed
858.Bd -filled -offset indent
859The
860.Fa lso_max
861member should be set to the maximum size of the TCP data
862payload that can be offloaded to the hardware.
863.Ed
864.Pp
As with checksumming, it is recommended that driver writers provide a
means of disabling LSO support, even if it is enabled by default.
This allows problems with LSO to be worked around without requiring a
driver update.
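.Pp
As an illustration, filling in the structure might look like the
sketch below.
The type and flag definitions are userland stand-ins that mirror the
members described in this section, with illustrative flag values; a
real driver would include
.In sys/mac_provider.h
instead.
The
.Vt ex_lso_softc_t
structure and the
.Fn ex_fill_lso
function are hypothetical.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in definitions; flag values are illustrative only. */
typedef uint32_t t_uscalar_t;
#define	LSO_TX_BASIC_TCP_IPV4	0x01
#define	LSO_TX_BASIC_TCP_IPV6	0x02

typedef struct { t_uscalar_t lso_max; } lso_basic_tcp_ipv4_t;
typedef struct { t_uscalar_t lso_max; } lso_basic_tcp_ipv6_t;

typedef struct {
	t_uscalar_t		lso_flags;
	lso_basic_tcp_ipv4_t	lso_basic_tcp_ipv4;
	lso_basic_tcp_ipv6_t	lso_basic_tcp_ipv6;
} mac_capab_lso_t;

/* Hypothetical driver state: the tunable plus the hardware limits. */
typedef struct ex_lso_softc {
	bool		el_lso_enable;	/* enabled by default */
	bool		el_hw_lso_v6;	/* hardware segments TCP over IPv6 */
	t_uscalar_t	el_hw_lso_max;	/* largest TCP payload accepted */
} ex_lso_softc_t;

/* Returns false when LSO should not be advertised at all. */
static bool
ex_fill_lso(const ex_lso_softc_t *sc, mac_capab_lso_t *lso)
{
	assert(sc != NULL && lso != NULL);
	if (!sc->el_lso_enable)
		return (false);

	/* Each flag that is set requires its matching member to be valid. */
	lso->lso_flags = LSO_TX_BASIC_TCP_IPV4;
	lso->lso_basic_tcp_ipv4.lso_max = sc->el_hw_lso_max;
	if (sc->el_hw_lso_v6) {
		lso->lso_flags |= LSO_TX_BASIC_TCP_IPV6;
		lso->lso_basic_tcp_ipv6.lso_max = sc->el_hw_lso_max;
	}
	return (true);
}
```
.Ed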
869.Sh EVOLVING CAPABILITIES
870The following capabilities are still evolving in the operating system.
871They are documented such that device driver writers may experiment with
872them.
873However, if such drivers are not present inside the core operating
874system repository, they may be subject to API and ABI breakage.
875.Ss Dv MAC_CAPAB_RINGS
876The
877.Dv MAC_CAPAB_RINGS
878capability is very important for implementing a high-performing device
879driver.
880Networking hardware structures the queues of packets to be sent
881and received into a ring.
882Each entry in this ring has a descriptor, which describes the address
883and options for a packet which is going to
884be transmitted or received.
885While simple networking devices only have a single ring, most high-speed
886networking devices have support for many rings.
887.Pp
888Rings are used for two important purposes.
889The first is receive side scaling (RSS), which is the ability to have
890the hardware hash the contents of a packet based on some of the protocol
891headers, and send it to one of several rings.
892These different rings may each have their own interrupt associated with
893them, allowing the card to receive traffic in parallel.
894Similar logic can be performed when sending traffic, to leverage
895multiple hardware resources, thus increasing capacity.
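.Pp
The idea behind RSS can be sketched in a few lines.
Real devices usually compute the hash in hardware with a
device-specific algorithm; the FNV-1a hash used here is only a
stand-in, and the
.Vt ex_flow_t
structure and
.Fn ex_rss_ring
function are hypothetical.
The property that matters is that every packet of a given flow maps to
the same ring.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: header fields a device might hash. */
typedef struct ex_flow {
	uint32_t	ef_src_ip;
	uint32_t	ef_dst_ip;
	uint16_t	ef_src_port;
	uint16_t	ef_dst_port;
} ex_flow_t;

/* Fold one 32-bit value into an FNV-1a hash, one byte at a time. */
static uint32_t
ex_fnv1a_u32(uint32_t hash, uint32_t val)
{
	for (int i = 0; i < 4; i++) {
		hash ^= (val >> (8 * i)) & 0xff;
		hash *= 16777619u;
	}
	return (hash);
}

/* Pick a receive ring index in the range [0, nrings). */
static uint32_t
ex_rss_ring(const ex_flow_t *f, uint32_t nrings)
{
	uint32_t hash = 2166136261u;
	uint32_t ports;

	assert(f != NULL && nrings > 0);
	ports = ((uint32_t)f->ef_src_port << 16) | f->ef_dst_port;
	hash = ex_fnv1a_u32(hash, f->ef_src_ip);
	hash = ex_fnv1a_u32(hash, f->ef_dst_ip);
	hash = ex_fnv1a_u32(hash, ports);
	return (hash % nrings);
}
```
.Ed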
896.Pp
897The second use of rings is to group them together and apply filtering
898rules.
899For example, if a packet matches a specific VLAN or MAC address,
900then it can be sent to a specific ring or a specific group of rings.
901This is especially useful when there are multiple different virtual NICs
902or zones in play as the operating system will be able to use the
903hardware classification features to already know where a given packet
904needs to be delivered internally rather than having to determine that
905for each packet.
906.Pp
907From the MAC framework's perspective, a driver can have one or more
908groups.
909A group consists of the following:
.Bl -bullet -offset indent
911.It
912One or more hardware rings.
913.It
914One or more MAC address or VLAN filters.
915.El
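.Pp
A rough model of this relationship is sketched below.
It does not use the framework's data structures; the
.Vt ex_group_t
structure and both functions are hypothetical, and only MAC address
filters are modeled.
It shows how a destination address that matches a group's filter
identifies the group, and therefore the rings, that receive the packet.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	EX_ETHERADDRL	6
#define	EX_MAX_FILTERS	4

/* Hypothetical model of a group: a set of rings plus address filters. */
typedef struct ex_group {
	uint32_t	eg_first_ring;	/* index of the group's first ring */
	uint32_t	eg_nrings;	/* one or more rings */
	uint32_t	eg_nfilters;
	uint8_t		eg_filters[EX_MAX_FILTERS][EX_ETHERADDRL];
} ex_group_t;

/* Add a MAC address filter; returns 0, or -1 if the table is full. */
static int
ex_group_add_mac(ex_group_t *g, const uint8_t *mac)
{
	assert(g != NULL && mac != NULL);
	if (g->eg_nfilters == EX_MAX_FILTERS)
		return (-1);
	memcpy(g->eg_filters[g->eg_nfilters++], mac, EX_ETHERADDRL);
	return (0);
}

/* Return the index of the group whose filters match, or -1 for none. */
static int
ex_classify(const ex_group_t *groups, int ngroups, const uint8_t *dst)
{
	assert(groups != NULL && dst != NULL);
	for (int i = 0; i < ngroups; i++) {
		for (uint32_t j = 0; j < groups[i].eg_nfilters; j++) {
			if (memcmp(groups[i].eg_filters[j], dst,
			    EX_ETHERADDRL) == 0)
				return (i);
		}
	}
	return (-1);
}
```
.Ed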
916.Pp
917The details around how a device driver changes when rings are employed,
918the data structures that a driver must implement, and more are available
919in
920.Xr mac_capab_rings 9E .
921.Ss Dv MAC_CAPAB_TRANSCEIVER
922Many networking devices leverage external transceivers that adhere to
923standards such as SFP, QSFP, QSFP-DD, etc., which often contain
standardized information in an EEPROM on the device.
925The
926.Dv MAC_CAPAB_TRANSCEIVER
927capability provides a means of discovering the number of transceivers,
928their types, and reading the data from a transceiver.
929This allows administrators and users to determine if devices are
930present, if the hardware can use them, and in many cases, detailed
931information about the device ranging from its manufacturer and
932serial numbers to specific information about its health.
933Implementing this capability will lead to the operating system being
934able to discover and display transceivers as part of its fault
935management topology.
936.Pp
937See
938.Xr mac_capab_transceiver 9E
939for more details on the capability structure and the various function
940entry points that come along with it.
941.Ss Dv MAC_CAPAB_LED
942The
943.Dv MAC_CAPAB_LED
944capability provides a means to access and control the LEDs on a network
945interface card.
946This is then made available to the broader operating system and consumed
947by facilities such as the Fault Management Architecture.
948See
949.Xr mac_capab_led 9E
950for more details on the structure and requirements of the capability.
951.Sh PROPERTIES
952Properties in the MAC framework represent aspects of a link.
953These include things like the link's current state and MTU.
954Many of the properties in the system are focused around auto-negotiation and
955controlling what link speeds are advertised.
956Information about properties is covered by three different device entry points.
957The
958.Xr mc_propinfo 9E
959entry point obtains metadata about the property.
960The
961.Xr mc_getprop 9E
962entry point obtains the property.
963The
964.Xr mc_setprop 9E
965entry point updates the property to a new value.
966.Pp
967Many of the properties listed below are read-only.
968Each property indicates whether it's read-only or it's read/write.
969However, driver writers may not implement the ability to set all writable
970properties.
971Many of these depend on the card itself.
972In particular, all properties that relate to auto-negotiation and are read/write
973may not be updated if the hardware in question does not support toggling what
974link speeds are auto-negotiated.
975While copper Ethernet often does not have this restriction, it often exists with
976various fiber standards and phys.
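.Pp
The general shape of a property getter can be sketched as follows.
This is a simplified userland model and not the
.Xr mc_getprop 9E
signature: the
.Vt ex_prop_id_t
identifiers, the
.Vt ex_link_t
structure, the
.Fn ex_getprop
function, and the choice of error numbers are all assumptions made for
illustration.
The getter copies out the properties it handles and returns an error
for anything else.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in identifiers; a real driver is passed framework property IDs. */
typedef enum {
	EX_PROP_DUPLEX,
	EX_PROP_SPEED,
	EX_PROP_UNHANDLED
} ex_prop_id_t;

typedef enum {
	EX_DUPLEX_UNKNOWN,
	EX_DUPLEX_HALF,
	EX_DUPLEX_FULL
} ex_duplex_t;

/* Hypothetical cached link state. */
typedef struct ex_link {
	ex_duplex_t	el_duplex;
	uint64_t	el_speed;	/* bits per second */
} ex_link_t;

/* Copy a property value out, or fail for properties that are not handled. */
static int
ex_getprop(const ex_link_t *l, ex_prop_id_t id, size_t valsize, void *val)
{
	assert(l != NULL && val != NULL);
	switch (id) {
	case EX_PROP_DUPLEX:
		if (valsize < sizeof (l->el_duplex))
			return (EOVERFLOW);
		memcpy(val, &l->el_duplex, sizeof (l->el_duplex));
		return (0);
	case EX_PROP_SPEED:
		if (valsize < sizeof (l->el_speed))
			return (EOVERFLOW);
		memcpy(val, &l->el_speed, sizeof (l->el_speed));
		return (0);
	default:
		return (ENOTSUP);
	}
}
```
.Ed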
977.Pp
978The following properties are the subset of MAC framework properties that
979driver writers should be aware of and handle.
980While other properties exist in the system, driver writers should always return
981an error when a property not listed below is encountered.
982See
983.Xr mc_getprop 9E
984and
985.Xr mc_setprop 9E
986for more information on how to handle them.
987.Bl -hang -width Ds
988.It Dv MAC_PROP_DUPLEX
989.Bd -filled -compact
990Type:
991.Vt link_duplex_t |
992Permissions:
993.Sy Read-Only
994.Ed
995.Pp
996The
997.Dv MAC_PROP_DUPLEX
998property is used to indicate whether or not the link is duplex.
999A duplex link may have traffic flowing in both directions at the same time.
1000The
1001.Vt link_duplex_t
1002is an enumeration which may be set to any of the following values:
1003.Bl -tag -width Ds
1004.It Dv LINK_DUPLEX_UNKNOWN
1005The current state of the link is unknown.
1006This may be because the link has not negotiated to a specific speed or it is
1007down.
1008.It Dv LINK_DUPLEX_HALF
1009The link is running at half duplex.
1010Communication may travel in only one direction on the link at a given time.
1011.It Dv LINK_DUPLEX_FULL
1012The link is running at full duplex.
1013Communication may travel in both directions on the link simultaneously.
1014.El
1015.It Dv MAC_PROP_SPEED
1016.Bd -filled -compact
1017Type:
1018.Vt uint64_t |
1019Permissions:
1020.Sy Read-Only
1021.Ed
1022.Pp
1023The
1024.Dv MAC_PROP_SPEED
1025property stores the current link speed in bits per second.
A link that is running at 100 Mbit/s would store the value 100000000ULL.
1027A link that is running at 40 Gbit/s would store the value 40000000000ULL.
1028.It Dv MAC_PROP_STATUS
1029.Bd -filled -compact
1030Type:
1031.Vt link_state_t |
1032Permissions:
1033.Sy Read-Only
1034.Ed
1035.Pp
1036The
1037.Dv MAC_PROP_STATUS
1038property is used to indicate the current state of the link.
1039It indicates whether the link is up or down.
1040The
1041.Vt link_state_t
1042is an enumeration which may be set to any of the following values:
1043.Bl -tag -width Ds
1044.It Dv LINK_STATE_UNKNOWN
1045The current state of the link is unknown.
1046This may be because the driver's
1047.Xr mc_start 9E
entry point has not been called so it has not attempted to start the link.
1049.It Dv LINK_STATE_DOWN
1050The link is down.
1051This may be because of a negotiation problem, a cable problem, or some other
1052device specific issue.
1053.It Dv LINK_STATE_UP
1054The link is up.
1055If auto-negotiation is in use, it should have completed.
1056Traffic should be able to flow over the link, barring other issues.
1057.El
1058.It Dv MAC_PROP_MEDIA
1059.Bd -filled -compact
1060Type:
1061.Vt uint32_t No (Varies) |
1062Permissions:
1063.Sy Read-Only
1064.Ed
1065.Pp
1066The
1067.Dv MAC_PROP_MEDIA
1068property indicates the current type of media on the link.
1069The type of media is class-specific and determined based on the
1070.Fa m_type_ident
1071field in the
1072.Vt mac_register_t
1073structure used when calling
1074.Xr mac_register 9F .
1075The media is always read-only.
This property is not used to control how auto-negotiation should be
performed; the existing speed-based properties are used instead.
1078This property should be updated after auto-negotiation has completed.
1079If device hardware and firmware do not provide a way to accurately
1080determine this, then it is much better to return that the media is
1081unknown rather than to lie or guess.
1082A common case where this comes up is when a network card uses an
1083SFP-based device.
1084If the underlying negotiated type of the link isn't made available and
1085therefore the driver can't distinguish between say 40GBASE-SR4 and
108640GBASE-LR4, then drivers should return that the media is unknown.
1087.Pp
Similarly, many types here represent an electrical interface that is
1089often used between a MAC and a PHY, but also for chip-to-chip
1090connectivity or on a backplane.
1091When connecting to a PHY these shouldn't generally be used as the user
1092is concerned with what is actually on the link they plug in, not the
1093internals of the device.
1094.Pp
1095Currently media values are defined for Ethernet-based devices and use
1096the enumeration
1097.Vt mac_ether_media_t .
1098These are defined in
1099.In sys/mac_ether.h
1100and generally follow the IEEE standardized physical medium dependent
1101.Pq PMD
1102layer in 802.3.
1103.Bl -tag -width Ds
1104.It Dv ETHER_MEDIA_UNKNOWN
1105This indicates that the type of the link media is unknown to the driver.
1106This may be because the link is in a state where this information is
1107unknown or the hardware, firmware, and device driver cannot figure it
1108out.
1109If there is no media present and the link is down, use
1110.Dv ETHER_MEDIA_NONE
1111instead.
1112.It Dv ETHER_MEDIA_NONE
1113Represents the case that there is no specific media in use.
1114This should generally be used when the link is down.
1115.It Dv ETHER_MEDIA_10BASE_T
Traditional 10 Mbit/s Ethernet utilizing CAT-3 cabling.
1117Defined in 802.3i.
1118.It Dv ETHER_MEDIA_10BASE_T1
1119A more recent variant of 10 Mbit/s Ethernet that uses a single twisted
1120pair.
1121Defined in 802.3cg.
1122.It Dv ETHER_MEDIA_100BASE_TX
1123The most common form of 100 Mbit/s Ethernet that utilizes two twisted
1124pairs over a CAT-5 cable.
1125Defined in 802.3u.
1126.It Dv ETHER_MEDIA_100BASE_FX
1127100 Mbit/s Ethernet operating over multi-mode fiber.
1128Defined in 802.3u.
1129.It Dv ETHER_MEDIA_100BASE_X
1130This is a general term that covers operating in one of the 100BASE-?X
1131variants.
1132This is here because some PHYs do not distinguish between operating in
1133100BASE-TX and 100BASE-FX.
1134If the driver can determine if it is operating with a BASE-T or fiber
1135based PHY, prefer the more specific types instead.
1136.It Dv ETHER_MEDIA_100BASE_T4
1137This is an uncommon half-duplex variant of 100 Mbit/s Ethernet that
1138operates over CAT-3 cable using four twisted pairs.
1139Defined in 802.3u.
1140.It Dv ETHER_MEDIA_100BASE_T2
1141This is another uncommon variant of 100 Mbit/s Ethernet that only
1142requires two twisted pairs, but unlike 100BASE-TX requires CAT-3 cables.
1143Defined in 802.3y.
1144.It Dv ETHER_MEDIA_100BASE_T1
1145A more recent form of 100 Mbit/s Ethernet that requires only a single
1146twisted pair.
1147Defined in 802.3bw.
1148.It Dv ETHER_MEDIA_100_SGMII
1149This form of 100 Mbit/s Ethernet is generally used for chip-to-chip
1150connectivity and utilizes the SGMII
1151.Pq Serial gigabit media-independent interface
1152specification.
1153.It Dv ETHER_MEDIA_1000BASE_X
1154This is a general catch-all for all 1 Gbit/s fiber-based operation.
1155This is here for compatibility with the generic information returned by
1156traditional 802.3-compatible PHYs.
1157When more specific information is available, that should be used
1158instead.
1159.It Dv ETHER_MEDIA_1000BASE_T
1160Traditional 1 Gbit/s Ethernet that utilizes a CAT-5 cable with four
1161twisted pairs.
1162Defined in 802.3ab.
1163.It Dv ETHER_MEDIA_1000BASE_T1
1164A more recent form of 1 Gbit/s Ethernet that only requires a single
1165twisted pair.
1166.It Dv ETHER_MEDIA_1000BASE_KX
1167This form of 1 Gbit/s Ethernet is designed for operating over a backplane.
1168Defined in 802.3ap.
1169.It Dv ETHER_MEDIA_1000BASE_CX
1170An older form of 1 Gbit/s Ethernet that operates over balanced copper
1171cables.
1172Defined in 802.3z.
1173.It Dv ETHER_MEDIA_1000BASE_SX
11741 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1175each direction.
1176.It Dv ETHER_MEDIA_1000BASE_LX
11771 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1178each direction.
1179.It Dv ETHER_MEDIA_1000BASE_BX
11801 Gbit/s Ethernet operating over a single piece of single-mode fiber.
1181This media operates bi-directionally as opposed to how 1000BASE-LX and
11821000BASE-SX operate.
1183.It Dv ETHER_MEDIA_1000_SGMII
1184A form of 1 Gbit/s Ethernet defined by Cisco that is used for
1185chip-to-chip connectivity.
1186.It Dv ETHER_MEDIA_2500BASE_T
11872.5 Gbit/s Ethernet based on four copper twisted-pairs.
1188Defined in 802.3bz.
1189.It Dv ETHER_MEDIA_2500BASE_KX
11902.5 Gbit/s Ethernet that is designed for operating over a backplane
1191interconnect.
1192Defined in 802.3cb.
1193.It Dv ETHER_MEDIA_2500BASE_X
1194This is a variant of 2.5 Gbit/s Ethernet that took the 1000BASE-X IEEE
1195standard and ran it with a 2.5x faster clock.
It is a de facto standard.
1197.It Dv ETHER_MEDIA_5000BASE_T
11985.0 Gbit/s Ethernet based on four copper twisted-pairs.
1199Defined in 802.3bz.
1200.It Dv ETHER_MEDIA_5000BASE_KR
12015.0 Gbit/s Ethernet that is designed for operating over a backplane
1202interconnect.
1203Defined in 802.3cb.
1204.It Dv ETHER_MEDIA_10GBASE_T
120510 Gbit/s Ethernet operating over four copper twisted pairs utilizing
1206CAT-6a cables.
1207Defined in 802.3an.
1208.It Dv ETHER_MEDIA_10GBASE_SR
120910 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1210each direction.
1211Defined in 802.3ae.
1212.It Dv ETHER_MEDIA_10GBASE_LR
121310 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1214each direction.
1215The maximum fiber length is 10km.
1216Defined in 802.3ae.
1217.It Dv ETHER_MEDIA_10GBASE_ER
121810 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1219each direction.
1220The maximum fiber length is 30km.
1221Defined in 802.3ae.
1222.It Dv ETHER_MEDIA_10GBASE_LRM
122310 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1224each direction.
This has a reach of up to 220m, which is longer than that of
10GBASE-SR.
1227Defined in 802.3aq.
1228.It Dv ETHER_MEDIA_10GBASE_KR
122910 Gbit/s Ethernet operating over a single lane backplane.
Defined in 802.3ap.
1231.It Dv ETHER_MEDIA_10GBASE_CX4
123210 Gbit/s Ethernet operating over a group of four shielded copper cables.
1233Defined in 802.3ak.
1234.It Dv ETHER_MEDIA_10GBASE_KX4
123510 Gbit/s Ethernet operating over a four lane backplane.
Defined in 802.3ap.
1237.It Dv ETHER_MEDIA_10GBASE_CR
123810 Gbit/s Ethernet that is built using a passive copper
1239SFP-compatible cable.
1240This is sometimes called 10GSFP+Cu passive.
1241Defined in SFF-8431.
1242.It Dv ETHER_MEDIA_10GBASE_AOC
124310 Gbit/s Ethernet that is built using a short-range active
1244optical cable that is SFP+-compatible.
1245Defined in SFF-8431.
1246.It Dv ETHER_MEDIA_10GBASE_ACC
124710 Gbit/s Ethernet based upon a single lane of copper cable with an
active component that allows it to go longer distances than 10GBASE-CR.
1249Defined in SFF-8431.
1250.It Dv ETHER_MEDIA_10G_XAUI
125110 Gbit/s signalling that is defined for use between a MAC and PHY.
The name is the Roman numeral X, for ten, followed by AUI, for
attachment unit interface.
1253Sometimes used for chip-to-chip interconnects.
1254Defined in 802.3ae.
1255.It Dv ETHER_MEDIA_10G_SFI
125610 Gbit/s signalling that is defined for use between a MAC and an
1257SFP-based transceiver.
1258Defined in SFF-8431.
1259.It Dv ETHER_MEDIA_10G_XFI
126010 Gbit/s signalling that is defined for use between a MAC and an
1261XFP-based transceiver.
1262Defined in INF-8077i
1263.Pq XFP MSA .
1264.It Dv ETHER_MEDIA_25GBASE_T
126525 Gbit/s Ethernet based upon four twisted pair cables using CAT-8
1266cable.
1267Defined in 802.3bq.
1268.It Dv ETHER_MEDIA_25GBASE_SR
126925 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1270each direction.
1271Defined in 802.3by.
1272.It Dv ETHER_MEDIA_25GBASE_LR
127325 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1274each direction.
1275The maximum fiber length is 10km.
1276Defined in 802.3cc.
1277.It Dv ETHER_MEDIA_25GBASE_ER
127825 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1279each direction.
1280The maximum fiber length is 30km.
1281Defined in 802.3cc.
1282.It Dv ETHER_MEDIA_25GBASE_KR
128325 Gbit/s Ethernet operating over a backplane with a single lane.
1284Defined in 802.3by.
1285.It Dv ETHER_MEDIA_25GBASE_CR
128625 Gbit/s Ethernet operating over a single lane of copper cable.
1287Generally used with an SFP28 style connector.
1288Defined in 802.3by.
1289.It Dv ETHER_MEDIA_25GBASE_AOC
25 Gbit/s Ethernet that is built using a short-range active
1291optical cable that is SFP28-compatible.
1292Defined loosely by SFF-8402 and often utilizes 25GBASE-SR.
1293.It Dv ETHER_MEDIA_25GBASE_ACC
129425 Gbit/s Ethernet based upon a single lane of copper cable with an
active component that allows it to go longer distances than 25GBASE-CR.
1296Defined loosely by SFF-8402.
1297.It Dv ETHER_MEDIA_25G_AUI
129825 Gbit/s signalling that is defined for use between a MAC and PHY and
1299for chip-to-chip connectivity.
1300Defined by 802.3by.
1301.It Dv ETHER_MEDIA_40GBASE_T
130240 Gbit/s Ethernet based upon four twisted-pairs of CAT-8 cables.
1303Defined in 802.3bq.
1304.It Dv ETHER_MEDIA_40GBASE_CR4
130540 Gbit/s Ethernet utilizing four lanes of twinaxial copper cabling
1306each operating at 10 Gbit/s.
1307This is generally used with a QSFP+ connector defined in SFF-8635.
1308Defined in 802.3ba.
1309.It Dv ETHER_MEDIA_40GBASE_KR4
131040 Gbit/s Ethernet utilizing four lanes over a copper backplane each
1311operating at 10 Gbit/s.
1312Defined in 802.3ba.
1313.It Dv ETHER_MEDIA_40GBASE_SR4
131440 Gbit/s Ethernet based upon using four pairs of multi-mode fiber, each
1315operating at 10 Gbit/s, with one fiber in the pair being used for
1316transmit and the other for receive.
1317Generally utilizes a QSFP+ connector.
1318Defined in 802.3ba.
1319.It Dv ETHER_MEDIA_40GBASE_LR4
132040 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1321for each direction.
1322Utilizes wavelength multiplexing as the electrical interface is four 10
1323Gbit/s signals.
1324The maximum fiber length is 10km.
1325Defined in 802.3ba.
1326.It Dv ETHER_MEDIA_40GBASE_ER4
132740 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1328for each direction.
1329Utilizes wavelength multiplexing as the electrical interface is four 10
1330Gbit/s signals and generally based upon a QSFP+ connector.
1331The maximum fiber length is 40km.
1332Defined in 802.3bm.
1333.It Dv ETHER_MEDIA_40GBASE_LM4
133440 Gbit/s Ethernet based upon using one pair of multi-mode fibers, one
1335for each direction.
1336Utilizes wavelength multiplexing as the electrical interface is four 10
1337Gbit/s signals and generally based upon a QSFP+ connector.
1338Defined by a specific MSA.
1339.It Dv ETHER_MEDIA_40GBASE_AOC4
134040 Gbit/s Ethernet based upon a QSFP+ based cable with built-in
1341optical transceivers.
1342The electrical interface is four lanes running at 10 Gbit/s.
1343.It Dv ETHER_MEDIA_40GBASE_ACC4
134440 Gbit/s Ethernet based upon four copper lanes each running at 10
1345Gbit/s with some additional component compared to 40GBASE-CR4.
1346.It Dv ETHER_MEDIA_40G_XLAUI
134740 Gbit/s signalling operating across four lanes that is defined for use
1348between a MAC and a PHY or for chip-to-chip connectivity.
1349Defined by 802.3ba.
1350.It Dv ETHER_MEDIA_40G_XLPPI
135140 Gbit/s signalling operating across four lanes that is designed to
1352connect between a chip and a module, generally a QSFP+ based device.
1353Defined in 802.3ba.
1354.It Dv ETHER_MEDIA_50GBASE_KR2
135550 Gbit/s Ethernet which operates over a two lane copper backplane.
1356Each lane operates at 25 Gbit/s.
1357Defined by the 25G and 50G Ethernet consortium.
1358This did not become an IEEE standard.
1359.It Dv ETHER_MEDIA_50GBASE_CR2
136050 Gbit/s Ethernet which operates over two lane copper twinaxial cable,
1361generally with a QSFP+ connector.
1362Each lane operates at 25 Gbit/s.
1363Defined by the 25G and 50G Ethernet consortium.
1364.It Dv ETHER_MEDIA_50GBASE_SR2
50 Gbit/s Ethernet based upon using two pairs of multi-mode fiber, each
1366operating at 25 Gbit/s, with one fiber in the pair being used for
1367transmit and the other for receive.
1368Generally utilizes a QSFP+ connector.
1369Defined by the 25G and 50G Ethernet consortium.
1370.It Dv ETHER_MEDIA_50GBASE_LR2
137150 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1372for each direction.
1373Utilizes wavelength multiplexing as the electrical interface is two 25
1374Gbit/s signals.
1375Defined by the 25G and 50G Ethernet consortium.
1376.It Dv ETHER_MEDIA_50GBASE_AOC2
137750 Gbit/s Ethernet generally based upon a QSFP+ based cable with built-in
1378optical transceivers.
1379The electrical interface is two lanes running at 25 Gbit/s.
1380.It Dv ETHER_MEDIA_50GBASE_ACC2
138150 Gbit/s Ethernet based upon two copper twinaxial lanes each running at
138225 Gbit/s with some additional component compared to 50GBASE-CR2.
1383.It Dv ETHER_MEDIA_50GBASE_KR
138450 Gbit/s Ethernet operating over a single lane backplane.
1385Defined by 802.3cd.
1386.It Dv ETHER_MEDIA_50GBASE_CR
138750 Gbit/s Ethernet operating over a single lane twinaxial copper cable
1388generally utilizing an SFP56 interface.
1389Defined by 802.3cd.
1390.It Dv ETHER_MEDIA_50GBASE_SR
139150 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1392each direction.
1393Defined by 802.3cd.
1394.It Dv ETHER_MEDIA_50GBASE_LR
139550 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1396each direction.
1397The maximum fiber length is 10km.
1398Defined in 802.3cd.
1399.It Dv ETHER_MEDIA_50GBASE_ER
140050 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1401each direction.
1402The maximum fiber length is 40km.
1403Defined in 802.3cd.
1404.It Dv ETHER_MEDIA_50GBASE_FR
140550 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1406each direction.
1407The maximum fiber length is 2km.
1408Defined in 802.3cd.
1409.It Dv ETHER_MEDIA_50GBASE_AOC
141050 Gbit/s Ethernet that is built using a short-range active optical
1411cable that is generally SFP56 compatible.
1412The electrical interface operates at 25 Gbit/s PAM4 signaling.
1413.It Dv ETHER_MEDIA_50GBASE_ACC
141450 Gbit/s Ethernet that is built using a single lane twinaxial
1415cable that is generally SFP56 compatible but uses an active component
1416such as a retimer or redriver when compared to 50GBASE-CR.
1417.It Dv ETHER_MEDIA_100GBASE_CR10
1418100 Gbit/s Ethernet operating over ten lanes of shielded twinaxial
1419copper cable, each operating at 10 Gbit/s.
1420Defined in 802.3ba.
1421.It Dv ETHER_MEDIA_100GBASE_SR10
1422100 Gbit/s Ethernet based upon using ten pairs of multi-mode fiber, each
1423operating at 10 Gbit/s, with one fiber in the pair being used for
1424transmit and the other for receive.
1425.It Dv ETHER_MEDIA_100GBASE_SR4
1426100 Gbit/s Ethernet based upon using four pairs of multi-mode fiber,
1427each operating at 25 Gbit/s, with one fiber in the pair being used for
1428transmit and the other for receive.
1429Defined by 802.3bm.
1430.It Dv ETHER_MEDIA_100GBASE_LR4
1431100 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1432for each direction.
1433Utilizes wavelength multiplexing as the electrical interface is four 25
1434Gbit/s signals and generally based upon a QSFP28 connector.
1435The maximum fiber length is 10km.
1436Defined by 802.3ba.
1437.It Dv ETHER_MEDIA_100GBASE_ER4
1438100 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1439for each direction.
1440Utilizes wavelength multiplexing as the electrical interface is four 25
1441Gbit/s signals and generally based upon a QSFP28 connector.
1442The maximum fiber length is 40km.
1443Defined by 802.3ba.
1444.It Dv ETHER_MEDIA_100GBASE_KR4
1445100 Gbit/s Ethernet based upon using a four lane copper backplane.
1446Each lane operates at 25 Gbit/s.
1447Defined in 802.3bj.
1448.It Dv ETHER_MEDIA_100GBASE_CAUI4
1449100 Gbit/s signalling used for chip-to-chip and chip-to-module
1450connectivity.
1451Defined in 802.3bm.
1452.It Dv ETHER_MEDIA_100GBASE_CR4
1453100 Gbit/s Ethernet based upon using a four lane copper twinaxial cable.
1454Each lane operates at 25 Gbit/s and generally utilizes a QSFP28
1455connector.
1456Defined in 802.3bj.
1457.It Dv ETHER_MEDIA_100GBASE_AOC4
1458100 Gbit/s Ethernet that utilizes an active optical cable with
1459short-range optical transceivers.
1460Electrically operates as four lanes of 25 Gbit/s and most commonly uses
1461a QSFP28 connector.
1462.It Dv ETHER_MEDIA_100GBASE_ACC4
1463100 Gbit/s Ethernet that utilizes a four lane copper twinaxial cable
1464that unlike 100GBASE-CR4 has an active component such as a retimer or
1465redriver.
1466.It Dv ETHER_MEDIA_100GBASE_KR2
1467100 Gbit/s Ethernet based upon using a two lane copper backplane.
1468Each lane operates at 50 Gbit/s.
1469Defined in 802.3cd.
1470.It Dv ETHER_MEDIA_100GBASE_CR2
1471100 Gbit/s Ethernet that utilizes a two lane copper twinaxial cable.
1472Each lane operates at 50 Gbit/s.
1473Defined by 802.3cd.
1474.It Dv ETHER_MEDIA_100GBASE_SR2
1475100 Gbit/s Ethernet based upon using two pairs of multi-mode fiber,
1476each operating at 50 Gbit/s, with one fiber in the pair being used for
1477transmit and the other for receive.
1478Defined by 802.3cd.
1479.It Dv ETHER_MEDIA_100GBASE_KR
1480100 Gbit/s Ethernet operating over a single lane copper backplane.
1481Defined by 802.3ck.
1482.It Dv ETHER_MEDIA_100GBASE_CR
1483100 Gbit/s Ethernet operating over a single lane copper twinaxial cable.
1484Generally uses an SFP112 connector.
1485Defined by 802.3ck.
1486.It Dv ETHER_MEDIA_100GBASE_SR
1487100 Gbit/s Ethernet operating over a pair of multi-mode fibers, one for
1488transmitting and one for receiving.
1489The maximum fiber length is 60-100m depending on the fiber type
1490.Pq OM3, OM4 .
1491Defined by 802.3db.
1492.It Dv ETHER_MEDIA_100GBASE_DR
1493100 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1494transmitting and one for receiving.
1495Designed to be used with a parallel DR4/DR8 interface.
1496The maximum fiber length is 500m.
1497Defined by 802.3cd.
1498.It Dv ETHER_MEDIA_100GBASE_LR
1499100 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1500transmitting and one for receiving.
1501The maximum fiber length is 10km.
1502Defined by 802.3cu.
1503.It Dv ETHER_MEDIA_100GBASE_FR
1504100 Gbit/s Ethernet operating over a pair of single-mode fibers, one for
1505transmitting and one for receiving.
1506The maximum fiber length is 2km.
1507Defined by 802.3cu.
1508.It Dv ETHER_MEDIA_200GBASE_CR4
1509200 Gbit/s Ethernet utilizing a four lane passive copper twinaxial
1510cable.
1511Each lane operates at 50 Gbit/s and the connector is generally based on
1512QSFP56.
1513Defined by 802.3cd.
1514.It Dv ETHER_MEDIA_200GBASE_KR4
1515200 Gbit/s Ethernet utilizing four lanes over a copper backplane each
1516operating at 50 Gbit/s.
1517Defined by 802.3cd.
1518.It Dv ETHER_MEDIA_200GBASE_SR4
1519200 Gbit/s Ethernet based upon using four pairs of multi-mode fiber,
1520each operating at 50 Gbit/s, with one fiber in the pair being used for
1521transmit and the other for receive.
1522Defined by 802.3cd.
1523.It Dv ETHER_MEDIA_200GBASE_DR4
1524200 Gbit/s Ethernet based upon using four pairs of single-mode fiber,
1525each operating at 50 Gbit/s, with one fiber in the pair being used for
1526transmit and the other for receive.
1527Defined by 802.3bs.
1528.It Dv ETHER_MEDIA_200GBASE_FR4
1529200 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1530for transmitting and one for receiving.
1531Utilizes wavelength multiplexing as the electrical interface is four 50
1532Gbit/s signals and generally based upon a QSFP56 connector.
1533The maximum fiber length is 2km.
1534Defined by 802.3bs.
1535.It Dv ETHER_MEDIA_200GBASE_LR4
1536200 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1537for transmitting and one for receiving.
1538Utilizes wavelength multiplexing as the electrical interface is four 50
1539Gbit/s signals and generally based upon a QSFP56 connector.
1540The maximum fiber length is 10km.
1541Defined by 802.3bs.
1542.It Dv ETHER_MEDIA_200GBASE_ER4
1543200 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1544for transmitting and one for receiving.
1545Utilizes wavelength multiplexing as the electrical interface is four 50
1546Gbit/s signals and generally based upon a QSFP56 connector.
1547The maximum fiber length is 40km.
1548Defined by 802.3bs.
1549.It Dv ETHER_MEDIA_200GAUI_4
1550200 Gbit/s signalling utilizing four lanes each operating at 50 Gbit/s.
1551Used for chip-to-chip and chip-to-module connections.
1552Defined by 802.3bs.
1553.It Dv ETHER_MEDIA_200GBASE_KR2
1554200 Gbit/s Ethernet utilizing two lanes over a copper backplane each
1555operating at 100 Gbit/s.
1556Defined by 802.3ck.
1557.It Dv ETHER_MEDIA_200GBASE_CR2
1558200 Gbit/s Ethernet utilizing a two lane passive copper twinaxial
1559cable.
1560Each lane operates at 100 Gbit/s.
1561Defined by 802.3ck.
1562.It Dv ETHER_MEDIA_200GBASE_SR2
1563200 Gbit/s Ethernet based upon using two pairs of multi-mode fiber,
1564each operating at 100 Gbit/s, with one fiber in the pair being used for
1565transmit and the other for receive.
1566Defined by 802.3db.
1567.It Dv ETHER_MEDIA_200GAUI_2
1568200 Gbit/s signalling utilizing two lanes each operating at 100 Gbit/s.
1569Used for chip-to-chip and chip-to-module connections.
1570Defined by 802.3ck.
1571.It Dv ETHER_MEDIA_400GBASE_KR8
1572400 Gbit/s Ethernet utilizing eight lanes over a copper backplane each
1573operating at 50 Gbit/s.
1574Defined by the 25/50 Gigabit Ethernet Consortium.
1575.It Dv ETHER_MEDIA_400GBASE_FR8
400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1577for transmitting and one for receiving.
1578Utilizes wavelength multiplexing as the electrical interface is eight 50
1579Gbit/s signals and generally based upon a QSFP-DD connector.
1580The maximum fiber length is 2km.
1581Defined by 802.3bs.
1582.It Dv ETHER_MEDIA_400GBASE_LR8
400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1584for transmitting and one for receiving.
1585Utilizes wavelength multiplexing as the electrical interface is eight 50
1586Gbit/s signals and generally based upon a QSFP-DD connector.
1587The maximum fiber length is 10km.
1588Defined by 802.3bs.
1589.It Dv ETHER_MEDIA_400GBASE_ER8
400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1591for transmitting and one for receiving.
1592Utilizes wavelength multiplexing as the electrical interface is eight 50
1593Gbit/s signals and generally based upon a QSFP-DD connector.
1594The maximum fiber length is 40km.
1595Defined by 802.3cn.
1596.It Dv ETHER_MEDIA_400GAUI_8
1597400 Gbit/s signalling utilizing eight lanes each operating at 50 Gbit/s.
1598Used for chip-to-chip and chip-to-module connections.
1599Defined by 802.3bs.
1600.It Dv ETHER_MEDIA_400GBASE_KR4
1601400 Gbit/s Ethernet utilizing four lanes over a copper backplane each
1602operating at 100 Gbit/s.
1603Defined by 802.3ck.
1604.It Dv ETHER_MEDIA_400GBASE_CR4
400 Gbit/s Ethernet utilizing a four lane passive copper twinaxial
1606cable.
1607Each lane operates at 100 Gbit/s and generally uses a QSFP112 connector.
1608Defined by 802.3ck.
1609.It Dv ETHER_MEDIA_400GBASE_SR4
1610400 Gbit/s Ethernet based upon using four pairs of multi-mode fiber,
1611each operating at 100 Gbit/s, with one fiber in the pair being used for
1612transmit and the other for receive.
1613Defined by 802.3db.
1614.It Dv ETHER_MEDIA_400GBASE_DR4
1615400 Gbit/s Ethernet based upon using four pairs of single-mode fiber,
1616each operating at 100 Gbit/s, with one fiber in the pair being used for
1617transmit and the other for receive.
1618The maximum fiber length is 500m.
1619Defined by 802.3bs.
1620.It Dv ETHER_MEDIA_400GBASE_FR4
1621400 Gbit/s Ethernet based upon using one pair of single-mode fibers, one
1622for transmitting and one for receiving.
1623Utilizes wavelength multiplexing as the electrical interface is four 100
1624Gbit/s signals and generally based upon a QSFP112 connector.
1625The maximum fiber length is 2km.
1626Defined by 802.3cu.
1627.It Dv ETHER_MEDIA_400GAUI_4
1628400 Gbit/s signalling utilizing four lanes each operating at 100 Gbit/s.
1629Used for chip-to-chip and chip-to-module connections.
1630Defined by 802.3ck.
1631.El
1632.It Dv MAC_PROP_AUTONEG
1633.Bd -filled -compact
1634Type:
1635.Vt uint8_t |
1636Permissions:
1637.Sy Read/Write
1638.Ed
1639.Pp
1640The
1641.Dv MAC_PROP_AUTONEG
1642property indicates whether or not the device is currently configured to
1643perform auto-negotiation.
1644A value of
1645.Sy 0
1646indicates that auto-negotiation is disabled.
1647A
1648.Sy non-zero
1649value indicates that auto-negotiation is enabled.
1650Devices should generally default to enabling auto-negotiation.
1651.Pp
1652When getting this property, the device driver should return the current
1653state.
1654When setting this property, if the device supports operating in the requested
1655mode, then the device driver should reset the link to negotiate to the new speed
1656after updating any internal registers.
1657.It Dv MAC_PROP_MTU
1658.Bd -filled -compact
1659Type:
1660.Vt uint32_t |
1661Permissions:
1662.Sy Read/Write
1663.Ed
1664.Pp
1665The
1666.Dv MAC_PROP_MTU
1667property determines the maximum transmission unit (MTU).
1668This indicates the maximum size packet that the device can transmit, ignoring
1669its own headers.
1670For an Ethernet device, this would exclude the size of the Ethernet header and
1671any VLAN headers that would be placed.
It is up to the driver to ensure that any MTU value that it accepts, when added
to its margin and header sizes, does not exceed its maximum frame size.
1674.Pp
By default, drivers for Ethernet should initialize the MTU to
.Sy 1500 .
1678When getting this property, the driver should return its current
1679recorded MTU.
1680When setting this property, the driver should first validate that it is within
1681the device's valid range and then it must call
1682.Xr mac_maxsdu_update 9F .
1683Note that the call may fail.
1684If the call completes successfully, the driver should update the hardware with
1685the new value of the MTU and perform any other work needed to handle it.
1686.Pp
1687If the device does not support changing the MTU after the device's
1688.Xr mc_start 9E
entry point has been called, then the driver should return
1690.Er EBUSY .
1691.It Dv MAC_PROP_FLOWCTRL
1692.Bd -filled -compact
1693Type:
1694.Vt link_flowctrl_t |
1695Permissions:
1696.Sy Read/Write
1697.Ed
1698.Pp
1699The
1700.Dv MAC_PROP_FLOWCTRL
1701property manages the configuration of pause frames as part of Ethernet
1702flow control.
1703Note, this only describes what this device will advertise.
1704What is actually enabled may be different and is subject to the rules of
1705auto-negotiation.
1706The
1707.Vt link_flowctrl_t
1708is an enumeration that may be set to one of the following values:
1709.Bl -tag -width Ds
1710.It Dv LINK_FLOWCTRL_NONE
1711Flow control is disabled.
1712No pause frames should be generated or honored.
1713.It Dv LINK_FLOWCTRL_RX
1714The device can receive pause frames; however, it should not generate
1715them.
1716.It Dv LINK_FLOWCTRL_TX
1717The device can generate pause frames; however, it does not support
1718receiving them.
1719.It Dv LINK_FLOWCTRL_BI
1720The device supports both sending and receiving pause frames.
1721.El
1722.Pp
1723When getting this property, the device driver should return the way that
1724it has configured the device, not what the device has actually
1725negotiated.
1726When setting the property, it should update the hardware and allow the link to
1727potentially perform auto-negotiation again.
1728.It Dv MAC_PROP_EN_FEC_CAP
1729.Bd -filled -compact
1730Type:
1731.Vt link_fec_t |
1732Permissions:
1733.Sy Read/Write
1734.Ed
1735.Pp
1736The
1737.Dv MAC_PROP_EN_FEC_CAP
1738property indicates which Forward Error Correction (FEC) code is advertised
1739by the device.
1740.Pp
1741The
1742.Vt link_fec_t
1743is an enumeration that may be a combination of the following bit values:
1744.Bl -tag -width Ds
1745.It Dv LINK_FEC_NONE
1746No FEC over the link.
1747.It Dv LINK_FEC_AUTO
The FEC coding to use is auto-negotiated.
.Dv LINK_FEC_AUTO
cannot be set along with any of the other values.
1751This is the default setting the device driver should use.
1752.It Dv LINK_FEC_RS
1753The link may use Reed-Solomon FEC coding.
1754.It Dv LINK_FEC_BASE_R
The link may use Base-R coding, also commonly referred to as FireCode.
1756.El
1757.Pp
When setting the property, the driver should update the hardware with the
requested coding or combination of requested codings.
1760If a particular combination of codings is not supported by the hardware,
1761the device driver should return
1762.Er EINVAL .
1763When retrieving this property, the device driver should return the current
1764value of the property.
1765.It Dv MAC_PROP_ADV_FEC_CAP
1766.Bd -filled -compact
1767Type:
1768.Vt link_fec_t |
1769Permissions:
1770.Sy Read-Only
1771.Ed
1772.Pp
1773The
1774.Dv MAC_PROP_ADV_FEC_CAP
property has the same values as
1776.Dv MAC_PROP_EN_FEC_CAP .
1777The property indicates which Forward Error Correction (FEC) code has been
1778negotiated over the link.
1779.El
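.Pp
Returning to the
.Dv MAC_PROP_MTU
property described above, the following is a minimal, self-contained sketch of
the range check a driver might perform before calling
.Xr mac_maxsdu_update 9F .
The function name, the header and margin constants, and the frame size limit are
all hypothetical values chosen for illustration; they are not part of the MAC
framework and a real driver would substitute the limits of its own hardware.
.Bd -literal -offset indent
```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration; real limits are device-specific. */
#define	EXAMPLE_ETHER_HDR	14	/* destination, source, and ethertype */
#define	EXAMPLE_VLAN_TAG	4	/* margin for a single VLAN tag */
#define	EXAMPLE_MIN_MTU		68	/* assumed lower bound */
#define	EXAMPLE_MAX_FRAME	9018	/* assumed hardware frame limit */

/*
 * Return true if the requested MTU, plus the Ethernet header and the VLAN
 * margin, still fits within the assumed maximum frame size.
 */
static bool
example_mtu_valid(uint32_t mtu)
{
	if (mtu < EXAMPLE_MIN_MTU)
		return (false);
	if (mtu > EXAMPLE_MAX_FRAME - EXAMPLE_ETHER_HDR - EXAMPLE_VLAN_TAG)
		return (false);
	return (true);
}
```
.Ed
.Pp
In a driver, a check along these lines would come first when the property is
set.
Only if it passes would the driver go on to call
.Xr mac_maxsdu_update 9F
and, if that call succeeds, program the hardware.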
1780.Pp
1781The remaining properties are all about various auto-negotiation link
1782speeds.
1783They fall into two different buckets: properties with
1784.Sy _ADV_
1785in the name and properties with
1786.Sy _EN_
1787in the name.
1788For any given supported speed, there is one of each.
1789The
1790.Sy _EN_
1791set of properties are read/write properties that control what should be
1792advertised by the device.
1793When these are retrieved, they should return the current value of the property.
1794When they are set, they should change how the hardware advertises the specific
1795speed and trigger any kind of link reset and auto-negotiation, if enabled, to
1796occur.
1797.Pp
1798The
1799.Sy _ADV_
1800set of properties are read-only properties.
1801They are meant to reflect what has actually been negotiated.
1802These may be different from the
1803.Sy _EN_
1804family of properties, especially when different power management
1805settings are at play.
1806.Pp
1807See the
1808.Sx Link Speed and Auto-negotiation
1809section for more information.
1810.Pp
1811The properties are ordered in increasing link speed:
1812.Bl -hang -width Ds
1813.It Dv MAC_PROP_ADV_10HDX_CAP
1814.Bd -filled -compact
1815Type:
1816.Vt uint8_t |
1817Permissions:
1818.Sy Read-Only
1819.Ed
1820.Pp
1821The
1822.Dv MAC_PROP_ADV_10HDX_CAP
1823property describes whether or not 10 Mbit/s half-duplex support is
1824advertised.
1825.It Dv MAC_PROP_EN_10HDX_CAP
1826.Bd -filled -compact
1827Type:
1828.Vt uint8_t |
1829Permissions:
1830.Sy Read/Write
1831.Ed
1832.Pp
1833The
1834.Dv MAC_PROP_EN_10HDX_CAP
1835property describes whether or not 10 Mbit/s half-duplex support is
1836enabled.
1837.It Dv MAC_PROP_ADV_10FDX_CAP
1838.Bd -filled -compact
1839Type:
1840.Vt uint8_t |
1841Permissions:
1842.Sy Read-Only
1843.Ed
1844.Pp
1845The
1846.Dv MAC_PROP_ADV_10FDX_CAP
1847property describes whether or not 10 Mbit/s full-duplex support is
1848advertised.
1849.It Dv MAC_PROP_EN_10FDX_CAP
1850.Bd -filled -compact
1851Type:
1852.Vt uint8_t |
1853Permissions:
1854.Sy Read/Write
1855.Ed
1856.Pp
1857The
1858.Dv MAC_PROP_EN_10FDX_CAP
1859property describes whether or not 10 Mbit/s full-duplex support is
1860enabled.
1861.It Dv MAC_PROP_ADV_100HDX_CAP
1862.Bd -filled -compact
1863Type:
1864.Vt uint8_t |
1865Permissions:
1866.Sy Read-Only
1867.Ed
1868.Pp
1869The
1870.Dv MAC_PROP_ADV_100HDX_CAP
1871property describes whether or not 100 Mbit/s half-duplex support is
1872advertised.
1873.It Dv MAC_PROP_EN_100HDX_CAP
1874.Bd -filled -compact
1875Type:
1876.Vt uint8_t |
1877Permissions:
1878.Sy Read/Write
1879.Ed
1880.Pp
1881The
1882.Dv MAC_PROP_EN_100HDX_CAP
1883property describes whether or not 100 Mbit/s half-duplex support is
1884enabled.
1885.It Dv MAC_PROP_ADV_100FDX_CAP
1886.Bd -filled -compact
1887Type:
1888.Vt uint8_t |
1889Permissions:
1890.Sy Read-Only
1891.Ed
1892.Pp
1893The
1894.Dv MAC_PROP_ADV_100FDX_CAP
1895property describes whether or not 100 Mbit/s full-duplex support is
1896advertised.
1897.It Dv MAC_PROP_EN_100FDX_CAP
1898.Bd -filled -compact
1899Type:
1900.Vt uint8_t |
1901Permissions:
1902.Sy Read/Write
1903.Ed
1904.Pp
1905The
1906.Dv MAC_PROP_EN_100FDX_CAP
1907property describes whether or not 100 Mbit/s full-duplex support is
1908enabled.
1909.It Dv MAC_PROP_ADV_100T4_CAP
1910.Bd -filled -compact
1911Type:
1912.Vt uint8_t |
1913Permissions:
1914.Sy Read-Only
1915.Ed
1916.Pp
1917The
1918.Dv MAC_PROP_ADV_100T4_CAP
1919property describes whether or not 100 Mbit/s Ethernet using the
1920100BASE-T4 standard is
1921advertised.
1922.It Dv MAC_PROP_EN_100T4_CAP
1923.Bd -filled -compact
1924Type:
1925.Vt uint8_t |
1926Permissions:
1927.Sy Read/Write
1928.Ed
1929.Pp
1930The
1931.Dv MAC_PROP_EN_100T4_CAP
1932property describes whether or not 100 Mbit/s Ethernet using the
1933100BASE-T4 standard is
1934enabled.
1935.It Dv MAC_PROP_ADV_1000HDX_CAP
1936.Bd -filled -compact
1937Type:
1938.Vt uint8_t |
1939Permissions:
1940.Sy Read-Only
1941.Ed
1942.Pp
1943The
1944.Dv MAC_PROP_ADV_1000HDX_CAP
1945property describes whether or not 1 Gbit/s half-duplex support is
1946advertised.
1947.It Dv MAC_PROP_EN_1000HDX_CAP
1948.Bd -filled -compact
1949Type:
1950.Vt uint8_t |
1951Permissions:
1952.Sy Read/Write
1953.Ed
1954.Pp
1955The
1956.Dv MAC_PROP_EN_1000HDX_CAP
1957property describes whether or not 1 Gbit/s half-duplex support is
1958enabled.
1959.It Dv MAC_PROP_ADV_1000FDX_CAP
1960.Bd -filled -compact
1961Type:
1962.Vt uint8_t |
1963Permissions:
1964.Sy Read-Only
1965.Ed
1966.Pp
1967The
1968.Dv MAC_PROP_ADV_1000FDX_CAP
1969property describes whether or not 1 Gbit/s full-duplex support is
1970advertised.
1971.It Dv MAC_PROP_EN_1000FDX_CAP
1972.Bd -filled -compact
1973Type:
1974.Vt uint8_t |
1975Permissions:
1976.Sy Read/Write
1977.Ed
1978.Pp
1979The
1980.Dv MAC_PROP_EN_1000FDX_CAP
1981property describes whether or not 1 Gbit/s full-duplex support is
1982enabled.
1983.It Dv MAC_PROP_ADV_2500FDX_CAP
1984.Bd -filled -compact
1985Type:
1986.Vt uint8_t |
1987Permissions:
1988.Sy Read-Only
1989.Ed
1990.Pp
1991The
1992.Dv MAC_PROP_ADV_2500FDX_CAP
1993property describes whether or not 2.5 Gbit/s full-duplex support is
1994advertised.
1995.It Dv MAC_PROP_EN_2500FDX_CAP
1996.Bd -filled -compact
1997Type:
1998.Vt uint8_t |
1999Permissions:
2000.Sy Read/Write
2001.Ed
2002.Pp
2003The
2004.Dv MAC_PROP_EN_2500FDX_CAP
2005property describes whether or not 2.5 Gbit/s full-duplex support is
2006enabled.
2007.It Dv MAC_PROP_ADV_5000FDX_CAP
2008.Bd -filled -compact
2009Type:
2010.Vt uint8_t |
2011Permissions:
2012.Sy Read-Only
2013.Ed
2014.Pp
2015The
2016.Dv MAC_PROP_ADV_5000FDX_CAP
2017property describes whether or not 5.0 Gbit/s full-duplex support is
2018advertised.
2019.It Dv MAC_PROP_EN_5000FDX_CAP
2020.Bd -filled -compact
2021Type:
2022.Vt uint8_t |
2023Permissions:
2024.Sy Read/Write
2025.Ed
2026.Pp
2027The
2028.Dv MAC_PROP_EN_5000FDX_CAP
2029property describes whether or not 5.0 Gbit/s full-duplex support is
2030enabled.
2031.It Dv MAC_PROP_ADV_10GFDX_CAP
2032.Bd -filled -compact
2033Type:
2034.Vt uint8_t |
2035Permissions:
2036.Sy Read-Only
2037.Ed
2038.Pp
2039The
2040.Dv MAC_PROP_ADV_10GFDX_CAP
2041property describes whether or not 10 Gbit/s full-duplex support is
2042advertised.
2043.It Dv MAC_PROP_EN_10GFDX_CAP
2044.Bd -filled -compact
2045Type:
2046.Vt uint8_t |
2047Permissions:
2048.Sy Read/Write
2049.Ed
2050.Pp
2051The
2052.Dv MAC_PROP_EN_10GFDX_CAP
2053property describes whether or not 10 Gbit/s full-duplex support is
2054enabled.
2055.It Dv MAC_PROP_ADV_40GFDX_CAP
2056.Bd -filled -compact
2057Type:
2058.Vt uint8_t |
2059Permissions:
2060.Sy Read-Only
2061.Ed
2062.Pp
2063The
2064.Dv MAC_PROP_ADV_40GFDX_CAP
2065property describes whether or not 40 Gbit/s full-duplex support is
2066advertised.
2067.It Dv MAC_PROP_EN_40GFDX_CAP
2068.Bd -filled -compact
2069Type:
2070.Vt uint8_t |
2071Permissions:
2072.Sy Read/Write
2073.Ed
2074.Pp
2075The
2076.Dv MAC_PROP_EN_40GFDX_CAP
2077property describes whether or not 40 Gbit/s full-duplex support is
2078enabled.
2079.It Dv MAC_PROP_ADV_100GFDX_CAP
2080.Bd -filled -compact
2081Type:
2082.Vt uint8_t |
2083Permissions:
2084.Sy Read-Only
2085.Ed
2086.Pp
2087The
2088.Dv MAC_PROP_ADV_100GFDX_CAP
2089property describes whether or not 100 Gbit/s full-duplex support is
2090advertised.
2091.It Dv MAC_PROP_EN_100GFDX_CAP
2092.Bd -filled -compact
2093Type:
2094.Vt uint8_t |
2095Permissions:
2096.Sy Read/Write
2097.Ed
2098.Pp
2099The
2100.Dv MAC_PROP_EN_100GFDX_CAP
2101property describes whether or not 100 Gbit/s full-duplex support is
2102enabled.
2103.It Dv MAC_PROP_ADV_200GFDX_CAP
2104.Bd -filled -compact
2105Type:
2106.Vt uint8_t |
2107Permissions:
2108.Sy Read-Only
2109.Ed
2110.Pp
2111The
2112.Dv MAC_PROP_ADV_200GFDX_CAP
2113property describes whether or not 200 Gbit/s full-duplex support is
2114advertised.
2115.It Dv MAC_PROP_EN_200GFDX_CAP
2116.Bd -filled -compact
2117Type:
2118.Vt uint8_t |
2119Permissions:
2120.Sy Read/Write
2121.Ed
2122.Pp
2123The
2124.Dv MAC_PROP_EN_200GFDX_CAP
2125property describes whether or not 200 Gbit/s full-duplex support is
2126enabled.
2127.It Dv MAC_PROP_ADV_400GFDX_CAP
2128.Bd -filled -compact
2129Type:
2130.Vt uint8_t |
2131Permissions:
2132.Sy Read-Only
2133.Ed
2134.Pp
2135The
2136.Dv MAC_PROP_ADV_400GFDX_CAP
2137property describes whether or not 400 Gbit/s full-duplex support is
2138advertised.
2139.It Dv MAC_PROP_EN_400GFDX_CAP
2140.Bd -filled -compact
2141Type:
2142.Vt uint8_t |
2143Permissions:
2144.Sy Read/Write
2145.Ed
2146.Pp
2147The
2148.Dv MAC_PROP_EN_400GFDX_CAP
2149property describes whether or not 400 Gbit/s full-duplex support is
2150enabled.
2151.El
2152.Ss Private Properties
2153In addition to the defined properties above, drivers are allowed to
2154define private properties.
2155These private properties are device-specific properties.
2156All private properties share the same constant,
2157.Dv MAC_PROP_PRIVATE .
2158Properties are distinguished by a name, which is a character string.
2159The list of such private properties is defined when registering with mac in the
2160.Fa m_priv_props
2161member of the
2162.Xr mac_register 9S
2163structure.
2164.Pp
2165The driver may define whatever semantics it wants for these private
2166properties.
2167They will not be listed when running
2168.Xr dladm 8 ,
2169unless explicitly requested by name.
2170All such properties should start with a leading underscore character and then
2171consist of alphanumeric ASCII characters and additional underscores or hyphens.
2172.Pp
2173Properties of type
2174.Dv MAC_PROP_PRIVATE
2175may show up in all three property related entry points:
2176.Xr mc_propinfo 9E ,
2177.Xr mc_getprop 9E ,
2178and
2179.Xr mc_setprop 9E .
2180Device drivers should tell the different properties apart by using the
2181.Xr strcmp 9F
function to compare the property name to the set of names that it knows about.
2183When encountering properties that it doesn't know, it should treat them
2184like all other unknown properties.
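.Pp
The name-based dispatch described above can be sketched as follows.
This is an illustrative, self-contained example rather than driver code: the
standard C
.Fn strcmp
stands in for
.Xr strcmp 9F ,
the structure and the two property names are invented for the example, and
returning
.Er ENOTSUP
for an unknown name is an assumption about how the driver treats unsupported
properties.
.Bd -literal -offset indent
```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-device state holding two made-up private tunables. */
typedef struct example_dev {
	uint32_t ed_rx_throttle;
	uint32_t ed_tx_copy_thresh;
} example_dev_t;

/*
 * Look up a private property by name and copy out its current value.
 * Returns 0 on success, or ENOTSUP for a name this driver does not know.
 */
static int
example_get_priv_prop(const example_dev_t *dev, const char *name,
    uint32_t *valp)
{
	if (strcmp(name, "_rx_intr_throttle") == 0) {
		*valp = dev->ed_rx_throttle;
		return (0);
	}
	if (strcmp(name, "_tx_copy_thresh") == 0) {
		*valp = dev->ed_tx_copy_thresh;
		return (0);
	}
	return (ENOTSUP);
}
```
.Ed
.Pp
Both example names follow the naming rule given earlier: a leading underscore
followed by alphanumeric characters and underscores.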
2185.Sh STATISTICS
The MAC framework defines a couple of different sets of statistics which
are based on various standards for devices to implement.
2188Statistics are retrieved through the
2189.Xr mc_getstat 9E
2190entry point.
There are statistics that are required for all devices, as well as a
separate set of Ethernet-specific statistics.
2193Not all devices will support every statistic.
2194In many cases, several device registers will need to be combined to create the
2195proper stat.
2196.Pp
2197In general, if the device is not keeping track of these statistics, then
2198it is recommended that the driver store these values as a
2199.Vt uint64_t
2200to ensure that overflow does not occur.
2201.Pp
2202If a device does not support a specific statistic, then it is fine to
2203return that it is not supported.
The same should be done for unrecognized statistics.
2205See
2206.Xr mc_getstat 9E
2207for more information on the proper way to handle these.
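.Pp
As one illustration of keeping a statistic in a
.Vt uint64_t ,
the sketch below folds readings of a 32-bit hardware counter into a 64-bit
running total.
It assumes a free-running register that wraps around and that is read often
enough to wrap at most once between reads; devices with clear-on-read counters
would simply add each reading instead.
The structure and function names are hypothetical.
.Bd -literal -offset indent
```c
#include <stdint.h>

/* Hypothetical software state for one statistic backed by a 32-bit register. */
typedef struct example_stat {
	uint64_t es_total;	/* running 64-bit total reported upward */
	uint32_t es_last;	/* last raw value read from the register */
} example_stat_t;

/*
 * Fold a fresh reading of a free-running 32-bit counter into the 64-bit
 * total. Unsigned subtraction yields the correct delta across a single wrap.
 */
static uint64_t
example_stat_update(example_stat_t *st, uint32_t raw)
{
	uint32_t delta = raw - st->es_last;

	st->es_last = raw;
	st->es_total += delta;
	return (st->es_total);
}
```
.Ed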
2208.Ss General Device Statistics
2209The following statistics are based on MIB-II statistics from both RFC
22101213 and RFC 1573.
2211.Bl -tag -width Ds
2212.It Dv MAC_STAT_IFSPEED
2213The device's current speed in bits per second.
2214.It Dv MAC_STAT_MULTIRCV
2215The total number of received multicast packets.
2216.It Dv MAC_STAT_BRDCSTRCV
2217The total number of received broadcast packets.
2218.It Dv MAC_STAT_MULTIXMT
2219The total number of transmitted multicast packets.
2220.It Dv MAC_STAT_BRDCSTXMT
The total number of transmitted broadcast packets.
2222.It Dv MAC_STAT_NORCVBUF
2223The total number of packets discarded by the hardware due to a lack of
2224receive buffers.
2225.It Dv MAC_STAT_IERRORS
2226The total number of errors detected on input.
2227.It Dv MAC_STAT_UNKNOWNS
2228The total number of received packets that were discarded because they
2229were of an unknown protocol.
2230.It Dv MAC_STAT_NOXMTBUF
2231The total number of outgoing packets dropped due to a lack of transmit
2232buffers.
2233.It Dv MAC_STAT_OERRORS
2234The total number of outgoing packets that resulted in errors.
2235.It Dv MAC_STAT_COLLISIONS
2236Total number of collisions encountered by the transmitter.
2237.It Dv MAC_STAT_RBYTES
2238The total number of bytes received by the device, regardless of packet
2239type.
2240.It Dv MAC_STAT_IPACKETS
2241The total number of packets received by the device, regardless of packet type.
2242.It Dv MAC_STAT_OBYTES
2243The total number of bytes transmitted by the device, regardless of packet type.
2244.It Dv MAC_STAT_OPACKETS
2245The total number of packets sent by the device, regardless of packet type.
2246.It Dv MAC_STAT_UNDERFLOWS
2247The total number of packets that were smaller than the minimum sized
2248packet for the device and were therefore dropped.
2249.It Dv MAC_STAT_OVERFLOWS
2250The total number of packets that were larger than the maximum sized
2251packet for the device and were therefore dropped.
2252.El
2253.Ss Ethernet Specific Statistics
2254The following statistics are specific to Ethernet devices.
2255They refer to values from RFC 1643 and include various MII/GMII specific stats.
2256Many of these are also defined in IEEE 802.3.
2257.Bl -tag -width Ds
2258.It Dv ETHER_STAT_ADV_CAP_1000FDX
2259Indicates that the device is advertising support for 1 Gbit/s
2260full-duplex operation.
2261.It Dv ETHER_STAT_ADV_CAP_1000HDX
2262Indicates that the device is advertising support for 1 Gbit/s
2263half-duplex operation.
2264.It Dv ETHER_STAT_ADV_CAP_100FDX
2265Indicates that the device is advertising support for 100 Mbit/s
2266full-duplex operation.
2267.It Dv ETHER_STAT_ADV_CAP_100GFDX
2268Indicates that the device is advertising support for 100 Gbit/s
2269full-duplex operation.
2270.It Dv ETHER_STAT_ADV_CAP_100HDX
2271Indicates that the device is advertising support for 100 Mbit/s
2272half-duplex operation.
2273.It Dv ETHER_STAT_ADV_CAP_100T4
2274Indicates that the device is advertising support for 100 Mbit/s
2275100BASE-T4 operation.
2276.It Dv ETHER_STAT_ADV_CAP_10FDX
2277Indicates that the device is advertising support for 10 Mbit/s
2278full-duplex operation.
2279.It Dv ETHER_STAT_ADV_CAP_10GFDX
2280Indicates that the device is advertising support for 10 Gbit/s
2281full-duplex operation.
2282.It Dv ETHER_STAT_ADV_CAP_10HDX
2283Indicates that the device is advertising support for 10 Mbit/s
2284half-duplex operation.
2285.It Dv ETHER_STAT_ADV_CAP_2500FDX
2286Indicates that the device is advertising support for 2.5 Gbit/s
2287full-duplex operation.
2288.It Dv ETHER_STAT_ADV_CAP_40GFDX
2289Indicates that the device is advertising support for 40 Gbit/s
2290full-duplex operation.
2291.It Dv ETHER_STAT_ADV_CAP_5000FDX
2292Indicates that the device is advertising support for 5.0 Gbit/s
2293full-duplex operation.
2294.It Dv ETHER_STAT_ADV_CAP_ASMPAUSE
2295Indicates that the device is advertising support for receiving pause
2296frames.
2297.It Dv ETHER_STAT_ADV_CAP_AUTONEG
2298Indicates that the device is advertising support for auto-negotiation.
2299.It Dv ETHER_STAT_ADV_CAP_PAUSE
2300Indicates that the device is advertising support for generating pause
2301frames.
2302.It Dv ETHER_STAT_ADV_REMFAULT
2303Indicates that the device is advertising support for detecting faults in
2304the remote link peer.
2305.It Dv ETHER_STAT_ALIGN_ERRORS
2306Indicates the number of times an alignment error was generated by the
2307Ethernet device.
2308This is a count of packets that were not an integral number of octets and failed
2309the FCS check.
2310.It Dv ETHER_STAT_CAP_1000FDX
2311Indicates the device supports 1 Gbit/s full-duplex operation.
2312.It Dv ETHER_STAT_CAP_1000HDX
2313Indicates the device supports 1 Gbit/s half-duplex operation.
2314.It Dv ETHER_STAT_CAP_100FDX
2315Indicates the device supports 100 Mbit/s full-duplex operation.
2316.It Dv ETHER_STAT_CAP_100GFDX
2317Indicates the device supports 100 Gbit/s full-duplex operation.
2318.It Dv ETHER_STAT_CAP_100HDX
2319Indicates the device supports 100 Mbit/s half-duplex operation.
2320.It Dv ETHER_STAT_CAP_100T4
2321Indicates the device supports 100 Mbit/s 100BASE-T4 operation.
2322.It Dv ETHER_STAT_CAP_10FDX
2323Indicates the device supports 10 Mbit/s full-duplex operation.
2324.It Dv ETHER_STAT_CAP_10GFDX
2325Indicates the device supports 10 Gbit/s full-duplex operation.
2326.It Dv ETHER_STAT_CAP_10HDX
2327Indicates the device supports 10 Mbit/s half-duplex operation.
2328.It Dv ETHER_STAT_CAP_2500FDX
2329Indicates the device supports 2.5 Gbit/s full-duplex operation.
2330.It Dv ETHER_STAT_CAP_40GFDX
2331Indicates the device supports 40 Gbit/s full-duplex operation.
2332.It Dv ETHER_STAT_CAP_5000FDX
2333Indicates the device supports 5.0 Gbit/s full-duplex operation.
2334.It Dv ETHER_STAT_CAP_ASMPAUSE
2335Indicates that the device supports the ability to receive pause frames.
2336.It Dv ETHER_STAT_CAP_AUTONEG
2337Indicates that the device supports the ability to perform link
2338auto-negotiation.
2339.It Dv ETHER_STAT_CAP_PAUSE
2340Indicates that the device supports the ability to transmit pause frames.
2341.It Dv ETHER_STAT_CAP_REMFAULT
2342Indicates that the device supports the ability of detecting a remote
2343fault in a link peer.
2344.It Dv ETHER_STAT_CARRIER_ERRORS
2345Indicates the number of times that the Ethernet carrier sense condition
2346was lost or not asserted.
2347.It Dv ETHER_STAT_DEFER_XMTS
2348Indicates the number of frames for which the device was unable to
2349transmit the frame due to being busy and had to try again.
2350.It Dv ETHER_STAT_EX_COLLISIONS
2351Indicates the number of frames that failed to send due to an excessive
2352number of collisions.
2353.It Dv ETHER_STAT_FCS_ERRORS
2354Indicates the number of times that a frame check sequence failed.
2355.It Dv ETHER_STAT_FIRST_COLLISIONS
2356Indicates the number of times that a frame was eventually transmitted
2357successfully, but only after a single collision.
2358.It Dv ETHER_STAT_JABBER_ERRORS
2359Indicates the number of frames that were received that were both larger
2360than the maximum packet size and failed the frame check sequence.
2361.It Dv ETHER_STAT_LINK_ASMPAUSE
2362Indicates whether the link is currently configured to accept pause
2363frames.
2364.It Dv ETHER_STAT_LINK_AUTONEG
2365Indicates whether the current link state is a result of
2366auto-negotiation.
2367.It Dv ETHER_STAT_LINK_DUPLEX
2368Indicates the current duplex state of the link.
2369The values used here should be the same as documented for
2370.Dv MAC_PROP_DUPLEX .
2371.It Dv ETHER_STAT_LINK_PAUSE
2372Indicates whether the link is currently configured to generate pause
2373frames.
2374.It Dv ETHER_STAT_LP_CAP_1000FDX
2375Indicates the remote device supports 1 Gbit/s full-duplex operation.
2376.It Dv ETHER_STAT_LP_CAP_1000HDX
2377Indicates the remote device supports 1 Gbit/s half-duplex operation.
2378.It Dv ETHER_STAT_LP_CAP_100FDX
2379Indicates the remote device supports 100 Mbit/s full-duplex operation.
2380.It Dv ETHER_STAT_LP_CAP_100GFDX
2381Indicates the remote device supports 100 Gbit/s full-duplex operation.
2382.It Dv ETHER_STAT_LP_CAP_100HDX
2383Indicates the remote device supports 100 Mbit/s half-duplex operation.
2384.It Dv ETHER_STAT_LP_CAP_100T4
2385Indicates the remote device supports 100 Mbit/s 100BASE-T4 operation.
2386.It Dv ETHER_STAT_LP_CAP_10FDX
2387Indicates the remote device supports 10 Mbit/s full-duplex operation.
2388.It Dv ETHER_STAT_LP_CAP_10GFDX
2389Indicates the remote device supports 10 Gbit/s full-duplex operation.
2390.It Dv ETHER_STAT_LP_CAP_10HDX
2391Indicates the remote device supports 10 Mbit/s half-duplex operation.
2392.It Dv ETHER_STAT_LP_CAP_2500FDX
2393Indicates the remote device supports 2.5 Gbit/s full-duplex operation.
2394.It Dv ETHER_STAT_LP_CAP_40GFDX
2395Indicates the remote device supports 40 Gbit/s full-duplex operation.
2396.It Dv ETHER_STAT_LP_CAP_5000FDX
2397Indicates the remote device supports 5.0 Gbit/s full-duplex operation.
2398.It Dv ETHER_STAT_LP_CAP_ASMPAUSE
2399Indicates that the remote device supports the ability to receive pause
2400frames.
2401.It Dv ETHER_STAT_LP_CAP_AUTONEG
2402Indicates that the remote device supports the ability to perform link
2403auto-negotiation.
2404.It Dv ETHER_STAT_LP_CAP_PAUSE
2405Indicates that the remote device supports the ability to transmit pause
2406frames.
2407.It Dv ETHER_STAT_LP_CAP_REMFAULT
2408Indicates that the remote device supports the ability of detecting a
2409remote fault in a link peer.
2410.It Dv ETHER_STAT_MACRCV_ERRORS
2411Indicates the number of times that the internal MAC layer encountered an
2412error when attempting to receive and process a frame.
2413.It Dv ETHER_STAT_MACXMT_ERRORS
2414Indicates the number of times that the internal MAC layer encountered an
2415error when attempting to process and transmit a frame.
2416.It Dv ETHER_STAT_MULTI_COLLISIONS
2417Indicates the number of times that a frame was eventually transmitted
2418successfully, but only after more than one collision.
2419.It Dv ETHER_STAT_SQE_ERRORS
2420Indicates the number of times that an SQE error occurred.
2421The specific conditions for this error are documented in IEEE 802.3.
2422.It Dv ETHER_STAT_TOOLONG_ERRORS
2423Indicates the number of frames that were received that were longer than
2424the maximum frame size supported by the device.
2425.It Dv ETHER_STAT_TOOSHORT_ERRORS
2426Indicates the number of frames that were received that were shorter than
2427the minimum frame size supported by the device.
2428.It Dv ETHER_STAT_TX_LATE_COLLISIONS
2429Indicates the number of times a collision was detected late on the
2430device.
2431.It Dv ETHER_STAT_XCVR_ADDR
Indicates the address of the MII/GMII transceiver.
.It Dv ETHER_STAT_XCVR_ID
Indicates the ID of the MII/GMII transceiver.
2435.It Dv ETHER_STAT_XCVR_INUSE
2436Indicates what kind of transceiver is in use.
2437Use the
2438.Vt mac_ether_media_t
2439enumeration values described in the discussion of
2440.Dv MAC_PROP_MEDIA
2441above.
2442These definitions are compatible with the older subset of
2443XCVR_* macros.
2444.El
2445.Ss Device Specific kstats
2446In addition to the defined statistics above, if the device driver
2447maintains additional statistics or the device provides additional
2448statistics, it should create its own kstats through the
2449.Xr kstat_create 9F
2450function to allow operators to observe them.
2451.Sh RECEIVE DESCRIPTOR LAYOUT
2452One of the important things that a device driver must do is lay out DMA
2453memory, generally in a ring of descriptors, into which received Ethernet
2454frames will be placed.
2455When performing this, there are a few things that drivers should
2456generally do:
2457.Bl -enum -offset indent
2458.It
2459Drivers should lay out memory so that the IP header will be 4-byte
2460aligned.
2461The IP stack expects that the beginning of an IP header will be at a
24624-byte aligned address; however, a DMA allocation will be at a 4-
2463or 8-byte aligned address by default.
2464The IP header is at a 14 byte offset from the beginning of the Ethernet
2465frame, leaving the IP header at a 2-byte alignment if the Ethernet frame
2466starts at the beginning of the DMA buffer.
2467If VLAN tagging is in place, then each VLAN tag adds 4 bytes, which
2468doesn't change the alignment the IP header is found at.
2469.Pp
2470As a solution to this, the driver should program the device to start
2471placing the received Ethernet frame at two bytes off of the start of the
2472DMA buffer.
This ensures that the IP header will be 4-byte aligned whether or not VLAN
tags are present.
2475.It
Drivers should try to allocate the DMA memory used for receiving frames
as a contiguous buffer.
2478If for some reason that would not be possible, the driver should try to
2479ensure that there is enough space for all of the initial Ethernet and
2480any possible layer three and layer four headers
2481.Pq such as IP, TCP, or UDP
2482in the initial descriptor.
2483.It
2484As discussed in the
2485.Sx MBLKS AND DMA
2486section, there are multiple strategies for managing the relationship
2487between DMA data, receive descriptors, and the operating system
2488representation of a packet in the
2489.Xr mblk 9S
2490structure.
2491Drivers must limit their resource consumption.
2492See the
2493.Sy Considerations
2494section of
2495.Sx MBLKS AND DMA
2496for more on this.
2497.El
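.Pp
The alignment arithmetic from the first item above can be checked with a small
calculation.
The sketch below is illustrative only: the names are hypothetical, and it
assumes that the DMA buffer itself starts at an address that is at least 4-byte
aligned, so that an offset that is a multiple of four is also a 4-byte aligned
address.
.Bd -literal -offset indent
```c
#include <stddef.h>

#define	EXAMPLE_ETHER_HDR	14	/* untagged Ethernet header length */
#define	EXAMPLE_VLAN_TAG	4	/* length of each VLAN tag */
#define	EXAMPLE_RX_OFFSET	2	/* frame start offset in the DMA buffer */

/*
 * Return the offset of the IP header from the start of a DMA buffer when
 * the frame begins at the given start offset and carries nvlan VLAN tags.
 */
static size_t
example_ip_offset(size_t start, unsigned int nvlan)
{
	return (start + EXAMPLE_ETHER_HDR + (size_t)nvlan * EXAMPLE_VLAN_TAG);
}
```
.Ed
.Pp
With a start offset of zero, the IP header of an untagged frame lands at offset
14, which is only 2-byte aligned.
With the two byte start offset, it lands at offset 16, or at offset 20 with one
VLAN tag, both of which are multiples of four.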
2498.Sh TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
2499Device drivers are the first line of defense for dealing with broken
2500devices and bugs in their firmware.
While most devices will rarely fail, it is important to pay particular
attention to RAS (Reliability, Availability, and Serviceability) when designing
and implementing the device driver.
2504While everything described in this section is optional, it is highly recommended
2505that all new device drivers follow these guidelines.
2506.Pp
2507The Fault Management Architecture (FMA) provides facilities for
2508detecting and reporting various classes of defects and faults.
2509Specifically for networking device drivers, issues that should be
2510detected and reported include:
2511.Bl -bullet -offset indent
2512.It
2513Device internal uncorrectable errors
2514.It
2515Device internal correctable errors
2516.It
2517PCI and PCI Express transport errors
2518.It
2519Device temperature alarms
2520.It
2521Device transmission stalls
2522.It
2523Device communication timeouts
2524.It
2525High invalid interrupts
2526.El
2527.Pp
2528All such errors fall into three primary categories:
2529.Bl -enum -offset indent
2530.It
2531Errors detected by the Fault Management Architecture
2532.It
2533Errors detected by the device and indicated to the device driver
2534.It
2535Errors detected by the device driver
2536.El
.Ss Fault Management Setup and Teardown
Drivers should initialize support for the fault management framework by
calling
.Xr ddi_fm_init 9F
from their
.Xr attach 9E
routine.
By registering with the fault management framework, a device driver is given the
chance to detect transport errors as well as report other errors that
exist.
While a device driver does not need to declare every capability described in
.Xr ddi_fm_init 9F ,
we suggest that device drivers at least register the
.Dv DDI_FM_EREPORT_CAPABLE
capability so that the driver can report issues that it detects.
.Pp
If the driver registers with the fault management framework during its
.Xr attach 9E
entry point, it must call
.Xr ddi_fm_fini 9F
during its
.Xr detach 9E
entry point.
.Ss Transport Errors
Many modern networking devices leverage PCI or PCI Express.
As such, there are two primary ways that device drivers access data: they either
memory map device registers and use routines like
.Xr ddi_get8 9F
and
.Xr ddi_put8 9F
or they use direct memory access (DMA).
New device drivers should always enable checking of the transport layer by
marking their support in the
.Xr ddi_device_acc_attr 9S
structure and using routines like
.Xr ddi_fm_acc_err_get 9F
and
.Xr ddi_fm_dma_err_get 9F
to detect if errors have occurred.
.Ss Device Indicated Errors
Many devices have capabilities to announce to a device driver that a
correctable or uncorrectable error has occurred.
Other devices have the ability to indicate that various physical issues have
occurred such as a fan failing or a temperature sensor having fired.
.Pp
Drivers should wire themselves to receive notifications when these
events occur.
The means and capabilities will vary from device to device.
For example, some devices will generate information about these notifications
through special interrupts.
Other devices may have a register that software can poll.
In the cases where polling is required, driver writers should try not to poll
too frequently and should generally only poll when the device is actively being
used, e.g. between calls to the
.Xr mc_start 9E
and
.Xr mc_stop 9E
entry points.
.Ss Driver Transmit Stall Detection
One of the primary responsibilities of a hardened device driver is to
perform transmit stall detection.
The core idea behind tx stall detection is that the driver should record when
it last observed activity indicating that data was successfully transmitted.
Most devices should be transmitting data on a regular basis as long as the link
is up.
If a device is not, then this may indicate that the device is stuck and needs
to be reset.
At this time, the MAC framework does not provide any resources for performing
these checks; however, polling on each individual transmit ring for the last
completion time while something is actively being transmitted through the use of
routines such as
.Xr timeout 9F
may be a reasonable starting point.
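.Pp
The per-ring check described above might be sketched as follows.
This is only an illustration of the bookkeeping: the
.Vt ex_tx_ring_t
structure and every function name are hypothetical, times are plain
nanosecond counters supplied by the caller, and a real driver would also
need locking and would run the check from something like a
.Xr timeout 9F
callback.
.Bd -literal -offset indent
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring bookkeeping; not part of the MAC framework. */
typedef struct ex_tx_ring {
	uint32_t	etr_pending;	/* descriptors owned by hardware */
	int64_t		etr_last_done;	/* time of last completion, ns */
} ex_tx_ring_t;

/* Called when n descriptors are handed to the hardware. */
static void
ex_tx_submit(ex_tx_ring_t *ring, uint32_t n, int64_t now)
{
	/* An idle ring starts its stall clock when work is queued. */
	if (ring->etr_pending == 0)
		ring->etr_last_done = now;
	ring->etr_pending += n;
}

/* Called from the transmit completion path. */
static void
ex_tx_complete(ex_tx_ring_t *ring, uint32_t ndone, int64_t now)
{
	ring->etr_pending -= ndone;
	ring->etr_last_done = now;
}

/*
 * Periodic check: the ring is considered stalled only if work is
 * outstanding and nothing has completed for longer than limit.
 */
static bool
ex_tx_stalled(const ex_tx_ring_t *ring, int64_t now, int64_t limit)
{
	return (ring->etr_pending != 0 &&
	    now - ring->etr_last_done > limit);
}
.Ed
.Pp
Checking for outstanding work first keeps an idle ring from being
reported as stalled simply because nothing has been transmitted
recently.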
.Ss Driver Command Timeout Detection
Each device is programmed in different ways.
Some devices are programmed through asynchronous commands while others are
programmed by writing directly to memory mapped registers.
If a device receives asynchronous replies to commands, then the device driver
should set reasonable timeouts for all such commands and plan on detecting when
they expire.
If a timeout occurs, the driver should presume that there is an issue with the
hardware and proceed to abort the command or reset the device.
.Pp
Many devices do not have such a communication mechanism.
However, whenever there is some activity where the device driver must wait, then
it should be prepared for the fact that the device may never get back to
it and react appropriately by performing some kind of device reset.
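.Pp
One way to make sure that a wait on the device is always bounded is
sketched below.
The callback type, the function names, and the idea of counting polls
rather than measuring time are all assumptions made for illustration;
.Fn ex_fake_done
is a stand-in for reading a device status register.
.Bd -literal -offset indent
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback that reports whether a command finished. */
typedef bool (*ex_cmd_done_f)(void *arg);

/*
 * Poll for completion at most max_polls times.  Returns true if the
 * command completed.  A false return means the caller should presume
 * a hardware problem and abort the command or reset the device.  A
 * real driver would pause between polls rather than spin.
 */
static bool
ex_cmd_wait(ex_cmd_done_f done, void *arg, uint32_t max_polls)
{
	for (uint32_t i = 0; i < max_polls; i++) {
		if (done(arg))
			return (true);
	}
	return (false);
}

/* Stand-in device: reports done once the counter reaches zero. */
static bool
ex_fake_done(void *arg)
{
	uint32_t *left = arg;

	if (*left == 0)
		return (true);
	(*left)--;
	return (false);
}
.Ed
.Pp
The important property is that
.Fn ex_cmd_wait
always returns, so a device that never responds leads to the error
handling path instead of a hung thread.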
.Ss Reacting to Errors
When any of the above categories of errors has been triggered, the
behavior that the device driver should take depends on the kind of
error.
If a fatal error occurs, for example a transport error, a transmit stall,
or an uncorrectable error indicated by the device, then it is
important that the driver take the following steps:
.Bl -enum -offset indent
.It
Set a flag in the device driver's state that indicates that it has hit
an error condition.
When this error condition flag is asserted, transmitted packets should be
accepted and dropped and actions that would require writing to the device state
should fail with an error.
This flag should remain until the device has been successfully restarted.
.It
If the error was not already indicated by the fault management
architecture, as a detected transport error would be, then
the device driver should post an
.Sy ereport
indicating what has occurred with the
.Xr ddi_fm_ereport_post 9F
function.
.It
The device driver should indicate that the device's service was lost
with a call to
.Xr ddi_fm_service_impact 9F
using the symbol
.Dv DDI_SERVICE_LOST .
.It
At this point the device driver should issue a device reset through some
device-specific means.
.It
When the device reset has been completed, then the device driver should
restore all of the programmed state to the device.
This includes things like the current MTU, advertised auto-negotiation speeds,
MAC address filters, and more.
.It
Finally, when service has been restored, the device driver should call
.Xr ddi_fm_service_impact 9F
using the symbol
.Dv DDI_SERVICE_RESTORED .
.El
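.Pp
The ordering of the steps above can be modeled in a few lines.
This sketch is not driver code: the
.Vt ex_softc_t
structure and its members are hypothetical, the
.Sy es_hw_
fields stand in for device registers, and the points where a driver
would call
.Xr ddi_fm_ereport_post 9F
and
.Xr ddi_fm_service_impact 9F
are marked only with comments.
.Bd -literal -offset indent
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical soft state; saved fields mirror what was programmed. */
typedef struct ex_softc {
	bool		es_error;	/* set while in the error state */
	uint32_t	es_mtu;		/* last MTU that was requested */
	bool		es_promisc;	/* last promiscuous setting */
	uint32_t	es_hw_mtu;	/* stand-in for device registers */
	bool		es_hw_promisc;
	uint64_t	es_tx_dropped;	/* frames dropped while in error */
} ex_softc_t;

/* Returns true if the frame may go to hardware, false if dropped. */
static bool
ex_tx(ex_softc_t *sc)
{
	if (sc->es_error) {
		/* A real transmit path would also free the message. */
		sc->es_tx_dropped++;
		return (false);
	}
	return (true);
}

/* Steps 1 through 3: flag the error, then report it. */
static void
ex_fatal_begin(ex_softc_t *sc)
{
	sc->es_error = true;
	/* Post an ereport here if FMA did not already indicate it. */
	/* Report DDI_SERVICE_LOST here. */
}

/* Steps 4 through 6: reset, restore saved state, resume service. */
static void
ex_reset_and_restore(ex_softc_t *sc)
{
	/* The reset is modeled as wiping the device registers. */
	sc->es_hw_mtu = 0;
	sc->es_hw_promisc = false;

	sc->es_hw_mtu = sc->es_mtu;
	sc->es_hw_promisc = sc->es_promisc;

	/* Report DDI_SERVICE_RESTORED here. */
	sc->es_error = false;
}
.Ed
.Pp
Keeping the requested settings in the soft state, separate from what is
currently in the device, is what makes the restore step possible.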
.Pp
When a non-fatal error occurs, then the device driver should submit an
ereport and should optionally mark the device degraded using
.Xr ddi_fm_service_impact 9F
with the
.Dv DDI_SERVICE_DEGRADED
value depending on the nature of the problem that has occurred.
.Pp
Device drivers should never make the decision to remove a device from
service based on errors that have occurred nor should they panic the
system.
Rather, the device driver should always try to notify the operating system with
various ereports and allow its policy decisions to occur.
The decision to retire a device lies in the hands of the fault management
architecture.
It knows more about the operator's intent and the surrounding system's state
than the device driver itself does and it will make the call to offline and
retire the device if it is required.
.Ss Device Resets
When resetting a device, a device driver must exercise caution.
If a device driver has not been written to plan for a device reset, then it
may not correctly restore the device's state after such a reset.
Such state should be stored in the instance's private state data as the MAC
framework does not know about device resets and will not inform the
device again about the expected, programmed state.
.Pp
One wrinkle with device resets is that many networking cards show up as
multiple PCI functions on a single device, for example, each port may
show up as a separate function and thus have a separate instance of the
device driver attached.
When resetting a function, device driver writers should carefully read the
device programming manuals and verify whether or not a reset impacts only the
stalled function or if it impacts all functions across the device.
.Pp
If the only way to reset a given function is to reset the entire device, then
this may require more coordination and work on the part of the device
driver to ensure that all the other instances are correctly restored.
In cases where this occurs, some devices offer ways of injecting
interrupts onto those other functions to notify them that this is
occurring.
.Sh MBLKS AND DMA
The networking stack manages framed data through the use of the
.Xr mblk 9S
structure.
The mblk allows for a single message to be made up of individual blocks.
Each part is linked together through its
.Fa b_cont
member.
However, it also allows for multiple messages to be chained together through the
use of the
.Fa b_next
member.
While the networking stack works with these structures, device drivers generally
work with DMA regions.
There are two different strategies that device drivers use for handling these
two different cases: copying and binding.
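.Pp
The two kinds of linkage can be illustrated with a stand-in structure.
The
.Vt ex_mblk_t
type below is not the real
.Xr mblk 9S ;
it only borrows four member names so that the example can stand alone,
and both functions are hypothetical helpers.
.Bd -literal -offset indent
#include <stddef.h>

/* Stand-in that reuses mblk member names; not the real type. */
typedef struct ex_mblk {
	struct ex_mblk	*b_next;	/* next message in the chain */
	struct ex_mblk	*b_cont;	/* next block of this message */
	unsigned char	*b_rptr;	/* first valid byte */
	unsigned char	*b_wptr;	/* one past the last valid byte */
} ex_mblk_t;

/* Bytes of data in one message: the sum over its b_cont blocks. */
static size_t
ex_msg_len(const ex_mblk_t *mp)
{
	size_t len = 0;

	for (; mp != NULL; mp = mp->b_cont)
		len += (size_t)(mp->b_wptr - mp->b_rptr);
	return (len);
}

/* Number of messages in a chain linked through b_next. */
static size_t
ex_chain_count(const ex_mblk_t *mp)
{
	size_t n = 0;

	for (; mp != NULL; mp = mp->b_next)
		n++;
	return (n);
}
.Ed
.Pp
A driver that transmits a chain typically walks
.Fa b_next
to find each frame and
.Fa b_cont
to find each piece of that frame.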
.Ss Copying Data
The first way that device drivers handle interfacing between the two is
by having two separate regions of memory.
One part is memory which has been allocated for DMA through a call to
.Xr ddi_dma_mem_alloc 9F
and the other is memory associated with the memory block.
.Pp
In this case, a driver will use
.Xr bcopy 9F
to copy memory between the two distinct regions.
When transmitting a packet, it will copy the memory from the mblk_t to the DMA
region.
When receiving a packet, it will allocate a mblk_t through the
.Xr allocb 9F
routine, copy the memory across with
.Xr bcopy 9F ,
and then advance the mblk_t's
.Fa b_wptr
member by the number of bytes copied.
.Pp
If, when receiving, memory is not available for a new message block,
then the frame should be skipped and effectively dropped.
A kstat should be bumped when such an occasion occurs.
.Ss Binding Data
An alternative approach to copying data is to use DMA binding.
When using DMA binding, the OS takes care of mapping between DMA memory and
normal device memory.
The exact process is a bit different between transmit and receive.
.Pp
When transmitting, a device driver has an mblk_t and needs to call the
.Xr ddi_dma_addr_bind_handle 9F
function to bind it to an already existing DMA handle.
At that point, it will receive various DMA cookies that it can use to obtain the
addresses to program the device with for transmitting data.
Once the transmit is done, the driver must then make sure to call
.Xr freemsg 9F
to release the data.
It must not call
.Xr freemsg 9F
before it receives an interrupt from the device indicating that the data
has been transmitted, otherwise it risks sending arbitrary kernel
memory.
.Pp
When receiving data, the device can perform a similar operation.
First, it must bind the DMA memory into the kernel's virtual memory address
space through a call to the
.Xr ddi_dma_addr_bind_handle 9F
function if it has not already.
Once it has, it must then call
.Xr desballoc 9F
to try to create a new mblk_t which leverages the associated memory.
It can then pass that mblk_t up to the stack.
.Ss Considerations
When deciding which of these options to use, there are many different
considerations that must be made.
The answer as to whether to bind memory or to copy data is not always simple.
.Pp
The first thing to remember is that DMA resources may be finite on a
given platform.
Consider the case of receiving data.
A device driver that binds one of its receive descriptors may not get it back
for quite some time as it may be used by the kernel until an application
actually consumes it.
Device drivers that try to bind memory for receive often work with the
constraint that they must be able to replace that DMA memory with another DMA
descriptor.
If it were not replaced, then eventually the device would not be able to
receive additional data into the ring.
.Pp
On the other hand, particularly for larger frames, copying every packet
from one buffer to another can be a source of additional latency and
memory waste in the system.
For larger copies, the cost of copying may dwarf any potential cost of
performing DMA binding.
.Pp
Device driver authors who are unsure of what to do should
first employ the copying method to simplify the act of writing the
device driver.
The copying method is simpler and also allows the device driver author not to
worry about allocated DMA memory that is still outstanding when it is asked to
unload.
.Pp
If device driver writers are worried about the cost, it is recommended
to control the decision of whether to copy or bind DMA data
with separate private properties for transmitting and receiving.
Each private property should indicate the frame size at which
to switch from one method to the other.
This way, data can be gathered to determine what the impact of each method is on
a given platform.
2811a given platform.
2812.Sh SEE ALSO
2813.Xr dlpi 4P ,
2814.Xr driver.conf 5 ,
2815.Xr ieee802.3 7 ,
2816.Xr dladm 8 ,
2817.Xr _fini 9E ,
2818.Xr _info 9E ,
2819.Xr _init 9E ,
2820.Xr attach 9E ,
2821.Xr close 9E ,
2822.Xr detach 9E ,
2823.Xr mac_capab_led 9E ,
2824.Xr mac_capab_rings 9E ,
2825.Xr mac_capab_transceiver 9E ,
2826.Xr mc_close 9E ,
2827.Xr mc_getcapab 9E ,
2828.Xr mc_getprop 9E ,
2829.Xr mc_getstat 9E ,
2830.Xr mc_multicst 9E  ,
2831.Xr mc_open 9E ,
2832.Xr mc_propinfo 9E  ,
2833.Xr mc_setpromisc 9E  ,
2834.Xr mc_setprop 9E ,
2835.Xr mc_start 9E ,
2836.Xr mc_stop 9E ,
2837.Xr mc_tx 9E ,
2838.Xr mc_unicst 9E  ,
2839.Xr open 9E ,
2840.Xr allocb 9F ,
2841.Xr bcopy 9F ,
2842.Xr ddi_dma_addr_bind_handle 9F ,
2843.Xr ddi_dma_mem_alloc 9F ,
2844.Xr ddi_fm_acc_err_get 9F ,
2845.Xr ddi_fm_dma_err_get 9F ,
2846.Xr ddi_fm_ereport_post 9F ,
2847.Xr ddi_fm_fini 9F ,
2848.Xr ddi_fm_init 9F ,
2849.Xr ddi_fm_service_impact 9F ,
2850.Xr ddi_get8 9F ,
2851.Xr ddi_put8 9F ,
2852.Xr desballoc 9F ,
2853.Xr freemsg 9F ,
2854.Xr kstat_create 9F ,
2855.Xr mac_alloc 9F ,
2856.Xr mac_devt_to_instance 9F ,
2857.Xr mac_fini_ops 9F ,
2858.Xr mac_free 9F ,
2859.Xr mac_getinfo 9F ,
2860.Xr mac_hcksum_get 9F ,
2861.Xr mac_hcksum_set 9F ,
2862.Xr mac_init_ops 9F ,
2863.Xr mac_link_update 9F ,
2864.Xr mac_lso_get 9F ,
2865.Xr mac_maxsdu_update 9F ,
2866.Xr mac_private_minor 9F ,
2867.Xr mac_prop_info_set_default_link_flowctrl 9F ,
2868.Xr mac_prop_info_set_default_str 9F ,
2869.Xr mac_prop_info_set_default_uint32 9F ,
2870.Xr mac_prop_info_set_default_uint64 9F ,
2871.Xr mac_prop_info_set_default_uint8 9F ,
2872.Xr mac_prop_info_set_perm 9F ,
2873.Xr mac_prop_info_set_range_uint32 9F ,
2874.Xr mac_register 9F ,
2875.Xr mac_rx 9F ,
2876.Xr mac_unregister 9F ,
2877.Xr mod_install 9F ,
2878.Xr mod_remove 9F ,
2879.Xr strcmp 9F ,
2880.Xr timeout 9F ,
2881.Xr cb_ops 9S ,
2882.Xr ddi_device_acc_attr 9S ,
2883.Xr dev_ops 9S ,
2884.Xr mac_callbacks 9S ,
2885.Xr mac_register 9S ,
2886.Xr mblk 9S ,
2887.Xr modldrv 9S ,
2888.Xr modlinkage 9S
.Rs
.%A McCloghrie, K.
.%A Rose, M.
.%T RFC 1213 Management Information Base for Network Management of
.%T TCP/IP-based internets: MIB-II
.%D March 1991
.Re
.Rs
.%A McCloghrie, K.
.%A Kastenholz, F.
.%T RFC 1573 Evolution of the Interfaces Group of MIB-II
.%D January 1994
.Re
.Rs
.%A Kastenholz, F.
.%T RFC 1643 Definitions of Managed Objects for the Ethernet-like
.%T Interface Types
.Re
.Rs
.%A IEEE Computer Standard
.%T IEEE 802.3
.%T IEEE Standard for Ethernet
.%D 2022
.Re