1.\"
2.\" This file and its contents are supplied under the terms of the
3.\" Common Development and Distribution License ("CDDL"), version 1.0.
4.\" You may only use this file in accordance with the terms of version
5.\" 1.0 of the CDDL.
6.\"
7.\" A full copy of the text of the CDDL should have accompanied this
8.\" source.  A copy of the CDDL is also available via the Internet at
9.\" http://www.illumos.org/license/CDDL.
10.\"
11.\"
12.\" Copyright 2019 Joyent, Inc.
13.\"
14.Dd July 22, 2019
15.Dt MAC 9E
16.Os
17.Sh NAME
18.Nm mac ,
19.Nm GLDv3
20.Nd MAC networking device driver overview
21.Sh SYNOPSIS
22.In sys/mac_provider.h
23.In sys/mac_ether.h
24.Sh INTERFACE LEVEL
25illumos DDI specific
26.Sh DESCRIPTION
27The
28.Sy MAC
29framework provides a means for implementing high-performance networking
30device drivers.
31It is the successor to the GLD interfaces and is sometimes referred to as the
32GLDv3.
The remainder of this manual introduces the aspects of writing device drivers
that leverage the MAC framework.
35While both the GLDv3 and MAC framework refer to the same thing, in this manual
36page we use the term the
37.Em MAC framework
38to refer to the device driver interface.
39.Pp
40MAC device drivers are character devices.
41They define the standard
42.Xr _init 9E ,
43.Xr _fini 9E ,
44and
45.Xr _info 9E
46entry points to initialize the module, as well as
47.Xr dev_ops 9S
48and
49.Xr cb_ops 9S
50structures.
51.Pp
52The main interface with MAC is through a series of callbacks defined in
53a
54.Xr mac_callbacks 9S
55structure.
56These callbacks control all the aspects of the device.
They cover sending data, getting and setting properties, controlling MAC
address filters, and managing promiscuous mode.
59.Pp
60The MAC framework takes care of many aspects of the device driver's
61management.
62A device that uses the MAC framework does not have to worry about creating
63device nodes or implementing
64.Xr open 9E
65or
66.Xr close 9E
67routines.
68In addition, all of the work to interact with
69.Xr dlpi 7P
70is taken care of automatically and transparently.
71.Ss Initializing MAC Support
72For a device to be used in the framework, it must register with the
73framework and take specific actions during
74.Xr _init 9E ,
75.Xr attach 9E ,
76.Xr detach 9E ,
77and
78.Xr _fini 9E .
79.Pp
80All device drivers have to define a
81.Xr dev_ops 9S
82structure which is pointed to by a
83.Xr modldrv 9S
84structure and the corresponding NULL-terminated
85.Xr modlinkage 9S
86structure.
87The
88.Xr dev_ops 9S
89structure should have a
90.Xr cb_ops 9S
91structure defined for it; however, it does not need to implement any of
92the standard
93.Xr cb_ops 9S
94entry points.
95.Pp
96Normally, in a driver's
97.Xr _init 9E
98entry point, it passes its
99.Sy modlinkage
100structure directly to
101.Xr mod_install 9F .
102To properly register with MAC, the driver must call
103.Xr mac_init_ops 9F
104before it calls
105.Xr mod_install 9F .
106If for some reason the
107.Xr mod_install 9F
108function fails, then the driver must be removed by a call to
109.Xr mac_fini_ops 9F .
110.Pp
111Conversely, in the driver's
112.Xr _fini 9E
113routine, it should call
114.Xr mac_fini_ops 9F
115after it successfully calls
116.Xr mod_remove 9F .
117For an example of how to use the
118.Xr mac_init_ops 9F
119and
120.Xr mac_fini_ops 9F
121functions, see the examples section in
122.Xr mac_init_ops 9F .
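.Pp
The ordering above can be illustrated with a small user-land sketch.
It is an illustration only, not driver source: the
.Sy struct dev_ops
and
.Sy struct modlinkage
definitions and the four routines are stand-ins that merely record the
order of calls, and the names
.Sy example_init
and
.Sy example_fini
are hypothetical replacements for a driver's
.Xr _init 9E
and
.Xr _fini 9E .

```c
#include <assert.h>

/* Stand-in types so the sketch builds outside the kernel. */
struct dev_ops { int devo_placeholder; };
struct modlinkage { int ml_placeholder; };

/* Call trace, kept only so the ordering can be inspected. */
enum { T_MAC_INIT_OPS, T_MAC_FINI_OPS, T_MOD_INSTALL, T_MOD_REMOVE };
#define	TRACE_MAX	8
static int trace[TRACE_MAX];
static int ntrace;
static int install_result;	/* value the mod_install stand-in returns */

static void
record(int what)
{
	assert(ntrace < TRACE_MAX);
	trace[ntrace++] = what;
}

/* Stand-ins for the kernel routines: same argument shapes, no real work. */
static void
mac_init_ops(struct dev_ops *ops, const char *name)
{
	(void) ops;
	(void) name;
	record(T_MAC_INIT_OPS);
}

static void
mac_fini_ops(struct dev_ops *ops)
{
	(void) ops;
	record(T_MAC_FINI_OPS);
}

static int
mod_install(struct modlinkage *ml)
{
	(void) ml;
	record(T_MOD_INSTALL);
	return (install_result);
}

static int
mod_remove(struct modlinkage *ml)
{
	(void) ml;
	record(T_MOD_REMOVE);
	return (0);
}

static struct dev_ops example_dev_ops;
static struct modlinkage example_modlinkage;

/* Plays the role of _init(9E): register with MAC first, undo on failure. */
int
example_init(void)
{
	int ret;

	mac_init_ops(&example_dev_ops, "example");
	if ((ret = mod_install(&example_modlinkage)) != 0)
		mac_fini_ops(&example_dev_ops);
	return (ret);
}

/* Plays the role of _fini(9E): only undo MAC setup once removal succeeds. */
int
example_fini(void)
{
	int ret;

	if ((ret = mod_remove(&example_modlinkage)) == 0)
		mac_fini_ops(&example_dev_ops);
	return (ret);
}
```
.Pp
The sketch avoids the names
.Sy _init
and
.Sy _fini
because those symbols are already taken in a user-land program.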
123.Ss Registering with MAC
124Every instance of a device should register separately with MAC.
125To register with MAC, a driver must allocate a
126.Xr mac_register 9S
127structure, fill it in, and then call
128.Xr mac_register 9F .
129The
130.Sy mac_register_t
131structure contains information about the device and all of the required
132function pointers that will be used as callbacks by the framework.
133.Pp
134These steps should all be taken during a device's
135.Xr attach 9E
136entry point.
137It is recommended that the driver perform this sequence of steps after the
138device has finished its initialization of the chipset and interrupts, though
139interrupts should not be enabled at that point.
140After it calls
141.Xr mac_register 9F
142it will start receiving callbacks from the MAC framework.
143.Pp
144To allocate the registration structure, the driver should call
145.Xr mac_alloc 9F .
Device drivers should generally pass the symbol
147.Sy MAC_VERSION
148as the argument to
149.Xr mac_alloc 9F .
150Upon successful completion, the driver will receive a
151.Sy mac_register_t
152structure which it should fill in.
153The structure and its members are documented in
154.Xr mac_register 9S .
155.Pp
156The
157.Xr mac_callbacks 9S
158structure is not allocated as a part of the
159.Xr mac_register 9S
160structure.
161In general, device drivers declare this statically.
162See the
163.Sx MAC Callbacks
164section for more information on how to fill it out.
165.Pp
166Once the structure has been filled in, the driver should call
167.Xr mac_register 9F
168to register itself with MAC.
The handle that the call fills in should be stored in the driver's soft
state.
171It will be used in various other support functions and callbacks.
172.Pp
173If the call is successful, then the device driver
174should enable interrupts and finish any other initialization required.
175If the call to
176.Xr mac_register 9F
177failed, then it should unwind its initialization and should return
178.Sy DDI_FAILURE
179from its
180.Xr attach 9E
181routine.
182.Pp
183The driver does not need to hold onto an allocated
184.Xr mac_register 9S
185structure after it has called the
186.Xr mac_register 9F
187function.
188Whether the
189.Xr mac_register 9F
190function returns successfully or not, the driver may free its
191.Xr mac_register 9S
192structure by calling the
193.Xr mac_free 9F
194function.
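.Pp
The registration sequence described in this section can be sketched as
follows.
This is a user-land illustration under stated assumptions: the types are
cut-down stand-ins for the real ones, the
.Sy MAC_VERSION
value is a placeholder, the three MAC routines are stubs, and
.Sy example_attach
with its soft state structure is hypothetical and takes simplified
arguments compared to a real
.Xr attach 9E .

```c
#include <stddef.h>
#include <stdlib.h>

#define	MAC_VERSION	1	/* placeholder; use the system header's value */
#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)

/* Cut-down stand-ins; the real structures have many more members. */
typedef struct mac_callbacks_s { int mc_placeholder; } mac_callbacks_t;
typedef struct mac_register_s {
	unsigned int	m_version;
	void		*m_driver;
	mac_callbacks_t	*m_callbacks;
} mac_register_t;
typedef void *mac_handle_t;

static int register_result;	/* value the mac_register stub returns */
static int live_regs;		/* registration structures not yet freed */

static mac_register_t *
mac_alloc(unsigned int version)
{
	mac_register_t *regp = calloc(1, sizeof (*regp));

	if (regp != NULL) {
		regp->m_version = version;
		live_regs++;
	}
	return (regp);
}

static void
mac_free(mac_register_t *regp)
{
	free(regp);
	live_regs--;
}

static int
mac_register(mac_register_t *regp, mac_handle_t *mhp)
{
	if (register_result != 0)
		return (register_result);
	*mhp = regp->m_driver;	/* stub: hand back any non-NULL token */
	return (0);
}

/* Hypothetical soft state for one device instance. */
typedef struct example_softc {
	mac_handle_t	es_mac_hdl;
	int		es_hw_ready;
	int		es_intr_enabled;
} example_softc_t;

/* Declared statically, as drivers generally do. */
static mac_callbacks_t example_callbacks;

int
example_attach(example_softc_t *sc)
{
	mac_register_t *regp;
	int ret;

	/* Chipset and interrupt setup is done; interrupts stay disabled. */
	sc->es_hw_ready = 1;

	if ((regp = mac_alloc(MAC_VERSION)) == NULL) {
		sc->es_hw_ready = 0;
		return (DDI_FAILURE);
	}
	regp->m_driver = sc;
	regp->m_callbacks = &example_callbacks;

	ret = mac_register(regp, &sc->es_mac_hdl);
	mac_free(regp);		/* no longer needed, success or not */
	if (ret != 0) {
		sc->es_hw_ready = 0;	/* unwind initialization */
		return (DDI_FAILURE);
	}

	sc->es_intr_enabled = 1;	/* only after registration succeeds */
	return (DDI_SUCCESS);
}
```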
195.Ss MAC Callbacks
196The MAC framework interacts with a device driver through a series of
197callbacks.
198These callbacks are described in their individual manual pages and the
199collection of callbacks is indicated in the
200.Xr mac_callbacks 9S
201manual page.
202This section does not focus on the specific functions, but rather on
203interactions between them and the rest of the device driver framework.
204.Pp
205A device driver should make no assumptions about when the various
206callbacks will be called and whether or not they will be called
207simultaneously.
208For example, a device driver may be asked to transmit data through a call to its
209.Xr mc_tx 9E
210entry point while it is being asked to get a device property through a
211call to its
212.Xr mc_getprop 9E
213entry point.
214As such, while some calls may be serialized to the device, such as setting
215properties, the device driver should always presume that all of its data needs
216to be protected with locks.
While the device is holding locks, it is safe for it to call the following MAC
routines:
219.Bl -bullet -offset indent -compact
220.It
221.Xr mac_hcksum_get 9F
222.It
223.Xr mac_hcksum_set 9F
224.It
225.Xr mac_lso_get 9F
226.It
227.Xr mac_maxsdu_update 9F
228.It
229.Xr mac_prop_info_set_default_link_flowctrl 9F
230.It
231.Xr mac_prop_info_set_default_str 9F
232.It
233.Xr mac_prop_info_set_default_uint8 9F
234.It
235.Xr mac_prop_info_set_default_uint32 9F
236.It
237.Xr mac_prop_info_set_default_uint64 9F
238.It
239.Xr mac_prop_info_set_perm 9F
240.It
241.Xr mac_prop_info_set_range_uint32 9F
242.El
243.Pp
244Any other MAC related routines should not be called with locks held,
245such as
246.Xr mac_link_update 9F
247or
248.Xr mac_rx 9F .
249Other routines in the DDI may be called while locks are held; however,
250device driver writers should be careful about calling blocking routines
251while locks are held or in interrupt context, though it is generally
252legal to do so.
253.Ss Receiving Data
254A device driver will often receive data through the means of an
255interrupt.
256When that interrupt occurs, the device driver will receive one or more frames
257with optional metadata.
258Often each frame has a corresponding descriptor which has information about
259whether or not there were errors or whether or not the device successfully
260checksummed the packet.
261In addition to the per-packet flow described below, there are certain
262requirements that drivers must adhere to when programming the hardware
263to receive data.
264See the section
265.Sx RECEIVE DESCRIPTOR LAYOUT
266for more information.
267.Pp
268During a single interrupt, a device driver should process a fixed number
269of frames.
270For each frame the device driver should:
271.Bl -enum -offset indent
272.It
273First check whether or not the frame has errors.
274If errors were detected, then the frame should not be sent to the operating
275system.
276It is recommended that devices keep kstats (see
277.Xr kstat_create 9F
278for more information) and bump the counter whenever such an error is
279detected.
280If the device distinguishes between the types of errors, then separate kstats
281for each class of error are recommended.
282See the
283.Sx STATISTICS
284section for more information on the various error cases that should be
285considered.
286.It
287Once the frame has been determined to be valid, the device driver should
288transform the frame into a
289.Xr mblk 9S .
290See the section
291.Sx MBLKS AND DMA
292for more information on how to transform and prepare a message block.
293.It
294If the device supports hardware checksumming (see the
295.Sx CAPABILITIES
296section for more information on checksumming), then the device driver
297should set the corresponding checksumming information with a call to
298.Xr mac_hcksum_set 9F .
299.It
300It should then append this new message block to the
301.Em end
302of the message block chain, linking it to the
303.Sy b_next
304pointer.
305It is vitally important that all the frames be chained in the order that they
306were received.
307If the device driver mistakenly reorders frames, then it may cause performance
308impacts in the TCP stack and potentially impact application correctness.
309.El
310.Pp
311Once all the frames have been processed and assembled, the device driver
312should deliver them to the rest of the operating system by calling
313.Xr mac_rx 9F .
The device driver should try to give as many mblk_t structures to the
system at once as possible.
316It
317.Em should not
318call
319.Xr mac_rx 9F
320once for every assembled mblk_t.
321.Pp
322The device driver must not hold any locks across the call to
323.Xr mac_rx 9F .
324When this function is called, received data will be pushed through the
325networking stack and some replies may be generated and given to the
326driver to send out.
327.Pp
328It is not the device driver's responsibility to determine whether or not
329the system can keep up with a driver's delivery rate of frames.
330The rest of the networking stack will handle issues related to keeping up
331appropriately and ensure that kernel memory is not exhausted by packets
332that are not being processed.
333.Pp
334Finally, the device driver should make sure that any other housekeeping
335activities required for the ring are taken care of such that more data
336can be received.
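.Pp
The per-frame steps above can be sketched in user-land C.
This is a minimal illustration, not a driver: the
.Sy mblk_t
here is a stand-in with only a
.Sy b_next
member plus a sequence number added for the example, and the descriptor
type, the frame budget, and the error counter are all hypothetical.

```c
#include <stddef.h>

/* Stand-in for mblk_t; the real structure has many more members. */
typedef struct mblk {
	struct mblk	*b_next;
	int		seq;	/* hypothetical: arrival order */
} mblk_t;

#define	EXAMPLE_RX_LIMIT	4	/* hypothetical per-interrupt budget */

/* Hypothetical view of one completed receive descriptor. */
typedef struct example_rx_desc {
	int	erd_error;	/* non-zero if hardware flagged an error */
	mblk_t	*erd_mp;	/* message block prepared for this frame */
} example_rx_desc_t;

/* A real driver would keep this in a kstat. */
static unsigned int example_rx_errors;

/*
 * Walk up to EXAMPLE_RX_LIMIT descriptors, drop errored frames, and
 * append good ones to the tail of a b_next chain so that arrival order
 * is preserved.  The number of descriptors examined is stored in
 * *consumed; the head of the chain (or NULL) is returned.
 */
mblk_t *
example_rx_collect(example_rx_desc_t *descs, size_t ndescs, size_t *consumed)
{
	mblk_t *head = NULL;
	mblk_t **tailp = &head;
	size_t i;

	for (i = 0; i < ndescs && i < EXAMPLE_RX_LIMIT; i++) {
		mblk_t *mp = descs[i].erd_mp;

		if (descs[i].erd_error) {
			example_rx_errors++;
			continue;
		}
		mp->b_next = NULL;
		*tailp = mp;		/* append at the end of the chain */
		tailp = &mp->b_next;
	}
	*consumed = i;
	return (head);
}
```
.Pp
A real driver would then drop its locks and pass the returned chain to
.Xr mac_rx 9F
in a single call, rather than calling it once per frame.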
337.Ss Transmitting Data and Back Pressure
A device driver will be asked to transmit a message block chain by
having its
340.Xr mc_tx 9E
341entry point called.
342While the driver is processing the message blocks, it may run out of resources.
343For example, a transmit descriptor ring may become full.
344At that point, the device driver should return the remaining unprocessed frames.
345The act of returning frames indicates that the device has asserted flow control.
346Once this has been done, no additional calls will be made to the
347driver's transmit entry point and the back pressure will be propagated
348throughout the rest of the networking stack.
349.Pp
350At some point in the future when resources have become available again,
351for example after an interrupt indicating that some portion of the
352transmit ring has been sent, then the device driver must notify the
353system that it can continue transmission.
354To do this, the driver should call
355.Xr mac_tx_update 9F .
356After that point, the driver will receive calls to its
357.Xr mc_tx 9E
358entry point again.
359As mentioned in the section on callbacks, the device driver should avoid holding
360any particular locks across the call to
361.Xr mac_tx_update 9F .
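.Pp
The back pressure protocol above can be sketched in user-land C.
This is an illustration under stated assumptions: the
.Sy mblk_t
is a stand-in with only a
.Sy b_next
member, the ring structure and both function names are hypothetical, and
handing a frame to hardware is reduced to decrementing a counter.

```c
#include <stddef.h>

/* Stand-in for mblk_t; only the chain pointer matters here. */
typedef struct mblk {
	struct mblk	*b_next;
} mblk_t;

/* Hypothetical transmit ring bookkeeping. */
typedef struct example_tx_ring {
	unsigned int	etr_free;	/* free descriptors */
	int		etr_blocked;	/* frames were returned to MAC */
} example_tx_ring_t;

/*
 * Plays the role of mc_tx(9E).  Frames are consumed in order until the
 * ring is full; whatever is left is returned, which signals flow
 * control.  NULL means the whole chain was accepted.
 */
mblk_t *
example_tx(example_tx_ring_t *ring, mblk_t *chain)
{
	while (chain != NULL) {
		mblk_t *next;

		if (ring->etr_free == 0) {
			ring->etr_blocked = 1;
			break;
		}
		next = chain->b_next;
		chain->b_next = NULL;
		ring->etr_free--;	/* frame handed to hardware */
		chain = next;
	}
	return (chain);
}

/*
 * Called when hardware reports ndone completed descriptors.  Returns
 * non-zero when the caller should, after dropping its locks, call
 * mac_tx_update(9F) to restart transmission.
 */
int
example_tx_reclaim(example_tx_ring_t *ring, unsigned int ndone)
{
	int notify;

	ring->etr_free += ndone;
	notify = ring->etr_blocked && ring->etr_free > 0;
	if (notify)
		ring->etr_blocked = 0;
	return (notify);
}
```
.Pp
Returning a flag from the reclaim routine, instead of notifying MAC
inside it, keeps the notification outside of any lock that protects the
ring.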
362.Ss Interrupt Coalescing
363For devices operating at higher data rates, interrupt coalescing is an
364important part of a well functioning device and may impact the
365performance of the device.
366Not all devices support interrupt coalescing.
367If interrupt coalescing is supported on the device, it is recommended that
368device driver writers provide private properties for their device to control the
369interrupt coalescing rate.
370This will make it much easier to perform experiments and observe the impact of
371different interrupt rates on the rest of the system.
372.Ss MAC Address Filter Management
373The MAC framework will attempt to use as many MAC address filters as a
374device has.
375To program a multicast address filter, the driver's
376.Xr mc_multicst 9E
377entry point will be called.
378If the device driver runs out of filters, it should not take any special action
379and just return the appropriate error as documented in the corresponding manual
380pages for the entry points.
381The framework will ensure that the device is placed in promiscuous mode
382if it needs to.
383.Ss Link Updates
384It is the responsibility of the device driver to keep track of the
385data link's state.
386Many devices provide a means of receiving an interrupt when the state of the
387link changes.
388When such a change happens, the driver should update its internal data
389structures and then call
390.Xr mac_link_update 9F
391to inform the MAC layer that this has occurred.
392If the device driver does not properly inform the system about link changes,
393then various features like link aggregations and other mechanisms that leverage
394the link state will not work correctly.
395.Ss Link Speed and Auto-negotiation
396Many networking devices support more than one possible speed that they
397can operate at.
398The selection of a speed is often performed through
399.Em auto-negotiation ,
400though some devices allow the user to control what speeds are advertised
401and used.
402.Pp
403Logically, there are two different sets of things that the device driver
404needs to keep track of while it's operating:
405.Bl -enum
406.It
407The supported speeds in hardware.
408.It
409The enabled speeds from the user.
410.El
411.Pp
412By default, when a link first comes up, the device driver should
413generally configure the link to support the common set of speeds and
414perform auto-negotiation.
415.Pp
416A user can control what speeds a device advertises via auto-negotiation
417and whether or not it performs auto-negotiation at all by using a series
418of properties that have
419.Sy _EN_
420in the name.
421These are read/write properties and there is one for each speed supported in the
422operating system.
423For a full list of them, see the
424.Sx PROPERTIES
425section.
426.Pp
427In addition to these properties, there is a corresponding set of
428properties with
429.Sy _ADV_
430in the name.
431These are similar to the
432.Sy _EN_
433family of properties, but they are read-only and indicate what the
434device has actually negotiated.
435While they are generally similar to the
436.Sy _EN_
437family of properties, they may change depending on power settings.
438See the
439.Sy Ethernet Link Properties
440section in
441.Xr dladm 1M
442for more information.
443.Pp
444It's worth discussing how these different values get used throughout the
445different entry points.
446The first entry point to consider is the
447.Xr mc_propinfo 9E
448entry point.
449For a given speed, the driver should consult whether or not the hardware
450supports this speed.
If it does, it should fill in the default value that the hardware takes and
indicate whether or not the property is writable.
454This holds for both the
455.Sy _EN_
456and
457.Sy _ADV_
458family of properties.
459.Pp
460The next entry point is
461.Xr mc_getprop 9E .
462Here, the device should first consult whether the given speed is
463supported.
464If it is not, then the driver should return
465.Er ENOTSUP .
If it is, then it should return the current value of the property.
467.Pp
468The last property endpoint is the
469.Xr mc_setprop 9E
470entry point.
471Here, the same logic applies.
472Before the driver considers whether or not the property is writable, it should
473first check whether or not it's a supported property.
474If it's not, then it should return
475.Er ENOTSUP .
Otherwise, it should proceed to check whether the property is writable
and whether the new value is valid.
If both hold, then it should update the property and
restart the link's negotiation.
479.Pp
480Finally, there is the
481.Xr mc_getstat 9E
482entry point.
483Several of the statistics that are queried relate to auto-negotiation and
484hardware capabilities.
485When a statistic relates to the hardware supporting a given speed, the
486.Sy _EN_
487properties should be ignored.
488The only thing that should be consulted is what the hardware itself supports.
489Otherwise, the statistics should look at what is currently being advertised by
490the device.
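.Pp
The order of checks for the
.Sy _EN_
properties can be sketched in user-land C.
Everything named here is hypothetical: the speed bits, the link
structure, and the two functions are private to the sketch and are not
MAC framework definitions.
Returning
.Er ENOTSUP
for an unsupported speed follows the text above, while
.Er EPERM
for a read-only property is purely this sketch's choice.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical speed bits, private to this sketch. */
#define	EX_SPEED_100FDX		0x1
#define	EX_SPEED_1000FDX	0x2
#define	EX_SPEED_10GFDX		0x4

/* Hypothetical per-link state. */
typedef struct example_link {
	uint32_t	el_hw_supported;	/* what the hardware can do */
	uint32_t	el_en;			/* what the user enabled */
	uint32_t	el_writable;		/* which settings may change */
	unsigned int	el_restarts;		/* negotiation restarts */
} example_link_t;

/* Get path: check hardware support first, then report the value. */
int
example_get_en(const example_link_t *l, uint32_t speed, uint8_t *valp)
{
	if ((l->el_hw_supported & speed) == 0)
		return (ENOTSUP);
	*valp = (l->el_en & speed) != 0;
	return (0);
}

/*
 * Set path: support is checked before writability.  Any non-zero value
 * is treated as "enabled"; further validation is omitted.
 */
int
example_set_en(example_link_t *l, uint32_t speed, uint8_t val)
{
	if ((l->el_hw_supported & speed) == 0)
		return (ENOTSUP);
	if ((l->el_writable & speed) == 0)
		return (EPERM);		/* assumption, see lead-in */

	if (val != 0)
		l->el_en |= speed;
	else
		l->el_en &= ~speed;
	l->el_restarts++;		/* restart link negotiation */
	return (0);
}
```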
491.Ss Unregistering from MAC
492During a driver's
493.Xr detach 9E
494routine, it should unregister the device instance from MAC by calling
495.Xr mac_unregister 9F
with the handle that it originally obtained from
.Xr mac_register 9F .
497If the call to
498.Xr mac_unregister 9F
499failed, then the device is likely still in use and the driver should
500fail the call to
501.Xr detach 9E .
502.Ss Interacting with Devices
503Administrators always interact with devices through the
504.Xr dladm 1M
505command line interface.
506The state of devices such as whether the link is considered
507.Sy up
508or
509.Sy down ,
510various link properties such as the
511.Sy MTU ,
512.Sy auto-negotiation
513state,
514and
515.Sy flow control
516state,
517are all exposed.
518It is also the preferred way that these properties are set and configured.
519.Pp
520While device tunables may be presented in a
521.Xr driver.conf 4
522file, it is recommended instead to expose such things through
523.Xr dladm 1M
524private properties, whether explicitly documented or not.
525.Sh CAPABILITIES
Capabilities in the MAC framework are optional hardware features that a
device driver advertises to the system.
The two current capabilities that the system supports are the ability of
hardware to perform large send offload (LSO), often also known as TCP
segmentation offload, and the ability of hardware to calculate and verify the
checksums present in IPv4, IPv6, and protocol headers such as TCP and UDP.
533.Pp
534The MAC framework will query a device for support of a capability
535through the
536.Xr mc_getcapab 9E
537function.
538Each capability has its own constant and may have corresponding data that goes
539along with it and a specific structure that the device is required to fill in.
540Note, the set of capabilities changes over time and there are also private
541capabilities in the system.
542Several of the capabilities are used in the implementation of the MAC framework.
543Others, like
544.Sy MAC_CAPAB_RINGS ,
represent features that have not been stabilized and thus neither API nor
binary compatibility is guaranteed for them.
547It is important that the device driver handles unknown capabilities correctly.
548For more information, see
549.Xr mc_getcapab 9E .
550.Pp
551The following capabilities are
552stable and defined in the system:
553.Ss MAC_CAPAB_HCKSUM
554The
555.Sy MAC_CAPAB_HCKSUM
556capability indicates to the system that the device driver supports some
557amount of checksumming.
558The specific data for this capability is a pointer to a
559.Sy uint32_t .
560To indicate no support for any kind of checksumming, the driver should
561either set this value to zero or simply return that it doesn't support
562the capability.
563.Pp
564Note, the values that the driver declares in this capability indicate
565what it can do when it transmits data.
566If the driver can only verify checksums when receiving data, then it should not
567indicate that it supports this capability.
568The following set of flags may be combined through a bitwise inclusive OR:
569.Bl -tag -width Ds
570.It Sy HCKSUM_INET_PARTIAL
571This indicates that the hardware can calculate a partial checksum for
572both IPv4 and IPv6; however, it requires the pseudo-header checksum be
573calculated for it.
574The pseudo-header checksum will be available for the mblk_t when calling
575.Xr mac_hcksum_get 9F .
576Note this does not imply that the hardware is capable of calculating the
577IPv4 header checksum.
578That should be indicated with the
.Sy HCKSUM_IPHDRCKSUM
flag.
580.It Sy HCKSUM_INET_FULL_V4
581This indicates that the hardware will fully calculate the L4 checksum
582for outgoing IPv4 packets and does not require a pseudo-header checksum.
583Note this does not imply that the hardware is capable of calculating the
584IPv4 header checksum.
585That should be indicated with the
.Sy HCKSUM_IPHDRCKSUM
flag.
587.It Sy HCKSUM_INET_FULL_V6
588This indicates that the hardware will fully calculate the L4 checksum
589for outgoing IPv6 packets and does not require a pseudo-header checksum.
590.It Sy HCKSUM_IPHDRCKSUM
591This indicates that the hardware supports calculating the checksum for
592the IPv4 header itself.
593.El
594.Pp
595When in a driver's transmit function, the driver will be processing a
596single frame.
597It should call
598.Xr mac_hcksum_get 9F
599to see what checksum flags are set on it.
600Note that the flags that are set on it are different from the ones described
601above and are documented in its manual page.
602These flags indicate how the driver is expected to program the hardware and what
603checksumming is required.
Not all frames will require hardware checksumming or will ask the hardware to
checksum them.
606.Pp
607If a driver supports offloading the receive checksum and verification,
608it should check to see what the hardware indicated was verified.
609The driver should then call
610.Xr mac_hcksum_set 9F .
611The flags used are different from the ones above and are discussed in
612detail in the
613.Xr mac_hcksum_set 9F
614manual page.
615If there is no checksum information available or the driver does not support
616checksumming, then it should simply not call
617.Xr mac_hcksum_set 9F .
618.Pp
619Note that the checksum flags should be set on the first
620mblk_t that makes up a given message.
621In other words, if multiple mblk_t structures are linked together by the
622.Sy b_cont
623member to describe a single frame, then it should only be called on the
624first mblk_t of that set.
625However, each distinct message should have the checksum bits set on it, if
626applicable.
627In other words, each mblk_t that is linked together by the
628.Sy b_next
629pointer may have checksum flags set.
630.Pp
631It is recommended that device drivers provide a private property or
632.Xr driver.conf 4
633property to control whether or not checksumming is enabled for both rx
634and tx; however, the default disposition is recommended to be enabled
635for both.
This way, if hardware bugs are found in the checksumming implementation,
checksumming can be disabled without requiring software updates.
638The transmit property should be checked when determining how to reply to
639.Xr mc_getcapab 9E
640and the receive property should be checked in the context of the receive
641function.
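.Pp
A capability reply for
.Sy MAC_CAPAB_HCKSUM
might be sketched as follows in user-land C.
This is an illustration under stated assumptions: the numeric values of
the constants are placeholders (a real driver takes them from the system
headers), the
.Sy boolean_t
and
.Sy mac_capab_t
types are stand-ins, and the soft state with its transmit checksum
tunable is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder values; a real driver uses the system headers. */
#define	HCKSUM_INET_PARTIAL	0x02
#define	HCKSUM_INET_FULL_V4	0x04
#define	HCKSUM_INET_FULL_V6	0x08
#define	HCKSUM_IPHDRCKSUM	0x10

/* Stand-in types with placeholder values. */
typedef enum { B_FALSE, B_TRUE } boolean_t;
typedef enum { MAC_CAPAB_HCKSUM = 1, MAC_CAPAB_LSO = 2 } mac_capab_t;

/* Hypothetical soft state holding a private tunable. */
typedef struct example_softc {
	boolean_t	es_tx_cksum_enabled;
} example_softc_t;

/*
 * Plays the role of mc_getcapab(9E).  The transmit tunable decides
 * whether checksum offload is advertised, and every capability the
 * sketch does not know about is declined.
 */
boolean_t
example_getcapab(void *arg, mac_capab_t capab, void *cap_data)
{
	example_softc_t *sc = arg;

	switch (capab) {
	case MAC_CAPAB_HCKSUM: {
		uint32_t *flagsp = cap_data;

		if (!sc->es_tx_cksum_enabled)
			return (B_FALSE);
		/* Hypothetical hardware: full IPv4 L4 and IPv4 header. */
		*flagsp = HCKSUM_INET_FULL_V4 | HCKSUM_IPHDRCKSUM;
		return (B_TRUE);
	}
	default:
		return (B_FALSE);
	}
}
```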
642.Ss MAC_CAPAB_LSO
643The
644.Sy MAC_CAPAB_LSO
645capability indicates that the driver supports various forms of large
646send offload (LSO).
647The private data is a pointer to a
648.Sy mac_capab_lso_t
649structure.
650At the moment, LSO support is limited to TCP inside of IPv4.
651This structure has the following members which are used to indicate
652various types of LSO support.
653.Bd -literal -offset indent
t_uscalar_t		lso_flags;
lso_basic_tcp_ipv4_t	lso_basic_tcp_ipv4;
656.Ed
657.Pp
658The
659.Sy lso_flags
660member is used to indicate which members are valid and should be
661considered.
662Each flag represents a different form of LSO.
663The member should be set to the bitwise inclusive OR of the following values:
664.Bl -tag -width Dv -offset indent
665.It Sy LSO_TX_BASIC_TCP_IPV4
666This indicates hardware support for performing TCP segmentation
667offloading over IPv4.
668When this flag is set, the
669.Sy lso_basic_tcp_ipv4
670member must be filled in.
671.El
672.Pp
673The
674.Sy lso_basic_tcp_ipv4
675member is a structure with the following members:
676.Bd -literal -offset indent
t_uscalar_t	lso_max;
678.Ed
679.Bd -filled -offset indent
680The
681.Sy lso_max
682member should be set to the maximum size of the TCP data
683payload that can be offloaded to the hardware.
684.Ed
685.Pp
As with checksumming, it is recommended that driver writers provide a
687means for disabling the support of LSO even if it is enabled by default.
688This deals with the case where issues that pop up for LSO may be worked
689around without requiring additional driver work.
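.Pp
Filling in the LSO capability might be sketched as follows in user-land
C.
This is an illustration under stated assumptions: the structure
definitions mirror the members listed above but are local stand-ins, the
flag value is a placeholder, and the function name, the enable switch,
and the
.Sy EXAMPLE_LSO_MAX
limit are hypothetical.

```c
/* Stand-ins mirroring the members documented above. */
typedef unsigned int t_uscalar_t;

#define	LSO_TX_BASIC_TCP_IPV4	0x01	/* placeholder value */

typedef struct lso_basic_tcp_ipv4_s {
	t_uscalar_t	lso_max;
} lso_basic_tcp_ipv4_t;

typedef struct mac_capab_lso_s {
	t_uscalar_t		lso_flags;
	lso_basic_tcp_ipv4_t	lso_basic_tcp_ipv4;
} mac_capab_lso_t;

#define	EXAMPLE_LSO_MAX	65535	/* hypothetical hardware payload limit */

/*
 * Returns 1 and fills in *lso when LSO should be advertised, or 0 and
 * leaves *lso untouched when the (hypothetical) tunable disables it.
 */
int
example_fill_lso(int lso_enabled, mac_capab_lso_t *lso)
{
	if (!lso_enabled)
		return (0);

	lso->lso_flags = LSO_TX_BASIC_TCP_IPV4;
	lso->lso_basic_tcp_ipv4.lso_max = EXAMPLE_LSO_MAX;
	return (1);
}
```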
690.Sh PROPERTIES
691Properties in the MAC framework represent aspects of a link.
692These include things like the link's current state and MTU.
693Many of the properties in the system are focused around auto-negotiation and
694controlling what link speeds are advertised.
695Information about properties is covered by three different device entry points.
696The
697.Xr mc_propinfo 9E
698entry point obtains metadata about the property.
699The
700.Xr mc_getprop 9E
701entry point obtains the property.
702The
703.Xr mc_setprop 9E
704entry point updates the property to a new value.
705.Pp
706Many of the properties listed below are read-only.
707Each property indicates whether it's read-only or it's read/write.
708However, driver writers may not implement the ability to set all writable
709properties.
710Many of these depend on the card itself.
711In particular, all properties that relate to auto-negotiation and are read/write
712may not be updated if the hardware in question does not support toggling what
713link speeds are auto-negotiated.
714While copper Ethernet often does not have this restriction, it often exists with
715various fiber standards and phys.
716.Pp
717The following properties are the subset of MAC framework properties that
718driver writers should be aware of and handle.
719While other properties exist in the system, driver writers should always return
720an error when a property not listed below is encountered.
721See
722.Xr mc_getprop 9E
723and
724.Xr mc_setprop 9E
725for more information on how to handle them.
726.Bl -hang -width Ds
727.It Sy MAC_PROP_DUPLEX
728.Bd -filled -compact
729Type:
730.Sy link_duplex_t |
731Permissions:
732.Sy Read-Only
733.Ed
734.Pp
735The
736.Sy MAC_PROP_DUPLEX
property is used to indicate the duplex mode of the link.
738A duplex link may have traffic flowing in both directions at the same time.
739The
740.Sy link_duplex_t
741is an enumeration which may be set to any of the following values:
742.Bl -tag -width Ds
743.It Sy LINK_DUPLEX_UNKNOWN
744The current state of the link is unknown.
745This may be because the link has not negotiated to a specific speed or it is
746down.
747.It Sy LINK_DUPLEX_HALF
748The link is running at half duplex.
749Communication may travel in only one direction on the link at a given time.
750.It Sy LINK_DUPLEX_FULL
751The link is running at full duplex.
752Communication may travel in both directions on the link simultaneously.
753.El
754.It Sy MAC_PROP_SPEED
755.Bd -filled -compact
756Type:
757.Sy uint64_t |
758Permissions:
759.Sy Read-Only
760.Ed
761.Pp
762The
763.Sy MAC_PROP_SPEED
764property stores the current link speed in bits per second.
A link that is running at 100 Mbit/s would store the value 100000000ULL.
766A link that is running at 40 Gbit/s would store the value 40000000000ULL.
767.It Sy MAC_PROP_STATUS
768.Bd -filled -compact
769Type:
770.Sy link_state_t |
771Permissions:
772.Sy Read-Only
773.Ed
774.Pp
775The
776.Sy MAC_PROP_STATUS
777property is used to indicate the current state of the link.
778It indicates whether the link is up or down.
779The
780.Sy link_state_t
781is an enumeration which may be set to any of the following values:
782.Bl -tag -width Ds
783.It Sy LINK_STATE_UNKNOWN
784The current state of the link is unknown.
785This may be because the driver's
786.Xr mc_start 9E
787endpoint has not been called so it has not attempted to start the link.
788.It Sy LINK_STATE_DOWN
789The link is down.
790This may be because of a negotiation problem, a cable problem, or some other
791device specific issue.
792.It Sy LINK_STATE_UP
793The link is up.
794If auto-negotiation is in use, it should have completed.
795Traffic should be able to flow over the link, barring other issues.
796.El
797.It Sy MAC_PROP_AUTONEG
798.Bd -filled -compact
799Type:
800.Sy uint8_t |
801Permissions:
802.Sy Read/Write
803.Ed
804.Pp
805The
806.Sy MAC_PROP_AUTONEG
807property indicates whether or not the device is currently configured to
808perform auto-negotiation.
809A value of
810.Sy 0
811indicates that auto-negotiation is disabled.
812A
813.Sy non-zero
814value indicates that auto-negotiation is enabled.
815Devices should generally default to enabling auto-negotiation.
816.Pp
817When getting this property, the device driver should return the current
818state.
819When setting this property, if the device supports operating in the requested
820mode, then the device driver should reset the link to negotiate to the new speed
821after updating any internal registers.
822.It Sy MAC_PROP_MTU
823.Bd -filled -compact
824Type:
825.Sy uint32_t |
826Permissions:
827.Sy Read/Write
828.Ed
829.Pp
830The
831.Sy MAC_PROP_MTU
832property determines the maximum transmission unit (MTU).
833This indicates the maximum size packet that the device can transmit, ignoring
834its own headers.
835For an Ethernet device, this would exclude the size of the Ethernet header and
836any VLAN headers that would be placed.
It is up to the driver to ensure that any MTU value that it accepts, once its
margin and header sizes are added in, does not exceed its maximum frame size.
839.Pp
840By default, drivers for Ethernet should initialize this value and the
841MTU to
842.Sy 1500 .
843When getting this property, the driver should return its current
844recorded MTU.
845When setting this property, the driver should first validate that it is within
846the device's valid range and then it must call
847.Xr mac_maxsdu_update 9F .
848Note that the call may fail.
849If the call completes successfully, the driver should update the hardware with
850the new value of the MTU and perform any other work needed to handle it.
851.Pp
852If the device does not support changing the MTU after the device's
853.Xr mc_start 9E
854entry point has been called, then driver writers should return
855.Er EBUSY .
856.It Sy MAC_PROP_FLOWCTRL
857.Bd -filled -compact
858Type:
859.Sy link_flowctrl_t |
860Permissions:
861.Sy Read/Write
862.Ed
863.Pp
864The
865.Sy MAC_PROP_FLOWCTRL
866property manages the configuration of pause frames as part of Ethernet
867flow control.
868Note, this only describes what this device will advertise.
869What is actually enabled may be different and is subject to the rules of
870auto-negotiation.
871The
872.Sy link_flowctrl_t
873is an enumeration that may be set to one of the following values:
874.Bl -tag -width Ds
875.It Sy LINK_FLOWCTRL_NONE
876Flow control is disabled.
877No pause frames should be generated or honored.
878.It Sy LINK_FLOWCTRL_RX
879The device can receive pause frames; however, it should not generate
880them.
881.It Sy LINK_FLOWCTRL_TX
882The device can generate pause frames; however, it does not support
883receiving them.
884.It Sy LINK_FLOWCTRL_BI
885The device supports both sending and receiving pause frames.
886.El
887.Pp
888When getting this property, the device driver should return the way that
889it has configured the device, not what the device has actually
890negotiated.
891When setting the property, it should update the hardware and allow the link to
892potentially perform auto-negotiation again.
893.El
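.Pp
The ordering described for setting
.Sy MAC_PROP_MTU
can be sketched as follows.
This is a minimal userland sketch, not a definitive implementation.
The exdrv_t structure, its fields, and the exdrv_maxsdu_update() function are
hypothetical; the latter only stands in for
.Xr mac_maxsdu_update 9F
so that the logic can be compiled outside of the kernel.
Returning EINVAL for an out-of-range value is likewise an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-instance driver state; not part of the MAC framework. */
typedef struct exdrv {
	uint32_t ex_mtu;		/* currently recorded MTU */
	uint32_t ex_min_mtu;		/* smallest MTU the hardware accepts */
	uint32_t ex_max_mtu;		/* largest MTU the hardware accepts */
	bool ex_started;		/* true between mc_start and mc_stop */
	bool ex_mtu_fixed_after_start;	/* MTU cannot change while running */
} exdrv_t;

/*
 * Stand-in for mac_maxsdu_update(9F) so this sketch builds in userland.
 * A real driver calls the framework function with its mac handle instead,
 * and that call may fail.
 */
static int
exdrv_maxsdu_update(exdrv_t *ex, uint32_t mtu)
{
	(void) ex;
	(void) mtu;
	return (0);
}

/* Handle a MAC_PROP_MTU set request; returns 0 or an errno value. */
static int
exdrv_set_mtu(exdrv_t *ex, uint32_t mtu)
{
	int ret;

	/* Devices that cannot change the MTU once started report EBUSY. */
	if (ex->ex_started && ex->ex_mtu_fixed_after_start)
		return (EBUSY);

	/* Validate against the device's range before telling the framework. */
	if (mtu < ex->ex_min_mtu || mtu > ex->ex_max_mtu)
		return (EINVAL);

	if ((ret = exdrv_maxsdu_update(ex, mtu)) != 0)
		return (ret);

	/* Only now would the driver program the hardware with the new MTU. */
	ex->ex_mtu = mtu;
	return (0);
}
```

The recorded MTU is updated only after both the range check and the framework
update succeed, so a failed request leaves the driver's state untouched.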
894.Pp
895The remaining properties are all about various auto-negotiation link
896speeds.
897They fall into two different buckets: properties with
898.Sy _ADV_
899in the name and properties with
900.Sy _EN_
901in the name.
902For any given supported speed, there is one of each.
903The
904.Sy _EN_
905set of properties are read/write properties that control what should be
906advertised by the device.
907When these are retrieved, they should return the current value of the property.
908When they are set, they should change how the hardware advertises the specific
909speed and trigger any kind of link reset and auto-negotiation, if enabled, to
910occur.
911.Pp
912The
913.Sy _ADV_
914set of properties are read-only properties.
915They are meant to reflect what has actually been negotiated.
916These may be different from the
917.Sy _EN_
918family of properties, especially when different power management
919settings are at play.
920.Pp
921See the
922.Sx Link Speed and Auto-negotiation
923section for more information.
924.Pp
925The properties are ordered in increasing link speed:
926.Bl -hang -width Ds
927.It Sy MAC_PROP_ADV_10HDX_CAP
928.Bd -filled -compact
929Type:
930.Sy uint8_t |
931Permissions:
932.Sy Read-Only
933.Ed
934.Pp
935The
936.Sy MAC_PROP_ADV_10HDX_CAP
937property describes whether or not 10 Mbit/s half-duplex support is
938advertised.
939.It Sy MAC_PROP_EN_10HDX_CAP
940.Bd -filled -compact
941Type:
942.Sy uint8_t |
943Permissions:
944.Sy Read/Write
945.Ed
946.Pp
947The
948.Sy MAC_PROP_EN_10HDX_CAP
949property describes whether or not 10 Mbit/s half-duplex support is
950enabled.
951.It Sy MAC_PROP_ADV_10FDX_CAP
952.Bd -filled -compact
953Type:
954.Sy uint8_t |
955Permissions:
956.Sy Read-Only
957.Ed
958.Pp
959The
960.Sy MAC_PROP_ADV_10FDX_CAP
961property describes whether or not 10 Mbit/s full-duplex support is
962advertised.
963.It Sy MAC_PROP_EN_10FDX_CAP
964.Bd -filled -compact
965Type:
966.Sy uint8_t |
967Permissions:
968.Sy Read/Write
969.Ed
970.Pp
971The
972.Sy MAC_PROP_EN_10FDX_CAP
973property describes whether or not 10 Mbit/s full-duplex support is
974enabled.
975.It Sy MAC_PROP_ADV_100HDX_CAP
976.Bd -filled -compact
977Type:
978.Sy uint8_t |
979Permissions:
980.Sy Read-Only
981.Ed
982.Pp
983The
984.Sy MAC_PROP_ADV_100HDX_CAP
985property describes whether or not 100 Mbit/s half-duplex support is
986advertised.
987.It Sy MAC_PROP_EN_100HDX_CAP
988.Bd -filled -compact
989Type:
990.Sy uint8_t |
991Permissions:
992.Sy Read/Write
993.Ed
994.Pp
995The
996.Sy MAC_PROP_EN_100HDX_CAP
997property describes whether or not 100 Mbit/s half-duplex support is
998enabled.
999.It Sy MAC_PROP_ADV_100FDX_CAP
1000.Bd -filled -compact
1001Type:
1002.Sy uint8_t |
1003Permissions:
1004.Sy Read-Only
1005.Ed
1006.Pp
1007The
1008.Sy MAC_PROP_ADV_100FDX_CAP
1009property describes whether or not 100 Mbit/s full-duplex support is
1010advertised.
1011.It Sy MAC_PROP_EN_100FDX_CAP
1012.Bd -filled -compact
1013Type:
1014.Sy uint8_t |
1015Permissions:
1016.Sy Read/Write
1017.Ed
1018.Pp
1019The
1020.Sy MAC_PROP_EN_100FDX_CAP
1021property describes whether or not 100 Mbit/s full-duplex support is
1022enabled.
1023.It Sy MAC_PROP_ADV_100T4_CAP
1024.Bd -filled -compact
1025Type:
1026.Sy uint8_t |
1027Permissions:
1028.Sy Read-Only
1029.Ed
1030.Pp
1031The
1032.Sy MAC_PROP_ADV_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is advertised.
1036.It Sy MAC_PROP_EN_100T4_CAP
1037.Bd -filled -compact
1038Type:
1039.Sy uint8_t |
1040Permissions:
1041.Sy Read/Write
1042.Ed
1043.Pp
1044The
.Sy MAC_PROP_EN_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is enabled.
1049.It Sy MAC_PROP_ADV_1000HDX_CAP
1050.Bd -filled -compact
1051Type:
1052.Sy uint8_t |
1053Permissions:
1054.Sy Read-Only
1055.Ed
1056.Pp
1057The
1058.Sy MAC_PROP_ADV_1000HDX_CAP
1059property describes whether or not 1 Gbit/s half-duplex support is
1060advertised.
1061.It Sy MAC_PROP_EN_1000HDX_CAP
1062.Bd -filled -compact
1063Type:
1064.Sy uint8_t |
1065Permissions:
1066.Sy Read/Write
1067.Ed
1068.Pp
1069The
1070.Sy MAC_PROP_EN_1000HDX_CAP
1071property describes whether or not 1 Gbit/s half-duplex support is
1072enabled.
1073.It Sy MAC_PROP_ADV_1000FDX_CAP
1074.Bd -filled -compact
1075Type:
1076.Sy uint8_t |
1077Permissions:
1078.Sy Read-Only
1079.Ed
1080.Pp
1081The
1082.Sy MAC_PROP_ADV_1000FDX_CAP
1083property describes whether or not 1 Gbit/s full-duplex support is
1084advertised.
1085.It Sy MAC_PROP_EN_1000FDX_CAP
1086.Bd -filled -compact
1087Type:
1088.Sy uint8_t |
1089Permissions:
1090.Sy Read/Write
1091.Ed
1092.Pp
1093The
1094.Sy MAC_PROP_EN_1000FDX_CAP
1095property describes whether or not 1 Gbit/s full-duplex support is
1096enabled.
1097.It Sy MAC_PROP_ADV_2500FDX_CAP
1098.Bd -filled -compact
1099Type:
1100.Sy uint8_t |
1101Permissions:
1102.Sy Read-Only
1103.Ed
1104.Pp
1105The
1106.Sy MAC_PROP_ADV_2500FDX_CAP
1107property describes whether or not 2.5 Gbit/s full-duplex support is
1108advertised.
1109.It Sy MAC_PROP_EN_2500FDX_CAP
1110.Bd -filled -compact
1111Type:
1112.Sy uint8_t |
1113Permissions:
1114.Sy Read/Write
1115.Ed
1116.Pp
1117The
1118.Sy MAC_PROP_EN_2500FDX_CAP
1119property describes whether or not 2.5 Gbit/s full-duplex support is
1120enabled.
1121.It Sy MAC_PROP_ADV_5000FDX_CAP
1122.Bd -filled -compact
1123Type:
1124.Sy uint8_t |
1125Permissions:
1126.Sy Read-Only
1127.Ed
1128.Pp
1129The
1130.Sy MAC_PROP_ADV_5000FDX_CAP
1131property describes whether or not 5.0 Gbit/s full-duplex support is
1132advertised.
1133.It Sy MAC_PROP_EN_5000FDX_CAP
1134.Bd -filled -compact
1135Type:
1136.Sy uint8_t |
1137Permissions:
1138.Sy Read/Write
1139.Ed
1140.Pp
1141The
1142.Sy MAC_PROP_EN_5000FDX_CAP
1143property describes whether or not 5.0 Gbit/s full-duplex support is
1144enabled.
1145.It Sy MAC_PROP_ADV_10GFDX_CAP
1146.Bd -filled -compact
1147Type:
1148.Sy uint8_t |
1149Permissions:
1150.Sy Read-Only
1151.Ed
1152.Pp
1153The
1154.Sy MAC_PROP_ADV_10GFDX_CAP
1155property describes whether or not 10 Gbit/s full-duplex support is
1156advertised.
1157.It Sy MAC_PROP_EN_10GFDX_CAP
1158.Bd -filled -compact
1159Type:
1160.Sy uint8_t |
1161Permissions:
1162.Sy Read/Write
1163.Ed
1164.Pp
1165The
1166.Sy MAC_PROP_EN_10GFDX_CAP
1167property describes whether or not 10 Gbit/s full-duplex support is
1168enabled.
1169.It Sy MAC_PROP_ADV_40GFDX_CAP
1170.Bd -filled -compact
1171Type:
1172.Sy uint8_t |
1173Permissions:
1174.Sy Read-Only
1175.Ed
1176.Pp
1177The
1178.Sy MAC_PROP_ADV_40GFDX_CAP
1179property describes whether or not 40 Gbit/s full-duplex support is
1180advertised.
1181.It Sy MAC_PROP_EN_40GFDX_CAP
1182.Bd -filled -compact
1183Type:
1184.Sy uint8_t |
1185Permissions:
1186.Sy Read/Write
1187.Ed
1188.Pp
1189The
1190.Sy MAC_PROP_EN_40GFDX_CAP
1191property describes whether or not 40 Gbit/s full-duplex support is
1192enabled.
1193.It Sy MAC_PROP_ADV_100GFDX_CAP
1194.Bd -filled -compact
1195Type:
1196.Sy uint8_t |
1197Permissions:
1198.Sy Read-Only
1199.Ed
1200.Pp
1201The
1202.Sy MAC_PROP_ADV_100GFDX_CAP
1203property describes whether or not 100 Gbit/s full-duplex support is
1204advertised.
1205.It Sy MAC_PROP_EN_100GFDX_CAP
1206.Bd -filled -compact
1207Type:
1208.Sy uint8_t |
1209Permissions:
1210.Sy Read/Write
1211.Ed
1212.Pp
1213The
1214.Sy MAC_PROP_EN_100GFDX_CAP
1215property describes whether or not 100 Gbit/s full-duplex support is
1216enabled.
1217.El
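.Pp
The split between the read/write
.Sy _EN_
properties and the read-only
.Sy _ADV_
properties can be sketched as follows.
This is an illustrative userland sketch under stated assumptions: the
EX_SPEED_* bits, the exlink_t structure, and both functions are hypothetical,
and a real driver would switch on the MAC_PROP_* identifiers in its property
entry points.
Returning ENOTSUP for an attempt to set a read-only property is also an
assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical speed bits; a real driver switches on MAC_PROP_* values. */
#define	EX_SPEED_100FDX		(1U << 0)
#define	EX_SPEED_1000FDX	(1U << 1)
#define	EX_SPEED_10GFDX		(1U << 2)

typedef struct exlink {
	uint32_t el_en;		/* backs the read/write _EN_ properties */
	uint32_t el_adv;	/* backs the read-only _ADV_ properties */
	uint32_t el_resets;	/* counts link resets, for illustration */
} exlink_t;

/* Get a property from either family as a uint8_t value of 1 or 0. */
static uint8_t
exlink_get(const exlink_t *el, bool adv, uint32_t speed)
{
	uint32_t mask = adv ? el->el_adv : el->el_en;

	return ((mask & speed) != 0 ? 1 : 0);
}

/* Set a property: _ADV_ is read-only, _EN_ updates state and resets. */
static int
exlink_set(exlink_t *el, bool adv, uint32_t speed, uint8_t val)
{
	if (adv)
		return (ENOTSUP);

	if (val != 0)
		el->el_en |= speed;
	else
		el->el_en &= ~speed;

	/* Stand-in for resetting the link and restarting auto-negotiation. */
	el->el_resets++;
	return (0);
}
```

Keeping the two families in separate fields makes it harder for a set of an
.Sy _EN_
property to silently change what an
.Sy _ADV_
property reports.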
1218.Ss Private Properties
1219In addition to the defined properties above, drivers are allowed to
1220define private properties.
1221These private properties are device-specific properties.
1222All private properties share the same constant,
1223.Sy MAC_PROP_PRIVATE .
1224Properties are distinguished by a name, which is a character string.
1225The list of such private properties is defined when registering with mac in the
1226.Sy m_priv_props
1227member of the
1228.Xr mac_register 9S
1229structure.
1230.Pp
1231The driver may define whatever semantics it wants for these private
1232properties.
1233They will not be listed when running
1234.Xr dladm 1M ,
1235unless explicitly requested by name.
1236All such properties should start with a leading underscore character and then
1237consist of alphanumeric ASCII characters and additional underscores or hyphens.
1238.Pp
1239Properties of type
1240.Sy MAC_PROP_PRIVATE
1241may show up in all three property related entry points:
1242.Xr mc_propinfo 9E ,
1243.Xr mc_getprop 9E ,
1244and
1245.Xr mc_setprop 9E .
1246Device drivers should tell the different properties apart by using the
1247.Xr strcmp 9F
1248function to compare it to the set of properties that it knows about.
1249When encountering properties that it doesn't know, it should treat them
1250like all other unknown properties.
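.Pp
Telling private properties apart by name can be sketched as follows.
This is a minimal userland sketch: the property names, the expriv_t structure,
and the expriv_get() function are hypothetical, and returning ENOTSUP for an
unknown name is an assumption about how a driver treats unknown properties.
Kernel code would use
.Xr strcmp 9F
rather than the C library routine used here.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical private property names; note the leading underscore. */
#define	EX_PROP_INTR_THROTTLE	"_intr_throttle"
#define	EX_PROP_RX_COPY_THRESH	"_rx_copy_thresh"

/* Hypothetical driver state backing the two private properties. */
typedef struct expriv {
	uint32_t ep_intr_throttle;
	uint32_t ep_rx_copy_thresh;
} expriv_t;

/* Look up a MAC_PROP_PRIVATE property by name; reject unknown names. */
static int
expriv_get(const expriv_t *ep, const char *name, uint32_t *valp)
{
	if (strcmp(name, EX_PROP_INTR_THROTTLE) == 0) {
		*valp = ep->ep_intr_throttle;
		return (0);
	}

	if (strcmp(name, EX_PROP_RX_COPY_THRESH) == 0) {
		*valp = ep->ep_rx_copy_thresh;
		return (0);
	}

	/* Treated the same way as any other unknown property. */
	return (ENOTSUP);
}
```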
1251.Sh STATISTICS
The MAC framework defines a couple of different sets of statistics, which
are based on various standards, for devices to implement.
1254Statistics are retrieved through the
1255.Xr mc_getstat 9E
1256entry point.
Some statistics are required for all devices; a separate set is specific to
Ethernet devices.
1259Not all devices will support every statistic.
1260In many cases, several device registers will need to be combined to create the
1261proper stat.
1262.Pp
1263In general, if the device is not keeping track of these statistics, then
1264it is recommended that the driver store these values as a
1265.Sy uint64_t
1266to ensure that overflow does not occur.
1267.Pp
1268If a device does not support a specific statistic, then it is fine to
1269return that it is not supported.
The same should be done for unrecognized statistics.
1271See
1272.Xr mc_getstat 9E
1273for more information on the proper way to handle these.
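.Pp
Combining several device counters into one statistic, and rejecting
statistics that are not supported, can be sketched as follows.
This is an illustrative userland sketch: the EX_STAT_* identifiers stand in
for MAC_STAT_* values, and the exstats_t structure and its counters are
hypothetical.
It assumes 32-bit hardware counters that clear when read, which the driver
folds into 64-bit software totals.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical statistic identifiers standing in for MAC_STAT_* values. */
typedef enum exstat_id {
	EX_STAT_IERRORS,
	EX_STAT_OPACKETS,
	EX_STAT_COLLISIONS
} exstat_id_t;

/* Hypothetical software copies of hardware counters, kept as uint64_t. */
typedef struct exstats {
	uint64_t es_crc_errs;
	uint64_t es_len_errs;
	uint64_t es_opackets;
} exstats_t;

/* Fold a 32-bit clear-on-read register value into a 64-bit total. */
static void
exstat_accumulate(uint64_t *totalp, uint32_t regval)
{
	*totalp += regval;
}

/* Return one statistic, or ENOTSUP if it is not supported or not known. */
static int
exstat_get(const exstats_t *es, exstat_id_t id, uint64_t *valp)
{
	switch (id) {
	case EX_STAT_IERRORS:
		/* Several device counters combine into one statistic. */
		*valp = es->es_crc_errs + es->es_len_errs;
		return (0);
	case EX_STAT_OPACKETS:
		*valp = es->es_opackets;
		return (0);
	default:
		/* Unsupported and unrecognized statistics are handled alike. */
		return (ENOTSUP);
	}
}
```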
1274.Ss General Device Statistics
The following statistics are based on MIB-II statistics from both RFC 1213
and RFC 1573.
1277.Bl -tag -width Ds
1278.It Sy MAC_STAT_IFSPEED
1279The device's current speed in bits per second.
1280.It Sy MAC_STAT_MULTIRCV
1281The total number of received multicast packets.
1282.It Sy MAC_STAT_BRDCSTRCV
1283The total number of received broadcast packets.
1284.It Sy MAC_STAT_MULTIXMT
1285The total number of transmitted multicast packets.
1286.It Sy MAC_STAT_BRDCSTXMT
The total number of transmitted broadcast packets.
1288.It Sy MAC_STAT_NORCVBUF
1289The total number of packets discarded by the hardware due to a lack of
1290receive buffers.
1291.It Sy MAC_STAT_IERRORS
1292The total number of errors detected on input.
1293.It Sy MAC_STAT_UNKNOWNS
1294The total number of received packets that were discarded because they
1295were of an unknown protocol.
1296.It Sy MAC_STAT_NOXMTBUF
1297The total number of outgoing packets dropped due to a lack of transmit
1298buffers.
1299.It Sy MAC_STAT_OERRORS
1300The total number of outgoing packets that resulted in errors.
1301.It Sy MAC_STAT_COLLISIONS
The total number of collisions encountered by the transmitter.
1303.It Sy MAC_STAT_RBYTES
1304The total number of
1305.Sy bytes
1306received by the device, regardless of packet type.
1307.It Sy MAC_STAT_IPACKETS
1308The total number of
1309.Sy packets
1310received by the device, regardless of packet type.
1311.It Sy MAC_STAT_OBYTES
1312The total number of
1313.Sy bytes
1314transmitted by the device, regardless of packet type.
1315.It Sy MAC_STAT_OPACKETS
1316The total number of
1317.Sy packets
1318sent by the device, regardless of packet type.
1319.It Sy MAC_STAT_UNDERFLOWS
1320The total number of packets that were smaller than the minimum sized
1321packet for the device and were therefore dropped.
1322.It Sy MAC_STAT_OVERFLOWS
1323The total number of packets that were larger than the maximum sized
1324packet for the device and were therefore dropped.
1325.El
1326.Ss Ethernet Specific Statistics
1327The following statistics are specific to Ethernet devices.
1328They refer to values from RFC 1643 and include various MII/GMII specific stats.
1329Many of these are also defined in IEEE 802.3.
1330.Bl -tag -width Ds
1331.It Sy ETHER_STAT_ADV_CAP_1000FDX
1332Indicates that the device is advertising support for 1 Gbit/s
1333full-duplex operation.
1334.It Sy ETHER_STAT_ADV_CAP_1000HDX
1335Indicates that the device is advertising support for 1 Gbit/s
1336half-duplex operation.
1337.It Sy ETHER_STAT_ADV_CAP_100FDX
1338Indicates that the device is advertising support for 100 Mbit/s
1339full-duplex operation.
1340.It Sy ETHER_STAT_ADV_CAP_100GFDX
1341Indicates that the device is advertising support for 100 Gbit/s
1342full-duplex operation.
1343.It Sy ETHER_STAT_ADV_CAP_100HDX
1344Indicates that the device is advertising support for 100 Mbit/s
1345half-duplex operation.
1346.It Sy ETHER_STAT_ADV_CAP_100T4
Indicates that the device is advertising support for 100 Mbit/s
100BASE-T4 operation.
1349.It Sy ETHER_STAT_ADV_CAP_10FDX
1350Indicates that the device is advertising support for 10 Mbit/s
1351full-duplex operation.
1352.It Sy ETHER_STAT_ADV_CAP_10GFDX
1353Indicates that the device is advertising support for 10 Gbit/s
1354full-duplex operation.
1355.It Sy ETHER_STAT_ADV_CAP_10HDX
1356Indicates that the device is advertising support for 10 Mbit/s
1357half-duplex operation.
1358.It Sy ETHER_STAT_ADV_CAP_2500FDX
1359Indicates that the device is advertising support for 2.5 Gbit/s
1360full-duplex operation.
1361.It Sy ETHER_STAT_ADV_CAP_40GFDX
1362Indicates that the device is advertising support for 40 Gbit/s
1363full-duplex operation.
1364.It Sy ETHER_STAT_ADV_CAP_5000FDX
1365Indicates that the device is advertising support for 5.0 Gbit/s
1366full-duplex operation.
1367.It Sy ETHER_STAT_ADV_CAP_ASMPAUSE
1368Indicates that the device is advertising support for receiving pause
1369frames.
1370.It Sy ETHER_STAT_ADV_CAP_AUTONEG
1371Indicates that the device is advertising support for auto-negotiation.
1372.It Sy ETHER_STAT_ADV_CAP_PAUSE
1373Indicates that the device is advertising support for generating pause
1374frames.
1375.It Sy ETHER_STAT_ADV_REMFAULT
1376Indicates that the device is advertising support for detecting faults in
1377the remote link peer.
1378.It Sy ETHER_STAT_ALIGN_ERRORS
1379Indicates the number of times an alignment error was generated by the
1380Ethernet device.
1381This is a count of packets that were not an integral number of octets and failed
1382the FCS check.
1383.It Sy ETHER_STAT_CAP_1000FDX
1384Indicates the device supports 1 Gbit/s full-duplex operation.
1385.It Sy ETHER_STAT_CAP_1000HDX
1386Indicates the device supports 1 Gbit/s half-duplex operation.
1387.It Sy ETHER_STAT_CAP_100FDX
1388Indicates the device supports 100 Mbit/s full-duplex operation.
1389.It Sy ETHER_STAT_CAP_100GFDX
1390Indicates the device supports 100 Gbit/s full-duplex operation.
1391.It Sy ETHER_STAT_CAP_100HDX
1392Indicates the device supports 100 Mbit/s half-duplex operation.
1393.It Sy ETHER_STAT_CAP_100T4
1394Indicates the device supports 100 Mbit/s 100BASE-T4 operation.
1395.It Sy ETHER_STAT_CAP_10FDX
1396Indicates the device supports 10 Mbit/s full-duplex operation.
1397.It Sy ETHER_STAT_CAP_10GFDX
1398Indicates the device supports 10 Gbit/s full-duplex operation.
1399.It Sy ETHER_STAT_CAP_10HDX
1400Indicates the device supports 10 Mbit/s half-duplex operation.
1401.It Sy ETHER_STAT_CAP_2500FDX
1402Indicates the device supports 2.5 Gbit/s full-duplex operation.
1403.It Sy ETHER_STAT_CAP_40GFDX
1404Indicates the device supports 40 Gbit/s full-duplex operation.
1405.It Sy ETHER_STAT_CAP_5000FDX
1406Indicates the device supports 5.0 Gbit/s full-duplex operation.
1407.It Sy ETHER_STAT_CAP_ASMPAUSE
1408Indicates that the device supports the ability to receive pause frames.
1409.It Sy ETHER_STAT_CAP_AUTONEG
1410Indicates that the device supports the ability to perform link
1411auto-negotiation.
1412.It Sy ETHER_STAT_CAP_PAUSE
1413Indicates that the device supports the ability to transmit pause frames.
1414.It Sy ETHER_STAT_CAP_REMFAULT
1415Indicates that the device supports the ability of detecting a remote
1416fault in a link peer.
1417.It Sy ETHER_STAT_CARRIER_ERRORS
1418Indicates the number of times that the Ethernet carrier sense condition
1419was lost or not asserted.
1420.It Sy ETHER_STAT_DEFER_XMTS
1421Indicates the number of frames for which the device was unable to
1422transmit the frame due to being busy and had to try again.
1423.It Sy ETHER_STAT_EX_COLLISIONS
1424Indicates the number of frames that failed to send due to an excessive
1425number of collisions.
1426.It Sy ETHER_STAT_FCS_ERRORS
1427Indicates the number of times that a frame check sequence failed.
1428.It Sy ETHER_STAT_FIRST_COLLISIONS
1429Indicates the number of times that a frame was eventually transmitted
1430successfully, but only after a single collision.
1431.It Sy ETHER_STAT_JABBER_ERRORS
1432Indicates the number of frames that were received that were both larger
1433than the maximum packet size and failed the frame check sequence.
1434.It Sy ETHER_STAT_LINK_ASMPAUSE
1435Indicates whether the link is currently configured to accept pause
1436frames.
1437.It Sy ETHER_STAT_LINK_AUTONEG
1438Indicates whether the current link state is a result of
1439auto-negotiation.
1440.It Sy ETHER_STAT_LINK_DUPLEX
1441Indicates the current duplex state of the link.
1442The values used here should be the same as documented for
1443.Sy MAC_PROP_DUPLEX .
1444.It Sy ETHER_STAT_LINK_PAUSE
1445Indicates whether the link is currently configured to generate pause
1446frames.
1447.It Sy ETHER_STAT_LP_CAP_1000FDX
1448Indicates the remote device supports 1 Gbit/s full-duplex operation.
1449.It Sy ETHER_STAT_LP_CAP_1000HDX
1450Indicates the remote device supports 1 Gbit/s half-duplex operation.
1451.It Sy ETHER_STAT_LP_CAP_100FDX
1452Indicates the remote device supports 100 Mbit/s full-duplex operation.
1453.It Sy ETHER_STAT_LP_CAP_100GFDX
1454Indicates the remote device supports 100 Gbit/s full-duplex operation.
1455.It Sy ETHER_STAT_LP_CAP_100HDX
1456Indicates the remote device supports 100 Mbit/s half-duplex operation.
1457.It Sy ETHER_STAT_LP_CAP_100T4
1458Indicates the remote device supports 100 Mbit/s 100BASE-T4 operation.
1459.It Sy ETHER_STAT_LP_CAP_10FDX
1460Indicates the remote device supports 10 Mbit/s full-duplex operation.
1461.It Sy ETHER_STAT_LP_CAP_10GFDX
1462Indicates the remote device supports 10 Gbit/s full-duplex operation.
1463.It Sy ETHER_STAT_LP_CAP_10HDX
1464Indicates the remote device supports 10 Mbit/s half-duplex operation.
1465.It Sy ETHER_STAT_LP_CAP_2500FDX
1466Indicates the remote device supports 2.5 Gbit/s full-duplex operation.
1467.It Sy ETHER_STAT_LP_CAP_40GFDX
1468Indicates the remote device supports 40 Gbit/s full-duplex operation.
1469.It Sy ETHER_STAT_LP_CAP_5000FDX
1470Indicates the remote device supports 5.0 Gbit/s full-duplex operation.
1471.It Sy ETHER_STAT_LP_CAP_ASMPAUSE
1472Indicates that the remote device supports the ability to receive pause
1473frames.
1474.It Sy ETHER_STAT_LP_CAP_AUTONEG
1475Indicates that the remote device supports the ability to perform link
1476auto-negotiation.
1477.It Sy ETHER_STAT_LP_CAP_PAUSE
1478Indicates that the remote device supports the ability to transmit pause
1479frames.
1480.It Sy ETHER_STAT_LP_CAP_REMFAULT
1481Indicates that the remote device supports the ability of detecting a
1482remote fault in a link peer.
1483.It Sy ETHER_STAT_MACRCV_ERRORS
1484Indicates the number of times that the internal MAC layer encountered an
1485error when attempting to receive and process a frame.
1486.It Sy ETHER_STAT_MACXMT_ERRORS
1487Indicates the number of times that the internal MAC layer encountered an
1488error when attempting to process and transmit a frame.
1489.It Sy ETHER_STAT_MULTI_COLLISIONS
1490Indicates the number of times that a frame was eventually transmitted
1491successfully, but only after more than one collision.
1492.It Sy ETHER_STAT_SQE_ERRORS
1493Indicates the number of times that an SQE error occurred.
1494The specific conditions for this error are documented in IEEE 802.3.
1495.It Sy ETHER_STAT_TOOLONG_ERRORS
1496Indicates the number of frames that were received that were longer than
1497the maximum frame size supported by the device.
1498.It Sy ETHER_STAT_TOOSHORT_ERRORS
1499Indicates the number of frames that were received that were shorter than
1500the minimum frame size supported by the device.
1501.It Sy ETHER_STAT_TX_LATE_COLLISIONS
1502Indicates the number of times a collision was detected late on the
1503device.
1504.It Sy ETHER_STAT_XCVR_ADDR
Indicates the address of the MII/GMII transceiver.
.It Sy ETHER_STAT_XCVR_ID
Indicates the ID of the MII/GMII transceiver.
.It Sy ETHER_STAT_XCVR_INUSE
Indicates what kind of transceiver is in use.
The following values may be used:
.Bl -tag -width Ds
.It Sy XCVR_UNDEFINED
The transceiver type is undefined by the hardware.
.It Sy XCVR_NONE
There is no transceiver in use by the hardware.
.It Sy XCVR_10
The transceiver supports 10BASE-T operation.
.It Sy XCVR_100T4
The transceiver supports 100BASE-T4 operation.
.It Sy XCVR_100X
The transceiver supports 100BASE-TX operation.
.It Sy XCVR_100T2
The transceiver supports 100BASE-T2 operation.
.It Sy XCVR_1000X
The transceiver supports 1000BASE-X operation.
This is used for all fiber transceivers.
.It Sy XCVR_1000T
The transceiver supports 1000BASE-T operation.
This is used for all copper transceivers.
.El
1531.El
1532.Ss Device Specific kstats
1533In addition to the defined statistics above, if the device driver
1534maintains additional statistics or the device provides additional
1535statistics, it should create its own kstats through the
1536.Xr kstat_create 9F
1537function to allow operators to observe them.
1538.Sh RECEIVE DESCRIPTOR LAYOUT
1539One of the important things that a device driver must do is lay out DMA
1540memory, generally in a ring of descriptors, into which received Ethernet
1541frames will be placed.
1542When performing this, there are a few things that drivers should
1543generally do:
1544.Bl -enum -offset indent
1545.It
1546Drivers should lay out memory so that the IP header will be 4-byte
1547aligned.
The IP stack expects that the beginning of an IP header will be at a
4-byte aligned address; however, a DMA allocation will be at a 4- or
8-byte aligned address by default.
The IP header is at a 14-byte offset from the beginning of the Ethernet
frame, leaving the IP header at a 2-byte alignment if the Ethernet frame
starts at the beginning of the DMA buffer.
If VLAN tagging is in place, then each VLAN tag adds 4 bytes, which
does not change the alignment of the IP header.
1556.Pp
As a solution to this, the driver should program the device to start
placing the received Ethernet frame two bytes past the start of the
DMA buffer.
This ensures that the IP header will be 4-byte aligned whether or not
VLAN tags are present.
1562.It
Drivers should try to allocate the DMA memory used for receiving frames
as a contiguous buffer.
If for some reason that is not possible, the driver should try to
ensure that there is enough space for the Ethernet header and
any possible layer three and layer four headers
1568.Pq such as IP, TCP, or UDP
1569in the initial descriptor.
1570.It
1571As discussed in the
1572.Sx MBLKS AND DMA
1573section, there are multiple strategies for managing the relationship
1574between DMA data, receive descriptors, and the operating system
1575representation of a packet in the
1576.Xr mblk 9S
1577structure.
1578Drivers must limit their resource consumption.
1579See the
1580.Sy Considerations
1581section of
1582.Sx MBLKS AND DMA
1583for more on this.
1584.El
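.Pp
The alignment arithmetic behind the two-byte offset can be checked with a
short sketch.
The EX_* constants and both functions are hypothetical names used only for
illustration, and the sketch assumes that the DMA buffer itself starts at an
address that is at least 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>

#define	EX_ETHER_HDR_LEN	14	/* destination, source, and type */
#define	EX_VLAN_TAG_LEN		4	/* one VLAN tag */
#define	EX_RX_PAD		2	/* bytes skipped at the buffer start */

/* Offset of the IP header from the start of the DMA buffer. */
static size_t
ex_ip_offset(size_t pad, unsigned int nvlan_tags)
{
	return (pad + EX_ETHER_HDR_LEN +
	    (size_t)nvlan_tags * EX_VLAN_TAG_LEN);
}

/* Non-zero when an offset falls on a 4-byte boundary. */
static int
ex_is_4byte_aligned(size_t off)
{
	return ((off & 3) == 0);
}
```

Without padding the IP header lands at offset 14, which is only 2-byte
aligned.
With two bytes of padding it lands at offset 16, 20, or 24 for zero, one, or
two VLAN tags, all of which are 4-byte aligned.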
1585.Sh TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
1586Device drivers are the first line of defense for dealing with broken
1587devices and bugs in their firmware.
While most devices will rarely fail, it is important that the design and
implementation of a device driver pay particular attention to RAS
(Reliability, Availability, and Serviceability).
1591While everything described in this section is optional, it is highly recommended
1592that all new device drivers follow these guidelines.
1593.Pp
1594The Fault Management Architecture (FMA) provides facilities for
1595detecting and reporting various classes of defects and faults.
1596Specifically for networking device drivers, issues that should be
1597detected and reported include:
1598.Bl -bullet -offset indent
1599.It
1600Device internal uncorrectable errors
1601.It
1602Device internal correctable errors
1603.It
1604PCI and PCI Express transport errors
1605.It
1606Device temperature alarms
1607.It
1608Device transmission stalls
1609.It
1610Device communication timeouts
1611.It
High rates of invalid interrupts
1613.El
1614.Pp
1615All such errors fall into three primary categories:
1616.Bl -enum -offset indent
1617.It
1618Errors detected by the Fault Management Architecture
1619.It
1620Errors detected by the device and indicated to the device driver
1621.It
1622Errors detected by the device driver
1623.El
1624.Ss Fault Management Setup and Teardown
1625Drivers should initialize support for the fault management framework by
1626calling
1627.Xr ddi_fm_init 9F
1628from their
1629.Xr attach 9E
1630routine.
1631By registering with the fault management framework, a device driver is given the
1632chance to detect and notice transport errors as well as report other errors that
1633exist.
While a device driver does not need to declare every capability
described in
.Xr ddi_fm_init 9F ,
we suggest that device drivers at least register the
.Sy DDI_FM_EREPORT_CAPABLE
capability so as to allow the driver to report issues that it detects.
1640.Pp
1641If the driver registers with the fault management framework during its
1642.Xr attach 9E
1643entry point, it must call
1644.Xr ddi_fm_fini 9F
1645during its
1646.Xr detach 9E
1647entry point.
1648.Ss Transport Errors
1649Many modern networking devices leverage PCI or PCI Express.
1650As such, there are two primary ways that device drivers access data: they either
1651memory map device registers and use routines like
1652.Xr ddi_get8 9F
1653and
1654.Xr ddi_put8 9F
1655or they use direct memory access (DMA).
1656New device drivers should always enable checking of the transport layer by
1657marking their support in the
1658.Xr ddi_device_acc_attr 9S
1659structure and using routines like
1660.Xr ddi_fm_acc_err_get 9F
1661and
1662.Xr ddi_fm_dma_err_get 9F
1663to detect if errors have occurred.
1664.Ss Device Indicated Errors
Many devices are able to announce to a device driver that a
correctable or uncorrectable error has occurred.
1667Other devices have the ability to indicate that various physical issues have
1668occurred such as a fan failing or a temperature sensor having fired.
1669.Pp
1670Drivers should wire themselves to receive notifications when these
1671events occur.
1672The means and capabilities will vary from device to device.
1673For example, some devices will generate information about these notifications
1674through special interrupts.
1675Other devices may have a register that software can poll.
1676In the cases where polling is required, driver writers should try not to poll
1677too frequently and should generally only poll when the device is actively being
1678used, e.g. between calls to the
1679.Xr mc_start 9E
1680and
1681.Xr mc_stop 9E
1682entry points.
1683.Ss Driver Transmit Stall Detection
1684One of the primary responsibilities of a hardened device driver is to
1685perform transmit stall detection.
The core idea behind tx stall detection is that the driver should record
when data was last successfully transmitted.
Most devices should be transmitting data on a regular basis as long as the link
is up.
If a device is not, then this may indicate that the device is stuck and needs
to be reset.
1691reset.
1692At this time, the MAC framework does not provide any resources for performing
1693these checks; however, polling on each individual transmit ring for the last
1694completion time while something is actively being transmitted through the use of
1695routines such as
1696.Xr timeout 9F
1697may be a reasonable starting point.
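.Pp
One possible shape for such a check is sketched below.
This is a userland sketch under stated assumptions, not a framework facility:
the extxring_t structure, the three functions, and the five second threshold
are all hypothetical.
A real driver would call the check from a periodic
.Xr timeout 9F
handler and take timestamps from a monotonic clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring bookkeeping; times are in nanoseconds. */
typedef struct extxring {
	uint32_t tr_pending;		/* handed to hardware, not completed */
	uint64_t tr_last_progress;	/* time of the most recent progress */
} extxring_t;

#define	EX_TX_STALL_NS	(5ULL * 1000000000ULL)	/* assumed threshold */

/* Called when descriptors are handed to the hardware. */
static void
extx_queued(extxring_t *tr, uint32_t n, uint64_t now)
{
	/* Restart the clock when an idle ring becomes busy. */
	if (tr->tr_pending == 0)
		tr->tr_last_progress = now;
	tr->tr_pending += n;
}

/* Called when transmit completions are processed. */
static void
extx_completed(extxring_t *tr, uint32_t ndone, uint64_t now)
{
	tr->tr_pending -= ndone;
	tr->tr_last_progress = now;
}

/* Called periodically; true means the ring looks stuck. */
static bool
extx_stalled(const extxring_t *tr, uint64_t now, bool link_up)
{
	/* An idle ring or a down link is not a stall. */
	if (!link_up || tr->tr_pending == 0)
		return (false);
	return (now - tr->tr_last_progress > EX_TX_STALL_NS);
}
```

Restarting the clock when an idle ring becomes busy avoids reporting a stall
merely because the ring had nothing to send for a long time.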
1698.Ss Driver Command Timeout Detection
1699Each device is programmed in different ways.
1700Some devices are programmed through asynchronous commands while others are
1701programmed by writing directly to memory mapped registers.
1702If a device receives asynchronous replies to commands, then the device driver
should set reasonable timeouts for all such commands and plan on detecting when they expire.
1704If a timeout occurs, the driver should presume that there is an issue with the
1705hardware and proceed to abort the command or reset the device.
1706.Pp
1707Many devices do not have such a communication mechanism.
1708However, whenever there is some activity where the device driver must wait, then
1709it should be prepared for the fact that the device may never get back to
1710it and react appropriately by performing some kind of device reset.
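.Pp
A bounded wait of this kind can be sketched as follows.
This is an illustrative userland sketch: the excmd_done_f callback type and
both functions are hypothetical, and the choice of ETIMEDOUT as the error is
an assumption.
A real driver would delay between polls, or block with a deadline, rather
than spin.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical callback reporting whether the device finished a command. */
typedef bool (*excmd_done_f)(void *arg);

/*
 * Poll for completion at most max_polls times.  The bound guarantees that
 * the driver regains control even if the device never responds.
 */
static int
excmd_wait(excmd_done_f done, void *arg, uint32_t max_polls)
{
	for (uint32_t i = 0; i < max_polls; i++) {
		if (done(arg))
			return (0);
	}

	/* The device never answered; the caller should abort or reset. */
	return (ETIMEDOUT);
}

/* Example callback: reports completion once a countdown reaches zero. */
static bool
excmd_countdown(void *arg)
{
	uint32_t *leftp = arg;

	if (*leftp == 0)
		return (true);
	(*leftp)--;
	return (false);
}
```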
1711.Ss Reacting to Errors
1712When any of the above categories of errors has been triggered, the
1713behavior that the device driver should take depends on the kind of
1714error.
If a fatal error occurred, for example, a transport error or transmit stall was
detected, or the device indicated an uncorrectable error, then it is
important that the driver take the following steps:
1718.Bl -enum -offset indent
1719.It
1720Set a flag in the device driver's state that indicates that it has hit
1721an error condition.
1722When this error condition flag is asserted, transmitted packets should be
1723accepted and dropped and actions that would require writing to the device state
1724should fail with an error.
1725This flag should remain until the device has been successfully restarted.
1726.It
If the error was not one already indicated by the fault management
architecture, such as a detected transport error, then
the device driver should post an
1730.Sy ereport
1731indicating what has occurred with the
1732.Xr ddi_fm_ereport_post 9F
1733function.
1734.It
1735The device driver should indicate that the device's service was lost
1736with a call to
1737.Xr ddi_fm_service_impact 9F
1738using the symbol
1739.Sy DDI_SERVICE_LOST .
1740.It
1741At this point the device driver should issue a device reset through some
1742device-specific means.
1743.It
1744When the device reset has been completed, then the device driver should
1745restore all of the programmed state to the device.
1746This includes things like the current MTU, advertised auto-negotiation speeds,
1747MAC address filters, and more.
1748.It
1749Finally, when service has been restored, the device driver should call
1750.Xr ddi_fm_service_impact 9F
1751using the symbol
1752.Sy DDI_SERVICE_RESTORED .
1753.El
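.Pp
The order of the steps above can be sketched as follows.
This is a minimal userland sketch: the exfm_t structure, the exstep_t markers,
and all of the functions are hypothetical, and each recorded step only stands
in for the real call named in its comment.
The sketch also assumes that the device reset succeeds; a real driver must
leave the error flag set if it does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical step markers so the order of operations can be observed. */
typedef enum exstep {
	EX_STEP_EREPORT,
	EX_STEP_SERVICE_LOST,
	EX_STEP_RESET,
	EX_STEP_RESTORE,
	EX_STEP_SERVICE_RESTORED
} exstep_t;

/* Hypothetical driver state: the error flag plus a log of steps taken. */
typedef struct exfm {
	bool ef_error;		/* set while the device is in an error state */
	exstep_t ef_log[8];
	uint32_t ef_nlog;
} exfm_t;

static void
exfm_record(exfm_t *ef, exstep_t step)
{
	if (ef->ef_nlog < 8)
		ef->ef_log[ef->ef_nlog++] = step;
}

/* Transmit path: accept and drop packets while the error flag is set. */
static bool
exfm_tx_should_drop(const exfm_t *ef)
{
	return (ef->ef_error);
}

/*
 * Fatal error handling.  fm_reported is true when the fault management
 * architecture already indicated the error, so no ereport is posted.
 */
static void
exfm_fatal(exfm_t *ef, bool fm_reported)
{
	ef->ef_error = true;
	if (!fm_reported)
		exfm_record(ef, EX_STEP_EREPORT); /* ddi_fm_ereport_post(9F) */
	exfm_record(ef, EX_STEP_SERVICE_LOST);	/* DDI_SERVICE_LOST */
	exfm_record(ef, EX_STEP_RESET);		/* device-specific reset */
	exfm_record(ef, EX_STEP_RESTORE);	/* MTU, filters, speeds */
	exfm_record(ef, EX_STEP_SERVICE_RESTORED); /* DDI_SERVICE_RESTORED */
	ef->ef_error = false;
}
```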
.Pp
When a non-fatal error occurs, the device driver should submit an
ereport and should optionally mark the device degraded using
.Xr ddi_fm_service_impact 9F
with the
.Sy DDI_SERVICE_DEGRADED
value depending on the nature of the problem that has occurred.
.Pp
Device drivers should never make the decision to remove a device from
service based on errors that have occurred nor should they panic the
system.
Rather, the device driver should always try to notify the operating system with
various ereports and allow its policy decisions to occur.
The decision to retire a device lies in the hands of the fault management
architecture.
It knows more about the operator's intent and the surrounding system's state
than the device driver itself does and it will make the call to offline and
retire the device if it is required.
.Ss Device Resets
When resetting a device, a device driver must exercise caution.
If a device driver has not been written to plan for a device reset, then it
may not correctly restore the device's state after such a reset.
Such state should be stored in the instance's private state data as the MAC
framework does not know about device resets and will not inform the
device again about the expected, programmed state.
.Pp
One wrinkle with device resets is that many networking cards show up as
multiple PCI functions on a single device, for example, each port may
show up as a separate function and thus have a separate instance of the
device driver attached.
When resetting a function, device driver writers should carefully read the
device programming manuals and verify whether a reset impacts only the
stalled function or all functions across the device.
.Pp
If the only way to reset a given function is to reset the entire device, then
this may require more coordination and work on the part of the device
driver to ensure that all the other instances are correctly restored.
In cases where this occurs, some devices offer ways of injecting
interrupts onto those other functions to notify them that this is
occurring.
.Sh MBLKS AND DMA
The networking stack manages framed data through the use of the
.Xr mblk 9S
structure.
The mblk allows for a single message to be made up of individual blocks.
Each part is linked together through its
.Sy b_cont
member.
However, it also allows for multiple messages to be chained together through the
use of the
.Sy b_next
member.
While the networking stack works with these structures, device drivers generally
work with DMA regions.
There are two different strategies that device drivers use for moving data
between the two: copying and binding.
.Ss Copying Data
The first way that device drivers handle interfacing between the two is
by having two separate regions of memory.
One part is memory which has been allocated for DMA through a call to
.Xr ddi_dma_mem_alloc 9F
and the other is memory associated with the memory block.
.Pp
In this case, a driver will use
.Xr bcopy 9F
to copy memory between the two distinct regions.
When transmitting a packet, it will copy the memory from the mblk_t to the DMA
region.
When receiving a packet, it will allocate an mblk_t through the
.Xr allocb 9F
routine, copy the memory across with
.Xr bcopy 9F ,
and then increment the mblk_t's
.Sy b_wptr
member by the number of bytes copied.
.Pp
If, when receiving, memory is not available for a new message block,
then the frame should be skipped and effectively dropped.
A kstat should be bumped when such an occasion occurs.
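.Pp
The receive side of the copying strategy can be sketched as follows.
This is a user-space illustration under stated assumptions:
.Vt example_mblk_t
is a hypothetical stand-in that exposes only the
.Sy b_rptr
and
.Sy b_wptr
members of
.Xr mblk 9S ,
.Fn example_allocb
stands in for
.Xr allocb 9F ,
and a plain counter stands in for the kstat.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in exposing only the mblk(9S) members used here. */
typedef struct example_mblk {
	unsigned char	*b_rptr;	/* first unread byte */
	unsigned char	*b_wptr;	/* first unwritten byte */
} example_mblk_t;

static uint64_t example_rx_alloc_fail;	/* would be exported as a kstat */

/* Stand-in for allocb(9F): returns NULL when no memory is available. */
static example_mblk_t *
example_allocb(size_t size)
{
	example_mblk_t *mp = malloc(sizeof (*mp) + size);

	if (mp == NULL)
		return (NULL);
	mp->b_rptr = mp->b_wptr = (unsigned char *)(mp + 1);
	return (mp);
}

/* Copy one received frame out of the DMA buffer into a new block. */
static example_mblk_t *
example_rx_copy(const void *dma_buf, size_t len)
{
	example_mblk_t *mp = example_allocb(len);

	if (mp == NULL) {
		example_rx_alloc_fail++;	/* drop the frame */
		return (NULL);
	}
	memcpy(mp->b_wptr, dma_buf, len);	/* bcopy(9F) in a driver */
	mp->b_wptr += len;			/* mark the bytes as valid */
	return (mp);
}
```
.Ed
.Pp
Because the frame now lives in the new block, the DMA buffer can be handed
back to the device as soon as the copy completes.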
.Ss Binding Data
An alternative approach to copying data is to use DMA binding.
When using DMA binding, the OS takes care of mapping between DMA memory and
normal device memory.
The exact process is a bit different between transmit and receive.
.Pp
When transmitting, a device driver has an mblk_t and needs to call the
.Xr ddi_dma_addr_bind_handle 9F
function to bind its data to an already existing DMA handle.
At that point, it will receive various DMA cookies that it can use to obtain the
addresses to program the device with for transmitting data.
Once the transmit is done, the driver must then make sure to call
.Xr freemsg 9F
to release the data.
It must not call
.Xr freemsg 9F
before it receives an interrupt from the device indicating that the data
has been transmitted, otherwise it risks sending arbitrary kernel
memory.
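.Pp
The ordering constraint on transmit can be sketched as follows.
This is a user-space illustration only: the
.Vt example_tcb_t
control block and the functions are hypothetical, and
.Fn example_freemsg
stands in for
.Xr freemsg 9F .
The comments mark where a driver would call
.Xr ddi_dma_addr_bind_handle 9F
and
.Xr ddi_dma_unbind_handle 9F .
.Bd -literal -offset indent
```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical transmit control block tracking one bound message. */
typedef struct example_tcb {
	void	*tcb_mp;	/* message whose memory the device may read */
	bool	tcb_bound;	/* a DMA binding is outstanding */
} example_tcb_t;

static int example_freed;	/* counts frees, for illustration only */

/* Stand-in for freemsg(9F). */
static void
example_freemsg(void *mp)
{
	free(mp);
	example_freed++;
}

/* Transmit path: bind and hand to the device, but keep ownership. */
static void
example_tx_bind(example_tcb_t *tcb, void *mp)
{
	/* A driver binds with ddi_dma_addr_bind_handle(9F) here. */
	tcb->tcb_mp = mp;
	tcb->tcb_bound = true;
}

/* Completion interrupt: only now is it safe to unbind and free. */
static void
example_tx_done(example_tcb_t *tcb)
{
	if (!tcb->tcb_bound)
		return;
	/* A driver calls ddi_dma_unbind_handle(9F) before freeing. */
	tcb->tcb_bound = false;
	example_freemsg(tcb->tcb_mp);
	tcb->tcb_mp = NULL;
}
```
.Ed
.Pp
Holding the message in the control block until completion is what keeps the
device from reading memory that has already been reused.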
.Pp
When receiving data, the device driver can perform a similar operation.
First, it must bind the DMA memory into the kernel's virtual memory address
space through a call to the
.Xr ddi_dma_addr_bind_handle 9F
function if it has not already.
Once it has, it must then call
.Xr desballoc 9F
to try to create a new mblk_t which leverages the associated memory.
It can then pass that mblk_t up to the stack.
.Ss Considerations
When deciding which of these options to use, there are many different
considerations that must be made.
The answer as to whether to bind memory or to copy data is not always simple.
.Pp
The first thing to remember is that DMA resources may be finite on a
given platform.
Consider the case of receiving data.
A device driver that binds one of its receive descriptors may not get it back
for quite some time as it may be used by the kernel until an application
actually consumes it.
Device drivers that try to bind memory for receive often work with the
constraint that they must be able to replace that DMA memory with another DMA
descriptor.
If it were not replaced, then eventually the device would not be able to
receive additional data into the ring.
.Pp
On the other hand, particularly for larger frames, copying every packet
from one buffer to another can be a source of additional latency and
memory waste in the system.
For larger copies, the cost of copying may dwarf any potential cost of
performing DMA binding.
.Pp
Device driver authors who are unsure of what to do should
first employ the copying method to simplify the act of writing the
device driver.
The copying method is simpler and also allows the device driver author not to
worry about allocated DMA memory that is still outstanding when the driver is
asked to unload.
.Pp
If device driver writers are worried about the cost, it is recommended
to control the decision as to whether to copy or bind DMA data with
a separate private property for each of transmitting and receiving.
Each private property should indicate the frame size at which
to switch from one method to the other.
This way, data can be gathered to determine what the impact of each method is on
a given platform.
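.Pp
A size threshold of this kind can be sketched as follows.
The
.Vt example_tunables_t
structure and its member names are hypothetical, and binding frames at or
above the threshold while copying smaller ones is an assumption of this
sketch, not a requirement of the framework.
.Bd -literal -offset indent
```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-direction tunables, each backed by a private property. */
typedef struct example_tunables {
	size_t	et_rx_bind_thresh;	/* bind frames at or above this size */
	size_t	et_tx_bind_thresh;
} example_tunables_t;

/* Returns true to bind the frame with DMA, false to copy it. */
static bool
example_should_bind(size_t frame_len, size_t thresh)
{
	return (frame_len >= thresh);
}
```
.Ed
.Pp
The receive path would pass
.Sy et_rx_bind_thresh
and the transmit path
.Sy et_tx_bind_thresh ,
so each direction can be tuned and measured independently.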
.Sh SEE ALSO
.Xr dladm 1M ,
.Xr driver.conf 4 ,
.Xr ieee802.3 5 ,
.Xr dlpi 7P ,
.Xr _fini 9E ,
.Xr _info 9E ,
.Xr _init 9E ,
.Xr attach 9E ,
.Xr close 9E ,
.Xr detach 9E ,
.Xr mc_close 9E ,
.Xr mc_getcapab 9E ,
.Xr mc_getprop 9E ,
.Xr mc_getstat 9E ,
.Xr mc_multicst 9E ,
.Xr mc_open 9E ,
.Xr mc_propinfo 9E ,
.Xr mc_setpromisc 9E ,
.Xr mc_setprop 9E ,
.Xr mc_start 9E ,
.Xr mc_stop 9E ,
.Xr mc_tx 9E ,
.Xr mc_unicst 9E ,
.Xr open 9E ,
.Xr allocb 9F ,
.Xr bcopy 9F ,
.Xr ddi_dma_addr_bind_handle 9F ,
.Xr ddi_dma_mem_alloc 9F ,
.Xr ddi_fm_acc_err_get 9F ,
.Xr ddi_fm_dma_err_get 9F ,
.Xr ddi_fm_ereport_post 9F ,
.Xr ddi_fm_fini 9F ,
.Xr ddi_fm_init 9F ,
.Xr ddi_fm_service_impact 9F ,
.Xr ddi_get8 9F ,
.Xr ddi_put8 9F ,
.Xr desballoc 9F ,
.Xr freemsg 9F ,
.Xr kstat_create 9F ,
.Xr mac_alloc 9F ,
.Xr mac_fini_ops 9F ,
.Xr mac_free 9F ,
.Xr mac_hcksum_get 9F ,
.Xr mac_hcksum_set 9F ,
.Xr mac_init_ops 9F ,
.Xr mac_link_update 9F ,
.Xr mac_lso_get 9F ,
.Xr mac_maxsdu_update 9F ,
.Xr mac_prop_info_set_default_link_flowctrl 9F ,
.Xr mac_prop_info_set_default_str 9F ,
.Xr mac_prop_info_set_default_uint32 9F ,
.Xr mac_prop_info_set_default_uint64 9F ,
.Xr mac_prop_info_set_default_uint8 9F ,
.Xr mac_prop_info_set_perm 9F ,
.Xr mac_prop_info_set_range_uint32 9F ,
.Xr mac_register 9F ,
.Xr mac_rx 9F ,
.Xr mac_unregister 9F ,
.Xr mod_install 9F ,
.Xr mod_remove 9F ,
.Xr strcmp 9F ,
.Xr timeout 9F ,
.Xr cb_ops 9S ,
.Xr ddi_device_acc_attr 9S ,
.Xr dev_ops 9S ,
.Xr mac_callbacks 9S ,
.Xr mac_register 9S ,
.Xr mblk 9S ,
.Xr modldrv 9S ,
.Xr modlinkage 9S
.Rs
.%A McCloghrie, K.
.%A Rose, M.
.%T RFC 1213 Management Information Base for Network Management of
.%T TCP/IP-based internets: MIB-II
.%D March 1991
.Re
.Rs
.%A McCloghrie, K.
.%A Kastenholz, F.
.%T RFC 1573 Evolution of the Interfaces Group of MIB-II
.%D January 1994
.Re
.Rs
.%A Kastenholz, F.
.%T RFC 1643 Definitions of Managed Objects for the Ethernet-like
.%T Interface Types
.Re