.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2016 Joyent, Inc.
.\"
.Dd March 26, 2017
.Dt MAC 9E
.Os
.Sh NAME
.Nm mac ,
.Nm GLDv3
.Nd MAC networking device driver overview
.Sh SYNOPSIS
.In sys/mac_provider.h
.In sys/mac_ether.h
.Sh INTERFACE LEVEL
illumos DDI specific
.Sh DESCRIPTION
The
.Sy MAC
framework provides a means for implementing high-performance networking
device drivers. It is the successor to the GLD interfaces and is
sometimes referred to as the GLDv3. The remainder of this manual
introduces the aspects of writing device drivers that leverage the MAC
framework. Although the terms GLDv3 and MAC framework refer to the same
thing, this manual page uses the term
.Em MAC framework
to refer to the device driver interface.
.Pp
MAC device drivers are character devices. They define the standard
.Xr _init 9E ,
.Xr _fini 9E ,
and
.Xr _info 9E
entry points to initialize the module, as well as
.Xr dev_ops 9S
and
.Xr cb_ops 9S
structures.
.Pp
The main interface with MAC is through a series of callbacks defined in
a
.Xr mac_callbacks 9S
structure. These callbacks control all aspects of the device. They
range from sending data and getting and setting properties to
controlling MAC address filters and managing promiscuous mode.
.Pp
The MAC framework takes care of many aspects of the device driver's
management. A device that uses the MAC framework does not have to worry
about creating device nodes or implementing
.Xr open 9E
or
.Xr close 9E
routines. In addition, all of the work to interact with
.Xr dlpi 7P
is taken care of automatically and transparently.
.Ss Initializing MAC Support
For a device to be used in the framework, it must register with the
framework and take specific actions during
.Xr _init 9E ,
.Xr attach 9E ,
.Xr detach 9E ,
and
.Xr _fini 9E .
.Pp
All device drivers have to define a
.Xr dev_ops 9S
structure which is pointed to by a
.Xr modldrv 9S
structure and the corresponding NULL-terminated
.Xr modlinkage 9S
structure. The
.Xr dev_ops 9S
structure should have a
.Xr cb_ops 9S
structure defined for it; however, it does not need to implement any of
the standard
.Xr cb_ops 9S
entry points.
.Pp
Normally, in a driver's
.Xr _init 9E
entry point, it passes its
.Sy modlinkage
structure directly to
.Xr mod_install 9F .
To properly register with MAC, the driver must call
.Xr mac_init_ops 9F
before it calls
.Xr mod_install 9F .
If for some reason the
.Xr mod_install 9F
function fails, then the driver must undo that registration by calling
.Xr mac_fini_ops 9F .
.Pp
Conversely, in the driver's
.Xr _fini 9E
routine, it should call
.Xr mac_fini_ops 9F
after it successfully calls
.Xr mod_remove 9F .
For an example of how to use the
.Xr mac_init_ops 9F
and
.Xr mac_fini_ops 9F
functions, see the examples section in
.Xr mac_init_ops 9F .
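.Pp
The ordering of these calls can be illustrated with the following
hypothetical C sketch. It models only the control flow: the
.Sy stub_*
functions are illustrative stand-ins that record the order in which
they run, whereas the real
.Xr mac_init_ops 9F ,
.Xr mod_install 9F ,
.Xr mod_remove 9F ,
and
.Xr mac_fini_ops 9F
operate on the driver's
.Sy dev_ops
and
.Sy modlinkage
structures.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-ins for mac_init_ops(9F), mod_install(9F),
 * mod_remove(9F) and mac_fini_ops(9F).  They only append their name to
 * call_log so that the calling order can be checked.
 */
static char call_log[128];
static int fail_mod_install;	/* simulate mod_install() failing */
static int fail_mod_remove;	/* simulate mod_remove() failing (busy) */

static void
stub_mac_init_ops(void)
{
	strcat(call_log, "init_ops ");
}

static void
stub_mac_fini_ops(void)
{
	strcat(call_log, "fini_ops ");
}

static int
stub_mod_install(void)
{
	strcat(call_log, "install ");
	return (fail_mod_install ? -1 : 0);
}

static int
stub_mod_remove(void)
{
	strcat(call_log, "remove ");
	return (fail_mod_remove ? -1 : 0);
}

/* Body of _init(9E): register with MAC first, undo it if install fails. */
int
sketch_init(void)
{
	int ret;

	stub_mac_init_ops();
	if ((ret = stub_mod_install()) != 0)
		stub_mac_fini_ops();
	return (ret);
}

/* Body of _fini(9E): only call mac_fini_ops() after mod_remove() works. */
int
sketch_fini(void)
{
	int ret;

	if ((ret = stub_mod_remove()) == 0)
		stub_mac_fini_ops();
	return (ret);
}
```
.Ed
.Pp
If
.Xr mod_remove 9F
fails, the module stays loaded and must remain registered with MAC,
which is why the sketch only tears down the MAC registration on success.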
.Ss Registering with MAC
Every instance of a device should register separately with MAC.
To register with MAC, a driver must allocate a
.Xr mac_register 9S
structure, fill it in, and then call
.Xr mac_register 9F .
The
.Sy mac_register_t
structure contains information about the device and all of the required
function pointers that will be used as callbacks by the framework.
.Pp
These steps should all be taken during a device's
.Xr attach 9E
entry point. It is recommended that the driver perform this sequence of
steps after the device has finished initializing the chipset and
interrupts, though interrupts should not yet be enabled at that point.
After it calls
.Xr mac_register 9F ,
it will start receiving callbacks from the MAC framework.
.Pp
To allocate the registration structure, the driver should call
.Xr mac_alloc 9F .
Device drivers should always pass the symbol
.Sy MAC_VERSION
as the argument to
.Xr mac_alloc 9F .
Upon successful completion, the driver will receive a
.Sy mac_register_t
structure which it should fill in. The structure and its members are
documented in
.Xr mac_register 9S .
.Pp
The
.Xr mac_callbacks 9S
structure is not allocated as a part of the
.Xr mac_register 9S
structure. In general, device drivers declare this statically. See the
.Sx MAC Callbacks
section for more information on how to fill it out.
.Pp
Once the structure has been filled in, the driver should call
.Xr mac_register 9F
to register itself with MAC. The handle that it receives from
.Xr mac_register 9F
should be stored in the driver's soft state, as it will be used in
various other support functions and callbacks.
.Pp
If the call is successful, then the device driver
should enable interrupts and finish any other initialization required.
If the call to
.Xr mac_register 9F
fails, then the driver should unwind its initialization and return
.Sy DDI_FAILURE
from its
.Xr attach 9E
routine.
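.Pp
The tail end of such an
.Xr attach 9E
routine might look like the following hypothetical sketch. The
.Sy sketch_*
types and the
.Sy stub_*
functions are illustrative stand-ins for the driver's soft state,
.Xr mac_alloc 9F ,
.Xr mac_register 9F ,
and
.Xr mac_free 9F .
Freeing the registration structure immediately after registering is an
assumption based on common driver practice; consult
.Xr mac_register 9F
for the authoritative rules.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical stand-ins: sketch_reg_t plays the role of mac_register_t
 * and the stub_* functions play mac_alloc(9F), mac_register(9F) and
 * mac_free(9F).  Return values 0 and -1 stand for DDI_SUCCESS and
 * DDI_FAILURE.
 */
typedef struct sketch_reg {
	void		*driver;	/* like m_driver: our soft state */
	unsigned int	max_sdu;	/* like m_max_sdu */
} sketch_reg_t;

typedef struct sketch_dev {
	void	*mac_handle;		/* kept in the soft state */
	int	hw_ready;		/* chipset and interrupts set up */
	int	intr_enabled;
} sketch_dev_t;

static int fail_register;		/* simulate mac_register() failing */
static int handle_token;		/* backs the opaque handle */

static sketch_reg_t *
stub_mac_alloc(void)
{
	return (calloc(1, sizeof (sketch_reg_t)));
}

static void
stub_mac_free(sketch_reg_t *reg)
{
	free(reg);
}

static int
stub_mac_register(sketch_reg_t *reg, void **hdlp)
{
	(void) reg;
	if (fail_register)
		return (-1);
	*hdlp = &handle_token;
	return (0);
}

/* The tail end of attach(9E). */
int
sketch_attach(sketch_dev_t *dev)
{
	sketch_reg_t *reg;
	int ret;

	dev->hw_ready = 1;		/* hardware set up, interrupts off */
	if ((reg = stub_mac_alloc()) == NULL)
		goto unwind;
	reg->driver = dev;
	reg->max_sdu = 1500;
	ret = stub_mac_register(reg, &dev->mac_handle);
	stub_mac_free(reg);		/* assumed: not needed afterwards */
	if (ret != 0)
		goto unwind;
	dev->intr_enabled = 1;		/* only now enable interrupts */
	return (0);
unwind:
	dev->hw_ready = 0;		/* undo chipset/interrupt setup */
	return (-1);
}
```
.Ed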
.Ss MAC Callbacks
The MAC framework interacts with a device driver through a series of
callbacks. These callbacks are described in their individual manual
pages and the collection of callbacks is indicated in the
.Xr mac_callbacks 9S
manual page. This section does not focus on the specific functions, but
rather on interactions between them and the rest of the device driver
framework.
.Pp
A device driver should make no assumptions about when the various
callbacks will be called and whether or not they will be called
simultaneously. For example, a device driver may be asked to
transmit data through a call to its
.Xr mc_tx 9E
entry point while it is being asked to get a device property through a
call to its
.Xr mc_getprop 9E
entry point. As such, while some calls to the device, such as setting
properties, may be serialized, the device driver should always presume
that all of its data needs to be protected with locks. While the device
is holding locks, it is safe for it to call the following MAC routines:
.Bl -bullet -offset indent -compact
.It
.Xr mac_hcksum_get 9F
.It
.Xr mac_hcksum_set 9F
.It
.Xr mac_lso_get 9F
.It
.Xr mac_maxsdu_update 9F
.It
.Xr mac_prop_info_set_default_link_flowctrl 9F
.It
.Xr mac_prop_info_set_default_str 9F
.It
.Xr mac_prop_info_set_default_uint8 9F
.It
.Xr mac_prop_info_set_default_uint32 9F
.It
.Xr mac_prop_info_set_default_uint64 9F
.It
.Xr mac_prop_info_set_perm 9F
.It
.Xr mac_prop_info_set_range_uint32 9F
.El
.Pp
Any other MAC routines, such as
.Xr mac_link_update 9F
or
.Xr mac_rx 9F ,
should not be called with locks held.
Routines elsewhere in the DDI may be called while locks are held;
however, device driver writers should be careful about calling blocking
routines while locks are held or in interrupt context, though it is
generally legal to do so.
.Ss Receiving Data
A device driver will often receive data by means of an
interrupt. When that interrupt occurs, the device driver will receive
one or more frames with optional metadata. Often each frame has a
corresponding descriptor which has information about whether or not
there were errors or whether or not the device successfully checksummed
the packet.
.Pp
During a single interrupt, a device driver should process no more than a
fixed number of frames. For each frame the device driver should:
.Bl -enum -offset indent
.It
First check whether or not the frame has errors. If errors were
detected, then the frame should not be sent to the operating system. It
is recommended that devices keep kstats (see
.Xr kstat_create 9F
for more information) and increment the corresponding counter whenever
such an error is detected. If the device distinguishes between the types
of errors, then separate kstats for each class of error are recommended.
See the
.Sx STATISTICS
section for more information on the various error cases that should be
considered.
.It
Once the frame has been determined to be valid, the device driver should
transform the frame into a
.Xr mblk 9S .
See the section
.Sx MBLKS AND DMA
for more information on how to transform and prepare a message block.
.It
If the device supports hardware checksumming (see the
.Sx CAPABILITIES
section for more information on checksumming), then the device driver
should set the corresponding checksumming information with a call to
.Xr mac_hcksum_set 9F .
.It
It should then append this new message block to the
.Em end
of the message block chain, linking it through the
.Sy b_next
pointer. It is vitally important that all the frames be chained in the
order that they were received. If the device driver mistakenly reorders
frames, then it may degrade the performance of the TCP stack and
potentially affect application correctness.
.El
.Pp
Once all the frames have been processed and assembled, the device driver
should deliver them to the rest of the operating system by calling
.Xr mac_rx 9F .
The device driver should try to give as many mblk_t structures as
possible to the system at once. It
.Em should not
call
.Xr mac_rx 9F
once for every assembled mblk_t.
.Pp
The device driver must not hold any locks across the call to
.Xr mac_rx 9F .
When this function is called, received data will be pushed through the
networking stack and some replies may be generated and given to the
driver to send out.
.Pp
It is not the device driver's responsibility to determine whether or not
the system can keep up with a driver's delivery rate of frames. The rest
of the networking stack will handle issues related to keeping up
appropriately and ensure that kernel memory is not exhausted by packets
that are not being processed.
.Pp
Finally, the device driver should make sure that any other housekeeping
activities required for the ring are taken care of such that more data
can be received.
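.Pp
The per-frame steps above can be sketched as follows. The
.Sy sk_mblk_t
and
.Sy sk_desc_t
types are hypothetical, simplified stand-ins for
.Xr mblk 9S
and a hardware receive descriptor. The sketch shows the bounded loop,
the error accounting, and tail insertion that preserves receive order;
it leaves out DMA handling and the actual call to
.Xr mac_rx 9F .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins: sk_mblk_t models only the b_next link of an
 * mblk(9S) and sk_desc_t models one receive descriptor whose buffer has
 * already been turned into an mblk.
 */
typedef struct sk_mblk {
	struct sk_mblk	*b_next;	/* next frame in the chain */
	int		id;
} sk_mblk_t;

typedef struct {
	int		error;		/* hardware flagged a bad frame */
	sk_mblk_t	*mp;
} sk_desc_t;

/*
 * Process at most budget descriptors.  Good frames are appended at the
 * tail so receive order is preserved; bad frames are only counted (a
 * real driver would also recycle their buffers).  The returned chain is
 * meant for a single mac_rx() call made after dropping all locks.
 */
sk_mblk_t *
sk_rx_ring(sk_desc_t *ring, size_t n, size_t budget,
    unsigned long *ierrors)
{
	sk_mblk_t *head = NULL, **tailp = &head;
	size_t i;

	for (i = 0; i < n && i < budget; i++) {
		if (ring[i].error) {
			(*ierrors)++;		/* e.g. bump an error kstat */
			continue;
		}
		/* mac_hcksum_set() would be called on ring[i].mp here. */
		ring[i].mp->b_next = NULL;
		*tailp = ring[i].mp;
		tailp = &ring[i].mp->b_next;
	}
	return (head);
}
```
.Ed
.Pp
Keeping a pointer to the last
.Sy b_next
field makes each append constant time while keeping frames in the order
in which they arrived.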
.Ss Transmitting Data and Back Pressure
A device driver will be asked to transmit a message block chain by
having its
.Xr mc_tx 9E
entry point called. While the driver is processing the message blocks,
it may run out of resources. For example, a transmit descriptor ring may
become full. At that point, the device driver should return the
remaining unprocessed frames. The act of returning frames indicates that
the device has asserted flow control.
Once this has been done, no additional calls will be made to the
driver's transmit entry point and the back pressure will be propagated
throughout the rest of the networking stack.
.Pp
At some point in the future when resources have become available again,
for example after an interrupt indicating that some portion of the
transmit ring has been sent, the device driver must notify the
system that it can continue transmission. To do this, the
driver should call
.Xr mac_tx_update 9F .
After that point, the driver will receive calls to its
.Xr mc_tx 9E
entry point again. As mentioned in the section on callbacks, the device
driver should avoid holding any particular locks across the call to
.Xr mac_tx_update 9F .
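.Pp
A hypothetical sketch of this back-pressure protocol follows. The
transmit ring is modeled only as a count of free descriptors,
.Sy sk_mblk_t
models only the
.Sy b_next
link, and
.Sy stub_mac_tx_update
stands in for
.Xr mac_tx_update 9F .
Comments mark where a real driver would take and drop its transmit
lock; the notification happens only after the lock is dropped.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for an mblk chain and a transmit ring. */
typedef struct sk_mblk {
	struct sk_mblk	*b_next;
} sk_mblk_t;

typedef struct {
	int	free_slots;	/* free transmit descriptors */
	int	blocked;	/* frames were handed back to MAC */
	int	tx_updates;	/* times we called "mac_tx_update" */
} sk_ring_t;

static void
stub_mac_tx_update(sk_ring_t *r)
{
	r->tx_updates++;
}

/* mc_tx(9E)-style: NULL if all frames were queued, else the remainder. */
sk_mblk_t *
sk_tx(sk_ring_t *r, sk_mblk_t *chain)
{
	/* (transmit lock taken here in a real driver) */
	while (chain != NULL) {
		if (r->free_slots == 0) {
			r->blocked = 1;		/* assert flow control */
			return (chain);
		}
		sk_mblk_t *next = chain->b_next;
		chain->b_next = NULL;
		r->free_slots--;		/* post the frame to hardware */
		chain = next;
	}
	return (NULL);
}

/* Called from the transmit-complete interrupt after reclaiming. */
void
sk_tx_reclaim(sk_ring_t *r, int reclaimed)
{
	int notify;

	/* (transmit lock taken here in a real driver) */
	r->free_slots += reclaimed;
	notify = r->blocked && reclaimed > 0;
	if (notify)
		r->blocked = 0;
	/* (lock dropped here, before calling back into MAC) */
	if (notify)
		stub_mac_tx_update(r);
}
```
.Ed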
.Ss Interrupt Coalescing
For devices operating at higher data rates, interrupt coalescing is an
important part of a well-functioning device and can significantly affect
its performance. Not all devices support interrupt
coalescing. If interrupt coalescing is supported on the device, it is
recommended that device driver writers provide private properties for
their device to control the interrupt coalescing rate. This will make it
much easier to perform experiments and observe the impact of different
interrupt rates on the rest of the system.
.Ss MAC Address Filter Management
The MAC framework will attempt to use as many MAC address filters as a
device has. To program a multicast address filter, the driver's
.Xr mc_multicst 9E
entry point will be called. If the device driver runs out of filters, it
should not take any special action and should simply return the
appropriate error as documented in the corresponding manual pages for
the entry points. The framework will ensure that the device is placed in
promiscuous mode if it needs to be.
.Ss Link Updates
It is the responsibility of the device driver to keep track of the
data link's state. Many devices provide a means of receiving an
interrupt when the state of the link changes. When such a change
happens, the driver should update its internal data structures and then
call
.Xr mac_link_update 9F
to inform the MAC layer that this has occurred. If the device driver
does not properly inform the system about link changes, then various
features like link aggregations and other mechanisms that leverage the
link state will not work correctly.
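.Pp
The following hypothetical sketch combines this with the locking rules
from the
.Sx MAC Callbacks
section: the driver updates its own state while holding its lock and
only calls
.Xr mac_link_update 9F
after dropping it. The
.Sy lock_held
flag models a mutex and
.Sy stub_mac_link_update
stands in for the real function so that the rule can be checked with an
assertion. Notifying only on an actual change is a choice of this
sketch.
.Bd -literal -offset indent
```c
#include <assert.h>

/*
 * Hypothetical sketch of a link-status interrupt handler.  lock_held
 * models the driver's mutex (a real driver would use mutex_enter(9F) and
 * mutex_exit(9F)); sk_link_t mirrors the LINK_STATE_* values.
 */
typedef enum { SK_LINK_UNKNOWN, SK_LINK_DOWN, SK_LINK_UP } sk_link_t;

typedef struct {
	int		lock_held;
	sk_link_t	link;		/* last state reported to MAC */
	int		updates;	/* number of notifications sent */
} sk_dev_t;

static void
stub_mac_link_update(sk_dev_t *d, sk_link_t state)
{
	assert(!d->lock_held);		/* never call into MAC with locks */
	(void) state;
	d->updates++;
}

void
sk_link_intr(sk_dev_t *d, int phy_link_up)
{
	sk_link_t new_state = phy_link_up ? SK_LINK_UP : SK_LINK_DOWN;
	int changed;

	d->lock_held = 1;		/* mutex_enter() */
	changed = (new_state != d->link);
	d->link = new_state;		/* update internal state */
	d->lock_held = 0;		/* mutex_exit() */

	if (changed)
		stub_mac_link_update(d, new_state);
}
```
.Ed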
.Ss Link Speed and Auto-negotiation
Many networking devices can operate at more than one speed. The
selection of a speed is often performed through
.Em auto-negotiation ,
though some devices allow the user to control what speeds are advertised
and used.
.Pp
Logically, there are two different sets of things that the device driver
needs to keep track of while it's operating:
.Bl -enum
.It
The supported speeds in hardware.
.It
The enabled speeds from the user.
.El
.Pp
By default, when a link first comes up, the device driver should
generally configure the link to support the common set of speeds and
perform auto-negotiation.
.Pp
A user can control what speeds a device advertises via auto-negotiation
and whether or not it performs auto-negotiation at all by using a series
of properties that have
.Sy _EN_
in the name. These are read/write properties and there is one for each
speed supported in the operating system. For a full list of them, see
the
.Sx PROPERTIES
section.
.Pp
In addition to these properties, there is a corresponding set of
properties with
.Sy _ADV_
in the name. These are similar to the
.Sy _EN_
family of properties, but they are read-only and indicate what the
device has actually negotiated. While they are generally similar to the
.Sy _EN_
family of properties, they may change depending on power settings. See
the
.Sy Ethernet Link Properties
section in
.Xr dladm 1M
for more information.
.Pp
It's worth discussing how these different values get used throughout the
different entry points. The first entry point to consider is the
.Xr mc_propinfo 9E
entry point. For a given speed, the driver should first check whether
the hardware supports that speed. If it does, it should fill in the
default value that the hardware takes and indicate whether or not the
property is writable. This holds for both the
.Sy _EN_
and
.Sy _ADV_
families of properties.
.Pp
The next entry point is
.Xr mc_getprop 9E .
Here, the device should first check whether the given speed is
supported. If it is not, then the driver should return
.Er ENOTSUP .
If it is, then it should return the current value of the property.
.Pp
The last property entry point is the
.Xr mc_setprop 9E
entry point. Here, the same logic applies. Before the driver considers
whether or not the property is writable, it should first check whether
or not it's a supported property. If it's not, then it should return
.Er ENOTSUP .
Otherwise, it should proceed to check whether the property is writable.
If it is and the new value is valid, then the driver should update the
property and restart the link's negotiation.
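.Pp
The checks described for
.Xr mc_getprop 9E
and
.Xr mc_setprop 9E
can be sketched as below. The
.Sy sk_speed_t
enumeration and the per-speed bit masks are hypothetical; a real driver
switches on
.Sy mac_prop_id_t
values such as
.Sy MAC_PROP_EN_1000FDX_CAP .
The error returned for a speed whose advertisement cannot be changed,
and treating only 0 and 1 as valid values, are assumptions of this
sketch; see
.Xr mc_setprop 9E
for the documented error values.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical speeds and PHY state; one bit per speed. */
typedef enum { SK_10FDX, SK_100FDX, SK_1000FDX } sk_speed_t;

typedef struct {
	uint32_t	hw_supported;	/* speeds the hardware supports */
	uint32_t	en;		/* _EN_ values chosen by the user */
	int		can_set_adv;	/* PHY allows changing advertisement */
	int		renegotiations;	/* link restarts requested */
} sk_phy_t;

/* Get an _EN_ property: unsupported speeds are rejected first. */
int
sk_get_en(const sk_phy_t *p, sk_speed_t s, uint8_t *valp)
{
	if (!(p->hw_supported & (1u << s)))
		return (ENOTSUP);
	*valp = (p->en & (1u << s)) != 0;
	return (0);
}

/* Set an _EN_ property: support, then writability, then the value. */
int
sk_set_en(sk_phy_t *p, sk_speed_t s, uint8_t val)
{
	if (!(p->hw_supported & (1u << s)))
		return (ENOTSUP);
	if (!p->can_set_adv)
		return (ENOTSUP);	/* assumed error, see above */
	if (val > 1)
		return (EINVAL);
	if (val)
		p->en |= 1u << s;
	else
		p->en &= ~(1u << s);
	p->renegotiations++;		/* restart auto-negotiation */
	return (0);
}
```
.Ed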
.Pp
Finally, there is the
.Xr mc_getstat 9E
entry point. Several of the statistics that are queried relate to
auto-negotiation and hardware capabilities. When a statistic relates to
the hardware supporting a given speed, the
.Sy _EN_
properties should be ignored. The only thing that should be consulted is
what the hardware itself supports. Otherwise, the statistics should look
at what is currently being advertised by the device.
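.Pp
As a hypothetical sketch, the decision reduces to choosing which piece
of state to consult.
.Sy ETHER_STAT_CAP_1000FDX
and
.Sy ETHER_STAT_ADV_CAP_1000FDX
are examples of the two kinds of statistic; the
.Sy sk_*
names below are illustrative stand-ins, not part of the MAC interface.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical sketch: SK_STAT_CAP models statistics such as
 * ETHER_STAT_CAP_1000FDX and SK_STAT_ADV_CAP models ones such as
 * ETHER_STAT_ADV_CAP_1000FDX.
 */
typedef enum { SK_STAT_CAP, SK_STAT_ADV_CAP } sk_stat_kind_t;

typedef struct {
	uint32_t	hw_supported;	/* bit per speed the hardware supports */
	uint32_t	en;		/* user's _EN_ settings; not consulted */
	uint32_t	advertising;	/* what the PHY currently advertises */
} sk_phy_state_t;

uint64_t
sk_speed_stat(const sk_phy_state_t *p, sk_stat_kind_t kind,
    unsigned int bit)
{
	switch (kind) {
	case SK_STAT_CAP:
		/* Hardware capability: ignore the _EN_ settings. */
		return ((p->hw_supported >> bit) & 1);
	case SK_STAT_ADV_CAP:
	default:
		/* Everything else reflects what is being advertised. */
		return ((p->advertising >> bit) & 1);
	}
}
```
.Ed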
.Ss Unregistering from MAC
During a driver's
.Xr detach 9E
routine, it should unregister the device instance from MAC by calling
.Xr mac_unregister 9F
on the handle that it obtained from
.Xr mac_register 9F .
If the call to
.Xr mac_unregister 9F
fails, then the device is likely still in use and the driver should
fail the call to
.Xr detach 9E .
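.Pp
A hypothetical sketch of this check is shown below, with
.Sy stub_mac_unregister
standing in for
.Xr mac_unregister 9F
and a simulated
.Sy in_use
flag. The point is the ordering: nothing is torn down until
unregistration has succeeded.
.Bd -literal -offset indent
```c
#include <assert.h>

/* Hypothetical soft state; in_use simulates a busy device. */
typedef struct {
	void	*mac_handle;
	int	in_use;
	int	torn_down;
} sk_dev_t;

/* Stand-in for mac_unregister(9F), which can fail while in use. */
static int
stub_mac_unregister(sk_dev_t *d)
{
	return (d->in_use ? -1 : 0);
}

/* Start of detach(9E); 0 stands for DDI_SUCCESS, -1 for DDI_FAILURE. */
int
sk_detach(sk_dev_t *d)
{
	if (stub_mac_unregister(d) != 0)
		return (-1);		/* leave the instance intact */
	d->mac_handle = 0;
	d->torn_down = 1;		/* now stop hardware, free memory */
	return (0);
}
```
.Ed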
.Ss Interacting with Devices
Administrators primarily interact with devices through the
.Xr dladm 1M
command line interface. The state of devices, such as whether the link
is considered
.Sy up
or
.Sy down ,
and various link properties, such as the
.Sy MTU ,
.Sy auto-negotiation
state,
and
.Sy flow control
state,
are all exposed. It is also the preferred way that these properties are
set and configured.
.Pp
While device tunables may be presented in a
.Xr driver.conf 4
file, it is recommended instead to expose such things through
.Xr dladm 1M
private properties, whether explicitly documented or not.
.Sh CAPABILITIES
Capabilities in the MAC framework are optional features that indicate
which hardware features a device supports. The two stable capabilities
that the system currently supports relate to the hardware's ability to
perform large send offload (LSO), often also known as TCP segmentation
offload, and its ability to calculate and verify the checksums present
in IPv4, IPv6, and protocol headers such as TCP and UDP.
.Pp
The MAC framework will query a device for support of a capability
through the
.Xr mc_getcapab 9E
entry point. Each capability has its own constant and may have
corresponding data that goes along with it and a specific structure that
the device is required to fill in. Note that the set of capabilities
changes over time and there are also private capabilities in the system.
Several of the capabilities are used in the implementation of the MAC
framework. Others, like
.Sy MAC_CAPAB_RINGS ,
represent features that have not been stabilized and thus both API and
binary compatibility for them is not guaranteed. It is important that
the device driver handles unknown capabilities correctly. For more
information, see
.Xr mc_getcapab 9E .
.Pp
The following capabilities are
stable and defined in the system:
.Ss MAC_CAPAB_HCKSUM
The
.Sy MAC_CAPAB_HCKSUM
capability indicates to the system that the device driver supports some
amount of checksumming. The specific data for this capability is a
pointer to a
.Sy uint32_t .
To indicate no support for any kind of checksumming, the driver should
either set this value to zero or simply return that it doesn't support
the capability.
.Pp
Note that the values that the driver declares in this capability
indicate what it can do when it transmits data. If the driver can only
verify checksums when receiving data, then it should not indicate that
it supports this capability. The following set of flags may be combined
through a bitwise inclusive OR:
.Bl -tag -width Ds
.It Sy HCKSUM_INET_PARTIAL
This indicates that the hardware can calculate a partial checksum for
both IPv4 and IPv6; however, it requires the pseudo-header checksum to
be calculated for it. The pseudo-header checksum will be available for
the mblk_t when calling
.Xr mac_hcksum_get 9F .
Note this does not imply that the hardware is capable of calculating the
IPv4 header checksum. That should be indicated with the
.Sy HCKSUM_IPHDRCKSUM
flag.
.It Sy HCKSUM_INET_FULL_V4
This indicates that the hardware will fully calculate the L4 checksum
for outgoing IPv4 packets and does not require a pseudo-header checksum.
Note this does not imply that the hardware is capable of calculating the
IPv4 header checksum. That should be indicated with the
.Sy HCKSUM_IPHDRCKSUM
flag.
.It Sy HCKSUM_INET_FULL_V6
This indicates that the hardware will fully calculate the L4 checksum
for outgoing IPv6 packets and does not require a pseudo-header checksum.
.It Sy HCKSUM_IPHDRCKSUM
This indicates that the hardware supports calculating the checksum for
the IPv4 header itself.
.El
.Pp
When in a driver's transmit function, the driver will be processing a
single frame. It should call
.Xr mac_hcksum_get 9F
to see what checksum flags are set on it. Note that the flags that are
set on it are different from the ones described above and are documented
in its manual page. These flags indicate how the driver is expected to
program the hardware and what checksumming is required. Not every frame
will require or request hardware checksumming.
.Pp
If a driver supports offloading receive checksum verification,
it should check to see what the hardware indicated was verified. The
driver should then call
.Xr mac_hcksum_set 9F .
The flags used are different from the ones above and are discussed in
detail in the
.Xr mac_hcksum_set 9F
manual page. If there is no checksum information available or the driver
does not support checksumming, then it should simply not call
.Xr mac_hcksum_set 9F .
.Pp
Note that the checksum flags should be set on the first
mblk_t that makes up a given message. In other words, if multiple
mblk_t structures are linked together by the
.Sy b_cont
member to describe a single frame, then the flags should only be set on
the first mblk_t of that set. However, each distinct message should have
the checksum bits set on it, if applicable. In other words, each mblk_t
that is linked together by the
.Sy b_next
pointer may have checksum flags set.
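.Pp
The following hypothetical sketch models a received chain in which
.Sy b_cont
joins the fragments of one frame and
.Sy b_next
joins frames. The
.Sy cksum_flags
member stands in for the state that
.Xr mac_hcksum_set 9F
records; a real driver calls that function on each frame's first
mblk_t instead of writing a field directly.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mblk: only the links and a flag field. */
typedef struct sk_mblk {
	struct sk_mblk	*b_next;	/* next frame */
	struct sk_mblk	*b_cont;	/* next fragment of this frame */
	uint32_t	cksum_flags;	/* stands in for mac_hcksum_set() */
} sk_mblk_t;

/*
 * frame_flags holds one entry per frame on the b_next chain.  Only the
 * first mblk of each frame is touched; b_cont fragments are skipped,
 * and a zero entry means "no checksum information", so nothing is set.
 */
void
sk_set_rx_cksum(sk_mblk_t *chain, const uint32_t *frame_flags)
{
	size_t i = 0;

	for (sk_mblk_t *mp = chain; mp != NULL; mp = mp->b_next, i++) {
		if (frame_flags[i] != 0)
			mp->cksum_flags = frame_flags[i];
	}
}
```
.Ed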
.Pp
It is recommended that device drivers provide a private property or
.Xr driver.conf 4
property to control whether or not checksumming is enabled for both rx
and tx; however, it is recommended that both be enabled by default.
This way, if hardware bugs are found in the checksumming
implementation, checksum offload can be disabled without requiring
software updates.
The transmit property should be checked when determining how to reply to
.Xr mc_getcapab 9E
and the receive property should be checked in the context of the receive
function.
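.Pp
A hypothetical sketch of honoring such tunables follows. The
.Sy tx_hcksum_enable
and
.Sy rx_hcksum_enable
members stand in for whatever private property or
.Xr driver.conf 4
setting the driver exposes, and
.Sy hw_tx_flags
stands in for the
.Sy HCKSUM_*
bits that the hardware supports; no real flag values are assumed.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical soft state; both tunables are meant to default to on. */
typedef struct {
	bool		tx_hcksum_enable;
	bool		rx_hcksum_enable;
	uint32_t	hw_tx_flags;	/* HCKSUM_* bits the hardware has */
} sk_dev_t;

/* mc_getcapab(9E)-style answer for MAC_CAPAB_HCKSUM. */
bool
sk_capab_hcksum(const sk_dev_t *d, uint32_t *txflags)
{
	if (!d->tx_hcksum_enable || d->hw_tx_flags == 0)
		return (false);	/* report the capability as unsupported */
	*txflags = d->hw_tx_flags;
	return (true);
}

/* Receive path: only pass on hardware results if rx offload is on. */
uint32_t
sk_rx_cksum_flags(const sk_dev_t *d, uint32_t hw_verified)
{
	return (d->rx_hcksum_enable ? hw_verified : 0);
}
```
.Ed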
.Ss MAC_CAPAB_LSO
The
.Sy MAC_CAPAB_LSO
capability indicates that the driver supports various forms of large
send offload (LSO). The private data is a pointer to a
.Sy mac_capab_lso_t
structure. At the moment, LSO support is limited to TCP over IPv4.
This structure has the following members which are used to indicate
various types of LSO support.
.Bd -literal -offset indent
t_uscalar_t		lso_flags;
lso_basic_tcp_ipv4_t	lso_basic_tcp_ipv4;
.Ed
.Pp
The
.Sy lso_flags
member is used to indicate which members are valid and should be
considered. Each flag represents a different form of LSO. The member
should be set to the bitwise inclusive OR of the following values:
.Bl -tag -width Dv -offset indent
.It Sy LSO_TX_BASIC_TCP_IPV4
This indicates hardware support for performing TCP segmentation
offloading over IPv4. When this flag is set, the
.Sy lso_basic_tcp_ipv4
member must be filled in.
.El
.Pp
The
.Sy lso_basic_tcp_ipv4
member is a structure with the following members:
.Bd -literal -offset indent
t_uscalar_t	lso_max;
.Ed
.Bd -filled -offset indent
The
.Sy lso_max
member should be set to the maximum size of the TCP data
payload that can be offloaded to the hardware.
.Ed
.Pp
As with checksumming, it is recommended that driver writers provide a
means for disabling LSO support even if it is enabled by default.
This way, if problems with LSO are discovered, they can be worked around
without requiring additional driver work.
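.Pp
A hypothetical sketch of filling in this capability while honoring such
a tunable is shown below. To stay self-contained it uses a local mirror
of the structure with the same member names and a made-up flag value; a
real driver uses
.Sy mac_capab_lso_t
and
.Sy LSO_TX_BASIC_TCP_IPV4
from the system headers.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value; the real constant comes from the system headers. */
#define	SK_LSO_TX_BASIC_TCP_IPV4	0x1

/* Local mirror of mac_capab_lso_t, for illustration only. */
typedef struct {
	uint32_t	lso_flags;
	struct {
		uint32_t	lso_max;	/* max TCP payload per LSO */
	} lso_basic_tcp_ipv4;
} sk_capab_lso_t;

/* mc_getcapab(9E)-style answer for MAC_CAPAB_LSO. */
bool
sk_capab_lso(bool lso_enable, uint32_t hw_max_payload, sk_capab_lso_t *cap)
{
	if (!lso_enable || hw_max_payload == 0)
		return (false);		/* tunable off or no hardware support */
	cap->lso_flags = SK_LSO_TX_BASIC_TCP_IPV4;
	cap->lso_basic_tcp_ipv4.lso_max = hw_max_payload;
	return (true);
}
```
.Ed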
.Sh PROPERTIES
Properties in the MAC framework represent aspects of a link. These
include things like the link's current state and MTU. Many of the
properties in the system are focused around auto-negotiation and
controlling what link speeds are advertised. Information about
properties is covered by three different device entry points. The
.Xr mc_propinfo 9E
entry point obtains metadata about the property. The
.Xr mc_getprop 9E
entry point obtains the property's current value. The
.Xr mc_setprop 9E
entry point updates the property to a new value.
.Pp
Many of the properties listed below are read-only. Each property
indicates whether it's read-only or it's read/write. However, driver
writers may not implement the ability to set all writable properties.
Many of these depend on the card itself. In particular, all properties
that relate to auto-negotiation and are read/write may not be updated
if the hardware in question does not support toggling what link speeds
are auto-negotiated. While copper Ethernet often does not have this
restriction, it often exists with various fiber standards and PHYs.
.Pp
The following properties are the subset of MAC framework properties that
driver writers should be aware of and handle. While other properties
exist in the system, driver writers should always return an error when a
property not listed below is encountered. See
.Xr mc_getprop 9E
and
.Xr mc_setprop 9E
for more information on how to handle them.
.Bl -hang -width Ds
.It Sy MAC_PROP_DUPLEX
.Bd -filled -compact
Type:
.Sy link_duplex_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_DUPLEX
property is used to indicate the link's duplex mode. A full-duplex
link may have traffic flowing in both directions at the same time. The
.Sy link_duplex_t
is an enumeration which may be set to any of the following values:
.Bl -tag -width Ds
.It Sy LINK_DUPLEX_UNKNOWN
The current state of the link is unknown. This may be because the link
has not negotiated to a specific speed or it is down.
.It Sy LINK_DUPLEX_HALF
The link is running at half duplex. Communication may travel in only one
direction on the link at a given time.
.It Sy LINK_DUPLEX_FULL
The link is running at full duplex. Communication may travel in both
directions on the link simultaneously.
.El
.It Sy MAC_PROP_SPEED
.Bd -filled -compact
Type:
.Sy uint64_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_SPEED
property stores the current link speed in bits per second. A link
that is running at 100 Mbit/s would store the value 100000000ULL. A link
that is running at 40 Gbit/s would store the value 40000000000ULL.
.It Sy MAC_PROP_STATUS
.Bd -filled -compact
Type:
.Sy link_state_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_STATUS
property is used to indicate the current state of the link. It indicates
whether the link is up or down. The
.Sy link_state_t
is an enumeration which may be set to any of the following values:
.Bl -tag -width Ds
.It Sy LINK_STATE_UNKNOWN
The current state of the link is unknown. This may be because the
driver's
.Xr mc_start 9E
entry point has not been called, so it has not attempted to start the
link.
.It Sy LINK_STATE_DOWN
The link is down. This may be because of a negotiation problem, a cable
problem, or some other device-specific issue.
.It Sy LINK_STATE_UP
The link is up. If auto-negotiation is in use, it should have completed.
Traffic should be able to flow over the link, barring other issues.
.El
.It Sy MAC_PROP_AUTONEG
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_AUTONEG
property indicates whether or not the device is currently configured to
perform auto-negotiation. A value of
.Sy 0
indicates that auto-negotiation is disabled. A
.Sy non-zero
value indicates that auto-negotiation is enabled. Devices should
generally default to enabling auto-negotiation.
.Pp
When getting this property, the device driver should return the current
state. When setting this property, if the device supports operating in
the requested mode, then the device driver should update any internal
registers and then reset the link so that it renegotiates using the new
setting.
.It Sy MAC_PROP_MTU
.Bd -filled -compact
Type:
.Sy uint32_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_MTU
property determines the maximum transmission unit (MTU). This indicates
the maximum size packet that the device can transmit, ignoring its own
headers. For an Ethernet device, this would exclude the size of the
Ethernet header and any VLAN headers that would be placed. It is up to
the driver to ensure that any MTU value it accepts, once its margin and
header sizes are added, does not exceed its maximum frame size.
.Pp
By default, Ethernet drivers should initialize the MTU to
.Sy 1500 .
When getting this property, the driver should return its current
recorded MTU. When setting this property, the driver should first
validate that it is within the device's valid range and then it must
call
.Xr mac_maxsdu_update 9F .
Note that the call may fail. If the call completes successfully, the
driver should update the hardware with the new value of the MTU and
perform any other work needed to handle it.
.Pp
If the device does not support changing the MTU after the device's
.Xr mc_start 9E
entry point has been called, then driver writers should return
.Er EBUSY .
.It Sy MAC_PROP_FLOWCTRL
.Bd -filled -compact
Type:
.Sy link_flowctrl_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_FLOWCTRL
property manages the configuration of pause frames as part of Ethernet
flow control. Note that this only describes what this device will
advertise. What is actually enabled may be different and is subject to
the rules of auto-negotiation. The
.Sy link_flowctrl_t
is an enumeration that may be set to one of the following values:
.Bl -tag -width Ds
.It Sy LINK_FLOWCTRL_NONE
Flow control is disabled. No pause frames should be generated or
honored.
.It Sy LINK_FLOWCTRL_RX
The device can receive pause frames; however, it should not generate
them.
.It Sy LINK_FLOWCTRL_TX
The device can generate pause frames; however, it does not support
receiving them.
.It Sy LINK_FLOWCTRL_BI
The device supports both sending and receiving pause frames.
.El
.Pp
When getting this property, the device driver should return the way that
it has configured the device, not what the device has actually
negotiated. When setting the property, it should update the hardware and
allow the link to potentially perform auto-negotiation again.
.El
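.Pp
The steps for setting
.Sy MAC_PROP_MTU
described above can be sketched as follows. The MTU limits are
illustrative, the
.Sy sk_dev_t
soft state is hypothetical, and
.Sy stub_maxsdu_update
stands in for
.Xr mac_maxsdu_update 9F ,
which may fail; the error it returns here is arbitrary.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define	SK_MIN_MTU	576	/* illustrative device limits */
#define	SK_MAX_MTU	9000

/* Hypothetical soft state. */
typedef struct {
	uint32_t	mtu;
	int		started;		/* mc_start(9E) was called */
	int		can_change_running;	/* MTU changeable while started */
	int		maxsdu_fails;		/* simulate a failing update */
} sk_dev_t;

/* Stand-in for mac_maxsdu_update(9F). */
static int
stub_maxsdu_update(sk_dev_t *d, uint32_t sdu)
{
	(void) sdu;
	return (d->maxsdu_fails ? EINVAL : 0);
}

/* mc_setprop(9E)-style handling of MAC_PROP_MTU. */
int
sk_set_mtu(sk_dev_t *d, uint32_t new_mtu)
{
	int ret;

	if (new_mtu < SK_MIN_MTU || new_mtu > SK_MAX_MTU)
		return (EINVAL);
	if (d->started && !d->can_change_running)
		return (EBUSY);
	if ((ret = stub_maxsdu_update(d, new_mtu)) != 0)
		return (ret);		/* MAC refused; keep the old MTU */
	d->mtu = new_mtu;		/* then reprogram the hardware */
	return (0);
}
```
.Ed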
.Pp
The remaining properties are all about various auto-negotiation link
speeds. They fall into two different buckets: properties with
.Sy _ADV_
in the name and properties with
.Sy _EN_
in the name. For any given supported speed, there is one of each. The
.Sy _EN_
set of properties are read/write properties that control what should be
advertised by the device. When these are retrieved, they should return
the current value of the property. When they are set, they should change
how the hardware advertises the specific speed and trigger any necessary
link reset and, if enabled, auto-negotiation.
.Pp
The
.Sy _ADV_
set of properties are read-only properties. They are meant to reflect
what has actually been negotiated. These may be different from the
.Sy _EN_
family of properties, especially when different power management
settings are at play.
.Pp
See the
.Sx Link Speed and Auto-negotiation
section for more information.
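.Pp
The following user-space C sketch models the relationship between the
two families under stated assumptions: setting an
.Sy _EN_
value updates what the driver wants to advertise and restarts
negotiation, while the
.Sy _ADV_
value is derived from it and may differ, here because of a hypothetical
power-save mode.
The bitmask layout and all
.Sy xx_
names are hypothetical, and returning
.Er ENOTSUP
for a speed the hardware lacks is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments for three of the speeds below. */
#define	XX_SPD_100FDX	(1U << 0)
#define	XX_SPD_1000FDX	(1U << 1)
#define	XX_SPD_10GFDX	(1U << 2)

typedef struct xx_link {
	uint32_t	xl_supported;	/* speeds the hardware can do */
	uint32_t	xl_en;		/* _EN_ values set by the user */
	uint32_t	xl_adv;		/* _ADV_ values: what is in effect */
	bool		xl_power_save;	/* hypothetical low-power mode */
	int		xl_resets;	/* link resets triggered */
} xx_link_t;

/* Recompute the effective advertisement and restart negotiation. */
static void
xx_link_renegotiate(xx_link_t *xl)
{
	uint32_t adv = xl->xl_en & xl->xl_supported;

	/* In the hypothetical power-save mode, 10 Gbit/s is withheld. */
	if (xl->xl_power_save)
		adv &= ~XX_SPD_10GFDX;
	xl->xl_adv = adv;
	xl->xl_resets++;
}

/* Set one MAC_PROP_EN_*_CAP style property (a uint8_t, 0 or 1). */
static int
xx_set_en_cap(xx_link_t *xl, uint32_t bit, uint8_t val)
{
	if ((xl->xl_supported & bit) == 0)
		return (ENOTSUP);
	if (val > 1)
		return (EINVAL);
	if (val != 0)
		xl->xl_en |= bit;
	else
		xl->xl_en &= ~bit;
	xx_link_renegotiate(xl);
	return (0);
}

/* Get either family; the _ADV_ family is never set directly. */
static uint8_t
xx_get_cap(const xx_link_t *xl, uint32_t bit, bool adv)
{
	uint32_t v = adv ? xl->xl_adv : xl->xl_en;

	return ((v & bit) != 0 ? 1 : 0);
}
```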
.Pp
The properties are ordered in increasing link speed:
.Bl -hang -width Ds
.It Sy MAC_PROP_ADV_10HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_10HDX_CAP
property describes whether or not 10 Mbit/s half-duplex support is
advertised.
.It Sy MAC_PROP_EN_10HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_10HDX_CAP
property describes whether or not 10 Mbit/s half-duplex support is
enabled.
.It Sy MAC_PROP_ADV_10FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_10FDX_CAP
property describes whether or not 10 Mbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_10FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_10FDX_CAP
property describes whether or not 10 Mbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100HDX_CAP
property describes whether or not 100 Mbit/s half-duplex support is
advertised.
.It Sy MAC_PROP_EN_100HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100HDX_CAP
property describes whether or not 100 Mbit/s half-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100FDX_CAP
property describes whether or not 100 Mbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_100FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100FDX_CAP
property describes whether or not 100 Mbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100T4_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
advertised.
.It Sy MAC_PROP_EN_100T4_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
enabled.
.It Sy MAC_PROP_ADV_1000HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_1000HDX_CAP
property describes whether or not 1 Gbit/s half-duplex support is
advertised.
.It Sy MAC_PROP_EN_1000HDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_1000HDX_CAP
property describes whether or not 1 Gbit/s half-duplex support is
enabled.
.It Sy MAC_PROP_ADV_1000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_1000FDX_CAP
property describes whether or not 1 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_1000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_1000FDX_CAP
property describes whether or not 1 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_2500FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_2500FDX_CAP
property describes whether or not 2.5 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_2500FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_2500FDX_CAP
property describes whether or not 2.5 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_5000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_5000FDX_CAP
property describes whether or not 5.0 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_5000FDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_5000FDX_CAP
property describes whether or not 5.0 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_10GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_10GFDX_CAP
property describes whether or not 10 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_10GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_10GFDX_CAP
property describes whether or not 10 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_40GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_40GFDX_CAP
property describes whether or not 40 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_40GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_40GFDX_CAP
property describes whether or not 40 Gbit/s full-duplex support is
enabled.
.It Sy MAC_PROP_ADV_100GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read-Only
.Ed
.Pp
The
.Sy MAC_PROP_ADV_100GFDX_CAP
property describes whether or not 100 Gbit/s full-duplex support is
advertised.
.It Sy MAC_PROP_EN_100GFDX_CAP
.Bd -filled -compact
Type:
.Sy uint8_t |
Permissions:
.Sy Read/Write
.Ed
.Pp
The
.Sy MAC_PROP_EN_100GFDX_CAP
property describes whether or not 100 Gbit/s full-duplex support is
enabled.
.El
.Ss Private Properties
In addition to the defined properties above, drivers are allowed to
define private properties. These private properties are device-specific
properties. All private properties share the same constant,
.Sy MAC_PROP_PRIVATE .
Properties are distinguished by a name, which is a character string. The
list of such private properties is defined when registering with MAC in
the
.Sy m_priv_props
member of the
.Xr mac_register 9S
structure.
.Pp
The driver may define whatever semantics it wants for these private
properties. They will not be listed when running
.Xr dladm 1M
unless explicitly requested by name. All such properties should start
with a leading underscore character and then consist of alphanumeric
ASCII characters and additional underscores or hyphens.
.Pp
Properties of type
.Sy MAC_PROP_PRIVATE
may show up in all three property-related entry points:
.Xr mc_propinfo 9E ,
.Xr mc_getprop 9E ,
and
.Xr mc_setprop 9E .
Device drivers should tell the different properties apart by using the
.Xr strcmp 9F
function to compare the property name against the names of the
properties that the driver knows about. When a driver encounters a
private property that it does not recognize, it should treat it like
any other unknown property.
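.Pp
The following user-space C sketch checks the naming convention above and
dispatches private properties by name with
.Fn strcmp .
The property names and
.Sy xx_
identifiers are hypothetical.
It assumes that the value arrives as a NUL-terminated string and that
.Er ENOTSUP
is the return value for an unknown property; consult
.Xr mc_setprop 9E
for the actual conventions.
In a driver,
.Xr ddi_strtoul 9F
would take the place of
.Fn strtoul .

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Check the naming convention described above: a leading underscore,
 * then one or more ASCII letters, digits, underscores, or hyphens.
 */
static bool
xx_priv_name_valid(const char *name)
{
	if (name[0] != '_' || name[1] == 0)
		return (false);
	for (const char *p = name + 1; *p != 0; p++) {
		char c = *p;
		bool ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
		    (c >= '0' && c <= '9') || c == '_' || c == '-';

		if (!ok)
			return (false);
	}
	return (true);
}

/* Hypothetical private properties known to this driver. */
typedef struct xx_priv {
	uint32_t	xp_rx_intr_throttle;
	uint32_t	xp_tx_copy_thresh;
} xx_priv_t;

/* Sketch of the MAC_PROP_PRIVATE branch of a set-property handler. */
static int
xx_set_priv_prop(xx_priv_t *xp, const char *name, const char *val)
{
	uint32_t *dst;
	unsigned long v;
	char *end;

	if (strcmp(name, "_rx_intr_throttle") == 0)
		dst = &xp->xp_rx_intr_throttle;
	else if (strcmp(name, "_tx_copy_thresh") == 0)
		dst = &xp->xp_tx_copy_thresh;
	else
		return (ENOTSUP);	/* same as any other unknown property */

	errno = 0;
	v = strtoul(val, &end, 10);
	if (*val == 0 || *end != 0 || errno != 0 || v > UINT32_MAX)
		return (EINVAL);
	*dst = (uint32_t)v;
	return (0);
}
```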
.Sh STATISTICS
The MAC framework defines a couple of different sets of statistics,
based on various standards, for devices to implement. Statistics are
retrieved through the
.Xr mc_getstat 9E
entry point. There is a set of statistics that is required for all
devices and a separate set of Ethernet-specific statistics. Not all
devices will support every statistic. In many cases, several device
registers will need to be combined to create the proper stat.
.Pp
In general, if the device does not keep track of these statistics
itself, then it is recommended that the driver maintain these values as
.Sy uint64_t
counters to ensure that overflow does not occur.
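.Pp
As a sketch of both points above, the following user-space C code folds
readings from three hypothetical 32-bit, clear-on-read registers into
64-bit totals, combining two error registers into one statistic.
The register layout and
.Sy xx_
names are assumptions; real hardware differs from device to device.

```c
#include <stdint.h>

/* Hypothetical 64-bit software copies of device statistics. */
typedef struct xx_stats {
	uint64_t	xs_ipackets;	/* backs MAC_STAT_IPACKETS */
	uint64_t	xs_ierrors;	/* backs MAC_STAT_IERRORS */
} xx_stats_t;

/*
 * Fold one reading of three hypothetical 32-bit, clear-on-read
 * registers into the 64-bit totals.  Two error registers are combined
 * into a single statistic.  This must run often enough that no
 * register can wrap between reads.
 */
static void
xx_stats_update(xx_stats_t *xs, uint32_t rx_good, uint32_t crc_errs,
    uint32_t len_errs)
{
	xs->xs_ipackets += rx_good;
	xs->xs_ierrors += (uint64_t)crc_errs + len_errs;
}
```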
.Pp
If a device does not support a specific statistic, then it is fine for
the driver to report that the statistic is not supported. Unrecognized
statistics should be reported the same way. See
.Xr mc_getstat 9E
for more information on the proper way to handle these.
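.Pp
A minimal user-space sketch of such a handler follows.
The
.Sy XX_STAT_
identifiers are local stand-ins for the real statistic constants, the
device structure is hypothetical, and we assume, per our reading of
.Xr mc_getstat 9E ,
that
.Er ENOTSUP
reports both unsupported and unrecognized statistics.

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for a few statistic identifiers. */
enum xx_stat_id {
	XX_STAT_IFSPEED,
	XX_STAT_IPACKETS,
	XX_STAT_MULTIRCV,
	XX_STAT_COLLISIONS
};

typedef struct xx_dev {
	uint64_t	xd_speed_mbps;	/* current link speed in Mbit/s */
	uint64_t	xd_ipackets;
	uint64_t	xd_multircv;
} xx_dev_t;

/* Sketch of an mc_getstat(9E)-style handler. */
static int
xx_getstat(const xx_dev_t *xd, int stat, uint64_t *val)
{
	switch (stat) {
	case XX_STAT_IFSPEED:
		/* MAC_STAT_IFSPEED is reported in bits per second. */
		*val = xd->xd_speed_mbps * 1000000ULL;
		return (0);
	case XX_STAT_IPACKETS:
		*val = xd->xd_ipackets;
		return (0);
	case XX_STAT_MULTIRCV:
		*val = xd->xd_multircv;
		return (0);
	case XX_STAT_COLLISIONS:	/* not counted by this device */
	default:			/* not recognized at all */
		return (ENOTSUP);
	}
}
```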
.Ss General Device Statistics
The following statistics are based on MIB-II statistics from both RFC
1213 and RFC 1573.
.Bl -tag -width Ds
.It Sy MAC_STAT_IFSPEED
The device's current speed in bits per second.
.It Sy MAC_STAT_MULTIRCV
The total number of received multicast packets.
.It Sy MAC_STAT_BRDCSTRCV
The total number of received broadcast packets.
.It Sy MAC_STAT_MULTIXMT
The total number of transmitted multicast packets.
.It Sy MAC_STAT_BRDCSTXMT
The total number of transmitted broadcast packets.
.It Sy MAC_STAT_NORCVBUF
The total number of packets discarded by the hardware due to a lack of
receive buffers.
.It Sy MAC_STAT_IERRORS
The total number of errors detected on input.
.It Sy MAC_STAT_UNKNOWNS
The total number of received packets that were discarded because they
were of an unknown protocol.
.It Sy MAC_STAT_NOXMTBUF
The total number of outgoing packets dropped due to a lack of transmit
buffers.
.It Sy MAC_STAT_OERRORS
The total number of outgoing packets that resulted in errors.
.It Sy MAC_STAT_COLLISIONS
The total number of collisions encountered by the transmitter.
.It Sy MAC_STAT_RBYTES
The total number of
.Sy bytes
received by the device, regardless of packet type.
.It Sy MAC_STAT_IPACKETS
The total number of
.Sy packets
received by the device, regardless of packet type.
.It Sy MAC_STAT_OBYTES
The total number of
.Sy bytes
transmitted by the device, regardless of packet type.
.It Sy MAC_STAT_OPACKETS
The total number of
.Sy packets
sent by the device, regardless of packet type.
.It Sy MAC_STAT_UNDERFLOWS
The total number of packets that were smaller than the minimum sized
packet for the device and were therefore dropped.
.It Sy MAC_STAT_OVERFLOWS
The total number of packets that were larger than the maximum sized
packet for the device and were therefore dropped.
.El
.Ss Ethernet Specific Statistics
The following statistics are specific to Ethernet devices. They refer to
values from RFC 1643 and include various MII/GMII specific stats. Many
of these are also defined in IEEE 802.3.
.Bl -tag -width Ds
.It Sy ETHER_STAT_ADV_CAP_1000FDX
Indicates that the device is advertising support for 1 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_1000HDX
Indicates that the device is advertising support for 1 Gbit/s
half-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100FDX
Indicates that the device is advertising support for 100 Mbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100GFDX
Indicates that the device is advertising support for 100 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100HDX
Indicates that the device is advertising support for 100 Mbit/s
half-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_100T4
Indicates that the device is advertising support for 100 Mbit/s
100BASE-T4 operation.
.It Sy ETHER_STAT_ADV_CAP_10FDX
Indicates that the device is advertising support for 10 Mbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_10GFDX
Indicates that the device is advertising support for 10 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_10HDX
Indicates that the device is advertising support for 10 Mbit/s
half-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_2500FDX
Indicates that the device is advertising support for 2.5 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_40GFDX
Indicates that the device is advertising support for 40 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_5000FDX
Indicates that the device is advertising support for 5.0 Gbit/s
full-duplex operation.
.It Sy ETHER_STAT_ADV_CAP_ASMPAUSE
Indicates that the device is advertising support for receiving pause
frames.
.It Sy ETHER_STAT_ADV_CAP_AUTONEG
Indicates that the device is advertising support for auto-negotiation.
.It Sy ETHER_STAT_ADV_CAP_PAUSE
Indicates that the device is advertising support for generating pause
frames.
.It Sy ETHER_STAT_ADV_REMFAULT
Indicates that the device is advertising support for detecting faults in
the remote link peer.
.It Sy ETHER_STAT_ALIGN_ERRORS
Indicates the number of times an alignment error was generated by the
Ethernet device. This is a count of packets that were not an integral
number of octets and failed the FCS check.
.It Sy ETHER_STAT_CAP_1000FDX
Indicates the device supports 1 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_1000HDX
Indicates the device supports 1 Gbit/s half-duplex operation.
.It Sy ETHER_STAT_CAP_100FDX
Indicates the device supports 100 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_100GFDX
Indicates the device supports 100 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_100HDX
Indicates the device supports 100 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_CAP_100T4
Indicates the device supports 100 Mbit/s 100BASE-T4 operation.
.It Sy ETHER_STAT_CAP_10FDX
Indicates the device supports 10 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_10GFDX
Indicates the device supports 10 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_10HDX
Indicates the device supports 10 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_CAP_2500FDX
Indicates the device supports 2.5 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_40GFDX
Indicates the device supports 40 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_5000FDX
Indicates the device supports 5.0 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_CAP_ASMPAUSE
Indicates that the device supports the ability to receive pause frames.
.It Sy ETHER_STAT_CAP_AUTONEG
Indicates that the device supports the ability to perform link
auto-negotiation.
.It Sy ETHER_STAT_CAP_PAUSE
Indicates that the device supports the ability to transmit pause frames.
.It Sy ETHER_STAT_CAP_REMFAULT
Indicates that the device supports the ability of detecting a remote
fault in a link peer.
.It Sy ETHER_STAT_CARRIER_ERRORS
Indicates the number of times that the Ethernet carrier sense condition
was lost or not asserted.
.It Sy ETHER_STAT_DEFER_XMTS
Indicates the number of frames for which the first transmission attempt
was delayed because the medium was busy.
.It Sy ETHER_STAT_EX_COLLISIONS
Indicates the number of frames that failed to send due to an excessive
number of collisions.
.It Sy ETHER_STAT_FCS_ERRORS
Indicates the number of times that a frame check sequence failed.
.It Sy ETHER_STAT_FIRST_COLLISIONS
Indicates the number of times that a frame was eventually transmitted
successfully, but only after a single collision.
.It Sy ETHER_STAT_JABBER_ERRORS
Indicates the number of frames that were received that were both larger
than the maximum packet size and failed the frame check sequence.
.It Sy ETHER_STAT_LINK_ASMPAUSE
Indicates whether the link is currently configured to accept pause
frames.
.It Sy ETHER_STAT_LINK_AUTONEG
Indicates whether the current link state is a result of
auto-negotiation.
.It Sy ETHER_STAT_LINK_DUPLEX
Indicates the current duplex state of the link. The values used here
should be the same as documented for
.Sy MAC_PROP_DUPLEX .
.It Sy ETHER_STAT_LINK_PAUSE
Indicates whether the link is currently configured to generate pause
frames.
.It Sy ETHER_STAT_LP_CAP_1000FDX
Indicates the remote device supports 1 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_1000HDX
Indicates the remote device supports 1 Gbit/s half-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100FDX
Indicates the remote device supports 100 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100GFDX
Indicates the remote device supports 100 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100HDX
Indicates the remote device supports 100 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_LP_CAP_100T4
Indicates the remote device supports 100 Mbit/s 100BASE-T4 operation.
.It Sy ETHER_STAT_LP_CAP_10FDX
Indicates the remote device supports 10 Mbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_10GFDX
Indicates the remote device supports 10 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_10HDX
Indicates the remote device supports 10 Mbit/s half-duplex operation.
.It Sy ETHER_STAT_LP_CAP_2500FDX
Indicates the remote device supports 2.5 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_40GFDX
Indicates the remote device supports 40 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_5000FDX
Indicates the remote device supports 5.0 Gbit/s full-duplex operation.
.It Sy ETHER_STAT_LP_CAP_ASMPAUSE
Indicates that the remote device supports the ability to receive pause
frames.
.It Sy ETHER_STAT_LP_CAP_AUTONEG
Indicates that the remote device supports the ability to perform link
auto-negotiation.
.It Sy ETHER_STAT_LP_CAP_PAUSE
Indicates that the remote device supports the ability to transmit pause
frames.
.It Sy ETHER_STAT_LP_CAP_REMFAULT
Indicates that the remote device supports the ability of detecting a
remote fault in a link peer.
.It Sy ETHER_STAT_MACRCV_ERRORS
Indicates the number of times that the internal MAC layer encountered an
error when attempting to receive and process a frame.
.It Sy ETHER_STAT_MACXMT_ERRORS
Indicates the number of times that the internal MAC layer encountered an
error when attempting to process and transmit a frame.
.It Sy ETHER_STAT_MULTI_COLLISIONS
Indicates the number of times that a frame was eventually transmitted
successfully, but only after more than one collision.
.It Sy ETHER_STAT_SQE_ERRORS
Indicates the number of times that an SQE error occurred. The specific
conditions for this error are documented in IEEE 802.3.
.It Sy ETHER_STAT_TOOLONG_ERRORS
Indicates the number of frames that were received that were longer than
the maximum frame size supported by the device.
.It Sy ETHER_STAT_TOOSHORT_ERRORS
Indicates the number of frames that were received that were shorter than
the minimum frame size supported by the device.
.It Sy ETHER_STAT_TX_LATE_COLLISIONS
Indicates the number of times a collision was detected late on the
device.
.It Sy ETHER_STAT_XCVR_ADDR
Indicates the MII/GMII address of the transceiver.
.It Sy ETHER_STAT_XCVR_ID
Indicates the identifier of the MII/GMII transceiver.
.It Sy ETHER_STAT_XCVR_INUSE
Indicates what kind of transceiver is in use. The following values may
be used:
.Bl -tag -width Ds
.It Sy XCVR_UNDEFINED
The transceiver type is undefined by the hardware.
.It Sy XCVR_NONE
There is no transceiver in use by the hardware.
.It Sy XCVR_10
The transceiver supports 10BASE-T operation.
.It Sy XCVR_100T4
The transceiver supports 100BASE-T4 operation.
.It Sy XCVR_100X
The transceiver supports 100BASE-TX operation.
.It Sy XCVR_100T2
The transceiver supports 100BASE-T2 operation.
.It Sy XCVR_1000X
The transceiver supports 1000BASE-X operation. This is used for all
fiber transceivers.
.It Sy XCVR_1000T
The transceiver supports 1000BASE-T operation. This is used for all
copper transceivers.
.El
.El
.Ss Device Specific kstats
In addition to the defined statistics above, if the device driver
maintains additional statistics or the device provides additional
statistics, it should create its own kstats through the
.Xr kstat_create 9F
function to allow operators to observe them.
.Sh TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
Device drivers are the first line of defense for dealing with broken
devices and bugs in their firmware. While most devices will rarely fail,
it is important to pay particular attention to RAS (Reliability,
Availability, and Serviceability) when designing and implementing a
device driver. While everything described in this section is optional,
it is highly recommended that all new device drivers follow these
guidelines.
.Pp
The Fault Management Architecture (FMA) provides facilities for
detecting and reporting various classes of defects and faults.
Specifically for networking device drivers, issues that should be
detected and reported include:
.Bl -bullet -offset indent
.It
Device internal uncorrectable errors
.It
Device internal correctable errors
.It
PCI and PCI Express transport errors
.It
Device temperature alarms
.It
Device transmission stalls
.It
Device communication timeouts
.It
High rates of invalid interrupts
.El
.Pp
All such errors fall into three primary categories:
.Bl -enum -offset indent
.It
Errors detected by the Fault Management Architecture
.It
Errors detected by the device and indicated to the device driver
.It
Errors detected by the device driver
.El
.Ss Fault Management Setup and Teardown
Drivers should initialize support for the fault management framework by
calling
.Xr ddi_fm_init 9F
from their
.Xr attach 9E
routine. By registering with the fault management framework, a device
driver gains the ability to detect transport errors and to report other
errors that it encounters. While a device driver does not need to
declare every capability described in
.Xr ddi_fm_init 9F ,
we suggest that device drivers at least register the
.Sy DDI_FM_EREPORT_CAPABLE
capability so that they can report issues that they detect.
.Pp
If the driver registers with the fault management framework during its
.Xr attach 9E
entry point, it must call
.Xr ddi_fm_fini 9F
during its
.Xr detach 9E
entry point.
.Ss Transport Errors
Many modern networking devices leverage PCI or PCI Express. As such,
there are two primary ways that device drivers access data: they either
memory map device registers and use routines like
.Xr ddi_get8 9F
and
.Xr ddi_put8 9F
or they use direct memory access (DMA). New device drivers should always
enable checking of the transport layer by marking their support in the
.Xr ddi_device_acc_attr_t 9S
structure and using routines like
.Xr ddi_fm_acc_err_get 9F
and
.Xr ddi_fm_dma_err_get 9F
to detect if errors have occurred.
.Ss Device Indicated Errors
Many devices have capabilities to announce to a device driver that a
correctable or an uncorrectable error has occurred. Other devices have
the ability to indicate that various physical issues have occurred, such
as a fan failing or a temperature sensor having fired.
.Pp
Drivers should wire themselves to receive notifications when these
events occur. The means and capabilities will vary from device to
device. For example, some devices will generate information about these
notifications through special interrupts. Other devices may have a
register that software can poll. In the cases where polling is required,
driver writers should try not to poll too frequently and should
generally only poll when the device is actively being used, e.g. between
calls to the
.Xr mc_start 9E
and
.Xr mc_stop 9E
entry points.
.Ss Driver Transmit Stall Detection
One of the primary responsibilities of a hardened device driver is to
perform transmit stall detection. The core idea behind tx stall
detection is that the driver should record when it last observed that
data was successfully transmitted, for example, when it last processed
a transmit completion. Most devices should be transmitting data on a
regular basis as long as the link is up. If a device is not, then this
may indicate that the device is stuck and needs to be reset. At this
time, the MAC framework does not provide any resources for performing
these checks; however, periodically checking each individual transmit
ring's last completion time while data is actively being transmitted,
through the use of routines such as
.Xr timeout 9F ,
may be a reasonable starting point.
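.Pp
The following user-space C sketch shows one way to implement the check
described above.
Each ring records the time of its last forward progress, either a
completion or the moment it went from idle to busy, and a periodic
check flags a stall when the link is up, work is outstanding, and no
progress has been seen for longer than a threshold.
The threshold, the use of nanosecond timestamps such as those returned by
.Xr gethrtime 9F ,
and all
.Sy xx_
names are assumptions, and a real driver would also hold the ring's lock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stall threshold: 5 seconds, in nanoseconds. */
#define	XX_TX_STALL_NS	(5ULL * 1000000000ULL)

typedef struct xx_tx_ring {
	uint32_t	xr_outstanding;		/* descriptors owned by HW */
	uint64_t	xr_last_progress;	/* completion or idle->busy */
} xx_tx_ring_t;

/* Transmit path: note the moment the ring goes from idle to busy. */
static void
xx_tx_post(xx_tx_ring_t *xr, uint32_t ndesc, uint64_t now)
{
	if (xr->xr_outstanding == 0)
		xr->xr_last_progress = now;
	xr->xr_outstanding += ndesc;
}

/* Completion path; ndesc never exceeds xr_outstanding. */
static void
xx_tx_complete(xx_tx_ring_t *xr, uint32_t ndesc, uint64_t now)
{
	xr->xr_outstanding -= ndesc;
	xr->xr_last_progress = now;
}

/* Periodic check, e.g. from a timeout(9F) callback. */
static bool
xx_tx_stalled(const xx_tx_ring_t *xr, bool link_up, uint64_t now)
{
	if (!link_up || xr->xr_outstanding == 0)
		return (false);
	return (now - xr->xr_last_progress > XX_TX_STALL_NS);
}
```

Recording the idle-to-busy transition keeps a ring that sat idle for a
long time from being flagged the instant new work is posted to it.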
.Ss Driver Command Timeout Detection
Each device is programmed in different ways. Some devices are programmed
through asynchronous commands while others are programmed by writing
directly to memory mapped registers. If a device replies to commands
asynchronously, then the device driver should set reasonable timeouts
for all such commands and detect when they expire. If a timeout
occurs, the driver should presume that there is an issue with the
hardware and proceed to abort the command or reset the device.
.Pp
Many devices do not have such a communication mechanism. However,
whenever there is some activity where the device driver must wait, then
it should be prepared for the fact that the device may never get back to
it and react appropriately by performing some kind of device reset.
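.Pp
The following user-space C sketch bounds such a wait.
It polls a completion check and waits between polls, giving up with
.Er ETIMEDOUT
once the budget is spent; a driver might wait with
.Xr drv_usecwait 9F
or
.Xr delay 9F
instead of the simulated hook.
The hook structure, the simulated device, and all
.Sy xx_
names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hooks a driver would supply for its own device. */
typedef struct xx_cmd_ops {
	bool	(*co_done)(void *arg);			/* reply arrived? */
	void	(*co_wait)(void *arg, uint32_t usec);	/* short delay */
} xx_cmd_ops_t;

/*
 * Wait at most timeout_usec for a command to complete, checking every
 * step_usec.  Never wait forever: the device may never answer.
 */
static int
xx_cmd_wait(const xx_cmd_ops_t *ops, void *arg, uint32_t timeout_usec,
    uint32_t step_usec)
{
	if (step_usec == 0)
		return (EINVAL);
	for (uint32_t waited = 0; ; waited += step_usec) {
		if (ops->co_done(arg))
			return (0);
		if (waited >= timeout_usec)
			return (ETIMEDOUT);
		ops->co_wait(arg, step_usec);
	}
}

/* Simulated device for illustration: replies after N polls, -1 never. */
typedef struct xx_sim {
	int		xs_polls_left;
	uint32_t	xs_slept;
} xx_sim_t;

static bool
xx_sim_done(void *arg)
{
	xx_sim_t *s = arg;

	if (s->xs_polls_left == 0)
		return (true);
	if (s->xs_polls_left > 0)
		s->xs_polls_left--;
	return (false);
}

static void
xx_sim_wait(void *arg, uint32_t usec)
{
	xx_sim_t *s = arg;

	s->xs_slept += usec;	/* record instead of actually sleeping */
}

static const xx_cmd_ops_t xx_sim_ops = { xx_sim_done, xx_sim_wait };
```

On a timeout, the caller would then abort the command or begin the steps
described in
.Sx Reacting to Errors .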
.Ss Reacting to Errors
When any of the above categories of errors has been triggered, the
action that the device driver should take depends on the kind of
error. If a fatal error occurred, for example, a transport error, a
transmit stall, or an uncorrectable error indicated by the device, then
it is important that the driver take the following steps:
.Bl -enum -offset indent
.It
Set a flag in the device driver's state that indicates that it has hit
an error condition. When this error condition flag is asserted, packets
submitted for transmission should be accepted and dropped, and actions
that would require writing to the device state should fail with an
error. This flag should remain until the device has been successfully
restarted.
.It
If the error was not a transport error already indicated by the fault
management architecture, for example, an error that the driver itself
detected, then the device driver should post an
.Sy ereport
indicating what has occurred with the
.Xr ddi_fm_ereport_post 9F
function.
.It
The device driver should indicate that the device's service was lost
with a call to
.Xr ddi_fm_service_impact 9F
using the symbol
.Sy DDI_SERVICE_LOST .
.It
At this point the device driver should issue a device reset through some
device-specific means.
.It
When the device reset has been completed, then the device driver should
restore all of the programmed state to the device. This includes things
like the current MTU, advertised auto-negotiation speeds, MAC address
filters, and more.
.It
Finally, when service has been restored, the device driver should call
.Xr ddi_fm_service_impact 9F
using the symbol
.Sy DDI_SERVICE_RESTORED .
.El
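.Pp
The following user-space C sketch models the steps above.
The FMA calls and the device reset are replaced by entries in an event
log, the recovery half is split out because a real driver might run it
later from another thread, and failing with
.Er EIO
while the error flag is set is an assumption.
All
.Sy xx_
names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Events standing in for the FMA calls and reset steps above. */
typedef enum {
	XX_EV_EREPORT,		/* ddi_fm_ereport_post(9F) */
	XX_EV_SVC_LOST,		/* ddi_fm_service_impact(9F), LOST */
	XX_EV_RESET,		/* device-specific reset */
	XX_EV_RESTORE,		/* reprogram MTU, speeds, filters, ... */
	XX_EV_SVC_RESTORED	/* ddi_fm_service_impact(9F), RESTORED */
} xx_event_t;

#define	XX_NLOG	8

typedef struct xx_dev {
	bool		xd_error;	/* step 1: error condition flag */
	uint64_t	xd_tx_dropped;
	xx_event_t	xd_log[XX_NLOG];
	int		xd_nlog;
} xx_dev_t;

static void
xx_log(xx_dev_t *xd, xx_event_t ev)
{
	if (xd->xd_nlog < XX_NLOG)
		xd->xd_log[xd->xd_nlog++] = ev;
}

/* Transmit path: while the flag is set, accept and drop packets. */
static void
xx_tx(xx_dev_t *xd)
{
	if (xd->xd_error) {
		xd->xd_tx_dropped++;
		return;
	}
	/* ... normal transmit would go here ... */
}

/* Anything that writes device state fails while the flag is set. */
static int
xx_set_mtu(xx_dev_t *xd, uint32_t mtu)
{
	(void) mtu;
	return (xd->xd_error ? EIO : 0);
}

/* Steps 1 to 3; fma_reported means FMA already reported the error. */
static void
xx_fatal_error(xx_dev_t *xd, bool fma_reported)
{
	xd->xd_error = true;
	if (!fma_reported)
		xx_log(xd, XX_EV_EREPORT);
	xx_log(xd, XX_EV_SVC_LOST);
}

/* Steps 4 to 6, perhaps run later from a recovery thread. */
static void
xx_recover(xx_dev_t *xd)
{
	xx_log(xd, XX_EV_RESET);
	xx_log(xd, XX_EV_RESTORE);
	xd->xd_error = false;
	xx_log(xd, XX_EV_SVC_RESTORED);
}
```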
.Pp
When a non-fatal error occurs, the device driver should submit an
ereport and, depending on the nature of the problem that has occurred,
may also mark the device degraded using
.Xr ddi_fm_service_impact 9F
with the
.Sy DDI_SERVICE_DEGRADED
value.
.Pp
Device drivers should never make the decision to remove a device from
service based on errors that have occurred nor should they panic the
system. Rather, the device driver should always try to notify the
operating system with various ereports and allow its policy decisions to
occur. The decision to retire a device lies in the hands of the fault
management architecture. It knows more about the operator's intent and
the surrounding system's state than the device driver itself does and it
will make the call to offline and retire the device if it is required.
.Ss Device Resets
When resetting a device, a device driver must exercise caution. If a
device driver has not been written to plan for a device reset, then it
may not correctly restore the device's state after such a reset. Such
state should be stored in the instance's private state data as the MAC
framework does not know about device resets and will not inform the
driver again about the expected, programmed state.
.Pp
One wrinkle with device resets is that many networking cards show up as
multiple PCI functions on a single device, for example, each port may
show up as a separate function and thus have a separate instance of the
device driver attached. When resetting a function, device driver writers
should carefully read the device programming manuals and verify whether
or not a reset impacts only the stalled function or if it impacts all
functions across the device.
.Pp
If the only way to reset a given function is to reset the entire device,
then this may require more coordination and work on the part of the
device driver to ensure that all the other instances are correctly
restored. In cases where this occurs, some devices offer ways of
injecting interrupts onto those other functions to notify them that this
is occurring.
.Sh MBLKS AND DMA
The networking stack manages framed data through the use of the
.Xr mblk 9S
structure. The mblk allows for a single message to be made up of
individual blocks. Each part is linked together through its
.Sy b_cont
member. However, it also allows for multiple messages to be chained
together through the use of the
.Sy b_next
member. While the networking stack works with these structures, device
drivers generally work with DMA regions. There are two different
strategies that device drivers use to move data between mblks and DMA
regions: copying and binding.
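.Pp
To make the two kinds of linkage concrete, the following user-space C
sketch walks a simplified stand-in for mblk_t that keeps only the
.Sy b_next ,
.Sy b_cont ,
.Sy b_rptr ,
and
.Sy b_wptr
members.
The structure and function names are hypothetical; in the kernel,
routines such as
.Xr msgdsize 9F
already compute the size of a message.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Simplified stand-in for mblk_t, keeping only the members discussed
 * here.  The real structure (see mblk(9S)) has many more.
 */
typedef struct xx_mblk {
	struct xx_mblk	*b_next;	/* next message in a chain */
	struct xx_mblk	*b_cont;	/* next block of this message */
	uint8_t		*b_rptr;	/* first byte of valid data */
	uint8_t		*b_wptr;	/* one past the last valid byte */
} xx_mblk_t;

/* Bytes in one message: walk its blocks through b_cont. */
static size_t
xx_msgsize(const xx_mblk_t *mp)
{
	size_t len = 0;

	for (; mp != NULL; mp = mp->b_cont)
		len += (size_t)(mp->b_wptr - mp->b_rptr);
	return (len);
}

/* Messages in a chain: walk the first block of each through b_next. */
static size_t
xx_nmsgs(const xx_mblk_t *mp)
{
	size_t n = 0;

	for (; mp != NULL; mp = mp->b_next)
		n++;
	return (n);
}
```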
1654.Ss Copying Data
1655The first way that device drivers handle interfacing between the two is
1656by having two separate regions of memory. One part is memory which has
1657been allocated for DMA through a call to
1658.Xr ddi_dma_alloc 9F
1659and the other is memory associated with the memory block.
.Pp
In this case, a driver will use
.Xr bcopy 9F
to copy data between the two distinct regions.
When transmitting a
packet, it will copy the data from the mblk_t to the DMA region.
When receiving a frame, it will allocate an mblk_t through the
.Xr allocb 9F
routine, copy the data across with
.Xr bcopy 9F ,
and then advance the mblk_t's
.Sy b_wptr
member past the copied data.
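.Pp
A transmit-side copy can be sketched as follows, again using a
hypothetical user-space stand-in for
.Sy mblk_t .
Each block of the message is appended to a single pre-allocated
buffer, with
.Fn memcpy
standing in for
.Xr bcopy 9F .
In a driver, the buffer would be DMA memory, and the driver would call
.Xr ddi_dma_sync 9F
with
.Dv DDI_DMA_SYNC_FORDEV
before handing the descriptor to the device.
The original message could then be freed immediately, because the
device never reads from it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for mblk_t (not the real layout). */
typedef struct sk_mblk {
	struct sk_mblk	*b_next;
	struct sk_mblk	*b_cont;
	uint8_t		*b_rptr;
	uint8_t		*b_wptr;
} sk_mblk_t;

/*
 * Copy every block of one message into a contiguous transmit buffer.
 * Returns the number of bytes copied, or 0 if the frame does not fit,
 * in which case the caller would drop it.
 */
static size_t
sk_tx_copy(const sk_mblk_t *mp, uint8_t *dmabuf, size_t buflen)
{
	size_t off = 0;

	for (; mp != NULL; mp = mp->b_cont) {
		size_t len = (size_t)(mp->b_wptr - mp->b_rptr);

		/* off never exceeds buflen, so this cannot underflow. */
		if (len > buflen - off)
			return (0);
		/* A driver would use bcopy(9F) here. */
		memcpy(dmabuf + off, mp->b_rptr, len);
		off += len;
	}
	return (off);
}
```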
.Pp
If, when receiving, memory is not available for a new message block,
then the frame should be skipped and effectively dropped.
The driver should increment a kstat each time this occurs.
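.Pp
The receive side, including the drop path, might look like the sketch
below.
The
.Fn sk_allocb
function is a hypothetical user-space stand-in for
.Xr allocb 9F ,
the
.Va sk_fail_alloc
flag exists only to simulate an allocation failure, and
.Va rx_nomem
is a hypothetical counter that a driver would export as a kstat.
A driver would also call
.Xr ddi_dma_sync 9F
with
.Dv DDI_DMA_SYNC_FORKERNEL
before reading the DMA buffer.
Because the data is copied out, the descriptor can be returned to the
device right away whether or not the copy succeeded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for mblk_t; data[] models the data block. */
typedef struct sk_mblk {
	struct sk_mblk	*b_next;
	struct sk_mblk	*b_cont;
	uint8_t		*b_rptr;
	uint8_t		*b_wptr;
	uint8_t		data[];
} sk_mblk_t;

typedef struct sk_rx_stats {
	uint64_t	rx_frames;	/* frames passed up */
	uint64_t	rx_nomem;	/* frames dropped: no mblk available */
} sk_rx_stats_t;

/* Test hook: simulates allocb(9F) returning NULL. */
static bool sk_fail_alloc;

/* Hypothetical stand-in for allocb(9F). */
static sk_mblk_t *
sk_allocb(size_t size)
{
	sk_mblk_t *mp;

	if (sk_fail_alloc || (mp = malloc(sizeof (*mp) + size)) == NULL)
		return (NULL);
	mp->b_next = mp->b_cont = NULL;
	mp->b_rptr = mp->b_wptr = mp->data;
	return (mp);
}

/* Copy one received frame out of the DMA buffer into a new message. */
static sk_mblk_t *
sk_rx_copy(sk_rx_stats_t *st, const uint8_t *dmabuf, size_t len)
{
	sk_mblk_t *mp = sk_allocb(len);

	if (mp == NULL) {
		/* No memory: drop the frame and count the drop. */
		st->rx_nomem++;
		return (NULL);
	}
	memcpy(mp->b_wptr, dmabuf, len);
	mp->b_wptr += len;	/* mark the copied bytes as valid */
	st->rx_frames++;
	return (mp);
}
```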
.Ss Binding Data
An alternative approach to copying data is to use DMA binding.
When using DMA binding, the OS takes care of mapping existing kernel
memory, such as the data of an mblk_t, to addresses that the device can
use for DMA.
The exact process is a bit different between transmit and receive.
.Pp
When transmitting, a device driver has an mblk_t and needs to call the
.Xr ddi_dma_addr_bind_handle 9F
function to bind its data to an already existing DMA handle.
At that point, it
will receive one or more DMA cookies that it can use to obtain the
addresses to program the device with for transmitting data.
Once the transmit is done, the driver must unbind the handle with
.Xr ddi_dma_unbind_handle 9F
and then call
.Xr freemsg 9F
to release the data.
It must not call
.Xr freemsg 9F
before it receives an interrupt or other indication from the device that
the data has been transmitted; otherwise, it risks the device sending
arbitrary kernel memory that has since been reused.
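.Pp
The ordering rule can be modeled with a small transmit ring in which
each slot remembers the message bound to it.
Messages are released only from the completion path, never from the
send path.
The ring size, the
.Sy sk_
names, and the
.Va free_msg
callback, which stands in for
.Xr freemsg 9F ,
are hypothetical.
The comments mark where a driver would call
.Xr ddi_dma_addr_bind_handle 9F
and
.Xr ddi_dma_unbind_handle 9F .

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	SK_TX_RING	4

/* Hypothetical per-descriptor control block. */
typedef struct sk_tcb {
	void	*mp;		/* bound message; freed only on completion */
	bool	bound;		/* DMA handle currently bound */
} sk_tcb_t;

typedef struct sk_txring {
	sk_tcb_t	tcb[SK_TX_RING];
	uint32_t	head;		/* next slot to fill */
	uint32_t	tail;		/* oldest slot not yet reclaimed */
	uint32_t	used;
	void		(*free_msg)(void *);	/* freemsg(9F) in a driver */
} sk_txring_t;

/* Post a bound message; returns false if the ring is full. */
static bool
sk_tx_post(sk_txring_t *r, void *mp)
{
	sk_tcb_t *t;

	if (r->used == SK_TX_RING)
		return (false);
	t = &r->tcb[r->head];
	t->mp = mp;
	t->bound = true;	/* ddi_dma_addr_bind_handle(9F) succeeded */
	r->head = (r->head + 1) % SK_TX_RING;
	r->used++;
	return (true);
}

/*
 * Completion path: the device reported that ndone descriptors were
 * sent. Only now is it safe to unbind and free their messages.
 */
static void
sk_tx_recycle(sk_txring_t *r, uint32_t ndone)
{
	while (ndone > 0 && r->used > 0) {
		sk_tcb_t *t = &r->tcb[r->tail];

		t->bound = false;	/* ddi_dma_unbind_handle(9F) */
		r->free_msg(t->mp);	/* freemsg(9F) */
		t->mp = NULL;
		r->tail = (r->tail + 1) % SK_TX_RING;
		r->used--;
		ndone--;
	}
}
```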
.Pp
When receiving data, the driver can perform a similar operation.
First, it must bind the receive buffer's memory to a DMA handle through
a call to the
.Xr ddi_dma_addr_bind_handle 9F
function, if it has not already done so, and program the device with the
resulting DMA addresses.
Once the device has filled the buffer, the driver must call
.Xr desballoc 9F
to try to create a new mblk_t which references that memory directly,
along with a free routine that is called when the message is freed.
It can then pass that mblk_t up to the stack.
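.Pp
The sketch below models the buffer-loaning side of this scheme.
Each receive buffer carries a free routine, modeled on the
.Vt frtn_t
argument to
.Xr desballoc 9F ;
when the stack frees the message, the routine puts the buffer back on
the driver's free list.
All names and sizes are hypothetical.
Because the free routine can run in any thread in a real kernel, a
driver would protect the free list with a mutex, which this
single-threaded model omits.

```c
#include <stddef.h>
#include <stdint.h>

#define	SK_NRXBUF	4

/* Modeled on frtn_t: called when the stack frees the message. */
typedef struct sk_frtn {
	void	(*free_func)(void *);
	void	*free_arg;
} sk_frtn_t;

typedef struct sk_rxring sk_rxring_t;

/* One pre-allocated, pre-bound receive buffer. */
typedef struct sk_rxbuf {
	uint8_t		data[2048];
	sk_frtn_t	frtn;	/* would be passed to desballoc(9F) */
	sk_rxring_t	*ring;
	struct sk_rxbuf	*next;	/* free-list linkage */
} sk_rxbuf_t;

struct sk_rxring {
	sk_rxbuf_t	bufs[SK_NRXBUF];
	sk_rxbuf_t	*free;		/* buffers the driver owns */
	uint32_t	loaned;		/* buffers held by the stack */
};

/* Free routine: the stack is done, so the buffer returns home. */
static void
sk_rxbuf_recycle(void *arg)
{
	sk_rxbuf_t *b = arg;

	b->next = b->ring->free;
	b->ring->free = b;
	b->ring->loaned--;
}

static void
sk_rxring_init(sk_rxring_t *r)
{
	r->free = NULL;
	r->loaned = 0;
	for (int i = 0; i < SK_NRXBUF; i++) {
		sk_rxbuf_t *b = &r->bufs[i];

		b->ring = r;
		b->frtn.free_func = sk_rxbuf_recycle;
		b->frtn.free_arg = b;
		b->next = r->free;
		r->free = b;
	}
}

/* Take a buffer to loan upward; NULL if none are left. */
static sk_rxbuf_t *
sk_rxbuf_loan(sk_rxring_t *r)
{
	sk_rxbuf_t *b = r->free;

	if (b != NULL) {
		r->free = b->next;
		r->loaned++;
	}
	return (b);
}
```

A driver would refuse to detach while the loaned count is nonzero,
since the stack still holds references to its memory.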
.Ss Considerations
When deciding which of these options to use, there are many different
considerations to weigh.
Whether to bind memory or to copy data is not always obvious.
.Pp
The first thing to remember is that DMA resources may be finite on a
given platform.
Consider the case of receiving data.
A device driver that loans one of its bound receive buffers to the stack
may not get it back for quite some time, as the memory may be held by
the kernel until an application actually consumes the data.
Device drivers that bind memory for receive often
work with the constraint that they must be able to replace that DMA
memory with another DMA buffer.
If the buffers were not replaced, then
eventually the device would not be able to receive additional data into
the ring.
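.Pp
One way to honor that constraint is to loan a buffer up the stack only
when a spare bound buffer can take its place on the ring, and to fall
back to copying otherwise.
The sketch below tracks only the counters involved; the names and the
policy itself are hypothetical illustrations, not a required design.

```c
#include <stdint.h>

/* Hypothetical per-ring receive accounting. */
typedef struct sk_rxpool {
	uint32_t	nfree;		/* spare bound buffers still owned */
	uint32_t	loaned;		/* buffers held up the stack */
	uint32_t	copied;		/* frames that fell back to copying */
} sk_rxpool_t;

typedef enum { SK_RX_LOAN, SK_RX_COPY } sk_rx_action_t;

/*
 * Decide how to hand a received frame upward. Loaning the filled
 * buffer is allowed only if a spare can replace it on the ring;
 * otherwise the ring would shrink and eventually stop receiving.
 */
static sk_rx_action_t
sk_rx_choose(sk_rxpool_t *p)
{
	if (p->nfree > 0) {
		p->nfree--;	/* the spare goes onto the ring */
		p->loaned++;	/* the filled buffer goes up the stack */
		return (SK_RX_LOAN);
	}
	p->copied++;	/* copy out; the descriptor keeps its buffer */
	return (SK_RX_COPY);
}
```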
.Pp
On the other hand, particularly for larger frames, copying every packet
from one buffer to another can be a source of additional latency and
memory waste in the system.
For larger copies, the cost of copying may
dwarf any potential cost of performing DMA binding.
.Pp
Device driver authors who are unsure of which method to use should
first employ the copying method, as it simplifies writing the
device driver.
The copying method also spares the device driver author from having to
track DMA memory that is still loaned out to the stack when the driver
is asked to unload.
.Pp
If device driver writers are worried about the cost, it is recommended
that the choice of whether to copy or bind DMA data be controlled by
separate private properties for transmitting and receiving.
Each private property should indicate the frame size at which
to switch from one method to the other.
This way, data can be gathered
to determine the impact of each method on a given platform.
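.Pp
A minimal sketch of such a tunable follows.
The property names
.Qq _tx_copy_thresh
and
.Qq _rx_copy_thresh ,
the structure, and the helper functions are hypothetical.
The parser shows what an
.Xr mc_setprop 9E
handler might do with a private property value, using
.Fn strtoul
where a driver would use
.Xr ddi_strtoul 9F ;
the properties would also need to be described through
.Xr mc_propinfo 9E .

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical soft state holding the two tunables. */
typedef struct sk_softc {
	uint32_t	tx_copy_thresh;	/* "_tx_copy_thresh" */
	uint32_t	rx_copy_thresh;	/* "_rx_copy_thresh" */
} sk_softc_t;

/* Frames at or below the threshold are copied; larger ones are bound. */
static bool
sk_should_copy(uint32_t thresh, size_t frame_len)
{
	return (frame_len <= thresh);
}

/* Set one private property; returns 0, ENOTSUP, or EINVAL. */
static int
sk_set_priv_prop(sk_softc_t *sc, const char *name, const char *val)
{
	uint32_t *field;
	unsigned long v;
	char *end;

	if (strcmp(name, "_tx_copy_thresh") == 0)
		field = &sc->tx_copy_thresh;
	else if (strcmp(name, "_rx_copy_thresh") == 0)
		field = &sc->rx_copy_thresh;
	else
		return (ENOTSUP);

	/* Require a plain decimal number: no sign, no leading space. */
	if (val[0] < '0' || val[0] > '9')
		return (EINVAL);
	v = strtoul(val, &end, 10);
	if (*end != 0 || v > UINT32_MAX)
		return (EINVAL);
	*field = (uint32_t)v;
	return (0);
}
```

The transmit and receive paths would then consult
.Fn sk_should_copy
with the relevant threshold for each frame.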
.Sh SEE ALSO
.Xr dladm 1M ,
.Xr driver.conf 4 ,
.Xr ieee802.3 5 ,
.Xr dlpi 7P ,
.Xr _fini 9E ,
.Xr _info 9E ,
.Xr _init 9E ,
.Xr attach 9E ,
.Xr close 9E ,
.Xr detach 9E ,
.Xr mc_close 9E ,
.Xr mc_getcapab 9E ,
.Xr mc_getprop 9E ,
.Xr mc_getstat 9E ,
.Xr mc_multicst 9E ,
.Xr mc_open 9E ,
.Xr mc_propinfo 9E ,
.Xr mc_setpromisc 9E ,
.Xr mc_setprop 9E ,
.Xr mc_start 9E ,
.Xr mc_stop 9E ,
.Xr mc_tx 9E ,
.Xr mc_unicst 9E ,
.Xr open 9E ,
.Xr allocb 9F ,
.Xr bcopy 9F ,
.Xr ddi_dma_addr_bind_handle 9F ,
.Xr ddi_dma_mem_alloc 9F ,
.Xr ddi_fm_acc_err_get 9F ,
.Xr ddi_fm_dma_err_get 9F ,
.Xr ddi_fm_ereport_post 9F ,
.Xr ddi_fm_fini 9F ,
.Xr ddi_fm_init 9F ,
.Xr ddi_fm_service_impact 9F ,
.Xr ddi_get8 9F ,
.Xr ddi_put8 9F ,
.Xr desballoc 9F ,
.Xr freemsg 9F ,
.Xr kstat_create 9F ,
.Xr mac_alloc 9F ,
.Xr mac_fini_ops 9F ,
.Xr mac_hcksum_get 9F ,
.Xr mac_hcksum_set 9F ,
.Xr mac_init_ops 9F ,
.Xr mac_link_update 9F ,
.Xr mac_lso_get 9F ,
.Xr mac_maxsdu_update 9F ,
.Xr mac_prop_info_set_default_link_flowctrl 9F ,
.Xr mac_prop_info_set_default_str 9F ,
.Xr mac_prop_info_set_default_uint32 9F ,
.Xr mac_prop_info_set_default_uint64 9F ,
.Xr mac_prop_info_set_default_uint8 9F ,
.Xr mac_prop_info_set_perm 9F ,
.Xr mac_prop_info_set_range_uint32 9F ,
.Xr mac_register 9F ,
.Xr mac_rx 9F ,
.Xr mac_unregister 9F ,
.Xr mod_install 9F ,
.Xr mod_remove 9F ,
.Xr strcmp 9F ,
.Xr timeout 9F ,
.Xr cb_ops 9S ,
.Xr ddi_device_acc_attr_t 9S ,
.Xr dev_ops 9S ,
.Xr kstat 9S ,
.Xr mac_callbacks 9S ,
.Xr mac_register 9S ,
.Xr mblk 9S ,
.Xr modldrv 9S ,
.Xr modlinkage 9S
.Rs
.%A McCloghrie, K.
.%A Rose, M.
.%T RFC 1213 Management Information Base for Network Management of TCP/IP-based internets: MIB-II
.%D March 1991
.Re
.Rs
.%A McCloghrie, K.
.%A Kastenholz, F.
.%T RFC 1573 Evolution of the Interfaces Group of MIB-II
.%D January 1994
.Re
.Rs
.%A Kastenholz, F.
.%T RFC 1643 Definitions of Managed Objects for the Ethernet-like Interface Types
.Re