.\" xref: /illumos-gate/usr/src/man/man9e/mac.9e (revision b8052df9f609edb713f6828c9eecc3d7be19dfb3)
.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2019 Joyent, Inc.
.\" Copyright 2020 RackTop Systems, Inc.
.\" Copyright 2022 Oxide Computer Company
.\"
.Dd July 2, 2022
.Dt MAC 9E
.Os
.Sh NAME
.Nm mac ,
.Nm GLDv3
.Nd MAC networking device driver overview
.Sh SYNOPSIS
.In sys/mac_provider.h
.In sys/mac_ether.h
.Sh INTERFACE LEVEL
illumos DDI specific
.Sh DESCRIPTION
The
.Sy MAC
framework provides a means for implementing high-performance networking
device drivers.
It is the successor to the GLD interfaces and is sometimes referred to as the
GLDv3.
The remainder of this manual introduces the aspects of writing device drivers
that leverage the MAC framework.
While both the GLDv3 and MAC framework refer to the same thing, in this manual
page we use the term
.Em MAC framework
to refer to the device driver interface.
.Pp
MAC device drivers are character devices.
They define the standard
.Xr _init 9E ,
.Xr _fini 9E ,
and
.Xr _info 9E
entry points to initialize the module, as well as
.Xr dev_ops 9S
and
.Xr cb_ops 9S
structures.
.Pp
The main interface with MAC is through a series of callbacks defined in
a
.Xr mac_callbacks 9S
structure.
These callbacks control all the aspects of the device.
They cover sending data, getting and setting properties, controlling MAC
address filters, and managing promiscuous mode.
.Pp
The MAC framework takes care of many aspects of the device driver's
management.
A device that uses the MAC framework does not have to worry about creating
device nodes or implementing
.Xr open 9E
or
.Xr close 9E
routines.
In addition, all of the work to interact with
.Xr dlpi 4P
is taken care of automatically and transparently.
.Ss High-Level Design
At a high level, a device driver is chiefly concerned with three general
operations:
.Bl -enum -offset indent
.It
Sending frames
.It
Receiving frames
.It
Managing device configuration and metadata
.El
.Pp
When sending frames, the MAC framework always calls functions registered
in the
.Xr mac_callbacks 9S
structure to have the driver transmit frames on hardware.
When receiving frames, the driver will generally receive an interrupt which will
cause it to check for incoming data and deliver it to the MAC framework.
.Pp
Configuration of a device, such as whether auto-negotiation should be
enabled, the speeds that the device supports, the MTU (maximum
transmission unit), and the generation of pause frames are all driven by
properties.
The functions to get, set, and obtain information about properties are
defined through callback functions specified in the
.Xr mac_callbacks 9S
structure.
The full list of properties and a description of the relevant callbacks
can be found in the
.Sx PROPERTIES
section.
.Pp
The MAC framework is designed to take advantage of various modern
features provided by hardware, such as checksumming, segmentation
offload, and hardware filtering.
The MAC framework assumes none of these advanced features are present
and allows device drivers to negotiate them through a capability system.
Drivers can declare that they support various capabilities by
implementing the optional
.Xr mc_getcapab 9E
entry point.
Each capability has its associated entry points and structures to fill
out.
The capabilities are detailed in the
.Sx CAPABILITIES
section.
.Pp
The following sections describe the flow of a basic device driver.
For advanced device drivers, the flow is generally the same.
The primary distinction is in how frames are sent and received.
.Ss Initializing MAC Support
For a device to be used by the MAC framework, it must register with the
framework and take specific actions during
.Xr _init 9E ,
.Xr attach 9E ,
.Xr detach 9E ,
and
.Xr _fini 9E .
.Pp
All device drivers have to define a
.Xr dev_ops 9S
structure which is pointed to by a
.Xr modldrv 9S
structure and the corresponding NULL-terminated
.Xr modlinkage 9S
structure.
The
.Xr dev_ops 9S
structure should have a
.Xr cb_ops 9S
structure defined for it; however, it does not need to implement any of
the standard
.Xr cb_ops 9S
entry points.
.Pp
Normally, in a driver's
.Xr _init 9E
entry point, it passes its
.Xr modlinkage 9S
structure directly to
.Xr mod_install 9F .
To properly register with MAC, the driver must call
.Xr mac_init_ops 9F
before it calls
.Xr mod_install 9F .
If for some reason the
.Xr mod_install 9F
function fails, then the driver must undo the effects of
.Xr mac_init_ops 9F
by calling
.Xr mac_fini_ops 9F .
.Pp
Conversely, in the driver's
.Xr _fini 9E
routine, it should call
.Xr mac_fini_ops 9F
after it successfully calls
.Xr mod_remove 9F .
For an example of how to use the
.Xr mac_init_ops 9F
and
.Xr mac_fini_ops 9F
functions, see the examples section in
.Xr mac_init_ops 9F .
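.Pp
The call ordering described above can be sketched as follows.
This is a userland illustration under stated assumptions, not kernel code:
the routines named mac_init_ops, mac_fini_ops, mod_install, and mod_remove
below are stand-in stubs that only record the order in which they are called,
and the example_ functions, the call_log buffer, and the fail_install switch
are hypothetical names invented for the sketch.
A real driver would place the same logic in its _init and _fini entry points
and use the DDI routines themselves.

```c
#include <assert.h>
#include <string.h>

/* Stand-in types; a real driver gets these from the DDI headers. */
struct dev_ops { int unused; };
struct modlinkage { int unused; };

static char call_log[128];	/* records the order of the stubbed calls */
static int fail_install;	/* hypothetical switch: make mod_install fail */

static void
log_call(const char *name)
{
	(void) strncat(call_log, name, sizeof (call_log) - strlen(call_log) - 1);
	(void) strncat(call_log, " ", sizeof (call_log) - strlen(call_log) - 1);
}

/* Stubs that mimic the shape of the real routines and log each call. */
static void
mac_init_ops(struct dev_ops *ops, const char *name)
{
	(void) ops;
	(void) name;
	log_call("mac_init_ops");
}

static void
mac_fini_ops(struct dev_ops *ops)
{
	(void) ops;
	log_call("mac_fini_ops");
}

static int
mod_install(struct modlinkage *ml)
{
	(void) ml;
	log_call("mod_install");
	return (fail_install ? -1 : 0);
}

static int
mod_remove(struct modlinkage *ml)
{
	(void) ml;
	log_call("mod_remove");
	return (0);
}

static struct dev_ops example_dev_ops;
static struct modlinkage example_modlinkage;

/* Logic a driver's _init would follow. */
int
example_init(void)
{
	int ret;

	mac_init_ops(&example_dev_ops, "example");
	if ((ret = mod_install(&example_modlinkage)) != 0)
		mac_fini_ops(&example_dev_ops);	/* undo on failure */
	return (ret);
}

/* Logic a driver's _fini would follow. */
int
example_fini(void)
{
	int ret;

	if ((ret = mod_remove(&example_modlinkage)) == 0)
		mac_fini_ops(&example_dev_ops);	/* only after success */
	return (ret);
}
```

The functions are deliberately not named _init and _fini, because those
symbols are reserved by the C runtime in a userland program.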
.Ss Registering with MAC
Every instance of a device should register separately with MAC.
To register with MAC, a driver must allocate a
.Xr mac_register 9S
structure, fill it in, and then call
.Xr mac_register 9F .
The
.Vt mac_register_t
structure contains information about the device and all of the required
function pointers that will be used as callbacks by the framework.
.Pp
These steps should all be taken during a device's
.Xr attach 9E
entry point.
It is recommended that the driver perform this sequence of steps after the
device has finished its initialization of the chipset and interrupts, though
interrupts should not be enabled at that point.
After it calls
.Xr mac_register 9F
it will start receiving callbacks from the MAC framework.
.Pp
To allocate the registration structure, the driver should call
.Xr mac_alloc 9F .
Device drivers should generally always pass the symbol
.Dv MAC_VERSION
as the argument to
.Xr mac_alloc 9F .
Upon successful completion, the driver will receive a
.Vt mac_register_t
structure which it should fill in.
The structure and its members are documented in
.Xr mac_register 9S .
.Pp
The
.Xr mac_callbacks 9S
structure is not allocated as a part of the
.Xr mac_register 9S
structure.
In general, device drivers declare this statically.
See the
.Sx MAC Callbacks
section for more information on how to fill it out.
.Pp
Once the structure has been filled in, the driver should call
.Xr mac_register 9F
to register itself with MAC.
The handle that registration returns should be stored in the driver's soft
state.
It will be used in various other support functions and callbacks.
.Pp
If the call is successful, then the device driver
should enable interrupts and finish any other initialization required.
If the call to
.Xr mac_register 9F
fails, then it should unwind its initialization and should return
.Dv DDI_FAILURE
from its
.Xr attach 9E
routine.
.Pp
The driver does not need to hold onto an allocated
.Xr mac_register 9S
structure after it has called the
.Xr mac_register 9F
function.
Whether the
.Xr mac_register 9F
function returns successfully or not, the driver may free its
.Xr mac_register 9S
structure by calling the
.Xr mac_free 9F
function.
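.Pp
The registration sequence above might look like the following sketch.
It is a userland model under stated assumptions rather than real driver code:
mac_alloc, mac_register, and mac_free are replaced by small stubs, the
mac_register_t and mac_callbacks_t types are cut down to a few members, the
values of MAC_VERSION, DDI_SUCCESS, and DDI_FAILURE are placeholders, and the
example_ names, live_allocs, and fail_register are hypothetical.
The full set of members a driver must fill in is documented in
mac_register(9S).

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder values; real drivers take these from system headers. */
#define	MAC_VERSION	1
#define	DDI_SUCCESS	0
#define	DDI_FAILURE	(-1)

/* Cut-down stand-ins for the real types. */
typedef void *mac_handle_t;
typedef struct mac_callbacks { unsigned int mc_callbacks; } mac_callbacks_t;
typedef struct mac_register {
	void		*m_driver;	/* driver private data */
	mac_callbacks_t	*m_callbacks;	/* callback table */
	unsigned int	m_max_sdu;	/* maximum payload size */
} mac_register_t;

static int live_allocs;		/* outstanding registration structures */
static int fail_register;	/* hypothetical switch: force a failure */

static mac_register_t *
mac_alloc(unsigned int version)
{
	mac_register_t *regp = calloc(1, sizeof (*regp));

	(void) version;
	if (regp != NULL)
		live_allocs++;
	return (regp);
}

static void
mac_free(mac_register_t *regp)
{
	free(regp);
	live_allocs--;
}

static int
mac_register(mac_register_t *regp, mac_handle_t *mhp)
{
	if (fail_register || regp->m_callbacks == NULL)
		return (-1);
	*mhp = regp->m_driver;	/* the stub reuses the pointer as a handle */
	return (0);
}

typedef struct example_softc {
	mac_handle_t	es_mh;		/* handle kept in soft state */
	int		es_intr_enabled;
} example_softc_t;

/* Declared statically, as the text above suggests. */
static mac_callbacks_t example_callbacks;

int
example_attach(example_softc_t *sc)
{
	mac_register_t *regp;
	int ret;

	/* Chipset and interrupt setup would precede this; interrupts off. */
	if ((regp = mac_alloc(MAC_VERSION)) == NULL)
		return (DDI_FAILURE);

	regp->m_driver = sc;
	regp->m_callbacks = &example_callbacks;
	regp->m_max_sdu = 1500;

	ret = mac_register(regp, &sc->es_mh);
	mac_free(regp);		/* not needed after the call, either way */

	if (ret != 0)
		return (DDI_FAILURE);	/* a real driver unwinds here */

	sc->es_intr_enabled = 1;	/* only enable after registration */
	return (DDI_SUCCESS);
}
```

Freeing the structure immediately after the call keeps the success and
failure paths symmetric, which is one reasonable way to avoid leaking it.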
.Ss MAC Callbacks
The MAC framework interacts with a device driver through a series of
callbacks.
These callbacks are described in their individual manual pages and the
collection of callbacks is indicated in the
.Xr mac_callbacks 9S
manual page.
This section does not focus on the specific functions, but rather on
interactions between them and the rest of the device driver framework.
.Pp
A device driver should make no assumptions about when the various
callbacks will be called and whether or not they will be called
simultaneously.
For example, a device driver may be asked to transmit data through a call to its
.Xr mc_tx 9E
entry point while it is being asked to get a device property through a
call to its
.Xr mc_getprop 9E
entry point.
As such, while some calls may be serialized to the device, such as setting
properties, the device driver should always presume that all of its data needs
to be protected with locks.
While the device is holding locks, it is safe for it to call the following MAC
routines:
.Bl -bullet -offset indent -compact
.It
.Xr mac_hcksum_get 9F
.It
.Xr mac_hcksum_set 9F
.It
.Xr mac_lso_get 9F
.It
.Xr mac_maxsdu_update 9F
.It
.Xr mac_prop_info_set_default_link_flowctrl 9F
.It
.Xr mac_prop_info_set_default_str 9F
.It
.Xr mac_prop_info_set_default_uint8 9F
.It
.Xr mac_prop_info_set_default_uint32 9F
.It
.Xr mac_prop_info_set_default_uint64 9F
.It
.Xr mac_prop_info_set_perm 9F
.It
.Xr mac_prop_info_set_range_uint32 9F
.El
.Pp
Any other MAC related routines should not be called with locks held,
such as
.Xr mac_link_update 9F
or
.Xr mac_rx 9F .
Other routines in the DDI may be called while locks are held; however,
device driver writers should be careful about calling blocking routines
while locks are held or in interrupt context, even when it is
legal to do so, as this may cause all other callers that need a given
lock to back up behind such an operation.
.Ss Receiving Data
A device driver will often receive data by means of an
interrupt or by being asked to poll for frames.
When this occurs, zero or more frames, each with optional metadata, may
be ready for the device driver to consume.
Often each frame has a corresponding descriptor which has information about
whether or not there were errors or whether or not the device successfully
checksummed the packet.
In addition to the per-packet flow described below, there are certain
requirements that drivers must adhere to when programming the hardware
to receive data.
See the section
.Sx RECEIVE DESCRIPTOR LAYOUT
for more information.
.Pp
During a single interrupt or poll request, a device driver should process
a fixed number of frames.
For each frame the device driver should:
.Bl -enum -offset indent
.It
Ensure that all of the DMA memory for the descriptor ring is synchronized with
the
.Xr ddi_dma_sync 9F
function and check the handle for errors if the device driver has enabled DMA
error reporting as part of the Fault Management Architecture (FMA).
If the driver does not rely on DMA, then it may skip this step.
It is recommended that this is performed once per interrupt or poll for
the entire region and not on a per-packet basis.
.It
Check whether or not the frame has errors.
If errors were detected, then the frame should not be sent to the operating
system.
It is recommended that devices keep kstats (see
.Xr kstat_create 9F
for more information) and bump the counter whenever such an error is
detected.
If the device distinguishes between the types of errors, then separate kstats
for each class of error are recommended.
See the
.Sx STATISTICS
section for more information on the various error cases that should be
considered.
.It
Once the frame has been determined to be valid, the device driver should
transform the frame into a
.Xr mblk 9S .
See the section
.Sx MBLKS AND DMA
for more information on how to transform and prepare a message block.
.It
If the device supports hardware checksumming (see the
.Sx CAPABILITIES
section for more information on checksumming), then the device driver
should set the corresponding checksumming information with a call to
.Xr mac_hcksum_set 9F .
.It
It should then append this new message block to the
.Em end
of the message block chain, linking it through the
.Fa b_next
pointer.
It is vitally important that all the frames be chained in the order that they
were received.
If the device driver mistakenly reorders frames, then it may cause performance
impacts in the TCP stack and potentially impact application correctness.
.El
.Pp
Once all the frames have been processed and assembled, the device driver
should deliver them to the rest of the operating system by calling
.Xr mac_rx 9F .
The device driver should try to give as many mblk_t structures to the
system at once as possible.
It
.Em should not
call
.Xr mac_rx 9F
once for every assembled mblk_t.
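.Pp
The chaining and single-delivery pattern above can be sketched as follows.
This is a userland model under stated assumptions: the mblk_t here is a
stand-in that carries only the two linkage members plus a hypothetical b_id
tag used to check ordering, mac_rx_stub is a placeholder for the real
mac_rx(9F) call and does not share its signature, and rx_chain_t,
rx_chain_append, and example_rx are names invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the mblk_t linkage members used here. */
typedef struct mblk {
	struct mblk	*b_next;	/* next frame in the chain */
	struct mblk	*b_cont;	/* next fragment of the same frame */
	int		b_id;		/* hypothetical tag to check order */
} mblk_t;

/* Head and tail pointers make appending at the end O(1). */
typedef struct rx_chain {
	mblk_t	*rc_head;
	mblk_t	*rc_tail;
} rx_chain_t;

static void
rx_chain_append(rx_chain_t *rc, mblk_t *mp)
{
	mp->b_next = NULL;
	if (rc->rc_head == NULL)
		rc->rc_head = mp;
	else
		rc->rc_tail->b_next = mp;
	rc->rc_tail = mp;
}

static int rx_deliveries;	/* number of calls to the stub */
static mblk_t *rx_last_chain;	/* chain passed on the last call */

/* Placeholder for mac_rx(9F); it only records what it was given. */
static void
mac_rx_stub(mblk_t *chain)
{
	rx_deliveries++;
	rx_last_chain = chain;
}

/*
 * Process at most 'budget' of the 'nframes' received frames, preserving
 * arrival order, and hand the whole chain over in a single call.
 * Returns the number of frames processed.
 */
static int
example_rx(mblk_t *frames, int nframes, int budget)
{
	rx_chain_t rc = { NULL, NULL };
	int i;

	for (i = 0; i < nframes && i < budget; i++)
		rx_chain_append(&rc, &frames[i]);

	/* Any driver lock would be dropped before this point. */
	if (rc.rc_head != NULL)
		mac_rx_stub(rc.rc_head);
	return (i);
}
```

Keeping a tail pointer avoids walking the chain on every append, which
matters when the per-interrupt budget is large.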
.Pp
The device driver must not hold any locks across the call to
.Xr mac_rx 9F .
When this function is called, received data will be pushed through the
networking stack and some replies may be generated and given to the
driver to send out.
.Pp
It is not the device driver's responsibility to determine whether or not
the system can keep up with a driver's delivery rate of frames.
The rest of the networking stack will handle issues related to keeping up
appropriately and ensure that kernel memory is not exhausted by packets
that are not being processed.
.Pp
If the device driver has negotiated the
.Dv MAC_CAPAB_RINGS
capability
.Pq discussed in Xr mac_capab_rings 9E
then it should call
.Xr mac_rx_ring 9F
and not
.Xr mac_rx 9F .
A given interrupt may correspond to more than one ring that needs to be
checked.
The set of rings is likely to span different groups that were registered
with MAC through the
.Xr mr_gget 9E
interface.
In those cases, the driver should follow the above procedure
independently for each ring.
That means it will call
.Xr mac_rx_ring 9F
once for each ring using the handle that it received from when MAC
called the driver's
.Xr mr_rget 9E
entry point.
When it is looking at the rings, the driver will need to make sure that
the ring has not had interrupts disabled
.Pq due to a pending change to polling mode .
This is discussed in greater detail in the
.Xr mac_capab_rings 9E
and
.Xr mri_poll 9E
manual pages.
.Pp
Finally, the device driver should make sure that any other housekeeping
activities required for the ring are taken care of such that more data
can be received.
.Ss Transmitting Data and Back Pressure
A device driver will be asked to transmit a message block chain by
having its
.Xr mc_tx 9E
entry point called.
While the driver is processing the message blocks, it may run out of resources.
For example, a transmit descriptor ring may become full.
At that point, the device driver should return the remaining unprocessed frames.
The act of returning frames indicates that the device has asserted flow control.
Once this has been done, no additional calls will be made to the
driver's transmit entry point and the back pressure will be propagated
throughout the rest of the networking stack.
.Pp
At some point in the future when resources have become available again,
for example after an interrupt indicating that some portion of the
transmit ring has been sent, the device driver must notify the
system that it can continue transmission.
To do this, the driver should call
.Xr mac_tx_update 9F .
After that point, the driver will receive calls to its
.Xr mc_tx 9E
entry point again.
As mentioned in the section on callbacks, the device driver should avoid holding
any particular locks across the call to
.Xr mac_tx_update 9F .
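.Pp
The back pressure protocol above can be modeled with the following sketch.
It is a userland illustration under stated assumptions: the transmit ring is
reduced to a count of free descriptors, mblk_t is a stand-in with only the
b_next member, mac_tx_update_stub is a placeholder for mac_tx_update(9F) and
does not share its signature, and tx_ring_t, example_tx, and
example_tx_recycle are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the mblk_t chain linkage. */
typedef struct mblk {
	struct mblk	*b_next;
} mblk_t;

/* A transmit ring reduced to what the protocol needs. */
typedef struct tx_ring {
	int	tr_free;	/* free transmit descriptors */
	int	tr_blocked;	/* nonzero once frames were returned */
} tx_ring_t;

static int tx_updates;		/* number of calls to the stub */

/* Placeholder for mac_tx_update(9F). */
static void
mac_tx_update_stub(void)
{
	tx_updates++;
}

/*
 * Transmit-style routine: consume frames while descriptors remain.
 * Returns NULL when everything was consumed, otherwise the unsent
 * remainder of the chain, which signals flow control to the caller.
 */
static mblk_t *
example_tx(tx_ring_t *tr, mblk_t *chain)
{
	while (chain != NULL) {
		mblk_t *next;

		if (tr->tr_free == 0) {
			tr->tr_blocked = 1;
			return (chain);
		}
		next = chain->b_next;
		chain->b_next = NULL;
		tr->tr_free--;	/* the hardware would own the frame now */
		chain = next;
	}
	return (NULL);
}

/* Completion path: reclaim descriptors, then notify if we were blocked. */
static void
example_tx_recycle(tx_ring_t *tr, int completed)
{
	int notify;

	tr->tr_free += completed;
	notify = (tr->tr_blocked && tr->tr_free > 0);
	if (notify)
		tr->tr_blocked = 0;

	/* Any ring lock would be dropped here, before notifying. */
	if (notify)
		mac_tx_update_stub();
}
```

Deciding whether to notify while the ring state is consistent, and making
the call afterwards, is one way to keep the notification outside of any lock.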
.Ss Interrupt Coalescing
For devices operating at higher data rates, interrupt coalescing is an
important part of a well functioning device and may impact the
performance of the device.
Not all devices support interrupt coalescing.
If interrupt coalescing is supported on the device, it is recommended that
device driver writers provide private properties for their device to control the
interrupt coalescing rate.
This will make it much easier to perform experiments and observe the impact of
different interrupt rates on the rest of the system.
.Ss Polling
Even with interrupt coalescing, above a certain incoming packet rate it
can make more sense to just actively poll the device, asking for more packets
rather than constantly taking an interrupt.
When a device driver supports the
.Xr mac_capab_rings 9E
capability and therefore polling on receive rings, the MAC framework will ask
the driver to disable interrupts, with its
.Xr mi_disable 9E
entry point, and then subsequently call its polling entry point,
.Xr mri_poll 9E .
.Pp
As long as a device driver implements the needed entry points, then there is
nothing else that it needs to do to take advantage of polling.
A driver should not attempt to spin up its own threads, task queues, or
creatively use timeouts, to try to simulate polling for received packets.
.Ss MAC Address Filter Management
The MAC framework will attempt to use as many MAC address filters as a
device has.
To program a multicast address filter, the driver's
.Xr mc_multicst 9E
entry point will be called.
If the device driver runs out of filters, it should not take any special action
and just return the appropriate error as documented in the corresponding manual
pages for the entry points.
The framework will ensure that the device is placed in promiscuous mode
if it needs to.
.Pp
If the hardware supports more than one unicast filter then the device
driver should consider implementing the
.Dv MAC_CAPAB_RINGS
capability, which exposes a means for multiple unicast MAC address filters to be
used by the broader system.
It is still useful to implement this on hardware which only has a single ring.
See
.Xr mac_capab_rings 9E
for more information.
.Ss Receive Side Scaling
Receive side scaling is where a hardware device supports multiple,
independent queues of frames that can be received.
Each of these queues is generally associated with an independent
interrupt and the hardware usually performs some form of hash across the
queues.
Hardware which supports this should look at implementing the
.Dv MAC_CAPAB_RINGS
capability and see
.Xr mac_capab_rings 9E
for more information.
.Ss Link Updates
It is the responsibility of the device driver to keep track of the
data link's state.
Many devices provide a means of receiving an interrupt when the state of the
link changes.
When such a change happens, the driver should update its internal data
structures and then call
.Xr mac_link_update 9F
to inform the MAC layer that this has occurred.
If the device driver does not properly inform the system about link changes,
then various features like link aggregations and other mechanisms that leverage
the link state will not work correctly.
.Ss Link Speed and Auto-negotiation
Many networking devices support more than one possible speed that they
can operate at.
The selection of a speed is often performed through
.Em auto-negotiation ,
though some devices allow the user to control what speeds are advertised
and used.
.Pp
Logically, there are two different sets of things that the device driver
needs to keep track of while it's operating:
.Bl -enum
.It
The supported speeds in hardware.
.It
The enabled speeds from the user.
.El
.Pp
By default, when a link first comes up, the device driver should
generally configure the link to support the common set of speeds and
perform auto-negotiation.
.Pp
A user can control what speeds a device advertises via auto-negotiation
and whether or not it performs auto-negotiation at all by using a series
of properties that have
.Sy _EN_
in the name.
These are read/write properties and there is one for each speed supported in the
operating system.
For a full list of them, see the
.Sx PROPERTIES
section.
.Pp
In addition to these properties, there is a corresponding set of
properties with
.Sy _ADV_
in the name.
These are similar to the
.Sy _EN_
family of properties, but they are read-only and indicate what the
device has actually negotiated.
While they are generally similar to the
.Sy _EN_
family of properties, they may change depending on power settings.
See the
.Sy Ethernet Link Properties
section in
.Xr dladm 8
for more information.
.Pp
It's worth discussing how these different values get used throughout the
different entry points.
The first entry point to consider is the
.Xr mc_propinfo 9E
entry point.
For a given speed, the driver should determine whether or not the hardware
supports this speed.
If it does, it should fill in the default value that the hardware takes and
indicate whether or not the property is writable.
This holds for both the
.Sy _EN_
and
.Sy _ADV_
family of properties.
.Pp
The next entry point is
.Xr mc_getprop 9E .
Here, the driver should first check whether the given speed is
supported.
If it is not, then the driver should return
.Er ENOTSUP .
If it is, then it should return the current value of the property.
.Pp
The last property-related entry point is the
.Xr mc_setprop 9E
entry point.
Here, the same logic applies.
Before the driver considers whether or not the property is writable, it should
first check whether or not it's a supported property.
If it's not, then it should return
.Er ENOTSUP .
Otherwise, it should proceed to check whether the property is writable,
and if it is and the value is valid, then it should update the property and
restart the link's negotiation.
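.Pp
The order of checks described above for a single speed property can be
sketched as follows.
This is a userland model under stated assumptions: example_speed_t and the
two functions are hypothetical and do not share the signatures of the
mc_getprop and mc_setprop entry points, and the use of EPERM for a read-only
property and EINVAL for an out-of-range value are choices made for the
sketch that the text above does not specify.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-speed state a driver might track. */
typedef struct example_speed {
	int	es_hw_supported;	/* hardware supports this speed */
	int	es_writable;		/* the user may change the value */
	uint8_t	es_en;			/* current enabled value, 0 or 1 */
	int	es_renegotiations;	/* times negotiation was restarted */
} example_speed_t;

/* Get path: unsupported speeds are rejected before anything else. */
static int
example_getprop_speed(const example_speed_t *sp, uint8_t *valp)
{
	if (!sp->es_hw_supported)
		return (ENOTSUP);
	*valp = sp->es_en;
	return (0);
}

/* Set path: support first, then writability, then value validity. */
static int
example_setprop_speed(example_speed_t *sp, uint8_t val)
{
	if (!sp->es_hw_supported)
		return (ENOTSUP);
	if (!sp->es_writable)
		return (EPERM);		/* assumption made for this sketch */
	if (val != 0 && val != 1)
		return (EINVAL);	/* assumption made for this sketch */

	sp->es_en = val;
	sp->es_renegotiations++;	/* stands in for restarting the link */
	return (0);
}
```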
.Pp
Finally, there is the
.Xr mc_getstat 9E
entry point.
Several of the statistics that are queried relate to auto-negotiation and
hardware capabilities.
When a statistic relates to the hardware supporting a given speed, the
.Sy _EN_
properties should be ignored.
The only thing that should be consulted is what the hardware itself supports.
Otherwise, the statistics should look at what is currently being advertised by
the device.
.Ss Unregistering from MAC
During a driver's
.Xr detach 9E
routine, it should unregister the device instance from MAC by calling
.Xr mac_unregister 9F
with the handle that it originally received from
.Xr mac_register 9F .
If the call to
.Xr mac_unregister 9F
fails, then the device is likely still in use and the driver should
fail the call to
.Xr detach 9E .
.Ss Interacting with Devices
Administrators always interact with devices through the
.Xr dladm 8
command line interface.
The state of devices, such as whether the link is considered up or down,
and various link properties, such as the MTU, auto-negotiation state, and
flow control state, are all exposed there.
It is also the preferred way that these properties are set and configured.
.Pp
While device tunables may be presented in a
.Xr driver.conf 5
file, it is recommended instead to expose such things through
.Xr dladm 8
private properties, whether explicitly documented or not.
.Sh CAPABILITIES
Capabilities in the MAC framework are optional features that indicate
various hardware features that the device supports.
The two current capabilities that the system supports are the ability for
hardware to perform large send offload (LSO), often also known as TCP
segmentation offload, and the ability for hardware to calculate and verify
the checksums present in IPv4, IPv6, and protocol headers such as TCP and UDP.
.Pp
The MAC framework will query a device for support of a capability
through the
.Xr mc_getcapab 9E
function.
Each capability has its own constant and may have corresponding data that goes
along with it and a specific structure that the device is required to fill in.
Note, the set of capabilities changes over time and there are also private
capabilities in the system.
Several of the capabilities are used in the implementation of the MAC framework.
Others, like
.Dv MAC_CAPAB_RINGS ,
represent features that have not been stabilized and thus both API and binary
compatibility for them is not guaranteed.
It is important that the device driver handles unknown capabilities correctly.
For more information, see
.Xr mc_getcapab 9E .
.Pp
The following capabilities are
stable and defined in the system:
.Ss Dv MAC_CAPAB_HCKSUM
The
.Dv MAC_CAPAB_HCKSUM
capability indicates to the system that the device driver supports some
amount of checksumming.
The specific data for this capability is a pointer to a
.Vt uint32_t .
To indicate no support for any kind of checksumming, the driver should
either set this value to zero or simply return that it doesn't support
the capability.
.Pp
Note, the values that the driver declares in this capability indicate
what it can do when it transmits data.
If the driver can only verify checksums when receiving data, then it should not
indicate that it supports this capability.
The following set of flags may be combined through a bitwise inclusive OR:
.Bl -tag -width Ds
.It Dv HCKSUM_INET_PARTIAL
This indicates that the hardware can calculate a partial checksum for
both IPv4 and IPv6 UDP and TCP packets; however, it requires the pseudo-header
checksum be calculated for it.
The pseudo-header checksum will be available for the mblk_t when calling
.Xr mac_hcksum_get 9F .
Note this does not imply that the hardware is capable of calculating
the partial checksum for other L4 protocols or the IPv4 header checksum.
That should be indicated with the
.Dv HCKSUM_IPHDRCKSUM
flag.
.It Dv HCKSUM_INET_FULL_V4
This indicates that the hardware will fully calculate the L4 checksum for
outgoing IPv4 UDP or TCP packets only, and does not require a pseudo-header
checksum.
Note this does not imply that the hardware is capable of calculating the
checksum for other L4 protocols or the IPv4 header checksum.
That should be indicated with the
.Dv HCKSUM_IPHDRCKSUM
flag.
.It Dv HCKSUM_INET_FULL_V6
This indicates that the hardware will fully calculate the L4 checksum for
outgoing IPv6 UDP or TCP packets only, and does not require a pseudo-header
checksum.
Note this does not imply that the hardware is capable of calculating the
checksum for any other L4 protocols.
.It Dv HCKSUM_IPHDRCKSUM
This indicates that the hardware supports calculating the checksum for
the IPv4 header itself.
.El
.Pp
When in a driver's transmit function, the driver will be processing a
single frame.
It should call
.Xr mac_hcksum_get 9F
to see what checksum flags are set on it.
Note that the flags that are set on it are different from the ones described
above and are documented in its manual page.
These flags indicate how the driver is expected to program the hardware and what
checksumming is required.
Not all frames will require hardware checksumming or will ask the hardware to
checksum them.
.Pp
If a driver supports offloading the receive checksum and verification,
it should check to see what the hardware indicated was verified.
The driver should then call
.Xr mac_hcksum_set 9F .
The flags used are different from the ones above and are discussed in
detail in the
.Xr mac_hcksum_set 9F
manual page.
If there is no checksum information available or the driver does not support
checksumming, then it should simply not call
.Xr mac_hcksum_set 9F .
.Pp
Note that the checksum flags should be set on the first
mblk_t that makes up a given message.
In other words, if multiple mblk_t structures are linked together by the
.Fa b_cont
member to describe a single frame, then
.Xr mac_hcksum_set 9F
should only be called on the
first mblk_t of that set.
However, each distinct message should have the checksum bits set on it, if
applicable.
In other words, each mblk_t that is linked together by the
.Fa b_next
pointer may have checksum flags set.
.Pp
It is recommended that device drivers provide a private property or
.Xr driver.conf 5
property to control whether or not checksumming is enabled for both rx
and tx; however, the default disposition is recommended to be enabled
for both.
This way if hardware bugs are found in the checksumming implementation,
checksumming can be disabled without requiring software updates.
The transmit property should be checked when determining how to reply to
.Xr mc_getcapab 9E
and the receive property should be checked in the context of the receive
function.
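.Pp
One way to honor a transmit checksum tunable when answering the capability
query is sketched below.
This is a userland model under stated assumptions: the HCKSUM_ bit values
are placeholders chosen for the sketch and should not be taken as the values
in the system headers, and example_softc_t, its members, and
example_getcapab_hcksum are hypothetical names that do not match the real
mc_getcapab signature.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit values; real drivers use the system header definitions. */
#define	HCKSUM_INET_PARTIAL	0x02
#define	HCKSUM_INET_FULL_V4	0x04
#define	HCKSUM_INET_FULL_V6	0x08
#define	HCKSUM_IPHDRCKSUM	0x10

/* Hypothetical soft state for the sketch. */
typedef struct example_softc {
	int		es_tx_cksum_enable;	/* administrator tunable */
	uint32_t	es_hw_cksum;		/* what the hardware can do */
} example_softc_t;

/*
 * Answer a checksum capability query.
 * Returns 1 and fills in *flagsp when transmit checksumming is both
 * enabled and supported by hardware, otherwise returns 0 and leaves
 * *flagsp untouched.
 */
static int
example_getcapab_hcksum(const example_softc_t *sc, uint32_t *flagsp)
{
	if (!sc->es_tx_cksum_enable || sc->es_hw_cksum == 0)
		return (0);
	*flagsp = sc->es_hw_cksum;
	return (1);
}
```

Reporting the capability as unsupported when the tunable is off means the
rest of the stack computes checksums in software without further changes in
the driver.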
.Ss Dv MAC_CAPAB_LSO
The
.Dv MAC_CAPAB_LSO
capability indicates that the driver supports various forms of large
send offload (LSO).
The private data is a pointer to a
.Vt mac_capab_lso_t
structure.
The system currently supports offloading TCP packets over both IPv4 and
IPv6.
This structure has the following members which are used to indicate
various types of LSO support.
.Bd -literal -offset indent
t_uscalar_t		lso_flags;
lso_basic_tcp_ipv4_t	lso_basic_tcp_ipv4;
lso_basic_tcp_ipv6_t	lso_basic_tcp_ipv6;
.Ed
.Pp
The
.Fa lso_flags
member is used to indicate which members are valid and should be
considered.
Each flag represents a different form of LSO.
The member should be set to the bitwise inclusive OR of the following values:
.Bl -tag -width Dv -offset indent
.It Dv LSO_TX_BASIC_TCP_IPV4
This indicates hardware support for performing TCP segmentation
offloading over IPv4.
When this flag is set, the
.Fa lso_basic_tcp_ipv4
member must be filled in.
.It Dv LSO_TX_BASIC_TCP_IPV6
This indicates hardware support for performing TCP segmentation
offloading over IPv6.
The IPv6 packet will have no extension headers present.
When this flag is set, the
.Fa lso_basic_tcp_ipv6
member must be filled in.
.El
.Pp
The
.Fa lso_basic_tcp_ipv4
member is a structure with the following members:
.Bd -literal -offset indent
t_uscalar_t	lso_max;
.Ed
.Bd -filled -offset indent
The
.Fa lso_max
member should be set to the maximum size of the TCP data
payload that can be offloaded to the hardware.
.Ed
.Pp
The
.Fa lso_basic_tcp_ipv6
member is a structure with the following members:
.Bd -literal -offset indent
t_uscalar_t	lso_max;
.Ed
.Bd -filled -offset indent
The
.Fa lso_max
member should be set to the maximum size of the TCP data
payload that can be offloaded to the hardware.
.Ed
.Pp
As with checksumming, it is recommended that driver writers provide a
means for disabling the support of LSO even if it is enabled by default.
That way, issues that arise with LSO may be worked
around without requiring additional driver work.
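.Pp
Filling in the LSO capability structure might look like the following sketch.
This is a userland model under stated assumptions: the type definitions
mirror the members listed above but are stand-ins for the system headers, the
two LSO_TX_ flag values are placeholders chosen for the sketch, and
example_getcapab_lso with its lso_enable, hw_v6, and hw_max parameters is a
hypothetical helper that does not match the real mc_getcapab signature.

```c
#include <assert.h>

/* Stand-ins that mirror the members described above. */
typedef unsigned int t_uscalar_t;
typedef struct { t_uscalar_t lso_max; } lso_basic_tcp_ipv4_t;
typedef struct { t_uscalar_t lso_max; } lso_basic_tcp_ipv6_t;
typedef struct mac_capab_lso {
	t_uscalar_t		lso_flags;
	lso_basic_tcp_ipv4_t	lso_basic_tcp_ipv4;
	lso_basic_tcp_ipv6_t	lso_basic_tcp_ipv6;
} mac_capab_lso_t;

/* Placeholder flag values; real drivers use the system definitions. */
#define	LSO_TX_BASIC_TCP_IPV4	0x01
#define	LSO_TX_BASIC_TCP_IPV6	0x02

/*
 * Answer an LSO capability query.
 * Returns 0 when the tunable has LSO disabled.  Otherwise it sets a flag
 * for each supported form, fills in the matching member, and returns 1.
 */
static int
example_getcapab_lso(int lso_enable, int hw_v6, t_uscalar_t hw_max,
    mac_capab_lso_t *lso)
{
	if (!lso_enable)
		return (0);

	lso->lso_flags = LSO_TX_BASIC_TCP_IPV4;
	lso->lso_basic_tcp_ipv4.lso_max = hw_max;

	if (hw_v6) {
		lso->lso_flags |= LSO_TX_BASIC_TCP_IPV6;
		lso->lso_basic_tcp_ipv6.lso_max = hw_max;
	}
	return (1);
}
```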
837.Sh EVOLVING CAPABILITIES
838The following capabilities are still evolving in the operating system.
839They are documented such that device driver writers may experiment with
840them.
841However, if such drivers are not present inside the core operating
842system repository, they may be subject to API and ABI breakage.
843.Ss Dv MAC_CAPAB_RINGS
844The
845.Dv MAC_CAPAB_RINGS
846capability is very important for implementing a high-performing device
847driver.
848Networking hardware structures the queues of packets to be sent
849and received into a ring.
850Each entry in this ring has a descriptor, which describes the address
851and options for a packet which is going to
852be transmitted or received.
853While simple networking devices only have a single ring, most high-speed
854networking devices have support for many rings.
855.Pp
856Rings are used for two important purposes.
857The first is receive side scaling (RSS), which is the ability to have
858the hardware hash the contents of a packet based on some of the protocol
859headers, and send it to one of several rings.
860These different rings may each have their own interrupt associated with
861them, allowing the card to receive traffic in parallel.
862Similar logic can be performed when sending traffic, to leverage
863multiple hardware resources, thus increasing capacity.
864.Pp
865The second use of rings is to group them together and apply filtering
866rules.
867For example, if a packet matches a specific VLAN or MAC address,
868then it can be sent to a specific ring or a specific group of rings.
869This is especially useful when there are multiple different virtual NICs
870or zones in play as the operating system will be able to use the
hardware classification features to already know where a given packet
872needs to be delivered internally rather than having to determine that
873for each packet.
874.Pp
875From the MAC framework's perspective, a driver can have one or more
876groups.
877A group consists of the following:
.Bl -bullet -offset indent
879.It
880One or more hardware rings.
881.It
882One or more MAC address or VLAN filters.
883.El
884.Pp
885The details around how a device driver changes when rings are employed,
886the data structures that a driver must implement, and more are available
887in
888.Xr mac_capab_rings 9E .
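.Pp
The receive side scaling idea described above can be sketched as follows.
This is a toy illustration only: the
.Vt example_flow_t
type and the
.Fn example_pick_ring
function are hypothetical, and the FNV-1a style mixing stands in for
whatever hash the hardware implements (commonly a keyed Toeplitz hash).
The point it shows is that packets of one flow always land on the same ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: the header fields a device might hash on. */
typedef struct {
	uint32_t ef_src_ip;
	uint32_t ef_dst_ip;
	uint16_t ef_src_port;
	uint16_t ef_dst_port;
} example_flow_t;

/* Mix one 32-bit word into the hash, a byte at a time (FNV-1a style). */
static uint32_t
example_mix(uint32_t h, uint32_t v)
{
	int i;

	for (i = 0; i < 4; i++) {
		h ^= (v >> (i * 8)) & 0xff;
		h *= 16777619u;
	}
	return (h);
}

/* Toy hash over the flow key; not what real hardware computes. */
uint32_t
example_flow_hash(const example_flow_t *f)
{
	uint32_t h = 2166136261u;

	h = example_mix(h, f->ef_src_ip);
	h = example_mix(h, f->ef_dst_ip);
	h = example_mix(h, ((uint32_t)f->ef_src_port << 16) | f->ef_dst_port);
	return (h);
}

/* Pick a receive ring index in the range [0, nrings). */
uint32_t
example_pick_ring(const example_flow_t *f, uint32_t nrings)
{
	assert(f != NULL && nrings > 0);
	return (example_flow_hash(f) % nrings);
}
```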
889.Ss Dv MAC_CAPAB_TRANSCEIVER
890Many networking devices leverage external transceivers that adhere to
891standards such as SFP, QSFP, QSFP-DD, etc., which often contain
standardized information in an EEPROM on the device.
893The
894.Dv MAC_CAPAB_TRANSCEIVER
895capability provides a means of discovering the number of transceivers,
896their types, and reading the data from a transceiver.
897This allows administrators and users to determine if devices are
898present, if the hardware can use them, and in many cases, detailed
899information about the device ranging from its manufacturer and
900serial numbers to specific information about its health.
901Implementing this capability will lead to the operating system being
902able to discover and display transceivers as part of its fault
903management topology.
904.Pp
905See
906.Xr mac_capab_transceiver 9E
907for more details on the capability structure and the various function
908entry points that come along with it.
909.Ss Dv MAC_CAPAB_LED
910The
911.Dv MAC_CAPAB_LED
912capability provides a means to access and control the LEDs on a network
913interface card.
914This is then made available to the broader operating system and consumed
915by facilities such as the Fault Management Architecture.
916See
917.Xr mac_capab_led 9E
918for more details on the structure and requirements of the capability.
919.Sh PROPERTIES
920Properties in the MAC framework represent aspects of a link.
921These include things like the link's current state and MTU.
922Many of the properties in the system are focused around auto-negotiation and
923controlling what link speeds are advertised.
924Information about properties is covered by three different device entry points.
925The
926.Xr mc_propinfo 9E
927entry point obtains metadata about the property.
928The
929.Xr mc_getprop 9E
930entry point obtains the property.
931The
932.Xr mc_setprop 9E
933entry point updates the property to a new value.
934.Pp
935Many of the properties listed below are read-only.
936Each property indicates whether it's read-only or it's read/write.
937However, driver writers may not implement the ability to set all writable
938properties.
939Many of these depend on the card itself.
940In particular, all properties that relate to auto-negotiation and are read/write
941may not be updated if the hardware in question does not support toggling what
942link speeds are auto-negotiated.
943While copper Ethernet often does not have this restriction, it often exists with
various fiber standards and PHYs.
945.Pp
946The following properties are the subset of MAC framework properties that
947driver writers should be aware of and handle.
948While other properties exist in the system, driver writers should always return
949an error when a property not listed below is encountered.
950See
951.Xr mc_getprop 9E
952and
953.Xr mc_setprop 9E
954for more information on how to handle them.
955.Bl -hang -width Ds
956.It Dv MAC_PROP_DUPLEX
957.Bd -filled -compact
958Type:
959.Vt link_duplex_t |
960Permissions:
961.Sy Read-Only
962.Ed
963.Pp
964The
965.Dv MAC_PROP_DUPLEX
966property is used to indicate whether or not the link is duplex.
967A duplex link may have traffic flowing in both directions at the same time.
968The
969.Vt link_duplex_t
970is an enumeration which may be set to any of the following values:
971.Bl -tag -width Ds
972.It Dv LINK_DUPLEX_UNKNOWN
973The current state of the link is unknown.
974This may be because the link has not negotiated to a specific speed or it is
975down.
976.It Dv LINK_DUPLEX_HALF
977The link is running at half duplex.
978Communication may travel in only one direction on the link at a given time.
979.It Dv LINK_DUPLEX_FULL
980The link is running at full duplex.
981Communication may travel in both directions on the link simultaneously.
982.El
983.It Dv MAC_PROP_SPEED
984.Bd -filled -compact
985Type:
986.Vt uint64_t |
987Permissions:
988.Sy Read-Only
989.Ed
990.Pp
991The
992.Dv MAC_PROP_SPEED
993property stores the current link speed in bits per second.
A link that is running at 100 Mbit/s would store the value 100000000ULL.
995A link that is running at 40 Gbit/s would store the value 40000000000ULL.
996.It Dv MAC_PROP_STATUS
997.Bd -filled -compact
998Type:
999.Vt link_state_t |
1000Permissions:
1001.Sy Read-Only
1002.Ed
1003.Pp
1004The
1005.Dv MAC_PROP_STATUS
1006property is used to indicate the current state of the link.
1007It indicates whether the link is up or down.
1008The
1009.Vt link_state_t
1010is an enumeration which may be set to any of the following values:
1011.Bl -tag -width Ds
1012.It Dv LINK_STATE_UNKNOWN
1013The current state of the link is unknown.
1014This may be because the driver's
1015.Xr mc_start 9E
entry point has not been called, so the driver has not attempted to start the link.
1017.It Dv LINK_STATE_DOWN
1018The link is down.
1019This may be because of a negotiation problem, a cable problem, or some other
1020device specific issue.
1021.It Dv LINK_STATE_UP
1022The link is up.
1023If auto-negotiation is in use, it should have completed.
1024Traffic should be able to flow over the link, barring other issues.
1025.El
1026.It Dv MAC_PROP_AUTONEG
1027.Bd -filled -compact
1028Type:
1029.Vt uint8_t |
1030Permissions:
1031.Sy Read/Write
1032.Ed
1033.Pp
1034The
1035.Dv MAC_PROP_AUTONEG
1036property indicates whether or not the device is currently configured to
1037perform auto-negotiation.
1038A value of
1039.Sy 0
1040indicates that auto-negotiation is disabled.
1041A
1042.Sy non-zero
1043value indicates that auto-negotiation is enabled.
1044Devices should generally default to enabling auto-negotiation.
1045.Pp
1046When getting this property, the device driver should return the current
1047state.
1048When setting this property, if the device supports operating in the requested
1049mode, then the device driver should reset the link to negotiate to the new speed
1050after updating any internal registers.
1051.It Dv MAC_PROP_MTU
1052.Bd -filled -compact
1053Type:
1054.Vt uint32_t |
1055Permissions:
1056.Sy Read/Write
1057.Ed
1058.Pp
1059The
1060.Dv MAC_PROP_MTU
1061property determines the maximum transmission unit (MTU).
1062This indicates the maximum size packet that the device can transmit, ignoring
1063its own headers.
1064For an Ethernet device, this would exclude the size of the Ethernet header and
1065any VLAN headers that would be placed.
It is up to the driver to ensure that any MTU value that it accepts, when added
to its margin and header sizes, does not exceed its maximum frame size.
1068.Pp
By default, drivers for Ethernet should initialize the
MTU to
1071.Sy 1500 .
1072When getting this property, the driver should return its current
1073recorded MTU.
1074When setting this property, the driver should first validate that it is within
1075the device's valid range and then it must call
1076.Xr mac_maxsdu_update 9F .
1077Note that the call may fail.
1078If the call completes successfully, the driver should update the hardware with
1079the new value of the MTU and perform any other work needed to handle it.
1080.Pp
1081If the device does not support changing the MTU after the device's
1082.Xr mc_start 9E
1083entry point has been called, then driver writers should return
1084.Er EBUSY .
1085.It Dv MAC_PROP_FLOWCTRL
1086.Bd -filled -compact
1087Type:
1088.Vt link_flowctrl_t |
1089Permissions:
1090.Sy Read/Write
1091.Ed
1092.Pp
1093The
1094.Dv MAC_PROP_FLOWCTRL
1095property manages the configuration of pause frames as part of Ethernet
1096flow control.
1097Note, this only describes what this device will advertise.
1098What is actually enabled may be different and is subject to the rules of
1099auto-negotiation.
1100The
1101.Vt link_flowctrl_t
1102is an enumeration that may be set to one of the following values:
1103.Bl -tag -width Ds
1104.It Dv LINK_FLOWCTRL_NONE
1105Flow control is disabled.
1106No pause frames should be generated or honored.
1107.It Dv LINK_FLOWCTRL_RX
1108The device can receive pause frames; however, it should not generate
1109them.
1110.It Dv LINK_FLOWCTRL_TX
1111The device can generate pause frames; however, it does not support
1112receiving them.
1113.It Dv LINK_FLOWCTRL_BI
1114The device supports both sending and receiving pause frames.
1115.El
1116.Pp
1117When getting this property, the device driver should return the way that
1118it has configured the device, not what the device has actually
1119negotiated.
1120When setting the property, it should update the hardware and allow the link to
1121potentially perform auto-negotiation again.
1122.It Dv MAC_PROP_EN_FEC_CAP
1123.Bd -filled -compact
1124Type:
1125.Vt link_fec_t |
1126Permissions:
1127.Sy Read/Write
1128.Ed
1129.Pp
1130The
1131.Dv MAC_PROP_EN_FEC_CAP
1132property indicates which Forward Error Correction (FEC) code is advertised
1133by the device.
1134.Pp
1135The
1136.Vt link_fec_t
1137is an enumeration that may be a combination of the following bit values:
1138.Bl -tag -width Ds
1139.It Dv LINK_FEC_NONE
1140No FEC over the link.
1141.It Dv LINK_FEC_AUTO
The FEC coding to use is auto-negotiated.
1143.Dv LINK_FEC_AUTO
1144cannot be set along with any of the other values.
1145This is the default setting the device driver should use.
1146.It Dv LINK_FEC_RS
1147The link may use Reed-Solomon FEC coding.
1148.It Dv LINK_FEC_BASE_R
The link may use Base-R coding, also commonly referred to as FireCode.
1150.El
1151.Pp
When setting the property, the driver should update the hardware with the
requested coding or combination of codings.
1154If a particular combination of codings is not supported by the hardware,
1155the device driver should return
1156.Er EINVAL .
1157When retrieving this property, the device driver should return the current
1158value of the property.
1159.It Dv MAC_PROP_ADV_FEC_CAP
1160.Bd -filled -compact
1161Type:
1162.Vt link_fec_t |
1163Permissions:
1164.Sy Read-Only
1165.Ed
1166.Pp
1167The
1168.Dv MAC_PROP_ADV_FEC_CAP
property has the same values as
1170.Dv MAC_PROP_EN_FEC_CAP .
1171The property indicates which Forward Error Correction (FEC) code has been
1172negotiated over the link.
1173.El
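.Pp
The rules given above for setting
.Dv MAC_PROP_MTU
and
.Dv MAC_PROP_EN_FEC_CAP
can be sketched as follows.
This is an illustration under stated assumptions, not a complete
.Xr mc_setprop 9E
implementation: the
.Vt example_dev_t
structure and all
.Fn example_*
functions are hypothetical, the enumeration values are placeholders,
.Fn example_maxsdu_update
is a stub standing in for
.Xr mac_maxsdu_update 9F ,
and returning
.Er EINVAL
for an out-of-range MTU is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder bit values standing in for the link_fec_t enumeration. */
typedef enum {
	LINK_FEC_NONE	= 1 << 0,
	LINK_FEC_AUTO	= 1 << 1,
	LINK_FEC_RS	= 1 << 2,
	LINK_FEC_BASE_R	= 1 << 3
} link_fec_t;

/* Hypothetical per-device state. */
typedef struct {
	bool		ed_started;	/* mc_start(9E) has been called */
	bool		ed_mtu_live;	/* MTU may change while started */
	uint32_t	ed_min_mtu;
	uint32_t	ed_max_mtu;
	uint32_t	ed_mtu;
	uint32_t	ed_hw_fec;	/* FEC bits the hardware supports */
	uint32_t	ed_fec;		/* FEC bits currently requested */
} example_dev_t;

/* Stub standing in for mac_maxsdu_update(9F); always succeeds here. */
static int
example_maxsdu_update(example_dev_t *dev, uint32_t mtu)
{
	(void) dev;
	(void) mtu;
	return (0);
}

int
example_set_mtu(example_dev_t *dev, uint32_t mtu)
{
	int ret;

	assert(dev != NULL);
	/* Some devices cannot change the MTU once started. */
	if (dev->ed_started && !dev->ed_mtu_live)
		return (EBUSY);
	/* Validate against the device's range first. */
	if (mtu < dev->ed_min_mtu || mtu > dev->ed_max_mtu)
		return (EINVAL);
	/* Tell the framework; note that this call may fail. */
	if ((ret = example_maxsdu_update(dev, mtu)) != 0)
		return (ret);
	/* A real driver would program the hardware here. */
	dev->ed_mtu = mtu;
	return (0);
}

int
example_set_fec(example_dev_t *dev, uint32_t fec)
{
	assert(dev != NULL);
	/* LINK_FEC_AUTO may not be combined with any other value. */
	if ((fec & LINK_FEC_AUTO) != 0 && fec != LINK_FEC_AUTO)
		return (EINVAL);
	/* Reject empty requests and codings the hardware lacks. */
	if (fec == 0 || (fec & ~dev->ed_hw_fec) != 0)
		return (EINVAL);
	dev->ed_fec = fec;
	return (0);
}
```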
1174.Pp
1175The remaining properties are all about various auto-negotiation link
1176speeds.
1177They fall into two different buckets: properties with
1178.Sy _ADV_
1179in the name and properties with
1180.Sy _EN_
1181in the name.
1182For any given supported speed, there is one of each.
1183The
1184.Sy _EN_
1185set of properties are read/write properties that control what should be
1186advertised by the device.
1187When these are retrieved, they should return the current value of the property.
1188When they are set, they should change how the hardware advertises the specific
1189speed and trigger any kind of link reset and auto-negotiation, if enabled, to
1190occur.
1191.Pp
1192The
1193.Sy _ADV_
1194set of properties are read-only properties.
1195They are meant to reflect what has actually been negotiated.
1196These may be different from the
1197.Sy _EN_
1198family of properties, especially when different power management
1199settings are at play.
1200.Pp
1201See the
1202.Sx Link Speed and Auto-negotiation
1203section for more information.
1204.Pp
1205The properties are ordered in increasing link speed:
1206.Bl -hang -width Ds
1207.It Dv MAC_PROP_ADV_10HDX_CAP
1208.Bd -filled -compact
1209Type:
1210.Vt uint8_t |
1211Permissions:
1212.Sy Read-Only
1213.Ed
1214.Pp
1215The
1216.Dv MAC_PROP_ADV_10HDX_CAP
1217property describes whether or not 10 Mbit/s half-duplex support is
1218advertised.
1219.It Dv MAC_PROP_EN_10HDX_CAP
1220.Bd -filled -compact
1221Type:
1222.Vt uint8_t |
1223Permissions:
1224.Sy Read/Write
1225.Ed
1226.Pp
1227The
1228.Dv MAC_PROP_EN_10HDX_CAP
1229property describes whether or not 10 Mbit/s half-duplex support is
1230enabled.
1231.It Dv MAC_PROP_ADV_10FDX_CAP
1232.Bd -filled -compact
1233Type:
1234.Vt uint8_t |
1235Permissions:
1236.Sy Read-Only
1237.Ed
1238.Pp
1239The
1240.Dv MAC_PROP_ADV_10FDX_CAP
1241property describes whether or not 10 Mbit/s full-duplex support is
1242advertised.
1243.It Dv MAC_PROP_EN_10FDX_CAP
1244.Bd -filled -compact
1245Type:
1246.Vt uint8_t |
1247Permissions:
1248.Sy Read/Write
1249.Ed
1250.Pp
1251The
1252.Dv MAC_PROP_EN_10FDX_CAP
1253property describes whether or not 10 Mbit/s full-duplex support is
1254enabled.
1255.It Dv MAC_PROP_ADV_100HDX_CAP
1256.Bd -filled -compact
1257Type:
1258.Vt uint8_t |
1259Permissions:
1260.Sy Read-Only
1261.Ed
1262.Pp
1263The
1264.Dv MAC_PROP_ADV_100HDX_CAP
1265property describes whether or not 100 Mbit/s half-duplex support is
1266advertised.
1267.It Dv MAC_PROP_EN_100HDX_CAP
1268.Bd -filled -compact
1269Type:
1270.Vt uint8_t |
1271Permissions:
1272.Sy Read/Write
1273.Ed
1274.Pp
1275The
1276.Dv MAC_PROP_EN_100HDX_CAP
1277property describes whether or not 100 Mbit/s half-duplex support is
1278enabled.
1279.It Dv MAC_PROP_ADV_100FDX_CAP
1280.Bd -filled -compact
1281Type:
1282.Vt uint8_t |
1283Permissions:
1284.Sy Read-Only
1285.Ed
1286.Pp
1287The
1288.Dv MAC_PROP_ADV_100FDX_CAP
1289property describes whether or not 100 Mbit/s full-duplex support is
1290advertised.
1291.It Dv MAC_PROP_EN_100FDX_CAP
1292.Bd -filled -compact
1293Type:
1294.Vt uint8_t |
1295Permissions:
1296.Sy Read/Write
1297.Ed
1298.Pp
1299The
1300.Dv MAC_PROP_EN_100FDX_CAP
1301property describes whether or not 100 Mbit/s full-duplex support is
1302enabled.
1303.It Dv MAC_PROP_ADV_100T4_CAP
1304.Bd -filled -compact
1305Type:
1306.Vt uint8_t |
1307Permissions:
1308.Sy Read-Only
1309.Ed
1310.Pp
1311The
1312.Dv MAC_PROP_ADV_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
advertised.
1316.It Dv MAC_PROP_EN_100T4_CAP
1317.Bd -filled -compact
1318Type:
1319.Vt uint8_t |
1320Permissions:
1321.Sy Read/Write
1322.Ed
1323.Pp
1324The
.Dv MAC_PROP_EN_100T4_CAP
property describes whether or not 100 Mbit/s Ethernet using the
100BASE-T4 standard is
enabled.
.It Dv MAC_PROP_ADV_1000HDX_CAP
1330.Bd -filled -compact
1331Type:
1332.Vt uint8_t |
1333Permissions:
1334.Sy Read-Only
1335.Ed
1336.Pp
1337The
1338.Dv MAC_PROP_ADV_1000HDX_CAP
1339property describes whether or not 1 Gbit/s half-duplex support is
1340advertised.
1341.It Dv MAC_PROP_EN_1000HDX_CAP
1342.Bd -filled -compact
1343Type:
1344.Vt uint8_t |
1345Permissions:
1346.Sy Read/Write
1347.Ed
1348.Pp
1349The
1350.Dv MAC_PROP_EN_1000HDX_CAP
1351property describes whether or not 1 Gbit/s half-duplex support is
1352enabled.
1353.It Dv MAC_PROP_ADV_1000FDX_CAP
1354.Bd -filled -compact
1355Type:
1356.Vt uint8_t |
1357Permissions:
1358.Sy Read-Only
1359.Ed
1360.Pp
1361The
1362.Dv MAC_PROP_ADV_1000FDX_CAP
1363property describes whether or not 1 Gbit/s full-duplex support is
1364advertised.
1365.It Dv MAC_PROP_EN_1000FDX_CAP
1366.Bd -filled -compact
1367Type:
1368.Vt uint8_t |
1369Permissions:
1370.Sy Read/Write
1371.Ed
1372.Pp
1373The
1374.Dv MAC_PROP_EN_1000FDX_CAP
1375property describes whether or not 1 Gbit/s full-duplex support is
1376enabled.
1377.It Dv MAC_PROP_ADV_2500FDX_CAP
1378.Bd -filled -compact
1379Type:
1380.Vt uint8_t |
1381Permissions:
1382.Sy Read-Only
1383.Ed
1384.Pp
1385The
1386.Dv MAC_PROP_ADV_2500FDX_CAP
1387property describes whether or not 2.5 Gbit/s full-duplex support is
1388advertised.
1389.It Dv MAC_PROP_EN_2500FDX_CAP
1390.Bd -filled -compact
1391Type:
1392.Vt uint8_t |
1393Permissions:
1394.Sy Read/Write
1395.Ed
1396.Pp
1397The
1398.Dv MAC_PROP_EN_2500FDX_CAP
1399property describes whether or not 2.5 Gbit/s full-duplex support is
1400enabled.
1401.It Dv MAC_PROP_ADV_5000FDX_CAP
1402.Bd -filled -compact
1403Type:
1404.Vt uint8_t |
1405Permissions:
1406.Sy Read-Only
1407.Ed
1408.Pp
1409The
1410.Dv MAC_PROP_ADV_5000FDX_CAP
1411property describes whether or not 5.0 Gbit/s full-duplex support is
1412advertised.
1413.It Dv MAC_PROP_EN_5000FDX_CAP
1414.Bd -filled -compact
1415Type:
1416.Vt uint8_t |
1417Permissions:
1418.Sy Read/Write
1419.Ed
1420.Pp
1421The
1422.Dv MAC_PROP_EN_5000FDX_CAP
1423property describes whether or not 5.0 Gbit/s full-duplex support is
1424enabled.
1425.It Dv MAC_PROP_ADV_10GFDX_CAP
1426.Bd -filled -compact
1427Type:
1428.Vt uint8_t |
1429Permissions:
1430.Sy Read-Only
1431.Ed
1432.Pp
1433The
1434.Dv MAC_PROP_ADV_10GFDX_CAP
1435property describes whether or not 10 Gbit/s full-duplex support is
1436advertised.
1437.It Dv MAC_PROP_EN_10GFDX_CAP
1438.Bd -filled -compact
1439Type:
1440.Vt uint8_t |
1441Permissions:
1442.Sy Read/Write
1443.Ed
1444.Pp
1445The
1446.Dv MAC_PROP_EN_10GFDX_CAP
1447property describes whether or not 10 Gbit/s full-duplex support is
1448enabled.
1449.It Dv MAC_PROP_ADV_40GFDX_CAP
1450.Bd -filled -compact
1451Type:
1452.Vt uint8_t |
1453Permissions:
1454.Sy Read-Only
1455.Ed
1456.Pp
1457The
1458.Dv MAC_PROP_ADV_40GFDX_CAP
1459property describes whether or not 40 Gbit/s full-duplex support is
1460advertised.
1461.It Dv MAC_PROP_EN_40GFDX_CAP
1462.Bd -filled -compact
1463Type:
1464.Vt uint8_t |
1465Permissions:
1466.Sy Read/Write
1467.Ed
1468.Pp
1469The
1470.Dv MAC_PROP_EN_40GFDX_CAP
1471property describes whether or not 40 Gbit/s full-duplex support is
1472enabled.
1473.It Dv MAC_PROP_ADV_100GFDX_CAP
1474.Bd -filled -compact
1475Type:
1476.Vt uint8_t |
1477Permissions:
1478.Sy Read-Only
1479.Ed
1480.Pp
1481The
1482.Dv MAC_PROP_ADV_100GFDX_CAP
1483property describes whether or not 100 Gbit/s full-duplex support is
1484advertised.
1485.It Dv MAC_PROP_EN_100GFDX_CAP
1486.Bd -filled -compact
1487Type:
1488.Vt uint8_t |
1489Permissions:
1490.Sy Read/Write
1491.Ed
1492.Pp
1493The
1494.Dv MAC_PROP_EN_100GFDX_CAP
1495property describes whether or not 100 Gbit/s full-duplex support is
1496enabled.
1497.El
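.Pp
As an illustration of how these per-speed properties relate to the bits per
second values used by
.Dv MAC_PROP_SPEED ,
the following sketch assumes a driver that records the speeds it tracks as a
bitmask and looks up the fastest one.
The
.Dv EX_CAP_*
bit assignments, the
.Vt example_speed_t
table, and
.Fn example_max_speed
are all hypothetical; real drivers are free to store this state however they
like.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments, in increasing link speed order. */
enum {
	EX_CAP_10HDX	= 1 << 0,
	EX_CAP_10FDX	= 1 << 1,
	EX_CAP_100HDX	= 1 << 2,
	EX_CAP_100FDX	= 1 << 3,
	EX_CAP_100T4	= 1 << 4,
	EX_CAP_1000HDX	= 1 << 5,
	EX_CAP_1000FDX	= 1 << 6,
	EX_CAP_2500FDX	= 1 << 7,
	EX_CAP_5000FDX	= 1 << 8,
	EX_CAP_10GFDX	= 1 << 9,
	EX_CAP_40GFDX	= 1 << 10,
	EX_CAP_100GFDX	= 1 << 11,
	EX_CAP_ALL	= (1 << 12) - 1
};

typedef struct {
	uint32_t	es_cap;		/* one EX_CAP_* bit */
	uint64_t	es_bps;		/* speed in bits per second */
} example_speed_t;

static const example_speed_t example_speeds[] = {
	{ EX_CAP_10HDX,		10000000ULL },
	{ EX_CAP_10FDX,		10000000ULL },
	{ EX_CAP_100HDX,	100000000ULL },
	{ EX_CAP_100FDX,	100000000ULL },
	{ EX_CAP_100T4,		100000000ULL },
	{ EX_CAP_1000HDX,	1000000000ULL },
	{ EX_CAP_1000FDX,	1000000000ULL },
	{ EX_CAP_2500FDX,	2500000000ULL },
	{ EX_CAP_5000FDX,	5000000000ULL },
	{ EX_CAP_10GFDX,	10000000000ULL },
	{ EX_CAP_40GFDX,	40000000000ULL },
	{ EX_CAP_100GFDX,	100000000000ULL }
};

#define	EXAMPLE_NSPEEDS	(sizeof (example_speeds) / sizeof (example_speeds[0]))

/* Return the fastest speed present in caps, or 0 if none is set. */
uint64_t
example_max_speed(uint32_t caps)
{
	uint64_t best = 0;
	size_t i;

	assert((caps & ~(uint32_t)EX_CAP_ALL) == 0);
	for (i = 0; i < EXAMPLE_NSPEEDS; i++) {
		if ((caps & example_speeds[i].es_cap) != 0 &&
		    example_speeds[i].es_bps > best)
			best = example_speeds[i].es_bps;
	}
	return (best);
}
```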
1498.Ss Private Properties
1499In addition to the defined properties above, drivers are allowed to
1500define private properties.
1501These private properties are device-specific properties.
1502All private properties share the same constant,
1503.Dv MAC_PROP_PRIVATE .
1504Properties are distinguished by a name, which is a character string.
1505The list of such private properties is defined when registering with mac in the
1506.Fa m_priv_props
1507member of the
1508.Xr mac_register 9S
1509structure.
1510.Pp
1511The driver may define whatever semantics it wants for these private
1512properties.
1513They will not be listed when running
1514.Xr dladm 8 ,
1515unless explicitly requested by name.
1516All such properties should start with a leading underscore character and then
1517consist of alphanumeric ASCII characters and additional underscores or hyphens.
1518.Pp
1519Properties of type
1520.Dv MAC_PROP_PRIVATE
1521may show up in all three property related entry points:
1522.Xr mc_propinfo 9E ,
1523.Xr mc_getprop 9E ,
1524and
1525.Xr mc_setprop 9E .
1526Device drivers should tell the different properties apart by using the
1527.Xr strcmp 9F
function to compare the property name against the set of properties that they
know about.
When a driver encounters a private property that it does not know, it should
treat it like any other unknown property.
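.Pp
A minimal sketch of telling private properties apart by name follows.
The property names
.Dq _rx_copy_thresh
and
.Dq _intr_throttle ,
the
.Vt example_dev_t
structure, and both functions are hypothetical.
The sketch is simplified to return a
.Vt uint32_t ;
the real entry points exchange values through a caller-supplied buffer, and
returning
.Er ENOTSUP
for an unknown name is an assumption based on the handling of other unknown
properties.
In the kernel,
.Xr strcmp 9F
takes the place of the C library function used here.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical private tunables kept by the driver. */
typedef struct {
	uint32_t ed_rx_copy_thresh;
	uint32_t ed_intr_throttle;
} example_dev_t;

/*
 * Check the naming rule: a leading underscore followed by ASCII
 * alphanumerics, underscores, or hyphens.
 */
bool
example_priv_name_ok(const char *name)
{
	const char *c;

	if (name == NULL || name[0] != '_')
		return (false);
	for (c = name + 1; *c != '\0'; c++) {
		if (!isalnum((unsigned char)*c) && *c != '_' && *c != '-')
			return (false);
	}
	return (true);
}

/* Look up a private property by name. */
int
example_priv_getprop(const example_dev_t *dev, const char *name,
    uint32_t *valp)
{
	assert(dev != NULL && name != NULL && valp != NULL);

	if (strcmp(name, "_rx_copy_thresh") == 0) {
		*valp = dev->ed_rx_copy_thresh;
	} else if (strcmp(name, "_intr_throttle") == 0) {
		*valp = dev->ed_intr_throttle;
	} else {
		/* Treat like any other unknown property. */
		return (ENOTSUP);
	}
	return (0);
}
```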
1531.Sh STATISTICS
The MAC framework defines a couple of different sets of statistics which
1533are based on various standards for devices to implement.
1534Statistics are retrieved through the
1535.Xr mc_getstat 9E
1536entry point.
Some statistics are required for all devices, and there is a
separate set of Ethernet specific statistics.
1539Not all devices will support every statistic.
1540In many cases, several device registers will need to be combined to create the
1541proper stat.
1542.Pp
In general, if the device does not keep track of these statistics itself, then
it is recommended that the driver maintain these values as a
.Vt uint64_t
to ensure that overflow does not occur.
1547.Pp
1548If a device does not support a specific statistic, then it is fine to
1549return that it is not supported.
The same should be done for unrecognized statistics.
1551See
1552.Xr mc_getstat 9E
1553for more information on the proper way to handle these.
1554.Ss General Device Statistics
The following statistics are based on MIB-II statistics from both RFC 1213
and RFC 1573.
1557.Bl -tag -width Ds
1558.It Dv MAC_STAT_IFSPEED
1559The device's current speed in bits per second.
1560.It Dv MAC_STAT_MULTIRCV
1561The total number of received multicast packets.
1562.It Dv MAC_STAT_BRDCSTRCV
1563The total number of received broadcast packets.
1564.It Dv MAC_STAT_MULTIXMT
1565The total number of transmitted multicast packets.
1566.It Dv MAC_STAT_BRDCSTXMT
The total number of transmitted broadcast packets.
1568.It Dv MAC_STAT_NORCVBUF
1569The total number of packets discarded by the hardware due to a lack of
1570receive buffers.
1571.It Dv MAC_STAT_IERRORS
1572The total number of errors detected on input.
1573.It Dv MAC_STAT_UNKNOWNS
1574The total number of received packets that were discarded because they
1575were of an unknown protocol.
1576.It Dv MAC_STAT_NOXMTBUF
1577The total number of outgoing packets dropped due to a lack of transmit
1578buffers.
1579.It Dv MAC_STAT_OERRORS
1580The total number of outgoing packets that resulted in errors.
1581.It Dv MAC_STAT_COLLISIONS
The total number of collisions encountered by the transmitter.
1583.It Dv MAC_STAT_RBYTES
1584The total number of bytes received by the device, regardless of packet
1585type.
1586.It Dv MAC_STAT_IPACKETS
1587The total number of packets received by the device, regardless of packet type.
1588.It Dv MAC_STAT_OBYTES
1589The total number of bytes transmitted by the device, regardless of packet type.
1590.It Dv MAC_STAT_OPACKETS
1591The total number of packets sent by the device, regardless of packet type.
1592.It Dv MAC_STAT_UNDERFLOWS
1593The total number of packets that were smaller than the minimum sized
1594packet for the device and were therefore dropped.
1595.It Dv MAC_STAT_OVERFLOWS
1596The total number of packets that were larger than the maximum sized
1597packet for the device and were therefore dropped.
1598.El
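.Pp
The advice above about 64-bit counters and combining device registers can be
sketched as follows.
The statistic identifiers are placeholder values, the
.Vt example_dev_t
structure and both functions are hypothetical, and the sketch assumes a
device whose 32-bit counters clear when read.
Returning
.Er ENOTSUP
for unsupported statistics is likewise an assumption; see
.Xr mc_getstat 9E
for the authoritative behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder identifiers; real values come from the system headers. */
enum {
	MAC_STAT_IERRORS = 1,
	MAC_STAT_IPACKETS,
	MAC_STAT_COLLISIONS
};

/* Hypothetical software copies of the device counters. */
typedef struct {
	uint64_t ed_ipackets;
	uint64_t ed_crc_errs;
	uint64_t ed_len_errs;
} example_dev_t;

/*
 * Fold a 32-bit, clear-on-read hardware counter into a 64-bit total
 * so that the value reported to MAC does not wrap.
 */
void
example_stat_accum(uint64_t *total, uint32_t hw_delta)
{
	assert(total != NULL);
	*total += hw_delta;
}

/* Report one statistic from the software totals. */
int
example_getstat(const example_dev_t *dev, unsigned int stat, uint64_t *val)
{
	assert(dev != NULL && val != NULL);

	switch (stat) {
	case MAC_STAT_IPACKETS:
		*val = dev->ed_ipackets;
		break;
	case MAC_STAT_IERRORS:
		/* Several device counters combine into one statistic. */
		*val = dev->ed_crc_errs + dev->ed_len_errs;
		break;
	default:
		/* Unsupported and unrecognized statistics alike. */
		return (ENOTSUP);
	}
	return (0);
}
```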
1599.Ss Ethernet Specific Statistics
1600The following statistics are specific to Ethernet devices.
1601They refer to values from RFC 1643 and include various MII/GMII specific stats.
1602Many of these are also defined in IEEE 802.3.
1603.Bl -tag -width Ds
1604.It Dv ETHER_STAT_ADV_CAP_1000FDX
1605Indicates that the device is advertising support for 1 Gbit/s
1606full-duplex operation.
1607.It Dv ETHER_STAT_ADV_CAP_1000HDX
1608Indicates that the device is advertising support for 1 Gbit/s
1609half-duplex operation.
1610.It Dv ETHER_STAT_ADV_CAP_100FDX
1611Indicates that the device is advertising support for 100 Mbit/s
1612full-duplex operation.
1613.It Dv ETHER_STAT_ADV_CAP_100GFDX
1614Indicates that the device is advertising support for 100 Gbit/s
1615full-duplex operation.
1616.It Dv ETHER_STAT_ADV_CAP_100HDX
1617Indicates that the device is advertising support for 100 Mbit/s
1618half-duplex operation.
1619.It Dv ETHER_STAT_ADV_CAP_100T4
Indicates that the device is advertising support for 100 Mbit/s
100BASE-T4 operation.
1622.It Dv ETHER_STAT_ADV_CAP_10FDX
1623Indicates that the device is advertising support for 10 Mbit/s
1624full-duplex operation.
1625.It Dv ETHER_STAT_ADV_CAP_10GFDX
1626Indicates that the device is advertising support for 10 Gbit/s
1627full-duplex operation.
1628.It Dv ETHER_STAT_ADV_CAP_10HDX
1629Indicates that the device is advertising support for 10 Mbit/s
1630half-duplex operation.
1631.It Dv ETHER_STAT_ADV_CAP_2500FDX
1632Indicates that the device is advertising support for 2.5 Gbit/s
1633full-duplex operation.
1634.It Dv ETHER_STAT_ADV_CAP_40GFDX
1635Indicates that the device is advertising support for 40 Gbit/s
1636full-duplex operation.
1637.It Dv ETHER_STAT_ADV_CAP_5000FDX
1638Indicates that the device is advertising support for 5.0 Gbit/s
1639full-duplex operation.
1640.It Dv ETHER_STAT_ADV_CAP_ASMPAUSE
1641Indicates that the device is advertising support for receiving pause
1642frames.
1643.It Dv ETHER_STAT_ADV_CAP_AUTONEG
1644Indicates that the device is advertising support for auto-negotiation.
1645.It Dv ETHER_STAT_ADV_CAP_PAUSE
1646Indicates that the device is advertising support for generating pause
1647frames.
1648.It Dv ETHER_STAT_ADV_REMFAULT
1649Indicates that the device is advertising support for detecting faults in
1650the remote link peer.
1651.It Dv ETHER_STAT_ALIGN_ERRORS
1652Indicates the number of times an alignment error was generated by the
1653Ethernet device.
1654This is a count of packets that were not an integral number of octets and failed
1655the FCS check.
1656.It Dv ETHER_STAT_CAP_1000FDX
1657Indicates the device supports 1 Gbit/s full-duplex operation.
1658.It Dv ETHER_STAT_CAP_1000HDX
1659Indicates the device supports 1 Gbit/s half-duplex operation.
1660.It Dv ETHER_STAT_CAP_100FDX
1661Indicates the device supports 100 Mbit/s full-duplex operation.
1662.It Dv ETHER_STAT_CAP_100GFDX
1663Indicates the device supports 100 Gbit/s full-duplex operation.
1664.It Dv ETHER_STAT_CAP_100HDX
1665Indicates the device supports 100 Mbit/s half-duplex operation.
1666.It Dv ETHER_STAT_CAP_100T4
1667Indicates the device supports 100 Mbit/s 100BASE-T4 operation.
1668.It Dv ETHER_STAT_CAP_10FDX
1669Indicates the device supports 10 Mbit/s full-duplex operation.
1670.It Dv ETHER_STAT_CAP_10GFDX
1671Indicates the device supports 10 Gbit/s full-duplex operation.
1672.It Dv ETHER_STAT_CAP_10HDX
1673Indicates the device supports 10 Mbit/s half-duplex operation.
1674.It Dv ETHER_STAT_CAP_2500FDX
1675Indicates the device supports 2.5 Gbit/s full-duplex operation.
1676.It Dv ETHER_STAT_CAP_40GFDX
1677Indicates the device supports 40 Gbit/s full-duplex operation.
1678.It Dv ETHER_STAT_CAP_5000FDX
1679Indicates the device supports 5.0 Gbit/s full-duplex operation.
1680.It Dv ETHER_STAT_CAP_ASMPAUSE
1681Indicates that the device supports the ability to receive pause frames.
1682.It Dv ETHER_STAT_CAP_AUTONEG
1683Indicates that the device supports the ability to perform link
1684auto-negotiation.
1685.It Dv ETHER_STAT_CAP_PAUSE
1686Indicates that the device supports the ability to transmit pause frames.
1687.It Dv ETHER_STAT_CAP_REMFAULT
Indicates that the device supports the ability to detect a remote
1689fault in a link peer.
1690.It Dv ETHER_STAT_CARRIER_ERRORS
1691Indicates the number of times that the Ethernet carrier sense condition
1692was lost or not asserted.
1693.It Dv ETHER_STAT_DEFER_XMTS
1694Indicates the number of frames for which the device was unable to
1695transmit the frame due to being busy and had to try again.
1696.It Dv ETHER_STAT_EX_COLLISIONS
1697Indicates the number of frames that failed to send due to an excessive
1698number of collisions.
1699.It Dv ETHER_STAT_FCS_ERRORS
1700Indicates the number of times that a frame check sequence failed.
1701.It Dv ETHER_STAT_FIRST_COLLISIONS
1702Indicates the number of times that a frame was eventually transmitted
1703successfully, but only after a single collision.
1704.It Dv ETHER_STAT_JABBER_ERRORS
1705Indicates the number of frames that were received that were both larger
1706than the maximum packet size and failed the frame check sequence.
1707.It Dv ETHER_STAT_LINK_ASMPAUSE
1708Indicates whether the link is currently configured to accept pause
1709frames.
1710.It Dv ETHER_STAT_LINK_AUTONEG
1711Indicates whether the current link state is a result of
1712auto-negotiation.
1713.It Dv ETHER_STAT_LINK_DUPLEX
1714Indicates the current duplex state of the link.
1715The values used here should be the same as documented for
1716.Dv MAC_PROP_DUPLEX .
1717.It Dv ETHER_STAT_LINK_PAUSE
1718Indicates whether the link is currently configured to generate pause
1719frames.
1720.It Dv ETHER_STAT_LP_CAP_1000FDX
1721Indicates the remote device supports 1 Gbit/s full-duplex operation.
1722.It Dv ETHER_STAT_LP_CAP_1000HDX
1723Indicates the remote device supports 1 Gbit/s half-duplex operation.
1724.It Dv ETHER_STAT_LP_CAP_100FDX
1725Indicates the remote device supports 100 Mbit/s full-duplex operation.
1726.It Dv ETHER_STAT_LP_CAP_100GFDX
1727Indicates the remote device supports 100 Gbit/s full-duplex operation.
1728.It Dv ETHER_STAT_LP_CAP_100HDX
1729Indicates the remote device supports 100 Mbit/s half-duplex operation.
1730.It Dv ETHER_STAT_LP_CAP_100T4
1731Indicates the remote device supports 100 Mbit/s 100BASE-T4 operation.
1732.It Dv ETHER_STAT_LP_CAP_10FDX
1733Indicates the remote device supports 10 Mbit/s full-duplex operation.
1734.It Dv ETHER_STAT_LP_CAP_10GFDX
1735Indicates the remote device supports 10 Gbit/s full-duplex operation.
1736.It Dv ETHER_STAT_LP_CAP_10HDX
1737Indicates the remote device supports 10 Mbit/s half-duplex operation.
1738.It Dv ETHER_STAT_LP_CAP_2500FDX
1739Indicates the remote device supports 2.5 Gbit/s full-duplex operation.
1740.It Dv ETHER_STAT_LP_CAP_40GFDX
1741Indicates the remote device supports 40 Gbit/s full-duplex operation.
1742.It Dv ETHER_STAT_LP_CAP_5000FDX
1743Indicates the remote device supports 5.0 Gbit/s full-duplex operation.
1744.It Dv ETHER_STAT_LP_CAP_ASMPAUSE
1745Indicates that the remote device supports the ability to receive pause
1746frames.
1747.It Dv ETHER_STAT_LP_CAP_AUTONEG
1748Indicates that the remote device supports the ability to perform link
1749auto-negotiation.
1750.It Dv ETHER_STAT_LP_CAP_PAUSE
1751Indicates that the remote device supports the ability to transmit pause
1752frames.
1753.It Dv ETHER_STAT_LP_CAP_REMFAULT
Indicates that the remote device supports the ability to detect a
1755remote fault in a link peer.
1756.It Dv ETHER_STAT_MACRCV_ERRORS
1757Indicates the number of times that the internal MAC layer encountered an
1758error when attempting to receive and process a frame.
1759.It Dv ETHER_STAT_MACXMT_ERRORS
1760Indicates the number of times that the internal MAC layer encountered an
1761error when attempting to process and transmit a frame.
1762.It Dv ETHER_STAT_MULTI_COLLISIONS
1763Indicates the number of times that a frame was eventually transmitted
1764successfully, but only after more than one collision.
1765.It Dv ETHER_STAT_SQE_ERRORS
1766Indicates the number of times that an SQE error occurred.
1767The specific conditions for this error are documented in IEEE 802.3.
1768.It Dv ETHER_STAT_TOOLONG_ERRORS
1769Indicates the number of frames that were received that were longer than
1770the maximum frame size supported by the device.
1771.It Dv ETHER_STAT_TOOSHORT_ERRORS
1772Indicates the number of frames that were received that were shorter than
1773the minimum frame size supported by the device.
1774.It Dv ETHER_STAT_TX_LATE_COLLISIONS
1775Indicates the number of times a collision was detected late on the
1776device.
1777.It Dv ETHER_STAT_XCVR_ADDR
Indicates the address of the MII/GMII transceiver.
.It Dv ETHER_STAT_XCVR_ID
Indicates the ID of the MII/GMII transceiver.
.It Dv ETHER_STAT_XCVR_INUSE
Indicates what kind of transceiver is in use.
The following values may be used:
.Bl -tag -width Ds
.It Dv XCVR_UNDEFINED
The transceiver type is undefined by the hardware.
.It Dv XCVR_NONE
There is no transceiver in use by the hardware.
.It Dv XCVR_10
The transceiver supports 10BASE-T operation.
.It Dv XCVR_100T4
The transceiver supports 100BASE-T4 operation.
.It Dv XCVR_100X
The transceiver supports 100BASE-TX operation.
.It Dv XCVR_100T2
The transceiver supports 100BASE-T2 operation.
.It Dv XCVR_1000X
The transceiver supports 1000BASE-X operation.
This is used for all fiber transceivers.
.It Dv XCVR_1000T
The transceiver supports 1000BASE-T operation.
This is used for all copper transceivers.
1803.El
1804.El
1805.Ss Device Specific kstats
1806In addition to the defined statistics above, if the device driver
1807maintains additional statistics or the device provides additional
1808statistics, it should create its own kstats through the
1809.Xr kstat_create 9F
1810function to allow operators to observe them.
1811.Sh RECEIVE DESCRIPTOR LAYOUT
1812One of the important things that a device driver must do is lay out DMA
1813memory, generally in a ring of descriptors, into which received Ethernet
1814frames will be placed.
1815When performing this, there are a few things that drivers should
1816generally do:
1817.Bl -enum -offset indent
1818.It
1819Drivers should lay out memory so that the IP header will be 4-byte
1820aligned.
1821The IP stack expects that the beginning of an IP header will be at a
18224-byte aligned address; however, a DMA allocation will be at a 4-
1823or 8-byte aligned address by default.
The IP header is at a 14-byte offset from the beginning of the Ethernet
frame, leaving the IP header at a 2-byte alignment if the Ethernet frame
starts at the beginning of the DMA buffer.
If VLAN tagging is in place, then each VLAN tag adds 4 bytes, which
does not change the alignment of the IP header.
1829.Pp
1830As a solution to this, the driver should program the device to start
1831placing the received Ethernet frame at two bytes off of the start of the
1832DMA buffer.
This ensures that the IP header will be 4-byte aligned whether or not
VLAN tags are present.
1835.It
Drivers should try to allocate the DMA memory used for receiving frames
as a contiguous buffer.
If for some reason that is not possible, the driver should try to
ensure that there is enough space for the Ethernet header and
any possible layer three and layer four headers
.Pq such as IP, TCP, or UDP
in the initial descriptor.
1843.It
1844As discussed in the
1845.Sx MBLKS AND DMA
1846section, there are multiple strategies for managing the relationship
1847between DMA data, receive descriptors, and the operating system
1848representation of a packet in the
1849.Xr mblk 9S
1850structure.
1851Drivers must limit their resource consumption.
1852See the
1853.Sy Considerations
1854section of
1855.Sx MBLKS AND DMA
1856for more on this.
1857.El
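.Pp
The alignment arithmetic from the first item above can be checked with a
small, self-contained sketch.
The function and macro names here are hypothetical and are not part of the
MAC framework; the only facts assumed are the 14-byte Ethernet header and
the 4-byte VLAN tag described above.
.Bd -literal -offset indent
#include <stddef.h>

#define	EX_ETHER_HDR_LEN	14	/* dst + src + ethertype */
#define	EX_VLAN_TAG_LEN		4	/* one 802.1Q tag */

/*
 * Return how far the IP header is from a 4-byte boundary when the
 * frame starts buf_off bytes into a 4-byte aligned DMA buffer and
 * carries nvlan VLAN tags. A return value of 0 means it is aligned.
 */
size_t
ex_ip_hdr_misalign(size_t buf_off, size_t nvlan)
{
	size_t ip_off = buf_off + EX_ETHER_HDR_LEN +
	    nvlan * EX_VLAN_TAG_LEN;

	return (ip_off % 4);
}
.Ed
.Pp
With these assumptions,
.Fn ex_ip_hdr_misalign 0 0
is 2, while a starting offset of 2 yields 0 with or without VLAN tags.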
1858.Sh TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
1859Device drivers are the first line of defense for dealing with broken
1860devices and bugs in their firmware.
While most devices will rarely fail, it is important that the design and
implementation of the device driver pay particular attention to
RAS (Reliability, Availability, and Serviceability).
1864While everything described in this section is optional, it is highly recommended
1865that all new device drivers follow these guidelines.
1866.Pp
1867The Fault Management Architecture (FMA) provides facilities for
1868detecting and reporting various classes of defects and faults.
1869Specifically for networking device drivers, issues that should be
1870detected and reported include:
1871.Bl -bullet -offset indent
1872.It
1873Device internal uncorrectable errors
1874.It
1875Device internal correctable errors
1876.It
1877PCI and PCI Express transport errors
1878.It
1879Device temperature alarms
1880.It
1881Device transmission stalls
1882.It
1883Device communication timeouts
1884.It
High rates of invalid interrupts
1886.El
1887.Pp
1888All such errors fall into three primary categories:
1889.Bl -enum -offset indent
1890.It
1891Errors detected by the Fault Management Architecture
1892.It
1893Errors detected by the device and indicated to the device driver
1894.It
1895Errors detected by the device driver
1896.El
1897.Ss Fault Management Setup and Teardown
1898Drivers should initialize support for the fault management framework by
1899calling
1900.Xr ddi_fm_init 9F
1901from their
1902.Xr attach 9E
1903routine.
By registering with the fault management framework, a device driver is given the
chance to detect transport errors as well as report other errors that
it encounters.
While a device driver does not need to declare every capability
described in
.Xr ddi_fm_init 9F ,
we suggest that device drivers at least register the
.Dv DDI_FM_EREPORT_CAPABLE
capability so as to allow the driver to report issues that it detects.
1913.Pp
1914If the driver registers with the fault management framework during its
1915.Xr attach 9E
1916entry point, it must call
1917.Xr ddi_fm_fini 9F
1918during its
1919.Xr detach 9E
1920entry point.
1921.Ss Transport Errors
1922Many modern networking devices leverage PCI or PCI Express.
1923As such, there are two primary ways that device drivers access data: they either
1924memory map device registers and use routines like
1925.Xr ddi_get8 9F
1926and
1927.Xr ddi_put8 9F
1928or they use direct memory access (DMA).
1929New device drivers should always enable checking of the transport layer by
1930marking their support in the
1931.Xr ddi_device_acc_attr 9S
1932structure and using routines like
1933.Xr ddi_fm_acc_err_get 9F
1934and
1935.Xr ddi_fm_dma_err_get 9F
1936to detect if errors have occurred.
1937.Ss Device Indicated Errors
Many devices can announce to a device driver that a
correctable or uncorrectable error has occurred.
1940Other devices have the ability to indicate that various physical issues have
1941occurred such as a fan failing or a temperature sensor having fired.
1942.Pp
1943Drivers should wire themselves to receive notifications when these
1944events occur.
1945The means and capabilities will vary from device to device.
1946For example, some devices will generate information about these notifications
1947through special interrupts.
1948Other devices may have a register that software can poll.
1949In the cases where polling is required, driver writers should try not to poll
1950too frequently and should generally only poll when the device is actively being
1951used, e.g. between calls to the
1952.Xr mc_start 9E
1953and
1954.Xr mc_stop 9E
1955entry points.
1956.Ss Driver Transmit Stall Detection
1957One of the primary responsibilities of a hardened device driver is to
1958perform transmit stall detection.
The core idea behind tx stall detection is that the driver should record
when the device last indicated that data was successfully transmitted.
Most devices should be transmitting data on a regular basis as long as the link
is up.
If a device is not, then this may indicate that the device is stuck and needs
to be reset.
1965At this time, the MAC framework does not provide any resources for performing
1966these checks; however, polling on each individual transmit ring for the last
1967completion time while something is actively being transmitted through the use of
1968routines such as
1969.Xr timeout 9F
1970may be a reasonable starting point.
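.Pp
A minimal sketch of such a periodic check follows.
It is written as plain C so that the logic stands on its own; the structure,
its members, and the function name are hypothetical, and a real driver would
take the current time from something like
.Xr gethrtime 9F
and run the check from a
.Xr timeout 9F
callback while holding the appropriate ring lock.
.Bd -literal -offset indent
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ring bookkeeping kept by the driver. */
typedef struct ex_tx_ring {
	uint32_t	etr_pending;	/* descriptors handed to hw */
	int64_t		etr_last_done;	/* ns time of last completion */
} ex_tx_ring_t;

/*
 * A ring is considered stalled only if work is outstanding and no
 * completion has been seen for longer than limit nanoseconds. An
 * idle ring is never reported as stalled.
 */
bool
ex_tx_ring_stalled(const ex_tx_ring_t *ring, int64_t now, int64_t limit)
{
	if (ring->etr_pending == 0)
		return (false);

	return (now - ring->etr_last_done > limit);
}
.Ed
.Pp
The transmit completion path would update
.Fa etr_last_done
and
.Fa etr_pending ,
and the transmit path would also refresh
.Fa etr_last_done
when it queues work onto an idle ring so that a long idle period is not
mistaken for a stall.
When the check returns true, the driver would proceed as described in
.Sx Reacting to Errors .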
1971.Ss Driver Command Timeout Detection
1972Each device is programmed in different ways.
1973Some devices are programmed through asynchronous commands while others are
1974programmed by writing directly to memory mapped registers.
If a device receives asynchronous replies to commands, then the device driver
should set reasonable timeouts for all such commands and plan on detecting
when they expire.
1977If a timeout occurs, the driver should presume that there is an issue with the
1978hardware and proceed to abort the command or reset the device.
1979.Pp
1980Many devices do not have such a communication mechanism.
1981However, whenever there is some activity where the device driver must wait, then
1982it should be prepared for the fact that the device may never get back to
1983it and react appropriately by performing some kind of device reset.
1984.Ss Reacting to Errors
1985When any of the above categories of errors has been triggered, the
1986behavior that the device driver should take depends on the kind of
1987error.
If a fatal error has occurred, for example, a transport error, a transmit stall,
or an uncorrectable error indicated by the device, then it is
important that the driver take the following steps:
1991.Bl -enum -offset indent
1992.It
1993Set a flag in the device driver's state that indicates that it has hit
1994an error condition.
1995When this error condition flag is asserted, transmitted packets should be
1996accepted and dropped and actions that would require writing to the device state
1997should fail with an error.
1998This flag should remain until the device has been successfully restarted.
1999.It
If the error was not already indicated by the fault management
architecture, as a detected transport error would be, then
the device driver should post an
2003.Sy ereport
2004indicating what has occurred with the
2005.Xr ddi_fm_ereport_post 9F
2006function.
2007.It
2008The device driver should indicate that the device's service was lost
2009with a call to
2010.Xr ddi_fm_service_impact 9F
2011using the symbol
2012.Dv DDI_SERVICE_LOST .
2013.It
2014At this point the device driver should issue a device reset through some
2015device-specific means.
2016.It
2017When the device reset has been completed, then the device driver should
2018restore all of the programmed state to the device.
2019This includes things like the current MTU, advertised auto-negotiation speeds,
2020MAC address filters, and more.
2021.It
2022Finally, when service has been restored, the device driver should call
2023.Xr ddi_fm_service_impact 9F
2024using the symbol
2025.Dv DDI_SERVICE_RESTORED .
2026.El
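.Pp
The first step, gating entry points on an error flag, can be sketched as
follows.
This is an illustrative model in plain C rather than driver code: every name
is hypothetical, the frame is reduced to a length, the use of
.Er EIO
is an assumption, and the comments mark where a real driver would free the
message with
.Xr freemsg 9F .
.Bd -literal -offset indent
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical soft state; a real driver protects this with a lock. */
typedef struct ex_softc {
	bool		es_faulted;	/* set on fatal error */
	uint32_t	es_mtu;		/* saved for replay after reset */
	uint64_t	es_tx_sent;
	uint64_t	es_tx_dropped;
} ex_softc_t;

/*
 * Transmit: always consume the frame. While faulted, the frame is
 * dropped (freemsg() in a real driver) instead of reaching hardware.
 */
void
ex_tx(ex_softc_t *sc, size_t len)
{
	(void) len;
	if (sc->es_faulted) {
		sc->es_tx_dropped++;
		return;
	}
	sc->es_tx_sent++;
}

/* Anything that must write device state fails while faulted. */
int
ex_set_mtu(ex_softc_t *sc, uint32_t mtu)
{
	if (sc->es_faulted)
		return (EIO);	/* assumed error value */
	sc->es_mtu = mtu;
	return (0);
}
.Ed
.Pp
In this model the flag would be set as soon as the fatal error is noticed
and cleared only once the reset has completed and the saved state, such as
.Fa es_mtu ,
has been programmed back into the device.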
2027.Pp
2028When a non-fatal error occurs, then the device driver should submit an
2029ereport and should optionally mark the device degraded using
2030.Xr ddi_fm_service_impact 9F
2031with the
2032.Dv DDI_SERVICE_DEGRADED
2033value depending on the nature of the problem that has occurred.
2034.Pp
2035Device drivers should never make the decision to remove a device from
2036service based on errors that have occurred nor should they panic the
2037system.
2038Rather, the device driver should always try to notify the operating system with
2039various ereports and allow its policy decisions to occur.
2040The decision to retire a device lies in the hands of the fault management
2041architecture.
2042It knows more about the operator's intent and the surrounding system's state
2043than the device driver itself does and it will make the call to offline and
2044retire the device if it is required.
2045.Ss Device Resets
2046When resetting a device, a device driver must exercise caution.
2047If a device driver has not been written to plan for a device reset, then it
2048may not correctly restore the device's state after such a reset.
2049Such state should be stored in the instance's private state data as the MAC
2050framework does not know about device resets and will not inform the
2051device again about the expected, programmed state.
2052.Pp
2053One wrinkle with device resets is that many networking cards show up as
2054multiple PCI functions on a single device, for example, each port may
2055show up as a separate function and thus have a separate instance of the
2056device driver attached.
2057When resetting a function, device driver writers should carefully read the
device programming manuals and verify whether or not a reset impacts only the
stalled function or if it impacts all functions across the device.
2060.Pp
If the only way to reset a given function is to reset the entire device, then
2062this may require more coordination and work on the part of the device
2063driver to ensure that all the other instances are correctly restored.
2064In cases where this occurs, some devices offer ways of injecting
2065interrupts onto those other functions to notify them that this is
2066occurring.
2067.Sh MBLKS AND DMA
2068The networking stack manages framed data through the use of the
2069.Xr mblk 9S
2070structure.
2071The mblk allows for a single message to be made up of individual blocks.
2072Each part is linked together through its
2073.Fa b_cont
2074member.
2075However, it also allows for multiple messages to be chained together through the
2076use of the
2077.Fa b_next
2078member.
2079While the networking stack works with these structures, device drivers generally
2080work with DMA regions.
2081There are two different strategies that device drivers use for handling these
2082two different cases: copying and binding.
2083.Ss Copying Data
2084The first way that device drivers handle interfacing between the two is
2085by having two separate regions of memory.
2086One part is memory which has been allocated for DMA through a call to
2087.Xr ddi_dma_mem_alloc 9F
2088and the other is memory associated with the memory block.
2089.Pp
2090In this case, a driver will use
2091.Xr bcopy 9F
2092to copy memory between the two distinct regions.
2093When transmitting a packet, it will copy the memory from the mblk_t to the DMA
2094region.
When receiving data, it will allocate a mblk_t through the
.Xr allocb 9F
routine, copy the memory across with
.Xr bcopy 9F ,
and then advance the mblk_t's
.Fa b_wptr
member by the number of bytes copied.
2102.Pp
2103If, when receiving, memory is not available for a new message block,
2104then the frame should be skipped and effectively dropped.
2105A kstat should be bumped when such an occasion occurs.
2106.Ss Binding Data
2107An alternative approach to copying data is to use DMA binding.
When using DMA binding, the OS takes care of mapping normal kernel memory
so that the device can access it through DMA.
2110The exact process is a bit different between transmit and receive.
2111.Pp
When transmitting, a device driver has an mblk_t and needs to call the
2113.Xr ddi_dma_addr_bind_handle 9F
2114function to bind it to an already existing DMA handle.
2115At that point, it will receive various DMA cookies that it can use to obtain the
2116addresses to program the device with for transmitting data.
2117Once the transmit is done, the driver must then make sure to call
2118.Xr freemsg 9F
2119to release the data.
2120It must not call
2121.Xr freemsg 9F
2122before it receives an interrupt from the device indicating that the data
2123has been transmitted, otherwise it risks sending arbitrary kernel
2124memory.
2125.Pp
When receiving data, the device driver can perform a similar operation.
2127First, it must bind the DMA memory into the kernel's virtual memory address
2128space through a call to the
2129.Xr ddi_dma_addr_bind_handle 9F
2130function if it has not already.
2131Once it has, it must then call
2132.Xr desballoc 9F
2133to try and create a new mblk_t which leverages the associated memory.
2134It can then pass that mblk_t up to the stack.
2135.Ss Considerations
2136When deciding which of these options to use, there are many different
2137considerations that must be made.
The answer as to whether to bind memory or to copy data is not always simple.
2139.Pp
2140The first thing to remember is that DMA resources may be finite on a
2141given platform.
2142Consider the case of receiving data.
2143A device driver that binds one of its receive descriptors may not get it back
2144for quite some time as it may be used by the kernel until an application
2145actually consumes it.
Device drivers that try to bind memory for receive often work with the
constraint that they must be able to replace that DMA memory with another DMA
descriptor.
If it were not replaced, then eventually the device would not be able to
receive additional data into the ring.
2151.Pp
2152On the other hand, particularly for larger frames, copying every packet
2153from one buffer to another can be a source of additional latency and
2154memory waste in the system.
2155For larger copies, the cost of copying may dwarf any potential cost of
2156performing DMA binding.
2157.Pp
Device driver authors who are unsure of what to do should
first employ the copying method to simplify the act of writing the
device driver.
2161The copying method is simpler and also allows the device driver author not to
2162worry about allocated DMA memory that is still outstanding when it is asked to
2163unload.
2164.Pp
If device driver writers are worried about the cost, it is recommended
that the decision as to whether to copy or bind DMA data be controlled by
separate private properties for transmitting and receiving.
Each private property should indicate the frame size at which
to switch from one method to the other.
2170This way, data can be gathered to determine what the impact of each method is on
2171a given platform.
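.Pp
Such a threshold check can be sketched in a few lines.
The names below are hypothetical, and the choice that frames at or below the
threshold are copied is an assumption made for illustration.
.Bd -literal -offset indent
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide whether a frame of len bytes should be copied. copy_thresh
 * stands in for the value of a driver private property; a value of 0
 * means that every non-empty frame is bound instead of copied.
 */
bool
ex_should_copy(size_t len, size_t copy_thresh)
{
	return (len <= copy_thresh);
}
.Ed
.Pp
A driver would keep one such threshold for transmit and another for receive
so that each can be tuned independently.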
2172.Sh SEE ALSO
2173.Xr dlpi 4P ,
2174.Xr driver.conf 5 ,
2175.Xr ieee802.3 7 ,
2176.Xr dladm 8 ,
2177.Xr _fini 9E ,
2178.Xr _info 9E ,
2179.Xr _init 9E ,
2180.Xr attach 9E ,
2181.Xr close 9E ,
2182.Xr detach 9E ,
2183.Xr mac_capab_led 9E ,
2184.Xr mac_capab_rings 9E ,
2185.Xr mac_capab_transceiver 9E ,
2186.Xr mc_close 9E ,
2187.Xr mc_getcapab 9E ,
2188.Xr mc_getprop 9E ,
2189.Xr mc_getstat 9E ,
.Xr mc_multicst 9E ,
2191.Xr mc_open 9E ,
.Xr mc_propinfo 9E ,
.Xr mc_setpromisc 9E ,
2194.Xr mc_setprop 9E ,
2195.Xr mc_start 9E ,
2196.Xr mc_stop 9E ,
2197.Xr mc_tx 9E ,
.Xr mc_unicst 9E ,
2199.Xr open 9E ,
2200.Xr allocb 9F ,
2201.Xr bcopy 9F ,
2202.Xr ddi_dma_addr_bind_handle 9F ,
2203.Xr ddi_dma_mem_alloc 9F ,
2204.Xr ddi_fm_acc_err_get 9F ,
2205.Xr ddi_fm_dma_err_get 9F ,
2206.Xr ddi_fm_ereport_post 9F ,
2207.Xr ddi_fm_fini 9F ,
2208.Xr ddi_fm_init 9F ,
2209.Xr ddi_fm_service_impact 9F ,
2210.Xr ddi_get8 9F ,
2211.Xr ddi_put8 9F ,
2212.Xr desballoc 9F ,
2213.Xr freemsg 9F ,
2214.Xr kstat_create 9F ,
2215.Xr mac_alloc 9F ,
2216.Xr mac_fini_ops 9F ,
2217.Xr mac_free 9F ,
2218.Xr mac_hcksum_get 9F ,
2219.Xr mac_hcksum_set 9F ,
2220.Xr mac_init_ops 9F ,
2221.Xr mac_link_update 9F ,
2222.Xr mac_lso_get 9F ,
2223.Xr mac_maxsdu_update 9F ,
2224.Xr mac_prop_info_set_default_link_flowctrl 9F ,
2225.Xr mac_prop_info_set_default_str 9F ,
2226.Xr mac_prop_info_set_default_uint32 9F ,
2227.Xr mac_prop_info_set_default_uint64 9F ,
2228.Xr mac_prop_info_set_default_uint8 9F ,
2229.Xr mac_prop_info_set_perm 9F ,
2230.Xr mac_prop_info_set_range_uint32 9F ,
2231.Xr mac_register 9F ,
2232.Xr mac_rx 9F ,
2233.Xr mac_unregister 9F ,
2234.Xr mod_install 9F ,
2235.Xr mod_remove 9F ,
2236.Xr strcmp 9F ,
2237.Xr timeout 9F ,
2238.Xr cb_ops 9S ,
2239.Xr ddi_device_acc_attr 9S ,
2240.Xr dev_ops 9S ,
2241.Xr mac_callbacks 9S ,
2242.Xr mac_register 9S ,
2243.Xr mblk 9S ,
2244.Xr modldrv 9S ,
2245.Xr modlinkage 9S
2246.Rs
2247.%A McCloghrie, K.
2248.%A Rose, M.
2249.%T RFC 1213 Management Information Base for Network Management of
2250.%T TCP/IP-based internets: MIB-II
2251.%D March 1991
2252.Re
2253.Rs
2254.%A McCloghrie, K.
2255.%A Kastenholz, F.
2256.%T RFC 1573 Evolution of the Interfaces Group of MIB-II
2257.%D January 1994
2258.Re
2259.Rs
2260.%A Kastenholz, F.
2261.%T RFC 1643 Definitions of Managed Objects for the Ethernet-like
2262.%T Interface Types
2263.Re