.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright (c) 2017, Joyent, Inc.
.\" Copyright 2022 Oxide Computer Company
.\" Copyright 2023 Peter Tribble
.\"
.Dd July 17, 2023
.Dt MAC_CAPAB_RINGS 9E
.Os
.Sh NAME
.Nm mac_capab_rings
.Nd MAC ring capability
.Sh SYNOPSIS
.In sys/mac_provider.h
.Vt typedef struct mac_capab_rings_s mac_capab_rings_t;
.Sh INTERFACE LEVEL
.Sy Uncommitted -
This interface is still evolving.
API and ABI stability is not guaranteed.
.Sh DESCRIPTION
The
.Dv MAC_CAPAB_RINGS
capability provides a means for device drivers to take advantage of
the additional resources offered by hardware beyond the basic operations
to transmit and receive.
There are two primary concepts that this MAC capability relies on: rings
and groups.
.Pp
The
.Em ring
is an abstract concept which must be mapped to some hardware construct by
the driver.
It typically takes the form of a DMA memory region which is divided
into many smaller units, called descriptors or entries.
Each entry in the ring describes the location in memory of a packet, which
the hardware is to read from
.Pq to transmit it
or write to
.Pq upon reception .
Entries also commonly contain metadata and attributes about the packet.
These entries are usually arranged in a fixed-size circular buffer
.Po hence the
.Dq ring
name
.Pc
which is shared between the operating system and the
hardware via the DMA-backed memory.
Most NICs, regardless of their support for this capability, use something
resembling a descriptor ring under the hood.
Some vendors may also refer to rings as
.Em queues .
The ring concept is intentionally general, so that more unusual underlying
hardware constructs can also be used to implement it.
.Pp
A collection of one or more rings is called a
.Em group .
Each group usually has a collection of filters that can be associated
with it.
These filters are usually defined in terms of matching something like a
MAC address, VLAN, or Ethertype, though more complex filters may exist
in hardware.
When a packet matches a filter, it will then be directed to the group
and eventually delivered to one of the rings in the group.
.Pp
In the MAC framework, rings and groups are separated into categories
based on their purpose: transmitting and receiving.
While the MAC framework thinks of transmit and receive rings as
different physical constructs, they may map to the same underlying
resources in the hardware.
The device driver may implement the
.Dv MAC_CAPAB_RINGS
capability for transmitting, receiving, or both.
.Ss Mapping Hardware to Rings and Groups
There are many different ways that hardware resources may map to this
capability.
Consider the following examples:
.Bl -enum
.It
Hardware may support a feature commonly known as receive side scaling
.Pq RSS .
With RSS, the hardware has multiple rings and uses a hash function
calculated over packet headers to choose which ring receives a
particular packet.
Rings are associated with different interrupts, allowing multiple rings
to be processed in parallel.
Supporting RSS in isolation would result in a device which has a single
group with multiple rings within that group.
.It
Some hardware may have a single ring, but still support multiple receive
filters.
This is commonly seen with some 1 GbE devices.
While the hardware only has one ring, it has support for multiple
independent MAC address filters, each of which can be programmed to
receive traffic for a single MAC address.
The driver should map this situation to a single group with a single
ring.
However, that group would still expose the ability to program several
MAC address filters.
While this may not seem useful at first, when virtual NICs are created
on top of a physical NIC, the additional hardware filters will be used
to avoid putting the device in promiscuous mode.
.It
Finally, some hardware has many rings, which can be placed in many
different groups.
Each group has its own filtering capabilities.
For such hardware, the device driver would declare support for multiple
groups, each of which has its own independent set of rings.
.El
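.Pp
To make the RSS case concrete, the following self-contained C sketch
shows how a flow might be mapped to one ring within a single group.
It does not describe any particular device: real NICs typically compute
a keyed Toeplitz hash in hardware and use its low bits to index an
indirection table, whereas this sketch substitutes a 32-bit FNV-1a hash
and a modulus to stay short.
The names
.Vt flow_tuple_t ,
.Fn flow_hash ,
and
.Fn rss_select_ring
are hypothetical.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical IPv4 flow key built from packet headers. */
typedef struct flow_tuple {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
} flow_tuple_t;

/* Fold the low 'nbytes' bytes of 'v' into an FNV-1a hash state. */
static uint32_t
fnv1a_mix(uint32_t h, uint32_t v, int nbytes)
{
    for (int i = 0; i < nbytes; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 16777619u;             /* FNV 32-bit prime */
    }
    return (h);
}

/* Stand-in for the hardware's RSS hash over the flow key. */
static uint32_t
flow_hash(const flow_tuple_t *ft)
{
    uint32_t h = 2166136261u;       /* FNV 32-bit offset basis */

    h = fnv1a_mix(h, ft->src_ip, 4);
    h = fnv1a_mix(h, ft->dst_ip, 4);
    h = fnv1a_mix(h, ft->src_port, 2);
    h = fnv1a_mix(h, ft->dst_port, 2);
    return (h);
}

/* Choose a ring index in [0, nrings) within the single RSS group. */
static unsigned
rss_select_ring(const flow_tuple_t *ft, unsigned nrings)
{
    assert(nrings > 0);
    return (flow_hash(ft) % nrings);
}
```
.Ed
Because the hash depends only on the flow's addresses and ports, every
packet of a given flow lands on the same ring, preserving per-flow
ordering, while different flows are spread across the rings.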
.Pp
When choosing hardware constructs to implement rings and groups, it is
also important to consider interrupts.
In order to support polling, each receive ring must be able to
independently toggle whether that ring will generate an interrupt on
packet reception, even when many rings share the same hardware level
interrupt
.Pq e.g. the same MSI or MSI-X interrupt number and handler .
.Ss Filters
The
.Xr mac_group_info 9S
structure is used to define several different kinds of filters that the
group might implement.
There are three different classes of filters:
.Bl -tag -width Ds
.It Sy MAC Address
A given frame matches a MAC Address filter if the destination address in
the Ethernet header matches the specified MAC address.
.It Sy VLAN
A given frame matches a VLAN filter if it both has an 802.1Q VLAN tag
and that tag matches the VLAN ID specified in the filter.
If the frame's outer Ethertype is not 0x8100, then the filter will not
match.
.It Sy MAC Address and VLAN
A given frame matches a MAC Address and VLAN filter if it matches both
the specified MAC address and the specified VLAN.
This is constructed as a logical AND of the previous two filters.
If only one of the two matches, then the frame does not match this
filter.
.Pp
Note: this filter type is still under development and has not yet been
plumbed through the MAC framework's APIs.
.El
.Pp
Devices may support many different filter types.
If the hardware resources required for a combined filter type
.Pq e.g. MAC Address and VLAN
are similar to the resources required for each in isolation, drivers
should prefer to implement just the combined type and should not
implement the individual types.
.Pp
The MAC framework assumes that the following rules hold regarding
filters:
.Bl -enum
.It
When there are multiple filters of the same kind with different
addresses, then the hardware will accept a frame if it matches
.Em ANY
of the specified filters.
In other words, if there are two VLAN filters defined, one for VLAN 23
and one for VLAN 42, then if a frame has either VLAN 23 or VLAN 42,
it will be accepted for the group.
.It
If multiple different classes of filters are defined, then the hardware
should only accept a frame if it passes
.Em ALL
of the filter classes.
For example, if there is a MAC address filter and a separate VLAN
filter, the hardware will only accept the frame if it passes both sets
of filters.
.It
If there are multiple different classes of filters and there are
multiple filters present in each class, then the hardware will accept a
frame as long as it matches
.Em ALL
filter classes.
However, within a given filter class, it may match
.Em ANY
of the filters.
See the following boolean logic as an alternative way to phrase this
case:
.Bd -literal -offset indent
match = MAC AND VLAN
MAC = 00:11:22:33:44:55 OR 00:66:77:88:99:aa OR ...
VLAN = 11 OR 12 OR ...
.Ed
.El
.Pp
The following pseudocode summarizes the behavior for a device that
supports independent MAC and VLAN filters.
If the hardware only supports a single family of filters, then treat the
check for the missing family as though it is always true:
.Bd -literal -offset indent
for each packet p:
    for each MAC filter m:
        if m matches p's mac:
            for each VLAN filter v:
                if v matches p's vlan:
                    accept p for group
                    proceed to next packet
    reject packet p
    proceed to next packet
.Ed
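.Pp
The same behavior can be sketched as a self-contained C function that
applies the ANY-within-a-class and ALL-across-classes rules listed
earlier.
The packet and filter structures are hypothetical stand-ins rather than
illumos types, and a filter class that the hardware does not support is
treated as always matching, as described above.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the fields a receive filter inspects. */
typedef struct pkt {
    uint8_t  dst_mac[6];   /* destination MAC address */
    bool     tagged;       /* outer Ethertype is 0x8100 (802.1Q) */
    uint16_t vid;          /* VLAN ID; meaningful only if tagged */
} pkt_t;

/* Hypothetical per-group filter state with two independent classes. */
typedef struct group_filters {
    bool            mac_supported;
    const uint8_t   (*macs)[6];
    size_t          nmacs;
    bool            vlan_supported;
    const uint16_t  *vids;
    size_t          nvids;
} group_filters_t;

/* ANY semantics within the MAC address class. */
static bool
mac_class_matches(const group_filters_t *g, const pkt_t *p)
{
    if (!g->mac_supported)
        return (true);      /* missing class: treat as always true */
    for (size_t i = 0; i < g->nmacs; i++) {
        if (memcmp(g->macs[i], p->dst_mac, sizeof (p->dst_mac)) == 0)
            return (true);
    }
    return (false);
}

/* ANY semantics within the VLAN class; untagged frames never match. */
static bool
vlan_class_matches(const group_filters_t *g, const pkt_t *p)
{
    if (!g->vlan_supported)
        return (true);      /* missing class: treat as always true */
    if (!p->tagged)
        return (false);
    for (size_t i = 0; i < g->nvids; i++) {
        if (g->vids[i] == p->vid)
            return (true);
    }
    return (false);
}

/* ALL semantics across classes: match = MAC AND VLAN. */
static bool
group_accepts(const group_filters_t *g, const pkt_t *p)
{
    return (mac_class_matches(g, p) && vlan_class_matches(g, p));
}
```
.Ed
For example, with MAC address filters for 00:11:22:33:44:55 and
00:66:77:88:99:aa and VLAN filters for 23 and 42,
.Fn group_accepts
accepts a frame addressed to 00:66:77:88:99:aa on VLAN 42, but rejects
the same frame on VLAN 7 or without a VLAN tag.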
.Pp
The following pseudocode summarizes the behavior for a device that
supports a combined MAC address and VLAN filter:
.Bd -literal -offset indent
for each packet p:
    for each filter f:
        if f.mac matches p's mac and f.vlan matches p's vlan:
            accept p for group
            proceed to next packet
    reject packet p
    proceed to next packet
.Ed
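.Pp
A matching sketch for the combined filter follows.
Each hypothetical
.Vt mac_vlan_filter_t
entry pairs one MAC address with one VLAN ID, and both must match the
same entry, which makes the combined filter stricter than independent
filters programmed with the same values.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the fields a receive filter inspects. */
typedef struct pkt {
    uint8_t  dst_mac[6];
    bool     tagged;       /* carries an 802.1Q tag */
    uint16_t vid;
} pkt_t;

/* Hypothetical combined MAC address and VLAN filter entry. */
typedef struct mac_vlan_filter {
    uint8_t  mac[6];
    uint16_t vid;
} mac_vlan_filter_t;

/* Accept only if a single entry matches both the MAC address and VLAN. */
static bool
combined_accepts(const mac_vlan_filter_t *f, size_t nf, const pkt_t *p)
{
    if (!p->tagged)
        return (false);     /* a VLAN match requires a tag */
    for (size_t i = 0; i < nf; i++) {
        if (memcmp(f[i].mac, p->dst_mac, sizeof (f[i].mac)) == 0 &&
            f[i].vid == p->vid)
            return (true);
    }
    return (false);         /* no entry matched: reject */
}
```
.Ed
With entries for 00:11:22:33:44:55 on VLAN 23 and 00:66:77:88:99:aa on
VLAN 42, a frame addressed to 00:11:22:33:44:55 on VLAN 42 is rejected,
whereas independent MAC and VLAN filters holding the same values would
accept it.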
.Ss MAC Capability Structure
When the device driver's
.Xr mc_getcapab 9E
entry point is called with the requested capability set to
.Dv MAC_CAPAB_RINGS ,
then the capability data argument is a pointer to a
.Vt mac_capab_rings_t
structure with the following members:
.Bd -literal -offset indent
mac_ring_type_t         mr_type;
mac_group_type_t        mr_group_type;
uint_t                  mr_rnum;
uint_t                  mr_gnum;
mac_get_ring_t          mr_rget;
mac_get_group_t         mr_gget;
.Ed
.Pp
If the driver supports the
.Dv MAC_CAPAB_RINGS
capability, then it should first check the
.Fa mr_type
member of the structure.
This member has the following possible values:
.Bl -tag -width Dv
.It Dv MAC_RING_TYPE_RX
Indicates that the request is for receive groups and rings.
.It Dv MAC_RING_TYPE_TX
Indicates that the request is for transmit groups and rings.
.El
.Pp
The driver will be asked to fill in this capability structure separately
for receive and transmit groups and rings.
This allows a driver to have different entry points for each type.
If
.Fa mr_type
is set to neither of these values, then the device driver must return
.Dv B_FALSE
from its
.Xr mc_getcapab 9E
entry point.
Once it has identified the type, it should fill in the capability
structure based on the following rules:
.Bl -tag -width Fa
.It Fa mr_type
The
.Fa mr_type
member is used to indicate whether this request is for transmit or
receive groups and rings.
The
.Fa mr_type
member should not be modified by the device driver.
It is set by the MAC framework when the driver's
.Xr mc_getcapab 9E
entry point is called.
As indicated above, the driver must check the value to determine which
type of group this
.Xr mc_getcapab 9E
call is referring to.
.It Fa mr_group_type
This member is used to indicate the group type.
This should be set to
.Dv MAC_GROUP_TYPE_STATIC ,
which indicates that the assignment of rings to groups is fixed, and
each ring can only ever belong to one specific group.
The number of rings may vary from group to group and is set by
the driver.
.It Fa mr_rnum
This indicates the total number of rings that are available.
The number exposed may be less than the number supported in hardware.
This is often because the driver was allocated fewer resources, such as
interrupts, than the hardware could use.
.It Fa mr_gnum
This indicates the total number of groups that are available from
hardware.
The number exposed may be less than the number supported in hardware.
This is often because the driver was allocated fewer resources, such as
interrupts, than the hardware could use.
.Pp
When working with transmit rings, this value may be zero.
In this case, each ring is treated independently and separate groups for
each transmit ring are not required.
.It Fa mr_rget
This member is a function pointer that will be called to provide
information about a ring inside of a specific group.
See
.Xr mr_rget 9E
for information on the function, its signature, and responsibilities.
.It Fa mr_gget
This member is a function pointer that will be called to provide
information about a group.
See
.Xr mr_gget 9E
for information on the function, its signature, and responsibilities.
.El
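.Pp
The following is a minimal sketch of the
.Dv MAC_CAPAB_RINGS
case of an
.Xr mc_getcapab 9E
implementation, assuming a hypothetical driver
.Pq Vt mydrv_t
with a fixed number of receive groups and transmit rings.
To keep it compilable outside the kernel, the types are simplified local
stand-ins that mirror the members described above; they are not the
definitions from
.In sys/mac_provider.h ,
which a real driver includes instead.
In particular, the real
.Fa mr_rget
and
.Fa mr_gget
callbacks have different signatures
.Po see
.Xr mr_rget 9E
and
.Xr mr_gget 9E
.Pc ,
and a real driver returns
.Dv B_TRUE
or
.Dv B_FALSE
rather than a C
.Vt bool .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified local stand-ins; NOT the <sys/mac_provider.h> definitions. */
typedef enum { MAC_RING_TYPE_RX, MAC_RING_TYPE_TX } mac_ring_type_t;
typedef enum { MAC_GROUP_TYPE_STATIC } mac_group_type_t;
typedef void (*sketch_fill_fn_t)(void *driver, int index);

typedef struct mac_capab_rings_s {
    mac_ring_type_t  mr_type;       /* set by MAC; read-only to the driver */
    mac_group_type_t mr_group_type;
    unsigned         mr_rnum;
    unsigned         mr_gnum;
    sketch_fill_fn_t mr_rget;
    sketch_fill_fn_t mr_gget;
} mac_capab_rings_t;

/* Hypothetical per-instance driver state. */
typedef struct mydrv {
    unsigned rx_groups;
    unsigned rx_rings_per_group;
    unsigned tx_rings;
} mydrv_t;

/* Hypothetical fill routines; real ones fill in mac_ring_info(9S) or
 * mac_group_info(9S). */
static void mydrv_fill_rx_ring(void *d, int i) { (void) d; (void) i; }
static void mydrv_fill_rx_group(void *d, int i) { (void) d; (void) i; }
static void mydrv_fill_tx_ring(void *d, int i) { (void) d; (void) i; }

/* The MAC_CAPAB_RINGS arm of a hypothetical mc_getcapab(9E). */
static bool
mydrv_getcapab_rings(const mydrv_t *drv, mac_capab_rings_t *cap)
{
    switch (cap->mr_type) {
    case MAC_RING_TYPE_RX:
        cap->mr_group_type = MAC_GROUP_TYPE_STATIC;
        cap->mr_gnum = drv->rx_groups;
        /* mr_rnum is the total across all groups. */
        cap->mr_rnum = drv->rx_groups * drv->rx_rings_per_group;
        cap->mr_rget = mydrv_fill_rx_ring;
        cap->mr_gget = mydrv_fill_rx_group;
        return (true);
    case MAC_RING_TYPE_TX:
        cap->mr_group_type = MAC_GROUP_TYPE_STATIC;
        cap->mr_gnum = 0;   /* no TX groups: rings are independent */
        cap->mr_rnum = drv->tx_rings;
        cap->mr_rget = mydrv_fill_tx_ring;
        cap->mr_gget = NULL;
        return (true);
    default:
        return (false);     /* unknown type: decline the capability */
    }
}
```
.Ed
On the transmit side the sketch reports zero groups, relying on the rule
above that transmit rings need not be placed in groups, while on the
receive side
.Fa mr_rnum
is the total number of rings across all groups.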
.Sh DRIVER IMPLICATIONS
.Ss MAC Callback Entry Points
When a driver implements the
.Dv MAC_CAPAB_RINGS
capability, then it must not implement some of the traditional MAC
callbacks.
If the driver supports
.Dv MAC_CAPAB_RINGS
for receiving, then it must not implement the
.Xr mc_unicst 9E
entry point.
This is instead handled through the filters that were described earlier.
The filter entry points are defined as part of the
.Xr mac_group_info 9S
structure.
.Pp
If the driver supports
.Dv MAC_CAPAB_RINGS
for transmitting, then it should not implement the
.Xr mc_tx 9E
entry point; it will not be used.
The MAC framework will instead use the
.Xr mri_tx 9E
entry point that is provided by the driver in the
.Xr mac_ring_info 9S
structure.
.Ss Locking and Concurrency
One of the main points of the
.Dv MAC_CAPAB_RINGS
capability is to increase the parallelism and concurrency within the
driver.
This means that a driver may be asked to transmit, poll, or receive
interrupts on all of its rings in parallel.
This usually calls for fine-grained locking in a driver's own data
structures to ensure that the various rings can be populated and used
without having to block on one another.
In general, most drivers have their own independent set of locks for
each transmit and receive ring.
They also usually have separate locks for each group.
.Pp
A driver does not have to mimic the locking scheme of any other driver.
The design of a driver and its locking is often tightly coupled to how
the underlying hardware works and its complexity.
.Ss Polling on rings
When the
.Dv MAC_CAPAB_RINGS
capability is implemented, then additional functionality for receiving
becomes available.
A receive ring has the ability to be polled.
When the operating system wants to begin polling the ring, it will
make a function call into the driver, asking it to receive packets from
this ring.
When receiving packets while polling, the process is generally identical
to that described in the
.Sy Receiving Data
section of
.Xr mac 9E .
For more details, see
.Xr mri_poll 9E .
.Pp
When the MAC framework wants to enable polling, it will first turn off
interrupts through the
.Xr mi_disable 9E
entry point on the driver.
The driver must ensure that there is proper serialization between
enabling and disabling the ring's interrupt, the interrupt handler for
that ring, and the
.Xr mri_poll 9E
entry point.
For more information on the locking requirements related to polling, see
the discussions in
.Xr mri_poll 9E
and
.Xr mi_disable 9E .
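.Pp
The interaction between interrupt disablement, the interrupt handler,
and polling can be sketched for a single ring in self-contained C.
The functions below are hypothetical models of the entry points named
above, not their real signatures, and the byte budget passed to
.Fn ring_poll
is an assumption of this sketch; see
.Xr mri_poll 9E
for the actual contract.
A real driver would hold the ring's own lock in each function to provide
the required serialization; the lock is omitted here only because the
sketch is single-threaded.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS 16   /* hypothetical ring size */

/* Hypothetical software view of one receive ring. */
typedef struct rx_ring {
    bool   intr_enabled;          /* may this ring raise its interrupt? */
    size_t pending[RING_SLOTS];   /* sizes of received, undelivered packets */
    size_t npending;
    size_t delivered_intr;        /* packets handed up by the interrupt path */
    size_t delivered_poll;        /* packets handed up by polling */
} rx_ring_t;

/* Models mi_disable(9E): this ring stops contributing to the interrupt. */
static void
ring_intr_disable(rx_ring_t *r)
{
    r->intr_enabled = false;
}

/* Models the matching enable entry point: interrupts resume. */
static void
ring_intr_enable(rx_ring_t *r)
{
    r->intr_enabled = true;
}

/* Interrupt path: deliver everything, but only if interrupts are on. */
static void
ring_intr_handler(rx_ring_t *r)
{
    if (!r->intr_enabled)
        return;                   /* MAC is polling: leave packets queued */
    r->delivered_intr += r->npending;
    r->npending = 0;
}

/* Models mri_poll(9E): hand up whole packets within a byte budget. */
static size_t
ring_poll(rx_ring_t *r, size_t budget)
{
    size_t used = 0, n = 0;

    while (n < r->npending && used + r->pending[n] <= budget)
        used += r->pending[n++];

    /* Move the undelivered packets to the front of the queue. */
    for (size_t i = n; i < r->npending; i++)
        r->pending[i - n] = r->pending[i];
    r->npending -= n;
    r->delivered_poll += n;
    return (used);
}
```
.Ed
In this model, packets that arrive while the ring is being polled stay
queued until the next poll or until interrupts are re-enabled, so each
packet is delivered exactly once by one path or the other.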
.Ss Updated callback functions
When using rings, two of the MAC framework functions that a driver
commonly calls are replaced.
First, the
.Xr mac_rx 9F
function should be replaced with the
.Xr mac_rx_ring 9F
function.
Second, the
.Xr mac_tx_update 9F
function should be replaced with the
.Xr mac_tx_ring_update 9F
function.
.Ss Interrupt and Ring Mapping
Drivers often vary the number of rings that they expose based on the
number of interrupts that are available.
When a driver only supports a single group, there is often no reason to
have more rings than interrupts.
However, most hardware supports a means of tying multiple rings to
the same interrupt.
Drivers may then tie rings from different groups to the same interrupt;
when that interrupt is triggered, the handler iterates over all of the
rings tied to it.
.Pp
Tying multiple rings together into a single interrupt should only be done
if hardware has the ability to control whether or not each ring
contributes to the interrupt.
For the
.Xr mi_disable 9E
entry point to work, each ring must be able to independently control
whether or not receipt of a packet generates the shared interrupt.
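.Pp
A minimal sketch of rings sharing interrupts follows, assuming a
hypothetical device with eight receive rings and three MSI-X vectors.
Rings are assigned to vectors round-robin, and the shared handler for a
vector walks every ring tied to it, skipping any ring whose interrupt
has been turned off through the per-ring control that
.Xr mi_disable 9E
relies on.
All names are hypothetical.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NRINGS 8    /* hypothetical number of receive rings */
#define NVECS  3    /* hypothetical number of MSI-X vectors granted */

/* Hypothetical software view of one receive ring. */
typedef struct ring {
    unsigned vec;           /* interrupt vector this ring is tied to */
    bool     intr_enabled;  /* cleared when mi_disable(9E) turns it off */
    unsigned pending;       /* packets waiting in the ring */
    unsigned processed;     /* packets handled by the interrupt path */
} ring_t;

/* Spread the rings across the available vectors round-robin. */
static void
map_rings(ring_t *rings, size_t nrings, unsigned nvecs)
{
    assert(nvecs > 0);
    for (size_t i = 0; i < nrings; i++) {
        rings[i].vec = (unsigned)(i % nvecs);
        rings[i].intr_enabled = true;
    }
}

/* Shared handler for one vector: walk every ring tied to it. */
static unsigned
vec_handler(ring_t *rings, size_t nrings, unsigned vec)
{
    unsigned total = 0;

    for (size_t i = 0; i < nrings; i++) {
        ring_t *r = &rings[i];

        if (r->vec != vec || !r->intr_enabled)
            continue;       /* other vector, or ring is being polled */
        r->processed += r->pending;
        total += r->pending;
        r->pending = 0;
    }
    return (total);
}
```
.Ed
Because a disabled ring is skipped, its packets remain queued for
.Xr mri_poll 9E
even though it shares a vector with rings that are still interrupt
driven.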
.Ss Filter Management
As part of general operation, the device driver will be asked to add
various filters to groups.
The MAC framework does not keep track of the assigned filters in a way
that allows it to hand them to the driver again after a device reset.
Therefore, it is recommended that the driver keep track of all filters
it has assigned so that they can be reinstated after a driver- or
system-initiated device reset of some kind.
There is no need to persist anything across a call to
.Xr detach 9E
or similar.
.Pp
For more information, see the
.Sy TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
section of
.Xr mac 9E .
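.Pp
One way to follow this recommendation is for each group to keep a shadow
copy of the filters it has programmed and to replay that copy after a
reset.
The following C sketch assumes a hypothetical device with four MAC
address slots per group; the
.Fn hw_stub
function only counts writes and stands in for real register programming,
and all other names are likewise hypothetical.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_MACS 4   /* hypothetical MAC filter slots per group */

/* Driver-side shadow copy of the MAC filters programmed into a group. */
typedef struct group_state {
    unsigned char macs[MAX_MACS][6];
    bool          used[MAX_MACS];
} group_state_t;

/* Hook that writes one filter slot into the hardware. */
typedef void (*hw_program_fn_t)(size_t slot, const unsigned char *mac);

/* Stand-in for register writes: it only counts how often it is called. */
static size_t hw_writes;

static void
hw_stub(size_t slot, const unsigned char *mac)
{
    (void) slot;
    (void) mac;
    hw_writes++;
}

/* Remember the filter, then program it into a free hardware slot. */
static int
group_add_mac(group_state_t *g, const unsigned char *mac, hw_program_fn_t hw)
{
    assert(g != NULL && mac != NULL);
    for (size_t i = 0; i < MAX_MACS; i++) {
        if (!g->used[i]) {
            memcpy(g->macs[i], mac, 6);
            g->used[i] = true;
            hw(i, mac);
            return (0);
        }
    }
    return (-1);    /* no free slot; a real driver returns an errno value */
}

/* Forget the filter; a real driver would also clear the hardware slot. */
static int
group_rem_mac(group_state_t *g, const unsigned char *mac)
{
    for (size_t i = 0; i < MAX_MACS; i++) {
        if (g->used[i] && memcmp(g->macs[i], mac, 6) == 0) {
            g->used[i] = false;
            return (0);
        }
    }
    return (-1);
}

/* After a device reset, reprogram every remembered filter. */
static size_t
group_replay(const group_state_t *g, hw_program_fn_t hw)
{
    size_t n = 0;

    for (size_t i = 0; i < MAX_MACS; i++) {
        if (g->used[i]) {
            hw(i, g->macs[i]);
            n++;
        }
    }
    return (n);
}
```
.Ed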
.Ss Broadcast, Multicast, and Promiscuous Mode
Rings and groups are currently designed to emphasize and enhance the
receipt of filtered, unicast frames.
This means that special handling is required when working with broadcast
traffic, multicast traffic, and enabling promiscuous mode.
This only applies to receive groups and rings.
.Pp
Only the first group, with index zero
.Pq sometimes called the default group ,
should ever be programmed to receive broadcast traffic.
This group should always be programmed to receive broadcast traffic, the
same way that the broader device is programmed to always receive
broadcast traffic when the
.Dv MAC_CAPAB_RINGS
capability has not been negotiated.
.Pp
When multicast addresses are assigned to the device through the
.Xr mc_multicst 9E
entry point, those should also be assigned to the first group.
.Pp
Similarly, when enabling promiscuous mode, the driver should only enable
promiscuous traffic to be received by the first group.
.Pp
No other groups or rings should ever receive broadcast, multicast, or
promiscuous mode traffic.
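.Pp
These rules reduce to a simple per-group decision, sketched below in C
under the assumption of a hypothetical driver that programs broadcast
acceptance, the multicast list, and promiscuous mode separately for each
group.
The structure and function names are hypothetical; the device-wide
promiscuous setting would come from the driver's
.Xr mc_setpromisc 9E
entry point.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-group receive-mode settings a driver might program. */
typedef struct group_rx_mode {
    bool accept_bcast;       /* receive broadcast frames */
    bool accept_mcast_list;  /* receive addresses added via mc_multicst(9E) */
    bool promisc;            /* receive all traffic */
} group_rx_mode_t;

/*
 * Only group 0, the default group, ever receives broadcast, multicast,
 * or promiscuous traffic; every other group relies solely on its
 * unicast filters.
 */
static group_rx_mode_t
rx_mode_for_group(unsigned gidx, bool dev_promisc)
{
    group_rx_mode_t m = { false, false, false };

    if (gidx == 0) {
        m.accept_bcast = true;       /* always on for the default group */
        m.accept_mcast_list = true;
        m.promisc = dev_promisc;
    }
    return (m);
}
```
.Ed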
.Sh SEE ALSO
.Xr mac 9E ,
.Xr mc_getcapab 9E ,
.Xr mc_multicst 9E ,
.Xr mc_tx 9E ,
.Xr mc_unicst 9E ,
.Xr mi_disable 9E ,
.Xr mr_gaddring 9E ,
.Xr mr_gget 9E ,
.Xr mr_gremring 9E ,
.Xr mr_rget 9E ,
.Xr mri_poll 9E ,
.Xr mri_tx 9E ,
.Xr mac_rx 9F ,
.Xr mac_rx_ring 9F ,
.Xr mac_tx_ring_update 9F ,
.Xr mac_tx_update 9F ,
.Xr mac_group_info 9S ,
.Xr mac_ring_info 9S