.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright (c) 2017, Joyent, Inc.
.\" Copyright 2022 Oxide Computer Company
.\"
.Dd July 2, 2022
.Dt MAC_CAPAB_RINGS 9E
.Os
.Sh NAME
.Nm mac_capab_rings
.Nd MAC ring capability
.Sh SYNOPSIS
.In sys/mac_provider.h
.Vt typedef struct mac_capab_rings_s mac_capab_rings_t;
.Sh INTERFACE LEVEL
.Sy Uncommitted -
This interface is still evolving.
API and ABI stability is not guaranteed.
.Sh DESCRIPTION
The
.Sy MAC_CAPAB_RINGS
capability provides a means for device drivers to take advantage of
the additional resources offered by hardware beyond the basic operations
to transmit and receive.
There are two primary concepts that this MAC capability relies on: rings
and groups.
.Pp
The
.Em ring
is an abstract concept which must be mapped to some hardware construct by
the driver.
It typically takes the form of a DMA memory region which is divided
into many smaller units, called descriptors or entries.
Each entry in the ring describes a location in memory of a packet, which the
hardware is to read from
.Pq to transmit it
or write to
.Pq upon reception .
Entries also typically contain metadata and attributes about the packet.
These entries are typically arranged in a fixed-size circular buffer
.Po hence the
.Dq ring
name
.Pc
which is shared between the operating system and the
hardware via the DMA-backed memory.
Most NICs, regardless of their support for this capability, use something
resembling a descriptor ring under the hood.
Some vendors may also refer to rings as
.Em queues .
The ring concept is intentionally general, so that more unusual underlying
hardware constructs can also be used to implement it.
.Pp
A collection of one or more rings is called a
.Em group .
Each group usually has a collection of filters that can be associated
with it.
These filters are usually defined in terms of matching something like a
MAC address, VLAN, or Ethertype, though more complex filters may exist
in hardware.
When a packet matches a filter, it will then be directed to the group
and eventually delivered to one of the rings in the group.
.Pp
In the MAC framework, rings and groups are separated into categories
based on their purpose: transmitting and receiving.
While the MAC framework thinks of transmit and receive rings as
different physical constructs, they may map to the same underlying
resources in the hardware.
The device driver may implement the
.Dv MAC_CAPAB_RINGS
capability for one of transmitting, receiving, or both.
.Ss Mapping Hardware to Rings and Groups
There are many different ways that hardware resources may map to this
capability.
Consider the following examples:
.Bl -enum
.It
Hardware may support a feature commonly known as receive side scaling
.Pq RSS .
With RSS, the hardware has multiple rings and uses a hash function
calculated over packet headers to choose which ring receives a
particular packet.
Rings are associated with different interrupts, allowing multiple rings
to be processed in parallel.
Supporting RSS in isolation would result in a device which has a single
group, and multiple rings within that group.
.It
Some hardware may have a single ring, but still support multiple receive
filters.
This is commonly seen with some 1 GbE devices.
While the hardware only has one ring, it has support for multiple
independent MAC address filters, each of which can be programmed to
receive traffic for a single MAC address.
The driver should map this situation to a single group with a single
ring.
However, it would implement the ability to program several filters.
While this may not seem useful at first, when virtual NICs are created
on top of a physical NIC, the additional hardware filters will be used
to avoid putting the device in promiscuous mode.
.It
Finally, some hardware has many rings, which can be placed in many
different groups.
Each group has its own filtering capabilities.
For such hardware, the device driver would declare support for multiple
groups, each of which has its own independent set of rings.
.El
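.Pp
As a rough illustration of the RSS case in the first example, the
following userland C sketch maps a flow to a ring index.
Every name in it is hypothetical, and FNV-1a is used only as a simple
stand-in for the hash: real devices compute their own hash
.Pq commonly a Toeplitz hash
in hardware and usually consult an indirection table rather than taking
a plain modulus.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key; hardware decides which header fields are hashed. */
typedef struct example_flow {
	uint32_t ef_src_ip;
	uint32_t ef_dst_ip;
	uint16_t ef_src_port;
	uint16_t ef_dst_port;
} example_flow_t;

/* Fold len bytes into an FNV-1a hash (stand-in for the NIC's own hash). */
static uint32_t
example_hash_bytes(uint32_t hash, const void *buf, size_t len)
{
	const uint8_t *p = buf;

	for (size_t i = 0; i < len; i++) {
		hash ^= p[i];
		hash *= 16777619u;	/* FNV-1a 32-bit prime */
	}
	return (hash);
}

/* Map a flow to a ring index in the range [0, nrings). */
static unsigned int
example_rss_ring(const example_flow_t *f, unsigned int nrings)
{
	uint32_t hash = 2166136261u;	/* FNV-1a 32-bit offset basis */

	assert(nrings > 0);
	/* Hash field by field so that struct padding never contributes. */
	hash = example_hash_bytes(hash, &f->ef_src_ip, sizeof (f->ef_src_ip));
	hash = example_hash_bytes(hash, &f->ef_dst_ip, sizeof (f->ef_dst_ip));
	hash = example_hash_bytes(hash, &f->ef_src_port,
	    sizeof (f->ef_src_port));
	hash = example_hash_bytes(hash, &f->ef_dst_port,
	    sizeof (f->ef_dst_port));
	return (hash % nrings);
}
```

Because the result depends only on the flow key, packets from the same
flow always land on the same ring, while distinct flows are spread
across the rings of the group.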
.Pp
When choosing hardware constructs to implement rings and groups, it is
also important to consider interrupts.
In order to support polling, each receive ring must be able to
independently toggle whether that ring will generate an interrupt on
packet reception, even when many rings share the same hardware level
interrupt
.Pq e.g. the same MSI or MSI-X interrupt number and handler .
.Ss Filters
The
.Xr mac_group_info 9S
structure is used to define several different kinds of filters that the
group might implement.
There are three different classes of filters that exist:
.Bl -tag -width Ds
.It Sy MAC Address
A given frame matches a MAC Address filter if the destination address in
the Ethernet header matches the specified MAC address.
.It Sy VLAN
A given frame matches a VLAN filter if it both has an 802.1Q VLAN tag
and that tag matches the VLAN number specified in the filter.
If the frame's outer ethertype is not 0x8100, then the filter will not
match.
.It Sy MAC Address and VLAN
A given frame matches a MAC Address and VLAN filter if it matches both
the specified MAC address and the specified VLAN.
This is constructed as a logical AND of the previous two filters.
If only one of the two matches, then the frame does not match this
filter.
.Pp
Note: this filter type is still under development and has not been
plumbed through our APIs yet.
.El
.Pp
Devices may support many different filter types.
If the hardware resources required for a combined filter type
.Pq e.g. MAC Address and VLAN
are similar to the resources required for each in isolation, drivers
should prefer to implement just the combined type and should not
implement the individual types.
.Pp
The MAC framework assumes that the following rules hold regarding
filters:
.Bl -enum
.It
When there are multiple filters of the same kind with different
addresses, then the hardware will accept a frame if it matches
.Em ANY
of the specified filters.
In other words, if there are two VLAN filters defined, one for VLAN 23
and one for VLAN 42, then if a frame has either VLAN 23 or VLAN 42,
it will be accepted for the group.
.It
If multiple different classes of filters are defined, then the hardware
should only accept a frame if it passes
.Em ALL
of the filter classes.
For example, if there is a MAC address filter and a separate VLAN
filter, the hardware will only accept the frame if it passes both sets
of filters.
.It
If there are multiple different classes of filters and there are
multiple filters present in each class, then the driver will accept a
packet as long as it matches
.Em ALL
filter classes.
However, within a given filter class, it may match
.Em ANY
of the filters.
See the following boolean logic as an alternative way to phrase this
case:
.Bd -literal -offset indent
match = MAC && VLAN
MAC = 00:11:22:33:44:55 OR 00:66:77:88:99:aa OR ...
VLAN = 11 OR 12 OR ...
.Ed
.El
.Pp
The following pseudocode summarizes the behavior for a device that
supports independent MAC and VLAN filters.
If the hardware only supports a single family of filters, then simply
treat the unsupported family in the pseudocode as though it is always
true:
.Bd -literal -offset indent
for each packet p:
    for each MAC filter m:
        if m matches p's mac:
            for each VLAN filter v:
                if v matches p's vlan:
                    accept p for group
                    proceed to next packet
    reject packet p
    proceed to next packet
.Ed
.Pp
The following pseudocode summarizes the behavior for a device that
supports a combined MAC address and VLAN filter:
.Bd -literal -offset indent
for each packet p:
    for each filter f:
        if f.mac matches p's mac and f.vlan matches p's vlan:
            accept p for group
            proceed to next packet
    reject packet p
    proceed to next packet
.Ed
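.Pp
The ANY-within-a-class and ALL-across-classes rules for independent MAC
and VLAN filters can also be sketched in plain userland C.
This is only an illustration of the decision the hardware is assumed to
make; the structure, its members, and the function names are all
hypothetical and are not part of the MAC framework.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	EXAMPLE_MACLEN	6	/* bytes in an Ethernet MAC address */

/* Hypothetical per-group filter state; not a MAC framework structure. */
typedef struct example_filters {
	bool		ef_mac_class;	/* hardware has MAC address filters */
	const uint8_t	(*ef_macs)[EXAMPLE_MACLEN];
	size_t		ef_nmacs;
	bool		ef_vlan_class;	/* hardware has VLAN filters */
	const uint16_t	*ef_vlans;
	size_t		ef_nvlans;
} example_filters_t;

/* ANY semantics within the MAC address class. */
static bool
example_mac_ok(const example_filters_t *ef, const uint8_t *dst)
{
	if (!ef->ef_mac_class)
		return (true);	/* unsupported class is treated as true */
	for (size_t i = 0; i < ef->ef_nmacs; i++) {
		if (memcmp(ef->ef_macs[i], dst, EXAMPLE_MACLEN) == 0)
			return (true);
	}
	return (false);
}

/* ANY semantics within the VLAN class; untagged frames never match. */
static bool
example_vlan_ok(const example_filters_t *ef, bool tagged, uint16_t vid)
{
	if (!ef->ef_vlan_class)
		return (true);	/* unsupported class is treated as true */
	if (!tagged)
		return (false);
	for (size_t i = 0; i < ef->ef_nvlans; i++) {
		if (ef->ef_vlans[i] == vid)
			return (true);
	}
	return (false);
}

/* ALL semantics across the filter classes. */
static bool
example_group_accepts(const example_filters_t *ef, const uint8_t *dst,
    bool tagged, uint16_t vid)
{
	assert(ef != NULL && dst != NULL);
	return (example_mac_ok(ef, dst) && example_vlan_ok(ef, tagged, vid));
}
```

With the MAC addresses and VLANs from the boolean example above
programmed, a frame for 00:66:77:88:99:aa on VLAN 12 is accepted, while
the same frame on VLAN 13, or an untagged frame, is rejected.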
.Ss MAC Capability Structure
When the device driver's
.Xr mc_getcapab 9E
function entry point is called with the capability requested set to
.Dv MAC_CAPAB_RINGS ,
then the value of the capability structure is a pointer to a
.Vt mac_capab_rings_t
structure with the following members:
.Bd -literal -offset indent
mac_ring_type_t         mr_type;
mac_group_type_t        mr_group_type;
uint_t                  mr_rnum;
uint_t                  mr_gnum;
mac_get_ring_t          mr_rget;
mac_get_group_t         mr_gget;
.Ed
.Pp
If the driver supports the
.Dv MAC_CAPAB_RINGS
capability, then it should first check the
.Fa mr_type
member of the structure.
This member has the following possible values:
.Bl -tag -width Dv
.It Dv MAC_RING_TYPE_RX
Indicates that this group is for receive rings.
.It Dv MAC_RING_TYPE_TX
Indicates that this group is for transmit rings.
.El
.Pp
The driver will be asked to fill in this capability structure separately
for receive and transmit groups and rings.
This allows a driver to have different entry points for each type.
If neither of these values is specified, then the device driver must
return
.Dv B_FALSE
from its
.Xr mc_getcapab 9E
entry point.
Once it has identified the type, it should fill in the capability
structure based on the following rules:
.Bl -tag -width Fa
.It Fa mr_type
The
.Fa mr_type
member is used to indicate whether this group is for transmit or receive
rings.
The
.Fa mr_type
member should not be modified by the device driver.
It is set by the MAC framework when the driver's
.Xr mc_getcapab 9E
entry point is called.
As indicated above, the driver must check the value to determine which
group this
.Xr mc_getcapab 9E
call is referring to.
.It Fa mr_group_type
This member is used to indicate the group type.
This should be set to
.Dv MAC_GROUP_TYPE_STATIC ,
which indicates that the assignment of rings to groups is fixed, and
each ring can only ever belong to one specific group.
The number of rings may vary from group to group and is chosen by the
driver.
.It Fa mr_rnum
This indicates the total number of rings that are available.
The number exposed may be less than the number supported in hardware.
This often happens when the driver is allocated fewer resources, such as
interrupts, than the hardware can use.
.It Fa mr_gnum
This indicates the total number of groups that are available.
The number exposed may be less than the number supported in hardware.
This often happens when the driver is allocated fewer resources, such as
interrupts, than the hardware can use.
.Pp
When working with transmit rings, this value may be zero.
In this case, each ring is treated independently and separate groups for
each transmit ring are not required.
.It Fa mr_rget
This member is a function pointer that will be called to provide
information about a ring inside of a specific group.
See
.Xr mr_rget 9E
for information on the function, its signature, and responsibilities.
.It Fa mr_gget
This member is a function pointer that will be called to provide
information about a group.
See
.Xr mr_gget 9E
for information on the function, its signature, and responsibilities.
.El
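.Pp
The rules above can be sketched as follows.
This is a userland mock-up rather than driver code: the type
definitions are simplified stand-ins for those in
.In sys/mac_provider.h ,
the enumeration and capability values are arbitrary, the function
pointer types are placeholders that do not reflect the real
.Xr mr_rget 9E
and
.Xr mr_gget 9E
signatures, and the
.Vt example_softc_t
structure and all
.Sy example_
functions are hypothetical.
A real entry point returns
.Vt boolean_t ;
.Vt bool
is used here so the sketch builds outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for definitions in <sys/mac_provider.h>. */
typedef enum { MAC_RING_TYPE_RX = 1, MAC_RING_TYPE_TX } mac_ring_type_t;
typedef enum { MAC_GROUP_TYPE_STATIC = 1 } mac_group_type_t;
typedef void (*mac_get_ring_t)(void);	/* placeholder signature */
typedef void (*mac_get_group_t)(void);	/* placeholder signature */
#define	MAC_CAPAB_RINGS	1		/* placeholder value */

typedef struct mac_capab_rings_s {
	mac_ring_type_t		mr_type;
	mac_group_type_t	mr_group_type;
	unsigned int		mr_rnum;
	unsigned int		mr_gnum;
	mac_get_ring_t		mr_rget;
	mac_get_group_t		mr_gget;
} mac_capab_rings_t;

/* Hypothetical driver state. */
typedef struct example_softc {
	unsigned int	es_nrx_rings;
	unsigned int	es_nrx_groups;
	unsigned int	es_ntx_rings;
} example_softc_t;

static void example_fill_ring(void) { }		/* would fill ring info */
static void example_fill_group(void) { }	/* would fill group info */

static bool
example_getcapab(void *arg, int cap, void *cap_data)
{
	example_softc_t *sc = arg;
	mac_capab_rings_t *rings = cap_data;

	assert(sc != NULL && rings != NULL);
	if (cap != MAC_CAPAB_RINGS)
		return (false);

	/* mr_type is set by the framework: read it, never write it. */
	switch (rings->mr_type) {
	case MAC_RING_TYPE_RX:
		rings->mr_rnum = sc->es_nrx_rings;
		rings->mr_gnum = sc->es_nrx_groups;
		break;
	case MAC_RING_TYPE_TX:
		rings->mr_rnum = sc->es_ntx_rings;
		rings->mr_gnum = 0;	/* no separate transmit groups */
		break;
	default:
		return (false);		/* B_FALSE in a real driver */
	}
	rings->mr_group_type = MAC_GROUP_TYPE_STATIC;
	rings->mr_rget = example_fill_ring;
	rings->mr_gget = example_fill_group;
	return (true);
}
```

The sketch uses the same callbacks for both ring types for brevity; a
driver is free to supply different ones for receive and transmit.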
.Sh DRIVER IMPLICATIONS
.Ss MAC Callback Entry Points
When a driver implements the
.Dv MAC_CAPAB_RINGS
capability, then it must not implement some of the traditional MAC
callbacks.
If the driver supports
.Dv MAC_CAPAB_RINGS
for receiving, then it must not implement the
.Xr mc_unicst 9E
entry point.
This is instead handled through the filters that were described earlier.
The filter entry points are defined as part of the
.Xr mac_group_info 9S
structure.
.Pp
If the driver supports
.Dv MAC_CAPAB_RINGS
for transmitting, then it should not implement the
.Xr mc_tx 9E
entry point, as it will not be used.
The MAC framework will instead use the
.Xr mri_tx 9E
entry point that is provided by the driver in the
.Xr mac_ring_info 9S
structure.
.Ss Locking and Concurrency
One of the main points of the
.Dv MAC_CAPAB_RINGS
capability is to increase the parallelism and concurrency that is
actively going on in the driver.
This means that a driver may be asked to transmit, poll, or receive
interrupts on all of its rings in parallel.
This usually calls for fine-grained locking in a driver's own data
structures to ensure that the various rings can be populated and used
without having to block on one another.
In general, most drivers have their own independent set of locks for
each transmit and receive ring.
They also usually have separate locks for each group.
.Pp
Just because one driver performs locking in a particular way does not
mean that another driver has to mimic it.
The design of a driver and its locking is often tightly coupled to how
the underlying hardware works and its complexity.
.Ss Polling on rings
When the
.Dv MAC_CAPAB_RINGS
capability is implemented, then additional functionality for receiving
becomes available.
A receive ring has the ability to be polled.
When the operating system desires to begin polling the ring, it will
make a function call into the driver, asking it to receive packets from
this ring.
When receiving packets while polling, the process is generally identical
to that described in the
.Sy Receiving Data
section of
.Xr mac 9E .
For more details, see
.Xr mri_poll 9E .
.Pp
When the MAC framework wants to enable polling, it will first turn off
interrupts through the
.Xr mi_disable 9E
entry point on the driver.
The driver must ensure that there is proper serialization between the
interrupt enablement, interrupt disablement, the interrupt handler for
that ring, and the
.Xr mri_poll 9E
entry point.
For more information on the locking requirements related to polling, see
the discussions in
.Xr mri_poll 9E
and
.Xr mi_disable 9E .
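.Pp
One possible way to get the serialization described above is to guard
each receive ring with its own lock and take that lock in all four
paths.
The following userland sketch shows the idea under several assumptions:
.Vt pthread_mutex_t
stands in for a kernel mutex, pending packets are modeled as a simple
counter, the poll budget is expressed in packets for simplicity, and
every name is hypothetical.
It is not how any particular driver is implemented.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-ring state; one lock per receive ring. */
typedef struct example_rx_ring {
	pthread_mutex_t	er_lock;	/* stand-in for a kernel mutex */
	bool		er_intr_enabled;
	unsigned int	er_pending;	/* packets waiting in the ring */
	unsigned int	er_delivered;	/* packets handed to the stack */
} example_rx_ring_t;

static void
example_ring_init(example_rx_ring_t *r)
{
	(void) pthread_mutex_init(&r->er_lock, NULL);
	r->er_intr_enabled = true;
	r->er_pending = 0;
	r->er_delivered = 0;
}

/* Shared by the enable and disable paths. */
static void
example_intr_set(example_rx_ring_t *r, bool enabled)
{
	(void) pthread_mutex_lock(&r->er_lock);
	r->er_intr_enabled = enabled;
	(void) pthread_mutex_unlock(&r->er_lock);
}

/* Interrupt path: delivers nothing while the ring is being polled. */
static unsigned int
example_ring_intr(example_rx_ring_t *r)
{
	unsigned int n = 0;

	(void) pthread_mutex_lock(&r->er_lock);
	if (r->er_intr_enabled) {
		n = r->er_pending;
		r->er_pending = 0;
		r->er_delivered += n;
	}
	(void) pthread_mutex_unlock(&r->er_lock);
	return (n);
}

/* Poll path: takes at most max packets under the same lock. */
static unsigned int
example_ring_poll(example_rx_ring_t *r, unsigned int max)
{
	unsigned int n;

	assert(r != NULL);
	(void) pthread_mutex_lock(&r->er_lock);
	n = (r->er_pending < max) ? r->er_pending : max;
	r->er_pending -= n;
	r->er_delivered += n;
	(void) pthread_mutex_unlock(&r->er_lock);
	return (n);
}
```

Because the interrupt path rechecks the enabled flag while holding the
ring lock, an interrupt that was already in flight when interrupts were
disabled cannot consume packets that the poller is expected to return.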
.Ss Updated callback functions
When using rings, two of the primary functions that a driver calls
change.
First, the
.Xr mac_rx 9F
function should be replaced with the
.Xr mac_ring_rx 9F
function.
Secondly,
the
.Xr mac_tx_update 9F
function should be replaced with the
.Xr mac_tx_ring_update 9F
function.
.Ss Interrupt and Ring Mapping
Drivers often vary the number of rings that they expose based on the
number of interrupts that exist.
When a driver only supports a single group, there is often no reason to
have more rings than interrupts.
However, most hardware supports a means of tying multiple rings to
the same interrupt.
Drivers that support multiple groups often tie rings in different groups
to the same interrupt and, when that interrupt is triggered, iterate
over all of the rings that share it.
.Pp
Tying multiple rings together into a single interrupt should only be done
if hardware has the ability to control whether or not each ring
contributes to the interrupt.
For the
.Xr mi_disable 9E
entry point to work, each ring must be able to independently control
whether or not receipt of a packet generates the shared interrupt.
.Ss Filter Management
As part of general operation, the device driver will be asked to add
various filters to groups.
The MAC framework does not keep track of the assigned filters in a way
that allows them to be given to the driver again after a device reset.
Therefore, it is recommended that the driver keep track of all filters
it has assigned such that they can be reinstated after a
driver-initiated or system-initiated device reset of some kind.
There is no need to persist anything across a call to
.Xr detach 9E
or similar.
.Pp
For more information, see the
.Sy TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
section of
.Xr mac 9E .
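.Pp
The bookkeeping recommended above can be as simple as a shadow copy of
each group's filter table that is replayed into hardware after a reset.
The following userland sketch simulates the hardware table with an
in-memory array; the structure, the table size, and all function names
are hypothetical, and a real driver would program device registers and
return an errno value rather than -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	EXAMPLE_MACLEN		6
#define	EXAMPLE_MAX_FILTERS	4	/* arbitrary table size */

/* Hypothetical group state: driver shadow plus simulated hardware. */
typedef struct example_group {
	uint8_t	eg_shadow[EXAMPLE_MAX_FILTERS][EXAMPLE_MACLEN];
	bool	eg_shadow_used[EXAMPLE_MAX_FILTERS];
	uint8_t	eg_hw[EXAMPLE_MAX_FILTERS][EXAMPLE_MACLEN];
	bool	eg_hw_used[EXAMPLE_MAX_FILTERS];
} example_group_t;

/* Record the filter in the shadow table, then program the hardware. */
static int
example_add_mac(example_group_t *g, const uint8_t *mac)
{
	assert(g != NULL && mac != NULL);
	for (size_t i = 0; i < EXAMPLE_MAX_FILTERS; i++) {
		if (g->eg_shadow_used[i])
			continue;
		memcpy(g->eg_shadow[i], mac, EXAMPLE_MACLEN);
		g->eg_shadow_used[i] = true;
		memcpy(g->eg_hw[i], mac, EXAMPLE_MACLEN);
		g->eg_hw_used[i] = true;
		return (0);
	}
	return (-1);	/* table full */
}

/* A device reset wipes the hardware table but not the shadow copy. */
static void
example_reset(example_group_t *g)
{
	memset(g->eg_hw, 0, sizeof (g->eg_hw));
	memset(g->eg_hw_used, 0, sizeof (g->eg_hw_used));
}

/* Replay every remembered filter into the hardware table. */
static void
example_restore_filters(example_group_t *g)
{
	for (size_t i = 0; i < EXAMPLE_MAX_FILTERS; i++) {
		if (!g->eg_shadow_used[i])
			continue;
		memcpy(g->eg_hw[i], g->eg_shadow[i], EXAMPLE_MACLEN);
		g->eg_hw_used[i] = true;
	}
}

/* Report whether the simulated hardware currently holds this filter. */
static bool
example_hw_has(const example_group_t *g, const uint8_t *mac)
{
	for (size_t i = 0; i < EXAMPLE_MAX_FILTERS; i++) {
		if (g->eg_hw_used[i] &&
		    memcmp(g->eg_hw[i], mac, EXAMPLE_MACLEN) == 0)
			return (true);
	}
	return (false);
}
```

A reset path would call the restore function once the device is back up
and before receive processing resumes, so that the set of programmed
filters matches what the MAC framework believes is installed.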
.Ss Broadcast, Multicast, and Promiscuous Mode
Rings and groups are currently designed to emphasize and enhance the
receipt of filtered, unicast frames.
This means that special handling is required when working with broadcast
traffic, multicast traffic, and enabling promiscuous mode.
This only applies to receive groups and rings.
.Pp
By default, only the first group with index zero, sometimes called the
default group, should ever be programmed to receive broadcast traffic.
This group should always be programmed to receive broadcast traffic, the
same way that the broader device is programmed to always receive
broadcast traffic when the
.Dv MAC_CAPAB_RINGS
capability has not been negotiated.
.Pp
When multicast addresses are assigned to the device through the
.Xr mc_multicst 9E
entry point, those should also be assigned to the first group.
.Pp
Similarly, when enabling promiscuous mode, the driver should only enable
promiscuous traffic to be received by the first group.
.Pp
No other groups or rings should ever receive broadcast, multicast, or
promiscuous mode traffic.
.Sh SEE ALSO
.Xr mac 9E ,
.Xr mc_getcapab 9E ,
.Xr mc_multicst 9E ,
.Xr mc_tx 9E ,
.Xr mc_unicst 9E ,
.Xr mi_disable 9E ,
.Xr mr_gaddring 9E ,
.Xr mr_gget 9E ,
.Xr mr_gremring 9E ,
.Xr mr_rget 9E ,
.Xr mri_poll 9E ,
.Xr mac_ring_rx 9F ,
.Xr mac_rx 9F ,
.Xr mac_tx_ring_update 9F ,
.Xr mac_tx_update 9F ,
.Xr mac_group_info 9S