.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright (c) 2017, Joyent, Inc.
.\" Copyright 2022 Oxide Computer Company
.\" Copyright 2023 Peter Tribble
.\"
.Dd July 17, 2023
.Dt MAC_CAPAB_RINGS 9E
.Os
.Sh NAME
.Nm mac_capab_rings
.Nd MAC ring capability
.Sh SYNOPSIS
.In sys/mac_provider.h
.Vt typedef struct mac_capab_rings_s mac_capab_rings_t;
.Sh INTERFACE LEVEL
.Sy Uncommitted -
This interface is still evolving.
API and ABI stability is not guaranteed.
.Sh DESCRIPTION
The
.Sy MAC_CAPAB_RINGS
capability provides a means for device drivers to take advantage of
the additional resources offered by hardware beyond the basic operations
to transmit and receive.
There are two primary concepts that this MAC capability relies on: rings
and groups.
.Pp
The
.Em ring
is an abstract concept which must be mapped to some hardware construct by
the driver.
It typically takes the form of a DMA memory region which is divided
into many smaller units, called descriptors or entries.
Each entry in the ring describes a location in memory of a packet, which the
hardware is to read from
.Pq to transmit it
or write to
.Pq upon reception .
Entries also typically contain metadata and attributes about the packet.
These entries are typically arranged in a fixed-size circular buffer
.Po hence the
.Dq ring
name
.Pc
which is shared between the operating system and the
hardware via the DMA-backed memory.
Most NICs, regardless of their support for this capability, use something
resembling a descriptor ring under the hood.
Some vendors may also refer to rings as
.Em queues .
The ring concept is intentionally general, so that more unusual underlying
hardware constructs can also be used to implement it.
.Pp
A collection of one or more rings is called a
.Em group .
Each group usually has a collection of filters that can be associated
with it.
These filters are usually defined in terms of matching something like a
MAC address, VLAN, or Ethertype, though more complex filters may exist
in hardware.
When a packet matches a filter, it will then be directed to the group
and eventually delivered to one of the rings in the group.
.Pp
In the MAC framework, rings and groups are separated into categories
based on their purpose: transmitting and receiving.
While the MAC framework thinks of transmit and receive rings as
different physical constructs, they may map to the same underlying
resources in the hardware.
The device driver may implement the
.Dv MAC_CAPAB_RINGS
capability for transmitting, receiving, or both.
.Ss Mapping Hardware to Rings and Groups
There are many different ways that hardware resources may map to this
capability.
Consider the following examples:
.Bl -enum
.It
Hardware may support a feature commonly known as receive side scaling
.Pq RSS .
With RSS, the hardware has multiple rings and uses a hash function
calculated over packet headers to choose which ring receives a
particular packet.
Rings are associated with different interrupts, allowing multiple rings
to be processed in parallel.
Supporting RSS in isolation would result in a device which has a single
group, and multiple rings within that group.
.It
Some hardware may have a single ring, but still support multiple receive
filters.
This is commonly seen with some 1 GbE devices.
While the hardware only has one ring, it has support for multiple
independent MAC address filters, each of which can be programmed to
receive traffic for a single MAC address.
The driver should map this situation to a single group with a single
ring.
However, it would implement the ability to program several filters.
While this may not seem useful at first, when virtual NICs are created
on top of a physical NIC, the additional hardware filters will be used
to avoid putting the device in promiscuous mode.
.It
Finally, some hardware has many rings, which can be placed in many
different groups.
Each group has its own filtering capabilities.
For such hardware, the device driver would declare support for multiple
groups, each of which has its own independent set of rings.
.El
.Pp
When choosing hardware constructs to implement rings and groups, it is
also important to consider interrupts.
In order to support polling, each receive ring must be able to
independently toggle whether that ring will generate an interrupt on
packet reception, even when many rings share the same hardware-level
interrupt
.Pq e.g. the same MSI or MSI-X interrupt number and handler .
.Ss Filters
The
.Xr mac_group_info 9S
structure is used to define several different kinds of filters that the
group might implement.
There are three different classes of filters that exist:
.Bl -tag -width Ds
.It Sy MAC Address
A given frame matches a MAC Address filter if the destination address in
the Ethernet header matches the specified MAC address.
.It Sy VLAN
A given frame matches a VLAN filter if it both has an 802.1Q VLAN tag
and that tag matches the VLAN number specified in the filter.
If the frame's outer ethertype is not 0x8100, then the filter will not
match.
.It Sy MAC Address and VLAN
A given frame matches a MAC Address and VLAN filter if it matches both
the specified MAC address and the specified VLAN.
This is constructed as a logical AND of the previous two filters.
If only one of the two matches, then the frame does not match this
filter.
.Pp
Note: this filter type is still under development and has not been
plumbed through our APIs yet.
.El
.Pp
Devices may support many different filter types.
If the hardware resources required for a combined filter type
.Pq e.g. MAC Address and VLAN
are similar to the resources required for each in isolation, drivers
should prefer to implement just the combined type and should not
implement the individual types.
.Pp
The MAC framework assumes that the following rules hold regarding
filters:
.Bl -enum
.It
When there are multiple filters of the same kind with different
values, then the hardware will accept a frame if it matches
.Em ANY
of the specified filters.
In other words, if there are two VLAN filters defined, one for VLAN 23
and one for VLAN 42, then if a frame has either VLAN 23 or VLAN 42,
it will be accepted for the group.
.It
If multiple different classes of filters are defined, then the hardware
should only accept a frame if it passes
.Em ALL
of the filter classes.
For example, if there is a MAC address filter and a separate VLAN
filter, the hardware will only accept the frame if it passes both sets
of filters.
.It
If there are multiple different classes of filters and there are
multiple filters present in each class, then the hardware will accept a
packet as long as it matches
.Em ALL
filter classes.
However, within a given filter class, it may match
.Em ANY
of the filters.
See the following boolean logic as an alternative way to phrase this
case:
.Bd -literal -offset indent
match = MAC AND VLAN
MAC = 00:11:22:33:44:55 OR 00:66:77:88:99:aa OR ...
VLAN = 11 OR 12 OR ...
.Ed
.El
.Pp
The following pseudocode summarizes the behavior for a device that
supports independent MAC and VLAN filters.
If the hardware only supports a single family of filters, then treat the
unsupported family in the pseudocode as though it always matches:
.Bd -literal -offset indent
for each packet p:
    for each MAC filter m:
        if m matches p's mac:
            for each VLAN filter v:
                if v matches p's vlan:
                    accept p for group
                    proceed to next packet
    reject packet p
    proceed to next packet
.Ed
.Pp
The following pseudocode summarizes the behavior for a device that
supports a combined MAC address and VLAN filter:
.Bd -literal -offset indent
for each packet p:
    for each filter f:
        if f.mac matches p's mac and f.vlan matches p's vlan:
            accept p for group
            proceed to next packet
    reject packet p
    proceed to next packet
.Ed
.Ss MAC Capability Structure
When the device driver's
.Xr mc_getcapab 9E
function entry point is called with the capability requested set to
.Dv MAC_CAPAB_RINGS ,
then the value of the capability structure is a pointer to a
.Vt mac_capab_rings_t
structure with the following members:
.Bd -literal -offset indent
mac_ring_type_t		mr_type;
mac_group_type_t	mr_group_type;
uint_t			mr_rnum;
uint_t			mr_gnum;
mac_get_ring_t		mr_rget;
mac_get_group_t		mr_gget;
.Ed
.Pp
If the driver supports the
.Dv MAC_CAPAB_RINGS
capability, then it should first check the
.Fa mr_type
member of the structure.
This member has the following possible values:
.Bl -tag -width Dv
.It Dv MAC_RING_TYPE_RX
Indicates that this group is for receive rings.
.It Dv MAC_RING_TYPE_TX
Indicates that this group is for transmit rings.
.El
.Pp
The driver will be asked to fill in this capability structure separately
for receive and transmit groups and rings.
This allows a driver to have different entry points for each type.
If neither of these values is specified, then the device driver must
return
.Dv B_FALSE
from its
.Xr mc_getcapab 9E
entry point.
Once the driver has identified the type, it should fill in the capability
structure based on the following rules:
.Bl -tag -width Fa
.It Fa mr_type
The
.Fa mr_type
member is used to indicate whether this group is for transmit or receive
rings.
The
.Fa mr_type
member should not be modified by the device driver.
It is set by the MAC framework when the driver's
.Xr mc_getcapab 9E
entry point is called.
As indicated above, the driver must check the value to determine which
type of group this
.Xr mc_getcapab 9E
call is referring to.
.It Fa mr_group_type
This member is used to indicate the group type.
This should be set to
.Dv MAC_GROUP_TYPE_STATIC ,
which indicates that the assignment of rings to groups is fixed, and
each ring can only ever belong to one specific group.
The number of rings per group may vary from group to group and can be
set by the driver.
.It Fa mr_rnum
This indicates the total number of rings that are available.
The number exposed may be less than the number supported in hardware.
This is often because the driver was allocated fewer resources, such as
interrupts, than the hardware supports.
.It Fa mr_gnum
This indicates the total number of groups that are available from
hardware.
The number exposed may be less than the number supported in hardware.
This is often because the driver was allocated fewer resources, such as
interrupts, than the hardware supports.
.Pp
When working with transmit rings, this value may be zero.
In this case, each ring is treated independently and separate groups for
each transmit ring are not required.
.It Fa mr_rget
This member is a function pointer that will be called to provide
information about a ring inside of a specific group.
See
.Xr mr_rget 9E
for information on the function, its signature, and responsibilities.
.It Fa mr_gget
This member is a function pointer that will be called to provide
information about a group.
See
.Xr mr_gget 9E
for information on the function, its signature, and responsibilities.
.El
.Sh DRIVER IMPLICATIONS
.Ss MAC Callback Entry Points
When a driver implements the
.Dv MAC_CAPAB_RINGS
capability, then it must not implement some of the traditional MAC
callbacks.
If the driver supports
.Dv MAC_CAPAB_RINGS
for receiving, then it must not implement the
.Xr mc_unicst 9E
entry point.
This is instead handled through the filters that were described earlier.
The filter entry points are defined as part of the
.Xr mac_group_info 9S
structure.
.Pp
If the driver supports
.Dv MAC_CAPAB_RINGS
for transmitting, then it should not implement the
.Xr mc_tx 9E
entry point, as it will not be used.
The MAC framework will instead use the
.Xr mri_tx 9E
entry point that is provided by the driver in the
.Xr mac_ring_info 9S
structure.
.Ss Locking and Concurrency
One of the main points of the
.Dv MAC_CAPAB_RINGS
capability is to increase the amount of parallelism and concurrency in
the driver.
This means that a driver may be asked to transmit, poll, or receive
interrupts on all of its rings in parallel.
This usually calls for fine-grained locking in a driver's own data
structures to ensure that the various rings can be populated and used
without having to block on one another.
In general, most drivers have their own independent set of locks for
each transmit and receive ring.
They also usually have separate locks for each group.
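.Pp
As a rough illustration of the per-ring locking described above, the
following sketch gives each receive ring its own mutex so that work on
one ring never blocks another.
The structure, its members, and the function names are hypothetical and
are not part of the MAC framework; only the
.Xr mutex 9F
routines are existing kernel interfaces, and the header list is an
assumption about where they are declared.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/ksynch.h>
#include <sys/ddi.h>
#include <sys/sunddi.h>

/* Hypothetical per-ring soft state; every name here is illustrative. */
typedef struct example_rx_ring {
	kmutex_t	err_lock;	/* protects the members below */
	boolean_t	err_intr_on;	/* may this ring raise interrupts? */
	uint_t		err_next;	/* next descriptor to examine */
} example_rx_ring_t;

/* Set up one ring; intr_pri is the priority of the ring's interrupt. */
static void
example_rx_ring_init(example_rx_ring_t *rxr, uint_t intr_pri)
{
	mutex_init(&rxr->err_lock, NULL, MUTEX_DRIVER,
	    DDI_INTR_PRI(intr_pri));
	rxr->err_intr_on = B_TRUE;
	rxr->err_next = 0;
}

/*
 * Record that this ring should stop generating interrupts.
 * Only this ring's lock is taken, so other rings are unaffected.
 * A real driver would also mask the ring in hardware here.
 */
static void
example_rx_ring_intr_off(example_rx_ring_t *rxr)
{
	mutex_enter(&rxr->err_lock);
	rxr->err_intr_on = B_FALSE;
	mutex_exit(&rxr->err_lock);
}
.Ed
.Pp
If the interrupt handler and the poll path for a ring both take that
ring's
.Fa err_lock ,
they serialize against each other without contending with any other
ring.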
.Pp
Just because one driver performs locking in one way does not mean that
another driver has to mimic it.
The design of a driver and its locking is often tightly coupled to how
the underlying hardware works and its complexity.
.Ss Polling on rings
When the
.Dv MAC_CAPAB_RINGS
capability is implemented, then additional functionality for receiving
becomes available.
A receive ring has the ability to be polled.
When the operating system desires to begin polling the ring, it will
make a function call into the driver, asking it to receive packets from
this ring.
When receiving packets while polling, the process is generally identical
to that described in the
.Sy Receiving Data
section of
.Xr mac 9E .
For more details, see
.Xr mri_poll 9E .
.Pp
When the MAC framework wants to enable polling, it will first turn off
interrupts through the
.Xr mi_disable 9E
entry point on the driver.
The driver must ensure that there is proper serialization between the
interrupt enablement, interrupt disablement, the interrupt handler for
that ring, and the
.Xr mri_poll 9E
entry point.
For more information on the locking requirements related to polling, see
the discussions in
.Xr mri_poll 9E
and
.Xr mi_disable 9E .
.Ss Updated callback functions
When using rings, two of the primary functions that a driver calls change.
First, the
.Xr mac_rx 9F
function should be replaced with the
.Xr mac_rx_ring 9F
function.
Second, the
.Xr mac_tx_update 9F
function should be replaced with the
.Xr mac_tx_ring_update 9F
function.
.Ss Interrupt and Ring Mapping
Drivers often vary the number of rings that they expose based on the
number of interrupts that exist.
When a driver only supports a single group, there is often no reason to
have more rings than interrupts.
However, most hardware supports a means of having multiple rings tie to
the same interrupt.
Drivers then tie the rings in different groups to the same interrupts
and, when an interrupt is triggered, iterate over all of the rings tied
to it.
.Pp
Tying multiple rings together into a single interrupt should only be done
if hardware has the ability to control whether or not each ring
contributes to the interrupt.
For the
.Xr mi_disable 9E
entry point to work, each ring must be able to independently control
whether or not receipt of a packet generates the shared interrupt.
.Ss Filter Management
As part of general operation, the device driver will be asked to add
various filters to groups.
The MAC framework does not keep track of the assigned filters in a way
that allows it to give them to the driver again after a device reset.
Therefore, it is recommended that the driver keep track of all filters
it has assigned such that they can be reinstated after a driver or
system initiated device reset of some kind.
There is no need to persist anything across a call to
.Xr detach 9E
or similar.
.Pp
For more information, see the
.Sy TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
section of
.Xr mac 9E .
.Ss Broadcast, Multicast, and Promiscuous Mode
Rings and groups are currently designed to emphasize and enhance the
receipt of filtered, unicast frames.
This means that special handling is required when working with broadcast
traffic, multicast traffic, and enabling promiscuous mode.
This only applies to receive groups and rings.
.Pp
By default, only the first group, which has index zero and is sometimes
called the default group, should ever be
programmed to receive broadcast traffic.
This group should always be programmed to receive broadcast traffic, in
the same way that the broader device is programmed to always receive
broadcast traffic when the
.Dv MAC_CAPAB_RINGS
capability has not been negotiated.
.Pp
When multicast addresses are assigned to the device through the
.Xr mc_multicst 9E
entry point, those should also be assigned to the first group.
.Pp
Similarly, when enabling promiscuous mode, the driver should only enable
promiscuous traffic to be received by the first group.
.Pp
No other groups or rings should ever receive broadcast, multicast, or
promiscuous mode traffic.
.Sh SEE ALSO
.Xr mac 9E ,
.Xr mc_getcapab 9E ,
.Xr mc_multicst 9E ,
.Xr mc_tx 9E ,
.Xr mc_unicst 9E ,
.Xr mi_disable 9E ,
.Xr mr_gaddring 9E ,
.Xr mr_gget 9E ,
.Xr mr_gremring 9E ,
.Xr mr_rget 9E ,
.Xr mri_poll 9E ,
.Xr mac_rx 9F ,
.Xr mac_rx_ring 9F ,
.Xr mac_tx_ring_update 9F ,
.Xr mac_tx_update 9F ,
.Xr mac_group_info 9S