.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright (c) 2017, Joyent, Inc.
.\" Copyright 2022 Oxide Computer Company
.\"
.Dd July 2, 2022
.Dt MAC_CAPAB_RINGS 9E
.Os
.Sh NAME
.Nm mac_capab_rings
.Nd MAC ring capability
.Sh SYNOPSIS
.In sys/mac_provider.h
.Vt typedef struct mac_capab_rings_s mac_capab_rings_t;
.Sh INTERFACE LEVEL
.Sy Uncommitted -
This interface is still evolving.
API and ABI stability is not guaranteed.
.Sh DESCRIPTION
The
.Sy MAC_CAPAB_RINGS
capability provides a means for device drivers to take advantage of
the additional resources offered by hardware beyond the basic operations
to transmit and receive.
There are two primary concepts that this MAC capability relies on: rings
and groups.
.Pp
The
.Em ring
is an abstract concept which must be mapped to some hardware construct by
the driver.
It typically takes the form of a DMA memory region which is divided
into many smaller units, called descriptors or entries.
Each entry in the ring describes a location in memory of a packet, which the
hardware is to read from
.Pq to transmit it
or write to
.Pq upon reception .
Entries also typically contain metadata and attributes about the packet.
These entries are typically arranged in a fixed-size circular buffer
.Po hence the
.Dq ring
name
.Pc
which is shared between the operating system and the
hardware via the DMA-backed memory.
Most NICs, regardless of their support for this capability, use something
resembling a descriptor ring under the hood.
Some vendors may also refer to rings as
.Em queues .
The ring concept is intentionally general, so that more unusual underlying
hardware constructs can also be used to implement it.
.Pp
A collection of one or more rings is called a
.Em group .
Each group usually has a collection of filters that can be associated
with it.
These filters are usually defined in terms of matching something like a
MAC address, VLAN, or Ethertype, though more complex filters may exist
in hardware.
When a packet matches a filter, it will then be directed to the group
and eventually delivered to one of the rings in the group.
.Pp
In the MAC framework, rings and groups are separated into categories
based on their purpose: transmitting and receiving.
While the MAC framework thinks of transmit and receive rings as
different physical constructs, they may map to the same underlying
resources in the hardware.
The device driver may implement the
.Dv MAC_CAPAB_RINGS
capability for transmitting, receiving, or both.
.Ss Mapping Hardware to Rings and Groups
There are many different ways that hardware resources may map to this
capability.
Consider the following examples:
.Bl -enum
.It
Hardware may support a feature commonly known as receive side scaling
.Pq RSS .
With RSS, the hardware has multiple rings and uses a hash function
calculated over packet headers to choose which ring receives a
particular packet.
Rings are associated with different interrupts, allowing multiple rings
to be processed in parallel.
Supporting RSS in isolation would result in a device which has a single
group, and multiple rings within that group.
.It
Some hardware may have a single ring, but still support multiple receive
filters.
This is commonly seen with some 1 GbE devices.
While the hardware only has one ring, it has support for multiple
independent MAC address filters, each of which can be programmed to
receive traffic for a single MAC address.
The driver should map this situation to a single group with a single
ring.
However, it would implement the ability to program several filters.
While this may not seem useful at first, when virtual NICs are created
on top of a physical NIC, the additional hardware filters will be used
to avoid putting the device in promiscuous mode.
.It
Finally, some hardware has many rings, which can be placed in many
different groups.
Each group has its own filtering capabilities.
For such hardware, the device driver would declare support for multiple
groups, each of which has its own independent set of rings.
.El
.Pp
When choosing hardware constructs to implement rings and groups, it is
also important to consider interrupts.
In order to support polling, each receive ring must be able to
independently toggle whether that ring will generate an interrupt on
packet reception, even when many rings share the same hardware level
interrupt
.Pq e.g. the same MSI or MSI-X interrupt number and handler .
.Ss Filters
The
.Xr mac_group_info 9S
structure is used to define several different kinds of filters that the
group might implement.
There are three different classes of filters that exist:
.Bl -tag -width Ds
.It Sy MAC Address
A given frame matches a MAC Address filter if the destination address in
the Ethernet header matches the specified MAC address.
.It Sy VLAN
A given frame matches a VLAN filter if it both has an 802.1Q VLAN tag
and that tag matches the VLAN number specified in the filter.
If the frame's outer ethertype is not 0x8100, then the filter will not
match.
.It Sy MAC Address and VLAN
A given frame matches a MAC Address and VLAN filter if it matches both
the specified MAC address and the specified VLAN.
This is constructed as a logical AND of the previous two filters.
If only one of the two matches, then the frame does not match this
filter.
.Pp
Note: this filter type is still under development and has not been
plumbed through our APIs yet.
.El
.Pp
Devices may support many different filter types.
If the hardware resources required for a combined filter type
.Pq e.g. MAC Address and VLAN
are similar to the resources required for each in isolation, drivers
should prefer to implement just the combined type and should not
implement the individual types.
.Pp
The MAC framework assumes that the following rules hold regarding
filters:
.Bl -enum
.It
When there are multiple filters of the same kind with different
addresses, then the hardware will accept a frame if it matches
.Em ANY
of the specified filters.
In other words, if there are two VLAN filters defined, one for VLAN 23
and one for VLAN 42, then if a frame has either VLAN 23 or VLAN 42,
it will be accepted for the group.
.It
If multiple different classes of filters are defined, then the hardware
should only accept a frame if it passes
.Em ALL
of the filter classes.
For example, if there is a MAC address filter and a separate VLAN
filter, the hardware will only accept the frame if it passes both sets
of filters.
.It
If there are multiple different classes of filters and there are
multiple filters present in each class, then the driver will accept a
packet as long as it matches
.Em ALL
filter classes.
However, within a given filter class, it may match
.Em ANY
of the filters.
See the following boolean logic as an alternative way to phrase this
case:
.Bd -literal -offset indent
match = MAC && VLAN
MAC = 00:11:22:33:44:55 OR 00:66:77:88:99:aa OR ...
VLAN = 11 OR 12 OR ...
.Ed
.El
.Pp
The following pseudocode summarizes the behavior for a device that
supports independent MAC and VLAN filters.
If the hardware only supports a single family of filters, then simply
treat the unsupported family in the pseudocode as though it always
matches:
.Bd -literal -offset indent
for each packet p:
    for each MAC filter m:
        if m matches p's mac:
            for each VLAN filter v:
                if v matches p's vlan:
                    accept p for group
                    proceed to next packet
    reject packet p
    proceed to next packet
.Ed
.Pp
The following pseudocode summarizes the behavior for a device that
supports a combined MAC address and VLAN filter:
.Bd -literal -offset indent
for each packet p:
    for each filter f:
        if f.mac matches p's mac and f.vlan matches p's vlan:
            accept p for group
            proceed to next packet
    reject packet p
    proceed to next packet
.Ed
.Ss MAC Capability Structure
When the device driver's
.Xr mc_getcapab 9E
function entry point is called with the capability requested set to
.Dv MAC_CAPAB_RINGS ,
then the value of the capability structure is a pointer to a
.Vt mac_capab_rings_t
structure with the following members:
.Bd -literal -offset indent
mac_ring_type_t		mr_type;
mac_group_type_t	mr_group_type;
uint_t			mr_rnum;
uint_t			mr_gnum;
mac_get_ring_t		mr_rget;
mac_get_group_t		mr_gget;
.Ed
.Pp
If the driver supports the
.Dv MAC_CAPAB_RINGS
capability, then it should first check the
.Fa mr_type
member of the structure.
This member has the following possible values:
.Bl -tag -width Dv
.It Dv MAC_RING_TYPE_RX
Indicates that this group is for receive rings.
.It Dv MAC_RING_TYPE_TX
Indicates that this group is for transmit rings.
.El
.Pp
The driver will be asked to fill in this capability structure separately
for receive and transmit groups and rings.
This allows a driver to have different entry points for each type.
If neither of these values is specified, then the device driver must
return
.Dv B_FALSE
from its
.Xr mc_getcapab 9E
entry point.
Once the driver has identified the type, it should fill in the capability
structure based on the following rules:
.Bl -tag -width Fa
.It Fa mr_type
The
.Fa mr_type
member is used to indicate whether this group is for transmit or receive
rings.
The
.Fa mr_type
member should not be modified by the device driver.
It is set by the MAC framework when the driver's
.Xr mc_getcapab 9E
entry point is called.
As indicated above, the driver must check the value to determine which
group this
.Xr mc_getcapab 9E
call is referring to.
.It Fa mr_group_type
This member is used to indicate the group type.
This should be set to
.Dv MAC_GROUP_TYPE_STATIC ,
which indicates that the assignment of rings to groups is fixed, and
each ring can only ever belong to one specific group.
The number of rings per group may vary from group to group and can be
set by the driver.
.It Fa mr_rnum
This indicates the total number of rings that are available.
The number exposed may be less than the number supported in hardware.
This is often because the driver was allocated fewer resources, such as
interrupts, than the hardware supports.
.It Fa mr_gnum
This indicates the total number of groups that are available from
hardware.
The number exposed may be less than the number supported in hardware.
This is often because the driver was allocated fewer resources, such as
interrupts, than the hardware supports.
.Pp
When working with transmit rings, this value may be zero.
In this case, each ring is treated independently and separate groups for
each transmit ring are not required.
.It Fa mr_rget
This member is a function pointer that will be called to provide
information about a ring inside of a specific group.
See
.Xr mr_rget 9E
for information on the function, its signature, and responsibilities.
.It Fa mr_gget
This member is a function pointer that will be called to provide
information about a group.
See
.Xr mr_gget 9E
for information on the function, its signature, and responsibilities.
.El
.Sh DRIVER IMPLICATIONS
.Ss MAC Callback Entry Points
When a driver implements the
.Dv MAC_CAPAB_RINGS
capability, then it must not implement some of the traditional MAC
callbacks.
If the driver supports
.Dv MAC_CAPAB_RINGS
for receiving, then it must not implement the
.Xr mc_unicst 9E
entry point.
This is instead handled through the filters that were described earlier.
The filter entry points are defined as part of the
.Xr mac_group_info 9S
structure.
.Pp
If the driver supports
.Dv MAC_CAPAB_RINGS
for transmitting, then it should not implement the
.Xr mc_tx 9E
entry point; it will not be used.
The MAC framework will instead use the
.Xr mri_tx 9E
entry point that is provided by the driver in the
.Xr mac_ring_info 9S
structure.
.Ss Locking and Concurrency
One of the main points of the
.Dv MAC_CAPAB_RINGS
capability is to increase the parallelism and concurrency that is
actively going on in the driver.
This means that a driver may be asked to transmit, poll, or receive
interrupts on all of its rings in parallel.
This usually calls for fine-grained locking in a driver's own data
structures to ensure that the various rings can be populated and used
without having to block on one another.
In general, most drivers have their own independent set of locks for
each transmit and receive ring.
They also usually have separate locks for each group.
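.Pp
As a rough illustration of per-ring locking, the following user-level
sketch gives every receive ring its own lock, so that work on one ring
never blocks work on another.
It is not taken from any real driver: the
.Vt ex_ring_t
and
.Vt ex_dev_t
types, the
.Dv EX_NRINGS
constant, and both functions are hypothetical, and
.Vt pthread_mutex_t
stands in for the kernel mutex that a driver would actually use.
.Bd -literal -offset indent
```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define	EX_NRINGS	4	/* hypothetical number of receive rings */

/* Hypothetical per-ring state; each ring carries its own lock. */
typedef struct ex_ring {
	pthread_mutex_t	er_lock;	/* protects only this ring */
	uint64_t	er_packets;	/* packets seen on this ring */
} ex_ring_t;

/* Hypothetical per-device state holding all receive rings. */
typedef struct ex_dev {
	ex_ring_t	ed_rx[EX_NRINGS];
} ex_dev_t;

/* Initialize every ring with an independent lock and a zero count. */
void
ex_dev_init(ex_dev_t *dev)
{
	for (int i = 0; i < EX_NRINGS; i++) {
		pthread_mutex_init(&dev->ed_rx[i].er_lock, NULL);
		dev->ed_rx[i].er_packets = 0;
	}
}

/*
 * Record npkts packets on one ring while holding only that ring's
 * lock, and return the ring's running total.
 */
uint64_t
ex_ring_account(ex_dev_t *dev, int ring, uint64_t npkts)
{
	ex_ring_t *r = &dev->ed_rx[ring];
	uint64_t total;

	pthread_mutex_lock(&r->er_lock);
	r->er_packets += npkts;
	total = r->er_packets;
	pthread_mutex_unlock(&r->er_lock);
	return (total);
}
```
.Ed
.Pp
Because no lock is shared between rings in this sketch, two threads that
service different rings never contend with each other.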
.Pp
The fact that one driver performs locking in a particular way does not
mean that another driver has to mimic it.
The design of a driver and its locking is often tightly coupled to how
the underlying hardware works and its complexity.
.Ss Polling on rings
When the
.Dv MAC_CAPAB_RINGS
capability is implemented, then additional functionality for receiving
becomes available.
A receive ring has the ability to be polled.
When the operating system desires to begin polling the ring, it will
make a function call into the driver, asking it to receive packets from
this ring.
When receiving packets while polling, the process is generally identical
to that described in the
.Sy Receiving Data
section of
.Xr mac 9E .
For more details, see
.Xr mri_poll 9E .
.Pp
When the MAC framework wants to enable polling, it will first turn off
interrupts through the
.Xr mi_disable 9E
entry point on the driver.
The driver must ensure that there is proper serialization between the
interrupt enablement, interrupt disablement, the interrupt handler for
that ring, and the
.Xr mri_poll 9E
entry point.
For more information on the locking requirements related to polling, see
the discussions in
.Xr mri_poll 9E
and
.Xr mi_disable 9E .
.Ss Updated callback functions
When using rings, two of the primary functions that a driver calls are
replaced.
First, the
.Xr mac_rx 9F
function should be replaced with the
.Xr mac_ring_rx 9F
function.
Secondly, the
.Xr mac_tx_update 9F
function should be replaced with the
.Xr mac_tx_ring_update 9F
function.
.Ss Interrupt and Ring Mapping
Drivers often vary the number of rings that they expose based on the
number of interrupts that exist.
When a driver only supports a single group, there is often no reason to
have more rings than interrupts.
However, most hardware supports a means of having multiple rings tie to
the same interrupt.
Drivers then tie the rings in different groups to the same interrupts
and therefore, when an interrupt is triggered, iterate over all of the
rings.
.Pp
Tying multiple rings together into a single interrupt should only be done
if hardware has the ability to control whether or not each ring
contributes to the interrupt.
For the
.Xr mi_disable 9E
entry point to work, each ring must be able to independently control
whether or not receipt of a packet generates the shared interrupt.
.Ss Filter Management
As part of general operation, the device driver will be asked to add
various filters to groups.
The MAC framework does not keep track of the assigned filters in such a
way that they will be given to the driver again after a device reset.
Therefore, it is recommended that the driver keep track of all filters
it has assigned such that they can be reinstated after a driver or
system initiated device reset of some kind.
There is no need to persist anything across a call to
.Xr detach 9E
or similar.
.Pp
For more information, see the
.Sy TX STALL DETECTION, DEVICE RESETS, AND FAULT MANAGEMENT
section of
.Xr mac 9E .
.Ss Broadcast, Multicast, and Promiscuous Mode
Rings and groups are currently designed to emphasize and enhance the
receipt of filtered, unicast frames.
This means that special handling is required when working with broadcast
traffic, multicast traffic, and enabling promiscuous mode.
This only applies to receive groups and rings.
.Pp
By default, only the first group, which has index zero and is sometimes
called the default group, should ever be programmed to receive broadcast
traffic.
This group should always be programmed to receive broadcast traffic, the
same way that the broader device is programmed to always receive
broadcast traffic when the
.Dv MAC_CAPAB_RINGS
capability has not been negotiated.
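.Pp
The rule for the default group can be expressed as a small predicate.
The sketch below is purely illustrative: the
.Vt ex_traffic_t
type, the
.Dv EX_DEFAULT_GROUP
constant, and the
.Fn ex_group_may_receive
function are hypothetical names, not part of the MAC framework.
It treats multicast and promiscuous traffic in the same way as
broadcast traffic, as this section requires.
.Bd -literal -offset indent
```c
#include <stdbool.h>

/* Hypothetical traffic classes, used only for this illustration. */
typedef enum ex_traffic {
	EX_UNICAST_FILTERED,	/* matched a unicast filter on a group */
	EX_BROADCAST,
	EX_MULTICAST,
	EX_PROMISCUOUS
} ex_traffic_t;

#define	EX_DEFAULT_GROUP	0	/* first group, index zero */

/*
 * Return true if a receive group with the given index may be
 * programmed to receive the given class of traffic.  Filtered
 * unicast traffic may go to any group; everything else is limited
 * to the default group.
 */
bool
ex_group_may_receive(unsigned int group, ex_traffic_t kind)
{
	if (kind == EX_UNICAST_FILTERED)
		return (true);
	return (group == EX_DEFAULT_GROUP);
}
```
.Ed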
.Pp
When multicast addresses are assigned to the device through the
.Xr mc_multicst 9E
entry point, those should also be assigned to the first group.
.Pp
Similarly, when enabling promiscuous mode, the driver should only enable
promiscuous traffic to be received by the first group.
.Pp
No other groups or rings should ever receive broadcast, multicast, or
promiscuous mode traffic.
.Sh SEE ALSO
.Xr mac 9E ,
.Xr mc_getcapab 9E ,
.Xr mc_multicst 9E ,
.Xr mc_tx 9E ,
.Xr mc_unicst 9E ,
.Xr mi_disable 9E ,
.Xr mr_gaddring 9E ,
.Xr mr_gget 9E ,
.Xr mr_gremring 9E ,
.Xr mr_rget 9E ,
.Xr mri_poll 9E ,
.Xr mac_ring_rx 9F ,
.Xr mac_rx 9F ,
.Xr mac_tx_ring_update 9F ,
.Xr mac_tx_update 9F ,
.Xr mac_group_info 9S