netmap.4 (revision 17885a7bfde9d164e45a9833bb172215c55739f9; previous revision 1648bf478e17969be4a9be014d98d9f0e1c61bcd)
.\" Copyright (c) 2011-2014 Matteo Landi, Luigi Rizzo, Universita` di Pisa
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" --- 12 unchanged lines hidden ---
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.\" $FreeBSD$
.\"
.Dd January 4, 2014
.Dt NETMAP 4
.Os
.Sh NAME
.Nm netmap
.Nd a framework for fast packet I/O
.br
.Nm VALE
.Nd a fast VirtuAl Local Ethernet using the netmap API
.Sh SYNOPSIS
.Cd device netmap
.Sh DESCRIPTION
.Nm
is a framework for extremely fast and efficient packet I/O
for both userspace and kernel clients.
It runs on FreeBSD and Linux,
and includes
.Nm VALE ,
a very fast and modular in-kernel software switch/dataplane.
.Pp
.Nm
and
.Nm VALE
are one order of magnitude faster than sockets, bpf or
native switches based on
.Xr tun/tap 4 ,
reaching 14.88 Mpps with much less than one core on a 10 Gbit NIC,
and 20 Mpps per core for VALE ports.
.Pp
Userspace clients can dynamically switch NICs into
.Nm
mode and send and receive raw packets through
memory mapped buffers.
A selectable file descriptor supports
synchronization and blocking I/O.
.Pp
Similarly,
.Nm VALE
can dynamically create switch instances and ports,
providing high speed packet I/O between processes,
virtual machines, NICs and the host stack.
.Pp
For best performance,
.Nm
requires explicit support in device drivers;
however, the
.Nm
API can be emulated on top of unmodified device drivers,
at the price of reduced performance
(but still better than sockets or BPF/pcap).
.Pp
In the rest of this (long) manual page we document
various aspects of the
.Nm
and
.Nm VALE
architecture, features and usage.
.Pp
.Sh ARCHITECTURE
.Nm
supports raw packet I/O through a
.Em port ,
which can be connected to a physical interface
.Em ( NIC ) ,
to the host stack,
or to a
.Nm VALE
switch.
Ports use preallocated circular queues of buffers
.Em ( rings )
residing in an mmapped region.
There is one ring for each transmit/receive queue of a
NIC or virtual port.
An additional ring pair connects to the host stack.
.Pp
After binding a file descriptor to a port, a
.Nm
client can send or receive packets in batches through
the rings, and possibly implement zero-copy forwarding
between ports.
.Pp
All NICs operating in
.Nm
mode use the same memory region,
accessible to all processes who own
.Pa /dev/netmap
file descriptors bound to NICs.
.Nm VALE
ports instead use separate memory regions.
.Pp
.Sh ENTERING AND EXITING NETMAP MODE
Ports and rings are created and controlled through a file descriptor,
created by opening a special device
.Dl fd = open("/dev/netmap");
and then bound to a specific port with an
.Dl ioctl(fd, NIOCREGIF, (struct nmreq *)arg);
.Pp
.Nm
has multiple modes of operation controlled by the
.Vt struct nmreq
argument.
.Va arg.nr_name
specifies the port name, as follows:
.Bl -tag -width XXXX
.It Dv OS network interface name (e.g. 'em0', 'eth1', ... )
the data path of the NIC is disconnected from the host stack,
and the file descriptor is bound to the NIC (one or all queues),
or to the host stack;
.It Dv valeXXX:YYY (arbitrary XXX and YYY)
the file descriptor is bound to port YYY of a VALE switch called XXX,
both dynamically created if necessary.
The string cannot exceed IFNAMSIZ characters, and YYY cannot
be the name of any existing OS network interface.
.El
.Pp
On return,
.Va arg
indicates the size of the shared memory region,
and the number, size and location of all the
.Nm
data structures, which can be accessed by mmapping the memory
.Dl char *mem = mmap(0, arg.nr_memsize, fd);
.Pp
Non-blocking I/O is done with special
.Xr ioctl 2
calls, whereas
.Xr select 2
and
.Xr poll 2
on the file descriptor permit blocking I/O.
.Xr epoll 2
and
.Xr kqueue 2
are not supported on
.Nm
file descriptors.
.Pp
While a NIC is in
.Nm
mode, the OS will still believe the interface is up and running.
OS-generated packets for that NIC end up in a
.Nm
ring, and another ring is used to send packets into the OS network stack.
A
.Xr close 2
on the file descriptor removes the binding,
and returns the NIC to normal mode (reconnecting the data path
to the host stack), or destroys the virtual port.
.Pp
.Sh DATA STRUCTURES
The data structures in the mmapped memory region are detailed in
.Pa sys/net/netmap.h ,
which is the ultimate reference for the
.Nm
API. The main structures and fields are indicated below:
.Bl -tag -width XXX
.It Dv struct netmap_if (one per interface)
.Bd -literal
struct netmap_if {
    ...
    const uint32_t ni_flags;          /* properties     */
    ...
    const uint32_t ni_tx_rings;       /* NIC tx rings   */
    const uint32_t ni_rx_rings;       /* NIC rx rings   */
    const uint32_t ni_extra_tx_rings; /* extra tx rings */
    const uint32_t ni_extra_rx_rings; /* extra rx rings */
    ...
};
.Ed
.Pp
Indicates the number of available rings
.Pa ( struct netmap_rings )
and their position in the mmapped region.
The number of tx and rx rings
.Pa ( ni_tx_rings , ni_rx_rings )
normally depends on the hardware.
NICs also have an extra tx/rx ring pair connected to the host stack.
.Em NIOCREGIF
can request additional tx/rx rings,
to be used between multiple processes/threads
accessing the same
.Nm
port.
.It Dv struct netmap_ring (one per ring)
.Bd -literal
struct netmap_ring {
    ...
    const uint32_t num_slots;   /* slots in each ring            */
    const uint32_t nr_buf_size; /* size of each buffer           */
    ...
    uint32_t       head;        /* (u) first buf owned by user   */
    uint32_t       cur;         /* (u) wakeup position           */
    const uint32_t tail;        /* (k) first buf owned by kernel */
    ...
    uint32_t       flags;
    struct timeval ts;          /* (k) time of last rxsync()     */
    ...
    struct netmap_slot slot[0]; /* array of slots                */
}
.Ed
.Pp
Implements transmit and receive rings, with read/write
pointers, metadata and an array of
.Pa slots
describing the buffers.
.Pp
.It Dv struct netmap_slot (one per buffer)
.Bd -literal
struct netmap_slot {
    uint32_t buf_idx; /* buffer index                 */
    uint16_t len;     /* packet length                */
    uint16_t flags;   /* buf changed, etc.            */
    uint64_t ptr;     /* address for indirect buffers */
};
.Ed
.Pp
Describes a packet buffer, which normally is identified by
an index and resides in the mmapped region.
.It Dv packet buffers
Fixed size (normally 2 KB) packet buffers allocated by the kernel.
.El
.Pp
The offset of the
.Pa struct netmap_if
in the mmapped region is indicated by the
.Pa nr_offset
field in the structure returned by
.Pa NIOCREGIF .
From there, all other objects are reachable through
relative references (offsets or indexes).
Macros and functions in <net/netmap_user.h>
help convert them into actual pointers:
.Pp
.Dl struct netmap_if *nifp = NETMAP_IF(mem, arg.nr_offset);
.Dl struct netmap_ring *txr = NETMAP_TXRING(nifp, ring_index);
.Dl struct netmap_ring *rxr = NETMAP_RXRING(nifp, ring_index);
.Pp
.Dl char *buf = NETMAP_BUF(ring, buffer_index);
.Sh RINGS, BUFFERS AND DATA I/O
.Va Rings
are circular queues of packets with three indexes/pointers
.Va ( head , cur , tail ) ;
one slot is always kept empty.
The ring size
.Va ( num_slots )
should not be assumed to be a power of two.
.br
(NOTE: older versions of netmap used a head/count format to indicate
the content of a ring.)
.Pp
.Va head
is the first slot available to userspace;
.br
.Va cur
is the wakeup point:
select/poll will unblock when
.Va tail
passes
.Va cur ;
.br
.Va tail
is the first slot reserved to the kernel.
.Pp
Slot indexes MUST only move forward;
for convenience, the function
.Dl nm_ring_next(ring, index)
returns the next index modulo the ring size.
.Pp
.Va head
and
.Va cur
are only modified by the user program;
.Va tail
is only modified by the kernel.
The kernel only reads/writes the
.Vt struct netmap_ring
slots and buffers
during the execution of a netmap-related system call.
The only exceptions are slots (and buffers) in the range
.Va tail\ . . . head-1 ,
that are explicitly assigned to the kernel.
.Pp
.Ss TRANSMIT RINGS
On transmit rings, after a
.Nm
system call, slots in the range
.Va head\ . . . tail-1
are available for transmission.
User code should fill the slots sequentially
and advance
.Va head
and
.Va cur
past slots ready to transmit.
.Va cur
may be moved further ahead if the user code needs
more slots before further transmissions (see
.Sx SCATTER GATHER I/O ) .
.Pp
At the next NIOCTXSYNC/select()/poll(),
slots up to
.Va head-1
are pushed to the port, and
.Va tail
may advance if further slots have become available.
Below is an example of the evolution of a TX ring:
.Pp
.Bd -literal
 after the syscall, slots between cur and tail are (a)vailable

        head=cur    tail
            |          |
            v          v
  TX  [.....aaaaaaaaaaa.............]

 user creates new packets to (T)ransmit

             head=cur tail
                 |     |
                 v     v
  TX  [.....TTTTTaaaaaa.............]

 NIOCTXSYNC/poll()/select() sends packets and reports new slots

             head=cur      tail
                 |          |
                 v          v
  TX  [..........aaaaaaaaaaa........]
.Ed
.Pp
select() and poll() will block if there is no space in the ring, i.e.
.Dl ring->cur == ring->tail
and return when new slots have become available.
.Pp
High speed applications may want to amortize the cost of system calls
by preparing as many packets as possible before issuing them.
.Pp
A transmit ring with pending transmissions has
.Dl ring->head != ring->tail + 1 (modulo the ring size).
The function
.Va int nm_tx_pending(ring)
implements this test.
.Pp
.Ss RECEIVE RINGS
On receive rings, after a
.Nm
system call, the slots in the range
.Va head\& . . . tail-1
contain received packets.
User code should process them and advance
.Va head
and
.Va cur
past slots it wants to return to the kernel.
.Va cur
may be moved further ahead if the user code wants to
wait for more packets
without returning all the previous slots to the kernel.
.Pp
At the next NIOCRXSYNC/select()/poll(),
slots up to
.Va head-1
are returned to the kernel for further receives, and
.Va tail
may advance to report new incoming packets.
.br
Below is an example of the evolution of an RX ring:
.Bd -literal
 after the syscall, there are some (h)eld and some (R)eceived slots

       head   cur    tail
         |     |       |
         v     v       v
  RX  [..hhhhhhRRRRRRRR..........]

 user advances head and cur, releasing some slots and holding others

            head cur  tail
              |  |     |
              v  v     v
  RX  [..*****hhhRRRRRR...........]

 NIOCRXSYNC/poll()/select() recovers slots and reports new packets

            head cur        tail
              |  |           |
              v  v           v
  RX  [.......hhhRRRRRRRRRRRR....]
.Ed
.Pp
.Sh SLOTS AND PACKET BUFFERS
Normally, packets should be stored in the netmap-allocated buffers
assigned to slots when ports are bound to a file descriptor.
One packet is fully contained in a single buffer.
.Pp
The following flags affect slot and buffer processing:
.Bl -tag -width XXX
.It NS_BUF_CHANGED
it MUST be used when the buf_idx in the slot is changed.
This can be used to implement
zero-copy forwarding, see
.Sx ZERO-COPY FORWARDING .
.Pp
.It NS_REPORT
reports when this buffer has been transmitted.
Normally,
.Nm
notifies transmit completions in batches, hence signals
can be delayed indefinitely. This flag helps detect
when packets have been sent and a file descriptor can be closed.
.It NS_FORWARD
When a ring is in 'transparent' mode (see
.Sx TRANSPARENT MODE ) ,
packets marked with this flag are forwarded to the other endpoint
at the next system call, thus restoring (in a selective way)
the connection between a NIC and the host stack.
.It NS_NO_LEARN
tells the forwarding code that the SRC MAC address for this
packet must not be used in the learning bridge code.
.It NS_INDIRECT
indicates that the packet's payload is in a user-supplied buffer,
whose user virtual address is in the 'ptr' field of the slot.
The size can reach 65535 bytes.
.br
This is only supported on the transmit ring of
.Nm VALE
ports, and it helps reduce data copies in the interconnection
of virtual machines.
.It NS_MOREFRAG
indicates that the packet continues with subsequent buffers;
the last buffer in a packet must have the flag clear.
.El
.Sh SCATTER GATHER I/O
Packets can span multiple slots if the
.Va NS_MOREFRAG
flag is set in all but the last slot.
The maximum length of a chain is 64 buffers.
This is normally used with
.Nm VALE
ports when connecting virtual machines, as they generate large
TSO segments that are not split unless they reach a physical device.
.Pp
NOTE: The length field always refers to the individual
fragment; there is no field for the total length of a packet.
.Pp
On receive rings the macro
.Va NS_RFRAGS(slot)
indicates the remaining number of slots for this packet,
including the current one.
Slots with a value greater than 1 also have NS_MOREFRAG set.
.Sh IOCTLS
.Nm
uses two ioctls (NIOCTXSYNC, NIOCRXSYNC)
for non-blocking I/O. They take no argument.
Two more ioctls (NIOCGINFO, NIOCREGIF) are used
to query and configure ports, with the following argument:
.Bd -literal
struct nmreq {
    char     nr_name[IFNAMSIZ]; /* (i) port name                  */
    uint32_t nr_version;        /* (i) API version                */
    uint32_t nr_offset;         /* (o) nifp offset in mmap region */
    uint32_t nr_memsize;        /* (o) size of the mmap region    */
    uint32_t nr_tx_slots;       /* (o) slots in tx rings          */
    uint32_t nr_rx_slots;       /* (o) slots in rx rings          */
    uint16_t nr_tx_rings;       /* (o) number of tx rings         */
    uint16_t nr_rx_rings;       /* (o) number of rx rings         */
    uint16_t nr_ringid;         /* (i) ring(s) we care about      */
    uint16_t nr_cmd;            /* (i) special command            */
    uint16_t nr_arg1;           /* (i) extra arguments            */
    uint16_t nr_arg2;           /* (i) extra arguments            */
    ...
};
.Ed
.Pp
A file descriptor obtained through
.Pa /dev/netmap
also supports the ioctls supported by network devices, see
.Xr netintro 4 .
.Pp
.Bl -tag -width XXXX
.It Dv NIOCGINFO
returns EINVAL if the named port does not support netmap.
Otherwise, it returns 0 and (advisory) information
about the port.
Note that all the information below can change before the
interface is actually put in netmap mode.
.Pp
.Bl -tag -width XX
.It Pa nr_memsize
indicates the size of the
.Nm
memory region. NICs in
.Nm
mode all share the same memory region,
whereas
.Nm VALE
ports have independent regions for each port.
.It Pa nr_tx_slots , nr_rx_slots
indicate the size of transmit and receive rings.
.It Pa nr_tx_rings , nr_rx_rings
indicate the number of transmit
and receive rings.
Both ring number and sizes may be configured at runtime
using interface-specific functions (e.g.
.Xr ethtool 8 ) .
.El
.It Dv NIOCREGIF
binds the port named in
.Va nr_name
to the file descriptor. For a physical device this also switches it into
.Nm
mode, disconnecting
it from the host stack.
Multiple file descriptors can be bound to the same port,
with proper synchronization left to the user.
.Pp
On return, it gives the same info as NIOCGINFO, and
.Va nr_ringid
indicates the identity of the rings controlled through the file
descriptor.
.Pp
.Va nr_ringid
selects which rings are controlled through this file descriptor.
Possible values are:
433.Bl -tag -width XXXXX 434.It 0 | 547.Bl -tag -width XXXXX 548.It 0 |
435default, all hardware rings | 549(default) all hardware rings |
436.It NETMAP_SW_RING | 550.It NETMAP_SW_RING |
437the ``host rings'' connecting to the host stack 438.It NETMAP_HW_RING + i 439the i-th hardware ring | 551the ``host rings'', connecting to the host stack. 552.It NETMAP_HW_RING | i 553the i-th hardware ring . |
440.El | 554.El |
555.Pp |
|
By default, a
.Xr poll 2
or
.Xr select 2
call pushes out any pending packets on the transmit ring, even if
no write events are specified.
The feature can be disabled by or-ing
.Va NETMAP_NO_TX_SYNC
to the value written to
.Va nr_ringid .
When this feature is used,
packets are transmitted only when
.Dv NIOCTXSYNC
is invoked through
.Xr ioctl 2 ,
or when
.Xr select 2
or
.Xr poll 2
are called with a write event (POLLOUT/wfdset) or with a full ring.
.Pp
When registering a virtual interface that is dynamically created to a
.Xr vale 4
switch, the desired number of rings (1 by default,
and currently up to 16) can be specified through the
.Va nr_tx_rings
and
.Va nr_rx_rings
fields.
.It Dv NIOCTXSYNC
tells the hardware of new packets to transmit, and updates the
number of slots available for transmission.
.It Dv NIOCRXSYNC
tells the hardware of consumed packets, and asks for newly available
packets.
.El
.Sh SELECT AND POLL
.Xr select 2
and
.Xr poll 2
on a
.Nm
file descriptor process rings as indicated in
.Sx TRANSMIT RINGS
and
.Sx RECEIVE RINGS
when write (POLLOUT) and read (POLLIN) events are requested.
.Pp
Both block if no slots are available in the ring
.Pq Va ring->cur == ring->tail .
.Pp
Packets in transmit rings are normally pushed out even without
requesting write events.
Passing the
.Dv NETMAP_NO_TX_SYNC
flag to
.Dv NIOCREGIF
disables this feature.
.Sh LIBRARIES
The
.Nm
API is designed to be used directly, both because of its simplicity and
for efficient integration with applications.
.Pp
For convenience, the
.Va <net/netmap_user.h>
header provides a few macros and functions to ease creating
a file descriptor and doing I/O with a
.Nm
port.
These are loosely modeled after the
.Xr pcap 3
API, to ease porting of libpcap-based applications to
.Nm .
To use these extra functions, programs should
.Dl #define NETMAP_WITH_LIBS
before
.Dl #include <net/netmap_user.h>
.Pp
The following functions are available:
.Bl -tag -width XXXXX
.It Va struct nm_desc_t * nm_open(const char *ifname, const char *ring_name, int flags, int ring_flags)
similar to pcap_open(), binds a file descriptor to a port.
.Bl -tag -width XX
.It Va ifname
is a port name, in the form "netmap:XXX" for a NIC and "valeXXX:YYY" for a
.Nm VALE
port.
.It Va flags
can be set to
.Dv NETMAP_SW_RING
to bind to the host ring pair,
or to
.Dv NETMAP_HW_RING
to bind to a specific ring.
With
.Dv NETMAP_HW_RING ,
.Va ring_name
is interpreted as a string or an integer indicating the ring to use.
.It Va ring_flags
is copied directly into the ring flags, to specify additional parameters
such as NR_TIMESTAMP or NR_FORWARD.
.El
.It Va int nm_close(struct nm_desc_t *d)
closes the file descriptor, unmaps memory, frees resources.
.It Va int nm_inject(struct nm_desc_t *d, const void *buf, size_t size)
similar to pcap_inject(), pushes a packet to a ring, returns the size
of the packet if successful, or 0 on error;
.It Va int nm_dispatch(struct nm_desc_t *d, int cnt, nm_cb_t cb, u_char *arg)
similar to pcap_dispatch(), applies a callback to incoming packets
.It Va u_char * nm_nextpkt(struct nm_desc_t *d, struct nm_hdr_t *hdr)
similar to pcap_next(), fetches the next packet
.El
.Sh SUPPORTED DEVICES
.Nm
natively supports the following devices:
.Pp
On FreeBSD:
.Xr em 4 ,
.Xr igb 4 ,
.Xr ixgbe 4 ,
.Xr lem 4 ,
.Xr re 4 .
.Pp
On Linux:
.Xr e1000 4 ,
.Xr e1000e 4 ,
.Xr igb 4 ,
.Xr ixgbe 4 ,
.Xr mlx4 4 ,
.Xr forcedeth 4 ,
.Xr r8169 4 .
.Pp
NICs without native support can still be used in
.Nm
mode through emulation.
Performance is inferior to native netmap
mode but still significantly higher than sockets, and approaching
that of in-kernel solutions such as Linux's pktgen.
.Pp
Emulation is also available for devices with native netmap support,
which can be used for testing or performance comparison.
The sysctl variable
.Va dev.netmap.admode
globally controls how netmap mode is implemented.
.Sh SYSCTL VARIABLES AND MODULE PARAMETERS
Some aspects of the operation of
.Nm
are controlled through sysctl variables on FreeBSD
.Em ( dev.netmap.* )
and module parameters on Linux
.Em ( /sys/module/netmap_lin/parameters/* ) :
.Pp
.Bl -tag -width indent
.It Va dev.netmap.admode: 0
Controls the use of native or emulated adapter mode.
0 uses the best available option, 1 forces native and
fails if not available, 2 forces emulated hence never fails.
.It Va dev.netmap.generic_ringsize: 1024
Ring size used for emulated netmap mode
.It Va dev.netmap.generic_mit: 100000
Controls interrupt moderation for emulated mode
.It Va dev.netmap.mmap_unreg: 0
.It Va dev.netmap.fwd: 0
Forces NS_FORWARD mode
.It Va dev.netmap.flags: 0
.It Va dev.netmap.txsync_retry: 2
.It Va dev.netmap.no_pendintr: 1
Forces recovery of transmit buffers on system calls
.It Va dev.netmap.mitigate: 1
Propagates interrupt mitigation to user processes
.It Va dev.netmap.no_timestamp: 0
Disables the update of the timestamp in the netmap ring
.It Va dev.netmap.verbose: 0
Verbose kernel messages
.It Va dev.netmap.buf_num: 163840
.It Va dev.netmap.buf_size: 2048
.It Va dev.netmap.ring_num: 200
.It Va dev.netmap.ring_size: 36864
.It Va dev.netmap.if_num: 100
.It Va dev.netmap.if_size: 1024
Sizes and number of objects (netmap_if, netmap_ring, buffers)
for the global memory region.
The only parameter worth modifying is
.Va dev.netmap.buf_num
as it impacts the total amount of memory used by netmap.
.It Va dev.netmap.buf_curr_num: 0
.It Va dev.netmap.buf_curr_size: 0
.It Va dev.netmap.ring_curr_num: 0
.It Va dev.netmap.ring_curr_size: 0
.It Va dev.netmap.if_curr_num: 0
.It Va dev.netmap.if_curr_size: 0
Actual values in use.
.It Va dev.netmap.bridge_batch: 1024
Batch size used when moving packets across a
.Nm VALE
switch.
Values above 64 generally guarantee good
performance.
.El
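As a purely illustrative configuration fragment (the values below are
examples, not recommendations), the memory-related knobs can be
adjusted like any other sysctl on FreeBSD, or through the module
parameter files on Linux:

```shell
# FreeBSD: dev.netmap.buf_num dominates netmap's total memory use;
# values here are illustrative only.
sysctl dev.netmap.buf_num=65536
sysctl dev.netmap.bridge_batch=128   # keep above ~64 for good VALE throughput

# Linux: the same knobs appear as module parameters.
echo 65536 > /sys/module/netmap_lin/parameters/buf_num
```

Both forms require root privileges and an already loaded netmap module.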
.Sh SYSTEM CALLS
.Nm
uses
.Xr select 2
and
.Xr poll 2
to wake up processes when significant events occur, and
.Xr mmap 2
to map memory.
.Xr ioctl 2
is used to configure ports and
.Nm VALE
switches.
.Pp
Applications may need to create threads and bind them to
specific cores to improve performance, using standard
OS primitives, see
.Xr pthread 3 .
In particular,
.Xr pthread_setaffinity_np 3
may be of use.
.Sh CAVEATS
No matter how fast the CPU and OS are,
achieving line rate on 10G and faster interfaces
requires hardware with sufficient performance.
Several NICs are unable to sustain line rate with
small packet sizes.
Insufficient PCIe or memory bandwidth
can also cause reduced performance.
.Pp
Another frequent reason for low performance is the use
of flow control on the link: a slow receiver can limit
the transmit speed.
Be sure to disable flow control when running high
speed experiments.
.Ss SPECIAL NIC FEATURES
.Nm
is orthogonal to some NIC features such as
multiqueue, schedulers, packet filters.
.Pp
Multiple transmit and receive rings are supported natively
and can be configured with ordinary OS tools,
such as
.Xr ethtool 8
or
device-specific sysctl variables.
The same goes for Receive Packet Steering (RPS)
and filtering of incoming traffic.
.Pp
.Nm
.Em does not use
features such as
.Em checksum offloading , TCP segmentation offloading ,
.Em encryption , VLAN encapsulation/decapsulation ,
etc.
When using netmap to exchange packets with the host stack,
make sure to disable these features.
.Sh EXAMPLES
.Ss TEST PROGRAMS
.Nm
comes with a few programs that can be used for testing or
simple applications.
See the
.Va examples/
directory in
.Nm
distributions, or the
.Va tools/tools/netmap/
directory in FreeBSD distributions.
.Pp
.Cm pkt-gen
is a general purpose traffic source/sink.
.Pp
As an example
.Dl pkt-gen -i ix0 -f tx -l 60
can generate an infinite stream of minimum size packets, and
.Dl pkt-gen -i ix0 -f rx
is a traffic sink.
Both print traffic statistics, to help monitor
how the system performs.
.Pp
.Cm pkt-gen
has many options that can be used to set packet sizes, addresses,
rates, and use multiple send/receive threads and cores.
.Pp
.Cm bridge
is another test program which interconnects two
.Nm
ports.
It can be used for transparent forwarding between
interfaces, as in
.Dl bridge -i ix0 -i ix1
or even connect the NIC to the host stack using netmap
.Dl bridge -i ix0 -i ix0
.Ss USING THE NATIVE API
The following code implements a traffic generator.
.Pp
.Bd -literal -compact
#include <net/netmap_user.h>
\&...
void sender(void)
{
	struct netmap_if *nifp;
	struct netmap_ring *ring;
	struct nmreq nmr;
	struct pollfd fds;
	void *p;
	char *buf;
	int fd, i;

	fd = open("/dev/netmap", O_RDWR);
	bzero(&nmr, sizeof(nmr));
	strcpy(nmr.nr_name, "ix0");
	nmr.nr_version = NETMAP_API;
	ioctl(fd, NIOCREGIF, &nmr);
	p = mmap(0, nmr.nr_memsize, PROT_READ | PROT_WRITE,
	    MAP_SHARED, fd, 0);
	nifp = NETMAP_IF(p, nmr.nr_offset);
	ring = NETMAP_TXRING(nifp, 0);
	fds.fd = fd;
	fds.events = POLLOUT;
	for (;;) {
		poll(&fds, 1, -1);
		while (!nm_ring_empty(ring)) {
			i = ring->cur;
			buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
			... prepare packet in buf ...
			ring->slot[i].len = ... packet length ...
			ring->head = ring->cur = nm_ring_next(ring, i);
		}
	}
}
.Ed
.Ss HELPER FUNCTIONS
A simple receiver can be implemented using the helper functions:
.Bd -literal -compact
#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>
\&...
void receiver(void)
{
	struct nm_desc_t *d;
	struct pollfd fds;
	u_char *buf;
	struct nm_hdr_t h;
	...
	d = nm_open("netmap:ix0", NULL, 0, 0);
	fds.fd = NETMAP_FD(d);
	fds.events = POLLIN;
	for (;;) {
		poll(&fds, 1, -1);
		while ( (buf = nm_nextpkt(d, &h)) )
			consume_pkt(buf, h.len);
	}
	nm_close(d);
}
.Ed
.Ss ZERO-COPY FORWARDING
Since physical interfaces share the same memory region,
it is possible to do packet forwarding between ports
by swapping buffers.
The buffer from the transmit ring is used
to replenish the receive ring:
.Bd -literal -compact
	uint32_t tmp;
	struct netmap_slot *src, *dst;
	...
	src = &rxr->slot[rxr->cur];
	dst = &txr->slot[txr->cur];
	tmp = dst->buf_idx;
	dst->buf_idx = src->buf_idx;
	dst->len = src->len;
	dst->flags = NS_BUF_CHANGED;
	src->buf_idx = tmp;
	src->flags = NS_BUF_CHANGED;
	rxr->head = rxr->cur = nm_ring_next(rxr, rxr->cur);
	txr->head = txr->cur = nm_ring_next(txr, txr->cur);
	...
.Ed
.Ss ACCESSING THE HOST STACK
The ``host rings'' can be bound by passing
.Dv NETMAP_SW_RING
in
.Va nr_ringid
to
.Dv NIOCREGIF ,
giving access to packets to and from the host stack.
.Ss VALE SWITCH
A simple way to test the performance of a
.Nm VALE
switch is to attach a sender and a receiver to it,
e.g. running the following in two different terminals:
.Dl pkt-gen -i vale1:a -f rx # receiver
.Dl pkt-gen -i vale1:b -f tx # sender
.Pp
The following command attaches an interface and the host stack
to a switch:
.Dl vale-ctl -h vale2:em0
Other
.Nm
clients attached to the same switch can now communicate
with the network card or the host.
.Sh SEE ALSO
http://info.iet.unipi.it/~luigi/netmap/
.Pp
Luigi Rizzo, Revisiting network I/O APIs: the netmap framework,
Communications of the ACM, 55 (3), pp.45-51, March 2012
.Pp
Luigi Rizzo, netmap: a novel framework for fast packet I/O,
Usenix ATC'12, June 2012, Boston
.An Giuseppe Lettieri ,
.An Vincenzo Maffione .
.Pp
.Nm
and
.Nm VALE
have been funded by the European Commission within FP7 Projects
CHANGE (257422) and OPENLAB (287581).
.Ss SPECIAL MODES
When the device name has the form
.Dl valeXXX:ifname
(where ifname is an existing network interface),
the physical interface
(and optionally the corresponding host stack endpoint)
is connected to or disconnected from the
.Nm VALE
switch named XXX.
In this case the
.Pa ioctl()
is used only for configuration, typically through the
.Cm vale-ctl
command.
The file descriptor cannot be used for I/O, and should be
closed after issuing the
.Pa ioctl() .
|