.\" Copyright (c) 2011-2014 Matteo Landi, Luigi Rizzo, Universita` di Pisa
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.\" $FreeBSD$
.\"
.Dd February 13, 2014
.Dt NETMAP 4
.Os
.Sh NAME
.Nm netmap
.Nd a framework for fast packet I/O
.br
.Nm VALE
.Nd a fast VirtuAl Local Ethernet using the netmap API
.br
.Nm netmap pipes
.Nd a shared memory packet transport channel
.Sh SYNOPSIS
.Cd device netmap
.Sh DESCRIPTION
.Nm
is a framework for extremely fast and efficient packet I/O
for both userspace and kernel clients.
It runs on FreeBSD and Linux,
and includes
.Nm VALE ,
a very fast and modular in-kernel software switch/dataplane,
and
.Nm netmap pipes ,
a shared memory packet transport channel.
All these are accessed interchangeably with the same API.
.Pp
.Nm , VALE
and
.Nm netmap pipes
are at least one order of magnitude faster than
standard OS mechanisms
(sockets, bpf, tun/tap interfaces, native switches, pipes),
reaching 14.88 million packets per second (Mpps)
with much less than one core on a 10 Gbit NIC,
about 20 Mpps per core for VALE ports,
and over 100 Mpps for netmap pipes.
.Pp
Userspace clients can dynamically switch NICs into
.Nm
mode and send and receive raw packets through
memory mapped buffers.
Similarly,
.Nm VALE
switch instances and ports, and
.Nm netmap pipes
can be created dynamically,
providing high speed packet I/O between processes,
virtual machines, NICs and the host stack.
.Pp
.Nm
supports both non-blocking I/O through
.Xr ioctl 2
and synchronization and blocking I/O through a file descriptor
and standard OS mechanisms such as
.Xr select 2 ,
.Xr poll 2 ,
.Xr epoll 2 ,
.Xr kqueue 2 .
.Nm VALE
and
.Nm netmap pipes
are implemented by a single kernel module, which also emulates the
.Nm
API over standard drivers for devices without native
.Nm
support.
For best performance,
.Nm
requires explicit support in device drivers.
.Pp
In the rest of this (long) manual page we document
various aspects of the
.Nm
and
.Nm VALE
architecture, features and usage.
.Sh ARCHITECTURE
.Nm
supports raw packet I/O through a
.Em port ,
which can be connected to a physical interface
.Em ( NIC ) ,
to the host stack,
or to a
.Nm VALE
switch.
Ports use preallocated circular queues of buffers
.Em ( rings )
residing in an mmapped region.
There is one ring for each transmit/receive queue of a
NIC or virtual port.
An additional ring pair connects to the host stack.
.Pp
After binding a file descriptor to a port, a
.Nm
client can send or receive packets in batches through
the rings, and possibly implement zero-copy forwarding
between ports.
.Pp
All NICs operating in
.Nm
mode use the same memory region,
accessible to all processes that own
.Pa /dev/netmap
file descriptors bound to NICs.
Independent
.Nm VALE
and
.Nm netmap pipe
ports
by default use separate memory regions,
but can be independently configured to share memory.
.Sh ENTERING AND EXITING NETMAP MODE
The following section describes the system calls to create
and control
.Nm netmap
ports (including
.Nm VALE
and
.Nm netmap pipe
ports).
Simpler, higher level functions are described in section
.Sx LIBRARIES .
.Pp
Ports and rings are created and controlled through a file descriptor,
created by opening a special device
.Dl fd = open("/dev/netmap", O_RDWR);
and then bound to a specific port with an
.Dl ioctl(fd, NIOCREGIF, (struct nmreq *)arg);
.Pp
.Nm
has multiple modes of operation controlled by the
.Vt struct nmreq
argument.
.Va arg.nr_name
specifies the port name, as follows:
.Bl -tag -width XXXX
.It Dv OS network interface name (e.g. 'em0', 'eth1', ... )
the data path of the NIC is disconnected from the host stack,
and the file descriptor is bound to the NIC (one or all queues),
or to the host stack;
.It Dv valeXXX:YYY (arbitrary XXX and YYY)
the file descriptor is bound to port YYY of a VALE switch called XXX,
both dynamically created if necessary.
The string cannot exceed IFNAMSIZ characters, and YYY cannot
be the name of any existing OS network interface.
.El
.Pp
On return,
.Va arg
indicates the size of the shared memory region,
and the number, size and location of all the
.Nm
data structures, which can be accessed by mmapping the memory
.Dl char *mem = mmap(0, arg.nr_memsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
.Pp
Non-blocking I/O is done with special
.Xr ioctl 2 .
.Xr select 2
and
.Xr poll 2
on the file descriptor permit blocking I/O.
.Xr epoll 2
and
.Xr kqueue 2
are not supported on
.Nm
file descriptors.
.Pp
While a NIC is in
.Nm
mode, the OS will still believe the interface is up and running.
OS-generated packets for that NIC end up in a
.Nm
ring, and another ring is used to send packets into the OS network stack.
A
.Xr close 2
on the file descriptor removes the binding,
and returns the NIC to normal mode (reconnecting the data path
to the host stack), or destroys the virtual port.
.Sh DATA STRUCTURES
The data structures in the mmapped memory region are detailed in
.Pa sys/net/netmap.h ,
which is the ultimate reference for the
.Nm
API.
The main structures and fields are indicated below:
.Bl -tag -width XXX
.It Dv struct netmap_if (one per interface)
.Bd -literal
struct netmap_if {
    ...
    const uint32_t   ni_flags;      /* properties              */
    ...
    const uint32_t   ni_tx_rings;   /* NIC tx rings            */
    const uint32_t   ni_rx_rings;   /* NIC rx rings            */
    uint32_t         ni_bufs_head;  /* head of extra bufs list */
    ...
};
.Ed
.Pp
Indicates the number of available rings
.Pa ( struct netmap_ring )
and their position in the mmapped region.
The number of tx and rx rings
.Pa ( ni_tx_rings , ni_rx_rings )
normally depends on the hardware.
NICs also have an extra tx/rx ring pair connected to the host stack.
.Em NIOCREGIF
can also request additional unbound buffers in the same memory space,
to be used as temporary storage for packets.
.Pa ni_bufs_head
contains the index of the first of these free buffers,
which are connected in a list (the first uint32_t of each
buffer being the index of the next buffer in the list).
A 0 indicates the end of the list.
.It Dv struct netmap_ring (one per ring)
.Bd -literal
struct netmap_ring {
    ...
    const uint32_t num_slots;   /* slots in each ring            */
    const uint32_t nr_buf_size; /* size of each buffer           */
    ...
    uint32_t       head;        /* (u) first buf owned by user   */
    uint32_t       cur;         /* (u) wakeup position           */
    const uint32_t tail;        /* (k) first buf owned by kernel */
    ...
    uint32_t       flags;
    struct timeval ts;          /* (k) time of last rxsync()     */
    ...
    struct netmap_slot slot[0]; /* array of slots                */
};
.Ed
.Pp
Implements transmit and receive rings, with read/write
pointers, metadata and an array of
.Pa slots
describing the buffers.
.It Dv struct netmap_slot (one per buffer)
.Bd -literal
struct netmap_slot {
    uint32_t buf_idx; /* buffer index                 */
    uint16_t len;     /* packet length                */
    uint16_t flags;   /* buf changed, etc.            */
    uint64_t ptr;     /* address for indirect buffers */
};
.Ed
.Pp
Describes a packet buffer, which normally is identified by
an index and resides in the mmapped region.
.It Dv packet buffers
Fixed size (normally 2 KB) packet buffers allocated by the kernel.
.El
.Pp
The offset of the
.Pa struct netmap_if
in the mmapped region is indicated by the
.Pa nr_offset
field in the structure returned by
.Pa NIOCREGIF .
From there, all other objects are reachable through
relative references (offsets or indexes).
Macros and functions in <net/netmap_user.h>
help convert them into actual pointers:
.Pp
.Dl struct netmap_if *nifp = NETMAP_IF(mem, arg.nr_offset);
.Dl struct netmap_ring *txr = NETMAP_TXRING(nifp, ring_index);
.Dl struct netmap_ring *rxr = NETMAP_RXRING(nifp, ring_index);
.Pp
.Dl char *buf = NETMAP_BUF(ring, buffer_index);
.Sh RINGS, BUFFERS AND DATA I/O
.Va Rings
are circular queues of packets with three indexes/pointers
.Va ( head , cur , tail ) ;
one slot is always kept empty.
The ring size
.Va ( num_slots )
should not be assumed to be a power of two.
.br
(NOTE: older versions of netmap used head/count format to indicate
the content of a ring).
.Pp
.Va head
is the first slot available to userspace;
.br
.Va cur
is the wakeup point:
select/poll will unblock when
.Va tail
passes
.Va cur ;
.br
.Va tail
is the first slot reserved to the kernel.
.Pp
Slot indexes MUST only move forward;
for convenience, the function
.Dl nm_ring_next(ring, index)
returns the next index modulo the ring size.
.Pp
.Va head
and
.Va cur
are only modified by the user program;
.Va tail
is only modified by the kernel.
The kernel only reads/writes the
.Vt struct netmap_ring
slots and buffers
during the execution of a netmap-related system call.
The only exceptions are slots (and buffers) in the range
.Va tail\ . . . head-1 ,
that are explicitly assigned to the kernel.
.Pp
.Ss TRANSMIT RINGS
On transmit rings, after a
.Nm
system call, slots in the range
.Va head\ . . . tail-1
are available for transmission.
User code should fill the slots sequentially
and advance
.Va head
and
.Va cur
past slots ready to transmit.
.Va cur
may be moved further ahead if the user code needs
more slots before further transmissions (see
.Sx SCATTER GATHER I/O ) .
.Pp
At the next NIOCTXSYNC/select()/poll(),
slots up to
.Va head-1
are pushed to the port, and
.Va tail
may advance if further slots have become available.
Below is an example of the evolution of a TX ring:
.Bd -literal
    after the syscall, slots between cur and tail are (a)vailable
           head=cur      tail
               |          |
               v          v
     TX  [.....aaaaaaaaaaa.............]

    user creates new packets to (T)ransmit
                head=cur tail
                    |     |
                    v     v
     TX  [.....TTTTTaaaaaa.............]

    NIOCTXSYNC/poll()/select() sends packets and reports new slots
                head=cur      tail
                    |          |
                    v          v
     TX  [..........aaaaaaaaaaa........]
.Ed
.Pp
select() and poll() will block if there is no space in the ring, i.e.
.Dl ring->cur == ring->tail
and return when new slots have become available.
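.Pp
The index rules above can be illustrated with a small, self-contained C sketch.
It does not use the netmap headers:
.Vt struct demo_ring
and the demo_* helpers are hypothetical stand-ins that mirror only the
num_slots, head, cur and tail fields, and demo_ring_next() is merely
assumed to behave like the nm_ring_next() function described earlier.
The sketch shows one way to count the slots in the range head ... tail-1
without assuming a power-of-two ring size, and to advance head and cur
together once slots have been filled.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the index fields of struct netmap_ring. */
struct demo_ring {
	uint32_t num_slots;	/* slots in the ring */
	uint32_t head;		/* (u) first slot owned by user */
	uint32_t cur;		/* (u) wakeup position */
	uint32_t tail;		/* (k) first slot owned by kernel */
};

/* Next index modulo the ring size; no power-of-two assumption. */
static inline uint32_t
demo_ring_next(const struct demo_ring *r, uint32_t i)
{
	assert(i < r->num_slots);
	return (i + 1 == r->num_slots) ? 0 : i + 1;
}

/* Number of slots in head ... tail-1, i.e. usable by the user program. */
static inline uint32_t
demo_ring_space(const struct demo_ring *r)
{
	return (r->tail + r->num_slots - r->head) % r->num_slots;
}

/*
 * Claim up to 'want' transmit slots: advance head one slot at a time,
 * then set cur to head, as user code would do after filling the slots.
 * Returns the number of slots actually claimed.
 */
static inline uint32_t
demo_tx_claim(struct demo_ring *r, uint32_t want)
{
	uint32_t n = demo_ring_space(r);
	uint32_t k;

	if (want < n)
		n = want;
	for (k = 0; k < n; k++)
		r->head = demo_ring_next(r, r->head);
	r->cur = r->head;
	return n;
}
```
.Ed
With a 29-slot ring, head = cur = 5 and tail = 16, as in the first TX diagram,
the sketch reports 11 usable slots; after claiming 5 of them, head and cur
are both 10 and 6 slots remain usable until the next sync.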
.Pp
High speed applications may want to amortize the cost of system calls
by preparing as many packets as possible before issuing them.
.Pp
A transmit ring with pending transmissions has
.Dl ring->head != ring->tail + 1 (modulo the ring size).
The function
.Va int nm_tx_pending(ring)
implements this test.
.Ss RECEIVE RINGS
On receive rings, after a
.Nm
system call, the slots in the range
.Va head\ . . . tail-1
contain received packets.
User code should process them and advance
.Va head
and
.Va cur
past slots it wants to return to the kernel.
.Va cur
may be moved further ahead if the user code wants to
wait for more packets
without returning all the previous slots to the kernel.
.Pp
At the next NIOCRXSYNC/select()/poll(),
slots up to
.Va head-1
are returned to the kernel for further receives, and
.Va tail
may advance to report new incoming packets.
.br
Below is an example of the evolution of an RX ring:
.Bd -literal
    after the syscall, there are some (h)eld and some (R)eceived slots
           head  cur     tail
            |     |       |
            v     v       v
     RX  [..hhhhhhRRRRRRRR..........]

    user advances head and cur, releasing some slots and holding others
              head cur   tail
                 |  |     |
                 v  v     v
     RX  [..*****hhhRRRRRR...........]

    NIOCRXSYNC/poll()/select() recovers slots and reports new packets
              head cur         tail
                 |  |           |
                 v  v           v
     RX  [.......hhhRRRRRRRRRRRR....]
.Ed
.Sh SLOTS AND PACKET BUFFERS
Normally, packets should be stored in the netmap-allocated buffers
assigned to slots when ports are bound to a file descriptor.
One packet is fully contained in a single buffer.
.Pp
The following flags affect slot and buffer processing:
.Bl -tag -width XXX
.It NS_BUF_CHANGED
it MUST be used when the buf_idx in the slot is changed.
This can be used to implement
zero-copy forwarding, see
.Sx ZERO-COPY FORWARDING .
.It NS_REPORT
reports when this buffer has been transmitted.
Normally,
.Nm
notifies transmit completions in batches, hence signals
can be delayed indefinitely.
This flag helps detect
when packets have been sent and a file descriptor can be closed.
.It NS_FORWARD
When a ring is in 'transparent' mode (see
.Sx TRANSPARENT MODE ) ,
packets marked with this flag are forwarded to the other endpoint
at the next system call, thus restoring (in a selective way)
the connection between a NIC and the host stack.
.It NS_NO_LEARN
tells the forwarding code that the SRC MAC address for this
packet must not be used in the learning bridge code.
.It NS_INDIRECT
indicates that the packet's payload is in a user-supplied buffer,
whose user virtual address is in the 'ptr' field of the slot.
The size can reach 65535 bytes.
.br
This is only supported on the transmit ring of
.Nm VALE
ports, and it helps reduce data copies in the interconnection
of virtual machines.
.It NS_MOREFRAG
indicates that the packet continues with subsequent buffers;
the last buffer in a packet must have the flag clear.
.El
.Sh SCATTER GATHER I/O
Packets can span multiple slots if the
.Va NS_MOREFRAG
flag is set in all but the last slot.
The maximum length of a chain is 64 buffers.
This is normally used with
.Nm VALE
ports when connecting virtual machines, as they generate large
TSO segments that are not split unless they reach a physical device.
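.Pp
A minimal sketch of how user code might describe a multi-slot packet follows.
It is self-contained and does not include the netmap headers:
.Vt struct demo_slot ,
DEMO_NS_MOREFRAG, DEMO_MAX_FRAGS and demo_fill_chain() are hypothetical
names used only for this illustration, and the flag value is a placeholder;
real code would use
.Vt struct netmap_slot
and the NS_MOREFRAG flag from the netmap headers.
The sketch only fills the len and flags fields, setting the flag in every
slot but the last and refusing chains longer than 64 buffers.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NS_MOREFRAG 0x0020	/* placeholder bit for this sketch */
#define DEMO_MAX_FRAGS   64	/* chain limit stated in the text */

/* Hypothetical stand-in for the len/flags fields of a slot. */
struct demo_slot {
	uint16_t len;
	uint16_t flags;
};

/*
 * Describe a packet of 'total' bytes as a chain of slots, each holding
 * at most 'buf_size' bytes.  Returns the number of slots used, or 0 if
 * the packet is empty, does not fit in 'nslots' slots, or would need
 * more than DEMO_MAX_FRAGS buffers.
 */
static inline uint32_t
demo_fill_chain(struct demo_slot *slots, uint32_t nslots,
    uint32_t total, uint16_t buf_size)
{
	uint32_t need, i;

	if (total == 0 || buf_size == 0)
		return 0;
	need = total / buf_size + (total % buf_size != 0);
	if (need > nslots || need > DEMO_MAX_FRAGS)
		return 0;
	for (i = 0; i < need; i++) {
		uint32_t chunk = (total > buf_size) ? buf_size : total;

		/* len always refers to the individual fragment */
		slots[i].len = (uint16_t)chunk;
		/* every fragment but the last carries the flag */
		slots[i].flags = (i + 1 < need) ? DEMO_NS_MOREFRAG : 0;
		total -= chunk;
	}
	return need;
}
```
.Ed
For example, a 5000-byte packet with 2048-byte buffers occupies three slots
of 2048, 2048 and 904 bytes, with the flag set in the first two only.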
.Pp
NOTE: The length field always refers to the individual
fragment; there is no place with the total length of a packet.
.Pp
On receive rings the macro
.Va NS_RFRAGS(slot)
indicates the remaining number of slots for this packet,
including the current one.
Slots with a value greater than 1 also have NS_MOREFRAG set.
.Sh IOCTLS
.Nm
uses two ioctls (NIOCTXSYNC, NIOCRXSYNC)
for non-blocking I/O.
They take no argument.
Two more ioctls (NIOCGINFO, NIOCREGIF) are used
to query and configure ports, with the following argument:
.Bd -literal
struct nmreq {
    char      nr_name[IFNAMSIZ]; /* (i) port name                  */
    uint32_t  nr_version;        /* (i) API version                */
    uint32_t  nr_offset;         /* (o) nifp offset in mmap region */
    uint32_t  nr_memsize;        /* (o) size of the mmap region    */
    uint32_t  nr_tx_slots;       /* (i/o) slots in tx rings        */
    uint32_t  nr_rx_slots;       /* (i/o) slots in rx rings        */
    uint16_t  nr_tx_rings;       /* (i/o) number of tx rings       */
    uint16_t  nr_rx_rings;       /* (i/o) number of rx rings       */
    uint16_t  nr_ringid;         /* (i/o) ring(s) we care about    */
    uint16_t  nr_cmd;            /* (i) special command            */
    uint16_t  nr_arg1;           /* (i/o) extra arguments          */
    uint16_t  nr_arg2;           /* (i/o) extra arguments          */
    uint32_t  nr_arg3;           /* (i/o) extra arguments          */
    uint32_t  nr_flags;          /* (i/o) open mode                */
    ...
};
.Ed
.Pp
A file descriptor obtained through
.Pa /dev/netmap
also supports the ioctls supported by network devices, see
.Xr netintro 4 .
.Bl -tag -width XXXX
.It Dv NIOCGINFO
returns EINVAL if the named port does not support netmap.
Otherwise, it returns 0 and (advisory) information
about the port.
Note that all the information below can change before the
interface is actually put in netmap mode.
.Bl -tag -width XX
.It Pa nr_memsize
indicates the size of the
.Nm
memory region.
NICs in
.Nm
mode all share the same memory region,
whereas
.Nm VALE
ports have independent regions for each port.
.It Pa nr_tx_slots , nr_rx_slots
indicate the size of transmit and receive rings.
.It Pa nr_tx_rings , nr_rx_rings
indicate the number of transmit
and receive rings.
Both ring number and sizes may be configured at runtime
using interface-specific functions (e.g.
.Xr ethtool 8 ) .
.El
.It Dv NIOCREGIF
binds the port named in
.Va nr_name
to the file descriptor.
For a physical device this also switches it into
.Nm
mode, disconnecting
it from the host stack.
Multiple file descriptors can be bound to the same port,
with proper synchronization left to the user.
.Pp
.Dv NIOCREGIF
can also bind a file descriptor to one endpoint of a
.Em netmap pipe ,
consisting of two netmap ports with a crossover connection.
A netmap pipe shares the same memory space of the parent port,
and is meant to enable configurations where a master process acts
as a dispatcher towards slave processes.
.Pp
To enable this function, the
.Pa nr_arg1
field of the structure can be used as a hint to the kernel to
indicate how many pipes we expect to use, and reserve extra space
in the memory region.
.Pp
On return, it gives the same info as NIOCGINFO,
with
.Pa nr_ringid
and
.Pa nr_flags
indicating the identity of the rings controlled through the file
descriptor.
.Pp
.Va nr_flags
and
.Va nr_ringid
select which rings are controlled through this file descriptor.
Possible values of
.Pa nr_flags
are indicated below, together with the naming schemes
that application libraries (such as the
.Nm nm_open
indicated below) can use to indicate the specific set of rings.
In the example below, "netmap:foo" is any valid netmap port name.
.Bl -tag -width XXXXX
.It NR_REG_ALL_NIC "netmap:foo"
(default) all hardware ring pairs
.It NR_REG_SW "netmap:foo^"
the ``host rings'', connecting to the host stack.
.It NR_REG_NIC_SW "netmap:foo+"
all hardware rings and the host rings
.It NR_REG_ONE_NIC "netmap:foo-i"
only the i-th hardware ring pair, where the number is in
.Pa nr_ringid ;
.It NR_REG_PIPE_MASTER "netmap:foo{i"
the master side of the netmap pipe whose identifier (i) is in
.Pa nr_ringid ;
.It NR_REG_PIPE_SLAVE "netmap:foo}i"
the slave side of the netmap pipe whose identifier (i) is in
.Pa nr_ringid .
.Pp
The identifier of a pipe must be thought of as part of the pipe name,
and does not need to be sequential.
On return the pipe
will only have a single ring pair with index 0,
irrespective of the value of i.
.El
.Pp
By default, a
.Xr poll 2
or
.Xr select 2
call pushes out any pending packets on the transmit ring, even if
no write events are specified.
The feature can be disabled by or-ing
.Va NETMAP_NO_TX_POLL
to the value written to
.Va nr_ringid .
When this flag is set,
packets are transmitted only when
.Va ioctl(NIOCTXSYNC)
is called, or when select()/poll() are called with a write event
(POLLOUT/wfdset) or a full ring.
.Pp
When registering a virtual interface that is dynamically created to a
.Xr vale 4
switch, we can specify the desired number of rings (1 by default,
and currently up to 16) on it using the nr_tx_rings and nr_rx_rings fields.
.It Dv NIOCTXSYNC
tells the hardware of new packets to transmit, and updates the
number of slots available for transmission.
.It Dv NIOCRXSYNC
tells the hardware of consumed packets, and asks for newly available
packets.
.El
.Sh SELECT, POLL, EPOLL, KQUEUE
.Xr select 2
and
.Xr poll 2
on a
.Nm
file descriptor process rings as indicated in
.Sx TRANSMIT RINGS
and
.Sx RECEIVE RINGS ,
respectively when write (POLLOUT) and read (POLLIN) events are requested.
Both block if no slots are available in the ring
.Va ( ring->cur == ring->tail ) .
Depending on the platform,
.Xr epoll 2
and
.Xr kqueue 2
are supported too.
.Pp
Packets in transmit rings are normally pushed out
(and buffers reclaimed) even without
requesting write events.
Passing the NETMAP_NO_TX_POLL flag to
.Em NIOCREGIF
disables this feature.
By default, receive rings are processed only if read
events are requested.
Passing the NETMAP_DO_RX_POLL flag to
.Em NIOCREGIF
updates receive rings even without read events.
Note that on epoll and kqueue, NETMAP_NO_TX_POLL and NETMAP_DO_RX_POLL
only have an effect when some event is posted for the file descriptor.
.Sh LIBRARIES
The
.Nm
API is supposed to be used directly, both because of its simplicity and
for efficient integration with applications.
.Pp
For convenience, the
.Va <net/netmap_user.h>
header provides a few macros and functions to ease creating
a file descriptor and doing I/O with a
.Nm
port.
These are loosely modeled after the
.Xr pcap 3
API, to ease porting of libpcap-based applications to
.Nm .
To use these extra functions, programs should
.Dl #define NETMAP_WITH_LIBS
before
.Dl #include <net/netmap_user.h>
.Pp
The following functions are available:
.Bl -tag -width XXXXX
.It Va struct nm_desc * nm_open(const char *ifname, const struct nmreq *req, uint64_t flags, const struct nm_desc *arg)
similar to
.Xr pcap_open ,
binds a file descriptor to a port.
.Bl -tag -width XX
.It Va ifname
is a port name, in the form "netmap:XXX" for a NIC and "valeXXX:YYY" for a
.Nm VALE
port.
.It Va req
provides the initial values for the argument to the NIOCREGIF ioctl.
The nr_flags and nr_ringid values are overwritten by parsing
ifname and flags, and other fields can be overridden through
the other two arguments.
.It Va arg
points to a struct nm_desc containing arguments (e.g. from a previously
open file descriptor) that should override the defaults.
The fields are used as described below.
.It Va flags
can be set to a combination of the following flags:
.Va NETMAP_NO_TX_POLL ,
.Va NETMAP_DO_RX_POLL
(copied into nr_ringid);
.Va NM_OPEN_NO_MMAP
(if arg points to the same memory region,
avoids the mmap and uses the values from it);
.Va NM_OPEN_IFNAME
(ignores ifname and uses the values in arg);
.Va NM_OPEN_ARG1 ,
.Va NM_OPEN_ARG2 ,
.Va NM_OPEN_ARG3
(uses the fields from arg);
.Va NM_OPEN_RING_CFG
(uses the ring number and sizes from arg).
.El
.It Va int nm_close(struct nm_desc *d)
closes the file descriptor, unmaps memory, frees resources.
.It Va int nm_inject(struct nm_desc *d, const void *buf, size_t size)
similar to pcap_inject(), pushes a packet to a ring, returns the size
of the packet if successful, or 0 on error;
.It Va int nm_dispatch(struct nm_desc *d, int cnt, nm_cb_t cb, u_char *arg)
similar to pcap_dispatch(), applies a callback to incoming packets
.It Va u_char * nm_nextpkt(struct nm_desc *d, struct nm_pkthdr *hdr)
similar to pcap_next(), fetches the next packet
.El
.Sh SUPPORTED DEVICES
.Nm
natively supports the following devices:
.Pp
On FreeBSD:
.Xr em 4 ,
.Xr igb 4 ,
.Xr ixgbe 4 ,
.Xr lem 4 ,
.Xr re 4 .
.Pp
On Linux:
.Xr e1000 4 ,
.Xr e1000e 4 ,
.Xr igb 4 ,
.Xr ixgbe 4 ,
.Xr mlx4 4 ,
.Xr forcedeth 4 ,
.Xr r8169 4 .
.Pp
NICs without native support can still be used in
.Nm
mode through emulation.
Performance is inferior to native netmap
mode but still significantly higher than sockets, and approaching
that of in-kernel solutions such as Linux's
.Xr pktgen .
.Pp
Emulation is also available for devices with native netmap support,
which can be used for testing or performance comparison.
The sysctl variable
.Va dev.netmap.admode
globally controls how netmap mode is implemented.
.Sh SYSCTL VARIABLES AND MODULE PARAMETERS
Some aspects of the operation of
.Nm
are controlled through sysctl variables on FreeBSD
.Em ( dev.netmap.* )
and module parameters on Linux
.Em ( /sys/module/netmap_lin/parameters/* ) :
.Bl -tag -width indent
.It Va dev.netmap.admode: 0
Controls the use of native or emulated adapter mode.
0 uses the best available option, 1 forces native and
fails if not available, 2 forces emulated hence never fails.
.It Va dev.netmap.generic_ringsize: 1024
Ring size used for emulated netmap mode
.It Va dev.netmap.generic_mit: 100000
Controls interrupt moderation for emulated mode
.It Va dev.netmap.mmap_unreg: 0
.It Va dev.netmap.fwd: 0
Forces NS_FORWARD mode
.It Va dev.netmap.flags: 0
.It Va dev.netmap.txsync_retry: 2
.It Va dev.netmap.no_pendintr: 1
Forces recovery of transmit buffers on system calls
.It Va dev.netmap.mitigate: 1
Propagates interrupt mitigation to user processes
.It Va dev.netmap.no_timestamp: 0
Disables the update of the timestamp in the netmap ring
.It Va dev.netmap.verbose: 0
Verbose kernel messages
.It Va dev.netmap.buf_num: 163840
.It Va dev.netmap.buf_size: 2048
.It Va dev.netmap.ring_num: 200
.It Va dev.netmap.ring_size: 36864
.It Va dev.netmap.if_num: 100
.It Va dev.netmap.if_size: 1024
Sizes and number of objects (netmap_if, netmap_ring, buffers)
for the global memory region.
The only parameter worth modifying is
.Va dev.netmap.buf_num
as it impacts the total amount of memory used by netmap.
.It Va dev.netmap.buf_curr_num: 0
.It Va dev.netmap.buf_curr_size: 0
.It Va dev.netmap.ring_curr_num: 0
.It Va dev.netmap.ring_curr_size: 0
.It Va dev.netmap.if_curr_num: 0
.It Va dev.netmap.if_curr_size: 0
Actual values in use.
.It Va dev.netmap.bridge_batch: 1024
Batch size used when moving packets across a
.Nm VALE
switch.
Values above 64 generally guarantee good
performance.
.El
.Sh SYSTEM CALLS
.Nm
uses
.Xr select 2 ,
.Xr poll 2 ,
.Xr epoll 2
and
.Xr kqueue 2
to wake up processes when significant events occur, and
.Xr mmap 2
to map memory.
.Xr ioctl 2
is used to configure ports and
.Nm VALE
switches.
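.Pp
As a rough illustration of how the sizing variables listed under
.Sx SYSCTL VARIABLES AND MODULE PARAMETERS
relate to the amount of memory that is mapped, the sketch below sums the
three object pools for the default values.
It is only an estimate: it assumes the region is the plain sum of the pools
and ignores any rounding the allocator may apply.
The struct and function names are hypothetical and are not part of the
netmap API.

```c
#include <stdint.h>

/* Hypothetical container for the dev.netmap.* sizing variables. */
struct pool_cfg {
    uint64_t if_num, if_size;     /* dev.netmap.if_num, dev.netmap.if_size */
    uint64_t ring_num, ring_size; /* dev.netmap.ring_num, dev.netmap.ring_size */
    uint64_t buf_num, buf_size;   /* dev.netmap.buf_num, dev.netmap.buf_size */
};

/* Default values as listed in this page. */
const struct pool_cfg default_cfg = { 100, 1024, 200, 36864, 163840, 2048 };

/*
 * Estimated size in bytes of the global memory region.
 * Assumption: plain sum of the three pools, no alignment or rounding.
 */
uint64_t
estimate_region_bytes(const struct pool_cfg *c)
{
    return c->if_num * c->if_size +
        c->ring_num * c->ring_size +
        c->buf_num * c->buf_size;
}
```

With the defaults, the buffers alone account for 320 MiB out of an estimated
total of about 327 MiB, which is consistent with
.Va dev.netmap.buf_num
being the variable that matters for memory use.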
.Pp
Applications may need to create threads and bind them to
specific cores to improve performance, using standard
OS primitives, see
.Xr pthread 3 .
In particular,
.Xr pthread_setaffinity_np 3
may be of use.
.Sh EXAMPLES
.Ss TEST PROGRAMS
.Nm
comes with a few programs that can be used for testing or
simple applications.
See the
.Va examples/
directory in
.Nm
distributions, or
.Va tools/tools/netmap/
directory in FreeBSD distributions.
.Pp
.Xr pkt-gen
is a general purpose traffic source/sink.
.Pp
As an example
.Dl pkt-gen -i ix0 -f tx -l 60
can generate an infinite stream of minimum size packets, and
.Dl pkt-gen -i ix0 -f rx
is a traffic sink.
Both print traffic statistics, to help monitor
how the system performs.
.Pp
.Xr pkt-gen
has many options that can be used to set packet sizes, addresses,
rates, and use multiple send/receive threads and cores.
.Pp
.Xr bridge
is another test program which interconnects two
.Nm
ports.
It can be used for transparent forwarding between
interfaces, as in
.Dl bridge -i ix0 -i ix1
or even connect the NIC to the host stack using netmap
.Dl bridge -i ix0 -i ix0
.Ss USING THE NATIVE API
The following code implements a traffic generator
.Pp
.Bd -literal -compact
#include <net/netmap_user.h>
\&...
void sender(void)
{
    struct netmap_if *nifp;
    struct netmap_ring *ring;
    struct nmreq nmr;
    struct pollfd fds;
    int fd;
    uint32_t i;
    void *p;
    char *buf;

    fd = open("/dev/netmap", O_RDWR);
    /* bind the file descriptor to port ix0 */
    bzero(&nmr, sizeof(nmr));
    strcpy(nmr.nr_name, "ix0");
    nmr.nr_version = NETMAP_API;
    ioctl(fd, NIOCREGIF, &nmr);
    /* map the shared region and locate the first transmit ring */
    p = mmap(0, nmr.nr_memsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    nifp = NETMAP_IF(p, nmr.nr_offset);
    ring = NETMAP_TXRING(nifp, 0);
    fds.fd = fd;
    fds.events = POLLOUT;
    for (;;) {
        poll(&fds, 1, -1);
        while (!nm_ring_empty(ring)) {
            i = ring->cur;
            buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
            ... prepare packet in buf ...
            ring->slot[i].len = ... packet length ...
            ring->head = ring->cur = nm_ring_next(ring, i);
        }
    }
}
.Ed
.Ss HELPER FUNCTIONS
A simple receiver can be implemented using the helper functions
.Bd -literal -compact
#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>
\&...
void receiver(void)
{
    struct nm_desc *d;
    struct pollfd fds;
    u_char *buf;
    struct nm_pkthdr h;
    ...
    d = nm_open("netmap:ix0", NULL, 0, NULL);
    fds.fd = NETMAP_FD(d);
    fds.events = POLLIN;
    for (;;) {
        poll(&fds, 1, -1);
        while ( (buf = nm_nextpkt(d, &h)) )
            consume_pkt(buf, h.len);
    }
    nm_close(d);
}
.Ed
.Ss ZERO-COPY FORWARDING
Since physical interfaces share the same memory region,
it is possible to do packet forwarding between ports
swapping buffers.
The buffer from the transmit ring is used
to replenish the receive ring:
.Bd -literal -compact
    uint32_t tmp;
    struct netmap_slot *src, *dst;
    ...
    /* rxr is the receive ring of the input port, txr the transmit ring of the output port */
    src = &rxr->slot[rxr->cur];
    dst = &txr->slot[txr->cur];
    tmp = dst->buf_idx;
    dst->buf_idx = src->buf_idx;
    dst->len = src->len;
    dst->flags = NS_BUF_CHANGED;
    src->buf_idx = tmp;
    src->flags = NS_BUF_CHANGED;
    rxr->head = rxr->cur = nm_ring_next(rxr, rxr->cur);
    txr->head = txr->cur = nm_ring_next(txr, txr->cur);
    ...
.Ed
.Ss ACCESSING THE HOST STACK
The host stack is for all practical purposes just a regular ring pair,
which you can access with the netmap API, e.g. with
.Dl nm_open("netmap:eth0^", ... ) ;
All packets that the host would send to an interface in
.Nm
mode end up into the RX ring, whereas all packets queued to the
TX ring are sent up to the host stack.
.Ss VALE SWITCH
A simple way to test the performance of a
.Nm VALE
switch is to attach a sender and a receiver to it,
e.g. running the following in two different terminals:
.Dl pkt-gen -i vale1:a -f rx # receiver
.Dl pkt-gen -i vale1:b -f tx # sender
The same example can be used to test netmap pipes, by simply
changing port names, e.g.
.Dl pkt-gen -i vale:x{3 -f rx # receiver on the master side
.Dl pkt-gen -i vale:x}3 -f tx # sender on the slave side
.Pp
The following command attaches an interface and the host stack
to a switch:
.Dl vale-ctl -h vale2:em0
Other
.Nm
clients attached to the same switch can now communicate
with the network card or the host.
.Sh SEE ALSO
.Pa http://info.iet.unipi.it/~luigi/netmap/
.Pp
Luigi Rizzo, Revisiting network I/O APIs: the netmap framework,
Communications of the ACM, 55 (3), pp.45-51, March 2012
.Pp
Luigi Rizzo, netmap: a novel framework for fast packet I/O,
Usenix ATC'12, June 2012, Boston
.Pp
Luigi Rizzo, Giuseppe Lettieri,
VALE, a switched ethernet for virtual machines,
ACM CoNEXT'12, December 2012, Nice
.Pp
Luigi Rizzo, Giuseppe Lettieri, Vincenzo Maffione,
Speeding up packet I/O in virtual machines,
ACM/IEEE ANCS'13, October 2013, San Jose
.Sh AUTHORS
.An -nosplit
The
.Nm
framework was originally designed and implemented at the
Universita` di Pisa in 2011 by
.An Luigi Rizzo ,
and further extended with help from
.An Matteo Landi ,
.An Gaetano Catalli ,
.An Giuseppe Lettieri ,
.An Vincenzo Maffione .
.Pp
.Nm
and
.Nm VALE
have been funded by the European Commission within FP7 Projects
CHANGE (257422) and OPENLAB (287581).
.Sh CAVEATS
No matter how fast the CPU and OS are,
achieving line rate on 10G and faster interfaces
requires hardware with sufficient performance.
Several NICs are unable to sustain line rate with
small packet sizes.
Insufficient PCIe or memory bandwidth
can also cause reduced performance.
.Pp
Another frequent reason for low performance is the use
of flow control on the link: a slow receiver can limit
the transmit speed.
Be sure to disable flow control when running high
speed experiments.
.Ss SPECIAL NIC FEATURES
.Nm
is orthogonal to some NIC features such as
multiqueue, schedulers, packet filters.
.Pp
Multiple transmit and receive rings are supported natively
and can be configured with ordinary OS tools,
such as
.Xr ethtool
or
device-specific sysctl variables.
The same goes for Receive Packet Steering (RPS)
and filtering of incoming traffic.
.Pp
.Nm
.Em does not use
features such as
.Em checksum offloading , TCP segmentation offloading ,
.Em encryption , VLAN encapsulation/decapsulation ,
etc.
When using netmap to exchange packets with the host stack,
make sure to disable these features.
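.Pp
Since checksum offloading is not used, an application that builds its own
packets will typically need to fill in checksums in software.
The sketch below is a generic RFC 1071 Internet checksum, given only as an
illustration.
It is not code taken from netmap, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum (RFC 1071) over len bytes, treating the data as a
 * sequence of big-endian 16-bit words.
 * The result is the value to store in the checksum field, high byte first.
 * The 32-bit accumulator is sufficient for packet-sized inputs.
 */
uint16_t
sketch_inet_cksum(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t sum = 0;

    while (len > 1) {
        sum += ((uint32_t)p[0] << 8) | p[1];
        p += 2;
        len -= 2;
    }
    if (len == 1)               /* odd trailing byte, padded with zero */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)           /* fold the carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The checksum field must be zero while the sum is computed; running the
function again over a header that already carries a valid checksum
returns 0.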