.\" Copyright (c) 2011 Matteo Landi, Luigi Rizzo, Universita` di Pisa
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.\" $FreeBSD$
.\" $Id: netmap.4 11563 2012-08-02 08:59:12Z luigi $: stable/8/share/man/man4/bpf.4 181694 2008-08-13 17:45:06Z ed $
.\"
.Dd February 27, 2012
.Dt NETMAP 4
.Os
.Sh NAME
.Nm netmap
.Nd a framework for fast packet I/O
.Sh SYNOPSIS
.Cd device netmap
.Sh DESCRIPTION
.Nm
is a framework for fast and safe access to network devices
(reaching 14.88 Mpps at less than 1 GHz).
.Nm
uses memory-mapped buffers and metadata
(buffer indexes and lengths) to communicate with the kernel,
which is in charge of validating information through
.Pa ioctl()
and
.Pa select()/poll() .
.Nm
can exploit the parallelism in multiqueue devices and
multicore systems.
.Pp
.Nm
requires explicit support in device drivers.
For a list of supported devices, see the end of this manual page.
.Sh OPERATION
.Nm
clients must first call
.Pa open("/dev/netmap") ,
and then issue an
.Pa ioctl(...,NIOCREGIF,...)
to bind the file descriptor to a network device.
.Pp
When a device is put in
.Nm
mode, its data path is disconnected from the host stack.
The processes owning the file descriptor
can exchange packets with the device, or with the host stack,
through an mmapped memory region that contains pre-allocated
buffers and metadata.
.Pp
Non-blocking I/O is done with special
.Pa ioctl()
calls, whereas the file descriptor can be passed to
.Pa select()/poll()
to be notified about incoming packets or available transmit buffers.
.Ss Data structures
All data structures for all devices in
.Nm
mode are in a memory
region shared by the kernel and all processes
that open
.Pa /dev/netmap
(NOTE: visibility may be restricted in future implementations).
All references between the shared data structures
are relative (offsets or indexes).
Some macros help convert
them into actual pointers.
.Pp
The data structures in shared memory are the following:
.Pp
.Bl -tag -width XXX
.It Dv struct netmap_if (one per interface)
indicates the number of rings supported by an interface, their
sizes, and the offsets of the
.Pa netmap_rings
associated to the interface.
The offset of a
.Pa struct netmap_if
in the shared memory region is indicated by the
.Pa nr_offset
field in the structure returned by the
.Pa NIOCREGIF
ioctl (see below).
.Bd -literal
struct netmap_if {
    char ni_name[IFNAMSIZ];     /* name of the interface. */
    const u_int ni_num_queues;  /* number of hw ring pairs */
    const ssize_t ring_ofs[];   /* offset of tx and rx rings */
};
.Ed
.It Dv struct netmap_ring (one per ring)
contains the index of the current read or write slot (cur),
the number of slots available for reception or transmission (avail),
and an array of
.Pa slots
describing the buffers.
There is one ring pair for each of the N hardware ring pairs
supported by the card (numbered 0..N-1), plus
one ring pair (numbered N) for packets from/to the host stack.
.Bd -literal
struct netmap_ring {
    const ssize_t buf_ofs;
    const uint32_t num_slots;   /* number of slots in the ring. */
    uint32_t avail;             /* number of usable slots */
    uint32_t cur;               /* 'current' index for the user side */
    uint32_t reserved;          /* not refilled before current */

    const uint16_t nr_buf_size;
    uint16_t flags;
    struct netmap_slot slot[0]; /* array of slots. */
};
.Ed
.It Dv struct netmap_slot (one per packet)
contains the metadata for a packet: a buffer index (buf_idx),
a buffer length (len), and some flags.
.Bd -literal
struct netmap_slot {
    uint32_t buf_idx;   /* buffer index */
    uint16_t len;       /* packet length */
    uint16_t flags;     /* buf changed, etc. */
#define NS_BUF_CHANGED  0x0001  /* must resync, buffer changed */
#define NS_REPORT       0x0002  /* tell hw to report results
                                 * e.g. by generating an interrupt
                                 */
};
.Ed
.It Dv packet buffers
are fixed size (approximately 2k) buffers allocated by the kernel
that contain packet data.
Buffer addresses are computed through
macros.
.El
.Pp
Some macros support the access to objects in the shared memory
region.
In particular:
.Bd -literal
struct netmap_if *nifp;
struct netmap_ring *txring = NETMAP_TXRING(nifp, i);
struct netmap_ring *rxring = NETMAP_RXRING(nifp, i);
int i = txring->slot[txring->cur].buf_idx;
char *buf = NETMAP_BUF(txring, i);
.Ed
.Sh IOCTLS
.Nm
supports some ioctl() commands to synchronize the state of the rings
between the kernel and the user processes, plus some
to query and configure the interface.
The former do not require any argument, whereas the latter
use a
.Pa struct nmreq
defined as follows:
.Bd -literal
struct nmreq {
    char nr_name[IFNAMSIZ];
    uint32_t nr_version;    /* API version */
#define NETMAP_API 3        /* current version */
    uint32_t nr_offset;     /* nifp offset in the shared region */
    uint32_t nr_memsize;    /* size of the shared region */
    uint32_t nr_tx_slots;   /* slots in tx rings */
    uint32_t nr_rx_slots;   /* slots in rx rings */
    uint16_t nr_tx_rings;   /* number of tx rings */
    uint16_t nr_rx_rings;   /* number of rx rings */
    uint16_t nr_ringid;     /* ring(s) we care about */
#define NETMAP_HW_RING      0x4000  /* low bits indicate one hw ring */
#define NETMAP_SW_RING      0x2000  /* we process the sw ring */
#define NETMAP_NO_TX_POLL   0x1000  /* no gratuitous txsync on poll */
#define NETMAP_RING_MASK    0xfff   /* the actual ring number */
    uint16_t spare1;
    uint32_t spare2[4];
};
.Ed
A device descriptor obtained through
.Pa /dev/netmap
also supports the ioctls supported by network devices.
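.Pp
The encoding of the nr_ringid field can be illustrated with a minimal sketch.
The constants are copied from the listing above instead of being taken from
.In net/netmap.h ,
and the names ringid_for_hw_ring, ringid_kind, ringid_index and enum
ring_kind are hypothetical helpers, not part of the netmap API.
Checking the host-ring bit before the hardware-ring bit is an assumption
of this sketch.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the struct nmreq listing; a real program would
 * include <net/netmap.h> instead of redefining them. */
#define NETMAP_HW_RING      0x4000  /* low bits indicate one hw ring */
#define NETMAP_SW_RING      0x2000  /* we process the sw ring */
#define NETMAP_NO_TX_POLL   0x1000  /* no gratuitous txsync on poll */
#define NETMAP_RING_MASK    0xfff   /* the actual ring number */

/* Hypothetical classification of an nr_ringid value. */
enum ring_kind { RINGS_ALL_HW, RINGS_HOST, RINGS_ONE_HW };

/* Hypothetical helper: build nr_ringid selecting the i-th hardware
 * ring, optionally disabling the implicit txsync done by poll(). */
uint16_t
ringid_for_hw_ring(unsigned int i, int no_tx_poll)
{
    uint16_t id;

    assert(i <= NETMAP_RING_MASK);  /* ring number must fit the mask */
    id = (uint16_t)(NETMAP_HW_RING | i);
    if (no_tx_poll)
        id |= NETMAP_NO_TX_POLL;
    return id;
}

/* Hypothetical helper: tell which rings an nr_ringid value selects.
 * Assumption: the host-ring bit is examined first. */
enum ring_kind
ringid_kind(uint16_t id)
{
    if (id & NETMAP_SW_RING)
        return RINGS_HOST;
    if (id & NETMAP_HW_RING)
        return RINGS_ONE_HW;
    return RINGS_ALL_HW;
}

/* Hypothetical helper: extract the ring number from nr_ringid. */
unsigned int
ringid_index(uint16_t id)
{
    return id & NETMAP_RING_MASK;
}
```
.Ed
With these helpers, a client interested only in hardware ring 3 would
store ringid_for_hw_ring(3, 0) in nr_ringid before issuing the
registration ioctl.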
.Pp
The netmap-specific
.Xr ioctl 2
command codes below are defined in
.In net/netmap.h
and are:
.Bl -tag -width XXXX
.It Dv NIOCGINFO
returns information about the interface named in nr_name.
On return, nr_memsize indicates the size of the shared netmap
memory region (this is device-independent),
nr_tx_slots and nr_rx_slots indicate how many buffers are in a
transmit and receive ring,
nr_tx_rings and nr_rx_rings indicate the number of transmit
and receive rings supported by the hardware.
.Pp
If the device does not support netmap, the ioctl returns EINVAL.
.It Dv NIOCREGIF
puts the interface named in nr_name into netmap mode, disconnecting
it from the host stack, and/or defines which rings are controlled
through this file descriptor.
On return, it gives the same info as NIOCGINFO, and nr_ringid
indicates the identity of the rings controlled through the file
descriptor.
.Pp
Possible values for nr_ringid are
.Bl -tag -width XXXXX
.It 0
default, all hardware rings
.It NETMAP_SW_RING
the ``host rings'' connecting to the host stack
.It NETMAP_HW_RING + i
the i-th hardware ring
.El
By default, a
.Xr poll 2
or
.Xr select 2
call pushes out any pending packets on the transmit ring, even if
no write events are specified.
The feature can be disabled by or-ing
.Dv NETMAP_NO_TX_POLL
to nr_ringid.
But normally you should keep this feature unless you are using
separate file descriptors for the send and receive rings, because
otherwise packets are pushed out only if NIOCTXSYNC is called,
or the send queue is full.
.Pp
.Pa NIOCREGIF
can be used multiple times to change the association of a
file descriptor to a ring pair, always within the same device.
.It Dv NIOCUNREGIF
brings an interface back to normal mode.
.It Dv NIOCTXSYNC
tells the hardware of new packets to transmit, and updates the
number of slots available for transmission.
.It Dv NIOCRXSYNC
tells the hardware of consumed packets, and asks for newly available
packets.
.El
.Sh SYSTEM CALLS
.Nm
uses
.Xr select 2
and
.Xr poll 2
to wake up processes when significant events occur.
.Sh EXAMPLES
The following code implements a traffic generator:
.Pp
.Bd -literal -compact
#include <net/netmap.h>
#include <net/netmap_user.h>
struct netmap_if *nifp;
struct netmap_ring *ring;
struct nmreq nmr;
struct pollfd fds;

fd = open("/dev/netmap", O_RDWR);
bzero(&nmr, sizeof(nmr));
strcpy(nmr.nr_name, "ix0");
nmr.nr_version = NETMAP_API;
ioctl(fd, NIOCREGIF, &nmr);
p = mmap(0, nmr.nr_memsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
nifp = NETMAP_IF(p, nmr.nr_offset);
ring = NETMAP_TXRING(nifp, 0);
fds.fd = fd;
fds.events = POLLOUT;
for (;;) {
    poll(&fds, 1, -1);
    for ( ; ring->avail > 0 ; ring->avail--) {
        i = ring->cur;
        buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
        ... prepare packet in buf ...
        ring->slot[i].len = ... packet length ...
        ring->cur = NETMAP_RING_NEXT(ring, i);
    }
}
.Ed
.Sh SUPPORTED INTERFACES
.Nm
supports the following interfaces:
.Xr em 4 ,
.Xr igb 4 ,
.Xr ixgbe 4 ,
.Xr lem 4 ,
.Xr re 4 .
.Sh SEE ALSO
.Xr vale 4
.Pp
http://info.iet.unipi.it/~luigi/netmap/
.Pp
Luigi Rizzo, Revisiting network I/O APIs: the netmap framework,
Communications of the ACM, 55 (3), pp.45-51, March 2012
.Pp
Luigi Rizzo, netmap: a novel framework for fast packet I/O,
Usenix ATC'12, June 2012, Boston
.Sh AUTHORS
.An -nosplit
The
.Nm
framework has been designed and implemented at the
Universita` di Pisa in 2011 by
.An Luigi Rizzo ,
with help from
.An Matteo Landi ,
.An Gaetano Catalli ,
.An Giuseppe Lettieri .
.Pp
.Nm
has been funded by the European Commission within FP7 Project CHANGE (257422).