.\" Copyright (c) 2011 Matteo Landi, Luigi Rizzo, Universita` di Pisa
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.\" $FreeBSD$
.\" $Id: netmap.4 9662 2011-11-16 13:18:06Z luigi $: stable/8/share/man/man4/bpf.4 181694 2008-08-13 17:45:06Z ed $
.\"
.Dd November 16, 2011
.Dt NETMAP 4
.Os
.Sh NAME
.Nm netmap
.Nd a framework for fast packet I/O
.Sh SYNOPSIS
.Cd device netmap
.Sh DESCRIPTION
.Nm
is a framework for fast and safe access to network devices
(reaching 14.88 Mpps, the 10 Gbit/s line rate with minimum-size frames,
with a CPU clocked at less than 1 GHz).
.Nm
uses memory-mapped buffers and metadata
(buffer indexes and lengths) to communicate with the kernel,
which is in charge of validating information through
.Xr ioctl 2 ,
.Xr select 2
and
.Xr poll 2 .
.Nm
can exploit the parallelism in multiqueue devices and
multicore systems.
.Pp
.Nm
requires explicit support in device drivers.
For a list of supported devices, see the end of this manual page.
.Sh OPERATION
.Nm
clients must first open the device
.Pa /dev/netmap ,
and then issue an
.Xr ioctl 2
with the
.Dv NIOCREGIF
command to bind the file descriptor to a network device.
.Pp
When a device is put in
.Nm
mode, its data path is disconnected from the host stack.
The processes owning the file descriptor
can exchange packets with the device, or with the host stack,
through a memory region, mapped with
.Xr mmap 2 ,
that contains pre-allocated buffers and metadata.
.Pp
Non-blocking I/O is done with special
.Xr ioctl 2
commands, whereas the file descriptor can be passed to
.Xr select 2
or
.Xr poll 2
to be notified about incoming packets or available transmit buffers.
.Ss Data structures
All data structures for all devices in
.Nm
mode are in a memory
region shared by the kernel and all processes
that open
.Pa /dev/netmap
(NOTE: visibility may be restricted in future implementations).
All references between the shared data structures
are relative (offsets or indexes).
Some macros help convert them into actual pointers.
.Pp
The data structures in shared memory are the following:
.Pp
.Bl -tag -width XXX
.It Dv struct netmap_if (one per interface)
indicates the number of rings supported by an interface, their
sizes, and the offsets of the
.Vt netmap_ring
structures associated with the interface.
The offset of a
.Vt struct netmap_if
in the shared memory region is indicated by the
.Va nr_offset
field in the structure returned by the
.Dv NIOCREGIF
ioctl (see below).
.Bd -literal
struct netmap_if {
    char          ni_name[IFNAMSIZ]; /* name of the interface. */
    const u_int   ni_num_queues;     /* number of hw ring pairs */
    const ssize_t ring_ofs[];        /* offset of tx and rx rings */
};
.Ed
.It Dv struct netmap_ring (one per ring)
contains the index of the current read or write slot (cur),
the number of slots available for reception or transmission (avail),
and an array of
.Va slot
entries describing the buffers.
There is one ring pair for each of the N hardware ring pairs
supported by the card (numbered 0..N-1), plus
one ring pair (numbered N) for packets from/to the host stack.
.Bd -literal
struct netmap_ring {
    const ssize_t  buf_ofs;
    const uint32_t num_slots;   /* number of slots in the ring. */
    uint32_t       avail;       /* number of usable slots */
    uint32_t       cur;         /* 'current' index for the user side */

    const uint16_t nr_buf_size;
    uint16_t       flags;
    struct netmap_slot slot[0]; /* array of slots. */
};
.Ed
.It Dv struct netmap_slot (one per packet)
contains the metadata for a packet: a buffer index (buf_idx),
a buffer length (len), and some flags.
.Bd -literal
struct netmap_slot {
    uint32_t buf_idx;   /* buffer index */
    uint16_t len;       /* packet length */
    uint16_t flags;     /* buf changed, etc. */
#define NS_BUF_CHANGED  0x0001  /* must resync, buffer changed */
#define NS_REPORT       0x0002  /* tell hw to report results
                                 * e.g. by generating an interrupt
                                 */
};
.Ed
.It Dv packet buffers
are fixed-size (approximately 2 KB) buffers allocated by the kernel
that contain packet data.
Buffer addresses are computed through macros.
.El
.Pp
Some macros support access to objects in the shared memory
region.
In particular:
.Bd -literal
struct netmap_if *nifp;
/* ... */
struct netmap_ring *txring = NETMAP_TXRING(nifp, i);
struct netmap_ring *rxring = NETMAP_RXRING(nifp, i);
uint32_t idx = txring->slot[txring->cur].buf_idx;
char *buf = NETMAP_BUF(txring, idx);
.Ed
.Ss IOCTLS
.Nm
supports some
.Xr ioctl 2
commands to synchronize the state of the rings
between the kernel and the user processes, plus some
to query and configure the interface.
The former do not require any argument, whereas the latter
use a
.Vt struct nmreq
defined as follows:
.Bd -literal
struct nmreq {
    char      nr_name[IFNAMSIZ];
    uint32_t  nr_offset;      /* nifp offset in the shared region */
    uint32_t  nr_memsize;     /* size of the shared region */
    uint32_t  nr_numdescs;    /* descriptors per queue */
    uint16_t  nr_numqueues;
    uint16_t  nr_ringid;      /* ring(s) we care about */
#define NETMAP_HW_RING    0x4000  /* low bits indicate one hw ring */
#define NETMAP_SW_RING    0x2000  /* we process the sw ring */
#define NETMAP_NO_TX_POLL 0x1000  /* no gratuitous txsync on poll */
#define NETMAP_RING_MASK  0xfff   /* the actual ring number */
};
.Ed
.Pp
A device descriptor obtained through
.Pa /dev/netmap
also supports the ioctls supported by network devices.
.Pp
The netmap-specific
.Xr ioctl 2
command codes below are defined in
.In net/netmap.h
and are:
.Bl -tag -width XXXX
.It Dv NIOCGINFO
returns information about the interface named in
.Va nr_name .
On return,
.Va nr_memsize
indicates the size of the shared netmap
memory region (this is device-independent),
.Va nr_numdescs
indicates how many buffers are in a ring, and
.Va nr_numqueues
indicates the number of rings supported by the hardware.
.Pp
If the device does not support
.Nm ,
the ioctl fails with
.Er EINVAL .
.It Dv NIOCREGIF
puts the interface named in
.Va nr_name
into
.Nm
mode, disconnecting it from the host stack, and/or defines which
rings are controlled through this file descriptor.
On return, it gives the same information as
.Dv NIOCGINFO ,
and
.Va nr_ringid
indicates the identity of the rings controlled through the file
descriptor.
.Pp
Possible values for
.Va nr_ringid
are:
.Bl -tag -width XXXXX
.It 0
default, all hardware rings
.It Dv NETMAP_SW_RING
the ``host rings'' connecting to the host stack
.It Dv NETMAP_HW_RING No + i
the i-th hardware ring
.El
.Pp
By default, a
.Xr poll 2
or
.Xr select 2
call pushes out any pending packets on the transmit ring, even if
no write events are specified.
The feature can be disabled by OR-ing
.Dv NETMAP_NO_TX_POLL
into
.Va nr_ringid .
Normally this feature should be kept unless separate file descriptors
are used for the transmit and receive rings: with the feature disabled,
packets are pushed out only when
.Dv NIOCTXSYNC
is issued or the transmit queue is full.
.Pp
.Dv NIOCREGIF
can be used multiple times to change the association of a
file descriptor to a ring pair, always within the same device.
.It Dv NIOCUNREGIF
brings an interface back to normal mode.
.It Dv NIOCTXSYNC
tells the hardware of new packets to transmit, and updates the
number of slots available for transmission.
.It Dv NIOCRXSYNC
tells the hardware of consumed packets, and asks for newly available
packets.
.El
.Ss SYSTEM CALLS
.Nm
uses
.Xr select 2
and
.Xr poll 2
to wake up processes when significant events occur, such as
incoming packets or available transmit buffers.
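.Pp
As an illustration of the
.Va nr_ringid
encoding described above, the following is a minimal sketch, not part of
.Nm
itself.
It assumes the flag values shown in the
.Vt struct nmreq
listing and redefines them locally so that it compiles on its own;
the helpers
.Fn ringid_hw
and
.Fn ringid_index
are hypothetical names used only for this example.
.Bd -literal
#include <assert.h>
#include <stdint.h>

/*
 * Local copies of the nr_ringid flags from the struct nmreq
 * listing; real programs get them from <net/netmap.h>.
 */
#define NETMAP_HW_RING    0x4000  /* low bits indicate one hw ring */
#define NETMAP_SW_RING    0x2000  /* we process the sw ring */
#define NETMAP_NO_TX_POLL 0x1000  /* no gratuitous txsync on poll */
#define NETMAP_RING_MASK  0xfff   /* the actual ring number */

/* Hypothetical helper: nr_ringid value selecting hardware ring i. */
static uint16_t
ringid_hw(unsigned int i, int no_tx_poll)
{
    /* the ring number must fit in the low bits */
    assert(i <= NETMAP_RING_MASK);
    return (uint16_t)(NETMAP_HW_RING | i |
        (no_tx_poll ? NETMAP_NO_TX_POLL : 0));
}

/* Hypothetical helper: recover the ring number from nr_ringid. */
static unsigned int
ringid_index(uint16_t ringid)
{
    return (ringid & NETMAP_RING_MASK);
}
.Ed
For instance, storing
.Fn ringid_hw 2 1
in
.Va nr_ringid
before
.Dv NIOCREGIF
would bind only hardware ring 2 and disable the implicit transmit
flush on
.Xr poll 2 .
Real code should take the constants from
.In net/netmap.h
rather than from local copies.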
.Sh EXAMPLES
The following code implements a simple traffic generator
that transmits on the first hardware ring of
.Li ix0
(error checking is omitted for brevity):
.Pp
.Bd -literal -compact
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <net/if.h>
#include <net/netmap.h>
#include <net/netmap_user.h>

int
main(void)
{
    struct netmap_if *nifp;
    struct netmap_ring *ring;
    struct nmreq nmr;
    struct pollfd fds;
    uint16_t len = 60;    /* frame length, without CRC */
    uint32_t i;
    char *p, *buf;
    int fd;

    fd = open("/dev/netmap", O_RDWR);
    memset(&nmr, 0, sizeof(nmr));
    strlcpy(nmr.nr_name, "ix0", sizeof(nmr.nr_name));
    ioctl(fd, NIOCREGIF, &nmr);
    p = mmap(NULL, nmr.nr_memsize, PROT_READ | PROT_WRITE,
        MAP_SHARED, fd, 0);
    nifp = NETMAP_IF(p, nmr.nr_offset);
    ring = NETMAP_TXRING(nifp, 0);
    fds.fd = fd;
    fds.events = POLLOUT;
    for (;;) {
        poll(&fds, 1, -1);    /* also pushes out filled slots */
        while (ring->avail > 0) {
            i = ring->cur;
            buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
            /* ... prepare a packet of len bytes in buf ... */
            ring->slot[i].len = len;
            ring->cur = NETMAP_RING_NEXT(ring, i);
            ring->avail--;
        }
    }
}
.Ed
.Sh SUPPORTED INTERFACES
.Nm
supports the following interfaces:
.Xr em 4 ,
.Xr ixgbe 4 ,
and
.Xr re 4 .
.Sh AUTHORS
The
.Nm
framework has been designed and implemented by
.An Luigi Rizzo
and
.An Matteo Landi
in 2011 at the Universita` di Pisa.