.\" Copyright (c) 2011 Matteo Landi, Luigi Rizzo, Universita` di Pisa
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" This document is derived in part from the enet man page (enet.4)
.\" distributed with 4.3BSD Unix.
.\"
.\" $FreeBSD$
.\" $Id: netmap.4 11563 2012-08-02 08:59:12Z luigi $: stable/8/share/man/man4/bpf.4 181694 2008-08-13 17:45:06Z ed $
.\"
.Dd September 23, 2013
.Dt NETMAP 4
.Os
.Sh NAME
.Nm netmap
.Nd a framework for fast packet I/O
.Sh SYNOPSIS
.Cd device netmap
.Sh DESCRIPTION
.Nm
is a framework for fast and safe access to network devices
(reaching 14.88 Mpps, the peak packet rate of a 10 Gbit/s link,
at a CPU clock of less than 1 GHz).
.Nm
uses memory-mapped buffers and metadata
(buffer indexes and lengths) to communicate with the kernel,
which validates this information when the client calls
.Pa ioctl()
or
.Pa select()/poll() .
.Nm
can exploit the parallelism in multiqueue devices and
multicore systems.
.Pp
.Nm
requires explicit support in device drivers.
For a list of supported devices, see the
.Sx SUPPORTED INTERFACES
section at the end of this manual page.
.Sh OPERATION
.Nm
clients must first open the special file
.Pa /dev/netmap ,
and then issue an
.Pa ioctl(...,NIOCREGIF,...)
to bind the file descriptor to a network device.
.Pp
When a device is put in
.Nm
mode, its data path is disconnected from the host stack.
The processes owning the file descriptor
can exchange packets with the device, or with the host stack,
through an mmapped memory region that contains pre-allocated
buffers and metadata.
.Pp
Non-blocking I/O is done with special
.Pa ioctl()'s ,
whereas the file descriptor can be passed to
.Pa select()/poll()
to be notified about incoming packets or available transmit buffers.
.Ss Data structures
All data structures for all devices in
.Nm
mode are in a memory
region shared by the kernel and all processes
that open
.Pa /dev/netmap
(NOTE: visibility may be restricted in future implementations).
All references between the shared data structures
are relative (offsets or indexes).
Some macros help convert
them into actual pointers.
.Pp
The data structures in shared memory are the following:
.Bl -tag -width XXX
.It Dv struct netmap_if (one per interface)
indicates the number of rings supported by an interface, their
sizes, and the offsets of the
.Pa netmap_rings
associated with the interface.
The offset of a
.Pa struct netmap_if
in the shared memory region is indicated by the
.Pa nr_offset
field in the structure returned by the
.Pa NIOCREGIF
ioctl (see below).
.Bd -literal
struct netmap_if {
    char ni_name[IFNAMSIZ];     /* name of the interface */
    const u_int ni_num_queues;  /* number of hw ring pairs */
    const ssize_t ring_ofs[];   /* offsets of tx and rx rings */
};
.Ed
.It Dv struct netmap_ring (one per ring)
contains the index of the current read or write slot (cur),
the number of slots available for reception or transmission (avail),
and an array of
.Pa slots
describing the buffers.
There is one ring pair for each of the N hardware ring pairs
supported by the card (numbered 0..N-1), plus
one ring pair (numbered N) for packets from/to the host stack.
.Bd -literal
struct netmap_ring {
    const ssize_t buf_ofs;
    const uint32_t num_slots; /* number of slots in the ring. */
    uint32_t avail;           /* number of usable slots */
    uint32_t cur;             /* 'current' index for the user side */
    uint32_t reserved;        /* not refilled before current */

    const uint16_t nr_buf_size;
    uint16_t flags;
    struct netmap_slot slot[0]; /* array of slots. */
};
.Ed
.It Dv struct netmap_slot (one per packet)
contains the metadata for a packet: a buffer index (buf_idx),
a buffer length (len), and some flags.
.Bd -literal
struct netmap_slot {
    uint32_t buf_idx; /* buffer index */
    uint16_t len;     /* packet length */
    uint16_t flags;   /* buf changed, etc. */
#define NS_BUF_CHANGED  0x0001  /* must resync, buffer changed */
#define NS_REPORT       0x0002  /* tell hw to report results
                                 * e.g. by generating an interrupt
                                 */
};
.Ed
.It Dv packet buffers
are fixed-size (approximately 2k) buffers allocated by the kernel
that contain packet data.
Buffer addresses are computed through macros.
.El
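.Pp
As a rough illustration of how cur and avail are meant to be used,
the following self-contained sketch drains a receive ring.
The model_ring and model_slot types and the model_ring_next() and
model_rx_drain() helpers are hypothetical stand-ins, introduced here only so
that the code builds without the netmap headers; a real client would use
struct netmap_ring and the macros from net/netmap_user.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for struct netmap_slot and struct netmap_ring. */
struct model_slot {
    uint32_t buf_idx;   /* buffer index */
    uint16_t len;       /* packet length */
    uint16_t flags;
};

struct model_ring {
    uint32_t num_slots; /* number of slots in the ring */
    uint32_t avail;     /* number of usable slots */
    uint32_t cur;       /* 'current' index for the user side */
    struct model_slot *slot;
};

/* Next slot index, wrapping to 0 at the end of the ring. */
static uint32_t
model_ring_next(const struct model_ring *ring, uint32_t i)
{
    return (i + 1 == ring->num_slots ? 0 : i + 1);
}

/* Consume every available slot; return the total number of bytes seen. */
static size_t
model_rx_drain(struct model_ring *ring)
{
    size_t bytes = 0;

    assert(ring != NULL && ring->cur < ring->num_slots);
    for (; ring->avail > 0; ring->avail--) {
        bytes += ring->slot[ring->cur].len;
        ring->cur = model_ring_next(ring, ring->cur);
    }
    return (bytes);
}
```

.Pp
In this model the loop stops when avail reaches zero; with a real ring the
client would then use NIOCRXSYNC or poll() to tell the kernel about the
consumed slots and to learn about new packets.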
.Pp
Some macros support the access to objects in the shared memory
region.
In particular:
.Bd -literal
struct netmap_if *nifp;     /* obtained with NETMAP_IF(), see EXAMPLES */
u_int n = 0;                /* ring number */
struct netmap_ring *txring = NETMAP_TXRING(nifp, n);
struct netmap_ring *rxring = NETMAP_RXRING(nifp, n);
uint32_t idx = txring->slot[txring->cur].buf_idx;
char *buf = NETMAP_BUF(txring, idx);
.Ed
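.Pp
The conversion from relative offsets to pointers can be sketched as follows.
This is a model, not the contents of net/netmap_user.h: it assumes that ring
offsets are relative to the interface descriptor and that buffer offsets are
relative to the ring, and the model_if and model_ring types and the
model_if_at(), model_ring_at() and model_buf() helpers are hypothetical names
used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical, simplified stand-ins for netmap_if and netmap_ring. */
struct model_if {
    unsigned int num_queues;    /* number of hw ring pairs */
    ssize_t ring_ofs[4];        /* assumed relative to this structure */
};

struct model_ring {
    ssize_t buf_ofs;            /* assumed relative to this ring */
    uint16_t buf_size;          /* size of each packet buffer */
};

/* Interface descriptor located 'offset' bytes into the mapped region. */
static struct model_if *
model_if_at(void *base, size_t offset)
{
    assert(base != NULL);
    return ((struct model_if *)((char *)base + offset));
}

/* Ring number 'index', found through its offset from the descriptor. */
static struct model_ring *
model_ring_at(struct model_if *nifp, unsigned int index)
{
    return ((struct model_ring *)((char *)nifp + nifp->ring_ofs[index]));
}

/* Address of the buffer with index 'idx', relative to the ring. */
static char *
model_buf(struct model_ring *ring, uint32_t idx)
{
    return ((char *)ring + ring->buf_ofs + (size_t)idx * ring->buf_size);
}
```

.Pp
Because only offsets and indexes are stored in the shared region, the same
contents remain valid regardless of the address at which each process maps it.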
.Sh IOCTLS
.Nm
supports some ioctl() to synchronize the state of the rings
between the kernel and the user processes, plus some
to query and configure the interface.
The former do not require any argument, whereas the latter
use a
.Pa struct nmreq
defined as follows:
.Bd -literal
struct nmreq {
        char      nr_name[IFNAMSIZ];
        uint32_t  nr_version;     /* API version */
#define NETMAP_API      3         /* current version */
        uint32_t  nr_offset;      /* nifp offset in the shared region */
        uint32_t  nr_memsize;     /* size of the shared region */
        uint32_t  nr_tx_slots;    /* slots in tx rings */
        uint32_t  nr_rx_slots;    /* slots in rx rings */
        uint16_t  nr_tx_rings;    /* number of tx rings */
        uint16_t  nr_rx_rings;    /* number of rx rings */
        uint16_t  nr_ringid;      /* ring(s) we care about */
#define NETMAP_HW_RING  0x4000    /* low bits indicate one hw ring */
#define NETMAP_SW_RING  0x2000    /* we process the sw ring */
#define NETMAP_NO_TX_POLL 0x1000  /* no gratuitous txsync on poll */
#define NETMAP_RING_MASK 0xfff    /* the actual ring number */
        uint16_t  spare1;
        uint32_t  spare2[4];
};
.Ed
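.Pp
A request is normally zeroed and then filled with the interface name and the
API version before it is passed to an ioctl.
The sketch below shows one defensive way to do that.
It uses a reduced, hypothetical copy of the structure (model_nmreq), a
stand-in MODEL_IFNAMSIZ for IFNAMSIZ, and a hypothetical helper
model_nmreq_init(), so that it compiles without the netmap headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_IFNAMSIZ  16      /* stand-in for IFNAMSIZ */
#define MODEL_API       3       /* mirrors NETMAP_API above */

/* Hypothetical, reduced copy of struct nmreq. */
struct model_nmreq {
    char     nr_name[MODEL_IFNAMSIZ];
    uint32_t nr_version;
    uint16_t nr_ringid;
};

/*
 * Zero the request, then set name, version and ring id.
 * Returns 0 on success, -1 if the name does not fit in nr_name.
 */
static int
model_nmreq_init(struct model_nmreq *req, const char *ifname, uint16_t ringid)
{
    size_t len;

    assert(req != NULL && ifname != NULL);
    memset(req, 0, sizeof(*req));
    len = strlen(ifname);
    if (len >= sizeof(req->nr_name))
        return (-1);
    memcpy(req->nr_name, ifname, len + 1);
    req->nr_version = MODEL_API;
    req->nr_ringid = ringid;
    return (0);
}
```

.Pp
Rejecting an over-long name, rather than truncating it, avoids binding to a
different interface than the one requested.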
A device descriptor obtained through
.Pa /dev/netmap
also supports the ioctls supported by network devices.
.Pp
The netmap-specific
.Xr ioctl 2
command codes below are defined in
.In net/netmap.h
and are:
.Bl -tag -width XXXX
.It Dv NIOCGINFO
returns information about the interface named in nr_name.
On return, nr_memsize indicates the size of the shared netmap
memory region (this is device-independent),
nr_tx_slots and nr_rx_slots indicate how many buffers are in a
transmit and receive ring,
nr_tx_rings and nr_rx_rings indicate the number of transmit
and receive rings supported by the hardware.
.Pp
If the device does not support netmap, the ioctl returns EINVAL.
.It Dv NIOCREGIF
puts the interface named in nr_name into netmap mode, disconnecting
it from the host stack, and/or defines which rings are controlled
through this file descriptor.
On return, it gives the same info as NIOCGINFO, and nr_ringid
indicates the identity of the rings controlled through the file
descriptor.
.Pp
Possible values for nr_ringid are
.Bl -tag -width XXXXX
.It 0
default, all hardware rings
.It NETMAP_SW_RING
the ``host rings'' connecting to the host stack
.It NETMAP_HW_RING + i
the i-th hardware ring
.El
.Pp
By default, a
.Xr poll 2
or
.Xr select 2
call pushes out any pending packets on the transmit ring, even if
no write events are specified.
The feature can be disabled by or-ing
.Dv NETMAP_NO_TX_POLL
into nr_ringid.
Normally you should keep this feature unless you are using
separate file descriptors for the send and receive rings, because
otherwise packets are pushed out only if NIOCTXSYNC is called,
or the send queue is full.
.Pp
.Pa NIOCREGIF
can be used multiple times to change the association of a
file descriptor to a ring pair, always within the same device.
.It Dv NIOCUNREGIF
brings an interface back to normal mode.
.It Dv NIOCTXSYNC
tells the hardware of new packets to transmit, and updates the
number of slots available for transmission.
.It Dv NIOCRXSYNC
tells the hardware of consumed packets, and asks for newly available
packets.
.El
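.Pp
The nr_ringid encoding described under NIOCREGIF can be illustrated with a few
bit operations.
The constants below are copied from the struct nmreq definition in this page;
the helper names ringid_for_hw_ring(), ringid_is_host() and ringid_hw_index()
are hypothetical and are not part of the netmap API.

```c
#include <assert.h>
#include <stdint.h>

/* Values as listed in this page; skipped if net/netmap.h defined them. */
#ifndef NETMAP_HW_RING
#define NETMAP_HW_RING    0x4000    /* low bits indicate one hw ring */
#define NETMAP_SW_RING    0x2000    /* we process the sw ring */
#define NETMAP_NO_TX_POLL 0x1000    /* no gratuitous txsync on poll */
#define NETMAP_RING_MASK  0xfff     /* the actual ring number */
#endif

/* Build an nr_ringid selecting hardware ring i, optionally without tx poll. */
static uint16_t
ringid_for_hw_ring(unsigned int i, int no_tx_poll)
{
    uint16_t id;

    assert(i <= NETMAP_RING_MASK);
    id = (uint16_t)(NETMAP_HW_RING | (i & NETMAP_RING_MASK));
    if (no_tx_poll)
        id = (uint16_t)(id | NETMAP_NO_TX_POLL);
    return (id);
}

/* Non-zero if the id selects the host ("software") rings. */
static int
ringid_is_host(uint16_t id)
{
    return ((id & NETMAP_SW_RING) != 0);
}

/* Ring number stored in the low bits of the id. */
static unsigned int
ringid_hw_index(uint16_t id)
{
    return (id & NETMAP_RING_MASK);
}
```

.Pp
For example, under these definitions hardware ring 2 with the poll-time
transmit disabled is encoded as 0x5002.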
.Sh SYSTEM CALLS
.Nm
uses
.Xr select 2
and
.Xr poll 2
to wake up processes when significant events occur.
.Sh EXAMPLES
The following code implements a traffic generator:
.Pp
.Bd -literal -compact
#include <sys/types.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <net/if.h>
#include <net/netmap.h>
#include <net/netmap_user.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>

struct netmap_if *nifp;
struct netmap_ring *ring;
struct nmreq nmr;
struct pollfd fds;
void *p;
char *buf;
uint32_t i;
int fd;

/* error checking omitted for brevity */
fd = open("/dev/netmap", O_RDWR);
bzero(&nmr, sizeof(nmr));
strlcpy(nmr.nr_name, "ix0", sizeof(nmr.nr_name));
nmr.nr_version = NETMAP_API;
ioctl(fd, NIOCREGIF, &nmr);
p = mmap(0, nmr.nr_memsize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
nifp = NETMAP_IF(p, nmr.nr_offset);
ring = NETMAP_TXRING(nifp, 0);
fds.fd = fd;
fds.events = POLLOUT;
for (;;) {
    poll(&fds, 1, -1);      /* wait for free transmit slots */
    for ( ; ring->avail > 0 ; ring->avail--) {
        i = ring->cur;
        buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
        ... prepare packet in buf ...
        ring->slot[i].len = ... packet length ...
        ring->cur = NETMAP_RING_NEXT(ring, i);
    }
}
.Ed
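.Pp
The inner loop of the generator can be exercised without a network device by
running it against an in-memory model of a transmit ring.
The sketch below does that; model_ring, model_slot, MODEL_BUF_SIZE and
model_tx_fill() are hypothetical names, and the flat bufs array is only a
stand-in for the buffers that NETMAP_BUF() would locate in the shared region.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_BUF_SIZE 2048     /* stand-in for the ring's nr_buf_size */

/* Hypothetical stand-ins for netmap_slot and a transmit netmap_ring. */
struct model_slot {
    uint32_t buf_idx;
    uint16_t len;
    uint16_t flags;
};

struct model_ring {
    uint32_t num_slots;
    uint32_t avail;
    uint32_t cur;
    struct model_slot *slot;
    char *bufs;                 /* buffers, MODEL_BUF_SIZE bytes each */
};

/*
 * Queue up to 'count' copies of 'frame' in the free slots of the ring.
 * Returns the number of slots actually filled.
 */
static unsigned int
model_tx_fill(struct model_ring *ring, const void *frame, uint16_t len,
    unsigned int count)
{
    unsigned int sent = 0;

    assert(ring != NULL && frame != NULL && len <= MODEL_BUF_SIZE);
    while (ring->avail > 0 && sent < count) {
        uint32_t i = ring->cur;
        char *buf = ring->bufs +
            (size_t)ring->slot[i].buf_idx * MODEL_BUF_SIZE;

        memcpy(buf, frame, len);        /* prepare packet in buf */
        ring->slot[i].len = len;
        ring->cur = (i + 1 == ring->num_slots ? 0 : i + 1);
        ring->avail--;
        sent++;
    }
    return (sent);
}
```

.Pp
With a real ring, the slots filled this way are handed to the hardware by the
next poll() or NIOCTXSYNC call.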
.Sh SUPPORTED INTERFACES
.Nm
supports the following interfaces:
.Xr em 4 ,
.Xr igb 4 ,
.Xr ixgbe 4 ,
.Xr lem 4 ,
.Xr re 4
.Sh SEE ALSO
.Xr vale 4
.Pp
http://info.iet.unipi.it/~luigi/netmap/
.Pp
Luigi Rizzo, Revisiting network I/O APIs: the netmap framework,
Communications of the ACM, 55 (3), pp.45-51, March 2012
.Pp
Luigi Rizzo, netmap: a novel framework for fast packet I/O,
Usenix ATC'12, June 2012, Boston
.Sh AUTHORS
.An -nosplit
The
.Nm
framework has been designed and implemented at the
Universita` di Pisa in 2011 by
.An Luigi Rizzo ,
with help from
.An Matteo Landi ,
.An Gaetano Catalli ,
.An Giuseppe Lettieri .
.Pp
.Nm
has been funded by the European Commission within FP7 Project CHANGE (257422).