1.\" Copyright (c) 2011-2014 Matteo Landi, Luigi Rizzo, Universita` di Pisa
2.\" All rights reserved.
3.\"
4.\" Redistribution and use in source and binary forms, with or without
5.\" modification, are permitted provided that the following conditions
6.\" are met:
7.\" 1. Redistributions of source code must retain the above copyright
8.\"    notice, this list of conditions and the following disclaimer.
9.\" 2. Redistributions in binary form must reproduce the above copyright
10.\"    notice, this list of conditions and the following disclaimer in the
11.\"    documentation and/or other materials provided with the distribution.
12.\"
13.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
14.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
15.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
16.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
17.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
18.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
19.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
20.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
21.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
22.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
23.\" SUCH DAMAGE.
24.\"
25.\" This document is derived in part from the enet man page (enet.4)
26.\" distributed with 4.3BSD Unix.
27.\"
28.\" $FreeBSD$
29.\"
.Dd January 4, 2014
.Dt NETMAP 4
.Os
.Sh NAME
.Nm netmap
.Nd a framework for fast packet I/O
.br
.Nm VALE
.Nd a fast VirtuAl Local Ethernet using the netmap API
.Sh SYNOPSIS
.Cd device netmap
.Sh DESCRIPTION
.Nm
is a framework for extremely fast and efficient packet I/O
for both userspace and kernel clients.
It runs on FreeBSD and Linux,
and includes
.Nm VALE ,
a very fast and modular in-kernel software switch/dataplane.
.Pp
.Nm
and
.Nm VALE
are one order of magnitude faster than sockets,
.Xr bpf 4 ,
or native switches based on
.Xr tun 4
and
.Xr tap 4 ,
reaching 14.88 Mpps with much less than one core on a 10 Gbit NIC,
and 20 Mpps per core for VALE ports.
.Pp
Userspace clients can dynamically switch NICs into
.Nm
mode and send and receive raw packets through
memory mapped buffers.
A selectable file descriptor supports
synchronization and blocking I/O.
.Pp
Similarly,
.Nm VALE
can dynamically create switch instances and ports,
providing high speed packet I/O between processes,
virtual machines, NICs and the host stack.
.Pp
For best performance,
.Nm
requires explicit support in device drivers;
however, the
.Nm
API can be emulated on top of unmodified device drivers,
at the price of reduced performance
(but still better than sockets or BPF/pcap).
.Pp
In the rest of this (long) manual page we document
various aspects of the
.Nm
and
.Nm VALE
architecture, features and usage.
.Pp
.Sh ARCHITECTURE
.Nm
supports raw packet I/O through a
.Em port ,
which can be connected to a physical interface
.Em ( NIC ) ,
to the host stack,
or to a
.Nm VALE
switch.
Ports use preallocated circular queues of buffers
.Em ( rings )
residing in an mmapped region.
There is one ring for each transmit/receive queue of a
NIC or virtual port.
An additional ring pair connects to the host stack.
.Pp
After binding a file descriptor to a port, a
.Nm
client can send or receive packets in batches through
the rings, and possibly implement zero-copy forwarding
between ports.
.Pp
All NICs operating in
.Nm
mode use the same memory region,
accessible to all processes that own
.Pa /dev/netmap
file descriptors bound to NICs.
.Nm VALE
ports instead use separate memory regions.
.Pp
.Sh ENTERING AND EXITING NETMAP MODE
Ports and rings are created and controlled through a file descriptor,
created by opening a special device
.Dl fd = open("/dev/netmap", O_RDWR);
and then bound to a specific port with an
.Dl ioctl(fd, NIOCREGIF, (struct nmreq *)arg);
.Pp
.Nm
has multiple modes of operation controlled by the
.Vt struct nmreq
argument.
.Va arg.nr_name
specifies the port name, as follows:
.Bl -tag -width XXXX
.It Dv OS network interface name (e.g. 'em0', 'eth1', ... )
the data path of the NIC is disconnected from the host stack,
and the file descriptor is bound to the NIC (one or all queues),
or to the host stack;
.It Dv valeXXX:YYY (arbitrary XXX and YYY)
the file descriptor is bound to port YYY of a VALE switch called XXX,
both dynamically created if necessary.
The string cannot exceed IFNAMSIZ characters, and YYY cannot
be the name of any existing OS network interface.
.El
.Pp
On return,
.Va arg
indicates the size of the shared memory region,
and the number, size and location of all the
.Nm
data structures, which can be accessed by mmapping the memory
.Dl char *mem = mmap(0, arg.nr_memsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
.Pp
Non-blocking I/O is done with special
.Xr ioctl 2
commands;
.Xr select 2
and
.Xr poll 2
on the file descriptor permit blocking I/O.
.Xr epoll 2
and
.Xr kqueue 2
are not supported on
.Nm
file descriptors.
.Pp
While a NIC is in
.Nm
mode, the OS will still believe the interface is up and running.
OS-generated packets for that NIC end up in a
.Nm
ring, and another ring is used to send packets into the OS network stack.
A
.Xr close 2
on the file descriptor removes the binding,
and returns the NIC to normal mode (reconnecting the data path
to the host stack), or destroys the virtual port.
.Pp
.Sh DATA STRUCTURES
The data structures in the mmapped memory region are detailed in
.In net/netmap.h ,
which is the ultimate reference for the
.Nm
API. The main structures and fields are indicated below:
.Bl -tag -width XXX
.It Dv struct netmap_if (one per interface)
.Bd -literal
struct netmap_if {
    ...
    const uint32_t   ni_flags;          /* properties     */
    ...
    const uint32_t   ni_tx_rings;       /* NIC tx rings   */
    const uint32_t   ni_rx_rings;       /* NIC rx rings   */
    const uint32_t   ni_extra_tx_rings; /* extra tx rings */
    const uint32_t   ni_extra_rx_rings; /* extra rx rings */
    ...
};
.Ed
.Pp
Indicates the number of available rings
.Pa ( struct netmap_ring )
and their position in the mmapped region.
The number of tx and rx rings
.Pa ( ni_tx_rings , ni_rx_rings )
normally depends on the hardware.
NICs also have an extra tx/rx ring pair connected to the host stack.
.Em NIOCREGIF
can request additional tx/rx rings,
to be used between multiple processes/threads
accessing the same
.Nm
port.
.It Dv struct netmap_ring (one per ring)
.Bd -literal
struct netmap_ring {
    ...
    const uint32_t num_slots;   /* slots in each ring            */
    const uint32_t nr_buf_size; /* size of each buffer           */
    ...
    uint32_t       head;        /* (u) first buf owned by user   */
    uint32_t       cur;         /* (u) wakeup position           */
    const uint32_t tail;        /* (k) first buf owned by kernel */
    ...
    uint32_t       flags;
    struct timeval ts;          /* (k) time of last rxsync()      */
    ...
    struct netmap_slot slot[0]; /* array of slots                 */
}
.Ed
.Pp
Implements transmit and receive rings, with read/write
pointers, metadata and an array of
.Pa slots
describing the buffers.
.Pp
.It Dv struct netmap_slot (one per buffer)
.Bd -literal
struct netmap_slot {
    uint32_t buf_idx;           /* buffer index                 */
    uint16_t len;               /* packet length                */
    uint16_t flags;             /* buf changed, etc.            */
    uint64_t ptr;               /* address for indirect buffers */
};
.Ed
.Pp
Describes a packet buffer, which normally is identified by
an index and resides in the mmapped region.
.It Dv packet buffers
Fixed size (normally 2 KB) packet buffers allocated by the kernel.
.El
.Pp
The offset of the
.Pa struct netmap_if
in the mmapped region is indicated by the
.Pa nr_offset
field in the structure returned by
.Pa NIOCREGIF .
From there, all other objects are reachable through
relative references (offsets or indexes).
Macros and functions in <net/netmap_user.h>
help convert them into actual pointers:
.Pp
.Dl struct netmap_if  *nifp = NETMAP_IF(mem, arg.nr_offset);
.Dl struct netmap_ring *txr = NETMAP_TXRING(nifp, ring_index);
.Dl struct netmap_ring *rxr = NETMAP_RXRING(nifp, ring_index);
.Pp
.Dl char *buf = NETMAP_BUF(ring, buffer_index);
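.Pp
The pointer conversion can be illustrated in plain C. The sketch below is not part of the netmap API: the structures and the ring_ofs field are simplified stand-ins for the real layout in net/netmap.h, shown only to make the relative-offset arithmetic concrete.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the real structures in <net/netmap.h>;
 * field names and layout are illustrative only. */
struct toy_ring {
    uint32_t num_slots;
};

struct toy_if {
    uint32_t ni_tx_rings;
    int64_t  ring_ofs[4];   /* byte offsets of rings, relative to this struct */
};

/* Resolve a ring from its relative offset, the way NETMAP_TXRING()
 * turns a netmap_if pointer plus an offset into a ring pointer. */
static struct toy_ring *
ring_at(struct toy_if *nifp, int i)
{
    return (struct toy_ring *)((char *)nifp + nifp->ring_ofs[i]);
}
```

A real program never computes these offsets itself; it uses the NETMAP_IF/NETMAP_TXRING/NETMAP_RXRING/NETMAP_BUF macros, which encapsulate exactly this arithmetic.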
.Sh RINGS, BUFFERS AND DATA I/O
.Va Rings
are circular queues of packets with three indexes/pointers
.Va ( head , cur , tail ) ;
one slot is always kept empty.
The ring size
.Va ( num_slots )
should not be assumed to be a power of two.
.br
(NOTE: older versions of netmap used head/count format to indicate
the content of a ring).
.Pp
.Va head
is the first slot available to userspace;
.br
.Va cur
is the wakeup point:
select/poll will unblock when
.Va tail
passes
.Va cur ;
.br
.Va tail
is the first slot reserved to the kernel.
.Pp
Slot indexes MUST only move forward;
for convenience, the function
.Dl nm_ring_next(ring, index)
returns the next index modulo the ring size.
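.Pp
The wrap-around arithmetic is trivial but worth spelling out. A minimal C sketch (written for this page; the real helpers live in net/netmap_user.h):

```c
#include <assert.h>
#include <stdint.h>

/* Advance a slot index by one, wrapping at the ring size,
 * as nm_ring_next(ring, i) does. */
static uint32_t
ring_next(uint32_t num_slots, uint32_t i)
{
    return (i + 1 == num_slots) ? 0 : i + 1;
}

/* Count the slots in the range head .. tail-1 (modulo the ring
 * size), i.e. the slots currently owned by userspace. */
static uint32_t
ring_space(uint32_t num_slots, uint32_t head, uint32_t tail)
{
    return (tail >= head) ? tail - head : tail + num_slots - head;
}
```

Because one slot is always kept empty, ring_space() can return at most num_slots - 1.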
.Pp
.Va head
and
.Va cur
are only modified by the user program;
.Va tail
is only modified by the kernel.
The kernel only reads/writes the
.Vt struct netmap_ring
slots and buffers
during the execution of a netmap-related system call.
The only exceptions are slots (and buffers) in the range
.Va tail\  . . . head-1 ,
that are explicitly assigned to the kernel.
.Pp
.Ss TRANSMIT RINGS
On transmit rings, after a
.Nm
system call, slots in the range
.Va head\  . . . tail-1
are available for transmission.
User code should fill the slots sequentially
and advance
.Va head
and
.Va cur
past slots ready to transmit.
.Va cur
may be moved further ahead if the user code needs
more slots before further transmissions (see
.Sx SCATTER GATHER I/O ) .
.Pp
At the next NIOCTXSYNC/select()/poll(),
slots up to
.Va head-1
are pushed to the port, and
.Va tail
may advance if further slots have become available.
Below is an example of the evolution of a TX ring:
.Pp
.Bd -literal
    after the syscall, slots between cur and tail are (a)vailable
              head=cur   tail
               |          |
               v          v
     TX  [.....aaaaaaaaaaa.............]

    user creates new packets to (T)ransmit
                head=cur tail
                    |     |
                    v     v
     TX  [.....TTTTTaaaaaa.............]

    NIOCTXSYNC/poll()/select() sends packets and reports new slots
                head=cur      tail
                    |          |
                    v          v
     TX  [..........aaaaaaaaaaa........]
.Ed
.Pp
select() and poll() will block if there is no space in the ring, i.e.
.Dl ring->cur == ring->tail
and return when new slots have become available.
.Pp
High speed applications may want to amortize the cost of system calls
by preparing as many packets as possible before issuing them.
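.Pp
The batching idea can be sketched on a mock ring in plain C (the toy_ring structure below is a hypothetical stand-in, not the real struct netmap_ring): fill every slot in head..tail-1, then publish the whole batch by advancing head and cur once, before a single sync system call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NSLOTS 8            /* toy ring size, for illustration only */

struct toy_slot { uint16_t len; };
struct toy_ring {           /* hypothetical stand-in for struct netmap_ring */
    uint32_t head, cur, tail;
    struct toy_slot slot[NSLOTS];
};

static uint32_t toy_next(uint32_t i) { return (i + 1 == NSLOTS) ? 0 : i + 1; }

/* Fill all available TX slots, then move head/cur forward in one step.
 * Returns the number of packets queued; a real program would follow
 * this with a single NIOCTXSYNC or poll(). */
static int
fill_tx(struct toy_ring *r, uint16_t pktlen)
{
    int n = 0;
    uint32_t i = r->cur;

    while (i != r->tail) {      /* slots i .. tail-1 are available */
        r->slot[i].len = pktlen;
        i = toy_next(i);
        n++;
    }
    r->head = r->cur = i;       /* expose the whole batch at once */
    return n;
}
```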
.Pp
A transmit ring with pending transmissions has
.Dl ring->head != ring->tail + 1 (modulo the ring size).
The function
.Va int nm_tx_pending(ring)
implements this test.
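.Pp
The test reduces to one wrap-around comparison. A sketch of the same logic (written for this page, independent of the actual nm_tx_pending() implementation in net/netmap_user.h):

```c
#include <assert.h>
#include <stdint.h>

/* A ring has pending transmissions when head != tail + 1 modulo
 * the ring size, i.e. when the kernel still owns slots beyond
 * the one slot that is always kept empty. */
static int
tx_pending(uint32_t num_slots, uint32_t head, uint32_t tail)
{
    uint32_t next_tail = (tail + 1 == num_slots) ? 0 : tail + 1;

    return head != next_tail;
}
```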
.Pp
.Ss RECEIVE RINGS
On receive rings, after a
.Nm
system call, the slots in the range
.Va head\& . . . tail-1
contain received packets.
User code should process them and advance
.Va head
and
.Va cur
past slots it wants to return to the kernel.
.Va cur
may be moved further ahead if the user code wants to
wait for more packets
without returning all the previous slots to the kernel.
.Pp
At the next NIOCRXSYNC/select()/poll(),
slots up to
.Va head-1
are returned to the kernel for further receives, and
.Va tail
may advance to report new incoming packets.
.br
Below is an example of the evolution of an RX ring:
.Bd -literal
    after the syscall, there are some (h)eld and some (R)eceived slots
           head  cur     tail
            |     |       |
            v     v       v
     RX  [..hhhhhhRRRRRRRR..........]

    user advances head and cur, releasing some slots and holding others
               head cur  tail
                 |  |     |
                 v  v     v
     RX  [..*****hhhRRRRRR...........]

    NIOCRXSYNC/poll()/select() recovers slots and reports new packets
               head cur        tail
                 |  |           |
                 v  v           v
     RX  [.......hhhRRRRRRRRRRRR....]
.Ed
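.Pp
The receive-side pattern can likewise be sketched on a mock ring in plain C (hypothetical structures, not the netmap ABI): consume the slots in head..tail-1, then advance head and cur to release them at the next sync.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RX_NSLOTS 8                 /* toy ring size, for illustration */

struct rx_slot { uint16_t len; };
struct rx_ring {                    /* hypothetical stand-in */
    uint32_t head, cur, tail;
    struct rx_slot slot[RX_NSLOTS];
};

static uint32_t rx_next(uint32_t i) { return (i + 1 == RX_NSLOTS) ? 0 : i + 1; }

/* Consume every received slot and return the total payload bytes.
 * Advancing head releases the slots back to the kernel at the
 * next NIOCRXSYNC/poll(). */
static uint32_t
drain_rx(struct rx_ring *r)
{
    uint32_t total = 0;

    while (r->head != r->tail) {
        total += r->slot[r->head].len;
        r->head = rx_next(r->head);
    }
    r->cur = r->head;
    return total;
}
```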
.Pp
.Sh SLOTS AND PACKET BUFFERS
Normally, packets should be stored in the netmap-allocated buffers
assigned to slots when ports are bound to a file descriptor.
One packet is fully contained in a single buffer.
.Pp
The following flags affect slot and buffer processing:
.Bl -tag -width XXX
.It NS_BUF_CHANGED
it MUST be set whenever the buf_idx in the slot is changed.
This can be used to implement
zero-copy forwarding, see
.Sx ZERO-COPY FORWARDING .
.Pp
.It NS_REPORT
reports when this buffer has been transmitted.
Normally,
.Nm
notifies transmit completions in batches, hence signals
can be delayed indefinitely. This flag helps detect
when packets have been sent and a file descriptor can be closed.
.It NS_FORWARD
When a ring is in 'transparent' mode (see
.Sx TRANSPARENT MODE ) ,
packets marked with this flag are forwarded to the other endpoint
at the next system call, thus restoring (in a selective way)
the connection between a NIC and the host stack.
.It NS_NO_LEARN
tells the forwarding code that the SRC MAC address for this
packet must not be used in the learning bridge code.
.It NS_INDIRECT
indicates that the packet's payload is in a user-supplied buffer,
whose user virtual address is in the 'ptr' field of the slot.
The size can reach 65535 bytes.
.br
This is only supported on the transmit ring of
.Nm VALE
ports, and it helps reduce data copies in the interconnection
of virtual machines.
.It NS_MOREFRAG
indicates that the packet continues with subsequent buffers;
the last buffer in a packet must have the flag clear.
.El
.Sh SCATTER GATHER I/O
Packets can span multiple slots if the
.Va NS_MOREFRAG
flag is set in all but the last slot.
The maximum length of a chain is 64 buffers.
This is normally used with
.Nm VALE
ports when connecting virtual machines, as they generate large
TSO segments that are not split unless they reach a physical device.
.Pp
NOTE: The length field always refers to the individual
fragment; there is no field carrying the total length of a packet.
.Pp
On receive rings the macro
.Va NS_RFRAGS(slot)
indicates the remaining number of slots for this packet,
including the current one.
Slots with a value greater than 1 also have NS_MOREFRAG set.
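.Pp
Computing a packet's total length therefore means walking the fragment chain and summing the per-slot len fields. A C sketch over mock slots (the SG_MOREFRAG constant is an illustrative placeholder for NS_MOREFRAG, whose real value is defined in net/netmap.h):

```c
#include <assert.h>
#include <stdint.h>

#define SG_MOREFRAG 0x1   /* placeholder for NS_MOREFRAG, value illustrative */

struct sg_slot {          /* mock slot, not the netmap ABI */
    uint16_t len;
    uint16_t flags;
};

/* Sum the lengths of one packet's fragments starting at slots[i].
 * The chain ends at the first slot without the MOREFRAG bit.
 * Returns the total length; *nfrags reports the chain length. */
static uint32_t
packet_len(const struct sg_slot *slots, int i, int *nfrags)
{
    uint32_t total = 0;
    int n = 0;

    do {
        total += slots[i + n].len;
        n++;
    } while (slots[i + n - 1].flags & SG_MOREFRAG);
    *nfrags = n;
    return total;
}
```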
.Sh IOCTLS
.Nm
uses two ioctls (NIOCTXSYNC, NIOCRXSYNC)
for non-blocking I/O. They take no argument.
Two more ioctls (NIOCGINFO, NIOCREGIF) are used
to query and configure ports, with the following argument:
.Bd -literal
struct nmreq {
    char      nr_name[IFNAMSIZ]; /* (i) port name                  */
    uint32_t  nr_version;        /* (i) API version                */
    uint32_t  nr_offset;         /* (o) nifp offset in mmap region */
    uint32_t  nr_memsize;        /* (o) size of the mmap region    */
    uint32_t  nr_tx_slots;       /* (o) slots in tx rings          */
    uint32_t  nr_rx_slots;       /* (o) slots in rx rings          */
    uint16_t  nr_tx_rings;       /* (o) number of tx rings         */
    uint16_t  nr_rx_rings;       /* (o) number of rx rings         */
    uint16_t  nr_ringid;         /* (i) ring(s) we care about      */
    uint16_t  nr_cmd;            /* (i) special command            */
    uint16_t  nr_arg1;           /* (i) extra arguments            */
    uint16_t  nr_arg2;           /* (i) extra arguments            */
    ...
};
.Ed
.Pp
A file descriptor obtained through
.Pa /dev/netmap
also supports the ioctls supported by network devices, see
.Xr netintro 4 .
.Pp
.Bl -tag -width XXXX
.It Dv NIOCGINFO
returns EINVAL if the named port does not support netmap.
Otherwise, it returns 0 and (advisory) information
about the port.
Note that all the information below can change before the
interface is actually put in netmap mode.
.Pp
.Bl -tag -width XX
.It Pa nr_memsize
indicates the size of the
.Nm
memory region. NICs in
.Nm
mode all share the same memory region,
whereas
.Nm VALE
ports have independent regions for each port.
.It Pa nr_tx_slots , nr_rx_slots
indicate the size of transmit and receive rings.
.It Pa nr_tx_rings , nr_rx_rings
indicate the number of transmit
and receive rings.
Both ring number and sizes may be configured at runtime
using interface-specific functions (e.g.,
.Xr ethtool 8 ) .
.El
.It Dv NIOCREGIF
binds the port named in
.Va nr_name
to the file descriptor. For a physical device this also switches it into
.Nm
mode, disconnecting
it from the host stack.
Multiple file descriptors can be bound to the same port,
with proper synchronization left to the user.
.Pp
On return, it gives the same info as NIOCGINFO, and nr_ringid
indicates the identity of the rings controlled through the file
descriptor.
.Pp
.Va nr_ringid
selects which rings are controlled through this file descriptor.
Possible values are:
.Bl -tag -width XXXXX
.It 0
(default) all hardware rings
.It NETMAP_SW_RING
the ``host rings'', connecting to the host stack.
.It NETMAP_HW_RING | i
the i-th hardware ring.
.El
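.Pp
The nr_ringid value is thus a small bitfield combining a ring selector and option bits. The sketch below shows the composition only; the RINGID_* constants are illustrative placeholders, and a real program must use the NETMAP_SW_RING, NETMAP_HW_RING and NETMAP_NO_TX_SYNC macros from net/netmap.h.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative placeholders; real code must take the NETMAP_*
 * macros from <net/netmap.h> instead of defining values. */
#define RINGID_HW_RING    0x4000
#define RINGID_NO_TX_SYNC 0x8000

/* Build an nr_ringid requesting hardware ring i, optionally
 * suppressing the implicit tx sync on poll()/select(). */
static uint16_t
make_ringid(int i, int no_tx_sync)
{
    uint16_t v = RINGID_HW_RING | (uint16_t)i;

    if (no_tx_sync)
        v |= RINGID_NO_TX_SYNC;
    return v;
}
```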
.Pp
By default, a
.Xr poll 2
or
.Xr select 2
call pushes out any pending packets on the transmit ring, even if
no write events are specified.
The feature can be disabled by or-ing
.Va NETMAP_NO_TX_SYNC
into the value written to
.Va nr_ringid .
When this flag is set,
packets are transmitted only when
.Va ioctl(NIOCTXSYNC)
is called, or when select()/poll() are called with a write event
(POLLOUT/wfdset) or on a full ring.
.Pp
When registering a virtual interface that is dynamically created on a
.Xr vale 4
switch, one can specify the desired number of rings (1 by default,
and currently up to 16) using the nr_tx_rings and nr_rx_rings fields.
.It Dv NIOCTXSYNC
tells the hardware of new packets to transmit, and updates the
number of slots available for transmission.
.It Dv NIOCRXSYNC
tells the hardware of consumed packets, and asks for newly available
packets.
.El
.Sh SELECT AND POLL
.Xr select 2
and
.Xr poll 2
on a
.Nm
file descriptor process rings as indicated in
.Sx TRANSMIT RINGS
and
.Sx RECEIVE RINGS
when write (POLLOUT) and read (POLLIN) events are requested.
593.Pp
594Both block if no slots are available in the ring (
595.Va ring->cur == ring->tail )
596.Pp
597Packets in transmit rings are normally pushed out even without
598requesting write events. Passing the NETMAP_NO_TX_SYNC flag to
599.Em NIOCREGIF
600disables this feature.
.Sh LIBRARIES
The
.Nm
API is supposed to be used directly, both because of its simplicity and
for efficient integration with applications.
.Pp
For convenience, the
.Va <net/netmap_user.h>
header provides a few macros and functions to ease creating
a file descriptor and doing I/O with a
.Nm
port. These are loosely modeled after the
.Xr pcap 3
API, to ease porting of libpcap-based applications to
.Nm .
To use these extra functions, programs should
.Dl #define NETMAP_WITH_LIBS
before
.Dl #include <net/netmap_user.h>
.Pp
The following functions are available:
.Bl -tag -width XXXXX
.It Va  struct nm_desc_t * nm_open(const char *ifname, const char *ring_name, int flags, int ring_flags)
similar to
.Xr pcap_open ,
binds a file descriptor to a port.
.Bl -tag -width XX
.It Va ifname
is a port name, in the form "netmap:XXX" for a NIC and "valeXXX:YYY" for a
.Nm VALE
port.
.It Va flags
can be set to
.Va NETMAP_SW_RING
to bind to the host ring pair,
or to NETMAP_HW_RING to bind to a specific ring.
With NETMAP_HW_RING,
.Va ring_name
is interpreted as a string or an integer indicating the ring to use.
.It Va ring_flags
is copied directly into the ring flags, to specify additional parameters
such as NR_TIMESTAMP or NR_FORWARD.
.El
.It Va int nm_close(struct nm_desc_t *d)
closes the file descriptor, unmaps memory, frees resources.
.It Va int nm_inject(struct nm_desc_t *d, const void *buf, size_t size)
similar to pcap_inject(), pushes a packet to a ring, returns the size
of the packet if successful, or 0 on error;
.It Va int nm_dispatch(struct nm_desc_t *d, int cnt, nm_cb_t cb, u_char *arg)
similar to pcap_dispatch(), applies a callback to incoming packets
.It Va u_char * nm_nextpkt(struct nm_desc_t *d, struct nm_hdr_t *hdr)
similar to pcap_next(), fetches the next packet
.Pp
.El
.Sh SUPPORTED DEVICES
.Nm
natively supports the following devices:
.Pp
On FreeBSD:
.Xr em 4 ,
.Xr igb 4 ,
.Xr ixgbe 4 ,
.Xr lem 4 ,
.Xr re 4 .
.Pp
On Linux:
.Xr e1000 4 ,
.Xr e1000e 4 ,
.Xr igb 4 ,
.Xr ixgbe 4 ,
.Xr mlx4 4 ,
.Xr forcedeth 4 ,
.Xr r8169 4 .
.Pp
NICs without native support can still be used in
.Nm
mode through emulation. Performance is inferior to native netmap
mode but still significantly higher than sockets, and approaching
that of in-kernel solutions such as Linux's
.Xr pktgen .
.Pp
Emulation is also available for devices with native netmap support,
which can be used for testing or performance comparison.
The sysctl variable
.Va dev.netmap.admode
globally controls how netmap mode is implemented.
.Sh SYSCTL VARIABLES AND MODULE PARAMETERS
Some aspects of the operation of
.Nm
are controlled through sysctl variables on FreeBSD
.Em ( dev.netmap.* )
and module parameters on Linux
.Em ( /sys/module/netmap_lin/parameters/* ) :
.Pp
.Bl -tag -width indent
.It Va dev.netmap.admode: 0
Controls the use of native or emulated adapter mode.
0 uses the best available option, 1 forces native and
fails if not available, 2 forces emulated hence never fails.
.It Va dev.netmap.generic_ringsize: 1024
Ring size used for emulated netmap mode
.It Va dev.netmap.generic_mit: 100000
Controls interrupt moderation for emulated mode
.It Va dev.netmap.mmap_unreg: 0
.It Va dev.netmap.fwd: 0
Forces NS_FORWARD mode
.It Va dev.netmap.flags: 0
.It Va dev.netmap.txsync_retry: 2
.It Va dev.netmap.no_pendintr: 1
Forces recovery of transmit buffers on system calls
.It Va dev.netmap.mitigate: 1
Propagates interrupt mitigation to user processes
.It Va dev.netmap.no_timestamp: 0
Disables the update of the timestamp in the netmap ring
.It Va dev.netmap.verbose: 0
Verbose kernel messages
.It Va dev.netmap.buf_num: 163840
.It Va dev.netmap.buf_size: 2048
.It Va dev.netmap.ring_num: 200
.It Va dev.netmap.ring_size: 36864
.It Va dev.netmap.if_num: 100
.It Va dev.netmap.if_size: 1024
Sizes and number of objects (netmap_if, netmap_ring, buffers)
for the global memory region. The only parameter worth modifying is
.Va dev.netmap.buf_num
as it impacts the total amount of memory used by netmap.
.It Va dev.netmap.buf_curr_num: 0
.It Va dev.netmap.buf_curr_size: 0
.It Va dev.netmap.ring_curr_num: 0
.It Va dev.netmap.ring_curr_size: 0
.It Va dev.netmap.if_curr_num: 0
.It Va dev.netmap.if_curr_size: 0
Actual values in use.
.It Va dev.netmap.bridge_batch: 1024
Batch size used when moving packets across a
.Nm VALE
switch. Values above 64 generally guarantee good
performance.
.El
.Sh SYSTEM CALLS
.Nm
uses
.Xr select 2
and
.Xr poll 2
to wake up processes when significant events occur, and
.Xr mmap 2
to map memory.
.Xr ioctl 2
is used to configure ports and
.Nm VALE switches .
.Pp
Applications may need to create threads and bind them to
specific cores to improve performance, using standard
OS primitives, see
.Xr pthread 3 .
In particular,
.Xr pthread_setaffinity_np 3
may be of use.
.Sh CAVEATS
No matter how fast the CPU and OS are,
achieving line rate on 10G and faster interfaces
requires hardware with sufficient performance.
Several NICs are unable to sustain line rate with
small packet sizes. Insufficient PCIe or memory bandwidth
can also cause reduced performance.
.Pp
Another frequent reason for low performance is the use
of flow control on the link: a slow receiver can limit
the transmit speed.
Be sure to disable flow control when running high
speed experiments.
.Pp
.Ss SPECIAL NIC FEATURES
.Nm
is orthogonal to some NIC features such as
multiqueue, schedulers, packet filters.
.Pp
Multiple transmit and receive rings are supported natively
and can be configured with ordinary OS tools,
such as
.Xr ethtool 8
or
device-specific sysctl variables.
The same goes for Receive Packet Steering (RPS)
and filtering of incoming traffic.
.Pp
.Nm
.Em does not use
features such as
.Em checksum offloading , TCP segmentation offloading ,
.Em encryption , VLAN encapsulation/decapsulation ,
etc.
When using netmap to exchange packets with the host stack,
make sure to disable these features.
.Sh EXAMPLES
.Ss TEST PROGRAMS
.Nm
comes with a few programs that can be used for testing or
simple applications.
See the
.Va examples/
directory in
.Nm
distributions, or
.Va tools/tools/netmap/
directory in FreeBSD distributions.
.Pp
.Xr pkt-gen
is a general purpose traffic source/sink.
.Pp
As an example
.Dl pkt-gen -i ix0 -f tx -l 60
can generate an infinite stream of minimum size packets, and
.Dl pkt-gen -i ix0 -f rx
is a traffic sink.
Both print traffic statistics, to help monitor
how the system performs.
.Pp
.Xr pkt-gen
has many options that can be used to set packet sizes, addresses,
rates, and the number of send/receive threads and cores.
.Pp
.Xr bridge
is another test program which interconnects two
.Nm
ports. It can be used for transparent forwarding between
interfaces, as in
.Dl bridge -i ix0 -i ix1
or even connect the NIC to the host stack using netmap
.Dl bridge -i ix0 -i ix0
.Ss USING THE NATIVE API
The following code implements a traffic generator:
.Pp
.Bd -literal -compact
#include <net/netmap_user.h>
...
void sender(void)
{
    struct netmap_if *nifp;
    struct netmap_ring *ring;
    struct nmreq nmr;
    struct pollfd fds;
    void *p;
    char *buf;
    int fd, i;

    fd = open("/dev/netmap", O_RDWR);
    bzero(&nmr, sizeof(nmr));
    strcpy(nmr.nr_name, "ix0");
    nmr.nr_version = NETMAP_API;
    ioctl(fd, NIOCREGIF, &nmr);
    p = mmap(0, nmr.nr_memsize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    nifp = NETMAP_IF(p, nmr.nr_offset);
    ring = NETMAP_TXRING(nifp, 0);
    fds.fd = fd;
    fds.events = POLLOUT;
    for (;;) {
        poll(&fds, 1, -1);
        while (!nm_ring_empty(ring)) {
            i = ring->cur;
            buf = NETMAP_BUF(ring, ring->slot[i].buf_idx);
            ... prepare packet in buf ...
            ring->slot[i].len = ... packet length ...
            ring->head = ring->cur = nm_ring_next(ring, i);
        }
    }
}
.Ed
.Ss HELPER FUNCTIONS
A simple receiver can be implemented using the helper functions:
.Bd -literal -compact
#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>
...
void receiver(void)
{
    struct nm_desc_t *d;
    struct pollfd fds;
    u_char *buf;
    struct nm_hdr_t h;
    ...
    d = nm_open("netmap:ix0", NULL, 0, 0);
    fds.fd = NETMAP_FD(d);
    fds.events = POLLIN;
    for (;;) {
        poll(&fds, 1, -1);
        while ( (buf = nm_nextpkt(d, &h)) )
            consume_pkt(buf, h.len);
    }
    nm_close(d);
}
.Ed
.Ss ZERO-COPY FORWARDING
Since physical interfaces share the same memory region,
it is possible to do packet forwarding between ports
by swapping buffers. The buffer from the transmit ring is used
to replenish the receive ring:
.Bd -literal -compact
    uint32_t tmp;
    struct netmap_slot *src, *dst;
    ...
    src = &rxr->slot[rxr->cur];
    dst = &txr->slot[txr->cur];
    tmp = dst->buf_idx;
    dst->buf_idx = src->buf_idx;
    dst->len = src->len;
    dst->flags = NS_BUF_CHANGED;
    src->buf_idx = tmp;
    src->flags = NS_BUF_CHANGED;
    rxr->head = rxr->cur = nm_ring_next(rxr, rxr->cur);
    txr->head = txr->cur = nm_ring_next(txr, txr->cur);
    ...
.Ed
.Ss ACCESSING THE HOST STACK
.Ss VALE SWITCH
A simple way to test the performance of a
.Nm VALE
switch is to attach a sender and a receiver to it,
e.g. running the following in two different terminals:
.Dl pkt-gen -i vale1:a -f rx # receiver
.Dl pkt-gen -i vale1:b -f tx # sender
.Pp
The following command attaches an interface and the host stack
to a switch:
.Dl vale-ctl -h vale2:em0
Other
.Nm
clients attached to the same switch can now communicate
with the network card or the host.
.Pp
.Sh SEE ALSO
.Pp
http://info.iet.unipi.it/~luigi/netmap/
.Pp
Luigi Rizzo, Revisiting network I/O APIs: the netmap framework,
Communications of the ACM, 55 (3), pp.45-51, March 2012
.Pp
Luigi Rizzo, netmap: a novel framework for fast packet I/O,
Usenix ATC'12, June 2012, Boston
.Sh AUTHORS
.An -nosplit
The
.Nm
framework was originally designed and implemented at the
Universita` di Pisa in 2011 by
.An Luigi Rizzo ,
and further extended with help from
.An Matteo Landi ,
.An Gaetano Catalli ,
.An Giuseppe Lettieri ,
.An Vincenzo Maffione .
.Pp
.Nm
and
.Nm VALE
have been funded by the European Commission within FP7 Projects
CHANGE (257422) and OPENLAB (287581).
.Pp
.Ss SPECIAL MODES
When the device name has the form
.Dl valeXXX:ifname (ifname is an existing interface)
the physical interface
(and optionally the corresponding host stack endpoint)
is connected to or disconnected from the
.Nm VALE
switch named XXX.
In this case the
.Pa ioctl()
is used only for configuration, typically through the
.Xr vale-ctl
command.
The file descriptor cannot be used for I/O, and should be
closed after issuing the
.Pa ioctl() .
973