/*
 * Copyright (C) 2011-2014 Matteo Landi
 * Copyright (C) 2011-2016 Luigi Rizzo
 * Copyright (C) 2011-2016 Giuseppe Lettieri
 * Copyright (C) 2011-2016 Vincenzo Maffione
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */


/*
 * $FreeBSD$
 *
 * This module supports memory mapped access to network devices,
 * see netmap(4).
 *
 * The module uses a large memory pool allocated by the kernel
 * and accessible as mmapped memory by multiple userspace threads/processes.
 * The memory pool contains packet buffers and "netmap rings",
 * i.e. user-accessible copies of the interface's queues.
 *
 * Access to the network card works like this:
 * 1. a process/thread issues one or more open() on /dev/netmap, to create
 *    select()able file descriptors on which events are reported.
 * 2. on each descriptor, the process issues an ioctl() to identify
 *    the interface that should report events to the file descriptor.
 * 3. on each descriptor, the process issues an mmap() request to
 *    map the shared memory region within the process' address space.
 *    The list of interesting queues is indicated by a location in
 *    the shared memory region.
 * 4. using the functions in the netmap(4) userspace API, a process
 *    can look up the occupation state of a queue, access memory buffers,
 *    and retrieve received packets or enqueue packets to transmit.
 * 5. using some ioctl()s the process can synchronize the userspace view
 *    of the queue with the actual status in the kernel. This includes both
 *    receiving the notification of new packets, and transmitting new
 *    packets on the output interface.
 * 6. select() or poll() can be used to wait for events on individual
 *    transmit or receive queues (or all queues for a given interface).
 *

		SYNCHRONIZATION (USER)

The netmap rings and data structures may be shared among multiple
user threads or even independent processes.
Any synchronization among those threads/processes is delegated
to the threads themselves. Only one thread at a time can be in
a system call on the same netmap ring. The OS does not enforce
this and only guarantees against system crashes in case of
invalid usage.
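
As a minimal sketch of steps 1-6 above, seen from userspace (error handling
omitted; "em0" is only an example name, and most applications would rather
use the nm_open()/nm_close() helpers from net/netmap_user.h than issue
these calls by hand):

	int fd = open("/dev/netmap", O_RDWR);			// step 1
	struct nmreq req;

	bzero(&req, sizeof(req));
	req.nr_version = NETMAP_API;
	strncpy(req.nr_name, "em0", sizeof(req.nr_name));
	ioctl(fd, NIOCREGIF, &req);				// step 2
	void *mem = mmap(NULL, req.nr_memsize,
	    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);		// step 3
	struct netmap_if *nifp = NETMAP_IF(mem, req.nr_offset);
	struct netmap_ring *txr = NETMAP_TXRING(nifp, 0);	// step 4
	// ... fill slots and advance txr->head and txr->cur ...
	ioctl(fd, NIOCTXSYNC, NULL);				// step 5
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	poll(&pfd, 1, -1);					// step 6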

		LOCKING (INTERNAL)

Within the kernel, access to the netmap rings is protected as follows:

- a spinlock on each ring, to handle producer/consumer races on
  RX rings attached to the host stack (against multiple host
  threads writing from the host stack to the same ring),
  and on 'destination' rings attached to a VALE switch
  (i.e. RX rings in VALE ports, and TX rings in NIC/host ports),
  protecting multiple active senders for the same destination.

- an atomic variable to guarantee that there is at most one
  instance of *_*xsync() on the ring at any time.
  For rings connected to user file
  descriptors, an atomic_test_and_set() protects this, and the
  lock on the ring is not actually used.
  For NIC RX rings connected to a VALE switch, an atomic_test_and_set()
  is also used to prevent multiple executions (the driver might indeed
  already guarantee this).
  For NIC TX rings connected to a VALE switch, the lock arbitrates
  access to the queue (both when allocating buffers and when pushing
  them out).

- *xsync() should be protected against initializations of the card.
  On FreeBSD most devices have the reset routine protected by
  a RING lock (ixgbe, igb, em) or core lock (re). lem is missing
  the RING protection on rx_reset(), this should be added.

  On linux there is an external lock on the tx path, which probably
  also arbitrates access to the reset routine. XXX to be revised

- a per-interface core_lock protecting access from the host stack
  while interfaces may be detached from netmap mode.
  XXX there should be no need for this lock if we detach the interfaces
  only while they are down.


		--- VALE SWITCH ---

NMG_LOCK() serializes all modifications to switches and ports.
A switch cannot be deleted until all ports are gone.

For each switch, an SX lock (RWlock on linux) protects
deletion of ports. When configuring or deleting a port, the
lock is acquired in exclusive mode (after holding NMG_LOCK).
When forwarding, the lock is acquired in shared mode (without NMG_LOCK).
The lock is held throughout the entire forwarding cycle,
during which the thread may incur a page fault.
Hence it is important that sleepable shared locks are used.

On the rx ring, the per-port lock is grabbed initially to reserve
a number of slots in the ring, then the lock is released,
packets are copied from source to destination, and then
the lock is acquired again and the receive ring is updated.
(A similar thing is done on the tx ring for NIC and host stack
ports attached to the switch)

 */


/* --- internals ----
 *
 * Roadmap to the code that implements the above.
 *
 * > 1. a process/thread issues one or more open() on /dev/netmap, to create
 * >    select()able file descriptor on which events are reported.
 *
 *      Internally, we allocate a netmap_priv_d structure, that will be
 *      initialized on ioctl(NIOCREGIF). There is one netmap_priv_d
 *      structure for each open().
 *
 *      os-specific:
 *          FreeBSD: see netmap_open() (netmap_freebsd.c)
 *          linux:   see linux_netmap_open() (netmap_linux.c)
 *
 * > 2. on each descriptor, the process issues an ioctl() to identify
 * >    the interface that should report events to the file descriptor.
 *
 *      Implemented by netmap_ioctl(), NIOCREGIF case, with nmr->nr_cmd==0.
 *      Most important things happen in netmap_get_na() and
 *      netmap_do_regif(), called from there. Additional details can be
 *      found in the comments above those functions.
 *
 *      In all cases, this action creates/takes-a-reference-to a
 *      netmap_*_adapter describing the port, and allocates a netmap_if
 *      and all necessary netmap rings, filling them with netmap buffers.
 *
 *      In this phase, the sync callbacks for each ring are set (these are used
 *      in steps 5 and 6 below). The callbacks depend on the type of adapter.
 *      The adapter creation/initialization code puts them in the
 *      netmap_adapter (fields na->nm_txsync and na->nm_rxsync). Then, they
 *      are copied from there to the netmap_kring's during netmap_do_regif(), by
 *      the nm_krings_create() callback. All the nm_krings_create callbacks
 *      actually call netmap_krings_create() to perform this and the other
 *      common stuff. netmap_krings_create() also takes care of the host rings,
 *      if needed, by setting their sync callbacks appropriately.
 *
 *      Additional actions depend on the kind of netmap_adapter that has been
 *      registered:
 *
 *      - netmap_hw_adapter:            [netmap.c]
 *           This is a system netdev/ifp with native netmap support.
 *           The ifp is detached from the host stack by redirecting:
 *              - transmissions (from the network stack) to netmap_transmit()
 *              - receive notifications to the nm_notify() callback for
 *                this adapter. The callback is normally netmap_notify(), unless
 *                the ifp is attached to a bridge using bwrap, in which case it
 *                is netmap_bwrap_intr_notify().
 *
 *      - netmap_generic_adapter:       [netmap_generic.c]
 *            A system netdev/ifp without native netmap support.
 *
 *      (the decision about native/non-native support is taken in
 *       netmap_get_hw_na(), called by netmap_get_na())
 *
 *      - netmap_vp_adapter             [netmap_vale.c]
 *            Returned by netmap_get_bdg_na().
 *            This is a persistent or ephemeral VALE port. Ephemeral ports
 *            are created on the fly if they don't already exist, and are
 *            always attached to a bridge.
 *            Persistent VALE ports must be created separately, and then
 *            attached like normal NICs. The NIOCREGIF we are examining
 *            will find them only if they had previously been created and
 *            attached (see VALE_CTL below).
 *
 *      - netmap_pipe_adapter           [netmap_pipe.c]
 *            Returned by netmap_get_pipe_na().
 *            Both pipe ends are created, if they didn't already exist.
 *
 *      - netmap_monitor_adapter        [netmap_monitor.c]
 *            Returned by netmap_get_monitor_na().
 *            If successful, the nm_sync callbacks of the monitored adapter
 *            will be intercepted by the returned monitor.
 *
 *      - netmap_bwrap_adapter          [netmap_vale.c]
 *            Cannot be obtained in this way, see VALE_CTL below
 *
 *
 *      os-specific:
 *          linux: we first go through linux_netmap_ioctl() to
 *                 adapt the FreeBSD interface to the linux one.
 *
 *
 * > 3. on each descriptor, the process issues an mmap() request to
 * >    map the shared memory region within the process' address space.
 * >    The list of interesting queues is indicated by a location in
 * >    the shared memory region.
 *
 *      os-specific:
 *          FreeBSD: netmap_mmap_single (netmap_freebsd.c).
 *          linux:   linux_netmap_mmap (netmap_linux.c).
 *
 * > 4. using the functions in the netmap(4) userspace API, a process
 * >    can look up the occupation state of a queue, access memory buffers,
 * >    and retrieve received packets or enqueue packets to transmit.
 *
 *      these actions do not involve the kernel.
 *
 * > 5. using some ioctl()s the process can synchronize the userspace view
 * >    of the queue with the actual status in the kernel. This includes both
 * >    receiving the notification of new packets, and transmitting new
 * >    packets on the output interface.
 *
 *      These are implemented in netmap_ioctl(), NIOCTXSYNC and NIOCRXSYNC
 *      cases. They invoke the nm_sync callbacks on the netmap_kring
 *      structures, as initialized in step 2 and maybe later modified
 *      by a monitor. Monitors, however, will always call the original
 *      callback before doing anything else.
 *
 *
 * > 6. select() or poll() can be used to wait for events on individual
 * >    transmit or receive queues (or all queues for a given interface).
 *
 *      Implemented in netmap_poll(). This will call the same nm_sync()
 *      callbacks as in step 5 above.
 *
 *      os-specific:
 *          linux: we first go through linux_netmap_poll() to adapt
 *                 the FreeBSD interface to the linux one.
 *
 *
 *  ---- VALE_CTL -----
 *
 *  VALE switches are controlled by issuing a NIOCREGIF with a non-null
 *  nr_cmd in the nmreq structure. These subcommands are handled by
 *  netmap_bdg_ctl() in netmap_vale.c. Persistent VALE ports are created
 *  and destroyed by issuing the NETMAP_BDG_NEWIF and NETMAP_BDG_DELIF
 *  subcommands, respectively.
 *
 *  Any network interface known to the system (including a persistent VALE
 *  port) can be attached to a VALE switch by issuing the
 *  NETMAP_BDG_ATTACH subcommand. After the attachment, persistent VALE ports
 *  look exactly like ephemeral VALE ports (as created in step 2 above). The
 *  attachment of other interfaces, instead, requires the creation of a
 *  netmap_bwrap_adapter. Moreover, the attached interface must be put in
 *  netmap mode. This may require the creation of a netmap_generic_adapter if
 *  we have no native support for the interface, or if generic adapters have
 *  been forced by sysctl.
 *
 *  Both persistent VALE ports and bwraps are handled by netmap_get_bdg_na(),
 *  called by nm_bdg_ctl_attach(), and discriminated by the nm_bdg_attach()
 *  callback. In the case of the bwrap, the callback creates the
 *  netmap_bwrap_adapter. The initialization of the bwrap is then
 *  completed by calling netmap_do_regif() on it, in the nm_bdg_ctl()
 *  callback (netmap_bwrap_bdg_ctl in netmap_vale.c).
 *  A generic adapter for the wrapped ifp will be created if needed, when
 *  netmap_get_bdg_na() calls netmap_get_hw_na().
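 *
 *  As an illustration (a sketch only: error handling is omitted and
 *  "valeA"/"em0" are purely hypothetical names), attaching an existing
 *  interface to a VALE switch from userspace amounts to:
 *
 *      struct nmreq req;
 *
 *      bzero(&req, sizeof(req));
 *      req.nr_version = NETMAP_API;
 *      strncpy(req.nr_name, "valeA:em0", sizeof(req.nr_name));
 *      req.nr_cmd = NETMAP_BDG_ATTACH;
 *      ioctl(fd, NIOCREGIF, &req);
 *
 *  Detaching uses NETMAP_BDG_DETACH in the same way; this is what the
 *  vale-ctl tool does internally.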
 *
 *
 *  ---- DATAPATHS -----
 *
 *              -= SYSTEM DEVICE WITH NATIVE SUPPORT =-
 *
 *    na == NA(ifp) == netmap_hw_adapter created in DEVICE_netmap_attach()
 *
 *    - tx from netmap userspace:
 *       concurrently:
 *           1) ioctl(NIOCTXSYNC)/netmap_poll() in process context
 *                kring->nm_sync() == DEVICE_netmap_txsync()
 *           2) device interrupt handler
 *                na->nm_notify()  == netmap_notify()
 *    - rx from netmap userspace:
 *       concurrently:
 *           1) ioctl(NIOCRXSYNC)/netmap_poll() in process context
 *                kring->nm_sync() == DEVICE_netmap_rxsync()
 *           2) device interrupt handler
 *                na->nm_notify()  == netmap_notify()
 *    - rx from host stack
 *       concurrently:
 *           1) host stack
 *                netmap_transmit()
 *                  na->nm_notify  == netmap_notify()
 *           2) ioctl(NIOCRXSYNC)/netmap_poll() in process context
 *                kring->nm_sync() == netmap_rxsync_from_host
 *                  netmap_rxsync_from_host(na, NULL, NULL)
 *    - tx to host stack
 *           ioctl(NIOCTXSYNC)/netmap_poll() in process context
 *             kring->nm_sync() == netmap_txsync_to_host
 *               netmap_txsync_to_host(na)
 *                 nm_os_send_up()
 *                   FreeBSD: na->if_input() == ether_input()
 *                   linux: netif_rx() with NM_MAGIC_PRIORITY_RX
 *
 *
 *              -= SYSTEM DEVICE WITH GENERIC SUPPORT =-
 *
 *    na == NA(ifp) == generic_netmap_adapter created in generic_netmap_attach()
 *
 *    - tx from netmap userspace:
 *       concurrently:
 *           1) ioctl(NIOCTXSYNC)/netmap_poll() in process context
 *               kring->nm_sync() == generic_netmap_txsync()
 *                   nm_os_generic_xmit_frame()
 *                       linux:   dev_queue_xmit() with NM_MAGIC_PRIORITY_TX
 *                           ifp->ndo_start_xmit == generic_ndo_start_xmit()
 *                               gna->save_start_xmit == orig. dev. start_xmit
 *                       FreeBSD: na->if_transmit() == orig. dev if_transmit
 *           2) generic_mbuf_destructor()
 *                   na->nm_notify() == netmap_notify()
 *    - rx from netmap userspace:
 *           1) ioctl(NIOCRXSYNC)/netmap_poll() in process context
 *               kring->nm_sync() == generic_netmap_rxsync()
 *                   mbq_safe_dequeue()
 *           2) device driver
 *               generic_rx_handler()
 *                   mbq_safe_enqueue()
 *                   na->nm_notify() == netmap_notify()
 *    - rx from host stack
 *        FreeBSD: same as native
 *        Linux: same as native except:
 *           1) host stack
 *               dev_queue_xmit() without NM_MAGIC_PRIORITY_TX
 *                   ifp->ndo_start_xmit == generic_ndo_start_xmit()
 *                       netmap_transmit()
 *                           na->nm_notify() == netmap_notify()
 *    - tx to host stack (same as native):
 *
 *
 *                           -= VALE =-
 *
 *   INCOMING:
 *
 *      - VALE ports:
 *          ioctl(NIOCTXSYNC)/netmap_poll() in process context
 *              kring->nm_sync() == netmap_vp_txsync()
 *
 *      - system device with native support:
 *         from cable:
 *             interrupt
 *                na->nm_notify() == netmap_bwrap_intr_notify(ring_nr != host ring)
 *                     kring->nm_sync() == DEVICE_netmap_rxsync()
 *                     netmap_vp_txsync()
 *                     kring->nm_sync() == DEVICE_netmap_rxsync()
 *         from host stack:
 *             netmap_transmit()
 *                na->nm_notify() == netmap_bwrap_intr_notify(ring_nr == host ring)
 *                     kring->nm_sync() == netmap_rxsync_from_host()
 *                     netmap_vp_txsync()
 *
 *      - system device with generic support:
 *         from device driver:
 *             generic_rx_handler()
 *                na->nm_notify() == netmap_bwrap_intr_notify(ring_nr != host ring)
 *                     kring->nm_sync() == generic_netmap_rxsync()
 *                     netmap_vp_txsync()
 *                     kring->nm_sync() == generic_netmap_rxsync()
 *         from host stack:
 *             netmap_transmit()
 *                na->nm_notify() == netmap_bwrap_intr_notify(ring_nr == host ring)
 *                     kring->nm_sync() == netmap_rxsync_from_host()
 *                     netmap_vp_txsync()
 *
 *      (all cases) --> nm_bdg_flush()
 *                         dest_na->nm_notify() == (see below)
 *
 *   OUTGOING:
 *
 *      - VALE ports:
 *         concurrently:
 *             1) ioctl(NIOCRXSYNC)/netmap_poll() in process context
 *                    kring->nm_sync() == netmap_vp_rxsync()
 *             2) from nm_bdg_flush()
 *                    na->nm_notify() == netmap_notify()
 *
 *      - system device with native support:
 *          to cable:
 *             na->nm_notify() == netmap_bwrap_notify()
 *                 netmap_vp_rxsync()
 *                 kring->nm_sync() == DEVICE_netmap_txsync()
 *                 netmap_vp_rxsync()
 *          to host stack:
 *                 netmap_vp_rxsync()
 *                 kring->nm_sync() == netmap_txsync_to_host
 *                 netmap_vp_rxsync_locked()
 *
 *      - system device with generic adapter:
 *          to device driver:
 *             na->nm_notify() == netmap_bwrap_notify()
 *                 netmap_vp_rxsync()
 *                 kring->nm_sync() == generic_netmap_txsync()
 *                 netmap_vp_rxsync()
 *          to host stack:
 *                 netmap_vp_rxsync()
 *                 kring->nm_sync() == netmap_txsync_to_host
 *                 netmap_vp_rxsync()
 *
 */

/*
 * OS-specific code that is used only within this file.
 * Other OS-specific code that must be accessed by drivers
 * is present in netmap_kern.h
 */

#if defined(__FreeBSD__)
#include <sys/cdefs.h> /* prerequisite */
#include <sys/types.h>
#include <sys/errno.h>
#include <sys/param.h>	/* defines used in kernel.h */
#include <sys/kernel.h>	/* types used in module initialization */
#include <sys/conf.h>	/* cdevsw struct, UID, GID */
#include <sys/filio.h>	/* FIONBIO */
#include <sys/sockio.h>
#include <sys/socketvar.h>	/* struct socket */
#include <sys/malloc.h>
#include <sys/poll.h>
#include <sys/rwlock.h>
#include <sys/socket.h> /* sockaddrs */
#include <sys/selinfo.h>
#include <sys/sysctl.h>
#include <sys/jail.h>
#include <net/vnet.h>
#include <net/if.h>
#include <net/if_var.h>
#include <net/bpf.h>		/* BIOCIMMEDIATE */
#include <machine/bus.h>	/* bus_dmamap_* */
#include <sys/endian.h>
#include <sys/refcount.h>


#elif defined(linux)

#include "bsd_glue.h"

#elif defined(__APPLE__)

#warning OSX support is only partial
#include "osx_glue.h"

#elif defined (_WIN32)

#include "win_glue.h"

#else

#error	Unsupported platform

#endif /* unsupported */

/*
 * common headers
 */
#include <net/netmap.h>
#include <dev/netmap/netmap_kern.h>
#include <dev/netmap/netmap_mem2.h>


/* user-controlled variables */
int netmap_verbose;

static int netmap_no_timestamp; /* don't timestamp on rxsync */
int netmap_mitigate = 1;
int netmap_no_pendintr = 1;
int netmap_txsync_retry = 2;
int netmap_adaptive_io = 0;
int netmap_flags = 0;	/* debug flags */
static int netmap_fwd = 0;	/* force transparent mode */

/*
 * netmap_admode selects the netmap mode to use.
 * Invalid values are reset to NETMAP_ADMODE_BEST
 */
enum {	NETMAP_ADMODE_BEST = 0,	/* use native, fallback to generic */
	NETMAP_ADMODE_NATIVE,	/* either native or none */
	NETMAP_ADMODE_GENERIC,	/* force generic */
	NETMAP_ADMODE_LAST };
static int netmap_admode = NETMAP_ADMODE_BEST;

/* netmap_generic_mit controls mitigation of RX notifications for
 * the generic netmap adapter. The value is a time interval in
 * nanoseconds. */
int netmap_generic_mit = 100*1000;

/* We use by default netmap-aware qdiscs with generic netmap adapters,
 * even if there can be a little performance hit with hardware NICs.
 * However, using the qdisc is the safer approach, for two reasons:
 * 1) it prevents non-fifo qdiscs from breaking the TX notification
 *    scheme, which is based on mbuf destructors when txqdisc is
 *    not used.
 * 2) it makes it possible to transmit over software devices that
 *    change skb->dev, like bridge, veth, ...
 *
 * Anyway, users looking for the best performance should
 * use native adapters.
 */
int netmap_generic_txqdisc = 1;

/* Default number of slots and queues for generic adapters. */
int netmap_generic_ringsize = 1024;
int netmap_generic_rings = 1;

/* Non-zero if ptnet devices are allowed to use virtio-net headers. */
int ptnet_vnet_hdr = 1;

/*
 * SYSCTL calls are grouped between SYSBEGIN and SYSEND to be emulated
 * in some other operating systems
 */
SYSBEGIN(main_init);

SYSCTL_DECL(_dev_netmap);
SYSCTL_NODE(_dev, OID_AUTO, netmap, CTLFLAG_RW, 0, "Netmap args");
SYSCTL_INT(_dev_netmap, OID_AUTO, verbose,
    CTLFLAG_RW, &netmap_verbose, 0, "Verbose mode");
SYSCTL_INT(_dev_netmap, OID_AUTO, no_timestamp,
    CTLFLAG_RW, &netmap_no_timestamp, 0, "no_timestamp");
SYSCTL_INT(_dev_netmap, OID_AUTO, mitigate, CTLFLAG_RW, &netmap_mitigate, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, no_pendintr,
    CTLFLAG_RW, &netmap_no_pendintr, 0, "Always look for new received packets.");
SYSCTL_INT(_dev_netmap, OID_AUTO, txsync_retry, CTLFLAG_RW,
    &netmap_txsync_retry, 0, "Number of txsync loops in bridge's flush.");
SYSCTL_INT(_dev_netmap, OID_AUTO, adaptive_io, CTLFLAG_RW,
    &netmap_adaptive_io, 0, "Adaptive I/O on paravirt");

SYSCTL_INT(_dev_netmap, OID_AUTO, flags, CTLFLAG_RW, &netmap_flags, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, fwd, CTLFLAG_RW, &netmap_fwd, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, admode, CTLFLAG_RW, &netmap_admode, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, generic_mit, CTLFLAG_RW, &netmap_generic_mit, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, generic_ringsize, CTLFLAG_RW, &netmap_generic_ringsize, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, generic_rings, CTLFLAG_RW, &netmap_generic_rings, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, generic_txqdisc, CTLFLAG_RW, &netmap_generic_txqdisc, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, ptnet_vnet_hdr, CTLFLAG_RW, &ptnet_vnet_hdr, 0, "");

SYSEND;

NMG_LOCK_T	netmap_global_lock;

/*
 * mark the ring as stopped, and run through the locks
 * to make sure other users get to see it.
 * stopped must be either NM_KR_STOPPED (for unbounded stop)
 * or NM_KR_LOCKED (brief stop for mutual exclusion purposes)
 */
static void
netmap_disable_ring(struct netmap_kring *kr, int stopped)
{
	nm_kr_stop(kr, stopped);
	// XXX check if nm_kr_stop is sufficient
	mtx_lock(&kr->q_lock);
	mtx_unlock(&kr->q_lock);
	nm_kr_put(kr);
}

/* stop or enable a single ring */
void
netmap_set_ring(struct netmap_adapter *na, u_int ring_id, enum txrx t, int stopped)
{
	if (stopped)
		netmap_disable_ring(NMR(na, t) + ring_id, stopped);
	else
		NMR(na, t)[ring_id].nkr_stopped = 0;
}


/* stop or enable all the rings of na */
void
netmap_set_all_rings(struct netmap_adapter *na, int stopped)
{
	int i;
	enum txrx t;

	if (!nm_netmap_on(na))
		return;

	for_rx_tx(t) {
		for (i = 0; i < netmap_real_rings(na, t); i++) {
			netmap_set_ring(na, i, t, stopped);
		}
	}
}

/*
 * Convenience function used in drivers. Waits for current txsync()s/rxsync()s
 * to finish and prevents any new one from starting. Call this before turning
 * netmap mode off, or before removing the hardware rings (e.g., on module
 * unload).
 */
void
netmap_disable_all_rings(struct ifnet *ifp)
{
	if (NM_NA_VALID(ifp)) {
		netmap_set_all_rings(NA(ifp), NM_KR_STOPPED);
	}
}

/*
 * Convenience function used in drivers. Re-enables rxsync and txsync on the
 * adapter's rings. In Linux drivers, this should be placed near each
 * napi_enable().
 */
void
netmap_enable_all_rings(struct ifnet *ifp)
{
	if (NM_NA_VALID(ifp)) {
		netmap_set_all_rings(NA(ifp), 0 /* enabled */);
	}
}

void
netmap_make_zombie(struct ifnet *ifp)
{
	if (NM_NA_VALID(ifp)) {
		struct netmap_adapter *na = NA(ifp);
		netmap_set_all_rings(na, NM_KR_LOCKED);
		na->na_flags |= NAF_ZOMBIE;
		netmap_set_all_rings(na, 0);
	}
}

void
netmap_undo_zombie(struct ifnet *ifp)
{
	if (NM_NA_VALID(ifp)) {
		struct netmap_adapter *na = NA(ifp);
		if (na->na_flags & NAF_ZOMBIE) {
			netmap_set_all_rings(na, NM_KR_LOCKED);
			na->na_flags &= ~NAF_ZOMBIE;
			netmap_set_all_rings(na, 0);
		}
	}
}

/*
 * generic bound_checking function
 */
u_int
nm_bound_var(u_int *v, u_int dflt, u_int lo, u_int hi, const char *msg)
{
	u_int oldv = *v;
	const char *op = NULL;

	if (dflt < lo)
		dflt = lo;
	if (dflt > hi)
		dflt = hi;
	if (oldv < lo) {
		*v = dflt;
		op = "Bump";
	} else if (oldv > hi) {
		*v = hi;
		op = "Clamp";
	}
	if (op && msg)
		printf("%s %s to %d (was %d)\n", op, msg, *v, oldv);
	return *v;
}


/*
 * packet-dump function, user-supplied or static buffer.
 * The destination buffer must be at least 30+4*len
 */
const char *
nm_dump_buf(char *p, int len, int lim, char *dst)
{
	static char _dst[8192];
	int i, j, i0;
	static char hex[] ="0123456789abcdef";
	char *o;	/* output position */

#define P_HI(x)	hex[((x) & 0xf0)>>4]
#define P_LO(x)	hex[((x) & 0xf)]
#define P_C(x)	((x) >= 0x20 && (x) <= 0x7e ? (x) : '.')
	if (!dst)
		dst = _dst;
	if (lim <= 0 || lim > len)
		lim = len;
	o = dst;
	sprintf(o, "buf 0x%p len %d lim %d\n", p, len, lim);
	o += strlen(o);
	/* hexdump routine */
	for (i = 0; i < lim; ) {
		sprintf(o, "%5d: ", i);
		o += strlen(o);
		memset(o, ' ', 48);
		i0 = i;
		for (j=0; j < 16 && i < lim; i++, j++) {
			o[j*3] = P_HI(p[i]);
			o[j*3+1] = P_LO(p[i]);
		}
		i = i0;
		for (j=0; j < 16 && i < lim; i++, j++)
			o[j + 48] = P_C(p[i]);
		o[j+48] = '\n';
		o += j+49;
	}
	*o = '\0';
#undef P_HI
#undef P_LO
#undef P_C
	return dst;
}


/*
 * Fetch configuration from the device, to cope with dynamic
 * reconfigurations after loading the module.
 */
/* call with NMG_LOCK held */
int
netmap_update_config(struct netmap_adapter *na)
{
	u_int txr, txd, rxr, rxd;

	txr = txd = rxr = rxd = 0;
	if (na->nm_config == NULL ||
	    na->nm_config(na, &txr, &txd, &rxr, &rxd))
	{
		/* take whatever we had at init time */
		txr = na->num_tx_rings;
		txd = na->num_tx_desc;
		rxr = na->num_rx_rings;
		rxd = na->num_rx_desc;
	}

	if (na->num_tx_rings == txr && na->num_tx_desc == txd &&
	    na->num_rx_rings == rxr && na->num_rx_desc == rxd)
		return 0; /* nothing changed */
	if (netmap_verbose || na->active_fds > 0) {
		D("stored config %s: txring %d x %d, rxring %d x %d",
			na->name,
			na->num_tx_rings, na->num_tx_desc,
			na->num_rx_rings, na->num_rx_desc);
		D("new config %s: txring %d x %d, rxring %d x %d",
			na->name, txr, txd, rxr, rxd);
	}
	if (na->active_fds == 0) {
		D("configuration changed (but fine)");
		na->num_tx_rings = txr;
		na->num_tx_desc = txd;
		na->num_rx_rings = rxr;
		na->num_rx_desc = rxd;
		return 0;
	}
	D("configuration changed while active, this is bad...");
	return 1;
}

/* nm_sync callbacks for the host rings */
static int netmap_txsync_to_host(struct netmap_kring *kring, int flags);
static int netmap_rxsync_from_host(struct netmap_kring *kring, int flags);

/* create the krings array and initialize the fields common to all adapters.
 * The array layout is this:
 *
 *                    +----------+
 * na->tx_rings ----->|          | \
 *                    |          |  } na->num_tx_rings
 *                    |          | /
 *                    +----------+
 *                    |          |    host tx kring
 * na->rx_rings ----> +----------+
 *                    |          | \
 *                    |          |  } na->num_rx_rings
 *                    |          | /
 *                    +----------+
 *                    |          |    host rx kring
 *                    +----------+
 * na->tailroom ----->|          | \
 *                    |          |  } tailroom bytes
 *                    |          | /
 *                    +----------+
 *
 * Note: for compatibility, host krings are created even when not needed.
 * The tailroom space is currently used by vale ports for allocating leases.
 */
/* call with NMG_LOCK held */
int
netmap_krings_create(struct netmap_adapter *na, u_int tailroom)
{
	u_int i, len, ndesc;
	struct netmap_kring *kring;
	u_int n[NR_TXRX];
	enum txrx t;

	/* account for the (possibly fake) host rings */
	n[NR_TX] = na->num_tx_rings + 1;
	n[NR_RX] = na->num_rx_rings + 1;

	len = (n[NR_TX] + n[NR_RX]) * sizeof(struct netmap_kring) + tailroom;

	na->tx_rings = malloc((size_t)len, M_DEVBUF, M_NOWAIT | M_ZERO);
	if (na->tx_rings == NULL) {
		D("Cannot allocate krings");
		return ENOMEM;
	}
	na->rx_rings = na->tx_rings + n[NR_TX];

	/*
	 * All fields in krings are 0 except the ones initialized below;
	 * but better be explicit on important kring fields.
	 */
	for_rx_tx(t) {
		ndesc = nma_get_ndesc(na, t);
		for (i = 0; i < n[t]; i++) {
			kring = &NMR(na, t)[i];
			bzero(kring, sizeof(*kring));
			kring->na = na;
			kring->ring_id = i;
			kring->tx = t;
			kring->nkr_num_slots = ndesc;
			kring->nr_mode = NKR_NETMAP_OFF;
			kring->nr_pending_mode = NKR_NETMAP_OFF;
			if (i < nma_get_nrings(na, t)) {
				kring->nm_sync = (t == NR_TX ? na->nm_txsync : na->nm_rxsync);
			} else {
				kring->nm_sync = (t == NR_TX ?
						netmap_txsync_to_host:
						netmap_rxsync_from_host);
			}
			kring->nm_notify = na->nm_notify;
			kring->rhead = kring->rcur = kring->nr_hwcur = 0;
			/*
			 * IMPORTANT: Always keep one slot empty.
			 */
			kring->rtail = kring->nr_hwtail = (t == NR_TX ? ndesc - 1 : 0);
			snprintf(kring->name, sizeof(kring->name) - 1, "%s %s%d", na->name,
					nm_txrx2str(t), i);
			ND("ktx %s h %d c %d t %d",
				kring->name, kring->rhead, kring->rcur, kring->rtail);
			mtx_init(&kring->q_lock, (t == NR_TX ? "nm_txq_lock" : "nm_rxq_lock"), NULL, MTX_DEF);
			nm_os_selinfo_init(&kring->si);
		}
		nm_os_selinfo_init(&na->si[t]);
	}

	na->tailroom = na->rx_rings + n[NR_RX];

	return 0;
}


/* undo the actions performed by netmap_krings_create */
/* call with NMG_LOCK held */
void
netmap_krings_delete(struct netmap_adapter *na)
{
	struct netmap_kring *kring = na->tx_rings;
	enum txrx t;

	for_rx_tx(t)
		nm_os_selinfo_uninit(&na->si[t]);

	/* we rely on the krings layout described above */
	for ( ; kring != na->tailroom; kring++) {
		mtx_destroy(&kring->q_lock);
		nm_os_selinfo_uninit(&kring->si);
	}
	free(na->tx_rings, M_DEVBUF);
	na->tx_rings = na->rx_rings = na->tailroom = NULL;
}


/*
 * Destructor for NIC ports. They also have an mbuf queue
 * on the rings connected to the host so we need to purge
 * them first.
 */
/* call with NMG_LOCK held */
void
netmap_hw_krings_delete(struct netmap_adapter *na)
{
	struct mbq *q = &na->rx_rings[na->num_rx_rings].rx_queue;

	ND("destroy sw mbq with len %d", mbq_len(q));
	mbq_purge(q);
	mbq_safe_fini(q);
	netmap_krings_delete(na);
}



/*
 * Undo everything that was done in netmap_do_regif(). In particular,
 * call nm_register(ifp,0) to stop netmap mode on the interface and
 * revert to normal operation.
 */
/* call with NMG_LOCK held */
static void netmap_unset_ringid(struct netmap_priv_d *);
static void netmap_krings_put(struct netmap_priv_d *);
void
netmap_do_unregif(struct netmap_priv_d *priv)
{
	struct netmap_adapter *na = priv->np_na;

	NMG_LOCK_ASSERT();
	na->active_fds--;
	/* unset nr_pending_mode and possibly release exclusive mode */
	netmap_krings_put(priv);

#ifdef	WITH_MONITOR
	/* XXX check whether we have to do something with monitor
	 * when rings change nr_mode. */
	if (na->active_fds <= 0) {
		/* walk through all the rings and tell any monitor
		 * that the port is going to exit netmap mode
		 */
		netmap_monitor_stop(na);
	}
#endif

	if (na->active_fds <= 0 || nm_kring_pending(priv)) {
		na->nm_register(na, 0);
	}

	/* delete rings and buffers that are no longer needed */
	netmap_mem_rings_delete(na);

	if (na->active_fds <= 0) {	/* last instance */
		/*
		 * (TO CHECK) We enter here
		 * when the last reference to this file descriptor goes
		 * away. This means we cannot have any pending poll()
		 * or interrupt routine operating on the structure.
		 * XXX The file may be closed in a thread while
		 * another thread is using it.
		 * Linux keeps the file opened until the last reference
		 * by any outstanding ioctl/poll or mmap is gone.
		 * FreeBSD does not track mmap()s (but we do) and
		 * wakes up any sleeping poll(). Need to check what
		 * happens if the close() occurs while a concurrent
		 * syscall is running.
		 */
		if (netmap_verbose)
			D("deleting last instance for %s", na->name);

		if (nm_netmap_on(na)) {
			D("BUG: netmap on while going to delete the krings");
		}

		na->nm_krings_delete(na);
	}

	/* possibly decrement counter of tx_si/rx_si users */
	netmap_unset_ringid(priv);
	/* delete the nifp */
	netmap_mem_if_delete(na, priv->np_nifp);
	/* drop the allocator */
	netmap_mem_deref(na->nm_mem, na);
	/* mark the priv as unregistered */
	priv->np_na = NULL;
	priv->np_nifp = NULL;
}

/* call with NMG_LOCK held */
static __inline int
nm_si_user(struct netmap_priv_d *priv, enum txrx t)
{
	return (priv->np_na != NULL &&
		(priv->np_qlast[t] - priv->np_qfirst[t] > 1));
}

struct netmap_priv_d*
netmap_priv_new(void)
{
	struct netmap_priv_d *priv;

	priv = malloc(sizeof(struct netmap_priv_d), M_DEVBUF,
			      M_NOWAIT | M_ZERO);
	if (priv == NULL)
		return NULL;
	priv->np_refs = 1;
	nm_os_get_module();
	return priv;
}

/*
 * Destructor of the netmap_priv_d, called when the fd is closed
 * Action: undo all the things done by NIOCREGIF,
 * On FreeBSD we need to track whether there are active mmap()s,
 * and we use np_active_mmaps for that. On linux, the field is always 0.
 * Return: 1 if we can free priv, 0 otherwise.
 *
 */
/* call with NMG_LOCK held */
void
netmap_priv_delete(struct netmap_priv_d *priv)
{
	struct netmap_adapter *na = priv->np_na;

	/* number of active references to this fd */
	if (--priv->np_refs > 0) {
		return;
	}
	nm_os_put_module();
	if (na) {
		netmap_do_unregif(priv);
	}
	netmap_unget_na(na, priv->np_ifp);
	bzero(priv, sizeof(*priv));	/* for safety */
	free(priv, M_DEVBUF);
}


/* call with NMG_LOCK *not* held */
void
netmap_dtor(void *data)
{
	struct netmap_priv_d *priv = data;

	NMG_LOCK();
	netmap_priv_delete(priv);
	NMG_UNLOCK();
}




/*
 * Handlers for synchronization of the queues from/to the host.
 * Netmap has two operating modes:
 * - in the default mode, the rings connected to the host stack are
 *   just another ring pair managed by userspace;
 * - in transparent mode (XXX to be defined) incoming packets
 *   (from the host or the NIC) are marked as NS_FORWARD upon
 *   arrival, and the user application has a chance to reset the
 *   flag for packets that should be dropped.
 *   On the RXSYNC or poll(), packets in RX rings between
 *   kring->nr_hwcur and ring->cur with NS_FORWARD still set are moved
 *   to the other side.
 * The transfer NIC --> host is relatively easy, just encapsulate
 * into mbufs and we are done. The host --> NIC side is slightly
 * harder because there might not be room in the tx ring so it
 * might take a while before releasing the buffer.
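 *
 * As a sketch of the transparent-mode behaviour just described, seen from
 * userspace (drop_this_packet() is a hypothetical application filter and
 * error handling is omitted): slots that still carry NS_FORWARD when they
 * are released are moved to the other side at the next sync:
 *
 *	while (!nm_ring_empty(ring)) {
 *		struct netmap_slot *slot = &ring->slot[ring->head];
 *
 *		if (drop_this_packet(slot))
 *			slot->flags &= ~NS_FORWARD;
 *		ring->head = ring->cur = nm_ring_next(ring, ring->head);
 *	}
 *	ioctl(fd, NIOCRXSYNC, NULL);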
 */


/*
 * pass a chain of buffers to the host stack as coming from 'dst'
 * We do not need to lock because the queue is private.
 */
static void
netmap_send_up(struct ifnet *dst, struct mbq *q)
{
	struct mbuf *m;
	struct mbuf *head = NULL, *prev = NULL;

	/* send packets up, outside the lock */
	while ((m = mbq_dequeue(q)) != NULL) {
		if (netmap_verbose & NM_VERB_HOST)
			D("sending up pkt %p size %d", m, MBUF_LEN(m));
		prev = nm_os_send_up(dst, m, prev);
		if (head == NULL)
			head = prev;
	}
	if (head)
		nm_os_send_up(dst, NULL, head);
	mbq_fini(q);
}


/*
 * put a copy of the buffers marked NS_FORWARD into an mbuf chain.
 * Take packets from hwcur to ring->head marked NS_FORWARD (or forced)
 * and pass them up. Drop remaining packets in the unlikely event
 * of an mbuf shortage.
 */
static void
netmap_grab_packets(struct netmap_kring *kring, struct mbq *q, int force)
{
	u_int const lim = kring->nkr_num_slots - 1;
	u_int const head = kring->rhead;
	u_int n;
	struct netmap_adapter *na = kring->na;

	for (n = kring->nr_hwcur; n != head; n = nm_next(n, lim)) {
		struct mbuf *m;
		struct netmap_slot *slot = &kring->ring->slot[n];

		if ((slot->flags & NS_FORWARD) == 0 && !force)
			continue;
		if (slot->len < 14 || slot->len > NETMAP_BUF_SIZE(na)) {
			RD(5, "bad pkt at %d len %d", n, slot->len);
			continue;
		}
		slot->flags &= ~NS_FORWARD; // XXX needed ?
		/* XXX TODO: adapt to the case of a multisegment packet */
		m = m_devget(NMB(na, slot), slot->len, 0, na->ifp, NULL);

		if (m == NULL)
			break;
		mbq_enqueue(q, m);
	}
}

static inline int
_nm_may_forward(struct netmap_kring *kring)
{
	return	((netmap_fwd || kring->ring->flags & NR_FORWARD) &&
		 kring->na->na_flags & NAF_HOST_RINGS &&
		 kring->tx == NR_RX);
}

static inline int
nm_may_forward_up(struct netmap_kring *kring)
{
	return	_nm_may_forward(kring) &&
		 kring->ring_id != kring->na->num_rx_rings;
}

static inline int
nm_may_forward_down(struct netmap_kring *kring)
{
	return	_nm_may_forward(kring) &&
		 kring->ring_id == kring->na->num_rx_rings;
}

/*
 * Send to the NIC rings packets marked NS_FORWARD between
 * kring->nr_hwcur and kring->rhead
 * Called under kring->rx_queue.lock on the sw rx ring,
 */
static u_int
netmap_sw_to_nic(struct netmap_adapter *na)
{
	struct netmap_kring *kring = &na->rx_rings[na->num_rx_rings];
	struct netmap_slot *rxslot = kring->ring->slot;
	u_int i, rxcur = kring->nr_hwcur;
	u_int const head = kring->rhead;
	u_int const src_lim = kring->nkr_num_slots - 1;
	u_int sent = 0;

	/* scan rings to find space, then fill as much as possible */
	for (i = 0; i < na->num_tx_rings; i++) {
		struct netmap_kring *kdst = &na->tx_rings[i];
		struct netmap_ring *rdst = kdst->ring;
		u_int const dst_lim = kdst->nkr_num_slots - 1;

		/* XXX do we trust ring or kring->rcur,rtail ? */
		for (; rxcur != head && !nm_ring_empty(rdst);
		     rxcur = nm_next(rxcur, src_lim) ) {
			struct netmap_slot *src, *dst, tmp;
			u_int dst_head = rdst->head;

			src = &rxslot[rxcur];
			if ((src->flags & NS_FORWARD) == 0 && !netmap_fwd)
				continue;

			sent++;

			dst = &rdst->slot[dst_head];

			tmp = *src;

			src->buf_idx = dst->buf_idx;
			src->flags = NS_BUF_CHANGED;

			dst->buf_idx = tmp.buf_idx;
			dst->len = tmp.len;
			dst->flags = NS_BUF_CHANGED;

			rdst->head = rdst->cur = nm_next(dst_head, dst_lim);
		}
		/* if (sent) XXX txsync ? */
	}
	return sent;
}


/*
 * netmap_txsync_to_host() passes packets up. We are called from a
 * system call in user process context, and the only contention
 * can be among multiple user threads erroneously calling
 * this routine concurrently.
 */
static int
netmap_txsync_to_host(struct netmap_kring *kring, int flags)
{
	struct netmap_adapter *na = kring->na;
	u_int const lim = kring->nkr_num_slots - 1;
	u_int const head = kring->rhead;
	struct mbq q;

	/* Take packets from hwcur to head and pass them up.
	 * force head = cur since netmap_grab_packets() stops at head
	 * In case of no buffers we give up. At the end of the loop,
	 * the queue is drained in all cases.
	 */
	mbq_init(&q);
	netmap_grab_packets(kring, &q, 1 /* force */);
	ND("have %d pkts in queue", mbq_len(&q));
	kring->nr_hwcur = head;
	kring->nr_hwtail = head + lim;
	if (kring->nr_hwtail > lim)
		kring->nr_hwtail -= lim + 1;

	netmap_send_up(na->ifp, &q);
	return 0;
}


/*
 * rxsync backend for packets coming from the host stack.
 * They have been put in kring->rx_queue by netmap_transmit().
 * We protect access to the kring using kring->rx_queue.lock
 *
 * This routine also does the selrecord if called from the poll handler
 * (we know because sr != NULL).
 *
 * returns the number of packets delivered to tx queues in
 * transparent mode, or a negative value if error
 */
static int
netmap_rxsync_from_host(struct netmap_kring *kring, int flags)
{
	struct netmap_adapter *na = kring->na;
	struct netmap_ring *ring = kring->ring;
	u_int nm_i, n;
	u_int const lim = kring->nkr_num_slots - 1;
	u_int const head = kring->rhead;
	int ret = 0;
	struct mbq *q = &kring->rx_queue, fq;

	mbq_init(&fq); /* fq holds packets to be freed */

	mbq_lock(q);

	/* First part: import newly received packets */
	n = mbq_len(q);
	if (n) { /* grab packets from the queue */
		struct mbuf *m;
		uint32_t stop_i;

		nm_i = kring->nr_hwtail;
		stop_i = nm_prev(nm_i, lim);
		while ( nm_i != stop_i && (m = mbq_dequeue(q)) != NULL ) {
			int len = MBUF_LEN(m);
			struct netmap_slot *slot = &ring->slot[nm_i];

			m_copydata(m, 0, len, NMB(na, slot));
			ND("nm %d len %d", nm_i, len);
			if (netmap_verbose)
				D("%s", nm_dump_buf(NMB(na, slot),len, 128, NULL));

			slot->len = len;
			slot->flags = kring->nkr_slot_flags;
			nm_i = nm_next(nm_i, lim);
			mbq_enqueue(&fq, m);
		}
		kring->nr_hwtail = nm_i;
	}

	/*
	 * Second part: skip past packets that userspace has released.
	 */
	nm_i = kring->nr_hwcur;
	if (nm_i != head) { /* something was released */
		if (nm_may_forward_down(kring)) {
			ret = netmap_sw_to_nic(na);
			if (ret > 0) {
				kring->nr_kflags |= NR_FORWARD;
				ret = 0;
			}
		}
		kring->nr_hwcur = head;
	}

	mbq_unlock(q);

	mbq_purge(&fq);
	mbq_fini(&fq);

	return ret;
}


/* Get a netmap adapter for the port.
 *
 * If it is possible to satisfy the request, return 0
 * with *na containing the netmap adapter found.
 * Otherwise return an error code, with *na containing NULL.
 *
 * When the port is attached to a bridge, we always return
 * EBUSY.
 * Otherwise, if the port is already bound to a file descriptor,
 * then we unconditionally return the existing adapter into *na.
 * In all the other cases, we return (into *na) either native,
 * generic or NULL, according to the following table:
 *
 *                                      native_support
 * active_fds   dev.netmap.admode         YES     NO
 * -------------------------------------------------------
 *    >0              *                 NA(ifp) NA(ifp)
 *
 *     0        NETMAP_ADMODE_BEST      NATIVE  GENERIC
 *     0        NETMAP_ADMODE_NATIVE    NATIVE   NULL
 *     0        NETMAP_ADMODE_GENERIC   GENERIC GENERIC
 *
 */
static void netmap_hw_dtor(struct netmap_adapter *); /* needed by NM_IS_NATIVE() */
int
netmap_get_hw_na(struct ifnet *ifp, struct netmap_adapter **na)
{
	/* generic support */
	int i = netmap_admode;	/* Take a snapshot. */
	struct netmap_adapter *prev_na;
	int error = 0;

	*na = NULL; /* default */

	/* reset in case of invalid value */
	if (i < NETMAP_ADMODE_BEST || i >= NETMAP_ADMODE_LAST)
		i = netmap_admode = NETMAP_ADMODE_BEST;

	if (NM_NA_VALID(ifp)) {
		prev_na = NA(ifp);
		/* If an adapter already exists, return it if
		 * there are active file descriptors or if
		 * netmap is not forced to use generic
		 * adapters.
		 */
		if (NETMAP_OWNED_BY_ANY(prev_na)
			|| i != NETMAP_ADMODE_GENERIC
			|| prev_na->na_flags & NAF_FORCE_NATIVE
#ifdef WITH_PIPES
			/* ugly, but we cannot allow an adapter switch
			 * if some pipe is referring to this one
			 */
			|| prev_na->na_next_pipe > 0
#endif
		) {
			*na = prev_na;
			return 0;
		}
	}

	/* If there isn't native support and netmap is not allowed
	 * to use generic adapters, we cannot satisfy the request.
	 */
	if (!NM_IS_NATIVE(ifp) && i == NETMAP_ADMODE_NATIVE)
		return EOPNOTSUPP;

	/* Otherwise, create a generic adapter and return it,
	 * saving the previously used netmap adapter, if any.
	 *
	 * Note that here 'prev_na', if not NULL, MUST be a
	 * native adapter, and CANNOT be a generic one. This is
	 * true because generic adapters are created on demand, and
	 * destroyed when not used anymore. Therefore, if the adapter
	 * currently attached to an interface 'ifp' is generic, it
	 * must be that
	 * (NA(ifp)->active_fds > 0 || NETMAP_OWNED_BY_KERN(NA(ifp))).
	 * Consequently, if NA(ifp) is generic, we will enter one of
	 * the branches above. This ensures that we never override
	 * a generic adapter with another generic adapter.
	 */
	error = generic_netmap_attach(ifp);
	if (error)
		return error;

	*na = NA(ifp);
	return 0;
}


/*
 * MUST BE CALLED UNDER NMG_LOCK()
 *
 * Get a refcounted reference to a netmap adapter attached
 * to the interface specified by nmr.
 * This is always called in the execution of an ioctl().
 *
 * Return ENXIO if the interface specified by the request does
 * not exist, ENOTSUP if netmap is not supported by the interface,
 * EBUSY if the interface is already attached to a bridge,
 * EINVAL if parameters are invalid, ENOMEM if needed resources
 * could not be allocated.
 * If successful, hold a reference to the netmap adapter.
 *
 * If the interface specified by nmr is a system one, also keep
 * a reference to it and return a valid *ifp.
 */
int
netmap_get_na(struct nmreq *nmr, struct netmap_adapter **na,
	      struct ifnet **ifp, int create)
{
	int error = 0;
	struct netmap_adapter *ret = NULL;

	*na = NULL;     /* default return value */
	*ifp = NULL;

	NMG_LOCK_ASSERT();

	/* We cascade through all possible types of netmap adapter.
	 * All netmap_get_*_na() functions return an error and an na,
	 * with the following combinations:
	 *
	 * error    na
	 *   0     NULL         type doesn't match
	 *  !0     NULL         type matches, but na creation/lookup failed
	 *   0    !NULL         type matches and na created/found
	 *  !0    !NULL         impossible
	 */

	/* try to see if this is a ptnetmap port */
	error = netmap_get_pt_host_na(nmr, na, create);
	if (error || *na != NULL)
		return error;

	/* try to see if this is a monitor port */
	error = netmap_get_monitor_na(nmr, na, create);
	if (error || *na != NULL)
		return error;

	/* try to see if this is a pipe port */
	error = netmap_get_pipe_na(nmr, na, create);
	if (error || *na != NULL)
		return error;

	/* try to see if this is a bridge port */
	error = netmap_get_bdg_na(nmr, na, create);
	if (error)
		return error;

	if (*na != NULL) /* valid match in netmap_get_bdg_na() */
		goto out;

	/*
	 * This must be a hardware na, lookup the name in the system.
	 * Note that by hardware we actually mean "it shows up in ifconfig".
	 * This may still be a tap, a veth/epair, or even a
	 * persistent VALE port.
	 */
	*ifp = ifunit_ref(nmr->nr_name);
	if (*ifp == NULL) {
		return ENXIO;
	}

	error = netmap_get_hw_na(*ifp, &ret);
	if (error)
		goto out;

	*na = ret;
	netmap_adapter_get(ret);

out:
	if (error) {
		if (ret)
			netmap_adapter_put(ret);
		if (*ifp) {
			if_rele(*ifp);
			*ifp = NULL;
		}
	}

	return error;
}

/* undo netmap_get_na() */
void
netmap_unget_na(struct netmap_adapter *na, struct ifnet *ifp)
{
	if (ifp)
		if_rele(ifp);
	if (na)
		netmap_adapter_put(na);
}


#define NM_FAIL_ON(t) do {						\
	if (unlikely(t)) {						\
		RD(5, "%s: fail '" #t "' "				\
			"h %d c %d t %d "				\
			"rh %d rc %d rt %d "				\
			"hc %d ht %d",					\
			kring->name,					\
			head, cur, ring->tail,				\
			kring->rhead, kring->rcur, kring->rtail,	\
			kring->nr_hwcur, kring->nr_hwtail);		\
		return kring->nkr_num_slots;				\
	}								\
} while (0)

/*
 * validate parameters on entry for *_txsync()
 * Returns ring->cur if ok, or something >= kring->nkr_num_slots
 * in case of error.
 *
 * rhead, rcur and rtail=hwtail are stored from previous round.
 * hwcur is the next packet to send to the ring.
 *
 * We want
 *    hwcur <= *rhead <= head <= cur <= tail = *rtail <= hwtail
 *
 * hwcur, rhead, rtail and hwtail are reliable
 */
u_int
nm_txsync_prologue(struct netmap_kring *kring, struct netmap_ring *ring)
{
	u_int head = ring->head; /* read only once */
	u_int cur = ring->cur; /* read only once */
	u_int n = kring->nkr_num_slots;

	ND(5, "%s kcur %d ktail %d head %d cur %d tail %d",
		kring->name,
		kring->nr_hwcur, kring->nr_hwtail,
		ring->head, ring->cur, ring->tail);
#if 1 /* kernel sanity checks; but we can trust the kring. */
	NM_FAIL_ON(kring->nr_hwcur >= n || kring->rhead >= n ||
	    kring->rtail >= n ||  kring->nr_hwtail >= n);
#endif /* kernel sanity checks */
	/*
	 * user sanity checks. We only use head,
	 * A, B, ... are possible positions for head:
	 *
	 *  0    A  rhead   B  rtail   C  n-1
	 *  0    D  rtail   E  rhead   F  n-1
	 *
	 * B, F, D are valid. A, C, E are wrong
	 */
	if (kring->rtail >= kring->rhead) {
		/* want rhead <= head <= rtail */
		NM_FAIL_ON(head < kring->rhead || head > kring->rtail);
		/* and also head <= cur <= rtail */
		NM_FAIL_ON(cur < head || cur > kring->rtail);
	} else { /* here rtail < rhead */
		/* we need head outside rtail .. rhead */
		NM_FAIL_ON(head > kring->rtail && head < kring->rhead);

		/* two cases now: head <= rtail or head >= rhead  */
		if (head <= kring->rtail) {
			/* want head <= cur <= rtail */
			NM_FAIL_ON(cur < head || cur > kring->rtail);
		} else { /* head >= rhead */
			/* cur must be outside rtail..head */
			NM_FAIL_ON(cur > kring->rtail && cur < head);
		}
	}
	if (ring->tail != kring->rtail) {
		RD(5, "tail overwritten was %d need %d",
			ring->tail, kring->rtail);
		ring->tail = kring->rtail;
	}
	kring->rhead = head;
	kring->rcur = cur;
	return head;
}


/*
 * validate parameters on entry for *_rxsync()
 * Returns ring->head if ok, kring->nkr_num_slots on error.
 *
 * For a valid configuration,
 * hwcur <= head <= cur <= tail <= hwtail
 *
 * We only consider head and cur.
 * hwcur and hwtail are reliable.
 *
 */
u_int
nm_rxsync_prologue(struct netmap_kring *kring, struct netmap_ring *ring)
{
	uint32_t const n = kring->nkr_num_slots;
	uint32_t head, cur;

	ND(5,"%s kc %d kt %d h %d c %d t %d",
		kring->name,
		kring->nr_hwcur, kring->nr_hwtail,
		ring->head, ring->cur, ring->tail);
	/*
	 * Before storing the new values, we should check they do not
	 * move backwards. However:
	 * - head is not an issue because the previous value is hwcur;
	 * - cur could in principle go back, however it does not matter
	 *   because we are processing a brand new rxsync()
	 */
	cur = kring->rcur = ring->cur;	/* read only once */
	head = kring->rhead = ring->head;	/* read only once */
#if 1 /* kernel sanity checks */
	NM_FAIL_ON(kring->nr_hwcur >= n || kring->nr_hwtail >= n);
#endif /* kernel sanity checks */
	/* user sanity checks */
	if (kring->nr_hwtail >= kring->nr_hwcur) {
		/* want hwcur <= rhead <= hwtail */
		NM_FAIL_ON(head < kring->nr_hwcur || head > kring->nr_hwtail);
		/* and also rhead <= rcur <= hwtail */
		NM_FAIL_ON(cur < head || cur > kring->nr_hwtail);
	} else {
		/* we need rhead outside hwtail..hwcur */
		NM_FAIL_ON(head < kring->nr_hwcur && head > kring->nr_hwtail);
		/* two cases now: head <= hwtail or head >= hwcur  */
		if (head <= kring->nr_hwtail) {
			/* want head <= cur <= hwtail */
			NM_FAIL_ON(cur < head || cur > kring->nr_hwtail);
		} else {
			/* cur must be outside hwtail..head */
			NM_FAIL_ON(cur < head && cur > kring->nr_hwtail);
		}
	}
	if (ring->tail != kring->rtail) {
		RD(5, "%s tail overwritten was %d need %d",
			kring->name,
			ring->tail, kring->rtail);
		ring->tail = kring->rtail;
	}
	return head;
}


/*
 * Error routine called when txsync/rxsync detects an error.
 * Can't do much more than resetting head = cur = hwcur, tail = hwtail
 * Return 1 on reinit.
 *
 * This routine is only called by the upper half of the kernel.
 * It only reads hwcur (which is changed only by the upper half, too)
 * and hwtail (which may be changed by the lower half, but only on
 * a tx ring and only to increase it, so any error will be recovered
 * on the next call). For the above, we don't strictly need to call
 * it under lock.
 */
int
netmap_ring_reinit(struct netmap_kring *kring)
{
	struct netmap_ring *ring = kring->ring;
	u_int i, lim = kring->nkr_num_slots - 1;
	int errors = 0;

	// XXX KASSERT nm_kr_tryget
	RD(10, "called for %s", kring->name);
	// XXX probably wrong to trust userspace
	kring->rhead = ring->head;
	kring->rcur  = ring->cur;
	kring->rtail = ring->tail;

	if (ring->cur > lim)
		errors++;
	if (ring->head > lim)
		errors++;
	if (ring->tail > lim)
		errors++;
	for (i = 0; i <= lim; i++) {
		u_int idx = ring->slot[i].buf_idx;
		u_int len = ring->slot[i].len;
		if (idx < 2 || idx >= kring->na->na_lut.objtotal) {
			RD(5, "bad index at slot %d idx %d len %d ", i, idx, len);
			ring->slot[i].buf_idx = 0;
			ring->slot[i].len = 0;
		} else if (len > NETMAP_BUF_SIZE(kring->na)) {
			ring->slot[i].len = 0;
			RD(5, "bad len at slot %d idx %d len %d", i, idx, len);
		}
	}
	if (errors) {
		RD(10, "total %d errors", errors);
		RD(10, "%s reinit, cur %d -> %d tail %d -> %d",
			kring->name,
			ring->cur, kring->nr_hwcur,
			ring->tail, kring->nr_hwtail);
		ring->head = kring->rhead = kring->nr_hwcur;
		ring->cur  = kring->rcur  = kring->nr_hwcur;
		ring->tail = kring->rtail = kring->nr_hwtail;
	}
	return (errors ? 1 : 0);
}

/* interpret the ringid and flags fields of an nmreq, by translating them
 * into a pair of intervals of ring indices:
 *
 * [priv->np_txqfirst, priv->np_txqlast) and
 * [priv->np_rxqfirst, priv->np_rxqlast)
 *
 */
int
netmap_interp_ringid(struct netmap_priv_d *priv, uint16_t ringid, uint32_t flags)
{
	struct netmap_adapter *na = priv->np_na;
	u_int j, i = ringid & NETMAP_RING_MASK;
	u_int reg = flags & NR_REG_MASK;
	int excluded_direction[] = { NR_TX_RINGS_ONLY, NR_RX_RINGS_ONLY };
	enum txrx t;

	if (reg == NR_REG_DEFAULT) {
		/* convert from old ringid to flags */
		if (ringid & NETMAP_SW_RING) {
			reg = NR_REG_SW;
		} else if (ringid & NETMAP_HW_RING) {
			reg = NR_REG_ONE_NIC;
		} else {
			reg = NR_REG_ALL_NIC;
		}
		D("deprecated API, old ringid 0x%x -> ringid %x reg %d", ringid, i, reg);
	}

	if ((flags & NR_PTNETMAP_HOST) && (reg != NR_REG_ALL_NIC ||
			flags & (NR_RX_RINGS_ONLY|NR_TX_RINGS_ONLY))) {
		D("Error: only NR_REG_ALL_NIC supported with netmap passthrough");
		return EINVAL;
	}

	for_rx_tx(t) {
		if (flags & excluded_direction[t]) {
			priv->np_qfirst[t] = priv->np_qlast[t] = 0;
			continue;
		}
		switch (reg) {
		case NR_REG_ALL_NIC:
		case NR_REG_PIPE_MASTER:
		case NR_REG_PIPE_SLAVE:
			priv->np_qfirst[t] = 0;
			priv->np_qlast[t] = nma_get_nrings(na, t);
			ND("ALL/PIPE: %s %d %d", nm_txrx2str(t),
				priv->np_qfirst[t], priv->np_qlast[t]);
			break;
		case NR_REG_SW:
		case NR_REG_NIC_SW:
			if (!(na->na_flags & NAF_HOST_RINGS)) {
				D("host rings not supported");
				return EINVAL;
			}
			priv->np_qfirst[t] = (reg == NR_REG_SW ?
				nma_get_nrings(na, t) : 0);
			priv->np_qlast[t] = nma_get_nrings(na, t) + 1;
			ND("%s: %s %d %d", reg == NR_REG_SW ? "SW" : "NIC+SW",
				nm_txrx2str(t),
				priv->np_qfirst[t], priv->np_qlast[t]);
			break;
		case NR_REG_ONE_NIC:
			if (i >= na->num_tx_rings && i >= na->num_rx_rings) {
				D("invalid ring id %d", i);
				return EINVAL;
			}
			/* if not enough rings, use the first one */
			j = i;
			if (j >= nma_get_nrings(na, t))
				j = 0;
			priv->np_qfirst[t] = j;
			priv->np_qlast[t] = j + 1;
			ND("ONE_NIC: %s %d %d", nm_txrx2str(t),
				priv->np_qfirst[t], priv->np_qlast[t]);
			break;
		default:
			D("invalid regif type %d", reg);
			return EINVAL;
		}
	}
	priv->np_flags = (flags & ~NR_REG_MASK) | reg;

	if (netmap_verbose) {
		D("%s: tx [%d,%d) rx [%d,%d) id %d",
			na->name,
			priv->np_qfirst[NR_TX],
			priv->np_qlast[NR_TX],
			priv->np_qfirst[NR_RX],
			priv->np_qlast[NR_RX],
			i);
	}
	return 0;
}


/*
 * Set the ring ID. For devices with a single queue, a request
 * for all rings is the same as a single ring.
 */
static int
netmap_set_ringid(struct netmap_priv_d *priv, uint16_t ringid, uint32_t flags)
{
	struct netmap_adapter *na = priv->np_na;
	int error;
	enum txrx t;

	error = netmap_interp_ringid(priv, ringid, flags);
	if (error) {
		return error;
	}

	priv->np_txpoll = (ringid & NETMAP_NO_TX_POLL) ? 0 : 1;

	/* optimization: count the users registered for more than
	 * one ring, which are the ones sleeping on the global queue.
	 * The default netmap_notify() callback will then
	 * avoid signaling the global queue if nobody is using it
	 */
	for_rx_tx(t) {
		if (nm_si_user(priv, t))
			na->si_users[t]++;
	}
	return 0;
}

static void
netmap_unset_ringid(struct netmap_priv_d *priv)
{
	struct netmap_adapter *na = priv->np_na;
	enum txrx t;

	for_rx_tx(t) {
		if (nm_si_user(priv, t))
			na->si_users[t]--;
		priv->np_qfirst[t] = priv->np_qlast[t] = 0;
	}
	priv->np_flags = 0;
	priv->np_txpoll = 0;
}


/* Set the nr_pending_mode for the requested rings.
 * If requested, also try to get exclusive access to the rings, provided
 * the rings we want to bind are not exclusively owned by a previous bind.
1834 */ 1835 static int 1836 netmap_krings_get(struct netmap_priv_d *priv) 1837 { 1838 struct netmap_adapter *na = priv->np_na; 1839 u_int i; 1840 struct netmap_kring *kring; 1841 int excl = (priv->np_flags & NR_EXCLUSIVE); 1842 enum txrx t; 1843 1844 ND("%s: grabbing tx [%d, %d) rx [%d, %d)", 1845 na->name, 1846 priv->np_qfirst[NR_TX], 1847 priv->np_qlast[NR_TX], 1848 priv->np_qfirst[NR_RX], 1849 priv->np_qlast[NR_RX]); 1850 1851 /* first round: check that all the requested rings 1852 * are neither already exclusively owned, nor already in use 1853 * when we want exclusive ownership 1854 */ 1855 for_rx_tx(t) { 1856 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { 1857 kring = &NMR(na, t)[i]; 1858 if ((kring->nr_kflags & NKR_EXCLUSIVE) || 1859 (kring->users && excl)) 1860 { 1861 ND("ring %s busy", kring->name); 1862 return EBUSY; 1863 } 1864 } 1865 } 1866 1867 /* second round: increment usage count (possibly marking them 1868 * as exclusive) and set the nr_pending_mode 1869 */ 1870 for_rx_tx(t) { 1871 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { 1872 kring = &NMR(na, t)[i]; 1873 kring->users++; 1874 if (excl) 1875 kring->nr_kflags |= NKR_EXCLUSIVE; 1876 kring->nr_pending_mode = NKR_NETMAP_ON; 1877 } 1878 } 1879 1880 return 0; 1881 1882 } 1883 1884 /* Undo netmap_krings_get(). This is done by clearing the exclusive mode 1885 * if it was asked on regif, and by unsetting the nr_pending_mode if we are 1886 * the last users of the involved rings. */ 1887 static void 1888 netmap_krings_put(struct netmap_priv_d *priv) 1889 { 1890 struct netmap_adapter *na = priv->np_na; 1891 u_int i; 1892 struct netmap_kring *kring; 1893 int excl = (priv->np_flags & NR_EXCLUSIVE); 1894 enum txrx t; 1895 1896 ND("%s: releasing tx [%d, %d) rx [%d, %d)", 1897 na->name, 1898 priv->np_qfirst[NR_TX], 1899 priv->np_qlast[NR_TX], 1900 priv->np_qfirst[NR_RX], 1901 priv->np_qlast[NR_RX]); 1902 1903 1904 for_rx_tx(t) { 1905 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { 1906 kring = &NMR(na, t)[i]; 1907 if (excl) 1908 kring->nr_kflags &= ~NKR_EXCLUSIVE; 1909 kring->users--; 1910 if (kring->users == 0) 1911 kring->nr_pending_mode = NKR_NETMAP_OFF; 1912 } 1913 } 1914 } 1915 1916 /* 1917 * possibly move the interface to netmap mode. 1918 * On success it returns a pointer to the netmap_if, otherwise NULL. 1919 * This must be called with NMG_LOCK held. 1920 * 1921 * The following na callbacks are called in the process: 1922 * 1923 * na->nm_config() [by netmap_update_config] 1924 * (get current number and size of rings) 1925 * 1926 * We have a generic one for linux (netmap_linux_config). 1927 * The bwrap has to override this, since it has to forward 1928 * the request to the wrapped adapter (netmap_bwrap_config). 1929 * 1930 * 1931 * na->nm_krings_create() 1932 * (create and init the krings array) 1933 * 1934 * One of the following: 1935 * 1936 * * netmap_hw_krings_create, (hw ports) 1937 * creates the standard layout for the krings 1938 * and adds the mbq (used for the host rings).
1939 * 1940 * * netmap_vp_krings_create (VALE ports) 1941 * add leases and scratchpads 1942 * 1943 * * netmap_pipe_krings_create (pipes) 1944 * create the krings and rings of both ends and 1945 * cross-link them 1946 * 1947 * * netmap_monitor_krings_create (monitors) 1948 * avoid allocating the mbq 1949 * 1950 * * netmap_bwrap_krings_create (bwraps) 1951 * create both the bwrap krings array, 1952 * the krings array of the wrapped adapter, and 1953 * (if needed) the fake array for the host adapter 1954 * 1955 * na->nm_register(na, 1) 1956 * (put the adapter in netmap mode) 1957 * 1958 * This may be one of the following: 1959 * 1960 * * netmap_hw_reg (hw ports) 1961 * checks that the ifp is still there, then calls 1962 * the hardware specific callback; 1963 * 1964 * * netmap_vp_reg (VALE ports) 1965 * If the port is connected to a bridge, 1966 * set the NAF_NETMAP_ON flag under the 1967 * bridge write lock. 1968 * 1969 * * netmap_pipe_reg (pipes) 1970 * inform the other pipe end that it is no 1971 * longer responsible for the lifetime of this 1972 * pipe end 1973 * 1974 * * netmap_monitor_reg (monitors) 1975 * intercept the sync callbacks of the monitored 1976 * rings 1977 * 1978 * * netmap_bwrap_reg (bwraps) 1979 * cross-link the bwrap and hwna rings, 1980 * forward the request to the hwna, override 1981 * the hwna notify callback (so that frames 1982 * coming from outside go through the bridge). 1983 * 1984 * 1985 */ 1986 int 1987 netmap_do_regif(struct netmap_priv_d *priv, struct netmap_adapter *na, 1988 uint16_t ringid, uint32_t flags) 1989 { 1990 struct netmap_if *nifp = NULL; 1991 int error; 1992 1993 NMG_LOCK_ASSERT(); 1994 /* ring configuration may have changed, fetch from the card */ 1995 netmap_update_config(na); 1996 priv->np_na = na; /* store the reference */ 1997 error = netmap_set_ringid(priv, ringid, flags); 1998 if (error) 1999 goto err; 2000 error = netmap_mem_finalize(na->nm_mem, na); 2001 if (error) 2002 goto err; 2003 2004 if (na->active_fds == 0) { 2005 /* 2006 * If this is the first registration of the adapter, 2007 * create the in-kernel view of the netmap rings, 2008 * the netmap krings. 2009 */ 2010 2011 /* 2012 * Depending on the adapter, this may also create 2013 * the netmap rings themselves 2014 */ 2015 error = na->nm_krings_create(na); 2016 if (error) 2017 goto err_drop_mem; 2018 2019 } 2020 2021 /* now the krings must exist and we can check whether some 2022 * previous bind has exclusive ownership on them, and set 2023 * nr_pending_mode 2024 */ 2025 error = netmap_krings_get(priv); 2026 if (error) 2027 goto err_del_krings; 2028 2029 /* create all needed missing netmap rings */ 2030 error = netmap_mem_rings_create(na); 2031 if (error) 2032 goto err_rel_excl; 2033 2034 /* in all cases, create a new netmap_if */ 2035 nifp = netmap_mem_if_new(na); 2036 if (nifp == NULL) { 2037 error = ENOMEM; 2038 goto err_del_rings; 2039 } 2040 2041 if (na->active_fds == 0) { 2042 /* cache the allocator info in the na */ 2043 error = netmap_mem_get_lut(na->nm_mem, &na->na_lut); 2044 if (error) 2045 goto err_del_if; 2046 ND("lut %p bufs %u size %u", na->na_lut.lut, na->na_lut.objtotal, 2047 na->na_lut.objsize); 2048 } 2049 2050 if (nm_kring_pending(priv)) { 2051 /* Some kring is switching mode, tell the adapter to 2052 * react to this. */ 2053 error = na->nm_register(na, 1); 2054 if (error) 2055 goto err_put_lut; 2056 } 2057 2058 /* Commit the reference. */ 2059 na->active_fds++; 2060 2061 /* 2062 * advertise that the interface is ready by setting np_nifp.
2063 * The barrier is needed because readers (poll, *SYNC and mmap) 2064 * check for priv->np_nifp != NULL without locking 2065 */ 2066 mb(); /* make sure previous writes are visible to all CPUs */ 2067 priv->np_nifp = nifp; 2068 2069 return 0; 2070 2071 err_put_lut: 2072 if (na->active_fds == 0) 2073 memset(&na->na_lut, 0, sizeof(na->na_lut)); 2074 err_del_if: 2075 netmap_mem_if_delete(na, nifp); 2076 err_rel_excl: 2077 netmap_krings_put(priv); 2078 err_del_rings: 2079 netmap_mem_rings_delete(na); 2080 err_del_krings: 2081 if (na->active_fds == 0) 2082 na->nm_krings_delete(na); 2083 err_drop_mem: 2084 netmap_mem_deref(na->nm_mem, na); 2085 err: 2086 priv->np_na = NULL; 2087 return error; 2088 } 2089 2090 2091 /* 2092 * update kring and ring at the end of rxsync/txsync. 2093 */ 2094 static inline void 2095 nm_sync_finalize(struct netmap_kring *kring) 2096 { 2097 /* 2098 * Update ring tail to what the kernel knows 2099 * After txsync: head/rhead/hwcur might be behind cur/rcur 2100 * if no carrier. 2101 */ 2102 kring->ring->tail = kring->rtail = kring->nr_hwtail; 2103 2104 ND(5, "%s now hwcur %d hwtail %d head %d cur %d tail %d", 2105 kring->name, kring->nr_hwcur, kring->nr_hwtail, 2106 kring->rhead, kring->rcur, kring->rtail); 2107 } 2108 2109 /* 2110 * ioctl(2) support for the "netmap" device. 2111 * 2112 * Following a list of accepted commands: 2113 * - NIOCGINFO 2114 * - SIOCGIFADDR just for convenience 2115 * - NIOCREGIF 2116 * - NIOCTXSYNC 2117 * - NIOCRXSYNC 2118 * 2119 * Return 0 on success, errno otherwise. 2120 */ 2121 int 2122 netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, caddr_t data, struct thread *td) 2123 { 2124 struct nmreq *nmr = (struct nmreq *) data; 2125 struct netmap_adapter *na = NULL; 2126 struct ifnet *ifp = NULL; 2127 int error = 0; 2128 u_int i, qfirst, qlast; 2129 struct netmap_if *nifp; 2130 struct netmap_kring *krings; 2131 enum txrx t; 2132 2133 if (cmd == NIOCGINFO || cmd == NIOCREGIF) { 2134 /* truncate name */ 2135 nmr->nr_name[sizeof(nmr->nr_name) - 1] = '\0'; 2136 if (nmr->nr_version != NETMAP_API) { 2137 D("API mismatch for %s got %d need %d", 2138 nmr->nr_name, 2139 nmr->nr_version, NETMAP_API); 2140 nmr->nr_version = NETMAP_API; 2141 } 2142 if (nmr->nr_version < NETMAP_MIN_API || 2143 nmr->nr_version > NETMAP_MAX_API) { 2144 return EINVAL; 2145 } 2146 } 2147 2148 switch (cmd) { 2149 case NIOCGINFO: /* return capabilities etc */ 2150 if (nmr->nr_cmd == NETMAP_BDG_LIST) { 2151 error = netmap_bdg_ctl(nmr, NULL); 2152 break; 2153 } 2154 2155 NMG_LOCK(); 2156 do { 2157 /* memsize is always valid */ 2158 struct netmap_mem_d *nmd = &nm_mem; 2159 u_int memflags; 2160 2161 if (nmr->nr_name[0] != '\0') { 2162 2163 /* get a refcount */ 2164 error = netmap_get_na(nmr, &na, &ifp, 1 /* create */); 2165 if (error) { 2166 na = NULL; 2167 ifp = NULL; 2168 break; 2169 } 2170 nmd = na->nm_mem; /* get memory allocator */ 2171 } 2172 2173 error = netmap_mem_get_info(nmd, &nmr->nr_memsize, &memflags, 2174 &nmr->nr_arg2); 2175 if (error) 2176 break; 2177 if (na == NULL) /* only memory info */ 2178 break; 2179 nmr->nr_offset = 0; 2180 nmr->nr_rx_slots = nmr->nr_tx_slots = 0; 2181 netmap_update_config(na); 2182 nmr->nr_rx_rings = na->num_rx_rings; 2183 nmr->nr_tx_rings = na->num_tx_rings; 2184 nmr->nr_rx_slots = na->num_rx_desc; 2185 nmr->nr_tx_slots = na->num_tx_desc; 2186 } while (0); 2187 netmap_unget_na(na, ifp); 2188 NMG_UNLOCK(); 2189 break; 2190 2191 case NIOCREGIF: 2192 /* possibly attach/detach NIC and VALE switch */ 2193 i = nmr->nr_cmd; 2194 if (i == 
NETMAP_BDG_ATTACH || i == NETMAP_BDG_DETACH 2195 || i == NETMAP_BDG_VNET_HDR 2196 || i == NETMAP_BDG_NEWIF 2197 || i == NETMAP_BDG_DELIF 2198 || i == NETMAP_BDG_POLLING_ON 2199 || i == NETMAP_BDG_POLLING_OFF) { 2200 error = netmap_bdg_ctl(nmr, NULL); 2201 break; 2202 } else if (i == NETMAP_PT_HOST_CREATE || i == NETMAP_PT_HOST_DELETE) { 2203 error = ptnetmap_ctl(nmr, priv->np_na); 2204 break; 2205 } else if (i == NETMAP_VNET_HDR_GET) { 2206 struct ifnet *ifp; 2207 2208 NMG_LOCK(); 2209 error = netmap_get_na(nmr, &na, &ifp, 0); 2210 if (na && !error) { 2211 nmr->nr_arg1 = na->virt_hdr_len; 2212 } 2213 netmap_unget_na(na, ifp); 2214 NMG_UNLOCK(); 2215 break; 2216 } else if (i != 0) { 2217 D("nr_cmd must be 0 not %d", i); 2218 error = EINVAL; 2219 break; 2220 } 2221 2222 /* protect access to priv from concurrent NIOCREGIF */ 2223 NMG_LOCK(); 2224 do { 2225 u_int memflags; 2226 struct ifnet *ifp; 2227 2228 if (priv->np_nifp != NULL) { /* thread already registered */ 2229 error = EBUSY; 2230 break; 2231 } 2232 /* find the interface and a reference */ 2233 error = netmap_get_na(nmr, &na, &ifp, 2234 1 /* create */); /* keep reference */ 2235 if (error) 2236 break; 2237 if (NETMAP_OWNED_BY_KERN(na)) { 2238 netmap_unget_na(na, ifp); 2239 error = EBUSY; 2240 break; 2241 } 2242 2243 if (na->virt_hdr_len && !(nmr->nr_flags & NR_ACCEPT_VNET_HDR)) { 2244 netmap_unget_na(na, ifp); 2245 error = EIO; 2246 break; 2247 } 2248 2249 error = netmap_do_regif(priv, na, nmr->nr_ringid, nmr->nr_flags); 2250 if (error) { /* reg. failed, release priv and ref */ 2251 netmap_unget_na(na, ifp); 2252 break; 2253 } 2254 nifp = priv->np_nifp; 2255 priv->np_td = td; // XXX kqueue, debugging only 2256 2257 /* return the offset of the netmap_if object */ 2258 nmr->nr_rx_rings = na->num_rx_rings; 2259 nmr->nr_tx_rings = na->num_tx_rings; 2260 nmr->nr_rx_slots = na->num_rx_desc; 2261 nmr->nr_tx_slots = na->num_tx_desc; 2262 error = netmap_mem_get_info(na->nm_mem, &nmr->nr_memsize, &memflags, 2263 &nmr->nr_arg2); 2264 if (error) { 2265 netmap_do_unregif(priv); 2266 netmap_unget_na(na, ifp); 2267 break; 2268 } 2269 if (memflags & NETMAP_MEM_PRIVATE) { 2270 *(uint32_t *)(uintptr_t)&nifp->ni_flags |= NI_PRIV_MEM; 2271 } 2272 for_rx_tx(t) { 2273 priv->np_si[t] = nm_si_user(priv, t) ? 2274 &na->si[t] : &NMR(na, t)[priv->np_qfirst[t]].si; 2275 } 2276 2277 if (nmr->nr_arg3) { 2278 if (netmap_verbose) 2279 D("requested %d extra buffers", nmr->nr_arg3); 2280 nmr->nr_arg3 = netmap_extra_alloc(na, 2281 &nifp->ni_bufs_head, nmr->nr_arg3); 2282 if (netmap_verbose) 2283 D("got %d extra buffers", nmr->nr_arg3); 2284 } 2285 nmr->nr_offset = netmap_mem_if_offset(na->nm_mem, nifp); 2286 2287 /* store ifp reference so that priv destructor may release it */ 2288 priv->np_ifp = ifp; 2289 } while (0); 2290 NMG_UNLOCK(); 2291 break; 2292 2293 case NIOCTXSYNC: 2294 case NIOCRXSYNC: 2295 nifp = priv->np_nifp; 2296 2297 if (nifp == NULL) { 2298 error = ENXIO; 2299 break; 2300 } 2301 mb(); /* make sure following reads are not from cache */ 2302 2303 na = priv->np_na; /* we have a reference */ 2304 2305 if (na == NULL) { 2306 D("Internal error: nifp != NULL && na == NULL"); 2307 error = ENXIO; 2308 break; 2309 } 2310 2311 t = (cmd == NIOCTXSYNC ? 
NR_TX : NR_RX); 2312 krings = NMR(na, t); 2313 qfirst = priv->np_qfirst[t]; 2314 qlast = priv->np_qlast[t]; 2315 2316 for (i = qfirst; i < qlast; i++) { 2317 struct netmap_kring *kring = krings + i; 2318 struct netmap_ring *ring = kring->ring; 2319 2320 if (unlikely(nm_kr_tryget(kring, 1, &error))) { 2321 error = (error ? EIO : 0); 2322 continue; 2323 } 2324 2325 if (cmd == NIOCTXSYNC) { 2326 if (netmap_verbose & NM_VERB_TXSYNC) 2327 D("pre txsync ring %d cur %d hwcur %d", 2328 i, ring->cur, 2329 kring->nr_hwcur); 2330 if (nm_txsync_prologue(kring, ring) >= kring->nkr_num_slots) { 2331 netmap_ring_reinit(kring); 2332 } else if (kring->nm_sync(kring, NAF_FORCE_RECLAIM) == 0) { 2333 nm_sync_finalize(kring); 2334 } 2335 if (netmap_verbose & NM_VERB_TXSYNC) 2336 D("post txsync ring %d cur %d hwcur %d", 2337 i, ring->cur, 2338 kring->nr_hwcur); 2339 } else { 2340 if (nm_rxsync_prologue(kring, ring) >= kring->nkr_num_slots) { 2341 netmap_ring_reinit(kring); 2342 } else if (kring->nm_sync(kring, NAF_FORCE_READ) == 0) { 2343 nm_sync_finalize(kring); 2344 } 2345 microtime(&ring->ts); 2346 } 2347 nm_kr_put(kring); 2348 } 2349 2350 break; 2351 2352 #ifdef WITH_VALE 2353 case NIOCCONFIG: 2354 error = netmap_bdg_config(nmr); 2355 break; 2356 #endif 2357 #ifdef __FreeBSD__ 2358 case FIONBIO: 2359 case FIOASYNC: 2360 ND("FIONBIO/FIOASYNC are no-ops"); 2361 break; 2362 2363 case BIOCIMMEDIATE: 2364 case BIOCGHDRCMPLT: 2365 case BIOCSHDRCMPLT: 2366 case BIOCSSEESENT: 2367 D("ignore BIOCIMMEDIATE/BIOCGHDRCMPLT/BIOCSHDRCMPLT/BIOCSSEESENT"); 2368 break; 2369 2370 default: /* allow device-specific ioctls */ 2371 { 2372 struct ifnet *ifp = ifunit_ref(nmr->nr_name); 2373 if (ifp == NULL) { 2374 error = ENXIO; 2375 } else { 2376 struct socket so; 2377 2378 bzero(&so, sizeof(so)); 2379 so.so_vnet = ifp->if_vnet; 2380 // so->so_proto not null. 2381 error = ifioctl(&so, cmd, data, td); 2382 if_rele(ifp); 2383 } 2384 break; 2385 } 2386 2387 #else /* linux */ 2388 default: 2389 error = EOPNOTSUPP; 2390 #endif /* linux */ 2391 } 2392 2393 return (error); 2394 } 2395 2396 2397 /* 2398 * select(2) and poll(2) handlers for the "netmap" device. 2399 * 2400 * Can be called for one or more queues. 2401 * Return the event mask corresponding to ready events. 2402 * If there are no ready events, do a selrecord on either the individual 2403 * selinfo or on the global one. 2404 * Device-dependent parts (locking and sync of tx/rx rings) 2405 * are done through callbacks. 2406 * 2407 * On linux, arguments are really pwait, the poll table, and 'td' is a struct file * 2408 * The first one is remapped to pwait as selrecord() uses the name as a 2409 * hidden argument. 2410 */ 2411 int 2412 netmap_poll(struct netmap_priv_d *priv, int events, NM_SELRECORD_T *sr) 2413 { 2414 struct netmap_adapter *na; 2415 struct netmap_kring *kring; 2416 struct netmap_ring *ring; 2417 u_int i, check_all_tx, check_all_rx, want[NR_TXRX], revents = 0; 2418 #define want_tx want[NR_TX] 2419 #define want_rx want[NR_RX] 2420 struct mbq q; /* packets from hw queues to host stack */ 2421 enum txrx t; 2422 2423 /* 2424 * In order to avoid nested locks, we need to "double check" 2425 * txsync and rxsync if we decide to do a selrecord(). 2426 * retry_tx (and retry_rx, later) prevent looping forever.
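 *
 * Schematically (pseudo-code only; the real loops are flush_tx and
 * do_retry_rx below), each direction is handled as:
 *
 *	scan the bound rings, syncing them as needed;
 *	if (nothing ready && retry && sr) {
 *		nm_os_selrecord(sr, si);	// arm the wait queue
 *		retry = 0;
 *		rescan;				// close the race with an interrupt
 *	}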
2427 */ 2428 int retry_tx = 1, retry_rx = 1; 2429 2430 /* transparent mode: send_down is 1 if we have found some 2431 * packets to forward during the rx scan and we have not 2432 * sent them down to the nic yet 2433 */ 2434 int send_down = 0; 2435 2436 mbq_init(&q); 2437 2438 if (priv->np_nifp == NULL) { 2439 D("No if registered"); 2440 return POLLERR; 2441 } 2442 mb(); /* make sure following reads are not from cache */ 2443 2444 na = priv->np_na; 2445 2446 if (!nm_netmap_on(na)) 2447 return POLLERR; 2448 2449 if (netmap_verbose & 0x8000) 2450 D("device %s events 0x%x", na->name, events); 2451 want_tx = events & (POLLOUT | POLLWRNORM); 2452 want_rx = events & (POLLIN | POLLRDNORM); 2453 2454 /* 2455 * check_all_{tx|rx} are set if the card has more than one queue AND 2456 * the file descriptor is bound to all of them. If so, we sleep on 2457 * the "global" selinfo, otherwise we sleep on individual selinfo 2458 * (FreeBSD only allows two selinfo's per file descriptor). 2459 * The interrupt routine in the driver wakes one or the other 2460 * (or both) depending on which clients are active. 2461 * 2462 * rxsync() is only called if we run out of buffers on a POLLIN. 2463 * txsync() is called if we run out of buffers on POLLOUT, or 2464 * there are pending packets to send. The latter can be disabled 2465 * by passing NETMAP_NO_TX_POLL in the NIOCREGIF call. 2466 */ 2467 check_all_tx = nm_si_user(priv, NR_TX); 2468 check_all_rx = nm_si_user(priv, NR_RX); 2469 2470 /* 2471 * We start with a lock-free round which is cheap if we have 2472 * slots available. If this fails, then lock and call the sync 2473 * routines. 2474 */ 2475 #if 1 /* new code - call rx if any of the rings needs to release or read buffers */ 2476 if (want_tx) { 2477 t = NR_TX; 2478 for (i = priv->np_qfirst[t]; want[t] && i < priv->np_qlast[t]; i++) { 2479 kring = &NMR(na, t)[i]; 2480 /* XXX compare ring->cur and kring->tail */ 2481 if (!nm_ring_empty(kring->ring)) { 2482 revents |= want[t]; 2483 want[t] = 0; /* also breaks the loop */ 2484 } 2485 } 2486 } 2487 if (want_rx) { 2488 want_rx = 0; /* look for a reason to run the handlers */ 2489 t = NR_RX; 2490 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { 2491 kring = &NMR(na, t)[i]; 2492 if (kring->ring->cur == kring->ring->tail /* try to fetch new buffers */ 2493 || kring->rhead != kring->ring->head /* release buffers */) { 2494 want_rx = 1; 2495 } 2496 } 2497 if (!want_rx) 2498 revents |= events & (POLLIN | POLLRDNORM); /* we have data */ 2499 } 2500 #else /* old code */ 2501 for_rx_tx(t) { 2502 for (i = priv->np_qfirst[t]; want[t] && i < priv->np_qlast[t]; i++) { 2503 kring = &NMR(na, t)[i]; 2504 /* XXX compare ring->cur and kring->tail */ 2505 if (!nm_ring_empty(kring->ring)) { 2506 revents |= want[t]; 2507 want[t] = 0; /* also breaks the loop */ 2508 } 2509 } 2510 } 2511 #endif /* old code */ 2512 2513 /* 2514 * If we want to push packets out (priv->np_txpoll) or 2515 * want_tx is still set, we must issue txsync calls 2516 * (on all rings, to avoid stalling the tx rings). 2517 * XXX should also check cur != hwcur on the tx rings. 2518 * Fortunately, normal tx mode has np_txpoll set. 2519 */ 2520 if (priv->np_txpoll || want_tx) { 2521 /* 2522 * The first round checks if anyone is ready, if not 2523 * do a selrecord and another round to handle races. 2524 * want_tx goes to 0 if any space is found, and is 2525 * used to skip rings with no pending transmissions.
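 *
 * (Reminder: np_txpoll is set by netmap_set_ringid() above and is 0 only
 *  if the descriptor was bound with the NETMAP_NO_TX_POLL bit in
 *  nr_ringid; a hypothetical user-side sketch:
 *
 *	req.nr_ringid = ring_idx | NETMAP_HW_RING | NETMAP_NO_TX_POLL;
 *
 *  in which case tx rings are only synced when POLLOUT is requested.)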
2526 */ 2527 flush_tx: 2528 for (i = priv->np_qfirst[NR_TX]; i < priv->np_qlast[NR_TX]; i++) { 2529 int found = 0; 2530 2531 kring = &na->tx_rings[i]; 2532 ring = kring->ring; 2533 2534 if (!send_down && !want_tx && ring->cur == kring->nr_hwcur) 2535 continue; 2536 2537 if (nm_kr_tryget(kring, 1, &revents)) 2538 continue; 2539 2540 if (nm_txsync_prologue(kring, ring) >= kring->nkr_num_slots) { 2541 netmap_ring_reinit(kring); 2542 revents |= POLLERR; 2543 } else { 2544 if (kring->nm_sync(kring, 0)) 2545 revents |= POLLERR; 2546 else 2547 nm_sync_finalize(kring); 2548 } 2549 2550 /* 2551 * If we found new slots, notify potential 2552 * listeners on the same ring. 2553 * Since we just did a txsync, look at the copies 2554 * of cur,tail in the kring. 2555 */ 2556 found = kring->rcur != kring->rtail; 2557 nm_kr_put(kring); 2558 if (found) { /* notify other listeners */ 2559 revents |= want_tx; 2560 want_tx = 0; 2561 kring->nm_notify(kring, 0); 2562 } 2563 } 2564 /* if there were any packets to forward, we must have handled them by now */ 2565 send_down = 0; 2566 if (want_tx && retry_tx && sr) { 2567 nm_os_selrecord(sr, check_all_tx ? 2568 &na->si[NR_TX] : &na->tx_rings[priv->np_qfirst[NR_TX]].si); 2569 retry_tx = 0; 2570 goto flush_tx; 2571 } 2572 } 2573 2574 /* 2575 * If want_rx is still set, scan the receive rings. 2576 * Do it on all rings because otherwise we starve. 2577 */ 2578 if (want_rx) { 2579 /* two rounds here for race avoidance */ 2580 do_retry_rx: 2581 for (i = priv->np_qfirst[NR_RX]; i < priv->np_qlast[NR_RX]; i++) { 2582 int found = 0; 2583 2584 kring = &na->rx_rings[i]; 2585 ring = kring->ring; 2586 2587 if (unlikely(nm_kr_tryget(kring, 1, &revents))) 2588 continue; 2589 2590 if (nm_rxsync_prologue(kring, ring) >= kring->nkr_num_slots) { 2591 netmap_ring_reinit(kring); 2592 revents |= POLLERR; 2593 } 2594 /* now we can use kring->rcur, rtail */ 2595 2596 /* 2597 * transparent mode support: collect packets 2598 * from the rxring(s). 2599 */ 2600 if (nm_may_forward_up(kring)) { 2601 ND(10, "forwarding some buffers up %d to %d", 2602 kring->nr_hwcur, ring->cur); 2603 netmap_grab_packets(kring, &q, netmap_fwd); 2604 } 2605 2606 kring->nr_kflags &= ~NR_FORWARD; 2607 if (kring->nm_sync(kring, 0)) 2608 revents |= POLLERR; 2609 else 2610 nm_sync_finalize(kring); 2611 send_down |= (kring->nr_kflags & NR_FORWARD); /* host ring only */ 2612 if (netmap_no_timestamp == 0 || 2613 ring->flags & NR_TIMESTAMP) { 2614 microtime(&ring->ts); 2615 } 2616 found = kring->rcur != kring->rtail; 2617 nm_kr_put(kring); 2618 if (found) { 2619 revents |= want_rx; 2620 retry_rx = 0; 2621 kring->nm_notify(kring, 0); 2622 } 2623 } 2624 2625 if (retry_rx && sr) { 2626 nm_os_selrecord(sr, check_all_rx ? 2627 &na->si[NR_RX] : &na->rx_rings[priv->np_qfirst[NR_RX]].si); 2628 } 2629 if (send_down > 0 || retry_rx) { 2630 retry_rx = 0; 2631 if (send_down) 2632 goto flush_tx; /* and retry_rx */ 2633 else 2634 goto do_retry_rx; 2635 } 2636 } 2637 2638 /* 2639 * Transparent mode: marked bufs on rx rings between 2640 * kring->nr_hwcur and ring->head 2641 * are passed to the other endpoint. 2642 * 2643 * Transparent mode requires binding all 2644 * rings to a single file descriptor.
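 *
 * (The "marked" buffers are those the application flagged with NS_FORWARD
 *  in slot->flags, or all of them when the netmap_fwd sysctl is set.
 *  A sketch of the user-side marking, not code from this module:
 *
 *	ring->flags |= NR_FORWARD;
 *	ring->slot[i].flags |= NS_FORWARD;
 *  )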
2645 */ 2646 2647 if (q.head && !nm_kr_tryget(&na->tx_rings[na->num_tx_rings], 1, &revents)) { 2648 netmap_send_up(na->ifp, &q); 2649 nm_kr_put(&na->tx_rings[na->num_tx_rings]); 2650 } 2651 2652 return (revents); 2653 #undef want_tx 2654 #undef want_rx 2655 } 2656 2657 2658 /*-------------------- driver support routines -------------------*/ 2659 2660 /* default notify callback */ 2661 static int 2662 netmap_notify(struct netmap_kring *kring, int flags) 2663 { 2664 struct netmap_adapter *na = kring->na; 2665 enum txrx t = kring->tx; 2666 2667 nm_os_selwakeup(&kring->si); 2668 /* optimization: avoid a wake up on the global 2669 * queue if nobody has registered for more 2670 * than one ring 2671 */ 2672 if (na->si_users[t] > 0) 2673 nm_os_selwakeup(&na->si[t]); 2674 2675 return NM_IRQ_COMPLETED; 2676 } 2677 2678 #if 0 2679 static int 2680 netmap_notify(struct netmap_adapter *na, u_int n_ring, 2681 enum txrx tx, int flags) 2682 { 2683 if (tx == NR_TX) { 2684 KeSetEvent(notes->TX_EVENT, 0, FALSE); 2685 } 2686 else 2687 { 2688 KeSetEvent(notes->RX_EVENT, 0, FALSE); 2689 } 2690 return 0; 2691 } 2692 #endif 2693 2694 /* called by all routines that create netmap_adapters. 2695 * provide some defaults and get a reference to the 2696 * memory allocator 2697 */ 2698 int 2699 netmap_attach_common(struct netmap_adapter *na) 2700 { 2701 if (na->num_tx_rings == 0 || na->num_rx_rings == 0) { 2702 D("%s: invalid rings tx %d rx %d", 2703 na->name, na->num_tx_rings, na->num_rx_rings); 2704 return EINVAL; 2705 } 2706 2707 #ifdef __FreeBSD__ 2708 if (na->na_flags & NAF_HOST_RINGS && na->ifp) { 2709 na->if_input = na->ifp->if_input; /* for netmap_send_up */ 2710 } 2711 #endif /* __FreeBSD__ */ 2712 if (na->nm_krings_create == NULL) { 2713 /* we assume that we have been called by a driver, 2714 * since other port types all provide their own 2715 * nm_krings_create 2716 */ 2717 na->nm_krings_create = netmap_hw_krings_create; 2718 na->nm_krings_delete = netmap_hw_krings_delete; 2719 } 2720 if (na->nm_notify == NULL) 2721 na->nm_notify = netmap_notify; 2722 na->active_fds = 0; 2723 2724 if (na->nm_mem == NULL) 2725 /* use the global allocator */ 2726 na->nm_mem = &nm_mem; 2727 netmap_mem_get(na->nm_mem); 2728 #ifdef WITH_VALE 2729 if (na->nm_bdg_attach == NULL) 2730 /* no special nm_bdg_attach callback. On VALE 2731 * attach, we need to interpose a bwrap 2732 */ 2733 na->nm_bdg_attach = netmap_bwrap_attach; 2734 #endif 2735 2736 return 0; 2737 } 2738 2739 2740 /* standard cleanup, called by all destructors */ 2741 void 2742 netmap_detach_common(struct netmap_adapter *na) 2743 { 2744 if (na->tx_rings) { /* XXX should not happen */ 2745 D("freeing leftover tx_rings"); 2746 na->nm_krings_delete(na); 2747 } 2748 netmap_pipe_dealloc(na); 2749 if (na->nm_mem) 2750 netmap_mem_put(na->nm_mem); 2751 bzero(na, sizeof(*na)); 2752 free(na, M_DEVBUF); 2753 } 2754 2755 /* Wrapper for the register callback provided by netmap-enabled 2756 * hardware drivers. 2757 * nm_iszombie(na) means that the driver module has been 2758 * unloaded, so we cannot call into it. 2759 * nm_os_ifnet_lock() must guarantee mutual exclusion with 2760 * module unloading.
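 *
 * The interposition is set up in _netmap_attach() below:
 *
 *	hwna->nm_hw_register = hwna->up.nm_register;
 *	hwna->up.nm_register = netmap_hw_reg;
 *
 * so the driver-provided callback is only ever reached through this
 * wrapper once the adapter is attached.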
2761 */ 2762 static int 2763 netmap_hw_reg(struct netmap_adapter *na, int onoff) 2764 { 2765 struct netmap_hw_adapter *hwna = 2766 (struct netmap_hw_adapter*)na; 2767 int error = 0; 2768 2769 nm_os_ifnet_lock(); 2770 2771 if (nm_iszombie(na)) { 2772 if (onoff) { 2773 error = ENXIO; 2774 } else if (na != NULL) { 2775 na->na_flags &= ~NAF_NETMAP_ON; 2776 } 2777 goto out; 2778 } 2779 2780 error = hwna->nm_hw_register(na, onoff); 2781 2782 out: 2783 nm_os_ifnet_unlock(); 2784 2785 return error; 2786 } 2787 2788 static void 2789 netmap_hw_dtor(struct netmap_adapter *na) 2790 { 2791 if (nm_iszombie(na) || na->ifp == NULL) 2792 return; 2793 2794 WNA(na->ifp) = NULL; 2795 } 2796 2797 2798 /* 2799 * Allocate a ``netmap_adapter`` object, and initialize it from the 2800 * 'arg' passed by the driver on attach. 2801 * We allocate a block of memory with room for a struct netmap_adapter 2802 * plus two sets of N+2 struct netmap_kring (where N is the number 2803 * of hardware rings): 2804 * krings 0..N-1 are for the hardware queues. 2805 * kring N is for the host stack queue 2806 * kring N+1 is only used for the selinfo for all queues. // XXX still true ? 2807 * Return 0 on success, ENOMEM otherwise. 2808 */ 2809 static int 2810 _netmap_attach(struct netmap_adapter *arg, size_t size) 2811 { 2812 struct netmap_hw_adapter *hwna = NULL; 2813 struct ifnet *ifp = NULL; 2814 2815 if (arg == NULL || arg->ifp == NULL) 2816 goto fail; 2817 ifp = arg->ifp; 2818 hwna = malloc(size, M_DEVBUF, M_NOWAIT | M_ZERO); 2819 if (hwna == NULL) 2820 goto fail; 2821 hwna->up = *arg; 2822 hwna->up.na_flags |= NAF_HOST_RINGS | NAF_NATIVE; 2823 strncpy(hwna->up.name, ifp->if_xname, sizeof(hwna->up.name)); 2824 hwna->nm_hw_register = hwna->up.nm_register; 2825 hwna->up.nm_register = netmap_hw_reg; 2826 if (netmap_attach_common(&hwna->up)) { 2827 free(hwna, M_DEVBUF); 2828 goto fail; 2829 } 2830 netmap_adapter_get(&hwna->up); 2831 2832 NM_ATTACH_NA(ifp, &hwna->up); 2833 2834 #ifdef linux 2835 if (ifp->netdev_ops) { 2836 /* prepare a clone of the netdev ops */ 2837 #ifndef NETMAP_LINUX_HAVE_NETDEV_OPS 2838 hwna->nm_ndo.ndo_start_xmit = ifp->netdev_ops; 2839 #else 2840 hwna->nm_ndo = *ifp->netdev_ops; 2841 #endif /* NETMAP_LINUX_HAVE_NETDEV_OPS */ 2842 } 2843 hwna->nm_ndo.ndo_start_xmit = linux_netmap_start_xmit; 2844 if (ifp->ethtool_ops) { 2845 hwna->nm_eto = *ifp->ethtool_ops; 2846 } 2847 hwna->nm_eto.set_ringparam = linux_netmap_set_ringparam; 2848 #ifdef NETMAP_LINUX_HAVE_SET_CHANNELS 2849 hwna->nm_eto.set_channels = linux_netmap_set_channels; 2850 #endif /* NETMAP_LINUX_HAVE_SET_CHANNELS */ 2851 if (arg->nm_config == NULL) { 2852 hwna->up.nm_config = netmap_linux_config; 2853 } 2854 #endif /* linux */ 2855 if (arg->nm_dtor == NULL) { 2856 hwna->up.nm_dtor = netmap_hw_dtor; 2857 } 2858 2859 if_printf(ifp, "netmap queues/slots: TX %d/%d, RX %d/%d\n", 2860 hwna->up.num_tx_rings, hwna->up.num_tx_desc, 2861 hwna->up.num_rx_rings, hwna->up.num_rx_desc); 2862 return 0; 2863 2864 fail: 2865 D("fail, arg %p ifp %p na %p", arg, ifp, hwna); 2866 return (hwna ? EINVAL : ENOMEM); 2867 } 2868 2869 2870 int 2871 netmap_attach(struct netmap_adapter *arg) 2872 { 2873 return _netmap_attach(arg, sizeof(struct netmap_hw_adapter)); 2874 } 2875 2876 2877 #ifdef WITH_PTNETMAP_GUEST 2878 int 2879 netmap_pt_guest_attach(struct netmap_adapter *arg, 2880 void *csb, 2881 unsigned int nifp_offset, 2882 nm_pt_guest_ptctl_t ptctl) 2883 { 2884 struct netmap_pt_guest_adapter *ptna; 2885 struct ifnet *ifp = arg ? 
arg->ifp : NULL; 2886 int error; 2887 2888 /* get allocator */ 2889 arg->nm_mem = netmap_mem_pt_guest_new(ifp, nifp_offset, ptctl); 2890 if (arg->nm_mem == NULL) 2891 return ENOMEM; 2892 arg->na_flags |= NAF_MEM_OWNER; 2893 error = _netmap_attach(arg, sizeof(struct netmap_pt_guest_adapter)); 2894 if (error) 2895 return error; 2896 2897 /* get the netmap_pt_guest_adapter */ 2898 ptna = (struct netmap_pt_guest_adapter *) NA(ifp); 2899 ptna->csb = csb; 2900 2901 /* Initialize a separate pass-through netmap adapter that is going to 2902 * be used by the ptnet driver only, and so never exposed to netmap 2903 * applications. We only need a subset of the available fields. */ 2904 memset(&ptna->dr, 0, sizeof(ptna->dr)); 2905 ptna->dr.up.ifp = ifp; 2906 ptna->dr.up.nm_mem = ptna->hwup.up.nm_mem; 2907 netmap_mem_get(ptna->dr.up.nm_mem); 2908 ptna->dr.up.nm_config = ptna->hwup.up.nm_config; 2909 2910 ptna->backend_regifs = 0; 2911 2912 return 0; 2913 } 2914 #endif /* WITH_PTNETMAP_GUEST */ 2915 2916 2917 void 2918 NM_DBG(netmap_adapter_get)(struct netmap_adapter *na) 2919 { 2920 if (!na) { 2921 return; 2922 } 2923 2924 refcount_acquire(&na->na_refcount); 2925 } 2926 2927 2928 /* returns 1 iff the netmap_adapter is destroyed */ 2929 int 2930 NM_DBG(netmap_adapter_put)(struct netmap_adapter *na) 2931 { 2932 if (!na) 2933 return 1; 2934 2935 if (!refcount_release(&na->na_refcount)) 2936 return 0; 2937 2938 if (na->nm_dtor) 2939 na->nm_dtor(na); 2940 2941 netmap_detach_common(na); 2942 2943 return 1; 2944 } 2945 2946 /* nm_krings_create callback for all hardware native adapters */ 2947 int 2948 netmap_hw_krings_create(struct netmap_adapter *na) 2949 { 2950 int ret = netmap_krings_create(na, 0); 2951 if (ret == 0) { 2952 /* initialize the mbq for the sw rx ring */ 2953 mbq_safe_init(&na->rx_rings[na->num_rx_rings].rx_queue); 2954 ND("initialized sw rx queue %d", na->num_rx_rings); 2955 } 2956 return ret; 2957 } 2958 2959 2960 2961 /* 2962 * Called on module unload by the netmap-enabled drivers 2963 */ 2964 void 2965 netmap_detach(struct ifnet *ifp) 2966 { 2967 struct netmap_adapter *na = NA(ifp); 2968 2969 if (!na) 2970 return; 2971 2972 NMG_LOCK(); 2973 netmap_set_all_rings(na, NM_KR_LOCKED); 2974 na->na_flags |= NAF_ZOMBIE; 2975 /* 2976 * if the netmap adapter is not native, somebody 2977 * changed it, so we can not release it here. 2978 * The NAF_ZOMBIE flag will notify the new owner that 2979 * the driver is gone. 2980 */ 2981 if (na->na_flags & NAF_NATIVE) { 2982 netmap_adapter_put(na); 2983 } 2984 /* give active users a chance to notice that NAF_ZOMBIE has been 2985 * turned on, so that they can stop and return an error to userspace. 2986 * Note that this becomes a NOP if there are no active users and, 2987 * therefore, the put() above has deleted the na, since now NA(ifp) is 2988 * NULL. 2989 */ 2990 netmap_enable_all_rings(ifp); 2991 NMG_UNLOCK(); 2992 } 2993 2994 2995 /* 2996 * Intercept packets from the network stack and pass them 2997 * to netmap as incoming packets on the 'software' ring. 2998 * 2999 * We only store packets in a bounded mbq and then copy them 3000 * in the relevant rxsync routine. 3001 * 3002 * We rely on the OS to make sure that the ifp and na do not go 3003 * away (typically the caller checks for IFF_DRV_RUNNING or the like). 3004 * In nm_register() or whenever there is a reinitialization, 3005 * we make sure to make the mode change visible here. 
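 *
 * The queue-full test below uses circular arithmetic on the host RX
 * kring: space = nr_hwtail - nr_hwcur (mod nkr_num_slots) is the number
 * of slots already completed for userspace. For instance, with 1024
 * slots, hwcur == 1000 and hwtail == 10, space is 34 and the mbuf is
 * dropped when 34 + mbq_len(q) >= 1023.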
3006 */ 3007 int 3008 netmap_transmit(struct ifnet *ifp, struct mbuf *m) 3009 { 3010 struct netmap_adapter *na = NA(ifp); 3011 struct netmap_kring *kring, *tx_kring; 3012 u_int len = MBUF_LEN(m); 3013 u_int error = ENOBUFS; 3014 unsigned int txr; 3015 struct mbq *q; 3016 int space; 3017 3018 kring = &na->rx_rings[na->num_rx_rings]; 3019 // XXX [Linux] we do not need this lock 3020 // if we follow the down/configure/up protocol -gl 3021 // mtx_lock(&na->core_lock); 3022 3023 if (!nm_netmap_on(na)) { 3024 D("%s not in netmap mode anymore", na->name); 3025 error = ENXIO; 3026 goto done; 3027 } 3028 3029 txr = MBUF_TXQ(m); 3030 if (txr >= na->num_tx_rings) { 3031 txr %= na->num_tx_rings; 3032 } 3033 tx_kring = &NMR(na, NR_TX)[txr]; 3034 3035 if (tx_kring->nr_mode == NKR_NETMAP_OFF) { 3036 return MBUF_TRANSMIT(na, ifp, m); 3037 } 3038 3039 q = &kring->rx_queue; 3040 3041 // XXX reconsider long packets if we handle fragments 3042 if (len > NETMAP_BUF_SIZE(na)) { /* too long for us */ 3043 D("%s from_host, drop packet size %d > %d", na->name, 3044 len, NETMAP_BUF_SIZE(na)); 3045 goto done; 3046 } 3047 3048 if (nm_os_mbuf_has_offld(m)) { 3049 RD(1, "%s drop mbuf requiring offloadings", na->name); 3050 goto done; 3051 } 3052 3053 /* protect against rxsync_from_host(), netmap_sw_to_nic() 3054 * and maybe other instances of netmap_transmit (the latter 3055 * not possible on Linux). 3056 * Also avoid overflowing the queue. 3057 */ 3058 mbq_lock(q); 3059 3060 space = kring->nr_hwtail - kring->nr_hwcur; 3061 if (space < 0) 3062 space += kring->nkr_num_slots; 3063 if (space + mbq_len(q) >= kring->nkr_num_slots - 1) { // XXX 3064 RD(10, "%s full hwcur %d hwtail %d qlen %d len %d m %p", 3065 na->name, kring->nr_hwcur, kring->nr_hwtail, mbq_len(q), 3066 len, m); 3067 } else { 3068 mbq_enqueue(q, m); 3069 ND(10, "%s %d bufs in queue len %d m %p", 3070 na->name, mbq_len(q), len, m); 3071 /* notify outside the lock */ 3072 m = NULL; 3073 error = 0; 3074 } 3075 mbq_unlock(q); 3076 3077 done: 3078 if (m) 3079 m_freem(m); 3080 /* unconditionally wake up listeners */ 3081 kring->nm_notify(kring, 0); 3082 /* this is normally netmap_notify(), but for nics 3083 * connected to a bridge it is netmap_bwrap_intr_notify(), 3084 * that possibly forwards the frames through the switch 3085 */ 3086 3087 return (error); 3088 } 3089 3090 3091 /* 3092 * netmap_reset() is called by the driver routines when reinitializing 3093 * a ring. The driver is in charge of locking to protect the kring. 3094 * If native netmap mode is not set just return NULL. 3095 * If native netmap mode is set, in particular, we have to set nr_mode to 3096 * NKR_NETMAP_ON. 3097 */ 3098 struct netmap_slot * 3099 netmap_reset(struct netmap_adapter *na, enum txrx tx, u_int n, 3100 u_int new_cur) 3101 { 3102 struct netmap_kring *kring; 3103 int new_hwofs, lim; 3104 3105 if (!nm_native_on(na)) { 3106 ND("interface not in native netmap mode"); 3107 return NULL; /* nothing to reinitialize */ 3108 } 3109 3110 /* XXX note- in the new scheme, we are not guaranteed to be 3111 * under lock (e.g. when called on a device reset). 3112 * In this case, we should set a flag and do not trust too 3113 * much the values. In practice: TODO 3114 * - set a RESET flag somewhere in the kring 3115 * - do the processing in a conservative way 3116 * - let the *sync() fixup at the end. 
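 *
 * (Typical driver-side usage, sketched after the usual pattern of
 *  netmap-aware drivers; "ring_nr" is an illustrative name:
 *
 *	slot = netmap_reset(na, NR_TX, ring_nr, 0);
 *	if (slot) {
 *		// the ring is in netmap mode: program the NIC descriptors
 *		// from the netmap buffers instead of mbufs
 *	}
 *  )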
3117 */ 3118 if (tx == NR_TX) { 3119 if (n >= na->num_tx_rings) 3120 return NULL; 3121 3122 kring = na->tx_rings + n; 3123 3124 if (kring->nr_pending_mode == NKR_NETMAP_OFF) { 3125 kring->nr_mode = NKR_NETMAP_OFF; 3126 return NULL; 3127 } 3128 3129 // XXX check whether we should use hwcur or rcur 3130 new_hwofs = kring->nr_hwcur - new_cur; 3131 } else { 3132 if (n >= na->num_rx_rings) 3133 return NULL; 3134 kring = na->rx_rings + n; 3135 3136 if (kring->nr_pending_mode == NKR_NETMAP_OFF) { 3137 kring->nr_mode = NKR_NETMAP_OFF; 3138 return NULL; 3139 } 3140 3141 new_hwofs = kring->nr_hwtail - new_cur; 3142 } 3143 lim = kring->nkr_num_slots - 1; 3144 if (new_hwofs > lim) 3145 new_hwofs -= lim + 1; 3146 3147 /* Always set the new offset value and realign the ring. */ 3148 if (netmap_verbose) 3149 D("%s %s%d hwofs %d -> %d, hwtail %d -> %d", 3150 na->name, 3151 tx == NR_TX ? "TX" : "RX", n, 3152 kring->nkr_hwofs, new_hwofs, 3153 kring->nr_hwtail, 3154 tx == NR_TX ? lim : kring->nr_hwtail); 3155 kring->nkr_hwofs = new_hwofs; 3156 if (tx == NR_TX) { 3157 kring->nr_hwtail = kring->nr_hwcur + lim; 3158 if (kring->nr_hwtail > lim) 3159 kring->nr_hwtail -= lim + 1; 3160 } 3161 3162 #if 0 // def linux 3163 /* XXX check that the mappings are correct */ 3164 /* need ring_nr, adapter->pdev, direction */ 3165 buffer_info->dma = dma_map_single(&pdev->dev, addr, adapter->rx_buffer_len, DMA_FROM_DEVICE); 3166 if (dma_mapping_error(&adapter->pdev->dev, buffer_info->dma)) { 3167 D("error mapping rx netmap buffer %d", i); 3168 // XXX fix error handling 3169 } 3170 3171 #endif /* linux */ 3172 /* 3173 * Wakeup on the individual and global selwait 3174 * We do the wakeup here, but the ring is not yet reconfigured. 3175 * However, we are under lock so there are no races. 3176 */ 3177 kring->nr_mode = NKR_NETMAP_ON; 3178 kring->nm_notify(kring, 0); 3179 return kring->ring->slot; 3180 } 3181 3182 3183 /* 3184 * Dispatch rx/tx interrupts to the netmap rings. 3185 * 3186 * "work_done" is non-null on the RX path, NULL for the TX path. 3187 * We rely on the OS to make sure that there is only one active 3188 * instance per queue, and that there is appropriate locking. 3189 * 3190 * The 'notify' routine depends on what the ring is attached to. 3191 * - for a netmap file descriptor, do a selwakeup on the individual 3192 * waitqueue, plus one on the global one if needed 3193 * (see netmap_notify) 3194 * - for a nic connected to a switch, call the proper forwarding routine 3195 * (see netmap_bwrap_intr_notify) 3196 */ 3197 int 3198 netmap_common_irq(struct netmap_adapter *na, u_int q, u_int *work_done) 3199 { 3200 struct netmap_kring *kring; 3201 enum txrx t = (work_done ? NR_RX : NR_TX); 3202 3203 q &= NETMAP_RING_MASK; 3204 3205 if (netmap_verbose) { 3206 RD(5, "received %s queue %d", work_done ? "RX" : "TX" , q); 3207 } 3208 3209 if (q >= nma_get_nrings(na, t)) 3210 return NM_IRQ_PASS; // not a physical queue 3211 3212 kring = NMR(na, t) + q; 3213 3214 if (kring->nr_mode == NKR_NETMAP_OFF) { 3215 return NM_IRQ_PASS; 3216 } 3217 3218 if (t == NR_RX) { 3219 kring->nr_kflags |= NKR_PENDINTR; // XXX atomic ? 3220 *work_done = 1; /* do not fire napi again */ 3221 } 3222 3223 return kring->nm_notify(kring, 0); 3224 } 3225 3226 3227 /* 3228 * Default functions to handle rx/tx interrupts from a physical device. 3229 * "work_done" is non-null on the RX path, NULL for the TX path. 3230 * 3231 * If the card is not in netmap mode, simply return NM_IRQ_PASS, 3232 * so that the caller proceeds with regular processing. 
3233 * Otherwise call netmap_common_irq(). 3234 * 3235 * If the card is connected to a netmap file descriptor, 3236 * do a selwakeup on the individual queue, plus one on the global one 3237 * if needed (multiqueue card _and_ there are multiqueue listeners), 3238 * and return NR_IRQ_COMPLETED. 3239 * 3240 * Finally, if called on rx from an interface connected to a switch, 3241 * calls the proper forwarding routine. 3242 */ 3243 int 3244 netmap_rx_irq(struct ifnet *ifp, u_int q, u_int *work_done) 3245 { 3246 struct netmap_adapter *na = NA(ifp); 3247 3248 /* 3249 * XXX emulated netmap mode sets NAF_SKIP_INTR so 3250 * we still use the regular driver even though the previous 3251 * check fails. It is unclear whether we should use 3252 * nm_native_on() here. 3253 */ 3254 if (!nm_netmap_on(na)) 3255 return NM_IRQ_PASS; 3256 3257 if (na->na_flags & NAF_SKIP_INTR) { 3258 ND("use regular interrupt"); 3259 return NM_IRQ_PASS; 3260 } 3261 3262 return netmap_common_irq(na, q, work_done); 3263 } 3264 3265 3266 /* 3267 * Module loader and unloader 3268 * 3269 * netmap_init() creates the /dev/netmap device and initializes 3270 * all global variables. Returns 0 on success, errno on failure 3271 * (but there is no chance) 3272 * 3273 * netmap_fini() destroys everything. 3274 */ 3275 3276 static struct cdev *netmap_dev; /* /dev/netmap character device. */ 3277 extern struct cdevsw netmap_cdevsw; 3278 3279 3280 void 3281 netmap_fini(void) 3282 { 3283 if (netmap_dev) 3284 destroy_dev(netmap_dev); 3285 /* we assume that there are no longer netmap users */ 3286 nm_os_ifnet_fini(); 3287 netmap_uninit_bridges(); 3288 netmap_mem_fini(); 3289 NMG_LOCK_DESTROY(); 3290 printf("netmap: unloaded module.\n"); 3291 } 3292 3293 3294 int 3295 netmap_init(void) 3296 { 3297 int error; 3298 3299 NMG_LOCK_INIT(); 3300 3301 error = netmap_mem_init(); 3302 if (error != 0) 3303 goto fail; 3304 /* 3305 * MAKEDEV_ETERNAL_KLD avoids an expensive check on syscalls 3306 * when the module is compiled in. 3307 * XXX could use make_dev_credv() to get error number 3308 */ 3309 netmap_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD, 3310 &netmap_cdevsw, 0, NULL, UID_ROOT, GID_WHEEL, 0600, 3311 "netmap"); 3312 if (!netmap_dev) 3313 goto fail; 3314 3315 error = netmap_init_bridges(); 3316 if (error) 3317 goto fail; 3318 3319 #ifdef __FreeBSD__ 3320 nm_os_vi_init_index(); 3321 #endif 3322 3323 error = nm_os_ifnet_init(); 3324 if (error) 3325 goto fail; 3326 3327 printf("netmap: loaded module\n"); 3328 return (0); 3329 fail: 3330 netmap_fini(); 3331 return (EINVAL); /* may be incorrect */ 3332 } 3333
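

/*
 * (Editor's sketch, for orientation only: a NIC driver typically hooks
 * into the entry points defined above roughly as follows; the callback
 * names prefixed with "my_" are illustrative, not from any real driver.
 *
 *	// on attach: describe the rings and register with netmap
 *	struct netmap_adapter na;
 *
 *	bzero(&na, sizeof(na));
 *	na.ifp = ifp;
 *	na.num_tx_rings = na.num_rx_rings = 1;
 *	na.num_tx_desc = na.num_rx_desc = 1024;
 *	na.nm_txsync = my_txsync;
 *	na.nm_rxsync = my_rxsync;
 *	na.nm_register = my_register;
 *	netmap_attach(&na);
 *
 *	// in the rx interrupt handler:
 *	if (netmap_rx_irq(ifp, ring_nr, &work_done) != NM_IRQ_PASS)
 *		return;		// netmap handled the event
 *
 *	// when (re)initializing a ring, see netmap_reset() above:
 *	slot = netmap_reset(NA(ifp), NR_RX, ring_nr, 0);
 *
 *	// on detach / module unload:
 *	netmap_detach(ifp);
 * )
 */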