/*
 * Copyright (C) 2011-2014 Matteo Landi
 * Copyright (C) 2011-2016 Luigi Rizzo
 * Copyright (C) 2011-2016 Giuseppe Lettieri
 * Copyright (C) 2011-2016 Vincenzo Maffione
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */


/*
 * $FreeBSD$
 *
 * This module supports memory mapped access to network devices,
 * see netmap(4).
 *
 * The module uses a large memory pool allocated by the kernel
 * and accessible as mmapped memory by multiple userspace threads/processes.
 * The memory pool contains packet buffers and "netmap rings",
 * i.e. user-accessible copies of the interface's queues.
 *
 * Access to the network card works like this:
 * 1. a process/thread issues one or more open() on /dev/netmap, to create
 *    select()able file descriptors on which events are reported.
 * 2. on each descriptor, the process issues an ioctl() to identify
 *    the interface that should report events to the file descriptor.
 * 3. on each descriptor, the process issues an mmap() request to
 *    map the shared memory region within the process' address space.
 *    The list of interesting queues is indicated by a location in
 *    the shared memory region.
 * 4. using the functions in the netmap(4) userspace API, a process
 *    can look up the occupation state of a queue, access memory buffers,
 *    and retrieve received packets or enqueue packets to transmit.
 * 5. using some ioctl()s the process can synchronize the userspace view
 *    of the queue with the actual status in the kernel. This includes both
 *    receiving the notification of new packets, and transmitting new
 *    packets on the output interface.
 * 6. select() or poll() can be used to wait for events on individual
 *    transmit or receive queues (or all queues for a given interface).
 *

		SYNCHRONIZATION (USER)

The netmap rings and data structures may be shared among multiple
user threads or even independent processes.
Any synchronization among those threads/processes is delegated
to the threads themselves. Only one thread at a time can be in
a system call on the same netmap ring. The OS does not enforce
this and only guarantees against system crashes in case of
invalid usage.
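
A minimal userspace sketch of steps 1-6 above (illustrative only: error
handling is omitted, additional headers such as <fcntl.h>, <sys/ioctl.h>,
<sys/mman.h> and <poll.h> are needed, and the interface name "em0" is an
arbitrary example; the NETMAP_IF/NETMAP_TXRING helpers come from the
userspace header net/netmap_user.h):

	#include <net/netmap.h>
	#include <net/netmap_user.h>

	struct nmreq req;
	int fd = open("/dev/netmap", O_RDWR);			// step 1
	bzero(&req, sizeof(req));
	req.nr_version = NETMAP_API;
	strncpy(req.nr_name, "em0", sizeof(req.nr_name));
	ioctl(fd, NIOCREGIF, &req);				// step 2
	void *mem = mmap(NULL, req.nr_memsize,
	    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);		// step 3
	struct netmap_if *nifp = NETMAP_IF(mem, req.nr_offset);
	struct netmap_ring *txring = NETMAP_TXRING(nifp, 0);	// step 4
	// ... fill slots, advance txring->head and txring->cur ...
	ioctl(fd, NIOCTXSYNC, NULL);				// step 5
	struct pollfd pfd = { .fd = fd, .events = POLLOUT };
	poll(&pfd, 1, -1);					// step 6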

		LOCKING (INTERNAL)

Within the kernel, access to the netmap rings is protected as follows:

- a spinlock on each ring, to handle producer/consumer races on
  RX rings attached to the host stack (against multiple host
  threads writing from the host stack to the same ring),
  and on 'destination' rings attached to a VALE switch
  (i.e. RX rings in VALE ports, and TX rings in NIC/host ports),
  protecting multiple active senders for the same destination;

- an atomic variable to guarantee that there is at most one
  instance of *_*xsync() on the ring at any time.
  For rings connected to user file
  descriptors, an atomic_test_and_set() protects this, and the
  lock on the ring is not actually used.
  For NIC RX rings connected to a VALE switch, an atomic_test_and_set()
  is also used to prevent multiple executions (the driver might indeed
  already guarantee this).
  For NIC TX rings connected to a VALE switch, the lock arbitrates
  access to the queue (both when allocating buffers and when pushing
  them out).

- *xsync() should be protected against initializations of the card.
  On FreeBSD most devices have the reset routine protected by
  a RING lock (ixgbe, igb, em) or core lock (re). lem is missing
  the RING protection on rx_reset(); this should be added.

  On linux there is an external lock on the tx path, which probably
  also arbitrates access to the reset routine. XXX to be revised

- a per-interface core_lock protecting access from the host stack
  while interfaces may be detached from netmap mode.
  XXX there should be no need for this lock if we detach the interfaces
  only while they are down.


--- VALE SWITCH ---

NMG_LOCK() serializes all modifications to switches and ports.
A switch cannot be deleted until all ports are gone.

For each switch, an SX lock (RWlock on linux) protects
deletion of ports. When configuring or deleting a port, the
lock is acquired in exclusive mode (after holding NMG_LOCK).
When forwarding, the lock is acquired in shared mode (without NMG_LOCK).
The lock is held throughout the entire forwarding cycle,
during which the thread may incur a page fault.
Hence it is important that sleepable shared locks are used.

On the rx ring, the per-port lock is grabbed initially to reserve
a number of slots in the ring, then the lock is released,
packets are copied from source to destination, and then
the lock is acquired again and the receive ring is updated.
(A similar thing is done on the tx ring for NIC and host stack
ports attached to the switch)

 */


/* --- internals ----
 *
 * Roadmap to the code that implements the above.
 *
 * > 1. a process/thread issues one or more open() on /dev/netmap, to create
 * >    select()able file descriptors on which events are reported.
 *
 * 	Internally, we allocate a netmap_priv_d structure, that will be
 * 	initialized on ioctl(NIOCREGIF). There is one netmap_priv_d
 * 	structure for each open().
 *
 * 	os-specific:
 * 	    FreeBSD: see netmap_open() (netmap_freebsd.c)
 * 	    linux:   see linux_netmap_open() (netmap_linux.c)
 *
 * > 2. on each descriptor, the process issues an ioctl() to identify
 * >    the interface that should report events to the file descriptor.
 *
 * 	Implemented by netmap_ioctl(), NIOCREGIF case, with nmr->nr_cmd==0.
 * 	Most important things happen in netmap_get_na() and
 * 	netmap_do_regif(), called from there. Additional details can be
 * 	found in the comments above those functions.
 *
 * 	In all cases, this action creates/takes-a-reference-to a
 * 	netmap_*_adapter describing the port, and allocates a netmap_if
 * 	and all necessary netmap rings, filling them with netmap buffers.
 *
 * 	In this phase, the sync callbacks for each ring are set (these are used
 * 	in steps 5 and 6 below). The callbacks depend on the type of adapter.
 * 	The adapter creation/initialization code puts them in the
 * 	netmap_adapter (fields na->nm_txsync and na->nm_rxsync). Then, they
 * 	are copied from there to the netmap_kring's during netmap_do_regif(), by
 * 	the nm_krings_create() callback. All the nm_krings_create callbacks
 * 	actually call netmap_krings_create() to perform this and the other
 * 	common stuff. netmap_krings_create() also takes care of the host rings,
 * 	if needed, by setting their sync callbacks appropriately.
 *
 * 	Additional actions depend on the kind of netmap_adapter that has been
 * 	registered:
 *
 * 	- netmap_hw_adapter:  	     [netmap.c]
 * 	     This is a system netdev/ifp with native netmap support.
 * 	     The ifp is detached from the host stack by redirecting:
 * 	       - transmissions (from the network stack) to netmap_transmit()
 * 	       - receive notifications to the nm_notify() callback for
 * 	         this adapter. The callback is normally netmap_notify(), unless
 * 	         the ifp is attached to a bridge using bwrap, in which case it
 * 	         is netmap_bwrap_intr_notify().
 *
 * 	- netmap_generic_adapter:      [netmap_generic.c]
 * 	     A system netdev/ifp without native netmap support.
 *
 * 	(the decision about native/non-native support is taken in
 * 	 netmap_get_hw_na(), called by netmap_get_na())
 *
 * 	- netmap_vp_adapter 		[netmap_vale.c]
 * 	     Returned by netmap_get_bdg_na().
 * 	     This is a persistent or ephemeral VALE port. Ephemeral ports
 * 	     are created on the fly if they don't already exist, and are
 * 	     always attached to a bridge.
 * 	     Persistent VALE ports must be created separately, and are
 * 	     then attached like normal NICs. The NIOCREGIF we are examining
 * 	     will find them only if they had previously been created and
 * 	     attached (see VALE_CTL below).
 *
 * 	- netmap_pipe_adapter 	      [netmap_pipe.c]
 * 	     Returned by netmap_get_pipe_na().
 * 	     Both pipe ends are created, if they didn't already exist.
 *
 * 	- netmap_monitor_adapter      [netmap_monitor.c]
 * 	     Returned by netmap_get_monitor_na().
 * 	     If successful, the nm_sync callbacks of the monitored adapter
 * 	     will be intercepted by the returned monitor.
 *
 * 	- netmap_bwrap_adapter	      [netmap_vale.c]
 * 	     Cannot be obtained in this way, see VALE_CTL below
 *
 *
 * 	os-specific:
 * 	    linux: we first go through linux_netmap_ioctl() to
 * 	           adapt the FreeBSD interface to the linux one.
 *
 *
 * > 3. on each descriptor, the process issues an mmap() request to
 * >    map the shared memory region within the process' address space.
 * >    The list of interesting queues is indicated by a location in
 * >    the shared memory region.
 *
 * 	os-specific:
 * 	    FreeBSD: netmap_mmap_single (netmap_freebsd.c).
 * 	    linux:   linux_netmap_mmap (netmap_linux.c).
 *
 * > 4. using the functions in the netmap(4) userspace API, a process
 * >    can look up the occupation state of a queue, access memory buffers,
 * >    and retrieve received packets or enqueue packets to transmit.
 *
 * 	these actions do not involve the kernel.
 *
 * > 5. using some ioctl()s the process can synchronize the userspace view
 * >    of the queue with the actual status in the kernel. This includes both
 * >    receiving the notification of new packets, and transmitting new
 * >    packets on the output interface.
 *
 * 	These are implemented in netmap_ioctl(), NIOCTXSYNC and NIOCRXSYNC
 * 	cases. They invoke the nm_sync callbacks on the netmap_kring
 * 	structures, as initialized in step 2 and maybe later modified
 * 	by a monitor. Monitors, however, will always call the original
 * 	callback before doing anything else.
 *
 *
 * > 6. select() or poll() can be used to wait for events on individual
 * >    transmit or receive queues (or all queues for a given interface).
 *
 * 	Implemented in netmap_poll(). This will call the same nm_sync()
 * 	callbacks as in step 5 above.
 *
 * 	os-specific:
 * 		linux: we first go through linux_netmap_poll() to adapt
 * 		       the FreeBSD interface to the linux one.
 *
 *
 *  ---- VALE_CTL -----
 *
 * 	VALE switches are controlled by issuing a NIOCREGIF with a non-null
 * 	nr_cmd in the nmreq structure. These subcommands are handled by
 * 	netmap_bdg_ctl() in netmap_vale.c. Persistent VALE ports are created
 * 	and destroyed by issuing the NETMAP_BDG_NEWIF and NETMAP_BDG_DELIF
 * 	subcommands, respectively.
 *
 * 	Any network interface known to the system (including a persistent VALE
 * 	port) can be attached to a VALE switch by issuing the
 * 	NETMAP_BDG_ATTACH subcommand. After the attachment, persistent VALE ports
 * 	look exactly like ephemeral VALE ports (as created in step 2 above). The
 * 	attachment of other interfaces, instead, requires the creation of a
 * 	netmap_bwrap_adapter. Moreover, the attached interface must be put in
 * 	netmap mode. This may require the creation of a netmap_generic_adapter if
 * 	we have no native support for the interface, or if generic adapters have
 * 	been forced by sysctl.
 *
 * 	Both persistent VALE ports and bwraps are handled by netmap_get_bdg_na(),
 * 	called by nm_bdg_ctl_attach(), and discriminated by the nm_bdg_attach()
 * 	callback. In the case of the bwrap, the callback creates the
 * 	netmap_bwrap_adapter. The initialization of the bwrap is then
 * 	completed by calling netmap_do_regif() on it, in the nm_bdg_ctl()
 * 	callback (netmap_bwrap_bdg_ctl in netmap_vale.c).
 * 	A generic adapter for the wrapped ifp will be created if needed, when
 * 	netmap_get_bdg_na() calls netmap_get_hw_na().
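 *
 * 	A minimal userspace sketch of the attach sequence described above
 * 	(illustrative only; error checking omitted, and the switch/port
 * 	names "vale0" and "em0" are arbitrary examples):
 *
 * 		struct nmreq nmr;
 *
 * 		bzero(&nmr, sizeof(nmr));
 * 		nmr.nr_version = NETMAP_API;
 * 		strncpy(nmr.nr_name, "vale0:em0", sizeof(nmr.nr_name));
 * 		nmr.nr_cmd = NETMAP_BDG_ATTACH;
 * 		ioctl(fd, NIOCREGIF, &nmr);	// fd is an open /dev/netmap
 *
 * 	Detaching uses the same call with nr_cmd = NETMAP_BDG_DETACH.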
 *
 *
 *  ---- DATAPATHS -----
 *
 *              -= SYSTEM DEVICE WITH NATIVE SUPPORT =-
 *
 *    na == NA(ifp) == netmap_hw_adapter created in DEVICE_netmap_attach()
 *
 *    - tx from netmap userspace:
 *	 concurrently:
 *           1) ioctl(NIOCTXSYNC)/netmap_poll() in process context
 *                kring->nm_sync() == DEVICE_netmap_txsync()
 *           2) device interrupt handler
 *                na->nm_notify()  == netmap_notify()
 *    - rx from netmap userspace:
 *       concurrently:
 *           1) ioctl(NIOCRXSYNC)/netmap_poll() in process context
 *                kring->nm_sync() == DEVICE_netmap_rxsync()
 *           2) device interrupt handler
 *                na->nm_notify()  == netmap_notify()
 *    - rx from host stack
 *       concurrently:
 *           1) host stack
 *                netmap_transmit()
 *                  na->nm_notify  == netmap_notify()
 *           2) ioctl(NIOCRXSYNC)/netmap_poll() in process context
 *                kring->nm_sync() == netmap_rxsync_from_host
 *                  netmap_rxsync_from_host(na, NULL, NULL)
 *    - tx to host stack
 *           ioctl(NIOCTXSYNC)/netmap_poll() in process context
 *             kring->nm_sync() == netmap_txsync_to_host
 *               netmap_txsync_to_host(na)
 *                 nm_os_send_up()
 *                   FreeBSD: na->if_input() == ether_input()
 *                   linux: netif_rx() with NM_MAGIC_PRIORITY_RX
 *
 *
 *               -= SYSTEM DEVICE WITH GENERIC SUPPORT =-
 *
 *    na == NA(ifp) == generic_netmap_adapter created in generic_netmap_attach()
 *
 *    - tx from netmap userspace:
 *       concurrently:
 *           1) ioctl(NIOCTXSYNC)/netmap_poll() in process context
 *               kring->nm_sync() == generic_netmap_txsync()
 *                   nm_os_generic_xmit_frame()
 *                       linux:   dev_queue_xmit() with NM_MAGIC_PRIORITY_TX
 *                           ifp->ndo_start_xmit == generic_ndo_start_xmit()
 *                               gna->save_start_xmit == orig. dev. start_xmit
 *                       FreeBSD: na->if_transmit() == orig. dev if_transmit
 *           2) generic_mbuf_destructor()
 *                   na->nm_notify() == netmap_notify()
 *    - rx from netmap userspace:
 *           1) ioctl(NIOCRXSYNC)/netmap_poll() in process context
 *               kring->nm_sync() == generic_netmap_rxsync()
 *                   mbq_safe_dequeue()
 *           2) device driver
 *               generic_rx_handler()
 *                   mbq_safe_enqueue()
 *                   na->nm_notify() == netmap_notify()
 *    - rx from host stack
 *        FreeBSD: same as native
 *        Linux: same as native except:
 *           1) host stack
 *               dev_queue_xmit() without NM_MAGIC_PRIORITY_TX
 *                   ifp->ndo_start_xmit == generic_ndo_start_xmit()
 *                       netmap_transmit()
 *                           na->nm_notify() == netmap_notify()
 *    - tx to host stack (same as native):
 *
 *
 *                           -= VALE =-
 *
 *   INCOMING:
 *
 *      - VALE ports:
 *          ioctl(NIOCTXSYNC)/netmap_poll() in process context
 *              kring->nm_sync() == netmap_vp_txsync()
 *
 *      - system device with native support:
 *         from cable:
 *             interrupt
 *                na->nm_notify() == netmap_bwrap_intr_notify(ring_nr != host ring)
 *                     kring->nm_sync() == DEVICE_netmap_rxsync()
 *                     netmap_vp_txsync()
 *                     kring->nm_sync() == DEVICE_netmap_rxsync()
 *         from host stack:
 *             netmap_transmit()
 *                na->nm_notify() == netmap_bwrap_intr_notify(ring_nr == host ring)
 *                     kring->nm_sync() == netmap_rxsync_from_host()
 *                     netmap_vp_txsync()
 *
 *      - system device with generic support:
 *         from device driver:
 *            generic_rx_handler()
 *                na->nm_notify() == netmap_bwrap_intr_notify(ring_nr != host ring)
 *                     kring->nm_sync() == generic_netmap_rxsync()
 *                     netmap_vp_txsync()
 *                     kring->nm_sync() == generic_netmap_rxsync()
 *         from host stack:
 *            netmap_transmit()
 *                na->nm_notify() == netmap_bwrap_intr_notify(ring_nr == host ring)
 *                     kring->nm_sync() == netmap_rxsync_from_host()
 *                     netmap_vp_txsync()
 *
 *   (all cases) --> nm_bdg_flush()
 *                      dest_na->nm_notify() == (see below)
 *
 *   OUTGOING:
 *
 *      - VALE ports:
 *         concurrently:
 *             1) ioctl(NIOCRXSYNC)/netmap_poll() in process context
 *                    kring->nm_sync() == netmap_vp_rxsync()
 *             2) from nm_bdg_flush()
 *                    na->nm_notify() == netmap_notify()
 *
 *      - system device with native support:
 *          to cable:
 *             na->nm_notify() == netmap_bwrap_notify()
 *                 netmap_vp_rxsync()
 *                 kring->nm_sync() == DEVICE_netmap_txsync()
 *                 netmap_vp_rxsync()
 *          to host stack:
 *                 netmap_vp_rxsync()
 *                 kring->nm_sync() == netmap_txsync_to_host
 *                 netmap_vp_rxsync_locked()
 *
 *      - system device with generic adapter:
 *          to device driver:
 *             na->nm_notify() == netmap_bwrap_notify()
 *                 netmap_vp_rxsync()
 *                 kring->nm_sync() == generic_netmap_txsync()
 *                 netmap_vp_rxsync()
 *          to host stack:
 *                 netmap_vp_rxsync()
 *                 kring->nm_sync() == netmap_txsync_to_host
 *                 netmap_vp_rxsync()
 *
 */

/*
 * OS-specific code that is used only within this file.
 * Other OS-specific code that must be accessed by drivers
 * is present in netmap_kern.h
 */

#if defined(__FreeBSD__)
#include <sys/cdefs.h> /* prerequisite */
#include <sys/types.h>
#include <sys/errno.h>
#include <sys/param.h>	/* defines used in kernel.h */
#include <sys/kernel.h>	/* types used in module initialization */
#include <sys/conf.h>	/* cdevsw struct, UID, GID */
#include <sys/filio.h>	/* FIONBIO */
#include <sys/sockio.h>
#include <sys/socketvar.h>	/* struct socket */
#include <sys/malloc.h>
#include <sys/poll.h>
#include <sys/rwlock.h>
#include <sys/socket.h> /* sockaddrs */
#include <sys/selinfo.h>
#include <sys/sysctl.h>
#include <sys/jail.h>
#include <net/vnet.h>
#include <net/if.h>
#include <net/if_var.h>
#include <net/bpf.h>		/* BIOCIMMEDIATE */
#include <machine/bus.h>	/* bus_dmamap_* */
#include <sys/endian.h>
#include <sys/refcount.h>


#elif defined(linux)

#include "bsd_glue.h"

#elif defined(__APPLE__)

#warning OSX support is only partial
#include "osx_glue.h"

#elif defined (_WIN32)

#include "win_glue.h"

#else

#error	Unsupported platform

#endif /* unsupported */

/*
 * common headers
 */
#include <net/netmap.h>
#include <dev/netmap/netmap_kern.h>
#include <dev/netmap/netmap_mem2.h>


/* user-controlled variables */
int netmap_verbose;

static int netmap_no_timestamp; /* don't timestamp on rxsync */
int netmap_mitigate = 1;
int netmap_no_pendintr = 1;
int netmap_txsync_retry = 2;
int netmap_flags = 0;	/* debug flags */
static int netmap_fwd = 0;	/* force transparent mode */

/*
 * netmap_admode selects the netmap mode to use.
 * Invalid values are reset to NETMAP_ADMODE_BEST
 */
enum {	NETMAP_ADMODE_BEST = 0,	/* use native, fallback to generic */
	NETMAP_ADMODE_NATIVE,	/* either native or none */
	NETMAP_ADMODE_GENERIC,	/* force generic */
	NETMAP_ADMODE_LAST };
static int netmap_admode = NETMAP_ADMODE_BEST;

/* netmap_generic_mit controls mitigation of RX notifications for
 * the generic netmap adapter. The value is a time interval in
 * nanoseconds. */
int netmap_generic_mit = 100*1000;

/* We use by default netmap-aware qdiscs with generic netmap adapters,
 * even if there can be a little performance hit with hardware NICs.
 * However, using the qdisc is the safer approach, for two reasons:
 * 1) it prevents non-FIFO qdiscs from breaking the TX notification
 *    scheme, which is based on mbuf destructors when txqdisc is
 *    not used.
 * 2) it makes it possible to transmit over software devices that
 *    change skb->dev, like bridge, veth, ...
 *
 * Anyway, users looking for the best performance should
 * use native adapters.
 */
int netmap_generic_txqdisc = 1;

/* Default number of slots and queues for generic adapters. */
int netmap_generic_ringsize = 1024;
int netmap_generic_rings = 1;

/* Non-zero if ptnet devices are allowed to use virtio-net headers. */
int ptnet_vnet_hdr = 1;

/*
 * SYSCTL calls are grouped between SYSBEGIN and SYSEND to be emulated
 * in some other operating systems
 */
SYSBEGIN(main_init);

SYSCTL_DECL(_dev_netmap);
SYSCTL_NODE(_dev, OID_AUTO, netmap, CTLFLAG_RW, 0, "Netmap args");
SYSCTL_INT(_dev_netmap, OID_AUTO, verbose,
    CTLFLAG_RW, &netmap_verbose, 0, "Verbose mode");
SYSCTL_INT(_dev_netmap, OID_AUTO, no_timestamp,
    CTLFLAG_RW, &netmap_no_timestamp, 0, "no_timestamp");
SYSCTL_INT(_dev_netmap, OID_AUTO, mitigate, CTLFLAG_RW, &netmap_mitigate, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, no_pendintr,
    CTLFLAG_RW, &netmap_no_pendintr, 0, "Always look for new received packets.");
SYSCTL_INT(_dev_netmap, OID_AUTO, txsync_retry, CTLFLAG_RW,
    &netmap_txsync_retry, 0, "Number of txsync loops in bridge's flush.");

SYSCTL_INT(_dev_netmap, OID_AUTO, flags, CTLFLAG_RW, &netmap_flags, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, fwd, CTLFLAG_RW, &netmap_fwd, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, admode, CTLFLAG_RW, &netmap_admode, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, generic_mit, CTLFLAG_RW, &netmap_generic_mit, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, generic_ringsize, CTLFLAG_RW, &netmap_generic_ringsize, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, generic_rings, CTLFLAG_RW, &netmap_generic_rings, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, generic_txqdisc, CTLFLAG_RW, &netmap_generic_txqdisc, 0, "");
SYSCTL_INT(_dev_netmap, OID_AUTO, ptnet_vnet_hdr, CTLFLAG_RW, &ptnet_vnet_hdr, 0, "");

SYSEND;

NMG_LOCK_T	netmap_global_lock;

/*
 * mark the ring as stopped, and run through the locks
 * to make sure other users get to see it.
 * stopped must be either NM_KR_STOPPED (for an unbounded stop)
 * or NM_KR_LOCKED (a brief stop for mutual exclusion purposes)
 */
static void
netmap_disable_ring(struct netmap_kring *kr, int stopped)
{
	nm_kr_stop(kr, stopped);
	// XXX check if nm_kr_stop is sufficient
	mtx_lock(&kr->q_lock);
	mtx_unlock(&kr->q_lock);
	nm_kr_put(kr);
}

/* stop or enable a single ring */
void
netmap_set_ring(struct netmap_adapter *na, u_int ring_id, enum txrx t, int stopped)
{
	if (stopped)
		netmap_disable_ring(NMR(na, t) + ring_id, stopped);
	else
		NMR(na, t)[ring_id].nkr_stopped = 0;
}


/* stop or enable all the rings of na */
void
netmap_set_all_rings(struct netmap_adapter *na, int stopped)
{
	int i;
	enum txrx t;

	if (!nm_netmap_on(na))
		return;

	for_rx_tx(t) {
		for (i = 0; i < netmap_real_rings(na, t); i++) {
			netmap_set_ring(na, i, t, stopped);
		}
	}
}

/*
 * Convenience function used in drivers. Waits for current txsync()s/rxsync()s
 * to finish and prevents any new one from starting. Call this before turning
 * netmap mode off, or before removing the hardware rings (e.g., on module
 * unload).
 */
void
netmap_disable_all_rings(struct ifnet *ifp)
{
	if (NM_NA_VALID(ifp)) {
		netmap_set_all_rings(NA(ifp), NM_KR_STOPPED);
	}
}

/*
 * Convenience function used in drivers. Re-enables rxsync and txsync on the
 * adapter's rings. In Linux drivers, this should be placed near each
 * napi_enable().
 */
void
netmap_enable_all_rings(struct ifnet *ifp)
{
	if (NM_NA_VALID(ifp)) {
		netmap_set_all_rings(NA(ifp), 0 /* enabled */);
	}
}

void
netmap_make_zombie(struct ifnet *ifp)
{
	if (NM_NA_VALID(ifp)) {
		struct netmap_adapter *na = NA(ifp);
		netmap_set_all_rings(na, NM_KR_LOCKED);
		na->na_flags |= NAF_ZOMBIE;
		netmap_set_all_rings(na, 0);
	}
}

void
netmap_undo_zombie(struct ifnet *ifp)
{
	if (NM_NA_VALID(ifp)) {
		struct netmap_adapter *na = NA(ifp);
		if (na->na_flags & NAF_ZOMBIE) {
			netmap_set_all_rings(na, NM_KR_LOCKED);
			na->na_flags &= ~NAF_ZOMBIE;
			netmap_set_all_rings(na, 0);
		}
	}
}

/*
 * generic bound_checking function
 */
u_int
nm_bound_var(u_int *v, u_int dflt, u_int lo, u_int hi, const char *msg)
{
	u_int oldv = *v;
	const char *op = NULL;

	if (dflt < lo)
		dflt = lo;
	if (dflt > hi)
		dflt = hi;
	if (oldv < lo) {
		*v = dflt;
		op = "Bump";
	} else if (oldv > hi) {
		*v = hi;
		op = "Clamp";
	}
	if (op && msg)
		printf("%s %s to %d (was %d)\n", op, msg, *v, oldv);
	return *v;
}


/*
 * packet-dump function, user-supplied or static buffer.
 * The destination buffer must be at least 30+4*len
 */
const char *
nm_dump_buf(char *p, int len, int lim, char *dst)
{
	static char _dst[8192];
	int i, j, i0;
	static char hex[] = "0123456789abcdef";
	char *o;	/* output position */

#define P_HI(x)	hex[((x) & 0xf0)>>4]
#define P_LO(x)	hex[((x) & 0xf)]
#define P_C(x)	((x) >= 0x20 && (x) <= 0x7e ? (x) : '.')
	if (!dst)
		dst = _dst;
	if (lim <= 0 || lim > len)
		lim = len;
	o = dst;
	sprintf(o, "buf 0x%p len %d lim %d\n", p, len, lim);
	o += strlen(o);
	/* hexdump routine */
	for (i = 0; i < lim; ) {
		sprintf(o, "%5d: ", i);
		o += strlen(o);
		memset(o, ' ', 48);
		i0 = i;
		for (j = 0; j < 16 && i < lim; i++, j++) {
			o[j*3] = P_HI(p[i]);
			o[j*3+1] = P_LO(p[i]);
		}
		i = i0;
		for (j = 0; j < 16 && i < lim; i++, j++)
			o[j + 48] = P_C(p[i]);
		o[j+48] = '\n';
		o += j+49;
	}
	*o = '\0';
#undef P_HI
#undef P_LO
#undef P_C
	return dst;
}


/*
 * Fetch configuration from the device, to cope with dynamic
 * reconfigurations after loading the module.
 */
/* call with NMG_LOCK held */
int
netmap_update_config(struct netmap_adapter *na)
{
	u_int txr, txd, rxr, rxd;

	txr = txd = rxr = rxd = 0;
	if (na->nm_config == NULL ||
	    na->nm_config(na, &txr, &txd, &rxr, &rxd))
	{
		/* take whatever we had at init time */
		txr = na->num_tx_rings;
		txd = na->num_tx_desc;
		rxr = na->num_rx_rings;
		rxd = na->num_rx_desc;
	}

	if (na->num_tx_rings == txr && na->num_tx_desc == txd &&
	    na->num_rx_rings == rxr && na->num_rx_desc == rxd)
		return 0; /* nothing changed */
	if (netmap_verbose || na->active_fds > 0) {
		D("stored config %s: txring %d x %d, rxring %d x %d",
			na->name,
			na->num_tx_rings, na->num_tx_desc,
			na->num_rx_rings, na->num_rx_desc);
		D("new config %s: txring %d x %d, rxring %d x %d",
			na->name, txr, txd, rxr, rxd);
	}
	if (na->active_fds == 0) {
		D("configuration changed (but fine)");
		na->num_tx_rings = txr;
		na->num_tx_desc = txd;
		na->num_rx_rings = rxr;
		na->num_rx_desc = rxd;
		return 0;
	}
	D("configuration changed while active, this is bad...");
	return 1;
}

/* nm_sync callbacks for the host rings */
static int netmap_txsync_to_host(struct netmap_kring *kring, int flags);
static int netmap_rxsync_from_host(struct netmap_kring *kring, int flags);

/* create the krings array and initialize the fields common to all adapters.
 * The array layout is this:
 *
 *                    +----------+
 * na->tx_rings ----->|          | \
 *                    |          |  } na->num_tx_rings
 *                    |          | /
 *                    +----------+
 *                    |          |    host tx kring
 * na->rx_rings ----> +----------+
 *                    |          | \
 *                    |          |  } na->num_rx_rings
 *                    |          | /
 *                    +----------+
 *                    |          |    host rx kring
 *                    +----------+
 * na->tailroom ----->|          | \
 *                    |          |  } tailroom bytes
 *                    |          | /
 *                    +----------+
 *
 * Note: for compatibility, host krings are created even when not needed.
 * The tailroom space is currently used by vale ports for allocating leases.
 */
/* call with NMG_LOCK held */
int
netmap_krings_create(struct netmap_adapter *na, u_int tailroom)
{
	u_int i, len, ndesc;
	struct netmap_kring *kring;
	u_int n[NR_TXRX];
	enum txrx t;

	/* account for the (possibly fake) host rings */
	n[NR_TX] = na->num_tx_rings + 1;
	n[NR_RX] = na->num_rx_rings + 1;

	len = (n[NR_TX] + n[NR_RX]) * sizeof(struct netmap_kring) + tailroom;

	na->tx_rings = malloc((size_t)len, M_DEVBUF, M_NOWAIT | M_ZERO);
	if (na->tx_rings == NULL) {
		D("Cannot allocate krings");
		return ENOMEM;
	}
	na->rx_rings = na->tx_rings + n[NR_TX];

	/*
	 * All fields in krings are 0 except the ones initialized below,
	 * but better be explicit on important kring fields.
	 */
	for_rx_tx(t) {
		ndesc = nma_get_ndesc(na, t);
		for (i = 0; i < n[t]; i++) {
			kring = &NMR(na, t)[i];
			bzero(kring, sizeof(*kring));
			kring->na = na;
			kring->ring_id = i;
			kring->tx = t;
			kring->nkr_num_slots = ndesc;
			kring->nr_mode = NKR_NETMAP_OFF;
			kring->nr_pending_mode = NKR_NETMAP_OFF;
			if (i < nma_get_nrings(na, t)) {
				kring->nm_sync = (t == NR_TX ? na->nm_txsync : na->nm_rxsync);
			} else {
				kring->nm_sync = (t == NR_TX ?
						netmap_txsync_to_host:
						netmap_rxsync_from_host);
			}
			kring->nm_notify = na->nm_notify;
			kring->rhead = kring->rcur = kring->nr_hwcur = 0;
			/*
			 * IMPORTANT: Always keep one slot empty.
			 */
			kring->rtail = kring->nr_hwtail = (t == NR_TX ? ndesc - 1 : 0);
			snprintf(kring->name, sizeof(kring->name) - 1, "%s %s%d", na->name,
					nm_txrx2str(t), i);
			ND("ktx %s h %d c %d t %d",
				kring->name, kring->rhead, kring->rcur, kring->rtail);
			mtx_init(&kring->q_lock, (t == NR_TX ? "nm_txq_lock" : "nm_rxq_lock"), NULL, MTX_DEF);
			nm_os_selinfo_init(&kring->si);
		}
		nm_os_selinfo_init(&na->si[t]);
	}

	na->tailroom = na->rx_rings + n[NR_RX];

	return 0;
}


/* undo the actions performed by netmap_krings_create */
/* call with NMG_LOCK held */
void
netmap_krings_delete(struct netmap_adapter *na)
{
	struct netmap_kring *kring = na->tx_rings;
	enum txrx t;

	for_rx_tx(t)
		nm_os_selinfo_uninit(&na->si[t]);

	/* we rely on the krings layout described above */
	for ( ; kring != na->tailroom; kring++) {
		mtx_destroy(&kring->q_lock);
		nm_os_selinfo_uninit(&kring->si);
	}
	free(na->tx_rings, M_DEVBUF);
	na->tx_rings = na->rx_rings = na->tailroom = NULL;
}


/*
 * Destructor for NIC ports. They also have an mbuf queue
 * on the rings connected to the host so we need to purge
 * them first.
 */
/* call with NMG_LOCK held */
void
netmap_hw_krings_delete(struct netmap_adapter *na)
{
	struct mbq *q = &na->rx_rings[na->num_rx_rings].rx_queue;

	ND("destroy sw mbq with len %d", mbq_len(q));
	mbq_purge(q);
	mbq_safe_fini(q);
	netmap_krings_delete(na);
}



/*
 * Undo everything that was done in netmap_do_regif(). In particular,
 * call nm_register(ifp,0) to stop netmap mode on the interface and
 * revert to normal operation.
 */
/* call with NMG_LOCK held */
static void netmap_unset_ringid(struct netmap_priv_d *);
static void netmap_krings_put(struct netmap_priv_d *);
void
netmap_do_unregif(struct netmap_priv_d *priv)
{
	struct netmap_adapter *na = priv->np_na;

	NMG_LOCK_ASSERT();
	na->active_fds--;
	/* unset nr_pending_mode and possibly release exclusive mode */
	netmap_krings_put(priv);

#ifdef	WITH_MONITOR
	/* XXX check whether we have to do something with monitor
	 * when rings change nr_mode. */
	if (na->active_fds <= 0) {
		/* walk through all the rings and tell any monitor
		 * that the port is going to exit netmap mode
		 */
		netmap_monitor_stop(na);
	}
#endif

	if (na->active_fds <= 0 || nm_kring_pending(priv)) {
		na->nm_register(na, 0);
	}

	/* delete rings and buffers that are no longer needed */
	netmap_mem_rings_delete(na);

	if (na->active_fds <= 0) {	/* last instance */
		/*
		 * (TO CHECK) We enter here
		 * when the last reference to this file descriptor goes
		 * away. This means we cannot have any pending poll()
		 * or interrupt routine operating on the structure.
		 * XXX The file may be closed in a thread while
		 * another thread is using it.
		 * Linux keeps the file opened until the last reference
		 * by any outstanding ioctl/poll or mmap is gone.
		 * FreeBSD does not track mmap()s (but we do) and
		 * wakes up any sleeping poll(). Need to check what
		 * happens if the close() occurs while a concurrent
		 * syscall is running.
		 */
		if (netmap_verbose)
			D("deleting last instance for %s", na->name);

		if (nm_netmap_on(na)) {
			D("BUG: netmap on while going to delete the krings");
		}

		na->nm_krings_delete(na);
	}

	/* possibly decrement counter of tx_si/rx_si users */
	netmap_unset_ringid(priv);
	/* delete the nifp */
	netmap_mem_if_delete(na, priv->np_nifp);
	/* drop the allocator */
	netmap_mem_deref(na->nm_mem, na);
	/* mark the priv as unregistered */
	priv->np_na = NULL;
	priv->np_nifp = NULL;
}

/* call with NMG_LOCK held */
static __inline int
nm_si_user(struct netmap_priv_d *priv, enum txrx t)
{
	return (priv->np_na != NULL &&
		(priv->np_qlast[t] - priv->np_qfirst[t] > 1));
}

struct netmap_priv_d*
netmap_priv_new(void)
{
	struct netmap_priv_d *priv;

	priv = malloc(sizeof(struct netmap_priv_d), M_DEVBUF,
			      M_NOWAIT | M_ZERO);
	if (priv == NULL)
		return NULL;
	priv->np_refs = 1;
	nm_os_get_module();
	return priv;
}

/*
 * Destructor of the netmap_priv_d, called when the fd is closed.
 * Action: undo all the things done by NIOCREGIF.
 * On FreeBSD we need to track whether there are active mmap()s,
 * and we use np_active_mmaps for that. On linux, the field is always 0.
 * Return: 1 if we can free priv, 0 otherwise.
 *
 */
/* call with NMG_LOCK held */
void
netmap_priv_delete(struct netmap_priv_d *priv)
{
	struct netmap_adapter *na = priv->np_na;

	/* number of active references to this fd */
	if (--priv->np_refs > 0) {
		return;
	}
	nm_os_put_module();
	if (na) {
		netmap_do_unregif(priv);
	}
	netmap_unget_na(na, priv->np_ifp);
	bzero(priv, sizeof(*priv));	/* for safety */
	free(priv, M_DEVBUF);
}


/* call with NMG_LOCK *not* held */
void
netmap_dtor(void *data)
{
	struct netmap_priv_d *priv = data;

	NMG_LOCK();
	netmap_priv_delete(priv);
	NMG_UNLOCK();
}




/*
 * Handlers for synchronization of the queues from/to the host.
 * Netmap has two operating modes:
 * - in the default mode, the rings connected to the host stack are
 *   just another ring pair managed by userspace;
 * - in transparent mode (XXX to be defined) incoming packets
 *   (from the host or the NIC) are marked as NS_FORWARD upon
 *   arrival, and the user application has a chance to reset the
 *   flag for packets that should be dropped.
 *   On the RXSYNC or poll(), packets in RX rings between
 *   kring->nr_hwcur and ring->cur with NS_FORWARD still set are moved
 *   to the other side.
 * The transfer NIC --> host is relatively easy, just encapsulate
 * into mbufs and we are done. The host --> NIC side is slightly
 * harder because there might not be room in the tx ring so it
 * might take a while before releasing the buffer.
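 *
 * A minimal userspace sketch of the transparent-mode interaction
 * described above (illustrative only; it assumes "ring" points to the
 * host RX netmap_ring of an already registered port, and should_drop()
 * is a hypothetical application filter):
 *
 *	ring->flags |= NR_FORWARD;	// ask for NS_FORWARD handling
 *	// after a poll()/NIOCRXSYNC:
 *	for (i = ring->head; i != ring->tail; i = nm_ring_next(ring, i)) {
 *		struct netmap_slot *slot = &ring->slot[i];
 *		if (should_drop(slot))
 *			slot->flags &= ~NS_FORWARD;
 *	}
 *	ring->head = ring->cur = ring->tail;	// release the slots
 *	// the next rxsync forwards the slots still marked NS_FORWARD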
1053 */ 1054 1055 1056 /* 1057 * pass a chain of buffers to the host stack as coming from 'dst' 1058 * We do not need to lock because the queue is private. 1059 */ 1060 static void 1061 netmap_send_up(struct ifnet *dst, struct mbq *q) 1062 { 1063 struct mbuf *m; 1064 struct mbuf *head = NULL, *prev = NULL; 1065 1066 /* send packets up, outside the lock */ 1067 while ((m = mbq_dequeue(q)) != NULL) { 1068 if (netmap_verbose & NM_VERB_HOST) 1069 D("sending up pkt %p size %d", m, MBUF_LEN(m)); 1070 prev = nm_os_send_up(dst, m, prev); 1071 if (head == NULL) 1072 head = prev; 1073 } 1074 if (head) 1075 nm_os_send_up(dst, NULL, head); 1076 mbq_fini(q); 1077 } 1078 1079 1080 /* 1081 * put a copy of the buffers marked NS_FORWARD into an mbuf chain. 1082 * Take packets from hwcur to ring->head marked NS_FORWARD (or forced) 1083 * and pass them up. Drop remaining packets in the unlikely event 1084 * of an mbuf shortage. 1085 */ 1086 static void 1087 netmap_grab_packets(struct netmap_kring *kring, struct mbq *q, int force) 1088 { 1089 u_int const lim = kring->nkr_num_slots - 1; 1090 u_int const head = kring->rhead; 1091 u_int n; 1092 struct netmap_adapter *na = kring->na; 1093 1094 for (n = kring->nr_hwcur; n != head; n = nm_next(n, lim)) { 1095 struct mbuf *m; 1096 struct netmap_slot *slot = &kring->ring->slot[n]; 1097 1098 if ((slot->flags & NS_FORWARD) == 0 && !force) 1099 continue; 1100 if (slot->len < 14 || slot->len > NETMAP_BUF_SIZE(na)) { 1101 RD(5, "bad pkt at %d len %d", n, slot->len); 1102 continue; 1103 } 1104 slot->flags &= ~NS_FORWARD; // XXX needed ? 1105 /* XXX TODO: adapt to the case of a multisegment packet */ 1106 m = m_devget(NMB(na, slot), slot->len, 0, na->ifp, NULL); 1107 1108 if (m == NULL) 1109 break; 1110 mbq_enqueue(q, m); 1111 } 1112 } 1113 1114 static inline int 1115 _nm_may_forward(struct netmap_kring *kring) 1116 { 1117 return ((netmap_fwd || kring->ring->flags & NR_FORWARD) && 1118 kring->na->na_flags & NAF_HOST_RINGS && 1119 kring->tx == NR_RX); 1120 } 1121 1122 static inline int 1123 nm_may_forward_up(struct netmap_kring *kring) 1124 { 1125 return _nm_may_forward(kring) && 1126 kring->ring_id != kring->na->num_rx_rings; 1127 } 1128 1129 static inline int 1130 nm_may_forward_down(struct netmap_kring *kring) 1131 { 1132 return _nm_may_forward(kring) && 1133 kring->ring_id == kring->na->num_rx_rings; 1134 } 1135 1136 /* 1137 * Send to the NIC rings packets marked NS_FORWARD between 1138 * kring->nr_hwcur and kring->rhead 1139 * Called under kring->rx_queue.lock on the sw rx ring, 1140 */ 1141 static u_int 1142 netmap_sw_to_nic(struct netmap_adapter *na) 1143 { 1144 struct netmap_kring *kring = &na->rx_rings[na->num_rx_rings]; 1145 struct netmap_slot *rxslot = kring->ring->slot; 1146 u_int i, rxcur = kring->nr_hwcur; 1147 u_int const head = kring->rhead; 1148 u_int const src_lim = kring->nkr_num_slots - 1; 1149 u_int sent = 0; 1150 1151 /* scan rings to find space, then fill as much as possible */ 1152 for (i = 0; i < na->num_tx_rings; i++) { 1153 struct netmap_kring *kdst = &na->tx_rings[i]; 1154 struct netmap_ring *rdst = kdst->ring; 1155 u_int const dst_lim = kdst->nkr_num_slots - 1; 1156 1157 /* XXX do we trust ring or kring->rcur,rtail ? 
*/ 1158 for (; rxcur != head && !nm_ring_empty(rdst); 1159 rxcur = nm_next(rxcur, src_lim) ) { 1160 struct netmap_slot *src, *dst, tmp; 1161 u_int dst_head = rdst->head; 1162 1163 src = &rxslot[rxcur]; 1164 if ((src->flags & NS_FORWARD) == 0 && !netmap_fwd) 1165 continue; 1166 1167 sent++; 1168 1169 dst = &rdst->slot[dst_head]; 1170 1171 tmp = *src; 1172 1173 src->buf_idx = dst->buf_idx; 1174 src->flags = NS_BUF_CHANGED; 1175 1176 dst->buf_idx = tmp.buf_idx; 1177 dst->len = tmp.len; 1178 dst->flags = NS_BUF_CHANGED; 1179 1180 rdst->head = rdst->cur = nm_next(dst_head, dst_lim); 1181 } 1182 /* if (sent) XXX txsync ? */ 1183 } 1184 return sent; 1185 } 1186 1187 1188 /* 1189 * netmap_txsync_to_host() passes packets up. We are called from a 1190 * system call in user process context, and the only contention 1191 * can be among multiple user threads erroneously calling 1192 * this routine concurrently. 1193 */ 1194 static int 1195 netmap_txsync_to_host(struct netmap_kring *kring, int flags) 1196 { 1197 struct netmap_adapter *na = kring->na; 1198 u_int const lim = kring->nkr_num_slots - 1; 1199 u_int const head = kring->rhead; 1200 struct mbq q; 1201 1202 /* Take packets from hwcur to head and pass them up. 1203 * force head = cur since netmap_grab_packets() stops at head 1204 * In case of no buffers we give up. At the end of the loop, 1205 * the queue is drained in all cases. 1206 */ 1207 mbq_init(&q); 1208 netmap_grab_packets(kring, &q, 1 /* force */); 1209 ND("have %d pkts in queue", mbq_len(&q)); 1210 kring->nr_hwcur = head; 1211 kring->nr_hwtail = head + lim; 1212 if (kring->nr_hwtail > lim) 1213 kring->nr_hwtail -= lim + 1; 1214 1215 netmap_send_up(na->ifp, &q); 1216 return 0; 1217 } 1218 1219 1220 /* 1221 * rxsync backend for packets coming from the host stack. 1222 * They have been put in kring->rx_queue by netmap_transmit(). 1223 * We protect access to the kring using kring->rx_queue.lock 1224 * 1225 * This routine also does the selrecord if called from the poll handler 1226 * (we know because sr != NULL). 1227 * 1228 * returns the number of packets delivered to tx queues in 1229 * transparent mode, or a negative value if error 1230 */ 1231 static int 1232 netmap_rxsync_from_host(struct netmap_kring *kring, int flags) 1233 { 1234 struct netmap_adapter *na = kring->na; 1235 struct netmap_ring *ring = kring->ring; 1236 u_int nm_i, n; 1237 u_int const lim = kring->nkr_num_slots - 1; 1238 u_int const head = kring->rhead; 1239 int ret = 0; 1240 struct mbq *q = &kring->rx_queue, fq; 1241 1242 mbq_init(&fq); /* fq holds packets to be freed */ 1243 1244 mbq_lock(q); 1245 1246 /* First part: import newly received packets */ 1247 n = mbq_len(q); 1248 if (n) { /* grab packets from the queue */ 1249 struct mbuf *m; 1250 uint32_t stop_i; 1251 1252 nm_i = kring->nr_hwtail; 1253 stop_i = nm_prev(nm_i, lim); 1254 while ( nm_i != stop_i && (m = mbq_dequeue(q)) != NULL ) { 1255 int len = MBUF_LEN(m); 1256 struct netmap_slot *slot = &ring->slot[nm_i]; 1257 1258 m_copydata(m, 0, len, NMB(na, slot)); 1259 ND("nm %d len %d", nm_i, len); 1260 if (netmap_verbose) 1261 D("%s", nm_dump_buf(NMB(na, slot),len, 128, NULL)); 1262 1263 slot->len = len; 1264 slot->flags = kring->nkr_slot_flags; 1265 nm_i = nm_next(nm_i, lim); 1266 mbq_enqueue(&fq, m); 1267 } 1268 kring->nr_hwtail = nm_i; 1269 } 1270 1271 /* 1272 * Second part: skip past packets that userspace has released. 
1273 */ 1274 nm_i = kring->nr_hwcur; 1275 if (nm_i != head) { /* something was released */ 1276 if (nm_may_forward_down(kring)) { 1277 ret = netmap_sw_to_nic(na); 1278 if (ret > 0) { 1279 kring->nr_kflags |= NR_FORWARD; 1280 ret = 0; 1281 } 1282 } 1283 kring->nr_hwcur = head; 1284 } 1285 1286 mbq_unlock(q); 1287 1288 mbq_purge(&fq); 1289 mbq_fini(&fq); 1290 1291 return ret; 1292 } 1293 1294 1295 /* Get a netmap adapter for the port. 1296 * 1297 * If it is possible to satisfy the request, return 0 1298 * with *na containing the netmap adapter found. 1299 * Otherwise return an error code, with *na containing NULL. 1300 * 1301 * When the port is attached to a bridge, we always return 1302 * EBUSY. 1303 * Otherwise, if the port is already bound to a file descriptor, 1304 * then we unconditionally return the existing adapter into *na. 1305 * In all the other cases, we return (into *na) either native, 1306 * generic or NULL, according to the following table: 1307 * 1308 * native_support 1309 * active_fds dev.netmap.admode YES NO 1310 * ------------------------------------------------------- 1311 * >0 * NA(ifp) NA(ifp) 1312 * 1313 * 0 NETMAP_ADMODE_BEST NATIVE GENERIC 1314 * 0 NETMAP_ADMODE_NATIVE NATIVE NULL 1315 * 0 NETMAP_ADMODE_GENERIC GENERIC GENERIC 1316 * 1317 */ 1318 static void netmap_hw_dtor(struct netmap_adapter *); /* needed by NM_IS_NATIVE() */ 1319 int 1320 netmap_get_hw_na(struct ifnet *ifp, struct netmap_adapter **na) 1321 { 1322 /* generic support */ 1323 int i = netmap_admode; /* Take a snapshot. */ 1324 struct netmap_adapter *prev_na; 1325 int error = 0; 1326 1327 *na = NULL; /* default */ 1328 1329 /* reset in case of invalid value */ 1330 if (i < NETMAP_ADMODE_BEST || i >= NETMAP_ADMODE_LAST) 1331 i = netmap_admode = NETMAP_ADMODE_BEST; 1332 1333 if (NM_NA_VALID(ifp)) { 1334 prev_na = NA(ifp); 1335 /* If an adapter already exists, return it if 1336 * there are active file descriptors or if 1337 * netmap is not forced to use generic 1338 * adapters. 1339 */ 1340 if (NETMAP_OWNED_BY_ANY(prev_na) 1341 || i != NETMAP_ADMODE_GENERIC 1342 || prev_na->na_flags & NAF_FORCE_NATIVE 1343 #ifdef WITH_PIPES 1344 /* ugly, but we cannot allow an adapter switch 1345 * if some pipe is referring to this one 1346 */ 1347 || prev_na->na_next_pipe > 0 1348 #endif 1349 ) { 1350 *na = prev_na; 1351 return 0; 1352 } 1353 } 1354 1355 /* If there isn't native support and netmap is not allowed 1356 * to use generic adapters, we cannot satisfy the request. 1357 */ 1358 if (!NM_IS_NATIVE(ifp) && i == NETMAP_ADMODE_NATIVE) 1359 return EOPNOTSUPP; 1360 1361 /* Otherwise, create a generic adapter and return it, 1362 * saving the previously used netmap adapter, if any. 1363 * 1364 * Note that here 'prev_na', if not NULL, MUST be a 1365 * native adapter, and CANNOT be a generic one. This is 1366 * true because generic adapters are created on demand, and 1367 * destroyed when not used anymore. Therefore, if the adapter 1368 * currently attached to an interface 'ifp' is generic, it 1369 * must be that 1370 * (NA(ifp)->active_fds > 0 || NETMAP_OWNED_BY_KERN(NA(ifp))). 1371 * Consequently, if NA(ifp) is generic, we will enter one of 1372 * the branches above. This ensures that we never override 1373 * a generic adapter with another generic adapter. 
1374 */ 1375 error = generic_netmap_attach(ifp); 1376 if (error) 1377 return error; 1378 1379 *na = NA(ifp); 1380 return 0; 1381 } 1382 1383 1384 /* 1385 * MUST BE CALLED UNDER NMG_LOCK() 1386 * 1387 * Get a refcounted reference to a netmap adapter attached 1388 * to the interface specified by nmr. 1389 * This is always called in the execution of an ioctl(). 1390 * 1391 * Return ENXIO if the interface specified by the request does 1392 * not exist, ENOTSUP if netmap is not supported by the interface, 1393 * EBUSY if the interface is already attached to a bridge, 1394 * EINVAL if parameters are invalid, ENOMEM if needed resources 1395 * could not be allocated. 1396 * If successful, hold a reference to the netmap adapter. 1397 * 1398 * If the interface specified by nmr is a system one, also keep 1399 * a reference to it and return a valid *ifp. 1400 */ 1401 int 1402 netmap_get_na(struct nmreq *nmr, struct netmap_adapter **na, 1403 struct ifnet **ifp, int create) 1404 { 1405 int error = 0; 1406 struct netmap_adapter *ret = NULL; 1407 1408 *na = NULL; /* default return value */ 1409 *ifp = NULL; 1410 1411 NMG_LOCK_ASSERT(); 1412 1413 /* We cascade through all possible types of netmap adapter. 1414 * All netmap_get_*_na() functions return an error and an na, 1415 * with the following combinations: 1416 * 1417 * error na 1418 * 0 NULL type doesn't match 1419 * !0 NULL type matches, but na creation/lookup failed 1420 * 0 !NULL type matches and na created/found 1421 * !0 !NULL impossible 1422 */ 1423 1424 /* try to see if this is a ptnetmap port */ 1425 error = netmap_get_pt_host_na(nmr, na, create); 1426 if (error || *na != NULL) 1427 return error; 1428 1429 /* try to see if this is a monitor port */ 1430 error = netmap_get_monitor_na(nmr, na, create); 1431 if (error || *na != NULL) 1432 return error; 1433 1434 /* try to see if this is a pipe port */ 1435 error = netmap_get_pipe_na(nmr, na, create); 1436 if (error || *na != NULL) 1437 return error; 1438 1439 /* try to see if this is a bridge port */ 1440 error = netmap_get_bdg_na(nmr, na, create); 1441 if (error) 1442 return error; 1443 1444 if (*na != NULL) /* valid match in netmap_get_bdg_na() */ 1445 goto out; 1446 1447 /* 1448 * This must be a hardware na, lookup the name in the system. 1449 * Note that by hardware we actually mean "it shows up in ifconfig". 1450 * This may still be a tap, a veth/epair, or even a 1451 * persistent VALE port. 
1452 */ 1453 *ifp = ifunit_ref(nmr->nr_name); 1454 if (*ifp == NULL) { 1455 return ENXIO; 1456 } 1457 1458 error = netmap_get_hw_na(*ifp, &ret); 1459 if (error) 1460 goto out; 1461 1462 *na = ret; 1463 netmap_adapter_get(ret); 1464 1465 out: 1466 if (error) { 1467 if (ret) 1468 netmap_adapter_put(ret); 1469 if (*ifp) { 1470 if_rele(*ifp); 1471 *ifp = NULL; 1472 } 1473 } 1474 1475 return error; 1476 } 1477 1478 /* undo netmap_get_na() */ 1479 void 1480 netmap_unget_na(struct netmap_adapter *na, struct ifnet *ifp) 1481 { 1482 if (ifp) 1483 if_rele(ifp); 1484 if (na) 1485 netmap_adapter_put(na); 1486 } 1487 1488 1489 #define NM_FAIL_ON(t) do { \ 1490 if (unlikely(t)) { \ 1491 RD(5, "%s: fail '" #t "' " \ 1492 "h %d c %d t %d " \ 1493 "rh %d rc %d rt %d " \ 1494 "hc %d ht %d", \ 1495 kring->name, \ 1496 head, cur, ring->tail, \ 1497 kring->rhead, kring->rcur, kring->rtail, \ 1498 kring->nr_hwcur, kring->nr_hwtail); \ 1499 return kring->nkr_num_slots; \ 1500 } \ 1501 } while (0) 1502 1503 /* 1504 * validate parameters on entry for *_txsync() 1505 * Returns ring->cur if ok, or something >= kring->nkr_num_slots 1506 * in case of error. 1507 * 1508 * rhead, rcur and rtail=hwtail are stored from previous round. 1509 * hwcur is the next packet to send to the ring. 1510 * 1511 * We want 1512 * hwcur <= *rhead <= head <= cur <= tail = *rtail <= hwtail 1513 * 1514 * hwcur, rhead, rtail and hwtail are reliable 1515 */ 1516 u_int 1517 nm_txsync_prologue(struct netmap_kring *kring, struct netmap_ring *ring) 1518 { 1519 u_int head = ring->head; /* read only once */ 1520 u_int cur = ring->cur; /* read only once */ 1521 u_int n = kring->nkr_num_slots; 1522 1523 ND(5, "%s kcur %d ktail %d head %d cur %d tail %d", 1524 kring->name, 1525 kring->nr_hwcur, kring->nr_hwtail, 1526 ring->head, ring->cur, ring->tail); 1527 #if 1 /* kernel sanity checks; but we can trust the kring. */ 1528 NM_FAIL_ON(kring->nr_hwcur >= n || kring->rhead >= n || 1529 kring->rtail >= n || kring->nr_hwtail >= n); 1530 #endif /* kernel sanity checks */ 1531 /* 1532 * user sanity checks. We only use head, 1533 * A, B, ... are possible positions for head: 1534 * 1535 * 0 A rhead B rtail C n-1 1536 * 0 D rtail E rhead F n-1 1537 * 1538 * B, F, D are valid. A, C, E are wrong 1539 */ 1540 if (kring->rtail >= kring->rhead) { 1541 /* want rhead <= head <= rtail */ 1542 NM_FAIL_ON(head < kring->rhead || head > kring->rtail); 1543 /* and also head <= cur <= rtail */ 1544 NM_FAIL_ON(cur < head || cur > kring->rtail); 1545 } else { /* here rtail < rhead */ 1546 /* we need head outside rtail .. rhead */ 1547 NM_FAIL_ON(head > kring->rtail && head < kring->rhead); 1548 1549 /* two cases now: head <= rtail or head >= rhead */ 1550 if (head <= kring->rtail) { 1551 /* want head <= cur <= rtail */ 1552 NM_FAIL_ON(cur < head || cur > kring->rtail); 1553 } else { /* head >= rhead */ 1554 /* cur must be outside rtail..head */ 1555 NM_FAIL_ON(cur > kring->rtail && cur < head); 1556 } 1557 } 1558 if (ring->tail != kring->rtail) { 1559 RD(5, "%s tail overwritten was %d need %d", kring->name, 1560 ring->tail, kring->rtail); 1561 ring->tail = kring->rtail; 1562 } 1563 kring->rhead = head; 1564 kring->rcur = cur; 1565 return head; 1566 } 1567 1568 1569 /* 1570 * validate parameters on entry for *_rxsync() 1571 * Returns ring->head if ok, kring->nkr_num_slots on error. 1572 * 1573 * For a valid configuration, 1574 * hwcur <= head <= cur <= tail <= hwtail 1575 * 1576 * We only consider head and cur. 1577 * hwcur and hwtail are reliable. 
1578 * 1579 */ 1580 u_int 1581 nm_rxsync_prologue(struct netmap_kring *kring, struct netmap_ring *ring) 1582 { 1583 uint32_t const n = kring->nkr_num_slots; 1584 uint32_t head, cur; 1585 1586 ND(5,"%s kc %d kt %d h %d c %d t %d", 1587 kring->name, 1588 kring->nr_hwcur, kring->nr_hwtail, 1589 ring->head, ring->cur, ring->tail); 1590 /* 1591 * Before storing the new values, we should check they do not 1592 * move backwards. However: 1593 * - head is not an issue because the previous value is hwcur; 1594 * - cur could in principle go back, however it does not matter 1595 * because we are processing a brand new rxsync() 1596 */ 1597 cur = kring->rcur = ring->cur; /* read only once */ 1598 head = kring->rhead = ring->head; /* read only once */ 1599 #if 1 /* kernel sanity checks */ 1600 NM_FAIL_ON(kring->nr_hwcur >= n || kring->nr_hwtail >= n); 1601 #endif /* kernel sanity checks */ 1602 /* user sanity checks */ 1603 if (kring->nr_hwtail >= kring->nr_hwcur) { 1604 /* want hwcur <= rhead <= hwtail */ 1605 NM_FAIL_ON(head < kring->nr_hwcur || head > kring->nr_hwtail); 1606 /* and also rhead <= rcur <= hwtail */ 1607 NM_FAIL_ON(cur < head || cur > kring->nr_hwtail); 1608 } else { 1609 /* we need rhead outside hwtail..hwcur */ 1610 NM_FAIL_ON(head < kring->nr_hwcur && head > kring->nr_hwtail); 1611 /* two cases now: head <= hwtail or head >= hwcur */ 1612 if (head <= kring->nr_hwtail) { 1613 /* want head <= cur <= hwtail */ 1614 NM_FAIL_ON(cur < head || cur > kring->nr_hwtail); 1615 } else { 1616 /* cur must be outside hwtail..head */ 1617 NM_FAIL_ON(cur < head && cur > kring->nr_hwtail); 1618 } 1619 } 1620 if (ring->tail != kring->rtail) { 1621 RD(5, "%s tail overwritten was %d need %d", 1622 kring->name, 1623 ring->tail, kring->rtail); 1624 ring->tail = kring->rtail; 1625 } 1626 return head; 1627 } 1628 1629 1630 /* 1631 * Error routine called when txsync/rxsync detects an error. 1632 * Can't do much more than resetting head =cur = hwcur, tail = hwtail 1633 * Return 1 on reinit. 1634 * 1635 * This routine is only called by the upper half of the kernel. 1636 * It only reads hwcur (which is changed only by the upper half, too) 1637 * and hwtail (which may be changed by the lower half, but only on 1638 * a tx ring and only to increase it, so any error will be recovered 1639 * on the next call). For the above, we don't strictly need to call 1640 * it under lock. 
1641 */ 1642 int 1643 netmap_ring_reinit(struct netmap_kring *kring) 1644 { 1645 struct netmap_ring *ring = kring->ring; 1646 u_int i, lim = kring->nkr_num_slots - 1; 1647 int errors = 0; 1648 1649 // XXX KASSERT nm_kr_tryget 1650 RD(10, "called for %s", kring->name); 1651 // XXX probably wrong to trust userspace 1652 kring->rhead = ring->head; 1653 kring->rcur = ring->cur; 1654 kring->rtail = ring->tail; 1655 1656 if (ring->cur > lim) 1657 errors++; 1658 if (ring->head > lim) 1659 errors++; 1660 if (ring->tail > lim) 1661 errors++; 1662 for (i = 0; i <= lim; i++) { 1663 u_int idx = ring->slot[i].buf_idx; 1664 u_int len = ring->slot[i].len; 1665 if (idx < 2 || idx >= kring->na->na_lut.objtotal) { 1666 RD(5, "bad index at slot %d idx %d len %d ", i, idx, len); 1667 ring->slot[i].buf_idx = 0; 1668 ring->slot[i].len = 0; 1669 } else if (len > NETMAP_BUF_SIZE(kring->na)) { 1670 ring->slot[i].len = 0; 1671 RD(5, "bad len at slot %d idx %d len %d", i, idx, len); 1672 } 1673 } 1674 if (errors) { 1675 RD(10, "total %d errors", errors); 1676 RD(10, "%s reinit, cur %d -> %d tail %d -> %d", 1677 kring->name, 1678 ring->cur, kring->nr_hwcur, 1679 ring->tail, kring->nr_hwtail); 1680 ring->head = kring->rhead = kring->nr_hwcur; 1681 ring->cur = kring->rcur = kring->nr_hwcur; 1682 ring->tail = kring->rtail = kring->nr_hwtail; 1683 } 1684 return (errors ? 1 : 0); 1685 } 1686 1687 /* interpret the ringid and flags fields of an nmreq, by translating them 1688 * into a pair of intervals of ring indices: 1689 * 1690 * [priv->np_txqfirst, priv->np_txqlast) and 1691 * [priv->np_rxqfirst, priv->np_rxqlast) 1692 * 1693 */ 1694 int 1695 netmap_interp_ringid(struct netmap_priv_d *priv, uint16_t ringid, uint32_t flags) 1696 { 1697 struct netmap_adapter *na = priv->np_na; 1698 u_int j, i = ringid & NETMAP_RING_MASK; 1699 u_int reg = flags & NR_REG_MASK; 1700 int excluded_direction[] = { NR_TX_RINGS_ONLY, NR_RX_RINGS_ONLY }; 1701 enum txrx t; 1702 1703 if (reg == NR_REG_DEFAULT) { 1704 /* convert from old ringid to flags */ 1705 if (ringid & NETMAP_SW_RING) { 1706 reg = NR_REG_SW; 1707 } else if (ringid & NETMAP_HW_RING) { 1708 reg = NR_REG_ONE_NIC; 1709 } else { 1710 reg = NR_REG_ALL_NIC; 1711 } 1712 D("deprecated API, old ringid 0x%x -> ringid %x reg %d", ringid, i, reg); 1713 } 1714 1715 if ((flags & NR_PTNETMAP_HOST) && (reg != NR_REG_ALL_NIC || 1716 flags & (NR_RX_RINGS_ONLY|NR_TX_RINGS_ONLY))) { 1717 D("Error: only NR_REG_ALL_NIC supported with netmap passthrough"); 1718 return EINVAL; 1719 } 1720 1721 for_rx_tx(t) { 1722 if (flags & excluded_direction[t]) { 1723 priv->np_qfirst[t] = priv->np_qlast[t] = 0; 1724 continue; 1725 } 1726 switch (reg) { 1727 case NR_REG_ALL_NIC: 1728 case NR_REG_PIPE_MASTER: 1729 case NR_REG_PIPE_SLAVE: 1730 priv->np_qfirst[t] = 0; 1731 priv->np_qlast[t] = nma_get_nrings(na, t); 1732 ND("ALL/PIPE: %s %d %d", nm_txrx2str(t), 1733 priv->np_qfirst[t], priv->np_qlast[t]); 1734 break; 1735 case NR_REG_SW: 1736 case NR_REG_NIC_SW: 1737 if (!(na->na_flags & NAF_HOST_RINGS)) { 1738 D("host rings not supported"); 1739 return EINVAL; 1740 } 1741 priv->np_qfirst[t] = (reg == NR_REG_SW ? 1742 nma_get_nrings(na, t) : 0); 1743 priv->np_qlast[t] = nma_get_nrings(na, t) + 1; 1744 ND("%s: %s %d %d", reg == NR_REG_SW ? 
"SW" : "NIC+SW", 1745 nm_txrx2str(t), 1746 priv->np_qfirst[t], priv->np_qlast[t]); 1747 break; 1748 case NR_REG_ONE_NIC: 1749 if (i >= na->num_tx_rings && i >= na->num_rx_rings) { 1750 D("invalid ring id %d", i); 1751 return EINVAL; 1752 } 1753 /* if not enough rings, use the first one */ 1754 j = i; 1755 if (j >= nma_get_nrings(na, t)) 1756 j = 0; 1757 priv->np_qfirst[t] = j; 1758 priv->np_qlast[t] = j + 1; 1759 ND("ONE_NIC: %s %d %d", nm_txrx2str(t), 1760 priv->np_qfirst[t], priv->np_qlast[t]); 1761 break; 1762 default: 1763 D("invalid regif type %d", reg); 1764 return EINVAL; 1765 } 1766 } 1767 priv->np_flags = (flags & ~NR_REG_MASK) | reg; 1768 1769 if (netmap_verbose) { 1770 D("%s: tx [%d,%d) rx [%d,%d) id %d", 1771 na->name, 1772 priv->np_qfirst[NR_TX], 1773 priv->np_qlast[NR_TX], 1774 priv->np_qfirst[NR_RX], 1775 priv->np_qlast[NR_RX], 1776 i); 1777 } 1778 return 0; 1779 } 1780 1781 1782 /* 1783 * Set the ring ID. For devices with a single queue, a request 1784 * for all rings is the same as a single ring. 1785 */ 1786 static int 1787 netmap_set_ringid(struct netmap_priv_d *priv, uint16_t ringid, uint32_t flags) 1788 { 1789 struct netmap_adapter *na = priv->np_na; 1790 int error; 1791 enum txrx t; 1792 1793 error = netmap_interp_ringid(priv, ringid, flags); 1794 if (error) { 1795 return error; 1796 } 1797 1798 priv->np_txpoll = (ringid & NETMAP_NO_TX_POLL) ? 0 : 1; 1799 1800 /* optimization: count the users registered for more than 1801 * one ring, which are the ones sleeping on the global queue. 1802 * The default netmap_notify() callback will then 1803 * avoid signaling the global queue if nobody is using it 1804 */ 1805 for_rx_tx(t) { 1806 if (nm_si_user(priv, t)) 1807 na->si_users[t]++; 1808 } 1809 return 0; 1810 } 1811 1812 static void 1813 netmap_unset_ringid(struct netmap_priv_d *priv) 1814 { 1815 struct netmap_adapter *na = priv->np_na; 1816 enum txrx t; 1817 1818 for_rx_tx(t) { 1819 if (nm_si_user(priv, t)) 1820 na->si_users[t]--; 1821 priv->np_qfirst[t] = priv->np_qlast[t] = 0; 1822 } 1823 priv->np_flags = 0; 1824 priv->np_txpoll = 0; 1825 } 1826 1827 1828 /* Set the nr_pending_mode for the requested rings. 1829 * If requested, also try to get exclusive access to the rings, provided 1830 * the rings we want to bind are not exclusively owned by a previous bind. 
1831 */ 1832 static int 1833 netmap_krings_get(struct netmap_priv_d *priv) 1834 { 1835 struct netmap_adapter *na = priv->np_na; 1836 u_int i; 1837 struct netmap_kring *kring; 1838 int excl = (priv->np_flags & NR_EXCLUSIVE); 1839 enum txrx t; 1840 1841 ND("%s: grabbing tx [%d, %d) rx [%d, %d)", 1842 na->name, 1843 priv->np_qfirst[NR_TX], 1844 priv->np_qlast[NR_TX], 1845 priv->np_qfirst[NR_RX], 1846 priv->np_qlast[NR_RX]); 1847 1848 /* first round: check that all the requested rings 1849 * are neither already exclusively owned, nor do we 1850 * want exclusive ownership when they are already in use 1851 */ 1852 for_rx_tx(t) { 1853 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { 1854 kring = &NMR(na, t)[i]; 1855 if ((kring->nr_kflags & NKR_EXCLUSIVE) || 1856 (kring->users && excl)) 1857 { 1858 ND("ring %s busy", kring->name); 1859 return EBUSY; 1860 } 1861 } 1862 } 1863 1864 /* second round: increment usage count (possibly marking them 1865 * as exclusive) and set the nr_pending_mode 1866 */ 1867 for_rx_tx(t) { 1868 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { 1869 kring = &NMR(na, t)[i]; 1870 kring->users++; 1871 if (excl) 1872 kring->nr_kflags |= NKR_EXCLUSIVE; 1873 kring->nr_pending_mode = NKR_NETMAP_ON; 1874 } 1875 } 1876 1877 return 0; 1878 1879 } 1880 1881 /* Undo netmap_krings_get(). This is done by clearing the exclusive mode 1882 * if it was asked on regif, and unsetting the nr_pending_mode if we are the 1883 * last users of the involved rings. */ 1884 static void 1885 netmap_krings_put(struct netmap_priv_d *priv) 1886 { 1887 struct netmap_adapter *na = priv->np_na; 1888 u_int i; 1889 struct netmap_kring *kring; 1890 int excl = (priv->np_flags & NR_EXCLUSIVE); 1891 enum txrx t; 1892 1893 ND("%s: releasing tx [%d, %d) rx [%d, %d)", 1894 na->name, 1895 priv->np_qfirst[NR_TX], 1896 priv->np_qlast[NR_TX], 1897 priv->np_qfirst[NR_RX], 1898 priv->np_qlast[NR_RX]); 1899 1900 1901 for_rx_tx(t) { 1902 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { 1903 kring = &NMR(na, t)[i]; 1904 if (excl) 1905 kring->nr_kflags &= ~NKR_EXCLUSIVE; 1906 kring->users--; 1907 if (kring->users == 0) 1908 kring->nr_pending_mode = NKR_NETMAP_OFF; 1909 } 1910 } 1911 } 1912 1913 /* 1914 * Possibly move the interface to netmap mode. 1915 * On success it returns a pointer to netmap_if, otherwise NULL. 1916 * This must be called with NMG_LOCK held. 1917 * 1918 * The following na callbacks are called in the process: 1919 * 1920 * na->nm_config() [by netmap_update_config] 1921 * (get current number and size of rings) 1922 * 1923 * We have a generic one for linux (netmap_linux_config). 1924 * The bwrap has to override this, since it has to forward 1925 * the request to the wrapped adapter (netmap_bwrap_config). 1926 * 1927 * 1928 * na->nm_krings_create() 1929 * (create and init the krings array) 1930 * 1931 * One of the following: 1932 * 1933 * * netmap_hw_krings_create, (hw ports) 1934 * creates the standard layout for the krings 1935 * and adds the mbq (used for the host rings).
1936 * 1937 * * netmap_vp_krings_create (VALE ports) 1938 * add leases and scratchpads 1939 * 1940 * * netmap_pipe_krings_create (pipes) 1941 * create the krings and rings of both ends and 1942 * cross-link them 1943 * 1944 * * netmap_monitor_krings_create (monitors) 1945 * avoid allocating the mbq 1946 * 1947 * * netmap_bwrap_krings_create (bwraps) 1948 * create both the brap krings array, 1949 * the krings array of the wrapped adapter, and 1950 * (if needed) the fake array for the host adapter 1951 * 1952 * na->nm_register(, 1) 1953 * (put the adapter in netmap mode) 1954 * 1955 * This may be one of the following: 1956 * 1957 * * netmap_hw_reg (hw ports) 1958 * checks that the ifp is still there, then calls 1959 * the hardware specific callback; 1960 * 1961 * * netmap_vp_reg (VALE ports) 1962 * If the port is connected to a bridge, 1963 * set the NAF_NETMAP_ON flag under the 1964 * bridge write lock. 1965 * 1966 * * netmap_pipe_reg (pipes) 1967 * inform the other pipe end that it is no 1968 * longer responsible for the lifetime of this 1969 * pipe end 1970 * 1971 * * netmap_monitor_reg (monitors) 1972 * intercept the sync callbacks of the monitored 1973 * rings 1974 * 1975 * * netmap_bwrap_reg (bwraps) 1976 * cross-link the bwrap and hwna rings, 1977 * forward the request to the hwna, override 1978 * the hwna notify callback (to get the frames 1979 * coming from outside go through the bridge). 1980 * 1981 * 1982 */ 1983 int 1984 netmap_do_regif(struct netmap_priv_d *priv, struct netmap_adapter *na, 1985 uint16_t ringid, uint32_t flags) 1986 { 1987 struct netmap_if *nifp = NULL; 1988 int error; 1989 1990 NMG_LOCK_ASSERT(); 1991 /* ring configuration may have changed, fetch from the card */ 1992 netmap_update_config(na); 1993 priv->np_na = na; /* store the reference */ 1994 error = netmap_set_ringid(priv, ringid, flags); 1995 if (error) 1996 goto err; 1997 error = netmap_mem_finalize(na->nm_mem, na); 1998 if (error) 1999 goto err; 2000 2001 if (na->active_fds == 0) { 2002 /* 2003 * If this is the first registration of the adapter, 2004 * create the in-kernel view of the netmap rings, 2005 * the netmap krings. 2006 */ 2007 2008 /* 2009 * Depending on the adapter, this may also create 2010 * the netmap rings themselves 2011 */ 2012 error = na->nm_krings_create(na); 2013 if (error) 2014 goto err_drop_mem; 2015 2016 } 2017 2018 /* now the krings must exist and we can check whether some 2019 * previous bind has exclusive ownership on them, and set 2020 * nr_pending_mode 2021 */ 2022 error = netmap_krings_get(priv); 2023 if (error) 2024 goto err_del_krings; 2025 2026 /* create all needed missing netmap rings */ 2027 error = netmap_mem_rings_create(na); 2028 if (error) 2029 goto err_rel_excl; 2030 2031 /* in all cases, create a new netmap if */ 2032 nifp = netmap_mem_if_new(na); 2033 if (nifp == NULL) { 2034 error = ENOMEM; 2035 goto err_del_rings; 2036 } 2037 2038 if (na->active_fds == 0) { 2039 /* cache the allocator info in the na */ 2040 error = netmap_mem_get_lut(na->nm_mem, &na->na_lut); 2041 if (error) 2042 goto err_del_if; 2043 ND("lut %p bufs %u size %u", na->na_lut.lut, na->na_lut.objtotal, 2044 na->na_lut.objsize); 2045 } 2046 2047 if (nm_kring_pending(priv)) { 2048 /* Some kring is switching mode, tell the adapter to 2049 * react on this. */ 2050 error = na->nm_register(na, 1); 2051 if (error) 2052 goto err_put_lut; 2053 } 2054 2055 /* Commit the reference. */ 2056 na->active_fds++; 2057 2058 /* 2059 * advertise that the interface is ready by setting np_nifp. 
2060 * The barrier is needed because readers (poll, *SYNC and mmap) 2061 * check for priv->np_nifp != NULL without locking 2062 */ 2063 mb(); /* make sure previous writes are visible to all CPUs */ 2064 priv->np_nifp = nifp; 2065 2066 return 0; 2067 2068 err_put_lut: 2069 if (na->active_fds == 0) 2070 memset(&na->na_lut, 0, sizeof(na->na_lut)); 2071 err_del_if: 2072 netmap_mem_if_delete(na, nifp); 2073 err_rel_excl: 2074 netmap_krings_put(priv); 2075 err_del_rings: 2076 netmap_mem_rings_delete(na); 2077 err_del_krings: 2078 if (na->active_fds == 0) 2079 na->nm_krings_delete(na); 2080 err_drop_mem: 2081 netmap_mem_deref(na->nm_mem, na); 2082 err: 2083 priv->np_na = NULL; 2084 return error; 2085 } 2086 2087 2088 /* 2089 * update kring and ring at the end of rxsync/txsync. 2090 */ 2091 static inline void 2092 nm_sync_finalize(struct netmap_kring *kring) 2093 { 2094 /* 2095 * Update ring tail to what the kernel knows 2096 * After txsync: head/rhead/hwcur might be behind cur/rcur 2097 * if no carrier. 2098 */ 2099 kring->ring->tail = kring->rtail = kring->nr_hwtail; 2100 2101 ND(5, "%s now hwcur %d hwtail %d head %d cur %d tail %d", 2102 kring->name, kring->nr_hwcur, kring->nr_hwtail, 2103 kring->rhead, kring->rcur, kring->rtail); 2104 } 2105 2106 /* 2107 * ioctl(2) support for the "netmap" device. 2108 * 2109 * Following a list of accepted commands: 2110 * - NIOCGINFO 2111 * - SIOCGIFADDR just for convenience 2112 * - NIOCREGIF 2113 * - NIOCTXSYNC 2114 * - NIOCRXSYNC 2115 * 2116 * Return 0 on success, errno otherwise. 2117 */ 2118 int 2119 netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, caddr_t data, struct thread *td) 2120 { 2121 struct nmreq *nmr = (struct nmreq *) data; 2122 struct netmap_adapter *na = NULL; 2123 struct ifnet *ifp = NULL; 2124 int error = 0; 2125 u_int i, qfirst, qlast; 2126 struct netmap_if *nifp; 2127 struct netmap_kring *krings; 2128 enum txrx t; 2129 2130 if (cmd == NIOCGINFO || cmd == NIOCREGIF) { 2131 /* truncate name */ 2132 nmr->nr_name[sizeof(nmr->nr_name) - 1] = '\0'; 2133 if (nmr->nr_version != NETMAP_API) { 2134 D("API mismatch for %s got %d need %d", 2135 nmr->nr_name, 2136 nmr->nr_version, NETMAP_API); 2137 nmr->nr_version = NETMAP_API; 2138 } 2139 if (nmr->nr_version < NETMAP_MIN_API || 2140 nmr->nr_version > NETMAP_MAX_API) { 2141 return EINVAL; 2142 } 2143 } 2144 2145 switch (cmd) { 2146 case NIOCGINFO: /* return capabilities etc */ 2147 if (nmr->nr_cmd == NETMAP_BDG_LIST) { 2148 error = netmap_bdg_ctl(nmr, NULL); 2149 break; 2150 } 2151 2152 NMG_LOCK(); 2153 do { 2154 /* memsize is always valid */ 2155 struct netmap_mem_d *nmd = &nm_mem; 2156 u_int memflags; 2157 2158 if (nmr->nr_name[0] != '\0') { 2159 2160 /* get a refcount */ 2161 error = netmap_get_na(nmr, &na, &ifp, 1 /* create */); 2162 if (error) { 2163 na = NULL; 2164 ifp = NULL; 2165 break; 2166 } 2167 nmd = na->nm_mem; /* get memory allocator */ 2168 } 2169 2170 error = netmap_mem_get_info(nmd, &nmr->nr_memsize, &memflags, 2171 &nmr->nr_arg2); 2172 if (error) 2173 break; 2174 if (na == NULL) /* only memory info */ 2175 break; 2176 nmr->nr_offset = 0; 2177 nmr->nr_rx_slots = nmr->nr_tx_slots = 0; 2178 netmap_update_config(na); 2179 nmr->nr_rx_rings = na->num_rx_rings; 2180 nmr->nr_tx_rings = na->num_tx_rings; 2181 nmr->nr_rx_slots = na->num_rx_desc; 2182 nmr->nr_tx_slots = na->num_tx_desc; 2183 } while (0); 2184 netmap_unget_na(na, ifp); 2185 NMG_UNLOCK(); 2186 break; 2187 2188 case NIOCREGIF: 2189 /* 2190 * If nmr->nr_cmd is not zero, this NIOCREGIF is not really 2191 * a regif operation, 
but a different one, specified by the 2192 * value of nmr->nr_cmd. 2193 */ 2194 i = nmr->nr_cmd; 2195 if (i == NETMAP_BDG_ATTACH || i == NETMAP_BDG_DETACH 2196 || i == NETMAP_BDG_VNET_HDR 2197 || i == NETMAP_BDG_NEWIF 2198 || i == NETMAP_BDG_DELIF 2199 || i == NETMAP_BDG_POLLING_ON 2200 || i == NETMAP_BDG_POLLING_OFF) { 2201 /* possibly attach/detach NIC and VALE switch */ 2202 error = netmap_bdg_ctl(nmr, NULL); 2203 break; 2204 } else if (i == NETMAP_PT_HOST_CREATE || i == NETMAP_PT_HOST_DELETE) { 2205 /* forward the command to the ptnetmap subsystem */ 2206 error = ptnetmap_ctl(nmr, priv->np_na); 2207 break; 2208 } else if (i == NETMAP_VNET_HDR_GET) { 2209 /* get vnet-header length for this netmap port */ 2210 struct ifnet *ifp; 2211 2212 NMG_LOCK(); 2213 error = netmap_get_na(nmr, &na, &ifp, 0); 2214 if (na && !error) { 2215 nmr->nr_arg1 = na->virt_hdr_len; 2216 } 2217 netmap_unget_na(na, ifp); 2218 NMG_UNLOCK(); 2219 break; 2220 } else if (i == NETMAP_POOLS_INFO_GET) { 2221 /* get information from the memory allocator */ 2222 error = netmap_mem_pools_info_get(nmr, priv->np_na); 2223 break; 2224 } else if (i != 0) { 2225 D("nr_cmd must be 0 not %d", i); 2226 error = EINVAL; 2227 break; 2228 } 2229 2230 /* protect access to priv from concurrent NIOCREGIF */ 2231 NMG_LOCK(); 2232 do { 2233 u_int memflags; 2234 struct ifnet *ifp; 2235 2236 if (priv->np_nifp != NULL) { /* thread already registered */ 2237 error = EBUSY; 2238 break; 2239 } 2240 /* find the interface and a reference */ 2241 error = netmap_get_na(nmr, &na, &ifp, 2242 1 /* create */); /* keep reference */ 2243 if (error) 2244 break; 2245 if (NETMAP_OWNED_BY_KERN(na)) { 2246 netmap_unget_na(na, ifp); 2247 error = EBUSY; 2248 break; 2249 } 2250 2251 if (na->virt_hdr_len && !(nmr->nr_flags & NR_ACCEPT_VNET_HDR)) { 2252 netmap_unget_na(na, ifp); 2253 error = EIO; 2254 break; 2255 } 2256 2257 error = netmap_do_regif(priv, na, nmr->nr_ringid, nmr->nr_flags); 2258 if (error) { /* reg. failed, release priv and ref */ 2259 netmap_unget_na(na, ifp); 2260 break; 2261 } 2262 nifp = priv->np_nifp; 2263 priv->np_td = td; // XXX kqueue, debugging only 2264 2265 /* return the offset of the netmap_if object */ 2266 nmr->nr_rx_rings = na->num_rx_rings; 2267 nmr->nr_tx_rings = na->num_tx_rings; 2268 nmr->nr_rx_slots = na->num_rx_desc; 2269 nmr->nr_tx_slots = na->num_tx_desc; 2270 error = netmap_mem_get_info(na->nm_mem, &nmr->nr_memsize, &memflags, 2271 &nmr->nr_arg2); 2272 if (error) { 2273 netmap_do_unregif(priv); 2274 netmap_unget_na(na, ifp); 2275 break; 2276 } 2277 if (memflags & NETMAP_MEM_PRIVATE) { 2278 *(uint32_t *)(uintptr_t)&nifp->ni_flags |= NI_PRIV_MEM; 2279 } 2280 for_rx_tx(t) { 2281 priv->np_si[t] = nm_si_user(priv, t) ? 
2282 &na->si[t] : &NMR(na, t)[priv->np_qfirst[t]].si; 2283 } 2284 2285 if (nmr->nr_arg3) { 2286 if (netmap_verbose) 2287 D("requested %d extra buffers", nmr->nr_arg3); 2288 nmr->nr_arg3 = netmap_extra_alloc(na, 2289 &nifp->ni_bufs_head, nmr->nr_arg3); 2290 if (netmap_verbose) 2291 D("got %d extra buffers", nmr->nr_arg3); 2292 } 2293 nmr->nr_offset = netmap_mem_if_offset(na->nm_mem, nifp); 2294 2295 /* store ifp reference so that priv destructor may release it */ 2296 priv->np_ifp = ifp; 2297 } while (0); 2298 NMG_UNLOCK(); 2299 break; 2300 2301 case NIOCTXSYNC: 2302 case NIOCRXSYNC: 2303 nifp = priv->np_nifp; 2304 2305 if (nifp == NULL) { 2306 error = ENXIO; 2307 break; 2308 } 2309 mb(); /* make sure following reads are not from cache */ 2310 2311 na = priv->np_na; /* we have a reference */ 2312 2313 if (na == NULL) { 2314 D("Internal error: nifp != NULL && na == NULL"); 2315 error = ENXIO; 2316 break; 2317 } 2318 2319 t = (cmd == NIOCTXSYNC ? NR_TX : NR_RX); 2320 krings = NMR(na, t); 2321 qfirst = priv->np_qfirst[t]; 2322 qlast = priv->np_qlast[t]; 2323 2324 for (i = qfirst; i < qlast; i++) { 2325 struct netmap_kring *kring = krings + i; 2326 struct netmap_ring *ring = kring->ring; 2327 2328 if (unlikely(nm_kr_tryget(kring, 1, &error))) { 2329 error = (error ? EIO : 0); 2330 continue; 2331 } 2332 2333 if (cmd == NIOCTXSYNC) { 2334 if (netmap_verbose & NM_VERB_TXSYNC) 2335 D("pre txsync ring %d cur %d hwcur %d", 2336 i, ring->cur, 2337 kring->nr_hwcur); 2338 if (nm_txsync_prologue(kring, ring) >= kring->nkr_num_slots) { 2339 netmap_ring_reinit(kring); 2340 } else if (kring->nm_sync(kring, NAF_FORCE_RECLAIM) == 0) { 2341 nm_sync_finalize(kring); 2342 } 2343 if (netmap_verbose & NM_VERB_TXSYNC) 2344 D("post txsync ring %d cur %d hwcur %d", 2345 i, ring->cur, 2346 kring->nr_hwcur); 2347 } else { 2348 if (nm_rxsync_prologue(kring, ring) >= kring->nkr_num_slots) { 2349 netmap_ring_reinit(kring); 2350 } else if (kring->nm_sync(kring, NAF_FORCE_READ) == 0) { 2351 nm_sync_finalize(kring); 2352 } 2353 microtime(&ring->ts); 2354 } 2355 nm_kr_put(kring); 2356 } 2357 2358 break; 2359 2360 #ifdef WITH_VALE 2361 case NIOCCONFIG: 2362 error = netmap_bdg_config(nmr); 2363 break; 2364 #endif 2365 #ifdef __FreeBSD__ 2366 case FIONBIO: 2367 case FIOASYNC: 2368 ND("FIONBIO/FIOASYNC are no-ops"); 2369 break; 2370 2371 case BIOCIMMEDIATE: 2372 case BIOCGHDRCMPLT: 2373 case BIOCSHDRCMPLT: 2374 case BIOCSSEESENT: 2375 D("ignore BIOCIMMEDIATE/BIOCSHDRCMPLT/BIOCSHDRCMPLT/BIOCSSEESENT"); 2376 break; 2377 2378 default: /* allow device-specific ioctls */ 2379 { 2380 struct ifnet *ifp = ifunit_ref(nmr->nr_name); 2381 if (ifp == NULL) { 2382 error = ENXIO; 2383 } else { 2384 struct socket so; 2385 2386 bzero(&so, sizeof(so)); 2387 so.so_vnet = ifp->if_vnet; 2388 // so->so_proto not null. 2389 error = ifioctl(&so, cmd, data, td); 2390 if_rele(ifp); 2391 } 2392 break; 2393 } 2394 2395 #else /* linux */ 2396 default: 2397 error = EOPNOTSUPP; 2398 #endif /* linux */ 2399 } 2400 2401 return (error); 2402 } 2403 2404 2405 /* 2406 * select(2) and poll(2) handlers for the "netmap" device. 2407 * 2408 * Can be called for one or more queues. 2409 * Return true the event mask corresponding to ready events. 2410 * If there are no ready events, do a selrecord on either individual 2411 * selinfo or on the global one. 2412 * Device-dependent parts (locking and sync of tx/rx rings) 2413 * are done through callbacks. 
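 * (Editor's sketch of the userspace side, using the helpers from
 * net/netmap_user.h; fd and nifp come from the open()/NIOCREGIF/mmap()
 * sequence described at the top of this file, and consume() is a
 * placeholder for application code.)
 *
 *	struct pollfd pfd = { .fd = fd, .events = POLLIN };
 *	struct netmap_ring *rxring = NETMAP_RXRING(nifp, 0);
 *
 *	for (;;) {
 *		poll(&pfd, 1, 1000);
 *		while (!nm_ring_empty(rxring)) {
 *			u_int i = rxring->cur;
 *			char *buf = NETMAP_BUF(rxring, rxring->slot[i].buf_idx);
 *
 *			consume(buf, rxring->slot[i].len);
 *			rxring->head = rxring->cur = nm_ring_next(rxring, i);
 *		}
 *	}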
2414 * 2415 * On linux, the arguments are really pwait, the poll table, and 'td' is struct file * 2416 * The first one is remapped to pwait as selrecord() uses the name as a 2417 * hidden argument. 2418 */ 2419 int 2420 netmap_poll(struct netmap_priv_d *priv, int events, NM_SELRECORD_T *sr) 2421 { 2422 struct netmap_adapter *na; 2423 struct netmap_kring *kring; 2424 struct netmap_ring *ring; 2425 u_int i, check_all_tx, check_all_rx, want[NR_TXRX], revents = 0; 2426 #define want_tx want[NR_TX] 2427 #define want_rx want[NR_RX] 2428 struct mbq q; /* packets from hw queues to host stack */ 2429 enum txrx t; 2430 2431 /* 2432 * In order to avoid nested locks, we need to "double check" 2433 * txsync and rxsync if we decide to do a selrecord(). 2434 * retry_tx (and retry_rx, later) prevent looping forever. 2435 */ 2436 int retry_tx = 1, retry_rx = 1; 2437 2438 /* transparent mode: send_down is 1 if we have found some 2439 * packets to forward during the rx scan and we have not 2440 * sent them down to the nic yet 2441 */ 2442 int send_down = 0; 2443 2444 mbq_init(&q); 2445 2446 if (priv->np_nifp == NULL) { 2447 D("No if registered"); 2448 return POLLERR; 2449 } 2450 mb(); /* make sure following reads are not from cache */ 2451 2452 na = priv->np_na; 2453 2454 if (!nm_netmap_on(na)) 2455 return POLLERR; 2456 2457 if (netmap_verbose & 0x8000) 2458 D("device %s events 0x%x", na->name, events); 2459 want_tx = events & (POLLOUT | POLLWRNORM); 2460 want_rx = events & (POLLIN | POLLRDNORM); 2461 2462 /* 2463 * check_all_{tx|rx} are set if the card has more than one queue AND 2464 * the file descriptor is bound to all of them. If so, we sleep on 2465 * the "global" selinfo, otherwise we sleep on individual selinfo 2466 * (FreeBSD only allows two selinfo's per file descriptor). 2467 * The interrupt routine in the driver wakes one or the other 2468 * (or both) depending on which clients are active. 2469 * 2470 * rxsync() is only called if we run out of buffers on a POLLIN. 2471 * txsync() is called if we run out of buffers on POLLOUT, or 2472 * there are pending packets to send. The latter can be disabled 2473 * by passing NETMAP_NO_TX_POLL in the NIOCREGIF call. 2474 */ 2475 check_all_tx = nm_si_user(priv, NR_TX); 2476 check_all_rx = nm_si_user(priv, NR_RX); 2477 2478 /* 2479 * We start with a lock-free round which is cheap if we have 2480 * slots available. If this fails, then lock and call the sync 2481 * routines.
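 * (Editor's note) The emptiness test used in that first round is just a
 * comparison of the ring pointers, roughly equivalent to:
 *
 *	static inline int
 *	nm_ring_empty(struct netmap_ring *ring)
 *	{
 *		return (ring->cur == ring->tail);	// nothing new for us
 *	}
 *
 * so a bound ring makes the file descriptor ready as soon as cur != tail.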
2482 */ 2483 #if 1 /* new code- call rx if any of the ring needs to release or read buffers */ 2484 if (want_tx) { 2485 t = NR_TX; 2486 for (i = priv->np_qfirst[t]; want[t] && i < priv->np_qlast[t]; i++) { 2487 kring = &NMR(na, t)[i]; 2488 /* XXX compare ring->cur and kring->tail */ 2489 if (!nm_ring_empty(kring->ring)) { 2490 revents |= want[t]; 2491 want[t] = 0; /* also breaks the loop */ 2492 } 2493 } 2494 } 2495 if (want_rx) { 2496 want_rx = 0; /* look for a reason to run the handlers */ 2497 t = NR_RX; 2498 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { 2499 kring = &NMR(na, t)[i]; 2500 if (kring->ring->cur == kring->ring->tail /* try fetch new buffers */ 2501 || kring->rhead != kring->ring->head /* release buffers */) { 2502 want_rx = 1; 2503 } 2504 } 2505 if (!want_rx) 2506 revents |= events & (POLLIN | POLLRDNORM); /* we have data */ 2507 } 2508 #else /* old code */ 2509 for_rx_tx(t) { 2510 for (i = priv->np_qfirst[t]; want[t] && i < priv->np_qlast[t]; i++) { 2511 kring = &NMR(na, t)[i]; 2512 /* XXX compare ring->cur and kring->tail */ 2513 if (!nm_ring_empty(kring->ring)) { 2514 revents |= want[t]; 2515 want[t] = 0; /* also breaks the loop */ 2516 } 2517 } 2518 } 2519 #endif /* old code */ 2520 2521 /* 2522 * If we want to push packets out (priv->np_txpoll) or 2523 * want_tx is still set, we must issue txsync calls 2524 * (on all rings, to avoid that the tx rings stall). 2525 * XXX should also check cur != hwcur on the tx rings. 2526 * Fortunately, normal tx mode has np_txpoll set. 2527 */ 2528 if (priv->np_txpoll || want_tx) { 2529 /* 2530 * The first round checks if anyone is ready, if not 2531 * do a selrecord and another round to handle races. 2532 * want_tx goes to 0 if any space is found, and is 2533 * used to skip rings with no pending transmissions. 2534 */ 2535 flush_tx: 2536 for (i = priv->np_qfirst[NR_TX]; i < priv->np_qlast[NR_TX]; i++) { 2537 int found = 0; 2538 2539 kring = &na->tx_rings[i]; 2540 ring = kring->ring; 2541 2542 if (!send_down && !want_tx && ring->cur == kring->nr_hwcur) 2543 continue; 2544 2545 if (nm_kr_tryget(kring, 1, &revents)) 2546 continue; 2547 2548 if (nm_txsync_prologue(kring, ring) >= kring->nkr_num_slots) { 2549 netmap_ring_reinit(kring); 2550 revents |= POLLERR; 2551 } else { 2552 if (kring->nm_sync(kring, 0)) 2553 revents |= POLLERR; 2554 else 2555 nm_sync_finalize(kring); 2556 } 2557 2558 /* 2559 * If we found new slots, notify potential 2560 * listeners on the same ring. 2561 * Since we just did a txsync, look at the copies 2562 * of cur,tail in the kring. 2563 */ 2564 found = kring->rcur != kring->rtail; 2565 nm_kr_put(kring); 2566 if (found) { /* notify other listeners */ 2567 revents |= want_tx; 2568 want_tx = 0; 2569 kring->nm_notify(kring, 0); 2570 } 2571 } 2572 /* if there were any packet to forward we must have handled them by now */ 2573 send_down = 0; 2574 if (want_tx && retry_tx && sr) { 2575 nm_os_selrecord(sr, check_all_tx ? 2576 &na->si[NR_TX] : &na->tx_rings[priv->np_qfirst[NR_TX]].si); 2577 retry_tx = 0; 2578 goto flush_tx; 2579 } 2580 } 2581 2582 /* 2583 * If want_rx is still set scan receive rings. 2584 * Do it on all rings because otherwise we starve. 
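 * (Editor's note) The rx scan below also implements the transparent-mode
 * forwarding: a userspace consumer can ask for a received buffer to be
 * passed up to the host stack instead of being recycled, roughly as in the
 * following sketch (it takes effect only if the kernel side allows it, see
 * nm_may_forward_up() and the netmap_fwd sysctl):
 *
 *	u_int i = rxring->cur;
 *
 *	rxring->slot[i].flags |= NS_FORWARD;	// hand this buffer to the host stack
 *	rxring->head = rxring->cur = nm_ring_next(rxring, i);
 *	// the buffer is actually pushed up during the next rxsync/poll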
2585 */ 2586 if (want_rx) { 2587 /* two rounds here for race avoidance */ 2588 do_retry_rx: 2589 for (i = priv->np_qfirst[NR_RX]; i < priv->np_qlast[NR_RX]; i++) { 2590 int found = 0; 2591 2592 kring = &na->rx_rings[i]; 2593 ring = kring->ring; 2594 2595 if (unlikely(nm_kr_tryget(kring, 1, &revents))) 2596 continue; 2597 2598 if (nm_rxsync_prologue(kring, ring) >= kring->nkr_num_slots) { 2599 netmap_ring_reinit(kring); 2600 revents |= POLLERR; 2601 } 2602 /* now we can use kring->rcur, rtail */ 2603 2604 /* 2605 * transparent mode support: collect packets 2606 * from the rxring(s). 2607 */ 2608 if (nm_may_forward_up(kring)) { 2609 ND(10, "forwarding some buffers up %d to %d", 2610 kring->nr_hwcur, ring->cur); 2611 netmap_grab_packets(kring, &q, netmap_fwd); 2612 } 2613 2614 kring->nr_kflags &= ~NR_FORWARD; 2615 if (kring->nm_sync(kring, 0)) 2616 revents |= POLLERR; 2617 else 2618 nm_sync_finalize(kring); 2619 send_down |= (kring->nr_kflags & NR_FORWARD); /* host ring only */ 2620 if (netmap_no_timestamp == 0 || 2621 ring->flags & NR_TIMESTAMP) { 2622 microtime(&ring->ts); 2623 } 2624 found = kring->rcur != kring->rtail; 2625 nm_kr_put(kring); 2626 if (found) { 2627 revents |= want_rx; 2628 retry_rx = 0; 2629 kring->nm_notify(kring, 0); 2630 } 2631 } 2632 2633 if (retry_rx && sr) { 2634 nm_os_selrecord(sr, check_all_rx ? 2635 &na->si[NR_RX] : &na->rx_rings[priv->np_qfirst[NR_RX]].si); 2636 } 2637 if (send_down > 0 || retry_rx) { 2638 retry_rx = 0; 2639 if (send_down) 2640 goto flush_tx; /* and retry_rx */ 2641 else 2642 goto do_retry_rx; 2643 } 2644 } 2645 2646 /* 2647 * Transparent mode: marked bufs on rx rings between 2648 * kring->nr_hwcur and ring->head 2649 * are passed to the other endpoint. 2650 * 2651 * Transparent mode requires to bind all 2652 * rings to a single file descriptor. 2653 */ 2654 2655 if (q.head && !nm_kr_tryget(&na->tx_rings[na->num_tx_rings], 1, &revents)) { 2656 netmap_send_up(na->ifp, &q); 2657 nm_kr_put(&na->tx_rings[na->num_tx_rings]); 2658 } 2659 2660 return (revents); 2661 #undef want_tx 2662 #undef want_rx 2663 } 2664 2665 2666 /*-------------------- driver support routines -------------------*/ 2667 2668 /* default notify callback */ 2669 static int 2670 netmap_notify(struct netmap_kring *kring, int flags) 2671 { 2672 struct netmap_adapter *na = kring->na; 2673 enum txrx t = kring->tx; 2674 2675 nm_os_selwakeup(&kring->si); 2676 /* optimization: avoid a wake up on the global 2677 * queue if nobody has registered for more 2678 * than one ring 2679 */ 2680 if (na->si_users[t] > 0) 2681 nm_os_selwakeup(&na->si[t]); 2682 2683 return NM_IRQ_COMPLETED; 2684 } 2685 2686 #if 0 2687 static int 2688 netmap_notify(struct netmap_adapter *na, u_int n_ring, 2689 enum txrx tx, int flags) 2690 { 2691 if (tx == NR_TX) { 2692 KeSetEvent(notes->TX_EVENT, 0, FALSE); 2693 } 2694 else 2695 { 2696 KeSetEvent(notes->RX_EVENT, 0, FALSE); 2697 } 2698 return 0; 2699 } 2700 #endif 2701 2702 /* called by all routines that create netmap_adapters. 
2703 * provide some defaults and get a reference to the 2704 * memory allocator 2705 */ 2706 int 2707 netmap_attach_common(struct netmap_adapter *na) 2708 { 2709 if (na->num_tx_rings == 0 || na->num_rx_rings == 0) { 2710 D("%s: invalid rings tx %d rx %d", 2711 na->name, na->num_tx_rings, na->num_rx_rings); 2712 return EINVAL; 2713 } 2714 2715 #ifdef __FreeBSD__ 2716 if (na->na_flags & NAF_HOST_RINGS && na->ifp) { 2717 na->if_input = na->ifp->if_input; /* for netmap_send_up */ 2718 } 2719 #endif /* __FreeBSD__ */ 2720 if (na->nm_krings_create == NULL) { 2721 /* we assume that we have been called by a driver, 2722 * since other port types all provide their own 2723 * nm_krings_create 2724 */ 2725 na->nm_krings_create = netmap_hw_krings_create; 2726 na->nm_krings_delete = netmap_hw_krings_delete; 2727 } 2728 if (na->nm_notify == NULL) 2729 na->nm_notify = netmap_notify; 2730 na->active_fds = 0; 2731 2732 if (na->nm_mem == NULL) 2733 /* use the global allocator */ 2734 na->nm_mem = &nm_mem; 2735 netmap_mem_get(na->nm_mem); 2736 #ifdef WITH_VALE 2737 if (na->nm_bdg_attach == NULL) 2738 /* no special nm_bdg_attach callback. On VALE 2739 * attach, we need to interpose a bwrap 2740 */ 2741 na->nm_bdg_attach = netmap_bwrap_attach; 2742 #endif 2743 2744 return 0; 2745 } 2746 2747 2748 /* standard cleanup, called by all destructors */ 2749 void 2750 netmap_detach_common(struct netmap_adapter *na) 2751 { 2752 if (na->tx_rings) { /* XXX should not happen */ 2753 D("freeing leftover tx_rings"); 2754 na->nm_krings_delete(na); 2755 } 2756 netmap_pipe_dealloc(na); 2757 if (na->nm_mem) 2758 netmap_mem_put(na->nm_mem); 2759 bzero(na, sizeof(*na)); 2760 free(na, M_DEVBUF); 2761 } 2762 2763 /* Wrapper for the register callback provided netmap-enabled 2764 * hardware drivers. 2765 * nm_iszombie(na) means that the driver module has been 2766 * unloaded, so we cannot call into it. 2767 * nm_os_ifnet_lock() must guarantee mutual exclusion with 2768 * module unloading. 2769 */ 2770 static int 2771 netmap_hw_reg(struct netmap_adapter *na, int onoff) 2772 { 2773 struct netmap_hw_adapter *hwna = 2774 (struct netmap_hw_adapter*)na; 2775 int error = 0; 2776 2777 nm_os_ifnet_lock(); 2778 2779 if (nm_iszombie(na)) { 2780 if (onoff) { 2781 error = ENXIO; 2782 } else if (na != NULL) { 2783 na->na_flags &= ~NAF_NETMAP_ON; 2784 } 2785 goto out; 2786 } 2787 2788 error = hwna->nm_hw_register(na, onoff); 2789 2790 out: 2791 nm_os_ifnet_unlock(); 2792 2793 return error; 2794 } 2795 2796 static void 2797 netmap_hw_dtor(struct netmap_adapter *na) 2798 { 2799 if (nm_iszombie(na) || na->ifp == NULL) 2800 return; 2801 2802 WNA(na->ifp) = NULL; 2803 } 2804 2805 2806 /* 2807 * Allocate a ``netmap_adapter`` object, and initialize it from the 2808 * 'arg' passed by the driver on attach. 2809 * We allocate a block of memory with room for a struct netmap_adapter 2810 * plus two sets of N+2 struct netmap_kring (where N is the number 2811 * of hardware rings): 2812 * krings 0..N-1 are for the hardware queues. 2813 * kring N is for the host stack queue 2814 * kring N+1 is only used for the selinfo for all queues. // XXX still true ? 2815 * Return 0 on success, ENOMEM otherwise. 
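 * (Editor's sketch) A typical native driver fills a stack-allocated
 * template and calls netmap_attach(); the pattern below is modeled on the
 * existing FreeBSD drivers (e.g. if_em_netmap.h), with the foo_* and sc->
 * names as placeholders for driver-specific code:
 *
 *	static void
 *	foo_netmap_attach(struct foo_softc *sc)
 *	{
 *		struct netmap_adapter na;
 *
 *		bzero(&na, sizeof(na));
 *		na.ifp = sc->ifp;
 *		na.num_tx_desc = sc->num_tx_desc;
 *		na.num_rx_desc = sc->num_rx_desc;
 *		na.nm_txsync = foo_netmap_txsync;
 *		na.nm_rxsync = foo_netmap_rxsync;
 *		na.nm_register = foo_netmap_reg;
 *		na.num_tx_rings = na.num_rx_rings = sc->num_queues;
 *		netmap_attach(&na);
 *	}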
2816 */ 2817 static int 2818 _netmap_attach(struct netmap_adapter *arg, size_t size) 2819 { 2820 struct netmap_hw_adapter *hwna = NULL; 2821 struct ifnet *ifp = NULL; 2822 2823 if (arg == NULL || arg->ifp == NULL) 2824 goto fail; 2825 ifp = arg->ifp; 2826 hwna = malloc(size, M_DEVBUF, M_NOWAIT | M_ZERO); 2827 if (hwna == NULL) 2828 goto fail; 2829 hwna->up = *arg; 2830 hwna->up.na_flags |= NAF_HOST_RINGS | NAF_NATIVE; 2831 strncpy(hwna->up.name, ifp->if_xname, sizeof(hwna->up.name)); 2832 hwna->nm_hw_register = hwna->up.nm_register; 2833 hwna->up.nm_register = netmap_hw_reg; 2834 if (netmap_attach_common(&hwna->up)) { 2835 free(hwna, M_DEVBUF); 2836 goto fail; 2837 } 2838 netmap_adapter_get(&hwna->up); 2839 2840 NM_ATTACH_NA(ifp, &hwna->up); 2841 2842 #ifdef linux 2843 if (ifp->netdev_ops) { 2844 /* prepare a clone of the netdev ops */ 2845 #ifndef NETMAP_LINUX_HAVE_NETDEV_OPS 2846 hwna->nm_ndo.ndo_start_xmit = ifp->netdev_ops; 2847 #else 2848 hwna->nm_ndo = *ifp->netdev_ops; 2849 #endif /* NETMAP_LINUX_HAVE_NETDEV_OPS */ 2850 } 2851 hwna->nm_ndo.ndo_start_xmit = linux_netmap_start_xmit; 2852 if (ifp->ethtool_ops) { 2853 hwna->nm_eto = *ifp->ethtool_ops; 2854 } 2855 hwna->nm_eto.set_ringparam = linux_netmap_set_ringparam; 2856 #ifdef NETMAP_LINUX_HAVE_SET_CHANNELS 2857 hwna->nm_eto.set_channels = linux_netmap_set_channels; 2858 #endif /* NETMAP_LINUX_HAVE_SET_CHANNELS */ 2859 if (arg->nm_config == NULL) { 2860 hwna->up.nm_config = netmap_linux_config; 2861 } 2862 #endif /* linux */ 2863 if (arg->nm_dtor == NULL) { 2864 hwna->up.nm_dtor = netmap_hw_dtor; 2865 } 2866 2867 if_printf(ifp, "netmap queues/slots: TX %d/%d, RX %d/%d\n", 2868 hwna->up.num_tx_rings, hwna->up.num_tx_desc, 2869 hwna->up.num_rx_rings, hwna->up.num_rx_desc); 2870 return 0; 2871 2872 fail: 2873 D("fail, arg %p ifp %p na %p", arg, ifp, hwna); 2874 return (hwna ? EINVAL : ENOMEM); 2875 } 2876 2877 2878 int 2879 netmap_attach(struct netmap_adapter *arg) 2880 { 2881 return _netmap_attach(arg, sizeof(struct netmap_hw_adapter)); 2882 } 2883 2884 2885 #ifdef WITH_PTNETMAP_GUEST 2886 int 2887 netmap_pt_guest_attach(struct netmap_adapter *arg, void *csb, 2888 unsigned int nifp_offset, unsigned int memid) 2889 { 2890 struct netmap_pt_guest_adapter *ptna; 2891 struct ifnet *ifp = arg ? arg->ifp : NULL; 2892 int error; 2893 2894 /* get allocator */ 2895 arg->nm_mem = netmap_mem_pt_guest_new(ifp, nifp_offset, memid); 2896 if (arg->nm_mem == NULL) 2897 return ENOMEM; 2898 arg->na_flags |= NAF_MEM_OWNER; 2899 error = _netmap_attach(arg, sizeof(struct netmap_pt_guest_adapter)); 2900 if (error) 2901 return error; 2902 2903 /* get the netmap_pt_guest_adapter */ 2904 ptna = (struct netmap_pt_guest_adapter *) NA(ifp); 2905 ptna->csb = csb; 2906 2907 /* Initialize a separate pass-through netmap adapter that is going to 2908 * be used by the ptnet driver only, and so never exposed to netmap 2909 * applications. We only need a subset of the available fields. 
*/ 2910 memset(&ptna->dr, 0, sizeof(ptna->dr)); 2911 ptna->dr.up.ifp = ifp; 2912 ptna->dr.up.nm_mem = ptna->hwup.up.nm_mem; 2913 netmap_mem_get(ptna->dr.up.nm_mem); 2914 ptna->dr.up.nm_config = ptna->hwup.up.nm_config; 2915 2916 ptna->backend_regifs = 0; 2917 2918 return 0; 2919 } 2920 #endif /* WITH_PTNETMAP_GUEST */ 2921 2922 2923 void 2924 NM_DBG(netmap_adapter_get)(struct netmap_adapter *na) 2925 { 2926 if (!na) { 2927 return; 2928 } 2929 2930 refcount_acquire(&na->na_refcount); 2931 } 2932 2933 2934 /* returns 1 iff the netmap_adapter is destroyed */ 2935 int 2936 NM_DBG(netmap_adapter_put)(struct netmap_adapter *na) 2937 { 2938 if (!na) 2939 return 1; 2940 2941 if (!refcount_release(&na->na_refcount)) 2942 return 0; 2943 2944 if (na->nm_dtor) 2945 na->nm_dtor(na); 2946 2947 netmap_detach_common(na); 2948 2949 return 1; 2950 } 2951 2952 /* nm_krings_create callback for all hardware native adapters */ 2953 int 2954 netmap_hw_krings_create(struct netmap_adapter *na) 2955 { 2956 int ret = netmap_krings_create(na, 0); 2957 if (ret == 0) { 2958 /* initialize the mbq for the sw rx ring */ 2959 mbq_safe_init(&na->rx_rings[na->num_rx_rings].rx_queue); 2960 ND("initialized sw rx queue %d", na->num_rx_rings); 2961 } 2962 return ret; 2963 } 2964 2965 2966 2967 /* 2968 * Called on module unload by the netmap-enabled drivers 2969 */ 2970 void 2971 netmap_detach(struct ifnet *ifp) 2972 { 2973 struct netmap_adapter *na = NA(ifp); 2974 2975 if (!na) 2976 return; 2977 2978 NMG_LOCK(); 2979 netmap_set_all_rings(na, NM_KR_LOCKED); 2980 na->na_flags |= NAF_ZOMBIE; 2981 /* 2982 * if the netmap adapter is not native, somebody 2983 * changed it, so we can not release it here. 2984 * The NAF_ZOMBIE flag will notify the new owner that 2985 * the driver is gone. 2986 */ 2987 if (na->na_flags & NAF_NATIVE) { 2988 netmap_adapter_put(na); 2989 } 2990 /* give active users a chance to notice that NAF_ZOMBIE has been 2991 * turned on, so that they can stop and return an error to userspace. 2992 * Note that this becomes a NOP if there are no active users and, 2993 * therefore, the put() above has deleted the na, since now NA(ifp) is 2994 * NULL. 2995 */ 2996 netmap_enable_all_rings(ifp); 2997 NMG_UNLOCK(); 2998 } 2999 3000 3001 /* 3002 * Intercept packets from the network stack and pass them 3003 * to netmap as incoming packets on the 'software' ring. 3004 * 3005 * We only store packets in a bounded mbq and then copy them 3006 * in the relevant rxsync routine. 3007 * 3008 * We rely on the OS to make sure that the ifp and na do not go 3009 * away (typically the caller checks for IFF_DRV_RUNNING or the like). 3010 * In nm_register() or whenever there is a reinitialization, 3011 * we make sure to make the mode change visible here. 
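 * (Editor's note) The queue bound enforced below is a simple modular count
 * of the slots already occupied on the host RX ring; e.g. with
 * nkr_num_slots = 512, nr_hwcur = 500 and nr_hwtail = 10 the difference is
 * 10 - 500 = -490, which wraps to 22 occupied slots. The new mbuf is then
 * enqueued only while that count plus mbq_len(q) stays below
 * nkr_num_slots - 1, so everything queued is guaranteed to fit in the ring
 * at the next rxsync from the host.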
3012 */ 3013 int 3014 netmap_transmit(struct ifnet *ifp, struct mbuf *m) 3015 { 3016 struct netmap_adapter *na = NA(ifp); 3017 struct netmap_kring *kring, *tx_kring; 3018 u_int len = MBUF_LEN(m); 3019 u_int error = ENOBUFS; 3020 unsigned int txr; 3021 struct mbq *q; 3022 int space; 3023 3024 kring = &na->rx_rings[na->num_rx_rings]; 3025 // XXX [Linux] we do not need this lock 3026 // if we follow the down/configure/up protocol -gl 3027 // mtx_lock(&na->core_lock); 3028 3029 if (!nm_netmap_on(na)) { 3030 D("%s not in netmap mode anymore", na->name); 3031 error = ENXIO; 3032 goto done; 3033 } 3034 3035 txr = MBUF_TXQ(m); 3036 if (txr >= na->num_tx_rings) { 3037 txr %= na->num_tx_rings; 3038 } 3039 tx_kring = &NMR(na, NR_TX)[txr]; 3040 3041 if (tx_kring->nr_mode == NKR_NETMAP_OFF) { 3042 return MBUF_TRANSMIT(na, ifp, m); 3043 } 3044 3045 q = &kring->rx_queue; 3046 3047 // XXX reconsider long packets if we handle fragments 3048 if (len > NETMAP_BUF_SIZE(na)) { /* too long for us */ 3049 D("%s from_host, drop packet size %d > %d", na->name, 3050 len, NETMAP_BUF_SIZE(na)); 3051 goto done; 3052 } 3053 3054 if (nm_os_mbuf_has_offld(m)) { 3055 RD(1, "%s drop mbuf requiring offloadings", na->name); 3056 goto done; 3057 } 3058 3059 /* protect against rxsync_from_host(), netmap_sw_to_nic() 3060 * and maybe other instances of netmap_transmit (the latter 3061 * not possible on Linux). 3062 * Also avoid overflowing the queue. 3063 */ 3064 mbq_lock(q); 3065 3066 space = kring->nr_hwtail - kring->nr_hwcur; 3067 if (space < 0) 3068 space += kring->nkr_num_slots; 3069 if (space + mbq_len(q) >= kring->nkr_num_slots - 1) { // XXX 3070 RD(10, "%s full hwcur %d hwtail %d qlen %d len %d m %p", 3071 na->name, kring->nr_hwcur, kring->nr_hwtail, mbq_len(q), 3072 len, m); 3073 } else { 3074 mbq_enqueue(q, m); 3075 ND(10, "%s %d bufs in queue len %d m %p", 3076 na->name, mbq_len(q), len, m); 3077 /* notify outside the lock */ 3078 m = NULL; 3079 error = 0; 3080 } 3081 mbq_unlock(q); 3082 3083 done: 3084 if (m) 3085 m_freem(m); 3086 /* unconditionally wake up listeners */ 3087 kring->nm_notify(kring, 0); 3088 /* this is normally netmap_notify(), but for nics 3089 * connected to a bridge it is netmap_bwrap_intr_notify(), 3090 * that possibly forwards the frames through the switch 3091 */ 3092 3093 return (error); 3094 } 3095 3096 3097 /* 3098 * netmap_reset() is called by the driver routines when reinitializing 3099 * a ring. The driver is in charge of locking to protect the kring. 3100 * If native netmap mode is not set just return NULL. 3101 * If native netmap mode is set, in particular, we have to set nr_mode to 3102 * NKR_NETMAP_ON. 3103 */ 3104 struct netmap_slot * 3105 netmap_reset(struct netmap_adapter *na, enum txrx tx, u_int n, 3106 u_int new_cur) 3107 { 3108 struct netmap_kring *kring; 3109 int new_hwofs, lim; 3110 3111 if (!nm_native_on(na)) { 3112 ND("interface not in native netmap mode"); 3113 return NULL; /* nothing to reinitialize */ 3114 } 3115 3116 /* XXX note- in the new scheme, we are not guaranteed to be 3117 * under lock (e.g. when called on a device reset). 3118 * In this case, we should set a flag and do not trust too 3119 * much the values. In practice: TODO 3120 * - set a RESET flag somewhere in the kring 3121 * - do the processing in a conservative way 3122 * - let the *sync() fixup at the end. 
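 * (Editor's note) For context, the typical caller is a driver ring
 * (re)initialization path, along the lines of the existing FreeBSD drivers
 * (e.g. if_em_netmap.h); ring_nr, nrxd and the descriptor programming are
 * placeholders for driver-specific code:
 *
 *	struct netmap_slot *slot = netmap_reset(na, NR_RX, ring_nr, 0);
 *
 *	if (slot != NULL) {	// native netmap mode is active
 *		for (j = 0; j < nrxd; j++) {
 *			int sj = netmap_idx_n2k(&na->rx_rings[ring_nr], j);
 *			uint64_t paddr;
 *			void *addr = PNMB(na, slot + sj, &paddr);
 *
 *			// program NIC descriptor j with paddr (and addr)
 *		}
 *	}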
3123 */ 3124 if (tx == NR_TX) { 3125 if (n >= na->num_tx_rings) 3126 return NULL; 3127 3128 kring = na->tx_rings + n; 3129 3130 if (kring->nr_pending_mode == NKR_NETMAP_OFF) { 3131 kring->nr_mode = NKR_NETMAP_OFF; 3132 return NULL; 3133 } 3134 3135 // XXX check whether we should use hwcur or rcur 3136 new_hwofs = kring->nr_hwcur - new_cur; 3137 } else { 3138 if (n >= na->num_rx_rings) 3139 return NULL; 3140 kring = na->rx_rings + n; 3141 3142 if (kring->nr_pending_mode == NKR_NETMAP_OFF) { 3143 kring->nr_mode = NKR_NETMAP_OFF; 3144 return NULL; 3145 } 3146 3147 new_hwofs = kring->nr_hwtail - new_cur; 3148 } 3149 lim = kring->nkr_num_slots - 1; 3150 if (new_hwofs > lim) 3151 new_hwofs -= lim + 1; 3152 3153 /* Always set the new offset value and realign the ring. */ 3154 if (netmap_verbose) 3155 D("%s %s%d hwofs %d -> %d, hwtail %d -> %d", 3156 na->name, 3157 tx == NR_TX ? "TX" : "RX", n, 3158 kring->nkr_hwofs, new_hwofs, 3159 kring->nr_hwtail, 3160 tx == NR_TX ? lim : kring->nr_hwtail); 3161 kring->nkr_hwofs = new_hwofs; 3162 if (tx == NR_TX) { 3163 kring->nr_hwtail = kring->nr_hwcur + lim; 3164 if (kring->nr_hwtail > lim) 3165 kring->nr_hwtail -= lim + 1; 3166 } 3167 3168 #if 0 // def linux 3169 /* XXX check that the mappings are correct */ 3170 /* need ring_nr, adapter->pdev, direction */ 3171 buffer_info->dma = dma_map_single(&pdev->dev, addr, adapter->rx_buffer_len, DMA_FROM_DEVICE); 3172 if (dma_mapping_error(&adapter->pdev->dev, buffer_info->dma)) { 3173 D("error mapping rx netmap buffer %d", i); 3174 // XXX fix error handling 3175 } 3176 3177 #endif /* linux */ 3178 /* 3179 * Wakeup on the individual and global selwait 3180 * We do the wakeup here, but the ring is not yet reconfigured. 3181 * However, we are under lock so there are no races. 3182 */ 3183 kring->nr_mode = NKR_NETMAP_ON; 3184 kring->nm_notify(kring, 0); 3185 return kring->ring->slot; 3186 } 3187 3188 3189 /* 3190 * Dispatch rx/tx interrupts to the netmap rings. 3191 * 3192 * "work_done" is non-null on the RX path, NULL for the TX path. 3193 * We rely on the OS to make sure that there is only one active 3194 * instance per queue, and that there is appropriate locking. 3195 * 3196 * The 'notify' routine depends on what the ring is attached to. 3197 * - for a netmap file descriptor, do a selwakeup on the individual 3198 * waitqueue, plus one on the global one if needed 3199 * (see netmap_notify) 3200 * - for a nic connected to a switch, call the proper forwarding routine 3201 * (see netmap_bwrap_intr_notify) 3202 */ 3203 int 3204 netmap_common_irq(struct netmap_adapter *na, u_int q, u_int *work_done) 3205 { 3206 struct netmap_kring *kring; 3207 enum txrx t = (work_done ? NR_RX : NR_TX); 3208 3209 q &= NETMAP_RING_MASK; 3210 3211 if (netmap_verbose) { 3212 RD(5, "received %s queue %d", work_done ? "RX" : "TX" , q); 3213 } 3214 3215 if (q >= nma_get_nrings(na, t)) 3216 return NM_IRQ_PASS; // not a physical queue 3217 3218 kring = NMR(na, t) + q; 3219 3220 if (kring->nr_mode == NKR_NETMAP_OFF) { 3221 return NM_IRQ_PASS; 3222 } 3223 3224 if (t == NR_RX) { 3225 kring->nr_kflags |= NKR_PENDINTR; // XXX atomic ? 3226 *work_done = 1; /* do not fire napi again */ 3227 } 3228 3229 return kring->nm_notify(kring, 0); 3230 } 3231 3232 3233 /* 3234 * Default functions to handle rx/tx interrupts from a physical device. 3235 * "work_done" is non-null on the RX path, NULL for the TX path. 3236 * 3237 * If the card is not in netmap mode, simply return NM_IRQ_PASS, 3238 * so that the caller proceeds with regular processing. 
3239 * Otherwise call netmap_common_irq(). 3240 * 3241 * If the card is connected to a netmap file descriptor, 3242 * do a selwakeup on the individual queue, plus one on the global one 3243 * if needed (multiqueue card _and_ there are multiqueue listeners), 3244 * and return NR_IRQ_COMPLETED. 3245 * 3246 * Finally, if called on rx from an interface connected to a switch, 3247 * calls the proper forwarding routine. 3248 */ 3249 int 3250 netmap_rx_irq(struct ifnet *ifp, u_int q, u_int *work_done) 3251 { 3252 struct netmap_adapter *na = NA(ifp); 3253 3254 /* 3255 * XXX emulated netmap mode sets NAF_SKIP_INTR so 3256 * we still use the regular driver even though the previous 3257 * check fails. It is unclear whether we should use 3258 * nm_native_on() here. 3259 */ 3260 if (!nm_netmap_on(na)) 3261 return NM_IRQ_PASS; 3262 3263 if (na->na_flags & NAF_SKIP_INTR) { 3264 ND("use regular interrupt"); 3265 return NM_IRQ_PASS; 3266 } 3267 3268 return netmap_common_irq(na, q, work_done); 3269 } 3270 3271 3272 /* 3273 * Module loader and unloader 3274 * 3275 * netmap_init() creates the /dev/netmap device and initializes 3276 * all global variables. Returns 0 on success, errno on failure 3277 * (but there is no chance) 3278 * 3279 * netmap_fini() destroys everything. 3280 */ 3281 3282 static struct cdev *netmap_dev; /* /dev/netmap character device. */ 3283 extern struct cdevsw netmap_cdevsw; 3284 3285 3286 void 3287 netmap_fini(void) 3288 { 3289 if (netmap_dev) 3290 destroy_dev(netmap_dev); 3291 /* we assume that there are no longer netmap users */ 3292 nm_os_ifnet_fini(); 3293 netmap_uninit_bridges(); 3294 netmap_mem_fini(); 3295 NMG_LOCK_DESTROY(); 3296 printf("netmap: unloaded module.\n"); 3297 } 3298 3299 3300 int 3301 netmap_init(void) 3302 { 3303 int error; 3304 3305 NMG_LOCK_INIT(); 3306 3307 error = netmap_mem_init(); 3308 if (error != 0) 3309 goto fail; 3310 /* 3311 * MAKEDEV_ETERNAL_KLD avoids an expensive check on syscalls 3312 * when the module is compiled in. 3313 * XXX could use make_dev_credv() to get error number 3314 */ 3315 netmap_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD, 3316 &netmap_cdevsw, 0, NULL, UID_ROOT, GID_WHEEL, 0600, 3317 "netmap"); 3318 if (!netmap_dev) 3319 goto fail; 3320 3321 error = netmap_init_bridges(); 3322 if (error) 3323 goto fail; 3324 3325 #ifdef __FreeBSD__ 3326 nm_os_vi_init_index(); 3327 #endif 3328 3329 error = nm_os_ifnet_init(); 3330 if (error) 3331 goto fail; 3332 3333 printf("netmap: loaded module\n"); 3334 return (0); 3335 fail: 3336 netmap_fini(); 3337 return (EINVAL); /* may be incorrect */ 3338 } 3339
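/*
 * (Editor's note) A sketch of how a driver typically hands its interrupts
 * to the dispatch code above (netmap_rx_irq(); the tx side uses the same
 * entry point with a NULL work_done). "adapter" and "que" are placeholders
 * for driver-specific state:
 *
 *	// in the RX interrupt / rxeof path
 *	if (netmap_rx_irq(adapter->ifp, que->me, &work_done) != NM_IRQ_PASS)
 *		return;		// netmap consumed the event
 *	// otherwise fall through to the regular mbuf processing
 */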