1 /* 2 * Copyright (C) 2011-2014 Matteo Landi 3 * Copyright (C) 2011-2016 Luigi Rizzo 4 * Copyright (C) 2011-2016 Giuseppe Lettieri 5 * Copyright (C) 2011-2016 Vincenzo Maffione 6 * All rights reserved. 7 * 8 * Redistribution and use in source and binary forms, with or without 9 * modification, are permitted provided that the following conditions 10 * are met: 11 * 1. Redistributions of source code must retain the above copyright 12 * notice, this list of conditions and the following disclaimer. 13 * 2. Redistributions in binary form must reproduce the above copyright 14 * notice, this list of conditions and the following disclaimer in the 15 * documentation and/or other materials provided with the distribution. 16 * 17 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND 18 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 19 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 20 * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE 21 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 22 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 23 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 24 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 25 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 26 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 27 * SUCH DAMAGE. 28 */ 29 30 31 /* 32 * $FreeBSD$ 33 * 34 * This module supports memory mapped access to network devices, 35 * see netmap(4). 36 * 37 * The module uses a large, memory pool allocated by the kernel 38 * and accessible as mmapped memory by multiple userspace threads/processes. 39 * The memory pool contains packet buffers and "netmap rings", 40 * i.e. user-accessible copies of the interface's queues. 41 * 42 * Access to the network card works like this: 43 * 1. a process/thread issues one or more open() on /dev/netmap, to create 44 * select()able file descriptor on which events are reported. 45 * 2. on each descriptor, the process issues an ioctl() to identify 46 * the interface that should report events to the file descriptor. 47 * 3. on each descriptor, the process issues an mmap() request to 48 * map the shared memory region within the process' address space. 49 * The list of interesting queues is indicated by a location in 50 * the shared memory region. 51 * 4. using the functions in the netmap(4) userspace API, a process 52 * can look up the occupation state of a queue, access memory buffers, 53 * and retrieve received packets or enqueue packets to transmit. 54 * 5. using some ioctl()s the process can synchronize the userspace view 55 * of the queue with the actual status in the kernel. This includes both 56 * receiving the notification of new packets, and transmitting new 57 * packets on the output interface. 58 * 6. select() or poll() can be used to wait for events on individual 59 * transmit or receive queues (or all queues for a given interface). 60 * 61 62 SYNCHRONIZATION (USER) 63 64 The netmap rings and data structures may be shared among multiple 65 user threads or even independent processes. 66 Any synchronization among those threads/processes is delegated 67 to the threads themselves. Only one thread at a time can be in 68 a system call on the same netmap ring. The OS does not enforce 69 this and only guarantees against system crashes in case of 70 invalid usage. 
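 As a concrete illustration of steps 1..6 above, a minimal userspace
 receive loop could look like the sketch below. This is only a sketch:
 it relies on the helpers in net/netmap_user.h, the interface name
 "em0" is just an example, and all error handling is omitted.

	int fd = open("/dev/netmap", O_RDWR);		// step 1
	struct nmreq req;
	bzero(&req, sizeof(req));
	req.nr_version = NETMAP_API;
	strncpy(req.nr_name, "em0", sizeof(req.nr_name));
	ioctl(fd, NIOCREGIF, &req);			// step 2
	void *mem = mmap(NULL, req.nr_memsize,
		PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);	// step 3
	struct netmap_if *nifp = NETMAP_IF(mem, req.nr_offset);
	struct netmap_ring *ring = NETMAP_RXRING(nifp, 0);
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	for (;;) {
		poll(&pfd, 1, -1);			// steps 5-6
		while (!nm_ring_empty(ring)) {		// step 4
			struct netmap_slot *slot = &ring->slot[ring->cur];
			char *buf = NETMAP_BUF(ring, slot->buf_idx);
			// process slot->len bytes at buf ...
			ring->head = ring->cur = nm_ring_next(ring, ring->cur);
		}
	}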
71 72 LOCKING (INTERNAL) 73 74 Within the kernel, access to the netmap rings is protected as follows: 75 76 - a spinlock on each ring, to handle producer/consumer races on 77 RX rings attached to the host stack (against multiple host 78 threads writing from the host stack to the same ring), 79 and on 'destination' rings attached to a VALE switch 80 (i.e. RX rings in VALE ports, and TX rings in NIC/host ports) 81 protecting multiple active senders for the same destination) 82 83 - an atomic variable to guarantee that there is at most one 84 instance of *_*xsync() on the ring at any time. 85 For rings connected to user file 86 descriptors, an atomic_test_and_set() protects this, and the 87 lock on the ring is not actually used. 88 For NIC RX rings connected to a VALE switch, an atomic_test_and_set() 89 is also used to prevent multiple executions (the driver might indeed 90 already guarantee this). 91 For NIC TX rings connected to a VALE switch, the lock arbitrates 92 access to the queue (both when allocating buffers and when pushing 93 them out). 94 95 - *xsync() should be protected against initializations of the card. 96 On FreeBSD most devices have the reset routine protected by 97 a RING lock (ixgbe, igb, em) or core lock (re). lem is missing 98 the RING protection on rx_reset(), this should be added. 99 100 On linux there is an external lock on the tx path, which probably 101 also arbitrates access to the reset routine. XXX to be revised 102 103 - a per-interface core_lock protecting access from the host stack 104 while interfaces may be detached from netmap mode. 105 XXX there should be no need for this lock if we detach the interfaces 106 only while they are down. 107 108 109 --- VALE SWITCH --- 110 111 NMG_LOCK() serializes all modifications to switches and ports. 112 A switch cannot be deleted until all ports are gone. 113 114 For each switch, an SX lock (RWlock on linux) protects 115 deletion of ports. When configuring or deleting a new port, the 116 lock is acquired in exclusive mode (after holding NMG_LOCK). 117 When forwarding, the lock is acquired in shared mode (without NMG_LOCK). 118 The lock is held throughout the entire forwarding cycle, 119 during which the thread may incur in a page fault. 120 Hence it is important that sleepable shared locks are used. 121 122 On the rx ring, the per-port lock is grabbed initially to reserve 123 a number of slot in the ring, then the lock is released, 124 packets are copied from source to destination, and then 125 the lock is acquired again and the receive ring is updated. 126 (A similar thing is done on the tx ring for NIC and host stack 127 ports attached to the switch) 128 129 */ 130 131 132 /* --- internals ---- 133 * 134 * Roadmap to the code that implements the above. 135 * 136 * > 1. a process/thread issues one or more open() on /dev/netmap, to create 137 * > select()able file descriptor on which events are reported. 138 * 139 * Internally, we allocate a netmap_priv_d structure, that will be 140 * initialized on ioctl(NIOCREGIF). There is one netmap_priv_d 141 * structure for each open(). 142 * 143 * os-specific: 144 * FreeBSD: see netmap_open() (netmap_freebsd.c) 145 * linux: see linux_netmap_open() (netmap_linux.c) 146 * 147 * > 2. on each descriptor, the process issues an ioctl() to identify 148 * > the interface that should report events to the file descriptor. 149 * 150 * Implemented by netmap_ioctl(), NIOCREGIF case, with nmr->nr_cmd==0. 
 * Most important things happen in netmap_get_na() and
 * netmap_do_regif(), called from there. Additional details can be
 * found in the comments above those functions.
 *
 * In all cases, this action creates/takes-a-reference-to a
 * netmap_*_adapter describing the port, and allocates a netmap_if
 * and all necessary netmap rings, filling them with netmap buffers.
 *
 * In this phase, the sync callbacks for each ring are set (these are used
 * in steps 5 and 6 below). The callbacks depend on the type of adapter.
 * The adapter creation/initialization code puts them in the
 * netmap_adapter (fields na->nm_txsync and na->nm_rxsync). Then, they
 * are copied from there to the netmap_kring's during netmap_do_regif(), by
 * the nm_krings_create() callback. All the nm_krings_create callbacks
 * actually call netmap_krings_create() to perform this and the other
 * common stuff. netmap_krings_create() also takes care of the host rings,
 * if needed, by setting their sync callbacks appropriately.
 *
 * Additional actions depend on the kind of netmap_adapter that has been
 * registered:
 *
 * - netmap_hw_adapter:  [netmap.c]
 *   This is a system netdev/ifp with native netmap support.
 *   The ifp is detached from the host stack by redirecting:
 *   - transmissions (from the network stack) to netmap_transmit()
 *   - receive notifications to the nm_notify() callback for
 *     this adapter. The callback is normally netmap_notify(), unless
 *     the ifp is attached to a bridge using bwrap, in which case it
 *     is netmap_bwrap_intr_notify().
 *
 * - netmap_generic_adapter:  [netmap_generic.c]
 *   A system netdev/ifp without native netmap support.
 *
 *   (the decision about native/non-native support is taken in
 *   netmap_get_hw_na(), called by netmap_get_na())
 *
 * - netmap_vp_adapter  [netmap_vale.c]
 *   Returned by netmap_get_bdg_na().
 *   This is a persistent or ephemeral VALE port. Ephemeral ports
 *   are created on the fly if they don't already exist, and are
 *   always attached to a bridge.
 *   Persistent VALE ports must be created separately, and then
 *   attached like normal NICs. The NIOCREGIF we are examining
 *   will find them only if they had previously been created and
 *   attached (see VALE_CTL below).
 *
 * - netmap_pipe_adapter  [netmap_pipe.c]
 *   Returned by netmap_get_pipe_na().
 *   Both pipe ends are created, if they didn't already exist.
 *
 * - netmap_monitor_adapter  [netmap_monitor.c]
 *   Returned by netmap_get_monitor_na().
 *   If successful, the nm_sync callbacks of the monitored adapter
 *   will be intercepted by the returned monitor.
 *
 * - netmap_bwrap_adapter  [netmap_vale.c]
 *   Cannot be obtained in this way, see VALE_CTL below.
 *
 *
 * os-specific:
 *   linux: we first go through linux_netmap_ioctl() to
 *          adapt the FreeBSD interface to the linux one.
 *
 *
 * > 3. on each descriptor, the process issues an mmap() request to
 * >    map the shared memory region within the process' address space.
 * >    The list of interesting queues is indicated by a location in
 * >    the shared memory region.
 *
 * os-specific:
 *   FreeBSD: netmap_mmap_single (netmap_freebsd.c).
 *   linux:   linux_netmap_mmap (netmap_linux.c).
 *
 * > 4.
using the functions in the netmap(4) userspace API, a process 225 * > can look up the occupation state of a queue, access memory buffers, 226 * > and retrieve received packets or enqueue packets to transmit. 227 * 228 * these actions do not involve the kernel. 229 * 230 * > 5. using some ioctl()s the process can synchronize the userspace view 231 * > of the queue with the actual status in the kernel. This includes both 232 * > receiving the notification of new packets, and transmitting new 233 * > packets on the output interface. 234 * 235 * These are implemented in netmap_ioctl(), NIOCTXSYNC and NIOCRXSYNC 236 * cases. They invoke the nm_sync callbacks on the netmap_kring 237 * structures, as initialized in step 2 and maybe later modified 238 * by a monitor. Monitors, however, will always call the original 239 * callback before doing anything else. 240 * 241 * 242 * > 6. select() or poll() can be used to wait for events on individual 243 * > transmit or receive queues (or all queues for a given interface). 244 * 245 * Implemented in netmap_poll(). This will call the same nm_sync() 246 * callbacks as in step 5 above. 247 * 248 * os-specific: 249 * linux: we first go through linux_netmap_poll() to adapt 250 * the FreeBSD interface to the linux one. 251 * 252 * 253 * ---- VALE_CTL ----- 254 * 255 * VALE switches are controlled by issuing a NIOCREGIF with a non-null 256 * nr_cmd in the nmreq structure. These subcommands are handled by 257 * netmap_bdg_ctl() in netmap_vale.c. Persistent VALE ports are created 258 * and destroyed by issuing the NETMAP_BDG_NEWIF and NETMAP_BDG_DELIF 259 * subcommands, respectively. 260 * 261 * Any network interface known to the system (including a persistent VALE 262 * port) can be attached to a VALE switch by issuing the 263 * NETMAP_BDG_ATTACH subcommand. After the attachment, persistent VALE ports 264 * look exactly like ephemeral VALE ports (as created in step 2 above). The 265 * attachment of other interfaces, instead, requires the creation of a 266 * netmap_bwrap_adapter. Moreover, the attached interface must be put in 267 * netmap mode. This may require the creation of a netmap_generic_adapter if 268 * we have no native support for the interface, or if generic adapters have 269 * been forced by sysctl. 270 * 271 * Both persistent VALE ports and bwraps are handled by netmap_get_bdg_na(), 272 * called by nm_bdg_ctl_attach(), and discriminated by the nm_bdg_attach() 273 * callback. In the case of the bwrap, the callback creates the 274 * netmap_bwrap_adapter. The initialization of the bwrap is then 275 * completed by calling netmap_do_regif() on it, in the nm_bdg_ctl() 276 * callback (netmap_bwrap_bdg_ctl in netmap_vale.c). 277 * A generic adapter for the wrapped ifp will be created if needed, when 278 * netmap_get_bdg_na() calls netmap_get_hw_na(). 
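 *
 * As an illustration of the VALE_CTL path, the sketch below shows how a
 * userspace control program could attach an existing interface to a VALE
 * switch. The names "vale0" and "em0" are only examples, and error
 * handling is omitted:
 *
 *	struct nmreq req;
 *	int fd = open("/dev/netmap", O_RDWR);
 *	bzero(&req, sizeof(req));
 *	req.nr_version = NETMAP_API;
 *	// "vale0:em0" means: interface em0 on the switch named vale0
 *	strncpy(req.nr_name, "vale0:em0", sizeof(req.nr_name));
 *	req.nr_cmd = NETMAP_BDG_ATTACH;
 *	ioctl(fd, NIOCREGIF, &req);	// handled by netmap_bdg_ctl()
 *	close(fd);
 *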
279 * 280 * 281 * ---- DATAPATHS ----- 282 * 283 * -= SYSTEM DEVICE WITH NATIVE SUPPORT =- 284 * 285 * na == NA(ifp) == netmap_hw_adapter created in DEVICE_netmap_attach() 286 * 287 * - tx from netmap userspace: 288 * concurrently: 289 * 1) ioctl(NIOCTXSYNC)/netmap_poll() in process context 290 * kring->nm_sync() == DEVICE_netmap_txsync() 291 * 2) device interrupt handler 292 * na->nm_notify() == netmap_notify() 293 * - rx from netmap userspace: 294 * concurrently: 295 * 1) ioctl(NIOCRXSYNC)/netmap_poll() in process context 296 * kring->nm_sync() == DEVICE_netmap_rxsync() 297 * 2) device interrupt handler 298 * na->nm_notify() == netmap_notify() 299 * - rx from host stack 300 * concurrently: 301 * 1) host stack 302 * netmap_transmit() 303 * na->nm_notify == netmap_notify() 304 * 2) ioctl(NIOCRXSYNC)/netmap_poll() in process context 305 * kring->nm_sync() == netmap_rxsync_from_host 306 * netmap_rxsync_from_host(na, NULL, NULL) 307 * - tx to host stack 308 * ioctl(NIOCTXSYNC)/netmap_poll() in process context 309 * kring->nm_sync() == netmap_txsync_to_host 310 * netmap_txsync_to_host(na) 311 * nm_os_send_up() 312 * FreeBSD: na->if_input() == ether_input() 313 * linux: netif_rx() with NM_MAGIC_PRIORITY_RX 314 * 315 * 316 * -= SYSTEM DEVICE WITH GENERIC SUPPORT =- 317 * 318 * na == NA(ifp) == generic_netmap_adapter created in generic_netmap_attach() 319 * 320 * - tx from netmap userspace: 321 * concurrently: 322 * 1) ioctl(NIOCTXSYNC)/netmap_poll() in process context 323 * kring->nm_sync() == generic_netmap_txsync() 324 * nm_os_generic_xmit_frame() 325 * linux: dev_queue_xmit() with NM_MAGIC_PRIORITY_TX 326 * ifp->ndo_start_xmit == generic_ndo_start_xmit() 327 * gna->save_start_xmit == orig. dev. start_xmit 328 * FreeBSD: na->if_transmit() == orig. dev if_transmit 329 * 2) generic_mbuf_destructor() 330 * na->nm_notify() == netmap_notify() 331 * - rx from netmap userspace: 332 * 1) ioctl(NIOCRXSYNC)/netmap_poll() in process context 333 * kring->nm_sync() == generic_netmap_rxsync() 334 * mbq_safe_dequeue() 335 * 2) device driver 336 * generic_rx_handler() 337 * mbq_safe_enqueue() 338 * na->nm_notify() == netmap_notify() 339 * - rx from host stack 340 * FreeBSD: same as native 341 * Linux: same as native except: 342 * 1) host stack 343 * dev_queue_xmit() without NM_MAGIC_PRIORITY_TX 344 * ifp->ndo_start_xmit == generic_ndo_start_xmit() 345 * netmap_transmit() 346 * na->nm_notify() == netmap_notify() 347 * - tx to host stack (same as native): 348 * 349 * 350 * -= VALE =- 351 * 352 * INCOMING: 353 * 354 * - VALE ports: 355 * ioctl(NIOCTXSYNC)/netmap_poll() in process context 356 * kring->nm_sync() == netmap_vp_txsync() 357 * 358 * - system device with native support: 359 * from cable: 360 * interrupt 361 * na->nm_notify() == netmap_bwrap_intr_notify(ring_nr != host ring) 362 * kring->nm_sync() == DEVICE_netmap_rxsync() 363 * netmap_vp_txsync() 364 * kring->nm_sync() == DEVICE_netmap_rxsync() 365 * from host stack: 366 * netmap_transmit() 367 * na->nm_notify() == netmap_bwrap_intr_notify(ring_nr == host ring) 368 * kring->nm_sync() == netmap_rxsync_from_host() 369 * netmap_vp_txsync() 370 * 371 * - system device with generic support: 372 * from device driver: 373 * generic_rx_handler() 374 * na->nm_notify() == netmap_bwrap_intr_notify(ring_nr != host ring) 375 * kring->nm_sync() == generic_netmap_rxsync() 376 * netmap_vp_txsync() 377 * kring->nm_sync() == generic_netmap_rxsync() 378 * from host stack: 379 * netmap_transmit() 380 * na->nm_notify() == netmap_bwrap_intr_notify(ring_nr == host ring) 381 
* kring->nm_sync() == netmap_rxsync_from_host() 382 * netmap_vp_txsync() 383 * 384 * (all cases) --> nm_bdg_flush() 385 * dest_na->nm_notify() == (see below) 386 * 387 * OUTGOING: 388 * 389 * - VALE ports: 390 * concurrently: 391 * 1) ioctlNIOCRXSYNC)/netmap_poll() in process context 392 * kring->nm_sync() == netmap_vp_rxsync() 393 * 2) from nm_bdg_flush() 394 * na->nm_notify() == netmap_notify() 395 * 396 * - system device with native support: 397 * to cable: 398 * na->nm_notify() == netmap_bwrap_notify() 399 * netmap_vp_rxsync() 400 * kring->nm_sync() == DEVICE_netmap_txsync() 401 * netmap_vp_rxsync() 402 * to host stack: 403 * netmap_vp_rxsync() 404 * kring->nm_sync() == netmap_txsync_to_host 405 * netmap_vp_rxsync_locked() 406 * 407 * - system device with generic adapter: 408 * to device driver: 409 * na->nm_notify() == netmap_bwrap_notify() 410 * netmap_vp_rxsync() 411 * kring->nm_sync() == generic_netmap_txsync() 412 * netmap_vp_rxsync() 413 * to host stack: 414 * netmap_vp_rxsync() 415 * kring->nm_sync() == netmap_txsync_to_host 416 * netmap_vp_rxsync() 417 * 418 */ 419 420 /* 421 * OS-specific code that is used only within this file. 422 * Other OS-specific code that must be accessed by drivers 423 * is present in netmap_kern.h 424 */ 425 426 #if defined(__FreeBSD__) 427 #include <sys/cdefs.h> /* prerequisite */ 428 #include <sys/types.h> 429 #include <sys/errno.h> 430 #include <sys/param.h> /* defines used in kernel.h */ 431 #include <sys/kernel.h> /* types used in module initialization */ 432 #include <sys/conf.h> /* cdevsw struct, UID, GID */ 433 #include <sys/filio.h> /* FIONBIO */ 434 #include <sys/sockio.h> 435 #include <sys/socketvar.h> /* struct socket */ 436 #include <sys/malloc.h> 437 #include <sys/poll.h> 438 #include <sys/rwlock.h> 439 #include <sys/socket.h> /* sockaddrs */ 440 #include <sys/selinfo.h> 441 #include <sys/sysctl.h> 442 #include <sys/jail.h> 443 #include <net/vnet.h> 444 #include <net/if.h> 445 #include <net/if_var.h> 446 #include <net/bpf.h> /* BIOCIMMEDIATE */ 447 #include <machine/bus.h> /* bus_dmamap_* */ 448 #include <sys/endian.h> 449 #include <sys/refcount.h> 450 451 452 #elif defined(linux) 453 454 #include "bsd_glue.h" 455 456 #elif defined(__APPLE__) 457 458 #warning OSX support is only partial 459 #include "osx_glue.h" 460 461 #elif defined (_WIN32) 462 463 #include "win_glue.h" 464 465 #else 466 467 #error Unsupported platform 468 469 #endif /* unsupported */ 470 471 /* 472 * common headers 473 */ 474 #include <net/netmap.h> 475 #include <dev/netmap/netmap_kern.h> 476 #include <dev/netmap/netmap_mem2.h> 477 478 479 /* user-controlled variables */ 480 int netmap_verbose; 481 482 static int netmap_no_timestamp; /* don't timestamp on rxsync */ 483 int netmap_mitigate = 1; 484 int netmap_no_pendintr = 1; 485 int netmap_txsync_retry = 2; 486 int netmap_flags = 0; /* debug flags */ 487 static int netmap_fwd = 0; /* force transparent mode */ 488 489 /* 490 * netmap_admode selects the netmap mode to use. 491 * Invalid values are reset to NETMAP_ADMODE_BEST 492 */ 493 enum { NETMAP_ADMODE_BEST = 0, /* use native, fallback to generic */ 494 NETMAP_ADMODE_NATIVE, /* either native or none */ 495 NETMAP_ADMODE_GENERIC, /* force generic */ 496 NETMAP_ADMODE_LAST }; 497 static int netmap_admode = NETMAP_ADMODE_BEST; 498 499 /* netmap_generic_mit controls mitigation of RX notifications for 500 * the generic netmap adapter. The value is a time interval in 501 * nanoseconds. 
*/ 502 int netmap_generic_mit = 100*1000; 503 504 /* We use by default netmap-aware qdiscs with generic netmap adapters, 505 * even if there can be a little performance hit with hardware NICs. 506 * However, using the qdisc is the safer approach, for two reasons: 507 * 1) it prevents non-fifo qdiscs to break the TX notification 508 * scheme, which is based on mbuf destructors when txqdisc is 509 * not used. 510 * 2) it makes it possible to transmit over software devices that 511 * change skb->dev, like bridge, veth, ... 512 * 513 * Anyway users looking for the best performance should 514 * use native adapters. 515 */ 516 int netmap_generic_txqdisc = 1; 517 518 /* Default number of slots and queues for generic adapters. */ 519 int netmap_generic_ringsize = 1024; 520 int netmap_generic_rings = 1; 521 522 /* Non-zero if ptnet devices are allowed to use virtio-net headers. */ 523 int ptnet_vnet_hdr = 1; 524 525 /* 526 * SYSCTL calls are grouped between SYSBEGIN and SYSEND to be emulated 527 * in some other operating systems 528 */ 529 SYSBEGIN(main_init); 530 531 SYSCTL_DECL(_dev_netmap); 532 SYSCTL_NODE(_dev, OID_AUTO, netmap, CTLFLAG_RW, 0, "Netmap args"); 533 SYSCTL_INT(_dev_netmap, OID_AUTO, verbose, 534 CTLFLAG_RW, &netmap_verbose, 0, "Verbose mode"); 535 SYSCTL_INT(_dev_netmap, OID_AUTO, no_timestamp, 536 CTLFLAG_RW, &netmap_no_timestamp, 0, "no_timestamp"); 537 SYSCTL_INT(_dev_netmap, OID_AUTO, mitigate, CTLFLAG_RW, &netmap_mitigate, 0, ""); 538 SYSCTL_INT(_dev_netmap, OID_AUTO, no_pendintr, 539 CTLFLAG_RW, &netmap_no_pendintr, 0, "Always look for new received packets."); 540 SYSCTL_INT(_dev_netmap, OID_AUTO, txsync_retry, CTLFLAG_RW, 541 &netmap_txsync_retry, 0 , "Number of txsync loops in bridge's flush."); 542 543 SYSCTL_INT(_dev_netmap, OID_AUTO, flags, CTLFLAG_RW, &netmap_flags, 0 , ""); 544 SYSCTL_INT(_dev_netmap, OID_AUTO, fwd, CTLFLAG_RW, &netmap_fwd, 0 , ""); 545 SYSCTL_INT(_dev_netmap, OID_AUTO, admode, CTLFLAG_RW, &netmap_admode, 0 , ""); 546 SYSCTL_INT(_dev_netmap, OID_AUTO, generic_mit, CTLFLAG_RW, &netmap_generic_mit, 0 , ""); 547 SYSCTL_INT(_dev_netmap, OID_AUTO, generic_ringsize, CTLFLAG_RW, &netmap_generic_ringsize, 0 , ""); 548 SYSCTL_INT(_dev_netmap, OID_AUTO, generic_rings, CTLFLAG_RW, &netmap_generic_rings, 0 , ""); 549 SYSCTL_INT(_dev_netmap, OID_AUTO, generic_txqdisc, CTLFLAG_RW, &netmap_generic_txqdisc, 0 , ""); 550 SYSCTL_INT(_dev_netmap, OID_AUTO, ptnet_vnet_hdr, CTLFLAG_RW, &ptnet_vnet_hdr, 0 , ""); 551 552 SYSEND; 553 554 NMG_LOCK_T netmap_global_lock; 555 556 /* 557 * mark the ring as stopped, and run through the locks 558 * to make sure other users get to see it. 
559 * stopped must be either NR_KR_STOPPED (for unbounded stop) 560 * of NR_KR_LOCKED (brief stop for mutual exclusion purposes) 561 */ 562 static void 563 netmap_disable_ring(struct netmap_kring *kr, int stopped) 564 { 565 nm_kr_stop(kr, stopped); 566 // XXX check if nm_kr_stop is sufficient 567 mtx_lock(&kr->q_lock); 568 mtx_unlock(&kr->q_lock); 569 nm_kr_put(kr); 570 } 571 572 /* stop or enable a single ring */ 573 void 574 netmap_set_ring(struct netmap_adapter *na, u_int ring_id, enum txrx t, int stopped) 575 { 576 if (stopped) 577 netmap_disable_ring(NMR(na, t) + ring_id, stopped); 578 else 579 NMR(na, t)[ring_id].nkr_stopped = 0; 580 } 581 582 583 /* stop or enable all the rings of na */ 584 void 585 netmap_set_all_rings(struct netmap_adapter *na, int stopped) 586 { 587 int i; 588 enum txrx t; 589 590 if (!nm_netmap_on(na)) 591 return; 592 593 for_rx_tx(t) { 594 for (i = 0; i < netmap_real_rings(na, t); i++) { 595 netmap_set_ring(na, i, t, stopped); 596 } 597 } 598 } 599 600 /* 601 * Convenience function used in drivers. Waits for current txsync()s/rxsync()s 602 * to finish and prevents any new one from starting. Call this before turning 603 * netmap mode off, or before removing the hardware rings (e.g., on module 604 * onload). 605 */ 606 void 607 netmap_disable_all_rings(struct ifnet *ifp) 608 { 609 if (NM_NA_VALID(ifp)) { 610 netmap_set_all_rings(NA(ifp), NM_KR_STOPPED); 611 } 612 } 613 614 /* 615 * Convenience function used in drivers. Re-enables rxsync and txsync on the 616 * adapter's rings In linux drivers, this should be placed near each 617 * napi_enable(). 618 */ 619 void 620 netmap_enable_all_rings(struct ifnet *ifp) 621 { 622 if (NM_NA_VALID(ifp)) { 623 netmap_set_all_rings(NA(ifp), 0 /* enabled */); 624 } 625 } 626 627 void 628 netmap_make_zombie(struct ifnet *ifp) 629 { 630 if (NM_NA_VALID(ifp)) { 631 struct netmap_adapter *na = NA(ifp); 632 netmap_set_all_rings(na, NM_KR_LOCKED); 633 na->na_flags |= NAF_ZOMBIE; 634 netmap_set_all_rings(na, 0); 635 } 636 } 637 638 void 639 netmap_undo_zombie(struct ifnet *ifp) 640 { 641 if (NM_NA_VALID(ifp)) { 642 struct netmap_adapter *na = NA(ifp); 643 if (na->na_flags & NAF_ZOMBIE) { 644 netmap_set_all_rings(na, NM_KR_LOCKED); 645 na->na_flags &= ~NAF_ZOMBIE; 646 netmap_set_all_rings(na, 0); 647 } 648 } 649 } 650 651 /* 652 * generic bound_checking function 653 */ 654 u_int 655 nm_bound_var(u_int *v, u_int dflt, u_int lo, u_int hi, const char *msg) 656 { 657 u_int oldv = *v; 658 const char *op = NULL; 659 660 if (dflt < lo) 661 dflt = lo; 662 if (dflt > hi) 663 dflt = hi; 664 if (oldv < lo) { 665 *v = dflt; 666 op = "Bump"; 667 } else if (oldv > hi) { 668 *v = hi; 669 op = "Clamp"; 670 } 671 if (op && msg) 672 printf("%s %s to %d (was %d)\n", op, msg, *v, oldv); 673 return *v; 674 } 675 676 677 /* 678 * packet-dump function, user-supplied or static buffer. 679 * The destination buffer must be at least 30+4*len 680 */ 681 const char * 682 nm_dump_buf(char *p, int len, int lim, char *dst) 683 { 684 static char _dst[8192]; 685 int i, j, i0; 686 static char hex[] ="0123456789abcdef"; 687 char *o; /* output position */ 688 689 #define P_HI(x) hex[((x) & 0xf0)>>4] 690 #define P_LO(x) hex[((x) & 0xf)] 691 #define P_C(x) ((x) >= 0x20 && (x) <= 0x7e ? 
(x) : '.') 692 if (!dst) 693 dst = _dst; 694 if (lim <= 0 || lim > len) 695 lim = len; 696 o = dst; 697 sprintf(o, "buf 0x%p len %d lim %d\n", p, len, lim); 698 o += strlen(o); 699 /* hexdump routine */ 700 for (i = 0; i < lim; ) { 701 sprintf(o, "%5d: ", i); 702 o += strlen(o); 703 memset(o, ' ', 48); 704 i0 = i; 705 for (j=0; j < 16 && i < lim; i++, j++) { 706 o[j*3] = P_HI(p[i]); 707 o[j*3+1] = P_LO(p[i]); 708 } 709 i = i0; 710 for (j=0; j < 16 && i < lim; i++, j++) 711 o[j + 48] = P_C(p[i]); 712 o[j+48] = '\n'; 713 o += j+49; 714 } 715 *o = '\0'; 716 #undef P_HI 717 #undef P_LO 718 #undef P_C 719 return dst; 720 } 721 722 723 /* 724 * Fetch configuration from the device, to cope with dynamic 725 * reconfigurations after loading the module. 726 */ 727 /* call with NMG_LOCK held */ 728 int 729 netmap_update_config(struct netmap_adapter *na) 730 { 731 u_int txr, txd, rxr, rxd; 732 733 txr = txd = rxr = rxd = 0; 734 if (na->nm_config == NULL || 735 na->nm_config(na, &txr, &txd, &rxr, &rxd)) 736 { 737 /* take whatever we had at init time */ 738 txr = na->num_tx_rings; 739 txd = na->num_tx_desc; 740 rxr = na->num_rx_rings; 741 rxd = na->num_rx_desc; 742 } 743 744 if (na->num_tx_rings == txr && na->num_tx_desc == txd && 745 na->num_rx_rings == rxr && na->num_rx_desc == rxd) 746 return 0; /* nothing changed */ 747 if (netmap_verbose || na->active_fds > 0) { 748 D("stored config %s: txring %d x %d, rxring %d x %d", 749 na->name, 750 na->num_tx_rings, na->num_tx_desc, 751 na->num_rx_rings, na->num_rx_desc); 752 D("new config %s: txring %d x %d, rxring %d x %d", 753 na->name, txr, txd, rxr, rxd); 754 } 755 if (na->active_fds == 0) { 756 D("configuration changed (but fine)"); 757 na->num_tx_rings = txr; 758 na->num_tx_desc = txd; 759 na->num_rx_rings = rxr; 760 na->num_rx_desc = rxd; 761 return 0; 762 } 763 D("configuration changed while active, this is bad..."); 764 return 1; 765 } 766 767 /* nm_sync callbacks for the host rings */ 768 static int netmap_txsync_to_host(struct netmap_kring *kring, int flags); 769 static int netmap_rxsync_from_host(struct netmap_kring *kring, int flags); 770 771 /* create the krings array and initialize the fields common to all adapters. 772 * The array layout is this: 773 * 774 * +----------+ 775 * na->tx_rings ----->| | \ 776 * | | } na->num_tx_ring 777 * | | / 778 * +----------+ 779 * | | host tx kring 780 * na->rx_rings ----> +----------+ 781 * | | \ 782 * | | } na->num_rx_rings 783 * | | / 784 * +----------+ 785 * | | host rx kring 786 * +----------+ 787 * na->tailroom ----->| | \ 788 * | | } tailroom bytes 789 * | | / 790 * +----------+ 791 * 792 * Note: for compatibility, host krings are created even when not needed. 793 * The tailroom space is currently used by vale ports for allocating leases. 794 */ 795 /* call with NMG_LOCK held */ 796 int 797 netmap_krings_create(struct netmap_adapter *na, u_int tailroom) 798 { 799 u_int i, len, ndesc; 800 struct netmap_kring *kring; 801 u_int n[NR_TXRX]; 802 enum txrx t; 803 804 /* account for the (possibly fake) host rings */ 805 n[NR_TX] = na->num_tx_rings + 1; 806 n[NR_RX] = na->num_rx_rings + 1; 807 808 len = (n[NR_TX] + n[NR_RX]) * sizeof(struct netmap_kring) + tailroom; 809 810 na->tx_rings = malloc((size_t)len, M_DEVBUF, M_NOWAIT | M_ZERO); 811 if (na->tx_rings == NULL) { 812 D("Cannot allocate krings"); 813 return ENOMEM; 814 } 815 na->rx_rings = na->tx_rings + n[NR_TX]; 816 817 /* 818 * All fields in krings are 0 except the one initialized below. 819 * but better be explicit on important kring fields. 
820 */ 821 for_rx_tx(t) { 822 ndesc = nma_get_ndesc(na, t); 823 for (i = 0; i < n[t]; i++) { 824 kring = &NMR(na, t)[i]; 825 bzero(kring, sizeof(*kring)); 826 kring->na = na; 827 kring->ring_id = i; 828 kring->tx = t; 829 kring->nkr_num_slots = ndesc; 830 kring->nr_mode = NKR_NETMAP_OFF; 831 kring->nr_pending_mode = NKR_NETMAP_OFF; 832 if (i < nma_get_nrings(na, t)) { 833 kring->nm_sync = (t == NR_TX ? na->nm_txsync : na->nm_rxsync); 834 } else { 835 kring->nm_sync = (t == NR_TX ? 836 netmap_txsync_to_host: 837 netmap_rxsync_from_host); 838 } 839 kring->nm_notify = na->nm_notify; 840 kring->rhead = kring->rcur = kring->nr_hwcur = 0; 841 /* 842 * IMPORTANT: Always keep one slot empty. 843 */ 844 kring->rtail = kring->nr_hwtail = (t == NR_TX ? ndesc - 1 : 0); 845 snprintf(kring->name, sizeof(kring->name) - 1, "%s %s%d", na->name, 846 nm_txrx2str(t), i); 847 ND("ktx %s h %d c %d t %d", 848 kring->name, kring->rhead, kring->rcur, kring->rtail); 849 mtx_init(&kring->q_lock, (t == NR_TX ? "nm_txq_lock" : "nm_rxq_lock"), NULL, MTX_DEF); 850 nm_os_selinfo_init(&kring->si); 851 } 852 nm_os_selinfo_init(&na->si[t]); 853 } 854 855 na->tailroom = na->rx_rings + n[NR_RX]; 856 857 return 0; 858 } 859 860 861 /* undo the actions performed by netmap_krings_create */ 862 /* call with NMG_LOCK held */ 863 void 864 netmap_krings_delete(struct netmap_adapter *na) 865 { 866 struct netmap_kring *kring = na->tx_rings; 867 enum txrx t; 868 869 for_rx_tx(t) 870 nm_os_selinfo_uninit(&na->si[t]); 871 872 /* we rely on the krings layout described above */ 873 for ( ; kring != na->tailroom; kring++) { 874 mtx_destroy(&kring->q_lock); 875 nm_os_selinfo_uninit(&kring->si); 876 } 877 free(na->tx_rings, M_DEVBUF); 878 na->tx_rings = na->rx_rings = na->tailroom = NULL; 879 } 880 881 882 /* 883 * Destructor for NIC ports. They also have an mbuf queue 884 * on the rings connected to the host so we need to purge 885 * them first. 886 */ 887 /* call with NMG_LOCK held */ 888 void 889 netmap_hw_krings_delete(struct netmap_adapter *na) 890 { 891 struct mbq *q = &na->rx_rings[na->num_rx_rings].rx_queue; 892 893 ND("destroy sw mbq with len %d", mbq_len(q)); 894 mbq_purge(q); 895 mbq_safe_fini(q); 896 netmap_krings_delete(na); 897 } 898 899 900 901 /* 902 * Undo everything that was done in netmap_do_regif(). In particular, 903 * call nm_register(ifp,0) to stop netmap mode on the interface and 904 * revert to normal operation. 905 */ 906 /* call with NMG_LOCK held */ 907 static void netmap_unset_ringid(struct netmap_priv_d *); 908 static void netmap_krings_put(struct netmap_priv_d *); 909 void 910 netmap_do_unregif(struct netmap_priv_d *priv) 911 { 912 struct netmap_adapter *na = priv->np_na; 913 914 NMG_LOCK_ASSERT(); 915 na->active_fds--; 916 /* unset nr_pending_mode and possibly release exclusive mode */ 917 netmap_krings_put(priv); 918 919 #ifdef WITH_MONITOR 920 /* XXX check whether we have to do something with monitor 921 * when rings change nr_mode. */ 922 if (na->active_fds <= 0) { 923 /* walk through all the rings and tell any monitor 924 * that the port is going to exit netmap mode 925 */ 926 netmap_monitor_stop(na); 927 } 928 #endif 929 930 if (na->active_fds <= 0 || nm_kring_pending(priv)) { 931 na->nm_register(na, 0); 932 } 933 934 /* delete rings and buffers that are no longer needed */ 935 netmap_mem_rings_delete(na); 936 937 if (na->active_fds <= 0) { /* last instance */ 938 /* 939 * (TO CHECK) We enter here 940 * when the last reference to this file descriptor goes 941 * away. 
This means we cannot have any pending poll() 942 * or interrupt routine operating on the structure. 943 * XXX The file may be closed in a thread while 944 * another thread is using it. 945 * Linux keeps the file opened until the last reference 946 * by any outstanding ioctl/poll or mmap is gone. 947 * FreeBSD does not track mmap()s (but we do) and 948 * wakes up any sleeping poll(). Need to check what 949 * happens if the close() occurs while a concurrent 950 * syscall is running. 951 */ 952 if (netmap_verbose) 953 D("deleting last instance for %s", na->name); 954 955 if (nm_netmap_on(na)) { 956 D("BUG: netmap on while going to delete the krings"); 957 } 958 959 na->nm_krings_delete(na); 960 } 961 962 /* possibily decrement counter of tx_si/rx_si users */ 963 netmap_unset_ringid(priv); 964 /* delete the nifp */ 965 netmap_mem_if_delete(na, priv->np_nifp); 966 /* drop the allocator */ 967 netmap_mem_deref(na->nm_mem, na); 968 /* mark the priv as unregistered */ 969 priv->np_na = NULL; 970 priv->np_nifp = NULL; 971 } 972 973 /* call with NMG_LOCK held */ 974 static __inline int 975 nm_si_user(struct netmap_priv_d *priv, enum txrx t) 976 { 977 return (priv->np_na != NULL && 978 (priv->np_qlast[t] - priv->np_qfirst[t] > 1)); 979 } 980 981 struct netmap_priv_d* 982 netmap_priv_new(void) 983 { 984 struct netmap_priv_d *priv; 985 986 priv = malloc(sizeof(struct netmap_priv_d), M_DEVBUF, 987 M_NOWAIT | M_ZERO); 988 if (priv == NULL) 989 return NULL; 990 priv->np_refs = 1; 991 nm_os_get_module(); 992 return priv; 993 } 994 995 /* 996 * Destructor of the netmap_priv_d, called when the fd is closed 997 * Action: undo all the things done by NIOCREGIF, 998 * On FreeBSD we need to track whether there are active mmap()s, 999 * and we use np_active_mmaps for that. On linux, the field is always 0. 1000 * Return: 1 if we can free priv, 0 otherwise. 1001 * 1002 */ 1003 /* call with NMG_LOCK held */ 1004 void 1005 netmap_priv_delete(struct netmap_priv_d *priv) 1006 { 1007 struct netmap_adapter *na = priv->np_na; 1008 1009 /* number of active references to this fd */ 1010 if (--priv->np_refs > 0) { 1011 return; 1012 } 1013 nm_os_put_module(); 1014 if (na) { 1015 netmap_do_unregif(priv); 1016 } 1017 netmap_unget_na(na, priv->np_ifp); 1018 bzero(priv, sizeof(*priv)); /* for safety */ 1019 free(priv, M_DEVBUF); 1020 } 1021 1022 1023 /* call with NMG_LOCK *not* held */ 1024 void 1025 netmap_dtor(void *data) 1026 { 1027 struct netmap_priv_d *priv = data; 1028 1029 NMG_LOCK(); 1030 netmap_priv_delete(priv); 1031 NMG_UNLOCK(); 1032 } 1033 1034 1035 1036 1037 /* 1038 * Handlers for synchronization of the queues from/to the host. 1039 * Netmap has two operating modes: 1040 * - in the default mode, the rings connected to the host stack are 1041 * just another ring pair managed by userspace; 1042 * - in transparent mode (XXX to be defined) incoming packets 1043 * (from the host or the NIC) are marked as NS_FORWARD upon 1044 * arrival, and the user application has a chance to reset the 1045 * flag for packets that should be dropped. 1046 * On the RXSYNC or poll(), packets in RX rings between 1047 * kring->nr_kcur and ring->cur with NS_FORWARD still set are moved 1048 * to the other side. 1049 * The transfer NIC --> host is relatively easy, just encapsulate 1050 * into mbufs and we are done. The host --> NIC side is slightly 1051 * harder because there might not be room in the tx ring so it 1052 * might take a while before releasing the buffer. 
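 *
 * From the application's point of view, transparent mode boils down to the
 * NR_FORWARD ring flag and the NS_FORWARD slot flag. One possible shape of
 * the receive loop is sketched below; wanted_here() is a hypothetical
 * application predicate and error handling is omitted:
 *
 *	ring->flags |= NR_FORWARD;	// or sysctl dev.netmap.fwd=1
 *	...
 *	while (!nm_ring_empty(ring)) {
 *		struct netmap_slot *slot = &ring->slot[ring->cur];
 *		if (wanted_here(slot))
 *			slot->flags &= ~NS_FORWARD;	// consume it locally
 *		else
 *			slot->flags |= NS_FORWARD;	// leave it for the other side
 *		ring->head = ring->cur = nm_ring_next(ring, ring->cur);
 *	}
 *	ioctl(fd, NIOCRXSYNC, NULL);	// slots still marked NS_FORWARD are moved across
 *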
1053 */ 1054 1055 1056 /* 1057 * pass a chain of buffers to the host stack as coming from 'dst' 1058 * We do not need to lock because the queue is private. 1059 */ 1060 static void 1061 netmap_send_up(struct ifnet *dst, struct mbq *q) 1062 { 1063 struct mbuf *m; 1064 struct mbuf *head = NULL, *prev = NULL; 1065 1066 /* send packets up, outside the lock */ 1067 while ((m = mbq_dequeue(q)) != NULL) { 1068 if (netmap_verbose & NM_VERB_HOST) 1069 D("sending up pkt %p size %d", m, MBUF_LEN(m)); 1070 prev = nm_os_send_up(dst, m, prev); 1071 if (head == NULL) 1072 head = prev; 1073 } 1074 if (head) 1075 nm_os_send_up(dst, NULL, head); 1076 mbq_fini(q); 1077 } 1078 1079 1080 /* 1081 * put a copy of the buffers marked NS_FORWARD into an mbuf chain. 1082 * Take packets from hwcur to ring->head marked NS_FORWARD (or forced) 1083 * and pass them up. Drop remaining packets in the unlikely event 1084 * of an mbuf shortage. 1085 */ 1086 static void 1087 netmap_grab_packets(struct netmap_kring *kring, struct mbq *q, int force) 1088 { 1089 u_int const lim = kring->nkr_num_slots - 1; 1090 u_int const head = kring->rhead; 1091 u_int n; 1092 struct netmap_adapter *na = kring->na; 1093 1094 for (n = kring->nr_hwcur; n != head; n = nm_next(n, lim)) { 1095 struct mbuf *m; 1096 struct netmap_slot *slot = &kring->ring->slot[n]; 1097 1098 if ((slot->flags & NS_FORWARD) == 0 && !force) 1099 continue; 1100 if (slot->len < 14 || slot->len > NETMAP_BUF_SIZE(na)) { 1101 RD(5, "bad pkt at %d len %d", n, slot->len); 1102 continue; 1103 } 1104 slot->flags &= ~NS_FORWARD; // XXX needed ? 1105 /* XXX TODO: adapt to the case of a multisegment packet */ 1106 m = m_devget(NMB(na, slot), slot->len, 0, na->ifp, NULL); 1107 1108 if (m == NULL) 1109 break; 1110 mbq_enqueue(q, m); 1111 } 1112 } 1113 1114 static inline int 1115 _nm_may_forward(struct netmap_kring *kring) 1116 { 1117 return ((netmap_fwd || kring->ring->flags & NR_FORWARD) && 1118 kring->na->na_flags & NAF_HOST_RINGS && 1119 kring->tx == NR_RX); 1120 } 1121 1122 static inline int 1123 nm_may_forward_up(struct netmap_kring *kring) 1124 { 1125 return _nm_may_forward(kring) && 1126 kring->ring_id != kring->na->num_rx_rings; 1127 } 1128 1129 static inline int 1130 nm_may_forward_down(struct netmap_kring *kring) 1131 { 1132 return _nm_may_forward(kring) && 1133 kring->ring_id == kring->na->num_rx_rings; 1134 } 1135 1136 /* 1137 * Send to the NIC rings packets marked NS_FORWARD between 1138 * kring->nr_hwcur and kring->rhead 1139 * Called under kring->rx_queue.lock on the sw rx ring, 1140 */ 1141 static u_int 1142 netmap_sw_to_nic(struct netmap_adapter *na) 1143 { 1144 struct netmap_kring *kring = &na->rx_rings[na->num_rx_rings]; 1145 struct netmap_slot *rxslot = kring->ring->slot; 1146 u_int i, rxcur = kring->nr_hwcur; 1147 u_int const head = kring->rhead; 1148 u_int const src_lim = kring->nkr_num_slots - 1; 1149 u_int sent = 0; 1150 1151 /* scan rings to find space, then fill as much as possible */ 1152 for (i = 0; i < na->num_tx_rings; i++) { 1153 struct netmap_kring *kdst = &na->tx_rings[i]; 1154 struct netmap_ring *rdst = kdst->ring; 1155 u_int const dst_lim = kdst->nkr_num_slots - 1; 1156 1157 /* XXX do we trust ring or kring->rcur,rtail ? 
*/ 1158 for (; rxcur != head && !nm_ring_empty(rdst); 1159 rxcur = nm_next(rxcur, src_lim) ) { 1160 struct netmap_slot *src, *dst, tmp; 1161 u_int dst_head = rdst->head; 1162 1163 src = &rxslot[rxcur]; 1164 if ((src->flags & NS_FORWARD) == 0 && !netmap_fwd) 1165 continue; 1166 1167 sent++; 1168 1169 dst = &rdst->slot[dst_head]; 1170 1171 tmp = *src; 1172 1173 src->buf_idx = dst->buf_idx; 1174 src->flags = NS_BUF_CHANGED; 1175 1176 dst->buf_idx = tmp.buf_idx; 1177 dst->len = tmp.len; 1178 dst->flags = NS_BUF_CHANGED; 1179 1180 rdst->head = rdst->cur = nm_next(dst_head, dst_lim); 1181 } 1182 /* if (sent) XXX txsync ? */ 1183 } 1184 return sent; 1185 } 1186 1187 1188 /* 1189 * netmap_txsync_to_host() passes packets up. We are called from a 1190 * system call in user process context, and the only contention 1191 * can be among multiple user threads erroneously calling 1192 * this routine concurrently. 1193 */ 1194 static int 1195 netmap_txsync_to_host(struct netmap_kring *kring, int flags) 1196 { 1197 struct netmap_adapter *na = kring->na; 1198 u_int const lim = kring->nkr_num_slots - 1; 1199 u_int const head = kring->rhead; 1200 struct mbq q; 1201 1202 /* Take packets from hwcur to head and pass them up. 1203 * force head = cur since netmap_grab_packets() stops at head 1204 * In case of no buffers we give up. At the end of the loop, 1205 * the queue is drained in all cases. 1206 */ 1207 mbq_init(&q); 1208 netmap_grab_packets(kring, &q, 1 /* force */); 1209 ND("have %d pkts in queue", mbq_len(&q)); 1210 kring->nr_hwcur = head; 1211 kring->nr_hwtail = head + lim; 1212 if (kring->nr_hwtail > lim) 1213 kring->nr_hwtail -= lim + 1; 1214 1215 netmap_send_up(na->ifp, &q); 1216 return 0; 1217 } 1218 1219 1220 /* 1221 * rxsync backend for packets coming from the host stack. 1222 * They have been put in kring->rx_queue by netmap_transmit(). 1223 * We protect access to the kring using kring->rx_queue.lock 1224 * 1225 * This routine also does the selrecord if called from the poll handler 1226 * (we know because sr != NULL). 1227 * 1228 * returns the number of packets delivered to tx queues in 1229 * transparent mode, or a negative value if error 1230 */ 1231 static int 1232 netmap_rxsync_from_host(struct netmap_kring *kring, int flags) 1233 { 1234 struct netmap_adapter *na = kring->na; 1235 struct netmap_ring *ring = kring->ring; 1236 u_int nm_i, n; 1237 u_int const lim = kring->nkr_num_slots - 1; 1238 u_int const head = kring->rhead; 1239 int ret = 0; 1240 struct mbq *q = &kring->rx_queue, fq; 1241 1242 mbq_init(&fq); /* fq holds packets to be freed */ 1243 1244 mbq_lock(q); 1245 1246 /* First part: import newly received packets */ 1247 n = mbq_len(q); 1248 if (n) { /* grab packets from the queue */ 1249 struct mbuf *m; 1250 uint32_t stop_i; 1251 1252 nm_i = kring->nr_hwtail; 1253 stop_i = nm_prev(nm_i, lim); 1254 while ( nm_i != stop_i && (m = mbq_dequeue(q)) != NULL ) { 1255 int len = MBUF_LEN(m); 1256 struct netmap_slot *slot = &ring->slot[nm_i]; 1257 1258 m_copydata(m, 0, len, NMB(na, slot)); 1259 ND("nm %d len %d", nm_i, len); 1260 if (netmap_verbose) 1261 D("%s", nm_dump_buf(NMB(na, slot),len, 128, NULL)); 1262 1263 slot->len = len; 1264 slot->flags = kring->nkr_slot_flags; 1265 nm_i = nm_next(nm_i, lim); 1266 mbq_enqueue(&fq, m); 1267 } 1268 kring->nr_hwtail = nm_i; 1269 } 1270 1271 /* 1272 * Second part: skip past packets that userspace has released. 
1273 */ 1274 nm_i = kring->nr_hwcur; 1275 if (nm_i != head) { /* something was released */ 1276 if (nm_may_forward_down(kring)) { 1277 ret = netmap_sw_to_nic(na); 1278 if (ret > 0) { 1279 kring->nr_kflags |= NR_FORWARD; 1280 ret = 0; 1281 } 1282 } 1283 kring->nr_hwcur = head; 1284 } 1285 1286 mbq_unlock(q); 1287 1288 mbq_purge(&fq); 1289 mbq_fini(&fq); 1290 1291 return ret; 1292 } 1293 1294 1295 /* Get a netmap adapter for the port. 1296 * 1297 * If it is possible to satisfy the request, return 0 1298 * with *na containing the netmap adapter found. 1299 * Otherwise return an error code, with *na containing NULL. 1300 * 1301 * When the port is attached to a bridge, we always return 1302 * EBUSY. 1303 * Otherwise, if the port is already bound to a file descriptor, 1304 * then we unconditionally return the existing adapter into *na. 1305 * In all the other cases, we return (into *na) either native, 1306 * generic or NULL, according to the following table: 1307 * 1308 * native_support 1309 * active_fds dev.netmap.admode YES NO 1310 * ------------------------------------------------------- 1311 * >0 * NA(ifp) NA(ifp) 1312 * 1313 * 0 NETMAP_ADMODE_BEST NATIVE GENERIC 1314 * 0 NETMAP_ADMODE_NATIVE NATIVE NULL 1315 * 0 NETMAP_ADMODE_GENERIC GENERIC GENERIC 1316 * 1317 */ 1318 static void netmap_hw_dtor(struct netmap_adapter *); /* needed by NM_IS_NATIVE() */ 1319 int 1320 netmap_get_hw_na(struct ifnet *ifp, struct netmap_adapter **na) 1321 { 1322 /* generic support */ 1323 int i = netmap_admode; /* Take a snapshot. */ 1324 struct netmap_adapter *prev_na; 1325 int error = 0; 1326 1327 *na = NULL; /* default */ 1328 1329 /* reset in case of invalid value */ 1330 if (i < NETMAP_ADMODE_BEST || i >= NETMAP_ADMODE_LAST) 1331 i = netmap_admode = NETMAP_ADMODE_BEST; 1332 1333 if (NM_NA_VALID(ifp)) { 1334 prev_na = NA(ifp); 1335 /* If an adapter already exists, return it if 1336 * there are active file descriptors or if 1337 * netmap is not forced to use generic 1338 * adapters. 1339 */ 1340 if (NETMAP_OWNED_BY_ANY(prev_na) 1341 || i != NETMAP_ADMODE_GENERIC 1342 || prev_na->na_flags & NAF_FORCE_NATIVE 1343 #ifdef WITH_PIPES 1344 /* ugly, but we cannot allow an adapter switch 1345 * if some pipe is referring to this one 1346 */ 1347 || prev_na->na_next_pipe > 0 1348 #endif 1349 ) { 1350 *na = prev_na; 1351 return 0; 1352 } 1353 } 1354 1355 /* If there isn't native support and netmap is not allowed 1356 * to use generic adapters, we cannot satisfy the request. 1357 */ 1358 if (!NM_IS_NATIVE(ifp) && i == NETMAP_ADMODE_NATIVE) 1359 return EOPNOTSUPP; 1360 1361 /* Otherwise, create a generic adapter and return it, 1362 * saving the previously used netmap adapter, if any. 1363 * 1364 * Note that here 'prev_na', if not NULL, MUST be a 1365 * native adapter, and CANNOT be a generic one. This is 1366 * true because generic adapters are created on demand, and 1367 * destroyed when not used anymore. Therefore, if the adapter 1368 * currently attached to an interface 'ifp' is generic, it 1369 * must be that 1370 * (NA(ifp)->active_fds > 0 || NETMAP_OWNED_BY_KERN(NA(ifp))). 1371 * Consequently, if NA(ifp) is generic, we will enter one of 1372 * the branches above. This ensures that we never override 1373 * a generic adapter with another generic adapter. 
1374 */ 1375 error = generic_netmap_attach(ifp); 1376 if (error) 1377 return error; 1378 1379 *na = NA(ifp); 1380 return 0; 1381 } 1382 1383 1384 /* 1385 * MUST BE CALLED UNDER NMG_LOCK() 1386 * 1387 * Get a refcounted reference to a netmap adapter attached 1388 * to the interface specified by nmr. 1389 * This is always called in the execution of an ioctl(). 1390 * 1391 * Return ENXIO if the interface specified by the request does 1392 * not exist, ENOTSUP if netmap is not supported by the interface, 1393 * EBUSY if the interface is already attached to a bridge, 1394 * EINVAL if parameters are invalid, ENOMEM if needed resources 1395 * could not be allocated. 1396 * If successful, hold a reference to the netmap adapter. 1397 * 1398 * If the interface specified by nmr is a system one, also keep 1399 * a reference to it and return a valid *ifp. 1400 */ 1401 int 1402 netmap_get_na(struct nmreq *nmr, struct netmap_adapter **na, 1403 struct ifnet **ifp, int create) 1404 { 1405 int error = 0; 1406 struct netmap_adapter *ret = NULL; 1407 1408 *na = NULL; /* default return value */ 1409 *ifp = NULL; 1410 1411 NMG_LOCK_ASSERT(); 1412 1413 /* We cascade through all possible types of netmap adapter. 1414 * All netmap_get_*_na() functions return an error and an na, 1415 * with the following combinations: 1416 * 1417 * error na 1418 * 0 NULL type doesn't match 1419 * !0 NULL type matches, but na creation/lookup failed 1420 * 0 !NULL type matches and na created/found 1421 * !0 !NULL impossible 1422 */ 1423 1424 /* try to see if this is a ptnetmap port */ 1425 error = netmap_get_pt_host_na(nmr, na, create); 1426 if (error || *na != NULL) 1427 return error; 1428 1429 /* try to see if this is a monitor port */ 1430 error = netmap_get_monitor_na(nmr, na, create); 1431 if (error || *na != NULL) 1432 return error; 1433 1434 /* try to see if this is a pipe port */ 1435 error = netmap_get_pipe_na(nmr, na, create); 1436 if (error || *na != NULL) 1437 return error; 1438 1439 /* try to see if this is a bridge port */ 1440 error = netmap_get_bdg_na(nmr, na, create); 1441 if (error) 1442 return error; 1443 1444 if (*na != NULL) /* valid match in netmap_get_bdg_na() */ 1445 goto out; 1446 1447 /* 1448 * This must be a hardware na, lookup the name in the system. 1449 * Note that by hardware we actually mean "it shows up in ifconfig". 1450 * This may still be a tap, a veth/epair, or even a 1451 * persistent VALE port. 
1452 */ 1453 *ifp = ifunit_ref(nmr->nr_name); 1454 if (*ifp == NULL) { 1455 return ENXIO; 1456 } 1457 1458 error = netmap_get_hw_na(*ifp, &ret); 1459 if (error) 1460 goto out; 1461 1462 *na = ret; 1463 netmap_adapter_get(ret); 1464 1465 out: 1466 if (error) { 1467 if (ret) 1468 netmap_adapter_put(ret); 1469 if (*ifp) { 1470 if_rele(*ifp); 1471 *ifp = NULL; 1472 } 1473 } 1474 1475 return error; 1476 } 1477 1478 /* undo netmap_get_na() */ 1479 void 1480 netmap_unget_na(struct netmap_adapter *na, struct ifnet *ifp) 1481 { 1482 if (ifp) 1483 if_rele(ifp); 1484 if (na) 1485 netmap_adapter_put(na); 1486 } 1487 1488 1489 #define NM_FAIL_ON(t) do { \ 1490 if (unlikely(t)) { \ 1491 RD(5, "%s: fail '" #t "' " \ 1492 "h %d c %d t %d " \ 1493 "rh %d rc %d rt %d " \ 1494 "hc %d ht %d", \ 1495 kring->name, \ 1496 head, cur, ring->tail, \ 1497 kring->rhead, kring->rcur, kring->rtail, \ 1498 kring->nr_hwcur, kring->nr_hwtail); \ 1499 return kring->nkr_num_slots; \ 1500 } \ 1501 } while (0) 1502 1503 /* 1504 * validate parameters on entry for *_txsync() 1505 * Returns ring->cur if ok, or something >= kring->nkr_num_slots 1506 * in case of error. 1507 * 1508 * rhead, rcur and rtail=hwtail are stored from previous round. 1509 * hwcur is the next packet to send to the ring. 1510 * 1511 * We want 1512 * hwcur <= *rhead <= head <= cur <= tail = *rtail <= hwtail 1513 * 1514 * hwcur, rhead, rtail and hwtail are reliable 1515 */ 1516 u_int 1517 nm_txsync_prologue(struct netmap_kring *kring, struct netmap_ring *ring) 1518 { 1519 u_int head = ring->head; /* read only once */ 1520 u_int cur = ring->cur; /* read only once */ 1521 u_int n = kring->nkr_num_slots; 1522 1523 ND(5, "%s kcur %d ktail %d head %d cur %d tail %d", 1524 kring->name, 1525 kring->nr_hwcur, kring->nr_hwtail, 1526 ring->head, ring->cur, ring->tail); 1527 #if 1 /* kernel sanity checks; but we can trust the kring. */ 1528 NM_FAIL_ON(kring->nr_hwcur >= n || kring->rhead >= n || 1529 kring->rtail >= n || kring->nr_hwtail >= n); 1530 #endif /* kernel sanity checks */ 1531 /* 1532 * user sanity checks. We only use head, 1533 * A, B, ... are possible positions for head: 1534 * 1535 * 0 A rhead B rtail C n-1 1536 * 0 D rtail E rhead F n-1 1537 * 1538 * B, F, D are valid. A, C, E are wrong 1539 */ 1540 if (kring->rtail >= kring->rhead) { 1541 /* want rhead <= head <= rtail */ 1542 NM_FAIL_ON(head < kring->rhead || head > kring->rtail); 1543 /* and also head <= cur <= rtail */ 1544 NM_FAIL_ON(cur < head || cur > kring->rtail); 1545 } else { /* here rtail < rhead */ 1546 /* we need head outside rtail .. rhead */ 1547 NM_FAIL_ON(head > kring->rtail && head < kring->rhead); 1548 1549 /* two cases now: head <= rtail or head >= rhead */ 1550 if (head <= kring->rtail) { 1551 /* want head <= cur <= rtail */ 1552 NM_FAIL_ON(cur < head || cur > kring->rtail); 1553 } else { /* head >= rhead */ 1554 /* cur must be outside rtail..head */ 1555 NM_FAIL_ON(cur > kring->rtail && cur < head); 1556 } 1557 } 1558 if (ring->tail != kring->rtail) { 1559 RD(5, "%s tail overwritten was %d need %d", kring->name, 1560 ring->tail, kring->rtail); 1561 ring->tail = kring->rtail; 1562 } 1563 kring->rhead = head; 1564 kring->rcur = cur; 1565 return head; 1566 } 1567 1568 1569 /* 1570 * validate parameters on entry for *_rxsync() 1571 * Returns ring->head if ok, kring->nkr_num_slots on error. 1572 * 1573 * For a valid configuration, 1574 * hwcur <= head <= cur <= tail <= hwtail 1575 * 1576 * We only consider head and cur. 1577 * hwcur and hwtail are reliable. 
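 *
 * A small worked example, assuming nkr_num_slots = 8: with hwcur = 6 and
 * hwtail = 2 (the ring has wrapped), the positions allowed for head are
 * 6, 7, 0, 1, 2, and cur must lie in the circular interval [head, hwtail].
 * For instance head = 7, cur = 1 passes the checks below, while head = 4
 * (inside the hwtail..hwcur gap), or cur = 5 with head = 7, makes the
 * prologue fail and the caller falls back to netmap_ring_reinit().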
1578 * 1579 */ 1580 u_int 1581 nm_rxsync_prologue(struct netmap_kring *kring, struct netmap_ring *ring) 1582 { 1583 uint32_t const n = kring->nkr_num_slots; 1584 uint32_t head, cur; 1585 1586 ND(5,"%s kc %d kt %d h %d c %d t %d", 1587 kring->name, 1588 kring->nr_hwcur, kring->nr_hwtail, 1589 ring->head, ring->cur, ring->tail); 1590 /* 1591 * Before storing the new values, we should check they do not 1592 * move backwards. However: 1593 * - head is not an issue because the previous value is hwcur; 1594 * - cur could in principle go back, however it does not matter 1595 * because we are processing a brand new rxsync() 1596 */ 1597 cur = kring->rcur = ring->cur; /* read only once */ 1598 head = kring->rhead = ring->head; /* read only once */ 1599 #if 1 /* kernel sanity checks */ 1600 NM_FAIL_ON(kring->nr_hwcur >= n || kring->nr_hwtail >= n); 1601 #endif /* kernel sanity checks */ 1602 /* user sanity checks */ 1603 if (kring->nr_hwtail >= kring->nr_hwcur) { 1604 /* want hwcur <= rhead <= hwtail */ 1605 NM_FAIL_ON(head < kring->nr_hwcur || head > kring->nr_hwtail); 1606 /* and also rhead <= rcur <= hwtail */ 1607 NM_FAIL_ON(cur < head || cur > kring->nr_hwtail); 1608 } else { 1609 /* we need rhead outside hwtail..hwcur */ 1610 NM_FAIL_ON(head < kring->nr_hwcur && head > kring->nr_hwtail); 1611 /* two cases now: head <= hwtail or head >= hwcur */ 1612 if (head <= kring->nr_hwtail) { 1613 /* want head <= cur <= hwtail */ 1614 NM_FAIL_ON(cur < head || cur > kring->nr_hwtail); 1615 } else { 1616 /* cur must be outside hwtail..head */ 1617 NM_FAIL_ON(cur < head && cur > kring->nr_hwtail); 1618 } 1619 } 1620 if (ring->tail != kring->rtail) { 1621 RD(5, "%s tail overwritten was %d need %d", 1622 kring->name, 1623 ring->tail, kring->rtail); 1624 ring->tail = kring->rtail; 1625 } 1626 return head; 1627 } 1628 1629 1630 /* 1631 * Error routine called when txsync/rxsync detects an error. 1632 * Can't do much more than resetting head =cur = hwcur, tail = hwtail 1633 * Return 1 on reinit. 1634 * 1635 * This routine is only called by the upper half of the kernel. 1636 * It only reads hwcur (which is changed only by the upper half, too) 1637 * and hwtail (which may be changed by the lower half, but only on 1638 * a tx ring and only to increase it, so any error will be recovered 1639 * on the next call). For the above, we don't strictly need to call 1640 * it under lock. 
1641 */ 1642 int 1643 netmap_ring_reinit(struct netmap_kring *kring) 1644 { 1645 struct netmap_ring *ring = kring->ring; 1646 u_int i, lim = kring->nkr_num_slots - 1; 1647 int errors = 0; 1648 1649 // XXX KASSERT nm_kr_tryget 1650 RD(10, "called for %s", kring->name); 1651 // XXX probably wrong to trust userspace 1652 kring->rhead = ring->head; 1653 kring->rcur = ring->cur; 1654 kring->rtail = ring->tail; 1655 1656 if (ring->cur > lim) 1657 errors++; 1658 if (ring->head > lim) 1659 errors++; 1660 if (ring->tail > lim) 1661 errors++; 1662 for (i = 0; i <= lim; i++) { 1663 u_int idx = ring->slot[i].buf_idx; 1664 u_int len = ring->slot[i].len; 1665 if (idx < 2 || idx >= kring->na->na_lut.objtotal) { 1666 RD(5, "bad index at slot %d idx %d len %d ", i, idx, len); 1667 ring->slot[i].buf_idx = 0; 1668 ring->slot[i].len = 0; 1669 } else if (len > NETMAP_BUF_SIZE(kring->na)) { 1670 ring->slot[i].len = 0; 1671 RD(5, "bad len at slot %d idx %d len %d", i, idx, len); 1672 } 1673 } 1674 if (errors) { 1675 RD(10, "total %d errors", errors); 1676 RD(10, "%s reinit, cur %d -> %d tail %d -> %d", 1677 kring->name, 1678 ring->cur, kring->nr_hwcur, 1679 ring->tail, kring->nr_hwtail); 1680 ring->head = kring->rhead = kring->nr_hwcur; 1681 ring->cur = kring->rcur = kring->nr_hwcur; 1682 ring->tail = kring->rtail = kring->nr_hwtail; 1683 } 1684 return (errors ? 1 : 0); 1685 } 1686 1687 /* interpret the ringid and flags fields of an nmreq, by translating them 1688 * into a pair of intervals of ring indices: 1689 * 1690 * [priv->np_txqfirst, priv->np_txqlast) and 1691 * [priv->np_rxqfirst, priv->np_rxqlast) 1692 * 1693 */ 1694 int 1695 netmap_interp_ringid(struct netmap_priv_d *priv, uint16_t ringid, uint32_t flags) 1696 { 1697 struct netmap_adapter *na = priv->np_na; 1698 u_int j, i = ringid & NETMAP_RING_MASK; 1699 u_int reg = flags & NR_REG_MASK; 1700 int excluded_direction[] = { NR_TX_RINGS_ONLY, NR_RX_RINGS_ONLY }; 1701 enum txrx t; 1702 1703 if (reg == NR_REG_DEFAULT) { 1704 /* convert from old ringid to flags */ 1705 if (ringid & NETMAP_SW_RING) { 1706 reg = NR_REG_SW; 1707 } else if (ringid & NETMAP_HW_RING) { 1708 reg = NR_REG_ONE_NIC; 1709 } else { 1710 reg = NR_REG_ALL_NIC; 1711 } 1712 D("deprecated API, old ringid 0x%x -> ringid %x reg %d", ringid, i, reg); 1713 } 1714 1715 if ((flags & NR_PTNETMAP_HOST) && (reg != NR_REG_ALL_NIC || 1716 flags & (NR_RX_RINGS_ONLY|NR_TX_RINGS_ONLY))) { 1717 D("Error: only NR_REG_ALL_NIC supported with netmap passthrough"); 1718 return EINVAL; 1719 } 1720 1721 for_rx_tx(t) { 1722 if (flags & excluded_direction[t]) { 1723 priv->np_qfirst[t] = priv->np_qlast[t] = 0; 1724 continue; 1725 } 1726 switch (reg) { 1727 case NR_REG_ALL_NIC: 1728 case NR_REG_PIPE_MASTER: 1729 case NR_REG_PIPE_SLAVE: 1730 priv->np_qfirst[t] = 0; 1731 priv->np_qlast[t] = nma_get_nrings(na, t); 1732 ND("ALL/PIPE: %s %d %d", nm_txrx2str(t), 1733 priv->np_qfirst[t], priv->np_qlast[t]); 1734 break; 1735 case NR_REG_SW: 1736 case NR_REG_NIC_SW: 1737 if (!(na->na_flags & NAF_HOST_RINGS)) { 1738 D("host rings not supported"); 1739 return EINVAL; 1740 } 1741 priv->np_qfirst[t] = (reg == NR_REG_SW ? 1742 nma_get_nrings(na, t) : 0); 1743 priv->np_qlast[t] = nma_get_nrings(na, t) + 1; 1744 ND("%s: %s %d %d", reg == NR_REG_SW ? 
"SW" : "NIC+SW", 1745 nm_txrx2str(t), 1746 priv->np_qfirst[t], priv->np_qlast[t]); 1747 break; 1748 case NR_REG_ONE_NIC: 1749 if (i >= na->num_tx_rings && i >= na->num_rx_rings) { 1750 D("invalid ring id %d", i); 1751 return EINVAL; 1752 } 1753 /* if not enough rings, use the first one */ 1754 j = i; 1755 if (j >= nma_get_nrings(na, t)) 1756 j = 0; 1757 priv->np_qfirst[t] = j; 1758 priv->np_qlast[t] = j + 1; 1759 ND("ONE_NIC: %s %d %d", nm_txrx2str(t), 1760 priv->np_qfirst[t], priv->np_qlast[t]); 1761 break; 1762 default: 1763 D("invalid regif type %d", reg); 1764 return EINVAL; 1765 } 1766 } 1767 priv->np_flags = (flags & ~NR_REG_MASK) | reg; 1768 1769 if (netmap_verbose) { 1770 D("%s: tx [%d,%d) rx [%d,%d) id %d", 1771 na->name, 1772 priv->np_qfirst[NR_TX], 1773 priv->np_qlast[NR_TX], 1774 priv->np_qfirst[NR_RX], 1775 priv->np_qlast[NR_RX], 1776 i); 1777 } 1778 return 0; 1779 } 1780 1781 1782 /* 1783 * Set the ring ID. For devices with a single queue, a request 1784 * for all rings is the same as a single ring. 1785 */ 1786 static int 1787 netmap_set_ringid(struct netmap_priv_d *priv, uint16_t ringid, uint32_t flags) 1788 { 1789 struct netmap_adapter *na = priv->np_na; 1790 int error; 1791 enum txrx t; 1792 1793 error = netmap_interp_ringid(priv, ringid, flags); 1794 if (error) { 1795 return error; 1796 } 1797 1798 priv->np_txpoll = (ringid & NETMAP_NO_TX_POLL) ? 0 : 1; 1799 1800 /* optimization: count the users registered for more than 1801 * one ring, which are the ones sleeping on the global queue. 1802 * The default netmap_notify() callback will then 1803 * avoid signaling the global queue if nobody is using it 1804 */ 1805 for_rx_tx(t) { 1806 if (nm_si_user(priv, t)) 1807 na->si_users[t]++; 1808 } 1809 return 0; 1810 } 1811 1812 static void 1813 netmap_unset_ringid(struct netmap_priv_d *priv) 1814 { 1815 struct netmap_adapter *na = priv->np_na; 1816 enum txrx t; 1817 1818 for_rx_tx(t) { 1819 if (nm_si_user(priv, t)) 1820 na->si_users[t]--; 1821 priv->np_qfirst[t] = priv->np_qlast[t] = 0; 1822 } 1823 priv->np_flags = 0; 1824 priv->np_txpoll = 0; 1825 } 1826 1827 1828 /* Set the nr_pending_mode for the requested rings. 1829 * If requested, also try to get exclusive access to the rings, provided 1830 * the rings we want to bind are not exclusively owned by a previous bind. 
1831 */ 1832 static int 1833 netmap_krings_get(struct netmap_priv_d *priv) 1834 { 1835 struct netmap_adapter *na = priv->np_na; 1836 u_int i; 1837 struct netmap_kring *kring; 1838 int excl = (priv->np_flags & NR_EXCLUSIVE); 1839 enum txrx t; 1840 1841 ND("%s: grabbing tx [%d, %d) rx [%d, %d)", 1842 na->name, 1843 priv->np_qfirst[NR_TX], 1844 priv->np_qlast[NR_TX], 1845 priv->np_qfirst[NR_RX], 1846 priv->np_qlast[NR_RX]); 1847 1848 /* first round: check that all the requested rings 1849 * are neither already exclusively owned, nor already 1850 * in use when we want exclusive ownership 1851 */ 1852 for_rx_tx(t) { 1853 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { 1854 kring = &NMR(na, t)[i]; 1855 if ((kring->nr_kflags & NKR_EXCLUSIVE) || 1856 (kring->users && excl)) 1857 { 1858 ND("ring %s busy", kring->name); 1859 return EBUSY; 1860 } 1861 } 1862 } 1863 1864 /* second round: increment usage count (possibly marking them 1865 * as exclusive) and set the nr_pending_mode 1866 */ 1867 for_rx_tx(t) { 1868 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { 1869 kring = &NMR(na, t)[i]; 1870 kring->users++; 1871 if (excl) 1872 kring->nr_kflags |= NKR_EXCLUSIVE; 1873 kring->nr_pending_mode = NKR_NETMAP_ON; 1874 } 1875 } 1876 1877 return 0; 1878 1879 } 1880 1881 /* Undo netmap_krings_get(). This is done by clearing the exclusive mode 1882 * if it was requested on regif, and by unsetting nr_pending_mode if we are 1883 * the last users of the involved rings. */ 1884 static void 1885 netmap_krings_put(struct netmap_priv_d *priv) 1886 { 1887 struct netmap_adapter *na = priv->np_na; 1888 u_int i; 1889 struct netmap_kring *kring; 1890 int excl = (priv->np_flags & NR_EXCLUSIVE); 1891 enum txrx t; 1892 1893 ND("%s: releasing tx [%d, %d) rx [%d, %d)", 1894 na->name, 1895 priv->np_qfirst[NR_TX], 1896 priv->np_qlast[NR_TX], 1897 priv->np_qfirst[NR_RX], 1898 priv->np_qlast[NR_RX]); 1899 1900 1901 for_rx_tx(t) { 1902 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { 1903 kring = &NMR(na, t)[i]; 1904 if (excl) 1905 kring->nr_kflags &= ~NKR_EXCLUSIVE; 1906 kring->users--; 1907 if (kring->users == 0) 1908 kring->nr_pending_mode = NKR_NETMAP_OFF; 1909 } 1910 } 1911 } 1912 1913 /* 1914 * possibly move the interface to netmap mode. 1915 * On success it returns a pointer to the netmap_if, otherwise NULL. 1916 * This must be called with NMG_LOCK held. 1917 * 1918 * The following na callbacks are called in the process: 1919 * 1920 * na->nm_config() [by netmap_update_config] 1921 * (get current number and size of rings) 1922 * 1923 * We have a generic one for linux (netmap_linux_config). 1924 * The bwrap has to override this, since it has to forward 1925 * the request to the wrapped adapter (netmap_bwrap_config). 1926 * 1927 * 1928 * na->nm_krings_create() 1929 * (create and init the krings array) 1930 * 1931 * One of the following: 1932 * 1933 * * netmap_hw_krings_create, (hw ports) 1934 * creates the standard layout for the krings 1935 * and adds the mbq (used for the host rings).
1936 * 1937 * * netmap_vp_krings_create (VALE ports) 1938 * add leases and scratchpads 1939 * 1940 * * netmap_pipe_krings_create (pipes) 1941 * create the krings and rings of both ends and 1942 * cross-link them 1943 * 1944 * * netmap_monitor_krings_create (monitors) 1945 * avoid allocating the mbq 1946 * 1947 * * netmap_bwrap_krings_create (bwraps) 1948 * create both the bwrap krings array, 1949 * the krings array of the wrapped adapter, and 1950 * (if needed) the fake array for the host adapter 1951 * 1952 * na->nm_register(, 1) 1953 * (put the adapter in netmap mode) 1954 * 1955 * This may be one of the following: 1956 * 1957 * * netmap_hw_reg (hw ports) 1958 * checks that the ifp is still there, then calls 1959 * the hardware specific callback; 1960 * 1961 * * netmap_vp_reg (VALE ports) 1962 * If the port is connected to a bridge, 1963 * set the NAF_NETMAP_ON flag under the 1964 * bridge write lock. 1965 * 1966 * * netmap_pipe_reg (pipes) 1967 * inform the other pipe end that it is no 1968 * longer responsible for the lifetime of this 1969 * pipe end 1970 * 1971 * * netmap_monitor_reg (monitors) 1972 * intercept the sync callbacks of the monitored 1973 * rings 1974 * 1975 * * netmap_bwrap_reg (bwraps) 1976 * cross-link the bwrap and hwna rings, 1977 * forward the request to the hwna, override 1978 * the hwna notify callback (so that the frames 1979 * coming from outside go through the bridge). 1980 * 1981 * 1982 */ 1983 int 1984 netmap_do_regif(struct netmap_priv_d *priv, struct netmap_adapter *na, 1985 uint16_t ringid, uint32_t flags) 1986 { 1987 struct netmap_if *nifp = NULL; 1988 int error; 1989 1990 NMG_LOCK_ASSERT(); 1991 /* ring configuration may have changed, fetch from the card */ 1992 netmap_update_config(na); 1993 priv->np_na = na; /* store the reference */ 1994 error = netmap_set_ringid(priv, ringid, flags); 1995 if (error) 1996 goto err; 1997 error = netmap_mem_finalize(na->nm_mem, na); 1998 if (error) 1999 goto err; 2000 2001 if (na->active_fds == 0) { 2002 /* 2003 * If this is the first registration of the adapter, 2004 * create the in-kernel view of the netmap rings, 2005 * the netmap krings. 2006 */ 2007 2008 /* 2009 * Depending on the adapter, this may also create 2010 * the netmap rings themselves 2011 */ 2012 error = na->nm_krings_create(na); 2013 if (error) 2014 goto err_drop_mem; 2015 2016 } 2017 2018 /* now the krings must exist and we can check whether some 2019 * previous bind has exclusive ownership on them, and set 2020 * nr_pending_mode 2021 */ 2022 error = netmap_krings_get(priv); 2023 if (error) 2024 goto err_del_krings; 2025 2026 /* create all needed missing netmap rings */ 2027 error = netmap_mem_rings_create(na); 2028 if (error) 2029 goto err_rel_excl; 2030 2031 /* in all cases, create a new netmap_if */ 2032 nifp = netmap_mem_if_new(na); 2033 if (nifp == NULL) { 2034 error = ENOMEM; 2035 goto err_del_rings; 2036 } 2037 2038 if (na->active_fds == 0) { 2039 /* cache the allocator info in the na */ 2040 error = netmap_mem_get_lut(na->nm_mem, &na->na_lut); 2041 if (error) 2042 goto err_del_if; 2043 ND("lut %p bufs %u size %u", na->na_lut.lut, na->na_lut.objtotal, 2044 na->na_lut.objsize); 2045 } 2046 2047 if (nm_kring_pending(priv)) { 2048 /* Some kring is switching mode, tell the adapter to 2049 * react to this. */ 2050 error = na->nm_register(na, 1); 2051 if (error) 2052 goto err_put_lut; 2053 } 2054 2055 /* Commit the reference. */ 2056 na->active_fds++; 2057 2058 /* 2059 * advertise that the interface is ready by setting np_nifp.
2060 * The barrier is needed because readers (poll, *SYNC and mmap) 2061 * check for priv->np_nifp != NULL without locking 2062 */ 2063 mb(); /* make sure previous writes are visible to all CPUs */ 2064 priv->np_nifp = nifp; 2065 2066 return 0; 2067 2068 err_put_lut: 2069 if (na->active_fds == 0) 2070 memset(&na->na_lut, 0, sizeof(na->na_lut)); 2071 err_del_if: 2072 netmap_mem_if_delete(na, nifp); 2073 err_rel_excl: 2074 netmap_krings_put(priv); 2075 err_del_rings: 2076 netmap_mem_rings_delete(na); 2077 err_del_krings: 2078 if (na->active_fds == 0) 2079 na->nm_krings_delete(na); 2080 err_drop_mem: 2081 netmap_mem_deref(na->nm_mem, na); 2082 err: 2083 priv->np_na = NULL; 2084 return error; 2085 } 2086 2087 2088 /* 2089 * update kring and ring at the end of rxsync/txsync. 2090 */ 2091 static inline void 2092 nm_sync_finalize(struct netmap_kring *kring) 2093 { 2094 /* 2095 * Update ring tail to what the kernel knows 2096 * After txsync: head/rhead/hwcur might be behind cur/rcur 2097 * if no carrier. 2098 */ 2099 kring->ring->tail = kring->rtail = kring->nr_hwtail; 2100 2101 ND(5, "%s now hwcur %d hwtail %d head %d cur %d tail %d", 2102 kring->name, kring->nr_hwcur, kring->nr_hwtail, 2103 kring->rhead, kring->rcur, kring->rtail); 2104 } 2105 2106 /* 2107 * ioctl(2) support for the "netmap" device. 2108 * 2109 * Following a list of accepted commands: 2110 * - NIOCGINFO 2111 * - SIOCGIFADDR just for convenience 2112 * - NIOCREGIF 2113 * - NIOCTXSYNC 2114 * - NIOCRXSYNC 2115 * 2116 * Return 0 on success, errno otherwise. 2117 */ 2118 int 2119 netmap_ioctl(struct netmap_priv_d *priv, u_long cmd, caddr_t data, struct thread *td) 2120 { 2121 struct nmreq *nmr = (struct nmreq *) data; 2122 struct netmap_adapter *na = NULL; 2123 struct ifnet *ifp = NULL; 2124 int error = 0; 2125 u_int i, qfirst, qlast; 2126 struct netmap_if *nifp; 2127 struct netmap_kring *krings; 2128 enum txrx t; 2129 2130 if (cmd == NIOCGINFO || cmd == NIOCREGIF) { 2131 /* truncate name */ 2132 nmr->nr_name[sizeof(nmr->nr_name) - 1] = '\0'; 2133 if (nmr->nr_version != NETMAP_API) { 2134 D("API mismatch for %s got %d need %d", 2135 nmr->nr_name, 2136 nmr->nr_version, NETMAP_API); 2137 nmr->nr_version = NETMAP_API; 2138 } 2139 if (nmr->nr_version < NETMAP_MIN_API || 2140 nmr->nr_version > NETMAP_MAX_API) { 2141 return EINVAL; 2142 } 2143 } 2144 2145 switch (cmd) { 2146 case NIOCGINFO: /* return capabilities etc */ 2147 if (nmr->nr_cmd == NETMAP_BDG_LIST) { 2148 error = netmap_bdg_ctl(nmr, NULL); 2149 break; 2150 } 2151 2152 NMG_LOCK(); 2153 do { 2154 /* memsize is always valid */ 2155 struct netmap_mem_d *nmd = &nm_mem; 2156 u_int memflags; 2157 2158 if (nmr->nr_name[0] != '\0') { 2159 2160 /* get a refcount */ 2161 error = netmap_get_na(nmr, &na, &ifp, 1 /* create */); 2162 if (error) { 2163 na = NULL; 2164 ifp = NULL; 2165 break; 2166 } 2167 nmd = na->nm_mem; /* get memory allocator */ 2168 } 2169 2170 error = netmap_mem_get_info(nmd, &nmr->nr_memsize, &memflags, 2171 &nmr->nr_arg2); 2172 if (error) 2173 break; 2174 if (na == NULL) /* only memory info */ 2175 break; 2176 nmr->nr_offset = 0; 2177 nmr->nr_rx_slots = nmr->nr_tx_slots = 0; 2178 netmap_update_config(na); 2179 nmr->nr_rx_rings = na->num_rx_rings; 2180 nmr->nr_tx_rings = na->num_tx_rings; 2181 nmr->nr_rx_slots = na->num_rx_desc; 2182 nmr->nr_tx_slots = na->num_tx_desc; 2183 } while (0); 2184 netmap_unget_na(na, ifp); 2185 NMG_UNLOCK(); 2186 break; 2187 2188 case NIOCREGIF: 2189 /* possibly attach/detach NIC and VALE switch */ 2190 i = nmr->nr_cmd; 2191 if (i == 
NETMAP_BDG_ATTACH || i == NETMAP_BDG_DETACH 2192 || i == NETMAP_BDG_VNET_HDR 2193 || i == NETMAP_BDG_NEWIF 2194 || i == NETMAP_BDG_DELIF 2195 || i == NETMAP_BDG_POLLING_ON 2196 || i == NETMAP_BDG_POLLING_OFF) { 2197 error = netmap_bdg_ctl(nmr, NULL); 2198 break; 2199 } else if (i == NETMAP_PT_HOST_CREATE || i == NETMAP_PT_HOST_DELETE) { 2200 error = ptnetmap_ctl(nmr, priv->np_na); 2201 break; 2202 } else if (i == NETMAP_VNET_HDR_GET) { 2203 struct ifnet *ifp; 2204 2205 NMG_LOCK(); 2206 error = netmap_get_na(nmr, &na, &ifp, 0); 2207 if (na && !error) { 2208 nmr->nr_arg1 = na->virt_hdr_len; 2209 } 2210 netmap_unget_na(na, ifp); 2211 NMG_UNLOCK(); 2212 break; 2213 } else if (i != 0) { 2214 D("nr_cmd must be 0 not %d", i); 2215 error = EINVAL; 2216 break; 2217 } 2218 2219 /* protect access to priv from concurrent NIOCREGIF */ 2220 NMG_LOCK(); 2221 do { 2222 u_int memflags; 2223 struct ifnet *ifp; 2224 2225 if (priv->np_nifp != NULL) { /* thread already registered */ 2226 error = EBUSY; 2227 break; 2228 } 2229 /* find the interface and a reference */ 2230 error = netmap_get_na(nmr, &na, &ifp, 2231 1 /* create */); /* keep reference */ 2232 if (error) 2233 break; 2234 if (NETMAP_OWNED_BY_KERN(na)) { 2235 netmap_unget_na(na, ifp); 2236 error = EBUSY; 2237 break; 2238 } 2239 2240 if (na->virt_hdr_len && !(nmr->nr_flags & NR_ACCEPT_VNET_HDR)) { 2241 netmap_unget_na(na, ifp); 2242 error = EIO; 2243 break; 2244 } 2245 2246 error = netmap_do_regif(priv, na, nmr->nr_ringid, nmr->nr_flags); 2247 if (error) { /* reg. failed, release priv and ref */ 2248 netmap_unget_na(na, ifp); 2249 break; 2250 } 2251 nifp = priv->np_nifp; 2252 priv->np_td = td; // XXX kqueue, debugging only 2253 2254 /* return the offset of the netmap_if object */ 2255 nmr->nr_rx_rings = na->num_rx_rings; 2256 nmr->nr_tx_rings = na->num_tx_rings; 2257 nmr->nr_rx_slots = na->num_rx_desc; 2258 nmr->nr_tx_slots = na->num_tx_desc; 2259 error = netmap_mem_get_info(na->nm_mem, &nmr->nr_memsize, &memflags, 2260 &nmr->nr_arg2); 2261 if (error) { 2262 netmap_do_unregif(priv); 2263 netmap_unget_na(na, ifp); 2264 break; 2265 } 2266 if (memflags & NETMAP_MEM_PRIVATE) { 2267 *(uint32_t *)(uintptr_t)&nifp->ni_flags |= NI_PRIV_MEM; 2268 } 2269 for_rx_tx(t) { 2270 priv->np_si[t] = nm_si_user(priv, t) ? 2271 &na->si[t] : &NMR(na, t)[priv->np_qfirst[t]].si; 2272 } 2273 2274 if (nmr->nr_arg3) { 2275 if (netmap_verbose) 2276 D("requested %d extra buffers", nmr->nr_arg3); 2277 nmr->nr_arg3 = netmap_extra_alloc(na, 2278 &nifp->ni_bufs_head, nmr->nr_arg3); 2279 if (netmap_verbose) 2280 D("got %d extra buffers", nmr->nr_arg3); 2281 } 2282 nmr->nr_offset = netmap_mem_if_offset(na->nm_mem, nifp); 2283 2284 /* store ifp reference so that priv destructor may release it */ 2285 priv->np_ifp = ifp; 2286 } while (0); 2287 NMG_UNLOCK(); 2288 break; 2289 2290 case NIOCTXSYNC: 2291 case NIOCRXSYNC: 2292 nifp = priv->np_nifp; 2293 2294 if (nifp == NULL) { 2295 error = ENXIO; 2296 break; 2297 } 2298 mb(); /* make sure following reads are not from cache */ 2299 2300 na = priv->np_na; /* we have a reference */ 2301 2302 if (na == NULL) { 2303 D("Internal error: nifp != NULL && na == NULL"); 2304 error = ENXIO; 2305 break; 2306 } 2307 2308 t = (cmd == NIOCTXSYNC ? 
NR_TX : NR_RX); 2309 krings = NMR(na, t); 2310 qfirst = priv->np_qfirst[t]; 2311 qlast = priv->np_qlast[t]; 2312 2313 for (i = qfirst; i < qlast; i++) { 2314 struct netmap_kring *kring = krings + i; 2315 struct netmap_ring *ring = kring->ring; 2316 2317 if (unlikely(nm_kr_tryget(kring, 1, &error))) { 2318 error = (error ? EIO : 0); 2319 continue; 2320 } 2321 2322 if (cmd == NIOCTXSYNC) { 2323 if (netmap_verbose & NM_VERB_TXSYNC) 2324 D("pre txsync ring %d cur %d hwcur %d", 2325 i, ring->cur, 2326 kring->nr_hwcur); 2327 if (nm_txsync_prologue(kring, ring) >= kring->nkr_num_slots) { 2328 netmap_ring_reinit(kring); 2329 } else if (kring->nm_sync(kring, NAF_FORCE_RECLAIM) == 0) { 2330 nm_sync_finalize(kring); 2331 } 2332 if (netmap_verbose & NM_VERB_TXSYNC) 2333 D("post txsync ring %d cur %d hwcur %d", 2334 i, ring->cur, 2335 kring->nr_hwcur); 2336 } else { 2337 if (nm_rxsync_prologue(kring, ring) >= kring->nkr_num_slots) { 2338 netmap_ring_reinit(kring); 2339 } else if (kring->nm_sync(kring, NAF_FORCE_READ) == 0) { 2340 nm_sync_finalize(kring); 2341 } 2342 microtime(&ring->ts); 2343 } 2344 nm_kr_put(kring); 2345 } 2346 2347 break; 2348 2349 #ifdef WITH_VALE 2350 case NIOCCONFIG: 2351 error = netmap_bdg_config(nmr); 2352 break; 2353 #endif 2354 #ifdef __FreeBSD__ 2355 case FIONBIO: 2356 case FIOASYNC: 2357 ND("FIONBIO/FIOASYNC are no-ops"); 2358 break; 2359 2360 case BIOCIMMEDIATE: 2361 case BIOCGHDRCMPLT: 2362 case BIOCSHDRCMPLT: 2363 case BIOCSSEESENT: 2364 D("ignore BIOCIMMEDIATE/BIOCGHDRCMPLT/BIOCSHDRCMPLT/BIOCSSEESENT"); 2365 break; 2366 2367 default: /* allow device-specific ioctls */ 2368 { 2369 struct ifnet *ifp = ifunit_ref(nmr->nr_name); 2370 if (ifp == NULL) { 2371 error = ENXIO; 2372 } else { 2373 struct socket so; 2374 2375 bzero(&so, sizeof(so)); 2376 so.so_vnet = ifp->if_vnet; 2377 // so->so_proto not null. 2378 error = ifioctl(&so, cmd, data, td); 2379 if_rele(ifp); 2380 } 2381 break; 2382 } 2383 2384 #else /* linux */ 2385 default: 2386 error = EOPNOTSUPP; 2387 #endif /* linux */ 2388 } 2389 2390 return (error); 2391 } 2392 2393 2394 /* 2395 * select(2) and poll(2) handlers for the "netmap" device. 2396 * 2397 * Can be called for one or more queues. 2398 * Return the event mask corresponding to ready events. 2399 * If there are no ready events, do a selrecord on either the individual 2400 * selinfo or on the global one. 2401 * Device-dependent parts (locking and sync of tx/rx rings) 2402 * are done through callbacks. 2403 * 2404 * On linux, arguments are really pwait, the poll table, and 'td' is a 2405 * struct file *. The first one is remapped to pwait as selrecord() uses 2406 * the name as a hidden argument. 2407 */ 2408 int 2409 netmap_poll(struct netmap_priv_d *priv, int events, NM_SELRECORD_T *sr) 2410 { 2411 struct netmap_adapter *na; 2412 struct netmap_kring *kring; 2413 struct netmap_ring *ring; 2414 u_int i, check_all_tx, check_all_rx, want[NR_TXRX], revents = 0; 2415 #define want_tx want[NR_TX] 2416 #define want_rx want[NR_RX] 2417 struct mbq q; /* packets from hw queues to host stack */ 2418 enum txrx t; 2419 2420 /* 2421 * In order to avoid nested locks, we need to "double check" 2422 * txsync and rxsync if we decide to do a selrecord(). 2423 * retry_tx (and retry_rx, later) prevent looping forever.
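 *
 * The resulting structure is roughly (a sketch of the code below):
 *
 *	scan the rings; if something is ready, set revents and return;
 *	otherwise nm_os_selrecord() on the proper selinfo, clear the
 *	retry_* flag and scan once more to close the race with a
 *	concurrent wakeup.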
2424 */ 2425 int retry_tx = 1, retry_rx = 1; 2426 2427 /* transparent mode: send_down is 1 if we have found some 2428 * packets to forward during the rx scan and we have not 2429 * sent them down to the nic yet 2430 */ 2431 int send_down = 0; 2432 2433 mbq_init(&q); 2434 2435 if (priv->np_nifp == NULL) { 2436 D("No if registered"); 2437 return POLLERR; 2438 } 2439 mb(); /* make sure following reads are not from cache */ 2440 2441 na = priv->np_na; 2442 2443 if (!nm_netmap_on(na)) 2444 return POLLERR; 2445 2446 if (netmap_verbose & 0x8000) 2447 D("device %s events 0x%x", na->name, events); 2448 want_tx = events & (POLLOUT | POLLWRNORM); 2449 want_rx = events & (POLLIN | POLLRDNORM); 2450 2451 /* 2452 * check_all_{tx|rx} are set if the card has more than one queue AND 2453 * the file descriptor is bound to all of them. If so, we sleep on 2454 * the "global" selinfo, otherwise we sleep on individual selinfo 2455 * (FreeBSD only allows two selinfo's per file descriptor). 2456 * The interrupt routine in the driver wake one or the other 2457 * (or both) depending on which clients are active. 2458 * 2459 * rxsync() is only called if we run out of buffers on a POLLIN. 2460 * txsync() is called if we run out of buffers on POLLOUT, or 2461 * there are pending packets to send. The latter can be disabled 2462 * passing NETMAP_NO_TX_POLL in the NIOCREG call. 2463 */ 2464 check_all_tx = nm_si_user(priv, NR_TX); 2465 check_all_rx = nm_si_user(priv, NR_RX); 2466 2467 /* 2468 * We start with a lock free round which is cheap if we have 2469 * slots available. If this fails, then lock and call the sync 2470 * routines. 2471 */ 2472 #if 1 /* new code- call rx if any of the ring needs to release or read buffers */ 2473 if (want_tx) { 2474 t = NR_TX; 2475 for (i = priv->np_qfirst[t]; want[t] && i < priv->np_qlast[t]; i++) { 2476 kring = &NMR(na, t)[i]; 2477 /* XXX compare ring->cur and kring->tail */ 2478 if (!nm_ring_empty(kring->ring)) { 2479 revents |= want[t]; 2480 want[t] = 0; /* also breaks the loop */ 2481 } 2482 } 2483 } 2484 if (want_rx) { 2485 want_rx = 0; /* look for a reason to run the handlers */ 2486 t = NR_RX; 2487 for (i = priv->np_qfirst[t]; i < priv->np_qlast[t]; i++) { 2488 kring = &NMR(na, t)[i]; 2489 if (kring->ring->cur == kring->ring->tail /* try fetch new buffers */ 2490 || kring->rhead != kring->ring->head /* release buffers */) { 2491 want_rx = 1; 2492 } 2493 } 2494 if (!want_rx) 2495 revents |= events & (POLLIN | POLLRDNORM); /* we have data */ 2496 } 2497 #else /* old code */ 2498 for_rx_tx(t) { 2499 for (i = priv->np_qfirst[t]; want[t] && i < priv->np_qlast[t]; i++) { 2500 kring = &NMR(na, t)[i]; 2501 /* XXX compare ring->cur and kring->tail */ 2502 if (!nm_ring_empty(kring->ring)) { 2503 revents |= want[t]; 2504 want[t] = 0; /* also breaks the loop */ 2505 } 2506 } 2507 } 2508 #endif /* old code */ 2509 2510 /* 2511 * If we want to push packets out (priv->np_txpoll) or 2512 * want_tx is still set, we must issue txsync calls 2513 * (on all rings, to avoid that the tx rings stall). 2514 * XXX should also check cur != hwcur on the tx rings. 2515 * Fortunately, normal tx mode has np_txpoll set. 2516 */ 2517 if (priv->np_txpoll || want_tx) { 2518 /* 2519 * The first round checks if anyone is ready, if not 2520 * do a selrecord and another round to handle races. 2521 * want_tx goes to 0 if any space is found, and is 2522 * used to skip rings with no pending transmissions. 
2523 */ 2524 flush_tx: 2525 for (i = priv->np_qfirst[NR_TX]; i < priv->np_qlast[NR_TX]; i++) { 2526 int found = 0; 2527 2528 kring = &na->tx_rings[i]; 2529 ring = kring->ring; 2530 2531 if (!send_down && !want_tx && ring->cur == kring->nr_hwcur) 2532 continue; 2533 2534 if (nm_kr_tryget(kring, 1, &revents)) 2535 continue; 2536 2537 if (nm_txsync_prologue(kring, ring) >= kring->nkr_num_slots) { 2538 netmap_ring_reinit(kring); 2539 revents |= POLLERR; 2540 } else { 2541 if (kring->nm_sync(kring, 0)) 2542 revents |= POLLERR; 2543 else 2544 nm_sync_finalize(kring); 2545 } 2546 2547 /* 2548 * If we found new slots, notify potential 2549 * listeners on the same ring. 2550 * Since we just did a txsync, look at the copies 2551 * of cur,tail in the kring. 2552 */ 2553 found = kring->rcur != kring->rtail; 2554 nm_kr_put(kring); 2555 if (found) { /* notify other listeners */ 2556 revents |= want_tx; 2557 want_tx = 0; 2558 kring->nm_notify(kring, 0); 2559 } 2560 } 2561 /* if there were any packet to forward we must have handled them by now */ 2562 send_down = 0; 2563 if (want_tx && retry_tx && sr) { 2564 nm_os_selrecord(sr, check_all_tx ? 2565 &na->si[NR_TX] : &na->tx_rings[priv->np_qfirst[NR_TX]].si); 2566 retry_tx = 0; 2567 goto flush_tx; 2568 } 2569 } 2570 2571 /* 2572 * If want_rx is still set scan receive rings. 2573 * Do it on all rings because otherwise we starve. 2574 */ 2575 if (want_rx) { 2576 /* two rounds here for race avoidance */ 2577 do_retry_rx: 2578 for (i = priv->np_qfirst[NR_RX]; i < priv->np_qlast[NR_RX]; i++) { 2579 int found = 0; 2580 2581 kring = &na->rx_rings[i]; 2582 ring = kring->ring; 2583 2584 if (unlikely(nm_kr_tryget(kring, 1, &revents))) 2585 continue; 2586 2587 if (nm_rxsync_prologue(kring, ring) >= kring->nkr_num_slots) { 2588 netmap_ring_reinit(kring); 2589 revents |= POLLERR; 2590 } 2591 /* now we can use kring->rcur, rtail */ 2592 2593 /* 2594 * transparent mode support: collect packets 2595 * from the rxring(s). 2596 */ 2597 if (nm_may_forward_up(kring)) { 2598 ND(10, "forwarding some buffers up %d to %d", 2599 kring->nr_hwcur, ring->cur); 2600 netmap_grab_packets(kring, &q, netmap_fwd); 2601 } 2602 2603 kring->nr_kflags &= ~NR_FORWARD; 2604 if (kring->nm_sync(kring, 0)) 2605 revents |= POLLERR; 2606 else 2607 nm_sync_finalize(kring); 2608 send_down |= (kring->nr_kflags & NR_FORWARD); /* host ring only */ 2609 if (netmap_no_timestamp == 0 || 2610 ring->flags & NR_TIMESTAMP) { 2611 microtime(&ring->ts); 2612 } 2613 found = kring->rcur != kring->rtail; 2614 nm_kr_put(kring); 2615 if (found) { 2616 revents |= want_rx; 2617 retry_rx = 0; 2618 kring->nm_notify(kring, 0); 2619 } 2620 } 2621 2622 if (retry_rx && sr) { 2623 nm_os_selrecord(sr, check_all_rx ? 2624 &na->si[NR_RX] : &na->rx_rings[priv->np_qfirst[NR_RX]].si); 2625 } 2626 if (send_down > 0 || retry_rx) { 2627 retry_rx = 0; 2628 if (send_down) 2629 goto flush_tx; /* and retry_rx */ 2630 else 2631 goto do_retry_rx; 2632 } 2633 } 2634 2635 /* 2636 * Transparent mode: marked bufs on rx rings between 2637 * kring->nr_hwcur and ring->head 2638 * are passed to the other endpoint. 2639 * 2640 * Transparent mode requires to bind all 2641 * rings to a single file descriptor. 
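 *
 * (Usage note, not enforced here: userspace opts in per slot by setting
 * NS_FORWARD in slot->flags, or for the whole ring via the NR_FORWARD
 * ring flag; the netmap_fwd sysctl passed to netmap_grab_packets()
 * above force-enables it globally. See netmap(4).)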
2642 */ 2643 2644 if (q.head && !nm_kr_tryget(&na->tx_rings[na->num_tx_rings], 1, &revents)) { 2645 netmap_send_up(na->ifp, &q); 2646 nm_kr_put(&na->tx_rings[na->num_tx_rings]); 2647 } 2648 2649 return (revents); 2650 #undef want_tx 2651 #undef want_rx 2652 } 2653 2654 2655 /*-------------------- driver support routines -------------------*/ 2656 2657 /* default notify callback */ 2658 static int 2659 netmap_notify(struct netmap_kring *kring, int flags) 2660 { 2661 struct netmap_adapter *na = kring->na; 2662 enum txrx t = kring->tx; 2663 2664 nm_os_selwakeup(&kring->si); 2665 /* optimization: avoid a wake up on the global 2666 * queue if nobody has registered for more 2667 * than one ring 2668 */ 2669 if (na->si_users[t] > 0) 2670 nm_os_selwakeup(&na->si[t]); 2671 2672 return NM_IRQ_COMPLETED; 2673 } 2674 2675 #if 0 2676 static int 2677 netmap_notify(struct netmap_adapter *na, u_int n_ring, 2678 enum txrx tx, int flags) 2679 { 2680 if (tx == NR_TX) { 2681 KeSetEvent(notes->TX_EVENT, 0, FALSE); 2682 } 2683 else 2684 { 2685 KeSetEvent(notes->RX_EVENT, 0, FALSE); 2686 } 2687 return 0; 2688 } 2689 #endif 2690 2691 /* called by all routines that create netmap_adapters. 2692 * provide some defaults and get a reference to the 2693 * memory allocator 2694 */ 2695 int 2696 netmap_attach_common(struct netmap_adapter *na) 2697 { 2698 if (na->num_tx_rings == 0 || na->num_rx_rings == 0) { 2699 D("%s: invalid rings tx %d rx %d", 2700 na->name, na->num_tx_rings, na->num_rx_rings); 2701 return EINVAL; 2702 } 2703 2704 #ifdef __FreeBSD__ 2705 if (na->na_flags & NAF_HOST_RINGS && na->ifp) { 2706 na->if_input = na->ifp->if_input; /* for netmap_send_up */ 2707 } 2708 #endif /* __FreeBSD__ */ 2709 if (na->nm_krings_create == NULL) { 2710 /* we assume that we have been called by a driver, 2711 * since other port types all provide their own 2712 * nm_krings_create 2713 */ 2714 na->nm_krings_create = netmap_hw_krings_create; 2715 na->nm_krings_delete = netmap_hw_krings_delete; 2716 } 2717 if (na->nm_notify == NULL) 2718 na->nm_notify = netmap_notify; 2719 na->active_fds = 0; 2720 2721 if (na->nm_mem == NULL) 2722 /* use the global allocator */ 2723 na->nm_mem = &nm_mem; 2724 netmap_mem_get(na->nm_mem); 2725 #ifdef WITH_VALE 2726 if (na->nm_bdg_attach == NULL) 2727 /* no special nm_bdg_attach callback. On VALE 2728 * attach, we need to interpose a bwrap 2729 */ 2730 na->nm_bdg_attach = netmap_bwrap_attach; 2731 #endif 2732 2733 return 0; 2734 } 2735 2736 2737 /* standard cleanup, called by all destructors */ 2738 void 2739 netmap_detach_common(struct netmap_adapter *na) 2740 { 2741 if (na->tx_rings) { /* XXX should not happen */ 2742 D("freeing leftover tx_rings"); 2743 na->nm_krings_delete(na); 2744 } 2745 netmap_pipe_dealloc(na); 2746 if (na->nm_mem) 2747 netmap_mem_put(na->nm_mem); 2748 bzero(na, sizeof(*na)); 2749 free(na, M_DEVBUF); 2750 } 2751 2752 /* Wrapper for the register callback provided netmap-enabled 2753 * hardware drivers. 2754 * nm_iszombie(na) means that the driver module has been 2755 * unloaded, so we cannot call into it. 2756 * nm_os_ifnet_lock() must guarantee mutual exclusion with 2757 * module unloading. 
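 *
 * (The interposition is set up in _netmap_attach() below:
 *	hwna->nm_hw_register = hwna->up.nm_register;
 *	hwna->up.nm_register = netmap_hw_reg;
 * so the driver keeps providing its own nm_register, and this wrapper
 * invokes it through hwna->nm_hw_register.)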
2758 */ 2759 static int 2760 netmap_hw_reg(struct netmap_adapter *na, int onoff) 2761 { 2762 struct netmap_hw_adapter *hwna = 2763 (struct netmap_hw_adapter*)na; 2764 int error = 0; 2765 2766 nm_os_ifnet_lock(); 2767 2768 if (nm_iszombie(na)) { 2769 if (onoff) { 2770 error = ENXIO; 2771 } else if (na != NULL) { 2772 na->na_flags &= ~NAF_NETMAP_ON; 2773 } 2774 goto out; 2775 } 2776 2777 error = hwna->nm_hw_register(na, onoff); 2778 2779 out: 2780 nm_os_ifnet_unlock(); 2781 2782 return error; 2783 } 2784 2785 static void 2786 netmap_hw_dtor(struct netmap_adapter *na) 2787 { 2788 if (nm_iszombie(na) || na->ifp == NULL) 2789 return; 2790 2791 WNA(na->ifp) = NULL; 2792 } 2793 2794 2795 /* 2796 * Allocate a ``netmap_adapter`` object, and initialize it from the 2797 * 'arg' passed by the driver on attach. 2798 * We allocate a block of memory with room for a struct netmap_adapter 2799 * plus two sets of N+2 struct netmap_kring (where N is the number 2800 * of hardware rings): 2801 * krings 0..N-1 are for the hardware queues. 2802 * kring N is for the host stack queue 2803 * kring N+1 is only used for the selinfo for all queues. // XXX still true ? 2804 * Return 0 on success, ENOMEM otherwise. 2805 */ 2806 static int 2807 _netmap_attach(struct netmap_adapter *arg, size_t size) 2808 { 2809 struct netmap_hw_adapter *hwna = NULL; 2810 struct ifnet *ifp = NULL; 2811 2812 if (arg == NULL || arg->ifp == NULL) 2813 goto fail; 2814 ifp = arg->ifp; 2815 hwna = malloc(size, M_DEVBUF, M_NOWAIT | M_ZERO); 2816 if (hwna == NULL) 2817 goto fail; 2818 hwna->up = *arg; 2819 hwna->up.na_flags |= NAF_HOST_RINGS | NAF_NATIVE; 2820 strncpy(hwna->up.name, ifp->if_xname, sizeof(hwna->up.name)); 2821 hwna->nm_hw_register = hwna->up.nm_register; 2822 hwna->up.nm_register = netmap_hw_reg; 2823 if (netmap_attach_common(&hwna->up)) { 2824 free(hwna, M_DEVBUF); 2825 goto fail; 2826 } 2827 netmap_adapter_get(&hwna->up); 2828 2829 NM_ATTACH_NA(ifp, &hwna->up); 2830 2831 #ifdef linux 2832 if (ifp->netdev_ops) { 2833 /* prepare a clone of the netdev ops */ 2834 #ifndef NETMAP_LINUX_HAVE_NETDEV_OPS 2835 hwna->nm_ndo.ndo_start_xmit = ifp->netdev_ops; 2836 #else 2837 hwna->nm_ndo = *ifp->netdev_ops; 2838 #endif /* NETMAP_LINUX_HAVE_NETDEV_OPS */ 2839 } 2840 hwna->nm_ndo.ndo_start_xmit = linux_netmap_start_xmit; 2841 if (ifp->ethtool_ops) { 2842 hwna->nm_eto = *ifp->ethtool_ops; 2843 } 2844 hwna->nm_eto.set_ringparam = linux_netmap_set_ringparam; 2845 #ifdef NETMAP_LINUX_HAVE_SET_CHANNELS 2846 hwna->nm_eto.set_channels = linux_netmap_set_channels; 2847 #endif /* NETMAP_LINUX_HAVE_SET_CHANNELS */ 2848 if (arg->nm_config == NULL) { 2849 hwna->up.nm_config = netmap_linux_config; 2850 } 2851 #endif /* linux */ 2852 if (arg->nm_dtor == NULL) { 2853 hwna->up.nm_dtor = netmap_hw_dtor; 2854 } 2855 2856 if_printf(ifp, "netmap queues/slots: TX %d/%d, RX %d/%d\n", 2857 hwna->up.num_tx_rings, hwna->up.num_tx_desc, 2858 hwna->up.num_rx_rings, hwna->up.num_rx_desc); 2859 return 0; 2860 2861 fail: 2862 D("fail, arg %p ifp %p na %p", arg, ifp, hwna); 2863 return (hwna ? EINVAL : ENOMEM); 2864 } 2865 2866 2867 int 2868 netmap_attach(struct netmap_adapter *arg) 2869 { 2870 return _netmap_attach(arg, sizeof(struct netmap_hw_adapter)); 2871 } 2872 2873 2874 #ifdef WITH_PTNETMAP_GUEST 2875 int 2876 netmap_pt_guest_attach(struct netmap_adapter *arg, 2877 void *csb, 2878 unsigned int nifp_offset, 2879 nm_pt_guest_ptctl_t ptctl) 2880 { 2881 struct netmap_pt_guest_adapter *ptna; 2882 struct ifnet *ifp = arg ? 
arg->ifp : NULL; 2883 int error; 2884 2885 /* get allocator */ 2886 arg->nm_mem = netmap_mem_pt_guest_new(ifp, nifp_offset, ptctl); 2887 if (arg->nm_mem == NULL) 2888 return ENOMEM; 2889 arg->na_flags |= NAF_MEM_OWNER; 2890 error = _netmap_attach(arg, sizeof(struct netmap_pt_guest_adapter)); 2891 if (error) 2892 return error; 2893 2894 /* get the netmap_pt_guest_adapter */ 2895 ptna = (struct netmap_pt_guest_adapter *) NA(ifp); 2896 ptna->csb = csb; 2897 2898 /* Initialize a separate pass-through netmap adapter that is going to 2899 * be used by the ptnet driver only, and so never exposed to netmap 2900 * applications. We only need a subset of the available fields. */ 2901 memset(&ptna->dr, 0, sizeof(ptna->dr)); 2902 ptna->dr.up.ifp = ifp; 2903 ptna->dr.up.nm_mem = ptna->hwup.up.nm_mem; 2904 netmap_mem_get(ptna->dr.up.nm_mem); 2905 ptna->dr.up.nm_config = ptna->hwup.up.nm_config; 2906 2907 ptna->backend_regifs = 0; 2908 2909 return 0; 2910 } 2911 #endif /* WITH_PTNETMAP_GUEST */ 2912 2913 2914 void 2915 NM_DBG(netmap_adapter_get)(struct netmap_adapter *na) 2916 { 2917 if (!na) { 2918 return; 2919 } 2920 2921 refcount_acquire(&na->na_refcount); 2922 } 2923 2924 2925 /* returns 1 iff the netmap_adapter is destroyed */ 2926 int 2927 NM_DBG(netmap_adapter_put)(struct netmap_adapter *na) 2928 { 2929 if (!na) 2930 return 1; 2931 2932 if (!refcount_release(&na->na_refcount)) 2933 return 0; 2934 2935 if (na->nm_dtor) 2936 na->nm_dtor(na); 2937 2938 netmap_detach_common(na); 2939 2940 return 1; 2941 } 2942 2943 /* nm_krings_create callback for all hardware native adapters */ 2944 int 2945 netmap_hw_krings_create(struct netmap_adapter *na) 2946 { 2947 int ret = netmap_krings_create(na, 0); 2948 if (ret == 0) { 2949 /* initialize the mbq for the sw rx ring */ 2950 mbq_safe_init(&na->rx_rings[na->num_rx_rings].rx_queue); 2951 ND("initialized sw rx queue %d", na->num_rx_rings); 2952 } 2953 return ret; 2954 } 2955 2956 2957 2958 /* 2959 * Called on module unload by the netmap-enabled drivers 2960 */ 2961 void 2962 netmap_detach(struct ifnet *ifp) 2963 { 2964 struct netmap_adapter *na = NA(ifp); 2965 2966 if (!na) 2967 return; 2968 2969 NMG_LOCK(); 2970 netmap_set_all_rings(na, NM_KR_LOCKED); 2971 na->na_flags |= NAF_ZOMBIE; 2972 /* 2973 * if the netmap adapter is not native, somebody 2974 * changed it, so we can not release it here. 2975 * The NAF_ZOMBIE flag will notify the new owner that 2976 * the driver is gone. 2977 */ 2978 if (na->na_flags & NAF_NATIVE) { 2979 netmap_adapter_put(na); 2980 } 2981 /* give active users a chance to notice that NAF_ZOMBIE has been 2982 * turned on, so that they can stop and return an error to userspace. 2983 * Note that this becomes a NOP if there are no active users and, 2984 * therefore, the put() above has deleted the na, since now NA(ifp) is 2985 * NULL. 2986 */ 2987 netmap_enable_all_rings(ifp); 2988 NMG_UNLOCK(); 2989 } 2990 2991 2992 /* 2993 * Intercept packets from the network stack and pass them 2994 * to netmap as incoming packets on the 'software' ring. 2995 * 2996 * We only store packets in a bounded mbq and then copy them 2997 * in the relevant rxsync routine. 2998 * 2999 * We rely on the OS to make sure that the ifp and na do not go 3000 * away (typically the caller checks for IFF_DRV_RUNNING or the like). 3001 * In nm_register() or whenever there is a reinitialization, 3002 * we make sure to make the mode change visible here. 
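 *
 * Summary of the overflow policy applied below (restating the code,
 * not adding new rules): with space = nr_hwtail - nr_hwcur (modulo
 * nkr_num_slots) counting the slots already pending for userspace,
 * the mbuf is dropped once space + mbq_len(q) would reach
 * nkr_num_slots - 1; otherwise it is queued on kring->rx_queue and
 * copied out by the next rxsync on the host ring.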
3003 */ 3004 int 3005 netmap_transmit(struct ifnet *ifp, struct mbuf *m) 3006 { 3007 struct netmap_adapter *na = NA(ifp); 3008 struct netmap_kring *kring, *tx_kring; 3009 u_int len = MBUF_LEN(m); 3010 u_int error = ENOBUFS; 3011 unsigned int txr; 3012 struct mbq *q; 3013 int space; 3014 3015 kring = &na->rx_rings[na->num_rx_rings]; 3016 // XXX [Linux] we do not need this lock 3017 // if we follow the down/configure/up protocol -gl 3018 // mtx_lock(&na->core_lock); 3019 3020 if (!nm_netmap_on(na)) { 3021 D("%s not in netmap mode anymore", na->name); 3022 error = ENXIO; 3023 goto done; 3024 } 3025 3026 txr = MBUF_TXQ(m); 3027 if (txr >= na->num_tx_rings) { 3028 txr %= na->num_tx_rings; 3029 } 3030 tx_kring = &NMR(na, NR_TX)[txr]; 3031 3032 if (tx_kring->nr_mode == NKR_NETMAP_OFF) { 3033 return MBUF_TRANSMIT(na, ifp, m); 3034 } 3035 3036 q = &kring->rx_queue; 3037 3038 // XXX reconsider long packets if we handle fragments 3039 if (len > NETMAP_BUF_SIZE(na)) { /* too long for us */ 3040 D("%s from_host, drop packet size %d > %d", na->name, 3041 len, NETMAP_BUF_SIZE(na)); 3042 goto done; 3043 } 3044 3045 if (nm_os_mbuf_has_offld(m)) { 3046 RD(1, "%s drop mbuf requiring offloadings", na->name); 3047 goto done; 3048 } 3049 3050 /* protect against rxsync_from_host(), netmap_sw_to_nic() 3051 * and maybe other instances of netmap_transmit (the latter 3052 * not possible on Linux). 3053 * Also avoid overflowing the queue. 3054 */ 3055 mbq_lock(q); 3056 3057 space = kring->nr_hwtail - kring->nr_hwcur; 3058 if (space < 0) 3059 space += kring->nkr_num_slots; 3060 if (space + mbq_len(q) >= kring->nkr_num_slots - 1) { // XXX 3061 RD(10, "%s full hwcur %d hwtail %d qlen %d len %d m %p", 3062 na->name, kring->nr_hwcur, kring->nr_hwtail, mbq_len(q), 3063 len, m); 3064 } else { 3065 mbq_enqueue(q, m); 3066 ND(10, "%s %d bufs in queue len %d m %p", 3067 na->name, mbq_len(q), len, m); 3068 /* notify outside the lock */ 3069 m = NULL; 3070 error = 0; 3071 } 3072 mbq_unlock(q); 3073 3074 done: 3075 if (m) 3076 m_freem(m); 3077 /* unconditionally wake up listeners */ 3078 kring->nm_notify(kring, 0); 3079 /* this is normally netmap_notify(), but for nics 3080 * connected to a bridge it is netmap_bwrap_intr_notify(), 3081 * that possibly forwards the frames through the switch 3082 */ 3083 3084 return (error); 3085 } 3086 3087 3088 /* 3089 * netmap_reset() is called by the driver routines when reinitializing 3090 * a ring. The driver is in charge of locking to protect the kring. 3091 * If native netmap mode is not set just return NULL. 3092 * If native netmap mode is set, in particular, we have to set nr_mode to 3093 * NKR_NETMAP_ON. 3094 */ 3095 struct netmap_slot * 3096 netmap_reset(struct netmap_adapter *na, enum txrx tx, u_int n, 3097 u_int new_cur) 3098 { 3099 struct netmap_kring *kring; 3100 int new_hwofs, lim; 3101 3102 if (!nm_native_on(na)) { 3103 ND("interface not in native netmap mode"); 3104 return NULL; /* nothing to reinitialize */ 3105 } 3106 3107 /* XXX note- in the new scheme, we are not guaranteed to be 3108 * under lock (e.g. when called on a device reset). 3109 * In this case, we should set a flag and do not trust too 3110 * much the values. In practice: TODO 3111 * - set a RESET flag somewhere in the kring 3112 * - do the processing in a conservative way 3113 * - let the *sync() fixup at the end. 
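 *
 * What follows recomputes nkr_hwofs, the offset between kring slot
 * indices and hardware descriptor indices, so that after the ring
 * restarts from new_cur the existing netmap slots keep referring to
 * the same positions (drivers translate between the two index spaces
 * with netmap_idx_k2n()/netmap_idx_n2k()).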
3114 */ 3115 if (tx == NR_TX) { 3116 if (n >= na->num_tx_rings) 3117 return NULL; 3118 3119 kring = na->tx_rings + n; 3120 3121 if (kring->nr_pending_mode == NKR_NETMAP_OFF) { 3122 kring->nr_mode = NKR_NETMAP_OFF; 3123 return NULL; 3124 } 3125 3126 // XXX check whether we should use hwcur or rcur 3127 new_hwofs = kring->nr_hwcur - new_cur; 3128 } else { 3129 if (n >= na->num_rx_rings) 3130 return NULL; 3131 kring = na->rx_rings + n; 3132 3133 if (kring->nr_pending_mode == NKR_NETMAP_OFF) { 3134 kring->nr_mode = NKR_NETMAP_OFF; 3135 return NULL; 3136 } 3137 3138 new_hwofs = kring->nr_hwtail - new_cur; 3139 } 3140 lim = kring->nkr_num_slots - 1; 3141 if (new_hwofs > lim) 3142 new_hwofs -= lim + 1; 3143 3144 /* Always set the new offset value and realign the ring. */ 3145 if (netmap_verbose) 3146 D("%s %s%d hwofs %d -> %d, hwtail %d -> %d", 3147 na->name, 3148 tx == NR_TX ? "TX" : "RX", n, 3149 kring->nkr_hwofs, new_hwofs, 3150 kring->nr_hwtail, 3151 tx == NR_TX ? lim : kring->nr_hwtail); 3152 kring->nkr_hwofs = new_hwofs; 3153 if (tx == NR_TX) { 3154 kring->nr_hwtail = kring->nr_hwcur + lim; 3155 if (kring->nr_hwtail > lim) 3156 kring->nr_hwtail -= lim + 1; 3157 } 3158 3159 #if 0 // def linux 3160 /* XXX check that the mappings are correct */ 3161 /* need ring_nr, adapter->pdev, direction */ 3162 buffer_info->dma = dma_map_single(&pdev->dev, addr, adapter->rx_buffer_len, DMA_FROM_DEVICE); 3163 if (dma_mapping_error(&adapter->pdev->dev, buffer_info->dma)) { 3164 D("error mapping rx netmap buffer %d", i); 3165 // XXX fix error handling 3166 } 3167 3168 #endif /* linux */ 3169 /* 3170 * Wakeup on the individual and global selwait 3171 * We do the wakeup here, but the ring is not yet reconfigured. 3172 * However, we are under lock so there are no races. 3173 */ 3174 kring->nr_mode = NKR_NETMAP_ON; 3175 kring->nm_notify(kring, 0); 3176 return kring->ring->slot; 3177 } 3178 3179 3180 /* 3181 * Dispatch rx/tx interrupts to the netmap rings. 3182 * 3183 * "work_done" is non-null on the RX path, NULL for the TX path. 3184 * We rely on the OS to make sure that there is only one active 3185 * instance per queue, and that there is appropriate locking. 3186 * 3187 * The 'notify' routine depends on what the ring is attached to. 3188 * - for a netmap file descriptor, do a selwakeup on the individual 3189 * waitqueue, plus one on the global one if needed 3190 * (see netmap_notify) 3191 * - for a nic connected to a switch, call the proper forwarding routine 3192 * (see netmap_bwrap_intr_notify) 3193 */ 3194 int 3195 netmap_common_irq(struct netmap_adapter *na, u_int q, u_int *work_done) 3196 { 3197 struct netmap_kring *kring; 3198 enum txrx t = (work_done ? NR_RX : NR_TX); 3199 3200 q &= NETMAP_RING_MASK; 3201 3202 if (netmap_verbose) { 3203 RD(5, "received %s queue %d", work_done ? "RX" : "TX" , q); 3204 } 3205 3206 if (q >= nma_get_nrings(na, t)) 3207 return NM_IRQ_PASS; // not a physical queue 3208 3209 kring = NMR(na, t) + q; 3210 3211 if (kring->nr_mode == NKR_NETMAP_OFF) { 3212 return NM_IRQ_PASS; 3213 } 3214 3215 if (t == NR_RX) { 3216 kring->nr_kflags |= NKR_PENDINTR; // XXX atomic ? 3217 *work_done = 1; /* do not fire napi again */ 3218 } 3219 3220 return kring->nm_notify(kring, 0); 3221 } 3222 3223 3224 /* 3225 * Default functions to handle rx/tx interrupts from a physical device. 3226 * "work_done" is non-null on the RX path, NULL for the TX path. 3227 * 3228 * If the card is not in netmap mode, simply return NM_IRQ_PASS, 3229 * so that the caller proceeds with regular processing. 
3230 * Otherwise call netmap_common_irq(). 3231 * 3232 * If the card is connected to a netmap file descriptor, 3233 * do a selwakeup on the individual queue, plus one on the global one 3234 * if needed (multiqueue card _and_ there are multiqueue listeners), 3235 * and return NR_IRQ_COMPLETED. 3236 * 3237 * Finally, if called on rx from an interface connected to a switch, 3238 * calls the proper forwarding routine. 3239 */ 3240 int 3241 netmap_rx_irq(struct ifnet *ifp, u_int q, u_int *work_done) 3242 { 3243 struct netmap_adapter *na = NA(ifp); 3244 3245 /* 3246 * XXX emulated netmap mode sets NAF_SKIP_INTR so 3247 * we still use the regular driver even though the previous 3248 * check fails. It is unclear whether we should use 3249 * nm_native_on() here. 3250 */ 3251 if (!nm_netmap_on(na)) 3252 return NM_IRQ_PASS; 3253 3254 if (na->na_flags & NAF_SKIP_INTR) { 3255 ND("use regular interrupt"); 3256 return NM_IRQ_PASS; 3257 } 3258 3259 return netmap_common_irq(na, q, work_done); 3260 } 3261 3262 3263 /* 3264 * Module loader and unloader 3265 * 3266 * netmap_init() creates the /dev/netmap device and initializes 3267 * all global variables. Returns 0 on success, errno on failure 3268 * (but there is no chance) 3269 * 3270 * netmap_fini() destroys everything. 3271 */ 3272 3273 static struct cdev *netmap_dev; /* /dev/netmap character device. */ 3274 extern struct cdevsw netmap_cdevsw; 3275 3276 3277 void 3278 netmap_fini(void) 3279 { 3280 if (netmap_dev) 3281 destroy_dev(netmap_dev); 3282 /* we assume that there are no longer netmap users */ 3283 nm_os_ifnet_fini(); 3284 netmap_uninit_bridges(); 3285 netmap_mem_fini(); 3286 NMG_LOCK_DESTROY(); 3287 printf("netmap: unloaded module.\n"); 3288 } 3289 3290 3291 int 3292 netmap_init(void) 3293 { 3294 int error; 3295 3296 NMG_LOCK_INIT(); 3297 3298 error = netmap_mem_init(); 3299 if (error != 0) 3300 goto fail; 3301 /* 3302 * MAKEDEV_ETERNAL_KLD avoids an expensive check on syscalls 3303 * when the module is compiled in. 3304 * XXX could use make_dev_credv() to get error number 3305 */ 3306 netmap_dev = make_dev_credf(MAKEDEV_ETERNAL_KLD, 3307 &netmap_cdevsw, 0, NULL, UID_ROOT, GID_WHEEL, 0600, 3308 "netmap"); 3309 if (!netmap_dev) 3310 goto fail; 3311 3312 error = netmap_init_bridges(); 3313 if (error) 3314 goto fail; 3315 3316 #ifdef __FreeBSD__ 3317 nm_os_vi_init_index(); 3318 #endif 3319 3320 error = nm_os_ifnet_init(); 3321 if (error) 3322 goto fail; 3323 3324 printf("netmap: loaded module\n"); 3325 return (0); 3326 fail: 3327 netmap_fini(); 3328 return (EINVAL); /* may be incorrect */ 3329 } 3330