#
2ccf971a |
| 19-Feb-2021 |
John Baldwin <jhb@FreeBSD.org> |
iflib: Cast the result of iflib_netmap_txq_init() to void.
This fixes a warning from GCC for kernels without netmap since the return value is never used.
Reviewed by: vmaffione, erj Differential Revision: https://reviews.freebsd.org/D28598
|
#
922cf8ac |
| 14-Feb-2021 |
Allan Jude <allanjude@FreeBSD.org> |
Use iflib_if_init_locked() during media change instead of iflib_init_locked().
iflib_init_locked() assumes that iflib_stop() has already been called; however, iflib_stop() is not called for media changes. iflib_if_init_locked() calls stop and then init, which fixes the problem.
PR: 253473 MFC after: 3 days Reviewed by: markj Sponsored by: Juniper Networks, Inc., Klara, Inc. Differential Revision: https://reviews.freebsd.org/D28667
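The precondition described in this commit can be sketched with a toy model. Everything below (struct toy_ctx and the toy_* functions) is a hypothetical stand-in, not the real iflib code; it only illustrates why an init routine that assumes a stopped device is unsafe to call directly, while a stop-then-init wrapper is safe.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver context; not the real iflib struct. */
struct toy_ctx {
	bool running;      /* true between init and stop */
	int  init_errors;  /* inits issued while the device was still running */
};

static void
toy_stop(struct toy_ctx *ctx)
{
	ctx->running = false;
}

/* Mirrors the assumption stated for iflib_init_locked(): stop already ran. */
static void
toy_init_locked(struct toy_ctx *ctx)
{
	if (ctx->running)
		ctx->init_errors++;	/* precondition violated */
	ctx->running = true;
}

/* Mirrors the behaviour stated for iflib_if_init_locked(): stop, then init. */
static void
toy_if_init_locked(struct toy_ctx *ctx)
{
	toy_stop(ctx);
	toy_init_locked(ctx);
}
```

On a running device, calling toy_init_locked() directly trips the precondition, whereas toy_if_init_locked() does not.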
|
#
38bfc6de |
| 01-Feb-2021 |
Sai Rajesh Tallamraju <stallamr@netapp.com> |
iflib: Free resources in a consistent order during detach
Memory and PCI resources are freed in no particular order. This could cause use-after-frees when detaching following a failed attach. For instance, iflib_tx_structures_free() frees ctx->ifc_txqs[] but iflib_tqg_detach() attempts to access this array. Similarly, adapter queues get freed by IFDI_QUEUES_FREE() but IFDI_DETACH() attempts to access adapter queues to free PCI resources.
MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D27634
|
#
3f43ada9 |
| 28-Jan-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Catch up with 6edfd179c86: mechanically rename IFCAP_NOMAP to IFCAP_MEXTPG.
Originally, IFCAP_NOMAP meant that the mbuf has an external storage pointer that points to an unmapped address. Then, this was extended to an array of such pointers. Then, such mbufs were augmented with a header/trailer. Basically, extended mbufs are extended, and the set of features is subject to change. The new name should be generic enough to avoid further renaming.
|
#
f80efe50 |
| 24-Jan-2021 |
Vincenzo Maffione <vmaffione@FreeBSD.org> |
iflib: netmap: move per-packet operation out of fragments loop
MFC after: 1 week
|
#
aceaccab |
| 24-Jan-2021 |
Vincenzo Maffione <vmaffione@FreeBSD.org> |
iflib: netmap: add support for NS_MOREFRAG
The NS_MOREFRAG flag can be set in a netmap slot to represent a multi-fragment packet. Only the last fragment of a packet does not have the flag set. On TX rings, the flag may be set by the userspace application. The kernel will look at the flag and use it to properly set up the NIC TX descriptors. On RX rings, the kernel may set the flag if the packet received was split across multiple netmap buffers. The userspace application should look at the flag to know when the packet is complete.
Submitted by: rajesh1.kumar_amd.com Reviewed by: vmaffione MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27799
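As an illustration of the rule that only the last fragment lacks the flag, here is a minimal sketch of how a consumer might reassemble one packet from consecutive slots. The struct toy_slot type and the TOY_MOREFRAG constant are stand-ins chosen for this example; real code would use struct netmap_slot and NS_MOREFRAG from the netmap headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for illustration only (assumptions, not the netmap definitions). */
#define TOY_MOREFRAG	0x0020

struct toy_slot {
	uint16_t len;	/* bytes stored in this buffer */
	uint16_t flags;	/* TOY_MOREFRAG set on all but the last fragment */
};

/*
 * Walk slots[0..n) and return the total length of the first packet.
 * *nslots receives the number of slots the packet spans. If the final
 * fragment has not arrived yet, return 0 and set *nslots to 0.
 */
static size_t
toy_packet_len(const struct toy_slot *slots, size_t n, size_t *nslots)
{
	size_t total = 0;

	for (size_t i = 0; i < n; i++) {
		total += slots[i].len;
		if ((slots[i].flags & TOY_MOREFRAG) == 0) {
			*nslots = i + 1;	/* last fragment found */
			return (total);
		}
	}
	*nslots = 0;	/* packet still incomplete */
	return (0);
}
```

The same walk applies on the TX side in reverse: an application sets the flag on every slot of a packet except the final one.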
|
#
0c864213 |
| 21-Jan-2021 |
Andrew Gallatin <gallatin@FreeBSD.org> |
iflib: Fix a NULL pointer deref
With the current code, rxd_frag_to_sd() can be called with its pf_rv parameter set to NULL. This patch fixes the NULL pointer dereference in that case, thus avoiding a possible panic.
Submitted by: rajesh1.kumar at amd.com Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D28115
|
#
55f0ad5f |
| 10-Jan-2021 |
Vincenzo Maffione <vmaffione@FreeBSD.org> |
netmap: restore hwofs and support it in iflib
Restore the hwofs functionality temporarily disabled by 7ba6ecf216fb15e8b147db2 to prevent issues with iflib. This patch brings the necessary changes to iflib to enable hwofs, which allows interface restarts without disrupting netmap applications actively using its rings. After this change, it becomes possible for multiple non-cooperating netmap applications to use non-overlapping subsets of the available netmap rings without clashing with each other.
PR: 252453 MFC after: 1 week
|
#
8aa8484c |
| 10-Jan-2021 |
Vincenzo Maffione <vmaffione@FreeBSD.org> |
iflib: fix build failure in case DEV_NETMAP is not defined
This addresses the build failure introduced by 3d65fd97e85ab807f3baa62.
MFC with: 3d65fd97e85ab807f3baa62
|
#
4ba9ad0d |
| 10-Jan-2021 |
Vincenzo Maffione <vmaffione@FreeBSD.org> |
iflib: add assert to prevent out-of-bounds array access
iflib_queues_alloc() allocates isc_nrxqs iflib_dma_info structs for each rxqset, and links each struct to a different free list. As a result, isc_nrxqs >= isc_nfl must hold (plus the completion queue, if present). Add an assertion to make this constraint explicit.
MFC after: 2 weeks
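The constraint can be written as a small predicate. This is a sketch under one reading of the commit text, namely that a completion queue, when present, consumes one extra entry on top of the free lists. The function and parameter names are hypothetical and the actual assertion in iflib may be phrased differently.

```c
#include <stdbool.h>

/*
 * Hypothetical check: each free list needs its own DMA info struct, and
 * (by assumption) a completion queue needs one more.
 */
static bool
toy_rxq_layout_ok(int nrxqs, int nfl, bool has_cq)
{
	return (nrxqs >= nfl + (has_cq ? 1 : 0));
}
```

A driver-like caller would assert toy_rxq_layout_ok() before indexing the per-rxqset array by free list number.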
|
#
3d65fd97 |
| 10-Jan-2021 |
Vincenzo Maffione <vmaffione@FreeBSD.org> |
netmap: iflib: enable/disable krings on any interface reinit
Since 1d238b07d5d4d9660ae0e0, krings are disabled before a reinit cycle triggered by iflib_netmap_register. However, this operation is also necessary for any interface reinit triggered by other causes (e.g., ifconfig commands). We achieve this goal by moving the krings enable/disable operation inside iflib_stop() and iflib_init_locked().
While here, this change also removes some redundant operations from iflib_netmap_register() that are already performed by iflib_stop().
PR: 252453 MFC after: 1 week
|
#
3189ba61 |
| 09-Jan-2021 |
Vincenzo Maffione <vmaffione@FreeBSD.org> |
netmap: iflib: fix asserts in netmap_fl_refill()
When netmap_fl_refill() is called at initialization time (e.g., during netmap_iflib_register()), nic_i must be 0, since the free list is reinitialized. At the end of the refill cycle, nic_i must still be zero, because exactly N descriptors (N is the ring size) are refilled. This patch therefore fixes the assertions to check on nic_i rather than on nm_i. The current netmap_reset() may in fact cause nm_i to be != 0 while the device is resetting: this may happen when multiple non-cooperating processes open different subsets of the available netmap rings.
PR: 252518 MFC after: 1 week
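The claim that nic_i returns to zero after a full refill follows from ring index arithmetic, as this small sketch shows. The helper names are made up for the example and are not the functions used in iflib.

```c
#include <stdint.h>

/* Advance a ring index by one, wrapping at lim (lim == ring_size - 1). */
static uint32_t
toy_ring_next(uint32_t i, uint32_t lim)
{
	return ((i == lim) ? 0 : i + 1);
}

/* Refill n descriptors starting at nic_i and return the final index. */
static uint32_t
toy_refill(uint32_t nic_i, uint32_t n, uint32_t ring_size)
{
	uint32_t lim = ring_size - 1;

	while (n-- > 0)
		nic_i = toy_ring_next(nic_i, lim);
	return (nic_i);
}
```

Refilling exactly ring_size descriptors from index 0 ends at index 0 again, which is the condition the fixed assertions check on nic_i.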
|
#
1d238b07 |
| 09-Jan-2021 |
Vincenzo Maffione <vmaffione@FreeBSD.org> |
netmap: iflib: stop krings during interface reset
When different processes open separate subsets of the available rings of the same netmap interface, a device reset may be performed while one of the processes is actively using some rings (e.g., caused by another process executing a nmport_open()). With this patch, such a situation will cause the active process to get a POLLERR, so that it has a chance to detect the situation. We also guarantee that no process is running a txsync or rxsync (ioctl or poll) while an iflib device reset is in progress.
PR: 252453 MFC after: 1 week
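From the application side, detecting this condition amounts to checking revents after poll(). The helper below is a minimal sketch; its name is hypothetical, and what the application does next (for example, waiting and retrying) is left open because the commit message does not prescribe it.

```c
#include <poll.h>
#include <stdbool.h>

/*
 * Return true if poll() reported an error condition on the descriptor,
 * which per the commit above may indicate a device reset in progress.
 */
static bool
toy_ring_reset_detected(const struct pollfd *pfd)
{
	return ((pfd->revents & POLLERR) != 0);
}
```

A typical caller would invoke this on each struct pollfd after poll() returns and before touching the associated rings.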
|
#
81be6552 |
| 19-Dec-2020 |
Matt Macy <mmacy@FreeBSD.org> |
iflib: ensure that tx interrupts enabled and cleanups
Doing a 'dd' over iscsi will reliably cause stalls. Tx cleaning _should_ reliably happen as data is sent. However, currently if the transmit queue fills it will wait until the iflib timer (hz/2) runs.
This change causes the tx taskq thread to be run if there are completed descriptors.
While here:
- make timer interrupt delay a sysctl
- simplify txd_db_check handling
- comment on INTR types
Background on the change:
Initially doorbell updates were minimized by only writing to the register on every fourth packet. If txq_drain would return without writing to the doorbell it scheduled a callout on the next tick to do the doorbell write to ensure that the write otherwise happened "soon". At that time a sysctl was added for users to avoid the potential added latency by simply writing to the doorbell register on every packet. This worked perfectly well for e1000 and ixgbe ... and appeared to work well on ixl. However, as it turned out there was a race in this approach that would lock up the ixl MAC. It was possible for a lower producer index to be written after a higher one. On e1000 and ixgbe this was harmless - on ixl it was fatal. My initial response was to add a lock around doorbell writes - fixing the problem but adding an unacceptable amount of lock contention.
The next iteration was to use transmit interrupts to drive delayed doorbell writes. If there were no packets in the queue all doorbell writes would be immediate; as the queue started to fill up we could delay doorbell writes further and further. At the start of drain if we've cleaned any packets we know we've moved the state machine along and we write the doorbell (an obvious missing optimization was to skip that doorbell write if db_pending is zero). This change required that tx interrupts be scheduled periodically as opposed to just when the hardware txq was full. However, that just leads to our next problem.
Initially dedicated msix vectors were used for both tx and rx. However, it was often possible to use up all available vectors before we set up all the queues we wanted. By having rx and tx share a vector for a given queue we could halve the number of vectors used by a given configuration. The problem here is that with this change only e1000 passed the necessary value to have the fast interrupt drive tx when appropriate.
Reported by: mav@ Tested by: mav@ Reviewed by: gallatin@ MFC after: 1 month Sponsored by: iXsystems Differential Revision: https://reviews.freebsd.org/D27683
|
#
c065d4e5 |
| 07-Dec-2020 |
Mark Johnston <markj@FreeBSD.org> |
iflib: Avoid leaking the freelist bitmaps upon driver detach
Submitted by: Sai Rajesh Tallamraju <stallamr@netapp.com> MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D27342
|
#
10254019 |
| 07-Dec-2020 |
Mark Johnston <markj@FreeBSD.org> |
iflib: Detach tasks upon device registration failure
In some error paths we would fail to detach from the iflib taskqueue groups. Also move the detach code into its own subroutine instead of duplicating it.
Submitted by: Sai Rajesh Tallamraju <stallamr@netapp.com> MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D27342
|
#
54bf96fb |
| 11-Nov-2020 |
Mark Johnston <markj@FreeBSD.org> |
iflib: Free full mbuf chains when draining transmit queues
Submitted by: Sai Rajesh Tallamraju <stallamr@netapp.com> Reviewed by: gallatin, hselasky MFC after: 1 week Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D27179
|
#
be7a6b3d |
| 28-Oct-2020 |
Vincenzo Maffione <vmaffione@FreeBSD.org> |
iflib: fix typo bug introduced by r367093
Code was supposed to call callout_reset_sbt_on() rather than callout_reset_sbt(). This resulted in passing a "cpu" value to a "flag" argument. A recipe for subtle errors.
PR: 248652 Reported by: sg@efficientip.com MFC with: r367093
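The class of bug described here is easy to reproduce with two functions whose trailing arguments are both plain integers. The toy_* functions below are stand-ins that model only the two parameters in question; they are not the callout(9) API. The default CPU of -1 in the shorter variant is an assumption made for the example.

```c
/* What the callee ends up seeing for the two arguments of interest. */
struct toy_call {
	int cpu;
	int flags;
};

/* Stand-in for the "_on" variant: takes an explicit CPU, then flags. */
static struct toy_call
toy_reset_on(int cpu, int flags)
{
	return ((struct toy_call){ cpu, flags });
}

/* Stand-in for the shorter variant: no CPU argument, only flags. */
static struct toy_call
toy_reset(int flags)
{
	return (toy_reset_on(-1, flags));	/* -1: assumed "no CPU" default */
}
```

Calling toy_reset(3) when toy_reset_on(3, 0) was intended compiles without complaint, yet the value 3 lands in flags and no CPU is selected.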
|
#
17cec474 |
| 27-Oct-2020 |
Vincenzo Maffione <vmaffione@FreeBSD.org> |
iflib: add per-tx-queue netmap timer
The way netmap TX is handled in iflib when TX interrupts are not used (IFC_NETMAP_TX_IRQ not set) has some issues:
- The netmap_tx_irq() function gets called by iflib_timer(), which gets scheduled with tick granularity (hz). This is not frequent enough for 10Gbps NICs and beyond (e.g., ixgbe or ixl). The end result is that the transmitting netmap application is not woken up fast enough to saturate the link with small packets.
- The iflib_timer() function also calls isc_txd_credits_update() to ask for more TX completion updates. However, this violates the netmap requirement that only txsync can access the TX queue for datapath operations. Only netmap_tx_irq() may be called out of the txsync context.
This change introduces per-tx-queue netmap timers, using microsecond granularity to ensure that netmap_tx_irq() can be called often enough to allow for maximum packet rate. The timer routine simply calls netmap_tx_irq() to wake up the netmap application. The latter will wake up and call txsync to collect TX completion updates.
This change brings back line rate speed with small packets for ixgbe. For the time being, timer expiration is hardcoded to 90 microseconds, in order to avoid introducing a new sysctl. We may eventually implement an adaptive expiration period or use another deferred work mechanism in place of timers.
Also, fix the timers usage to make sure that each queue is serviced by a different CPU.
PR: 248652 Reported by: sg@efficientip.com MFC after: 2 weeks
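A back-of-the-envelope calculation shows why tick granularity is too coarse. The sketch below assumes minimum-size 64-byte Ethernet frames with 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter and inter-frame gap), and a 1 ms tick (hz = 1000) for comparison. These figures are assumptions for the example and do not come from the commit message.

```c
#include <stdint.h>

/* Frames per second at line rate, counting 20 bytes of wire overhead. */
static uint64_t
toy_line_rate_pps(uint64_t link_bps, uint64_t frame_bytes)
{
	return (link_bps / ((frame_bytes + 20) * 8));
}

/* Frames that can be sent during one timer interval. */
static uint64_t
toy_pkts_per_interval(uint64_t pps, uint64_t interval_us)
{
	return (pps * interval_us / 1000000);
}
```

At 10 Gbps this yields roughly 14.88 million frames per second, so about 14880 frames fit in a 1 ms tick but only about 1339 in a 90 microsecond interval. Under these assumptions, a ring would need to hold many thousands of descriptors to keep the link busy between tick-based wakeups.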
|
Revision tags: release/12.2.0 |
|
#
662c1305 |
| 01-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
net: clean up empty lines in .c and .h files
|
#
35d8a463 |
| 01-Sep-2020 |
Vincenzo Maffione <vmaffione@FreeBSD.org> |
iflib: leave only 1 receive descriptor unused
The pidx argument of isc_rxd_flush() indicates which is the last valid receive descriptor to be used by the NIC. However, current code has multiple issues:
- Intel drivers write pidx to their RDT register, which means that NICs will only use the descriptors up to pidx-1 (modulo ring size N), and won't actually use the one pointed to by pidx. This does not break reception, but it is anyway confusing and suboptimal (the NIC will actually see only N-2 descriptors as available, rather than N-1). Other drivers (if_vmx, if_bnxt, if_mgb) adhere to this semantic.
- The semantic used by Intel (RDT is one descriptor past the last valid one) is used by most (if not all) NICs, and it is also used on the TX side (also in iflib). Since iflib is not currently using this semantic for RX, it must decrement fl->ifl_pidx (modulo N) before calling isc_rxd_flush(), and then the per-driver callback implementation must increment the index again (to match the real semantic). This is confusing and suboptimal.
- The iflib refill function is also called at initialization. However, in case the ring size is smaller than 128 (e.g. if_mgb), the refill function will actually prepare all the receive descriptors (N), without leaving one unused, as most NICs assume (e.g. to avoid RDT overrunning RDH). I can speculate that the code looks like this right now because this issue showed up during testing (e.g. with if_mgb), and it was easy to work around by decrementing pidx before isc_rxd_flush().
The goal of this change is to simplify the code (removing a bunch of instructions from the RX fast path), and to make the semantic of isc_rxd_flush() consistent across drivers. To achieve this, we:
- change the semantics of the pidx argument to the usual one (that is, the index one past the last valid one), so that both iflib and drivers avoid the decrement/increment dance.
- fix the initialization code to prepare at most N-1 descriptors.
Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26191
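The N-2 versus N-1 difference can be checked with modular arithmetic. The sketch assumes a NIC that treats its tail register as "one past the last usable descriptor", as the commit describes for Intel hardware. The function name and the head/tail parameters are illustrative, not driver code.

```c
#include <stdint.h>

/*
 * Number of descriptors a NIC may use when the tail register points one
 * past the last valid descriptor and the head is at rdh (ring size n).
 */
static uint32_t
toy_nic_usable(uint32_t rdh, uint32_t rdt, uint32_t n)
{
	return ((rdt + n - rdh) % n);
}
```

With n = 512, head at 0 and descriptors 0..510 prepared, writing the last valid index (510) to the tail exposes 510 descriptors (N-2), while writing the index one past it (511) exposes 511 (N-1).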
|
#
e2515283 |
| 27-Aug-2020 |
Glen Barber <gjb@FreeBSD.org> |
MFH
Sponsored by: Rubicon Communications, LLC (netgate.com)
|
#
ae750d5c |
| 25-Aug-2020 |
Vincenzo Maffione <vmaffione@FreeBSD.org> |
iflib: netmap: publish all the receive buffer
At initialization time, the netmap RX refill function used to prepare the NIC RX ring with N-1 buffers rather than N (with N equal to the number of descriptors in the NIC RX ring). This is not how netmap is supposed to work, as it would keep kring->nr_hwcur out of sync with the NIC "next index to refill" (i.e., fl->ifl_pidx). Instead we prepare N buffers, although we still publish (with isc_rxd_flush()) only the first N-1 buffers, to prevent the NIC producer pointer from overrunning the NIC consumer pointer (for NICs where this is a real issue, e.g. Intel ones).
MFC after: 2 weeks
|
#
de5b4610 |
| 24-Aug-2020 |
Vincenzo Maffione <vmaffione@FreeBSD.org> |
iflib: fix isc_rxd_flush call in netmap_fl_refill()
The semantic of the pidx argument of isc_rxd_flush() is the last valid index of the free list, rather than the next index to be published. However, netmap was still using the old convention. While there, also refactor netmap_fl_refill() to simplify it a little and add an assertion.
MFC after: 2 weeks
|
#
de6fc2e3 |
| 15-Aug-2020 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r364082 through r364250.
|