napi.rst - OpenGrok cross reference for /linux/Documentation/networking/napi.rst

1 .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
14 The host then schedules a NAPI instance to process the events.
19 but there is an option to use :ref:`separate kernel threads<threaded>`
23 of event (packet Rx and Tx) processing.
30 of the NAPI instance while the method is the driver-specific event
31 handler. The method will typically free Tx packets that have been
37 -----------
40 from the system. The instances are attached to the netdevice passed
46 to not be invoked. napi_disable() waits for ownership of the NAPI
47 instance to be released.
50 concurrent use of datapath APIs but an incorrect sequence of control API
55 ------------
59 (see :ref:`drv_sched` for more info). A successful call to napi_schedule()
63 called to process the events/packets. The method takes a ``budget``
64 argument - drivers can process completions for any number of Tx
65 packets but should only process up to ``budget`` number of
71 skb Tx processing should happen regardless of the ``budget``, but if
76    The ``budget`` argument may be 0 if core tries to only process
77    skb Tx completions and no Rx or XDP packets.
80 has outstanding work to do (e.g. ``budget`` was exhausted)
83 need to be scheduled).
93    must be handled carefully. There is no way to report this
94    (rare) condition to the stack, so the driver must either
95    not call napi_complete_done() and wait to be called again,
96    or return ``budget - 1``.
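
A minimal sketch of the return-value rules described above, written as plain
user-space C so it can be compiled on its own. ``mydrv_poll_sketch()`` and
``fake_napi_complete_done()`` are hypothetical names, not kernel APIs: the stub
stands in for napi_complete_done() and is assumed to always succeed, and the
Rx ring is reduced to a simple packet count.

.. code-block:: c

  #include <stdbool.h>

  /* Hypothetical stand-in for napi_complete_done(); assumed to always succeed. */
  bool fake_napi_complete_done(int work_done)
  {
      (void)work_done;
      return true;
  }

  /*
   * Hypothetical poll method. rx_pending is the number of Rx packets
   * waiting in the simulated ring; the return value is what the driver
   * would report back to the core.
   */
  int mydrv_poll_sketch(int rx_pending, int budget)
  {
      /* Process at most budget Rx packets (none when budget is 0). */
      int work_done = rx_pending < budget ? rx_pending : budget;

      /* Tx completions would be handled here regardless of budget. */

      /* Work is still outstanding: return budget to be polled again. */
      if (rx_pending > budget)
          return budget;

      /* All work done: never report exactly budget after completing. */
      if (budget && fake_napi_complete_done(work_done))
          return work_done < budget - 1 ? work_done : budget - 1;

      return work_done;
  }

With a budget of 64, a ring holding exactly 64 packets makes this sketch return
63, which is the ``budget - 1`` case described above.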
101 -------------
109 As mentioned in the :ref:`drv_ctrl` section - napi_disable() and subsequent
110 calls to the poll method only wait for the ownership of the instance
111 to be released, not for the poll method to exit. This means that
118 --------------------------
121 the NAPI instance - until NAPI polling finishes any further
124 Drivers which have to mask the interrupts explicitly (as opposed
125 to IRQ being auto-masked by the device) should use the napi_schedule_prep()
128 .. code-block:: c
130   if (napi_schedule_prep(&v->napi)) {
131       mydrv_mask_rxtx_irq(v->idx);
132       /* schedule after masking to avoid races */
133       __napi_schedule(&v->napi);
136 IRQ should only be unmasked after a successful call to napi_complete_done():
138 .. code-block:: c
140   if (budget && napi_complete_done(&v->napi, work_done)) {
141     mydrv_unmask_rxtx_irq(v->idx);
142     return min(work_done, budget - 1);
146 of guarantees given by being invoked in IRQ context (no need to
147 mask interrupts). napi_schedule_irqoff() will fall back to napi_schedule() if
150 Instance to queue mapping
151 -------------------------
155 mapped to queues and interrupts. NAPI is primarily a polling/processing
156 abstraction without specific user-facing semantics. That said, most networking
159 NAPI instances most often correspond 1:1:1 to interrupts and queue pairs
160 (a queue pair is a set of a single Rx and a single Tx queue).
162 In less common cases a NAPI instance may be used for multiple queues
163 or Rx and Tx queues can be serviced by separate NAPI instances on a single
168 each channel can be either ``rx``, ``tx`` or ``combined``. It's not clear
169 what constitutes a channel; the recommended interpretation is to understand
170 a channel as an IRQ/NAPI which services queues of a given type. For example,
171 a configuration of 1 ``rx``, 1 ``tx`` and 1 ``combined`` channel is expected
172 to utilize 3 interrupts, 2 Rx and 2 Tx queues.
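
The arithmetic behind that example can be sketched as follows, assuming the
recommended interpretation (one IRQ/NAPI per channel, with a ``combined``
channel servicing one Rx and one Tx queue). The struct and function names are
hypothetical and exist only for this illustration.

.. code-block:: c

  /* Hypothetical description of a channel configuration. */
  struct channel_cfg {
      unsigned int rx;
      unsigned int tx;
      unsigned int combined;
  };

  /* Resources the configuration is expected to use. */
  struct channel_usage {
      unsigned int irqs;
      unsigned int rx_queues;
      unsigned int tx_queues;
  };

  struct channel_usage channel_usage_of(struct channel_cfg c)
  {
      struct channel_usage u;

      /* Every channel, whatever its type, is one IRQ/NAPI. */
      u.irqs = c.rx + c.tx + c.combined;
      /* A combined channel contributes one queue of each type. */
      u.rx_queues = c.rx + c.combined;
      u.tx_queues = c.tx + c.combined;
      return u;
  }

For 1 ``rx``, 1 ``tx`` and 1 ``combined`` channel this gives 3 interrupts,
2 Rx queues and 2 Tx queues, matching the figures above.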
175 ----------------------
177 Drivers often allocate and free NAPI instances dynamically. This leads to loss
178 of NAPI-related user configuration each time NAPI instances are reallocated.
184 be beneficial to userspace programs using ``SO_INCOMING_NAPI_ID``. See the
187 Drivers should try to use netif_napi_add_config() whenever possible.
193 are only visible to the user through the ``SO_INCOMING_NAPI_ID`` socket option.
199 For example, using the script to dump all of the queues for a device (which
202 .. code-block:: bash
204    $ kernel-source/tools/net/ynl/pyynl/cli.py \
205              --spec Documentation/netlink/specs/netdev.yaml \
206              --dump queue-get \
207              --json='{"ifindex": 2}'
213 -----------------------
216 In most scenarios batching happens due to IRQ coalescing which is done
219 NAPI can be configured to arm a repoll timer instead of unmasking
222 is reused to control the delay of the timer, while
224 before NAPI gives up and goes back to using hardware IRQs.
226 The above parameters can also be set on a per-NAPI basis using netlink via
227 netdev-genl. When used with netlink and configured on a per-NAPI basis, the
228 parameters mentioned above use hyphens instead of underscores:
229 ``gro-flush-timeout`` and ``napi-defer-hard-irqs``.
231 Per-NAPI configuration can be done programmatically in a user application
237 .. code-block:: bash
239   $ kernel-source/tools/net/ynl/pyynl/cli.py \
240             --spec Documentation/netlink/specs/netdev.yaml \
241             --do napi-set \
242             --json='{"id": 345,
243                      "defer-hard-irqs": 111,
244                      "gro-flush-timeout": 11111}'
246 Similarly, the parameter ``irq-suspend-timeout`` can be set using netlink
247 via netdev-genl. There is no global sysfs parameter for this value.
249 ``irq-suspend-timeout`` is used to determine how long an application can
251 which can be set on a per-epoll context basis with ``EPIOCSPARAMS`` ioctl.
256 ------------
258 Busy polling allows a user process to check for incoming packets before
268 epoll-based busy polling
269 ------------------------
271 It is possible to trigger packet processing directly from calls to
272 ``epoll_wait``. In order to use this feature, a user application must ensure
273 all file descriptors which are added to an epoll context have the same NAPI ID.
277 distribute that file descriptor to a worker thread. The worker thread would add
278 the file descriptor to its epoll context. This would ensure each worker thread
282 be inserted to distribute incoming connections to threads such that each thread
283 is only given incoming connections with the same NAPI ID. Care must be taken to
286 In order to enable busy polling, there are two choices:
288 1. ``/proc/sys/net/core/busy_poll`` can be set with a time in microseconds to busy
289    loop waiting for events. This is a system-wide setting and will cause all
290    epoll-based applications to busy poll when they call epoll_wait. This may
291    not be desirable as many applications may not have the need to busy poll.
294    file descriptor to set (``EPIOCSPARAMS``) or get (``EPIOCGPARAMS``) ``struct
297 .. code-block:: c
304       /* pad the struct to a multiple of 64 bits */
309 ---------------
311 While busy polling is supposed to be used by low latency applications,
314 Very high request-per-second applications (especially routing/forwarding
316 want to be interrupted until they finish processing a request or a batch
319 Such applications can pledge to the kernel that they will perform a busy
322 socket option. To avoid system misbehavior the pledge is revoked
323 if ``gro_flush_timeout`` passes without any busy poll call. For epoll-based
325 epoll_params`` can be set to 1 and the ``EPIOCSPARAMS`` ioctl can be issued to
331 with the ``SO_BUSY_POLL_BUDGET`` socket option. For epoll-based busy polling
332 applications, the ``busy_poll_budget`` field can be adjusted to the desired value
336 It is important to note that choosing a large value for ``gro_flush_timeout``
337 will defer IRQs to allow for better batch processing, but will induce latency
340 attempting to busy poll by device IRQs and softirq processing. This value
341 should be chosen carefully with these tradeoffs in mind. epoll-based busy
342 polling applications may be able to mitigate how much user processing happens
345 Users may want to consider an alternate approach, IRQ suspension, to help deal
349 --------------
354 While application calls to epoll_wait successfully retrieve events, the kernel will
360 This allows users to balance CPU consumption with network processing
363 To use this mechanism:
365   1. The per-NAPI config parameter ``irq-suspend-timeout`` should be set to the
368      serves as a safety mechanism to restart driver interrupt processing if
370      the amount of time the user application needs to process data from its
371      call to epoll_wait, noting that applications can control how much data
374   2. The sysfs parameter or per-NAPI config parameters ``gro_flush_timeout``
375      and ``napi_defer_hard_irqs`` can be set to low values. They will be used
376      to defer IRQs after busy poll has found no data.
378   3. The ``prefer_busy_poll`` flag must be set to true. This can be done using
381   4. The application uses epoll as described above to trigger NAPI packet
384 As mentioned above, as long as subsequent calls to epoll_wait return events to
385 userland, the ``irq-suspend-timeout`` is deferred and IRQs are disabled. This
386 allows the application to process data without interference.
388 Once a call to epoll_wait results in no events being found, IRQ suspension is
392 It is expected that ``irq-suspend-timeout`` will be set to a value much larger
393 than ``gro_flush_timeout`` as ``irq-suspend-timeout`` should suspend IRQs for
396 While it is not strictly necessary to use ``napi_defer_hard_irqs`` and
397 ``gro_flush_timeout`` to use IRQ suspension, their use is strongly
400 IRQ suspension causes the system to alternate between polling mode and
401 IRQ-driven packet delivery. During busy periods, ``irq-suspend-timeout``
409 1) hardirq -> softirq -> napi poll; basic interrupt delivery
410 2) timer -> softirq -> napi poll; deferred IRQ processing
411 3) epoll -> busy-poll -> napi poll; busy looping
419 During busy periods, ``irq-suspend-timeout`` is used as the timer in Loop 2,
426 the recommended usage, because otherwise setting ``irq-suspend-timeout``
432 -------------
437 (called ``napi/${ifc-name}-${napi-id}``).
439 It is recommended to pin each kernel thread to a single CPU, the same
445 Threaded NAPI is controlled by writing 0/1 to the ``threaded`` file in
451 .. code-block:: bash
453   $ ynl --family netdev --do napi-set --json='{"id": 66, "threaded": 1}'
457 .. [#] NAPI was originally referred to as New API in Linux 2.4.