scaling.rst - OpenGrok cross reference for /linux/Documentation/networking/scaling.rst

1 .. SPDX-License-Identifier: GPL-2.0
13 multi-processor systems.
17 - RSS: Receive Side Scaling
18 - RPS: Receive Packet Steering
19 - RFS: Receive Flow Steering
20 - Accelerated Receive Flow Steering
21 - XPS: Transmit Packet Steering
28 (multi-queue). On reception, a NIC can send different packets to different
30 applying a filter to each packet that assigns it to one of a small number
33 generally known as “Receive-side Scaling” (RSS). The goal of RSS and
35 Multi-queue distribution can also be used for traffic prioritization, but
39 and/or transport layer headers-- for example, a 4-tuple hash over
40 IP addresses and TCP ports of a packet. The most common hardware
41 implementation of RSS uses a 128-entry indirection table where each entry
42 stores a queue number. The receive queue for a packet is determined
44 packet (usually a Toeplitz hash), taking this number as a key into the
52 "Symmetric-XOR" and "Symmetric-OR-XOR" are types of RSS algorithms that
57 Specifically, the "Symmetric-XOR" algorithm XORs the input
62 The "Symmetric-OR-XOR" algorithm, on the other hand, transforms the input as
69 Some advanced NICs allow steering packets to queues based on
71 can be directed to their own receive queue. Such “n-tuple” filters can
72 be configured from ethtool (--config-ntuple).
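As a rough illustration of the indirection-table lookup described above, the following Python sketch maps an already computed 32-bit hash to a receive queue. The function names, the round-robin table fill, and the sample hash value are assumptions made for illustration. Real hardware computes the hash itself (usually Toeplitz), and the table size and contents vary by device and configuration.

```python
# Illustrative sketch only: models a 128-entry RSS indirection table.
# Names and the round-robin fill are hypothetical, not a driver API.
TABLE_SIZE = 128  # the "most common" size mentioned in the text


def build_indirection_table(num_queues, size=TABLE_SIZE):
    """Spread queue numbers evenly (round-robin) across the table."""
    if num_queues < 1:
        raise ValueError("need at least one queue")
    return [i % num_queues for i in range(size)]


def select_rx_queue(flow_hash, table):
    """Use the low-order bits of the hash as a key into the table."""
    size = len(table)
    # Masking only works for a power-of-two table size (128 -> 7 bits).
    if size == 0 or size & (size - 1):
        raise ValueError("table size must be a power of two")
    index = flow_hash & (size - 1)
    return table[index]


if __name__ == "__main__":
    table = build_indirection_table(num_queues=4)
    # Sample hash value chosen arbitrarily for the demonstration.
    print(select_rx_queue(0x51CCC178, table))
```

Because only the table entries decide the final queue, rewriting the table changes the distribution of flows across queues without changing how the hash is computed.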
76 -----------------
78 The driver for a multi-queue capable NIC typically provides a kernel
90 commands (--show-rxfh-indir and --set-rxfh-indir). Modifying the
100 signaling path for PCIe devices uses message signaled interrupts (MSI-X),
103 an IRQ may be handled on any CPU. Because a non-negligible part of packet
106 affinity of each interrupt, see Documentation/core-api/irq/irq-affinity.rst. Some systems
118 NIC maximum, if lower). The most efficient high-rate configuration
124 Per-cpu load can be observed using the mpstat utility, but note that on
133 Modern NICs support creating multiple co-existing RSS configurations
134 which are selected based on explicit matching rules. This can be very
142   # ethtool -X eth0 hfunc toeplitz context new
149   # ethtool -x eth0 context 1
154   # ethtool -X eth0 equal 2 context 1
155   # ethtool -x eth0 context 1
161 To make use of the new context, direct traffic to it using an n-tuple
164   # ethtool -N eth0 flow-type tcp6 dst-port 22 context 1
169   # ethtool -N eth0 delete 1023
170   # ethtool -X eth0 context 1 delete
173 RPS: Receive Packet Steering
176 Receive Packet Steering (RPS) is logically a software implementation of
180 above the interrupt handler. This is accomplished by placing the packet
187    introduce inter-processor interrupts (IPIs))
190 a driver sends a packet up the network stack with netif_rx() or
192 selects the queue that should process a packet.
195 flow hash over the packet’s addresses or ports (2-tuple or 4-tuple hash
197 associated flow of the packet. The hash is either provided by hardware
199 the receive descriptor for the packet; this would usually be the same
201 skb->hash and can be used elsewhere in the stack as a hash of the
202 packet’s flow.
205 RPS may enqueue packets for processing. For each received packet,
207 of the list. The indexed CPU is the target for processing the packet,
208 and the packet is queued to the tail of that CPU’s backlog queue. At
216 -----------------
223   /sys/class/net/<dev>/queues/rx-<n>/rps_cpus
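The rps_cpus file takes a hexadecimal CPU bitmap. The Python sketch below builds such a string from a list of CPU numbers. The helper name is hypothetical, and the comma-separated 32-bit grouping for CPU numbers of 32 and above follows the usual kernel cpumask text format.

```python
# Illustrative helper (hypothetical name): build a hex CPU bitmap string.
def cpus_to_mask(cpus):
    """Return the hex bitmap string for an iterable of CPU numbers."""
    bits = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU numbers must be non-negative")
        bits |= 1 << cpu

    # Split into 32-bit groups, collected least significant first.
    groups = []
    while True:
        groups.append(bits & 0xFFFFFFFF)
        bits >>= 32
        if bits == 0:
            break
    groups.reverse()  # most significant group goes first in the string

    head = format(groups[0], "x")
    tail = [format(group, "08x") for group in groups[1:]]
    return ",".join([head] + tail)


if __name__ == "__main__":
    print(cpus_to_mask([0, 1, 2, 3]))
```

The resulting string would then be written to the rps_cpus file of the chosen receive queue, which normally requires root privileges.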
227 CPU. Documentation/core-api/irq/irq-affinity.rst explains how CPUs are assigned to
240 For a multi-queue system, if RSS is configured so that a hardware
248 --------------
251 reordering. The trade-off to sending all packets from the same flow
252 to the same CPU is CPU load imbalance if flows vary in packet rate.
261 destination CPU approaches saturation.  Once a CPU's input packet
263 net.core.netdev_max_backlog), the kernel starts a per-flow packet
265 default, half) of these packets when a new packet arrives, then the
266 new packet is dropped. Packets from other flows are still only
267 dropped once the input packet queue reaches netdev_max_backlog.
268 No packets are dropped when the input packet queue length is below
284 Per-flow rate is calculated by hashing each packet into a hashtable
285 bucket and incrementing a per-bucket counter. The hash function is
287 be much larger than the number of CPUs, flow limit has finer-grained
305 The feature depends on the input packet queue length exceeding
314 While RPS steers packets solely based on hash, and thus generally
319 consuming the packet is running. RFS relies on the same RPS mechanisms
352 CPU's backlog when a packet in this flow was last enqueued. Each backlog
361 CPU for packet processing (from get_rps_cpu()) the rps_sock_flow table
362 and the rps_dev_flow table of the queue that the packet was received on
365 table), the packet is enqueued onto that CPU’s backlog. If they differ,
369   - The current CPU's queue head counter >= the recorded tail counter
371   - The current CPU is unset (>= nr_cpu_ids)
372   - The current CPU is offline
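The three conditions above can be sketched as a small decision function. This is a simplified Python model rather than the kernel's get_rps_cpu() code: the function name and parameters are hypothetical, the counters are plain integers, and counter wraparound is ignored.

```python
# Simplified model of the RFS "may we switch CPUs?" check.
# All names are hypothetical; counter wraparound is not handled.
def choose_rfs_cpu(current_cpu, desired_cpu, head_counter, recorded_tail,
                   nr_cpu_ids, online_cpus):
    """Return the CPU whose backlog should receive the packet.

    current_cpu   -- CPU recorded for the flow in the per-queue flow table
    desired_cpu   -- CPU recorded for the flow in the socket flow table
    head_counter  -- queue head counter of current_cpu's backlog
    recorded_tail -- tail counter saved when the flow was last enqueued
    """
    if current_cpu == desired_cpu:
        return current_cpu

    may_switch = (
        current_cpu >= nr_cpu_ids            # current CPU is unset
        or current_cpu not in online_cpus    # current CPU is offline
        or head_counter >= recorded_tail     # earlier packets were dequeued
    )
    return desired_cpu if may_switch else current_cpu


if __name__ == "__main__":
    online = {0, 1, 2, 3}
    # Packets of the flow are still queued on CPU 1, so the flow stays put.
    print(choose_rfs_cpu(1, 2, head_counter=10, recorded_tail=12,
                         nr_cpu_ids=4, online_cpus=online))
```

Switching only when no earlier packets of the flow remain queued on the old CPU is what keeps packets within a flow in order.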
374 After this check, the packet is sent to the (possibly updated) current
382 -----------------
390 The number of entries in the per-queue flow table is set through::
392   /sys/class/net/<dev>/queues/rx-<n>/rps_flow_cnt
407 For a multi-queue device, the rps_flow_cnt for each queue might be
417 Accelerated RFS is to RFS what RSS is to RPS: a hardware-accelerated load
418 balancing mechanism that uses soft state to steer flows based on where
434 is maintained by the NIC driver. This is an auto-generated reverse map of
444 -----------------------------
461 XPS: Transmit Packet Steering
464 Transmit Packet Steering is a mechanism for intelligently selecting
465 which transmit queue to use when transmitting a packet on a multi-queue
484 This mapping is used to pick transmit queue based on the receive
489 busy polling multi-threaded workloads where there are challenges in
496 the same queue-association that a given application is polling on. This
503 CPUs/receive-queues that may use that queue to transmit. The reverse
504 mapping, from CPUs to transmit queues or from receive-queues to transmit
506 transmitting the first packet in a flow, the function get_xps_queue() is
508 for the socket connection for a match in the receive queue-to-transmit queue
510 running CPU as a key into the CPU-to-queue lookup table. If the
513 into the set. When selecting the transmit queue based on receive queue(s)
523 skb->ooo_okay is set for a packet in the flow. This flag indicates that
531 -----------------
535 how, XPS is configured at device init. The mapping of CPUs/receive-queues
538 For selection based on CPUs map::
540   /sys/class/net/<dev>/queues/tx-<n>/xps_cpus
542 For selection based on receive-queues map::
544   /sys/class/net/<dev>/queues/tx-<n>/xps_rxqs
551 has no effect, since there is no choice in this case. In a multi-queue
560 For transmit queue selection based on receive queue(s), XPS has to be
561 explicitly configured mapping receive-queue(s) to transmit queue(s). If the
562 user configuration for receive-queue map does not apply, then the transmit
563 queue is selected based on the CPUs map.
569 These are rate-limitation mechanisms implemented by HW, where currently
570 a max-rate attribute is supported, by setting a Mbps value to::
572   /sys/class/net/<dev>/queues/tx-<n>/tx_maxrate
588 - Tom Herbert (therbert@google.com)
589 - Willem de Bruijn (willemb@google.com)