.. SPDX-License-Identifier: GPL-2.0+

=================================================================
Linux Base Driver for the Intel(R) Ethernet Controller 800 Series
=================================================================

Intel ice Linux driver.
Copyright(c) 2018-2021 Intel Corporation.

Contents
========

- Overview
- Identifying Your Adapter
- Important Notes
- Additional Features and Configurations
- Performance Optimization

19
20The associated Virtual Function (VF) driver for this driver is iavf.
21
22Driver information can be obtained using ethtool and lspci.
23
24For questions related to hardware requirements, refer to the documentation
25supplied with your Intel adapter. All hardware requirements listed apply to use
26with Linux.
27
28This driver supports XDP (Express Data Path) and AF_XDP zero-copy. Note that
29XDP is blocked for frame sizes larger than 3KB.
30
31
32Identifying Your Adapter
33========================
34For information on how to identify your adapter, and for the latest Intel
35network drivers, refer to the Intel Support website:
36https://www.intel.com/support
37
38
39Important Notes
40===============
41
Packet drops may occur under receive stress
-------------------------------------------
Devices based on the Intel(R) Ethernet Controller 800 Series are designed to
tolerate a limited amount of system latency during PCIe and DMA transactions.
If these transactions take longer than the tolerated latency, packets are
buffered longer in the device and associated memory, which may result in
dropped packets. These packet drops typically do not have a noticeable impact
on throughput and performance under standard workloads.
50
51If these packet drops appear to affect your workload, the following may improve
52the situation:
53
1) Make sure that your system's physical memory is in a high-performance
   configuration, as recommended by the platform vendor. A common
   recommendation is for all channels to be populated with a single DIMM
   module.
2) In your system's BIOS/UEFI settings, select the "Performance" profile.
3) Your distribution may provide tools such as "tuned", which can apply kernel
   settings better suited to different workloads.
61
62
63Configuring SR-IOV for improved network security
64------------------------------------------------
65In a virtualized environment, on Intel(R) Ethernet Network Adapters that
66support SR-IOV, the virtual function (VF) may be subject to malicious behavior.
67Software-generated layer two frames, like IEEE 802.3x (link flow control), IEEE
68802.1Qbb (priority based flow-control), and others of this type, are not
69expected and can throttle traffic between the host and the virtual switch,
70reducing performance. To resolve this issue, and to ensure isolation from
71unintended traffic streams, configure all SR-IOV enabled ports for VLAN tagging
72from the administrative interface on the PF. This configuration allows
73unexpected, and potentially malicious, frames to be dropped.
74
75See "Configuring VLAN Tagging on SR-IOV Enabled Adapter Ports" later in this
76README for configuration instructions.
77
78
79Do not unload port driver if VF with active VM is bound to it
80-------------------------------------------------------------
81Do not unload a port's driver if a Virtual Function (VF) with an active Virtual
82Machine (VM) is bound to it. Doing so will cause the port to appear to hang.
83Once the VM shuts down, or otherwise releases the VF, the command will
84complete.
85
86
87Additional Features and Configurations
88======================================
89
90ethtool
91-------
92The driver utilizes the ethtool interface for driver configuration and
93diagnostics, as well as displaying statistical information. The latest ethtool
94version is required for this functionality. Download it at:
95https://kernel.org/pub/software/network/ethtool/
96
97NOTE: The rx_bytes value of ethtool does not match the rx_bytes value of
98Netdev, due to the 4-byte CRC being stripped by the device. The difference
99between the two rx_bytes values will be 4 x the number of Rx packets. For
100example, if Rx packets are 10 and Netdev (software statistics) displays
101rx_bytes as "X", then ethtool (hardware statistics) will display rx_bytes as
102"X+40" (4 bytes CRC x 10 packets).
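
The relationship described in the note above can be checked with a little shell
arithmetic. The following is a minimal sketch, not part of the driver or of
ethtool: the function name ``expected_ethtool_rx_bytes`` is hypothetical, and
the block is shown as a plain script, without the root prompt used elsewhere in
this README::

  # Hypothetical helper: given the netdev rx_bytes value and the Rx packet
  # count, print the rx_bytes value ethtool would be expected to report.
  expected_ethtool_rx_bytes() {
      netdev_bytes=$1
      rx_packets=$2
      crc_len=4   # bytes of CRC stripped per packet, per the note above
      echo $(( netdev_bytes + crc_len * rx_packets ))
  }

  expected_ethtool_rx_bytes 1000 10   # prints 1040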
103
104ethtool reset
105-------------
The driver supports 3 types of resets:

- PF reset - resets only the components associated with the given PF and does
  not impact other PFs

- CORE reset - affects the whole adapter and resets all PFs

- GLOBAL reset - same as CORE, but the MAC and PHY components are also
  reinitialized

These are mapped to ethtool reset flags as follows:

- PF reset::

    # ethtool --reset <ethX> irq dma filter offload

- CORE reset::

    # ethtool --reset <ethX> irq-shared dma-shared filter-shared offload-shared \
    ram-shared

- GLOBAL reset::

    # ethtool --reset <ethX> irq-shared dma-shared filter-shared offload-shared \
    mac-shared phy-shared ram-shared

In switchdev mode, you can reset a VF using its port representor::

  # ethtool --reset <repr> irq dma filter offload
134
135
136Viewing Link Messages
137---------------------
Link messages will not be displayed to the console if the distribution is
restricting system messages. To see network driver link messages on your
console, set the console log level to 8 by entering the following::
141
142  # dmesg -n 8
143
144NOTE: This setting is not saved across reboots.
145
146
147Dynamic Device Personalization
148------------------------------
149Dynamic Device Personalization (DDP) allows you to change the packet processing
150pipeline of a device by applying a profile package to the device at runtime.
151Profiles can be used to, for example, add support for new protocols, change
152existing protocols, or change default settings. DDP profiles can also be rolled
153back without rebooting the system.
154
155The DDP package loads during device initialization. The driver looks for
156``intel/ice/ddp/ice.pkg`` in your firmware root (typically ``/lib/firmware/``
157or ``/lib/firmware/updates/``) and checks that it contains a valid DDP package
158file.
159
160NOTE: Your distribution should likely have provided the latest DDP file, but if
161ice.pkg is missing, you can find it in the linux-firmware repository or from
162intel.com.
163
164If the driver is unable to load the DDP package, the device will enter Safe
165Mode. Safe Mode disables advanced and performance features and supports only
166basic traffic and minimal functionality, such as updating the NVM or
167downloading a new driver or DDP package. Safe Mode only applies to the affected
168physical function and does not impact any other PFs. See the "Intel(R) Ethernet
169Adapters and Devices User Guide" for more details on DDP and Safe Mode.
170
171NOTES:
172
173- If you encounter issues with the DDP package file, you may need to download
174  an updated driver or DDP package file. See the log messages for more
175  information.
176
177- The ice.pkg file is a symbolic link to the default DDP package file.
178
179- You cannot update the DDP package if any PF drivers are already loaded. To
180  overwrite a package, unload all PFs and then reload the driver with the new
181  package.
182
183- Only the first loaded PF per device can download a package for that device.
184
185You can install specific DDP package files for different physical devices in
186the same system. To install a specific DDP package file:
187
1881. Download the DDP package file you want for your device.
189
2. Rename the file to ice-xxxxxxxxxxxxxxxx.pkg, where 'xxxxxxxxxxxxxxxx' is the
   unique 64-bit PCI Express device serial number (in hex) of the device you
   want the package downloaded on. The filename must include the complete
   serial number (including leading zeros) and be all lowercase. For example,
   if the 64-bit serial number is b887a3ffffca0568, then the file name would be
   ice-b887a3ffffca0568.pkg.
196
197   To find the serial number from the PCI bus address, you can use the
198   following command::
199
200     # lspci -vv -s af:00.0 | grep -i Serial
201     Capabilities: [150 v1] Device Serial Number b8-87-a3-ff-ff-ca-05-68
202
203   You can use the following command to format the serial number without the
204   dashes::
205
206     # lspci -vv -s af:00.0 | grep -i Serial | awk '{print $7}' | sed s/-//g
207     b887a3ffffca0568
208
2093. Copy the renamed DDP package file to
210   ``/lib/firmware/updates/intel/ice/ddp/``. If the directory does not yet
211   exist, create it before copying the file.
212
2134. Unload all of the PFs on the device.
214
2155. Reload the driver with the new package.
216
217NOTE: The presence of a device-specific DDP package file overrides the loading
218of the default DDP package file (ice.pkg).
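
The renaming rule in step 2 can be sketched as a small shell function. This is
an illustration only: the function name ``ddp_pkg_name`` is hypothetical, and
the input is assumed to be the dashed serial number exactly as lspci prints it.
The block is shown as a plain script, without the root prompt::

  # Hypothetical helper: turn a dashed PCIe serial number, as printed by
  # lspci, into the device-specific DDP package file name.
  ddp_pkg_name() {
      # Drop the dashes and force lowercase, as the file name must be
      # all lowercase and contain the complete serial number.
      serial=$(printf '%s' "$1" | sed 's/-//g' | tr 'A-F' 'a-f')
      printf 'ice-%s.pkg\n' "$serial"
  }

  ddp_pkg_name b8-87-a3-ff-ff-ca-05-68   # prints ice-b887a3ffffca0568.pkg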
219
220
221Intel(R) Ethernet Flow Director
222-------------------------------
223The Intel Ethernet Flow Director performs the following tasks:
224
225- Directs receive packets according to their flows to different queues
226- Enables tight control on routing a flow in the platform
227- Matches flows and CPU cores for flow affinity
228
229NOTE: This driver supports the following flow types:
230
231- IPv4
232- TCPv4
233- UDPv4
234- SCTPv4
235- IPv6
236- TCPv6
237- UDPv6
238- SCTPv6
239
240Each flow type supports valid combinations of IP addresses (source or
241destination) and UDP/TCP/SCTP ports (source and destination). You can supply
242only a source IP address, a source IP address and a destination port, or any
243combination of one or more of these four parameters.
244
245NOTE: This driver allows you to filter traffic based on a user-defined flexible
246two-byte pattern and offset by using the ethtool user-def and mask fields. Only
247L3 and L4 flow types are supported for user-defined flexible filters. For a
248given flow type, you must clear all Intel Ethernet Flow Director filters before
249changing the input set (for that flow type).
250
251
252Flow Director Filters
253---------------------
254Flow Director filters are used to direct traffic that matches specified
255characteristics. They are enabled through ethtool's ntuple interface. To enable
256or disable the Intel Ethernet Flow Director and these filters::
257
258  # ethtool -K <ethX> ntuple <off|on>
259
260NOTE: When you disable ntuple filters, all the user programmed filters are
261flushed from the driver cache and hardware. All needed filters must be re-added
262when ntuple is re-enabled.
263
264To display all of the active filters::
265
266  # ethtool -u <ethX>
267
268To add a new filter::
269
270  # ethtool -U <ethX> flow-type <type> src-ip <ip> [m <ip_mask>] dst-ip <ip>
271  [m <ip_mask>] src-port <port> [m <port_mask>] dst-port <port> [m <port_mask>]
272  action <queue>
273
274  Where:
275    <ethX> - the Ethernet device to program
276    <type> - can be ip4, tcp4, udp4, sctp4, ip6, tcp6, udp6, sctp6
277    <ip> - the IP address to match on
278    <ip_mask> - the IPv4 address to mask on
279              NOTE: These filters use inverted masks.
280    <port> - the port number to match on
281    <port_mask> - the 16-bit integer for masking
282              NOTE: These filters use inverted masks.
283    <queue> - the queue to direct traffic toward (-1 discards the
284              matched traffic)
285
286To delete a filter::
287
288  # ethtool -U <ethX> delete <N>
289
290  Where <N> is the filter ID displayed when printing all the active filters,
291  and may also have been specified using "loc <N>" when adding the filter.
292
293EXAMPLES:
294
To add a filter that directs packets to queue 2::
296
297  # ethtool -U <ethX> flow-type tcp4 src-ip 192.168.10.1 dst-ip \
298  192.168.10.2 src-port 2000 dst-port 2001 action 2 [loc 1]
299
300To set a filter using only the source and destination IP address::
301
302  # ethtool -U <ethX> flow-type tcp4 src-ip 192.168.10.1 dst-ip \
303  192.168.10.2 action 2 [loc 1]
304
305To set a filter based on a user-defined pattern and offset::
306
307  # ethtool -U <ethX> flow-type tcp4 src-ip 192.168.10.1 dst-ip \
308  192.168.10.2 user-def 0x4FFFF action 2 [loc 1]
309
310  where the value of the user-def field contains the offset (4 bytes) and
311  the pattern (0xffff).
312
To match TCP traffic sent from 192.168.0.1, port 5300, directed to 192.168.0.5,
port 80, and then send it to queue 7::

  # ethtool -U enp130s0 flow-type tcp4 src-ip 192.168.0.1 dst-ip 192.168.0.5 \
  src-port 5300 dst-port 80 action 7

To add a TCPv4 filter with a partial mask for a source IP subnet::

  # ethtool -U <ethX> flow-type tcp4 src-ip 192.168.0.0 m 0.255.255.255 dst-ip \
  192.168.5.12 src-port 12600 dst-port 31 action 12
323
324NOTES:
325
326For each flow-type, the programmed filters must all have the same matching
327input set. For example, issuing the following two commands is acceptable::
328
329  # ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
330  # ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.5 src-port 55 action 10
331
332Issuing the next two commands, however, is not acceptable, since the first
333specifies src-ip and the second specifies dst-ip::
334
335  # ethtool -U enp130s0 flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7
336  # ethtool -U enp130s0 flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10
337
The second command will fail with an error. You may program multiple filters
with the same fields, using different values, but, on one device, you may not
program two filters of the same flow type with different matching fields.
341
342The ice driver does not support matching on a subportion of a field, thus
343partial mask fields are not supported.
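
The "same input set" rule can be checked before programming a filter. The
sketch below is an assumption-laden illustration, not a driver or ethtool
feature: the function name ``input_set`` is hypothetical, and it only looks at
the four address and port keywords, ignoring masks and ``user-def``. The block
is shown as a plain script, without the root prompt::

  # Hypothetical helper: print the matching fields (the "input set") named
  # in an ethtool -U argument list, sorted so argument order does not matter.
  input_set() {
      fields=""
      for word in "$@"; do
          case "$word" in
              src-ip|dst-ip|src-port|dst-port) fields="$fields $word" ;;
          esac
      done
      # $fields is left unquoted on purpose so that each field is one word.
      printf '%s\n' $fields | sort | tr '\n' ' '
  }

  a=$(input_set flow-type ip4 src-ip 192.168.0.1 src-port 5300 action 7)
  b=$(input_set flow-type ip4 dst-ip 192.168.0.5 src-port 55 action 10)
  [ "$a" = "$b" ] || echo "input sets differ"

Here the two argument lists are the ones from the unacceptable example above,
so the comparison fails and the message is printed.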
344
345
346Flex Byte Flow Director Filters
347-------------------------------
348The driver also supports matching user-defined data within the packet payload.
349This flexible data is specified using the "user-def" field of the ethtool
350command in the following way:
351
352.. table::
353
354    ============================== ============================
355    ``31    28    24    20    16`` ``15    12    8    4    0``
356    ``offset into packet payload`` ``2 bytes of flexible data``
357    ============================== ============================
358
359For example,
360
361::
362
363  ... user-def 0x4FFFF ...
364
365tells the filter to look 4 bytes into the payload and match that value against
3660xFFFF. The offset is based on the beginning of the payload, and not the
367beginning of the packet. Thus
368
369::
370
371  flow-type tcp4 ... user-def 0x8BEAF ...
372
373would match TCP/IPv4 packets which have the value 0xBEAF 8 bytes into the
374TCP/IPv4 payload.
375
Note that ICMP headers are parsed as 4 bytes of header and 4 bytes of payload.
Thus, to match the first byte of the payload, you must add 4 bytes to the
offset. Also note that ip4 filters match both ICMP frames and raw (unknown)
IPv4 frames, where the payload will be the L3 payload of the IPv4 frame.
381
382The maximum offset is 64. The hardware will only read up to 64 bytes of data
383from the payload. The offset must be even because the flexible data is 2 bytes
384long and must be aligned to byte 0 of the packet payload.
385
386The user-defined flexible offset is also considered part of the input set and
387cannot be programmed separately for multiple filters of the same type. However,
388the flexible data is not part of the input set and multiple filters may use the
389same offset but match against different data.
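
The layout in the table above can be reproduced with shell arithmetic. This is
a minimal sketch under the constraints stated in this section (an even offset
of at most 64 and 2 bytes of data); the function name ``flex_user_def`` is
hypothetical, and the block is shown as a plain script, without the root
prompt::

  # Hypothetical helper: build an ethtool "user-def" value from a payload
  # offset (in bytes) and a 2-byte pattern.
  flex_user_def() {
      offset=$1
      pattern=$2
      # This section requires an even offset no larger than 64.
      if [ "$offset" -lt 0 ] || [ "$offset" -gt 64 ] ||
         [ $(( $offset % 2 )) -ne 0 ]; then
          echo "offset must be even and between 0 and 64" >&2
          return 1
      fi
      # The offset occupies bits 31:16 and the flexible data bits 15:0.
      printf '0x%X\n' $(( ($offset << 16) | ($pattern & 0xFFFF) ))
  }

  flex_user_def 4 0xFFFF   # prints 0x4FFFF
  flex_user_def 8 0xBEAF   # prints 0x8BEAF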
390
391
392RSS Hash Flow
393-------------
394Allows you to set the hash bytes per flow type and any combination of one or
395more options for Receive Side Scaling (RSS) hash byte configuration.
396
397::
398
399  # ethtool -N <ethX> rx-flow-hash <type> <option>
400
  Where <type> is:
    tcp4    signifying TCP over IPv4
    udp4    signifying UDP over IPv4
    gtpc4   signifying GTP-C over IPv4
    gtpc4t  signifying GTP-C (include TEID) over IPv4
    gtpu4   signifying GTP-U over IPv4
    gtpu4e  signifying GTP-U and Extension Header over IPv4
    gtpu4u  signifying GTP-U PSC Uplink over IPv4
    gtpu4d  signifying GTP-U PSC Downlink over IPv4
    tcp6    signifying TCP over IPv6
    udp6    signifying UDP over IPv6
    gtpc6   signifying GTP-C over IPv6
    gtpc6t  signifying GTP-C (include TEID) over IPv6
    gtpu6   signifying GTP-U over IPv6
    gtpu6e  signifying GTP-U and Extension Header over IPv6
    gtpu6u  signifying GTP-U PSC Uplink over IPv6
    gtpu6d  signifying GTP-U PSC Downlink over IPv6
  And <option> is one or more of:
    s     Hash on the IP source address of the Rx packet.
    d     Hash on the IP destination address of the Rx packet.
    f     Hash on bytes 0 and 1 of the Layer 4 header of the Rx packet.
    n     Hash on bytes 2 and 3 of the Layer 4 header of the Rx packet.
    e     Hash on the GTP TEID (4 bytes) of the Rx packet.
424
425
426Accelerated Receive Flow Steering (aRFS)
427----------------------------------------
428Devices based on the Intel(R) Ethernet Controller 800 Series support
429Accelerated Receive Flow Steering (aRFS) on the PF. aRFS is a load-balancing
430mechanism that allows you to direct packets to the same CPU where an
431application is running or consuming the packets in that flow.
432
433NOTES:
434
435- aRFS requires that ntuple filtering is enabled via ethtool.
436- aRFS support is limited to the following packet types:
437
438    - TCP over IPv4 and IPv6
439    - UDP over IPv4 and IPv6
440    - Nonfragmented packets
441
442- aRFS only supports Flow Director filters, which consist of the
443  source/destination IP addresses and source/destination ports.
444- aRFS and ethtool's ntuple interface both use the device's Flow Director. aRFS
445  and ntuple features can coexist, but you may encounter unexpected results if
446  there's a conflict between aRFS and ntuple requests. See "Intel(R) Ethernet
447  Flow Director" for additional information.
448
449To set up aRFS:
450
4511. Enable the Intel Ethernet Flow Director and ntuple filters using ethtool.
452
453::
454
455   # ethtool -K <ethX> ntuple on
456
4572. Set up the number of entries in the global flow table. For example:
458
459::
460
461   # NUM_RPS_ENTRIES=16384
462   # echo $NUM_RPS_ENTRIES > /proc/sys/net/core/rps_sock_flow_entries
463
4643. Set up the number of entries in the per-queue flow table. For example:
465
466::
467
468   # NUM_RX_QUEUES=64
469   # for file in /sys/class/net/$IFACE/queues/rx-*/rps_flow_cnt; do
470   # echo $(($NUM_RPS_ENTRIES/$NUM_RX_QUEUES)) > $file;
471   # done
472
4734. Disable the IRQ balance daemon (this is only a temporary stop of the service
474   until the next reboot).
475
476::
477
478   # systemctl stop irqbalance
479
4805. Configure the interrupt affinity.
481
482   See ``/Documentation/core-api/irq/irq-affinity.rst``
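
The per-queue value written in step 3 is simply the global table size divided
by the number of Rx queues. The sketch below illustrates that calculation only;
the function name ``per_queue_flow_cnt`` is hypothetical, and the example
values are the ones used in steps 2 and 3. The block is shown as a plain
script, without the root prompt::

  # Hypothetical helper: split the global flow table evenly across Rx queues.
  per_queue_flow_cnt() {
      total_entries=$1
      rx_queues=$2
      echo $(( total_entries / rx_queues ))
  }

  per_queue_flow_cnt 16384 64   # prints 256

On a live system, the queue count could instead be taken from the number of
``rx-*`` directories under ``/sys/class/net/<ethX>/queues/``, rather than
being hard-coded.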
483
484
485To disable aRFS using ethtool::
486
487  # ethtool -K <ethX> ntuple off
488
489NOTE: This command will disable ntuple filters and clear any aRFS filters in
490software and hardware.
491
492Example Use Case:
493
4941. Set the server application on the desired CPU (e.g., CPU 4).
495
496::
497
498   # taskset -c 4 netserver
499
5002. Use netperf to route traffic from the client to CPU 4 on the server with
501   aRFS configured. This example uses TCP over IPv4.
502
503::
504
505   # netperf -H <Host IPv4 Address> -t TCP_STREAM
506
507
508Enabling Virtual Functions (VFs)
509--------------------------------
510Use sysfs to enable virtual functions (VF).
511
512For example, you can create 4 VFs as follows::
513
514  # echo 4 > /sys/class/net/<ethX>/device/sriov_numvfs
515
516To disable VFs, write 0 to the same file::
517
518  # echo 0 > /sys/class/net/<ethX>/device/sriov_numvfs
519
520The maximum number of VFs for the ice driver is 256 total (all ports). To check
521how many VFs each PF supports, use the following command::
522
523  # cat /sys/class/net/<ethX>/device/sriov_totalvfs
524
NOTE: You cannot use SR-IOV when link aggregation (LAG)/bonding is active, and
vice versa. The driver enforces this mutual exclusion.
527
528
529Displaying VF Statistics on the PF
530----------------------------------
531Use the following command to display the statistics for the PF and its VFs::
532
533  # ip -s link show dev <ethX>
534
535NOTE: The output of this command can be very large due to the maximum number of
536possible VFs.
537
538The PF driver will display a subset of the statistics for the PF and for all
539VFs that are configured. The PF will always print a statistics block for each
540of the possible VFs, and it will show zero for all unconfigured VFs.
541
542
543Configuring VLAN Tagging on SR-IOV Enabled Adapter Ports
544--------------------------------------------------------
545To configure VLAN tagging for the ports on an SR-IOV enabled adapter, use the
546following command. The VLAN configuration should be done before the VF driver
547is loaded or the VM is booted. The VF is not aware of the VLAN tag being
548inserted on transmit and removed on received frames (sometimes called "port
549VLAN" mode).
550
551::
552
553  # ip link set dev <ethX> vf <id> vlan <vlan id>
554
555For example, the following will configure PF eth0 and the first VF on VLAN 10::
556
557  # ip link set dev eth0 vf 0 vlan 10
558
559
560Enabling a VF link if the port is disconnected
561----------------------------------------------
562If the physical function (PF) link is down, you can force link up (from the
563host PF) on any virtual functions (VF) bound to the PF.
564
565For example, to force link up on VF 0 bound to PF eth0::
566
567  # ip link set eth0 vf 0 state enable
568
569Note: If the command does not work, it may not be supported by your system.
570
571
572Setting the MAC Address for a VF
573--------------------------------
574To change the MAC address for the specified VF::
575
576  # ip link set <ethX> vf 0 mac <address>
577
578For example::
579
580  # ip link set <ethX> vf 0 mac 00:01:02:03:04:05
581
582This setting lasts until the PF is reloaded.
583
584NOTE: Assigning a MAC address for a VF from the host will disable any
585subsequent requests to change the MAC address from within the VM. This is a
586security feature. The VM is not aware of this restriction, so if this is
587attempted in the VM, it will trigger MDD events.
588
589
590Trusted VFs and VF Promiscuous Mode
591-----------------------------------
592This feature allows you to designate a particular VF as trusted and allows that
593trusted VF to request selective promiscuous mode on the Physical Function (PF).
594
595To set a VF as trusted or untrusted, enter the following command in the
596Hypervisor::
597
598  # ip link set dev <ethX> vf 1 trust [on|off]
599
NOTE: It's important to set the VF to trusted before setting promiscuous mode.
If the VF is not trusted, the PF will ignore promiscuous mode requests from the
VF. If the VF becomes trusted after the VF driver is loaded, you must make a
new request to set the VF to promiscuous.
604
605Once the VF is designated as trusted, use the following commands in the VM to
606set the VF to promiscuous mode.
607
608For promiscuous all::
609
610  # ip link set <ethX> promisc on
611  Where <ethX> is a VF interface in the VM
612
613For promiscuous Multicast::
614
615  # ip link set <ethX> allmulticast on
616  Where <ethX> is a VF interface in the VM
617
618NOTE: By default, the ethtool private flag vf-true-promisc-support is set to
619"off," meaning that promiscuous mode for the VF will be limited. To set the
620promiscuous mode for the VF to true promiscuous and allow the VF to see all
621ingress traffic, use the following command::
622
623  # ethtool --set-priv-flags <ethX> vf-true-promisc-support on
624
625The vf-true-promisc-support private flag does not enable promiscuous mode;
626rather, it designates which type of promiscuous mode (limited or true) you will
627get when you enable promiscuous mode using the ip link commands above. Note
628that this is a global setting that affects the entire device. However, the
629vf-true-promisc-support private flag is only exposed to the first PF of the
630device. The PF remains in limited promiscuous mode regardless of the
631vf-true-promisc-support setting.
632
633Next, add a VLAN interface on the VF interface. For example::
634
635  # ip link add link eth2 name eth2.100 type vlan id 100
636
637Note that the order in which you set the VF to promiscuous mode and add the
638VLAN interface does not matter (you can do either first). The result in this
639example is that the VF will get all traffic that is tagged with VLAN 100.
640
641
642Malicious Driver Detection (MDD) for VFs
643----------------------------------------
644Some Intel Ethernet devices use Malicious Driver Detection (MDD) to detect
645malicious traffic from the VF and disable Tx/Rx queues or drop the offending
646packet until a VF driver reset occurs. You can view MDD messages in the PF's
647system log using the dmesg command.
648
649- If the PF driver logs MDD events from the VF, confirm that the correct VF
650  driver is installed.
651- To restore functionality, you can manually reload the VF or VM or enable
652  automatic VF resets.
653- When automatic VF resets are enabled, the PF driver will immediately reset
654  the VF and reenable queues when it detects MDD events on the receive path.
655- If automatic VF resets are disabled, the PF will not automatically reset the
656  VF when it detects MDD events.
657
658To enable or disable automatic VF resets, use the following command::
659
660  # ethtool --set-priv-flags <ethX> mdd-auto-reset-vf on|off
661
662
663MAC and VLAN Anti-Spoofing Feature for VFs
664------------------------------------------
665When a malicious driver on a Virtual Function (VF) interface attempts to send a
666spoofed packet, it is dropped by the hardware and not transmitted.
667
668NOTE: This feature can be disabled for a specific VF::
669
670  # ip link set <ethX> vf <vf id> spoofchk {off|on}
671
672
673Jumbo Frames
674------------
675Jumbo Frames support is enabled by changing the Maximum Transmission Unit (MTU)
676to a value larger than the default value of 1500.
677
Use the ifconfig command to increase the MTU size. For example, enter the
following, where <ethX> is the interface name::
680
681  # ifconfig <ethX> mtu 9000 up
682
683Alternatively, you can use the ip command as follows::
684
685  # ip link set mtu 9000 dev <ethX>
686  # ip link set up dev <ethX>
687
688This setting is not saved across reboots.
689
690
691NOTE: The maximum MTU setting for jumbo frames is 9702. This corresponds to the
692maximum jumbo frame size of 9728 bytes.
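
The 26-byte difference between the two limits above can be worked through in
shell. The breakdown used here (14-byte Ethernet header, 4-byte CRC, and two
4-byte VLAN tags) is an assumption about how the overhead is composed; this
README only states the two totals. The function name ``mtu_to_frame_size`` is
hypothetical, and the block is shown as a plain script, without the root
prompt::

  # Hypothetical helper: convert an MTU into the corresponding frame size.
  mtu_to_frame_size() {
      mtu=$1
      eth_hdr=14     # Ethernet header
      fcs=4          # CRC
      vlan_tags=8    # assumed: two 4-byte VLAN tags
      echo $(( mtu + eth_hdr + fcs + vlan_tags ))
  }

  mtu_to_frame_size 9702   # prints 9728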
693
694NOTE: This driver will attempt to use multiple page sized buffers to receive
695each jumbo packet. This should help to avoid buffer starvation issues when
696allocating receive packets.
697
698NOTE: Packet loss may have a greater impact on throughput when you use jumbo
699frames. If you observe a drop in performance after enabling jumbo frames,
700enabling flow control may mitigate the issue.
701
702
703Speed and Duplex Configuration
704------------------------------
705In addressing speed and duplex configuration issues, you need to distinguish
706between copper-based adapters and fiber-based adapters.
707
708In the default mode, an Intel(R) Ethernet Network Adapter using copper
709connections will attempt to auto-negotiate with its link partner to determine
710the best setting. If the adapter cannot establish link with the link partner
711using auto-negotiation, you may need to manually configure the adapter and link
712partner to identical settings to establish link and pass packets. This should
713only be needed when attempting to link with an older switch that does not
714support auto-negotiation or one that has been forced to a specific speed or
715duplex mode. Your link partner must match the setting you choose. 1 Gbps speeds
716and higher cannot be forced. Use the autonegotiation advertising setting to
717manually set devices for 1 Gbps and higher.
718
719Speed, duplex, and autonegotiation advertising are configured through the
720ethtool utility. For the latest version, download and install ethtool from the
721following website:
722
723   https://kernel.org/pub/software/network/ethtool/
724
725To see the speed configurations your device supports, run the following::
726
727  # ethtool <ethX>
728
729Caution: Only experienced network administrators should force speed and duplex
730or change autonegotiation advertising manually. The settings at the switch must
731always match the adapter settings. Adapter performance may suffer or your
732adapter may not operate if you configure the adapter differently from your
733switch.
734
735
736Data Center Bridging (DCB)
737--------------------------
738NOTE: The kernel assumes that TC0 is available, and will disable Priority Flow
739Control (PFC) on the device if TC0 is not available. To fix this, ensure TC0 is
740enabled when setting up DCB on your switch.
741
DCB is a configurable Quality of Service implementation in hardware. It uses
the VLAN priority tag (802.1p) to filter traffic, which means that traffic can
be filtered into 8 different priorities. It also enables priority flow control
(802.1Qbb), which can reduce or eliminate dropped packets during network
stress. Bandwidth can be allocated to each of these priorities, and the
allocation is enforced at the hardware level (802.1Qaz).
748
749DCB is normally configured on the network using the DCBX protocol (802.1Qaz), a
750specialization of LLDP (802.1AB). The ice driver supports the following
751mutually exclusive variants of DCBX support:
752
7531) Firmware-based LLDP Agent
7542) Software-based LLDP Agent
755
756In firmware-based mode, firmware intercepts all LLDP traffic and handles DCBX
757negotiation transparently for the user. In this mode, the adapter operates in
758"willing" DCBX mode, receiving DCB settings from the link partner (typically a
759switch). The local user can only query the negotiated DCB configuration. For
760information on configuring DCBX parameters on a switch, please consult the
761switch manufacturer's documentation.
762
763In software-based mode, LLDP traffic is forwarded to the network stack and user
764space, where a software agent can handle it. In this mode, the adapter can
765operate in either "willing" or "nonwilling" DCBX mode and DCB configuration can
766be both queried and set locally. This mode requires the FW-based LLDP Agent to
767be disabled.
768
769NOTE:
770
771- You can enable and disable the firmware-based LLDP Agent using an ethtool
772  private flag. Refer to the "FW-LLDP (Firmware Link Layer Discovery Protocol)"
773  section in this README for more information.
774- In software-based DCBX mode, you can configure DCB parameters using software
775  LLDP/DCBX agents that interface with the Linux kernel's DCB Netlink API. We
776  recommend using OpenLLDP as the DCBX agent when running in software mode. For
777  more information, see the OpenLLDP man pages and
778  https://github.com/intel/openlldp.
779- The driver implements the DCB netlink interface layer to allow the user space
780  to communicate with the driver and query DCB configuration for the port.
781- iSCSI with DCB is not supported.


FW-LLDP (Firmware Link Layer Discovery Protocol)
------------------------------------------------
Use ethtool to change FW-LLDP settings. The FW-LLDP setting is per port and
persists across boots.

To enable the FW-LLDP agent::

  # ethtool --set-priv-flags <ethX> fw-lldp-agent on

To disable the FW-LLDP agent::

  # ethtool --set-priv-flags <ethX> fw-lldp-agent off

To check the current FW-LLDP setting::

  # ethtool --show-priv-flags <ethX>

NOTE: You must enable the UEFI HII "LLDP Agent" attribute for this setting to
take effect. If "LLDP Agent" is set to disabled, you cannot enable it from the
OS.
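
The flag listing can also be checked from a script. The following is a minimal
sketch, not part of the driver: the helper name ``fw_lldp_state`` is
hypothetical, and it assumes that ``ethtool --show-priv-flags`` prints one
``name : on|off`` line per flag, which may differ between ethtool versions.

```shell
# Print "on" or "off" for the fw-lldp-agent private flag, reading the text
# produced by `ethtool --show-priv-flags <ethX>` from standard input.
# Prints "unknown" if the flag is not listed.
# NOTE: the "name : value" line layout is an assumption about ethtool output.
fw_lldp_state() {
    awk -F: '
        # Strip blanks from the flag name before comparing it.
        { name = $1; gsub(/[ \t]/, "", name) }
        name == "fw-lldp-agent" {
            value = $2
            gsub(/[ \t]/, "", value)
            print value
            found = 1
        }
        END { if (!found) print "unknown" }
    '
}
```

For example, ``ethtool --show-priv-flags <ethX> | fw_lldp_state`` would print
``on`` when the firmware agent is enabled.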


Flow Control
------------
Ethernet Flow Control (IEEE 802.3x) can be configured with ethtool to enable
receiving and transmitting pause frames for ice. When transmit is enabled,
pause frames are generated when the receive packet buffer crosses a predefined
threshold. When receive is enabled, the transmit unit will halt for the time
delay specified when a pause frame is received.

NOTE: You must have a flow control capable link partner.

Flow Control is disabled by default.

Use ethtool to change the flow control settings.

To enable or disable Rx or Tx Flow Control::

  # ethtool -A <ethX> rx <on|off> tx <on|off>

NOTE: This command only enables or disables Flow Control if auto-negotiation is
disabled. If auto-negotiation is enabled, this command changes the parameters
used for auto-negotiation with the link partner.

NOTE: Flow Control auto-negotiation is part of link auto-negotiation. Depending
on your device, you may not be able to change the auto-negotiation setting.

NOTE:

- The ice driver requires flow control on both the port and link partner. If
  flow control is disabled on one of the sides, the port may appear to hang
  under heavy traffic.
- You may encounter issues with link-level flow control (LFC) after disabling
  DCB. The LFC status may show as enabled but traffic is not paused. To resolve
  this issue, disable and reenable LFC using ethtool::

   # ethtool -A <ethX> rx off tx off
   # ethtool -A <ethX> rx on tx on


NAPI
----

This driver supports NAPI (Rx polling mode).

See :ref:`Documentation/networking/napi.rst <napi>` for more information.

MACVLAN
-------
This driver supports MACVLAN. Kernel support for MACVLAN can be tested by
checking if the MACVLAN driver is loaded. You can run 'lsmod | grep macvlan' to
see if the MACVLAN driver is loaded or run 'modprobe macvlan' to try to load
the MACVLAN driver.

NOTE:

- In passthru mode, you can only set up one MACVLAN device. It will inherit the
  MAC address of the underlying PF (Physical Function) device.


IEEE 802.1ad (QinQ) Support
---------------------------
The IEEE 802.1ad standard, informally known as QinQ, allows for multiple VLAN
IDs within a single Ethernet frame. VLAN IDs are sometimes referred to as
"tags," and multiple VLAN IDs are thus referred to as a "tag stack." Tag stacks
allow L2 tunneling and the ability to segregate traffic within a particular
VLAN ID, among other uses.

NOTES:

- Receive checksum offloads and VLAN acceleration are not supported for 802.1ad
  (QinQ) packets.

- 0x88A8 traffic will not be received unless VLAN stripping is disabled with
  the following command::

    # ethtool -K <ethX> rxvlan off

- 0x88A8/0x8100 double VLANs cannot be used with 0x8100 or 0x8100/0x8100 VLANs
  configured on the same port. 0x88A8/0x8100 traffic will not be received if
  0x8100 VLANs are configured.

- The VF can only transmit 0x88A8/0x8100 (i.e., 802.1ad/802.1Q) traffic if:

    1) The VF is not assigned a port VLAN.
    2) spoofchk is disabled from the PF. If you enable spoofchk, the VF will
       not transmit 0x88A8/0x8100 traffic.

- The VF may not receive all network traffic based on the Inner VLAN header
  when VF true promiscuous mode (vf-true-promisc-support) and double VLANs are
  enabled in SR-IOV mode.

The following are examples of how to configure 802.1ad (QinQ)::

  # ip link add link eth0 eth0.24 type vlan proto 802.1ad id 24
  # ip link add link eth0.24 eth0.24.371 type vlan proto 802.1Q id 371

  Where "24" and "371" are example VLAN IDs.


Tunnel/Overlay Stateless Offloads
---------------------------------
Supported tunnels and overlays include VXLAN, GENEVE, and others depending on
hardware and software configuration. Stateless offloads are enabled by default.

To view the current state of all offloads::

  # ethtool -k <ethX>


UDP Segmentation Offload
------------------------
Allows the adapter to offload transmit segmentation of UDP packets with
payloads up to 64K into valid Ethernet frames. Because the adapter hardware is
able to complete data segmentation much faster than operating system software,
this feature may improve transmission performance.
In addition, the adapter may use fewer CPU resources.

NOTE:

- The application sending UDP packets must support UDP segmentation offload.

To enable/disable UDP Segmentation Offload, issue the following command::

  # ethtool -K <ethX> tx-udp-segmentation [off|on]


GNSS module
-----------
Requires a kernel compiled with CONFIG_GNSS=y or CONFIG_GNSS=m.
Allows the user to read messages from the GNSS hardware module and write
supported commands. If the module is physically present, a GNSS device is
spawned: ``/dev/gnss<id>``.
The format of write commands depends on the GNSS hardware module, because the
driver passes the raw bytes written to the GNSS device through to the receiver
over I2C. Please refer to the hardware GNSS module documentation for
configuration details.


Firmware (FW) logging
---------------------
The driver supports FW logging via the debugfs interface on PF 0 only. The FW
running on the NIC must support FW logging; if it does not, the 'fwlog' file
is not created in the ice debugfs directory.

Module configuration
~~~~~~~~~~~~~~~~~~~~
Firmware logging is configured on a per module basis. Each module can be set to
a value independent of the other modules (unless the module 'all' is specified).
The modules will be instantiated under the 'fwlog/modules' directory.

The user can set the log level for a module by writing to the module file like
this::

  # echo <log_level> > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/<module>

where

* log_level is a name as described below. Each level includes the
  messages from the previous/lower level

      * none
      * error
      * warning
      * normal
      * verbose

* module is a name that represents the module to receive events for. The
  module names are

      * general
      * ctrl
      * link
      * link_topo
      * dnl
      * i2c
      * sdp
      * mdio
      * adminq
      * hdma
      * lldp
      * dcbx
      * dcb
      * xlr
      * nvm
      * auth
      * vpd
      * iosf
      * parser
      * sw
      * scheduler
      * txq
      * rsvd
      * post
      * watchdog
      * task_dispatch
      * mng
      * synce
      * health
      * tsdrv
      * pfreg
      * mdlver
      * all

The name 'all' is special and allows the user to set all of the modules to the
specified log_level or to read the log_level of all of the modules.

Example usage to configure the modules
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To set a single module to 'verbose'::

  # echo verbose > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/link

To set multiple modules, issue the command once per module::

  # echo verbose > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/link
  # echo warning > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/ctrl
  # echo none > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/dcb

To set all the modules to the same value::

  # echo normal > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/all

To read the log_level of a specific module (e.g. module 'general')::

  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/general

To read the log_level of all the modules::

  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/modules/all

Enabling FW log
~~~~~~~~~~~~~~~
Configuring the modules indicates to the FW that the configured modules should
generate events that the driver is interested in, but it **does not** send the
events to the driver until the enable message is sent to the FW. To do this
the user can write a 1 (enable) or 0 (disable) to 'fwlog/enable'. An example
is::

  # echo 1 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/enable

Retrieving FW log data
~~~~~~~~~~~~~~~~~~~~~~
The FW log data can be retrieved by reading from 'fwlog/data'. The user can
write any value to 'fwlog/data' to clear the data. The data can only be cleared
when FW logging is disabled. The FW log data is binary and is intended to be
sent to Intel to help debug user issues.

An example to read the data is::

  # cat /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data > fwlog.bin

An example to clear the data is::

  # echo 0 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/data
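
The configure, enable, retrieve, and clear steps can be combined into one
capture sequence. The following is a minimal sketch under stated assumptions:
the helper names ``fwlog_start`` and ``fwlog_stop`` are hypothetical, and the
caller passes the 'fwlog' directory of the device (the PCI address used in the
examples above is only illustrative). Logging is disabled before the data is
cleared because the data can only be cleared while FW logging is disabled.

```shell
# fwlog_start DIR MODULE LEVEL
# Set the log level of one module, then enable FW logging.
# DIR is the device's 'fwlog' debugfs directory (supplied by the caller).
fwlog_start() {
    fwlog_dir=$1
    echo "$3" > "$fwlog_dir/modules/$2" || return 1
    echo 1 > "$fwlog_dir/enable"
}

# fwlog_stop DIR OUTFILE
# Disable FW logging, save the binary log to OUTFILE, then clear the
# buffered data (clearing is only permitted while logging is disabled).
fwlog_stop() {
    fwlog_dir=$1
    echo 0 > "$fwlog_dir/enable" || return 1
    cat "$fwlog_dir/data" > "$2" || return 1
    echo 0 > "$fwlog_dir/data"
}
```

A typical session would call ``fwlog_start`` with the 'fwlog' directory, a
module such as ``link``, and a level such as ``verbose``, reproduce the issue,
and then call ``fwlog_stop`` with the same directory and an output file name.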

Changing how often the log events are sent to the driver
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The driver receives FW log data from the Admin Receive Queue (ARQ). How often
the FW sends the ARQ events can be configured by writing to
'fwlog/nr_messages'. The range is 1-128 (1 means push every log message, 128
means push only when the max AQ command buffer is full). The suggested value is
10. The user can see the currently configured value by reading
'fwlog/nr_messages'. An example to set the value is::

  # echo 50 > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/nr_messages

Configuring the amount of memory used to store FW log data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The driver stores FW log data within the driver. The default size of the memory
used to store the data is 1MB. Some use cases may require more or less data so
the user can change the amount of memory that is allocated for FW log data.
To change the amount of memory, write to 'fwlog/log_size'. The value must be
one of: 128K, 256K, 512K, 1M, or 2M. FW logging must be disabled to change the
value. An example of changing the value is::

  # echo 128K > /sys/kernel/debug/ice/0000\:18\:00.0/fwlog/log_size
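
A script that writes these two settings may want to check the value first. The
following is a minimal sketch; the helper names are hypothetical, and the
accepted values are simply the ones listed above (1-128 for 'nr_messages' and
the five sizes for 'log_size').

```shell
# Return 0 if $1 is an integer in the documented nr_messages range (1-128).
fwlog_valid_nr_messages() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;   # empty or not purely digits
    esac
    [ "$1" -ge 1 ] && [ "$1" -le 128 ]
}

# Return 0 if $1 is one of the documented log_size values.
fwlog_valid_log_size() {
    case $1 in
        128K|256K|512K|1M|2M) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, ``fwlog_valid_log_size 128K`` succeeds, while
``fwlog_valid_nr_messages 0`` fails.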


Performance Optimization
========================
Driver defaults are meant to fit a wide variety of workloads, but if further
optimization is required, we recommend experimenting with the following
settings.


Rx Descriptor Ring Size
-----------------------
To reduce the number of Rx packet discards, increase the number of Rx
descriptors for each Rx ring using ethtool.

  Check if the interface is dropping Rx packets due to buffers being full
  (rx_dropped.nic can indicate a lack of PCIe bandwidth)::

    # ethtool -S <ethX> | grep "rx_dropped"

  If the previous command shows drops on queues, it may help to increase
  the number of descriptors using 'ethtool -G'::

    # ethtool -G <ethX> rx <N>
    Where <N> is the desired number of ring entries/descriptors

  This can provide temporary buffering for issues that create latency while
  the CPUs process descriptors.
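
Before choosing a value for ``<N>``, it can help to compare the current ring
size with the maximum that the device reports through ``ethtool -g <ethX>``.
The following is a minimal sketch; the helper name ``rx_ring_sizes`` is
hypothetical, and the parsing assumes the usual ``ethtool -g`` layout with a
"Pre-set maximums:" block followed by a "Current hardware settings:" block.

```shell
# Read `ethtool -g <ethX>` output from standard input and print the Rx ring
# size from each block as "max <N>" and "cur <N>".
# NOTE: the block titles matched below are an assumption about ethtool output.
rx_ring_sizes() {
    awk '
        /^Pre-set maximums:/          { section = "max" }
        /^Current hardware settings:/ { section = "cur" }
        # Match only the plain "RX:" line, not "RX Mini:" or "RX Jumbo:".
        $1 == "RX:" && section != "" { print section, $2 }
    '
}
```

For example, ``ethtool -g <ethX> | rx_ring_sizes`` prints one line for the
maximum and one for the current setting.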


Interrupt Rate Limiting
-----------------------
This driver supports an adaptive interrupt throttle rate (ITR) mechanism that
is tuned for general workloads. The user can customize the interrupt rate
control for specific workloads, via ethtool, adjusting the number of
microseconds between interrupts.

To set the interrupt rate manually, you must disable adaptive mode::

  # ethtool -C <ethX> adaptive-rx off adaptive-tx off

For lower CPU utilization:

  Disable adaptive ITR and lower the Rx and Tx interrupt rates. The examples
  below affect every queue of the specified interface.

  Setting rx-usecs and tx-usecs to 80 will limit interrupts to about
  12,500 interrupts per second per queue::

    # ethtool -C <ethX> adaptive-rx off adaptive-tx off rx-usecs 80 tx-usecs 80

For reduced latency:

  Disable adaptive ITR and ITR by setting rx-usecs and tx-usecs to 0
  using ethtool::

    # ethtool -C <ethX> adaptive-rx off adaptive-tx off rx-usecs 0 tx-usecs 0

Per-queue interrupt rate settings:

  The following examples are for queues 1 and 3, but you can adjust other
  queues.

  To disable Rx adaptive ITR and set static Rx ITR to 10 microseconds or
  about 100,000 interrupts/second, for queues 1 and 3::

    # ethtool --per-queue <ethX> queue_mask 0xa --coalesce adaptive-rx off \
        rx-usecs 10

  To show the current coalesce settings for queues 1 and 3::

    # ethtool --per-queue <ethX> queue_mask 0xa --show-coalesce
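
The ``queue_mask`` value is a bitmask in which bit N selects queue N, so
queues 1 and 3 give binary 1010, or 0xa. The following is a minimal sketch for
building the mask for other queue sets; the helper name ``queue_mask`` is
hypothetical.

```shell
# Print the hexadecimal queue mask that selects the given queue numbers.
# Each argument is a zero-based queue index; bit N of the mask selects queue N.
queue_mask() {
    mask=0
    for q in "$@"; do
        mask=$(( mask | (1 << q) ))
    done
    printf '0x%x\n' "$mask"
}
```

For example, ``queue_mask 1 3`` prints ``0xa``, the mask used in the commands
above.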

Bounding interrupt rates using rx-usecs-high:

  :Valid Range: 0-236 (0=no limit)

   The range of 0-236 microseconds provides an effective range of 4,237 to
   250,000 interrupts per second. The value of rx-usecs-high can be set
   independently of rx-usecs and tx-usecs in the same ethtool command, and is
   also independent of the adaptive interrupt moderation algorithm. The
   underlying hardware supports granularity in 4-microsecond intervals, so
   adjacent values may result in the same interrupt rate.

  The following command would disable adaptive interrupt moderation, and allow
  a maximum of 5 microseconds before indicating a receive or transmit was
  complete. However, instead of resulting in as many as 200,000 interrupts per
  second, it limits total interrupts per second to 50,000 via the rx-usecs-high
  parameter.

  ::

    # ethtool -C <ethX> adaptive-rx off adaptive-tx off rx-usecs-high 20 \
        rx-usecs 5 tx-usecs 5
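
The interrupt rates quoted in this section follow from dividing one second
(1,000,000 microseconds) by the configured interval. The following is a
minimal sketch of that arithmetic; the helper name ``itr_usecs_to_rate`` is
hypothetical, and the result is an upper bound rounded down to a whole number.

```shell
# Print the approximate maximum interrupts per second for an interval given
# in microseconds (1,000,000 / usecs, rounded down). An interval of 0 means
# no throttling, so no numeric limit is printed.
itr_usecs_to_rate() {
    if [ "$1" -eq 0 ]; then
        echo "unlimited"
    else
        echo $(( 1000000 / $1 ))
    fi
}
```

For example, ``itr_usecs_to_rate 80`` prints ``12500`` and
``itr_usecs_to_rate 20`` prints ``50000``, matching the figures above.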


Virtualized Environments
------------------------
In addition to the other suggestions in this section, the following may be
helpful to optimize performance in VMs.

  Using the appropriate mechanism (vcpupin) in the VM, pin the CPUs to
  individual logical CPUs (LCPUs), making sure to use a set of CPUs included
  in the device's local_cpulist:
  ``/sys/class/net/<ethX>/device/local_cpulist``.

  Configure as many Rx/Tx queues in the VM as available. (See the iavf driver
  documentation for the number of queues supported.) For example::

    # ethtool -L <virt_interface> rx <max> tx <max>
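
When pinning, it can be convenient to turn the contents of local_cpulist into
individual CPU numbers. The following is a minimal sketch; the helper name
``expand_cpulist`` is hypothetical, and it assumes the list contains only
single CPUs and simple ranges such as ``0-3,8``.

```shell
# Expand a CPU list such as "0-3,8" into one CPU number per line.
# Only "N" and "N-M" entries are handled; other forms are an unsupported
# assumption of this sketch.
expand_cpulist() {
    echo "$1" | tr ',' '\n' | while IFS=- read -r first last; do
        [ -n "$first" ] || continue
        last=${last:-$first}        # a single CPU is a range of one
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            echo "$cpu"
            cpu=$(( cpu + 1 ))
        done
    done
}
```

For example,
``expand_cpulist "$(cat /sys/class/net/<ethX>/device/local_cpulist)"`` lists
the CPUs that are local to the device, one per line.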


Support
=======
For general information, go to the Intel support website at:
https://www.intel.com/support/

If an issue is identified with the released source code on a supported kernel
with a supported adapter, email the specific information related to the issue
to intel-wired-lan@lists.osuosl.org.


Trademarks
==========
Intel is a trademark or registered trademark of Intel Corporation or its
subsidiaries in the United States and/or other countries.

* Other names and brands may be claimed as the property of others.
1207