.. SPDX-License-Identifier: GPL-2.0

===================
ice devlink support
===================

This document describes the devlink features implemented by the ``ice``
device driver.

Parameters
==========

.. list-table:: Generic parameters implemented

   * - Name
     - Mode
     - Notes
   * - ``enable_roce``
     - runtime
     - mutually exclusive with ``enable_iwarp``
   * - ``enable_iwarp``
     - runtime
     - mutually exclusive with ``enable_roce``

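Because the two parameters are mutually exclusive, switching the RDMA protocol at runtime involves turning the active one off before turning the other one on. The sketch below shows that ordering with the ``devlink dev param set`` command from iproute2. The helper name ``ice_set_rdma`` and the ``DEVLINK`` override variable are hypothetical, the PCI address is only an example, and it is assumed that the driver rejects enabling one protocol while the other is still enabled.

.. code:: shell

    # Hypothetical helper: switch the RDMA protocol of an ice device at runtime.
    #   ice_set_rdma DEV roce|iwarp
    # Set DEVLINK=echo to print the devlink commands instead of running them.
    ice_set_rdma() {
        local dev=$1 proto=$2 on off
        case $proto in
        roce)  on=enable_roce;  off=enable_iwarp ;;
        iwarp) on=enable_iwarp; off=enable_roce ;;
        *)     echo "usage: ice_set_rdma DEV roce|iwarp" >&2; return 2 ;;
        esac
        # Turn the other protocol off first: the parameters are mutually
        # exclusive, so both must never be enabled at the same time.
        "${DEVLINK:-devlink}" dev param set "$dev" name "$off" value false cmode runtime &&
            "${DEVLINK:-devlink}" dev param set "$dev" name "$on" value true cmode runtime
    }

For example, ``ice_set_rdma pci/0000:01:00.0 roce`` switches the device to RoCE, while ``DEVLINK=echo ice_set_rdma pci/0000:01:00.0 roce`` only prints the two commands.
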
Info versions
=============

The ``ice`` driver reports the following versions:

.. list-table:: devlink info versions implemented
    :widths: 5 5 5 90

    * - Name
      - Type
      - Example
      - Description
    * - ``board.id``
      - fixed
      - K65390-000
      - The Product Board Assembly (PBA) identifier of the board.
    * - ``cgu.id``
      - fixed
      - 36
      - The Clock Generation Unit (CGU) hardware revision identifier.
    * - ``fw.mgmt``
      - running
      - 2.1.7
      - 3-digit version number of the management firmware running on the
        Embedded Management Processor of the device. It controls the PHY,
        link, access to device resources, etc. Intel documentation refers to
        this as the EMP firmware.
    * - ``fw.mgmt.api``
      - running
      - 1.5.1
      - 3-digit version number (major.minor.patch) of the API exported over
        the AdminQ by the management firmware. Used by the driver to
        identify what commands are supported. Older kernels displayed only
        a 2-digit version number (major.minor).
    * - ``fw.mgmt.build``
      - running
      - 0x305d955f
      - Unique identifier of the source for the management firmware.
    * - ``fw.undi``
      - running
      - 1.2581.0
      - Version of the Option ROM containing the UEFI driver. The version is
        reported in ``major.minor.patch`` format. The major version is
        incremented whenever a major breaking change occurs, or when the
        minor version would overflow. The minor version is incremented for
        non-breaking changes and reset to 1 when the major version is
        incremented. The patch version is normally 0 but is incremented when
        a fix is delivered as a patch against an older base Option ROM.
    * - ``fw.psid.api``
      - running
      - 0.80
      - Version defining the format of the flash contents.
    * - ``fw.bundle_id``
      - running
      - 0x80002ec0
      - Unique identifier of the firmware image file that was loaded onto
        the device. Also referred to as the EETRACK identifier of the NVM.
    * - ``fw.app.name``
      - running
      - ICE OS Default Package
      - The name of the DDP package that is active in the device. The DDP
        package is loaded by the driver during initialization. Each
        variation of the DDP package has a unique name.
    * - ``fw.app``
      - running
      - 1.3.1.0
      - The version of the DDP package that is active in the device. Note
        that both the name (as reported by ``fw.app.name``) and version are
        required to uniquely identify the package.
    * - ``fw.app.bundle_id``
      - running
      - 0xc0000001
      - Unique identifier for the DDP package loaded in the device. Also
        referred to as the DDP Track ID. Can be used to uniquely identify
        the specific DDP package.
    * - ``fw.netlist``
      - running
      - 1.1.2000-6.7.0
      - The version of the netlist module. This module defines the device's
        Ethernet capabilities and default settings, and is used by the
        management firmware as part of managing link and device
        connectivity.
    * - ``fw.netlist.build``
      - running
      - 0xee16ced7
      - The first 4 bytes of the hash of the netlist module contents.
    * - ``fw.cgu``
      - running
      - 8032.16973825.6021
      - The version of the Clock Generation Unit (CGU). Format:
        ``<CGU type>.<configuration version>.<firmware version>``.

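The values above can be read with ``devlink dev info``. The following sketch extracts a single version from that command's text output and splits an ``fw.cgu`` value into the three fields listed in the table. The helper names are hypothetical, and the text layout (a ``versions:`` block with ``fixed:``, ``running:`` and ``stored:`` subsections) is an assumption; ``devlink -j dev info`` produces JSON, which is more robust to parse.

.. code:: shell

    # Hypothetical helper: print the value of KEY from "devlink dev info" text
    # output read on stdin, looking only in SECTION (default: running).
    #   devlink dev info pci/0000:01:00.0 | devlink_info_get fw.mgmt
    devlink_info_get() {
        awk -v key="$1" -v want="${2:-running}" '
            # Section headers are "fixed:", "running:" or "stored:".
            $1 ~ /^(fixed|running|stored):$/ {
                sect = substr($1, 1, length($1) - 1)
                next
            }
            sect == want && $1 == key {
                $1 = ""            # drop the key, keep the (possibly spaced) value
                sub(/^ +/, "")
                print
                exit
            }'
    }

    # Split an fw.cgu value into its three dot-separated fields:
    # <CGU type>.<configuration version>.<firmware version>
    split_cgu_version() {
        local type cfg fw
        IFS=. read -r type cfg fw <<< "$1"
        printf 'type=%s config=%s fw=%s\n' "$type" "$cfg" "$fw"
    }

With the example value from the table, ``split_cgu_version 8032.16973825.6021`` prints ``type=8032 config=16973825 fw=6021``. On a live system the two helpers can be combined as ``split_cgu_version "$(devlink dev info pci/0000:01:00.0 | devlink_info_get fw.cgu)"``.
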
Flash Update
============

The ``ice`` driver implements support for flash update using the
``devlink-flash`` interface. It supports updating the device flash using a
combined flash image that contains the ``fw.mgmt``, ``fw.undi``, and
``fw.netlist`` components.

.. list-table:: List of supported overwrite modes
   :widths: 5 95

   * - Bits
     - Behavior
   * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS``
     - Do not preserve settings stored in the flash components being
       updated. This includes overwriting the port configuration that
       determines the number of physical functions the device will
       initialize with.
   * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` and ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS``
     - Preserve neither settings nor identifiers. Overwrite everything
       in the flash with the contents from the provided image, without
       performing any preservation. This includes overwriting device
       identifying fields such as the MAC address, VPD area, and device
       serial number. This combination is expected to be used with an
       image customized for the specific device.

The ice hardware does not support overwriting only identifiers while
preserving settings, so ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` on its own
is rejected. If no overwrite mask is provided, the firmware is instructed
to preserve all settings and identifying fields when updating.

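With the ``devlink`` utility from iproute2, each overwrite bit is requested with a separate ``overwrite`` keyword (``settings`` or ``identifiers``), and the image file is looked up by the kernel firmware loader, so its path is relative to a directory such as ``/lib/firmware``. The following minimal sketch only offers the combinations listed above. The helper name ``ice_flash``, its mode names and the ``DEVLINK`` override variable are hypothetical, and ``ice.bin`` is a placeholder file name.

.. code:: shell

    # Hypothetical helper: flash an ice device with one of the supported
    # overwrite combinations.
    #   ice_flash DEV FILE           preserve settings and identifiers
    #   ice_flash DEV FILE settings  DEVLINK_FLASH_OVERWRITE_SETTINGS
    #   ice_flash DEV FILE all       overwrite settings and identifiers
    # An identifiers-only mode is deliberately not offered, as ice rejects it.
    # Set DEVLINK=echo to print the command instead of running it.
    ice_flash() {
        local dev=$1 file=$2 mode=${3:-}
        set -- dev flash "$dev" file "$file"
        case $mode in
        "")       ;;
        settings) set -- "$@" overwrite settings ;;
        all)      set -- "$@" overwrite settings overwrite identifiers ;;
        *)        echo "ice_flash: unsupported mode '$mode'" >&2; return 2 ;;
        esac
        "${DEVLINK:-devlink}" "$@"
    }

For example, ``ice_flash pci/0000:01:00.0 ice.bin settings`` runs ``devlink dev flash pci/0000:01:00.0 file ice.bin overwrite settings``.
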
Reload
======

The ``ice`` driver supports activating new firmware after a flash update
using ``DEVLINK_CMD_RELOAD`` with the ``DEVLINK_RELOAD_ACTION_FW_ACTIVATE``
action.

.. code:: shell

    $ devlink dev reload pci/0000:01:00.0 action fw_activate

The new firmware is activated by issuing a device-specific Embedded
Management Processor reset, which requests the device to reset and reload
the EMP firmware image.

The driver does not currently support reinitializing itself via
``DEVLINK_RELOAD_ACTION_DRIVER_REINIT``.

Port split
==========

The ``ice`` driver supports port splitting only for port 0, as the FW has
a predefined set of available port split options for the whole device.

A system reboot is required for the port split to take effect.

The following command will select the port split option with 4 ports:

.. code:: shell

    $ devlink port split pci/0000:16:00.0/0 count 4

The list of all available port options is printed to the kernel log through
dynamic debug after each ``split`` and ``unsplit`` command. The first option
is the default.

.. code:: shell

    ice 0000:16:00.0: Available port split options and max port speeds (Gbps):
    ice 0000:16:00.0: Status  Split      Quad 0          Quad 1
    ice 0000:16:00.0:         count  L0  L1  L2  L3  L4  L5  L6  L7
    ice 0000:16:00.0: Active  2     100   -   -   - 100   -   -   -
    ice 0000:16:00.0:         2      50   -  50   -   -   -   -   -
    ice 0000:16:00.0: Pending 4      25  25  25  25   -   -   -   -
    ice 0000:16:00.0:         4      25  25   -   -  25  25   -   -
    ice 0000:16:00.0:         8      10  10  10  10  10  10  10  10
    ice 0000:16:00.0:         1     100   -   -   -   -   -   -   -

There may be multiple FW port options with the same port split count. When
the same port split count is requested again, the next FW port option with
that split count is selected.

``devlink port unsplit`` selects the option with a split count of 1. If no
FW option with a split count of 1 is available, the command returns an error.

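Because a new split option only takes effect after a reboot, it can be useful to confirm which option is pending before rebooting. The following sketch reads kernel log text on standard input and reports the split counts of the ``Active`` and ``Pending`` rows of the most recent option table. It assumes the table layout shown above; the helper name ``port_split_status`` is hypothetical.

.. code:: shell

    # Hypothetical helper: report the Active and Pending split counts from
    # the most recent ice port option table in kernel log text on stdin.
    # The table is only logged when dynamic debug is enabled for ice, e.g.:
    #   echo 'module ice +p' > /sys/kernel/debug/dynamic_debug/control
    #   dmesg | port_split_status
    port_split_status() {
        awk '
            # A new table starts: forget the state of any earlier one.
            /Available port split options/ { active = pending = "" }
            {
                # The position of the status word depends on the log prefix
                # (timestamps, device name), so search for it.
                for (i = 1; i < NF; i++) {
                    if ($i == "Active")  { active = $(i + 1); break }
                    if ($i == "Pending") { pending = $(i + 1); break }
                }
            }
            END {
                if (active != "")  print "active split count: " active
                if (pending != "") print "pending split count: " pending
            }'
    }

With the example table above as input, the helper prints ``active split count: 2`` followed by ``pending split count: 4``.
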
Regions
=======

The ``ice`` driver implements the following regions for accessing internal
device data.

.. list-table:: regions implemented
    :widths: 15 85

    * - Name
      - Description
    * - ``nvm-flash``
      - The contents of the entire flash chip, sometimes referred to as
        the device's Non-Volatile Memory (NVM).
    * - ``shadow-ram``
      - The contents of the Shadow RAM, which is loaded from the beginning
        of the flash. Although the contents are primarily from the flash,
        this area also contains data generated during device boot which is
        not stored in flash.
    * - ``device-caps``
      - The contents of the device firmware's capabilities buffer. Useful for
        determining the current state and configuration of the device.

Both the ``nvm-flash`` and ``shadow-ram`` regions can be accessed without a
snapshot. The ``device-caps`` region requires a snapshot, as its contents are
sent by firmware and can't be split into separate reads.

Users can request an immediate capture of a snapshot for all three regions
via the ``DEVLINK_CMD_REGION_NEW`` command.

.. code:: shell

    $ devlink region show
    pci/0000:01:00.0/nvm-flash: size 10485760 snapshot [] max 1
    pci/0000:01:00.0/device-caps: size 4096 snapshot [] max 10

    $ devlink region new pci/0000:01:00.0/nvm-flash snapshot 1

    $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1
    0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
    0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
    0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
    0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5

    $ devlink region read pci/0000:01:00.0/nvm-flash snapshot 1 address 0 length 16
    0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30

    $ devlink region delete pci/0000:01:00.0/nvm-flash snapshot 1

    $ devlink region new pci/0000:01:00.0/device-caps snapshot 1
    $ devlink region dump pci/0000:01:00.0/device-caps snapshot 1
    0000000000000000 01 00 01 00 00 00 00 00 01 00 00 00 00 00 00 00
    0000000000000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000020 02 00 02 01 32 03 00 00 0a 00 00 00 25 00 00 00
    0000000000000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000040 04 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000060 05 00 01 00 03 00 00 00 00 00 00 00 00 00 00 00
    0000000000000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000080 06 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000a0 08 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000c0 12 00 01 00 01 00 00 00 01 00 01 00 00 00 00 00
    00000000000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000e0 13 00 01 00 00 01 00 00 00 00 00 00 00 00 00 00
    00000000000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000100 14 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000120 15 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000140 16 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000160 17 00 01 00 06 00 00 00 00 00 00 00 00 00 00 00
    0000000000000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000180 18 00 01 00 01 00 00 00 01 00 00 00 08 00 00 00
    0000000000000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001a0 22 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    00000000000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001c0 40 00 01 00 00 08 00 00 08 00 00 00 00 00 00 00
    00000000000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001e0 41 00 01 00 00 08 00 00 00 00 00 00 00 00 00 00
    00000000000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000200 42 00 01 00 00 08 00 00 00 00 00 00 00 00 00 00
    0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    $ devlink region delete pci/0000:01:00.0/device-caps snapshot 1

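The snapshot, dump and delete steps shown above can be wrapped in a small helper so that a snapshot is always freed after it has been read. This is a minimal sketch; the helper name ``capture_region`` and the ``DEVLINK`` override variable are hypothetical.

.. code:: shell

    # Hypothetical helper: take a snapshot of REGION on DEV, dump it to
    # stdout, then delete the snapshot again.
    #   capture_region DEV REGION [SNAPSHOT_ID]
    # Set DEVLINK=echo to print the commands instead of running them.
    capture_region() {
        local dev=$1 region=$2 id=${3:-1} rc
        "${DEVLINK:-devlink}" region new "$dev/$region" snapshot "$id" || return
        "${DEVLINK:-devlink}" region dump "$dev/$region" snapshot "$id"
        rc=$?
        # Delete even if the dump failed, so the snapshot does not linger.
        "${DEVLINK:-devlink}" region delete "$dev/$region" snapshot "$id"
        return "$rc"
    }

For example, ``capture_region pci/0000:01:00.0 device-caps > caps.txt`` saves the capabilities dump. Deleting the snapshot matters because each region keeps only a limited number of snapshots: in the ``devlink region show`` output above, ``nvm-flash`` allows a single one (``max 1``).
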
Devlink Rate
============

The ``ice`` driver implements the devlink-rate API, which allows offloading
Hierarchical QoS to the hardware. It enables the user to group Virtual
Functions in a tree structure and assign the supported parameters
(``tx_share``, ``tx_max``, ``tx_priority`` and ``tx_weight``) to each node in
the tree. This gives the user control over how much bandwidth is allocated
to each VF group; the allocation is then enforced by the hardware.

This feature is assumed to be mutually exclusive with DCB performed in FW,
with ADQ, and with any other driver feature that would trigger changes in
QoS, for example the creation of a new traffic class. The driver prevents
DCB or ADQ configuration once the user has started making changes to the
nodes through the devlink-rate API; a driver reload is then necessary to
configure those features. Correspondingly, if ADQ or DCB is configured, the
driver does not export the hierarchy at all. If those features are enabled
after the hierarchy has been exported but before any changes have been made
to it, the driver removes the untouched hierarchy.

This feature also requires switchdev mode, because devlink-rate requires
devlink-port objects to be present, and those objects are only created in
switchdev mode.

When the driver is in switchdev mode, it exports its internal hierarchy as
soon as VFs are created. The root of the tree is always represented by
``node_0``, which cannot be deleted by the user. Leaf nodes and nodes with
children cannot be deleted either.

.. list-table:: Attributes supported
    :widths: 15 85

    * - Name
      - Description
    * - ``tx_max``
      - Maximum bandwidth to be consumed by the tree node. The rate limit is
        an absolute number specifying the maximum number of bytes a node may
        consume during the course of one second. Rate limiting ensures that
        a link will not oversaturate the receiver on the remote end, and
        also enforces an SLA between the subscriber and the network
        provider.
    * - ``tx_share``
      - Minimum bandwidth allocated to a tree node when it is not blocked.
        It specifies an absolute bandwidth. While ``tx_max`` defines the
        maximum bandwidth the node may consume, ``tx_share`` marks the
        committed bandwidth for the node.
    * - ``tx_priority``
      - Enables the strict priority arbiter among siblings. This
        arbitration scheme attempts to schedule nodes based on their
        priority as long as the nodes remain within their bandwidth limit.
        Range 0-7. Nodes with priority 7 have the highest priority and are
        selected first, while nodes with priority 0 have the lowest
        priority. Nodes that have the same priority are treated equally.
    * - ``tx_weight``
      - Enables the Weighted Fair Queuing (WFQ) arbitration scheme among
        siblings. This arbitration scheme can be used simultaneously with
        strict priority. Range 1-200. Only relative values matter for
        arbitration.

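The table above expresses ``tx_max`` in bytes per second, while the ``devlink`` utility accepts rate values with unit suffixes, such as ``500Mbps`` in the example below. The sketch below converts such a value into bytes per second. It assumes that ``devlink`` follows the same suffix rules as ``tc`` (suffixes matched case-insensitively, ``bit`` suffixes meaning bits per second, ``bps`` suffixes meaning bytes per second, and a bare number meaning bits per second); check devlink-rate(8) for your iproute2 version. The helper name ``rate_to_bytes`` is hypothetical, and only integer values with decimal prefixes are handled.

.. code:: shell

    # Hypothetical helper: convert a devlink rate value such as "500Mbps" or
    # "100mbit" into bytes per second, assuming tc-style unit suffixes:
    #   *bit -> bits per second, *bps -> BYTES per second, bare number -> bits.
    # Only integer values and decimal (SI) prefixes are handled here.
    rate_to_bytes() {
        local num=${1%%[!0-9]*} unit bits
        [ -n "$num" ] || return 1
        unit=$(printf '%s' "${1#"$num"}" | tr '[:upper:]' '[:lower:]')
        case $unit in
        ""|bit) bits=$num ;;
        kbit)   bits=$((num * 1000)) ;;
        mbit)   bits=$((num * 1000000)) ;;
        gbit)   bits=$((num * 1000000000)) ;;
        tbit)   bits=$((num * 1000000000000)) ;;
        bps)    bits=$((num * 8)) ;;
        kbps)   bits=$((num * 8000)) ;;
        mbps)   bits=$((num * 8000000)) ;;
        gbps)   bits=$((num * 8000000000)) ;;
        tbps)   bits=$((num * 8000000000000)) ;;
        *)      return 1 ;;
        esac
        echo $((bits / 8))
    }

Under these assumptions, ``rate_to_bytes 500Mbps`` prints ``500000000`` (that is, 4 Gbit/s), whereas ``rate_to_bytes 500mbit`` prints ``62500000``.
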
``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case,
nodes with the same priority form a WFQ subgroup within the sibling group,
and arbitration among them is based on their assigned weights.

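To illustrate why only relative weights matter, the sketch below computes the share that each member of one WFQ subgroup would receive under an idealized weighted fair queuing model, with all members backlogged and ``tx_share`` and ``tx_max`` ignored. It is not a model of the hardware scheduler; the helper name ``wfq_shares`` is hypothetical.

.. code:: shell

    # Hypothetical helper: idealized WFQ model. Given the tx_weight values of
    # the siblings in one WFQ subgroup (same tx_priority), print the
    # percentage of the subgroup's bandwidth each would get when all are
    # backlogged. Integer division truncates the results.
    wfq_shares() {
        local w sum=0
        for w in "$@"; do
            case $w in
            ''|*[!0-9]*) echo "wfq_shares: '$w' is not a number" >&2; return 1 ;;
            esac
            if [ "$w" -lt 1 ] || [ "$w" -gt 200 ]; then
                echo "wfq_shares: tx_weight $w is outside the range 1-200" >&2
                return 1
            fi
            sum=$((sum + w))
        done
        for w in "$@"; do
            echo $((100 * w / sum))
        done
    }

For example, ``wfq_shares 1 2 5`` prints ``12``, ``25`` and ``62``, and ``wfq_shares 10 20 50`` prints the same values, because scaling all weights by the same factor does not change the split.
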
.. code:: shell

    # enable switchdev
    $ devlink dev eswitch set pci/0000:4b:00.0 mode switchdev

    # creating VFs makes the driver export its internal hierarchy
    $ echo 2 > /sys/class/net/ens785np0/device/sriov_numvfs

    $ devlink port function rate show
    pci/0000:4b:00.0/node_25: type node parent node_24
    pci/0000:4b:00.0/node_24: type node parent node_0
    pci/0000:4b:00.0/node_32: type node parent node_31
    pci/0000:4b:00.0/node_31: type node parent node_30
    pci/0000:4b:00.0/node_30: type node parent node_16
    pci/0000:4b:00.0/node_19: type node parent node_18
    pci/0000:4b:00.0/node_18: type node parent node_17
    pci/0000:4b:00.0/node_17: type node parent node_16
    pci/0000:4b:00.0/node_14: type node parent node_5
    pci/0000:4b:00.0/node_5: type node parent node_3
    pci/0000:4b:00.0/node_13: type node parent node_4
    pci/0000:4b:00.0/node_12: type node parent node_4
    pci/0000:4b:00.0/node_11: type node parent node_4
    pci/0000:4b:00.0/node_10: type node parent node_4
    pci/0000:4b:00.0/node_9: type node parent node_4
    pci/0000:4b:00.0/node_8: type node parent node_4
    pci/0000:4b:00.0/node_7: type node parent node_4
    pci/0000:4b:00.0/node_6: type node parent node_4
    pci/0000:4b:00.0/node_4: type node parent node_3
    pci/0000:4b:00.0/node_3: type node parent node_16
    pci/0000:4b:00.0/node_16: type node parent node_15
    pci/0000:4b:00.0/node_15: type node parent node_0
    pci/0000:4b:00.0/node_2: type node parent node_1
    pci/0000:4b:00.0/node_1: type node parent node_0
    pci/0000:4b:00.0/node_0: type node
    pci/0000:4b:00.0/1: type leaf parent node_25
    pci/0000:4b:00.0/2: type leaf parent node_25

    # create a custom node
    $ devlink port function rate add pci/0000:4b:00.0/node_custom parent node_0

    # create a second custom node as a child of the first one
    $ devlink port function rate add pci/0000:4b:00.0/node_custom_1 parent node_custom

    # reassign the second VF to the newly created branch
    $ devlink port function rate set pci/0000:4b:00.0/2 parent node_custom_1

    # assign tx_weight to the VF
    $ devlink port function rate set pci/0000:4b:00.0/2 tx_weight 5

    # assign tx_share to the VF
    $ devlink port function rate set pci/0000:4b:00.0/2 tx_share 500Mbps

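Because nodes that still have children cannot be deleted, the custom hierarchy from the example above has to be removed bottom-up, after the VF has been moved out of it. A minimal sketch, assuming the driver accepts moving VF 2 back under its original parent ``node_25`` from the ``rate show`` output; the helper name ``undo_custom_rate_tree`` and the ``DEVLINK`` override variable are hypothetical.

.. code:: shell

    # Hypothetical helper: undo the custom hierarchy created in the example.
    # VF 2 is moved back to its original parent first, because a node that
    # still has children cannot be deleted; the custom nodes are then removed
    # bottom-up (child before parent).
    # Set DEVLINK=echo to print the commands instead of running them.
    undo_custom_rate_tree() {
        local dev=${1:-pci/0000:4b:00.0}
        "${DEVLINK:-devlink}" port function rate set "$dev/2" parent node_25 &&
            "${DEVLINK:-devlink}" port function rate del "$dev/node_custom_1" &&
            "${DEVLINK:-devlink}" port function rate del "$dev/node_custom"
    }
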