.. SPDX-License-Identifier: GPL-2.0

===================
ice devlink support
===================

This document describes the devlink features implemented by the ``ice``
device driver.

Parameters
==========

.. list-table:: Generic parameters implemented

    * - Name
      - Mode
      - Notes
    * - ``enable_roce``
      - runtime
      - mutually exclusive with ``enable_iwarp``
    * - ``enable_iwarp``
      - runtime
      - mutually exclusive with ``enable_roce``

Info versions
=============

The ``ice`` driver reports the following versions

.. list-table:: devlink info versions implemented
    :widths: 5 5 5 90

    * - Name
      - Type
      - Example
      - Description
    * - ``board.id``
      - fixed
      - K65390-000
      - The Product Board Assembly (PBA) identifier of the board.
    * - ``cgu.id``
      - fixed
      - 36
      - The Clock Generation Unit (CGU) hardware revision identifier.
    * - ``fw.mgmt``
      - running
      - 2.1.7
      - 3-digit version number of the management firmware running on the
        Embedded Management Processor of the device. It controls the PHY,
        link, access to device resources, etc. Intel documentation refers to
        this as the EMP firmware.
    * - ``fw.mgmt.api``
      - running
      - 1.5.1
      - 3-digit version number (major.minor.patch) of the API exported over
        the AdminQ by the management firmware. Used by the driver to
        identify what commands are supported. Historical versions of the
        kernel only displayed a 2-digit version number (major.minor).
    * - ``fw.mgmt.build``
      - running
      - 0x305d955f
      - Unique identifier of the source for the management firmware.
    * - ``fw.undi``
      - running
      - 1.2581.0
      - Version of the Option ROM containing the UEFI driver. The version is
        reported in ``major.minor.patch`` format. The major version is
        incremented whenever a major breaking change occurs, or when the
        minor version would overflow.
        The minor version is incremented for
        non-breaking changes and reset to 1 when the major version is
        incremented. The patch version is normally 0 but is incremented when
        a fix is delivered as a patch against an older base Option ROM.
    * - ``fw.psid.api``
      - running
      - 0.80
      - Version defining the format of the flash contents.
    * - ``fw.bundle_id``
      - running
      - 0x80002ec0
      - Unique identifier of the firmware image file that was loaded onto
        the device. Also referred to as the EETRACK identifier of the NVM.
    * - ``fw.app.name``
      - running
      - ICE OS Default Package
      - The name of the DDP package that is active in the device. The DDP
        package is loaded by the driver during initialization. Each
        variation of the DDP package has a unique name.
    * - ``fw.app``
      - running
      - 1.3.1.0
      - The version of the DDP package that is active in the device. Note
        that both the name (as reported by ``fw.app.name``) and version are
        required to uniquely identify the package.
    * - ``fw.app.bundle_id``
      - running
      - 0xc0000001
      - Unique identifier for the DDP package loaded in the device. Also
        referred to as the DDP Track ID. Can be used to uniquely identify
        the specific DDP package.
    * - ``fw.netlist``
      - running
      - 1.1.2000-6.7.0
      - The version of the netlist module. This module defines the device's
        Ethernet capabilities and default settings, and is used by the
        management firmware as part of managing link and device
        connectivity.
    * - ``fw.netlist.build``
      - running
      - 0xee16ced7
      - The first 4 bytes of the hash of the netlist module contents.
    * - ``fw.cgu``
      - running
      - 8032.16973825.6021
      - The version of the Clock Generation Unit (CGU). Format:
        <CGU type>.<configuration version>.<firmware version>.

Flash Update
============

The ``ice`` driver implements support for flash update using the
``devlink-flash`` interface.
It supports updating the device flash using a
combined flash image that contains the ``fw.mgmt``, ``fw.undi``, and
``fw.netlist`` components.

.. list-table:: List of supported overwrite modes
    :widths: 5 95

    * - Bits
      - Behavior
    * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS``
      - Do not preserve settings stored in the flash components being
        updated. This includes overwriting the port configuration that
        determines the number of physical functions the device will
        initialize with.
    * - ``DEVLINK_FLASH_OVERWRITE_SETTINGS`` and ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS``
      - Do not preserve either settings or identifiers. Overwrite everything
        in the flash with the contents from the provided image, without
        performing any preservation. This includes overwriting device
        identifying fields such as the MAC address, VPD area, and device
        serial number. It is expected that this combination be used with an
        image customized for the specific device.

The ice hardware does not support overwriting only identifiers while
preserving settings, and thus ``DEVLINK_FLASH_OVERWRITE_IDENTIFIERS`` on its
own will be rejected. If no overwrite mask is provided, the firmware will be
instructed to preserve all settings and identifying fields when updating.

Reload
======

The ``ice`` driver supports activating new firmware after a flash update
using ``DEVLINK_CMD_RELOAD`` with the ``DEVLINK_RELOAD_ACTION_FW_ACTIVATE``
action.

.. code:: shell

    $ devlink dev reload pci/0000:01:00.0 action fw_activate

The new firmware is activated by issuing a device specific Embedded
Management Processor reset which requests the device to reset and reload the
EMP firmware image.

The driver does not currently support reloading the driver via
``DEVLINK_RELOAD_ACTION_DRIVER_REINIT``.
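
As a rough sketch of how the overwrite modes described above could map onto
``devlink dev flash`` invocations, the script below only prints the commands
rather than running them. The ``flash_cmd`` helper, its mode names and the
image name ``ice-fw-update.bin`` are hypothetical and exist only for
illustration; the device address is the one used in the reload example.

.. code:: shell

    #!/bin/sh
    # Sketch only: print the devlink commands for a flash update followed
    # by firmware activation. Nothing is executed against the device.

    DEV="pci/0000:01:00.0"        # device address from the reload example
    IMAGE="ice-fw-update.bin"     # hypothetical image name (assumption)

    # Print the flash command for one overwrite mode:
    #   none        no overwrite mask, settings and identifiers preserved
    #   settings    DEVLINK_FLASH_OVERWRITE_SETTINGS
    #   all         settings and identifiers together
    #   identifiers refused, since ice rejects identifiers on their own
    flash_cmd() {
        case "$1" in
        none)
            echo "devlink dev flash $DEV file $IMAGE" ;;
        settings)
            echo "devlink dev flash $DEV file $IMAGE overwrite settings" ;;
        all)
            echo "devlink dev flash $DEV file $IMAGE overwrite settings overwrite identifiers" ;;
        identifiers)
            echo "identifiers-only overwrite is rejected by ice" >&2
            return 1 ;;
        *)
            echo "unknown overwrite mode: $1" >&2
            return 2 ;;
        esac
    }

    flash_cmd settings
    echo "devlink dev reload $DEV action fw_activate"

Printing the commands keeps the sketch safe to run on any machine. Replacing
each ``echo`` with the command itself would perform the update, which should
only be done with an image intended for the specific device.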

Port split
==========

The ``ice`` driver supports port splitting only for port 0, as the FW has
a predefined set of available port split options for the whole device.

A system reboot is required for port split to be applied.

The following command will select the port split option with 4 ports:

.. code:: shell

    $ devlink port split pci/0000:16:00.0/0 count 4

The list of all available port options will be printed to dynamic debug after
each ``split`` and ``unsplit`` command. The first option is the default.

.. code:: shell

    ice 0000:16:00.0: Available port split options and max port speeds (Gbps):
    ice 0000:16:00.0: Status  Split      Quad 0          Quad 1
    ice 0000:16:00.0:         count  L0  L1  L2  L3  L4  L5  L6  L7
    ice 0000:16:00.0: Active  2     100   -   -   -  100   -   -   -
    ice 0000:16:00.0:         2      50   -  50   -   -   -   -   -
    ice 0000:16:00.0: Pending 4      25  25  25  25   -   -   -   -
    ice 0000:16:00.0:         4      25  25   -   -  25  25   -   -
    ice 0000:16:00.0:         8      10  10  10  10  10  10  10  10
    ice 0000:16:00.0:         1     100   -   -   -   -   -   -   -

There could be multiple FW port options with the same port split count. When
the same port split count request is issued again, the next FW port option with
the same port split count will be selected.

``devlink port unsplit`` will select the option with a split count of 1. If
there is no FW option available with split count 1, you will receive an error.

Regions
=======

The ``ice`` driver implements the following regions for accessing internal
device data.

.. list-table:: regions implemented
    :widths: 15 85

    * - Name
      - Description
    * - ``nvm-flash``
      - The contents of the entire flash chip, sometimes referred to as
        the device's Non Volatile Memory.
    * - ``shadow-ram``
      - The contents of the Shadow RAM, which is loaded from the beginning
        of the flash.
        Although the contents are primarily from the flash,
        this area also contains data generated during device boot which is
        not stored in flash.
    * - ``device-caps``
      - The contents of the device firmware's capabilities buffer. Useful to
        determine the current state and configuration of the device.

Both the ``nvm-flash`` and ``shadow-ram`` regions can be accessed without a
snapshot. The ``device-caps`` region requires a snapshot as the contents are
sent by firmware and can't be split into separate reads.

Users can request an immediate capture of a snapshot for all three regions
via the ``DEVLINK_CMD_REGION_NEW`` command.

.. code:: shell

    $ devlink region show
    pci/0000:01:00.0/nvm-flash: size 10485760 snapshot [] max 1
    pci/0000:01:00.0/device-caps: size 4096 snapshot [] max 10

    $ devlink region new pci/0000:01:00.0/nvm-flash snapshot 1

    $ devlink region dump pci/0000:01:00.0/nvm-flash snapshot 1
    0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30
    0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8
    0000000000000020 0016 0bb8 0016 1720 0000 0000 c00f 3ffc
    0000000000000030 bada cce5 bada cce5 bada cce5 bada cce5

    $ devlink region read pci/0000:01:00.0/nvm-flash snapshot 1 address 0 length 16
    0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30

    $ devlink region delete pci/0000:01:00.0/nvm-flash snapshot 1

    $ devlink region new pci/0000:01:00.0/device-caps snapshot 1
    $ devlink region dump pci/0000:01:00.0/device-caps snapshot 1
    0000000000000000 01 00 01 00 00 00 00 00 01 00 00 00 00 00 00 00
    0000000000000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000020 02 00 02 01 32 03 00 00 0a 00 00 00 25 00 00 00
    0000000000000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000040 04 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000060 05 00 01 00 03 00 00 00 00 00 00 00 00 00 00 00
    0000000000000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000080 06 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000a0 08 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000c0 12 00 01 00 01 00 00 00 01 00 01 00 00 00 00 00
    00000000000000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000000e0 13 00 01 00 00 01 00 00 00 00 00 00 00 00 00 00
    00000000000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000100 14 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000120 15 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000140 16 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    0000000000000150 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000160 17 00 01 00 06 00 00 00 00 00 00 00 00 00 00 00
    0000000000000170 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000180 18 00 01 00 01 00 00 00 01 00 00 00 08 00 00 00
    0000000000000190 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001a0 22 00 01 00 01 00 00 00 00 00 00 00 00 00 00 00
    00000000000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001c0 40 00 01 00 00 08 00 00 08 00 00 00 00 00 00 00
    00000000000001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    00000000000001e0 41 00 01 00 00 08 00 00 00 00 00 00 00 00 00 00
    00000000000001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0000000000000200 42 00 01 00 00 08 00 00 00 00 00 00 00 00 00 00
    0000000000000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

    $ devlink region delete pci/0000:01:00.0/device-caps snapshot 1

Devlink Rate
============

The ``ice`` driver implements the devlink-rate API. It allows Hierarchical
QoS to be offloaded to the hardware. The user can group Virtual Functions
in a tree structure and assign the supported parameters ``tx_share``,
``tx_max``, ``tx_priority`` and ``tx_weight`` to each node in the tree. In
effect, the user controls how much bandwidth is allocated to each VF
group, and the HW then enforces that allocation.

It is assumed that this feature is mutually exclusive with DCB performed
in FW and ADQ, or any driver feature that would trigger changes in QoS,
for example creation of a new traffic class. The driver will prevent DCB
or ADQ configuration once the user has started making changes to the nodes
using the devlink-rate API. To configure those features a driver reload is
necessary. Correspondingly, if ADQ or DCB is configured, the driver won't
export the hierarchy at all, or will remove the untouched hierarchy if
those features are enabled after the hierarchy is exported but before any
changes are made.

This feature also depends on switchdev being enabled in the system.
This is required because devlink-rate needs devlink-port objects to be
present, and those objects are only created in switchdev mode.

If the driver is set to switchdev mode, it will export the internal
hierarchy the moment VFs are created. The root of the tree is always
represented by ``node_0``. This node can't be deleted by the user. Leaf
nodes and nodes with children also can't be deleted.

.. list-table:: Attributes supported
    :widths: 15 85

    * - Name
      - Description
    * - ``tx_max``
      - Maximum bandwidth to be consumed by the tree node. The rate limit is
        an absolute number specifying the maximum number of bytes a node may
        consume during the course of one second.
        The rate limit guarantees
        that a link will not oversaturate the receiver on the remote end
        and also enforces an SLA between the subscriber and network
        provider.
    * - ``tx_share``
      - Minimum bandwidth allocated to a tree node when it is not blocked.
        It specifies an absolute BW. While ``tx_max`` defines the maximum
        bandwidth the node may consume, ``tx_share`` marks the committed BW
        for the node.
    * - ``tx_priority``
      - Allows use of a strict priority arbiter among siblings. This
        arbitration scheme attempts to schedule nodes based on their
        priority as long as the nodes remain within their bandwidth limit.
        Range 0-7. Nodes with priority 7 have the highest priority and are
        selected first, while nodes with priority 0 have the lowest
        priority. Nodes that have the same priority are treated equally.
    * - ``tx_weight``
      - Allows use of the Weighted Fair Queuing arbitration scheme among
        siblings. This arbitration scheme can be used simultaneously with
        strict priority. Range 1-200. Only relative values matter for
        arbitration.

``tx_priority`` and ``tx_weight`` can be used simultaneously. In that case
nodes with the same priority form a WFQ subgroup in the sibling group
and arbitration among them is based on assigned weights.

.. code:: shell

    # enable switchdev
    $ devlink dev eswitch set pci/0000:4b:00.0 mode switchdev

    # at this point driver should export internal hierarchy
    $ echo 2 > /sys/class/net/ens785np0/device/sriov_numvfs

    $ devlink port function rate show
    pci/0000:4b:00.0/node_25: type node parent node_24
    pci/0000:4b:00.0/node_24: type node parent node_0
    pci/0000:4b:00.0/node_32: type node parent node_31
    pci/0000:4b:00.0/node_31: type node parent node_30
    pci/0000:4b:00.0/node_30: type node parent node_16
    pci/0000:4b:00.0/node_19: type node parent node_18
    pci/0000:4b:00.0/node_18: type node parent node_17
    pci/0000:4b:00.0/node_17: type node parent node_16
    pci/0000:4b:00.0/node_14: type node parent node_5
    pci/0000:4b:00.0/node_5: type node parent node_3
    pci/0000:4b:00.0/node_13: type node parent node_4
    pci/0000:4b:00.0/node_12: type node parent node_4
    pci/0000:4b:00.0/node_11: type node parent node_4
    pci/0000:4b:00.0/node_10: type node parent node_4
    pci/0000:4b:00.0/node_9: type node parent node_4
    pci/0000:4b:00.0/node_8: type node parent node_4
    pci/0000:4b:00.0/node_7: type node parent node_4
    pci/0000:4b:00.0/node_6: type node parent node_4
    pci/0000:4b:00.0/node_4: type node parent node_3
    pci/0000:4b:00.0/node_3: type node parent node_16
    pci/0000:4b:00.0/node_16: type node parent node_15
    pci/0000:4b:00.0/node_15: type node parent node_0
    pci/0000:4b:00.0/node_2: type node parent node_1
    pci/0000:4b:00.0/node_1: type node parent node_0
    pci/0000:4b:00.0/node_0: type node
    pci/0000:4b:00.0/1: type leaf parent node_25
    pci/0000:4b:00.0/2: type leaf parent node_25

    # let's create some custom node
    $ devlink port function rate add pci/0000:4b:00.0/node_custom parent node_0

    # second custom node
    $ devlink port function rate add pci/0000:4b:00.0/node_custom_1 parent node_custom

    # reassign second VF to newly created branch
    $ devlink port function rate set pci/0000:4b:00.0/2 parent node_custom_1

    # assign tx_weight to the VF
    $ devlink port function rate set pci/0000:4b:00.0/2 tx_weight 5

    # assign tx_share to the VF
    $ devlink port function rate set pci/0000:4b:00.0/2 tx_share 500Mbps
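
The exported hierarchy is printed as a flat list, which makes it hard to see
where a given VF sits in the tree. The following is a minimal sketch,
assuming the ``rate show`` output format shown in the listing above, that
walks from one entry up to the root. ``rate_path`` is a hypothetical helper
and not part of devlink; the sample lines are copied from the listing.

.. code:: shell

    #!/bin/sh
    # Hypothetical helper: print the chain of parents for one entry.
    #   stdin: output of "devlink port function rate show"
    #   $1:    entry name after the last "/", e.g. "2" or "node_25"
    rate_path() {
        awk -v start="$1" '
        {
            name = $1
            sub(/:$/, "", name)        # drop the trailing colon
            sub(/^.*\//, "", name)     # keep the part after the last "/"
            parent[name] = ""
            for (i = 2; i < NF; i++)
                if ($i == "parent")
                    parent[name] = $(i + 1)
        }
        END {
            node = start
            path = node
            # Follow parent links until an entry without a parent (the root).
            while ((node in parent) && parent[node] != "") {
                node = parent[node]
                path = path " -> " node
            }
            print path
        }'
    }

    # Sample lines copied from the listing above. On a live system use:
    #   devlink port function rate show | rate_path 2
    printf '%s\n' \
        'pci/0000:4b:00.0/node_25: type node parent node_24' \
        'pci/0000:4b:00.0/node_24: type node parent node_0' \
        'pci/0000:4b:00.0/node_0: type node' \
        'pci/0000:4b:00.0/2: type leaf parent node_25' |
        rate_path 2

For the sample input this prints ``2 -> node_25 -> node_24 -> node_0``, that
is, the second VF before it is moved under ``node_custom_1``. The helper
matches on the name after the last ``/`` only, so it assumes the output of a
single device.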