.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)

====================================
Marvell OcteonTx2 RVU Kernel Drivers
====================================

Copyright (c) 2020 Marvell International Ltd.

Contents
========

- `Overview`_
- `Drivers`_
- `Basic packet flow`_
- `Devlink health reporters`_
- `Quality of service`_
- `RVU representors`_

Overview
========

The resource virtualization unit (RVU) on Marvell's OcteonTX2 SoC maps HW
resources from the network, crypto and other functional blocks into
PCI-compatible physical and virtual functions. Each functional block in
turn has multiple local functions (LFs) for provisioning to PCI devices.
RVU supports multiple PCIe SRIOV physical functions (PFs) and virtual
functions (VFs). PF0 is called the administrative / admin function (AF)
and has privileges to provision the RVU functional blocks' LFs to each of
the PFs/VFs.

RVU managed networking functional blocks
 - Network pool or buffer allocator (NPA)
 - Network interface controller (NIX)
 - Network parser CAM (NPC)
 - Schedule/Synchronize/Order unit (SSO)
 - Loopback interface (LBK)

RVU managed non-networking functional blocks
 - Crypto accelerator (CPT)
 - Scheduled timers unit (TIM)
 - Schedule/Synchronize/Order unit (SSO)
   Used for both networking and non-networking use cases

Resource provisioning examples
 - A PF/VF with NIX-LF & NPA-LF resources works as a pure network device.
 - A PF/VF with a CPT-LF resource works as a pure crypto offload device.

RVU functional blocks are highly configurable as per software requirements.

Firmware sets up the following before the kernel boots
 - Enables the required number of RVU PFs based on the number of physical links.
 - The number of VFs per PF is either static or configurable at compile time.
   Based on this config, firmware assigns VFs to each of the PFs.
 - Also assigns MSIX vectors to each PF and VF.
 - These are not changed after the kernel boots.
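
The AF (PF0) and the other RVU PFs/VFs enumerated by firmware appear to the
host as regular PCI devices, so this provisioning can be verified with
standard tools, e.g. by listing devices with the Cavium/Marvell vendor ID
(output is omitted here as it varies with platform and firmware configuration)::

	# lspci -d 177d: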

Drivers
=======

The Linux kernel has multiple drivers registering to different PFs and VFs
of RVU. With respect to networking there are three flavours of drivers.

Admin Function driver
---------------------

As mentioned above, RVU PF0 is called the admin function (AF). This driver
supports resource provisioning and configuration of the functional blocks.
It doesn't handle any I/O; it sets up a few basic things, but most of the
functionality is achieved via configuration requests from PFs and VFs.

PFs/VFs communicate with the AF via a shared memory region (mailbox). Upon
receiving requests, the AF does resource provisioning and other HW configuration.
The AF is always attached to the host kernel, but PFs and their VFs may be used
by the host kernel itself, or attached to VMs or to userspace applications like
DPDK etc. So the AF has to handle provisioning/configuration requests sent
by any device from any domain.

The AF driver also interacts with the underlying firmware to
 - Manage physical ethernet links, i.e. CGX LMACs.
 - Retrieve information like speed, duplex, autoneg etc.
 - Retrieve PHY EEPROM and stats.
 - Configure FEC, PAM modes.
 - etc.

From a pure networking standpoint the AF driver supports the following functionality.
 - Map a physical link to an RVU PF to which a netdev is registered.
 - Attach NIX and NPA block LFs to an RVU PF/VF which provide buffer pools, RQs, SQs
   for regular networking functionality.
 - Flow control (pause frames) enable/disable/config.
 - HW PTP timestamping related config.
 - NPC parser profile config, basically how to parse the packet and what info to extract.
 - NPC extract profile config, what to extract from the packet to match data in MCAM entries.
 - Manage NPC MCAM entries, upon request can frame and install requested packet forwarding rules.
 - Defines receive side scaling (RSS) algorithms.
 - Defines segmentation offload algorithms (e.g. TSO).
 - VLAN stripping, capture and insertion config.
 - SSO and TIM blocks config which provide packet scheduling support.
 - Debugfs support, to check current resource provisioning, current status of
   NPA pools, NIX RQs, SQs and CQs, various stats etc which help in debugging issues.
 - And many more.
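
As an illustration, the debugfs interface exposed by the AF driver can be used
to inspect the current resource provisioning (the directory and file names
below are assumptions and may differ between kernel versions)::

	# mount -t debugfs none /sys/kernel/debug
	# cat /sys/kernel/debug/octeontx2/rsrc_alloc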

Physical Function driver
------------------------

This RVU PF handles IO, is mapped to a physical ethernet link and this
driver registers a netdev. It supports SR-IOV. As said above, this driver
communicates with the AF via a mailbox. To retrieve information about the
physical link this driver talks to the AF; the AF gets that info from firmware
and responds back, i.e. this driver cannot talk to the firmware directly.

Supports ethtool for configuring links, RSS, queue count, queue size,
flow control and ntuple filters, dumping the PHY EEPROM, configuring FEC etc.
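
A few representative ethtool invocations are shown below (eth0 is a placeholder
interface name; exact support depends on the silicon and link type)::

	# ethtool -L eth0 rx 8 tx 8           # queue (channel) count
	# ethtool -G eth0 rx 1024 tx 1024     # queue (ring) sizes
	# ethtool -A eth0 rx on tx on         # flow control (pause frames)
	# ethtool -N eth0 flow-type tcp4 dst-port 80 action 2   # ntuple filter
	# ethtool -m eth0                     # dump module EEPROM
	# ethtool --set-fec eth0 encoding rs  # configure FEC mode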

Virtual Function driver
-----------------------

There are two types of VFs: VFs that share the physical link with their parent
SR-IOV PF, and VFs which work in pairs using internal HW loopback channels (LBK).

Type1:
 - These VFs and their parent PF share a physical link and are used for outside communication.
 - VFs cannot communicate with the AF directly; they send a mbox message to the PF and the PF
   forwards that to the AF. The AF, after processing, responds to the PF and the PF forwards
   the reply to the VF.
 - From a functionality point of view there is no difference between PF and VF as the same type of
   HW resources are attached to both. But a few things can be configured only
   from the PF as the PF is treated as the owner/admin of the link.

Type2:
 - RVU PF0, i.e. the admin function, creates these VFs and maps them to the loopback block's channels.
 - A set of two VFs (VF0 & VF1, VF2 & VF3 and so on) works as a pair, i.e. packets sent out of
   VF0 will be received by VF1 and vice versa.
 - These VFs can be used by applications or virtual machines to communicate with each other
   without sending traffic outside. There is no switch present in the HW, hence the support
   for loopback VFs.
 - These communicate directly with the AF (PF0) via mbox.

Except for the IO channels or links used for packet reception and transmission there is
no other difference between these VF types. The AF driver takes care of the IO channel mapping,
hence the same VF driver works for both types of devices.
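
Type1 VFs are created through the standard SR-IOV sysfs interface of the
parent networking PF; a minimal sketch (the PCI address and VF count are
illustrative)::

	# echo 4 > /sys/bus/pci/devices/0002:02:00.0/sriov_numvfs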

Basic packet flow
=================

Ingress
-------

1. CGX LMAC receives a packet.
2. Forwards the packet to the NIX block.
3. It is then submitted to the NPC block for parsing and MCAM lookup to get the destination RVU device.
4. The NIX LF attached to the destination RVU device allocates a buffer from the RQ mapped buffer pool of the NPA block LF.
5. The RQ may be selected by RSS or by configuring an MCAM rule with an RQ number.
6. The packet is DMA'ed and the driver is notified.

Egress
------

1. The driver prepares a send descriptor and submits it to the SQ for transmission.
2. The SQ is already configured (by the AF) to transmit on a specific link/channel.
3. The SQ descriptor ring is maintained in buffers allocated from the SQ mapped pool of the NPA block LF.
4. The NIX block transmits the packet on the designated channel.
5. NPC MCAM entries can be installed to divert the packet onto a different channel.

Devlink health reporters
========================

NPA Reporters
-------------
The NPA reporters are responsible for reporting and recovering the following groups of errors:

1. GENERAL events

   - Error due to operation of unmapped PF.
   - Error due to disabled alloc/free for other HW blocks (NIX, SSO, TIM, DPI and AURA).

2. ERROR events

   - Fault due to NPA_AQ_INST_S read or NPA_AQ_RES_S write.
   - AQ Doorbell Error.

3. RAS events

   - RAS Error Reporting for NPA_AQ_INST_S/NPA_AQ_RES_S.

4. RVU events

   - Error due to unmapped slot.

Sample Output::

	~# devlink health
	pci/0002:01:00.0:
	  reporter hw_npa_intr
	      state healthy error 2872 recover 2872 last_dump_date 2020-12-10 last_dump_time 09:39:09 grace_period 0 auto_recover true auto_dump true
	  reporter hw_npa_gen
	      state healthy error 2872 recover 2872 last_dump_date 2020-12-11 last_dump_time 04:43:04 grace_period 0 auto_recover true auto_dump true
	  reporter hw_npa_err
	      state healthy error 2871 recover 2871 last_dump_date 2020-12-10 last_dump_time 09:39:17 grace_period 0 auto_recover true auto_dump true
	  reporter hw_npa_ras
	      state healthy error 0 recover 0 last_dump_date 2020-12-10 last_dump_time 09:32:40 grace_period 0 auto_recover true auto_dump true

Each reporter dumps the

 - Error Type
 - Error Register value
 - Reason in words

For example::

	~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_gen
	 NPA_AF_GENERAL:
	         NPA General Interrupt Reg : 1
	         NIX0: free disabled RX
	~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_intr
	 NPA_AF_RVU:
	         NPA RVU Interrupt Reg : 1
	         Unmap Slot Error
	~# devlink health dump show pci/0002:01:00.0 reporter hw_npa_err
	 NPA_AF_ERR:
	        NPA Error Interrupt Reg : 4096
	        AQ Doorbell Error


NIX Reporters
-------------
The NIX reporters are responsible for reporting and recovering the following groups of errors:

1. GENERAL events

   - Receive mirror/multicast packet drop due to insufficient buffer.
   - SMQ Flush operation.

2. ERROR events

   - Memory Fault due to WQE read/write from multicast/mirror buffer.
   - Receive multicast/mirror replication list error.
   - Receive packet on an unmapped PF.
   - Fault due to NIX_AQ_INST_S read or NIX_AQ_RES_S write.
   - AQ Doorbell Error.

3. RAS events

   - RAS Error Reporting for NIX Receive Multicast/Mirror Entry Structure.
   - RAS Error Reporting for WQE/Packet Data read from Multicast/Mirror Buffer.
   - RAS Error Reporting for NIX_AQ_INST_S/NIX_AQ_RES_S.

4. RVU events

   - Error due to unmapped slot.

Sample Output::

	~# devlink health
	pci/0002:01:00.0:
	  reporter hw_npa_intr
	    state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
	  reporter hw_npa_gen
	    state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
	  reporter hw_npa_err
	    state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
	  reporter hw_npa_ras
	    state healthy error 0 recover 0 grace_period 0 auto_recover true auto_dump true
	  reporter hw_nix_intr
	    state healthy error 1121 recover 1121 last_dump_date 2021-01-19 last_dump_time 05:42:26 grace_period 0 auto_recover true auto_dump true
	  reporter hw_nix_gen
	    state healthy error 949 recover 949 last_dump_date 2021-01-19 last_dump_time 05:42:43 grace_period 0 auto_recover true auto_dump true
	  reporter hw_nix_err
	    state healthy error 1147 recover 1147 last_dump_date 2021-01-19 last_dump_time 05:42:59 grace_period 0 auto_recover true auto_dump true
	  reporter hw_nix_ras
	    state healthy error 409 recover 409 last_dump_date 2021-01-19 last_dump_time 05:43:16 grace_period 0 auto_recover true auto_dump true

Each reporter dumps the

 - Error Type
 - Error Register value
 - Reason in words

For example::

	~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_intr
	 NIX_AF_RVU:
	        NIX RVU Interrupt Reg : 1
	        Unmap Slot Error
	~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_gen
	 NIX_AF_GENERAL:
	        NIX General Interrupt Reg : 1
	        Rx multicast pkt drop
	~# devlink health dump show pci/0002:01:00.0 reporter hw_nix_err
	 NIX_AF_ERR:
	        NIX Error Interrupt Reg : 64
	        Rx on unmapped PF_FUNC
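
The reporters can also be managed with the generic devlink health commands,
e.g. to tune the grace period, clear a stored dump or trigger recovery
manually (the reporter name and values below are only illustrative)::

	~# devlink health set pci/0002:01:00.0 reporter hw_nix_intr grace_period 500
	~# devlink health dump clear pci/0002:01:00.0 reporter hw_nix_intr
	~# devlink health recover pci/0002:01:00.0 reporter hw_nix_intr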


Quality of service
==================


Hardware algorithms used in scheduling
--------------------------------------

The OcteonTx2 silicon and CN10K transmit interface consists of five transmit levels
starting from SMQ/MDQ, then TL4 down to TL1. Each packet traverses the MDQ and
TL4 to TL1 levels. Each level contains an array of queues to support scheduling and
shaping. The hardware uses the algorithms below depending on the priority of the
scheduler queues. Once the user creates tc classes with different priorities, the
driver configures the schedulers allocated to the class with the specified priority
along with the rate-limiting configuration.

1. Strict Priority

      -  Once packets are submitted to MDQ, hardware picks all active MDQs having different priorities
         using strict priority.

2. Round Robin

      - Active MDQs having the same priority level are chosen using round robin.


Setup HTB offload
-----------------

1. Enable HW TC offload on the interface::

        # ethtool -K <interface> hw-tc-offload on

2. Create htb root::

        # tc qdisc add dev <interface> clsact
        # tc qdisc replace dev <interface> root handle 1: htb offload

3. Create tc classes with different priorities::

        # tc class add dev <interface> parent 1: classid 1:1 htb rate 10Gbit prio 1

        # tc class add dev <interface> parent 1: classid 1:2 htb rate 10Gbit prio 7

4. Create tc classes with the same priority and different quantum::

        # tc class add dev <interface> parent 1: classid 1:1 htb rate 10Gbit prio 2 quantum 409600

        # tc class add dev <interface> parent 1: classid 1:2 htb rate 10Gbit prio 2 quantum 188416

        # tc class add dev <interface> parent 1: classid 1:3 htb rate 10Gbit prio 2 quantum 32768
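
Traffic can then be steered into a given class with a tc filter; the example
below is only a sketch (the flower match fields are illustrative)::

        # tc filter add dev <interface> parent 1: protocol ip flower ip_proto tcp dst_port 80 classid 1:1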


RVU Representors
================

The RVU representor driver adds support for creating representor devices for
the RVU PFs' VFs in the system. Representor devices are created when the user
enables switchdev mode.
Switchdev mode can be enabled either before or after setting up SRIOV numVFs.
All representor devices share a single NIX LF, but each has dedicated Rx/Tx
queues. The RVU PF representor driver registers a separate netdev for each
Rx/Tx queue pair.

The current HW does not have a built-in switch which can do L2 learning and
forward packets between the representee and the representor. Hence, the packet
path between a representee and its representor is achieved by setting up
appropriate NPC MCAM filters.
Transmitted packets matching these filters are looped back through the hardware
loopback channel/interface (i.e. instead of being sent out of the MAC interface),
where they again match the installed filters and are forwarded accordingly.
This way the representee => representor and representor => representee packet
paths are achieved. These rules get installed when representors are created
and are activated/deactivated based on the representor/representee interface state.

Usage example:

 - Change device to switchdev mode::

	# devlink dev eswitch set pci/0002:1c:00.0 mode switchdev

 - List of representor devices on the system::

	# ip link show
	Rpf1vf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether f6:43:83:ee:26:21 brd ff:ff:ff:ff:ff:ff
	Rpf1vf1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether 12:b2:54:0e:24:54 brd ff:ff:ff:ff:ff:ff
	Rpf1vf2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether 4a:12:c4:4c:32:62 brd ff:ff:ff:ff:ff:ff
	Rpf1vf3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether ca:cb:68:0e:e2:6e brd ff:ff:ff:ff:ff:ff
	Rpf2vf0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 1000 link/ether 06:cc:ad:b4:f0:93 brd ff:ff:ff:ff:ff:ff


To delete the representor devices from the system, change the device to legacy mode.

 - Change device to legacy mode::

	# devlink dev eswitch set pci/0002:1c:00.0 mode legacy

RVU representors can be managed using the devlink port interface
(see :ref:`Documentation/networking/devlink/devlink-port.rst <devlink_port>`).

 - Show devlink ports of representors::

	# devlink port
	pci/0002:1c:00.0/0: type eth netdev Rpf1vf0 flavour physical port 0 splittable false
	pci/0002:1c:00.0/1: type eth netdev Rpf1vf1 flavour pcivf controller 0 pfnum 1 vfnum 1 external false splittable false
	pci/0002:1c:00.0/2: type eth netdev Rpf1vf2 flavour pcivf controller 0 pfnum 1 vfnum 2 external false splittable false
	pci/0002:1c:00.0/3: type eth netdev Rpf1vf3 flavour pcivf controller 0 pfnum 1 vfnum 3 external false splittable false

Function attributes
===================

The RVU representor driver supports function attributes for representors.
Port function configuration of the representors is supported through the
devlink eswitch port.

MAC address setup
-----------------

The RVU representor driver supports the devlink port function attr mechanism to
set up the MAC address (refer to Documentation/networking/devlink/devlink-port.rst).

 - To set up the MAC address for port 2::

	# devlink port function set pci/0002:1c:00.0/2 hw_addr 5c:a1:1b:5e:43:11
	# devlink port show pci/0002:1c:00.0/2
	pci/0002:1c:00.0/2: type eth netdev Rpf1vf2 flavour pcivf controller 0 pfnum 1 vfnum 2 external false splittable false
	function:
		hw_addr 5c:a1:1b:5e:43:11


TC offload
==========

The RVU representor driver implements support for offloading tc rules using port representors.

 - Drop packets with vlan id 3::

	# tc filter add dev Rpf1vf0 protocol 802.1Q parent ffff: flower vlan_id 3 vlan_ethtype ipv4 skip_sw action drop

 - Redirect IPv4 packets with vlan id 5 to eth1, after stripping the vlan header::

	# tc filter add dev Rpf1vf0 ingress protocol 802.1Q flower vlan_id 5 vlan_ethtype ipv4 skip_sw action vlan pop action mirred ingress redirect dev eth1
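
Installed filters and their statistics can then be inspected with tc
(output omitted)::

	# tc -s filter show dev Rpf1vf0 ingress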
434