.. SPDX-License-Identifier: GPL-2.0

===============
fwctl subsystem
===============

:Author: Jason Gunthorpe

Overview
========

Modern devices contain extensive amounts of FW, and in many cases, are largely
software-defined pieces of hardware. The evolution of this approach is largely a
reaction to Moore's Law where a chip tape out is now highly expensive, and the
chip design is extremely large. Replacing fixed HW logic with a flexible and
tightly coupled FW/HW combination is an effective risk mitigation against chip
respin. Problems in the HW design can be counteracted in device FW. This is
especially true for devices which present a stable and backwards compatible
interface to the operating system driver (such as NVMe).

The FW layer in devices has grown to incredible size and devices frequently
integrate clusters of fast processors to run it. For example, mlx5 devices have
over 30MB of FW code, and big configurations operate with over 1GB of FW managed
runtime state.

The availability of such a flexible layer has created quite a variety in the
industry where single pieces of silicon are now configurable software-defined
devices and can operate in substantially different ways depending on the need.
Further, we often see cases where specific sites wish to operate devices in ways
that are highly specialized and require applications that have been tailored to
their unique configuration.

Further, devices have become multi-functional and integrated to the point they
no longer fit neatly into the kernel's division of subsystems. Modern
multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
subsystems while sharing the underlying hardware using the auxiliary device
system.

All together this creates a challenge for the operating system, where devices
have an expansive FW environment that needs robust device-specific debugging
support, and FW-driven functionality that is not well suited to “generic”
interfaces. fwctl seeks to allow access to the full device functionality from
user space in the areas of debuggability, management, and first-boot/nth-boot
provisioning.

fwctl is aimed at the common device design pattern where the OS and FW
communicate via an RPC message layer constructed with a queue or mailbox scheme.
In this case the driver will typically have some layer to deliver RPC messages
and collect RPC responses from device FW. The in-kernel subsystem drivers that
operate the device for its primary purposes will use these RPCs to build their
drivers, but devices also usually have a set of ancillary RPCs that don't really
fit into any specific subsystem. For example, a HW RAID controller is primarily
operated by the block layer but also comes with a set of RPCs to administer the
construction of drives within the HW RAID.

In the past when devices were more single function, individual subsystems would
grow different approaches to solving some of these common problems. For instance
monitoring device health, manipulating its FLASH, debugging the FW,
provisioning, all have various unique interfaces across the kernel.

fwctl's purpose is to define a common set of limited rules, described below,
that allow user space to securely construct and execute RPCs inside device FW.
The rules serve as an agreement between the operating system and FW on how to
correctly design the RPC interface. As a uAPI the subsystem provides a thin
layer of discovery and a generic uAPI to deliver the RPCs and collect the
response. It supports a system of user space libraries and tools which will
use this interface to control the device using the device native protocols.

Scope of Action
---------------

fwctl drivers are strictly restricted to being a way to operate the device FW.
They are not an avenue to access random kernel internals or other operating
system SW states.

fwctl instances must operate on a well-defined device function, and the device
should have a well-defined security model for what scope within the physical
device the function is permitted to access. For instance, the most complex PCIe
device today may broadly have several function-level scopes:

 1. A privileged function with full access to the on-device global state and
    configuration

 2. Multiple hypervisor functions, each with control over itself and the child
    functions used with VMs

 3. Multiple VM functions tightly scoped within the VM

The device may create a logical parent/child relationship between these scopes.
For instance a child VM's FW may be within the scope of the hypervisor FW. It is
quite common in the VFIO world that the hypervisor environment has a complex
provisioning/profiling/configuration responsibility for the function VFIO
assigns to the VM.

Further, within the function, devices often have RPC commands that fall within
some general scopes of action (see enum fwctl_rpc_scope):

 1. Access to function & child configuration, FLASH, etc. that becomes live at a
    function reset. Access to function & child runtime configuration that is
    transparent or non-disruptive to any driver or VM.

 2. Read-only access to function debug information that may report on FW objects
    in the function & child, including FW objects owned by other kernel
    subsystems.

 3. Write access to function & child debug information strictly compatible with
    the principles of kernel lockdown and kernel integrity protection. Triggers
    a kernel Taint.

 4. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.

User space will provide a scope label on each RPC and the kernel must enforce the
above CAPs and taints based on that scope. A combination of kernel and FW can
enforce that RPCs are placed in the correct scope by user space.
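
As a rough illustration of the enforcement rule above, the following user-space model maps each of the four scopes to the capability and taint requirements listed. It is only a sketch: the ``sketch_`` names are hypothetical stand-ins for the values of enum fwctl_rpc_scope, the ``has_rawio`` flag stands in for a real CAP_SYS_RAWIO check, and the error codes are illustrative choices rather than what the kernel necessarily returns.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for the four scopes listed above; the real values
 * are defined by enum fwctl_rpc_scope in the uAPI header. */
enum sketch_scope {
	SKETCH_SCOPE_CONFIGURATION = 0,    /* item 1 */
	SKETCH_SCOPE_DEBUG_READ_ONLY = 1,  /* item 2 */
	SKETCH_SCOPE_DEBUG_WRITE = 2,      /* item 3: taints */
	SKETCH_SCOPE_DEBUG_WRITE_FULL = 3  /* item 4: taints, needs CAP_SYS_RAWIO */
};

/*
 * Decide whether an RPC labelled with @scope may proceed.
 * @has_rawio: whether the caller holds CAP_SYS_RAWIO (assumed input).
 * @taint:     set to true when the kernel would have to be tainted.
 * Returns 0 if allowed, -EPERM if the capability is missing, and
 * -EINVAL for an unknown scope (error codes chosen for illustration).
 */
int sketch_scope_check(unsigned int scope, bool has_rawio, bool *taint)
{
	*taint = false;

	switch (scope) {
	case SKETCH_SCOPE_CONFIGURATION:
	case SKETCH_SCOPE_DEBUG_READ_ONLY:
		return 0;
	case SKETCH_SCOPE_DEBUG_WRITE:
		*taint = true;
		return 0;
	case SKETCH_SCOPE_DEBUG_WRITE_FULL:
		if (!has_rawio)
			return -EPERM;
		*taint = true;
		return 0;
	default:
		return -EINVAL;
	}
}
```

The model rejects a full-debug request before setting the taint flag, so a caller without the capability leaves no side effect behind.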

Denied behavior
---------------

There are many things this interface must not allow user space to do (without a
Taint or CAP), broadly derived from the principles of kernel lockdown. Some
examples:

 1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with
    untrusted code, or otherwise compromise device or system security and
    integrity.

 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
    objects owned by kernel drivers.

 3. Directly configure or otherwise control kernel drivers. A subsystem kernel
    driver can react to the device configuration at function reset/driver load
    time, but otherwise must not be coupled to fwctl.

 4. Operate the HW in a way that overlaps with the core purpose of another
    primary kernel subsystem, such as read/write to LBAs, send/receive of
    network packets, or operate an accelerator's data plane.

fwctl is not a replacement for device direct access subsystems like uacce or
VFIO.

Operations exposed through fwctl's non-tainting interfaces should be fully
sharable with other users of the device. For instance exposing an RPC through
fwctl should never prevent a kernel subsystem from also concurrently using that
same RPC or hardware unit down the road. In such cases fwctl will be less
important than proper kernel subsystems that eventually emerge. Mistakes in this
area resulting in clashes will be resolved in favour of a kernel implementation.

fwctl User API
==============

.. kernel-doc:: include/uapi/fwctl/fwctl.h

sysfs Class
-----------

fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
operates the ioctl uAPI described above.

fwctl devices can be related to driver components in other subsystems through
sysfs::

    $ ls /sys/class/fwctl/fwctl0/device/infiniband/
    ibp0s10f0

    $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
    fwctl0/

    $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
    dev  device  power  subsystem  uevent
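
A tool that walks the sysfs class will usually want the matching character device path. The helper below is a minimal sketch of that mapping, based only on the fwctlNN naming described above. The function name is hypothetical, and treating NN as one or more decimal digits is an assumption.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the character device path for a sysfs class entry name such as
 * "fwctl0". Returns 0 on success, or -1 if @name is not "fwctl" followed
 * by decimal digits or if @buf is too small to hold the result.
 */
int sketch_fwctl_dev_path(const char *name, char *buf, size_t len)
{
	static const char prefix[] = "fwctl";
	const size_t plen = sizeof(prefix) - 1;
	const char *p;
	int n;

	/* Require the prefix and at least one character after it. */
	if (strncmp(name, prefix, plen) != 0 || name[plen] == '\0')
		return -1;

	/* Everything after the prefix must be a decimal digit. */
	for (p = name + plen; *p; p++)
		if (!isdigit((unsigned char)*p))
			return -1;

	n = snprintf(buf, len, "/dev/fwctl/%s", name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

In practice the names would come from iterating ``/sys/class/fwctl`` with ``opendir()`` and ``readdir()``, passing each ``d_name`` to the helper and skipping entries it rejects.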

User space Community
--------------------

Drawing inspiration from nvme-cli, participating in the kernel side must come
with a user space in a common TBD git tree, at a minimum to usefully operate the
kernel driver. Providing such an implementation is a pre-condition to merging a
kernel driver.

The goal is to build user space community around some of the shared problems
we all have, and ideally develop some common user space programs with some
starting themes of:

 - Device in-field debugging

 - HW provisioning

 - VFIO child device profiling before VM boot

 - Confidential Compute topics (attestation, secure provisioning)

that stretch across all subsystems in the kernel. fwupd is a great example of
how an excellent user space experience can emerge out of kernel-side diversity.

fwctl Kernel API
================

.. kernel-doc:: drivers/fwctl/main.c
   :export:
.. kernel-doc:: include/linux/fwctl.h

fwctl Driver design
-------------------

In many cases a fwctl driver is going to be part of a larger cross-subsystem
device possibly using the auxiliary_device mechanism. In that case several
subsystems are going to be sharing the same device and FW interface layer so the
device design must already provide for isolation and cooperation between kernel
subsystems. fwctl should fit into that same model.

Part of the driver should include a description of how its scope restrictions
and security model work. The driver and FW together must ensure that RPCs
provided by user space are mapped to the appropriate scope. If the validation is
done in the driver then the validation can read a 'command effects' report from
the device, or hardwire the enforcement. If the validation is done in the FW,
then the driver should pass the fwctl_rpc_scope to the FW along with the command.
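
The "hardwire the enforcement" option for driver-side validation can be pictured as a static table that records, for each FW opcode, the least permissive scope allowed to issue it. The sketch below is purely illustrative: the opcodes, the ``sketch_`` names, and the table contents are all hypothetical. It also assumes the scopes are ordered so that a more permissive scope may issue anything a less permissive one can, which the text above does not state.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scope labels, assumed ordered from least to most permissive. */
enum sketch_scope {
	SKETCH_SCOPE_CONFIGURATION = 0,
	SKETCH_SCOPE_DEBUG_READ_ONLY = 1,
	SKETCH_SCOPE_DEBUG_WRITE = 2,
	SKETCH_SCOPE_DEBUG_WRITE_FULL = 3
};

/* One row of a hardwired "command effects" table. */
struct sketch_cmd_effect {
	uint16_t opcode;              /* hypothetical FW opcode */
	enum sketch_scope min_scope;  /* least permissive scope allowed */
};

static const struct sketch_cmd_effect sketch_effects[] = {
	{ 0x0100, SKETCH_SCOPE_CONFIGURATION },
	{ 0x0200, SKETCH_SCOPE_DEBUG_READ_ONLY },
	{ 0x0300, SKETCH_SCOPE_DEBUG_WRITE },
	{ 0x0400, SKETCH_SCOPE_DEBUG_WRITE_FULL },
};

/* Return true if @opcode is known and @scope is permissive enough for it. */
bool sketch_rpc_allowed(uint16_t opcode, enum sketch_scope scope)
{
	size_t i;

	for (i = 0; i < sizeof(sketch_effects) / sizeof(sketch_effects[0]); i++)
		if (sketch_effects[i].opcode == opcode)
			return scope >= sketch_effects[i].min_scope;

	/* Opcodes missing from the table are rejected by default. */
	return false;
}
```

Rejecting unknown opcodes by default means a FW update that adds commands does not silently widen what user space can reach until the table is updated.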

The driver and FW must cooperate to ensure that either fwctl cannot allocate
any FW resources, or any resources it does allocate are freed on FD closure. A
driver primarily constructed around FW RPCs may find that its core PCI function
and RPC layer belongs under fwctl with auxiliary devices connecting to other
subsystems.
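
One way to picture the FD-closure rule is a per-FD context that records every FW handle allocated through it and releases them all when the FD goes away. The following is a plain user-space model under that assumption, not driver code. All names are hypothetical, and the point where a real driver would ask FW to destroy the object is only marked by a comment.

```c
#include <stdlib.h>

/* One FW resource handle allocated on behalf of an open file descriptor. */
struct sketch_res {
	unsigned int fw_handle;
	struct sketch_res *next;
};

/* Per-FD context: everything allocated through this FD hangs off it. */
struct sketch_fd_ctx {
	struct sketch_res *head;
	unsigned int freed;  /* number of handles released, for illustration */
};

/* Record a handle returned by FW so it can be released later. */
int sketch_track(struct sketch_fd_ctx *ctx, unsigned int fw_handle)
{
	struct sketch_res *r = malloc(sizeof(*r));

	if (!r)
		return -1;
	r->fw_handle = fw_handle;
	r->next = ctx->head;
	ctx->head = r;
	return 0;
}

/* Called on FD closure: release every handle still tracked by @ctx. */
void sketch_close(struct sketch_fd_ctx *ctx)
{
	while (ctx->head) {
		struct sketch_res *r = ctx->head;

		ctx->head = r->next;
		/* Hypothetical: ask FW to destroy r->fw_handle here. */
		free(r);
		ctx->freed++;
	}
}
```

Keeping the list on the FD context rather than on the device means one user closing its FD cannot affect resources another open FD still owns.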

Each device type must be mindful of Linux's philosophy for stable ABI. The FW
RPC interface does not have to meet a strictly stable ABI, but it does need to
meet an expectation that userspace tools that are deployed and in significant
use don't needlessly break. FW upgrade and kernel upgrade should keep widely
deployed tooling working.

Development and debugging focused RPCs under more permissive scopes can have
less stability if the tools using them are only run under exceptional
circumstances and not for every day use of the device. Debugging tools may even
require exact version matching as they may require something similar to DWARF
debug information from the FW binary.

Security Response
=================

The kernel remains the gatekeeper for this interface. If violations of the
scopes, security or isolation principles are found, we have options to let
devices fix them with a FW update, push a kernel patch to parse and block RPC
commands, or push a kernel patch to block entire firmware versions/devices.

While the kernel can always directly parse and restrict RPCs, it is expected
that the existing kernel pattern of allowing drivers to delegate validation to
FW will be a useful design.

Existing Similar Examples
=========================

The approach described in this document is not a new idea. Direct, or near
direct device access has been offered by the kernel in different areas for
decades. With more devices wanting to follow this design pattern it is becoming
clear that it is not entirely well understood and, more importantly, the
security considerations are not well defined or agreed upon.

Some examples:

 - HW RAID controllers. This includes RPCs to do things like compose drives into
   a RAID volume, configure RAID parameters, monitor the HW and more.

 - Baseboard managers. RPCs for configuring settings in the device and more.

 - NVMe vendor command capsules. nvme-cli provides access to some monitoring
   functions that different products have defined, but more exist.

 - CXL also has an NVMe-like vendor command system.

 - DRM allows user space drivers to send commands to the device via kernel
   mediation

 - RDMA allows user space drivers to directly push commands to the device
   without kernel involvement

 - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc.

The first four are examples of areas that fwctl intends to cover. The latter
three are examples of denied behavior as they fully overlap with the primary
purpose of a kernel subsystem.

A key lesson learned from these past efforts is the importance of having a
common user space project to use as a pre-condition for obtaining a kernel
driver. Developing good community around useful software in user space is key to
getting companies to fund participation to enable their products.
285