.. SPDX-License-Identifier: GPL-2.0

===============
fwctl subsystem
===============

:Author: Jason Gunthorpe

Overview
========

Modern devices contain extensive amounts of FW, and in many cases, are largely
software-defined pieces of hardware. The evolution of this approach is largely a
reaction to Moore's Law where a chip tape out is now highly expensive, and the
chip design is extremely large. Replacing fixed HW logic with a flexible and
tightly coupled FW/HW combination is an effective risk mitigation against chip
respin. Problems in the HW design can be counteracted in device FW. This is
especially true for devices which present a stable and backwards compatible
interface to the operating system driver (such as NVMe).

The FW layer in devices has grown to incredible size and devices frequently
integrate clusters of fast processors to run it. For example, mlx5 devices have
over 30MB of FW code, and big configurations operate with over 1GB of FW managed
runtime state.

The availability of such a flexible layer has created quite a variety in the
industry where single pieces of silicon are now configurable software-defined
devices and can operate in substantially different ways depending on the need.
Further, we often see cases where specific sites wish to operate devices in ways
that are highly specialized and require applications that have been tailored to
their unique configuration.

Further, devices have become multi-functional and integrated to the point they
no longer fit neatly into the kernel's division of subsystems. Modern
multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
subsystems while sharing the underlying hardware using the auxiliary device
system.

Altogether this creates a challenge for the operating system, where devices
have an expansive FW environment that needs robust device-specific debugging
support, and FW-driven functionality that is not well suited to “generic”
interfaces. fwctl seeks to allow access to the full device functionality from
user space in the areas of debuggability, management, and first-boot/nth-boot
provisioning.

fwctl is aimed at the common device design pattern where the OS and FW
communicate via an RPC message layer constructed with a queue or mailbox scheme.
In this case the driver will typically have some layer to deliver RPC messages
and collect RPC responses from device FW. The in-kernel subsystem drivers that
operate the device for its primary purposes will use these RPCs to build their
drivers, but devices also usually have a set of ancillary RPCs that don't really
fit into any specific subsystem. For example, a HW RAID controller is primarily
operated by the block layer but also comes with a set of RPCs to administer the
construction of drives within the HW RAID.

In the past when devices were more single function, individual subsystems would
grow different approaches to solving some of these common problems. For instance,
monitoring device health, manipulating its FLASH, debugging the FW, and
provisioning all have various unique interfaces across the kernel.

fwctl's purpose is to define a common set of limited rules, described below,
that allow user space to securely construct and execute RPCs inside device FW.
The rules serve as an agreement between the operating system and FW on how to
correctly design the RPC interface. As a uAPI the subsystem provides a thin
layer of discovery and a generic uAPI to deliver the RPCs and collect the
response. It supports a system of user space libraries and tools which will
use this interface to control the device using the device native protocols.
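
To make the "deliver the RPCs and collect the response" step concrete, here is
a minimal user space sketch. It is not the subsystem's actual definition:
``struct sketch_rpc``, ``sketch_rpc_fill()`` and ``sketch_rpc_send()`` are
hypothetical names, and the field layout is an assumption made for
illustration. The authoritative structure and ioctl request number are the
ones documented in include/uapi/fwctl/fwctl.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Hypothetical stand-in for the RPC request structure. The field names
 * and layout are assumptions for illustration; consult
 * include/uapi/fwctl/fwctl.h for the real definition.
 */
struct sketch_rpc {
	uint32_t size;    /* sizeof(struct sketch_rpc) */
	uint32_t scope;   /* scope label chosen by user space */
	uint32_t in_len;  /* bytes in the device-native request */
	uint32_t out_len; /* capacity of the response buffer */
	uint64_t in;      /* user pointer to the request message */
	uint64_t out;     /* user pointer to the response buffer */
};

/* Describe one device-native request/response pair. */
static void sketch_rpc_fill(struct sketch_rpc *rpc, uint32_t scope,
			    const void *req, uint32_t req_len,
			    void *resp, uint32_t resp_len)
{
	assert(rpc != NULL);

	memset(rpc, 0, sizeof(*rpc));
	rpc->size = sizeof(*rpc);
	rpc->scope = scope;
	rpc->in_len = req_len;
	rpc->out_len = resp_len;
	rpc->in = (uint64_t)(uintptr_t)req;
	rpc->out = (uint64_t)(uintptr_t)resp;
}

/*
 * Deliver the RPC on an open fwctl character device. rpc_ioctl is the
 * request number taken from the uAPI header; it is a parameter so this
 * sketch does not hardcode a value. Returns 0 on success, or -1 with
 * errno set on failure.
 */
static int sketch_rpc_send(int fd, unsigned long rpc_ioctl,
			   struct sketch_rpc *rpc)
{
	return ioctl(fd, rpc_ioctl, rpc);
}
```

The request and response payloads stay opaque to this layer: user space tools
build them in the device's native protocol and only describe their location
and size to the kernel.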

Scope of Action
---------------

fwctl drivers are strictly restricted to being a way to operate the device FW.
They are not an avenue to access random kernel internals, or other operating
system SW states.

fwctl instances must operate on a well-defined device function, and the device
should have a well-defined security model for what scope within the physical
device the function is permitted to access. For instance, the most complex PCIe
device today may broadly have several function-level scopes:

 1. A privileged function with full access to the on-device global state and
    configuration

 2. Multiple hypervisor functions, each with control over itself and the child
    functions used with VMs

 3. Multiple VM functions tightly scoped within the VM

The device may create a logical parent/child relationship between these scopes.
For instance, a child VM's FW may be within the scope of the hypervisor FW. It is
quite common in the VFIO world that the hypervisor environment has a complex
provisioning/profiling/configuration responsibility for the function VFIO
assigns to the VM.

Further, within the function, devices often have RPC commands that fall within
some general scopes of action (see enum fwctl_rpc_scope):

 1. Access to function & child configuration, FLASH, etc. that becomes live at a
    function reset. Access to function & child runtime configuration that is
    transparent or non-disruptive to any driver or VM.

 2. Read-only access to function debug information that may report on FW objects
    in the function & child, including FW objects owned by other kernel
    subsystems.

 3. Write access to function & child debug information strictly compatible with
    the principles of kernel lockdown and kernel integrity protection. Triggers
    a kernel taint.

 4. Full debug device access. Triggers a kernel taint, requires CAP_SYS_RAWIO.
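
The four scopes above and their taint/CAP requirements can be pictured as a
small policy table. The following is only a sketch: the ``sketch_*`` names are
hypothetical, and the real labels are those of enum fwctl_rpc_scope. It shows
one way an admission check could be expressed, not how the kernel implements
it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels mirroring the four scopes in the list above. */
enum sketch_scope {
	SKETCH_CONFIGURATION,    /* 1. configuration access */
	SKETCH_DEBUG_READ_ONLY,  /* 2. read-only debug */
	SKETCH_DEBUG_WRITE,      /* 3. lockdown-compatible debug write */
	SKETCH_DEBUG_WRITE_FULL, /* 4. full debug device access */
	SKETCH_SCOPE_COUNT,
};

struct sketch_policy {
	bool taints_kernel;
	bool needs_cap_sys_rawio;
};

/* Requirements exactly as stated in the scope list. */
static const struct sketch_policy sketch_policies[SKETCH_SCOPE_COUNT] = {
	[SKETCH_CONFIGURATION]    = { false, false },
	[SKETCH_DEBUG_READ_ONLY]  = { false, false },
	[SKETCH_DEBUG_WRITE]      = { true,  false },
	[SKETCH_DEBUG_WRITE_FULL] = { true,  true  },
};

/*
 * Decide whether an RPC labelled with 'scope' may proceed.
 * Returns false for unknown scopes or a missing capability.
 * On success, *taint reports whether the kernel must be tainted.
 */
static bool sketch_scope_admit(unsigned int scope, bool has_cap_sys_rawio,
			       bool *taint)
{
	assert(taint != NULL);

	*taint = false;
	if (scope >= SKETCH_SCOPE_COUNT)
		return false;
	if (sketch_policies[scope].needs_cap_sys_rawio && !has_cap_sys_rawio)
		return false;

	*taint = sketch_policies[scope].taints_kernel;
	return true;
}
```

Keeping the requirements in a table makes the mapping from scope to
consequence easy to audit against the list.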

User space will provide a scope label on each RPC and the kernel must enforce the
above CAPs and taints based on that scope. A combination of kernel and FW can
enforce that RPCs are placed in the correct scope by user space.

Disallowed behavior
-------------------

There are many things this interface must not allow user space to do (without a
taint or CAP), broadly derived from the principles of kernel lockdown. Some
examples:

 1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with
    untrusted code, or otherwise compromise device or system security and
    integrity.

 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
    objects owned by kernel drivers.

 3. Directly configure or otherwise control kernel drivers. A subsystem kernel
    driver can react to the device configuration at function reset/driver load
    time, but otherwise must not be coupled to fwctl.

 4. Operate the HW in a way that overlaps with the core purpose of another
    primary kernel subsystem, such as read/write to LBAs, send/receive of
    network packets, or operate an accelerator's data plane.

fwctl is not a replacement for device direct access subsystems like uacce or
VFIO.

Operations exposed through fwctl's non-tainting interfaces should be fully
sharable with other users of the device. For instance, exposing an RPC through
fwctl should never prevent a kernel subsystem from also concurrently using that
same RPC or hardware unit down the road. In such cases fwctl will be less
important than proper kernel subsystems that eventually emerge. Mistakes in this
area resulting in clashes will be resolved in favour of a kernel implementation.

fwctl User API
==============

.. kernel-doc:: include/uapi/fwctl/bnxt.h
.. kernel-doc:: include/uapi/fwctl/fwctl.h
.. kernel-doc:: include/uapi/fwctl/mlx5.h
.. kernel-doc:: include/uapi/fwctl/pds.h

sysfs Class
-----------

fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
operates the ioctl uAPI described above.

fwctl devices can be related to driver components in other subsystems through
sysfs::

    $ ls /sys/class/fwctl/fwctl0/device/infiniband/
    ibp0s10f0

    $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
    fwctl0/

    $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
    dev  device  power  subsystem  uevent

User space Community
--------------------

Drawing inspiration from nvme-cli, participating in the kernel side must come
with a user space in a common TBD git tree, at a minimum to usefully operate the
kernel driver. Providing such an implementation is a pre-condition to merging a
kernel driver.

The goal is to build user space community around some of the shared problems
we all have, and ideally develop some common user space programs with some
starting themes of:

 - Device in-field debugging

 - HW provisioning

 - VFIO child device profiling before VM boot

 - Confidential Compute topics (attestation, secure provisioning)

that stretch across all subsystems in the kernel. fwupd is a great example of
how an excellent user space experience can emerge out of kernel-side diversity.

fwctl Kernel API
================

.. kernel-doc:: drivers/fwctl/main.c
   :export:
.. kernel-doc:: include/linux/fwctl.h

fwctl Driver design
-------------------

In many cases a fwctl driver is going to be part of a larger cross-subsystem
device possibly using the auxiliary_device mechanism.
In that case several
subsystems are going to be sharing the same device and FW interface layer so the
device design must already provide for isolation and cooperation between kernel
subsystems. fwctl should fit into that same model.

Part of the driver should include a description of how its scope restrictions
and security model work. The driver and FW together must ensure that RPCs
provided by user space are mapped to the appropriate scope. If the validation is
done in the driver then the validation can read a 'command effects' report from
the device, or hardwire the enforcement. If the validation is done in the FW,
then the driver should pass the fwctl_rpc_scope to the FW along with the command.

The driver and FW must cooperate to ensure that either fwctl cannot allocate
any FW resources, or any resources it does allocate are freed on FD closure. A
driver primarily constructed around FW RPCs may find that its core PCI function
and RPC layer belong under fwctl with auxiliary devices connecting to other
subsystems.

Each device type must be mindful of Linux's philosophy for stable ABI. The FW
RPC interface does not have to meet a strictly stable ABI, but it does need to
meet an expectation that user space tools that are deployed and in significant
use don't needlessly break. FW upgrade and kernel upgrade should keep widely
deployed tooling working.

Development and debugging focused RPCs under more permissive scopes can have
less stability if the tools using them are only run under exceptional
circumstances and not for everyday use of the device. Debugging tools may even
require exact version matching as they may require something similar to DWARF
debug information from the FW binary.

Security Response
=================

The kernel remains the gatekeeper for this interface.
If violations of the
scopes, security or isolation principles are found, we have options to let
devices fix them with a FW update, push a kernel patch to parse and block RPC
commands, or push a kernel patch to block entire firmware versions/devices.

While the kernel can always directly parse and restrict RPCs, it is expected
that the existing kernel pattern of allowing drivers to delegate validation to
FW will be a useful design.

Existing Similar Examples
=========================

The approach described in this document is not a new idea. Direct, or
near-direct, device access has been offered by the kernel in different areas for
decades. With more devices wanting to follow this design pattern it is becoming
clear that it is not entirely well understood and, more importantly, the
security considerations are not well defined or agreed upon.

Some examples:

 - HW RAID controllers. This includes RPCs to do things like compose drives into
   a RAID volume, configure RAID parameters, monitor the HW and more.

 - Baseboard managers. RPCs for configuring settings in the device and more.

 - NVMe vendor command capsules. nvme-cli provides access to some monitoring
   functions that different products have defined, but more exist.

 - CXL also has an NVMe-like vendor command system.

 - DRM allows user space drivers to send commands to the device via kernel
   mediation.

 - RDMA allows user space drivers to directly push commands to the device
   without kernel involvement.

 - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc.

The first four are examples of areas that fwctl intends to cover. The latter
three are examples of disallowed behavior as they fully overlap with the primary
purpose of a kernel subsystem.

A key lesson learned from these past efforts is the importance of having a
common user space project to use as a pre-condition for obtaining a kernel
driver. Developing good community around useful software in user space is key to
getting companies to fund participation to enable their products.