.. SPDX-License-Identifier: GPL-2.0

===============
fwctl subsystem
===============

:Author: Jason Gunthorpe

Overview
========

Modern devices contain extensive amounts of FW, and in many cases, are largely
software-defined pieces of hardware. The evolution of this approach is largely a
reaction to Moore's Law where a chip tape-out is now highly expensive, and the
chip design is extremely large. Replacing fixed HW logic with a flexible and
tightly coupled FW/HW combination is an effective risk mitigation against chip
respin. Problems in the HW design can be counteracted in device FW. This is
especially true for devices which present a stable and backwards compatible
interface to the operating system driver (such as NVMe).

The FW layer in devices has grown to incredible size and devices frequently
integrate clusters of fast processors to run it. For example, mlx5 devices have
over 30MB of FW code, and big configurations operate with over 1GB of FW managed
runtime state.

The availability of such a flexible layer has created quite a variety in the
industry where single pieces of silicon are now configurable software-defined
devices and can operate in substantially different ways depending on the need.
Further, we often see cases where specific sites wish to operate devices in ways
that are highly specialized and require applications that have been tailored to
their unique configuration.

Further, devices have become multi-functional and integrated to the point they
no longer fit neatly into the kernel's division of subsystems. Modern
multi-functional devices have drivers, such as bnxt/ice/mlx5/pds, that span many
subsystems while sharing the underlying hardware using the auxiliary device
system.

Taken together, this creates a challenge for the operating system, where devices
have an expansive FW environment that needs robust device-specific debugging
support, and FW-driven functionality that is not well suited to “generic”
interfaces. fwctl seeks to allow access to the full device functionality from
user space in the areas of debuggability, management, and first-boot/nth-boot
provisioning.

fwctl is aimed at the common device design pattern where the OS and FW
communicate via an RPC message layer constructed with a queue or mailbox scheme.
In this case the driver will typically have some layer to deliver RPC messages
and collect RPC responses from device FW. The in-kernel subsystem drivers that
operate the device for its primary purposes will use these RPCs to build their
drivers, but devices also usually have a set of ancillary RPCs that don't really
fit into any specific subsystem. For example, a HW RAID controller is primarily
operated by the block layer but also comes with a set of RPCs to administer the
construction of drives within the HW RAID.

In the past when devices were more single function, individual subsystems would
grow different approaches to solving some of these common problems. For
instance, monitoring device health, manipulating its FLASH, debugging the FW,
and provisioning all have various unique interfaces across the kernel.

fwctl's purpose is to define a common set of limited rules, described below,
that allow user space to securely construct and execute RPCs inside device FW.
The rules serve as an agreement between the operating system and FW on how to
correctly design the RPC interface. As a uAPI the subsystem provides a thin
layer of discovery and a generic uAPI to deliver the RPCs and collect the
response. It supports a system of user space libraries and tools which will
use this interface to control the device using the device native protocols.

Scope of Action
---------------

fwctl drivers are strictly restricted to being a way to operate the device FW.
They are not an avenue to access random kernel internals, or other operating
system SW states.

fwctl instances must operate on a well-defined device function, and the device
should have a well-defined security model for what scope within the physical
device the function is permitted to access. For instance, the most complex PCIe
device today may broadly have several function-level scopes:

 1. A privileged function with full access to the on-device global state and
    configuration

 2. Multiple hypervisor functions, each with control over itself and child
    functions used with VMs

 3. Multiple VM functions tightly scoped within the VM

The device may create a logical parent/child relationship between these scopes.
For instance, a child VM's FW may be within the scope of the hypervisor FW. It is
quite common in the VFIO world that the hypervisor environment has a complex
provisioning/profiling/configuration responsibility for the function VFIO
assigns to the VM.

Further, within the function, devices often have RPC commands that fall within
some general scopes of action (see enum fwctl_rpc_scope):

 1. Access to function & child configuration, FLASH, etc. that becomes live at a
    function reset. Access to function & child runtime configuration that is
    transparent or non-disruptive to any driver or VM.

 2. Read-only access to function debug information that may report on FW objects
    in the function & child, including FW objects owned by other kernel
    subsystems.

 3. Write access to function & child debug information strictly compatible with
    the principles of kernel lockdown and kernel integrity protection. Triggers
    a kernel Taint.

 4. Full debug device access. Triggers a kernel Taint, requires CAP_SYS_RAWIO.

User space will provide a scope label on each RPC and the kernel must enforce the
above CAPs and taints based on that scope. A combination of kernel and FW can
enforce that RPCs are placed in the correct scope by user space.
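
As an illustration only, the taint and capability rules in the list above can be
written down as a small lookup. This is a minimal sketch, not the kernel's
implementation: the ``sketch_*`` names and the numeric values are hypothetical
stand-ins, and the authoritative definitions are those of enum fwctl_rpc_scope.

.. code-block:: c

    #include <stdbool.h>

    /*
     * Hypothetical labels for the four scopes listed above. The real
     * names and values come from enum fwctl_rpc_scope in the uAPI header.
     */
    enum sketch_scope {
        SKETCH_SCOPE_CONFIGURATION = 0,   /* item 1 */
        SKETCH_SCOPE_DEBUG_READ_ONLY = 1, /* item 2 */
        SKETCH_SCOPE_DEBUG_WRITE = 2,     /* item 3 */
        SKETCH_SCOPE_DEBUG_FULL = 3,      /* item 4 */
    };

    /* True if the label is one of the four scopes of this sketch. */
    bool sketch_scope_valid(unsigned int scope)
    {
        return scope <= SKETCH_SCOPE_DEBUG_FULL;
    }

    /* Items 3 and 4 trigger a kernel taint. */
    bool sketch_scope_taints(unsigned int scope)
    {
        return scope == SKETCH_SCOPE_DEBUG_WRITE ||
               scope == SKETCH_SCOPE_DEBUG_FULL;
    }

    /* Only item 4 additionally requires CAP_SYS_RAWIO. */
    bool sketch_scope_needs_rawio(unsigned int scope)
    {
        return scope == SKETCH_SCOPE_DEBUG_FULL;
    }

Keeping the policy keyed purely on the scope label means the enforcement does
not need to understand the RPC payload; checking that the payload really
belongs to the claimed scope is a separate step.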

Denied behavior
---------------

There are many things this interface must not allow user space to do (without a
Taint or CAP), broadly derived from the principles of kernel lockdown. Some
examples:

 1. DMA to/from arbitrary memory, hang the system, compromise FW integrity with
    untrusted code, or otherwise compromise device or system security and
    integrity.

 2. Provide an abnormal “back door” to kernel drivers. No manipulation of kernel
    objects owned by kernel drivers.

 3. Directly configure or otherwise control kernel drivers. A subsystem kernel
    driver can react to the device configuration at function reset/driver load
    time, but otherwise must not be coupled to fwctl.

 4. Operate the HW in a way that overlaps with the core purpose of another
    primary kernel subsystem, such as read/write to LBAs, send/receive of
    network packets, or operate an accelerator's data plane.

fwctl is not a replacement for device direct access subsystems like uacce or
VFIO.

Operations exposed through fwctl's non-tainting interfaces should be fully
sharable with other users of the device. For instance, exposing an RPC through
fwctl should never prevent a kernel subsystem from also concurrently using that
same RPC or hardware unit down the road. In such cases fwctl will be less
important than proper kernel subsystems that eventually emerge. Mistakes in this
area resulting in clashes will be resolved in favour of a kernel implementation.

fwctl User API
==============

.. kernel-doc:: include/uapi/fwctl/fwctl.h
.. kernel-doc:: include/uapi/fwctl/mlx5.h
.. kernel-doc:: include/uapi/fwctl/pds.h

sysfs Class
-----------

fwctl has a sysfs class (/sys/class/fwctl/fwctlNN/) and character devices
(/dev/fwctl/fwctlNN) with a simple numbered scheme. The character device
operates the ioctl uAPI described above.
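
A user space tool might locate and open the character device along these lines.
This is a minimal sketch under stated assumptions: the ``sketch_*`` helpers are
hypothetical names, not part of any library, and opening the node read/write is
an assumption made here rather than something this document specifies.

.. code-block:: c

    #include <fcntl.h>
    #include <stddef.h>
    #include <stdio.h>

    /*
     * Format "/dev/fwctl/fwctl<index>" into buf. Returns buf on success,
     * or NULL if the path does not fit in len bytes.
     */
    char *sketch_fwctl_dev_path(unsigned int index, char *buf, size_t len)
    {
        int n = snprintf(buf, len, "/dev/fwctl/fwctl%u", index);

        if (n < 0 || (size_t)n >= len)
            return NULL;
        return buf;
    }

    /*
     * Open the character device for fwctl<index>. Returns a file
     * descriptor, or -1 on failure (errno is set by open() when the
     * path was built successfully).
     */
    int sketch_fwctl_open(unsigned int index)
    {
        char path[64];

        if (!sketch_fwctl_dev_path(index, path, sizeof(path)))
            return -1;
        return open(path, O_RDWR);
    }

The returned file descriptor is what the ioctls documented in the uAPI header
would then be issued against.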

fwctl devices can be related to driver components in other subsystems through
sysfs::

    $ ls /sys/class/fwctl/fwctl0/device/infiniband/
    ibp0s10f0

    $ ls /sys/class/infiniband/ibp0s10f0/device/fwctl/
    fwctl0/

    $ ls /sys/devices/pci0000:00/0000:00:0a.0/fwctl/fwctl0
    dev  device  power  subsystem  uevent

User space Community
--------------------

Drawing inspiration from nvme-cli, participating in the kernel side must come
with a user space in a common TBD git tree, at a minimum to usefully operate the
kernel driver. Providing such an implementation is a pre-condition to merging a
kernel driver.

The goal is to build a user space community around some of the shared problems
we all have, and ideally develop some common user space programs with some
starting themes of:

 - Device in-field debugging

 - HW provisioning

 - VFIO child device profiling before VM boot

 - Confidential Compute topics (attestation, secure provisioning)

that stretch across all subsystems in the kernel. fwupd is a great example of
how an excellent user space experience can emerge out of kernel-side diversity.

fwctl Kernel API
================

.. kernel-doc:: drivers/fwctl/main.c
   :export:
.. kernel-doc:: include/linux/fwctl.h

fwctl Driver design
-------------------

In many cases a fwctl driver is going to be part of a larger cross-subsystem
device possibly using the auxiliary_device mechanism. In that case several
subsystems are going to be sharing the same device and FW interface layer so the
device design must already provide for isolation and cooperation between kernel
subsystems. fwctl should fit into that same model.

Part of the driver should include a description of how its scope restrictions
and security model work. The driver and FW together must ensure that RPCs
provided by user space are mapped to the appropriate scope. If the validation is
done in the driver then the validation can read a 'command effects' report from
the device, or hardwire the enforcement. If the validation is done in the FW,
then the driver should pass the fwctl_rpc_scope to the FW along with the command.
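
The "hardwire the enforcement" option could look roughly like the table-driven
check below. This is a sketch under stated assumptions, not a real driver: the
opcodes, the ``demo_*`` names, and the rule that a more permissive scope label
also admits commands from less permissive scopes are all hypothetical choices
made for the example.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scope labels, ordered from least to most permissive. */
    enum demo_scope {
        DEMO_CONFIGURATION = 0,
        DEMO_DEBUG_READ_ONLY = 1,
        DEMO_DEBUG_WRITE = 2,
        DEMO_DEBUG_FULL = 3,
    };

    struct demo_cmd_rule {
        uint16_t opcode;        /* made-up FW command opcode */
        unsigned int min_scope; /* least permissive scope that may send it */
    };

    /* Hardwired allow-list; every opcode here is invented for the sketch. */
    static const struct demo_cmd_rule demo_rules[] = {
        { 0x0100, DEMO_CONFIGURATION },   /* e.g. query function config */
        { 0x0200, DEMO_DEBUG_READ_ONLY }, /* e.g. dump a FW object */
        { 0x0300, DEMO_DEBUG_WRITE },     /* e.g. modify debug state */
        { 0x0400, DEMO_DEBUG_FULL },      /* e.g. raw device access */
    };

    /*
     * Return true if user space may send opcode under the scope label it
     * supplied. Unknown opcodes and unknown scope labels are denied.
     */
    bool demo_rpc_allowed(uint16_t opcode, unsigned int user_scope)
    {
        size_t i;

        if (user_scope > DEMO_DEBUG_FULL)
            return false;

        for (i = 0; i < sizeof(demo_rules) / sizeof(demo_rules[0]); i++) {
            if (demo_rules[i].opcode == opcode)
                return user_scope >= demo_rules[i].min_scope;
        }
        return false;
    }

Denying anything that is not listed keeps a new FW command from becoming
reachable before someone has decided which scope it belongs to.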

The driver and FW must cooperate to ensure that either fwctl cannot allocate
any FW resources, or any resources it does allocate are freed on FD closure.  A
driver primarily constructed around FW RPCs may find that its core PCI function
and RPC layer belongs under fwctl with auxiliary devices connecting to other
subsystems.
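
One way to picture the "freed on FD closure" rule is a per-open context that
records every FW resource created through it and releases them all when the
file descriptor goes away. The following is a simplified user space model with
hypothetical names (``sketch_uctx`` and friends); it is not the kernel API
documented above, and a real driver would also need locking.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    #define SKETCH_MAX_RES 8

    /* Per-open state: FW resource handles created through this FD. */
    struct sketch_uctx {
        uint32_t handles[SKETCH_MAX_RES];
        size_t count;
    };

    /*
     * Record a handle that FW allocated on behalf of this context.
     * Returns 0 on success, -1 if the table is full, in which case the
     * caller should free the handle and fail the RPC.
     */
    int sketch_uctx_track(struct sketch_uctx *uctx, uint32_t handle)
    {
        if (uctx->count == SKETCH_MAX_RES)
            return -1;
        uctx->handles[uctx->count++] = handle;
        return 0;
    }

    /*
     * Called when the FD is closed: release every tracked handle, newest
     * first, through fw_free (may be NULL to skip the device call).
     * Returns the number of handles released.
     */
    size_t sketch_uctx_close(struct sketch_uctx *uctx,
                             void (*fw_free)(uint32_t handle))
    {
        size_t freed = 0;

        while (uctx->count > 0) {
            uint32_t handle = uctx->handles[--uctx->count];

            if (fw_free)
                fw_free(handle);
            freed++;
        }
        return freed;
    }

Bounding the table also bounds how many FW resources a single open file
descriptor can pin at once.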

Each device type must be mindful of Linux's philosophy for stable ABI. The FW
RPC interface does not have to meet a strictly stable ABI, but it does need to
meet an expectation that userspace tools that are deployed and in significant
use don't needlessly break. FW upgrade and kernel upgrade should keep widely
deployed tooling working.

Development and debugging focused RPCs under more permissive scopes can have
less stability if the tools using them are only run under exceptional
circumstances and not for everyday use of the device. Debugging tools may even
require exact version matching as they may require something similar to DWARF
debug information from the FW binary.

Security Response
=================

The kernel remains the gatekeeper for this interface. If violations of the
scopes, security or isolation principles are found, we have options to let
devices fix them with a FW update, push a kernel patch to parse and block RPC
commands, or push a kernel patch to block entire firmware versions/devices.

While the kernel can always directly parse and restrict RPCs, it is expected
that the existing kernel pattern of allowing drivers to delegate validation to
FW will be a useful design.

Existing Similar Examples
=========================

The approach described in this document is not a new idea. Direct, or
near-direct, device access has been offered by the kernel in different areas for
decades. With more devices wanting to follow this design pattern it is becoming
clear that it is not entirely well understood and, more importantly, the
security considerations are not well defined or agreed upon.

Some examples:

 - HW RAID controllers. This includes RPCs to do things like compose drives into
   a RAID volume, configure RAID parameters, monitor the HW and more.

 - Baseboard managers. RPCs for configuring settings in the device and more.

 - NVMe vendor command capsules. nvme-cli provides access to some monitoring
   functions that different products have defined, but more exist.

 - CXL also has a NVMe-like vendor command system.

 - DRM allows user space drivers to send commands to the device via kernel
   mediation.

 - RDMA allows user space drivers to directly push commands to the device
   without kernel involvement.

 - Various “raw” APIs, raw HID (SDL2), raw USB, NVMe Generic Interface, etc.

The first four are examples of areas that fwctl intends to cover. The latter
three are examples of denied behavior as they fully overlap with the primary
purpose of a kernel subsystem.

A key lesson learned from these past efforts is the importance of having a
common user space project to use as a pre-condition for obtaining a kernel
driver. Developing good community around useful software in user space is key to
getting companies to fund participation to enable their products.