.. SPDX-License-Identifier: GPL-2.0-only

=============
 QAIC driver
=============

The QAIC driver is the Kernel Mode Driver (KMD) for the AIC100 family of AI
accelerator products.

Interrupts
==========

While the AIC100 DMA Bridge hardware implements an IRQ storm mitigation
mechanism, it is still possible for an IRQ storm to occur. A storm can happen
if the workload is particularly quick, and the host is responsive. If the host
can drain the response FIFO as quickly as the device can insert elements into
it, then the device will frequently transition the response FIFO from empty to
non-empty and generate MSIs at the rate at which the workload processes
inputs. The lprnet (license plate reader network) workload is known to trigger
this condition, and can generate in excess of 100k MSIs per second. Most
systems have been observed not to tolerate this for long; they eventually
crash when some form of watchdog fires because of the overhead of the
interrupt controller interrupting the host CPU.

To mitigate this issue, the QAIC driver implements specific IRQ handling. When
QAIC receives an IRQ, it disables that line. This prevents the interrupt
controller from interrupting the CPU. Then QAIC drains the FIFO. Once the FIFO
is drained, QAIC implements a "last chance" polling algorithm where QAIC will
sleep for a time to see if the workload will generate more activity. The IRQ
line remains disabled during this time. If no activity is detected, QAIC exits
polling mode and reenables the IRQ line.

This mitigation in QAIC is very effective. The same lprnet use case that
generates 100k IRQs per second (per /proc/interrupts) is reduced to roughly 64
IRQs over 5 minutes while keeping the host system stable, and having the same
workload throughput performance (within run to run noise variation).
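
The effect of the "last chance" polling described above can be illustrated
with a small userspace model. This is only a sketch under simplifying
assumptions, not the driver's code: the function name ``count_irqs``, the
``last_chance_us`` window, and the idea that the host drains the FIFO
instantly are all hypothetical choices made for the illustration.

```python
def count_irqs(arrivals_us, last_chance_us=None):
    """Count interrupts the host CPU takes for a series of FIFO insertions.

    arrivals_us: sorted timestamps (microseconds) at which the device inserts
        an element into an otherwise empty response FIFO.
    last_chance_us: hypothetical length of the polling window that is kept
        open, with the IRQ line disabled, after the last observed activity.
        None models a handler without any mitigation.
    """
    irqs = 0
    polling_until = None  # time at which the IRQ line would be re-enabled

    for t in arrivals_us:
        if polling_until is not None and t <= polling_until:
            # IRQ line is still disabled: the poller finds the element,
            # drains it, and restarts the last-chance window.
            polling_until = t + last_chance_us
            continue

        # IRQ line is enabled: the empty -> non-empty transition raises an MSI.
        irqs += 1
        if last_chance_us is not None:
            # Disable the line, drain the FIFO, then open the polling window.
            polling_until = t + last_chance_us

    return irqs
```

In this model, a steady stream of responses spaced closer together than the
polling window costs a single interrupt, while the unmitigated handler takes
one interrupt per response. A pause longer than the window lets the model
leave polling mode, so the next response costs one more interrupt.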


Neural Network Control (NNC) Protocol
=====================================

The implementation of NNC is split between the KMD (QAIC) and UMD. In general,
QAIC understands how to encode/decode the NNC wire protocol, and the elements
of the protocol which require kernel space knowledge to process (for example,
mapping host memory to device IOVAs). QAIC understands the structure of a
message, and all of the transactions. QAIC does not understand commands (the
payload of a passthrough transaction).

QAIC handles and enforces the required little endianness and 64-bit alignment,
to the degree that it can. Since QAIC does not know the contents of a
passthrough transaction, it relies on the UMD to satisfy the requirements.

The terminate transaction is of particular use to QAIC. QAIC is not aware of
the resources that are loaded onto a device since the majority of that activity
occurs within NNC commands. As a result, QAIC does not have the means to
roll back userspace activity. To ensure that a userspace client's resources
are fully released in the case of a process crash, or a bug, QAIC uses the
terminate command to let QSM know when a user has gone away, and the resources
can be released.

QSM can report a version number of the NNC protocol it supports. This is in the
form of a Major number and a Minor number.

Major number updates indicate changes to the NNC protocol which impact the
message format, or transactions (impacts QAIC).

Minor number updates indicate changes to the NNC protocol which impact the
commands (does not impact QAIC).

uAPI
====

QAIC defines a number of driver specific IOCTLs as part of the userspace API.
This section describes those APIs.

DRM_IOCTL_QAIC_MANAGE
  This IOCTL allows userspace to send an NNC request to the QSM. The call will
  block until a response is received, or the request has timed out.

DRM_IOCTL_QAIC_CREATE_BO
  This IOCTL allows userspace to allocate a buffer object (BO) which can send
  or receive data from a workload. The call will return a GEM handle that
  represents the allocated buffer. The BO is not usable until it has been
  sliced (see DRM_IOCTL_QAIC_ATTACH_SLICE_BO).

DRM_IOCTL_QAIC_MMAP_BO
  This IOCTL allows userspace to prepare an allocated BO to be mmap'd into the
  userspace process.

DRM_IOCTL_QAIC_ATTACH_SLICE_BO
  This IOCTL allows userspace to slice a BO in preparation for sending the BO
  to the device. Slicing is the operation of describing what portions of a BO
  get sent where to a workload. This requires a set of DMA transfers for the
  DMA Bridge, and as such, locks the BO to a specific DBC.

DRM_IOCTL_QAIC_EXECUTE_BO
  This IOCTL allows userspace to submit a set of sliced BOs to the device. The
  call is non-blocking. Success only indicates that the BOs have been queued
  to the device, but does not guarantee they have been executed.

DRM_IOCTL_QAIC_PARTIAL_EXECUTE_BO
  This IOCTL operates like DRM_IOCTL_QAIC_EXECUTE_BO, but it allows userspace
  to shrink the BOs sent to the device for this specific call. If a BO
  typically has N inputs, but only a subset of those is available, this IOCTL
  allows userspace to indicate that only the first M bytes of the BO should be
  sent to the device to minimize data transfer overhead. This IOCTL dynamically
  recomputes the slicing, and therefore has some processing overhead before the
  BOs can be queued to the device.

DRM_IOCTL_QAIC_WAIT_BO
  This IOCTL allows userspace to determine when a particular BO has been
  processed by the device. The call will block until either the BO has been
  processed and can be re-queued to the device, or a timeout occurs.

DRM_IOCTL_QAIC_PERF_STATS_BO
  This IOCTL allows userspace to collect performance statistics on the most
  recent execution of a BO.
  This allows userspace to construct an end to end
  timeline of the BO processing for a performance analysis.

DRM_IOCTL_QAIC_PART_DEV
  This IOCTL allows userspace to request a duplicate "shadow device". This extra
  accelN device is associated with a specific partition of resources on the
  AIC100 device and can be used for limiting a process to some subset of
  resources.

DRM_IOCTL_QAIC_DETACH_SLICE_BO
  This IOCTL allows userspace to remove the slicing information from a BO that
  was originally provided by a call to DRM_IOCTL_QAIC_ATTACH_SLICE_BO. This
  is the inverse of DRM_IOCTL_QAIC_ATTACH_SLICE_BO. The BO must be idle for
  DRM_IOCTL_QAIC_DETACH_SLICE_BO to be called. After a successful detach slice
  operation the BO may have new slicing information attached with a new call
  to DRM_IOCTL_QAIC_ATTACH_SLICE_BO. After detach slice, the BO cannot be
  executed until after a new attach slice operation. Combining attach slice
  and detach slice calls allows userspace to use a BO with multiple workloads.

Userspace Client Isolation
==========================

AIC100 supports multiple clients. Multiple DBCs can be consumed by a single
client, and multiple clients can each consume one or more DBCs. Workloads
may contain sensitive information, so only the client that owns the
workload should be allowed to interface with the DBC.

Clients are identified by the instance associated with their open(). A client
may only use memory it allocates, and DBCs that are assigned to its
workloads. Attempts to access resources assigned to other clients will be
rejected.

Module parameters
=================

QAIC supports the following module parameters:

**datapath_polling (bool)**

Configures QAIC to use a polling thread for datapath events instead of relying
on the device interrupts. Useful for platforms with broken multi-MSI support.
Must be
set at QAIC driver initialization. Default is 0 (off).

**mhi_timeout_ms (unsigned int)**

Sets the timeout value for MHI operations in milliseconds (ms). Must be set
at the time the driver detects a device. Default is 2000 (2 seconds).

**control_resp_timeout_s (unsigned int)**

Sets the timeout value for QSM responses to NNC messages in seconds (s). Must
be set at the time the driver is sending a request to QSM. Default is 60 (one
minute).

**wait_exec_default_timeout_ms (unsigned int)**

Sets the default timeout for the wait_exec ioctl in milliseconds (ms). Must be
set prior to the wait_exec ioctl call. A value specified in the ioctl call
overrides this for that call. Default is 5000 (5 seconds).

**datapath_poll_interval_us (unsigned int)**

Sets the polling interval in microseconds (us) when datapath polling is active.
Takes effect at the next polling interval. Default is 100 (100 us).
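
To check which values are in effect on a running system, the parameters
listed above can be read back from userspace. The sketch below assumes that
the module is loaded under the name ``qaic`` and that its parameters are
exported with read permission under the kernel's usual
``/sys/module/<module>/parameters/`` directory; neither assumption is stated
in this document, and the helper name ``read_qaic_params`` is hypothetical.

```python
from pathlib import Path

# Parameter names as listed in this document.
QAIC_PARAMS = (
    "datapath_polling",
    "mhi_timeout_ms",
    "control_resp_timeout_s",
    "wait_exec_default_timeout_ms",
    "datapath_poll_interval_us",
)


def read_qaic_params(param_dir="/sys/module/qaic/parameters"):
    """Return {name: raw string value} for every parameter that can be read.

    param_dir is an assumed location; pass a different directory if the
    parameters are exposed elsewhere on the target system.
    """
    values = {}
    for name in QAIC_PARAMS:
        try:
            values[name] = (Path(param_dir) / name).read_text().strip()
        except OSError:
            # Module not loaded, parameter not exported, or not readable
            # with the caller's privileges: skip it rather than fail.
            continue
    return values
```

On a host where the directory does not exist, the helper returns an empty
dictionary. Values are returned as the raw strings found in the files, so
the caller decides how to interpret them.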