.\"
.\" Copyright (c) 2012-2016 Intel Corporation
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions, and the following disclaimer,
.\"    without modification.
.\" 2. Redistributions in binary form must reproduce at minimum a disclaimer
.\"    substantially similar to the "NO WARRANTY" disclaimer below
.\"    ("Disclaimer") and any redistribution must be conditioned upon
.\"    including a substantially similar Disclaimer requirement for further
.\"    binary redistribution.
.\"
.\" NO WARRANTY
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
.\" A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
.\" HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
.\" STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
.\" IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGES.
.\"
.\" nvme driver man page.
.\"
.\" Author: Jim Harris <jimharris@FreeBSD.org>
.\"
.Dd June 6, 2020
.Dt NVME 4
.Os
.Sh NAME
.Nm nvme
.Nd NVM Express core driver
.Sh SYNOPSIS
To compile this driver into your kernel,
place the following line in your kernel configuration file:
.Bd -ragged -offset indent
.Cd "device nvme"
.Ed
.Pp
Or, to load the driver as a module at boot, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
nvme_load="YES"
.Ed
.Pp
Most users will also want to enable
.Xr nvd 4
or
.Xr nda 4
to expose NVM Express namespaces as disk devices which can be
partitioned.
Note that in NVM Express terms, a namespace is roughly equivalent to a
SCSI LUN.
.Sh DESCRIPTION
The
.Nm
driver provides support for NVM Express (NVMe) controllers, including:
.Bl -bullet
.It
Hardware initialization
.It
Per-CPU I/O queue pairs
.It
API for registering NVMe namespace consumers such as
.Xr nvd 4
or
.Xr nda 4
.It
API for submitting NVM commands to namespaces
.It
Ioctls for controller and namespace configuration and management
.El
.Pp
The
.Nm
driver creates controller device nodes in the format
.Pa /dev/nvmeX
and namespace device nodes in
the format
.Pa /dev/nvmeXnsY .
Note that the NVM Express specification starts numbering namespaces at 1,
not 0, and this driver follows that convention.
.Sh CONFIGURATION
By default,
.Nm
will create an I/O queue pair for each CPU, provided enough MSI-X vectors
and NVMe queue pairs can be allocated.
If not enough vectors or queue
pairs are available,
.Nm
will use a smaller number of queue pairs and
assign multiple CPUs per queue pair.
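.Pp
To check how CPUs were distributed across I/O queue pairs on a running
system, the
.Va dev.nvme.0.num_cpus_per_ioq
sysctl listed under
.Sx SYSCTL VARIABLES
can be read with
.Xr sysctl 8 .
The following is a minimal sketch that assumes the controller of interest
attached as unit 0; substitute the unit number as needed:
.Bd -literal -offset indent
sysctl dev.nvme.0.num_cpus_per_ioq
.Ed
.Pp
A value greater than 1 suggests that fewer queue pairs than CPUs were
allocated.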
.Pp
To force a single I/O queue pair shared by all CPUs, set the following
tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.per_cpu_io_queues=0
.Ed
.Pp
To assign more than one CPU per I/O queue pair, thereby reducing the number
of MSI-X vectors consumed by the device, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.min_cpus_per_ioq=X
.Ed
.Pp
To force legacy interrupts for all
.Nm
driver instances, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.force_intx=1
.Ed
.Pp
Note that use of INTx implies disabling of per-CPU I/O queue pairs.
.Pp
To control the maximum amount of system RAM in bytes to use as a Host Memory
Buffer for capable devices, set the following tunable:
.Bd -literal -offset indent
hw.nvme.hmb_max
.Ed
.Pp
The default value is 5% of physical memory size per device.
.Pp
The
.Xr nvd 4
driver is used to provide a disk driver to the system by default.
The
.Xr nda 4
driver can be used instead.
The
.Xr nvd 4
driver performs better with smaller transactions and few TRIM
commands.
It sends all commands directly to the drive immediately.
The
.Xr nda 4
driver performs better with larger transactions and also collapses
TRIM commands, giving better performance.
It can queue commands to the drive; combine
.Dv BIO_DELETE
commands into a single trip; and
use the CAM I/O scheduler to bias one type of operation over another.
To select the
.Xr nda 4
driver, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.use_nvd=0
.Ed
.Pp
This value may also be set in the kernel config file with
.Bd -literal -offset indent
.Cd options NVME_USE_NVD=0
.Ed
.Pp
When there is an error,
.Nm
prints only the most relevant information about the command by default.
To enable dumping of all information about the command, set the following
tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.verbose_cmd_dump=1
.Ed
.Pp
Prior versions of the driver reset the card twice on boot.
This proved to be unnecessary and inefficient, so the driver now resets the
drive controller only once.
The old behavior may be restored in the kernel config file with
.Bd -literal -offset indent
.Cd options NVME_2X_RESET
.Ed
.Sh SYSCTL VARIABLES
The following controller-level sysctls are currently implemented:
.Bl -tag -width indent
.It Va dev.nvme.0.num_cpus_per_ioq
(R) Number of CPUs associated with each I/O queue pair.
.It Va dev.nvme.0.int_coal_time
(R/W) Interrupt coalescing timer period in microseconds.
Set to 0 to disable.
.It Va dev.nvme.0.int_coal_threshold
(R/W) Interrupt coalescing threshold in number of command completions.
Set to 0 to disable.
.El
.Pp
The following queue pair-level sysctls are currently implemented.
Admin queue sysctls take the format of dev.nvme.0.adminq and I/O queue sysctls
take the format of dev.nvme.0.ioq0.
.Bl -tag -width indent
.It Va dev.nvme.0.ioq0.num_entries
(R) Number of entries in this queue pair's command and completion queue.
.It Va dev.nvme.0.ioq0.num_tr
(R) Number of nvme_tracker structures currently allocated for this queue pair.
.It Va dev.nvme.0.ioq0.num_prp_list
(R) Number of nvme_prp_list structures currently allocated for this queue pair.
.It Va dev.nvme.0.ioq0.sq_head
(R) Current location of the submission queue head pointer as observed by
the driver.
The head pointer is incremented by the controller as it takes commands off
of the submission queue.
.It Va dev.nvme.0.ioq0.sq_tail
(R) Current location of the submission queue tail pointer as observed by
the driver.
The driver increments the tail pointer after writing a command
into the submission queue to signal that a new command is ready to be
processed.
.It Va dev.nvme.0.ioq0.cq_head
(R) Current location of the completion queue head pointer as observed by
the driver.
The driver increments the head pointer after finishing
with a completion entry that was posted by the controller.
.It Va dev.nvme.0.ioq0.num_cmds
(R) Number of commands that have been submitted on this queue pair.
.It Va dev.nvme.0.ioq0.dump_debug
(W) Writing 1 to this sysctl will dump the full contents of the submission
and completion queues to the console.
.El
.Pp
In addition to the typical PCI attachment, the
.Nm
driver supports attaching to a
.Xr ahci 4
device.
Intel's Rapid Storage Technology (RST) hides the NVMe device
behind the AHCI device due to limitations in Windows.
However, this effectively hides it from the
.Fx
kernel.
To work around this limitation,
.Fx
detects when the AHCI device supports RST and RST is enabled.
See
.Xr ahci 4
for more details.
.Sh DIAGNOSTICS
.Bl -diag
.It "nvme%d: System interrupt issues?"
The driver found a timed-out transaction had a pending completion record,
indicating an interrupt had not been delivered.
The system is either not configuring interrupts properly, or the system drops
them under load.
This message will appear at most once per boot per controller.
.El
.Sh SEE ALSO
.Xr nda 4 ,
.Xr nvd 4 ,
.Xr pci 4 ,
.Xr nvmecontrol 8 ,
.Xr disk 9
.Sh HISTORY
The
.Nm
driver first appeared in
.Fx 9.2 .
.Sh AUTHORS
.An -nosplit
The
.Nm
driver was developed by Intel and originally written by
.An Jim Harris Aq Mt jimharris@FreeBSD.org ,
with contributions from
.An Joe Golio
at EMC.
.Pp
This man page was written by
.An Jim Harris Aq Mt jimharris@FreeBSD.org .