1.\" 2.\" Copyright (c) 2012-2016 Intel Corporation 3.\" All rights reserved. 4.\" 5.\" Redistribution and use in source and binary forms, with or without 6.\" modification, are permitted provided that the following conditions 7.\" are met: 8.\" 1. Redistributions of source code must retain the above copyright 9.\" notice, this list of conditions, and the following disclaimer, 10.\" without modification. 11.\" 2. Redistributions in binary form must reproduce at minimum a disclaimer 12.\" substantially similar to the "NO WARRANTY" disclaimer below 13.\" ("Disclaimer") and any redistribution must be conditioned upon 14.\" including a substantially similar Disclaimer requirement for further 15.\" binary redistribution. 16.\" 17.\" NO WARRANTY 18.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 19.\" "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 20.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR 21.\" A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT 22.\" HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 23.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 24.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 25.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, 26.\" STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING 27.\" IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 28.\" POSSIBILITY OF SUCH DAMAGES. 29.\" 30.\" nvme driver man page. 31.\" 32.\" Author: Jim Harris <jimharris@FreeBSD.org> 33.\" 34.\" $FreeBSD$ 35.\" 36.Dd June 6, 2020 37.Dt NVME 4 38.Os 39.Sh NAME 40.Nm nvme 41.Nd NVM Express core driver 42.Sh SYNOPSIS 43To compile this driver into your kernel, 44place the following line in your kernel configuration file: 45.Bd -ragged -offset indent 46.Cd "device nvme" 47.Ed 48.Pp 49Or, to load the driver as a module at boot, place the following line in 50.Xr loader.conf 5 : 51.Bd -literal -offset indent 52nvme_load="YES" 53.Ed 54.Pp 55Most users will also want to enable 56.Xr nvd 4 57or 58.Xr nda 4 59to expose NVM Express namespaces as disk devices which can be 60partitioned. 61Note that in NVM Express terms, a namespace is roughly equivalent to a 62SCSI LUN. 63.Sh DESCRIPTION 64The 65.Nm 66driver provides support for NVM Express (NVMe) controllers, such as: 67.Bl -bullet 68.It 69Hardware initialization 70.It 71Per-CPU IO queue pairs 72.It 73API for registering NVMe namespace consumers such as 74.Xr nvd 4 75or 76.Xr nda 4 77.It 78API for submitting NVM commands to namespaces 79.It 80Ioctls for controller and namespace configuration and management 81.El 82.Pp 83The 84.Nm 85driver creates controller device nodes in the format 86.Pa /dev/nvmeX 87and namespace device nodes in 88the format 89.Pa /dev/nvmeXnsY . 90Note that the NVM Express specification starts numbering namespaces at 1, 91not 0, and this driver follows that convention. 92.Sh CONFIGURATION 93By default, 94.Nm 95will create an I/O queue pair for each CPU, provided enough MSI-X vectors 96and NVMe queue pairs can be allocated. 97If not enough vectors or queue 98pairs are available, nvme(4) will use a smaller number of queue pairs and 99assign multiple CPUs per queue pair. 
.Pp
To force a single I/O queue pair shared by all CPUs, set the following
tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.per_cpu_io_queues=0
.Ed
.Pp
To assign more than one CPU per I/O queue pair, thereby reducing the number
of MSI-X vectors consumed by the device, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.min_cpus_per_ioq=X
.Ed
.Pp
To force legacy interrupts for all
.Nm
driver instances, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.force_intx=1
.Ed
.Pp
Note that use of INTx implies disabling of per-CPU I/O queue pairs.
.Pp
To control the maximum amount of system RAM in bytes to use as a Host Memory
Buffer for capable devices, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.hmb_max
.Ed
.Pp
The default value is 5% of physical memory size per device.
.Pp
The
.Xr nvd 4
driver is used to provide a disk driver to the system by default.
The
.Xr nda 4
driver can also be used instead.
The
.Xr nvd 4
driver performs better with smaller transactions and few TRIM commands.
It sends all commands directly to the drive immediately.
The
.Xr nda 4
driver performs better with larger transactions and also collapses TRIM
commands, giving better performance.
It can queue commands to the drive; combine
.Dv BIO_DELETE
commands into a single trip; and use the CAM I/O scheduler to bias one
type of operation over another.
To select the
.Xr nda 4
driver, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.use_nvd=0
.Ed
.Pp
This value may also be set in the kernel config file with
.Bd -literal -offset indent
.Cd options NVME_USE_NVD=0
.Ed
.Pp
When there is an error,
.Nm
prints only the most relevant information about the command by default.
To enable dumping of all information about the command, set the following
tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.verbose_cmd_dump=1
.Ed
.Pp
Prior versions of the driver reset the card twice on boot.
This proved to be unnecessary and inefficient, so the driver now resets the
drive controller only once.
The old behavior may be restored in the kernel config file with
.Bd -literal -offset indent
.Cd options NVME_2X_RESET
.Ed
.Sh SYSCTL VARIABLES
The following controller-level sysctls are currently implemented:
.Bl -tag -width indent
.It Va dev.nvme.0.num_cpus_per_ioq
(R) Number of CPUs associated with each I/O queue pair.
.It Va dev.nvme.0.int_coal_time
(R/W) Interrupt coalescing timer period in microseconds.
Set to 0 to disable.
.It Va dev.nvme.0.int_coal_threshold
(R/W) Interrupt coalescing threshold in number of command completions.
Set to 0 to disable.
.El
.Pp
The following queue pair-level sysctls are currently implemented.
Admin queue sysctls take the format of
.Va dev.nvme.0.adminq
and I/O queue sysctls take the format of
.Va dev.nvme.0.ioq0 .
.Bl -tag -width indent
.It Va dev.nvme.0.ioq0.num_entries
(R) Number of entries in this queue pair's submission and completion queues.
.It Va dev.nvme.0.ioq0.num_tr
(R) Number of nvme_tracker structures currently allocated for this queue pair.
.It Va dev.nvme.0.ioq0.num_prp_list
(R) Number of nvme_prp_list structures currently allocated for this queue pair.
.It Va dev.nvme.0.ioq0.sq_head
(R) Current location of the submission queue head pointer as observed by
the driver.
The head pointer is incremented by the controller as it takes commands off
of the submission queue.
.It Va dev.nvme.0.ioq0.sq_tail
(R) Current location of the submission queue tail pointer as observed by
the driver.
The driver increments the tail pointer after writing a command into the
submission queue to signal that a new command is ready to be processed.
.It Va dev.nvme.0.ioq0.cq_head
(R) Current location of the completion queue head pointer as observed by
the driver.
The driver increments the head pointer after finishing with a completion
entry that was posted by the controller.
.It Va dev.nvme.0.ioq0.num_cmds
(R) Number of commands that have been submitted on this queue pair.
.It Va dev.nvme.0.ioq0.dump_debug
(W) Writing 1 to this sysctl will dump the full contents of the submission
and completion queues to the console, as shown in the example below.
.El
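.Pp
For example, to dump the queues of the first controller's first I/O queue
pair to the console (a hypothetical invocation; the controller and queue
pair numbers will vary by system):
.Bd -literal -offset indent
# sysctl dev.nvme.0.ioq0.dump_debug=1
.Ed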
.Pp
In addition to the typical PCI attachment, the
.Nm
driver supports attaching to an
.Xr ahci 4
device.
Intel's Rapid Storage Technology (RST) hides the NVMe device behind the
AHCI device due to limitations in Windows, which also effectively hides
it from the
.Fx
kernel.
To work around this limitation,
.Fx
detects AHCI devices that support RST and have it enabled.
See
.Xr ahci 4
for more details.
.Sh SEE ALSO
.Xr nda 4 ,
.Xr nvd 4 ,
.Xr pci 4 ,
.Xr nvmecontrol 8 ,
.Xr disk 9
.Sh HISTORY
The
.Nm
driver first appeared in
.Fx 9.2 .
.Sh AUTHORS
.An -nosplit
The
.Nm
driver was developed by Intel and originally written by
.An Jim Harris Aq Mt jimharris@FreeBSD.org ,
with contributions from
.An Joe Golio
at EMC.
.Pp
This man page was written by
.An Jim Harris Aq Mt jimharris@FreeBSD.org .