.\"
.\" Copyright (c) 2012-2016 Intel Corporation
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions, and the following disclaimer,
.\" without modification.
.\" 2. Redistributions in binary form must reproduce at minimum a disclaimer
.\" substantially similar to the "NO WARRANTY" disclaimer below
.\" ("Disclaimer") and any redistribution must be conditioned upon
.\" including a substantially similar Disclaimer requirement for further
.\" binary redistribution.
.\"
.\" NO WARRANTY
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
.\" A PARTICULAR PURPOSE ARE DISCLAIMED.
.\" IN NO EVENT SHALL THE COPYRIGHT
.\" HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
.\" STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
.\" IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGES.
.\"
.\" nvme driver man page.
.\"
.\" Author: Jim Harris <jimharris@FreeBSD.org>
.\"
.Dd June 6, 2020
.Dt NVME 4
.Os
.Sh NAME
.Nm nvme
.Nd NVM Express core driver
.Sh SYNOPSIS
To compile this driver into your kernel,
place the following line in your kernel configuration file:
.Bd -ragged -offset indent
.Cd "device nvme"
.Ed
.Pp
Or, to load the driver as a module at boot, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
nvme_load="YES"
.Ed
.Pp
Most users will also want to enable
.Xr nvd 4
or
.Xr nda 4
to expose NVM Express namespaces as disk devices which can be
partitioned.
Note that in NVM Express terms, a namespace is roughly equivalent to a
SCSI LUN.
.Sh DESCRIPTION
The
.Nm
driver provides support for NVM Express (NVMe) controllers, including:
.Bl -bullet
.It
Hardware initialization
.It
Per-CPU I/O queue pairs
.It
API for registering NVMe namespace consumers such as
.Xr nvd 4
or
.Xr nda 4
.It
API for submitting NVM commands to namespaces
.It
Ioctls for controller and namespace configuration and management
.El
.Pp
The
.Nm
driver creates controller device nodes in the format
.Pa /dev/nvmeX
and namespace device nodes in
the format
.Pa /dev/nvmeXnsY .
Note that the NVM Express specification starts numbering namespaces at 1,
not 0, and this driver follows that convention.
.Sh CONFIGURATION
By default,
.Nm
will create an I/O queue pair for each CPU, provided enough MSI-X vectors
and NVMe queue pairs can be allocated.
If not enough vectors or queue
pairs are available,
.Nm
will use a smaller number of queue pairs and
assign multiple CPUs per queue pair.
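.Pp
As a minimal illustration, assuming the controller of interest attached as
unit 0, the number of CPUs that ended up sharing each I/O queue pair can be
read back after boot from the read-only
.Va dev.nvme.0.num_cpus_per_ioq
sysctl listed under
.Sx SYSCTL VARIABLES :
.Bd -literal -offset indent
sysctl dev.nvme.0.num_cpus_per_ioq
.Ed
.Pp
A value greater than 1 indicates that several CPUs share each I/O queue pair.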
.Pp
To force a single I/O queue pair shared by all CPUs, set the following
tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.per_cpu_io_queues=0
.Ed
.Pp
To assign more than one CPU per I/O queue pair, thereby reducing the number
of MSI-X vectors consumed by the device, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.min_cpus_per_ioq=X
.Ed
.Pp
To force legacy interrupts for all
.Nm
driver instances, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.force_intx=1
.Ed
.Pp
Note that use of INTx implies disabling of per-CPU I/O queue pairs.
.Pp
To control the maximum amount of system RAM, in bytes, to use as a Host Memory
Buffer for capable devices, set the following tunable:
.Bd -literal -offset indent
hw.nvme.hmb_max
.Ed
.Pp
The default value is 5% of physical memory size per device.
.Pp
The
.Xr nvd 4
driver is used to provide a disk driver to the system by default.
The
.Xr nda 4
driver can also be used instead.
The
.Xr nvd 4
driver performs better with smaller transactions and few TRIM
commands.
It sends all commands directly to the drive immediately.
The
.Xr nda 4
driver performs better with larger transactions and also collapses
TRIM commands, giving better performance.
It can queue commands to the drive; combine
.Dv BIO_DELETE
commands into a single trip; and
use the CAM I/O scheduler to bias one type of operation over another.
To select the
.Xr nda 4
driver, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.use_nvd=0
.Ed
.Pp
This value may also be set in the kernel config file with
.Bd -literal -offset indent
.Cd options NVME_USE_NVD=0
.Ed
.Pp
When there is an error,
.Nm
prints only the most relevant information about the command by default.
To enable dumping of all information about the command, set the following tunable
value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.verbose_cmd_dump=1
.Ed
.Pp
Prior versions of the driver reset the card twice on boot.
This proved to be unnecessary and inefficient, so the driver now resets the drive
controller only once.
The old behavior may be restored in the kernel config file with
.Bd -literal -offset indent
.Cd options NVME_2X_RESET
.Ed
.Sh SYSCTL VARIABLES
The following controller-level sysctls are currently implemented:
.Bl -tag -width indent
.It Va dev.nvme.0.num_cpus_per_ioq
(R) Number of CPUs associated with each I/O queue pair.
.It Va dev.nvme.0.int_coal_time
(R/W) Interrupt coalescing timer period in microseconds.
Set to 0 to disable.
.It Va dev.nvme.0.int_coal_threshold
(R/W) Interrupt coalescing threshold in number of command completions.
Set to 0 to disable.
.El
.Pp
The following queue pair-level sysctls are currently implemented.
Admin queue sysctls take the format of
.Va dev.nvme.0.adminq
and I/O queue sysctls take the format of
.Va dev.nvme.0.ioq0 .
.Bl -tag -width indent
.It Va dev.nvme.0.ioq0.num_entries
(R) Number of entries in this queue pair's command and completion queue.
.It Va dev.nvme.0.ioq0.num_tr
(R) Number of nvme_tracker structures currently allocated for this queue pair.
.It Va dev.nvme.0.ioq0.num_prp_list
(R) Number of nvme_prp_list structures currently allocated for this queue pair.
.It Va dev.nvme.0.ioq0.sq_head
(R) Current location of the submission queue head pointer as observed by
the driver.
The head pointer is incremented by the controller as it takes commands off
the submission queue.
.It Va dev.nvme.0.ioq0.sq_tail
(R) Current location of the submission queue tail pointer as observed by
the driver.
The driver increments the tail pointer after writing a command
into the submission queue to signal that a new command is ready to be
processed.
.It Va dev.nvme.0.ioq0.cq_head
(R) Current location of the completion queue head pointer as observed by
the driver.
The driver increments the head pointer after finishing
with a completion entry that was posted by the controller.
.It Va dev.nvme.0.ioq0.num_cmds
(R) Number of commands that have been submitted on this queue pair.
.It Va dev.nvme.0.ioq0.dump_debug
(W) Writing 1 to this sysctl will dump the full contents of the submission
and completion queues to the console.
.El
.Pp
In addition to the typical PCI attachment, the
.Nm
driver supports attaching to a
.Xr ahci 4
device.
Intel's Rapid Storage Technology (RST) hides the NVMe device
behind the AHCI device due to limitations in Windows.
However, this effectively hides it from the
.Fx
kernel.
To work around this limitation,
.Fx
detects when the AHCI device supports RST and has it enabled,
allowing
.Nm
to attach to it.
See
.Xr ahci 4
for more details.
.Sh DIAGNOSTICS
.Bl -diag
.It "nvme%d: System interrupt issues?"
The driver found that a timed-out transaction had a pending completion record,
indicating an interrupt had not been delivered.
The system is either not configuring interrupts properly, or the system drops
them under load.
This message will appear at most once per boot per controller.
.El
.Sh SEE ALSO
.Xr nda 4 ,
.Xr nvd 4 ,
.Xr pci 4 ,
.Xr nvmecontrol 8 ,
.Xr disk 9
.Sh HISTORY
The
.Nm
driver first appeared in
.Fx 9.2 .
.Sh AUTHORS
.An -nosplit
The
.Nm
driver was developed by Intel and originally written by
.An Jim Harris Aq Mt jimharris@FreeBSD.org ,
with contributions from
.An Joe Golio
at EMC.
.Pp
This man page was written by
.An Jim Harris Aq Mt jimharris@FreeBSD.org .