.\"
.\" Copyright (c) 2012-2016 Intel Corporation
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions, and the following disclaimer,
.\"    without modification.
.\" 2. Redistributions in binary form must reproduce at minimum a disclaimer
.\"    substantially similar to the "NO WARRANTY" disclaimer below
.\"    ("Disclaimer") and any redistribution must be conditioned upon
.\"    including a substantially similar Disclaimer requirement for further
.\"    binary redistribution.
.\"
.\" NO WARRANTY
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
.\" A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
.\" HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
.\" STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
.\" IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGES.
.\"
.\" nvme driver man page.
.\"
.\" Author: Jim Harris <jimharris@FreeBSD.org>
.\"
.Dd June 6, 2020
.Dt NVME 4
.Os
.Sh NAME
.Nm nvme
.Nd NVM Express core driver
.Sh SYNOPSIS
To compile this driver into your kernel,
place the following line in your kernel configuration file:
.Bd -ragged -offset indent
.Cd "device nvme"
.Ed
.Pp
Or, to load the driver as a module at boot, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
nvme_load="YES"
.Ed
.Pp
Most users will also want to enable
.Xr nvd 4
or
.Xr nda 4
to expose NVM Express namespaces as disk devices which can be
partitioned.
Note that in NVM Express terms, a namespace is roughly equivalent to a
SCSI LUN.
.Sh DESCRIPTION
The
.Nm
driver provides support for NVM Express (NVMe) controllers, such as:
.Bl -bullet
.It
Hardware initialization
.It
Per-CPU IO queue pairs
.It
API for registering NVMe namespace consumers such as
.Xr nvd 4
or
.Xr nda 4
.It
API for submitting NVM commands to namespaces
.It
Ioctls for controller and namespace configuration and management
.El
.Pp
The
.Nm
driver creates controller device nodes in the format
.Pa /dev/nvmeX
and namespace device nodes in
the format
.Pa /dev/nvmeXnsY .
Note that the NVM Express specification starts numbering namespaces at 1,
not 0, and this driver follows that convention.
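.Pp
For example, the first controller and its first namespace would be expected
to appear as
.Pa /dev/nvme0
and
.Pa /dev/nvme0ns1 .
As a quick check, assuming
.Xr nvmecontrol 8
is installed, the controllers and namespaces known to the driver can be
listed with:
.Bd -literal -offset indent
nvmecontrol devlist
.Ed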
.Sh CONFIGURATION
By default,
.Nm
will create an I/O queue pair for each CPU, provided enough MSI-X vectors
and NVMe queue pairs can be allocated.
If not enough vectors or queue
pairs are available,
.Nm
will use a smaller number of queue pairs and
assign multiple CPUs per queue pair.
.Pp
To force a single I/O queue pair shared by all CPUs, set the following
tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.per_cpu_io_queues=0
.Ed
.Pp
To assign more than one CPU per I/O queue pair, thereby reducing the number
of MSI-X vectors consumed by the device, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.min_cpus_per_ioq=X
.Ed
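.Pp
As an illustration only, on a hypothetical system with 16 CPUs, the
following setting asks the driver to share each I/O queue pair among at
least four CPUs, so no more than four I/O queue pairs would be expected.
The value 4 is an arbitrary example, not a recommendation:
.Bd -literal -offset indent
hw.nvme.min_cpus_per_ioq=4
.Ed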
.Pp
To force legacy interrupts for all
.Nm
driver instances, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.force_intx=1
.Ed
.Pp
Note that use of INTx implies disabling of per-CPU I/O queue pairs.
.Pp
To control the maximum amount of system RAM, in bytes, to use as a Host
Memory Buffer for capable devices, set the following tunable:
.Bd -literal -offset indent
hw.nvme.hmb_max
.Ed
.Pp
The default value is 5% of physical memory size per device.
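.Pp
As a sketch, assuming the tunable is set in
.Xr loader.conf 5
like the others on this page, the following would cap the Host Memory
Buffer at 256 MiB (256 * 1024 * 1024 = 268435456 bytes).
The size is an arbitrary example:
.Bd -literal -offset indent
hw.nvme.hmb_max=268435456
.Ed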
.Pp
The
.Xr nvd 4
driver is used to provide a disk driver to the system by default.
The
.Xr nda 4
driver can also be used instead.
The
.Xr nvd 4
driver performs better with smaller transactions and few TRIM
commands.
It sends all commands directly to the drive immediately.
The
.Xr nda 4
driver performs better with larger transactions and also collapses
TRIM commands giving better performance.
It can queue commands to the drive; combine
.Dv BIO_DELETE
commands into a single trip; and
use the CAM I/O scheduler to bias one type of operation over another.
To select the
.Xr nda 4
driver, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.use_nvd=0
.Ed
.Pp
This value may also be set in the kernel config file with
.Bd -literal -offset indent
.Cd options NVME_USE_NVD=0
.Ed
.Pp
When there is an error,
.Nm
prints only the most relevant information about the command by default.
To enable dumping of all information about the command, set the following tunable
value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.verbose_cmd_dump=1
.Ed
.Pp
Prior versions of the driver reset the card twice on boot.
This proved to be unnecessary and inefficient, so the driver now resets the
drive controller only once.
The old behavior may be restored in the kernel config file with
.Bd -literal -offset indent
.Cd options NVME_2X_RESET
.Ed
.Sh SYSCTL VARIABLES
The following controller-level sysctls are currently implemented:
.Bl -tag -width indent
.It Va dev.nvme.0.num_cpus_per_ioq
(R) Number of CPUs associated with each I/O queue pair.
.It Va dev.nvme.0.int_coal_time
(R/W) Interrupt coalescing timer period in microseconds.
Set to 0 to disable.
.It Va dev.nvme.0.int_coal_threshold
(R/W) Interrupt coalescing threshold in number of command completions.
Set to 0 to disable.
.El
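.Pp
As a sketch, assuming the controller of interest is unit 0, the interrupt
coalescing settings could be inspected and then disabled with
.Xr sysctl 8 :
.Bd -literal -offset indent
sysctl dev.nvme.0.int_coal_time dev.nvme.0.int_coal_threshold
sysctl dev.nvme.0.int_coal_time=0
sysctl dev.nvme.0.int_coal_threshold=0
.Ed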
.Pp
The following queue pair-level sysctls are currently implemented.
Admin queue sysctls take the format of
.Va dev.nvme.0.adminq
and I/O queue sysctls take the format of
.Va dev.nvme.0.ioq0 .
.Bl -tag -width indent
.It Va dev.nvme.0.ioq0.num_entries
(R) Number of entries in this queue pair's command and completion queue.
.It Va dev.nvme.0.ioq0.num_tr
(R) Number of nvme_tracker structures currently allocated for this queue pair.
.It Va dev.nvme.0.ioq0.num_prp_list
(R) Number of nvme_prp_list structures currently allocated for this queue pair.
.It Va dev.nvme.0.ioq0.sq_head
(R) Current location of the submission queue head pointer as observed by
the driver.
The head pointer is incremented by the controller as it takes commands off
of the submission queue.
.It Va dev.nvme.0.ioq0.sq_tail
(R) Current location of the submission queue tail pointer as observed by
the driver.
The driver increments the tail pointer after writing a command
into the submission queue to signal that a new command is ready to be
processed.
.It Va dev.nvme.0.ioq0.cq_head
(R) Current location of the completion queue head pointer as observed by
the driver.
The driver increments the head pointer after finishing
with a completion entry that was posted by the controller.
.It Va dev.nvme.0.ioq0.num_cmds
(R) Number of commands that have been submitted on this queue pair.
.It Va dev.nvme.0.ioq0.dump_debug
(W) Writing 1 to this sysctl will dump the full contents of the submission
and completion queues to the console.
.El
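.Pp
For instance, assuming controller unit 0 and its first I/O queue pair, the
submitted command count could be read and the queue contents dumped to the
console with
.Xr sysctl 8 :
.Bd -literal -offset indent
sysctl dev.nvme.0.ioq0.num_cmds
sysctl dev.nvme.0.ioq0.dump_debug=1
.Ed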
.Pp
In addition to the typical PCI attachment, the
.Nm
driver supports attaching to a
.Xr ahci 4
device.
Intel's Rapid Storage Technology (RST) hides the NVMe device
behind the AHCI device due to limitations in Windows.
However, this effectively hides it from the
.Fx
kernel.
To work around this limitation,
.Fx
detects when the AHCI device supports RST and RST is enabled.
See
.Xr ahci 4
for more details.
.Sh DIAGNOSTICS
.Bl -diag
.It "nvme%d: System interrupt issues?"
The driver found a timed-out transaction had a pending completion record,
indicating an interrupt had not been delivered.
The system is either not configuring interrupts properly, or the system drops
them under load.
This message will appear at most once per boot per controller.
.El
.Sh SEE ALSO
.Xr nda 4 ,
.Xr nvd 4 ,
.Xr pci 4 ,
.Xr nvmecontrol 8 ,
.Xr disk 9
.Sh HISTORY
The
.Nm
driver first appeared in
.Fx 9.2 .
.Sh AUTHORS
.An -nosplit
The
.Nm
driver was developed by Intel and originally written by
.An Jim Harris Aq Mt jimharris@FreeBSD.org ,
with contributions from
.An Joe Golio
at EMC.
.Pp
This man page was written by
.An Jim Harris Aq Mt jimharris@FreeBSD.org .