.\" /freebsd/share/man/man4/nvme.4 (revision b64c5a0ace59af62eff52bfe110a521dc73c937b)
.\"
.\" Copyright (c) 2012-2016 Intel Corporation
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions, and the following disclaimer,
.\"    without modification.
.\" 2. Redistributions in binary form must reproduce at minimum a disclaimer
.\"    substantially similar to the "NO WARRANTY" disclaimer below
.\"    ("Disclaimer") and any redistribution must be conditioned upon
.\"    including a substantially similar Disclaimer requirement for further
.\"    binary redistribution.
.\"
.\" NO WARRANTY
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
.\" A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
.\" HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
.\" STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
.\" IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGES.
.\"
.\" nvme driver man page.
.\"
.\" Author: Jim Harris <jimharris@FreeBSD.org>
.\"
.Dd June 6, 2020
.Dt NVME 4
.Os
.Sh NAME
.Nm nvme
.Nd NVM Express core driver
.Sh SYNOPSIS
To compile this driver into your kernel,
place the following line in your kernel configuration file:
.Bd -ragged -offset indent
.Cd "device nvme"
.Ed
.Pp
Or, to load the driver as a module at boot, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
nvme_load="YES"
.Ed
.Pp
Most users will also want to enable
.Xr nvd 4
or
.Xr nda 4
to expose NVM Express namespaces as disk devices which can be
partitioned.
Note that in NVM Express terms, a namespace is roughly equivalent to a
SCSI LUN.
.Sh DESCRIPTION
The
.Nm
driver provides the following support for NVM Express (NVMe) controllers:
.Bl -bullet
.It
Hardware initialization
.It
Per-CPU I/O queue pairs
.It
API for registering NVMe namespace consumers such as
.Xr nvd 4
or
.Xr nda 4
.It
API for submitting NVM commands to namespaces
.It
Ioctls for controller and namespace configuration and management
.El
.Pp
The
.Nm
driver creates controller device nodes in the format
.Pa /dev/nvmeX
and namespace device nodes in
the format
.Pa /dev/nvmeXnsY .
Note that the NVM Express specification starts numbering namespaces at 1,
not 0, and this driver follows that convention.
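.Pp
As an illustration only, the following C fragment builds a namespace
device node path from a controller unit number and a namespace ID.
It rejects namespace ID 0, following the numbering convention above.
The helper
.Fn model_ns_path
is hypothetical and is not part of the driver or its API.
.Bd -literal -offset indent
#include <stdio.h>
/*
 * Hypothetical helper: format "/dev/nvmeXnsY" for controller unit
 * "unit" and namespace ID "nsid".  Namespace IDs start at 1, so 0 is
 * rejected.  Returns the snprintf(3) result, or -1 for nsid 0.
 */
static int
model_ns_path(char *buf, size_t len, int unit, unsigned int nsid)
{
	if (nsid == 0)
		return (-1);	/* namespace IDs are 1-based */
	return (snprintf(buf, len, "/dev/nvme%dns%u", unit, nsid));
}
.Ed
.Pp
For example, unit 0 and namespace ID 1 produce
.Pa /dev/nvme0ns1 .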
.Sh CONFIGURATION
By default,
.Nm
will create an I/O queue pair for each CPU, provided enough MSI-X vectors
and NVMe queue pairs can be allocated.
If not enough vectors or queue pairs are available,
.Nm
will use a smaller number of queue pairs and assign multiple CPUs to each
queue pair.
.Pp
To force a single I/O queue pair shared by all CPUs, set the following
tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.per_cpu_io_queues=0
.Ed
.Pp
To assign more than one CPU per I/O queue pair, thereby reducing the number
of MSI-X vectors consumed by the device, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.min_cpus_per_ioq=X
.Ed
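.Pp
The following C fragment is a simplified model of how CPUs might be
spread over I/O queue pairs.
It is not the driver's exact code.
It assumes that the number of queue pairs is the CPU count divided by
.Va hw.nvme.min_cpus_per_ioq ,
capped by the queue pairs that the controller and the available MSI-X
vectors allow.
It also assumes that CPUs are mapped to queue pairs in contiguous groups.
The functions
.Fn model_num_ioqs
and
.Fn model_cpu_to_ioq
are hypothetical.
.Bd -literal -offset indent
/*
 * Simplified model (an assumption, not the driver's exact code).
 *   ncpus:    number of CPUs (at least 1)
 *   max_ioqs: queue pairs the controller and MSI-X vectors allow
 *   min_cpus: value of hw.nvme.min_cpus_per_ioq
 */
static unsigned int
model_num_ioqs(unsigned int ncpus, unsigned int max_ioqs,
    unsigned int min_cpus)
{
	unsigned int n;
	if (min_cpus == 0)
		min_cpus = 1;
	n = ncpus / min_cpus;	/* one queue pair per min_cpus CPUs */
	if (n == 0)
		n = 1;
	if (n > max_ioqs)
		n = max_ioqs;	/* limited by controller and vectors */
	return (n);
}
/* Map a CPU to a queue pair index by splitting CPUs into groups. */
static unsigned int
model_cpu_to_ioq(unsigned int cpu, unsigned int ncpus, unsigned int nioqs)
{
	return (cpu * nioqs / ncpus);
}
.Ed
.Pp
Suppose there are 8 CPUs and only 3 queue pairs available.
This model places CPUs 0 through 2 on queue pair 0, CPUs 3 through 5 on
queue pair 1, and CPUs 6 and 7 on queue pair 2.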
.Pp
To force legacy interrupts for all
.Nm
driver instances, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.force_intx=1
.Ed
.Pp
Note that using INTx also disables per-CPU I/O queue pairs.
.Pp
To control the maximum amount of system RAM, in bytes, to use as a Host
Memory Buffer (HMB) for capable devices, set the following tunable:
.Bd -literal -offset indent
hw.nvme.hmb_max
.Ed
.Pp
The default value is 5% of physical memory size per device.
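.Pp
The following C sketch illustrates the arithmetic of this default.
It assumes that a
.Va hw.nvme.hmb_max
value of 0 means
.Dq not set .
That convention and the helper name
.Fn model_hmb_limit
are illustrative only.
The sketch models only the cap described above, and the driver may apply
further limits.
.Bd -literal -offset indent
#include <stdint.h>
/*
 * Hypothetical helper: cap on Host Memory Buffer size for one device.
 * physmem is physical memory in bytes; hmb_max is the tunable, with 0
 * assumed to mean "use the default" of 5% (1/20) of physmem.
 */
static uint64_t
model_hmb_limit(uint64_t physmem, uint64_t hmb_max)
{
	if (hmb_max != 0)
		return (hmb_max);
	return (physmem / 20);
}
.Ed
.Pp
For a machine with 16 GiB of RAM and no tunable set, this yields
858993459 bytes, or about 819 MiB.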
.Pp
By default, the
.Xr nvd 4
driver provides the disk devices for NVMe namespaces.
The
.Xr nda 4
driver can be used instead.
The
.Xr nvd 4
driver performs better with smaller transactions and few TRIM
commands.
It sends all commands directly to the drive immediately.
The
.Xr nda 4
driver performs better with larger transactions and collapses TRIM
commands for better performance.
It can queue commands to the drive, combine
.Dv BIO_DELETE
commands into a single trip, and
use the CAM I/O scheduler to favor one type of operation over another.
To select the
.Xr nda 4
driver, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.use_nvd=0
.Ed
.Pp
This value may also be set in the kernel config file with
.Bd -literal -offset indent
.Cd options NVME_USE_NVD=0
.Ed
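.Pp
The following C sketch estimates the effect of combining
.Dv BIO_DELETE
requests.
It counts the TRIM commands sent for a given number of queued delete
requests.
It assumes that each request becomes one range of an NVMe Dataset
Management command, which allows at most 256 ranges.
.Xr nda 4
may use a smaller limit.
The helper
.Fn model_trim_cmds
is hypothetical.
.Bd -literal -offset indent
#include <stddef.h>
/* Maximum ranges in one NVMe Dataset Management (TRIM) command. */
#define MODEL_DSM_MAX_RANGES 256
/*
 * Simplified model: TRIM commands needed for "ndel" queued deletes.
 * Without batching (as with nvd(4)), each delete is its own command;
 * with batching (as with nda(4)), up to MODEL_DSM_MAX_RANGES deletes
 * share one command.
 */
static size_t
model_trim_cmds(size_t ndel, int batch)
{
	if (!batch)
		return (ndel);
	return ((ndel + MODEL_DSM_MAX_RANGES - 1) / MODEL_DSM_MAX_RANGES);
}
.Ed
.Pp
For 1000 queued deletes, the model gives 1000 commands without batching
and 4 with batching.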
.Pp
When there is an error,
.Nm
prints only the most relevant information about the command by default.
To enable dumping of all information about the command, set the following tunable
value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.verbose_cmd_dump=1
.Ed
.Pp
Prior versions of the driver reset the card twice on boot.
This proved to be unnecessary and inefficient, so the driver now resets
the drive controller only once.
The old behavior may be restored in the kernel config file with
.Bd -literal -offset indent
.Cd options NVME_2X_RESET
.Ed
.Sh SYSCTL VARIABLES
The following controller-level sysctls are currently implemented:
.Bl -tag -width indent
.It Va dev.nvme.0.num_cpus_per_ioq
(R) Number of CPUs associated with each I/O queue pair.
.It Va dev.nvme.0.int_coal_time
(R/W) Interrupt coalescing timer period in microseconds.
Set to 0 to disable.
.It Va dev.nvme.0.int_coal_threshold
(R/W) Interrupt coalescing threshold in number of command completions.
Set to 0 to disable.
.El
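.Pp
In the NVM Express model of interrupt coalescing, the controller signals
an interrupt once enough completions have accumulated or the timer
period has elapsed, whichever comes first.
This trades some latency for fewer interrupts.
The following C fragment is a simplified model of that decision.
It is not the exact behavior of the controller or the driver.
It assumes that a value of 0 for either sysctl makes that condition true
as soon as one completion is pending, which effectively turns coalescing
off.
The function
.Fn model_should_interrupt
is hypothetical.
.Bd -literal -offset indent
#include <stdbool.h>
#include <stdint.h>
/*
 * Simplified model (an assumption): should an interrupt be raised now?
 *   pending:   completions posted but not yet signaled
 *   waited_us: microseconds since the oldest pending completion
 *   threshold: int_coal_threshold; time_us: int_coal_time
 */
static bool
model_should_interrupt(uint32_t pending, uint32_t waited_us,
    uint32_t threshold, uint32_t time_us)
{
	uint32_t thr = (threshold != 0) ? threshold : 1;
	if (pending == 0)
		return (false);		/* nothing to report */
	/* Either condition suffices; 0 makes a condition trivially true. */
	return (pending >= thr || waited_us >= time_us);
}
.Ed
.Pp
For example, take a threshold of 8 and a period of 100 microseconds.
In this model, 3 pending completions do not raise an interrupt until
either 5 more arrive or 100 microseconds pass.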
.Pp
The following queue-pair-level sysctls are currently implemented.
Admin queue sysctls are located under
.Va dev.nvme.0.adminq
and I/O queue sysctls under
.Va dev.nvme.0.ioq0 ,
.Va dev.nvme.0.ioq1 ,
and so on.
.Bl -tag -width indent
.It Va dev.nvme.0.ioq0.num_entries
(R) Number of entries in this queue pair's submission and completion
queues.
.It Va dev.nvme.0.ioq0.num_tr
(R) Number of nvme_tracker structures currently allocated for this queue pair.
.It Va dev.nvme.0.ioq0.num_prp_list
(R) Number of nvme_prp_list structures currently allocated for this queue pair.
.It Va dev.nvme.0.ioq0.sq_head
(R) Current location of the submission queue head pointer as observed by
the driver.
The head pointer is incremented by the controller as it takes commands off
the submission queue.
.It Va dev.nvme.0.ioq0.sq_tail
(R) Current location of the submission queue tail pointer as observed by
the driver.
The driver increments the tail pointer after writing a command
into the submission queue to signal that a new command is ready to be
processed.
.It Va dev.nvme.0.ioq0.cq_head
(R) Current location of the completion queue head pointer as observed by
the driver.
The driver increments the head pointer after finishing
with a completion entry that was posted by the controller.
.It Va dev.nvme.0.ioq0.num_cmds
(R) Number of commands that have been submitted on this queue pair.
.It Va dev.nvme.0.ioq0.dump_debug
(W) Writing 1 to this sysctl will dump the full contents of the submission
and completion queues to the console.
.El
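.Pp
The head and tail sysctls above index circular queues of
.Va num_entries
slots.
The following C sketch shows the wrap-around arithmetic defined by the
NVM Express specification.
A queue is empty when the head equals the tail.
It is full when the tail is one slot behind the head, so at most
num_entries - 1 commands can be outstanding.
The helper names are hypothetical and not part of the driver.
.Bd -literal -offset indent
#include <stdbool.h>
#include <stdint.h>
/* Advance a head or tail index by one slot, wrapping at n entries. */
static uint32_t
model_ring_next(uint32_t idx, uint32_t n)
{
	return ((idx + 1) % n);
}
/* Commands written by the driver but not yet fetched by the controller. */
static uint32_t
model_sq_outstanding(uint32_t sq_head, uint32_t sq_tail, uint32_t n)
{
	return ((sq_tail + n - sq_head) % n);
}
/* The submission queue is full when advancing the tail would hit the head. */
static bool
model_sq_full(uint32_t sq_head, uint32_t sq_tail, uint32_t n)
{
	return (model_ring_next(sq_tail, n) == sq_head);
}
.Ed
.Pp
For example, with 8 entries, a head of 6 and a tail of 1 mean that 3
commands are outstanding.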
.Pp
In addition to the typical PCI attachment, the
.Nm
driver supports attaching to an
.Xr ahci 4
device.
Intel's Rapid Storage Technology (RST) hides the NVMe device
behind the AHCI device due to limitations in Windows.
However, this also hides it from the
.Fx
kernel.
To work around this limitation,
.Fx
detects when the AHCI device supports RST and has it enabled.
It then attaches
.Nm
to the NVMe device behind it.
See
.Xr ahci 4
for more details.
.Sh DIAGNOSTICS
.Bl -diag
.It "nvme%d: System interrupt issues?"
The driver found that a timed-out transaction had a pending completion
record, indicating that an interrupt had not been delivered.
Either the system is not configuring interrupts properly, or it drops
them under load.
This message will appear at most once per boot per controller.
.El
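.Pp
The driver can tell from the completion queue alone whether a completion
is pending.
It uses the Phase Tag bit that the NVM Express specification defines in
every completion entry.
The controller writes entries with phase 1 on its first pass through the
queue and inverts the phase on each later pass.
The host therefore keeps an expected phase that flips whenever its head
index wraps.
The following C sketch models this.
The structure and function names are hypothetical, and each entry is
reduced to its phase bit.
.Bd -literal -offset indent
#include <stdbool.h>
#include <stdint.h>
/* Hypothetical model of a completion queue, one phase bit per entry. */
struct model_cq {
	const uint8_t	*phase;		/* phase tag of each entry (0 or 1) */
	uint32_t	 n;		/* number of entries */
	uint32_t	 head;		/* cq_head */
	uint8_t		 expected;	/* phase expected next; starts at 1 */
};
/* True if the controller has posted an entry the host has not consumed. */
static bool
model_cq_pending(const struct model_cq *cq)
{
	return (cq->phase[cq->head] == cq->expected);
}
/* Consume one entry, flipping the expected phase when head wraps. */
static void
model_cq_consume(struct model_cq *cq)
{
	if (++cq->head == cq->n) {
		cq->head = 0;
		cq->expected ^= 1;
	}
}
.Ed
.Pp
In this model, the condition this message describes corresponds to
finding
.Fn model_cq_pending
true while handling a timeout.
The completion was posted, but no interrupt caused it to be processed.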
.Sh SEE ALSO
.Xr nda 4 ,
.Xr nvd 4 ,
.Xr pci 4 ,
.Xr nvmecontrol 8 ,
.Xr disk 9
.Sh HISTORY
The
.Nm
driver first appeared in
.Fx 9.2 .
.Sh AUTHORS
.An -nosplit
The
.Nm
driver was developed by Intel and originally written by
.An Jim Harris Aq Mt jimharris@FreeBSD.org ,
with contributions from
.An Joe Golio
at EMC.
.Pp
This man page was written by
.An Jim Harris Aq Mt jimharris@FreeBSD.org .