.\"
.\" Copyright (c) 2012-2016 Intel Corporation
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions, and the following disclaimer,
.\"    without modification.
.\" 2. Redistributions in binary form must reproduce at minimum a disclaimer
.\"    substantially similar to the "NO WARRANTY" disclaimer below
.\"    ("Disclaimer") and any redistribution must be conditioned upon
.\"    including a substantially similar Disclaimer requirement for further
.\"    binary redistribution.
.\"
.\" NO WARRANTY
.\" THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
.\" "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
.\" LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR
.\" A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
.\" HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
.\" STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
.\" IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
.\" POSSIBILITY OF SUCH DAMAGES.
.\"
.\" nvme driver man page.
.\"
.\" Author: Jim Harris <jimharris@FreeBSD.org>
.\"
.\" $FreeBSD$
.\"
.Dd June 6, 2020
.Dt NVME 4
.Os
.Sh NAME
.Nm nvme
.Nd NVM Express core driver
.Sh SYNOPSIS
To compile this driver into your kernel,
place the following line in your kernel configuration file:
.Bd -ragged -offset indent
.Cd "device nvme"
.Ed
.Pp
Or, to load the driver as a module at boot, place the following line in
.Xr loader.conf 5 :
.Bd -literal -offset indent
nvme_load="YES"
.Ed
.Pp
Most users will also want to enable
.Xr nvd 4
or
.Xr nda 4
to expose NVM Express namespaces as disk devices that can be
partitioned.
Note that in NVM Express terms, a namespace is roughly equivalent to a
SCSI LUN.
.Sh DESCRIPTION
The
.Nm
driver provides support for NVM Express (NVMe) controllers, including:
.Bl -bullet
.It
Hardware initialization
.It
Per-CPU I/O queue pairs
.It
API for registering NVMe namespace consumers such as
.Xr nvd 4
or
.Xr nda 4
.It
API for submitting NVM commands to namespaces
.It
Ioctls for controller and namespace configuration and management
.El
.Pp
The
.Nm
driver creates controller device nodes in the format
.Pa /dev/nvmeX
and namespace device nodes in
the format
.Pa /dev/nvmeXnsY .
Note that the NVM Express specification starts numbering namespaces at 1,
not 0, and this driver follows that convention.
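.Pp
As an illustration of this naming, and assuming a system whose first
controller exposes a single namespace, the device nodes can be inspected
and the controllers listed with
.Xr nvmecontrol 8
roughly as follows.
The unit and namespace numbers shown are examples only.
.Bd -literal -offset indent
# ls /dev/nvme0 /dev/nvme0ns1
# nvmecontrol devlist
.Ed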
.Sh CONFIGURATION
By default,
.Nm
will create an I/O queue pair for each CPU, provided enough MSI-X vectors
and NVMe queue pairs can be allocated.
If not enough vectors or queue pairs are available,
.Nm
will use a smaller number of queue pairs and
assign multiple CPUs per queue pair.
.Pp
To force a single I/O queue pair shared by all CPUs, set the following
tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.per_cpu_io_queues=0
.Ed
.Pp
To assign more than one CPU per I/O queue pair, thereby reducing the number
of MSI-X vectors consumed by the device, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.min_cpus_per_ioq=X
.Ed
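.Pp
As a worked example, and under the assumption that the number of I/O
queue pairs is roughly the CPU count divided by this value, a
hypothetical 16-CPU system with the setting below would use about four
I/O queue pairs instead of 16:
.Bd -literal -offset indent
hw.nvme.min_cpus_per_ioq=4
.Ed
.Pp
The ratio actually in effect can be checked after boot through the
.Va dev.nvme.0.num_cpus_per_ioq
sysctl.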
.Pp
To force legacy interrupts for all
.Nm
driver instances, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.force_intx=1
.Ed
.Pp
Note that use of INTx implies disabling of per-CPU I/O queue pairs.
.Pp
To control the maximum amount of system RAM, in bytes, to use as the Host
Memory Buffer for capable devices, set the following tunable:
.Bd -literal -offset indent
hw.nvme.hmb_max
.Ed
.Pp
The default value is 5% of physical memory size per device.
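.Pp
For instance, assuming the tunable is set in
.Xr loader.conf 5
like the others above, the Host Memory Buffer could be capped at 64 MiB
as sketched below.
The 64 MiB figure is an arbitrary example, written out in bytes
(64 * 1024 * 1024):
.Bd -literal -offset indent
hw.nvme.hmb_max=67108864
.Ed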
.Pp
The
.Xr nvd 4
driver is used to provide a disk driver to the system by default.
The
.Xr nda 4
driver can be used instead.
The
.Xr nvd 4
driver performs better with smaller transactions and few TRIM
commands.
It sends all commands directly to the drive immediately.
The
.Xr nda 4
driver performs better with larger transactions and also collapses
TRIM commands, giving better performance.
It can queue commands to the drive; combine
.Dv BIO_DELETE
commands into a single trip; and
use the CAM I/O scheduler to bias one type of operation over another.
To select the
.Xr nda 4
driver, set the following tunable value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.use_nvd=0
.Ed
.Pp
This value may also be set in the kernel config file with
.Bd -literal -offset indent
.Cd options NVME_USE_NVD=0
.Ed
.Pp
When there is an error,
.Nm
prints only the most relevant information about the command by default.
To enable dumping of all information about the command, set the following tunable
value in
.Xr loader.conf 5 :
.Bd -literal -offset indent
hw.nvme.verbose_cmd_dump=1
.Ed
.Pp
Prior versions of the driver reset the card twice on boot.
This proved to be unnecessary and inefficient, so the driver now resets the
drive controller only once.
The old behavior may be restored in the kernel config file with
.Bd -literal -offset indent
.Cd options NVME_2X_RESET
.Ed
.Sh SYSCTL VARIABLES
The following controller-level sysctls are currently implemented:
.Bl -tag -width indent
.It Va dev.nvme.0.num_cpus_per_ioq
(R) Number of CPUs associated with each I/O queue pair.
.It Va dev.nvme.0.int_coal_time
(R/W) Interrupt coalescing timer period in microseconds.
Set to 0 to disable.
.It Va dev.nvme.0.int_coal_threshold
(R/W) Interrupt coalescing threshold in number of command completions.
Set to 0 to disable.
.El
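.Pp
These values can be read and changed with
.Xr sysctl 8 .
A minimal sketch, assuming the controller of interest is unit 0 and
using arbitrary example values of 100 microseconds and 8 completions:
.Bd -literal -offset indent
# sysctl dev.nvme.0.num_cpus_per_ioq
# sysctl dev.nvme.0.int_coal_time=100
# sysctl dev.nvme.0.int_coal_threshold=8
.Ed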
.Pp
The following queue pair-level sysctls are currently implemented.
Admin queue sysctls take the format of
.Va dev.nvme.0.adminq
and I/O queue sysctls take the format of
.Va dev.nvme.0.ioq0 .
.Bl -tag -width indent
.It Va dev.nvme.0.ioq0.num_entries
(R) Number of entries in this queue pair's command and completion queues.
.It Va dev.nvme.0.ioq0.num_tr
(R) Number of nvme_tracker structures currently allocated for this queue pair.
.It Va dev.nvme.0.ioq0.num_prp_list
(R) Number of nvme_prp_list structures currently allocated for this queue pair.
.It Va dev.nvme.0.ioq0.sq_head
(R) Current location of the submission queue head pointer as observed by
the driver.
The head pointer is incremented by the controller as it takes commands off
the submission queue.
.It Va dev.nvme.0.ioq0.sq_tail
(R) Current location of the submission queue tail pointer as observed by
the driver.
The driver increments the tail pointer after writing a command
into the submission queue to signal that a new command is ready to be
processed.
.It Va dev.nvme.0.ioq0.cq_head
(R) Current location of the completion queue head pointer as observed by
the driver.
The driver increments the head pointer after finishing
with a completion entry that was posted by the controller.
.It Va dev.nvme.0.ioq0.num_cmds
(R) Number of commands that have been submitted on this queue pair.
.It Va dev.nvme.0.ioq0.dump_debug
(W) Writing 1 to this sysctl will dump the full contents of the submission
and completion queues to the console.
.El
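.Pp
As a sketch, assuming controller unit 0 and its first I/O queue pair,
the queue pointers can be sampled and a queue dump requested with
.Xr sysctl 8 :
.Bd -literal -offset indent
# sysctl dev.nvme.0.ioq0.sq_head dev.nvme.0.ioq0.sq_tail
# sysctl dev.nvme.0.ioq0.cq_head
# sysctl dev.nvme.0.ioq0.dump_debug=1
.Ed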
.Pp
In addition to the typical PCI attachment, the
.Nm
driver supports attaching to an
.Xr ahci 4
device.
Intel's Rapid Storage Technology (RST) hides the NVMe device
behind the AHCI device due to limitations in Windows.
However, this effectively hides it from the
.Fx
kernel.
To work around this limitation,
.Fx
detects when the AHCI device supports RST and has it enabled.
See
.Xr ahci 4
for more details.
.Sh SEE ALSO
.Xr nda 4 ,
.Xr nvd 4 ,
.Xr pci 4 ,
.Xr nvmecontrol 8 ,
.Xr disk 9
.Sh HISTORY
The
.Nm
driver first appeared in
.Fx 9.2 .
.Sh AUTHORS
.An -nosplit
The
.Nm
driver was developed by Intel and originally written by
.An Jim Harris Aq Mt jimharris@FreeBSD.org ,
with contributions from
.An Joe Golio
at EMC.
.Pp
This man page was written by
.An Jim Harris Aq Mt jimharris@FreeBSD.org .