#
e3bdf3da |
| 31-Aug-2021 |
Alexander Motin <mav@FreeBSD.org> |
nvme(4): Add MSI and single MSI-X support.
If we can't allocate more MSI-X vectors, accept using single shared. If we can't allocate any MSI-X, try to allocate 2 MSI vectors, but accept single shared. If still no luck, fall back to shared INTx.
This provides maximal flexibility in some limited scenarios. For example, vmd(4) does not support INTx and can handle only a limited number of MSI/MSI-X vectors without sharing.
MFC after: 1 week
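The fallback order described above can be sketched as a small decision function. This is only an illustration under stated assumptions: `choose_interrupts`, `struct intr_config`, and the `avail_*` parameters are hypothetical names, not the driver's code, and the real driver asks the PCI layer for vectors rather than receiving counts as arguments. The sketch also assumes the driver wants at least two vectors.

```c
/* Hypothetical interrupt modes, in order of preference. */
enum intr_mode { INTR_MSIX, INTR_MSI, INTR_INTX };

struct intr_config {
	enum intr_mode mode;
	int vectors;	/* 1 means a single vector shared by all queues */
};

/*
 * wanted:     vectors the driver would like (assumed >= 2)
 * avail_msix: MSI-X vectors the platform can grant (assumption)
 * avail_msi:  MSI vectors the platform can grant (assumption)
 */
static struct intr_config
choose_interrupts(int wanted, int avail_msix, int avail_msi)
{
	struct intr_config c;

	if (avail_msix >= 2) {
		/* Several MSI-X vectors: use as many as are useful. */
		c.mode = INTR_MSIX;
		c.vectors = wanted < avail_msix ? wanted : avail_msix;
	} else if (avail_msix == 1) {
		/* Only one MSI-X vector: accept it as a shared vector. */
		c.mode = INTR_MSIX;
		c.vectors = 1;
	} else if (avail_msi >= 2) {
		/* No MSI-X at all: try for two MSI vectors. */
		c.mode = INTR_MSI;
		c.vectors = 2;
	} else if (avail_msi == 1) {
		/* Only one MSI vector: accept it as a shared vector. */
		c.mode = INTR_MSI;
		c.vectors = 1;
	} else {
		/* Last resort: legacy shared INTx. */
		c.mode = INTR_INTX;
		c.vectors = 1;
	}
	return (c);
}
```

For example, `choose_interrupts(9, 0, 8)` selects MSI with two vectors, while `choose_interrupts(9, 0, 0)` falls all the way back to INTx.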
|
#
fc9a0840 |
| 16-Jul-2021 |
Warner Losh <imp@FreeBSD.org> |
nvme: Enable interrupts after qpair fully constructed
To guard against the ill effects of a spurious interrupt during construction (or one that was bogusly pending), enable interrupts after the qpair is completely constructed. Otherwise, we can die with null pointer dereferences in nvme_qpair_process_completions. This has been observed in at least one pre-release NVMe drive where the MSIX interrupt fired while the queue was being created, before we'd started the NVMe controller card.
The alternative of only turning on the interrupts after the rest was tried, but was insufficient to work around this bug and made the code more complicated w/o benefit.
Reviewed by: mav, chuck Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D31182
|
#
aa0ab681 |
| 03-Jul-2021 |
Warner Losh <imp@FreeBSD.org> |
nvme: coherently read status of completion records
Coherently read the phase bit of the status completion record. We loop over the completion record array, looking for all the transactions in the same phase that have been completed. In doing that, we have to be careful to read the status field first, and if it indicates a complete record, we need to read and process that record. Otherwise, the host might be overtaken by device when reading this completion record, leading to a mistaken belief that the record is in phase. This leads to the code using old values and looking at an already completed entry, which has no current tracker.
To work around this problem, we read the status and make sure it is in phase, then re-read the entire completion record, guaranteeing it's complete, valid, and consistent. In addition, we resync the dmatag to reflect changes since the prior loop for the bouncing DMA case.
Reviewed by: jrtc27@, chuck@ Found by: jrtc27 (this fix is based in part on her D30995 fix) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D31002
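The read-status-first, then re-read pattern described above might look roughly like the following. This is a minimal sketch, not the driver's code: `struct cpl` and `drain_completions` are hypothetical names, a plain array stands in for the DMA ring, and a C11 acquire fence stands in for the bus_dmamap_sync() and fence calls a real driver would use. The only layout facts relied on are that a completion entry is 16 bytes and that bit 0 of the status word is the phase tag.

```c
#include <stdatomic.h>
#include <stdint.h>

/* 16-byte NVMe completion queue entry; bit 0 of status is the phase tag. */
struct cpl {
	uint32_t cdw0;
	uint32_t rsvd;
	uint16_t sqhd;
	uint16_t sqid;
	uint16_t cid;
	uint16_t status;
};

/*
 * Hypothetical helper: copy up to max in-phase completions from ring
 * into out, advancing *head and flipping *phase on wrap-around.
 * Returns the number of records copied.
 */
static int
drain_completions(const struct cpl *ring, uint32_t nentries,
    uint32_t *head, unsigned *phase, struct cpl *out, int max)
{
	int n = 0;

	while (n < max) {
		/* 1. Read only the status word and check the phase tag. */
		uint16_t status = ring[*head].status;
		if ((status & 1u) != *phase)
			break;

		/* 2. Order the status read before re-reading the record. */
		atomic_thread_fence(memory_order_acquire);

		/* 3. Re-read the whole record now that it is known complete. */
		out[n++] = ring[*head];

		if (++*head == nentries) {
			*head = 0;
			*phase ^= 1u;
		}
	}
	return (n);
}
```

Copying the record only after the phase check passes is what prevents the host from acting on a half-written or stale entry.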
|
Revision tags: release/13.0.0 |
|
#
9600aa31 |
| 08-Feb-2021 |
Warner Losh <imp@FreeBSD.org> |
nvme: use NVME_GONE rather than hard-coded 0xffffffff
Make it clearer that the value 0xffffffff is being used to detect the device is gone. We use it in other places in the driver for other meanings.
|
#
082905ca |
| 04-Dec-2020 |
Warner Losh <imp@FreeBSD.org> |
nvme: Remove a wmb() that's not necessary.
bus_dmamap_sync() ensures that memory that's prepared for PREWRITE can be DMA'd immediately after it returns. The details differ, but this mirrors atomic thread release semantics, at least for the buffers synced.
For non-x86 platforms, bus_dmamap_sync() has the right syncing and fences. So in the past, wmb() had been omitted for them.
For x86 platforms, the memory ordering is already strong enough to ensure DMA to the device sees the current contents. As such, we don't need the wmb() here. It translates to an sfence which is only needed for writes to regions that have the write combining attribute set or when some exotic opcodes are used. The nvme driver does neither of these. Since bus_dmamap_sync() includes atomic_thread_fence_rel, we can be assured any optimizer won't reorder the bus_dmamap_sync and the bus_space_write operations. The wmb() was a vestige of the pre-busdma version initially committed to the tree.
Reviewed by: kib@, gallatin@, chuck@, mav@ Differential Revision: https://reviews.freebsd.org/D27448
|
#
8f9d5a8d |
| 02-Dec-2020 |
Michal Meloun <mmel@FreeBSD.org> |
NVME: Multiple busdma related fixes.
- in nvme_qpair_process_completions() do dma sync before completion buffer is used.
- in nvme_qpair_submit_tracker(), don't do explicit wmb() also for arm and arm64. Bus_dmamap_sync() on these architectures is sufficient to ensure that all CPU stores are visible to external (including DMA) observers.
- Allocate completion buffer as BUS_DMA_COHERENT. On not-DMA coherent systems, buffers continuously owned (and accessed) by DMA must be allocated with this flag. Note that BUS_DMA_COHERENT flag is no-op on DMA coherent systems (or coherent buses in mixed systems).
MFC after: 4 weeks Reviewed by: mav, imp Differential Revision: https://reviews.freebsd.org/D27446
|
#
8d08cdc7 |
| 02-Dec-2020 |
Chuck Tuffli <chuck@FreeBSD.org> |
nvme: Fix typo in definition
Change occurrences of "selt test" to "self test" in the NVMe header file.
Reviewed by: imp, mav MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27439
|
#
ac90f70d |
| 29-Nov-2020 |
Alexander Motin <mav@FreeBSD.org> |
Increase nvme(4) maximum transfer size from 1MB to 2MB.
With 4KB page size the 2MB is the maximum we can address with one page PRP. Going further would require chaining, that would add some more complexity.
On the other side, to reduce memory consumption, allocate the PRP memory respecting maximum transfer size reported in the controller identify data. Many of NVMe devices support much smaller values, starting from 128KB. To do that we have to change the initialization sequence to pull the data earlier, before setting up the I/O queue pairs. The admin queue pair is still allocated for full MIN(maxphys, 2MB) size, but it is not a big deal, since there is only one such queue with only 16 trackers.
Reviewed by: imp MFC after: 2 weeks Sponsored by: iXsystems, Inc.
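The 2MB figure follows from simple arithmetic: a PRP entry is a 64-bit (8-byte) physical address, so one 4KB page of PRP entries holds 512 of them, and each entry maps one 4KB page. The sketch below works through that, along with the memory saved by sizing the PRP list to a smaller controller-reported limit. The function names are hypothetical, and the per-request size ignores any extra entry a driver may reserve for unaligned buffers.

```c
#include <stdint.h>

#define PRP_ENTRY_SIZE 8u	/* each PRP entry is a 64-bit address */

/* Largest transfer addressable by a single page full of PRP entries. */
static uint64_t
max_xfer_one_prp_page(uint64_t page_size)
{
	return ((page_size / PRP_ENTRY_SIZE) * page_size);
}

/* Bytes of PRP list needed to map max_xfer bytes (alignment slack ignored). */
static uint64_t
prp_list_bytes(uint64_t max_xfer, uint64_t page_size)
{
	return ((max_xfer / page_size) * PRP_ENTRY_SIZE);
}
```

With 4KB pages, `max_xfer_one_prp_page(4096)` is 2MB, and a controller limited to 128KB transfers needs only 256 bytes of PRP list per request instead of a full 4KB page.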
|
Revision tags: release/12.2.0 |
|
#
d87b31e1 |
| 02-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
nvme: clean up empty lines in .c and .h files
|
#
440cec3f |
| 12-Aug-2020 |
Glen Barber <gjb@FreeBSD.org> |
MFH
Sponsored by: Rubicon Communications, LLC (netgate.com)
|
#
e383ec74 |
| 06-Aug-2020 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r363739 through r363986.
|
#
96ad26ee |
| 04-Aug-2020 |
Mark Johnston <markj@FreeBSD.org> |
Remove free_domain() and uma_zfree_domain().
These functions were introduced before UMA started ensuring that freed memory gets placed in domain-local caches. They no longer serve any purpose since UMA now provides their functionality by default. Remove them to simplify the kernel memory allocator interfaces a bit.
Reviewed by: cem, kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25937
|
#
ead7e103 |
| 18-Jun-2020 |
Alexander Motin <mav@FreeBSD.org> |
Make polled request timeout less invasive.
Instead of panicking after one second of polling, let the normal timeout handler activate, reset the controller, and abort the outstanding requests. If none of that happens within 10 seconds, then something in the driver is likely badly stuck and panic is the only way out.
In particular this fixed device hot unplug during execution of those polled commands, allowing clean device detach instead of panic.
MFC after: 1 week Sponsored by: iXsystems, Inc.
|
#
550d5d64 |
| 17-Jun-2020 |
Alexander Motin <mav@FreeBSD.org> |
Fix admin qpair leak if detached during initial reset.
MFC after: 1 week Sponsored by: iXsystems, Inc.
|
Revision tags: release/11.4.0 |
|
#
4053f8ac |
| 02-May-2020 |
David Bright <dab@FreeBSD.org> |
Fix various Coverity-detected errors in nvme driver
This fixes several Coverity-detected errors in the nvme driver.
CIDs addressed: 1008344, 1009377, 1009380, 1193740, 1305470, 1403975, 1403980
Reviewed by: imp@, vangyzen@ MFC after: 5 days Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D24532
|
#
aeb665b5 |
| 30-Mar-2020 |
Ed Maste <emaste@FreeBSD.org> |
remove extraneous double ;s in sys/
|
#
0a4b14e8 |
| 15-Dec-2019 |
Michal Meloun <mmel@FreeBSD.org> |
Properly synchronize completion DMA buffers.
Within command completion processing, the callback function may access the DMAed data buffer. Synchronize it before use, not after. This allows NVMe disks to be used on non-DMA-coherent arm64 systems.
MFC after: 3 weeks
|
#
7588c6cc |
| 13-Dec-2019 |
Warner Losh <imp@FreeBSD.org> |
Move to using bool instead of boolean_t
While there are subtle semantic differences between bool and boolean_t, none of them matter in these cases. Prefer true/false when dealing with bool type. Preserve a couple of TRUEs since they are passed into int args into CAM. Preserve a couple of FALSEs when used for status.done, an int.
Differential Revision: https://reviews.freebsd.org/D20999
|
#
43393e8b |
| 06-Dec-2019 |
Warner Losh <imp@FreeBSD.org> |
trackers always know what qpair they are on
Don't needlessly pass around qpair pointers when the tracker knows what qpair it's on. This will simplify code and make it easier to split submission and completion queues in the future.
Signed-off-by: John Meneghini <johnm@netapp.com>
|
Revision tags: release/12.1.0 |
|
#
668ee101 |
| 26-Sep-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r352587 through r352763.
|
#
1eab19cb |
| 23-Sep-2019 |
Alexander Motin <mav@FreeBSD.org> |
Make nvme(4) driver some more NUMA aware.
- For each queue pair precalculate CPU and domain it is bound to. If queue pairs are not per-CPU, then use the domain of the device.
- Allocate most of queue pair memory from the domain it is bound to.
- Bind callouts to the same CPUs as queue pair to avoid migrations.
- Do not assign queue pairs to each SMT thread. It just wasted resources and increased lock congestions.
- Remove fixed multiplier of CPUs per queue pair, spread them even. This allows to use more queue pairs in some hardware configurations.
- If queue pair serves multiple CPUs, bind different NVMe devices to different CPUs.
MFC after: 1 month Sponsored by: iXsystems, Inc.
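One way to spread CPUs evenly over queue pairs without a fixed multiplier is plain integer scaling. The sketch below illustrates the idea only: `cpu_to_qpair` is a hypothetical name, the formula is an assumption rather than something stated in the commit message, and it ignores SMT siblings and NUMA domains.

```c
/*
 * Hypothetical mapping: scale the CPU index into the queue pair range.
 * Queue pairs end up serving either floor(ncpus / nqpairs) or
 * ceil(ncpus / nqpairs) CPUs each.
 */
static unsigned
cpu_to_qpair(unsigned cpu, unsigned ncpus, unsigned nqpairs)
{
	return (cpu * nqpairs / ncpus);
}
```

With 8 CPUs and 3 queue pairs, CPUs 0-2 map to queue pair 0, CPUs 3-5 to queue pair 1, and CPUs 6-7 to queue pair 2, so all three queue pairs are used even though 8 is not a multiple of 3.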
|
#
f993ed2f |
| 09-Sep-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r351732 through r352104.
|
#
f93b7f95 |
| 04-Sep-2019 |
Warner Losh <imp@FreeBSD.org> |
Support doorbell strides != 0.
The NVMe standard (1.4) states
>>> 8.6 Doorbell Stride for Software Emulation
>>> The doorbell stride,...is useful in software emulation of an NVM
>>> Express controller. ... For hardware implementations of the NVM
>>> Express interface, the expected doorbell stride value is 0h.
However, hardware in the wild exists with a doorbell stride of 1 (meaning 8 byte separation). This change supports that hardware, as well as software emulators as envisioned in Section 8.6. Since this is the fast path, care has been taken to make this computation efficient. The bit of math to compute an offset for each is replaced by a memory load from cache of a pre-computed value.
MFC After: 3 days Reviewed by: scottl@ Differential Revision: https://reviews.freebsd.org/D21514
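The offset computation being precomputed can be sketched as follows, using the doorbell layout from the NVMe specification: doorbells start at register offset 1000h, the stride is 4 << DSTRD bytes, and queue y uses slot 2y for its submission tail doorbell and slot 2y+1 for its completion head doorbell. The struct and function names are hypothetical; a driver would compute these once per queue pair at setup and only load the stored values on the fast path.

```c
#include <stdint.h>

#define DOORBELL_BASE 0x1000u	/* first doorbell register offset */

/* Precomputed doorbell register offsets for one queue pair. */
struct qpair_doorbells {
	uint32_t sq_tdbl_off;	/* submission queue tail doorbell */
	uint32_t cq_hdbl_off;	/* completion queue head doorbell */
};

/* dstrd is the CAP.DSTRD field; 0 means 4-byte spacing, 1 means 8-byte. */
static struct qpair_doorbells
compute_doorbells(uint32_t qid, uint32_t dstrd)
{
	struct qpair_doorbells d;
	uint32_t stride = 4u << dstrd;

	d.sq_tdbl_off = DOORBELL_BASE + (2u * qid) * stride;
	d.cq_hdbl_off = DOORBELL_BASE + (2u * qid + 1u) * stride;
	return (d);
}
```

For queue 1, a stride field of 0 gives offsets 0x1008 and 0x100c, while a stride field of 1 gives 0x1010 and 0x1018.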
|
#
c5c3ba6b |
| 03-Sep-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r351317 through r351731.
|
#
71a28181 |
| 21-Aug-2019 |
Alexander Motin <mav@FreeBSD.org> |
Improve NVMe hot unplug handling.
If the device is unplugged from the system (CSTS register reads return 0xffffffff), it makes no sense to send any more recovery requests or expect any responses back. If there is a detach call in such a state, just stop all activity and free resources. If there is no detach call (hot-plug is not supported), rely on normal timeout handling, but when it triggers a controller reset, do not wait for the impossible and quickly report failure.
MFC after: 2 weeks Sponsored by: iXsystems, Inc.
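The "do not wait for the impossible" behavior can be illustrated with a polling loop that bails out as soon as the status register reads all ones. This is a sketch under assumptions: `wait_for_ready`, `read_csts_fn`, and `read_from_memory` are hypothetical names, the register read is abstracted behind a callback, and the loop counts polls instead of measuring time. The constant name NVME_GONE matches the one the driver later adopted for this value, and CSTS.RDY is bit 0 of the register.

```c
#include <stdint.h>

#define NVME_GONE 0xffffffffu	/* all-ones read: device is gone */
#define CSTS_RDY  0x1u		/* controller ready bit */

enum wait_result { WAIT_READY, WAIT_TIMEOUT, WAIT_GONE };

/* Hypothetical register accessor supplied by the caller. */
typedef uint32_t (*read_csts_fn)(void *arg);

/* Stand-in accessor for illustration: reads the value arg points to. */
static uint32_t
read_from_memory(void *arg)
{
	return (*(uint32_t *)arg);
}

static enum wait_result
wait_for_ready(read_csts_fn read_csts, void *arg, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		uint32_t csts = read_csts(arg);

		/* Check for removal first: all ones also has RDY set. */
		if (csts == NVME_GONE)
			return (WAIT_GONE);
		if (csts & CSTS_RDY)
			return (WAIT_READY);
	}
	return (WAIT_TIMEOUT);
}
```

The removal check has to come before the ready check, because an all-ones read would otherwise look like a ready controller.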
|