#
3d89acf5 |
| 15-Aug-2024 |
Warner Losh <imp@FreeBSD.org> |
nvme: Separate total failures from I/O failures
When it's a I/O failure, we can still send admin commands. Separate out the admin failures and flag them as such so that we can still send admin comma
nvme: Separate total failures from I/O failures
When it's a I/O failure, we can still send admin commands. Separate out the admin failures and flag them as such so that we can still send admin commands on half-failed drives.
Fixes: 9229b3105d88 (nvme: Fail passthrough commands right away in failed state) Sponsored by: Netflix
show more ...
|
#
ce7fac64 |
| 16-Aug-2024 |
Warner Losh <imp@FreeBSD.org> |
Revert "nvme: Separate total failures from I/O failures"
All kinds of crazy stuff was mixed into this commit. Revert it and do it again.
This reverts commit d5507f9e436698ac17dc5ace7ef58493988a9b04
Revert "nvme: Separate total failures from I/O failures"
All kinds of crazy stuff was mixed into this commit. Revert it and do it again.
This reverts commit d5507f9e436698ac17dc5ace7ef58493988a9b04.
Sponsored by: Netflix
show more ...
|
#
d5507f9e |
| 15-Aug-2024 |
Warner Losh <imp@FreeBSD.org> |
nvme: Separate total failures from I/O failures
When it's a I/O failure, we can still send admin commands. Separate out the admin failures and flag them as such so that we can still send admin comma
nvme: Separate total failures from I/O failures
When it's a I/O failure, we can still send admin commands. Separate out the admin failures and flag them as such so that we can still send admin commands on half-failed drives.
Fixes: 9229b3105d88 (nvme: Fail passthrough commands right away in failed state) Sponsored by: Netflix
show more ...
|
#
bb7f7d5b |
| 24-Jul-2024 |
Warner Losh <imp@FreeBSD.org> |
nvme: Warn if there's system interrupt issues.
Issue a warning if we have system interrupt issues. If you get this warning, then we submitted a request, it timed out without an interrupt being poste
nvme: Warn if there's system interrupt issues.
Issue a warning if we have system interrupt issues. If you get this warning, then we submitted a request, it timed out without an interrupt being posted, but when we polled the card's completion, we found completion events. This indicates that we're missing interrupts, and to date all the times I've helped people track issues like this down it has been a system issue, not an NVMe driver isseue.
Sponsored by: Netflix Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D46031
show more ...
|
#
aa413543 |
| 24-Jul-2024 |
Warner Losh <imp@FreeBSD.org> |
nvme: Optimize timeout code further
Optimize timeout code based on three observations.
(1) The tr queues are sorted in order of submission, so the first one that could time out is the first "re
nvme: Optimize timeout code further
Optimize timeout code based on three observations.
(1) The tr queues are sorted in order of submission, so the first one that could time out is the first "real" one on the list. (2) Timeouts for a given queue are all the same length (well, except at startup, where timeout doesn't matter, and when you change it at runtime, where timeouts will still happen eventually and the difference isn't worth optimizing for). (3) Calling the ISR races the real ISR and we should avoid that better.
So now, after checking to see if the card is there and working, the timeout routine scans the pending tracker list until it finds a non-AER tracker. If the deadline hasn't passed, we return, doing nothing further. Otherwise, we call poll completions and then process the list looking for timed out items.
This should move the timeout routine to touching hardware only when it's really necessary. It thus avoids racing the normal ISR, while still timig out stuck transactions quickly enough.
There was also some minor code motion to make all of the above flow more nicely for the reader.
When interrupts aren't working at all, then this will increase latency somewhat. But when interrupts aren't working at all, there's bigger problems and we should poll quite often in that case. That will be handled in future commits.
Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D46026
show more ...
|
#
e6d3ba4b |
| 24-Jul-2024 |
Warner Losh <imp@FreeBSD.org> |
nvme: Lock when processing an abort completion command.
When processing an abort completion command, we have to lock. But we have to lock the qpair of the original transaction (not the abort we're c
nvme: Lock when processing an abort completion command.
When processing an abort completion command, we have to lock. But we have to lock the qpair of the original transaction (not the abort we're completing). We do this to avoid races with checking the completion id to tr mapping array, as well as to manually complete it.
Note: we don't handle the completion status of 'Asked to abort too many transactions at once.' That will be fixed on subsequent commits. Add a note to that effect for now since it's a harder problem to solve.
Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D46025
show more ...
|
#
86909f7a |
| 24-Jul-2024 |
Warner Losh <imp@FreeBSD.org> |
nvme: Always lock and only avoid processing for recovery state
When we lose a race with the timeout code, shift towards waiting for that timeout code to complete so we can acquire the lock. This way
nvme: Always lock and only avoid processing for recovery state
When we lose a race with the timeout code, shift towards waiting for that timeout code to complete so we can acquire the lock. This way we can make sure we're in 'normal' mode before processing I/O completions. If we're not in 'normal' mode, then we're resetting and we should avoid completions.
Sponsored by: Netflix Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D46024
show more ...
|
#
123e2906 |
| 20-Jul-2024 |
Warner Losh <imp@FreeBSD.org> |
nvme: widen nvme_qpair_manual_complete_request for better errors
Make nvme_qpair_manual_complete_request take dnr as well as a print_on_error action. Make the status word computation common between
nvme: widen nvme_qpair_manual_complete_request for better errors
Make nvme_qpair_manual_complete_request take dnr as well as a print_on_error action. Make the status word computation common between it and nvme_qpair_manual_complete_tracker. And print the error when we are cancelling the I/O on failure, but not when we're filtering the I/O after we've failed. Make it private again to nvme_qpair.c.
Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D46049
show more ...
|
#
9229b310 |
| 20-Jul-2024 |
Warner Losh <imp@FreeBSD.org> |
nvme: Fail passthrough commands right away in failed state.
When the drive is failed, we can't send passthrough commands to the card, so fail them right away. Rearrange the comments to reflect the c
nvme: Fail passthrough commands right away in failed state.
When the drive is failed, we can't send passthrough commands to the card, so fail them right away. Rearrange the comments to reflect the current failure paths in the driver.
Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D46048
show more ...
|
Revision tags: release/14.1.0 |
|
#
0dd84c3b |
| 13-May-2024 |
Warner Losh <imp@FreeBSD.org> |
nvme: Add comment about where tr->deadline is set
It's easy to overlook the chain of events that lead to tr->deadline being updated. Add a comment here to explain what otherwise looks like an oversi
nvme: Add comment about where tr->deadline is set
It's easy to overlook the chain of events that lead to tr->deadline being updated. Add a comment here to explain what otherwise looks like an oversight w/o careful study.
Sponsored by: Netflix
show more ...
|
#
c931cf6a |
| 13-May-2024 |
Warner Losh <imp@FreeBSD.org> |
nvme: Slight simplification
We don't need to dereference qpair to get the ctrlr pointer each time, so use the cached value. It's not going to change. No change intended.
Sponsored by: Netflix
|
#
9db8ca92 |
| 13-May-2024 |
Warner Losh <imp@FreeBSD.org> |
nvme: Slight reworking this loop to match FreeBSD style
Update the comment for the code, and slightly rework the code in the 'fast exit' paradigm that FreeBSD generally tries to do.
Sponsored by:
nvme: Slight reworking this loop to match FreeBSD style
Update the comment for the code, and slightly rework the code in the 'fast exit' paradigm that FreeBSD generally tries to do.
Sponsored by: Netflix
show more ...
|
#
5a178b83 |
| 13-May-2024 |
Warner Losh <imp@FreeBSD.org> |
nvme: Add locking asserts
nvme_qpair_complete_tracker and nvme_qpair_manual_complete_tracker have to be called without the qpair lock, so assert its unowned.
Sponsored by: Netflix
|
Revision tags: release/13.3.0 |
|
#
5650bd3f |
| 29-Jan-2024 |
John Baldwin <jhb@FreeBSD.org> |
nvme: Use the NVMEF macro to construct fields
Reviewed by: chuck, imp Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D43605
|
#
479680f2 |
| 29-Jan-2024 |
John Baldwin <jhb@FreeBSD.org> |
nvme: Use the NVMEV macro instead of expanded versions
Reviewed by: chuck Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D43595
|
#
fdafd315 |
| 24-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate no-longer-needed sys/cdefs.h includes as well as now-empty blank lines in a row.
Remov
sys: Automated cleanup of cdefs and other formatting
Apply the following automated changes to try to eliminate no-longer-needed sys/cdefs.h includes as well as now-empty blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/ Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/ Remove /\n+#if.*\n#endif.*\n+/ Remove /^#if.*\n#endif.*\n/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
show more ...
|
Revision tags: release/14.0.0 |
|
#
8d6c0743 |
| 06-Nov-2023 |
Alexander Motin <mav@FreeBSD.org> |
nvme: Introduce longer timeouts for admin queue
KIOXIA CD8 SSDs routinely take ~25 seconds to delete non-empty namespace. In some cases like hot-plug it takes longer, triggering timeout and control
nvme: Introduce longer timeouts for admin queue
KIOXIA CD8 SSDs routinely take ~25 seconds to delete non-empty namespace. In some cases like hot-plug it takes longer, triggering timeout and controller resets after just 30 seconds. Linux for many years has separate 60 seconds timeout for admin queue. This patch does the same. And it is good to be consistent.
Sponsored by: iXsystems, Inc. Reviewed by: imp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D42454
show more ...
|
#
afc3d49b |
| 10-Oct-2023 |
Warner Losh <imp@FreeBSD.org> |
nvme: Close a race in destroying qpair and timeouts
While we should have cleared all the pending I/O prior to calling nvme_qpair_destroy, which should ensure that if the callout_drain causes a call
nvme: Close a race in destroying qpair and timeouts
While we should have cleared all the pending I/O prior to calling nvme_qpair_destroy, which should ensure that if the callout_drain causes a call to nvme_qpair_timeout(), it won't schedule any new timeout. However, it doesn't hurt to set timeout_pending to false in nvme_qpair_destroy() and have nvme_qpair_timeout() exit early if it sees it w/o scheduling a timeout. Since we don't otherwise stop the timeout until we're about to destroy the qpair, this ensures we fail safe. The lock/unlock also ensures the callout_drain will either remove the callout, or wait for it to run with the early bailout.
We can likely further improve this by using callout_stop() inside the pending lock. I'll investigate that for future refinement.
Sponsored by: Netflix Suggestions by: jhb Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D42065
show more ...
|
#
9cd7b624 |
| 10-Oct-2023 |
Warner Losh <imp@FreeBSD.org> |
nvme: Eliminate RECOVERY_FAILED state
While it seemed like a good idea to have this state, we can do everything we wanted with the state by checking ctrlr->is_failed since that's set before we start
nvme: Eliminate RECOVERY_FAILED state
While it seemed like a good idea to have this state, we can do everything we wanted with the state by checking ctrlr->is_failed since that's set before we start failing the qpairs. Add some comments about racing when we're failing the controller, though in practice I'm not sure that kind of race could even be lost.
Sponsored by: Netflix Reviewed by: chuck, gallatin, jhb Differential Revision: https://reviews.freebsd.org/D42051
show more ...
|
#
1d6021cd |
| 26-Sep-2023 |
Warner Losh <imp@FreeBSD.org> |
nvme: Supress noise messages
When we're suspending, we get messages about waiting for the controller to reset. These are in error: we're not waiting for it to reset. We put the recovery state as par
nvme: Supress noise messages
When we're suspending, we get messages about waiting for the controller to reset. These are in error: we're not waiting for it to reset. We put the recovery state as part of suspending, so we should suppress these as a false positive.
Also remove a stray debug that's left over from earlier versions of the recovery code that no longer makes sense.
Sponsored by: Netflix
show more ...
|
#
da8324a9 |
| 24-Sep-2023 |
Warner Losh <imp@FreeBSD.org> |
nvme: Fix locking protocol violation to fix suspend / resume
Currently, when we suspend, we need to tear down all the qpairs. We call nvme_admin_qpair_abort_aers with the admin qpair lock held, but
nvme: Fix locking protocol violation to fix suspend / resume
Currently, when we suspend, we need to tear down all the qpairs. We call nvme_admin_qpair_abort_aers with the admin qpair lock held, but the tracker it will call for the pending AER also locks it (recursively) hitting an assert. This routine is called without the qpair lock held when we destroy the device entirely in a number of places. Add an assert to this effect and drop the qpair lock before calling it. nvme_admin_qpair_abort_aers then locks the qpair lock to traverse the list, dropping it around calls to nvme_qpair_complete_tracker, and restarting the list scan after picking it back up.
Note: If interrupts are still running, there's a tiny window for these AERs: If one fires just an instant after we manually complete it, then we'll be fine: we set the state of the queue to 'waiting' and we ignore interrupts while 'waiting'. We know we'll destroy all the queue state with these pending interrupts before looking at them again and we know all the TRs will have been completed or rescheduled. So either way we're covered.
Also, tidy up the failure case as well: failing a queue is a superset of disabling it, so no need to call disable first. This solves solves some locking issues with recursion since we don't need to recurse.. Set the qpair state of failed queues to RECOVERY_FAILED and stop scheduling the watchdog. Assert we're not failed when we're enabling a qpair, since failure currently is one-way. Make failure a little less verbose.
Next, kill the pre/post reset stuff. It's completely bogus since we disable the qparis, we don't need to also hold the lock through the reset: disabling will cause the ISR to return early. This keeps us from recursing on the recovery lock when resuming. We only need the recovery lock to avoid a specific race between the timer and the ISR.
Finally, kill NVME_RESET_2X. It'S been a major release since we put it in and nobody has used it as far as I can tell. And it was a motivator for the pre/post uglification.
These are all interrelated, so need to be done at the same time.
Sponsored by: Netflix Reviewed by: jhb Tested by: jhb (made sure suspend / resume worked) MFC After: 3 days Differential Revision: https://reviews.freebsd.org/D41866
show more ...
|
#
d9543162 |
| 15-Sep-2023 |
Warner Losh <imp@FreeBSD.org> |
nvme: Give up when we've failed
Normally, we poll the device every so often to see if commands have timed out. However, we'll go into the recovery state as part of failing the drive. To account for
nvme: Give up when we've failed
Normally, we poll the device every so often to see if commands have timed out. However, we'll go into the recovery state as part of failing the drive. To account for all possibilties, if we're failed when we get into the polling function, just stop polling: Party is over.
Sponsored by: Netflix
show more ...
|
#
8052b01e |
| 25-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
nvme: Add exclusion for ISR
Add a basically uncontended spinlock that we take out while the ISR is running. This has two effects: First, when we get a timeout, we can safely call the nvme_qpair_proc
nvme: Add exclusion for ISR
Add a basically uncontended spinlock that we take out while the ISR is running. This has two effects: First, when we get a timeout, we can safely call the nvme_qpair_process_completions w/o racing any ISRs. Second, we can use it to ensure that we don't reset the card while the ISRs are active (right now we just sleep and hope for the best, which usually is fine, but not always).
Sponsored by: Netflix MFC After: 2 weeks Reviewed by: chuck, gallatin Differential Revision: https://reviews.freebsd.org/D41452
show more ...
|
#
d4959bfc |
| 25-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
nvme: Greatly improve error recovery
Next phase of error recovery: Eliminate the REOVERY_START phase, since we don't need to wait to start recovery. Eliminate the RECOVERY_RESET phase since it is tr
nvme: Greatly improve error recovery
Next phase of error recovery: Eliminate the REOVERY_START phase, since we don't need to wait to start recovery. Eliminate the RECOVERY_RESET phase since it is transient, we now transition from RECOVERY_NORMAL into RECOVERY_WAITING.
In normal mode, read the status of the controller. If it is in failed state, or appears to be hot-plugged, jump directly to reset which will sort out the proper things to do. This will cause all pending I/O to complete with an abort status before the reset.
When in the NORMAL state, call the interrupt handler. This will complete all pending transactions when interrupts are broken or temporarily misbehaving. We then check all the pending completions for timeouts. If we have abort enabled, then we'll send an abort. Otherwise we'll assume the controller is wedged and needs a reset. By calling the interrupt handler here, we'll avoid an issue with the current code where we transitioned to RECOVERY_START which prevented any completions from happening. Now completions happen. In addition and follow-on I/O that is scheduled in the completion routines will be submitted, rather than queued, because the recovery state is correct. This also fixes a problem where I/O would timeout, but never complete, leading to hung I/O.
Resetting remains the same as before, just when we chose to reset has changed.
A nice side effect of these changes is that we now do I/O when interrupts to the card are totally broken. Followon commits will improve the error reporting and logging when this happens. Performance will be aweful, but will at least be minimally functional.
There is a small race when we're checking the completions if interrupts are working, but this is handled in a future commit.
Sponsored by: Netflix MFC After: 2 weeks Differential Revision: https://reviews.freebsd.org/D36922
show more ...
|
#
2a6b7055 |
| 25-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
nvme: Timeout expired transactions
When we went to having a shared timeout routine, failing the timed-out transaction code was inadvertantly dropped. Reinstate it.
Fixes: 502dc84a8b670 Sponsored
nvme: Timeout expired transactions
When we went to having a shared timeout routine, failing the timed-out transaction code was inadvertantly dropped. Reinstate it.
Fixes: 502dc84a8b670 Sponsored by: Netflix MFC After: 2 weeks Reviewed by: chuck, jhb Differential Revision: https://reviews.freebsd.org/D36921
show more ...
|