/freebsd/sys/dev/nvme/ |
nvme_private.h | d4959bfcd110ea471222c7dd87775ba1f4e3d1d9 Fri Aug 25 18:10:08 CEST 2023 Warner Losh <imp@FreeBSD.org> nvme: Greatly improve error recovery
Next phase of error recovery: Eliminate the RECOVERY_START phase, since we don't need to wait to start recovery. Eliminate the RECOVERY_RESET phase, since it is transient; we now transition from RECOVERY_NORMAL directly into RECOVERY_WAITING.
In normal mode, read the status of the controller. If it is in failed state, or appears to be hot-plugged, jump directly to reset which will sort out the proper things to do. This will cause all pending I/O to complete with an abort status before the reset.
When in the NORMAL state, call the interrupt handler. This will complete all pending transactions when interrupts are broken or temporarily misbehaving. We then check all the pending completions for timeouts. If we have abort enabled, then we'll send an abort. Otherwise we'll assume the controller is wedged and needs a reset. By calling the interrupt handler here, we avoid an issue with the current code, where we transitioned to RECOVERY_START, which prevented any completions from happening. Now completions happen. In addition, any follow-on I/O that is scheduled in the completion routines will be submitted, rather than queued, because the recovery state is correct. This also fixes a problem where I/O would time out but never complete, leading to hung I/O.
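The timeout flow described above can be sketched as a small userland simulation. This is an illustration only, not the driver code: the `sim_` types and functions, the `ACT_` values, and the integer clock are hypothetical, the state names are taken from this message and may not match the identifiers in nvme_private.h, and returning early while in RECOVERY_WAITING is an assumption about how an in-progress reset is treated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* State names follow the commit message; the driver's identifiers may differ. */
enum recovery_state { RECOVERY_NORMAL, RECOVERY_WAITING };

/* What the timeout handler decided to do (hypothetical return values). */
enum timeout_action { ACT_NONE, ACT_ABORT, ACT_RESET };

struct sim_tracker {
	bool completed_in_hw;	/* device finished it, but no interrupt arrived */
	bool done;		/* completion routine has run */
	int deadline;		/* tick at which the request times out */
};

struct sim_qpair {
	enum recovery_state state;
	bool ctrlr_failed;	/* controller status reports a failure */
	bool ctrlr_gone;	/* register reads look like a hot-unplug */
	bool abort_enabled;
	struct sim_tracker trackers[4];
	size_t ntrackers;
};

/* Stand-in for the interrupt handler: reap whatever the device finished. */
static void
sim_process_completions(struct sim_qpair *q)
{
	for (size_t i = 0; i < q->ntrackers; i++)
		if (q->trackers[i].completed_in_hw)
			q->trackers[i].done = true;
}

enum timeout_action
sim_timeout(struct sim_qpair *q, int now)
{
	bool expired = false;

	assert(q != NULL);

	/* Assumption: while a reset is pending, the handler only waits. */
	if (q->state != RECOVERY_NORMAL)
		return (ACT_NONE);

	/* Failed or missing controller: go straight to reset. */
	if (q->ctrlr_failed || q->ctrlr_gone) {
		q->state = RECOVERY_WAITING;
		return (ACT_RESET);
	}

	/* Poll for completions first, in case interrupts are broken. */
	sim_process_completions(q);

	/* Only requests that are still outstanding can have timed out. */
	for (size_t i = 0; i < q->ntrackers; i++)
		if (!q->trackers[i].done && now >= q->trackers[i].deadline)
			expired = true;

	if (!expired)
		return (ACT_NONE);
	if (q->abort_enabled)
		return (ACT_ABORT);

	/* No abort support: assume the controller is wedged. */
	q->state = RECOVERY_WAITING;
	return (ACT_RESET);
}
```

In this sketch, a request whose deadline has passed but which the device already finished is reaped by the polling step and does not trigger recovery. That is the property that keeps I/O moving when interrupts are lost.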
Resetting remains the same as before; only when we choose to reset has changed.
A nice side effect of these changes is that we now do I/O when interrupts to the card are totally broken. Follow-on commits will improve the error reporting and logging when this happens. Performance will be awful, but the device will at least be minimally functional.
There is a small race when we're checking the completions if interrupts are working, but this is handled in a future commit.
Sponsored by: Netflix
MFC After: 2 weeks
Differential Revision: https://reviews.freebsd.org/D36922
|
nvme_qpair.c | d4959bfcd110ea471222c7dd87775ba1f4e3d1d9 Fri Aug 25 18:10:08 CEST 2023 Warner Losh <imp@FreeBSD.org> nvme: Greatly improve error recovery
|