5bc155cf | 17-Nov-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs/gaudi2: use correct registers to dump QM CQ info
The QM CQ PTR_LO/PTR_HI/TSIZE registers are for pushing a CQ entry, and although they are updated by HW even when descriptors are fetched by PQ and CB addresses are fed into CQ, the correct registers to use when dumping the CQ info are the ones with the _STS suffix.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
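The register choice described in this commit can be illustrated with a minimal, self-contained sketch. It is not the driver's code: the register file is simulated with an array, and the names `qm_reg`, `rreg32`, `cq_info` and `dump_cq_info` are hypothetical placeholders, since the commit message gives neither the real Gaudi2 macro names nor the offsets. The only point shown is that the dump reads the `_STS` variants rather than the push registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical indices into a simulated register file. The real Gaudi2
 * offsets and macro names are not stated in the commit message. */
enum qm_reg {
	QM_CQ_PTR_LO,      /* push registers: used to push a CQ entry */
	QM_CQ_PTR_HI,
	QM_CQ_TSIZE,
	QM_CQ_PTR_LO_STS,  /* status registers: used when dumping CQ info */
	QM_CQ_PTR_HI_STS,
	QM_CQ_TSIZE_STS,
	QM_REG_COUNT
};

struct cq_info {
	uint64_t addr;
	uint32_t size;
};

/* Stand-in for a 32-bit MMIO read (assumption: plain array lookup). */
static uint32_t rreg32(const uint32_t *regs, enum qm_reg reg)
{
	return regs[reg];
}

/* Build the CQ address/size from the _STS registers only. */
static struct cq_info dump_cq_info(const uint32_t *regs)
{
	struct cq_info info;
	uint32_t lo, hi;

	assert(regs != NULL);

	lo = rreg32(regs, QM_CQ_PTR_LO_STS);
	hi = rreg32(regs, QM_CQ_PTR_HI_STS);

	/* Combine the two 32-bit halves into one 64-bit address. */
	info.addr = ((uint64_t)hi << 32) | lo;
	info.size = rreg32(regs, QM_CQ_TSIZE_STS);

	return info;
}
```

In this sketch, whatever values the push registers hold have no effect on the result of `dump_cq_info()`.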
|
ae303d88 | 06-Nov-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs/gaudi2: get the correct QM CQ info upon an error
Upon a QM error, the address/size from both the CQ and the ARC_CQ are printed, although the instruction that led to the error was received from only one of them.
Moreover, in case of a QM undefined opcode, only one of these address/size sets will be captured based on the value of ARC_CQ_PTR. However, this value can be non-zero even if currently the CQ is used, in case the CQ/ARC_CQ are alternately used.
Under the assumption of a stop-on-error configuration, use the CP_STS.CUR_CQ field to get the relevant CQ for the QM error.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
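The selection logic described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the driver's implementation: the struct `qm_regs`, the function `get_error_cq_info`, the bit position of `CUR_CQ` inside `CP_STS`, and the encoding (0 for CQ, 1 for ARC_CQ) are all hypothetical. The sketch only shows the idea of choosing one address/size set based on the current-CQ field instead of testing whether `ARC_CQ_PTR` is non-zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position of CUR_CQ inside CP_STS; the commit message does
 * not state it. Assumed encoding: 0 = CQ, 1 = ARC_CQ. */
#define CP_STS_CUR_CQ_SHIFT	20
#define CP_STS_CUR_CQ_MASK	(1u << CP_STS_CUR_CQ_SHIFT)

/* Hypothetical snapshot of the QM registers relevant to the dump. */
struct qm_regs {
	uint32_t cp_sts;
	uint64_t cq_ptr;
	uint32_t cq_tsize;
	uint64_t arc_cq_ptr;
	uint32_t arc_cq_tsize;
};

struct cq_info {
	uint64_t addr;
	uint32_t size;
	bool is_arc_cq;
};

/* Pick the CQ that was in use when the QM stopped on the error. */
static struct cq_info get_error_cq_info(const struct qm_regs *regs)
{
	struct cq_info info;

	assert(regs != NULL);

	info.is_arc_cq = (regs->cp_sts & CP_STS_CUR_CQ_MASK) != 0;

	if (info.is_arc_cq) {
		info.addr = regs->arc_cq_ptr;
		info.size = regs->arc_cq_tsize;
	} else {
		/* ARC_CQ_PTR may still be non-zero here if the two queues
		 * are used alternately, so it is deliberately ignored. */
		info.addr = regs->cq_ptr;
		info.size = regs->cq_tsize;
	}

	return info;
}
```

The `else` branch covers the case the commit message calls out: a stale non-zero `ARC_CQ_PTR` while the CQ is the one currently in use.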
|
0426e031 | 19-Sep-2023 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs/gaudi2: perform hard-reset upon PCIe AXI drain event
Non-completed transactions from PCIe towards the device are handled by the AXI drain mechanism. This handling is done at the PCIe level, but the transactions remain in the device and consume some queue entries, so the device must be reset. Perform a hard reset upon PCIe AXI drain events.
Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
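The policy change in this commit amounts to mapping one more event to a hard reset. The sketch below shows that mapping in isolation. The event names, the flag values and the function `reset_flags_for_event` are hypothetical, and treating every other event as "no reset" is a simplification for illustration only, not a claim about how the driver handles other events.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical event IDs; the real Gaudi2 event names are not given in
 * the commit message. */
enum event_id {
	EVENT_QM_ERROR,
	EVENT_PCIE_AXI_DRAIN,
	EVENT_OTHER
};

/* Hypothetical reset flags. */
#define RESET_NONE	0u
#define RESET_HARD	(1u << 0)

/* Decide which reset, if any, an event should trigger. */
static uint32_t reset_flags_for_event(enum event_id id)
{
	switch (id) {
	case EVENT_PCIE_AXI_DRAIN:
		/* Drained transactions still occupy queue entries in the
		 * device, so only a hard reset recovers it. */
		return RESET_HARD;
	default:
		/* Simplification: other events are out of scope here. */
		return RESET_NONE;
	}
}
```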
|