57d298bd | 25-Apr-2024 | Mikko Perttunen <mperttunen@nvidia.com>
gpu: host1x: Add MLOCK recovery for rest of engines
Add class IDs / MLOCKs for MLOCK recovery for the rest of the engines present on Tegra234.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425050238.2943404-4-cyndis@kapsi.fi
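
As a rough illustration of what per-engine MLOCK recovery needs, here is a minimal userspace-style C sketch. The structure, the lookup helper and all of the numeric values are placeholders for illustration, not the driver's actual tables.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical class-ID-to-MLOCK mapping; all values are placeholders. */
    struct engine_mlock {
            uint32_t class_id;  /* engine class ID */
            uint32_t mlock;     /* MLOCK index held by that engine */
    };

    static const struct engine_mlock tegra234_engine_mlocks[] = {
            { 0x5d, 7 },        /* illustrative entry, e.g. a VIC-like engine */
            { 0xf0, 8 },        /* illustrative entry, e.g. a NVDEC-like engine */
            { 0xc0, 9 },        /* illustrative entry, e.g. a NVJPG-like engine */
    };

    /* Look up which MLOCK to force-release when recovering a job of a given class. */
    static int class_to_mlock(uint32_t class_id, uint32_t *mlock)
    {
            for (size_t i = 0; i < sizeof(tegra234_engine_mlocks) /
                                    sizeof(tegra234_engine_mlocks[0]); i++) {
                    if (tegra234_engine_mlocks[i].class_id == class_id) {
                            *mlock = tegra234_engine_mlocks[i].mlock;
                            return 0;
                    }
            }

            return -1;          /* unknown class: nothing to release */
    }
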
e436a408 | 25-Apr-2024 | Mikko Perttunen <mperttunen@nvidia.com>
gpu: host1x: Handle CDMA wraparound when debug printing
During the channel debug information dump, the code that prints CDMA opcodes did not take the circular nature of the CDMA pushbuffer into account and could sometimes read past the end of the buffer. Change the printing to take the wraparound into account.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240425050238.2943404-2-cyndis@kapsi.fi
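
A minimal sketch of the wraparound handling described above, in plain userspace C; the function name and print format are illustrative, not the driver's code.

    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    /*
     * Dump 'count' opcode words starting at 'start', wrapping the index at
     * 'pb_words' (push buffer size in words) instead of reading past the end.
     */
    static void dump_cdma_opcodes(const uint32_t *pb, size_t pb_words,
                                  size_t start, size_t count)
    {
            for (size_t i = 0; i < count; i++) {
                    size_t idx = (start + i) % pb_words;    /* wraparound */

                    printf("  [%04zx] %08x\n", idx, pb[idx]);
            }
    }
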
d5179020 | 19-Jan-2023 | Mikko Perttunen <mperttunen@nvidia.com>
gpu: host1x: External timeout/cancellation for fences
Currently all fences have a 30 second timeout to ensure they are cleaned up if the fence never completes otherwise. However, this one-size-fits-all solution doesn't actually fit every case, such as syncpoint waiting, where we want to be able to have timeouts longer than 30 seconds. As such, we want to give control over fence cancellation to the caller (and maybe eventually get rid of the internal timeout altogether).
Here we add this cancellation mechanism by adding a function for entering the timeout path via a direct function call, and changing the syncpoint wait function to use it.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
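
A hedged, userspace-style sketch of the mechanism (all names invented; this is not the host1x fence API): the cleanup that previously ran only from the internal timer becomes a function the caller can also invoke directly when its own, possibly longer, deadline expires.

    #include <stdbool.h>

    struct sw_fence {
            bool signaled;
            bool cancelled;
            /* ... bookkeeping released by the timeout path ... */
    };

    /* Common teardown path, previously reachable only from the 30 s timer. */
    static void fence_timeout_path(struct sw_fence *f)
    {
            if (f->signaled || f->cancelled)
                    return;

            f->cancelled = true;
            /* release waiter bookkeeping, drop references, etc. */
    }

    /* The internal timer keeps using the same path ... */
    static void fence_timer_expired(struct sw_fence *f)
    {
            fence_timeout_path(f);
    }

    /* ... and callers such as the syncpoint wait code can now enter it
     * directly when their own deadline (possibly > 30 seconds) expires. */
    void fence_cancel(struct sw_fence *f)
    {
            fence_timeout_path(f);
    }
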
625d4ffb | 19-Jan-2023 | Mikko Perttunen <mperttunen@nvidia.com>
gpu: host1x: Rewrite syncpoint interrupt handling
Move from the old, complex intr handling code to a new implementation based on dma_fences. While there is a fair bit of churn to get there, the new implementation is much simpler and likely faster as well, since it allows signaling directly from interrupt context.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
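
A rough sketch of the shape of the new approach, in userspace-style C with invented names (the real code uses struct dma_fence and the kernel's list primitives): pending fences are kept ordered by threshold, and the threshold interrupt handler signals every fence whose threshold the syncpoint value has reached, directly in interrupt context.

    #include <stdint.h>
    #include <stdbool.h>

    struct sp_fence {
            uint32_t threshold;         /* syncpoint value to wait for */
            bool signaled;
            struct sp_fence *next;      /* list kept sorted by threshold */
    };

    /*
     * Called from the threshold interrupt: signal every fence whose threshold
     * has been reached; the rest stay queued for the next interrupt.
     */
    static void syncpt_threshold_irq(struct sp_fence **pending, uint32_t value)
    {
            /* wraparound-safe "value has passed threshold" comparison */
            while (*pending && (int32_t)(value - (*pending)->threshold) >= 0) {
                    struct sp_fence *f = *pending;

                    *pending = f->next;
                    f->signaled = true; /* dma_fence_signal() in the driver */
            }
    }
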
c24973ed | 19-Jan-2023 | Mikko Perttunen <mperttunen@nvidia.com>
gpu: host1x: Implement job tracking using DMA fences
In anticipation of the removal of the intr API, implement job tracking using DMA fences instead. The two main changes are making cdma_update schedule its work, since fence completion can now be called from interrupt context, and some added complexity in ensuring the callback is not running when we free the fence.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
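
A hedged sketch of the "callback must not be running when we free" concern, using a plain mutex in userspace C (illustrative only, not the driver's actual mechanism): the same lock is held while the callback runs and while it is torn down, so once teardown has taken the lock and cleared the callback, the callback can no longer be executing.

    #include <pthread.h>
    #include <stdlib.h>

    struct job_fence {
            pthread_mutex_t lock;
            void (*on_complete)(struct job_fence *f);   /* NULL once removed */
    };

    /* Completion path (reached from interrupt context in the driver). */
    static void fence_complete(struct job_fence *f)
    {
            pthread_mutex_lock(&f->lock);
            if (f->on_complete)
                    f->on_complete(f);
            pthread_mutex_unlock(&f->lock);
    }

    /* Teardown: after this returns, the callback cannot still be running. */
    static void fence_free(struct job_fence *f)
    {
            pthread_mutex_lock(&f->lock);
            f->on_complete = NULL;
            pthread_mutex_unlock(&f->lock);

            pthread_mutex_destroy(&f->lock);
            free(f);
    }
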
eb258cc1 | 19-Jan-2023 | Mikko Perttunen <mperttunen@nvidia.com>
gpu: host1x: Don't skip assigning syncpoints to channels
The code to write the syncpoint channel assignment register incorrectly skips the write if hypervisor registers are not available.
The register, however, is within the guest aperture, so remove the check and assign syncpoints properly even on virtualized systems.
Fixes: c3f52220f276 ("gpu: host1x: Enable Tegra186 syncpoint protection")
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
a94b8a77 | 27-Jun-2022 | Mikko Perttunen <mperttunen@nvidia.com>
gpu: host1x: Add MLOCK release code on Tegra234
With the full-featured opcode sequence using MLOCKs, we also need to unlock those MLOCKs in the event of a timeout. However, it turns out that on Tegra186/Tegra194 this isn't needed by default, and on Tegra234 it is much simpler to do, so only implement this on Tegra234 for the time being.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
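
A hedged sketch of what a timeout-time MLOCK release can look like (the lock count, the "owner" model and the constants are placeholders, not the Tegra234 register interface): walk the MLOCKs and force-release any lock still held by the channel that timed out.

    #include <stdint.h>

    #define NUM_MLOCKS      24U             /* placeholder count */
    #define MLOCK_UNLOCKED  0xffffffffu     /* placeholder "no owner" value */

    /* Model of the per-MLOCK owner state; in hardware this is a register file. */
    static uint32_t mlock_owner[NUM_MLOCKS];

    /* On job timeout, force-release every MLOCK still held by the hung channel. */
    static void release_channel_mlocks(uint32_t channel)
    {
            for (unsigned int i = 0; i < NUM_MLOCKS; i++) {
                    if (mlock_owner[i] == channel)
                            mlock_owner[i] = MLOCK_UNLOCKED;    /* register write in the driver */
            }
    }
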
1411796f | 27-Jun-2022 | Mikko Perttunen <mperttunen@nvidia.com>
gpu: host1x: Rewrite job opcode sequence
For new (Tegra186+) SoCs, use a new ('full-featured') job opcode sequence that is compatible with virtualization. In particular, the Host1x hardware in Tegra234 is stricter about the sequence, requiring the ACQUIRE_MLOCK, SETCLASS and SETSTREAMID opcodes to occur in that order without gaps (except for SETPAYLOAD), so let's do it properly in one go now.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
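
An illustrative sketch of the ordering requirement (the enc_* helpers use made-up encodings, not host1x's real opcode formats): the Tegra234-style job prologue pushes ACQUIRE_MLOCK, SETCLASS and SETSTREAMID back to back, with nothing but an optional SETPAYLOAD allowed in between.

    #include <stdint.h>
    #include <stddef.h>

    /* Made-up opcode encodings purely for illustration. */
    static uint32_t enc_acquire_mlock(uint32_t mlock)   { return 0x01000000u | mlock; }
    static uint32_t enc_setclass(uint32_t class_id)     { return 0x02000000u | class_id; }
    static uint32_t enc_setstreamid(uint32_t stream_id) { return 0x03000000u | stream_id; }

    /*
     * Emit the job prologue as one uninterrupted sequence: ACQUIRE_MLOCK,
     * then SETCLASS, then SETSTREAMID, with no unrelated opcodes in between.
     */
    static size_t emit_job_prologue(uint32_t *pb, uint32_t mlock,
                                    uint32_t class_id, uint32_t stream_id)
    {
            size_t n = 0;

            pb[n++] = enc_acquire_mlock(mlock);
            pb[n++] = enc_setclass(class_id);
            pb[n++] = enc_setstreamid(stream_id);

            return n;   /* number of words pushed */
    }
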
7afd1194 | 27-Jun-2022 | Mikko Perttunen <mperttunen@nvidia.com>
gpu: host1x: Program interrupt destinations on Tegra234
On Tegra234, each Host1x VM has 8 interrupt lines. Each syncpoint can be configured with the interrupt line that should be used for its threshold interrupt, allowing for load balancing.
For now, to keep backwards compatibility, just set all syncpoints to the first interrupt.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
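
A hedged sketch of the backwards-compatible programming described above (the syncpoint count and the register model are placeholders, not the Tegra234 layout): every syncpoint's threshold interrupt is routed to line 0 of the VM's eight lines.

    #include <stdint.h>

    #define NUM_SYNCPOINTS  1024U   /* placeholder count */
    #define NUM_IRQ_LINES   8U      /* per-VM interrupt lines on Tegra234 */

    /* Stand-in for the per-syncpoint interrupt-destination registers. */
    static uint32_t syncpt_intr_dest[NUM_SYNCPOINTS];

    /* For now, route every syncpoint's threshold interrupt to line 0. */
    static void program_intr_destinations(void)
    {
            for (uint32_t id = 0; id < NUM_SYNCPOINTS; id++)
                    syncpt_intr_dest[id] = 0;   /* register write in the driver */
    }
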
fed02893 | 09-Jul-2021 | Thierry Reding <treding@nvidia.com>
gpu: host1x: debug: Dump DMASTART and DMAEND register
Show the values of the DMASTART and DMAEND registers when dumping status to help with failure analysis.
Signed-off-by: Thierry Reding <treding@nvidia.com>
afa770fe | 09-Jul-2021 | Thierry Reding <treding@nvidia.com>
gpu: host1x: debug: Dump only relevant parts of CDMA push buffer
Dumping the full CDMA push buffer takes a long time and isn't very useful since most of the contents are not relevant. Instead only show the CDMA push buffer entries associated with current jobs.
While at it, tweak the indentation a bit to make the output more readable.
Signed-off-by: Thierry Reding <treding@nvidia.com>
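
A hedged sketch of the idea (types and field names invented for illustration): instead of dumping the whole push buffer, walk the list of in-flight jobs and print only the word range each job occupies, wrapping at the end of the buffer.

    #include <stdint.h>
    #include <stdio.h>
    #include <stddef.h>

    struct pb_job {
            size_t first;           /* first push buffer word of this job */
            size_t words;           /* number of words the job occupies */
            struct pb_job *next;
    };

    /* Dump only the push buffer slices that belong to in-flight jobs. */
    static void dump_active_jobs(const uint32_t *pb, size_t pb_words,
                                 const struct pb_job *jobs)
    {
            for (const struct pb_job *job = jobs; job; job = job->next) {
                    for (size_t i = 0; i < job->words; i++) {
                            size_t idx = (job->first + i) % pb_words;

                            printf("  [%04zx] %08x\n", idx, pb[idx]);
                    }
            }
    }
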
e902585f | 10-Jun-2021 | Mikko Perttunen <mperttunen@nvidia.com>
gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer
Add support for inserting syncpoint waits in the CDMA pushbuffer. These waits need to be done in the HOST1X class, while gathers submitted by the application execute in the engine class.
Support is added by converting the job's gather list into a command list that can include both gathers and waits. When the job is submitted, these commands are pushed as the appropriate opcodes on the CDMA pushbuffer.
Also supported are waits relative to the start of the job, which are useful for jobs doing multiple things with an engine that doesn't natively support pipelining.
While at it, use 32-bit waits on chips that support them.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
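
A hedged sketch of the command-list idea (types and fields invented for illustration, not the host1x structures): the job carries a list of commands, each either a gather or a wait, and submission turns each one into the matching CDMA opcodes.

    #include <stdint.h>
    #include <stddef.h>
    #include <stdbool.h>

    enum job_cmd_type {
            JOB_CMD_GATHER,     /* executes in the engine class */
            JOB_CMD_WAIT,       /* executed in HOST1X class */
    };

    struct job_cmd {
            enum job_cmd_type type;
            union {
                    struct {
                            uint64_t iova;      /* gather base address */
                            uint32_t words;     /* gather length in words */
                    } gather;
                    struct {
                            uint32_t syncpt_id;
                            uint32_t threshold; /* absolute, or relative to job start */
                            bool relative;
                    } wait;
            };
    };

    /* At submit time, each command becomes the matching CDMA opcodes. */
    static void push_commands(const struct job_cmd *cmds, size_t count)
    {
            for (size_t i = 0; i < count; i++) {
                    switch (cmds[i].type) {
                    case JOB_CMD_GATHER:
                            /* push a GATHER opcode pointing at cmds[i].gather */
                            break;
                    case JOB_CMD_WAIT:
                            /* switch to HOST1X class, push the wait, restore class */
                            break;
                    }
            }
    }
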