/linux/include/drm/ |
drm_panic.h | e2a1cda3e0c784740751d46431973dcee32cf108 Tue Apr 09 18:30:40 CEST 2024 Daniel Vetter <daniel.vetter@ffwll.ch> drm/panic: Add drm panic locking
Rough sketch for the locking of drm panic printing code. The upshot of this approach is that we can pretty much entirely rely on the atomic commit flow, with the pair of raw_spin_lock/unlock providing any barriers we need, without having to create really big critical sections in code.
This also avoids requiring drivers to explicitly update the panic handler state, which they might forget to do, or not do consistently, and then we blow up at the worst possible time.
It is somewhat racy against a concurrent atomic update, and we might write into a buffer which the hardware will never display. But there's fundamentally no way to avoid that - if we do the panic state update explicitly after writing to the hardware, we might instead write to an old buffer that the user will barely ever see.
Note that an rcu protected dereference of plane->state would give us the same guarantees, but it has the downside that we then need to protect the plane state freeing functions with call_rcu too. That would impact a lot of code very widely and therefore doesn't seem worth the complexity compared to a raw spinlock with very tiny critical sections. Plus rcu cannot be used to protect access to peek/poke registers anyway, so we'd still need the spinlock for those cases.
Peek/poke registers for vram access (or a gart pte reserved just for panic code) are also the reason I've gone with a per-device and not per-plane spinlock, since usually these things are global for the entire display. Going with per-plane locks would mean drivers for such hardware would need additional locks, which we don't want, since it deviates from the per-console takeover locks design.
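The scheme described above can be modelled in userspace as a rough sketch. Everything here is a hypothetical stand-in and not the kernel's code: `struct device`, `struct plane_state`, `panic_lock()`, `panic_trylock()`, `panic_unlock()`, `commit_swap_state()` and `panic_draw()` are illustrative names, and a C11 `atomic_flag` plays the role of the per-device raw spinlock (interrupt disabling is not modelled). The commit path swaps the state pointer inside a tiny critical section, and the panic path only try-locks so it can never spin on a lock held by the CPU that panicked.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the plane state the panic code wants to read. */
struct plane_state {
	int fb_id;
};

/* Hypothetical device: one lock for the whole device, not one per plane. */
struct device {
	atomic_flag panic_lock;      /* models the per-device raw spinlock */
	struct plane_state *state;   /* models plane->state */
};

static void device_init(struct device *dev, struct plane_state *initial)
{
	atomic_flag_clear(&dev->panic_lock);
	dev->state = initial;
}

static void panic_lock(struct device *dev)
{
	/* Spin until the flag was previously clear; acquire orders later reads. */
	while (atomic_flag_test_and_set_explicit(&dev->panic_lock,
						 memory_order_acquire))
		;
}

static bool panic_trylock(struct device *dev)
{
	/* One attempt only: true if we took the lock, false if it was held. */
	return !atomic_flag_test_and_set_explicit(&dev->panic_lock,
						  memory_order_acquire);
}

static void panic_unlock(struct device *dev)
{
	atomic_flag_clear_explicit(&dev->panic_lock, memory_order_release);
}

/*
 * Commit path: publish the new state inside a tiny critical section and
 * hand the old state back so the caller can free it after unlocking.
 */
static struct plane_state *commit_swap_state(struct device *dev,
					     struct plane_state *new_state)
{
	struct plane_state *old;

	assert(new_state != NULL);
	panic_lock(dev);
	old = dev->state;
	dev->state = new_state;
	panic_unlock(dev);
	return old;
}

/*
 * Panic path: never spin. Returns the fb id it would draw to, or -1 if
 * the lock was busy (or there is no state) and drawing was skipped.
 */
static int panic_draw(struct device *dev)
{
	int fb = -1;

	if (!panic_trylock(dev))
		return -1;
	if (dev->state)
		fb = dev->state->fb_id;
	panic_unlock(dev);
	return fb;
}
```

In this model, calling `panic_draw()` while another context holds the lock simply returns -1, which mirrors the trade-off in the message: the panic output may be skipped or land in a buffer that is about to be replaced, but the panic path cannot deadlock.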
Longer term it might be useful if the panic notifiers grow a bit more structure than just the absolute bare EXPORT_SYMBOL(panic_notifier_list) - somewhat aside, why is that not EXPORT_SYMBOL_GPL ... If panic notifiers would be more like console drivers with proper register/unregister interfaces we could perhaps reuse the very fancy console lock with all its check and takeover semantics that John Ogness is developing to fix the console_lock mess. But for the initial cut of drm panic printing support I don't think we need that, because the critical sections are extremely small and only happen once per display refresh. So generally just 60 tiny locked sections per second, which is nothing compared to a serial console running at 115kbaud doing really slow mmio writes for each byte. So for now the raw spin trylock in the drm panic notifier callback should be good enough.
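To put the "60 tiny locked sections per second" comparison into numbers, here is a small back-of-the-envelope sketch. The inputs are assumptions, not figures from the patch: "115kbaud" is taken as 115200 baud, framing as 8N1 (10 bits on the wire per byte), and the display as refreshing at 60 Hz with one critical section per refresh. The function names are made up for illustration.

```c
#include <assert.h>

/* Bytes per second a UART can emit when each byte costs bits_per_frame bits. */
static unsigned int serial_bytes_per_second(unsigned int baud,
					    unsigned int bits_per_frame)
{
	assert(bits_per_frame > 0);
	return baud / bits_per_frame;
}

/* Serial byte writes that fit in the time between two locked sections. */
static unsigned int serial_writes_per_lock_section(unsigned int baud,
						   unsigned int bits_per_frame,
						   unsigned int refresh_hz)
{
	assert(refresh_hz > 0);
	return serial_bytes_per_second(baud, bits_per_frame) / refresh_hz;
}
```

Under those assumptions, `serial_bytes_per_second(115200, 10)` gives 11520 byte writes per second, and `serial_writes_per_lock_section(115200, 10, 60)` gives 192, so a busy serial console performs roughly two hundred slow mmio writes for every one of the panic lock's critical sections.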
Another benefit of making panic notifiers more like full-blown consoles (that are used in panics only) would be that we get the two-stage design, where first all the safe outputs are used, and only then the dangerous takeover tricks are deployed (where for display drivers we also might try to intercept any in-flight display buffer flips, which if we race and misprogram fifos and watermarks can hang the memory controller on some hw).
For context the actual implementation on the drm side is by Jocelyn and this patch is meant to be combined with the overall approach in v7 (v8 is a bit less flexible, which I think is the wrong direction):
https://lore.kernel.org/dri-devel/20240104160301.185915-1-jfalempe@redhat.com/
Note that the locking is very much not correct there, hence this separate rfc.
Starting from v10, I (Jocelyn) have included this patch in the drm_panic series, and done the corresponding changes.
v2:
- fix authorship, this was all my typing
- some typo oopsies
- link to the drm panic work by Jocelyn for context
v10:
- Use spinlock_irqsave/restore (John Ogness)
v11:
- Use macro instead of inline functions for drm_panic_lock/unlock (John Ogness)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jocelyn Falempe <jfalempe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240409163432.352518-2-jfalempe@redhat.com
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
drm_mode_config.h | e2a1cda3e0c784740751d46431973dcee32cf108 Tue Apr 09 18:30:40 CEST 2024 Daniel Vetter <daniel.vetter@ffwll.ch> drm/panic: Add drm panic locking
|
/linux/drivers/gpu/drm/ |
drm_drv.c | e2a1cda3e0c784740751d46431973dcee32cf108 Tue Apr 09 18:30:40 CEST 2024 Daniel Vetter <daniel.vetter@ffwll.ch> drm/panic: Add drm panic locking
|
drm_atomic_helper.c | e2a1cda3e0c784740751d46431973dcee32cf108 Tue Apr 09 18:30:40 CEST 2024 Daniel Vetter <daniel.vetter@ffwll.ch> drm/panic: Add drm panic locking
|