#
b08494a8 |
| 28-May-2025 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie: "As part of building up nova-core/nova-drm pieces we've brought in some rust abstra
Merge tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie: "As part of building up nova-core/nova-drm pieces we've brought in some rust abstractions through this tree, aux bus being the main one, with devres changes also in the driver-core tree. Along with the drm core abstractions and enough nova-core/nova-drm to use them. This is still all stub work under construction, to build the nova driver upstream.
The other big NVIDIA related one is nouveau adds support for Hopper/Blackwell GPUs, this required a new GSP firmware update to 570.144, and a bunch of rework in order to support multiple fw interfaces.
There is also the introduction of an asahi uapi header file as a precursor to getting the real driver in later, but to unblock userspace mesa packages while the driver is trapped behind rust enablement.
Otherwise it's the usual mixture of stuff all over, amdgpu, i915/xe, and msm being the main ones, and some changes to vsprintf.
new drivers: - bring in the asahi uapi header standalone - nova-drm: stub driver
rust dependencies (for nova-core): - auxiliary - bus abstractions - driver registration - sample driver - devres changes from driver-core - revocable changes
core: - add Apple fourcc modifiers - add virtio capset definitions - extend EXPORT_SYNC_FILE for timeline syncobjs - convert to devm_platform_ioremap_resource - refactor shmem helper page pinning - DP powerup/down link helpers - extended %p4cc in vsprintf.c to support fourcc prints - change vsprintf %p4cn to %p4chR, remove %p4cn - Add drm_file_err function - IN_FORMATS_ASYNC property - move sitronix from tiny to their own subdir
rust: - add drm core infrastructure rust abstractions (device/driver, ioctl, file, gem)
dma-buf: - adjust sg handling to not cache map on attach - allow setting dma-device for import - Add a helper to sort and deduplicate dma_fence arrays
docs: - updated drm scheduler docs - fbdev todo update - fb rendering - actual brightness
ttm: - fix delayed destroy resv object
bridge: - add kunit tests - convert tc358775 to atomic - convert drivers to devm_drm_bridge_alloc - convert rk3066_hdmi to bridge driver
scheduler: - add kunit tests
panel: - refcount panels to improve lifetime handling - Powertip PH128800T004-ZZA01 - NLT NL13676BC25-03F, Tianma TM070JDHG34-00 - Himax HX8279/HX8279-D DDIC - Visionox G2647FB105 - Sitronix ST7571 - ZOTAC rotation quirk
vkms: - allow attaching more displays
i915: - xe3lpd display updates - vrr refactor - intel_display struct conversions - xe2hpd memory type identification - add link rate/count to i915_display_info - cleanup VGA plane handling - refactor HDCP GSC - fix SLPC wait boosting reference counting - add 20ms delay to engine reset - fix fence release on early probe errors
xe: - SRIOV updates - BMG PCI ID update - support separate firmware for each GT - SVM fix, prelim SVM multi-device work - export fan speed - temp disable d3cold on BMG - backup VRAM in PM notifier instead of suspend/freeze - update xe_ttm_access_memory to use GPU for non-visible access - fix guc_info debugfs for VFs - use copy_from_user instead of __copy_from_user - append PCIe gen5 limitations to xe_firmware document
amdgpu: - DSC cleanup - DC Scaling updates - Fused I2C-over-AUX updates - DMUB updates - Use drm_file_err in amdgpu - Enforce isolation updates - Use new dma_fence helpers - USERQ fixes - Documentation updates - SR-IOV updates - RAS updates - PSP 12 cleanups - GC 9.5 updates - SMU 13.x updates - VCN / JPEG SR-IOV updates
amdkfd: - Update error messages for SDMA - Userptr updates - XNACK fixes
radeon: - CIK doorbell cleanup
nouveau: - add support for NVIDIA r570 GSP firmware - enable Hopper/Blackwell support
nova-core: - fix task list - register definition infrastructure - move firmware into own rust module - register auxiliary device for nova-drm
nova-drm: - initial driver skeleton
msm: - GPU: - ACD (adaptive clock distribution) for X1-85 - drop fictional address_space_size - improve GMU HFI response time out robustness - fix crash when throttling during boot - DPU: - use single CTL path for flushing on DPU 5.x+ - improve SSPP allocation code for better sharing - Enabled SmartDMA on SM8150, SC8180X, SC8280XP, SM8550 - Added SAR2130P support - Disabled DSC support on MSM8937, MSM8917, MSM8953, SDM660 - DP: - switch to new audio helpers - better LTTPR handling - DSI: - Added support for SA8775P - Added SAR2130P support - HDMI: - Switched to use new helpers for ACR data - Fixed old standing issue of HPD not working in some cases
amdxdna: - add dma-buf support - allow empty command submits
renesas: - add dma-buf support - add zpos, alpha, blend support
panthor: - fail properly for NO_MMAP bos - add SET_LABEL ioctl - debugfs BO dumping support
imagination: - update DT bindings - support TI AM68 GPU
hibmc: - improve interrupt handling and HPD support
virtio: - add panic handler support
rockchip: - add RK3588 support - add DP AUX bus panel support
ivpu: - add heartbeat based hangcheck
mediatek: - prepares support for MT8195/99 HDMIv2/DDCv2
anx7625: - improve HPD
tegra: - speed up firmware loading
* tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel: (1627 commits) drm/nouveau/tegra: Fix error pointer vs NULL return in nvkm_device_tegra_resource_addr() drm/xe: Default auto_link_downgrade status to false drm/xe/guc: Make creation of SLPC debugfs files conditional drm/i915/display: Add check for alloc_ordered_workqueue() and alloc_workqueue() drm/i915/dp_mst: Work around Thunderbolt sink disconnect after SINK_COUNT_ESI read drm/i915/ptl: Use everywhere the correct DDI port clock select mask drm/nouveau/kms: add support for GB20x drm/dp: add option to disable zero sized address only transactions. drm/nouveau: add support for GB20x drm/nouveau/gsp: add hal for fifo.chan.doorbell_handle drm/nouveau: add support for GB10x drm/nouveau/gf100-: track chan progress with non-WFI semaphore release drm/nouveau/nv50-: separate CHANNEL_GPFIFO handling out from CHANNEL_DMA drm/nouveau: add helper functions for allocating pinned/cpu-mapped bos drm/nouveau: add support for GH100 drm/nouveau: improve handling of 64-bit BARs drm/nouveau/gv100-: switch to volta semaphore methods drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES drm/nouveau/gsp: init client VMMs with NV0080_CTRL_DMA_SET_PAGE_DIRECTORY drm/nouveau/gsp: fetch level shift and PDE from BAR2 VMM ...
show more ...
|
Revision tags: v6.15 |
|
#
72dc7c05 |
| 19-May-2025 |
Dave Airlie <airlied@redhat.com> |
Merge tag 'amd-drm-next-6.16-2025-05-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amdgpu: - Misc code cleanups - UserQ fixes - MALL reporting fix - DP AUX fixes - DCN 3.5 fixes -
Merge tag 'amd-drm-next-6.16-2025-05-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amdgpu: - Misc code cleanups - UserQ fixes - MALL reporting fix - DP AUX fixes - DCN 3.5 fixes - DP MST fixes - DC DMI quirks cleanup - RAS fixes - SR-IOV updates - GC 9.5 updates - Misc display fixes - VCN 4.0.5 powergating race fix - SMU 13.x updates - Paritioning fixes - VCN 5.0.1 SR-IOV updates - JPEG 5.0.1 SR-IOV updates
amdkfd: - Fix spurious warning in interrupt code - XNACK fixes
radeon: - CIK doorbell cleanup
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250516204609.2437472-1-alexander.deucher@amd.com
show more ...
|
Revision tags: v6.15-rc7 |
|
#
96a86dcb |
| 13-May-2025 |
Jesse.Zhang <Jesse.Zhang@amd.com> |
drm/amdgpu: Fix circular locking in userq creation
A circular locking dependency was detected between the global `adev->userq_mutex` and per-file `userq_mgr->userq_mutex` when creating user queues.
drm/amdgpu: Fix circular locking in userq creation
A circular locking dependency was detected between the global `adev->userq_mutex` and per-file `userq_mgr->userq_mutex` when creating user queues. The issue occurs because:
1. `amdgpu_userq_suspend()` and `amdgpu_userq_resume` take `adev->userq_mutex` first, then `userq_mgr->userq_mutex` 2. While `amdgpu_userq_create()` takes them in reverse order
This patch resolves the issue by: 1. Moving the `adev->userq_mutex` lock earlier in `amdgpu_userq_create()` to cover the `amdgpu_userq_ensure_ev_fence()` call 2. Releasing it after we're done with both queue creation and the scheduling halt check
v2: remove unused adev->userq_mutex lock (Prike)
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
bc5bab82 |
| 13-May-2025 |
Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> |
drm/amdgpu: Fix userq ttm_bo_pin and ttm_bo_unpin lockdep warnings
The ttm_bo_pin and ttm_bo_unpin warnings are resolved by moving the doorbell bo reserve up before pin/unpin.
WARNING: CPU: 11 PID:
drm/amdgpu: Fix userq ttm_bo_pin and ttm_bo_unpin lockdep warnings
The ttm_bo_pin and ttm_bo_unpin warnings are resolved by moving the doorbell bo reserve up before pin/unpin.
WARNING: CPU: 11 PID: 1818 at drivers/gpu/drm/ttm/ttm_bo.c:592 ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000277] CPU: 11 UID: 1000 PID: 1818 Comm: Xwayland Tainted: G W 6.12.0+ #15 [ +0.000006] Tainted: [W]=WARN [ +0.000004] Hardware name: ASUS System Product Name/TUF GAMING B650-PLUS, BIOS 3072 12/20/2024 [ +0.000004] RIP: 0010:ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000005] RSP: 0018:ffff88846ca879d0 EFLAGS: 00010246 [ +0.000007] RAX: 0000000000000000 RBX: ffff88810b7ca848 RCX: 0000000000000000 [ +0.000004] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ +0.000005] RBP: ffff88846ca879e8 R08: 0000000000000000 R09: 0000000000000000 [ +0.000004] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88810b7ca848 [ +0.000004] R13: ffff88846c666250 R14: 1ffff1108d950f44 R15: ffff88846ca87aa0 [ +0.000005] FS: 00007c45ff436d00(0000) GS:ffff888409580000(0000) knlGS:0000000000000000 [ +0.000004] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000005] CR2: 00005b0c142a60e0 CR3: 000000012ce5a000 CR4: 0000000000f50ef0 [ +0.000004] PKRU: 55555554 [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000005] ? show_regs+0x6c/0x80 [ +0.000007] ? __warn+0xd2/0x2d0 [ +0.000007] ? ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000031] ? report_bug+0x282/0x2f0 [ +0.000012] ? handle_bug+0x6e/0xc0 [ +0.000007] ? exc_invalid_op+0x18/0x50 [ +0.000007] ? asm_exc_invalid_op+0x1b/0x20 [ +0.000017] ? ttm_bo_pin+0x1f6/0x270 [ttm] [ +0.000014] amdgpu_bo_pin+0x365/0x9d0 [amdgpu] [ +0.000191] ? __pfx_amdgpu_bo_pin+0x10/0x10 [amdgpu] [ +0.000185] ? drm_gem_object_lookup+0x81/0xc0 [ +0.000008] ? kasan_save_alloc_info+0x37/0x60 [ +0.000007] ? __kasan_kmalloc+0xc3/0xd0 [ +0.000013] amdgpu_userqueue_get_doorbell_index+0xee/0x5f0 [amdgpu] [ +0.000209] amdgpu_userq_ioctl+0x6b4/0xd40 [amdgpu] [ +0.000193] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000211] ? lock_acquire+0x7c/0xc0 [ +0.000006] ? drm_dev_enter+0x51/0x190 [ +0.000015] drm_ioctl_kernel+0x18b/0x330 [ +0.000007] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000190] ? __pfx_drm_ioctl_kernel+0x10/0x10 [ +0.000005] ? lock_acquire+0x7c/0xc0 [ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? __kasan_check_write+0x14/0x30 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000011] drm_ioctl+0x589/0xd00 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000194] ? __pfx_drm_ioctl+0x10/0x10 [ +0.000006] ? __pm_runtime_resume+0x80/0x110 [ +0.000021] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? trace_hardirqs_on+0x53/0x60 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? _raw_spin_unlock_irqrestore+0x51/0x80 [ +0.000013] amdgpu_drm_ioctl+0xd2/0x1c0 [amdgpu] [ +0.000185] __x64_sys_ioctl+0x13a/0x1c0 [ +0.000010] x64_sys_call+0x11ad/0x25f0 [ +0.000007] do_syscall_64+0x91/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? irqentry_exit+0x77/0xb0 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? exc_page_fault+0x93/0x150 [ +0.000009] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7c45ff924ded [ +0.000005] RSP: 002b:00007ffff7167810 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000008] RAX: ffffffffffffffda RBX: 00000000c0486456 RCX: 00007c45ff924ded [ +0.000004] RDX: 00007ffff7167870 RSI: 00000000c0486456 RDI: 000000000000000b [ +0.000004] RBP: 00007ffff7167860 R08: ffff800100000000 R09: 0000000000010000 [ +0.000005] R10: 00007ffff7167950 R11: 0000000000000246 R12: 00005b0c2a51bc48 [ +0.000004] R13: 000000000000000b R14: 0000000000000000 R15: 00007ffff7167950 [ +0.000022] </TASK> [ +0.000004] irq event stamp: 80693 [ +0.000004] hardirqs last enabled at (80699): [<ffffffff86a693a9>] __up_console_sem+0x79/0xa0 [ +0.000005] hardirqs last disabled at (80704): [<ffffffff86a6938e>] __up_console_sem+0x5e/0xa0 [ +0.000005] softirqs last enabled at (80390): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000005] softirqs last disabled at (80385): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000006] ---[ end trace 0000000000000000 ]--- ------------------------------------------------------------------------------------------------------
[ +0.000006] WARNING: CPU: 10 PID: 1818 at drivers/gpu/drm/ttm/ttm_bo.c:611 ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000280] CPU: 10 UID: 1000 PID: 1818 Comm: Xwayland Tainted: G W 6.12.0+ #15 [ +0.000006] Tainted: [W]=WARN [ +0.000004] Hardware name: ASUS System Product Name/TUF GAMING B650-PLUS, BIOS 3072 12/20/2024 [ +0.000004] RIP: 0010:ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000005] RSP: 0018:ffff88846ca87888 EFLAGS: 00010246 [ +0.000007] RAX: 0000000000000000 RBX: ffff88810b7ca848 RCX: 0000000000000000 [ +0.000005] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ +0.000004] RBP: ffff88846ca878a0 R08: 0000000000000000 R09: 0000000000000000 [ +0.000004] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888164e90050 [ +0.000005] R13: ffff88846c666200 R14: 0000000000000001 R15: ffff888168402d28 [ +0.000004] FS: 00007c45ff436d00(0000) GS:ffff888409500000(0000) knlGS:0000000000000000 [ +0.000005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000004] CR2: 00007c45f7373b20 CR3: 000000012ce5a000 CR4: 0000000000f50ef0 [ +0.000005] PKRU: 55555554 [ +0.000004] Call Trace: [ +0.000004] <TASK> [ +0.000005] ? show_regs+0x6c/0x80 [ +0.000008] ? __warn+0xd2/0x2d0 [ +0.000007] ? ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000012] ? report_bug+0x282/0x2f0 [ +0.000013] ? handle_bug+0x6e/0xc0 [ +0.000006] ? exc_invalid_op+0x18/0x50 [ +0.000008] ? asm_exc_invalid_op+0x1b/0x20 [ +0.000017] ? ttm_bo_unpin+0x21f/0x2c0 [ttm] [ +0.000011] ? ttm_bo_unpin+0x217/0x2c0 [ttm] [ +0.000011] amdgpu_bo_unpin+0x45/0x250 [amdgpu] [ +0.000216] amdgpu_userq_ioctl+0x2c3/0xd40 [amdgpu] [ +0.000226] ? drm_dev_exit+0x2d/0x60 [ +0.000010] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000201] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? lock_acquire+0x7c/0xc0 [ +0.000006] ? drm_dev_enter+0x51/0x190 [ +0.000015] drm_ioctl_kernel+0x18b/0x330 [ +0.000007] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000188] ? __pfx_drm_ioctl_kernel+0x10/0x10 [ +0.000006] ? lock_acquire+0x7c/0xc0 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? __kasan_check_write+0x14/0x30 [ +0.000006] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000010] drm_ioctl+0x589/0xd00 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? __pfx_amdgpu_userq_ioctl+0x10/0x10 [amdgpu] [ +0.000211] ? __pfx_drm_ioctl+0x10/0x10 [ +0.000006] ? __pm_runtime_resume+0x80/0x110 [ +0.000020] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? trace_hardirqs_on+0x53/0x60 [ +0.000005] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? _raw_spin_unlock_irqrestore+0x51/0x80 [ +0.000013] amdgpu_drm_ioctl+0xd2/0x1c0 [amdgpu] [ +0.000186] __x64_sys_ioctl+0x13a/0x1c0 [ +0.000010] x64_sys_call+0x11ad/0x25f0 [ +0.000007] do_syscall_64+0x91/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? do_syscall_64+0x9d/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000010] ? __pfx___rseq_handle_notify_resume+0x10/0x10 [ +0.000005] ? __pfx_blkcg_maybe_throttle_current+0x10/0x10 [ +0.000013] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? syscall_exit_to_user_mode+0x95/0x260 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? do_syscall_64+0x9d/0x180 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? do_syscall_64+0x9d/0x180 [ +0.000011] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000010] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000009] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000008] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? irqentry_exit_to_user_mode+0x8b/0x260 [ +0.000007] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000006] ? irqentry_exit+0x77/0xb0 [ +0.000004] ? srso_alias_return_thunk+0x5/0xfbef5 [ +0.000005] ? exc_page_fault+0x93/0x150 [ +0.000010] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ +0.000005] RIP: 0033:0x7c45ff924ded [ +0.000005] RSP: 002b:00007ffff7168790 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ +0.000008] RAX: ffffffffffffffda RBX: 00000000c0486456 RCX: 00007c45ff924ded [ +0.000005] RDX: 00007ffff71687f0 RSI: 00000000c0486456 RDI: 000000000000000b [ +0.000004] RBP: 00007ffff71687e0 R08: 00005b0c2a49b010 R09: 0000000000000007 [ +0.000004] R10: 00005b0c2a4d7140 R11: 0000000000000246 R12: 000000000000000b [ +0.000004] R13: 00007c45ff19e5cc R14: 00005b0c2a51c538 R15: 00005b0c2a51bbd8 [ +0.000022] </TASK> [ +0.000005] irq event stamp: 87419 [ +0.000004] hardirqs last enabled at (87425): [<ffffffff86a693a9>] __up_console_sem+0x79/0xa0 [ +0.000005] hardirqs last disabled at (87430): [<ffffffff86a6938e>] __up_console_sem+0x5e/0xa0 [ +0.000005] softirqs last enabled at (87058): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000006] softirqs last disabled at (87053): [<ffffffff8687377e>] __irq_exit_rcu+0x17e/0x1d0 [ +0.000005] ---[ end trace 0000000000000000 ]---
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v6.15-rc6 |
|
#
648a0dc0 |
| 09-May-2025 |
Jesse.Zhang <Jesse.Zhang@amd.com> |
drm/amdgpu: Fix user queue deadlock by reordering mutex locking
This resolves a deadlock between user queue management and GPU reset paths by enforcing consistent lock ordering.
The deadlock occurr
drm/amdgpu: Fix user queue deadlock by reordering mutex locking
This resolves a deadlock between user queue management and GPU reset paths by enforcing consistent lock ordering.
The deadlock occurred when:
1. Process exit path (amdgpu_userq_mgr_fini) would: - Take uqm->userq_mutex - Then try to take adev->userq_mutex for list operations
2. GPU reset path (amdgpu_userq_pre_reset) would: - Take adev->userq_mutex first (for list traversal) - Then take uqm->userq_mutex
The solution establishes a strict top-down locking order: 1. Always take adev->userq_mutex before any uqm->userq_mutex 2. Maintain this order consistently across all code paths
Changes made: - Reordered locking in amdgpu_userq_mgr_fini() to take device lock first - Kept existing proper order in amdgpu_userq_pre_reset() - Simplified the fini flow by removing redundant operations
This prevents circular dependencies while maintaining thread safety during both normal operation and GPU reset scenarios.
Fixes: 4ce60dbada96 ("drm/amdgpu: store userq_managers in a list in adev") Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
f10eb185 |
| 07-May-2025 |
Arvind Yadav <Arvind.Yadav@amd.com> |
drm/amdgpu: Fix NULL dereference in amdgpu_userq_restore_worker
Switch cancel_delayed_work() to cancel_delayed_work_sync() to ensure the delayed work has finished executing before proceeding with re
drm/amdgpu: Fix NULL dereference in amdgpu_userq_restore_worker
Switch cancel_delayed_work() to cancel_delayed_work_sync() to ensure the delayed work has finished executing before proceeding with resource cleanup. This prevents a potential use-after-free or NULL dereference if the resume_work is still running during finalization.
BUG: kernel NULL pointer dereference, address: 0000000000000140 [ +0.000050] #PF: supervisor read access in kernel mode [ +0.000019] #PF: error_code(0x0000) - not-present page [ +0.000021] PGD 0 P4D 0 [ +0.000015] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI [ +0.000021] CPU: 17 UID: 0 PID: 196299 Comm: kworker/17:0 Tainted: G U 6.14.0-org-staging #1 [ +0.000032] Tainted: [U]=USER [ +0.000015] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F39 03/22/2024 [ +0.000029] Workqueue: events amdgpu_userq_restore_worker [amdgpu] [ +0.000426] RIP: 0010:drm_exec_lock_obj+0x32/0x210 [drm_exec] [ +0.000025] Code: e5 41 57 41 56 41 55 49 89 f5 41 54 49 89 fc 48 83 ec 08 4c 8b 77 30 4d 85 f6 0f 85 c0 00 00 00 4c 8d 7f 08 48 39 77 38 74 54 <49> 8b bd f8 00 00 00 4c 89 fe 41 f6 04 24 01 75 3c e8 08 50 bc e0 [ +0.000046] RSP: 0018:ffffab1b04da3ce8 EFLAGS: 00010297 [ +0.000020] RAX: 0000000000000001 RBX: ffff930cc60e4bc0 RCX: 0000000000000000 [ +0.000025] RDX: 0000000000000004 RSI: 0000000000000048 RDI: ffffab1b04da3d88 [ +0.000028] RBP: ffffab1b04da3d10 R08: ffff930cc60e4000 R09: 0000000000000000 [ +0.000022] R10: ffffab1b04da3d18 R11: 0000000000000001 R12: ffffab1b04da3d88 [ +0.000023] R13: 0000000000000048 R14: 0000000000000000 R15: ffffab1b04da3d90 [ +0.000023] FS: 0000000000000000(0000) GS:ffff9313dea80000(0000) knlGS:0000000000000000 [ +0.000024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000021] CR2: 0000000000000140 CR3: 000000018351a000 CR4: 0000000000350ef0 [ +0.000025] Call Trace: [ +0.000018] <TASK> [ +0.000015] ? show_regs+0x69/0x80 [ +0.000022] ? __die+0x25/0x70 [ +0.000019] ? page_fault_oops+0x15d/0x510 [ +0.000024] ? do_user_addr_fault+0x312/0x690 [ +0.000024] ? sched_clock_cpu+0x10/0x1a0 [ +0.000028] ? exc_page_fault+0x78/0x1b0 [ +0.000025] ? asm_exc_page_fault+0x27/0x30 [ +0.000024] ? drm_exec_lock_obj+0x32/0x210 [drm_exec] [ +0.000024] drm_exec_prepare_obj+0x21/0x60 [drm_exec] [ +0.000021] amdgpu_vm_lock_pd+0x22/0x30 [amdgpu] [ +0.000266] amdgpu_userq_validate_bos+0x6c/0x320 [amdgpu] [ +0.000333] amdgpu_userq_restore_worker+0x4a/0x120 [amdgpu] [ +0.000316] process_one_work+0x189/0x3c0 [ +0.000021] worker_thread+0x2a4/0x3b0 [ +0.000022] kthread+0x109/0x220 [ +0.000018] ? __pfx_worker_thread+0x10/0x10 [ +0.000779] ? _raw_spin_unlock_irq+0x1f/0x40 [ +0.000560] ? __pfx_kthread+0x10/0x10 [ +0.000543] ret_from_fork+0x3c/0x60 [ +0.000507] ? __pfx_kthread+0x10/0x10 [ +0.000515] ret_from_fork_asm+0x1a/0x30 [ +0.000515] </TASK>
v2: Replace cancel_delayed_work() to cancel_delayed_work_sync() in amdgpu_userq_destroy() and amdgpu_userq_evict().
Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
1faeeb31 |
| 11-May-2025 |
Dave Airlie <airlied@redhat.com> |
Merge tag 'amd-drm-next-6.16-2025-05-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.16-2025-05-09:
amdgpu: - IPS fixes - DSC cleanup - DC Scaling updates - DC FP fix
Merge tag 'amd-drm-next-6.16-2025-05-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.16-2025-05-09:
amdgpu: - IPS fixes - DSC cleanup - DC Scaling updates - DC FP fixes - Fused I2C-over-AUX updates - SubVP fixes - Freesync fix - DMUB AUX fixes - VCN fix - Hibernation fixes - HDP fixes - DCN 2.1 fixes - DPIA fixes - DMUB updates - Use drm_file_err in amdgpu - Enforce isolation updates - Use new dma_fence helpers - USERQ fixes - Documentation updates - Misc code cleanups - SR-IOV updates - RAS updates - PSP 12 cleanups
amdkfd: - Update error messages for SDMA - Userptr updates
drm: - Add drm_file_err function
dma-buf: - Add a helper to sort and deduplicate dma_fence arrays
From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250509230951.3871914-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>
show more ...
|
Revision tags: v6.15-rc5, v6.15-rc4 |
|
#
71353c1a |
| 22-Apr-2025 |
Sunil Khatri <sunil.khatri@amd.com> |
drm/amdgpu: change DRM_DBG_DRIVER to drm_dbg_driver
update the functions in amdgpu_userqueues.c from DRM_DBG_DRIVER to drm_dbg_driver so multi gpu instance can be logged in.
Signed-off-by: Sunil Kh
drm/amdgpu: change DRM_DBG_DRIVER to drm_dbg_driver
update the functions in amdgpu_userqueues.c from DRM_DBG_DRIVER to drm_dbg_driver so multi gpu instance can be logged in.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
c46a3762 |
| 22-Apr-2025 |
Sunil Khatri <sunil.khatri@amd.com> |
drm/amdgpu: change DRM_ERROR to drm_file_err in amdgpu_userq.c
change the DRM_ERROR and drm_err to drm_file_err to add process name and pid to the logging.
Signed-off-by: Sunil Khatri <sunil.khatri
drm/amdgpu: change DRM_ERROR to drm_file_err in amdgpu_userq.c
change the DRM_ERROR and drm_err to drm_file_err to add process name and pid to the logging.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
8c97cdb1 |
| 22-Apr-2025 |
Sunil Khatri <sunil.khatri@amd.com> |
drm/amdgpu: use drm_file_err in fence timeouts
use drm_file_err instead of DRM_ERROR which adds process and pid information in the userqueue error logging.
Sample log:
[ 19.802315] amdgpu 0000:0
drm/amdgpu: use drm_file_err in fence timeouts
use drm_file_err instead of DRM_ERROR which adds process and pid information in the userqueue error logging.
Sample log:
[ 19.802315] amdgpu 0000:0a:00.0: [drm] *ERROR* comm: ibus-x11 pid: 2055 client: Unset ... Couldn't unmap all the queues [ 19.802319] amdgpu 0000:0a:00.0: [drm] *ERROR* comm: ibus-x11 pid: 2055 client: Unset ... Failed to evict userqueue [ 19.838432] amdgpu 0000:0a:00.0: [drm] *ERROR* comm: systemd-logind pid: 1042 client: Unset ... Couldn't unmap all the queues [ 19.838436] amdgpu 0000:0a:00.0: [drm] *ERROR* comm: systemd-logind pid: 1042 client: Unset ... Failed to evict userqueue
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
30ff7580 |
| 22-Apr-2025 |
Sunil Khatri <sunil.khatri@amd.com> |
drm/amdgpu: add drm_file reference in userq_mgr
drm_file will be used in usermode queues code to enable better process information in logging and hence add drm_file part of the userq_mgr struct.
up
drm/amdgpu: add drm_file reference in userq_mgr
drm_file will be used in usermode queues code to enable better process information in logging and hence add drm_file part of the userq_mgr struct.
update the drm_file pointer in userq_mgr for each amdgpu_driver_open_kms.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
97c39b4d |
| 30-Apr-2025 |
Dan Carpenter <dan.carpenter@linaro.org> |
drm/amdgpu/userq: remove unnecessary NULL check
The "ticket" pointer points to in the middle of the &exec struct so it can't be NULL. Remove the check.
Reviewed-by: Christian König <christian.koen
drm/amdgpu/userq: remove unnecessary NULL check
The "ticket" pointer points to in the middle of the &exec struct so it can't be NULL. Remove the check.
Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Shashank Sharma <shashank.sharma@amd.com> Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
482d4853 |
| 25-Apr-2025 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/userq: take the userq_mgr lock in enforce isolation
Add the missing locking.
Fixes: 94976e7e5ede ("drm/amdgpu/userq: add helpers to start/stop scheduling") Reviewed-by: Arvind Yadav <Arv
drm/amdgpu/userq: take the userq_mgr lock in enforce isolation
Add the missing locking.
Fixes: 94976e7e5ede ("drm/amdgpu/userq: add helpers to start/stop scheduling") Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
c5e02d65 |
| 25-Apr-2025 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/userq: take the userq_mgr lock in suspend/resume
Add the missing locking.
Fixes: 73e12e98ec0c ("drm/amdgpu/userq: add suspend and resume helpers") Reviewed-by: Arvind Yadav <Arvind.Yadav
drm/amdgpu/userq: take the userq_mgr lock in suspend/resume
Add the missing locking.
Fixes: 73e12e98ec0c ("drm/amdgpu/userq: add suspend and resume helpers") Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
56801cb8 |
| 22-Apr-2025 |
Arvind Yadav <Arvind.Yadav@amd.com> |
drm/amdgpu: remove DRM_AMDGPU_NAVI3X_USERQ config for UQ
DRM_AMDGPU_NAVI3X_USERQ config support is not required for usermode queue.
v2: rebase.
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelva
drm/amdgpu: remove DRM_AMDGPU_NAVI3X_USERQ config for UQ
DRM_AMDGPU_NAVI3X_USERQ config support is not required for usermode queue.
v2: rebase.
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
#
127e612b |
| 21-Apr-2025 |
Sunil Khatri <sunil.khatri@amd.com> |
drm/amdgpu: update fence ptr with context:seqno
log context:seqno of the fence during timeout rather than logging fence pointer.
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Cc: Tvrtko Ursulin
drm/amdgpu: update fence ptr with context:seqno
log context:seqno of the fence during timeout rather than logging fence pointer.
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|
Revision tags: v6.15-rc3 |
|
#
42a66677 |
| 16-Apr-2025 |
Alex Deucher <alexander.deucher@amd.com> |
drm/amdgpu/userq: use consistent function naming
s/userqueue/userq/
1. remove the mix of amdgpu_userqueue and amdgpu_userq 2. to be consistent with other amdgpu_userq_fence.c 3. it's shorter
Revie
drm/amdgpu/userq: use consistent function naming
s/userqueue/userq/
1. remove the mix of amdgpu_userqueue and amdgpu_userq 2. to be consistent with other amdgpu_userq_fence.c 3. it's shorter
Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
show more ...
|