History log of /linux/drivers/gpu/drm/xe/xe_hw_error.c (Results 1 – 7 of 7)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 58809f61 02-Oct-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
"cross-subsystem:
- i2c-hid: Make elan touch controllers power on after panel is

Merge tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel

Pull drm updates from Dave Airlie:
"cross-subsystem:
- i2c-hid: Make elan touch controllers power on after panel is
enabled
- dt bindings for STM32MP25 SoC
- pci vgaarb: use screen_info helpers
- rust pin-init updates
- add MEI driver for late binding firmware update/load

uapi:
- add ioctl for reassigning GEM handles
- provide boot_display attribute on boot-up devices

core:
- document DRM_MODE_PAGE_FLIP_EVENT
- add vendor specific recovery method to drm device wedged uevent

gem:
- Simplify gpuvm locking

ttm:
- add interface to populate buffers

sched:
- Fix race condition in trace code

atomic:
- Reallow no-op async page flips

display:
- dp: Fix command length

video:
- Improve pixel-format handling for struct screen_info

rust:
- drop Opaque<> from ioctl args
- Alloc:
- BorrowedPage type and AsPageIter traits
- Implement Vmalloc::to_page() and VmallocPageIter
- DMA/Scatterlist:
- Add dma::DataDirection and type alias for dma_addr_t
- Abstraction for struct scatterlist and sg_table
- DRM:
- simplify use of generics
- add DriverFile type alias
- drop Object::SIZE
- Rust:
- pin-init tree merge
- Various methods for AsBytes and FromBytes traits

gpuvm:
- Support madvice in Xe driver

gpusvm:
- fix hmm_pfn_to_map_order usage in gpusvm

bridge:
- Improve and fix ref counting on bridge management
- cdns-dsi: Various improvements to mode setting
- Support Solomon SSD2825 plus DT bindings
- Support Waveshare DSI2DPI plus DT bindings
- Support Content Protection property
- display-connector: Improve DP display detection
- Add support for Radxa Ra620 plus DT bindings
- adv7511: Provide SPD and HDMI infoframes
- it6505: Replace crypto_shash with sha()
- synopsys: Add support for DW DPTX Controller plus DT bindings
- adv7511: Write full Audio infoframe
- ite6263: Support vendor-specific infoframes
- simple: Add support for Realtek RTD2171 DP-to-HDMI plus DT bindings

panel:
- panel-edp: Support mt8189 Chromebooks; Support BOE NV140WUM-N64;
Support SHP LQ134Z1; Fixes
- panel-simple: Support Olimex LCD-OLinuXino-5CTS plus DT bindings
- Support Samsung AMS561RA01
- Support Hydis HV101HD1 plus DT bindings
- ilitek-ili9881c: Refactor mode setting; Add support for Bestar
BSD1218-A101KL68 LCD plus DT bindings
- lvds: Add support for Ampire AMP19201200B5TZQW-T03 to DT bindings
- edp: Add support for additonal mt8189 Chromebook panels
- lvds: Add DT bindings for EDT ETML0700Z8DHA

amdgpu:
- add CRIU support for gem objects
- RAS updates
- VCN SRAM load fixes
- EDID read fixes
- eDP ALPM support
- Documentation updates
- Rework PTE flag generation
- DCE6 fixes
- VCN devcoredump cleanup
- MMHUB client id fixes
- VCN 5.0.1 RAS support
- SMU 13.0.x updates
- Expanded PCIe DPC support
- Expanded VCN reset support
- VPE per queue reset support
- give kernel jobs unique id for tracing
- pre-populate exported buffers
- cyan skillfish updates
- make vbios build number available in sysfs
- userq updates
- HDCP updates
- support MMIO remap page as ttm pool
- JPEG parser updates
- DCE6 DC updates
- use devm for i2c buses
- GPUVM locking updates
- Drop non-DC DCE11 code
- improve fallback handling for pixel encoding

amdkfd:
- SVM/page migration fixes
- debugfs fixes
- add CRIO support for gem objects
- SVM updates

radeon:
- use dev_warn_once in CS parsers

xe:
- add madvise interface
- add DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS to query VMA count
and memory attributes
- drop L# bank mask reporting from media GT3 on Xe3+.
- add SLPC power_profile sysfs interface
- add configs attribs to add post/mid context-switch commands
- handle firmware reported hardware errors notifying userspace with
device wedged uevent
- use same dir structure across sysfs/debugfs
- cleanup and future proof vram region init
- add G-states and PCI link states to debugfs
- Add SRIOV support for CCS surfaces on Xe2+
- Enable SRIOV PF mode by default on supported platforms
- move flush to common code
- extended core workarounds for Xe2/3
- use DRM scheduler for delayed GT TLB invalidations
- configs improvements and allow VF device enablement
- prep work to expose mmio regions to userspace
- VF migration support added
- prepare GPU SVM for THP migration
- start fixing XE_PAGE_SIZE vs PAGE_SIZE
- add PSMI support for hw validation
- resize VF bars to max possible size according to number of VFs
- Ensure GT is in C0 during resume
- pre-populate exported buffers
- replace xe_hmm with gpusvm
- add more SVM GT stats to debugfs
- improve fake pci and WA kunnit handle for new platform testing
- Test GuC to GuC comms to add debugging
- use attribute groups to simplify sysfs registration
- add Late Binding firmware code to interact with MEI

i915:
- apply multiple JSL/EHL/Gen7/Gen6 workarounds properly
- protect against overflow in active_engine()
- Use try_cmpxchg64() in __active_lookup()
- include GuC registers in error state
- get rid of dev->struct_mutex
- iopoll: generalize read_poll_timout
- lots more display refactoring
- Reject HBR3 in any eDP Panel
- Prune modes for YUV420
- Display Wa fix, additions, and updates
- DP: Fix 2.7 Gbps link training on g4x
- DP: Adjust the idle pattern handling
- DP: Shuffle the link training code a bit
- Don't set/read the DSI C clock divider on GLK
- Enable_psr kernel parameter changes
- Type-C enabled/disconnected dp-alt sink
- Wildcat Lake enabling
- DP HDR updates
- DRAM detection
- wait PSR idle on dsb commit
- Remove FBC modulo 4 restriction for ADL-P+
- panic: refactor framebuffer allocation

habanalabs:
- debug/visibility improvements
- vmalloc-backed coherent mmap support
- HLDIO infrastructure

nova-core:
- various register!() macro improvements
- minor vbios/firmware fixes/refactoring
- advance firmware boot stages; process Booter and patch signatures
- process GSP and GSP bootloader
- Add r570.144 firmware bindings and update to it
- Move GSP boot code to own module
- Use new pin-init features to store driver's private data in a
single allocation
- Update ARef import from sync::aref

nova-drm:
- Update ARef import from sync::aref

tyr:
- initial driver skeleton for a rust driver for ARM Mali GPUs
- capable of powering up, query metadata and provide it to userspace.

msm:
- GPU and Core:
- in DT bindings describe clocks per GPU type
- GMU bandwidth voting for x1-85
- a623/a663 speedbins
- cleanup some remaining no-iommu leftovers after VM_BIND conversion
- fix GEM obj 32b size truncation
- add missing VM_BIND param validation
- IFPC for x1-85 and a750
- register xml and gen_header.py sync from mesa
- Display:
- add missing bindings for display on SC8180X
- added DisplayPort MST bindings
- conversion from round_rate() to determine_rate()

amdxdna:
- add IOCTL_AMDXDNA_GET_ARRAY
- support user space allocated buffers
- streamline PM interfaces
- Refactoring wrt. hardware contexts
- improve error reporting

nouveau:
- use GSP firmware by default
- improve error reporting
- Pre-populate exported buffers

ast:
- Clean up detection of DRAM config

exynos:
- add DSIM bridge driver support for Exynos7870
- Document Exynos7870 DSIM compatible in dt-binding

panthor:
- Print task/pid on errors
- Add support for Mali G710, G510, G310, Gx15, Gx20, Gx25
- Improve cache flushing
- Fail VM bind if BO has offset

renesas:
- convert to RUNTIME_PM_OPS

rcar-du:
- Make number of lanes configurable
- Use RUNTIME_PM_OPS
- Add support for DSI commands

rocket:
- Add driver for Rockchip NPU plus DT bindings
- Use kfree() and sizeof() correctly
- Test DMA status

rockchip:
- dsi2: Add support for RK3576 plus DT bindings
- Add support for RK3588 DPTX output

tidss:
- Use crtc_ fields for programming display mode
- Remove other drivers from aperture

pixpaper:
- Add support for Mayqueen Pixpaper plus DT bindings

v3d:
- Support querying nubmer of GPU resets for KHR_robustness

stm:
- Clean up logging
- ltdc: Add support support for STM32MP257F-EV1 plus DT bindings

sitronix:
- st7571-i2c: Add support for inverted displays and 2-bit grayscale

tidss:
- Convert to kernel's FIELD_ macros

vesadrm:
- Support 8-bit palette mode

imagination:
- Improve power management
- Add support for TH1520 GPU
- Support Risc-V architectures

v3d:
- Improve job management and locking

vkms:
- Support variants of ARGB8888, ARGB16161616, RGB565, RGB888 and P01x
- Spport YUV with 16-bit components"

* tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel: (1455 commits)
drm/amd: Add name to modes from amdgpu_connector_add_common_modes()
drm/amd: Drop some common modes from amdgpu_connector_add_common_modes()
drm/amdgpu: update MODULE_PARM_DESC for freesync_video
drm/amd: Use dynamic array size declaration for amdgpu_connector_add_common_modes()
drm/amd/display: Share dce100_validate_global with DCE6-8
drm/amd/display: Share dce100_validate_bandwidth with DCE6-8
drm/amdgpu: Fix fence signaling race condition in userqueue
amd/amdkfd: enhance kfd process check in switch partition
amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
drm/amd/display: Reject modes with too high pixel clock on DCE6-10
drm/amd: Drop unnecessary check in amdgpu_connector_add_common_modes()
drm/amd/display: Only enable common modes for eDP and LVDS
drm/amdgpu: remove the redeclaration of variable i
drm/amdgpu/userq: assign an error code for invalid userq va
drm/amdgpu: revert "rework reserved VMID handling" v2
drm/amdgpu: remove leftover from enforcing isolation by VMID
drm/amdgpu: Add fallback to pipe reset if KCQ ring reset fails
accel/habanalabs: add Infineon version check
accel/habanalabs/gaudi2: read preboot status after recovering from dirty state
accel/habanalabs: add HL_GET_P_STATE passthrough type
...

show more ...


Revision tags: v6.17, v6.17-rc7
# b4d90dbc 15-Sep-2025 Thomas Zimmermann <tzimmermann@suse.de>

Merge drm/drm-next into drm-misc-next-fixes

Backmerging to drm-misc-next-fixes to get features and fixes from
v6.17-rc6.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>


Revision tags: v6.17-rc6
# 702fdf35 10-Sep-2025 Rodrigo Vivi <rodrigo.vivi@intel.com>

Merge drm/drm-next into drm-intel-next

Catching up with some display dependencies.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>


Revision tags: v6.17-rc5
# 83631c7b 02-Sep-2025 Dave Airlie <airlied@redhat.com>

Merge tag 'drm-xe-next-2025-08-29' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

UAPI Changes:
- Add madvise interface (Himal Prasad Ghimiray)
- Add DRM_IOCTL_XE_VM_QUERY_MEMORY_RA

Merge tag 'drm-xe-next-2025-08-29' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next

UAPI Changes:
- Add madvise interface (Himal Prasad Ghimiray)
- Add DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS to query VMA count and
memory attributes (Himal Prasad Ghimiray)
- Handle Firmware reported Hardware Errors notifying userspace with
device wedged uevent (Riana Tauro)

Cross-subsystem Changes:

- Add a vendor-specific recovery method to drm device wedged uevent
(Riana Tauro)

Driver Changes:
- Use same directory structure in debugfs as in sysfs (Michal Wajdeczko)
- Cleanup and future-proof VRAM region initialization (Piotr Piórkowski)
- Add G-states and PCIe link states to debugfs (Soham Purkait)
- Cleanup eustall debug messages (Harish Chegondi)
- Add SR-IOV support to restore Compression Control Surface (CCS) to
Xe2 and later (Satyanarayana K V P)
- Enable SR-IOV PF mode by default on supported platforms without
needing CONFIG_DRM_XE_DEBUG and mark some platforms behind
force_probe as supported (Michal Wajdeczko)
- More targeted log messages (Michal Wajdeczko)
- Cleanup STEER_SEMAPHORE/MCFG_MCR_SELECTOR usage (Nitin Gote)
- Use common code to emit flush (Tvrtko Ursulin)
- Add/extend more HW workarounds and tunings for Xe2 and Xe3
(Sk Anirban, Tangudu Tilak Tirumalesh, Nitin Gote, Chaitanya Kumar Borah)
- Add a generic dependency scheduler to help with TLB invalidations
and future scenarios (Matthew Brost)
- Use DRM scheduler for delayed GT TLB invalidations (Matthew Brost)
- Error out on incorrect device use in configfs
(Michal Wajdeczko, Lucas De Marchi)
- Refactor configfs attributes (Michal Wajdeczko / Lucas De Marchi)
- Allow configuring future VF devices via configfs (Michal Wajdeczko)
- Implement some missing XeLP workarounds (Tvrtko Ursulin)
- Generalize WA BB setup/emission and add support for
mid context restore BB, aka indirect context (Tvrtko Ursulin)
- Prepare the driver to expose mmio regions to userspace
in future (Ilia Levi)
- Add more GuC load error status codes (John Harrison)
- Document DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING (Priyanka Dandamudi)
- Disable CSC and RPM on VFs (Lukasz Laguna, Satyanarayana K V P)
- Fix oops in xe_gem_fault with PREEMPT_RT (Maarten Lankhorst)
- Skip LMTT update if no LMEM was provisioned (Michal Wajdeczko)
- Add support to VF migration (Tomasz Lis)
- Use a helper for guc_waklv_enable functions (Jonathan Cavitt)
- Prepare GPU SVM for migration of THP (Francois Dugast)
- Program LMTT directory pointer on all GTs within a tile
(Piotr Piórkowski)
- Rename XE_WA to XE_GT_WA to better convey its scope vs the device WAs
(Matt Atwood)
- Allow to match devices on PCI devid/vendorid only (Lucas De Marchi)
- Improve PDE PAT index selection (Matthew Brost)
- Consolidate ASID allocation in xe_vm_create() vs
xe_vm_create_ioctl() (Piotr Piórkowski)
- Resize VF BARS to max possible size according to number of VFs
(Michał Winiarski)
- Untangle vm_bind_ioctl cleanup order (Christoph Manszewski)
- Start fixing usage of XE_PAGE_SIZE vs PAGE_SIZE to improve
compatibility with non-x86 arch (Simon Richter)
- Improve tile vs gt initialization order and accounting
(Gustavo Sousa)
- Extend WA kunit test to PTL
- Ensure data is initialized before transferring to pcode
(Stuart Summers)
- Add PSMI support for HW validation (Lucas De Marchi,
Vinay Belgaumkar, Badal Nilawar)
- Improve xe_dma_buf test (Thomas Hellström, Marcin Bernatowicz)
- Fix basename() usage in generator with !glibc (Carlos Llamas)
- Ensure GT is in C0 during resumes (Xin Wang)
- Add TLB invalidation abstraction (Matt Brost, Stuart Summers)
- Make MI_TLB_INVALIDATE conditional on migrate (Matthew Auld)
- Prepare xe_nvm to be initialized early for future use cases
(Riana Tauro)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/nuejxdhnalyok7tzwkrj67dwjgdafwp4mhdejpyyqnrh4f2epq@nlldovuflnbx

show more ...


Revision tags: v6.17-rc4
# d1f51a4f 26-Aug-2025 Riana Tauro <riana.tauro@intel.com>

drm/xe/xe_hw_error: Add fault injection to trigger csc error handler

Add a debugfs fault handler to trigger csc error handler that
wedges the device and enables runtime survivability mode.

v2: add

drm/xe/xe_hw_error: Add fault injection to trigger csc error handler

Add a debugfs fault handler to trigger csc error handler that
wedges the device and enables runtime survivability mode.

v2: add debugfs only for bmg (Umesh)
v3: do not use csc_fault attribute if debugfs is not enabled
v4: rebase

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-11-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

show more ...


# a7df563b 26-Aug-2025 Riana Tauro <riana.tauro@intel.com>

drm/xe/xe_hw_error: Handle CSC Firmware reported Hardware errors

Add support to handle CSC firmware reported errors. When CSC firmware
errors are encoutered, a error interrupt is received by the GFX

drm/xe/xe_hw_error: Handle CSC Firmware reported Hardware errors

Add support to handle CSC firmware reported errors. When CSC firmware
errors are encoutered, a error interrupt is received by the GFX device as
a MSI interrupt.

Device Source control registers indicates the source of the error as CSC
The HEC error status register indicates that the error is firmware reported
Depending on the type of error, the error cause is written to the HEC
Firmware error register.

On encountering such CSC firmware errors, the graphics device is
non-recoverable from driver context. The only way to recover from these
errors is firmware flash.

System admin/userspace is notified of the necessity of firmware flash
with a combination of vendor-specific drm device edged uevent, dmesg logs
and runtime survivability sysfs. It is the responsiblity of the consumer
to verify all the actions and then trigger a firmware flash using tools
like fwupd.

$ udevadm monitor --property --kernel
monitor will print the received events for:
KERNEL - the kernel uevent

KERNEL[754.709341] change /devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0 (drm)
ACTION=change
DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0
SUBSYSTEM=drm
WEDGED=vendor-specific
DEVNAME=/dev/dri/card0
DEVTYPE=drm_minor
SEQNUM=5973
MAJOR=226
MINOR=0

Logs

xe 0000:03:00.0: [drm] *ERROR* [Hardware Error]: Tile0 reported NONFATAL error 0x20000
xe 0000:03:00.0: [drm] *ERROR* [Hardware Error]: NONFATAL: HEC Uncorrected FW FD Corruption error reported, bit[2] is set
xe 0000:03:00.0: Runtime Survivability mode enabled
xe 0000:03:00.0: [drm] *ERROR* CRITICAL: Xe has declared device 0000:03:00.0 as wedged.
IOCTLs and executions are blocked. Only a rebind may clear the failure
Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new
xe 0000:03:00.0: [drm] device wedged, needs recovery
xe 0000:03:00.0: Firmware flash required, Please refer to the userspace documentation for more details!

Runtime survivability Sysfs:

/sys/bus/pci/devices/<device>/survivability_mode

v2: use vendor recovery method with
runtime survivability (Christian, Rodrigo, Raag)
v3: move declare wedged to runtime survivability mode (Rodrigo)
v4: update commit message

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-10-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

show more ...


# 0a2a873d 26-Aug-2025 Riana Tauro <riana.tauro@intel.com>

drm/xe: Add support to handle hardware errors

Gfx device reports two classes of errors: uncorrectable and
correctable. Depending on the severity uncorrectable errors are further
classified Non-Fatal

drm/xe: Add support to handle hardware errors

Gfx device reports two classes of errors: uncorrectable and
correctable. Depending on the severity uncorrectable errors are further
classified Non-Fatal and Fatal.

Correctable and Non-Fatal errors: These errors are reported as MSI. Bits in
the Master Interrupt Register indicate the class of the error.
The source of the error is then read from the Device Error Source
Register.

Fatal errors: These are reported as PCIe errors
When a PCIe error is asserted, the OS will perform a SBR (Secondary
Bus reset) which causes the driver to reload. The error registers are
sticky and the values are maintained through SBR.

Add basic support to handle these errors.

Bspec: 50875, 53073, 53074, 53075, 53076

v2: Format commit message (Umesh)
v3: fix documentation (Stuart)

Cc: Stuart Summers <stuart.summers@intel.com>
Co-developed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-9-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

show more ...