What:           /sys/kernel/debug/accel/<parent_device>/addr
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Sets the device address to be used for read or write through
                the PCI bar, or the device VA of a host-mapped memory to be
                read or written directly from the host. The latter option is
                allowed only when the IOMMU is disabled.
                The acceptable value is a string that starts with "0x".
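
                A minimal sketch of setting the address from a shell. The
                helper name hl_set_addr is hypothetical (it is not part of the
                driver); it takes the device's debugfs directory and rejects
                values that do not start with "0x":

                ```shell
                # Hypothetical helper, not part of the driver.
                # $1 = debugfs directory of the device, e.g.
                #      /sys/kernel/debug/accel/<parent_device>
                # $2 = device address, must start with "0x"
                hl_set_addr() {
                    case "$2" in
                        0x?*) ;;
                        *) echo "address must start with 0x: $2" >&2; return 1 ;;
                    esac
                    echo "$2" > "$1/addr"
                }
                ```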

What:           /sys/kernel/debug/accel/<parent_device>/clk_gate
Date:           May 2020
KernelVersion:  5.8
Contact:        ogabbay@kernel.org
Description:    This setting is now deprecated, as clock gating is handled
                solely by the firmware.

What:           /sys/kernel/debug/accel/<parent_device>/command_buffers
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Displays a list with information about the currently allocated
                command buffers.

What:           /sys/kernel/debug/accel/<parent_device>/command_submission
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Displays a list with information about the currently active
                command submissions.

What:           /sys/kernel/debug/accel/<parent_device>/command_submission_jobs
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Displays a list with detailed information about each JOB (CB) of
                each active command submission.

What:           /sys/kernel/debug/accel/<parent_device>/data32
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Allows the root user to read or write 32-bit data directly
                through the device's PCI bar. Writing to this file generates a
                write transaction, while reading from the file generates a read
                transaction. This custom interface is needed (instead of using
                the generic Linux user-space PCI mapping) because the DDR bar
                is very small compared to the DDR memory and only the driver can
                move the bar before and after the transaction.

                If the IOMMU is disabled, it also allows the root user to read
                or write, from the host, a device VA of a host-mapped memory.
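
                A minimal sketch of a read and a write through this file. It
                assumes that the transaction targets the address last written
                to "addr" (as the "addr" description suggests) and that the
                value is passed as a "0x"-prefixed string. Both helper names
                are hypothetical:

                ```shell
                # Hypothetical helpers; $1 is the device's debugfs directory.
                # Read: set the address ($2), then read data32, which
                # generates a read transaction.
                hl_read32() {
                    echo "$2" > "$1/addr" && cat "$1/data32"
                }

                # Write: set the address ($2), then write the value ($3) to
                # data32, which generates a write transaction.
                # The value format is an assumption.
                hl_write32() {
                    echo "$2" > "$1/addr" && echo "$3" > "$1/data32"
                }
                ```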

What:           /sys/kernel/debug/accel/<parent_device>/data64
Date:           Jan 2020
KernelVersion:  5.6
Contact:        ogabbay@kernel.org
Description:    Allows the root user to read or write 64-bit data directly
                through the device's PCI bar. Writing to this file generates a
                write transaction, while reading from the file generates a read
                transaction. This custom interface is needed (instead of using
                the generic Linux user-space PCI mapping) because the DDR bar
                is very small compared to the DDR memory and only the driver can
                move the bar before and after the transaction.

                If the IOMMU is disabled, it also allows the root user to read
                or write, from the host, a device VA of a host-mapped memory.

What:           /sys/kernel/debug/accel/<parent_device>/data_dma
Date:           Apr 2021
KernelVersion:  5.13
Contact:        ogabbay@kernel.org
Description:    Allows the root user to read from the device's internal
                memory (DRAM/SRAM) through a DMA engine.
                This property is a binary blob that contains the result of the
                DMA transfer.
                This custom interface is needed (instead of using the generic
                Linux user-space PCI mapping) because the amount of internal
                memory is huge (>32GB) and reading it via the PCI bar will take
                a very long time.
                This interface doesn't support concurrency in the same device.
                In GAUDI and GOYA, this action can cause undefined behavior
                if it is done while the device is executing user workloads.
                Only supported on GAUDI at this stage.

What:           /sys/kernel/debug/accel/<parent_device>/device
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Enables the root user to set the device to a specific state.
                Valid values are "disable", "enable", "suspend", "resume".
                The user can read this property to see the valid values.
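
                A minimal sketch that only forwards one of the four valid
                values listed above. The helper name hl_set_device_state is
                hypothetical:

                ```shell
                # Hypothetical helper; $1 = debugfs directory, $2 = state.
                hl_set_device_state() {
                    case "$2" in
                        disable|enable|suspend|resume)
                            echo "$2" > "$1/device" ;;
                        *)
                            echo "invalid state: $2" >&2
                            return 1 ;;
                    esac
                }
                ```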

What:           /sys/kernel/debug/accel/<parent_device>/device_release_watchdog_timeout
Date:           Oct 2022
KernelVersion:  6.2
Contact:        ttayar@habana.ai
Description:    The watchdog timeout value in seconds for a device release upon
                certain error cases, after which the device is reset.

What:           /sys/kernel/debug/accel/<parent_device>/dma_size
Date:           Apr 2021
KernelVersion:  5.13
Contact:        ogabbay@kernel.org
Description:    Specifies the size of the DMA transaction when using DMA to
                read from the device's internal memory. The value cannot be
                larger than 128MB. Writing to this file initiates the DMA
                transfer. When the write is finished, the user can read the
                "data_dma" blob.
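
                A minimal sketch of the write-then-read sequence described
                above. It assumes that the source address is the one set
                through "addr", that 128MB means 128 * 1024 * 1024 bytes, and
                that the size is written as a "0x"-prefixed hex string. The
                helper name hl_dma_read is hypothetical:

                ```shell
                # Hypothetical helper.
                # $1 = debugfs directory, $2 = device address ("0x"...),
                # $3 = size in bytes (decimal), $4 = output file
                hl_dma_read() {
                    max=$((128 * 1024 * 1024))
                    if [ "$3" -gt "$max" ]; then
                        echo "size must not exceed $max bytes" >&2
                        return 1
                    fi
                    echo "$2" > "$1/addr" || return 1
                    # Writing the size initiates the DMA transfer.
                    # The hex format is an assumption.
                    printf '0x%x\n' "$3" > "$1/dma_size" || return 1
                    # Once the write returns, the blob can be read.
                    cat "$1/data_dma" > "$4"
                }
                ```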

What:           /sys/kernel/debug/accel/<parent_device>/dump_razwi_events
Date:           Aug 2022
KernelVersion:  5.20
Contact:        fkassabri@habana.ai
Description:    Dumps all RAZWI events to dmesg, if any exist.
                After reading the status register of an existing event,
                the routine clears the status register.
                Usage: cat dump_razwi_events

What:           /sys/kernel/debug/accel/<parent_device>/dump_security_violations
Date:           Jan 2021
KernelVersion:  5.12
Contact:        ogabbay@kernel.org
Description:    Dumps all security violations to dmesg. This also acks all
                security violations, meaning those violations will not be
                dumped the next time the user calls this API.

What:           /sys/kernel/debug/accel/<parent_device>/engines
Date:           Jul 2019
KernelVersion:  5.3
Contact:        ogabbay@kernel.org
Description:    Displays the status register values of the device engines and
                their derived idle status.

What:           /sys/kernel/debug/accel/<parent_device>/i2c_addr
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Sets the I2C device address for an I2C transaction that is
                generated by the device's CPU. Not available when the device
                is loaded with secured firmware.

What:           /sys/kernel/debug/accel/<parent_device>/i2c_bus
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Sets the I2C bus address for an I2C transaction that is
                generated by the device's CPU. Not available when the device
                is loaded with secured firmware.

What:           /sys/kernel/debug/accel/<parent_device>/i2c_data
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Triggers an I2C transaction that is generated by the device's
                CPU. Writing to this file generates a write transaction, while
                reading from the file generates a read transaction. Not
                available when the device is loaded with secured firmware.

What:           /sys/kernel/debug/accel/<parent_device>/i2c_len
Date:           Dec 2021
KernelVersion:  5.17
Contact:        obitton@habana.ai
Description:    Sets the I2C length in bytes for an I2C transaction that is
                generated by the device's CPU. Not available when the device
                is loaded with secured firmware.

What:           /sys/kernel/debug/accel/<parent_device>/i2c_reg
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Sets the I2C register id for an I2C transaction that is
                generated by the device's CPU. Not available when the device
                is loaded with secured firmware.
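
                A minimal sketch of an I2C read that combines the i2c_bus,
                i2c_addr, i2c_reg, i2c_len and i2c_data files. The helper name
                hl_i2c_read, the order of the writes and the value formats are
                assumptions:

                ```shell
                # Hypothetical helper.
                # $1 = debugfs directory, $2 = bus, $3 = device address,
                # $4 = register id, $5 = length in bytes
                hl_i2c_read() {
                    echo "$2" > "$1/i2c_bus"  || return 1
                    echo "$3" > "$1/i2c_addr" || return 1
                    echo "$4" > "$1/i2c_reg"  || return 1
                    echo "$5" > "$1/i2c_len"  || return 1
                    # Reading i2c_data generates the read transaction.
                    cat "$1/i2c_data"
                }
                ```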

What:           /sys/kernel/debug/accel/<parent_device>/led0
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Sets the state of the first S/W LED on the device. Not
                available when the device is loaded with secured firmware.

What:           /sys/kernel/debug/accel/<parent_device>/led1
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Sets the state of the second S/W LED on the device. Not
                available when the device is loaded with secured firmware.

What:           /sys/kernel/debug/accel/<parent_device>/led2
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Sets the state of the third S/W LED on the device. Not
                available when the device is loaded with secured firmware.

What:           /sys/kernel/debug/accel/<parent_device>/memory_scrub
Date:           May 2022
KernelVersion:  5.19
Contact:        dhirschfeld@habana.ai
Description:    Allows the root user to scrub the DRAM memory. The scrubbing
                value can be set using the debugfs file memory_scrub_val.

What:           /sys/kernel/debug/accel/<parent_device>/memory_scrub_val
Date:           May 2022
KernelVersion:  5.19
Contact:        dhirschfeld@habana.ai
Description:    The value to which the DRAM will be set when the user scrubs
                the DRAM using the 'memory_scrub' debugfs file. It is also the
                scrubbing value used with the 'memory_scrub' module param.
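
                A minimal sketch of setting the scrubbing value and then
                scrubbing. The helper name hl_scrub_dram is hypothetical, and
                writing "1" to memory_scrub as the trigger is an assumption
                that this file does not confirm:

                ```shell
                # Hypothetical helper; $1 = debugfs directory,
                # $2 = scrubbing value.
                hl_scrub_dram() {
                    echo "$2" > "$1/memory_scrub_val" || return 1
                    # Assumption: writing 1 starts the scrub.
                    echo 1 > "$1/memory_scrub"
                }
                ```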

What:           /sys/kernel/debug/accel/<parent_device>/mmu
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Displays the hop values and physical address for a given ASID
                and virtual address. The user should write the ASID and VA into
                the file and then read the file to get the result.
                e.g. to display info about VA 0x1000 for ASID 1 you need to do:
                echo "1 0x1000" > /sys/kernel/debug/accel/<parent_device>/mmu
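
                A minimal sketch that wraps the write and the follow-up read
                in one call. The helper name hl_mmu_lookup is hypothetical:

                ```shell
                # Hypothetical helper.
                # $1 = debugfs directory, $2 = ASID, $3 = virtual address
                hl_mmu_lookup() {
                    # Write "<ASID> <VA>", then read the file for the result.
                    echo "$2 $3" > "$1/mmu" && cat "$1/mmu"
                }
                ```

                It could be called as:
                hl_mmu_lookup /sys/kernel/debug/accel/<parent_device> 1 0x1000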

What:           /sys/kernel/debug/accel/<parent_device>/mmu_error
Date:           Mar 2021
KernelVersion:  5.12
Contact:        fkassabri@habana.ai
Description:    Checks and displays page fault or access violation MMU errors
                for all MMUs specified in mmu_cap_mask.
                e.g. to display error info for MMU hw cap bit 9, you need to do:
                echo "0x200" > /sys/kernel/debug/accel/<parent_device>/mmu_error
                cat /sys/kernel/debug/accel/<parent_device>/mmu_error
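
                A minimal sketch of building the mask from hw cap bit numbers
                (bit 9 gives 1 << 9 = 0x200, as in the example above). Both
                helper names are hypothetical:

                ```shell
                # Hypothetical helper: print the mask for the given bits.
                hl_mmu_cap_mask() {
                    mask=0
                    for bit in "$@"; do
                        mask=$((mask | (1 << bit)))
                    done
                    printf '0x%x\n' "$mask"
                }

                # Hypothetical helper: $1 = debugfs directory, the remaining
                # arguments are MMU hw cap bit numbers.
                hl_mmu_error() {
                    dir=$1
                    shift
                    hl_mmu_cap_mask "$@" > "$dir/mmu_error" || return 1
                    cat "$dir/mmu_error"
                }
                ```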

What:           /sys/kernel/debug/accel/<parent_device>/monitor_dump
Date:           Mar 2022
KernelVersion:  5.19
Contact:        osharabi@habana.ai
Description:    Allows the root user to dump the monitors' status from the
                device's protected config space.
                This property is a binary blob that contains the result of the
                monitor registers dump.
                This custom interface is needed (instead of using the generic
                Linux user-space PCI mapping) because this space is protected
                and cannot be accessed using PCI read.
                This interface doesn't support concurrency in the same device.
                Only supported on GAUDI.

What:           /sys/kernel/debug/accel/<parent_device>/monitor_dump_trig
Date:           Mar 2022
KernelVersion:  5.19
Contact:        osharabi@habana.ai
Description:    Triggers a dump of monitor data. The value written to trigger
                the operation must be 1. Triggering the monitor dump operation
                initiates a dump of the current register values of all
                monitors. When the write is finished, the user can read the
                "monitor_dump" blob.
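
                A minimal sketch of the trigger-then-read sequence, saving the
                blob to a file. The helper name hl_monitor_dump is
                hypothetical:

                ```shell
                # Hypothetical helper; $1 = debugfs directory,
                # $2 = output file for the binary blob.
                hl_monitor_dump() {
                    # The trigger value must be 1.
                    echo 1 > "$1/monitor_dump_trig" || return 1
                    # Once the write returns, the blob can be read.
                    cat "$1/monitor_dump" > "$2"
                }
                ```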

What:           /sys/kernel/debug/accel/<parent_device>/server_type
Date:           Feb 2024
KernelVersion:  6.11
Contact:        trisin@habana.ai
Description:    Exposes the device's server type, which maps to
                enum hl_server_type.

What:           /sys/kernel/debug/accel/<parent_device>/set_power_state
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Sets the PCI power state. Valid values are "1" for D0 and "2"
                for D3Hot.
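
                A minimal sketch that maps the state names to the two valid
                values. The helper name hl_set_power_state is hypothetical:

                ```shell
                # Hypothetical helper; $1 = debugfs directory,
                # $2 = "D0" or "D3Hot".
                hl_set_power_state() {
                    case "$2" in
                        D0)    echo 1 > "$1/set_power_state" ;;
                        D3Hot) echo 2 > "$1/set_power_state" ;;
                        *)
                            echo "unknown power state: $2" >&2
                            return 1 ;;
                    esac
                }
                ```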

What:           /sys/kernel/debug/accel/<parent_device>/skip_reset_on_timeout
Date:           Jun 2021
KernelVersion:  5.13
Contact:        ynudelman@habana.ai
Description:    Sets the skip-reset-on-timeout option for the device. A value
                of "0" means the device will be reset if a CS has timed out;
                otherwise it will not be reset.

What:           /sys/kernel/debug/accel/<parent_device>/state_dump
Date:           Oct 2021
KernelVersion:  5.15
Contact:        ynudelman@habana.ai
Description:    Gets the state dump occurring on a CS timeout or failure.
                A state dump is used for debugging and is created each time a
                problem occurs in a CS execution, before reset.
                Reading from the node returns the newest state dump available.
                Writing an integer X discards X state dumps, so that the
                next read returns the (X+1)-th newest state dump.
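
                A minimal sketch of fetching the n-th newest state dump by
                first discarding n-1 dumps, following the rule above. The
                helper name hl_state_dump_nth is hypothetical:

                ```shell
                # Hypothetical helper; $1 = debugfs directory,
                # $2 = n, where 1 means the newest dump.
                hl_state_dump_nth() {
                    if [ "$2" -gt 1 ]; then
                        # Discard the n-1 newest dumps so that the next
                        # read returns the n-th newest one.
                        echo $(($2 - 1)) > "$1/state_dump" || return 1
                    fi
                    cat "$1/state_dump"
                }
                ```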

What:           /sys/kernel/debug/accel/<parent_device>/stop_on_err
Date:           Mar 2020
KernelVersion:  5.6
Contact:        ogabbay@kernel.org
Description:    Sets the stop-on-error option for the device engines. A value
                of "0" disables the option; any other value enables it.
                Relevant only for GOYA and GAUDI.

What:           /sys/kernel/debug/accel/<parent_device>/timeout_locked
Date:           Sep 2021
KernelVersion:  5.16
Contact:        obitton@habana.ai
Description:    Sets the command submission timeout value in seconds.

What:           /sys/kernel/debug/accel/<parent_device>/userptr
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Displays a list with information about the current user
                pointers (user virtual addresses) that are pinned and mapped
                to DMA addresses.

What:           /sys/kernel/debug/accel/<parent_device>/userptr_lookup
Date:           Oct 2021
KernelVersion:  5.15
Contact:        ogabbay@kernel.org
Description:    Allows searching for specific user pointers (user virtual
                addresses) that are pinned and mapped to DMA addresses, and
                seeing their resolution to the specific DMA address.

What:           /sys/kernel/debug/accel/<parent_device>/vm
Date:           Jan 2019
KernelVersion:  5.1
Contact:        ogabbay@kernel.org
Description:    Displays a list with information about all the active virtual
                address mappings per ASID and all user mappings of HW blocks.