9dec27bb | 07-May-2024 |
Didi Freiman <dfreiman@habana.ai> |
accel/habanalabs: gradual sleep in polling memory macro
It’s better to avoid long sleeps right from the beginning of the polling since the data may be available much sooner than the sleep period. Because polling host memory is inexpensive, this change gradually increases the sleep time up to the user-requested period.
Signed-off-by: Didi Freiman <dfreiman@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
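The gradual-sleep idea in 9dec27bb can be illustrated with a small user-space model. This is only a sketch: the commit message does not say how the sleep time grows, so the doubling policy, the virtual clock, and the names `next_sleep_us` and `poll_gradual` are assumptions made for illustration, not the driver's macro.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy: start at 1 us and double, never exceeding max_us. */
static uint64_t next_sleep_us(uint64_t cur_us, uint64_t max_us)
{
    uint64_t next = cur_us ? cur_us * 2 : 1;

    return next > max_us ? max_us : next;
}

struct poll_result {
    bool ok;              /* condition became true before the timeout */
    uint64_t elapsed_us;  /* virtual time spent sleeping */
    unsigned int polls;   /* number of times the condition was checked */
};

/*
 * Simulated poll loop on a virtual clock: the polled data becomes
 * available at ready_at_us. No real sleeping is done.
 */
static struct poll_result poll_gradual(uint64_t ready_at_us,
                                       uint64_t max_sleep_us,
                                       uint64_t timeout_us)
{
    struct poll_result r = { false, 0, 0 };
    uint64_t sleep_us = 0;

    for (;;) {
        r.polls++;
        if (r.elapsed_us >= ready_at_us) {
            r.ok = true;
            return r;
        }
        if (r.elapsed_us >= timeout_us)
            return r;
        sleep_us = next_sleep_us(sleep_us, max_sleep_us);
        /* Advance at least 1 us so a zero sleep period still makes progress. */
        r.elapsed_us += sleep_us ? sleep_us : 1;
    }
}
```

In this model, data that becomes ready after 5 us with a requested period of 1000 us is noticed after 7 us of sleeping (1 + 2 + 4), whereas a fixed 1000 us sleep would not re-check until the first full period had passed.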
0199e639 | 13-May-2024 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: move heartbeat work initialization to early init
The device heartbeat work is currently initialized in device_heartbeat_schedule(), which is called at the end of hl_device_init(). However, hl_device_init() can fail at an earlier step, and in that case a subsequent call to hl_device_fini() calls cleanup_resources() and accesses this work while it is uninitialized.
As there is no real need to re-initialize this work every time it is rescheduled, move this initialization to device_early_init() to be done once and early enough.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
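The ordering problem fixed by 0199e639 can be modeled in a few lines of user-space C. This is a simplified sketch, not driver code: every `model_*` name is hypothetical, and a boolean flag stands in for the kernel work item, where touching an uninitialized work struct would be a real bug rather than a `false` return.

```c
#include <stdbool.h>

struct model_work {
    bool initialized;
    bool queued;
};

struct model_device {
    struct model_work heartbeat;
};

static void model_work_init(struct model_work *w)
{
    w->initialized = true;
    w->queued = false;
}

/* Returns false when asked to cancel work that was never initialized. */
static bool model_work_cancel(struct model_work *w)
{
    if (!w->initialized)
        return false;
    w->queued = false;
    return true;
}

/* After the change: the work is initialized once, early. */
static void model_early_init(struct model_device *d)
{
    model_work_init(&d->heartbeat);
}

/* Scheduling only queues the work; it no longer initializes it. */
static void model_heartbeat_schedule(struct model_device *d)
{
    d->heartbeat.queued = true;
}

static int model_device_init(struct model_device *d, bool fail_midway)
{
    model_early_init(d);
    if (fail_midway)
        return -1;              /* fails before scheduling is reached */
    model_heartbeat_schedule(d);
    return 0;
}

/* Teardown cancels the work regardless of how far init got. */
static bool model_device_fini(struct model_device *d)
{
    return model_work_cancel(&d->heartbeat);
}
```

With initialization in the early step, teardown after a mid-way init failure still finds a valid work item. If `model_work_init()` were called only from `model_heartbeat_schedule()`, the same failure path would reach `model_work_cancel()` with the work uninitialized.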
5cb97d74 | 01-May-2024 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: print timestamp of last PQ heartbeat on EQ heartbeat failure
The test packet which is sent to FW for the PQ heartbeat is also used as the trigger in FW to send the EQ heartbeat event. Add the time of the last sent packet to the debug info which is printed upon an EQ heartbeat failure.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
c4548eee | 16-Apr-2024 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: dump the EQ entries headers on EQ heartbeat failure
Add a dump of the EQ entries headers upon an EQ heartbeat failure.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
795f93e6 | 16-Apr-2024 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: revise print on EQ heartbeat failure
Don't print the "previous EQ index" value in case of an EQ heartbeat failure, because it is incremented along with the EQ CI and is therefore redundant.
In addition, as the CPU-CP PI is zeroed when it reaches a value that is twice the queue size, also print the CI value with a similar wrap-around, to make it easier to compare the values.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
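The wrap-around described in 795f93e6 amounts to reducing the CI modulo twice the queue size so that it can be compared directly with the PI. The sketch below assumes the CI is a free-running counter and that both values use the same units; the helper name `wrap_like_pi` is hypothetical and not taken from the driver.

```c
#include <stdint.h>

/*
 * Reduce a free-running consumer index to the range [0, 2 * queue_size),
 * mirroring a producer index that is zeroed at twice the queue size.
 * A zero queue size is treated as "no queue" and yields 0.
 */
static uint32_t wrap_like_pi(uint32_t ci, uint32_t queue_size)
{
    if (queue_size == 0)
        return 0;
    return ci % (2u * queue_size);
}
```

For a queue size of 64, a CI of 127 stays 127, a CI of 128 wraps to 0, and a CI of 130 wraps to 2.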
9ee446f9 | 09-Apr-2024 |
Farah Kassabri <fkassabri@habana.ai> |
accel/habanalabs: add more info upon cpu pkt timeout
To improve debuggability when FW issues are encountered, print additional info once the CPU packet timeout expires.
Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
fda92282 | 04-Apr-2024 |
Ilia Levi <illevi@habana.ai> |
accel/habanalabs: additional print in device-in-use info
When device release triggers a hard reset, there is a printout of the cause. The currently listed causes (those that increment the context refcount) are active command submissions and exported DMA buffer objects. In any other case, the printout emits "unknown reason". Identify and print another reason: allocated command buffers.
Signed-off-by: Ilia Levi <illevi@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
61f4f624 | 03-Apr-2024 |
Tal Cohen <talcohen@habana.ai> |
accel/habanalabs: disable EQ interrupt after disabling pci
When sending the disable-PCI message to the firmware, there is a possibility that an EQ packet is already pending. Disabling the EQ interrupt will prevent this from happening. The interrupt will be re-enabled after reset.
Signed-off-by: Tal Cohen <talcohen@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
52fbab90 | 04-Apr-2024 |
Farah Kassabri <fkassabri@habana.ai> |
accel/habanalabs: change the heartbeat scheduling point
Currently the heartbeat thread is scheduled at late init, and only then is the INTS_REGISTER packet sent, which enables events to be received from firmware.
Init may take some time, and firmware should get two full heartbeat-thread cycles after it receives INTS_REGISTER.
This patch moves the heartbeat thread scheduling to after the driver is done with all initializations.
Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
2214cafd | 20-Mar-2024 |
Ofir Bitton <obitton@habana.ai> |
accel/habanalabs: remove timestamp registration debug prints
There are several timestamp registration debug prints that spam the kernel log whenever dynamic debug is enabled. Remove those prints.
Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
cebb64f9 | 27-May-2024 |
Vitaly Margolin <vmargolin@habana.ai> |
accel/habanalabs: add cpld ts cpld_timestamp cpucp
Add a cpld_timestamp field to the cpucp_info structure and return the CPLD timestamp as part of the CPLD version.
Signed-off-by: Vitaly Margolin <vmargolin@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
3d613b0c | 12-Mar-2024 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: add a common handler for clock change events
As the new dynamic EQ includes clock change events which are common and not ASIC-specific, add a common handler for these events.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
93a296dd | 12-Mar-2024 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: move hl_eq_heartbeat_event_handle() to common code
hl_eq_heartbeat_event_handle() doesn't have ASIC-specific code, and therefore can be moved from Gaudi2-only code to common code, where it can possibly be used for other ASICs.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
5f6ad3c6 | 13-Mar-2024 |
Tomer Tayar <ttayar@habana.ai> |
accel/habanalabs: add an EQ size ASIC property
Future supported ASICs might use the dynamic EQ mechanism with the firmware, and in that case the EQ size won't be equal to the default HL_EQ_SIZE_IN_BYTES value. Add an ASIC property to enable overriding this value.
Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
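One common way to implement an overridable size like the one in 5f6ad3c6 is a per-ASIC property with a fallback to the default. The sketch below is only an illustration of that pattern. The struct, the field name `eq_size`, the "zero means not overridden" convention, and the placeholder constant (which stands in for HL_EQ_SIZE_IN_BYTES and does not reflect its real value) are all assumptions, not the driver's definitions.

```c
#include <stdint.h>

/* Placeholder default; NOT the driver's HL_EQ_SIZE_IN_BYTES value. */
#define EXAMPLE_DEFAULT_EQ_SIZE_IN_BYTES 4096u

/* Hypothetical per-ASIC properties; 0 means "use the default". */
struct example_asic_props {
    uint32_t eq_size;
};

/* Pick the ASIC-specific EQ size if one was set, else the default. */
static uint32_t example_eq_size(const struct example_asic_props *props)
{
    return props->eq_size ? props->eq_size
                          : EXAMPLE_DEFAULT_EQ_SIZE_IN_BYTES;
}
```

Callers that allocate or walk the event queue would then use the resolved value instead of the compile-time constant, so an ASIC using the dynamic EQ mechanism only has to set the property.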
b94488be | 03-Mar-2024 |
Farah Kassabri <fkassabri@habana.ai> |
accel/habanalabs: check for errors after preboot is ready
The driver should check for and report any fatal errors detected by preboot before it attempts to load the boot fit. Some errors may cause the driver to stop the boot process and mark the device as unusable. This check allows the driver to fail, print the error reported by preboot, and skip the time-wasting attempt to load the boot fit, which would fail due to the error.
Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
25abbe7a | 03-Mar-2024 |
Igal Zeltser <izeltser@habana.ai> |
accel/habanalabs: use msg_header instead of desc_header
Struct comms_desc_header is deprecated and replaced by struct comms_msg_header. In preparation for removing comms_desc_header from FW, all its uses in the code are replaced by comms_msg_header.
Signed-off-by: Igal Zeltser <izeltser@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
31bd2693 | 21-Feb-2024 |
Farah Kassabri <fkassabri@habana.ai> |
accel/habanalabs: add heartbeat debug info
It is hard to debug the reason for heartbeat check failures. As an attempt to ease this task, this patch will provide more information when this failure happens. Heartbeat checks the communication with FW, so printing the CPU queue pi/ci and the counter of how many times that event was received would help in debugging the issue.
Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|
42f04ca6 | 25-Feb-2024 |
Ohad Sharabi <osharabi@habana.ai> |
accel/habanalabs: add device name to invalidation failure msg
This addition helps log parsers better identify the error without the need to go back and search for the device name in earlier log lines.
Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>
|