2cbab3c2 | 16-Feb-2024 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: use pci_reset_function for health reset
We get the benefit of all the PCI reset locking and recovery if we use the existing pci_reset_function() that will call our local reset handlers.
Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
2dac60e0 | 16-Feb-2024 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: delete VF dev on reset
When the VF is hit with a reset, remove the aux device in the prepare for reset and try to restore it after the reset. The userland mechanics will need to recover and rebuild whatever uses the device afterwards.
Reviewed-by: Brett Creeley <brett.creeley@amd.com> Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
792d36cc | 02-Feb-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Clean up init/uninit flows to be more readable
The setup and teardown flows are somewhat hard to follow regarding pdsc_core_init()/pdsc_dev_init() and their corresponding teardown flows being in pdsc_teardown(). Improve the readability by adding new pdsc_core_uninit()/pdsc_dev_uninit() functions that mirror their init counterparts. Also, move the notify and admin qcq allocations into pdsc_core_init(), so they can be freed in pdsc_core_uninit().
Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
247c4ed0 | 02-Feb-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Fix up some minor issues
Running xmastree.py against the driver found some RCT issues, so fix them.
Also, if allocating pdsc->intr_info in pdsc_dev_init() fails the driver still tries to free pdsc->intr_info. Fix this by just returning -ENOMEM since there's nothing to free at this point of failure.
Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
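The allocation-failure fix described above can be illustrated with a small userspace sketch. The struct and function names here (dev_sketch, dev_init_sketch, and so on) are hypothetical stand-ins rather than the driver's real definitions, and the force_fail parameter exists only to simulate a failed allocation.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's structures; not the real layout. */
struct intr_info_sketch {
	int vector;
};

struct dev_sketch {
	struct intr_info_sketch *intr_info;
	unsigned int nintrs;
};

/* Returns 0 on success or -ENOMEM if the allocation fails. */
int dev_init_sketch(struct dev_sketch *d, unsigned int nintrs, int force_fail)
{
	/* force_fail simulates an allocation failure for demonstration. */
	d->intr_info = force_fail ? NULL : calloc(nintrs, sizeof(*d->intr_info));
	if (!d->intr_info)
		return -ENOMEM; /* nothing was allocated, so there is nothing to free */

	d->nintrs = nintrs;
	return 0;
}

/* Mirror of the init step: release the array and clear the cached pointer. */
void dev_uninit_sketch(struct dev_sketch *d)
{
	free(d->intr_info);
	d->intr_info = NULL;
	d->nintrs = 0;
}
```

Returning directly keeps the failure path from reaching a shared cleanup label that would try to free a pointer that was never allocated.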
|
bca10f2c | 02-Feb-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Unmask adminq interrupt in work thread
Unmasking the interrupt during the pdsc_adminq_isr is a bit early and could cause unnecessary interrupts. Instead always unmask after processing the adminq and notifyq in pdsc_work_thread()->pdsc_process_adminq(). Also, since we are always unmasking, there's no need for the local credits variable in pdsc_process_adminq().
Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
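The ordering change above (unmask only after the work thread has processed the queues) can be modelled in plain C. This is a simplified sketch, not driver code: intr_sketch and its fields are hypothetical, and it assumes the interrupt is masked when the ISR runs.

```c
#include <stdbool.h>

/* Hypothetical model of one interrupt line and its deferred work. */
struct intr_sketch {
	bool masked;
	bool work_queued;
	unsigned int pending_events;
	unsigned int processed;
};

/* ISR: leave the interrupt masked and hand off to the work thread. */
void isr_sketch(struct intr_sketch *s)
{
	s->masked = true;      /* assumption: the line is masked while work is outstanding */
	s->work_queued = true; /* stands in for queue_work() */
}

/* Work thread: drain the queues first, then always unmask. */
void work_thread_sketch(struct intr_sketch *s)
{
	if (!s->work_queued)
		return;

	s->work_queued = false;
	s->processed += s->pending_events;
	s->pending_events = 0;
	s->masked = false; /* unmask only after processing is complete */
}
```

Because the unmask is unconditional and happens last, no separate bookkeeping variable is needed to decide whether to unmask.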
|
bc90fbe0 | 30-Jan-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Rework teardown/setup flow to be more common
Currently the teardown/setup flow for driver probe/remove is quite a bit different from the reset flows in pdsc_fw_down()/pdsc_fw_up(). One key piece that's missing is the pairing of calls to pci_alloc_irq_vectors() and pci_free_irq_vectors(). The PCIe reset case calls pci_free_irq_vectors() on reset_prepare, but does not call the corresponding pci_alloc_irq_vectors() on reset_done. This causes unexpected/unwanted interrupt behavior because the adminq interrupt is accidentally put into legacy interrupt mode. Also, the pci_alloc_irq_vectors()/pci_free_irq_vectors() functions are called directly in probe/remove respectively.
Fix this inconsistency by making the following changes:
1. Always call pdsc_dev_init() in pdsc_setup(), which calls pci_alloc_irq_vectors(), and get rid of the now unused pdsc_dev_reinit().
2. Always free/clear the pdsc->intr_info in pdsc_teardown() since this structure will get re-alloced in pdsc_setup().
3. Move the calls of pci_free_irq_vectors() to pdsc_teardown() since pci_alloc_irq_vectors() will always be called in pdsc_setup()->pdsc_dev_init() for both the probe/remove and reset flows.
4. Make sure to only create the debugfs "identity" entry when it doesn't already exist, which it will in the reset case because it's already been created in the initial call to pdsc_dev_init().
Fixes: ffa55858330f ("pds_core: implement pci reset handlers") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20240129234035.69802-7-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
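The symmetry this change aims for can be sketched in userspace C: one setup path that always allocates vectors, one teardown path that always frees them, and a reset that is simply teardown followed by setup. All names below are hypothetical, and the counters stand in for the real pci_alloc_irq_vectors()/pci_free_irq_vectors() calls.

```c
/* Hypothetical model; counters stand in for the real PCI vector calls. */
struct irq_sketch {
	int nvectors;
	int alloc_calls;
	int free_calls;
};

/* Shared by probe and reset-done: always allocates the vectors. */
int setup_sketch(struct irq_sketch *s, int want)
{
	s->nvectors = want;
	s->alloc_calls++;
	return 0;
}

/* Shared by remove and reset-prepare: always frees what setup allocated. */
void teardown_sketch(struct irq_sketch *s)
{
	if (s->nvectors) {
		s->nvectors = 0;
		s->free_calls++;
	}
}

/* A reset is teardown followed by setup, so alloc and free stay balanced. */
int reset_sketch(struct irq_sketch *s, int want)
{
	teardown_sketch(s);
	return setup_sketch(s, want);
}
```

With both flows sharing the same pair of functions, a reset can no longer free vectors without allocating them again.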
|
e96094c1 | 30-Jan-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Clear BARs on reset
During reset the BARs might be accessed when they are unmapped. This can cause unexpected issues, so fix it by clearing the cached BAR values so they are not accessed until they are re-mapped.
Also, make sure any places that can access the BARs when they are NULL are prevented.
Fixes: 49ce92fbee0b ("pds_core: add FW update feature to devlink") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240129234035.69802-6-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
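A minimal sketch of the guard described above, assuming a cached register pointer that is set to NULL while the BAR is unmapped. The struct, field, and function names are illustrative only, and the choice of -ENXIO as the error code is an assumption made for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a cached BAR mapping; field names are illustrative. */
struct bars_sketch {
	volatile uint32_t *info_regs; /* NULL while the BAR is unmapped */
};

/* Drop the cached pointer so later readers see "unmapped", not stale memory. */
void bars_clear(struct bars_sketch *b)
{
	b->info_regs = NULL;
}

/* Read a register only when the BAR is mapped; otherwise return -ENXIO. */
int read_reg(const struct bars_sketch *b, size_t idx, uint32_t *out)
{
	if (!b->info_regs)
		return -ENXIO;

	*out = b->info_regs[idx];
	return 0;
}
```

Callers that run during a reset then get an error instead of touching an unmapped region.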
|
7e82a874 | 30-Jan-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Prevent race issues involving the adminq
There are multiple paths that can result in using the pdsc's adminq.
[1] pdsc_adminq_isr and the resulting work from queue_work(), i.e. pdsc_work_thread()->pdsc_process_adminq()
[2] pdsc_adminq_post()
When the device goes through a reset, via PCIe reset and/or a fw_down/fw_up cycle caused by bad PCIe state or bad device state, the adminq is destroyed and recreated.
A NULL pointer dereference can happen if [1] or [2] happens after the adminq is already destroyed.
To fix this, add some further state checks and implement reference counting for adminq uses. Reference counting is used because multiple threads can attempt to access the adminq at the same time via [1] or [2]. Additionally, multiple clients (e.g. pds-vfio-pci) can be using [2] at the same time.
The adminq_refcnt is initialized to 1 when the adminq has been allocated and is ready to use. Users/clients of the adminq (i.e. [1] and [2]) will increment the refcnt when they are using the adminq. When the driver goes into a fw_down cycle it will set the PDSC_S_FW_DEAD bit and then wait for the adminq_refcnt to hit 1. Setting the PDSC_S_FW_DEAD before waiting will prevent any further adminq_refcnt increments. Waiting for the adminq_refcnt to hit 1 allows for any current users of the adminq to finish before the driver frees the adminq. Once the adminq_refcnt hits 1 the driver clears the refcnt to signify that the adminq is deleted and cannot be used. On the fw_up cycle the driver will once again initialize the adminq_refcnt to 1 allowing the adminq to be used again.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240129234035.69802-5-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
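The reference-counting scheme described above can be modelled in userspace with C11 atomics. This is a sketch under stated assumptions, not the driver's implementation: the struct and function names are hypothetical, the fw_dead flag stands in for the PDSC_S_FW_DEAD bit, and where the driver waits for the count to reach 1, this model exposes a single retire attempt that the caller would repeat.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model; the driver uses kernel primitives instead. */
struct adminq_sketch {
	atomic_int refcnt;   /* 0 = deleted, 1 = idle and usable, >1 = in use */
	atomic_bool fw_dead; /* models the PDSC_S_FW_DEAD state bit */
};

/* Called once the queue is allocated (initial setup or fw_up). */
void adminq_ready(struct adminq_sketch *q)
{
	atomic_store(&q->fw_dead, false);
	atomic_store(&q->refcnt, 1);
}

/* Take a reference; fails once teardown has begun or the queue is gone. */
bool adminq_get(struct adminq_sketch *q)
{
	int old;

	if (atomic_load(&q->fw_dead))
		return false;

	old = atomic_load(&q->refcnt);
	while (old != 0) {
		/* Increment only if the count is still non-zero. */
		if (atomic_compare_exchange_weak(&q->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference taken with adminq_get(). */
void adminq_put(struct adminq_sketch *q)
{
	atomic_fetch_sub(&q->refcnt, 1);
}

/* Mark the firmware dead, then try to retire the queue if it is idle. */
bool adminq_try_retire(struct adminq_sketch *q)
{
	int idle = 1;

	atomic_store(&q->fw_dead, true); /* block new references first */
	/* Only an idle queue (count == 1) may drop to 0, meaning deleted. */
	return atomic_compare_exchange_strong(&q->refcnt, &idle, 0);
}
```

Setting the dead flag before checking the count is what guarantees the count can only fall while teardown is waiting, so the wait terminates once current users finish.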
|
95170515 | 30-Jan-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Use struct pdsc for the pdsc_adminq_isr private data
The initial design for the adminq interrupt was done based on client drivers having their own adminq and adminq interrupt. So, each client driver's adminq isr would use their specific adminqcq for the private data struct. For the time being the design has changed to only use a single adminq for all clients. So, instead use the struct pdsc for the private data to simplify things a bit.
This also has the benefit of not dereferencing the adminqcq to access the pdsc struct when the PDSC_S_STOPPING_DRIVER bit is set and the adminqcq has actually been cleared/freed.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240129234035.69802-4-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
d321067e | 30-Jan-2024 |
Brett Creeley <brett.creeley@amd.com> |
pds_core: Cancel AQ work on teardown
There is a small window where pdsc_work_thread() calls pdsc_process_adminq() and pdsc_process_adminq() passes the PDSC_S_STOPPING_DRIVER check and starts to process adminq/notifyq work and then the driver starts a fw_down cycle. This could cause some undefined behavior if the notifyqcq/adminqcq are free'd while pdsc_process_adminq() is running. Use cancel_work_sync() on the adminqcq's work struct to make sure any pending work items are cancelled and any in progress work items are completed.
Also, make sure not to call cancel_work_sync() if the work item has not been initialized. Without this, traces will happen in cases where a reset fails and teardown is called again, or if a reset fails and the driver is removed.
Fixes: 01ba61b55b20 ("pds_core: Add adminq processing and commands") Signed-off-by: Brett Creeley <brett.creeley@amd.com> Reviewed-by: Shannon Nelson <shannon.nelson@amd.com> Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> Link: https://lore.kernel.org/r/20240129234035.69802-3-brett.creeley@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
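The "only cancel initialized work" guard can be sketched with a toy work item. Everything here is a hypothetical userspace model: work_sketch is not the kernel's work_struct, and the sketch does not model the part of cancel_work_sync() that waits for an in-progress handler to finish.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item; func stays NULL until the item is initialized. */
struct work_sketch {
	void (*func)(struct work_sketch *w);
	bool pending;
};

/* Example handler used only to give the work item a non-NULL function. */
void noop_handler(struct work_sketch *w)
{
	(void)w;
}

void work_init(struct work_sketch *w, void (*func)(struct work_sketch *))
{
	w->func = func;
	w->pending = false;
}

/* Returns true if a pending item was cancelled; safe on uninitialized work. */
bool work_cancel_if_initialized(struct work_sketch *w)
{
	bool was_pending;

	if (!w->func)
		return false; /* never initialized: nothing to cancel */

	was_pending = w->pending;
	w->pending = false;
	return was_pending;
}
```

The check makes teardown safe to call a second time, or after a setup that failed before the work item was initialized.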
|
7c02f6ae | 13-Nov-2023 |
Shannon Nelson <shannon.nelson@amd.com> |
pds_core: fix up some format-truncation complaints
Our friendly kernel test robot pointed out a couple of potential string truncation issues. We were not worried about any of them, but they are easy enough to fix to quiet the complaints.
Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202310211736.66syyDpp-lkp@intel.com/ Fixes: 45d76f492938 ("pds_core: set up device and adminq") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Link: https://lore.kernel.org/r/20231113183257.71110-3-shannon.nelson@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
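For context on what a format-truncation complaint is about, here is a generic sketch of detecting truncation with snprintf(). It does not reproduce the driver's actual strings or buffers; format_name and its name format are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Format "<prefix>-<index>" into buf. Returns true only if the whole
 * string fit; snprintf() returns the length it needed, excluding the
 * terminating NUL, even when the output was cut short.
 */
bool format_name(char *buf, size_t len, const char *prefix, unsigned int index)
{
	int n = snprintf(buf, len, "%s-%u", prefix, index);

	return n >= 0 && (size_t)n < len;
}
```

Sizing the destination for the longest possible output, or checking the return value as above, is usually enough to silence the compiler's truncation warning.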
|