c1fffc5d | 17-Jul-2025 |
Joshua Washington <joshwash@google.com> |
gve: implement DQO RX datapath and control path for AF_XDP zero-copy
Add the RX datapath for AF_XDP zero-copy for DQ RDA. The RX path is quite similar to that of the normal XDP case. Parallel methods are introduced to properly handle XSKs instead of normal driver buffers.
To properly support posting from XSKs, queues are destroyed and recreated, as the driver was initially making use of page pool buffers instead of the XSK pool memory.
Expose support for AF_XDP zero-copy, as the TX and RX datapaths both exist.
Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://patch.msgid.link/20250717152839.973004-6-jeroendb@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
2236836e | 17-Jul-2025 |
Joshua Washington <joshwash@google.com> |
gve: implement DQO TX datapath for AF_XDP zero-copy
In the descriptor clean path, a number of changes need to be made to accommodate out-of-order completions and double completions.
The XSK stack can only handle completions being processed in order, as a single counter is incremented in xsk_tx_completed to signify how many XSK descriptors have been completed. Because completions can come back out of order in DQ, a separate queue of XSK descriptors must be maintained. This queue keeps the pending packets in the order that they were written so that the descriptors can be counted in xsk_tx_completed in the same order.
For double completions, a new pending packet state and type are introduced. The new type, GVE_TX_PENDING_PACKET_DQO_XSK, plays a role for XSK descriptors analogous to that of the pre-existing _SKB and _XDP_FRAME pending packet types. The new state, GVE_PACKET_STATE_XSK_COMPLETE, represents packets for which no more completions are expected. This includes packets which have received a packet completion or reinjection completion, as well as packets whose reinjection completion timer has timed out. At this point, such packets can be counted as part of xsk_tx_completed() and freed.
Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://patch.msgid.link/20250717152839.973004-5-jeroendb@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
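The in-order accounting described in this commit message can be illustrated with a small standalone sketch. It is not the driver's code: the structure, the function names, and the ring capacity are all hypothetical, and the value returned by the reap step is simply what a caller would pass to an xsk_tx_completed()-style counter. Descriptors are recorded in write order, completions may be marked in any order, and only the contiguous completed run at the head is ever reported.

```c
#include <assert.h>
#include <stdbool.h>

#define XSK_REORDER_CAP 8u /* hypothetical capacity; must be a power of two */

/* Hypothetical reorder queue: one completion flag per written descriptor. */
struct xsk_reorder_queue {
	bool done[XSK_REORDER_CAP];
	unsigned int head; /* oldest descriptor not yet reported */
	unsigned int tail; /* next slot to hand out */
};

/* Record a descriptor in write order; returns its slot, or -1 if full. */
int xsk_reorder_push(struct xsk_reorder_queue *q)
{
	unsigned int slot;

	if (q->tail - q->head == XSK_REORDER_CAP)
		return -1;
	slot = q->tail & (XSK_REORDER_CAP - 1);
	q->done[slot] = false;
	q->tail++;
	return (int)slot;
}

/* Mark a slot as finished; completions may arrive in any order. */
void xsk_reorder_mark_done(struct xsk_reorder_queue *q, int slot)
{
	assert(slot >= 0 && (unsigned int)slot < XSK_REORDER_CAP);
	q->done[slot] = true;
}

/*
 * Consume the contiguous run of finished descriptors at the head and
 * return its length. A finished descriptor behind an unfinished one is
 * held back until everything written before it has finished.
 */
unsigned int xsk_reorder_reap(struct xsk_reorder_queue *q)
{
	unsigned int n = 0;

	while (q->head != q->tail &&
	       q->done[q->head & (XSK_REORDER_CAP - 1)]) {
		q->head++;
		n++;
	}
	return n;
}
```

With three descriptors written and the third completing first, the reap step reports nothing until the first descriptor also completes.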
|
652fe13b | 17-Jul-2025 |
Joshua Washington <joshwash@google.com> |
gve: keep registry of zc xsk pools in netdev_priv
Relying on xsk_get_pool_from_qid to determine whether zero copy is enabled on a queue is erroneous, as an XSK pool is registered in xp_assign_dev whether AF_XDP zero-copy is enabled or not. This becomes problematic when queues are restarted in copy mode, as all RX queues with XSKs will register a pool, causing the driver to exercise the zero-copy codepath.
This patch adds a bitmap to keep track of which queues have zero-copy enabled.
Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://patch.msgid.link/20250717152839.973004-4-jeroendb@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
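A per-queue bitmap of this kind can be sketched in a few lines. The names and the queue limit below are hypothetical and the kernel's own bitmap helpers are deliberately not used, so this only shows the idea: zero-copy status is recorded explicitly per queue ID instead of being inferred from whether a pool happens to be registered.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define DEMO_MAX_QUEUES 64u /* hypothetical upper bound on queue IDs */
#define DEMO_WORD_BITS ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define DEMO_ZC_WORDS ((DEMO_MAX_QUEUES + DEMO_WORD_BITS - 1) / DEMO_WORD_BITS)

/* Hypothetical registry: bit N is set when queue N runs in zero-copy mode. */
struct demo_zc_registry {
	unsigned long bits[DEMO_ZC_WORDS];
};

/* Set or clear the zero-copy bit for one queue. */
void demo_zc_set(struct demo_zc_registry *r, unsigned int qid, bool enabled)
{
	unsigned long mask;

	assert(qid < DEMO_MAX_QUEUES);
	mask = 1UL << (qid % DEMO_WORD_BITS);
	if (enabled)
		r->bits[qid / DEMO_WORD_BITS] |= mask;
	else
		r->bits[qid / DEMO_WORD_BITS] &= ~mask;
}

/* Query the zero-copy bit for one queue. */
bool demo_zc_test(const struct demo_zc_registry *r, unsigned int qid)
{
	assert(qid < DEMO_MAX_QUEUES);
	return (r->bits[qid / DEMO_WORD_BITS] >> (qid % DEMO_WORD_BITS)) & 1UL;
}
```

A queue restarted in copy mode would leave its bit clear, so the datapath can pick the copy codepath even though an XSK pool is still registered for that queue.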
|
077f7153 | 17-Jul-2025 |
Joshua Washington <joshwash@google.com> |
gve: merge xdp and xsk registration
Having both xdp_rxq and xsk_rxq is redundant. xdp_rxq can be used in both the zero-copy mode and the copy mode case. XSK pool memory model registration is prioritized over normal memory model registration to ensure that memory model registration happens only once per queue.
Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Jeroen de Borst <jeroendb@google.com> Link: https://patch.msgid.link/20250717152839.973004-3-jeroendb@google.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
d8a8ca14 | 18-Jun-2025 |
Joshua Washington <joshwash@google.com> |
gve: add XDP_TX and XDP_REDIRECT support for DQ RDA
This patch adds support for XDP_TX and XDP_REDIRECT for the DQ RDA queue format. To appropriately support transmission of XDP frames, a new pending packet type GVE_TX_PENDING_PACKET_DQO_XDP_FRAME is introduced for completion handling, as there was a previous assumption that completed packets would be SKBs.
XDP_TX handling completes the basic XDP actions, so the feature is recorded accordingly. This patch also enables the ndo_xdp_xmit callback allowing DQ to handle XDP_REDIRECT packets originating from another interface.
The XDP spinlock is moved to common TX ring fields so that it can be used in both GQ and DQ. Originally, it was in a section which was mutually exclusive for GQ and DQ.
In summary, 3 XDP features are exposed for the DQ RDA queue format:
1) NETDEV_XDP_ACT_BASIC
2) NETDEV_XDP_ACT_NDO_XMIT
3) NETDEV_XDP_ACT_REDIRECT
Note that XDP and header-data split are mutually exclusive for the time being due to lack of multi-buffer XDP support.
This patch does not add support for the DQ QPL format. That is to come in a future patch series.
Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
|
cb711b3d | 18-Jun-2025 |
Joshua Washington <joshwash@google.com> |
gve: refactor DQO TX methods to be more generic for XDP
This patch performs various minor DQO TX datapath refactors in preparation for adding XDP_TX and XDP_REDIRECT support. The following refactors are performed:
1) gve_tx_fill_pkt_desc_dqo() relies on an SKB pointer to determine whether checksum offloading should be enabled. This won't work for the XDP case, which does not have an SKB. This patch updates the method to take a boolean that directly indicates whether checksum offloading should be enabled.
2) gve_maybe_stop_dqo() contains some synchronization between the true TX head and the cached value, a synchronization which is common for XDP queues and normal netdev queues. However, that method is reserved for netdev TX queues. To avoid duplicate code, this logic is factored out into a new method, gve_has_tx_slots_available().
3) gve_tx_update_tail() is added to update the TX tail, a functionality that will be common between normal TX and XDP TX codepaths.
Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Joshua Washington <joshwash@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
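The head synchronization mentioned in refactor 2) follows a common ring-buffer pattern, sketched below under stated assumptions. The struct layout and function names are hypothetical and are not taken from the driver; a real implementation would read the true head with an appropriate atomic load, which is only noted in a comment here. The producer first checks against its cached head and refreshes that cache only when the ring looks full.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical TX ring bookkeeping; the ring size is a power of two. */
struct demo_tx_ring {
	uint32_t mask;        /* ring size - 1 */
	uint32_t tail;        /* next slot the producer writes */
	uint32_t cached_head; /* producer's possibly stale copy of the head */
	uint32_t true_head;   /* head as advanced by the completion path */
};

/* Free slots according to the cached head; one slot is kept unused. */
uint32_t demo_free_slots(const struct demo_tx_ring *r)
{
	return r->mask - ((r->tail - r->cached_head) & r->mask);
}

/*
 * Return true if at least `needed` slots are free. The cached head is
 * refreshed from the true head only when the cheap check fails.
 */
bool demo_has_tx_slots(struct demo_tx_ring *r, uint32_t needed)
{
	if (demo_free_slots(r) >= needed)
		return true;

	/* In a driver this would be an atomic load with acquire ordering. */
	r->cached_head = r->true_head;

	return demo_free_slots(r) >= needed;
}
```

Keeping this check separate from any netdev queue stop/wake logic is what lets both a regular TX path and an XDP TX path share it.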
|
b11344f6 | 16-Jun-2025 |
Alok Tiwari <alok.a.tiwari@oracle.com> |
gve: Return error for unknown admin queue command
In gve_adminq_issue_cmd(), return -EINVAL instead of 0 when an unknown admin queue command opcode is encountered.
This prevents the function from silently succeeding on invalid input and prevents undefined behavior by ensuring the function fails gracefully when an unrecognized opcode is provided.
These changes improve error handling.
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://patch.msgid.link/20250616054504.1644770-2-alok.a.tiwari@oracle.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
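The shape of this fix can be shown with a reduced example. The opcode names and the counter are hypothetical stand-ins and the function below is not gve_adminq_issue_cmd(); it only illustrates a switch whose default branch reports -EINVAL rather than falling through to a success return.

```c
#include <errno.h>

/* Hypothetical opcodes standing in for the real admin queue commands. */
enum demo_opcode {
	DEMO_OP_CREATE_QUEUE = 1,
	DEMO_OP_DESTROY_QUEUE = 2,
};

/*
 * Count a command as issued only when its opcode is recognized.
 * Returns 0 on success and -EINVAL for an unknown opcode.
 */
int demo_issue_cmd(unsigned int opcode, unsigned int *issued)
{
	switch (opcode) {
	case DEMO_OP_CREATE_QUEUE:
	case DEMO_OP_DESTROY_QUEUE:
		(*issued)++;
		return 0;
	default:
		/* Unknown opcode: fail instead of silently succeeding. */
		return -EINVAL;
	}
}
```

Callers that already check the return value then see the invalid opcode as an ordinary error.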
|
a471e7f8 | 14-Jun-2025 |
John Fraker <jfraker@google.com> |
gve: Advertise support for rx hardware timestamping
Expand the get_ts_info ethtool handler with the new gve_get_ts_info which advertises support for rx hardware timestamping.
With this patch, the driver now fully supports rx hardware timestamping.
Signed-off-by: John Fraker <jfraker@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-9-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
b2c7aeb4 | 14-Jun-2025 |
John Fraker <jfraker@google.com> |
gve: Implement ndo_hwtstamp_get/set for RX timestamping
Implement ndo_hwtstamp_get/set to enable hardware RX timestamping, providing support for SIOC[SG]HWTSTAMP IOCTLs. Included with this support is the small change necessary to read the rx timestamp out of the rx descriptor, now that timestamps can be enabled. The gve clock is only used for hardware timestamps, so it is started when timestamps are requested and stopped when they are no longer needed.
This version only supports RX hardware timestamping with the rx filter HWTSTAMP_FILTER_ALL. If the user attempts to configure a more restrictive filter, the filter will be set to HWTSTAMP_FILTER_ALL in the returned structure.
Signed-off-by: John Fraker <jfraker@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-8-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
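The filter handling described above can be sketched as follows. The enum and struct are hypothetical stand-ins for the kernel's HWTSTAMP_FILTER_* values and its timestamping config, and the treatment of the "none" filter as "timestamping disabled" is an assumption of this sketch rather than something the commit message states.

```c
/* Hypothetical stand-ins for the kernel's HWTSTAMP_FILTER_* values. */
enum demo_rx_filter {
	DEMO_FILTER_NONE = 0,
	DEMO_FILTER_ALL = 1,
	DEMO_FILTER_PTP_V2_EVENT = 2, /* example of a more restrictive filter */
};

/* Hypothetical config passed in by the caller and returned to it. */
struct demo_hwtstamp_config {
	enum demo_rx_filter rx_filter;
};

/*
 * Apply a requested RX filter. Any request other than "none" is widened
 * to "all" in the structure handed back to the caller. Returns 1 if RX
 * timestamping ends up enabled and 0 if it ends up disabled.
 */
int demo_hwtstamp_set(struct demo_hwtstamp_config *cfg)
{
	if (cfg->rx_filter == DEMO_FILTER_NONE)
		return 0;

	cfg->rx_filter = DEMO_FILTER_ALL;
	return 1;
}
```

The return value is where a driver could decide whether its clock needs to be running, matching the start-on-request, stop-when-unneeded behavior described above.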
|
3bf5431f | 14-Jun-2025 |
John Fraker <jfraker@google.com> |
gve: Add rx hardware timestamp expansion
Allow the rx path to recover the high 32 bits of the full 64 bit rx timestamp.
Use the low 32 bits of the last synced nic time and the 32 bits of the timestamp provided in the rx descriptor to generate a difference, which is then applied to the last synced nic time to reconstruct the complete 64-bit timestamp.
This scheme remains accurate as long as no more than ~2 seconds have passed between the last read of the nic clock and the timestamping of the received packet.
Signed-off-by: John Fraker <jfraker@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-7-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
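The expansion scheme can be written as a short standalone function. This is a sketch with a hypothetical name, and it assumes the timestamps are in nanoseconds, which is consistent with a 32-bit counter and the ~2 second limit quoted above (2^31 ns is about 2.147 s). The difference between the two low words is interpreted as a signed 32-bit value, so the descriptor timestamp may lie slightly before or after the last synced NIC time.

```c
#include <stdint.h>

/*
 * Rebuild a 64-bit timestamp from the last synced 64-bit NIC time and
 * the 32-bit timestamp carried in an RX descriptor (hypothetical helper).
 */
uint64_t demo_expand_rx_timestamp(uint64_t last_sync, uint32_t hw_ts_low)
{
	uint32_t sync_low = (uint32_t)last_sync;

	/*
	 * Unsigned subtraction wraps modulo 2^32; reading the result as a
	 * signed value gives a difference in [-2^31, 2^31). The conversion
	 * assumes a two's complement target.
	 */
	int32_t diff = (int32_t)(hw_ts_low - sync_low);

	/* Apply the signed difference to the full 64-bit synced time. */
	return last_sync + (uint64_t)(int64_t)diff;
}
```

A descriptor timestamp taken just after the low word wraps is placed in the next 2^32 window, while one taken shortly before the last sync resolves to an earlier 64-bit value.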
|
c51b7bf8 | 14-Jun-2025 |
Kevin Yang <yyd@google.com> |
gve: Add support to query the nic clock
Query the nic clock and store the results. The timestamp delivered in descriptors has a wraparound time of ~4 seconds, so 250ms is chosen as the sync cadence to provide a balance between performance and drift potential when we do start associating host time and nic time.
Leverage PTP's aux_work to query the nic clock periodically.
Signed-off-by: Kevin Yang <yyd@google.com> Signed-off-by: John Fraker <jfraker@google.com> Signed-off-by: Tim Hostetler <thostet@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-6-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
21235ad9 | 14-Jun-2025 |
Ziwei Xiao <ziweixiao@google.com> |
gve: Add adminq lock for queues creation and destruction
Adminq commands for queue creation and destruction were not consistently protected by the driver's adminq_lock. This was previously benign as these operations were always initiated from contexts holding kernel-level locks (e.g., rtnl_lock, netdev_lock), which provided serialization.
Upcoming PTP aux_work will issue adminq commands directly from the driver to read the NIC clock, without such kernel lock protection. To prevent race conditions with this new PTP work, this patch ensures the adminq_lock is held during queue creation and destruction.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-5-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
acd16380 | 14-Jun-2025 |
Harshitha Ramamurthy <hramamurthy@google.com> |
gve: Add initial PTP device support
If the device supports reading of the nic clock, add support to initialize and register the PTP clock.
Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-4-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
e0c9d568 | 14-Jun-2025 |
John Fraker <jfraker@google.com> |
gve: Add adminq command to report nic timestamp
Add an adminq command to read the NIC's hardware clock. The driver allocates DMA memory and passes the address of that memory to the device. The device then writes the clock value to the given address.
Signed-off-by: Jeff Rogers <jefrogers@google.com> Signed-off-by: John Fraker <jfraker@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Link: https://patch.msgid.link/20250614000754.164827-3-hramamurthy@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|