/linux/drivers/net/ethernet/intel/idpf/ |
H A D | idpf_ethtool.c | diff e4891e4687c8dd136d80d6c1b857a02931ed6fc8 Thu Jun 20 15:53:38 CEST 2024 Alexander Lobakin <aleksander.lobakin@intel.com> idpf: split &idpf_queue into 4 strictly-typed queue structures
Currently, sizeof(struct idpf_queue) is 32 Kb. This is due to the 12-bit hashtable declaration at the end of the queue. This HT is needed only for Tx queues when the flow scheduling mode is enabled. But &idpf_queue is unified for all of the queue types, provoking excessive memory usage. The unified structure in general makes the code less effective via suboptimal fields placement. You can't avoid that unless you make unions each 2 fields. Even then, different field alignment etc., doesn't allow you to optimize things to the limit. Split &idpf_queue into 4 structures corresponding to the queue types: RQ (Rx queue), SQ (Tx queue), FQ (buffer queue), and CQ (completion queue). Place only needed fields there and shortcuts handy for hotpath. Allocate the abovementioned hashtable dynamically and only when needed, keeping &idpf_tx_queue relatively short (192 bytes, same as Rx). This HT is used only for OOO completions, which aren't really hotpath anyway. Note that this change must be done atomically, otherwise it's really easy to get lost and miss something.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
H A D | idpf_singleq_txrx.c | diff e4891e4687c8dd136d80d6c1b857a02931ed6fc8 Thu Jun 20 15:53:38 CEST 2024 Alexander Lobakin <aleksander.lobakin@intel.com> idpf: split &idpf_queue into 4 strictly-typed queue structures
Currently, sizeof(struct idpf_queue) is 32 Kb. This is due to the 12-bit hashtable declaration at the end of the queue. This HT is needed only for Tx queues when the flow scheduling mode is enabled. But &idpf_queue is unified for all of the queue types, provoking excessive memory usage. The unified structure in general makes the code less effective via suboptimal fields placement. You can't avoid that unless you make unions each 2 fields. Even then, different field alignment etc., doesn't allow you to optimize things to the limit. Split &idpf_queue into 4 structures corresponding to the queue types: RQ (Rx queue), SQ (Tx queue), FQ (buffer queue), and CQ (completion queue). Place only needed fields there and shortcuts handy for hotpath. Allocate the abovementioned hashtable dynamically and only when needed, keeping &idpf_tx_queue relatively short (192 bytes, same as Rx). This HT is used only for OOO completions, which aren't really hotpath anyway. Note that this change must be done atomically, otherwise it's really easy to get lost and miss something.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
H A D | idpf.h | diff e4891e4687c8dd136d80d6c1b857a02931ed6fc8 Thu Jun 20 15:53:38 CEST 2024 Alexander Lobakin <aleksander.lobakin@intel.com> idpf: split &idpf_queue into 4 strictly-typed queue structures
Currently, sizeof(struct idpf_queue) is 32 Kb. This is due to the 12-bit hashtable declaration at the end of the queue. This HT is needed only for Tx queues when the flow scheduling mode is enabled. But &idpf_queue is unified for all of the queue types, provoking excessive memory usage. The unified structure in general makes the code less effective via suboptimal fields placement. You can't avoid that unless you make unions each 2 fields. Even then, different field alignment etc., doesn't allow you to optimize things to the limit. Split &idpf_queue into 4 structures corresponding to the queue types: RQ (Rx queue), SQ (Tx queue), FQ (buffer queue), and CQ (completion queue). Place only needed fields there and shortcuts handy for hotpath. Allocate the abovementioned hashtable dynamically and only when needed, keeping &idpf_tx_queue relatively short (192 bytes, same as Rx). This HT is used only for OOO completions, which aren't really hotpath anyway. Note that this change must be done atomically, otherwise it's really easy to get lost and miss something.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
H A D | idpf_txrx.h | diff e4891e4687c8dd136d80d6c1b857a02931ed6fc8 Thu Jun 20 15:53:38 CEST 2024 Alexander Lobakin <aleksander.lobakin@intel.com> idpf: split &idpf_queue into 4 strictly-typed queue structures
Currently, sizeof(struct idpf_queue) is 32 Kb. This is due to the 12-bit hashtable declaration at the end of the queue. This HT is needed only for Tx queues when the flow scheduling mode is enabled. But &idpf_queue is unified for all of the queue types, provoking excessive memory usage. The unified structure in general makes the code less effective via suboptimal fields placement. You can't avoid that unless you make unions each 2 fields. Even then, different field alignment etc., doesn't allow you to optimize things to the limit. Split &idpf_queue into 4 structures corresponding to the queue types: RQ (Rx queue), SQ (Tx queue), FQ (buffer queue), and CQ (completion queue). Place only needed fields there and shortcuts handy for hotpath. Allocate the abovementioned hashtable dynamically and only when needed, keeping &idpf_tx_queue relatively short (192 bytes, same as Rx). This HT is used only for OOO completions, which aren't really hotpath anyway. Note that this change must be done atomically, otherwise it's really easy to get lost and miss something.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
H A D | idpf_virtchnl.c | diff e4891e4687c8dd136d80d6c1b857a02931ed6fc8 Thu Jun 20 15:53:38 CEST 2024 Alexander Lobakin <aleksander.lobakin@intel.com> idpf: split &idpf_queue into 4 strictly-typed queue structures
Currently, sizeof(struct idpf_queue) is 32 Kb. This is due to the 12-bit hashtable declaration at the end of the queue. This HT is needed only for Tx queues when the flow scheduling mode is enabled. But &idpf_queue is unified for all of the queue types, provoking excessive memory usage. The unified structure in general makes the code less effective via suboptimal fields placement. You can't avoid that unless you make unions each 2 fields. Even then, different field alignment etc., doesn't allow you to optimize things to the limit. Split &idpf_queue into 4 structures corresponding to the queue types: RQ (Rx queue), SQ (Tx queue), FQ (buffer queue), and CQ (completion queue). Place only needed fields there and shortcuts handy for hotpath. Allocate the abovementioned hashtable dynamically and only when needed, keeping &idpf_tx_queue relatively short (192 bytes, same as Rx). This HT is used only for OOO completions, which aren't really hotpath anyway. Note that this change must be done atomically, otherwise it's really easy to get lost and miss something.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
H A D | idpf_lib.c | diff e4891e4687c8dd136d80d6c1b857a02931ed6fc8 Thu Jun 20 15:53:38 CEST 2024 Alexander Lobakin <aleksander.lobakin@intel.com> idpf: split &idpf_queue into 4 strictly-typed queue structures
Currently, sizeof(struct idpf_queue) is 32 Kb. This is due to the 12-bit hashtable declaration at the end of the queue. This HT is needed only for Tx queues when the flow scheduling mode is enabled. But &idpf_queue is unified for all of the queue types, provoking excessive memory usage. The unified structure in general makes the code less effective via suboptimal fields placement. You can't avoid that unless you make unions each 2 fields. Even then, different field alignment etc., doesn't allow you to optimize things to the limit. Split &idpf_queue into 4 structures corresponding to the queue types: RQ (Rx queue), SQ (Tx queue), FQ (buffer queue), and CQ (completion queue). Place only needed fields there and shortcuts handy for hotpath. Allocate the abovementioned hashtable dynamically and only when needed, keeping &idpf_tx_queue relatively short (192 bytes, same as Rx). This HT is used only for OOO completions, which aren't really hotpath anyway. Note that this change must be done atomically, otherwise it's really easy to get lost and miss something.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|
H A D | idpf_txrx.c | diff e4891e4687c8dd136d80d6c1b857a02931ed6fc8 Thu Jun 20 15:53:38 CEST 2024 Alexander Lobakin <aleksander.lobakin@intel.com> idpf: split &idpf_queue into 4 strictly-typed queue structures
Currently, sizeof(struct idpf_queue) is 32 Kb. This is due to the 12-bit hashtable declaration at the end of the queue. This HT is needed only for Tx queues when the flow scheduling mode is enabled. But &idpf_queue is unified for all of the queue types, provoking excessive memory usage. The unified structure in general makes the code less effective via suboptimal fields placement. You can't avoid that unless you make unions each 2 fields. Even then, different field alignment etc., doesn't allow you to optimize things to the limit. Split &idpf_queue into 4 structures corresponding to the queue types: RQ (Rx queue), SQ (Tx queue), FQ (buffer queue), and CQ (completion queue). Place only needed fields there and shortcuts handy for hotpath. Allocate the abovementioned hashtable dynamically and only when needed, keeping &idpf_tx_queue relatively short (192 bytes, same as Rx). This HT is used only for OOO completions, which aren't really hotpath anyway. Note that this change must be done atomically, otherwise it's really easy to get lost and miss something.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
|