4dec64c5 | 28-Jun-2024 |
Mina Almasry <almasrymina@google.com> |
page_pool: convert to use netmem
Abstract the memory type from the page_pool so we can later add support for new memory types. Convert the page_pool to use the new netmem type abstraction, rather th
page_pool: convert to use netmem
Abstract the memory type from the page_pool so we can later add support for new memory types. Convert the page_pool to use the new netmem type abstraction, rather than use struct page directly.
As of this patch the netmem type is a no-op abstraction: it's always a struct page underneath. All the page pool internals are converted to use struct netmem instead of struct page, and the page pool now exports 2 APIs:
1. The existing struct page API. 2. The new struct netmem API.
Keeping the existing API is transitional; we do not want to refactor all the current drivers using the page pool at once.
The netmem abstraction is currently a no-op. The page_pool uses page_to_netmem() to convert allocated pages to netmem, and uses netmem_to_page() to convert the netmem back to pages to pass to mm APIs,
Follow up patches to this series add non-paged netmem support to the page_pool. This change is factored out on its own to limit the code churn to this 1 patch, for ease of code review.
Signed-off-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://patch.msgid.link/20240628003253.1694510-6-almasrymina@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
403f11ac | 07-May-2024 |
Alexander Lobakin <aleksander.lobakin@intel.com> |
page_pool: don't use driver-set flags field directly
page_pool::p is driver-defined params, copied directly from the structure passed to page_pool_create(). The structure isn't meant to be modified
page_pool: don't use driver-set flags field directly
page_pool::p is driver-defined params, copied directly from the structure passed to page_pool_create(). The structure isn't meant to be modified by the Page Pool core code and this even might look confusing[0][1]. In order to be able to alter some flags, let's define our own, internal fields the same way as the already existing one (::has_init_callback). They are defined as bits in the driver-set params, leave them so here as well, to not waste byte-per-bit or so. Almost 30 bits are still free for future extensions. We could've defined only new flags here or only the ones we may need to alter, but checking some flags in one place while others in another doesn't sound convenient or intuitive. ::flags passed by the driver can now go to the "slow" PP params.
Suggested-by: Jakub Kicinski <kuba@kernel.org> Link[0]: https://lore.kernel.org/netdev/20230703133207.4f0c54ce@kernel.org Suggested-by: Alexander Duyck <alexanderduyck@fb.com> Link[1]: https://lore.kernel.org/netdev/CAKgT0UfZCGnWgOH96E4GV3ZP6LLbROHM7SHE8NKwq+exX+Gk_Q@mail.gmail.com Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
show more ...
|
ce230f4f | 18-Apr-2024 |
Alexander Lobakin <aleksander.lobakin@intel.com> |
page_pool: add DMA-sync-for-CPU inline helper
Each driver is responsible for syncing buffers written by HW for CPU before accessing them. Almost each PP-enabled driver uses the same pattern, which c
page_pool: add DMA-sync-for-CPU inline helper
Each driver is responsible for syncing buffers written by HW for CPU before accessing them. Almost each PP-enabled driver uses the same pattern, which could be shorthanded into a static inline to make driver code a little bit more compact. Introduce a simple helper which performs DMA synchronization for the size passed from the driver. It can be used even when the pool doesn't manage DMA-syncs-for-device, just make sure the page has a correct DMA address set via page_pool_set_dma_addr().
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
show more ...
|
f853fa5c | 16-Feb-2024 |
Lorenzo Bianconi <lorenzo@kernel.org> |
net: page_pool: fix recycle stats for system page_pool allocator
Use global percpu page_pool_recycle_stats counter for system page_pool allocator instead of allocating a separate percpu variable for
net: page_pool: fix recycle stats for system page_pool allocator
Use global percpu page_pool_recycle_stats counter for system page_pool allocator instead of allocating a separate percpu variable for each (also percpu) page pool instance.
Reviewed-by: Toke Hoiland-Jorgensen <toke@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/87f572425e98faea3da45f76c3c68815c01a20ee.1708075412.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
d49010ad | 27-Nov-2023 |
Jakub Kicinski <kuba@kernel.org> |
net: page_pool: expose page pool stats via netlink
Dump the stats into netlink. More clever approaches like dumping the stats per-CPU for each CPU individually to see where the packets get consumed
net: page_pool: expose page pool stats via netlink
Dump the stats into netlink. More clever approaches like dumping the stats per-CPU for each CPU individually to see where the packets get consumed can be implemented in the future.
A trimmed example from a real (but recently booted system):
$ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \ --dump page-pool-stats-get [{'info': {'id': 19, 'ifindex': 2}, 'alloc-empty': 48, 'alloc-fast': 3024, 'alloc-refill': 0, 'alloc-slow': 48, 'alloc-slow-high-order': 0, 'alloc-waive': 0, 'recycle-cache-full': 0, 'recycle-cached': 0, 'recycle-released-refcnt': 0, 'recycle-ring': 0, 'recycle-ring-full': 0}, {'info': {'id': 18, 'ifindex': 2}, 'alloc-empty': 66, 'alloc-fast': 11811, 'alloc-refill': 35, 'alloc-slow': 66, 'alloc-slow-high-order': 0, 'alloc-waive': 0, 'recycle-cache-full': 1145, 'recycle-cached': 6541, 'recycle-released-refcnt': 0, 'recycle-ring': 1275, 'recycle-ring-full': 0}, {'info': {'id': 17, 'ifindex': 2}, 'alloc-empty': 73, 'alloc-fast': 62099, 'alloc-refill': 413, ...
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
69cb4952 | 27-Nov-2023 |
Jakub Kicinski <kuba@kernel.org> |
net: page_pool: report when page pool was destroyed
Report when page pool was destroyed. Together with the inflight / memory use reporting this can serve as a replacement for the warning about leake
net: page_pool: report when page pool was destroyed
Report when page pool was destroyed. Together with the inflight / memory use reporting this can serve as a replacement for the warning about leaked page pools we currently print to dmesg.
Example output for a fake leaked page pool using some hacks in netdevsim (one "live" pool, and one "leaked" on the same dev):
$ ./cli.py --no-schema --spec netlink/specs/netdev.yaml \ --dump page-pool-get [{'id': 2, 'ifindex': 3}, {'id': 1, 'ifindex': 3, 'destroyed': 133, 'inflight': 1}]
Tested-by: Dragos Tatulea <dtatulea@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
02b3de80 | 27-Nov-2023 |
Jakub Kicinski <kuba@kernel.org> |
net: page_pool: stash the NAPI ID for easier access
To avoid any issues with race conditions on accessing napi and having to think about the lifetime of NAPI objects in netlink GET - stash the napi_
net: page_pool: stash the NAPI ID for easier access
To avoid any issues with race conditions on accessing napi and having to think about the lifetime of NAPI objects in netlink GET - stash the napi_id to which page pool was linked at creation time.
Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
083772c9 | 27-Nov-2023 |
Jakub Kicinski <kuba@kernel.org> |
net: page_pool: record pools per netdev
Link the page pools with netdevs. This needs to be netns compatible so we have two options. Either we record the pools per netns and have to worry about movin
net: page_pool: record pools per netdev
Link the page pools with netdevs. This needs to be netns compatible so we have two options. Either we record the pools per netns and have to worry about moving them as the netdev gets moved. Or we record them directly on the netdev so they move with the netdev without any extra work.
Implement the latter option. Since pools may outlast netdev we need a place to store orphans. In time honored tradition use loopback for this purpose.
Reviewed-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
show more ...
|
2da0cac1 | 21-Nov-2023 |
Jakub Kicinski <kuba@kernel.org> |
net: page_pool: avoid touching slow on the fastpath
To fully benefit from previous commit add one byte of state in the first cache line recording if we need to look at the slow part.
The packing is
net: page_pool: avoid touching slow on the fastpath
To fully benefit from previous commit add one byte of state in the first cache line recording if we need to look at the slow part.
The packing isn't all that impressive right now, we create a 7B hole. I'm expecting Olek's rework will reshuffle this, anyway.
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://lore.kernel.org/r/20231121000048.789613-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
8ab32fa1 | 20-Oct-2023 |
Yunsheng Lin <linyunsheng@huawei.com> |
page_pool: update document about fragment API
As more drivers begin to use the fragment API, update the document about how to decide which API to use for the driver author.
Signed-off-by: Yunsheng
page_pool: update document about fragment API
As more drivers begin to use the fragment API, update the document about how to decide which API to use for the driver author.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> CC: Lorenzo Bianconi <lorenzo@kernel.org> CC: Alexander Duyck <alexander.duyck@gmail.com> CC: Liang Chen <liangchen.linux@gmail.com> CC: Alexander Lobakin <aleksander.lobakin@intel.com> CC: Dima Tisnek <dimaqq@gmail.com> Link: https://lore.kernel.org/r/20231020095952.11055-5-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|
de97502e | 20-Oct-2023 |
Yunsheng Lin <linyunsheng@huawei.com> |
page_pool: introduce page_pool_alloc() API
Currently page pool supports the below use cases: use case 1: allocate page without page splitting using page_pool_alloc_pages() API if the dri
page_pool: introduce page_pool_alloc() API
Currently page pool supports the below use cases: use case 1: allocate page without page splitting using page_pool_alloc_pages() API if the driver knows that the memory it need is always bigger than half of the page allocated from page pool. use case 2: allocate page frag with page splitting using page_pool_alloc_frag() API if the driver knows that the memory it need is always smaller than or equal to the half of the page allocated from page pool.
There is emerging use case [1] & [2] that is a mix of the above two case: the driver doesn't know the size of memory it need beforehand, so the driver may use something like below to allocate memory with least memory utilization and performance penalty:
if (size << 1 > max_size) page = page_pool_alloc_pages(); else page = page_pool_alloc_frag();
To avoid the driver doing something like above, add the page_pool_alloc() API to support the above use case, and update the true size of memory that is acctually allocated by updating '*size' back to the driver in order to avoid exacerbating truesize underestimate problem.
Rename page_pool_free() which is used in the destroy process to __page_pool_destroy() to avoid confusion with the newly added API.
1. https://lore.kernel.org/all/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/ 2. https://lore.kernel.org/all/20230526054621.18371-3-liangchen.linux@gmail.com/
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> CC: Lorenzo Bianconi <lorenzo@kernel.org> CC: Alexander Duyck <alexander.duyck@gmail.com> CC: Liang Chen <liangchen.linux@gmail.com> CC: Alexander Lobakin <aleksander.lobakin@intel.com> Link: https://lore.kernel.org/r/20231020095952.11055-4-linyunsheng@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
show more ...
|