.. SPDX-License-Identifier: GPL-2.0

=================
Device Memory TCP
=================
Opportunity
-----------
- Distributed training, where ML accelerators, such as GPUs on different hosts,
  exchange data.

- Distributed raw block storage applications transfer large amounts of data with
  remote SSDs. Much of this data does not require host processing.
Typically the Device-to-Device data transfers in the network are implemented as
the following low-level operations: Device-to-Host copy, Host-to-Host network
transfer, and Host-to-Device copy.
Devmem TCP optimizes this use case by implementing socket APIs that enable
the user to receive incoming network packets directly into device memory.

Packet payloads go directly from the NIC to device memory.

Packet headers go to host memory and are processed by the TCP/IP stack
normally. The NIC must support header split to achieve this.
Advantages:

- Alleviate host memory bandwidth pressure, compared to existing
  network-transfer + device-copy semantics.

- Alleviate PCIe bandwidth pressure, by limiting data transfer to the lowest
  level of the PCIe tree, compared to the traditional path which sends data
  through the root complex.
More Info
---------
  slides, video
    https://netdevconf.org/0x17/sessions/talk/device-memory-tcp.html

  patchset
    [PATCH net-next v24 00/13] Device Memory TCP
    https://lore.kernel.org/netdev/20240831004313.3713467-1-almasrymina@google.com/
Example
-------
./tools/testing/selftests/drivers/net/hw/ncdevmem.c demonstrates the API. Below
is an example of the RX path of this API.
NIC Setup
---------
Header split is used to split incoming packets into a header buffer in host
memory, and a payload buffer in device memory.

Flow steering & RSS are used to ensure that only flows targeting devmem land on
an RX queue bound to devmem.
Enable header split & flow steering::

    # enable header split
    ethtool -G eth1 tcp-data-split on

    # enable flow steering
    ethtool -K eth1 ntuple on
Configure RSS to steer all traffic away from the target RX queue (queue 15 in
this example)::

    ethtool --set-rxfh-indir eth1 equal 15
The user must bind a dmabuf to any number of RX queues on a given NIC using
the netlink API::

    /* Bind dmabuf to NIC RX queue 15 */
    struct netdev_queue *queues;
    queues = malloc(sizeof(*queues) * 1);

    queues[0]._present.type = 1;
    queues[0]._present.idx = 1;
    queues[0].type = NETDEV_QUEUE_TYPE_RX;
    queues[0].idx = 15;

    *ys = ynl_sock_create(&ynl_netdev_family, &yerr);

    req = netdev_bind_rx_req_alloc();
    netdev_bind_rx_req_set_ifindex(req, 1 /* ifindex */);
    netdev_bind_rx_req_set_fd(req, dmabuf_fd);
    __netdev_bind_rx_req_set_queues(req, queues, n_queue_index);

    rsp = netdev_bind_rx(*ys, req);

    dmabuf_id = rsp->dmabuf_id;
The netlink API returns a dmabuf_id: a unique ID that refers to this dmabuf
that has been bound.
Note that any reasonably well-behaved dmabuf from any exporter should work with
devmem TCP, even if the dmabuf is not actually backed by devmem. An example of
this is udmabuf, which wraps user memory (non-devmem) in a dmabuf.
Socket Setup
------------
The socket must be flow steered to the dmabuf bound RX queue::

    ethtool -N eth1 flow-type tcp4 ... queue 15
Receiving data
--------------
The user application must signal to the kernel that it is capable of receiving
devmem data by passing the MSG_SOCK_DEVMEM flag to recvmsg::

    ret = recvmsg(fd, &msg, MSG_SOCK_DEVMEM);
Devmem data is received directly into the dmabuf bound to the NIC in 'NIC
Setup', and the kernel signals such to the user via the SCM_DEVMEM_* cmsgs::
    for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
        if (cm->cmsg_level != SOL_SOCKET ||
            (cm->cmsg_type != SCM_DEVMEM_DMABUF &&
             cm->cmsg_type != SCM_DEVMEM_LINEAR))
            continue;

        dmabuf_cmsg = (struct dmabuf_cmsg *)CMSG_DATA(cm);

        if (cm->cmsg_type == SCM_DEVMEM_DMABUF) {
            /* Frag landed in dmabuf.
             *
             * dmabuf_cmsg->dmabuf_id is the dmabuf the
             * frag landed on.
             *
             * dmabuf_cmsg->frag_offset is the offset into
             * the dmabuf where the frag starts.
             *
             * dmabuf_cmsg->frag_size is the size of the
             * frag.
             *
             * dmabuf_cmsg->frag_token is a token used to
             * refer to this frag for later freeing.
             */

            struct dmabuf_token token;

            token.token_start = dmabuf_cmsg->frag_token;
            token.token_count = 1;
            continue;
        }

        if (cm->cmsg_type == SCM_DEVMEM_LINEAR)
            /* Frag landed in linear buffer.
             *
             * dmabuf_cmsg->frag_size is the size of the
             * frag.
             */
            continue;
    }
Applications may receive 2 cmsgs:

- SCM_DEVMEM_DMABUF: this indicates the fragment landed in the dmabuf indicated
  by dmabuf_id.

- SCM_DEVMEM_LINEAR: this indicates the fragment landed in the linear buffer.
  This typically happens when the NIC is unable to split the packet at the
  header boundary, such that part (or all) of the payload landed in host
  memory.

Applications may receive no SCM_DEVMEM_* cmsgs. That indicates non-devmem,
regular TCP data that landed on an RX queue not bound to a dmabuf.
Freeing frags
-------------
Frags received via SCM_DEVMEM_DMABUF are pinned by the kernel while the user
processes the frag. The user must return the frag to the kernel via
SO_DEVMEM_DONTNEED::

    ret = setsockopt(client_fd, SOL_SOCKET, SO_DEVMEM_DONTNEED, &token,
                     sizeof(token));
The user must ensure the tokens are returned to the kernel in a timely manner.
Failure to do so will exhaust the limited dmabuf that is bound to the RX queue
and will lead to packet drops.
The user must pass no more than 128 tokens, with no more than 1024 total frags
among the token->token_count across all the tokens. If the user provides more
than 1024 frags, the kernel will free up to 1024 frags and return early.
Unreadable skbs
---------------
Devmem payloads are inaccessible to the kernel processing the packets. This
results in a few quirks for payloads of devmem skbs:
- Loopback is not functional. Loopback relies on copying the payload, which is
  not possible with devmem skbs.

- Software checksum calculation fails.

- tcpdump and BPF can't access devmem packet payloads.
ncdevmem is a devmem TCP netcat. It works very similarly to netcat, but
receives data directly into a udmabuf.

To run ncdevmem, you need to run it on a server on the machine under test, and
you need to run netcat on a peer to provide the TX data.
For example, you can launch ncdevmem on the server by::

    ncdevmem -s <server IP> -c <client IP> -f eth1 -d 3 -n 0000:06:00.0 -l \
             -p 5201 -v 7
On client side, use regular netcat to send TX data to ncdevmem process
above::

    yes $(echo -e \\x01\\x02\\x03\\x04\\x05\\x06) | \
            tr \\n \\0 | head -c 5G | nc <server IP> 5201 -p 5201