a15f7c96 | 03-May-2024 | John Baldwin <jhb@FreeBSD.org>

nvmft: The in-kernel NVMe over Fabrics controller
This is the server (target in SCSI terms) for NVMe over Fabrics. Userland is responsible for accepting a new queue pair and receiving the initial Connect command before handing the queue pair off via an ioctl to this CTL frontend.
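As a rough illustration of that userland handoff, the sketch below hands an accepted, Connect-ed queue pair's socket to the kernel with an ioctl. The ioctl name, the handoff structure, and the use of ctl(4)'s /dev/cam/ctl node are all placeholders, not the real ABI:

    #include <sys/ioctl.h>

    #include <err.h>
    #include <fcntl.h>

    /* Illustrative only: the real handoff structure and ioctl are not shown here. */
    struct nvmft_handoff_sketch {
        int     qp_fd;          /* connected socket for the admin queue */
        int     trtype;         /* transport type, e.g. TCP */
        /* ...parameters negotiated during the Connect exchange... */
    };
    #define NVMFT_HANDOFF_SKETCH _IOW('x', 0, struct nvmft_handoff_sketch) /* placeholder */

    static void
    handoff_to_kernel(int qp_fd, int trtype)
    {
        struct nvmft_handoff_sketch h = { .qp_fd = qp_fd, .trtype = trtype };
        int fd;

        /* /dev/cam/ctl is ctl(4)'s node; whether the handoff uses it is an assumption. */
        fd = open("/dev/cam/ctl", O_RDWR);
        if (fd == -1 || ioctl(fd, NVMFT_HANDOFF_SKETCH, &h) == -1)
            err(1, "queue pair handoff failed");
    }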
This frontend exposes CTL LUNs as NVMe namespaces to remote hosts. Users can add LUNs to CTL that can be shared via either iSCSI or NVMeoF.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44726

a1eda741 | 03-May-2024 | John Baldwin <jhb@FreeBSD.org>

nvmf: The in-kernel NVMe over Fabrics host
This is the client (initiator in SCSI terms) for NVMe over Fabrics. Userland is responsible for creating a set of queue pairs and then handing them off via an ioctl to this driver, e.g. via the 'connect' command from nvmecontrol(8). An nvmeX new-bus device is created at the top-level to represent the remote controller similar to PCI nvmeX devices for PCI-express controllers.
As with nvme(4), namespace devices named /dev/nvmeXnsY are created and pass-through commands can be submitted to either the namespace devices or the controller device. For example, 'nvmecontrol identify nvmeX' works for a remote Fabrics controller the same as for a PCI-express controller.
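For example, a userland program can issue an Identify Controller command through the existing nvme(4) pass-through ioctl regardless of whether nvmeX is PCIe or Fabrics backed. A minimal sketch, roughly what nvmecontrol(8) does for 'identify'; the device path and CNS value are assumptions:

    #include <sys/param.h>
    #include <sys/endian.h>
    #include <sys/ioctl.h>
    #include <dev/nvme/nvme.h>

    #include <err.h>
    #include <fcntl.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main(void)
    {
        struct nvme_pt_command pt;
        void *buf;
        int fd;

        /* Works the same whether nvme0 is PCIe or a Fabrics association. */
        fd = open("/dev/nvme0", O_RDWR);
        if (fd == -1)
            err(1, "open");

        buf = calloc(1, 4096);          /* Identify data is 4KB */
        if (buf == NULL)
            err(1, "calloc");

        memset(&pt, 0, sizeof(pt));
        pt.cmd.opc = NVME_OPC_IDENTIFY;
        pt.cmd.cdw10 = htole32(1);      /* CNS 1: Identify Controller */
        pt.buf = buf;
        pt.len = 4096;
        pt.is_read = 1;                 /* data flows controller -> host */

        if (ioctl(fd, NVME_PASSTHROUGH_CMD, &pt) == -1)
            err(1, "NVME_PASSTHROUGH_CMD");
        return (0);
    }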
nvmf exports remote namespaces via nda(4) devices using the new NVMF CAM transport. nvmf does not support nvd(4), only nda(4).
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44714

59144db3 | 03-May-2024 | John Baldwin <jhb@FreeBSD.org>

nvmf_tcp: Add a TCP transport for NVMe over Fabrics
Structurally this is very similar to the TCP transport for iSCSI (icl_soft.c). One key difference is that NVMeoF transports use a more abstract interface working with NVMe commands rather than transport PDUs. Thus, the data transfer for a given command is managed entirely in the transport backend.
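One way to picture that abstraction is a per-transport operations vector keyed on capsules rather than PDUs. The shape below is purely illustrative; the member names are assumptions, not the driver's actual structure:

    #include <sys/types.h>
    #include <stdbool.h>

    /* Illustrative only: an abstract transport interface keyed on capsules. */
    struct nvmf_qpair;
    struct nvmf_capsule;
    struct memdesc;
    struct mbuf;

    struct nvmf_transport_ops_sketch {
        struct nvmf_qpair *(*allocate_qpair)(bool controller, const void *params);
        void    (*free_qpair)(struct nvmf_qpair *qp);
        int     (*transmit_capsule)(struct nvmf_capsule *nc);
        /* Data movement for a command is owned by the transport: */
        int     (*receive_controller_data)(struct nvmf_capsule *nc,
                    uint32_t data_offset, struct memdesc *mem, size_t len);
        u_int   (*send_controller_data)(struct nvmf_capsule *nc,
                    struct mbuf *m, size_t len);
    };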
Similar to icl_soft.c, separate kthreads are used to handle transmit and receive for each queue pair. On the transmit side, when a capsule is transmitted by an upper layer, it is placed on a queue for processing by the transmit thread. The transmit thread converts command or response capsules into suitable TCP PDUs, where each PDU is described by an mbuf chain that is then queued to the backing socket's send buffer. Command capsules can embed data along with the NVMe command.
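A skeletal version of that transmit loop, assuming a queue pair with a mutex-protected capsule queue; all of the structure and helper names here are invented stand-ins, not the driver's real types:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/condvar.h>
    #include <sys/kthread.h>
    #include <sys/lock.h>
    #include <sys/mbuf.h>
    #include <sys/mutex.h>
    #include <sys/proc.h>
    #include <sys/queue.h>
    #include <sys/socket.h>
    #include <sys/socketvar.h>

    struct capsule_sketch {
        STAILQ_ENTRY(capsule_sketch) link;
        /* NVMe SQE/CQE plus optional in-capsule data would live here. */
    };

    struct tcp_qpair_sketch {
        struct mtx      lock;
        struct cv       tx_cv;
        struct cv       rx_cv;
        bool            shutdown;
        struct socket   *so;
        STAILQ_HEAD(, capsule_sketch) tx_capsules;
    };

    /* Convert one capsule into a PDU described by an mbuf chain. */
    static struct mbuf *
    capsule_to_pdu_sketch(struct capsule_sketch *nc __unused)
    {
        /* Real code would build the PDU header plus in-capsule data here. */
        return (m_gethdr(M_WAITOK, MT_DATA));
    }

    /* Transmit kthread: drain queued capsules into the socket send buffer. */
    static void
    tx_thread_sketch(void *arg)
    {
        struct tcp_qpair_sketch *qp = arg;
        struct capsule_sketch *nc;
        struct mbuf *m;

        mtx_lock(&qp->lock);
        while (!qp->shutdown) {
            nc = STAILQ_FIRST(&qp->tx_capsules);
            if (nc == NULL) {
                cv_wait(&qp->tx_cv, &qp->lock);
                continue;
            }
            STAILQ_REMOVE_HEAD(&qp->tx_capsules, link);
            mtx_unlock(&qp->lock);

            m = capsule_to_pdu_sketch(nc);
            sosend(qp->so, NULL, NULL, m, NULL, 0, curthread);

            mtx_lock(&qp->lock);
        }
        mtx_unlock(&qp->lock);
        kthread_exit();
    }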
On the receive side, a socket upcall notifies the receive kthread when more data arrives. Once enough data has arrived for a PDU, the PDU is handled synchronously in the kthread. R2T and data-related PDUs are handled internally, with callbacks invoked if a data transfer encounters an error or once the data transfer has completed. Received capsule PDUs invoke the upper layer's capsule_received callback.
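The wakeup path can be as small as a receive upcall that nudges that kthread. A sketch, reusing the hypothetical tcp_qpair_sketch from above together with the socket layer's soupcall_set():

    /* Receive upcall: just wake the receive kthread; all parsing happens there. */
    static int
    rx_upcall_sketch(struct socket *so __unused, void *arg, int waitflag __unused)
    {
        struct tcp_qpair_sketch *qp = arg;

        cv_signal(&qp->rx_cv);
        return (SU_OK);
    }

    static void
    rx_upcall_register_sketch(struct tcp_qpair_sketch *qp)
    {
        SOCKBUF_LOCK(&qp->so->so_rcv);
        soupcall_set(qp->so, SO_RCV, rx_upcall_sketch, qp);
        SOCKBUF_UNLOCK(&qp->so->so_rcv);
    }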
struct nvmf_tcp_command_buffer manages a TCP command buffer for data transfers that do not use in-capsule data, as described in the NVMeoF spec. Data-related PDUs such as R2T, C2H, and H2C are associated with a command buffer, except in the case of the send_controller_data transport method, which simply constructs one or more C2H PDUs from the caller's mbuf chain.
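A guess at the rough shape of that structure; every field name below is illustrative rather than the actual definition:

    #include <sys/types.h>
    #include <sys/memdesc.h>
    #include <sys/queue.h>

    typedef void tcp_io_complete_sketch_t(void *arg, size_t xfered, int error);

    /* Illustrative layout only; the real struct nvmf_tcp_command_buffer differs. */
    struct nvmf_tcp_command_buffer_sketch {
        struct memdesc  mem;            /* host memory backing the transfer */
        uint32_t        data_offset;    /* offset within the command's data */
        uint32_t        data_len;       /* total bytes to move */
        uint32_t        data_xfered;    /* bytes moved so far */
        uint16_t        cid;            /* NVMe command identifier */
        uint16_t        ttag;           /* transfer tag carried by R2T/H2C */
        tcp_io_complete_sketch_t *io_complete;  /* error or completion callback */
        void            *io_complete_arg;
        TAILQ_ENTRY(nvmf_tcp_command_buffer_sketch) link;
    };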
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44712

aa1207ea | 03-May-2024 | John Baldwin <jhb@FreeBSD.org>

nvmf: Add infrastructure kernel module for NVMe over Fabrics
nvmf_transport.ko provides routines for managing NVMeoF queue pairs and capsules. It provides a glue layer between transports (such as TCP or RDMA) and an NVMeoF host (initiator) and controller (target).
Unlike the synchronous API exposed to the host and controller by libnvmf, the kernel's transport layer uses an asynchronous API built on callbacks. Upper layers provide callbacks on queue pairs that are invoked for transport errors (error_cb) or anytime a capsule is received (receive_cb).
Data transfers for a command are usually associated with a callback that is invoked once a transfer has finished either due to an error or successful completion.
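In sketch form, the asynchronous surface might look like the typedefs and allocation call below. Only the error_cb/receive_cb roles come from the description above; the callback shapes and the allocation signature are assumptions:

    #include <sys/types.h>
    #include <stdbool.h>

    struct nvmf_capsule;
    struct nvmf_qpair;

    /* Assumed callback shapes. */
    typedef void nvmf_qpair_error_sketch_t(void *arg, int error);
    typedef void nvmf_capsule_receive_sketch_t(void *arg, struct nvmf_capsule *nc);
    typedef void nvmf_io_complete_sketch_t(void *arg, size_t xfered, int error);

    /* Hypothetical allocation call wiring the two per-queue-pair callbacks. */
    struct nvmf_qpair *
    nvmf_allocate_qpair_sketch(int trtype, bool controller, const void *params,
        nvmf_qpair_error_sketch_t *error_cb, void *error_cb_arg,
        nvmf_capsule_receive_sketch_t *receive_cb, void *receive_cb_arg);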
For an upper layer that is a host, command capsules are allocated and populated with an NVMe SQE by calling nvmf_allocate_command. A data buffer (described by a struct memdesc) can be associated with a command capsule before it is transmitted via nvmf_capsule_append_data. This function accepts a direction (send vs receive) as well as the data transfer callback. The host then transmits the command via nvmf_transmit_capsule. The host must ensure that the data buffer described by the 'struct memdesc' remains valid until the data transfer callback is called. The queue pair's receive_cb callback should match received response capsules up with previously transmitted commands.
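Pulling that together, the host-side sequence might look like the following. The nvmf_* function names are from the text, but the exact signatures, the completion-callback shape, and the header location are assumptions:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/endian.h>
    #include <sys/malloc.h>
    #include <sys/memdesc.h>
    #include <dev/nvme/nvme.h>
    #include <dev/nvmf/nvmf_transport.h>    /* assumed header location */

    /* Assumed completion-callback shape. */
    static void
    read_done_sketch(void *arg, size_t xfered, int error)
    {
        /* 'buf' (and its memdesc) may only be released from here on. */
    }

    static void
    submit_read_sketch(struct nvmf_qpair *qp, uint32_t nsid, void *buf, size_t len)
    {
        struct nvme_command sqe;
        struct nvmf_capsule *nc;
        struct memdesc mem;

        memset(&sqe, 0, sizeof(sqe));
        sqe.opc = NVME_OPC_READ;
        sqe.nsid = htole32(nsid);
        /* ...SLBA/NLB in cdw10-cdw12 omitted... */

        nc = nvmf_allocate_command(qp, &sqe, M_WAITOK);

        /* 'buf' must stay valid until read_done_sketch() has run. */
        mem = memdesc_vaddr(buf, len);
        nvmf_capsule_append_data(nc, &mem, len, false /* receive, not send */,
            read_done_sketch, buf);

        nvmf_transmit_capsule(nc);
    }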
For the controller, incoming commands are received via the queue pair's receive_cb callback. nvmf_receive_controller_data is used to retrieve any data from a command (e.g. the data for a WRITE command). It can be called multiple times to split the data transfer into smaller sizes. This function accepts an I/O completion callback that is invoked once the data transfer has completed. nvmf_send_controller_data is used to send data to a remote host in response to a command. In this case a callback function is not used but the status is returned synchronously. Finally, the controller can allocate a response capsule via nvmf_allocate_response populated with a supplied CQE and send the response via nvmf_transmit_capsule.
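And the controller side in the same spirit, here for a WRITE command. Again, only the nvmf_* names come from the text; the argument lists and callback shape are assumed:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/malloc.h>
    #include <sys/memdesc.h>
    #include <dev/nvme/nvme.h>
    #include <dev/nvmf/nvmf_transport.h>    /* assumed header location */

    /* Completion callback for pulling WRITE data from the host (shape assumed). */
    static void
    write_data_done_sketch(void *arg, size_t xfered, int error)
    {
        struct nvmf_capsule *nc = arg;
        struct nvmf_capsule *rsp;
        struct nvme_completion cqe;

        memset(&cqe, 0, sizeof(cqe));
        /* ...fill in cid and a status reflecting 'error'... */

        /* Assumed to be keyed on the command capsule (or its queue pair). */
        rsp = nvmf_allocate_response(nc, &cqe, M_WAITOK);
        nvmf_transmit_capsule(rsp);
    }

    static void
    handle_write_sketch(struct nvmf_capsule *nc, struct memdesc *mem, size_t len)
    {
        /*
         * Pull the WRITE data described by the command.  This may be
         * called repeatedly to split a large transfer into smaller pieces.
         */
        nvmf_receive_controller_data(nc, 0 /* data offset */, mem, len,
            write_data_done_sketch, nc);
    }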
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44711