===================
Setting up NFS/RDMA
===================

:Author:
  NetApp and Open Grid Computing (May 29, 2008)

.. warning::
  This document is probably obsolete.

Overview
========

This document describes how to install and set up the Linux NFS/RDMA client
and server software.

The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
was first included in the following release, Linux 2.6.25.

In our testing, we have obtained excellent performance results (full 10 Gbit
wire bandwidth at minimal client CPU) under many workloads. The code passes
the full Connectathon test suite and operates over both InfiniBand and iWARP
RDMA adapters.

Getting Help
============

If you get stuck, you can ask questions on the
nfs-rdma-devel@lists.sourceforge.net mailing list.

Installation
============

These instructions are a step-by-step guide to building a machine for
use with NFS/RDMA.

- Install an RDMA device

  Any device supported by the drivers in drivers/infiniband/hw is acceptable.

  Testing has been performed using several Mellanox-based IB cards, the
  Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.

- Install a Linux distribution and tools

  The first kernel release to contain both the NFS/RDMA client and server was
  Linux 2.6.25. Therefore, a distribution compatible with this and subsequent
  Linux kernel releases should be installed.

  The procedures described in this document have been tested with
  distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).

- Install nfs-utils-1.1.2 or greater on the client

  An NFS/RDMA mount point can be obtained by using the mount.nfs command in
  nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
  version with support for NFS/RDMA mounts, but for various reasons we
  recommend using nfs-utils-1.1.2 or greater). To see which version of
  mount.nfs you are using, type:

  .. code-block:: sh

    $ /sbin/mount.nfs -V

  If the version is less than 1.1.2 or the command does not exist,
  you should install the latest version of nfs-utils.

  Download the latest package from: https://www.kernel.org/pub/linux/utils/nfs

  Uncompress the package and follow the installation instructions.

  If you will not need the idmapper and gssd executables (you do not need
  these to create an NFS/RDMA enabled mount command), the installation
  process can be simplified by disabling these features when running
  configure:

  .. code-block:: sh

    $ ./configure --disable-gss --disable-nfsv4

  To build nfs-utils you will need the tcp_wrappers package installed. For
  more information on this see the package's README and INSTALL files.

  After building the nfs-utils package, there will be a mount.nfs binary in
  the utils/mount directory. This binary can be used to initiate NFS v2, v3,
  or v4 mounts. To initiate a v4 mount, the binary must be called
  mount.nfs4. The standard technique is to create a symlink called
  mount.nfs4 to mount.nfs.

  This mount.nfs binary should be installed at /sbin/mount.nfs as follows:

  .. code-block:: sh

    $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs

  In this location, mount.nfs will be invoked automatically for NFS mounts
  by the system mount command.

  .. note::
    mount.nfs, and therefore nfs-utils-1.1.2 or greater, is only needed
    on the NFS client machine. You do not need this specific version of
    nfs-utils on the server. Furthermore, only the mount.nfs command from
    nfs-utils-1.1.2 is needed on the client.

- Install a Linux kernel with NFS/RDMA

  The NFS/RDMA client and server are both included in the mainline Linux
  kernel version 2.6.25 and later. This and other versions of the Linux
  kernel can be found at: https://www.kernel.org/pub/linux/kernel/

  Download the sources and place them in an appropriate location.

- Configure the RDMA stack

  Make sure your kernel configuration has RDMA support enabled. Under
  Device Drivers -> InfiniBand support, update the kernel configuration
  to enable InfiniBand support [NOTE: the option name is misleading. Enabling
  InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].

  Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
  iWARP adapter support (amso, cxgb3, etc.).

  If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.

- Configure the NFS client and server

  Your kernel configuration must also have NFS file system support and/or
  NFS server support enabled. These and other NFS-related configuration
  options can be found under File Systems -> Network File Systems.

- Build, install, reboot

  The NFS/RDMA code will be enabled automatically if NFS and RDMA
  are turned on. The NFS/RDMA client and server are configured via the hidden
  SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
  value of SUNRPC_XPRT_RDMA will be:

  #. N if either SUNRPC or INFINIBAND is N; in this case the NFS/RDMA client
     and server will not be built

  #. M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M;
     in this case the NFS/RDMA client and server will be built as modules

  #. Y if both SUNRPC and INFINIBAND are Y; in this case the NFS/RDMA client
     and server will be built into the kernel

  Therefore, if you have followed the steps above and turned on NFS and RDMA,
  the NFS/RDMA client and server will be built.

  Build a new kernel, install it, boot it.

Check RDMA and NFS Setup
========================

Before configuring the NFS/RDMA software, it is a good idea to test
your new kernel to ensure that it is working correctly.
In particular, verify that the RDMA stack is functioning as expected
and that standard NFS over TCP/IP and/or UDP/IP is working properly.

- Check RDMA Setup

  If you built the RDMA components as modules, load them at
  this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
  card:

  .. code-block:: sh

    $ modprobe ib_mthca
    $ modprobe ib_ipoib

  If you are using InfiniBand, make sure there is a Subnet Manager (SM)
  running on the network. If your IB switch has an embedded SM, you can
  use it. Otherwise, you will need to run an SM, such as OpenSM, on one
  of your end nodes.

  If an SM is running on your network, you should see the following:

  .. code-block:: sh

    $ cat /sys/class/infiniband/driverX/ports/1/state
    4: ACTIVE

  where driverX is mthca0, ipath5, ehca3, etc.

  To further test the InfiniBand software stack, use IPoIB (this
  assumes you have two IB hosts named host1 and host2):

  .. code-block:: sh

    host1$ ip link set dev ib0 up
    host1$ ip address add dev ib0 a.b.c.x
    host2$ ip link set dev ib0 up
    host2$ ip address add dev ib0 a.b.c.y
    host1$ ping a.b.c.y
    host2$ ping a.b.c.x

  For other device types, follow the appropriate procedures.

- Check NFS Setup

  For the NFS components enabled above (client and/or server),
  test their functionality over standard Ethernet using TCP/IP or UDP/IP.

NFS/RDMA Setup
==============

We recommend that you use two machines, one to act as the client and
one to act as the server.

One-time configuration:
-----------------------

- On the server system, configure the /etc/exports file and start the
  NFS/RDMA server.

  Exports entries with the following formats have been tested::

    /vol0   192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
    /vol0   192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)

  The IP addresses are the client's IPoIB address(es) for an InfiniBand
  HCA or the client's iWARP address(es) for an RNIC.

  .. note::
    The "insecure" option must be used because the NFS/RDMA client does
    not use a reserved port.

Each time a machine boots:
--------------------------

- Load and configure the RDMA drivers

  For InfiniBand using a Mellanox adapter:

  .. code-block:: sh

    $ modprobe ib_mthca
    $ modprobe ib_ipoib
    $ ip link set dev ib0 up
    $ ip address add dev ib0 a.b.c.d

  .. note::
    Please use unique addresses for the client and server!

- Start the NFS server

  If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
  kernel config), load the RDMA transport module:

  .. code-block:: sh

    $ modprobe svcrdma

  Regardless of how the server was built (module or built-in), start the
  server:

  .. code-block:: sh

    $ /etc/init.d/nfs start

  or

  .. code-block:: sh

    $ service nfs start

  Instruct the server to listen on the RDMA transport:

  .. code-block:: sh

    $ echo rdma 20049 > /proc/fs/nfsd/portlist

- On the client system

  If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
  kernel config), load the RDMA client module:

  .. code-block:: sh

    $ modprobe xprtrdma

  Regardless of how the client was built (module or built-in), use this
  command to mount the NFS/RDMA server:

  .. code-block:: sh

    $ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt

  To verify that the mount is using RDMA, run "cat /proc/mounts" and check
  the "proto" field for the given mount.

  Congratulations! You're using NFS/RDMA!
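The final verification step, checking the "proto" field in /proc/mounts, can
also be scripted. The following is a minimal sketch and not part of the
original procedure. The function name ``nfs_mount_proto`` and its optional
second argument (an alternate mounts file, useful for trying the function
against saved text) are hypothetical names introduced here. It assumes the
usual /proc/mounts layout of device, mount point, file system type, and a
comma-separated option list.

```shell
#!/bin/sh
# Hypothetical helper: print the "proto=" value of the NFS mount at a
# given mount point, or nothing if there is no such NFS mount.
#
# Usage: nfs_mount_proto <mount-point> [mounts-file]
# The mounts-file argument defaults to /proc/mounts.
nfs_mount_proto() {
    awk -v mp="$1" '
        # Field 2 is the mount point, field 3 the file system type
        # (nfs or nfs4), and field 4 the comma-separated option list.
        $2 == mp && $3 ~ /^nfs/ {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                if (opts[i] ~ /^proto=/) {
                    # Strip the 6-character "proto=" prefix.
                    print substr(opts[i], 7)
                    exit    # report only the first matching entry
                }
            }
        }
    ' "${2:-/proc/mounts}"
}
```

With the mount command shown above, ``nfs_mount_proto /mnt`` should print
``rdma``. Any other value, or no output at all, suggests the mount is not
using the RDMA transport.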