===================
Setting up NFS/RDMA
===================

:Author:
  NetApp and Open Grid Computing (May 29, 2008)

.. warning::
  This document is probably obsolete.

Overview
========

This document describes how to install and set up the Linux NFS/RDMA client
and server software.

The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
was first included in the following release, Linux 2.6.25.

In our testing, we have obtained excellent performance results (full 10Gbit
wire bandwidth at minimal client CPU) under many workloads. The code passes
the full Connectathon test suite and operates over both InfiniBand and iWARP
RDMA adapters.

Getting Help
============

If you get stuck, you can ask questions on the
nfs-rdma-devel@lists.sourceforge.net mailing list.

Installation
============

These instructions are a step-by-step guide to building a machine for
use with NFS/RDMA.

- Install an RDMA device

  Any device supported by the drivers in drivers/infiniband/hw is acceptable.

  Testing has been performed using several Mellanox-based IB cards, the
  Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.

- Install a Linux distribution and tools

  The first kernel release to contain both the NFS/RDMA client and server was
  Linux 2.6.25. Therefore, a distribution compatible with this and subsequent
  Linux kernel releases should be installed.

  The procedures described in this document have been tested with
  distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).

- Install nfs-utils-1.1.2 or greater on the client

  An NFS/RDMA mount point can be obtained by using the mount.nfs command in
  nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
  version with support for NFS/RDMA mounts, but for various reasons we
  recommend using nfs-utils-1.1.2 or greater). To see which version of
  mount.nfs you are using, type:

  .. code-block:: sh

    $ /sbin/mount.nfs -V

  If the version is less than 1.1.2 or the command does not exist,
  you should install the latest version of nfs-utils.

  Download the latest package from: http://www.kernel.org/pub/linux/utils/nfs

  Uncompress the package and follow the installation instructions.

  If you will not need the idmapper and gssd executables (you do not need
  these to create an NFS/RDMA enabled mount command), the installation
  process can be simplified by disabling these features when running
  configure:

  .. code-block:: sh

    $ ./configure --disable-gss --disable-nfsv4

  To build nfs-utils you will need the tcp_wrappers package installed. For
  more information on this see the package's README and INSTALL files.

  After building the nfs-utils package, there will be a mount.nfs binary in
  the utils/mount directory. This binary can be used to initiate NFS v2, v3,
  or v4 mounts. To initiate a v4 mount, the binary must be called
  mount.nfs4. The standard technique is to create a symlink called
  mount.nfs4 to mount.nfs.

  This mount.nfs binary should be installed at /sbin/mount.nfs as follows:

  .. code-block:: sh

    $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs

  In this location, mount.nfs will be invoked automatically for NFS mounts
  by the system mount command.

  .. note::
    mount.nfs and therefore nfs-utils-1.1.2 or greater is only needed
    on the NFS client machine. You do not need this specific version of
    nfs-utils on the server. Furthermore, only the mount.nfs command from
    nfs-utils-1.1.2 is needed on the client.

- Install a Linux kernel with NFS/RDMA

  The NFS/RDMA client and server are both included in the mainline Linux
  kernel version 2.6.25 and later. This and other versions of the Linux
  kernel can be found at: https://www.kernel.org/pub/linux/kernel/

  Download the sources and place them in an appropriate location.

- Configure the RDMA stack

  Make sure your kernel configuration has RDMA support enabled. Under
  Device Drivers -> InfiniBand support, update the kernel configuration
  to enable InfiniBand support [NOTE: the option name is misleading. Enabling
  InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].

  Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
  iWARP adapter support (amso, cxgb3, etc.).

  If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.

- Configure the NFS client and server

  Your kernel configuration must also have NFS file system support and/or
  NFS server support enabled. These and other NFS related configuration
  options can be found under File Systems -> Network File Systems.

- Build, install, reboot

  The NFS/RDMA code will be enabled automatically if NFS and RDMA
  are turned on. The NFS/RDMA client and server are configured via the hidden
  SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
  value of SUNRPC_XPRT_RDMA will be:

  #. N if either SUNRPC or INFINIBAND is N; in this case the NFS/RDMA client
     and server will not be built

  #. M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M;
     in this case the NFS/RDMA client and server will be built as modules

  #. Y if both SUNRPC and INFINIBAND are Y; in this case the NFS/RDMA client
     and server will be built into the kernel

  Therefore, if you have followed the steps above and turned on NFS and RDMA,
  the NFS/RDMA client and server will be built.

  Build a new kernel, install it, boot it.

Check RDMA and NFS Setup
========================

Before configuring the NFS/RDMA software, it is a good idea to test
your new kernel to ensure that the kernel is working correctly.
In particular, it is a good idea to verify that the RDMA stack
is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
is working properly.

- Check RDMA Setup

  If you built the RDMA components as modules, load them at
  this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
  card:

  .. code-block:: sh

    $ modprobe ib_mthca
    $ modprobe ib_ipoib

  If you are using InfiniBand, make sure there is a Subnet Manager (SM)
  running on the network. If your IB switch has an embedded SM, you can
  use it. Otherwise, you will need to run an SM, such as OpenSM, on one
  of your end nodes.

  If an SM is running on your network, you should see the following:

  .. code-block:: sh

    $ cat /sys/class/infiniband/driverX/ports/1/state
    4: ACTIVE

  where driverX is mthca0, ipath5, ehca3, etc.

  To further test the InfiniBand software stack, use IPoIB (this
  assumes you have two IB hosts named host1 and host2):

  .. code-block:: sh

    host1$ ip link set dev ib0 up
    host1$ ip address add dev ib0 a.b.c.x
    host2$ ip link set dev ib0 up
    host2$ ip address add dev ib0 a.b.c.y
    host1$ ping a.b.c.y
    host2$ ping a.b.c.x

  For other device types, follow the appropriate procedures.

- Check NFS Setup

  For the NFS components enabled above (client and/or server),
  test their functionality over standard Ethernet using TCP/IP or UDP/IP.

NFS/RDMA Setup
==============

We recommend that you use two machines, one to act as the client and
one to act as the server.

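Before starting the configuration, it can help to confirm on both machines
that every RDMA port reports ACTIVE, as in the port-state check under
Check RDMA Setup. The following is a minimal sketch, assuming the same
/sys/class/infiniband/driverX/ports/N/state layout shown there. The function
name check_rdma_ports and its optional directory argument are hypothetical,
introduced only for illustration.

```sh
#!/bin/sh
# Hypothetical helper: print each RDMA port with its state.
# Returns 0 if all ports are ACTIVE, 1 if any port is not ACTIVE,
# and 2 if no port state files were found at all.
check_rdma_ports() {
    root="${1:-/sys/class/infiniband}"   # optional override (assumption)
    result=0
    found=0
    for state_file in "$root"/*/ports/*/state; do
        # An unmatched glob stays literal, so skip unreadable paths.
        [ -r "$state_file" ] || continue
        found=1
        state=$(cat "$state_file")
        port_dir=${state_file%/state}
        rel=${port_dir#"$root"/}
        echo "$rel: $state"
        case "$state" in
            *ACTIVE*) ;;
            *) result=1 ;;
        esac
    done
    if [ "$found" -eq 0 ]; then
        echo "no RDMA ports found under $root"
        return 2
    fi
    return "$result"
}
```

Calling check_rdma_ports with no argument inspects the live system. On
InfiniBand, a non-zero exit status may indicate that no Subnet Manager is
running or that a link is down.
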
One time configuration:
-----------------------

- On the server system, configure the /etc/exports file and start the NFS/RDMA server.

  Exports entries with the following formats have been tested::

    /vol0   192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
    /vol0   192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)

  The IP address(es) is(are) the client's IPoIB address for an InfiniBand
  HCA or the client's iWARP address(es) for an RNIC.

  .. note::
    The "insecure" option must be used because the NFS/RDMA client does
    not use a reserved port.

Each time a machine boots:
--------------------------

- Load and configure the RDMA drivers

  For InfiniBand using a Mellanox adapter:

  .. code-block:: sh

    $ modprobe ib_mthca
    $ modprobe ib_ipoib
    $ ip li set dev ib0 up
    $ ip addr add dev ib0 a.b.c.d

  .. note::
    Please use unique addresses for the client and server!

- Start the NFS server

  If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
  kernel config), load the RDMA transport module:

  .. code-block:: sh

    $ modprobe svcrdma

  Regardless of how the server was built (module or built-in), start the
  server:

  .. code-block:: sh

    $ /etc/init.d/nfs start

  or

  .. code-block:: sh

    $ service nfs start

  Instruct the server to listen on the RDMA transport:

  .. code-block:: sh

    $ echo rdma 20049 > /proc/fs/nfsd/portlist

- On the client system

  If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
  kernel config), load the RDMA client module:

  .. code-block:: sh

    $ modprobe xprtrdma

  Regardless of how the client was built (module or built-in), use this
  command to mount the NFS/RDMA server:

  .. code-block:: sh

    $ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt

  To verify that the mount is using RDMA, run "cat /proc/mounts" and check
  the "proto" field for the given mount.

  Congratulations! You're using NFS/RDMA!
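
The /proc/mounts check can also be scripted. The sketch below assumes the
usual /proc/mounts field order (device, mount point, file system type,
options) and that an RDMA mount lists proto=rdma among its options. The
function name rdma_mounts and its optional file argument are hypothetical,
used here only for illustration.

```sh
#!/bin/sh
# Hypothetical helper: print the mount point of every NFS mount whose
# options contain proto=rdma. Reads /proc/mounts unless a file is given.
rdma_mounts() {
    mounts_file="${1:-/proc/mounts}"   # optional override (assumption)
    # $2 = mount point, $3 = file system type, $4 = mount options
    awk '$3 ~ /^nfs/ && $4 ~ /(^|,)proto=rdma(,|$)/ { print $2 }' "$mounts_file"
}
```

If rdma_mounts prints nothing for a mount you expected to use RDMA, check the
options in /proc/mounts by hand, since the mount may be using another
transport such as TCP.
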