===================
Setting up NFS/RDMA
===================

:Author:
  NetApp and Open Grid Computing (May 29, 2008)

.. warning::
  This document is probably obsolete.

Overview
========

This document describes how to install and set up the Linux NFS/RDMA client
and server software.

The NFS/RDMA client was first included in Linux 2.6.24. The NFS/RDMA server
was first included in the following release, Linux 2.6.25.

In our testing, we have obtained excellent performance results (full 10Gbit
wire bandwidth at minimal client CPU) under many workloads. The code passes
the full Connectathon test suite and operates over both InfiniBand and iWARP
RDMA adapters.

Getting Help
============

If you get stuck, you can ask questions on the
nfs-rdma-devel@lists.sourceforge.net mailing list.

Installation
============

These instructions are a step-by-step guide to building a machine for
use with NFS/RDMA.

- Install an RDMA device

  Any device supported by the drivers in drivers/infiniband/hw is acceptable.

  Testing has been performed using several Mellanox-based IB cards, the
  Ammasso AMS1100 iWARP adapter, and the Chelsio cxgb3 iWARP adapter.

- Install a Linux distribution and tools

  The first kernel release to contain both the NFS/RDMA client and server was
  Linux 2.6.25. Therefore, a distribution compatible with this and subsequent
  Linux kernel releases should be installed.

  The procedures described in this document have been tested with
  distributions from Red Hat's Fedora Project (http://fedora.redhat.com/).

- Install nfs-utils-1.1.2 or greater on the client

  An NFS/RDMA mount point can be obtained by using the mount.nfs command in
  nfs-utils-1.1.2 or greater (nfs-utils-1.1.1 was the first nfs-utils
  version with support for NFS/RDMA mounts, but for various reasons we
  recommend using nfs-utils-1.1.2 or greater). To see which version of
  mount.nfs you are using, type:

  .. code-block:: sh

    $ /sbin/mount.nfs -V

  If the version is less than 1.1.2 or the command does not exist,
  you should install the latest version of nfs-utils.

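  As a rough sketch of that check, the helper below compares a version
  string against 1.1.2 using GNU ``sort -V``. The function name
  ``version_ge`` is hypothetical, and extracting the version with ``grep``
  assumes that ``mount.nfs -V`` prints a dotted version number somewhere in
  its output:

  .. code-block:: sh

    # Hypothetical helper: succeeds if version $1 >= version $2.
    # Relies on "sort -V" (version sort) from GNU coreutils.
    version_ge() {
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    }

    # Assumption: the -V output contains a dotted version such as 1.1.2.
    installed=$(/sbin/mount.nfs -V 2>/dev/null | grep -Eo '[0-9]+(\.[0-9]+)+' | head -n 1)
    if [ -n "$installed" ] && version_ge "$installed" 1.1.2; then
        echo "mount.nfs $installed is recent enough"
    else
        echo "mount.nfs is missing or older than 1.1.2"
    fi
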
  Download the latest package from: http://www.kernel.org/pub/linux/utils/nfs

  Uncompress the package and follow the installation instructions.

  If you will not need the idmapper and gssd executables (you do not need
  these to create an NFS/RDMA enabled mount command), the installation
  process can be simplified by disabling these features when running
  configure:

  .. code-block:: sh

    $ ./configure --disable-gss --disable-nfsv4

  To build nfs-utils you will need the tcp_wrappers package installed. For
  more information on this see the package's README and INSTALL files.

  After building the nfs-utils package, there will be a mount.nfs binary in
  the utils/mount directory. This binary can be used to initiate NFS v2, v3,
  or v4 mounts. To initiate a v4 mount, the binary must be called
  mount.nfs4. The standard technique is to create a symlink called
  mount.nfs4 to mount.nfs.

  This mount.nfs binary should be installed at /sbin/mount.nfs as follows:

  .. code-block:: sh

    $ sudo cp utils/mount/mount.nfs /sbin/mount.nfs

  In this location, mount.nfs will be invoked automatically for NFS mounts
  by the system mount command.

  .. note::
    mount.nfs, and therefore nfs-utils-1.1.2 or greater, is only needed
    on the NFS client machine. You do not need this specific version of
    nfs-utils on the server. Furthermore, only the mount.nfs command from
    nfs-utils-1.1.2 is needed on the client.

- Install a Linux kernel with NFS/RDMA

  The NFS/RDMA client and server are both included in the mainline Linux
  kernel version 2.6.25 and later. This and other versions of the Linux
  kernel can be found at: https://www.kernel.org/pub/linux/kernel/

  Download the sources and place them in an appropriate location.

- Configure the RDMA stack

  Make sure your kernel configuration has RDMA support enabled. Under
  Device Drivers -> InfiniBand support, update the kernel configuration
  to enable InfiniBand support [NOTE: the option name is misleading. Enabling
  InfiniBand support is required for all RDMA devices (IB, iWARP, etc.)].

  Enable the appropriate IB HCA support (mlx4, mthca, ehca, ipath, etc.) or
  iWARP adapter support (amso, cxgb3, etc.).

  If you are using InfiniBand, be sure to enable IP-over-InfiniBand support.

- Configure the NFS client and server

  Your kernel configuration must also have NFS file system support and/or
  NFS server support enabled. These and other NFS-related configuration
  options can be found under File Systems -> Network File Systems.

- Build, install, reboot

  The NFS/RDMA code will be enabled automatically if NFS and RDMA
  are turned on. The NFS/RDMA client and server are configured via the hidden
  SUNRPC_XPRT_RDMA config option that depends on SUNRPC and INFINIBAND. The
  value of SUNRPC_XPRT_RDMA will be:

    #. N if either SUNRPC or INFINIBAND is N; in this case the NFS/RDMA client
       and server will not be built

    #. M if both SUNRPC and INFINIBAND are on (M or Y) and at least one is M;
       in this case the NFS/RDMA client and server will be built as modules

    #. Y if both SUNRPC and INFINIBAND are Y; in this case the NFS/RDMA client
       and server will be built into the kernel

  Therefore, if you have followed the steps above and turned on NFS and RDMA,
  the NFS/RDMA client and server will be built.

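  The three rules above amount to a simple tristate computation. A minimal
  sketch, where ``xprt_rdma_value`` is a hypothetical helper (not part of the
  kernel build system) that takes the SUNRPC and INFINIBAND settings in the
  lowercase y, m, or n form used in a .config file:

  .. code-block:: sh

    # Hypothetical helper: print the resulting SUNRPC_XPRT_RDMA value
    # given the values of SUNRPC ($1) and INFINIBAND ($2).
    xprt_rdma_value() {
        case "$1$2" in
            yy)       echo y ;;   # both built in
            ym|my|mm) echo m ;;   # both on, at least one modular
            *)        echo n ;;   # either one is off
        esac
    }

    xprt_rdma_value y m   # prints "m"
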
  Build a new kernel, install it, boot it.

Check RDMA and NFS Setup
========================

Before configuring the NFS/RDMA software, it is a good idea to test
your new kernel to ensure that the kernel is working correctly.
In particular, it is a good idea to verify that the RDMA stack
is functioning as expected and standard NFS over TCP/IP and/or UDP/IP
is working properly.

- Check RDMA Setup

  If you built the RDMA components as modules, load them at
  this time. For example, if you are using a Mellanox Tavor/Sinai/Arbel
  card:

  .. code-block:: sh

    $ modprobe ib_mthca
    $ modprobe ib_ipoib

  If you are using InfiniBand, make sure there is a Subnet Manager (SM)
  running on the network. If your IB switch has an embedded SM, you can
  use it. Otherwise, you will need to run an SM, such as OpenSM, on one
  of your end nodes.

  If an SM is running on your network, you should see the following:

  .. code-block:: sh

    $ cat /sys/class/infiniband/driverX/ports/1/state
    4: ACTIVE

  where driverX is mthca0, ipath5, ehca3, etc.

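  To check every device and port at once, a loop over the same sysfs files
  can help. This is only a sketch: ``list_port_states`` is a hypothetical
  helper, and its optional argument exists only so that it can be pointed at
  a different directory:

  .. code-block:: sh

    # Hypothetical helper: print "<device> port <n>: <state>" for each port.
    # $1 defaults to the sysfs directory used in the example above.
    list_port_states() {
        base=${1:-/sys/class/infiniband}
        for f in "$base"/*/ports/*/state; do
            [ -r "$f" ] || continue        # no devices, or glob did not match
            port=$(basename "$(dirname "$f")")
            dev=$(basename "$(dirname "$(dirname "$(dirname "$f")")")")
            echo "$dev port $port: $(cat "$f")"
        done
    }

    list_port_states
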
  To further test the InfiniBand software stack, use IPoIB (this
  assumes you have two IB hosts named host1 and host2):

  .. code-block:: sh

    host1$ ip link set dev ib0 up
    host1$ ip address add dev ib0 a.b.c.x
    host2$ ip link set dev ib0 up
    host2$ ip address add dev ib0 a.b.c.y
    host1$ ping a.b.c.y
    host2$ ping a.b.c.x

  For other device types, follow the appropriate procedures.

- Check NFS Setup

  For the NFS components enabled above (client and/or server),
  test their functionality over standard Ethernet using TCP/IP or UDP/IP.

NFS/RDMA Setup
==============

We recommend that you use two machines, one to act as the client and
one to act as the server.

One-time configuration:
-----------------------

- On the server system, configure the /etc/exports file and start the NFS/RDMA server.

  Export entries with the following formats have been tested::

    /vol0   192.168.0.47(fsid=0,rw,async,insecure,no_root_squash)
    /vol0   192.168.0.0/255.255.255.0(fsid=0,rw,async,insecure,no_root_squash)

  The IP address(es) is(are) the client's IPoIB address for an InfiniBand
  HCA or the client's iWARP address(es) for an RNIC.

  .. note::
    The "insecure" option must be used because the NFS/RDMA client does
    not use a reserved port.

Each time a machine boots:
--------------------------

- Load and configure the RDMA drivers

  For InfiniBand using a Mellanox adapter:

  .. code-block:: sh

    $ modprobe ib_mthca
    $ modprobe ib_ipoib
    $ ip link set dev ib0 up
    $ ip address add dev ib0 a.b.c.d

  .. note::
    Please use unique addresses for the client and server!

- Start the NFS server

  If the NFS/RDMA server was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
  kernel config), load the RDMA transport module:

  .. code-block:: sh

    $ modprobe svcrdma

  Regardless of how the server was built (module or built-in), start the
  server:

  .. code-block:: sh

    $ /etc/init.d/nfs start

  or

  .. code-block:: sh

    $ service nfs start

  Instruct the server to listen on the RDMA transport:

  .. code-block:: sh

    $ echo rdma 20049 > /proc/fs/nfsd/portlist

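  To confirm that the listener was added, the same file can be read back. A
  sketch, assuming each listener appears there as a "<transport> <port>"
  line; ``has_rdma_listener`` is a hypothetical helper:

  .. code-block:: sh

    # Hypothetical helper: succeeds if the portlist file ($1, defaulting to
    # the nfsd portlist) contains an "rdma 20049" entry.
    has_rdma_listener() {
        grep -q '^rdma 20049$' "${1:-/proc/fs/nfsd/portlist}" 2>/dev/null
    }

    if has_rdma_listener; then
        echo "nfsd is listening on RDMA port 20049"
    else
        echo "no RDMA listener found"
    fi
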
- On the client system

  If the NFS/RDMA client was built as a module (CONFIG_SUNRPC_XPRT_RDMA=m in
  kernel config), load the RDMA client module:

  .. code-block:: sh

    $ modprobe xprtrdma

  Regardless of how the client was built (module or built-in), use this
  command to mount the NFS/RDMA server:

  .. code-block:: sh

    $ mount -o rdma,port=20049 <IPoIB-server-name-or-address>:/<export> /mnt

  To verify that the mount is using RDMA, run "cat /proc/mounts" and check
  the "proto" field for the given mount.

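  That check can also be scripted. A sketch, assuming the mount options
  appear in the fourth whitespace-separated field of each /proc/mounts line;
  ``is_rdma_mount`` is a hypothetical helper:

  .. code-block:: sh

    # Hypothetical helper: succeeds if mount point $1 is an NFS mount whose
    # options include proto=rdma. $2 optionally overrides /proc/mounts.
    is_rdma_mount() {
        awk -v mp="$1" '
            $2 == mp && $3 ~ /^nfs/ && $4 ~ /(^|,)proto=rdma(,|$)/ { found = 1 }
            END { exit !found }
        ' "${2:-/proc/mounts}"
    }

    if is_rdma_mount /mnt; then
        echo "/mnt is mounted over RDMA"
    else
        echo "/mnt is not an RDMA mount"
    fi
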
  Congratulations! You're using NFS/RDMA!