.\" Copyright (c) 2018 Rick Macklem
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd December 20, 2019
.Dt PNFSSERVER 4
.Os
.Sh NAME
.Nm pNFSserver
.Nd NFS Version 4.1 and 4.2 Parallel NFS Protocol Server
.Sh DESCRIPTION
A set of
.Fx
servers may be configured to provide a
.Xr pnfs 4
service.
One
.Fx
system needs to be configured as a MetaData Server (MDS) and
at least one additional
.Fx
system needs to be configured as one or
more Data Servers (DSs).
.Pp
These
.Fx
systems are configured to be NFSv4.1 and NFSv4.2
servers; see
.Xr nfsd 8
and
.Xr exports 5
if you are not familiar with configuring an NFSv4.n server.
All DS(s) and the MDS should support NFSv4.2 as well as NFSv4.1.
Mixing an MDS that supports NFSv4.2 with any DS(s) that do not support
NFSv4.2 will not work correctly.
As such, all DS(s) must be upgraded from
.Fx 12
to
.Fx 13
before upgrading the MDS.
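.Pp
As a rough sketch of the basic NFSv4 server setup that both the MDS and the
DS(s) start from, the
.Xr rc.conf 5
entries below enable the NFS server with NFSv4 support.
These variables are taken from
.Xr nfsv4 4
rather than from this page, so treat them as an assumed starting point and
check them against your release.
.Bd -literal -offset indent
# Assumed baseline for every system in the service (MDS and DSs).
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
.Ed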
.Sh DS server configuration
The DS(s) need to be configured as NFSv4.1 and NFSv4.2 server(s),
with a top level exported
directory used for storage of data files.
This directory must be owned by
.Dq root
and would normally have a mode of
.Dq 700 .
Within this directory there need to be additional directories named
ds0,...,dsN (where N is 19 by default) also owned by
.Dq root
with mode
.Dq 700 .
These are the directories where the data files are stored.
The following command can be run by root when in the top level exported
directory to create these subdirectories.
.Bd -literal -offset indent
jot -w ds 20 0 | xargs mkdir -m 700
.Ed
.sp
Note that
.Dq 20
is the default and can be set to a larger value on the MDS as described below.
.sp
The top level exported directory used for storage of data files must be
exported to the MDS with the
.Dq maproot=root sec=sys
export options so that the MDS can create entries in these subdirectories.
It must also be exported to all pNFS aware clients, but these clients do
not require the
.Dq maproot=root
export option and this directory should be exported to them with the same
options as used by the MDS to export file system(s) to the clients.
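.Pp
A minimal sketch of what the
.Xr exports 5
file on a DS might contain is shown below.
The directory
.Dq /DSstore ,
the MDS host name nfsv4-mds and the client network are hypothetical, and the
client line assumes that the MDS exports its file system(s) to clients with
.Dq sec=sys .
.Bd -literal -offset indent
# The NFSv4 root is the top level directory used for data file storage.
V4: /DSstore -sec=sys
# The MDS needs root access to create entries in ds0,...,dsN.
/DSstore -maproot=root -sec=sys nfsv4-mds
# pNFS aware clients (hypothetical network); no maproot=root needed.
/DSstore -sec=sys -network 192.168.1.0 -mask 255.255.255.0
.Ed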
.Pp
It is possible to have multiple DSs on the same
.Fx
system, but each
of these DSs must have a separate top level exported directory used for storage
of data files and each
of these DSs must be mountable via a separate IP address.
Alias addresses can be set on the DS server system for a network
interface via
.Xr ifconfig 8
to create these different IP addresses.
Multiple DSs on the same server may be useful when data for different file systems
on the MDS is being stored on different file system volumes on the
.Fx
DS system.
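.Pp
As an illustration of the alias addresses mentioned above, the following
.Xr rc.conf 5
lines give one interface a second address so that two DSs on the same system
can be mounted separately.
The interface name em0 and both addresses are hypothetical.
.Bd -literal -offset indent
# Primary address, used to mount the first DS.
ifconfig_em0="inet 192.168.1.10 netmask 255.255.255.0"
# Alias address, used to mount the second DS on the same system.
ifconfig_em0_alias0="inet 192.168.1.11 netmask 255.255.255.255"
.Ed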
.Sh MDS server configuration
The MDS must be a separate
.Fx
system from the
.Fx
DS system(s) and
NFS clients.
It is configured as an NFSv4.1 and NFSv4.2 server with
file system(s) exported to clients.
However, the
.Dq -p
command line argument for
.Xr nfsd 8
is used to indicate that it is running as the MDS for a pNFS server.
.Pp
The DS(s) must all be mounted on the MDS using the following mount options:
.Bd -literal -offset indent
nfsv4,minorversion=2,soft,retrans=2
.Ed
.sp
so that they can be defined as DSs in the
.Dq -p
option.
Normally these mounts would be entered in the
.Xr fstab 5
on the MDS.
For example, if there are four DSs named nfsv4-data[0-3], the
.Xr fstab 5
lines might look like:
.Bd -literal
nfsv4-data0:/ /data0 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data1:/ /data1 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data2:/ /data2 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data3:/ /data3 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
.Ed
.sp
The
.Xr nfsd 8
command line option
.Dq -p
indicates that the NFS server is a pNFS MDS and specifies what
DSs are to be used.
.br
For the above
.Xr fstab 5
example, the
.Xr nfsd 8
nfs_server_flags line in your
.Xr rc.conf 5
might look like:
.Bd -literal
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3"
.Ed
.sp
This example specifies that the data files should be distributed over the
four DSs and File layouts will be issued to pNFS enabled clients.
If issuing Flexible File layouts is desired for this case, setting the sysctl
.Dq vfs.nfsd.default_flexfile
to a non-zero value in your
.Xr sysctl.conf 5
file will make the
.Nm
do that.
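.Pp
For instance, the
.Xr sysctl.conf 5
entry might look like the line below.
The value 1 is just one possible non-zero setting.
.Bd -literal -offset indent
vfs.nfsd.default_flexfile=1
.Ed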
.br
Alternatively, this variant of
.Dq nfs_server_flags
will specify that two way mirroring is to be done, via the
.Dq -m
command line option.
.Bd -literal
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3 -m 2"
.Ed
.sp
With two way mirroring, the data file for each exported file on the MDS
will be stored on two of the DSs.
When mirroring is enabled, the server will always issue Flexible File layouts.
.Pp
It is also possible to specify which DSs are to be used to store data files for
specific exported file systems on the MDS.
For example, if the MDS has exported two file systems
.Dq /export1
and
.Dq /export2
to clients, the following variant of
.Dq nfs_server_flags
will specify that data files for
.Dq /export1
will be stored on nfsv4-data0 and nfsv4-data1, whereas the data files for
.Dq /export2
will be stored on nfsv4-data2 and nfsv4-data3.
.Bd -literal
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0#/export1,nfsv4-data1:/data1#/export1,nfsv4-data2:/data2#/export2,nfsv4-data3:/data3#/export2"
.Ed
.sp
This can be used by system administrators to control where data files are
stored and might be useful for control of storage use.
For this case, it may be convenient to co-locate more than one of the DSs
on the same
.Fx
server, using separate file systems on the DS system
for storage of the respective DS's data files.
If mirroring is desired for this case, the
.Dq -m
option also needs to be specified.
There must be enough DSs assigned to each exported file system on the MDS
to support the level of mirroring.
The above example would be fine for two way mirroring, but four way mirroring
would not work, since there are only two DSs assigned to each exported file
system on the MDS.
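.Pp
Combining the two, a sketch of
.Dq nfs_server_flags
for the
.Dq /export1
and
.Dq /export2
example with two way mirroring might be the following.
It simply appends
.Dq -m 2
to the per file system line shown earlier and has not been taken from a
running system.
.Bd -literal
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0#/export1,nfsv4-data1:/data1#/export1,nfsv4-data2:/data2#/export2,nfsv4-data3:/data3#/export2 -m 2"
.Ed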
.Pp
The number of subdirectories in each DS is defined by the
.Dq vfs.nfs.dsdirsize
sysctl on the MDS.
This value can be increased from the default of 20, but only when the
.Xr nfsd 8
is not running and after the additional ds20,... subdirectories have been
created on all the DSs.
For a service that will store a large number of files this sysctl should be
set much larger, to keep the number of entries in each subdirectory from
getting too large.
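.Pp
When raising the number of subdirectories, the extra directories could be
created on each DS with a loop such as the one below, run by root in the top
level exported directory.
This is only a sketch: the variable names are hypothetical, and the total of
100 is an illustrative value that must match whatever the sysctl is later set
to on the MDS.
.Bd -literal -offset indent
# first: index of the first missing subdirectory (20 with the default).
# total: new number of subdirectories (illustrative value).
first=20
total=100
i=$first
while [ "$i" -lt "$total" ]; do
	mkdir -m 700 "ds$i"
	i=$((i + 1))
done
.Ed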
.Sh Client mounts
Once operational, NFSv4.1 or NFSv4.2
.Fx
client mounts
done with the
.Dq pnfs
option should do I/O directly on the DSs.
The clients mounting the MDS must be running the
.Xr nfscbd 8
daemon for pNFS to work.
Set
.Bd -literal -offset indent
nfscbd_enable="YES"
.Ed
.sp
in the
.Xr rc.conf 5
on these clients.
Non-pNFS aware clients or NFSv3 mounts will do all I/O RPCs on the MDS,
which acts as a proxy for the appropriate DS(s).
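.Pp
A client
.Xr fstab 5
entry for a pNFS mount might look like the line below.
The MDS host name nfsv4-mds and the mount point /mnt/export1 are hypothetical;
the options are those documented in
.Xr mount_nfs 8 .
.Bd -literal
nfsv4-mds:/export1 /mnt/export1 nfs rw,nfsv4,minorversion=2,pnfs 0 0
.Ed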
.Sh Backing up a pNFS service
Since the data is separated from the metadata, the simple way to back up
a pNFS service is to do so from an NFS client that has the service mounted
on it.
If you back up the MDS exported file system(s) on the MDS, you must do it
in such a way that the
.Dq system
namespace extended attributes get backed up.
.Sh Handling of failed mirrored DSs
When a mirrored DS fails, it can be disabled in one of three ways:
.sp
1 - The MDS detects a problem when trying to do proxy
operations on the DS.
This can take a couple of minutes
after the DS failure or network partitioning occurs.
.sp
2 - A pNFS client can report an I/O error that occurred for a DS to the MDS in
the arguments for a LayoutReturn operation.
.sp
3 - The system administrator can run the
.Xr pnfsdskill 8
command on the MDS to disable it.
If the system administrator runs
.Xr pnfsdskill 8
and it fails with ENXIO
(Device not configured), that normally means the DS was already
disabled via #1 or #2.
Since doing this is harmless, once a system administrator knows that
there is a problem with a mirrored DS, running the command is recommended.
.sp
Once a system administrator knows that a mirrored DS has malfunctioned
or has been network partitioned, they should do the following as root/su
on the MDS:
.Bd -literal -offset indent
# pnfsdskill <mounted-on-path-of-DS>
# umount -N <mounted-on-path-of-DS>
.Ed
.sp
Note that the <mounted-on-path-of-DS> must be the exact mounted-on path
string used when the DS was mounted on the MDS.
.Pp
Once the mirrored DS has been disabled, the pNFS service should continue to
function, but file updates will only happen on the DS(s) that have not been disabled.
With two way mirroring, this means that a file with a copy on the disabled DS
is updated only on the other DS of the pair recorded in the
.Dq pnfsd.dsfile
extended attribute for the file on the MDS.
.Pp
The next step is to clear the IP address in the
.Dq pnfsd.dsfile
extended attribute on all files on the MDS for the failed DS.
This is done so that, when the disabled DS is repaired and brought back online,
the data files on this DS will not be used, since they may be out of date.
The command that clears the IP address is
.Xr pnfsdsfile 8
with the
.Dq -r
option.
For example:
.Bd -literal
# pnfsdsfile -r nfsv4-data3 yyy.c
yyy.c:	nfsv4-data2.home.rick	ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000	0.0.0.0	ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000
.Ed
.sp
replaces nfsv4-data3 with an IPv4 address of 0.0.0.0, so that nfsv4-data3
will not get used.
.Pp
Normally this will be called within a
.Xr find 1
command for all regular
files in the exported directory tree and must be done on the MDS.
When used with
.Xr find 1 ,
you will probably also want the
.Dq -q
option so that it does not print the results for every file.
If the disabled/repaired DS is nfsv4-data3, the commands done on the MDS
would be:
.Bd -literal
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile -q -r nfsv4-data3 {} \;
.Ed
.sp
There is a problem with the above command if the file found by
.Xr find 1
is renamed or unlinked before the
.Xr pnfsdsfile 8
command is done on it.
This should normally generate an error message.
A simple unlink is harmless
but a link/unlink or rename might result in the file not having been processed
under its new name.
To check that all files have their IP addresses set to 0.0.0.0 these
commands can be used (assuming the
.Xr sh 1
shell):
.Bd -literal
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile {} \; | sed "/nfsv4-data3/!d"
.Ed
.sp
For any line(s) printed, the
.Xr pnfsdsfile 8
command with
.Dq -r
must be done again on the file.
Once this is done, the replaced/repaired DS can be brought back online.
It should have empty ds0,...,dsN directories under the top level exported
directory for storage of data files just like it did when first set up.
Mount it on the MDS exactly as you did before disabling it.
For the nfsv4-data3 example, the command would be:
.Bd -literal
# mount -t nfs -o nfsv4,minorversion=2,soft,retrans=2 nfsv4-data3:/ /data3
.Ed
.sp
Then restart the nfsd to re-enable the DS.
.Bd -literal
# /etc/rc.d/nfsd restart
.Ed
.sp
Now, new files can be stored on nfsv4-data3,
but files with the IP address zeroed out on the MDS will not yet use the
repaired DS (nfsv4-data3).
The next step is to go through the exported file tree on the MDS and,
for each of the
files with an IPv4 address of 0.0.0.0 in its extended attribute, copy the file
data to the repaired DS and re-enable use of this mirror for it.
The command for copying the file data for one MDS file is
.Xr pnfsdscopymr 8
and it will also normally be used within a
.Xr find 1
command.
For the example case, the commands on the MDS would be:
.Bd -literal
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdscopymr -r /data3 {} \;
.Ed
.sp
When this completes, the recovery should be complete or at least nearly so.
As noted above, if a link/unlink or rename occurs on a file name while the
above
.Xr find 1
is in progress, the file may not get copied.
To check for any file(s) not yet copied, the commands are:
.Bd -literal
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile {} \; | sed "/0\.0\.0\.0/!d"
.Ed
.sp
If this command prints out any file name(s), these files must
have the
.Xr pnfsdscopymr 8
command done on them to complete the recovery.
.Bd -literal
# pnfsdscopymr -r /data3 <file-path-reported>
.Ed
.sp
If this command fails with the error
.br
.Dq pnfsdscopymr: Copymr failed for file <path>: Device not configured
.br
repeatedly, this may be caused by a Read/Write layout that has not
been returned.
The only way to get rid of such a layout is to restart the
.Xr nfsd 8 .
.sp
All of these commands are designed to be
done while the pNFS service is running and can be re-run safely.
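.Pp
Because the commands can be re-run, the check and copy steps could be
combined so that only the files still lacking a mirror are retried.
The following is a sketch, not a tested procedure: the function name
zeroed_files is hypothetical, and it assumes that
.Xr pnfsdsfile 8
prints one line per file in the form shown in the earlier example and that
file names contain no whitespace.
.Bd -literal -offset indent
# Read pnfsdsfile output on stdin and print the name of each file
# that still has a 0.0.0.0 entry (hypothetical helper).
zeroed_files() {
	awk '{
		for (i = 2; i <= NF; i++)
			if ($i == "0.0.0.0") {
				sub(/:$/, "", $1)
				print $1
				next
			}
	}'
}
.Ed
.Pp
On the MDS, in the top level exported directory, it might be used with the
example DS like this:
.Bd -literal -offset indent
# find . -type f -exec pnfsdsfile {} ';' | zeroed_files |
    while read -r f; do pnfsdscopymr -r /data3 "$f"; done
.Ed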
.Pp
For a more detailed discussion of the setup and management of a pNFS service
see:
.Bd -literal -offset indent
https://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
.Ed
.Sh SEE ALSO
.Xr nfsv4 4 ,
.Xr pnfs 4 ,
.Xr exports 5 ,
.Xr fstab 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr nfscbd 8 ,
.Xr nfsd 8 ,
.Xr nfsuserd 8 ,
.Xr pnfsdscopymr 8 ,
.Xr pnfsdsfile 8 ,
.Xr pnfsdskill 8
.Sh HISTORY
The
.Nm
service first appeared in
.Fx 12.0 .
.Sh BUGS
Since the MDS cannot be mirrored, it is a single point of failure just
as a non-pNFS server is.
For non-mirrored configurations, all
.Fx
systems used in the service
are single points of failure.
444are single points of failure.
445