.\" Copyright (c) 2018 Rick Macklem
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd December 20, 2019
.Dt PNFSSERVER 4
.Os
.Sh NAME
.Nm pNFSserver
.Nd NFS Version 4.1 and 4.2 Parallel NFS Protocol Server
.Sh DESCRIPTION
A set of
.Fx
servers may be configured to provide a
.Xr pnfs 4
service.
One
.Fx
system needs to be configured as a MetaData Server (MDS) and
at least one additional
.Fx
system needs to be configured as one or
more Data Servers (DSs).
.Pp
These
.Fx
systems are configured to be NFSv4.1 and NFSv4.2
servers; see
.Xr nfsd 8
and
.Xr exports 5
if you are not familiar with configuring an NFSv4.n server.
All DS(s) and the MDS should support NFSv4.2 as well as NFSv4.1.
Mixing an MDS that supports NFSv4.2 with any DS(s) that do not support
NFSv4.2 will not work correctly.
As such, all DS(s) must be upgraded from
.Fx 12
to
.Fx 13
before upgrading the MDS.
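.Pp
As a rough sketch, the
.Xr rc.conf 5
entries that enable an NFSv4 server on each DS and on the MDS might look like
the following.
The variable names are those documented in
.Xr rc.conf 5 ;
whether all of them are needed depends on the local setup, so treat this as a
starting point rather than a required configuration.
.Bd -literal -offset indent
# Assumed baseline for an NFSv4 server; adjust for the local site.
mountd_enable="YES"
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
.Ed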
.Sh DS server configuration
The DS(s) need to be configured as NFSv4.1 and NFSv4.2 server(s),
with a top level exported
directory used for storage of data files.
This directory must be owned by
.Dq root
and would normally have a mode of
.Dq 700 .
Within this directory there need to be additional directories named
ds0,...,dsN (where N is 19 by default) also owned by
.Dq root
with mode
.Dq 700 .
These are the directories where the data files are stored.
The following command can be run by root when in the top level exported
directory to create these subdirectories.
.Bd -literal -offset indent
jot -w ds 20 0 | xargs mkdir -m 700
.Ed
.sp
Note that
.Dq 20
is the default and can be set to a larger value on the MDS as shown below.
.sp
The top level exported directory used for storage of data files must be
exported to the MDS with the
.Dq maproot=root sec=sys
export options so that the MDS can create entries in these subdirectories.
It must also be exported to all pNFS aware clients, but these clients do
not require the
.Dq maproot=root
export option.
This directory should be exported to them with the same
options as used by the MDS to export file system(s) to the clients.
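.Pp
For illustration only, an
.Xr exports 5
file on a DS along the following lines would meet the requirements above.
The directory
.Dq /dsdata ,
the host name
.Dq nfsv4-mds
and the client network are hypothetical placeholders, and the client line
assumes the MDS exports its file systems with
.Dq sec=sys .
.Bd -literal -offset indent
# NFSv4 root of the exported tree on this DS (hypothetical path).
V4: /dsdata -sec=sys
# The MDS needs root access to create entries in ds0,...,dsN.
/dsdata -maproot=root -sec=sys nfsv4-mds
# pNFS aware clients; maproot=root is not needed (hypothetical network).
/dsdata -sec=sys -network 192.168.1.0 -mask 255.255.255.0
.Ed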
.Pp
It is possible to have multiple DSs on the same
.Fx
system, but each
of these DSs must have a separate top level exported directory used for storage
of data files and each
of these DSs must be mountable via a separate IP address.
Alias addresses can be set on the DS server system for a network
interface via
.Xr ifconfig 8
to create these different IP addresses.
Multiple DSs on the same server may be useful when data for different file systems
on the MDS are being stored on different file system volumes on the
.Fx
DS system.
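.Pp
A minimal sketch of adding such an alias address is shown below.
The interface name
.Dq em0
and the addresses are assumptions chosen for the example.
.Bd -literal -offset indent
# ifconfig em0 inet 192.168.1.11 netmask 255.255.255.255 alias
.Ed
.Pp
To make the alias persistent, the corresponding
.Xr rc.conf 5
entry might look like:
.Bd -literal -offset indent
ifconfig_em0_alias0="inet 192.168.1.11 netmask 255.255.255.255"
.Ed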
.Sh MDS server configuration
The MDS must be a separate
.Fx
system from the
.Fx
DS system(s) and
NFS clients.
It is configured as an NFSv4.1 and NFSv4.2 server with
file system(s) exported to clients.
However, the
.Dq -p
command line argument for
.Xr nfsd 8
is used to indicate that it is running as the MDS for a pNFS server.
.Pp
The DS(s) must all be mounted on the MDS using the following mount options:
.Bd -literal -offset indent
nfsv4,minorversion=2,soft,retrans=2
.Ed
.sp
so that they can be defined as DSs in the
.Dq -p
option.
Normally these mounts would be entered in the
.Xr fstab 5
on the MDS.
For example, if there are four DSs named nfsv4-data[0-3], the
.Xr fstab 5
lines might look like:
.Bd -literal -offset indent
nfsv4-data0:/ /data0 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data1:/ /data1 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data2:/ /data2 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data3:/ /data3 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
.Ed
.sp
The
.Xr nfsd 8
command line option
.Dq -p
indicates that the NFS server is a pNFS MDS and specifies what
DSs are to be used.
.br
For the above
.Xr fstab 5
example, the
.Xr nfsd 8
nfs_server_flags line in your
.Xr rc.conf 5
might look like:
.Bd -literal -offset indent
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3"
.Ed
.sp
This example specifies that the data files should be distributed over the
four DSs and File layouts will be issued to pNFS enabled clients.
If issuing Flexible File layouts is desired for this case, setting the sysctl
.Dq vfs.nfsd.default_flexfile
non-zero in your
.Xr sysctl.conf 5
file will make the
.Nm
do that.
.br
Alternatively, this variant of
.Dq nfs_server_flags
will specify that two way mirroring is to be done, via the
.Dq -m
command line option.
.Bd -literal -offset indent
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3 -m 2"
.Ed
.sp
With two way mirroring, the data file for each exported file on the MDS
will be stored on two of the DSs.
When mirroring is enabled, the server will always issue Flexible File layouts.
.Pp
It is also possible to specify which DSs are to be used to store data files for
specific exported file systems on the MDS.
For example, if the MDS has exported two file systems
.Dq /export1
and
.Dq /export2
to clients, the following variant of
.Dq nfs_server_flags
will specify that data files for
.Dq /export1
will be stored on nfsv4-data0 and nfsv4-data1, whereas the data files for
.Dq /export2
will be stored on nfsv4-data2 and nfsv4-data3.
.Bd -literal -offset indent
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0#/export1,nfsv4-data1:/data1#/export1,nfsv4-data2:/data2#/export2,nfsv4-data3:/data3#/export2"
.Ed
.sp
This can be used by system administrators to control where data files are
stored and might be useful for control of storage use.
For this case, it may be convenient to co-locate more than one of the DSs
on the same
.Fx
server, using separate file systems on the DS system
for storage of the respective DS's data files.
If mirroring is desired for this case, the
.Dq -m
option also needs to be specified.
There must be enough DSs assigned to each exported file system on the MDS
to support the level of mirroring.
The above example would be fine for two way mirroring, but four way mirroring
would not work, since there are only two DSs assigned to each exported file
system on the MDS.
.Pp
The number of subdirectories in each DS is defined by the
.Dq vfs.nfs.dsdirsize
sysctl on the MDS.
This value can be increased from the default of 20, but only when the
.Xr nfsd 8
is not running and after the additional ds20,... subdirectories have been
created on all the DSs.
For a service that will store a large number of files this sysctl should be
set much larger, to keep the number of entries in each subdirectory from
getting too large.
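.Pp
As an example, raising the number of subdirectories from 20 to 100 could be
done roughly as follows.
The value 100 is arbitrary and chosen only for illustration.
First, as root in the top level exported directory on every DS, create
ds20,...,ds99:
.Bd -literal -offset indent
jot -w ds 80 20 | xargs mkdir -m 700
.Ed
.Pp
Then, while the
.Xr nfsd 8
is not running on the MDS, set the sysctl, for example with a line such as
this in
.Xr sysctl.conf 5 :
.Bd -literal -offset indent
vfs.nfs.dsdirsize=100
.Ed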
.Sh Client mounts
Once operational, NFSv4.1 or NFSv4.2
.Fx
client mounts
done with the
.Dq pnfs
option should do I/O directly on the DSs.
The clients mounting the MDS must be running the
.Xr nfscbd 8
daemon for pNFS to work.
Set
.Bd -literal -offset indent
nfscbd_enable="YES"
.Ed
.sp
in the
.Xr rc.conf 5
on these clients.
Non-pNFS aware clients or NFSv3 mounts will do all I/O RPCs on the MDS,
which acts as a proxy for the appropriate DS(s).
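.Pp
For instance, a
.Fx
client might mount a file system exported by the MDS as sketched below.
The host name
.Dq nfsv4-mds
and the paths are hypothetical.
.Bd -literal -offset indent
# mount -t nfs -o nfsv4,minorversion=2,pnfs nfsv4-mds:/export1 /mnt
.Ed
.Pp
A corresponding
.Xr fstab 5
line on the client might be:
.Bd -literal -offset indent
nfsv4-mds:/export1 /mnt nfs rw,nfsv4,minorversion=2,pnfs 0 0
.Ed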
.Sh Backing up a pNFS service
Since the data is separated from the metadata, the simple way to back up
a pNFS service is to do so from an NFS client that has the service mounted
on it.
If you back up the MDS exported file system(s) on the MDS, you must do it
in such a way that the
.Dq system
namespace extended attributes get backed up.
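.Pp
One way to see what needs to be preserved is to list the
.Dq system
namespace extended attributes of a file on the MDS with
.Xr lsextattr 8 .
This is only a spot check, not a backup procedure, and the file name is a
placeholder.
.Bd -literal -offset indent
# lsextattr system <file-on-MDS-exported-file-system>
.Ed
.Pp
For a file with data stored on a DS, the list would be expected to include
.Dq pnfsd.dsfile ,
the attribute that records where the data file is stored.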
.Sh Handling of failed mirrored DSs
When a mirrored DS fails, it can be disabled in one of three ways:
.sp
1 - The MDS detects a problem when trying to do proxy
operations on the DS.
This can take a couple of minutes
after the DS failure or network partitioning occurs.
.sp
2 - A pNFS client can report an I/O error that occurred for a DS to the MDS in
the arguments for a LayoutReturn operation.
.sp
3 - The system administrator can run the
.Xr pnfsdskill 8
command on the MDS to disable it.
If the
.Xr pnfsdskill 8
command fails with ENXIO (Device not configured), that normally means
the DS was already disabled via #1 or #2.
Since doing this is harmless, once a system administrator knows that
there is a problem with a mirrored DS, doing the command is recommended.
.sp
Once a system administrator knows that a mirrored DS has malfunctioned
or has been network partitioned, they should do the following as root
on the MDS:
.Bd -literal -offset indent
# pnfsdskill <mounted-on-path-of-DS>
# umount -N <mounted-on-path-of-DS>
.Ed
.sp
Note that the <mounted-on-path-of-DS> must be the exact mounted-on path
string used when the DS was mounted on the MDS.
.Pp
Once the mirrored DS has been disabled, the pNFS service should continue to
function, but file updates will only happen on the DS(s) that have not been disabled.
Assuming two way mirroring, for files with a copy on the disabled DS this
means that updates go only to the other DS of the pair recorded in the
.Dq pnfsd.dsfile
extended attribute for the file on the MDS.
.Pp
The next step is to clear the IP address in the
.Dq pnfsd.dsfile
extended attribute on all files on the MDS for the failed DS.
This is done so that, when the disabled DS is repaired and brought back online,
the data files on this DS will not be used, since they may be out of date.
The command that clears the IP address is
.Xr pnfsdsfile 8
with the
.Dq -r
option.
For example:
.Bd -literal -offset indent
# pnfsdsfile -r nfsv4-data3 yyy.c
yyy.c:	nfsv4-data2.home.rick	ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000	0.0.0.0	ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000
.Ed
.sp
replaces nfsv4-data3 with an IPv4 address of 0.0.0.0, so that nfsv4-data3
will not get used.
.Pp
Normally this will be called within a
.Xr find 1
command for all regular
files in the exported directory tree and must be done on the MDS.
When used with
.Xr find 1 ,
you will probably also want the
.Dq -q
option so that it does not print the results for every file.
If the disabled/repaired DS is nfsv4-data3, the commands done on the MDS
would be:
.Bd -literal -offset indent
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile -q -r nfsv4-data3 {} \e;
.Ed
.sp
There is a problem with the above command if the file found by
.Xr find 1
is renamed or unlinked before the
.Xr pnfsdsfile 8
command is done on it.
This should normally generate an error message.
A simple unlink is harmless
but a link/unlink or rename might result in the file not having been processed
under its new name.
To check that all files have their IP addresses set to 0.0.0.0, these
commands can be used (assuming the
.Xr sh 1
shell):
.Bd -literal -offset indent
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile {} \e; | sed "/nfsv4-data3/!d"
.Ed
.sp
Any line(s) printed require the
.Xr pnfsdsfile 8
with
.Dq -r
to be done again.
Once this is done, the replaced/repaired DS can be brought back online.
It should have empty ds0,...,dsN directories under the top level exported
directory for storage of data files just like it did when first set up.
Mount it on the MDS exactly as you did before disabling it.
For the nfsv4-data3 example, the command would be:
.Bd -literal -offset indent
# mount -t nfs -o nfsv4,minorversion=2,soft,retrans=2 nfsv4-data3:/ /data3
.Ed
.sp
Then restart the nfsd to re-enable the DS.
.Bd -literal -offset indent
# /etc/rc.d/nfsd restart
.Ed
.sp
Now, new files can be stored on nfsv4-data3,
but files with the IP address zeroed out on the MDS will not yet use the
repaired DS (nfsv4-data3).
The next step is to go through the exported file tree on the MDS and,
for each of the
files with an IPv4 address of 0.0.0.0 in its extended attribute, copy the file
data to the repaired DS and re-enable use of this mirror for it.
The command for copying the file data for one MDS file is
.Xr pnfsdscopymr 8
and it will also normally be used within a
.Xr find 1
command.
For the example case, the commands on the MDS would be:
.Bd -literal -offset indent
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdscopymr -r /data3 {} \e;
.Ed
.sp
When this completes, the recovery should be complete or at least nearly so.
As noted above, if a link/unlink or rename occurs on a file name while the
above
.Xr find 1
is in progress, it may not get copied.
To check for any file(s) not yet copied, the commands are:
.Bd -literal -offset indent
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile {} \e; | sed "/0\e.0\e.0\e.0/!d"
.Ed
.sp
If this command prints out any file name(s), these files must
have the
.Xr pnfsdscopymr 8
command done on them to complete the recovery.
.Bd -literal -offset indent
# pnfsdscopymr -r /data3 <file-path-reported>
.Ed
.sp
If this command fails with the error
.br
.Dq pnfsdscopymr: Copymr failed for file <path>: Device not configured
.br
repeatedly, this may be caused by a Read/Write layout that has not
been returned.
The only way to get rid of such a layout is to restart the
.Xr nfsd 8 .
.sp
All of these commands are designed to be
done while the pNFS service is running and can be re-run safely.
.Pp
For a more detailed discussion of the setup and management of a pNFS service
see:
.Bd -literal -offset indent
https://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
.Ed
.sp
.Sh SEE ALSO
.Xr nfsv4 4 ,
.Xr pnfs 4 ,
.Xr exports 5 ,
.Xr fstab 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr nfscbd 8 ,
.Xr nfsd 8 ,
.Xr nfsuserd 8 ,
.Xr pnfsdscopymr 8 ,
.Xr pnfsdsfile 8 ,
.Xr pnfsdskill 8
.Sh HISTORY
The
.Nm
service first appeared in
.Fx 12.0 .
.Sh BUGS
Since the MDS cannot be mirrored, it is a single point of failure just
as a non
.Tn pNFS
server is.
For non-mirrored configurations, all
.Fx
systems used in the service
are single points of failure.
446are single points of failure.
447