.\" Copyright (c) 2018 Rick Macklem
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd August 8, 2018
.Dt PNFSSERVER 4
.Os
.Sh NAME
.Nm pNFSserver
.Nd NFS Version 4.1 Parallel NFS Protocol Server
.Sh DESCRIPTION
A set of FreeBSD servers may be configured to provide a
.Xr pnfs 4
service.
One FreeBSD system needs to be configured as a MetaData Server (MDS) and
at least one additional FreeBSD system needs to be configured as one or
more Data Servers (DSs).
.Pp
These FreeBSD systems are configured to be NFSv4.1 servers, see
.Xr nfsd 8
and
.Xr exports 5
if you are not familiar with configuring an NFSv4.1 server.
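.Pp
As a minimal sketch, assuming a stock FreeBSD installation (these settings
are not taken from this page; see
.Xr rc.conf 5
and
.Xr nfsv4 4
for the authoritative descriptions), a basic NFSv4 server is typically
enabled with
.Xr rc.conf 5
lines such as:
.Bd -literal -offset indent
# Assumed baseline for the MDS and each DS; adjust to the local site.
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
.Ed
.Pp
The exported file systems themselves are listed in
.Xr exports 5 .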
.Sh DS server configuration
The DS(s) need to be configured as NFSv4.1 server(s), with a top level exported
directory used for storage of data files.
This directory must be owned by
.Dq root
and would normally have a mode of
.Dq 700 .
Within this directory there need to be additional directories named
ds0,...,dsN (where N is 19 by default) also owned by
.Dq root
with mode
.Dq 700 .
These are the directories where the data files are stored.
The following command can be run by root when in the top level exported
directory to create these subdirectories.
.Bd -literal -offset indent
jot -w ds 20 0 | xargs mkdir -m 700
.Ed
.sp
Note that
.Dq 20
is the default and can be set to a larger value on the MDS as shown below.
.sp
The top level exported directory used for storage of data files must be
exported to the MDS with the
.Dq maproot=root sec=sys
export options so that the MDS can create entries in these subdirectories.
It must also be exported to all pNFS aware clients, but these clients do
not require the
.Dq maproot=root
export option and this directory should be exported to them with the same
options as used by the MDS to export file system(s) to the clients.
.Pp
It is possible to have multiple DSs on the same FreeBSD system, but each
of these DSs must have a separate top level exported directory used for storage
of data files and each
of these DSs must be mountable via a separate IP address.
Alias addresses can be set on the DS server system for a network
interface via
.Xr ifconfig 8
to create these different IP addresses.
Multiple DSs on the same server may be useful when data for different file systems
on the MDS are being stored on different file system volumes on the FreeBSD
DS system.
.Sh MDS server configuration
The MDS must be a separate FreeBSD system from the FreeBSD DS system(s) and
NFS clients.
It is configured as an NFSv4.1 server with file system(s) exported to
clients.
However, the
.Dq -p
command line argument for
.Xr nfsd 8
is used to indicate that it is running as the MDS for a pNFS server.
.Pp
The DS(s) must all be mounted on the MDS using the following mount options:
.Bd -literal -offset indent
nfsv4,minorversion=1,soft,retrans=2
.Ed
.sp
so that they can be defined as DSs in the
.Dq -p
option.
Normally these mounts would be entered in the
.Xr fstab 5
on the MDS.
For example, if there are four DSs named nfsv4-data[0-3], the
.Xr fstab 5
lines might look like:
.Bd -literal
nfsv4-data0:/ /data0 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
nfsv4-data1:/ /data1 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
nfsv4-data2:/ /data2 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
nfsv4-data3:/ /data3 nfs rw,nfsv4,minorversion=1,soft,retrans=2 0 0
.Ed
.sp
The
.Xr nfsd 8
command line option
.Dq -p
indicates that the NFS server is a pNFS MDS and specifies what
DSs are to be used.
.br
For the above
.Xr fstab 5
example, the
.Xr nfsd 8
nfs_server_flags line in your
.Xr rc.conf 5
might look like:
.Bd -literal
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3"
.Ed
.sp
This example specifies that the data files should be distributed over the
four DSs and File layouts will be issued to pNFS enabled clients.
If issuing Flexible File layouts is desired for this case, setting the sysctl
.Dq vfs.nfsd.default_flexfile
non-zero in your
.Xr sysctl.conf 5
file will make the
.Nm
do that.
.br
Alternately, this variant of
.Dq nfs_server_flags
will specify that two way mirroring is to be done, via the
.Dq -m
command line option.
.Bd -literal
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3 -m 2"
.Ed
.sp
With two way mirroring, the data file for each exported file on the MDS
will be stored on two of the DSs.
When mirroring is enabled, the server will always issue Flexible File layouts.
.Pp
It is also possible to specify which DSs are to be used to store data files for
specific exported file systems on the MDS.
For example, if the MDS has exported two file systems
.Dq /export1
and
.Dq /export2
to clients, the following variant of
.Dq nfs_server_flags
will specify that data files for
.Dq /export1
will be stored on nfsv4-data0 and nfsv4-data1, whereas the data files for
.Dq /export2
will be stored on nfsv4-data2 and nfsv4-data3.
.Bd -literal
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0#/export1,nfsv4-data1:/data1#/export1,nfsv4-data2:/data2#/export2,nfsv4-data3:/data3#/export2"
.Ed
.sp
This can be used by system administrators to control where data files are
stored and might be useful for control of storage use.
For this case, it may be convenient to co-locate more than one of the DSs
on the same FreeBSD server, using separate file systems on the DS system
for storage of the respective DS's data files.
If mirroring is desired for this case, the
.Dq -m
option also needs to be specified.
There must be enough DSs assigned to each exported file system on the MDS
to support the level of mirroring.
The above example would be fine for two way mirroring, but four way mirroring
would not work, since there are only two DSs assigned to each exported file
system on the MDS.
.Pp
The number of subdirectories in each DS is defined by the
.Dq vfs.nfs.dsdirsize
sysctl on the MDS.
This value can be increased from the default of 20, but only when the
.Xr nfsd 8
is not running and after the additional ds20,... subdirectories have been
created on all the DSs.
For a service that will store a large number of files this sysctl should be
set much larger, to prevent the number of entries in a subdirectory from
getting too large.
.Sh Client mounts
Once operational, NFSv4.1 FreeBSD client mounts done with the
.Dq pnfs
option should do I/O directly on the DSs.
The clients mounting the MDS must be running the
.Xr nfscbd 8
daemon for pNFS to work.
Set
.Bd -literal -offset indent
nfscbd_enable="YES"
.Ed
.sp
in the
.Xr rc.conf 5
on these clients.
Non-pNFS aware clients or NFSv3 mounts will do all I/O RPCs on the MDS,
which acts as a proxy for the appropriate DS(s).
.Sh Backing up a pNFS service
Since the data is separated from the metadata, the simple way to back up
a pNFS service is to do so from an NFS client that has the service mounted
on it.
If you back up the MDS exported file system(s) on the MDS, you must do it
in such a way that the
.Dq system
namespace extended attributes get backed up.
.Sh Handling of failed mirrored DSs
When a mirrored DS fails, it can be disabled in one of three ways:
.sp
1 - The MDS detects a problem when trying to do proxy
operations on the DS.
This can take a couple of minutes
after the DS failure or network partitioning occurs.
.sp
2 - A pNFS client can report an I/O error that occurred for a DS to the MDS in
the arguments for a LayoutReturn operation.
.sp
3 - The system administrator can perform the
.Xr pnfsdskill 8
command on the MDS to disable it.
If the system administrator does a
.Xr pnfsdskill 8
and it fails
with ENXIO (Device not configured) that normally means the DS was already
disabled via #1 or #2.
Since doing this is harmless, once a system
administrator knows that there is a problem with a mirrored DS, doing the
command is recommended.
.sp
Once a system administrator knows that a mirrored DS has malfunctioned
or has been network partitioned, they should do the following as root
on the MDS:
.Bd -literal -offset indent
# pnfsdskill <mounted-on-path-of-DS>
# umount -N <mounted-on-path-of-DS>
.Ed
.sp
Note that the <mounted-on-path-of-DS> must be the exact mounted-on path
string used when the DS was mounted on the MDS.
.Pp
Once the mirrored DS has been disabled, the pNFS service should continue to
function, but file updates will only happen on the DS(s)
that have not been disabled.
Assuming two way mirroring, for files stored on the disabled DS this means
only the other DS of the pair recorded in the
.Dq pnfsd.dsfile
extended attribute for the file on the MDS.
.Pp
The next step is to clear the IP address in the
.Dq pnfsd.dsfile
extended attribute on all files on the MDS for the failed DS.
This is done so that, when the disabled DS is repaired and brought back online,
the data files on this DS will not be used, since they may be out of date.
The command that clears the IP address is
.Xr pnfsdsfile 8
with the
.Dq -r
option.
For example:
.Bd -literal
# pnfsdsfile -r nfsv4-data3 yyy.c
yyy.c: nfsv4-data2.home.rick ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000 0.0.0.0 ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000
.Ed
.sp
replaces nfsv4-data3 with an IPv4 address of 0.0.0.0, so that nfsv4-data3
will not get used.
.Pp
Normally this will be called within a
.Xr find 1
command for all regular
files in the exported directory tree and must be done on the MDS.
When used with
.Xr find 1 ,
you will probably also want the
.Dq -q
option so that it does not print the results for every file.
If the disabled/repaired DS is nfsv4-data3, the commands done on the MDS
would be:
.Bd -literal
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile -q -r nfsv4-data3 {} \e;
.Ed
.sp
There is a problem with the above command if the file found by
.Xr find 1
is renamed or unlinked before the
.Xr pnfsdsfile 8
command is done on it.
This should normally generate an error message.
A simple unlink is harmless
but a link/unlink or rename might result in the file not having been processed
under its new name.
To check that all files have their IP addresses set to 0.0.0.0 these
commands can be used (assuming the
.Xr sh 1
shell):
.Bd -literal
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile {} \e; | sed "/nfsv4-data3/!d"
.Ed
.sp
Any line(s) printed require the
.Xr pnfsdsfile 8
with
.Dq -r
to be done again.
Once this is done, the replaced/repaired DS can be brought back online.
It should have empty ds0,...,dsN directories under the top level exported
directory for storage of data files just like it did when first set up.
Mount it on the MDS exactly as you did before disabling it.
For the nfsv4-data3 example, the command would be:
.Bd -literal
# mount -t nfs -o nfsv4,minorversion=1,soft,retrans=2 nfsv4-data3:/ /data3
.Ed
.sp
Then restart the nfsd to re-enable the DS.
.Bd -literal
# /etc/rc.d/nfsd restart
.Ed
.sp
Now, new files can be stored on nfsv4-data3,
but files with the IP address zeroed out on the MDS will not yet use the
repaired DS (nfsv4-data3).
The next step is to go through the exported file tree on the MDS and,
for each of the
files with an IPv4 address of 0.0.0.0 in its extended attribute, copy the file
data to the repaired DS and re-enable use of this mirror for it.
The command for copying the file data for one MDS file is
.Xr pnfsdscopymr 8
and it will also normally be run from a
.Xr find 1
command.
For the example case, the commands on the MDS would be:
.Bd -literal
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdscopymr -r /data3 {} \e;
.Ed
.sp
When this completes, the recovery should be complete or at least nearly so.
As noted above, if a link/unlink or rename occurs on a file name while the
above
.Xr find 1
is in progress, it may not get copied.
To check for any file(s) not yet copied, the commands are:
.Bd -literal
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile {} \e; | sed "/0\e.0\e.0\e.0/!d"
.Ed
.sp
If this command prints out any file name(s), these files must
have the
.Xr pnfsdscopymr 8
command done on them to complete the recovery.
.Bd -literal
# pnfsdscopymr -r /data3 <file-path-reported>
.Ed
.sp
If this command fails with the error
.br
.Dq pnfsdscopymr: Copymr failed for file <path>: Device not configured
.br
repeatedly, this may be caused by a Read/Write layout that has not
been returned.
The only way to get rid of such a layout is to restart the
.Xr nfsd 8 .
.sp
All of these commands are designed to be
done while the pNFS service is running and can be re-run safely.
.Pp
For a more detailed discussion of the setup and management of a pNFS service
see:
.Bd -literal -offset indent
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
.Ed
.Sh SEE ALSO
.Xr nfsv4 4 ,
.Xr pnfs 4 ,
.Xr exports 5 ,
.Xr fstab 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr nfscbd 8 ,
.Xr nfsd 8 ,
.Xr nfsuserd 8 ,
.Xr pnfsdscopymr 8 ,
.Xr pnfsdsfile 8 ,
.Xr pnfsdskill 8
.Sh HISTORY
The
.Nm
service first appeared in
.Fx 12.0 .
.Sh BUGS
Since the MDS cannot be mirrored, it is a single point of failure just
as a non-pNFS
server is.
For non-mirrored configurations, all FreeBSD systems used in the service
are single points of failure.