.\" Copyright (c) 2018 Rick Macklem
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd December 20, 2019
.Dt PNFSSERVER 4
.Os
.Sh NAME
.Nm pNFSserver
.Nd NFS Version 4.1 and 4.2 Parallel NFS Protocol Server
.Sh DESCRIPTION
A set of FreeBSD servers may be configured to provide a
.Xr pnfs 4
service.
One FreeBSD system needs to be configured as a MetaData Server (MDS) and
at least one additional FreeBSD system needs to be configured as one or
more Data Servers (DSs).
.Pp
These FreeBSD systems are configured to be NFSv4.1 and NFSv4.2
servers; see
.Xr nfsd 8
and
.Xr exports 5
if you are not familiar with configuring an NFSv4.n server.
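.Pp
As a rough illustration of the server side, the following
.Xr rc.conf 5
fragment is a minimal sketch of the variables commonly set to enable an
NFSv4 server on each system.
It assumes a typical setup and is not a complete configuration; the
.Xr exports 5
file still has to be written separately.
.Bd -literal -offset indent
nfs_server_enable="YES"
nfsv4_server_enable="YES"
nfsuserd_enable="YES"
.Ed
.Pp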
All DS(s) and the MDS should support NFSv4.2 as well as NFSv4.1.
Mixing an MDS that supports NFSv4.2 with any DS(s) that do not support
NFSv4.2 will not work correctly.
As such, all DS(s) must be upgraded from
.Fx 12
to
.Fx 13
before upgrading the MDS.
.Sh DS server configuration
The DS(s) need to be configured as NFSv4.1 and NFSv4.2 server(s),
with a top level exported
directory used for storage of data files.
This directory must be owned by
.Dq root
and would normally have a mode of
.Dq 700 .
Within this directory there need to be additional directories named
ds0,...,dsN (where N is 19 by default) also owned by
.Dq root
with mode
.Dq 700 .
These are the directories where the data files are stored.
The following command can be run by root when in the top level exported
directory to create these subdirectories.
.Bd -literal -offset indent
jot -w ds 20 0 | xargs mkdir -m 700
.Ed
.sp
Note that
.Dq 20
is the default and can be set to a larger value on the MDS as described below.
.sp
The top level exported directory used for storage of data files must be
exported to the MDS with the
.Dq maproot=root sec=sys
export options so that the MDS can create entries in these subdirectories.
It must also be exported to all pNFS aware clients, but these clients do
not require the
.Dq maproot=root
export option and this directory should be exported to them with the same
options as used by the MDS to export file system(s) to the clients.
.Pp
It is possible to have multiple DSs on the same FreeBSD system, but each
of these DSs must have a separate top level exported directory used for storage
of data files and each
of these DSs must be mountable via a separate IP address.
Alias addresses can be set on the DS server system for a network
interface via
.Xr ifconfig 8
to create these different IP addresses.
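.Pp
As a sketch of such an alias, the following
.Xr rc.conf 5
line adds a second address to an interface on the DS system.
The interface name
.Dq em0
and the address are hypothetical and must be replaced with values that
suit the local network.
.Bd -literal -offset indent
ifconfig_em0_alias0="inet 192.0.2.51 netmask 255.255.255.255"
.Ed
.Pp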
Multiple DSs on the same server may be useful when data for different file systems
on the MDS are being stored on different file system volumes on the FreeBSD
DS system.
.Sh MDS server configuration
The MDS must be a separate FreeBSD system from the FreeBSD DS system(s) and
NFS clients.
It is configured as an NFSv4.1 and NFSv4.2 server with
file system(s) exported to clients.
However, the
.Dq -p
command line argument for
.Xr nfsd 8
is used to indicate that it is running as the MDS for a pNFS server.
.Pp
The DS(s) must all be mounted on the MDS using the following mount options:
.Bd -literal -offset indent
nfsv4,minorversion=2,soft,retrans=2
.Ed
.sp
so that they can be defined as DSs in the
.Dq -p
option.
Normally these mounts would be entered in the
.Xr fstab 5
on the MDS.
For example, if there are four DSs named nfsv4-data[0-3], the
.Xr fstab 5
lines might look like:
.Bd -literal -offset
nfsv4-data0:/ /data0 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data1:/ /data1 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data2:/ /data2 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data3:/ /data3 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
.Ed
.sp
The
.Xr nfsd 8
command line option
.Dq -p
indicates that the NFS server is a pNFS MDS and specifies what
DSs are to be used.
.br
For the above
.Xr fstab 5
example, the
.Xr nfsd 8
nfs_server_flags line in your
.Xr rc.conf 5
might look like:
.Bd -literal -offset
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3"
.Ed
.sp
This example specifies that the data files should be distributed over the
four DSs and File layouts will be issued to pNFS enabled clients.
If issuing Flexible File layouts is desired for this case, setting the sysctl
.Dq vfs.nfsd.default_flexfile
non-zero in your
.Xr sysctl.conf 5
file will make the
.Nm
do that.
.br
Alternatively, this variant of
.Dq nfs_server_flags
will specify that two way mirroring is to be done, via the
.Dq -m
command line option.
.Bd -literal -offset
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3 -m 2"
.Ed
.sp
With two way mirroring, the data file for each exported file on the MDS
will be stored on two of the DSs.
When mirroring is enabled, the server will always issue Flexible File layouts.
.Pp
It is also possible to specify which DSs are to be used to store data files for
specific exported file systems on the MDS.
For example, if the MDS has exported two file systems
.Dq /export1
and
.Dq /export2
to clients, the following variant of
.Dq nfs_server_flags
will specify that data files for
.Dq /export1
will be stored on nfsv4-data0 and nfsv4-data1, whereas the data files for
.Dq /export2
will be stored on nfsv4-data2 and nfsv4-data3.
.Bd -literal -offset
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0#/export1,nfsv4-data1:/data1#/export1,nfsv4-data2:/data2#/export2,nfsv4-data3:/data3#/export2"
.Ed
.sp
This can be used by system administrators to control where data files are
stored and might be useful for control of storage use.
For this case, it may be convenient to co-locate more than one of the DSs
on the same FreeBSD server, using separate file systems on the DS system
for storage of the respective DS's data files.
If mirroring is desired for this case, the
.Dq -m
option also needs to be specified.
There must be enough DSs assigned to each exported file system on the MDS
to support the level of mirroring.
The above example would be fine for two way mirroring, but four way mirroring
would not work, since there are only two DSs assigned to each exported file
system on the MDS.
.Pp
The number of subdirectories in each DS is defined by the
.Dq vfs.nfs.dsdirsize
sysctl on the MDS.
This value can be increased from the default of 20, but only when the
.Xr nfsd 8
is not running and after the additional ds20,... subdirectories have been
created on all the DSs.
For a service that will store a large number of files this sysctl should be
set much larger, to keep the number of entries in a subdirectory from
getting too large.
.Sh Client mounts
Once operational, NFSv4.1 or NFSv4.2 FreeBSD client mounts
done with the
.Dq pnfs
option should do I/O directly on the DSs.
The clients mounting the MDS must be running the
.Xr nfscbd 8
daemon for pNFS to work.
Set
.Bd -literal -offset indent
nfscbd_enable="YES"
.Ed
.sp
in the
.Xr rc.conf 5
on these clients.
Non-pNFS aware clients or NFSv3 mounts will do all I/O RPCs on the MDS,
which acts as a proxy for the appropriate DS(s).
.Sh Backing up a pNFS service
Since the data is separated from the metadata, the simple way to back up
a pNFS service is to do so from an NFS client that has the service mounted
on it.
If you back up the MDS exported file system(s) on the MDS, you must do it
in such a way that the
.Dq system
namespace extended attributes get backed up.
.Sh Handling of failed mirrored DSs
When a mirrored DS fails, it can be disabled in one of three ways:
.sp
1 - The MDS detects a problem when trying to do proxy
operations on the DS.
This can take a couple of minutes
after the DS failure or network partitioning occurs.
.sp
2 - A pNFS client can report an I/O error that occurred for a DS to the MDS in
the arguments for a LayoutReturn operation.
.sp
3 - The system administrator can run the
.Xr pnfsdskill 8
command on the MDS to disable it.
If
.Xr pnfsdskill 8
fails with ENXIO (Device not configured), that normally means the DS was
already disabled via #1 or #2.
Since doing this is harmless, once a system administrator knows that
there is a problem with a mirrored DS, doing the command is recommended.
.sp
Once a system administrator knows that a mirrored DS has malfunctioned
or has been network partitioned, they should do the following as root/su
on the MDS:
.Bd -literal -offset indent
# pnfsdskill <mounted-on-path-of-DS>
# umount -N <mounted-on-path-of-DS>
.Ed
.sp
Note that the <mounted-on-path-of-DS> must be the exact mounted-on path
string used when the DS was mounted on the MDS.
.Pp
Once the mirrored DS has been disabled, the pNFS service should continue to
function, but file updates will only happen on the DS(s) that have not been disabled.
With two way mirroring, that means that for each file stored on the disabled DS,
only the other DS of the pair recorded in the
.Dq pnfsd.dsfile
extended attribute for the file on the MDS is updated.
.Pp
The next step is to clear the IP address in the
.Dq pnfsd.dsfile
extended attribute on all files on the MDS for the failed DS.
This is done so that, when the disabled DS is repaired and brought back online,
the data files on this DS will not be used, since they may be out of date.
The command that clears the IP address is
.Xr pnfsdsfile 8
with the
.Dq -r
option.
For example:
.Bd -literal -offset
# pnfsdsfile -r nfsv4-data3 yyy.c
yyy.c: nfsv4-data2.home.rick ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000 0.0.0.0 ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000
.Ed
.sp
replaces nfsv4-data3 with an IPv4 address of 0.0.0.0, so that nfsv4-data3
will not get used.
.Pp
Normally this will be called within a
.Xr find 1
command for all regular
files in the exported directory tree and must be done on the MDS.
When used with
.Xr find 1 ,
you will probably also want the
.Dq -q
option so that it won't spit out the results for every file.
If the disabled/repaired DS is nfsv4-data3, the commands done on the MDS
would be:
.Bd -literal -offset
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile -q -r nfsv4-data3 {} \;
.Ed
.sp
There is a problem with the above command if the file found by
.Xr find 1
is renamed or unlinked before the
.Xr pnfsdsfile 8
command is done on it.
This should normally generate an error message.
A simple unlink is harmless
but a link/unlink or rename might result in the file not having been processed
under its new name.
To check that all files have their IP addresses set to 0.0.0.0 these
commands can be used (assuming the
.Xr sh 1
shell):
.Bd -literal -offset
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile {} \; | sed "/nfsv4-data3/!d"
.Ed
.sp
For any line(s) printed,
.Xr pnfsdsfile 8
with
.Dq -r
must be run again on the file reported.
Once this is done, the replaced/repaired DS can be brought back online.
It should have empty ds0,...,dsN directories under the top level exported
directory for storage of data files just like it did when first set up.
Mount it on the MDS exactly as you did before disabling it.
For the nfsv4-data3 example, the command would be:
.Bd -literal -offset
# mount -t nfs -o nfsv4,minorversion=2,soft,retrans=2 nfsv4-data3:/ /data3
.Ed
.sp
Then restart the nfsd to re-enable the DS.
.Bd -literal -offset
# /etc/rc.d/nfsd restart
.Ed
.sp
Now, new files can be stored on nfsv4-data3,
but files with the IP address zeroed out on the MDS will not yet use the
repaired DS (nfsv4-data3).
The next step is to go through the exported file tree on the MDS and,
for each of the
files with an IPv4 address of 0.0.0.0 in its extended attribute, copy the file
data to the repaired DS and re-enable use of this mirror for it.
The command for copying the file data for one MDS file is
.Xr pnfsdscopymr 8
and it will also normally be used in a
.Xr find 1 .
For the example case, the commands on the MDS would be:
.Bd -literal -offset
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdscopymr -r /data3 {} \;
.Ed
.sp
When this completes, the recovery should be complete or at least nearly so.
As noted above, if a link/unlink or rename occurs on a file name while the
above
.Xr find 1
is in progress, it may not get copied.
To check for any file(s) not yet copied, the commands are:
.Bd -literal -offset
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile {} \; | sed "/0\.0\.0\.0/!d"
.Ed
.sp
If this command prints out any file name(s), these files must
have the
.Xr pnfsdscopymr 8
command done on them to complete the recovery.
.Bd -literal -offset
# pnfsdscopymr -r /data3 <file-path-reported>
.Ed
.sp
If this command fails with the error
.br
.Dq pnfsdscopymr: Copymr failed for file <path>: Device not configured
.br
repeatedly, this may be caused by a Read/Write layout that has not
been returned.
The only way to get rid of such a layout is to restart the
.Xr nfsd 8 .
.sp
All of these commands are designed to be
done while the pNFS service is running and can be re-run safely.
.Pp
For a more detailed discussion of the setup and management of a pNFS service
see:
.Bd -literal -offset indent
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
.Ed
.Sh SEE ALSO
.Xr nfsv4 4 ,
.Xr pnfs 4 ,
.Xr exports 5 ,
.Xr fstab 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr nfscbd 8 ,
.Xr nfsd 8 ,
.Xr nfsuserd 8 ,
.Xr pnfsdscopymr 8 ,
.Xr pnfsdsfile 8 ,
.Xr pnfsdskill 8
.Sh HISTORY
The
.Nm
service first appeared in
.Fx 12.0 .
.Sh BUGS
Since the MDS cannot be mirrored, it is a single point of failure just
as a non
.Tn pNFS
server is.
For non-mirrored configurations, all FreeBSD systems used in the service
are single points of failure.