.\" Copyright (c) 2018 Rick Macklem
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd December 20, 2019
.Dt PNFSSERVER 4
.Os
.Sh NAME
.Nm pNFSserver
.Nd NFS Version 4.1 and 4.2 Parallel NFS Protocol Server
.Sh DESCRIPTION
A set of
.Fx
servers may be configured to provide a
.Xr pnfs 4
service.
One
.Fx
system needs to be configured as a MetaData Server (MDS) and
at least one additional
.Fx
system needs to be configured as one or
more Data Servers (DSs).
.Pp
These
.Fx
systems are configured to be NFSv4.1 and NFSv4.2
servers; see
.Xr nfsd 8
and
.Xr exports 5
if you are not familiar with configuring an NFSv4.n server.
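.Pp
Here is a minimal sketch of enabling an NFSv4 server on each of these systems. It assumes the stock
.Xr rc.conf 5
variable names; check
.Xr rc.conf 5
and
.Xr nfsd 8
on your system for the authoritative list of variables:
.Bd -literal -offset indent
# /etc/rc.conf fragment (sketch): NFS server with NFSv4 enabled
nfs_server_enable="YES"
nfsv4_server_enable="YES"
# nfsuserd(8) maps between NFSv4 user/group name strings and local IDs
nfsuserd_enable="YES"
.Ed
.Pp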
All DS(s) and the MDS should support NFSv4.2 as well as NFSv4.1.
Mixing an MDS that supports NFSv4.2 with any DS(s) that do not support
NFSv4.2 will not work correctly.
As such, all DS(s) must be upgraded from
.Fx 12
to
.Fx 13
before upgrading the MDS.
.Sh DS server configuration
The DS(s) need to be configured as NFSv4.1 and NFSv4.2 server(s),
with a top level exported
directory used for storage of data files.
This directory must be owned by
.Dq root
and would normally have a mode of
.Dq 700 .
Within this directory there must be additional directories named
ds0,...,dsN (where N is 19 by default), also owned by
.Dq root
with mode
.Dq 700 .
These are the directories where the data files are stored.
The following command can be run by root in the top level exported
directory to create these subdirectories:
.Bd -literal -offset indent
jot -w ds 20 0 | xargs mkdir -m 700
.Ed
.sp
Note that
.Dq 20
is the default number of subdirectories; it can be increased on the MDS, as
described below.
.sp
The top level exported directory used for storage of data files must be
exported to the MDS with the
.Dq maproot=root sec=sys
export options so that the MDS can create entries in these subdirectories.
It must also be exported to all pNFS aware clients.
These clients do not require the
.Dq maproot=root
export option; the directory should be exported to them with the same
options that the MDS uses to export its file system(s) to the clients.
.Pp
It is possible to have multiple DSs on the same
.Fx
system, but each
of these DSs must have a separate top level exported directory used for storage
of data files and each
of these DSs must be mountable via a separate IP address.
Alias addresses can be set for a network interface on the DS system via
.Xr ifconfig 8
to provide these separate IP addresses.
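.Pp
For example, a DS system hosting two DSs might use
.Xr rc.conf 5
lines like the ones below.
The interface em0 and the addresses are hypothetical:
.Bd -literal -offset indent
# Address used to mount the first DS on this system
ifconfig_em0="inet 192.168.1.20/24"
# Alias address used to mount the second DS on this system
ifconfig_em0_alias0="inet 192.168.1.21/32"
.Ed
.sp
The alias can also be added to a running system with
.Dq ifconfig em0 inet 192.168.1.21/32 alias .
Each DS host name used on the MDS would then need to resolve to its own address.
.Pp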
Multiple DSs on the same server may be useful when data for different file
systems on the MDS is stored on different file system volumes on the
.Fx
DS system.
.Sh MDS server configuration
The MDS must be a separate
.Fx
system from the
.Fx
DS system(s) and
NFS clients.
It is configured as an NFSv4.1 and NFSv4.2 server with
file system(s) exported to clients.
However, the
.Dq -p
command line option for
.Xr nfsd 8
is used to indicate that it is running as the MDS for a pNFS service.
.Pp
The DS(s) must all be mounted on the MDS using the following mount options:
.Bd -literal -offset indent
nfsv4,minorversion=2,soft,retrans=2
.Ed
.sp
so that they can be defined as DSs in the
.Dq -p
option.
Normally these mounts would be entered in
.Xr fstab 5
on the MDS.
For example, if there are four DSs named nfsv4-data[0-3], the
.Xr fstab 5
lines might look like:
.Bd -literal -offset indent
nfsv4-data0:/ /data0 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data1:/ /data1 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data2:/ /data2 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
nfsv4-data3:/ /data3 nfs rw,nfsv4,minorversion=2,soft,retrans=2 0 0
.Ed
.sp
The
.Xr nfsd 8
command line option
.Dq -p
indicates that the NFS server is a pNFS MDS and specifies which
DSs are to be used.
.br
For the above
.Xr fstab 5
example, the
.Xr nfsd 8
nfs_server_flags line in your
.Xr rc.conf 5
might look like:
.Bd -literal -offset indent
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3"
.Ed
.sp
This example specifies that the data files should be distributed over the
four DSs and that File layouts will be issued to pNFS enabled clients.
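.Pp
Each
.Dq -p
entry names a DS host together with the path it is mounted on in
.Xr fstab 5 ,
so both can be generated from one list.
The following
.Xr sh 1
function is a sketch.
It assumes the naming scheme used above: DS hosts
nfsv4-data0, nfsv4-data1, and so on, mounted on /data0, /data1, and so on.
Its name,
.Dq gen_mds_config ,
is hypothetical.
It prints the
.Xr fstab 5
lines followed by the matching nfs_server_flags line:
.Bd -literal -offset indent
# Sketch: print fstab(5) lines and a matching nfs_server_flags line
# for N DSs named nfsv4-data0, nfsv4-data1, etc. (hypothetical names),
# mounted on /data0, /data1, etc. on the MDS.
gen_mds_config() {
    n=$1
    opts="rw,nfsv4,minorversion=2,soft,retrans=2"
    q='"'
    dslist=""
    i=0
    while [ "$i" -lt "$n" ]; do
        # The -p entry must use the same mount point as the fstab line.
        echo "nfsv4-data$i:/ /data$i nfs $opts 0 0"
        dslist="${dslist:+$dslist,}nfsv4-data$i:/data$i"
        i=$((i + 1))
    done
    echo "nfs_server_flags=${q}-u -t -n 128 -p $dslist${q}"
}
.Ed
.sp
Running
.Dq gen_mds_config 4
prints the four
.Xr fstab 5
lines and the nfs_server_flags line shown above.
.Pp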
If Flexible File layouts are desired for this case, setting the sysctl
.Dq vfs.nfsd.default_flexfile
to a non-zero value in your
.Xr sysctl.conf 5
file will make the
.Nm
issue them instead.
.br
Alternatively, this variant of
.Dq nfs_server_flags
specifies that two-way mirroring is to be done, via the
.Dq -m
command line option:
.Bd -literal -offset indent
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0,nfsv4-data1:/data1,nfsv4-data2:/data2,nfsv4-data3:/data3 -m 2"
.Ed
.sp
With two-way mirroring, the data file for each exported file on the MDS
will be stored on two of the DSs.
When mirroring is enabled, the server will always issue Flexible File layouts.
.Pp
It is also possible to specify which DSs are to be used to store data files for
specific exported file systems on the MDS.
For example, suppose the MDS has exported two file systems,
.Dq /export1
and
.Dq /export2 ,
to clients.
The following variant of
.Dq nfs_server_flags
specifies that data files for
.Dq /export1
will be stored on nfsv4-data0 and nfsv4-data1, whereas the data files for
.Dq /export2
will be stored on nfsv4-data2 and nfsv4-data3:
.Bd -literal -offset indent
nfs_server_flags="-u -t -n 128 -p nfsv4-data0:/data0#/export1,nfsv4-data1:/data1#/export1,nfsv4-data2:/data2#/export2,nfsv4-data3:/data3#/export2"
.Ed
.sp
System administrators can use this to control where data files are
stored, which may be useful for managing storage use.
For this case, it may be convenient to co-locate more than one of the DSs
on the same
.Fx
server, using separate file systems on the DS system
for storage of the respective DS's data files.
If mirroring is desired for this case, the
.Dq -m
option also needs to be specified.
There must be enough DSs assigned to each exported file system on the MDS
to support the level of mirroring.
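.Pp
The following
.Xr sh 1
function sketches that check; its name,
.Dq check_mirror_level ,
is hypothetical.
It takes a
.Dq -p
DS list and a mirror level.
It counts the DSs assigned to each exported file
system and reports any that have too few.
It assumes that either all entries or none of them carry a
.Dq #/export
suffix, as in the examples above:
.Bd -literal -offset indent
# Sketch: check that every exported file system named in a -p DS
# list has at least <level> DSs, as -m <level> requires.
# Assumes all entries or none carry a "#/export" suffix.
check_mirror_level() {
    echo "$1" | awk -F, -v level="$2" '
    {
        for (i = 1; i <= NF; i++) {
            # "host:/mnt#/export" counts toward "/export";
            # an entry with no "#" counts toward "(all)".
            n = index($i, "#")
            fsname = (n > 0) ? substr($i, n + 1) : "(all)"
            count[fsname]++
        }
    }
    END {
        bad = 0
        for (fsname in count)
            if (count[fsname] < level) {
                print fsname ": " count[fsname] " DS(s), need " level
                bad = 1
            }
        exit bad
    }'
}
.Ed
.sp
For the per-file-system example above, it reports nothing for a level of 2.
For a level of 4, it reports both /export1 and /export2.
.Pp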
The per-file-system example above would be fine for two-way mirroring, but
four-way mirroring would not work, since there are only two DSs assigned to
each exported file system on the MDS.
.Pp
The number of subdirectories in each DS is defined by the
.Dq vfs.nfs.dsdirsize
sysctl on the MDS.
This value can be increased from the default of 20, but only when
.Xr nfsd 8
is not running and after the additional ds20,... subdirectories have been
created on all the DSs.
For a service that will store a large number of files, this sysctl should be
set much larger, to keep the number of entries in each subdirectory from
getting too large.
.Sh Client mounts
Once the service is operational, NFSv4.1 or NFSv4.2
.Fx
client mounts
done with the
.Dq pnfs
option should do I/O directly on the DSs.
The clients mounting the MDS must be running the
.Xr nfscbd 8
daemon for pNFS to work.
Set
.Bd -literal -offset indent
nfscbd_enable="YES"
.Ed
.sp
in
.Xr rc.conf 5
on these clients.
Non-pNFS aware clients or NFSv3 mounts will do all I/O RPCs on the MDS,
which acts as a proxy for the appropriate DS(s).
.Sh Backing up a pNFS service
Since the data is separated from the metadata, the simplest way to back up
a pNFS service is to do so from an NFS client that has the service mounted
on it.
If you back up the MDS exported file system(s) on the MDS, you must do it
in such a way that the
.Dq system
namespace extended attributes get backed up.
.Sh Handling of failed mirrored DSs
When a mirrored DS fails, it can be disabled in one of three ways:
.sp
1 - The MDS detects a problem when trying to do proxy
operations on the DS.
This can take a couple of minutes
after the DS failure or network partitioning occurs.
.sp
2 - A pNFS client can report an I/O error that occurred for a DS to the MDS in
the arguments for a LayoutReturn operation.
.sp
3 - The system administrator can run the
.Xr pnfsdskill 8
command on the MDS to disable it.
If
.Xr pnfsdskill 8
fails with ENXIO
(Device not configured), that normally means the DS was already
disabled via #1 or #2.
Since running it is harmless, doing so is recommended once a system
administrator knows that there is a problem with a mirrored DS.
.sp
Once a system administrator knows that a mirrored DS has malfunctioned
or has been network partitioned, they should do the following as root
on the MDS:
.Bd -literal -offset indent
# pnfsdskill <mounted-on-path-of-DS>
# umount -N <mounted-on-path-of-DS>
.Ed
.sp
Note that <mounted-on-path-of-DS> must be the exact mounted-on path
string used when the DS was mounted on the MDS.
.Pp
Once the mirrored DS has been disabled, the pNFS service should continue to
function, but file updates will only happen on the DS(s) that have not been
disabled.
Assuming two-way mirroring, updates to files that were stored on the disabled
DS go only to the other DS of the pair.
That DS is recorded in the file's
.Dq pnfsd.dsfile
extended attribute on the MDS.
.Pp
The next step is to clear the IP address of the failed DS in the
.Dq pnfsd.dsfile
extended attribute of all files on the MDS.
This is done so that, when the disabled DS is repaired and brought back online,
the data files on this DS will not be used, since they may be out of date.
The command that clears the IP address is
.Xr pnfsdsfile 8
with the
.Dq -r
option.
For example, the following command replaces nfsv4-data3 with an IPv4 address
of 0.0.0.0 for the file yyy.c, so that nfsv4-data3 will not get used for it.
The second line is the command's output:
.Bd -literal -offset indent
# pnfsdsfile -r nfsv4-data3 yyy.c
yyy.c: nfsv4-data2.home.rick ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000 0.0.0.0 ds0/207508569ff983350c000000ec7c0200e4c57b2e0000000000000000
.Ed
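.Pp
The output line in the example above lists the file name and a colon.
It then gives one host and data file path pair per mirror, with 0.0.0.0 in
place of a cleared DS.
The following
.Xr sh 1
function assumes that this format holds in general and that file names
contain no white space.
Its name,
.Dq files_on_ds ,
is hypothetical.
It reads such output on standard input and prints the names of
files whose entries still name a given DS host (or 0.0.0.0):
.Bd -literal -offset indent
# Sketch: print names of files whose pnfsdsfile(8) output still lists
# the given DS host (matched exactly or as the first label of a FQDN).
files_on_ds() {
    awk -v ds="$1" '
    {
        name = $1
        sub(/:$/, "", name)    # drop the colon after the file name
        # Fields 2, 4, 6, ... hold the host names (or 0.0.0.0).
        for (i = 2; i <= NF; i += 2)
            if ($i == ds || index($i, ds ".") == 1) {
                print name
                break
            }
    }'
}
.Ed
.sp
This function could be fed the output of the
.Xr find 1
and
.Xr pnfsdsfile 8
checking commands used below.
Then
.Dq files_on_ds nfsv4-data3
would list only the files that still need
.Xr pnfsdsfile 8
with
.Dq -r .
Similarly,
.Dq files_on_ds 0.0.0.0
would list those that still need
.Xr pnfsdscopymr 8 .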
.Pp
Normally, this command will be run via
.Xr find 1
on all regular
files in the exported directory tree, and it must be run on the MDS.
When used with
.Xr find 1 ,
you will probably also want the
.Dq -q
option so that it does not print the result for every file.
If the disabled/repaired DS is nfsv4-data3, the commands run on the MDS
would be:
.Bd -literal -offset indent
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile -q -r nfsv4-data3 {} \e;
.Ed
.sp
There is a problem with the above command if the file found by
.Xr find 1
is renamed or unlinked before the
.Xr pnfsdsfile 8
command is run on it.
This should normally generate an error message.
A simple unlink is harmless,
but a link/unlink or rename might result in the file not being processed
under its new name.
To check that the entry for the failed DS has been set to 0.0.0.0 in all
files, these commands can be used (assuming the
.Xr sh 1
shell):
.Bd -literal -offset indent
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile {} \e; | sed "/nfsv4-data3/!d"
.Ed
.sp
For any file listed, the
.Xr pnfsdsfile 8
command with
.Dq -r
must be run again.
Once this is done, the replaced/repaired DS can be brought back online.
It should have empty ds0,...,dsN directories under the top level exported
directory for storage of data files, just as it did when first set up.
Mount it on the MDS exactly as you did before disabling it.
For the nfsv4-data3 example, the command would be:
.Bd -literal -offset indent
# mount -t nfs -o nfsv4,minorversion=2,soft,retrans=2 nfsv4-data3:/ /data3
.Ed
.sp
Then restart
.Xr nfsd 8
to re-enable the DS:
.Bd -literal -offset indent
# /etc/rc.d/nfsd restart
.Ed
.sp
Now, new files can be stored on nfsv4-data3,
but files with the IP address zeroed out on the MDS will not yet use the
repaired DS (nfsv4-data3).
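.Pp
As noted above, the repaired DS should have empty ds0,...,dsN directories
before it is mounted on the MDS again.
The following
.Xr sh 1
function is a quick check for this, run on the DS before that mount.
It is a sketch only.
Its name,
.Dq check_ds_dirs ,
is hypothetical, and it does not check that the directories are owned by
.Dq root .
The directory count passed to it must match the
.Dq vfs.nfs.dsdirsize
setting on the MDS (20 by default):
.Bd -literal -offset indent
# Sketch: report dsI subdirectories (I = 0 .. count-1) under the given
# top level directory that are missing, not mode 700, or not empty.
check_ds_dirs() {
    top=$1
    count=$2
    status=0
    i=0
    while [ "$i" -lt "$count" ]; do
        d="$top/ds$i"
        if [ ! -d "$d" ]; then
            echo "$d: missing"
            status=1
        else
            # Expect "drwx------"; ownership by root is not checked here.
            mode=$(ls -ld "$d" | cut -c1-10)
            if [ "$mode" != "drwx------" ]; then
                echo "$d: mode is $mode, expected drwx------"
                status=1
            fi
            if [ -n "$(ls -A "$d")" ]; then
                echo "$d: not empty"
                status=1
            fi
        fi
        i=$((i + 1))
    done
    return "$status"
}
.Ed
.sp
Consider
.Dq check_ds_dirs /data 20 ,
where /data stands for the DS's top level exported directory.
It prints nothing and returns success when all twenty directories are
present, have mode 700, and are empty.
.Pp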
The next step is to go through the exported file tree on the MDS.
For each file that has an IPv4 address of 0.0.0.0 in its extended attribute,
copy the file data to the repaired DS and re-enable use of this mirror for it.
The command that copies the file data for one MDS file is
.Xr pnfsdscopymr 8 ,
and it will also normally be run via
.Xr find 1 .
For the example case, the commands on the MDS would be:
.Bd -literal -offset indent
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdscopymr -r /data3 {} \e;
.Ed
.sp
When this completes, the recovery should be complete or at least nearly so.
As noted above, a link/unlink or rename may occur on a file name while
.Xr find 1
is in progress.
In that case the file may not get copied.
To check for any file(s) not yet copied, the commands are:
.Bd -literal -offset indent
# cd <top-level-exported-dir>
# find . -type f -exec pnfsdsfile {} \e; | sed "/0\e.0\e.0\e.0/!d"
.Ed
.sp
If these commands print any file name(s), the
.Xr pnfsdscopymr 8
command must be run on those files to complete the recovery:
.Bd -literal -offset indent
# pnfsdscopymr -r /data3 <file-path-reported>
.Ed
.sp
If this command repeatedly fails with the error
.br
.Dq pnfsdscopymr: Copymr failed for file <path>: Device not configured
.br
this may be caused by a Read/Write layout that has not
been returned.
The only way to get rid of such a layout is to restart
.Xr nfsd 8 .
.sp
All of these commands are designed to be
used while the pNFS service is running and can be re-run safely.
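.Pp
The check and the per-file copy described above can also be combined into a
single pass over the tree.
The following
.Xr sh 1
function is a sketch only.
Its name,
.Dq retry_copymr ,
is hypothetical, and it assumes that file names contain no newlines.
It treats a file as still needing copying when its
.Xr pnfsdsfile 8
output contains
.Dq 0.0.0.0
surrounded by spaces:
.Bd -literal -offset indent
# Sketch: run pnfsdscopymr(8) on each regular file whose pnfsdsfile(8)
# output still shows a 0.0.0.0 entry, reporting any copy that fails.
# Run on the MDS from the top level exported directory.
retry_copymr() {
    dsmnt=$1    # MDS mount point of the repaired DS, such as /data3
    find . -type f -print | while IFS= read -r f; do
        case "$(pnfsdsfile "$f")" in
        *" 0.0.0.0 "*)
            pnfsdscopymr -r "$dsmnt" "$f" ||
                echo "copy failed: $f" >&2
            ;;
        esac
    done
}
.Ed
.sp
For the example case it would be run on the MDS, from
<top-level-exported-dir>, as
.Dq retry_copymr /data3 .
It only runs the commands above, so it should be as safe to re-run as they
are.
A file that keeps failing with
.Dq Device not configured
may still need the
.Xr nfsd 8
restart described above.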
.Pp
For a more detailed discussion of the setup and management of a pNFS service,
see:
.Bd -literal -offset indent
https://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
.Ed
.Sh SEE ALSO
.Xr nfsv4 4 ,
.Xr pnfs 4 ,
.Xr exports 5 ,
.Xr fstab 5 ,
.Xr rc.conf 5 ,
.Xr sysctl.conf 5 ,
.Xr nfscbd 8 ,
.Xr nfsd 8 ,
.Xr nfsuserd 8 ,
.Xr pnfsdscopymr 8 ,
.Xr pnfsdsfile 8 ,
.Xr pnfsdskill 8
.Sh HISTORY
The
.Nm
service first appeared in
.Fx 12.0 .
.Sh BUGS
Since the MDS cannot be mirrored, it is a single point of failure just
as a non
.Tn pNFS
server is.
For non-mirrored configurations, all
.Fx
systems used in the service
are single points of failure.