.\" Copyright (c) 2017 Rick Macklem
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd December 20, 2019
.Dt PNFS 4
.Os
.Sh NAME
.Nm pNFS
.Nd NFS Version 4.1 and 4.2 Parallel NFS Protocol
.Sh DESCRIPTION
The NFSv4.1 and NFSv4.2 client and server provide support for the
.Tn pNFS
specification; see
.%T "Network File System (NFS) Version 4 Minor Version 1 Protocol RFC 5661" ,
.%T "Network File System (NFS) Version 4 Minor Version 2 Protocol RFC 7862"
and
.%T "Parallel NFS (pNFS) Flexible File Layout RFC 8435" .
A pNFS service separates Read/Write operations from all other NFSv4.1 and
NFSv4.2 operations, which are referred to as Metadata operations.
The Read/Write operations are performed directly on the Data Server (DS)
where the file's data resides, bypassing the NFS server.
All other file operations are performed on the NFS server, which is referred to
as a Metadata Server (MDS).
NFS clients that do not support
.Tn pNFS
perform Read/Write operations on the MDS, which acts as a proxy for the
appropriate DS(s).
.Pp
The NFSv4.1 and NFSv4.2 protocols provide two pieces of information to pNFS
aware clients that allow them to perform Read/Write operations directly on
the DS.
.Pp
The first is DeviceInfo, which is static information defining the DS
server.
The critical piece of information in DeviceInfo for the layout types
supported by
.Fx
is the IP address that is used to perform RPCs on the DS.
It also indicates which version of NFS the DS supports, the I/O size and other
layout-specific information.
In the DeviceInfo, there is a DeviceID which, for the
.Fx
server,
is unique to the DS configuration
and changes whenever the
.Xr nfsd 8
daemon is restarted or the server is rebooted.
.Pp
The second is the layout, which is per file and references the DeviceInfo
to use via the DeviceID.
It is for a byte range of a file and is either Read or Read/Write.
For the
.Fx
server, a layout covers all bytes of a file.
A layout may be recalled by the MDS using a LayoutRecall callback.
When a client returns a layout via the LayoutReturn operation, it can
indicate that error(s) were encountered while doing I/O on the DS,
at least for certain layout types such as the Flexible File Layout.
.Pp
The
.Fx
client and server support two layout types.
.Pp
The File Layout is described in RFC 5661 and uses the NFSv4.1 or NFSv4.2 protocol
to perform I/O on the DS.
It does not support client aware DS mirroring and, as such,
the
.Fx
server only provides File Layout support for non-mirrored
configurations.
.Pp
The Flexible File Layout allows the use of the NFSv3, NFSv4.0, NFSv4.1 or
NFSv4.2 protocol to perform I/O on the DS and does support client aware
mirroring.
As such, the
.Fx
server uses Flexible File Layout layouts for the
mirrored DS configurations.
The
.Fx
server supports the
.Dq tightly coupled
variant and all DSs allow use of the
NFSv4.2 or NFSv4.1 protocol for I/O operations.
Clients that support the Flexible File Layout will do writes and commits
to all DS mirrors in the mirror set.
.Pp
A
.Fx
pNFS service consists of a single MDS server plus one or more
DS servers, all of which are
.Fx
systems.
For a non-mirrored configuration, the
.Fx
server will issue File Layout
layouts by default.
However, that default can be changed to the Flexible File Layout by setting the
.Xr sysctl 8
sysctl
.Dq vfs.nfsd.default_flexfile
to one.
Mirrored server configurations will only issue Flexible File Layouts.
.Tn pNFS
clients mount the MDS as they would a single NFS server.
.Pp
A
.Fx
.Tn pNFS
client must be running the
.Xr nfscbd 8
daemon and use the mount options
.Dq nfsv4,minorversion=2,pnfs
or
.Dq nfsv4,minorversion=1,pnfs .
.Pp
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty.
As such, if you look at the exported tree directly
on the MDS server (not via an NFS mount), the files will all be of size zero.
Each of these files will also have two extended attributes in the system
attribute name space:
.Bd -literal -offset indent
pnfsd.dsfile - This extended attribute stores the information that the
    MDS needs to find the data file on the DS(s) for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime,
    ModifyTime, Change and SpaceUsed attributes for the file.
.Ed
.Pp
For each regular (VREG) file, the MDS creates a data file on one
(or on N of them for the mirrored case, where N is the mirror_level)
of the DS(s) where the file's data will be stored.
The name of this file is
the file handle of the file on the MDS in hexadecimal at the time of file
creation.
The data file will have the same file ownership, mode and NFSv4 ACL
(if ACLs are enabled for the file system) as the file on the MDS, so that
permission checking can be done on the DS.
This is referred to as
.Dq tightly coupled
for the Flexible File Layout.
.Pp
For
.Tn pNFS
aware clients, the service generates File Layout
or Flexible File Layout
layouts and associated DeviceInfo.
For non-pNFS aware NFS clients, the pNFS service appears just like a normal
NFS service.
For the non-pNFS aware client, the MDS will perform I/O operations on the
appropriate DS(s), acting as
a proxy for the non-pNFS aware client.
This is also true for NFSv3 and NFSv4.0 mounts, since these are always non-pNFS
aware.
.Pp
It is possible to assign a DS to an MDS exported file system so that it will
store data for files on the MDS exported file system.
If a DS is not assigned to an MDS exported file system, it will store data
for files on all exported file systems on the MDS.
.Pp
If mirroring is enabled, the pNFS service will continue to function when
DS(s) have failed, so long as there is at least one DS still operational
that stores data for files on all of the MDS exported file systems.
After a disabled mirrored DS is repaired, it is possible to recover the DS
as a mirror while the pNFS service continues to function.
.Pp
See
.Xr pnfsserver 4
for information on how to set up a
.Fx
pNFS service.
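.Sh EXAMPLES
The following is a minimal sketch of the client and server settings
described above, not a tested recipe.
The server name
.Dq mds.example.com ,
the exported path
.Dq /export
and the mount point
.Dq /mnt
are hypothetical placeholders, and the
.Va nfscbd_enable
variable is assumed to be the
.Xr rc.conf 5
knob for the
.Xr nfscbd 8
daemon.
.Pp
On the client, start the callback daemon at boot:
.Bd -literal -offset indent
nfscbd_enable="YES"
.Ed
.Pp
On the client, an
.Xr fstab 5
entry that mounts the MDS using NFSv4.2 with
.Tn pNFS
enabled might look like:
.Bd -literal -offset indent
mds.example.com:/export  /mnt  nfs  rw,nfsv4,minorversion=2,pnfs  0  0
.Ed
.Pp
On a non-mirrored MDS, the default layout type can be changed to the
Flexible File Layout with a line such as this in
.Xr sysctl.conf 5 :
.Bd -literal -offset indent
vfs.nfsd.default_flexfile=1
.Ed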
.Sh SEE ALSO
.Xr nfsv4 4 ,
.Xr pnfsserver 4 ,
.Xr exports 5 ,
.Xr fstab 5 ,
.Xr rc.conf 5 ,
.Xr nfscbd 8 ,
.Xr nfsd 8 ,
.Xr nfsuserd 8 ,
.Xr pnfsdscopymr 8 ,
.Xr pnfsdsfile 8 ,
.Xr pnfsdskill 8
.Sh BUGS
Linux kernel versions prior to 4.12 only support NFSv3 DSs in their client
and will do all I/O through the MDS.
For Linux 4.12 kernels, support for NFSv4.1 DSs was added, but I have seen
Linux client crashes when testing this client.
For Linux 4.17-rc2 kernels, I have not seen client crashes during testing,
but the client only supports the
.Dq loosely coupled
variant.
To make it work correctly when mounting the
.Fx
server, you must
set the sysctl
.Dq vfs.nfsd.flexlinuxhack
to one so that the server works around
the Linux client driver's limitations.
Without this sysctl being set, there will be access errors, since the Linux
client will use the authenticator in the layout (uid=999, gid=999) and not
the authenticator specified in the RPC header.
.Pp
Linux 5.n kernels appear to be patched so that they use the authenticator
in the RPC header and, as such, the above sysctl should not need to be set.
.Pp
Since the MDS cannot be mirrored, it is a single point of failure just
as a non
.Tn pNFS
server is.