.\" Copyright (c) 2017 Rick Macklem
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd December 20, 2019
.Dt PNFS 4
.Os
.Sh NAME
.Nm pNFS
.Nd NFS Version 4.1 and 4.2 Parallel NFS Protocol
.Sh DESCRIPTION
The NFSv4.1 and NFSv4.2 client and server provide support for the
.Tn pNFS
specification; see
.%T "Network File System (NFS) Version 4 Minor Version 1 Protocol RFC 5661" ,
.%T "Network File System (NFS) Version 4 Minor Version 2 Protocol RFC 7862" and
.%T "Parallel NFS (pNFS) Flexible File Layout RFC 8435" .
A pNFS service separates Read/Write operations from all other NFSv4.1 and
NFSv4.2 operations, which are referred to as Metadata operations.
The Read/Write operations are performed directly on the Data Server (DS)
where the file's data resides, bypassing the NFS server.
All other file operations are performed on the NFS server, which is referred to
as a Metadata Server (MDS).
NFS clients that do not support
.Tn pNFS
perform Read/Write operations on the MDS, which acts as a proxy for the
appropriate DS(s).
.Pp
The NFSv4.1 and NFSv4.2 protocols provide two pieces of information to pNFS
aware clients that allow them to perform Read/Write operations directly on
the DS.
.Pp
The first is DeviceInfo, which is static information defining the DS
server.
The critical piece of information in DeviceInfo for the layout types
supported by FreeBSD is the IP address that is used to perform RPCs on the DS.
It also indicates which version of NFS the DS supports, the I/O size and other
layout-specific information.
In the DeviceInfo, there is a DeviceID which, for the FreeBSD server,
is unique to the DS configuration
and changes whenever the
.Xr nfsd 8
daemon is restarted or the server is rebooted.
.Pp
The second is the layout, which is per file and references the DeviceInfo
to use via the DeviceID.
It is for a byte range of a file and is either Read or Read/Write.
For the FreeBSD server, a layout covers all bytes of a file.
A layout may be recalled by the MDS using a LayoutRecall callback.
When a client returns a layout via the LayoutReturn operation, it can
indicate that error(s) were encountered while doing I/O on the DS,
at least for certain layout types such as the Flexible File Layout.
.Pp
The FreeBSD client and server support two layout types.
.Pp
The File Layout is described in RFC 5661 and uses the NFSv4.1 or NFSv4.2 protocol
to perform I/O on the DS.
It does not support client-aware DS mirroring and, as such,
the FreeBSD server only provides File Layout support for non-mirrored
configurations.
.Pp
The Flexible File Layout allows the use of the NFSv3, NFSv4.0, NFSv4.1 or
NFSv4.2 protocol to perform I/O on the DS and does support client-aware
mirroring.
As such, the FreeBSD server uses Flexible File Layout layouts for the
mirrored DS configurations.
The FreeBSD server supports the
.Dq tightly coupled
variant and all DSs allow use of the
NFSv4.2 or NFSv4.1 protocol for I/O operations.
Clients that support the Flexible File Layout will do writes and commits
to all DS mirrors in the mirror set.
.Pp
A FreeBSD pNFS service consists of a single MDS server plus one or more
DS servers, all of which are FreeBSD systems.
For a non-mirrored configuration, the FreeBSD server will issue File Layout
layouts by default.
However, that default can be set to the Flexible File Layout by setting the
.Xr sysctl 8
variable
.Dq vfs.nfsd.default_flexfile
to one.
Mirrored server configurations will only issue Flexible File Layouts.
.Tn pNFS
clients mount the MDS as they would a single NFS server.
.Pp
A FreeBSD
.Tn pNFS
client must be running the
.Xr nfscbd 8
daemon and use the mount options
.Dq nfsv4,minorversion=2,pnfs
or
.Dq nfsv4,minorversion=1,pnfs .
.Pp
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty.
As such, if you look at the exported tree directly on the MDS server
(not via an NFS mount), the files will all be of size zero.
Each of these files will also have two extended attributes in the system
attribute name space:
.Bd -literal -offset indent
pnfsd.dsfile - This extended attribute stores the information that the
               MDS needs to find the data file on the DS(s) for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime,
               ModifyTime, Change and SpaceUsed attributes for the file.
.Ed
.Pp
For each regular (VREG) file, the MDS creates a data file on one
(or on N of them for the mirrored case, where N is the mirror_level)
of the DS(s) where the file's data will be stored.
The name of this file is
the file handle of the file on the MDS in hexadecimal at the time of file creation.
The data file will have the same file ownership, mode and NFSv4 ACL
(if ACLs are enabled for the file system) as the file on the MDS, so that
permission checking can be done on the DS.
This is referred to as
.Dq tightly coupled
for the Flexible File Layout.
.Pp
For
.Tn pNFS
aware clients, the service generates File Layout
or Flexible File Layout
layouts and associated DeviceInfo.
For non-pNFS aware NFS clients, the pNFS service appears just like a normal
NFS service.
For the non-pNFS aware client, the MDS will perform I/O operations on the
appropriate DS(s), acting as
a proxy for the non-pNFS aware client.
This is also true for NFSv3 and NFSv4.0 mounts, since these are always non-pNFS
aware.
.Pp
It is possible to assign a DS to an MDS exported file system so that it will
store data for files on the MDS exported file system.
If a DS is not assigned to an MDS exported file system, it will store data
for files on all exported file systems on the MDS.
.Pp
If mirroring is enabled, the pNFS service will continue to function when
DS(s) have failed, so long as there is at least one DS still operational
that stores data for files on all of the MDS exported file systems.
After a disabled mirrored DS is repaired, it is possible to recover the DS
as a mirror while the pNFS service continues to function.
.Pp
See
.Xr pnfsserver 4
for information on how to set up a FreeBSD pNFS service.
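.Pp
The client requirements and the MDS sysctl described above might be
applied with commands along the following lines.
This is a minimal sketch, not a complete procedure:
the host name
.Dq mds.example.com
and the paths
.Pa /export
and
.Pa /mnt
are hypothetical placeholders, and a given site may need additional steps.
.Bd -literal -offset indent
# On a FreeBSD pNFS client: enable and start nfscbd(8),
# then mount the MDS with pNFS enabled.
# (mds.example.com, /export and /mnt are hypothetical.)
sysrc nfscbd_enable=YES
service nfscbd start
mount -t nfs -o nfsv4,minorversion=2,pnfs mds.example.com:/export /mnt
# Optional, on the MDS of a non-mirrored configuration:
# issue Flexible File Layouts instead of File Layouts.
sysctl vfs.nfsd.default_flexfile=1
.Ed
.Pp
The client mount is the same as for a single NFS server, apart from the
.Dq pnfs
option, and could equally be expressed as an
.Xr fstab 5
entry.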
.Sh SEE ALSO
.Xr nfsv4 4 ,
.Xr pnfsserver 4 ,
.Xr exports 5 ,
.Xr fstab 5 ,
.Xr rc.conf 5 ,
.Xr nfscbd 8 ,
.Xr nfsd 8 ,
.Xr nfsuserd 8 ,
.Xr pnfsdscopymr 8 ,
.Xr pnfsdsfile 8 ,
.Xr pnfsdskill 8
.Sh BUGS
Linux kernel versions prior to 4.12 only support NFSv3 DSs in their client
and will do all I/O through the MDS.
For Linux 4.12 kernels, support for NFSv4.1 DSs was added, but I have seen
Linux client crashes when testing this client.
For Linux 4.17-rc2 kernels, I have not seen client crashes during testing,
but the client only supports the
.Dq loosely coupled
variant.
To make it work correctly when mounting the FreeBSD server, you must
set the sysctl
.Dq vfs.nfsd.flexlinuxhack
to one so that it works around
the Linux client driver's limitations.
Without this sysctl being set, there will be access errors, since the Linux
client will use the authenticator in the layout (uid=999, gid=999) and not
the authenticator specified in the RPC header.
.Pp
Linux 5.n kernels appear to be patched so that they use the authenticator
in the RPC header and, as such, the above sysctl should not need to be set.
.Pp
Since the MDS cannot be mirrored, it is a single point of failure just
as a non-pNFS server is.
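.Pp
For example, on an MDS that must serve Linux clients which only support the
.Dq loosely coupled
variant, the workaround might be enabled as sketched below.
Whether it is needed at all depends on the Linux kernel version in use,
as described above.
.Bd -literal -offset indent
# On the MDS: work around Linux clients that use the
# authenticator in the layout instead of the RPC header.
sysctl vfs.nfsd.flexlinuxhack=1
# Assumption: to keep the setting across reboots, the same
# variable can also be listed in /etc/sysctl.conf:
#   vfs.nfsd.flexlinuxhack=1
.Ed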