.\" Copyright (c) 2017 Rick Macklem
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd March 26, 2018
.Dt PNFS 4
.Os
.Sh NAME
.Nm pNFS
.Nd NFS Version 4.1 Parallel NFS Protocol
.Sh DESCRIPTION
The NFSv4.1 client and server provide support for the
.Tn pNFS
specification; see
.%T "Network File System (NFS) Version 4 Minor Version 1 Protocol RFC 5661" .
A pNFS service separates Read/Write operations from all other NFSv4.1
operations, which are referred to as Metadata operations.
The Read/Write operations are performed directly on the Data Server (DS)
where the file's data resides, bypassing the NFS server.
All other file operations are performed on the NFS server, which is referred to
as a Metadata Server (MDS).
NFS clients that do not support
.Tn pNFS
perform Read/Write operations on the MDS, which acts as a proxy for the
appropriate DS(s).
.Pp
The NFSv4.1 protocol provides two pieces of information to pNFS aware
clients that allow them to perform Read/Write operations directly on
the DS.
.Pp
The first is DeviceInfo, which is static information defining the DS
server.
The critical piece of information in DeviceInfo for the layout types
supported by FreeBSD is the IP address that is used to perform RPCs on the DS.
It also indicates which version of NFS the DS supports, I/O size and other
layout specific information.
In the DeviceInfo, there is a DeviceID which, for the FreeBSD server,
is unique to the DS configuration
and changes whenever the
.Xr nfsd 8
daemon is restarted or the server is rebooted.
.Pp
The second is the layout, which is per file and references the DeviceInfo
to use via the DeviceID.
It is for a byte range of a file and is either Read or Read/Write.
For the FreeBSD server, a layout covers all bytes of a file.
A layout may be recalled by the MDS using a LayoutRecall callback.
When a client returns a layout via the LayoutReturn operation it can
indicate that error(s) were encountered while doing I/O on the DS.
.Pp
The FreeBSD client and server support two layout types.
.Pp
The File Layout is described in RFC 5661 and uses the NFSv4.1 protocol
to perform I/O on the DS.
It does not support client aware DS mirroring and, as such,
the FreeBSD server only provides File Layout support for non-mirrored
configurations.
.Pp
The Flexible File Layout allows the use of the NFSv3, NFSv4.0 or NFSv4.1
protocol to perform I/O on the DS and does support client aware mirroring.
As such, the FreeBSD server uses Flexible File Layout layouts for the
mirrored DS configurations.
The FreeBSD server supports the
.Dq tightly coupled
variant and all DSs use the
NFSv4.1 protocol for I/O operations.
Clients that support the Flexible File Layout will do writes and commits
to all DS mirrors in the mirror set.
.Pp
A FreeBSD pNFS service consists of a single MDS server plus one or more
DS servers, all of which are FreeBSD systems.
For a non-mirrored configuration, the FreeBSD server will issue File Layout
layouts by default.
However, that default can be set to the Flexible File Layout by setting the
.Xr sysctl 8
variable
.Dq vfs.nfsd.default_flexfile
to one.
Mirrored server configurations will only issue Flexible File Layouts.
.Tn pNFS
clients mount the MDS as they would a single NFS server.
.Pp
A FreeBSD
.Tn pNFS
client must be running the
.Xr nfscbd 8
daemon and use the mount options
.Dq nfsv4,minorversion=1,pnfs .
.Pp
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty.
As such, if you look at the exported tree directly
on the MDS server (not via an NFS mount), the files will all be of size zero.
Each of these files will also have two extended attributes in the system
attribute name space:
.Bd -literal -offset indent
pnfsd.dsfile - This extended attribute stores the information that the
    MDS needs to find the data file on a DS for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime,
    ModifyTime and Change attributes for the file.
.Ed
.Pp
For each regular (VREG) file, the MDS creates a data file on one
(or on N of them for the mirrored case, where N is the mirror_level)
of the DSs where the file's data will be stored.
The name of this file is
the file handle of the file on the MDS in hexadecimal at time of file creation.
The data file will have the same file ownership, mode and NFSv4 ACL
(if ACLs are enabled for the file system) as the file on the MDS, so that
permission checking can be done on the DS.
This is referred to as
.Dq tightly coupled
for the Flexible File Layout.
.Pp
For
.Tn pNFS
aware clients, the service generates File Layout
or Flexible File Layout
layouts and associated DeviceInfo.
For non-pNFS aware NFS clients, the pNFS service appears just like a normal
NFS service.
For the non-pNFS aware client, the MDS will perform I/O operations on the
appropriate DS(s), acting as
a proxy for the non-pNFS aware client.
This is also true for NFSv3 and NFSv4.0 mounts, since these are always non-pNFS
aware.
.Pp
See
.Bd -literal -offset indent
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
.Ed
.sp
for information on how to set up a FreeBSD pNFS service.
.Sh SEE ALSO
.Xr nfsv4 4 ,
.Xr exports 5 ,
.Xr fstab 5 ,
.Xr rc.conf 5 ,
.Xr nfscbd 8 ,
.Xr nfsd 8 ,
.Xr nfsuserd 8 ,
.Xr pnfsdscopymr 8 ,
.Xr pnfsdsfile 8 ,
.Xr pnfsdskill 8
.Sh BUGS
Linux kernel versions prior to 4.12 only support NFSv3 DSs in their client
and will do all I/O through the MDS.
For Linux 4.12 kernels, support for NFSv4.1 DSs was added, but I have seen
Linux client crashes when testing this client.
For Linux 4.17-rc2 kernels, I have not seen client crashes during testing,
but it only supports the
.Dq loosely coupled
variant.
To make it work correctly when mounting the FreeBSD server, you must either
patch the Flexible File Layout client driver with a patch like:
.Bd -literal -offset indent
http://people.freebsd.org/~rmacklem/flexfile.patch
.Ed
.sp
or set the sysctl
.Dq vfs.nfsd.flexlinuxhack
to one so that it works around
the Linux client driver's limitations.
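.Pp
As a minimal sketch of the sysctl workaround, assuming the variable can be
changed at run time with
.Xr sysctl 8
on the MDS (this page does not confirm that), it might be enabled with:
.Bd -literal -offset indent
# Run as root on the MDS.
# Assumption: vfs.nfsd.flexlinuxhack is writable at run time.
sysctl vfs.nfsd.flexlinuxhack=1
.Ed
.sp
Under the same assumption, an equivalent entry in
.Xr sysctl.conf 5
would be one way to keep the setting across reboots.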
.Pp
Since the MDS cannot be mirrored, it is a single point of failure just
as a non
.Tn pNFS
server is.