.\" Copyright (c) 2017 Rick Macklem
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd December 20, 2019
.Dt PNFS 4
.Os
.Sh NAME
.Nm pNFS
.Nd NFS Version 4.1 and 4.2 Parallel NFS Protocol
.Sh DESCRIPTION
The NFSv4.1 and NFSv4.2 client and server provide support for the
.Tn pNFS
specification; see
.%T "Network File System (NFS) Version 4 Minor Version 1 Protocol RFC 5661" ,
.%T "Network File System (NFS) Version 4 Minor Version 2 Protocol RFC 7862" and
.%T "Parallel NFS (pNFS) Flexible File Layout RFC 8435" .
A pNFS service separates Read/Write operations from all other NFSv4.1 and
NFSv4.2 operations, which are referred to as Metadata operations.
The Read/Write operations are performed directly on the Data Server (DS)
where the file's data resides, bypassing the NFS server.
All other file operations are performed on the NFS server, which is referred to
as a Metadata Server (MDS).
NFS clients that do not support
.Tn pNFS
perform Read/Write operations on the MDS, which acts as a proxy for the
appropriate DS(s).
.Pp
The NFSv4.1 and NFSv4.2 protocols provide two pieces of information to pNFS
aware clients that allow them to perform Read/Write operations directly on
the DS.
.Pp
The first is DeviceInfo, which is static information defining the DS
server.
The critical piece of information in DeviceInfo for the layout types
supported by
.Fx
is the IP address that is used to perform RPCs on the DS.
It also indicates which version of NFS the DS supports, the I/O size and
other layout-specific information.
In the DeviceInfo, there is a DeviceID which, for the
.Fx
server,
is unique to the DS configuration
and changes whenever the
.Xr nfsd 8
daemon is restarted or the server is rebooted.
.Pp
The second is the layout, which is per file and references the DeviceInfo
to use via the DeviceID.
It is for a byte range of a file and is either Read or Read/Write.
For the
.Fx
server, a layout covers all bytes of a file.
A layout may be recalled by the MDS using a LayoutRecall callback.
When a client returns a layout via the LayoutReturn operation, it can
indicate that error(s) were encountered while doing I/O on the DS,
at least for certain layout types such as the Flexible File Layout.
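.Pp
The relationship between DeviceInfo and a layout can be sketched as a small data model.
The following Python fragment is illustrative only: the class names, field names and the
.Fn ds_for_io
helper are hypothetical and do not correspond to the XDR definitions in RFC 5661 or RFC 8435, nor to any
.Fx
source code.
It shows a client resolving the DS address for a layout through its DeviceID and rejecting a write under a Read-only layout.
.Bd -literal -offset indent
```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class DeviceInfo:
    """Static description of one DS (hypothetical field names)."""
    device_id: bytes   # on FreeBSD, changes when nfsd is restarted
    ds_address: str    # IP address used to perform RPCs on the DS
    nfs_version: str   # NFS version the DS supports, e.g. "4.1"
    io_size: int       # I/O size in bytes


@dataclass(frozen=True)
class Layout:
    """Per-file layout that refers to a DeviceInfo via its DeviceID."""
    device_id: bytes
    offset: int
    length: int        # -1 means "to end of file" in this sketch
    read_write: bool   # False for a Read-only layout


def ds_for_io(layout: Layout, devices: Dict[bytes, DeviceInfo],
              write: bool) -> str:
    """Return the DS address to use for I/O covered by this layout."""
    if write and not layout.read_write:
        raise PermissionError("layout only permits Read operations")
    info = devices.get(layout.device_id)
    if info is None:
        # Unknown DeviceID, e.g. stale after an nfsd restart; a real
        # client might fetch new DeviceInfo or do the I/O via the MDS.
        raise LookupError("unknown DeviceID")
    return info.ds_address
```
.Ed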
.Pp
The
.Fx
client and server support two layout types.
.Pp
The File Layout is described in RFC 5661 and uses the NFSv4.1 or NFSv4.2 protocol
to perform I/O on the DS.
It does not support client aware DS mirroring and, as such,
the
.Fx
server only provides File Layout support for non-mirrored
configurations.
.Pp
The Flexible File Layout allows the use of the NFSv3, NFSv4.0, NFSv4.1 or
NFSv4.2 protocol to perform I/O on the DS and does support client aware
mirroring.
As such, the
.Fx
server issues Flexible File Layouts for
mirrored DS configurations.
The
.Fx
server supports the
.Dq tightly coupled
variant and all DSs allow use of the
NFSv4.2 or NFSv4.1 protocol for I/O operations.
Clients that support the Flexible File Layout will do writes and commits
to all DS mirrors in the mirror set.
105203c00b3SRick Macklemto all DS mirrors in the mirror set.
106203c00b3SRick Macklem.Pp
1074c87085dSGordon BerglingA
1084c87085dSGordon Bergling.Fx
1094c87085dSGordon BerglingpNFS service consists of a single MDS server plus one or more
1104c87085dSGordon BerglingDS servers, all of which are
1114c87085dSGordon Bergling.Fx
1124c87085dSGordon Berglingsystems.
1134c87085dSGordon BerglingFor a non-mirrored configuration, the
1144c87085dSGordon Bergling.Fx
1154c87085dSGordon Berglingserver will issue File Layout
116203c00b3SRick Macklemlayouts by default.
117203c00b3SRick MacklemHowever that default can be set to the Flexible File Layout by setting the
118e9e615c8SJens Schweikhardt.Xr sysctl 8
119eec5cbdeSRick Macklemsysctl
120eec5cbdeSRick Macklem.Dq vfs.nfsd.default_flexfile
121eec5cbdeSRick Macklemto one.
122203c00b3SRick MacklemMirrored server configurations will only issue Flexible File Layouts.
123203c00b3SRick Macklem.Tn pNFS
124203c00b3SRick Macklemclients mount the MDS as they would a single NFS server.
125203c00b3SRick Macklem.Pp
1264c87085dSGordon BerglingA
1274c87085dSGordon Bergling.Fx
128203c00b3SRick Macklem.Tn pNFS
129203c00b3SRick Macklemclient must be running the
130203c00b3SRick Macklem.Xr nfscbd 8
131203c00b3SRick Macklemdaemon and use the mount options
132452588d3SRick Macklem.Dq nfsv4,minorversion=2,pnfs or
133203c00b3SRick Macklem.Dq nfsv4,minorversion=1,pnfs .
.Pp
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty.
As such, if you look at the exported tree directly on the MDS
(not via an NFS mount), the regular files will all be of size zero.
Each of these files will also have two extended attributes in the system
attribute name space:
.Bd -literal -offset indent
pnfsd.dsfile - This extended attribute stores the information that the
    MDS needs to find the data file on the DS(s) for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime,
    ModifyTime, Change and SpaceUsed attributes for the file.
.Ed
.Pp
For each regular (VREG) file, the MDS creates a data file on one
(or on N of them for the mirrored case, where N is the mirror_level)
of the DS(s) where the file's data will be stored.
The name of this data file is
the MDS file handle of the file, in hexadecimal, at the time of file creation.
The data file will have the same file ownership, mode and NFSv4 ACL
(if ACLs are enabled for the file system) as the file on the MDS, so that
permission checking can be done on the DS.
This is referred to as
.Dq tightly coupled
for the Flexible File Layout.
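.Pp
The naming rule for DS data files amounts to a hexadecimal encoding of the MDS file handle bytes.
The Python sketch below assumes lower-case hex digits with no separators, which the text above does not specify; the
.Fn ds_data_file_name
and
.Fn mds_fh_from_name
helpers are hypothetical.
.Bd -literal -offset indent
```python
def ds_data_file_name(mds_fh: bytes) -> str:
    """Name of the DS data file for an MDS file handle (lower-case hex)."""
    return mds_fh.hex()


def mds_fh_from_name(name: str) -> bytes:
    """Recover the file handle bytes recorded in a data file name."""
    return bytes.fromhex(name)
```
.Ed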
.Pp
For
.Tn pNFS
aware clients, the service generates File Layout
or Flexible File Layout
layouts and associated DeviceInfo.
For non-pNFS aware NFS clients, the pNFS service appears just like a normal
NFS service.
For the non-pNFS aware client, the MDS will perform I/O operations on the
appropriate DS(s), acting as
a proxy for the non-pNFS aware client.
This is also true for NFSv3 and NFSv4.0 mounts, since these are always non-pNFS
aware.
172203c00b3SRick Macklemaware.
173203c00b3SRick Macklem.Pp
174eec5cbdeSRick MacklemIt is possible to assign a DS to an MDS exported file system so that it will
175eec5cbdeSRick Macklemstore data for files on the MDS exported file system.
176eec5cbdeSRick MacklemIf a DS is not assigned to an MDS exported file system, it will store data
177eec5cbdeSRick Macklemfor files on all exported file systems on the MDS.
178eec5cbdeSRick Macklem.Pp
179eec5cbdeSRick MacklemIf mirroring is enabled, the pNFS service will continue to function when
180eec5cbdeSRick MacklemDS(s) have failed, so long is there is at least one DS still operational
181eec5cbdeSRick Macklemthat stores data for files on all of the MDS exported file systems.
182eec5cbdeSRick MacklemAfter a disabled mirrored DS is repaired, it is possible to recover the DS
183eec5cbdeSRick Macklemas a mirror while the pNFS service continues to function.
184eec5cbdeSRick Macklem.Pp
185203c00b3SRick MacklemSee
186b11b7059SRick Macklem.Xr pnfsserver 4
1874c87085dSGordon Berglingfor information on how to set up a
1884c87085dSGordon Bergling.Fx
1894c87085dSGordon BerglingpNFS service.
.Sh SEE ALSO
.Xr nfsv4 4 ,
.Xr pnfsserver 4 ,
.Xr exports 5 ,
.Xr fstab 5 ,
.Xr rc.conf 5 ,
.Xr nfscbd 8 ,
.Xr nfsd 8 ,
.Xr nfsuserd 8 ,
.Xr pnfsdscopymr 8 ,
.Xr pnfsdsfile 8 ,
.Xr pnfsdskill 8
.Sh BUGS
Linux kernel versions prior to 4.12 only support NFSv3 DSs in their client
and will therefore do all I/O through the MDS.
For Linux 4.12 kernels, support for NFSv4.1 DSs was added, but I have seen
Linux client crashes when testing this client.
For Linux 4.17-rc2 kernels, I have not seen client crashes during testing,
but it only supports the
.Dq loosely coupled
variant.
To make it work correctly when mounting the
.Fx
server, you must
set the sysctl
.Dq vfs.nfsd.flexlinuxhack
to one so that it works around
the Linux client driver's limitations.
Without this sysctl being set, there will be access errors, since the Linux
client will use the authenticator in the layout (uid=999, gid=999) and not
the authenticator specified in the RPC header.
.Pp
Linux 5.n kernels appear to be patched so that they use the authenticator
in the RPC header and, as such, the above sysctl should not need to be set.
.Pp
Since the MDS cannot be mirrored, it is a single point of failure, just
as a non-pNFS server is.
228203c00b3SRick Macklemserver is.
229