.\" Copyright (c) 2017 Rick Macklem
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd December 20, 2019
.Dt PNFS 4
.Os
.Sh NAME
.Nm pNFS
.Nd NFS Version 4.1 and 4.2 Parallel NFS Protocol
.Sh DESCRIPTION
The NFSv4.1 and NFSv4.2 client and server provide support for the
.Tn pNFS
specification; see
.%T "Network File System (NFS) Version 4 Minor Version 1 Protocol RFC 5661" ,
.%T "Network File System (NFS) Version 4 Minor Version 2 Protocol RFC 7862" and
.%T "Parallel NFS (pNFS) Flexible File Layout RFC 8435" .
A pNFS service separates Read/Write operations from all other NFSv4.1 and
NFSv4.2 operations, which are referred to as Metadata operations.
The Read/Write operations are performed directly on the Data Server (DS)
where the file's data resides, bypassing the NFS server.
All other file operations are performed on the NFS server, which is referred to
as a Metadata Server (MDS).
NFS clients that do not support
.Tn pNFS
perform Read/Write operations on the MDS, which acts as a proxy for the
appropriate DS(s).
.Pp
The NFSv4.1 and NFSv4.2 protocols provide two pieces of information to pNFS
aware clients that allow them to perform Read/Write operations directly on
the DS.
.Pp
The first is DeviceInfo, which is static information defining the DS
server.
The critical piece of information in DeviceInfo for the layout types
supported by
.Fx
is the IP address that is used to perform RPCs on the DS.
It also indicates which version of NFS the DS supports, the I/O size and other
layout-specific information.
In the DeviceInfo, there is a DeviceID which, for the
.Fx
server,
is unique to the DS configuration
and changes whenever the
.Xr nfsd 8
daemon is restarted or the server is rebooted.
.Pp
The second is the layout, which is per file and references the DeviceInfo
to use via the DeviceID.
It is for a byte range of a file and is either Read or Read/Write.
For the
.Fx
server, a layout covers all bytes of a file.
A layout may be recalled by the MDS using a LayoutRecall callback.
When a client returns a layout via the LayoutReturn operation, it can
indicate that error(s) were encountered while doing I/O on the DS,
at least for certain layout types such as the Flexible File Layout.
.Pp
The
.Fx
client and server support two layout types.
.Pp
The File Layout is described in RFC 5661 and uses the NFSv4.1 or NFSv4.2 protocol
to perform I/O on the DS.
It does not support client-aware DS mirroring and, as such,
the
.Fx
server only provides File Layout support for non-mirrored
configurations.
.Pp
The Flexible File Layout allows the use of the NFSv3, NFSv4.0, NFSv4.1 or
NFSv4.2 protocol to perform I/O on the DS and does support client-aware
mirroring.
As such, the
.Fx
server uses Flexible File Layout layouts for
mirrored DS configurations.
The
.Fx
server supports the
.Dq tightly coupled
variant and all DSs allow use of the
NFSv4.2 or NFSv4.1 protocol for I/O operations.
Clients that support the Flexible File Layout will do writes and commits
to all DS mirrors in the mirror set.
.Pp
A
.Fx
pNFS service consists of a single MDS server plus one or more
DS servers, all of which are
.Fx
systems.
For a non-mirrored configuration, the
.Fx
server will issue File Layout
layouts by default.
However, that default can be changed to the Flexible File Layout by setting the
.Xr sysctl 8
sysctl
.Dq vfs.nfsd.default_flexfile
to one.
Mirrored server configurations will only issue Flexible File Layouts.
.Tn pNFS
clients mount the MDS as they would a single NFS server.
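.Pp
As a sketch, assuming a root shell on the MDS with the
.Xr nfsd 8
kernel support already loaded, the sysctl could be set at run time and
then read back to confirm the new value:
.Bd -literal -offset indent
# sysctl vfs.nfsd.default_flexfile=1
# sysctl vfs.nfsd.default_flexfile
.Ed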
.Pp
A
.Fx
.Tn pNFS
client must be running the
.Xr nfscbd 8
daemon and use the mount options
.Dq nfsv4,minorversion=2,pnfs
or
.Dq nfsv4,minorversion=1,pnfs .
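.Pp
A minimal client-side sketch follows; the server name
.Dq mds.example.com
and the mount point
.Pa /mnt
are hypothetical.
The daemon can be started at boot from
.Xr rc.conf 5
with:
.Bd -literal -offset indent
nfscbd_enable="YES"
.Ed
and the MDS mounted with the required options through an
.Xr fstab 5
entry such as:
.Bd -literal -offset indent
mds.example.com:/ /mnt nfs rw,nfsv4,minorversion=2,pnfs 0 0
.Ed
or by hand with:
.Bd -literal -offset indent
# mount -t nfs -o nfsv4,minorversion=2,pnfs mds.example.com:/ /mnt
.Ed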
.Pp
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty.
As such, if you look at the exported tree directly on the MDS
(not via an NFS mount), the files will all be of size zero.
Each of these files will also have two extended attributes in the system
attribute namespace:
.Bd -literal -offset indent
pnfsd.dsfile - This extended attribute stores the information that the
    MDS needs to find the data file on the DS(s) for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime,
    ModifyTime, Change and SpaceUsed attributes for the file.
.Ed
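.Pp
These attributes can be inspected from a root shell on the MDS with the
.Xr lsextattr 8
and
.Xr getextattr 8
utilities.
In this sketch the path
.Pa /export/dir/file
is hypothetical, and
.Fl x
is used because the attribute values are binary and are easier to
read as hexadecimal:
.Bd -literal -offset indent
# lsextattr system /export/dir/file
# getextattr -x system pnfsd.dsattr /export/dir/file
.Ed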
.Pp
For each regular (VREG) file, the MDS creates a data file on one
(or on N of them for the mirrored case, where N is the mirror_level)
of the DS(s) where the file's data will be stored.
The name of this file is
the file handle of the file on the MDS, in hexadecimal, at the time of file creation.
The data file will have the same file ownership, mode and NFSv4 ACL
(if ACLs are enabled for the file system) as the file on the MDS, so that
permission checking can be done on the DS.
This is referred to as
.Dq tightly coupled
for the Flexible File Layout.
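.Pp
To see which DS(s) hold the data file for a given MDS file, and what that
data file is named, the
.Xr pnfsdsfile 8
utility can be run on the MDS.
A minimal invocation, with a hypothetical path, would be:
.Bd -literal -offset indent
# pnfsdsfile /export/dir/file
.Ed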
.Pp
For
.Tn pNFS
aware clients, the service generates File Layout
or Flexible File Layout
layouts and associated DeviceInfo.
For non-pNFS aware NFS clients, the pNFS service appears just like a normal
NFS service.
For the non-pNFS aware client, the MDS will perform I/O operations on the
appropriate DS(s), acting as
a proxy for the non-pNFS aware client.
This is also true for NFSv3 and NFSv4.0 mounts, since these are always non-pNFS
aware.
.Pp
It is possible to assign a DS to an MDS exported file system so that it will
store data for files on the MDS exported file system.
If a DS is not assigned to an MDS exported file system, it will store data
for files on all exported file systems on the MDS.
.Pp
If mirroring is enabled, the pNFS service will continue to function when
DS(s) have failed, so long as there is at least one DS still operational
that stores data for files on all of the MDS exported file systems.
After a disabled mirrored DS is repaired, it is possible to recover the DS
as a mirror while the pNFS service continues to function.
.Pp
See
.Xr pnfsserver 4
for information on how to set up a
.Fx
pNFS service.
.Sh SEE ALSO
.Xr nfsv4 4 ,
.Xr pnfsserver 4 ,
.Xr exports 5 ,
.Xr fstab 5 ,
.Xr rc.conf 5 ,
.Xr nfscbd 8 ,
.Xr nfsd 8 ,
.Xr nfsuserd 8 ,
.Xr pnfsdscopymr 8 ,
.Xr pnfsdsfile 8 ,
.Xr pnfsdskill 8
.Sh BUGS
Linux kernel versions prior to 4.12 only support NFSv3 DSs in their client
and will do all I/O through the MDS.
For Linux 4.12 kernels, support for NFSv4.1 DSs was added, but I have seen
Linux client crashes when testing this client.
For Linux 4.17-rc2 kernels, I have not seen client crashes during testing,
but the client only supports the
.Dq loosely coupled
variant.
To make it work correctly when mounting the
.Fx
server, you must
set the sysctl
.Dq vfs.nfsd.flexlinuxhack
to one so that it works around
the Linux client driver's limitations.
Without this sysctl being set, there will be access errors, since the Linux
client will use the authenticator in the layout (uid=999, gid=999) and not
the authenticator specified in the RPC header.
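.Pp
As a sketch, assuming a root shell on the
.Fx
MDS, the workaround could be enabled at run time with:
.Bd -literal -offset indent
# sysctl vfs.nfsd.flexlinuxhack=1
.Ed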
.Pp
Linux 5.n kernels appear to be patched so that they use the authenticator
in the RPC header and, as such, the above sysctl should not need to be set.
.Pp
Since the MDS cannot be mirrored, it is a single point of failure, just
as a non-pNFS server is.
231