.. SPDX-License-Identifier: GPL-2.0

Written by: Neil Brown
Please see MAINTAINERS file for where to send questions.

Overlay Filesystem
==================

This document describes a prototype for a new approach to providing
overlay-filesystem functionality in Linux (sometimes referred to as
union-filesystems).  An overlay-filesystem tries to present a
filesystem which is the result of overlaying one filesystem on top
of the other.


Overlay objects
---------------

The overlay filesystem approach is 'hybrid', because the objects that
appear in the filesystem do not always appear to belong to that filesystem.
In many cases, an object accessed in the union will be indistinguishable
from accessing the corresponding object from the original filesystem.
This is most obvious from the 'st_dev' field returned by stat(2).

While directories will report an st_dev from the overlay-filesystem,
non-directory objects may report an st_dev from the lower filesystem or
upper filesystem that is providing the object.  Similarly st_ino will
only be unique when combined with st_dev, and both of these can change
over the lifetime of a non-directory object.  Many applications and
tools ignore these values and will not be affected.

In the special case of all overlay layers on the same underlying
filesystem, all objects will report an st_dev from the overlay
filesystem and st_ino from the underlying filesystem.  This will
make the overlay mount more compliant with filesystem scanners and
overlay objects will be distinguishable from the corresponding
objects in the original filesystem.

On 64-bit systems, even if not all overlay layers are on the same
underlying filesystem, the same compliant behavior could be achieved
with the "xino" feature.  The "xino" feature composes a unique object
identifier from the real object st_ino and an underlying fsid number.
The "xino" feature uses the high inode number bits for fsid, because the
underlying filesystems rarely use the high inode number bits.  In case
the underlying inode number does overflow into the high xino bits, overlay
filesystem will fall back to the non xino behavior for that inode.

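The composition described above can be sketched as follows.  This is only
an illustrative model: the number of high bits reserved for the fsid
(``xino_bits``) and the exact bit layout are assumptions made for the
example, not values taken from the kernel.

```python
INO_BITS = 64  # the xino scheme relies on 64-bit inode numbers


def compose_xino(real_ino, fsid, xino_bits=2):
    """Return a combined inode number, or None if xino cannot be used.

    xino_bits is a hypothetical count of high bits reserved for the fsid.
    """
    shift = INO_BITS - xino_bits
    if fsid >= (1 << xino_bits):
        raise ValueError("fsid does not fit in the reserved bits")
    if real_ino >> shift:
        # The real inode number overflows into the reserved high bits:
        # fall back to the non-xino behavior for this inode.
        return None
    return (fsid << shift) | real_ino
```

With these assumed parameters, an inode number that already uses the high
bits makes the function return ``None``, which models the per-inode
fallback.
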
The "xino" feature can be enabled with the "-o xino=on" overlay mount option.
If all underlying filesystems support NFS file handles, the value of st_ino
for overlay filesystem objects is not only unique, but also persistent over
the lifetime of the filesystem.  The "-o xino=auto" overlay mount option
enables the "xino" feature only if the persistent st_ino requirement is met.

The following table summarizes what can be expected in different overlay
configurations.

Inode properties
````````````````

+--------------+------------+------------+-----------------+----------------+
|Configuration | Persistent | Uniform    | st_ino == d_ino | d_ino == i_ino |
|              | st_ino     | st_dev     |                 | [*]            |
+==============+=====+======+=====+======+========+========+========+=======+
|              | dir | !dir | dir | !dir |  dir   |  !dir  |  dir   | !dir  |
+--------------+-----+------+-----+------+--------+--------+--------+-------+
| All layers   |  Y  |  Y   |  Y  |  Y   |  Y     |   Y    |  Y     |  Y    |
| on same fs   |     |      |     |      |        |        |        |       |
+--------------+-----+------+-----+------+--------+--------+--------+-------+
| Layers not   |  N  |  N   |  Y  |  N   |  N     |   Y    |  N     |  Y    |
| on same fs,  |     |      |     |      |        |        |        |       |
| xino=off     |     |      |     |      |        |        |        |       |
+--------------+-----+------+-----+------+--------+--------+--------+-------+
| xino=on/auto |  Y  |  Y   |  Y  |  Y   |  Y     |   Y    |  Y     |  Y    |
+--------------+-----+------+-----+------+--------+--------+--------+-------+
| xino=on/auto,|  N  |  N   |  Y  |  N   |  N     |   Y    |  N     |  Y    |
| ino overflow |     |      |     |      |        |        |        |       |
+--------------+-----+------+-----+------+--------+--------+--------+-------+

[*] nfsd v3 readdirplus verifies d_ino == i_ino. i_ino is exposed via several
/proc files, such as /proc/locks and /proc/self/fdinfo/<fd> of an inotify
file descriptor.

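The "st_ino == d_ino" column can be checked from userspace.  The following
is a minimal sketch, assuming Python's ``os.scandir()``, whose
``DirEntry.inode()`` reports the inode number seen by readdir; the helper
name ``find_ino_mismatches`` is made up for this example.

```python
import os


def find_ino_mismatches(directory):
    """Return names whose readdir inode (d_ino) differs from lstat st_ino."""
    mismatches = []
    with os.scandir(directory) as entries:
        for entry in entries:
            d_ino = entry.inode()                  # as reported by readdir
            st_ino = os.lstat(entry.path).st_ino   # as reported by lstat(2)
            if d_ino != st_ino:
                mismatches.append(entry.name)
    return sorted(mismatches)
```

An empty result for a directory means every entry in it satisfies
st_ino == d_ino at the time of the scan.
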
Upper and Lower
---------------

An overlay filesystem combines two filesystems - an 'upper' filesystem
and a 'lower' filesystem.  When a name exists in both filesystems, the
object in the 'upper' filesystem is visible while the object in the
'lower' filesystem is either hidden or, in the case of directories,
merged with the 'upper' object.

It would be more correct to refer to an upper and lower 'directory
tree' rather than 'filesystem' as it is quite possible for both
directory trees to be in the same filesystem and there is no
requirement that the root of a filesystem be given for either upper or
lower.

A wide range of filesystems supported by Linux can be the lower filesystem,
but not all filesystems that are mountable by Linux have the features
needed for OverlayFS to work.  The lower filesystem does not need to be
writable.  The lower filesystem can even be another overlayfs.  The upper
filesystem will normally be writable and, if it is, it must support the
creation of trusted.* and/or user.* extended attributes, and must provide
valid d_type in readdir responses, so NFS is not suitable.

A read-only overlay of two read-only filesystems may use any
filesystem type.

Directories
-----------

Overlaying mainly involves directories.  If a given name appears in both
upper and lower filesystems and refers to a non-directory in either,
then the lower object is hidden - the name refers only to the upper
object.

Where both upper and lower objects are directories, a merged directory
is formed.

At mount time, the two directories given as mount options "lowerdir" and
"upperdir" are combined into a merged directory::

  mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,\
  workdir=/work /merged

The "workdir" needs to be an empty directory on the same filesystem
as upperdir.

Then whenever a lookup is requested in such a merged directory, the
lookup is performed in each actual directory and the combined result
is cached in the dentry belonging to the overlay filesystem.  If both
actual lookups find directories, both are stored and a merged
directory is created, otherwise only one is stored: the upper if it
exists, else the lower.

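The lookup rule above can be modelled in a few lines.  This is a toy
sketch, not kernel code: the dictionaries mapping names to ``"dir"`` or
``"file"`` and the function name are assumptions made for illustration.

```python
def overlay_lookup(name, upper, lower):
    """Model a lookup of `name` in a merged directory.

    `upper` and `lower` map names to "dir" or "file" (hypothetical model).
    Returns the list of layers stored for the overlay dentry.
    """
    in_upper = upper.get(name)
    in_lower = lower.get(name)
    if in_upper == "dir" and in_lower == "dir":
        return ["upper", "lower"]   # both stored: a merged directory
    if in_upper is not None:
        return ["upper"]            # the upper object hides the lower one
    if in_lower is not None:
        return ["lower"]
    return []                       # the name exists in neither layer
```
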
Only the lists of names from directories are merged.  Other content
such as metadata and extended attributes are reported for the upper
directory only.  These attributes of the lower directory are hidden.

Whiteouts and opaque directories
--------------------------------

In order to support rm and rmdir without changing the lower
filesystem, an overlay filesystem needs to record in the upper filesystem
that files have been removed.  This is done using whiteouts and opaque
directories (non-directories are always opaque).

A whiteout is created as a character device with 0/0 device number or
as a zero-size regular file with the xattr "trusted.overlay.whiteout".

When a whiteout is found in the upper level of a merged directory, any
matching name in the lower level is ignored, and the whiteout itself
is also hidden.

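When inspecting an upper directory tree directly (not through the merged
mount), the character-device form of a whiteout can be recognised from
its stat fields.  The sketch below covers only that form; the xattr-based
form is not checked, and the function names are made up for the example.

```python
import os
import stat


def is_chardev_whiteout(st_mode, st_rdev):
    """True if the stat fields describe a character device numbered 0/0."""
    return (stat.S_ISCHR(st_mode)
            and os.major(st_rdev) == 0
            and os.minor(st_rdev) == 0)


def is_whiteout(path):
    """Check an entry in an upper directory tree (not the merged view)."""
    st = os.lstat(path)
    return is_chardev_whiteout(st.st_mode, st.st_rdev)
```
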
A directory is made opaque by setting the xattr "trusted.overlay.opaque"
to "y".  Where the upper filesystem contains an opaque directory, any
directory in the lower filesystem with the same name is ignored.

An opaque directory should not contain any whiteouts, because they do not
serve any purpose.  A merge directory containing regular files with the xattr
"trusted.overlay.whiteout" should be additionally marked by setting the xattr
"trusted.overlay.opaque" to "x" on the merge directory itself.
This is needed to avoid the overhead of checking the "trusted.overlay.whiteout"
xattr on all entries during readdir in the common case.

readdir
-------

When a 'readdir' request is made on a merged directory, the upper and
lower directories are each read and the name lists merged in the
obvious way (upper is read first, then lower - entries that already
exist are not re-added).  This merged name list is cached in the
'struct file' and so remains as long as the file is kept open.  If the
directory is opened and read by two processes at the same time, they
will each have separate caches.  A seekdir to the start of the
directory (offset 0) followed by a readdir will cause the cache to be
discarded and rebuilt.

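The merge order described above can be illustrated with a small model.
It is a sketch under assumptions: entries are represented as
``(name, is_whiteout)`` pairs, a shape chosen for this example only, and
whiteouts are treated as in the previous section.

```python
def merge_readdir(upper_entries, lower_entries):
    """Merge name lists: upper first, then lower names not seen already.

    Each entry is a (name, is_whiteout) pair; this shape is an assumption.
    """
    seen = set()
    merged = []
    for name, is_whiteout in upper_entries:
        seen.add(name)              # a whiteout also masks the lower name
        if not is_whiteout:
            merged.append(name)     # the whiteout itself stays hidden
    for name, _ in lower_entries:
        if name not in seen:
            seen.add(name)
            merged.append(name)
    return merged
```
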
This means that changes to the merged directory do not appear while a
directory is being read.  This is unlikely to be noticed by many
programs.

Seek offsets are assigned sequentially when the directories are read.
Thus if a program were to:

 - read part of a directory
 - remember an offset, and close the directory
 - re-open the directory some time later
 - seek to the remembered offset

there may be little correlation between the old and new locations in
the list of filenames, particularly if anything has changed in the
directory.

Readdir on directories that are not merged is simply handled by the
underlying directory (upper or lower).

Renaming directories
--------------------

When renaming a directory that is on the lower layer or merged (i.e. the
directory was not created on the upper layer to start with) overlayfs can
handle it in two different ways:

1. return EXDEV error: this error is returned by rename(2) when trying to
   move a file or directory across filesystem boundaries.  Hence
   applications are usually prepared to handle this error (mv(1) for example
   recursively copies the directory tree).  This is the default behavior.

2. If the "redirect_dir" feature is enabled, then the directory will be
   copied up (but not the contents).  Then the "trusted.overlay.redirect"
   extended attribute is set to the path of the original location from the
   root of the overlay.  Finally the directory is moved to the new
   location.

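An application can cope with the default EXDEV behavior in the same
spirit as mv(1).  The following is a minimal sketch of such a fallback,
not a description of how mv(1) is implemented; the function name is
hypothetical.

```python
import errno
import os
import shutil


def rename_dir(src, dst):
    """Rename a directory, copying recursively if rename(2) gives EXDEV."""
    try:
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # Fallback: copy the whole tree, then remove the original.
        shutil.copytree(src, dst, symlinks=True)
        shutil.rmtree(src)
```

Note that the fallback is not atomic, unlike a successful rename(2).
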
There are several ways to tune the "redirect_dir" feature.

Kernel config options:

- OVERLAY_FS_REDIRECT_DIR:
    If this is enabled, then redirect_dir is turned on by default.
- OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW:
    If this is enabled, then redirects are always followed by default. Enabling
    this results in a less secure configuration.  Enable this option only when
    worried about backward compatibility with kernels that have the redirect_dir
    feature and follow redirects even if turned off.

Module options (can also be changed through /sys/module/overlay/parameters/):

- "redirect_dir=BOOL":
    See OVERLAY_FS_REDIRECT_DIR kernel config option above.
- "redirect_always_follow=BOOL":
    See OVERLAY_FS_REDIRECT_ALWAYS_FOLLOW kernel config option above.
- "redirect_max=NUM":
    The maximum number of bytes in an absolute redirect (default is 256).

Mount options:

- "redirect_dir=on":
    Redirects are enabled.
- "redirect_dir=follow":
    Redirects are not created, but followed.
- "redirect_dir=nofollow":
    Redirects are not created and not followed.
- "redirect_dir=off":
    If "redirect_always_follow" is enabled in the kernel/module config,
    this "off" translates to "follow", otherwise it translates to "nofollow".

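The translation of "redirect_dir=off" described in the mount option list
can be written down as a small function.  This is only a restatement of
the rule above; the function and parameter names are invented for the
example.

```python
def effective_redirect_mode(mount_option, redirect_always_follow):
    """Resolve the redirect_dir mount option to its effective behavior."""
    if mount_option not in ("on", "follow", "nofollow", "off"):
        raise ValueError("unknown redirect_dir value: %r" % mount_option)
    if mount_option == "off":
        # "off" depends on the kernel/module redirect_always_follow setting.
        return "follow" if redirect_always_follow else "nofollow"
    return mount_option
```
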
When the NFS export feature is enabled, every copied up directory is
indexed by the file handle of the lower inode and a file handle of the
upper directory is stored in a "trusted.overlay.upper" extended attribute
on the index entry.  On lookup of a merged directory, if the upper
directory does not match the file handle stored in the index, that is an
indication that multiple upper directories may be redirected to the same
lower directory.  In that case, lookup returns an error and warns about
a possible inconsistency.

Because lower layer redirects cannot be verified with the index, enabling
NFS export support on an overlay filesystem with no upper layer requires
turning off redirect follow (e.g. "redirect_dir=nofollow").


Non-directories
---------------

Objects that are not directories (files, symlinks, device-special
files etc.) are presented either from the upper or lower filesystem as
appropriate.  When a file in the lower filesystem is accessed in a way
that requires write-access, such as opening for write access, changing
some metadata etc., the file is first copied from the lower filesystem
to the upper filesystem (copy_up).  Note that creating a hard-link
also requires copy_up, though of course creation of a symlink does
not.

The copy_up may turn out to be unnecessary, for example if the file is
opened for read-write but the data is not modified.

The copy_up process first makes sure that the containing directory
exists in the upper filesystem - creating it and any parents as
necessary.  It then creates the object with the same metadata (owner,
mode, mtime, symlink-target etc.) and then if the object is a file, the
data is copied from the lower to the upper filesystem.  Finally any
extended attributes are copied up.

Once the copy_up is complete, the overlay filesystem simply
provides direct access to the newly created file in the upper
filesystem - future operations on the file are barely noticed by the
overlay filesystem (though an operation on the name of the file such as
rename or unlink will of course be noticed and handled).

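The order of the copy_up steps can be mimicked in userspace for a regular
file.  This is a rough model only: the function name and arguments are
hypothetical, ownership is not preserved, and parent directories are
created with default metadata rather than copied from the lower tree.

```python
import os
import shutil


def copy_up(lower_root, upper_root, relpath):
    """Userspace model of copy_up for a regular file."""
    src = os.path.join(lower_root, relpath)
    dst = os.path.join(upper_root, relpath)
    if os.path.lexists(dst):
        return dst  # already present in the upper tree
    # 1. Make sure the containing directory (and parents) exist in upper.
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    # 2. Copy the data together with mode and timestamps; on Linux,
    #    shutil.copy2 also tries to copy extended attributes.
    shutil.copy2(src, dst)
    return dst
```

The lower file is left untouched; later accesses would use the copy in
the upper tree.
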

Permission model
----------------

Permission checking in the overlay filesystem follows these principles:

 1) permission check SHOULD return the same result before and after copy up

 2) task creating the overlay mount MUST NOT gain additional privileges

 3) non-mounting task MAY gain additional privileges through the overlay,
    compared to direct access on underlying lower or upper filesystems

This is achieved by performing two permission checks on each access:

 a) check if current task is allowed access based on local DAC (owner,
    group, mode and posix acl), as well as MAC checks

 b) check if mounting task would be allowed real operation on lower or
    upper layer based on underlying filesystem permissions, again including
    MAC checks

Check (a) ensures consistency (1) since owner, group, mode and posix acls
are copied up.  On the other hand it can result in server enforced
permissions (used by NFS, for example) being ignored (3).

Check (b) ensures that no task gains permissions to underlying layers that
the mounting task does not have (2).  This also means that it is possible
to create setups where the consistency rule (1) does not hold; normally,
however, the mounting task will have sufficient privileges to perform all
operations.

Another way to demonstrate this model is drawing parallels between::

  mount -t overlay overlay -olowerdir=/lower,upperdir=/upper,... /merged

and::

  cp -a /lower /upper
  mount --bind /upper /merged

The resulting access permissions should be the same.  The difference is in
the time of copy (on-demand vs. up-front).


Multiple lower layers
---------------------

Multiple lower layers can now be given using the colon (":") as a
separator character between the directory names.  For example::

  mount -t overlay overlay -olowerdir=/lower1:/lower2:/lower3 /merged

As the example shows, "upperdir=" and "workdir=" may be omitted.  In
that case the overlay will be read-only.

The specified lower directories will be stacked beginning from the
rightmost one and going left.  In the above example lower1 will be the
top, lower2 the middle and lower3 the bottom layer.

Note: directory names containing colons can be provided as lower layer by
escaping the colons with a single backslash.  For example::

  mount -t overlay overlay -olowerdir=/a\:lower\:\:dir /merged

Since kernel version v6.8, directory names containing colons can also
be configured as lower layer using the "lowerdir+" mount option and the
fsconfig syscall from the new mount API.  For example::

  fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir+", "/a:lower::dir", 0);

In the latter case, colons in lower layer directory names will be escaped
as octal characters (\072) when displayed in /proc/self/mountinfo.

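A script that assembles the "lowerdir=" value could apply the colon
escaping and keep track of the stacking order as sketched below.  The
helper names are invented for this example, and only colons are handled;
other characters that may need escaping are outside the scope of the
sketch.

```python
def build_lowerdir(layers):
    """Join layer paths top-to-bottom, escaping colons with a backslash."""
    return ":".join(path.replace(":", "\\:") for path in layers)


def stacking_order(layers):
    """Return the layers bottom-to-top (the rightmost one is the bottom)."""
    return list(reversed(layers))
```

For the first example above, ``stacking_order`` lists /lower3 first (the
bottom layer) and /lower1 last (the top layer).
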
Metadata only copy up
---------------------

When the "metacopy" feature is enabled, overlayfs will only copy
up metadata (as opposed to whole file), when a metadata specific operation
like chown/chmod is performed. An upper file in this state is marked with
"trusted.overlay.metacopy" xattr which indicates that the upper file
contains no data.  The data will be copied up later when the file is opened
for a WRITE operation.  After the lower file's data is copied up,
the "trusted.overlay.metacopy" xattr is removed from the upper file.

In other words, this is delayed data copy up operation and data is copied
up when there is a need to actually modify data.

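The delayed data copy up can be pictured as a small state machine.  The
class below is a toy model with invented names: the ``metacopy``
attribute stands for the presence of the metacopy xattr, and the byte
strings stand for file data.

```python
class UpperFile:
    """Toy model of an upper file created by a metadata-only copy up."""

    def __init__(self, lower_data, mode):
        self.lower_data = lower_data
        self.mode = mode
        self.data = None        # no data in the upper file yet
        self.metacopy = True    # models the metacopy xattr being set

    def chmod(self, mode):
        self.mode = mode        # metadata change: data stays in lower

    def open_for_write(self):
        if self.metacopy:
            self.data = bytes(self.lower_data)  # delayed data copy up
            self.metacopy = False               # xattr removed afterwards
        return self.data

    def read(self):
        # Reads are served from lower data until the data is copied up.
        return self.lower_data if self.metacopy else self.data
```
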
There are multiple ways to enable/disable this feature. A config option
CONFIG_OVERLAY_FS_METACOPY can be set/unset to enable/disable this feature
by default. Or one can enable/disable it at module load time with module
parameter metacopy=on/off. Lastly, there is also a per mount option
metacopy=on/off to enable/disable this feature per mount.

Do not use metacopy=on with untrusted upper/lower directories. Otherwise
it is possible that an attacker can create a handcrafted file with
appropriate REDIRECT and METACOPY xattrs, and gain access to a file on lower
pointed to by REDIRECT. This should not be possible on a local system as
setting "trusted." xattrs will require CAP_SYS_ADMIN. But it should be
possible for untrusted layers like from a pen drive.

Note: redirect_dir={off|nofollow|follow[*]} and nfs_export=on mount options
conflict with metacopy=on, and will result in an error.

[*] redirect_dir=follow only conflicts with metacopy=on if upperdir=... is
given.


Data-only lower layers
----------------------

With "metacopy" feature enabled, an overlayfs regular file may be a composition
of information from up to three different layers:

 1) metadata from a file in the upper layer

 2) st_ino and st_dev object identifier from a file in a lower layer

 3) data from a file in another lower layer (further below)

The "lower data" file can be on any lower layer, except the topmost
lower layer.

Below the topmost lower layer, any number of lowermost layers may be defined
as "data-only" lower layers, using double colon ("::") separators.
A normal lower layer is not allowed to be below a data-only layer, so single
colon separators are not allowed to the right of double colon ("::") separators.


For example::

  mount -t overlay overlay -olowerdir=/l1:/l2:/l3::/do1::/do2 /merged

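The separator rule can be checked with a short parser.  This is a sketch
under the assumption that no directory name contains an escaped colon;
the function name is hypothetical and the kernel's own option parsing is
not reproduced here.

```python
def parse_lowerdir(option):
    """Split a lowerdir string into (regular_layers, data_only_layers).

    Assumes no escaped colons in the directory names.
    """
    regular_part, *data_only = option.split("::")
    regular = regular_part.split(":")
    if not all(regular):
        raise ValueError("empty lower layer name")
    for path in data_only:
        if not path:
            raise ValueError("empty data-only layer name")
        if ":" in path:
            # A single colon to the right of "::" is not allowed.
            raise ValueError("regular layer below a data-only layer")
    return regular, data_only
```
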
The paths of files in the "data-only" lower layers are not visible in the
merged overlayfs directories and the metadata and st_ino/st_dev of files
in the "data-only" lower layers are not visible in overlayfs inodes.

Only the data of the files in the "data-only" lower layers may be visible
when a "metacopy" file in one of the lower layers above it has a "redirect"
to the absolute path of the "lower data" file in the "data-only" lower layer.

Since kernel version v6.8, "data-only" lower layers can also be added using
the "datadir+" mount option and the fsconfig syscall from the new mount API.
For example::

  fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir+", "/l1", 0);
  fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir+", "/l2", 0);
  fsconfig(fs_fd, FSCONFIG_SET_STRING, "lowerdir+", "/l3", 0);
  fsconfig(fs_fd, FSCONFIG_SET_STRING, "datadir+", "/do1", 0);
  fsconfig(fs_fd, FSCONFIG_SET_STRING, "datadir+", "/do2", 0);


fs-verity support
-----------------

During metadata copy up of a lower file, if the source file has
fs-verity enabled and overlay verity support is enabled, then the
digest of the lower file is added to the "trusted.overlay.metacopy"
xattr. This is then used to verify the content of the lower file
each time the metacopy file is opened.

When a layer containing verity xattrs is used, it means that any such
metacopy file in the upper layer is guaranteed to match the content
that was in the lower at the time of the copy-up. If at any time
(during a mount, after a remount, etc) such a file in the lower is
replaced or modified in any way, access to the corresponding file in
overlayfs will result in EIO errors (either on open, due to overlayfs
digest check, or from a later read due to fs-verity) and a detailed
error is printed to the kernel logs. For more details of how fs-verity
file access works, see :ref:`Documentation/filesystems/fsverity.rst
<accessing_verity_files>`.

463ae8cba40SAlexander LarssonVerity can be used as a general robustness check to detect accidental
464ae8cba40SAlexander Larssonchanges in the overlayfs directories in use. But, with additional care
465ae8cba40SAlexander Larssonit can also give more powerful guarantees. For example, if the upper
466ae8cba40SAlexander Larssonlayer is fully trusted (by using dm-verity or something similar), then
467ae8cba40SAlexander Larssonan untrusted lower layer can be used to supply validated file content
468ae8cba40SAlexander Larssonfor all metacopy files.  If additionally the untrusted lower
469ae8cba40SAlexander Larssondirectories are specified as "Data-only", then they can only supply
470ae8cba40SAlexander Larssonsuch file content, and the entire mount can be trusted to match the
471ae8cba40SAlexander Larssonupper layer.
472ae8cba40SAlexander Larsson
This feature is controlled by the "verity" mount option, which
supports these values:

- "off":
    The metacopy digest is never generated or used. This is the
    default if the verity option is not specified.
- "on":
    Whenever a metacopy file specifies an expected digest, the
    corresponding data file must match the specified digest. When
    generating a metacopy file the verity digest will be set in it
    based on the source file (if it has one).
- "require":
    Same as "on", but additionally all metacopy files must specify a
    digest (or EIO is returned on open). This means metadata copy up
    will only be used if the data file has fs-verity enabled,
    otherwise a full copy-up is used.

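A mount that selects one of these values might be assembled as in the sketch
below.  The helper ``ovl_verity_opts`` is hypothetical and only builds the
option string.  It assumes that "metacopy=on" is passed explicitly alongside
"verity", and the directory arguments are placeholders chosen by the caller:

```shell
# Assemble overlay mount options for a given verity mode.
# Usage: ovl_verity_opts LOWERDIR UPPERDIR WORKDIR MODE
# MODE must be one of the documented values: off, on, require.
ovl_verity_opts() {
    case "$4" in
        off|on|require) ;;
        *)
            echo "ovl_verity_opts: invalid verity mode: $4" >&2
            return 1
            ;;
    esac
    printf 'lowerdir=%s,upperdir=%s,workdir=%s,metacopy=on,verity=%s\n' \
        "$1" "$2" "$3" "$4"
}
```

As root, the result could then be passed to mount(8), for instance
``mount -t overlay overlay -o "$(ovl_verity_opts /lower /upper /work require)" /merged``,
where all four paths are examples only.
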
Sharing and copying layers
--------------------------

Lower layers may be shared among several overlay mounts and that is indeed
a very common practice.  An overlay mount may use the same lower layer
path as another overlay mount and it may use a lower layer path that is
beneath or above the lower layer path of another overlay mount.

Using an upper layer path and/or a workdir path that are already used by
another overlay mount is not allowed and may fail with EBUSY.  Using
partially overlapping paths is not allowed and may fail with EBUSY.
If files are accessed from two overlayfs mounts which share or overlap the
upper layer and/or workdir path, the behavior of the overlay is undefined,
though it will not result in a crash or deadlock.

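Before reusing a directory, userspace can look for it among the options of
the overlay filesystems that are already mounted.  The sketch below is only
an approximation of the kernel's in-use check: it compares exact path strings
as they appear in the mounts table, does not detect partially overlapping
paths, and does not handle paths containing commas or escaped characters.
The function names are hypothetical:

```shell
# Print the upperdir= and workdir= paths of all mounted overlay
# filesystems, one per line, as listed in a mounts table.
# Usage: ovl_dirs_in_use [MOUNTS_FILE]   (defaults to /proc/mounts)
ovl_dirs_in_use() {
    awk '$3 == "overlay" {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^(upperdir|workdir)=/) {
                sub(/^[^=]*=/, "", opts[i])
                print opts[i]
            }
    }' "${1:-/proc/mounts}"
}

# Succeed if DIR is listed as an upperdir or workdir of an overlay mount.
# Usage: ovl_dir_in_use DIR [MOUNTS_FILE]
ovl_dir_in_use() {
    ovl_dirs_in_use "${2:-/proc/mounts}" | grep -Fxq -- "$1"
}
```
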
Mounting an overlay using an upper layer path, where the upper layer path
was previously used by another mounted overlay in combination with a
different lower layer path, is allowed, unless the "index" or "metacopy"
features are enabled.

With the "index" feature, on the first mount, an NFS file
handle of the lower layer root directory, along with the UUID of the lower
filesystem, are encoded and stored in the "trusted.overlay.origin" extended
attribute on the upper layer root directory.  On subsequent mount attempts,
the lower root directory file handle and lower filesystem UUID are compared
to the stored origin in the upper root directory.  On failure to verify the
lower root origin, mount will fail with ESTALE.  An overlayfs mount with
"index" enabled will fail with EOPNOTSUPP if the lower filesystem
does not support NFS export, if the lower filesystem does not have a valid
UUID, or if the upper filesystem does not support extended attributes.

For the "metacopy" feature, there is no verification mechanism at
mount time. So if the same upper is mounted with a different set of lower
layers, the mount will probably succeed, but expect the unexpected later on.
So don't do it.

It is quite a common practice to copy overlay layers to a different
directory tree on the same or different underlying filesystem, and even
to a different machine.  With the "index" feature, trying to mount
the copied layers will fail the verification of the lower root file handle.

Nesting overlayfs mounts
------------------------

It is possible to use a lower directory that is stored on an overlayfs
mount. For regular files this does not need any special care. However, files
that have overlayfs attributes, such as whiteouts or "overlay.*" xattrs, will
be interpreted by the underlying overlayfs mount and stripped out. In order to
allow the second overlayfs mount to see the attributes they must be escaped.

Overlayfs specific xattrs are escaped by using a special prefix of
"overlay.overlay.". So, a file with a "trusted.overlay.overlay.metacopy" xattr
in the lower dir will be exposed as a regular file with a
"trusted.overlay.metacopy" xattr in the overlayfs mount. This can be nested by
repeating the prefix multiple times, as each instance only removes one prefix.

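The renaming rule can be sketched as a pair of string helpers.  The function
names are hypothetical, and only the "trusted." namespace used in the example
above is handled:

```shell
# Add one escape level: trusted.overlay.X -> trusted.overlay.overlay.X
# Usage: ovl_escape_xattr NAME
ovl_escape_xattr() {
    case "$1" in
        trusted.overlay.*)
            printf 'trusted.overlay.overlay.%s\n' "${1#trusted.overlay.}"
            ;;
        *)
            return 1    # not an overlayfs xattr name
            ;;
    esac
}

# Remove one escape level, as an overlayfs mount does for its lower
# layers: trusted.overlay.overlay.X -> trusted.overlay.X
# Usage: ovl_unescape_xattr NAME
ovl_unescape_xattr() {
    case "$1" in
        trusted.overlay.overlay.*)
            printf 'trusted.overlay.%s\n' "${1#trusted.overlay.overlay.}"
            ;;
        *)
            return 1    # not an escaped name
            ;;
    esac
}
```

Applying ``ovl_escape_xattr`` twice to "trusted.overlay.metacopy" gives
"trusted.overlay.overlay.overlay.metacopy", which survives two levels of
nested overlayfs mounts before it is interpreted.
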
A lower dir with a regular whiteout will always be handled by the overlayfs
mount, so to support storing an effective whiteout file in an overlayfs mount an
alternative form of whiteout is supported. This form is a regular, zero-size
file with the "overlay.whiteout" xattr set, inside a directory with the
"overlay.opaque" xattr set to "x" (see `whiteouts and opaque directories`_).
These alternative whiteouts are never created by overlayfs, but can be used by
userspace tools (like containers) that generate lower layers.
These alternative whiteouts can be escaped using the standard xattr escape
mechanism in order to properly nest to any depth.

Non-standard behavior
---------------------

The current version of overlayfs can act as a mostly POSIX compliant
filesystem.

This is the list of cases that overlayfs doesn't currently handle:

 a) POSIX mandates updating st_atime for reads.  This is currently not
    done in the case when the file resides on a lower layer.

 b) If a file residing on a lower layer is opened read-only and then
    memory mapped with MAP_SHARED, then subsequent changes to the file are not
    reflected in the memory mapping.

 c) If a file residing on a lower layer is being executed, then opening that
    file for write or truncating the file will not be denied with ETXTBSY.

The following options allow overlayfs to act more like a standards
compliant filesystem:

redirect_dir
````````````

Enabled with the mount option or module option: "redirect_dir=on" or with
the kernel config option CONFIG_OVERLAY_FS_REDIRECT_DIR=y.

If this feature is disabled, then rename(2) on a lower or merged directory
will fail with EXDEV ("Invalid cross-device link").

index
`````

Enabled with the mount option or module option "index=on" or with the
kernel config option CONFIG_OVERLAY_FS_INDEX=y.

If this feature is disabled and a file with multiple hard links is copied
up, then this will "break" the link.  Changes will not be propagated to
other names referring to the same inode.

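A quick way to observe a broken link from userspace is to compare the two
names after writing through one of them.  This is a minimal sketch that
assumes the shell's ``test`` builtin supports the ``-ef`` operator.  The
function name is hypothetical:

```shell
# Succeed if both paths refer to the same inode (same st_dev and
# st_ino), i.e. they are still hard links to one another.
# Usage: same_inode PATH1 PATH2
same_inode() {
    [ "$1" -ef "$2" ]
}
```

For two names ``a`` and ``b`` that are hard links in a lower layer, writing
to ``a`` in the merged directory and then running
``same_inode a b || echo "link broken"`` would be expected to report the
break when "index" is off.
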
xino
````

Enabled with the mount option "xino=auto" or "xino=on", with the module
option "xino_auto=on" or with the kernel config option
CONFIG_OVERLAY_FS_XINO_AUTO=y.  Also implicitly enabled by using the same
underlying filesystem for all layers making up the overlay.

If this feature is disabled or the underlying filesystem doesn't have
enough free bits in the inode number, then overlayfs will not be able to
guarantee that the values of st_ino and st_dev returned by stat(2) and the
value of d_ino returned by readdir(3) will behave as they do on a normal
filesystem.  E.g. the value of st_dev may be different for two objects in the
same overlay filesystem and the value of st_ino for filesystem objects may not
be persistent and could change even while the overlay filesystem is mounted, as
summarized in the `Inode properties`_ table above.


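Whether a given mount reports a uniform st_dev can be checked by counting the
distinct device numbers under it.  This sketch relies on the ``-printf '%D'``
extension of GNU find, and the function name is hypothetical:

```shell
# Print the number of distinct st_dev values reported for the objects
# under DIR (including DIR itself).  Requires GNU find.
# Usage: count_st_dev DIR
count_st_dev() {
    find "$1" -printf '%D\n' | sort -u | awk 'END { print NR }'
}
```

A result greater than 1 for a directory tree that lies entirely inside one
overlay mount indicates that objects in it report different st_dev values.
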
Changes to underlying filesystems
---------------------------------

Changes to the underlying filesystems while part of a mounted overlay
filesystem are not allowed.  If the underlying filesystem is changed,
the behavior of the overlay is undefined, though it will not result in
a crash or deadlock.

Offline changes, when the overlay is not mounted, are allowed to the
upper tree.  Offline changes to the lower tree are only allowed if the
"metacopy", "index", "xino" and "redirect_dir" features
have not been used.  If the lower tree is modified and any of these
features has been used, the behavior of the overlay is undefined,
though it will not result in a crash or deadlock.

When the overlay NFS export feature is enabled, the overlay filesystem's
behavior on offline changes of the underlying lower layer is different
than the behavior when NFS export is disabled.

On every copy_up, an NFS file handle of the lower inode, along with the
UUID of the lower filesystem, are encoded and stored in an extended
attribute "trusted.overlay.origin" on the upper inode.

When the NFS export feature is enabled, a lookup of a merged directory,
that found a lower directory at the lookup path or at the path pointed
to by the "trusted.overlay.redirect" extended attribute, will verify
that the found lower directory file handle and lower filesystem UUID
match the origin file handle that was stored at copy_up time.  If a
found lower directory does not match the stored origin, that directory
will not be merged with the upper directory.



NFS export
----------

When the underlying filesystems support NFS export and the "nfs_export"
feature is enabled, an overlay filesystem may be exported to NFS.

With the "nfs_export" feature, on copy_up of any lower object, an index
entry is created under the index directory.  The index entry name is the
hexadecimal representation of the copy up origin file handle.  For a
non-directory object, the index entry is a hard link to the upper inode.
For a directory object, the index entry has an extended attribute
"trusted.overlay.upper" with an encoded file handle of the upper
directory inode.

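To illustrate what a hexadecimal name looks like, the sketch below turns raw
bytes into one continuous hex string.  It assumes lowercase digits with no
separators, which is an assumption about the on-disk naming rather than
something stated above, and it does not show how to obtain the raw file
handle bytes.  The function name is hypothetical:

```shell
# Print the bytes read from stdin as one continuous hexadecimal string.
# Usage: some_command | hex_name
hex_name() {
    # od emits space-separated two-digit hex bytes; strip the layout.
    od -An -v -tx1 | tr -d ' \n'
    echo
}
```
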
When encoding a file handle from an overlay filesystem object, the
following rules apply:

 1. For a non-upper object, encode a lower file handle from lower inode
 2. For an indexed object, encode a lower file handle from copy_up origin
 3. For a pure-upper object and for an existing non-indexed upper object,
    encode an upper file handle from upper inode

The encoded overlay file handle includes:

 - Header including path type information (e.g. lower/upper)
 - UUID of the underlying filesystem
 - Underlying filesystem encoding of underlying inode

This encoding format is identical to the encoding format of the file handles
that are stored in the extended attribute "trusted.overlay.origin".

When decoding an overlay file handle, the following steps are followed:

 1. Find underlying layer by UUID and path type information.
 2. Decode the underlying filesystem file handle to underlying dentry.
 3. For a lower file handle, lookup the handle in index directory by name.
 4. If a whiteout is found in index, return ESTALE. This represents an
    overlay object that was deleted after its file handle was encoded.
 5. For a non-directory, instantiate a disconnected overlay dentry from the
    decoded underlying dentry, the path type and index inode, if found.
 6. For a directory, use the connected underlying decoded dentry, path type
    and index, to lookup a connected overlay dentry.

Decoding a non-directory file handle may return a disconnected dentry.
copy_up of that disconnected dentry will create an upper index entry with
no upper alias.

When overlay filesystem has multiple lower layers, a middle layer
directory may have a "redirect" to lower directory.  Because middle layer
"redirects" are not indexed, a lower file handle that was encoded from the
"redirect" origin directory, cannot be used to find the middle or upper
layer directory.  Similarly, a lower file handle that was encoded from a
descendant of the "redirect" origin directory, cannot be used to
reconstruct a connected overlay path.  To mitigate the cases of
directories that cannot be decoded from a lower file handle, these
directories are copied up on encode and encoded as an upper file handle.
On an overlay filesystem with no upper layer this mitigation cannot be
used.  NFS export in this setup requires turning off redirect follow (e.g.
"redirect_dir=nofollow").

The overlay filesystem does not support non-directory connectable file
handles, so exporting with the 'subtree_check' exportfs configuration will
cause failures to lookup files over NFS.

When the NFS export feature is enabled, all directory index entries are
verified at mount time to check that upper file handles are not stale.
This verification may cause significant overhead in some cases.

Note: the mount options index=off,nfs_export=on are conflicting for a
read-write mount and will result in an error.

Note: the mount option uuid=off can be used to replace UUID of the underlying
filesystem in file handles with null, and effectively disable UUID checks. This
can be useful in case the underlying disk is copied and the UUID of this copy
is changed. This is only applicable if all lower/upper/work directories are on
the same filesystem, otherwise it will fall back to normal behaviour.


UUID and fsid
-------------

The UUID of the overlayfs instance itself and the fsid reported by statfs(2)
are controlled by the "uuid" mount option, which supports these values:

- "null":
    UUID of overlayfs is null. fsid is taken from the uppermost filesystem.
- "off":
    UUID of overlayfs is null. fsid is taken from the uppermost filesystem.
    UUID of underlying layers is ignored.
- "on":
    UUID of overlayfs is generated and used to report a unique fsid.
    UUID is stored in xattr "trusted.overlay.uuid", making overlayfs fsid
    unique and persistent.  This option requires an overlayfs with upper
    filesystem that supports xattrs.
- "auto": (default)
    UUID is taken from xattr "trusted.overlay.uuid" if it exists.
    Upgrade to "uuid=on" on first time mount of new overlay filesystem that
    meets the prerequisites.
    Downgrade to "uuid=null" for existing overlay filesystems that were never
    mounted with "uuid=on".


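The "auto" rules can be read as a small decision procedure.  The following
sketch models them with "yes"/"no" arguments.  It is not how the kernel
implements the option, the function name is hypothetical, and the "null"
result for a new overlay that does not meet the prerequisites is an
assumption:

```shell
# Model how "uuid=auto" resolves, following the list above.
# Usage: ovl_uuid_auto HAS_UUID_XATTR IS_NEW MEETS_PREREQS
# Each argument is "yes" or "no".  Prints the effective mode.
ovl_uuid_auto() {
    if [ "$1" = yes ]; then
        echo on      # UUID is taken from "trusted.overlay.uuid"
    elif [ "$2" = yes ] && [ "$3" = yes ]; then
        echo on      # first mount of a new overlay: upgrade
    else
        echo null    # e.g. existing overlay never mounted with uuid=on
    fi
}
```
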
Volatile mount
--------------

This is enabled with the "volatile" mount option.  Volatile mounts are not
guaranteed to survive a crash.  It is strongly recommended that volatile
mounts are only used if data written to the overlay can be recreated
without significant effort.

The advantage of mounting with the "volatile" option is that all forms of
sync calls to the upper filesystem are omitted.

In order to avoid giving a false sense of safety, the syncfs (and fsync)
semantics of volatile mounts are slightly different than that of the rest of
VFS.  If any writeback error occurs on the upperdir's filesystem after a
volatile mount takes place, all sync functions will return an error.  Once this
condition is reached, the filesystem will not recover, and every subsequent sync
call will return an error, even if the upperdir has not experienced a new error
since the last sync call.

When overlay is mounted with the "volatile" option, the directory
"$workdir/work/incompat/volatile" is created.  During the next mount, overlay
checks for this directory and refuses to mount if present. This is a strong
indicator that the user should throw away the upper and work directories and
create fresh ones. In very limited cases where the user knows that the system
has not crashed and contents of upperdir are intact, the "volatile" directory
can be removed.


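A script that recycles upper and work directories can test for this marker
before attempting to mount.  This is a minimal sketch with hypothetical
function names.  It only checks for the directory named above and makes no
judgement about whether the upperdir is intact:

```shell
# Print the path of the marker directory that a volatile mount creates.
# Usage: ovl_volatile_marker WORKDIR
ovl_volatile_marker() {
    printf '%s/work/incompat/volatile\n' "$1"
}

# Succeed if WORKDIR still carries the marker of an earlier volatile mount.
# Usage: ovl_was_volatile WORKDIR
ovl_was_volatile() {
    [ -d "$(ovl_volatile_marker "$1")" ]
}
```

For example, ``ovl_was_volatile /work && echo "recreate upper and work"``
flags a workdir (here the example path ``/work``) that a new mount would
refuse.
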
User xattr
----------

The "-o userxattr" mount option forces overlayfs to use the
"user.overlay." xattr namespace instead of "trusted.overlay.".  This is
useful for unprivileged mounting of overlayfs.


Testsuite
---------

There's a testsuite originally developed by David Howells and currently
maintained by Amir Goldstein at:

https://github.com/amir73il/unionmount-testsuite.git

Run as root::

  # cd unionmount-testsuite
  # ./run --ov --verify