.. SPDX-License-Identifier: GPL-2.0

=====================
Multigrain Timestamps
=====================

Introduction
============
Historically, the kernel has always used coarse time values to stamp inodes.
This value is updated every jiffy, so any change that happens within that jiffy
will end up with the same timestamp.

When the kernel goes to stamp an inode (due to a read or write), it first gets
the current time and then compares it to the existing timestamp(s) to see
whether anything will change. If nothing changed, then it can avoid updating
the inode's metadata.

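As a rough illustration of this trade-off, the user-space sketch below models a
coarse clock as the fine-grained time rounded down to the start of the current
tick. It is not kernel code: coarse_ns(), struct coarse_inode and
coarse_stamp() are hypothetical names, and the 4 ms tick (as with HZ=250) is an
assumption chosen only to make the arithmetic concrete.

.. code-block:: c

   /* Hypothetical user-space model of coarse timestamps, not kernel code. */
   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Assumed tick length: 4 ms, as with HZ=250. Illustrative only. */
   #define TICK_NS 4000000LL

   /* Model of a coarse clock: fine-grained time rounded down to the last tick. */
   static int64_t coarse_ns(int64_t fine_ns)
   {
       assert(fine_ns >= 0);   /* the rounding below assumes non-negative times */
       return fine_ns - fine_ns % TICK_NS;
   }

   /* Hypothetical inode with a single timestamp, in nanoseconds. */
   struct coarse_inode {
       int64_t ctime_ns;
       int metadata_writes;    /* how many times the timestamp was rewritten */
   };

   /*
    * Stamp the inode with the coarse time. If the value would not change,
    * skip the metadata update entirely and return false.
    */
   static bool coarse_stamp(struct coarse_inode *inode, int64_t fine_now_ns)
   {
       int64_t now = coarse_ns(fine_now_ns);

       if (now == inode->ctime_ns)
           return false;
       inode->ctime_ns = now;
       inode->metadata_writes++;
       return true;
   }

With this model, stamps taken at 10,000,001 ns and 11,999,999 ns both round
down to 8,000,000 ns, so only the first one rewrites the inode; a stamp at
12,000,000 ns starts a new tick and rewrites it again.
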
Coarse timestamps are therefore good from a performance standpoint, since they
reduce the need for metadata updates, but bad from the standpoint of
determining whether anything has changed, since a lot of things can happen in a
jiffy.

They are particularly troublesome with NFSv3, where unchanging timestamps can
make it difficult to tell whether to invalidate caches. NFSv4 provides a
dedicated change attribute that should always show a visible change, but not
all filesystems implement this properly, causing the NFS server to substitute
the ctime in many cases.

Multigrain timestamps aim to remedy this by selectively using fine-grained
timestamps when a file has had its timestamps queried since they were last
updated and the current coarse-grained time would not cause a change.

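The selection rule just described can be sketched as a small function. This is
a simplified illustration rather than the kernel's code: choose_ctime() is a
hypothetical name, timestamps are plain nanosecond counts, and both clock
readings are passed in instead of being read from a clock.

.. code-block:: c

   /* Hypothetical user-space model of the multigrain choice, not kernel code. */
   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /*
    * Pick the timestamp for the next ctime update.
    *  queried    - were the timestamps queried since the last update?
    *  coarse_now - current coarse-grained time
    *  fine_now   - current fine-grained time (assumed >= coarse_now)
    *  cur_ctime  - the ctime currently stored in the inode
    */
   static int64_t choose_ctime(bool queried, int64_t coarse_now,
                               int64_t fine_now, int64_t cur_ctime)
   {
       assert(fine_now >= coarse_now);

       /* Nobody is watching, or the coarse time already shows a change. */
       if (!queried || coarse_now > cur_ctime)
           return coarse_now;

       /* Someone is watching and the coarse time would not move. */
       return fine_now;
   }

An inode whose timestamps nobody has looked at keeps getting the cheap coarse
value, even when that value matches the existing ctime and the update can
therefore be skipped.
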
Inode Timestamps
================
There are currently three timestamps in the inode that are updated to the
current wallclock time on different kinds of activity:

ctime:
  The inode change time. This is stamped with the current time whenever
  the inode's metadata is changed. Note that this value is not settable
  from userland.

mtime:
  The inode modification time. This is stamped with the current time
  any time a file's contents change.

atime:
  The inode access time. This is stamped whenever an inode's contents are
  read. Widely considered to be a terrible mistake. Usually avoided with
  options like noatime or relatime.

Updating the mtime always implies a change to the ctime, but updating the
atime due to a read request does not.

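The relationships above can be summarized with a toy model. struct ts_inode
and the three helpers below are hypothetical; they ignore noatime/relatime and
the multigrain machinery, and only show which timestamps each kind of activity
moves.

.. code-block:: c

   /* Hypothetical user-space model of the three inode timestamps. */
   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical inode holding the three timestamps, in nanoseconds. */
   struct ts_inode {
       int64_t atime, mtime, ctime;
   };

   /* Contents changed: the mtime moves, and so does the ctime. */
   static void ts_contents_modified(struct ts_inode *inode, int64_t now)
   {
       inode->mtime = now;
       inode->ctime = now;
   }

   /* Metadata-only change (e.g. chmod): only the ctime moves. */
   static void ts_metadata_changed(struct ts_inode *inode, int64_t now)
   {
       inode->ctime = now;
   }

   /* Contents read: only the atime moves; the ctime is left alone. */
   static void ts_contents_read(struct ts_inode *inode, int64_t now)
   {
       inode->atime = now;
   }
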
Multigrain timestamps are only tracked for the ctime and the mtime. atimes are
not affected and always use the coarse-grained value (subject to the ctime
floor described below).

Inode Timestamp Ordering
========================

In addition to providing information about changes to individual files, file
timestamps also serve an important purpose in applications like "make". These
programs compare timestamps in order to determine whether source files might
be newer than cached objects.

Userland applications like make can only determine ordering based on
operational boundaries. For a syscall those are the syscall entry and exit
points. For io_uring or nfsd operations, that's the request submission and
response. In the case of concurrent operations, userland can make no
determination about the order in which things will occur.

For instance, if a single thread modifies one file, and then another file in
sequence, the second file must show an equal or later mtime than the first. The
same is true if two threads are issuing similar operations that do not overlap
in time.

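The single-thread case can be observed from userland. The sketch below is plain
POSIX C; sequential_writes_ordered() and the /tmp file name templates are
hypothetical choices for illustration. It writes to one temporary file and then
to another, and checks that the second mtime is equal to or later than the
first. Equal values are allowed, and a backward step of the realtime clock
during the run could legitimately make the check fail, as the note on clock
jumps in this section explains.

.. code-block:: c

   /* Hypothetical userland check of the single-thread ordering guarantee. */
   #define _POSIX_C_SOURCE 200809L
   #include <assert.h>
   #include <stdbool.h>
   #include <stdlib.h>
   #include <sys/stat.h>
   #include <unistd.h>

   /* Compare mtimes: <0, 0 or >0 as a is earlier than, equal to or later than b. */
   static int mtime_cmp(const struct stat *a, const struct stat *b)
   {
       if (a->st_mtim.tv_sec != b->st_mtim.tv_sec)
           return a->st_mtim.tv_sec < b->st_mtim.tv_sec ? -1 : 1;
       if (a->st_mtim.tv_nsec != b->st_mtim.tv_nsec)
           return a->st_mtim.tv_nsec < b->st_mtim.tv_nsec ? -1 : 1;
       return 0;
   }

   /*
    * Modify file A, then file B, from a single thread, and report whether
    * B's mtime is equal to or later than A's.
    */
   static bool sequential_writes_ordered(void)
   {
       char path_a[] = "/tmp/mgts-a-XXXXXX";
       char path_b[] = "/tmp/mgts-b-XXXXXX";
       int fd_a = mkstemp(path_a);
       int fd_b = mkstemp(path_b);
       struct stat st_a, st_b;
       bool ordered = false;

       if (fd_a < 0 || fd_b < 0)
           goto out;

       /* The writes do not overlap: A's write returns before B's starts. */
       if (write(fd_a, "a", 1) != 1 || write(fd_b, "b", 1) != 1)
           goto out;

       if (fstat(fd_a, &st_a) != 0 || fstat(fd_b, &st_b) != 0)
           goto out;

       ordered = mtime_cmp(&st_b, &st_a) >= 0;
   out:
       if (fd_a >= 0) {
           close(fd_a);
           unlink(path_a);
       }
       if (fd_b >= 0) {
           close(fd_b);
           unlink(path_b);
       }
       return ordered;
   }
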
If, however, two threads have racing syscalls that overlap in time, then there
is no such guarantee, and the second file may appear to have been modified
before, after or at the same time as the first, regardless of which one was
submitted first.

Note that the above assumes that the system doesn't experience a backward jump
of the realtime clock. If that occurs at an inopportune time, then timestamps
can appear to go backward, even on a properly functioning system.

Multigrain Timestamp Implementation
===================================
Multigrain timestamps are aimed at ensuring that changes to a single file are
always recognizable, without violating the ordering guarantees when multiple
different files are modified. This affects the mtime and the ctime, but the
atime will always use coarse-grained timestamps.

The implementation uses an otherwise unused bit in the i_ctime_nsec field to
record whether the mtime or ctime has been queried since the last update. If
either or both have been queried, then the kernel takes special care to ensure
that the next timestamp update will display a visible change. This provides
tight cache coherency for use cases like NFS, without sacrificing the benefits
of reduced metadata updates when files aren't being watched.

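The flag-in-the-nanoseconds trick can be modeled in user space with C11
atomics. A nanosecond value never exceeds 999,999,999, which is below 2^30, so
the top bit of a 32-bit nanosecond field is never needed for the time itself.
In this sketch struct mg_inode, MG_CTIME_QUERIED and the helpers are
hypothetical names: querying sets the flag with an atomic fetch-or, and storing
a new ctime clears it. Unlike a real implementation, it does not handle an
update racing with a query, nor does it keep the seconds and nanoseconds
consistent with each other.

.. code-block:: c

   /* Hypothetical user-space model of the "queried" flag, not kernel code. */
   #include <assert.h>
   #include <stdatomic.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Bit 31 is free because valid nanosecond values are < 2^30. */
   #define MG_CTIME_QUERIED ((uint32_t)1u << 31)

   struct mg_inode {
       int64_t ctime_sec;
       _Atomic uint32_t ctime_nsec;    /* nanoseconds | MG_CTIME_QUERIED */
   };

   /* Update side: storing a plain nanosecond value clears the flag. */
   static void mg_set_ctime(struct mg_inode *inode, int64_t sec, uint32_t nsec)
   {
       assert(nsec < 1000000000u);
       inode->ctime_sec = sec;
       atomic_store(&inode->ctime_nsec, nsec);
   }

   /* Query side: report the nanoseconds and mark the ctime as seen. */
   static uint32_t mg_query_ctime_nsec(struct mg_inode *inode)
   {
       uint32_t old = atomic_fetch_or(&inode->ctime_nsec, MG_CTIME_QUERIED);

       return old & ~MG_CTIME_QUERIED;     /* never leak the flag to callers */
   }

   /* Has the ctime been queried since it was last set? */
   static bool mg_ctime_queried(struct mg_inode *inode)
   {
       return atomic_load(&inode->ctime_nsec) & MG_CTIME_QUERIED;
   }

The first query after an update is what arms the flag; later queries find it
already set and still report the same nanosecond value.
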
The Ctime Floor Value
=====================
It's not sufficient to simply use fine- or coarse-grained timestamps based on
whether the mtime or ctime has been queried. A file could get a fine-grained
timestamp, and then a second file modified later could get a coarse-grained one
that appears earlier than the first, which would break the kernel's timestamp
ordering guarantees.

To prevent this, the kernel maintains a global floor value. Whenever a
fine-grained timestamp is handed out, the floor is advanced to that value, and
coarse-grained timestamps are never allowed to be earlier than the floor. The
two files in the above example may then appear to have been modified at the
same time, but they will never show the reverse order. To avoid problems with
realtime clock jumps, the floor is managed as a monotonic ktime_t, and the
values are converted to realtime clock values as needed.

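A user-space model of the floor, assuming int64_t nanosecond times and C11
atomics, could look like the following. The names mg_floor, real_offset,
mg_fine_real() and mg_coarse_real() are hypothetical, and the sketch advances
the floor with a compare-and-exchange loop (an atomic maximum); the kernel's
timekeeping code may differ in detail. Fine-grained stamps push the floor
forward, and coarse-grained stamps are never allowed to fall below it. Because
the floor lives in monotonic time, a step of the realtime clock would change
only real_offset, not the floor itself.

.. code-block:: c

   /* Hypothetical user-space model of the ctime floor, not kernel code. */
   #include <assert.h>
   #include <stdatomic.h>
   #include <stdint.h>

   /* Latest fine-grained stamp handed out, in monotonic nanoseconds. */
   static _Atomic int64_t mg_floor;

   /* realtime = monotonic + real_offset; a clock step would change only this. */
   static int64_t real_offset;

   /* Coarse-grained stamp: the coarse monotonic time, but never below the floor. */
   static int64_t mg_coarse_real(int64_t coarse_mono)
   {
       int64_t cur_floor = atomic_load(&mg_floor);

       return (coarse_mono > cur_floor ? coarse_mono : cur_floor) + real_offset;
   }

   /*
    * Fine-grained stamp: raise the floor to this time unless it is already
    * at or past it. On a failed exchange, 'old' is reloaded with the current
    * floor and the loop re-checks whether this time is still newer.
    */
   static int64_t mg_fine_real(int64_t fine_mono)
   {
       int64_t old = atomic_load(&mg_floor);

       while (fine_mono > old &&
              !atomic_compare_exchange_weak(&mg_floor, &old, fine_mono))
           ;

       return (fine_mono > old ? fine_mono : old) + real_offset;
   }

In the scenario from the previous section, file A gets a fine-grained stamp at
1,003,000 ns, and file B, modified afterwards while the coarse clock still reads
1,000,000 ns, would naively be stamped earlier than A. With the floor, B's
coarse stamp is raised to 1,003,000 ns: the two files look simultaneous, but
never reversed.
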
Implementation Notes
====================
Multigrain timestamps are intended for use by local filesystems that get
ctime values from the local clock. This is in contrast to network filesystems
and the like that just mirror timestamp values from a server.

For most filesystems, it's sufficient to just set the FS_MGTIME flag in the
fstype->fs_flags in order to opt in, provided that the ctime is only ever set
via inode_set_ctime_current(). If the filesystem has a ->getattr routine that
doesn't call generic_fillattr(), then it should call fill_mg_cmtime() to fill
in those values. For setattr, it should use setattr_copy() to update the
timestamps, or otherwise mimic its behavior.