.. SPDX-License-Identifier: GPL-2.0

=====================
Multigrain Timestamps
=====================

Introduction
============
Historically, the kernel has always used coarse time values to stamp inodes.
This value is updated every jiffy, so any change that happens within that jiffy
will end up with the same timestamp.

When the kernel goes to stamp an inode (due to a read or write), it first gets
the current time and then compares it to the existing timestamp(s) to see
whether anything will change. If nothing changed, then it can avoid updating
the inode's metadata.

Coarse timestamps are therefore good from a performance standpoint, since they
reduce the need for metadata updates, but bad from the standpoint of
determining whether anything has changed, since a lot of things can happen in a
jiffy.

They are particularly troublesome with NFSv3, where unchanging timestamps can
make it difficult to tell whether to invalidate caches. NFSv4 provides a
dedicated change attribute that should always show a visible change, but not
all filesystems implement this properly, causing the NFS server to substitute
the ctime in many cases.

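
The compare-and-skip behaviour described above can be imitated from userland.
The following is only a sketch, assuming a Linux system where
``CLOCK_REALTIME_COARSE`` is available; ``stamp_would_change()`` and
``stamp_coarse()`` are hypothetical helper names, not kernel functions, and the
kernel's own logic is more involved.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes CLOCK_REALTIME_COARSE on glibc */
#endif
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: would writing "now" change the stored stamp? */
bool stamp_would_change(const struct timespec *stored,
                        const struct timespec *now)
{
    assert(stored && now);
    return stored->tv_sec != now->tv_sec ||
           stored->tv_nsec != now->tv_nsec;
}

/*
 * Hypothetical helper: restamp *stored from the coarse realtime clock.
 * Returns true only if the value changed, i.e. only if a (simulated)
 * metadata update was needed.
 */
bool stamp_coarse(struct timespec *stored)
{
    struct timespec now;

    if (clock_gettime(CLOCK_REALTIME_COARSE, &now) != 0)
        return false;           /* clock unavailable: leave stamp alone */
    if (!stamp_would_change(stored, &now))
        return false;           /* same tick: skip the update */
    *stored = now;
    return true;
}
```

Two calls to ``stamp_coarse()`` in quick succession will usually land in the
same tick, so the second call skips the update. That is the performance
benefit, and also the reason a change inside that tick goes unnoticed.
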

Multigrain timestamps aim to remedy this by selectively using fine-grained
timestamps when a file has had its timestamps queried recently, and the current
coarse-grained time does not cause a change.

Inode Timestamps
================
There are currently 3 timestamps in the inode that are updated to the current
wallclock time on different activity:

ctime:
        The inode change time. This is stamped with the current time whenever
        the inode's metadata is changed. Note that this value is not settable
        from userland.

mtime:
        The inode modification time. This is stamped with the current time
        any time a file's contents change.

atime:
        The inode access time. This is stamped whenever an inode's contents are
        read. Widely considered to be a terrible mistake. Usually avoided with
        options like noatime or relatime.

Updating the mtime always implies a change to the ctime, but updating the
atime due to a read request does not.

Multigrain timestamps are only tracked for the ctime and the mtime. atimes are
not affected and always use the coarse-grained value (subject to the floor).

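
For reference, userland sees these three timestamps through ``stat(2)``. The
sketch below assumes a POSIX.1-2008 system that provides the ``st_ctim``,
``st_mtim`` and ``st_atim`` fields; ``struct inode_stamps`` and
``read_inode_stamps()`` are hypothetical names introduced only for
illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes st_atim, st_mtim, st_ctim on glibc */
#endif
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical container for the three stamps discussed above. */
struct inode_stamps {
    struct timespec ctime;  /* last metadata change */
    struct timespec mtime;  /* last content change */
    struct timespec atime;  /* last read access */
};

/* Fill *out from stat(2). Returns 0 on success, -1 on error (errno set). */
int read_inode_stamps(const char *path, struct inode_stamps *out)
{
    struct stat st;

    assert(path && out);
    if (stat(path, &st) != 0)
        return -1;
    out->ctime = st.st_ctim;
    out->mtime = st.st_mtim;
    out->atime = st.st_atim;
    return 0;
}
```

Each field carries nanosecond resolution in its ``tv_nsec`` member, but how
finely the stored value actually advances depends on the clock the kernel used
when stamping the inode.
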

Inode Timestamp Ordering
========================

In addition to just providing info about changes to individual files, file
timestamps also serve an important purpose in applications like "make". These
programs compare timestamps in order to determine whether source files might be
newer than cached objects.

Userland applications like make can only determine ordering based on
operational boundaries. For a syscall those are the syscall entry and exit
points. For io_uring or nfsd operations, that's the request submission and
response. In the case of concurrent operations, userland can make no
determination about the order in which things will occur.

For instance, if a single thread modifies one file, and then another file in
sequence, the second file must show an equal or later mtime than the first. The
same is true if two threads are issuing similar operations that do not overlap
in time.

If, however, two threads have racing syscalls that overlap in time, then there
is no such guarantee, and the second file may appear to have been modified
before, after or at the same time as the first, regardless of which one was
submitted first.

Note that the above assumes that the system doesn't experience a backward jump
of the realtime clock.
If that occurs at an inopportune time, then timestamps
can appear to go backward, even on a properly functioning system.

Multigrain Timestamp Implementation
===================================
Multigrain timestamps are aimed at ensuring that changes to a single file are
always recognizable, without violating the ordering guarantees when multiple
different files are modified. This affects the mtime and the ctime, but the
atime will always use coarse-grained timestamps.

The implementation uses an unused bit in the i_ctime_nsec field to indicate
whether the mtime or ctime has been queried. If either or both have, then the
kernel takes special care to ensure the next timestamp update will display a
visible change. This ensures tight cache coherency for use-cases like NFS,
without sacrificing the benefits of reduced metadata updates when files aren't
being watched.

The Ctime Floor Value
=====================
It's not sufficient to simply use fine or coarse-grained timestamps based on
whether the mtime or ctime has been queried. A file could get a fine-grained
timestamp, and then a second file modified later could get a coarse-grained one
that appears earlier than the first, which would break the kernel's timestamp
ordering guarantees.

To mitigate this problem, the kernel maintains a global floor value that
ensures that this can't happen.
The two files in the above example may appear to have been
modified at the same time in such a case, but they will never show the reverse
order. To avoid problems with realtime clock jumps, the floor is managed as a
monotonic ktime_t, and the values are converted to realtime clock values as
needed.

Implementation Notes
====================
Multigrain timestamps are intended for use by local filesystems that get
ctime values from the local clock. This is in contrast to network filesystems
and the like that just mirror timestamp values from a server.

For most filesystems, it's sufficient to just set the FS_MGTIME flag in the
fstype->fs_flags in order to opt in, provided that the ctime is only ever set
via inode_set_ctime_current(). If the filesystem has a ->getattr routine that
doesn't call generic_fillattr(), then it should call fill_mg_cmtime() to
fill those values. For setattr, it should use setattr_copy() to update the
timestamps, or otherwise mimic its behavior.
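
To tie together the queried flag and the floor described in the sections
above, here is a simplified userland model. It is a sketch under stated
assumptions, not the kernel's code: every name in it (``TICK_NS``,
``struct mg_clock``, ``struct mg_inode``, ``mg_query()``, ``mg_update()``) is
hypothetical, time is a plain nanosecond counter supplied by the caller, and
it ignores atomicity, the mtime, and the monotonic-to-realtime conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_NS 1000000LL   /* hypothetical coarse-clock granularity */

struct mg_clock {
    int64_t floor_ns;       /* latest fine-grained stamp handed out */
};

struct mg_inode {
    int64_t ctime_ns;
    bool queried;           /* stands in for the flag bit in i_ctime_nsec */
};

/* Reading the stamp marks the inode as queried. */
int64_t mg_query(struct mg_inode *inode)
{
    assert(inode);
    inode->queried = true;
    return inode->ctime_ns;
}

/* Stamp the inode at fine-grained time fine_now_ns; returns the stored ctime. */
int64_t mg_update(struct mg_clock *clk, struct mg_inode *inode,
                  int64_t fine_now_ns)
{
    int64_t stamp;

    assert(clk && inode && fine_now_ns >= 0);

    /* Start from the coarse time, but never go below the floor. */
    stamp = fine_now_ns - (fine_now_ns % TICK_NS);
    if (stamp < clk->floor_ns)
        stamp = clk->floor_ns;

    /* A queried inode must show a change: fall back to the fine time. */
    if (inode->queried && stamp <= inode->ctime_ns) {
        stamp = fine_now_ns;
        if (stamp > clk->floor_ns)
            clk->floor_ns = stamp;  /* later coarse stamps can't undercut it */
    }

    if (stamp > inode->ctime_ns) {
        inode->ctime_ns = stamp;
        inode->queried = false;
    }
    return inode->ctime_ns;
}
```

In this model, two updates to an unqueried inode within one tick leave the
stamp unchanged. Once the inode has been queried, the next update within the
same tick takes the fine-grained time and raises the floor. A different,
unqueried inode updated afterwards in that tick is then stamped with the floor
rather than the older coarse value, so it never appears to predate the first.
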