.. SPDX-License-Identifier: GPL-2.0

============
Fiemap Ioctl
============

The fiemap ioctl is an efficient method for userspace to get file
extent mappings. Instead of block-by-block mapping (such as bmap), fiemap
returns a list of extents.


Request Basics
--------------

A fiemap request is encoded within struct fiemap::

  struct fiemap {
	__u64	fm_start;	 /* logical offset (inclusive) at
				  * which to start mapping (in) */
	__u64	fm_length;	 /* logical length of mapping which
				  * userspace cares about (in) */
	__u32	fm_flags;	 /* FIEMAP_FLAG_* flags for request (in/out) */
	__u32	fm_mapped_extents; /* number of extents that were
				    * mapped (out) */
	__u32	fm_extent_count; /* size of fm_extents array (in) */
	__u32	fm_reserved;
	struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
  };


fm_start and fm_length specify the logical range within the file
for which the process would like mappings. Extents returned mirror
those on disk - that is, the logical offset of the first returned extent
may start before fm_start, and the range covered by the last returned
extent may end after fm_start + fm_length. All offsets and lengths are
in bytes.
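
Because returned extents are not clipped to the request, a caller that
only cares about the bytes inside the requested range has to clip each
extent itself. The following is a minimal userspace sketch of that step.
The helper name clamp_extent() is hypothetical and not part of any kernel
or libc interface, and the sketch assumes that neither start + length nor
fe_logical + fe_length overflows::

  #include <linux/fiemap.h>
  #include <linux/types.h>

  /*
   * Hypothetical helper: clip one returned extent to the requested
   * range [start, start + length).  Returns the number of bytes of the
   * extent that fall inside the range and stores the clipped logical
   * start in *clipped_start.  Assumes the sums below do not overflow.
   */
  static __u64 clamp_extent(const struct fiemap_extent *fe, __u64 start,
                            __u64 length, __u64 *clipped_start)
  {
      __u64 req_end = start + length;
      __u64 ext_end = fe->fe_logical + fe->fe_length;
      __u64 lo = fe->fe_logical > start ? fe->fe_logical : start;
      __u64 hi = ext_end < req_end ? ext_end : req_end;

      *clipped_start = lo;
      return hi > lo ? hi - lo : 0;
  }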

Certain flags to modify the way in which mappings are looked up can be
set in fm_flags. If the kernel doesn't understand some particular
flags, it will return EBADR and the contents of fm_flags will contain
the set of flags which caused the error. If the kernel is compatible
with all flags passed, the contents of fm_flags will be unmodified.
It is up to userspace to determine whether rejection of a particular
flag is fatal to its operation. This scheme is intended to allow the
fiemap interface to grow in the future without losing
compatibility with old software.
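
As an illustration of the flag negotiation described above, the sketch
below issues a count-only request and reports which flags, if any, the
kernel rejected. The helper name fiemap_probe_flags() is hypothetical.
The sketch assumes the request is made through the FS_IOC_FIEMAP ioctl
from <linux/fs.h> and uses FIEMAP_MAX_OFFSET from <linux/fiemap.h> to
cover the whole file::

  #include <errno.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/fs.h>
  #include <linux/fiemap.h>

  /*
   * Hypothetical helper: issue a count-only FIEMAP request with the
   * given flags.  Returns 0 on success and -1 on failure.  When the
   * kernel rejects some flags with EBADR, *rejected receives the
   * offending bits as reported back in fm_flags; otherwise it is 0.
   */
  static int fiemap_probe_flags(int fd, __u32 flags, __u32 *rejected)
  {
      struct fiemap fm;

      memset(&fm, 0, sizeof(fm));
      fm.fm_length = FIEMAP_MAX_OFFSET;   /* whole file */
      fm.fm_flags = flags;
      fm.fm_extent_count = 0;             /* count only, no array */

      *rejected = 0;
      if (ioctl(fd, FS_IOC_FIEMAP, &fm) == 0)
          return 0;
      if (errno == EBADR)
          *rejected = fm.fm_flags;
      return -1;
  }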

fm_extent_count specifies the number of elements in the fm_extents[] array
that can be used to return extents.  If fm_extent_count is zero, then the
fm_extents[] array is ignored (no extents will be returned), and the
fm_mapped_extents count will hold the number of extents needed in
fm_extents[] to hold the file's current mapping.  Note that there is
nothing to prevent the file from changing between calls to FIEMAP.
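
One way to use the count-only mode is a two-step query: first ask how
many extents exist, then allocate an array of that size and ask again.
The sketch below assumes the FS_IOC_FIEMAP ioctl from <linux/fs.h>; the
helper name fiemap_whole_file() is hypothetical. Since the file may change
between the two calls, the result is only a snapshot and a careful caller
would be prepared to retry::

  #include <stdlib.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <linux/fs.h>
  #include <linux/fiemap.h>

  /*
   * Hypothetical helper: map the whole file in two steps.  Returns a
   * heap-allocated struct fiemap that the caller must free(), or NULL
   * on error.
   */
  static struct fiemap *fiemap_whole_file(int fd)
  {
      struct fiemap probe, *fm;
      __u32 slots;

      /* Step 1: fm_extent_count == 0 only asks for the extent count. */
      memset(&probe, 0, sizeof(probe));
      probe.fm_length = FIEMAP_MAX_OFFSET;
      if (ioctl(fd, FS_IOC_FIEMAP, &probe) < 0)
          return NULL;

      /* Step 2: allocate header plus array, then fetch the extents. */
      slots = probe.fm_mapped_extents;
      fm = calloc(1, sizeof(*fm) +
                     (size_t)slots * sizeof(struct fiemap_extent));
      if (!fm)
          return NULL;
      fm->fm_length = FIEMAP_MAX_OFFSET;
      fm->fm_extent_count = slots;
      if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
          free(fm);
          return NULL;
      }
      return fm;
  }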

The following flags can be set in fm_flags:

FIEMAP_FLAG_SYNC
  If this flag is set, the kernel will sync the file before mapping extents.

FIEMAP_FLAG_XATTR
  If this flag is set, the extents returned will describe the inode's
  extended attribute lookup tree, instead of its data tree.


Extent Mapping
--------------

Extent information is returned within the embedded fm_extents array
which userspace must allocate along with the fiemap structure. The
number of elements in the fm_extents[] array should be passed via
fm_extent_count. The number of extents mapped by the kernel will be
returned via fm_mapped_extents. If the number of fiemap_extent
structures allocated is less than would be required to map the
requested range, the maximum number of extents that can be mapped in
the fm_extents[] array will be returned and fm_mapped_extents will be
equal to fm_extent_count. In that case, the last extent in the array
will not complete the requested range and will not have the
FIEMAP_EXTENT_LAST flag set (see the next section on extent flags).
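
When the array is smaller than the number of extents in the range, a
caller can walk the file in batches, restarting each request at the end
of the last extent returned and stopping once FIEMAP_EXTENT_LAST is seen.
The following sketch shows one such loop under the assumption that the
request is made through the FS_IOC_FIEMAP ioctl; the helper name
fiemap_total_bytes() and the batch size are hypothetical choices made
for illustration::

  #include <stdlib.h>
  #include <sys/ioctl.h>
  #include <linux/fs.h>
  #include <linux/fiemap.h>

  #define BATCH 32    /* arbitrary batch size chosen for this sketch */

  /*
   * Hypothetical helper: sum fe_length over all extents of a file,
   * fetching at most BATCH extents per ioctl.  Returns 0 on success
   * and -1 on error.
   */
  static int fiemap_total_bytes(int fd, __u64 *total)
  {
      struct fiemap *fm;
      __u64 next = 0;
      int done = 0, ret = -1;

      *total = 0;
      fm = calloc(1, sizeof(*fm) + BATCH * sizeof(struct fiemap_extent));
      if (!fm)
          return -1;

      while (!done) {
          fm->fm_start = next;
          fm->fm_length = FIEMAP_MAX_OFFSET - next;
          fm->fm_flags = 0;
          fm->fm_extent_count = BATCH;
          if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0)
              goto out;
          if (fm->fm_mapped_extents == 0)
              break;      /* nothing (more) mapped in this range */
          for (__u32 i = 0; i < fm->fm_mapped_extents; i++) {
              struct fiemap_extent *fe = &fm->fm_extents[i];

              *total += fe->fe_length;
              next = fe->fe_logical + fe->fe_length;
              if (fe->fe_flags & FIEMAP_EXTENT_LAST)
                  done = 1;
          }
      }
      ret = 0;
  out:
      free(fm);
      return ret;
  }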

Each extent is described by a single fiemap_extent structure as
returned in fm_extents::

    struct fiemap_extent {
	    __u64	fe_logical;  /* logical offset in bytes for the start of
				* the extent */
	    __u64	fe_physical; /* physical offset in bytes for the start
				* of the extent */
	    __u64	fe_length;   /* length in bytes for the extent */
	    __u64	fe_reserved64[2];
	    __u32	fe_flags;    /* FIEMAP_EXTENT_* flags for this extent */
	    __u32	fe_reserved[3];
    };

All offsets and lengths are in bytes and mirror those on disk.  It is valid
for an extent's logical offset to start before the request or its logical
length to extend past the request.  Unless FIEMAP_EXTENT_NOT_ALIGNED is
returned, fe_logical, fe_physical, and fe_length will be aligned to the
block size of the file system.  With the exception of extents flagged as
FIEMAP_EXTENT_MERGED, adjacent extents will not be merged.

The fe_flags field contains flags which describe the extent returned.
A special flag, FIEMAP_EXTENT_LAST, is always set on the last extent in
the file so that the process making fiemap calls can determine when no
more extents are available, without having to call the ioctl again.

Some flags are intentionally vague and will always be set in the
presence of other more specific flags. This way a program looking for
a general property does not have to know all existing and future flags
which imply that property.

For example, if FIEMAP_EXTENT_DATA_INLINE or FIEMAP_EXTENT_DATA_TAIL
is set, FIEMAP_EXTENT_NOT_ALIGNED will also be set. A program looking
for inline or tail-packed data can key on the specific flag. Software
which simply wants to avoid operating on non-aligned extents,
however, can just key on FIEMAP_EXTENT_NOT_ALIGNED, and not have to
worry about all present and future flags which might imply unaligned
data. Note that the opposite is not true - it would be valid for
FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
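
In code, keying on the general flag amounts to a single bit test. The
predicate below is a small sketch of that idea; its name,
extent_is_block_aligned(), is hypothetical::

  #include <linux/fiemap.h>
  #include <linux/types.h>

  /*
   * Hypothetical predicate: nonzero when the extent can be treated as
   * block aligned.  Testing only the general flag also covers
   * FIEMAP_EXTENT_DATA_INLINE, FIEMAP_EXTENT_DATA_TAIL and any future
   * flag that implies unaligned data.
   */
  static int extent_is_block_aligned(__u32 fe_flags)
  {
      return !(fe_flags & FIEMAP_EXTENT_NOT_ALIGNED);
  }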

FIEMAP_EXTENT_LAST
  This is generally the last extent in the file. A mapping attempt past
  this extent may return nothing. Some implementations set this flag to
  indicate this extent is the last one in the range queried by the user
  (via fiemap->fm_length).

FIEMAP_EXTENT_UNKNOWN
  The location of this extent is currently unknown. This may indicate
  the data is stored on an inaccessible volume or that no storage has
  been allocated for the file yet.

FIEMAP_EXTENT_DELALLOC
  This will also set FIEMAP_EXTENT_UNKNOWN.

  Delayed allocation - while there is data for this extent, its
  physical location has not been allocated yet.

FIEMAP_EXTENT_ENCODED
  This extent does not consist of plain filesystem blocks but is
  encoded (e.g. encrypted or compressed).  Reading the data in this
  extent via I/O to the block device will have undefined results.

Note that it is *always* undefined to try to update the data
in-place by writing to the indicated location without the
assistance of the filesystem, or to access the data using the
information returned by the FIEMAP interface while the filesystem
is mounted.  In other words, user applications may only read the
extent data via I/O to the block device while the filesystem is
unmounted, and then only if the FIEMAP_EXTENT_ENCODED flag is
clear; user applications must not try reading or writing to the
filesystem via the block device under any other circumstances.

FIEMAP_EXTENT_DATA_ENCRYPTED
  This will also set FIEMAP_EXTENT_ENCODED.
  The data in this extent has been encrypted by the file system.

FIEMAP_EXTENT_NOT_ALIGNED
  Extent offsets and length are not guaranteed to be block aligned.

FIEMAP_EXTENT_DATA_INLINE
  This will also set FIEMAP_EXTENT_NOT_ALIGNED.
  Data is located within a metadata block.

FIEMAP_EXTENT_DATA_TAIL
  This will also set FIEMAP_EXTENT_NOT_ALIGNED.
  Data is packed into a block with data from other files.

FIEMAP_EXTENT_UNWRITTEN
  Unwritten extent - the extent is allocated but its data has not been
  initialized.  This indicates the extent's data will be all zero if read
  through the filesystem but the contents are undefined if read directly from
  the device.

FIEMAP_EXTENT_MERGED
  This will be set when a file does not support extents, i.e., it uses a block
  based addressing scheme.  Since returning an extent for each block back to
  userspace would be highly inefficient, the kernel will try to merge most
  adjacent blocks into 'extents'.


VFS -> File System Implementation
---------------------------------

File systems wishing to support fiemap must implement a ->fiemap callback on
their inode_operations structure. The fs ->fiemap call is responsible for
defining its set of supported fiemap flags, and calling a helper function on
each discovered extent::

  struct inode_operations {
       ...

       int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
                     u64 len);

->fiemap is passed struct fiemap_extent_info which describes the
fiemap request::

  struct fiemap_extent_info {
	unsigned int fi_flags;		/* Flags as passed from user */
	unsigned int fi_extents_mapped;	/* Number of mapped extents */
	unsigned int fi_extents_max;	/* Size of fiemap_extent array */
	struct fiemap_extent *fi_extents_start;	/* Start of fiemap_extent array */
  };

It is intended that the file system should not need to access any of this
structure directly. Filesystem handlers should be tolerant of signals and
return EINTR once a fatal signal is received.


Flag checking should be done at the beginning of the ->fiemap callback via the
fiemap_check_flags() helper::

  int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);

The fieinfo structure should be passed in as received from ioctl_fiemap(). The
set of fiemap flags which the fs understands should be passed via fs_flags. If
fiemap_check_flags() finds invalid user flags, it will place the bad values in
fieinfo->fi_flags and return -EBADR. If the file system gets -EBADR from
fiemap_check_flags(), it should immediately exit, returning that error back to
ioctl_fiemap().


For each extent in the request range, the file system should call
the helper function, fiemap_fill_next_extent()::

  int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
			      u64 phys, u64 len, u32 flags);

fiemap_fill_next_extent() will use the passed values to populate the
next free extent in the fm_extents array. 'General' extent flags will
automatically be set from specific flags on behalf of the calling file
system so that the userspace API is not broken.
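
The promotion from specific to general flags can be modelled in a few
lines. The function below is a userspace sketch of the behaviour this
document describes, not the kernel's own code; the name
imply_general_flags() is hypothetical and the flag pairs are taken from
the extent flag list in the previous section::

  #include <linux/fiemap.h>
  #include <linux/types.h>

  /*
   * Userspace model (not the kernel implementation): each specific
   * flag pulls in the general flag that this document says always
   * accompanies it.
   */
  static __u32 imply_general_flags(__u32 flags)
  {
      if (flags & FIEMAP_EXTENT_DELALLOC)
          flags |= FIEMAP_EXTENT_UNKNOWN;
      if (flags & FIEMAP_EXTENT_DATA_ENCRYPTED)
          flags |= FIEMAP_EXTENT_ENCODED;
      if (flags & (FIEMAP_EXTENT_DATA_INLINE | FIEMAP_EXTENT_DATA_TAIL))
          flags |= FIEMAP_EXTENT_NOT_ALIGNED;
      return flags;
  }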

fiemap_fill_next_extent() returns 0 on success, and 1 when the
user-supplied fm_extents array is full. If an error is encountered
while copying the extent to user memory, -EFAULT will be returned.