.. SPDX-License-Identifier: GPL-2.0

========================
ext4 General Information
========================

Ext4 is an advanced level of the ext3 filesystem which incorporates
scalability and reliability enhancements for supporting large filesystems
(64 bit) in keeping with increasing disk capacities and state-of-the-art
feature requirements.

Mailing list:   linux-ext4@vger.kernel.org
Web site:       http://ext4.wiki.kernel.org


Quick usage instructions
========================

Note: More extensive information for getting started with ext4 can be
found at the ext4 wiki site at the URL:
http://ext4.wiki.kernel.org/index.php/Ext4_Howto

  - The latest version of e2fsprogs can be found at:

    https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/

    or

    http://sourceforge.net/project/showfiles.php?group_id=2406

    or grab the latest git repository from:

    https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

  - Create a new filesystem using the ext4 filesystem type::

        # mke2fs -t ext4 /dev/hda1

    Or to configure an existing ext3 filesystem to support extents::

        # tune2fs -O extents /dev/hda1

    If the filesystem was created with 128 byte inodes, it can be
    converted to use 256 byte inodes for greater efficiency via::

        # tune2fs -I 256 /dev/hda1

  - Mounting::

        # mount -t ext4 /dev/hda1 /wherever

  - When comparing performance with other filesystems, it's always
    important to try multiple workloads; very often a subtle change in a
    workload parameter can completely change the ranking of which
    filesystems do well compared to others.  When comparing versus ext3,
    note that ext4 enables write barriers by default, while ext3 does
    not enable write barriers by default.  So it is useful to explicitly
    specify whether barriers are enabled or not via the
    '-o barrier=[0|1]' mount option for both ext3 and ext4 filesystems
    for a fair comparison.  When tuning ext3 for best benchmark numbers,
    it is often worthwhile to try changing the data journaling mode; '-o
    data=writeback' can be faster for some workloads.  (Note however that
    running mounted with data=writeback can potentially leave stale data
    exposed in recently written files in case of an unclean shutdown,
    which could be a security exposure in some situations.)  Configuring
    the filesystem with a large journal can also be helpful for
    metadata-intensive workloads.
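
Whether the 128 to 256 byte inode conversion mentioned above applies to a
given filesystem can be checked before running it.  The following is only a
sketch: it assumes that the output of ``tune2fs -l`` contains an
``Inode size:`` line, and the helper names ``inode_size`` and
``needs_inode_resize`` are made up for this example.

```shell
# Print the inode size (in bytes) found in "tune2fs -l" output on stdin.
# Assumption: the listing contains a line of the form "Inode size:   <n>".
inode_size() {
    awk -F: '/^Inode size:/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Succeed if the inode size reported on stdin is smaller than 256 bytes.
needs_inode_resize() {
    size=$(inode_size)
    [ -n "$size" ] && [ "$size" -lt 256 ]
}

# Possible use, as root and with the filesystem unmounted:
#   tune2fs -l /dev/hda1 | needs_inode_resize && tune2fs -I 256 /dev/hda1
```

The helpers only parse text, so they can be tried on saved ``tune2fs -l``
output without touching a device.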

Features
========

Currently Available
-------------------

* ability to use filesystems > 16TB (e2fsprogs support not available yet)
* extent format reduces metadata overhead (RAM, IO for access, transactions)
* extent format more robust in face of on-disk corruption due to magics,
  internal redundancy in tree
* improved file allocation (multi-block alloc)
* lift 32000 subdirectory limit imposed by i_links_count[1]
* nsec timestamps for mtime, atime, ctime, create time
* inode version field on disk (NFSv4, Lustre)
* reduced e2fsck time via uninit_bg feature
* journal checksumming for robustness, performance
* persistent file preallocation (e.g. for streaming media, databases)
* ability to pack bitmaps and inode tables into larger virtual groups via the
  flex_bg feature
* large file support
* inode allocation using large virtual block groups via flex_bg
* delayed allocation
* large block (up to pagesize) support
* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
  the ordering)
* case-insensitive file name lookups
* file-based encryption support (fscrypt)
* file-based verity support (fsverity)

[1] Filesystems with a block size of 1k may see a limit imposed by the
directory hash tree having a maximum depth of two.

Case-insensitive file name lookups
==================================

The case-insensitive file name lookup feature is supported on a
per-directory basis, allowing the user to mix case-insensitive and
case-sensitive directories in the same filesystem.  It is enabled by
flipping the +F inode attribute of an empty directory.  The
case-insensitive string match operation is only defined when we know how
text is encoded in a byte sequence.  For that reason, in order to enable
case-insensitive directories, the filesystem must have the
casefold feature, which stores the filesystem-wide encoding
model used.  By default, the charset adopted is the latest version of
Unicode (12.1.0, by the time of this writing), encoded in the UTF-8
form.  The comparison algorithm is implemented by normalizing the
strings to the Canonical decomposition form, as defined by Unicode,
followed by a byte per byte comparison.

The case-awareness is name-preserving on the disk, meaning that the file
name provided by userspace is a byte-per-byte match to what is actually
written on the disk.  The Unicode normalization format used by the
kernel is thus an internal representation, exposed neither to userspace
nor to the disk, with the important exception of disk hashes, used on
large case-insensitive directories with the DX feature.  On DX
directories, the hash must be calculated using the casefolded version of
the filename, meaning that the normalization format used actually has an
impact on where the directory entry is stored.

When we change from viewing filenames as opaque byte sequences to seeing
them as encoded strings we need to address what happens when a program
tries to create a file with an invalid name.  The Unicode subsystem
within the kernel leaves the decision of what to do in this case to the
filesystem, which selects its preferred behavior by enabling/disabling
the strict mode.  When Ext4 encounters one of those strings and the
filesystem did not require strict mode, it falls back to considering the
entire string as an opaque byte sequence, which still allows the user to
operate on that file, but the case-insensitive lookups won't work.
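
Since +F can only be set when the filesystem carries the casefold feature,
it may help to check the feature list first.  The sketch below assumes that
``tune2fs -l`` prints a ``Filesystem features:`` line; the helper name
``has_casefold`` and the directory path in the usage comment are made up
for this example.

```shell
# Succeed if the "Filesystem features:" line on stdin lists casefold.
# Assumption: input is "tune2fs -l" style output.
has_casefold() {
    awk -F: '/^Filesystem features:/ {
        n = split($2, feat, /[ \t]+/)
        for (i = 1; i <= n; i++)
            if (feat[i] == "casefold")
                found = 1
    }
    END { exit !found }'
}

# Possible use, as root, on a new and still empty directory:
#   tune2fs -l /dev/hda1 | has_casefold && chattr +F /wherever/newdir
```

The check matches whole feature names only, so a feature whose name merely
contains "casefold" would not be mistaken for it.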

Options
=======

When mounting an ext4 filesystem, the following options are accepted:
(*) == default

  ro
        Mount filesystem read only. Note that ext4 will replay the journal (and
        thus write to the partition) even when mounted "read only". The mount
        options "ro,noload" can be used to prevent writes to the filesystem.

  journal_checksum
        Enable checksumming of the journal transactions.  This will allow the
        recovery code in e2fsck and the kernel to detect corruption in the
        journal.  It is a compatible change and will be ignored by older
        kernels.

  journal_async_commit
        Commit block can be written to disk without waiting for descriptor
        blocks. If enabled, older kernels cannot mount the device. This will
        enable 'journal_checksum' internally.

  journal_path=path, journal_dev=devnum
        When the external journal device's major/minor numbers have changed,
        these options allow the user to specify the new journal location.  The
        journal device is identified through either its new major/minor numbers
        encoded in devnum, or via a path to the device.

  norecovery, noload
        Don't load the journal on mounting.  Note that if the filesystem was
        not unmounted cleanly, skipping the journal replay will lead to the
        filesystem containing inconsistencies that can lead to any number of
        problems.

  data=journal
        All data are committed into the journal prior to being written into the
        main file system.  Enabling this mode will disable delayed allocation
        and O_DIRECT support.

  data=ordered (*)
        All data are forced directly out to the main file system prior to its
        metadata being committed to the journal.

  data=writeback
        Data ordering is not preserved, data may be written into the main file
        system after its metadata has been committed to the journal.

  commit=nrsec (*)
        This setting limits the maximum age of the running transaction to
        'nrsec' seconds.  The default value is 5 seconds.  This means that if
        you lose your power, you will lose as much as the latest 5 seconds of
        metadata changes (your filesystem will not be damaged though, thanks
        to the journaling). This default value (or any low value) will hurt
        performance, but it's good for data-safety.  Setting it to 0 will have
        the same effect as leaving it at the default (5 seconds).  Setting it
        to very large values will improve performance.  Note that due to
        delayed allocation even older data can be lost on power failure since
        writeback of those data begins only after the time set in
        /proc/sys/vm/dirty_expire_centisecs.

  barrier=<0|1(*)>, barrier(*), nobarrier
        This enables/disables the use of write barriers in the jbd code.
        barrier=0 disables, barrier=1 enables.  This also requires an IO stack
        which can support barriers, and if jbd gets an error on a barrier
        write, it will disable barriers again with a warning.  Write barriers
        enforce proper on-disk ordering of journal commits, making volatile
        disk write caches safe to use, at some performance penalty.  If your
        disks are battery-backed in one way or another, disabling barriers may
        safely improve performance.  The mount options "barrier" and
        "nobarrier" can also be used to enable or disable barriers, for
        consistency with other ext4 mount options.

  inode_readahead_blks=n
        This tuning parameter controls the maximum number of inode table blocks
        that ext4's inode table readahead algorithm will pre-read into the
        buffer cache.  The default value is 32 blocks.

  bsddf (*)
        Make 'df' act like BSD.

  minixdf
        Make 'df' act like Minix.

  debug
        Extra debugging information is sent to syslog.

  abort
        Simulate the effects of calling ext4_abort() for debugging purposes.
        This is normally used while remounting a filesystem which is already
        mounted.

  errors=remount-ro
        Remount the filesystem read-only on an error.

  errors=continue
        Keep going on a filesystem error.

  errors=panic
        Panic and halt the machine if an error occurs.  (These mount options
        override the errors behavior specified in the superblock, which can be
        configured using tune2fs)

  data_err=ignore(*)
        Just print an error message if an error occurs in a file data buffer in
        ordered mode.

  data_err=abort
        Abort the journal if an error occurs in a file data buffer in ordered
        mode.

  grpid | bsdgroups
        New objects have the group ID of their parent.

  nogrpid (*) | sysvgroups
        New objects have the group ID of their creator.

  resgid=n
        The group ID which may use the reserved blocks.

  resuid=n
        The user ID which may use the reserved blocks.

  sb=
        Use alternate superblock at this location.

  quota, noquota, grpquota, usrquota
        These options are ignored by the filesystem. They are used only by
        quota tools to recognize volumes where quota should be turned on. See
        documentation in the quota-tools package for more details
        (http://sourceforge.net/projects/linuxquota).

  jqfmt=<quota type>, usrjquota=<file>, grpjquota=<file>
        These options tell filesystem details about quota so that quota
        information can be properly updated during journal replay. They replace
        the above quota options. See documentation in the quota-tools package
        for more details (http://sourceforge.net/projects/linuxquota).

  stripe=n
        Number of filesystem blocks that mballoc will try to use for allocation
        size and alignment. For RAID5/6 systems this should be the number of
        data disks * RAID chunk size in file system blocks.

  delalloc (*)
        Defer block allocation until just before ext4 writes out the block(s)
        in question.  This allows ext4 to make better allocation decisions
        more efficiently.

  nodelalloc
        Disable delayed allocation.  Blocks are allocated when the data is
        copied from userspace to the page cache, either via the write(2) system
        call or when an mmap'ed page which was previously unallocated is
        written for the first time.

  max_batch_time=usec
        Maximum amount of time ext4 should wait for additional filesystem
        operations to be batched together with a synchronous write operation.
        Since a synchronous write operation is going to force a commit and then
        wait for the I/O to complete, waiting a little longer doesn't cost
        much and can be a huge throughput win, so ext4 waits for a small amount
        of time to see if any other transactions can piggyback on the
        synchronous write.  The algorithm used is designed to automatically
        tune for the speed of the disk, by measuring the amount of time (on
        average) that it takes to finish committing a transaction.  Call this
        time the "commit time".  If the time that the transaction has been
        running is less than the commit time, ext4 will try sleeping for the
        commit time to see if other operations will join the transaction.  The
        commit time is capped by the max_batch_time, which defaults to 15000us
        (15ms).  This optimization can be turned off entirely by setting
        max_batch_time to 0.

  min_batch_time=usec
        This parameter sets the commit time (as described above) to be at least
        min_batch_time.  It defaults to zero microseconds.  Increasing this
        parameter may improve the throughput of multi-threaded, synchronous
        workloads on very fast disks, at the cost of increasing latency.

  journal_ioprio=prio
        The I/O priority (from 0 to 7, where 0 is the highest priority) which
        should be used for I/O operations submitted by kjournald2 during a
        commit operation.  This defaults to 3, which is a slightly higher
        priority than the default I/O priority.

  auto_da_alloc(*), noauto_da_alloc
        Many broken applications don't use fsync() when replacing existing
        files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/
        rename("foo.new", "foo"), or worse yet, fd = open("foo",
        O_TRUNC)/write(fd,..)/close(fd).  If auto_da_alloc is enabled, ext4
        will detect the replace-via-rename and replace-via-truncate patterns
        and force any delayed allocation blocks to be allocated, so that at
        the next journal commit, in the default data=ordered mode, the data
        blocks of the new file are forced to disk before the rename() operation
        is committed.  This provides roughly the same level of guarantees as
        ext3, and avoids the "zero-length" problem that can happen when a
        system crashes before the delayed allocation blocks are forced to disk.

  noinit_itable
        Do not initialize any uninitialized inode table blocks in the
        background.  This feature may be used by installation CDs so that the
        install process can complete as quickly as possible; the inode table
        initialization process would then be deferred until the next time the
        file system is unmounted.

  init_itable=n
        The lazy itable init code will wait n times the number of milliseconds
        it took to zero out the previous block group's inode table.  This
        minimizes the impact on the system performance while the file system's
        inode table is being initialized.

  discard, nodiscard(*)
        Controls whether ext4 should issue discard/TRIM commands to the
        underlying block device when blocks are freed.  This is useful for SSD
        devices and sparse/thinly-provisioned LUNs, but it is off by default
        until sufficient testing has been done.

  nouid32
        Disables 32-bit UIDs and GIDs.  This is for interoperability with
        older kernels which only store and expect 16-bit values.

  block_validity(*), noblock_validity
        These options enable or disable the in-kernel facility for tracking
        filesystem metadata blocks within internal data structures.  This
        allows the multi-block allocator and other routines to notice bugs or
        corrupted allocation bitmaps which cause blocks to be allocated which
        overlap with filesystem metadata blocks.

  dioread_lock, dioread_nolock
        Controls whether or not ext4 should use the DIO read locking. If the
        dioread_nolock option is specified, ext4 will allocate an
        uninitialized extent before a buffer write and convert the extent to
        initialized after IO completes. This approach allows ext4 code to
        avoid using the inode mutex, which improves scalability on high-speed
        storage. However, this does not work with data journaling, and the
        dioread_nolock option will be ignored with a kernel warning. Note that
        the dioread_nolock code path is only used for extent-based files.
        Because of the restrictions this option carries, it is off by default
        (i.e. dioread_lock).

  max_dir_size_kb=n
        This limits the size of directories so that any attempt to expand them
        beyond the specified limit in kilobytes will cause an ENOSPC error.
        This is useful in memory constrained environments, where a very large
        directory can cause severe performance problems or even provoke the Out
        Of Memory killer.  (For example, if there is only 512MB memory
        available, a 176MB directory may seriously cramp the system's style.)

  i_version
        Enable 64-bit inode version support. This option is off by default.

  dax
        Use direct access (no page cache).  See
        Documentation/filesystems/dax.rst.  Note that this option is
        incompatible with data=journal.

  inlinecrypt
        When possible, encrypt/decrypt the contents of encrypted files using the
        blk-crypto framework rather than filesystem-layer encryption. This
        allows the use of inline encryption hardware. The on-disk format is
        unaffected. For more details, see
        Documentation/block/inline-encryption.rst.
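
As a worked example for the stripe=n option above, the value for a RAID5/6
array is the number of data disks multiplied by the chunk size expressed in
filesystem blocks.  The helper name ``stripe_blocks`` and the array geometry
in the comment are illustrative assumptions, not values taken from any
particular system.

```shell
# stripe_blocks DATA_DISKS CHUNK_KIB BLOCK_KIB
# Print data disks * (RAID chunk size / filesystem block size), with the
# chunk and block sizes both given in KiB.
stripe_blocks() {
    echo $(( $1 * $2 / $3 ))
}

# Hypothetical RAID5 array of 4 disks (3 data disks), 64 KiB chunks,
# 4 KiB filesystem blocks:
#   stripe_blocks 3 64 4
#   mount -t ext4 -o stripe=48 /dev/md0 /wherever
```

For that geometry each chunk holds 16 blocks, so three data disks give a
stripe of 48 blocks.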

Data Mode
=========

There are 3 different data modes:

* writeback mode

  In data=writeback mode, ext4 does not journal data at all.  This mode provides
  a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
  mode - metadata journaling.  A crash+recovery can cause incorrect data to
  appear in files which were written shortly before the crash.  This mode will
  typically provide the best ext4 performance.

* ordered mode

  In data=ordered mode, ext4 only officially journals metadata, but it logically
  groups metadata information related to data changes with the data blocks into
  a single unit called a transaction.  When it's time to write the new metadata
  out to disk, the associated data blocks are written first.  In general, this
  mode performs slightly slower than writeback but significantly faster than
  journal mode.

* journal mode

  data=journal mode provides full data and metadata journaling.  All new data is
  written to the journal first, and then to its final location.  In the event of
  a crash, the journal can be replayed, bringing both data and metadata into a
  consistent state.  This mode is the slowest except when data needs to be read
  from and written to disk at the same time, where it outperforms all other
  modes.  Enabling this mode will disable delayed allocation and O_DIRECT
  support.
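
Whatever the data mode, a program that replaces a file does not have to rely
on the auto_da_alloc heuristic described under Options: it can flush the new
file before renaming it over the old one.  The following is a minimal sketch
of that pattern.  The function name ``replace_file`` and the ``.new`` suffix
are made up for this example, and it assumes a ``sync`` command that accepts
file arguments (as reasonably recent GNU coreutils versions do).

```shell
# replace_file TARGET
# Write stdin to TARGET.new, flush it to disk, then rename it over TARGET.
replace_file() {
    target=$1
    tmp="$target.new"
    cat > "$tmp" || return 1
    # Flush the new file's data before the rename.
    # Assumption: this sync(1) accepts a file argument.
    sync "$tmp" || return 1
    # rename(2) over the old file.
    mv -f "$tmp" "$target"
}

# Possible use:
#   generate_config | replace_file /etc/foo.conf
```

An application written in C would call fsync() on the new file descriptor at
the same point, between the last write() and the rename().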

/proc entries
=============

Information about mounted ext4 file systems can be found in
/proc/fs/ext4.  Each mounted filesystem will have a directory in
/proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or
/proc/fs/ext4/dm-0).  The files in each per-device directory are shown
in the table below.

Files in /proc/fs/ext4/<devname>

  mb_groups
        details of multiblock allocator buddy cache of free blocks

/sys entries
============

Information about mounted ext4 file systems can be found in
/sys/fs/ext4.  Each mounted filesystem will have a directory in
/sys/fs/ext4 based on its device name (e.g., /sys/fs/ext4/hdc or
/sys/fs/ext4/dm-0).  The files in each per-device directory are shown
in the table below.

Files in /sys/fs/ext4/<devname>:

(see also Documentation/ABI/testing/sysfs-fs-ext4)

  delayed_allocation_blocks
        This file is read-only and shows the number of blocks that are dirty in
        the page cache, but which do not have their location in the filesystem
        allocated yet.

  inode_goal
        Tuning parameter which (if non-zero) controls the goal inode used by
        the inode allocator in preference to all other allocation heuristics.
        This is intended for debugging use only, and should be 0 on production
        systems.

  inode_readahead_blks
        Tuning parameter which controls the maximum number of inode table
        blocks that ext4's inode table readahead algorithm will pre-read into
        the buffer cache.

  lifetime_write_kbytes
        This file is read-only and shows the number of kilobytes of data that
        have been written to this filesystem since it was created.

  max_writeback_mb_bump
        The maximum number of megabytes the writeback code will try to write
        out before moving on to another inode.

  mb_group_prealloc
        The multiblock allocator will round up allocation requests to a
        multiple of this tuning parameter if the stripe size is not set in the
        ext4 superblock.

  mb_max_to_scan
        The maximum number of extents the multiblock allocator will search to
        find the best extent.

  mb_min_to_scan
        The minimum number of extents the multiblock allocator will search to
        find the best extent.

  mb_order2_req
        Tuning parameter which controls the minimum size for requests (as a
        power of 2) where the buddy cache is used.

  mb_stats
        Controls whether the multiblock allocator should collect statistics,
        which are shown during the unmount. 1 means to collect statistics, 0
        means not to collect statistics.

  mb_stream_req
        Files which have fewer blocks than this tunable parameter will have
        their blocks allocated out of a block group specific preallocation
        pool, so that small files are packed closely together.  Each large file
        will have its blocks allocated out of its own unique preallocation
        pool.

  session_write_kbytes
        This file is read-only and shows the number of kilobytes of data that
        have been written to this filesystem since it was mounted.

  reserved_clusters
        This file is read-write and contains the number of reserved clusters
        in the file system, which will be used in specific situations to avoid
        costly zeroout, unexpected ENOSPC, or possible data loss. The default
        is 2% or 4096 clusters, whichever is smaller. This can be changed, but
        it can never exceed the number of clusters in the file system. If
        there is not enough space for the reserved clusters when mounting the
        file system, the mount will _not_ fail.
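The per-device files listed above are plain text files, so they can be read
with ordinary file I/O.  The Python sketch below is one possible way to collect
a few of them and to work through the ``reserved_clusters`` default described
above.  The helper names ``read_ext4_tunables`` and
``default_reserved_clusters``, the device name ``sda1``, and the sample values
are hypothetical.  The sketch also assumes that each file holds a single
integer and that the 2% figure is rounded down.

```python
import os
import tempfile


def read_ext4_tunables(devname, names, sysfs_root="/sys/fs/ext4"):
    """Return {name: int} for the attribute files that exist and hold an integer."""
    values = {}
    for name in names:
        path = os.path.join(sysfs_root, devname, name)
        try:
            with open(path) as attr:
                values[name] = int(attr.read().strip())
        except (OSError, ValueError):
            # Attribute missing, unreadable, or not a plain integer: skip it.
            continue
    return values


def default_reserved_clusters(total_clusters):
    """2% of all clusters or 4096, whichever is smaller (rounding down assumed)."""
    return min(total_clusters // 50, 4096)


# Demonstration against a throwaway directory that mimics the sysfs layout;
# "sda1" and the values written here are made up for the example.
with tempfile.TemporaryDirectory() as fake_root:
    os.makedirs(os.path.join(fake_root, "sda1"))
    for name, value in (("mb_stats", "0"), ("session_write_kbytes", "1024")):
        with open(os.path.join(fake_root, "sda1", name), "w") as attr:
            attr.write(value + "\n")
    demo = read_ext4_tunables(
        "sda1", ["mb_stats", "session_write_kbytes", "inode_goal"], fake_root
    )

print(demo)
print(default_reserved_clusters(100000))   # 2% of 100000 clusters
print(default_reserved_clusters(1000000))  # capped at 4096
```

Passing ``sysfs_root`` explicitly keeps the helper testable without a mounted
ext4 filesystem; on a real system the default of /sys/fs/ext4 would be used.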

Ioctls
======

Ext4 implements various ioctls which can be used by applications to access
ext4-specific functionality. An incomplete list of these ioctls is shown in the
table below. This list includes truly ext4-specific ioctls (``EXT4_IOC_*``) as
well as ioctls that may have been ext4-specific originally but are now supported
by some other filesystem(s) too (``FS_IOC_*``).

Table of Ext4 ioctls

  FS_IOC_GETFLAGS
        Get additional attributes associated with the inode.  The ioctl argument
        is an integer bitfield, with bit values described in ext4.h.

  FS_IOC_SETFLAGS
        Set additional attributes associated with the inode.  The ioctl argument
        is an integer bitfield, with bit values described in ext4.h.

  EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD
        Get the inode i_generation number stored for each inode. The
        i_generation number is normally changed only when a new inode is
        created and it is particularly useful for network filesystems. The
        '_OLD' version of this ioctl is an alias for FS_IOC_GETVERSION.

  EXT4_IOC_SETVERSION, EXT4_IOC_SETVERSION_OLD
        Set the inode i_generation number stored for each inode. The '_OLD'
        version of this ioctl is an alias for FS_IOC_SETVERSION.

  EXT4_IOC_GROUP_EXTEND
        This ioctl has the same purpose as the resize mount option. It allows
        the filesystem to be resized to the end of the last existing block
        group; further resizing has to be done with resize2fs, either online
        or offline. The argument points to the unsigned long number
        representing the filesystem's new block count.

  EXT4_IOC_MOVE_EXT
        Move the block extents from orig_fd (the file descriptor this ioctl is
        called on) to the donor_fd (the one specified in the move_extent
        structure passed as an argument to this ioctl). Then, exchange inode
        metadata between orig_fd and donor_fd.  This is especially useful for
        online defragmentation, because the allocator has the opportunity to
        allocate moved blocks better, ideally into one contiguous extent.

  EXT4_IOC_GROUP_ADD
        Add a new group descriptor to an existing or new group descriptor
        block. The new group descriptor is described by the
        ext4_new_group_input structure, which is passed as an argument to this
        ioctl. This is especially useful in conjunction with
        EXT4_IOC_GROUP_EXTEND, which allows online resize of the filesystem to
        the end of the last existing block group.  These two ioctls are used
        together by userspace online resize tools (e.g. resize2fs).

  EXT4_IOC_MIGRATE
        This ioctl operates on the filesystem itself.  It converts (migrates)
        an ext3 indirect block mapped inode to an ext4 extent mapped inode by
        walking through the indirect block mapping of the original inode and
        converting contiguous block ranges into ext4 extents of a temporary
        inode. Then, the inodes are swapped. This ioctl might help when
        migrating from an ext3 to an ext4 filesystem; however, the suggestion
        is to create a fresh ext4 filesystem and copy the data from a backup.
        Note that the filesystem has to support extents for this ioctl to work.

  EXT4_IOC_ALLOC_DA_BLKS
        Force all of the delay allocated blocks to be allocated to preserve
        application-expected ext3 behaviour. Note that this will also start
        triggering a write of the data blocks, but this behaviour may change in
        the future as it is not necessary and has been done this way only for
        the sake of simplicity.

  EXT4_IOC_RESIZE_FS
        Resize the filesystem to a new size.  The number of blocks of the
        resized filesystem is passed in via a 64-bit integer argument.  The
        kernel allocates the bitmaps and inode table, so the userspace tool
        just passes the new number of blocks.

  EXT4_IOC_SWAP_BOOT
        Swap i_data and associated attributes (like i_blocks, i_size,
        i_flags, ...) from the specified inode with inode EXT4_BOOT_LOADER_INO
        (#5). This is typically used to store a boot loader in a secure part of
        the filesystem, where it can't be changed by a normal user by accident.
        The data blocks of the previous boot loader will be associated with the
        given inode.
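As a rough illustration of the integer bitfield convention used by
FS_IOC_GETFLAGS in the table above, the following Python sketch builds the
request number, issues the ioctl, and decodes a few bits.  The helper names
``ioc_read``, ``decode_flags`` and ``get_flags`` are hypothetical.  The request
number assumes the asm-generic ioctl encoding (as used on x86 and arm; some
architectures differ), and the flag values are a small subset assumed from the
kernel headers that should be checked against ext4.h before being relied upon.

```python
import fcntl
import os
import struct


def ioc_read(type_char, nr, size):
    """Build a read-direction ioctl number (asm-generic layout assumed)."""
    ioc_read_dir = 2
    return (ioc_read_dir << 30) | (size << 16) | (ord(type_char) << 8) | nr


# FS_IOC_GETFLAGS is declared as _IOR('f', 1, long) in the kernel headers.
FS_IOC_GETFLAGS = ioc_read("f", 1, struct.calcsize("l"))

# A few flag bits; values assumed from the kernel headers, verify before use.
FLAG_NAMES = {
    0x00000010: "immutable",
    0x00000020: "append",
    0x00000040: "nodump",
    0x00000080: "noatime",
    0x00001000: "index",
    0x00080000: "extents",
}


def decode_flags(value):
    """Return the names of the known flag bits set in value."""
    return [name for bit, name in FLAG_NAMES.items() if value & bit]


def get_flags(path):
    """Issue FS_IOC_GETFLAGS on path and return the raw bitfield."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Pass a long-sized buffer; only the leading int is interpreted here,
        # on the assumption that the kernel fills in an int-sized value.
        result = fcntl.ioctl(fd, FS_IOC_GETFLAGS, struct.pack("l", 0))
    finally:
        os.close(fd)
    return struct.unpack("I", result[:4])[0]


print(hex(FS_IOC_GETFLAGS))
print(decode_flags(0x00080010))  # prints: ['immutable', 'extents']
```

On a mounted ext4 filesystem, ``decode_flags(get_flags(path))`` would list the
known flags set on ``path``; the call raises OSError where the filesystem does
not support the ioctl.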

References
==========

kernel source:	<file:fs/ext4/>
		<file:fs/jbd2/>

programs:	http://e2fsprogs.sourceforge.net/

useful links:	https://fedoraproject.org/wiki/ext3-devel
		http://www.bullopensource.org/ext4/
		http://ext4.wiki.kernel.org/index.php/Main_Page
		https://fedoraproject.org/wiki/Features/Ext4