.. SPDX-License-Identifier: GPL-2.0

========================
ext4 General Information
========================

Ext4 is an advanced level of the ext3 filesystem which incorporates
scalability and reliability enhancements for supporting large filesystems
(64 bit) in keeping with increasing disk capacities and state-of-the-art
feature requirements.

Mailing list:   linux-ext4@vger.kernel.org
Web site:       http://ext4.wiki.kernel.org


Quick usage instructions
========================

Note: More extensive information for getting started with ext4 can be
found at the ext4 wiki site at the URL:
http://ext4.wiki.kernel.org/index.php/Ext4_Howto

  - The latest version of e2fsprogs can be found at:

    https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/

    or

    http://sourceforge.net/project/showfiles.php?group_id=2406

    or grab the latest git repository from:

    https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

  - Create a new filesystem using the ext4 filesystem type:

        # mke2fs -t ext4 /dev/hda1

    Or to configure an existing ext3 filesystem to support extents:

        # tune2fs -O extents /dev/hda1

    If the filesystem was created with 128 byte inodes, it can be
    converted to use 256 byte inodes for greater efficiency via:

        # tune2fs -I 256 /dev/hda1

  - Mounting:

        # mount -t ext4 /dev/hda1 /wherever

  - When comparing performance with other filesystems, it's always
    important to try multiple workloads; very often a subtle change in a
    workload parameter can completely change the ranking of which
    filesystems do well compared to others.  When comparing versus ext3,
    note that ext4 enables write barriers by default, while ext3 does
    not enable write barriers by default.  So it is useful to
    explicitly specify whether barriers are enabled or not via the
    '-o barrier=[0|1]' mount option for both ext3 and ext4 filesystems
    for a fair comparison.  When tuning ext3 for best benchmark numbers,
    it is often worthwhile to try changing the data journaling mode; '-o
    data=writeback' can be faster for some workloads.  (Note however that
    running mounted with data=writeback can potentially leave stale data
    exposed in recently written files in case of an unclean shutdown,
    which could be a security exposure in some situations.)  Configuring
    the filesystem with a large journal can also be helpful for
    metadata-intensive workloads.

Features
========

Currently Available
-------------------

* ability to use filesystems > 16TB (e2fsprogs support not available yet)
* extent format reduces metadata overhead (RAM, IO for access, transactions)
* extent format more robust in face of on-disk corruption due to magics,
  internal redundancy in tree
* improved file allocation (multi-block alloc)
* lift 32000 subdirectory limit imposed by i_links_count[1]
* nsec timestamps for mtime, atime, ctime, create time
* inode version field on disk (NFSv4, Lustre)
* reduced e2fsck time via uninit_bg feature
* journal checksumming for robustness, performance
* persistent file preallocation (e.g. for streaming media, databases)
* ability to pack bitmaps and inode tables into larger virtual groups via the
  flex_bg feature
* large file support
* inode allocation using large virtual block groups via flex_bg
* delayed allocation
* large block (up to pagesize) support
* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
  the ordering)
* Case-insensitive file name lookups
* file-based encryption support (fscrypt)
* file-based verity support (fsverity)

[1] Filesystems with a block size of 1k may see a limit imposed by the
directory hash tree having a maximum depth of two.

case-insensitive file name lookups
======================================================

The case-insensitive file name lookup feature is supported on a
per-directory basis, allowing the user to mix case-insensitive and
case-sensitive directories in the same filesystem.  It is enabled by
flipping the +F inode attribute of an empty directory.  The
case-insensitive string match operation is only defined when we know how
text is encoded in a byte sequence.  For that reason, in order to enable
case-insensitive directories, the filesystem must have the
casefold feature, which stores the filesystem-wide encoding
model used.  By default, the charset adopted is the latest version of
Unicode (12.1.0, at the time of this writing), encoded in the UTF-8
form.  The comparison algorithm is implemented by normalizing the
strings to the Canonical decomposition form, as defined by Unicode,
followed by a byte per byte comparison.
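
The comparison described above can be sketched in userspace as follows.
This is only an illustration: it assumes that Python's ``str.casefold()``
and ``unicodedata.normalize("NFD", ...)`` are an acceptable stand-in for
the kernel's own UTF-8 casefolding tables, which may differ for some code
points and Unicode versions.  The helper names ``casefold_key`` and
``names_match`` are hypothetical.

```python
import unicodedata


def casefold_key(name: str) -> bytes:
    """Approximate the key compared during a case-insensitive lookup.

    Assumption: str.casefold() plus NFD normalization stands in for the
    kernel's casefolding; this is not the kernel implementation.
    """
    folded = unicodedata.normalize("NFD", name).casefold()
    # Casefolding can yield text that is no longer normalized, so
    # normalize once more before encoding to UTF-8.
    return unicodedata.normalize("NFD", folded).encode("utf-8")


def names_match(a: str, b: str) -> bool:
    """Byte-per-byte comparison of the normalized, casefolded names."""
    return casefold_key(a) == casefold_key(b)
```

With this sketch, a precomposed "é" and an "E" followed by a combining
acute accent compare equal, because both reduce to the same decomposed,
casefolded byte sequence.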

The case-awareness is name-preserving on the disk, meaning that the file
name provided by userspace is a byte-per-byte match to what is actually
written on the disk.  The Unicode normalization format used by the
kernel is thus an internal representation, exposed neither to
userspace nor to the disk, with the important exception of disk hashes,
used on large case-insensitive directories with the DX feature.  On DX
directories, the hash must be calculated using the casefolded version of
the filename, meaning that the normalization format used actually has an
impact on where the directory entry is stored.

When we change from viewing filenames as opaque byte sequences to seeing
them as encoded strings we need to address what happens when a program
tries to create a file with an invalid name.  The Unicode subsystem
within the kernel leaves the decision of what to do in this case to the
filesystem, which selects its preferred behavior by enabling/disabling
the strict mode.  When Ext4 encounters one of those strings and the
filesystem did not require strict mode, it falls back to considering the
entire string as an opaque byte sequence, which still allows the user to
operate on that file, but the case-insensitive lookups won't work.
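
The fallback for invalid names can be illustrated with a small userspace
sketch.  It is not the kernel code: the helper ``lookup_key`` and its
``strict`` parameter are hypothetical, Python's UTF-8 decoder is assumed
to approximate the kernel's notion of a valid name, and
``str.casefold()`` with NFD normalization is assumed as the folding step.

```python
import unicodedata


def lookup_key(raw: bytes, strict: bool = False) -> bytes:
    """Return the key used to compare a name in a casefolded directory.

    Hypothetical helper mirroring the behavior described above: invalid
    UTF-8 is rejected in strict mode, otherwise the name is treated as
    an opaque byte sequence and only matches itself exactly.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        if strict:
            raise ValueError("invalid UTF-8 name rejected in strict mode")
        return raw  # opaque bytes: no case-insensitive matching
    folded = unicodedata.normalize("NFD", text).casefold()
    return unicodedata.normalize("NFD", folded).encode("utf-8")
```

In this sketch ``b"Foo"`` and ``b"fOO"`` produce the same key, while an
invalid name such as ``b"\xffA"`` keeps its exact bytes and therefore does
not match ``b"\xffa"``.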

Options
=======

When mounting an ext4 filesystem, the following options are accepted:
(*) == default

  ro
        Mount filesystem read only. Note that ext4 will replay the journal (and
        thus write to the partition) even when mounted "read only". The mount
        options "ro,noload" can be used to prevent writes to the filesystem.

  journal_checksum
        Enable checksumming of the journal transactions.  This will allow the
        recovery code in e2fsck and the kernel to detect corruption in the
        journal.  It is a compatible change and will be ignored by older
        kernels.

  journal_async_commit
        Commit block can be written to disk without waiting for descriptor
        blocks. If enabled, older kernels cannot mount the device. This will
        enable 'journal_checksum' internally.

  journal_path=path, journal_dev=devnum
        When the external journal device's major/minor numbers have changed,
        these options allow the user to specify the new journal location.  The
        journal device is identified through either its new major/minor numbers
        encoded in devnum, or via a path to the device.

  norecovery, noload
        Don't load the journal on mounting.
        Note that if the filesystem was not unmounted cleanly, skipping the
        journal replay will lead to the filesystem containing inconsistencies
        that can lead to any number of problems.

  data=journal
        All data are committed into the journal prior to being written into the
        main file system.  Enabling this mode will disable delayed allocation
        and O_DIRECT support.

  data=ordered	(*)
        All data are forced directly out to the main file system prior to its
        metadata being committed to the journal.

  data=writeback
        Data ordering is not preserved, data may be written into the main file
        system after its metadata has been committed to the journal.

  commit=nrsec	(*)
        This setting limits the maximum age of the running transaction to
        'nrsec' seconds.  The default value is 5 seconds.  This means that if
        you lose your power, you will lose as much as the latest 5 seconds of
        metadata changes (your filesystem will not be damaged though, thanks
        to the journaling). This default value (or any low value) will hurt
        performance, but it's good for data-safety.  Setting it to 0 will have
        the same effect as leaving it at the default (5 seconds).  Setting it
        to very large values will improve performance.
        Note that due to delayed allocation even older data can be lost on
        power failure since writeback of those data begins only after the
        time set in /proc/sys/vm/dirty_expire_centisecs.

  barrier=<0|1(*)>, barrier(*), nobarrier
        This enables/disables the use of write barriers in the jbd code.
        barrier=0 disables, barrier=1 enables.  This also requires an IO stack
        which can support barriers, and if jbd gets an error on a barrier
        write, it will disable again with a warning.  Write barriers enforce
        proper on-disk ordering of journal commits, making volatile disk write
        caches safe to use, at some performance penalty.  If your disks are
        battery-backed in one way or another, disabling barriers may safely
        improve performance.  The mount options "barrier" and "nobarrier" can
        also be used to enable or disable barriers, for consistency with other
        ext4 mount options.

  inode_readahead_blks=n
        This tuning parameter controls the maximum number of inode table blocks
        that ext4's inode table readahead algorithm will pre-read into the
        buffer cache.  The default value is 32 blocks.

  bsddf	(*)
        Make 'df' act like BSD.

  minixdf
        Make 'df' act like Minix.

  debug
        Extra debugging information is sent to syslog.

  abort
        Simulate the effects of calling ext4_abort() for debugging purposes.
        This is normally used while remounting a filesystem which is already
        mounted.

  errors=remount-ro
        Remount the filesystem read-only on an error.

  errors=continue
        Keep going on a filesystem error.

  errors=panic
        Panic and halt the machine if an error occurs.  (These mount options
        override the errors behavior specified in the superblock, which can be
        configured using tune2fs)

  data_err=ignore(*)
        Just print an error message if an error occurs in a file data buffer in
        ordered mode.

  data_err=abort
        Abort the journal if an error occurs in a file data buffer in ordered
        mode.

  grpid | bsdgroups
        New objects have the group ID of their parent.

  nogrpid (*) | sysvgroups
        New objects have the group ID of their creator.

  resgid=n
        The group ID which may use the reserved blocks.

  resuid=n
        The user ID which may use the reserved blocks.

  sb=
        Use alternate superblock at this location.

  quota, noquota, grpquota, usrquota
        These options are ignored by the filesystem. They are used only by
        quota tools to recognize volumes where quota should be turned on. See
        documentation in the quota-tools package for more details
        (http://sourceforge.net/projects/linuxquota).

  jqfmt=<quota type>, usrjquota=<file>, grpjquota=<file>
        These options tell the filesystem details about quota so that quota
        information can be properly updated during journal replay. They replace
        the above quota options. See documentation in the quota-tools package
        for more details (http://sourceforge.net/projects/linuxquota).

  stripe=n
        Number of filesystem blocks that mballoc will try to use for allocation
        size and alignment. For RAID5/6 systems this should be the number of
        data disks * RAID chunk size in file system blocks.

  delalloc	(*)
        Defer block allocation until just before ext4 writes out the block(s)
        in question.  This allows ext4 to make better allocation decisions
        more efficiently.

  nodelalloc
        Disable delayed allocation.  Blocks are allocated when the data is
        copied from userspace to the page cache, either via the write(2) system
        call or when an mmap'ed page which was previously unallocated is
        written for the first time.

  max_batch_time=usec
        Maximum amount of time ext4 should wait for additional filesystem
        operations to be batched together with a synchronous write operation.
        Since a synchronous write operation is going to force a commit and then
        wait for the I/O to complete, waiting a small amount of time to see if
        any other transactions can piggyback on the synchronous write doesn't
        cost much and can be a huge throughput win.  The algorithm
        used is designed to automatically tune for the speed of the disk, by
        measuring the amount of time (on average) that it takes to finish
        committing a transaction.  Call this time the "commit time".  If the
        time that the transaction has been running is less than the commit
        time, ext4 will try sleeping for the commit time to see if other
        operations will join the transaction.  The commit time is capped by
        the max_batch_time, which defaults to 15000us (15ms).  This
        optimization can be turned off entirely by setting max_batch_time to 0.

  min_batch_time=usec
        This parameter sets the commit time (as described above) to be at least
        min_batch_time.  It defaults to zero microseconds.  Increasing this
        parameter may improve the throughput of multi-threaded, synchronous
        workloads on very fast disks, at the cost of increasing latency.

  journal_ioprio=prio
        The I/O priority (from 0 to 7, where 0 is the highest priority) which
        should be used for I/O operations submitted by kjournald2 during a
        commit operation.  This defaults to 3, which is a slightly higher
        priority than the default I/O priority.

  auto_da_alloc(*), noauto_da_alloc
        Many broken applications don't use fsync() when replacing existing
        files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/
        rename("foo.new", "foo"), or worse yet, fd = open("foo",
        O_TRUNC)/write(fd,..)/close(fd).  If auto_da_alloc is enabled, ext4
        will detect the replace-via-rename and replace-via-truncate patterns
        and force any delayed allocation blocks to be allocated such that at
        the next journal commit, in the default data=ordered mode, the data
        blocks of the new file are forced to disk before the rename() operation
        is committed.  This provides roughly the same level of guarantees as
        ext3, and avoids the "zero-length" problem that can happen when a
        system crashes before the delayed allocation blocks are forced to disk.

  noinit_itable
        Do not initialize any uninitialized inode table blocks in the
        background.  This feature may be used by installation CDs so that the
        install process can complete as quickly as possible; the inode table
        initialization process would then be deferred until the next time the
        file system is unmounted.

  init_itable=n
        The lazy itable init code will wait n times the number of milliseconds
        it took to zero out the previous block group's inode table.  This
        minimizes the impact on the system performance while the file system's
        inode table is being initialized.

  discard, nodiscard(*)
        Controls whether ext4 should issue discard/TRIM commands to the
        underlying block device when blocks are freed.  This is useful for SSD
        devices and sparse/thinly-provisioned LUNs, but it is off by default
        until sufficient testing has been done.

  nouid32
        Disables 32-bit UIDs and GIDs.  This is for interoperability with
        older kernels which only store and expect 16-bit values.

  block_validity(*), noblock_validity
        These options enable or disable the in-kernel facility for tracking
        filesystem metadata blocks within internal data structures.  This
        allows the multi-block allocator and other routines to notice bugs or
        corrupted allocation bitmaps which cause blocks to be allocated which
        overlap with filesystem metadata blocks.

  dioread_lock, dioread_nolock
        Controls whether or not ext4 should use the DIO read locking. If the
        dioread_nolock option is specified ext4 will allocate uninitialized
        extents before buffer write and convert the extents to initialized
        after IO completes. This approach allows ext4 code to avoid using the
        inode mutex, which improves scalability on high-speed storage. However
        this does not work with data journaling, and the dioread_nolock option
        will be ignored with a kernel warning. Note that the dioread_nolock code
        path is only used for extent-based files.  Because of the restrictions
        this option comprises, it is off by default (i.e. dioread_lock).

  max_dir_size_kb=n
        This limits the size of directories so that any attempt to expand them
        beyond the specified limit in kilobytes will cause an ENOSPC error.
        This is useful in memory constrained environments, where a very large
        directory can cause severe performance problems or even provoke the Out
        Of Memory killer.  (For example, if there is only 512mb memory
        available, a 176mb directory may seriously cramp the system's style.)

  i_version
        Enable 64-bit inode version support. This option is off by default.

  dax
        Use direct access (no page cache).  See
        Documentation/filesystems/dax.rst.  Note that this option is
        incompatible with data=journal.

  inlinecrypt
        When possible, encrypt/decrypt the contents of encrypted files using the
        blk-crypto framework rather than filesystem-layer encryption.
        This allows the use of inline encryption hardware. The on-disk format
        is unaffected. For more details, see
        Documentation/block/inline-encryption.rst.

Data Mode
=========
There are 3 different data modes:

* writeback mode

  In data=writeback mode, ext4 does not journal data at all.  This mode provides
  a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
  mode - metadata journaling.  A crash+recovery can cause incorrect data to
  appear in files which were written shortly before the crash.  This mode will
  typically provide the best ext4 performance.

* ordered mode

  In data=ordered mode, ext4 only officially journals metadata, but it logically
  groups metadata information related to data changes with the data blocks into
  a single unit called a transaction.  When it's time to write the new metadata
  out to disk, the associated data blocks are written first.  In general, this
  mode performs slightly slower than writeback but significantly faster than
  journal mode.

* journal mode

  data=journal mode provides full data and metadata journaling.  All new data is
  written to the journal first, and then to its final location.  In the event of
  a crash, the journal can be replayed, bringing both data and metadata into a
  consistent state.  This mode is the slowest except when data needs to be read
  from and written to disk at the same time, where it outperforms all other
  modes.  Enabling this mode will disable delayed allocation and O_DIRECT
  support.

/proc entries
=============

Information about mounted ext4 file systems can be found in
/proc/fs/ext4.  Each mounted filesystem will have a directory in
/proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or
/proc/fs/ext4/dm-0).   The files in each per-device directory are shown
in the table below.

Files in /proc/fs/ext4/<devname>

  mb_groups
        details of multiblock allocator buddy cache of free blocks

/sys entries
============

Information about mounted ext4 file systems can be found in
/sys/fs/ext4.  Each mounted filesystem will have a directory in
/sys/fs/ext4 based on its device name (e.g., /sys/fs/ext4/hdc or
/sys/fs/ext4/dm-0).   The files in each per-device directory are shown
in the table below.

Files in /sys/fs/ext4/<devname>:

(see also Documentation/ABI/testing/sysfs-fs-ext4)

  delayed_allocation_blocks
        This file is read-only and shows the number of blocks that are dirty in
        the page cache, but which do not have their location in the filesystem
        allocated yet.

  inode_goal
        Tuning parameter which (if non-zero) controls the goal inode used by
        the inode allocator in preference to all other allocation heuristics.
        This is intended for debugging use only, and should be 0 on production
        systems.

  inode_readahead_blks
        Tuning parameter which controls the maximum number of inode table
        blocks that ext4's inode table readahead algorithm will pre-read into
        the buffer cache.

  lifetime_write_kbytes
        This file is read-only and shows the number of kilobytes of data that
        have been written to this filesystem since it was created.

  max_writeback_mb_bump
        The maximum number of megabytes the writeback code will try to write
        out before moving on to another inode.

  mb_group_prealloc
        The multiblock allocator will round up allocation requests to a
        multiple of this tuning parameter if the stripe size is not set in the
        ext4 superblock.

  mb_max_to_scan
        The maximum number of extents the multiblock allocator will search to
        find the best extent.

  mb_min_to_scan
        The minimum number of extents the multiblock allocator will search to
        find the best extent.

  mb_order2_req
        Tuning parameter which controls the minimum size for requests (as a
        power of 2) where the buddy cache is used.

  mb_stats
        Controls whether the multiblock allocator should collect statistics,
        which are shown during the unmount. 1 means to collect statistics, 0
        means not to collect statistics.

  mb_stream_req
        Files which have fewer blocks than this tunable parameter will have
        their blocks allocated out of a block group specific preallocation
        pool, so that small files are packed closely together.  Each large file
        will have its blocks allocated out of its own unique preallocation
        pool.

  session_write_kbytes
        This file is read-only and shows the number of kilobytes of data that
        have been written to this filesystem since it was mounted.

  reserved_clusters
        This file is read-write and contains the number of reserved clusters in
        the file system, which are used in specific situations to avoid costly
        zeroout, unexpected ENOSPC, or possible data loss.
        The default is 2% or 4096 clusters, whichever is smaller.  This value
        can be changed, but it can never exceed the number of clusters in the
        file system.  If there is not enough space for the reserved clusters
        when mounting the file system, the mount will _not_ fail.

Ioctls
======

Ext4 implements various ioctls which can be used by applications to access
ext4-specific functionality. An incomplete list of these ioctls is shown in the
table below. This list includes truly ext4-specific ioctls (``EXT4_IOC_*``) as
well as ioctls that may have been ext4-specific originally but are now supported
by some other filesystem(s) too (``FS_IOC_*``).

Table of Ext4 ioctls

  FS_IOC_GETFLAGS
        Get additional attributes associated with inode.  The ioctl argument is
        an integer bitfield, with bit values described in ext4.h.

  FS_IOC_SETFLAGS
        Set additional attributes associated with inode.  The ioctl argument is
        an integer bitfield, with bit values described in ext4.h.

  EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD
        Get the inode i_generation number stored for each inode. The
        i_generation number is normally changed only when a new inode is
        created, and it is particularly useful for network filesystems. The '_OLD'
        version of this ioctl is an alias for FS_IOC_GETVERSION.

  EXT4_IOC_SETVERSION, EXT4_IOC_SETVERSION_OLD
        Set the inode i_generation number stored for each inode. The '_OLD'
        version of this ioctl is an alias for FS_IOC_SETVERSION.

  EXT4_IOC_GROUP_EXTEND
        This ioctl has the same purpose as the resize mount option. It allows
        resizing the filesystem up to the end of the last existing block group;
        further resizing has to be done with resize2fs, either online or
        offline. The argument points to an unsigned long number representing
        the filesystem's new block count.

  EXT4_IOC_MOVE_EXT
        Move the block extents from orig_fd (the one this ioctl is pointing to)
        to the donor_fd (the one specified in the move_extent structure passed
        as an argument to this ioctl). Then, exchange inode metadata between
        orig_fd and donor_fd.  This is especially useful for online
        defragmentation, because the allocator has the opportunity to allocate
        moved blocks better, ideally into one contiguous extent.

  EXT4_IOC_GROUP_ADD
        Add a new group descriptor to an existing or new group descriptor
        block. The new group descriptor is described by the ext4_new_group_input
        structure, which is passed as an argument to this ioctl. This is
        especially useful in conjunction with EXT4_IOC_GROUP_EXTEND, which
        allows online resize of the filesystem to the end of the last existing
        block group.  These two ioctls combined are used by userspace online
        resize tools (e.g. resize2fs).

  EXT4_IOC_MIGRATE
        This ioctl operates on the filesystem itself.  It converts (migrates)
        an ext3 indirect block mapped inode to an ext4 extent mapped inode by
        walking through the indirect block mapping of the original inode and
        converting contiguous block ranges into ext4 extents of a temporary
        inode. Then, the inodes are swapped. This ioctl might help when
        migrating from an ext3 to an ext4 filesystem; however, the suggestion
        is to create a fresh ext4 filesystem and copy data from the backup.
        Note that the filesystem has to support extents for this ioctl to work.

  EXT4_IOC_ALLOC_DA_BLKS
        Force all of the delayed allocation blocks to be allocated to preserve
        application-expected ext3 behaviour. Note that this will also start
        triggering a write of the data blocks, but this behaviour may change in
        the future as it is not necessary and has been done this way only for
        the sake of simplicity.

  EXT4_IOC_RESIZE_FS
        Resize the filesystem to a new size.  The number of blocks of the
        resized filesystem is passed in via a 64 bit integer argument.  The
        kernel allocates bitmaps and the inode table, so the userspace tool
        just passes the new number of blocks.

  EXT4_IOC_SWAP_BOOT
        Swap i_blocks and associated attributes (like i_size, i_flags, ...)
        from the specified inode with inode EXT4_BOOT_LOADER_INO (#5). This is
        typically used to store a boot loader in a secure part of the
        filesystem, where it can't be changed by a normal user by accident.
        The data blocks of the previous boot loader will be associated with the
        given inode.

References
==========

kernel source:  <file:fs/ext4/>
                <file:fs/jbd2/>

programs:       http://e2fsprogs.sourceforge.net/

useful links:   https://fedoraproject.org/wiki/ext3-devel
                http://www.bullopensource.org/ext4/
                http://ext4.wiki.kernel.org/index.php/Main_Page
                https://fedoraproject.org/wiki/Features/Ext4