.. SPDX-License-Identifier: GPL-2.0

========================
ext4 General Information
========================

Ext4 is an advanced level of the ext3 filesystem which incorporates
scalability and reliability enhancements for supporting large filesystems
(64 bit) in keeping with increasing disk capacities and state-of-the-art
feature requirements.

Mailing list: linux-ext4@vger.kernel.org
Web site: http://ext4.wiki.kernel.org


Quick usage instructions
========================

Note: More extensive information for getting started with ext4 can be
found at the ext4 wiki site at the URL:
http://ext4.wiki.kernel.org/index.php/Ext4_Howto

  - The latest version of e2fsprogs can be found at:

    https://www.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/

    or

    http://sourceforge.net/project/showfiles.php?group_id=2406

    or grab the latest git repository from:

    https://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git

  - Create a new filesystem using the ext4 filesystem type:

        # mke2fs -t ext4 /dev/hda1

    Or to configure an existing ext3 filesystem to support extents:

        # tune2fs -O extents /dev/hda1

    If the filesystem was created with 128 byte inodes, it can be
    converted to use 256 byte inodes for greater efficiency via:

        # tune2fs -I 256 /dev/hda1

  - Mounting:

        # mount -t ext4 /dev/hda1 /wherever

  - When comparing performance with other filesystems, it's always
    important to try multiple workloads; very often a subtle change in a
    workload parameter can completely change the ranking of which
    filesystems do well compared to others. When comparing versus ext3,
    note that ext4 enables write barriers by default, while ext3 does
    not enable write barriers by default. So it is useful to
    explicitly specify whether barriers are enabled or not via the
    '-o barrier=[0|1]' mount option for both ext3 and ext4 filesystems
    for a fair comparison. When tuning ext3 for best benchmark numbers,
    it is often worthwhile to try changing the data journaling mode; '-o
    data=writeback' can be faster for some workloads. (Note however that
    running mounted with data=writeback can potentially leave stale data
    exposed in recently written files in case of an unclean shutdown,
    which could be a security exposure in some situations.) Configuring
    the filesystem with a large journal can also be helpful for
    metadata-intensive workloads.

Features
========

Currently Available
-------------------

* ability to use filesystems > 16TB (e2fsprogs support not available yet)
* extent format reduces metadata overhead (RAM, IO for access, transactions)
* extent format more robust in face of on-disk corruption due to magics,
  internal redundancy in tree
* improved file allocation (multi-block alloc)
* lift 32000 subdirectory limit imposed by i_links_count[1]
* nsec timestamps for mtime, atime, ctime, create time
* inode version field on disk (NFSv4, Lustre)
* reduced e2fsck time via uninit_bg feature
* journal checksumming for robustness, performance
* persistent file preallocation (e.g. for streaming media, databases)
* ability to pack bitmaps and inode tables into larger virtual groups via the
  flex_bg feature
* large file support
* inode allocation using large virtual block groups via flex_bg
* delayed allocation
* large block (up to pagesize) support
* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
  the ordering)
* Case-insensitive file name lookups
* file-based encryption support (fscrypt)
* file-based verity support (fsverity)

[1] Filesystems with a block size of 1k may see a limit imposed by the
directory hash tree having a maximum depth of two.

Case-insensitive file name lookups
======================================================

The case-insensitive file name lookup feature is supported on a
per-directory basis, allowing the user to mix case-insensitive and
case-sensitive directories in the same filesystem. It is enabled by
flipping the +F inode attribute of an empty directory. The
case-insensitive string match operation is only defined when we know how
text is encoded in a byte sequence. For that reason, in order to enable
case-insensitive directories, the filesystem must have the
casefold feature, which stores the filesystem-wide encoding
model used. By default, the charset adopted is the latest version of
Unicode (12.1.0, by the time of this writing), encoded in the UTF-8
form. The comparison algorithm is implemented by normalizing the
strings to the Canonical decomposition form, as defined by Unicode,
followed by a byte per byte comparison.
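
The comparison described above can be approximated in userspace. The
following is a minimal Python sketch, not the kernel implementation: the
helper names ``casefold_key`` and ``names_match`` are hypothetical, and
Python's ``str.casefold()`` plus NFD normalization is assumed to be a
close enough stand-in for the kernel's casefolding tables.

```python
import unicodedata


def casefold_key(name: str) -> bytes:
    """Return a comparison key: canonical decomposition + case folding.

    Hypothetical helper; it only approximates the in-kernel algorithm.
    """
    decomposed = unicodedata.normalize("NFD", name)
    folded = decomposed.casefold()
    # Case folding can produce new composed sequences, so normalize again
    # before encoding to the UTF-8 bytes that are actually compared.
    return unicodedata.normalize("NFD", folded).encode("utf-8")


def names_match(a: str, b: str) -> bool:
    """Byte-per-byte comparison of the normalized, casefolded names."""
    return casefold_key(a) == casefold_key(b)


if __name__ == "__main__":
    print(names_match("README.txt", "readme.TXT"))
    print(names_match("Caf\u00e9", "CAFE\u0301"))
```

The second call compares a precomposed "é" with "E" followed by a
combining acute accent; both reduce to the same decomposed, folded byte
sequence, which is why normalization has to happen before the byte
comparison.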

The case-awareness is name-preserving on the disk, meaning that the file
name provided by userspace is a byte-per-byte match to what is actually
written on the disk. The Unicode normalization format used by the
kernel is thus an internal representation, and not exposed to the
userspace nor to the disk, with the important exception of disk hashes,
used on large case-insensitive directories with DX feature. On DX
directories, the hash must be calculated using the casefolded version of
the filename, meaning that the normalization format used actually has an
impact on where the directory entry is stored.

When we change from viewing filenames as opaque byte sequences to seeing
them as encoded strings we need to address what happens when a program
tries to create a file with an invalid name. The Unicode subsystem
within the kernel leaves the decision of what to do in this case to the
filesystem, which selects its preferred behavior by enabling/disabling
the strict mode. When Ext4 encounters one of those strings and the
filesystem did not require strict mode, it falls back to considering the
entire string as an opaque byte sequence, which still allows the user to
operate on that file, but the case-insensitive lookups won't work.
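
The fallback for invalid names can be illustrated with a short Python
sketch. This is an assumption-laden model rather than kernel code: the
function ``lookup_key`` is hypothetical, invalid UTF-8 stands in for
"an invalid name", and rejecting the name with an exception is how this
sketch chooses to model strict mode.

```python
import unicodedata


def lookup_key(raw_name: bytes, strict: bool = False) -> bytes:
    """Return the key a case-insensitive lookup would compare.

    Hypothetical helper: valid UTF-8 names are normalized and casefolded;
    invalid names are either rejected (strict) or kept as opaque bytes.
    """
    try:
        text = raw_name.decode("utf-8")
    except UnicodeDecodeError:
        if strict:
            raise ValueError("invalid UTF-8 file name rejected in strict mode")
        # Non-strict fallback: the name is an opaque byte sequence, so only
        # an exact byte match will ever find it.
        return raw_name
    folded = unicodedata.normalize("NFD", text).casefold()
    return unicodedata.normalize("NFD", folded).encode("utf-8")


if __name__ == "__main__":
    print(lookup_key(b"Report.TXT") == lookup_key(b"report.txt"))
    print(lookup_key(b"\xffNAME") == lookup_key(b"\xffname"))
```

In the non-strict case the file remains reachable by its exact bytes, but
a differently-cased spelling no longer matches it.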

Options
=======

When mounting an ext4 filesystem, the following options are accepted:
(*) == default

  ro
        Mount filesystem read only. Note that ext4 will replay the journal (and
        thus write to the partition) even when mounted "read only". The mount
        options "ro,noload" can be used to prevent writes to the filesystem.

  journal_checksum
        Enable checksumming of the journal transactions. This will allow the
        recovery code in e2fsck and the kernel to detect corruption in the
        journal. It is a compatible change and will be ignored by older
        kernels.

  journal_async_commit
        Commit block can be written to disk without waiting for descriptor
        blocks. If enabled, older kernels cannot mount the device. This will
        enable 'journal_checksum' internally.

  journal_path=path, journal_dev=devnum
        When the external journal device's major/minor numbers have changed,
        these options allow the user to specify the new journal location. The
        journal device is identified through either its new major/minor numbers
        encoded in devnum, or via a path to the device.

  norecovery, noload
        Don't load the journal on mounting.
        Note that if the filesystem was
        not unmounted cleanly, skipping the journal replay will lead to the
        filesystem containing inconsistencies that can lead to any number of
        problems.

  data=journal
        All data are committed into the journal prior to being written into the
        main file system. Enabling this mode will disable delayed allocation
        and O_DIRECT support.

  data=ordered (*)
        All data are forced directly out to the main file system prior to its
        metadata being committed to the journal.

  data=writeback
        Data ordering is not preserved, data may be written into the main file
        system after its metadata has been committed to the journal.

  commit=nrsec (*)
        This setting limits the maximum age of the running transaction to
        'nrsec' seconds. The default value is 5 seconds. This means that if
        you lose your power, you will lose as much as the latest 5 seconds of
        metadata changes (your filesystem will not be damaged though, thanks
        to the journaling). This default value (or any low value) will hurt
        performance, but it's good for data-safety. Setting it to 0 will have
        the same effect as leaving it at the default (5 seconds). Setting it
        to very large values will improve performance.
        Note that due to
        delayed allocation even older data can be lost on power failure since
        writeback of those data begins only after the time set in
        /proc/sys/vm/dirty_expire_centisecs.

  barrier=<0|1(*)>, barrier(*), nobarrier
        This enables/disables the use of write barriers in the jbd code.
        barrier=0 disables, barrier=1 enables. This also requires an IO stack
        which can support barriers, and if jbd gets an error on a barrier
        write, it will disable barriers again with a warning. Write barriers
        enforce proper on-disk ordering of journal commits, making volatile
        disk write caches safe to use, at some performance penalty. If your
        disks are battery-backed in one way or another, disabling barriers may
        safely improve performance. The mount options "barrier" and
        "nobarrier" can also be used to enable or disable barriers, for
        consistency with other ext4 mount options.

  inode_readahead_blks=n
        This tuning parameter controls the maximum number of inode table blocks
        that ext4's inode table readahead algorithm will pre-read into the
        buffer cache. The default value is 32 blocks.

  nouser_xattr
        Disables Extended User Attributes. See the attr(5) manual page for
        more information about extended attributes.

  noacl
        This option disables POSIX Access Control List support.
        If ACL support
        is enabled in the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL
        is enabled by default on mount. See the acl(5) manual page for more
        information about acl.

  bsddf (*)
        Make 'df' act like BSD.

  minixdf
        Make 'df' act like Minix.

  debug
        Extra debugging information is sent to syslog.

  abort
        Simulate the effects of calling ext4_abort() for debugging purposes.
        This is normally used while remounting a filesystem which is already
        mounted.

  errors=remount-ro
        Remount the filesystem read-only on an error.

  errors=continue
        Keep going on a filesystem error.

  errors=panic
        Panic and halt the machine if an error occurs. (These mount options
        override the errors behavior specified in the superblock, which can be
        configured using tune2fs)

  data_err=ignore(*)
        Just print an error message if an error occurs in a file data buffer in
        ordered mode.

  data_err=abort
        Abort the journal if an error occurs in a file data buffer in ordered
        mode.

  grpid | bsdgroups
        New objects have the group ID of their parent.

  nogrpid (*) | sysvgroups
        New objects have the group ID of their creator.

  resgid=n
        The group ID which may use the reserved blocks.

  resuid=n
        The user ID which may use the reserved blocks.

  sb=
        Use alternate superblock at this location.

  quota, noquota, grpquota, usrquota
        These options are ignored by the filesystem. They are used only by
        quota tools to recognize volumes where quota should be turned on. See
        documentation in the quota-tools package for more details
        (http://sourceforge.net/projects/linuxquota).

  jqfmt=<quota type>, usrjquota=<file>, grpjquota=<file>
        These options tell the filesystem details about quota so that quota
        information can be properly updated during journal replay. They replace
        the above quota options. See documentation in the quota-tools package
        for more details (http://sourceforge.net/projects/linuxquota).

  stripe=n
        Number of filesystem blocks that mballoc will try to use for allocation
        size and alignment. For RAID5/6 systems this should be the number of
        data disks * RAID chunk size in file system blocks.

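The stripe arithmetic above can be worked through with a small helper.
This is an illustrative Python sketch only; the function name
``stripe_blocks`` and its parameters are hypothetical, and the example
array geometry is made up for the calculation.

```python
def stripe_blocks(data_disks: int, chunk_kib: int, block_size: int = 4096) -> int:
    """Return a value for stripe=n: data disks * chunk size in fs blocks.

    data_disks excludes parity disks (e.g. a 4-disk RAID5 has 3 data disks).
    chunk_kib is the RAID chunk size in KiB; block_size is in bytes.
    """
    chunk_bytes = chunk_kib * 1024
    if data_disks < 1 or chunk_bytes % block_size != 0:
        raise ValueError("chunk size must be a whole number of filesystem blocks")
    return data_disks * (chunk_bytes // block_size)


if __name__ == "__main__":
    # 4-disk RAID5 (3 data disks), 64 KiB chunks, 4 KiB filesystem blocks:
    # 64 KiB / 4 KiB = 16 blocks per chunk, times 3 data disks.
    print(stripe_blocks(data_disks=3, chunk_kib=64))
```

With that geometry the result would be passed as ``-o stripe=48`` at
mount time.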
  delalloc (*)
        Defer block allocation until just before ext4 writes out the block(s)
        in question. This allows ext4 to make better allocation decisions more
        efficiently.

  nodelalloc
        Disable delayed allocation. Blocks are allocated when the data is
        copied from userspace to the page cache, either via the write(2) system
        call or when an mmap'ed page which was previously unallocated is
        written for the first time.

  max_batch_time=usec
        Maximum amount of time ext4 should wait for additional filesystem
        operations to be batched together with a synchronous write operation.
        Since a synchronous write operation is going to force a commit and then
        wait for the I/O to complete, it doesn't cost much, and can be a huge
        throughput win, to wait for a small amount of time to see if any other
        transactions can piggyback on the synchronous write. The algorithm
        used is designed to automatically tune for the speed of the disk, by
        measuring the amount of time (on average) that it takes to finish
        committing a transaction. Call this time the "commit time". If the
        time that the transaction has been running is less than the commit
        time, ext4 will try sleeping for the commit time to see if other
        operations will join the transaction. The commit time is capped by
        the max_batch_time, which defaults to 15000us (15ms). This
        optimization can be turned off entirely by setting max_batch_time to 0.

  min_batch_time=usec
        This parameter sets the commit time (as described above) to be at least
        min_batch_time. It defaults to zero microseconds. Increasing this
        parameter may improve the throughput of multi-threaded, synchronous
        workloads on very fast disks, at the cost of increasing latency.

  journal_ioprio=prio
        The I/O priority (from 0 to 7, where 0 is the highest priority) which
        should be used for I/O operations submitted by kjournald2 during a
        commit operation. This defaults to 3, which is a slightly higher
        priority than the default I/O priority.

  auto_da_alloc(*), noauto_da_alloc
        Many broken applications don't use fsync() when replacing existing
        files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/
        rename("foo.new", "foo"), or worse yet, fd = open("foo",
        O_TRUNC)/write(fd,..)/close(fd). If auto_da_alloc is enabled, ext4
        will detect the replace-via-rename and replace-via-truncate patterns
        and force that any delayed allocation blocks are allocated such that at
        the next journal commit, in the default data=ordered mode, the data
        blocks of the new file are forced to disk before the rename() operation
        is committed. This provides roughly the same level of guarantees as
        ext3, and avoids the "zero-length" problem that can happen when a
        system crashes before the delayed allocation blocks are forced to disk.

  noinit_itable
        Do not initialize any uninitialized inode table blocks in the
        background. This feature may be used by installation CDs so that the
        install process can complete as quickly as possible; the inode table
        initialization process would then be deferred until the next time the
        file system is unmounted.

  init_itable=n
        The lazy itable init code will wait n times the number of milliseconds
        it took to zero out the previous block group's inode table. This
        minimizes the impact on the system performance while the file system's
        inode table is being initialized.

  discard, nodiscard(*)
        Controls whether ext4 should issue discard/TRIM commands to the
        underlying block device when blocks are freed. This is useful for SSD
        devices and sparse/thinly-provisioned LUNs, but it is off by default
        until sufficient testing has been done.

  nouid32
        Disables 32-bit UIDs and GIDs. This is for interoperability with
        older kernels which only store and expect 16-bit values.

  block_validity(*), noblock_validity
        These options enable or disable the in-kernel facility for tracking
        filesystem metadata blocks within internal data structures. This
        allows the multi-block allocator and other routines to notice bugs or
        corrupted allocation bitmaps which cause blocks to be allocated which
        overlap with filesystem metadata blocks.

  dioread_lock, dioread_nolock
        Controls whether or not ext4 should use the DIO read locking. If the
        dioread_nolock option is specified, ext4 will allocate an uninitialized
        extent before buffer write and convert the extent to initialized after
        IO completes. This approach allows ext4 code to avoid using the inode
        mutex, which improves scalability on high-speed storage. However, this
        does not work with data journaling, and the dioread_nolock option will
        be ignored with a kernel warning. Note that the dioread_nolock code
        path is only used for extent-based files. Because of the restrictions
        this option comprises, it is off by default (i.e. dioread_lock).

  max_dir_size_kb=n
        This limits the size of directories so that any attempt to expand them
        beyond the specified limit in kilobytes will cause an ENOSPC error.
        This is useful in memory constrained environments, where a very large
        directory can cause severe performance problems or even provoke the Out
        Of Memory killer. (For example, if there is only 512mb memory
        available, a 176mb directory may seriously cramp the system's style.)

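The auto_da_alloc entry above describes applications that replace files
without calling fsync(). For contrast, here is a minimal Python sketch of
a replace-via-rename that does not rely on that heuristic. The function
name ``replace_file_durably`` is hypothetical, and syncing the containing
directory after the rename is an extra precaution assumed here, not
something the text above prescribes.

```python
import os
import tempfile


def replace_file_durably(path: str, data: bytes) -> None:
    """Replace `path` with `data` via write + fsync + rename.

    Hypothetical helper: the new contents are flushed to disk before the
    rename, so a crash leaves either the old file or the complete new one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force the data blocks out first
        os.replace(tmp_path, path)  # atomic rename over the old name
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # Assumption: also sync the directory so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "foo")
        replace_file_durably(target, b"new contents\n")
        with open(target, "rb") as f:
            print(f.read())
```

The temporary file is created in the same directory as the target so that
the rename never crosses a filesystem boundary.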
  i_version
        Enable 64-bit inode version support. This option is off by default.

  dax
        Use direct access (no page cache). See
        Documentation/filesystems/dax.txt. Note that this option is
        incompatible with data=journal.

Data Mode
=========
There are 3 different data modes:

* writeback mode

  In data=writeback mode, ext4 does not journal data at all. This mode provides
  a similar level of journaling as that of XFS, JFS, and ReiserFS in its default
  mode - metadata journaling. A crash+recovery can cause incorrect data to
  appear in files which were written shortly before the crash. This mode will
  typically provide the best ext4 performance.

* ordered mode

  In data=ordered mode, ext4 only officially journals metadata, but it logically
  groups metadata information related to data changes with the data blocks into
  a single unit called a transaction. When it's time to write the new metadata
  out to disk, the associated data blocks are written first. In general, this
  mode performs slightly slower than writeback but significantly faster than
  journal mode.

* journal mode

  data=journal mode provides full data and metadata journaling.
  All new data is
  written to the journal first, and then to its final location. In the event of
  a crash, the journal can be replayed, bringing both data and metadata into a
  consistent state. This mode is the slowest except when data needs to be read
  from and written to disk at the same time, where it outperforms all other
  modes. Enabling this mode will disable delayed allocation and O_DIRECT
  support.

/proc entries
=============

Information about mounted ext4 file systems can be found in
/proc/fs/ext4. Each mounted filesystem will have a directory in
/proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or
/proc/fs/ext4/dm-0). The files in each per-device directory are shown
in the table below.

Files in /proc/fs/ext4/<devname>

  mb_groups
        details of multiblock allocator buddy cache of free blocks

/sys entries
============

Information about mounted ext4 file systems can be found in
/sys/fs/ext4. Each mounted filesystem will have a directory in
/sys/fs/ext4 based on its device name (e.g., /sys/fs/ext4/hdc or
/sys/fs/ext4/dm-0). The files in each per-device directory are shown
in the table below.

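The per-device directory layout described above can be walked with a few
lines of Python. This is a hedged sketch: the function name
``read_ext4_tunables`` is hypothetical, and skipping a non-device
``features`` directory is an assumption about what else may live under
/sys/fs/ext4 on a given kernel.

```python
from pathlib import Path


def read_ext4_tunables(base: str = "/sys/fs/ext4") -> dict:
    """Return {devname: {filename: contents}} for each per-device directory.

    Hypothetical helper; unreadable entries are silently skipped.
    """
    result = {}
    base_path = Path(base)
    if not base_path.is_dir():
        return result  # ext4 not loaded, or sysfs not mounted here
    for dev_dir in sorted(base_path.iterdir()):
        # Assumption: "features" is not a mounted device, so skip it.
        if not dev_dir.is_dir() or dev_dir.name == "features":
            continue
        values = {}
        for entry in sorted(dev_dir.iterdir()):
            if not entry.is_file():
                continue
            try:
                values[entry.name] = entry.read_text().strip()
            except (OSError, UnicodeDecodeError):
                continue  # e.g. write-only or otherwise unreadable files
        result[dev_dir.name] = values
    return result


if __name__ == "__main__":
    for dev, files in read_ext4_tunables().items():
        print(dev, files.get("session_write_kbytes"))
```

Passing a different ``base`` makes the function easy to try against a
scratch directory instead of the live sysfs tree.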
Files in /sys/fs/ext4/<devname>:

(see also Documentation/ABI/testing/sysfs-fs-ext4)

  delayed_allocation_blocks
        This file is read-only and shows the number of blocks that are dirty in
        the page cache, but which do not have their location in the filesystem
        allocated yet.

  inode_goal
        Tuning parameter which (if non-zero) controls the goal inode used by
        the inode allocator in preference to all other allocation heuristics.
        This is intended for debugging use only, and should be 0 on production
        systems.

  inode_readahead_blks
        Tuning parameter which controls the maximum number of inode table
        blocks that ext4's inode table readahead algorithm will pre-read into
        the buffer cache.

  lifetime_write_kbytes
        This file is read-only and shows the number of kilobytes of data that
        have been written to this filesystem since it was created.

  max_writeback_mb_bump
        The maximum number of megabytes the writeback code will try to write
        out before moving on to another inode.

  mb_group_prealloc
        The multiblock allocator will round up allocation requests to a
        multiple of this tuning parameter if the stripe size is not set in the
        ext4 superblock.

  mb_max_to_scan
        The maximum number of extents the multiblock allocator will search to
        find the best extent.

  mb_min_to_scan
        The minimum number of extents the multiblock allocator will search to
        find the best extent.

  mb_order2_req
        Tuning parameter which controls the minimum size for requests (as a
        power of 2) where the buddy cache is used.

  mb_stats
        Controls whether the multiblock allocator should collect statistics,
        which are shown during the unmount. 1 means to collect statistics, 0
        means not to collect statistics.

  mb_stream_req
        Files which have fewer blocks than this tunable parameter will have
        their blocks allocated out of a block group specific preallocation
        pool, so that small files are packed closely together.  Each large file
        will have its blocks allocated out of its own unique preallocation
        pool.

  session_write_kbytes
        This file is read-only and shows the number of kilobytes of data that
        have been written to this filesystem since it was mounted.

  reserved_clusters
        This is a read-write file and contains the number of reserved clusters in the file
        system which will be used in specific situations to avoid costly
        zeroout, unexpected ENOSPC, or possible data loss. The default is 2% or
        4096 clusters, whichever is smaller. This can be changed, but it
        can never exceed the number of clusters in the file system. If there is
        not enough space for the reserved space when mounting the file system,
        the mount will _not_ fail.

Ioctls
======

There is some Ext4 specific functionality which can be accessed by applications
through the system call interfaces. The list of all Ext4 specific ioctls is
shown in the table below.

Table of Ext4 specific ioctls

  EXT4_IOC_GETFLAGS
        Get additional attributes associated with the inode.  The ioctl argument
        is an integer bitfield, with bit values described in ext4.h. This ioctl
        is an alias for FS_IOC_GETFLAGS.

  EXT4_IOC_SETFLAGS
        Set additional attributes associated with the inode.  The ioctl argument
        is an integer bitfield, with bit values described in ext4.h. This ioctl
        is an alias for FS_IOC_SETFLAGS.

  EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD
        Get the inode i_generation number stored for each inode. The
        i_generation number is normally changed only when a new inode is created
        and it is particularly useful for network filesystems. The '_OLD'
        version of this ioctl is an alias for FS_IOC_GETVERSION.

  EXT4_IOC_SETVERSION, EXT4_IOC_SETVERSION_OLD
        Set the inode i_generation number stored for each inode. The '_OLD'
        version of this ioctl is an alias for FS_IOC_SETVERSION.

  EXT4_IOC_GROUP_EXTEND
        This ioctl has the same purpose as the resize mount option. It allows
        resizing the filesystem to the end of the last existing block group;
        further resizing has to be done with resize2fs, either online or
        offline. The argument points to the unsigned long number representing
        the filesystem's new block count.

  EXT4_IOC_MOVE_EXT
        Move the block extents from orig_fd (the one this ioctl is pointing to)
        to the donor_fd (the one specified in the move_extent structure passed
        as an argument to this ioctl). Then, exchange inode metadata between
        orig_fd and donor_fd.  This is especially useful for online
        defragmentation, because the allocator has the opportunity to allocate
        moved blocks better, ideally into one contiguous extent.

  EXT4_IOC_GROUP_ADD
        Add a new group descriptor to an existing or new group descriptor
        block. The new group descriptor is described by the ext4_new_group_input
        structure, which is passed as an argument to this ioctl. This is
        especially useful in conjunction with EXT4_IOC_GROUP_EXTEND, which
        allows online resize of the filesystem to the end of the last existing
        block group.  Those two ioctls combined are used in userspace online
        resize tools (e.g. resize2fs).

  EXT4_IOC_MIGRATE
        This ioctl operates on the filesystem itself.  It converts (migrates)
        an ext3 indirect block mapped inode to an ext4 extent mapped inode by
        walking through the indirect block mapping of the original inode and
        converting contiguous block ranges into ext4 extents of the temporary
        inode. Then, the inodes are swapped. This ioctl might help when
        migrating from an ext3 to an ext4 filesystem; however, the suggestion is
        to create a fresh ext4 filesystem and copy data from the backup. Note
        that the filesystem has to support extents for this ioctl to work.

  EXT4_IOC_ALLOC_DA_BLKS
        Force all of the delay allocated blocks to be allocated to preserve
        application-expected ext3 behaviour. Note that this will also start
        triggering a write of the data blocks, but this behaviour may change in
        the future as it is not necessary and has been done this way only for
        the sake of simplicity.

  EXT4_IOC_RESIZE_FS
        Resize the filesystem to a new size.  The number of blocks of the resized
        filesystem is passed in via a 64 bit integer argument.  The kernel
        allocates bitmaps and inode table, so the userspace tool just passes
        the new number of blocks.

  EXT4_IOC_SWAP_BOOT
        Swap i_blocks and associated attributes (like i_blocks, i_size,
        i_flags, ...) from the specified inode with inode EXT4_BOOT_LOADER_INO
        (#5). This is typically used to store a boot loader in a secure part of
        the filesystem, where it can't be changed by a normal user by accident.
        The data blocks of the previous boot loader will be associated with the
        given inode.

References
==========

kernel source:  <file:fs/ext4/>
                <file:fs/jbd2/>

programs:       http://e2fsprogs.sourceforge.net/

useful links:   http://fedoraproject.org/wiki/ext3-devel
                http://www.bullopensource.org/ext4/
                http://ext4.wiki.kernel.org/index.php/Main_Page
                http://fedoraproject.org/wiki/Features/Ext4