1.. SPDX-License-Identifier: GPL-2.0 2 3Directory Entries 4----------------- 5 6In an ext4 filesystem, a directory is more or less a flat file that maps 7an arbitrary byte string (usually ASCII) to an inode number on the 8filesystem. There can be many directory entries across the filesystem 9that reference the same inode number--these are known as hard links, and 10that is why hard links cannot reference files on other filesystems. As 11such, directory entries are found by reading the data block(s) 12associated with a directory file for the particular directory entry that 13is desired. 14 15Linear (Classic) Directories 16~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 17 18By default, each directory lists its entries in an “almost-linear” 19array. I write “almost” because it's not a linear array in the memory 20sense because directory entries are not split across filesystem blocks. 21Therefore, it is more accurate to say that a directory is a series of 22data blocks and that each block contains a linear array of directory 23entries. The end of each per-block array is signified by reaching the 24end of the block; the last entry in the block has a record length that 25takes it all the way to the end of the block. The end of the entire 26directory is of course signified by reaching the end of the file. Unused 27directory entries are signified by inode = 0. By default the filesystem 28uses ``struct ext4_dir_entry_2`` for directory entries unless the 29“filetype” feature flag is not set, in which case it uses 30``struct ext4_dir_entry``. 31 32The original directory entry format is ``struct ext4_dir_entry``, which 33is at most 263 bytes long, though on disk you'll need to reference 34``dirent.rec_len`` to know for sure. 35 36.. list-table:: 37 :widths: 8 8 24 40 38 :header-rows: 1 39 40 * - Offset 41 - Size 42 - Name 43 - Description 44 * - 0x0 45 - __le32 46 - inode 47 - Number of the inode that this directory entry points to. 48 * - 0x4 49 - __le16 50 - rec_len 51 - Length of this directory entry. Must be a multiple of 4. 52 * - 0x6 53 - __le16 54 - name_len 55 - Length of the file name. 56 * - 0x8 57 - char 58 - name[EXT4_NAME_LEN] 59 - File name. 60 61Since file names cannot be longer than 255 bytes, the new directory 62entry format shortens the name_len field and uses the space for a file 63type flag, probably to avoid having to load every inode during directory 64tree traversal. This format is ``ext4_dir_entry_2``, which is at most 65263 bytes long, though on disk you'll need to reference 66``dirent.rec_len`` to know for sure. 67 68.. list-table:: 69 :widths: 8 8 24 40 70 :header-rows: 1 71 72 * - Offset 73 - Size 74 - Name 75 - Description 76 * - 0x0 77 - __le32 78 - inode 79 - Number of the inode that this directory entry points to. 80 * - 0x4 81 - __le16 82 - rec_len 83 - Length of this directory entry. 84 * - 0x6 85 - __u8 86 - name_len 87 - Length of the file name. 88 * - 0x7 89 - __u8 90 - file_type 91 - File type code, see ftype_ table below. 92 * - 0x8 93 - char 94 - name[EXT4_NAME_LEN] 95 - File name. 96 97.. _ftype: 98 99The directory file type is one of the following values: 100 101.. list-table:: 102 :widths: 16 64 103 :header-rows: 1 104 105 * - Value 106 - Description 107 * - 0x0 108 - Unknown. 109 * - 0x1 110 - Regular file. 111 * - 0x2 112 - Directory. 113 * - 0x3 114 - Character device file. 115 * - 0x4 116 - Block device file. 117 * - 0x5 118 - FIFO. 119 * - 0x6 120 - Socket. 121 * - 0x7 122 - Symbolic link. 123 124To support directories that are both encrypted and casefolded directories, we 125must also include hash information in the directory entry. We append 126``ext4_extended_dir_entry_2`` to ``ext4_dir_entry_2`` except for the entries 127for dot and dotdot, which are kept the same. The structure follows immediately 128after ``name`` and is included in the size listed by ``rec_len`` If a directory 129entry uses this extension, it may be up to 271 bytes. 130 131.. list-table:: 132 :widths: 8 8 24 40 133 :header-rows: 1 134 135 * - Offset 136 - Size 137 - Name 138 - Description 139 * - 0x0 140 - __le32 141 - hash 142 - The hash of the directory name 143 * - 0x4 144 - __le32 145 - minor_hash 146 - The minor hash of the directory name 147 148 149In order to add checksums to these classic directory blocks, a phony 150``struct ext4_dir_entry`` is placed at the end of each leaf block to 151hold the checksum. The directory entry is 12 bytes long. The inode 152number and name_len fields are set to zero to fool old software into 153ignoring an apparently empty directory entry, and the checksum is stored 154in the place where the name normally goes. The structure is 155``struct ext4_dir_entry_tail``: 156 157.. list-table:: 158 :widths: 8 8 24 40 159 :header-rows: 1 160 161 * - Offset 162 - Size 163 - Name 164 - Description 165 * - 0x0 166 - __le32 167 - det_reserved_zero1 168 - Inode number, which must be zero. 169 * - 0x4 170 - __le16 171 - det_rec_len 172 - Length of this directory entry, which must be 12. 173 * - 0x6 174 - __u8 175 - det_reserved_zero2 176 - Length of the file name, which must be zero. 177 * - 0x7 178 - __u8 179 - det_reserved_ft 180 - File type, which must be 0xDE. 181 * - 0x8 182 - __le32 183 - det_checksum 184 - Directory leaf block checksum. 185 186The leaf directory block checksum is calculated against the FS UUID (or 187the checksum seed, if that feature is enabled for the fs), the directory's 188inode number, the directory's inode generation number, and the entire 189directory entry block up to (but not including) the fake directory entry. 190 191Hash Tree Directories 192~~~~~~~~~~~~~~~~~~~~~ 193 194A linear array of directory entries isn't great for performance, so a 195new feature was added to ext3 to provide a faster (but peculiar) 196balanced tree keyed off a hash of the directory entry name. If the 197EXT4_INDEX_FL (0x1000) flag is set in the inode, this directory uses a 198hashed btree (htree) to organize and find directory entries. For 199backwards read-only compatibility with ext2, interior tree nodes are actually 200hidden inside the directory file, masquerading as “empty” directory entries 201spanning the whole block. It was stated previously that directory entries 202with the inode set to 0 are treated as unused entries; this is (ab)used to 203fool the old linear-scan algorithm into skipping over those blocks containing 204the interior tree node data. 205 206The root of the tree always lives in the first data block of the 207directory. By ext2 custom, the '.' and '..' entries must appear at the 208beginning of this first block, so they are put here as two 209``struct ext4_dir_entry_2`` s and not stored in the tree. The rest of 210the root node contains metadata about the tree and finally a hash->block 211map to find nodes that are lower in the htree. If 212``dx_root.info.indirect_levels`` is non-zero then the htree has that many 213levels and the blocks pointed to by the root node's map are interior nodes. 214These interior nodes have a zeroed out ``struct ext4_dir_entry_2`` followed by 215a hash->block map to find nodes of the next level. Leaf nodes look like 216classic linear directory blocks, but all of its entries have a hash value 217equal or greater than the indicated hash of the parent node. 218 219The actual hash value for an entry name is only 31 bits, the least-significant 220bit is set to 0. However, if there is a hash collision between directory 221entries, the least-significant bit may get set to 1 on interior nodes in the 222case where these two (or more) hash-colliding entries do not fit into one leaf 223node and must be split across multiple nodes. 224 225To look up a name in such a htree, the code calculates the hash of the desired 226file name and uses it to find the leaf node with the range of hash values the 227calculated hash falls into (in other words, a lookup works basically the same 228as it would in a B-Tree keyed by the hash value), and possibly also scanning 229the leaf nodes that follow (in tree order) in case of hash collisions. 230 231To traverse the directory as a linear array (such as the old code does), 232the code simply reads every data block in the directory. The blocks used 233for the htree will appear to have no entries (aside from '.' and '..') 234and so only the leaf nodes will appear to have any interesting content. 235 236The root of the htree is in ``struct dx_root``, which is the full length 237of a data block: 238 239.. list-table:: 240 :widths: 8 8 24 40 241 :header-rows: 1 242 243 * - Offset 244 - Type 245 - Name 246 - Description 247 * - 0x0 248 - __le32 249 - dot.inode 250 - inode number of this directory. 251 * - 0x4 252 - __le16 253 - dot.rec_len 254 - Length of this record, 12. 255 * - 0x6 256 - u8 257 - dot.name_len 258 - Length of the name, 1. 259 * - 0x7 260 - u8 261 - dot.file_type 262 - File type of this entry, 0x2 (directory) (if the feature flag is set). 263 * - 0x8 264 - char 265 - dot.name[4] 266 - “.\0\0\0” 267 * - 0xC 268 - __le32 269 - dotdot.inode 270 - inode number of parent directory. 271 * - 0x10 272 - __le16 273 - dotdot.rec_len 274 - block_size - 12. The record length is long enough to cover all htree 275 data. 276 * - 0x12 277 - u8 278 - dotdot.name_len 279 - Length of the name, 2. 280 * - 0x13 281 - u8 282 - dotdot.file_type 283 - File type of this entry, 0x2 (directory) (if the feature flag is set). 284 * - 0x14 285 - char 286 - dotdot_name[4] 287 - “..\0\0” 288 * - 0x18 289 - __le32 290 - struct dx_root_info.reserved_zero 291 - Zero. 292 * - 0x1C 293 - u8 294 - struct dx_root_info.hash_version 295 - Hash type, see dirhash_ table below. 296 * - 0x1D 297 - u8 298 - struct dx_root_info.info_length 299 - Length of the tree information, 0x8. 300 * - 0x1E 301 - u8 302 - struct dx_root_info.indirect_levels 303 - Depth of the htree. Cannot be larger than 3 if the INCOMPAT_LARGEDIR 304 feature is set; cannot be larger than 2 otherwise. 305 * - 0x1F 306 - u8 307 - struct dx_root_info.unused_flags 308 - 309 * - 0x20 310 - __le16 311 - limit 312 - Maximum number of dx_entries that can follow this header, plus 1 for 313 the header itself. 314 * - 0x22 315 - __le16 316 - count 317 - Actual number of dx_entries that follow this header, plus 1 for the 318 header itself. 319 * - 0x24 320 - __le32 321 - block 322 - The block number (within the directory file) that lead to the left-most 323 leaf node, i.e. the leaf containing entries with the lowest hash values. 324 * - 0x28 325 - struct dx_entry 326 - entries[0] 327 - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block. 328 329.. _dirhash: 330 331The directory hash is one of the following values: 332 333.. list-table:: 334 :widths: 16 64 335 :header-rows: 1 336 337 * - Value 338 - Description 339 * - 0x0 340 - Legacy. 341 * - 0x1 342 - Half MD4. 343 * - 0x2 344 - Tea. 345 * - 0x3 346 - Legacy, unsigned. 347 * - 0x4 348 - Half MD4, unsigned. 349 * - 0x5 350 - Tea, unsigned. 351 * - 0x6 352 - Siphash. 353 354Interior nodes of an htree are recorded as ``struct dx_node``, which is 355also the full length of a data block: 356 357.. list-table:: 358 :widths: 8 8 24 40 359 :header-rows: 1 360 361 * - Offset 362 - Type 363 - Name 364 - Description 365 * - 0x0 366 - __le32 367 - fake.inode 368 - Zero, to make it look like this entry is not in use. 369 * - 0x4 370 - __le16 371 - fake.rec_len 372 - The size of the block, in order to hide all of the dx_node data. 373 * - 0x6 374 - u8 375 - name_len 376 - Zero. There is no name for this “unused” directory entry. 377 * - 0x7 378 - u8 379 - file_type 380 - Zero. There is no file type for this “unused” directory entry. 381 * - 0x8 382 - __le16 383 - limit 384 - Maximum number of dx_entries that can follow this header, plus 1 for 385 the header itself. 386 * - 0xA 387 - __le16 388 - count 389 - Actual number of dx_entries that follow this header, plus 1 for the 390 header itself. 391 * - 0xE 392 - __le32 393 - block 394 - The block number (within the directory file) that goes with the lowest 395 hash value of this block. This value is stored in the parent block. 396 * - 0x12 397 - struct dx_entry 398 - entries[0] 399 - As many 8-byte ``struct dx_entry`` as fits in the rest of the data block. 400 401The hash maps that exist in both ``struct dx_root`` and 402``struct dx_node`` are recorded as ``struct dx_entry``, which is 8 bytes 403long: 404 405.. list-table:: 406 :widths: 8 8 24 40 407 :header-rows: 1 408 409 * - Offset 410 - Type 411 - Name 412 - Description 413 * - 0x0 414 - __le32 415 - hash 416 - Hash code. 417 * - 0x4 418 - __le32 419 - block 420 - Block number (within the directory file, not filesystem blocks) of the 421 next node in the htree. 422 423(If you think this is all quite clever and peculiar, so does the 424author.) 425 426If metadata checksums are enabled, the last 8 bytes of the directory 427block (precisely the length of one dx_entry) are used to store a 428``struct dx_tail``, which contains the checksum. The ``limit`` and 429``count`` entries in the dx_root/dx_node structures are adjusted as 430necessary to fit the dx_tail into the block. If there is no space for 431the dx_tail, the user is notified to run e2fsck -D to rebuild the 432directory index (which will ensure that there's space for the checksum. 433The dx_tail structure is 8 bytes long and looks like this: 434 435.. list-table:: 436 :widths: 8 8 24 40 437 :header-rows: 1 438 439 * - Offset 440 - Type 441 - Name 442 - Description 443 * - 0x0 444 - u32 445 - dt_reserved 446 - Unused (but still part of the checksum curiously). 447 * - 0x4 448 - __le32 449 - dt_checksum 450 - Checksum of the htree directory block. 451 452The checksum is calculated against the FS UUID, the htree index header 453(dx_root or dx_node), all of the htree indices (dx_entry) that are in 454use, and the tail block (dx_tail) with the dt_checksum initially set to 0. 455