.. SPDX-License-Identifier: GPL-2.0

=======================
Squashfs 4.0 Filesystem
=======================

Squashfs is a compressed read-only filesystem for Linux.

It uses zlib, lz4, lzo, or xz compression to compress files, inodes and
directories. Inodes in the system are very small and all blocks are packed to
minimise data overhead. Block sizes greater than 4 KiB are supported up to a
maximum of 1 MiB (default block size 128 KiB).

Squashfs is intended for general read-only filesystem use, for archival
use (i.e. in cases where a .tar.gz file may be used), and in constrained
block device/memory systems (e.g. embedded systems) where low overhead is
needed.

Mailing list: squashfs-devel@lists.sourceforge.net
Web site: www.squashfs.org

1. Filesystem Features
----------------------

Squashfs filesystem features versus Cramfs:

============================== ========= ==========
                               Squashfs  Cramfs
============================== ========= ==========
Max filesystem size            2^64      256 MiB
Max file size                  ~ 2 TiB   16 MiB
Max files                      unlimited unlimited
Max directories                unlimited unlimited
Max entries per directory      unlimited unlimited
Max block size                 1 MiB     4 KiB
Metadata compression           yes       no
Directory indexes              yes       no
Sparse file support            yes       no
Tail-end packing (fragments)   yes       no
Exportable (NFS etc.)          yes       no
Hard link support              yes       no
"." and ".." in readdir        yes       no
Real inode numbers             yes       no
32-bit uids/gids               yes       no
File creation time             yes       no
Xattr support                  yes       no
ACL support                    no        no
============================== ========= ==========

Squashfs compresses data, inodes and directories. In addition, inode and
directory data are highly compacted, and packed on byte boundaries. Each
compressed inode is on average 8 bytes in length (the exact length varies with
file type, i.e. regular file, directory, symbolic link, and block/char device
inodes have different sizes).

2. Using Squashfs
-----------------

As squashfs is a read-only filesystem, the mksquashfs program must be used to
create populated squashfs filesystems. This and other squashfs utilities
can be obtained from http://www.squashfs.org. Usage instructions are also
available from this site.

The squashfs-tools development tree is now located on kernel.org
        git://git.kernel.org/pub/scm/fs/squashfs/squashfs-tools.git

2.1 Mount options
-----------------

===================    =========================================================
errors=%s              Specify whether squashfs errors trigger a kernel panic
                       or not

                       ==========  =============================================
                       continue    errors don't trigger a panic (default)
                       panic       trigger a panic when errors are encountered,
                                   similar to several other filesystems (e.g.
                                   btrfs, ext4, f2fs, GFS2, jfs, ntfs, ubifs)

                                   This allows a kernel dump to be saved,
                                   useful for analyzing and debugging the
                                   corruption.
                       ==========  =============================================
threads=%s             Select the decompression mode or the number of threads

                       If SQUASHFS_CHOICE_DECOMP_BY_MOUNT is set:

                       ==========  =============================================
                       single      use single-threaded decompression (default)

                                   Only one block (data or metadata) can be
                                   decompressed at any one time. This limits
                                   CPU and memory usage to a minimum, but it
                                   also gives poor performance on parallel I/O
                                   workloads when using multiple CPU machines
                                   due to waiting on decompressor availability.
                       multi       use up to two parallel decompressors per core

                                   If you have a parallel I/O workload and your
                                   system has enough memory, using this option
                                   may improve overall I/O performance. It
                                   dynamically allocates decompressors on a
                                   demand basis.
                       percpu      use a maximum of one decompressor per core

                                   It uses percpu variables to ensure
                                   decompression is load-balanced across the
                                   cores.
                       1|2|3|...   configure the number of threads used for
                                   decompression

                                   The upper limit is num_online_cpus() * 2.
                       ==========  =============================================

                       If SQUASHFS_CHOICE_DECOMP_BY_MOUNT is **not** set and
                       SQUASHFS_DECOMP_MULTI, SQUASHFS_MOUNT_DECOMP_THREADS are
                       both set:

                       ==========  =============================================
                       2|3|...     configure the number of threads used for
                                   decompression

                                   The upper limit is num_online_cpus() * 2.
                       ==========  =============================================

===================    =========================================================

3. Squashfs Filesystem Design
-----------------------------

A squashfs filesystem consists of a maximum of nine parts, packed together on a
byte alignment::

         ---------------
        |  superblock   |
        |---------------|
        |  compression  |
        |    options    |
        |---------------|
        |  datablocks   |
        |  & fragments  |
        |---------------|
        |  inode table  |
        |---------------|
        |   directory   |
        |     table     |
        |---------------|
        |   fragment    |
        |    table      |
        |---------------|
        |    export     |
        |    table      |
        |---------------|
        |    uid/gid    |
        |  lookup table |
        |---------------|
        |     xattr     |
        |     table     |
         ---------------

Compressed data blocks are written to the filesystem as files are read from
the source directory, and checked for duplicates. Once all file data has been
written the completed inode, directory, fragment, export, uid/gid lookup and
xattr tables are written.

3.1 Compression options
-----------------------

Compressors can optionally support compression specific options (e.g.
dictionary size). If non-default compression options have been used, then
these are stored here.

3.2 Inodes
----------

Metadata (inodes and directories) are compressed in 8 KiB blocks. Each
compressed block is prefixed by a two-byte length; the top bit is set if the
block is uncompressed. A block will be uncompressed if the -noI option is set,
or if the compressed block was larger than the uncompressed block.
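
The two-byte length prefix described above can be decoded along the following
lines. This is only a sketch: the function name and the constant name are
chosen for illustration, and the little-endian byte order is an assumption that
the text above does not state.

```python
import struct
from typing import Tuple

# Top bit of the 16-bit header; per the text, set when the block is stored
# uncompressed. The constant name is illustrative.
UNCOMPRESSED_BIT = 1 << 15


def parse_metadata_header(raw: bytes) -> Tuple[int, bool]:
    """Return (stored_length, is_compressed) for a metadata block header.

    Assumes the two-byte length is little-endian (an assumption, not taken
    from the text above).
    """
    (header,) = struct.unpack("<H", raw[:2])
    stored_length = header & (UNCOMPRESSED_BIT - 1)  # low 15 bits
    is_compressed = (header & UNCOMPRESSED_BIT) == 0
    return stored_length, is_compressed
```

A reader would use the returned length to know how many bytes follow the
header, and the flag to decide whether those bytes need to be passed through
the decompressor.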

Inodes are packed into the metadata blocks, and are not aligned to block
boundaries, so inodes may straddle compressed blocks. Inodes are identified
by a 48-bit number which encodes the location of the compressed metadata block
containing the inode, and the byte offset into that block where the inode is
placed (<block, offset>).

To maximise compression there are different inodes for each file type
(regular file, directory, device, etc.), the inode contents and length
varying with the type.

To further maximise compression, two types of regular file inode and
directory inode are defined: inodes optimised for frequently occurring
regular files and directories, and extended types where extra
information has to be stored.

3.3 Directories
---------------

Like inodes, directories are packed into compressed metadata blocks, stored
in a directory table. Directories are accessed using the start address of
the metablock containing the directory and the offset into the
decompressed block (<block, offset>).

Directories are organised in a slightly complex way, and are not simply
a list of file names. The organisation takes advantage of the
fact that (in most cases) the inodes of the files will be in the same
compressed metadata block, and therefore, can share the start block.
Directories are therefore organised in a two level list, a directory
header containing the shared start block value, and a sequence of directory
entries, each of which shares the start block. A new directory header
is written whenever the inode start block changes. The directory
header/directory entry list is repeated as many times as necessary.

Directories are sorted, and can contain a directory index to speed up
file lookup. Directory indexes store one entry per metablock, each entry
storing the index/filename mapping to the first directory header
in each metadata block. Directories are sorted in alphabetical order,
and at lookup the index is scanned linearly looking for the first filename
alphabetically larger than the filename being looked up. At this point the
location of the metadata block the filename is in has been found.
The general idea of the index is to ensure only one metadata block needs to be
decompressed to do a lookup irrespective of the length of the directory.
This scheme has the advantage that it doesn't require extra memory overhead
and doesn't require much extra storage on disk.
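
The linear index scan described above can be sketched as follows. The
``IndexEntry`` type, its field names and the ``find_block`` function are
hypothetical and exist only for this illustration; plain string comparison
stands in for whatever ordering the filesystem applies to names.

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    """Hypothetical in-memory form of one directory index entry."""
    first_name: str   # first filename stored in the indexed metadata block
    block_start: int  # location of that metadata block


def find_block(index: List[IndexEntry], dir_start: int, name: str) -> int:
    """Return the metadata block that would hold `name`.

    Scans the index linearly and stops at the first entry whose filename
    sorts after `name`; the block found just before that point is the one
    to decompress. `dir_start` is the block where the directory begins.
    """
    block = dir_start
    for entry in index:
        if entry.first_name > name:
            break
        block = entry.block_start
    return block
```

Only the block returned by the scan has to be decompressed and searched, which
is the property the index is designed to provide.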

3.4 File data
-------------

Regular files consist of a sequence of contiguous compressed blocks, and/or a
compressed fragment block (tail-end packed block). The compressed size
of each datablock is stored in a block list contained within the
file inode.

To speed up access to datablocks when reading 'large' files (256 MiB or
larger), the code implements an index cache that caches the mapping from
block index to datablock location on disk.

The index cache allows Squashfs to handle large files (up to 1.75 TiB) while
retaining a simple and space-efficient block list on disk. The cache
is split into slots, caching up to eight 224 GiB files (128 KiB blocks).
Larger files use multiple slots, with 1.75 TiB files using all 8 slots.
The index cache is designed to be memory efficient, and by default uses
16 KiB.

3.5 Fragment lookup table
-------------------------

Regular files can contain a fragment index which is mapped to a fragment
location on disk and compressed size using a fragment lookup table. This
fragment lookup table is itself stored compressed into metadata blocks.
A second index table is used to locate these. Because it is small, and for
speed of access, this second index table is read at mount time and cached
in memory.

3.6 Uid/gid lookup table
------------------------

For space efficiency regular files store uid and gid indexes, which are
converted to 32-bit uids/gids using an id lookup table. This table is
stored compressed into metadata blocks. A second index table is used to
locate these. Because it is small, and for speed of access, this second
index table is read at mount time and cached in memory.

3.7 Export table
----------------

To enable Squashfs filesystems to be exportable (via NFS etc.) filesystems
can optionally (disabled with the -no-exports Mksquashfs option) contain
an inode number to inode disk location lookup table. This is required to
enable Squashfs to map inode numbers passed in filehandles to the inode
location on disk, which is necessary when the export code reinstantiates
expired/flushed inodes.

This table is stored compressed into metadata blocks. A second index table is
used to locate these. Because it is small, and for speed of access, this
second index table is read at mount time and cached in memory.

3.8 Xattr table
---------------

The xattr table contains extended attributes for each inode. The xattrs
for each inode are stored in a list, each list entry containing a type,
name and value field. The type field encodes the xattr prefix
("user.", "trusted." etc) and it also encodes how the name/value fields
should be interpreted. Currently the type indicates whether the value
is stored inline (in which case the value field contains the xattr value),
or if it is stored out of line (in which case the value field stores a
reference to where the actual value is stored). This allows large values
to be stored out of line, improving scanning and lookup performance, and it
also allows values to be de-duplicated, the value being stored once, and
all other occurrences holding an out of line reference to that value.

The xattr lists are packed into compressed 8 KiB metadata blocks.
To reduce overhead in inodes, rather than storing the on-disk
location of the xattr list inside each inode, a 32-bit xattr id
is stored. This xattr id is mapped into the location of the xattr
list using a second xattr id lookup table.
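
The lookup tables in sections 3.5 to 3.8 share the same two-level scheme: the
table itself lives in 8 KiB metadata blocks, and a small index table cached at
mount time records where each of those blocks is. A minimal sketch of the
address calculation follows. It assumes fixed-size table entries that never
straddle a metadata block; ``locate_entry`` and its parameters are
hypothetical names, and the entry size is left as a parameter because the text
above does not give it.

```python
from typing import List, Tuple

METADATA_BLOCK_SIZE = 8192  # uncompressed metadata block size (8 KiB)


def locate_entry(index_table: List[int], entry_index: int,
                 entry_size: int) -> Tuple[int, int]:
    """Map a table entry number to (metadata block location, byte offset).

    `index_table` is the in-memory second-level table: one on-disk location
    per metadata block of the lookup table. Assumes `entry_size` divides
    METADATA_BLOCK_SIZE, so entries never cross a block boundary.
    """
    byte_pos = entry_index * entry_size
    block_no = byte_pos // METADATA_BLOCK_SIZE
    offset = byte_pos % METADATA_BLOCK_SIZE
    return index_table[block_no], offset
```

With 16-byte entries, for instance, 512 entries fit in one metadata block, so
entry 512 is the first entry of the second block listed in the index table.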

4. TODOs and Outstanding Issues
-------------------------------

4.1 TODO list
-------------

Implement ACL support.

4.2 Squashfs Internal Cache
---------------------------

Blocks in Squashfs are compressed. To avoid repeatedly decompressing
recently accessed data Squashfs uses two small metadata and fragment caches.

The cache is not used for file datablocks; these are decompressed and cached in
the page-cache in the normal way. The cache is used to temporarily cache
fragment and metadata blocks which have been read as a result of a metadata
(i.e. inode or directory) or fragment access. Because metadata and fragments
are packed together into blocks (to gain greater compression) the read of a
particular piece of metadata or fragment will retrieve other metadata/fragments
which have been packed with it; because of locality of reference, these may be
read in the near future. Temporarily caching them ensures they are available
for near future access without requiring an additional read and decompress.

In the future this internal cache may be replaced with an implementation which
uses the kernel page cache. Because the page cache operates on page sized
units this may introduce additional complexity in terms of locking and
associated race conditions.
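
As a rough illustration of why a small cache of decompressed blocks helps, the
sketch below keeps the most recently used blocks keyed by their on-disk
location. The ``BlockCache`` class is hypothetical, and the least-recently-used
eviction it applies is an assumption made for the example: the text above does
not state the replacement policy used by the kernel code.

```python
from collections import OrderedDict
from typing import Callable


class BlockCache:
    """Tiny cache of decompressed metadata or fragment blocks (illustrative)."""

    def __init__(self, capacity: int,
                 read_block: Callable[[int], bytes]) -> None:
        self.capacity = capacity
        self.read_block = read_block  # reads and decompresses one block
        self.entries: "OrderedDict[int, bytes]" = OrderedDict()
        self.misses = 0

    def get(self, location: int) -> bytes:
        """Return the block at `location`, decompressing it only on a miss."""
        if location in self.entries:
            self.entries.move_to_end(location)  # mark as recently used
            return self.entries[location]
        self.misses += 1
        data = self.read_block(location)
        self.entries[location] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used (assumed policy)
        return data
```

Repeated accesses to inodes or fragments packed into the same block then cost
one read and decompress, with later accesses served from the cache until the
block is evicted.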