Notes on Filesystem Layout
--------------------------

These notes describe what mkcramfs generates. Kernel requirements are
a bit looser, e.g. it doesn't care if the <file_data> items are
swapped around (though it does care that directory entries (inodes) in
a given directory are contiguous, as this is used by readdir).

All data is currently in host-endian format; neither mkcramfs nor the
kernel ever do swabbing. (See section `Block Size' below.)

<filesystem>:
        <superblock>
        <directory_structure>
        <data>

<superblock>: struct cramfs_super (see cramfs_fs.h).

<directory_structure>:
        For each file:
                struct cramfs_inode (see cramfs_fs.h).
                Filename. Not generally null-terminated, but it is
                 null-padded to a multiple of 4 bytes.

The order of inode traversal is described as "width-first" (not to be
confused with breadth-first); i.e. like depth-first but listing all of
a directory's entries before recursing down its subdirectories: the
same order as `ls -AUR' (but without the /^\..*:$/ directory header
lines); put another way, the same order as `find -type d -exec
ls -AU1 {} \;'.

Beginning in 2.4.7, directory entries are sorted.
This optimization allows cramfs_lookup to return more quickly when a
filename does not exist, speeds up user-space directory sorts, etc.

<data>:
        One <file_data> for each file that's either a symlink or a
         regular file of non-zero st_size.

<file_data>:
        nblocks * <block_pointer>
         (where nblocks = (st_size - 1) / blksize + 1)
        nblocks * <block>
        padding to multiple of 4 bytes

The i'th <block_pointer> for a file stores the byte offset of the
*end* of the i'th <block> (i.e. one past the last byte, which is the
same as the start of the (i+1)'th <block> if there is one). The first
<block> immediately follows the last <block_pointer> for the file.
<block_pointer>s are each 32 bits long.

The order of <file_data>'s is a depth-first descent of the directory
tree, i.e. the same order as `find -size +0 \( -type f -o -type l \)
-print'.


<block>: The i'th <block> is the output of zlib's compress function
applied to the i'th blksize-sized chunk of the input data.
(For the last <block> of the file, the input may of course be smaller.)
Each <block> may be a different size. (See <block_pointer> above.)
<block>s are merely byte-aligned, not generally u32-aligned.
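
As a rough illustration of the <file_data> layout described above, the
following C sketch computes nblocks and the byte extent of the i'th
<block> from a file's block-pointer table. The function names
(cramfs_nblocks, cramfs_block_extent) are hypothetical and are not taken
from mkcramfs or the kernel. The sketch also assumes that the pointers
have already been read into a host-endian array and that offsets are
measured from the start of the filesystem image.

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks for a file of st_size bytes. Zero-length files have
 * no <file_data>, so st_size must be non-zero here. */
uint32_t cramfs_nblocks(uint32_t st_size, uint32_t blksize)
{
	assert(st_size > 0 && blksize > 0);
	return (st_size - 1) / blksize + 1;
}

/*
 * Compute the byte extent [*start, *end) of the i'th <block>.
 *
 * file_data_off: offset of this file's <file_data> (assumed to be
 *                relative to the start of the image)
 * ptrs:          the file's nblocks <block_pointer>s, host-endian
 */
void cramfs_block_extent(uint32_t file_data_off, const uint32_t *ptrs,
			 uint32_t nblocks, uint32_t i,
			 uint32_t *start, uint32_t *end)
{
	assert(i < nblocks);
	/* The first block immediately follows the last 32-bit pointer;
	 * every later block starts where the previous one ends. */
	*start = (i == 0) ? file_data_off + nblocks * 4 : ptrs[i - 1];
	*end = ptrs[i];
}
```

For example, a 4097-byte file with blksize 4096 needs 2 blocks. With
<file_data> at offset 100 and three pointers {150, 230, 300}, block 0
spans bytes 112 to 150 and block 2 spans bytes 230 to 300.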


Holes
-----

This kernel supports cramfs holes (i.e. [efficient representation of]
blocks in uncompressed data consisting entirely of NUL bytes), but by
default mkcramfs doesn't test for & create holes, since cramfs in
kernels up to at least 2.3.39 didn't support holes. Run mkcramfs
with -z if you want it to create files that can have holes in them.


Tools
-----

The cramfs user-space tools, including mkcramfs and cramfsck, are
located at <http://sourceforge.net/projects/cramfs/>.


Future Development
==================

Block Size
----------

(Block size in cramfs refers to the size of input data that is
compressed at a time. It's intended to be somewhere around
PAGE_SIZE for cramfs_readpage's convenience.)

The superblock ought to indicate the block size that the fs was
written for, since comments in <linux/pagemap.h> indicate that
PAGE_SIZE may grow in future (if I interpret the comment
correctly).

Currently, mkcramfs #define's PAGE_SIZE as 4096 and uses that
for blksize, whereas Linux-2.3.39 uses its own PAGE_SIZE (which
can be as large as 32KB on arm).
This discrepancy is a bug, though it's not clear which should be
changed.

One option is to change mkcramfs to take its PAGE_SIZE from
<asm/page.h>. Personally I don't like this option, but it does
require the least amount of change: just change `#define
PAGE_SIZE (4096)' to `#include <asm/page.h>'. The disadvantage
is that the generated cramfs cannot always be shared between different
kernels, not even necessarily kernels of the same architecture if
PAGE_SIZE is subject to change between kernel versions
(currently possible with arm and ia64).

The remaining options try to make cramfs more sharable.

One part of that is addressing endianness. The two options here are
`always use little-endian' (like ext2fs) or `writer chooses
endianness; kernel adapts at runtime'. Little-endian wins because of
code simplicity and little CPU overhead even on big-endian machines.

The cost of swabbing is changing the code to use the le32_to_cpu
etc. macros as used by ext2fs. We don't need to swab the compressed
data, only the superblock, inodes and block pointers.


The other part of making cramfs more sharable is choosing a block
size.
The options are:

  1. Always 4096 bytes.

  2. Writer chooses blocksize; kernel adapts but rejects blocksize >
     PAGE_SIZE.

  3. Writer chooses blocksize; kernel adapts even to blocksize >
     PAGE_SIZE.

It's easy enough to change the kernel to use a smaller value than
PAGE_SIZE: just make cramfs_readpage read multiple blocks.

The cost of option 1 is that kernels with a larger PAGE_SIZE
value don't get as good compression as they can.

The cost of option 2 relative to option 1 is that the code uses
variables instead of #define'd constants. The gain is that people
with kernels having larger PAGE_SIZE can make use of that if
they don't mind their cramfs being inaccessible to kernels with
smaller PAGE_SIZE values.

Option 3 is easy to implement if we don't mind being CPU-inefficient:
e.g. get readpage to decompress to a buffer of size MAX_BLKSIZE (which
must be no larger than 32KB) and discard what it doesn't need.
Getting readpage to read into all the covered pages is harder.

The main advantage of option 3 over 1, 2, is better compression. The
cost is greater complexity. Probably not worth it, but I hope someone
will disagree.
(If it is implemented, then I'll re-use that code in
e2compr.)


Another cost of 2 and 3 over 1 is making mkcramfs use a different
block size, but that just means adding and parsing a -b option.


Inode Size
----------

Given that cramfs will probably be used for CDs etc. as well as just
silicon ROMs, it might make sense to expand the inode a little from
its current 12 bytes. Inodes other than the root inode are followed
by filename, so the expansion doesn't even have to be a multiple of 4
bytes.
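
To make the size arithmetic concrete, here is a small C sketch of how
much space one directory entry occupies today: the 12-byte inode
followed by the filename, null-padded to a multiple of 4 bytes.
CRAMFS_INODE_SIZE and both helper names are hypothetical, chosen for
this example only; they do not appear in cramfs_fs.h.

```c
#include <assert.h>
#include <stddef.h>

/* Current inode size according to these notes (hypothetical name). */
#define CRAMFS_INODE_SIZE 12u

/* Length of a filename once null-padded to a multiple of 4 bytes. */
size_t cramfs_padded_namelen(size_t namelen)
{
	return (namelen + 3) & ~(size_t)3;
}

/* Bytes taken by one non-root directory entry: the inode, then the
 * padded filename. */
size_t cramfs_dirent_size(size_t namelen)
{
	assert(namelen > 0);	/* non-root entries always carry a name */
	return CRAMFS_INODE_SIZE + cramfs_padded_namelen(namelen);
}
```

For instance, an entry whose name is 5 bytes long occupies 12 + 8 = 20
bytes, while names of 1 to 4 bytes all yield a 16-byte entry.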