.\" Copyright (c) 2007 Tim Kientzle
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd December 23, 2011
.Dt CPIO 5
.Os
.Sh NAME
.Nm cpio
.Nd format of cpio archive files
.Sh DESCRIPTION
The
.Nm
archive format collects any number of files, directories, and other
file system objects (symbolic links, device nodes, etc.) into a single
stream of bytes.
.Ss General Format
Each file system object in a
.Nm
archive comprises a header record with basic numeric metadata
followed by the full pathname of the entry and the file data.
The header record stores a series of integer values that generally
follow the fields in
.Va struct stat .
(See
.Xr stat 2
for details.)
The variants differ primarily in how they store those integers
(binary, octal, or hexadecimal).
The header is followed by the pathname of the
entry (the length of the pathname is stored in the header)
and any file data.
The end of the archive is indicated by a special record with
the pathname
.Dq TRAILER!!! .
.Ss PWB Format
The PWB binary
.Nm
format is the original format, introduced when cpio first appeared as
part of the Programmer's Work Bench system, a variant of 6th Edition UNIX.
It stores numbers as 2-byte and 4-byte binary values.
Each entry begins with a header in the following format:
.Pp
.Bd -literal -offset indent
struct header_pwb_cpio {
        short   h_magic;
        short   h_dev;
        short   h_ino;
        short   h_mode;
        short   h_uid;
        short   h_gid;
        short   h_nlink;
        short   h_majmin;
        long    h_mtime;
        short   h_namesize;
        long    h_filesize;
};
.Ed
.Pp
The
.Va short
fields here are 16-bit integer values, while the
.Va long
fields are 32-bit integers.
PWB UNIX, like the 6th Edition UNIX it was based on, ran only on
PDP-11 computers, so these values are stored in PDP-endian format.
Each short is little-endian, and each long is stored as two
little-endian shorts with the most significant short first.
That is, the long integer whose hexadecimal
representation is 0x12345678 would be stored in four successive bytes
as 0x34, 0x12, 0x78, 0x56.
The fields are as follows:
.Bl -tag -width indent
.It Va h_magic
The integer value octal 070707.
.It Va h_dev , Va h_ino
The device and inode numbers from the disk.
These are used by programs that read
.Nm
archives to determine when two entries refer to the same file.
Programs that synthesize
.Nm
archives should be careful to set these to distinct values for each entry.
.It Va h_mode
The mode specifies both the regular permissions and the file type.
The field is a raw copy of the mode field in the inode representing
the file, so it also holds two bits that are irrelevant to the cpio
format.
The first is the IALLOC flag, which shows that the inode entry is in use.
The second is the ILARG flag, which shows that the file it represents
is large enough to have indirect block pointers in the inode.
The mode is decoded as follows:
.Pp
.Bl -tag -width "MMMMMMM" -compact
.It 0100000
IALLOC flag - irrelevant to cpio.
.It 0060000
This masks the file type bits.
.It 0040000
File type value for directories.
.It 0020000
File type value for character special devices.
.It 0060000
File type value for block special devices.
.It 0010000
ILARG flag - irrelevant to cpio.
.It 0004000
SUID bit.
.It 0002000
SGID bit.
.It 0001000
Sticky bit.
.It 0000777
The lower 9 bits specify read/write/execute permissions
for world, group, and user following standard POSIX conventions.
.El
.It Va h_uid , Va h_gid
The numeric user id and group id of the owner.
.It Va h_nlink
The number of links to this file.
Directories always have a value of at least two here.
Note that hardlinked files include file data with every copy in the archive.
.It Va h_majmin
For block special and character special entries,
this field contains the associated device number, with the major
number in the high byte, and the minor number in the low byte.
For all other entry types, it should be set to zero by writers
and ignored by readers.
.It Va h_mtime
Modification time of the file, indicated as the number
of seconds since the start of the epoch,
00:00:00 UTC January 1, 1970.
.It Va h_namesize
The number of bytes in the pathname that follows the header.
This count includes the trailing NUL byte.
.It Va h_filesize
The size of the file.
Note that this archive format is limited to 16 megabyte file sizes,
because PWB UNIX, like 6th Edition, used only an unsigned 24-bit
integer for the file size internally.
.El
.Pp
The pathname immediately follows the fixed header.
If
.Va h_namesize
is odd, an additional NUL byte is added after the pathname.
The file data is then appended, again with an additional NUL
appended if needed to get the next header at an even offset.
.Pp
Hardlinked files are not given special treatment;
the full file contents are included with each copy of the
file.
.Ss New Binary Format
The new binary
.Nm
format appeared when cpio was adopted into late 7th Edition UNIX.
It is exactly like the PWB binary format, described above, except for
three changes:
.Pp
First, UNIX now ran on more than one hardware type.
The byte order of the 16-bit integers must therefore be determined by
examining the magic number at the start of the header.
If the magic number appears byte-swapped, every 16-bit value in the
header must be byte-swapped as well.
The 32-bit integers are still always stored with the most significant
16-bit word first.
Each of the two
.Va long
fields in the structure shown above
.Pq Va h_mtime No and Va h_filesize
is therefore stored as an array of two 16-bit integers in that
traditional order.
Those 16-bit integers, like all the others in the structure, were
accessed using a macro that byte-swapped them if necessary.
.Pp
Next, 7th Edition had more file types to store, and the IALLOC and ILARG
flag bits were repurposed to accommodate these.
The revised use of the various bits is as follows:
.Pp
.Bl -tag -width "MMMMMMM" -compact
.It 0170000
This masks the file type bits.
.It 0140000
File type value for sockets.
.It 0120000
File type value for symbolic links.
For symbolic links, the link body is stored as file data.
.It 0100000
File type value for regular files.
.It 0060000
File type value for block special devices.
.It 0040000
File type value for directories.
.It 0020000
File type value for character special devices.
.It 0010000
File type value for named pipes or FIFOs.
.It 0004000
SUID bit.
.It 0002000
SGID bit.
.It 0001000
Sticky bit.
.It 0000777
The lower 9 bits specify read/write/execute permissions
for world, group, and user following standard POSIX conventions.
.El
.Pp
Finally, the file size field now represents a signed 32-bit integer in
the underlying file system, so the maximum file size has increased to
2 gigabytes.
.Pp
Note that there is no obvious way to tell which of the two binary
formats an archive uses, other than to see which one makes more
sense.
The typical error scenario is that a PWB format archive
unpacked as if it were in the new format will create named sockets
instead of directories, and then fail to unpack files that should
go in those directories.
This happens because a PWB directory carries the IALLOC bit plus the
directory type (0140000 in total), which the new format reads as a socket.
A large PWB file also carries the ILARG bit (0110000), which is not a
valid file type in the new format.
Running
.Ql bsdcpio -itv
on an unknown archive will make it obvious which format it uses.
If it is in PWB format, directories will be listed with an 's' instead
of a 'd' as the first character of the mode string, and the larger
files will have a '?' in that position.
.Ss Portable ASCII Format
.St -susv2
standardized an ASCII variant that is portable across all
platforms.
It is commonly known as the
.Dq old character
format or as the
.Dq odc
format.
It stores the same numeric fields as the new binary format, but
represents them as 6-character or 11-character octal values.
.Pp
.Bd -literal -offset indent
struct cpio_odc_header {
        char    c_magic[6];
        char    c_dev[6];
        char    c_ino[6];
        char    c_mode[6];
        char    c_uid[6];
        char    c_gid[6];
        char    c_nlink[6];
        char    c_rdev[6];
        char    c_mtime[11];
        char    c_namesize[6];
        char    c_filesize[11];
};
.Ed
.Pp
The fields are identical to those in the new binary format; in
particular,
.Va c_magic
holds the six characters
.Dq 070707 .
The name and file body follow the fixed header.
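.Pp
As an illustration, the C fragment below sketches how a reader might
decode an odc header.
It is a minimal example, not code from any particular implementation.
It assumes the 76-byte fixed header is already in memory, checks the
.Dq 070707
magic string, and decodes only the fields needed to find the next
entry.
The names
.Dv ODC_HEADER_SIZE ,
.Vt struct odc_entry ,
.Fn parse_octal ,
.Fn parse_odc_header ,
and
.Fn odc_entry_size
are hypothetical.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed odc header size: eight 6-byte fields, c_mtime (11),
   c_namesize (6), and c_filesize (11) add up to 76 bytes. */
#define ODC_HEADER_SIZE 76

/* Hypothetical subset of the decoded header fields. */
struct odc_entry {
    uint64_t mode;
    uint64_t mtime;
    uint64_t namesize;
    uint64_t filesize;
};

/* Parse a fixed-width octal field; the field is not NUL-terminated.
   Returns 0 on success, -1 if a non-octal character is found. */
int parse_octal(const char *p, size_t len, uint64_t *out)
{
    uint64_t v = 0;

    assert(len <= 21);          /* 21 octal digits fit in 63 bits */
    for (size_t i = 0; i < len; i++) {
        if (p[i] < '0' || p[i] > '7')
            return -1;
        v = (v << 3) | (uint64_t)(p[i] - '0');
    }
    *out = v;
    return 0;
}

/* Decode selected fields of a 76-byte odc header; the byte offsets
   follow struct cpio_odc_header.  Returns 0 on success, -1 on a bad
   magic string or a malformed field. */
int parse_odc_header(const char *h, struct odc_entry *e)
{
    if (memcmp(h, "070707", 6) != 0)
        return -1;
    if (parse_octal(h + 18, 6, &e->mode) != 0 ||      /* c_mode */
        parse_octal(h + 48, 11, &e->mtime) != 0 ||    /* c_mtime */
        parse_octal(h + 59, 6, &e->namesize) != 0 ||  /* c_namesize */
        parse_octal(h + 65, 11, &e->filesize) != 0)   /* c_filesize */
        return -1;
    return 0;
}

/* Bytes occupied by the whole entry.  odc adds no padding, so the
   next header starts right after the file data. */
uint64_t odc_entry_size(const struct odc_entry *e)
{
    return ODC_HEADER_SIZE + e->namesize + e->filesize;
}
```
.Ed
.Pp
For example, a
.Va c_namesize
field of
.Dq 000012
decodes to 10 (a nine-character pathname plus its NUL).
An entry with that name and five bytes of data therefore occupies
76 + 10 + 5 = 91 bytes.
.Pp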
Unlike the binary formats, there is no additional padding
after the pathname or file contents.
If the files being archived are themselves entirely ASCII, then
the resulting archive will be entirely ASCII, except for the
NUL byte that terminates the name field.
.Ss New ASCII Format
The
.Dq new
ASCII format uses 8-byte hexadecimal fields for
all numbers and splits each device number into separate
major and minor fields.
.Pp
.Bd -literal -offset indent
struct cpio_newc_header {
        char    c_magic[6];
        char    c_ino[8];
        char    c_mode[8];
        char    c_uid[8];
        char    c_gid[8];
        char    c_nlink[8];
        char    c_mtime[8];
        char    c_filesize[8];
        char    c_devmajor[8];
        char    c_devminor[8];
        char    c_rdevmajor[8];
        char    c_rdevminor[8];
        char    c_namesize[8];
        char    c_check[8];
};
.Ed
.Pp
Except as specified below, the fields here match those specified
for the new binary format above.
.Bl -tag -width indent
.It Va c_magic
The string
.Dq 070701 .
.It Va c_check
This field is always set to zero by writers and ignored by readers.
See the next section for more details.
.El
.Pp
The pathname is followed by NUL bytes so that the total size
of the fixed header (110 bytes) plus pathname is a multiple of four.
Likewise, the file data is padded to a multiple of four bytes.
Note that this format supports only 4 gigabyte files (unlike the
older ASCII format, which supports 8 gigabyte files).
.Pp
In this format, hardlinked files are handled by setting the
.Va c_filesize
field to zero for each entry except the first one that
appears in the archive.
.Ss New CRC Format
The CRC format is identical to the new ASCII format described
in the previous section except that the
.Va c_magic
field is set to
.Dq 070702
and the
.Va c_check
field is set to the sum of all bytes in the file data.
This sum is computed treating all bytes as unsigned values
and using unsigned arithmetic.
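.Pp
The padding and checksum rules for the new ASCII and CRC formats
reduce to a little arithmetic, sketched below in C.
The sketch assumes
.Va c_namesize
and
.Va c_filesize
have already been decoded from hexadecimal.
The fixed header is 110 bytes: a 6-byte magic field plus thirteen
8-byte fields.
The macro and function names are hypothetical, not taken from any
existing implementation.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed newc/crc header: 6-byte c_magic plus thirteen 8-byte fields. */
#define NEWC_HEADER_SIZE 110

/* Round n up to the next multiple of four. */
uint64_t pad4(uint64_t n)
{
    return (n + 3) & ~(uint64_t)3;
}

/* Offset of the file data from the start of the header: the header
   plus the pathname (c_namesize counts its NUL), padded to four. */
uint64_t newc_data_offset(uint64_t namesize)
{
    return pad4(NEWC_HEADER_SIZE + namesize);
}

/* Total bytes taken by one entry, including both padding runs. */
uint64_t newc_entry_size(uint64_t namesize, uint64_t filesize)
{
    return newc_data_offset(namesize) + pad4(filesize);
}

/* Value for c_check in the crc format: a plain sum of the file data
   bytes as unsigned values.  uint32_t arithmetic wraps modulo 2^32,
   so only the low 32 bits of the sum are kept. */
uint32_t newc_checksum(const unsigned char *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        sum += data[i];
    return sum;
}
```
.Ed
.Pp
Take the pathname
.Dq hello.txt ,
which has a
.Va c_namesize
of 10.
Its data starts at offset 120, and with five bytes of data the whole
entry occupies 128 bytes.
.Pp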
Only the least-significant 32 bits of the sum are stored.
.Ss HP Variants
The
.Nm cpio
implementation distributed with HP-UX used XXXX but stored
device numbers differently XXX.
.Ss Other Extensions and Variants
Sun Solaris uses additional file types to store extended file
data, including ACLs and extended attributes, as special
entries in cpio archives.
.Pp
XXX Others? XXX
.Sh SEE ALSO
.Xr cpio 1 ,
.Xr tar 5
.Sh STANDARDS
The
.Nm cpio
utility is no longer a part of POSIX or the Single UNIX Specification.
It last appeared in
.St -susv2 .
It has been supplanted in subsequent standards by
.Xr pax 1 .
The portable ASCII format is currently part of the specification for the
.Xr pax 1
utility.
.Sh HISTORY
The original cpio utility was written by Dick Haight
while working in AT&T's Unix Support Group.
It appeared in 1977 as part of PWB/UNIX 1.0, the
.Dq Programmer's Work Bench
derived from
.At v6
that was used internally at AT&T.
Both the new binary and old character formats were in use
by 1980, according to the System III source released
by SCO under their
.Dq Ancient Unix
license.
The character format was adopted as part of
.St -p1003.1-88 .
XXX when did "newc" appear? Who invented it? When did HP come out with their variant? When did Sun introduce ACLs and extended attributes? XXX
.Sh BUGS
The
.Dq CRC
format is misnamed, as it uses a simple checksum and
not a cyclic redundancy check.
.Pp
The binary formats are limited to 16 bits for user id, group id,
device, and inode numbers.
They are limited to 16 megabyte and 2 gigabyte file sizes for the
older and newer variants, respectively.
.Pp
The old ASCII format is limited to 18 bits for
the user id, group id, device, and inode numbers.
It is limited to 8 gigabyte file sizes.
.Pp
The new ASCII format is limited to 4 gigabyte file sizes.
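.Pp
These ASCII limits follow directly from the field widths.
Each octal digit carries 3 bits and each hexadecimal digit 4 bits.
The 6-character octal fields of the old ASCII format therefore hold
18 bits, and its 11-character octal
.Va c_filesize
field holds 33 bits (just under 8 gigabytes).
The 8-character hexadecimal fields of the new ASCII format hold
32 bits (just under 4 gigabytes).
The hypothetical helper below computes the largest value such a field
can hold.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdint.h>

/* Largest value a fixed-width ASCII cpio field can hold.
   A field of digits characters, each carrying bits_per_digit bits
   (3 for octal, 4 for hexadecimal), holds digits * bits_per_digit
   bits, so its maximum is 2^bits - 1. */
uint64_t field_max(unsigned digits, unsigned bits_per_digit)
{
    unsigned bits = digits * bits_per_digit;

    assert(bits < 64);      /* keep the shift below well defined */
    return ((uint64_t)1 << bits) - 1;
}
```
.Ed
.Pp
For example,
.Fn field_max 6 3
is 262143,
.Fn field_max 11 3
is 8589934591, and
.Fn field_max 8 4
is 4294967295.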
.Pp
None of the cpio formats store user or group names,
which are essential when moving files between systems with
dissimilar user or group numbering.
.Pp
Especially when writing older cpio variants, it may be necessary
to map actual device/inode values to synthesized values that
fit the available fields.
With very large filesystems, this may be necessary even for
the newer formats.
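.Pp
One way to perform such a mapping is sketched below in C, as an
illustration only.
Each distinct real device/inode pair is assigned the next small
integer, so hard links to the same file still share a value while
every other entry receives a distinct one.
The structure and function names are hypothetical.
The sketch synthesizes only a single inode-style number; a writer
might pair it with a fixed device number.
Its linear search trades speed on large archives for brevity.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table mapping real (dev, ino) pairs to small
   synthesized numbers that fit a narrow cpio header field. */
struct ino_map {
    struct { uint64_t dev, ino; uint32_t synth; } *v;
    size_t len, cap;
    uint32_t next;      /* next synthesized number to hand out */
    uint32_t limit;     /* largest value the target field can hold */
};

void ino_map_init(struct ino_map *m, uint32_t limit)
{
    assert(limit > 0);
    m->v = NULL;
    m->len = m->cap = 0;
    m->next = 1;        /* 0 is reserved to signal failure */
    m->limit = limit;
}

void ino_map_free(struct ino_map *m)
{
    free(m->v);
    m->v = NULL;
    m->len = m->cap = 0;
}

/* Return the synthesized number for (dev, ino), assigning a new one
   the first time a pair is seen so hard links share a value.
   Returns 0 if the field's range is exhausted or memory runs out. */
uint32_t ino_map_get(struct ino_map *m, uint64_t dev, uint64_t ino)
{
    for (size_t i = 0; i < m->len; i++)     /* linear search */
        if (m->v[i].dev == dev && m->v[i].ino == ino)
            return m->v[i].synth;
    if (m->next == 0 || m->next > m->limit)
        return 0;
    if (m->len == m->cap) {
        size_t cap = m->cap ? m->cap * 2 : 16;
        void *p = realloc(m->v, cap * sizeof *m->v);
        if (p == NULL)
            return 0;
        m->v = p;
        m->cap = cap;
    }
    m->v[m->len].dev = dev;
    m->v[m->len].ino = ino;
    m->v[m->len].synth = m->next++;
    return m->v[m->len++].synth;
}
```
.Ed
.Pp
A writer would pass the largest value the target field can hold as the
limit.
This is 65535 for the 16-bit binary formats or 262143 for the 18-bit
old ASCII fields.
The writer would then report an error when
.Fn ino_map_get
returns 0.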