.\" Copyright (c) 2007 Tim Kientzle
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd December 23, 2011
.Dt CPIO 5
.Os
.Sh NAME
.Nm cpio
.Nd format of cpio archive files
.Sh DESCRIPTION
The
.Nm
archive format collects any number of files, directories, and other
file system objects (symbolic links, device nodes, etc.) into a single
stream of bytes.
.Ss General Format
Each file system object in a
.Nm
archive comprises a header record with basic numeric metadata
followed by the full pathname of the entry and the file data.
The header record stores a series of integer values that generally
follow the fields in
.Va struct stat .
(See
.Xr stat 2
for details.)
The variants differ primarily in how they store those integers
(binary, octal, or hexadecimal).
The header is followed by the pathname of the
entry (the length of the pathname is stored in the header)
and any file data.
The end of the archive is indicated by a special record with
the pathname
.Dq TRAILER!!! .
.Ss PWB Format
XXX Any documentation of the original PWB/UNIX 1.0 format?
XXX
.Ss Old Binary Format
The old binary
.Nm
format stores numbers as 2-byte and 4-byte binary values.
Each entry begins with a header in the following format:
.Bd -literal -offset indent
struct header_old_cpio {
        unsigned short   c_magic;
        unsigned short   c_dev;
        unsigned short   c_ino;
        unsigned short   c_mode;
        unsigned short   c_uid;
        unsigned short   c_gid;
        unsigned short   c_nlink;
        unsigned short   c_rdev;
        unsigned short   c_mtime[2];
        unsigned short   c_namesize;
        unsigned short   c_filesize[2];
};
.Ed
.Pp
The
.Va unsigned short
fields here are 16-bit integer values; the two-element arrays
.Va c_mtime
and
.Va c_filesize
each hold one 32-bit integer value.
The fields are as follows:
.Bl -tag -width indent
.It Va magic
The integer value octal 070707.
This value can be used to determine whether this archive is
written with little-endian or big-endian integers.
.It Va dev , Va ino
The device and inode numbers from the disk.
These are used by programs that read
.Nm
archives to determine when two entries refer to the same file.
Programs that synthesize
.Nm
archives should be careful to set these to distinct values for each entry.
.It Va mode
The mode specifies both the regular permissions and the file type.
It consists of several bit fields as follows:
.Bl -tag -width "MMMMMMM" -compact
.It 0170000
This masks the file type bits.
.It 0140000
File type value for sockets.
.It 0120000
File type value for symbolic links.
For symbolic links, the link body is stored as file data.
.It 0100000
File type value for regular files.
.It 0060000
File type value for block special devices.
.It 0040000
File type value for directories.
.It 0020000
File type value for character special devices.
.It 0010000
File type value for named pipes or FIFOs.
.It 0004000
SUID bit.
.It 0002000
SGID bit.
.It 0001000
Sticky bit.
On some systems, this modifies the behavior of executables and/or directories.
.It 0000777
The lower 9 bits specify read/write/execute permissions
for world, group, and user following standard POSIX conventions.
.El
.It Va uid , Va gid
The numeric user id and group id of the owner.
.It Va nlink
The number of links to this file.
Directories always have a value of at least two here.
Note that hardlinked files include file data with every copy in the archive.
.It Va rdev
For block special and character special entries,
this field contains the associated device number.
For all other entry types, it should be set to zero by writers
and ignored by readers.
.It Va mtime
Modification time of the file, indicated as the number
of seconds since the start of the epoch,
00:00:00 UTC January 1, 1970.
The four-byte integer is stored with the most-significant 16 bits first
followed by the least-significant 16 bits.
Each of the two 16-bit values is stored in machine-native byte order.
.It Va namesize
The number of bytes in the pathname that follows the header.
This count includes the trailing NUL byte.
.It Va filesize
The size of the file.
Note that this archive format is limited to
four gigabyte file sizes.
See
.Va mtime
above for a description of the storage of four-byte integers.
.El
.Pp
The pathname immediately follows the fixed header.
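.Pp
As a minimal sketch of the field rules above, and not the code of any
particular implementation, the following C fragment detects the archive
byte order from
.Va c_magic
and reassembles a four-byte value such as
.Va mtime
or
.Va filesize
from its two 16-bit halves.
The helper names
.Fn cpio_bin_u16 ,
.Fn cpio_bin_byte_order ,
and
.Fn cpio_bin_u32
are hypothetical and exist only for this illustration.
.Bd -literal -offset indent
```c
#include <stdint.h>

/* Magic number of the old binary format, octal 070707. */
#define CPIO_BIN_MAGIC 070707u

/*
 * Hypothetical helper: decode one 16-bit header field from two raw bytes.
 * big_endian is nonzero when the archive was written big-endian.
 */
static uint16_t
cpio_bin_u16(const unsigned char *p, int big_endian)
{
        if (big_endian)
                return (uint16_t)((p[0] << 8) | p[1]);
        return (uint16_t)((p[1] << 8) | p[0]);
}

/*
 * Determine the archive byte order from the first two header bytes.
 * Returns 0 for little-endian, 1 for big-endian, -1 if neither
 * interpretation yields octal 070707.
 */
static int
cpio_bin_byte_order(const unsigned char *hdr)
{
        if (cpio_bin_u16(hdr, 0) == CPIO_BIN_MAGIC)
                return 0;
        if (cpio_bin_u16(hdr, 1) == CPIO_BIN_MAGIC)
                return 1;
        return -1;
}

/*
 * Reassemble a four-byte value: the most-significant 16 bits come
 * first, and each half uses the archive byte order.
 */
static uint32_t
cpio_bin_u32(const unsigned char *p, int big_endian)
{
        uint32_t hi = cpio_bin_u16(p, big_endian);
        uint32_t lo = cpio_bin_u16(p + 2, big_endian);

        return (hi << 16) | lo;
}
```
.Ed
.Pp
A reader would typically call
.Fn cpio_bin_byte_order
once per header and pass its result to the other two helpers when
decoding the remaining fields.
.Pp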
If the
.Va namesize
is odd, an additional NUL byte is added after the pathname.
The file data is then appended, padded with NUL
bytes to an even length.
.Pp
Hardlinked files are not given special treatment;
the full file contents are included with each copy of the
file.
.Ss Portable ASCII Format
.St -susv2
standardized an ASCII variant that is portable across all
platforms.
It is commonly known as the
.Dq old character
format or as the
.Dq odc
format.
It stores the same numeric fields as the old binary format, but
represents them as 6-character or 11-character octal values.
.Bd -literal -offset indent
struct cpio_odc_header {
        char    c_magic[6];
        char    c_dev[6];
        char    c_ino[6];
        char    c_mode[6];
        char    c_uid[6];
        char    c_gid[6];
        char    c_nlink[6];
        char    c_rdev[6];
        char    c_mtime[11];
        char    c_namesize[6];
        char    c_filesize[11];
};
.Ed
.Pp
The fields are identical to those in the old binary format.
The name and file body follow the fixed header.
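.Pp
The fixed-width octal fields of this header can be decoded with a few
lines of C.
The fragment below is only a sketch under that reading of the layout;
the function name
.Fn cpio_odc_field
is hypothetical, and the choice to reject any character outside
.Sq 0
through
.Sq 7
is an assumption of this example rather than a documented requirement.
.Bd -literal -offset indent
```c
#include <stddef.h>

/*
 * Hypothetical helper: parse a fixed-width octal field of len
 * characters (6 or 11 in the odc header).
 * Returns 0 and stores the value in *out on success, or -1 if the
 * field contains a character that is not an octal digit.
 */
static int
cpio_odc_field(const char *p, size_t len, unsigned long long *out)
{
        unsigned long long v = 0;
        size_t i;

        for (i = 0; i < len; i++) {
                if (p[i] < '0' || p[i] > '7')
                        return -1;
                v = (v << 3) | (unsigned long long)(p[i] - '0');
        }
        *out = v;
        return 0;
}
```
.Ed
.Pp
A 6-character field holds at most 18 bits and an 11-character field at
most 33 bits, which is where the size limits of this format come from.
.Pp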
Unlike the old binary format, there is no additional padding
after the pathname or file contents.
If the files being archived are themselves entirely ASCII, then
the resulting archive will be entirely ASCII, except for the
NUL byte that terminates the name field.
.Ss New ASCII Format
The "new" ASCII format uses 8-byte hexadecimal fields for
all numbers and splits device numbers into separate fields
for major and minor numbers.
.Bd -literal -offset indent
struct cpio_newc_header {
        char    c_magic[6];
        char    c_ino[8];
        char    c_mode[8];
        char    c_uid[8];
        char    c_gid[8];
        char    c_nlink[8];
        char    c_mtime[8];
        char    c_filesize[8];
        char    c_devmajor[8];
        char    c_devminor[8];
        char    c_rdevmajor[8];
        char    c_rdevminor[8];
        char    c_namesize[8];
        char    c_check[8];
};
.Ed
.Pp
Except as specified below, the fields here match those specified
for the old binary format above.
.Bl -tag -width indent
.It Va magic
The string
.Dq 070701 .
.It Va check
This field is always set to zero by writers and ignored by readers.
See the next section for more details.
.El
.Pp
The pathname is followed by NUL bytes so that the total size
of the fixed header plus pathname is a multiple of four.
Likewise, the file data is padded to a multiple of four bytes.
Note that this format supports only 4 gigabyte files (unlike the
older ASCII format, which supports 8 gigabyte files).
.Pp
In this format, hardlinked files are handled by setting the
filesize to zero for each entry except the last one that
appears in the archive.
.Ss New CRC Format
The CRC format is identical to the new ASCII format described
in the previous section except that the magic field is set
to
.Dq 070702
and the
.Va check
field is set to the sum of all bytes in the file data.
This sum is computed treating all bytes as unsigned values
and using unsigned arithmetic.
Only the least-significant 32 bits of the sum are stored.
.Ss HP Variants
The
.Nm cpio
implementation distributed with HP-UX used XXXX but stored
device numbers differently XXX.
.Ss Other Extensions and Variants
Sun Solaris uses additional file types to store extended file
data, including ACLs and extended attributes, as special
entries in cpio archives.
.Pp
XXX Others? XXX
.Sh SEE ALSO
.Xr cpio 1 ,
.Xr tar 5
.Sh STANDARDS
The
.Nm cpio
utility is no longer a part of POSIX or the Single UNIX Specification.
It last appeared in
.St -susv2 .
It has been supplanted in subsequent standards by
.Xr pax 1 .
The portable ASCII format is currently part of the specification for the
.Xr pax 1
utility.
.Sh HISTORY
The original cpio utility was written by Dick Haight
while working in AT&T's Unix Support Group.
It appeared in 1977 as part of PWB/UNIX 1.0, the
.Dq Programmer's Work Bench
derived from
.At v6
that was used internally at AT&T.
Both the old binary and old character formats were in use
by 1980, according to the System III source released
by SCO under their
.Dq Ancient Unix
license.
The character format was adopted as part of
.St -p1003.1-88 .
XXX when did "newc" appear? Who invented it? When did HP come out with their variant? When did Sun introduce ACLs and extended attributes?
XXX
.Sh BUGS
The
.Dq CRC
format is misnamed, as it uses a simple checksum and
not a cyclic redundancy check.
.Pp
The old binary format is limited to 16 bits for user id,
group id, device, and inode numbers.
It is limited to 4 gigabyte file sizes.
.Pp
The old ASCII format is limited to 18 bits for
the user id, group id, device, and inode numbers.
It is limited to 8 gigabyte file sizes.
.Pp
The new ASCII format is limited to 4 gigabyte file sizes.
.Pp
None of the cpio formats store user or group names,
which are essential when moving files between systems with
dissimilar user or group numbering.
.Pp
Especially when writing older cpio variants, it may be necessary
to map actual device/inode values to synthesized values that
fit the available fields.
With very large filesystems, this may be necessary even for
the newer formats.
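.Sh EXAMPLES
The padding rule of the new ASCII format and the
.Va check
computation of the CRC format, both described in
.Sx DESCRIPTION ,
can be sketched in C as follows.
This is an illustration under stated assumptions, not the code of any
particular implementation: the function names and the
.Dv CPIO_NEWC_HEADER_SIZE
constant are hypothetical, and the constant is simply the sum of the
field widths in
.Va struct cpio_newc_header
(6 + 13 * 8 = 110 bytes).
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>

/* Assumed size of struct cpio_newc_header: 6 + 13 * 8 bytes. */
#define CPIO_NEWC_HEADER_SIZE 110u

/*
 * Number of NUL bytes after the pathname so that the fixed header
 * plus pathname is a multiple of four.  namesize includes the
 * trailing NUL byte of the pathname.
 */
static size_t
cpio_newc_name_pad(size_t namesize)
{
        return (4 - (CPIO_NEWC_HEADER_SIZE + namesize) % 4) % 4;
}

/* Number of NUL bytes after the file data to reach a multiple of four. */
static size_t
cpio_newc_data_pad(size_t filesize)
{
        return (4 - filesize % 4) % 4;
}

/*
 * check value of the CRC format: the sum of all data bytes taken as
 * unsigned values, keeping only the least-significant 32 bits.
 */
static uint32_t
cpio_crc_check(const unsigned char *data, size_t len)
{
        uint32_t sum = 0;
        size_t i;

        for (i = 0; i < len; i++)
                sum += data[i];
        return sum;
}
```
.Ed
.Pp
For the end-of-archive record, whose pathname
.Dq TRAILER!!!
has a
.Va namesize
of 11, the first helper yields three padding bytes, so that record
occupies 124 bytes in total.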