.\" Copyright (c) 2007 Tim Kientzle
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd December 23, 2011
.Dt CPIO 5
.Os
.Sh NAME
.Nm cpio
.Nd format of cpio archive files
.Sh DESCRIPTION
The
.Nm
archive format collects any number of files, directories, and other
file system objects (symbolic links, device nodes, etc.) into a single
stream of bytes.
.Ss General Format
Each file system object in a
.Nm
archive comprises a header record with basic numeric metadata
followed by the full pathname of the entry and the file data.
The header record stores a series of integer values that generally
follow the fields in
.Va struct stat .
(See
.Xr stat 2
for details.)
The variants differ primarily in how they store those integers
(binary, octal, or hexadecimal).
The header is followed by the pathname of the
entry (the length of the pathname is stored in the header)
and any file data.
The end of the archive is indicated by a special record with
the pathname
.Dq TRAILER!!! .
.Ss PWB format
XXX Any documentation of the original PWB/UNIX 1.0 format? XXX
.Ss Old Binary Format
The old binary
.Nm
format stores numbers as 2-byte and 4-byte binary values.
Each entry begins with a header in the following format:
.Bd -literal -offset indent
struct header_old_cpio {
        unsigned short   c_magic;
        unsigned short   c_dev;
        unsigned short   c_ino;
        unsigned short   c_mode;
        unsigned short   c_uid;
        unsigned short   c_gid;
        unsigned short   c_nlink;
        unsigned short   c_rdev;
        unsigned short   c_mtime[2];
        unsigned short   c_namesize;
        unsigned short   c_filesize[2];
};
.Ed
.Pp
The
.Va unsigned short
fields here are 16-bit integer values; each 32-bit integer value
is stored as an array of two
.Va unsigned short
fields.
The fields are as follows:
.Bl -tag -width indent
.It Va magic
The integer value octal 070707.
This value can be used to determine whether this archive is
written with little-endian or big-endian integers.
.It Va dev , Va ino
The device and inode numbers from the disk.
These are used by programs that read
.Nm
archives to determine when two entries refer to the same file.
Programs that synthesize
.Nm
archives should be careful to set these to distinct values for each entry.
.It Va mode
The mode specifies both the regular permissions and the file type.
It consists of several bit fields as follows:
.Bl -tag -width "MMMMMMM" -compact
.It 0170000
This masks the file type bits.
.It 0140000
File type value for sockets.
.It 0120000
File type value for symbolic links.
For symbolic links, the link body is stored as file data.
.It 0100000
File type value for regular files.
.It 0060000
File type value for block special devices.
.It 0040000
File type value for directories.
.It 0020000
File type value for character special devices.
.It 0010000
File type value for named pipes or FIFOs.
.It 0004000
SUID bit.
.It 0002000
SGID bit.
.It 0001000
Sticky bit.
On some systems, this modifies the behavior of executables and/or directories.
.It 0000777
The lower 9 bits specify read/write/execute permissions
for world, group, and user following standard POSIX conventions.
.El
.It Va uid , Va gid
The numeric user id and group id of the owner.
.It Va nlink
The number of links to this file.
Directories always have a value of at least two here.
Note that hardlinked files include file data with every copy in the archive.
.It Va rdev
For block special and character special entries,
this field contains the associated device number.
For all other entry types, it should be set to zero by writers
and ignored by readers.
.It Va mtime
Modification time of the file, indicated as the number
of seconds since the start of the epoch,
00:00:00 UTC January 1, 1970.
The four-byte integer is stored with the most-significant 16 bits first
followed by the least-significant 16 bits.
Each of the two 16-bit values is stored in machine-native byte order.
.It Va namesize
The number of bytes in the pathname that follows the header.
This count includes the trailing NUL byte.
.It Va filesize
The size of the file.
Note that this archive format is limited to
four gigabyte file sizes.
See
.Va mtime
above for a description of the storage of four-byte integers.
.El
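.Pp
The byte-order test on
.Va magic
and the storage of four-byte integers described above can be sketched as follows.
This is a minimal illustration, not code from any particular
.Nm
implementation; the function names are hypothetical, and the sketch assumes
the caller has the raw header bytes in memory.
.Bd -literal -offset indent
```c
#include <stdint.h>

/* Hypothetical helpers for illustration only. */

/*
 * Octal 070707 is 0x71C7.  Returns 1 if the two magic bytes are stored
 * little-endian, 0 if big-endian, and -1 if they are not the magic value.
 */
int
cpio_bin_is_little_endian(const unsigned char *p)
{
        if (p[0] == 0xC7 && p[1] == 0x71)
                return 1;
        if (p[0] == 0x71 && p[1] == 0xC7)
                return 0;
        return -1;
}

/* Decode one 16-bit field in the byte order detected from the magic. */
uint16_t
cpio_bin_u16(const unsigned char *p, int little_endian)
{
        if (little_endian)
                return (uint16_t)(p[0] | (p[1] << 8));
        return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Decode a four-byte field (mtime or filesize): the most-significant
 * 16 bits come first, and each half uses the archive's byte order.
 */
uint32_t
cpio_bin_u32(const unsigned char *p, int little_endian)
{
        uint32_t hi = cpio_bin_u16(p, little_endian);
        uint32_t lo = cpio_bin_u16(p + 2, little_endian);

        return (hi << 16) | lo;
}
```
.Ed
.Pp
Note that on a little-endian archive the four bytes of a 32-bit value
appear in neither pure little-endian nor pure big-endian order,
which is why the two halves are decoded separately.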
.Pp
The pathname immediately follows the fixed header.
If the
.Va namesize
is odd, an additional NUL byte is added after the pathname.
The file data is then appended, padded with NUL
bytes to an even length.
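.Pp
The padding rule above determines where the file data and the next entry begin.
The following sketch computes those offsets; the names are hypothetical,
and the 26-byte header size is derived from the thirteen 16-bit words in
.Va struct header_old_cpio
shown earlier.
.Bd -literal -offset indent
```c
#include <stdint.h>

/* Thirteen 16-bit words in struct header_old_cpio. */
#define CPIO_BIN_HEADER_SIZE 26u

/* Round n up to the next even value. */
uint64_t
cpio_bin_pad2(uint64_t n)
{
        return n + (n & 1u);
}

/* Offset of the file data, measured from the start of the entry. */
uint64_t
cpio_bin_data_offset(uint16_t namesize)
{
        return CPIO_BIN_HEADER_SIZE + cpio_bin_pad2(namesize);
}

/* Total size of the entry, which is also the offset of the next header. */
uint64_t
cpio_bin_entry_size(uint16_t namesize, uint32_t filesize)
{
        return cpio_bin_data_offset(namesize) + cpio_bin_pad2(filesize);
}
```
.Ed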
.Pp
Hardlinked files are not given special treatment;
the full file contents are included with each copy of the
file.
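.Pp
The bit fields listed under
.Va mode
above can be tested with simple masks.
The sketch below is illustrative only: the helper names are hypothetical,
and the single-character type codes are an assumption chosen to resemble
.Xr ls 1
output rather than anything defined by the format.
.Bd -literal -offset indent
```c
/* Hypothetical helpers for illustration only. */
#define CPIO_MODE_TYPE_MASK 0170000u

/* Map the file type bits of a cpio mode value to a one-character code. */
char
cpio_mode_type_char(unsigned int mode)
{
        switch (mode & CPIO_MODE_TYPE_MASK) {
        case 0140000u: return 's';      /* socket */
        case 0120000u: return 'l';      /* symbolic link */
        case 0100000u: return '-';      /* regular file */
        case 0060000u: return 'b';      /* block special device */
        case 0040000u: return 'd';      /* directory */
        case 0020000u: return 'c';      /* character special device */
        case 0010000u: return 'p';      /* named pipe or FIFO */
        default:       return '?';      /* not one of the listed types */
        }
}

/* The lower 9 bits hold the read/write/execute permissions. */
unsigned int
cpio_mode_perms(unsigned int mode)
{
        return mode & 0777u;
}

/* Nonzero if the SUID bit is set. */
int
cpio_mode_is_setuid(unsigned int mode)
{
        return (mode & 04000u) != 0;
}
```
.Ed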
.Ss Portable ASCII Format
.St -susv2
standardized an ASCII variant that is portable across all
platforms.
It is commonly known as the
.Dq old character
format or as the
.Dq odc
format.
It stores the same numeric fields as the old binary format, but
represents them as 6-character or 11-character octal values.
.Bd -literal -offset indent
struct cpio_odc_header {
        char    c_magic[6];
        char    c_dev[6];
        char    c_ino[6];
        char    c_mode[6];
        char    c_uid[6];
        char    c_gid[6];
        char    c_nlink[6];
        char    c_rdev[6];
        char    c_mtime[11];
        char    c_namesize[6];
        char    c_filesize[11];
};
.Ed
.Pp
The fields are identical to those in the old binary format.
The name and file body follow the fixed header.
Unlike the old binary format, there is no additional padding
after the pathname or file contents.
If the files being archived are themselves entirely ASCII, then
the resulting archive will be entirely ASCII, except for the
NUL byte that terminates the name field.
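.Pp
Decoding the fixed-width octal fields of this format might look like the
following sketch.
The function name is hypothetical, and the parser is deliberately strict:
it assumes every character of a field is an octal digit, which may be
stricter than some existing readers.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>

/* Eight 6-character fields, one more 6-character field, two of 11. */
#define CPIO_ODC_HEADER_SIZE 76

/*
 * Parse an n-character octal field (n is 6 or 11 in this format).
 * Returns 0 and stores the value on success, or -1 if any character
 * is not an octal digit.
 */
int
cpio_odc_field(const char *p, size_t n, uint64_t *out)
{
        uint64_t v = 0;
        size_t i;

        for (i = 0; i < n; i++) {
                if (p[i] < '0' || p[i] > '7')
                        return -1;
                v = (v << 3) | (uint64_t)(p[i] - '0');
        }
        *out = v;
        return 0;
}
```
.Ed
.Pp
An 11-character field holds 33 bits, so the largest representable
file size is one byte short of 8 gigabytes.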
.Ss New ASCII Format
The "new" ASCII format uses 8-byte hexadecimal fields for
all numbers and separates device numbers into separate fields
for major and minor numbers.
.Bd -literal -offset indent
struct cpio_newc_header {
        char    c_magic[6];
        char    c_ino[8];
        char    c_mode[8];
        char    c_uid[8];
        char    c_gid[8];
        char    c_nlink[8];
        char    c_mtime[8];
        char    c_filesize[8];
        char    c_devmajor[8];
        char    c_devminor[8];
        char    c_rdevmajor[8];
        char    c_rdevminor[8];
        char    c_namesize[8];
        char    c_check[8];
};
.Ed
.Pp
Except as specified below, the fields here match those specified
for the old binary format above.
.Bl -tag -width indent
.It Va magic
The string
.Dq 070701 .
.It Va check
This field is always set to zero by writers and ignored by readers.
See the next section for more details.
.El
.Pp
The pathname is followed by NUL bytes so that the total size
of the fixed header plus pathname is a multiple of four.
Likewise, the file data is padded to a multiple of four bytes.
Note that this format supports only 4 gigabyte files (unlike the
older ASCII format, which supports 8 gigabyte files).
.Pp
In this format, hardlinked files are handled by setting the
filesize to zero for each entry except the last one that
appears in the archive.
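.Pp
The hexadecimal fields and four-byte alignment of this format can be
handled as in the sketch below.
The names are hypothetical; the 110-byte header size follows from the
structure above (a 6-byte magic plus thirteen 8-byte fields), and
accepting both upper- and lower-case hex digits is an assumption.
.Bd -literal -offset indent
```c
#include <stdint.h>

/* 6-byte magic plus thirteen 8-byte hexadecimal fields. */
#define CPIO_NEWC_HEADER_SIZE 110u

/* Parse one 8-character hex field; returns 0 on success, -1 on error. */
int
cpio_newc_field(const char *p, uint32_t *out)
{
        uint32_t v = 0;
        int i;

        for (i = 0; i < 8; i++) {
                char c = p[i];

                v <<= 4;
                if (c >= '0' && c <= '9')
                        v |= (uint32_t)(c - '0');
                else if (c >= 'a' && c <= 'f')
                        v |= (uint32_t)(c - 'a' + 10);
                else if (c >= 'A' && c <= 'F')
                        v |= (uint32_t)(c - 'A' + 10);
                else
                        return -1;
        }
        *out = v;
        return 0;
}

/* Bytes occupied by the pathname, including its NUL padding. */
uint64_t
cpio_newc_name_padded(uint32_t namesize)
{
        uint64_t end = (uint64_t)CPIO_NEWC_HEADER_SIZE + namesize;

        return ((end + 3u) & ~(uint64_t)3u) - CPIO_NEWC_HEADER_SIZE;
}

/* Bytes occupied by the file data, including its NUL padding. */
uint64_t
cpio_newc_data_padded(uint32_t filesize)
{
        return ((uint64_t)filesize + 3u) & ~(uint64_t)3u;
}
```
.Ed
.Pp
Because the header is 110 bytes long, the pathname padding depends on
the header size as well as on
.Va namesize ;
rounding
.Va namesize
alone to a multiple of four gives the wrong result.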
.Ss New CRC Format
The CRC format is identical to the new ASCII format described
in the previous section except that the magic field is set
to
.Dq 070702
and the
.Va check
field is set to the sum of all bytes in the file data.
This sum is computed treating all bytes as unsigned values
and using unsigned arithmetic.
Only the least-significant 32 bits of the sum are stored.
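.Pp
The checksum described above can be computed incrementally as file data
is read.
The following is a minimal sketch with a hypothetical function name;
it relies on the wraparound of unsigned 32-bit arithmetic to keep only
the least-significant 32 bits.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <stdint.h>

/*
 * Add the bytes of buf to a running sum.  Start with sum = 0 and call
 * once per chunk of file data; uint32_t arithmetic wraps modulo 2^32.
 */
uint32_t
cpio_crc_sum(uint32_t sum, const void *buf, size_t len)
{
        const unsigned char *p = buf;   /* bytes treated as unsigned */

        while (len-- > 0)
                sum += *p++;
        return sum;
}
```
.Ed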
.Ss HP variants
The
.Nm cpio
implementation distributed with HPUX used XXXX but stored
device numbers differently XXX.
.Ss Other Extensions and Variants
Sun Solaris uses additional file types to store extended file
data, including ACLs and extended attributes, as special
entries in cpio archives.
.Pp
XXX Others? XXX
.Sh SEE ALSO
.Xr cpio 1 ,
.Xr tar 5
.Sh STANDARDS
The
.Nm cpio
utility is no longer a part of POSIX or the Single UNIX Specification.
It last appeared in
.St -susv2 .
It has been supplanted in subsequent standards by
.Xr pax 1 .
The portable ASCII format is currently part of the specification for the
.Xr pax 1
utility.
.Sh HISTORY
The original cpio utility was written by Dick Haight
while working in AT&T's Unix Support Group.
It appeared in 1977 as part of PWB/UNIX 1.0, the
.Dq Programmer's Work Bench
derived from
.At v6
that was used internally at AT&T.
Both the old binary and old character formats were in use
by 1980, according to the System III source released
by SCO under their
.Dq Ancient Unix
license.
The character format was adopted as part of
.St -p1003.1-88 .
XXX when did "newc" appear?  Who invented it?  When did HP come out with their variant?  When did Sun introduce ACLs and extended attributes? XXX
.Sh BUGS
The
.Dq CRC
format is mis-named, as it uses a simple checksum and
not a cyclic redundancy check.
.Pp
The old binary format is limited to 16 bits for user id,
group id, device, and inode numbers.
It is limited to 4 gigabyte file sizes.
.Pp
The old ASCII format is limited to 18 bits for
the user id, group id, device, and inode numbers.
It is limited to 8 gigabyte file sizes.
.Pp
The new ASCII format is limited to 4 gigabyte file sizes.
.Pp
None of the cpio formats store user or group names,
which are essential when moving files between systems with
dissimilar user or group numbering.
.Pp
Especially when writing older cpio variants, it may be necessary
to map actual device/inode values to synthesized values that
fit the available fields.
With very large filesystems, this may be necessary even for
the newer formats.