# Welcome to libarchive!

The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats.  It
also includes implementations of the common `tar`, `cpio`, and `zcat`
command-line tools that use the libarchive library.

## Questions?  Issues?

* http://www.libarchive.org is the home for ongoing
  libarchive development, including documentation,
  and links to the libarchive mailing lists.
* To report an issue, use the issue tracker at
  https://github.com/libarchive/libarchive/issues
* To submit an enhancement to libarchive, please
  submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls

## Contents of the Distribution

This distribution bundle includes the following major components:

* **libarchive**: a library for reading and writing streaming archives
* **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
* **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality
* **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
* **examples**: Some small example programs that you may find useful.
* **examples/minitar**: a compact sample demonstrating use of libarchive.
* **contrib**:  Various items sent to me by third parties; please contact the authors with any questions.

The top-level directory contains the following information files:

* **NEWS** - highlights of recent changes
* **COPYING** - what you can do with this
* **INSTALL** - installation instructions
* **README** - this file
* **CMakeLists.txt** - input for "cmake" build tool, see INSTALL
* **configure** - configuration script, see INSTALL for details.  If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`).

The following files in the top-level directory are used by the 'configure' script:
* `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers
* `Makefile.in`, `config.h.in` - templates used by configure script

## Documentation

In addition to the informational articles and documentation
in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki),
the distribution also includes a number of manual pages:

 * bsdtar.1 explains the use of the bsdtar program
 * bsdcpio.1 explains the use of the bsdcpio program
 * bsdcat.1 explains the use of the bsdcat program
 * libarchive.3 gives an overview of the library as a whole
 * archive_read.3, archive_write.3, archive_write_disk.3, and
   archive_read_disk.3 provide detailed calling sequences for the read
   and write APIs
 * archive_entry.3 details the "struct archive_entry" utility class
 * archive_internals.3 provides some insight into libarchive's
   internal structure and operation.
 * libarchive-formats.5 documents the file formats supported by the library
 * cpio.5, mtree.5, and tar.5 provide detailed information about these
   popular archive formats, including hard-to-find details about
   modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in
a number of different formats.

You should also read the copious comments in `archive.h` and the
source code for the sample programs for more details.  Please let us
know about any errors or omissions you find.

## Supported Formats

Currently, the library automatically detects and reads the following formats:
  * Old V7 tar archives
  * POSIX ustar
  * GNU tar format (including GNU long filenames, long link names, and sparse files)
  * Solaris 9 extended tar format (including ACLs)
  * POSIX pax interchange format
  * POSIX octet-oriented cpio
  * SVR4 ASCII cpio
  * Binary cpio (big-endian or little-endian)
  * ISO9660 CD-ROM images (with optional Rock Ridge or Joliet extensions)
  * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * 7-Zip archives
  * Microsoft CAB format
  * LHA and LZH archives
  * RAR archives (with some limitations due to RAR's proprietary status)
  * XAR archives

The library also detects and handles any of the following before evaluating the archive:
  * uuencoded files
  * files with RPM wrapper
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression

The library can create archives in any of the following formats:
  * POSIX ustar
  * POSIX pax interchange format
  * "restricted" pax format, which will create ustar archives except for
    entries that require pax extensions (for long filenames, ACLs, etc).
  * Old GNU tar format
  * Old V7 tar format
  * POSIX octet-oriented cpio
  * SVR4 "newc" cpio
  * shar archives
  * ZIP archives (with uncompressed or "deflate" compressed entries)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * ISO9660 format
  * 7-Zip archives
  * XAR archives

When creating archives, the result can be filtered with any of the following:
  * uuencode
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression

## Notes about the Library Design

The following notes address many of the most common
questions we are asked about libarchive:

* This is a heavily stream-oriented system.  That means that
  it is optimized to read or write the archive in a single
  pass from beginning to end.  For example, this allows
  libarchive to process archives too large to store on disk
  by processing them on-the-fly as they are read from or
  written to a network or tape drive.  This also makes
  libarchive useful for tools that need to produce
  archives on-the-fly (such as webservers that provide
  archived contents of a user's account).

* In-place modification and random access to the contents
  of an archive are not directly supported.  For some formats,
  this is not an issue: For example, tar.gz archives are not
  designed for random access.  In some other cases, libarchive
  can re-open an archive and scan it from the beginning quickly
  enough to provide the needed abilities even without true
  random access.  Of course, some applications do require true
  random access; those applications should consider alternatives
  to libarchive.

* The library is designed to be extended with new compression and
  archive formats.  The only requirement is that the format be
  readable or writable as a stream and that each archive entry be
  independent.  There are articles on the libarchive Wiki explaining
  how to extend libarchive.

* On read, compression and format are always detected automatically.

* The same API is used for all formats; in particular, it's very
  easy for software using libarchive to transparently handle
  any of libarchive's archiving formats.

* Libarchive's automatic support for decompression can be used
  without archiving by explicitly selecting the "raw" and "empty"
  formats.

* I've attempted to minimize static link pollution.  If you don't
  explicitly invoke a particular feature (such as support for a
  particular compression or format), it won't get pulled into
  statically-linked programs.  In particular, if you don't explicitly
  enable a particular compression or decompression support, you won't
  need to link against the corresponding compression or decompression
  libraries.  This also reduces the size of statically-linked
  binaries in environments where that matters.

* The library is generally _thread safe_ depending on the platform:
  it does not define any global variables of its own.  However, some
  platforms do not provide fully thread-safe versions of key C library
  functions.  On those platforms, libarchive will use the non-thread-safe
  functions.  Patches to improve this are of great interest to us.

* In particular, libarchive's modules to read or write a directory
  tree do use `chdir()` to optimize the directory traversals.  This
  can cause problems for programs that expect to do disk access from
  multiple threads.  Of course, those modules are completely
  optional and you can use the rest of libarchive without them.

* The library is _not_ thread aware, however.  It does no locking
  or thread management of any kind.  If you create a libarchive
  object and need to access it from multiple threads, you will
  need to provide your own locking.

* On read, the library accepts whatever blocks you hand it.
  Your read callback is free to pass the library a byte at a time
  or mmap the entire archive and give it to the library at once.
  On write, the library always produces correctly-blocked output.

* The object-style approach allows you to have multiple archive streams
  open at once.  bsdtar uses this in its "@archive" extension.

* The archive itself is read/written using callback functions.
  You can read an archive directly from an in-memory buffer or
  write it to a socket, if you wish.  There are some utility
  functions to provide easy-to-use "open file," etc, capabilities.

* The read/write APIs are designed to allow individual entries
  to be read or written to any data source:  You can create
  a block of data in memory and add it to a tar archive without
  first writing a temporary file.  You can also read an entry from
  an archive and write the data directly to a socket.  If you want
  to read/write entries to disk, there are convenience functions to
  make this especially easy.

* Note: The "pax interchange format" is a POSIX standard extended tar
  format that should be used when the older _ustar_ format is not
  appropriate.  It has many advantages over other tar formats
  (including the legacy GNU tar format) and is widely supported by
  current tar implementations.
