# Welcome to libarchive!

The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats.  It
also includes implementations of the common `tar`, `cpio`, and `zcat`
command-line tools that use the libarchive library.

## Questions?  Issues?

* http://www.libarchive.org is the home for ongoing
  libarchive development, including documentation,
  and links to the libarchive mailing lists.
* To report an issue, use the issue tracker at
  https://github.com/libarchive/libarchive/issues
* To submit an enhancement to libarchive, please
  submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls

## Contents of the Distribution

This distribution bundle includes the following major components:

* **libarchive**: a library for reading and writing streaming archives
* **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
* **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality
* **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
* **examples**: Some small example programs that you may find useful.
* **examples/minitar**: a compact sample demonstrating use of libarchive.
* **contrib**:  Various items sent to me by third parties; please contact the authors with any questions.

The top-level directory contains the following information files:

* **NEWS** - highlights of recent changes
* **COPYING** - what you can do with this
* **INSTALL** - installation instructions
* **README** - this file
* **CMakeLists.txt** - input for "cmake" build tool, see INSTALL
* **configure** - configuration script, see INSTALL for details.  If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`).

The following files in the top-level directory are used by the 'configure' script:
* `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers
* `Makefile.in`, `config.h.in` - templates used by configure script

## Documentation

In addition to the informational articles and documentation
in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki),
the distribution also includes a number of manual pages:

 * bsdtar.1 explains the use of the bsdtar program
 * bsdcpio.1 explains the use of the bsdcpio program
 * bsdcat.1 explains the use of the bsdcat program
 * libarchive.3 gives an overview of the library as a whole
 * archive_read.3, archive_write.3, archive_write_disk.3, and
   archive_read_disk.3 provide detailed calling sequences for the read
   and write APIs
 * archive_entry.3 details the "struct archive_entry" utility class
 * archive_internals.3 provides some insight into libarchive's
   internal structure and operation.
 * libarchive-formats.5 documents the file formats supported by the library
 * cpio.5, mtree.5, and tar.5 provide detailed information about these
   popular archive formats, including hard-to-find details about
   modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in
a number of different formats.

You should also read the copious comments in `archive.h` and the
source code for the sample programs for more details.  Please let us
know about any errors or omissions you find.

## Supported Formats

Currently, the library automatically detects and reads the following formats:
  * Old V7 tar archives
  * POSIX ustar
  * GNU tar format (including GNU long filenames, long link names, and sparse files)
  * Solaris 9 extended tar format (including ACLs)
  * POSIX pax interchange format
  * POSIX octet-oriented cpio
  * SVR4 ASCII cpio
  * Binary cpio (big-endian or little-endian)
  * ISO9660 CD-ROM images (with optional Rockridge or Joliet extensions)
  * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * 7-Zip archives
  * Microsoft CAB format
  * LHA and LZH archives
  * RAR and RAR 5.0 archives (with some limitations due to RAR's proprietary status)
  * XAR archives

The library also detects and handles any of the following before evaluating the archive:
  * uuencoded files
  * files with RPM wrapper
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression
  * zstandard compression

The library can create archives in any of the following formats:
  * POSIX ustar
  * POSIX pax interchange format
  * "restricted" pax format, which will create ustar archives except for
    entries that require pax extensions (for long filenames, ACLs, etc.).
  * Old GNU tar format
  * Old V7 tar format
  * POSIX octet-oriented cpio
  * SVR4 "newc" cpio
  * shar archives
  * ZIP archives (with uncompressed or "deflate" compressed entries)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * ISO9660 format
  * 7-Zip archives
  * XAR archives

When creating archives, the result can be filtered with any of the following:
  * uuencode
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression
  * zstandard compression

## Notes about the Library Design

The following notes address many of the most common
questions we are asked about libarchive:

* This is a heavily stream-oriented system.  That means that
  it is optimized to read or write the archive in a single
  pass from beginning to end.  For example, this allows
  libarchive to process archives too large to store on disk
  by processing them on-the-fly as they are read from or
  written to a network or tape drive.  This also makes
  libarchive useful for tools that need to produce
  archives on-the-fly (such as webservers that provide
  archived contents of a user's account).

* In-place modification and random access to the contents
  of an archive are not directly supported.  For some formats,
  this is not an issue: For example, tar.gz archives are not
  designed for random access.  In some other cases, libarchive
  can re-open an archive and scan it from the beginning quickly
  enough to provide the needed abilities even without true
  random access.  Of course, some applications do require true
  random access; those applications should consider alternatives
  to libarchive.

* The library is designed to be extended with new compression and
  archive formats.  The only requirement is that the format be
  readable or writable as a stream and that each archive entry be
  independent.  There are articles on the libarchive Wiki explaining
  how to extend libarchive.

* On read, compression and format are always detected automatically.

* The same API is used for all formats; it should be very
  easy for software using libarchive to transparently handle
  any of libarchive's archiving formats.

* Libarchive's automatic support for decompression can be used
  without archiving by explicitly selecting the "raw" and "empty"
  formats.

* I've attempted to minimize static link pollution.  If you don't
  explicitly invoke a particular feature (such as support for a
  particular compression or format), it won't get pulled in to
  statically-linked programs.  In particular, if you don't explicitly
  enable a particular compression or decompression support, you won't
  need to link against the corresponding compression or decompression
  libraries.  This also reduces the size of statically-linked
  binaries in environments where that matters.

* The library is generally _thread safe_ depending on the platform:
  it does not define any global variables of its own.  However, some
  platforms do not provide fully thread-safe versions of key C library
  functions.  On those platforms, libarchive will use the non-thread-safe
  functions.  Patches to improve this are of great interest to us.

* In particular, libarchive's modules to read or write a directory
  tree do use `chdir()` to optimize the directory traversals.  This
  can cause problems for programs that expect to do disk access from
  multiple threads.  Of course, those modules are completely
  optional and you can use the rest of libarchive without them.

* The library is _not_ thread aware, however.  It does no locking
  or thread management of any kind.  If you create a libarchive
  object and need to access it from multiple threads, you will
  need to provide your own locking.

* On read, the library accepts whatever blocks you hand it.
  Your read callback is free to pass the library a byte at a time
  or mmap the entire archive and give it to the library at once.
  On write, the library always produces correctly-blocked output.

* The object-style approach allows you to have multiple archive streams
  open at once.  bsdtar uses this in its "@archive" extension.

* The archive itself is read/written using callback functions.
  You can read an archive directly from an in-memory buffer or
  write it to a socket, if you wish.  There are some utility
  functions to provide easy-to-use "open file," etc., capabilities.

* The read/write APIs are designed to allow individual entries
  to be read or written to any data source:  You can create
  a block of data in memory and add it to a tar archive without
  first writing a temporary file.  You can also read an entry from
  an archive and write the data directly to a socket.  If you want
  to read/write entries to disk, there are convenience functions to
  make this especially easy.

* Note: The "pax interchange format" is a POSIX standard extended tar
  format that should be used when the older _ustar_ format is not
  appropriate.  It has many advantages over other tar formats
  (including the legacy GNU tar format) and is widely supported by
  current tar implementations.