# Welcome to libarchive!

The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats.  It
also includes implementations of the common `tar`, `cpio`, and `zcat`
command-line tools that use the libarchive library.

## Questions?  Issues?

* http://www.libarchive.org is the home for ongoing
  libarchive development, including documentation,
  and links to the libarchive mailing lists.
* To report an issue, use the issue tracker at
  https://github.com/libarchive/libarchive/issues
* To submit an enhancement to libarchive, please
  submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls

## Contents of the Distribution

This distribution bundle includes the following major components:

* **libarchive**: a library for reading and writing streaming archives
* **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
* **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality
* **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
* **examples**: Some small example programs that you may find useful.
* **examples/minitar**: a compact sample demonstrating use of libarchive.
* **contrib**:  Various items sent to me by third parties; please contact the authors with any questions.

The top-level directory contains the following information files:

* **NEWS** - highlights of recent changes
* **COPYING** - what you can do with this
* **INSTALL** - installation instructions
* **README** - this file
* **CMakeLists.txt** - input for "cmake" build tool, see INSTALL
* **configure** - configuration script, see INSTALL for details.  If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`).

The following files in the top-level directory are used by the 'configure' script:

* `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers
* `Makefile.in`, `config.h.in` - templates used by configure script

## Documentation

In addition to the informational articles and documentation
in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki),
the distribution also includes a number of manual pages:

 * bsdtar.1 explains the use of the bsdtar program
 * bsdcpio.1 explains the use of the bsdcpio program
 * bsdcat.1 explains the use of the bsdcat program
 * libarchive.3 gives an overview of the library as a whole
 * archive_read.3, archive_write.3, archive_write_disk.3, and
   archive_read_disk.3 provide detailed calling sequences for the read
   and write APIs
 * archive_entry.3 details the "struct archive_entry" utility class
 * archive_internals.3 provides some insight into libarchive's
   internal structure and operation.
 * libarchive-formats.5 documents the file formats supported by the library
 * cpio.5, mtree.5, and tar.5 provide detailed information about these
   popular archive formats, including hard-to-find details about
   modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in
a number of different formats.

You should also read the copious comments in `archive.h` and the
source code for the sample programs for more details.  Please let us
know about any errors or omissions you find.

## Supported Formats

Currently, the library automatically detects and reads the following formats:

  * Old V7 tar archives
  * POSIX ustar
  * GNU tar format (including GNU long filenames, long link names, and sparse files)
  * Solaris 9 extended tar format (including ACLs)
  * POSIX pax interchange format
  * POSIX octet-oriented cpio
  * SVR4 ASCII cpio
  * Binary cpio (big-endian or little-endian)
  * PWB binary cpio
  * ISO9660 CD-ROM images (with optional Rockridge or Joliet extensions)
  * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives)
  * ZIPX archives (with support for bzip2, ppmd8, lzma and xz compressed entries)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * 7-Zip archives
  * Microsoft CAB format
  * LHA and LZH archives
  * RAR and RAR 5.0 archives (with some limitations due to RAR's proprietary status)
  * XAR archives

The library also detects and handles any of the following before evaluating the archive:

  * uuencoded files
  * files with RPM wrapper
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression
  * zstandard compression

The library can create archives in any of the following formats:

  * POSIX ustar
  * POSIX pax interchange format
  * "restricted" pax format, which will create ustar archives except for
    entries that require pax extensions (for long filenames, ACLs, etc.).
  * Old GNU tar format
  * Old V7 tar format
  * POSIX octet-oriented cpio
  * SVR4 "newc" cpio
  * Binary cpio (little-endian)
  * PWB binary cpio
  * shar archives
  * ZIP archives (with uncompressed or "deflate" compressed entries)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * ISO9660 format
  * 7-Zip archives
  * XAR archives

When creating archives, the result can be filtered with any of the following:

  * uuencode
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression
  * zstandard compression

## Notes about the Library Design

The following notes address many of the most common
questions we are asked about libarchive:

* This is a heavily stream-oriented system.  That means that
  it is optimized to read or write the archive in a single
  pass from beginning to end.  For example, this allows
  libarchive to process archives too large to store on disk
  by processing them on-the-fly as they are read from or
  written to a network or tape drive.  This also makes
  libarchive useful for tools that need to produce
  archives on-the-fly (such as webservers that provide
  archived contents of a user's account).

* In-place modification and random access to the contents
  of an archive are not directly supported.  For some formats,
  this is not an issue: For example, tar.gz archives are not
  designed for random access.  In some other cases, libarchive
  can re-open an archive and scan it from the beginning quickly
  enough to provide the needed abilities even without true
  random access.  Of course, some applications do require true
  random access; those applications should consider alternatives
  to libarchive.

* The library is designed to be extended with new compression and
  archive formats.  The only requirement is that the format be
  readable or writable as a stream and that each archive entry be
  independent.  There are articles on the libarchive Wiki explaining
  how to extend libarchive.

* On read, compression and format are always detected automatically.

* The same API is used for all formats; it should be very
  easy for software using libarchive to transparently handle
  any of libarchive's archiving formats.

* Libarchive's automatic support for decompression can be used
  without archiving by explicitly selecting the "raw" and "empty"
  formats.

* I've attempted to minimize static link pollution.  If you don't
  explicitly invoke a particular feature (such as support for a
  particular compression or format), it won't get pulled into
  statically-linked programs.  In particular, if you don't explicitly
  enable a particular compression or decompression support, you won't
  need to link against the corresponding compression or decompression
  libraries.  This also reduces the size of statically-linked
  binaries in environments where that matters.

* The library is generally _thread safe_ depending on the platform:
  it does not define any global variables of its own.  However, some
  platforms do not provide fully thread-safe versions of key C library
  functions.  On those platforms, libarchive will use the non-thread-safe
  functions.  Patches to improve this are of great interest to us.

* In particular, libarchive's modules to read or write a directory
  tree do use `chdir()` to optimize the directory traversals.  This
  can cause problems for programs that expect to do disk access from
  multiple threads.  Of course, those modules are completely
  optional and you can use the rest of libarchive without them.

* The library is _not_ thread aware, however.  It does no locking
  or thread management of any kind.  If you create a libarchive
  object and need to access it from multiple threads, you will
  need to provide your own locking.

* On read, the library accepts whatever blocks you hand it.
  Your read callback is free to pass the library a byte at a time
  or mmap the entire archive and give it to the library at once.
  On write, the library always produces correctly-blocked output.

* The object-style approach allows you to have multiple archive streams
  open at once.  bsdtar uses this in its "@archive" extension.

* The archive itself is read/written using callback functions.
  You can read an archive directly from an in-memory buffer or
  write it to a socket, if you wish.  There are some utility
  functions to provide easy-to-use "open file," etc., capabilities.

* The read/write APIs are designed to allow individual entries
  to be read or written to any data source:  You can create
  a block of data in memory and add it to a tar archive without
  first writing a temporary file.  You can also read an entry from
  an archive and write the data directly to a socket.  If you want
  to read/write entries to disk, there are convenience functions to
  make this especially easy.

* Note: The "pax interchange format" is a POSIX standard extended tar
  format that should be used when the older _ustar_ format is not
  appropriate.  It has many advantages over other tar formats
  (including the legacy GNU tar format) and is widely supported by
  current tar implementations.
233