# Welcome to libarchive!

The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats. It
also includes implementations of the common `tar`, `cpio`, and `zcat`
command-line tools that use the libarchive library.

## Questions? Issues?

* https://www.libarchive.org is the home for ongoing
  libarchive development, including documentation,
  and links to the libarchive mailing lists.
* To report an issue, use the issue tracker at
  https://github.com/libarchive/libarchive/issues
* To submit an enhancement to libarchive, please
  submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls

## Contents of the Distribution

This distribution bundle includes the following major components:

* **libarchive**: a library for reading and writing streaming archives
* **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
* **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality
* **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
* **unzip**: the 'bsdunzip' program is a simple replacement tool for Info-ZIP's unzip
* **examples**: Some small example programs that you may find useful.
* **examples/minitar**: a compact sample demonstrating use of libarchive.
* **contrib**: Various items sent to me by third parties; please contact the authors with any questions.

The top-level directory contains the following information files:

* **NEWS** - highlights of recent changes
* **COPYING** - what you can do with this
* **INSTALL** - installation instructions
* **README** - this file
* **CMakeLists.txt** - input for "cmake" build tool, see INSTALL
* **configure** - configuration script, see INSTALL for details. If your copy of the source lacks a `configure` script, you can try to construct it by running the `build/autogen.sh` script (or use `cmake`).

The following files in the top-level directory are related to the 'configure' script and are only needed by maintainers:

* `configure.ac` - used (by autoconf) to build the configure script and related files
* `Makefile.am` - used (by automake) to generate Makefile.in
* `aclocal.m4` - auto-generated file (created by aclocal) used to build the configure script
* `Makefile.in` - auto-generated template (created by automake) used by the configure script to create Makefile
* `config.h.in` - auto-generated template (created by autoheader) used by the configure script to create config.h

## Documentation

In addition to the informational articles and documentation
in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki),
the distribution also includes a number of manual pages:

 * bsdtar.1 explains the use of the bsdtar program
 * bsdcpio.1 explains the use of the bsdcpio program
 * bsdcat.1 explains the use of the bsdcat program
 * libarchive.3 gives an overview of the library as a whole
 * archive_read.3, archive_write.3, archive_write_disk.3, and
   archive_read_disk.3 provide detailed calling sequences for the read
   and write APIs
 * archive_entry.3 details the "struct archive_entry" utility class
 * archive_internals.3 provides some insight into libarchive's
   internal structure and operation.
 * libarchive-formats.5 documents the file formats supported by the library
 * cpio.5, mtree.5, and tar.5 provide detailed information about these
   popular archive formats, including hard-to-find details about
   modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in
a number of different formats.

You should also read the copious comments in `archive.h` and the
source code for the sample programs for more details. Please let us
know about any errors or omissions you find.

## Supported Formats

Currently, the library automatically detects and reads the following formats:

 * Old V7 tar archives
 * POSIX ustar
 * GNU tar format (including GNU long filenames, long link names, and sparse files)
 * Solaris 9 extended tar format (including ACLs)
 * POSIX pax interchange format
 * POSIX octet-oriented cpio
 * SVR4 ASCII cpio
 * Binary cpio (big-endian or little-endian)
 * PWB binary cpio
 * ISO9660 CD-ROM images (with optional Rock Ridge or Joliet extensions)
 * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted ZIP archives)
 * ZIPX archives (with support for bzip2, zstd, ppmd8, lzma and xz compressed entries)
 * GNU and BSD 'ar' archives
 * 'mtree' format
 * 7-Zip archives (including archives that use zstandard compression)
 * Microsoft CAB format
 * LHA and LZH archives
 * RAR and RAR 5.0 archives (with some limitations due to RAR's proprietary status)
 * WARC archives
 * XAR archives

The library also detects and handles any of the following before evaluating the archive:

 * uuencoded files
 * files with RPM wrapper
 * gzip compression
 * bzip2 compression
 * compress/LZW compression
 * lzma, lzip, and xz compression
 * lz4 compression
 * lzop compression
 * zstandard compression

The library can create archives in any of the following formats:

 * POSIX ustar
 * POSIX pax interchange format
 * "restricted" pax format, which will create ustar archives except for
   entries that require pax extensions (for long filenames, ACLs, etc.).
 * Old GNU tar format
 * Old V7 tar format
 * POSIX octet-oriented cpio
 * SVR4 "newc" cpio
 * Binary cpio (little-endian)
 * PWB binary cpio
 * shar archives
 * ZIP archives (with uncompressed or "deflate" compressed entries)
 * ZIPX archives (with bzip2, zstd, lzma or xz compressed entries)
 * GNU and BSD 'ar' archives
 * 'mtree' format
 * ISO9660 format
 * 7-Zip archives (including archives that use zstandard compression)
 * WARC archives
 * XAR archives

When creating archives, the result can be filtered with any of the following:

 * uuencode
 * base64
 * gzip compression
 * bzip2 compression
 * compress/LZW compression
 * lzma, lzip, and xz compression
 * lz4 compression
 * lzop compression
 * zstandard compression

## Notes about the Library Design

The following notes address many of the most common
questions we are asked about libarchive:

* This is a heavily stream-oriented system. That means that
  it is optimized to read or write the archive in a single
  pass from beginning to end. For example, this allows
  libarchive to process archives too large to store on disk
  by processing them on-the-fly as they are read from or
  written to a network or tape drive. This also makes
  libarchive useful for tools that need to produce
  archives on-the-fly (such as web servers that provide
  archived contents of a user's account).

* In-place modification and random access to the contents
  of an archive are not directly supported. For some formats,
  this is not an issue: for example, tar.gz archives are not
  designed for random access. In some other cases, libarchive
  can re-open an archive and scan it from the beginning quickly
  enough to provide the needed abilities even without true
  random access. Of course, some applications do require true
  random access; those applications should consider alternatives
  to libarchive.

* The library is designed to be extended with new compression and
  archive formats. The only requirements are that the format be
  readable or writable as a stream and that each archive entry be
  independent. There are articles on the libarchive Wiki explaining
  how to extend libarchive.

* On read, compression and format are always detected automatically.

* The same API is used for all formats; it should be very
  easy for software using libarchive to transparently handle
  any of libarchive's archiving formats.

* Libarchive's automatic support for decompression can be used
  without archiving by explicitly selecting the "raw" and "empty"
  formats.

* I've attempted to minimize static link pollution. If you don't
  explicitly invoke a particular feature (such as support for a
  particular compression or format), it won't get pulled into
  statically-linked programs. In particular, if you don't explicitly
  enable support for a particular compression or decompression method,
  you won't need to link against the corresponding compression or
  decompression libraries. This also reduces the size of
  statically-linked binaries in environments where that matters.

* The library is generally _thread-safe_ depending on the platform:
  it does not define any global variables of its own. However, some
  platforms do not provide fully thread-safe versions of key C library
  functions. On those platforms, libarchive will use the non-thread-safe
  functions. Patches to improve this are of great interest to us.

* The function `archive_write_disk_header()` is _not_ thread-safe on
  POSIX machines and could lead to a security issue resulting in
  world-writable directories. The calling code must therefore guard
  it with a mutex. The cause is the call `umask(oldumask = umask(0))`,
  which sets the umask for the whole process to 0 for a short time.
  If another thread calls the same function in parallel, the two calls
  can interleave and leave the process with umask=0 for the rest of
  its execution. Implicitly created directories would then get 777
  permissions without the sticky bit.

* In particular, libarchive's modules to read or write a directory
  tree do use `chdir()` to optimize the directory traversals. This
  can cause problems for programs that expect to do disk access from
  multiple threads. Of course, those modules are completely
  optional and you can use the rest of libarchive without them.

* The library is _not_ thread-aware, however. It does no locking
  or thread management of any kind. If you create a libarchive
  object and need to access it from multiple threads, you will
  need to provide your own locking.

* On read, the library accepts whatever blocks you hand it.
  Your read callback is free to pass the library a byte at a time
  or mmap the entire archive and give it to the library at once.
  On write, the library always produces correctly-blocked output.

* The object-style approach allows you to have multiple archive streams
  open at once. bsdtar uses this in its "@archive" extension.

* The archive itself is read/written using callback functions.
  You can read an archive directly from an in-memory buffer or
  write it to a socket, if you wish. There are some utility
  functions that provide easy-to-use "open file" and similar
  capabilities.

* The read/write APIs are designed to allow individual entries
  to be read or written to any data source: You can create
  a block of data in memory and add it to a tar archive without
  first writing a temporary file. You can also read an entry from
  an archive and write the data directly to a socket. If you want
  to read/write entries to disk, there are convenience functions to
  make this especially easy.

* Note: The "pax interchange format" is a POSIX standard extended tar
  format that should be used when the older _ustar_ format is not
  appropriate. It has many advantages over other tar formats
  (including the legacy GNU tar format) and is widely supported by
  current tar implementations.
