# Welcome to libarchive!

The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats. It
also includes implementations of the common `tar`, `cpio`, and `zcat`
command-line tools that use the libarchive library.

## Questions? Issues?

* https://www.libarchive.org is the home for ongoing
  libarchive development, including documentation,
  and links to the libarchive mailing lists.
* To report an issue, use the issue tracker at
  https://github.com/libarchive/libarchive/issues
* To submit an enhancement to libarchive, please
  submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls

## Contents of the Distribution

This distribution bundle includes the following major components:

* **libarchive**: a library for reading and writing streaming archives
* **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
* **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality
* **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
* **unzip**: the 'bsdunzip' program is a simple replacement tool for Info-ZIP's unzip
* **examples**: Some small example programs that you may find useful.
* **examples/minitar**: a compact sample demonstrating use of libarchive.
* **contrib**: Various items sent to me by third parties; please contact the authors with any questions.

The top-level directory contains the following information files:

* **NEWS** - highlights of recent changes
* **COPYING** - what you can do with this
* **INSTALL** - installation instructions
* **README** - this file
* **CMakeLists.txt** - input for "cmake" build tool, see INSTALL
* **configure** - configuration script, see INSTALL for details.
  If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`).

The following files in the top-level directory are used by the 'configure' script:

* `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers
* `Makefile.in`, `config.h.in` - templates used by the configure script

## Documentation

In addition to the informational articles and documentation
in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki),
the distribution also includes a number of manual pages:

 * bsdtar.1 explains the use of the bsdtar program
 * bsdcpio.1 explains the use of the bsdcpio program
 * bsdcat.1 explains the use of the bsdcat program
 * libarchive.3 gives an overview of the library as a whole
 * archive_read.3, archive_write.3, archive_write_disk.3, and
   archive_read_disk.3 provide detailed calling sequences for the read
   and write APIs
 * archive_entry.3 details the "struct archive_entry" utility class
 * archive_internals.3 provides some insight into libarchive's
   internal structure and operation.
 * libarchive-formats.5 documents the file formats supported by the library
 * cpio.5, mtree.5, and tar.5 provide detailed information about these
   popular archive formats, including hard-to-find details about
   modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in
a number of different formats.

You should also read the copious comments in `archive.h` and the
source code for the sample programs for more details. Please let us
know about any errors or omissions you find.

## Supported Formats

Currently, the library automatically detects and reads the following formats:

 * Old V7 tar archives
 * POSIX ustar
 * GNU tar format (including GNU long filenames, long link names, and sparse files)
 * Solaris 9 extended tar format (including ACLs)
 * POSIX pax interchange format
 * POSIX octet-oriented cpio
 * SVR4 ASCII cpio
 * Binary cpio (big-endian or little-endian)
 * PWB binary cpio
 * ISO9660 CD-ROM images (with optional Rockridge or Joliet extensions)
 * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives)
 * ZIPX archives (with support for bzip2, zstd, ppmd8, lzma and xz compressed entries)
 * GNU and BSD 'ar' archives
 * 'mtree' format
 * 7-Zip archives (including archives that use zstandard compression)
 * Microsoft CAB format
 * LHA and LZH archives
 * RAR and RAR 5.0 archives (with some limitations due to RAR's proprietary status)
 * WARC archives
 * XAR archives

The library also detects and handles any of the following before evaluating the archive:

 * uuencoded files
 * files with RPM wrapper
 * gzip compression
 * bzip2 compression
 * compress/LZW compression
 * lzma, lzip, and xz compression
 * lz4 compression
 * lzop compression
 * zstandard compression

The library can create archives in any of the following formats:

 * POSIX ustar
 * POSIX pax interchange format
 * "restricted" pax format, which will create ustar archives except for
   entries that require pax extensions (for long filenames, ACLs, etc.).
 * Old GNU tar format
 * Old V7 tar format
 * POSIX octet-oriented cpio
 * SVR4 "newc" cpio
 * Binary cpio (little-endian)
 * PWB binary cpio
 * shar archives
 * ZIP archives (with uncompressed or "deflate" compressed entries)
 * ZIPX archives (with bzip2, zstd, lzma or xz compressed entries)
 * GNU and BSD 'ar' archives
 * 'mtree' format
 * ISO9660 format
 * 7-Zip archives (including archives that use zstandard compression)
 * WARC archives
 * XAR archives

When creating archives, the result can be filtered with any of the following:

 * uuencode
 * base64
 * gzip compression
 * bzip2 compression
 * compress/LZW compression
 * lzma, lzip, and xz compression
 * lz4 compression
 * lzop compression
 * zstandard compression

## Notes about the Library Design

The following notes address many of the most common
questions we are asked about libarchive:

* This is a heavily stream-oriented system. That means that
  it is optimized to read or write the archive in a single
  pass from beginning to end. For example, this allows
  libarchive to process archives too large to store on disk
  by processing them on-the-fly as they are read from or
  written to a network or tape drive. This also makes
  libarchive useful for tools that need to produce
  archives on-the-fly (such as web servers that provide
  archived contents of a user's account).

* In-place modification and random access to the contents
  of an archive are not directly supported. For some formats,
  this is not an issue: For example, tar.gz archives are not
  designed for random access. In some other cases, libarchive
  can re-open an archive and scan it from the beginning quickly
  enough to provide the needed abilities even without true
  random access.
  Of course, some applications do require true
  random access; those applications should consider alternatives
  to libarchive.

* The library is designed to be extended with new compression and
  archive formats. The only requirement is that the format be
  readable or writable as a stream and that each archive entry be
  independent. There are articles on the libarchive Wiki explaining
  how to extend libarchive.

* On read, compression and format are always detected automatically.

* The same API is used for all formats; it should be very
  easy for software using libarchive to transparently handle
  any of libarchive's archiving formats.

* Libarchive's automatic support for decompression can be used
  without archiving by explicitly selecting the "raw" and "empty"
  formats.

* I've attempted to minimize static link pollution. If you don't
  explicitly invoke a particular feature (such as support for a
  particular compression or format), it won't get pulled into
  statically-linked programs. In particular, if you don't explicitly
  enable a particular compression or decompression support, you won't
  need to link against the corresponding compression or decompression
  libraries. This also reduces the size of statically-linked
  binaries in environments where that matters.

* The library is generally _thread safe_ depending on the platform:
  it does not define any global variables of its own. However, some
  platforms do not provide fully thread-safe versions of key C library
  functions. On those platforms, libarchive will use the non-thread-safe
  functions. Patches to improve this are of great interest to us.

* The function `archive_write_disk_header()` is _not_ thread safe on
  POSIX machines and could lead to a security issue resulting in
  world-writable directories. Thus the calling code must protect calls
  to it with a mutex.
  This is due to calling `umask(oldumask = umask(0))`, which sets the
  umask for the whole process to 0 for a short time frame.
  If another thread calls the same function in parallel, the two calls
  can interleave so that the process is left with umask 0 for the rest
  of its execution. Directories that are created implicitly after that
  will then have 777 permissions without the sticky bit.

* In particular, libarchive's modules to read or write a directory
  tree do use `chdir()` to optimize the directory traversals. This
  can cause problems for programs that expect to do disk access from
  multiple threads. Of course, those modules are completely
  optional and you can use the rest of libarchive without them.

* The library is _not_ thread aware, however. It does no locking
  or thread management of any kind. If you create a libarchive
  object and need to access it from multiple threads, you will
  need to provide your own locking.

* On read, the library accepts whatever blocks you hand it.
  Your read callback is free to pass the library a byte at a time
  or mmap the entire archive and give it to the library at once.
  On write, the library always produces correctly-blocked output.

* The object-style approach allows you to have multiple archive streams
  open at once. bsdtar uses this in its "@archive" extension.

* The archive itself is read/written using callback functions.
  You can read an archive directly from an in-memory buffer or
  write it to a socket, if you wish. There are some utility
  functions to provide easy-to-use "open file," etc., capabilities.

* The read/write APIs are designed to allow individual entries
  to be read or written to any data source: You can create
  a block of data in memory and add it to a tar archive without
  first writing a temporary file. You can also read an entry from
  an archive and write the data directly to a socket.
  If you want to read/write entries to disk, there are convenience
  functions to make this especially easy.

* Note: The "pax interchange format" is a POSIX standard extended tar
  format that should be used when the older _ustar_ format is not
  appropriate. It has many advantages over other tar formats
  (including the legacy GNU tar format) and is widely supported by
  current tar implementations.