# Welcome to libarchive!

The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats. It
also includes implementations of the common `tar`, `cpio`, and `zcat`
command-line tools that use the libarchive library.

## Questions? Issues?

* https://www.libarchive.org is the home for ongoing
  libarchive development, including documentation,
  and links to the libarchive mailing lists.
* To report an issue, use the issue tracker at
  https://github.com/libarchive/libarchive/issues
* To submit an enhancement to libarchive, please
  submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls

## Contents of the Distribution

This distribution bundle includes the following major components:

* **libarchive**: a library for reading and writing streaming archives
* **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
* **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality
* **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
* **unzip**: the 'bsdunzip' program is a simple replacement tool for Info-ZIP's unzip
* **examples**: Some small example programs that you may find useful.
* **examples/minitar**: a compact sample demonstrating use of libarchive.
* **contrib**: Various items sent to me by third parties; please contact the authors with any questions.

The top-level directory contains the following information files:

* **NEWS** - highlights of recent changes
* **COPYING** - what you can do with this
* **INSTALL** - installation instructions
* **README** - this file
* **CMakeLists.txt** - input for "cmake" build tool, see INSTALL
* **configure** - configuration script, see INSTALL for details.
  If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`).

The following files in the top-level directory are used by the 'configure' script:

* `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers
* `Makefile.in`, `config.h.in` - templates used by configure script

## Documentation

In addition to the informational articles and documentation
in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki),
the distribution also includes a number of manual pages:

 * bsdtar.1 explains the use of the bsdtar program
 * bsdcpio.1 explains the use of the bsdcpio program
 * bsdcat.1 explains the use of the bsdcat program
 * libarchive.3 gives an overview of the library as a whole
 * archive_read.3, archive_write.3, archive_write_disk.3, and
   archive_read_disk.3 provide detailed calling sequences for the read
   and write APIs
 * archive_entry.3 details the "struct archive_entry" utility class
 * archive_internals.3 provides some insight into libarchive's
   internal structure and operation.
 * libarchive-formats.5 documents the file formats supported by the library
 * cpio.5, mtree.5, and tar.5 provide detailed information about these
   popular archive formats, including hard-to-find details about
   modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in
a number of different formats.

You should also read the copious comments in `archive.h` and the
source code for the sample programs for more details. Please let us
know about any errors or omissions you find.

## Supported Formats

Currently, the library automatically detects and reads the following formats:

 * Old V7 tar archives
 * POSIX ustar
 * GNU tar format (including GNU long filenames, long link names, and sparse files)
 * Solaris 9 extended tar format (including ACLs)
 * POSIX pax interchange format
 * POSIX octet-oriented cpio
 * SVR4 ASCII cpio
 * Binary cpio (big-endian or little-endian)
 * PWB binary cpio
 * ISO9660 CD-ROM images (with optional Rockridge or Joliet extensions)
 * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives)
 * ZIPX archives (with support for bzip2, ppmd8, lzma and xz compressed entries)
 * GNU and BSD 'ar' archives
 * 'mtree' format
 * 7-Zip archives (including archives that use zstandard compression)
 * Microsoft CAB format
 * LHA and LZH archives
 * RAR and RAR 5.0 archives (with some limitations due to RAR's proprietary status)
 * XAR archives

The library also detects and handles any of the following before evaluating the archive:

 * uuencoded files
 * files with RPM wrapper
 * gzip compression
 * bzip2 compression
 * compress/LZW compression
 * lzma, lzip, and xz compression
 * lz4 compression
 * lzop compression
 * zstandard compression

The library can create archives in any of the following formats:

 * POSIX ustar
 * POSIX pax interchange format
 * "restricted" pax format, which will create ustar archives except for
   entries that require pax extensions (for long filenames, ACLs, etc).
 * Old GNU tar format
 * Old V7 tar format
 * POSIX octet-oriented cpio
 * SVR4 "newc" cpio
 * Binary cpio (little-endian)
 * PWB binary cpio
 * shar archives
 * ZIP archives (with uncompressed or "deflate" compressed entries)
 * GNU and BSD 'ar' archives
 * 'mtree' format
 * ISO9660 format
 * 7-Zip archives
 * XAR archives

When creating archives, the result can be filtered with any of the following:

 * uuencode
 * gzip compression
 * bzip2 compression
 * compress/LZW compression
 * lzma, lzip, and xz compression
 * lz4 compression
 * lzop compression
 * zstandard compression

## Notes about the Library Design

The following notes address many of the most common
questions we are asked about libarchive:

* This is a heavily stream-oriented system. That means that
  it is optimized to read or write the archive in a single
  pass from beginning to end. For example, this allows
  libarchive to process archives too large to store on disk
  by processing them on-the-fly as they are read from or
  written to a network or tape drive. This also makes
  libarchive useful for tools that need to produce
  archives on-the-fly (such as web servers that provide
  archived contents of a user's account).

* In-place modification and random access to the contents
  of an archive are not directly supported. For some formats,
  this is not an issue: For example, tar.gz archives are not
  designed for random access. In some other cases, libarchive
  can re-open an archive and scan it from the beginning quickly
  enough to provide the needed abilities even without true
  random access. Of course, some applications do require true
  random access; those applications should consider alternatives
  to libarchive.

* The library is designed to be extended with new compression and
  archive formats.
  The only requirement is that the format be
  readable or writable as a stream and that each archive entry be
  independent. There are articles on the libarchive Wiki explaining
  how to extend libarchive.

* On read, compression and format are always detected automatically.

* The same API is used for all formats; it should be very
  easy for software using libarchive to transparently handle
  any of libarchive's archiving formats.

* Libarchive's automatic support for decompression can be used
  without archiving by explicitly selecting the "raw" and "empty"
  formats.

* I've attempted to minimize static link pollution. If you don't
  explicitly invoke a particular feature (such as support for a
  particular compression or format), it won't get pulled in to
  statically-linked programs. In particular, if you don't explicitly
  enable a particular compression or decompression support, you won't
  need to link against the corresponding compression or decompression
  libraries. This also reduces the size of statically-linked
  binaries in environments where that matters.

* The library is generally _thread safe_ depending on the platform:
  it does not define any global variables of its own. However, some
  platforms do not provide fully thread-safe versions of key C library
  functions. On those platforms, libarchive will use the non-thread-safe
  functions. Patches to improve this are of great interest to us.

* The function `archive_write_disk_header()` is _not_ thread safe on
  POSIX machines and could lead to a security issue resulting in
  world-writable directories. Thus it must be mutexed by the calling code.
  This is due to calling `umask(oldumask = umask(0))`, which sets the
  umask for the whole process to 0 for a short time frame.
  If another thread calls the same function in parallel, the two
  calls can interleave and leave the process with umask=0 for the
  rest of its execution.
  Implicitly created directories would then get 777
  permissions without the sticky bit.

* In particular, libarchive's modules to read or write a directory
  tree do use `chdir()` to optimize the directory traversals. This
  can cause problems for programs that expect to do disk access from
  multiple threads. Of course, those modules are completely
  optional and you can use the rest of libarchive without them.

* The library is _not_ thread aware, however. It does no locking
  or thread management of any kind. If you create a libarchive
  object and need to access it from multiple threads, you will
  need to provide your own locking.

* On read, the library accepts whatever blocks you hand it.
  Your read callback is free to pass the library a byte at a time
  or mmap the entire archive and give it to the library at once.
  On write, the library always produces correctly-blocked output.

* The object-style approach allows you to have multiple archive streams
  open at once. bsdtar uses this in its "@archive" extension.

* The archive itself is read/written using callback functions.
  You can read an archive directly from an in-memory buffer or
  write it to a socket, if you wish. There are some utility
  functions to provide easy-to-use "open file," etc, capabilities.

* The read/write APIs are designed to allow individual entries
  to be read or written to any data source: You can create
  a block of data in memory and add it to a tar archive without
  first writing a temporary file. You can also read an entry from
  an archive and write the data directly to a socket. If you want
  to read/write entries to disk, there are convenience functions to
  make this especially easy.

* Note: The "pax interchange format" is a POSIX standard extended tar
  format that should be used when the older _ustar_ format is not
  appropriate. It has many advantages over other tar formats
  (including the legacy GNU tar format) and is widely supported by
  current tar implementations.

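
The thread-safety note about `archive_write_disk_header()` asks the calling code to provide its own mutex. The following is a minimal sketch of one way to do that, using only POSIX calls. The names `umask_lock`, `umask_sensitive_call`, and `call_with_umask_lock` are hypothetical and are not part of libarchive; `umask_sensitive_call` is only a stand-in that repeats the `umask(oldumask = umask(0))` idiom quoted above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical process-wide lock. Every thread that may run code which
 * temporarily changes the umask has to go through this same lock. */
static pthread_mutex_t umask_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for a call that briefly zeroes the umask.
 * Between the inner and the outer umask() call the process-wide umask
 * is 0, which is why unsynchronized concurrent callers are unsafe. */
int umask_sensitive_call(void *unused)
{
    mode_t oldumask;

    (void)unused;
    umask(oldumask = umask(0));
    return (int)oldumask;
}

/* Run fn(arg) while holding the lock, so that two umask-sensitive
 * calls routed through this helper can never interleave. */
int call_with_umask_lock(int (*fn)(void *), void *arg)
{
    int result;

    assert(fn != NULL);
    pthread_mutex_lock(&umask_lock);
    result = fn(arg);
    pthread_mutex_unlock(&umask_lock);
    return result;
}
```

In a real program, the callback passed to `call_with_umask_lock` would wrap the libarchive call in question. The lock only helps if every thread that creates files or directories during that window also respects it, because the umask is a property of the whole process.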