# Welcome to libarchive!

The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats. It
also includes implementations of the common `tar`, `cpio`, and `zcat`
command-line tools that use the libarchive library.

## Questions? Issues?

* http://www.libarchive.org is the home for ongoing
  libarchive development, including documentation,
  and links to the libarchive mailing lists.
* To report an issue, use the issue tracker at
  https://github.com/libarchive/libarchive/issues
* To submit an enhancement to libarchive, please
  submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls

## Contents of the Distribution

This distribution bundle includes the following major components:

* **libarchive**: a library for reading and writing streaming archives
* **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
* **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality
* **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
* **examples**: Some small example programs that you may find useful.
* **examples/minitar**: a compact sample demonstrating use of libarchive.
* **contrib**: Various items sent to me by third parties; please contact the authors with any questions.

The top-level directory contains the following information files:

* **NEWS** - highlights of recent changes
* **COPYING** - what you can do with this
* **INSTALL** - installation instructions
* **README** - this file
* **CMakeLists.txt** - input for "cmake" build tool, see INSTALL
* **configure** - configuration script, see INSTALL for details. If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`).

The following files in the top-level directory are used by the 'configure' script:

* `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers
* `Makefile.in`, `config.h.in` - templates used by configure script

## Documentation

In addition to the informational articles and documentation
in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki),
the distribution also includes a number of manual pages:

 * bsdtar.1 explains the use of the bsdtar program
 * bsdcpio.1 explains the use of the bsdcpio program
 * bsdcat.1 explains the use of the bsdcat program
 * libarchive.3 gives an overview of the library as a whole
 * archive_read.3,
   archive_write.3, archive_write_disk.3, and
   archive_read_disk.3 provide detailed calling sequences for the read
   and write APIs
 * archive_entry.3 details the "struct archive_entry" utility class
 * archive_internals.3 provides some insight into libarchive's
   internal structure and operation.
 * libarchive-formats.5 documents the file formats supported by the library
 * cpio.5, mtree.5, and tar.5 provide detailed information about these
   popular archive formats, including hard-to-find details about
   modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in
a number of different formats.

You should also read the copious comments in `archive.h` and the
source code for the sample programs for more details. Please let us
know about any errors or omissions you find.

## Supported Formats

Currently, the library automatically detects and reads the following formats:
  * Old V7 tar archives
  * POSIX ustar
  * GNU tar format (including GNU long filenames, long link names, and sparse files)
  * Solaris 9 extended tar format (including ACLs)
  * POSIX pax interchange format
  * POSIX octet-oriented cpio
  * SVR4 ASCII cpio
  * Binary cpio (big-endian or little-endian)
  * ISO9660 CD-ROM images (with optional Rock Ridge or Joliet extensions)
  * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * 7-Zip archives
  * Microsoft CAB format
  * LHA and LZH archives
  * RAR archives (with some limitations due to RAR's proprietary status)
  * XAR archives

The library also detects and handles any of the following before evaluating the archive:
  * uuencoded files
  * files with RPM wrapper
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression

The library can
create archives in any of the following formats:
  * POSIX ustar
  * POSIX pax interchange format
  * "restricted" pax format, which will create ustar archives except for
    entries that require pax extensions (for long filenames, ACLs, etc.).
  * Old GNU tar format
  * Old V7 tar format
  * POSIX octet-oriented cpio
  * SVR4 "newc" cpio
  * shar archives
  * ZIP archives (with uncompressed or "deflate" compressed entries)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * ISO9660 format
  * 7-Zip archives
  * XAR archives

When creating archives, the result can be filtered with any of the following:
  * uuencode
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression

## Notes about the Library Design

The following notes address many of the most common
questions we are asked about libarchive:

* This is a heavily stream-oriented system. That means that
  it is optimized to read or write the archive in a single
  pass from beginning to end.
  For example, this allows
  libarchive to process archives too large to store on disk
  by processing them on-the-fly as they are read from or
  written to a network or tape drive. This also makes
  libarchive useful for tools that need to produce
  archives on-the-fly (such as web servers that provide
  archived contents of a user's account).

* In-place modification and random access to the contents
  of an archive are not directly supported. For some formats,
  this is not an issue: For example, tar.gz archives are not
  designed for random access. In some other cases, libarchive
  can re-open an archive and scan it from the beginning quickly
  enough to provide the needed abilities even without true
  random access. Of course, some applications do require true
  random access; those applications should consider alternatives
  to libarchive.

* The library is designed to be extended with new compression and
  archive formats. The only requirement is that the format be
  readable or writable as a stream and that each archive entry be
  independent. There are articles on the libarchive Wiki explaining
  how to extend libarchive.

* On read, compression and format are always detected automatically.

* The same API is used for all formats; in particular, it's very
  easy for software using libarchive to transparently handle
  any of libarchive's archiving formats.

* Libarchive's automatic support for decompression can be used
  without archiving by explicitly selecting the "raw" and "empty"
  formats.

* I've attempted to minimize static link pollution. If you don't
  explicitly invoke a particular feature (such as support for a
  particular compression or format), it won't get pulled into
  statically-linked programs. In particular, if you don't explicitly
  enable support for a particular compression or decompression method,
  you won't need to link against the corresponding compression or
  decompression libraries. This also reduces the size of statically-linked
  binaries in environments where that matters.

* The library is generally _thread safe_ depending on the platform:
  it does not define any global variables of its own. However, some
  platforms do not provide fully thread-safe versions of key C library
  functions. On those platforms, libarchive will use the non-thread-safe
  functions. Patches to improve this are of great interest to us.

* In particular, libarchive's modules to read or write a directory
  tree do use `chdir()` to optimize the directory traversals.
  This
  can cause problems for programs that expect to do disk access from
  multiple threads. Of course, those modules are completely
  optional and you can use the rest of libarchive without them.

* The library is _not_ thread aware, however. It does no locking
  or thread management of any kind. If you create a libarchive
  object and need to access it from multiple threads, you will
  need to provide your own locking.

* On read, the library accepts whatever blocks you hand it.
  Your read callback is free to pass the library a byte at a time
  or mmap the entire archive and give it to the library at once.
  On write, the library always produces correctly-blocked output.

* The object-style approach allows you to have multiple archive streams
  open at once. bsdtar uses this in its "@archive" extension.

* The archive itself is read/written using callback functions.
  You can read an archive directly from an in-memory buffer or
  write it to a socket, if you wish. There are some utility
  functions to provide easy-to-use "open file," etc., capabilities.

* The read/write APIs are designed to allow individual entries
  to be read or written to any data source: You can create
  a block of data in memory and add it to a tar archive without
  first writing a temporary file.
  You can also read an entry from
  an archive and write the data directly to a socket. If you want
  to read/write entries to disk, there are convenience functions to
  make this especially easy.

* Note: The "pax interchange format" is a POSIX standard extended tar
  format that should be used when the older _ustar_ format is not
  appropriate. It has many advantages over other tar formats
  (including the legacy GNU tar format) and is widely supported by
  current tar implementations.