# Welcome to libarchive!

The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats. It
also includes implementations of the common `tar`, `cpio`, and `zcat`
command-line tools that use the libarchive library.

## Questions? Issues?

* https://www.libarchive.org is the home for ongoing
  libarchive development, including documentation,
  and links to the libarchive mailing lists.
* To report an issue, use the issue tracker at
  https://github.com/libarchive/libarchive/issues
* To submit an enhancement to libarchive, please
  submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls

## Contents of the Distribution

This distribution bundle includes the following major components:

* **libarchive**: a library for reading and writing streaming archives
* **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
* **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality
* **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
* **unzip**: the 'bsdunzip' program is a simple replacement tool for Info-ZIP's unzip
* **examples**: Some small example programs that you may find useful.
* **examples/minitar**: a compact sample demonstrating use of libarchive.
* **contrib**: Various items sent to me by third parties; please contact the authors with any questions.

The top-level directory contains the following information files:

* **NEWS** - highlights of recent changes
* **COPYING** - what you can do with this
* **INSTALL** - installation instructions
* **README** - this file
* **CMakeLists.txt** - input for "cmake" build tool, see INSTALL
* **configure** - configuration script, see INSTALL for details. If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`).

The following files in the top-level directory are used by the 'configure' script:

* `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers
* `Makefile.in`, `config.h.in` - templates used by configure script

## Documentation

In addition to the informational articles and documentation
in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki),
the distribution also includes a number of manual pages:

 * bsdtar.1 explains the use of the bsdtar program
 * bsdcpio.1 explains the use of the bsdcpio program
 * bsdcat.1 explains the use of the bsdcat program
 * libarchive.3 gives an overview of the library as a whole
 * archive_read.3, archive_write.3, archive_write_disk.3, and
   archive_read_disk.3 provide detailed calling sequences for the read
   and write APIs
 * archive_entry.3 details the "struct archive_entry" utility class
 * archive_internals.3 provides some insight into libarchive's
   internal structure and operation.
 * libarchive-formats.5 documents the file formats supported by the library
 * cpio.5, mtree.5, and tar.5 provide detailed information about these
   popular archive formats, including hard-to-find details about
   modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in
a number of different formats.

You should also read the copious comments in `archive.h` and the
source code for the sample programs for more details. Please let us
know about any errors or omissions you find.

## Supported Formats

Currently, the library automatically detects and reads the following formats:

 * Old V7 tar archives
 * POSIX ustar
 * GNU tar format (including GNU long filenames, long link names, and sparse files)
 * Solaris 9 extended tar format (including ACLs)
 * POSIX pax interchange format
 * POSIX octet-oriented cpio
 * SVR4 ASCII cpio
 * Binary cpio (big-endian or little-endian)
 * PWB binary cpio
 * ISO9660 CD-ROM images (with optional Rockridge or Joliet extensions)
 * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives)
 * ZIPX archives (with support for bzip2, ppmd8, lzma and xz compressed entries)
 * GNU and BSD 'ar' archives
 * 'mtree' format
 * 7-Zip archives (including archives that use zstandard compression)
 * Microsoft CAB format
 * LHA and LZH archives
 * RAR and RAR 5.0 archives (with some limitations due to RAR's proprietary status)
 * XAR archives

The library also detects and handles any of the following before evaluating the archive:

 * uuencoded files
 * files with RPM wrapper
 * gzip compression
 * bzip2 compression
 * compress/LZW compression
 * lzma, lzip, and xz compression
 * lz4 compression
 * lzop compression
 * zstandard compression

The library can create archives in any of the following formats:

 * POSIX ustar
 * POSIX pax interchange format
 * "restricted" pax format, which will create ustar archives except for
   entries that require pax extensions (for long filenames, ACLs, etc).
 * Old GNU tar format
 * Old V7 tar format
 * POSIX octet-oriented cpio
 * SVR4 "newc" cpio
 * Binary cpio (little-endian)
 * PWB binary cpio
 * shar archives
 * ZIP archives (with uncompressed or "deflate" compressed entries)
 * GNU and BSD 'ar' archives
 * 'mtree' format
 * ISO9660 format
 * 7-Zip archives
 * XAR archives

When creating archives, the result can be filtered with any of the following:

 * uuencode
 * gzip compression
 * bzip2 compression
 * compress/LZW compression
 * lzma, lzip, and xz compression
 * lz4 compression
 * lzop compression
 * zstandard compression

## Notes about the Library Design

The following notes address many of the most common
questions we are asked about libarchive:

* This is a heavily stream-oriented system. That means that
  it is optimized to read or write the archive in a single
  pass from beginning to end. For example, this allows
  libarchive to process archives too large to store on disk
  by processing them on-the-fly as they are read from or
  written to a network or tape drive. This also makes
  libarchive useful for tools that need to produce
  archives on-the-fly (such as webservers that provide
  archived contents of a user's account).

* In-place modification and random access to the contents
  of an archive are not directly supported. For some formats,
  this is not an issue: For example, tar.gz archives are not
  designed for random access. In some other cases, libarchive
  can re-open an archive and scan it from the beginning quickly
  enough to provide the needed abilities even without true
  random access. Of course, some applications do require true
  random access; those applications should consider alternatives
  to libarchive.

* The library is designed to be extended with new compression and
  archive formats. The only requirement is that the format be
  readable or writable as a stream and that each archive entry be
  independent. There are articles on the libarchive Wiki explaining
  how to extend libarchive.

* On read, compression and format are always detected automatically.

* The same API is used for all formats; it should be very
  easy for software using libarchive to transparently handle
  any of libarchive's archiving formats.

* Libarchive's automatic support for decompression can be used
  without archiving by explicitly selecting the "raw" and "empty"
  formats.

* I've attempted to minimize static link pollution. If you don't
  explicitly invoke a particular feature (such as support for a
  particular compression or format), it won't get pulled into
  statically-linked programs. In particular, if you don't explicitly
  enable a particular compression or decompression support, you won't
  need to link against the corresponding compression or decompression
  libraries. This also reduces the size of statically-linked
  binaries in environments where that matters.

* The library is generally _thread safe_ depending on the platform:
  it does not define any global variables of its own. However, some
  platforms do not provide fully thread-safe versions of key C library
  functions. On those platforms, libarchive will use the non-thread-safe
  functions. Patches to improve this are of great interest to us.

* The function `archive_write_disk_header()` is _not_ thread safe on
  POSIX machines and could lead to a security issue resulting in
  world-writable directories. Thus it must be mutexed by the calling code.
  This is due to calling `umask(oldumask = umask(0))`, which sets the
  umask for the whole process to 0 for a short time frame.
  If another thread calls the same function in parallel, the two calls
  can interleave and leave the process running with umask=0 for the
  remainder of its execution.
  Implicitly created directories would then get 777 permissions
  without the sticky bit.

* In particular, libarchive's modules to read or write a directory
  tree do use `chdir()` to optimize the directory traversals. This
  can cause problems for programs that expect to do disk access from
  multiple threads. Of course, those modules are completely
  optional and you can use the rest of libarchive without them.

* The library is _not_ thread aware, however. It does no locking
  or thread management of any kind. If you create a libarchive
  object and need to access it from multiple threads, you will
  need to provide your own locking.

* On read, the library accepts whatever blocks you hand it.
  Your read callback is free to pass the library a byte at a time
  or mmap the entire archive and give it to the library at once.
  On write, the library always produces correctly-blocked output.

* The object-style approach allows you to have multiple archive streams
  open at once. bsdtar uses this in its "@archive" extension.

* The archive itself is read/written using callback functions.
  You can read an archive directly from an in-memory buffer or
  write it to a socket, if you wish. There are some utility
  functions to provide easy-to-use "open file," etc., capabilities.

* The read/write APIs are designed to allow individual entries
  to be read or written to any data source: You can create
  a block of data in memory and add it to a tar archive without
  first writing a temporary file. You can also read an entry from
  an archive and write the data directly to a socket. If you want
  to read/write entries to disk, there are convenience functions to
  make this especially easy.

* Note: The "pax interchange format" is a POSIX standard extended tar
  format that should be used when the older _ustar_ format is not
  appropriate. It has many advantages over other tar formats
  (including the legacy GNU tar format) and is widely supported by
  current tar implementations.