# Welcome to libarchive!

The libarchive project develops a portable, efficient C library that
can read and write streaming archives in a variety of formats.  It
also includes implementations of the common `tar`, `cpio`, and `zcat`
command-line tools that use the libarchive library.

## Questions?  Issues?

* https://www.libarchive.org is the home for ongoing
  libarchive development, including documentation,
  and links to the libarchive mailing lists.
* To report an issue, use the issue tracker at
  https://github.com/libarchive/libarchive/issues
* To submit an enhancement to libarchive, please
  submit a pull request via GitHub: https://github.com/libarchive/libarchive/pulls

## Contents of the Distribution

This distribution bundle includes the following major components:

* **libarchive**: a library for reading and writing streaming archives
* **tar**: the 'bsdtar' program is a full-featured 'tar' implementation built on libarchive
* **cpio**: the 'bsdcpio' program is a different interface to essentially the same functionality
* **cat**: the 'bsdcat' program is a simple replacement tool for zcat, bzcat, xzcat, and such
* **unzip**: the 'bsdunzip' program is a simple replacement tool for Info-ZIP's unzip
* **examples**: Some small example programs that you may find useful.
* **examples/minitar**: a compact sample demonstrating use of libarchive.
* **contrib**: Various items sent to me by third parties; please contact the authors with any questions.

The top-level directory contains the following information files:

* **NEWS** - highlights of recent changes
* **COPYING** - what you can do with this
* **INSTALL** - installation instructions
* **README** - this file
* **CMakeLists.txt** - input for "cmake" build tool, see INSTALL
* **configure** - configuration script, see INSTALL for details.  If your copy of the source lacks a `configure` script, you can try to construct it by running the script in `build/autogen.sh` (or use `cmake`).

The following files in the top-level directory are used by the 'configure' script:

* `Makefile.am`, `aclocal.m4`, `configure.ac` - used to build this distribution, only needed by maintainers
* `Makefile.in`, `config.h.in` - templates used by configure script

## Documentation

In addition to the informational articles and documentation
in the online [libarchive Wiki](https://github.com/libarchive/libarchive/wiki),
the distribution also includes a number of manual pages:

 * bsdtar.1 explains the use of the bsdtar program
 * bsdcpio.1 explains the use of the bsdcpio program
 * bsdcat.1 explains the use of the bsdcat program
 * libarchive.3 gives an overview of the library as a whole
 * archive_read.3, archive_write.3, archive_write_disk.3, and
   archive_read_disk.3 provide detailed calling sequences for the read
   and write APIs
 * archive_entry.3 details the "struct archive_entry" utility class
 * archive_internals.3 provides some insight into libarchive's
   internal structure and operation.
 * libarchive-formats.5 documents the file formats supported by the library
 * cpio.5, mtree.5, and tar.5 provide detailed information about these
   popular archive formats, including hard-to-find details about
   modern cpio and tar variants.

The manual pages above are provided in the 'doc' directory in
a number of different formats.

You should also read the copious comments in `archive.h` and the
source code for the sample programs for more details.  Please let us
know about any errors or omissions you find.

## Supported Formats

Currently, the library automatically detects and reads the following formats:

  * Old V7 tar archives
  * POSIX ustar
  * GNU tar format (including GNU long filenames, long link names, and sparse files)
  * Solaris 9 extended tar format (including ACLs)
  * POSIX pax interchange format
  * POSIX octet-oriented cpio
  * SVR4 ASCII cpio
  * Binary cpio (big-endian or little-endian)
  * PWB binary cpio
  * ISO9660 CD-ROM images (with optional Rock Ridge or Joliet extensions)
  * ZIP archives (with uncompressed or "deflate" compressed entries, including support for encrypted Zip archives)
  * ZIPX archives (with support for bzip2, ppmd8, lzma and xz compressed entries)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * 7-Zip archives (including archives that use zstandard compression)
  * Microsoft CAB format
  * LHA and LZH archives
  * RAR and RAR 5.0 archives (with some limitations due to RAR's proprietary status)
  * XAR archives

The library also detects and handles any of the following before evaluating the archive:

  * uuencoded files
  * files with RPM wrapper
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression
  * zstandard compression

The library can create archives in any of the following formats:

  * POSIX ustar
  * POSIX pax interchange format
  * "restricted" pax format, which will create ustar archives except for
    entries that require pax extensions (for long filenames, ACLs, etc.).
  * Old GNU tar format
  * Old V7 tar format
  * POSIX octet-oriented cpio
  * SVR4 "newc" cpio
  * Binary cpio (little-endian)
  * PWB binary cpio
  * shar archives
  * ZIP archives (with uncompressed or "deflate" compressed entries)
  * GNU and BSD 'ar' archives
  * 'mtree' format
  * ISO9660 format
  * 7-Zip archives
  * XAR archives

When creating archives, the result can be filtered with any of the following:

  * uuencode
  * gzip compression
  * bzip2 compression
  * compress/LZW compression
  * lzma, lzip, and xz compression
  * lz4 compression
  * lzop compression
  * zstandard compression

## Notes about the Library Design

The following notes address many of the most common
questions we are asked about libarchive:

* This is a heavily stream-oriented system.  That means that
  it is optimized to read or write the archive in a single
  pass from beginning to end.  For example, this allows
  libarchive to process archives too large to store on disk
  by processing them on-the-fly as they are read from or
  written to a network or tape drive.  This also makes
  libarchive useful for tools that need to produce
  archives on-the-fly (such as webservers that provide
  archived contents of a user's account).

* In-place modification and random access to the contents
  of an archive are not directly supported.  For some formats,
  this is not an issue: For example, tar.gz archives are not
  designed for random access.  In some other cases, libarchive
  can re-open an archive and scan it from the beginning quickly
  enough to provide the needed abilities even without true
  random access.  Of course, some applications do require true
  random access; those applications should consider alternatives
  to libarchive.

* The library is designed to be extended with new compression and
  archive formats.  The only requirement is that the format be
  readable or writable as a stream and that each archive entry be
  independent.  There are articles on the libarchive Wiki explaining
  how to extend libarchive.

* On read, compression and format are always detected automatically.

* The same API is used for all formats; it should be very
  easy for software using libarchive to transparently handle
  any of libarchive's archiving formats.

* Libarchive's automatic support for decompression can be used
  without archiving by explicitly selecting the "raw" and "empty"
  formats.

* I've attempted to minimize static link pollution.  If you don't
  explicitly invoke a particular feature (such as support for a
  particular compression or format), it won't get pulled in to
  statically-linked programs.  In particular, if you don't explicitly
  enable a particular compression or decompression support, you won't
  need to link against the corresponding compression or decompression
  libraries.  This also reduces the size of statically-linked
  binaries in environments where that matters.

* The library is generally _thread safe_ depending on the platform:
  it does not define any global variables of its own.  However, some
  platforms do not provide fully thread-safe versions of key C library
  functions.  On those platforms, libarchive will use the non-thread-safe
  functions.  Patches to improve this are of great interest to us.

* The function `archive_write_disk_header()` is _not_ thread safe on
  POSIX machines and could lead to a security issue resulting in
  world-writable directories.  Thus it must be mutexed by the calling code.
  This is due to calling `umask(oldumask = umask(0))`, which sets the
  umask for the whole process to 0 for a short time frame.
  If another thread calls the same function in parallel, the two calls
  can interleave and leave the process running with umask=0 for the
  remainder of its execution.
  Implicitly created directories will then have 777
  permissions without the sticky bit.

* In particular, libarchive's modules to read or write a directory
  tree do use `chdir()` to optimize the directory traversals.  This
  can cause problems for programs that expect to do disk access from
  multiple threads.  Of course, those modules are completely
  optional and you can use the rest of libarchive without them.

* The library is _not_ thread aware, however.  It does no locking
  or thread management of any kind.  If you create a libarchive
  object and need to access it from multiple threads, you will
  need to provide your own locking.

* On read, the library accepts whatever blocks you hand it.
  Your read callback is free to pass the library a byte at a time
  or mmap the entire archive and give it to the library at once.
  On write, the library always produces correctly-blocked output.

* The object-style approach allows you to have multiple archive streams
  open at once.  bsdtar uses this in its "@archive" extension.

* The archive itself is read/written using callback functions.
  You can read an archive directly from an in-memory buffer or
  write it to a socket, if you wish.  There are some utility
  functions to provide easy-to-use "open file," etc, capabilities.

* The read/write APIs are designed to allow individual entries
  to be read or written to any data source:  You can create
  a block of data in memory and add it to a tar archive without
  first writing a temporary file.  You can also read an entry from
  an archive and write the data directly to a socket.  If you want
  to read/write entries to disk, there are convenience functions to
  make this especially easy.

* Note: The "pax interchange format" is a POSIX standard extended tar
  format that should be used when the older _ustar_ format is not
  appropriate.  It has many advantages over other tar formats
  (including the legacy GNU tar format) and is widely supported by
  current tar implementations.