
XZ Utils
========

    0. Overview
    1. Documentation
       1.1. Overall documentation
       1.2. Documentation for command-line tools
       1.3. Documentation for liblzma
    2. Version numbering
    3. Reporting bugs
    4. Translations
       4.1. Testing translations
    5. Other implementations of the .xz format
    6. Contact information


0. Overview
-----------

    XZ Utils provide a general-purpose data-compression library plus
    command-line tools. The native file format is the .xz format, but
    the legacy .lzma format is also supported. The .xz format supports
    multiple compression algorithms, which are called "filters" in the
    context of XZ Utils. The primary filter is currently LZMA2. With
    typical files, XZ Utils create about 30 % smaller files than gzip.

    To ease adapting support for the .xz format into existing applications
    and scripts, the API of liblzma is somewhat similar to the API of the
    popular zlib library. For the same reason, the command-line tool xz
    has a command-line syntax similar to that of gzip.
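
    As a rough sketch of the gzip-like syntax, the commands below
    compress a file, test the result, and decompress it to standard
    output. The file names and the temporary directory are made up for
    illustration, and the example assumes that xz is found in PATH:

```shell
# Sketch only: the file names are made up for illustration.
workdir=$(mktemp -d)
printf 'hello, xz\n' > "$workdir/sample.txt"

if ! command -v xz >/dev/null 2>&1; then
    result=skipped                      # xz is not installed
elif xz -k "$workdir/sample.txt" &&     # like "gzip -k": keep the input file
     xz -t "$workdir/sample.txt.xz" &&  # like "gzip -t": test integrity
     xz -dc "$workdir/sample.txt.xz" |  # like "gzip -dc": decompress to stdout
         cmp -s - "$workdir/sample.txt" # compare with the original
then
    result=ok
else
    result=failed
fi

echo "round trip: $result"
rm -rf "$workdir"
```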

    When aiming for the highest compression ratio, the LZMA2 encoder uses
    a lot of CPU time and may use, depending on the settings, even
    hundreds of megabytes of RAM. However, in fast modes, the LZMA2 encoder
    competes with bzip2 in compression speed, RAM usage, and compression
    ratio.

    LZMA2 is reasonably fast to decompress. It is a little slower than
    gzip, but a lot faster than bzip2. Being fast to decompress means
    that the .xz format is especially nice when the same file will be
    decompressed very many times (usually on different computers), which
    is the case e.g. when distributing software packages. In such
    situations, it's not too bad if the compression takes some time,
    since that needs to be done only once to benefit many people.

    With some file types, combining (or "chaining") LZMA2 with an
    additional filter can improve the compression ratio. A filter chain may
    contain up to four filters, although usually only one or two are used.
    For example, putting a BCJ (Branch/Call/Jump) filter before LZMA2
    in the filter chain can improve the compression ratio of executable
    files.
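
    As an illustration only, a custom filter chain can be given on the
    xz command line. The sketch below assumes an xz build that includes
    the x86 BCJ filter, and it uses a copy of the xz binary itself as a
    stand-in for an executable file. The file names are made up:

```shell
# Sketch only: "program" is a made-up name for the copied executable.
workdir=$(mktemp -d)

if ! command -v xz >/dev/null 2>&1; then
    chain_result=skipped                # xz is not installed
elif cp "$(command -v xz)" "$workdir/program" &&
     # Filter chain: x86 BCJ first, then LZMA2 with preset 6 settings.
     xz -k --x86 --lzma2=preset=6 "$workdir/program" &&
     # Decompression needs no filter options; the chain is stored
     # in the .xz file.
     xz -dc "$workdir/program.xz" | cmp -s - "$workdir/program"
then
    chain_result=ok
else
    chain_result=failed
fi

echo "filter chain round trip: $chain_result"
rm -rf "$workdir"
```

    Whether the BCJ filter helps depends on the input; it is meant for
    x86 machine code and gives no benefit on other kinds of data.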

    Since the .xz format allows adding new filter IDs, it is possible that
    some day there will be a filter that is, for example, much faster to
    compress than LZMA2 (but probably with worse compression ratio).
    Similarly, it is possible that some day there is a filter that will
    compress better than LZMA2.

    XZ Utils supports multithreaded compression. XZ Utils doesn't support
    multithreaded decompression yet. It has been planned though and taken
    into account when designing the .xz file format. In the future, files
    that were created in threaded mode can be decompressed in threaded
    mode too.


1. Documentation
----------------

1.1. Overall documentation

    README                This file

    INSTALL.generic       Generic install instructions for those not
                          familiar with packages using GNU Autotools
    INSTALL               Installation instructions specific to XZ Utils
    PACKAGERS             Information for packagers of XZ Utils

    COPYING               XZ Utils copyright and license information
    COPYING.0BSD          BSD Zero Clause License
    COPYING.GPLv2         GNU General Public License version 2
    COPYING.GPLv3         GNU General Public License version 3
    COPYING.LGPLv2.1      GNU Lesser General Public License version 2.1

    AUTHORS               The main authors of XZ Utils
    THANKS                Incomplete list of people who have helped make
                          this software
    NEWS                  User-visible changes between XZ Utils releases
    ChangeLog             Detailed list of changes (commit log)
    TODO                  Known bugs and some sort of to-do list

    Note that only some of the above files are included in binary
    packages.


1.2. Documentation for command-line tools

    The command-line tools are documented as man pages. In source code
    releases (and possibly also in some binary packages), the man pages
    are also provided in plain text (ASCII only) format in the directory
    "doc/man" to make the man pages more accessible to those whose
    operating system doesn't provide an easy way to view man pages.


1.3. Documentation for liblzma

    The liblzma API headers include short docs about each function
    and data type as Doxygen tags. These docs should be quite OK as
    a quick reference.

    There are a few example/tutorial programs that should help in
    getting started with liblzma. In the source package the examples
    are in "doc/examples" and in binary packages they may be under
    "examples" in the same directory as this README.

    Since the liblzma API has similarities to the zlib API, some people
    may find it useful to read the zlib docs and tutorial too:

        https://zlib.net/manual.html
        https://zlib.net/zlib_how.html


2. Version numbering
--------------------

    The version number format of XZ Utils is X.Y.ZS:

      - X is the major version. When this is incremented, the library
        API and ABI break.

      - Y is the minor version. It is incremented when new features
        are added without breaking the existing API or ABI. An even Y
        indicates a stable release and an odd Y indicates unstable
        (alpha or beta version).

      - Z is the revision. This has a different meaning for stable and
        unstable releases:

          * Stable: Z is incremented when bugs get fixed without adding
            any new features. This is intended to be convenient for
            downstream distributors that want bug fixes but don't want
            any new features to minimize the risk of introducing new bugs.

          * Unstable: Z is just a counter. API or ABI of features added
            in earlier unstable releases having the same X.Y may break.

      - S indicates stability of the release. It is missing from the
        stable releases, where Y is an even number. When Y is odd, S
        is either "alpha" or "beta" to make it very clear that such
        versions are not stable releases. The same X.Y.Z combination is
        not used for more than one stability level, i.e. after X.Y.Zalpha,
        the next version can be X.Y.(Z+1)beta but not X.Y.Zbeta.
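
    The scheme above can be sketched as a small shell function. The
    function name classify_xz_version is hypothetical and isn't part of
    XZ Utils; it only checks the shape of the string and the even/odd
    rule for Y, and the version strings are example inputs:

```shell
# Hypothetical helper: classify a version string of the form X.Y.ZS.
# Prints "stable", "alpha", "beta", or "invalid".
classify_xz_version() {
    version=$1

    # Split off the optional stability suffix S.
    numeric=${version%alpha}
    numeric=${numeric%beta}
    suffix=${version#"$numeric"}

    # The rest must consist of digits and single dots only.
    case $numeric in
        *[!0-9.]* | *..* | .* | *.) echo invalid; return 1 ;;
    esac

    # Require exactly three fields and pick out Y.
    rest=${numeric#*.}                  # "Y.Z" when the input is X.Y.Z
    case $rest in
        *.*.*) echo invalid; return 1 ;;
        *.*)   minor=${rest%%.*} ;;
        *)     echo invalid; return 1 ;;
    esac

    # Even Y: stable, no suffix. Odd Y: suffix "alpha" or "beta".
    case $minor in
        *[02468])
            if [ -z "$suffix" ]; then
                echo stable
            else
                echo invalid; return 1
            fi ;;
        *)
            if [ -n "$suffix" ]; then
                echo "$suffix"
            else
                echo invalid; return 1
            fi ;;
    esac
}

classify_xz_version 5.4.6        # prints "stable"
classify_xz_version 5.3.1alpha   # prints "alpha"
```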


3. Reporting bugs
-----------------

    Naturally it is easiest for me if you already know what causes the
    unexpected behavior. Even better if you have a patch to propose.
    However, quite often the reason for unexpected behavior is unknown,
    so here are a few things to do before sending a bug report:

      1. Try to create a small example that reproduces the issue.

      2. Compile XZ Utils with debugging code using configure switches
         --enable-debug and, if possible, --disable-shared. If you are
         using GCC, use CFLAGS='-O0 -ggdb3'. Don't strip the resulting
         binaries.

      3. Turn on core dumps. The exact command depends on your shell;
         for example in GNU bash it is done with "ulimit -c unlimited",
         and in tcsh with "limit coredumpsize unlimited".

      4. Try to reproduce the suspected bug. If you get an "assertion
         failed" message, be sure to include the complete message in
         your bug report. If the application leaves a core dump, get a
         backtrace using gdb:
           $ gdb /path/to/app-binary   # Load the app into the debugger.
           (gdb) core core   # Open the core dump.
           (gdb) bt   # Print the backtrace. Copy & paste to bug report.
           (gdb) quit   # Quit gdb.

    Report your bug via email or IRC (see Contact information below).
    Don't send core dump files or any executables. If you have a small
    example file(s) (total size less than 256 KiB), please include
    it/them as an attachment. If you have bigger test files, put them
    online somewhere and include a URL to the file(s) in the bug report.

    Always include the exact version number of XZ Utils in the bug report.
    If you are using a snapshot from the git repository, use "git describe"
    to get the exact snapshot version. If you are using XZ Utils shipped
    in an operating system distribution, mention the distribution name,
    distribution version, and exact xz package version; if you cannot
    repeat the bug with the code compiled from unpatched source code,
    you probably need to report a bug to your distribution's bug tracking
    system.
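
    A minimal sketch for collecting the version details mentioned above.
    The function name is made up for this example. Run it in the xz
    source tree if you use a git snapshot, since "git describe" reports
    on whatever repository the current directory belongs to:

```shell
# Hypothetical helper: print version details for a bug report.
print_bug_report_versions() {
    if command -v xz >/dev/null 2>&1; then
        xz --version                    # versions of xz and liblzma
    else
        echo "xz: not found in PATH"
    fi

    # Only meaningful when the current directory is an xz git checkout.
    if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
        git describe 2>/dev/null || echo "git describe: no usable tag found"
    fi
}

print_bug_report_versions
```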


4. Translations
---------------

    The xz command-line tool and all man pages can be translated.
    The translations are handled via the Translation Project. If you
    wish to help translate xz, please join the Translation Project:

        https://translationproject.org/html/translators.html

    Updates to translations won't be accepted by methods that bypass
    the Translation Project because there is a risk of duplicate work:
    translation updates made in the xz repository aren't seen by the
    translators in the Translation Project. If you have found bugs in
    a translation, please report them to the Language-Team address
    which can be found near the beginning of the PO file.

    If you find language problems in the original English strings,
    feel free to suggest improvements. Ask if something is unclear.


4.1. Testing translations

    Testing can be done by installing xz into a temporary directory.

    If building from the Git repository (not a tarball), generate the
    Autotools files:

        ./autogen.sh

    Create a subdirectory for the build files. The tmp-build directory
    can be deleted after testing.

        mkdir tmp-build
        cd tmp-build
        ../configure --disable-shared --enable-debug --prefix=$PWD/inst

    Edit the .po file in the po directory. Then build and install to
    the "tmp-build/inst" directory, and use translation.bash to see
    how some of the messages look. Repeat these steps if needed:

        make -C po update-po
        make -j"$(nproc)" install
        bash ../debug/translation.bash | less
        bash ../debug/translation.bash | less -S  # For --list outputs

    To test other languages, set the LANGUAGE environment variable
    before running translation.bash. The value should match the PO file
    name without the .po suffix. Example:

        export LANGUAGE=fi


5. Other implementations of the .xz format
------------------------------------------

    7-Zip and the p7zip port of 7-Zip support the .xz format starting
    from the version 9.00alpha.

        https://7-zip.org/
        https://p7zip.sourceforge.net/

    XZ Embedded is a limited implementation written for use in the Linux
    kernel, but it is also suitable for other embedded use.

        https://tukaani.org/xz/embedded.html

    XZ for Java is a complete implementation written in pure Java.

        https://tukaani.org/xz/java.html


6. Contact information
----------------------

    XZ Utils in general:
      - Home page: https://tukaani.org/xz/
      - Email to maintainer(s): xz@tukaani.org
      - IRC: #tukaani on Libera Chat
      - GitHub: https://github.com/tukaani-project/xz

    Lead maintainer:
      - Email: Lasse Collin <lasse.collin@tukaani.org>
      - IRC: Larhzu on Libera Chat