XZ Utils
========

    0. Overview
    1. Documentation
       1.1. Overall documentation
       1.2. Documentation for command-line tools
       1.3. Documentation for liblzma
    2. Version numbering
    3. Reporting bugs
    4. Translations
    5. Other implementations of the .xz format
    6. Contact information


0. Overview
-----------

    XZ Utils provide a general-purpose data-compression library plus
    command-line tools. The native file format is the .xz format, but
    the legacy .lzma format is also supported. The .xz format supports
    multiple compression algorithms, which are called "filters" in the
    context of XZ Utils. The primary filter is currently LZMA2. With
    typical files, XZ Utils create about 30 % smaller files than gzip.
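
    As a small illustration of the two file formats mentioned above, the
    sketch below uses Python's standard "lzma" module, which is a binding
    to liblzma and not part of XZ Utils itself. The sample data and the
    variable names are made up for the example.

```python
import lzma

# Sample input chosen only for illustration; any bytes would do.
data = b"XZ Utils example payload\n" * 1000

# Native .xz container (LZMA2 is the default filter).
xz_blob = lzma.compress(data, format=lzma.FORMAT_XZ)

# Legacy .lzma container (called the "alone" format in liblzma).
lzma_blob = lzma.compress(data, format=lzma.FORMAT_ALONE)

# Every .xz file starts with the same six magic bytes.
XZ_MAGIC = b"\xfd7zXZ\x00"
is_xz = xz_blob.startswith(XZ_MAGIC)

# With the default FORMAT_AUTO, the decoder detects either container.
restored_from_xz = lzma.decompress(xz_blob)
restored_from_lzma = lzma.decompress(lzma_blob)
```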

    To ease adding support for the .xz format to existing applications
    and scripts, the API of liblzma is somewhat similar to the API of the
    popular zlib library. For the same reason, the command-line tool xz
    has a command-line syntax similar to that of gzip.

    When aiming for the highest compression ratio, the LZMA2 encoder uses
    a lot of CPU time and may use, depending on the settings, even
    hundreds of megabytes of RAM. However, in fast modes, the LZMA2 encoder
    competes with bzip2 in compression speed, RAM usage, and compression
    ratio.
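
    The trade-off between fast and thorough settings can be tried out
    with a sketch like the one below. It again assumes Python's standard
    "lzma" binding to liblzma, and the input data is invented for the
    example. Presets up to 9 exist, but the higher ones need
    considerably more memory, so the sketch stops at 6 with the
    "extreme" flag. No particular sizes are promised; the numbers depend
    on the input.

```python
import lzma

# Illustrative input; real-world ratios depend heavily on the data.
data = b"".join(b"record %d: status=ok\n" % i for i in range(20000))

# Preset 0 is the fastest mode. Higher presets, optionally combined with
# PRESET_EXTREME, spend more CPU time and memory on a better ratio.
fast = lzma.compress(data, preset=0)
thorough = lzma.compress(data, preset=6 | lzma.PRESET_EXTREME)

sizes = {"input": len(data), "preset 0": len(fast), "preset 6e": len(thorough)}
for name, size in sizes.items():
    print(f"{name:>10}: {size} bytes")
```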

    LZMA2 is reasonably fast to decompress. It is a little slower than
    gzip, but a lot faster than bzip2. Being fast to decompress means
    that the .xz format is especially well suited to cases where the same
    file will be decompressed many times (usually on different computers),
    as happens e.g. when distributing software packages. In such
    situations, it's not too bad if the compression takes some time,
    since that needs to be done only once to benefit many people.

    With some file types, combining (or "chaining") LZMA2 with an
    additional filter can improve the compression ratio. A filter chain may
    contain up to four filters, although usually only one or two are used.
    For example, putting a BCJ (Branch/Call/Jump) filter before LZMA2
    in the filter chain can improve the compression ratio of executable
    files.
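
    A filter chain like the one described above can be sketched with
    Python's standard "lzma" module (a liblzma binding, not part of
    XZ Utils). The input here is only a stand-in for executable code, so
    the sketch shows how a chain is specified rather than how much it
    helps.

```python
import lzma

# Stand-in for executable code; a real x86 binary is where BCJ helps.
data = bytes(range(256)) * 512

# BCJ (x86) first, LZMA2 last in the chain.
bcj_chain = [
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]
with_bcj = lzma.compress(data, format=lzma.FORMAT_XZ, filters=bcj_chain)

# A chain may hold up to four filters; Delta + BCJ + LZMA2 uses three.
three_filters = [
    {"id": lzma.FILTER_DELTA, "dist": 4},
    {"id": lzma.FILTER_X86},
    {"id": lzma.FILTER_LZMA2, "preset": 6},
]
with_three = lzma.compress(data, format=lzma.FORMAT_XZ, filters=three_filters)

# The chain is recorded in the .xz headers, so decoding needs no options.
roundtrip_ok = (lzma.decompress(with_bcj) == data
                and lzma.decompress(with_three) == data)
```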

    Since the .xz format allows adding new filter IDs, it is possible that
    some day there will be a filter that is, for example, much faster to
    compress than LZMA2 (but probably with worse compression ratio).
    Similarly, it is possible that some day there will be a filter that
    compresses better than LZMA2.

    XZ Utils supports multithreaded compression. XZ Utils doesn't support
    multithreaded decompression yet. It has been planned though and was
    taken into account when designing the .xz file format. In the future,
    files that were created in threaded mode can be decompressed in
    threaded mode too.


1. Documentation
----------------

1.1. Overall documentation

    README              This file

    INSTALL.generic     Generic install instructions for those not familiar
                        with packages using GNU Autotools
    INSTALL             Installation instructions specific to XZ Utils
    PACKAGERS           Information for packagers of XZ Utils

    COPYING             XZ Utils copyright and license information
    COPYING.GPLv2       GNU General Public License version 2
    COPYING.GPLv3       GNU General Public License version 3
    COPYING.LGPLv2.1    GNU Lesser General Public License version 2.1

    AUTHORS             The main authors of XZ Utils
    THANKS              Incomplete list of people who have helped make
                        this software
    NEWS                User-visible changes between XZ Utils releases
    ChangeLog           Detailed list of changes (commit log)
    TODO                Known bugs and some sort of to-do list

    Note that only some of the above files are included in binary
    packages.


1.2. Documentation for command-line tools

    The command-line tools are documented as man pages. In source code
    releases (and possibly also in some binary packages), the man pages
    are also provided in plain text (ASCII only) and PDF formats in the
    directory "doc/man" to make the man pages more accessible to those
    whose operating system doesn't provide an easy way to view man pages.


1.3. Documentation for liblzma

    The liblzma API headers include short docs about each function
    and data type as Doxygen tags. These docs should be quite OK as
    a quick reference.

    There are a few example/tutorial programs that should help in
    getting started with liblzma. In the source package the examples
    are in "doc/examples" and in binary packages they may be under
    "examples" in the same directory as this README.

    Since the liblzma API has similarities to the zlib API, some people
    may find it useful to read the zlib docs and tutorial too:

        https://zlib.net/manual.html
        https://zlib.net/zlib_how.html


2. Version numbering
--------------------

    The version number format of XZ Utils is X.Y.ZS:

      - X is the major version. When this is incremented, the library
        API and ABI break.

      - Y is the minor version. It is incremented when new features
        are added without breaking the existing API or ABI. An even Y
        indicates a stable release and an odd Y indicates unstable
        (alpha or beta version).

      - Z is the revision. This has a different meaning for stable and
        unstable releases:

          * Stable: Z is incremented when bugs get fixed without adding
            any new features. This is intended to be convenient for
            downstream distributors that want bug fixes but don't want
            any new features to minimize the risk of introducing new bugs.

          * Unstable: Z is just a counter. API or ABI of features added
            in earlier unstable releases having the same X.Y may break.

      - S indicates stability of the release. It is missing from the
        stable releases, where Y is an even number. When Y is odd, S
        is either "alpha" or "beta" to make it very clear that such
        versions are not stable releases. The same X.Y.Z combination is
        not used for more than one stability level, i.e. after X.Y.Zalpha,
        the next version can be X.Y.(Z+1)beta but not X.Y.Zbeta.
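
    The numbering rules above can be expressed as a short parser. This is
    only a sketch: the names XzVersion and parse_version are hypothetical
    helpers invented for the example and are not part of XZ Utils.

```python
import re
from dataclasses import dataclass

# Hypothetical helper, not part of XZ Utils: parses "X.Y.ZS" strings.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(alpha|beta)?")


@dataclass(frozen=True)
class XzVersion:
    major: int
    minor: int
    revision: int
    stability: str  # "alpha", "beta", or "" for stable releases

    @property
    def is_stable(self) -> bool:
        # Stable releases have an even Y and no S suffix.
        return self.minor % 2 == 0 and self.stability == ""


def parse_version(text: str) -> XzVersion:
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an X.Y.ZS version: {text!r}")
    major, minor, revision, stability = match.groups()
    version = XzVersion(int(major), int(minor), int(revision), stability or "")
    # An odd Y must carry "alpha" or "beta"; an even Y must not.
    if (version.minor % 2 == 1) != (version.stability != ""):
        raise ValueError(f"Y parity and suffix disagree: {text!r}")
    return version
```

    For example, parse_version("5.4.6") is treated as stable, while
    parse_version("5.3.1alpha") is not, and a string such as "5.4.6beta"
    is rejected because an even Y never carries a suffix.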


3. Reporting bugs
-----------------

    Naturally it is easiest for me if you already know what causes the
    unexpected behavior. Even better if you have a patch to propose.
    However, quite often the reason for unexpected behavior is unknown,
    so here are a few things to do before sending a bug report:

      1. Try to create a small example that reproduces the issue.

      2. Compile XZ Utils with debugging code using configure switches
         --enable-debug and, if possible, --disable-shared. If you are
         using GCC, use CFLAGS='-O0 -ggdb3'. Don't strip the resulting
         binaries.

      3. Turn on core dumps. The exact command depends on your shell;
         for example in GNU bash it is done with "ulimit -c unlimited",
         and in tcsh with "limit coredumpsize unlimited".

      4. Try to reproduce the suspected bug. If you get an "assertion
         failed" message, be sure to include the complete message in your
         bug report. If the application leaves a coredump, get a backtrace
         using gdb:
           $ gdb /path/to/app-binary   # Load the app into the debugger.
           (gdb) core core   # Open the coredump.
           (gdb) bt   # Print the backtrace. Copy & paste to bug report.
           (gdb) quit   # Quit gdb.

    Report your bug via email or IRC (see Contact information below).
    Don't send core dump files or any executables. If you have small
    example files (total size less than 256 KiB), please include them
    as attachments. If you have bigger test files, put them online
    somewhere and include a URL to the files in the bug report.

    Always include the exact version number of XZ Utils in the bug report.
    If you are using a snapshot from the git repository, use "git describe"
    to get the exact snapshot version. If you are using XZ Utils shipped
    in an operating system distribution, mention the distribution name,
    distribution version, and exact xz package version; if you cannot
    repeat the bug with the code compiled from unpatched source code,
    you probably need to report a bug to your distribution's bug tracking
    system.


4. Translations
---------------

    The xz command line tool and all man pages can be translated.
    The translations are handled via the Translation Project. If you
    wish to help translate xz, please join the Translation Project:

        https://translationproject.org/html/translators.html

    Below are notes and testing instructions specific to xz
    translations.

    Testing can be done by installing xz into a temporary directory:

        ./configure --disable-shared --prefix=/tmp/xz-test
        # <Edit the .po file in the po directory.>
        make -C po update-po
        make install
        bash debug/translation.bash | less
        bash debug/translation.bash | less -S  # For --list outputs

    Repeat the above as needed (no need to re-run configure though).

    Note especially the following:

      - The output of --help and --long-help must look nice on
        an 80-column terminal. It's OK to add extra lines if needed.

      - In contrast, don't add extra lines to error messages and such.
        They are often preceded with e.g. a filename on the same line,
        so you have no way to predict where to put a \n. Let the terminal
        do the wrapping even if it looks ugly. Adding new lines will be
        even uglier in the generic case even if it looks nice in a few
        limited examples.

      - Be careful with column alignment in tables and table-like output
        (--list, --list --verbose --verbose, --info-memory, --help, and
        --long-help):

          * All descriptions of options in --help should start in the
            same column (but it doesn't need to be the same column as
            in the English messages; just be consistent if you change it).
            Check that both --help and --long-help look OK, since they
            share several strings.

          * --list --verbose and --info-memory print lines that have
            the format "Description:   %s". If you need a longer
            description, you can put extra space between the colon
            and %s. Then you may need to add extra space to other
            strings too so that the result as a whole looks good (all
            values start at the same column).

          * The columns of the actual tables in --list --verbose --verbose
            should be aligned properly. Abbreviate if necessary. It might
            be good to keep at least 2 or 3 spaces between column headings
            and avoid spaces in the headings so that the columns stand out
            better, but this is a matter of opinion. Do what you think
            looks best.

      - Be careful to put a period at the end of a sentence when the
        original version has it, and don't put it when the original
        doesn't have it. Similarly, be careful with \n characters
        at the beginning and end of the strings.

      - Read the TRANSLATORS comments that have been extracted from the
        source code and included in xz.pot. Some comments suggest
        testing with a specific command which needs an .xz file. You
        may use e.g. any tests/files/good-*.xz. However, these test
        commands are included in translation.bash output, so reading
        translation.bash output carefully can be enough.

      - If you find language problems in the original English strings,
        feel free to suggest improvements. Ask if something is unclear.

      - The translated messages should be understandable (sometimes this
        may be a problem with the original English messages too). Don't
        make a direct word-by-word translation from English especially if
        the result doesn't sound good in your language.
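
    The 80-column rule for --help and --long-help noted above can be
    checked mechanically. The sketch below is a hypothetical helper, not
    something shipped with XZ Utils. It approximates terminal width by
    counting East Asian wide characters as two columns and combining
    marks as zero, and it does not expand tabs, so treat its result as
    a hint rather than a verdict.

```python
import unicodedata


def display_width(line: str) -> int:
    """Approximate the number of terminal columns a line occupies."""
    width = 0
    for ch in line:
        if unicodedata.combining(ch):
            continue  # combining marks add no column of their own
        # East Asian wide/fullwidth characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def too_wide_lines(text: str, limit: int = 80) -> list:
    """Return (line number, width) for each line wider than the limit."""
    return [
        (number, display_width(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if display_width(line) > limit
    ]
```

    Passing the captured text of the translated --long-help output to
    too_wide_lines() returns an empty list when every line fits.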

    Thanks for your help!


5. Other implementations of the .xz format
------------------------------------------

    7-Zip and the p7zip port of 7-Zip support the .xz format starting
    from the version 9.00alpha.

        https://7-zip.org/
        https://p7zip.sourceforge.net/

    XZ Embedded is a limited implementation written for use in the Linux
    kernel, but it is also suitable for other embedded use.

        https://tukaani.org/xz/embedded.html

    XZ for Java is a complete implementation written in pure Java.

        https://tukaani.org/xz/java.html


6. Contact information
----------------------

    If you have questions, bug reports, patches etc. related to XZ Utils,
    the project maintainers Lasse Collin and Jia Tan can be reached via
    <xz@tukaani.org>.

    You might also find Lasse on #tukaani on Libera Chat (IRC).
    The nick is Larhzu. The channel tends to be pretty quiet,
    so just ask your question and someone might wake up.