xref: /freebsd/sys/contrib/zstd/programs/zstd.1.md (revision 3c134670993bf525fcd6c4dfef84a3dfc3d4ed1b)
zstd(1) -- zstd, zstdmt, unzstd, zstdcat - Compress or decompress .zst files
============================================================================

SYNOPSIS
--------

`zstd` [*OPTIONS*] [-|_INPUT-FILE_] [-o _OUTPUT-FILE_]

`zstdmt` is equivalent to `zstd -T0`

`unzstd` is equivalent to `zstd -d`

`zstdcat` is equivalent to `zstd -dcf`


DESCRIPTION
-----------
`zstd` is a fast lossless compression algorithm and data compression tool,
with command line syntax similar to `gzip(1)` and `xz(1)`.
It is based on the **LZ77** family, with further FSE & huff0 entropy stages.
`zstd` offers highly configurable compression speed,
with fast modes at > 200 MB/s per core,
and strong modes nearing lzma compression ratios.
It also features a very fast decoder, with speeds > 500 MB/s per core.

`zstd` command line syntax is generally similar to gzip,
but features the following differences:

  - Source files are preserved by default.
    It's possible to remove them automatically by using the `--rm` option.
  - When compressing a single file, `zstd` displays progress notifications
    and a result summary by default.
    Use `-q` to turn them off.
  - `zstd` does not accept input from the console,
    but it properly accepts `stdin` when it is not the console.
  - `zstd` displays a short help page when the command line is invalid.
    Use `-q` to turn it off.

`zstd` compresses or decompresses each _file_ according to the selected
operation mode.
If no _files_ are given or _file_ is `-`, `zstd` reads from standard input
and writes the processed data to standard output.
`zstd` will refuse to write compressed data to standard output
if it is a terminal: it will display an error message and skip the _file_.
Similarly, `zstd` will refuse to read compressed data from standard input
if it is a terminal.

Unless `--stdout` or `-o` is specified, _files_ are written to a new file
whose name is derived from the source _file_ name:

* When compressing, the suffix `.zst` is appended to the source filename to
  get the target filename.
* When decompressing, the `.zst` suffix is removed from the source filename to
  get the target filename.
### Concatenation with .zst files
It is possible to concatenate `.zst` files as is.
`zstd` will decompress such files as if they were a single `.zst` file.

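For example, the concatenation property can be exercised directly from the shell (a sketch; file names are illustrative and `zstd` must be on the `PATH`):

```shell
# Compress two inputs into two separate .zst frames:
printf 'hello ' | zstd -q -o part1.zst
printf 'world\n' | zstd -q -o part2.zst

# Concatenate the frames and decompress them in one pass:
cat part1.zst part2.zst > both.zst
zstd -dcq both.zst
# → hello world
```
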
OPTIONS
-------

### Integer suffixes and special values
In most places where an integer argument is expected,
an optional suffix is supported to easily indicate large integers.
There must be no space between the integer and the suffix.

* `KiB`:
    Multiply the integer by 1,024 (2\^10).
    `Ki`, `K`, and `KB` are accepted as synonyms for `KiB`.
* `MiB`:
    Multiply the integer by 1,048,576 (2\^20).
    `Mi`, `M`, and `MB` are accepted as synonyms for `MiB`.
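
As an illustration, the following two invocations set the same limit, since 2\^27 bytes = 128 MiB (a sketch; the file name is illustrative):

```shell
# Create a small compressed file to decompress:
echo 'suffix demo' | zstd -q -o sample.zst

# The same memory limit, spelled two different ways:
zstd -dcq --memory=134217728 sample.zst   # limit given in bytes
zstd -dcq --memory=128MiB    sample.zst   # same limit, using a suffix
```
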
### Operation mode
If multiple operation mode options are given,
the last one takes effect.

* `-z`, `--compress`:
    Compress.
    This is the default operation mode when no operation mode option is specified
    and no other operation mode is implied from the command name
    (for example, `unzstd` implies `--decompress`).
* `-d`, `--decompress`, `--uncompress`:
    Decompress.
* `-t`, `--test`:
    Test the integrity of compressed _files_.
    This option is equivalent to `--decompress --stdout` except that the
    decompressed data is discarded instead of being written to standard output.
    No files are created or removed.
* `-b#`:
    Benchmark file(s) using compression level #.
* `--train FILEs`:
    Use FILEs as a training set to create a dictionary.
    The training set should contain a lot of small files (> 100).
* `-l`, `--list`:
    Display information related to a zstd compressed file, such as size, ratio, and checksum.
    Some of these fields may not be available.
    This command can be augmented with the `-v` modifier.

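A typical sequence through these modes might look like the following (a sketch; `file.txt` is illustrative):

```shell
echo 'some data' > file.txt
zstd -q file.txt            # compress  -> file.txt.zst (source kept)
zstd -tq file.txt.zst       # test integrity; exit status 0 on success
zstd -l file.txt.zst        # list size, ratio, checksum
zstd -dqf file.txt.zst      # decompress -> file.txt (-f: overwrite)
```
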
### Operation modifiers

* `-#`:
    `#` compression level \[1-19] (default: 3)
* `--fast[=#]`:
    switch to ultra-fast compression levels.
    If `=#` is not present, it defaults to `1`.
    The higher the value, the faster the compression speed,
    at the cost of some compression ratio.
    This setting overrides the compression level if one was set previously.
    Similarly, if a compression level is set after `--fast`, it overrides it.
* `--ultra`:
    unlocks high compression levels 20+ (maximum 22), using a lot more memory.
    Note that decompression will also require more memory when using these levels.
* `--long[=#]`:
    enables long distance matching with `#` `windowLog`; if `#` is not
    present, it defaults to `27`.
    This increases the window size (`windowLog`) and memory usage for both the
    compressor and decompressor.
    This setting is designed to improve the compression ratio for files with
    long matches at a large distance.

    Note: If `windowLog` is set to larger than 27, `--long=windowLog` or
    `--memory=windowSize` needs to be passed to the decompressor.
* `--patch-from=FILE`:
    Specify the file to be used as a reference point for zstd's diff engine.
    This is effectively dictionary compression with some convenient parameter
    selection, namely that windowSize > srcSize.

    Note: cannot be combined with `-D`.
    Note: `--long` mode will be automatically activated if chainLog < fileLog
        (fileLog being the windowLog required to cover the whole file). You
        can also manually force it.
    Note: for all levels, you can use `--patch-from` in `--single-thread` mode
        to improve compression ratio at the cost of speed.
    Note: for level 19, you can get increased compression ratio at the cost
        of speed by specifying `--zstd=targetLength=` to be something large
        (e.g. 4096), and by setting a large `--zstd=chainLog=`.
* `-M#`, `--memory=#`:
    Set a memory usage limit. By default, Zstandard uses 128 MB for decompression
    as the maximum amount of memory the decompressor is allowed to use, but you can
    override this manually if need be in either direction (i.e. you can increase or
    decrease it).

    This is also used during compression when using `--patch-from=`. In this case,
    this parameter overrides the maximum size allowed for a dictionary (128 MB).
* `-T#`, `--threads=#`:
    Compress using `#` working threads (default: 1).
    If `#` is 0, attempt to detect and use the number of physical CPU cores.
    In all cases, the number of threads is capped to ZSTDMT_NBTHREADS_MAX==200.
    This modifier does nothing if `zstd` is compiled without multithread support.
* `--single-thread`:
    Does not spawn a thread for compression; uses a single thread for both I/O and compression.
    In this mode, compression is serialized with I/O, which is slightly slower.
    (This is different from `-T1`, which spawns 1 compression thread in parallel with I/O.)
    This mode is the only one available when multithread support is disabled.
    Single-thread mode features lower memory usage.
    The final compressed result is slightly different from `-T1`.
* `--adapt[=min=#,max=#]` :
    `zstd` will dynamically adapt compression level to perceived I/O conditions.
    Compression level adaptation can be observed live by using command `-v`.
    Adaptation can be constrained between supplied `min` and `max` levels.
    The feature works when combined with multi-threading and `--long` mode.
    It does not work with `--single-thread`.
    It sets window size to 8 MB by default (can be changed manually, see `wlog`).
    Due to the chaotic nature of dynamic adaptation, the compressed result is not reproducible.
    _note_ : at the time of this writing, `--adapt` can remain stuck at low speed
    when combined with multiple worker threads (>=2).
* `--stream-size=#` :
    Sets the pledged source size of input coming from a stream. This value must be exact, as it
    will be included in the produced frame header. Incorrect stream sizes will cause an error.
    This information will be used to better optimize compression parameters, resulting in
    better and potentially faster compression, especially for smaller source sizes.
* `--size-hint=#`:
    When handling input from a stream, `zstd` must guess how large the source size
    will be when optimizing compression parameters. If the stream size is relatively
    small, this guess may be a poor one, resulting in a higher compression ratio than
    expected. This feature allows for controlling the guess when needed.
    Exact guesses result in better compression ratios. Overestimates result in slightly
    degraded compression ratios, while underestimates may result in significant degradation.
* `--rsyncable` :
    `zstd` will periodically synchronize the compression state to make the
    compressed file more rsync-friendly. There is a negligible impact to
    compression ratio, and the faster compression levels will see a small
    compression speed hit.
    This feature does not work with `--single-thread`. You probably don't want
    to use it with long range mode, since it will decrease the effectiveness of
    the synchronization points, but your mileage may vary.
* `-D file`:
    use `file` as Dictionary to compress or decompress FILE(s)
* `--no-dictID`:
    do not store dictionary ID within frame header (dictionary compression).
    The decoder will have to rely on implicit knowledge about which dictionary to use,
    and it won't be able to check if it's the correct one.
* `-o file`:
    save result into `file` (only possible with a single _INPUT-FILE_)
* `-f`, `--force`:
    overwrite output without prompting, and (de)compress symbolic links
* `-c`, `--stdout`:
    force write to standard output, even if it is the console
* `--[no-]sparse`:
    enable / disable sparse FS support,
    to make files with many zeroes smaller on disk.
    Creating sparse files may save disk space and speed up decompression by
    reducing the amount of disk I/O.
    default: enabled when output is into a file,
    and disabled when output is stdout.
    This setting overrides the default and can force sparse mode over stdout.
* `--[no-]content-size`:
    enable / disable whether the original size of the file is placed in
    the header of the compressed file. The default option is
    --content-size (meaning that the original size will be placed in the header).
* `--rm`:
    remove source file(s) after successful compression or decompression
* `-k`, `--keep`:
    keep source file(s) after successful compression or decompression.
    This is the default behavior.
* `-r`:
    operate recursively on directories
* `--filelist=FILE`:
    read a list of files to process as content from `FILE`.
    Format is compatible with `ls` output, with one file per line.
* `--output-dir-flat[=dir]`:
    resulting files are stored into target `dir` directory,
    instead of the same directory as the origin file.
    Be aware that this command can introduce name collision issues,
    if multiple files, from different directories, end up having the same name.
    Collision resolution ensures the first file with a given name will be present in `dir`,
    while in combination with `-f`, the last file will be present instead.
* `--format=FORMAT`:
    compress and decompress in other formats. If compiled with
    support, zstd can compress to or decompress from other compression algorithm
    formats. Possibly available options are `zstd`, `gzip`, `xz`, `lzma`, and `lz4`.
    If no such format is provided, `zstd` is the default.
* `-h`/`-H`, `--help`:
    display help/long help and exit
* `-V`, `--version`:
    display version number and exit.
    Advanced: `-vV` also displays supported formats.
    `-vvV` also displays POSIX support.
* `-v`, `--verbose`:
    verbose mode
* `--show-default-cparams`:
    Shows the default compression parameters that will be used for a
    particular src file. If the provided src file is not a regular file
    (e.g. named pipe), the cli will just output the default parameters.
    That is, the parameters that are used when the src size is
    unknown.
* `-q`, `--quiet`:
    suppress warnings, interactivity, and notifications.
    Specify twice to suppress errors too.
* `--no-progress`:
    do not display the progress bar, but keep all other messages.
* `-C`, `--[no-]check`:
    add integrity check computed from uncompressed data (default: enabled)
* `--`:
    All arguments after `--` are treated as files.

### Restricted usage of Environment Variables

Using environment variables to set parameters has security implications.
Therefore, this avenue is intentionally restricted.
Only `ZSTD_CLEVEL` is supported currently, for setting compression level.
`ZSTD_CLEVEL` can be used to set the level between 1 and 19 (the "normal" range).
If the value of `ZSTD_CLEVEL` is not a valid integer, it will be ignored with a warning message.
`ZSTD_CLEVEL` just replaces the default compression level (`3`).
It can be overridden by corresponding command line arguments.

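For example (a sketch; the file name is illustrative):

```shell
echo 'env demo' > notes.txt

# Compress at level 19 via the environment:
ZSTD_CLEVEL=19 zstd -qf notes.txt

# An explicit level on the command line still takes precedence:
ZSTD_CLEVEL=19 zstd -1 -qf notes.txt
```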

DICTIONARY BUILDER
------------------
`zstd` offers _dictionary_ compression,
which greatly improves efficiency on small files and messages.
It's possible to train `zstd` with a set of samples,
the result of which is saved into a file called a `dictionary`.
Then, during compression and decompression, reference the same dictionary
using the command `-D dictionaryFileName`.
Compression of small files similar to the sample set will be greatly improved.

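The train-then-reference workflow can be sketched as follows (illustrative names; a real corpus should contain more numerous and more varied samples):

```shell
# Build a corpus of small, similar samples:
mkdir -p samples
i=0
while [ $i -lt 200 ]; do
  printf '{"user":"user-%d","action":"login","result":"ok"}\n' $i > samples/s$i.json
  i=$((i+1))
done

# Train a dictionary, then use it on both sides:
zstd -q --train samples/*.json -o my.dict
zstd -q  -D my.dict samples/s0.json -o msg.zst
zstd -dq -D my.dict msg.zst -o msg.out
```
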
* `--train FILEs`:
    Use FILEs as a training set to create a dictionary.
    The training set should contain a lot of small files (> 100),
    and weigh typically 100x the target dictionary size
    (for example, 10 MB for a 100 KB dictionary).

    Supports multithreading if `zstd` is compiled with threading support.
    Additional parameters can be specified with `--train-fastcover`.
    The legacy dictionary builder can be accessed with `--train-legacy`.
    The cover dictionary builder can be accessed with `--train-cover`.
    `--train` is equivalent to `--train-fastcover=d=8,steps=4`.
* `-o file`:
    Dictionary saved into `file` (default name: dictionary).
* `--maxdict=#`:
    Limit dictionary to specified size (default: 112640).
* `-#`:
    Use `#` compression level during training (optional).
    Will generate statistics more tuned for the selected compression level,
    resulting in a _small_ compression ratio improvement for this level.
* `-B#`:
    Split input files into blocks of size # (default: no split).
* `--dictID=#`:
    A dictionary ID is a locally unique ID that a decoder can use to verify it is
    using the right dictionary.
    By default, zstd will create a 4-byte random number ID.
    It's possible to give a precise number instead.
    Short numbers have an advantage: an ID < 256 will only need 1 byte in the
    compressed frame header, and an ID < 65536 will only need 2 bytes.
    This compares favorably to the 4-byte default.
    However, it's up to the dictionary manager not to assign the same ID twice to
    2 different dictionaries.
* `--train-cover[=k=#,d=#,steps=#,split=#,shrink[=#]]`:
    Select parameters for the default dictionary builder algorithm named cover.
    If _d_ is not specified, then it tries _d_ = 6 and _d_ = 8.
    If _k_ is not specified, then it tries _steps_ values in the range [50, 2000].
    If _steps_ is not specified, then the default value of 40 is used.
    If _split_ is not specified or split <= 0, then the default value of 100 is used.
    Requires that _d_ <= _k_.
    If the _shrink_ flag is not used, then the default value for _shrinkDict_ of 0 is used.
    If _shrink_ is not specified, then the default value for _shrinkDictMaxRegression_ of 1 is used.

    Selects segments of size _k_ with the highest score to put in the dictionary.
    The score of a segment is computed by the sum of the frequencies of all the
    subsegments of size _d_.
    Generally _d_ should be in the range [6, 8], occasionally up to 16, but the
    algorithm will run faster with _d_ <= 8.
    Good values for _k_ vary widely based on the input data, but a safe range is
    [2 * _d_, 2000].
    If _split_ is 100, all input samples are used for both training and testing
    to find optimal _d_ and _k_ to build the dictionary.
    Supports multithreading if `zstd` is compiled with threading support.
    Having _shrink_ enabled takes a truncated dictionary of minimum size and doubles
    it in size until the compression ratio of the truncated dictionary is at most
    _shrinkDictMaxRegression%_ worse than the compression ratio of the largest dictionary.

    Examples:

    `zstd --train-cover FILEs`

    `zstd --train-cover=k=50,d=8 FILEs`

    `zstd --train-cover=d=8,steps=500 FILEs`

    `zstd --train-cover=k=50 FILEs`

    `zstd --train-cover=k=50,split=60 FILEs`

    `zstd --train-cover=shrink FILEs`

    `zstd --train-cover=shrink=2 FILEs`

* `--train-fastcover[=k=#,d=#,f=#,steps=#,split=#,accel=#]`:
    Same as cover but with extra parameters _f_ and _accel_ and a different default value of _split_.
    If _split_ is not specified, then it tries _split_ = 75.
    If _f_ is not specified, then it tries _f_ = 20.
    Requires that 0 < _f_ < 32.
    If _accel_ is not specified, then it tries _accel_ = 1.
    Requires that 0 < _accel_ <= 10.
    Requires that _d_ = 6 or _d_ = 8.

    _f_ is the log of the size of the array that keeps track of the frequency of subsegments of size _d_.
    The subsegment is hashed to an index in the range [0, 2^_f_ - 1].
    It is possible that 2 different subsegments are hashed to the same index, and they are considered as the same subsegment when computing frequency.
    Using a higher _f_ reduces collisions but takes longer.

    Examples:

    `zstd --train-fastcover FILEs`

    `zstd --train-fastcover=d=8,f=15,accel=2 FILEs`

* `--train-legacy[=selectivity=#]`:
    Use legacy dictionary builder algorithm with the given dictionary
    _selectivity_ (default: 9).
    The smaller the _selectivity_ value, the denser the dictionary,
    improving its efficiency but reducing its possible maximum size.
    `--train-legacy=s=#` is also accepted.

    Examples:

    `zstd --train-legacy FILEs`

    `zstd --train-legacy=selectivity=8 FILEs`


BENCHMARK
---------

* `-b#`:
    benchmark file(s) using compression level #
* `-e#`:
    benchmark file(s) using multiple compression levels, from `-b#` to `-e#` (inclusive)
* `-i#`:
    minimum evaluation time, in seconds (default: 3s), benchmark mode only
* `-B#`, `--block-size=#`:
    cut file(s) into independent blocks of size # (default: no block)
* `--priority=rt`:
    set process priority to real-time

**Output Format:** CompressionLevel#Filename : InputSize -> OutputSize (CompressionRatio), CompressionSpeed, DecompressionSpeed

**Methodology:** For both compression and decompression speed, the entire input is compressed/decompressed in-memory to measure speed. A run lasts at least 1 sec, so when files are small, they are compressed/decompressed several times per run, in order to improve measurement accuracy.

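For example, to compare levels 1 through 5 on a file with a shorter measurement window (a sketch; the file name is illustrative):

```shell
# Benchmark levels 1..5, with a 1-second minimum per measurement:
zstd -b1 -e5 -i1 file.txt
```
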
ADVANCED COMPRESSION OPTIONS
----------------------------
### --zstd[=options]:
`zstd` provides 22 predefined compression levels.
The selected or default predefined compression level can be changed with
advanced compression options.
The _options_ are provided as a comma-separated list.
You may specify only the options you want to change and the rest will be
taken from the selected or default compression level.
The list of available _options_:

- `strategy`=_strat_, `strat`=_strat_:
    Specify a strategy used by a match finder.

    There are 9 strategies numbered from 1 to 9, from faster to stronger:
    1=ZSTD\_fast, 2=ZSTD\_dfast, 3=ZSTD\_greedy,
    4=ZSTD\_lazy, 5=ZSTD\_lazy2, 6=ZSTD\_btlazy2,
    7=ZSTD\_btopt, 8=ZSTD\_btultra, 9=ZSTD\_btultra2.

- `windowLog`=_wlog_, `wlog`=_wlog_:
    Specify the maximum number of bits for a match distance.

    A higher number of bits increases the chance to find a match, which usually
    improves compression ratio.
    It also increases memory requirements for the compressor and decompressor.
    The minimum _wlog_ is 10 (1 KiB) and the maximum is 30 (1 GiB) on 32-bit
    platforms and 31 (2 GiB) on 64-bit platforms.

    Note: If `windowLog` is set to larger than 27, `--long=windowLog` or
    `--memory=windowSize` needs to be passed to the decompressor.

- `hashLog`=_hlog_, `hlog`=_hlog_:
    Specify the maximum number of bits for a hash table.

    Bigger hash tables cause fewer collisions, which usually makes compression
    faster, but they require more memory during compression.

    The minimum _hlog_ is 6 (64 B) and the maximum is 30 (1 GiB).

- `chainLog`=_clog_, `clog`=_clog_:
    Specify the maximum number of bits for a hash chain or a binary tree.

    A higher number of bits increases the chance to find a match, which usually
    improves compression ratio.
    It also slows down compression speed and increases memory requirements for
    compression.
    This option is ignored for the ZSTD_fast strategy.

    The minimum _clog_ is 6 (64 B) and the maximum is 29 (512 MiB) on 32-bit platforms
    and 30 (1 GiB) on 64-bit platforms.

- `searchLog`=_slog_, `slog`=_slog_:
    Specify the maximum number of searches in a hash chain or a binary tree
    using logarithmic scale.

    More searches increase the chance to find a match, which usually increases
    compression ratio but decreases compression speed.

    The minimum _slog_ is 1 and the maximum is `windowLog` - 1.

- `minMatch`=_mml_, `mml`=_mml_:
    Specify the minimum searched length of a match in a hash table.

    Larger search lengths usually decrease compression ratio but improve
    decompression speed.

    The minimum _mml_ is 3 and the maximum is 7.

- `targetLength`=_tlen_, `tlen`=_tlen_:
    The impact of this field varies depending on the selected strategy.

    For ZSTD\_btopt, ZSTD\_btultra and ZSTD\_btultra2, it specifies
    the minimum match length that causes the match finder to stop searching.
    A larger `targetLength` usually improves compression ratio
    but decreases compression speed.

    For ZSTD\_fast, it triggers ultra-fast mode when > 0.
    The value represents the amount of data skipped between match sampling.
    Impact is reversed: a larger `targetLength` increases compression speed
    but decreases compression ratio.

    For all other strategies, this field has no impact.

    The minimum _tlen_ is 0 and the maximum is 128 KiB.

- `overlapLog`=_ovlog_,  `ovlog`=_ovlog_:
    Determine `overlapSize`, the amount of data reloaded from the previous job.
    This parameter is only available when multithreading is enabled.
    Reloading more data improves compression ratio, but decreases speed.

    The minimum _ovlog_ is 0, and the maximum is 9.
    1 means "no overlap", hence completely independent jobs.
    9 means "full overlap", meaning up to `windowSize` is reloaded from the previous job.
    Reducing _ovlog_ by 1 reduces the reloaded amount by a factor of 2.
    For example, 8 means "windowSize/2", and 6 means "windowSize/8".
    Value 0 is special and means "default": _ovlog_ is automatically determined by `zstd`.
    In that case, _ovlog_ will range from 6 to 9, depending on the selected _strat_.

- `ldmHashLog`=_lhlog_, `lhlog`=_lhlog_:
    Specify the maximum size for a hash table used for long distance matching.

    This option is ignored unless long distance matching is enabled.

    Bigger hash tables usually improve compression ratio at the expense of more
    memory during compression and a decrease in compression speed.

    The minimum _lhlog_ is 6 and the maximum is 30 (default: 20).

- `ldmMinMatch`=_lmml_, `lmml`=_lmml_:
    Specify the minimum searched length of a match for long distance matching.

    This option is ignored unless long distance matching is enabled.

    Larger/very small values usually decrease compression ratio.

    The minimum _lmml_ is 4 and the maximum is 4096 (default: 64).

- `ldmBucketSizeLog`=_lblog_, `lblog`=_lblog_:
    Specify the size of each bucket for the hash table used for long distance
    matching.

    This option is ignored unless long distance matching is enabled.

    Larger bucket sizes improve collision resolution but decrease compression
    speed.

    The minimum _lblog_ is 1 and the maximum is 8 (default: 3).

- `ldmHashRateLog`=_lhrlog_, `lhrlog`=_lhrlog_:
    Specify the frequency of inserting entries into the long distance matching
    hash table.

    This option is ignored unless long distance matching is enabled.

    Larger values will improve compression speed. Deviating far from the
    default value will likely result in a decrease in compression ratio.

    The default value is `wlog - lhlog`.

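The `windowLog` note above can be seen in practice: a frame produced with a window larger than 27 bits needs an explicit opt-in on the decompression side (a sketch; file names are illustrative):

```shell
zstd -q --long=30 big.bin -o big.zst      # 30-bit (1 GiB) window

# Plain `zstd -d` may refuse such a frame; raise the limit explicitly:
zstd -dq --long=30 big.zst -o big.out
# (equivalently: zstd -dq --memory=1GiB big.zst -o big.out)
```
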
### Example
The following parameters set advanced compression options to something
similar to predefined level 19 for files bigger than 256 KB:

`--zstd`=wlog=23,clog=23,hlog=22,slog=6,mml=3,tlen=48,strat=6

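As a runnable sketch, that option string can be passed directly on a compression command line (file names are illustrative):

```shell
zstd -q --zstd=wlog=23,clog=23,hlog=22,slog=6,mml=3,tlen=48,strat=6 \
    data.bin -o data.zst
```
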
### -B#:
Select the size of each compression job.
This parameter is available only when multi-threading is enabled.
The default value is `4 * windowSize`, which means it varies depending on compression level.
`-B#` makes it possible to select a custom value.
Note that job size must respect a minimum value which is enforced transparently.
This minimum is either 1 MB, or `overlapSize`, whichever is largest.

BUGS
----
Report bugs at: https://github.com/facebook/zstd/issues

AUTHOR
------
Yann Collet