zstd(1) -- zstd, zstdmt, unzstd, zstdcat - Compress or decompress .zst files
============================================================================

SYNOPSIS
--------

`zstd` [*OPTIONS*] [-|_INPUT-FILE_] [-o _OUTPUT-FILE_]

`zstdmt` is equivalent to `zstd -T0`

`unzstd` is equivalent to `zstd -d`

`zstdcat` is equivalent to `zstd -dcf`


DESCRIPTION
-----------
`zstd` is a fast lossless compression algorithm and data compression tool,
with command line syntax similar to `gzip (1)` and `xz (1)`.
It is based on the **LZ77** family, with further FSE & huff0 entropy stages.
`zstd` offers highly configurable compression speed,
with fast modes at > 200 MB/s per core,
and strong modes nearing lzma compression ratios.
It also features a very fast decoder, with speeds > 500 MB/s per core.

`zstd` command line syntax is generally similar to gzip,
with the following differences:

  - Source files are preserved by default.
    It's possible to remove them automatically by using the `--rm` option.
  - When compressing a single file, `zstd` displays progress notifications
    and a result summary by default.
    Use `-q` to turn them off.
  - `zstd` does not accept input from the console,
    but it properly accepts `stdin` when it's not the console.
  - `zstd` displays a short help page when the command line contains an error.
    Use `-q` to turn it off.

`zstd` compresses or decompresses each _file_ according to the selected
operation mode.
If no _files_ are given or _file_ is `-`, `zstd` reads from standard input
and writes the processed data to standard output.
`zstd` will refuse to write compressed data to standard output
if it is a terminal: it will display an error message and skip the _file_.
Similarly, `zstd` will refuse to read compressed data from standard input
if it is a terminal.

Unless `--stdout` or `-o` is specified, _files_ are written to a new file
whose name is derived from the source _file_ name:

* When compressing, the suffix `.zst` is appended to the source filename to
  get the target filename.
* When decompressing, the `.zst` suffix is removed from the source filename to
  get the target filename.

### Concatenation with .zst files
It is possible to concatenate `.zst` files as is.
`zstd` will decompress such files as if they were a single `.zst` file.
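
The naming and concatenation rules above can be tried out with a short session.
This is only a sketch: it assumes the `zstd` command is installed and on the `PATH`,
and the file names (`part1.txt`, `part2.txt`, `joined.txt.zst`) are made up for illustration.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir"

printf 'hello ' > part1.txt
printf 'world\n' > part2.txt

# Compressing appends .zst and keeps the sources:
# this creates part1.txt.zst and part2.txt.zst next to the originals.
zstd -q part1.txt part2.txt

# Concatenate the two compressed files as is, without recompressing.
cat part1.txt.zst part2.txt.zst > joined.txt.zst

# Decompressing strips the .zst suffix, so the output is written to joined.txt.
zstd -d -q joined.txt.zst

# The concatenation decompresses as if it were a single .zst file.
result=$(cat joined.txt)
echo "$result"
```

`-q` is used here only to silence the progress notifications that `zstd` prints by default.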

OPTIONS
-------

### Integer suffixes and special values
In most places where an integer argument is expected,
an optional suffix is supported to easily indicate large integers.
There must be no space between the integer and the suffix.

* `KiB`:
    Multiply the integer by 1,024 (2\^10).
    `Ki`, `K`, and `KB` are accepted as synonyms for `KiB`.
* `MiB`:
    Multiply the integer by 1,048,576 (2\^20).
    `Mi`, `M`, and `MB` are accepted as synonyms for `MiB`.

### Operation mode
If multiple operation mode options are given,
the last one takes effect.

* `-z`, `--compress`:
    Compress.
    This is the default operation mode when no operation mode option is specified
    and no other operation mode is implied from the command name
    (for example, `unzstd` implies `--decompress`).
* `-d`, `--decompress`, `--uncompress`:
    Decompress.
* `-t`, `--test`:
    Test the integrity of compressed _files_.
    This option is equivalent to `--decompress --stdout` except that the
    decompressed data is discarded instead of being written to standard output.
    No files are created or removed.
* `-b#`:
    Benchmark file(s) using compression level #
* `--train FILEs`:
    Use FILEs as a training set to create a dictionary.
    The training set should contain a lot of small files (> 100).
* `-l`, `--list`:
    Display information related to a zstd compressed file, such as size, ratio, and checksum.
    Some of these fields may not be available.
    This command can be augmented with the `-v` modifier.

### Operation modifiers

* `-#`:
    `#` compression level \[1-19] (default: 3)
* `--ultra`:
    unlocks high compression levels 20+ (maximum 22), using a lot more memory.
    Note that decompression will also require more memory when using these levels.
* `--fast[=#]`:
    switch to ultra-fast compression levels.
    If `=#` is not present, it defaults to `1`.
    The higher the value, the faster the compression speed,
    at the cost of some compression ratio.
    This setting overrides the compression level if one was set previously.
    Similarly, if a compression level is set after `--fast`, it overrides it.
* `-T#`, `--threads=#`:
    Compress using `#` working threads (default: 1).
    If `#` is 0, attempt to detect and use the number of physical CPU cores.
    In all cases, the number of threads is capped to `ZSTDMT_NBWORKERS_MAX`,
    which is either 64 in 32-bit mode, or 256 for 64-bit environments.
    This modifier does nothing if `zstd` is compiled without multithread support.
* `--single-thread`:
    Does not spawn a thread for compression; a single thread is used for both I/O and compression.
    In this mode, compression is serialized with I/O, which is slightly slower.
    (This is different from `-T1`, which spawns 1 compression thread in parallel with I/O).
    This mode is the only one available when multithread support is disabled.
    Single-thread mode features lower memory usage.
    Final compressed result is slightly different from `-T1`.
* `--auto-threads={physical,logical} (default: physical)`:
    When using a default amount of threads via `-T0`, choose the default based on the number
    of detected physical or logical cores.
* `--adapt[=min=#,max=#]`:
    `zstd` will dynamically adapt compression level to perceived I/O conditions.
    Compression level adaptation can be observed live by using command `-v`.
    Adaptation can be constrained between supplied `min` and `max` levels.
    The feature works when combined with multi-threading and `--long` mode.
    It does not work with `--single-thread`.
    It sets window size to 8 MB by default (can be changed manually, see `wlog`).
    Due to the chaotic nature of dynamic adaptation, compressed result is not reproducible.
    _note_ : at the time of this writing, `--adapt` can remain stuck at low speed
    when combined with multiple worker threads (>=2).
* `--long[=#]`:
    enables long distance matching with `#` `windowLog`; if `#` is not
    present, it defaults to `27`.
    This increases the window size (`windowLog`) and memory usage for both the
    compressor and decompressor.
    This setting is designed to improve the compression ratio for files with
    long matches at a large distance.

    Note: If `windowLog` is set to larger than 27, `--long=windowLog` or
    `--memory=windowSize` needs to be passed to the decompressor.
* `-D DICT`:
    use `DICT` as Dictionary to compress or decompress FILE(s)
* `--patch-from FILE`:
    Specify the file to be used as a reference point for zstd's diff engine.
    This is effectively dictionary compression with some convenient parameter
    selection, namely that windowSize > srcSize.

    Note: cannot use both this and `-D` together.

    Note: `--long` mode will be automatically activated if chainLog < fileLog
        (fileLog being the windowLog required to cover the whole file). You
        can also manually force it.

    Note: for all levels, you can use `--patch-from` in `--single-thread` mode
        to improve compression ratio at the cost of speed.

    Note: for level 19, you can get increased compression ratio at the cost
        of speed by specifying `--zstd=targetLength=` to be something large
        (e.g. 4096), and by setting a large `--zstd=chainLog=`.
* `--rsyncable`:
    `zstd` will periodically synchronize the compression state to make the
    compressed file more rsync-friendly. There is a negligible impact to
    compression ratio, and the faster compression levels will see a small
    compression speed hit.
    This feature does not work with `--single-thread`. You probably don't want
    to use it with long range mode, since it will decrease the effectiveness of
    the synchronization points, but your mileage may vary.
* `-C`, `--[no-]check`:
    add integrity check computed from uncompressed data (default: enabled)
* `--[no-]content-size`:
    enable / disable whether or not the original size of the file is placed in
    the header of the compressed file. The default option is
    `--content-size` (meaning that the original size will be placed in the header).
* `--no-dictID`:
    do not store dictionary ID within frame header (dictionary compression).
    The decoder will have to rely on implicit knowledge about which dictionary to use;
    it won't be able to check if it's correct.
* `-M#`, `--memory=#`:
    Set a memory usage limit. By default, Zstandard uses 128 MB for decompression
    as the maximum amount of memory the decompressor is allowed to use, but you can
    override this manually if need be in either direction (i.e. you can increase or
    decrease it).

    This is also used during compression when used with `--patch-from=`. In this case,
    this parameter overrides the maximum size allowed for a dictionary (128 MB).

    Additionally, this can be used to limit memory for dictionary training. This parameter
    overrides the default limit of 2 GB.
    zstd will load training samples up to the memory limit
    and ignore the rest.
* `--stream-size=#`:
    Sets the pledged source size of input coming from a stream. This value must be exact, as it
    will be included in the produced frame header. Incorrect stream sizes will cause an error.
    This information will be used to better optimize compression parameters, resulting in
    better and potentially faster compression, especially for smaller source sizes.
* `--size-hint=#`:
    When handling input from a stream, `zstd` must guess how large the source size
    will be when optimizing compression parameters. If the stream size is relatively
    small, this guess may be a poor one, resulting in a worse compression ratio than
    expected. This feature allows for controlling the guess when needed.
    Exact guesses result in better compression ratios. Overestimates result in slightly
    degraded compression ratios, while underestimates may result in significant degradation.
* `-o FILE`:
    save result into `FILE`
* `-f`, `--force`:
    disable input and output checks. Allows overwriting existing files, input
    from console, output to stdout, operating on links, block devices, etc.
* `-c`, `--stdout`:
    write to standard output (even if it is the console)
* `--[no-]sparse`:
    enable / disable sparse FS support,
    to make files with many zeroes smaller on disk.
    Creating sparse files may save disk space and speed up decompression by
    reducing the amount of disk I/O.
    default: enabled when output is into a file,
    and disabled when output is stdout.
    This setting overrides default and can force sparse mode over stdout.
* `--rm`:
    remove source file(s) after successful compression or decompression. If used in combination with
    `-o`, will trigger a confirmation prompt (which can be silenced with `-f`), as this is a destructive operation.
* `-k`, `--keep`:
    keep source file(s) after successful compression or decompression.
    This is the default behavior.
* `-r`:
    operate recursively on directories.
    It selects all files in the named directory and all its subdirectories.
    This can be useful both to reduce command line typing,
    and to circumvent shell expansion limitations,
    when there are so many files that their names exceed the maximum size of a command line.
* `--filelist FILE`:
    read a list of files to process as content from `FILE`.
    Format is compatible with `ls` output, with one file per line.
* `--output-dir-flat DIR`:
    resulting files are stored into target `DIR` directory,
    instead of same directory as origin file.
    Be aware that this command can introduce name collision issues,
    if multiple files, from different directories, end up having the same name.
    Collision resolution ensures first file with a given name will be present in `DIR`,
    while in combination with `-f`, the last file will be present instead.
* `--output-dir-mirror DIR`:
    similar to `--output-dir-flat`,
    the output files are stored underneath target `DIR` directory,
    but this option will replicate input directory hierarchy into output `DIR`.

    If input directory contains "..", the files in this directory will be ignored.
    If input directory is an absolute directory (e.g. "/var/tmp/abc"),
    it will be stored into the "output-dir/var/tmp/abc".
    If there are multiple input files or directories,
    name collision resolution will follow the same rules as `--output-dir-flat`.
* `--format=FORMAT`:
    compress and decompress in other formats. If compiled with
    support, zstd can compress to or decompress from other compression algorithm
    formats. Possibly available options are `zstd`, `gzip`, `xz`, `lzma`, and `lz4`.
    If no such format is provided, `zstd` is the default.
* `-h`/`-H`, `--help`:
    display help/long help and exit
* `-V`, `--version`:
    display version number and exit.
    Advanced: `-vV` also displays supported formats.
    `-vvV` also displays POSIX support.
    `-q` will only display the version number, suitable for machine reading.
* `-v`, `--verbose`:
    verbose mode, display more information
* `-q`, `--quiet`:
    suppress warnings, interactivity, and notifications.
    specify twice to suppress errors too.
* `--no-progress`:
    do not display the progress bar, but keep all other messages.
* `--show-default-cparams`:
    Shows the default compression parameters that will be used for a
    particular src file. If the provided src file is not a regular file
    (e.g. a named pipe), the cli will just output the default parameters.
    That is, the parameters that are used when the src size is unknown.
* `--`:
    All arguments after `--` are treated as files

### Restricted usage of Environment Variables

Using environment variables to set parameters has security implications.
Therefore, this avenue is intentionally restricted.
Only `ZSTD_CLEVEL` and `ZSTD_NBTHREADS` are currently supported.
They set the compression level and number of threads to use during compression, respectively.

`ZSTD_CLEVEL` can be used to set the level between 1 and 19 (the "normal" range).
If the value of `ZSTD_CLEVEL` is not a valid integer, it will be ignored with a warning message.
`ZSTD_CLEVEL` just replaces the default compression level (`3`).

`ZSTD_NBTHREADS` can be used to set the number of threads `zstd` will attempt to use during compression.
If the value of `ZSTD_NBTHREADS` is not a valid unsigned integer, it will be ignored with a warning message.
`ZSTD_NBTHREADS` has a default value of `1`, and is capped at `ZSTDMT_NBWORKERS_MAX`
(64 in 32-bit mode, 256 in 64-bit environments, as described under `-T#`). `zstd` must be
compiled with multithread support for this to have any effect.

They can both be overridden by corresponding command line arguments:
`-#` for compression level and `-T#` for number of compression threads.


DICTIONARY BUILDER
------------------
`zstd` offers _dictionary_ compression,
which greatly improves efficiency on small files and messages.
It's possible to train `zstd` with a set of samples,
the result of which is saved into a file called a `dictionary`.
Then during compression and decompression, reference the same dictionary,
using command `-D dictionaryFileName`.
Compression of small files similar to the sample set will be greatly improved.

* `--train FILEs`:
    Use FILEs as training set to create a dictionary.
    The training set should contain a lot of small files (> 100),
    and weigh typically 100x the target dictionary size
    (for example, 10 MB for a 100 KB dictionary).
    `--train` can be combined with `-r` to indicate a directory rather than listing all the files,
    which can be useful to circumvent shell expansion limits.

    `--train` supports multithreading if `zstd` is compiled with threading support (default).
    Additional parameters can be specified with `--train-fastcover`.
    The legacy dictionary builder can be accessed with `--train-legacy`.
    The slower cover dictionary builder can be accessed with `--train-cover`.
    Default is equivalent to `--train-fastcover=d=8,steps=4`.
* `-o file`:
    Dictionary saved into `file` (default name: dictionary).
* `--maxdict=#`:
    Limit dictionary to specified size (default: 112640).
* `-#`:
    Use `#` compression level during training (optional).
    Will generate statistics more tuned for selected compression level,
    resulting in a _small_ compression ratio improvement for this level.
* `-B#`:
    Split input files into blocks of size # (default: no split)
* `-M#`, `--memory=#`:
    Limit the amount of sample data loaded for training (default: 2 GB). See above for details.
* `--dictID=#`:
    A dictionary ID is a locally unique ID
    that a decoder can use to verify it is using the right dictionary.
    By default, zstd will create a 4-byte random number ID.
    It's possible to give a precise number instead.
    Short numbers have an advantage: an ID < 256 will only need 1 byte in the
    compressed frame header, and an ID < 65536 will only need 2 bytes.
    This compares favorably to the 4-byte default.
    However, it's up to the dictionary manager not to assign the same ID to
    2 different dictionaries.
* `--train-cover[=k=#,d=#,steps=#,split=#,shrink[=#]]`:
    Select parameters for the cover dictionary builder algorithm.
    If _d_ is not specified, then it tries _d_ = 6 and _d_ = 8.
    If _k_ is not specified, then it tries _steps_ values in the range [50, 2000].
    If _steps_ is not specified, then the default value of 40 is used.
    If _split_ is not specified or split <= 0, then the default value of 100 is used.
    Requires that _d_ <= _k_.
    If _shrink_ flag is not used, then the default value for _shrinkDict_ of 0 is used.
    If _shrink_ is not specified, then the default value for _shrinkDictMaxRegression_ of 1 is used.

    Selects segments of size _k_ with highest score to put in the dictionary.
    The score of a segment is computed by the sum of the frequencies of all the
    subsegments of size _d_.
    Generally _d_ should be in the range [6, 8], occasionally up to 16, but the
    algorithm will run faster with _d_ <= 8.
    Good values for _k_ vary widely based on the input data, but a safe range is
    [2 * _d_, 2000].
    If _split_ is 100, all input samples are used for both training and testing
    to find optimal _d_ and _k_ to build dictionary.
    Supports multithreading if `zstd` is compiled with threading support.
    Having _shrink_ enabled takes a truncated dictionary of minimum size and doubles
    its size until the compression ratio of the truncated dictionary is at most
    _shrinkDictMaxRegression%_ worse than the compression ratio of the largest dictionary.

    Examples:

    `zstd --train-cover FILEs`

    `zstd --train-cover=k=50,d=8 FILEs`

    `zstd --train-cover=d=8,steps=500 FILEs`

    `zstd --train-cover=k=50 FILEs`

    `zstd --train-cover=k=50,split=60 FILEs`

    `zstd --train-cover=shrink FILEs`

    `zstd --train-cover=shrink=2 FILEs`

* `--train-fastcover[=k=#,d=#,f=#,steps=#,split=#,accel=#]`:
    Same as cover but with extra parameters _f_ and _accel_ and a different default value of _split_.
    If _split_ is not specified, then it tries _split_ = 75.
    If _f_ is not specified, then it tries _f_ = 20.
    Requires that 0 < _f_ < 32.
    If _accel_ is not specified, then it tries _accel_ = 1.
    Requires that 0 < _accel_ <= 10.
    Requires that _d_ = 6 or _d_ = 8.

    _f_ is log of size of array that keeps track of frequency of subsegments of size _d_.
    The subsegment is hashed to an index in the range [0,2^_f_ - 1].
    It is possible that 2 different subsegments are hashed to the same index, and they are considered as the same subsegment when computing frequency.
    Using a higher _f_ reduces collisions but takes longer.

    Examples:

    `zstd --train-fastcover FILEs`

    `zstd --train-fastcover=d=8,f=15,accel=2 FILEs`

* `--train-legacy[=selectivity=#]`:
    Use legacy dictionary builder algorithm with the given dictionary
    _selectivity_ (default: 9).
    The smaller the _selectivity_ value, the denser the dictionary,
    improving its efficiency but reducing its possible maximum size.
    `--train-legacy=s=#` is also accepted.

    Examples:

    `zstd --train-legacy FILEs`

    `zstd --train-legacy=selectivity=8 FILEs`


BENCHMARK
---------

* `-b#`:
    benchmark file(s) using compression level #
* `-e#`:
    benchmark file(s) using multiple compression levels, from `-b#` to `-e#` (inclusive)
* `-i#`:
    minimum evaluation time, in seconds (default: 3s), benchmark mode only
* `-B#`, `--block-size=#`:
    cut file(s) into independent blocks of size # (default: no block)
* `--priority=rt`:
    set process priority to real-time

**Output Format:** CompressionLevel#Filename : InputSize -> OutputSize (CompressionRatio), CompressionSpeed, DecompressionSpeed

**Methodology:**
For both compression and decompression speed, the entire input is compressed/decompressed in-memory to measure speed. A run lasts at least 1 sec, so when files are small, they are compressed/decompressed several times per run, in order to improve measurement accuracy.

ADVANCED COMPRESSION OPTIONS
----------------------------
### -B#:
Select the size of each compression job.
This parameter is only available when multi-threading is enabled.
Each compression job is run in parallel, so this value indirectly impacts the number of active threads.
Default job size varies depending on compression level (generally `4 * windowSize`).
`-B#` makes it possible to manually select a custom size.
Note that job size must respect a minimum value which is enforced transparently.
This minimum is either 512 KB, or `overlapSize`, whichever is larger.
Different job sizes will lead to (slightly) different compressed frames.

### --zstd[=options]:
`zstd` provides 22 predefined compression levels.
The selected or default predefined compression level can be changed with
advanced compression options.
The _options_ are provided as a comma-separated list.
You may specify only the options you want to change and the rest will be
taken from the selected or default compression level.
The list of available _options_:

- `strategy`=_strat_, `strat`=_strat_:
    Specify a strategy used by a match finder.

    There are 9 strategies numbered from 1 to 9, from faster to stronger:
    1=ZSTD\_fast, 2=ZSTD\_dfast, 3=ZSTD\_greedy,
    4=ZSTD\_lazy, 5=ZSTD\_lazy2, 6=ZSTD\_btlazy2,
    7=ZSTD\_btopt, 8=ZSTD\_btultra, 9=ZSTD\_btultra2.

- `windowLog`=_wlog_, `wlog`=_wlog_:
    Specify the maximum number of bits for a match distance.

    A higher number of bits increases the chance to find a match, which usually
    improves compression ratio.
    It also increases memory requirements for the compressor and decompressor.
    The minimum _wlog_ is 10 (1 KiB) and the maximum is 30 (1 GiB) on 32-bit
    platforms and 31 (2 GiB) on 64-bit platforms.

    Note: If `windowLog` is set to larger than 27, `--long=windowLog` or
    `--memory=windowSize` needs to be passed to the decompressor.

- `hashLog`=_hlog_, `hlog`=_hlog_:
    Specify the maximum number of bits for a hash table.

    Bigger hash tables cause fewer collisions, which usually makes compression
    faster, but require more memory during compression.

    The minimum _hlog_ is 6 (64 B) and the maximum is 30 (1 GiB).

- `chainLog`=_clog_, `clog`=_clog_:
    Specify the maximum number of bits for a hash chain or a binary tree.

    Higher numbers of bits increase the chance to find a match, which usually
    improves compression ratio.
    It also slows down compression speed and increases memory requirements for
    compression.
    This option is ignored for the ZSTD\_fast strategy.

    The minimum _clog_ is 6 (64 B) and the maximum is 29 (512 MiB) on 32-bit platforms
    and 30 (1 GiB) on 64-bit platforms.

- `searchLog`=_slog_, `slog`=_slog_:
    Specify the maximum number of searches in a hash chain or a binary tree
    using logarithmic scale.

    More searches increase the chance to find a match, which usually increases
    compression ratio but decreases compression speed.

    The minimum _slog_ is 1 and the maximum is `windowLog` - 1.

- `minMatch`=_mml_, `mml`=_mml_:
    Specify the minimum searched length of a match in a hash table.

    Larger search lengths usually decrease compression ratio but improve
    decompression speed.

    The minimum _mml_ is 3 and the maximum is 7.

- `targetLength`=_tlen_, `tlen`=_tlen_:
    The impact of this field varies depending on the selected strategy.

    For ZSTD\_btopt, ZSTD\_btultra and ZSTD\_btultra2, it specifies
    the minimum match length that causes the match finder to stop searching.
    A larger `targetLength` usually improves compression ratio
    but decreases compression speed.

    For ZSTD\_fast, it triggers ultra-fast mode when > 0.
    The value represents the amount of data skipped between match sampling.
    Impact is reversed : a larger `targetLength` increases compression speed
    but decreases compression ratio.

    For all other strategies, this field has no impact.

    The minimum _tlen_ is 0 and the maximum is 128 KiB.

- `overlapLog`=_ovlog_, `ovlog`=_ovlog_:
    Determine `overlapSize`, the amount of data reloaded from the previous job.
    This parameter is only available when multithreading is enabled.
    Reloading more data improves compression ratio, but decreases speed.

    The minimum _ovlog_ is 0, and the maximum is 9.
    1 means "no overlap", hence completely independent jobs.
    9 means "full overlap", meaning up to `windowSize` is reloaded from the previous job.
    Reducing _ovlog_ by 1 reduces the reloaded amount by a factor of 2.
    For example, 8 means "windowSize/2", and 6 means "windowSize/8".
    Value 0 is special and means "default" : _ovlog_ is automatically determined by `zstd`.
    In that case, _ovlog_ will range from 6 to 9, depending on the selected _strat_.

- `ldmHashLog`=_lhlog_, `lhlog`=_lhlog_:
    Specify the maximum size for a hash table used for long distance matching.

    This option is ignored unless long distance matching is enabled.

    Bigger hash tables usually improve compression ratio at the expense of more
    memory during compression and a decrease in compression speed.

    The minimum _lhlog_ is 6 and the maximum is 30 (default: 20).

- `ldmMinMatch`=_lmml_, `lmml`=_lmml_:
    Specify the minimum searched length of a match for long distance matching.

    This option is ignored unless long distance matching is enabled.

    Larger/very small values usually decrease compression ratio.

    The minimum _lmml_ is 4 and the maximum is 4096 (default: 64).

- `ldmBucketSizeLog`=_lblog_, `lblog`=_lblog_:
    Specify the size of each bucket for the hash table used for long distance
    matching.

    This option is ignored unless long distance matching is enabled.

    Larger bucket sizes improve collision resolution but decrease compression
    speed.

    The minimum _lblog_ is 1 and the maximum is 8 (default: 3).

- `ldmHashRateLog`=_lhrlog_, `lhrlog`=_lhrlog_:
    Specify the frequency of inserting entries into the long distance matching
    hash table.

    This option is ignored unless long distance matching is enabled.

    Larger values will improve compression speed. Deviating far from the
    default value will likely result in a decrease in compression ratio.

    The default value is `wlog - lhlog`.

### Example
The following parameters set advanced compression options to something
similar to predefined level 19 for files bigger than 256 KB:

`--zstd`=wlog=23,clog=23,hlog=22,slog=6,mml=3,tlen=48,strat=6


BUGS
----
Report bugs at: https://github.com/facebook/zstd/issues

AUTHOR
------
Yann Collet
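
NOTES
-----
Several options under ADVANCED COMPRESSION OPTIONS are expressed on a logarithmic scale.
The sketch below only illustrates that arithmetic as this page describes it: a window of `2^wlog` bytes, and an `overlapSize` that halves for each step of _ovlog_ below 9.
The helper names `window_size` and `overlap_size` are hypothetical and are not part of `zstd`; the special value _ovlog_=0 (chosen automatically by `zstd`) is deliberately not modelled.

```shell
# Illustrative helpers only; zstd itself provides no such commands.

# window_size WLOG: print the window size in bytes (2^WLOG).
window_size() {
    echo $((1 << $1))
}

# overlap_size WLOG OVLOG: print the amount of data reloaded from the
# previous job, following the overlapLog description on this page:
# 9 reloads the full window, each step down halves the amount,
# and 1 means no overlap at all.
overlap_size() {
    wlog=$1
    ovlog=$2
    if [ "$ovlog" -eq 0 ]; then
        # 0 means "default": zstd picks a value itself, so we cannot compute it.
        echo "ovlog=0 is selected by zstd; not modelled here" >&2
        return 1
    elif [ "$ovlog" -eq 1 ]; then
        echo 0
    else
        echo $(( (1 << wlog) >> (9 - ovlog) ))
    fi
}

window_size 23      # prints 8388608 (8 MiB)
overlap_size 23 8   # prints 4194304 (windowSize/2)
overlap_size 23 6   # prints 1048576 (windowSize/8)
```

With `wlog=23`, as in the Example section, an _ovlog_ of 9 would reload the whole 8 MiB window from the previous job.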