.\"-
.\" Copyright (c) 2004-2016 Maxim Sobolev <sobomax@FreeBSD.org>
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd August 9, 2019
.Dt MKUZIP 8
.Os
.Sh NAME
.Nm mkuzip
.Nd compress disk image for use with
.Xr geom_uzip 4
class
.Sh SYNOPSIS
.Nm
.Op Fl dSsvZ
.Op Fl A Ar compression_algorithm
.Op Fl C Ar compression_level
.Op Fl j Ar compression_jobs
.Op Fl o Ar outfile
.Op Fl s Ar cluster_size
.Ar infile
.Sh DESCRIPTION
The
.Nm
utility compresses a disk image file so that the
.Xr geom_uzip 4
class will be able to decompress the resulting image at run-time.
This significantly reduces the size of the disk image at
the expense of some CPU time required to decompress the data each
time it is read.
The
.Nm
utility
works in two phases:
.Bl -enum
.It
The
.Ar infile
image is split into clusters; each cluster is compressed.
.It
The resulting set of compressed clusters is written to the output file.
In addition, a
.Dq table of contents
header is written which allows for efficient seeking.
.El
.Pp
The options are:
.Bl -tag -width indent
.It Fl A Op Ar lzma | Ar zlib | Ar zstd
Select a specific compression algorithm.
If this option is not provided, the default is
.Ar zlib .
.Pp
The
.Ar lzma
algorithm provides noticeably better compression ratios than zlib on the same
data set.
It has vastly slower compression speed and moderately slower decompression
speed.
.Pp
The
.Ar zstd
algorithm provides better compression ratios than zlib on the same data set.
It also has faster compression and decompression speed than zlib.
At very high compression
.Dq level
settings, it does not offer quite as high a compression ratio as
.Ar lzma .
However, its decompression speed does not suffer at high compression
.Dq levels .
.It Fl C Ar compression_level
Select the integer compression level used to parameterize the chosen
compression algorithm.
.Pp
For any given algorithm, a lesser number selects a faster compression mode.
A greater number selects a slower compression mode.
Typically, for the same algorithm, a greater
.Ar compression_level
provides a better final compression ratio.
.Pp
For
.Ar lzma ,
the range of valid compression levels is
.Va 0-9 .
The
.Nm
default for lzma is
.Va 6 .
.Pp
For
.Ar zlib ,
the range of valid compression levels is
.Va 1-9 .
The
.Nm
default for zlib is
.Va 9 .
.Pp
For
.Ar zstd ,
the range of valid compression levels is currently
.Va 1-19 .
The
.Nm
default for zstd is
.Va 9 .
.It Fl d
Enable de-duplication.
When this option is enabled,
.Nm
detects identical blocks in the input and replaces each subsequent occurrence
of such a block with a pointer to the first one in the output.
Setting this option results in a moderate decrease of compressed image size,
typically around 3-5% of the final size of the compressed image.
.It Fl j Ar compression_jobs
Specify the number of compression jobs that
.Nm
runs in parallel to speed up compression.
When this option is not specified, the number of jobs is set to
the value of the
.Va hw.ncpu
.Xr sysctl 8
variable.
.It Op Fl L
Legacy flag that indicates the same thing as
.Dq Fl A Ar lzma .
.It Fl o Ar outfile
Write the output to
.Ar outfile .
The default is to use the input name with the suffix
.Pa .uzip
for
.Xr zlib 3
compression or
.Pa .ulzma
for
.Xr lzma 3
compression.
.It Fl S
Print a summary of the compression ratio and the output
file size after the file has been processed.
.It Fl s Ar cluster_size
Split the image into clusters of
.Ar cluster_size
bytes, 16384 bytes by default.
The
.Ar cluster_size
should be a multiple of 512 bytes.
.It Fl v
Display verbose messages.
.It Fl Z
Disable zero-block detection and elimination.
When this option is set,
.Nm
compresses blocks of zero bytes just as it would any other block.
When the option is not set,
.Nm
detects and compresses zero blocks in a space-efficient way.
Setting
.Fl Z
increases compressed image sizes slightly, typically less than 0.1%.
.El
.Sh IMPLEMENTATION NOTES
The compression ratio largely depends on the compression algorithm, level, and
cluster size used.
For large cluster sizes (16 kB and higher), typical overall image compression
ratios with
.Xr zlib 3
are only 1-2% less than those achieved with
.Xr gzip 1
over the entire image.
However, larger cluster sizes lead to higher
overhead in the
.Xr geom_uzip 4
class, as the class has to decompress the whole cluster even if
only a few bytes from that cluster have to be read.
.Pp
Additionally, the threshold at 16-32 kB where a larger cluster size does not
benefit overall compression ratio is an artifact of the
.Xr zlib 3
algorithm in particular.
.Ar Lzma
and
.Ar Zstd
will continue to provide better compression ratios as cluster sizes
are increased, at high enough compression levels.
The same tradeoff continues to apply: reads in
.Xr geom_uzip 4
become more expensive the greater the cluster size.
.Pp
The
.Nm
utility
inserts a short shell script at the beginning of the generated image,
which makes it possible to
.Dq run
the image just like any other shell script.
The script tries to load the
.Xr geom_uzip 4
class if it is not loaded, configure the image as an
.Xr md 4
disk device using
.Xr mdconfig 8 ,
and automatically mount it using
.Xr mount_cd9660 8
on the mount point provided as the first argument to the script.
.Pp
De-duplication is a
.Fx
specific feature.
While it does not require any changes to the on-disk
compressed image format, it did require matching changes to the
.Xr geom_uzip 4
class to handle the resulting images correctly.
.Pp
To make use of
.Ar zstd
.Nm
images, the kernel must be configured with
.Cd ZSTDIO .
It is enabled by default in many
.Cd GENERIC
kernels provided as binary distributions by
.Fx .
The status on any particular system can be verified by checking
.Xr sysctl 8
.Dv kern.features.geom_uzip_zstd
for
.Dq 1 .
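.Pp
As a sketch of how the options described above combine, the following
invocation compresses an image with
.Ar zstd
at level 15, using 64 kB clusters and de-duplication, and prints a summary
when done.
The file names
.Pa image.iso
and
.Pa image.out
are hypothetical, and the chosen level and cluster size are illustrative
rather than recommended values.
The second command checks the running kernel for
.Ar zstd
support:
.Bd -literal -offset indent
mkuzip -A zstd -C 15 -d -S -s 65536 -o image.out image.iso
sysctl kern.features.geom_uzip_zstd
.Ed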
.Sh EXIT STATUS
.Ex -std
.Sh SEE ALSO
.Xr gzip 1 ,
.Xr xz 1 ,
.Xr zstd 1 ,
.Xr zlib 3 ,
.Xr geom 4 ,
.Xr geom_uzip 4 ,
.Xr md 4 ,
.Xr mdconfig 8 ,
.Xr mount_cd9660 8
.Sh AUTHORS
.An Maxim Sobolev Aq Mt sobomax@FreeBSD.org