.\"-
.\" Copyright (c) 2004-2016 Maxim Sobolev <sobomax@FreeBSD.org>
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd August 9, 2019
.Dt MKUZIP 8
.Os
.Sh NAME
.Nm mkuzip
.Nd compress disk image for use with
.Xr geom_uzip 4
class
.Sh SYNOPSIS
.Nm
.Op Fl dSsvZ
.Op Fl A Ar compression_algorithm
.Op Fl C Ar compression_level
.Op Fl j Ar compression_jobs
.Op Fl o Ar outfile
.Op Fl s Ar cluster_size
.Ar infile
.Sh DESCRIPTION
The
.Nm
utility compresses a disk image file so that the
.Xr geom_uzip 4
class will be able to decompress the resulting image at run-time.
This allows for a significant reduction in the size of the disk image at
the expense of some CPU time required to decompress the data each
time it is read.
The
.Nm
utility
works in two phases:
.Bl -enum
.It
An
.Ar infile
image is split into clusters; each cluster is compressed.
.It
The resulting set of compressed clusters is written to the output file.
In addition, a
.Dq table of contents
header is written which allows for efficient seeking.
.El
.Pp
The options are:
.Bl -tag -width indent
.It Fl A Op Ar lzma | Ar zlib | Ar zstd
Select a specific compression algorithm.
If this option is not provided, the default is
.Ar zlib .
.Pp
The
.Ar lzma
algorithm provides noticeably better compression levels than zlib on the same
data set.
It has vastly slower compression speed and moderately slower decompression
speed.
.Pp
The
.Ar zstd
algorithm provides better compression levels than zlib on the same data set.
It also has faster compression and decompression speed than zlib.
In the very high compression
.Dq level
settings, it does not offer quite as high a compression ratio as
.Ar lzma .
However, its decompression speed does not suffer at high compression
.Dq levels .
.It Fl C Ar compression_level
Select the integer compression level used to parameterize the chosen
compression algorithm.
.Pp
For any given algorithm, a lesser number selects a faster compression mode.
A greater number selects a slower compression mode.
Typically, for the same algorithm, a greater
.Ar compression_level
provides a better final compression ratio.
.Pp
For
.Ar lzma ,
the range of valid compression levels is
.Va 0-9 .
The
.Nm
default for lzma is
.Va 6 .
.Pp
For
.Ar zlib ,
the range of valid compression levels is
.Va 1-9 .
The
.Nm
default for zlib is
.Va 9 .
.Pp
For
.Ar zstd ,
the range of valid compression levels is currently
.Va 1-19 .
The
.Nm
default for zstd is
.Va 9 .
.It Fl d
Enable de-duplication.
When the option is enabled,
.Nm
detects identical blocks in the input and replaces each subsequent occurrence
of such a block with a pointer to the very first one in the output.
Setting this option results in a moderate decrease of compressed image size,
typically around 3-5% of the final size of the compressed image.
.It Fl j Ar compression_jobs
Specify the number of compression jobs that
.Nm
runs in parallel to speed up compression.
When this option is not specified, the number of jobs is set
to the value of the
.Va hw.ncpu
.Xr sysctl 8
variable.
.It Op Fl L
Legacy flag that indicates the same thing as
.Dq Fl A Ar lzma .
.It Fl o Ar outfile
Name of the output file
.Ar outfile .
The default is to use the input name with the suffix
.Pa .uzip
for the
.Xr zlib 3
compression or
.Pa .ulzma
for the
.Xr lzma 3 .
.It Fl S
Print a summary of the compression ratio and the output
file size after the file has been processed.
.It Fl s Ar cluster_size
Split the image into clusters of
.Ar cluster_size
bytes, 16384 bytes by default.
The
.Ar cluster_size
should be a multiple of 512 bytes.
.It Fl v
Display verbose messages.
.It Fl Z
Disable zero-block detection and elimination.
When this option is set,
.Nm
compresses blocks of zero bytes just as it would any other block.
When the option is not set,
.Nm
detects and compresses zero blocks in a space-efficient way.
Setting
.Fl Z
increases compressed image sizes slightly, typically less than 0.1%.
.El
.Sh IMPLEMENTATION NOTES
The compression ratio largely depends on the compression algorithm, level, and
cluster size used.
For large cluster sizes (16 kB and higher), typical overall image compression
ratios with
.Xr zlib 3
are only 1-2% less than those achieved with
.Xr gzip 1
over the entire image.
However, it should be kept in mind that larger cluster sizes lead to higher
overhead in the
.Xr geom_uzip 4
class, as the class has to decompress the whole cluster even if
only a few bytes from that cluster have to be read.
.Pp
Additionally, the threshold at 16-32 kB where a larger cluster size does not
benefit overall compression ratio is an artifact of the
.Xr zlib 3
algorithm in particular.
.Ar Lzma
and
.Ar Zstd
will continue to provide better compression ratios as cluster sizes
are increased, at high enough compression levels.
The same tradeoff continues to apply: reads in
.Xr geom_uzip 4
become more expensive the greater the cluster size.
.Pp
The
.Nm
utility
inserts a short shell script at the beginning of the generated image,
which makes it possible to
.Dq run
the image just like any other shell script.
The script tries to load the
.Xr geom_uzip 4
class if it is not loaded, configure the image as an
.Xr md 4
disk device using
.Xr mdconfig 8 ,
and automatically mount it using
.Xr mount_cd9660 8
on the mount point provided as the first argument to the script.
.Pp
De-duplication is a
.Fx
specific feature.
While it does not require any changes to the on-disk compressed image format,
it did require matching changes to the
.Xr geom_uzip 4
class to handle the resulting images correctly.
.Pp
To make use of
.Ar zstd
.Nm
images, the kernel must be configured with
.Cd ZSTDIO .
It is enabled by default in many
.Cd GENERIC
kernels provided as binary distributions by
.Fx .
The status on any particular system can be verified by checking
.Xr sysctl 8
.Dv kern.features.geom_uzip_zstd
for
.Dq 1 .
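.Sh EXAMPLES
The following is a minimal sketch of typical usage, not a tested recipe.
The file names
.Pa image.iso
and
.Pa image.uzip ,
the mount point
.Pa /mnt ,
and the
.Xr md 4
unit number 0 are placeholders chosen for illustration.
.Pp
Compress an ISO 9660 image with
.Ar zstd
at level 15, using 65536-byte clusters and de-duplication, and print a
summary when done:
.Bd -literal -offset indent
mkuzip -A zstd -C 15 -d -S -s 65536 -o image.uzip image.iso
.Ed
.Pp
Attach and mount the result by hand, mirroring the steps that the embedded
shell script performs.
This assumes that
.Xr geom_uzip 4
is not yet loaded and that
.Xr mdconfig 8
allocates unit 0:
.Bd -literal -offset indent
kldload geom_uzip
mdconfig -a -t vnode -f image.uzip
mount_cd9660 /dev/md0.uzip /mnt
.Ed
.Pp
Check whether the running kernel can read
.Ar zstd
images:
.Bd -literal -offset indent
sysctl kern.features.geom_uzip_zstd
.Ed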
.Sh EXIT STATUS
.Ex -std
.Sh SEE ALSO
.Xr gzip 1 ,
.Xr xz 1 ,
.Xr zstd 1 ,
.Xr zlib 3 ,
.Xr geom 4 ,
.Xr geom_uzip 4 ,
.Xr md 4 ,
.Xr mdconfig 8 ,
.Xr mount_cd9660 8
.Sh AUTHORS
.An Maxim Sobolev Aq Mt sobomax@FreeBSD.org