.\"-
.\" Copyright (c) 2004-2016 Maxim Sobolev <sobomax@FreeBSD.org>
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd August 9, 2019
.Dt MKUZIP 8
.Os
.Sh NAME
.Nm mkuzip
.Nd compress disk image for use with
.Xr geom_uzip 4
class
.Sh SYNOPSIS
.Nm
.Op Fl dLSvZ
.Op Fl A Ar compression_algorithm
.Op Fl C Ar compression_level
.Op Fl j Ar compression_jobs
.Op Fl o Ar outfile
.Op Fl s Ar cluster_size
.Ar infile
.Sh DESCRIPTION
The
.Nm
utility compresses a disk image file so that the
.Xr geom_uzip 4
class will be able to decompress the resulting image at run-time.
This allows for a significant reduction in the size of the disk image at
the expense of some CPU time required to decompress the data each
time it is read.
The
.Nm
utility
works in two phases:
.Bl -enum
.It
The
.Ar infile
image is split into clusters; each cluster is compressed.
.It
The resulting set of compressed clusters is written to the output file.
In addition, a
.Dq table of contents
header is written which allows for efficient seeking.
.El
.Pp
The options are:
.Bl -tag -width indent
.It Fl A Op Ar lzma | Ar zlib | Ar zstd
Select a specific compression algorithm.
If this option is not provided, the default is
.Ar zlib .
.Pp
The
.Ar lzma
algorithm provides noticeably better compression ratios than zlib on the same
data set.
It has vastly slower compression speed and moderately slower decompression
speed.
.Pp
The
.Ar zstd
algorithm provides better compression ratios than zlib on the same data set.
It also has faster compression and decompression speed than zlib.
At very high compression
.Dq level
settings, it does not offer quite as high a compression ratio as
.Ar lzma .
However, its decompression speed does not suffer at high compression
.Dq levels .
.It Fl C Ar compression_level
Select the integer compression level used to parameterize the chosen
compression algorithm.
.Pp
For any given algorithm, a lesser number selects a faster compression mode.
A greater number selects a slower compression mode.
Typically, for the same algorithm, a greater
.Ar compression_level
provides better final compression ratio.
.Pp
For
.Ar lzma ,
the range of valid compression levels is
.Va 0-9 .
The
.Nm
default for lzma is
.Va 6 .
.Pp
For
.Ar zlib ,
the range of valid compression levels is
.Va 1-9 .
The
.Nm
default for zlib is
.Va 9 .
.Pp
For
.Ar zstd ,
the range of valid compression levels is currently
.Va 1-19 .
The
.Nm
default for zstd is
.Va 9 .
.It Fl d
Enable de-duplication.
When this option is enabled,
.Nm
detects identical blocks in the input and replaces each subsequent occurrence
of such a block with a pointer to the very first one in the output.
Setting this option results in a moderate decrease in compressed image size,
typically around 3-5% of the final size of the compressed image.
.It Fl j Ar compression_jobs
Specify the number of compression jobs that
.Nm
runs in parallel to speed up compression.
When this option is not specified, the number of jobs is set to the value
of the
.Va hw.ncpu
.Xr sysctl 8
variable.
.It Fl L
Legacy flag equivalent to
.Dq Fl A Ar lzma .
.It Fl o Ar outfile
Write the output to
.Ar outfile .
The default is to use the input name with the suffix
.Pa .uzip
for
.Xr zlib 3
compression or
.Pa .ulzma
for
.Xr lzma 3
compression.
.It Fl S
Print a summary of the compression ratio and the output
file size after the file has been processed.
.It Fl s Ar cluster_size
Split the image into clusters of
.Ar cluster_size
bytes.
The default is 16384 bytes.
The
.Ar cluster_size
should be a multiple of 512 bytes.
.It Fl v
Display verbose messages.
.It Fl Z
Disable zero-block detection and elimination.
When this option is set,
.Nm
compresses blocks of zero bytes just as it would any other block.
When the option is not set,
.Nm
detects and compresses zero blocks in a space-efficient way.
Setting
.Fl Z
increases compressed image sizes slightly, typically less than 0.1%.
.El
.Sh IMPLEMENTATION NOTES
The compression ratio largely depends on the compression algorithm, level, and
cluster size used.
For large cluster sizes (16 kB and higher), typical overall image compression
ratios with
.Xr zlib 3
are only 1-2% less than those achieved with
.Xr gzip 1
over the entire image.
However, larger cluster sizes lead to higher overhead in the
.Xr geom_uzip 4
class, as the class has to decompress the whole cluster even if
only a few bytes from that cluster have to be read.
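.Pp
As a rough illustration of that overhead, assume the worst case in which a
single 512-byte sector is requested and nothing is cached (both are
assumptions made here for illustration only).
Satisfying the request then means decompressing one full cluster:
.Bd -literal -offset indent
cluster_size   bytes decompressed   ratio to 512-byte read
       16384                16384                      32x
       65536                65536                     128x
      131072               131072                     256x
.Ed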
.Pp
Additionally, the threshold at 16-32 kB where a larger cluster size does not
benefit overall compression ratio is an artifact of the
.Xr zlib 3
algorithm in particular.
.Ar Lzma
and
.Ar Zstd
will continue to provide better compression ratios as cluster sizes
are increased, at high enough compression levels.
The same tradeoff continues to apply: reads in
.Xr geom_uzip 4
become more expensive the greater the cluster size.
.Pp
The
.Nm
utility
inserts a short shell script at the beginning of the generated image,
which makes it possible to
.Dq run
the image just like any other shell script.
The script tries to load the
.Xr geom_uzip 4
class if it is not loaded, configure the image as an
.Xr md 4
disk device using
.Xr mdconfig 8 ,
and automatically mount it using
.Xr mount_cd9660 8
on the mount point provided as the first argument to the script.
.Pp
De-duplication is a
.Fx
specific feature.
While it does not require any changes to the on-disk compressed image
format, it did require matching changes to
.Xr geom_uzip 4
to handle the resulting images correctly.
.Pp
To make use of
.Ar zstd
.Nm
images, the kernel must be configured with
.Cd ZSTDIO .
It is enabled by default in many
.Cd GENERIC
kernels provided as binary distributions by
.Fx .
The status on any particular system can be verified by checking
.Xr sysctl 8
.Dv kern.features.geom_uzip_zstd
for
.Dq 1 .
.Sh EXIT STATUS
.Ex -std
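.Sh EXAMPLES
The following invocations are illustrative sketches that use only the
options documented in this page.
The file names
.Pa cdrom.iso
and
.Pa cdrom.uzst
and the mount point
.Pa /mnt
are placeholders.
.Pp
Compress an image with the default
.Ar zlib
algorithm and 16384-byte clusters, printing a summary when done.
With no
.Fl o
option the output is written to
.Pa cdrom.iso.uzip :
.Bd -literal -offset indent
mkuzip -S cdrom.iso
.Ed
.Pp
Compress with
.Ar zstd
at level 19, with de-duplication, 65536-byte clusters, and an explicit
output name:
.Bd -literal -offset indent
mkuzip -A zstd -C 19 -d -s 65536 -o cdrom.uzst cdrom.iso
.Ed
.Pp
Before relying on a
.Ar zstd
image, check that the running kernel reports support for it:
.Bd -literal -offset indent
sysctl kern.features.geom_uzip_zstd
.Ed
.Pp
Attach and mount a compressed image by hand, performing the same steps as
the embedded shell script.
This sketch assumes that the image holds an ISO 9660 file system, that
.Xr md 4
unit 0 is free, that the
.Xr geom_uzip 4
class is not loaded yet, and that the decompressed provider appears as
.Pa /dev/md0.uzip :
.Bd -literal -offset indent
kldload geom_uzip
mdconfig -a -t vnode -f cdrom.iso.uzip -u 0
mount_cd9660 /dev/md0.uzip /mnt
.Ed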
.Sh SEE ALSO
.Xr gzip 1 ,
.Xr xz 1 ,
.Xr zstd 1 ,
.Xr zlib 3 ,
.Xr geom 4 ,
.Xr geom_uzip 4 ,
.Xr md 4 ,
.Xr mdconfig 8 ,
.Xr mount_cd9660 8
.Sh AUTHORS
.An Maxim Sobolev Aq Mt sobomax@FreeBSD.org