Compress version 4.0 improvements over 3.0:
	o compress() speedup (10-50%) by changing division hash to xor
	o decompress() speedup (5-10%)
	o Memory requirements reduced (3-30%)
	o Stack requirements reduced to less than 4kb
	o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
	o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
	o Default to 'quiet' mode
	o Unification of 'force' flags
	o Manual page overhaul
	o Portability enhancement for M_XENIX
	o Removed text on #else and #endif
	o Added "-V" switch to print version and options
	o Added #defines for SIGNED_COMPARE_SLOW
	o Added Makefile and "usermem" program
	o Removed all floating point computations
	o New programs: [deleted]

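The first item above (division hash replaced by xor) can be illustrated with
a small sketch.  This is not the code from compress.c.  The function names,
the table size of 69001 and the shift of 8 are assumptions picked for a table
of 16-bit codes; they only show why the xor form needs no division.

```c
#include <assert.h>

/* Hypothetical table size: a prime a little larger than 2^16 entries. */
#define HSIZE 69001L

/*
 * Division hash: combine the next byte c and the current prefix code ent
 * into one key, then reduce it modulo the table size.  This costs a long
 * modulus operation for every input byte.
 */
static long division_hash(int c, long ent, int maxbits)
{
    long fcode = ((long)c << maxbits) + ent;
    return fcode % HSIZE;
}

/*
 * Xor hash: shift the byte and xor it with the prefix code.  With c in
 * 0..255 and ent below 65536 the result is below 65536, so it is already
 * a valid index into a table of HSIZE entries and no division is needed.
 */
static long xor_hash(int c, long ent)
{
    assert(c >= 0 && c <= 255);
    assert(ent >= 0 && ent < 65536L);
    return ((long)c << 8) ^ ent;
}
```

Both functions return a starting slot only; a real table still needs a
collision strategy (for example, re-probing at a fixed displacement).
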
The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 BSD.] If you can't get it to work at all, just create a file
"USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

	o USERMEM		Maximum process memory on the system
	o SACREDMEM		Amount to reserve for other processes
	o SIGNED_COMPARE_SLOW	Unsigned compare instructions are faster
	o NO_UCHAR		Don't use "unsigned char" types
	o BITS			Overrules default set by USERMEM-SACREDMEM
	o vax			Generate inline assembler
	o interdata		Defines SIGNED_COMPARE_SLOW
	o M_XENIX		Makes arrays < 65536 bytes each
	o pdp11			BITS=12, NO_UCHAR
	o z8000			BITS=12
	o pcxt			BITS=12
	o BSD4_2		Allow long filenames ( > 14 characters) &
				Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least                BITS
------  -- -----                ----
     433,484                     16
     229,600                     15
     127,536                     14
      73,464                     13
           0                     12

The default is BITS=16.

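The table above can be read as a simple threshold lookup.  The sketch below
expresses it as a run-time function purely for illustration; compress itself
is configured at compile time through the preprocessor symbols listed
earlier.  The function name is hypothetical.

```c
#include <assert.h>

/*
 * Return the largest code size (in bits) allowed for a given amount of
 * available memory, i.e. usermem - sacredmem, using the thresholds from
 * the version 4.0 table.
 */
static int max_bits_for_memory(long avail)
{
    assert(avail >= 0);
    if (avail >= 433484L) return 16;
    if (avail >= 229600L) return 15;
    if (avail >= 127536L) return 14;
    if (avail >= 73464L)  return 13;
    return 12;              /* 12 bits is always available */
}
```

For example, a machine with 300,000 bytes available falls in the 15-bit row.
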
The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.

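A decompressor can detect this situation from the file header before doing
any work.  The version 2.0 notes quoted later in this file say that a magic
number (\037\235) was added and that the bits are stored in the file.  The
sketch below assumes that the third byte holds the code size in its low five
bits and a block-mode flag in its high bit; that layout is not described in
this README and is labeled as an assumption.  All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Result codes for the hypothetical check_z_header() below. */
enum { Z_OK = 0, Z_NOT_COMPRESSED = 1, Z_TOO_MANY_BITS = 2 };

/*
 * Inspect the first three bytes of a compressed file.
 *   hdr[0], hdr[1]  magic number, octal 037 and 235
 *   hdr[2]          assumed layout: low 5 bits = maximum code size,
 *                   0x80 = block mode
 * local_max_bits is the BITS limit this build was compiled with.
 */
static int check_z_header(const unsigned char hdr[3], int local_max_bits,
                          int *file_bits, int *block_mode)
{
    assert(hdr != NULL && file_bits != NULL && block_mode != NULL);
    if (hdr[0] != 0x1f || hdr[1] != 0x9d)
        return Z_NOT_COMPRESSED;
    *file_bits = hdr[2] & 0x1f;
    *block_mode = (hdr[2] & 0x80) != 0;
    if (*file_bits > local_max_bits)
        return Z_TOO_MANY_BITS;     /* written with more bits than we support */
    return Z_OK;
}
```

A build limited to 12 bits would reject a 16-bit file with Z_TOO_MANY_BITS
instead of producing garbage.
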
The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

	-from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  After the BITS run out, the
	compression ratio is checked every so often.  If it is decreasing,
	the table is cleared and a new set of substrings is generated.

	This makes the output of compress 3.0 not compatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This improves the speed of the program, especially
	during decompression.  Other speed improvements have been made,
	such as using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual machine.  Note that the
	speed improvement only occurs when the input file is > 30000
	characters, and the -b BITS is less than or equal to the cutoff
	described below.

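The ratio check behind "block" compression (item 1) can be sketched as
follows.  This is a simplified illustration and not the routine from
compress.c: the function name, the fixed-point scale of 256 and the rule
"clear whenever the ratio fails to improve" are assumptions.

```c
#include <assert.h>

/*
 * Called at each checkpoint once the code table is full.
 *   in_count   input bytes read so far
 *   out_count  output bytes written so far (must be > 0)
 *   best       best ratio seen since the last clear, scaled by 256
 * Returns 1 if the table should be cleared, 0 otherwise.
 * Assumes in_count * 256 fits in a long.
 */
static int should_clear_table(long in_count, long out_count, long *best)
{
    long ratio;

    assert(in_count >= 0 && out_count > 0 && best != 0);
    ratio = (in_count * 256) / out_count;   /* fixed point, no floats */
    if (ratio > *best) {
        *best = ratio;      /* still improving: keep the table */
        return 0;
    }
    *best = 0;              /* ratio stopped improving: start over */
    return 1;
}
```

When the function returns 1, the compressor would empty its table and tell
the decompressor to do the same, so both sides rebuild their substrings from
the data that follows.
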
Most of these changes were made by James A. Woods (ames!jaw).  Thank you,
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least                BITS            cutoff
------  -- -----                ----            ------
   4,718,592                     16               13
   2,621,440                     16               12
   1,572,864                     16               11
   1,048,576                     16               10
     631,808                     16               --
     329,728                     15               --
     178,176                     14               --
      99,328                     13               --
           0                     12               --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If "int" is 16 bits on your machine, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat

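The links work because a single binary can choose its behavior from the name
it was invoked under.  A minimal sketch of that technique, with hypothetical
names (this is not the code from compress.c):

```c
#include <assert.h>
#include <string.h>

enum mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/*
 * Pick the operating mode from argv[0].  Only the last path component is
 * examined, so "/usr/local/zcat" and "zcat" behave the same.
 */
static enum mode mode_from_argv0(const char *argv0)
{
    const char *base;

    assert(argv0 != NULL);
    base = strrchr(argv0, '/');
    base = (base != NULL) ? base + 1 : argv0;

    if (strcmp(base, "uncompress") == 0)
        return MODE_UNCOMPRESS;
    if (strcmp(base, "zcat") == 0)
        return MODE_ZCAT;
    return MODE_COMPRESS;   /* any other name compresses */
}
```

A main() would call mode_from_argv0(argv[0]) once at startup and set its
decompress and write-to-stdout options accordingly.
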
On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
usable amount of physical user memory is enclosed, as well as a sample
4.2BSD Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting has a bug which I fixed,
>		causing incompatible files.  I recommend you NOT to use this
>		option unless you already have a lot of packed files from
>		the original posting by Thomas.
>2.	The exit status is not well defined (on some machines) causing the
>	scripts to fail.
>		The exit status is now 0,1 or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added facilities of "compact" into the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed work around for C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  Put the bits specified
>	in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>Unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844