	@(#)README	8.1 (Berkeley) 6/9/93

Compress version 4.0 improvements over 3.0:
	o compress() speedup (10-50%) by changing division hash to xor
	o decompress() speedup (5-10%)
	o Memory requirements reduced (3-30%)
	o Stack requirements reduced to less than 4kb
	o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
	o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
	o Default to 'quiet' mode
	o Unification of 'force' flags
	o Manual page overhaul
	o Portability enhancement for M_XENIX
	o Removed text on #else and #endif
	o Added "-V" switch to print version and options
	o Added #defines for SIGNED_COMPARE_SLOW
	o Added Makefile and "usermem" program
	o Removed all floating point computations
	o New programs: [deleted]
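
The first item above mentions replacing a division hash with an xor hash.
A minimal sketch of the difference might look like the following.  The
table size, the shift amount and both function names are assumptions made
for illustration; they are not taken from compress.c.

```c
/* Illustrative only: HSIZE, HSHIFT and both function names are assumed. */
#define HSIZE	69001L	/* assumed hash table size (a prime above 2^16) */
#define HSHIFT	8	/* assumed shift; keeps the xor result below HSIZE */

/* Division-style hash: combine character and prefix code, reduce mod HSIZE. */
static long
hash_div(int c, long ent, int maxbits)
{
	return ((((long)c << maxbits) + ent) % HSIZE);
}

/*
 * Xor-style hash: no divide.  For c in 0..255 and ent below 65536 the
 * result stays below 65536, so it is already a valid table index.
 */
static long
hash_xor(int c, long ent)
{
	return (((long)c << HSHIFT) ^ ent);
}
```

The xor form avoids one divide per input character, which is presumably
where most of the quoted speedup comes from.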

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 bsd.] If you can't get it to work at all, just create a file
"USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

	o USERMEM		Maximum process memory on the system
	o SACREDMEM		Amount to reserve for other processes
	o SIGNED_COMPARE_SLOW	Unsigned compare instructions are faster
	o NO_UCHAR		Don't use "unsigned char" types
	o BITS			Overrules default set by USERMEM-SACREDMEM
	o vax			Generate inline assembler
	o interdata		Defines SIGNED_COMPARE_SLOW
	o M_XENIX		Makes arrays < 65536 bytes each
	o pdp11			BITS=12, NO_UCHAR
	o z8000			BITS=12
	o pcxt			BITS=12
	o BSD4_2		Allow long filenames ( > 14 characters) &
				Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least		BITS
------  -- -----		----
     433,484			 16
     229,600			 15
     127,536			 14
      73,464			 13
           0			 12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.
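
The table above can be read as a simple threshold lookup.  The sketch
below does that lookup at run time purely for illustration; compress.c
makes the choice at compile time from the preprocessor symbols listed
earlier.  The function name max_bits_for() is hypothetical.

```c
/*
 * Illustrative only: returns the largest BITS the 4.0 table allows for
 * a given amount of memory.  Thresholds are copied from the table above;
 * the function name is hypothetical.
 */
static int
max_bits_for(long usermem, long sacredmem)
{
	long avail = usermem - sacredmem;

	if (avail >= 433484L)
		return (16);
	if (avail >= 229600L)
		return (15);
	if (avail >= 127536L)
		return (14);
	if (avail >= 73464L)
		return (13);
	return (12);
}
```

For example, 300,000 bytes of user memory with 100,000 reserved leaves
200,000 bytes, which falls in the BITS=14 row.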

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.
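
One way to tell in advance whether a file will decompress on a smaller
machine is to look at its header.  The sketch below assumes the usual
compress layout: the two magic bytes \037\235 followed by one byte whose
low five bits hold the maximum code size.  The magic number is mentioned
in the 2.0 notes further down; the layout of the third byte and every
name in the code are assumptions for illustration.

```c
#include <stddef.h>

#define MAGIC_0		0x1f	/* \037 */
#define MAGIC_1		0x9d	/* \235 */
#define BIT_MASK	0x1f	/* assumed: low five bits hold the code size */

/* Returns the code size recorded in the header, or -1 if it is not ours. */
static int
header_bits(const unsigned char *buf, size_t len)
{
	if (len < 3 || buf[0] != MAGIC_0 || buf[1] != MAGIC_1)
		return (-1);
	return (buf[2] & BIT_MASK);
}

/* Nonzero if a decompressor limited to local_max_bits can read the file. */
static int
can_decompress(const unsigned char *buf, size_t len, int local_max_bits)
{
	int bits = header_bits(buf, len);

	return (bits >= 0 && bits <= local_max_bits);
}
```

A file written with 16 bits would be rejected by can_decompress() when
local_max_bits is 12, which is the situation the warning describes.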

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

	-from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  After the BITS run out, the
	compression ratio is checked every so often.  If it is decreasing,
	the table is cleared and a new set of substrings is generated.

	This makes the output of compress 3.0 not compatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This improves the speed of the program, especially
	during decompression.  Other speed improvements have been made,
	such as using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual machine.  Note that the
	speed improvement only occurs when the input file is > 30000
	characters, and the -b BITS is less than or equal to the cutoff
	described below.
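
The ratio check in item 1 can be sketched as follows.  This is only an
illustration of the idea, using integer arithmetic; the structure, the
function name and the scaling by 256 are assumptions and not the code in
compress.c.  Note that this sketch also clears when the ratio merely
fails to improve.

```c
/* Hypothetical bookkeeping for the ratio check. */
struct ratio_state {
	long best;	/* best ratio since the last clear, scaled by 256 */
};

/*
 * Called at a checkpoint once the code table is full.
 * Returns 1 if the table should be cleared, 0 otherwise.
 */
static int
should_clear(struct ratio_state *st, long in_count, long bytes_out)
{
	long rat;

	if (bytes_out <= 0)
		return (0);
	if (in_count > 0x007fffffL) {
		/* in_count * 256 could overflow 32 bits; scale the divisor */
		long d = bytes_out >> 8;

		rat = (d == 0) ? 0x7fffffffL : in_count / d;
	} else {
		rat = (in_count << 8) / bytes_out;
	}
	if (rat > st->best) {
		st->best = rat;		/* still improving: keep the table */
		return (0);
	}
	st->best = 0;			/* no improvement: start over */
	return (1);
}
```

A caller would emit some agreed "clear" marker and reset its table
whenever should_clear() returns 1, so that the decompressor can reset at
the same point.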

Most of these changes were made by James A. Woods (ames!jaw).  Thank you
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least		BITS		cutoff
------  -- -----		----		------
   4,718,592			 16		  13
   2,621,440			 16		  12
   1,572,864			 16		  11
   1,048,576			 16		  10
     631,808			 16		  --
     329,728			 15		  --
     178,176			 14		  --
      99,328			 13		  --
           0			 12		  --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If your machine has "int" as 16-bits, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat
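
The links work because a single binary can decide what to do from the
name it was invoked under.  A minimal sketch of that kind of dispatch is
shown here; the enum and the function name are hypothetical and the code
is not copied from compress.c.

```c
#include <string.h>

/* Hypothetical modes, one per link name. */
enum mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/* Pick a mode from argv[0], ignoring any leading directory part. */
static enum mode
mode_from_name(const char *argv0)
{
	const char *base = strrchr(argv0, '/');

	base = (base != NULL) ? base + 1 : argv0;
	if (strcmp(base, "uncompress") == 0)
		return (MODE_UNCOMPRESS);
	if (strcmp(base, "zcat") == 0)
		return (MODE_ZCAT);
	return (MODE_COMPRESS);
}
```

With this approach, main() would call mode_from_name(argv[0]) once and
then process options as usual.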

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
useable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting has a bug which I fixed,
>		causing incompatible files.  I recommend you NOT to use this
>		option unless you already have a lot of packed files from
>		the original posting by thomas.
>2.	The exit status is not well defined (on some machines) causing the
>	scripts to fail.
>		The exit status is now 0,1 or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added facilities of "compact" into the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed work around for C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  Put the bits specified
>	in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844