Compress version 4.0 improvements over 3.0:
	o compress() speedup (10-50%) by changing division hash to xor
	o decompress() speedup (5-10%)
	o Memory requirements reduced (3-30%)
	o Stack requirements reduced to less than 4kb
	o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
	o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
	o Default to 'quiet' mode
	o Unification of 'force' flags
	o Manual page overhaul
	o Portability enhancement for M_XENIX
	o Removed text on #else and #endif
	o Added "-V" switch to print version and options
	o Added #defines for SIGNED_COMPARE_SLOW
	o Added Makefile and "usermem" program
	o Removed all floating point computations
	o New programs: [deleted]
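
The hash change in the first item can be pictured with a minimal sketch.
It is an illustration only, not the code from "compress.c": the names
HSIZE, HSHIFT, hash_div and hash_xor, and the two constants, are
assumptions chosen so that the xor result always fits in the table.

```c
#define HSIZE  69001L	/* assumed prime table size, large enough for 16-bit codes */
#define HSHIFT 8	/* assumed shift; keeps (c << HSHIFT) ^ ent below HSIZE */

/* Division-style hash: one modulo operation per input character. */
static long hash_div(int c, long ent, int maxbits)
{
	long fcode = ((long)c << maxbits) + ent;

	return fcode % HSIZE;
}

/* Xor-style hash: a shift and an exclusive-or, no division at all. */
static long hash_xor(int c, long ent)
{
	return ((long)c << HSHIFT) ^ ent;
}
```

With c in 0..255 and ent below 65536, hash_xor() never exceeds 65535, so
in this sketch no range reduction is needed before indexing the table.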

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 BSD.]  If you can't get it to work at all, just create a file
"USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

	o USERMEM		Maximum process memory on the system
	o SACREDMEM		Amount to reserve for other processes
	o SIGNED_COMPARE_SLOW	Unsigned compare instructions are faster
	o NO_UCHAR		Don't use "unsigned char" types
	o BITS			Overrules default set by USERMEM-SACREDMEM
	o vax			Generate inline assembler
	o interdata		Defines SIGNED_COMPARE_SLOW
	o M_XENIX		Makes arrays < 65536 bytes each
	o pdp11			BITS=12, NO_UCHAR
	o z8000			BITS=12
	o pcxt			BITS=12
	o BSD4_2		Allow long filenames ( > 14 characters) &
				Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least		BITS
------  -- -----                ----
     433,484			 16
     229,600			 15
     127,536			 14
      73,464			 13
           0			 12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.
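
The table above can be read as a simple threshold lookup.  The sketch
below is one way to express it; the function name max_bits_for() is
hypothetical, and only the thresholds come from the table.

```c
/*
 * Map the available memory (USERMEM - SACREDMEM, in bytes) to the
 * largest BITS value allowed by the 4.0 table.  Hypothetical helper,
 * not a function from compress.c.
 */
static int max_bits_for(long avail)
{
	if (avail >= 433484L)
		return 16;
	if (avail >= 229600L)
		return 15;
	if (avail >= 127536L)
		return 14;
	if (avail >= 73464L)
		return 13;
	return 12;
}
```

For example, max_bits_for(300000L) yields 15, since 300,000 bytes is
enough for the 15-bit row but not for the 16-bit row.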

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.
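
A receiving machine could check this before attempting to decompress.
The sketch below assumes the header layout commonly used by compress:
the two magic bytes \037\235 (mentioned in the 2.0 notes further down)
followed by a third byte whose low five bits hold the BITS value.  That
byte position, the mask, and both function names are assumptions made
for illustration.

```c
#include <stddef.h>

#define MAGIC_1   0x1f	/* \037 */
#define MAGIC_2   0x9d	/* \235 */
#define BITS_MASK 0x1f	/* assumed: low five bits of byte 3 hold BITS */

/* Return the BITS value recorded in the header, or -1 if the magic is absent. */
static int header_bits(const unsigned char *buf, size_t len)
{
	if (len < 3 || buf[0] != MAGIC_1 || buf[1] != MAGIC_2)
		return -1;
	return buf[2] & BITS_MASK;
}

/* Return 1 if a decompressor limited to local_max_bits can handle the file. */
static int can_decompress(const unsigned char *buf, size_t len,
    int local_max_bits)
{
	int bits = header_bits(buf, len);

	return bits >= 0 && bits <= local_max_bits;
}
```

Under these assumptions, a file written with 16 bits is rejected by
can_decompress() when local_max_bits is 12, which is the situation the
warning describes.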

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

	-from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  After the BITS run out, the
	compression ratio is checked every so often.  If it is decreasing,
	the table is cleared and a new set of substrings is generated.

	This makes the output of compress 3.0 not compatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This improves the speed of the program, especially
	during decompression.  Other speed improvements have been made,
	such as using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual machine.  Note that the
	speed improvement only occurs when the input file is > 30000
	characters, and the -b BITS is less than or equal to the cutoff
	described below.
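
The "block" compression check in item 1 can be sketched as follows.
This is a simplified illustration rather than the code in "compress.c":
the struct, the function name, and the fixed-point scaling are
assumptions, and the caller is assumed to invoke the check only after
the code table is full.

```c
/* Best compression ratio seen since the table was last cleared. */
struct ratio_state {
	long best_ratio;	/* input/output ratio, scaled by 256 */
};

/*
 * Return 1 if the table should be cleared, 0 otherwise.  The ratio is
 * kept in integer fixed point, so no floating point is needed.  Very
 * large byte counts would have to be scaled down first where long is
 * only 32 bits wide.
 */
static int should_clear(struct ratio_state *st, long in_bytes, long out_bytes)
{
	long ratio;

	if (out_bytes <= 0)
		return 0;
	ratio = (in_bytes * 256L) / out_bytes;
	if (ratio > st->best_ratio) {
		st->best_ratio = ratio;	/* still improving: keep the table */
		return 0;
	}
	st->best_ratio = 0;		/* ratio fell: start a new block */
	return 1;
}
```

A caller would run this check periodically and, on a return of 1, emit
whatever marker the format uses to tell the decompressor that the table
has been reset.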

Most of these changes were made by James A. Woods (ames!jaw).  Thank you,
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least		BITS		cutoff
------  -- -----                ----            ------
   4,718,592			 16		  13
   2,621,440			 16		  12
   1,572,864			 16		  11
   1,048,576			 16		  10
     631,808			 16		  --
     329,728			 15		  --
     178,176			 14		  --
      99,328			 13		  --
           0			 12		  --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If your machine has "int" as 16-bits, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat
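
The links work because a single binary can choose its behaviour from the
name it was invoked under.  A minimal sketch of that idea follows; the
enum and the function name are hypothetical and are not taken from
"compress.c".

```c
#include <string.h>

enum mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/* Pick the operating mode from the last path component of argv[0]. */
static enum mode mode_from_name(const char *argv0)
{
	const char *base = strrchr(argv0, '/');

	base = base ? base + 1 : argv0;
	if (strcmp(base, "uncompress") == 0)
		return MODE_UNCOMPRESS;
	if (strcmp(base, "zcat") == 0)
		return MODE_ZCAT;
	return MODE_COMPRESS;
}
```

A main() built on this sketch would call mode_from_name(argv[0]) once
at startup and branch on the result.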

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
usable amount of physical user memory is enclosed, as well as a sample
4.2BSD Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting has a bug which I fixed,
>		causing incompatible files.  I recommend you NOT to use this
>		option unless you already have a lot of packed files from
>		the original posting by Thomas.
>2.	The exit status is not well defined (on some machines) causing the
>	scripts to fail.
>		The exit status is now 0, 1 or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added facilities of "compact" into the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed work around for C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  Put the bits specified
>	in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>Unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844