	@(#)README	8.1 (Berkeley) 6/9/93
  $FreeBSD$

Compress version 4.0 improvements over 3.0:
	o compress() speedup (10-50%) by changing division hash to xor
	o decompress() speedup (5-10%)
	o Memory requirements reduced (3-30%)
	o Stack requirements reduced to less than 4kb
	o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
	o Portability modifications for Z8000 and PC/XT (but not zeus 3.2)
	o Default to 'quiet' mode
	o Unification of 'force' flags
	o Manual page overhaul
	o Portability enhancement for M_XENIX
	o Removed trailing text after #else and #endif
	o Added "-V" switch to print version and options
	o Added #defines for SIGNED_COMPARE_SLOW
	o Added Makefile and "usermem" program
	o Removed all floating point computations
	o New programs: [deleted]
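
The first item above replaces a division (modulo) hash with an xor hash
when looking up a (prefix code, next byte) pair in the string table.  The
following is a minimal sketch of that kind of lookup, using open addressing
with a secondary probe.  The names htab, codetab, HSIZE, HSHIFT and
find_or_insert, the table size 69001, and the shift of 8 are illustrative
assumptions, not quotations from compress.c.

```c
#include <stdint.h>
#include <string.h>

#define MAXBITS 16
#define HSIZE   69001L  /* assumed: a prime a bit larger than 2^16 */
#define HSHIFT  8       /* assumed: keeps (c << 8) ^ prefix below 65536 */

static int32_t  htab[HSIZE];     /* key (c << MAXBITS) + prefix, or -1 if empty */
static uint16_t codetab[HSIZE];  /* code assigned to the key in the same slot */

static void clear_table(void) {
    memset(htab, 0xff, sizeof htab);   /* all bytes 0xff: every entry is -1 */
}

/* Look up the string "prefix followed by byte c".  If it is in the table,
 * set *found = 1 and return its code; otherwise store it under new_code,
 * set *found = 0, and return new_code. */
static int find_or_insert(int prefix, int c, int new_code, int *found) {
    int32_t key = ((int32_t)c << MAXBITS) + prefix;
    long i = ((long)c << HSHIFT) ^ prefix;   /* xor hash, no division */
    long disp = (i == 0) ? 1 : HSIZE - i;    /* secondary probe step */

    while (htab[i] >= 0) {
        if (htab[i] == key) {
            *found = 1;
            return codetab[i];
        }
        i -= disp;                           /* probe the next slot */
        if (i < 0)
            i += HSIZE;
    }
    htab[i] = key;
    codetab[i] = (uint16_t)new_code;
    *found = 0;
    return new_code;
}
```

Because HSIZE is prime and the probe step is always between 1 and HSIZE-1,
the probe sequence can reach every slot, so the loop ends as long as the
table is never full (a 16-bit coder has at most 65536 codes).  For
instance, prefix 256 with byte 0 and prefix 0 with byte 1 both hash to
slot 256; whichever is inserted second is placed in slot 512 by the probe.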

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 BSD.] If you can't get it to work at all, just create a file
named "USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

	o USERMEM		Maximum process memory on the system
	o SACREDMEM		Amount to reserve for other processes
	o SIGNED_COMPARE_SLOW	Unsigned compare instructions are faster
	o NO_UCHAR		Don't use "unsigned char" types
	o BITS			Overrides the default set by USERMEM-SACREDMEM
	o vax			Generate inline assembler
	o interdata		Defines SIGNED_COMPARE_SLOW
	o M_XENIX		Makes arrays < 65536 bytes each
	o pdp11			BITS=12, NO_UCHAR
	o z8000			BITS=12
	o pcxt			BITS=12
	o BSD4_2		Allow long filenames (> 14 characters) and
				call setlinebuf(stderr)
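
NO_UCHAR matters because input bytes are used as table indices and hash
inputs; if a byte above 0x7F were read through a signed char, it could come
back negative.  A sketch of the usual pattern, assuming a char_type typedef
similar to the one in compress.c (treat the exact name as an assumption);
BYTE_VALUE and byte_value are hypothetical.

```c
/* Pick the element type for byte buffers.  Without "unsigned char",
 * fall back to plain char and mask explicitly whenever a value is read. */
#ifdef NO_UCHAR
typedef char char_type;
#define BYTE_VALUE(ch) ((int)(ch) & 0xFF)   /* strip sign extension */
#else
typedef unsigned char char_type;
#define BYTE_VALUE(ch) ((int)(ch))          /* already 0..255, no mask needed */
#endif

/* Hypothetical helper: fetch byte i of a buffer as a value in 0..255. */
static int byte_value(const char_type *buf, int i) {
    return BYTE_VALUE(buf[i]);
}
```

The extra mask is the cost the NO_UCHAR build pays; builds with
"unsigned char" skip it entirely.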

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least		BITS
------  -- -----                ----
     433,484			 16
     229,600			 15
     127,536			 14
      73,464			 13
           0			 12

The default is BITS=16.

The maximum BITS value can be overridden by specifying "-DBITS=bits" at
compilation time.
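
The table and the -DBITS override can be combined at compile time roughly
as follows.  This preprocessor sketch reproduces the thresholds from the
4.0 table; the macro names PBITS and MAXCODE, the default USERMEM of
450000, and the function configured_bits are assumptions, and #elif is used
for brevity where older preprocessors may have needed nested #if/#else.

```c
#ifndef SACREDMEM
#define SACREDMEM 0          /* nothing reserved unless -DSACREDMEM=... is given */
#endif

#ifndef USERMEM
#define USERMEM 450000       /* assumed default when no -DUSERMEM=... is given */
#endif

/* Preferred BITS for the memory actually available (4.0 table). */
#if USERMEM >= (433484 + SACREDMEM)
#define PBITS 16
#elif USERMEM >= (229600 + SACREDMEM)
#define PBITS 15
#elif USERMEM >= (127536 + SACREDMEM)
#define PBITS 14
#elif USERMEM >= (73464 + SACREDMEM)
#define PBITS 13
#else
#define PBITS 12
#endif

/* An explicit -DBITS=n on the cc command line wins over the table. */
#ifndef BITS
#define BITS PBITS
#endif

#define MAXCODE (1L << BITS)  /* number of distinct codes at the widest width */

static int configured_bits(void) { return BITS; }
```

For example, compiling with -DUSERMEM=200000 would select BITS=14, since
200,000 is at least 127,536 but less than 229,600.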

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.
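
The limit can be detected because the compressor records its maximum code
width in the file header.  A sketch of the check a decompressor can make,
assuming the 3-byte layout of compress output: the magic bytes \037\235,
then a flags byte whose low five bits hold the bits value and whose 0x80
bit marks "block" compression.  LOCAL_BITS and check_header are
hypothetical names.

```c
#include <stddef.h>

#define LOCAL_BITS 12        /* hypothetical: largest BITS this build supports */
#define BIT_MASK   0x1f      /* low 5 bits of byte 3: bits used by the compressor */
#define BLOCK_MASK 0x80      /* set when block compression (3.0 and later) was used */

/* Return the file's bits value if this build can decode it, or -1 if the
 * header is missing or the file needs more bits than LOCAL_BITS. */
static int check_header(const unsigned char *buf, size_t len, int *block_mode) {
    int maxbits;

    if (len < 3 || buf[0] != 037 || buf[1] != 0235)
        return -1;                       /* not compress output */
    maxbits = buf[2] & BIT_MASK;
    *block_mode = (buf[2] & BLOCK_MASK) != 0;
    if (maxbits > LOCAL_BITS)
        return -1;                       /* the case the WARNING describes */
    return maxbits;
}
```

Under this layout, a file made with "-b12" in block mode carries the flags
byte 0x8C (0x80 | 12), which a 12-bit build accepts.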

The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0,
and the output of compress 3.0 may be fed into uncompress 4.0.
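
Exchanging files between versions and machines relies on a fixed bit
layout for the codes.  A minimal sketch of writing fixed-width codes least
significant bit first, which is the order compress uses; pack_codes is a
hypothetical name, and unlike compress, which widens its codes as the table
grows, the sketch keeps one width throughout.

```c
#include <stddef.h>

/* Pack count codes of nbits each (nbits <= 16), least significant bit first.
 * Returns the number of bytes written; out must hold (count*nbits+7)/8 bytes. */
static size_t pack_codes(const unsigned *codes, size_t count, int nbits,
                         unsigned char *out) {
    unsigned long acc = 0;   /* pending bits, oldest in the low positions */
    int nacc = 0;            /* how many bits acc currently holds */
    size_t n = 0;

    for (size_t k = 0; k < count; k++) {
        acc |= (unsigned long)(codes[k] & ((1u << nbits) - 1)) << nacc;
        nacc += nbits;
        while (nacc >= 8) {              /* emit every complete byte */
            out[n++] = (unsigned char)(acc & 0xFF);
            acc >>= 8;
            nacc -= 8;
        }
    }
    if (nacc > 0)                        /* final partial byte, zero-padded */
        out[n++] = (unsigned char)(acc & 0xFF);
    return n;
}
```

Packing the 9-bit codes 0x41 and 0x42 this way yields the three bytes
0x41, 0x84, 0x00: the ninth bit of the first code and the low seven bits of
the second share the middle byte.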

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

	-from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.	"Block" compression is performed.  Once every code allowed by BITS
	has been assigned, the compression ratio is checked periodically.
	If it is decreasing, the table is cleared and a new set of
	substrings is generated.

	This makes the output of compress 3.0 incompatible with that of
	compress 2.0.  However, compress 3.0 still accepts the output of
	compress 2.0.  To generate output that is compatible with compress
	2.0, use the undocumented "-C" flag.

2.	A quiet "-q" flag has been added for use by the news system.

3.	The character chaining has been deleted and the program now uses
	hashing.  This improves the speed of the program, especially
	during decompression.  Other speed improvements have been made,
	such as using putc() instead of fwrite().

4.	A large table is used on large machines when a relatively small
	number of bits is specified.  This saves much time when compressing
	for a 16-bit machine on a 32-bit virtual-memory machine.  Note that
	the speed improvement occurs only when the input file is larger than
	30000 characters and the -b BITS value is less than or equal to the
	cutoff described below.
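
The ratio check in item 1 can be done without floating point (one of the
4.0 changes listed earlier).  A minimal sketch, assuming the ratio is kept
as input bytes per output byte in fixed point with 8 fractional bits; the
struct, its field names, should_clear, and the 10000-byte check interval
are assumptions modeled on, not copied from, compress.c.

```c
#define CHECK_GAP 10000L     /* assumed: input bytes between ratio checks */

struct block_state {
    long in_count;    /* input bytes consumed so far */
    long bytes_out;   /* output bytes produced so far (always > 0: header) */
    long ratio;       /* best ratio since the last clear, 8 fractional bits */
    long checkpoint;  /* in_count at which to check again */
};

/* Called once the table is full and in_count reaches checkpoint.
 * Returns 1 if the caller should emit a CLEAR code and reset the table. */
static int should_clear(struct block_state *s) {
    long rat;

    s->checkpoint = s->in_count + CHECK_GAP;
    if (s->in_count > 0x007fffffL) {     /* in_count << 8 could overflow */
        rat = s->bytes_out >> 8;
        rat = (rat == 0) ? 0x7fffffffL : s->in_count / rat;
    } else {
        rat = (s->in_count << 8) / s->bytes_out;   /* 8 fractional bits */
    }
    if (rat > s->ratio) {                /* still improving: keep the table */
        s->ratio = rat;
        return 0;
    }
    s->ratio = 0;                        /* getting worse: start a new table */
    return 1;
}
```

For example, 20000 bytes in and 10000 bytes out gives a ratio of 512
(2.0 in fixed point); if a later check sees 30000 in and 16000 out (480,
about 1.88), the ratio has dropped and the table is cleared.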

Most of these changes were made by James A. Woods (ames!jaw).  Thank you,
James!

To compile compress:

	cc -O -DUSERMEM=usermem -o compress compress.c

Here "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, also add
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff: the largest BITS value for which the large+fast
table is used.

memory: at least		BITS		cutoff
------  -- -----                ----            ------
   4,718,592			 16		  13
   2,621,440			 16		  12
   1,572,864			 16		  11
   1,048,576			 16		  10
     631,808			 16		  --
     329,728			 15		  --
     178,176			 14		  --
      99,328			 13		  --
           0			 12		  --

The default memory size is 750,000 bytes, which gives a maximum of BITS=16
and no large+fast table.
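
The 3.0 table can be read as data: find the first row whose memory
threshold is met by usermem-sacredmem.  A sketch that encodes the rows
above; the struct, lookup_3_0, and the use of 0 to mean "no large+fast
table" are hypothetical.

```c
#include <stddef.h>

struct mem_row {
    long min_mem;   /* usermem - sacredmem must be at least this */
    int  bits;      /* maximum BITS allowed */
    int  cutoff;    /* largest -b using the large+fast table; 0 = none */
};

static const struct mem_row rows_3_0[] = {
    {4718592L, 16, 13},
    {2621440L, 16, 12},
    {1572864L, 16, 11},
    {1048576L, 16, 10},
    { 631808L, 16,  0},
    { 329728L, 15,  0},
    { 178176L, 14,  0},
    {  99328L, 13,  0},
    {      0L, 12,  0},
};

/* Return the first (largest) row that the available memory satisfies. */
static struct mem_row lookup_3_0(long usermem, long sacredmem) {
    long avail = usermem - sacredmem;
    size_t n = sizeof rows_3_0 / sizeof rows_3_0[0];

    for (size_t k = 0; k < n; k++)
        if (avail >= rows_3_0[k].min_mem)
            return rows_3_0[k];
    return rows_3_0[n - 1];   /* negative avail: smallest configuration */
}
```

With the default of 750,000 bytes and nothing reserved, lookup_3_0 returns
bits 16 and cutoff 0, matching the default described above; 3,000,000
bytes would give bits 16 with a cutoff of 12.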

The maximum BITS value can be overridden by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If "int" is 16 bits on your machine, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
	cd /usr/local
	ln compress uncompress
	ln compress zcat
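
The links work because the program decides what to do from the name it
was invoked under.  A sketch of that dispatch, assuming the behavior this
README describes (uncompress decompresses; zcat decompresses to standard
output, like "-c"); the enum and mode_from_name are hypothetical names.

```c
#include <string.h>

enum mode { MODE_COMPRESS, MODE_UNCOMPRESS, MODE_ZCAT };

/* Choose a mode from argv[0], ignoring any leading directory path. */
static enum mode mode_from_name(const char *argv0) {
    const char *base = strrchr(argv0, '/');

    base = (base != NULL) ? base + 1 : argv0;
    if (strcmp(base, "uncompress") == 0)
        return MODE_UNCOMPRESS;
    if (strcmp(base, "zcat") == 0)
        return MODE_ZCAT;            /* decompress, writing to stdout */
    return MODE_COMPRESS;            /* "compress" or any other name */
}
```

A program would call mode_from_name(argv[0]) once at startup, before
parsing flags, so a hard link needs no separate binary or script.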

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  (On Perkin-Elmer, use "setstack compress 12".)

Next, install the manual page (compress.l).
	cp compress.l /usr/man/manl
	cd /usr/man/manl
	ln compress.l uncompress.l
	ln compress.l zcat.l

		- or -

	cp compress.l /usr/man/man1/compress.1
	cd /usr/man/man1
	ln compress.1 uncompress.1
	ln compress.1 zcat.1

					regards,
					petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
usable amount of physical user memory is enclosed, as well as a sample
4.2BSD Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.	The packed files produced by compress are different on different
>	machines and dependent on the vax sysgen option.
>		The bug was in the different byte/bit ordering on the
>		various machines.  This has been fixed.
>
>		This version is NOT compatible with the original vax posting
>		unless the '-DCOMPATIBLE' option is specified to the C
>		compiler.  The original posting has a bug which I fixed,
>		causing incompatible files.  I recommend you NOT to use this
>		option unless you already have a lot of packed files from
>		the original posting by Thomas.
>2.	The exit status is not well defined (on some machines) causing the
>	scripts to fail.
>		The exit status is now 0,1 or 2 and is documented in
>		compress.l.
>3.	The function getopt() is not available in all C libraries.
>		The function getopt() is no longer referenced by the
>		program.
>4.	Error status is not being checked on the fwrite() and fflush() calls.
>		Fixed.
>
>The following enhancements have been made:
>
>1.	Added facilities of "compact" into the compress program.  "Pack",
>	"Unpack", and "Pcat" are no longer required (no longer supplied).
>2.	Installed workaround for C compiler bug with "-O".
>3.	Added a magic number header (\037\235).  Put the bits specified
>	in the file.
>4.	Added "-f" flag to force overwrite of output file.
>5.	Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>	compile.
>6.	The 'uncompress' script has been deleted; simply
>	'ln compress uncompress' after you compile and it will work.
>7.	Removed extra bit masking for machines that support unsigned
>	characters.  If your machine doesn't support unsigned characters,
>	define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>	cd /usr/local
>	ln compress uncompress
>	ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>	cp compress.l /usr/man/manl		- or -
>	cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>Unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>					regards,
>>					joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844
285