        @(#)README      8.1 (Berkeley) 6/9/93

Compress version 4.0 improvements over 3.0:
        o compress() speedup (10-50%) by changing division hash to xor
        o decompress() speedup (5-10%)
        o Memory requirements reduced (3-30%)
        o Stack requirements reduced to less than 4kb
        o Removed 'Big+Fast' compress code (FBITS) because of compress speedup
        o Portability mods for Z8000 and PC/XT (but not zeus 3.2)
        o Default to 'quiet' mode
        o Unification of 'force' flags
        o Manual page overhaul
        o Portability enhancement for M_XENIX
        o Removed text on #else and #endif
        o Added "-V" switch to print version and options
        o Added #defines for SIGNED_COMPARE_SLOW
        o Added Makefile and "usermem" program
        o Removed all floating point computations
        o New programs: [deleted]

The "usermem" script attempts to determine the maximum process size.  Some
editing of the script may be necessary (see the comments).  [It should work
fine on 4.3 bsd.]  If you can't get it to work at all, just create file
"USERMEM" containing the maximum process size in decimal.

The following preprocessor symbols control the compilation of "compress.c":

        o USERMEM               Maximum process memory on the system
        o SACREDMEM             Amount to reserve for other processes
        o SIGNED_COMPARE_SLOW   Unsigned compare instructions are faster
        o NO_UCHAR              Don't use "unsigned char" types
        o BITS                  Overrules default set by USERMEM-SACREDMEM
        o vax                   Generate inline assembler
        o interdata             Defines SIGNED_COMPARE_SLOW
        o M_XENIX               Makes arrays < 65536 bytes each
        o pdp11                 BITS=12, NO_UCHAR
        o z8000                 BITS=12
        o pcxt                  BITS=12
        o BSD4_2                Allow long filenames ( > 14 characters) &
                                Call setlinebuf(stderr)

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified with the "-b" flag.

memory: at least                BITS
------  -- -----                ----
     433,484                     16
     229,600                     15
     127,536                     14
      73,464                     13
           0                     12

The default is BITS=16.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

WARNING: files compressed on a large machine with more bits than allowed by
a version of compress on a smaller machine cannot be decompressed!  Use the
"-b12" flag to generate a file on a large machine that can be uncompressed
on a 16-bit machine.

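As an illustration of the memory/BITS table for compress 4.0, the following
C sketch maps the difference "usermem-sacredmem" to the largest BITS value
allowed.  It is not taken from compress.c, which makes this choice at
compile time from the preprocessor symbols; the names bits_table and
max_bits_for_memory are hypothetical and exist only for this example.

```c
#include <stddef.h>

/* One row of the 4.0 table: "memory: at least" -> BITS. */
struct bits_threshold {
    long min_memory;    /* usermem - sacredmem, in bytes */
    int  bits;          /* largest BITS usable with that much memory */
};

/* Thresholds copied from the table, largest first. */
static const struct bits_threshold bits_table[] = {
    { 433484L, 16 },
    { 229600L, 15 },
    { 127536L, 14 },
    {  73464L, 13 },
    {      0L, 12 },
};

/*
 * Hypothetical helper (not part of compress.c): return the maximum BITS
 * for the given USERMEM and SACREDMEM values.
 */
int max_bits_for_memory(long usermem, long sacredmem)
{
    long avail = usermem - sacredmem;
    size_t i;

    if (avail < 0)
        avail = 0;      /* treat an over-reservation as "no memory" */

    /* The first row whose threshold is met is the answer. */
    for (i = 0; i < sizeof bits_table / sizeof bits_table[0]; i++) {
        if (avail >= bits_table[i].min_memory)
            return bits_table[i].bits;
    }
    return 12;          /* not reached: the last row matches any avail >= 0 */
}
```

For example, max_bits_for_memory(500000L, 100000L) gives 15, because the
remaining 400,000 bytes fall between the 229,600 and 433,484 rows.
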
The output of compress 4.0 is fully compatible with that of compress 3.0.
In other words, the output of compress 4.0 may be fed into uncompress 3.0 or
the output of compress 3.0 may be fed into uncompress 4.0.

The output of compress 4.0 is not compatible with that of
compress 2.0.  However, compress 4.0 still accepts the output of
compress 2.0.  To generate output that is compatible with compress
2.0, use the undocumented "-C" flag.

        -from mod.sources, submitted by vax135!petsd!joe (Joe Orost), 8/1/85
--------------------------------

Enclosed is compress version 3.0 with the following changes:

1.      "Block" compression is performed.  After the BITS run out, the
        compression ratio is checked every so often.  If it is decreasing,
        the table is cleared and a new set of substrings is generated.

        This makes the output of compress 3.0 not compatible with that of
        compress 2.0.  However, compress 3.0 still accepts the output of
        compress 2.0.  To generate output that is compatible with compress
        2.0, use the undocumented "-C" flag.

2.      A quiet "-q" flag has been added for use by the news system.

3.      The character chaining has been deleted and the program now uses
        hashing.  This improves the speed of the program, especially
        during decompression.
        Other speed improvements have been made,
        such as using putc() instead of fwrite().

4.      A large table is used on large machines when a relatively small
        number of bits is specified.  This saves much time when compressing
        for a 16-bit machine on a 32-bit virtual machine.  Note that the
        speed improvement only occurs when the input file is > 30000
        characters, and the -b BITS is less than or equal to the cutoff
        described below.

Most of these changes were made by James A. Woods (ames!jaw).  Thank you
James!

To compile compress:

        cc -O -DUSERMEM=usermem -o compress compress.c

Where "usermem" is the amount of physical user memory available (in bytes).
If any physical memory is to be reserved for other processes, put in
"-DSACREDMEM=sacredmem", where "sacredmem" is the amount to be reserved.

The difference "usermem-sacredmem" determines the maximum BITS that can be
specified, and the cutoff bits where the large+fast table is used.

memory: at least                BITS            cutoff
------  -- -----                ----            ------
   4,718,592                     16               13
   2,621,440                     16               12
   1,572,864                     16               11
   1,048,576                     16               10
     631,808                     16               --
     329,728                     15               --
     178,176                     14               --
      99,328                     13               --
           0                     12               --

The default memory size is 750,000 which gives a maximum BITS=16 and no
large+fast table.

The maximum bits can be overruled by specifying "-DBITS=bits" at
compilation time.

If your machine doesn't support unsigned characters, define "NO_UCHAR"
when compiling.

If your machine has "int" as 16-bits, define "SHORT_INT" when compiling.

After compilation, move "compress" to a standard executable location, such
as /usr/local.  Then:
        cd /usr/local
        ln compress uncompress
        ln compress zcat

On machines that have a fixed stack size (such as Perkin-Elmer), set the
stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).

Next, install the manual (compress.l).
        cp compress.l /usr/man/manl
        cd /usr/man/manl
        ln compress.l uncompress.l
        ln compress.l zcat.l

                - or -

        cp compress.l /usr/man/man1/compress.1
        cd /usr/man/man1
        ln compress.1 uncompress.1
        ln compress.1 zcat.1

                                        regards,
                                        petsd!joe

Here is a note from the net:

>From hplabs!pesnta!amd!turtlevax!ken Sat Jan  5 03:35:20 1985
Path: ames!hplabs!pesnta!amd!turtlevax!ken
From: ken@turtlevax.UUCP (Ken Turkowski)
Newsgroups: net.sources
Subject: Re: Compress release 3.0 : sample Makefile
Organization: CADLINC, Inc. @ Menlo Park, CA

In the compress 3.0 source recently posted to mod.sources, there is a
#define variable which can be set for optimum performance on a machine
with a large amount of memory.  A program (usermem) to calculate the
useable amount of physical user memory is enclosed, as well as a sample
4.2bsd Vax Makefile for compress.

Here is the README file from the previous version of compress (2.0):

>Enclosed is compress.c version 2.0 with the following bugs fixed:
>
>1.  The packed files produced by compress are different on different
>    machines and dependent on the vax sysgen option.
>       The bug was in the different byte/bit ordering on the
>       various machines.  This has been fixed.
>
>       This version is NOT compatible with the original vax posting
>       unless the '-DCOMPATIBLE' option is specified to the C
>       compiler.  The original posting has a bug which I fixed,
>       causing incompatible files.  I recommend you NOT to use this
>       option unless you already have a lot of packed files from
>       the original posting by thomas.
>2.  The exit status is not well defined (on some machines) causing the
>    scripts to fail.
>       The exit status is now 0,1 or 2 and is documented in
>       compress.l.
>3.  The function getopt() is not available in all C libraries.
>       The function getopt() is no longer referenced by the
>       program.
>4.  Error status is not being checked on the fwrite() and fflush() calls.
>       Fixed.
>
>The following enhancements have been made:
>
>1.  Added facilities of "compact" into the compress program.  "Pack",
>    "Unpack", and "Pcat" are no longer required (no longer supplied).
>2.  Installed work around for C compiler bug with "-O".
>3.  Added a magic number header (\037\235).  Put the bits specified
>    in the file.
>4.  Added "-f" flag to force overwrite of output file.
>5.  Added "-c" flag and "zcat" program.  'ln compress zcat' after you
>    compile.
>6.  The 'uncompress' script has been deleted; simply
>    'ln compress uncompress' after you compile and it will work.
>7.  Removed extra bit masking for machines that support unsigned
>    characters.  If your machine doesn't support unsigned characters,
>    define "NO_UCHAR" when compiling.
>
>Compile "compress.c" with "-O -o compress" flags.  Move "compress" to a
>standard executable location, such as /usr/local.  Then:
>       cd /usr/local
>       ln compress uncompress
>       ln compress zcat
>
>On machines that have a fixed stack size (such as Perkin-Elmer), set the
>stack to at least 12kb.  ("setstack compress 12" on Perkin-Elmer).
>
>Next, install the manual (compress.l).
>       cp compress.l /usr/man/manl             - or -
>       cp compress.l /usr/man/man1/compress.1
>
>Here is the README that I sent with my first posting:
>
>>Enclosed is a modified version of compress.c, along with scripts to make it
>>run identically to pack(1), unpack(1), and pcat(1).  Here is what I
>>(petsd!joe) and a colleague (petsd!peora!srd) did:
>>
>>1. Removed VAX dependencies.
>>2. Changed the struct to separate arrays; saves mucho memory.
>>3. Did comparisons in unsigned, where possible.  (Faster on Perkin-Elmer.)
>>4. Sorted the character next chain and changed the search to stop
>>prematurely.  This saves a lot on the execution time when compressing.
>>
>>This version is totally compatible with the original version.  Even though
>>lint(1) -p has no complaints about compress.c, it won't run on a 16-bit
>>machine, due to the size of the arrays.
>>
>>Here is the README file from the original author:
>>
>>>Well, with all this discussion about file compression (for news batching
>>>in particular) going around, I decided to implement the text compression
>>>algorithm described in the June Computer magazine.  The author claimed
>>>blinding speed and good compression ratios.  It's certainly faster than
>>>compact (but, then, what wouldn't be), but it's also the same speed as
>>>pack, and gets better compression than both of them.  On 350K bytes of
>>>unix-wizards, compact took about 8 minutes of CPU, pack took about 80
>>>seconds, and compress (herein) also took 80 seconds.  But, compact and
>>>pack got about 30% compression, whereas compress got over 50%.  So, I
>>>decided I had something, and that others might be interested, too.
>>>
>>>As is probably true of compact and pack (although I haven't checked),
>>>the byte order within a word is probably relevant here, but as long as
>>>you stay on a single machine type, you should be ok.  (Can anybody
>>>elucidate on this?)  There are a couple of asm's in the code (extv and
>>>insv instructions), so anyone porting it to another machine will have to
>>>deal with this anyway (and could probably make it compatible with Vax
>>>byte order at the same time).  Anyway, I've linted the code (both with
>>>and without -p), so it should run elsewhere.  Note the longs in the
>>>code, you can take these out if you reduce BITS to <= 15.
>>>
>>>Have fun, and as always, if you make good enhancements, or bug fixes,
>>>I'd like to see them.
>>>
>>>=Spencer   (thomas@utah-20, {harpo,hplabs,arizona}!utah-cs!thomas)
>>
>>                      regards,
>>                      joe
>>
>>--
>>Full-Name:  Joseph M. Orost
>>UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!joe
>>US Mail:    MS 313; Perkin-Elmer; 106 Apple St; Tinton Falls, NJ 07724
>>Phone:      (201) 870-5844