This is a patched version of zlib, modified to use
Pentium-Pro-optimized assembly code in the deflation algorithm. The
files changed/added by this patch are:

README.686
match.S

The speedup that this patch provides varies, depending on whether the
compiler used to build the original version of zlib falls afoul of the
PPro's speed traps. My own tests show a speedup of around 10-20% at
the default compression level, and 20-30% using -9, against a version
compiled using gcc 2.7.2.3. Your mileage may vary.

Note that this code has been tailored for the PPro/PII in particular,
and will not perform particularly well on a Pentium.

If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)

Brian Raiter
breadbox@muppetlabs.com
April, 1998


Added for zlib 1.1.3:

The patches come from
http://www.muppetlabs.com/~breadbox/software/assembly.html

To compile zlib with this asm file, copy match.S to the zlib directory
then do:

CFLAGS="-O3 -DASMV" ./configure
make OBJA=match.o


Update:

I've been ignoring these assembly routines for years, believing that
gcc's generated code had caught up with it sometime around gcc 2.95
and the major rearchitecting of the Pentium 4. However, I recently
learned that, despite what I believed, this code still has some life
in it. On the Pentium 4 and AMD64 chips, it continues to run about 8%
faster than the code produced by gcc 4.1.

In acknowledgement of its continuing usefulness, I've altered the
license to match that of the rest of zlib. Share and Enjoy!

Brian Raiter
breadbox@muppetlabs.com
April, 2007