Network Working Group                                         P. Deutsch
Request for Comments: 1952                           Aladdin Enterprises
Category: Informational                                         May 1996


               GZIP file format specification version 4.3

Status of This Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

IESG Note:

   The IESG takes no position on the validity of any Intellectual
   Property Rights statements contained in this document.

Notices

   Copyright (c) 1996 L. Peter Deutsch

   Permission is granted to copy and distribute this document for any
   purpose and without charge, including translations into other
   languages and incorporation into compilations, provided that the
   copyright notice and this notice are preserved, and that any
   substantive changes or deletions from the original are clearly
   marked.

   A pointer to the latest version of this and related documentation in
   HTML format can be found at the URL
   <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.

Abstract

   This specification defines a lossless compressed data format that is
   compatible with the widely used GZIP utility.  The format includes a
   cyclic redundancy check value for detecting data corruption.  The
   format presently uses the DEFLATE method of compression but can be
   easily extended to use other compression methods.  The format can be
   implemented readily in a manner not covered by patents.

Deutsch                      Informational                      [Page 1]

RFC 1952             GZIP File Format Specification             May 1996

Table of Contents

   1. Introduction
      1.1. Purpose
      1.2. Intended audience
      1.3. Scope
      1.4. Compliance
      1.5. Definitions of terms and conventions used
      1.6. Changes from previous versions
   2. Detailed specification
      2.1. Overall conventions
      2.2. File format
      2.3. Member format
         2.3.1. Member header and trailer
            2.3.1.1. Extra field
            2.3.1.2. Compliance
   3. References
   4. Security Considerations
   5. Acknowledgements
   6. Author's Address
   7. Appendix: Jean-Loup Gailly's gzip utility
   8. Appendix: Sample CRC Code

1. Introduction

   1.1. Purpose

      The purpose of this specification is to define a lossless
      compressed data format that:

          * Is independent of CPU type, operating system, file system,
            and character set, and hence can be used for interchange;
          * Can compress or decompress a data stream (as opposed to a
            randomly accessible file) to produce another data stream,
            using only an a priori bounded amount of intermediate
            storage, and hence can be used in data communications or
            similar structures such as Unix filters;
          * Compresses data with efficiency comparable to the best
            currently available general-purpose compression methods,
            and in particular considerably better than the "compress"
            program;
          * Can be implemented readily in a manner not covered by
            patents, and hence can be practiced freely;
          * Is compatible with the file format produced by the current
            widely used gzip utility, in that conforming decompressors
            will be able to read data produced by the existing gzip
            compressor.

      The data format defined by this specification does not attempt to:

          * Provide random access to compressed data;
          * Compress specialized data (e.g., raster graphics) as well as
            the best currently available specialized algorithms.

   1.2. Intended audience

      This specification is intended for use by implementors of software
      to compress data into gzip format and/or decompress data from gzip
      format.

      The text of the specification assumes a basic background in
      programming at the level of bits and other primitive data
      representations.

   1.3. Scope

      The specification specifies a compression method and a file format
      (the latter assuming only that a file can store a sequence of
      arbitrary bytes).  It does not specify any particular interface to
      a file system or anything about character sets or encodings
      (except for file names and comments, which are optional).

   1.4. Compliance

      Unless otherwise indicated below, a compliant decompressor must be
      able to accept and decompress any file that conforms to all the
      specifications presented here; a compliant compressor must produce
      files that conform to all the specifications presented here.  The
      material in the appendices is not part of the specification per se
      and is not relevant to compliance.

   1.5. Definitions of terms and conventions used

      byte: 8 bits stored or transmitted as a unit (same as an octet).
      (For this specification, a byte is exactly 8 bits, even on
      machines which store a character on a number of bits different
      from 8.)  See below for the numbering of bits within a byte.

   1.6. Changes from previous versions

      There have been no technical changes to the gzip format since
      version 4.1 of this specification.  In version 4.2, some
      terminology was changed, and the sample CRC code was rewritten for
      clarity and to eliminate the requirement for the caller to do pre-
      and post-conditioning.  Version 4.3 is a conversion of the
      specification to RFC style.

2. Detailed specification

   2.1. Overall conventions

      In the diagrams below, a box like this:

         +---+
         |   | <-- the vertical bars might be missing
         +---+

      represents one byte; a box like this:

         +==============+
         |              |
         +==============+

      represents a variable number of bytes.

      Bytes stored within a computer do not have a "bit order", since
      they are always treated as a unit.  However, a byte considered as
      an integer between 0 and 255 does have a most- and least-
      significant bit, and since we write numbers with the most-
      significant digit on the left, we also write bytes with the most-
      significant bit on the left.  In the diagrams below, we number the
      bits of a byte so that bit 0 is the least-significant bit, i.e.,
      the bits are numbered:

         +--------+
         |76543210|
         +--------+

      This document does not address the issue of the order in which
      bits of a byte are transmitted on a bit-sequential medium, since
      the data format described here is byte- rather than bit-oriented.

      Within a computer, a number may occupy multiple bytes.  All
      multi-byte numbers in the format described here are stored with
      the least-significant byte first (at the lower memory address).
      For example, the decimal number 520 is stored as:

             0        1
         +--------+--------+
         |00001000|00000010|
         +--------+--------+
          ^        ^
          |        |
          |        + more significant byte = 2 x 256
          + less significant byte = 8

   2.2. File format

      A gzip file consists of a series of "members" (compressed data
      sets).  The format of each member is specified in the following
      section.  The members simply appear one after another in the file,
      with no additional information before, between, or after them.

   2.3. Member format

      Each member has the following structure:

         +---+---+---+---+---+---+---+---+---+---+
         |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
         +---+---+---+---+---+---+---+---+---+---+

      (if FLG.FEXTRA set)

         +---+---+=================================+
         | XLEN  |...XLEN bytes of "extra field"...| (more-->)
         +---+---+=================================+

      (if FLG.FNAME set)

         +=========================================+
         |...original file name, zero-terminated...| (more-->)
         +=========================================+

      (if FLG.FCOMMENT set)

         +===================================+
         |...file comment, zero-terminated...| (more-->)
         +===================================+

      (if FLG.FHCRC set)

         +---+---+
         | CRC16 |
         +---+---+

         +=======================+
         |...compressed blocks...| (more-->)
         +=======================+

           0   1   2   3   4   5   6   7
         +---+---+---+---+---+---+---+---+
         |     CRC32     |     ISIZE     |
         +---+---+---+---+---+---+---+---+
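
      As an illustration only, the fixed ten-byte part of the layout
      above could be read in C roughly as follows.  This is a sketch,
      not part of the specification: the names gz_fixed_header,
      gz_read_le32 and gz_parse_fixed_header are hypothetical, and the
      identification values 0x1f and 0x8b are those given for ID1 and
      ID2 in section 2.3.1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fixed header fields of section 2.3. */
struct gz_fixed_header {
    uint8_t  cm;     /* CM: compression method */
    uint8_t  flg;    /* FLG: flag bits */
    uint32_t mtime;  /* MTIME: modification time in Unix format */
    uint8_t  xfl;    /* XFL: extra flags */
    uint8_t  os;     /* OS: file system on which compression took place */
};

/* Read a 32-bit number stored least-significant byte first. */
static uint32_t gz_read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Parse the ten fixed header bytes of a member.  Returns 0 on success,
 * or -1 if the buffer is too short or ID1/ID2 do not match.
 */
static int gz_parse_fixed_header(const unsigned char *buf, size_t len,
                                 struct gz_fixed_header *out)
{
    if (len < 10 || buf[0] != 0x1f || buf[1] != 0x8b)
        return -1;
    out->cm    = buf[2];
    out->flg   = buf[3];
    out->mtime = gz_read_le32(buf + 4);
    out->xfl   = buf[8];
    out->os    = buf[9];
    return 0;
}
```

      The CRC32 and ISIZE fields of the trailer are stored in the same
      byte order, so a helper like gz_read_le32 would apply to each of
      them as well.
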

      2.3.1. Member header and trailer

         ID1 (IDentification 1)
         ID2 (IDentification 2)
            These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
            (0x8b, \213), to identify the file as being in gzip format.

         CM (Compression Method)
            This identifies the compression method used in the file.  CM
            = 0-7 are reserved.  CM = 8 denotes the "deflate"
            compression method, which is the one customarily used by
            gzip and which is documented elsewhere.

         FLG (FLaGs)
            This flag byte is divided into individual bits as follows:

               bit 0   FTEXT
               bit 1   FHCRC
               bit 2   FEXTRA
               bit 3   FNAME
               bit 4   FCOMMENT
               bit 5   reserved
               bit 6   reserved
               bit 7   reserved

            If FTEXT is set, the file is probably ASCII text.  This is
            an optional indication, which the compressor may set by
            checking a small amount of the input data to see whether any
            non-ASCII characters are present.  In case of doubt, FTEXT
            is cleared, indicating binary data.  For systems which have
            different file formats for ASCII text and binary data, the
            decompressor can use FTEXT to choose the appropriate format.
            We deliberately do not specify the algorithm used to set
            this bit, since a compressor always has the option of
            leaving it cleared and a decompressor always has the option
            of ignoring it and letting some other program handle issues
            of data conversion.

            If FHCRC is set, a CRC16 for the gzip header is present,
            immediately before the compressed data.  The CRC16 consists
            of the two least significant bytes of the CRC32 for all
            bytes of the gzip header up to and not including the CRC16.
            [The FHCRC bit was never set by versions of gzip up to
            1.2.4, even though it was documented with a different
            meaning in gzip 1.2.4.]

            If FEXTRA is set, optional extra fields are present, as
            described in a following section.

            If FNAME is set, an original file name is present,
            terminated by a zero byte.  The name must consist of ISO
            8859-1 (LATIN-1) characters; on operating systems using
            EBCDIC or any other character set for file names, the name
            must be translated to the ISO LATIN-1 character set.  This
            is the original name of the file being compressed, with any
            directory components removed, and, if the file being
            compressed is on a file system with case insensitive names,
            forced to lower case.  There is no original file name if the
            data was compressed from a source other than a named file;
            for example, if the source was stdin on a Unix system, there
            is no file name.

            If FCOMMENT is set, a zero-terminated file comment is
            present.  This comment is not interpreted; it is only
            intended for human consumption.  The comment must consist of
            ISO 8859-1 (LATIN-1) characters.  Line breaks should be
            denoted by a single line feed character (10 decimal).

            Reserved FLG bits must be zero.

         MTIME (Modification TIME)
            This gives the most recent modification time of the original
            file being compressed.  The time is in Unix format, i.e.,
            seconds since 00:00:00 GMT, Jan.  1, 1970.  (Note that this
            may cause problems for MS-DOS and other systems that use
            local rather than Universal time.)  If the compressed data
            did not come from a file, MTIME is set to the time at which
            compression started.  MTIME = 0 means no time stamp is
            available.

         XFL (eXtra FLags)
            These flags are available for use by specific compression
            methods.  The "deflate" method (CM = 8) sets these flags as
            follows:

               XFL = 2 - compressor used maximum compression,
                         slowest algorithm
               XFL = 4 - compressor used fastest algorithm

         OS (Operating System)
            This identifies the type of file system on which compression
            took place.  This may be useful in determining end-of-line
            convention for text files.  The currently defined values are
            as follows:

                 0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
                 1 - Amiga
                 2 - VMS (or OpenVMS)
                 3 - Unix
                 4 - VM/CMS
                 5 - Atari TOS
                 6 - HPFS filesystem (OS/2, NT)
                 7 - Macintosh
                 8 - Z-System
                 9 - CP/M
                10 - TOPS-20
                11 - NTFS filesystem (NT)
                12 - QDOS
                13 - Acorn RISCOS
               255 - unknown

         XLEN (eXtra LENgth)
            If FLG.FEXTRA is set, this gives the length of the optional
            extra field.  See below for details.

         CRC32 (CRC-32)
            This contains a Cyclic Redundancy Check value of the
            uncompressed data computed according to the CRC-32 algorithm
            used in the ISO 3309 standard and in section 8.1.1.6.2 of
            ITU-T recommendation V.42.  (See http://www.iso.ch for
            ordering ISO documents.  See gopher://info.itu.ch for an
            online version of ITU-T V.42.)

         ISIZE (Input SIZE)
            This contains the size of the original (uncompressed) input
            data modulo 2^32.

      2.3.1.1. Extra field

         If the FLG.FEXTRA bit is set, an "extra field" is present in
         the header, with total length XLEN bytes.  It consists of a
         series of subfields, each of the form:

            +---+---+---+---+==================================+
            |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
            +---+---+---+---+==================================+

         SI1 and SI2 provide a subfield ID, typically two ASCII letters
         with some mnemonic value.  Jean-Loup Gailly
         <gzip@prep.ai.mit.edu> is maintaining a registry of subfield
         IDs; please send him any subfield ID you wish to use.  Subfield
         IDs with SI2 = 0 are reserved for future use.
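
         A minimal sketch of how a reader might walk the subfields of an
         extra field, following the layout in the diagram above.  The
         function name gz_find_subfield and its signature are
         hypothetical; the sketch assumes LEN is a two-byte number
         stored least-significant byte first, like every other
         multi-byte number in this format, and that it counts only the
         subfield data.

```c
#include <stddef.h>

/*
 * Search an extra field of xlen bytes for the subfield with ID
 * (si1, si2).  On success, returns a pointer to the subfield data and
 * stores its length in *data_len.  Returns NULL if the ID is absent or
 * a subfield would run past the end of the extra field.
 */
static const unsigned char *gz_find_subfield(const unsigned char *extra,
                                             size_t xlen,
                                             unsigned char si1,
                                             unsigned char si2,
                                             size_t *data_len)
{
    size_t pos = 0;

    /* Each subfield needs at least SI1, SI2 and the two LEN bytes. */
    while (xlen - pos >= 4) {
        unsigned char id1 = extra[pos];
        unsigned char id2 = extra[pos + 1];
        size_t len = (size_t)extra[pos + 2] |
                     ((size_t)extra[pos + 3] << 8);

        pos += 4;
        if (len > xlen - pos)
            return NULL;        /* data would overrun the extra field */
        if (id1 == si1 && id2 == si2) {
            *data_len = len;
            return extra + pos;
        }
        pos += len;             /* move on to the next subfield */
    }
    return NULL;
}
```

         Treating a truncated subfield as "not found" is a choice made
         for this sketch; an implementation could equally report it as a
         distinct error.
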
         The following IDs are currently defined:

            SI1         SI2         Data
            ----------  ----------  ----
            0x41 ('A')  0x70 ('P')  Apollo file type information

         LEN gives the length of the subfield data, excluding the 4
         initial bytes.

      2.3.1.2. Compliance

         A compliant compressor must produce files with correct ID1,
         ID2, CM, CRC32, and ISIZE, but may set all the other fields in
         the fixed-length part of the header to default values (255 for
         OS, 0 for all others).  The compressor must set all reserved
         bits to zero.

         A compliant decompressor must check ID1, ID2, and CM, and
         provide an error indication if any of these have incorrect
         values.  It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
         at least so it can skip over the optional fields if they are
         present.  It need not examine any other part of the header or
         trailer; in particular, a decompressor may ignore FTEXT and OS
         and always produce binary output, and still be compliant.
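
         The header checks and field skipping described in this section
         could be sketched in C as follows.  This is an illustration
         under stated assumptions rather than a reference
         implementation: the function name gz_header_length and the
         GZ_* macro names are hypothetical, only CM = 8 is accepted,
         the CRC16 is skipped rather than verified, and a set reserved
         FLG bit is rejected because reserved FLG bits must be zero.

```c
#include <stddef.h>

/* FLG bit masks, from the bit assignments in section 2.3.1. */
#define GZ_FHCRC    0x02
#define GZ_FEXTRA   0x04
#define GZ_FNAME    0x08
#define GZ_FCOMMENT 0x10
#define GZ_RESERVED 0xe0

/*
 * Return the offset of the compressed blocks within a member, or 0 if
 * the header is truncated or fails one of the checks.
 */
static size_t gz_header_length(const unsigned char *buf, size_t len)
{
    size_t pos = 10;                    /* fixed-length part */
    unsigned flg;

    if (len < 10 || buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8)
        return 0;                       /* bad ID1, ID2 or CM */
    flg = buf[3];
    if (flg & GZ_RESERVED)
        return 0;                       /* reserved bit is set */

    if (flg & GZ_FEXTRA) {              /* XLEN, then XLEN bytes */
        size_t xlen;

        if (len - pos < 2)
            return 0;
        xlen = (size_t)buf[pos] | ((size_t)buf[pos + 1] << 8);
        pos += 2;
        if (len - pos < xlen)
            return 0;
        pos += xlen;
    }
    if (flg & GZ_FNAME) {               /* skip through the zero byte */
        while (pos < len && buf[pos] != 0)
            pos++;
        if (pos == len)
            return 0;
        pos++;
    }
    if (flg & GZ_FCOMMENT) {            /* skip through the zero byte */
        while (pos < len && buf[pos] != 0)
            pos++;
        if (pos == len)
            return 0;
        pos++;
    }
    if (flg & GZ_FHCRC) {               /* CRC16 skipped, not verified */
        if (len - pos < 2)
            return 0;
        pos += 2;
    }
    return pos;
}
```

         Returning 0 for every failure keeps the sketch short; a real
         decompressor would probably distinguish a truncated header from
         an invalid one.
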
         A compliant decompressor must give an error indication if any
         reserved bit is non-zero, since such a bit could indicate the
         presence of a new field that would cause subsequent data to be
         interpreted incorrectly.

3. References

   [1] "Information Processing - 8-bit single-byte coded graphic
       character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
       The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
       ASCII.  Files defining this character set are available as
       iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/

   [2] ISO 3309

   [3] ITU-T recommendation V.42

   [4] Deutsch, L.P., "DEFLATE Compressed Data Format Specification",
       available in ftp://ftp.uu.net/pub/archiving/zip/doc/

   [5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
       ftp://prep.ai.mit.edu/pub/gnu/

   [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
       Look-Up", Communications of the ACM, 31(8), pp.1008-1013.

   [7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal,
       pp.118-133.

   [8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
       describing the CRC concept.

4. Security Considerations

   Any data compression method involves the reduction of redundancy in
   the data.  Consequently, any corruption of the data is likely to have
   severe effects and be difficult to correct.  Uncompressed text, on
   the other hand, will probably still be readable despite the presence
   of some corrupted bytes.

   It is recommended that systems using this data format provide some
   means of validating the integrity of the compressed data, such as by
   setting and checking the CRC-32 check value.

5. Acknowledgements

   Trademarks cited in this document are the property of their
   respective owners.

   Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
   the related software described in this specification.  Glenn
   Randers-Pehrson converted this document to RFC and HTML format.
537*4a5d661aSToomas Soome 538*4a5d661aSToomas Soome6. Author's Address 539*4a5d661aSToomas Soome 540*4a5d661aSToomas Soome L. Peter Deutsch 541*4a5d661aSToomas Soome Aladdin Enterprises 542*4a5d661aSToomas Soome 203 Santa Margarita Ave. 543*4a5d661aSToomas Soome Menlo Park, CA 94025 544*4a5d661aSToomas Soome 545*4a5d661aSToomas Soome Phone: (415) 322-0103 (AM only) 546*4a5d661aSToomas Soome FAX: (415) 322-1734 547*4a5d661aSToomas Soome EMail: <ghost@aladdin.com> 548*4a5d661aSToomas Soome 549*4a5d661aSToomas Soome Questions about the technical content of this specification can be 550*4a5d661aSToomas Soome sent by email to: 551*4a5d661aSToomas Soome 552*4a5d661aSToomas Soome Jean-Loup Gailly <gzip@prep.ai.mit.edu> and 553*4a5d661aSToomas Soome Mark Adler <madler@alumni.caltech.edu> 554*4a5d661aSToomas Soome 555*4a5d661aSToomas Soome Editorial comments on this specification can be sent by email to: 556*4a5d661aSToomas Soome 557*4a5d661aSToomas Soome L. Peter Deutsch <ghost@aladdin.com> and 558*4a5d661aSToomas Soome Glenn Randers-Pehrson <randeg@alumni.rpi.edu> 559*4a5d661aSToomas Soome 560*4a5d661aSToomas Soome 561*4a5d661aSToomas Soome 562*4a5d661aSToomas SoomeDeutsch Informational [Page 10] 563*4a5d661aSToomas Soome 564*4a5d661aSToomas SoomeRFC 1952 GZIP File Format Specification May 1996 565*4a5d661aSToomas Soome 566*4a5d661aSToomas Soome 567*4a5d661aSToomas Soome7. Appendix: Jean-Loup Gailly's gzip utility 568*4a5d661aSToomas Soome 569*4a5d661aSToomas Soome The most widely used implementation of gzip compression, and the 570*4a5d661aSToomas Soome original documentation on which this specification is based, were 571*4a5d661aSToomas Soome created by Jean-Loup Gailly <gzip@prep.ai.mit.edu>. Since this 572*4a5d661aSToomas Soome implementation is a de facto standard, we mention some more of its 573*4a5d661aSToomas Soome features here. 
   Again, the material in this section is not part of
   the specification per se, and implementations need not follow it to
   be compliant.

   When compressing or decompressing a file, gzip preserves the
   protection, ownership, and modification time attributes on the local
   file system, since there is no provision for representing protection
   attributes in the gzip file format itself.  Since the file format
   includes a modification time, the gzip decompressor provides a
   command line switch that assigns the modification time from the file,
   rather than the local modification time of the compressed input, to
   the decompressed output.

8. Appendix: Sample CRC Code

   The following sample code represents a practical implementation of
   the CRC (Cyclic Redundancy Check).  (See also ISO 3309 and ITU-T V.42
   for a formal specification.)

   The sample code is in the ANSI C programming language.  Non-C users
   may find it easier to read with these hints:

      &      Bitwise AND operator.
      ^      Bitwise exclusive-OR operator.
      >>     Bitwise right shift operator.  When applied to an
             unsigned quantity, as here, right shift inserts zero
             bit(s) at the left.
      !      Logical NOT operator.
      ++     "n++" increments the variable n.
      0xNNN  0x introduces a hexadecimal (base 16) constant.
             Suffix L indicates a long value (at least 32 bits).

      /* Table of CRCs of all 8-bit messages. */
      unsigned long crc_table[256];

      /* Flag: has the table been computed? Initially false. */
      int crc_table_computed = 0;

      /* Make the table for a fast CRC. */
      void make_crc_table(void)
      {
        unsigned long c;
        int n, k;

        for (n = 0; n < 256; n++) {
          c = (unsigned long) n;
          for (k = 0; k < 8; k++) {
            if (c & 1) {
              c = 0xedb88320L ^ (c >> 1);
            } else {
              c = c >> 1;
            }
          }
          crc_table[n] = c;
        }
        crc_table_computed = 1;
      }

      /*
         Update a running crc with the bytes buf[0..len-1] and return
      the updated crc. The crc should be initialized to zero. Pre- and
      post-conditioning (one's complement) is performed within this
      function so it shouldn't be done by the caller.
      Usage example:

        unsigned long crc = 0L;

        while (read_buffer(buffer, length) != EOF) {
          crc = update_crc(crc, buffer, length);
        }
        if (crc != original_crc) error();
      */
      unsigned long update_crc(unsigned long crc,
                      unsigned char *buf, int len)
      {
        unsigned long c = crc ^ 0xffffffffL;
        int n;

        if (!crc_table_computed)
          make_crc_table();
        for (n = 0; n < len; n++) {
          c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
        }
        return c ^ 0xffffffffL;
      }

      /* Return the CRC of the bytes buf[0..len-1]. */
      unsigned long crc(unsigned char *buf, int len)
      {
        return update_crc(0L, buf, len);
      }
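
   As an informal cross-check (not part of the specification), the
   sketch below shows one way an implementer might sanity-test a CRC
   routine.  It assumes a table-less, bit-by-bit variant of the update
   step using the same polynomial constant as the sample code; the
   names crc32_update_bitwise and crc32_self_test are hypothetical and
   chosen only for illustration.  The test compares the result for the
   ASCII string "123456789" against 0xcbf43926, the check value
   commonly published for this CRC, and confirms that feeding the data
   in two pieces gives the same answer as a single call.

```c
#include <stddef.h>

/* Hypothetical helper (an assumption, not taken from the specification):
   the same running-CRC contract as update_crc, but computed bit by bit
   so that no lookup table is needed.  The running crc starts at zero;
   pre- and post-conditioning happen inside this function. */
unsigned long crc32_update_bitwise(unsigned long crc,
                                   const unsigned char *buf, size_t len)
{
  unsigned long c = (crc ^ 0xffffffffUL) & 0xffffffffUL;
  size_t n;
  int k;

  for (n = 0; n < len; n++) {
    c ^= buf[n];
    for (k = 0; k < 8; k++) {
      if (c & 1) {
        c = 0xedb88320UL ^ (c >> 1);
      } else {
        c = c >> 1;
      }
    }
  }
  /* Mask to 32 bits in case unsigned long is wider. */
  return (c ^ 0xffffffffUL) & 0xffffffffUL;
}

/* Hypothetical self-test: returns 1 if the one-shot and the chunked
   computation both match the commonly published check value for the
   nine ASCII bytes "123456789", and 0 otherwise. */
int crc32_self_test(void)
{
  const unsigned char *msg = (const unsigned char *) "123456789";
  unsigned long whole, parts;

  whole = crc32_update_bitwise(0UL, msg, 9);

  /* Same data, fed as two consecutive buffers. */
  parts = crc32_update_bitwise(0UL, msg, 4);
  parts = crc32_update_bitwise(parts, msg + 4, 5);

  return whole == 0xcbf43926UL && parts == whole;
}
```

   The bitwise form trades speed for simplicity.  The table-driven
   update_crc above should produce the same values and is likely the
   better choice for bulk data.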