.\" $File: magic.man,v 1.91 2017/02/12 15:30:08 christos Exp $
.Dd February 12, 2017
.Dt MAGIC __FSECTION__
.Os
.\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
.Sh NAME
.Nm magic
.Nd file command's magic pattern file
.Sh DESCRIPTION
This manual page documents the format of magic files as
used by the
.Xr file __CSECTION__
command, version __VERSION__.
The
.Xr file __CSECTION__
command identifies the type of a file using,
among other tests,
a test for whether the file contains certain
.Dq "magic patterns" .
The database of these
.Dq "magic patterns"
is usually located in a binary file in
.Pa __MAGIC__.mgc
or a directory of source text magic pattern fragment files in
.Pa __MAGIC__ .
The database specifies what patterns are to be tested for, what message or
MIME type to print if a particular pattern is found,
and additional information to extract from the file.
.Pp
The format of the source fragment files that are used to build this database
is as follows:
Each line of a fragment file specifies a test to be performed.
A test compares the data starting at a particular offset
in the file with a byte value, a string or a numeric value.
If the test succeeds, a message is printed.
The line consists of the following fields:
.Bl -tag -width ".Dv message"
.It Dv offset
A number specifying the offset, in bytes, into the file of the data
which is to be tested.
.It Dv type
The type of the data to be tested.
The possible values are:
.Bl -tag -width ".Dv lestring16"
.It Dv byte
A one-byte value.
.It Dv short
A two-byte value in this machine's native byte order.
.It Dv long
A four-byte value in this machine's native byte order.
.It Dv quad
An eight-byte value in this machine's native byte order.
.It Dv float
A 32-bit single precision IEEE floating point number in this machine's native byte order.
.It Dv double
A 64-bit double precision IEEE floating point number in this machine's native byte order.
.It Dv string
A string of bytes.
The string type specification can be optionally followed
by /[WwcCtbT]*.
The
.Dq W
flag compacts whitespace in the target, which must
contain at least one whitespace character.
If the magic has
.Dv n
consecutive blanks, the target needs at least
.Dv n
consecutive blanks to match.
The
.Dq w
flag treats every blank in the magic as an optional blank.
The
.Dq c
flag specifies case insensitive matching: lower case
characters in the magic match both lower and upper case characters in the
target, whereas upper case characters in the magic only match upper case
characters in the target.
The
.Dq C
flag specifies case insensitive matching: upper case
characters in the magic match both lower and upper case characters in the
target, whereas lower case characters in the magic only match lower case
characters in the target.
To do a complete case insensitive match, specify both
.Dq c
and
.Dq C .
The
.Dq t
flag forces the test to be done for text files, while the
.Dq b
flag forces the test to be done for binary files.
The
.Dq T
flag causes the string to be trimmed, i.e. leading and trailing whitespace
is deleted before the string is printed.
.It Dv pstring
A Pascal-style string where the first byte/short/int is interpreted as the
unsigned length.
The length defaults to byte and can be specified as a modifier.
The following modifiers are supported:
.Bl -tag -compact -width B
.It B
A byte length (default).
.It H
A 2 byte big endian length.
.It h
A 2 byte little endian length.
.It L
A 4 byte big endian length.
.It l
A 4 byte little endian length.
.It J
The length includes itself in its count.
.El
The string is not NUL terminated.
.Dq J
is used rather than the more
valuable
.Dq I
because this type of length is a feature of the JPEG
format.
.It Dv date
A four-byte value interpreted as a UNIX date.
.It Dv qdate
An eight-byte value interpreted as a UNIX date.
.It Dv ldate
A four-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qldate
An eight-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qwdate
An eight-byte value interpreted as a Windows-style date.
.It Dv beid3
A 32-bit ID3 length in big-endian byte order.
.It Dv beshort
A two-byte value in big-endian byte order.
.It Dv belong
A four-byte value in big-endian byte order.
.It Dv bequad
An eight-byte value in big-endian byte order.
.It Dv befloat
A 32-bit single precision IEEE floating point number in big-endian byte order.
.It Dv bedouble
A 64-bit double precision IEEE floating point number in big-endian byte order.
.It Dv bedate
A four-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beqdate
An eight-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beldate
A four-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqldate
An eight-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqwdate
An eight-byte value in big-endian byte order,
interpreted as a Windows-style date.
.It Dv bestring16
A two-byte Unicode (UCS16) string in big-endian byte order.
.It Dv leid3
A 32-bit ID3 length in little-endian byte order.
.It Dv leshort
A two-byte value in little-endian byte order.
.It Dv lelong
A four-byte value in little-endian byte order.
.It Dv lequad
An eight-byte value in little-endian byte order.
.It Dv lefloat
A 32-bit single precision IEEE floating point number in little-endian byte order.
.It Dv ledouble
A 64-bit double precision IEEE floating point number in little-endian byte order.
.It Dv ledate
A four-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leqdate
An eight-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leldate
A four-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqldate
An eight-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqwdate
An eight-byte value in little-endian byte order,
interpreted as a Windows-style date.
.It Dv lestring16
A two-byte Unicode (UCS16) string in little-endian byte order.
.It Dv melong
A four-byte value in middle-endian (PDP-11) byte order.
.It Dv medate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX date.
.It Dv meldate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv indirect
Starting at the given offset, consult the magic database again.
The offset of the
.Dv indirect
magic is by default absolute in the file, but one can specify
.Dv /r
to indicate that the offset is relative from the beginning of the entry.
.It Dv name
Define a
.Dq named
magic instance that can be called from another
.Dv use
magic entry, like a subroutine call.
Named instance direct magic offsets are relative to the offset of the
previous matched entry, but indirect offsets are relative to the beginning
of the file as usual.
Named magic entries always match.
.It Dv use
Recursively call the named magic starting from the current offset.
If the name of the referenced magic begins with a
.Dv ^
then the endianness of the magic is switched; if the magic mentioned
.Dv leshort
for example,
it is treated as
.Dv beshort
and vice versa.
This is useful to avoid duplicating the rules for different endianness.
.It Dv regex
A regular expression match in extended POSIX regular expression syntax
(like egrep).
Regular expressions can take exponential time to process, and their
performance is hard to predict, so their use is discouraged.
When used in production environments, their performance
should be carefully checked.
The size of the string to search should also be limited by specifying
.Dv /<length> ,
to avoid performance issues scanning long files.
The type specification can also be optionally followed by
.Dv /[c][s][l] .
The
.Dq c
flag makes the match case insensitive, while the
.Dq s
flag updates the offset to the start offset of the match, rather than the end.
The
.Dq l
modifier changes the limit of length to mean number of lines instead of a
byte count.
Lines are delimited by the platform's native line delimiter.
When a line count is specified, an implicit byte count is also computed assuming
each line is 80 characters long.
If neither a byte nor a line count is specified, the search is limited automatically
to 8KiB.
.Dv ^
and
.Dv $
match the beginning and end of individual lines, respectively,
not beginning and end of file.
.It Dv search
A literal string search starting at the given offset.
The same modifier flags can be used as for string patterns.
The search expression must contain the range in the form
.Dv /number,
that is the number of positions at which the match will be
attempted, starting from the start offset.
This is suitable for
searching larger binary expressions with variable offsets, using
.Dv \e
escapes for special characters.
The order of modifier and number is not relevant.
.It Dv default
This is intended to be used with the test
.Em x
(which is always true) and it has no type.
It matches when no other test at that continuation level has matched before.
Clearing the matched tests for a continuation level can be done using the
.Dv clear
test.
.It Dv clear
This test is always true and clears the match flag for that continuation level.
It is intended to be used with the
.Dv default
test.
.El
.Pp
For compatibility with the Single
.Ux
Standard, the type specifiers
.Dv dC
and
.Dv d1
are equivalent to
.Dv byte ,
the type specifiers
.Dv uC
and
.Dv u1
are equivalent to
.Dv ubyte ,
the type specifiers
.Dv dS
and
.Dv d2
are equivalent to
.Dv short ,
the type specifiers
.Dv uS
and
.Dv u2
are equivalent to
.Dv ushort ,
the type specifiers
.Dv dI ,
.Dv dL ,
and
.Dv d4
are equivalent to
.Dv long ,
the type specifiers
.Dv uI ,
.Dv uL ,
and
.Dv u4
are equivalent to
.Dv ulong ,
the type specifier
.Dv d8
is equivalent to
.Dv quad ,
the type specifier
.Dv u8
is equivalent to
.Dv uquad ,
and the type specifier
.Dv s
is equivalent to
.Dv string .
In addition, the type specifier
.Dv dQ
is equivalent to
.Dv quad
and the type specifier
.Dv uQ
is equivalent to
.Dv uquad .
.Pp
Each top-level magic pattern (see below for an explanation of levels)
is classified as text or binary according to the types used.
Types
.Dq regex
and
.Dq search
are classified as text tests, unless non-printable characters are used
in the pattern.
All other tests are classified as binary.
A top-level
pattern is considered to be a text test when all its patterns are text
patterns; otherwise, it is considered to be a binary pattern.
When
matching a file, binary patterns are tried first; if no match is
found, and the file looks like text, then its encoding is determined
and the text patterns are tried.
.Pp
The numeric types may optionally be followed by
.Dv \*[Am]
and a numeric value,
to specify that the value is to be AND'ed with the
numeric value before any comparisons are done.
Prepending a
.Dv u
to the type indicates that ordered comparisons should be unsigned.
.It Dv test
The value to be compared with the value from the file.
If the type is
numeric, this value
is specified in C form; if it is a string, it is specified as a C string
with the usual escapes permitted (e.g. \en for new-line).
.Pp
Numeric values
may be preceded by a character indicating the operation to be performed.
It may be
.Dv = ,
to specify that the value from the file must equal the specified value,
.Dv \*[Lt] ,
to specify that the value from the file must be less than the specified
value,
.Dv \*[Gt] ,
to specify that the value from the file must be greater than the specified
value,
.Dv \*[Am] ,
to specify that the value from the file must have set all of the bits
that are set in the specified value,
.Dv ^ ,
to specify that the value from the file must have clear any of the bits
that are set in the specified value,
.Dv ~ ,
to specify that the value given after it is negated before the test is
performed, or
.Dv x ,
to specify that any value will match.
If the character is omitted, it is assumed to be
.Dv = .
Operators
.Dv \*[Am] ,
.Dv ^ ,
and
.Dv ~
don't work with floats and doubles.
The operator
.Dv !\&
specifies that the line matches if the test does
.Em not
succeed.
.Pp
Numeric values are specified in C form; e.g.
.Dv 13
is decimal,
.Dv 013
is octal, and
.Dv 0x13
is hexadecimal.
.Pp
Numeric operations are not performed on date types; instead the numeric
value is interpreted as an offset.
.Pp
For string values, the string from the
file must match the specified string.
The operators
.Dv = ,
.Dv \*[Lt]
and
.Dv \*[Gt]
(but not
.Dv \*[Am] )
can be applied to strings.
The length used for matching is that of the string argument
in the magic file.
This means that a line can match any non-empty string (usually used to
then print the string), with
.Em \*[Gt]\e0
(because all non-empty strings are greater than the empty string).
.Pp
Dates are treated as numerical values in the respective internal
representation.
.Pp
The special test
.Em x
always evaluates to true.
.It Dv message
The message to be printed if the comparison succeeds.
If the string contains a
.Xr printf 3
format specification, the value from the file (with any specified masking
performed) is printed using the message as the format string.
If the string begins with
.Dq \eb ,
the message printed is the remainder of the string with no whitespace
added before it: multiple matches are normally separated by a single
space.
.El
.Pp
A 4+4 character APPLE creator and type can be specified as:
.Bd -literal -offset indent
!:apple	CREATYPE
.Ed
.Pp
A MIME type is given on a separate line, which must be the next
line that is neither blank nor a comment after the magic line that
identifies the file type, and has the following format:
.Bd -literal -offset indent
!:mime	MIMETYPE
.Ed
.Pp
i.e. the literal string
.Dq !:mime
followed by the MIME type.
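.Pp
As an illustrative sketch (the entry below was written for this page as an
assumed example and is not quoted from the shipped database), a top-level
test for the eight-byte PNG signature followed by its MIME annotation
could look like this:
.Bd -literal -offset indent
# hypothetical entry: PNG signature bytes 89 50 4E 47 0D 0A 1A 0A
0	string	\ex89PNG\ex0d\ex0a\ex1a\ex0a	PNG image data
!:mime	image/png
.Ed
.Pp
The
.Dq !:mime
line applies to the magic line directly above it.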
.Pp
An optional strength can be supplied on a separate line which refers to
the current magic description using the following format:
.Bd -literal -offset indent
!:strength OP VALUE
.Ed
.Pp
The operator
.Dv OP
can be:
.Dv + ,
.Dv - ,
.Dv * ,
or
.Dv /
and
.Dv VALUE
is a constant between 0 and 255.
This constant is applied using the specified operator
to the currently computed default magic strength.
.Pp
Some file formats contain additional information which is to be printed
along with the file type or need additional tests to determine the true
file type.
These additional tests are introduced by one or more
.Em \*[Gt]
characters preceding the offset.
The number of
.Em \*[Gt]
on the line indicates the level of the test; a line with no
.Em \*[Gt]
at the beginning is considered to be at level 0.
Tests are arranged in a tree-like hierarchy:
if the test on a line at level
.Em n
succeeds, all following tests at level
.Em n+1
are performed, and the messages printed if the tests succeed, until a line
with level
.Em n
(or less) appears.
For more complex files, one can use empty messages to get just the
"if/then" effect, in the following way:
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Lt]0x40	MS-DOS executable
\*[Gt]0x18	leshort	\*[Gt]0x3f	extended PC executable (e.g., MS Windows)
.Ed
.Pp
Offsets do not need to be constant, but can also be read from the file
being examined.
If the first character following the last
.Em \*[Gt]
is a
.Em \&(
then the string after the parenthesis is interpreted as an indirect offset.
That means that the number after the parenthesis is used as an offset in
the file.
The value at that offset is read, and is used again as an offset
in the file.
Indirect offsets are of the form:
.Em (( x [[.,][bislBISLm]][+\-][ y ]) .
The value of
.Em x
is used as an offset in the file.
A byte, id3 length, short or long is read at that offset depending on the
.Em [bislBISLm]
type specifier.
The value is treated as signed if
.Dq ,
is specified or unsigned if
.Dq .
is specified.
The capitalized types interpret the number as a big endian
value, whereas the small letter versions interpret the number as a little
endian value;
the
.Em m
type interprets the number as a middle endian (PDP-11) value.
To that number the value of
.Em y
is added and the result is used as an offset in the file.
The default type if one is not specified is long.
.Pp
That way variable length structures can be examined:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0	string	MZ
\*[Gt]0x18	leshort	\*[Lt]0x40	MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	PE\e0\e0	PE executable (MS-Windows)
\*[Gt]\*[Gt](0x3c.l)	string	LX\e0\e0	LX executable (OS/2)
.Ed
.Pp
This strategy of examining has a drawback: you must make sure that you
eventually print something, or users may get empty output (such as when
there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
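.Pp
One way to guard against empty output, sketched here on the assumption
that the
.Dv clear
and
.Dv default
types behave as described in the type list above, is to add a fallback at
the same continuation level as the indirect tests.
The fallback message text is invented for illustration:
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Gt]0x3f
# reset the match flag for this continuation level
\*[Gt]\*[Gt]0	clear	x
\*[Gt]\*[Gt](0x3c.l)	string	PE\e0\e0	PE executable (MS-Windows)
\*[Gt]\*[Gt](0x3c.l)	string	LX\e0\e0	LX executable (OS/2)
# taken only when neither signature above matched
\*[Gt]\*[Gt]0	default	x
\*[Gt]\*[Gt]\*[Gt]0	string	MZ	unrecognized extended executable
.Ed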
.Pp
If this indirect offset cannot be used directly, simple calculations are
possible: appending
.Em [+-*/%\*[Am]|^]number
inside parentheses allows one to modify
the value read from the file before it is used as an offset:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0	string	MZ
# sometimes, the value at 0x18 is less than 0x40 but there's still an
# extended executable, simply appended to the file
\*[Gt]0x18	leshort	\*[Lt]0x40
\*[Gt]\*[Gt](4.s*512)	leshort	0x014c	COFF executable (MS-DOS, DJGPP)
\*[Gt]\*[Gt](4.s*512)	leshort	!0x014c	MZ executable (MS-DOS)
.Ed
.Pp
Sometimes you do not know the exact offset as this depends on the length or
position (when indirection was used before) of preceding fields.
You can specify an offset relative to the end of the last up-level
field using
.Sq \*[Am]
as a prefix to the offset:
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	PE\e0\e0	PE executable (MS-Windows)
# immediately following the PE signature is the CPU type
\*[Gt]\*[Gt]\*[Gt]\*[Am]0	leshort	0x14c	for Intel 80386
\*[Gt]\*[Gt]\*[Gt]\*[Am]0	leshort	0x184	for DEC Alpha
.Ed
.Pp
Indirect and relative offsets can be combined:
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Lt]0x40
\*[Gt]\*[Gt](4.s*512)	leshort	!0x014c	MZ executable (MS-DOS)
# if it's not COFF, go back 512 bytes and add the offset taken
# from byte 2/3, which is yet another way of finding the start
# of the extended executable
\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514)	string	LE	LE executable (MS Windows VxD driver)
.Ed
.Pp
Or the other way around:
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	LE\e0\e0	LE executable (MS-Windows)
# at offset 0x80 (-4, since relative offsets start at the end
# of the up-level match)
# inside the LE header, we find the absolute
# offset to the code area, where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26)	string	UPX	\eb, UPX compressed
.Ed
.Pp
Or even both!
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	LE\e0\e0	LE executable (MS-Windows)
# at offset 0x58 inside the LE header, we find the relative offset
# to a data area where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3)	string	UNACE	\eb, ACE self-extracting archive
.Ed
.Pp
If you have to deal with offset/length pairs in your file, even the
second value in a parenthesized expression can be taken from the file itself,
using another set of parentheses.
Note that this additional indirect offset is always relative to the
start of the main indirect offset.
.Bd -literal -offset indent
0	string	MZ
\*[Gt]0x18	leshort	\*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l)	string	PE\e0\e0	PE executable (MS-Windows)
# search for the PE section called ".idata"...
\*[Gt]\*[Gt]\*[Gt]\*[Am]0xf4	search/0x140	.idata
# ...and go to the end of it, calculated from start+length;
# these are located 14 and 10 bytes after the section name
\*[Gt]\*[Gt]\*[Gt]\*[Gt](\*[Am]0xe.l+(-4))	string	PK\e3\e4	\eb, ZIP self-extracting archive
.Ed
.Pp
If you have a list of known values at a particular continuation level,
and you want to provide a switch-like default case:
.Bd -literal -offset indent
# clear that continuation level match
\*[Gt]18	clear	x
\*[Gt]18	lelong	1	one
\*[Gt]18	lelong	2	two
\*[Gt]18	default	x
# print default match
\*[Gt]\*[Gt]18	lelong	x	unmatched 0x%x
.Ed
.Sh SEE ALSO
.Xr file __CSECTION__
\- the command that reads this file.
.Sh BUGS
The formats
.Dv long ,
.Dv belong ,
.Dv lelong ,
.Dv melong ,
.Dv short ,
.Dv beshort ,
and
.Dv leshort
do not depend on the length of the C data types
.Dv short
and
.Dv long
on the platform, even though the Single
.Ux
Specification implies that they do.
However, as OS X Mountain Lion has
passed the Single
.Ux
Specification validation suite, and supplies a version of
.Xr file __CSECTION__
in which they do not depend on the sizes of the C data types and that is
built for a 64-bit environment in which
.Dv long
is 8 bytes rather than 4 bytes, presumably the validation suite does not
test whether, for example,
.Dv long
refers to an item with the same size as the C data type
.Dv long .
There should probably be
.Dv type
names
.Dv int8 ,
.Dv uint8 ,
.Dv int16 ,
.Dv uint16 ,
.Dv int32 ,
.Dv uint32 ,
.Dv int64 ,
and
.Dv uint64 ,
and specified-byte-order variants of them,
to make it clearer that those types have specified widths.
.\"
.\" From: guy@sun.uucp (Guy Harris)
.\" Newsgroups: net.bugs.usg
.\" Subject: /etc/magic's format isn't well documented
.\" Message-ID: <2752@sun.uucp>
.\" Date: 3 Sep 85 08:19:07 GMT
.\" Organization: Sun Microsystems, Inc.
.\" Lines: 136
.\"
.\" Here's a manual page for the format accepted by the "file" made by adding
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.