.It S2
.\" $File: magic.man,v 1.93 2018/06/22 20:39:49 christos Exp $
.Dd June 22, 2018
.Dt MAGIC __FSECTION__
.Os
.\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
.Sh NAME
.Nm magic
.Nd file command's magic pattern file
.Sh DESCRIPTION
This manual page documents the format of magic files as
used by the
.Xr file __CSECTION__
command, version __VERSION__.
The
.Xr file __CSECTION__
command identifies the type of a file using,
among other tests,
a test for whether the file contains certain
.Dq "magic patterns" .
The database of these
.Dq "magic patterns"
is usually located in a binary file in
.Pa __MAGIC__.mgc
or a directory of source text magic pattern fragment files in
.Pa __MAGIC__ .
The database specifies what patterns are to be tested for, what message or
MIME type to print if a particular pattern is found,
and additional information to extract from the file.
.Pp
The format of the source fragment files that are used to build this database
is as follows:
Each line of a fragment file specifies a test to be performed.
A test compares the data starting at a particular offset
in the file with a byte value, a string or a numeric value.
If the test succeeds, a message is printed.
The line consists of the following fields:
.Bl -tag -width ".Dv message"
.It Dv offset
A number specifying the offset (in bytes) into the file of the data
which is to be tested.
This offset can be a negative number if it is:
.Bl -bullet -compact
.It
The first direct offset of the magic entry (at continuation level 0),
in which case it is interpreted as an offset from the end of the file
going backwards.
This works only when a file descriptor to the file is available and it
is a regular file.
.It
A continuation offset relative to the end of the last up-level field
.Dv ( \*[Am] ) .
.El
.It Dv type
The type of the data to be tested.
The possible values are:
.Bl -tag -width ".Dv lestring16"
.It Dv byte
A one-byte value.
.It Dv short
A two-byte value in this machine's native byte order.
.It Dv long
A four-byte value in this machine's native byte order.
.It Dv quad
An eight-byte value in this machine's native byte order.
.It Dv float
A 32-bit single precision IEEE floating point number in this machine's native byte order.
.It Dv double
A 64-bit double precision IEEE floating point number in this machine's native byte order.
.It Dv string
A string of bytes.
The string type specification can be optionally followed
by /[WwcCtbT]*.
The
.Dq W
flag compacts whitespace in the target, which must
contain at least one whitespace character.
If the magic has
.Dv n
consecutive blanks, the target needs at least
.Dv n
consecutive blanks to match.
The
.Dq w
flag treats every blank in the magic as an optional blank.
The
.Dq c
flag specifies case insensitive matching: lower case
characters in the magic match both lower and upper case characters in the
target, whereas upper case characters in the magic only match upper case
characters in the target.
The
.Dq C
flag specifies case insensitive matching: upper case
characters in the magic match both lower and upper case characters in the
target, whereas lower case characters in the magic only match lower case
characters in the target.
To do a complete case insensitive match, specify both
.Dq c
and
.Dq C .
The
.Dq t
flag forces the test to be done for text files, while the
.Dq b
flag forces the test to be done for binary files.
The
.Dq T
flag causes the string to be trimmed, i.e. leading and trailing whitespace
is deleted before the string is printed.
.It Dv pstring
A Pascal-style string where the first byte/short/int is interpreted as the
unsigned length.
The length defaults to byte and can be specified as a modifier.
The following modifiers are supported:
.Bl -tag -compact -width B
.It B
A byte length (default).
.It H
A 2 byte big endian length.
.It h
A 2 byte little endian length.
.It L
A 4 byte big endian length.
.It l
A 4 byte little endian length.
.It J
The length includes itself in its count.
.El
The string is not NUL terminated.
.Dq J
is used rather than the more
valuable
.Dq I
because this type of length is a feature of the JPEG
format.
.It Dv date
A four-byte value interpreted as a UNIX date.
.It Dv qdate
An eight-byte value interpreted as a UNIX date.
.It Dv ldate
A four-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qldate
An eight-byte value interpreted as a UNIX-style date, but interpreted as
local time rather than UTC.
.It Dv qwdate
An eight-byte value interpreted as a Windows-style date.
.It Dv beid3
A 32-bit ID3 length in big-endian byte order.
.It Dv beshort
A two-byte value in big-endian byte order.
.It Dv belong
A four-byte value in big-endian byte order.
.It Dv bequad
An eight-byte value in big-endian byte order.
.It Dv befloat
A 32-bit single precision IEEE floating point number in big-endian byte order.
.It Dv bedouble
A 64-bit double precision IEEE floating point number in big-endian byte order.
.It Dv bedate
A four-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beqdate
An eight-byte value in big-endian byte order,
interpreted as a UNIX date.
.It Dv beldate
A four-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqldate
An eight-byte value in big-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv beqwdate
An eight-byte value in big-endian byte order,
interpreted as a Windows-style date.
.It Dv bestring16
A two-byte unicode (UCS16) string in big-endian byte order.
.It Dv leid3
A 32-bit ID3 length in little-endian byte order.
.It Dv leshort
A two-byte value in little-endian byte order.
.It Dv lelong
A four-byte value in little-endian byte order.
.It Dv lequad
An eight-byte value in little-endian byte order.
.It Dv lefloat
A 32-bit single precision IEEE floating point number in little-endian byte order.
.It Dv ledouble
A 64-bit double precision IEEE floating point number in little-endian byte order.
.It Dv ledate
A four-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leqdate
An eight-byte value in little-endian byte order,
interpreted as a UNIX date.
.It Dv leldate
A four-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqldate
An eight-byte value in little-endian byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv leqwdate
An eight-byte value in little-endian byte order,
interpreted as a Windows-style date.
.It Dv lestring16
A two-byte unicode (UCS16) string in little-endian byte order.
.It Dv melong
A four-byte value in middle-endian (PDP-11) byte order.
.It Dv medate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX date.
.It Dv meldate
A four-byte value in middle-endian (PDP-11) byte order,
interpreted as a UNIX-style date, but interpreted as local time rather
than UTC.
.It Dv indirect
Starting at the given offset, consult the magic database again.
The offset of the
.Dv indirect
magic is by default absolute in the file, but one can specify
.Dv /r
to indicate that the offset is relative from the beginning of the entry.
.It Dv name
Define a
.Dq named
magic instance that can be called from another
.Dv use
magic entry, like a subroutine call.
Named instance direct magic offsets are relative to the offset of the
previous matched entry, but indirect offsets are relative to the beginning
of the file as usual.
Named magic entries always match.
.It Dv use
Recursively call the named magic starting from the current offset.
If the name of the referenced magic begins with a
.Dv ^
then the endianness of the magic is switched; if the magic mentioned
.Dv leshort
for example,
it is treated as
.Dv beshort
and vice versa.
This is useful to avoid duplicating the rules for different endianness.
.It Dv regex
A regular expression match in extended POSIX regular expression syntax
(like egrep).
Regular expressions can take exponential time to process, and their
performance is hard to predict, so their use is discouraged.
When used in production environments, their performance
should be carefully checked.
The size of the string to search should also be limited by specifying
.Dv /<length> ,
to avoid performance issues scanning long files.
The type specification can also be optionally followed by
.Dv /[c][s][l] .
The
.Dq c
flag makes the match case insensitive, while the
.Dq s
flag updates the offset to the start offset of the match, rather than the end.
The
.Dq l
modifier changes the length limit to mean a number of lines instead of a
byte count.
Lines are delimited by the platform's native line delimiter.
When a line count is specified, an implicit byte count is also computed assuming
each line is 80 characters long.
If neither a byte nor a line count is specified, the search is limited automatically
to 8KiB.
.Dv ^
and
.Dv $
match the beginning and end of individual lines, respectively,
not beginning and end of file.
.It Dv search
A literal string search starting at the given offset.
The same modifier flags can be used as for string patterns.
The search expression must contain the range in the form
.Dv /number ,
that is the number of positions at which the match will be
attempted, starting from the start offset.
This is suitable for
searching larger binary expressions with variable offsets, using
.Dv \e
escapes for special characters.
The order of modifier and number is not relevant.
.It Dv default
This is intended to be used with the test
.Em x
(which is always true) and it has no type.
It matches when no other test at that continuation level has matched before.
Clearing the matched tests for a continuation level can be done using the
.Dv clear
test.
.It Dv clear
This test is always true and clears the match flag for that continuation level.
It is intended to be used with the
.Dv default
test.
.El
.Pp
For compatibility with the Single
.Ux
Standard, the type specifiers
.Dv dC
and
.Dv d1
are equivalent to
.Dv byte ,
the type specifiers
.Dv uC
and
.Dv u1
are equivalent to
.Dv ubyte ,
the type specifiers
.Dv dS
and
.Dv d2
are equivalent to
.Dv short ,
the type specifiers
.Dv uS
and
.Dv u2
are equivalent to
.Dv ushort ,
the type specifiers
.Dv dI ,
.Dv dL ,
and
.Dv d4
are equivalent to
.Dv long ,
the type specifiers
.Dv uI ,
.Dv uL ,
and
.Dv u4
are equivalent to
.Dv ulong ,
the type specifier
.Dv d8
is equivalent to
.Dv quad ,
the type specifier
.Dv u8
is equivalent to
.Dv uquad ,
and the type specifier
.Dv s
is equivalent to
.Dv string .
In addition, the type specifier
.Dv dQ
is equivalent to
.Dv quad
and the type specifier
.Dv uQ
is equivalent to
.Dv uquad .
.Pp
Each top-level magic pattern (see below for an explanation of levels)
is classified as text or binary according to the types used.
Types
.Dq regex
and
.Dq search
are classified as text tests, unless non-printable characters are used
in the pattern.
All other tests are classified as binary.
A top-level
pattern is considered to be a text pattern when all its patterns are text
patterns; otherwise, it is considered to be a binary pattern.
When
matching a file, binary patterns are tried first; if no match is
found, and the file looks like text, then its encoding is determined
and the text patterns are tried.
.Pp
The numeric types may optionally be followed by
.Dv \*[Am]
and a numeric value,
to specify that the value is to be AND'ed with the
numeric value before any comparisons are done.
Prepending a
.Dv u
to the type indicates that ordered comparisons should be unsigned.
.It Dv test
The value to be compared with the value from the file.
If the type is
numeric, this value
is specified in C form; if it is a string, it is specified as a C string
with the usual escapes permitted (e.g. \en for new-line).
.Pp
Numeric values
may be preceded by a character indicating the operation to be performed.
It may be
.Dv = ,
to specify that the value from the file must equal the specified value,
.Dv \*[Lt] ,
to specify that the value from the file must be less than the specified
value,
.Dv \*[Gt] ,
to specify that the value from the file must be greater than the specified
value,
.Dv \*[Am] ,
to specify that the value from the file must have set all of the bits
that are set in the specified value,
.Dv ^ ,
to specify that the value from the file must have clear at least one of
the bits that are set in the specified value,
.Dv ~ ,
to specify that the specified value is negated before it is tested, or
.Dv x ,
to specify that any value will match.
If the character is omitted, it is assumed to be
.Dv = .
Operators
.Dv \*[Am] ,
.Dv ^ ,
and
.Dv ~
don't work with floats and doubles.
The operator
.Dv !\&
specifies that the line matches if the test does
.Em not
succeed.
.Pp
Numeric values are specified in C form; e.g.
.Dv 13
is decimal,
.Dv 013
is octal, and
.Dv 0x13
is hexadecimal.
.Pp
Numeric operations are not performed on date types; instead, the numeric
value is interpreted as an offset.
.Pp
For string values, the string from the
file must match the specified string.
The operators
.Dv = ,
.Dv \*[Lt]
and
.Dv \*[Gt]
(but not
.Dv \*[Am] )
can be applied to strings.
The length used for matching is that of the string argument
in the magic file.
This means that a line can match any non-empty string (usually used to
then print the string), with
.Em \*[Gt]\e0
(because all non-empty strings are greater than the empty string).
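.Pp
As an illustrative sketch only (the offset and the message are arbitrary
and do not describe a real format), the following line uses the
.Em \*[Gt]\e0
test to match a non-empty string at offset 0 and print it through the
.Xr printf 3
format in the message:
.Bd -literal -offset indent
0 string \*[Gt]\e0 data starting with "%s"
.Ed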
.Pp
Dates are treated as numerical values in the respective internal
representation.
.Pp
The special test
.Em x
always evaluates to true.
.It Dv message
The message to be printed if the comparison succeeds.
If the string contains a
.Xr printf 3
format specification, the value from the file (with any specified masking
performed) is printed using the message as the format string.
If the string begins with
.Dq \eb ,
the message printed is the remainder of the string with no whitespace
added before it: multiple matches are normally separated by a single
space.
.El
.Pp
An Apple 4+4 character creator and type can be specified as:
.Bd -literal -offset indent
!:apple CREATYPE
.Ed
.Pp
A MIME type is given on a separate line, which must be the next
non-blank, non-comment line after the magic line that identifies the
file type, and has the following format:
.Bd -literal -offset indent
!:mime MIMETYPE
.Ed
.Pp
i.e. the literal string
.Dq !:mime
followed by the MIME type.
.Pp
An optional strength can be supplied on a separate line which refers to
the current magic description using the following format:
.Bd -literal -offset indent
!:strength OP VALUE
.Ed
.Pp
The operator
.Dv OP
can be:
.Dv + ,
.Dv - ,
.Dv * ,
or
.Dv /
and
.Dv VALUE
is a constant between 0 and 255.
This constant is applied using the specified operator
to the currently computed default magic strength.
.Pp
Some file formats contain additional information which is to be printed
along with the file type or need additional tests to determine the true
file type.
These additional tests are introduced by one or more
.Em \*[Gt]
characters preceding the offset.
The number of
.Em \*[Gt]
on the line indicates the level of the test; a line with no
.Em \*[Gt]
at the beginning is considered to be at level 0.
Tests are arranged in a tree-like hierarchy:
if the test on a line at level
.Em n
succeeds, all following tests at level
.Em n+1
are performed, and the messages printed if the tests succeed, until a line
with level
.Em n
(or less) appears.
For more complex files, one can use empty messages to get just the
"if/then" effect, in the following way:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40 MS-DOS executable
\*[Gt]0x18 leshort \*[Gt]0x3f extended PC executable (e.g., MS Windows)
.Ed
.Pp
Offsets do not need to be constant, but can also be read from the file
being examined.
If the first character following the last
.Em \*[Gt]
is a
.Em \&(
then the string after the parenthesis is interpreted as an indirect offset.
That means that the number after the parenthesis is used as an offset in
the file.
The value at that offset is read, and is used again as an offset
in the file.
Indirect offsets are of the form:
.Em (( x [[.,][bBcCeEfFgGhHiIlmsSqQ]][+\-][ y ]) .
The value of
.Em x
is used as an offset in the file.
A byte, id3 length, short or long is read at that offset depending on the
.Em [bBcCeEfFgGhHiIlmsSqQ]
type specifier.
The value is treated as signed if
.Dq \&,
is specified or unsigned if
.Dq \&.
is specified.
The capitalized types interpret the number as a big endian
value, whereas the small letter versions interpret the number as a little
endian value;
the
.Em m
type interprets the number as a middle endian (PDP-11) value.
To that number the value of
.Em y
is added and the result is used as an offset in the file.
The default type if one is not specified is long.
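.Pp
As a hedged sketch of the form described above (the signature, offsets and
messages are made up for illustration and do not describe a real format),
the continuation line below reads an unsigned little-endian short at
offset 8, adds 2 to it, and tests the byte found at the resulting offset:
.Bd -literal -offset indent
# hypothetical container format, for illustration only
0 string DEMO hypothetical DEMO container
\*[Gt](8.s+2) byte x \eb, payload starts with byte %d
.Ed
.Pp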
The following types are recognized:
.Bl -column -offset indent "Type" "Half/Short" "Little" "Size"
.It Sy Type Ta Sy Mnemonic Ta Sy Endian Ta Sy Size
.It bcBC Ta Byte/Char Ta N/A Ta 1
.It efg Ta Double Ta Little Ta 8
.It EFG Ta Double Ta Big Ta 8
.It hs Ta Half/Short Ta Little Ta 2
.It HS Ta Half/Short Ta Big Ta 2
.It i Ta ID3 Ta Little Ta 4
.It I Ta ID3 Ta Big Ta 4
.It m Ta Middle Ta Middle Ta 4
.It q Ta Quad Ta Little Ta 8
.It Q Ta Quad Ta Big Ta 8
.El
.Pp
That way variable length structures can be examined:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40 MZ executable (MS-DOS)
# skip the whole block below if it is not an extended executable
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
\*[Gt]\*[Gt](0x3c.l) string LX\e0\e0 LX executable (OS/2)
.Ed
.Pp
This strategy of examining has a drawback: you must make sure that you
eventually print something, or users may get empty output (such as when
there is neither PE\e0\e0 nor LX\e0\e0 in the above example).
.Pp
If this indirect offset cannot be used directly, simple calculations are
possible: appending
.Em [+-*/%\*[Am]|^]number
inside parentheses allows one to modify
the value read from the file before it is used as an offset:
.Bd -literal -offset indent
# MS Windows executables are also valid MS-DOS executables
0 string MZ
# sometimes, the value at 0x18 is less than 0x40 but there's still an
# extended executable, simply appended to the file
\*[Gt]0x18 leshort \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512) leshort 0x014c COFF executable (MS-DOS, DJGPP)
\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
.Ed
.Pp
Sometimes you do not know the exact offset as this depends on the length or
position (when indirection was used before) of preceding fields.
You can specify an offset relative to the end of the last up-level
field using
.Sq \*[Am]
as a prefix to the offset:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
# immediately following the PE signature is the CPU type
\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x14c for Intel 80386
\*[Gt]\*[Gt]\*[Gt]\*[Am]0 leshort 0x184 for DEC Alpha
.Ed
.Pp
Indirect and relative offsets can be combined:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Lt]0x40
\*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
# if it's not COFF, go back 512 bytes and add the offset taken
# from byte 2/3, which is yet another way of finding the start
# of the extended executable
\*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514) string LE LE executable (MS Windows VxD driver)
.Ed
.Pp
Or the other way around:
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
# at offset 0x80 (-4, since relative offsets start at the end
# of the up-level match) inside the LE header, we find the absolute
# offset to the code area, where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26) string UPX \eb, UPX compressed
.Ed
.Pp
Or even both!
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string LE\e0\e0 LE executable (MS-Windows)
# at offset 0x58 inside the LE header, we find the relative offset
# to a data area where we look for a specific signature
\*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3) string UNACE \eb, ACE self-extracting archive
.Ed
.Pp
If you have to deal with offset/length pairs in your file, even the
second value in a parenthesized expression can be taken from the file itself,
using another set of parentheses.
Note that this additional indirect offset is always relative to the
start of the main indirect offset.
.Bd -literal -offset indent
0 string MZ
\*[Gt]0x18 leshort \*[Gt]0x3f
\*[Gt]\*[Gt](0x3c.l) string PE\e0\e0 PE executable (MS-Windows)
# search for the PE section called ".idata"...
\*[Gt]\*[Gt]\*[Gt]\*[Am]0xf4 search/0x140 .idata
# ...and go to the end of it, calculated from start+length;
# these are located 14 and 10 bytes after the section name
\*[Gt]\*[Gt]\*[Gt]\*[Gt](\*[Am]0xe.l+(-4)) string PK\e3\e4 \eb, ZIP self-extracting archive
.Ed
.Pp
If you have a list of known values at a particular continuation level,
you can provide a switch-like default case:
.Bd -literal -offset indent
# clear that continuation level match
\*[Gt]18 clear
\*[Gt]18 lelong 1 one
\*[Gt]18 lelong 2 two
\*[Gt]18 default x
# print default match
\*[Gt]\*[Gt]18 lelong x unmatched 0x%x
.Ed
.Sh SEE ALSO
.Xr file __CSECTION__
\- the command that reads this file.
.Sh BUGS
The formats
.Dv long ,
.Dv belong ,
.Dv lelong ,
.Dv melong ,
.Dv short ,
.Dv beshort ,
and
.Dv leshort
do not depend on the length of the C data types
.Dv short
and
.Dv long
on the platform, even though the Single
.Ux
Specification implies that they do.
However, as OS X Mountain Lion has
passed the Single
.Ux
Specification validation suite, and supplies a version of
.Xr file __CSECTION__
in which they do not depend on the sizes of the C data types and that is
built for a 64-bit environment in which
.Dv long
is 8 bytes rather than 4 bytes, presumably the validation suite does not
test whether, for example,
.Dv long
refers to an item with the same size as the C data type
.Dv long .
There should probably be
.Dv type
names
.Dv int8 ,
.Dv uint8 ,
.Dv int16 ,
.Dv uint16 ,
.Dv int32 ,
.Dv uint32 ,
.Dv int64 ,
and
.Dv uint64 ,
and specified-byte-order variants of them,
to make it clearer that those types have specified widths.
.\"
.\" From: guy@sun.uucp (Guy Harris)
.\" Newsgroups: net.bugs.usg
.\" Subject: /etc/magic's format isn't well documented
.\" Message-ID: <2752@sun.uucp>
.\" Date: 3 Sep 85 08:19:07 GMT
.\" Organization: Sun Microsystems, Inc.
.\" Lines: 136
.\"
.\" Here's a manual page for the format accepted by the "file" made by adding
.\" the changes I posted to the S5R2 version.
.\"
.\" Modified for Ian Darwin's version of the file command.