.\" $File: file.man,v 1.106 2014/03/07 23:11:51 christos Exp $
.Dd January 30, 2014
.Dt FILE __CSECTION__
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Bk -words
.Op Fl bcEhiklLNnprsvz0
.Op Fl Fl apple
.Op Fl Fl mime-encoding
.Op Fl Fl mime-type
.Op Fl e Ar testname
.Op Fl F Ar separator
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
.Ar
.Ek
.Nm
.Fl C
.Op Fl m Ar magicfiles
.Nm
.Op Fl Fl help
.Sh DESCRIPTION
This manual page documents version __VERSION__ of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Tn UNIX
kernel or another),
or
.Em data
meaning anything else (data is usually
.Dq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying magic files or the program itself, make sure to
.Em "preserve these keywords" .
Users depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in the system header file
.In sys/stat.h .
.Pp
The magic tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h
and possibly
.In exec.h
in the standard include directory.
These files have a
.Dq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Tn UNIX
operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Dq "magic"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.Pa __MAGIC__.mgc ,
or the files in the directory
.Pa __MAGIC__
if the compiled file does not exist.
In addition, if
.Pa $HOME/.magic.mgc
or
.Pa $HOME/.magic
exists, it will be used in preference to the system magic files.
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Dq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Dq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (cf.
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives).
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be
.Dq data .
.Sh OPTIONS
.Bl -tag -width indent
.It Fl Fl apple
Causes the file command to output the file type and creator code as
used by older MacOS versions.
The code consists of eight letters,
the first four describing the file type, the last four the creator.
.It Fl b , Fl Fl brief
Do not prepend filenames to output lines (brief mode).
.It Fl C , Fl Fl compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file or directory.
.It Fl c , Fl Fl checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl E
On filesystem errors (file not found etc.), instead of handling the error
as regular output as POSIX mandates and keeping going, issue an error message
and exit.
.It Fl e , Fl Fl exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width compress
.It apptype
.Dv EMX
application type (only on EMX).
.It ascii
Various types of text files (this test will try to guess the text
encoding, irrespective of the setting of the
.Sq encoding
option).
.It encoding
Different text encodings for soft magic tests.
.It tokens
Ignored for backwards compatibility.
.It cdf
Prints details of Compound Document Files.
.It compress
Checks for, and looks inside, compressed files.
.It elf
Prints ELF file details.
.It soft
Consults magic files.
.It tar
Examines tar files.
.El
.It Fl F , Fl Fl separator Ar separator
Use the specified string as the separator between the filename and the
file result returned.
Defaults to
.Sq \&: .
.It Fl f , Fl Fl files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
Please note that
.Ar namefile
is unwrapped and the enclosed filenames are processed when this option is
encountered and before any further options processing is done.
This allows one to process multiple lists of files with different command line
arguments on the same
.Nm
invocation.
Thus if you want to set the delimiter, you need to do it before you specify
the list of files, like:
.Dq Fl F Ar @ Fl f Ar namefile ,
instead of:
.Dq Fl f Ar namefile Fl F Ar @ .
.It Fl h , Fl Fl no-dereference
Causes symlinks not to be followed
(on systems that support symbolic links).
This is the default if the environment variable
.Ev POSIXLY_CORRECT
is not defined.
.It Fl i , Fl Fl mime
Causes the file command to output mime type strings rather than the more
traditional human readable ones.
Thus it may say
.Sq text/plain; charset=us-ascii
rather than
.Dq ASCII text .
.It Fl Fl mime-type , Fl Fl mime-encoding
Like
.Fl i ,
but print only the specified element(s).
.It Fl k , Fl Fl keep-going
Don't stop at the first match, keep going.
Subsequent matches will
have the string
.Sq "\[rs]012\- "
prepended.
(If you want a newline, see the
.Fl r
option.)
The magic pattern with the highest strength (see the
.Fl l
option) comes first.
.It Fl l , Fl Fl list
Shows a list of patterns and their strength sorted descending by
.Xr magic 4
strength
which is used for the matching (see also the
.Fl k
option).
.It Fl L , Fl Fl dereference
Causes symlinks to be followed, as the like-named option in
.Xr ls 1
(on systems that support symbolic links).
This is the default if the environment variable
.Ev POSIXLY_CORRECT
is defined.
.It Fl m , Fl Fl magic-file Ar magicfiles
Specify an alternate list of files and directories containing magic.
This can be a single item, or a colon-separated list.
If a compiled magic file is found alongside a file or directory,
it will be used instead.
.It Fl N , Fl Fl no-pad
Don't pad filenames so that they align in the output.
.It Fl n , Fl Fl no-buffer
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl p , Fl Fl preserve-date
On systems that support
.Xr utime 3
or
.Xr utimes 2 ,
attempt to preserve the access time of files analyzed, to pretend that
.Nm
never read them.
.It Fl r , Fl Fl raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , Fl Fl special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2
since on some systems it reports a zero size for raw disk partitions.
.It Fl v , Fl Fl version
Print the version of the program and exit.
.It Fl z , Fl Fl uncompress
Try to look inside compressed files.
.It Fl 0 , Fl Fl print0
Output a null character
.Sq \e0
after the end of the filename.
This is useful when processing the output with
.Xr cut 1 .
This does not affect the separator, which is still printed.
.It Fl Fl help
Print a help message and exit.
.El
.Sh FILES
.Bl -tag -width __MAGIC__.mgc -compact
.It Pa __MAGIC__.mgc
Default compiled list of magic.
.It Pa __MAGIC__
Directory containing default magic files.
.El
.Sh ENVIRONMENT
The environment variable
.Ev MAGIC
can be used to set the default magic file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Dq Pa .mgc
to the value of this variable as appropriate.
However,
.Pa file
has to exist in order for
.Pa file.mime
to be considered.
The environment variable
.Ev POSIXLY_CORRECT
controls (on systems that support symbolic links) whether
.Nm
will attempt to follow symlinks or not.
If set, then
.Nm
follows symlinks; otherwise it does not.
This is also controlled by the
.Fl L
and
.Fl h
options.
.Sh SEE ALSO
.Xr magic __FSECTION__ ,
.Xr hexdump 1 ,
.Xr od 1 ,
.Xr strings 1
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
\*[Gt]10 string language impress\ (imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
\*[Gt]10 string language\e impress (imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example
.Bd -literal -offset indent
0 string \ebegindata Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0 string \e\ebegindata Andrew Toolkit document
.Ed
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
This version differs from Sun's only in minor ways.
It includes the extension of the
.Sq \*[Am]
operator, used as,
for example,
.Bd -literal -offset indent
\*[Gt]16 long\*[Am]0x7fffffff \*[Gt]0 not stripped
.Ed
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa __MAGIC__.orig ) .
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c: C program text
file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
 dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda: block special (3/0)

$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda: x86 boot sector
/dev/hda1: Linux/i386 ext2 filesystem
/dev/hda2: x86 boot sector
/dev/hda3: x86 boot sector, extended partition table
/dev/hda4: Linux/i386 ext2 filesystem
/dev/hda5: Linux/i386 swap file
/dev/hda6: Linux/i386 swap file
/dev/hda7: Linux/i386 swap file
/dev/hda8: Linux/i386 swap file
/dev/hda9: empty
/dev/hda10: empty

$ file -i file.c file /dev/{wd0a,hda}
file.c: text/x-c
file: application/x-executable
/dev/hda: application/x-not-regular-file
/dev/wd0a: application/x-not-regular-file

.Ed
.Sh HISTORY
There has been a
.Nm
command in every
.Dv UNIX
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin
.Aq ian@darwinsys.com
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
Contribution of the
.Sq \*[Am]
operator by Rob McMahon,
.Aq cudcv@warwick.ac.uk ,
1989.
.Pp
Guy Harris,
.Aq guy@netapp.com ,
made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas
.Aq christos@astron.com .
.Pp
Altered by Chris Lowth
.Aq chris@lowth.com ,
2000: handle the
.Fl i
option to output mime type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer
.Aq enf@pobox.com ,
July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
Altered by Reuben Thomas
.Aq rrt@sc3d.org ,
2007-2011, to improve MIME support, merge MIME and non-MIME magic,
support directories as well as files of magic, apply many bug fixes,
update and fix a lot of magic, improve the build system, improve the
documentation, and rewrite the Python bindings in pure Python.
.Pp
The list of contributors to the
.Sq magic
directory (magic files)
is too long to include here.
You know who you are; thank you.
Many contributors are listed in the source files.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
COPYING in the source distribution.
.Pp
The files
.Pa tar.h
and
.Pa is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh RETURN CODE
.Nm
returns 0 on success, and non-zero on error.
.Sh BUGS
.Pp
Please report bugs and send patches to the bug tracker at
.Pa http://bugs.gw.com/
or the mailing list at
.Aq file@mx.gw.com
(visit
.Pa http://mx.gw.com/mailman/listinfo/file
first to subscribe).
.Sh TODO
.Pp
Fix output so that tests for MIME and APPLE flags are not needed all
over the place, and actual output is only done in one place.
This needs a design.
Suggestion: push possible outputs on to a list, then pick the
last-pushed (most specific, one hopes) value at the end, or
use a default if the list is empty.
This should not slow down evaluation.
.Pp
Continue to squash all magic bugs.
See Debian BTS for a good source.
.Pp
Store arbitrarily long strings, for example for %s patterns, so that
they can be printed out.
Fixes Debian bug #271672.
Would require more complex store/load code in apprentice.
.Pp
Add syntax for relative offsets after current level (Debian bug #466037).
.Pp
Make file -ki work, i.e. give multiple MIME types.
.Pp
Add a zip library so we can peek inside Office2007 documents to
figure out what they are.
.Pp
Add an option to print URLs for the sources of the file descriptions.
.Pp
Combine script searches and add a way to map executable names to MIME
types (e.g. have a magic value for !:mime which causes the resulting
string to be looked up in a table).
This would avoid adding the same magic repeatedly for each new
hash-bang interpreter.
.Pp
Fix
.Dq name
and
.Dq use
to check for consistency at compile time (duplicate
.Dq name ,
.Dq use
pointing to undefined
.Dq name
).
Make
.Dq name
/
.Dq use
more efficient by keeping a sorted list of names.
Special-case ^ to flip endianness in the parser so that it does not
have to be escaped, and document it.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Pa ftp.astron.com
in the directory
.Pa /pub/file/file-X.YZ.tar.gz .