.\" $File: file.man,v 1.140 2020/06/07 17:41:07 christos Exp $
.Dd June 7, 2020
.Dt FILE __CSECTION__
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Bk -words
.Op Fl bcdEhiklLNnprsSvzZ0
.Op Fl Fl apple
.Op Fl Fl exclude-quiet
.Op Fl Fl extension
.Op Fl Fl mime-encoding
.Op Fl Fl mime-type
.Op Fl e Ar testname
.Op Fl F Ar separator
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
.Op Fl P Ar name=value
.Ar
.Ek
.Nm
.Fl C
.Op Fl m Ar magicfiles
.Nm
.Op Fl Fl help
.Sh DESCRIPTION
This manual page documents version __VERSION__ of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Tn UNIX
kernel or another),
or
.Em data
meaning anything else (data is usually
.Dq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying magic files or the program itself, make sure to
.Em "preserve these keywords" .
Users depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
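.Pp
As a rough illustration only, the kind of classification made at this stage
can be approximated in
.Xr sh 1
with the file operators of
.Xr test 1 .
The function name
.Li fs_kind
and the strings it prints are hypothetical and are not taken from
.Nm ,
which examines the
.Xr stat 2
result directly:
.Bd -literal -offset indent
# Hypothetical sketch; not how file is implemented.
# Print a rough filesystem-level classification of "$1", or return 1
# when the filesystem tests alone do not settle the question.
fs_kind() {
    if   [ -L "$1" ]; then echo "symbolic link"
    elif [ -d "$1" ]; then echo "directory"
    elif [ -p "$1" ]; then echo "fifo (named pipe)"
    elif [ -S "$1" ]; then echo "socket"
    elif [ -b "$1" ]; then echo "block special"
    elif [ -c "$1" ]; then echo "character special"
    elif [ -f "$1" ] && [ ! -s "$1" ]; then echo "empty"
    else return 1    # non-empty regular file, or no such file
    fi
}
.Ed
.Pp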
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in the system header file
.In sys/stat.h .
.Pp
The magic tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h
and possibly
.In exec.h
in the standard include directory.
These files have a
.Dq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Tn UNIX
operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Dq "magic number"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.Pa __MAGIC__.mgc ,
or the files in the directory
.Pa __MAGIC__
if the compiled file does not exist.
In addition, if
.Pa $HOME/.magic.mgc
or
.Pa $HOME/.magic
exists, it will be used in preference to the system magic files.
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
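.Pp
The byte-range idea can be sketched in
.Xr sh 1
for the simplest case, plain ASCII.
This is an illustration only and not the algorithm
.Nm
uses; the function name
.Li looks_ascii
is hypothetical, and the check assumes that every
.Li [:print:]
or
.Li [:space:]
byte of the C locale counts as acceptable text:
.Bd -literal -offset indent
# Hypothetical sketch; not how file is implemented.
# Succeed if "$1" holds only printable ASCII and ASCII whitespace.
looks_ascii() {
    # Delete every acceptable byte, then count what is left over.
    left=$(LC_ALL=C tr -d '[:print:][:space:]' < "$1" | wc -c)
    [ "$left" -eq 0 ]
}
.Ed
.Pp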
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Dq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Dq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (cf.
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives, JSON files).
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be
.Dq data .
.Sh OPTIONS
.Bl -tag -width indent
.It Fl Fl apple
Causes the file command to output the file type and creator code as
used by older MacOS versions.
The code consists of eight letters,
the first four describing the file type, the last four the creator.
This option works properly only for file formats that have the
apple-style output defined.
.It Fl b , Fl Fl brief
Do not prepend filenames to output lines (brief mode).
.It Fl C , Fl Fl compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file or directory.
.It Fl c , Fl Fl checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl d
Prints internal debugging information to stderr.
.It Fl E
On filesystem errors (file not found, etc.), instead of handling the error
as regular output as POSIX mandates and continuing, issue an error message
and exit.
.It Fl e , Fl Fl exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width compress
.It apptype
.Dv EMX
application type (only on EMX).
.It ascii
Various types of text files (this test will try to guess the text
encoding, irrespective of the setting of the
.Sq encoding
option).
.It encoding
Different text encodings for soft magic tests.
.It tokens
Ignored for backwards compatibility.
.It cdf
Prints details of Compound Document Files.
.It compress
Checks for, and looks inside, compressed files.
.It csv
Checks Comma Separated Value files.
.It elf
Prints ELF file details, provided soft magic tests are enabled and the
elf magic is found.
.It json
Examines JSON (RFC-7159) files by parsing them for compliance.
.It soft
Consults magic files.
.It tar
Examines tar files by verifying the checksum of the 512 byte tar header.
Excluding this test can provide more detailed content description by using
the soft magic method.
.It text
A synonym for
.Sq ascii .
.El
.It Fl Fl exclude-quiet
Like
.Fl Fl exclude
but ignore tests that
.Nm
does not know about.
This is intended for compatibility with older versions of
.Nm .
.It Fl Fl extension
Print a slash-separated list of valid extensions for the file type found.
.It Fl F , Fl Fl separator Ar separator
Use the specified string as the separator between the filename and the
file result returned.
Defaults to
.Sq \&: .
.It Fl f , Fl Fl files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
Please note that
.Ar namefile
is unwrapped and the enclosed filenames are processed when this option is
encountered and before any further options processing is done.
This allows one to process multiple lists of files with different command line
arguments on the same
.Nm
invocation.
Thus if you want to set the delimiter, you need to do it before you specify
the list of files, like:
.Dq Fl F Ar @ Fl f Ar namefile ,
instead of:
.Dq Fl f Ar namefile Fl F Ar @ .
.It Fl h , Fl Fl no-dereference
Causes symlinks not to be followed
(on systems that support symbolic links).
This is the default if the environment variable
.Ev POSIXLY_CORRECT
is not defined.
.It Fl i , Fl Fl mime
Causes the file command to output mime type strings rather than the more
traditional human readable ones.
Thus it may say
.Sq text/plain; charset=us-ascii
rather than
.Dq ASCII text .
.It Fl Fl mime-type , Fl Fl mime-encoding
Like
.Fl i ,
but print only the specified element(s).
.It Fl k , Fl Fl keep-going
Don't stop at the first match, keep going.
Subsequent matches will
have the string
.Sq "\[rs]012\- "
prepended.
(If you want a newline, see the
.Fl r
option.)
The magic pattern with the highest strength (see the
.Fl l
option) comes first.
.It Fl l , Fl Fl list
Shows a list of patterns and their strength sorted descending by
.Xr magic __FSECTION__
strength
which is used for the matching (see also the
.Fl k
option).
.It Fl L , Fl Fl dereference
Causes symlinks to be followed, as with the like-named option in
.Xr ls 1
(on systems that support symbolic links).
This is the default if the environment variable
.Ev POSIXLY_CORRECT
is defined.
.It Fl m , Fl Fl magic-file Ar magicfiles
Specify an alternate list of files and directories containing magic.
This can be a single item, or a colon-separated list.
If a compiled magic file is found alongside a file or directory,
it will be used instead.
.It Fl N , Fl Fl no-pad
Don't pad filenames so that they align in the output.
.It Fl n , Fl Fl no-buffer
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl p , Fl Fl preserve-date
On systems that support
.Xr utime 3
or
.Xr utimes 2 ,
attempt to preserve the access time of files analyzed, to pretend that
.Nm
never read them.
.It Fl P , Fl Fl parameter Ar name=value
Set various parameter limits.
.Bl -column "elf_phnum" "Default" "XXXXXXXXXXXXXXXXXXXXXXXXXXX" -offset indent
.It Sy "Name" Ta Sy "Default" Ta Sy "Explanation"
.It Li bytes Ta 1048576 Ta max number of bytes to read from file
.It Li elf_notes Ta 256 Ta max ELF notes processed
.It Li elf_phnum Ta 2048 Ta max ELF program sections processed
.It Li elf_shnum Ta 32768 Ta max ELF sections processed
.It Li indir Ta 50 Ta recursion limit for indirect magic
.It Li name Ta 50 Ta use count limit for name/use magic
.It Li regex Ta 8192 Ta length limit for regex searches
.El
.It Fl r , Fl Fl raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , Fl Fl special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2
since on some systems it reports a zero size for raw disk partitions.
.It Fl S , Fl Fl no-sandbox
On systems where libseccomp
.Pa ( https://github.com/seccomp/libseccomp )
is available, the
.Fl S
flag disables sandboxing which is enabled by default.
This option is needed for
.Nm
to execute external decompressing programs,
i.e. when the
.Fl z
flag is specified and the built-in decompressors are not available.
On systems where sandboxing is not available, this option has no effect.
.It Fl v , Fl Fl version
Print the version of the program and exit.
.It Fl z , Fl Fl uncompress
Try to look inside compressed files.
.It Fl Z , Fl Fl uncompress-noreport
Try to look inside compressed files, but report information only about the
contents, not the compression.
.It Fl 0 , Fl Fl print0
Output a null character
.Sq \e0
after the end of the filename.
This makes it convenient to
.Xr cut 1
the output.
This does not affect the separator, which is still printed.
.Pp
If this option is given more than once, then
.Nm
prints just the filename followed by a NUL followed by the description
(or ERROR: text) followed by a second NUL for each entry.
.It Fl Fl help
Print a help message and exit.
.El
.Sh ENVIRONMENT
The environment variable
.Ev MAGIC
can be used to set the default magic file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Dq Pa .mgc
to the value of this variable as appropriate.
The environment variable
.Ev POSIXLY_CORRECT
controls (on systems that support symbolic links) whether
.Nm
will attempt to follow symlinks or not.
If set, then
.Nm
follows symlinks; otherwise it does not.
This is also controlled by the
.Fl L
and
.Fl h
options.
.Sh FILES
.Bl -tag -width __MAGIC__.mgc -compact
.It Pa __MAGIC__.mgc
Default compiled list of magic.
.It Pa __MAGIC__
Directory containing default magic files.
.El
.Sh EXIT STATUS
.Nm
will exit with
.Dv 0
if the operation was successful or
.Dv >0
if an error was encountered.
The following errors cause diagnostic messages, but don't affect the program
exit code (as POSIX requires), unless
.Fl E
is specified:
.Bl -bullet -compact -offset indent
.It
A file cannot be found
.It
There is no permission to read a file
.It
The file type cannot be determined
.El
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c:    C program text
file:      ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
           dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda:  block special (3/0)

$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda:   x86 boot sector
/dev/hda1:  Linux/i386 ext2 filesystem
/dev/hda2:  x86 boot sector
/dev/hda3:  x86 boot sector, extended partition table
/dev/hda4:  Linux/i386 ext2 filesystem
/dev/hda5:  Linux/i386 swap file
/dev/hda6:  Linux/i386 swap file
/dev/hda7:  Linux/i386 swap file
/dev/hda8:  Linux/i386 swap file
/dev/hda9:  empty
/dev/hda10: empty

$ file -i file.c file /dev/{wd0a,hda}
file.c:    text/x-c
file:      application/x-executable
/dev/hda:  application/x-not-regular-file
/dev/wd0a: application/x-not-regular-file

.Ed
.Sh SEE ALSO
.Xr hexdump 1 ,
.Xr od 1 ,
.Xr strings 1 ,
.Xr magic __FSECTION__ ,
.Xr fstyp 8
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
\*[Gt]10 string language impress\ (imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
\*[Gt]10 string language\e impress (imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example
.Bd -literal -offset indent
0 string \ebegindata Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0 string \e\ebegindata Andrew Toolkit document
.Ed
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
This version differs from Sun's only in minor ways.
It includes the extension of the
.Sq \*[Am]
operator, used as,
for example,
.Bd -literal -offset indent
\*[Gt]16 long\*[Am]0x7fffffff \*[Gt]0 not stripped
.Ed
.Sh SECURITY
On systems where libseccomp
.Pa ( https://github.com/seccomp/libseccomp )
is available,
.Nm
enforces limiting system calls to only the ones necessary for the
operation of the program.
This enforcement does not provide any security benefit when
.Nm
is asked to decompress input files by running external programs with
the
.Fl z
option.
To enable execution of external decompressors, one needs to disable
sandboxing using the
.Fl S
flag.
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa __MAGIC__.orig ) .
.Sh HISTORY
There has been a
.Nm
command in every
.Dv UNIX
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin
.Aq ian@darwinsys.com
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
Contributions of the
.Sq \*[Am]
operator by Rob McMahon,
.Aq cudcv@warwick.ac.uk ,
1989.
.Pp
Guy Harris,
.Aq guy@netapp.com ,
made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas
.Aq christos@astron.com .
.Pp
Altered by Chris Lowth
.Aq chris@lowth.com ,
2000: handle the
.Fl i
option to output mime type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer
.Aq enf@pobox.com ,
July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
Altered by Reuben Thomas
.Aq rrt@sc3d.org ,
2007-2011, to improve MIME support, merge MIME and non-MIME magic,
support directories as well as files of magic, apply many bug fixes,
update and fix a lot of magic, improve the build system, improve the
documentation, and rewrite the Python bindings in pure Python.
.Pp
The list of contributors to the
.Sq magic
directory (magic files)
is too long to include here.
You know who you are; thank you.
Many contributors are listed in the source files.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
COPYING in the source distribution.
.Pp
The files
.Pa tar.h
and
.Pa is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh BUGS
Please report bugs and send patches to the bug tracker at
.Pa https://bugs.astron.com/
or the mailing list at
.Aq file@astron.com
(visit
.Pa https://mailman.astron.com/mailman/listinfo/file
first to subscribe).
.Sh TODO
Fix output so that tests for MIME and APPLE flags are not needed all
over the place, and actual output is only done in one place.
This needs a design.
Suggestion: push possible outputs on to a list, then pick the
last-pushed (most specific, one hopes) value at the end, or
use a default if the list is empty.
This should not slow down evaluation.
.Pp
The handling of
.Dv MAGIC_CONTINUE
and printing \e012- between entries is clumsy and complicated; refactor
and centralize.
.Pp
Some of the encoding logic is hard-coded in encoding.c and could be moved
to the magic files if we had a !:charset annotation.
.Pp
Continue to squash all magic bugs.
See Debian BTS for a good source.
.Pp
Store arbitrarily long strings, for example for %s patterns, so that
they can be printed out.
Fixes Debian bug #271672.
This can be done by allocating strings in a string pool, storing the
string pool at the end of the magic file and converting all the string
pointers to relative offsets from the string pool.
.Pp
Add syntax for relative offsets after current level (Debian bug #466037).
.Pp
Make file -ki work, i.e. give multiple MIME types.
.Pp
Add a zip library so we can peek inside Office2007 documents to
print more details about their contents.
.Pp
Add an option to print URLs for the sources of the file descriptions.
.Pp
Combine script searches and add a way to map executable names to MIME
types (e.g. have a magic value for !:mime which causes the resulting
string to be looked up in a table).
This would avoid adding the same magic repeatedly for each new
hash-bang interpreter.
.Pp
When a file descriptor is available, we can skip and adjust the buffer
instead of the hacky buffer management we do now.
.Pp
Fix
.Dq name
and
.Dq use
to check for consistency at compile time (duplicate
.Dq name ,
.Dq use
pointing to undefined
.Dq name ) .
Make
.Dq name
/
.Dq use
more efficient by keeping a sorted list of names.
Special-case ^ to flip endianness in the parser so that it does not
have to be escaped, and document it.
.Pp
If the offsets specified internally in the file exceed the buffer size
(
.Dv HOWMANY
variable in file.h), then we don't seek to that offset, but we give up.
It would be better if buffer management was done when the file descriptor
is available so that we can move around the file.
One must be careful though because this has performance (and thus security)
considerations.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Pa ftp.astron.com
in the directory
.Pa /pub/file/file-X.YZ.tar.gz .