## README for file(1) Command and the libmagic(3) library ##

    @(#) $File: README.md,v 1.4 2021/10/21 01:51:31 christos Exp $

- Bug Tracker: <https://bugs.astron.com/>
- Build Status: <https://travis-ci.org/file/file>
- Download link: <ftp://ftp.astron.com/pub/file/>
- E-mail: <christos@astron.com>
- Fuzzing link: <https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&can=1&q=proj:file>
- Home page: <https://www.darwinsys.com/file/>
- Mailing List archives: <https://mailman.astron.com/pipermail/file/>
- Mailing List: <file@astron.com>
- Public repo: <https://github.com/file/file>
- Test framework: <https://github.com/file/file-tests>

Phone: Do not even think of telephoning me about this program. Send
cash first!

This is Release 5.x of Ian Darwin's (copyrighted but distributable)
file(1) command, an implementation of the Unix file(1) command.
It knows the 'magic numbers' of several thousand file types.
This version is the standard "file" command for Linux, *BSD, and
other systems. (See "patchlevel.h" for the exact release number.)

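To make the idea of a 'magic number' concrete, here is a minimal C sketch
that compares the leading bytes of a buffer against a tiny hand-written
table. The table contents, the `struct signature` layout, the description
strings and the `match_signature()` name are all assumptions made for
illustration; file(1) itself reads its patterns from the magic database and
supports far richer tests than a plain prefix comparison. A caller would
read the first few bytes of a file into a buffer and pass them in; a NULL
return means that no entry of this toy table matched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature entry: a fixed byte prefix and a description. */
struct signature {
    const unsigned char *bytes;   /* expected leading bytes */
    size_t len;                   /* number of bytes to compare */
    const char *description;      /* text reported on a match */
};

/* A tiny illustrative table; every entry is at least 32 bits long. */
static const struct signature signatures[] = {
    { (const unsigned char *)"\x89PNG\r\n\x1a\n", 8, "PNG image data" },
    { (const unsigned char *)"\x7f" "ELF",        4, "ELF" },
    { (const unsigned char *)"%PDF-",             5, "PDF document" },
    { (const unsigned char *)"PK\x03\x04",        4, "Zip archive data" },
};

/*
 * Return the description of the first signature that prefixes buf,
 * or NULL if none matches.
 */
const char *match_signature(const unsigned char *buf, size_t buflen)
{
    size_t i;

    assert(buf != NULL || buflen == 0);

    for (i = 0; i < sizeof(signatures) / sizeof(signatures[0]); i++) {
        const struct signature *s = &signatures[i];

        /* Never compare past the end of a short buffer. */
        if (buflen >= s->len && memcmp(buf, s->bytes, s->len) == 0)
            return s->description;
    }
    return NULL;
}
```
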
The major changes for 5.x are CDF file parsing, indirect magic,
name/use (recursion), and an overhaul of MIME and ASCII encoding
handling.

The major feature of 4.x is the refactoring of the code into a
library, and the rewrite of the file command in terms of that
library. The library itself, libmagic, can be used by third-party
programs that wish to identify file types without having to fork()
and exec() file. The prime contributor for 4.0 was Mans Rullgard.

UNIX is a trademark of UNIX System Laboratories.

The prime contributor to Release 3.8 was Guy Harris, who put in
megachanges including byte-order independence.

The prime contributor to Release 3.0 was Christos Zoulas, who put
in hundreds of lines of source code changes, including his own
ANSIfication of the code (I liked my own ANSIfication better, but
his (__P()) is the "Berkeley standard" way of doing it, and I wanted
UCB to include the code...), his HP-like "indirection" (a feature
of the HP file command, I think), and his mods that finally got
the uncompress (-z) mode finished and working.

This release has compiled in numerous environments; see PORTING
for a list of them and the problems encountered.

This fine freeware file(1) follows the USG (System V) model of the
file command, rather than the Research (V7) version or the V7-derived
4.[23] Berkeley one. That is, the file /etc/magic contains much of
the ritual information that is the source of this program's power.
My version knows a little more magic (including tar archives) than
System V; the /etc/magic parsing seems to be compatible with the
(poorly documented) System V /etc/magic format (with one exception;
see the man page).

In addition, the /etc/magic file is built from a subdirectory
for easier(?) maintenance. I will act as a clearinghouse for
magic numbers assigned to all sorts of data files that
are in reasonable circulation. Send your magic numbers,
in magic(5) format please, to the maintainer, Christos Zoulas.

* `COPYING` - read this first.
* `README` - read this second (you are currently reading this file).
* `INSTALL` - read on how to install
* `src/apprentice.c` - parses /etc/magic to learn magic
* `src/apptype.c` - used for OS/2 specific application type magic
* `src/ascmagic.c` - third & last set of tests, based on hardwired assumptions.
* `src/asctime_r.c` - replacement for OS's that don't have it.
* `src/asprintf.c` - replacement for OS's that don't have it.
* `src/buffer.c` - buffer handling functions.
* `src/cdf.[ch]` - parser for Microsoft Compound Document Files
* `src/cdf_time.c` - time converter for CDF.
* `src/compress.c` - handles decompressing files to look inside.
* `src/ctime_r.c` - replacement for OS's that don't have it.
* `src/der.[ch]` - parser for Distinguished Encoding Rules
* `src/dprintf.c` - replacement for OS's that don't have it.
* `src/elfclass.h` - common code for elf 32/64.
* `src/encoding.c` - handles unicode encodings
* `src/file.c` - the main program
* `src/file.h` - header file
* `src/file_opts.h` - list of options
* `src/fmtcheck.c` - replacement for OS's that don't have it.
* `src/fsmagic.c` - first set of tests the program runs, based on filesystem info
* `src/funcs.c` - utility functions
* `src/getline.c` - replacement for OS's that don't have it.
* `src/getopt_long.c` - replacement for OS's that don't have it.
* `src/gmtime_r.c` - replacement for OS's that don't have it.
* `src/is_csv.c` - knows about Comma Separated Value file format (RFC 4180).
* `src/is_json.c` - knows about JavaScript Object Notation format (RFC 8259).
* `src/is_tar.c, tar.h` - knows about Tape ARchive format (courtesy John Gilmore).
* `src/localtime_r.c` - replacement for OS's that don't have it.
* `src/magic.h.in` - source file for magic.h
* `src/mygetopt.h` - replacement for OS's that don't have it.
* `src/magic.c` - the libmagic api
* `src/names.h` - header file for ascmagic.c
* `src/pread.c` - replacement for OS's that don't have it.
* `src/print.c` - print results, errors, warnings.
* `src/readcdf.c` - CDF wrapper.
* `src/readelf.[ch]` - Stand-alone elf parsing code.
* `src/softmagic.c` - 2nd set of tests, based on /etc/magic
* `src/strcasestr.c` - replacement for OS's that don't have it.
* `src/strlcat.c` - replacement for OS's that don't have it.
* `src/strlcpy.c` - replacement for OS's that don't have it.
* `src/strndup.c` - replacement for OS's that don't have it.
* `src/tar.h` - tar file definitions
* `src/vasprintf.c` - for systems that don't have it.
* `doc/file.man` - man page for the command
* `doc/magic.man` - man page for the magic file, courtesy Guy Harris.
  Install as magic.4 on USG and magic.5 on V7 or Berkeley; cf Makefile.

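The listing above names three sets of tests that run in order: file system
tests (`src/fsmagic.c`), magic database tests (`src/softmagic.c`) and
hardwired text tests (`src/ascmagic.c`). The C sketch below only
illustrates that "first answer wins" ordering. Every function name, every
result string and the single hardcoded PDF signature are assumptions made
for this example, and the real sources perform many more checks at each
stage.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/stat.h>

/* Stage 1 (cf. fsmagic.c): decide from file system metadata alone. */
static const char *by_filesystem(const struct stat *st)
{
    if (S_ISDIR(st->st_mode))
        return "directory";
    if (S_ISREG(st->st_mode) && st->st_size == 0)
        return "empty";
    return NULL;
}

/* Stage 2 (cf. softmagic.c): one hardcoded entry stands in for /etc/magic. */
static const char *by_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char pdf[] = "%PDF-";

    if (len >= 5 && memcmp(buf, pdf, 5) == 0)
        return "PDF document";
    return NULL;
}

/* Stage 3 (cf. ascmagic.c): a hardwired guess that the data is plain text. */
static const char *by_text(const unsigned char *buf, size_t len)
{
    size_t i;

    if (len == 0)
        return NULL;
    for (i = 0; i < len; i++) {
        unsigned char c = buf[i];

        if (!(c == '\t' || c == '\n' || c == '\r' || (c >= 0x20 && c < 0x7f)))
            return NULL;
    }
    return "ASCII text";
}

/* Run the stages in order and report the first answer; fall back to "data". */
const char *classify(const struct stat *st, const unsigned char *buf, size_t len)
{
    const char *r;

    assert(st != NULL && (buf != NULL || len == 0));

    if ((r = by_filesystem(st)) != NULL)
        return r;
    if ((r = by_magic(buf, len)) != NULL)
        return r;
    if ((r = by_text(buf, len)) != NULL)
        return r;
    return "data";
}
```
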
Magdir - directory of /etc/magic pieces
------------------------------------------------------------------------------

If you submit a new magic entry, please make sure you read the following
guidelines:

- The initial match is preferably at least 32 bits long, and is a _unique_ match
- If this is not feasible, use additional checks
- Matches of <= 16 bits are not accepted
- Delay printing strings as much as possible; don't print output too early
- Avoid printing arbitrary bytes as a string, which can be a source of
  crashes and buffer overflows

- Provide complete information with the entry:
  * One-line short summary
  * Optional long description
  * File extension, if applicable
  * Full name and contact method (for discussion when the entry has a problem)
  * Further references, such as documentation of the format

gpg for dummies:
------------------------------------------------------------------------------

```
$ gpg --verify file-X.YY.tar.gz.asc file-X.YY.tar.gz
gpg: assuming signed data in `file-X.YY.tar.gz'
gpg: Signature made WWW MMM DD HH:MM:SS YYYY ZZZ using DSA key ID KKKKKKKK
```

To download the key:

```
$ gpg --keyserver hkp://keys.gnupg.net --recv-keys KKKKKKKK
```
------------------------------------------------------------------------------


Parts of this software were developed at SoftQuad Inc., developers
of SGML/HTML/XML publishing software, in Toronto, Canada.
SoftQuad was swallowed up by Corel in 2002 and does not exist any longer.