xref: /freebsd/contrib/file/magic/Magdir/statistics (revision a4d6d3b8910f3805eebcd8703e11e066aad2e2a1)

#------------------------------------------------------------------------------
# $File: statistics,v 1.3 2022/03/24 15:48:58 christos Exp $
# statistics:  file(1) magic for statistics related software
#

# From Remy Rampin

# Stata is a statistical software tool that was created in 1985. While I
# don't personally use it, data files in its native (proprietary) format
# are common (.dta files).
#
# Because they are so common, especially in statistical and social
# sciences, Stata files and SPSS files can be opened by a lot of modern
# software; for example, Python's pandas package provides built-in
# support for them (read_stata() and read_spss()).
#
# I noticed that the magic database includes an entry for SPSS files but
# not Stata files. Stata files for Stata 13 and newer (formats 117, 118,
# and 119) always begin with the string "<stata_dta><header>" as per
# https://www.stata.com/help.cgi?dta#definition
#
# The format version number always follows, for example:
#    <stata_dta><header><release>117</release>
#    <stata_dta><header><release>118</release>
#
# Therefore the following line would do the trick:
#    0       string  <stata_dta><header>     Stata Data File
#
# (I'm sure the version number could be captured as well but I did not
# manage this without a regex)
#
# Unfortunately the previous formats (created by Stata versions before 13,
# which was released in 2013) are harder to recognize. Format 115 starts
# with the four bytes 0x73010100 or 0x73020100, format 114 with 0x72010100
# or 0x72020100, format 113 with 0x71010101 or 0x71020101.
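#
# A possible, untested sketch for those older formats, kept commented out
# on purpose. It assumes each four-byte signature listed above can be
# matched as a single big-endian 32-bit value at offset 0. Four bytes is a
# weak signature, so false positives are likely. The description strings
# are illustrative and do not come from the contributor.
#0	belong	0x73010100	Stata Data File (Release 115)
#0	belong	0x73020100	Stata Data File (Release 115)
#0	belong	0x72010100	Stata Data File (Release 114)
#0	belong	0x72020100	Stata Data File (Release 114)
#0	belong	0x71010101	Stata Data File (Release 113)
#0	belong	0x71020101	Stata Data File (Release 113)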
#
# For additional reference, the Library of Congress website has an entry
# for the Stata Data File Format 118:
# https://www.loc.gov/preservation/digital/formats/fdd/fdd000471.shtml
#
# Examples of these files can be found on Zenodo:
# https://zenodo.org/search?page=1&size=20&q=&file_type=dta
0	string	\<stata_dta\>\<header\>\<release\>	Stata Data File
>&0	regex	[0-9]+					(Release %s)
46