xref: /freebsd/contrib/file/magic/Magdir/msooxml (revision ae316d1d1cffd71ab7751f94e10118777a88e027)

#------------------------------------------------------------------------------
# $File: msooxml,v 1.23 2024/07/19 18:48:23 christos Exp $
# msooxml:  file(1) magic for Microsoft Office XML
# From: Ralf Brown <ralf.brown@gmail.com>

# .docx, .pptx, and .xlsx are XML plus other files inside a ZIP
#   archive.  The first member file is normally "[Content_Types].xml",
#   but some LibreOffice-generated files put this later. Perhaps skip
#   the "[Content_Types].xml" test?
# Since MSOOXML doesn't have anything like the uncompressed "mimetype"
#   file of ePub or OpenDocument, we'll have to scan for a filename
#   which can distinguish between the three types.

0		name		msooxml
>0		string		word/		Microsoft Word 2007+
!:mime application/vnd.openxmlformats-officedocument.wordprocessingml.document
!:ext	docx
>0		string		ppt/		Microsoft PowerPoint 2007+
!:mime application/vnd.openxmlformats-officedocument.presentationml.presentation
!:ext	pptx
>0		string		xl/		Microsoft Excel 2007+
!:mime application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
!:ext	xlsx
>0		string		visio/		Microsoft Visio 2013+
!:mime application/vnd.ms-visio.drawing.main+xml
>0		string		AppManifest.xaml	Microsoft Silverlight Application
!:mime application/x-silverlight-app

# start by checking for ZIP local file header signature
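# For reference, a sketch of the ZIP local file header layout that the
# offsets in the rules below rely on (byte offsets from the start of the
# PK\003\004 signature; multi-byte fields are little-endian):
#    0  signature "PK\003\004"   (4 bytes)
#    4  version needed           (2 bytes)
#    6  general purpose flags    (2 bytes)
#    8  compression method       (2 bytes)
#   10  last mod time            (2 bytes)
#   12  last mod date            (2 bytes)
#   14  CRC-32                   (4 bytes)
#   18  compressed size          (4 bytes)
#   22  uncompressed size        (4 bytes)
#   26  file name length         (2 bytes)
#   28  extra field length       (2 bytes)
#   30  file name (0x1E), followed by the extra field and the member data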
0		string		PK\003\004
!:strength +10
# make sure the first file is correct
>0x1E		use		msooxml
>0x1E		default		x
>>0x1E		regex		\\[Content_Types\\]\\.xml|_rels/\\.rels|docProps|customXml
# skip to the second local file header
# since some documents include a 520-byte extra field following the file
# header, we need to scan for the next header
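# A reading of the offsets used from here on, offered as an explanatory
# sketch rather than an authoritative statement of intent:
#   (18.l+49) reads the 4-byte little-endian compressed size at offset 18
#   and adds 49, i.e. 30 bytes of fixed header plus 19 bytes for the name
#   "[Content_Types].xml".  That is only a starting point (it assumes that
#   name and no extra field), so search/6000 scans forward from there.
#   Each &26 appears to be relative to the end of the 4-byte PK\003\004
#   match, which puts it 30 (0x1E) bytes into that header, on the member's
#   file name.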
>>>(18.l+49)	search/6000	PK\003\004
>>>>&26		use		msooxml
>>>>&26		default		x
# now skip to the *third* local file header; again, we need to scan due to a
# 520-byte extra field following the file header
>>>>>&26	search/6000	PK\003\004
# and check the subdirectory name to determine which type of OOXML
# file we have.  Correct the mimetype with the registered ones:
# https://technet.microsoft.com/en-us/library/cc179224.aspx
>>>>>>&26	use		msooxml
>>>>>>&26	default		x
# OpenOffice/LibreOffice order ZIP entries differently, so check the 4th file
>>>>>>>&26	search/6000	PK\003\004
>>>>>>>>&26	use		msooxml
# Some OOXML generators add an extra customXml directory. Check another file.
>>>>>>>>&26	default		x
>>>>>>>>>&26	search/6000	PK\003\004
>>>>>>>>>>&26	use		msooxml
>>>>>>>>>>&26	default		x
>>>>>>>>>>>&26	search/6000	PK\003\004
>>>>>>>>>>>>&26	use		msooxml
>>>>>>>>>>>>&26	default		x		Microsoft OOXML
>>>>>>>>>>>&26	default		x		Microsoft OOXML
>>>>>>>>>>&26	default		x		Microsoft OOXML
>>>>>>>>>&26	default		x		Microsoft OOXML
>>>>>>>>&26	default		x		Microsoft OOXML
>>>>>>>&26	default		x		Microsoft OOXML
>>>>>>&26	default		x		Microsoft OOXML
>>0x1E		regex		\\[trash\\]
>>>&26		search/6000	PK\003\004
>>>>&26		search/6000	PK\003\004
>>>>>&26	use		msooxml
>>>>>&26	default		x
>>>>>>&26	search/6000	PK\003\004
>>>>>>>&26	use		msooxml
>>>>>>>&26	default		x		Microsoft OOXML
>>>>>>&26	default		x		Microsoft OOXML
>>>>>&26	default		x		Microsoft OOXML