.\" $Id: mandoc.3,v 1.46 2025/02/25 17:03:54 schwarze Exp $
.\"
.\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
.\" Copyright (c) 2010-2017 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: February 25 2025 $
.Dt MANDOC 3
.Os
.Sh NAME
.Nm mandoc ,
.Nm deroff ,
.Nm mparse_alloc ,
.Nm mparse_copy ,
.Nm mparse_free ,
.Nm mparse_open ,
.Nm mparse_readfd ,
.Nm mparse_reset ,
.Nm mparse_result
.Nd mandoc macro compiler library
.Sh SYNOPSIS
.In sys/types.h
.In stdio.h
.In mandoc.h
.In roff.h
.In mandoc_parse.h
.Pp
.Fd "#define ASCII_NBRSP"
.Fd "#define ASCII_HYPH"
.Fd "#define ASCII_BREAK"
.Ft struct mparse *
.Fo mparse_alloc
.Fa "int options"
.Fa "enum mandoc_os os_e"
.Fa "char *os_s"
.Fc
.Ft void
.Fo mparse_free
.Fa "struct mparse *parse"
.Fc
.Ft void
.Fo mparse_copy
.Fa "const struct mparse *parse"
.Fc
.Ft int
.Fo mparse_open
.Fa "struct mparse *parse"
.Fa "const char *fname"
.Fc
.Ft void
.Fo mparse_readfd
.Fa "struct mparse *parse"
.Fa "int fd"
.Fa "const char *fname"
.Fc
.Ft void
.Fo mparse_reset
.Fa "struct mparse *parse"
.Fc
.Ft struct roff_meta *
.Fo mparse_result
.Fa "struct mparse *parse"
.Fc
.In roff.h
.Ft void
.Fo deroff
.Fa "char **dest"
.Fa "const struct roff_node *node"
.Fc
.In sys/types.h
.In mandoc.h
.In mdoc.h
.Vt extern const char * const * mdoc_argnames;
.Vt extern const char * const * mdoc_macronames;
.In sys/types.h
.In mandoc.h
.In man.h
.Vt extern const char * const * man_macronames;
.Sh DESCRIPTION
The
.Nm mandoc
library parses a
.Ux
manual into an abstract syntax tree (AST).
.Ux
manuals are composed of
.Xr mdoc 7
or
.Xr man 7 ,
and may be mixed with
.Xr roff 7 ,
.Xr tbl 7 ,
and
.Xr eqn 7
invocations.
.Pp
The following describes a general parse sequence:
.Bl -enum
.It
initiate a parsing sequence with
.Xr mchars_alloc 3
and
.Fn mparse_alloc ;
.It
open a file with
.Xr open 2
or
.Fn mparse_open ;
.It
parse it with
.Fn mparse_readfd ;
.It
close it with
.Xr close 2 ;
.It
retrieve the syntax tree with
.Fn mparse_result ;
.It
if information about the validity of the input is needed, fetch it with
.Fn mparse_updaterc ;
.It
iterate over parse nodes, starting from the
.Fa first
member of the returned
.Vt struct roff_meta ;
.It
free all allocated memory with
.Fn mparse_free
and
.Xr mchars_free 3 ,
or invoke
.Fn mparse_reset
and go back to step 2 to parse new files.
.El
.Pp
The design goals of the
.Nm mandoc
library are limited to providing the functionality required by the
.Xr mandoc 1
program.
Consequently, the functions documented in the present manual page
do not aim for API stability.
Any third-party program using them typically requires adjustments after every
.Nm mandoc
release.
Linking such a program requires
.Fl lz
because
.Fn mparse_readfd
calls
.Xr gzdopen 3 ,
.Xr gzread 3 ,
.Xr gzerror 3 ,
and
.Xr gzclose 3 .
For
.Xr mandoc 1
itself, the
.Pa ./configure
script automatically adds
.Fl lz
to the
.Ev LDADD
.Xr make 1
variable.
.Sh REFERENCE
This section documents the functions, types, and variables available
via
.In mandoc.h ,
with the exception of those documented in
.Xr mandoc_escape 3
and
.Xr mchars_alloc 3 .
.Ss Types
.Bl -ohang
.It Vt "enum mandocerr"
An error or warning message during parsing.
.It Vt "enum mandoclevel"
A classification of an
.Vt "enum mandocerr"
as regards system operation.
See the DIAGNOSTICS section in
.Xr mandoc 1
regarding the meanings of the levels.
.It Vt "struct mparse"
An opaque pointer to a running parse sequence.
Created with
.Fn mparse_alloc
and freed with
.Fn mparse_free .
This may be used across parsed input if
.Fn mparse_reset
is called between parses.
.El
.Ss Functions
.Bl -ohang
.It Fn deroff
Obtain a text-only representation of a
.Vt struct roff_node ,
including text contained in its child nodes.
To be used on children of the
.Fa first
member of
.Vt struct roff_meta .
When it is no longer needed, the string that
.Fn deroff
stores in
.Fa *dest
can be passed to
.Xr free 3 .
.It Fn mparse_alloc
Allocate a parser.
The arguments have the following effect:
.Bl -tag -offset 5n -width inttype
.It Ar options
When the
.Dv MPARSE_MDOC
or
.Dv MPARSE_MAN
bit is set, only that parser is used.
Otherwise, the document type is automatically detected.
.Pp
When the
.Dv MPARSE_SO
bit is set,
.Xr roff 7
.Ic \&so
file inclusion requests are always honoured.
Otherwise, if the request is the only content in an input file,
only the file name is remembered, to be returned in the
.Fa sodest
field of
.Vt struct roff_meta .
.Pp
When the
.Dv MPARSE_QUICK
bit is set, parsing is aborted after the NAME section.
This is for example useful in
.Xr makewhatis 8
.Fl Q
to quickly build minimal databases.
.Pp
When the
.Dv MPARSE_VALIDATE
bit is set,
.Fn mparse_result
runs the validation functions before returning the syntax tree.
This is almost always required, except in certain debugging scenarios,
for example to dump unvalidated syntax trees.
.It Ar os_e
Operating system to check base system conventions for.
If
.Dv MANDOC_OS_OTHER ,
the system is automatically detected from
.Ic \&Os ,
.Fl Ios ,
or
.Xr uname 3 .
.It Ar os_s
A default string for the
.Xr mdoc 7
.Ic \&Os
macro, overriding the
.Dv OSNAME
preprocessor definition and the results of
.Xr uname 3 .
Passing
.Dv NULL
sets no default.
.El
.Pp
The same parser may be used for multiple files so long as
.Fn mparse_reset
is called between parses.
.Fn mparse_free
must be called to free the memory allocated by this function.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_free
Free all memory allocated by
.Fn mparse_alloc .
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_copy
Dump a copy of the input to the standard output; used for
.Fl man T Ns Cm man .
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_open
Open the file for reading.
If that fails and
.Fa fname
does not already end in
.Ql .gz ,
try again after appending
.Ql .gz .
Remember whether the file is compressed.
Return a file descriptor open for reading, or -1 on failure.
The descriptor can be passed to
.Fn mparse_readfd
or used directly.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_readfd
Parse a file descriptor opened with
.Xr open 2
or
.Fn mparse_open .
Pass the associated filename in
.Fa fname .
This function may be called multiple times with different parameters; however,
.Xr close 2
and
.Fn mparse_reset
should be invoked between parses.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_reset
Reset a parser so that
.Fn mparse_readfd
may be used again.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_result
Obtain the result of a parse.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.El
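.Pp
The file name fallback described for
.Fn mparse_open
can be sketched as follows.
This is only an illustration under stated assumptions, not the code in
.Pa read.c :
the helper
.Fn open_maybe_gz ,
its
.Fa gzipped
output parameter, and the fixed-size path buffer are all hypothetical.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the mandoc library: open fname
 * read-only and, if that fails and the name does not already end
 * in ".gz", retry with ".gz" appended.  Sets *gzipped to 1 when the
 * name that was tried last ends in ".gz".  Returns a descriptor
 * or -1 on failure.
 */
int
open_maybe_gz(const char *fname, int *gzipped)
{
	char	 path[1024];	/* assumed limit for this sketch */
	size_t	 len;
	int	 fd;

	assert(fname != NULL && gzipped != NULL);
	len = strlen(fname);
	*gzipped = len > 3 && strcmp(fname + len - 3, ".gz") == 0;

	/* First try the name exactly as given. */
	if ((fd = open(fname, O_RDONLY)) != -1)
		return fd;
	if (*gzipped)
		return -1;	/* nothing else to try */

	/* Retry with the suffix appended; give up on truncation. */
	if ((size_t)snprintf(path, sizeof(path), "%s.gz", fname) >=
	    sizeof(path))
		return -1;
	if ((fd = open(path, O_RDONLY)) != -1)
		*gzipped = 1;
	return fd;
}
```
.Ed
.Pp
A caller of such a helper would use the flag to decide whether the
descriptor needs to be read through a decompressor.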
.Ss Variables
.Bl -ohang
.It Va man_macronames
The string representation of a
.Xr man 7
macro as indexed by
.Vt "enum mant" .
.It Va mdoc_argnames
The string representation of an
.Xr mdoc 7
macro argument as indexed by
.Vt "enum mdocargt" .
.It Va mdoc_macronames
The string representation of an
.Xr mdoc 7
macro as indexed by
.Vt "enum mdoct" .
.El
.Sh IMPLEMENTATION NOTES
This section consists of structural documentation for
.Xr mdoc 7
and
.Xr man 7
syntax trees and strings.
.Ss Man and Mdoc Strings
Strings may be extracted from mdoc and man meta-data, or from text
nodes (MDOC_TEXT and MAN_TEXT, respectively).
These strings have special non-printing formatting cues embedded in the
text itself, as well as
.Xr roff 7
escapes preserved from input.
Implementing systems will need to handle both situations to produce
human-readable text.
In general, strings may be assumed to consist of 7-bit ASCII characters.
.Pp
The following non-printing characters may be embedded in text strings:
.Bl -tag -width Ds
.It Dv ASCII_NBRSP
A non-breaking space character.
.It Dv ASCII_HYPH
A soft hyphen.
.It Dv ASCII_BREAK
A breakable zero-width space.
.El
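.Pp
One way to handle these cues when producing plain ASCII text is
sketched below.
The numeric values given for the three constants are assumptions made
so that the example is self-contained; real code should take them from
.In mandoc.h .
The helper
.Fn plain_cues
is hypothetical, and other output formats may want different
replacements.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/*
 * Assumed values, for illustration only.  Real code should rely on
 * the definitions in <mandoc.h> instead.
 */
#ifndef ASCII_NBRSP
#define ASCII_NBRSP	31	/* non-breaking space */
#define ASCII_HYPH	30	/* soft hyphen */
#define ASCII_BREAK	29	/* breakable zero-width space */
#endif

/*
 * Hypothetical helper: rewrite the string in place for plain ASCII
 * output.  Non-breaking spaces become blanks, soft hyphens become
 * hyphen-minus characters, and zero-width breaks are dropped.
 * Returns the new length of the string.
 */
size_t
plain_cues(char *s)
{
	char	*src, *dst;

	assert(s != NULL);
	for (src = dst = s; *src != 0; src++) {
		switch (*src) {
		case ASCII_NBRSP:
			*dst++ = ' ';
			break;
		case ASCII_HYPH:
			*dst++ = '-';
			break;
		case ASCII_BREAK:
			break;	/* contributes no output */
		default:
			*dst++ = *src;
			break;
		}
	}
	*dst = 0;
	return (size_t)(dst - s);
}
```
.Ed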
.Pp
Escape sequences are also passed verbatim into text strings.
An escape sequence is a sequence of characters beginning with the
backslash
.Pq Sq \e .
To construct human-readable text, these should be intercepted with
.Xr mandoc_escape 3
and converted with one of the functions described in
.Xr mchars_alloc 3 .
.Ss Man Abstract Syntax Tree
This AST is governed by the ontological rules dictated in
.Xr man 7
and derives its terminology accordingly.
.Pp
The AST is composed of
.Vt struct roff_node
nodes with element, root and text types as declared by the
.Va type
field.
Each node also provides its parse point (the
.Va line ,
.Va pos ,
and
.Va sec
fields), its position in the tree (the
.Va parent ,
.Va child ,
.Va next
and
.Va prev
fields) and some type-specific data.
.Pp
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Pp
.Bl -tag -width "ELEMENTXX" -compact
.It ROOT
\(<- mnode+
.It mnode
\(<- ELEMENT | TEXT | BLOCK
.It BLOCK
\(<- HEAD BODY
.It HEAD
\(<- mnode*
.It BODY
\(<- mnode*
.It ELEMENT
\(<- ELEMENT | TEXT*
.It TEXT
\(<- [[:ascii:]]*
.El
.Pp
The only elements capable of nesting other elements are those with
next-line scope as documented in
.Xr man 7 .
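.Pp
The
.Va parent ,
.Va child ,
and
.Va next
fields are enough to visit every node of such a tree without recursion.
The following sketch counts text nodes in a subtree.
It uses a stand-in
.Vt struct sketch_node
with hypothetical type constants so that it is self-contained;
real code would operate on
.Vt struct roff_node
from
.In roff.h .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration; real code uses <roff.h>. */
enum sketch_type { SK_ROOT, SK_ELEM, SK_TEXT };

struct sketch_node {
	struct sketch_node	*parent;
	struct sketch_node	*child;	/* first child */
	struct sketch_node	*next;	/* next sibling */
	enum sketch_type	 type;
};

/*
 * Count the text nodes in the subtree rooted at n: descend through
 * child, advance through next, and climb back through parent when
 * a sibling list is exhausted.  Siblings of n are not visited.
 */
size_t
count_text(const struct sketch_node *n)
{
	const struct sketch_node	*cur;
	size_t				 count;

	assert(n != NULL);
	count = 0;
	cur = n;
	while (cur != NULL) {
		if (cur->type == SK_TEXT)
			count++;
		if (cur->child != NULL) {
			cur = cur->child;
			continue;
		}
		/* Leaf: move to a sibling, climbing as needed. */
		while (cur != n && cur->next == NULL)
			cur = cur->parent;
		cur = (cur == n) ? NULL : cur->next;
	}
	return count;
}
```
.Ed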
.Ss Mdoc Abstract Syntax Tree
This AST is governed by the ontological
rules dictated in
.Xr mdoc 7
and derives its terminology accordingly.
.Qq In-line
elements described in
.Xr mdoc 7
are described simply as
.Qq elements .
.Pp
The AST is composed of
.Vt struct roff_node
nodes with block, head, body, element, root and text types as declared
by the
.Va type
field.
Each node also provides its parse point (the
.Va line ,
.Va pos ,
and
.Va sec
fields), its position in the tree (the
.Va parent ,
.Va child ,
.Va last ,
.Va next
and
.Va prev
fields) and some type-specific data, in particular, for nodes generated
from macros, the generating macro in the
.Va tok
field.
.Pp
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Pp
.Bl -tag -width "ELEMENTXX" -compact
.It ROOT
\(<- mnode+
.It mnode
\(<- BLOCK | ELEMENT | TEXT
.It BLOCK
\(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
.It ELEMENT
\(<- TEXT*
.It HEAD
\(<- mnode*
.It BODY
\(<- mnode* [ENDBODY mnode*]
.It TAIL
\(<- mnode*
.It TEXT
\(<- [[:ascii:]]*
.El
.Pp
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
the BLOCK production: these refer to punctuation marks.
Furthermore, although a TEXT node will generally have a non-zero-length
string, in the specific case of
.Sq \&.Bd \-literal ,
an empty line will produce a zero-length string.
Multiple body parts are only found in invocations of
.Sq \&Bl \-column ,
where a new body introduces a new phrase.
.Pp
The
.Xr mdoc 7
syntax tree accommodates broken block structures as well.
The ENDBODY node is available to end the formatting associated
with a given block before the physical end of that block.
It has a non-null
.Va end
field, is of the BODY
.Va type ,
has the same
.Va tok
as the BLOCK it is ending, and has a
.Va pending
field pointing to that BLOCK's BODY node.
It is an indirect child of that BODY node
and has no children of its own.
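.Pp
The properties listed above suggest a simple test for ENDBODY nodes.
The following sketch uses a stand-in
.Vt struct eb_node
that carries only the fields named in this section, with hypothetical
integer encodings for
.Va tok
and
.Va end ;
it is not the layout of
.Vt struct roff_node ,
and the helper
.Fn ended_body
is not part of the library.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration; real code uses <roff.h>. */
enum eb_type { EB_BLOCK, EB_HEAD, EB_BODY, EB_TEXT };

struct eb_node {
	struct eb_node	*pending;	/* BODY being ended, if any */
	enum eb_type	 type;
	int		 tok;		/* generating macro */
	int		 end;		/* non-zero for ENDBODY */
};

/*
 * Return the BODY node whose formatting ends at n, or NULL if n
 * does not look like an ENDBODY marker as characterised above.
 */
const struct eb_node *
ended_body(const struct eb_node *n)
{
	assert(n != NULL);
	if (n->type != EB_BODY || n->end == 0)
		return NULL;
	/* An ENDBODY shares its tok with the block it is ending. */
	if (n->pending == NULL || n->pending->tok != n->tok)
		return NULL;
	return n->pending;
}
```
.Ed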
.Pp
An ENDBODY node is generated when a block ends while one of its child
blocks is still open, as in the following example:
.Bd -literal -offset indent
\&.Ao ao
\&.Bo bo ac
\&.Ac bc
\&.Bc end
.Ed
.Pp
This example results in the following block structure:
.Bd -literal -offset indent
BLOCK Ao
    HEAD Ao
    BODY Ao
        TEXT ao
        BLOCK Bo, pending -> Ao
            HEAD Bo
            BODY Bo
                TEXT bo
                TEXT ac
                ENDBODY Ao, pending -> Ao
                TEXT bc
TEXT end
.Ed
.Pp
Here, the formatting of the
.Ic \&Ao
block extends from TEXT ao to TEXT ac,
while the formatting of the
.Ic \&Bo
block extends from TEXT bo to TEXT bc.
It renders as follows in
.Fl T Ns Cm ascii
mode:
.Pp
.Dl <ao [bo ac> bc] end
.Pp
Support for badly-nested blocks is only provided for backward
compatibility with some older
.Xr mdoc 7
implementations.
Using badly-nested blocks is
.Em strongly discouraged ;
for example, the
.Fl T Ns Cm html
front-end to
.Xr mandoc 1
is unable to render them in any meaningful way.
Furthermore, behaviour when encountering badly-nested blocks is not
consistent across troff implementations, especially when using multiple
levels of badly-nested blocks.
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr man.cgi 3 ,
.Xr mandoc_escape 3 ,
.Xr mandoc_headers 3 ,
.Xr mandoc_malloc 3 ,
.Xr mansearch 3 ,
.Xr mchars_alloc 3 ,
.Xr tbl 3 ,
.Xr eqn 7 ,
.Xr man 7 ,
.Xr mandoc_char 7 ,
.Xr mdoc 7 ,
.Xr roff 7 ,
.Xr tbl 7
.Sh AUTHORS
.An -nosplit
The
.Nm
library was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
and is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org .