.\" $Id: mandoc.3,v 1.44 2018/12/30 00:49:55 schwarze Exp $
.\"
.\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
.\" Copyright (c) 2010-2017 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: December 30 2018 $
.Dt MANDOC 3
.Os
.Sh NAME
.Nm mandoc ,
.Nm deroff ,
.Nm mparse_alloc ,
.Nm mparse_copy ,
.Nm mparse_free ,
.Nm mparse_open ,
.Nm mparse_readfd ,
.Nm mparse_reset ,
.Nm mparse_result
.Nd mandoc macro compiler library
.Sh SYNOPSIS
.In sys/types.h
.In stdio.h
.In mandoc.h
.Pp
.Fd "#define ASCII_NBRSP"
.Fd "#define ASCII_HYPH"
.Fd "#define ASCII_BREAK"
.Ft struct mparse *
.Fo mparse_alloc
.Fa "int options"
.Fa "enum mandoc_os os_e"
.Fa "char *os_s"
.Fc
.Ft void
.Fo mparse_free
.Fa "struct mparse *parse"
.Fc
.Ft void
.Fo mparse_copy
.Fa "const struct mparse *parse"
.Fc
.Ft int
.Fo mparse_open
.Fa "struct mparse *parse"
.Fa "const char *fname"
.Fc
.Ft void
.Fo mparse_readfd
.Fa "struct mparse *parse"
.Fa "int fd"
.Fa "const char *fname"
.Fc
.Ft void
.Fo mparse_reset
.Fa "struct mparse *parse"
.Fc
.Ft struct roff_meta *
.Fo mparse_result
.Fa "struct mparse *parse"
.Fc
.In roff.h
.Ft void
.Fo deroff
.Fa "char **dest"
.Fa "const struct roff_node *node"
.Fc
.In sys/types.h
.In mandoc.h
.In mdoc.h
.Vt extern const char * const * mdoc_argnames;
.Vt extern const char * const * mdoc_macronames;
.In sys/types.h
.In mandoc.h
.In man.h
.Vt extern const char * const * man_macronames;
.Sh DESCRIPTION
The
.Nm mandoc
library parses a
.Ux
manual into an abstract syntax tree (AST).
.Ux
manuals are composed of
.Xr mdoc 7
or
.Xr man 7 ,
and may be mixed with
.Xr roff 7 ,
.Xr tbl 7 ,
and
.Xr eqn 7
invocations.
.Pp
The following describes a general parse sequence:
.Bl -enum
.It
initiate a parsing sequence with
.Xr mchars_alloc 3
and
.Fn mparse_alloc ;
.It
open a file with
.Xr open 2
or
.Fn mparse_open ;
.It
parse it with
.Fn mparse_readfd ;
.It
close it with
.Xr close 2 ;
.It
retrieve the syntax tree with
.Fn mparse_result ;
.It
if information about the validity of the input is needed, fetch it with
.Fn mparse_updaterc ;
.It
iterate over parse nodes, starting from the
.Fa first
member of the returned
.Vt struct roff_meta ;
.It
free all allocated memory with
.Fn mparse_free
and
.Xr mchars_free 3 ,
or invoke
.Fn mparse_reset
and go back to step 2 to parse new files.
.El
.Sh REFERENCE
This section documents the functions, types, and variables available
via
.In mandoc.h ,
with the exception of those documented in
.Xr mandoc_escape 3
and
.Xr mchars_alloc 3 .
.Ss Types
.Bl -ohang
.It Vt "enum mandocerr"
An error or warning message during parsing.
.It Vt "enum mandoclevel"
A classification of an
.Vt "enum mandocerr"
as regards system operation.
See the DIAGNOSTICS section in
.Xr mandoc 1
regarding the meanings of the levels.
.It Vt "struct mparse"
An opaque pointer to a running parse sequence.
Created with
.Fn mparse_alloc
and freed with
.Fn mparse_free .
This may be used across parsed input if
.Fn mparse_reset
is called between parses.
.El
.Ss Functions
.Bl -ohang
.It Fn deroff
Obtain a text-only representation of a
.Vt struct roff_node ,
including text contained in its child nodes.
To be used on children of the
.Fa first
member of
.Vt struct roff_meta .
When it is no longer needed, the pointer that
.Fn deroff
stores in
.Pf * Fa dest
can be passed to
.Xr free 3 .
.It Fn mparse_alloc
Allocate a parser.
The arguments have the following effect:
.Bl -tag -offset 5n -width inttype
.It Ar options
When the
.Dv MPARSE_MDOC
or
.Dv MPARSE_MAN
bit is set, only that parser is used.
Otherwise, the document type is automatically detected.
.Pp
When the
.Dv MPARSE_SO
bit is set,
.Xr roff 7
.Ic \&so
file inclusion requests are always honoured.
Otherwise, if the request is the only content in an input file,
only the file name is remembered, to be returned in the
.Fa sodest
field of
.Vt struct roff_meta .
.Pp
When the
.Dv MPARSE_QUICK
bit is set, parsing is aborted after the NAME section.
This is useful, for example, in
.Xr makewhatis 8
.Fl Q
to quickly build minimal databases.
.Pp
When the
.Dv MPARSE_VALIDATE
bit is set,
.Fn mparse_result
runs the validation functions before returning the syntax tree.
This is almost always required, except in certain debugging scenarios,
for example to dump unvalidated syntax trees.
.It Ar os_e
Operating system to check base system conventions for.
If
.Dv MANDOC_OS_OTHER ,
the system is automatically detected from
.Ic \&Os ,
.Fl Ios ,
or
.Xr uname 3 .
.It Ar os_s
A default string for the
.Xr mdoc 7
.Ic \&Os
macro, overriding the
.Dv OSNAME
preprocessor definition and the results of
.Xr uname 3 .
Passing
.Dv NULL
sets no default.
.El
.Pp
The same parser may be used for multiple files so long as
.Fn mparse_reset
is called between parses.
.Fn mparse_free
must be called to free the memory allocated by this function.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_free
Free all memory allocated by
.Fn mparse_alloc .
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_copy
Dump a copy of the input to the standard output; used for
.Fl man T Ns Cm man .
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_open
Open the file for reading.
If that fails and
.Fa fname
does not already end in
.Ql .gz ,
try again after appending
.Ql .gz .
Remember whether the file is compressed.
Return a file descriptor open for reading or -1 on failure.
It can be passed to
.Fn mparse_readfd
or used directly.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_readfd
Parse a file descriptor opened with
.Xr open 2
or
.Fn mparse_open .
Pass the associated filename in
.Fa fname .
This function may be called multiple times with different parameters; however,
.Xr close 2
and
.Fn mparse_reset
should be invoked between parses.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_reset
Reset a parser so that
.Fn mparse_readfd
may be used again.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_result
Obtain the result of a parse.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.El
.Ss Variables
.Bl -ohang
.It Va man_macronames
The string representation of a
.Xr man 7
macro as indexed by
.Vt "enum mant" .
.It Va mdoc_argnames
The string representation of an
.Xr mdoc 7
macro argument as indexed by
.Vt "enum mdocargt" .
.It Va mdoc_macronames
The string representation of an
.Xr mdoc 7
macro as indexed by
.Vt "enum mdoct" .
.El
.Sh IMPLEMENTATION NOTES
This section consists of structural documentation for
.Xr mdoc 7
and
.Xr man 7
syntax trees and strings.
.Ss Man and Mdoc Strings
Strings may be extracted from mdoc and man meta-data, or from text
nodes (MDOC_TEXT and MAN_TEXT, respectively).
These strings have special non-printing formatting cues embedded in the
text itself, as well as
.Xr roff 7
escapes preserved from input.
Implementing systems will need to handle both situations to produce
human-readable text.
In general, strings may be assumed to consist of 7-bit ASCII characters.
.Pp
The following non-printing characters may be embedded in text strings:
.Bl -tag -width Ds
.It Dv ASCII_NBRSP
A non-breaking space character.
.It Dv ASCII_HYPH
A soft hyphen.
.It Dv ASCII_BREAK
A breakable zero-width space.
.El
.Pp
Escape sequences are also passed verbatim into text strings.
An escape sequence is a sequence of characters beginning with the
backslash
.Pq Sq \e .
To construct human-readable text, these should be intercepted with
.Xr mandoc_escape 3
and converted with one of the functions described in
.Xr mchars_alloc 3 .
.Ss Man Abstract Syntax Tree
This AST is governed by the ontological rules dictated in
.Xr man 7
and derives its terminology accordingly.
.Pp
The AST is composed of
.Vt struct roff_node
nodes with element, root and text types as declared by the
.Va type
field.
Each node also provides its parse point (the
.Va line ,
.Va pos ,
and
.Va sec
fields), its position in the tree (the
.Va parent ,
.Va child ,
.Va next
and
.Va prev
fields) and some type-specific data.
.Pp
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Pp
.Bl -tag -width "ELEMENTXX" -compact
.It ROOT
\(<- mnode+
.It mnode
\(<- ELEMENT | TEXT | BLOCK
.It BLOCK
\(<- HEAD BODY
.It HEAD
\(<- mnode*
.It BODY
\(<- mnode*
.It ELEMENT
\(<- ELEMENT | TEXT*
.It TEXT
\(<- [[:ascii:]]*
.El
.Pp
The only elements capable of nesting other elements are those with
next-line scope as documented in
.Xr man 7 .
.Ss Mdoc Abstract Syntax Tree
This AST is governed by the ontological
rules dictated in
.Xr mdoc 7
and derives its terminology accordingly.
.Qq In-line
elements described in
.Xr mdoc 7
are described simply as
.Qq elements .
.Pp
The AST is composed of
.Vt struct roff_node
nodes with block, head, body, element, root and text types as declared
by the
.Va type
field.
Each node also provides its parse point (the
.Va line ,
.Va pos ,
and
.Va sec
fields), its position in the tree (the
.Va parent ,
.Va child ,
.Va last ,
.Va next
and
.Va prev
fields) and some type-specific data, in particular, for nodes generated
from macros, the generating macro in the
.Va tok
field.
.Pp
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Pp
.Bl -tag -width "ELEMENTXX" -compact
.It ROOT
\(<- mnode+
.It mnode
\(<- BLOCK | ELEMENT | TEXT
.It BLOCK
\(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
.It ELEMENT
\(<- TEXT*
.It HEAD
\(<- mnode*
.It BODY
\(<- mnode* [ENDBODY mnode*]
.It TAIL
\(<- mnode*
.It TEXT
\(<- [[:ascii:]]*
.El
.Pp
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
the BLOCK production: these refer to punctuation marks.
Furthermore, although a TEXT node will generally have a non-zero-length
string, in the specific case of
.Sq \&.Bd \-literal ,
an empty line will produce a zero-length string.
Multiple body parts are only found in invocations of
.Sq \&Bl \-column ,
where a new body introduces a new phrase.
.Pp
The
.Xr mdoc 7
syntax tree accommodates broken block structures as well.
The ENDBODY node is available to end the formatting associated
with a given block before the physical end of that block.
It has a non-null
.Va end
field, is of the BODY
.Va type ,
has the same
.Va tok
as the BLOCK it is ending, and has a
.Va pending
field pointing to that BLOCK's BODY node.
It is an indirect child of that BODY node
and has no children of its own.
.Pp
An ENDBODY node is generated when a block ends while one of its child
blocks is still open, as in the following example:
.Bd -literal -offset indent
\&.Ao ao
\&.Bo bo ac
\&.Ac bc
\&.Bc end
.Ed
.Pp
This example results in the following block structure:
.Bd -literal -offset indent
BLOCK Ao
    HEAD Ao
    BODY Ao
        TEXT ao
        BLOCK Bo, pending -> Ao
            HEAD Bo
            BODY Bo
                TEXT bo
                TEXT ac
                ENDBODY Ao, pending -> Ao
                TEXT bc
TEXT end
.Ed
.Pp
Here, the formatting of the
.Ic \&Ao
block extends from TEXT ao to TEXT ac,
while the formatting of the
.Ic \&Bo
block extends from TEXT bo to TEXT bc.
It renders as follows in
.Fl T Ns Cm ascii
mode:
.Pp
.Dl <ao [bo ac> bc] end
.Pp
Support for badly-nested blocks is only provided for backward
compatibility with some older
.Xr mdoc 7
implementations.
Using badly-nested blocks is
.Em strongly discouraged ;
for example, the
.Fl T Ns Cm html
front-end to
.Xr mandoc 1
is unable to render them in any meaningful way.
Furthermore, behaviour when encountering badly-nested blocks is not
consistent across troff implementations, especially when using multiple
levels of badly-nested blocks.
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr man.cgi 3 ,
.Xr mandoc_escape 3 ,
.Xr mandoc_headers 3 ,
.Xr mandoc_malloc 3 ,
.Xr mansearch 3 ,
.Xr mchars_alloc 3 ,
.Xr tbl 3 ,
.Xr eqn 7 ,
.Xr man 7 ,
.Xr mandoc_char 7 ,
.Xr mdoc 7 ,
.Xr roff 7 ,
.Xr tbl 7
.Sh AUTHORS
.An -nosplit
The
.Nm
library was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
and is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org .
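.Sh EXAMPLES
The non-printing cues listed under
.Sx Man and Mdoc Strings
have to be translated before text is shown to a reader.
The following is a minimal sketch of one way to do that for plain-text
output; it is not taken from any
.Xr mandoc 1
formatter.
The helper name
.Fn plain_text
is hypothetical, and the numeric values given for the three constants
are assumptions made so that the fragment builds without
.In mandoc.h ;
real code should rely on the definitions from that header.
.Bd -literal -offset indent
```c
#include <stddef.h>

/*
 * Assumed values, repeated here only so that the sketch builds
 * without mandoc.h; real code should use the header's definitions.
 */
#define ASCII_NBRSP	31	/* non-breaking space */
#define ASCII_HYPH	30	/* soft hyphen */
#define ASCII_BREAK	29	/* breakable zero-width space */

/*
 * Hypothetical helper: rewrite the formatting cues in place
 * for plain-text output and return the new string length.
 */
static size_t
plain_text(char *s)
{
	char	*in, *out;

	for (in = out = s; *in != 0; in++) {
		switch (*in) {
		case ASCII_NBRSP:
			*out++ = ' ';	/* ordinary space */
			break;
		case ASCII_HYPH:
			*out++ = '-';	/* visible hyphen */
			break;
		case ASCII_BREAK:
			break;		/* zero width: drop it */
		default:
			*out++ = *in;	/* copy everything else */
			break;
		}
	}
	*out = 0;
	return (size_t)(out - s);
}
```
.Ed
.Pp
Printing
.Dv ASCII_HYPH
as a plain hyphen and discarding
.Dv ASCII_BREAK
are choices of this sketch; a formatter that fills lines would
treat both as permitted break points instead.
Any
.Xr roff 7
escape sequences remain in the string and still need to go through
.Xr mandoc_escape 3 .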