.\" $Id: mandoc_html.3,v 1.23 2020/04/24 13:13:06 schwarze Exp $
.\"
.\" Copyright (c) 2014, 2017, 2018 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: April 24 2020 $
.Dt MANDOC_HTML 3
.Os
.Sh NAME
.Nm mandoc_html
.Nd internals of the mandoc HTML formatter
.Sh SYNOPSIS
.In sys/types.h
.Fd #include """mandoc.h"""
.Fd #include """roff.h"""
.Fd #include """out.h"""
.Fd #include """html.h"""
.Ft void
.Fn print_gen_decls "struct html *h"
.Ft void
.Fn print_gen_comment "struct html *h" "struct roff_node *n"
.Ft void
.Fn print_gen_head "struct html *h"
.Ft struct tag *
.Fo print_otag
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *fmt"
.Fa ...
.Fc
.Ft void
.Fo print_tagq
.Fa "struct html *h"
.Fa "const struct tag *until"
.Fc
.Ft void
.Fo print_stagq
.Fa "struct html *h"
.Fa "const struct tag *suntil"
.Fc
.Ft void
.Fn html_close_paragraph "struct html *h"
.Ft enum roff_tok
.Fo html_fillmode
.Fa "struct html *h"
.Fa "enum roff_tok want"
.Fc
.Ft int
.Fo html_setfont
.Fa "struct html *h"
.Fa "enum mandoc_esc font"
.Fc
.Ft void
.Fo print_text
.Fa "struct html *h"
.Fa "const char *word"
.Fc
.Ft void
.Fo print_tagged_text
.Fa "struct html *h"
.Fa "const char *word"
.Fa "struct roff_node *n"
.Fc
.Ft char *
.Fo html_make_id
.Fa "const struct roff_node *n"
.Fa "int unique"
.Fc
.Ft struct tag *
.Fo print_otag_id
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *cattr"
.Fa "struct roff_node *n"
.Fc
.Ft void
.Fn print_endline "struct html *h"
.Sh DESCRIPTION
The mandoc HTML formatter is not a formal library.
However, as it is compiled into more than one program, in particular
.Xr mandoc 1
and
.Xr man.cgi 8 ,
and because it may be security-critical in some contexts,
some documentation is useful to help to use it correctly and
to prevent XSS vulnerabilities.
.Pp
The formatter produces HTML output on the standard output.
Since proper escaping is usually required and best taken care of
at one central place, the language-specific formatters
.Po
.Pa *_html.c ,
see
.Sx FILES
.Pc
are not supposed to print directly to
.Dv stdout
using functions like
.Xr printf 3 ,
.Xr putc 3 ,
.Xr puts 3 ,
or
.Xr write 2 .
Instead, they are expected to use the output functions declared in
.Pa html.h
and implemented as part of the main HTML formatting engine in
.Pa html.c .
.Ss Data structures
These structures are declared in
.Pa html.h .
.Bl -tag -width Ds
.It Vt struct html
Internal state of the HTML formatter.
.It Vt struct tag
One entry for the LIFO stack of HTML elements.
Members include
.Fa "enum htmltag tag"
and
.Fa "struct tag *next" .
.El
.Ss Private interface functions
The function
.Fn print_gen_decls
prints the opening
.Aq Pf \&! Ic DOCTYPE
declaration.
.Pp
The function
.Fn print_gen_comment
prints the leading comments, usually containing a Copyright notice
and license, as an HTML comment.
It is intended to be called right after opening the
.Aq Ic HTML
element.
Pass the first
.Dv ROFFT_COMMENT
node in
.Fa n .
.Pp
The function
.Fn print_gen_head
prints the opening
.Aq Ic META
and
.Aq Ic LINK
elements for the document
.Aq Ic HEAD ,
using the
.Fa style
member of
.Fa h
unless that is
.Dv NULL .
It uses
.Fn print_otag
which takes care of properly encoding attributes,
which is relevant for the
.Fa style
link in particular.
.Pp
The function
.Fn print_otag
prints the start tag of an HTML element with the name
.Fa tag ,
optionally including the attributes specified by
.Fa fmt .
If
.Fa fmt
is the empty string, no attributes are written.
Each letter of
.Fa fmt
specifies one attribute to write.
Most attributes require one
.Vt char *
argument which becomes the value of the attribute.
The arguments have to be given in the same order as the attribute letters.
If an argument is
.Dv NULL ,
the respective attribute is not written.
.Bl -tag -width 1n -offset indent
.It Cm c
Print a
.Cm class
attribute.
.It Cm h
Print a
.Cm href
attribute.
This attribute letter can optionally be followed by a modifier letter.
If followed by
.Cm R ,
it formats the link as a local one by prefixing a
.Sq #
character.
If followed by
.Cm I ,
it interprets the argument as a header file name
and generates a link using the
.Xr mandoc 1
.Fl O Cm includes
option.
If followed by
.Cm M ,
it takes two arguments instead of one, a manual page name and
section, and formats them as a link to a manual page using the
.Xr mandoc 1
.Fl O Cm man
option.
.It Cm i
Print an
.Cm id
attribute.
.It Cm \&?
Print an arbitrary attribute.
This format letter requires two
.Vt char *
arguments, the attribute name and the value.
The name must not be
.Dv NULL .
.It Cm s
Print a
.Cm style
attribute.
If present, it must be the last format letter.
It requires two
.Vt char *
arguments.
The first is the name of the style property, the second its value.
The name must not be
.Dv NULL .
The
.Cm s
.Ar fmt
letter can be repeated, each repetition requiring an additional pair of
.Vt char *
arguments.
.El
.Pp
.Fn print_otag
uses the private function
.Fn print_encode
to take care of HTML encoding.
If required by the element type, it remembers in
.Fa h
that the element is open.
The function
.Fn print_tagq
is used to close out all open elements up to and including
.Fa until ;
.Fn print_stagq
is a variant to close out all open elements up to but excluding
.Fa suntil .
The function
.Fn html_close_paragraph
closes all open elements that establish phrasing context,
thus returning to the innermost flow context.
.Pp
The function
.Fn html_fillmode
switches to fill mode if
.Fa want
is
.Dv ROFF_fi
or to no-fill mode if
.Fa want
is
.Dv ROFF_nf .
Switching from fill mode to no-fill mode closes the current paragraph
and opens a
.Aq Ic PRE
element.
Switching in the opposite direction closes the
.Aq Ic PRE
element, but does not open a new paragraph.
If
.Fa want
matches the mode that is already active, no elements are closed or opened.
If
.Fa want
is
.Dv TOKEN_NONE ,
the mode remains as it is.
.Pp
The function
.Fn html_setfont
selects the
.Fa font ,
which can be
.Dv ESCAPE_FONTROMAN ,
.Dv ESCAPE_FONTBOLD ,
.Dv ESCAPE_FONTITALIC ,
.Dv ESCAPE_FONTBI ,
or
.Dv ESCAPE_FONTCW ,
for future text output and internally remembers
the font that was active before the change.
If the
.Fa font
argument is
.Dv ESCAPE_FONTPREV ,
the current and the previous font are exchanged.
This function only changes the internal state of the
.Fa h
object; no HTML elements are written yet.
Subsequent text output will write font elements when needed.
.Pp
The function
.Fn print_text
prints HTML element content.
It uses the private function
.Fn print_encode
to take care of HTML encoding.
If the document has requested a non-standard font, for example using a
.Xr roff 7
.Ic \ef
font escape sequence,
.Fn print_text
wraps
.Fa word
in an HTML font selection element using the
.Fn print_otag
and
.Fn print_tagq
functions.
.Pp
The function
.Fn print_tagged_text
is a variant of
.Fn print_text
that wraps
.Fa word
in an
.Aq Ic A
element of class
.Qq permalink
if
.Fa n
is not
.Dv NULL
and yields a segment identifier when passed to
.Fn html_make_id .
.Pp
The function
.Fn html_make_id
allocates a string to be used for the
.Cm id
attribute of an HTML element and/or as a segment identifier for a URI in an
.Aq Ic A
element.
If
.Fa n
contains a
.Fa tag
attribute, it is used; otherwise, child nodes are used.
If
.Fa n
is an
.Ic \&Sh ,
.Ic \&Ss ,
.Ic \&Sx ,
.Ic SH ,
or
.Ic SS
node, the resulting string is the concatenation of the child strings;
for other node types, only the first child is used.
Bytes not permitted in URI-fragment strings are replaced by underscores.
If any of the children to be used is not a text node,
no string is generated and
.Dv NULL
is returned instead.
If the
.Fa unique
argument is non-zero, deduplication is performed by appending an
underscore and a decimal integer, if necessary.
If the
.Fa unique
argument is 1, this is assumed to be the first call for this tag
at this location, typically for use by
.Dv NODE_ID ,
so the integer is incremented before use.
If the
.Fa unique
argument is 2, this is assumed to be the second call for this tag
at this location, typically for use by
.Dv NODE_HREF ,
so the existing integer, if any, is used without incrementing it.
.Pp
The function
.Fn print_otag_id
opens a
.Fa tag
element of class
.Fa cattr
for the node
.Fa n .
If the flag
.Dv NODE_ID
is set in
.Fa n ,
it attempts to generate an
.Cm id
attribute with
.Fn html_make_id .
If the flag
.Dv NODE_HREF
is set in
.Fa n ,
an
.Aq Ic A
element of class
.Qq permalink
is added:
outside if
.Fa n
generates an element that can only occur in phrasing context,
or inside otherwise.
This function is a wrapper around
.Fn html_make_id
and
.Fn print_otag ,
automatically choosing the
.Fa unique
argument appropriately and setting the
.Fa fmt
arguments to
.Qq chR
and
.Qq ci ,
respectively.
.Pp
The function
.Fn print_endline
makes sure subsequent output starts on a new HTML output line.
If nothing was printed on the current output line yet, it has no effect.
Otherwise, it appends any buffered text to the current output line,
ends the line, and updates the internal state of the
.Fa h
object.
.Pp
The functions
.Fn print_eqn ,
.Fn print_tbl ,
and
.Fn print_tblclose
are not yet documented.
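.Pp
As a rough illustration of the difference between
.Fn print_tagq
and
.Fn print_stagq ,
the following self-contained sketch models the LIFO stack of open elements.
All identifiers starting with
.Dq demo_
are hypothetical stand-ins and are not part of
.Pa html.h ;
unlike the real
.Fn print_otag ,
the model writes no output and pushes every element, whatever its type.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real types are declared in html.h. */
enum demo_tag { DEMO_DIV, DEMO_P, DEMO_B };

struct demo_tag_entry {
	enum demo_tag		 tag;
	struct demo_tag_entry	*next;	/* element opened before this one */
};

struct demo_html {
	struct demo_tag_entry	*top;	/* innermost open element */
};

/* Model of print_otag(): push an open element and return its entry. */
static struct demo_tag_entry *
demo_open(struct demo_html *h, enum demo_tag tag)
{
	struct demo_tag_entry	*t;

	t = malloc(sizeof(*t));
	if (t == NULL)
		abort();
	t->tag = tag;
	t->next = h->top;
	h->top = t;
	return t;
}

/* Pop and free the innermost open element. */
static void
demo_pop(struct demo_html *h)
{
	struct demo_tag_entry	*t;

	t = h->top;
	assert(t != NULL);
	h->top = t->next;
	free(t);
}

/* Model of print_tagq(): close everything up to and including until. */
static void
demo_close_incl(struct demo_html *h, const struct demo_tag_entry *until)
{
	int	 last;

	while (h->top != NULL) {
		last = h->top == until;
		demo_pop(h);
		if (last)
			break;
	}
}

/* Model of print_stagq(): close everything up to but excluding suntil. */
static void
demo_close_excl(struct demo_html *h, const struct demo_tag_entry *suntil)
{
	while (h->top != NULL && h->top != suntil)
		demo_pop(h);
}

/* Count the elements that are still open. */
static int
demo_depth(const struct demo_html *h)
{
	const struct demo_tag_entry	*t;
	int				 n;

	n = 0;
	for (t = h->top; t != NULL; t = t->next)
		n++;
	return n;
}
```
.Ed
.Pp
In this model, after opening three nested elements with
.Fn demo_open ,
calling
.Fn demo_close_excl
on the middle entry leaves two elements open, and a subsequent
.Fn demo_close_incl
on the outermost entry empties the stack.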
.Sh RETURN VALUES
The functions
.Fn print_otag
and
.Fn print_otag_id
return a pointer to a new element on the stack of HTML elements.
When
.Fn print_otag_id
opens two elements, a pointer to the outer one is returned.
The memory pointed to is owned by the library and is automatically
.Xr free 3 Ns d
when
.Fn print_tagq
is called on it or when
.Fn print_stagq
is called on a parent element.
.Pp
The function
.Fn html_fillmode
returns
.Dv ROFF_fi
if fill mode was active before the call or
.Dv ROFF_nf
otherwise.
.Pp
The function
.Fn html_make_id
returns a newly allocated string or
.Dv NULL
if
.Fa n
lacks text data to create the attribute from.
The caller is responsible for
.Xr free 3 Ns ing
the returned string after using it.
.Pp
In case of
.Xr malloc 3
failure, these functions do not return but call
.Xr err 3 .
.Sh FILES
.Bl -tag -width mandoc_aux.c -compact
.It Pa main.h
declarations of public functions for use by the main program,
not yet documented
.It Pa html.h
declarations of data types and private functions
for use by language-specific HTML formatters
.It Pa html.c
main HTML formatting engine and utility functions
.It Pa mdoc_html.c
.Xr mdoc 7
HTML formatter
.It Pa man_html.c
.Xr man 7
HTML formatter
.It Pa tbl_html.c
.Xr tbl 7
HTML formatter
.It Pa eqn_html.c
.Xr eqn 7
HTML formatter
.It Pa roff_html.c
.Xr roff 7
HTML formatter, handling requests like
.Ic br ,
.Ic ce ,
.Ic fi ,
.Ic ft ,
.Ic nf ,
.Ic rj ,
and
.Ic sp .
.It Pa out.h
declarations of data types and private functions
for shared use by all mandoc formatters,
not yet documented
.It Pa out.c
private functions for shared use by all mandoc formatters
.It Pa mandoc_aux.h
declarations of common mandoc utility functions, see
.Xr mandoc 3
.It Pa mandoc_aux.c
implementation of common mandoc utility functions
.El
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr mandoc 3 ,
.Xr man.cgi 8
.Sh AUTHORS
.An -nosplit
The mandoc HTML formatter was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv .
It is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org ,
who also wrote this manual.