.\" $Id: mandoc_html.3,v 1.24 2022/06/24 11:15:53 schwarze Exp $
.\"
.\" Copyright (c) 2014, 2017, 2018 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: June 24 2022 $
.Dt MANDOC_HTML 3
.Os
.Sh NAME
.Nm mandoc_html
.Nd internals of the mandoc HTML formatter
.Sh SYNOPSIS
.In sys/types.h
.Fd #include """mandoc.h"""
.Fd #include """roff.h"""
.Fd #include """out.h"""
.Fd #include """html.h"""
.Ft void
.Fn print_gen_decls "struct html *h"
.Ft void
.Fn print_gen_comment "struct html *h" "struct roff_node *n"
.Ft void
.Fn print_gen_head "struct html *h"
.Ft struct tag *
.Fo print_otag
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *fmt"
.Fa ...
.Fc
.Ft void
.Fo print_tagq
.Fa "struct html *h"
.Fa "const struct tag *until"
.Fc
.Ft void
.Fo print_stagq
.Fa "struct html *h"
.Fa "const struct tag *suntil"
.Fc
.Ft void
.Fn html_close_paragraph "struct html *h"
.Ft enum roff_tok
.Fo html_fillmode
.Fa "struct html *h"
.Fa "enum roff_tok want"
.Fc
.Ft int
.Fo html_setfont
.Fa "struct html *h"
.Fa "enum mandoc_esc font"
.Fc
.Ft void
.Fo print_text
.Fa "struct html *h"
.Fa "const char *word"
.Fc
.Ft void
.Fo print_tagged_text
.Fa "struct html *h"
.Fa "const char *word"
.Fa "struct roff_node *n"
.Fc
.Ft char *
.Fo html_make_id
.Fa "const struct roff_node *n"
.Fa "int unique"
.Fc
.Ft struct tag *
.Fo print_otag_id
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *cattr"
.Fa "struct roff_node *n"
.Fc
.Ft void
.Fn print_endline "struct html *h"
.Sh DESCRIPTION
The mandoc HTML formatter is not a formal library.
However, as it is compiled into more than one program, in particular
.Xr mandoc 1
and
.Xr man.cgi 8 ,
and because it may be security-critical in some contexts,
some documentation is useful to help use it correctly and
to prevent XSS vulnerabilities.
.Pp
The formatter produces HTML output on the standard output.
Since proper escaping is usually required and best taken care of
at one central place, the language-specific formatters
.Po
.Pa *_html.c ,
see
.Sx FILES
.Pc
are not supposed to print directly to
.Dv stdout
using functions like
.Xr printf 3 ,
.Xr putc 3 ,
.Xr puts 3 ,
or
.Xr write 2 .
Instead, they are expected to use the output functions declared in
.Pa html.h
and implemented as part of the main HTML formatting engine in
.Pa html.c .
.Ss Data structures
These structures are declared in
.Pa html.h .
.Bl -tag -width Ds
.It Vt struct html
Internal state of the HTML formatter.
.It Vt struct tag
One entry for the LIFO stack of HTML elements.
Members include
.Fa "enum htmltag tag"
and
.Fa "struct tag *next" .
.El
.Ss Private interface functions
The function
.Fn print_gen_decls
prints the opening
.Aq Pf \&! Ic DOCTYPE
declaration.
.Pp
The function
.Fn print_gen_comment
prints the leading comments, usually containing a Copyright notice
and license, as an HTML comment.
It is intended to be called right after opening the
.Aq Ic HTML
element.
Pass the first
.Dv ROFFT_COMMENT
node in
.Fa n .
.Pp
The function
.Fn print_gen_head
prints the opening
.Aq Ic META
and
.Aq Ic LINK
elements for the document
.Aq Ic HEAD ,
using the
.Fa style
member of
.Fa h
unless that is
.Dv NULL .
It uses
.Fn print_otag ,
which takes care of properly encoding attributes;
this is relevant for the
.Fa style
link in particular.
.Pp
The function
.Fn print_otag
prints the start tag of an HTML element with the name
.Fa tag ,
optionally including the attributes specified by
.Fa fmt .
If
.Fa fmt
is the empty string, no attributes are written.
Each letter of
.Fa fmt
specifies one attribute to write.
Most attributes require one
.Va char *
argument which becomes the value of the attribute.
The arguments have to be given in the same order as the attribute letters.
If an argument is
.Dv NULL ,
the respective attribute is not written.
.Bl -tag -width 1n -offset indent
.It Cm c
Print a
.Cm class
attribute.
.It Cm h
Print a
.Cm href
attribute.
This attribute letter can optionally be followed by a modifier letter.
If followed by
.Cm R ,
it formats the link as a local one by prefixing a
.Sq #
character.
If followed by
.Cm I ,
it interprets the argument as a header file name
and generates a link using the
.Xr mandoc 1
.Fl O Cm includes
option.
If followed by
.Cm M ,
it takes two arguments instead of one, the manual page name and the
section, and formats them as a link to a manual page using the
.Xr mandoc 1
.Fl O Cm man
option.
.It Cm i
Print an
.Cm id
attribute.
.It Cm r
Print an ARIA
.Cm role
attribute.
.It Cm \&?
Print an arbitrary attribute.
This format letter requires two
.Vt char *
arguments, the attribute name and the value.
The name must not be
.Dv NULL .
.It Cm s
Print a
.Cm style
attribute.
If present, it must be the last format letter.
It requires two
.Va char *
arguments.
The first is the name of the style property, the second its value.
The name must not be
.Dv NULL .
The
.Cm s
.Ar fmt
letter can be repeated, each repetition requiring an additional pair of
.Va char *
arguments.
.El
.Pp
.Fn print_otag
uses the private function
.Fn print_encode
to take care of HTML encoding.
If required by the element type, it remembers in
.Fa h
that the element is open.
The function
.Fn print_tagq
is used to close out all open elements up to and including
.Fa until ;
.Fn print_stagq
is a variant to close out all open elements up to but excluding
.Fa suntil .
The function
.Fn html_close_paragraph
closes all open elements that establish phrasing context,
thus returning to the innermost flow context.
.Pp
The function
.Fn html_fillmode
switches to fill mode if
.Fa want
is
.Dv ROFF_fi
or to no-fill mode if
.Fa want
is
.Dv ROFF_nf .
Switching from fill mode to no-fill mode closes the current paragraph
and opens a
.Aq Ic PRE
element.
Switching in the opposite direction closes the
.Aq Ic PRE
element, but does not open a new paragraph.
If
.Fa want
matches the mode that is already active, no elements are closed or opened.
If
.Fa want
is
.Dv TOKEN_NONE ,
the mode remains as it is.
.Pp
The function
.Fn html_setfont
selects the
.Fa font ,
which can be
.Dv ESCAPE_FONTROMAN ,
.Dv ESCAPE_FONTBOLD ,
.Dv ESCAPE_FONTITALIC ,
.Dv ESCAPE_FONTBI ,
or
.Dv ESCAPE_FONTCW ,
for future text output and internally remembers
the font that was active before the change.
If the
.Fa font
argument is
.Dv ESCAPE_FONTPREV ,
the current and the previous font are exchanged.
This function only changes the internal state of the
.Fa h
object; no HTML elements are written yet.
Subsequent text output will write font elements when needed.
.Pp
The function
.Fn print_text
prints HTML element content.
It uses the private function
.Fn print_encode
to take care of HTML encoding.
If the document has requested a non-standard font, for example using a
.Xr roff 7
.Ic \ef
font escape sequence,
.Fn print_text
wraps
.Fa word
in an HTML font selection element using the
.Fn print_otag
and
.Fn print_tagq
functions.
.Pp
The function
.Fn print_tagged_text
is a variant of
.Fn print_text
that wraps
.Fa word
in an
.Aq Ic A
element of class
.Qq permalink
if
.Fa n
is not
.Dv NULL
and yields a segment identifier when passed to
.Fn html_make_id .
.Pp
The function
.Fn html_make_id
allocates a string to be used for the
.Cm id
attribute of an HTML element and/or as a segment identifier for a URI in an
.Aq Ic A
element.
If
.Fa n
contains a
.Fa tag
attribute, it is used; otherwise, child nodes are used.
If
.Fa n
is an
.Ic \&Sh ,
.Ic \&Ss ,
.Ic \&Sx ,
.Ic SH ,
or
.Ic SS
node, the resulting string is the concatenation of the child strings;
for other node types, only the first child is used.
Bytes not permitted in URI-fragment strings are replaced by underscores.
If any of the children to be used is not a text node,
no string is generated and
.Dv NULL
is returned instead.
If the
.Fa unique
argument is non-zero, deduplication is performed by appending an
underscore and a decimal integer, if necessary.
If the
.Fa unique
argument is 1, this is assumed to be the first call for this tag
at this location, typically for use by
.Dv NODE_ID ,
so the integer is incremented before use.
If the
.Fa unique
argument is 2, this is assumed to be the second call for this tag
at this location, typically for use by
.Dv NODE_HREF ,
so the existing integer, if any, is used without incrementing it.
.Pp
The function
.Fn print_otag_id
opens a
.Fa tag
element of class
.Fa cattr
for the node
.Fa n .
If the flag
.Dv NODE_ID
is set in
.Fa n ,
it attempts to generate an
.Cm id
attribute with
.Fn html_make_id .
If the flag
.Dv NODE_HREF
is set in
.Fa n ,
an
.Aq Ic A
element of class
.Qq permalink
is added:
outside if
.Fa n
generates an element that can only occur in phrasing context,
or inside otherwise.
This function is a wrapper around
.Fn html_make_id
and
.Fn print_otag ,
automatically choosing the
.Fa unique
argument appropriately and setting the
.Fa fmt
argument to
.Qq chR
for the
.Aq Ic A
element and to
.Qq ci
for the
.Fa tag
element.
.Pp
The function
.Fn print_endline
makes sure subsequent output starts on a new HTML output line.
If nothing was printed on the current output line yet, it has no effect.
Otherwise, it appends any buffered text to the current output line,
ends the line, and updates the internal state of the
.Fa h
object.
.Pp
The functions
.Fn print_eqn ,
.Fn print_tbl ,
and
.Fn print_tblclose
are not yet documented.
.Sh RETURN VALUES
The functions
.Fn print_otag
and
.Fn print_otag_id
return a pointer to a new element on the stack of HTML elements.
When
.Fn print_otag_id
opens two elements, a pointer to the outer one is returned.
The memory pointed to is owned by the library and is automatically
.Xr free 3 Ns d
when
.Fn print_tagq
is called on it or when
.Fn print_stagq
is called on a parent element.
.Pp
The function
.Fn html_fillmode
returns
.Dv ROFF_fi
if fill mode was active before the call or
.Dv ROFF_nf
otherwise.
.Pp
The function
.Fn html_make_id
returns a newly allocated string or
.Dv NULL
if
.Fa n
lacks text data to create the attribute from.
The caller is responsible for
.Xr free 3 Ns ing
the returned string after using it.
.Pp
In case of
.Xr malloc 3
failure, these functions do not return but call
.Xr err 3 .
.Sh FILES
.Bl -tag -width mandoc_aux.c -compact
.It Pa main.h
declarations of public functions for use by the main program,
not yet documented
.It Pa html.h
declarations of data types and private functions
for use by language-specific HTML formatters
.It Pa html.c
main HTML formatting engine and utility functions
.It Pa mdoc_html.c
.Xr mdoc 7
HTML formatter
.It Pa man_html.c
.Xr man 7
HTML formatter
.It Pa tbl_html.c
.Xr tbl 7
HTML formatter
.It Pa eqn_html.c
.Xr eqn 7
HTML formatter
.It Pa roff_html.c
.Xr roff 7
HTML formatter, handling requests like
.Ic br ,
.Ic ce ,
.Ic fi ,
.Ic ft ,
.Ic nf ,
.Ic rj ,
and
.Ic sp .
.It Pa out.h
declarations of data types and private functions
for shared use by all mandoc formatters,
not yet documented
.It Pa out.c
private functions for shared use by all mandoc formatters
.It Pa mandoc_aux.h
declarations of common mandoc utility functions, see
.Xr mandoc 3
.It Pa mandoc_aux.c
implementation of common mandoc utility functions
.El
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr mandoc 3 ,
.Xr man.cgi 8
.Sh AUTHORS
.An -nosplit
The mandoc HTML formatter was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv .
It is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org ,
who also wrote this manual.