.\" $Id: mandoc_html.3,v 1.19 2019/01/11 12:56:43 schwarze Exp $
.\"
.\" Copyright (c) 2014, 2017, 2018 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: January 11 2019 $
.Dt MANDOC_HTML 3
.Os
.Sh NAME
.Nm mandoc_html
.Nd internals of the mandoc HTML formatter
.Sh SYNOPSIS
.In "html.h"
.Ft void
.Fn print_gen_decls "struct html *h"
.Ft void
.Fn print_gen_comment "struct html *h" "struct roff_node *n"
.Ft void
.Fn print_gen_head "struct html *h"
.Ft struct tag *
.Fo print_otag
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *fmt"
.Fa ...
.Fc
.Ft void
.Fo print_tagq
.Fa "struct html *h"
.Fa "const struct tag *until"
.Fc
.Ft void
.Fo print_stagq
.Fa "struct html *h"
.Fa "const struct tag *suntil"
.Fc
.Ft void
.Fo print_text
.Fa "struct html *h"
.Fa "const char *word"
.Fc
.Ft char *
.Fo html_make_id
.Fa "const struct roff_node *n"
.Fc
.Ft int
.Fo html_strlen
.Fa "const char *cp"
.Fc
.Sh DESCRIPTION
The mandoc HTML formatter is not a formal library.
However, as it is compiled into more than one program, in particular
.Xr mandoc 1
and
.Xr man.cgi 8 ,
and because it may be security-critical in some contexts,
some documentation is useful to help to use it correctly and
to prevent XSS vulnerabilities.
.Pp
The formatter produces HTML output on the standard output.
Since proper escaping is usually required and best taken care of
at one central place, the language-specific formatters
.Po
.Pa *_html.c ,
see
.Sx FILES
.Pc
are not supposed to print directly to
.Dv stdout
using functions like
.Xr printf 3 ,
.Xr putc 3 ,
.Xr puts 3 ,
or
.Xr write 2 .
Instead, they are expected to use the output functions declared in
.Pa html.h
and implemented as part of the main HTML formatting engine in
.Pa html.c .
.Ss Data structures
These structures are declared in
.Pa html.h .
.Bl -tag -width Ds
.It Vt struct html
Internal state of the HTML formatter.
.It Vt struct tag
One entry for the LIFO stack of HTML elements.
Members are
.Fa "enum htmltag tag"
and
.Fa "struct tag *next" .
.El
.Ss Private interface functions
The function
.Fn print_gen_decls
prints the opening
.Ao Pf \&? Ic xml ? Ac
and
.Aq Pf \&! Ic DOCTYPE
declarations required for the current document type.
.Pp
The function
.Fn print_gen_comment
prints the leading comments, usually containing a Copyright notice
and license, as an HTML comment.
It is intended to be called right after opening the
.Aq Ic HTML
element.
Pass the first
.Dv ROFFT_COMMENT
node in
.Fa n .
.Pp
The function
.Fn print_gen_head
prints the opening
.Aq Ic META
and
.Aq Ic LINK
elements for the document
.Aq Ic HEAD ,
using the
.Fa style
member of
.Fa h
unless that is
.Dv NULL .
It uses
.Fn print_otag
which takes care of properly encoding attributes,
which is relevant for the
.Fa style
link in particular.
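.Pp
As a rough illustration of why central encoding matters,
the following C sketch replaces the four characters that are
unsafe in HTML attribute values and element content.
It is not the code from
.Pa html.c ;
the names
.Fn sketch_entity
and
.Fn sketch_encode
are hypothetical, and the real formatter may well handle
more cases than this sketch does.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: replacement for c, or NULL if none is needed. */
static const char *
sketch_entity(char c)
{
	switch (c) {
	case '<':
		return "&lt;";
	case '>':
		return "&gt;";
	case '&':
		return "&amp;";
	case '"':
		return "&quot;";
	default:
		return NULL;
	}
}

/*
 * Hypothetical stand-in for central HTML encoding.
 * Returns a newly allocated, encoded copy of s, or NULL if
 * allocation fails.  The caller frees the result.
 */
char *
sketch_encode(const char *s)
{
	const char *cp, *ent;
	char	*buf, *out;
	size_t	 sz;

	assert(s != NULL);

	/* First pass: compute the encoded length plus the terminator. */
	sz = 1;
	for (cp = s; *cp != 0; cp++)
		sz += (ent = sketch_entity(*cp)) != NULL ? strlen(ent) : 1;
	if ((buf = malloc(sz)) == NULL)
		return NULL;

	/* Second pass: copy, substituting entities where required. */
	out = buf;
	for (cp = s; *cp != 0; cp++) {
		if ((ent = sketch_entity(*cp)) != NULL) {
			sz = strlen(ent);
			memcpy(out, ent, sz);
			out += sz;
		} else
			*out++ = *cp;
	}
	*out = 0;
	return buf;
}
```
.Ed
.Pp
Under these assumptions, a value containing markup characters
can no longer terminate the attribute or open a new element,
which is the property the output functions in
.Pa html.h
are meant to guarantee for the language-specific formatters.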
.Pp
The function
.Fn print_otag
prints the start tag of an HTML element with the name
.Fa tag ,
optionally including the attributes specified by
.Fa fmt .
If
.Fa fmt
is the empty string, no attributes are written.
Each letter of
.Fa fmt
specifies one attribute to write.
Most attributes require one
.Vt char *
argument which becomes the value of the attribute.
The arguments have to be given in the same order as the attribute letters.
If an argument is
.Dv NULL ,
the respective attribute is not written.
.Bl -tag -width 1n -offset indent
.It Cm c
Print a
.Cm class
attribute.
.It Cm h
Print a
.Cm href
attribute.
This attribute letter can optionally be followed by a modifier letter.
If followed by
.Cm R ,
it formats the link as a local one by prefixing a
.Sq #
character.
If followed by
.Cm I ,
it interprets the argument as a header file name
and generates a link using the
.Xr mandoc 1
.Fl O Cm includes
option.
If followed by
.Cm M ,
it takes two arguments instead of one, a manual page name and
section, and formats them as a link to a manual page using the
.Xr mandoc 1
.Fl O Cm man
option.
.It Cm i
Print an
.Cm id
attribute.
.It Cm \&?
Print an arbitrary attribute.
This format letter requires two
.Vt char *
arguments, the attribute name and the value.
The name must not be
.Dv NULL .
.It Cm s
Print a
.Cm style
attribute.
If present, it must be the last format letter.
It requires two
.Vt char *
arguments.
The first is the name of the style property, the second its value.
The name must not be
.Dv NULL .
The
.Cm s
.Ar fmt
letter can be repeated, each repetition requiring an additional pair of
.Vt char *
arguments.
.El
.Pp
.Fn print_otag
uses the private function
.Fn print_encode
to take care of HTML encoding.
If required by the element type, it remembers in
.Fa h
that the element is open.
The function
.Fn print_tagq
is used to close out all open elements up to and including
.Fa until ;
.Fn print_stagq
is a variant to close out all open elements up to but excluding
.Fa suntil .
.Pp
The function
.Fn print_text
prints HTML element content.
It uses the private function
.Fn print_encode
to take care of HTML encoding.
If the document has requested a non-standard font, for example using a
.Xr roff 7
.Ic \ef
font escape sequence,
.Fn print_text
wraps
.Fa word
in an HTML font selection element using the
.Fn print_otag
and
.Fn print_tagq
functions.
.Pp
The function
.Fn html_make_id
takes a node containing one or more text children
and returns a newly allocated string containing the concatenation
of the child strings, with blanks replaced by underscores.
If the node
.Fa n
contains any non-text child node,
.Fn html_make_id
returns
.Dv NULL
instead.
The caller is responsible for freeing the returned string.
.Pp
The function
.Fn html_strlen
counts the number of characters in
.Fa cp .
It is used as a crude estimate of the width needed to display a string.
.Pp
The functions
.Fn print_eqn ,
.Fn print_tbl ,
and
.Fn print_tblclose
are not yet documented.
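.Pp
The contract described for
.Fn html_make_id
can be sketched in self-contained C as follows.
This is an illustration only, not the implementation in
.Pa html.c :
.Vt struct sketch_node
and
.Fn sketch_make_id
are hypothetical stand-ins for
.Vt struct roff_node
and the real function, and the sketch assumes that child strings
are joined without any separator, which the description above
does not specify.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct roff_node; not the mandoc type. */
struct sketch_node {
	int			 is_text;	/* nonzero for a text child */
	const char		*string;	/* content of a text child */
	const struct sketch_node *next;		/* next sibling or NULL */
};

/*
 * Concatenate the text children starting at first, replacing
 * blanks with underscores.  Returns NULL if there is no child,
 * if any child is not a text node, or if allocation fails.
 * The caller is responsible for freeing the result.
 */
char *
sketch_make_id(const struct sketch_node *first)
{
	const struct sketch_node *n;
	char	*buf, *cp;
	size_t	 sz;

	if (first == NULL)
		return NULL;

	/* First pass: reject non-text children, sum up the lengths. */
	sz = 1;
	for (n = first; n != NULL; n = n->next) {
		if (!n->is_text)
			return NULL;
		assert(n->string != NULL);
		sz += strlen(n->string);
	}
	if ((buf = malloc(sz)) == NULL)
		return NULL;

	/* Second pass: copy the child strings back to back. */
	cp = buf;
	for (n = first; n != NULL; n = n->next) {
		sz = strlen(n->string);
		memcpy(cp, n->string, sz);
		cp += sz;
	}
	*cp = 0;

	/* Finally, turn every blank into an underscore. */
	for (cp = buf; *cp != 0; cp++)
		if (*cp == ' ')
			*cp = '_';
	return buf;
}
```
.Ed
.Pp
A caller following this pattern checks the result for
.Dv NULL
before passing it on, for example as the argument of an
.Cm i
attribute letter of
.Fn print_otag ,
and calls
.Xr free 3
on it afterwards.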
.Sh FILES
.Bl -tag -width mandoc_aux.c -compact
.It Pa main.h
declarations of public functions for use by the main program,
not yet documented
.It Pa html.h
declarations of data types and private functions
for use by language-specific HTML formatters
.It Pa html.c
main HTML formatting engine and utility functions
.It Pa mdoc_html.c
.Xr mdoc 7
HTML formatter
.It Pa man_html.c
.Xr man 7
HTML formatter
.It Pa tbl_html.c
.Xr tbl 7
HTML formatter
.It Pa eqn_html.c
.Xr eqn 7
HTML formatter
.It Pa out.h
declarations of data types and private functions
for shared use by all mandoc formatters,
not yet documented
.It Pa out.c
private functions for shared use by all mandoc formatters
.It Pa mandoc_aux.h
declarations of common mandoc utility functions, see
.Xr mandoc 3
.It Pa mandoc_aux.c
implementation of common mandoc utility functions
.El
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr mandoc 3 ,
.Xr man.cgi 8
.Sh AUTHORS
.An -nosplit
The mandoc HTML formatter was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv .
It is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org ,
who also wrote this manual.