.\"	$Id: mandoc_html.3,v 1.24 2022/06/24 11:15:53 schwarze Exp $
.\"
.\" Copyright (c) 2014, 2017, 2018 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: June 24 2022 $
.Dt MANDOC_HTML 3
.Os
.Sh NAME
.Nm mandoc_html
.Nd internals of the mandoc HTML formatter
.Sh SYNOPSIS
.In sys/types.h
.Fd #include """mandoc.h"""
.Fd #include """roff.h"""
.Fd #include """out.h"""
.Fd #include """html.h"""
.Ft void
.Fn print_gen_decls "struct html *h"
.Ft void
.Fn print_gen_comment "struct html *h" "struct roff_node *n"
.Ft void
.Fn print_gen_head "struct html *h"
.Ft struct tag *
.Fo print_otag
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *fmt"
.Fa ...
.Fc
.Ft void
.Fo print_tagq
.Fa "struct html *h"
.Fa "const struct tag *until"
.Fc
.Ft void
.Fo print_stagq
.Fa "struct html *h"
.Fa "const struct tag *suntil"
.Fc
.Ft void
.Fn html_close_paragraph "struct html *h"
.Ft enum roff_tok
.Fo html_fillmode
.Fa "struct html *h"
.Fa "enum roff_tok want"
.Fc
.Ft int
.Fo html_setfont
.Fa "struct html *h"
.Fa "enum mandoc_esc font"
.Fc
.Ft void
.Fo print_text
.Fa "struct html *h"
.Fa "const char *word"
.Fc
.Ft void
.Fo print_tagged_text
.Fa "struct html *h"
.Fa "const char *word"
.Fa "struct roff_node *n"
.Fc
.Ft char *
.Fo html_make_id
.Fa "const struct roff_node *n"
.Fa "int unique"
.Fc
.Ft struct tag *
.Fo print_otag_id
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *cattr"
.Fa "struct roff_node *n"
.Fc
.Ft void
.Fn print_endline "struct html *h"
.Sh DESCRIPTION
The mandoc HTML formatter is not a formal library.
However, as it is compiled into more than one program, in particular
.Xr mandoc 1
and
.Xr man.cgi 8 ,
and because it may be security-critical in some contexts,
some documentation is useful to help to use it correctly and
to prevent XSS vulnerabilities.
.Pp
The formatter produces HTML output on the standard output.
Since proper escaping is usually required and best taken care of
at one central place, the language-specific formatters
.Po
.Pa *_html.c ,
see
.Sx FILES
.Pc
are not supposed to print directly to
.Dv stdout
using functions like
.Xr printf 3 ,
.Xr putc 3 ,
.Xr puts 3 ,
or
.Xr write 2 .
Instead, they are expected to use the output functions declared in
.Pa html.h
and implemented as part of the main HTML formatting engine in
.Pa html.c .
.Ss Data structures
These structures are declared in
.Pa html.h .
.Bl -tag -width Ds
.It Vt struct html
Internal state of the HTML formatter.
.It Vt struct tag
One entry for the LIFO stack of HTML elements.
Members include
.Fa "enum htmltag tag"
and
.Fa "struct tag *next" .
.El
.Ss Private interface functions
The function
.Fn print_gen_decls
prints the opening
.Aq Pf \&! Ic DOCTYPE
declaration.
.Pp
The function
.Fn print_gen_comment
prints the leading comments, usually containing a Copyright notice
and license, as an HTML comment.
It is intended to be called right after opening the
.Aq Ic HTML
element.
Pass the first
.Dv ROFFT_COMMENT
node in
.Fa n .
.Pp
The function
.Fn print_gen_head
prints the opening
.Aq Ic META
and
.Aq Ic LINK
elements for the document
.Aq Ic HEAD ,
using the
.Fa style
member of
.Fa h
unless that is
.Dv NULL .
It uses
.Fn print_otag
which takes care of properly encoding attributes,
which is relevant for the
.Fa style
link in particular.
.Pp
The function
.Fn print_otag
prints the start tag of an HTML element with the name
.Fa tag ,
optionally including the attributes specified by
.Fa fmt .
If
.Fa fmt
is the empty string, no attributes are written.
Each letter of
.Fa fmt
specifies one attribute to write.
Most attributes require one
.Vt char *
argument which becomes the value of the attribute.
The arguments have to be given in the same order as the attribute letters.
If an argument is
.Dv NULL ,
the respective attribute is not written.
.Bl -tag -width 1n -offset indent
.It Cm c
Print a
.Cm class
attribute.
.It Cm h
Print a
.Cm href
attribute.
This attribute letter can optionally be followed by a modifier letter.
If followed by
.Cm R ,
it formats the link as a local one by prefixing a
.Sq #
character.
If followed by
.Cm I ,
it interprets the argument as a header file name
and generates a link using the
.Xr mandoc 1
.Fl O Cm includes
option.
If followed by
.Cm M ,
it takes two arguments instead of one, a manual page name and
section, and formats them as a link to a manual page using the
.Xr mandoc 1
.Fl O Cm man
option.
.It Cm i
Print an
.Cm id
attribute.
.It Cm r
Print an ARIA
.Cm role
attribute.
.It Cm \&?
Print an arbitrary attribute.
This format letter requires two
.Vt char *
arguments, the attribute name and the value.
The name must not be
.Dv NULL .
.It Cm s
Print a
.Cm style
attribute.
If present, it must be the last format letter.
It requires two
.Vt char *
arguments.
The first is the name of the style property, the second its value.
The name must not be
.Dv NULL .
The
.Cm s
.Fa fmt
letter can be repeated, each repetition requiring an additional pair of
.Vt char *
arguments.
.El
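.Pp
The calling convention described above can be illustrated with a small,
self-contained sketch.
The function
.Fn demo_otag
is hypothetical and is not part of
.Pa html.c :
it models only the
.Cm c ,
.Cm h ,
and
.Cm i
letters without modifiers, writes into a caller-supplied buffer
instead of the standard output, uses single quotes around values,
and performs no HTML encoding.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for print_otag(), NOT the mandoc code.
 * It walks fmt, takes one char * argument per letter, and
 * omits the attribute when that argument is NULL.
 * Attribute values are copied verbatim (no HTML encoding).
 */
static void
demo_otag(char *buf, size_t sz, const char *name, const char *fmt, ...)
{
	va_list ap;
	const char *attr, *val;
	size_t len;

	assert(buf != NULL && sz > 0);
	snprintf(buf, sz, "<%s", name);
	va_start(ap, fmt);
	for (; *fmt != 0; fmt++) {
		switch (*fmt) {
		case 'c':
			attr = "class";
			break;
		case 'h':
			attr = "href";
			break;
		case 'i':
			attr = "id";
			break;
		default:
			attr = NULL;
			break;
		}
		if (attr == NULL)
			break;		/* letter not modelled: stop */
		val = va_arg(ap, char *);
		if (val == NULL)
			continue;	/* NULL value: attribute skipped */
		len = strlen(buf);
		snprintf(buf + len, sz - len, " %s='%s'", attr, val);
	}
	va_end(ap);
	len = strlen(buf);
	snprintf(buf + len, sz - len, ">");
}
```
.Ed
.Pp
With this sketch, the format string
.Qq chi
and the three arguments
.Qq Xr ,
.Dv NULL ,
and
.Qq NAME
produce a start tag carrying a
.Cm class
and an
.Cm id
attribute but no
.Cm href
attribute, because the second argument is
.Dv NULL .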
.Pp
.Fn print_otag
uses the private function
.Fn print_encode
to take care of HTML encoding.
If required by the element type, it remembers in
.Fa h
that the element is open.
The function
.Fn print_tagq
is used to close out all open elements up to and including
.Fa until ;
.Fn print_stagq
is a variant to close out all open elements up to but excluding
.Fa suntil .
The function
.Fn html_close_paragraph
closes all open elements that establish phrasing context,
thus returning to the innermost flow context.
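.Pp
The difference between closing up to and including an element and
closing up to but excluding it can be shown on a miniature LIFO stack.
The names
.Vt struct demo_tag ,
.Fn demo_push ,
.Fn demo_tagq ,
and
.Fn demo_stagq
are hypothetical; this is a sketch of the stack discipline only,
it prints no end tags, and it is not the code in
.Pa html.c .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical miniature of the element stack; not struct tag. */
struct demo_tag {
	int		 id;
	struct demo_tag	*next;
};

/* Push a new entry and return it, as an open-tag call would. */
static struct demo_tag *
demo_push(struct demo_tag **top, int id)
{
	struct demo_tag *t;

	t = malloc(sizeof(*t));
	assert(t != NULL);
	t->id = id;
	t->next = *top;
	*top = t;
	return t;
}

/* Pop entries up to and INCLUDING until; return the number popped. */
static int
demo_tagq(struct demo_tag **top, const struct demo_tag *until)
{
	struct demo_tag *t;
	int n = 0, last = 0;

	while (*top != NULL && !last) {
		t = *top;
		last = (t == until);
		*top = t->next;
		free(t);
		n++;
	}
	return n;
}

/* Pop entries up to but EXCLUDING suntil; return the number popped. */
static int
demo_stagq(struct demo_tag **top, const struct demo_tag *suntil)
{
	struct demo_tag *t;
	int n = 0;

	while (*top != NULL && *top != suntil) {
		t = *top;
		*top = t->next;
		free(t);
		n++;
	}
	return n;
}
```
.Ed
.Pp
In this model, a pointer returned by
.Fn demo_push
becomes invalid once the entry has been popped,
which mirrors the ownership rule stated in
.Sx RETURN VALUES .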
.Pp
The function
.Fn html_fillmode
switches to fill mode if
.Fa want
is
.Dv ROFF_fi
or to no-fill mode if
.Fa want
is
.Dv ROFF_nf .
Switching from fill mode to no-fill mode closes the current paragraph
and opens a
.Aq Ic PRE
element.
Switching in the opposite direction closes the
.Aq Ic PRE
element, but does not open a new paragraph.
If
.Fa want
matches the mode that is already active, no elements are closed or opened.
If
.Fa want
is
.Dv TOKEN_NONE ,
the mode remains as it is.
.Pp
The function
.Fn html_setfont
selects the
.Fa font ,
which can be
.Dv ESCAPE_FONTROMAN ,
.Dv ESCAPE_FONTBOLD ,
.Dv ESCAPE_FONTITALIC ,
.Dv ESCAPE_FONTBI ,
or
.Dv ESCAPE_FONTCW ,
for future text output and internally remembers
the font that was active before the change.
If the
.Fa font
argument is
.Dv ESCAPE_FONTPREV ,
the current and the previous font are exchanged.
This function only changes the internal state of the
.Fa h
object; no HTML elements are written yet.
Subsequent text output will write font elements when needed.
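.Pp
The bookkeeping of the current and the previous font can be modelled
in a few lines.
The types
.Vt enum demo_font
and
.Vt struct demo_state
and the function
.Fn demo_setfont
are hypothetical names chosen for this sketch;
it only tracks state, returns nothing, and writes no HTML,
so it should be read as an illustration of the exchange rule
rather than as the code in
.Pa html.c .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical font selectors; DEMO_PREV plays the role of "previous". */
enum demo_font { DEMO_ROMAN, DEMO_BOLD, DEMO_ITALIC, DEMO_PREV };

/* Hypothetical state: the font in effect and the one before it. */
struct demo_state {
	enum demo_font	cur;
	enum demo_font	prev;
};

/*
 * Select a font and remember the one that was active before.
 * Requesting DEMO_PREV exchanges the current and previous fonts.
 */
static void
demo_setfont(struct demo_state *s, enum demo_font font)
{
	enum demo_font old;

	assert(s != NULL);
	old = s->cur;
	s->cur = (font == DEMO_PREV) ? s->prev : font;
	s->prev = old;
}
```
.Ed
.Pp
Requesting the previous font twice in a row therefore restores
the font that was in effect before the first of the two requests.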
.Pp
The function
.Fn print_text
prints HTML element content.
It uses the private function
.Fn print_encode
to take care of HTML encoding.
If the document has requested a non-standard font, for example using a
.Xr roff 7
.Ic \ef
font escape sequence,
.Fn print_text
wraps
.Fa word
in an HTML font selection element using the
.Fn print_otag
and
.Fn print_tagq
functions.
.Pp
The function
.Fn print_tagged_text
is a variant of
.Fn print_text
that wraps
.Fa word
in an
.Aq Ic A
element of class
.Qq permalink
if
.Fa n
is not
.Dv NULL
and yields a segment identifier when passed to
.Fn html_make_id .
.Pp
The function
.Fn html_make_id
allocates a string to be used for the
.Cm id
attribute of an HTML element and/or as a segment identifier for a URI in an
.Aq Ic A
element.
If
.Fa n
contains a
.Fa tag
attribute, it is used; otherwise, child nodes are used.
If
.Fa n
is an
.Ic \&Sh ,
.Ic \&Ss ,
.Ic \&Sx ,
.Ic SH ,
or
.Ic SS
node, the resulting string is the concatenation of the child strings;
for other node types, only the first child is used.
Bytes not permitted in URI-fragment strings are replaced by underscores.
If any of the children to be used is not a text node,
no string is generated and
.Dv NULL
is returned instead.
If the
.Fa unique
argument is non-zero, deduplication is performed by appending an
underscore and a decimal integer, if necessary.
If the
.Fa unique
argument is 1, this is assumed to be the first call for this tag
at this location, typically for use by
.Dv NODE_ID ,
so the integer is incremented before use.
If the
.Fa unique
argument is 2, this is assumed to be the second call for this tag
at this location, typically for use by
.Dv NODE_HREF ,
so the existing integer, if any, is used without incrementing it.
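.Pp
The replacement of unsuitable bytes by underscores can be sketched as follows.
The function
.Fn demo_make_id
is hypothetical.
As a loudly stated assumption, it keeps only ASCII letters, digits,
and the bytes
.Sq \- ,
.Sq \&. ,
and
.Sq _ ,
which is deliberately narrower than what a URI fragment permits
and is not claimed to match the set used in
.Pa html.c .
It also omits deduplication and returns
.Dv NULL
on allocation failure instead of calling
.Xr err 3 .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch, NOT html_make_id(): copy text and replace
 * every byte outside an assumed safe set by an underscore.
 * The caller frees the returned string.
 */
static char *
demo_make_id(const char *text)
{
	char *id, *cp;

	assert(text != NULL);
	id = malloc(strlen(text) + 1);
	if (id == NULL)
		return NULL;
	strcpy(id, text);
	for (cp = id; *cp != 0; cp++)
		if (!isalnum((unsigned char)*cp) &&
		    strchr("-._", *cp) == NULL)
			*cp = '_';
	return id;
}
```
.Ed
.Pp
Under these assumptions, a section title consisting of two words
separated by a blank yields the two words joined by an underscore.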
.Pp
The function
.Fn print_otag_id
opens a
.Fa tag
element of class
.Fa cattr
for the node
.Fa n .
If the flag
.Dv NODE_ID
is set in
.Fa n ,
it attempts to generate an
.Cm id
attribute with
.Fn html_make_id .
If the flag
.Dv NODE_HREF
is set in
.Fa n ,
an
.Aq Ic A
element of class
.Qq permalink
is added:
outside if
.Fa n
generates an element that can only occur in phrasing context,
or inside otherwise.
This function is a wrapper around
.Fn html_make_id
and
.Fn print_otag ,
automatically choosing the
.Fa unique
argument appropriately and setting the
.Fa fmt
arguments to
.Qq chR
and
.Qq ci ,
respectively.
.Pp
The function
.Fn print_endline
makes sure subsequent output starts on a new HTML output line.
If nothing was printed on the current output line yet, it has no effect.
Otherwise, it appends any buffered text to the current output line,
ends the line, and updates the internal state of the
.Fa h
object.
.Pp
The functions
.Fn print_eqn ,
.Fn print_tbl ,
and
.Fn print_tblclose
are not yet documented.
.Sh RETURN VALUES
The functions
.Fn print_otag
and
.Fn print_otag_id
return a pointer to a new element on the stack of HTML elements.
When
.Fn print_otag_id
opens two elements, a pointer to the outer one is returned.
The memory pointed to is owned by the library and is automatically
.Xr free 3 Ns d
when
.Fn print_tagq
is called on it or when
.Fn print_stagq
is called on a parent element.
.Pp
The function
.Fn html_fillmode
returns
.Dv ROFF_fi
if fill mode was active before the call or
.Dv ROFF_nf
otherwise.
.Pp
The function
.Fn html_make_id
returns a newly allocated string or
.Dv NULL
if
.Fa n
lacks text data to create the attribute from.
The caller is responsible for
.Xr free 3 Ns ing
the returned string after using it.
.Pp
In case of
.Xr malloc 3
failure, these functions do not return but call
.Xr err 3 .
.Sh FILES
.Bl -tag -width mandoc_aux.c -compact
.It Pa main.h
declarations of public functions for use by the main program,
not yet documented
.It Pa html.h
declarations of data types and private functions
for use by language-specific HTML formatters
.It Pa html.c
main HTML formatting engine and utility functions
.It Pa mdoc_html.c
.Xr mdoc 7
HTML formatter
.It Pa man_html.c
.Xr man 7
HTML formatter
.It Pa tbl_html.c
.Xr tbl 7
HTML formatter
.It Pa eqn_html.c
.Xr eqn 7
HTML formatter
.It Pa roff_html.c
.Xr roff 7
HTML formatter, handling requests like
.Ic br ,
.Ic ce ,
.Ic fi ,
.Ic ft ,
.Ic nf ,
.Ic rj ,
and
.Ic sp .
.It Pa out.h
declarations of data types and private functions
for shared use by all mandoc formatters,
not yet documented
.It Pa out.c
private functions for shared use by all mandoc formatters
.It Pa mandoc_aux.h
declarations of common mandoc utility functions, see
.Xr mandoc 3
.It Pa mandoc_aux.c
implementation of common mandoc utility functions
.El
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr mandoc 3 ,
.Xr man.cgi 8
.Sh AUTHORS
.An -nosplit
The mandoc HTML formatter was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv .
It is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org ,
who also wrote this manual.