.\"	$Id: mandoc_html.3,v 1.23 2020/04/24 13:13:06 schwarze Exp $
.\"
.\" Copyright (c) 2014, 2017, 2018 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: April 24 2020 $
.Dt MANDOC_HTML 3
.Os
.Sh NAME
.Nm mandoc_html
.Nd internals of the mandoc HTML formatter
.Sh SYNOPSIS
.In sys/types.h
.Fd #include """mandoc.h"""
.Fd #include """roff.h"""
.Fd #include """out.h"""
.Fd #include """html.h"""
.Ft void
.Fn print_gen_decls "struct html *h"
.Ft void
.Fn print_gen_comment "struct html *h" "struct roff_node *n"
.Ft void
.Fn print_gen_head "struct html *h"
.Ft struct tag *
.Fo print_otag
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *fmt"
.Fa ...
.Fc
.Ft void
.Fo print_tagq
.Fa "struct html *h"
.Fa "const struct tag *until"
.Fc
.Ft void
.Fo print_stagq
.Fa "struct html *h"
.Fa "const struct tag *suntil"
.Fc
.Ft void
.Fn html_close_paragraph "struct html *h"
.Ft enum roff_tok
.Fo html_fillmode
.Fa "struct html *h"
.Fa "enum roff_tok want"
.Fc
.Ft int
.Fo html_setfont
.Fa "struct html *h"
.Fa "enum mandoc_esc font"
.Fc
.Ft void
.Fo print_text
.Fa "struct html *h"
.Fa "const char *word"
.Fc
.Ft void
.Fo print_tagged_text
.Fa "struct html *h"
.Fa "const char *word"
.Fa "struct roff_node *n"
.Fc
.Ft char *
.Fo html_make_id
.Fa "const struct roff_node *n"
.Fa "int unique"
.Fc
.Ft struct tag *
.Fo print_otag_id
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *cattr"
.Fa "struct roff_node *n"
.Fc
.Ft void
.Fn print_endline "struct html *h"
.Sh DESCRIPTION
The mandoc HTML formatter is not a formal library.
However, as it is compiled into more than one program, in particular
.Xr mandoc 1
and
.Xr man.cgi 8 ,
and because it may be security-critical in some contexts,
some documentation is useful to help use it correctly and
to prevent XSS vulnerabilities.
.Pp
The formatter produces HTML output on the standard output.
Since proper escaping is usually required and best taken care of
at one central place, the language-specific formatters
.Po
.Pa *_html.c ,
see
.Sx FILES
.Pc
are not supposed to print directly to
.Dv stdout
using functions like
.Xr printf 3 ,
.Xr putc 3 ,
.Xr puts 3 ,
or
.Xr write 2 .
Instead, they are expected to use the output functions declared in
.Pa html.h
and implemented as part of the main HTML formatting engine in
.Pa html.c .
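.Pp
The following minimal sketch shows why escaping is best done in one central place.
It is a text encoder in the spirit of the private function
.Fn print_encode
in
.Pa html.c ,
but it is not the mandoc implementation.
The name
.Fn sketch_encode
and its buffer-based interface are hypothetical.
The real formatter writes to standard output, and its encoding rules may differ in detail.
The sketch replaces the characters
.Sq & ,
.Sq < ,
.Sq > ,
and
.Sq \(dq
by character references.
This prevents text from being parsed as markup in ordinary element content
and in double-quoted attribute values.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch, not the mandoc implementation: copy src to dst,
 * replacing the characters that are special in HTML text and in
 * double-quoted attribute values by character references.
 * Returns the length of the result, or (size_t)-1 if dst is too small.
 */
size_t
sketch_encode(char *dst, size_t dstsz, const char *src)
{
	const char	*rep;
	char		 one[2];
	size_t		 len = 0, n;

	if (dstsz == 0)
		return (size_t)-1;
	for (; *src != 0; src++) {
		switch (*src) {
		case '&':
			rep = "&amp;";
			break;
		case '<':
			rep = "&lt;";
			break;
		case '>':
			rep = "&gt;";
			break;
		case '"':
			rep = "&quot;";
			break;
		default:
			one[0] = *src;	/* ordinary character: copy as is */
			one[1] = 0;
			rep = one;
			break;
		}
		n = strlen(rep);
		if (len + n + 1 > dstsz) {	/* keep room for the terminator */
			dst[len] = 0;
			return (size_t)-1;
		}
		memcpy(dst + len, rep, n);
		len += n;
	}
	dst[len] = 0;
	return len;
}
```
.Ed
For example, the input
.Ql a<b
is encoded as
.Ql a&lt;b .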
.Ss Data structures
These structures are declared in
.Pa html.h .
.Bl -tag -width Ds
.It Vt struct html
Internal state of the HTML formatter.
.It Vt struct tag
One entry in the LIFO stack of HTML elements.
Members include
.Fa "enum htmltag tag"
and
.Fa "struct tag *next" .
.El
.Ss Private interface functions
The function
.Fn print_gen_decls
prints the opening
.Aq Pf \&! Ic DOCTYPE
declaration.
.Pp
The function
.Fn print_gen_comment
prints the leading comments, usually containing a copyright notice
and license, as an HTML comment.
It is intended to be called right after opening the
.Aq Ic HTML
element.
Pass the first
.Dv ROFFT_COMMENT
node in
.Fa n .
.Pp
The function
.Fn print_gen_head
prints the opening
.Aq Ic META
and
.Aq Ic LINK
elements for the document
.Aq Ic HEAD ,
using the
.Fa style
member of
.Fa h
unless that is
.Dv NULL .
It uses
.Fn print_otag ,
which takes care of properly encoding attributes;
this is particularly relevant for the
.Fa style
link.
.Pp
The function
.Fn print_otag
prints the start tag of an HTML element with the name
.Fa tag ,
optionally including the attributes specified by
.Fa fmt .
If
.Fa fmt
is the empty string, no attributes are written.
Each letter of
.Fa fmt
specifies one attribute to write.
Most attributes require one
.Vt char *
argument which becomes the value of the attribute.
The arguments have to be given in the same order as the attribute letters.
If an argument is
.Dv NULL ,
the respective attribute is not written.
.Bl -tag -width 1n -offset indent
.It Cm c
Print a
.Cm class
attribute.
.It Cm h
Print a
.Cm href
attribute.
This attribute letter can optionally be followed by a modifier letter.
If followed by
.Cm R ,
it formats the link as a local one by prefixing a
.Sq #
character.
If followed by
.Cm I ,
it interprets the argument as a header file name
and generates a link using the
.Xr mandoc 1
.Fl O Cm includes
option.
If followed by
.Cm M ,
it takes two arguments instead of one, a manual page name and a
section, and formats them as a link to a manual page using the
.Xr mandoc 1
.Fl O Cm man
option.
.It Cm i
Print an
.Cm id
attribute.
.It Cm \&?
Print an arbitrary attribute.
This format letter requires two
.Vt char *
arguments, the attribute name and the value.
The name must not be
.Dv NULL .
.It Cm s
Print a
.Cm style
attribute.
If present, it must follow all other format letters.
It requires two
.Vt char *
arguments.
The first is the name of the style property, the second its value.
The name must not be
.Dv NULL .
The
.Cm s
.Fa fmt
letter can be repeated, each repetition requiring an additional pair of
.Vt char *
arguments.
.El
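.Pp
The following minimal sketch shows how a string of attribute letters
can drive the consumption of variable arguments.
It covers the
.Cm R
modifier, the omission of attributes whose argument is
.Dv NULL ,
and repeated
.Cm s
letters.
It is not the mandoc code.
The function
.Fn sketch_attrs ,
its buffer interface, and the exact spacing inside the
.Cm style
value are assumptions.
The
.Cm \&? ,
.Cm I ,
and
.Cm M
letters are left out, and an unsupported letter makes the sketch fail
rather than guess at its arguments.
.Bd -literal -offset indent
```c
#include <stdarg.h>
#include <string.h>

struct obuf {			/* hypothetical stand-in for stdout */
	char	*buf;
	size_t	 sz, len;	/* size of buf, bytes used */
	int	 err;		/* set once buf is too small */
};

static const char quote[2] = { '"', 0 };

/* Append s to o; if enc is set, encode HTML-special characters. */
static void
put(struct obuf *o, const char *s, int enc)
{
	const char	*rep;
	char		 c[2] = { 0, 0 };
	size_t		 n;

	for (; *s != 0 && !o->err; s++) {
		switch (enc ? *s : 0) {	/* 0: copy verbatim */
		case '&': rep = "&amp;"; break;
		case '<': rep = "&lt;"; break;
		case '>': rep = "&gt;"; break;
		case '"': rep = "&quot;"; break;
		default: c[0] = *s; rep = c; break;
		}
		n = strlen(rep);
		if (o->len + n + 1 > o->sz) {
			o->err = 1;
			return;
		}
		memcpy(o->buf + o->len, rep, n + 1);
		o->len += n;
	}
}

/* Write eq followed by the quoted pfx+v; a NULL v omits it. */
static void
put_attr(struct obuf *o, const char *eq, const char *pfx, const char *v)
{
	if (v == NULL)
		return;
	put(o, eq, 0);
	put(o, quote, 0);
	put(o, pfx, 0);
	put(o, v, 1);
	put(o, quote, 0);
}

/*
 * Hypothetical sketch after the print_otag() letters c, h, hR, i, and
 * s (style pairs, after all other letters); ?, I, and M are left out.
 * Returns 0, or -1 if dst is too small or fmt has another letter.
 */
int
sketch_attrs(char *dst, size_t dstsz, const char *fmt, ...)
{
	struct obuf	 o = { dst, dstsz, 0, 0 };
	va_list		 ap;
	const char	*name, *val;
	int		 nstyle = 0;

	if (dstsz > 0)
		dst[0] = 0;
	va_start(ap, fmt);
	for (; *fmt != 0 && !o.err; fmt++) {
		switch (*fmt) {
		case 'c':
			put_attr(&o, " class=", "", va_arg(ap, char *));
			break;
		case 'h':	/* hR marks a local link: prefix '#' */
			val = va_arg(ap, char *);
			put_attr(&o, " href=", fmt[1] == 'R' ? "#" : "", val);
			if (fmt[1] == 'R')
				fmt++;
			break;
		case 'i':
			put_attr(&o, " id=", "", va_arg(ap, char *));
			break;
		case 's':	/* all pairs share one style attribute */
			name = va_arg(ap, char *);
			if ((val = va_arg(ap, char *)) == NULL)
				break;
			put(&o, nstyle++ == 0 ? " style=" : " ", 0);
			if (nstyle == 1)
				put(&o, quote, 0);
			put(&o, name, 1);
			put(&o, ": ", 0);
			put(&o, val, 1);
			put(&o, ";", 0);
			break;
		default: o.err = 1; break;
		}
	}
	va_end(ap);
	if (nstyle > 0)
		put(&o, quote, 0);
	return o.err ? -1 : 0;
}
```
.Ed
Passing a null pointer as a variadic argument should use an explicit
.Vt char *
cast, because
.Dv NULL
may be defined as a plain integer constant.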
.Pp
.Fn print_otag
uses the private function
.Fn print_encode
to take care of HTML encoding.
If required by the element type, it remembers in
.Fa h
that the element is open.
The function
.Fn print_tagq
is used to close out all open elements up to and including
.Fa until ;
.Fn print_stagq
is a variant to close out all open elements up to but excluding
.Fa suntil .
The function
.Fn html_close_paragraph
closes all open elements that establish phrasing context,
thus returning to the innermost flow context.
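.Pp
The difference between
.Fn print_tagq
and
.Fn print_stagq
can be sketched with a simplified LIFO stack.
The types and functions below are hypothetical stand-ins for
.Vt struct tag ,
.Vt struct html ,
and the real functions.
They collect output in a buffer and call
.Xr abort 3
if
.Xr malloc 3
fails.
In this sketch only, passing
.Dv NULL
closes all open elements.
.Bd -literal -offset indent
```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for struct tag and struct html. */
struct sk_tag {
	const char	*name;	/* element name, e.g. "div" */
	struct sk_tag	*next;	/* next entry down the LIFO stack */
};

struct sk_html {
	struct sk_tag	*top;		/* innermost open element */
	char		 out[256];	/* collected output, not stdout */
};

static void
sk_print(struct sk_html *h, const char *s)
{
	/* Truncates silently when out is full; fine for a sketch. */
	strncat(h->out, s, sizeof(h->out) - strlen(h->out) - 1);
}

/* Print a start tag and push an entry; the stack owns the entry. */
struct sk_tag *
sk_otag(struct sk_html *h, const char *name)
{
	struct sk_tag	*t;

	if ((t = malloc(sizeof(*t))) == NULL)
		abort();
	t->name = name;
	t->next = h->top;
	h->top = t;
	sk_print(h, "<");
	sk_print(h, name);
	sk_print(h, ">");
	return t;
}

/* Print the end tag of the innermost element, pop it, and free it. */
static void
sk_pop(struct sk_html *h)
{
	struct sk_tag	*t = h->top;

	h->top = t->next;
	sk_print(h, "</");
	sk_print(h, t->name);
	sk_print(h, ">");
	free(t);
}

/* Like print_tagq(): close up to and including until. */
void
sk_tagq(struct sk_html *h, const struct sk_tag *until)
{
	int	 last;

	while (h->top != NULL) {
		last = h->top == until;	/* compare before freeing */
		sk_pop(h);
		if (last)
			break;
	}
}

/* Like print_stagq(): close up to but excluding suntil. */
void
sk_stagq(struct sk_html *h, const struct sk_tag *suntil)
{
	while (h->top != NULL && h->top != suntil)
		sk_pop(h);
}
```
.Ed
For example, suppose div, p, and b have been opened.
Calling
.Fn sk_stagq
on the entry for p closes only b and leaves p open.
Calling
.Fn sk_tagq
on the entry for div then closes p and div.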
.Pp
The function
.Fn html_fillmode
switches to fill mode if
.Fa want
is
.Dv ROFF_fi
or to no-fill mode if
.Fa want
is
.Dv ROFF_nf .
Switching from fill mode to no-fill mode closes the current paragraph
and opens a
.Aq Ic PRE
element.
Switching in the opposite direction closes the
.Aq Ic PRE
element, but does not open a new paragraph.
If
.Fa want
matches the mode that is already active, no elements are closed or opened.
If
.Fa want
is
.Dv TOKEN_NONE ,
the mode remains as it is.
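.Pp
The following minimal sketch covers the mode switching and the return value
documented under
.Sx RETURN VALUES .
It uses hypothetical stand-ins for the
.Vt enum roff_tok
values and records element changes in a log instead of writing HTML.
.Bd -literal -offset indent
```c
#include <string.h>

/* Hypothetical stand-ins for TOKEN_NONE, ROFF_fi, and ROFF_nf. */
enum sk_tok { SK_TOKEN_NONE, SK_ROFF_fi, SK_ROFF_nf };

struct sk_state {
	int	 nofill;	/* nonzero while a <pre> element is open */
	char	 log[128];	/* records elements closed and opened */
};

static void
sk_log(struct sk_state *h, const char *s)
{
	strncat(h->log, s, sizeof(h->log) - strlen(h->log) - 1);
}

/* Switch modes as described for html_fillmode(); return the old mode. */
enum sk_tok
sk_fillmode(struct sk_state *h, enum sk_tok want)
{
	enum sk_tok	 had = h->nofill ? SK_ROFF_nf : SK_ROFF_fi;

	if (want == SK_ROFF_nf && !h->nofill) {
		sk_log(h, "close-paragraph open-pre ");
		h->nofill = 1;
	} else if (want == SK_ROFF_fi && h->nofill) {
		sk_log(h, "close-pre ");	/* no new paragraph */
		h->nofill = 0;
	}
	/* SK_TOKEN_NONE or the already active mode: nothing changes. */
	return had;
}
```
.Ed
Because
.Dv TOKEN_NONE
leaves the mode unchanged, a call with that argument only reports
the current mode.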
.Pp
The function
.Fn html_setfont
selects the
.Fa font ,
which can be
.Dv ESCAPE_FONTROMAN ,
.Dv ESCAPE_FONTBOLD ,
.Dv ESCAPE_FONTITALIC ,
.Dv ESCAPE_FONTBI ,
or
.Dv ESCAPE_FONTCW ,
for future text output and internally remembers
the font that was active before the change.
If the
.Fa font
argument is
.Dv ESCAPE_FONTPREV ,
the current and the previous font are exchanged.
This function only changes the internal state of the
.Fa h
object; no HTML elements are written yet.
Subsequent text output will write font elements when needed.
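.Pp
The state change can be sketched as follows.
The enumeration and structure are hypothetical stand-ins for the font values of
.Vt enum mandoc_esc
and for the font state kept in
.Vt struct html .
This manual does not document the
.Vt int
return value of
.Fn html_setfont ,
so the return convention of
.Fn sk_setfont
is an assumption.
.Bd -literal -offset indent
```c
/* Hypothetical stand-ins for the enum mandoc_esc font values. */
enum sk_font {
	SK_FONTROMAN, SK_FONTBOLD, SK_FONTITALIC, SK_FONTBI, SK_FONTCW,
	SK_FONTPREV, SK_FONTNONE
};

struct sk_fonts {
	enum sk_font	 cur;	/* font for future text output */
	enum sk_font	 prev;	/* font active before the last change */
};

/*
 * Update the font state only; no HTML is written here.
 * Returning 1 for accepted and 0 for unsupported values is an
 * assumption of this sketch.
 */
int
sk_setfont(struct sk_fonts *h, enum sk_font font)
{
	enum sk_font	 tmp;

	switch (font) {
	case SK_FONTPREV:	/* exchange current and previous font */
		tmp = h->cur;
		h->cur = h->prev;
		h->prev = tmp;
		return 1;
	case SK_FONTROMAN:
	case SK_FONTBOLD:
	case SK_FONTITALIC:
	case SK_FONTBI:
	case SK_FONTCW:
		h->prev = h->cur;
		h->cur = font;
		return 1;
	default:
		return 0;
	}
}
```
.Ed
Suppose the sketch starts in roman and bold is then selected.
Each further call with the previous-font value then toggles between bold and roman.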
.Pp
The function
.Fn print_text
prints HTML element content.
It uses the private function
.Fn print_encode
to take care of HTML encoding.
If the document has requested a non-standard font, for example using a
.Xr roff 7
.Ic \ef
font escape sequence,
.Fn print_text
wraps
.Fa word
in an HTML font selection element using the
.Fn print_otag
and
.Fn print_tagq
functions.
.Pp
The function
.Fn print_tagged_text
is a variant of
.Fn print_text
that wraps
.Fa word
in an
.Aq Ic A
element of class
.Qq permalink
if
.Fa n
is not
.Dv NULL
and yields a segment identifier when passed to
.Fn html_make_id .
.Pp
The function
.Fn html_make_id
allocates a string to be used for the
.Cm id
attribute of an HTML element and/or as a segment identifier for a URI in an
.Aq Ic A
element.
If
.Fa n
contains a
.Fa tag
attribute, it is used; otherwise, child nodes are used.
If
.Fa n
is an
.Ic \&Sh ,
.Ic \&Ss ,
.Ic \&Sx ,
.Ic SH ,
or
.Ic SS
node, the resulting string is the concatenation of the child strings;
for other node types, only the first child is used.
Bytes not permitted in URI-fragment strings are replaced by underscores.
If any of the children to be used is not a text node,
no string is generated and
.Dv NULL
is returned instead.
If the
.Fa unique
argument is non-zero, deduplication is performed by appending an
underscore and a decimal integer, if necessary.
If the
.Fa unique
argument is 1, this is assumed to be the first call for this tag
at this location, typically for use by
.Dv NODE_ID ,
so the integer is incremented before use.
If the
.Fa unique
argument is 2, this is assumed to be the second call for this tag
at this location, typically for use by
.Dv NODE_HREF ,
so the existing integer, if any, is used without incrementing it.
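.Pp
The following sketch models the string construction and the
deduplication rules described above.
It is a hypothetical illustration, not the mandoc code.
.Fn sk_make_id
receives the child strings directly instead of a
.Vt struct roff_node
and ignores the
.Fa tag
attribute case.
It joins concatenated children with a blank and keeps its counters
in a small fixed table.
Three further details are assumptions and may differ from
.Pa html.c .
The permitted byte set is taken from the URI fragment grammar of RFC 3986.
The first use of an identifier gets no suffix.
Numbering of later uses starts at
.Ql _2 .
.Bd -literal -offset indent
```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical registry of ids handed out so far. */
struct sk_ident {
	char	 id[64];
	int	 ord;		/* how often this id was requested */
};
struct sk_ids {
	struct sk_ident	 e[32];
	size_t		 n;
};

/* Characters allowed in a URI fragment by RFC 3986, except '%'. */
static int
fragment_ok(unsigned char c)
{
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
	    (c >= '0' && c <= '9') ||
	    (c != 0 && strchr("-._~!$&'()*+,;=:@/?", c) != NULL);
}

/*
 * words[] holds the child strings, with NULL marking a non-text child.
 * If concat is set (Sh, Ss, Sx, SH, SS), all children are joined,
 * here with a blank, which is an assumption; otherwise only the first
 * child is used.  The caller frees the result.
 */
char *
sk_make_id(struct sk_ids *ids, const char *const *words, size_t nwords,
    int concat, int unique)
{
	struct sk_ident	*ent = NULL;
	char		*buf, *cp;
	size_t		 i, len = 0, use;

	if (nwords == 0)
		return NULL;
	use = concat ? nwords : 1;
	for (i = 0; i < use; i++) {
		if (words[i] == NULL)
			return NULL;	/* non-text child: no id */
		len += strlen(words[i]) + 1;
	}
	if ((buf = malloc(len + 16)) == NULL)	/* room for "_N" */
		abort();
	buf[0] = 0;
	for (i = 0; i < use; i++) {
		if (i > 0)
			strcat(buf, " ");
		strcat(buf, words[i]);
	}
	for (cp = buf; *cp != 0; cp++)
		if (!fragment_ok((unsigned char)*cp))
			*cp = '_';
	if (unique == 0)
		return buf;
	for (i = 0; i < ids->n && ent == NULL; i++)
		if (strcmp(ids->e[i].id, buf) == 0)
			ent = &ids->e[i];
	if (ent == NULL) {
		/* A full table or long id simply skips deduplication. */
		if (ids->n == sizeof(ids->e) / sizeof(ids->e[0]) ||
		    strlen(buf) >= sizeof(ids->e[0].id))
			return buf;
		ent = &ids->e[ids->n++];
		strcpy(ent->id, buf);
		ent->ord = 1;
	} else if (unique == 1)	/* first call here: next number */
		ent->ord++;
	if (ent->ord > 1)
		snprintf(buf + strlen(buf), 16, "_%d", ent->ord);
	return buf;
}
```
.Ed
Suppose two section headings are both titled SEE ALSO.
Each heading is passed first with
.Fa unique
set to 1 and then with 2.
The second heading then receives
.Ql SEE_ALSO_2
both for its
.Cm id
attribute and for its permalink.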
.Pp
The function
.Fn print_otag_id
opens a
.Fa tag
element of class
.Fa cattr
for the node
.Fa n .
If the flag
.Dv NODE_ID
is set in
.Fa n ,
it attempts to generate an
.Cm id
attribute with
.Fn html_make_id .
If the flag
.Dv NODE_HREF
is set in
.Fa n ,
an
.Aq Ic A
element of class
.Qq permalink
is added:
outside if
.Fa n
generates an element that can only occur in phrasing context,
or inside otherwise.
This function is a wrapper around
.Fn html_make_id
and
.Fn print_otag ,
automatically choosing the
.Fa unique
argument appropriately and passing
.Fa fmt
strings of
.Qq chR
for the
.Aq Ic A
element and
.Qq ci
for the
.Fa tag
element.
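.Pp
The placement rule can be illustrated by a sketch that only assembles
the start tags as a string.
The function
.Fn sk_otag_id_layout
and its parameters are hypothetical.
It omits attribute encoding and takes the identifier as an argument
instead of calling
.Fn html_make_id
twice.
It also uses single-quoted attribute values for brevity, so its output
is not byte-for-byte what mandoc writes.
.Bd -literal -offset indent
```c
#include <stdio.h>

/*
 * Hypothetical sketch of the markup print_otag_id() is described to
 * open.  id is what html_make_id() would return (NULL if none),
 * has_id and has_href stand for the NODE_ID and NODE_HREF flags, and
 * phrasing is set if tag may only occur in phrasing context.
 * Attribute values are neither encoded nor double-quoted here.
 */
int
sk_otag_id_layout(char *dst, size_t sz, const char *tag, const char *cattr,
    const char *id, int has_id, int has_href, int phrasing)
{
	char	 elem[128], link[128] = "";

	/* The element itself, as with fmt "ci": class and optional id. */
	if (has_id && id != NULL)
		snprintf(elem, sizeof(elem), "<%s class='%s' id='%s'>",
		    tag, cattr, id);
	else
		snprintf(elem, sizeof(elem), "<%s class='%s'>", tag, cattr);

	/* The permalink, as with fmt "chR": class and local href. */
	if (has_href && id != NULL)
		snprintf(link, sizeof(link),
		    "<a class='permalink' href='#%s'>", id);

	/* Outside phrasing-only elements, inside all others. */
	return phrasing ? snprintf(dst, sz, "%s%s", link, elem) :
	    snprintf(dst, sz, "%s%s", elem, link);
}
```
.Ed
For a heading element such as h1, the permalink ends up inside the heading.
A code element may only occur in phrasing context, so for code the permalink
ends up outside.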
.Pp
The function
.Fn print_endline
makes sure subsequent output starts on a new HTML output line.
If nothing was printed on the current output line yet, it has no effect.
Otherwise, it appends any buffered text to the current output line,
ends the line, and updates the internal state of the
.Fa h
object.
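.Pp
The documented behavior can be modelled with a sketch that keeps
completed lines in an array instead of writing newline characters to
standard output.
All names are hypothetical, and how mandoc decides which text to buffer
is not modelled.
.Bd -literal -offset indent
```c
#include <string.h>

/* Hypothetical line-oriented output state. */
struct sk_out {
	char	 line[128];	/* text already printed on the current line */
	char	 buf[64];	/* buffered text, not yet on the line */
	char	 done[8][128];	/* completed lines, standing in for stdout */
	size_t	 ndone;
};

static void
sk_cat(char *dst, size_t dstsz, const char *s)
{
	strncat(dst, s, dstsz - strlen(dst) - 1);	/* truncates */
}

/* Print text directly on the current line. */
void
sk_print(struct sk_out *o, const char *s)
{
	sk_cat(o->line, sizeof(o->line), s);
}

/* Hold text back without putting it on the line yet. */
void
sk_buffer(struct sk_out *o, const char *s)
{
	sk_cat(o->buf, sizeof(o->buf), s);
}

/* Modelled on the description of print_endline(). */
void
sk_endline(struct sk_out *o)
{
	if (o->line[0] == 0)
		return;			/* nothing printed yet: no effect */
	sk_cat(o->line, sizeof(o->line), o->buf);	/* flush buffer */
	o->buf[0] = 0;
	if (o->ndone < sizeof(o->done) / sizeof(o->done[0]))
		strcpy(o->done[o->ndone++], o->line);	/* end the line */
	o->line[0] = 0;
}
```
.Ed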
.Pp
The functions
.Fn print_eqn ,
.Fn print_tbl ,
and
.Fn print_tblclose
are not yet documented.
.Sh RETURN VALUES
The functions
.Fn print_otag
and
.Fn print_otag_id
return a pointer to a new element on the stack of HTML elements.
When
.Fn print_otag_id
opens two elements, a pointer to the outer one is returned.
The memory pointed to is owned by the library and is automatically
.Xr free 3 Ns d
when
.Fn print_tagq
is called on it or when
.Fn print_stagq
is called on a parent element.
.Pp
The function
.Fn html_fillmode
returns
.Dv ROFF_fi
if fill mode was active before the call or
.Dv ROFF_nf
otherwise.
.Pp
The function
.Fn html_make_id
returns a newly allocated string or
.Dv NULL
if
.Fa n
lacks text data to create the attribute from.
The caller is responsible for
.Xr free 3 Ns ing
the returned string after using it.
.Pp
In case of
.Xr malloc 3
failure, these functions do not return but call
.Xr err 3 .
.Sh FILES
.Bl -tag -width mandoc_aux.c -compact
.It Pa main.h
declarations of public functions for use by the main program,
not yet documented
.It Pa html.h
declarations of data types and private functions
for use by language-specific HTML formatters
.It Pa html.c
main HTML formatting engine and utility functions
.It Pa mdoc_html.c
.Xr mdoc 7
HTML formatter
.It Pa man_html.c
.Xr man 7
HTML formatter
.It Pa tbl_html.c
.Xr tbl 7
HTML formatter
.It Pa eqn_html.c
.Xr eqn 7
HTML formatter
.It Pa roff_html.c
.Xr roff 7
HTML formatter, handling requests like
.Ic br ,
.Ic ce ,
.Ic fi ,
.Ic ft ,
.Ic nf ,
.Ic rj ,
and
.Ic sp .
.It Pa out.h
declarations of data types and private functions
for shared use by all mandoc formatters,
not yet documented
.It Pa out.c
private functions for shared use by all mandoc formatters
.It Pa mandoc_aux.h
declarations of common mandoc utility functions, see
.Xr mandoc 3
.It Pa mandoc_aux.c
implementation of common mandoc utility functions
.El
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr mandoc 3 ,
.Xr man.cgi 8
.Sh AUTHORS
.An -nosplit
The mandoc HTML formatter was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv .
It is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org ,
who also wrote this manual.