.\" $Id: mandoc_escape.3,v 1.6 2023/10/23 14:46:22 schwarze Exp $
.\"
.\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: October 23 2023 $
.Dt MANDOC_ESCAPE 3
.Os
.Sh NAME
.Nm mandoc_escape
.Nd parse roff escape sequences
.Sh SYNOPSIS
.In sys/types.h
.In mandoc.h
.Ft "enum mandoc_esc"
.Fo mandoc_escape
.Fa "const char **end"
.Fa "const char **start"
.Fa "int *sz"
.Fc
.Sh DESCRIPTION
This function scans a
.Xr roff 7
escape sequence.
.Pp
An escape sequence consists of
.Bl -dash -compact -width 2n
.It
an initial backslash character
.Pq Sq \e ,
.It
a single ASCII character called the escape sequence identifier,
.It
and, with only a few exceptions, an argument.
.El
.Pp
Arguments can be given in the following forms; some escape sequence
identifiers only accept some of these forms as specified below.
The first three forms are called the standard forms.
.Bl -tag -width 2n
.It \&In brackets: Ic \&[ Ns Ar argument Ns Ic \&]
The argument starts after the initial
.Sq \&[ ,
ends before the final
.Sq \&] ,
and the escape sequence ends with the final
.Sq \&] .
.It Two-character argument short form: Ic \&( Ns Ar ar
This form can only be used for arguments
consisting of exactly two characters.
It has the same effect as
.Ic \&[ Ns Ar ar Ns Ic \&] .
.It One-character argument short form: Ar a
This form can only be used for arguments
consisting of exactly one character.
It has the same effect as
.Ic \&[ Ns Ar a Ns Ic \&] .
.It Delimited form: Ar C Ns Ar argument Ns Ar C
The argument starts after the initial delimiter character
.Ar C ,
ends before the next occurrence of the delimiter character
.Ar C ,
and the escape sequence ends with that second
.Ar C .
Some escape sequences allow arbitrary characters
.Ar C
as quoting characters, some restrict the range of characters
that can be used as quoting characters.
.El
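.Pp
The four argument forms above can be illustrated with a minimal sketch.
The helpers std_arg and delim_arg are hypothetical names invented for
this example and are not part of the mandoc library; they use size_t
for the length, ignore nested escape sequences, stop at the first
closing bracket, and do not apply any per-identifier delimiter
restrictions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandoc: locate an argument given in
 * one of the three standard forms.  p points just past the escape
 * sequence identifier.  On success, store the argument start and length
 * and return a pointer to the character after the sequence; return NULL
 * if the input ends before the argument is complete.
 */
static const char *
std_arg(const char *p, const char **start, size_t *sz)
{
	const char *close;

	assert(p != NULL && start != NULL && sz != NULL);
	switch (*p) {
	case '\0':
		return NULL;
	case '[':	/* bracketed form: [argument] */
		close = strchr(p + 1, ']');
		if (close == NULL)
			return NULL;
		*start = p + 1;
		*sz = (size_t)(close - *start);
		return close + 1;
	case '(':	/* two-character short form: (ar */
		if (p[1] == '\0' || p[2] == '\0')
			return NULL;
		*start = p + 1;
		*sz = 2;
		return p + 3;
	default:	/* one-character short form: a */
		*start = p;
		*sz = 1;
		return p + 1;
	}
}

/*
 * Hypothetical helper for the delimited form: the first character is
 * taken as the delimiter C, the argument runs up to the next C.
 */
static const char *
delim_arg(const char *p, const char **start, size_t *sz)
{
	const char *close;

	assert(p != NULL && start != NULL && sz != NULL);
	if (*p == '\0')
		return NULL;
	close = strchr(p + 1, *p);
	if (close == NULL)
		return NULL;
	*start = p + 1;
	*sz = (size_t)(close - *start);
	return close + 1;
}
```

Given the input [bu]x, std_arg would report the two-character argument
bu and return a pointer to the x; given (bux it would report the same
argument and position, which mirrors the statement that the short form
has the same effect as the bracketed form.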
.Pp
Upon function entry,
.Pf * Fa end
is expected to point to the escape sequence identifier.
The values passed in as
.Pf * Fa start
and
.Pf * Fa sz
are ignored and overwritten.
.Pp
By design, this function cannot handle those
.Xr roff 7
escape sequences that require in-place expansion, in particular
user-defined strings
.Ic \e* ,
number registers
.Ic \en ,
width measurements
.Ic \ew ,
and numerical expression control
.Ic \eB .
These are handled by
.Fn roff_expand ,
a private preprocessor function called from
.Fn roff_parseln
and
.Fn roff_getarg ,
see the file
.Pa roff.c .
.Pp
The function
.Fn mandoc_escape
is used
.Bl -dash -compact -width 2n
.It
recursively by itself, because some escape sequence arguments can
in turn contain other escape sequences,
.It
for parsing and error detection internally by the
.Xr roff 7
parser part of the
.Xr mandoc 3
library, see the file
.Pa roff.c ,
.It
occasionally by high-level parser and validation modules when they
need to skip escape sequences while scanning the input, see the files
.Pa mdoc.c ,
.Pa man.c ,
.Pa man_validate.c ,
.Pa eqn.c ,
and
.Pa tbl_data.c ,
.It
above all externally by the
.Xr mandoc 1
formatting modules, in particular
.Fl Tascii
and
.Fl Thtml ,
for formatting purposes, see the files
.Pa term.c
and
.Pa html.c ,
.It
and rarely externally by high-level utilities using the mandoc library,
for example
.Xr makewhatis 8 ,
to purge escape sequences from text.
.El
.Sh RETURN VALUES
Upon function return, the pointer
.Pf * Fa end
is set to the character after the end of the escape sequence,
such that the calling higher-level parser can easily continue.
.Pp
For escape sequences taking an argument, the pointer
.Pf * Fa start
is set to the beginning of the argument and
.Pf * Fa sz
is set to the length of the argument.
For escape sequences not taking an argument,
.Pf * Fa start
is set to the character after the end of the sequence and
.Pf * Fa sz
is set to 0.
Both
.Fa start
and
.Fa sz
may be
.Dv NULL ;
in that case, the argument and the length are not returned.
.Pp
For sequences taking an argument, the function
.Fn mandoc_escape
returns one of the following values:
.Bl -tag -width 2n
.It Dv ESCAPE_DEVICE
The escape sequence
.Ic \e*(.T
or
.Ic \e*[.T] .
.It Dv ESCAPE_FONT
The escape sequence
.Ic \ef
taking an argument in standard form:
.Ic \ef[ , \ef( , \ef Ns Ar a .
Two-character arguments starting with the character
.Sq C
are reduced to one-character arguments by skipping the
.Sq C .
More specific values are returned for the most commonly used arguments:
.Bl -column "argument" "ESCAPE_FONTITALIC"
.It argument Ta return value
.It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
.It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
.It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
.It Cm P Ta Dv ESCAPE_FONTPREV
.It Cm BI Ta Dv ESCAPE_FONTBI
.El
.It Dv ESCAPE_HLINE
The escape sequence
.Ic \eh
followed by an argument delimited by an arbitrary character.
.It Dv ESCAPE_HORIZ
The escape sequence
.Ic \el
followed by an argument delimited by an arbitrary character.
.It Dv ESCAPE_NUMBERED
The escape sequence
.Ic \eN
followed by a delimited argument.
The delimiter character is arbitrary except that digits cannot be used.
If a digit is encountered instead of the opening delimiter, that
digit is considered to be the argument and the end of the sequence, and
.Dv ESCAPE_IGNORE
is returned.
.Pp
Such ASCII character escape sequences can be rendered using the function
.Fn mchars_num2char
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_OVERSTRIKE
The escape sequence
.Ic \eo
followed by an argument delimited by an arbitrary character.
.It Dv ESCAPE_SPECIAL
The escape sequence
.Ic \eC
taking an argument delimited with the single quote character
and, as a special exception, the escape sequences
.Em not
having an identifier, that is, those where the argument, in standard
form, directly follows the initial backslash:
.Ic \eC' , \e[ , \e( , \e Ns Ar a .
Note that the one-character argument short form can only be used for
argument characters that do not clash with escape sequence identifiers.
.Pp
If the argument matches one of the forms described below under
.Dv ESCAPE_UNICODE ,
that value is returned instead.
.Pp
The
.Dv ESCAPE_SPECIAL
special character escape sequences can be rendered using the functions
.Fn mchars_spec2cp
and
.Fn mchars_spec2str
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_UNICODE
Escape sequences of the same format as described above under
.Dv ESCAPE_SPECIAL ,
but with an argument of the forms
.Ic u Ns Ar XXXX ,
.Ic u Ns Ar YXXXX ,
or
.Ic u10 Ns Ar XXXX
where
.Ar X
and
.Ar Y
are hexadecimal digits and
.Ar Y
is not zero:
.Ic \eC'u , \e[u .
As a special exception,
.Pf * Fa start
is set to the character after the
.Ic u ,
and the
.Pf * Fa sz
return value does not include the
.Ic u
either.
.Pp
Such Unicode character escape sequences can be rendered using the function
.Fn mchars_num2uc
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_IGNORE
Many escape sequences that
.Xr mandoc 1
intends to ignore, in particular:
.Bl -bullet -width 2n
.It
The escape sequence
.Ic \es
followed by an argument in standard form or by an argument delimited
by the single quote character:
.Ic \es' , \es[ , \es( , \es Ns Ar a .
As a special exception, an optional
.Sq +
or
.Sq \-
character is allowed after the
.Sq s
for all forms.
.It
The escape sequences
.Ic \eF ,
.Ic \ek ,
.Ic \eM ,
.Ic \em ,
.Ic \eO ,
and
.Ic \eY
followed by an argument in standard form.
.It
The escape sequences
.Ic \eb ,
.Ic \eD ,
.Ic \eR ,
.Ic \eX ,
and
.Ic \eZ
followed by an argument delimited by an arbitrary character.
.It
The escape sequences
.Ic \eH ,
.Ic \eL ,
.Ic \eS ,
.Ic \ev ,
and
.Ic \ex
followed by an argument delimited by a character that cannot occur
in numerical expressions.
However, if any character that can occur in numerical expressions
is found instead of a delimiter, the sequence is considered to end
with that character, and
.Dv ESCAPE_ERROR
is returned.
.It
The escape sequence
.Ic \eO
with a single-digit argument in the range from 1 to 4 inclusive.
.El
.It Dv ESCAPE_UNSUPP
An escape sequence that
.Xr mandoc 1
can parse, but for which formatting is unsupported, in particular
.Qq \eO0
and
.Qq \eO5 .
.It Dv ESCAPE_ERROR
Escape sequences taking an argument
where the actual argument contains a syntax error.
In particular, that happens if the end of the logical input line
is reached before the end of the argument.
.El
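.Pp
The argument forms listed under ESCAPE_UNICODE can be checked with a
short routine.
The following is only a sketch derived from the three forms given
above; is_unicode_arg is a hypothetical name, accepting both upper and
lower case hexadecimal digits is an assumption, and the actual
implementation may apply further checks that this manual does not
describe.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of mandoc: return 1 if the argument
 * (including the leading u) has one of the forms uXXXX, uYXXXX with
 * Y not zero, or u10XXXX, and 0 otherwise.
 */
static int
is_unicode_arg(const char *arg, size_t len)
{
	size_t i;

	assert(arg != NULL);
	/* 'u' plus four to six hexadecimal digits. */
	if (len < 5 || len > 7 || arg[0] != 'u')
		return 0;
	/* Five digits: the leading digit Y must not be zero. */
	if (len == 6 && arg[1] == '0')
		return 0;
	/* Six digits: the first two must be "10". */
	if (len == 7 && (arg[1] != '1' || arg[2] != '0'))
		return 0;
	for (i = 1; i < len; i++)
		if (!isxdigit((unsigned char)arg[i]))
			return 0;
	return 1;
}
```

When such a check succeeds, the digits begin at arg + 1 and span
len - 1 characters, which corresponds to the rule that the returned
start and length exclude the u.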
.Pp
For sequences that do not take an argument, the function
.Fn mandoc_escape
returns one of the following values:
.Bl -tag -width 2n
.It Dv ESCAPE_BREAK
The escape sequence
.Qq \ep .
.It Dv ESCAPE_IGNORE
Many escape sequences including
.Qq \e% ,
.Qq \e& ,
.Qq \e| ,
.Qq \ed ,
and
.Qq \eu .
.It Dv ESCAPE_NOSPACE
The escape sequence
.Qq \ec .
.It Dv ESCAPE_SKIPCHAR
The escape sequence
.Qq \ez .
.It Dv ESCAPE_UNSUPP
The escape sequences
.Qq \e! ,
.Qq \e? ,
and
.Qq \er .
.It Dv ESCAPE_UNDEF
Many escape sequences that other
.Xr roff 7
implementations do not define either, for example
.Qq \eG ,
.Qq \eI ,
.Qq \ei ,
.Qq \eJ ,
.Qq \ej ,
.Qq \eK ,
.Qq \eP ,
.Qq \eT ,
.Qq \eU ,
.Qq \eW ,
and
.Qq \ey .
.El
.Sh FILES
This function is implemented in
.Pa mandoc.c .
.Sh SEE ALSO
.Xr mchars_alloc 3 ,
.Xr mandoc_char 7 ,
.Xr roff 7
.Sh HISTORY
This function has been available since mandoc 1.11.2.
.Sh AUTHORS
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
.An Ingo Schwarze Aq Mt schwarze@openbsd.org