mandoc_escape.3 (01d4e2149e5566e5d9394913dc9fb032da259e0b) mandoc_escape.3 (c1c95add8c80843ba15d784f95c361d795b1f593)
1.\" $Id: mandoc_escape.3,v 1.4 2017/07/04 23:40:01 schwarze Exp $
1.\" $Id: mandoc_escape.3,v 1.6 2023/10/23 14:46:22 schwarze Exp $
2.\"
3.\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
4.\"
5.\" Permission to use, copy, modify, and distribute this software for any
6.\" purpose with or without fee is hereby granted, provided that the above
7.\" copyright notice and this permission notice appear in all copies.
8.\"
9.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
10.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
11.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
12.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
13.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
14.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
15.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
16.\"
17.Dd $Mdocdate: July 4 2017 $
17.Dd $Mdocdate: October 23 2023 $
18.Dt MANDOC_ESCAPE 3
19.Os
20.Sh NAME
21.Nm mandoc_escape
22.Nd parse roff escape sequences
23.Sh SYNOPSIS
24.In sys/types.h
25.In mandoc.h

--- 49 unchanged lines hidden ---

75.Ar C .
76Some escape sequences allow arbitrary characters
77.Ar C
78as quoting characters, some restrict the range of characters
79that can be used as quoting characters.
80.El
81.Pp
82Upon function entry,
18.Dt MANDOC_ESCAPE 3
19.Os
20.Sh NAME
21.Nm mandoc_escape
22.Nd parse roff escape sequences
23.Sh SYNOPSIS
24.In sys/types.h
25.In mandoc.h

--- 49 unchanged lines hidden ---

75.Ar C .
76Some escape sequences allow arbitrary characters
77.Ar C
78as quoting characters, some restrict the range of characters
79that can be used as quoting characters.
80.El
81.Pp
82Upon function entry,
83.Fa end
83.Pf * Fa end
84is expected to point to the escape sequence identifier.
85The values passed in as
84is expected to point to the escape sequence identifier.
85The values passed in as
86.Fa start
86.Pf * Fa start
87and
87and
88.Fa sz
88.Pf * Fa sz
89are ignored and overwritten.
90.Pp
91By design, this function cannot handle those
92.Xr roff 7
93escape sequences that require in-place expansion, in particular
94user-defined strings
95.Ic \e* ,
96number registers
97.Ic \en ,
98width measurements
99.Ic \ew ,
100and numerical expression control
101.Ic \eB .
102These are handled by
89are ignored and overwritten.
90.Pp
91By design, this function cannot handle those
92.Xr roff 7
93escape sequences that require in-place expansion, in particular
94user-defined strings
95.Ic \e* ,
96number registers
97.Ic \en ,
98width measurements
99.Ic \ew ,
100and numerical expression control
101.Ic \eB .
102These are handled by
103.Fn roff_res ,
103.Fn roff_expand ,
104a private preprocessor function called from
104a private preprocessor function called from
105.Fn roff_parseln ,
105.Fn roff_parseln
106and
107.Fn roff_getarg ,
106see the file
107.Pa roff.c .
108.Pp
109The function
110.Fn mandoc_escape
111is used
112.Bl -dash -compact -width 2n
113.It
114recursively by itself, because some escape sequence arguments can
115in turn contain other escape sequences,
116.It
108see the file
109.Pa roff.c .
110.Pp
111The function
112.Fn mandoc_escape
113is used
114.Bl -dash -compact -width 2n
115.It
116recursively by itself, because some escape sequence arguments can
117in turn contain other escape sequences,
118.It
117for error detection internally by the
119for parsing and error detection internally by the
118.Xr roff 7
119parser part of the
120.Xr mandoc 3
121library, see the file
122.Pa roff.c ,
123.It
120.Xr roff 7
121parser part of the
122.Xr mandoc 3
123library, see the file
124.Pa roff.c ,
125.It
126occasionally by high-level parser and validation modules when they
127need to skip escape sequences while scanning the input, see the files
128.Pa mdoc.c ,
129.Pa man.c ,
130.Pa man_validate.c ,
131.Pa eqn.c ,
132and
133.Pa tbl_data.c
134.It
124above all externally by the
125.Xr mandoc 1
126formatting modules, in particular
127.Fl Tascii
128and
129.Fl Thtml ,
130for formatting purposes, see the files
131.Pa term.c
132and
133.Pa html.c ,
134.It
135and rarely externally by high-level utilities using the mandoc library,
136for example
137.Xr makewhatis 8 ,
138to purge escape sequences from text.
139.El
140.Sh RETURN VALUES
141Upon function return, the pointer
135above all externally by the
136.Xr mandoc 1
137formatting modules, in particular
138.Fl Tascii
139and
140.Fl Thtml ,
141for formatting purposes, see the files
142.Pa term.c
143and
144.Pa html.c ,
145.It
146and rarely externally by high-level utilities using the mandoc library,
147for example
148.Xr makewhatis 8 ,
149to purge escape sequences from text.
150.El
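The last use case in the list above, purging escape sequences from text, depends on the parser leaving its position pointer just past each sequence. The loop below is a minimal sketch of that idea. It does not link against the mandoc library: `toy_skip()` and `purge_escapes()` are hypothetical names, and `toy_skip()` recognises only a few standard forms (`\(xx`, `\[name]`, `\f` with such an argument, and one-character sequences). A real caller would presumably invoke mandoc_escape() at that point instead.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for mandoc_escape(), NOT the real parser.
 * On entry, *end points to the character after the backslash;
 * on success, it is advanced to the character after the sequence.
 * Returns 0 on success, -1 if the input ends inside the sequence.
 */
static int
toy_skip(const char **end)
{
	const char *p = *end;

	if (*p == 'f')
		p++;			/* \f takes an argument in standard form */
	if (*p == '\0')
		return -1;
	if (*p == '(') {		/* two-character form */
		if (p[1] == '\0' || p[2] == '\0')
			return -1;
		p += 3;
	} else if (*p == '[') {		/* bracketed form, up to ']' */
		p = strchr(p, ']');
		if (p == NULL)
			return -1;
		p++;
	} else
		p++;			/* one-character form */
	*end = p;
	return 0;
}

/*
 * Copy src to dst, dropping every sequence toy_skip() recognises.
 * dst must provide room for at least strlen(src) + 1 bytes.
 */
static void
purge_escapes(char *dst, const char *src)
{
	while (*src != '\0') {
		if (*src != '\\') {
			*dst++ = *src++;
			continue;
		}
		src++;			/* step over the backslash */
		if (toy_skip(&src) == -1)
			break;		/* truncated sequence: drop the rest */
	}
	*dst = '\0';
}
```

Calling `purge_escapes(buf, "\\fBbold\\fP text")` leaves `bold text` in `buf`. Stopping at a truncated sequence is a choice made for this sketch only.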
151.Sh RETURN VALUES
152Upon function return, the pointer
142.Fa end
153.Pf * Fa end
143is set to the character after the end of the escape sequence,
144such that the calling higher-level parser can easily continue.
145.Pp
146For escape sequences taking an argument, the pointer
154is set to the character after the end of the escape sequence,
155such that the calling higher-level parser can easily continue.
156.Pp
157For escape sequences taking an argument, the pointer
147.Fa start
158.Pf * Fa start
148is set to the beginning of the argument and
159is set to the beginning of the argument and
149.Fa sz
160.Pf * Fa sz
150is set to the length of the argument.
151For escape sequences not taking an argument,
161is set to the length of the argument.
162For escape sequences not taking an argument,
152.Fa start
163.Pf * Fa start
153is set to the character after the end of the sequence and
164is set to the character after the end of the sequence and
154.Fa sz
165.Pf * Fa sz
155is set to 0.
156Both
157.Fa start
158and
159.Fa sz
160may be
161.Dv NULL ;
162in that case, the argument and the length are not returned.
163.Pp
164For sequences taking an argument, the function
165.Fn mandoc_escape
166returns one of the following values:
167.Bl -tag -width 2n
166is set to 0.
167Both
168.Fa start
169and
170.Fa sz
171may be
172.Dv NULL ;
173in that case, the argument and the length are not returned.
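The three-pointer contract described above can be illustrated with a small, self-contained sketch. The function `toy_escape()` is hypothetical and is not mandoc_escape(): it assumes a one-character identifier followed by an argument in standard form (`[name]`, `(xx`, or a single character) and returns 0 or -1 instead of an escape type. Redirecting NULL output pointers to local variables is one possible way to honour the rule that both may be NULL.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical miniature of the calling convention, NOT mandoc_escape().
 * On entry, *end points to the identifier; *start and *sz are ignored.
 * On success, *end points after the sequence, *start to the argument,
 * and *sz holds the argument length.
 * Returns 0 on success, -1 if the input ends inside the sequence.
 */
static int
toy_escape(const char **end, const char **start, int *sz)
{
	const char *local_start, *p, *close;
	int local_sz;

	/* Callers not interested in the argument may pass NULL. */
	if (start == NULL)
		start = &local_start;
	if (sz == NULL)
		sz = &local_sz;

	if (**end == '\0')
		return -1;
	p = *end + 1;			/* character after the identifier */

	switch (*p) {
	case '\0':
		return -1;
	case '(':			/* exactly two characters */
		if (p[1] == '\0' || p[2] == '\0')
			return -1;
		*start = p + 1;
		*sz = 2;
		*end = p + 3;
		break;
	case '[':			/* everything up to ']' */
		close = strchr(p, ']');
		if (close == NULL)
			return -1;
		*start = p + 1;
		*sz = (int)(close - *start);
		*end = close + 1;
		break;
	default:			/* a single character */
		*start = p;
		*sz = 1;
		*end = p + 1;
		break;
	}
	return 0;
}
```

With `end` pointing at the `f` of `f[BI]rest`, a successful call leaves `start` at `BI`, `sz` at 2, and `end` at `rest`, so the caller can resume scanning directly from `end`.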
174.Pp
175For sequences taking an argument, the function
176.Fn mandoc_escape
177returns one of the following values:
178.Bl -tag -width 2n
179.It Dv ESCAPE_DEVICE
180The escape sequence
181.Ic \e*(.T
182or
183.Ic \e*[.T] .
168.It Dv ESCAPE_FONT
169The escape sequence
170.Ic \ef
171taking an argument in standard form:
172.Ic \ef[ , \ef( , \ef Ns Ar a .
173Two-character arguments starting with the character
174.Sq C
175are reduced to one-character arguments by skipping the
176.Sq C .
177More specific values are returned for the most commonly used arguments:
178.Bl -column "argument" "ESCAPE_FONTITALIC"
179.It argument Ta return value
180.It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
181.It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
182.It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
183.It Cm P Ta Dv ESCAPE_FONTPREV
184.It Cm BI Ta Dv ESCAPE_FONTBI
185.El
184.It Dv ESCAPE_FONT
185The escape sequence
186.Ic \ef
187taking an argument in standard form:
188.Ic \ef[ , \ef( , \ef Ns Ar a .
189Two-character arguments starting with the character
190.Sq C
191are reduced to one-character arguments by skipping the
192.Sq C .
193More specific values are returned for the most commonly used arguments:
194.Bl -column "argument" "ESCAPE_FONTITALIC"
195.It argument Ta return value
196.It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
197.It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
198.It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
199.It Cm P Ta Dv ESCAPE_FONTPREV
200.It Cm BI Ta Dv ESCAPE_FONTBI
201.El
202.It Dv ESCAPE_HLINE
203The escape sequence
204.Ic \eh
205followed by an argument delimited by an arbitrary character.
206.It Dv ESCAPE_HORIZ
207The escape sequence
208.Ic \el
209followed by an argument delimited by an arbitrary character.
210.It Dv ESCAPE_NUMBERED
211The escape sequence
212.Ic \eN
213followed by a delimited argument.
214The delimiter character is arbitrary except that digits cannot be used.
215If a digit is encountered instead of the opening delimiter, that
216digit is considered to be the argument and the end of the sequence, and
217.Dv ESCAPE_IGNORE
218is returned.
219.Pp
220Such ASCII character escape sequences can be rendered using the function
221.Fn mchars_num2char
222described in the
223.Xr mchars_alloc 3
224manual.
225.It Dv ESCAPE_OVERSTRIKE
226The escape sequence
227.Ic \eo
228followed by an argument delimited by an arbitrary character.
186.It Dv ESCAPE_SPECIAL
187The escape sequence
188.Ic \eC
189taking an argument delimited with the single quote character
190and, as a special exception, the escape sequences
191.Em not
192having an identifier, that is, those where the argument, in standard
193form, directly follows the initial backslash:

--- 26 unchanged lines hidden ---

220.Ar X
221and
222.Ar Y
223are hexadecimal digits and
224.Ar Y
225is not zero:
226.Ic \eC'u , \e[u .
227As a special exception,
229.It Dv ESCAPE_SPECIAL
230The escape sequence
231.Ic \eC
232taking an argument delimited with the single quote character
233and, as a special exception, the escape sequences
234.Em not
235having an identifier, that is, those where the argument, in standard
236form, directly follows the initial backslash:

--- 26 unchanged lines hidden ---

263.Ar X
264and
265.Ar Y
266are hexadecimal digits and
267.Ar Y
268is not zero:
269.Ic \eC'u , \e[u .
270As a special exception,
228.Fa start
271.Pf * Fa start
229is set to the character after the
230.Ic u ,
231and the
272is set to the character after the
273.Ic u ,
274and the
232.Fa sz
275.Pf * Fa sz
233return value does not include the
234.Ic u
235either.
236.Pp
237Such Unicode character escape sequences can be rendered using the function
238.Fn mchars_num2uc
239described in the
240.Xr mchars_alloc 3
241manual.
276return value does not include the
277.Ic u
278either.
279.Pp
280Such Unicode character escape sequences can be rendered using the function
281.Fn mchars_num2uc
282described in the
283.Xr mchars_alloc 3
284manual.
242.It Dv ESCAPE_NUMBERED
243The escape sequence
244.Ic \eN
245followed by a delimited argument.
246The delimiter character is arbitrary except that digits cannot be used.
247If a digit is encountered instead of the opening delimiter, that
248digit is considered to be the argument and the end of the sequence, and
249.Dv ESCAPE_IGNORE
250is returned.
251.Pp
252Such ASCII character escape sequences can be rendered using the function
253.Fn mchars_num2char
254described in the
255.Xr mchars_alloc 3
256manual.
257.It Dv ESCAPE_OVERSTRIKE
258The escape sequence
259.Ic \eo
260followed by an argument delimited by an arbitrary character.
261.It Dv ESCAPE_IGNORE
285.It Dv ESCAPE_IGNORE
286Many escape sequences that
287.Xr mandoc 1
288intends to ignore, in particular:
262.Bl -bullet -width 2n
263.It
264The escape sequence
265.Ic \es
266followed by an argument in standard form or by an argument delimited
267by the single quote character:
268.Ic \es' , \es[ , \es( , \es Ns Ar a .
269As a special exception, an optional
270.Sq +
271or
272.Sq \-
273character is allowed after the
274.Sq s
275for all forms.
276.It
277The escape sequences
278.Ic \eF ,
289.Bl -bullet -width 2n
290.It
291The escape sequence
292.Ic \es
293followed by an argument in standard form or by an argument delimited
294by the single quote character:
295.Ic \es' , \es[ , \es( , \es Ns Ar a .
296As a special exception, an optional
297.Sq +
298or
299.Sq \-
300character is allowed after the
301.Sq s
302for all forms.
303.It
304The escape sequences
305.Ic \eF ,
279.Ic \eg ,
280.Ic \ek ,
281.Ic \eM ,
282.Ic \em ,
306.Ic \ek ,
307.Ic \eM ,
308.Ic \em ,
283.Ic \en ,
284.Ic \eV ,
309.Ic \eO ,
285and
286.Ic \eY
287followed by an argument in standard form.
288.It
289The escape sequences
310and
311.Ic \eY
312followed by an argument in standard form.
313.It
314The escape sequences
290.Ic \eA ,
291.Ic \eb ,
292.Ic \eD ,
293.Ic \eR ,
294.Ic \eX ,
295and
296.Ic \eZ
297followed by an argument delimited by an arbitrary character.
298.It
299The escape sequences
300.Ic \eH ,
315.Ic \eb ,
316.Ic \eD ,
317.Ic \eR ,
318.Ic \eX ,
319and
320.Ic \eZ
321followed by an argument delimited by an arbitrary character.
322.It
323The escape sequences
324.Ic \eH ,
301.Ic \eh ,
302.Ic \eL ,
325.Ic \eL ,
303.Ic \el ,
304.Ic \eS ,
305.Ic \ev ,
306and
307.Ic \ex
308followed by an argument delimited by a character that cannot occur
309in numerical expressions.
310However, if any character that can occur in numerical expressions
311is found instead of a delimiter, the sequence is considered to end
312with that character, and
313.Dv ESCAPE_ERROR
314is returned.
326.Ic \eS ,
327.Ic \ev ,
328and
329.Ic \ex
330followed by an argument delimited by a character that cannot occur
331in numerical expressions.
332However, if any character that can occur in numerical expressions
333is found instead of a delimiter, the sequence is considered to end
334with that character, and
335.Dv ESCAPE_ERROR
336is returned.
337.It
338The escape sequence
339.Ic \eO
340with a single-digit argument in the range from 1 to 4 inclusive.
315.El
341.El
342.It Dv ESCAPE_UNSUPP
343An escape sequence that
344.Xr mandoc 1
345can parse, but for which formatting is unsupported, in particular
346.Qq \eO0
347and
348.Qq \eO5 .
316.It Dv ESCAPE_ERROR
349.It Dv ESCAPE_ERROR
317Escape sequences taking an argument but not matching any of the above patterns.
350Escape sequences taking an argument
351where the actual argument contains a syntax error.
318In particular, that happens if the end of the logical input line
319is reached before the end of the argument.
320.El
321.Pp
322For sequences that do not take an argument, the function
323.Fn mandoc_escape
324returns one of the following values:
325.Bl -tag -width 2n
352In particular, that happens if the end of the logical input line
353is reached before the end of the argument.
354.El
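The font argument table in the ESCAPE_FONT entry above can be read as a small classifier. The sketch below is an illustration only: the enum `toy_font` and the function `toy_font_class()` are hypothetical names local to this example, not declarations from mandoc.h. It follows only the description in this manual (the column table plus the rule that a leading `C` in a two-character argument is skipped) and has not been checked against mandoc.c.

```c
#include <stddef.h>

/* Hypothetical result codes mirroring the table; not the mandoc.h enum. */
enum toy_font {
	TOY_FONT,		/* any other font argument */
	TOY_ROMAN,
	TOY_ITALIC,
	TOY_BOLD,
	TOY_PREV,
	TOY_BI
};

/*
 * Classify a font argument of length sz (not NUL-terminated),
 * as it would be described by the start and sz output values.
 */
static enum toy_font
toy_font_class(const char *arg, size_t sz)
{
	/* Two-character arguments starting with 'C' lose the 'C'. */
	if (sz == 2 && arg[0] == 'C') {
		arg++;
		sz--;
	}
	if (sz == 2 && arg[0] == 'B' && arg[1] == 'I')
		return TOY_BI;
	if (sz != 1)
		return TOY_FONT;

	switch (arg[0]) {
	case 'R':
	case '1':
		return TOY_ROMAN;
	case 'I':
	case '2':
		return TOY_ITALIC;
	case 'B':
	case '3':
		return TOY_BOLD;
	case 'P':
		return TOY_PREV;
	default:
		return TOY_FONT;
	}
}
```

Under these assumptions, `toy_font_class("CB", 2)` yields the same result as `toy_font_class("B", 1)`, and any argument not listed in the table falls back to the generic value.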
355.Pp
356For sequences that do not take an argument, the function
357.Fn mandoc_escape
358returns one of the following values:
359.Bl -tag -width 2n
326.It Dv ESCAPE_SKIPCHAR
360.It Dv ESCAPE_BREAK
327The escape sequence
361The escape sequence
328.Qq \ez .
362.Qq \ep .
363.It Dv ESCAPE_IGNORE
364Many escape sequences including
365.Qq \e% ,
366.Qq \e& ,
367.Qq \e| ,
368.Qq \ed ,
369and
370.Qq \eu .
329.It Dv ESCAPE_NOSPACE
330The escape sequence
331.Qq \ec .
371.It Dv ESCAPE_NOSPACE
372The escape sequence
373.Qq \ec .
332.It Dv ESCAPE_IGNORE
374.It Dv ESCAPE_SKIPCHAR
375The escape sequence
376.Qq \ez .
377.It Dv ESCAPE_UNSUPP
333The escape sequences
378The escape sequences
334.Qq \ed
379.Qq \e! ,
380.Qq \e? ,
335and
381and
336.Qq \eu .
382.Qq \er .
383.It Dv ESCAPE_UNDEF
384Many escape sequences that other
385.Xr roff 7
386implementations do not define either, for example
387.Qq \eG ,
388.Qq \eI ,
389.Qq \ei ,
390.Qq \eJ ,
391.Qq \ej ,
392.Qq \eK ,
393.Qq \eP ,
394.Qq \eT ,
395.Qq \eU ,
396.Qq \eW ,
397and
398.Qq \ey .
337.El
338.Sh FILES
339This function is implemented in
340.Pa mandoc.c .
341.Sh SEE ALSO
342.Xr mchars_alloc 3 ,
343.Xr mandoc_char 7 ,
344.Xr roff 7
345.Sh HISTORY
346This function has been available since mandoc 1.11.2.
347.Sh AUTHORS
348.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
349.An Ingo Schwarze Aq Mt schwarze@openbsd.org
399.El
400.Sh FILES
401This function is implemented in
402.Pa mandoc.c .
403.Sh SEE ALSO
404.Xr mchars_alloc 3 ,
405.Xr mandoc_char 7 ,
406.Xr roff 7
407.Sh HISTORY
408This function has been available since mandoc 1.11.2.
409.Sh AUTHORS
410.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
411.An Ingo Schwarze Aq Mt schwarze@openbsd.org
350.Sh BUGS
351The function doesn't cleanly distinguish between sequences that are
352valid and supported, valid and ignored, valid and unsupported,
353syntactically invalid, or undefined.
354For sequences that are ignored or unsupported, it doesn't tell
355whether that deficiency is likely to cause major formatting problems
356and/or loss of document content.
357The function is already rather complicated and still parses some
358sequences incorrectly.
359.
360.ig
361For these sequences, the list given below specifies a starting string
362and either the length of the argument or an ending character.
363The argument starts after the starting string.
364In the former case, the sequence ends with the end of the argument.
365In the latter case, the argument ends before the ending character,
366and the sequence ends with the ending character.
367..