mandoc_escape.3 (01d4e2149e5566e5d9394913dc9fb032da259e0b) | mandoc_escape.3 (c1c95add8c80843ba15d784f95c361d795b1f593) |
---|---|
1.\" $Id: mandoc_escape.3,v 1.4 2017/07/04 23:40:01 schwarze Exp $ | 1.\" $Id: mandoc_escape.3,v 1.6 2023/10/23 14:46:22 schwarze Exp $ |
2.\" 3.\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org> 4.\" 5.\" Permission to use, copy, modify, and distribute this software for any 6.\" purpose with or without fee is hereby granted, provided that the above 7.\" copyright notice and this permission notice appear in all copies. 8.\" 9.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES 10.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF 11.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR 12.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES 13.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN 14.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF 15.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 16.\" | 2.\" 3.\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org> 4.\" 5.\" Permission to use, copy, modify, and distribute this software for any 6.\" purpose with or without fee is hereby granted, provided that the above 7.\" copyright notice and this permission notice appear in all copies. 8.\" 9.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES 10.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF 11.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR 12.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES 13.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN 14.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF 15.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 16.\" |
17.Dd $Mdocdate: July 4 2017 $ | 17.Dd $Mdocdate: October 23 2023 $ |
18.Dt MANDOC_ESCAPE 3 19.Os 20.Sh NAME 21.Nm mandoc_escape 22.Nd parse roff escape sequences 23.Sh SYNOPSIS 24.In sys/types.h 25.In mandoc.h --- 49 unchanged lines hidden --- 75.Ar C . 76Some escape sequences allow arbitrary characters 77.Ar C 78as quoting characters, some restrict the range of characters 79that can be used as quoting characters. 80.El 81.Pp 82Upon function entry, | 18.Dt MANDOC_ESCAPE 3 19.Os 20.Sh NAME 21.Nm mandoc_escape 22.Nd parse roff escape sequences 23.Sh SYNOPSIS 24.In sys/types.h 25.In mandoc.h --- 49 unchanged lines hidden --- 75.Ar C . 76Some escape sequences allow arbitrary characters 77.Ar C 78as quoting characters, some restrict the range of characters 79that can be used as quoting characters. 80.El 81.Pp 82Upon function entry, |
83.Fa end | 83.Pf * Fa end |
84is expected to point to the escape sequence identifier. 85The values passed in as | 84is expected to point to the escape sequence identifier. 85The values passed in as |
86.Fa start | 86.Pf * Fa start |
87and | 87and |
88.Fa sz | 88.Pf * Fa sz |
89are ignored and overwritten. 90.Pp 91By design, this function cannot handle those 92.Xr roff 7 93escape sequences that require in-place expansion, in particular 94user-defined strings 95.Ic \e* , 96number registers 97.Ic \en , 98width measurements 99.Ic \ew , 100and numerical expression control 101.Ic \eB . 102These are handled by | 89are ignored and overwritten. 90.Pp 91By design, this function cannot handle those 92.Xr roff 7 93escape sequences that require in-place expansion, in particular 94user-defined strings 95.Ic \e* , 96number registers 97.Ic \en , 98width measurements 99.Ic \ew , 100and numerical expression control 101.Ic \eB . 102These are handled by |
103.Fn roff_res , | 103.Fn roff_expand , |
104a private preprocessor function called from | 104a private preprocessor function called from |
105.Fn roff_parseln , | 105.Fn roff_parseln 106and 107.Fn roff_getarg , |
106see the file 107.Pa roff.c . 108.Pp 109The function 110.Fn mandoc_escape 111is used 112.Bl -dash -compact -width 2n 113.It 114recursively by itself, because some escape sequence arguments can 115in turn contain other escape sequences, 116.It | 108see the file 109.Pa roff.c . 110.Pp 111The function 112.Fn mandoc_escape 113is used 114.Bl -dash -compact -width 2n 115.It 116recursively by itself, because some escape sequence arguments can 117in turn contain other escape sequences, 118.It |
117for error detection internally by the | 119for parsing and error detection internally by the |
118.Xr roff 7 119parser part of the 120.Xr mandoc 3 121library, see the file 122.Pa roff.c , 123.It | 120.Xr roff 7 121parser part of the 122.Xr mandoc 3 123library, see the file 124.Pa roff.c , 125.It |
| 126occasionally by high-level parser and validation modules when they 127need to skip escape sequences while scanning the input, see the files 128.Pa mdoc.c , 129.Pa man.c , 130.Pa man_validate.c , 131.Pa eqn.c , 132and 133.Pa tbl_data.c , 134.It |
|
124above all externally by the 125.Xr mandoc 1 126formatting modules, in particular 127.Fl Tascii 128and 129.Fl Thtml , 130for formatting purposes, see the files 131.Pa term.c 132and 133.Pa html.c , 134.It 135and rarely externally by high-level utilities using the mandoc library, 136for example 137.Xr makewhatis 8 , 138to purge escape sequences from text. 139.El 140.Sh RETURN VALUES 141Upon function return, the pointer | 135above all externally by the 136.Xr mandoc 1 137formatting modules, in particular 138.Fl Tascii 139and 140.Fl Thtml , 141for formatting purposes, see the files 142.Pa term.c 143and 144.Pa html.c , 145.It 146and rarely externally by high-level utilities using the mandoc library, 147for example 148.Xr makewhatis 8 , 149to purge escape sequences from text. 150.El 151.Sh RETURN VALUES 152Upon function return, the pointer |
142.Fa end | 153.Pf * Fa end |
143is set to the character after the end of the escape sequence, 144such that the calling higher-level parser can easily continue. 145.Pp 146For escape sequences taking an argument, the pointer | 154is set to the character after the end of the escape sequence, 155such that the calling higher-level parser can easily continue. 156.Pp 157For escape sequences taking an argument, the pointer |
147.Fa start | 158.Pf * Fa start |
148is set to the beginning of the argument and | 159is set to the beginning of the argument and |
149.Fa sz | 160.Pf * Fa sz |
150is set to the length of the argument. 151For escape sequences not taking an argument, | 161is set to the length of the argument. 162For escape sequences not taking an argument, |
152.Fa start | 163.Pf * Fa start |
153is set to the character after the end of the sequence and | 164is set to the character after the end of the sequence and |
154.Fa sz | 165.Pf * Fa sz |
155is set to 0. 156Both 157.Fa start 158and 159.Fa sz 160may be 161.Dv NULL ; 162in that case, the argument and the length are not returned. 163.Pp 164For sequences taking an argument, the function 165.Fn mandoc_escape 166returns one of the following values: 167.Bl -tag -width 2n | 166is set to 0. 167Both 168.Fa start 169and 170.Fa sz 171may be 172.Dv NULL ; 173in that case, the argument and the length are not returned. 174.Pp 175For sequences taking an argument, the function 176.Fn mandoc_escape 177returns one of the following values: 178.Bl -tag -width 2n |
179.It Dv ESCAPE_DEVICE 180The escape sequence 181.Ic \e*(.T 182or 183.Ic \e*[.T] . |
|
168.It Dv ESCAPE_FONT 169The escape sequence 170.Ic \ef 171taking an argument in standard form: 172.Ic \ef[ , \ef( , \ef Ns Ar a . 173Two-character arguments starting with the character 174.Sq C 175are reduced to one-character arguments by skipping the 176.Sq C . 177More specific values are returned for the most commonly used arguments: 178.Bl -column "argument" "ESCAPE_FONTITALIC" 179.It argument Ta return value 180.It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN 181.It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC 182.It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD 183.It Cm P Ta Dv ESCAPE_FONTPREV 184.It Cm BI Ta Dv ESCAPE_FONTBI 185.El | 184.It Dv ESCAPE_FONT 185The escape sequence 186.Ic \ef 187taking an argument in standard form: 188.Ic \ef[ , \ef( , \ef Ns Ar a . 189Two-character arguments starting with the character 190.Sq C 191are reduced to one-character arguments by skipping the 192.Sq C . 193More specific values are returned for the most commonly used arguments: 194.Bl -column "argument" "ESCAPE_FONTITALIC" 195.It argument Ta return value 196.It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN 197.It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC 198.It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD 199.It Cm P Ta Dv ESCAPE_FONTPREV 200.It Cm BI Ta Dv ESCAPE_FONTBI 201.El |
202.It Dv ESCAPE_HLINE 203The escape sequence 204.Ic \eh 205followed by an argument delimited by an arbitrary character. 206.It Dv ESCAPE_HORIZ 207The escape sequence 208.Ic \el 209followed by an argument delimited by an arbitrary character. 210.It Dv ESCAPE_NUMBERED 211The escape sequence 212.Ic \eN 213followed by a delimited argument. 214The delimiter character is arbitrary except that digits cannot be used. 215If a digit is encountered instead of the opening delimiter, that 216digit is considered to be the argument and the end of the sequence, and 217.Dv ESCAPE_IGNORE 218is returned. 219.Pp 220Such ASCII character escape sequences can be rendered using the function 221.Fn mchars_num2char 222described in the 223.Xr mchars_alloc 3 224manual. 225.It Dv ESCAPE_OVERSTRIKE 226The escape sequence 227.Ic \eo 228followed by an argument delimited by an arbitrary character. |
|
186.It Dv ESCAPE_SPECIAL 187The escape sequence 188.Ic \eC 189taking an argument delimited with the single quote character 190and, as a special exception, the escape sequences 191.Em not 192having an identifier, that is, those where the argument, in standard 193form, directly follows the initial backslash: --- 26 unchanged lines hidden --- 220.Ar X 221and 222.Ar Y 223are hexadecimal digits and 224.Ar Y 225is not zero: 226.Ic \eC'u , \e[u . 227As a special exception, | 229.It Dv ESCAPE_SPECIAL 230The escape sequence 231.Ic \eC 232taking an argument delimited with the single quote character 233and, as a special exception, the escape sequences 234.Em not 235having an identifier, that is, those where the argument, in standard 236form, directly follows the initial backslash: --- 26 unchanged lines hidden --- 263.Ar X 264and 265.Ar Y 266are hexadecimal digits and 267.Ar Y 268is not zero: 269.Ic \eC'u , \e[u . 270As a special exception, |
228.Fa start | 271.Pf * Fa start |
229is set to the character after the 230.Ic u , 231and the | 272is set to the character after the 273.Ic u , 274and the |
232.Fa sz | 275.Pf * Fa sz |
233return value does not include the 234.Ic u 235either. 236.Pp 237Such Unicode character escape sequences can be rendered using the function 238.Fn mchars_num2uc 239described in the 240.Xr mchars_alloc 3 241manual. | 276return value does not include the 277.Ic u 278either. 279.Pp 280Such Unicode character escape sequences can be rendered using the function 281.Fn mchars_num2uc 282described in the 283.Xr mchars_alloc 3 284manual. |
242.It Dv ESCAPE_NUMBERED 243The escape sequence 244.Ic \eN 245followed by a delimited argument. 246The delimiter character is arbitrary except that digits cannot be used. 247If a digit is encountered instead of the opening delimiter, that 248digit is considered to be the argument and the end of the sequence, and 249.Dv ESCAPE_IGNORE 250is returned. 251.Pp 252Such ASCII character escape sequences can be rendered using the function 253.Fn mchars_num2char 254described in the 255.Xr mchars_alloc 3 256manual. 257.It Dv ESCAPE_OVERSTRIKE 258The escape sequence 259.Ic \eo 260followed by an argument delimited by an arbitrary character. | |
261.It Dv ESCAPE_IGNORE | 285.It Dv ESCAPE_IGNORE |
286Many escape sequences that 287.Xr mandoc 1 288intends to ignore, in particular: |
|
262.Bl -bullet -width 2n 263.It 264The escape sequence 265.Ic \es 266followed by an argument in standard form or by an argument delimited 267by the single quote character: 268.Ic \es' , \es[ , \es( , \es Ns Ar a . 269As a special exception, an optional 270.Sq + 271or 272.Sq \- 273character is allowed after the 274.Sq s 275for all forms. 276.It 277The escape sequences 278.Ic \eF , | 289.Bl -bullet -width 2n 290.It 291The escape sequence 292.Ic \es 293followed by an argument in standard form or by an argument delimited 294by the single quote character: 295.Ic \es' , \es[ , \es( , \es Ns Ar a . 296As a special exception, an optional 297.Sq + 298or 299.Sq \- 300character is allowed after the 301.Sq s 302for all forms. 303.It 304The escape sequences 305.Ic \eF , |
279.Ic \eg , | |
280.Ic \ek , 281.Ic \eM , 282.Ic \em , | 306.Ic \ek , 307.Ic \eM , 308.Ic \em , |
283.Ic \en , 284.Ic \eV , | 309.Ic \eO , |
285and 286.Ic \eY 287followed by an argument in standard form. 288.It 289The escape sequences | 310and 311.Ic \eY 312followed by an argument in standard form. 313.It 314The escape sequences |
290.Ic \eA , | |
291.Ic \eb , 292.Ic \eD , 293.Ic \eR , 294.Ic \eX , 295and 296.Ic \eZ 297followed by an argument delimited by an arbitrary character. 298.It 299The escape sequences 300.Ic \eH , | 315.Ic \eb , 316.Ic \eD , 317.Ic \eR , 318.Ic \eX , 319and 320.Ic \eZ 321followed by an argument delimited by an arbitrary character. 322.It 323The escape sequences 324.Ic \eH , |
301.Ic \eh , | |
302.Ic \eL , | 325.Ic \eL , |
303.Ic \el , | |
304.Ic \eS , 305.Ic \ev , 306and 307.Ic \ex 308followed by an argument delimited by a character that cannot occur 309in numerical expressions. 310However, if any character that can occur in numerical expressions 311is found instead of a delimiter, the sequence is considered to end 312with that character, and 313.Dv ESCAPE_ERROR 314is returned. | 326.Ic \eS , 327.Ic \ev , 328and 329.Ic \ex 330followed by an argument delimited by a character that cannot occur 331in numerical expressions. 332However, if any character that can occur in numerical expressions 333is found instead of a delimiter, the sequence is considered to end 334with that character, and 335.Dv ESCAPE_ERROR 336is returned. |
337.It 338The escape sequences 339.Ic \eO 340with a single-digit argument in the range from 1 to 4 inclusive. |
|
315.El | 341.El |
| 342.It Dv ESCAPE_UNSUPP 343An escape sequence that 344.Xr mandoc 1 345can parse, but for which formatting is unsupported, in particular 346.Qq \eO0 347and 348.Qq \eO5 . |
|
316.It Dv ESCAPE_ERROR | 349.It Dv ESCAPE_ERROR |
317Escape sequences taking an argument but not matching any of the above patterns. | 350Escape sequences taking an argument 351where the actual argument contains a syntax error. |
318In particular, that happens if the end of the logical input line 319is reached before the end of the argument. 320.El 321.Pp 322For sequences that do not take an argument, the function 323.Fn mandoc_escape 324returns one of the following values: 325.Bl -tag -width 2n | 352In particular, that happens if the end of the logical input line 353is reached before the end of the argument. 354.El 355.Pp 356For sequences that do not take an argument, the function 357.Fn mandoc_escape 358returns one of the following values: 359.Bl -tag -width 2n |
326.It Dv ESCAPE_SKIPCHAR | 360.It Dv ESCAPE_BREAK |
327The escape sequence | 361The escape sequence |
328.Qq \ez . | 362.Qq \ep . 363.It Dv ESCAPE_IGNORE 364Many escape sequences including 365.Qq \e% , 366.Qq \e& , 367.Qq \e| , 368.Qq \ed , 369and 370.Qq \eu . |
329.It Dv ESCAPE_NOSPACE 330The escape sequence 331.Qq \ec . | 371.It Dv ESCAPE_NOSPACE 372The escape sequence 373.Qq \ec . |
332.It Dv ESCAPE_IGNORE | 374.It Dv ESCAPE_SKIPCHAR 375The escape sequence 376.Qq \ez . 377.It Dv ESCAPE_UNSUPP |
333The escape sequences | 378The escape sequences |
334.Qq \ed | 379.Qq \e! , 380.Qq \e? , |
335and | 381and |
336.Qq \eu . | 382.Qq \er . 383.It Dv ESCAPE_UNDEF 384Many escape sequences that other 385.Xr roff 7 386implementations do not define either, for example 387.Qq \eG , 388.Qq \eI , 389.Qq \ei , 390.Qq \eJ , 391.Qq \ej , 392.Qq \eK , 393.Qq \eP , 394.Qq \eT , 395.Qq \eU , 396.Qq \eW , 397and 398.Qq \ey . |
337.El 338.Sh FILES 339This function is implemented in 340.Pa mandoc.c . 341.Sh SEE ALSO 342.Xr mchars_alloc 3 , 343.Xr mandoc_char 7 , 344.Xr roff 7 345.Sh HISTORY 346This function has been available since mandoc 1.11.2. 347.Sh AUTHORS 348.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv 349.An Ingo Schwarze Aq Mt schwarze@openbsd.org | 399.El 400.Sh FILES 401This function is implemented in 402.Pa mandoc.c . 403.Sh SEE ALSO 404.Xr mchars_alloc 3 , 405.Xr mandoc_char 7 , 406.Xr roff 7 407.Sh HISTORY 408This function has been available since mandoc 1.11.2. 409.Sh AUTHORS 410.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv 411.An Ingo Schwarze Aq Mt schwarze@openbsd.org |
350.Sh BUGS 351The function doesn't cleanly distinguish between sequences that are 352valid and supported, valid and ignored, valid and unsupported, 353syntactically invalid, or undefined. 354For sequences that are ignored or unsupported, it doesn't tell 355whether that deficiency is likely to cause major formatting problems 356and/or loss of document content. 357The function is already rather complicated and still parses some 358sequences incorrectly. 359. 360.ig 361For these sequences, the list given below specifies a starting string 362and either the length of the argument or an ending character. 363The argument starts after the starting string. 364In the former case, the sequence ends with the end of the argument. 365In the latter case, the argument ends before the ending character, 366and the sequence ends with the ending character. 367.. | |
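The calling convention documented in this manual (on entry, `*end` points to the escape sequence identifier, that is, the character after the backslash; on return, `*end` points past the sequence while `*start` and `*sz` describe the argument, and `start` and `sz` may be `NULL`) can be illustrated with a small self-contained C sketch. It does not link against libmandoc and is not the code in `mandoc.c`: `toy_escape()`, `toy_font()`, `toy_finish()` and the `enum toy_esc` values are hypothetical names invented for illustration, only the three parameter names are taken from the manual, and only `\f` in standard form plus a few argument-less sequences from the revised (right-hand) column are recognized.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a few documented ESCAPE_* values;
 * this is NOT the enum mandoc_esc declared in mandoc.h. */
enum toy_esc {
	TOY_ESCAPE_ERROR, TOY_ESCAPE_UNDEF, TOY_ESCAPE_IGNORE,
	TOY_ESCAPE_FONT, TOY_ESCAPE_FONTROMAN, TOY_ESCAPE_FONTITALIC,
	TOY_ESCAPE_FONTBOLD, TOY_ESCAPE_FONTPREV, TOY_ESCAPE_FONTBI,
	TOY_ESCAPE_BREAK, TOY_ESCAPE_NOSPACE, TOY_ESCAPE_SKIPCHAR
};

/* Map a \f argument to the more specific values from the manual's table. */
static enum toy_esc
toy_font(const char *arg, int len)
{
	if (len == 2 && arg[0] == 'C') {	/* "CB" is reduced to "B" */
		arg++;
		len--;
	}
	if (len == 1) {
		switch (arg[0]) {
		case 'R': case '1': return TOY_ESCAPE_FONTROMAN;
		case 'I': case '2': return TOY_ESCAPE_FONTITALIC;
		case 'B': case '3': return TOY_ESCAPE_FONTBOLD;
		case 'P': return TOY_ESCAPE_FONTPREV;
		}
	} else if (len == 2 && arg[0] == 'B' && arg[1] == 'I')
		return TOY_ESCAPE_FONTBI;
	return TOY_ESCAPE_FONT;
}

/* Store the results; as documented, start and sz may be NULL. */
static enum toy_esc
toy_finish(const char **end, const char *next, const char **start,
    const char *arg, int *sz, int len, enum toy_esc rc)
{
	*end = next;
	if (start != NULL)
		*start = arg;
	if (sz != NULL)
		*sz = len;
	return rc;
}

/* On entry, *end points to the escape identifier, i.e. the character
 * after the backslash; on return, it points past the whole sequence. */
enum toy_esc
toy_escape(const char **end, const char **start, int *sz)
{
	const char *cp, *arg;
	int len;

	assert(end != NULL && *end != NULL);
	cp = *end;
	switch (*cp++) {
	case 'f':	/* argument in standard form: \fX, \f(XY, \f[...] */
		if (*cp == '(') {
			arg = ++cp;
			if (cp[0] == '\0' || cp[1] == '\0')
				break;
			return toy_finish(end, cp + 2, start, arg, sz, 2,
			    toy_font(arg, 2));
		}
		if (*cp == '[') {
			arg = ++cp;
			while (*cp != '\0' && *cp != ']')
				cp++;
			if (*cp == '\0')
				break;
			len = (int)(cp - arg);
			return toy_finish(end, cp + 1, start, arg, sz, len,
			    toy_font(arg, len));
		}
		if (*cp == '\0')
			break;
		return toy_finish(end, cp + 1, start, cp, sz, 1,
		    toy_font(cp, 1));
	/* No argument: *start is the next character and *sz is 0. */
	case 'p':
		return toy_finish(end, cp, start, cp, sz, 0, TOY_ESCAPE_BREAK);
	case 'c':
		return toy_finish(end, cp, start, cp, sz, 0, TOY_ESCAPE_NOSPACE);
	case 'z':
		return toy_finish(end, cp, start, cp, sz, 0, TOY_ESCAPE_SKIPCHAR);
	case '%': case '&': case '|': case 'd': case 'u':
		return toy_finish(end, cp, start, cp, sz, 0, TOY_ESCAPE_IGNORE);
	case '\0':
		cp--;	/* backslash at the very end of the line */
		break;
	default:	/* the real function recognizes many more sequences */
		return toy_finish(end, cp, start, cp, sz, 0, TOY_ESCAPE_UNDEF);
	}
	/* The logical line ended before the argument did. */
	cp += strlen(cp);
	return toy_finish(end, cp, start, cp, sz, 0, TOY_ESCAPE_ERROR);
}
```

In this sketch, a caller that finds a backslash steps past it, passes a pointer to the next character, and resumes scanning at `*end` regardless of the return value, because even the error path leaves `*end` on the terminating NUL. Routing every failure through a single exit keeps the `NULL` checks for `start` and `sz` in one place.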