.\" $Id: mandoc_escape.3,v 1.6 2023/10/23 14:46:22 schwarze Exp $
.\"
.\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: October 23 2023 $
.Dt MANDOC_ESCAPE 3
.Os
.Sh NAME
.Nm mandoc_escape
.Nd parse roff escape sequences
.Sh SYNOPSIS
.In sys/types.h
.In mandoc.h
.Ft "enum mandoc_esc"
.Fo mandoc_escape
.Fa "const char **end"
.Fa "const char **start"
.Fa "int *sz"
.Fc
.Sh DESCRIPTION
This function scans a
.Xr roff 7
escape sequence.
.Pp
An escape sequence consists of
.Bl -dash -compact -width 2n
.It
an initial backslash character
.Pq Sq \e ,
.It
a single ASCII character called the escape sequence identifier,
.It
and, with only a few exceptions, an argument.
.El
.Pp
Arguments can be given in the following forms; some escape sequence
identifiers only accept some of these forms as specified below.
The first three forms are called the standard forms.
.Bl -tag -width 2n
.It \&In brackets: Ic \&[ Ns Ar argument Ns Ic \&]
The argument starts after the initial
.Sq \&[ ,
ends before the final
.Sq \&] ,
and the escape sequence ends with the final
.Sq \&] .
.It Two-character argument short form: Ic \&( Ns Ar ar
This form can only be used for arguments
consisting of exactly two characters.
It has the same effect as
.Ic \&[ Ns Ar ar Ns Ic \&] .
.It One-character argument short form: Ar a
This form can only be used for arguments
consisting of exactly one character.
It has the same effect as
.Ic \&[ Ns Ar a Ns Ic \&] .
.It Delimited form: Ar C Ns Ar argument Ns Ar C
The argument starts after the initial delimiter character
.Ar C ,
ends before the next occurrence of the delimiter character
.Ar C ,
and the escape sequence ends with that second
.Ar C .
Some escape sequences allow arbitrary characters
.Ar C
as quoting characters, some restrict the range of characters
that can be used as quoting characters.
.El
.Pp
Upon function entry,
.Pf * Fa end
is expected to point to the escape sequence identifier.
The values passed in as
.Pf * Fa start
and
.Pf * Fa sz
are ignored and overwritten.
.Pp
By design, this function cannot handle those
.Xr roff 7
escape sequences that require in-place expansion, in particular
user-defined strings
.Ic \e* ,
number registers
.Ic \en ,
width measurements
.Ic \ew ,
and numerical expression control
.Ic \eB .
These are handled by
.Fn roff_expand ,
a private preprocessor function called from
.Fn roff_parseln
and
.Fn roff_getarg ,
see the file
.Pa roff.c .
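.Pp
As an illustration only, the three standard argument forms described
above can be scanned roughly as in the following minimal C sketch.
It is not the code from
.Pa mandoc.c :
the name
.Fn scan_std_arg
is hypothetical, the delimited form is omitted, and escape sequences
nested inside an argument are not handled.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the mandoc library.
 * Scan an argument given in one of the three standard forms.
 * On entry, p points to the first character after the escape
 * sequence identifier.  On success, store the position and length
 * of the argument and return a pointer to the character after the
 * end of the escape sequence.  Return NULL if the input ends
 * before the argument does.
 * Unlike mandoc_escape(), this sketch requires non-NULL start and sz.
 */
const char *
scan_std_arg(const char *p, const char **start, int *sz)
{
	const char *close;

	switch (*p) {
	case 0:		/* nothing follows the identifier */
		return NULL;
	case '[':	/* bracketed form: [argument] */
		close = strchr(p + 1, ']');
		if (close == NULL)
			return NULL;
		*start = p + 1;
		*sz = (int)(close - (p + 1));
		return close + 1;
	case '(':	/* two-character short form: (ar */
		if (p[1] == 0 || p[2] == 0)
			return NULL;
		*start = p + 1;
		*sz = 2;
		return p + 3;
	default:	/* one-character short form: a */
		*start = p;
		*sz = 1;
		return p + 1;
	}
}
```
.Ed
.Pp
For example, given the input
.Qq [ar]x ,
this sketch sets
.Pf * Fa start
to point at
.Qq ar ,
sets
.Pf * Fa sz
to 2, and returns a pointer to the
.Qq x .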
.Pp
The function
.Fn mandoc_escape
is used
.Bl -dash -compact -width 2n
.It
recursively by itself, because some escape sequence arguments can
in turn contain other escape sequences,
.It
for parsing and error detection internally by the
.Xr roff 7
parser part of the
.Xr mandoc 3
library, see the file
.Pa roff.c ,
.It
occasionally by high-level parser and validation modules when they
need to skip escape sequences while scanning the input, see the files
.Pa mdoc.c ,
.Pa man.c ,
.Pa man_validate.c ,
.Pa eqn.c ,
and
.Pa tbl_data.c ,
.It
above all externally by the
.Xr mandoc 1
formatting modules, in particular
.Fl Tascii
and
.Fl Thtml ,
for formatting purposes, see the files
.Pa term.c
and
.Pa html.c ,
.It
and rarely externally by high-level utilities using the mandoc library,
for example
.Xr makewhatis 8 ,
to purge escape sequences from text.
.El
.Sh RETURN VALUES
Upon function return, the pointer
.Pf * Fa end
is set to the character after the end of the escape sequence,
such that the calling higher-level parser can easily continue.
.Pp
For escape sequences taking an argument, the pointer
.Pf * Fa start
is set to the beginning of the argument and
.Pf * Fa sz
is set to the length of the argument.
For escape sequences not taking an argument,
.Pf * Fa start
is set to the character after the end of the sequence and
.Pf * Fa sz
is set to 0.
Both
.Fa start
and
.Fa sz
may be
.Dv NULL ;
in that case, the argument and the length are not returned.
.Pp
For sequences taking an argument, the function
.Fn mandoc_escape
returns one of the following values:
.Bl -tag -width 2n
.It Dv ESCAPE_DEVICE
The escape sequence
.Ic \e*(.T
or
.Ic \e*[.T] .
.It Dv ESCAPE_FONT
The escape sequence
.Ic \ef
taking an argument in standard form:
.Ic \ef[ , \ef( , \ef Ns Ar a .
Two-character arguments starting with the character
.Sq C
are reduced to one-character arguments by skipping the
.Sq C .
More specific values are returned for the most commonly used arguments:
.Bl -column "argument" "ESCAPE_FONTITALIC"
.It argument Ta return value
.It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
.It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
.It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
.It Cm P Ta Dv ESCAPE_FONTPREV
.It Cm BI Ta Dv ESCAPE_FONTBI
.El
.It Dv ESCAPE_HLINE
The escape sequence
.Ic \el
followed by an argument delimited by an arbitrary character.
.It Dv ESCAPE_HORIZ
The escape sequence
.Ic \eh
followed by an argument delimited by an arbitrary character.
.It Dv ESCAPE_NUMBERED
The escape sequence
.Ic \eN
followed by a delimited argument.
The delimiter character is arbitrary except that digits cannot be used.
If a digit is encountered instead of the opening delimiter, that
digit is considered to be the argument and the end of the sequence, and
.Dv ESCAPE_IGNORE
is returned.
.Pp
Such ASCII character escape sequences can be rendered using the function
.Fn mchars_num2char
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_OVERSTRIKE
The escape sequence
.Ic \eo
followed by an argument delimited by an arbitrary character.
.It Dv ESCAPE_SPECIAL
The escape sequence
.Ic \eC
taking an argument delimited with the single quote character
and, as a special exception, the escape sequences
.Em not
having an identifier, that is, those where the argument, in standard
form, directly follows the initial backslash:
.Ic \eC' , \e[ , \e( , \e Ns Ar a .
Note that the one-character argument short form can only be used for
argument characters that do not clash with escape sequence identifiers.
.Pp
If the argument matches one of the forms described below under
.Dv ESCAPE_UNICODE ,
that value is returned instead.
.Pp
The
.Dv ESCAPE_SPECIAL
special character escape sequences can be rendered using the functions
.Fn mchars_spec2cp
and
.Fn mchars_spec2str
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_UNICODE
Escape sequences of the same format as described above under
.Dv ESCAPE_SPECIAL ,
but with an argument of the forms
.Ic u Ns Ar XXXX ,
.Ic u Ns Ar YXXXX ,
or
.Ic u10 Ns Ar XXXX
where
.Ar X
and
.Ar Y
are hexadecimal digits and
.Ar Y
is not zero:
.Ic \eC'u , \e[u .
As a special exception,
.Pf * Fa start
is set to the character after the
.Ic u ,
and the
.Pf * Fa sz
return value does not include the
.Ic u
either.
.Pp
Such Unicode character escape sequences can be rendered using the function
.Fn mchars_num2uc
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_IGNORE
Many escape sequences that
.Xr mandoc 1
intends to ignore, in particular:
.Bl -bullet -width 2n
.It
The escape sequence
.Ic \es
followed by an argument in standard form or by an argument delimited
by the single quote character:
.Ic \es' , \es[ , \es( , \es Ns Ar a .
As a special exception, an optional
.Sq +
or
.Sq \-
character is allowed after the
.Sq s
for all forms.
.It
The escape sequences
.Ic \eF ,
.Ic \ek ,
.Ic \eM ,
.Ic \em ,
.Ic \eO ,
and
.Ic \eY
followed by an argument in standard form.
.It
The escape sequences
.Ic \eb ,
.Ic \eD ,
.Ic \eR ,
.Ic \eX ,
and
.Ic \eZ
followed by an argument delimited by an arbitrary character.
.It
The escape sequences
.Ic \eH ,
.Ic \eL ,
.Ic \eS ,
.Ic \ev ,
and
.Ic \ex
followed by an argument delimited by a character that cannot occur
in numerical expressions.
However, if any character that can occur in numerical expressions
is found instead of a delimiter, the sequence is considered to end
with that character, and
.Dv ESCAPE_ERROR
is returned.
.It
The escape sequence
.Ic \eO
with a single-digit argument in the range from 1 to 4 inclusive.
.El
.It Dv ESCAPE_UNSUPP
An escape sequence that
.Xr mandoc 1
can parse, but for which formatting is unsupported, in particular
.Qq \eO0
and
.Qq \eO5 .
.It Dv ESCAPE_ERROR
Escape sequences taking an argument
where the actual argument contains a syntax error.
In particular, that happens if the end of the logical input line
is reached before the end of the argument.
.El
.Pp
For sequences that do not take an argument, the function
.Fn mandoc_escape
returns one of the following values:
.Bl -tag -width 2n
.It Dv ESCAPE_BREAK
The escape sequence
.Qq \ep .
.It Dv ESCAPE_IGNORE
Many escape sequences including
.Qq \e% ,
.Qq \e& ,
.Qq \e| ,
.Qq \ed ,
and
.Qq \eu .
.It Dv ESCAPE_NOSPACE
The escape sequence
.Qq \ec .
.It Dv ESCAPE_SKIPCHAR
The escape sequence
.Qq \ez .
.It Dv ESCAPE_UNSUPP
The escape sequences
.Qq \e! ,
.Qq \e? ,
and
.Qq \er .
.It Dv ESCAPE_UNDEF
Many escape sequences that other
.Xr roff 7
implementations do not define either, for example
.Qq \eG ,
.Qq \eI ,
.Qq \ei ,
.Qq \eJ ,
.Qq \ej ,
.Qq \eK ,
.Qq \eP ,
.Qq \eT ,
.Qq \eU ,
.Qq \eW ,
and
.Qq \ey .
.El
.Sh FILES
This function is implemented in
.Pa mandoc.c .
.Sh SEE ALSO
.Xr mchars_alloc 3 ,
.Xr mandoc_char 7 ,
.Xr roff 7
.Sh HISTORY
This function has been available since mandoc 1.11.2.
.Sh AUTHORS
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
.An Ingo Schwarze Aq Mt schwarze@openbsd.org