.\" $Id: mandoc_escape.3,v 1.4 2017/07/04 23:40:01 schwarze Exp $
.\"
.\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: July 4 2017 $
.Dt MANDOC_ESCAPE 3
.Os
.Sh NAME
.Nm mandoc_escape
.Nd parse roff escape sequences
.Sh SYNOPSIS
.In sys/types.h
.In mandoc.h
.Ft "enum mandoc_esc"
.Fo mandoc_escape
.Fa "const char **end"
.Fa "const char **start"
.Fa "int *sz"
.Fc
.Sh DESCRIPTION
This function scans a
.Xr roff 7
escape sequence.
.Pp
An escape sequence consists of
.Bl -dash -compact -width 2n
.It
an initial backslash character
.Pq Sq \e ,
.It
a single ASCII character called the escape sequence identifier,
.It
and, with only a few exceptions, an argument.
.El
.Pp
Arguments can be given in the following forms; some escape sequence
identifiers only accept some of these forms as specified below.
The first three forms are called the standard forms.
.Bl -tag -width 2n
.It \&In brackets: Ic \&[ Ns Ar argument Ns Ic \&]
The argument starts after the initial
.Sq \&[ ,
ends before the final
.Sq \&] ,
and the escape sequence ends with the final
.Sq \&] .
.It Two-character argument short form: Ic \&( Ns Ar ar
This form can only be used for arguments
consisting of exactly two characters.
It has the same effect as
.Ic \&[ Ns Ar ar Ns Ic \&] .
.It One-character argument short form: Ar a
This form can only be used for arguments
consisting of exactly one character.
It has the same effect as
.Ic \&[ Ns Ar a Ns Ic \&] .
.It Delimited form: Ar C Ns Ar argument Ns Ar C
The argument starts after the initial delimiter character
.Ar C ,
ends before the next occurrence of the delimiter character
.Ar C ,
and the escape sequence ends with that second
.Ar C .
Some escape sequences allow arbitrary characters
.Ar C
as quoting characters, some restrict the range of characters
that can be used as quoting characters.
.El
.Pp
Upon function entry,
.Fa end
is expected to point to the escape sequence identifier.
The values passed in as
.Fa start
and
.Fa sz
are ignored and overwritten.
.Pp
By design, this function cannot handle those
.Xr roff 7
escape sequences that require in-place expansion, in particular
user-defined strings
.Ic \e* ,
number registers
.Ic \en ,
width measurements
.Ic \ew ,
and numerical expression control
.Ic \eB .
These are handled by
.Fn roff_res ,
a private preprocessor function called from
.Fn roff_parseln ,
see the file
.Pa roff.c .
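.Pp
The three standard argument forms described above can be illustrated
with the following minimal sketch.
It is not the code used in
.Pa mandoc.c .
The helper
.Fn std_arg
is hypothetical and only mirrors the way
.Fn mandoc_escape
reports results through
.Fa end ,
.Fa start ,
and
.Fa sz .
Unlike the real function, it assumes that
.Fa end
already points to the first character after the escape sequence
identifier, and it ignores the delimited form, nested escape sequences,
and all identifier-specific rules.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandoc: scan an argument given
 * in one of the three standard forms.  On entry, *end points to the
 * first character after the escape sequence identifier.  On success,
 * *start and *sz describe the argument, *end points to the character
 * after the end of the sequence, and 0 is returned.  If the input
 * ends before the argument is complete, -1 is returned and nothing
 * is modified.
 */
static int
std_arg(const char **end, const char **start, int *sz)
{
	const char	*cp = *end;
	const char	*close;

	switch (*cp) {
	case 0:		/* Nothing follows the identifier. */
		return -1;
	case '[':	/* In brackets: [argument] */
		if ((close = strchr(cp + 1, ']')) == NULL)
			return -1;
		*start = cp + 1;
		*sz = (int)(close - (cp + 1));
		*end = close + 1;
		return 0;
	case '(':	/* Two-character short form: (ar */
		if (cp[1] == 0 || cp[2] == 0)
			return -1;
		*start = cp + 1;
		*sz = 2;
		*end = cp + 3;
		return 0;
	default:	/* One-character short form: a */
		*start = cp;
		*sz = 1;
		*end = cp + 1;
		return 0;
	}
}
```
.Ed
.Pp
For example, given the input
.Ql [bold]x ,
the sketch sets
.Fa start
to point at
.Ql bold ,
.Fa sz
to 4, and
.Fa end
to point at
.Ql x .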
.Pp
The function
.Fn mandoc_escape
is used
.Bl -dash -compact -width 2n
.It
recursively by itself, because some escape sequence arguments can
in turn contain other escape sequences,
.It
for error detection internally by the
.Xr roff 7
parser part of the
.Xr mandoc 3
library, see the file
.Pa roff.c ,
.It
above all externally by the
.Xr mandoc 1
formatting modules, in particular
.Fl Tascii
and
.Fl Thtml ,
for formatting purposes, see the files
.Pa term.c
and
.Pa html.c ,
.It
and rarely externally by high-level utilities using the mandoc library,
for example
.Xr makewhatis 8 ,
to purge escape sequences from text.
.El
.Sh RETURN VALUES
Upon function return, the pointer
.Fa end
is set to the character after the end of the escape sequence,
such that the calling higher-level parser can easily continue.
.Pp
For escape sequences taking an argument, the pointer
.Fa start
is set to the beginning of the argument and
.Fa sz
is set to the length of the argument.
For escape sequences not taking an argument,
.Fa start
is set to the character after the end of the sequence and
.Fa sz
is set to 0.
Both
.Fa start
and
.Fa sz
may be
.Dv NULL ;
in that case, the argument and the length are not returned.
.Pp
For sequences taking an argument, the function
.Fn mandoc_escape
returns one of the following values:
.Bl -tag -width 2n
.It Dv ESCAPE_FONT
The escape sequence
.Ic \ef
taking an argument in standard form:
.Ic \ef[ , \ef( , \ef Ns Ar a .
Two-character arguments starting with the character
.Sq C
are reduced to one-character arguments by skipping the
.Sq C .
More specific values are returned for the most commonly used arguments:
.Bl -column "argument" "ESCAPE_FONTITALIC"
.It argument Ta return value
.It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
.It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
.It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
.It Cm P Ta Dv ESCAPE_FONTPREV
.It Cm BI Ta Dv ESCAPE_FONTBI
.El
.It Dv ESCAPE_SPECIAL
The escape sequence
.Ic \eC
taking an argument delimited with the single quote character
and, as a special exception, the escape sequences
.Em not
having an identifier, that is, those where the argument, in standard
form, directly follows the initial backslash:
.Ic \eC' , \e[ , \e( , \e Ns Ar a .
Note that the one-character argument short form can only be used for
argument characters that do not clash with escape sequence identifiers.
.Pp
If the argument matches one of the forms described below under
.Dv ESCAPE_UNICODE ,
that value is returned instead.
.Pp
The
.Dv ESCAPE_SPECIAL
special character escape sequences can be rendered using the functions
.Fn mchars_spec2cp
and
.Fn mchars_spec2str
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_UNICODE
Escape sequences of the same format as described above under
.Dv ESCAPE_SPECIAL ,
but with an argument of the forms
.Ic u Ns Ar XXXX ,
.Ic u Ns Ar YXXXX ,
or
.Ic u10 Ns Ar XXXX
where
.Ar X
and
.Ar Y
are hexadecimal digits and
.Ar Y
is not zero:
.Ic \eC'u , \e[u .
As a special exception,
.Fa start
is set to the character after the
.Ic u ,
and the
.Fa sz
return value does not include the
.Ic u
either.
.Pp
Such Unicode character escape sequences can be rendered using the function
.Fn mchars_num2uc
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_NUMBERED
The escape sequence
.Ic \eN
followed by a delimited argument.
The delimiter character is arbitrary except that digits cannot be used.
If a digit is encountered instead of the opening delimiter, that
digit is considered to be the argument and the end of the sequence, and
.Dv ESCAPE_IGNORE
is returned.
.Pp
Such ASCII character escape sequences can be rendered using the function
.Fn mchars_num2char
described in the
.Xr mchars_alloc 3
manual.
.It Dv ESCAPE_OVERSTRIKE
The escape sequence
.Ic \eo
followed by an argument delimited by an arbitrary character.
.It Dv ESCAPE_IGNORE
.Bl -bullet -width 2n
.It
The escape sequence
.Ic \es
followed by an argument in standard form or by an argument delimited
by the single quote character:
.Ic \es' , \es[ , \es( , \es Ns Ar a .
As a special exception, an optional
.Sq +
or
.Sq \-
character is allowed after the
.Sq s
for all forms.
.It
The escape sequences
.Ic \eF ,
.Ic \eg ,
.Ic \ek ,
.Ic \eM ,
.Ic \em ,
.Ic \en ,
.Ic \eV ,
and
.Ic \eY
followed by an argument in standard form.
.It
The escape sequences
.Ic \eA ,
.Ic \eb ,
.Ic \eD ,
.Ic \eR ,
.Ic \eX ,
and
.Ic \eZ
followed by an argument delimited by an arbitrary character.
.It
The escape sequences
.Ic \eH ,
.Ic \eh ,
.Ic \eL ,
.Ic \el ,
.Ic \eS ,
.Ic \ev ,
and
.Ic \ex
followed by an argument delimited by a character that cannot occur
in numerical expressions.
However, if any character that can occur in numerical expressions
is found instead of a delimiter, the sequence is considered to end
with that character, and
.Dv ESCAPE_ERROR
is returned.
.El
.It Dv ESCAPE_ERROR
Escape sequences taking an argument but not matching any of the above patterns.
In particular, that happens if the end of the logical input line
is reached before the end of the argument.
.El
.Pp
For sequences that do not take an argument, the function
.Fn mandoc_escape
returns one of the following values:
.Bl -tag -width 2n
.It Dv ESCAPE_SKIPCHAR
The escape sequence
.Qq \ez .
.It Dv ESCAPE_NOSPACE
The escape sequence
.Qq \ec .
.It Dv ESCAPE_IGNORE
The escape sequences
.Qq \ed
and
.Qq \eu .
.El
.Sh FILES
This function is implemented in
.Pa mandoc.c .
.Sh SEE ALSO
.Xr mchars_alloc 3 ,
.Xr mandoc_char 7 ,
.Xr roff 7
.Sh HISTORY
This function has been available since mandoc 1.11.2.
.Sh AUTHORS
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
.An Ingo Schwarze Aq Mt schwarze@openbsd.org
.Sh BUGS
The function doesn't cleanly distinguish between sequences that are
valid and supported, valid and ignored, valid and unsupported,
syntactically invalid, or undefined.
For sequences that are ignored or unsupported, it doesn't tell
whether that deficiency is likely to cause major formatting problems
and/or loss of document content.
The function is already rather complicated and still parses some
sequences incorrectly.
.
.ig
For these sequences, the list given below specifies a starting string
and either the length of the argument or an ending character.
The argument starts after the starting string.
In the former case, the sequence ends with the end of the argument.
In the latter case, the argument ends before the ending character,
and the sequence ends with the ending character.
..