.\"	$NetBSD: unvis.3,v 1.30 2019/05/08 15:37:41 bad Exp $
.\"
.\" Copyright (c) 1989, 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     @(#)unvis.3	8.2 (Berkeley) 12/11/93
.\"
.Dd May 8, 2019
.Dt UNVIS 3
.Os
.Sh NAME
.Nm unvis ,
.Nm strunvis ,
.Nm strnunvis ,
.Nm strunvisx ,
.Nm strnunvisx
.Nd decode a visual representation of characters
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In vis.h
.Ft int
.Fn unvis "char *cp" "int c" "int *astate" "int flag"
.Ft int
.Fn strunvis "char *dst" "const char *src"
.Ft int
.Fn strnunvis "char *dst" "size_t dlen" "const char *src"
.Ft int
.Fn strunvisx "char *dst" "const char *src" "int flag"
.Ft int
.Fn strnunvisx "char *dst" "size_t dlen" "const char *src" "int flag"
.Sh DESCRIPTION
The
.Fn unvis ,
.Fn strunvis ,
.Fn strnunvis ,
.Fn strunvisx ,
and
.Fn strnunvisx
functions
are used to decode a visual representation of characters, as produced
by the
.Xr vis 3
function, back into
the original form.
.Pp
The
.Fn unvis
function is called with successive characters in
.Ar c
until a valid sequence is recognized, at which time the decoded
character is available at the character pointed to by
.Ar cp .
.Pp
The
.Fn strunvis
function decodes the characters pointed to by
.Ar src
into the buffer pointed to by
.Ar dst .
The
.Fn strunvis
function simply copies
.Ar src
to
.Ar dst ,
decoding any escape sequences along the way,
and returns the number of characters placed into
.Ar dst ,
or \-1 if an
invalid escape sequence was detected.
The size of
.Ar dst
should be equal to the size of
.Ar src
(that is, no expansion takes place during decoding).
.Pp
The
.Fn strunvisx
and
.Fn strnunvisx
functions do the same as the
.Fn strunvis
and
.Fn strnunvis
functions,
but take a flag that specifies the style the string
.Ar src
is encoded with.
The meaning of the flag is the same as explained below for
.Fn unvis .
.Pp
The
.Fn unvis
function implements a state machine that can be used to decode an
arbitrary stream of bytes.
All state associated with the bytes being decoded is stored outside the
.Fn unvis
function (that is, a pointer to the state is passed in), so
calls decoding different streams can be freely intermixed.
To start decoding a stream of bytes, first initialize an integer to zero.
Call
.Fn unvis
with each successive byte, along with a pointer
to this integer, and a pointer to a destination character.
The
.Fn unvis
function has several return codes that must be handled properly.
They are:
.Bl -tag -width UNVIS_VALIDPUSH
.It Li \&0 No (zero)
Another character is necessary; nothing has been recognized yet.
.It Dv UNVIS_VALID
A valid character has been recognized and is available at the location
pointed to by
.Fa cp .
.It Dv UNVIS_VALIDPUSH
A valid character has been recognized and is available at the location
pointed to by
.Fa cp ;
however, the character currently passed in should be passed in again.
.It Dv UNVIS_NOCHAR
A valid sequence was detected, but no character was produced.
This return code is necessary to indicate a logical break between characters.
.It Dv UNVIS_SYNBAD
An invalid escape sequence was detected, or the decoder is in an unknown state.
The decoder is placed into the starting state.
.El
.Pp
When all bytes in the stream have been processed, call
.Fn unvis
one more time with flag set to
.Dv UNVIS_END
to extract any remaining character (the character passed in is ignored).
.Pp
The
.Fa flag
argument is also used to specify the encoding style of the source.
If set to
.Dv VIS_NOESCAPE ,
.Fn unvis
will not decode backslash
.Pq Ql \e
escapes.
If set to
.Dv VIS_HTTPSTYLE
or
.Dv VIS_HTTP1808 ,
.Fn unvis
will decode URI strings as specified in RFC 1808.
If set to
.Dv VIS_HTTP1866 ,
.Fn unvis
will decode entity references and numeric character references
as specified in RFC 1866.
If set to
.Dv VIS_MIMESTYLE ,
.Fn unvis
will decode MIME Quoted-Printable strings as specified in RFC 2045.
.Pp
The following code fragment illustrates a proper use of
.Fn unvis .
.Bd -literal -offset indent
int state = 0;	/* decoder state; must start at zero */
int ch;
char out;

while ((ch = getchar()) != EOF) {
again:
	switch (unvis(&out, ch, &state, 0)) {
	case 0:
	case UNVIS_NOCHAR:
		break;
	case UNVIS_VALID:
		(void)putchar(out);
		break;
	case UNVIS_VALIDPUSH:
		/* out is valid, but ch must be passed in again */
		(void)putchar(out);
		goto again;
	case UNVIS_SYNBAD:
		errx(EXIT_FAILURE, "Bad character sequence!");
	}
}
/* flush any character still pending in the decoder */
if (unvis(&out, '\e0', &state, UNVIS_END) == UNVIS_VALID)
	(void)putchar(out);
.Ed
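.Pp
For whole strings, the string functions avoid the explicit state machine.
The following is only a minimal sketch, not taken from the library sources:
the sample input, the buffer size and the variable names are assumptions
chosen for illustration.
It decodes a percent-encoded string with
.Fn strnunvisx
and relies on the error behavior described in
.Sx ERRORS .
.Bd -literal -offset indent
#include <err.h>
#include <stdio.h>
#include <stdlib.h>
#include <vis.h>

int
main(void)
{
	const char *src = "hello%20world";	/* illustrative input */
	char dst[32];	/* decoding never expands, so >= strlen(src) + 1 */

	/* returns -1 and sets errno on a bad sequence or a short buffer */
	if (strnunvisx(dst, sizeof(dst), src, VIS_HTTPSTYLE) == -1)
		err(EXIT_FAILURE, "strnunvisx");
	(void)puts(dst);
	return EXIT_SUCCESS;
}
.Ed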
.Sh ERRORS
The functions
.Fn strunvis ,
.Fn strnunvis ,
.Fn strunvisx ,
and
.Fn strnunvisx
will return \-1 on error and set
.Va errno
to:
.Bl -tag -width Er
.It Bq Er EINVAL
An invalid escape sequence was detected, or the decoder is in an unknown state.
.El
.Pp
In addition, the functions
.Fn strnunvis
and
.Fn strnunvisx
can also set
.Va errno
on error to:
.Bl -tag -width Er
.It Bq Er ENOSPC
Not enough space to perform the conversion.
.El
.Sh SEE ALSO
.Xr unvis 1 ,
.Xr vis 1 ,
.Xr vis 3
.Rs
.%A R. Fielding
.%T Relative Uniform Resource Locators
.%O RFC1808
.Re
.Sh HISTORY
The
.Fn unvis
function
first appeared in
.Bx 4.4 .
The
.Fn strnunvis
and
.Fn strnunvisx
functions appeared in
.Nx 6.0
and
.Fx 9.2 .
.Sh BUGS
The names
.Dv VIS_HTTP1808
and
.Dv VIS_HTTP1866
are wrong.
Percent-encoding was defined in RFC 1738, the original RFC for URLs.
RFC 1866 defines HTML 2.0, an application of SGML, from which it
inherits the concepts of numeric character references and entity
references.