.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright 2020 Robert Mustacchi
.\"
.Dd April 23, 2020
.Dt C16RTOMB 3C
.Os
.Sh NAME
.Nm c16rtomb ,
.Nm c32rtomb ,
.Nm wcrtomb ,
.Nm wcrtomb_l
.Nd convert wide-characters to character sequences
.Sh SYNOPSIS
.In uchar.h
.Ft size_t
.Fo c16rtomb
.Fa "char *restrict str"
.Fa "char16_t c16"
.Fa "mbstate_t *restrict ps"
.Fc
.Ft size_t
.Fo c32rtomb
.Fa "char *restrict str"
.Fa "char32_t c32"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.Ft size_t
.Fo wcrtomb
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fc
.In wchar.h
.In xlocale.h
.Ft size_t
.Fo wcrtomb_l
.Fa "char *restrict str"
.Fa "wchar_t wc"
.Fa "mbstate_t *restrict ps"
.Fa "locale_t loc"
.Fc
.Sh DESCRIPTION
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions convert wide-character sequences into a series of multi-byte
characters.
Each function accepts input in the following format:
.Bl -tag -width wcrtomb_l
.It Fn c16rtomb
A UTF-16 code sequence, where every code point is represented by one or
two
.Vt char16_t .
The UTF-16 encoding encodes certain Unicode code points as a pair of
16-bit code units, commonly referred to as a surrogate pair.
.It Fn c32rtomb
A UTF-32 code sequence, where every code point is represented by a
single
.Vt char32_t .
It is illegal to pass reserved Unicode code points.
.It Fn wcrtomb , Fn wcrtomb_l
Wide characters, which are 32-bit values where every code point is
represented by a single
.Vt wchar_t .
While
.Vt wchar_t
and
.Vt char32_t
are different types, in this implementation they use similar encodings.
.El
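.Pp
As a minimal sketch of the surrogate pair handling described above, the
following hypothetical helper,
.Fn encode_surrogate_pair ,
passes both halves of a pair to
.Fn c16rtomb
using a single conversion state.
It assumes that the caller has already selected a UTF-8 based locale and
that
.Fa out
has room for at least
.Dv MB_CUR_MAX
bytes.
The expectation that the first call stores nothing and returns zero is
an assumption of this sketch.
.Bd -literal -offset indent
#include <string.h>
#include <uchar.h>

/*
 * Encode one code point supplied as a UTF-16 surrogate pair.
 * Returns the number of bytes written to out, or (size_t)-1.
 */
size_t
encode_surrogate_pair(char16_t hi, char16_t lo, char *out)
{
        mbstate_t mbs;
        size_t ret;

        (void) memset(&mbs, 0, sizeof (mbs));

        /* The high surrogate is only recorded in the state. */
        ret = c16rtomb(out, hi, &mbs);
        if (ret != 0)
                return ((size_t)-1);

        /* The low surrogate completes the code point. */
        return (c16rtomb(out, lo, &mbs));
}
.Ed
.Pp
For instance, the pair 0xd83d, 0xde00 represents the code point U+1F600,
which a UTF-8 based locale encodes in four bytes.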
.Pp
The functions all work by taking the passed-in wide character
.Po
.Fa c16 ,
.Fa c32 ,
.Fa wc
.Pc
and appending it to the current conversion state,
.Fa ps .
Once a valid code point, based on the current locale, is found, it
is converted into a series of characters that are stored in
.Fa str .
Up to
.Dv MB_CUR_MAX
bytes will be stored in
.Fa str .
It is the caller's responsibility to ensure that there is sufficient
space in
.Fa str .
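.Pp
One way to satisfy the buffer size requirement is sketched below.
The hypothetical helper
.Fn wc_to_bytes
is not part of any library; it takes a buffer of
.Dv MB_LEN_MAX
bytes, the compile-time upper bound on
.Dv MB_CUR_MAX
from
.In limits.h ,
so that the buffer can be declared with a constant size.
.Bd -literal -offset indent
#include <limits.h>
#include <string.h>
#include <wchar.h>

/*
 * Convert a single wide character into out, which must provide
 * MB_LEN_MAX bytes.  Returns whatever wcrtomb() returns.
 */
size_t
wc_to_bytes(wchar_t wc, char out[MB_LEN_MAX])
{
        mbstate_t mbs;

        (void) memset(&mbs, 0, sizeof (mbs));
        return (wcrtomb(out, wc, &mbs));
}
.Ed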
.Pp
The functions are all influenced by the
.Dv LC_CTYPE
category of the current locale for determining what is considered a
valid character.
For example, in the
.Sy C
locale,
only ASCII characters are recognized, while in a
.Sy UTF-8
based locale like
.Sy en_US.UTF-8 ,
all valid Unicode code points are recognized and will be converted into
the corresponding multi-byte sequence.
The
.Fn wcrtomb_l
function uses the locale passed in
.Fa loc
rather than the locale of the current thread.
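.Pp
The effect of the locale can be illustrated with the following sketch.
The helper
.Fn encode_in_locale
is a hypothetical name, and for brevity it changes the process-wide
locale with
.Xr setlocale 3C ;
code that must not disturb the global locale would more likely use
.Fn wcrtomb_l .
.Bd -literal -offset indent
#include <locale.h>
#include <string.h>
#include <uchar.h>

/*
 * Select the named LC_CTYPE locale and try to encode c32 in it.
 * Returns the c32rtomb() result, or (size_t)-1 if the locale
 * could not be selected.  out needs MB_CUR_MAX bytes.
 */
size_t
encode_in_locale(const char *locale, char32_t c32, char *out)
{
        mbstate_t mbs;

        if (setlocale(LC_CTYPE, locale) == NULL)
                return ((size_t)-1);

        (void) memset(&mbs, 0, sizeof (mbs));
        return (c32rtomb(out, c32, &mbs));
}
.Ed
.Pp
With this helper, the code point 0x5149 is expected to be rejected in
the
.Sy C
locale and, where such a locale is installed, encoded as three bytes in
.Sy en_US.UTF-8 .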
.Pp
The
.Fa ps
argument represents a multi-byte conversion state which can be used
across multiple calls to a given function
.Pq but not mixed between functions .
This allows a conversion to be continued with subsequent buffers, e.g.
different values of
.Fa str .
The functions may be called from multiple threads as long as they use
unique values for
.Fa ps .
If
.Fa ps
is
.Dv NULL ,
then a function-specific buffer will be used for the conversion state;
however, this buffer is shared between all threads and its use is not
recommended.
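.Pp
The following sketch shows one conversion state being carried across
several calls.
The helper name
.Fn c32s_to_mbs
and its interface are hypothetical.
Because the
.Vt mbstate_t
is local to the helper, concurrent callers never share a state.
.Bd -literal -offset indent
#include <limits.h>
#include <string.h>
#include <uchar.h>

/*
 * Convert a null-terminated char32_t string into out, which holds
 * outlen bytes.  Returns the number of bytes written, excluding
 * the terminating null byte, or (size_t)-1 on a conversion error
 * or when out is too small.
 */
size_t
c32s_to_mbs(const char32_t *in, char *out, size_t outlen)
{
        mbstate_t mbs;
        char tmp[MB_LEN_MAX];
        size_t used = 0;

        (void) memset(&mbs, 0, sizeof (mbs));
        for (;;) {
                /* The same state is passed to every call. */
                size_t n = c32rtomb(tmp, *in, &mbs);

                if (n == (size_t)-1 || n > outlen - used)
                        return ((size_t)-1);
                (void) memcpy(out + used, tmp, n);
                if (*in == 0)
                        return (used + n - 1);
                used += n;
                in++;
        }
}
.Ed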
.Pp
The functions all have a special behavior when
.Dv NULL
is passed for
.Fa str .
They instead behave as though the null wide character was
passed in
.Fa c16 ,
.Fa c32 ,
or
.Fa wc
and an internal buffer
.Pq buf
is used to write out the results of the
conversion.
In other words, the functions would be called as:
.Bd -literal -offset indent
c16rtomb(buf, L'\\0', ps)
c32rtomb(buf, L'\\0', ps)
wcrtomb(buf, L'\\0', ps)
wcrtomb_l(buf, L'\\0', ps, loc)
.Ed
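.Pp
A common use of this behavior is to return a conversion state to its
initial state.
The sketch below wraps that call in a hypothetical helper,
.Fn reset_state ;
the name is illustrative only.
.Bd -literal -offset indent
#include <stddef.h>
#include <uchar.h>

/*
 * Pass NULL for the output buffer so that the null wide character
 * is converted into an internal buffer.  Returns the number of
 * bytes that conversion produced, or (size_t)-1 on error.
 */
size_t
reset_state(mbstate_t *ps)
{
        return (c32rtomb(NULL, 0, ps));
}
.Ed
.Pp
Starting from a zeroed
.Vt mbstate_t
in an encoding without shift states, the call is expected to return 1,
which accounts for the terminating null byte.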
.Ss Locale Details
Not all locales in the system are Unicode based locales.
For example, ISO 8859 family locales have code points with values that
do not match their counterparts in Unicode.
When using these functions with non-Unicode based locales, the code
points returned will be those determined by the locale.
They will not be converted from the corresponding Unicode code point.
For example, if using the Euro sign in ISO 8859-15, these functions
will not encode the Unicode value 0x20ac into the ISO 8859-15 value
0xa4.
.Pp
Regardless of the locale, the characters returned will be encoded as
though the code point were the corresponding value in Unicode.
This means that when using UTF-16, if the corresponding code point were
in the range for surrogate pairs, then the
.Fn c16rtomb
function will expect to receive that code point in that fashion.
.Pp
This behavior of the
.Fn c16rtomb
and
.Fn c32rtomb
functions should not be relied upon, is not portable, and is subject to
change for non-Unicode locales.
.Sh RETURN VALUES
Upon successful completion, the
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions return the number of bytes stored in
.Fa str .
Otherwise,
.Sy (size_t)-1
is returned to indicate an encoding error and
.Va errno
is set.
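.Pp
Because
.Vt size_t
is unsigned, a failure has to be detected by comparing against
.Sy (size_t)-1
rather than by testing for a negative value.
The following sketch, built around a hypothetical helper named
.Fn try_encode ,
shows one way to do that.
.Bd -literal -offset indent
#include <errno.h>
#include <string.h>
#include <uchar.h>

/*
 * Try to encode c32 into out, which needs MB_CUR_MAX bytes.
 * On success, store the byte count in *lenp and return 0.
 * On failure, return the errno value set by c32rtomb().
 */
int
try_encode(char32_t c32, char *out, size_t *lenp)
{
        mbstate_t mbs;
        size_t ret;

        (void) memset(&mbs, 0, sizeof (mbs));
        ret = c32rtomb(out, c32, &mbs);
        if (ret == (size_t)-1)
                return (errno);

        *lenp = ret;
        return (0);
}
.Ed
.Pp
Passing a value beyond the Unicode range, such as 0x110000, is expected
to make the helper return
.Er EILSEQ .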
.Sh EXAMPLES
.Sy Example 1
Converting a UTF-32 character into a multi-byte character sequence.
.Bd -literal
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <err.h>
#include <stdio.h>
#include <uchar.h>

int
main(void)
{
        mbstate_t mbs;
        size_t ret;
        char buf[MB_CUR_MAX];
        char32_t val = 0x5149;
        const char *uchar_exp = "\exe5\ex85\ex89";

        (void) memset(&mbs, 0, sizeof (mbs));
        (void) setlocale(LC_CTYPE, "en_US.UTF-8");
        ret = c32rtomb(buf, val, &mbs);
        if (ret != strlen(uchar_exp)) {
                /* ret is a size_t, so it is printed with %zu */
                errx(EXIT_FAILURE, "failed to convert string, got %zu",
                    ret);
        }

        if (strncmp(buf, uchar_exp, ret) != 0) {
                errx(EXIT_FAILURE, "converted char32_t does not match "
                    "expected value");
        }

        return (0);
}
.Ed
.Sh ERRORS
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions will fail if:
.Bl -tag -width Er
.It Er EINVAL
The conversion state in
.Fa ps
is invalid.
.It Er EILSEQ
An invalid character sequence has been detected.
.El
.Sh MT-LEVEL
The
.Fn c16rtomb ,
.Fn c32rtomb ,
.Fn wcrtomb ,
and
.Fn wcrtomb_l
functions are
.Sy MT-Safe
as long as different
.Vt mbstate_t
structures are passed in
.Fa ps .
If
.Fa ps
is
.Dv NULL
or different threads use the same value for
.Fa ps ,
then the functions are
.Sy Unsafe .
.Sh INTERFACE STABILITY
.Sy Committed
.Sh SEE ALSO
.Xr mbrtoc16 3C ,
.Xr mbrtoc32 3C ,
.Xr mbrtowc 3C ,
.Xr newlocale 3C ,
.Xr setlocale 3C ,
.Xr uselocale 3C ,
.Xr uchar.h 3HEAD ,
.Xr environ 5